Activation Functions in Neural Networks
Getting into Activation Functions
Activation functions are one of the most important units in a neural network. An activation function determines the output of a node given its input, and the right choice depends on whether the node sits in a hidden layer or the output layer and on the type of problem we are solving. Nodes in the input layer don’t have an activation function.
In effect, the activation function decides whether a neuron should be activated or not. The input layer simply passes the input through as it is, so activation functions are mainly involved in the hidden and output layers.
With appropriate activation functions, our network can learn complex non-linear relationships between input and output.
But Why Do Neural Networks Need Activation Functions?
The purpose of an activation function is to add non-linearity to the neural network. You might be thinking that adding an extra step to each layer will also increase the computation, but believe me (and believe the mathematics), it is worth it.
Here is the reason: a neural network without activation functions only performs linear transformations on its inputs using weights and biases. It does not matter how many hidden layers we add, because the composition of two linear functions is always linear, so the whole stack behaves like a single linear layer.
Such a model is simpler, but it will not be able to learn much on complex tasks, even though it may work decently on easy ones.
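To make this concrete, here is a minimal NumPy sketch (the variable names, shapes, and sample values are just illustrative, not from this post) showing that two stacked linear layers with no activation collapse into a single linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                  # batch of 4 inputs with 3 features
W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)

two_layers = (x @ W1 + b1) @ W2 + b2         # two stacked linear layers, no activation
W, b = W1 @ W2, b1 @ W2 + b2                 # the equivalent single linear layer
one_layer = x @ W + b

print(np.allclose(two_layers, one_layer))    # True: the extra layer adds no expressive power
```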
Commonly Used Activation Functions
Step Function
- The step function used as an activation is the binary (Heaviside) step: a simple threshold function. More generally, a step function is any piecewise-constant "staircase" function; the floor (greatest integer) function is one well-known example.
- As a function f: R → R, for all real numbers x the binary step can be written as: f(x) = 1 if x ≥ 0, and f(x) = 0 if x < 0.
- In general, for n ≥ 0, real coefficients aᵢ, and indicator functions χ_Aᵢ of intervals Aᵢ, a step function can be written as a finite sum: f(x) = Σᵢ aᵢ · χ_Aᵢ(x), for i = 0, …, n.
- The step function is a discontinuous function.
Used In: Hidden Layer, Output Layer for Classification
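Here is a tiny NumPy sketch of the binary step (the function name and the sample inputs are my own, for illustration only):

```python
import numpy as np

def binary_step(x):
    """Binary (Heaviside) step: 1 for x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

print(binary_step(np.array([-2.0, 0.0, 3.5])))  # [0. 1. 1.]
```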
Signum (sgn or sign) function
- It simply gives the sign of the given value, which tells us whether the output lies on the positive side or the negative side.
- If the value is greater than 0 it gives +1, if the value is less than 0 it gives -1, and if the value is exactly 0 it gives 0 as the output.
Used In: Hidden Layer, Output Layer for Classification
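A quick sketch using NumPy's built-in sign (the wrapper name and sample values are illustrative):

```python
import numpy as np

def signum(x):
    """Sign function: +1 for positive input, -1 for negative input, 0 at zero."""
    return np.sign(x)

print(signum(np.array([-4.2, 0.0, 7.0])))  # [-1.  0.  1.]
```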
Linear Function
- It is also termed the "no activation" or identity function, as the input is simply multiplied by 1.0.
- Here the output is the same as the input: the function does nothing and just gives back whatever we put in.
Used In: Hidden Layer, Output Layer for Regression
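A minimal sketch of the identity/linear activation (the optional slope argument is just for illustration; with slope=1.0 it is a pure pass-through):

```python
def linear(x, slope=1.0):
    """Identity / linear activation: returns the input scaled by `slope` (1.0 = no-op)."""
    return slope * x

print(linear(3.7))   # 3.7
print(linear(-2.0))  # -2.0
```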
Sigmoid Function
- It has a characteristic S-shaped curve. There are many common sigmoid functions, such as the logistic function, the hyperbolic tangent, and the arctangent; in neural networks, "sigmoid" usually refers to the logistic function.
- It mainly gained popularity because of its heavy use in the output layer of artificial neural networks.
- It is also very useful in machine learning whenever we need to convert numbers into probabilities, and it is a central part of the logistic regression model.
Used In: Hidden Layer, Output Layer for Classification
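A small NumPy sketch of the logistic sigmoid (the function name and test values are my own):

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.0067 0.5 0.9933]
```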
Softmax Function
- This function converts a vector of numbers into a vector of probabilities, where the probability of each value is proportional to its relative scale in the vector.
- It is mainly used as the activation function in the output layer of neural network models that predict a multinomial probability distribution.
- It is used for multi-class classification problems, where the output can belong to more than two classes.
- It can also be used in hidden layers, but that is far less common and far less beneficial.
Used In: Output Layer for Multiclass Classification
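A minimal NumPy sketch of softmax (the max-subtraction trick for numerical stability is a common convention, not something from this post):

```python
import numpy as np

def softmax(z):
    """Softmax: turns a vector of scores into probabilities that sum to 1."""
    shifted = z - np.max(z)      # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())        # ~[0.659 0.242 0.099] 1.0
```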
TanH Function (Hyperbolic Tangent)
- It is very similar to the sigmoid function and has the same S-shaped curve, the difference being that its output ranges from -1 to 1.
- In tanh, the larger (more positive) the input, the closer the output is to +1.0, and the smaller (more negative) the input, the closer the output is to -1.0.
Used In: Hidden Layer, Output Layer for Classification
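A one-line sketch wrapping NumPy's built-in tanh (names and sample values are illustrative):

```python
import numpy as np

def tanh(x):
    """Hyperbolic tangent: S-shaped like the sigmoid, but outputs in (-1, 1)."""
    return np.tanh(x)

print(tanh(np.array([-3.0, 0.0, 3.0])))  # ~[-0.995  0.  0.995]
```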
Rectified Linear Unit (ReLU) Function
- It is a very important activation function and is widely used in computer vision work. For example, AlexNet, which won the 2012 ImageNet challenge, used ReLU as the activation function in its hidden layers.
- It has a simple derivative, which makes backpropagation possible while remaining computationally efficient.
- The biggest point here is that ReLU does not activate all the neurons at the same time: a neuron is deactivated whenever the output of its linear transformation is less than 0.
- The rectified linear activation is very simple to compute: it returns the input directly if it is positive, and 0.0 otherwise.
Used In: Hidden Layer, Output Layer for Regression (only positive output)
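A minimal NumPy sketch of ReLU (the function name and sample values are my own):

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through unchanged, clips negatives to 0."""
    return np.maximum(0.0, x)

print(relu(np.array([-2.0, 0.0, 3.0])))  # [0. 0. 3.]
```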
Leaky Rectified Linear Unit (LReLU) Function
- It is a slightly modified version of the ReLU function and is also commonly used.
- Rather than outputting exactly 0.0 for negative inputs, it "leaks" a small fraction of the input through (typically with a slope of 0.01).
Used In: Hidden Layer
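A small sketch of Leaky ReLU (the default slope alpha=0.01 is a common choice, assumed here for illustration):

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    """Leaky ReLU: small fixed slope `alpha` for negative inputs instead of 0."""
    return np.where(x >= 0, x, alpha * x)

print(leaky_relu(np.array([-5.0, 0.0, 2.0])))  # [-0.05  0.  2.]
```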
Parametric Rectified Linear Unit (PReLU) Function
- It takes the idea of the Leaky Rectified Linear Unit one step further by turning the coefficient of leakage into a parameter that is learned along with the other network parameters.
- The main limitation of this function is that it may perform differently on different problems, depending on the learned value of the slope parameter a.
Used In: Hidden Layer
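A minimal sketch of PReLU (here the slope a is passed in as a fixed number purely for illustration; in a real network it would be a learned parameter updated by backpropagation):

```python
import numpy as np

def prelu(x, a):
    """PReLU: like Leaky ReLU, but the negative slope `a` is a learned parameter."""
    return np.where(x >= 0, x, a * x)

# `a` is fixed here only for demonstration; a framework would learn it during training.
print(prelu(np.array([-4.0, 0.0, 2.0]), a=0.25))  # [-1.  0.  2.]
```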
With all this, let’s wrap up this blog on Activation Functions in Neural Networks. If you find any issue with the concepts or the code, you can message me on Twitter or LinkedIn. The next blog will be published on 25 March 2023.
Some words about me
I’m Mohit.❤️ You can also call me Chessman. I’m a Machine Learning developer and a competitive programmer. Most of my time is spent staring at a computer screen. During the day, I am usually programming, working to derive insight from large datasets. My skills include Data Analysis, Data Visualization, Machine Learning, Deep Learning, and DevOps, and I am working toward Full Stack development. I have developed a strong acumen for problem-solving, and I enjoy occasional challenges.