What are Activation Functions and what are their uses in a Neural Network Model?

* An activation function is also known as a transfer function. It can also be attached between two neural networks.
* Activation functions are important for an Artificial Neural Network to learn and understand complex patterns.
* Their main job is to introduce non-linear properties into the network.
* An activation function takes the weighted sum of a neuron's inputs, adds the bias, and decides whether to "fire" that neuron or not.
* In other words, it converts the input signal of a node in an ANN into an output signal, and that output signal is then used as an input to the next layer in the stack.
* A non-linear activation function helps the model capture this complexity and give accurate results.

Why can't we do it without activating the input signal?

* If we do not apply an activation function, the output signal is simply a linear function.
* A linear function is just a polynomial of degree one. Linear equations are easy to solve, but they are limited in complexity and have less power to learn complex functional mappings from data.
* A neural network without an activation function is simply a linear regression model, which has limited power and does not perform well most of the time.
* Without an activation function, our neural network would also be unable to learn and model complicated kinds of data such as images, video, audio, speech, etc.

Types

The activation functions can basically be divided into two types:
* Linear Activation Function
* Non-linear Activation Functions

Linear or Identity Activation Function

* The function is a line, so the output of the function is not confined to any range.
* Equation: f(x) = x
* Range: (-infinity, infinity)
* It does not help with the complexity of the usual data that is fed to neural networks.

Step Function

* The step function is one of the simplest kinds of activation function. We choose a threshold value, and if the net input y is greater than the threshold, the neuron is activated.
* f(x) = 1 if x >= 0, else f(x) = 0.

So why do we need non-linearities?

* Non-linear functions have degree greater than one and show curvature when plotted. We need a neural network model to learn and represent almost any arbitrary complex function that maps inputs to outputs.
* Hence, using a non-linear activation we are able to generate non-linear mappings from inputs to outputs.

Non-linear Activation Functions

* They make it easy for the model to generalize or adapt to a variety of data and to differentiate between outputs.
* Derivative or differential: the change along the y-axis with respect to the change along the x-axis, also known as the slope.
* These functions are differentiable, meaning we can find the slope of the curve (for example the sigmoid curve) at any point.
* Monotonic function: a function which is either entirely non-increasing or entirely non-decreasing.

The non-linear activation functions are mainly divided on the basis of their range or curves.

Sigmoid or Logistic Activation Function

* The sigmoid function curve looks like an S-shape.
* The main reason we use the sigmoid function is that its output lies between 0 and 1. It is therefore especially used for models where we have to predict a probability as the output: since a probability exists only in the range 0 to 1, sigmoid is the right choice. A small code sketch of these basic activations follows below.
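To make the linear, step, and sigmoid definitions above concrete, here is a minimal NumPy sketch. The function names, the test values, and the threshold of 0 for the step function are illustrative assumptions, not part of any particular library.

import numpy as np

def linear(x):
    # Identity activation: f(x) = x, output is not confined to any range
    return x

def binary_step(x):
    # Fires (outputs 1) when the net input reaches the threshold of 0, else 0
    return np.where(x >= 0, 1.0, 0.0)

def sigmoid(x):
    # Logistic sigmoid: squashes any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])   # example weighted sums
print(linear(z))        # [-3.  -0.5  0.   0.5  3. ]
print(binary_step(z))   # [0. 0. 1. 1. 1.]
print(sigmoid(z))       # roughly [0.047 0.378 0.5 0.622 0.953]

Because the sigmoid outputs lie strictly between 0 and 1, the last line can be read directly as predicted probabilities.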
* The logistic sigmoid function can cause a neural network to get stuck during training.
* The beauty of the exponent is that the output never quite reaches 0 nor exceeds 1: large negative numbers are scaled towards 0 and large positive numbers are scaled towards 1.

Disadvantages of the sigmoid function

Major reasons which have made it fall out of popularity:
* The vanishing gradient problem.
* Its output is not zero-centered. This makes the gradient updates go too far in different directions (0 < output < 1), which makes optimization harder.
* Sigmoids saturate and kill gradients.
* Sigmoids have slow convergence.

Hyperbolic Tangent Function (Tanh)

* Its mathematical formula is f(x) = (1 - exp(-2x)) / (1 + exp(-2x)).
* Its output is zero-centered because its range is between -1 and 1, i.e. -1 < output < 1.
* Hence optimization is easier with tanh, so in practice it is usually preferred over the sigmoid function.
* But it still suffers from the vanishing gradient problem.

ReLU - Rectified Linear Unit

* It has been reported to give about a 6x improvement in convergence over the tanh function.
* R(x) = max(0, x), i.e. if x < 0 then R(x) = 0, and if x >= 0 then R(x) = x.
* Hence it avoids and rectifies the vanishing gradient problem.
* Its limitation is that it should only be used within the hidden layers of a neural network model.
* Another problem with ReLU is that some gradients can be fragile during training and can die: a weight update can make a neuron never activate on any data point again. Simply put, ReLU can result in dead neurons.

Leaky ReLU

* The Leaky ReLU function is an improved version of the ReLU function. Instead of defining ReLU as 0 for x less than 0, we define it as a small linear component of x.
* It can be defined as: f(x) = ax for x < 0, else f(x) = x.
* It introduces a small slope to keep the updates alive. A sketch comparing tanh, ReLU, and Leaky ReLU follows below.
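As a minimal sketch of the tanh, ReLU, and Leaky ReLU formulas given above (assuming NumPy, with the Leaky ReLU slope a = 0.01 chosen purely for illustration):

import numpy as np

def tanh(x):
    # Zero-centred: (1 - exp(-2x)) / (1 + exp(-2x)), range (-1, 1)
    return (1.0 - np.exp(-2.0 * x)) / (1.0 + np.exp(-2.0 * x))

def relu(x):
    # R(x) = max(0, x): passes positives through, zeroes out negatives
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # Small slope a for x < 0 keeps gradient updates alive (avoids dead neurons)
    return np.where(x >= 0, x, a * x)

z = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
print(tanh(z))         # roughly [-0.964 -0.1 0. 0.1 0.964]
print(relu(z))         # [0.  0.  0.  0.1 2. ]
print(leaky_relu(z))   # [-0.02  -0.001  0.  0.1  2. ]

Because ReLU's output (and hence its gradient) is exactly zero for negative inputs, a neuron that only ever receives negative pre-activations stops updating; the small leak a keeps a non-zero signal flowing, which is the dead-neuron fix described above.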
