
Loss Functions, Backpropagation, Gradient Descent


WHAT ARE LOSS FUNCTIONS?
• A loss function measures how well a neural network model performs a certain task, which in most cases is regression or classification.
• 3 Key Types of Loss Functions in Neural Networks
• Mean Squared Error Loss Function
• Cross-Entropy Loss Function
• Mean Absolute Percentage Error
MEAN SQUARED ERROR LOSS FUNCTION
• The mean squared error (MSE) loss function is the average of the squared differences between the entries in the prediction vector y and the ground truth vector ŷ.
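Written out for n entries, keeping the slide's notation (y the prediction, ŷ the ground truth), the standard form is:

\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2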
CROSS-ENTROPY LOSS FUNCTION
• Regression is only one of two areas where feedforward networks enjoy great popularity. The other
area is classification.
• In classification tasks, we deal with predictions of probabilities, which means the output of a neural
network must be in a range between zero and one.
• A loss function that can measure the error between a predicted probability and the label which
represents the actual class is called the cross-entropy loss function.
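For a single example over C classes, with ŷ the one-hot ground-truth vector and y the predicted probability vector (again keeping the slide's notation), the standard form is:

\mathrm{CE}(y, \hat{y}) = -\sum_{c=1}^{C} \hat{y}_c \log y_c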
MEAN ABSOLUTE PERCENTAGE ERROR (MAPE)
• The mean absolute percentage error, also known as the mean absolute percentage deviation (MAPD), usually expresses accuracy as a percentage. We define it with the following equation:
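With n actual values ŷ_i and predictions y_i (slide notation as above), the standard definition is:

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{\hat{y}_i - y_i}{\hat{y}_i} \right|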
What is Backpropagation?

• Backpropagation is the essence of neural network training.


• It is the method of fine-tuning the weights of a neural network based on the error rate obtained
in the previous epoch (i.e., iteration).
• Proper tuning of the weights allows you to reduce error rates and make the model reliable by
increasing its generalization.
• “Backpropagation” in a neural network is short for “backward propagation of errors.”
• It is a standard method of training artificial neural networks.
• This method helps calculate the gradient of a loss function with respect to all the weights in the
network.
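For example, for a weight w that feeds a pre-activation z = wx + b with activation a = σ(z), the chain rule factors the gradient of the loss L as:

\frac{\partial L}{\partial w} = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}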
Steps
1. Inputs X arrive through the preconnected path.
2. The input is modeled using real weights W, which are usually selected at random.
3. Calculate the output of every neuron, from the input layer through the hidden layers to the output layer.
4. Calculate the error in the outputs: Error = Actual Output – Desired Output
5. Travel back from the output layer to the hidden layers, adjusting the weights so that the error decreases.
6. Keep repeating the process until the desired output is achieved (see the sketch after this list).
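A minimal NumPy sketch of steps 1–6 for a one-hidden-layer network; the XOR toy data, sigmoid activations, layer sizes, and learning rate are illustrative assumptions, not part of the original slides:

import numpy as np

# Step 1: inputs X (toy XOR data, illustrative) and desired outputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Step 2: real weights W, randomly selected
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))
W2 = rng.normal(size=(4, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = 0.5  # learning rate
for epoch in range(5000):
    # Step 3: compute every neuron's output, input -> hidden -> output
    h = sigmoid(X @ W1)
    out = sigmoid(h @ W2)

    # Step 4: error in the outputs (actual - desired)
    error = out - y

    # Step 5: travel back, adjusting the weights to decrease the error
    d_out = error * out * (1 - out)       # gradient at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # gradient at the hidden layer
    W2 -= alpha * (h.T @ d_out)
    W1 -= alpha * (X.T @ d_h)
    # Step 6: the loop repeats until the outputs are close to the targets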
• The most prominent advantages of backpropagation are:
• It is fast, simple, and easy to program.
• It has no parameters to tune apart from the number of inputs.
• It is a flexible method, as it does not require prior knowledge about the network.
• It is a standard method that generally works well.
• It does not need any special mention of the features of the function to be learned.
Introduction to Gradient Descent
• Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively tweaks the model's parameters to minimize a given function towards a local minimum (for a convex function, that local minimum is also the global minimum).
• Gradient Descent is an optimization algorithm for finding a local minimum of a
differentiable function. Gradient descent in machine learning is simply used to find the
values of a function's parameters (coefficients) that minimize a cost function as far as
possible.
• "A gradient measures how much the output of a function changes if you change the inputs
a little bit." -Lex Fridman (MIT)
• A gradient simply measures the change in all weights with regard to the change in error.
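For a function f of weights w_1, …, w_n, the gradient collects all of these sensitivities into a single vector of partial derivatives:

\nabla f = \left( \frac{\partial f}{\partial w_1}, \ldots, \frac{\partial f}{\partial w_n} \right)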
Gradient Descent
• Gradient descent is one of the most commonly used iterative optimization algorithms in machine learning for training machine learning and deep learning models. It helps find a local minimum of a function.
• The best way to define the local minimum or local maximum of a function using gradient descent is as follows:
• If we move towards a negative gradient or away from the gradient of the function at the current point, it will give the local
minimum of that function.
• Whenever we move towards a positive gradient, or towards the gradient of the function at the current point, we will get the local maximum of that function.
To achieve this goal, gradient descent performs two steps iteratively:
• Calculates the first-order derivative of the function to compute the gradient, or slope, of that function.
• Moves away from the direction of the gradient from the current point by alpha times the slope, where alpha is the learning rate: a tuning parameter in the optimization process that helps decide the length of the steps.
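A minimal sketch of the two steps in plain Python, minimizing the illustrative function f(w) = (w - 3)^2 from an assumed starting point and learning rate:

def f(w):
    return (w - 3.0) ** 2          # illustrative convex function

def grad_f(w):
    # Step 1: first-order derivative of f, i.e. the slope at w
    return 2.0 * (w - 3.0)

alpha = 0.1   # learning rate: decides the length of each step
w = 0.0       # assumed starting point
for _ in range(100):
    # Step 2: move away from the gradient direction by alpha times the slope
    w = w - alpha * grad_f(w)

print(w)  # approaches the minimizer w = 3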
Cost Function and Learning Rate
• The cost function is defined as the measurement of the difference, or error, between actual values and expected values at the current position, expressed as a single real number. It helps improve machine learning efficiency by providing feedback to the model so that it can minimize the error.
• Learning Rate:
• It is defined as the step size taken to reach the minimum or lowest point.
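Combining the two: each iteration updates the weights against the gradient of the cost function J, scaled by the learning rate α:

w \leftarrow w - \alpha \, \nabla J(w)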
