
Q 6- Describe the gradient descent algorithm. How does the cost function impact
the ability of the gradient descent algorithm to converge to the global minimum?
Are there any other parameters of the algorithm that impact its ability to reach
the global minimum?
Ans - Gradient descent is an iterative optimization algorithm for finding the local
minimum of a function.
To find the local minimum of a function using gradient descent, we take steps
proportional to the negative of the gradient (moving against the gradient) of the
function at the current point. If instead we take steps proportional to the positive
of the gradient (moving along the gradient), we approach a local maximum of the
function, and that procedure is called gradient ascent.
The goal of the gradient descent algorithm is to minimize the given function (say, a
cost function). To achieve this goal, it performs two steps iteratively:
1. Compute the gradient (slope), the first-order derivative of the function at the
current point
2. Take a step from the current point in the direction opposite to the gradient
(the direction in which the slope decreases), moving by alpha times the gradient
at that point
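The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not from the original text: the function minimized here is f(x) = 1/2 x², whose derivative is f'(x) = x, and the starting point, alpha, and step count are arbitrary choices.

```python
# Minimal gradient descent sketch on f(x) = 1/2 * x^2, whose derivative is f'(x) = x.
# The starting point, alpha, and number of steps are illustrative choices.

def gradient_descent(grad, start, alpha=0.1, n_steps=100):
    """Iteratively step opposite the gradient: x <- x - alpha * grad(x)."""
    x = start
    for _ in range(n_steps):
        x = x - alpha * grad(x)  # move against the slope by alpha times the gradient
    return x

x_min = gradient_descent(grad=lambda x: x, start=5.0)
print(round(x_min, 4))  # approaches the minimum at x = 0
```

Each iteration performs exactly the two steps from the text: evaluate the derivative at the current point, then move by alpha times that derivative in the opposite direction.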

Impact of Cost Function –

The cost function is the quantity we want to minimize. For example, our cost function
might be the sum of squared errors over the training set. Gradient descent is a method
for finding the minimum of a function of multiple variables: for a function of n
variables, the gradient is the length-n vector that defines the direction in which the
cost is increasing most rapidly, so in gradient descent we follow the negative of the
gradient toward a point where the cost is a minimum. In machine learning, the cost
function is the function to which we apply the gradient descent algorithm. Suppose we
have a function y = f(x). The derivative f'(x) gives the slope of f(x) at point x; it
specifies how to scale a small change in the input to obtain the corresponding change
in the output. For example, if f(x) = 1/2 x², we can reduce f(x) by moving in small
steps with the opposite sign of the derivative. When f'(x) = 0, the derivative provides
no information about which direction to move; points where f'(x) = 0 are known as
critical points.
Convergence is a well-defined mathematical term: it means that a sequence of elements
eventually gets closer and closer to a single value. So what does it mean for an
algorithm to converge? Technically, what converges is not the algorithm but a value the
algorithm is manipulating or iterating. To illustrate this, consider an algorithm that
prints ever-longer prefixes of the digits of pi: it prints numbers increasingly close
to pi, so we say the algorithm converges to pi. Whether gradient descent converges to
the global minimum depends on the shape of the cost function. If the cost function is
convex (like a bowl shape), its only critical point is the global minimum, so gradient
descent is guaranteed to converge to it; on a non-convex cost function, the algorithm
may instead settle at a local minimum or a saddle point.
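The effect of the cost function's shape can be shown with a small experiment. This is a sketch with an example cost of my own choosing, not from the text: on a non-convex cost, the point gradient descent converges to depends on where it starts.

```python
# Sketch (the example cost is illustrative, not from the text): on a convex cost
# the single critical point is the global minimum, so any start converges there;
# on a non-convex cost, gradient descent can settle in different minima.

def descend(grad, x, alpha=0.01, n_steps=5000):
    """Repeatedly step against the gradient from starting point x."""
    for _ in range(n_steps):
        x -= alpha * grad(x)
    return x

# Non-convex cost f(x) = x^4 - 3x^2 + x, with gradient f'(x) = 4x^3 - 6x + 1.
grad = lambda x: 4 * x**3 - 6 * x + 1
left = descend(grad, x=-2.0)   # ends near the global minimum (x ~ -1.30)
right = descend(grad, x=2.0)   # ends near a merely local minimum (x ~ 1.13)
print(round(left, 2), round(right, 2))
```

Starting to the right of the hump between the two valleys, gradient descent never crosses it, so it gets stuck in the shallower valley even though a deeper one exists.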
Other Parameters that affect the algorithm –

We implement the update by taking the derivative (the slope of the tangent line to
the cost function) at the current point. The value of the derivative there gives us
a direction to move in, and we make steps down the cost function in the direction of
steepest descent. The size of each step is determined by the parameter α (alpha),
which is called the learning rate. To reach a local minimum efficiently, we have to
set the learning rate α appropriately, neither too high nor too low: if α is too
small, gradient descent takes tiny steps and converges too slowly; if α is too large,
it may overshoot the minimum and fail to converge, or even diverge. Typically, the
value of the learning rate is chosen manually, starting with 0.1, 0.01, or 0.001 as
common values. The starting point is another parameter that matters: depending on
where the initial point sits on the cost surface, the algorithm can end up at
different (local) minima.
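The effect of the learning rate can be demonstrated numerically. A minimal sketch, again on the illustrative cost f(x) = 1/2 x² with gradient f'(x) = x; the starting point and alpha values are assumptions, not from the text.

```python
# Sketch (assumed values): how the learning rate alpha affects gradient descent
# on f(x) = 1/2 * x^2 (gradient f'(x) = x), starting from x = 5 for 50 steps.

def final_x(alpha, x=5.0, n_steps=50):
    """Run gradient descent and return the final position."""
    for _ in range(n_steps):
        x -= alpha * x
    return x

print(abs(final_x(0.001)))  # too low: barely moves from the start in 50 steps
print(abs(final_x(0.1)))    # reasonable: ends close to the minimum at x = 0
print(abs(final_x(2.5)))    # too high: every step overshoots and x diverges
```

With α = 2.5, each update multiplies x by (1 − 2.5) = −1.5, so the iterate flips sign and grows without bound, which is exactly the divergence a too-large learning rate causes.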