Question 6
A cost function is something we want to minimize. For example, our cost function might be the sum of squared errors over the training set. Gradient descent is a method for finding the minimum of a function of multiple variables. For a function of n variables, the gradient is the length-n vector that points in the direction in which the cost increases most rapidly, so in gradient descent we follow the negative of the gradient toward a point where the cost is at a minimum. In machine learning, the cost function is the function to which we apply the gradient descent algorithm.
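To make this concrete, here is a minimal sketch in Python (assuming NumPy is available) of gradient descent minimizing the sum of squared errors for a toy linear model; the learning rate and number of steps are arbitrary illustration values, not part of the question:

    import numpy as np

    def gradient_descent(X, y, lr=0.01, steps=1000):
        # Minimize the sum-of-squared-errors cost for a linear model X @ w ~ y.
        w = np.zeros(X.shape[1])           # start at the origin
        for _ in range(steps):
            residuals = X @ w - y          # prediction errors on the training set
            grad = 2 * X.T @ residuals     # gradient of sum((X @ w - y)**2) w.r.t. w
            w -= lr * grad                 # step along the negative gradient
        return w

    # Toy usage: recover w close to [2.0] from data generated by y = 2x.
    X = np.array([[1.0], [2.0], [3.0]])
    y = np.array([2.0, 4.0, 6.0])
    print(gradient_descent(X, y))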
Suppose we have a function y = f(x). The derivative f'(x) gives the slope of f(x) at the point x: it specifies how to scale a small change in the input to obtain the corresponding change in the output. Say f(x) = 1/2 x². We can reduce f(x) by moving in small steps with the opposite sign of the derivative. When f'(x) = 0, the derivative provides no information about which direction to move; points where f'(x) = 0 are known as critical points.
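For instance, with f(x) = 1/2 x² the derivative is f'(x) = x, so repeatedly stepping against the sign of the derivative drives x toward the critical point at 0. A small sketch (the starting point and step size are arbitrary choices):

    def f(x):
        return 0.5 * x ** 2

    def f_prime(x):
        return x                    # derivative of 1/2 x^2

    x = 4.0                         # arbitrary starting point
    lr = 0.1                        # small step size
    for _ in range(50):
        x -= lr * f_prime(x)        # move opposite the sign of the derivative
    print(x, f(x))                  # x is now very close to the minimum at 0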
The concept of convergence is a well-defined mathematical term: it means that a sequence of elements eventually gets closer and closer to a single value. So what does it mean for an algorithm to converge? Technically, what converges is not the algorithm but a value the algorithm is manipulating or iterating. To illustrate this, suppose we write an algorithm that prints better and better approximations of pi. The algorithm prints numbers that get ever closer to pi, so we say our algorithm converges to pi.
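As an illustration, here is one way such an algorithm might look: a short sketch that sums the Nilakantha series (one of many series whose partial sums converge to pi) and prints each successive approximation:

    # Nilakantha series: pi = 3 + 4/(2*3*4) - 4/(4*5*6) + 4/(6*7*8) - ...
    pi_approx = 3.0
    sign = 1.0
    for n in range(2, 20, 2):
        pi_approx += sign * 4.0 / (n * (n + 1) * (n + 2))
        sign = -sign
        print(pi_approx)            # each printed value is closer to pi than the one before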
A cost function whose surface is shaped like a bowl, with a single minimum, is called a convex function; on a convex cost function, gradient descent with a sufficiently small step size converges to that global minimum.
Other Parameters that affect the algorithm –