Cost Function
Andrew Jarrett
11/30/19
Review
Perceptrons take several inputs and produce one output. They use weights and
biases to decide the output.
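A minimal sketch of the perceptron decision rule described above, with hypothetical weights, bias, and inputs chosen for illustration:

```python
def perceptron(inputs, weights, bias):
    # Output 1 if the weighted sum of the inputs plus the bias
    # is positive, otherwise output 0.
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total + bias > 0 else 0

# Example: two inputs, weights 2 and -1, bias -1.
print(perceptron([1, 0], [2, -1], -1))  # -> 1  (2 - 1 > 0)
print(perceptron([0, 1], [2, -1], -1))  # -> 0  (-1 - 1 <= 0)
```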
Sigmoid neuron
This allows the neuron's output to be any number between zero and one,
rather than just zero or one.
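The sigmoid function that produces this smooth output can be sketched as:

```python
import math

def sigmoid(z):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

print(sigmoid(0))   # -> 0.5
print(sigmoid(5))   # close to 1
print(sigmoid(-5))  # close to 0
```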
The main objective is to create an algorithm which lets us find weights
and biases so that the output from the network approximates y(x) for every
training example x.
In the textbook example, y(x) is a 10-dimensional vector; each dimension
corresponds to one of the digits 0-9, so a 1 in the entry for 8 means the
output is an 8.
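This kind of desired-output vector can be built with a small one-hot encoding helper (the function name is my own, not from the notes):

```python
def one_hot(digit, size=10):
    # Desired output y(x): all zeros except a 1 at the digit's index.
    v = [0.0] * size
    v[digit] = 1.0
    return v

print(one_hot(8))  # -> 1.0 in position 8, 0.0 elsewhere
```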
Cost Function
The equation below is the quadratic cost function, or mean squared error:

C(w, b) = 1/(2n) Σ_x ||y(x) − a||²

where w is the collection of weights, b the biases, n the number of training
inputs, and a the network's output when x is the input.
η (eta) is a small positive parameter, the learning rate; it dictates how
fast the program will learn.
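The quadratic cost above can be sketched directly in code (a plain-Python version, with hypothetical outputs and targets):

```python
def quadratic_cost(outputs, targets):
    # C = 1/(2n) * sum over examples of ||y(x) - a||^2
    n = len(outputs)
    total = 0.0
    for a, y in zip(outputs, targets):
        total += sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    return total / (2 * n)

# One training example whose output misses one component by 1.
print(quadratic_cost([[0.0, 1.0]], [[1.0, 1.0]]))  # -> 0.5
```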
This equation shows how the "ball" rolls down the hill:

v → v′ = v − η∇C
Summary: gradient descent works by repeatedly computing the gradient ∇C of
the cost function and then moving in the opposite direction.
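That repeated update can be sketched on a simple one-dimensional cost, C(v) = (v − 3)², whose gradient is 2(v − 3); the cost and starting point are hypothetical:

```python
def gradient_descent(v, eta=0.1, steps=100):
    for _ in range(steps):
        grad = 2 * (v - 3)   # gradient of C(v) = (v - 3)^2
        v = v - eta * grad   # step in the opposite direction
    return v

print(gradient_descent(0.0))  # converges near the minimum at v = 3
```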
Gradient Descent and Learning
Stochastic Gradient Descent