
Neural Networks and Cost Function
Andrew Jarrett
11/30/19
Review

➢ Perceptrons take several inputs and produce one output. They use weights and
biases to decide the output.
 ▪ Sigmoid neuron
 ▪ This allows the perceptron to output any number between zero and one.
➢ The main objective is to create an algorithm that finds the weights
and biases so that the output from the network approximates y(x) for each of
the training inputs x
 ▪ In the textbook example, y(x) is a 10-dimensional vector
 ▪ Each dimension corresponds to a digit; in this case, the position of the 1
means the output is an 8
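
As a minimal sketch of the review points above, the snippet below computes the output of one sigmoid neuron and builds a 10-dimensional desired output for the digit 8. It assumes the standard logistic function sigma(z) = 1 / (1 + e^(-z)) applied to z = w·x + b. The input values, weights, bias, and the helper names `sigmoid_neuron` and `one_hot` are hypothetical, and the choice to put the 1 at index 8 (counting from 0) is also an assumption.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_neuron(inputs, weights, bias):
    """Output of a single sigmoid neuron: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def one_hot(digit, size=10):
    """Desired output y(x): 1.0 at index `digit`, 0.0 everywhere else."""
    return [1.0 if i == digit else 0.0 for i in range(size)]


# Hypothetical inputs, weights, and bias, chosen only for illustration.
output = sigmoid_neuron([1.0, 0.0, 1.0], [0.5, -0.3, 0.8], -0.4)

# Desired output vector for an image of the digit 8 (assumed index convention).
target = one_hot(8)
```

Unlike a perceptron, which jumps between 0 and 1, the neuron here changes its output only slightly when a weight or the bias changes slightly.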
Cost Function

➢ The equation below is the quadratic cost function, or mean squared error:

$$C(w, b) = \frac{1}{2n} \sum_x \| y(x) - a \|^2$$

 ▪ C is a function of the weights and biases, n is the number of training inputs,
a is the vector of outputs the network actually produces for training input x,
and the sum runs over all the training inputs x.
 ▪ C will approach 0 when y(x) is approximately equal to the output a for all the
training inputs
 ▪ This cost is used because it is a smooth function of the weights and biases,
unlike the number of images correctly classified.
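
The quadratic cost can be sketched in a few lines, assuming its standard form C = 1/(2n) * sum over x of ||y(x) - a||^2. The function name `quadratic_cost` and the small 3-dimensional example vectors are made up for illustration; they are not taken from the textbook example.

```python
def quadratic_cost(outputs, targets):
    """Quadratic cost: C = 1/(2n) * sum over x of ||y(x) - a||^2.

    outputs: list of actual output vectors a, one per training input
    targets: list of desired output vectors y(x), in the same order
    """
    n = len(outputs)
    total = 0.0
    for a, y in zip(outputs, targets):
        # Squared Euclidean distance between desired and actual output.
        total += sum((yi - ai) ** 2 for yi, ai in zip(y, a))
    return total / (2 * n)


# Two made-up training examples with 3-dimensional outputs.
targets = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
good = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]  # close to the targets
bad = [[0.9, 0.1, 0.5], [0.2, 0.7, 0.6]]   # far from the targets

cost_good = quadratic_cost(good, targets)
cost_bad = quadratic_cost(bad, targets)
```

Outputs close to the targets give a cost near zero, and the cost grows as the outputs move away from them.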
Gradient Descent

➢ Gradient descent is used to solve minimization problems, that is, to find the
absolute minimum of a function
 ▪ Visualize a ball rolling down a hill until it comes to a complete stop at the
bottom
 ▪ The ball moves by a small amount Δv, so the resulting change in the cost is ΔC
in the equation below:

$$\Delta C \approx \nabla C \cdot \Delta v$$
➢ The previous equation allows us to choose a value of Δv so as to make ΔC
negative
 ▪ Therefore

$$\Delta v = -\eta \nabla C$$

   which gives ΔC ≈ −η‖∇C‖² ≤ 0

➢ η is a small positive parameter, the learning rate; it dictates how fast the
program will learn

$$v \to v' = v - \eta \nabla C$$

 ▪ This equation shows how the “ball” is rolling down the hill
 ▪ Summary: Gradient descent works by repeatedly computing the gradient of
the cost function and then moving in the opposite direction.
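
The summary above can be sketched as a short loop. This is only an illustration under stated assumptions: the bowl-shaped cost C(v) = v1^2 + v2^2, the starting point, the learning rate of 0.1, and the step count are all hypothetical choices, not values from the slides.

```python
def cost(v):
    """Hypothetical bowl-shaped cost C(v) = v1^2 + v2^2, minimum at (0, 0)."""
    return v[0] ** 2 + v[1] ** 2


def gradient(v):
    """Gradient of the cost above: (2*v1, 2*v2)."""
    return [2 * v[0], 2 * v[1]]


def gradient_descent(v, eta=0.1, steps=100):
    """Repeatedly apply the update v -> v - eta * gradient(v)."""
    for _ in range(steps):
        g = gradient(v)
        # Move a small step in the direction opposite to the gradient.
        v = [vi - eta * gi for vi, gi in zip(v, g)]
    return v


start = [3.0, -4.0]             # where the "ball" is released
end = gradient_descent(start)   # where it has rolled to after 100 steps
```

With a small learning rate the cost shrinks at every step. A learning rate that is too large can overshoot the minimum instead, which is why it is kept small.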
Gradient Descent and Learning

Stochastic Gradient Descent

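➢ A problem occurs when using the cost function

$$C = \frac{1}{n} \sum_x C_x, \qquad C_x = \frac{\| y(x) - a \|^2}{2}$$

 ▪ To find ∇C we must compute ∇C_x separately for each training input x and
then average them

 ▪ This is extremely time-consuming if the number of training samples is large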


 ▪ Stochastic gradient descent, which estimates ∇C from a small random sample of
training inputs, would be helpful in our project because of our large data source
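
A minimal sketch of the idea, assuming a toy model with a single weight w and a per-example cost C_x = (w*x - y)^2 / 2. The data, batch size, learning rate, and function names are hypothetical. Each update averages the gradient over a small random mini-batch rather than over all training inputs.

```python
import random


def grad_single(w, x, y):
    """Gradient of C_x = (w*x - y)^2 / 2 with respect to w."""
    return (w * x - y) * x


def full_gradient(w, data):
    """Exact gradient: the average of grad C_x over every training input."""
    return sum(grad_single(w, x, y) for x, y in data) / len(data)


def sgd(data, w=0.0, eta=0.1, batch_size=10, epochs=20, seed=0):
    """Stochastic gradient descent using mini-batch gradient estimates."""
    rng = random.Random(seed)
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # pick the mini-batches at random
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # Estimate the gradient from the mini-batch only.
            g = sum(grad_single(w, x, y) for x, y in batch) / len(batch)
            w -= eta * g
    return w


# Made-up training data following y = 3x, with x spread over [0, 1).
data = [(i / 1000, 3 * i / 1000) for i in range(1000)]
w = sgd(data)
```

Each update here touches 10 examples instead of all 1000, so many more updates fit in the same amount of computation, at the price of a noisier gradient estimate.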
