Gradient Descent Study Guide!!!
© 2020 Joshua Starmer All Rights Reserved
The Problem: A major part of Machine Learning is optimizing a model's fit to the data. For example, when doing Logistic Regression, we need to find the squiggly line that fits the data the best. Neural Networks optimize the weights associated with each line that connects nodes.
NOTES:
By eye, this looks like the minimum SSR, but another intercept value might be better.
Increasing the number of SSR values tested gets us closer to the optimal solution.
[Figure: SSR computed for a handful of candidate Intercept Values]
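The "test many SSR values" idea can be sketched as a brute-force search. This is a minimal sketch, assuming the three (Weight, Height) data points and the fixed 0.64 slope used in the worked example later in this guide:

```python
# Brute-force search: compute the SSR for many candidate intercepts
# and keep the candidate with the smallest SSR.
weights = [0.5, 2.3, 2.9]   # observed Weight values
heights = [1.4, 1.9, 3.2]   # observed Height values
slope = 0.64                # the slope is held fixed in this example

def ssr(intercept):
    # Sum of Squared Residuals: (Observed - Predicted)^2, summed over points
    return sum((h - (intercept + slope * w)) ** 2
               for w, h in zip(weights, heights))

# Test 1,000 intercepts between 0 and 2; the more candidates we test,
# the closer the best one gets to the optimal intercept.
candidates = [i / 500 for i in range(1000)]
best = min(candidates, key=ssr)
print(best, ssr(best))
```

This works, but it only tests a fixed grid of values, which is exactly the weakness Gradient Descent avoids.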
Residuals: the difference between the Observed and Predicted values.
[Figure: each Intercept Value corresponds to both a line through the Weight vs. Height data and a point on the SSR curve; the lowest point on the curve corresponds to the teal line.]
The goal is to find the intercept value that results in the minimal SSR, and that corresponds to the lowest point in the SSR curve.
BAM!

d SSR / d intercept = 2 x Inside x -1 = -2 (Height - (intercept + 0.64 x Weight))
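The chain-rule derivative above can be double-checked numerically; this is a minimal sketch, assuming the example's three data points and the fixed 0.64 slope:

```python
weights = [0.5, 2.3, 2.9]
heights = [1.4, 1.9, 3.2]
slope = 0.64

def ssr(intercept):
    # Sum of Squared Residuals for a given intercept
    return sum((h - (intercept + slope * w)) ** 2
               for w, h in zip(weights, heights))

def d_ssr_d_intercept(intercept):
    # Chain rule: 2 x Inside x -1, summed over the data points
    return sum(-2 * (h - (intercept + slope * w))
               for w, h in zip(weights, heights))

# Compare the analytic derivative with a finite-difference estimate.
b, eps = 0.0, 1e-6
numeric = (ssr(b + eps) - ssr(b - eps)) / (2 * eps)
print(d_ssr_d_intercept(b), numeric)  # the two values should agree closely
```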
Gradient Descent for One Parameter, Step-by-Step
d Sum of Squared Residuals / d intercept
    = -2(1.4 - (Intercept + 0.64 x 0.5))
    + -2(1.9 - (Intercept + 0.64 x 2.3))
    + -2(3.2 - (Intercept + 0.64 x 2.9))

Plugging in the current value for the Intercept, 0:

d Sum of Squared Residuals / d intercept
    = -2(1.4 - (0 + 0.64 x 0.5))
    + -2(1.9 - (0 + 0.64 x 2.3))
    + -2(3.2 - (0 + 0.64 x 2.9))
    = -5.7
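Evaluating the three terms with Intercept = 0 can be sketched as:

```python
slope = 0.64
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (Weight, Height) pairs

# Each term is -2 x (Height - (Intercept + 0.64 x Weight))
intercept = 0.0
terms = [-2 * (h - (intercept + slope * w)) for w, h in data]
derivative = sum(terms)  # about -5.7
print(terms, derivative)
```

With a Learning Rate of 0.1, this derivative of about -5.7 gives a Step Size of about -0.57.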
New Intercept = Old Intercept - Step Size = 0 - (-0.57) = 0.57

The Old Intercept is the value used to determine the current slope. In this case, it is 0. The new value for the Intercept, 0.57, moves the line up a little bit.
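Repeating this update, take the slope, multiply by the Learning Rate, subtract, can be sketched as a loop. This is a minimal sketch, assuming the 0.1 Learning Rate and the three (Weight, Height) points used throughout the example:

```python
slope_fixed = 0.64
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]
learning_rate = 0.1

def d_ssr_d_intercept(intercept):
    # Slope of the SSR curve at the current intercept
    return sum(-2 * (h - (intercept + slope_fixed * w)) for w, h in data)

intercept = 0.0
step_sizes = []
for _ in range(10):
    step_size = d_ssr_d_intercept(intercept) * learning_rate
    step_sizes.append(step_size)
    intercept -= step_size   # New Intercept = Old Intercept - Step Size
print([round(s, 2) for s in step_sizes], round(intercept, 3))
```

The first three Step Sizes come out close to -0.57, -0.23, and -0.09, shrinking each time, matching the steps worked out above, and the intercept settles near the minimum of the SSR curve.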
Step 4.2: Calculate the Step Size.

Step Size = Slope x Learning Rate = -2.3 x 0.1 = -0.23

NOTE: The Step Size is smaller than before because the slope is not as steep as before. This means we are getting closer to the minimum value.
Step 4.3: Calculate the Step Size.

Step Size = Slope x Learning Rate = -0.9 x 0.1 = -0.09

NOTE: The Step Size is smaller than before because the slope is not as steep as before. This means we are getting closer to the minimum value.
NOTES:

d SSR / d intercept = 2 x Inside x -1 = -2 (Height - (intercept + slope x Weight))

d SSR / d slope = 2 x Inside x -Weight
                = 2 x (Height - (intercept + slope x Weight)) x -Weight
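Together, these two derivatives form the Gradient. A minimal sketch of both partial derivatives, assuming the same three (Weight, Height) points:

```python
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (Weight, Height) pairs

def gradient(intercept, slope):
    # d SSR / d intercept: sum of -2 x (Height - (intercept + slope x Weight))
    d_intercept = sum(-2 * (h - (intercept + slope * w)) for w, h in data)
    # d SSR / d slope: sum of -2 x Weight x (Height - (intercept + slope x Weight))
    d_slope = sum(-2 * w * (h - (intercept + slope * w)) for w, h in data)
    return d_intercept, d_slope

g = gradient(0.0, 1.0)
print(g)  # evaluated at Intercept = 0, Slope = 1
```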
d SSR / d intercept = -2(1.4 - (Intercept + Slope x 0.5))
                    + -2(1.9 - (Intercept + Slope x 2.3))
                    + -2(3.2 - (Intercept + Slope x 2.9))
d SSR / d intercept = -2(1.4 - (0 + 1 x 0.5))
                    + -2(1.9 - (0 + 1 x 2.3))
                    + -2(3.2 - (0 + 1 x 2.9)) = -1.6

d SSR / d slope = -2 x 0.5(1.4 - (0 + 1 x 0.5))
                + -2 x 2.3(1.9 - (0 + 1 x 2.3))
                + -2 x 2.9(3.2 - (0 + 1 x 2.9)) = -0.8
Step 4: Calculate the Step Sizes.

Step Size_Intercept = Derivative x Learning Rate = -1.6 x 0.01 = -0.016
Step Size_Slope = Derivative x Learning Rate = -0.8 x 0.01 = -0.008

New Intercept = Old Intercept - Step Size_Intercept
New Slope = Old Slope - Step Size_Slope
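One full update for both parameters, starting from Intercept = 0 and Slope = 1 with the 0.01 Learning Rate, can be sketched as:

```python
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (Weight, Height) pairs
learning_rate = 0.01
intercept, slope = 0.0, 1.0   # starting values from the example

# Plug the parameter values into the derivatives (the Gradient).
d_intercept = sum(-2 * (h - (intercept + slope * w)) for w, h in data)      # -1.6
d_slope = sum(-2 * w * (h - (intercept + slope * w)) for w, h in data)      # -0.8

# Calculate the Step Sizes.
step_intercept = d_intercept * learning_rate   # -0.016
step_slope = d_slope * learning_rate           # -0.008

# New value = Old value - Step Size
intercept -= step_intercept   # 0 - (-0.016) = 0.016
slope -= step_slope           # 1 - (-0.008) = 1.008
print(intercept, slope)
```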
The Sum of the Squared Residuals is just one type of Loss Function. However, there are tons of other Loss Functions that work with other types of data. Regardless of which Loss Function you use, Gradient Descent works the same way.
In Summary

Step 1: Take the derivative of the Loss Function for each parameter in it. In fancy Machine Learning Lingo, take the Gradient of the Loss Function.

Step 2: Pick random values for the parameters.

Step 3: Plug the parameter values into the derivatives (ahem, the Gradient).
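The summary steps can be sketched as one loop, using the SSR Loss Function and the data from the worked example; the stopping rule (quit when the Step Sizes become tiny) is an assumption about how long to keep iterating, not something fixed by the steps themselves:

```python
data = [(0.5, 1.4), (2.3, 1.9), (2.9, 3.2)]  # (Weight, Height) pairs
learning_rate = 0.01

# Step 1: the derivatives (the Gradient) of the SSR Loss Function.
def gradient(intercept, slope):
    d_i = sum(-2 * (h - (intercept + slope * w)) for w, h in data)
    d_s = sum(-2 * w * (h - (intercept + slope * w)) for w, h in data)
    return d_i, d_s

# Step 2: start from initial parameter values.
intercept, slope = 0.0, 1.0

for _ in range(10000):
    # Step 3: plug the current parameter values into the Gradient.
    d_i, d_s = gradient(intercept, slope)
    # Step 4: calculate the Step Sizes.
    step_i, step_s = d_i * learning_rate, d_s * learning_rate
    # New value = Old value - Step Size.
    intercept, slope = intercept - step_i, slope - step_s
    if max(abs(step_i), abs(step_s)) < 1e-6:  # Step Sizes are tiny: stop
        break

print(round(intercept, 3), round(slope, 3))
```

For this data the loop settles near a slope of 0.64, the fixed slope used in the one-parameter example earlier in this guide.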
TRIPLE BAM!!!