Halûk Gümüşkaya
Professor of Computer Engineering
web: http://www.gumuskaya.com
e-mail: haluk@gumuskaya.com, halukgumuskaya@aydin.edu.tr
LinkedIn: https://tr.linkedin.com/in/halukgumuskaya
Facebook: https://www.facebook.com/2haluk.gumuskaya
Linear Regression
The "model" takes one or more features as input and returns one prediction (y') as output.
To simplify, consider a model that takes one feature and returns one prediction:
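A minimal sketch of such a one-feature model, assuming the standard linear form y' = b + w₁x₁ (a bias plus one weight):

```python
# Sketch of a one-feature linear model, assuming the standard
# form y' = b + w1 * x1; the parameter values below are illustrative.
def predict(b, w1, x1):
    """Return the prediction y' for a single feature value x1."""
    return b + w1 * x1

# Example: bias 1.0, weight 2.0, feature 3.0 -> prediction 7.0
print(predict(1.0, 2.0, 3.0))
```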
Weight Initialization
For convex problems, weights can start anywhere (say, all 0s).
Convex: think of a bowl shape.
Just one minimum.
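This can be sketched with a toy convex loss; the function f(w) = (w − 3)² and the starting points below are made up for illustration, but any start converges to the same minimum:

```python
# For a convex (bowl-shaped) loss such as f(w) = (w - 3)^2,
# gradient descent reaches the single minimum no matter where
# the weight starts. Loss and hyperparameters are illustrative.
def minimize(w, lr=0.1, steps=200):
    for _ in range(steps):
        grad = 2 * (w - 3)   # derivative of (w - 3)^2
        w -= lr * grad       # take a step against the gradient
    return w

# Very different starts, same minimum: both end up at w = 3.
print(round(minimize(0.0), 4), round(minimize(100.0), 4))
```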
https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent
A Gradient Step
To determine the next point along the loss function curve, the gradient descent algorithm adds some fraction of the gradient's magnitude to the starting point, as shown in the following figure:
For example:
If gradient magnitude = 2.5, and
Learning rate is 0.01,
Next point: 0.025 away from the previous point.
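The arithmetic in the example above is simply step size = learning rate × gradient magnitude:

```python
# The slide's example: step size = learning_rate * gradient_magnitude.
learning_rate = 0.01
gradient_magnitude = 2.5
step = learning_rate * gradient_magnitude
print(step)  # 0.025 away from the previous point
```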
https://developers.google.com/machine-learning/crash-course/fitter/graph#exercise-1
Haluk Gümüşkaya @ www.gumuskaya.com 23
As a result, steps progressively increase in size.
Overfitting
Overfitting happens when a model learns both the dependencies among the data and the random fluctuations (noise).
In other words, the model learns the existing data too well.
Complex models, which have many features or terms, are often prone to overfitting.
When applied to known data, such models usually yield a high 𝑅².
However, they often don't generalize well and have a significantly lower 𝑅² when used with new data.
Perfect fit: six points and a polynomial of degree 5 (or higher) yield 𝑅² = 1.
Each actual response equals its corresponding prediction.
In some situations, this might be exactly what you're looking for.
In many cases, however, this is an overfitted model: it is likely to behave poorly on unseen data, especially for inputs larger than 50.
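The perfect-fit case can be reproduced with a short sketch: a degree-5 polynomial has six coefficients, so it passes exactly through six points and scores 𝑅² = 1 on the data it was fit to. The sample points below are made up for illustration, not the ones from the figure:

```python
# Degree-5 polynomial through six points: an exact interpolation,
# so R^2 = 1 on the training data. Sample values are illustrative.
import numpy as np

x = np.array([5.0, 15.0, 25.0, 35.0, 45.0, 55.0])
y = np.array([15.0, 11.0, 2.0, 8.0, 25.0, 32.0])

coeffs = np.polyfit(x, y, deg=5)        # 6 points, 6 coefficients -> exact fit
y_pred = np.polyval(coeffs, x)

ss_res = np.sum((y - y_pred) ** 2)      # residual sum of squares (~ 0)
ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot
print(round(r2, 6))
```

On the training points 𝑅² is (numerically) 1, yet between and beyond them the degree-5 curve can swing wildly, which is exactly the overfitting risk described above.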