LINEAR MODELS
MACHINE LEARNING
FARES ALKHAWAJA 1
LINEAR AND AFFINE TRANSFORMATION
A linear model assumes that the outputs are a linear or affine transformation of the inputs.

$$y = \begin{bmatrix} y_1 \\ \vdots \\ y_m \end{bmatrix}, \qquad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}$$

Linear model: $y = Ax$, with
$$A = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}$$

Linear model (affine): $y = Ax + b$, often written with the augmented matrix
$$\begin{bmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ \vdots & \ddots & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{bmatrix}$$

Most 'linear models' are actually affine, as above.
LINEAR LEAST SQUARES
Regression model for a single output:
Assumes a linear function of the input: $\hat{y} = ax + b$
Minimizes the squared error w.r.t. the parameters $a$ and $b$: $\min_{a,b} \sum_i \big(y_i - (ax_i + b)\big)^2$
$a$ is the slope
$b$ is the intercept
DoF is 2 (the two parameters $a$ and $b$).
Our aim is to minimize the difference between the predictions and the observed outputs.
Image credit: Krishnavedala, own work, CC BY-SA 3.0.
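A minimal sketch of the 1-D least-squares fit in NumPy, using the standard closed-form slope/intercept formulas (the data here is hypothetical, generated around $y = 2x + 1$):

```python
import numpy as np

# Hypothetical data scattered around y = 2x + 1
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0 + 0.1 * rng.standard_normal(50)

# Closed-form least-squares estimates of slope a and intercept b
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()

print(a, b)  # close to 2 and 1
```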
GRADIENT DESCENT
Minimizing with gradients:
Closed-form solutions do not always exist.
Update rule: $w \leftarrow w - \eta \, \nabla_w L(w)$
$\eta$ is the step size (a small number).
We take small steps in the direction of the negative gradient.
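As a sketch of the update rule above on a toy problem of my choosing, minimizing $f(w) = (w-3)^2$ by repeated small steps against the gradient:

```python
# Minimize f(w) = (w - 3)^2; its gradient is 2*(w - 3).
def grad(w):
    return 2 * (w - 3)

w = 0.0      # initial guess
eta = 0.1    # step size (small number)
for _ in range(100):
    w = w - eta * grad(w)   # step in the direction of the negative gradient

print(w)  # approaches the minimizer w = 3
```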
GRADIENT DESCENT (MULTIDIMENSIONAL)
Direct solve: set the gradient to zero and get a closed-form solution (the normal equations, $w = (X^\top X)^{-1} X^\top y$).
GD: $w \leftarrow w - \eta \, \nabla_w L(w)$, useful when a direct solve is impractical.
Let's take an example.
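A sketch comparing the two routes on hypothetical multidimensional data (the name `w_true` and the problem sizes are my own choices); both should land on the same weights:

```python
import numpy as np

# Hypothetical data: y = X @ w_true plus a little noise
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.01 * rng.standard_normal(100)

# Direct solve: set the gradient of ||Xw - y||^2 to zero -> normal equations
w_direct = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient descent on the mean squared error
w = np.zeros(3)
eta = 0.1
for _ in range(2000):
    g = (2.0 / len(y)) * X.T @ (X @ w - y)   # gradient of the mean squared error
    w = w - eta * g

print(w_direct, w)  # both close to w_true
```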
STOCHASTIC GRADIENT DESCENT - EXAMPLE
Stochastic Gradient Descent (SGD): instead of the full-batch gradient, each update uses the gradient of one randomly chosen training sample (or a small mini-batch).
Momentum: accumulate an exponentially decaying sum of past gradients and step along it, which smooths the noisy SGD updates.
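A sketch of SGD with momentum on a hypothetical 1-D least-squares problem (data generated around $y = 3x + 0.5$; step size and momentum coefficient are illustrative choices):

```python
import numpy as np

# Hypothetical 1-D least-squares data
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, 200)
y = 3.0 * x + 0.5 + 0.05 * rng.standard_normal(200)

a, b = 0.0, 0.0         # parameters (slope, intercept)
va, vb = 0.0, 0.0       # momentum ("velocity") accumulators
eta, beta = 0.02, 0.9   # step size and momentum coefficient

for epoch in range(50):
    for i in rng.permutation(len(x)):      # one random sample per update
        err = (a * x[i] + b) - y[i]        # per-sample prediction error
        va = beta * va + 2 * err * x[i]    # decaying sum of past gradients
        vb = beta * vb + 2 * err
        a -= eta * va
        b -= eta * vb

print(a, b)  # close to 3.0 and 0.5
```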
SPECIAL CASE OF NEURAL NETWORKS
What is the difference between:
solving linear least squares by SGD, and
training a 1-1 linear neural network (NN) $x \to y$ with weight $a$ and bias $b$, i.e. $\hat{y} = ax + b$, using squared-error loss?
There is none: they are the same problem. A NN is the same idea in higher dimensions.
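The equivalence can be sketched directly: training the 1-1 "network" by per-sample gradient steps recovers the least-squares fit (hypothetical data around $y = 2x - 1$; the gradient lines are the backprop rules for the squared error):

```python
import numpy as np

# A 1-1 linear NN y_hat = a*x + b is exactly the least-squares model.
rng = np.random.default_rng(3)
x = rng.uniform(-1.0, 1.0, 100)
y = 2.0 * x - 1.0 + 0.05 * rng.standard_normal(100)

a, b = 0.0, 0.0   # the network's weight and bias
eta = 0.1
for epoch in range(200):
    for i in rng.permutation(len(x)):
        err = (a * x[i] + b) - y[i]   # forward pass + error
        a -= eta * 2 * err * x[i]     # backprop: d(err^2)/da
        b -= eta * 2 * err            # backprop: d(err^2)/db

print(a, b)  # close to 2.0 and -1.0
```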
LINEAR LEAST SQUARES (LS) SETUP AND LEARNING
Model: $\hat{y} = w^\top x$; the loss is the squared error.
Inputs ("features"): $x$
Parameters ("weights"): $w$
Output ("prediction"): $\hat{y} = w^\top x$
Training set: $\{(x_i, y_i)\}_{i=1}^{N}$
Training loss: $L(w) = \sum_{i=1}^{N} \big(y_i - w^\top x_i\big)^2$
Training: tune this very special kind of program to match that data. Set the gradient to zero and try to solve the resulting system of equations.
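The setup above can be sketched end to end in NumPy; the data is hypothetical, a constant-1 column folds the intercept into the weights, and `lstsq` solves the system obtained by setting the gradient of the training loss to zero:

```python
import numpy as np

# Hypothetical training set: N samples, 2 features, plus a constant-1 column
# so the intercept is learned as just another weight.
rng = np.random.default_rng(4)
N = 50
X = np.column_stack([rng.standard_normal((N, 2)), np.ones(N)])
w_true = np.array([1.5, -0.5, 2.0])
y = X @ w_true + 0.01 * rng.standard_normal(N)

# Training: minimize L(w) = sum_i (y_i - w^T x_i)^2.
# Zero gradient gives a linear system; np.linalg.lstsq solves it.
w, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)

loss = np.sum((y - X @ w) ** 2)   # training loss at the optimum
print(w, loss)
```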
SCIKIT-LEARN LIBRARY
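In scikit-learn the same affine model is one class; a minimal sketch on hypothetical data (feature count and target coefficients are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data: two features, an affine target with a little noise
rng = np.random.default_rng(5)
X = rng.standard_normal((100, 2))
y = X @ np.array([2.0, -1.0]) + 3.0 + 0.01 * rng.standard_normal(100)

model = LinearRegression()   # fits an affine model y = Xw + b
model.fit(X, y)

print(model.coef_)        # close to [2.0, -1.0]
print(model.intercept_)   # close to 3.0
print(model.predict(X[:3]))
```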