Goal: Learn a model to predict the output for new test inputs.

Let's write the total error or "loss" of this model over the training data as

$$L(\boldsymbol{w}) = \sum_{n=1}^{N} (y_n - \boldsymbol{w}^\top \boldsymbol{x}_n)^2$$

Each term $(y_n - \boldsymbol{w}^\top \boldsymbol{x}_n)^2$ measures the prediction error or "loss" or "deviation" of the model on a single training input.

The goal of learning is to find the $\boldsymbol{w}$ that minimizes this loss and also does well on test data. Unlike models like KNN and DT, here we have an explicit problem-specific objective (loss function) that we wish to optimize.
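As a concrete illustration (not from the original slides), here is a minimal NumPy sketch of this loss; the toy data and the function name `squared_loss` are invented for the example.

```python
import numpy as np

def squared_loss(w, X, y):
    """Total squared error L(w) = sum_n (y_n - w^T x_n)^2."""
    residuals = y - X @ w
    return np.sum(residuals ** 2)

# Toy data (made up): N=4 examples, D=2 features
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.0], [0.5, 4.0]])
y = np.array([3.0, 2.5, 4.0, 4.5])
print(squared_loss(np.zeros(2), X, y))  # loss of the all-zero weight vector
```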
Linear Regression: Pictorially
Linear regression is like fitting a line or (hyper)plane to a set of points.

[Figure: scatter of inputs vs. output with a fitted line/plane; a second panel shows the same data in a transformed feature space $z_1 = \phi(x),\ z_2 = \ldots$]

What if a line or plane fits the data poorly? Can we fit a curve or curved surface instead? Yes: apply a nonlinear feature transformation $\boldsymbol{z} = \phi(\boldsymbol{x})$ and fit a linear model in the transformed space; this corresponds to a curve or curved surface in the original input space.
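To make this concrete, here is a small sketch (data and coefficients invented) where a quadratic feature map lets plain linear regression fit a parabola:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(scale=0.3, size=x.shape)

Z = np.stack([np.ones_like(x), x, x**2], axis=1)  # feature map phi(x) = [1, x, x^2]
w = np.linalg.lstsq(Z, y, rcond=None)[0]          # ordinary least squares in z-space
print(w)  # roughly [1.0, -2.0, 0.5]: a parabola, fit by a linear model in z
```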
Let us find the $\boldsymbol{w}$ that optimizes (minimizes) the above squared loss. This is the "least squares" (LS) problem (Gauss-Legendre, 18th century). We need calculus and optimization to do this!

The LS problem can be solved easily and has a closed-form solution:

$$\boldsymbol{w} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}$$

Note: closed-form solutions to ML problems are rare. Also, this solution requires a matrix inversion, which can be expensive. There are ways to handle this; we will see them later.
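Here is a minimal NumPy sketch of the closed-form solution on made-up data; in practice one solves the normal equations rather than forming an explicit inverse, precisely because inversion is expensive and numerically fragile.

```python
import numpy as np

# Toy data (made up): N=100 examples, D=3 features
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true + rng.normal(scale=0.1, size=100)

# Closed-form solution w = (X^T X)^{-1} X^T y
w_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# Preferred in practice: solve the normal equations without an explicit inverse
w_hat2 = np.linalg.solve(X.T @ X, X.T @ y)
print(w_hat, w_hat2)  # both close to w_true
```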
Proof: A bit of calculus/optimization (more on this later).

We wanted to find the minima of

$$L(\boldsymbol{w}) = \sum_{n=1}^{N} (y_n - \boldsymbol{w}^\top \boldsymbol{x}_n)^2$$

Let us apply a basic rule of calculus: take the first derivative of $L(\boldsymbol{w})$ and set it to zero. Using the chain rule, and the fact that the partial derivative of the dot product $\boldsymbol{w}^\top \boldsymbol{x}_n$ w.r.t. $\boldsymbol{w}$ is $\boldsymbol{x}_n$ (so the result of this derivative is a vector of the same size as $\boldsymbol{w}$), we get

$$\sum_{n=1}^{N} 2\,\boldsymbol{x}_n (y_n - \boldsymbol{x}_n^\top \boldsymbol{w}) = 0$$

To separate $\boldsymbol{w}$ and get a solution, we write the above as

$$\sum_{n=1}^{N} y_n \boldsymbol{x}_n - \sum_{n=1}^{N} \boldsymbol{x}_n \boldsymbol{x}_n^\top \boldsymbol{w} = 0
\quad\Longrightarrow\quad
\Big(\sum_{n=1}^{N} \boldsymbol{x}_n \boldsymbol{x}_n^\top\Big)\boldsymbol{w} = \sum_{n=1}^{N} y_n \boldsymbol{x}_n$$

which in matrix form gives

$$\boldsymbol{w} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}$$
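As a numerical sanity check (not part of the slides), one can verify that the closed-form solution makes this gradient vanish; the data below is invented.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))           # made-up data: N=50, D=4
y = rng.normal(size=50)

w = np.linalg.solve(X.T @ X, X.T @ y)  # closed-form LS solution

# Gradient of L(w) = sum_n (y_n - x_n^T w)^2 is -2 * X^T (y - X w)
grad = -2 * X.T @ (y - X @ w)
print(np.allclose(grad, 0, atol=1e-8))  # True: the gradient vanishes at the optimum
```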
Problem(s) with the Solution!

We minimized the objective w.r.t. $\boldsymbol{w}$ and got

$$\boldsymbol{w} = (\boldsymbol{X}^\top \boldsymbol{X})^{-1} \boldsymbol{X}^\top \boldsymbol{y}$$

One problem: $\boldsymbol{X}^\top \boldsymbol{X}$ may not be invertible, and the solution may overfit the training data. Adding a regularization term $\lambda\, \boldsymbol{w}^\top \boldsymbol{w}$ to the objective (with hyperparameter $\lambda > 0$) addresses both issues and yields

$$\boldsymbol{w} = (\boldsymbol{X}^\top \boldsymbol{X} + \lambda \boldsymbol{I}_D)^{-1} \boldsymbol{X}^\top \boldsymbol{y}$$
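A minimal sketch of the regularized solution (the function name `ridge_fit` and the data are made up for illustration):

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Regularized LS: w = (X^T X + lam * I_D)^{-1} X^T y."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)

# X^T X is singular here (the two features are identical), so plain LS
# would fail, but the regularized system is invertible for any lam > 0.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(ridge_fit(X, y, lam=0.1))
```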
A closer look at regularization

Remember: in general, weights with large magnitude are bad since they can cause overfitting on training data and may not work well on test data.

The regularized objective we minimized is

$$L_{reg}(\boldsymbol{w}) = \sum_{n=1}^{N} (y_n - \boldsymbol{w}^\top \boldsymbol{x}_n)^2 + \lambda\, \boldsymbol{w}^\top \boldsymbol{w}$$
We can also use non-regularization-based approaches. All of these are very popular ways to control overfitting in deep learning models (more on these later when we talk about deep learning); a minimal sketch of early stopping follows this list.

- Early stopping: stop training just when we have a decent validation-set accuracy.
- Dropout: in each iteration, don't update some of the weights.
- Injecting noise in the inputs.
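A minimal early-stopping sketch, shown here on the linear model via gradient descent rather than a deep network; all data and hyperparameters are made up.

```python
import numpy as np

# Gradient descent on the squared loss, stopping when validation loss stalls.
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.5, size=200)
X_tr, y_tr, X_val, y_val = X[:150], y[:150], X[150:], y[150:]

w = np.zeros(5)
lr, patience = 1e-3, 10
best_val, bad_steps = np.inf, 0
for step in range(10_000):
    w -= lr * (-2 * X_tr.T @ (y_tr - X_tr @ w))  # gradient step on training loss
    val_loss = np.sum((y_val - X_val @ w) ** 2)
    if val_loss < best_val - 1e-8:
        best_val, bad_steps = val_loss, 0
    else:
        bad_steps += 1
        if bad_steps >= patience:                # stop once validation stalls
            break
print(step, best_val)
```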
Linear Regression as Solving System of Linear Eqs
The form of the linear regression model is akin to a system of linear equations. Assuming $N$ training examples with $D$ features each, we have

First training example: $y_1 = x_{11} w_1 + x_{12} w_2 + \ldots + x_{1D} w_D$

Second training example: $y_2 = x_{21} w_1 + x_{22} w_2 + \ldots + x_{2D} w_D$

$\vdots$

$N$-th training example: $y_N = x_{N1} w_1 + x_{N2} w_2 + \ldots + x_{ND} w_D$

Note: here $x_{nd}$ denotes the $d$-th feature of the $n$-th training example. We have $N$ equations and $D$ unknowns ($w_1, \ldots, w_D$).
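A short sketch of this view (numbers invented): with $N > D$ the system is over-determined and generally has no exact solution, and the least-squares answer is exactly the closed-form solution from before.

```python
import numpy as np

# An over-determined system: N=6 equations, D=2 unknowns
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0],
              [6.0, 5.0]])
y = np.array([5.2, 3.9, 11.1, 9.8, 17.3, 15.9])  # no exact solution exists

# lstsq returns the least-squares w, i.e., the same answer as (X^T X)^{-1} X^T y
w, residual, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(w, residual)
```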