
Regularization

Prof Rohan Pillai


EE, DTU
Best Fit Criterion: Least Squares
[Figure: scatter of data points (x_i, y_i) with a fitted line f(x); the vertical gaps between the points and the line are the residuals]

e_i = y_i - f(x_i)   (residual: error in prediction)
    y_i : actual response
    f(x_i) : predicted response
    i = 1, 2, ..., n

Optimization goal: minimize the residuals e_i.

• Mean Squared Error: Least Squares Objective

MSE(W) = L(W) = (1/n) Σ_{i=1}^{n} e_i²

For W = [w0 w1]^T and f(x_i; W) = w0 + w1 x_i,

L(W) = (1/n) Σ_{i=1}^{n} (y_i - (w0 + w1 x_i))²

Optimize the above for w0 and w1 using suitable techniques:

(i) OLS: ∇L(W) = 0 gives w* = (X^T X)^{-1} X^T y
(ii) Gradient Descent: iterate until ∇L(W) → 0
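A minimal NumPy sketch of both techniques; the synthetic data and learning-rate value are illustrative, not from the notes:

import numpy as np

# Synthetic 1-D regression data (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=50)

# Design matrix with a column of ones for the intercept w0
X = np.column_stack([np.ones_like(x), x])

# (i) OLS: solve grad L(W) = 0  ->  w* = (X^T X)^{-1} X^T y
w_ols = np.linalg.solve(X.T @ X, X.T @ y)

# (ii) Gradient descent on L(W) = (1/n) * sum (y_i - (w0 + w1 x_i))^2
w_gd = np.zeros(2)
lr, n = 0.01, len(y)
for _ in range(5000):
    grad = -(2.0 / n) * X.T @ (y - X @ w_gd)   # gradient of the MSE
    w_gd -= lr * grad

print(w_ols, w_gd)   # both should land near the true [2, 3]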
Issue:

We optimize the weight parameters only on the training dataset. Machine learning models should generalize well on unseen data points.

"
"
??
• over
fitting
optimal Mqdoelpnfompeeaity

¥1
n

Pt
Optimal stopping
.

l
n

:÷÷ :*
i
i

" !
It
Under

fitting i
:
Training Error
i
#
Model
complexity ( OR)
YXCOR)
#epgfhs?
" "
solution :→
Regularization
"
"
It

helps overcoming over
fitting training dataset .

" "


keeps the
weight parameters regular ( ie keep .
w small
by
constraints )
Why?

• With large weights, a small change in the input features makes a large difference in the target variable.
• Weights are zero for features that are not very important.

Regularization Techniques

(i) Early stopping  (ii) Ridge Regularization  (iii) LASSO Regularization

(i) Early Stopping

[Figure: training error vs. # of epochs — stop training at the optimal stopping point, after the under-fitting region but before over-fitting sets in]
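A minimal sketch of early stopping on a held-out split (the data, split sizes, learning rate, and patience value are illustrative):

import numpy as np

def mse(w, X, y):
    return np.mean((y - X @ w) ** 2)

# Illustrative synthetic data and train/validation split
rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=80)
y = 2.0 + 3.0 * x + rng.normal(scale=2.0, size=80)
X = np.column_stack([np.ones_like(x), x])
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

w = np.zeros(2)
best_val, best_w, patience = np.inf, w.copy(), 0
for epoch in range(5000):
    grad = -(2.0 / len(y_tr)) * X_tr.T @ (y_tr - X_tr @ w)
    w -= 0.01 * grad                       # one epoch of gradient descent on the training set
    val_err = mse(w, X_val, y_val)         # monitor error on the held-out data
    if val_err < best_val:
        best_val, best_w, patience = val_err, w.copy(), 0
    else:
        patience += 1
        if patience >= 20:                 # validation error stopped improving
            break

w = best_w                                 # keep the weights from the optimal stopping point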
(ii) Ridge Regularization (OR) L2 Regularization

min_W MSE(W)   subject to   ||W||₂² ≤ c,   for some c > 0

= min_{w0, w1} (1/n) Σ_{i=1}^{n} (y_i - (w0 + w1 x_i))²   subject to   w0² + w1² ≤ c

With λ (> 0) as the penalty hyper-parameter controlling the trade-off between the loss and the constraint, the Lagrangian is

F(λ, w0, w1) = (1/n) Σ_{i=1}^{n} (y_i - (w0 + w1 x_i))² + λ (w0² + w1² - c)

             = (1/n) Σ_{i=1}^{n} (y_i - (w0 + w1 x_i))² + λ ||W||₂² - λc

Dropping the constant λc,

W_Ridge = argmin_W { (1/n) Σ_{i=1}^{n} (y_i - (w0 + w1 x_i))² + λ ||W||₂² }

• W_Ridge = (X^T X + λI)^{-1} X^T y
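A minimal NumPy sketch of the closed-form ridge solution (the data and the λ value are illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=1.0, size=50)
X = np.column_stack([np.ones_like(x), x])       # columns: intercept, x

lam = 0.1                                       # penalty hyper-parameter λ (illustrative value)
I = np.eye(X.shape[1])

# W_Ridge = (X^T X + λI)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
w_ols = np.linalg.solve(X.T @ X, X.T @ y)       # λ = 0 recovers the least-squares solution

print(w_ols, w_ridge)                           # ridge weights are shrunk slightly towards zero

(In practice the intercept w0 is often left unpenalized; it is penalized here because the notes' constraint w0² + w1² ≤ c includes it.)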

[Figure: contours of the loss function centred at the least-squares solution w* = [w1* w2*]^T, together with the circular constraint region w1² + w2² ≤ c; W_Ridge lies where the contour meets the circle — a compromise between the 1st component (minimize the loss function) and the 2nd component (minimize the penalty term)]
" "

shrinks

Ridge Regression the
weight parameters by imposing a
penalty
their
on
size
-

X controls the amount of shrinkage of X



.

Larger the value


, greater
of
the amount shrinkage and the
weight parameters are shrunk
towards
zero
( Under fitting )

• λ = 0: the least-squares linear regression solution, i.e. w = w* = [w1* w2*]^T
• λ = ∞: the solution is W = [0 0]^T

" "
If X is at perfect value
regularization helps to avoid under
• a
, fitting .

• If there are irrelevant features in the input (i.e. features that do not affect the output), L2 regularization will give them small, but non-zero, weights.
(iii) LASSO Regularization (OR) L1 Regularization

min_W MSE(W)   subject to   ||W||₁ ≤ c,   for some c > 0

min_W MSE(W) + λ ||W||₁

W_LASSO = argmin_W { (1/n) Σ_{i=1}^{n} (y_i - (w0 + w1 x_i))² + λ ||W||₁ }

[Figure: contours of the loss function centred at w* = [w1* w2*]^T, together with the diamond-shaped constraint region |w1| + |w2| ≤ c; W_LASSO lies where the contour meets the diamond — a compromise between the 1st component (minimize the loss function) and the 2nd component (minimize the penalty term)]

• If λ is big enough, the ellipse is very likely to intersect the diamond at one of its corners.

• If there are irrelevant input features, LASSO is likely to make their weights zero. Hence LASSO is biased towards providing sparse solutions in general.

• L1 optimization is computationally more expensive than L2 optimization.
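A small sketch of the sparsity contrast using scikit-learn (the data, with deliberately irrelevant features, and the alpha values are illustrative):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=200)   # features 2-4 are irrelevant

ridge = Ridge(alpha=1.0).fit(X, y)   # alpha plays the role of λ
lasso = Lasso(alpha=0.1).fit(X, y)

print(ridge.coef_)   # small but non-zero weights on the irrelevant features
print(lasso.coef_)   # weights on the irrelevant features driven to (or very near) zero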
