[Figure: a fitted regression line showing the actual response y_i, the predicted response f(x_i), and the residual between them]

e_i = y_i - f(x_i)   (residual: error in prediction),   for i = 1, 2, ..., n
Optimization goal: minimize the errors e_i over all n training points.
• Mean Squared Error (Least squares objective):

  MSE(W) = L(W) = (1/n) \sum_{i=1}^{n} e_i^2

  For W = [w_0, w_1]^T and f(x_i; W) = w_0 + w_1 x_i:

  L(W) = (1/n) \sum_i (y_i - (w_0 + w_1 x_i))^2
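A tiny NumPy sketch of this objective (illustrative only; the toy data x, y and the candidate weights w0, w1 below are made-up values, not from the notes):

    import numpy as np

    # made-up toy data
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])

    w0, w1 = 0.0, 2.0                 # candidate weights W = [w0, w1]^T
    residuals = y - (w0 + w1 * x)     # e_i = y_i - f(x_i; W)
    mse = np.mean(residuals ** 2)     # L(W) = (1/n) * sum_i e_i^2
    print(mse)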
(i) OLS: set \nabla_W L(W) = 0, which gives w^* = (X^T X)^{-1} X^T y
(ii) Gradient descent: iteratively update W so that \nabla_W L(W) \to 0
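A minimal NumPy sketch of both approaches for the simple model f(x_i; W) = w_0 + w_1 x_i (illustrative; the toy data, learning rate, and iteration count are assumptions):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.1, 3.9, 6.2, 8.1])
    X = np.column_stack([np.ones_like(x), x])     # design matrix with a bias column

    # (i) OLS closed form: w* = (X^T X)^{-1} X^T y
    w_ols = np.linalg.solve(X.T @ X, X.T @ y)

    # (ii) Gradient descent: step against grad L(W) until it is ~0
    w = np.zeros(2)
    lr = 0.01
    for _ in range(5000):
        grad = (2 / len(y)) * X.T @ (X @ w - y)   # gradient of the MSE loss
        w -= lr * grad

    print(w_ols, w)   # both should land on (nearly) the same solution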
Issue → we optimize the weight parameters only on the training dataset, so the fit may not generalize to unseen data.
"
"
??
• over
fitting
optimal Mqdoelpnfompeeaity
¥1
n
Pt
Optimal stopping
.
l
n
:÷÷ :*
i
i
" !
It
Under
←
fitting i
:
Training Error
i
#
Model
complexity ( OR)
YXCOR)
#epgfhs?
" "
Solution → Regularization

• Helps overcome overfitting to the training dataset.
• Keeps the weight parameters "regular" (i.e., keeps w small via constraints).
Why?
• Otherwise, a small change in the input features can make a large difference in the target variable.
• Weights are driven to zero for features that are not very important.
Regularization Techniques

(i) Early stopping
[Figure: training and test error vs. # of epochs; training is stopped at the optimal stopping point, where the test error is lowest, before the model starts overfitting]
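A hedged sketch of early stopping for gradient descent (the train/validation split, patience rule, learning rate, and toy data are all assumptions, not from the notes; a plain linear model is used just to keep the loop short):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 5, 60)
    y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 60)     # noisy linear data (made up)
    X = np.column_stack([np.ones_like(x), x])
    X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]

    w = np.zeros(2)
    lr, patience = 0.01, 20
    best_val, best_w, bad_epochs = np.inf, w.copy(), 0

    for epoch in range(5000):
        grad = (2 / len(y_tr)) * X_tr.T @ (X_tr @ w - y_tr)
        w -= lr * grad
        val_err = np.mean((y_va - X_va @ w) ** 2)  # monitor error on held-out data
        if val_err < best_val:
            best_val, best_w, bad_epochs = val_err, w.copy(), 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:             # stop once validation error has
                break                              # not improved for `patience` epochs

    print(epoch, best_val, best_w)                 # weights at the optimal stopping point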
(ii) Ridge Regularization (L2 Regularization)

  min_W MSE(W)   subject to \|W\|_2^2 \le c^2, for some c > 0

  = min_W (1/n) \sum_i (y_i - (w_0 + w_1 x_i))^2   subject to w_0^2 + w_1^2 \le c^2
→ Equivalently, with a penalty hyperparameter \lambda > 0, trade off between fitting the data and keeping the weights small:

  L(W) = (1/n) \sum_i (y_i - (w_0 + w_1 x_i))^2 + \lambda (w_0^2 + w_1^2)

  W_Ridge = argmin_W { (1/n) \sum_i (y_i - (w_0 + w_1 x_i))^2 + \lambda \|W\|_2^2 }

  W_Ridge = (X^T X + \lambda I)^{-1} X^T y
(Penalty term: \lambda \sum_i w_i^2)
" "
shrinks
•
Ridge Regression the
weight parameters by imposing a
penalty
their
on
size
-
• \lambda = 0: recovers the least-squares linear regression solution, i.e. W = w^* = [w_1^*, w_2^*]^T
• \lambda = \infty: the solution is W = [0, 0]^T
• If \lambda is set to a suitable value, regularization helps avoid underfitting (i.e., all-zero weights).
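A small NumPy sketch of the closed form W_Ridge = (X^T X + \lambda I)^{-1} X^T y and of the two limiting cases above (the toy data and the \lambda grid are assumptions; like the notes, this penalizes the bias w_0 as well, although in practice the intercept is often left unpenalized):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 5, 30)
    y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 30)     # made-up data
    X = np.column_stack([np.ones_like(x), x])

    def ridge_fit(X, y, lam):
        # W_Ridge = (X^T X + lam * I)^{-1} X^T y
        return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

    for lam in [0.0, 0.1, 1.0, 10.0, 1e6]:
        print(lam, ridge_fit(X, y, lam))
    # lam = 0 recovers the least-squares solution; very large lam drives W toward [0, 0]^T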
(iii) LASSO Regularization (L1 Regularization)

  min_W MSE(W)   subject to \|W\|_1 \le c, for some c > 0

  W_LASSO = argmin_W { (1/n) \sum_i (y_i - (w_0 + w_1 x_i))^2 + \lambda \|W\|_1 }
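Unlike ridge, the L1 problem has no closed-form solution; one standard iterative solver is proximal gradient descent (ISTA) with soft-thresholding. A hedged sketch (the solver choice, step size, and made-up data are assumptions, not from the notes):

    import numpy as np

    def soft_threshold(v, t):
        # proximal operator of t*||.||_1: shrink each entry toward 0 by t
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def lasso_ista(X, y, lam, n_iter=5000):
        n = len(y)
        w = np.zeros(X.shape[1])
        step = n / (2 * np.linalg.norm(X, 2) ** 2)     # safe step for the smooth MSE part
        for _ in range(n_iter):
            grad = (2 / n) * X.T @ (X @ w - y)         # gradient of the MSE term
            w = soft_threshold(w - step * grad, step * lam)
        return w

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])      # sparse ground truth (made up)
    y = X @ w_true + rng.normal(0, 0.1, 50)
    print(lasso_ista(X, y, lam=0.1))                   # unimportant features end up at ~0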
• Compromise between the penalty and the loss function:
  - 1st component: minimize the loss function
  - 2nd component: minimize the penalty

[Figure: contours of the loss function (ellipses centred at the least-squares solution w^* = [w_1^*, w_2^*]^T) together with the L1 constraint region |w_1| + |w_2| \le c (a diamond) in the (w_1, w_2) plane]
• If \lambda is big enough, the ellipse is very likely to intersect the constraint region at a corner, pushing weights towards zero. Hence LASSO is biased towards providing sparse solutions in general.
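A quick way to see the sparsity claim in practice (assuming scikit-learn is available; the data and alpha values are made up): L1 tends to set unimportant weights exactly to zero, while L2 only shrinks them.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 8))
    w_true = np.array([3.0, 0, 0, -2.0, 0, 0, 0, 1.5])   # sparse ground truth (made up)
    y = X @ w_true + rng.normal(0, 0.1, 100)

    print(Lasso(alpha=0.1).fit(X, y).coef_)   # several coefficients are exactly 0
    print(Ridge(alpha=0.1).fit(X, y).coef_)   # coefficients are shrunk but not exactly 0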
• L1 optimization is computationally more expensive than L2 (unlike ridge, there is no closed-form solution).