
Quality of Fit

[Figure: three fits of Price vs. Size — underfitting (high bias), correct fit, and overfitting (high variance)]

Overfitting:
•  The learned hypothesis may fit the training set very well ($J(\theta) \approx 0$)
•  ...but fails to generalize to new examples

Based on example by Andrew Ng 53
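A minimal sketch of this failure mode on a made-up Price-vs-Size data set (not from the slides): a degree-7 polynomial passed through a handful of training points drives the training error toward 0, while the error on fresh examples typically stays much larger.

```python
# Hypothetical illustration: overfitting a tiny training set with a
# high-degree polynomial. Names and data are assumptions, not the slides'.
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = np.sort(rng.uniform(0.0, 3.0, n))      # house "size"
    y = 2.0 * x + rng.normal(0.0, 0.3, n)      # noisy "price", truly linear
    return x, y

x_train, y_train = make_data(8)
x_test, y_test = make_data(100)

# Degree-7 polynomial through 8 points: it can interpolate the training data
coeffs = np.polyfit(x_train, y_train, deg=7)
train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
print(f"train MSE ~ {train_mse:.4f}   test MSE ~ {test_mse:.4f}")  # train ~ 0, test >> train
```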


Regularization
•  A method for automatically controlling the complexity of the learned hypothesis
•  Idea: penalize large values of $\theta_j$
  –  Can incorporate into the cost function
  –  Works well when we have many features, each of which contributes a bit to predicting the label

•  Can also address overfitting by eliminating features (either manually or via model selection)

54
Regularization
•  Linear regression objective function

$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{d}\theta_j^2$$

  (first term: model fit to data; second term: regularization)

–  $\lambda$ is the regularization parameter ($\lambda \ge 0$)
–  No regularization on $\theta_0$!

55
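A small sketch of the objective as written above, under the assumption that each row of `X` starts with a 1 for the intercept, so `theta[0]` plays the role of $\theta_0$ and is not penalized (names are illustrative, not from the slides).

```python
# Sketch of the regularized linear regression cost J(theta).
import numpy as np

def regularized_cost(theta, X, y, lam):
    """(1/2n) * sum_i (h_theta(x^(i)) - y^(i))^2  +  (lambda/2) * sum_{j>=1} theta_j^2"""
    n = len(y)
    residuals = X @ theta - y                      # h_theta(x^(i)) - y^(i)
    fit_term = (residuals @ residuals) / (2 * n)   # model fit to data
    reg_term = (lam / 2) * np.sum(theta[1:] ** 2)  # regularization; theta_0 is skipped
    return fit_term + reg_term
```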
Understanding Regularization

$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{d}\theta_j^2$$

•  Note that $\sum_{j=1}^{d}\theta_j^2 = \|\theta_{1:d}\|_2^2$
  –  This is the squared magnitude of the feature coefficient vector!

•  We can also think of this as:

$$\sum_{j=1}^{d}(\theta_j - 0)^2 = \|\theta_{1:d} - \vec{0}\|_2^2$$

•  L2 regularization pulls coefficients toward 0
56
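A quick numerical check of the identity above, with made-up coefficient values:

```python
# Hypothetical theta = (theta_0, theta_1, ..., theta_d); the penalty equals
# the squared L2 distance of theta_{1:d} from the zero vector.
import numpy as np

theta = np.array([3.0, -1.5, 0.5, 2.0])
penalty = np.sum(theta[1:] ** 2)                  # sum_{j=1..d} theta_j^2
norm_sq = np.linalg.norm(theta[1:] - 0.0) ** 2    # ||theta_{1:d} - 0||_2^2
assert np.isclose(penalty, norm_sq)
```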
Understanding Regularization

$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{d}\theta_j^2$$

•  What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?

[Figure: a fitted Price vs. Size curve]

Based on example by Andrew Ng 57


Understanding Regularization

$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{d}\theta_j^2$$

•  What happens if we set $\lambda$ to be huge (e.g., $10^{10}$)?

[Figure: Price vs. Size — the penalty drives $\theta_1 \approx 0, \ldots, \theta_d \approx 0$, so the fit collapses to the flat line $h_\theta(x) \approx \theta_0$ and underfits]

Based on example by Andrew Ng 58
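A sketch of this effect using scikit-learn's `Ridge`, an L2-regularized linear regression whose `alpha` plays the role of $\lambda$ here (an assumption: its objective differs from the slides' by constant scaling, and the data are invented).

```python
# With a huge penalty, the learned slope is driven to ~0 and the prediction
# collapses to the intercept alone.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
size = rng.uniform(50, 250, size=(100, 1))            # house size (feature)
price = 3.0 * size[:, 0] + rng.normal(0, 20, 100)     # roughly linear price

small = Ridge(alpha=1e-3).fit(size, price)
huge = Ridge(alpha=1e10).fit(size, price)
print(small.coef_)   # close to the true slope (~3.0)
print(huge.coef_)    # ~0: the fit is essentially the flat intercept line
```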


Regularized Linear Regression
•  Cost Function

$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{d}\theta_j^2$$

•  Fit by solving $\min_{\theta} J(\theta)$

•  Gradient update:

$$\frac{\partial}{\partial \theta_0} J(\theta): \quad \theta_0 \leftarrow \theta_0 - \alpha\,\frac{1}{n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)$$

$$\frac{\partial}{\partial \theta_j} J(\theta): \quad \theta_j \leftarrow \theta_j - \alpha\,\frac{1}{n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)x_j^{(i)} \;-\; \alpha\lambda\theta_j$$

  (the final term $-\alpha\lambda\theta_j$ comes from the regularization)
59
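A minimal sketch of the two updates above combined into one vectorized step, assuming `X` has a leading column of 1s so that `theta[0]` is the unregularized intercept (variable names are illustrative):

```python
# One gradient-descent step for regularized linear regression.
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    n = len(y)
    residuals = X @ theta - y                  # h_theta(x^(i)) - y^(i)
    grad = (X.T @ residuals) / n               # data-fit part of the gradient
    grad[1:] += lam * theta[1:]                # add lambda * theta_j for j >= 1 only
    return theta - alpha * grad                # simultaneous update of all parameters
```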
Regularized Linear Regression

$$J(\theta) = \frac{1}{2n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)^2 + \frac{\lambda}{2}\sum_{j=1}^{d}\theta_j^2$$

$$\theta_0 \leftarrow \theta_0 - \alpha\,\frac{1}{n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)$$

$$\theta_j \leftarrow \theta_j - \alpha\,\frac{1}{n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)x_j^{(i)} - \alpha\lambda\theta_j$$

•  We can rewrite the gradient step as:

$$\theta_j \leftarrow \theta_j\,(1 - \alpha\lambda) - \alpha\,\frac{1}{n}\sum_{i=1}^{n}\left(h_\theta\!\left(x^{(i)}\right) - y^{(i)}\right)x_j^{(i)}$$

60
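A quick sanity check of that algebra with made-up numbers: the rewritten form first shrinks $\theta_j$ by the constant factor $(1 - \alpha\lambda)$ and then applies the ordinary unregularized gradient step.

```python
# Hypothetical values; g stands for (1/n) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i).
alpha, lam = 0.01, 1.0
theta_j, g = 2.5, 0.4
original = theta_j - alpha * g - alpha * lam * theta_j
rewritten = theta_j * (1 - alpha * lam) - alpha * g
assert abs(original - rewritten) < 1e-12   # here alpha*lam = 0.01: theta_j shrinks by 1% per step
```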
Regularized Linear Regression
•  To incorporate regularization into the closed form solution:

$$\theta = \left(X^\top X + \lambda \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}\right)^{-1} X^\top y$$

61
Regularized Linear Regression
•  To incorporate regularization into the closed form solution:

$$\theta = \left(X^\top X + \lambda \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}\right)^{-1} X^\top y$$

•  Can derive this the same way, by solving $\frac{\partial}{\partial \theta} J(\theta) = 0$
•  Can prove that for $\lambda > 0$, the inverse in the equation above exists
62
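A minimal sketch of that closed-form solution, again assuming `X` carries a leading column of 1s; a linear solve is used rather than forming the inverse explicitly.

```python
# Closed-form regularized linear regression:
# theta = (X^T X + lambda * diag(0, 1, ..., 1))^{-1} X^T y
import numpy as np

def ridge_closed_form(X, y, lam):
    d_plus_1 = X.shape[1]
    penalty = np.eye(d_plus_1)
    penalty[0, 0] = 0.0                       # no regularization on theta_0
    A = X.T @ X + lam * penalty
    return np.linalg.solve(A, X.T @ y)        # solves A theta = X^T y
```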
