
Lasso and Ridge Regression Scribe: Joseph Chong

Shrinking the size of the regression coefficients accepts a small increase in mean squared error in exchange for a simpler model. The lasso and ridge regression techniques both shrink the regression coefficients. Ridge Regression:
\[
\beta_{\text{ridge}} = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (y_i - \hat y_i)^2 + \lambda \sum_{j=1}^{k} \beta_j^2
\]

Equivalently,

\[
\beta_{\text{ridge}} = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (y_i - \hat y_i)^2 \quad \text{s.t.} \quad \sum_{j=1}^{k} \beta_j^2 = c
\]
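As a concrete illustration of the penalized form (a minimal sketch on synthetic data using scikit-learn, not part of the original notes), the `alpha` parameter below plays the role of $\lambda$:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

# Larger alpha (the lambda above) shrinks all coefficients toward zero.
for alpha in [0.1, 10.0, 100.0]:
    ridge = Ridge(alpha=alpha).fit(X, y)
    print(alpha, np.round(ridge.coef_, 3))
```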

Lasso Regression:
\[
\beta_{\text{lasso}} = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (y_i - \hat y_i)^2 + \lambda \sum_{j=1}^{k} |\beta_j|
\]

Equivalently,

\[
\beta_{\text{lasso}} = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (y_i - \hat y_i)^2 \quad \text{s.t.} \quad \sum_{j=1}^{k} |\beta_j| = c
\]
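A matching sketch for the lasso (again synthetic data with scikit-learn, not from the notes): unlike the squared $\ell_2$ penalty of ridge, the $\ell_1$ penalty drives some coefficients exactly to zero, giving a sparse model.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 1.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=100)

# Larger alpha zeroes out more of the weaker coefficients entirely.
for alpha in [0.01, 0.5, 2.0]:
    lasso = Lasso(alpha=alpha).fit(X, y)
    print(alpha, np.round(lasso.coef_, 3))
```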

Solving the optimization problem
\[
\min_{x,y} f(x, y) \quad \text{s.t.} \quad h(x, y) = c
\]
is equivalent to looking for points along the contours of $h$ where $\nabla f$ is parallel to $\nabla h$. The penalty weight $\lambda$ can be chosen using cross validation. Lasso prefers the feature weights to be unequal, while ridge prefers the weights to be as equal as possible. Using lasso after using ridge is equivalent to simply using lasso. Using ridge after lasso will change the model given by lasso if lasso emphasizes some features over others, and will merely scale the model given by lasso if lasso weights the features equally.
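A quick illustration of these last two points (a sketch on synthetic data using scikit-learn, not part of the original notes): with two nearly identical copies of the same feature, ridge tends to split the weight roughly evenly between them, while lasso tends to concentrate the weight on one copy; LassoCV picks the penalty weight (called `alpha` in scikit-learn) by cross validation.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
# Two almost identical copies of the same feature.
X = np.hstack([x, x + 0.01 * rng.normal(size=(200, 1))])
y = 2.0 * x[:, 0] + rng.normal(scale=0.1, size=200)

print(Ridge(alpha=1.0).fit(X, y).coef_)  # ridge tends to split the weight evenly
print(Lasso(alpha=0.1).fit(X, y).coef_)  # lasso tends to put most of the weight on one copy

cv = LassoCV(cv=5).fit(X, y)             # penalty weight chosen by cross validation
print(cv.alpha_, cv.coef_)
```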
Consider the case where 2 features $(x^{(1)}, x^{(2)})$ are known to matter (and not many more features beyond $x^{(1)}$ and $x^{(2)}$ matter). How can lasso be used so that $x^{(1)}$ and $x^{(2)}$ are preserved?

Approach 1: penalize only the remaining coefficients:
\[
\beta = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (y_i - \hat y_i)^2 + \lambda \sum_{j=3}^{k} |\beta_j|
\]
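Scikit-learn's Lasso does not allow exempting particular coefficients from the penalty, so one way to try Approach 1 directly is a generic convex solver. The sketch below uses cvxpy on synthetic data (my choice of tool, not something the notes prescribe) and applies the $\ell_1$ penalty only to $\beta_3, \dots, \beta_k$:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 6
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.5, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=n)

lam = 5.0
beta = cp.Variable(k)
# Squared error plus an l1 penalty on beta_3, ..., beta_k only; beta_1, beta_2 are unpenalized.
objective = cp.Minimize(cp.sum_squares(y - X @ beta) + lam * cp.norm1(beta[2:]))
cp.Problem(objective).solve()
print(np.round(beta.value, 3))
```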

Approach 2:
\[
\varepsilon_i = y_i - \beta_1 x_i^{(1)} - \beta_2 x_i^{(2)}
\]
\[
\hat\varepsilon_i = \beta_3 x_i^{(3)} + \cdots + \beta_k x_i^{(k)}
\]
\[
\beta = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (\varepsilon_i - \hat\varepsilon_i)^2 + \lambda \sum_{j=3}^{k} |\beta_j|
\]
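One reading of Approach 2 (my interpretation, not stated explicitly in the notes) is a two-stage fit: first regress $y$ on $x^{(1)}$ and $x^{(2)}$ alone to obtain the residuals $\varepsilon_i$, then run the lasso on the remaining features against those residuals. A minimal scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)
n, k = 100, 6
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.5, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=n)

# Stage 1: fit x^(1), x^(2) only, and form the residuals epsilon_i.
stage1 = LinearRegression().fit(X[:, :2], y)
eps = y - stage1.predict(X[:, :2])

# Stage 2: lasso on the remaining k - 2 features against the residuals.
stage2 = Lasso(alpha=0.1).fit(X[:, 2:], eps)
print(stage1.coef_, np.round(stage2.coef_, 3))
```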

If the columns for $x^{(1)}$ and $x^{(2)}$ are orthogonal to the other $k - 2$ columns, then Approach 1 and Approach 2 give the same solution. The following approach equalizes $\beta_1$ and $\beta_2$ as much as possible, while throwing away as many of $\beta_3, \dots, \beta_k$ as possible:
\[
\beta = \operatorname*{argmin}_{\beta}\ \sum_{i=1}^{n} (y_i - \hat y_i)^2 + \lambda_1 \left(\beta_1^2 + \beta_2^2\right) + \lambda_2 \sum_{j=3}^{k} |\beta_j|
\]
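This combined objective drops into the same convex-solver sketch used for Approach 1 above (again assuming cvxpy and synthetic data): a ridge penalty weighted by $\lambda_1$ on $\beta_1, \beta_2$ and a lasso penalty weighted by $\lambda_2$ on the rest.

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 6
X = rng.normal(size=(n, k))
y = X @ np.array([2.0, -1.5, 0.5, 0.0, 0.0, 0.0]) + rng.normal(scale=0.3, size=n)

lam1, lam2 = 1.0, 5.0
beta = cp.Variable(k)
# Ridge penalty on beta_1, beta_2; lasso penalty on beta_3, ..., beta_k.
objective = cp.Minimize(cp.sum_squares(y - X @ beta)
                        + lam1 * cp.sum_squares(beta[:2])
                        + lam2 * cp.norm1(beta[2:]))
cp.Problem(objective).solve()
print(np.round(beta.value, 3))
```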
