
Lasso Regression

Talha Farooq
School of Natural Sciences
MS Statistics

National University of Sciences & Technology
Pakistan
1 Introduction
Lasso regression is one of the best known techniques in supervised learning.
Compared with classical linear regression, its advantage is that it can
shrink the coefficient estimates towards zero. Lasso regression is also
based on linear regression and is a relatively recent alternative to ridge
regression. The lasso sum takes the form:

\[ L(\beta, \lambda) = \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 \tag{1} \]


where λ ≥ 0 is a tuning parameter. The second term, λ‖β‖₁ = λ Σ_j |β_j|, is a
shrinkage penalty. The only difference between the ridge and lasso sums is
that the β_j² terms in the ridge penalty are replaced by |β_j| terms in the
lasso penalty. Like ridge regression, lasso regression shrinks the
coefficient estimates towards zero. Unlike ridge regression, the lasso
penalty has the effect of forcing some of the coefficient estimates to be
exactly equal to zero when the tuning parameter is sufficiently large.
Lasso regression chooses β to minimize the lasso sum. The minimizers are
defined as the lasso regression coefficient estimates.

The lasso estimate is defined as

\[ \hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p} \|y - X\beta\|_2^2 + \lambda \sum_{j=1}^{p} |\beta_j| \]
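In practice these estimates are computed numerically. Below is a minimal
sketch using scikit-learn's Lasso on simulated data (an illustrative setup,
not part of the derivation above); note that scikit-learn minimizes
(1/(2n))‖y − Xβ‖² + α‖β‖₁, so its alpha corresponds to λ only up to that
1/(2n) scaling.

```python
# Minimal sketch: fitting the lasso with scikit-learn on simulated data.
# scikit-learn's objective is (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1,
# so alpha plays the role of the tuning parameter lambda up to scaling.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.normal(size=(n, p))
beta_true = np.concatenate([[3.0, -2.0], np.zeros(p - 2)])  # sparse truth
y = X @ beta_true + rng.normal(scale=0.5, size=n)

model = Lasso(alpha=0.1)   # larger alpha means stronger shrinkage
model.fit(X, y)
print(model.coef_)         # several coefficients are exactly zero
```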

The tuning parameter λ controls the strength of the penalty: we get β̂ = the
linear regression estimate when λ = 0, and β̂ = 0 when λ = ∞. For λ in
between these two extremes, we are balancing two ideas: fitting a linear
model of y on X, and shrinking the coefficients. But the nature of the ℓ₁
penalty causes some coefficients to be shrunken to exactly zero, as the
sketch below illustrates.
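The following hedged sketch shows this effect on simulated data of the same
kind as above: as the penalty grows, more coefficients are driven to exactly
zero.

```python
# Sketch: the count of exactly-zero coefficients grows with the penalty.
# Simulated data for illustration; alpha is scikit-learn's scaled penalty.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
beta_true = np.concatenate([[3.0, -2.0], np.zeros(8)])  # sparse truth
y = X @ beta_true + rng.normal(scale=0.5, size=100)

for alpha in [0.001, 0.01, 0.1, 1.0, 10.0]:
    coef = Lasso(alpha=alpha, max_iter=100_000).fit(X, y).coef_
    print(f"alpha={alpha:>6}: {np.count_nonzero(coef == 0)} of {coef.size} exactly zero")
```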
Differentiating both sides of (1) with respect to β and setting the result
equal to zero (the penalty is not differentiable at zero, so its subgradient
contributes ±λ according to the sign of each coefficient):

\[ \frac{\partial L}{\partial \beta} = -2X^T (y - X\beta) \pm \lambda \]

\[ -2X^T y + 2X^T X\beta \pm \lambda = 0 \]

\[ -X^T y + X^T X\beta \pm \frac{\lambda}{2} = 0 \]

Solving for β, and assuming X^T X = I (the identity matrix, i.e., an
orthonormal design), gives

\[ \hat{\beta} = X^T y \mp \frac{\lambda}{2} \]

where λ/2 is subtracted from positive coefficients and added to negative
ones; this soft-thresholds the least-squares estimate X^T y.
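This closed form can be checked numerically. The sketch below (an
illustration under the stated assumption, not a general-purpose solver)
builds a design with orthonormal columns via a QR decomposition so that
X^T X = I holds, applies the soft-thresholding formula, and compares the
result against scikit-learn's Lasso with alpha = λ/(2n) to account for its
1/(2n) objective scaling.

```python
# Sketch: verify beta_hat_j = sign(z_j) * max(|z_j| - lambda/2, 0), z = X^T y,
# when X^T X = I. Orthonormal columns come from a QR decomposition.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
n, p = 50, 5
X, _ = np.linalg.qr(rng.normal(size=(n, p)))   # X has orthonormal columns
beta_true = np.array([2.0, -1.5, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

lam = 0.6
z = X.T @ y                                    # OLS estimate when X^T X = I
beta_closed = np.sign(z) * np.maximum(np.abs(z) - lam / 2, 0.0)  # soft threshold

# scikit-learn minimizes (1/(2n))||y - X b||^2 + alpha ||b||_1, so alpha = lam/(2n)
beta_sk = Lasso(alpha=lam / (2 * n), fit_intercept=False, tol=1e-10).fit(X, y).coef_
print(np.allclose(beta_closed, beta_sk, atol=1e-6))  # True, up to solver tolerance
```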
