Regularization Paths
Trevor Hastie
Stanford University
Theme
[Figure: error curve (0.24 to 0.32) plotted against Iterations.]
December 2005 Trevor Hastie, Stanford Statistics 4
[Figure: Test MSE (Squared Error Loss) vs Number of Trees, comparing GBM stump with GBM stump, shrinkage 0.1.]
Linear Regression
1. Start with r = y, β1 , β2 , . . . , βp = 0.
2. Find the predictor xj most correlated with r.
3. Update βj ← βj + δj , where δj = ε · sign⟨r, xj ⟩.
4. Set r ← r − δj · xj and repeat steps 2 and 3 many times.
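The four steps above can be sketched in pure Python. This is a minimal illustration, not code from the talk; the names `forward_stagewise`, `eps`, and `n_steps` are chosen here, with `eps` playing the role of the step size ε:

```python
# Incremental forward stagewise regression: a pure-Python sketch.
# Assumes the columns of X are standardized, so the inner product
# <r, x_j> is proportional to the correlation with the residual.

def forward_stagewise(X, y, eps=0.01, n_steps=1000):
    """X: list of n rows of p features; y: list of n responses."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    r = list(y)                       # step 1: start with r = y, beta = 0
    for _ in range(n_steps):
        # step 2: find the predictor most correlated with r
        corrs = [sum(r[i] * X[i][j] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corrs[k]))
        # step 3: tiny update delta_j = eps * sign<r, x_j>
        delta = eps * (1.0 if corrs[j] > 0 else -1.0)
        beta[j] += delta
        # step 4: deflate the residual and repeat
        for i in range(n):
            r[i] -= delta * X[i][j]
    return beta
```

With a small step size and many iterations the coefficients creep toward the least-squares fit, tracing out the stagewise path.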
[Figure: Coefficient paths (two panels) for the prostate data, with variables svi, lweight, pgg45, lbph, gleason, age, and lcp labeled; below, a schematic of the path of β̂ in the (β1 , β2 ) plane.]
Diabetes Data
[Figure: Coefficient profiles β̂j for predictors 1–10 on the diabetes data, plotted against t = Σj |β̂j |, for the Lasso (left) and Stagewise (right).]
[Figure: LAR geometry with predictors x1 and x2, projected responses ȳ1 and ȳ2, fits µ̂0 and µ̂1, and direction u2.]
The LAR direction u2 at step 2 makes an equal angle with x1 and x2.
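The equal-angle property can be checked numerically. A minimal sketch, assuming two unit-norm predictors; the function name `equiangular_direction` is chosen here, and the construction follows the standard LAR form u = X_A (X_Aᵀ X_A)⁻¹ 1, normalized:

```python
import math

# Equiangular (LAR) direction for two unit-norm predictors x1, x2:
# solve (X_A^T X_A) w = 1 and normalize u = w1*x1 + w2*x2, so that
# <x1, u> = <x2, u> (equal angles with both active predictors).

def equiangular_direction(x1, x2):
    # Gram matrix entries for the active set {x1, x2}
    g11 = sum(a * a for a in x1)
    g22 = sum(a * a for a in x2)
    g12 = sum(a * b for a, b in zip(x1, x2))
    # 2x2 solve by Cramer's rule: (X^T X) w = (1, 1)
    det = g11 * g22 - g12 * g12
    w1 = (g22 - g12) / det
    w2 = (g11 - g12) / det
    u = [w1 * a + w2 * b for a, b in zip(x1, x2)]
    norm = math.sqrt(sum(c * c for c in u))
    return [c / norm for c in u]
```

For any two unit-norm predictors the returned unit vector has identical inner products with both, which is the equal-angle condition in the figure.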
[Figure: coefficient paths plotted against L1 Norm (Standardized), for the construction in the enlarged space that includes −xj , measured per unit arc-length in coefficients.]
df for LAR
[Figure: LAR Standardized Coefficients plotted against |beta|/max|beta|, with df values 0 2 3 4 5 7 8 10 labeled along the top.]
• df are labeled at the top of the figure.
• At the point a competitor enters the active set, the df are incremented by 1.
Back to Boosting
Mixture data from ESL. Boosting with 4-node trees, gbm package in
R, shrinkage = 0.02, Adaboost loss.
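For intuition about the exponential-loss fit above, boosting with stumps can be sketched in pure Python. This is plain AdaBoost with decision stumps, a simplified stand-in chosen here; it does not replicate the 4-node trees, the shrinkage of 0.02, or the gbm package itself:

```python
import math

# AdaBoost with decision stumps (exponential loss), a toy sketch.
# Labels y are in {-1, +1}; each stump is (threshold, feature, sign).

def stump_fit(X, y, w):
    """Best weighted threshold stump: predict sign if x[j] <= thr else -sign."""
    n, p = len(X), len(X[0])
    best = None
    for j in range(p):
        for thr in sorted({row[j] for row in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (sign if xi[j] <= thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def adaboost(X, y, n_rounds=10):
    n = len(X)
    w = [1.0 / n] * n                     # uniform weights to start
    ensemble = []
    for _ in range(n_rounds):
        err, j, thr, sign = stump_fit(X, y, w)
        err = max(err, 1e-12)             # guard against a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, thr, sign))
        # reweight: misclassified points gain weight, then renormalize
        for i in range(n):
            pred = sign if X[i][j] <= thr else -sign
            w[i] *= math.exp(-alpha * y[i] * pred)
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    s = sum(alpha * (sign if x[j] <= thr else -sign)
            for alpha, j, thr, sign in ensemble)
    return 1 if s >= 0 else -1
```

Even on data no single stump can fit, a few rounds combine stumps into a correct classifier, and further rounds keep growing the margins, which is the behavior traced in the test-error and margin plots.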
[Figure: Test Error and Margin plotted against number of boosting iterations (0 to 10K), with iteration 2267 marked.]
[Figure: Standardized Coefficients vs |beta|/max|beta| (two panels), with selected trees labeled: 2267, 1834, 461, 2945, 2968, 6801, 246, 353, 4535.]
Glmpath
[Figure: Standardized coefficients for x1–x5 plotted against lambda (two panels), with exact solutions marked at the junctions where the active set changes.]
• glmpath package in R.
• Uses predictor-corrector methods in convex optimization.
• Computes exact solutions at the junctions where the active set changes.
[Figure: SVM decision boundaries at three points along the regularization path, plotted against C = 1/λ. Panel captions: Step: 17, Error: 0, Elbow Size: 2, Loss: 0; Step: 623, Error: 13, Elbow Size: 54, Loss: 30.46; Step: 483, Error: 1, Elbow Size: 90, Loss: 1.01.]
Concluding Comments