Problems with Linear Models
▶ outliers
▶ high-leverage points
▶ collinearity
. . . Problems with Linear Models
[Figure: Residuals vs Fitted plot for lm(Price ~ Taxes + Beds + Baths + New + Size). x-axis: Fitted values; y-axis: Residuals. Observations 6, 9 and 64 are labelled.]
. . . Problems with Linear Models
• Outliers
[Figure: Scale−Location plot for lm(Price ~ Taxes + Beds + Baths + New + Size). x-axis: Fitted values; y-axis: square root of |Standardized residuals|. Observations 6, 9 and 64 are labelled.]
. . . Problems with Linear Models
[Figure: Normal Q−Q plot for lm(Price ~ Taxes + Beds + Baths + New + Size). x-axis: Theoretical Quantiles; y-axis: Standardized residuals. Observations 6, 9 and 64 are labelled.]
. . . Problems with Linear Models
• High-leverage points
• Collinearity
▶ misleading results
. . . Problems with Linear Models
[Figure: Residuals vs Leverage plot for lm(Price ~ Taxes + Beds + Baths + New + Size). x-axis: Leverage; y-axis: Standardized residuals; Cook's distance contours at 0.5 and 1. Observations 6, 9 and 64 are labelled.]
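The quantities shown in these diagnostic plots (leverage, standardized residuals, Cook's distance) can be computed directly from the design matrix. The following is a minimal sketch in Python with NumPy, not the code behind the slides; the function name `influence_measures` and the simulated data are illustrative assumptions, and the housing data of the slides is not used.

```python
import numpy as np

def influence_measures(X, y):
    """Return leverage, standardized residuals and Cook's distance for OLS.

    X is an (n, p) matrix of covariates WITHOUT the intercept column.
    """
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])      # design matrix with intercept
    q = Xd.shape[1]                            # number of fitted coefficients
    # Hat matrix H = Q Q^T for the thin QR factor Q, so h_ii = sum_j Q_ij^2
    Q, _ = np.linalg.qr(Xd)
    leverage = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - q)           # residual variance estimate
    std_resid = resid / np.sqrt(sigma2 * (1.0 - leverage))
    cooks = std_resid**2 * leverage / (q * (1.0 - leverage))
    return leverage, std_resid, cooks

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=50)
X[0] = [8.0, 8.0]      # move one point far from the rest: high leverage
y[0] = 30.0            # and give it an unusual response: outlier
leverage, std_resid, cooks = influence_measures(X, y)
print(np.argmax(leverage), np.argmax(cooks))
```

A point with both high leverage and a large standardized residual gets a large Cook's distance, which is what the contours in the Residuals vs Leverage plot are meant to flag.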
Model Building
▶ quantitative
▶ categorical
▶ transformation of variables
▶ statistical significance
▶ simplicity v complexity
▶ model assumptions
Causality vs Association
. . . Causality vs Association
▶ the association between prestige and income stems from a common
prior cause: education
. . . Model Building
• What we know:
. . . Model Building
for l = 1, . . . , p, fit the $\binom{p}{l}$ models with l covariates and select the best
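The exhaustive search over all $\binom{p}{l}$ models of each size can be sketched as follows in Python with NumPy. This is an illustration under assumptions: "best" within a size is taken to mean smallest residual sum of squares, and the final comparison across sizes by adjusted R² is one possible choice that the slide does not specify. The helper names and the simulated data are hypothetical.

```python
from itertools import combinations
import numpy as np

def rss(X, y, cols):
    """Residual sum of squares of the OLS fit on `cols` plus an intercept."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    r = y - Xd @ beta
    return float(r @ r)

def best_subsets(X, y):
    """For each size l = 1..p, return the l columns with the smallest RSS."""
    p = X.shape[1]
    best = {}
    for l in range(1, p + 1):
        # all (p choose l) models with exactly l covariates
        best[l] = min(combinations(range(p), l), key=lambda c: rss(X, y, c))
    return best

def adjusted_r2(X, y, cols):
    """Adjusted R^2, used here (as an assumption) to compare across sizes."""
    n = len(y)
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - (rss(X, y, cols) / (n - len(cols) - 1)) / (tss / (n - 1))

# Simulated data: only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
best = best_subsets(X, y)
chosen = max(best.values(), key=lambda c: adjusted_r2(X, y, c))
print(best, chosen)
```

The number of fits grows as 2^p − 1, which is why the search strategies on the following slides only explore part of the model space.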
. . . Model Building
• Backward search:
4 continue (3) until a stopping rule is reached (eg largest p-value is less
than a threshold)
. . . Model Building
• Forward search
2 fit p models with one covariate and choose one according to some
criterion (eg largest $R^2$)
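A forward search with the largest-$R^2$ criterion can be sketched as below in Python with NumPy. Only the step shown above comes from the slide; continuing greedily and stopping when the gain in $R^2$ falls below `min_gain` is an assumed stopping rule, and the function names and simulated data are hypothetical.

```python
import numpy as np

def r_squared(X, y, cols):
    """R^2 of the OLS fit on `cols` plus an intercept."""
    Xd = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    tss = np.sum((y - y.mean()) ** 2)
    return 1.0 - float(resid @ resid) / float(tss)

def forward_search(X, y, min_gain=0.01):
    """Greedy forward selection; min_gain is an assumed stopping threshold."""
    selected, current = [], 0.0
    remaining = list(range(X.shape[1]))
    while remaining:
        # fit one model per excluded covariate, keep the largest R^2
        scores = {j: r_squared(X, y, selected + [j]) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - current < min_gain:
            break                      # no candidate improves the fit enough
        selected.append(j_best)
        remaining.remove(j_best)
        current = scores[j_best]
    return selected

# Simulated data: only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = forward_search(X, y)
print(selected)
```

Because $R^2$ never decreases when a covariate is added, some threshold of this kind is needed for the search to stop before all p covariates are included.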
. . . Model Building
2 fit p models with one covariate and choose one according to some
criterion (eg smallest p-value, provided it is smaller than a threshold)
3 if the current model has s ≥ 1 covariates, fit p − s models adding one
of the excluded p − s regressors; choose one according to some
criterion (eg smallest p-value, provided it is smaller than a threshold)
4 from the model in (3), remove one variable according to some criterion
(eg largest p-value, provided it is larger than a threshold)
5 repeat steps (3) and (4) until all possible additions and deletions are
performed.
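The alternating addition and deletion steps above can be sketched as follows in Python with NumPy. This is an illustration, not the procedure used to produce the slides: the p-values come from a normal approximation to the t distribution (an assumption made to keep the example dependency-free), and the thresholds `alpha_in` and `alpha_out` as well as the function names are hypothetical.

```python
import math
import numpy as np

def p_values(X, y, cols):
    """Two-sided p-values for the slopes of an OLS fit on `cols`.

    Assumption: a normal approximation replaces the t distribution,
    which is reasonable when n is large relative to the model size.
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    XtX_inv = np.linalg.inv(Xd.T @ Xd)
    beta = XtX_inv @ Xd.T @ y
    resid = y - Xd @ beta
    sigma2 = resid @ resid / (n - Xd.shape[1])
    se = np.sqrt(sigma2 * np.diag(XtX_inv))
    t = beta / se
    # index k + 1 skips the intercept
    return {j: math.erfc(abs(t[k + 1]) / math.sqrt(2.0)) for k, j in enumerate(cols)}

def stepwise(X, y, alpha_in=0.05, alpha_out=0.10, max_iter=100):
    selected = []
    for _ in range(max_iter):
        changed = False
        # addition: among excluded covariates, add the smallest p-value
        excluded = [j for j in range(X.shape[1]) if j not in selected]
        if excluded:
            cand = {j: p_values(X, y, selected + [j])[j] for j in excluded}
            j_add = min(cand, key=cand.get)
            if cand[j_add] < alpha_in:
                selected.append(j_add)
                changed = True
        # deletion: drop the largest p-value if it exceeds the threshold
        if selected:
            pv = p_values(X, y, selected)
            j_del = max(pv, key=pv.get)
            if pv[j_del] > alpha_out:
                selected.remove(j_del)
                changed = True
        if not changed:
            break                      # no addition or deletion is possible
    return selected

# Simulated data: only columns 0 and 2 matter
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = stepwise(X, y)
print(selected)
```

Choosing `alpha_in` smaller than `alpha_out` keeps a covariate from being added and removed again in the same pass; `max_iter` is only a safeguard against cycling.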
. . . Model Building
▶ select tax
. . . Model Building
▶ select size
. . . Model Building
▶ select new
. . . Model Building
• Automatic selection
Cross-validation
▶ test set: check the prediction of the trained model(s) and choose one
. . . Cross-validation
• Cross-validation (CV):
• Most common criterion: predicted (or test) mean squared error: for
each test set
$\mathrm{avg.}\,(\mathrm{actual} - \mathrm{predicted})^2$
then eg average across test sets
. . . Cross-validation
▶ remove observation i, $(y_i, x_{i1}, \dots, x_{ip})$, from the sample
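Leave-one-out cross-validation for a linear model can be sketched as below in Python with NumPy. The first function refits the model n times, removing one observation each time; the second uses the known identity for least squares that the deleted residual equals e_i / (1 − h_ii), so a single fit suffices. The function names and the simulated data are assumptions for illustration.

```python
import numpy as np

def loocv_mse(X, y):
    """Leave-one-out test MSE by refitting the model n times."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    sq_errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i                   # remove observation i
        beta, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
        sq_errors[i] = (y[i] - Xd[i] @ beta) ** 2  # predict the held-out point
    return float(sq_errors.mean())

def loocv_mse_shortcut(X, y):
    """Same quantity from one fit, using deleted residuals e_i / (1 - h_ii)."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    Q, _ = np.linalg.qr(Xd)
    h = np.sum(Q**2, axis=1)                       # leverages h_ii
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    e = y - Xd @ beta
    return float(np.mean((e / (1.0 - h)) ** 2))

# Simulated data (an assumption for illustration only)
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 3))
y = 1.0 + X @ np.array([1.5, 0.0, -2.0]) + rng.normal(size=60)
slow = loocv_mse(X, y)
fast = loocv_mse_shortcut(X, y)
press = len(y) * fast                              # sum of squared deleted residuals
print(slow, fast, press)
```

The sum of the squared deleted residuals, rather than their mean, is the PRESS statistic.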
. . . Cross-validation
• Test MSE approximated by $\frac{1}{k}\sum_{l=1}^{k} \mathrm{MSE}_l$
• Typical choices: k = 5, k = 10
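The k-fold average of the per-fold test MSEs can be sketched as follows in Python with NumPy. The function name `kfold_mse`, the random fold assignment via a seeded permutation and the simulated data are assumptions for illustration.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Average of the k per-fold test MSEs for an OLS fit with intercept."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)                 # k roughly equal test sets
    mse = []
    for test in folds:
        train = np.setdiff1d(idx, test)            # all observations not in this fold
        beta, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
        mse.append(np.mean((y[test] - Xd[test] @ beta) ** 2))
    return float(np.mean(mse))

# Simulated data: only columns 0 and 2 matter, noise variance 0.25
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
cv_full = kfold_mse(X, y, k=5)
cv_noise = kfold_mse(X[:, [1]], y, k=5)            # model with an irrelevant covariate only
print(cv_full, cv_noise)
```

Comparing the cross-validated MSE of candidate models, as in the last two lines, is how the criterion is used for model choice.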
. . . Cross-validation
covariates                    $R^2$   $R^2_{adj}$   PRESS
size, tax, beds, baths, new   0.793   0.782         2.91
size, tax, beds, new          0.793   0.785         2.85
size, tax, new                0.790   0.783         2.67
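For reference, the two penalised criteria in the table are usually defined as follows. The notation is an assumption, since the extracted slide does not state it: n is the number of observations, p the number of covariates in the model, and $\hat{y}_{(i)}$ the prediction for observation i from the model fitted without it.

```latex
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1},
\qquad
\mathrm{PRESS} = \sum_{i=1}^{n} \bigl(y_i - \hat{y}_{(i)}\bigr)^2
```

In the table, $R^2$ does not increase as covariates are dropped, whereas adjusted $R^2$ is highest for the four-covariate model (0.785) and PRESS is smallest for the three-covariate model (2.67).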