Prediction
• We do regression because we want to carry out predictions
▶ what is the predicted response variable for a given new value of the
covariates?
▶ with $x_{\text{new}} = (x_{1,\text{new}}, \dots, x_{p,\text{new}})$ and response
Confidence and prediction intervals
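As a minimal sketch of the difference between the two intervals, the Python below fits a simple linear regression (one covariate only, a simplification of the general setting in these slides) and returns both a confidence interval for the mean response and a prediction interval for a new observation. The data, the helper name `intervals_at` and the tabulated quantile 3.182 (97.5% point of a t distribution with 3 degrees of freedom) are illustrative assumptions, not taken from the slides.

```python
import math

def intervals_at(x, y, x_new, t_crit):
    """Fit y = b0 + b1*x by least squares and return
    (prediction, confidence interval, prediction interval) at x_new."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual standard error on n - 2 degrees of freedom
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))
    y_hat = b0 + b1 * x_new
    # Uncertainty of the estimated mean response at x_new
    lever = 1 / n + (x_new - x_bar) ** 2 / sxx
    half_ci = t_crit * s * math.sqrt(lever)
    # A new observation adds its own error variance (the extra "1 +")
    half_pi = t_crit * s * math.sqrt(1 + lever)
    conf_int = (y_hat - half_ci, y_hat + half_ci)
    pred_int = (y_hat - half_pi, y_hat + half_pi)
    return y_hat, conf_int, pred_int

# Made-up data with n = 5, so the t quantile has 3 degrees of freedom
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
T_975_DF3 = 3.182  # assumed tabulated quantile, see lead-in
y_hat, ci, pred_int = intervals_at(x, y, x_new=3.5, t_crit=T_975_DF3)
print(y_hat, ci, pred_int)
```

The prediction interval is always wider than the confidence interval at the same point, because it accounts for the error of the new observation in addition to the uncertainty in the fitted line.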
Categorical Variables
▶ binary variables
▶ ordered or not
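... Categorical Variables
$E[Y \mid X = l_1] = \beta_1$
$E[Y \mid X = l_k] = \beta_1 + \beta_k$, $k = 2, \dots, g$
or in short
$Y_i = \beta_1 + \beta_k + \varepsilon_i$ if level $k \ge 2$ applies, and $Y_i = \beta_1 + \varepsilon_i$ for the baseline level $l_1$
⇝ linearity always holds for such a model
▶ intercept $\beta_1$ = mean response at the baseline level $l_1$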
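The baseline coding can be illustrated with a short sketch. With a single categorical covariate, the least-squares fit reproduces the group means, so the intercept equals the mean of the baseline group and every other coefficient is a difference from that mean. The data and the function name `baseline_coding` are hypothetical.

```python
from collections import defaultdict

def baseline_coding(levels, y, baseline):
    """Return (beta_1, {level: beta_k}) for a one-factor model.

    beta_1 is the mean response in the baseline group; each beta_k is
    the mean of level k minus the baseline mean.
    """
    groups = defaultdict(list)
    for lev, yi in zip(levels, y):
        groups[lev].append(yi)
    means = {lev: sum(v) / len(v) for lev, v in groups.items()}
    beta_1 = means[baseline]
    betas = {lev: m - beta_1 for lev, m in means.items() if lev != baseline}
    return beta_1, betas

# Made-up responses for a factor with three levels; "a" is the baseline
levels = ["a", "a", "b", "b", "c", "c"]
y = [10.0, 12.0, 15.0, 17.0, 20.0, 24.0]
beta_1, betas = baseline_coding(levels, y, baseline="a")
print(beta_1, betas)
```

Choosing a different baseline changes the individual coefficients but not the fitted group means.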
... Categorical Variables
$E[Y \mid X_{i1} = l_k, X_{i2} = x_2] =$ , $k = 2, \dots, g$ ⇝ change of intercept
$E[Y \mid X_{i1} = l_k, X_{i2} = s_h] =$ , $k = 2, \dots, g$, $h = 2, \dots, f$
⇝ here $l_1$ and $s_1$ are the baseline levels for $X_1$ and $X_2$ respectively
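... Categorical Variables
• Model is $Y_i = \beta_0 + \beta_1\,\text{size}_i + \beta_2\,\text{tax}_i + \beta_3\,\text{beds}_i + \beta_4\,\text{baths}_i + \beta_5\,\text{new}_i + \varepsilon_i$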
• Result:
                   β̂_i          SE         t      p-value
(Intercept)     4526.00    24470.00    0.1849   8.531 · 10^-1
size              68.35       13.94    4.9040   3.916 · 10^-6
tax               38.14        6.81    5.5960   2.158 · 10^-7
beds          -11260.00     9115.00   -1.2350   2.198 · 10^-1
baths          -2114.00    11470.00   -0.1844   8.541 · 10^-1
new            41710.00    16890.00    2.4700   1.531 · 10^-2
▶ $R^2 = 0.793$
▶ intercept, beds and baths not significant at 5% level (p-values > 0.05)
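The t column is the estimate divided by its standard error, which can be checked directly from the printed values. The sketch below recomputes it; small differences from the table are expected because the printed estimates and standard errors are rounded.

```python
# Estimates and standard errors copied from the table above
rows = {
    "(Intercept)": (4526.00, 24470.00),
    "size": (68.35, 13.94),
    "tax": (38.14, 6.81),
    "beds": (-11260.00, 9115.00),
    "baths": (-2114.00, 11470.00),
    "new": (41710.00, 16890.00),
}

# t statistic for H0: coefficient = 0 is estimate / standard error
t_stats = {name: est / se for name, (est, se) in rows.items()}
for name, t in t_stats.items():
    print(f"{name:12s} t = {t:8.4f}")
```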
Inference for Multiple Coefficients
• Test a null hypothesis about a subset of the regression slopes (say the first $q$, with $1 \le q < p$)
▶ $H_0: \beta_1 = \dots = \beta_q = 0$, versus
▶ $H_1$: at least one of $\beta_1, \dots, \beta_q$ is $\neq 0$
• Note:
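▶ let $RSS_k$, $RegSS_k$, $R_k^2$, $k = 0, 1$, be the RSS, RegSS and $R^2$ under the two models
$RSS_0 + RegSS_0 = RSS_1 + RegSS_1$, since both sums equal the total sum of squares $\sum_i (y_i - \bar{y})^2$, which does not depend on the model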
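For reference, the standard partial F statistic for a hypothesis of this kind can be written as follows. This is a sketch under assumptions not stated on the slide: model 0 is taken to be the reduced model (without the $q$ covariates under test), model 1 the full model with $p$ slopes plus an intercept, $n$ is the number of observations, and the errors are normal.

```latex
F = \frac{(RSS_0 - RSS_1)/q}{RSS_1/(n - p - 1)}
  = \frac{(R_1^2 - R_0^2)/q}{(1 - R_1^2)/(n - p - 1)}
  \;\sim\; F_{q,\, n-p-1} \quad \text{under } H_0 .
```

The second form follows from dividing numerator and denominator by the total sum of squares, which is the same for both models.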
... Model Building
• Example. Home prices: fit model with baths and beds removed:
$R^2 = 0.79$
                   β̂_i          SE         t      p-value
(Intercept)   -21350.00    13310.00    -1.604   1.12 · 10^-1
size              61.70       12.50     4.937   3.34 · 10^-6
tax               37.23        6.73     5.528   2.78 · 10^-7
new            46370.00    16460.00     2.818   5.87 · 10^-3
• $H_0: \beta_3 = \beta_4 = 0$
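A rough version of this test can be sketched from the two reported $R^2$ values (0.793 for the model with all five covariates, 0.790 for the model without beds and baths) using the usual partial F statistic. The sample size is not given in the slides; `n = 100` below is an assumption, and the function name `partial_f` is hypothetical. Because the inputs are rounded to three decimals, the result is only indicative.

```python
def partial_f(r2_full, r2_reduced, q, n, p_full):
    """Partial F statistic for dropping q slopes from a model with
    p_full slopes plus an intercept, written in terms of R squared."""
    num = (r2_full - r2_reduced) / q
    den = (1.0 - r2_full) / (n - p_full - 1)
    return num / den

# R squared values from the slides; n = 100 is an ASSUMPTION, not from the source
f_stat = partial_f(r2_full=0.793, r2_reduced=0.790, q=2, n=100, p_full=5)
print(round(f_stat, 3))
```

Under these assumptions the statistic is below 1, which gives no evidence against the null hypothesis that beds and baths can both be dropped.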
Variables Interaction
$E[Y \mid X] = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p$
... Variables Interaction
$E[Y \mid X_1 = x_1, X_2 = x_2] =$
$E[Y \mid X_1 = l_k, X_2 = x_2] =$ , $k = 2, \dots, g$
▶ both categorical, $X_1$ with levels $l_1, \dots, l_g$ and $X_2$ with levels $s_1, \dots, s_f$:
$E[Y \mid X_1 = l_k, X_2 = s_h] =$ , $k = 2, \dots, g$, $h = 2, \dots, f$
... Variables Interaction
• Example. House prices: fit model with interaction between size and new: $R^2 = 0.8168$
                   β̂_i          SE         t      p-value
(Intercept)     -365.70    13680.00   -0.0267   9.78 · 10^-1
size              46.29       12.42    3.7260   3.30 · 10^-4
tax               38.81        6.33    6.1290   2.00 · 10^-8
new          -106900.00    43650.00   -2.4490   1.61 · 10^-2
new×size          69.43       18.50    3.7540   2.99 · 10^-4
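To read the interaction fit, it can help to work out the implied size slope for each value of new and the size at which the estimated difference between new and old homes changes sign. The sketch below does this arithmetic with the coefficients from the table; it assumes new is a 0/1 indicator, and the variable names are illustrative.

```python
# Coefficients copied from the interaction fit above
b_size, b_new, b_new_size = 46.29, -106900.00, 69.43

slope_old = b_size                 # effect of one extra unit of size when new = 0
slope_new = b_size + b_new_size    # same effect when new = 1

def new_home_effect(size):
    """Estimated difference (new minus old) at a given size, tax held fixed."""
    return b_new + b_new_size * size

break_even = -b_new / b_new_size   # size at which the estimated difference is zero
print(slope_old, round(slope_new, 2), round(break_even, 1))
```

The negative coefficient of new on its own therefore does not mean that new homes are estimated to be cheaper overall: the estimated difference depends on size and becomes positive above the break-even size.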
... Variables Interaction
covariates                       $R^2$    $\tilde{R}^2$ (adjusted)
size, tax, beds, baths, new      0.793    0.782
size, tax, beds, new             0.793    0.785
size, tax, new                   0.790    0.783
size, tax, new, size×new         0.817    0.809
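The second column of the table above appears to be the adjusted $R^2$, which penalises the number of covariates. As a hedged check, the sketch below applies the standard formula $1 - (1 - R^2)(n-1)/(n-p-1)$ to the first column. The sample size is not stated in the slides; `N_ASSUMED = 100` is an assumption that happens to reproduce the tabulated values to within rounding.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R squared for a model with p slopes plus an intercept."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

N_ASSUMED = 100  # ASSUMPTION: sample size is not given in the source
models = [
    ("size, tax, beds, baths, new", 0.793, 5),
    ("size, tax, beds, new", 0.793, 4),
    ("size, tax, new", 0.790, 3),
    ("size, tax, new, size x new", 0.817, 4),
]
adj = {name: adjusted_r2(r2, N_ASSUMED, p) for name, r2, p in models}
for name, value in adj.items():
    print(f"{name:30s} {value:.3f}")
```

Unlike $R^2$, the adjusted version can decrease when a covariate is added, which is why it is usable for comparing models of different size.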
Model Building
▶ quantitative
▶ categorical
▶ transformation of variables
▶ statistical significance
▶ simplicity vs. complexity
▶ model assumptions
... Model Building
• With $k$ explanatory variables,
▶ $2^k$ possible models arise from including/omitting each covariate
▶ different ways to select a model exist
• Backward search:
▶ fit model with all k covariates
▶ remove one variable at a time according to some criterion; stop when
the criterion no longer applies
• Forward search:
▶ fit model with no covariates
▶ add one variable at a time according to some criterion; stop when the
criterion no longer applies
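The forward search described above can be sketched as a greedy loop. The slides leave the criterion open; this example uses adjusted $R^2$ as one possible choice, with a small pure-Python least-squares fit via the normal equations. All function names and the data are hypothetical, and a backward search would mirror the loop by starting from all covariates and removing one at a time.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [u - f * v for u, v in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def adj_r2(cols, y):
    """Adjusted R squared of the least-squares fit of y on an intercept plus cols."""
    n, p = len(y), len(cols)
    X = [[1.0] + [col[i] for col in cols] for i in range(n)]
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p + 1)] for j in range(p + 1)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p + 1)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - (rss / (n - p - 1)) / (tss / (n - 1))

def forward_search(data, y):
    """Greedy forward search: add the covariate that most improves the
    criterion; stop when no addition improves it."""
    chosen, best = [], adj_r2([], y)
    while True:
        trials = {name: adj_r2([data[c] for c in chosen + [name]], y)
                  for name in data if name not in chosen}
        if not trials:
            return chosen, best
        name = max(trials, key=trials.get)
        if trials[name] <= best:
            return chosen, best
        chosen.append(name)
        best = trials[name]

# Made-up data: y depends on x1 and x2 but not on x3
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
y = [5.5, 6.5, 8.5, 7.5, 12.5, 11.5, 13.5, 14.5]
chosen, best = forward_search(data, y)
print(chosen, round(best, 4))
```

A greedy search fits far fewer than the $2^k$ possible models, at the price of possibly missing the best subset.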