1. Consider fitting a linear regression model to the data below. 𝑋1 and 𝑋2 are the features and 𝑌 is the response. The relevant matrices are given below:
y x1 x2
193 7.2 4.0
230 8.2 5.3
172 6.2 3.9
91 2.7 5.2
113 6.6 10.7
125 5.0 5.7
(𝑿𝑻 𝑿)−𝟏
𝑿𝑻 𝒀
924
5960.531
5055.35
Hat Matrix
0.358 0.350 0.299 -0.022 -0.087 0.103
0.350 0.445 0.238 -0.217 0.137 0.047
0.299 0.238 0.287 0.144 -0.120 0.153
-0.022 -0.217 0.144 0.752 0.000 0.343
-0.087 0.137 -0.120 0.000 0.938 0.133
0.103 0.047 0.153 0.343 0.133 0.220
a. With the above data, fit a linear regression model and estimate the coefficients. (5)
$$\hat{\beta} = (X^T X)^{-1} X^T y = (71.33,\ 24.16,\ -10.7)^T$$
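The fit can be reproduced from the printed data with a short Python sketch. It solves the normal equations in centered form (a 2x2 system for the slopes, then the intercept), which is algebraically equivalent to $(X^T X)^{-1} X^T y$. The variable and helper names are illustrative, not from the source. The printed data appear to be rounded, since $X^T Y$ recomputed from them does not exactly match the printed vector, so the result should land close to, but not exactly on, the coefficients 71.33, 24.16 and -10.7 used in the residual calculation for part f.

```python
# Data as printed in the question (the source values appear to be rounded).
y = [193, 230, 172, 91, 113, 125]
x1 = [7.2, 8.2, 6.2, 2.7, 6.6, 5.0]
x2 = [4.0, 5.3, 3.9, 5.2, 10.7, 5.7]


def mean(v):
    return sum(v) / len(v)


def centered_cross(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v))."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Centered sums of squares and cross-products.
s11 = centered_cross(x1, x1)
s22 = centered_cross(x2, x2)
s12 = centered_cross(x1, x2)
s1y = centered_cross(x1, y)
s2y = centered_cross(x2, y)

# Solve the 2x2 centered normal equations for the slopes (Cramer's rule).
det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# The intercept follows from the fitted plane passing through the means.
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")
```

The signs match the worked solution: the response increases with 𝑋1 and decreases with 𝑋2.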
b. State the hypotheses, perform ANOVA on this regression model, and interpret the result. (10)
𝑯𝟎 : 𝜷 𝟏 = 𝜷 𝟐 = 𝟎
𝑯𝟏 : 𝒂𝒕 𝒍𝒆𝒂𝒔𝒕 𝒐𝒏𝒆 𝜷𝒋 ≠ 𝟎
c. Perform hypothesis tests on the coefficients and interpret the importance of the
features. (10)
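$$se(\hat{\beta}_j) = \sqrt{\hat{\sigma}^2 C_{jj}}$$
$\hat{\sigma}^2$ is estimated by MSE from the ANOVA table.
$C_{jj}$ is the diagonal element of the $(X^T X)^{-1}$ matrix corresponding to $\hat{\beta}_j$.
For $\beta_1$: $$t_0 = \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} = \frac{24.16}{\sqrt{195.08 \times 0.056}} = 7.31$$
For $\beta_2$: $$t_0 = \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} = \frac{-10.7}{\sqrt{195.08 \times 0.032}} = -4.28$$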
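The two test statistics can be checked numerically, and compared against a critical value, with the sketch below. It uses only the numbers quoted in this solution (MSE = 195.08, the two estimates and their $C_{jj}$ values). The significance level of 0.05 and the table value $t_{0.025,3} = 3.182$ (with $n - p = 6 - 3 = 3$ error degrees of freedom) are assumptions, as the source does not state a level.

```python
import math

mse = 195.08  # estimate of sigma^2 quoted in the solution

# (estimate, C_jj) pairs quoted in the solution.
coefficients = {
    "beta1": (24.16, 0.056),
    "beta2": (-10.7, 0.032),
}

# ASSUMPTION: two-sided test at alpha = 0.05 with n - p = 3 degrees of freedom.
T_CRIT = 3.182

t_stats = {}
for name, (estimate, c_jj) in coefficients.items():
    se = math.sqrt(mse * c_jj)      # standard error of the coefficient
    t_stats[name] = estimate / se   # test statistic for H0: beta_j = 0

for name, t0 in t_stats.items():
    verdict = "reject H0" if abs(t0) > T_CRIT else "fail to reject H0"
    print(f"{name}: t0 = {t0:.2f} -> {verdict}")
```

Under these assumptions both statistics exceed the critical value in absolute terms, which suggests that each feature contributes to the model given the other.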
e. What is the average leverage of this model? Based on the leverage values, which points have the most influence on the model? (5)
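The leverages are the diagonal entries of the hat matrix, and their sum equals the number of fitted parameters, so the average leverage is the trace of the hat matrix divided by $n$. A minimal Python sketch, assuming the diagonal is read directly from the hat matrix printed in the question; comparing each point with the average is an informal screen used here for illustration, not a rule stated in the source:

```python
# Diagonal of the hat matrix printed in the question (h_11 ... h_66).
leverage = [0.358, 0.445, 0.287, 0.752, 0.938, 0.220]

n = len(leverage)
trace = sum(leverage)        # equals the number of fitted parameters (about 3)
avg_leverage = trace / n     # average leverage = p / n

# Observations (1-based) whose leverage exceeds the average.
above_average = [i + 1 for i, h in enumerate(leverage) if h > avg_leverage]

print(f"trace(H) = {trace:.3f}, average leverage = {avg_leverage:.3f}")
print("above average:", above_average)
```

With these values the average leverage is 3/6 = 0.5, and observations 4 and 5 stand out, with observation 5 (0.938) close to the maximum possible leverage of 1.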
f. Calculate Cook’s distance for the 5th observation. Interpret the effect of this
observation. (5)
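Residual of the 5th observation:
$$e_5 = y_5 - \hat{y}_5 = 113 - (71.33 + 24.16 \times 6.6 - 10.7 \times 10.7) = 113 - 116.61 = -3.61$$
Studentized residual, using MSE = 195.08 from part c:
$$r_5 = \frac{e_5}{\sqrt{MSE\,(1 - h_{55})}} = \frac{-3.61}{\sqrt{195.08 \times (1 - 0.938)}} = -1.04$$
Cook's distance, with $p = 3$ parameters (intercept and two slopes):
$$D_5 = \frac{r_5^2}{p} \cdot \frac{h_{55}}{1 - h_{55}} = \frac{(-1.04)^2 \times 0.938}{3 \times (1 - 0.938)} \approx 5.4$$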
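A quick numerical check of Cook's distance in its usual textbook form, $D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}$, where $r_i = e_i / \sqrt{MSE\,(1 - h_{ii})}$ is the internally studentized residual and $p$ counts all fitted parameters. The sketch assumes MSE = 195.08 from part c, the fitted value 116.61 as quoted, and $p = 3$ (intercept plus two slopes); the threshold of 1 is a common rule of thumb, not a value given in the source.

```python
import math

y5 = 113          # observed response for observation 5
y5_hat = 116.61   # fitted value quoted in the solution
h55 = 0.938       # leverage of observation 5 (hat matrix diagonal)
mse = 195.08      # estimate of sigma^2 from part c
p = 3             # ASSUMPTION: intercept + 2 slopes

e5 = y5 - y5_hat                          # raw residual
r5 = e5 / math.sqrt(mse * (1 - h55))      # internally studentized residual
d5 = (r5 ** 2 / p) * h55 / (1 - h55)      # Cook's distance

print(f"e5 = {e5:.2f}, r5 = {r5:.2f}, D5 = {d5:.2f}")
print("influential by the D > 1 rule of thumb:", d5 > 1)
```

Under these assumptions $D_5$ comes out well above 1 even though the residual itself is small: the high leverage of observation 5 pulls the fitted surface toward it, so removing the point would change the estimated coefficients noticeably.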
2. Explain the relationship between the bias-variance trade-off and model flexibility. (5)
3. Explain the KNN method in the context of a numerical response and a categorical response. You may use diagrams to explain the working principle. (5)
Hint:
For ANOVA of Regression
$$SSR = \hat{\beta}' X' y - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}; \qquad SST = y'y - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}$$
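The hint formulas can be turned into the ANOVA quantities for question 1b with a few lines of Python. This is a sketch under stated assumptions: it uses the printed response values, the printed $X^T Y$ vector, and the rounded coefficient estimates used elsewhere in this solution (71.33, 24.16, -10.7), so the results are approximate. The significance level of 0.05 and the table value $F_{0.05;2,3} = 9.55$ are also assumptions, as the source does not state a level.

```python
# Values printed in the question; the coefficients are the rounded estimates
# used elsewhere in the solution, so the sums of squares are approximate.
y = [193, 230, 172, 91, 113, 125]
beta_hat = [71.33, 24.16, -10.7]
xty = [924, 5960.531, 5055.35]

n = len(y)
k = len(beta_hat) - 1                 # number of regressors

correction = sum(y) ** 2 / n          # (sum of y_i)^2 / n
ssr = sum(b * v for b, v in zip(beta_hat, xty)) - correction
sst = sum(v * v for v in y) - correction
sse = sst - ssr                       # error sum of squares

msr = ssr / k                         # regression mean square, k df
mse = sse / (n - k - 1)               # error mean square, n - k - 1 df
f0 = msr / mse

# ASSUMPTION: alpha = 0.05, critical value taken from standard F tables.
F_CRIT = 9.55

print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"MSE = {mse:.2f}, F0 = {f0:.2f}, reject H0: {f0 > F_CRIT}")
```

The MSE obtained this way is close to the 195.08 used in the coefficient tests, and the F statistic is far above the assumed critical value, which points to rejecting $H_0: \beta_1 = \beta_2 = 0$.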