
Topic 3: Variable Selection

1. Consider a dataset with 10 predictors and 1 response variable. If we apply the best subset selection
method, how many models must be fitted (including the intercept-only model)?
A. 1024
B. 10
C. 20
D. 2048

Solution: $2^{10} = 1024$, since each of the 10 predictors is either included in or excluded from a model.
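The count can be checked by brute-force enumeration. The sketch below is only an illustration: the predictor labels X1 to X10 are assumed names, and the empty subset stands for the intercept-only model.

```python
from itertools import combinations

# Hypothetical labels for the 10 predictors (names are an assumption).
predictors = [f"X{i}" for i in range(1, 11)]

# Count the subsets of every size k = 0..10; k = 0 is the intercept-only model.
counts_by_size = {
    k: sum(1 for _ in combinations(predictors, k))
    for k in range(len(predictors) + 1)
}
total_models = sum(counts_by_size.values())

print(total_models)  # 1024
```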

2. Consider a dataset with 10 predictors and 1 response variable. If we apply the best subset selection
method, how many models must be fitted for the case of 1 predictor?
A. 1
B. 10
C. 11
D. 1024

Solution: There are $\binom{10}{1} = 10$ ways to choose 1 variable out of 10.

3. Consider a dataset with 10 predictors and 1 response variable. If we apply the forward selection
method, how many models must be fitted (including the intercept-only model)?
A. 1024
B. 1023
C. 56
D. 55

Solution: $1 + \frac{10(11)}{2} = 56$
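The same total can be built up step by step. This is a minimal sketch of the counting argument only; the variable names are illustrative.

```python
p = 10  # number of predictors

# Step 0 fits the intercept-only model. At step k (k = 1..p), forward
# selection fits one candidate model for each of the p - k + 1 predictors
# that have not been selected yet.
models_per_step = [p - k + 1 for k in range(1, p + 1)]
forward_total = 1 + sum(models_per_step)

print(models_per_step)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
print(forward_total)    # 56
```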
[4-6.] A dataset includes 1 response variable (Y) and 4 predictors (X1 to X4). All the possible models
are given below:

4. According to the best subset selection method, which model should we pick for the case of 2
predictors?
A. (X1, X2)
B. (X1, X3)
C. (X3, X4)
D. (X2, X3)

Solution: (X1, X2) yields the highest R-squared among all models with 2 predictors.

5. According to the best subset selection method, which model is the single best model if adjusted
R-squared is used as the selection criterion?
A. (X1, X2)
B. (X1, X2, X3)
C. (X1, X3, X4)
D. (X1, X2, X3, X4)

Solution:
Best model for 1 predictor: (X1)
Best model for 2 predictors: (X1, X2)
Best model for 3 predictors: (X1, X3, X4)
Best model for 4 predictors: (X1, X2, X3, X4)
Among these four candidates, (X1, X2, X3, X4) has the highest adjusted R-squared.
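For context, adjusted R-squared penalizes model size, which is why it can be used to compare models with different numbers of predictors while plain R-squared cannot. A standard definition is shown below; the symbols $n$ (number of observations) and $p$ (number of predictors) do not appear in the original and are introduced here only for illustration.

```latex
\[
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
\]
```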

6. If we apply the forward selection method, in what order are the predictors selected (from the
first to the last one)?
A. X1>X2>X3>X4
B. X2>X3>X1>X4
C. X3>X1>X2>X4
D. X4>X2>X1>X3

Solution:
Best model for 1 predictor: (X1)
Best model for 2 predictors holding X1: (X1, X2)
Best model for 3 predictors holding X1, X2: (X1, X2, X3)
Best model for 4 predictors holding X1, X2, X3: (X1, X2, X3, X4)
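The greedy procedure can be sketched as follows. The R-squared values in this example are HYPOTHETICAL, made up for illustration and not taken from the question's table; they are chosen only so that the outcome matches the solutions above. The function name `forward_selection` is likewise an assumption.

```python
# HYPOTHETICAL R-squared values for every non-empty subset of X1..X4.
# These are illustrative numbers, not the values from the question.
r2 = {frozenset(k): v for k, v in {
    ("X1",): 0.60, ("X2",): 0.40, ("X3",): 0.35, ("X4",): 0.30,
    ("X1", "X2"): 0.75, ("X1", "X3"): 0.70, ("X1", "X4"): 0.68,
    ("X2", "X3"): 0.55, ("X2", "X4"): 0.50, ("X3", "X4"): 0.65,
    ("X1", "X2", "X3"): 0.80, ("X1", "X2", "X4"): 0.78,
    ("X1", "X3", "X4"): 0.85, ("X2", "X3", "X4"): 0.70,
    ("X1", "X2", "X3", "X4"): 0.90,
}.items()}


def forward_selection(r2, predictors):
    """Greedily add the predictor that gives the highest R-squared."""
    selected = []
    remaining = list(predictors)
    while remaining:
        best = max(remaining, key=lambda x: r2[frozenset(selected + [x])])
        selected.append(best)
        remaining.remove(best)
    return selected


order = forward_selection(r2, ["X1", "X2", "X3", "X4"])
# Best subset of size 3 considers all 3-predictor models, not only nested ones.
best_three = max((s for s in r2 if len(s) == 3), key=r2.get)

print(order)               # ['X1', 'X2', 'X3', 'X4']
print(sorted(best_three))  # ['X1', 'X3', 'X4']
```

With these made-up numbers, forward selection reaches (X1, X2, X3) at step 3 even though best subset selection would pick (X1, X3, X4), because forward selection never drops a predictor once it has been added.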
Topic 4: Classification
[7-8.] Consider the following confusion table after applying a classification method:

                   Real Y = 0    Real Y = 1
Predicted Y = 0       26946          1911
Predicted Y = 1        2249          1844
7. What is the accuracy?


A. 0.1263
B. 0.0770
C. 0.4911
D. 0.8737

Solution: $\text{Accuracy} = \frac{26946 + 1844}{26946 + 1844 + 1911 + 2249} = 0.873748$
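The calculation can be reproduced in a few lines. The names `tn`, `fn`, `fp`, and `tp` are not in the original; they assume that Y = 1 is treated as the positive class.

```python
# Counts from the confusion table (rows: predicted, columns: real).
# Assumption: Y = 1 is the positive class.
tn = 26946  # predicted 0, real 0
fn = 1911   # predicted 0, real 1
fp = 2249   # predicted 1, real 0
tp = 1844   # predicted 1, real 1

# Accuracy is the share of all observations that are classified correctly.
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"{accuracy:.4f}")  # 0.8737
```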

8. Find the TPR (True Positive Rate) and FPR (False Positive Rate).
A. TPR = 0.5089 & FPR = 0.9230
B. TPR = 0.4911 & FPR = 0.0770
C. TPR = 0.9230 & FPR = 0.5089
D. TPR = 0.0770 & FPR = 0.4911

Solution:
$\text{TPR} = \frac{1844}{1844 + 1911} = 0.491079$
$\text{FPR} = \frac{2249}{26946 + 2249} = 0.077033$

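As a quick check of both rates, here is a short sketch. It again assumes that Y = 1 is the positive class; the variable names are illustrative.

```python
# Counts from the confusion table, with Y = 1 assumed to be the positive class.
tp, fn = 1844, 1911    # real Y = 1: predicted 1 / predicted 0
fp, tn = 2249, 26946   # real Y = 0: predicted 1 / predicted 0

tpr = tp / (tp + fn)   # share of real positives that are predicted positive
fpr = fp / (fp + tn)   # share of real negatives that are predicted positive

print(f"TPR = {tpr:.4f}, FPR = {fpr:.4f}")  # TPR = 0.4911, FPR = 0.0770
```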
9. Which of the following graphs is the ROC Chart of a random model?

A.

B.

C.
D. None of the above.

[10-12.] A logistic regression model gives the probability that $Y = 1$ given the predictor $X_1$:

$P(Y = 1 \mid X_1) = \frac{1}{1 + \exp(-(-4.0777 + 1.5046 X_1))}$

10. Which of the following is the log-odds form of the above model?
A. $\log\left(\frac{p}{1-p}\right) = 4.0777 - 1.5046 X_1$
B. $\log\left(\frac{p}{1-p}\right) = -4.0777 + 1.5046 X_1$
C. $P(Y = 1 \mid X_1) = \frac{1}{1 + \exp(-4.0777 + 1.5046 X_1)}$
D. None of the above.

Solution: The conditional probability $P(Y = 1 \mid X_1)$ has the form
$P(Y = 1 \mid X_1) = \frac{1}{1 + \exp(-(\beta_0 + \beta_1 X_1))}$, where $\beta_0 = -4.0777$ and $\beta_1 = 1.5046$.
Therefore, the log-odds form $\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 X_1$ is $\log\left(\frac{p}{1-p}\right) = -4.0777 + 1.5046 X_1$.

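The step from the probability form to the log-odds form can be written out explicitly. In the derivation below, $z$ is shorthand introduced here (it is not in the original) for the linear predictor $\beta_0 + \beta_1 X_1$.

```latex
\begin{align*}
p &= \frac{1}{1 + e^{-z}}, \qquad z = \beta_0 + \beta_1 X_1 \\
1 - p &= \frac{e^{-z}}{1 + e^{-z}} \\
\frac{p}{1 - p} &= \frac{1}{e^{-z}} = e^{z} \\
\log\left(\frac{p}{1 - p}\right) &= z = \beta_0 + \beta_1 X_1
\end{align*}
```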
11. What is the predicted probability that $Y = 1$ given $X_1 = 5.5$?
A. 0.9852
B. 0.0148
C. 0.8744
D. Cannot be determined.

Solution:
$P(Y = 1 \mid X_1 = 5.5) = \frac{1}{1 + \exp(-(-4.0777 + 1.5046 \times 5.5))} = \frac{1}{1 + 0.015032} = 0.9852$
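The arithmetic can be verified numerically. This is a minimal sketch using only the coefficients stated in the model; the variable names are illustrative.

```python
import math

beta0, beta1 = -4.0777, 1.5046  # coefficients from the model above
x1 = 5.5

z = beta0 + beta1 * x1           # linear predictor, i.e. the log-odds
p = 1.0 / (1.0 + math.exp(-z))   # logistic function applied to z

print(f"log-odds    = {z:.4f}")  # log-odds    = 4.1976
print(f"probability = {p:.4f}")  # probability = 0.9852
```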

12. Which method is used to estimate the coefficients of this logistic regression model?
A. Least Squares Estimation
B. Method of Moments
C. Maximum Likelihood Estimation
D. None of the above
Topic 5: Clustering
13. The standardized data below are obtained by standardizing the original data, using the sample
standard deviation:

Original data:

Index   𝑋1   𝑋2
1       1    3
2       5    1
3       4    0

Standardized data:

Index   𝑋1        𝑋2
1       -1.1209   1.0911
2       𝑎         -0.2182
3       0.3203    𝑏

Given that $\bar{x}_1 = 3.3333$, $\bar{x}_2 = 1.3333$, $s_1 = 2.0817$, and $s_2 = 1.5275$, what are the values of $a$ and
$b$, respectively?
A. 𝑎 = 0.8006, 𝑏 = −0.8729
B. 𝑎 = 1.1209, 𝑏 = −1.0911
C. 𝑎 = 3.3333, 𝑏 = −1.3333
D. 𝑎 = 2.8017, 𝑏 = −1.5275

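Solution:
$\bar{x}_1 = \frac{1 + 5 + 4}{3} = \frac{10}{3}$, $\bar{x}_2 = \frac{3 + 1 + 0}{3} = \frac{4}{3}$

$s_1 = \sqrt{\frac{\left(1 - \frac{10}{3}\right)^2 + \left(5 - \frac{10}{3}\right)^2 + \left(4 - \frac{10}{3}\right)^2}{3 - 1}} = \sqrt{\frac{13}{3}}$

$s_2 = \sqrt{\frac{\left(3 - \frac{4}{3}\right)^2 + \left(1 - \frac{4}{3}\right)^2 + \left(0 - \frac{4}{3}\right)^2}{3 - 1}} = \sqrt{\frac{7}{3}}$

$a = \frac{5 - \frac{10}{3}}{\sqrt{13/3}} = 0.8006$, $b = \frac{0 - \frac{4}{3}}{\sqrt{7/3}} = -0.8729$
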
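The standardization can be reproduced with the Python standard library. This is a sketch rather than a prescribed method: `statistics.stdev` computes the sample standard deviation (divisor n - 1), which matches the convention stated in the question, and the helper name `standardize` is an assumption.

```python
from statistics import mean, stdev

x1 = [1, 5, 4]  # original X1 column
x2 = [3, 1, 0]  # original X2 column


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation, divisor n - 1
    return [(v - m) / s for v in values]


z1 = standardize(x1)
z2 = standardize(x2)

a = z1[1]  # standardized X1 for index 2
b = z2[2]  # standardized X2 for index 3

print(f"a = {a:.4f}, b = {b:.4f}")  # a = 0.8006, b = -0.8729
```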
14. The elbow plot of a k-means clustering model is provided below:

What is the optimal number of clusters?


A. 1
B. 2
C. 3
D. 4

Solution: From the elbow plot, the within-cluster sum of squares (WSS) decreases only marginally after $k = 2$.

15. A dataset with the labels {0, 1} clustered by a clustering method is given below:

Index 𝑋1 𝑋2 Label
1 1 -1 1
2 0 1 1
3 2 2 0

What is the coordinate of the centroid for label 1? ((1, 0) means that the value of $X_1$ is 1 and the value
of $X_2$ is 0.)
A. $(1, 0)$
B. $\left(\frac{1}{2}, 0\right)$
C. $\left(1, \frac{2}{3}\right)$
D. $(3, 2)$

Solution: Coordinate: $\left(\frac{1 + 0}{2}, \frac{-1 + 1}{2}\right) = \left(\frac{1}{2}, 0\right)$

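The centroid is the coordinate-wise mean of the points that carry the label. A minimal sketch follows; the helper name `centroid` and the tuple layout are assumptions.

```python
# Rows of the table as (X1, X2, label).
points = [(1, -1, 1), (0, 1, 1), (2, 2, 0)]


def centroid(points, label):
    """Average the coordinates of all points that carry the given label."""
    members = [(x1, x2) for x1, x2, lab in points if lab == label]
    n = len(members)
    return (sum(x1 for x1, _ in members) / n,
            sum(x2 for _, x2 in members) / n)


print(centroid(points, 1))  # (0.5, 0.0)
```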
Optional:
[16-18.] A dataset is given below; only the last two observations are shown.

Point 𝑋1 𝑋2
⋮ ⋮ ⋮
E 6 3
F 4 0

When we build a hierarchical clustering model, we first construct a distance matrix, which is
listed below. For example, according to this matrix, the distance between point A and
point B is 1.

Point   A     B     C     D     E     F
A       0
B       1     0
C       2     √5    0
D       5     √20   √13   0
E       √26   5     √10   √5    0
F       5     √18   √17   √2    𝑎     0

Then, we can group A and B together as a new cluster as the distance is the minimum among all
the distances. The new distance matrix is listed below:

Point    (A, B)   C     D     E     F
(A, B)   0
C        2        0
D        √20      √13   0
E        5        √10   √5    0
F        √18      √17   √2    𝑎     0

16. What is the value of 𝑎? (The answer is needed for question 18.)
A. √5
B. √13
C. √3
D. √29

Solution: Distance: $\sqrt{(6 - 4)^2 + (3 - 0)^2} = \sqrt{13}$
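The Euclidean distance between E and F can be checked with `math.dist` (available in Python 3.8 and later). This is just a numerical sanity check of the formula above.

```python
import math

E = (6, 3)
F = (4, 0)

# Euclidean distance between the two points.
a = math.dist(E, F)

print(math.isclose(a, math.sqrt(13)))  # True
```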


17. Which linkage method is used?
A. Single linkage
B. Complete linkage
C. Average linkage
D. Cannot be determined

Solution: Single linkage. In the new distance matrix, the distance between (A, B) and C is 2, which is
the minimum of the distance between A and C (= 2) and that between B and C (= √5).
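The same comparison can be made for every remaining point, not only C. The sketch below tests three candidate update rules against the new matrix; the dictionary names are illustrative, and "average" here means the mean of the two point-to-point distances, which is what average linkage reduces to for a two-point cluster.

```python
import math

# Distances from A and from B to each remaining point (original matrix).
d_A = {"C": 2.0, "D": 5.0, "E": math.sqrt(26), "F": 5.0}
d_B = {"C": math.sqrt(5), "D": math.sqrt(20), "E": 5.0, "F": math.sqrt(18)}

# Distances from the merged cluster (A, B) in the new matrix.
d_AB = {"C": 2.0, "D": math.sqrt(20), "E": 5.0, "F": math.sqrt(18)}

# Candidate rules for combining the two distances after merging A and B.
linkages = {
    "single": min,
    "complete": max,
    "average": lambda u, v: (u + v) / 2,
}

# A rule matches if it reproduces every entry of the new matrix.
matches = {
    name: all(math.isclose(rule(d_A[k], d_B[k]), d_AB[k]) for k in d_AB)
    for name, rule in linkages.items()
}

print(matches)  # {'single': True, 'complete': False, 'average': False}
```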

18. Based on the new distance matrix, which two observations/clusters should be clustered together?
A. E and F
B. (A, B) and C
C. (A, B) and E
D. D and F

Solution: From the new distance matrix, the distance between D and F (√2) is the minimum.
They are grouped as a new cluster.
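The next merge can be found by scanning the new matrix for its smallest off-diagonal entry. The sketch below uses 𝑎 = √13 from question 16; the dictionary layout is an assumption made for illustration.

```python
import math

sqrt = math.sqrt

# Off-diagonal entries of the new distance matrix, with a = sqrt(13).
new_matrix = {
    ("(A, B)", "C"): 2.0,
    ("(A, B)", "D"): sqrt(20), ("C", "D"): sqrt(13),
    ("(A, B)", "E"): 5.0, ("C", "E"): sqrt(10), ("D", "E"): sqrt(5),
    ("(A, B)", "F"): sqrt(18), ("C", "F"): sqrt(17),
    ("D", "F"): sqrt(2), ("E", "F"): sqrt(13),
}

# The pair with the smallest distance is merged next.
closest_pair = min(new_matrix, key=new_matrix.get)

print(closest_pair)  # ('D', 'F')
```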
