
INDIAN INSTITUTE OF TECHNOLOGY, KHARAGPUR

Department of Industrial Engineering and Management


Class Test 5

Subject Number: IM31202 Subject Name: Statistical Learning with Applications


Full Marks: 30 Time: 1 hour Date: 13.04.2023

Instructions:
1. Attempt all questions.
2. Maximum marks are shown against each question.
3. Answers should be short and to the point.

1. Explain cross-entropy and the Gini index. How are they used to measure the performance of
decision trees? (5)
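As a minimal sketch of the two measures named in question 1, assuming the standard node-impurity definitions for class proportions p_k (Gini index = sum of p_k(1 − p_k), cross-entropy = −sum of p_k log p_k). The function names and the example proportions are illustrative and not part of the test:

```python
import math

def gini_index(proportions):
    """Gini index of a node: sum over classes of p_k * (1 - p_k)."""
    return sum(p * (1.0 - p) for p in proportions)

def cross_entropy(proportions):
    """Cross-entropy of a node: sum over classes of -p_k * log(p_k).

    Classes with p_k == 0 are skipped (0 * log 0 is treated as 0).
    """
    return sum(-p * math.log(p) for p in proportions if p > 0)

# Hypothetical two-class nodes: one pure, one evenly mixed.
pure_node = [1.0, 0.0]
mixed_node = [0.5, 0.5]

for name, node in [("pure", pure_node), ("mixed", mixed_node)]:
    print(name, gini_index(node), cross_entropy(node))
```

Both measures are zero for a pure node and largest for an evenly mixed one, which is why a smaller value indicates a better split.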

2. Determine the radial kernel values among the 3 points 𝑎: (10,2), 𝑏: (3,5) and 𝑐: (5,7). What
information is derived from the radial kernel values? Comment on which of the points 𝑎 and 𝑏
has greater influence on point 𝑐. Assume 𝛾 = 1. (10)
$$K(a, b) = \exp\left(-\gamma \sum_{j=1}^{p} (a_j - b_j)^2\right) = \exp\left(-1\left((10 - 3)^2 + (2 - 5)^2\right)\right)$$

K(a,b) = 6.47023E-26
K(b,c) = 0.000335463
K(a,c) = 1.92875E-22

The radial kernel values measure similarity: a value close to 0 means the two points are far
apart and have little influence on each other. K(a,b) and K(a,c) are practically zero, so point a
is far from both b and c. K(b,c) is many orders of magnitude larger, so b and c are relatively
close and may have some influence on each other.
Since K(b,c) > K(a,c), point b has greater influence on point c than point a does.
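The kernel values can be reproduced with a short script. This is a minimal sketch using the points and 𝛾 = 1 from the question; the function name radial_kernel is illustrative:

```python
import math

def radial_kernel(x, y, gamma=1.0):
    """Radial kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((xj - yj) ** 2 for xj, yj in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Points from the question.
a, b, c = (10, 2), (3, 5), (5, 7)

k_ab = radial_kernel(a, b)  # exp(-58)
k_bc = radial_kernel(b, c)  # exp(-8)
k_ac = radial_kernel(a, c)  # exp(-50)

print(f"K(a,b) = {k_ab:.5e}")
print(f"K(b,c) = {k_bc:.5e}")
print(f"K(a,c) = {k_ac:.5e}")

# The point with the larger kernel value relative to c is the more influential one.
print("b has greater influence on c:", k_bc > k_ac)
```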

3. Form 2 clusters based on the radial kernel values in problem 2. The clusters should be formed
based on similarity. Determine the complete and centroid linkages between these 2 clusters.
(10)

Based on the radial kernel similarity, the two most similar points are b and c, since K(b,c) is
the largest kernel value. Hence the two clusters are C1: {a} and C2: {b,c}.

Euclidean distance
D(a,b) = 7.616
D(a,c) = 7.071

Complete linkage: max(𝐷(𝑎, 𝑏), 𝐷(𝑎, 𝑐)) = 7.616 (using the Euclidean distance measure)
Centroid of C1: (10,2)
Centroid of C2: (4,6)

Centroid linkage: Euclidean distance between the centroids = 7.211
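The two linkages can be checked numerically. This is a minimal sketch assuming Euclidean distance as the dissimilarity measure and the clusters C1 = {a}, C2 = {b, c}; the helper centroid and the variable names are illustrative:

```python
import math

# Points from problem 2 and the two clusters formed from them.
a, b, c = (10, 2), (3, 5), (5, 7)
cluster_1 = [a]
cluster_2 = [b, c]

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

# Complete linkage: largest pairwise distance between the two clusters.
complete_linkage = max(
    math.dist(p, q) for p in cluster_1 for q in cluster_2
)

# Centroid linkage: distance between the two cluster centroids.
centroid_linkage = math.dist(centroid(cluster_1), centroid(cluster_2))

print(f"Complete linkage: {complete_linkage:.3f}")
print(f"Centroid of C2:   {centroid(cluster_2)}")
print(f"Centroid linkage: {centroid_linkage:.3f}")
```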

4. The result below shows the standard deviations associated with the first four principal
components (PCs) of the mtcars data (11 features).

PC1      PC2   PC3      PC4
2.5707   1.6   0.79196  0.51923

Determine the associated PVE and cumulative PVE of the PCs. (5)
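The proportion of variance explained (PVE) of each PC is its variance (the squared standard deviation) divided by the total variance. A minimal sketch follows. It assumes the 11 features were standardized before PCA, so that the total variance equals 11. That assumption is not stated in the question, and the variable names are illustrative:

```python
from itertools import accumulate

# Standard deviations of the first four PCs, as given in the question.
sdev = [2.5707, 1.6, 0.79196, 0.51923]

# Assumption: features were scaled to unit variance, so total variance = 11.
n_features = 11
total_variance = float(n_features)

# PVE of each PC: variance of the PC divided by the total variance.
pve = [s ** 2 / total_variance for s in sdev]

# Cumulative PVE: running sum of the individual PVEs.
cumulative_pve = list(accumulate(pve))

for i, (p, cp) in enumerate(zip(pve, cumulative_pve), start=1):
    print(f"PC{i}: PVE = {p:.4f}, cumulative PVE = {cp:.4f}")
```

If the data were not standardized, the total variance would instead be the sum of the variances of all 11 PCs, which cannot be recovered from the four values shown.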
