1. a) [14 marks]
(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. the similarity between two identical data samples) is 1. For
distance computation, use the Euclidean distance, rounded off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster sum of squares for the clusters obtained in
part-(iii).
Note: The mark weightage for Parts (i), (ii), (iii) and (iv) is 3, 7, 2 and 2, respectively.
Answer:
Part-(i):
      P1      P2      P3      P4      P5      P6
P1    1
P2    0.5000  1
P3    0.1907  0.2171  1
P4    0.0979  0.1054  0.1667  1
P5    0.0812  0.0860  0.1240  0.3090  1
P6    0.0806  0.0860  0.1218  0.3090  0.4142  1
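The entries above are consistent with converting a Euclidean distance d into a similarity via s = 1 / (1 + d). The answer does not state the conversion, so it is treated here as an assumption. A minimal Python sketch under that assumption, using the sample coordinates listed in Part-(iii):

```python
import math

# Coordinates as listed in Part-(iii). The conversion s = 1 / (1 + d)
# is an assumption inferred from the table, not stated in the answer.
points = {
    "P1": (1, 2), "P2": (2, 2), "P3": (4, 5),
    "P4": (8, 8), "P5": (9, 10), "P6": (10, 9),
}

def similarity(a, b):
    """Similarity in (0, 1]; identical samples give exactly 1."""
    d = round(math.dist(a, b), 4)  # Euclidean distance, 4 decimal places
    return round(1.0 / (1.0 + d), 4)

names = list(points)
sim = {(p, q): similarity(points[p], points[q]) for p in names for q in names}

# Print the lower triangle in the same layout as the answer.
for i, p in enumerate(names):
    row = [f"{sim[(p, q)]:.4f}" for q in names[: i + 1]]
    print(p, *row)
```

Under this assumption the computed entries agree with the table, except that P3-P5 evaluates to 0.1239 rather than the 0.1240 shown.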
Part-(ii):
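The dendrogram itself is not reproduced here. As a rough sketch of how the merge order can be derived from the Part-(i) similarities, single linkage repeatedly merges the two clusters whose most similar pair of members has the highest similarity. The function name `single_linkage` and the dictionary layout are illustrative choices, not part of the original answer.

```python
# Lower-triangle similarities copied from Part-(i).
sim = {
    ("P1", "P2"): 0.5000,
    ("P1", "P3"): 0.1907, ("P2", "P3"): 0.2171,
    ("P1", "P4"): 0.0979, ("P2", "P4"): 0.1054, ("P3", "P4"): 0.1667,
    ("P1", "P5"): 0.0812, ("P2", "P5"): 0.0860, ("P3", "P5"): 0.1240,
    ("P4", "P5"): 0.3090,
    ("P1", "P6"): 0.0806, ("P2", "P6"): 0.0860, ("P3", "P6"): 0.1218,
    ("P4", "P6"): 0.3090, ("P5", "P6"): 0.4142,
}

def pair_sim(a, b):
    """Look up a similarity regardless of the order of the two names."""
    return sim.get((a, b), sim.get((b, a)))

def single_linkage(names, n_clusters=1):
    """Merge until n_clusters remain; return the clusters and merge log."""
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: cluster similarity = MAX pairwise similarity.
                s = max(pair_sim(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        s, i, j = best
        merges.append((clusters[i], clusters[j], s))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, merges

names = ["P1", "P2", "P3", "P4", "P5", "P6"]
_, merges = single_linkage(names)
for left, right, s in merges:
    print(f"merge {left} + {right} at similarity {s:.4f}")

four_clusters, _ = single_linkage(names, n_clusters=4)
print(four_clusters)
```

The similarity recorded at each merge is the height at which the corresponding branches join in the dendrogram. Stopping at four clusters leaves {P1, P2}, {P3}, {P4} and {P5, P6}, which agrees with Part-(iii).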
Part-(iii):
4-clusters:
P12: (1,2), (2,2)
P3: (4,5)
P4: (8,8)
P56: (9,10), (10,9)
Part-(iv):
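The worked solution for this part is not reproduced here. A minimal sketch of the computation, assuming the usual definition (the sum, over all clusters, of the squared Euclidean distances from each sample to its cluster centroid) and the clusters listed in Part-(iii):

```python
# Clusters as listed in Part-(iii).
clusters = {
    "P12": [(1, 2), (2, 2)],
    "P3": [(4, 5)],
    "P4": [(8, 8)],
    "P56": [(9, 10), (10, 9)],
}

def wcss(samples):
    """Sum of squared distances from each sample to the cluster centroid."""
    n = len(samples)
    centroid = tuple(sum(coord) / n for coord in zip(*samples))
    return sum(
        sum((x - m) ** 2 for x, m in zip(p, centroid))
        for p in samples
    )

per_cluster = {name: wcss(pts) for name, pts in clusters.items()}
total = sum(per_cluster.values())

print(per_cluster)            # singleton clusters contribute 0
print("total WCSS =", total)  # 0.5 + 0 + 0 + 1.0 = 1.5 under this definition
```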
b) [14 marks]
(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. the similarity between two identical data samples) is 1. For
distance computation, use the Euclidean distance, rounded off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster sum of squares for the clusters obtained in
part-(iii).
Note: The mark weightage for Parts (i), (ii), (iii) and (iv) is 3, 7, 2 and 2, respectively.
Answer:
Part-(i):
      P1      P2      P3      P4      P5      P6
P1    1
P2    0.5000  1
P3    0.1907  0.2171  1
P4    0.0860  0.0917  0.1351  1
P5    0.0724  0.0767  0.1041  0.3090  1
P6    0.0728  0.0767  0.1054  0.3090  0.4142  1
Part-(ii):
Part-(iii):
4-clusters:
P12: (2,1), (2,2)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)
Part-(iv):
c) [14 marks]
(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. the similarity between two identical data samples) is 1. For
distance computation, use the Euclidean distance, rounded off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster sum of squares for the clusters obtained in
part-(iii).
Note: The mark weightage for Parts (i), (ii), (iii) and (iv) is 3, 7, 2 and 2, respectively.
Answer:
Part-(i):
      P1      P2      P3      P4      P5      P6
P1    1
P2    0.5000  1
P3    0.1667  0.1907  1
P4    0.0812  0.0860  0.1351  1
P5    0.0692  0.0724  0.1041  0.3090  1
P6    0.0692  0.0728  0.1054  0.3090  0.4142  1
Part-(ii):
Part-(iii):
4-clusters:
P12: (1,1), (2,1)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)
Part-(iv):
d) [14 marks]
(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. the similarity between two identical data samples) is 1. For
distance computation, use the Euclidean distance, rounded off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster sum of squares for the clusters obtained in
part-(iii).
Note: The mark weightage for Parts (i), (ii), (iii) and (iv) is 3, 7, 2 and 2, respectively.
Answer:
Part-(i):
      P1      P2      P3      P4      P5      P6
P1    1
P2    0.5000  1
P3    0.1907  0.2171  1
P4    0.0979  0.1054  0.1667  1
P5    0.0812  0.0860  0.1240  0.3090  1
P6    0.0806  0.0860  0.1218  0.3090  0.4142  1
Part-(ii):
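The dendrogram itself is not reproduced here. Complete linkage differs from single linkage only in how two clusters are compared: the similarity of two clusters is that of their least similar pair of members. A rough sketch using the Part-(i) similarities above; `complete_linkage` and `link` are illustrative names, not part of the original answer.

```python
from itertools import combinations

names = ["P1", "P2", "P3", "P4", "P5", "P6"]

# Lower triangle of the Part-(i) similarity matrix, one row per sample.
rows = [
    [],
    [0.5000],
    [0.1907, 0.2171],
    [0.0979, 0.1054, 0.1667],
    [0.0812, 0.0860, 0.1240, 0.3090],
    [0.0806, 0.0860, 0.1218, 0.3090, 0.4142],
]
sim = {
    frozenset((names[i], names[j])): s
    for i, row in enumerate(rows)
    for j, s in enumerate(row)
}

def link(pair):
    """Complete linkage: cluster similarity = MIN pairwise similarity."""
    a, b = pair
    return min(sim[frozenset((x, y))] for x in a for y in b)

def complete_linkage(names, n_clusters=1):
    """Merge until n_clusters remain; return clusters and merge heights."""
    clusters = [frozenset([n]) for n in names]
    heights = []
    while len(clusters) > n_clusters:
        # Pick the pair of clusters with the highest complete-linkage similarity.
        a, b = max(combinations(clusters, 2), key=link)
        heights.append(link((a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters, heights

_, heights = complete_linkage(names)
print(heights)

four_clusters, _ = complete_linkage(names, n_clusters=4)
print([sorted(c) for c in four_clusters])
```

For this data the first two merges are the same as under single linkage, so cutting at four clusters gives the same grouping; the later merge heights differ because the least similar pair is used.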
Part-(iii):
4-clusters:
P12: (1,2), (2,2)
P3: (4,5)
P4: (8,8)
P56: (9,10), (10,9)
Part-(iv):
Note: The mark weightage for Parts (i), (ii), (iii) and (iv) is 3, 7, 2 and 2, respectively.
Answer:
Part-(i):
      P1      P2      P3      P4      P5      P6
P1    1
P2    0.5000  1
P3    0.1907  0.2171  1
P4    0.0860  0.0917  0.1351  1
P5    0.0724  0.0767  0.1041  0.3090  1
P6    0.0728  0.0767  0.1054  0.3090  0.4142  1
Part-(ii):
Part-(iii):
4-clusters:
P12: (2,1), (2,2)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)
Part-(iv):
f) [14 marks]
Consider the following training data samples:
(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. the similarity between two identical data samples) is 1. For
distance computation, use the Euclidean distance, rounded off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster sum of squares for the clusters obtained in
part-(iii).
Note: The mark weightage for Parts (i), (ii), (iii) and (iv) is 3, 7, 2 and 2, respectively.
Answer:
Part-(i):
      P1      P2      P3      P4      P5      P6
P1    1
P2    0.5000  1
P3    0.1667  0.1907  1
P4    0.0812  0.0860  0.1351  1
P5    0.0692  0.0724  0.1041  0.3090  1
P6    0.0692  0.0728  0.1054  0.3090  0.4142  1
Part-(ii):
Part-(iii):
4-clusters:
P12: (1,1), (2,1)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)
Part-(iv):
Fix the step size at 0.12 and the starting point at [1 1]^T.
Note: The mark weightage for Parts (i), (ii) and (iii) is 7, 5 and 2, respectively.
Answer:
𝑘 1 2 3 4
Consider minimizing the following function using gradient descent optimizer:
Fix the step size at 0.13 and the starting point at [1 1]^T.
Note: The mark weightage for Parts (i), (ii) and (iii) is 7, 5 and 2, respectively.
Answer:
𝑘 1 2 3 4
Fix the step size at 0.12 and the starting point at [1 1]^T.
Note: The mark weightage for Parts (i), (ii) and (iii) is 7, 5 and 2, respectively.
Answer:
𝑘 1 2 3 4 5 6
𝑘 1 2 3 4
Fix the step size at 0.13 and the starting point at [1 1]^T.
Note: The mark weightage for Parts (i), (ii) and (iii) is 7, 5 and 2, respectively.
Answer:
d) [14 marks]
Part-(i): Gradient descent
𝑘 1 2 3 4 5 6
𝑘 1 2 3 4
Part-(iii): Momentum leads to faster convergence.
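The objective function and the iteration tables for this question are not reproduced here, so the comparison can only be illustrated on a stand-in. The sketch below assumes a hypothetical objective f(x) = x1^2 + 2*x2^2, a momentum coefficient of 0.3 and a stopping tolerance of 1e-6 (all three are assumptions for illustration); only the step size 0.12 and the starting point [1 1]^T come from the question.

```python
import math

def grad(x):
    # Gradient of the HYPOTHETICAL objective f(x) = x1^2 + 2*x2^2.
    return (2.0 * x[0], 4.0 * x[1])

def run(step, beta, x0=(1.0, 1.0), tol=1e-6, max_iter=10_000):
    """Gradient descent with (heavy-ball) momentum; beta = 0 is plain GD.

    Returns the number of iterations needed for the gradient norm to
    drop below tol, together with the final point.
    """
    x, v = list(x0), [0.0, 0.0]
    for k in range(1, max_iter + 1):
        g = grad(x)
        # Velocity update: keep a fraction beta of the previous step.
        v = [beta * vi - step * gi for vi, gi in zip(v, g)]
        x = [xi + vi for xi, vi in zip(x, v)]
        if math.hypot(*grad(x)) < tol:
            return k, x
    return max_iter, x

iters_gd, x_gd = run(step=0.12, beta=0.0)    # plain gradient descent
iters_mom, x_mom = run(step=0.12, beta=0.3)  # momentum (assumed coefficient)

print("gradient descent iterations:", iters_gd)
print("momentum iterations:        ", iters_mom)
```

On this stand-in objective the momentum run reaches the tolerance in fewer iterations than plain gradient descent. How large the gain is, and whether there is one at all, depends on the objective, the step size and the momentum coefficient.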
3. a) [2 marks]
Suppose you are working as a Customer Relationship Manager in a company that sells
headphones, and you have collected anonymous feedback from customers. Discuss
how to use clustering as a pre-processing step for supervised learning.
Answer: Clustering partitions the unlabelled data into clusters/groups. Each cluster can
then be labelled using domain knowledge, after which supervised learning is possible
on the labelled data.
b) [2 marks]
Consider the task of detecting Parkinson's disease, for which you are given unlabelled
data. Discuss how to use clustering as a pre-processing step for supervised learning.
Answer: Clustering partitions the unlabelled data into clusters/groups. Each cluster can
then be labelled using domain knowledge, after which supervised learning is possible
on the labelled data.
c) [2 marks]
Consider the task of identifying customers who will default on their loan repayment. If
you are given unlabelled data, discuss how to use clustering as a pre-processing
step for supervised learning.
Answer: Clustering partitions the unlabelled data into clusters/groups. Each cluster can
then be labelled using domain knowledge, after which supervised learning is possible
on the labelled data.
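The cluster, label, then train pipeline described in these answers can be sketched as follows. Everything concrete here is hypothetical and for illustration only: the 2-D feature vectors, the hand-picked initial centroids, the two label names, and the choice of a tiny k-means followed by a 1-nearest-neighbour classifier.

```python
import math

# HYPOTHETICAL 2-D feature vectors for unlabelled samples.
X = [(1.0, 1.2), (0.8, 1.0), (1.1, 0.9), (5.0, 5.2), (5.3, 4.9), (4.8, 5.1)]

def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))

def kmeans(X, centroids, iters=10):
    """A bare-bones k-means: alternate assignment and centroid update."""
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in X:
            groups[nearest(p, centroids)].append(p)
        centroids = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids

# Step 1: cluster the unlabelled data (initial centroids chosen by hand).
centroids = kmeans(X, centroids=[X[0], X[3]])

# Step 2: a domain expert inspects each cluster and names it.
cluster_labels = ["negative feedback", "positive feedback"]  # hypothetical

# Step 3: the cluster names become training targets for a supervised model.
y = [cluster_labels[nearest(p, centroids)] for p in X]

def predict(p):
    """1-nearest-neighbour classifier trained on the pseudo-labelled data."""
    i = min(range(len(X)), key=lambda k: math.dist(p, X[k]))
    return y[i]

print(y)
print(predict((0.9, 1.1)), "|", predict((5.1, 5.0)))
```

The quality of the resulting classifier is bounded by how well the clusters correspond to the classes of interest, which is why the labelling step relies on domain knowledge.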