
CSE3506 – CAT-2 Answers

Q. No. | Sub-division | Question Text | Marks

Answer ALL THREE Questions (Total Marks = 30)

1. a) (14 marks) Consider the following training data samples:

(1,2), (2,2), (4,5), (8,8), (9,10), (10,9)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

     P1     P2     P3     P4     P5     P6
P1   1
P2   0.5000 1
P3   0.1907 0.2171 1
P4   0.0979 0.1054 0.1667 1
P5   0.0812 0.0860 0.1239 0.3090 1
P6   0.0806 0.0860 0.1218 0.3090 0.4142 1
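The tabulated values are consistent with the similarity measure s(p, q) = 1/(1 + d(p, q)), where d is the Euclidean distance, so that two identical samples score 1. The question does not name this formula explicitly, so it is inferred here from the numbers. A short Python sketch that reproduces the entries:

```python
import math

points = [(1, 2), (2, 2), (4, 5), (8, 8), (9, 10), (10, 9)]

def similarity(p, q):
    # Euclidean distance, then 1/(1 + d), so identical points score 1.
    d = math.dist(p, q)
    return round(1 / (1 + d), 4)

sim = [[similarity(p, q) for q in points] for p in points]
print(sim[1][0])  # 0.5
print(sim[5][4])  # 0.4142
```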

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.2171 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456
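The merge sequence above can be reproduced with a small agglomerative loop. This is one possible sketch, not the only implementation; under single linkage, the similarity between two clusters is the maximum pairwise similarity between their members.

```python
import math

points = [(1, 2), (2, 2), (4, 5), (8, 8), (9, 10), (10, 9)]

def sim(p, q):
    # Similarity inferred from part-(i): 1/(1 + Euclidean distance).
    return round(1 / (1 + math.dist(p, q)), 4)

def single_linkage_merges(points):
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: maximum similarity over member pairs.
                s = max(sim(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((s, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

for s, members in single_linkage_merges(points):
    # 0-based indices: [0, 1] is P12, [4, 5] is P56, and so on.
    print(s, members)
```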

Page 1 of 11
Part-(iii):

4-clusters:
P12: (1,2), (2,2)
P3: (4,5)
P4: (8,8)
P56: (9,10), (10,9)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5
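The WCSS used here is the sum of squared Euclidean distances from each point to its cluster centroid; singleton clusters contribute zero, and the six (0.5)² terms are the per-coordinate squared deviations in the two 2-point clusters. A minimal sketch:

```python
def wcss(clusters):
    # Sum of squared Euclidean distances from each point to its cluster centroid.
    total = 0.0
    for pts in clusters:
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in pts)
    return total

four = [[(1, 2), (2, 2)], [(4, 5)], [(8, 8)], [(9, 10), (10, 9)]]
print(wcss(four))  # 1.5
```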

1. b) (14 marks) Consider the following training data samples:

(2,1), (2,2), (5,4), (9,9), (10,11), (11,10)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):
P1 P2 P3 P4 P5 P6
P1 1
P2 0.5000 1
P3 0.1907 0.2171 1
P4 0.0860 0.0917 0.1351 1
P5 0.0724 0.0767 0.1041 0.3090 1
P6 0.0728 0.0767 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.2171 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (2,1), (2,2)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. c) (14 marks) Consider the following training data samples:

(1,1), (2,1), (5,4), (9,9), (10,11), (11,10)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).
Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

P1 P2 P3 P4 P5 P6
P1 1
P2 0.5000 1
P3 0.1667 0.1907 1
P4 0.0812 0.0860 0.1351 1
P5 0.0692 0.0724 0.1041 0.3090 1
P6 0.0692 0.0728 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1907 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (1,1), (2,1)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. d) (14 marks) Consider the following training data samples:

(1,2), (2,2), (4,5), (8,8), (9,10), (10,9)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

     P1     P2     P3     P4     P5     P6
P1   1
P2   0.5000 1
P3   0.1907 0.2171 1
P4   0.0979 0.1054 0.1667 1
P5   0.0812 0.0860 0.1239 0.3090 1
P6   0.0806 0.0860 0.1218 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1907 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456
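Complete linkage differs from single linkage only in how a candidate cluster pair is scored: the similarity between two clusters is the minimum pairwise similarity between their members (the least similar pair), and the highest-scoring pair is still the one merged. With this data, only iteration 4 changes: P12 and P3 merge at min(0.1907, 0.2171) = 0.1907 rather than 0.2171. A sketch of the scoring rule:

```python
import math

points = [(1, 2), (2, 2), (4, 5), (8, 8), (9, 10), (10, 9)]

def sim(p, q):
    # Similarity inferred from part-(i): 1/(1 + Euclidean distance).
    return round(1 / (1 + math.dist(p, q)), 4)

def complete_linkage_score(ca, cb):
    # Complete linkage: a cluster pair is only as similar as its
    # *least* similar member pair.
    return min(sim(points[i], points[j]) for i in ca for j in cb)

print(complete_linkage_score([0, 1], [2]))  # 0.1907
```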

Part-(iii):

4-clusters:
P12: (1,2), (2,2)
P3: (4,5)
P4: (8,8)
P56: (9,10), (10,9)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. e) (14 marks) Consider the following training data samples:

(2,1), (2,2), (5,4), (9,9), (10,11), (11,10)


(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

P1 P2 P3 P4 P5 P6
P1 1
P2 0.5000 1
P3 0.1907 0.2171 1
P4 0.0860 0.0917 0.1351 1
P5 0.0724 0.0767 0.1041 0.3090 1
P6 0.0728 0.0767 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1907 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (2,1), (2,2)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. f) (14 marks) Consider the following training data samples:

(1,1), (2,1), (5,4), (9,9), (10,11), (11,10)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

     P1     P2     P3     P4     P5     P6
P1   1
P2   0.5000 1
P3   0.1667 0.1907 1
P4   0.0812 0.0860 0.1351 1
P5   0.0692 0.0724 0.1041 0.3090 1
P6   0.0692 0.0728 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1667 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (1,1), (2,1)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

2. a) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 4x₁ − 2x₂

Fix the step size as 0.12 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:

Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.2800   -0.4400    0.4704
2    0.7696    0.1936   -0.9054
3    0.6025   -0.1681   -1.2486
4    0.7294   -0.0156   -1.3346
5    0.6917   -0.1070   -1.3563
6    0.7250   -0.0706   -1.3617

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.2800   -0.4400    0.4704
2    0.6976    0.0496   -1.2790
3    0.6933   -0.0558   -1.3588
4    0.7005   -0.0922   -1.3613

Part-(iii): Momentum converges faster: it reaches f(X_k) = -1.3613 by k = 4, whereas plain gradient descent needs 6 iterations to reach a comparable value (-1.3617).
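Both tables can be reproduced with a few lines of Python. This sketch uses the momentum form v_k = beta·v_{k-1} + ∇f(X_{k-1}), X_k = X_{k-1} − eta·v_k, with v_0 = 0, so the first update reduces to plain gradient descent, matching the "ignore the momentum term in the first iteration" instruction; this formulation is inferred from the tabulated iterates.

```python
def f(x1, x2):
    # Objective from the question: 3x1^2 + 4x1x2 + 5x2^2 - 4x1 - 2x2.
    return 3*x1**2 + 4*x1*x2 + 5*x2**2 - 4*x1 - 2*x2

def grad(x1, x2):
    # Partial derivatives of f with respect to x1 and x2.
    return (6*x1 + 4*x2 - 4, 4*x1 + 10*x2 - 2)

def descend(eta, beta, steps, x=(1.0, 1.0)):
    # beta = 0 gives plain gradient descent; beta > 0 adds momentum.
    # v starts at zero, so the first update has no momentum contribution.
    v = (0.0, 0.0)
    history = []
    for _ in range(steps):
        g = grad(*x)
        v = (beta*v[0] + g[0], beta*v[1] + g[1])
        x = (x[0] - eta*v[0], x[1] - eta*v[1])
        history.append(f(*x))
    return x, history

_, gd = descend(eta=0.12, beta=0.0, steps=6)
_, mom = descend(eta=0.12, beta=0.1, steps=4)
print([round(v, 4) for v in gd])   # [0.4704, -0.9054, -1.2486, -1.3346, -1.3563, -1.3617]
print([round(v, 4) for v in mom])  # [0.4704, -1.279, -1.3588, -1.3613]
```

Changing eta and the signs in f/grad reproduces the b), c) and d) variants in the same way.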

2. b) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 4x₁ − 2x₂

Fix the step size as 0.13 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:

Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.2200   -0.5600    1.4604
2    0.8596    0.3136   -0.2789
3    0.5460   -0.2811   -0.9464
4    0.7863    0.0604   -1.2030
5    0.6616   -0.1670   -1.3018
6    0.7524   -0.0339   -1.3398

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.2200   -0.5600    1.4604
2    0.7816    0.1576   -0.9920
3    0.6662   -0.1220   -1.3400
4    0.7184   -0.0778   -1.3630

Part-(iii): Momentum leads to faster convergence.

2. c) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 3x₁ − 2x₂

Fix the step size as 0.12 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:

Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.1600   -0.4400    1.1632
2    0.6160    0.2512   -0.2776
3    0.4119   -0.1059   -0.6333
4    0.5262    0.0635   -0.7212
5    0.4769   -0.0253   -0.7429
6    0.5056    0.0162   -0.7482

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.1600   -0.4400    1.1632
2    0.5320    0.1072   -0.6757
3    0.4947    0.0179   -0.7487
4    0.4862   -0.0100   -0.7484

Part-(iii): Momentum leads to faster convergence.

2. d) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 3x₁ − 2x₂

Fix the step size as 0.13 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:
Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.0900   -0.5600    2.2407
2    0.7010    0.3812    0.4043
3    0.3460   -0.2189   -0.3045
4    0.5799    0.1457   -0.5780
5    0.4418   -0.0853   -0.6836
6    0.5315    0.0559   -0.7244

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.0900   -0.5600    2.2407
2    0.6100    0.2252   -0.3610
3    0.4591   -0.0462   -0.7267
4    0.5000    0.0080   -0.7497

Part-(iii): Momentum leads to faster convergence.

3. a) (2 marks) Consider that you are working as a Customer Relationship Manager in a
company that sells headphones and you have collected anonymous feedback from
customers. Discuss how to use clustering as a pre-processing step for supervised learning.
Answer: Clustering groups the feedback into clusters; each cluster can then be labelled
using domain knowledge, after which supervised learning becomes possible.
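As a concrete sketch of this cluster-then-label pipeline: cluster the unlabelled feature vectors, let a domain expert name each cluster, and use the named data as a supervised training set. The toy data, the cluster names, and the tiny k-means below are all illustrative assumptions, not part of the question.

```python
import math

# Toy unlabelled 2-D feature vectors standing in for numeric feedback features
# (hypothetical data; a real pipeline would start from encoded feedback).
data = [(0.9, 0.8), (1.0, 0.9), (0.8, 1.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]

def kmeans(points, centers, iters=10):
    # Plain Lloyd's algorithm with fixed initial centers for determinism.
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

groups = kmeans(data, centers=[data[0], data[3]])

# A domain expert inspects each cluster and names it; the names become labels,
# turning the unlabelled data into a training set for any supervised classifier.
names = {0: "satisfied", 1: "dissatisfied"}
training_set = [(p, names[i]) for i, g in enumerate(groups) for p in g]
```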

b) (2 marks) Consider the task of detecting Parkinson's disease, where you are given
unlabelled data. Discuss how to use clustering as a pre-processing step for supervised learning.
Answer: Clustering groups the samples into clusters; each cluster can then be labelled
using domain knowledge, after which supervised learning becomes possible.

c) (2 marks) Consider the task of identifying customers who will default on their loan
repayment. Given unlabelled data, discuss how to use clustering as a pre-processing
step for supervised learning.
Answer: Clustering groups the customers into clusters; each cluster can then be labelled
using domain knowledge, after which supervised learning becomes possible.
