
CSE3506 – CAT-2 Answers

Q. No. | Sub-division | Question Text | Marks

Answer ALL THREE Questions (Total Marks = 30)

1. a) (14 marks) Consider the following training data samples:

(1,2), (2,2), (4,5), (8,8), (9,10), (10,9)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

     P1     P2     P3     P4     P5     P6
P1   1
P2   0.5000 1
P3   0.1907 0.2171 1
P4   0.0979 0.1054 0.1667 1
P5   0.0812 0.0860 0.1239 0.3090 1
P6   0.0806 0.0860 0.1218 0.3090 0.4142 1
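The tabulated values are consistent with the similarity measure s(p, q) = 1/(1 + d(p, q)), where d is the Euclidean distance, so that two identical samples score 1. The question does not name this formula explicitly, so it is inferred here from the numbers. A short Python sketch that reproduces the entries:

```python
import math

points = [(1, 2), (2, 2), (4, 5), (8, 8), (9, 10), (10, 9)]

def similarity(p, q):
    # Euclidean distance, then 1/(1 + d), so identical points score 1.
    d = math.dist(p, q)
    return round(1 / (1 + d), 4)

sim = [[similarity(p, q) for q in points] for p in points]
print(sim[1][0])  # 0.5
print(sim[5][4])  # 0.4142
```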

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.2171 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456
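The merge sequence above can be reproduced with a small agglomerative loop. This is one possible sketch, not the only implementation; under single linkage, the similarity between two clusters is the maximum pairwise similarity between their members.

```python
import math

points = [(1, 2), (2, 2), (4, 5), (8, 8), (9, 10), (10, 9)]

def sim(p, q):
    # Similarity inferred from part-(i): 1/(1 + Euclidean distance).
    return round(1 / (1 + math.dist(p, q)), 4)

def single_linkage_merges(points):
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: maximum similarity over member pairs.
                s = max(sim(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((s, merged))
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return merges

for s, members in single_linkage_merges(points):
    # 0-based indices: [0, 1] is P12, [4, 5] is P56, and so on.
    print(s, members)
```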

Page 1 of 11
Part-(iii):

4-clusters:
P12: (1,2), (2,2)
P3: (4,5)
P4: (8,8)
P56: (9,10), (10,9)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5
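The WCSS used here is the sum of squared Euclidean distances from each point to its cluster centroid; singleton clusters contribute zero, and the six (0.5)² terms are the per-coordinate squared deviations in the two 2-point clusters. A minimal sketch:

```python
def wcss(clusters):
    # Sum of squared Euclidean distances from each point to its cluster centroid.
    total = 0.0
    for pts in clusters:
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        total += sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in pts)
    return total

four = [[(1, 2), (2, 2)], [(4, 5)], [(8, 8)], [(9, 10), (10, 9)]]
print(wcss(four))  # 1.5
```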

1. b) (14 marks) Consider the following training data samples:

(2,1), (2,2), (5,4), (9,9), (10,11), (11,10)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):
P1 P2 P3 P4 P5 P6
P1 1
P2 0.5000 1
P3 0.1907 0.2171 1
P4 0.0860 0.0917 0.1351 1
P5 0.0724 0.0767 0.1041 0.3090 1
P6 0.0728 0.0767 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.2171 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (2,1), (2,2)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. c) (14 marks) Consider the following training data samples:

(1,1), (2,1), (5,4), (9,9), (10,11), (11,10)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform single-linkage agglomerative hierarchical clustering using the similarity
matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).
Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

P1 P2 P3 P4 P5 P6
P1 1
P2 0.5000 1
P3 0.1667 0.1907 1
P4 0.0812 0.0860 0.1351 1
P5 0.0692 0.0724 0.1041 0.3090 1
P6 0.0692 0.0728 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1907 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (1,1), (2,1)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. d) (14 marks) Consider the following training data samples:

(1,2), (2,2), (4,5), (8,8), (9,10), (10,9)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

     P1     P2     P3     P4     P5     P6
P1   1
P2   0.5000 1
P3   0.1907 0.2171 1
P4   0.0979 0.1054 0.1667 1
P5   0.0812 0.0860 0.1239 0.3090 1
P6   0.0806 0.0860 0.1218 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1907 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456
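Complete linkage differs from single linkage only in how a candidate cluster pair is scored: the similarity between two clusters is the minimum pairwise similarity between their members (the least similar pair), and the highest-scoring pair is still the one merged. With this data, only iteration 4 changes: P12 and P3 merge at min(0.1907, 0.2171) = 0.1907 rather than 0.2171. A sketch of the scoring rule:

```python
import math

points = [(1, 2), (2, 2), (4, 5), (8, 8), (9, 10), (10, 9)]

def sim(p, q):
    # Similarity inferred from part-(i): 1/(1 + Euclidean distance).
    return round(1 / (1 + math.dist(p, q)), 4)

def complete_linkage_score(ca, cb):
    # Complete linkage: a cluster pair is only as similar as its
    # *least* similar member pair.
    return min(sim(points[i], points[j]) for i in ca for j in cb)

print(complete_linkage_score([0, 1], [2]))  # 0.1907
```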

Part-(iii):

4-clusters:
P12: (1,2), (2,2)
P3: (4,5)
P4: (8,8)
P56: (9,10), (10,9)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. e) (14 marks) Consider the following training data samples:

(2,1), (2,2), (5,4), (9,9), (10,11), (11,10)


(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

P1 P2 P3 P4 P5 P6
P1 1
P2 0.5000 1
P3 0.1907 0.2171 1
P4 0.0860 0.0917 0.1351 1
P5 0.0724 0.0767 0.1041 0.3090 1
P6 0.0728 0.0767 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1907 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (2,1), (2,2)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

1. f) (14 marks) Consider the following training data samples:

(1,1), (2,1), (5,4), (9,9), (10,11), (11,10)

(i) Determine the similarity matrix for the given data samples such that the
maximum similarity (i.e. similarity between two identical data samples) is 1. For
distance computation, use Euclidean distance and round it off to 4 decimal places.
(ii) Perform complete-linkage agglomerative hierarchical clustering using the
similarity matrix obtained in part-(i). Sketch the resulting dendrogram.
(iii) Based on the results obtained in part-(ii), form 4 clusters. Your answer should
clearly indicate the data samples in each cluster.
(iv) Calculate the total within-cluster-sum-of-squares for the clusters obtained in part-
(iii).

Note: Mark weightage for Parts (i), (ii), (iii) and (iv) are 3, 7, 2 and 2, respectively.

Answer:

Part-(i):

     P1     P2     P3     P4     P5     P6
P1   1
P2   0.5000 1
P3   0.1667 0.1907 1
P4   0.0812 0.0860 0.1351 1
P5   0.0692 0.0724 0.1041 0.3090 1
P6   0.0692 0.0728 0.1054 0.3090 0.4142 1

Part-(ii):

Iteration 1: Maximum similarity of 0.5 occurs between P1 and P2 leading to P12


Iteration 2: Maximum similarity of 0.4142 occurs between P5 and P6 leading to P56
Iteration 3: Maximum similarity of 0.3090 occurs between P4 and P56 leading to P456
Iteration 4: Maximum similarity of 0.1667 occurs between P12 and P3 leading to P123
Iteration 5: All in one cluster - P123456

Part-(iii):

4-clusters:
P12: (1,1), (2,1)
P3: (5,4)
P4: (9,9)
P56: (10,11), (11,10)

Part-(iv):

Total WCSS: (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² + (0.5)² = 1.5

2. a) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 4x₁ − 2x₂

Fix the step size as 0.12 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:

Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.2800   -0.4400    0.4704
2    0.7696    0.1936   -0.9054
3    0.6025   -0.1681   -1.2486
4    0.7294   -0.0156   -1.3346
5    0.6917   -0.1070   -1.3563
6    0.7250   -0.0706   -1.3617

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.2800   -0.4400    0.4704
2    0.6976    0.0496   -1.2790
3    0.6933   -0.0558   -1.3588
4    0.7005   -0.0922   -1.3613

Part-(iii): Momentum converges faster: it reaches f(X_k) = -1.3613 by k = 4, whereas plain gradient descent needs 6 iterations to reach a comparable value (-1.3617).
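Both tables can be reproduced with a few lines of Python. This sketch uses the momentum form v_k = beta·v_{k-1} + ∇f(X_{k-1}), X_k = X_{k-1} − eta·v_k, with v_0 = 0, so the first update reduces to plain gradient descent, matching the "ignore the momentum term in the first iteration" instruction; this formulation is inferred from the tabulated iterates.

```python
def f(x1, x2):
    # Objective from the question: 3x1^2 + 4x1x2 + 5x2^2 - 4x1 - 2x2.
    return 3*x1**2 + 4*x1*x2 + 5*x2**2 - 4*x1 - 2*x2

def grad(x1, x2):
    # Partial derivatives of f with respect to x1 and x2.
    return (6*x1 + 4*x2 - 4, 4*x1 + 10*x2 - 2)

def descend(eta, beta, steps, x=(1.0, 1.0)):
    # beta = 0 gives plain gradient descent; beta > 0 adds momentum.
    # v starts at zero, so the first update has no momentum contribution.
    v = (0.0, 0.0)
    history = []
    for _ in range(steps):
        g = grad(*x)
        v = (beta*v[0] + g[0], beta*v[1] + g[1])
        x = (x[0] - eta*v[0], x[1] - eta*v[1])
        history.append(f(*x))
    return x, history

_, gd = descend(eta=0.12, beta=0.0, steps=6)
_, mom = descend(eta=0.12, beta=0.1, steps=4)
print([round(v, 4) for v in gd])   # [0.4704, -0.9054, -1.2486, -1.3346, -1.3563, -1.3617]
print([round(v, 4) for v in mom])  # [0.4704, -1.279, -1.3588, -1.3613]
```

Changing eta and the signs in f/grad reproduces the b), c) and d) variants in the same way.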

2. b) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 4x₁ − 2x₂

Fix the step size as 0.13 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:

Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.2200   -0.5600    1.4604
2    0.8596    0.3136   -0.2789
3    0.5460   -0.2811   -0.9464
4    0.7863    0.0604   -1.2030
5    0.6616   -0.1670   -1.3018
6    0.7524   -0.0339   -1.3398

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.2200   -0.5600    1.4604
2    0.7816    0.1576   -0.9920
3    0.6662   -0.1220   -1.3400
4    0.7184   -0.0778   -1.3630

Part-(iii): Momentum leads to faster convergence.

2. c) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 3x₁ − 2x₂

Fix the step size as 0.12 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:

Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.1600   -0.4400    1.1632
2    0.6160    0.2512   -0.2776
3    0.4119   -0.1059   -0.6333
4    0.5262    0.0635   -0.7212
5    0.4769   -0.0253   -0.7429
6    0.5056    0.0162   -0.7482

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.1600   -0.4400    1.1632
2    0.5320    0.1072   -0.6757
3    0.4947    0.0179   -0.7487
4    0.4862   -0.0100   -0.7484

Part-(iii): Momentum leads to faster convergence.

2. d) (14 marks) Consider minimizing the following function using the gradient descent optimizer:

f(X) = 3x₁² + 4x₁x₂ + 5x₂² − 3x₁ − 2x₂

Fix the step size as 0.13 and the initial starting point as [1 1]ᵀ.

(i) Evaluate f(X_k) for k in the range 1 to 6.


(ii) Apply the momentum optimizer in place of gradient descent and evaluate f(X_k)
for k in the range 1 to 4. Fix the momentum factor as 0.1. Ignore the momentum
term in the first iteration alone.
(iii) Compare the convergence of the two methods using the results obtained in part-
(i) and part-(ii).

Note: Mark weightage for Parts (i), (ii) and (iii) are 7, 5 and 2, respectively.

Answer:
Part-(i): Gradient descent (X_k = [x1 x2]ᵀ)

k    x1        x2        f(X_k)
1    0.0900   -0.5600    2.2407
2    0.7010    0.3812    0.4043
3    0.3460   -0.2189   -0.3045
4    0.5799    0.1457   -0.5780
5    0.4418   -0.0853   -0.6836
6    0.5315    0.0559   -0.7244

Part-(ii): Gradient descent with momentum

k    x1        x2        f(X_k)
1    0.0900   -0.5600    2.2407
2    0.6100    0.2252   -0.3610
3    0.4591   -0.0462   -0.7267
4    0.5000    0.0080   -0.7497

Part-(iii): Momentum leads to faster convergence.

3. a) (2 marks) Consider that you are working as a Customer Relationship Manager in a
company that sells headphones and you have collected anonymous feedback from
customers. Discuss how to use clustering as a pre-processing step for supervised learning.
Answer: Clustering groups the feedback into clusters; each cluster can then be labelled
using domain knowledge, after which supervised learning becomes possible.
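As a concrete sketch of this cluster-then-label pipeline: cluster the unlabelled feature vectors, let a domain expert name each cluster, and use the named data as a supervised training set. The toy data, the cluster names, and the tiny k-means below are all illustrative assumptions, not part of the question.

```python
import math

# Toy unlabelled 2-D feature vectors standing in for numeric feedback features
# (hypothetical data; a real pipeline would start from encoded feedback).
data = [(0.9, 0.8), (1.0, 0.9), (0.8, 1.0), (0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]

def kmeans(points, centers, iters=10):
    # Plain Lloyd's algorithm with fixed initial centers for determinism.
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[i].append(p)
        centers = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return groups

groups = kmeans(data, centers=[data[0], data[3]])

# A domain expert inspects each cluster and names it; the names become labels,
# turning the unlabelled data into a training set for any supervised classifier.
names = {0: "satisfied", 1: "dissatisfied"}
training_set = [(p, names[i]) for i, g in enumerate(groups) for p in g]
```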

b) (2 marks) Consider the task of detecting Parkinson's disease, where you are given
unlabelled data. Discuss how to use clustering as a pre-processing step for supervised learning.
Answer: Clustering groups the samples into clusters; each cluster can then be labelled
using domain knowledge, after which supervised learning becomes possible.

c) (2 marks) Consider the task of identifying customers who will default on their loan
repayment. Given unlabelled data, discuss how to use clustering as a pre-processing
step for supervised learning.
Answer: Clustering groups the customers into clusters; each cluster can then be labelled
using domain knowledge, after which supervised learning becomes possible.
