Unit 3-5 Problems

1. K-Nearest Neighbour
- Calculate the distance from the new instance to every training sample, then rank the samples with the shortest distance first.
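The ranking step can be sketched in Python as below. This is a minimal illustration, not the worked solution from the original problem. It assumes Euclidean distance and a simple majority vote among the k nearest samples. It reuses the six Iris samples from the K-Means example as training data. The query point (5.0, 3.4) and k = 3 are hypothetical values chosen for illustration. The helper names `rank_by_distance` and `knn_classify` are also illustrative.

```python
import math
from collections import Counter

# Training samples (sepal length, sepal width) -> class label,
# reusing the six Iris samples from the K-Means example.
TRAIN = [
    ((5.1, 3.5), "Iris setosa"),
    ((4.9, 3.0), "Iris setosa"),
    ((4.7, 3.2), "Iris setosa"),
    ((7.0, 3.2), "Iris versicolor"),
    ((6.4, 3.2), "Iris versicolor"),
    ((6.9, 3.1), "Iris versicolor"),
]


def rank_by_distance(query, train):
    """Return (distance, label) pairs sorted with the shortest distance first."""
    ranked = [(math.dist(query, point), label) for point, label in train]
    ranked.sort(key=lambda pair: pair[0])
    return ranked


def knn_classify(query, train, k=3):
    """Majority vote among the k nearest samples (k is an assumed parameter)."""
    nearest = rank_by_distance(query, train)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    query = (5.0, 3.4)  # hypothetical new instance
    for rank, (d, label) in enumerate(rank_by_distance(query, TRAIN), start=1):
        print(f"{rank}: {d:.4f} {label}")
    print("Predicted:", knn_classify(query, TRAIN, k=3))
```

With these values, the three nearest samples are all Iris setosa, so the sketch predicts Iris setosa. An odd k is a common choice for two classes because it avoids ties in the vote.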

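Step 1: Compute the prior probability of each class.
Step 2: Compute the conditional probability of each attribute value of the new instance given each class.
P(No | NewIns) has the higher value.
Hence, the new instance is classified as No.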
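The two steps above can be sketched for categorical attributes as follows. The training records, attribute values and new instance are hypothetical. They are made up only to show the arithmetic, and the original problem's data is not reproduced here. The sketch applies no Laplace smoothing, so an attribute value never seen with a class gives that class a score of 0. Each class score is the prior × the product of the conditional probabilities (an unnormalised posterior). The class with the higher score wins. The function names `train` and `posterior_scores` are illustrative.

```python
from collections import Counter, defaultdict

# Hypothetical toy training data: (attribute values, class label).
# These records are invented for illustration only.
DATA = [
    (("Sunny", "High"), "No"),
    (("Sunny", "Normal"), "Yes"),
    (("Rainy", "High"), "No"),
    (("Rainy", "Normal"), "Yes"),
    (("Sunny", "High"), "No"),
    (("Rainy", "High"), "Yes"),
]


def train(data):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(label for _, label in data)
    # value_counts[label][i][value] = number of records of class `label`
    # whose i-th attribute equals `value`
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for features, label in data:
        for i, value in enumerate(features):
            value_counts[label][i][value] += 1
    return class_counts, value_counts


def posterior_scores(instance, class_counts, value_counts):
    """Unnormalised posterior P(class) * prod_i P(value_i | class) per class."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # Step 1: prior probability P(class)
        for i, value in enumerate(instance):
            # Step 2: conditional probability P(value | class), no smoothing
            score *= value_counts[label][i][value] / count
        scores[label] = score
    return scores


if __name__ == "__main__":
    class_counts, value_counts = train(DATA)
    new_instance = ("Sunny", "High")  # hypothetical new instance
    scores = posterior_scores(new_instance, class_counts, value_counts)
    print(scores)
    print("Classified as:", max(scores, key=scores.get))
```

For the hypothetical instance ("Sunny", "High"), the scores are 1/3 for No and 1/18 for Yes, so the instance is classified as No.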
Naive Bayes - Numeric Classification
Naive Bayes - Text Classification
K-Means Clustering
Iris Dataset
Sepal length  Sepal width  Class label
5.1           3.5          Iris setosa
4.9           3.0          Iris setosa
4.7           3.2          Iris setosa
7.0           3.2          Iris versicolor
6.4           3.2          Iris versicolor
6.9           3.1          Iris versicolor
Initial Clusters:
Initially, the means of the two clusters are taken from the samples above.
Cluster 1 → Iris setosa
Cluster 2 → Iris versicolor
Mean of cluster 1 → (5.1, 3.5), which is the 1st sample.
Mean of cluster 2 → (7.0, 3.2), which is the 4th sample.
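The distance of each sample from each cluster mean is calculated using the Euclidean distance, where (x_1, y_1) is the cluster mean, (x_2, y_2) is the sample, x is the sepal length and y is the sepal width:

$$d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$$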
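The distance formula translates directly into Python. The sketch below assumes 2-D points given as (sepal length, sepal width) tuples, and `euclidean` is an illustrative helper name. Python 3.8+ also provides `math.dist`, which computes the same value. The example reproduces the Sample 2 distances from Iteration 1.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two 2-D points (x, y)."""
    (x1, y1), (x2, y2) = p, q
    return math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)


if __name__ == "__main__":
    mean1, mean2 = (5.1, 3.5), (7.0, 3.2)  # initial cluster means
    sample = (4.9, 3.0)  # Sample 2 from the worked example
    print(round(euclidean(sample, mean1), 4), round(euclidean(sample, mean2), 4))
```

Running it prints the rounded distances 0.5385 and 2.1095, which match the worked values.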
Iteration 1:
Step 1:
Sample 1: (5.1, 3.5)
Distance to Mean 1 = $\sqrt{(5.1 - 5.1)^2 + (3.5 - 3.5)^2} = 0$
Distance to Mean 2 = $\sqrt{(5.1 - 7.0)^2 + (3.5 - 3.2)^2} = 1.9235$
The distance to Mean 1 is smaller, hence the 1st sample belongs to cluster 1.
Step 2:
Sample 2: (4.9, 3.0)
Distance to Mean 1 = $\sqrt{(4.9 - 5.1)^2 + (3.0 - 3.5)^2} = 0.5385$
Distance to Mean 2 = $\sqrt{(4.9 - 7.0)^2 + (3.0 - 3.2)^2} = 2.1095$
The distance to Mean 1 is smaller, hence the 2nd sample belongs to cluster 1.
Step 3:
Sample 3: (4.7, 3.2)
Distance to Mean 1 = $\sqrt{(4.7 - 5.1)^2 + (3.2 - 3.5)^2} = 0.5$
Distance to Mean 2 = $\sqrt{(4.7 - 7.0)^2 + (3.2 - 3.2)^2} = 2.3$
The distance to Mean 1 is smaller, hence the 3rd sample belongs to cluster 1.
Step 4:
Sample 4: (7.0, 3.2)
Distance to Mean 1 = $\sqrt{(7.0 - 5.1)^2 + (3.2 - 3.5)^2} = 1.9235$
Distance to Mean 2 = $\sqrt{(7.0 - 7.0)^2 + (3.2 - 3.2)^2} = 0$
The distance to Mean 2 is smaller, hence the 4th sample belongs to cluster 2.
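Step 5:
Sample 5: (6.4, 3.2)
Distance to Mean 1 = $\sqrt{(6.4 - 5.1)^2 + (3.2 - 3.5)^2} = 1.3342$
Distance to Mean 2 = $\sqrt{(6.4 - 7.0)^2 + (3.2 - 3.2)^2} = 0.6$
The distance to Mean 2 is smaller, hence the 5th sample belongs to cluster 2.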
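Step 6:
Sample 6: (6.9, 3.1)
Distance to Mean 1 = $\sqrt{(6.9 - 5.1)^2 + (3.1 - 3.5)^2} = 1.8439$
Distance to Mean 2 = $\sqrt{(6.9 - 7.0)^2 + (3.1 - 3.2)^2} = 0.1414$
The distance to Mean 2 is smaller, hence the 6th sample belongs to cluster 2.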
Sepal length  Sepal width  Dist. to (5.1, 3.5)  Dist. to (7.0, 3.2)  Cluster  Class label
5.1           3.5          0                    1.9235               C1       Iris setosa
4.9           3.0          0.5385               2.1095               C1       Iris setosa
4.7           3.2          0.5                  2.3                  C1       Iris setosa
7.0           3.2          1.9235               0                    C2       Iris versicolor
6.4           3.2          1.3342               0.6                  C2       Iris versicolor
6.9           3.1          1.8439               0.1414               C2       Iris versicolor
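The assignment step of Iteration 1 can be reproduced with a short sketch, in which each sample goes to the cluster whose mean is nearest. `math.dist` (Python 3.8+) computes the Euclidean distance. `assign` is a hypothetical helper. Ties are broken in favour of the lower-numbered cluster, which is an assumption because the worked example has no ties.

```python
import math

# The six samples (sepal length, sepal width) from the worked example.
SAMPLES = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (7.0, 3.2), (6.4, 3.2), (6.9, 3.1)]


def assign(samples, means):
    """Assign each sample to the nearest mean; cluster ids are 1-based (C1, C2, ...)."""
    rows = []
    for s in samples:
        dists = [math.dist(s, m) for m in means]
        # index() returns the first minimum, so ties go to the lower cluster id
        cluster = dists.index(min(dists)) + 1
        rows.append((s, dists, cluster))
    return rows


if __name__ == "__main__":
    means = [(5.1, 3.5), (7.0, 3.2)]  # initial means (samples 1 and 4)
    for s, dists, cluster in assign(SAMPLES, means):
        print(s, [round(d, 4) for d in dists], f"C{cluster}")
```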
Iteration 2:
First, compute New Mean 1 and New Mean 2, which replace the old Mean 1 and Mean 2. Each new mean is the sum of the samples in that cluster divided by the number of samples in it.
New Mean 1 = $\left(\frac{5.1 + 4.9 + 4.7}{3}, \frac{3.5 + 3.0 + 3.2}{3}\right) = (4.9, 3.23)$
New Mean 2 = $\left(\frac{7.0 + 6.4 + 6.9}{3}, \frac{3.2 + 3.2 + 3.1}{3}\right) = (6.767, 3.167)$
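The mean update can be sketched as a component-wise average of the points in each cluster. `cluster_mean` is an illustrative name. The exact second coordinate of New Mean 1 is 9.7/3 ≈ 3.2333, which the worked example rounds to 3.23.

```python
def cluster_mean(points):
    """Component-wise mean of a non-empty list of (x, y) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


if __name__ == "__main__":
    cluster1 = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2)]  # samples assigned to C1
    cluster2 = [(7.0, 3.2), (6.4, 3.2), (6.9, 3.1)]  # samples assigned to C2
    m1 = cluster_mean(cluster1)
    m2 = cluster_mean(cluster2)
    print(tuple(round(v, 3) for v in m1), tuple(round(v, 3) for v in m2))
```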
Step 1:
Sample 1: (5.1, 3.5)
Distance to New Mean 1 = $\sqrt{(5.1 - 4.9)^2 + (3.5 - 3.23)^2} = 0.3360$
Distance to New Mean 2 = $\sqrt{(5.1 - 6.767)^2 + (3.5 - 3.167)^2} = 1.6999$
The distance to New Mean 1 is smaller, hence the 1st sample belongs to cluster 1.
Step 2:
Sample 2: (4.9, 3.0)
Distance to New Mean 1 = $\sqrt{(4.9 - 4.9)^2 + (3.0 - 3.23)^2} = 0.23$
Distance to New Mean 2 = $\sqrt{(4.9 - 6.767)^2 + (3.0 - 3.167)^2} = 1.8745$
The distance to New Mean 1 is smaller, hence the 2nd sample belongs to cluster 1.
Step 3:
Sample 3: (4.7, 3.2)
Distance to New Mean 1 = $\sqrt{(4.7 - 4.9)^2 + (3.2 - 3.23)^2} = 0.202$
Distance to New Mean 2 = $\sqrt{(4.7 - 6.767)^2 + (3.2 - 3.167)^2} = 2.067$
The distance to New Mean 1 is smaller, hence the 3rd sample belongs to cluster 1.
Step 4:
Sample 4: (7.0, 3.2)
Distance to New Mean 1 = $\sqrt{(7.0 - 4.9)^2 + (3.2 - 3.23)^2} = 2.1002$
Distance to New Mean 2 = $\sqrt{(7.0 - 6.767)^2 + (3.2 - 3.167)^2} = 0.2353$
The distance to New Mean 2 is smaller, hence the 4th sample belongs to cluster 2.
Step 5:
Sample 5: (6.4, 3.2)
Distance to New Mean 1 = $\sqrt{(6.4 - 4.9)^2 + (3.2 - 3.23)^2} = 1.5003$
Distance to New Mean 2 = $\sqrt{(6.4 - 6.767)^2 + (3.2 - 3.167)^2} = 0.3685$
The distance to New Mean 2 is smaller, hence the 5th sample belongs to cluster 2.
Step 6:
Sample 6: (6.9, 3.1)
Distance to New Mean 1 = $\sqrt{(6.9 - 4.9)^2 + (3.1 - 3.23)^2} = 2.004$
Distance to New Mean 2 = $\sqrt{(6.9 - 6.767)^2 + (3.1 - 3.167)^2} = 0.1489$
The distance to New Mean 2 is smaller, hence the 6th sample belongs to cluster 2.
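Sepal length  Sepal width  Dist. to (4.9, 3.23)  Dist. to (6.767, 3.167)  Cluster  Class label
5.1           3.5          0.3360                1.6999                   C1       Iris setosa
4.9           3.0          0.23                  1.8745                   C1       Iris setosa
4.7           3.2          0.202                 2.067                    C1       Iris setosa
7.0           3.2          2.1002                0.2353                   C2       Iris versicolor
6.4           3.2          1.5003                0.3685                   C2       Iris versicolor
6.9           3.1          2.004                 0.1489                   C2       Iris versicolor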

Both iterations give the same cluster assignments, so the means will not change any further; hence, we stop iterating.
In both iterations, the first three samples fall under cluster 1 (Iris setosa) and the last three samples fall under cluster 2 (Iris versicolor).
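Putting the assignment and update steps together gives the full loop. The sketch below is a minimal batch K-Means under a few stated assumptions. It uses Euclidean distance and the same initial means as the worked example. It stops when the assignments are unchanged between iterations, which matches the stopping rule used above. Cluster indices are 0-based in the code (0 = C1, 1 = C2). `kmeans` is an illustrative function, not a library API.

```python
import math

# The six samples (sepal length, sepal width) from the worked example.
SAMPLES = [(5.1, 3.5), (4.9, 3.0), (4.7, 3.2), (7.0, 3.2), (6.4, 3.2), (6.9, 3.1)]


def kmeans(samples, initial_means, max_iter=100):
    """Batch K-Means: assign every sample, then recompute the means.

    Stops when the assignments are the same as in the previous iteration.
    Returns (means, assignments, iterations); assignments are 0-based.
    """
    means = list(initial_means)
    assignments = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: index of the nearest mean for each sample.
        new_assignments = [
            min(range(len(means)), key=lambda j: math.dist(s, means[j]))
            for s in samples
        ]
        if new_assignments == assignments:
            return means, assignments, iteration
        assignments = new_assignments
        # Update step: recompute each mean from the samples assigned to it.
        for j in range(len(means)):
            members = [s for s, a in zip(samples, assignments) if a == j]
            if members:  # keep the old mean if a cluster becomes empty
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignments, max_iter


if __name__ == "__main__":
    means, labels, iterations = kmeans(SAMPLES, [(5.1, 3.5), (7.0, 3.2)])
    print("Converged after", iterations, "iterations")
    print("Assignments:", [f"C{a + 1}" for a in labels])
    print("Final means:", [tuple(round(v, 3) for v in m) for m in means])
```

On the six samples, the loop stops at iteration 2 with assignments [0, 0, 0, 1, 1, 1]. That is C1 for the first three samples and C2 for the last three, as in the hand calculation.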
K Medoid
Genetic Algorithm
Q-Learning
