
❖ Practical No 04) Practical of Clustering.

➢ Example No 01):
> iris
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
...
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
> iriscopy=iris
> iriscopy
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
...
148 6.5 3.0 5.2 2.0 virginica
149 6.2 3.4 5.4 2.3 virginica
150 5.9 3.0 5.1 1.8 virginica
> iriscopy$Species<-NULL
> iriscopy
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
4 4.6 3.1 1.5 0.2
...
147 6.3 2.5 5.0 1.9
148 6.5 3.0 5.2 2.0
149 6.2 3.4 5.4 2.3
150 5.9 3.0 5.1 1.8
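R data frames follow copy-on-modify semantics, so removing Species from iriscopy leaves iris untouched. That is why iris$Species is still available for the comparison table later on. A small sketch confirming this; numeric_cols is a hypothetical variable introduced only to show an equivalent way of keeping the numeric columns.

```r
# Copy iris, then drop the label column from the copy only.
iriscopy <- iris
iriscopy$Species <- NULL

print(names(iriscopy))             # the four measurement columns remain
print("Species" %in% names(iris))  # the original still has Species

# Equivalent selection (hypothetical helper variable): keep numeric columns
numeric_cols <- iris[, sapply(iris, is.numeric)]
print(isTRUE(all.equal(iriscopy, numeric_cols)))
```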
> Result<-kmeans(iriscopy,3)
> Result
K-means clustering with 3 clusters of sizes 38, 50, 62

Cluster means:
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 6.850000 3.073684 5.742105 2.071053
2 5.006000 3.428000 1.462000 0.246000
3 5.901613 2.748387 4.393548 1.433871

Clustering vector:
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[33] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3
[65] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[97] 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1 1 1 1 3 1 3 1 3 1 1 3 3
[129] 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3
Within cluster sum of squares by cluster:
[1] 23.87947 15.15100 39.82097
(between_SS / total_SS = 88.4 %)
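kmeans() chooses its starting centres at random, so rerunning the command above can permute the cluster numbers or, occasionally, settle on a different partition. A minimal sketch for repeatable results, assuming an arbitrary seed (42) and nstart = 25 random starts (both illustrative, not part of the original practical). It also recomputes the total sum of squares by hand to show where the "between_SS / total_SS" percentage comes from: totss is the squared distance of every flower from the overall column means, and betweenss is totss minus tot.withinss.

```r
# Reproducible k-means on the four numeric iris columns.
# The seed and nstart are illustrative assumptions.
set.seed(42)
measurements <- iris[, 1:4]            # same columns as iriscopy
fit <- kmeans(measurements, centers = 3, nstart = 25)

# totss: squared distances of every point from the overall column means
centred <- scale(measurements, center = TRUE, scale = FALSE)
totss_manual <- sum(centred^2)

# The percentage printed as "between_SS / total_SS"
ratio <- fit$betweenss / fit$totss
cat(sprintf("between_SS / total_SS = %.1f %%\n", 100 * ratio))
```

With several starts the fit usually reaches the same partition as the run above, though the labels 1, 2 and 3 may come out in a different order.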

Available components:

[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
> Result$size
[1] 38 50 62
> Result$cluster
[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[33] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 1 3 3 3 3 3 3 3 3 3 3 3
[65] 3 3 3 3 3 3 3 3 3 3 3 3 3 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
[97] 3 3 3 3 1 3 1 1 1 1 3 1 1 1 1 1 1 3 3 1 1 1 1 3 1 3 1 3 1 1 3 3
[129] 1 1 1 1 1 3 1 1 1 1 3 1 1 1 3 1 1 1 3 1 1 3
> table(iris$Species,Result$cluster)

1 2 3
setosa 0 50 0
versicolor 2 0 48
virginica 36 0 14
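Cluster numbers from kmeans() are arbitrary labels, so the table is read by looking for the dominant species in each column. One common single-number summary is purity: credit each cluster with its most frequent species and divide by the number of observations. A sketch, assuming an illustrative seed and nstart for a reproducible rerun (neither comes from the original practical):

```r
# Purity of a k-means partition against the known species labels.
set.seed(42)                       # arbitrary seed, for repeatability
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

tab <- table(iris$Species, fit$cluster)
print(tab)

majority <- apply(tab, 2, max)     # largest species count in each cluster
purity <- sum(majority) / sum(tab)
cat(sprintf("purity = %.3f\n", purity))
```

Applied to the table above, the column maxima are 50, 48 and 36, giving a purity of 134/150 ≈ 0.893.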

> plot(iris[c("Petal.Length","Petal.Width")],col=Result$cluster,main="Clustering Petal Length and Petal Width")

>
➢ Output:
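The plot above colours each flower by its cluster. A sketch that also marks the fitted centres and uses the plotting symbol for the true species, so flowers whose cluster disagrees with their species are easier to spot. The seed and nstart are illustrative assumptions; the centres come from a fit on all four columns, so only their petal coordinates are drawn.

```r
# Petal scatter plot: colour = k-means cluster, symbol = true species.
set.seed(42)
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

plot(iris[c("Petal.Length", "Petal.Width")],
     col = fit$cluster,
     pch = as.integer(iris$Species),
     main = "Clustering Petal Length and Petal Width")

# Keep only the two plotted coordinates of each 4-D centre.
centres <- fit$centers[, c("Petal.Length", "Petal.Width")]
points(centres, col = 1:3, pch = 8, cex = 2)
legend("topleft", legend = levels(iris$Species), pch = 1:3,
       title = "Species")
```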

➢ 01) Clustering of Petal length and Petal width.


> plot(iris$Petal.Length~iris$Petal.Width,main="Clustering Petal length and Petal Width")

>

➢ Output:
➢ 02) Clustering of Petal length and Sepal Width.
> plot(iris$Petal.Length~iris$Sepal.Width,main="Clustering Petal length and Sepal Width")

>
➢ Output:

➢ 03) Clustering of Sepal length and Petal Width.

> plot(iris$Sepal.Length~iris$Petal.Width,main="Clustering Sepal length and Petal Width")

>
➢ Output:

➢ 04) Clustering of Sepal length and Sepal Width.

> plot(iris$Sepal.Length~iris$Sepal.Width,main="Clustering Sepal length and Sepal Width")

>

➢ Output:
> plot(iris$Sepal.Length~iris$Sepal.Width)
> with(iris,text(iris$Sepal.Length~iris$Sepal.Width,labels=iris$Species,pos=4,cex=0.6))

>
➢ Output:
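Each of the four "Clustering ..." plots above draws one pair of measurements without cluster colours. pairs() draws all six pairwise scatter plots in one figure. A sketch colouring them by a k-means fit, with the seed and nstart as illustrative assumptions:

```r
# All pairwise scatter plots of the four measurements,
# coloured by k-means cluster, symbol by true species.
set.seed(42)
fit <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

pairs(iris[, 1:4],
      col = fit$cluster,
      pch = as.integer(iris$Species),
      main = "Iris measurements coloured by k-means cluster")
```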

➢ Plotting Cluster Dendrogram:


> iriscopy
Sepal.Length Sepal.Width Petal.Length Petal.Width
1 5.1 3.5 1.4 0.2
2 4.9 3.0 1.4 0.2
3 4.7 3.2 1.3 0.2
...
147 6.3 2.5 5.0 1.9
148 6.5 3.0 5.2 2.0
149 6.2 3.4 5.4 2.3
150 5.9 3.0 5.1 1.8
> distance<-dist(iriscopy)
> hc<-hclust(distance)
> plot(hc)

➢ Output:
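hclust() returns only the tree. To get flat cluster labels comparable with kmeans(), the tree has to be cut. A sketch, assuming three groups to match the three species, using cutree() and rect.hclust() from the stats package; dist() defaults to Euclidean distance and hclust() to complete linkage, as in the commands above.

```r
# Cut the complete-linkage tree into 3 groups and compare with species.
iriscopy <- iris[, 1:4]
hc <- hclust(dist(iriscopy))          # Euclidean distance, complete linkage

groups <- cutree(hc, k = 3)           # one group number per flower
print(table(iris$Species, groups))

plot(hc, labels = FALSE, main = "Cluster Dendrogram (iris)")
rect.hclust(hc, k = 3, border = 2:4)  # outline the 3 groups on the tree
```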
➢ Example No 02): Clustering and plotting a cluster dendrogram on student data.
➢ Excel Sheet:

> Student_Data<-read.csv("E:\\Satyavan\\TYCS Practical\\Sem VI\\Data Science\\Student's Result.csv")
> Data_Copy=Student_Data
> Data_Copy
Roll.No Student.Name Physics Chemistry Mathematics Biology
1 1 Ravi Kasalkar 75 84 85 89
2 2 Sonal Parab 85 70 77 79
3 3 Nilesh Hatle 80 86 84 69
4 4 Sarika Jagtap 77 73 77 54
5 5 Rohan Desai 81 64 63 75
6 6 Pratibha Patil 65 73 56 78
7 7 Aniket Lad 54 88 65 62
8 8 Snehal Jikamde 62 66 77 82
9 9 Parth Wagh 62 55 59 68
10 10 Siya Panchal 57 67 79 78
Information.Technology Obt..Marks Total.Marks Percentage
1 81 414 500 82.8
2 77 388 500 77.6
3 63 382 500 76.4
4 58 339 500 67.8
5 78 361 500 72.2
6 64 336 500 67.2
7 63 332 500 66.4
8 64 351 500 70.2
9 61 305 500 61.0
10 75 356 500 71.2
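The derived columns can be checked directly: Obt..Marks should equal the sum of the five subject marks, and Percentage should equal Obt..Marks / Total.Marks * 100. A sketch that types the subject marks in from the printed table, since the CSV is a local file:

```r
# Subject marks copied from the printed Student_Data table.
marks <- data.frame(
  Physics                = c(75, 85, 80, 77, 81, 65, 54, 62, 62, 57),
  Chemistry              = c(84, 70, 86, 73, 64, 73, 88, 66, 55, 67),
  Mathematics            = c(85, 77, 84, 77, 63, 56, 65, 77, 59, 79),
  Biology                = c(89, 79, 69, 54, 75, 78, 62, 82, 68, 78),
  Information.Technology = c(81, 77, 63, 58, 78, 64, 63, 64, 61, 75)
)

obtained   <- rowSums(marks)          # should match Obt..Marks
percentage <- obtained / 500 * 100    # Total.Marks is 500 for everyone
print(data.frame(obtained, percentage))
```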
> Data_Copy$Student.Name<-NULL
> Data_Copy
Roll.No Physics Chemistry Mathematics Biology
1 1 75 84 85 89
2 2 85 70 77 79
3 3 80 86 84 69
4 4 77 73 77 54
5 5 81 64 63 75
6 6 65 73 56 78
7 7 54 88 65 62
8 8 62 66 77 82
9 9 62 55 59 68
10 10 57 67 79 78
Information.Technology Obt..Marks Total.Marks Percentage
1 81 414 500 82.8
2 77 388 500 77.6
3 63 382 500 76.4
4 58 339 500 67.8
5 78 361 500 72.2
6 64 336 500 67.2
7 63 332 500 66.4
8 64 351 500 70.2
9 61 305 500 61.0
10 75 356 500 71.2
> Result<-kmeans(Data_Copy,3)
> Result
K-means clustering with 3 clusters of sizes 3, 3, 4

Cluster means:
Roll.No Physics Chemistry Mathematics Biology
1 2.000000 80.00000 80.00000 82.00 79.00000
2 7.666667 66.66667 65.66667 73.00 78.33333
3 6.500000 64.50000 72.25000 64.25 65.50000
Information.Technology Obt..Marks Total.Marks Percentage
1 73.66667 394.6667 500 78.93333
2 72.33333 356.0000 500 71.20000
3 61.50000 328.0000 500 65.60000

Clustering vector:
[1] 1 1 1 3 2 3 3 2 3 2

Within cluster sum of squares by cluster:
[1] 1222.4800 675.3333 2178.7000
(between_SS / total_SS = 71.1 %)

Available components:

[1] "cluster" "centers" "totss" "withinss"
[5] "tot.withinss" "betweenss" "size" "iter"
[9] "ifault"
> Result$size
[1] 3 3 4
> Result$cluster
[1] 1 1 1 3 2 3 3 2 3 2
> table(Student_Data$Student.Name,Result$cluster)

1 2 3
Aniket Lad 0 0 1
Nilesh Hatle 1 0 0
Parth Wagh 0 0 1
Pratibha Patil 0 0 1
Ravi Kasalkar 1 0 0
Rohan Desai 0 1 0
Sarika Jagtap 0 0 1
Siya Panchal 0 1 0
Snehal Jikamde 0 1 0
Sonal Parab 1 0 0
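In the run above, kmeans() receives every column of Data_Copy: Roll.No (an identifier), Total.Marks (constant at 500), and Obt..Marks and Percentage (both derived from the subject marks). With unscaled Euclidean distance, the wide-ranging Obt..Marks column tends to dominate. A minimal sketch of one alternative, assuming the goal is to group students by subject performance: keep only the five subject columns and standardise them with scale(). This is a variation on the practical, not its original method; the data are typed in from the printed table and the seed is arbitrary.

```r
students <- data.frame(
  Student.Name = c("Ravi Kasalkar", "Sonal Parab", "Nilesh Hatle",
                   "Sarika Jagtap", "Rohan Desai", "Pratibha Patil",
                   "Aniket Lad", "Snehal Jikamde", "Parth Wagh",
                   "Siya Panchal"),
  Physics                = c(75, 85, 80, 77, 81, 65, 54, 62, 62, 57),
  Chemistry              = c(84, 70, 86, 73, 64, 73, 88, 66, 55, 67),
  Mathematics            = c(85, 77, 84, 77, 63, 56, 65, 77, 59, 79),
  Biology                = c(89, 79, 69, 54, 75, 78, 62, 82, 68, 78),
  Information.Technology = c(81, 77, 63, 58, 78, 64, 63, 64, 61, 75)
)

subjects <- students[, -1]        # drop the name column only
scaled   <- scale(subjects)       # mean 0, sd 1 for each subject

set.seed(42)                      # arbitrary seed
fit <- kmeans(scaled, centers = 3, nstart = 25)

# Names grouped by cluster (cluster numbers are arbitrary labels)
print(split(students$Student.Name, fit$cluster))
```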
> plot(Student_Data[c("Physics","Chemistry")],col=Result$cluster)

>
➢ Output:
➢ 01) Clustering of Physics marks and Chemistry marks.
> plot(Student_Data$Physics~Student_Data$Chemistry,main="Clustering of Physics marks and Chemistry marks.",xlab="Chemistry Marks",ylab="Physics Marks")

>
➢ Output:

➢ 02) Clustering of Physics marks and Mathematics marks.

> plot(Student_Data$Physics~Student_Data$Mathematics,main="Clustering of Physics marks and Mathematics marks.",xlab="Mathematics Marks",ylab="Physics Marks")

>
➢ Output:
➢ 03) Clustering of Physics marks and Information Technology marks.
> plot(Student_Data$Physics~Student_Data$Information.Technology,main="Clustering of Physics marks and Information Technology marks.",xlab="Information Technology Marks",ylab="Physics Marks")

>
➢ Output:

➢ 04) Clustering of Physics marks and Biology marks.

> plot(Student_Data$Physics~Student_Data$Biology,main="Clustering of Physics marks and Biology marks.",xlab="Biology Marks",ylab="Physics Marks")

>
➢ Output:
> plot(Student_Data$Physics~Student_Data$Chemistry,main="Clustering of Physics marks and Chemistry marks.")
> with(Student_Data,text(Student_Data$Physics~Student_Data$Chemistry,labels=Student_Data$Student.Name,pos=4,cex=0.6))

>
➢ Output:
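The labelled plot above shows names but not cluster membership. A sketch that colours each point and name by the cluster number from the Result$cluster vector printed earlier (1 1 1 3 2 3 3 2 3 2). The marks are typed in from the printed table, and the widened x-axis range is an illustrative choice to leave room for the labels.

```r
students <- data.frame(
  Student.Name = c("Ravi Kasalkar", "Sonal Parab", "Nilesh Hatle",
                   "Sarika Jagtap", "Rohan Desai", "Pratibha Patil",
                   "Aniket Lad", "Snehal Jikamde", "Parth Wagh",
                   "Siya Panchal"),
  Physics   = c(75, 85, 80, 77, 81, 65, 54, 62, 62, 57),
  Chemistry = c(84, 70, 86, 73, 64, 73, 88, 66, 55, 67)
)
cluster <- c(1, 1, 1, 3, 2, 3, 3, 2, 3, 2)  # Result$cluster from above

plot(students$Chemistry, students$Physics,
     col = cluster, pch = 19, xlim = c(50, 100),
     main = "Clustering of Physics marks and Chemistry marks.",
     xlab = "Chemistry Marks", ylab = "Physics Marks")
text(students$Chemistry, students$Physics, labels = students$Student.Name,
     col = cluster, pos = 4, cex = 0.6)    # names to the right of points
```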

➢ Plotting Cluster Dendrogram:


> Data_Copy
Roll.No Physics Chemistry Mathematics Biology
1 1 75 84 85 89
2 2 85 70 77 79
3 3 80 86 84 69
4 4 77 73 77 54
5 5 81 64 63 75
6 6 65 73 56 78
7 7 54 88 65 62
8 8 62 66 77 82
9 9 62 55 59 68
10 10 57 67 79 78
Information.Technology Obt..Marks Total.Marks Percentage
1 81 414 500 82.8
2 77 388 500 77.6
3 63 382 500 76.4
4 58 339 500 67.8
5 78 361 500 72.2
6 64 336 500 67.2
7 63 332 500 66.4
8 64 351 500 70.2
9 61 305 500 61.0
10 75 356 500 71.2
> Distance<-dist(Data_Copy)
> HC<-hclust(Distance)
> plot(HC)

>
➢ Output:
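The dendrogram above labels its leaves by row number (1 to 10). A sketch that labels them by student name and cuts the tree into three groups, so the hierarchical result can be compared with the k-means vector printed earlier. Data_Copy is rebuilt from the printed values (the derived columns are recomputed and match the printed Obt..Marks and Percentage); k = 3 is an assumption chosen to match the k-means run.

```r
subjects <- data.frame(
  Physics                = c(75, 85, 80, 77, 81, 65, 54, 62, 62, 57),
  Chemistry              = c(84, 70, 86, 73, 64, 73, 88, 66, 55, 67),
  Mathematics            = c(85, 77, 84, 77, 63, 56, 65, 77, 59, 79),
  Biology                = c(89, 79, 69, 54, 75, 78, 62, 82, 68, 78),
  Information.Technology = c(81, 77, 63, 58, 78, 64, 63, 64, 61, 75)
)
Data_Copy <- data.frame(Roll.No = 1:10, subjects,
                        Obt..Marks = rowSums(subjects),
                        Total.Marks = 500)
Data_Copy$Percentage <- Data_Copy$Obt..Marks / Data_Copy$Total.Marks * 100

names_vec <- c("Ravi Kasalkar", "Sonal Parab", "Nilesh Hatle",
               "Sarika Jagtap", "Rohan Desai", "Pratibha Patil",
               "Aniket Lad", "Snehal Jikamde", "Parth Wagh", "Siya Panchal")

HC <- hclust(dist(Data_Copy))               # complete linkage by default
plot(HC, labels = names_vec, cex = 0.8,     # leaves labelled by student
     main = "Cluster Dendrogram (student data)")
rect.hclust(HC, k = 3, border = 2:4)

hc_groups     <- cutree(HC, k = 3)
kmeans_groups <- c(1, 1, 1, 3, 2, 3, 3, 2, 3, 2)  # Result$cluster above
print(table(hierarchical = hc_groups, kmeans = kmeans_groups))
```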
