23/04/2022, 12:27 AP19110010030_R_Lab-Assignment-6 - Jupyter Notebook

Kilaru Sravan

AP19110010030

CSE-A

Implementing K-Medoids

Installing the required packages

In [1]:

install.packages("cluster")

There is a binary version available but the source version is later:

        binary source needs_compilation
cluster  2.1.2  2.1.3              TRUE

Binaries will be installed

package 'cluster' successfully unpacked and MD5 sums checked

The downloaded binary packages are in

C:\Users\LENOVO\AppData\Local\Temp\RtmpGoFql6\downloaded_packages

In [2]:

install.packages("factoextra")

package 'factoextra' successfully unpacked and MD5 sums checked

The downloaded binary packages are in

C:\Users\LENOVO\AppData\Local\Temp\RtmpGoFql6\downloaded_packages
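
Calling install.packages() in a notebook cell downloads the package again every time the cell is run. A minimal sketch of one way to avoid that is shown here; `missing_pkgs` and `install_if_missing` are hypothetical helper names, not part of the original assignment.

```r
# Hypothetical helper: return the names of packages that are not yet installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install only the packages that are missing.
# Returns (invisibly) the names it tried to install.
install_if_missing <- function(pkgs) {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

With these helpers, a single call such as install_if_missing(c("cluster", "factoextra")) at the top of the notebook would download nothing when both packages are already present.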

Loading the dataset

localhost:8888/notebooks/AP19110010030_R_Lab-Assignment-6.ipynb 1/7

In [4]:

library("ggplot2")
library("cluster")
library("factoextra")
data = read.csv("BMI.csv", header = TRUE)
head(data, 8)

Gender  Height  Weight  Category     Index
Male    174     96      Over weight  2
Male    189     87      Weak         0
Female  185     110     Over weight  2
Female  195     104     Normal       1
Male    149     61      Normal       1
Male    189     104     Normal       1
Male    147     92      Obesity      3
Male    154     111     Obesity      3

Selecting only the columns required for the analysis

In [5]:

df = subset(data, select = c(Height, Weight))


head(df, 8)

Height  Weight
174     96
189     87
185     110
195     104
149     61
189     104
147     92
154     111

Plotting the data points


In [6]:

plot1 = ggplot(df, aes(x = Height, y = Weight))+geom_point()


plot1

Finding the optimal number of clusters using the elbow method


In [7]:

fviz_nbclust(df, pam, method = "wss")

From the graph above, 6 is taken as the optimal number of clusters, because the slope of the curve changes at that point.
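
To make the elbow criterion concrete, the total within-cluster sum of squares can be computed by hand for a range of k. The sketch below uses only the cluster package and a synthetic stand-in for the Height and Weight columns (the toy data and the helper name `wss_for_k` are assumptions for illustration, not the BMI.csv data). It should correspond roughly to the quantity that the "wss" method plots.

```r
library(cluster)

set.seed(42)
# Synthetic stand-in for the Height/Weight columns (three well-separated groups).
toy <- data.frame(
  Height = c(rnorm(30, 150, 3), rnorm(30, 170, 3), rnorm(30, 190, 3)),
  Weight = c(rnorm(30, 60, 4), rnorm(30, 90, 4), rnorm(30, 120, 4))
)

# Hypothetical helper: cluster with pam() and return the total within-cluster
# sum of squared distances to each cluster's mean.
wss_for_k <- function(x, k) {
  cl <- pam(x, k = k)$clustering
  sum(sapply(split(x, cl), function(g) sum(scale(g, scale = FALSE)^2)))
}

ks <- 1:8
wss <- sapply(ks, function(k) wss_for_k(toy, k))

# The "elbow" is the k after which the curve flattens out.
plot(ks, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster sum of squares")
```

On data with three clearly separated groups, the curve would be expected to drop steeply up to k = 3 and flatten afterwards. On real data the bend is often less sharp, so the chosen k is a judgment call.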

In [8]:

kmed = pam(df, k = 6)
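
The object returned by pam() carries more than the cluster labels. The following sketch, run on synthetic data rather than the BMI.csv data (an assumption for illustration), shows how the medoids can be inspected and confirms that each medoid is an actual observation from the input.

```r
library(cluster)

set.seed(1)
# Synthetic stand-in for the Height/Weight columns (two separated groups).
toy <- data.frame(
  Height = c(rnorm(20, 150, 3), rnorm(20, 190, 3)),
  Weight = c(rnorm(20, 60, 4), rnorm(20, 110, 4))
)

fit <- pam(toy, k = 2)

fit$medoids    # coordinates of the medoids, one row per cluster
fit$id.med     # row indices of the medoids in the input data
fit$clusinfo   # per-cluster size and dissimilarity summaries

# Unlike k-means centroids, medoids are rows of the original data.
toy[fit$id.med, ]
```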


In [9]:

# pam() stores the assignments in $clustering; kmed$cluster only works through partial matching
final_data = cbind(df, cluster = kmed$clustering)


head(final_data, 8)

Height  Weight  cluster
174     96      1
189     87      1
185     110     1
195     104     1
149     61      2
189     104     1
147     92      3
154     111     3
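
Once the labels are attached, the clusters can be summarised with base R. The sketch below uses only the eight rows printed above, so the counts and means describe that preview and not the full dataset; the names `preview`, `sizes` and `centers` are chosen here for illustration.

```r
# The eight rows shown in the output above.
preview <- data.frame(
  Height  = c(174, 189, 185, 195, 149, 189, 147, 154),
  Weight  = c(96, 87, 110, 104, 61, 104, 92, 111),
  cluster = c(1, 1, 1, 1, 2, 1, 3, 3)
)

# Number of observations in each cluster.
sizes <- table(preview$cluster)
sizes

# Mean Height and Weight per cluster.
centers <- aggregate(cbind(Height, Weight) ~ cluster, data = preview, FUN = mean)
centers
```

The same two calls could be applied to final_data to summarise all six clusters.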

Plotting the clusters


In [10]:

fviz_cluster(kmed, data = df)
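
If factoextra is not available, a comparable picture can be drawn with base graphics. This is a minimal sketch on synthetic data (an assumption for illustration, not the BMI.csv data) that colours points by cluster and marks the medoids with crosses.

```r
library(cluster)

set.seed(7)
# Synthetic stand-in for the Height/Weight columns (two separated groups).
toy <- data.frame(
  Height = c(rnorm(25, 155, 4), rnorm(25, 185, 4)),
  Weight = c(rnorm(25, 65, 5), rnorm(25, 105, 5))
)

fit <- pam(toy, k = 2)

# Points coloured by cluster label.
plot(toy$Height, toy$Weight, col = fit$clustering, pch = 19,
     xlab = "Height", ylab = "Weight")

# Medoids drawn as large crosses on top of the points.
points(fit$medoids[, "Height"], fit$medoids[, "Weight"],
       pch = 4, cex = 2, lwd = 2)
```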
