8) Apply the EM algorithm to cluster a set of data stored in a CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import pandas as pd   # DataFrame: a 2-D data structure where data is aligned in a tabular shape
import numpy as np

# import some data to play with


iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']
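The exercise statement asks for data stored in a CSV file, while the program loads the built-in Iris data. A minimal sketch of the CSV route, assuming a hypothetical `iris.csv` with the same four feature columns plus a `Targets` column (the file is created here only so the sketch is self-contained):

```python
import pandas as pd

# Stand-in for a pre-existing iris.csv with the same schema as the Iris data.
pd.DataFrame({
    'Sepal_Length': [5.1, 7.0, 6.3],
    'Sepal_Width':  [3.5, 3.2, 3.3],
    'Petal_Length': [1.4, 4.7, 6.0],
    'Petal_Width':  [0.2, 1.4, 2.5],
    'Targets':      [0, 1, 2],
}).to_csv('iris.csv', index=False)

# Read the CSV and split it into features X and labels y.
df = pd.read_csv('iris.csv')
X = df[['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']]
y = df[['Targets']]
```

With this in place, the rest of the program runs unchanged on the CSV-loaded `X` and `y`.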

# Build the K-Means model


model = KMeans(n_clusters=3)
model.fit(X)   # model.labels_ gives the cluster number each sample belongs to

# Visualise the clustering results
plt.figure(figsize=(14, 14))
colormap = np.array(['red', 'lime', 'black'])
# Plot the original classifications using Petal features
plt.subplot(2, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Clusters')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.show()
[Figure: "Real Clusters" — scatter plot of Petal_Width vs Petal_Length, coloured by the true Iris labels]
# Plot the model's classifications


plt.subplot(2, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K-Means Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.show()

[Figure: "K-Means Clustering" — scatter plot of Petal_Width vs Petal_Length, coloured by the K-Means cluster labels]
# General EM for GMM
from sklearn import preprocessing
# transform your data such that its distribution will have a
# mean value of 0 and a standard deviation of 1.
scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns=X.columns)

from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=3)
gmm.fit(xs)   # estimate the parameters of each Gaussian distribution
# scaler.fit(X)       : compute the mean and std to be used for later scaling
# scaler.transform(X) : perform standardization by centering and scaling
# gmm.fit(X[, y])     : estimate model parameters with the EM algorithm
# gmm.predict(X)      : predict the labels for the data samples,
#                       returning the component label of each sample
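Unlike K-Means' hard assignments, the E-step of EM produces soft assignments (responsibilities), which `GaussianMixture.predict_proba` exposes. A minimal sketch, reproducing the same standardization as above so it runs on its own:

```python
import numpy as np
from sklearn import datasets, preprocessing
from sklearn.mixture import GaussianMixture

# Standardize the Iris features, as done before fitting the GMM above.
X = preprocessing.StandardScaler().fit_transform(datasets.load_iris().data)

gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# predict_proba returns the responsibilities computed in the E-step:
# one probability per component, summing to 1 for each sample.
resp = gmm.predict_proba(X)
print(resp.shape)                        # (150, 3)
print(np.allclose(resp.sum(axis=1), 1))  # True
```

`gmm.predict` simply takes the argmax over these responsibilities, which is how the hard labels plotted below are obtained.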
gmm_y = gmm.predict(xs)   # predict which Gaussian component each sample belongs to
plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[gmm_y], s=40)
plt.title('GMM Clustering')
plt.xlabel('Petal Length')
plt.ylabel('Petal Width')
plt.show()
print('Observation: The GMM using EM algorithm based clustering matched the true labels more closely than the Kmeans.')

[Figure: "GMM Clustering" — scatter plot of Petal_Width vs Petal_Length, coloured by the GMM cluster labels]

Output: Observation: The GMM using EM algorithm based clustering matched the true labels more closely than the Kmeans.
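The closing observation can be made quantitative. A minimal sketch (an assumption, not part of the original program) that scores both clusterings against the true labels with `sklearn.metrics.adjusted_rand_score`, which is invariant to how the cluster numbers are permuted:

```python
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture

iris = datasets.load_iris()
X, y = iris.data, iris.target

# K-Means on the raw features, as in the program above.
km_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# GMM on the standardized features, as in the program above.
xs = preprocessing.StandardScaler().fit_transform(X)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit(xs).predict(xs)

# Adjusted Rand Index: 1.0 = identical to the true labels, ~0 = random.
print('K-Means ARI:', adjusted_rand_score(y, km_labels))
print('GMM ARI:    ', adjusted_rand_score(y, gmm_labels))
```

A higher ARI for the GMM run would support the comment that EM-based clustering matches the true Iris labels more closely than K-Means.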



Note: EM is used when the data has hidden (latent) variables, i.e. incomplete data.
