Hierarchical Clustering

import numpy as np

import pandas as pd

from matplotlib import pyplot as plt

df.iloc[:, 1:5].describe()  # summary statistics for the 2nd through 5th columns (positions 1, 2, 3, 4)

from scipy.cluster.hierarchy import dendrogram, linkage

wardlink = linkage(data, method = 'ward')

dend = dendrogram(wardlink)
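To make the linkage and dendrogram steps concrete, here is a minimal sketch on made-up data. The names `toy_data` and `toy_link` are hypothetical and stand in for the notes' `data` and `wardlink`; `no_plot=True` is used only so the example runs without opening a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical toy data: 30 random observations with 4 features each.
rng = np.random.default_rng(42)
toy_data = rng.normal(size=(30, 4))

# Ward linkage records one merge per row, so n observations give n - 1 rows.
toy_link = linkage(toy_data, method='ward')
print(toy_link.shape)  # (29, 4)

# no_plot=True returns the dendrogram layout as a dict without drawing it.
full = dendrogram(toy_link, no_plot=True)

# truncate_mode='lastp' keeps only the last p merged clusters as leaves.
truncated = dendrogram(toy_link, truncate_mode='lastp', p=10, no_plot=True)

# 'ivl' holds the leaf labels, so its length is the number of leaves shown.
print(len(full['ivl']), len(truncated['ivl']))
```

Truncating is mostly a readability choice: with many observations the full tree is too dense to read, while the last few merges are usually the ones that matter for picking a number of clusters.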

dend = dendrogram(wardlink,
                  truncate_mode='lastp',  # show only the last p merged clusters
                  p=10)

from scipy.cluster.hierarchy import fcluster

#Method 1

clusters = fcluster(wardlink, 3, criterion='maxclust')  # cut the tree into at most 3 clusters

clusters

# Method 2

clusters = fcluster(wardlink, 3, criterion='distance')  # here 3 is a distance threshold, not a cluster count

clusters
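The two criteria interpret the second argument differently, which is easy to miss. The sketch below uses made-up, well-separated data (`toy_data`, `toy_link` and the threshold choice are assumptions for illustration, not values from the notes) to show one way both methods can arrive at the same three groups.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical toy data: three well-separated groups of 20 two-dimensional points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
toy_data = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

toy_link = linkage(toy_data, method='ward')

# criterion='maxclust': the number is the maximum count of flat clusters.
labels_maxclust = fcluster(toy_link, 3, criterion='maxclust')

# criterion='distance': the number is a height at which the tree is cut.
# Column 2 of the linkage matrix stores merge heights; cutting between the
# third-last and second-last merges leaves exactly three clusters.
threshold = (toy_link[-3, 2] + toy_link[-2, 2]) / 2
labels_distance = fcluster(toy_link, threshold, criterion='distance')

print(np.unique(labels_maxclust), np.unique(labels_distance))
```

In practice the distance threshold is usually read off the dendrogram's vertical axis, so a value that works for one dataset will not carry over to another.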

df['clusters'] = clusters  # add the cluster labels as a new column in the dataset

df.head()

Cluster Frequency
df.clusters.value_counts().sort_index()

Cluster Profiles
aggdata=df.iloc[:,1:8].groupby('clusters').mean()
aggdata['Freq']=df.clusters.value_counts().sort_index()
aggdata
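The profiling step can be sketched on a tiny made-up table. The DataFrame `toy_df` and its column names are hypothetical; only the groupby, mean and value_counts pattern comes from the notes.

```python
import pandas as pd

# Hypothetical toy dataset; column names are illustrative only.
toy_df = pd.DataFrame({
    'score_a': [90, 85, 40, 45, 60, 65],
    'score_b': [80, 90, 30, 50, 70, 60],
    'clusters': [1, 1, 2, 2, 3, 3],
})

# Mean of every feature within each cluster: one row per cluster label.
toy_profile = toy_df.groupby('clusters').mean()

# value_counts() is indexed by cluster label, so the assignment aligns
# each count with the matching row of the profile table.
toy_profile['Freq'] = toy_df['clusters'].value_counts().sort_index()
print(toy_profile)
```

Because pandas aligns on the index during assignment, the `sort_index()` call mainly keeps the intermediate counts readable; the `Freq` column ends up in the right rows either way.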

• Cluster 1: Tier 1 colleges (Top Colleges)
• Cluster 2: Tier 3 colleges (Poor performing colleges/new college)
• Cluster 3: Tier 2 colleges (Medium performing colleges)

Using Agglomerative Clustering


from sklearn.cluster import AgglomerativeClustering

# scikit-learn 1.2+ names this parameter `metric`; older versions use affinity='euclidean'
cluster = AgglomerativeClustering(n_clusters=3, metric='euclidean',
                                  linkage='average')
Cluster_agglo=cluster.fit_predict(enggdata.iloc[:,1:6])
print(Cluster_agglo)

df["Agglo_CLusters"]=Cluster_agglo

df.columns

agglo_data=df.drop(["SR_NO","clusters"],axis=1).groupby('Agglo_CLusters').mean()
agglo_data['Freq']=df.Agglo_CLusters.value_counts().sort_index()
agglo_data
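For comparison with the SciPy route, the scikit-learn steps above can be tried end to end on made-up data. `toy_data` and `toy_labels` are hypothetical names, and the distance parameter is left at its default (Euclidean) so the sketch does not depend on the scikit-learn version.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: three well-separated groups of 20 two-dimensional points.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
toy_data = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Average linkage with the default Euclidean distance.
model = AgglomerativeClustering(n_clusters=3, linkage='average')
toy_labels = model.fit_predict(toy_data)

# Labels start at 0 here, whereas fcluster labels start at 1.
print(np.bincount(toy_labels))
```

The different label numbering is worth remembering when the two sets of cluster columns sit side by side in the same DataFrame: cluster 1 from `fcluster` and cluster 1 from `AgglomerativeClustering` are not necessarily the same group.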

Recommendations
1. Companies hiring through campus placements should visit Tier 1 colleges first, followed by Tier 2 colleges.
2. Companies offering training programs for staff and students should target Tier 2 and Tier 3 colleges, since Tier 1 colleges are already performing comparatively better.
3. Tier 3 colleges need to invest more in marketing and advertising their campuses to build awareness and attract students.
4. Students choosing a college can prioritize Tier 1 colleges over Tier 2 and Tier 3 colleges.

Saving the Cluster Profiles in a CSV file
#aggdata.to_csv('enggdata_hc.csv')

K-Means Clustering
import pandas as pd

import numpy as np

import seaborn as sns

import matplotlib.pyplot as plt

%matplotlib inline

from sklearn.cluster import KMeans
