Note: Observations (objects) represent the respondents in HR analytics and appear as the rows of the data set.
Characteristics (attributes) represent the actual data you would like to collect and appear as the columns
of the data set.
Cluster analysis
Can be used for data reduction.
◦ E.g., from the entire set of fresh candidates, you may want to understand the nature of these candidates
and group them into several sub-groups based on commonalities, such as location, educational background,
etc.
Association measures are used to compare observations when the characteristics are measured in non-metric terms.
For example, employees answer yes or no to a set of attributes such as liking the office space, the boss, and
colleagues. Association measures assess the degree of agreement between pairs of respondents.
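The slides do not name a specific association measure. One common choice for yes/no answers is the simple matching coefficient: the share of attributes on which two respondents give the same answer. The sketch below assumes answers are stored as lists of "yes"/"no" strings; the function name and the respondents are hypothetical.

```python
def simple_matching(a, b):
    """Share of attributes on which two respondents give the same yes/no answer."""
    if len(a) != len(b):
        raise ValueError("both respondents must answer the same attributes")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)

# Hypothetical answers to: likes office space, likes boss, likes colleagues
alice = ["yes", "no", "yes"]
bob = ["yes", "yes", "yes"]
print(simple_matching(alice, bob))  # 2 of 3 answers agree
```

A value of 1 means full agreement and 0 means none; the same function can fill a respondent-by-respondent similarity table for clustering.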
Hierarchical Clustering
Involves a series of n-1 clustering decisions (n is the number of observations) that combine
observations into a hierarchy or tree-like structure (a dendrogram).
◦ Agglomerative methods: Each observation starts out as its own cluster, and clusters are successively joined
based on similarity measures until only a single cluster remains. So, with 50 observations, how many
clusters are there at the start and at the end?
◦ Divisive methods: All observations start in a single cluster, which is successively split (first into 2,
then 3, and so on) until each observation forms its own cluster. So, with 50 observations, how many
clusters are there at the start and at the end?
◦ The more commonly used approach is the agglomerative method.
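The agglomerative procedure can be sketched in plain Python as follows. This is a minimal illustration, not the method from the slides: it assumes numeric characteristics, Euclidean distance, and single linkage (the distance between two clusters is the distance between their closest members). The loop makes exactly one clustering decision per pass, so n observations produce n-1 merges, going from n clusters down to 1.

```python
import math

def agglomerative(points):
    """Single-linkage agglomerative clustering.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    one entry per clustering decision.
    """
    clusters = [[i] for i in range(len(points))]  # each observation starts as its own cluster
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]  # join the two most similar clusters
        del clusters[j]                          # j > i, so index i is unaffected
    return history

# Hypothetical respondents described by two numeric characteristics
data = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (6.0, 5.0), (10.0, 0.0)]
steps = agglomerative(data)
print(len(data), "clusters at the start")
for k, (a, b, d) in enumerate(steps, start=1):
    print(f"decision {k}: join {a} and {b} (distance {d:.2f}) -> {len(data) - k} clusters")
```

Swapping in complete or average linkage changes only how `d` is computed. A divisive method would run in the opposite direction, starting from one cluster and splitting.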
Non-hierarchical Clustering
Does not involve a tree-like construction process.
Instead, observations are assigned to clusters once the number of clusters is specified. The process
proceeds in two steps.
◦ Specify cluster seeds: For example, the first observation with no missing values can be taken as the
seed for a cluster.
◦ Assignment of observations: Assign each observation to one of the cluster seeds based on similarity.
◦ Cluster seeds can be formed simultaneously or sequentially.
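A minimal sketch of the two steps, assuming numeric characteristics with missing values stored as `None`. Seeds are chosen sequentially: following the example above, the first complete observation becomes the first seed, and a hypothetical `min_gap` rule accepts a later complete observation as a seed only if it lies at least that far from every existing seed. Each complete observation is then assigned to its nearest seed by Euclidean distance. The function names and data are illustrative only.

```python
import math

def pick_seeds(rows, k, min_gap):
    """Sequentially pick k seeds: complete rows at least min_gap away from earlier seeds."""
    seeds = []
    for row in rows:
        if any(v is None for v in row):
            continue  # skip observations with missing values
        if all(math.dist(row, s) >= min_gap for s in seeds):
            seeds.append(row)
        if len(seeds) == k:
            break
    if len(seeds) < k:
        raise ValueError("could not find k well-separated complete observations")
    return seeds

def assign(rows, seeds):
    """Assign each complete observation to the index of its nearest seed (None if incomplete)."""
    labels = []
    for row in rows:
        if any(v is None for v in row):
            labels.append(None)
        else:
            labels.append(min(range(len(seeds)), key=lambda i: math.dist(row, seeds[i])))
    return labels

rows = [(None, 3.0), (2.0, 3.0), (2.5, 3.5), (8.0, 1.0), (7.5, 1.5)]
seeds = pick_seeds(rows, k=2, min_gap=3.0)
labels = assign(rows, seeds)
print(seeds)
print(labels)
```

Choosing all seeds at once (for example, k randomly drawn complete rows) would be the simultaneous alternative.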
K-means: A form of non-hierarchical clustering
Partition the data into a user-specified number of clusters.
Then iteratively reassign observations to clusters until a numerical criterion is met.
The criterion specifies a goal related to minimizing the distance between observations within a cluster
and maximizing the distance between clusters.
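A minimal Lloyd-style K-means sketch in plain Python, under stated assumptions: Euclidean distance, the first k observations as initial seeds, and "no observation changes cluster" as the numerical stopping criterion. It also returns the within-cluster sum of squares, which K-means tries to minimize; because the total sum of squares splits into within-cluster and between-cluster parts, reducing the first increases the second, which matches the two-sided goal above.

```python
import math

def kmeans(points, k, max_iter=100):
    """Partition points into k clusters by repeated reassignment to the nearest centroid."""
    centroids = [points[i] for i in range(k)]  # assumption: first k observations as seeds
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # no observation changed cluster: criterion met
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    wcss = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, wcss

# Hypothetical respondents described by two numeric characteristics
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5), (8.5, 9.0)]
labels, centroids, wcss = kmeans(points, k=2)
print(labels)
print(centroids)
print(round(wcss, 3))
```

Because the result depends on the initial seeds, practical implementations usually try several starting points; the fixed first-k rule here only keeps the example reproducible.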
Non-hierarchical methods are preferred for HR analytics because they can accommodate large data
sets.
Hierarchical or K-means?