First, we used the above R code to normalize the data. Because the variables were
measured in different units, we applied a normalizing function to bring them to a
common scale.
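As a minimal sketch of this step, the following assumes z-score normalization (subtract each column's mean and divide by its standard deviation). The report does not show the dataset or the exact normalizing function, so the `customers` data frame and its column names are hypothetical.

```r
# Hypothetical customer data; the report's actual dataset is not shown.
set.seed(1)
customers <- data.frame(
  age      = sample(18:70, 20, replace = TRUE),
  income   = sample(15:140, 20, replace = TRUE),  # annual income, in thousands
  spending = sample(1:100, 20, replace = TRUE)    # spending score
)

# Z-score normalization (an assumption): centre each column on its mean
# and divide by its standard deviation.
means <- apply(customers, 2, mean)
sds   <- apply(customers, 2, sd)
normalized <- scale(customers, center = means, scale = sds)

# After normalization every column has mean 0 and standard deviation 1.
print(round(colMeans(normalized), 10))
print(apply(normalized, 2, sd))
```

With all variables on the same scale, no single variable (for example income, which has the largest raw range here) dominates the distance calculation.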
We used the above R code to compute the Euclidean distance between each pair of
observations in the normalized data.
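The distance step might look like the sketch below, which uses base R's `dist` function on a small hypothetical normalized matrix and checks one entry by hand. The data and variable names are illustrative only.

```r
# Hypothetical normalized data: 5 observations, 3 variables.
set.seed(1)
raw <- matrix(rnorm(15), nrow = 5,
              dimnames = list(NULL, c("age", "income", "spending")))
normalized <- scale(raw)

# Pairwise Euclidean distances between rows (observations).
distance <- dist(normalized, method = "euclidean")

# Manual check for observations 1 and 2: square root of the summed
# squared differences across all variables.
manual <- sqrt(sum((normalized[1, ] - normalized[2, ])^2))
print(all.equal(as.matrix(distance)[1, 2], manual))
```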
We formed clusters from the normalized data using the “hclust” function with complete
linkage, and then used the “plot” function to draw the dendrogram for the clusters.
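A possible version of this step is sketched below. It assumes complete linkage (the default `method` of `hclust`) and uses a hypothetical `customers` data frame in place of the report's data.

```r
# Hypothetical customer data; the report's actual dataset is not shown.
set.seed(1)
customers <- data.frame(
  age      = sample(18:70, 20, replace = TRUE),
  income   = sample(15:140, 20, replace = TRUE),
  spending = sample(1:100, 20, replace = TRUE)
)

# hclust works on a distance matrix, not on the raw data frame.
distance <- dist(scale(customers))

# Complete linkage: the distance between two clusters is the largest
# distance between any pair of their members.
hc_complete <- hclust(distance, method = "complete")

# Draw the dendrogram; hang = -1 aligns all leaf labels at the bottom.
plot(hc_complete, hang = -1, main = "Complete linkage",
     xlab = "Customer", sub = "")
```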
We ran the clustering function again, this time using average linkage instead of
complete linkage.
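Assuming the second run differs only in the `method` argument of `hclust`, it could be sketched as follows. The `customers` data frame is again hypothetical.

```r
# Hypothetical customer data; the report's actual dataset is not shown.
set.seed(1)
customers <- data.frame(
  age      = sample(18:70, 20, replace = TRUE),
  income   = sample(15:140, 20, replace = TRUE),
  spending = sample(1:100, 20, replace = TRUE)
)
distance <- dist(scale(customers))

# Average linkage: the distance between two clusters is the mean of all
# pairwise distances between their members.
hc_average <- hclust(distance, method = "average")

plot(hc_average, hang = -1, main = "Average linkage",
     xlab = "Customer", sub = "")
```

Comparing this dendrogram with the complete-linkage one shows how sensitive the grouping is to the linkage rule.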
Using this function, we determined the total number of members in each cluster for both
the complete-linkage and average-linkage results.
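One way to obtain these counts is to cut each tree with `cutree` and tabulate the memberships, as in the sketch below. The data, the choice of `k = 4`, and the names `member_complete` and `member_average` are assumptions for illustration.

```r
# Hypothetical customer data; the report's actual dataset is not shown.
set.seed(1)
customers <- data.frame(
  age      = sample(18:70, 20, replace = TRUE),
  income   = sample(15:140, 20, replace = TRUE),
  spending = sample(1:100, 20, replace = TRUE)
)
distance <- dist(scale(customers))

# Cut each dendrogram into 4 clusters (k = 4 is assumed here).
member_complete <- cutree(hclust(distance, method = "complete"), k = 4)
member_average  <- cutree(hclust(distance, method = "average"),  k = 4)

# Number of customers per cluster under each linkage.
print(table(member_complete))
print(table(member_average))

# Cross-tabulation: how many customers fall into each pair of clusters
# across the two solutions.
print(table(member_complete, member_average))
```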
Using this function, we determined the aggregate characteristics of each cluster for different
factors, like gender, age, income, and spending.
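The cluster profiles could be computed with base R's `aggregate`, as in this sketch. The data are hypothetical, gender is assumed to be coded 0/1 so that its mean is a proportion, and `k = 4` is an assumption.

```r
# Hypothetical customer data; gender coded 0/1 is an assumption.
set.seed(1)
customers <- data.frame(
  gender   = sample(0:1, 20, replace = TRUE),
  age      = sample(18:70, 20, replace = TRUE),
  income   = sample(15:140, 20, replace = TRUE),
  spending = sample(1:100, 20, replace = TRUE)
)

# Cluster on the normalized data, then cut into 4 clusters.
member <- cutree(hclust(dist(scale(customers))), k = 4)

# Mean of every variable within each cluster, on the original scale.
profile <- aggregate(customers, by = list(cluster = member), FUN = mean)
print(profile)
```

Aggregating the original (unnormalized) values keeps the profile readable in real units such as years and income.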
We used these two plots to determine the optimum number of clusters for the data.
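The report does not show which two plots were used. As one common possibility, the sketch below draws an elbow plot of the total within-cluster sum of squares against the number of clusters. The data and the helper `wss_for_k` are hypothetical.

```r
# Hypothetical customer data; the report's actual dataset is not shown.
set.seed(1)
customers <- data.frame(
  age      = sample(18:70, 20, replace = TRUE),
  income   = sample(15:140, 20, replace = TRUE),
  spending = sample(1:100, 20, replace = TRUE)
)
normalized <- scale(customers)
hc <- hclust(dist(normalized), method = "complete")

# Hypothetical helper: total within-cluster sum of squares for k clusters.
wss_for_k <- function(k) {
  member <- cutree(hc, k = k)
  groups <- split(as.data.frame(normalized), member)
  sum(sapply(groups, function(g) {
    # Squared deviations of each member from its cluster centroid.
    sum(scale(g, center = TRUE, scale = FALSE)^2)
  }))
}

ks  <- 1:8
wss <- sapply(ks, wss_for_k)

# Look for the "elbow" where adding clusters stops reducing WSS sharply.
plot(ks, wss, type = "b", xlab = "Number of clusters",
     ylab = "Within-cluster sum of squares")
```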
INTERPRETATIONS:
DENDROGRAM
Based on these two plots, we determined that the optimum number of clusters is 4.
These tables show the total number of customers in each cluster. The Member A table
gives the number of customers per cluster under complete linkage, and the Member C
table gives the number per cluster under average linkage. The third table
cross-tabulates the two results and shows how many customers each pair of clusters has
in common.
These tables show the aggregate characteristics of each cluster for the different factors.
Based on the data, the aggregate for gender differs little across clusters, meaning
gender is not a significant factor in determining the clusters.
CLUSTERS   AGE       ANNUAL INCOME   SPENDING SCORE
1          LOW       LOWEST          AVERAGE
2          HIGHEST   LOWEST          AVERAGE
3          AVERAGE   HIGHEST         HIGHEST
4          AVERAGE   HIGHEST         LOWEST