
Machine Learning: Clustering

1) The file ‘4-1 citizens.csv’ contains the locations of citizens in a neighborhood,
given as the x and y coordinates of where each citizen lives.

Looking at the data, it appears there are 3 living areas in the neighborhood.

Cluster the citizens based on their location and draw the clustered citizens in a scatterplot.
The result should look like the graph below (but feel free to use different colors).
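One possible way to approach exercise 1 is sketched below with k-means from scikit-learn. The exercise does not prescribe an algorithm or a library, so the choice of k-means, the helper names `cluster_locations` and `plot_clusters`, and the made-up sample points are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def cluster_locations(points, n_clusters=3, seed=0):
    """Assign each (x, y) point to one of n_clusters groups with k-means."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(np.asarray(points, dtype=float))


def plot_clusters(points, labels):
    """Draw a scatterplot with one color per cluster label."""
    # Imported here so the clustering helper also works without a display.
    import matplotlib.pyplot as plt

    points = np.asarray(points, dtype=float)
    plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="viridis")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()


# Made-up sample standing in for the real file: three obvious living areas.
sample = [(0, 0), (0, 1), (1, 0),
          (10, 0), (10, 1), (11, 0),
          (5, 10), (5, 11), (6, 10)]
sample_labels = cluster_locations(sample)
```

With the real data, the points could be loaded with something like `pandas.read_csv("4-1 citizens.csv")` and passed to `cluster_locations`, followed by `plot_clusters(points, labels)`. The column names in the file are not given in the exercise, so check them before selecting the coordinate columns.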

2) The file ‘4-2 citizens.csv’ contains data for another year. It contains both the x and y
coordinates of where each citizen lives and the party they voted for in the previous election.
You see a representation of the data in the graph below.

Based on this graph, it looks like the data can be split into 4 clusters. Do the
clustering (remember to normalize first) and draw the resulting graph.

The result should look like the graph below. (Colors may again look different.)
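The reminder to normalize matters because k-means works on distances, so a feature with a large numeric range would otherwise dominate. The sketch below shows one way to do it, under several assumptions: the columns are called `x`, `y`, and `party`; the party is stored as text and is one-hot encoded; "normalize" is taken to mean min-max scaling to [0, 1]. The party names `P` and `Q` and all values are made up.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler


def normalize_features(df, columns):
    """One-hot encode text columns, then rescale every column to [0, 1]."""
    encoded = pd.get_dummies(df[columns], dtype=float)
    scaled = MinMaxScaler().fit_transform(encoded)
    return pd.DataFrame(scaled, columns=encoded.columns, index=df.index)


def cluster_normalized(df, columns, n_clusters=4, seed=0):
    """Run k-means on the normalized features and return one label per row."""
    features = normalize_features(df, columns)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)


# Made-up rows standing in for the real file: two locations, two parties.
citizens = pd.DataFrame({
    "x": [100, 110, 100, 110, 900, 910, 900, 910],
    "y": [2, 3, 2, 3, 8, 9, 8, 9],
    "party": ["P", "P", "Q", "Q", "P", "P", "Q", "Q"],
})
normalized = normalize_features(citizens, ["x", "y", "party"])
labels = cluster_normalized(citizens, ["x", "y", "party"])
```

After scaling, every column spans the same range, so the location and the party contribute on a comparable footing. A different encoding of the party (for example a single numeric code) would also be defensible and may give different clusters.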

3) The file ‘4-3 customers.csv’ contains information on customers. It shows customers of
various ages buying one of your products: A, B, C, or D. The graph below represents the data
in the file.

Cluster the data (remember to normalize) and plot the result. Try various settings for the
number of clusters and see what happens. You can plot the data in the same manner as shown
above by adding an extra column to the data that holds the x coordinate for the
scatterplot. A customer gets the value 1 for x if the product is A, 2 if the product is B, …

The graph below shows what your result could look like for 7 clusters.
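The extra x column and the experiment with different cluster counts could be sketched as follows. The column names `age` and `product` are assumptions, as is continuing the numbering to 3 for C and 4 for D, clustering on the numeric product code together with the age, and using min-max scaling. The customer rows are made up.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# 1 for A and 2 for B come from the exercise; 3 and 4 continue the pattern (assumed).
PRODUCT_TO_X = {"A": 1, "B": 2, "C": 3, "D": 4}


def add_product_x(df, product_col="product"):
    """Return a copy of df with an extra numeric column 'x' for the product."""
    out = df.copy()
    out["x"] = out[product_col].map(PRODUCT_TO_X)
    return out


def cluster_customers(df, n_clusters, seed=0):
    """Normalize 'x' and 'age' to [0, 1] and cluster the rows with k-means."""
    features = MinMaxScaler().fit_transform(df[["x", "age"]])
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(features)


# Made-up rows standing in for the real file.
customers = pd.DataFrame({
    "age": [20, 25, 30, 35, 60, 65, 70, 75],
    "product": ["A", "B", "C", "D", "A", "B", "C", "D"],
})
customers_x = add_product_x(customers)

# Try several cluster counts and keep the labels for each one.
labels_by_k = {k: cluster_customers(customers_x, k) for k in (2, 3, 4)}
```

Each entry of `labels_by_k` can then be passed as the color argument of a scatterplot of `x` against `age` to compare how the grouping changes with the number of clusters.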
