DISC 420 Assignment 4
Friday, 31 March 2023 Total Marks: 20
Instructions
Submission Requirements
1. One R script file per group containing the relevant code
2. One Word document containing the desired plots/visuals/outputs (copied from RStudio) and
analysis where specified in the questions below.
Preliminary tasks
• Save your R file and Word document with your group number as the file name.
• Download the data file called “Menu.csv” and save it in your working directory/project folder.
Data Set
Menu.csv consists of the menu items of a restaurant, listing all the dishes/drinks along with prices. The variable
names are self-explanatory. Import/load the data file in RStudio and name/store the data frame as ‘Menu’.
i. Will you scale the data if you intend to apply clustering using the features and elements as
your variables? Why or why not? (0.5 mark)
ii. You are required to do clustering only on dishes and not drinks. In the names.x column, all
drinks start with the [Drinks] label. (0.5 mark)
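The preparation described in (i) and (ii) can be sketched as follows. This is a minimal illustration on a small made-up data frame, not the actual Menu.csv: only the names.x column and the [Drinks] label come from the assignment, while the numeric columns price and calories are hypothetical stand-ins. Scaling is shown as one option, in case it is judged appropriate for the variables used.

```r
# Toy stand-in for the Menu data frame. With the real file this would be:
# Menu <- read.csv("Menu.csv")
Menu <- data.frame(
  names.x  = c("[Drinks] Cola", "Burger", "[Drinks] Tea", "Pasta", "Salad"),
  price    = c(2.5, 9.0, 2.0, 11.5, 7.0),   # hypothetical column
  calories = c(140, 650, 5, 720, 310),      # hypothetical column
  stringsAsFactors = FALSE
)

# Keep only dishes: drop rows whose name starts with the literal "[Drinks]"
is_drink <- startsWith(Menu$names.x, "[Drinks]")
dishes   <- Menu[!is_drink, ]

# Standardise the numeric columns (mean 0, sd 1) so that no single
# variable dominates the Euclidean distances used by k-means
num_cols      <- sapply(dishes, is.numeric)
dishes_scaled <- scale(dishes[, num_cols])
```

startsWith() matches the label literally, which avoids having to escape the square brackets as would be needed in a regular expression.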
a) Apply k-means clustering on the prepared data. Decide upon the number of clusters; your analysis must
address the following points:
(Note: If your discussion/analysis of any of the points below is based on outputs from RStudio,
you must report those outputs in your Word document to support your analysis.)
i. Justify the choice of the number of clusters that you decided upon. (Note: You are not being
asked to go with the optimal number of clusters but rather to choose a number based on your own judgement.) (2)
ii. What is the membership in each cluster that you have produced? (1)
iii. Which is the least dense cluster out of all the clusters produced and why? What can
you deduce about this cluster in comparison to other clusters? (1.5)
iv. Characterize each cluster that you produced using cluster means. (4.5)
v. Which two variables contributed most to cluster formation? Justify your answer. (1.5)
vi. Comment on cluster health and how you would justify it as good clustering. (2)
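The outputs that points i to vi draw on can be obtained from base R's kmeans() roughly as sketched below. The data here are synthetic and the names var1 and var2 are hypothetical; the choice k = 3 is only illustrative. The density and variable-contribution measures shown are simple heuristics, one reasonable reading among several, not a prescribed method.

```r
set.seed(420)  # k-means starts from random centres, so fix the seed

# Synthetic, scaled stand-in for the prepared dish data (60 rows, 2 columns)
X <- scale(rbind(
  matrix(rnorm(40, mean = 0), ncol = 2),
  matrix(rnorm(40, mean = 4), ncol = 2),
  matrix(rnorm(40, mean = 8), ncol = 2)
))
colnames(X) <- c("var1", "var2")  # hypothetical variable names

# Total within-cluster sum of squares for k = 1..6 (input for an elbow plot)
wss <- sapply(1:6, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)

k   <- 3                                    # illustrative choice
fit <- kmeans(X, centers = k, nstart = 25)

fit$size                        # membership: number of items per cluster
fit$centers                     # cluster means on the scaled variables
avg_wss <- fit$withinss / fit$size   # larger value = more spread out (less dense)
ratio   <- fit$betweenss / fit$totss # share of total variation between clusters

# Heuristic: variables whose cluster means differ most across clusters
spread <- apply(fit$centers, 2, function(m) diff(range(m)))
```

Plotting wss against 1:6 gives the usual elbow plot, and a ratio close to 1 indicates compact, well-separated clusters.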
b) Using the variables you identified in part 3-v on the x and y axes, use ggplot2
to visualize the k-means clusters that you produced. (1.5)
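A cluster scatter plot of the kind asked for in part b) might look like the sketch below. It assumes the ggplot2 package is installed and uses synthetic data; var1 and var2 are hypothetical placeholders for the two variables actually identified.

```r
library(ggplot2)

set.seed(420)
# Synthetic stand-in data; var1 and var2 are hypothetical names
dat <- data.frame(
  var1 = c(rnorm(20, mean = 0), rnorm(20, mean = 4)),
  var2 = c(rnorm(20, mean = 0), rnorm(20, mean = 4))
)

# Cluster on the scaled variables, then attach the labels as a factor
fit <- kmeans(scale(dat), centers = 2, nstart = 25)
dat$cluster <- factor(fit$cluster)

# Scatter plot of the two variables, coloured by cluster membership
p <- ggplot(dat, aes(x = var1, y = var2, colour = cluster)) +
  geom_point(size = 2) +
  labs(title = "k-means clusters", colour = "Cluster")
print(p)
```

Converting the cluster labels to a factor makes ggplot2 use a discrete colour scale rather than a continuous gradient.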