DISC 420 Assignment 4
Friday, 31 March 2023 Total Marks: 20

Assignment + Peer Evaluation Form: Due by Monday, 3rd April, 11:55 PM

Instructions

• This is a take-home group assignment.
• Read the question paper carefully before attempting the questions.
• You are required to submit your R Script File and WORD DOCUMENT.
• Copy and paste the plots/visualizations/dendrograms/relevant outputs as specified or
required in the questions, from R Studio to your word document.
• All analysis/explanation required by the questions must be done in the word document and not
the R file. Analysis/Explanation will not be marked if it’s not in the word document.
• Questions will be marked on the basis of your analysis and the outputs/visuals/dendrograms
reported in your word document, as specified and required in the questions. Hence, the quality
of your analysis will primarily determine the marks you obtain. Submitting your R file
containing the relevant code along with the word document is compulsory:
analysis/explanation and reported outputs/visuals/dendrograms based on code will only be
awarded marks if the relevant code is present in your R file.

Submission Requirements
1. One R file/R Script file containing the relevant code per group
2. One word document containing the desired plots/visuals/outputs (copied from R Studio) and
analysis as/where specified in the questions below.

Preliminary tasks
• Save your R file and word document with your Group number as file name.
• Download the data file called “Menu.csv” and save it in your working directory/project folder.

Data Set

Menu.csv consists of the menu items of a restaurant, listing all the dishes/drinks along with prices. The variable
names are self-explanatory. Import/load the data file in R Studio and name/store the data frame as ‘Menu’.

PART 1 (Total Marks = 1)

Report the following answers in your word file.

i. Will you scale the data if you intend to apply clustering using the features and elements as
your variables? Why or why not? (0.5 mark)
ii. You are required to cluster only the dishes, not the drinks. In the names.x column, all
drink names start with the [Drinks] label. (0.5 mark)
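
The preparation step in Part 1 could be sketched as follows. This is only an illustration on a toy data frame: names.x and the [Drinks] label come from the description above, while the price and calories columns and all values are hypothetical stand-ins for whatever Menu.csv actually contains. Whether scaling is appropriate is the question being asked in (i); the sketch only shows how scale() would be applied if you decide to use it.

```r
# Toy stand-in for Menu.csv. Only names.x is taken from the assignment text;
# the other columns and all values are hypothetical.
Menu <- data.frame(
  names.x  = c("[Drinks] Lemonade", "Chicken Karahi", "[Drinks] Cola",
               "Beef Burger", "Daal"),
  price    = c(150, 1200, 100, 650, 300),
  calories = c(120, 900, 140, 750, 400),
  stringsAsFactors = FALSE
)

# Flag rows whose name starts with the literal "[Drinks]" label.
# startsWith() does a plain prefix match, so the brackets need no escaping.
is_drink <- startsWith(Menu$names.x, "[Drinks]")
dishes   <- Menu[!is_drink, ]

# Standardise the numeric columns to mean 0 and standard deviation 1.
num_cols      <- sapply(dishes, is.numeric)
dishes_scaled <- scale(dishes[, num_cols])
rownames(dishes_scaled) <- dishes$names.x

print(dishes_scaled)
```

Keeping the dish names as row names means later outputs, such as a dendrogram, can label each item without extra work.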

PART 2 (Total Marks = 5)


a) Apply hierarchical clustering to the prepared data. Use the linkage criterion that, in your judgment,
gives the most suitable results. You may try more than one linkage criterion before finalizing one.
Analyze and report the results that you obtain with the linkage criterion you finalized. Your analysis
must address the following points: (Note: If/when your
discussion/analysis is based on some outputs from R Studio while covering any of the points below, you
must report those relevant outputs in your word doc to support your analysis)
i. How many clusters should be made? Justify your choice (2.5)
ii. After producing clusters, state the number of members in each cluster that you have produced. (1)
iii. Report the final dendrogram from R Studio in your word doc. Your dendrogram should show the cluster
membership of each menu item as well as the name of each menu item. (1.5)
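
A minimal sketch of the Part 2 workflow in base R is given below. It runs on randomly generated toy data with hypothetical dish names and variables, and both the "ward.D2" linkage and k = 3 are arbitrary placeholders; choosing and justifying these is the task itself.

```r
# Toy stand-in for the prepared data: 10 hypothetical dishes, 2 hypothetical variables.
set.seed(1)
X <- scale(matrix(rnorm(20), nrow = 10,
                  dimnames = list(paste("Dish", 1:10), c("var1", "var2"))))

# Pairwise Euclidean distances, then agglomerative clustering.
# The linkage method here is a placeholder; "complete", "average",
# "single" and others are also accepted by hclust().
d  <- dist(X, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Cut the tree into k clusters (k is a placeholder value) and count members.
k          <- 3
membership <- cutree(hc, k = k)
print(table(membership))

# Dendrogram labelled with item names; rect.hclust() boxes each cluster.
plot(hc, labels = rownames(X), hang = -1,
     main = "Toy dendrogram (placeholder linkage)")
rect.hclust(hc, k = k, border = 2:4)
```

Re-running the block with a different method argument gives a quick way to compare how each linkage shapes the tree before settling on one.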

PART 3 (Total Marks = 14)

a) Apply k-means clustering to the prepared data and decide upon the number of clusters. Your analysis must
address the following points:
(Note: If/when your discussion/analysis is based on some outputs from R Studio while covering any of
the points below, you must report those relevant outputs in your word doc to support your analysis)
i. Justify the choice of the number of clusters that you decided upon. (Note: You are not being
asked to go with the optimal number of clusters but rather to choose a number based on subjectivity) (2)
ii. What is the membership in each cluster that you have produced? (1)
iii. Which is the least dense cluster out of all the clusters produced and why? What can
you deduce about this cluster in comparison to other clusters? (1.5)
iv. Characterize each cluster that you produced using cluster means. (4.5)
v. Which two variables contributed most to cluster formation? Justify your answer. (1.5)
vi. Comment on cluster health and how you would justify it as good clustering. (2)
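
The quantities that points (ii) to (vi) refer to can be read off the object returned by kmeans(). The sketch below uses random toy data with hypothetical variable names and an arbitrary k = 3. Treating within-cluster sum of squares per member as a density proxy, and the between/total sum-of-squares ratio as a health indicator, are assumptions about what the questions intend, not definitions taken from this assignment.

```r
# Toy stand-in for the prepared data: 30 rows, 2 hypothetical variables.
set.seed(42)
X <- scale(matrix(rnorm(60), nrow = 30,
                  dimnames = list(NULL, c("var1", "var2"))))

# k = 3 is a placeholder; nstart = 25 repeats the random initialisation
# and keeps the best run, which makes the result more stable.
km <- kmeans(X, centers = 3, nstart = 25)

print(km$size)      # number of members in each cluster
print(km$centers)   # cluster means on the (scaled) variables

# Assumed density proxy: average within-cluster sum of squares per member.
# A larger value suggests a more spread-out cluster.
spread <- km$withinss / km$size
print(spread)

# Assumed overall indicator: share of total variation explained by the clustering.
ratio <- km$betweenss / km$totss
print(ratio)
```

Comparing the rows of km$centers column by column shows which variables separate the clusters most, which is one possible starting point for point (v).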

b) Using the two variables you identified in Part 3(a)(v) as the x and y axes, use ggplot2 to visualize
the k-means clusters that you produced. (1.5)
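
One possible shape for such a plot is sketched below, assuming the ggplot2 package is installed. The data are random, var1 and var2 are hypothetical placeholders for the two variables you identify, and k = 3 is arbitrary.

```r
library(ggplot2)

# Toy data: var1 and var2 are hypothetical placeholders.
set.seed(42)
df <- data.frame(var1 = rnorm(30), var2 = rnorm(30))

# Cluster on the scaled variables; k = 3 is a placeholder.
km <- kmeans(scale(df), centers = 3, nstart = 25)

# Store membership as a factor so ggplot2 uses a discrete colour scale.
df$cluster <- factor(km$cluster)

p <- ggplot(df, aes(x = var1, y = var2, colour = cluster)) +
  geom_point(size = 3) +
  labs(title = "k-means clusters (toy data)",
       x = "var1", y = "var2", colour = "Cluster") +
  theme_minimal()

print(p)
```

The plot here uses the original units on the axes while clustering on scaled values; plotting the scaled columns instead is an equally reasonable choice.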
