
Tutorial 6

Data Analytics and Big Data for Management
GE2338 Internet Applications and Security
K-means Clustering
• K-means is an unsupervised learning algorithm that groups similar objects together. Unsupervised learning is used when you have unlabelled data (i.e., data without defined categories or groups). K-means clusters the data points into k groups based on feature similarity.
K-means Clustering Algorithm

• Step 1: Randomly pick k initial centroids (these may be existing data points).
• Step 2: Assign each data point to the cluster whose centroid is nearest, i.e., the centroid at the shortest distance from that data point.
• Step 3: Update each centroid to the mean of all data points assigned to its cluster.
• Step 4: Repeat Steps 2 and 3 until no data point moves to a different cluster (no more relocation).
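The steps above can be sketched in Python for one-dimensional data such as ages. This is a minimal illustration under stated assumptions, not a library implementation: the function name `kmeans_1d`, the toy data, and the tie-breaking rule (a point equidistant from two centroids joins the earlier one) are hypothetical choices. The caller supplies the initial centroids for Step 1; choosing them randomly could be done with `random.sample(list(points.values()), k)`.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """Hypothetical helper: cluster 1-D values with k-means.

    points    -- dict mapping a label (e.g. a user) to a number (e.g. an age)
    centroids -- initial centroid values (Step 1); k = len(centroids)
    Returns (clusters, centroids): a list of label lists and the final centroids.
    """
    centroids = list(centroids)
    k = len(centroids)
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid
        # (min() returns the first index on a tie, so ties go to the earlier centroid)
        new_assignment = {
            label: min(range(k), key=lambda j: abs(value - centroids[j]))
            for label, value in points.items()
        }
        # Step 4: stop once no point changes cluster
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: move each centroid to the mean of the points assigned to it
        for j in range(k):
            members = [points[label] for label, c in assignment.items() if c == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = sum(members) / len(members)
    clusters = [[label for label, c in assignment.items() if c == j] for j in range(k)]
    return clusters, centroids


# Hypothetical toy data (not from the tutorial); p1 and p3 are the initial centroids
points = {"p1": 1, "p2": 2, "p3": 9, "p4": 10, "p5": 11}
clusters, centroids = kmeans_1d(points, [points["p1"], points["p3"]])
print(clusters)   # [['p1', 'p2'], ['p3', 'p4', 'p5']]
print(centroids)  # [1.5, 10.0]
```

Because the initial centroids are a parameter, the same function can be run with centroids that are fixed in advance, as in the exercise.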
Exercise

• Assign the users to 3 groups based on their age. Use the ages of users C, E and F as the 3 initial centroids.
User Age
A 30
B 22
C 18
D 50
E 35
F 67
G 25
H 52
I 43
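As a worked check, applying Steps 2 to 4 by hand with initial centroids 18 (C), 35 (E) and 67 (F) gives the iterations below. Each iteration lists the clusters after assignment and the centroids updated from them (non-integer means rounded to two decimals).

```latex
\begin{aligned}
&\text{Iteration 1: } \{B, C, G\},\ \{A, D, E, I\},\ \{F, H\} \\
&\quad c_1 = \tfrac{22+18+25}{3} \approx 21.67,\quad c_2 = \tfrac{30+50+35+43}{4} = 39.5,\quad c_3 = \tfrac{67+52}{2} = 59.5 \\
&\text{Iteration 2: } \{A, B, C, G\},\ \{E, I\},\ \{D, F, H\} \\
&\quad c_1 = \tfrac{30+22+18+25}{4} = 23.75,\quad c_2 = \tfrac{35+43}{2} = 39,\quad c_3 = \tfrac{50+67+52}{3} \approx 56.33 \\
&\text{Iteration 3: } \{A, B, C, G\},\ \{E, I\},\ \{D, F, H\} \quad \text{(no relocation, so stop)}
\end{aligned}
```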
Spearman Correlation

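• Correlation measures whether, and how strongly, two variables are related. Spearman rank correlation measures how well the relationship is monotonic (as one variable increases, the other consistently increases or consistently decreases), which need not be linear.
• Spearman rank correlation works well for ordinal, interval and ratio data.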
• Steps for calculating Spearman correlation:
• Step 1: Create 2 new columns to store the rank of each of the 2 variables under investigation.
• Step 2: For each row, calculate D, the difference between the two ranks.
• Step 3: Calculate D² for each row, then sum the D² values over all rows.
• Step 4: Apply the formula $r_s = 1 - \dfrac{6\sum D^2}{n(n^2 - 1)}$, where n is the number of rows. This form is exact when there are no tied ranks.
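Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration: the helper names `rank` and `spearman` and the example data are hypothetical, and the ranking assumes there are no tied values (with ties, the usual approach is to give tied values the average of their ranks, which this sketch does not do).

```python
def rank(values):
    """Step 1: rank values in ascending order (1 = smallest). Assumes no ties."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman(x, y):
    """Spearman rank correlation: r_s = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    rank_x, rank_y = rank(x), rank(y)
    # Steps 2-3: D is the rank difference in each row; sum the squared differences
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    # Step 4: apply the formula
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical example data (not from the tutorial)
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(round(spearman(x, y), 4))  # 0.8
```

Here the ranks of y are 2, 1, 4, 3, 5, so the sum of D² is 4 and $r_s = 1 - 24/120 = 0.8$.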
Exercise

• The table below shows the midterm and exam marks of 6 students. Calculate the Spearman correlation between the midterm and exam marks.

Student Midterm Exam
A 30 46
B 89 93
C 76 74
D 50 52
E 44 62
F 62 80
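As a worked check, applying Steps 1 to 4 to the marks above (each column ranked in ascending order; there are no ties, so the formula applies exactly):

```latex
\begin{aligned}
&\text{Midterm ranks (A to F, 1 = lowest): } 1,\ 6,\ 5,\ 3,\ 2,\ 4 \\
&\text{Exam ranks (A to F, 1 = lowest): } 1,\ 6,\ 4,\ 2,\ 3,\ 5 \\
&D = 0,\ 0,\ 1,\ 1,\ -1,\ -1 \qquad \sum D^2 = 0 + 0 + 1 + 1 + 1 + 1 = 4 \\
&r_s = 1 - \frac{6 \times 4}{6\,(6^2 - 1)} = 1 - \frac{24}{210} \approx 0.886
\end{aligned}
```

A value this close to 1 indicates a strong positive monotonic relationship: students who rank higher in the midterm tend to rank higher in the exam.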
