- SAP Community
"V006" DOUBLE,
"V007" DOUBLE
);
"INTARGS" INTEGER,
"DOUBLEARGS" DOUBLE,
);
OK, my code is ready, but one important piece is still missing: I don’t know what value of K to pass as the input
parameter (well, I do know, because I created the sample data, but let’s pretend I don’t). There are several techniques
for finding the number of groups that produces the best clustering; here I will use the Elbow Criterion. The elbow
criterion is a common rule of thumb: choose the number of clusters at which adding another cluster no longer adds much
information. I will run the code above with different numbers of clusters and measure the total intra-cluster distance
for each run. When the distance stops decreasing much from one run to the next, I will know how many groups to use.
I built the chart below with the results:
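For readers who want to try the elbow procedure outside HANA, here is a minimal sketch in plain Python. It is not the PAL call used in this post. Everything in it is an assumption made for illustration: the synthetic two-dimensional sample data, the function names, the farthest-first seeding (chosen only to keep the runs deterministic), the use of Euclidean distance to the nearest centroid as the "total intra-cluster distance", and the second-difference rule for picking the elbow automatically instead of reading it off a chart.

```python
import math
import random


def make_sample_points(seed=42):
    """Hypothetical sample data: three well-separated 2-D groups of 30 points."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (4.0, 9.0)]
    return [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
            for cx, cy in centers for _ in range(30)]


def farthest_first_centroids(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point that is farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Plain Lloyd's algorithm; returns the final list of centroids."""
    centroids = farthest_first_centroids(points, k)
    for _ in range(max_iterations):
        # Assignment step: put every point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def total_intra_cluster_distance(points, centroids):
    """Sum of each point's Euclidean distance to its nearest centroid."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points)


def elbow(distances):
    """Return the k with the largest second difference, i.e. the sharpest
    bend in the distance curve. Needs distances for consecutive k values."""
    ks = sorted(distances)
    curvature = {k: distances[k - 1] - 2 * distances[k] + distances[k + 1]
                 for k in ks[1:-1]}
    return max(curvature, key=curvature.get)


points = make_sample_points()
distances = {k: total_intra_cluster_distance(points, kmeans(points, k))
             for k in range(1, 7)}
for k, d in distances.items():
    print(k, round(d, 1))
print("elbow at k =", elbow(distances))
```

The second-difference rule is only one way to automate the "where does the curve bend" judgment. On real customer data the bend is usually less sharp than on synthetic groups like these, so plotting the distances and looking at the chart is still worthwhile.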
As the chart shows, the distance drops dramatically between 2 and 3 clusters, and after 3 it keeps going down but on a
much smaller scale. So the “elbow” is clearly at 3 clusters, which means I should run the algorithm with 3 clusters. So
now I’m going to run the algorithm again using the right number of clusters. This is the result:
The first column is the Customer ID and the second column is the cluster assigned to that customer. So, based on how
customers use their mobile phones, the K-Means algorithm grouped my customers in the following way: