
3/14/24, 10:02 AM SAP HANA PAL – K-Means Algorithm or How to do Cust... - SAP Community

"V006" DOUBLE,
"V007" DOUBLE
);

/* Table Type that will be used to specify
   the different parameters to run the KMeans Algorithm */
DROP TYPE PAL_CONTROL_TELCO;
CREATE TYPE PAL_CONTROL_TELCO AS TABLE(
    "NAME" VARCHAR(50),
    "INTARGS" INTEGER,
    "DOUBLEARGS" DOUBLE,
    "STRINGARGS" VARCHAR(100)
);

/* This table is used to generate the KMeans procedure

https://community.sap.com/t5/technology-blogs-by-members/sap-hana-pal-k-means-algorithm-or-how -to-do-customer-segmentation-for-the/ba-p/12976696/page/2 9/39



CALL PAL_KMEANS_TELCO(TELCO, PAL_CONTROL_TAB_TELCO, PAL_KMEANS_RESASSIGN_TAB_TELCO,
    PAL_KMEANS_CENTERS_TAB_TELCO) WITH OVERVIEW;

Pretty easy, huh?

Identify the Right Number of Clusters

OK, my code is ready, but I’m still missing a very important piece: I don’t know what value of K to pass as the
input parameter (well, I do know, because I created the sample data, but let’s pretend I don’t). There are multiple
techniques for finding the number of groups that produces the best clustering; in this case I will use the Elbow Criterion.
The elbow criterion is a common rule of thumb that says you should choose the number of clusters beyond which adding
another cluster no longer improves the result noticeably. I will run the code above with different numbers of clusters and,
for each run, measure the total intra-cluster distance. When the distance stops decreasing much from one run to the next,
I will know the number of groups I need to use. I built the chart below with the results:
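The procedure described above (run K-Means for several values of K, record the total intra-cluster distance, and look for the point where the improvement flattens out) can also be sketched outside HANA. The following is a minimal Python illustration, not the PAL implementation: the `kmeans`, `total_intra_cluster_distance` and `elbow` helpers, the farthest-point initialisation, the second-difference heuristic for locating the bend, and the synthetic two-column data are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, iterations=50):
    """Plain Lloyd's k-means (illustrative only, not PAL). Returns the centers."""
    # Farthest-point initialisation: deterministic and spreads the seeds out.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        )
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points
        ]
        # Move each center to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers


def total_intra_cluster_distance(points, centers):
    """Sum of the distances from each point to its nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def elbow(curve):
    """curve maps k -> total distance for consecutive k; return the sharpest bend."""
    ks = sorted(curve)
    # Second difference: how much the improvement shrinks once we go past k.
    bend = {
        k: curve[ks[i - 1]] - 2 * curve[k] + curve[ks[i + 1]]
        for i, k in enumerate(ks)
        if 0 < i < len(ks) - 1
    }
    return max(bend, key=bend.get)


# Hypothetical sample data: three well-separated groups of 10 points each.
rng = random.Random(0)
group_centers = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
points = [
    (cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
    for cx, cy in group_centers
    for _ in range(10)
]

# One run per candidate K, recording the total intra-cluster distance.
curve = {
    k: total_intra_cluster_distance(points, kmeans(points, k)) for k in range(1, 7)
}
best_k = elbow(curve)

for k in sorted(curve):
    print(f"k={k}: total intra-cluster distance = {curve[k]:.1f}")
print("elbow at k =", best_k)
```

With groups as clearly separated as these, the bend shows up at the true number of groups. On real data the curve is usually smoother, so it is worth looking at the chart rather than trusting the heuristic alone.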


As you can see, the distance drops dramatically between 2 and 3 clusters; after 3 it keeps going down, but on a much
smaller scale. So the “elbow” is clearly at 3 clusters, which means I should run the algorithm with 3 clusters. I’m now
going to run the algorithm again using the right number of clusters. This is the result:

The first column is the Customer ID and the second column is the cluster that has been assigned to that customer. So
based on how customers use their mobile phones, the K-Means algorithm clustered my customers in the following way:

Customer ID 1 thru 10 --> Cluster 2
Customer ID 10001 thru 10010 --> Cluster 1
Customer ID 20001 thru 20010 --> Cluster 0
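A summary like the one above can be derived mechanically from the two-column result (Customer ID, assigned cluster). The sketch below is only an illustration in Python: the `rows` list is rebuilt from the mapping listed above rather than read from HANA, and `ids_by_cluster` is a hypothetical helper name. Note that it reports the lowest and highest ID per cluster, which only describes a true range when the IDs in a cluster are contiguous, as they are here.

```python
from collections import defaultdict

# Rows shaped like the result table: (customer_id, cluster).
# Rebuilt from the mapping in the text, not queried from a database.
rows = (
    [(cid, 2) for cid in range(1, 11)]
    + [(cid, 1) for cid in range(10001, 10011)]
    + [(cid, 0) for cid in range(20001, 20011)]
)


def ids_by_cluster(rows):
    """Return {cluster: (lowest_id, highest_id, customer_count)}."""
    groups = defaultdict(list)
    for customer_id, cluster in rows:
        groups[cluster].append(customer_id)
    return {
        cluster: (min(ids), max(ids), len(ids))
        for cluster, ids in sorted(groups.items())
    }


summary = ids_by_cluster(rows)
for cluster, (lowest, highest, count) in summary.items():
    print(f"Cluster {cluster}: customer IDs {lowest} thru {highest} ({count} customers)")
```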
