Segmentation
Shivaan Chopra
Dendrogram
The dendrogram represents the grouping process of observations into clusters. The chart reads from bottom (all initial
observations are separated) to top (all observations are clustered into one unique segment).
The height represents the distance between the two groups of observations being merged at each step. If two very different
groups are being merged, this will create a 'jump' in the dendrogram, indicating that it might be wise to stop the
clustering process before that merge.
Dendrogram. The dendrogram is a tree diagram that illustrates the arrangement of clusters produced by hierarchical clustering.
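The 'jump' rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the report does not say which linkage rule was used, so complete linkage is assumed here, and the points and the helper names `merge_heights` and `clusters_before_largest_jump` are hypothetical.

```python
import math
from itertools import combinations

def merge_heights(points):
    """Agglomerative clustering with complete linkage (an assumed linkage rule).

    Returns the distance at which each successive merge happens, i.e. the
    heights a dendrogram would display from bottom to top.
    """
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: distance between the two farthest members.
            d = max(math.dist(a, b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
        heights.append(d)
    return heights

def clusters_before_largest_jump(heights):
    """Number of clusters left if merging stops just before the largest jump."""
    n_points = len(heights) + 1
    jumps = [heights[k] - heights[k - 1] for k in range(1, len(heights))]
    k_star = 1 + max(range(len(jumps)), key=jumps.__getitem__)
    return n_points - k_star

# Hypothetical data: three tight, well-separated groups of points.
demo_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]
demo_heights = merge_heights(demo_points)
suggested = clusters_before_largest_jump(demo_heights)
print(demo_heights, suggested)
```

The largest jump is only a heuristic; as the report notes, the final number of segments remains a judgment call.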
Scree plot
The scree plot displays, for each cluster solution, a measure of within-cluster heterogeneity. If clusters group
observations that are widely different (which will happen if the number of clusters is too small to capture the
variability in the data), the value will be high.
A good cluster solution might be where the scree plot displays an 'elbow', that is, where increasing the number of
clusters beyond a certain point does not dramatically decrease within-cluster heterogeneity.
The measure displayed in the scree plot is related, but not equivalent, to the distance reported in the dendrogram.
Scree plot. The scree plot compares the sum of squared error (SSE) for each cluster solution. A good cluster solution might be where the scree plot displays an 'elbow'.
From a statistical point of view, the SSE reported in the scree plot is computed as the sum of squared error between each
observation and its cluster centroid (or center), summed over all the observations.
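As a minimal sketch of that definition, the function below computes the SSE of a cluster solution as the squared Euclidean distance between each observation and the centroid of its cluster, summed over all observations. The function name `sse` and the four data points are hypothetical and are not taken from the report.

```python
def sse(points, labels):
    """Sum of squared errors between each point and its cluster centroid."""
    groups = {}
    for point, label in zip(points, labels):
        groups.setdefault(label, []).append(point)
    total = 0.0
    for members in groups.values():
        dims = len(members[0])
        # Centroid: mean of the cluster members on each dimension.
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dims)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dims))
    return total

# Hypothetical observations and two candidate cluster solutions.
demo_points = [(1, 1), (1, 3), (5, 5), (7, 5)]
sse_one_cluster = sse(demo_points, [0, 0, 0, 0])
sse_two_clusters = sse(demo_points, [0, 0, 1, 1])
print(sse_one_cluster, sse_two_clusters)
```

Plotting this value against the number of clusters gives the kind of curve on which an 'elbow' is searched for.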
y), managerial relevance (what makes the
ments be easily targeted).
r of segments becomes a judgment call.
Segment size
Population Segment 1
Size 317 82
Relative size 100% 26%
Segment description
Average value of each segmentation variable, overall and for each segment (centroid). Segmentation variables that are statistically different from the rest of the population are highlighted in red (lower) or green (higher).
Population Segment 1
Rich full-bodied 4.77 3.48
Light beer 3.72 2.23
No aftertaste 4.56 4.26
Refreshing 5.02 5.65
Goes down easily 5.17 5.21
Gives a buzz 3.39 2.82
Good taste 3.000 1.268
Low price 3.91 3.41
Good value 4.65 4.44
From country with brewing tradition 3.82 3.04
Attractive bottle 3.00 2.44
Prestigious brand 3.20 1.87
High quality 4.48 3.06
Drink at picnics 4.56 4.62
Masculine 2.67 1.54
For young people 2.49 1.33
Drink with friends 4.70 4.57
Drink at home 4.34 4.20
To serve dinner guests 4.92 5.06
For dining out 5.03 4.91
Drink at bar 4.35 3.57
Segment differences per segment. Cell colors indicate to what extent a segment is statistically different from the rest of the population on each segmentation variable.
Segment space
The chart below is a graphical representation of the various segments, segment members, and segmentation variables. It
is obtained by plotting the first two dimensions of a principal component analysis performed on the (standardized)
segmentation data, on top of which segment information has been overlaid.
Because only the first two dimensions of the PCA are displayed, and these two dimensions capture only 34.1% of the
variance in the data, some differences between segments might not appear here. Note that segmentation variables with no
variance, if any, have been excluded.
Two clusters that appear to overlap in the first two dimensions might actually be distinct on other dimensions.
Consequently, this chart is a useful guide, especially to see which segmentation variables are correlated, but may be
misleading if used to select the optimal number of segments.
Segment space. Spatial representation of segments and segmentation variables, using principal component analysis.
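The share of variance captured by the first two dimensions can be illustrated with a small, dependency-free sketch. It standardizes the variables, builds their correlation matrix, and extracts the two largest eigenvalues by power iteration with deflation. The data and function names are hypothetical, and the report's own software may compute the figure differently. A variable with zero variance would cause a division by zero during standardization, which is one reason such variables are excluded.

```python
import math

def correlation_matrix(rows):
    """Correlation matrix of the columns of `rows` (variables must vary)."""
    n = len(rows)
    z_cols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        z_cols.append([(x - mean) / sd for x in col])  # standardize
    p = len(z_cols)
    return [[sum(a * b for a, b in zip(z_cols[i], z_cols[j])) / (n - 1)
             for j in range(p)] for i in range(p)]

def leading_eigenpair(mat, iters=1000):
    """Largest eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    p = len(mat)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(p)) for i in range(p))
    return value, vec

def share_of_first_two(rows):
    """Fraction of total variance captured by the first two principal components."""
    mat = correlation_matrix(rows)
    p = len(mat)
    total = sum(mat[i][i] for i in range(p))  # equals the number of variables
    captured = 0.0
    for _ in range(2):
        value, vec = leading_eigenpair(mat)
        captured += value
        # Deflation: remove the component just found before the next pass.
        mat = [[mat[i][j] - value * vec[i] * vec[j] for j in range(p)] for i in range(p)]
    return captured / total

# Hypothetical ratings of six respondents on three variables.
demo_rows = [(1, 2, 3), (2, 1, 1), (3, 4, 2), (4, 3, 6), (5, 6, 4), (6, 5, 5)]
demo_share = share_of_first_two(demo_rows)
print(round(100 * demo_share, 1), "% of variance on the first two dimensions")
```

When this share is low, as with the 34.1% reported here, the two-dimensional chart hides most of the variation between respondents.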
Segment membership
Segment to which each member of the population belongs. The complete membership list is only available in the Excel format.
Segment
6861 1
4129 1
4393 2
445 3
7393 4
964 3
6773 4
461 2
7156 4
5785 4
4946 3
428 2
4465 1
2478 1
6321 2
4263 1
4305 2
5074 4
3004 4
702 3
6778 4
5542 2
7247 4
6024 2
6172 2
2590 1
6293 3
4485 3
5520 1
5837 4
1868 2
5996 1
2588 1
387 2
5660 3
6758 1
7187 3
993 2
301 1
7018 1
3570 1
4829 2
5575 1
6100 1
3950 2
1241 2
1729 2
4377 4
4448 2
7049 3
4970 2
7201 4
5555 2
6021 3
3528 3
1129 4
6502 2
2911 1
2899 1
6421 1
616 4
141 1
464 2
1173 1
525 4
5646 3
166 2
382 1
4953 2
5789 3
3655 1
1574 2
3642 2
6450 2
3774 3
3711 2
5691 4
3602 4
2316 4
1694 4
807 1
7326 1
3485 4
3125 1
1035 2
763 3
3014 3
707 3
5655 2
5320 4
4413 4
4537 2
534 1
1275 1
388 3
7211 2
5573 2
4668 1
3490 2
4350 3
4614 2
3819 1
6855 2
2024 2
3373 3
5701 3
4174 1
2810 1
3996 1
7447 2
4491 4
3595 4
308 1
3514 3
71 1
3953 3
6048 4
6494 2
1223 4
3967 2
4643 1
5824 4
1322 4
3520 3
6157 3
3028 2
5870 4
3800 1
6068 4
5637 2
7460 1
7173 1
6910 1
1555 4
1743 2
3833 3
5043 4
6814 3
6605 4
7152 4
5850 2
863 1
7487 4
2298 2
4181 4
748 2
3507 1
5262 2
3962 1
1242 2
2338 1
389 3
62 3
5506 1
4388 4
4480 3
5543 2
2062 2
4841 4
5839 3
7308 3
4956 2
6551 1
6312 2
1696 2
5856 1
2403 2
7413 1
5400 1
1176 2
1595 3
717 2
574 4
2523 4
5035 1
3154 3
2029 1
4262 3
7134 2
489 3
6674 2
6550 3
7415 2
2009 4
6423 4
4195 4
1232 2
6831 3
2054 3
985 3
708 4
4670 3
6351 2
2013 1
6316 2
7124 1
2569 3
1676 2
6237 4
2936 4
3666 4
6669 2
2857 1
5855 2
2474 4
3617 3
1324 1
6221 2
1283 3
3212 4
7355 2
3225 1
6817 1
6858 1
6970 2
4420 3
3762 4
4545 2
2459 2
3842 1
7165 2
2687 1
2819 1
5429 3
7501 2
1733 3
6641 2
4323 4
4653 2
4091 1
206 1
4459 2
2708 2
6189 1
3535 1
599 3
385 1
2128 4
6225 2
903 1
4743 2
7191 4
6352 1
6881 1
4967 3
4762 4
1730 3
5913 2
298 3
7192 3
3483 3
5862 1
2997 1
5698 1
2052 4
5481 4
3733 1
5353 1
2787 2
662 3
4136 3
7304 2
4626 3
5450 2
5628 3
4295 3
6612 2
6013 2
3614 2
5045 3
4794 3
4424 2
6373 4
1712 1
4421 2
6575 1
4498 1
5406 1
7519 2
2204 3
7169 3
73 3
5719 4
5832 3
7172 4
5803 2
4382 2
7388 1
5794 2
3011 2
7189 2
2587 2
4908 3
879 1
6291 2
6148 4
2932 3
5786 2
7128 4
997 2
984 1
6702 1
3974 2
3970 2
5633 3
7180 4
3214 2
7469 4
381 3
4196 4
5619 3
7500 3
6071 2
2319 2
6145 1
3460 4
5456 4
Segment 2 Segment 3 Segment 4
Size 100 70 65
Relative size 32% 22% 21%
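The relative sizes follow directly from the segment sizes reported in the segment size table (82, 100, 70 and 65 out of 317). A short check, with variable names chosen for illustration only:

```python
# Segment sizes as reported in the segment size table.
segment_sizes = {1: 82, 2: 100, 3: 70, 4: 65}

population = sum(segment_sizes.values())
# Relative size of each segment, in whole percent.
relative_sizes = {seg: round(100 * size / population)
                  for seg, size in segment_sizes.items()}
print(population, relative_sizes)
```

Because each share is rounded separately, the percentages need not add up to exactly 100.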
Segmentation variables that are statistically different from the rest of the population are highlighted in red (lower) or green (higher).
Segment 2 Segment 3 Segment 4
Rich full-bodied 6.08 2.30 7.03
Light beer 4.27 3.67 4.82
No aftertaste 4.46 3.46 6.29
Refreshing 5.23 2.00 7.17
Goes down easily 5.46 2.91 7.09
Gives a buzz 3.44 2.63 4.88
Good taste 3.280 0.443 7.508
Low price 4.12 3.13 5.06
Good value 4.42 3.26 6.77
From country with brewing tradition 4.42 2.54 5.25
Attractive bottle 2.67 2.77 4.45
Prestigious brand 3.81 2.27 4.94
High quality 6.02 1.53 7.06
Drink at picnics 5.21 2.49 5.74
Masculine 2.67 2.13 4.68
For young people 2.58 2.29 4.03
Drink with friends 6.26 1.36 6.05
Drink at home 5.02 1.60 6.42
To serve dinner guests 6.17 1.46 6.54
For dining out 6.17 1.67 7.05
Drink at bar 5.69 1.56 6.26
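The population column of the segment description table is the size-weighted average of the four segment centroids. The sketch below checks this for two variables, using the segment sizes (82, 100, 70, 65) and the reported centroid values with Segment 4 rounded; the function name `population_mean` is an illustrative choice.

```python
# Segment sizes, in the order Segment 1 to Segment 4.
segment_sizes = [82, 100, 70, 65]

def population_mean(segment_means, sizes=segment_sizes):
    """Size-weighted average of per-segment means."""
    return sum(m * n for m, n in zip(segment_means, sizes)) / sum(sizes)

# Centroid values for Segments 1 to 4 (Segment 4 rounded).
rich_full_bodied = population_mean([3.48, 6.08, 2.30, 7.03])
good_taste = population_mean([1.268, 3.280, 0.443, 7.508])
print(round(rich_full_bodied, 2), round(good_taste, 3))
```

Small discrepancies with the reported population values (4.77 and 3.000) are expected, since the segment means used here are themselves rounded.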
Cell colors indicate to what extent a segment is statistically different from the rest of the population on each segmentation variable.
Discriminant space
The chart below is a graphical representation of the various segments, segment members, and discriminant variables. It
is obtained by plotting the first two dimensions of a principal component analysis performed on the (standardized)
discriminant data, on top of which segment information has been overlaid.
Because only the first two dimensions of the PCA are displayed, and these two dimensions capture only 30.3% of the
variance in the data, some differences between segments might not appear here. Note that discriminant variables with no
variance, if any, have been excluded.
If two or more segments fully overlap, it is unlikely that they could be clearly separated based on discriminant data
alone.
However, two segments that seem to overlap on two dimensions may be more clearly separated on other dimensions.
Consequently, the confusion matrix is a better guide to assess the quality of segment discrimination.
Discriminant space. Spatial representation of segments and discriminant variables, using principal component analysis.
segment. The more differences can be
inant data alone.
Discriminant variables that are statistically different from the rest of the population are highlighted in red (lower) or green (higher).
Segment 2 Segment 3 Segment 4
8.74 9.19 8.05
4.89 4.94 4.72
5.58 5.51 5.18
4.52 4.21 4.45
1.13 1.17 1.08
3.47 3.56 3.35
3.35 3.39 3.20
3.63 3.54 3.54
3.03 3.19 3.05
3.32 3.23 3.32
3.49 3.49 3.31
2.96 2.86 2.92
ution of a discriminant variable in a segment is statistically different from the rest of the population.
Model coefficients
Model parameters
Segment 2 is the model baseline.
Segment 1 Segment 3
(Intercept) 0.0738 -0.4419
Weekly consumption 0.01870 0.00385
Age (1-7) -0.1543 0.0558
Income (1-7) -0.02539 0.00585
Education (1-6) 0.0927 -0.1606
Sex (male=1) -0.211 0.383
Adapt to new situations 0.373 0.297
Make friends easily -0.2436 -0.0303
Do not like to be tied to timetable -0.542 -0.402
Like to take chances 0.542 0.445
Like to travel abroad -0.0951 -0.1777
Like ethnic food 0.1342 0.0258
Knowledgeable about beer -0.0770 -0.1325
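A baseline segment with coefficients for every other segment is the usual layout of a multinomial logit model. The report does not name the model, so the sketch below is an assumption about how such parameters translate into segment probabilities. Each segment gets a linear score (intercept plus coefficient times descriptor value), the baseline's score is fixed at 0, and probabilities are obtained with a softmax. The example uses the reported intercepts only, including the Segment 4 intercept of 3.6752, which corresponds to a hypothetical respondent whose descriptors are all zero.

```python
import math

def segment_probabilities(scores):
    """Softmax over linear scores; the baseline segment has a score of 0."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {seg: math.exp(s - top) for seg, s in scores.items()}
    total = sum(weights.values())
    return {seg: w / total for seg, w in weights.items()}

# Intercepts from the model output; Segment 2 is the baseline (score 0).
# Assumption: all descriptors set to zero, which is purely illustrative.
intercept_only_scores = {1: 0.0738, 2: 0.0, 3: -0.4419, 4: 3.6752}

probs = segment_probabilities(intercept_only_scores)
predicted_segment = max(probs, key=probs.get)
print(probs, predicted_segment)
```

For a real respondent, each score would also include the coefficient of every descriptor multiplied by that respondent's value, and the segment with the highest probability would be retained as the prediction.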
P-values
Probability of obtaining a parameter estimate at least this far from zero by chance alone, if the true parameter were zero.
Segment 1 Segment 3
(Intercept) 0.967 0.809
Weekly consumption 0.253 0.835
Age (1-7) 0.198 0.652
Income (1-7) 0.816 0.960
Education (1-6) 0.450 0.173
Sex (male=1) 0.675 0.409
Adapt to new situations 0.192 0.305
Make friends easily 0.258 0.892
Do not like to be tied to timetable 0.044 0.151
Like to take chances 0.024 0.061
Like to travel abroad 0.618 0.354
Like ethnic food 0.576 0.914
Knowledgeable about beer 0.686 0.486
Confusion matrix
The confusion matrix compares actual segment membership (obtained from the segmentation analysis and the original
segmentation variables) and predicted segment membership (obtained from the discriminant analysis and the descriptors
alone). When actual and predicted segment memberships coincide, the diagonal elements will be comparatively large,
indicating that the discriminant model is accurate.
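A minimal sketch of how such a matrix and the resulting hit rate can be tabulated, using the predicted and actual segments of the first ten rows of the model predictions table. The function names are illustrative, and ten rows are of course too few to judge the model.

```python
def confusion_matrix(actual, predicted, n_segments):
    """Rows are actual segments, columns are predicted segments (1-based labels)."""
    matrix = [[0] * n_segments for _ in range(n_segments)]
    for a, p in zip(actual, predicted):
        matrix[a - 1][p - 1] += 1
    return matrix

def hit_rate(matrix):
    """Share of observations on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total

# First ten rows of the model predictions table (Predicted and Actual columns).
predicted = [2, 3, 1, 2, 4, 3, 4, 2, 2, 4]
actual = [1, 1, 2, 3, 4, 3, 4, 2, 4, 4]

matrix = confusion_matrix(actual, predicted, n_segments=4)
rate = hit_rate(matrix)
for row in matrix:
    print(row)
print(rate)
```

With four segments, assigning every respondent to the largest segment (100 of 317) would already be correct about 32% of the time, so a hit rate is best read against that benchmark.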
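Model parameters (continued)
Segment 4
(Intercept) 3.6752
Weekly consumption -0.0142
Age (1-7) -0.0507
Income (1-7) -0.1252
Education (1-6) 0.0198
Sex (male=1) -0.6690
Adapt to new situations -0.2356
Make friends easily -0.1536
Do not like to be tied to timetable -0.2444
Like to take chances 0.1787
Like to travel abroad 0.1037
Like ethnic food -0.2947
Knowledgeable about beer -0.0409
p-values (continued)
Segment 4
(Intercept) 0.044
Weekly consumption 0.494
Age (1-7) 0.688
Income (1-7) 0.270
Education (1-6) 0.874
Sex (male=1) 0.243
Adapt to new situations 0.401
Make friends easily 0.493
Do not like to be tied to timetable 0.398
Like to take chances 0.468
Like to travel abroad 0.620
Like ethnic food 0.207
Knowledgeable about beer 0.840
Model predictions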
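Probabilities that observations belong to each cluster (as predicted by the discriminant model and the descriptors alone). The segment with the highest probability is retained, and is compared to the actual segment membership to measure model accuracy and classification errors.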
Prob(cluster 3) Prob(cluster 4) Predicted Actual Correct
17% 0 2 1 0
42% 0 3 1 0
17% 0 1 2 0
21% 0 2 3 0
19% 0 4 4 1
37% 0 3 3 1
13% 0 4 4 1
27% 0 2 2 1
21% 0 2 4 0
14% 0 4 4 1
25% 0 2 3 0
30% 0 2 2 1
24% 0 4 1 0
19% 0 1 1 1
14% 0 1 2 0
15% 0 4 1 0
17% 0 2 2 1
16% 0 2 4 0
14% 0 4 4 1
31% 0 2 3 0
19% 0 2 4 0
18% 0 4 2 0
21% 0 4 4 1
16% 0 1 2 0
35% 0 3 2 0
22% 0 4 1 0
12% 0 4 3 0
20% 0 2 3 0
15% 0 1 1 1
19% 0 2 4 0
23% 0 1 2 0
45% 0 3 1 0
16% 0 2 1 0
19% 0 2 2 1
25% 0 1 3 0
20% 0 2 1 0
22% 0 2 3 0
17% 0 1 2 0
13% 0 2 1 0
22% 0 1 1 1
11% 0 1 1 1
16% 0 2 2 1
25% 0 1 1 1
14% 0 4 1 0
21% 0 2 2 1
25% 0 2 2 1
9% 0 2 2 1
13% 0 4 4 1
21% 0 1 2 0
22% 0 2 3 0
17% 0 2 2 1
48% 0 3 4 0
39% 0 3 2 0
33% 0 2 3 0
27% 0 2 3 0
14% 0 2 4 0
16% 0 1 2 0
17% 0 4 1 0
22% 0 2 1 0
22% 0 2 1 0
25% 0 2 4 0
33% 0 2 1 0
49% 0 3 2 0
30% 0 2 1 0
25% 0 2 4 0
28% 0 2 3 0
16% 0 2 2 1
15% 0 1 1 1
16% 0 4 2 0
40% 0 3 3 1
16% 0 1 1 1
7% 1 4 2 0
18% 0 1 2 0
32% 0 3 2 0
26% 0 1 3 0
17% 0 2 2 1
30% 0 2 4 0
14% 0 1 4 0
17% 0 2 4 0
16% 0 2 4 0
18% 0 2 1 0
35% 0 3 1 0
17% 0 1 4 0
28% 0 3 1 0
20% 0 1 2 0
7% 1 4 3 0
35% 0 3 3 1
19% 0 1 3 0
8% 0 2 2 1
17% 0 2 4 0
15% 0 2 4 0
26% 0 4 2 0
16% 0 4 1 0
18% 0 2 1 0
29% 0 2 3 0
54% 0 3 2 0
11% 0 2 2 1
32% 0 2 1 0
46% 0 3 2 0
19% 0 2 3 0
19% 0 1 2 0
35% 0 2 1 0
21% 0 1 2 0
17% 0 2 2 1
35% 0 3 3 1
26% 0 2 3 0
20% 0 1 1 1
25% 0 4 1 0
30% 0 2 1 0
12% 0 4 2 0
28% 0 2 4 0
18% 0 1 4 0
17% 0 1 1 1
34% 0 3 3 1
22% 0 2 1 0
15% 0 2 3 0
21% 0 1 4 0
26% 0 2 2 1
15% 0 2 4 0
13% 0 2 2 1
28% 0 3 1 0
34% 0 3 4 0
32% 0 3 4 0
11% 0 2 3 0
29% 0 1 3 0
11% 0 2 2 1
17% 0 2 4 0
15% 0 2 1 0
32% 0 1 4 0
25% 0 1 2 0
21% 0 1 1 1
22% 0 2 1 0
28% 0 2 1 0
21% 0 4 4 1
17% 0 2 2 1
20% 0 2 3 0
27% 0 1 4 0
27% 0 4 3 0
23% 0 4 4 1
20% 0 2 4 0
25% 0 2 2 1
28% 0 1 1 1
13% 0 4 4 1
18% 0 1 2 0
35% 0 2 4 0
25% 0 4 2 0
16% 0 1 1 1
36% 0 3 2 0
17% 0 1 1 1
18% 0 2 2 1
25% 0 1 1 1
19% 0 2 3 0
31% 0 3 3 1
21% 0 1 1 1
11% 0 2 4 0
18% 0 1 3 0
28% 0 2 2 1
23% 0 2 2 1
15% 0 2 4 0
41% 0 3 3 1
33% 0 2 3 0
17% 0 2 2 1
22% 0 1 1 1
31% 0 2 2 1
19% 0 1 2 0
42% 0 3 1 0
18% 0 2 2 1
20% 0 2 1 0
18% 0 2 1 0
36% 0 3 2 0
25% 0 2 3 0
10% 0 2 2 1
13% 0 4 4 1
31% 0 3 4 0
23% 0 2 1 0
22% 0 2 3 0
23% 0 1 1 1
22% 0 2 3 0
19% 0 2 2 1
22% 0 1 3 0
16% 0 2 2 1
18% 0 2 3 0
13% 0 1 2 0
13% 0 2 4 0
21% 0 2 4 0
38% 0 3 4 0
22% 0 1 2 0
12% 0 2 3 0
13% 0 1 3 0
33% 0 3 3 1
13% 0 2 4 0
29% 0 2 3 0
24% 0 2 2 1
38% 0 3 1 0
21% 0 2 2 1
20% 0 2 1 0
16% 0 2 3 0
25% 0 2 2 1
17% 0 2 4 0
17% 0 1 4 0
6% 0 4 4 1
15% 0 4 2 0
21% 0 1 1 1
27% 0 2 2 1
29% 0 2 4 0
37% 0 3 3 1
10% 0 2 1 0
16% 0 2 2 1
42% 0 3 3 1
15% 0 4 4 1
11% 0 1 2 0
15% 0 2 1 0
23% 0 1 1 1
27% 0 1 1 1
32% 0 3 2 0
18% 0 1 3 0
14% 0 2 4 0
18% 0 4 2 0
21% 0 2 2 1
14% 0 1 1 1
24% 0 2 2 1
21% 0 4 1 0
23% 0 2 1 0
20% 0 2 3 0
15% 0 2 2 1
24% 0 1 3 0
19% 0 2 2 1
22% 0 2 4 0
14% 0 2 2 1
17% 0 1 1 1
16% 0 1 1 1
13% 0 1 2 0
24% 0 2 2 1
15% 0 1 1 1
16% 0 2 1 0
24% 0 2 3 0
15% 0 2 1 0
16% 0 1 4 0
22% 0 1 2 0
16% 0 2 1 0
16% 0 2 2 1
22% 0 1 4 0
26% 0 2 1 0
15% 0 1 1 1
21% 0 2 3 0
18% 0 2 4 0
31% 0 3 3 1
26% 0 1 2 0
31% 0 3 3 1
28% 0 2 3 0
33% 0 3 3 1
29% 0 2 1 0
16% 0 1 1 1
36% 0 2 1 0
21% 0 2 4 0
14% 0 2 4 0
23% 0 4 1 0
17% 0 1 1 1
29% 0 2 2 1
16% 0 2 3 0
29% 0 1 3 0
19% 0 1 2 0
17% 0 4 3 0
15% 0 2 2 1
32% 0 3 3 1
17% 0 2 3 0
17% 0 2 2 1
17% 0 2 2 1
14% 0 2 2 1
23% 0 2 3 0
25% 0 1 3 0
41% 0 3 2 0
19% 0 2 4 0
21% 0 2 1 0
24% 0 2 2 1
13% 0 4 1 0
14% 0 1 1 1
9% 0 4 1 0
28% 0 2 2 1
12% 0 2 3 0
32% 0 1 3 0
17% 0 4 3 0
19% 0 2 4 0
38% 0 3 3 1
16% 0 2 4 0
18% 0 2 2 1
35% 0 3 2 0
15% 0 2 1 0
26% 0 2 2 1
23% 0 1 2 0
17% 0 1 2 0
32% 0 2 2 1
15% 0 1 3 0
19% 0 2 1 0
25% 0 2 2 1
28% 0 1 4 0
31% 0 2 3 0
27% 0 1 2 0
23% 0 4 4 1
29% 0 2 2 1
23% 0 1 1 1
22% 0 1 1 1
17% 0 4 2 0
26% 0 1 2 0
23% 0 1 3 0
17% 0 4 4 1
20% 0 2 2 1
19% 0 4 4 1
19% 0 1 3 0
24% 0 2 4 0
19% 0 2 3 0
32% 0 2 3 0
27% 0 2 2 1
16% 0 2 2 1
18% 0 2 1 0
23% 0 1 4 0
34% 0 3 4 0
The segment with the highest probability is retained, and is compared to the actual segment membership to measure model accuracy and classification errors.