
Project I – MIS 6324 – Business Intelligence

Association Rules & Clustering

Group – 11
Visalakshi Arunachalam
Saumil Badia
Chirag Gala
Pratik Kapadia
Introduction:

• Confidence:
Confidence is the ratio of the number of transactions that include all items in the consequent as well as the
antecedent (namely, the support) to the number of transactions that include all items in the antecedent.

$$\text{Confidence} = \frac{\text{No. of transactions containing both body and head}}{\text{No. of transactions containing body}}$$
• Support:

The support is simply the number of transactions that include all items in the antecedent and consequent parts of
the rule. (The support is sometimes expressed as a percentage of the total number of records in the database.)

$$\text{Support} = \frac{\text{No. of transactions containing items in body and head}}{\text{Total no. of transactions in database}}$$

• Lift:

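Lift measures how much more often the outcome (head) occurs when the antecedent (body) is present than it does in general. It is the ratio of the rule's confidence to the frequency of the head, i.e. the fraction of all transactions that contain the head. A lift above 1 means the body makes the head more likely; a lift of exactly 1 means body and head occur independently of each other.

$$\text{Lift} = \frac{\text{Confidence}}{\text{Freq. of head}}, \qquad \text{Freq. of head} = \frac{\text{No. of transactions containing head}}{\text{Total no. of transactions in database}}$$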
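To make the three formulas concrete, here is a minimal Python sketch that recomputes them for Rule 1 of Q1 (landsend.com => llbean.com) from the counts reported by XLMiner. It assumes the database holds 2186 transactions, the number of observations listed in the clustering summaries; the function name `rule_metrics` is our own, not part of any tool.

```python
def rule_metrics(n_body, n_head, n_both, n_total):
    """Return (support, confidence, lift) for a rule body => head.

    n_body  -- transactions containing every item of the body (antecedent)
    n_head  -- transactions containing every item of the head (consequent)
    n_both  -- transactions containing body and head together
    n_total -- total transactions in the database
    """
    support = n_both / n_total               # fraction of all transactions
    confidence = n_both / n_body             # share of body transactions that also contain the head
    lift = confidence / (n_head / n_total)   # confidence relative to the head's overall frequency
    return support, confidence, lift


# Rule 1 of Q1: landsend.com => llbean.com (counts from the XLMiner output)
support, confidence, lift = rule_metrics(n_body=479, n_head=538, n_both=199, n_total=2186)
print(f"support={support:.4f} confidence={confidence:.2%} lift={lift:.6f}")
```

The printed confidence and lift agree with the 41.54% and 1.688051 shown in the Q1 rules table.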

• Association Rule:

Association rule mining finds interesting associations and/or correlations among a large set of data items. Association rules show attribute-value conditions that frequently occur together in a given dataset. A typical and widely used example of association rule mining is Market Basket Analysis.

For example, data are collected using bar-code scanners in supermarkets. Such ‘market basket’ databases consist
of a large number of transaction records.

Association rules provide information of this type in the form of "if-then" statements. These rules are computed
from the data and, unlike the if-then rules of logic, association rules are probabilistic in nature.

In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers, its support and its confidence, that express the degree of uncertainty about the rule. In association analysis the antecedent and consequent are sets of items (called itemsets) that are disjoint (they have no items in common).
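The mining step can be illustrated with a brute-force sketch. This is not XLMiner's algorithm (production tools typically use Apriori-style candidate pruning); it simply counts every rule with one item on each side over a small, made-up list of baskets and keeps those that meet a minimum support count and a minimum confidence, mirroring the "Minimum Support" and "Minimum Confidence %" inputs in the XLMiner output. The function name and the baskets are hypothetical.

```python
from collections import Counter
from itertools import combinations, permutations


def mine_pair_rules(transactions, min_support, min_confidence):
    """Return rules (body, head, confidence) with one item on each side.

    min_support is a transaction count; min_confidence is a fraction.
    """
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        # count each unordered pair of items once per transaction
        pair_counts.update(frozenset(p) for p in combinations(sorted(items), 2))

    rules = []
    for pair, n_both in pair_counts.items():
        if n_both < min_support:
            continue  # the itemset {body, head} is not frequent enough
        for body, head in permutations(pair):
            confidence = n_both / item_counts[body]
            if confidence >= min_confidence:
                rules.append((body, head, confidence))
    return sorted(rules, key=lambda r: -r[2])  # highest confidence first


baskets = [  # hypothetical example data
    {"gap.com", "oldnavy.com"},
    {"gap.com", "oldnavy.com", "ae.com"},
    {"oldnavy.com"},
    {"gap.com"},
    {"ae.com", "oldnavy.com"},
]
for body, head, conf in mine_pair_rules(baskets, min_support=2, min_confidence=0.3):
    print(f"{body} => {head}  confidence={conf:.0%}")
```

Note that a pair and its reverse share the same support count but generally have different confidences, because the denominator is the count of the respective body.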
• K-Means

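The k-means algorithm assigns each point to the cluster whose center (also called the centroid) is nearest. The center is the average of all the points in the cluster, that is, its coordinates are the arithmetic mean for each dimension computed separately over all the points in the cluster. The algorithm alternates between these two steps (assigning points to the nearest center and recomputing each center) until the assignments no longer change.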
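A minimal pure-Python sketch of this standard (Lloyd's) k-means loop follows. It is illustrative only: the report does not describe XLMiner's initialization, restarts or input normalization, so the sketch simply uses the first k points as starting centers, which is an assumption. The toy points are made up.

```python
import math


def kmeans(points, k, max_iter=100):
    """Cluster points (tuples of numbers) into k groups; return (centers, labels)."""
    centers = [tuple(p) for p in points[:k]]  # naive initialization (assumption)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: each center becomes the coordinate-wise mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's previous center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the algorithm has converged
        labels, centers = new_labels, new_centers
    return centers, labels


points = [(1, 1), (1.5, 2), (0.5, 1.5), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical data
centers, labels = kmeans(points, k=2)
print(centers, labels)
```

On these points the loop converges after a few iterations to one center near (1, 1.5) and one near (8.5, 8.5).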

Association Rules:
Q1.

Input Parameters: Support: 150, Confidence: 30

Rules Generated: 11

Data
Input Data project1_data!$M$1:$Q$2187
Data Format Item List
Minimum Support 150
Minimum Confidence % 30
# Rules 11
Overall Time (secs) 3

If customers buy from landsend.com they tend to buy from llbean.com as well; the confidence of this rule is 41.54%.

Rule #  Conf. %  Antecedent (a) => Consequent (c)     Support(a)  Support(c)  Support(a ∪ c)  Lift Ratio
1       41.54    landsend.com => llbean.com           479         538         199             1.688051
2       36.99    llbean.com => landsend.com           538         479         199             1.688051
3       52.59    gap.com => oldnavy.com               424         820         223             1.402088
4       46.5     ae.com => victoriassecret.com        357         825         166             1.232072
5       46.2     kohls.com => jcpenney.com            368         923         170             1.094081
6       36.79    gap.com => victoriassecret.com       424         825         156             0.974889
7       36.46    oldnavy.com => victoriassecret.com   820         825         299             0.96617
8       36.24    victoriassecret.com => oldnavy.com   825         820         299             0.96617
9       34.45    landsend.com => jcpenney.com         479         923         165             0.815825
10      32.53    llbean.com => jcpenney.com           538         923         175             0.770379
11      32.44    oldnavy.com => jcpenney.com          820         923         266             0.768274
Q2.

Comparing two association rules whose body and head are reversed

A.

Rule#  Confidence  Body => Head                  Support (Body)  Support (Head)  Support (Body & Head)  Lift Ratio
1      41.54       landsend.com => llbean.com    479             538             199                    1.688051
2      36.99       llbean.com => landsend.com    538             479             199                    1.688051

Rule 1:

Rule 2:

Association Rules Defined in terms of number of customers:

landsend.com (A)  llbean.com (B)  Both  Only A  Only B
479               538             199   280     339
B.

Rule#  Confidence  Body => Head                               Support (Body)  Support (Head)  Support (Body & Head)  Lift Ratio
7      36.46       oldnavy.com => victoriassecret.com         820             825             299                    0.96617
8      36.24       victoriassecret.com => oldnavy.com         825             820             299                    0.96617

Rule 1:

Rule 2:

Association Rules Defined in terms of number of customers:

oldnavy.com (A)  victoriassecret.com (B)  Both  Only A  Only B
820              825                      299   521     526
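Both pairs can be checked with a few lines of Python. The sketch below (the helper name `compare_reverse_rules` is ours) derives the "only A / only B" counts and the confidence of each direction from the counts in the tables, and shows why a rule and its reverse always share the same lift: lift = n(A and B) × N / (n(A) × n(B)) is symmetric in A and B, whereas confidence divides by the count of whichever site is the body. N = 2186 is taken from the clustering summaries.

```python
N_TOTAL = 2186  # total transactions (observations) in the data set


def compare_reverse_rules(name_a, n_a, name_b, n_b, n_both, n_total=N_TOTAL):
    """Print the customer breakdown and the metrics of A => B and B => A."""
    print(f"{name_a}: {n_a}, {name_b}: {n_b}, both: {n_both}, "
          f"only A: {n_a - n_both}, only B: {n_b - n_both}")
    lift = n_both * n_total / (n_a * n_b)  # symmetric in A and B
    for body, head, n_body in ((name_a, name_b, n_a), (name_b, name_a, n_b)):
        confidence = n_both / n_body  # differs because the denominator changes
        print(f"  {body} => {head}: confidence={confidence:.2%}, lift={lift:.6f}")
    return n_both / n_a, n_both / n_b, lift


compare_reverse_rules("landsend.com", 479, "llbean.com", 538, 199)
compare_reverse_rules("oldnavy.com", 820, "victoriassecret.com", 825, 299)
```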
Q3

Relationship between the number of rules and the support/confidence level

Support                 Confidence  No. of   Time
(No. of transactions)   (%)         Rules    (secs)
100                     30          14       1
100                     40          5        1
100                     50          2        2
100                     60          0        0
150                     30          11       1
150                     40          4        1
150                     50          1        1
200                     30          4        1
200                     40          1        1
200                     50          1        1
250                     30          3        1
250                     40          0        0

Chart displaying the trend in Association Rules for different values of Support &
Confidence Level

Conclusion:

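The number of association rules generated depends on both the support and the confidence thresholds: the higher either threshold, the fewer rules are generated. Raising a threshold can only remove rules that fail the stricter test; it never adds new ones.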
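This trend can be checked against the Q1 output. Because every rule that passes a stricter threshold also passes a looser one, filtering the 11 rules of Q1 (mined with support ≥ 150 and confidence ≥ 30%) should reproduce the Q3 counts for any support ≥ 150 and confidence ≥ 30%. The sketch below does exactly that; rows with support 100 cannot be checked this way, since the extra rules they contain are not listed in Q1. The names `Q1_RULES` and `count_rules` are ours.

```python
# (confidence %, Support(a U c)) for the 11 rules listed in Q1
Q1_RULES = [(41.54, 199), (36.99, 199), (52.59, 223), (46.5, 166), (46.2, 170),
            (36.79, 156), (36.46, 299), (36.24, 299), (34.45, 165), (32.53, 175),
            (32.44, 266)]


def count_rules(min_support, min_confidence, rules=Q1_RULES):
    """Number of rules meeting both thresholds (support given as a transaction count)."""
    return sum(1 for conf, supp in rules if supp >= min_support and conf >= min_confidence)


for support in (150, 200, 250):
    for confidence in (30, 40, 50):
        print(support, confidence, count_rules(support, confidence))
```

The counts printed for the settings that appear in the Q3 table match the "No. of Rules" column.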
Q4.

Clustering with K=4:

Cluster centers

Cluster region hhsz age income money

Cluster-1 1.82582 2.764345 6.594262 2.29099 494.759126
Cluster-2 1.30693 3.400283 7.118812 5.951909 572.845096
Cluster-3 3.255893 2.417508 7.215488 4.929293 385.638049
Cluster-4 3.047858 4.695217 6.425693 5.234257 991.819418

Distance between cluster centers
           Cluster-1    Cluster-2    Cluster-3    Cluster-4
Cluster-1  0            78.1778095   109.1646525  497.0742874
Cluster-2  78.1778095   0            187.2225891  418.9811281
Cluster-3  109.1646525  187.2225891  0            606.1862753
Cluster-4  497.0742874  418.9811281  606.1862753  0
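The values in this matrix are consistent with plain Euclidean distance between the center vectors (region, hhsz, age, income, money) in original units; the sketch below recomputes two of them. This is our inference from the numbers rather than something the XLMiner output states. It also shows that money, being on a much larger scale than the other variables, dominates these distances.

```python
import math

# Cluster centers for K = 4 (region, hhsz, age, income, money), copied from the table above
CENTERS = {
    "Cluster-1": (1.82582, 2.764345, 6.594262, 2.29099, 494.759126),
    "Cluster-2": (1.30693, 3.400283, 7.118812, 5.951909, 572.845096),
    "Cluster-3": (3.255893, 2.417508, 7.215488, 4.929293, 385.638049),
    "Cluster-4": (3.047858, 4.695217, 6.425693, 5.234257, 991.819418),
}

for a, b in [("Cluster-1", "Cluster-2"), ("Cluster-1", "Cluster-3")]:
    d = math.dist(CENTERS[a], CENTERS[b])                  # Euclidean distance over all 5 variables
    money_only = abs(CENTERS[a][-1] - CENTERS[b][-1])      # contribution of money alone
    print(f"{a} vs {b}: distance={d:.4f} (money difference alone: {money_only:.4f})")
```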

Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (in original coordinates)
Cluster-1  506   1.68                         355.4019014
Cluster-2  704   1.477                        404.8165039
Cluster-3  563   1.347                        251.4218487
Cluster-4  413   1.868                        842.0241951
Overall    2186  1.565                        436.4733185

Elapsed Time

Overall (secs) 5.00
Q5.
Cluster Interpretation for K = 4:
Cluster 1 Cluster 2
Region HH Size Age Income Money Region HH Size Age Income Money
2 2 8 4 178.8 2 4 9 6 542.21
2 2 2 3 504.05 1 3 10 5 393.76
3 2 6 1 519.79 2 4 4 7 308.47
1 3 11 2 151.5 1 4 4 7 272.86
1 2 8 4 968.91 1 5 11 5 1080.14
2 1 7 4 255.67 1 3 11 6 357.63
2 2 3 1 261.99 1 2 11 5 58.97
3 4 6 1 325.5 1 2 9 6 255.89
2 3 3 2 207.06 1 4 9 7 1415.03
3 3 8 2 336.2 1 4 6 6 663.75
1 2 8 2 349.32 1 6 5 5 137.99
2 3 3 2 369.47 1 5 8 7 332.97
2 2 6 2 430.5 <-AVG-> 1 4 8 6 558.5

Cluster 3 Cluster 4
Region HH Size Age Income Money Region HH Size Age Income Money
3 2 11 3 599.08 4 5 6 7 360.49
2 2 9 5 363.89 3 5 7 7 475.95
2 2 5 5 133.48 4 4 5 7 1976.27
3 2 11 5 697.97 3 4 4 5 135.91
3 2 8 4 377.35 2 6 6 6 377.32
3 2 9 7 629.69 4 6 5 5 328.7
4 3 6 4 303.48 2 5 8 5 259.82
2 2 3 5 184 3 4 3 6 158.89
4 2 4 4 131.89 3 6 7 6 1704.46
3 3 10 5 287.85 4 4 9 4 1840.72
3 2 9 4 83.87 2 6 8 5 87.44
4 3 5 7 318.92 3 4 9 7 413.18
3 2 8 5 342.62 <-AVG-> 3 5 6 6 676.595

Region HH Size Age Income Money
c1 2 2 6 2 430.5
c2 1 4 8 6 558.5
c3 3 2 8 5 342.62
c4 3 5 6 6 676.595
Final Interpretation:

The clusters generated are clearly distinguishable, mainly on household size, age and income.
Cluster 1 are called random buyers: their income is low (2), yet they spend more than Cluster 3 (average money 430.5). Cluster 2 are called luxury seekers: they are older people (age 8) with high income (6), and they tend to spend more (558.5). Cluster 3 are called careful buyers: they have a middle income (5) and smaller households (2); they appear to plan their purchases carefully and spend the least (342.62). Cluster 4 are called big buyers: they have larger households (5), are middle-aged (6) and belong to a high income group (6), so they tend to purchase a lot and spend the most (676.595).
Q6.

Clustering with different values of K:

CASE-I (K=2):

Cluster    region    hhsz      age       income    child     race      connection  country   money
Cluster-1  2.25067   2.131368  6.689008  3.825738  0.229221  1.0563    0.852547    0.135389  501.111093
Cluster-2  2.277778  3.79375   7.0125    5.193055  0.996528  1.025694  0.964583    0.132639  621.830856

Distance between cluster centers
           Cluster-1    Cluster-2
Cluster-1  0            120.7418813
Cluster-2  120.7418813  0

Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (in original coordinates)
Cluster-1  667   2.661                        355.8518258
Cluster-2  1519  2.328                        477.1282528
Overall    2186  2.429                        440.1239633

For K = 2 there are only 2 clusters. Although both clusters contain a sizeable number of observations (667 and 1519), their centers are almost identical on most variables (region, age, race, connection and country) and differ mainly in household size, income, children and money. Hence the clusters cannot be characterized well, so K = 2 is not a good value for k-means clustering here.

CASE-II (K=4):

Cluster Center:

Cluster region hhsz age income child race connection country money
Cluster-1 2.378882 3.062112 7.329194 4.10559 0.664596 1.024845 -0.000001 0.254658 453.632868

Cluster-2 1.528678 3.609726 7.200748 4.759352 1 1.027431 1 0.017456 574.395328

Cluster-3 2.223282 2.043893 6.664122 4.305343 0 1.057252 1 0.122137 523.652347

Cluster-4 3.125894 3.711016 6.639485 5.147353 0.997139 1.032904 1 0.247497 659.759516

Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (in original coordinates)
Cluster-1  161   2.568                        325.6053482
Cluster-2  771   1.935                        446.4283469
Cluster-3  523   2.175                        373.9719234
Cluster-4  731   2.241                        504.745063
Overall    2186  2.141                        439.695642

CASE-III (K=10):

Cluster Center:

Cluster region hhsz age income child race connection country money

Cluster-1 2.253623 3.221015 6.445651 2.289848 1 1 1 0 450.229973
Cluster-2 1.906542 2.051402 6.67757 2.714951 0 1 1 0 475.320936
Cluster-3 3.16 2.752727 7.861819 5.821818 1 1 1 0 450.519958
Cluster-4 1.241071 3.750001 7.174107 5.910715 0.997768 1 1 0 592.383203
Cluster-5 2.597345 2.039824 6.632743 5.69469 0 1 1 0 497.282837
Cluster-6 2.382165 3.050956 7.350318 4.127388 0.662421 1 -0.000001 0.261147 448.730053
Cluster-7 3.008097 5.004043 6.165993 5.441295 0.991903 1 1 0 507.327606
Cluster-8 2.431818 3.227273 7.272726 4.886364 0.863636 1.045455 1 0 4449.987988
Cluster-9 2.263158 2.929824 6.578947 3.982455 0.649123 2.350877 0.929825 0.157895 523.31175
Cluster-10 2.330578 3.318182 6.747934 4.747934 0.760331 1 1 0.999999 496.809785
Distance between cluster centers
            Cluster-1    Cluster-2    Cluster-3    Cluster-4    Cluster-5    Cluster-6    Cluster-7    Cluster-8    Cluster-9    Cluster-10
Cluster-1   0            25.14516654  3.95034679   142.2057919  47.20289491  2.76989798   57.21798693  3999.75895   73.11560377  46.65710299
Cluster-2   25.14516654  0            25.0839916   117.1253916  22.17393203  26.68810511  32.29608262  3974.667992  48.04070516  21.66266898
Cluster-3   3.95034679   25.0839916   0            141.8814229  46.79870162  2.86576461   56.87900439  3999.46828   72.84564216  46.33796747
Cluster-4   142.2057919  117.1253916  141.8814229  0            95.13242957  143.6746677  85.09045918  3857.605144  69.12764774  95.59415137
Cluster-5   47.20289491  22.17393203  46.79870162  95.13242957  0            48.60987357  10.5412827   3952.705562  26.14610541  2.10163247
Cluster-6   2.76989798   26.68810511  2.86576461   143.6746677  48.60987357  0            58.67013952  4001.25815   74.60411911  48.10445001
Cluster-7   57.21798693  32.29608262  56.87900439  85.09045918  10.5412827   58.67013952  0            3942.661021  16.26717947  10.76098261
Cluster-8   3999.75895   3974.667992  3999.46828   3857.605144  3952.705562  4001.25815   3942.661021  0            3926.676645  3953.178371
Cluster-9   73.11560377  48.04070516  72.84564216  69.12764774  26.14610541  74.60411911  16.26717947  3926.676645  0            26.56454999
Cluster-10  46.65710299  21.66266898  46.33796747  95.59415137  2.10163247   48.10445001  10.76098261  3953.178371  26.56454999  0

Data summary
Cluster     #Obs  Average distance in cluster  Average distance in cluster (in original coordinates)
Cluster-1   270   1.603                        308.507607
Cluster-2   214   1.631                        297.5014998
Cluster-3   283   1.344                        277.938553
Cluster-4   446   1.379                        393.3044052
Cluster-5   226   1.44                         348.2440924
Cluster-6   157   2.512                        322.7320293
Cluster-7   247   1.339                        326.2300767
Cluster-8   44    3.033                        1502.255369
Cluster-9   57    3.485                        383.4960612
Cluster-10  242   2.163                        367.8065808
Overall     2186  1.685                        360.4535118

For K=10 there are too many clusters and many of them differ only slightly: for example, the centers of Clusters 1, 3 and 6 are all less than 4 units apart. Several clusters are also small (Cluster-8 has only 44 observations and Cluster-9 only 57), so a manager could not differentiate marketing initiatives across 10 segments that are this similar.

CASE-IV (K=6):

Cluster Center:

Cluster region hhsz age income child race connection country money
Cluster-1 2.382165 3.050956 7.350318 4.127388 0.662421 1 -0.000001 0.261147 448.730053
Cluster-2 2.253363 2.042601 6.672646 4.242153 0 1 1 0 530.739044
Cluster-3 3.306195 3.582301 6.761062 4.920354 0.99646 1 1 0 611.6884
Cluster-4 1.414226 3.711297 7.129707 5.059972 0.998605 1 1 0 631.830167
Cluster-5 2.271186 2.966102 6.661017 3.983051 0.661017 2.338984 0.932203 0.152542 733.070973
Cluster-6 2.330578 3.318182 6.747934 4.747934 0.760331 1 1 0.999999 496.809785
Distance between cluster centers
          Cluster-1 Cluster-2 Cluster-3 Cluster-4 Cluster-5 Cluster-6
Cluster-1 0 82.02735741 162.9684465 183.1095957 284.3465278 48.10445001
Cluster-2 82.02735741 0 80.97986148 101.1176476 202.3397826 33.98039549
Cluster-3 162.9684465 80.97986148 0 20.23468212 121.4001728 114.8877857
Cluster-4 183.1095957 101.1176476 20.23468212 0 101.2635411 135.0288767
Cluster-5 284.3465278 202.3397826 121.4001728 101.2635411 0 236.2680567
Cluster-6 48.10445001 33.98039549 114.8877857 135.0288767 236.2680567 0

Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (in original coordinates)
Cluster-1  157   2.512                        322.7320293
Cluster-2  446   1.787                        376.2517197
Cluster-3  565   1.737                        470.6028622
Cluster-4  717   1.77                         480.9731203
Cluster-5  59    3.647                        664.8660267
Cluster-6  242   2.163                        367.8065808
Overall    2186  1.912                        437.9971766

Of K=4 and K=6, K=4 is better for clustering, because it yields more distinct clusters. For example, with K=4 the clusters differ on several variables at once (region, household size, children, connection and money); this distinction is not as evident with K=6. In addition, with 4 clusters every cluster has a sizeable number of members (at least 161, compared with only 59 in Cluster-5 for K=6). Since the clusters differ on more input variables, K=4 appears to be the best value for k-means clustering among those tried.

Conclusion:

To explore other possible customer segments, we formed clusters with different values of K. The results of this analysis are summarized below:

Value   Average distance between         Average distance between                  Interpretability
of K    cluster centers                  cluster members
        (high: clusters are far apart)   (high: members of the same cluster
                                          are far apart)
2       Highest (best)                   Highest                                   Difficult to interpret and classify
4       Higher                           Higher                                    Easy to classify and categorize
6       Lower                            Lower                                     Convenient to categorize
10      Lowest                           Lowest (best)                             Difficult to categorize

Conditions to have unique clusters:
• A high inter-cluster distance, which implies that the clusters have many distinct features.
• A low intra-cluster distance, so that the members inside each cluster have a high degree of homogeneity.
• Interpretability should be very good so that we can understand and categorize the cluster and plan business
strategies accordingly, concentrating on the key features of that particular cluster.
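The first two conditions can be measured once a clustering is available. The sketch below (function names are ours) takes points, labels and centers, for example the output of a k-means run, and returns the average distance between pairs of cluster centers and the average distance from each point to its own center. The report does not say exactly how XLMiner averages its figures, so treat this as one reasonable definition rather than a reproduction of its output.

```python
import math
from itertools import combinations


def inter_cluster_distance(centers):
    """Average Euclidean distance over all pairs of centers (higher = better separated)."""
    pairs = list(combinations(centers, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def intra_cluster_distance(points, labels, centers):
    """Average distance from each point to its own center (lower = more homogeneous)."""
    return sum(math.dist(p, centers[lab]) for p, lab in zip(points, labels)) / len(points)


points = [(0, 0), (0, 2), (10, 0), (10, 2)]  # hypothetical data
labels = [0, 0, 1, 1]
centers = [(0, 1), (10, 1)]
print(inter_cluster_distance(centers), intra_cluster_distance(points, labels, centers))
```

A good value of K keeps the first number high and the second low while the clusters remain interpretable.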
Q7.

Let us select the following association rule for interpretation:

landsend.com => llbean.com
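Here only the customers who satisfy this rule, i.e. who bought from both sites, are clustered; the 199 observations in the data summary below equal Support(a ∪ c) of this rule in Q1. A minimal sketch of that filtering step follows, assuming each customer record carries the set of sites purchased from. The record layout, field names and function name are hypothetical.

```python
def customers_matching_rule(customers, body, head):
    """Keep customers whose purchases include every site in body and in head."""
    required = set(body) | set(head)
    return [c for c in customers if required <= c["sites"]]  # subset test


customers = [  # hypothetical records
    {"id": 1, "sites": {"landsend.com", "llbean.com"}, "income": 6},
    {"id": 2, "sites": {"landsend.com"}, "income": 4},
    {"id": 3, "sites": {"llbean.com", "gap.com", "landsend.com"}, "income": 5},
    {"id": 4, "sites": {"oldnavy.com"}, "income": 3},
]
subset = customers_matching_rule(customers, body={"landsend.com"}, head={"llbean.com"})
print([c["id"] for c in subset])
```

The filtered records (with their demographic fields) would then be passed to k-means as before.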

Cluster centers

Cluster region hhsz age income child connection money
Cluster-1 1.833333 3 8 4.5 0.5 -0.000001 797.148331
Cluster-2 1.117647 3.517647 7.188235 5.541177 1 1 949.315413
Cluster-3 2.128205 1.974359 7.435898 5.282051 -0.000002 1 619.080895
Cluster-4 2.855073 3.84058 8.478263 5.985508 1 1 910.932895

Distance between cluster centers
           Cluster-1    Cluster-2    Cluster-3    Cluster-4
Cluster-1  0            152.17948    178.0767543  113.8084488
Cluster-2  152.17948    0            330.2413791  38.44739584
Cluster-3  178.0767543  330.2413791  0            291.8632939
Cluster-4  113.8084488  38.44739584  291.8632939  0

Data summary
Cluster    #Obs  Average distance in cluster  Average distance in cluster (in original coordinates)
Cluster-1  6     2.206                        567.8205094
Cluster-2  86    1.57                         740.5398826
Cluster-3  39    1.625                        401.5736355
Cluster-4  68    1.75                         521.1593559
Overall    199   1.661                        593.9374922

Elapsed Time

Overall (secs) 2.00

Cluster Interpretation:

Cluster_1
region hhsz age income child connection money
1 3 7 6 1 1 863.69
1 4 10 6 1 1 723.31
1 3 10 6 1 1 666.65
1 2 7 5 0 0 1065.07
3 2 8 5 1 0 491.12
2 4 7 7 0 0 635.91
1 2 9 3 0 0 2232.66
3 5 8 6 1 0 298.95
1 3 9 1 1 0 59.18
3 8 6 Both No Connection

Cluster_2
region hhsz age income child connection money
1 3 7 6 1 1 863.69
1 4 10 6 1 1 723.31
1 3 10 6 1 1 666.65
1 4 4 5 1 1 1103.43
1 3 10 6 1 1 1239.79
1 4 4 5 1 1 757.2
1 3 7 6 1 1 1485.98
1 4 8 6 1 1 319.34
1 4 7 6 1 1 259.9
1 3 5 5 1 1 331.22
1 3 7 5 1 1 188
1 3 2 5 1 1 1146.05
1 3 10 5 1 1 1644.13
1 3 6 6 1 1 140.09
1 3 5 5 1 1 201.49
1 4 8 7 1 1 1094.31
1 4 6 7 1 1 1024.22
1 4 6 7 1 1 1058.38
1 4 9 7 1 1 904.3
1 4 5 7 1 1 1043.41
1 4 5 7 1 1 1110.67
1 3 5 7 1 1 1030.66
2 4 9 5 1 1 921.19
1 3 10 6 1 1 137.76
1 3 7 5 1 1 1845.37
1 4 5 to 7 5 to 7 Have Children Have Connection 850
Cluster_3
Region hhsz age income child connection money
2 2 7 6 0 1 391.91
2 2 11 5 0 1 870.21
3 2 5 5 0 1 370.31
2 3 6 6 0 1 208.97
3 2 6 6 0 1 156
2 2 8 7 0 1 215.32
1 2 7 5 0 1 568.45
1 2 9 5 0 1 752.38
2 2 6 7 0 1 1182.52
1 2 6 6 0 1 458.29
1 2 6 6 0 1 436.83
3 1 6 6 0 1 730.34
1 2 11 5 0 1 987.32
2 1 4 7 0 1 485.91
3 2 10 7 0 1 854.27
3 2 10 7 0 1 373.19
3 2 11 7 0 1 591.92
1 2 7 5 0 1 1532.77
3 2 5 7 0 1 161.87
3 2 8 7 0 1 1270.35
3 2 5 7 0 1 62.45
1 2 6 7 0 1 249.43
1 2 9 7 0 1 144
1 2 11 7 0 1 343.32
3 2 9 3 0 1 208
1,2,3 2 7 to 11 7 no child Have Connection 500
Cluster_4
region hhsz age income child connection money
3 4 8 7 1 1 604.5
3 3 11 6 1 1 906.17
3 3 9 6 1 1 303
3 3 8 7 1 1 806.9
3 4 4 7 1 1 882.37
3 3 9 7 1 1 738.83
3 4 5 7 1 1 1363.64
3 4 6 5 1 1 304.83
3 3 10 7 1 1 766.99
3 3 6 7 1 1 831.93
3 3 6 7 1 1 611.97
2 4 9 5 1 1 921.19
2 4 9 7 1 1 842.53
3 3 9 7 1 1 1457.28
3 3 5 7 1 1 679.29
3 3 9 5 1 1 260.82
2 4 6 6 1 1 270.83
3 5 7 5 1 1 610.05
2 4 7 7 1 1 460.74
3 3 3 6 1 1 398
3 3 5 6 1 1 1729.28
3 5 9 7 1 1 1394.86
2 3 9 7 1 1 999.94
3 5 6 7 1 1 471.83
2 5 9 6 1 1 779.66
2,3 4 7 to 9 5 to 7 Have Children Have Connection 800

Conclusion

Compared with the clusters generated in Part 2, these clusters are more similar to one another and are distinguished by fewer attributes. Because fewer attributes define them, we can narrow down the customer segments more easily than with the Part 2 clusters. These clusters therefore help the company chosen in the association rule concentrate on specific segments of interest, paving the way for better business intelligence strategies. As the cluster summaries show, members of these clusters are middle-aged or older people with a good income level, and the differentiating factors are household size, children and internet connection.

Q8.

Association Rule on a cluster from Part 2:
Analyzing a particular cluster: Cluster_2

Case i
Data
Input Data Cluster2!$N$1:$R$87
Data Format Item List
Minimum Support 15
Minimum Confidence % 30
# Rules 12
Overall Time (secs) 1

If customers buy from landsend.com they tend to buy from llbean.com as well; the confidence of this rule is 100%.

Rule #  Conf. %  Antecedent (a) => Consequent (c)  Support(a)  Support(c)  Support(a ∪ c)  Lift Ratio

1 100 oldnavy.com=> landsend.com, llbean.com 16 86 16 1
2 100 oldnavy.com=> landsend.com 16 86 16 1
3 100 landsend.com=> llbean.com 86 86 86 1
4 100 llbean.com=> landsend.com 86 86 86 1
5 100 llbean.com, oldnavy.com=> landsend.com 16 86 16 1
6 100 landsend.com, oldnavy.com=> llbean.com 16 86 16 1
7 100 oldnavy.com=> llbean.com 16 86 16 1
8 100 jcpenney.com=> llbean.com 24 86 24 1
9 100 jcpenney.com=> landsend.com 24 86 24 1
10 100 jcpenney.com=> landsend.com, llbean.com 24 86 24 1
11 100 jcpenney.com, llbean.com=> landsend.com 24 86 24 1
12 100 jcpenney.com, landsend.com=> llbean.com 24 86 24 1

Case ii
Data
Input Data Sheet1!$M$1:$R$87
Data Format Item List
Minimum Support 25
Minimum Confidence % 60
# Rules 2
Overall Time (secs) 1

If customers buy from landsend.com they tend to buy from llbean.com as well; the confidence of this rule is 100%.

Rule #  Conf. %  Antecedent (a) => Consequent (c)  Support(a)  Support(c)  Support(a ∪ c)  Lift Ratio

1 100 llbean.com=> landsend.com 86 86 86 1
2 100 landsend.com=> llbean.com 86 86 86 1

Case iii
Data
Input Data Sheet1!$M$1:$R$87
Data Format Item List
Minimum Support 20
Minimum Confidence % 35
# Rules 7
Overall Time (secs) 1

If customers buy from landsend.com they tend to buy from llbean.com as well; the confidence of this rule is 100%.

Rule #  Conf. %  Antecedent (a) => Consequent (c)  Support(a)  Support(c)  Support(a ∪ c)  Lift Ratio

1 100 llbean.com=> landsend.com 86 86 86 1
2 100 jcpenney.com=> llbean.com 24 86 24 1
3 100 jcpenney.com=> landsend.com 24 86 24 1
4 100 jcpenney.com=> landsend.com, llbean.com 24 86 24 1
5 100 jcpenney.com, llbean.com=> landsend.com 24 86 24 1
6 100 jcpenney.com, landsend.com=> llbean.com 24 86 24 1
7 100 landsend.com=> llbean.com 86 86 86 1

Analysis of Association Rules generated:

The association rules generated have 100% confidence regardless of the support level, and a constant lift ratio of 1. This was not the case for the rules generated in Part 1. It follows from how Cluster_2 was formed: its customers all bought from both landsend.com and llbean.com, so every one of its 86 transactions contains both sites and any rule whose head is landsend.com and/or llbean.com is always satisfied. The rules are therefore specific to the company of interest, Landsend.com, and show that its customers also shop at llbean.com, suggesting that Landsend.com may be losing customers to llbean.com. In Part 1, by contrast, the rules were generated from the full dataset with no reference to a specific company; they simply described the buying behavior of customers in general.
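The lift values in Case i can be checked numerically. Since every one of the 86 transactions in Cluster_2 contains both landsend.com and llbean.com, the relative frequency of any such head is 86/86 = 1, so lift equals confidence, which is 100%. The sketch below reuses the lift formula from the Introduction; the function name is ours, and the count of 86 transactions is read from the input range (rows 2 to 87).

```python
def lift(n_body, n_head, n_both, n_total):
    """Lift = confidence / relative frequency of the head."""
    return (n_both / n_body) / (n_head / n_total)


N_CLUSTER = 86  # transactions in Cluster_2 (input range rows 2 to 87)
print(lift(n_body=16, n_head=86, n_both=16, n_total=N_CLUSTER))  # oldnavy.com => landsend.com
print(lift(n_body=24, n_head=86, n_both=24, n_total=N_CLUSTER))  # jcpenney.com => llbean.com
print(lift(n_body=479, n_head=538, n_both=199, n_total=2186))    # Part 1: landsend.com => llbean.com
```

Both Cluster_2 rules give a lift of exactly 1, while the same rule on the full Part 1 data gives about 1.69.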
Q9.

Business Intelligence discovered for the firm Landsend.com & Recommendations:

The shop landsend.com might be losing its customers to llbean.com, based on the XLMiner reports we generated. On the positive side, landsend.com also appears to be slowly gaining customers from oldnavy.com. Some BI recommendations for Landsend.com follow:

1. Landsend should focus its products, offerings and promotions on the preferences of its
target customer segment, and it should reach the target segments at the right time.
2. They should try to gain or attract other segments. Landsend.com can make attractive
offers to teens and children, since these are not among its current target segments.
3. They must have an easy-to-navigate, user-friendly website with a simple shopping cart,
so that users can check out in fewer clicks.
4. They can also provide better customer service through 24x7 customer support centers
that assist people with online purchases.
5. They can try different marketing techniques like e-mail and snail mail offers to create
awareness among existing customers and customers who are about to leave.
6. Also they must identify major competitors and launch an appropriate marketing
campaign to differentiate and target their customer base.
7. Landsend should better understand the expectations of unhappy customers and
decide on business strategies accordingly. They can conduct in-store or online surveys to
understand customer behavior, and they can even employ BI techniques to plan
product placements for a better customer shopping experience.