
Association Rules
(Unsupervised Learning Method)
Study of “what goes with what”

Also called market basket analysis or affinity analysis

Origin: study of customer transaction databases to determine
dependencies between purchases of different items

Uses:
• Promotion on one item, raise the price of a related item
• Placement of items in the store
• Stocking decisions
Actionable Rules

Example: cell phone faceplates

A store that sells accessories for cellular phones runs a promotion
on faceplates:

“Buy multiple faceplates from a choice of 6 different colors and get
a discount!”

The store managers would like to know what colors of faceplates
customers are likely to purchase together.

Data from the first week of the promotion (tiny example)

Association Rules are probabilistic “if-then” statements

Basic idea:
• Examine all possible rules between items in “if-then” format
• Select only rules most likely to indicate true dependence
Many rules are possible

Example: Transaction 1 supports several rules, such as
– “If red, then white” (“If a red faceplate is purchased, then so is
a white one”)
– “If white, then red”
– “If red and white, then green”
+ several more

Terminology:
“IF” part = antecedent
“THEN” part = consequent

Problem: computation time grows exponentially as the # of items
increases
Rule Generation

Problem: generating all possible rules is exponential in the number
of distinct items

Solution: frequent item sets
Consider only combinations that occur with higher frequency in the
database

Criterion for “frequent”: support
Support of a rule = % (or number) of transactions in which both the
antecedent (IF) and consequent (THEN) item sets appear in the data

Support = (# transactions with both antecedent & consequent item sets)
          / (# total transactions)

Confidence: % of antecedent (IF) transactions that also contain the
consequent (THEN) item set

Confidence = (# transactions with both antecedent & consequent item sets)
             / (# transactions with antecedent item set)
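As a minimal sketch of the two formulas, the support and confidence of a rule can be computed directly from a list of transactions. The data below is hypothetical, standing in for the tiny faceplate example:

```python
# Hypothetical faceplate transactions (stand-in for the tiny example);
# each transaction is the set of colors bought together.
transactions = [
    {"red", "white", "green"},
    {"white", "orange"},
    {"white", "blue"},
    {"red", "white", "orange"},
    {"red", "blue"},
    {"white", "blue"},
    {"red", "blue"},
    {"red", "white", "blue", "green"},
    {"red", "white", "blue"},
    {"yellow"},
]

def support(antecedent, consequent):
    # Fraction of transactions containing both antecedent and consequent item sets
    both = antecedent | consequent
    return sum(both <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Fraction of antecedent transactions that also contain the consequent item set
    n_antecedent = sum(antecedent <= t for t in transactions)
    n_both = sum((antecedent | consequent) <= t for t in transactions)
    return n_both / n_antecedent

print(support({"red"}, {"white"}))     # 0.4 (4 of 10 transactions)
print(confidence({"red"}, {"white"}))  # 0.666... (4 of the 6 "red" transactions)
```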
What is the support for “if white then
blue”? (choose one or more)
1. 4
2. 40%
3. 2
4. 90%
What is the support for “if blue then
white”? (choose one or more)
1. 4
2. 40%
3. 2
4. 90%
What are the support and confidence for the rule Socks => Tie?
Generating frequent itemsets:
The Apriori Algorithm
For k products…
1. Set minimum support criterion
2. Generate list of one-item sets that
meet the support criterion
3. Use list of one-item sets to generate
list of two-item sets that meet support
criterion
4. Use list of two-item sets to generate
list of three-item sets that meet
support criterion
5. Continue up through k-item sets
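The five steps above can be sketched in plain Python. This is a naive, unoptimized version for illustration only; the transaction data and min_support value are hypothetical:

```python
from itertools import combinations

# Hypothetical faceplate transactions
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]

def apriori(transactions, min_support):
    """Return {frequent itemset: support} via a level-wise search."""
    n = len(transactions)
    frequent = {}
    # Step 2: candidate one-item sets
    current = list({frozenset([item]) for t in transactions for item in t})
    k = 1
    while current:
        # Keep only candidates meeting the minimum support criterion (step 1)
        level = {}
        for s in current:
            sup = sum(s <= t for t in transactions) / n
            if sup >= min_support:
                level[s] = sup
        frequent.update(level)
        # Steps 3-5: build (k+1)-item candidates from frequent k-item sets
        keys = list(level)
        current = list({a | b for a, b in combinations(keys, 2)
                        if len(a | b) == k + 1})
        k += 1
    return frequent

freq = apriori(transactions, min_support=0.2)
```

Each pass counts only the surviving candidates, so item sets containing any infrequent subset are never extended to the next level.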
Performance Measure: Lift Ratio

Lift ratio = confidence / (benchmark confidence)

Benchmark assumes independence between antecedent and consequent:

P(antecedent & consequent) = P(antecedent) x P(consequent)

Benchmark confidence:
P(C|A) = P(C&A) / P(A) = P(C) x P(A) / P(A) = P(C)
       = (# transactions with consequent item set) / (# transactions in database)
Lift = confidence(rule) / support(consequent)

If confidence > support(consequent), lift > 1: the antecedent and
consequent are positively associated.

If confidence = support(consequent), lift = 1: the antecedent and
consequent are independent.

Confidence and support each lie in [0, 1]; lift lies in [0, inf).
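A quick numeric check of these identities, using made-up counts: a 10-transaction data set in which the rule "if red then white" holds in 4 of the 6 transactions containing red, and white appears in 7 of the 10 transactions:

```python
# Hypothetical counts: rule "if red then white" has confidence 4/6;
# the consequent (white) appears in 7 of 10 transactions.
conf_rule = 4 / 6    # confidence of the rule, P(white | red)
benchmark = 7 / 10   # benchmark confidence = support of consequent, P(white)
lift = conf_rule / benchmark

print(round(lift, 3))  # 0.952 -- lift < 1: red and white are negatively associated
```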
Interpreting Lift

Lift > 1 indicates a rule that is useful in finding consequent item
sets (i.e., more useful than selecting transactions randomly)
Interpretation revisited
• Lift ratio shows how effective the rule is in finding consequents
vs. random selection (useful if finding particular consequents is
important)
• Confidence shows the rate at which consequents will be found
(useful in estimating the costs of a promotion)
• Support measures overall impact (% of transactions affected)
Caution: The Role of Chance

Random data can generate apparently interesting association rules.

The more rules you produce, the greater this danger.

Rules based on large numbers of records are less subject to this
danger.
Example: Rules 3 and 11

Rule #  Conf. %  Antecedent          Consequent          Support(a)  Support(c)  Support(a U c)  Lift Ratio
1       100      ItalCook            CookBks             227         862         227             2.320186
2       62.77    ArtBks, ChildBks    GeogBks             325         552         204             2.274247
3       54.13    CookBks, DoItYBks   ArtBks              375         482         203             2.246196
4       61.98    ArtBks, CookBks     GeogBks             334         552         207             2.245509
5       53.77    CookBks, GeogBks    ArtBks              385         482         207             2.230964
6       57.11    RefBks              ChildBks, CookBks   429         512         245             2.230842
7       52.31    ChildBks, GeogBks   ArtBks              390         482         204             2.170444
8       60.78    ArtBks, CookBks     DoItYBks            334         564         203             2.155264
9       58.4     ChildBks, CookBks   GeogBks             512         552         299             2.115885
10      54.17    GeogBks             ChildBks, CookBks   552         512         299             2.115885
11      57.87    CookBks, DoItYBks   GeogBks             375         552         217             2.096618
12      56.79    ChildBks, DoItYBks  GeogBks             368         552         209             2.057735

• Rules are listed in order of lift
• Information can be compressed
  – e.g., rules 3 and 11 involve the same trio of books
Let us Do it in Python
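As a starting point, here is a sketch of the whole pipeline in plain Python, using the same hypothetical faceplate data (libraries such as mlxtend offer this functionality ready-made; the 20% minimum support is an illustrative choice):

```python
from itertools import permutations

# Hypothetical faceplate transactions
transactions = [
    {"red", "white", "green"}, {"white", "orange"}, {"white", "blue"},
    {"red", "white", "orange"}, {"red", "blue"}, {"white", "blue"},
    {"red", "blue"}, {"red", "white", "blue", "green"},
    {"red", "white", "blue"}, {"yellow"},
]
n = len(transactions)

def count(itemset):
    # Number of transactions containing the item set
    return sum(itemset <= t for t in transactions)

# Score every one-item => one-item rule that meets a 20% minimum support
items = sorted({item for t in transactions for item in t})
rules = []
for a, c in permutations(items, 2):
    sup = count({a, c}) / n
    if sup < 0.2:                   # minimum support criterion
        continue
    conf = sup / (count({a}) / n)   # confidence of the rule
    lift = conf / (count({c}) / n)  # confidence / benchmark confidence
    rules.append((a, c, sup, conf, lift))

# Report rules in order of lift, as in the table above
for a, c, sup, conf, lift in sorted(rules, key=lambda r: r[4], reverse=True):
    print(f"if {a} then {c}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```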
