Professional Documents
Culture Documents
04associationrulesbasics 12600984153182 Phpapp02 PDF
04associationrulesbasics 12600984153182 Phpapp02 PDF
Applications
Basket data analysis
Cross-marketing
Catalog design
…
Support (s)
Fraction of transactions
that contain both X and Y
Confidence (c)
Measures how often items in Y
appear in transactions that
contain X
Customer
buys both Customer
buys Milk
Customer
buys Bread
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf thresholds
All the above rules are binary partitions of the same itemset:
Rule Generation
Generate high confidence rules from frequent itemset
Each rule is a binary partitioning of a frequent itemset
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
Brute-force approach:
Each itemset in the lattice is a candidate frequent itemset
Count the support of each candidate by scanning the
database
TID Items
1 Bread, Peanuts, Milk, Fruit, Jam
candidates
2 Bread, Jam, Soda, Chips, Milk, Fruit
3 Steak, Jam, Soda, Chips, Bread
N 4 Jam, Soda, Peanuts, Milk, Fruit
5 Jam, Soda, Chips, Milk, Bread M
6 Fruit, Soda, Chips, Milk
7 Fruit, Soda, Peanuts, Milk
8 Fruit, Peanuts, Cheese, Yogurt
w
Match each transaction against every candidate
Complexity ~ O(NMw) => Expensive since M = 2d
Apriori principle
If an itemset is frequent, then all of
its subsets must also be frequent
null
A B C D E
AB AC AD AE BC BD BE CD CE DE
Found to be
Infrequent
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
Pruned
supersets ABCDE
Items (1-itemsets)
Item Count
Bread 4
Peanuts 4 2-itemsets
Milk 6 2-Itemset Count
Fruit 6
Bread, Jam 4
Jam 5
Peanuts, Fruit 4
Soda 6
Milk, Fruit 5
Chips 4
Milk, Jam 4
Steak 1
Milk, Soda 5
Cheese 1
Fruit, Soda 4
Yogurt 1
Jam, Soda 4
3-itemsets
Soda, Chips 4
Minimum Support = 4 3-Itemset Count
Milk, Fruit, Soda 4
Let k=1
Generate frequent itemsets of length 1
Repeat until no new frequent itemsets are identified
Generate length (k+1) candidate itemsets from length k
frequent itemsets
Prune candidate itemsets containing subsets of length k
that are infrequent
Count the support of each candidate by scanning the DB
Eliminate candidates that are infrequent, leaving only
those that are frequent
L1 = {frequent items};
for (k = 1; Lk != ; k++) do begin
Ck+1 = candidates generated from Lk;
for each transaction t in database do
increment the count of all candidates in Ck+1
that are contained in t
Lk+1 = candidates in Ck+1 with min_support
end
return k Lk;
Join Step
Ck is generated by joining Lk-1 with itself
Prune Step
Any (k-1)-itemset that is not frequent cannot be a subset
of a frequent k-itemset
e.g., L = {A,B,C,D}:
ABCD=>{ }
Low
Confidence
Rule
BCD=>A ACD=>B ABD=>C ABC=>D
Support
distribution of
a retail data set