Data Mining
Association Rules:
Background
[Figure: customers who buy diapers and customers who buy beer]
C ⇒ A (support 50%, confidence 100%)
Application Examples
Transaction: patient
Problem Statement
Y,
Various extensions
Problem Decomposition
1. Find all sets of items that have minimum support (frequent itemsets)
2. Use the frequent itemsets to generate the desired rules
Problem Decomposition: Example
Transaction ID   Items Bought
1                Shoes, Shirt, Jacket
2                Shoes, Jacket
3                Shoes, Jeans
4                Shirt, Sweatshirt
Frequent Itemset   Support
{Shoes}            75%
{Shirt}            50%
{Jacket}           50%
{Shoes, Jacket}    50%
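For the rule Shoes ⇒ Jacket:
Support = support({Shoes, Jacket}) = 50%
Confidence = support({Shoes, Jacket}) / support({Shoes}) = 50% / 75% = 66.6%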
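The support and confidence figures in this example can be checked with a few lines of Python. This is a minimal sketch: the function names `support` and `confidence` are illustrative choices, not taken from the slides, and transactions are modelled as plain sets.

```python
# The four transactions from the example table.
transactions = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent ∪ consequent) / support(antecedent).

    Assumes the antecedent occurs in at least one transaction.
    """
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

if __name__ == "__main__":
    print(support({"Shoes"}, transactions))
    print(support({"Shoes", "Jacket"}, transactions))
    print(confidence({"Shoes"}, {"Jacket"}, transactions))
```

Swapping the two sides of the rule (Jacket ⇒ Shoes) keeps the support but changes the confidence, because the denominator becomes support({Jacket}).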
Discovering Rules
Naive Algorithm
for each frequent itemset l do
    for each nonempty proper subset c of l do
        if (support(l) / support(l - c) >= minconf) then
            output the rule (l - c) ⇒ c,
                with confidence = support(l) / support(l - c)
                and support = support(l)
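The naive rule-generation loop can be sketched in Python as follows. The sketch assumes the supports of all frequent itemsets are already available in a dictionary keyed by `frozenset`; the function name `generate_rules` and the thresholds used in the usage lines are illustrative assumptions.

```python
from itertools import combinations

def generate_rules(supports, minconf):
    """Enumerate rules (l - c) => c for every frequent itemset l.

    `supports` maps frozenset -> support. Every subset of a frequent
    itemset is itself frequent, so supports[l - c] is assumed present.
    """
    rules = []
    for l, sup_l in supports.items():
        # Nonempty proper subsets c of l serve as consequents.
        for size in range(1, len(l)):
            for c in combinations(sorted(l), size):
                c = frozenset(c)
                antecedent = l - c
                conf = sup_l / supports[antecedent]
                if conf >= minconf:
                    rules.append((antecedent, c, conf, sup_l))
    return rules

# Frequent itemsets and supports from the shoes example.
supports = {
    frozenset({"Shoes"}): 0.75,
    frozenset({"Shirt"}): 0.50,
    frozenset({"Jacket"}): 0.50,
    frozenset({"Shoes", "Jacket"}): 0.50,
}

if __name__ == "__main__":
    for antecedent, consequent, conf, sup in generate_rules(supports, 0.6):
        print(sorted(antecedent), "=>", sorted(consequent), conf, sup)
```

Single-item itemsets produce no rules, since they have no nonempty proper subset.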
Database D
Items
1 3 4
2 3 5
1 2 3 5
2 5

Scan D → C1
itemset   sup
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2
itemset
{1 2}
{1 3}
{1 5}
{2 3}
{2 5}
{3 5}

Scan D → C2
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3
itemset
{2 3 5}

Scan D → L3
itemset   sup
{2 3 5}   2
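Each "Scan D" step counts how many transactions contain each candidate and keeps those that reach the minimum support. The sketch below reproduces the counts for the four-transaction database with items 1 to 5. The minimum support count of 2 is an assumption inferred from the tables (the itemsets with count 1 are the ones dropped), and `count_and_filter` is an illustrative name.

```python
def count_and_filter(transactions, candidates, min_count):
    """One pass over the database: count each candidate, keep the frequent ones."""
    counts = {c: 0 for c in candidates}
    for t in transactions:            # a single scan of D
        for c in candidates:
            if c <= t:                # candidate is contained in the transaction
                counts[c] += 1
    frequent = {c: n for c, n in counts.items() if n >= min_count}
    return counts, frequent

# Database D from the example.
D = [frozenset(t) for t in ({1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5})]

C1 = [frozenset({i}) for i in (1, 2, 3, 4, 5)]
C2 = [frozenset(p) for p in ((1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5))]
C3 = [frozenset({2, 3, 5})]

if __name__ == "__main__":
    for name, candidates in (("C1", C1), ("C2", C2), ("C3", C3)):
        counts, frequent = count_and_filter(D, candidates, min_count=2)
        print(name, {tuple(sorted(c)): n for c, n in counts.items()})
        print("frequent:", sorted(tuple(sorted(c)) for c in frequent))
```

Testing every candidate against every transaction is the simplest possible counting scheme; it is meant to show what the scan computes, not how to do it efficiently.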
Step 2: pruning
forall itemsets c in Ck do
    forall (k-1)-subsets s of c do
        if (s is not in L(k-1)) then delete c from Ck
Example of Generating Candidates
Self-joining: L3*L3
Pruning:
C4={abcd}
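Candidate generation (self-join followed by pruning) can be sketched as below. The input L3 = {abc, abd, acd, ace, bcd} is an assumed illustration, since the frequent 3-itemsets are not listed in the text; it was chosen because it yields C4 = {abcd}. The function name `apriori_gen` and the tuple representation are also assumptions of this sketch.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Build size-k candidates from the frequent (k-1)-itemsets.

    Itemsets are handled as sorted tuples so that "same first k-2 items"
    is a simple prefix comparison.
    """
    prev = sorted(tuple(sorted(s)) for s in prev_frequent)
    if not prev:
        return []
    prev_set = set(prev)
    k = len(prev[0]) + 1

    # Step 1: self-join. Merge two itemsets that differ only in the last item.
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:-1] == q[:-1]:
                candidates.append(p + (q[-1],))

    # Step 2: pruning. Drop c if any (k-1)-subset of c is not frequent.
    return [c for c in candidates
            if all(s in prev_set for s in combinations(c, k - 1))]

if __name__ == "__main__":
    L3 = ["abc", "abd", "acd", "ace", "bcd"]   # assumed input
    print(apriori_gen(L3))
```

With this input the join produces abcd (from abc and abd) and acde (from acd and ace); acde is then pruned because its subset ade is not in L3.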
Method:
Hash-tree: search
Max-Miner
[Figure: itemsets over items 1 to 4]
2-itemsets: {1,2} {1,3} {1,4} {2,3} {2,4} {3,4}
3-itemsets: {1,2,3} {1,2,4} {1,3,4} {2,3,4}
4-itemsets: {1,2,3,4}
Max-miner pruning
The algorithm
Max-Miner
Set of candidate groups C ← {}
Set of itemsets F ← {Gen-Initial-Groups(T, C)}
while C is not empty do
    scan T to count the support of all candidate groups in C
    for each g in C such that h(g) ∪ t(g) is frequent do
        F ← F ∪ {h(g) ∪ t(g)}
    Set of candidate groups Cnew ← {}
    for each g in C such that h(g) ∪ t(g) is infrequent do
        F ← F ∪ {Gen-sub-nodes(g, Cnew)}
    C ← Cnew
    remove from F any itemset with a proper superset in F
    remove from C any group g such that h(g) ∪ t(g) has a superset in F
return F
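The loop above can be sketched in Python under several assumptions, because the text does not spell out the two helper routines. Here h(g) is read as the head itemset of a group and t(g) as its ordered tail of candidate extension items. Gen-Initial-Groups is taken to create one group per frequent item (except the last in the ordering), and Gen-sub-nodes is taken to drop tail items that do not extend the head to a frequent itemset before creating child groups. Items are ordered by increasing support, which is also an assumption. Support is recounted per itemset for brevity rather than in one batched scan.

```python
def max_miner(transactions, min_count):
    """Return the maximal frequent itemsets as a set of frozensets (sketch)."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        # Support count; a real implementation would batch these per scan of T.
        return sum(1 for t in transactions if itemset <= t)

    def gen_sub_nodes(head, tail, new_groups):
        # Keep only tail items that extend the head to a frequent itemset.
        tail = [i for i in tail if count(head | {i}) >= min_count]
        for pos, item in enumerate(tail[:-1]):
            new_groups.append((head | {item}, tuple(tail[pos + 1:])))
        return head | {tail[-1]} if tail else head

    # Gen-Initial-Groups (assumed form): frequent items, ordered by support.
    items = {i for t in transactions for i in t}
    support = {i: count(frozenset([i])) for i in items}
    order = sorted((i for i in items if support[i] >= min_count),
                   key=lambda i: (support[i], i))
    if not order:
        return set()
    groups = [(frozenset([item]), tuple(order[pos + 1:]))
              for pos, item in enumerate(order[:-1])]
    found = {frozenset([order[-1]])}

    while groups:
        new_groups = []
        for head, tail in groups:
            full = head | frozenset(tail)
            if count(full) >= min_count:
                found.add(full)       # h(g) ∪ t(g) is frequent
            else:
                found.add(gen_sub_nodes(head, tail, new_groups))
        # Drop itemsets that have a proper superset in F.
        found = {s for s in found if not any(s < other for other in found)}
        # Drop groups already covered by a known frequent itemset.
        groups = [(h, t) for h, t in new_groups
                  if not any(h | frozenset(t) <= s for s in found)]
    return found

if __name__ == "__main__":
    D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
    print(sorted(sorted(s) for s in max_miner(D, min_count=2)))
```

On the four-transaction database with items 1 to 5 and a minimum support count of 2, the group with head {2} and tail (3, 5) is found frequent as a whole, so {2, 3, 5} is reported without enumerating its subsets level by level.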
Item Ordering