
Association Rule Mining

FP Growth

Data Mining - Ayesha Khan November 3, 2015



FP Growth Algorithm

} One problematic aspect of Apriori is candidate generation
} Candidate generation is the source of the algorithm's exponential growth
} Idea: Compress the database into a frequent-pattern tree representing the frequent items

FP Growth Algorithm
Divide-and-conquer strategy as follows:
} Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
} highly condensed, but complete for frequent pattern mining
} avoids costly repeated database scans
} Then, divide such a compressed database into a set of conditional databases (a special kind of projected database)
} And, mine each such database separately

FP Growth Algorithm: Tree Construction

} Initially, scan the database for frequent 1-itemsets
} Place the resulting set in a list L in descending order of frequency (support)
} Construct an FP-tree
} Create a root node labeled null
} Scan the database
} Process the items in each transaction in L order
} From the root, add nodes in the order in which items appear in the transactions
} Link nodes representing the same item across different branches (node-links)
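A minimal Python sketch of these construction steps, assuming each transaction has already been pruned to frequent items and sorted in L order. The FPNode class, the insert_transaction function, and the header dictionary are illustrative names chosen here, not code from the lecture:

```python
class FPNode:
    """One node of the FP-tree: an item label, a count, and links to parent/children."""
    def __init__(self, item, parent=None):
        self.item = item          # item label (None for the null root)
        self.count = 0            # number of transactions sharing this prefix
        self.parent = parent      # link toward the root, used later for prefix paths
        self.children = {}        # item label -> child FPNode
        self.node_link = None     # next node carrying the same item (header-table chain)


def insert_transaction(root, items, header):
    """Insert one transaction (already in L order) into the tree rooted at `root`."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
            # thread the new node onto the header table's node-link chain
            if item not in header:
                header[item] = child
            else:
                tail = header[item]
                while tail.node_link is not None:
                    tail = tail.node_link
                tail.node_link = child
        child.count += 1          # shared prefixes just increment existing counts
        node = child
```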

FP Growth Algorithm: Tree Construction

TID   Items
1     I1, I2, I5
2     I2, I4
3     I2, I3, I6
4     I1, I2, I4
5     I1, I3
6     I2, I3
7     I1, I3
8     I1, I2, I3, I5
9     I1, I2, I3

} Minimum support of ~20% (count of 2)
} Frequent 1-itemsets: I1, I2, I3, I4, I5
} Construct list L = {(I2,7), (I1,6), (I3,6), (I4,2), (I5,2)}
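The list L can be reproduced directly from the nine transactions; a small check in Python (variable names are ad hoc):

```python
from collections import Counter

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3", "I6"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
min_count = 2  # ~20% of 9 transactions

# count item supports and keep only the frequent items, sorted by descending support
counts = Counter(item for t in transactions for item in t)
L = sorted(((i, c) for i, c in counts.items() if c >= min_count),
           key=lambda ic: (-ic[1], ic[0]))
print(L)  # [('I2', 7), ('I1', 6), ('I3', 6), ('I4', 2), ('I5', 2)]

# re-order each transaction in L order, dropping infrequent items (I6 disappears)
rank = {item: r for r, (item, _) in enumerate(L)}
ordered = [sorted((i for i in t if i in rank), key=rank.get) for t in transactions]
print(ordered[0])  # ['I2', 'I1', 'I5']
```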

FP Growth Algorithm: Tree Construction

Create the root node (null) and scan the database.

Transaction 1: I1, I2, I5; in L order: I2, I1, I5

Process the transaction: add nodes in item order, label each node with its item and count, and maintain the header table.

After transaction 1:

Header table: I2:1, I1:1, I3:0, I4:0, I5:1

null
  (I2,1)
    (I1,1)
      (I5,1)


FP Growth Algorithm: Tree Construction

After transaction 2 (I2, I4), a new child (I4,1) is added under (I2,2):

Header table: I2:2, I1:1, I3:0, I4:1, I5:1

null
  (I2,2)
    (I1,1)
      (I5,1)
    (I4,1)

FP Growth Algorithm: FP Tree Mining

Complete FP-tree for the nine transactions, with header table I2:7, I1:6, I3:6, I4:2, I5:2:

null
  (I2,7)
    (I1,4)
      (I5,1)
      (I4,1)
      (I3,2)
        (I5,1)
    (I4,1)
    (I3,2)
  (I1,2)
    (I3,2)

Mining I5:
} Prefix paths: (I2 I1, 1), (I2 I1 I3, 1)
} Conditional FP-tree: null -> (I2,2) -> (I1,2)
} Frequent patterns: (I2 I1 I5, 2), (I2 I5, 2), (I1 I5, 2)
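The prefix paths for I5 can be collected by following I5's node-link chain from the header table and walking each node's parent pointers up to the root. A sketch, again assuming the FPNode structure from earlier:

```python
def conditional_pattern_base(item, header):
    """Return the (prefix_path, count) pairs for every node carrying `item`."""
    base = []
    node = header.get(item)
    while node is not None:                  # follow the node-link chain
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
        node = node.node_link
    return base

# On the full tree above, conditional_pattern_base("I5", header) should yield
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)], matching the prefix paths on the slide.
```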

Construction example (min_support = 0.5, i.e. a count of 3 out of 5 transactions):

TID   Items bought                 (ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o}              {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}

Steps:
1. Scan DB once, find the frequent 1-itemsets (single item patterns)
2. Order the frequent items in frequency descending order
3. Scan DB again, construct the FP-tree

Header table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3

Resulting FP-tree:

{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
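Building the tree for this dataset with the earlier sketch reproduces the node counts in the figure; the ordered transactions are taken from the table above (illustrative code):

```python
ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = FPNode(None), {}
for t in ordered:
    insert_transaction(root, t, header)

# spot-check a few counts against the figure
print(root.children["f"].count)                               # 4
print(root.children["f"].children["c"].count)                 # 3
print(root.children["c"].children["b"].children["p"].count)   # 1
```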

FP Growth Algorithm: FP Tree Mining

1) Construct a conditional pattern base for each frequent item in the FP-tree
2) Construct a conditional FP-tree from each conditional pattern base
3) Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far
▪ If the conditional FP-tree contains a single path, simply enumerate all the patterns
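The three steps above can be put together as a short recursive routine. This is a compact sketch that assumes the FPNode, insert_transaction and conditional_pattern_base helpers sketched earlier; it favors readability over efficiency (for instance, weighted paths are inserted one copy at a time) and is not the lecture's reference implementation:

```python
from collections import Counter

def fp_growth(header, min_count, suffix=()):
    """Yield (frequent_itemset, support) pairs mined from the tree behind `header`."""
    for item in list(header):
        # support of `item` = sum of counts along its node-link chain
        support, node = 0, header[item]
        while node is not None:
            support += node.count
            node = node.node_link
        if support < min_count:
            continue
        pattern = suffix + (item,)
        yield pattern, support

        # 1) conditional pattern base, 2) conditional FP-tree, 3) recurse
        base = conditional_pattern_base(item, header)
        item_counts = Counter()
        for path, count in base:
            for i in path:
                item_counts[i] += count
        cond_root, cond_header = FPNode(None), {}
        for path, count in base:
            kept = [i for i in path if item_counts[i] >= min_count]
            for _ in range(count):          # insert the path once per supporting transaction
                insert_transaction(cond_root, kept, cond_header)
        yield from fp_growth(cond_header, min_count, pattern)
```

Running fp_growth(header, 2) on the nine-transaction tree built earlier should, among other patterns, reproduce {I5}:2, {I2,I5}:2, {I1,I5}:2 and {I2,I1,I5}:2 from the I5 example above.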

} Starting at the frequent-item header table of the FP-tree
} Traverse the FP-tree by following the node-links of each frequent item
} Accumulate all of the transformed prefix paths of that item to form a conditional pattern base

Conditional pattern bases (for the FP-tree above):

item   conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

} For each pattern base
} Accumulate the count for each item in the base
} Construct the FP-tree for the frequent items of the pattern base

m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree (b is dropped, since its count of 1 is below the minimum support):

{}
  f:3
    c:3
      a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
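When a conditional FP-tree reduces to a single path, as the m-conditional tree does here, the frequent patterns are just the combinations of the items on that path appended to the suffix. A small illustration (the function name and the use of itertools are my own choices):

```python
from itertools import combinations

def single_path_patterns(path, suffix):
    """Enumerate the patterns generated by a single-path conditional FP-tree.

    `path` is a list of (item, count) pairs from root to leaf; `suffix` is the
    conditioning pattern (here ('m',)).
    """
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(i for i, _ in combo) + suffix
            support = min(c for _, c in combo)   # support = smallest count among chosen nodes
            yield items, support

for pattern, s in single_path_patterns([("f", 3), ("c", 3), ("a", 3)], ("m",)):
    print(pattern, s)
# ('f','m') 3, ('c','m') 3, ('a','m') 3, ('f','c','m') 3, ('f','a','m') 3,
# ('c','a','m') 3, ('f','c','a','m') 3
```

The pattern m itself is reported one recursion level up, before the conditional tree is built.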
Example

Transaction   Items
100           Bread, Cheese, Eggs, Juice
200           Bread, Cheese, Juice
300           Bread, Milk, Yogurt
400           Bread, Juice, Milk
500           Cheese, Juice, Milk

Item frequencies, sorted in descending order:

Item     Frequency
Bread    4
Juice    4
Cheese   3
Milk     3
Removing the non-frequent items (Eggs, Yogurt) and reordering:

Transaction   Items
100           Bread, Juice, Cheese
200           Bread, Juice, Cheese
300           Bread, Milk
400           Bread, Juice, Milk
500           Juice, Cheese, Milk

FP-Tree

Built from the reordered transactions above (header table items: B, J, C, M):

NULL
  B:4
    J:3
      C:2
      M:1
    M:1
  J:1
    C:1
      M:1
Mining FP-tree for frequent items
} For any frequent item A, all the frequent itemsets containing A can be obtained by following A's node-links, starting from A's head in the FP-tree header table

} The mining of the FP-tree structure is done using an algorithm called frequent pattern growth (FP-Growth)

} This algorithm starts with the least frequent item, that is, the last item in the header table
FP Growth
} We start with the item M and, following M's node-links in the tree above, find the following patterns:
} BM(1)
} BJM(1)
} JCM(1)
FP Growth
} Next we look at C and find the following:
} BJC(2)
} JC(1)
} Together these give us a frequent itemset JC(3)
FP Growth
} Looking at J, the next frequent item in the header table, we obtain:
} BJ(3)
} J(1)
} We obtain a frequent itemset BJ(3)
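These counts can be cross-checked against the original five transactions without building a tree at all; a quick brute-force check (illustrative names):

```python
transactions = [
    {"Bread", "Cheese", "Eggs", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

def support(itemset):
    """Count the transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

print(support({"Juice", "Cheese"}))  # 3, matching JC(3)
print(support({"Bread", "Juice"}))   # 3, matching BJ(3)
```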
"Conditional" trees for M

[Figure: the same FP-tree with the three paths ending in M highlighted; their prefixes B:1, BJ:1 and JC:1 form M's conditional pattern base.]
FP-growth vs. Apriori: Scalability With the Support Threshold

[Chart: run time (sec.) versus support threshold (%) on data set T25I20D10K, comparing D1 FP-growth runtime against D1 Apriori runtime.]
