
Association Rule Mining

FP Growth

Data Mining - Ayesha Khan November 3, 2015



FP Growth Algorithm

} One problematic aspect of Apriori is candidate generation
} Candidate generation is the source of the algorithm's exponential growth
} Idea: Compress the database into a frequent-pattern tree representing the frequent items

FP Growth Algorithm
Divide-and-conquer strategy as follows:
} Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
} highly condensed, but complete for frequent pattern mining
} avoids costly repeated database scans
} Then, divide such a compressed database into a set of conditional databases (a special kind of projected database)
} And, mine each such database separately

FP Growth Algorithm: Tree Construction

} Initially, scan the database for frequent 1-itemsets
} Place the resulting set in a list L in descending order of frequency (support)
} Construct an FP-tree
} Create a root node labeled null
} Scan the database
} Process the items in each transaction in L order
} From the root, add nodes in the order in which items appear in the transactions
} Link nodes representing the same item across different branches (node-links)
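A minimal Python sketch of these construction steps, assuming each transaction has already been pruned to frequent items and sorted in L order. The FPNode class, the insert_transaction function, and the header dictionary are illustrative names chosen here, not code from the lecture:

```python
class FPNode:
    """One node of the FP-tree: an item label, a count, and links to parent/children."""
    def __init__(self, item, parent=None):
        self.item = item          # item label (None for the null root)
        self.count = 0            # number of transactions sharing this prefix
        self.parent = parent      # link toward the root, used later for prefix paths
        self.children = {}        # item label -> child FPNode
        self.node_link = None     # next node carrying the same item (header-table chain)


def insert_transaction(root, items, header):
    """Insert one transaction (already in L order) into the tree rooted at `root`."""
    node = root
    for item in items:
        child = node.children.get(item)
        if child is None:
            child = FPNode(item, parent=node)
            node.children[item] = child
            # thread the new node onto the header table's node-link chain
            if item not in header:
                header[item] = child
            else:
                tail = header[item]
                while tail.node_link is not None:
                    tail = tail.node_link
                tail.node_link = child
        child.count += 1          # shared prefixes just increment existing counts
        node = child
```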

FP Growth Algorithm: Tree Construction

TID   Items
1     I1, I2, I5
2     I2, I4
3     I2, I3, I6
4     I1, I2, I4
5     I1, I3
6     I2, I3
7     I1, I3
8     I1, I2, I3, I5
9     I1, I2, I3

} Minimum support of ~20% (count of 2)
} Frequent 1-itemsets: I1, I2, I3, I4, I5
} Construct list L = {(I2,7), (I1,6), (I3,6), (I4,2), (I5,2)}
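The list L can be reproduced directly from the nine transactions; a small check in Python (variable names are ad hoc):

```python
from collections import Counter

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3", "I6"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
min_count = 2  # ~20% of 9 transactions

# count item supports and keep only the frequent items, sorted by descending support
counts = Counter(item for t in transactions for item in t)
L = sorted(((i, c) for i, c in counts.items() if c >= min_count),
           key=lambda ic: (-ic[1], ic[0]))
print(L)  # [('I2', 7), ('I1', 6), ('I3', 6), ('I4', 2), ('I5', 2)]

# re-order each transaction in L order, dropping infrequent items (I6 disappears)
rank = {item: r for r, (item, _) in enumerate(L)}
ordered = [sorted((i for i in t if i in rank), key=rank.get) for t in transactions]
print(ordered[0])  # ['I2', 'I1', 'I5']
```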

FP Growth Algorithm: Tree Construction

Create the root node (null) and scan the database.

Transaction 1: I1, I2, I5; in L order: I2, I1, I5

Process the transaction: add nodes in item order, label each node with its item and count, and maintain the header table.

After transaction 1:

Header table: I2:1, I1:1, I3:0, I4:0, I5:1

null
  (I2,1)
    (I1,1)
      (I5,1)


FP Growth Algorithm: Tree Construction

After transaction 2 (I2, I4), a new child (I4,1) is added under (I2,2):

Header table: I2:2, I1:1, I3:0, I4:1, I5:1

null
  (I2,2)
    (I1,1)
      (I5,1)
    (I4,1)

FP Growth Algorithm: FP Tree Mining

Complete FP-tree for the nine transactions, with header table I2:7, I1:6, I3:6, I4:2, I5:2:

null
  (I2,7)
    (I1,4)
      (I5,1)
      (I4,1)
      (I3,2)
        (I5,1)
    (I4,1)
    (I3,2)
  (I1,2)
    (I3,2)

Mining I5:
} Prefix paths: (I2 I1, 1), (I2 I1 I3, 1)
} Conditional FP-tree: null -> (I2,2) -> (I1,2)
} Frequent patterns: (I2 I1 I5, 2), (I2 I5, 2), (I1 I5, 2)
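The prefix paths for I5 can be collected by following I5's node-link chain from the header table and walking each node's parent pointers up to the root. A sketch, again assuming the FPNode structure from earlier:

```python
def conditional_pattern_base(item, header):
    """Return the (prefix_path, count) pairs for every node carrying `item`."""
    base = []
    node = header.get(item)
    while node is not None:                  # follow the node-link chain
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((path[::-1], node.count))
        node = node.node_link
    return base

# On the full tree above, conditional_pattern_base("I5", header) should yield
# [(['I2', 'I1'], 1), (['I2', 'I1', 'I3'], 1)], matching the prefix paths on the slide.
```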

Construction example (min_support = 0.5, i.e. a count of 3 out of 5 transactions):

TID   Items bought                 (ordered) frequent items
100   {f, a, c, d, g, i, m, p}     {f, c, a, m, p}
200   {a, b, c, f, l, m, o}        {f, c, a, b, m}
300   {b, f, h, j, o}              {f, b}
400   {b, c, k, s, p}              {c, b, p}
500   {a, f, c, e, l, p, m, n}     {f, c, a, m, p}

Steps:
1. Scan DB once, find the frequent 1-itemsets (single item patterns)
2. Order the frequent items in frequency descending order
3. Scan DB again, construct the FP-tree

Header table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3

Resulting FP-tree:

{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
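Building the tree for this dataset with the earlier sketch reproduces the node counts in the figure; the ordered transactions are taken from the table above (illustrative code):

```python
ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = FPNode(None), {}
for t in ordered:
    insert_transaction(root, t, header)

# spot-check a few counts against the figure
print(root.children["f"].count)                               # 4
print(root.children["f"].children["c"].count)                 # 3
print(root.children["c"].children["b"].children["p"].count)   # 1
```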

FP Growth Algorithm: FP Tree Mining

1) Construct a conditional pattern base for each frequent item in the FP-tree
2) Construct a conditional FP-tree from each conditional pattern base
3) Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far
▪ If the conditional FP-tree contains a single path, simply enumerate all the patterns
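The three steps above can be put together as a short recursive routine. This is a compact sketch that assumes the FPNode, insert_transaction and conditional_pattern_base helpers sketched earlier; it favors readability over efficiency (for instance, weighted paths are inserted one copy at a time) and is not the lecture's reference implementation:

```python
from collections import Counter

def fp_growth(header, min_count, suffix=()):
    """Yield (frequent_itemset, support) pairs mined from the tree behind `header`."""
    for item in list(header):
        # support of `item` = sum of counts along its node-link chain
        support, node = 0, header[item]
        while node is not None:
            support += node.count
            node = node.node_link
        if support < min_count:
            continue
        pattern = suffix + (item,)
        yield pattern, support

        # 1) conditional pattern base, 2) conditional FP-tree, 3) recurse
        base = conditional_pattern_base(item, header)
        item_counts = Counter()
        for path, count in base:
            for i in path:
                item_counts[i] += count
        cond_root, cond_header = FPNode(None), {}
        for path, count in base:
            kept = [i for i in path if item_counts[i] >= min_count]
            for _ in range(count):          # insert the path once per supporting transaction
                insert_transaction(cond_root, kept, cond_header)
        yield from fp_growth(cond_header, min_count, pattern)
```

Running fp_growth(header, 2) on the nine-transaction tree built earlier should, among other patterns, reproduce {I5}:2, {I2,I5}:2, {I1,I5}:2 and {I2,I1,I5}:2 from the I5 example above.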

} Starting at the frequent-item header table of the FP-tree
} Traverse the FP-tree by following the node-links of each frequent item
} Accumulate all of the transformed prefix paths of that item to form a conditional pattern base

Conditional pattern bases (for the FP-tree above):

item   conditional pattern base
c      f:3
a      fc:3
b      fca:1, f:1, c:1
m      fca:2, fcab:1
p      fcam:2, cb:1

} For each pattern base
} Accumulate the count for each item in the base
} Construct the FP-tree for the frequent items of the pattern base

m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree (b is dropped, since its count of 1 is below the minimum support):

{}
  f:3
    c:3
      a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
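When a conditional FP-tree reduces to a single path, as the m-conditional tree does here, the frequent patterns are just the combinations of the items on that path appended to the suffix. A small illustration (the function name and the use of itertools are my own choices):

```python
from itertools import combinations

def single_path_patterns(path, suffix):
    """Enumerate the patterns generated by a single-path conditional FP-tree.

    `path` is a list of (item, count) pairs from root to leaf; `suffix` is the
    conditioning pattern (here ('m',)).
    """
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = tuple(i for i, _ in combo) + suffix
            support = min(c for _, c in combo)   # support = smallest count among chosen nodes
            yield items, support

for pattern, s in single_path_patterns([("f", 3), ("c", 3), ("a", 3)], ("m",)):
    print(pattern, s)
# ('f','m') 3, ('c','m') 3, ('a','m') 3, ('f','c','m') 3, ('f','a','m') 3,
# ('c','a','m') 3, ('f','c','a','m') 3
```

The pattern m itself is reported one recursion level up, before the conditional tree is built.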
Example

Transaction   Items
100           Bread, Cheese, Eggs, Juice
200           Bread, Cheese, Juice
300           Bread, Milk, Yogurt
400           Bread, Juice, Milk
500           Cheese, Juice, Milk

Item frequencies, sorted in descending order:

Item     Frequency
Bread    4
Juice    4
Cheese   3
Milk     3
Removing the non-frequent items (Eggs, Yogurt) and reordering:

Transaction   Items
100           Bread, Juice, Cheese
200           Bread, Juice, Cheese
300           Bread, Milk
400           Bread, Juice, Milk
500           Juice, Cheese, Milk

FP-Tree

Built from the reordered transactions above (header table items: B, J, C, M):

NULL
  B:4
    J:3
      C:2
      M:1
    M:1
  J:1
    C:1
      M:1
Mining FP-tree for frequent items
} For any frequent item A, all the frequent itemsets containing A can be obtained by following A's node-links, starting from A's head in the FP-tree header table

} The mining of the FP-tree structure is done using an algorithm called frequent pattern growth (FP-Growth)

} This algorithm starts with the least frequent item, that is, the last item in the header table
FP Growth
} We start with the item M and, following M's node-links in the tree above, find the following patterns:
} BM(1)
} BJM(1)
} JCM(1)
FP Growth
} Next we look at C and find the following:
} BJC(2)
} JC(1)
} Together these give us a frequent itemset JC(3)
FP Growth
} Looking at J, the next frequent item in the header table, we obtain:
} BJ(3)
} J(1)
} We obtain a frequent itemset BJ(3)
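These counts can be cross-checked against the original five transactions without building a tree at all; a quick brute-force check (illustrative names):

```python
transactions = [
    {"Bread", "Cheese", "Eggs", "Juice"},
    {"Bread", "Cheese", "Juice"},
    {"Bread", "Milk", "Yogurt"},
    {"Bread", "Juice", "Milk"},
    {"Cheese", "Juice", "Milk"},
]

def support(itemset):
    """Count the transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

print(support({"Juice", "Cheese"}))  # 3, matching JC(3)
print(support({"Bread", "Juice"}))   # 3, matching BJ(3)
```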
"Conditional" trees for M

[Figure: the same FP-tree with the three paths ending in M highlighted; their prefixes B:1, BJ:1 and JC:1 form M's conditional pattern base.]
FP-growth vs. Apriori: Scalability With the Support Threshold

[Chart: run time (sec.) versus support threshold (%) on data set T25I20D10K, comparing D1 FP-growth runtime against D1 Apriori runtime.]
