
FP Growth

Diana Popa
Computer Science Department, Faculty of Automatic Control and Computers
University Politehnica of Bucharest
Bucharest, Romania
diana.popa@cti.pub.ro

April 8, 2016

D. Popa
FP Growth

UPB


What is FP Growth?
- Frequent Pattern Growth
- an efficient and scalable method for mining frequent patterns in a database
- finding inherent regularities in data:
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?

Motivation
- improving the Apriori-like algorithms
- getting rid of the candidate set 'generation-and-test' approach
- avoiding huge candidate set generation
- avoiding repeated scans of the database
- BUT HOW...?

Simply a 2-step algorithm
- Step 1: Construct a compact data structure, called the FP-tree
  - smart way of arranging the nodes
  - the database is scanned only twice
- Step 2: Extract frequent itemsets from the FP-tree
  - divide-and-conquer traversal approach

Step 1: FP-Tree Construction (Example)
- the FP-Tree is constructed using 2 passes over the data set
- Pass 1:
  - scan the data and find the support for each item
  - discard infrequent items
  - sort frequent items in decreasing order of their support
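Pass 1 can be sketched in a few lines. This is a minimal illustration, not the author's code: the function name `first_pass`, the toy transactions, and the alphabetical tie-break between items of equal support are all assumptions made for the example.

```python
from collections import Counter

def first_pass(transactions, min_support):
    """Count item supports, drop infrequent items, order the rest."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))  # each item counts once per transaction
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    # Decreasing support; ties broken alphabetically (an assumption).
    order = sorted(frequent, key=lambda i: (-frequent[i], i))
    return frequent, order

# Hypothetical toy data set, used only for illustration.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
frequent, order = first_pass(transactions, min_support=3)
print(order)
```

With a minimum support of 3, item e (support 2) is discarded and the remaining items are ordered with a first, since it has the highest support.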

Step 1: FP-Tree Construction (Example)
- Pass 2:
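As a sketch of what pass 2 typically does, assuming the standard construction described by Han et al.: each transaction is reduced to its frequent items, sorted in the pass-1 order, and inserted as a path from the root, so that shared prefixes reuse nodes and only increment their counts. The names `FPNode`, `build_fp_tree`, and `header` are hypothetical, and the toy data is the same illustrative set of five transactions.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}   # item -> FPNode
        self.link = None     # next node carrying the same item

def build_fp_tree(transactions, min_support):
    # Pass 1: supports and the global item order.
    counts = Counter(i for t in transactions for i in set(t))
    ordered = sorted((i for i in counts if counts[i] >= min_support),
                     key=lambda i: (-counts[i], i))
    rank = {item: r for r, item in enumerate(ordered)}

    # Pass 2: insert each filtered, sorted transaction as a path.
    root = FPNode(None, None)
    header = {}  # item -> head of that item's linked list of nodes
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.link = header.get(item)  # prepend to the item's list
                header[item] = child
            child.count += 1
            node = child
    return root, header

# Hypothetical toy data set, used only for illustration.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
root, header = build_fp_tree(transactions, min_support=3)

# Following the linked list for an item recovers its total support.
support_d, node = 0, header["d"]
while node is not None:
    support_d += node.count
    node = node.link
print(root.children["a"].count, support_d)
```

The per-item linked lists are what make it cheap to find every path ending in a given item during the mining step.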

Size of the FP-Tree
- the FP-Tree usually has a smaller size than the uncompressed data
- best case scenario: all transactions contain the same set of items
- worst case scenario: every transaction has a unique set of items (no items in common)
- the size of the FP-tree depends on how the items are ordered
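The three size claims can be checked on tiny inputs. The sketch below counts the nodes of a prefix tree built under a given item order; `tree_size` and all the data are hypothetical, and counts and node links are left out because only the number of nodes matters here.

```python
def tree_size(transactions, order):
    """Number of nodes in the prefix tree built with the given item order."""
    rank = {item: r for r, item in enumerate(order)}
    root, nodes = {}, 0
    for t in transactions:
        node = root
        for item in sorted(set(t), key=rank.__getitem__):
            if item not in node:
                node[item] = {}
                nodes += 1
            node = node[item]
    return nodes

# Best case: identical transactions collapse into a single path.
best = tree_size([["a", "b", "c"]] * 4, ["a", "b", "c"])

# Worst case: no shared items, so every item gets its own node.
worst = tree_size([["a", "b"], ["c", "d"], ["e", "f"]],
                  ["a", "b", "c", "d", "e", "f"])

# Ordering: placing the most frequent item first lets prefixes be shared.
data = [["a", "b"], ["a", "c"], ["a", "d"]]
frequent_first = tree_size(data, ["a", "b", "c", "d"])
frequent_last = tree_size(data, ["b", "c", "d", "a"])
print(best, worst, frequent_first, frequent_last)
```

In the last comparison the same three transactions need 4 nodes when a comes first and 6 when it comes last, which is why pass 1 sorts items by decreasing support.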

Step 2: Frequent Itemset Generation
- bottom-up algorithm
- starts by extracting prefix path sub-trees ending in an item(set) (hint: use the linked lists)
- each prefix path sub-tree is processed recursively to extract the frequent itemsets
- solutions are then merged
- e.g. the prefix path sub-tree for e will be used to extract frequent itemsets ending in e, then in de, ce, be, ae, then in cde, bde, etc.
- divide and conquer approach
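The recursion can be sketched compactly. This is an illustrative sketch under stated assumptions, not the author's implementation: `Node`, `build`, and `fp_growth` are hypothetical names, a plain Python list per item stands in for the linked lists, and each prefix path sub-tree is rebuilt from its list of (prefix path, count) pairs instead of being pruned in place.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build(weighted, min_support):
    """Build an FP-tree from (items, count) pairs; return node lists and supports."""
    support = defaultdict(int)
    for items, n in weighted:
        for i in set(items):
            support[i] += n
    keep = {i for i, s in support.items() if s >= min_support}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        for i in sorted(set(items) & keep, key=lambda i: (-support[i], i)):
            if i not in node.children:
                node.children[i] = Node(i, node)
                header[i].append(node.children[i])
            node = node.children[i]
            node.count += n
    return header, {i: support[i] for i in keep}

def fp_growth(weighted, min_support, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    header, support = build(weighted, min_support)
    result = {}
    # Bottom-up: start from the least frequent item.
    for item in sorted(support, key=lambda i: (support[i], i)):
        itemset = (item,) + suffix
        result[frozenset(itemset)] = support[item]
        # Collect the prefix paths that end in this item.
        base = []
        for node in header[item]:
            path, p = [], node.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            if path:
                base.append((path, node.count))
        # Recurse on the conditional data and merge the solutions.
        result.update(fp_growth(base, min_support, itemset))
    return result

# Hypothetical toy data set, used only for illustration.
transactions = [
    ["a", "b"],
    ["b", "c", "d"],
    ["a", "c", "d", "e"],
    ["a", "d", "e"],
    ["a", "b", "c"],
]
result = fp_growth([(t, 1) for t in transactions], min_support=2)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), s)
```

On this data, processing e first yields the itemsets e, ae, de and ade, which mirrors the order described on the slide: itemsets ending in e, then in de and ae, and so on.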

FP-Growth vs. Apriori

FP-Growth vs. TreeProjection

Conclusions
- performance results prove efficiency and scalability
- uses the novel FP-Tree structure to compress data sets
- costly algorithm in terms of memory
- the FP-Tree's creation is expensive
- parallelization is possible: PFP

References
- Han, Jiawei, Jian Pei, and Yiwen Yin. "Mining frequent patterns without candidate generation." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000.
- Han, Jiawei, et al. "Mining frequent patterns without candidate generation: A frequent-pattern tree approach." Data Mining and Knowledge Discovery 8.1 (2004): 53-87.