
FP Growth

Diana Popa
Computer Science Department, Faculty of Automatic Control and Computers
University Politehnica of Bucharest
Bucharest, Romania

April 8, 2016

D. Popa
FP Growth



What is FP Growth?
  Frequent Pattern Growth
  an efficient and scalable method for mining frequent patterns in a database
  finding inherent regularities in data:
    What products were often purchased together? Beer and diapers?!
    What are the subsequent purchases after buying a PC?
    What kinds of DNA are sensitive to this new drug?
    Can we automatically classify web documents?

Motivation
  improving on Apriori-like algorithms
  getting rid of the candidate set "generation-and-test" approach
  avoiding huge candidate set generation
  avoiding repeated scans of the database
  BUT HOW...?

Simply a 2-step algorithm
  Step 1: Construct a compact data structure, called the FP-tree
    smart way of arranging the nodes
    the database is scanned only twice
  Step 2: Extract frequent itemsets from the FP-tree
    divide-and-conquer traversal approach

Step 1: FP-Tree Construction (Example)
  The FP-tree is constructed using 2 passes over the data set
  Pass 1:
    scan the data and find the support of each item
    discard infrequent items
    sort the frequent items in decreasing order of support
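
The three Pass 1 operations can be sketched in a few lines of Python. This is a minimal illustration, not code from the slides: the function name `first_pass`, the toy transactions, the absolute `min_support` threshold and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import Counter

def first_pass(transactions, min_support):
    """Pass 1: count item supports, drop infrequent items, fix the item order."""
    # Support of an item = number of transactions that contain it.
    counts = Counter(item for t in transactions for item in set(t))
    # Discard items below the (assumed absolute) support threshold.
    frequent = {item: c for item, c in counts.items() if c >= min_support}
    # Decreasing support; ties broken alphabetically (an arbitrary choice).
    order = sorted(frequent, key=lambda item: (-frequent[item], item))
    rank = {item: r for r, item in enumerate(order)}
    # Rewrite each transaction: frequent items only, in the global order.
    ordered = [sorted((i for i in set(t) if i in rank), key=rank.__getitem__)
               for t in transactions]
    return frequent, order, ordered

# Hypothetical toy data set, not the one shown on the slide.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
frequent, order, ordered = first_pass(transactions, min_support=3)
print(order)    # item order used when inserting into the tree
print(ordered)  # transactions ready for Pass 2
```

With this toy input, item `e` occurs in only two transactions and is dropped before the tree is built.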

Step 1: FP-Tree Construction (Example)
  Pass 2:
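
The details of Pass 2 did not survive on this slide. The following is a minimal sketch that assumes the standard construction described in the referenced papers: each transaction, reduced to its frequent items and sorted in the Pass 1 order, is inserted as a path from the root, shared prefixes reuse existing nodes, and nodes holding the same item are chained into a linked list. The names `FPNode`, `build_fp_tree`, `header` and `count_nodes`, and the toy data, are hypothetical.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # None for the root
        self.count = 0          # transactions passing through this node
        self.parent = parent
        self.children = {}      # item -> child FPNode
        self.next = None        # next node holding the same item

def build_fp_tree(transactions, min_support):
    # Pass 1: supports and global item order (decreasing support).
    counts = Counter(i for t in transactions for i in set(t))
    kept = [i for i in counts if counts[i] >= min_support]
    rank = {i: r for r, i in enumerate(sorted(kept, key=lambda i: (-counts[i], i)))}
    root = FPNode(None, None)
    header = {}                 # item -> head of that item's linked list
    # Pass 2: insert every transaction as a path, sharing common prefixes.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.__getitem__):
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                child.next = header.get(item)   # prepend to the item's list
                header[item] = child
            child.count += 1
            node = child
    return root, header

def count_nodes(node):
    """Number of nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(c) for c in node.children.values())

# Hypothetical toy data set, not the one shown on the slide.
transactions = [["a", "b"], ["b", "c", "d"], ["a", "c", "d", "e"],
                ["a", "d", "e"], ["a", "b", "c"]]
root, header = build_fp_tree(transactions, min_support=3)
print(count_nodes(root), root.children["a"].count)
```

Following `header[item]` and then the `next` pointers visits every node that holds `item`, which is what the frequent itemset generation step relies on.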

Size of the FP-Tree
  The FP-tree usually has a smaller size than the uncompressed data
  Best case scenario: all transactions contain the same set of items
  Worst case scenario: every transaction has a unique set of items (no items in common)
  The size of the FP-tree depends on how the items are ordered
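
These size claims can be checked on small inputs. The sketch below relies on the fact that a prefix tree has exactly one node per distinct non-empty prefix of its inserted sequences, so no tree needs to be built. The helper `fp_tree_node_count` and the toy inputs are hypothetical, and the transactions are assumed to be already filtered and ordered.

```python
def fp_tree_node_count(ordered_transactions):
    """Count FP-tree nodes (root excluded) for already-ordered transactions.

    Each distinct non-empty prefix corresponds to exactly one tree node.
    """
    prefixes = set()
    for items in ordered_transactions:
        for k in range(1, len(items) + 1):
            prefixes.add(tuple(items[:k]))
    return len(prefixes)

# Best case: identical transactions collapse into a single path.
best = fp_tree_node_count([["a", "b", "c"]] * 4)
# Worst case: no items in common, so nothing is shared.
worst = fp_tree_node_count([["a", "b"], ["c", "d"], ["e", "f"]])
# Same transactions, two item orders: the shared item first vs. last.
shared_first = fp_tree_node_count([["a", "b"], ["a", "c"], ["a", "d"]])
shared_last = fp_tree_node_count([["b", "a"], ["c", "a"], ["d", "a"]])
print(best, worst, shared_first, shared_last)
```

Placing the most frequent item first lets the three transactions share one node for `a`, while placing it last gives each transaction its own copy.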

Step 2: Frequent Itemset Generation
  bottom-up algorithm
  starts by extracting the prefix path sub-trees ending in an item(set) (hint: use the linked lists)
  each prefix path sub-tree is processed recursively to extract the frequent itemsets
  solutions are then merged
  e.g. the prefix path sub-tree for e will be used to extract frequent itemsets ending in e, then in de, ce, be, ae, then in cde, bde, etc.
  divide and conquer approach
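
The recursion can be sketched as follows. This is a compact illustration under stated assumptions, not the author's implementation: instead of pruning a prefix path sub-tree in place, it collects the prefix paths ending in an item and rebuilds a small conditional tree from them, and a dictionary of node lists (`links`) stands in for the linked lists. All names, the data set and the absolute `min_support` threshold are hypothetical.

```python
from collections import Counter, defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_tree(weighted_transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return node lists and supports."""
    counts = Counter()
    for items, n in weighted_transactions:
        for item in set(items):
            counts[item] += n
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    rank = {i: r for r, i in
            enumerate(sorted(frequent, key=lambda i: (-frequent[i], i)))}
    root = FPNode(None, None)
    links = defaultdict(list)  # item -> nodes holding it (stand-in for linked lists)
    for items, n in weighted_transactions:
        node = root
        for item in sorted((i for i in set(items) if i in rank),
                           key=rank.__getitem__):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                links[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return links, frequent

def fp_growth(weighted_transactions, min_support, suffix=()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    links, frequent = build_tree(weighted_transactions, min_support)
    result = {}
    # Bottom-up: handle the least frequent item first.
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        itemset = (item,) + suffix
        result[frozenset(itemset)] = frequent[item]
        # Collect the prefix paths that end in `item`, weighted by node count.
        conditional = []
        for node in links[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        # Recurse on the conditional data, then merge the solutions.
        result.update(fp_growth(conditional, min_support, itemset))
    return result

# Hypothetical toy data set over items a..e.
data = ["ab", "bcd", "acde", "ade", "abc", "abcd", "a", "abc", "abd", "bce"]
itemsets = fp_growth([(list(t), 1) for t in data], min_support=2)
for s in sorted(itemsets, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(s)), itemsets[s])
```

On this input the itemsets ending in `e` are found first (`e`, then `ae`, `ce`, `de`, then `ade`), mirroring the order described above; `be` is absent because it occurs in only one transaction.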

FP-Growth vs. Apriori

FP-Growth vs. TreeProjection

Conclusions
  performance results show efficiency and scalability
  uses the novel FP-tree structure to compress data sets
  costly algorithm in terms of memory
  the FP-tree's creation is expensive
  parallelization is possible: PFP (Parallel FP-Growth)

References
  Han, Jiawei, Jian Pei, and Yiwen Yin. "Mining frequent patterns without candidate generation." ACM SIGMOD Record. Vol. 29. No. 2. ACM, 2000.
  Han, Jiawei, et al. "Mining frequent patterns without candidate generation: A frequent-pattern tree approach." Data Mining and Knowledge Discovery 8.1 (2004): 53-87.