You are on page 1of 18

Association Rule

Dengan FP-Tree dan FP Growth

Sistem Informasi FT – UPI YAI

Jesa Ariawan, MTI

Latar Belakang • Algoritma Association Rule dengan Apriori kurang baik bila terdapat banyak pola kombinasi data yang sering muncul (banyak frequent pattern). • Algoritma Association Rule dengan Apriori membutuhkan waktu yang cukup lama karena scanning database dapat dilakukan berulang-ulang untuk mendapatkan frequent pattern yang ideal. • FP-Tree (Frequent Pattern – Tree) merupakan suatu algoritma yang dirancang untuk mengatasi kendala bottleneck pada proses penggalian data dengan algoritma Apriori (Zhao et al. banyak jenis item tetapi pemenuhan minimum support rendah. 2003). .

. • Tanpa memerlukan candidate generation • Dilanjutkan dengan proses algoritma FP-growth yang dapat langsung mengekstrak frequent Itemset dari FP tree yang telah terbentuk dengan menggunakan prinsip divide and conquer.Ide gagasan FP-Tree • Pemampatan data dengan model struktur data pohon untuk menghindari pengulangan scanning database.

dan pointer penghubung yang menghubungkan node-node dengan label item sama antar-lintasan. • Setiap node dalam FP-Tree mengandung tiga informasi penting. support count. ditandai dengan garis panah putus-putus.Definisi Frequent Pattern – Tree (FP – Tree) • Terdiri atas sebuah root dengan label ‘null’. yaitu label item. sekumpulan subtree yang menjadi child dari root dan sebuah tabel frequent header. . • Isi dari setiap baris dalam tabel frequent header terdiri atas label item dan head of nodelink yang menunjuk ke node pertama dalam FP-Tree yang menyimpan label item tersebut. menginformasikan jenis item yang direpresentasikan node tersebut. merepresentasikan jumlah lintasan transaksi yang melalui node tesebut.

Proses untuk mendapatkan Frequent Itemset dilakukan dalam 2 tahap • Pembentukan Frequent Pattern – Tree (FP-Tree) • Ekstrak Frequent Itemset hasil dari FP-Tree dengan menggunakan algoritma FP-Growth .

2. Jika bukan lintasan tunggal.Algoritma FP Growth dibagi dalam 3 tugas utama 1. Pembangkitan conditional pattern base didapatkan melalui FP-tree yang telah dibangun sebelumnya. maka dilakukan pembangkitan FP-growth secara rekursif. Tahap Pembangkitan Conditional FP-tree Pada tahap ini. 3. support count dari setiap item pada setiap conditional pattern base dijumlahkan. Tahap Pencarian frequent itemset Apabila Conditional FP-tree merupakan lintasan tunggal (single path). maka didapatkan frequent itemset dengan melakukan kombinasi item untuk setiap conditional FP-tree. lalu setiap item yang memiliki jumlah support count lebih besar sama dengan minimum support count akan dibangkitkan dengan conditional FP-tree. . Tahap Pembangkitan Conditional Pattern Base Conditional Pattern Base merupakan subdatabase yang berisi prefix path (lintasan prefix) dan suffix pattern (pola akhiran).

h a.d.c.r a.b.b. 1 2 3 4 5 6 7 8 9 10 Transaksi a.e.c a.b.c.c a.b.e a.b b.e .f a.d b.c.d.d.Contoh Proses Asosiasi dengan FP-Tree dan FP-Growth • Contoh Tabel Transaksai dengan minimum support count 2 No.z.c.g.d a.

Contoh Proses Asosiasi dengan FP-Tree dan FP-Growth • Frekuensi kemunculan tiap item dapat dilihat pada tabel berikut Item a b c d e f r z g h Frekuensi 8 7 6 5 3 1 1 1 1 1 .

c.e} {a.d} {a} {a.c} {a. diurut berdasarkan yang frekuensinya paling tinggi. TID 1 2 3 4 5 6 7 8 9 10 Item {a.b} {b.b.d.c.e} .b.c} {a.d} {b.Contoh Proses Asosiasi dengan FP-Tree dan FP-Growth • Kemunculan item yang frequent dalam setiap transaksi.d} {a.c.c.b.b.d.e} {a.

Most gures derived from [1]. Australia Copyright 2008 Florian Verhein.Frequent Pattern Growth (FP-Growth) Algorithm An Introduction Florian Verhein fverhein@it.edu. January 10.au School of Information Technologies. 2008 Outline Introduction FP-Tree data structure Step 1: FP-Tree Construction Step 2: Frequent Itemset Generation Discussion .usyd. The University of Sydney.

Introduction uses a generate-and-test approach  generates candidate itemsets and tests if they are frequent I Apriori: I Generation of candidate itemsets is expensive (in both space and time) I Support counting is expensive I I Subset checking (computationally expensive) Multiple Database scans (I/O) allows frequent itemset discovery without candidate itemset generation. counters are incremented I Pointers are maintained between nodes containing the same item. I In this case. creating singly linked lists (dotted lines) I The more paths that overlap. I Frequent itemsets extracted from the FP-Tree. the higher the compression. so paths can overlap when transactions share items (when they have the same prex ). FP-tree may t in memory. I Step 2: Extracts frequent itemsets directly from the FP-tree I Traversal through FP-Tree Core Data Structure: FP-Tree I Nodes correspond to items and have a counter I FP-Growth reads 1 transaction at a time and maps it to a path I Fixed order is used. . Two step approach: I FP-Growth: I Step 1: Build a compact data structure called the I FP-tree Built using 2 passes over the data-set.

d } Create 3 nodes for b . the paths are disjoint as they don't share a common prex. Set counts of a and b to 1. Set counts to 1. Note that although transaction 1 and 2 share b .Step 1: FP-Tree Construction (Example) FP-Tree is constructed using 2 passes over the data-set: I Pass 1: I Scan data and nd support for each item. e Use this order when building the FP-Tree. I Discard infrequent items. d . I Read transaction 2: I I {b . c and d and the path null → b → c → d . Add links between the c 's and d 's. . c . b } {a. Step 1: FP-Tree Construction (Example) I Pass 2: construct the FP-Tree (see diagram on next slide) I Read transaction 1: I Create 2 nodes a and b and the path null → a → b . b . e } It shares common prex item a with transaction 1 so the path for transaction 1 and 3 will overlap and the frequency count for node a will be incremented by 1. c . so common prexes can be shared. Add the link between the b 's. c . I I For our example: a. d . I Read transaction 3: I {a. I Continue until all transactions are mapped to a path in the FP-tree. I Sort frequent items in decreasing order based on their support.

I The size of the FP-tree depends on how the items are ordered I Ordering by decreasing support is typically used but it does not always lead to the smallest tree (it's a heuristic). I I 1 path in the FP-tree Worst case scenario: every transaction has a unique set of items (no items in common) I I Size of the FP-tree is at least as large as the original data. Storage requirements for the FP-tree are higher  need to store the pointers between the nodes and the counters. . I Best case scenario: all transactions contain the same set of items.Step 1: FP-Tree Construction (Example) FP-Tree size I The FP-Tree usually has a smaller size than the uncompressed data  typically many transactions share items (and hence prexes).

etc. be and ae . (hint: use the linked lists) ↑ Complete FP-tree → Example: prex path sub-trees Step 2: Frequent Itemset Generation I Each prex path sub-tree is processed recursively to extract the frequent itemsets.Step 2: Frequent Itemset Generation I FP-Growth extracts frequent itemsets from the FP-tree. etc. then d. then cd . I E. . Solutions are then merged. cde . prex path sub-tree for e will be used to extract frequent itemsets ending in e . . ce . . then de . I Bottom-up algorithm  from the leaves towards the root I Divide and conquer: rst look for frequent itemsets ending in e. . then in cde . bde . extract prex path sub-trees ending in an item(set).g. I First. the I Divide and conquer approach Prex path sub-tree ending in e. then in de . . etc.

de . I I I 1. I To do this. Conditional FP-Tree I The FP-Tree that would be built if we only consider transactions containing a particular itemset (and then removing that itemset from all transactions). Check if e is a frequent item by adding the counts along the linked list (dotted line).e. As e is frequent. i. count =3 so {e } is extracted as a frequent itemset. we must rst to obtain the conditional FP-tree for e. . Obtain the prex path sub-tree for e : 2. If so.e. decompose the problem recursively. be and ae . I i. I Example: FP-Tree conditional on e . 3. extract it. ce .Example Let minSup = 2 and extract all frequent itemsets containing e . I Yes. nd frequent itemsets ending in e .

I b and c should be set to 1 and a to 2.Conditional FP-Tree To obtain the conditional FP-tree for e from the prex sub-tree ending in e : I Update the support counts along the prex paths (from e ) to reect the number of transactions containing e . Conditional FP-Tree To obtain the conditional FP-tree for e from the prex sub-tree ending in e : I Remove the nodes containing e  information about node e is no longer needed because of the previous step .

ce and ae be is for e ..{a.. conditional tree for e. d . Question: why were c and d not removed? Example (continued) I 4.e. Use the the conditional FP-tree for e to nd frequent itemsets ending in de . there is only 1 transaction containing b and e so be is infrequent  can remove b.g. i. etc. generate conditional FP-tree. I Note that FP-tree not considered as I For each of them (e.g. b is not in the conditional nd the prex paths from the extract frequent itemsets. (recursive) I Example: frequent) e → de → ade ({d . e } are found to be . de ).Conditional FP-Tree To obtain the conditional FP-tree for e from the prex sub-tree ending in e : I Remove infrequent items (nodes) from the prex paths b has a support of 1 (note this really means be has a support of 1). I E. e }.

then do the whole thing for b. Use the the conditional FP-tree for e to nd frequent itemsets ending in de . e } is found to be frequent) etc. (ae .. etc) Result I Frequent itemsets found (ordered by sux and order in which they are found): . ce and ae I Example: I e → ce ({c ....Example (continued) I 4..