Apriori Algorithm
Association Rules
• Association rules are created by analyzing data for frequent if/then
patterns and using the criteria support and confidence to identify
the most important relationships.
• Support is an indication of how frequently the items appear in the
database.
• Confidence indicates how often the if/then statements have been
found to be true.
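As a quick illustration, both measures can be computed directly from a list of transactions. The basket data below is a hypothetical example, not taken from this document:

```python
# Hypothetical basket data (not from this document); one set per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"bread", "milk", "butter"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """confidence(A => C) = support(A and C together) / support(A)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # 2/3: milk is in 2 of the 3 bread baskets
```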
Apriori Algorithm
• As is common in association rule mining, given a set of itemsets
(for instance, sets of retail transactions, each listing individual
items purchased), the algorithm attempts to find subsets which are
common to at least a minimum number C of the item sets.
• Apriori uses a "bottom up" approach, where frequent subsets are
extended one item at a time (a step known as candidate
generation), and groups of candidates are tested against the data.
The algorithm terminates when no further successful extensions
are found.
• Apriori uses breadth-first search and a tree structure to count
candidate item sets efficiently. It generates candidate item sets of
length k from item sets of length k − 1. It then prunes the
candidates that have an infrequent subpattern.
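The bottom-up procedure described above (candidate generation, pruning, counting, termination) can be sketched as a minimal, unoptimized illustration:

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Minimal bottom-up Apriori sketch: generate candidates of length k from
    frequent sets of length k - 1, prune candidates with an infrequent subset,
    count the rest against the data, and stop when nothing survives."""
    items = sorted({i for t in transactions for i in t})
    # Frequent single items (L1)
    current = [frozenset([i]) for i in items
               if sum(i in t for t in transactions) >= min_count]
    frequent = list(current)
    k = 2
    while current:
        # Candidate generation: extend frequent (k-1)-sets one item at a time
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Pruning: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Counting: keep candidates that meet the minimum count
        current = [c for c in candidates
                   if sum(c <= t for t in transactions) >= min_count]
        frequent.extend(current)
        k += 1
    return frequent

demo = [{"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}]
print(apriori(demo, 2))  # all singles and all pairs survive; ABC occurs only once
```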
Why Association Rule?
• Suppose you have records of a large number of transactions at a
shopping center.
• Learning association rules basically means finding the items that
are purchased together more frequently than others.
• For example, in such a transaction table you might see that Item1
and Item3 are bought together frequently.
• Find all frequent item sets using the Apriori Algorithm. Let minimum support
(min_sup) = 60% and minimum confidence (min_conf) = 70%.
Step 1 :
Find frequent single items (C1), then keep the single items bought with
minimum 60% support (L1).

C1:
Item  Frequency             Support
M     3                     3/5 = 0.6
O     3 (why 3, not 4?)     3/5 = 0.6
N     2                     2/5 = 0.4
K     5                     5/5 = 1.0
E     4                     4/5 = 0.8
Y     3                     3/5 = 0.6
D     1                     1/5 = 0.2
A     1                     1/5 = 0.2
U     1                     1/5 = 0.2
C     2                     2/5 = 0.4
I     1                     1/5 = 0.2

L1 (items that satisfy 60% minimum support):
Item  Frequency  Support
M     3          3/5 = 0.6
O     3          3/5 = 0.6
K     5          5/5 = 1.0
E     4          4/5 = 0.8
Y     3          3/5 = 0.6
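The counts above are consistent with the classic five-basket example often used for this exercise (transactions MONKEY, DONKEY, MAKE, MUCKY, COOKIE, one letter per item). Assuming that dataset, step 1 can be reproduced as:

```python
from collections import Counter

# Reconstructed five-basket dataset (an assumption, but consistent with every
# count in the tables above); each string is one transaction, one letter per item.
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]

# C1: each item counts once per transaction; O appears twice in "COOKIE"
# but still adds a single count, which is why O's frequency is 3, not 4.
c1 = Counter(item for t in transactions for item in t)

# L1: keep items with support >= min_sup = 60% (at least 3 of 5 baskets)
min_sup = 0.6
l1 = {item: n / len(transactions) for item, n in c1.items()
      if n / len(transactions) >= min_sup}
print(l1)  # M, O, K, E, Y survive
```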
Step 2 :
Find frequent item pairs (C2), then keep the pairs bought with minimum
60% support (L2).

C2:
Item  Frequency  Support
MO    1          1/5 = 0.2
MK    3          3/5 = 0.6
ME    2          2/5 = 0.4
MY    2          2/5 = 0.4
OK    3          3/5 = 0.6
OE    3          3/5 = 0.6
OY    2          2/5 = 0.4
KE    4          4/5 = 0.8
KY    3          3/5 = 0.6
EY    2          2/5 = 0.4

L2 (pairs that satisfy 60% minimum support):
Item  Frequency  Support
MK    3          3/5 = 0.6
OK    3          3/5 = 0.6
OE    3          3/5 = 0.6
KE    4          4/5 = 0.8
KY    3          3/5 = 0.6
We start making pairs from the first item, like MO, MK, ME, MY, and then we start with the
second item, like OK, OE, OY. We did not do OM because we already did MO when we were
making pairs with M: buying a Mango and an Onion together is the same as buying an Onion
and a Mango together. After making all the pairs, we get the table above.
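Assuming the five-basket dataset consistent with all the counts shown above (MONKEY, DONKEY, MAKE, MUCKY, COOKIE), the pair-generation step can be checked in a few lines:

```python
from itertools import combinations

# Reconstructed five-basket dataset (an assumption, consistent with the tables above)
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]
l1_items = ["M", "O", "K", "E", "Y"]  # frequent single items from step 1

# C2: every unordered pair of L1 items (MO and OM are the same pair)
c2 = {frozenset(p): sum(set(p) <= t for t in transactions) / len(transactions)
      for p in combinations(l1_items, 2)}

# L2: pairs that meet 60% support
l2 = {pair: s for pair, s in c2.items() if s >= 0.6}
print(len(c2), len(l2))  # 10 candidate pairs, 5 frequent pairs
```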
Step 3 :
Find three items frequently bought together with minimum 60% support.
• To make the sets of three items we need one more rule (termed a self-join). It simply
means that, from the item pairs in the table above, we find two pairs with the same first
item, so we get
OK and OE, this gives OKE
KE and KY, this gives KEY
C3:
Item  Frequency  Support
OKE   3          3/5 = 0.6
KEY   2          2/5 = 0.4

L3 (items that satisfy 60% minimum support):
Item  Frequency  Support
OKE   3          3/5 = 0.6
When we generate C4 from L3, it turns out to be empty, therefore the process
terminates.
Assume: While we are on this, suppose you have sets of 3 items, say ABC, ABD, ACD, ACE, BCD,
and you want to generate item sets of 4 items: you look for two sets having the same first two
items.
· ABC and ABD -> ABCD
· ACD and ACE -> ACDE
And so on … In general, you have to look for sets in which only the last item differs.
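The prefix-join rule described above can be sketched as follows:

```python
# Sketch of the self-join rule: join two frequent (k-1)-itemsets that share
# everything but their last item (in sorted order) to form a k-itemset.
def self_join(frequent_sets):
    joined = set()
    as_tuples = [tuple(sorted(s)) for s in frequent_sets]
    for a in as_tuples:
        for b in as_tuples:
            if a[:-1] == b[:-1] and a[-1] < b[-1]:
                joined.add(frozenset(a) | frozenset(b))
    return joined

triples = [set("ABC"), set("ABD"), set("ACD"), set("ACE"), set("BCD")]
print(sorted("".join(sorted(s)) for s in self_join(triples)))  # ['ABCD', 'ACDE']
```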
How to calculate confidence ?
• We can take our frequent itemset knowledge even further by finding (deducing)
association rules using the frequent itemsets.
• In simple words, we know that O, K, E are bought together frequently, but what is the
association between them?
• To do this, we list all the subsets of the frequently bought items ({O, K, E} in our case),
and we get the following subsets:
• {O}
• {K}
• {E}
• {O,K}
• {O,E}
• {K,E}
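Enumerating these subsets programmatically, as a simple sketch:

```python
from itertools import combinations

# All proper non-empty subsets of the frequent itemset {O, K, E}
itemset = sorted({"O", "K", "E"})
subsets = [set(c) for r in range(1, len(itemset))
           for c in combinations(itemset, r)]
print(subsets)  # the six subsets listed above: three singles and three pairs
```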
How to calculate confidence ?
confidence(A => C) = support({A, C}) / support({A})
Generate all possible rules from the subsets of {O, K, E}. Note that support is
symmetric: support({K, O}) is the same as support({O, K}) = 3/5.

Rule      Frequency (LHS)     Confidence
O => K    OK = 3, O = 3       3/3 = 1.0 (100%)
O => E    OE = 3, O = 3       3/3 = 1.0 (100%)
O => KE   OKE = 3, O = 3      3/3 = 1.0 (100%)
K => O    KO = 3, K = 5       3/5 = 0.6 (60%)
K => E    KE = 4, K = 5       4/5 = 0.8 (80%)
K => OE   KOE = 3, K = 5      3/5 = 0.6 (60%)
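Assuming the same reconstructed five-basket dataset consistent with the earlier counts, these confidence values can be verified directly (support is symmetric, so support({K, O}) equals support({O, K}) = 3/5):

```python
# Reconstructed five-basket dataset (an assumption consistent with the counts
# in the tables above).
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"), set("MUCKY"), set("COOKIE")]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    # confidence(A => C) = support(A union C) / support(A)
    return support(lhs | rhs) / support(lhs)

print(confidence({"O"}, {"K"}))       # 1.0
print(confidence({"K"}, {"E"}))       # 0.8
print(confidence({"K"}, {"O", "E"}))  # 0.6
```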
Continue :