Association Rule

Apriori Algorithm
Association Rules
• Association rules are created by analyzing data for frequent if/then
patterns and using the criteria of support and confidence to identify
the most important relationships.
• Support indicates how frequently an itemset appears in the
database: the fraction of transactions that contain it.
• Confidence indicates how often the if/then statement holds: of the
transactions containing the "if" part, the fraction that also contain
the "then" part. A small sketch of both measures follows.
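A minimal sketch of the two measures in Python; the transactions and the
bread/milk/diapers/beer items below are made-up illustrations, not data
from this document:

transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
]

def support(itemset, transactions):
    # Fraction of all transactions that contain every item in `itemset`.
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent, transactions):
    # Of the transactions containing the "if" part (antecedent), the
    # fraction that also contain the "then" part (consequent).
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

print(support({"diapers", "beer"}, transactions))       # 0.75
print(confidence({"diapers"}, {"beer"}, transactions))  # 1.0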
Apriori Algorithm
• As is common in association rule mining, given a set of itemsets
(for instance, sets of retail transactions, each listing individual
items purchased), the algorithm attempts to find subsets which are
common to at least a minimum number C of the itemsets.
• Apriori uses a "bottom up" approach, where frequent subsets are
extended one item at a time (a step known as candidate
generation), and groups of candidates are tested against the data.
The algorithm terminates when no further successful extensions
are found.
• Apriori uses breadth-first search and a tree structure to count
candidate itemsets efficiently. It generates candidate itemsets of
length k from itemsets of length k − 1. Then it prunes the
candidates which contain an infrequent subpattern, as in the
sketch below.
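A compact sketch of that level-wise loop, assuming transactions are Python
sets of items (a simplified illustration; a production implementation would
count candidates with a prefix tree rather than by rescanning the data):

from itertools import combinations

def apriori(transactions, min_count):
    # Level 1: find the frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {s for s in items
                if sum(1 for t in transactions if s <= t) >= min_count}
    all_frequent = set(frequent)
    k = 2
    while frequent:
        # Candidate generation: extend frequent (k-1)-sets by one item.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
        # Prune candidates containing an infrequent (k-1)-subpattern.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # Test the surviving candidates against the data.
        frequent = {c for c in candidates
                    if sum(1 for t in transactions if c <= t) >= min_count}
        all_frequent |= frequent
        k += 1
    return all_frequent

Run on the transactions of the worked example later in this document,
apriori(transactions, 3) returns exactly the frequent itemsets derived
there by hand.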
Why Association Rules?
• Suppose you have records of a large number of transactions at a
shopping center, as follows below.
• Learning association rules basically means finding the items that
are purchased together more frequently than others.
• For example, in the table below you can see that Item1 and Item3 are
bought together frequently.

Transactions   Items bought
T1             Item1, Item3
T2             Item1, Item4
T3             Item1, Item2, Item3, Item4
T4             Item1, Item3, Item4
T5             Item3, Item4
Applications of Association Rules
• Shopping centers use association rules to place items next to
each other so that customers buy more items. If you are familiar with
data mining you would know about the famous beer-diapers-Wal-
Mart story. Basically, Wal-Mart studied their data and found that on
Friday afternoons young American males who buy diapers also tend
to buy beer. So Wal-Mart placed beer next to diapers and beer
sales went up. This is famous because no one would have
predicted such a result, and that's the power of data mining.
• Also, if you are familiar with Amazon, they use association mining
to recommend items based on the item you are currently
browsing or buying.
• Another application is Google auto-complete: after you type in a
word, it suggests words that users frequently type after that
particular word.
Example
Table 1.2 : Shopping Transaction
Transaction ID Items bought
T1 {Mango, Onion, Nintendo, Key-chain, Eggs, Yo-yo}
T2 {Doll, Onion, Nintendo, Key-chain, Eggs, Yo-yo}
T3 {Mango, Apple, Key-chain, Eggs}
T4 {Mango, Umbrella, Corn, Key-chain, Yo-yo}
T5 {Corn, Onion, Onion, Key-chain, Ice-cream, Eggs}

Table 1.3 : Shopping Transaction – New Representation


Transaction ID Items bought
T1 {M, O, N, K, E, Y}
T2 {D, O, N, K, E, Y}
T3 {M, A, K, E}
T4 {M, U, C, K, Y}
T5 {C, O, O, K, I, E}

• Find all frequent itemsets using the Apriori algorithm. Let minimum support
(min_sup) = 60% and minimum confidence (min_conf) = 70%.
Step 1 :
Find the frequency and support of each single item bought (C1), then keep
the single items bought with minimum 60% support (L1).

C1
Item   Frequency   Support
M      3           3/5 = 0.6
O      3           3/5 = 0.6  (why 3, not 4? O appears twice in T5 but is
                              counted once per transaction)
N      2           2/5 = 0.4
K      5           5/5 = 1.0
E      4           4/5 = 0.8
Y      3           3/5 = 0.6
D      1           1/5 = 0.2
A      1           1/5 = 0.2
U      1           1/5 = 0.2
C      2           2/5 = 0.4
I      1           1/5 = 0.2

L1 (items that satisfy 60% minimum support)
Item   Frequency   Support
M      3           3/5 = 0.6
O      3           3/5 = 0.6
K      5           5/5 = 1.0
E      4           4/5 = 0.8
Y      3           3/5 = 0.6

• support(X) = frequency(X) / number of transactions

• Items that satisfy the minimum 60% support are {M, O, K, E, Y}, as the
sketch below confirms.
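This C1 -> L1 step can be checked with a few lines of Python (a sketch;
the single letters are the items of Table 1.3):

from collections import Counter

# Transactions from Table 1.3, as sets, so the duplicate O in T5
# contributes only one count.
transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"),
                set("MUCKY"), set("COOKIE")]
n = len(transactions)

c1 = Counter(item for t in transactions for item in t)
l1 = {item for item, freq in c1.items() if freq / n >= 0.6}
print(c1["O"])     # 3, not 4: sets collapse the repeated Onion in T5
print(sorted(l1))  # ['E', 'K', 'M', 'O', 'Y']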
Step 2 :
Find two items frequently bought together with minimum 60% support.

C2
Item   Frequency   Support
MO     1           1/5 = 0.2
MK     3           3/5 = 0.6
ME     2           2/5 = 0.4
MY     2           2/5 = 0.4
OK     3           3/5 = 0.6
OE     3           3/5 = 0.6
OY     2           2/5 = 0.4
KE     4           4/5 = 0.8
KY     3           3/5 = 0.6
EY     2           2/5 = 0.4

L2 (pairs that satisfy 60% minimum support)
Item   Frequency   Support
MK     3           3/5 = 0.6
OK     3           3/5 = 0.6
OE     3           3/5 = 0.6
KE     4           4/5 = 0.8
KY     3           3/5 = 0.6

We start making pairs from the first item, like MO, MK, ME, MY, and then we start with the
second item, like OK, OE, OY. We do not form OM because we already formed MO when we were
making pairs with M, and buying a Mango and an Onion together is the same as buying an Onion
and a Mango together. After making all the pairs we get the table above; the sketch below
reproduces it.
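Continuing the Step 1 sketch (it reuses `transactions`, `n`, and `l1`
from there), the pair step is just unordered combinations over L1:

from itertools import combinations

# C2: each unordered pair of frequent single items; combinations()
# generates MO only once, never its twin OM.
c2 = [frozenset(p) for p in combinations(sorted(l1), 2)]
l2 = {pair for pair in c2
      if sum(1 for t in transactions if pair <= t) / n >= 0.6}
print(sorted("".join(sorted(p)) for p in l2))
# ['EK', 'EO', 'KM', 'KO', 'KY'], i.e. KE, OE, MK, OK and KY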
Step 3 :
Find three items frequently bought together with minimum 60% support.

• To make the sets of three items we need one more rule (termed the self-join). It simply
means: from the item pairs in the table above, we find two pairs with the same first
letter, so we get
OK and OE, which gives OKE
KE and KY, which gives KEY

C3
Item   Frequency   Support
OKE    3           3/5 = 0.6
KEY    2           2/5 = 0.4

L3 (sets that satisfy 60% minimum support)
Item   Frequency   Support
OKE    3           3/5 = 0.6
When we generate C4 from L3, it turns out to be empty, therefore the process
terminates.
Note: while we are on this, suppose you have sets of 3 items, say ABC, ABD, ACD, ACE, BCD,
and you want to generate itemsets of 4 items; you look for two sets having the same first two
letters, so we get
· ABC and ABD -> ABCD
· ACD and ACE -> ACDE
And so on … In general, you have to look for sets having just the last letter/item different,
as in the sketch below.
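A sketch of that self-join rule, assuming each frequent k-itemset is
kept as a sorted tuple:

def self_join(frequent_k_sets):
    # Join sorted k-itemsets that agree on their first k-1 items,
    # producing candidate (k+1)-itemsets.
    candidates = set()
    sets = sorted(frequent_k_sets)
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a[:-1] == b[:-1]:  # same prefix, only the last item differs
                candidates.add(a + (b[-1],))
    return candidates

print(self_join([("A", "B", "C"), ("A", "B", "D"), ("A", "C", "D"),
                 ("A", "C", "E"), ("B", "C", "D")]))
# {('A', 'B', 'C', 'D'), ('A', 'C', 'D', 'E')}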
How to calculate confidence ?

• We can take our frequent-itemset knowledge even further by finding (deducing)
association rules from the frequent itemsets.
• In simple words, we know that O, K, E are bought together frequently, but what is the
association between them?
• To do this, we list all the non-empty subsets of the frequently bought items ({O, K, E}
in our case), and we get the following subsets:
• {O}
• {K}
• {E}
• {O,K}
• {O,E}
• {K,E}
How to calculate confidence ?
confidence(A => C) = support({A, C}) / support({A}), i.e. of the transactions
containing A, the fraction that also contain C.
Generate all possibilities of rules over the subsets of OKE. Each rule pairs two of
the subsets above as A => C; combinations where the right-hand side overlaps the
left-hand side (such as O => O or O => OK) are degenerate and are skipped.

Rule       Frequencies          Confidence
O => K     OK = 3, O = 3        3/3 = 1.0 (100%)
O => E     OE = 3, O = 3        3/3 = 1.0 (100%)
O => KE    OKE = 3, O = 3       3/3 = 1.0 (100%)
K => O     OK = 3, K = 5        3/5 = 0.6 (60%)
K => E     KE = 4, K = 5        4/5 = 0.8 (80%)
K => OE    OKE = 3, K = 5       3/5 = 0.6 (60%)
E => O     OE = 3, E = 4        3/4 = 0.75 (75%)
E => K     KE = 4, E = 4        4/4 = 1.0 (100%)
E => OK    OKE = 3, E = 4       3/4 = 0.75 (75%)
OK => E    OKE = 3, OK = 3      3/3 = 1.0 (100%)
OE => K    OKE = 3, OE = 3      3/3 = 1.0 (100%)
KE => O    OKE = 3, KE = 4      3/4 = 0.75 (75%)

• So the rules with at least 70% confidence are as below:

Rule       Frequencies          Confidence
O => K     OK = 3, O = 3        3/3 = 1.0 (100%)
O => E     OE = 3, O = 3        3/3 = 1.0 (100%)
O => KE    OKE = 3, O = 3       3/3 = 1.0 (100%)
K => E     KE = 4, K = 5        4/5 = 0.8 (80%)
E => O     OE = 3, E = 4        3/4 = 0.75 (75%)
E => K     KE = 4, E = 4        4/4 = 1.0 (100%)
E => OK    OKE = 3, E = 4       3/4 = 0.75 (75%)
OK => E    OKE = 3, OK = 3      3/3 = 1.0 (100%)
OE => K    OKE = 3, OE = 3      3/3 = 1.0 (100%)
KE => O    OKE = 3, KE = 4      3/4 = 0.75 (75%)
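The whole rule-generation step fits in a few lines of Python (a sketch;
like the tables above, it only considers the frequent itemset {O, K, E}
and its frequent sub-pairs):

from itertools import combinations

transactions = [set("MONKEY"), set("DONKEY"), set("MAKE"),
                set("MUCKY"), set("COOKIE")]

def count(items):
    # Number of transactions containing every item in `items`.
    return sum(1 for t in transactions if set(items) <= t)

for itemset in ["OK", "OE", "KE", "OKE"]:
    # Every non-empty proper subset may serve as the left-hand side.
    for r in range(1, len(itemset)):
        for lhs in combinations(itemset, r):
            rhs = "".join(c for c in itemset if c not in lhs)
            conf = count(itemset) / count(lhs)
            if conf >= 0.7:  # min_conf = 70%
                print(f"{''.join(lhs)} => {rhs}: {conf:.2f}")

This prints exactly the ten rules of the final table above.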
