Association Rule Mining

Data Mining and Knowledge Discovery


Prof. Carolina Ruiz and Weiyang Lin
Department of Computer Science
Worcester Polytechnic Institute
Sample Applications

Sample Commercial Applications:
- Market basket analysis
- Cross-marketing
- Attached mailing
- Store layout, catalog design
- Customer segmentation based on buying patterns
- …

Sample Scientific Applications:
- Genetic analysis
- Analysis of medical data
- …
Transactions and Assoc. Rules

Transaction Id | Purchased Items
1 | {a, b, c}
2 | {a, d}
3 | {a, c}
4 | {b, e, f}

Association Rule: a → c
- support: 50% = P(a & c)
  percentage of transactions that contain both a and c
  (here, transactions 1 and 3, i.e., 2 of 4)
- confidence: 66% = P(c | a)
  percentage of transactions that contain c among those transactions that contain a
  (here, 2 of the 3 transactions that contain a)
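To make the two measures concrete, here is a minimal sketch in Python that computes them over the four transactions above (the helper names `support` and `confidence` are illustrative, not from the slides):

```python
# Minimal sketch: support and confidence over the example transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "d"},
    {"a", "c"},
    {"b", "e", "f"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"a", "c"}, transactions))       # 0.5    -> 50%
print(confidence({"a"}, {"c"}, transactions))  # 0.666  -> 66%
```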
Association Rules - Intuition

- Given a set of transactions, where each transaction is a set of items
- Find all rules X → Y that relate the presence of one set of items X with another set of items Y
  - Example: 98% of people who purchase diapers and baby food also buy beer
- A rule may have any number of items in the antecedent and in the consequent
- It is possible to specify constraints on rules
Mining Association Rules

Problem Statement
Given:
- a set of transactions (each transaction is a set of items)
- a user-specified minimum support
- a user-specified minimum confidence
Find:
- all association rules whose support and confidence are greater than or equal to the user-specified minimum support and minimum confidence
Naïve Procedure to Mine Rules

- List all the subsets of the set of items
- For each subset:
  - Split the subset into two parts (one for the antecedent and one for the consequent of the rule)
  - Compute the support of the rule
  - Compute the confidence of the rule
  - IF the support and confidence are no lower than the user-specified minimum support and confidence THEN output the rule

Complexity: Let n be the number of items. The number of rules naively considered is:

\sum_{i=2}^{n} \binom{n}{i} \sum_{k=1}^{i-1} \binom{i}{k} = \sum_{i=2}^{n} \binom{n}{i} \left(2^{i} - 2\right) = 3^{n} - 2^{n+1} + 1
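As a sanity check on the closed form, the brute-force sketch below enumerates every (subset, split) pair and compares the tally with 3^n - 2^(n+1) + 1 (illustrative code, not part of the original slides):

```python
from itertools import combinations

def count_naive_rules(n):
    """Enumerate every split of every subset of n items into antecedent/consequent."""
    items = range(n)
    count = 0
    for i in range(2, n + 1):              # subsets of size i >= 2
        for subset in combinations(items, i):
            for k in range(1, i):          # nonempty antecedent and consequent
                count += sum(1 for _ in combinations(subset, k))
    return count

for n in range(2, 9):
    assert count_naive_rules(n) == 3**n - 2**(n + 1) + 1
print("closed form verified")              # grows exponentially: hence Apriori
```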
The Apriori Algorithm

1. Find all frequent itemsets: sets of items whose support is greater than or equal to the user-specified minimum support.

2. Generate the desired rules: if {a, b, c, d} and {a, b} are frequent itemsets, then compute the ratio
   conf(a & b → c & d) = P(c & d | a & b)
   = P(a & b & c & d) / P(a & b)
   = support({a, b, c, d}) / support({a, b}).
   If conf >= minconf, then add the rule a & b → c & d.
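In code, step 2 is just a division of two stored supports. A hedged sketch (the support values below are made-up placeholders, not taken from the slides):

```python
# Sketch: rule confidence from itemset supports (placeholder values).
support = {
    frozenset("ab"): 0.40,    # support({a, b})
    frozenset("abcd"): 0.30,  # support({a, b, c, d})
}

minconf = 0.7
conf = support[frozenset("abcd")] / support[frozenset("ab")]
if conf >= minconf:
    print(f"a & b -> c & d (conf = {conf:.2f})")  # 0.75 >= 0.7, rule is kept
```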
The Apriori Algorithm — Example
(slide taken from J. Han & M. Kamber's Data Mining book)

Min. supp = 50%, i.e., min support count = 2

Database D:
TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D to count C1:
C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1 ({4} is pruned, count 1 < 2): {1}: 2, {2}: 3, {3}: 3, {5}: 3

Generate C2 from L1:
C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D to count C2:
C2: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2 ({1 2} and {1 5} are pruned): {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

Generate C3 from L2 and scan D:
C3: {2 3 5}
L3: {2 3 5}: 2
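The first scan of this walkthrough (C1 → L1) can be reproduced in a few lines; this is a sketch with illustrative names:

```python
from collections import Counter

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_count = 2

c1 = Counter(item for t in D for item in t)                 # Scan D: count C1
l1 = {item: c for item, c in c1.items() if c >= min_count}  # keep frequent 1-itemsets
print(l1)  # {4} is pruned: its count is 1 < 2
```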
Apriori Principle

Key observation:
- Every subset of a frequent itemset is also a frequent itemset.

Or equivalently:
- The support of an itemset is greater than or equal to the support of any superset of the itemset.
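This principle is what justifies Apriori's candidate pruning: a k-itemset can be frequent only if all of its (k-1)-subsets are frequent. A minimal sketch of that test (names are illustrative):

```python
from itertools import combinations

def all_subsets_frequent(candidate, frequent_prev):
    """Prune test: every (k-1)-subset of a k-candidate must be in L(k-1)."""
    k = len(candidate)
    return all(frozenset(s) in frequent_prev for s in combinations(candidate, k - 1))

# From the example above: {2, 3, 5} survives because {2, 3}, {2, 5}, and {3, 5} are all in L2.
L2 = {frozenset({1, 3}), frozenset({2, 3}), frozenset({2, 5}), frozenset({3, 5})}
print(all_subsets_frequent(frozenset({2, 3, 5}), L2))  # True
```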
Apriori - Compute Frequent Itemsets

Making multiple passes over the data:

for pass k:
{ candidate generation: Ck := Lk-1 joined with Lk-1;
  support counting in Ck;
  Lk := all candidates in Ck with minimum support;
}
terminate when Lk == ∅ or Ck+1 == ∅

Frequent-Itemsets = ∪k Lk

Lk - set of frequent itemsets of size k (those with minsup)
Ck - set of candidate itemsets of size k (potentially frequent itemsets)
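Putting the loop together, here is a compact sketch of the level-wise pass structure in Python (assumptions: transactions are sets, and `min_count` is an absolute support count rather than a percentage):

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_count):
    """Level-wise search: build Ck from L(k-1), count supports, keep frequent sets."""
    # Pass 1: count 1-itemsets directly.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(L)

    k = 2
    while L:  # terminates when Lk (and hence Ck+1) is empty
        prev = set(L)
        # Candidate generation: join L(k-1) with itself; Apriori-prune any
        # candidate that has an infrequent (k-1)-subset.
        Ck = {a | b for a in prev for b in prev
              if len(a | b) == k
              and all(frozenset(s) in prev for s in combinations(a | b, k - 1))}
        # Support counting in Ck.
        counts = {c: sum(c <= t for t in transactions) for c in Ck}
        L = {s: cnt for s, cnt in counts.items() if cnt >= min_count}
        frequent.update(L)
        k += 1
    return frequent

# Reproduces the earlier example: {2, 3, 5} is frequent with count 2.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori_frequent_itemsets(D, min_count=2))
```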
Apriori – Generating Rules

For each frequent itemset, generate the desired rules (as sketched below): if {a, b, c, d} and {a, b} are frequent itemsets, then compute the ratio
conf(a & b → c & d) = support({a, b, c, d}) / support({a, b}).
If conf >= minconf, then add the rule a & b → c & d.
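A sketch of this pass, reusing the `apriori_frequent_itemsets` sketch above (both the names and the 0.66 threshold are illustrative):

```python
from itertools import combinations

def generate_rules(frequent, minconf):
    """For each frequent itemset, try every nonempty proper subset as antecedent."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for ant in map(frozenset, combinations(itemset, k)):
                # `ant` is frequent too (Apriori principle), so its count is stored;
                # the ratio of counts equals the ratio of supports.
                conf = count / frequent[ant]
                if conf >= minconf:
                    rules.append((set(ant), set(itemset - ant), conf))
    return rules

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori_frequent_itemsets(D, min_count=2)
for ant, cons, conf in generate_rules(frequent, minconf=0.66):
    print(f"{ant} -> {cons} (conf = {conf:.2f})")
```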
