Association Rule Mining

Data Mining and Knowledge Discovery


Prof. Carolina Ruiz and Weiyang Lin
Department of Computer Science
Worcester Polytechnic Institute
Sample Applications

Sample Commercial Applications:
- Market basket analysis
- Cross-marketing
- Attached mailing
- Store layout, catalog design
- Customer segmentation based on buying patterns
- …

Sample Scientific Applications:
- Genetic analysis
- Analysis of medical data
- …
Transactions and Assoc. Rules

Transaction Id | Purchased Items
1 | {a, b, c}
2 | {a, d}
3 | {a, c}
4 | {b, e, f}

Association Rule: a → c
- support: 50% = P(a & c)
  percentage of transactions that contain both a and c
  (here, transactions 1 and 3, i.e., 2 of 4)
- confidence: 66% = P(c | a)
  percentage of transactions that contain c among those transactions that contain a
  (here, 2 of the 3 transactions that contain a)
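To make the two measures concrete, here is a minimal sketch in Python that computes them over the four transactions above (the helper names `support` and `confidence` are illustrative, not from the slides):

```python
# Minimal sketch: support and confidence over the example transactions.
transactions = [
    {"a", "b", "c"},
    {"a", "d"},
    {"a", "c"},
    {"b", "e", "f"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """P(consequent | antecedent) = support(both) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"a", "c"}, transactions))       # 0.5    -> 50%
print(confidence({"a"}, {"c"}, transactions))  # 0.666  -> 66%
```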
Association Rules - Intuition

- Given a set of transactions, where each transaction is a set of items
- Find all rules X → Y that relate the presence of one set of items X with another set of items Y
  - Example: 98% of people who purchase diapers and baby food also buy beer
- A rule may have any number of items in the antecedent and in the consequent
- It is possible to specify constraints on rules
Mining Association Rules

Problem Statement
Given:
- a set of transactions (each transaction is a set of items)
- a user-specified minimum support
- a user-specified minimum confidence
Find:
- all association rules whose support and confidence are greater than or equal to the user-specified minimum support and minimum confidence
Naïve Procedure to Mine Rules

- List all the subsets of the set of items
- For each subset:
  - Split the subset into two parts (one for the antecedent and one for the consequent of the rule)
  - Compute the support of the rule
  - Compute the confidence of the rule
  - IF the support and confidence are no lower than the user-specified minimum support and confidence THEN output the rule

Complexity: Let n be the number of items. The number of rules naively considered is:

\sum_{i=2}^{n} \binom{n}{i} \sum_{k=1}^{i-1} \binom{i}{k} = \sum_{i=2}^{n} \binom{n}{i} \left(2^{i} - 2\right) = 3^{n} - 2^{n+1} + 1
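As a sanity check on the closed form, the brute-force sketch below enumerates every (subset, split) pair and compares the tally with 3^n - 2^(n+1) + 1 (illustrative code, not part of the original slides):

```python
from itertools import combinations

def count_naive_rules(n):
    """Enumerate every split of every subset of n items into antecedent/consequent."""
    items = range(n)
    count = 0
    for i in range(2, n + 1):              # subsets of size i >= 2
        for subset in combinations(items, i):
            for k in range(1, i):          # nonempty antecedent and consequent
                count += sum(1 for _ in combinations(subset, k))
    return count

for n in range(2, 9):
    assert count_naive_rules(n) == 3**n - 2**(n + 1) + 1
print("closed form verified")              # grows exponentially: hence Apriori
```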
The Apriori Algorithm

1. Find all frequent itemsets: sets of items whose support is greater than or equal to the user-specified minimum support.

2. Generate the desired rules: if {a, b, c, d} and {a, b} are frequent itemsets, then compute the ratio
   conf(a & b → c & d) = P(c & d | a & b)
   = P(a & b & c & d) / P(a & b)
   = support({a, b, c, d}) / support({a, b}).
   If conf >= minconf, then add the rule a & b → c & d.
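In code, step 2 is just a division of two stored supports. A hedged sketch (the support values below are made-up placeholders, not taken from the slides):

```python
# Sketch: rule confidence from itemset supports (placeholder values).
support = {
    frozenset("ab"): 0.40,    # support({a, b})
    frozenset("abcd"): 0.30,  # support({a, b, c, d})
}

minconf = 0.7
conf = support[frozenset("abcd")] / support[frozenset("ab")]
if conf >= minconf:
    print(f"a & b -> c & d (conf = {conf:.2f})")  # 0.75 >= 0.7, rule is kept
```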
The Apriori Algorithm — Example
(slide taken from J. Han & M. Kamber's Data Mining book)

Min. supp = 50%, i.e., min support count = 2

Database D:
TID | Items
100 | 1 3 4
200 | 2 3 5
300 | 1 2 3 5
400 | 2 5

Scan D to count C1:
C1: {1}: 2, {2}: 3, {3}: 3, {4}: 1, {5}: 3
L1 ({4} is pruned, count 1 < 2): {1}: 2, {2}: 3, {3}: 3, {5}: 3

Generate C2 from L1:
C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D to count C2:
C2: {1 2}: 1, {1 3}: 2, {1 5}: 1, {2 3}: 2, {2 5}: 3, {3 5}: 2
L2 ({1 2} and {1 5} are pruned): {1 3}: 2, {2 3}: 2, {2 5}: 3, {3 5}: 2

Generate C3 from L2 and scan D:
C3: {2 3 5}
L3: {2 3 5}: 2
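The first scan of this walkthrough (C1 → L1) can be reproduced in a few lines; this is a sketch with illustrative names:

```python
from collections import Counter

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
min_count = 2

c1 = Counter(item for t in D for item in t)                 # Scan D: count C1
l1 = {item: c for item, c in c1.items() if c >= min_count}  # keep frequent 1-itemsets
print(l1)  # {4} is pruned: its count is 1 < 2
```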
Apriori Principle

Key observation:
- Every subset of a frequent itemset is also a frequent itemset.

Or equivalently:
- The support of an itemset is greater than or equal to the support of any superset of the itemset.
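This principle is what justifies Apriori's candidate pruning: a k-itemset can be frequent only if all of its (k-1)-subsets are frequent. A minimal sketch of that test (names are illustrative):

```python
from itertools import combinations

def all_subsets_frequent(candidate, frequent_prev):
    """Prune test: every (k-1)-subset of a k-candidate must be in L(k-1)."""
    k = len(candidate)
    return all(frozenset(s) in frequent_prev for s in combinations(candidate, k - 1))

# From the example above: {2, 3, 5} survives because {2, 3}, {2, 5}, and {3, 5} are all in L2.
L2 = {frozenset({1, 3}), frozenset({2, 3}), frozenset({2, 5}), frozenset({3, 5})}
print(all_subsets_frequent(frozenset({2, 3, 5}), L2))  # True
```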
Apriori - Compute Frequent Itemsets

Making multiple passes over the data:

for pass k:
{ candidate generation: Ck := Lk-1 joined with Lk-1;
  support counting in Ck;
  Lk := all candidates in Ck with minimum support;
}
terminate when Lk == ∅ or Ck+1 == ∅

Frequent-Itemsets = ∪k Lk

Lk - set of frequent itemsets of size k (those with minsup)
Ck - set of candidate itemsets of size k (potentially frequent itemsets)
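Putting the loop together, here is a compact sketch of the level-wise pass structure in Python (assumptions: transactions are sets, and `min_count` is an absolute support count rather than a percentage):

```python
from itertools import combinations

def apriori_frequent_itemsets(transactions, min_count):
    """Level-wise search: build Ck from L(k-1), count supports, keep frequent sets."""
    # Pass 1: count 1-itemsets directly.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(L)

    k = 2
    while L:  # terminates when Lk (and hence Ck+1) is empty
        prev = set(L)
        # Candidate generation: join L(k-1) with itself; Apriori-prune any
        # candidate that has an infrequent (k-1)-subset.
        Ck = {a | b for a in prev for b in prev
              if len(a | b) == k
              and all(frozenset(s) in prev for s in combinations(a | b, k - 1))}
        # Support counting in Ck.
        counts = {c: sum(c <= t for t in transactions) for c in Ck}
        L = {s: cnt for s, cnt in counts.items() if cnt >= min_count}
        frequent.update(L)
        k += 1
    return frequent

# Reproduces the earlier example: {2, 3, 5} is frequent with count 2.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(apriori_frequent_itemsets(D, min_count=2))
```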
Apriori – Generating Rules

For each frequent itemset, generate the desired rules (as sketched below): if {a, b, c, d} and {a, b} are frequent itemsets, then compute the ratio
conf(a & b → c & d) = support({a, b, c, d}) / support({a, b}).
If conf >= minconf, then add the rule a & b → c & d.
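A sketch of this pass, reusing the `apriori_frequent_itemsets` sketch above (both the names and the 0.66 threshold are illustrative):

```python
from itertools import combinations

def generate_rules(frequent, minconf):
    """For each frequent itemset, try every nonempty proper subset as antecedent."""
    rules = []
    for itemset, count in frequent.items():
        for k in range(1, len(itemset)):
            for ant in map(frozenset, combinations(itemset, k)):
                # `ant` is frequent too (Apriori principle), so its count is stored;
                # the ratio of counts equals the ratio of supports.
                conf = count / frequent[ant]
                if conf >= minconf:
                    rules.append((set(ant), set(itemset - ant), conf))
    return rules

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
frequent = apriori_frequent_itemsets(D, min_count=2)
for ant, cons, conf in generate_rules(frequent, minconf=0.66):
    print(f"{ant} -> {cons} (conf = {conf:.2f})")
```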
