
Data Mining and Data Warehousing

Unit - III
Association Rules

Prepared by
R.Poonguzhali
Periyar Maniammai Institute of Science and Technology

04/13/2020 1
Outline

• Basic Concepts

• Example: Market Basket Analysis

• Apriori Algorithm

Basic Concepts
What is frequent pattern mining?
Frequent pattern: a pattern (a set of items,
subsequences, substructures, etc.) that occurs
frequently in a data set.

Example:
A set of items such as milk and bread that occur
frequently together in a transaction dataset is a
frequent itemset.

Example: Market Basket Analysis
• This process analyzes customer buying habits by
finding associations between the different items
that customers place in their shopping baskets.

• Finding associations between different items
helps retailers identify which items are
frequently purchased together by customers.

Transactional Database
Transaction ID (TID)    List of Items
T1                      {Jam, milk, bread}
T2                      {biscuit, eggs, salt, yogurt}
T3                      {Jam, eggs, bread}

• An item: an item/article in a basket
• I = {I1, I2, …, In}: the set of all items sold in the store
• A transaction: the items purchased in one basket, identified by a TID (transaction ID)
• A transactional database D: a set of transactions

Let A and B be sets of items such that
A ⊂ I, B ⊂ I, and A ∩ B = ∅.

Support:
Support(A => B) = P(A ∪ B)
Percentage of transactions in D that contain A ∪ B
(i.e., every item of both A and B)

Confidence:
Confidence(A => B) = P(B|A)
Percentage of transactions in D containing A that also
contain B
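
The two measures can be checked by hand on the three-transaction market basket table shown earlier. The following is a minimal sketch, not part of the original slides; the names `D`, `support`, and `confidence` are illustrative assumptions.

```python
# Toy transactional database from the market basket slide (names lower-cased).
D = {
    "T1": {"jam", "milk", "bread"},
    "T2": {"biscuit", "eggs", "salt", "yogurt"},
    "T3": {"jam", "eggs", "bread"},
}

def support(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in db.values()) / len(db)

def confidence(a, b, db):
    """Support(A u B) / Support(A): how often B appears when A does."""
    return support(set(a) | set(b), db) / support(a, db)

# {jam, bread} occurs in T1 and T3, i.e. in 2 of 3 transactions.
print(support({"jam", "bread"}, D))
# Every basket containing jam also contains bread.
print(confidence({"jam"}, {"bread"}, D))
```

Here the rule jam => bread has support 2/3 and confidence 1, while eggs => jam has confidence 1/2 because only one of the two egg baskets also contains jam.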

Association Rule Mining
Goal:
Find all rules that satisfy the user-specified minimum
support (minsup) and minimum confidence
(minconf).

The Apriori Algorithm
• Probably the best-known algorithm for association
rule mining
• Two steps:
Step 1:
– Find all itemsets that have minimum support
(frequent itemsets, also called large itemsets).
Step 2:
– Use the frequent itemsets to generate rules.

Start with Step 1
Find all itemsets that meet the minimum support count
(min_sup = 2)

Example 1: Transaction Database
TID    List of items
1      I1, I2, I5
2      I2, I4
3      I2, I3
4      I1, I2, I4
5      I1, I3
6      I2, I3
7      I1, I3
8      I1, I2, I3, I5
9      I1, I2, I3

Scan D for the count of each candidate (C1), then compare each
candidate support count to the minimum support count (L1).

C1
Itemset    Support count
[I1]       6
[I2]       7
[I3]       6
[I4]       2
[I5]       2

L1
Itemset    Support count
[I1]       6
[I2]       7
[I3]       6
[I4]       2
[I5]       2
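
The first pass over the Example 1 database can be sketched as follows. This is an illustrative sketch, not the slides' own code; the variable names `D`, `MIN_SUP`, `c1`, and `l1` are assumptions.

```python
from collections import Counter

# Example 1 transaction database (TIDs 1 to 9, in order).
D = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_SUP = 2  # minimum support count

# C1: one scan of D, counting how many transactions contain each item.
c1 = Counter(item for transaction in D for item in transaction)

# L1: keep only the candidates whose count reaches the minimum.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUP}

for item in sorted(l1):
    print(item, l1[item])
```

Because every item occurs at least twice, L1 is identical to C1 in this example.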

Generate C2 candidates from L1, scan D for the count of each
candidate (C2), then compare each candidate support count with the
minimum support count (L2).

C2
Itemset     Support count
[I1, I2]    4
[I1, I3]    4
[I1, I4]    1
[I1, I5]    2
[I2, I3]    4
[I2, I4]    2
[I2, I5]    2
[I3, I4]    0
[I3, I5]    1
[I4, I5]    0

L2
Itemset     Support count
[I1, I2]    4
[I1, I3]    4
[I1, I5]    2
[I2, I3]    4
[I2, I4]    2
[I2, I5]    2
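
The second pass can be sketched in the same way: pair up the frequent items, count each pair in D, and drop the pairs below the minimum support count. This is a minimal sketch under the assumption that L1 contains all five items, as in the example; the names `c2` and `l2` are illustrative.

```python
from itertools import combinations

# Example 1 transaction database (TIDs 1 to 9, in order).
D = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_SUP = 2
L1 = ["I1", "I2", "I3", "I4", "I5"]  # frequent 1-itemsets from the first pass

# C2: every unordered pair of frequent items, with its support count.
c2 = {pair: sum(set(pair) <= t for t in D) for pair in combinations(L1, 2)}

# L2: the pairs that reach the minimum support count.
l2 = {pair: n for pair, n in c2.items() if n >= MIN_SUP}

for pair, n in l2.items():
    print(pair, n)
```

Of the ten candidate pairs, four fall below the threshold, which leaves six frequent 2-itemsets.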

Generate C3 candidates from L2, scan D for the count of each
candidate (C3), then compare each candidate support count with the
minimum support count (L3).

C3
Itemset         Support count
[I1, I2, I3]    2
[I1, I2, I5]    2

L3
Itemset         Support count
[I1, I2, I3]    2
[I1, I2, I5]    2
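
C3 holds only two candidates because Apriori discards any candidate that has an infrequent subset before counting it. The sketch below shows one common way to write this join-and-prune step. The function name `apriori_gen` and the tuple representation are assumptions made for illustration, not taken from the slides.

```python
from itertools import combinations

# Example 1 transaction database (TIDs 1 to 9, in order).
D = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_SUP = 2
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

def apriori_gen(prev_frequent):
    """Build k-item candidates from frequent (k-1)-itemsets (sorted tuples)."""
    prev = sorted(prev_frequent)
    if not prev:
        return []
    prev_set = set(prev)
    k = len(prev[0]) + 1
    candidates = []
    for a, b in combinations(prev, 2):
        if a[:-1] == b[:-1]:  # join: same first k-2 items
            cand = a + (b[-1],)
            # prune: every (k-1)-subset must itself be frequent
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates

def count(itemset):
    """Number of transactions in D that contain the whole itemset."""
    return sum(set(itemset) <= t for t in D)

c3 = apriori_gen(L2)
l3 = [c for c in c3 if count(c) >= MIN_SUP]
c4 = apriori_gen(l3)  # empty, so the search stops here

print(c3, l3, c4)
```

The join produces six 3-item candidates, but four of them contain [I3, I4], [I3, I5], or [I4, I5], none of which is in L2, so they are pruned without a scan of D. The only possible 4-item candidate, [I1, I2, I3, I5], is pruned for the same reason, which ends Step 1.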

Continue with Step 2
Generating association rules from frequent itemsets

Generate strong association rules from the
frequent itemsets

Confidence(A => B) = P(B|A) = Support_count(A ∪ B) / Support_count(A)

Example 1:
The data contain the frequent itemset X = {I1, I2, I5}. What
association rules can be generated from X? The minimum
confidence threshold is 70%.

The candidate association rules are

{I1, I2} => {I5}    Confidence 2/4 = 50%
{I1, I5} => {I2}    Confidence 2/2 = 100%
{I2, I5} => {I1}    Confidence 2/2 = 100%
{I1} => {I2, I5}    Confidence 2/6 = 33%
{I2} => {I1, I5}    Confidence 2/7 = 29%
{I5} => {I1, I2}    Confidence 2/2 = 100%

The strong association rules (confidence of at least 70%) are

{I1, I5} => {I2}    Confidence 2/2 = 100%
{I2, I5} => {I1}    Confidence 2/2 = 100%
{I5} => {I1, I2}    Confidence 2/2 = 100%
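
The rule generation for X = {I1, I2, I5} can be reproduced with a short script that tries every non-empty proper subset of X as the left-hand side. This is a sketch under stated assumptions; `support_count` and `rules_from` are hypothetical helper names, not part of the original material.

```python
from itertools import combinations

# Example 1 transaction database (TIDs 1 to 9, in order).
D = [
    {"I1", "I2", "I5"},
    {"I2", "I4"},
    {"I2", "I3"},
    {"I1", "I2", "I4"},
    {"I1", "I3"},
    {"I2", "I3"},
    {"I1", "I3"},
    {"I1", "I2", "I3", "I5"},
    {"I1", "I2", "I3"},
]
MIN_CONF = 0.70

def support_count(itemset):
    """Number of transactions in D that contain the whole itemset."""
    return sum(set(itemset) <= t for t in D)

def rules_from(itemset, min_conf):
    """Return (antecedent, consequent, confidence) for each strong rule."""
    items = sorted(itemset)
    total = support_count(items)
    rules = []
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            conf = total / support_count(antecedent)
            if conf >= min_conf:
                rules.append((antecedent, consequent, conf))
    return rules

for a, c, conf in rules_from({"I1", "I2", "I5"}, MIN_CONF):
    print(set(a), "=>", set(c), f"confidence {conf:.0%}")
```

Each confidence is Support_count(X) divided by the support count of the left-hand side, so only the three rules whose left-hand side occurs exactly twice in D reach the 70% threshold.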

Exercise:
The data contain the frequent itemset X = {I1, I2, I3}.
What association rules can be generated from X?
The minimum confidence threshold is 70%.

This lecture is based on the following resources - slides:
1. J.Han : Data Mining Concepts and Techniques.
2. G.Piatetsky-Shapiro : Association Rules and Frequent
Item Analysis.
3. Jerzy Stefanowski : Institute of Computing Sciences
Poznan University of Technology Poznan, Poland.

THANK YOU
