Data Mining and Business Intelligence

Lecture 5
Mining Frequent Patterns, Associations, & Correlations
(Apriori, FP-Growth, Evaluation Methods)

By Dr. Nora Shoaip

Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems

2023 - 2024

Outline
▪ The Basics
  • Market Basket Analysis
  • Frequent Itemsets
  • Association Rules
▪ Frequent Itemset Mining Methods
  • Apriori Algorithm
  • Generating Association Rules from Frequent Itemsets
  • FP-Growth
▪ Pattern Evaluation Methods

The Basics: What Is Frequent Pattern Analysis?

• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining

The Basics

Motivation: Finding inherent regularities in data
• What products were often purchased together? Beer and diapers?!
• What are the subsequent purchases after buying a PC?
• What kinds of DNA are sensitive to this new drug?
• Can we automatically classify web documents?

Applications
• Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis

The Basics: Frequent Itemsets
Itemset X = {x1, …, xk}, e.g. X = {A, B, C, D, E, F}
Find all the rules X → Y with minimum support and confidence
▪ support, s: probability that a transaction contains X ∪ Y
▪ confidence, c: conditional probability that a transaction having X also contains Y

The Basics: Association Rules
Ex: Let supmin = 50%, confmin = 50%

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules:
  A → D (60%, 100%)
  D → A (60%, 75%)

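The support and confidence figures in this example can be reproduced with a short sketch. The function names `support_count`, `support`, and `confidence` are illustrative choices, not part of the lecture.

```python
# Transactions from the example (transaction-id -> items bought).
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= items for items in db.values())

def support(itemset, db):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, db) / len(db)

def confidence(lhs, rhs, db):
    """Share of the transactions containing `lhs` that also contain `rhs`."""
    return support_count(set(lhs) | set(rhs), db) / support_count(lhs, db)

print(support({"A", "D"}, transactions))       # 0.6
print(confidence({"A"}, {"D"}, transactions))  # 1.0
print(confidence({"D"}, {"A"}, transactions))  # 0.75
```

Both rules share the support of {A, D}; only the denominator of the confidence changes with the direction of the rule.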
The Basics: Association Rules

▪ If the frequency of an itemset I satisfies the min_support count, then I is a frequent itemset
▪ If a rule satisfies both the min_support and min_confidence thresholds, it is said to be strong
▪ The problem of mining association rules reduces to mining frequent itemsets
▪ Association rule mining becomes a two-step process:
  1. Find all frequent itemsets, i.e. those that occur at least as frequently as a predetermined min_support count
  2. Generate strong association rules from the frequent itemsets, i.e. rules that satisfy min_support and min_confidence

Mining Frequent Itemsets: Apriori
Goes as follows:
▪ Find frequent 1-itemsets → L1
▪ Use L1 to find frequent 2-itemsets → L2
▪ … until no more frequent k-itemsets can be found

Finding each Lk requires a full scan of the dataset

To improve efficiency, use the Apriori property:
▪ "All nonempty subsets of a frequent itemset must also be frequent"
▪ Equivalently, if a set cannot pass a test, all of its supersets will fail the same test as well: if P(I) < min_support then P(I ∪ A) < min_support

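The Apriori property can be checked empirically. The sketch below reuses the five-transaction example from the basics section and confirms that adding an item to an itemset never raises its support count. The variable names are illustrative assumptions.

```python
from itertools import combinations

db = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(set(itemset) <= t for t in db)

items = sorted(set().union(*db))

# Collect every (itemset, extra item) pair where the superset would be
# MORE frequent than the itemset itself; the property says there are none.
violations = [
    (base, extra)
    for k in range(1, len(items))
    for base in combinations(items, k)
    for extra in items
    if extra not in base
    and support_count(base + (extra,), db) > support_count(base, db)
]
print(violations)  # []
```

Because a transaction that contains I ∪ A necessarily contains I, the list of violations is always empty, which is what justifies discarding every superset of an infrequent itemset.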
Mining Frequent Itemsets: Apriori

Transactional data example: N=9, min_supp count=2

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan dataset for count of each candidate (C1), then compare candidate support with min_support (L1). Every candidate meets the threshold here, so L1 = C1:

Itemset   Support count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2

Mining Frequent Itemsets: Apriori

Generate C2 candidates from L1 by joining L1 ⋈ L1, scan dataset for count of each candidate, then compare candidate support with min_supp:

C2                          L2
Itemset    Support count    Itemset    Support count
{I1, I2}   4                {I1, I2}   4
{I1, I3}   4                {I1, I3}   4
{I1, I4}   1                {I1, I5}   2
{I1, I5}   2                {I2, I3}   4
{I2, I3}   4                {I2, I4}   2
{I2, I4}   2                {I2, I5}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0

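A minimal sketch of this step, assuming the nine transactions of the running example: joining L1 with itself amounts to taking every pair of frequent items, after which one scan counts each candidate. The names `MIN_SUPP_COUNT`, `C2`, `counts`, and `L2` are illustrative.

```python
from itertools import combinations

transactions = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}
MIN_SUPP_COUNT = 2
L1 = ["I1", "I2", "I3", "I4", "I5"]

# Join step for k = 2: every pair of frequent 1-itemsets is a candidate.
C2 = [frozenset(pair) for pair in combinations(L1, 2)]

# One scan of the dataset gives the support count of each candidate.
counts = {c: sum(c <= t for t in transactions.values()) for c in C2}

# Keep the candidates that reach the minimum support count.
L2 = {c: n for c, n in counts.items() if n >= MIN_SUPP_COUNT}

for itemset, n in L2.items():
    print(sorted(itemset), n)
```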
Mining Frequent Itemsets: Apriori

Generate C3 candidates from L2 by joining L2 ⋈ L2:
C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}

Two joining (lexicographically ordered) k-itemsets must share their first k-1 items, so {I1, I2} is not joined with {I2, I4}.

Prune (Apriori property): candidates for which not all subsets are frequent are removed, leaving {I1, I2, I3} and {I1, I2, I5}.

Scan dataset for count of each candidate, then compare candidate support with min_supp:

C3 (after pruning) and L3
Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2

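The join and prune steps for C3 can be sketched as follows. Itemsets are held as lexicographically sorted tuples, and the helper names `join` and `prune` are assumptions made for this illustration.

```python
from itertools import combinations

L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]

def join(prev):
    """Join sorted k-itemsets that agree on their first k-1 items."""
    prev = sorted(prev)
    return [a + (b[-1],)
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if a[:-1] == b[:-1]]

def prune(candidates, prev):
    """Keep candidates whose every (k-1)-subset is a frequent itemset."""
    prev = set(prev)
    return [c for c in candidates
            if all(s in prev for s in combinations(c, len(c) - 1))]

joined = join(L2)
C3 = prune(joined, L2)
print(joined)  # the six joined candidates listed on the slide
print(C3)      # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

For instance, {I1, I3, I5} is dropped because its subset {I3, I5} is not in L2, so no dataset scan is spent on it.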
Mining Frequent Itemsets: Apriori

L3
Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2

Joining L3 ⋈ L3 gives the single candidate {I1, I2, I3, I5}. Not all of its subsets are frequent → Prune

C4 = ∅ → Terminate

Apriori Algorithm
Generate Ck using Lk-1 to find Lk, in two steps:
▪ Join
▪ Prune

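Putting the join, prune, and scan steps together gives a level-wise loop. The following is a minimal sketch, not the slide's pseudocode (which did not survive extraction); the function name `apriori` and its return format are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset as sorted tuple: support count} for all frequent itemsets."""
    def scan(candidates):
        # One pass over the dataset per level.
        return {c: sum(set(c) <= t for t in transactions) for c in candidates}

    items = sorted({item for t in transactions for item in t})
    level = {c: n for c, n in scan([(i,) for i in items]).items() if n >= min_count}
    frequent = dict(level)
    while level:
        prev = sorted(level)
        # Join: merge itemsets that share their first k-1 items.
        candidates = [a + (b[-1],) for i, a in enumerate(prev)
                      for b in prev[i + 1:] if a[:-1] == b[:-1]]
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = [c for c in candidates
                      if all(s in level for s in combinations(c, len(c) - 1))]
        level = {c: n for c, n in scan(candidates).items() if n >= min_count}
        frequent.update(level)
    return frequent

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
result = apriori(transactions, min_count=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, n)
```

On the running example the loop stops after the third level, because the only 4-item candidate is pruned before any scan.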
Mining Frequent Itemsets:
Generating Association Rules from Frequent Itemsets

L3
Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2

Rules generated from {I1, I2, I5}:

Nonempty subset   Association rule    Confidence
{I1, I2}          {I1, I2} → I5       2/4 = 50%
{I1, I5}          {I1, I5} → I2       2/2 = 100%
{I2, I5}          {I2, I5} → I1       2/2 = 100%
{I1}              I1 → {I2, I5}       2/6 = 33%
{I2}              I2 → {I1, I5}       2/7 = 29%
{I5}              I5 → {I1, I2}       2/2 = 100%

For a min_confidence = 70%, only the three rules with 100% confidence are strong.

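Rule generation for one frequent itemset can be sketched as below: each nonempty proper subset becomes an antecedent, and the confidence is the ratio of two support counts already found during mining. The dictionary `support_count` holds the counts from the Apriori example; the function name `rules_from` is an assumption.

```python
from itertools import combinations

# Support counts taken from the Apriori example (only the entries needed here).
support_count = {
    ("I1",): 6, ("I2",): 7, ("I5",): 2,
    ("I1", "I2"): 4, ("I1", "I5"): 2, ("I2", "I5"): 2,
    ("I1", "I2", "I5"): 2,
}

def rules_from(itemset, support_count, min_conf):
    """Yield (antecedent, consequent, confidence) for rules meeting min_conf."""
    itemset = tuple(sorted(itemset))
    for k in range(1, len(itemset)):
        for lhs in combinations(itemset, k):
            rhs = tuple(i for i in itemset if i not in lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                yield lhs, rhs, conf

strong = list(rules_from(("I1", "I2", "I5"), support_count, min_conf=0.70))
for lhs, rhs, conf in strong:
    print(lhs, "->", rhs, f"{conf:.0%}")
```

No further dataset scan is needed at this stage, since every subset of a frequent itemset is itself frequent and therefore already has a stored count.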
Mining Frequent Itemsets:
FP-Growth

▪ To avoid costly candidate generation
▪ Divide-and-conquer strategy:
  1. Compress the database representing frequent items into a frequent pattern tree (FP-tree); this takes 2 passes over the dataset
  2. Divide the compressed database (FP-tree) into conditional databases, then mine each for frequent itemsets by traversing the FP-tree

Mining Frequent Itemsets:
FP-Growth

Transactional data example: N=9, min_supp count=2

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan dataset for count of each candidate (C1), compare candidate support with min_supp, and reorder by descending support count (L1 - Reordered):

C1                         L1 - Reordered
Itemset   Support count    Itemset   Support count
{I1}      6                {I2}      7
{I2}      7                {I1}      6
{I3}      6                {I3}      6
{I4}      2                {I4}      2
{I5}      2                {I5}      2

Mining Frequent Itemsets:
FP-Growth – FP-tree Construction

[Figure sequence: the FP-tree is built by inserting the transactions one at a time, with items in L1 - Reordered order. A header table lists each item of L1 - Reordered with its support count and a node link into the tree.]

▪ Start: the tree holds only the root, null { }
▪ T100 (I2, I1, I5): new path I2:1 – I1:1 – I5:1
▪ T200 (I2, I4): the shared prefix I2 is incremented to I2:2, and a new child I4:1 is added under I2
▪ T300 (I2, I3): I2 is incremented to I2:3, and a new child I3:1 is added under I2

Order of items is kept throughout path construction, with common prefixes shared whenever applicable

Mining Frequent Itemsets:
FP-Growth – FP-tree Construction

FP-tree after all nine transactions:

null { }
├─ I2:7
│  ├─ I1:4
│  │  ├─ I5:1
│  │  ├─ I3:2
│  │  │  └─ I5:1
│  │  └─ I4:1
│  ├─ I3:2
│  └─ I4:1
└─ I1:2
   └─ I3:2

The node links in the header table (L1 - Reordered) are used for tree traversal

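A minimal sketch of the two-pass construction, assuming ties in support count are broken by item name (which reproduces the L1 - Reordered order of the example). The class `Node` and the functions `build_fp_tree` and `show` are illustrative names.

```python
from collections import Counter

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # Pass 1: support count of every item.
    counts = Counter(item for t in transactions for item in t)
    # Frequent items by descending count; ties by item name (an assumption).
    order = [i for i, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
             if n >= min_count]
    rank = {item: r for r, item in enumerate(order)}
    root = Node(None, None)
    header = {item: [] for item in order}  # node links per item
    # Pass 2: insert each transaction along a shared-prefix path.
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            if item not in node.children:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            node = node.children[item]
            node.count += 1
    return root, header

def show(node, depth=0):
    """Print the tree with one indented 'item:count' line per node."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
root, header = build_fp_tree(transactions, min_count=2)
show(root)
```

The counts of the nodes linked from one header entry add up to that item's support count, for example the three I3 nodes sum to 6.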
Mining Frequent Itemsets:
FP-Growth – FP-tree Construction

Bottom-up algorithm: start from the leaves and go up to the root

[Figure: the completed FP-tree with its header table, repeated from the previous slide]

Mining Frequent Itemsets:
FP-Growth – Conditional FP-tree Construction

For I5:
▪ Eliminate transactions not including I5 (T100 and T800 remain)
▪ Eliminate I5 from the remaining transactions
▪ Build a tree from what is left: after T100 the path is I2:1 – I1:1; after T800 it is I2:2 – I1:2 – I3:1

Mining Frequent Itemsets:
FP-Growth – Conditional FP-tree Construction

For I4:
▪ Eliminate transactions not including I4 (T200 and T400 remain)
▪ Eliminate I4 from the remaining transactions
▪ Resulting tree: null { } – I2:2 – I1:1

Mining Frequent Itemsets:
FP-Growth – Conditional FP-tree Construction

For I3:
▪ Eliminate transactions not including I3 (T300, T500, T600, T700, T800, and T900 remain)
▪ Eliminate I3 from the remaining transactions
▪ Resulting tree: two branches under null { }, namely I2:4 – I1:2 and I1:2

Mining Frequent Itemsets:
FP-Growth

The conditional pattern base of an item consists of the paths ending with that item.

Item  Conditional Pattern Base            Conditional FP-tree    Frequent Patterns Generated
I5    {{I2, I1: 1}, {I2, I1, I3: 1}}      <I2:2, I1:2>           {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {{I2, I1: 1}, {I2: 1}}              <I2:2>                 {I2, I4: 2}
I3    {{I2, I1: 2}, {I2: 2}, {I1: 2}}     <I2:4, I1:2>, <I1:2>   {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {{I2: 4}}                           <I2:4>                 {I2, I1: 4}

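The table can be reproduced with a simplified sketch. Note the substitution: instead of walking an explicit FP-tree through node links, this version keeps each conditional database as a set of weighted prefix paths and recurses on those, which yields the same pattern bases and patterns on this example. The names `ORDER`, `conditional_pattern_base`, and `pattern_growth` are assumptions.

```python
from collections import Counter

ORDER = ["I2", "I1", "I3", "I4", "I5"]  # L1 reordered by descending support count
rank = {item: r for r, item in enumerate(ORDER)}

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
# Each transaction becomes a path in L1 order; identical paths are merged.
ordered_db = Counter(tuple(sorted(t, key=rank.get)) for t in transactions)

def conditional_pattern_base(item, paths):
    """Weighted prefixes of the paths that end with (or pass through) `item`."""
    base = Counter()
    for path, n in paths.items():
        if item in path:
            prefix = path[:path.index(item)]
            if prefix:
                base[prefix] += n
    return base

def pattern_growth(paths, min_count, suffix=()):
    """Recursively grow frequent patterns from conditional pattern bases."""
    counts = Counter()
    for path, n in paths.items():
        for item in path:
            counts[item] += n
    found = {}
    for item, n in counts.items():
        if n >= min_count:
            pattern = (item,) + suffix
            found[frozenset(pattern)] = n
            found.update(pattern_growth(
                conditional_pattern_base(item, paths), min_count, pattern))
    return found

print(dict(conditional_pattern_base("I5", ordered_db)))
patterns = pattern_growth(ordered_db, min_count=2)
print(len(patterns))  # 13
```

The 13 patterns are the five frequent items plus the eight multi-item patterns of the table, the same itemsets that Apriori finds on this dataset.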
Pattern Evaluation Methods
