Data Mining and Business Intelligence
Lecture 5
Mining Frequent Patterns, Associations, & Correlations
Apriori, FP-Growth, Evaluation Methods
By Dr. Nora Shoaip
Damanhour University
Faculty of Computers & Information Sciences
Department of Information Systems
2023 - 2024
Outline
▪ The Basics
• Market Basket Analysis
• Frequent Itemsets
• Association Rules
▪ Frequent Itemset Mining Methods
• Apriori Algorithm
• Generating Association Rules from Frequent Itemsets
• FP-Growth
▪ Pattern Evaluation Methods
The Basics: What Is Frequent Pattern Analysis?
• Frequent pattern: a pattern (a set of items, subsequences, substructures, etc.) that occurs frequently in a data set
• First proposed by Agrawal, Imielinski, and Swami [AIS93] in the context of frequent itemsets and association rule mining
The Basics
Motivation: Finding inherent regularities in data
What products were often purchased together? Beer and diapers?!
What are the subsequent purchases after buying a PC?
What kinds of DNA are sensitive to this new drug?
Can we automatically classify web documents?
Applications
Basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis
The Basics : Frequent Itemsets
Itemset X = {x1, …, xk}, e.g. X = {A, B, C, D, E, F}
Find all the rules X → Y with minimum support and confidence
▪ support, s: probability that a transaction contains X ∪ Y
▪ confidence, c: conditional probability that a transaction having X also contains Y
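The two measures above can be written compactly as follows. This is a sketch in assumed notation: σ(·) stands for the support count of an itemset and N for the total number of transactions; neither symbol appears on the slides.

```latex
\[
\text{support}(X \rightarrow Y) = P(X \cup Y) = \frac{\sigma(X \cup Y)}{N},
\qquad
\text{confidence}(X \rightarrow Y) = P(Y \mid X) = \frac{\sigma(X \cup Y)}{\sigma(X)}
\]
```

Here P(X ∪ Y) is read as the probability that a transaction contains every item of both X and Y.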
The Basics : Association Rules
Ex: Let supmin = 50%, confmin = 50%

Transaction-id   Items bought
10               A, B, D
20               A, C, D
30               A, D, E
40               B, E, F
50               B, C, D, E, F

Freq. Pat.: {A:3, B:3, D:4, E:3, AD:3}
Association rules (support, confidence):
A → D (60%, 100%)
D → A (60%, 75%)
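The support and confidence figures of this example can be checked with a few lines of Python. This is a minimal sketch; the helper names `support` and `confidence` are illustrative, not part of the lecture.

```python
# Transactions from the slide's example (transaction id -> items bought).
transactions = {
    10: {"A", "B", "D"},
    20: {"A", "C", "D"},
    30: {"A", "D", "E"},
    40: {"B", "E", "F"},
    50: {"B", "C", "D", "E", "F"},
}

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)

def support(itemset):
    """Fraction of transactions that contain `itemset`."""
    return count(itemset) / len(transactions)

def confidence(lhs, rhs):
    """Estimate of P(rhs | lhs): count(lhs ∪ rhs) / count(lhs)."""
    return count(lhs | rhs) / count(lhs)

print(support({"A", "D"}), confidence({"A"}, {"D"}))  # → 0.6 1.0
print(support({"A", "D"}), confidence({"D"}, {"A"}))  # → 0.6 0.75
```

Both rules clear the 50% thresholds, which is why A → D and D → A are reported as association rules.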
The Basics : Association Rules
▪ If the frequency of itemset I satisfies the min_support count, then I is a frequent itemset
▪ If a rule satisfies both the min_support and min_confidence thresholds, it is said to be strong
▪ The problem of mining association rules is thereby reduced to mining frequent itemsets
▪ Association rule mining becomes a two-step process:
1. Find all frequent itemsets, i.e. those that occur at least as frequently as a predetermined min_support count
2. Generate strong association rules from the frequent itemsets, i.e. rules that satisfy min_support and min_confidence
Mining Frequent Itemsets: Apriori
Goes as follows:
▪ Find frequent 1-itemsets → L1
▪ Use L1 to find frequent 2-itemsets → L2
▪ … until no more frequent k-itemsets can be found
Finding each Lk requires one full scan of the dataset
To improve efficiency, use the Apriori property:
▪ "All nonempty subsets of a frequent itemset must also be frequent": if a set cannot pass a test, all of its supersets will fail the same test as well. If P(I) < min_support, then P(I ∪ A) < min_support
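The Apriori property can be observed directly on data. The sketch below reuses the five transactions of the earlier A to F example and checks that no 2-itemset occurs more often than either of its 1-item subsets; the names `count` and `violations` are illustrative.

```python
from itertools import combinations

# The five transactions of the earlier association-rule example.
transactions = [
    {"A", "B", "D"},
    {"A", "C", "D"},
    {"A", "D", "E"},
    {"B", "E", "F"},
    {"B", "C", "D", "E", "F"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

items = sorted(set().union(*transactions))

# A superset can never be contained in more transactions than its subsets,
# so this list of counterexamples stays empty.
violations = [
    (a, b)
    for a, b in combinations(items, 2)
    if count({a, b}) > min(count({a}), count({b}))
]
print(violations)  # → []
```

This is the reason an infrequent itemset can be discarded together with all of its supersets without counting them.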
Transactional data example
N = 9, min_supp count = 2

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan the dataset for the count of each candidate (C1), then compare each candidate's support count with min_support to obtain L1. Every candidate passes here, so C1 and L1 are identical:

Itemset   Support count
{I1}      6
{I2}      7
{I3}      6
{I4}      2
{I5}      2
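The first scan can be sketched with `collections.Counter`. The variable names (`c1`, `l1`, `MIN_SUPPORT_COUNT`) are chosen here for illustration only.

```python
from collections import Counter

# The nine transactions of the running example.
transactions = {
    "T100": ["I1", "I2", "I5"],
    "T200": ["I2", "I4"],
    "T300": ["I2", "I3"],
    "T400": ["I1", "I2", "I4"],
    "T500": ["I1", "I3"],
    "T600": ["I2", "I3"],
    "T700": ["I1", "I3"],
    "T800": ["I1", "I2", "I3", "I5"],
    "T900": ["I1", "I2", "I3"],
}
MIN_SUPPORT_COUNT = 2

# C1: one pass over the data, counting every single item.
c1 = Counter(item for items in transactions.values() for item in items)

# L1: keep only the candidates that reach the minimum support count.
l1 = {item: n for item, n in c1.items() if n >= MIN_SUPPORT_COUNT}

print(sorted(l1.items()))
# → [('I1', 6), ('I2', 7), ('I3', 6), ('I4', 2), ('I5', 2)]
```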
Generate C2 candidates from L1 by joining L1 ⋈ L1, scan the dataset for the count of each candidate, then compare each candidate's support count with min_supp to obtain L2.

C2:
Itemset    Support count
{I1, I2}   4
{I1, I3}   4
{I1, I4}   1
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
{I3, I4}   0
{I3, I5}   1
{I4, I5}   0

L2:
Itemset    Support count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2
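The C2 and L2 tables can be reproduced as follows. This is a sketch that generates the pairs with `itertools.combinations`, which for 1-itemsets gives the same result as the join L1 ⋈ L1; variable names are illustrative.

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUPPORT_COUNT = 2
l1 = ["I1", "I2", "I3", "I4", "I5"]  # frequent 1-itemsets from the previous step

# C2: every pair of frequent items.
c2 = [frozenset(pair) for pair in combinations(l1, 2)]

# Second scan: count how many transactions contain each candidate pair.
counts = {c: sum(1 for t in transactions if c <= t) for c in c2}

# L2: candidates that reach the minimum support count.
l2 = {c: n for c, n in counts.items() if n >= MIN_SUPPORT_COUNT}

for itemset, n in l2.items():
    print(sorted(itemset), n)
```

Of the ten candidate pairs, six survive the comparison with min_supp.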
Generate C3 candidates from L2 by joining L2 ⋈ L2. Two joining (lexicographically ordered) k-itemsets must share their first k-1 items, so {I1, I2} is not joined with {I2, I4}.
C3 = L2 ⋈ L2 = {{I1, I2, I3}, {I1, I2, I5}, {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, {I2, I4, I5}}
Not all subsets are frequent → Prune (Apriori property): {I1, I3, I5}, {I2, I3, I4}, {I2, I3, I5}, and {I2, I4, I5} each contain a 2-item subset that is not in L2.

L2:
Itemset    Support count
{I1, I2}   4
{I1, I3}   4
{I1, I5}   2
{I2, I3}   4
{I2, I4}   2
{I2, I5}   2

C3 after pruning: {I1, I2, I3}, {I1, I2, I5}
Scan the dataset for the count of each candidate, then compare each candidate's support count with min_supp to obtain L3:

Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2
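The join and prune steps can be sketched as one function. The name `apriori_gen` and the choice of sorted tuples to represent lexicographically ordered itemsets are assumptions made for this illustration.

```python
from itertools import combinations

def apriori_gen(prev_frequent):
    """Return (joined, kept) candidate lists built from frequent k-itemsets.

    Each itemset is a sorted tuple, so "share the first k-1 items"
    becomes a simple prefix comparison.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)

    # Join: merge two itemsets that agree on everything but the last item.
    joined = [
        a + (b[-1],)
        for i, a in enumerate(prev)
        for b in prev[i + 1:]
        if a[:-1] == b[:-1]
    ]

    # Prune: keep a candidate only if all of its k-item subsets are frequent.
    kept = [
        cand for cand in joined
        if all(sub in prev_set for sub in combinations(cand, len(cand) - 1))
    ]
    return joined, kept

l2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
joined, c3 = apriori_gen(l2)
print(len(joined))  # → 6
print(c3)           # → [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Pruning before the scan means only two of the six joined candidates have to be counted against the dataset.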
Joining L3 ⋈ L3 ({I1, I2, I3} with {I1, I2, I5}) yields the single candidate itemset {I1, I2, I3, I5}
Not all of its subsets are frequent → Prune
C4 = ∅ → Terminate
Apriori Algorithm
Generate Ck using Lk-1 to find Lk
▪ Join
▪ Prune
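Putting the level-wise loop together, a compact sketch of Apriori might look like this. It is an illustrative implementation under the assumption that itemsets are stored as sorted tuples, not the lecture's reference code; the function name `apriori` is chosen here.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset (sorted tuple): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [(i,) for i in items]          # C1
    frequent = {}
    while candidates:
        # One full scan of the dataset per level.
        counts = {c: sum(1 for t in transactions if t.issuperset(c))
                  for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}  # Lk
        frequent.update(level)
        prev = sorted(level)
        prev_set = set(prev)
        # Join itemsets sharing the same prefix, then prune any candidate
        # that has an infrequent subset one item shorter.
        candidates = [
            a + (b[-1],)
            for i, a in enumerate(prev)
            for b in prev[i + 1:]
            if a[:-1] == b[:-1]
            and all(s in prev_set for s in combinations(a + (b[-1],), len(a)))
        ]
    return frequent

data = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
result = apriori(data, min_count=2)
print(len(result))                  # → 13
print(result[("I1", "I2", "I5")])   # → 2
```

On the running example this yields 5 + 6 + 2 = 13 frequent itemsets, and the loop stops once no candidate survives pruning.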
Mining Frequent Itemsets: Generating Association Rules from Frequent Itemsets

Frequent 3-itemsets (L3):
Itemset        Support count
{I1, I2, I3}   2
{I1, I2, I5}   2

Rules generated from the nonempty subsets of {I1, I2, I5}:
Nonempty subset   Association rule    Confidence
{I1, I2}          {I1, I2} → I5       2/4 = 50%
{I1, I5}          {I1, I5} → I2       2/2 = 100%
{I2, I5}          {I2, I5} → I1       2/2 = 100%
{I1}              I1 → {I2, I5}       2/6 = 33%
{I2}              I2 → {I1, I5}       2/7 = 29%
{I5}              I5 → {I1, I2}       2/2 = 100%

For a min_confidence = 70%, only the three rules with 100% confidence are strong.
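Rule generation for one frequent itemset can be sketched as below. The support counts are copied from the tables of the running example; the function name `rules_from` is an assumption for this illustration.

```python
from itertools import combinations

# Support counts taken from L1, L2, and L3 of the running example.
support_count = {
    frozenset(["I1"]): 6,
    frozenset(["I2"]): 7,
    frozenset(["I5"]): 2,
    frozenset(["I1", "I2"]): 4,
    frozenset(["I1", "I5"]): 2,
    frozenset(["I2", "I5"]): 2,
    frozenset(["I1", "I2", "I5"]): 2,
}

def rules_from(itemset, min_conf):
    """List (lhs, rhs, confidence) for every rule lhs → rhs meeting min_conf."""
    itemset = frozenset(itemset)
    strong = []
    # Every nonempty proper subset serves once as the left-hand side.
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            lhs = frozenset(lhs)
            conf = support_count[itemset] / support_count[lhs]
            if conf >= min_conf:
                strong.append((sorted(lhs), sorted(itemset - lhs), conf))
    return strong

for lhs, rhs, conf in rules_from(["I1", "I2", "I5"], min_conf=0.70):
    print(lhs, "→", rhs, f"{conf:.0%}")
```

With min_conf = 0.70 the loop keeps the three rules whose confidence is 2/2.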
FP-Growth
▪ To avoid costly candidate generation
▪ Divide-and-conquer strategy:
• Compress the database representing frequent items into a frequent pattern tree (FP-tree): 2 passes over the dataset
• Divide the compressed database (FP-tree) into conditional databases, then mine each for frequent itemsets by traversing the FP-tree
FP-Growth
Transactional data example
N = 9, min_supp count = 2

TID    List of items
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3

Scan the dataset for the count of each candidate (C1), compare each candidate's support count with min_supp, and reorder the result by descending support count (L1 - Reordered):

C1                        L1 - Reordered
Itemset   Support count   Itemset   Support count
{I1}      6               {I2}      7
{I2}      7               {I1}      6
{I3}      6               {I3}      6
{I4}      2               {I4}      2
{I5}      2               {I5}      2
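The first pass, together with the reordering of each transaction, can be sketched as follows. Breaking ties between equal counts by item name is an assumption; it happens to reproduce the order shown in the slide.

```python
from collections import Counter

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
MIN_SUPPORT_COUNT = 2

# Pass 1: count single items and drop the infrequent ones.
counts = Counter(item for t in transactions for item in t)
frequent = [i for i in counts if counts[i] >= MIN_SUPPORT_COUNT]

# Global order: descending support count, ties broken by item name (assumed).
order = sorted(frequent, key=lambda i: (-counts[i], i))

# Rewrite every transaction in that order, ready for FP-tree insertion.
ordered = [[i for i in order if i in t] for t in transactions]

print(order)       # → ['I2', 'I1', 'I3', 'I4', 'I5']
print(ordered[0])  # → ['I2', 'I1', 'I5']
```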
FP-Growth – FP-tree Construction
The FP-tree starts as a single root node, null { }. Each entry of the L1 - Reordered table carries a node link into the tree:

Itemset   Support count   Node link
{I2}      7
{I1}      6
{I3}      6
{I4}      2
{I5}      2
[Figure: FP-tree after inserting T100 (I1, I2, I5), reordered as I2, I1, I5: null { } → I2:1 → I1:1 → I5:1]
Order of items is kept throughout path construction, with common prefixes shared whenever applicable
[Figure: FP-tree after inserting T200 (I2, I4): the shared prefix node is incremented to I2:2 and a new child I4:1 is added under I2]
[Figure: FP-tree after inserting T300 (I2, I3): the shared prefix node is incremented to I2:3 and a new child I3:1 is added under I2]
[Figure: complete FP-tree after inserting all nine transactions; the node links of the L1 - Reordered table are used for tree traversal]
null { }
  I2:7
    I1:4
      I5:1
      I3:2
        I5:1
      I4:1
    I3:2
    I4:1
  I1:2
    I3:2
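The insertion procedure illustrated in the preceding slides can be sketched as below. The `Node` class, `build_fp_tree`, and the `links` dictionary (a simplified stand-in for the node-link chains) are illustrative names, not part of the lecture.

```python
class Node:
    """One FP-tree node: an item, its count, its parent, and its children."""
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    """Insert transactions (already in L1 order) and return (root, links)."""
    root = Node(None)
    links = {}  # item -> list of tree nodes holding that item
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                # No shared prefix from here on: start a new branch.
                child = Node(item, node)
                node.children[item] = child
                links.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, links

# The nine transactions, each reordered as I2, I1, I3, I4, I5.
ordered = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]
root, links = build_fp_tree(ordered)
print(root.children["I2"].count, root.children["I1"].count)  # → 7 2
print(len(links["I3"]))                                      # → 3
```

The three I3 nodes sit on three different paths, which is why a node link per item is needed to reach all of them during mining.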
Bottom-up algorithm: start from the leaves and go up to the root
FP-Growth – Conditional FP-tree Construction
For I5:
▪ Eliminate transactions not including I5 (T100 and T800 remain)
▪ Eliminate I5 from the remaining transactions
[Figure: conditional tree for I5, built step by step from the remaining prefix paths; node labels shown: I1:1, then I1:2 and I3:1]
For I4:
▪ Eliminate transactions not including I4 (T200 and T400 remain)
[Figure: conditional tree for I4; node label shown: I1:1]
For I3:
▪ Eliminate transactions not including I3
▪ Eliminate I3 from the remaining transactions
[Figure: conditional tree for I3; node labels shown: I2:4 with child I1:2, and a separate branch I1:2]
FP-Growth
The conditional pattern base of an item consists of the paths ending with that item.

Item  Conditional Pattern Base            Conditional FP-tree     Frequent Patterns Generated
I5    {{I2, I1: 1}, {I2, I1, I3: 1}}      <I2:2, I1:2>            {I2, I5: 2}, {I1, I5: 2}, {I2, I1, I5: 2}
I4    {{I2, I1: 1}, {I2: 1}}              <I2:2>                  {I2, I4: 2}
I3    {{I2, I1: 2}, {I2: 2}, {I1: 2}}     <I2:4, I1:2>, <I1:2>    {I2, I3: 4}, {I1, I3: 4}, {I2, I1, I3: 2}
I1    {{I2: 4}}                           <I2:4>                  {I2, I1: 4}
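The conditional pattern bases in the table can be reproduced with a short sketch. As a simplifying assumption, it reads the prefix paths straight from the reordered transactions instead of walking the FP-tree; both give the same prefixes and counts for this example. The function names are illustrative.

```python
from collections import Counter

# The nine transactions, each reordered as I2, I1, I3, I4, I5.
ordered = [
    ["I2", "I1", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I2", "I1", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I2", "I1", "I3", "I5"], ["I2", "I1", "I3"],
]

def conditional_pattern_base(item):
    """Map each nonempty prefix path that ends just before `item` to its count."""
    base = Counter()
    for t in ordered:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:
                base[prefix] += 1
    return dict(base)

def conditional_item_counts(item):
    """Support of each item inside the conditional pattern base of `item`."""
    totals = Counter()
    for prefix, n in conditional_pattern_base(item).items():
        for i in prefix:
            totals[i] += n
    return dict(totals)

print(conditional_pattern_base("I5"))
# → {('I2', 'I1'): 1, ('I2', 'I1', 'I3'): 1}
print(conditional_item_counts("I5"))
# → {'I2': 2, 'I1': 2, 'I3': 1}
```

With min_supp count = 2, I3 (count 1) drops out of the conditional base of I5, leaving the conditional FP-tree <I2:2, I1:2>.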
Pattern Evaluation Methods
