Association mining
Support
A transaction t is said to support an item Ii if Ii is present in t.
t is said to support a subset of items X ⊆ A if t supports each item I in X.
An itemset X ⊆ A has support s in T, denoted by s(X)T, if s% of the transactions in T support X.
The support of an item is the percentage of transactions in which that item occurs.
The rule X -> Y holds with support sup in T (the transaction data set) if sup% of transactions contain X ∪ Y.
sup = Pr(X ∪ Y).
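As a minimal sketch of this definition, support can be computed by counting the transactions that contain every item of X. The function name support and the four toy transactions are assumptions made for illustration; they do not come from the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


# Hypothetical toy data, not taken from the slides.
transactions = [
    {"beer", "diaper"},
    {"beer"},
    {"diaper", "milk"},
    {"beer", "diaper", "milk"},
]

sup_beer_diaper = support({"beer", "diaper"}, transactions)
print(sup_beer_diaper)  # 0.5 (2 of the 4 transactions contain both items)
```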
Support Example
A = {beer, diaper}

No. of transactions   Items
20,000                Beer
30,000                Diaper
10,000                Diaper and beer
500,000               Total transactions

[Figure: Venn diagram of customers who buy beer, customers who buy diapers, and customers who buy both.]

Support(beer, diaper) = 10,000 / 500,000 = 2%
Confidence
For a given transaction database T, an association rule is an expression of the form X -> Y, where X and Y are subsets of A.
X -> Y holds in T with confidence τ if τ% of the transactions in T that support X also support Y.
conf = Pr(Y | X)
The confidence of an association rule X -> Y is the ratio of the number of transactions that contain X ∪ Y to the number of transactions that contain X.
Confidence measures how strongly the presence of one itemset depends on the presence of another.
Confidence Example
confidence(X -> Y) = (X ∪ Y).count / X.count
People who buy beer also buy diapers
= no. of transactions (beer, diaper) / no. of beer transactions
= 10,000 / 20,000
= 50%
People who buy diapers also buy beer
= no. of transactions (beer, diaper) / no. of diaper transactions
= 10,000 / 30,000
= 33.33%
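The arithmetic can be checked with a short sketch that uses the counts from the beer and diaper table (20,000 beer, 30,000 diaper, 10,000 both, 500,000 in total). The function and variable names are assumptions chosen for illustration.

```python
def confidence(count_xy, count_x):
    """confidence(X -> Y) = (X ∪ Y).count / X.count"""
    return count_xy / count_x


# Counts taken from the beer/diaper table.
n_beer, n_diaper, n_both, n_total = 20_000, 30_000, 10_000, 500_000

support_both = n_both / n_total                    # 2%
conf_beer_to_diaper = confidence(n_both, n_beer)   # beer -> diaper
conf_diaper_to_beer = confidence(n_both, n_diaper) # diaper -> beer

print(f"{support_both:.2%} {conf_beer_to_diaper:.2%} {conf_diaper_to_beer:.2%}")
```

The two directions of the rule differ only in the denominator, which is why a rule and its reverse generally have different confidence.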
Confidence Example
Confidence(ANN -> CC)
= no. of transactions with both books / no. of transactions with ANN
= 4/4 = 100%
Confidence(CC -> ANN)
= no. of transactions with both books / no. of transactions with CC
= 4/6 = 66.67%
Frequent Set
Let T be the transaction database and σ be the user-specified minimum support.
An itemset X ⊆ A is said to be a frequent itemset in T with respect to σ if s(X)T ≥ σ.
Example: let σ = 50%.
• {ANN, CC, TC} is a frequent set, as it is supported by 3 transactions out of 6 (any subset of this set is also a frequent set).
• {ANN, CC, DS} is not a frequent set (so no set that properly contains it is a frequent set).
Methods to discover
Association Rules
Closed
An itemset X is closed in a dataset D if there exists no proper super-itemset Y such that Y has the same support count as X in D.
Closed frequent itemset
An itemset X is a closed frequent itemset in D if X is both closed and frequent in D.
Border set
An itemset is a border set if it is not a frequent set, but all its proper subsets are frequent sets.
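The three definitions can be checked by brute force on a tiny data set. This is only a sketch: the four transactions, the item names a, b, c and the threshold min_count = 2 are assumptions made for illustration, and "proper subsets" is read here as non-empty proper subsets.

```python
from itertools import combinations

# Hypothetical toy data, not taken from the slides.
transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a"}]
min_count = 2  # assumed minimum support count

items = sorted(set().union(*transactions))
all_itemsets = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)]

# Support count of every itemset.
support = {x: sum(1 for t in transactions if x <= t) for x in all_itemsets}

frequent = {x for x in all_itemsets if support[x] >= min_count}

# Closed: no proper superset has the same support count.
closed = {x for x in all_itemsets
          if not any(x < y and support[y] == support[x] for y in all_itemsets)}
closed_frequent = closed & frequent

# Border: not frequent, but every non-empty proper subset is frequent.
border = {x for x in all_itemsets
          if x not in frequent
          and all(frozenset(s) in frequent
                  for k in range(1, len(x))
                  for s in combinations(x, k))}

print(sorted(map(sorted, closed_frequent)))  # [['a'], ['a', 'b'], ['a', 'c']]
print(sorted(map(sorted, border)))           # [['b', 'c']]
```

Here {b} is frequent but not closed, because its superset {a, b} has the same support count of 2.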
[Figure: nested regions showing that maximal frequent itemsets are a subset of closed frequent itemsets, which are a subset of frequent itemsets.]
Example
Assume σ = 20%. As we have 10 records, an itemset supported by at least 2 transactions is a frequent set.

A1  A2  A3  A4  A5  A6  A7  A8  A9
1   0   0   0   1   1   0   1   0
0   1   0   1   0   0   0   1   0
0   0   0   1   1   0   1   0   0
0   1   1   0   0   0   0   0   0
0   0   0   0   1   1   1   0   0
0   1   1   1   0   0   0   0   0
0   1   0   0   0   1   1   0   1
0   0   0   0   1   0   0   0   0
0   0   0   0   0   0   0   1   0
0   0   1   0   1   0   1   0   0
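A brute-force check of this example is sketched below. The ten rows are the records of the 10 x 9 example (columns A1 to A9); the variable names and the choice to stop at itemsets of size 3 are assumptions made for illustration.

```python
from itertools import combinations

# Rows of the example matrix; character j of a row is the value of A(j+1).
rows = [
    "100011010",
    "010100010",
    "000110100",
    "011000000",
    "000011100",
    "011100000",
    "010001101",
    "000010000",
    "000000010",
    "001010100",
]
transactions = [{f"A{j + 1}" for j, bit in enumerate(row) if bit == "1"}
                for row in rows]
items = [f"A{j}" for j in range(1, 10)]
min_count = 2  # sigma = 20% of 10 records

frequent = {}
for k in (1, 2, 3):
    for combo in combinations(items, k):
        count = sum(1 for t in transactions if set(combo) <= t)
        if count >= min_count:
            frequent[combo] = count

for itemset, count in frequent.items():
    print(itemset, count)
```

With this threshold the single items A2 to A8 are frequent, five pairs are frequent, and no triple reaches two supporting records.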
Large Itemset
A large (frequent) itemset is an itemset whose number of occurrences is above a threshold s.
The most common approach to finding association rules is to break the problem into two parts:
1. Find the large itemsets
2. Generate rules from the frequent itemsets
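The two parts can be sketched with a brute-force search. The ten transactions are the ones listed in this deck's FP-tree slides; the thresholds min_count = 2 and min_conf = 0.6 and all names are assumptions made for illustration, and exhaustive enumeration is used only because the data set is tiny.

```python
from itertools import combinations

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
min_count = 2   # assumed minimum support count
min_conf = 0.6  # assumed minimum confidence


def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


# Part 1: find the large (frequent) itemsets.
items = sorted(set().union(*transactions))
large = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        c = count(frozenset(combo))
        if c >= min_count:
            large[frozenset(combo)] = c

# Part 2: split each large itemset into X -> Y and keep the confident rules.
# Every subset X of a large itemset is itself large, so large[x] always exists.
rules = {}
for itemset, c in large.items():
    for k in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), k):
            x = frozenset(lhs)
            conf = c / large[x]
            if conf >= min_conf:
                rules[(x, itemset - x)] = conf

print(rules[(frozenset({"A", "E"}), frozenset({"D"}))])  # 1.0
```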
FP-tree Construction

TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

After reading transactions TID = 1, 2:

null
├── A:1
│   └── B:1
└── B:1
    └── C:1
        └── D:1

Header Table: one pointer per item (A, B, C, D, E).
The Header Table and the pointers assist in computing the itemset support.
FP-tree Construction
Reading transaction TID = 3, {A,C,D,E}:

null
├── A:2
│   ├── B:1
│   └── C:1
│       └── D:1
│           └── E:1
└── B:1
    └── C:1
        └── D:1
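The construction steps can be sketched as follows. The Node class and build_fp_tree function are hypothetical names introduced for illustration. Items are inserted in alphabetical order, as in the slides; FP-growth as usually described orders items by decreasing support instead.

```python
class Node:
    """One FP-tree node: an item, a count, and links to parent and children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions):
    root = Node(None)  # the "null" root
    header = {}        # header table: item -> list of nodes carrying that item
    for t in transactions:
        node = root
        for item in sorted(t):  # fixed item order A < B < C < D < E
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1    # shared prefixes only increment counts
            node = child
    return root, header


transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
root, header = build_fp_tree(transactions)

print(root.children["A"].count, root.children["B"].count)  # 7 3
# Support of an item = sum of the counts along its header-table list.
print(sum(n.count for n in header["E"]))  # 3
```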
All Itemsets
E  D  C  B  A
DE  CE  BE  AE  CD  BD  AD  BC  AC  AB
CDE  BDE  ADE  BCE  ACE  ABE  BCD  ACD  ABD  ABC
BCDE  ACDE  ABDE  ABCE  ABCD
ABCDE
Frequent Itemsets
Each itemset in the lattice is tested in turn: frequent?
We can generate all itemsets this way.
We expect the FP-tree to contain far fewer itemsets than the full lattice.
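To see how quickly exhaustive generation grows, the full lattice over the five items can be enumerated in a few lines. This is a sketch for illustration; the variable names are assumptions.

```python
from itertools import combinations

items = ["A", "B", "C", "D", "E"]

# Every non-empty subset of the items, level by level.
all_itemsets = [combo
                for k in range(1, len(items) + 1)
                for combo in combinations(items, k)]

print(len(all_itemsets))  # 31, i.e. 2**5 - 1
```

With n items the lattice has 2**n - 1 non-empty itemsets, which is why testing each one against the database does not scale.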
Using the FP-tree to find frequent itemsets

Transaction Database
TID  Items
1    {A,B}
2    {B,C,D}
3    {A,C,D,E}
4    {A,D,E}
5    {A,B,C}
6    {A,B,C,D}
7    {B,C}
8    {A,B,C}
9    {A,B,D}
10   {B,C,E}

null
├── A:7
│   ├── B:5
│   │   ├── C:3
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:3
    └── C:3
        ├── D:1
        └── E:1

Header table: one pointer per item (A, B, C, D, E).
Bottom-up traversal of the tree.
First, itemsets ending in E, then D, etc., each time a suffix-based class.
Finding Frequent Itemsets
Subproblem: find frequent itemsets ending in E.
We will then see how to compute the support for the possible itemsets.
The same subproblem is then solved for itemsets ending in D, ending in C, ending in B, and ending in A.
[Figures: the FP-tree with the paths for each suffix item highlighted.]
Phase 2
If X is frequent, construct the conditional FP-tree
for X in the following steps
1. Recompute support
2. Prune infrequent items
3. Prune leaves and recurse
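The first two steps can be sketched for the suffix E. Instead of walking an explicit tree, this sketch reads the prefix paths directly from the transactions, which gives the same paths the FP-tree stores in compressed form. The threshold min_count = 2 and all variable names are assumptions made for illustration.

```python
from collections import Counter

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
min_count = 2  # assumed minimum support count
suffix = "E"

# Prefix paths: the items that precede the suffix in each transaction containing it.
prefix_paths = []
for t in transactions:
    if suffix in t:
        ordered = sorted(t)
        prefix_paths.append(ordered[:ordered.index(suffix)])

# 1. Recompute support: count items only within the prefix paths.
counts = Counter(item for path in prefix_paths for item in path)

# 2. Prune infrequent items from every path.
kept = {item for item, c in counts.items() if c >= min_count}
conditional_paths = [[i for i in path if i in kept] for path in prefix_paths]

print(dict(counts))
print(conditional_paths)  # [['A', 'C', 'D'], ['A', 'D'], ['C']]
```

The remaining paths are the ones a conditional FP-tree for E would store; step 3 repeats the same procedure on them for each remaining item.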
Example
Phase 1: construct prefix tree
Find all prefix paths that contain E:

null
├── A:7
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:3
    └── C:3
        └── E:1
Example
Recompute support: each node on a prefix path keeps only the count of the transactions that contain E, so C:3 becomes C:1, B:3 becomes B:1, and A:7 becomes A:2.

null
├── A:2
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:1
    └── C:1
        └── E:1
Example
Truncate: delete the nodes of E.

null
├── A:2
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── B:1
    └── C:1
Example
Prune infrequent: in the conditional FP-tree some nodes may have support less than minsup, e.g., B needs to be pruned.

null
├── A:2
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── C:1
Example
Phase 1
Find all prefix paths that contain D (DE) in the conditional FP-tree:

null
└── A:2
    ├── C:1
    │   └── D:1
    └── D:1

{D,E} is frequent.
Phase 2
Recompute support and prune nodes: C:1 has small support and is pruned, leaving

null
└── A:2
Example
Phase 1
Find all prefix paths that contain C (CE) in the conditional FP-tree:

null
├── A:2
│   └── C:1
└── C:1

{C,E} is frequent.
Phase 2
Recomputing support gives A:1. The node is pruned, leaving only the null root.
Example
Phase 1
Find all prefix paths that contain A (AE) in the conditional FP-tree:

null
└── A:2

{A,E} is frequent.
We proceed with D.
Example
Ending in D: find all prefix paths that contain D.

null
├── A:7
│   ├── B:5
│   │   ├── C:3
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── B:3
    └── C:3
        └── D:1

Recompute support:

null
├── A:4
│   ├── B:2
│   │   ├── C:1
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   └── D:1
└── B:1
    └── C:1
        └── D:1

Prune nodes:

null
├── A:4
│   ├── B:2
│   │   └── C:1
│   └── C:1
└── B:1
    └── C:1

And so on.
Observations
• At each recursive step we solve a subproblem:
  construct the prefix tree, compute the new support, prune nodes.
• Subproblems are disjoint, so we never consider the same itemset twice.
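The recursion can be sketched on lists of prefix paths, which carry the same information an FP-tree stores in compressed form. This is not a full FP-tree implementation: the function grow, the threshold of 2 and the other names are assumptions made for illustration.

```python
from collections import Counter


def grow(paths, suffix, min_count, found):
    """Find frequent itemsets ending in `suffix` from weighted prefix paths."""
    # Compute the new support of every item within these paths.
    counts = Counter()
    for items, n in paths:
        for item in items:
            counts[item] += n
    # Bottom-up: handle the last item first (E, then D, ...).
    for item in sorted(counts, reverse=True):
        if counts[item] < min_count:
            continue  # prune infrequent items
        itemset = (item,) + suffix
        found[itemset] = counts[item]
        # Subproblem: the prefixes that precede `item` in paths containing it.
        conditional = [(items[:items.index(item)], n)
                       for items, n in paths if item in items]
        grow(conditional, itemset, min_count, found)


transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "C", "D", "E"}, {"A", "D", "E"},
    {"A", "B", "C"}, {"A", "B", "C", "D"}, {"B", "C"}, {"A", "B", "C"},
    {"A", "B", "D"}, {"B", "C", "E"},
]
paths = [(tuple(sorted(t)), 1) for t in transactions]
found = {}
grow(paths, (), 2, found)

ending_in_e = sorted(s for s in found if s[-1] == "E")
print(ending_in_e)
# [('A', 'D', 'E'), ('A', 'E'), ('C', 'E'), ('D', 'E'), ('E',)]
```

Each itemset is reached through exactly one chain of suffixes (its last item, then its second-to-last, and so on), which is why no itemset is produced twice.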