ECLAT and Hash Tree

By: Kashif Ayyub

ECLAT Algorithm

• ECLAT stands for Equivalence Class Clustering and bottom-up Lattice Traversal.
• It is one of the popular methods for frequent itemset generation.
• It is generally more efficient and scalable than the Apriori algorithm.
• ECLAT works on a vertical data layout (each item is stored with the list of transaction IDs that contain it) and explores itemsets depth-first, like the depth-first search of a graph.
• The vertical layout lets ECLAT compute support by intersecting TID lists instead of rescanning the database, which makes it faster than Apriori in most cases.
• It needs few scans of the database (best case 2).

ECLAT Algorithm - Pseudocode

1. Get the tidlist of each item (one database scan).
2. The tidlist of {a} is exactly the list of transactions containing {a}.
3. Intersect the tidlist of {a} with the tidlists of all other items. This gives the tidlists of {a,b}, {a,c}, {a,d}, ..., which together form the {a}-conditional database (the database with {a} removed).
4. Repeat from step 1 on the {a}-conditional database.
5. Repeat for all other items.
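
The steps above can be sketched in Python as a recursive, depth-first search over conditional databases. This is a minimal illustration rather than a reference implementation: the function names (`build_tidlists`, `eclat`, `frequent_itemsets`) are made up for this sketch, and the data is the nine-transaction example used in the following slides.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Example transactions from the slides (TID -> items).
TRANSACTIONS: Dict[int, Set[str]] = {
    1: {"Bread", "Butter", "Jam"},
    2: {"Butter", "Coke"},
    3: {"Butter", "Milk"},
    4: {"Bread", "Butter", "Coke"},
    5: {"Bread", "Milk"},
    6: {"Butter", "Milk"},
    7: {"Bread", "Milk"},
    8: {"Bread", "Butter", "Milk", "Jam"},
    9: {"Bread", "Butter", "Milk"},
}


def build_tidlists(transactions: Dict[int, Set[str]]) -> Dict[str, Set[int]]:
    """Steps 1-2: one scan that records, for each item, the TIDs containing it."""
    tidlists: Dict[str, Set[int]] = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def eclat(
    prefix: FrozenSet[str],
    items: List[Tuple[str, Set[int]]],
    min_support: int,
    out: Dict[FrozenSet[str], int],
) -> None:
    """Steps 3-5: every (item, tidlist) pair in `items` is frequent together with `prefix`."""
    for i, (item, tids) in enumerate(items):
        itemset = prefix | {item}
        out[itemset] = len(tids)
        # Conditional database of `itemset`: intersect with the items that follow.
        conditional = []
        for other, other_tids in items[i + 1:]:
            common = tids & other_tids
            if len(common) >= min_support:
                conditional.append((other, common))
        if conditional:
            eclat(itemset, conditional, min_support, out)


def frequent_itemsets(transactions, min_support: int) -> Dict[FrozenSet[str], int]:
    tidlists = build_tidlists(transactions)
    items = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    out: Dict[FrozenSet[str], int] = {}
    eclat(frozenset(), items, min_support, out)
    return out


result = frequent_itemsets(TRANSACTIONS, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Only the first call touches the transactions; every deeper level works purely on intersected TID sets.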
Transactions Dataset

Transactions
TID Items
1 Bread, Butter, Jam
2 Butter, Coke
3 Butter, Milk
4 Bread, Butter, Coke
5 Bread, Milk
6 Butter, Milk
7 Bread, Milk
8 Bread, Butter, Milk, Jam
9 Bread, Butter, Milk

Frequent Itemset Generation

Frequent 1-itemsets (TID set of each item in the transactions above)

Items       TID Set
{Bread}     1, 4, 5, 7, 8, 9
{Butter}    1, 2, 3, 4, 6, 8, 9
{Milk}      3, 5, 6, 7, 8, 9
{Coke}      2, 4
{Jam}       1, 8

Min_Support=2
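
The conversion from the horizontal layout (TID with its items) to the vertical layout (item with its TID set) can be sketched as follows. The helper name `to_vertical` is an assumption made for this example.

```python
from collections import defaultdict

MIN_SUPPORT = 2

# Horizontal layout: TID -> items, as in the Transactions table.
transactions = {
    1: ["Bread", "Butter", "Jam"],
    2: ["Butter", "Coke"],
    3: ["Butter", "Milk"],
    4: ["Bread", "Butter", "Coke"],
    5: ["Bread", "Milk"],
    6: ["Butter", "Milk"],
    7: ["Bread", "Milk"],
    8: ["Bread", "Butter", "Milk", "Jam"],
    9: ["Bread", "Butter", "Milk"],
}


def to_vertical(horizontal):
    """Return item -> set of TIDs (the vertical layout ECLAT works on)."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


tid_sets = to_vertical(transactions)

# An item's support is simply the size of its TID set.
frequent_1 = {item: tids for item, tids in tid_sets.items() if len(tids) >= MIN_SUPPORT}

for item, tids in frequent_1.items():
    print(f"{{{item}}}: {sorted(tids)}")
```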

Frequent Itemset Generation

Candidate 2-itemsets (intersections of the frequent 1-itemset TID sets)

Items               TID Set
{Bread, Butter}     1, 4, 8, 9
{Bread, Milk}       5, 7, 8, 9
{Bread, Coke}       4
{Bread, Jam}        1, 8
{Butter, Milk}      3, 6, 8, 9
{Butter, Coke}      2, 4
{Butter, Jam}       1, 8
{Milk, Coke}        (empty)
{Milk, Jam}         8
{Coke, Jam}         (empty)

Frequent 2-itemsets (candidates whose TID set has at least 2 entries)

Items               TID Set
{Bread, Butter}     1, 4, 8, 9
{Bread, Milk}       5, 7, 8, 9
{Bread, Jam}        1, 8
{Butter, Milk}      3, 6, 8, 9
{Butter, Coke}      2, 4
{Butter, Jam}       1, 8

Min_Support=2
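
The 2-itemset step is a pairwise set intersection followed by a support filter. The sketch below starts from the 1-itemset TID sets listed in the slides; the variable names are illustrative only.

```python
from itertools import combinations

MIN_SUPPORT = 2

# TID sets of the frequent 1-itemsets, copied from the table.
TID_SETS = {
    "Bread": {1, 4, 5, 7, 8, 9},
    "Butter": {1, 2, 3, 4, 6, 8, 9},
    "Milk": {3, 5, 6, 7, 8, 9},
    "Coke": {2, 4},
    "Jam": {1, 8},
}

# Every pair of items is a candidate; its TID set is the intersection.
candidates = {
    (a, b): TID_SETS[a] & TID_SETS[b] for a, b in combinations(TID_SETS, 2)
}

# Keep only the candidates that meet the minimum support.
frequent_2 = {
    pair: tids for pair, tids in candidates.items() if len(tids) >= MIN_SUPPORT
}

for pair, tids in candidates.items():
    status = "frequent" if pair in frequent_2 else "dropped"
    print(pair, sorted(tids), status)
```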

Frequent Itemset Generation

Candidate 3-itemsets (from intersecting frequent 2-itemset TID sets)

Items                     TID Set
{Bread, Butter, Milk}     8, 9
{Bread, Butter, Jam}      1, 8
{Bread, Butter, Coke}     4
{Bread, Milk, Jam}        8
{Butter, Milk, Coke}      (empty)

Frequent 3-itemsets

Items                     TID Set
{Bread, Butter, Milk}     8, 9
{Bread, Butter, Jam}      1, 8

Min_Support=2

Frequent Itemset Generation

Candidate 4-itemset (intersection of the two frequent 3-itemsets)

Items                          TID Set
{Bread, Butter, Milk, Jam}     8

Min_Support=2

The candidate occurs in only one transaction, which is below Min_Support, so there is no frequent 4-itemset and the search stops.

Frequent Itemset Generation

Frequent 1-itemsets

Items       TID Set
{Bread}     1, 4, 5, 7, 8, 9
{Butter}    1, 2, 3, 4, 6, 8, 9
{Milk}      3, 5, 6, 7, 8, 9
{Coke}      2, 4
{Jam}       1, 8

Frequent 2-itemsets

Items               TID Set
{Bread, Butter}     1, 4, 8, 9
{Bread, Milk}       5, 7, 8, 9
{Bread, Jam}        1, 8
{Butter, Milk}      3, 6, 8, 9
{Butter, Coke}      2, 4
{Butter, Jam}       1, 8

Frequent 3-itemsets

Items                     TID Set
{Bread, Butter, Milk}     8, 9
{Bread, Butter, Jam}      1, 8

Min_Support=2

Hash Tree

Generate Hash Tree
Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}
You need:
• Hash function
• Max leaf size: max number of itemsets stored in a leaf node (if number of candidate itemsets exceeds max leaf size, split the node)

Hash function (three branches per node):
• h(v)=1: items 1, 4, 7 (left branch)
• h(v)=2: items 2, 5, 8 (middle branch)
• h(v)=0: items 3, 6, 9 (right branch)
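
The three groups shown for the hash function are consistent with h(v) = v mod 3. The slides do not write the formula out, so treat it as an assumption in this short sketch, which regroups items 1 to 9 by branch.

```python
def h(v: int) -> int:
    """Assumed hash function: remainder of the item value modulo 3."""
    return v % 3


# Branch reached for each hash value, following the slide's figure.
BRANCH = {1: "left", 2: "middle", 0: "right"}

groups = {}
for item in range(1, 10):
    groups.setdefault(BRANCH[h(item)], []).append(item)

for branch, items in groups.items():
    print(branch, items)
```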

Insert {1 4 5}: its first item, 1, hashes to the left branch (1,4,7).

Left leaf: 145

Insert {1 2 4}, {4 5 7}, {1 2 5}: their first items (1, 4, 1) also hash to the left branch, so the left leaf now holds four itemsets.

Left leaf: 145, 124, 457, 125

The left leaf holding 145, 124, 457, 125 exceeds the max leaf size, so it is split by hashing on the second item.

Second item in 1,4,7: 145
Second item in 2,5,8: 124, 457, 125

Insert {4 5 8}: first item 4 goes left, second item 5 goes to the middle, so it joins 124, 457, 125.

Second item in 1,4,7: 145
Second item in 2,5,8: 124, 457, 125, 458

The leaf holding 124, 457, 125, 458 exceeds the max leaf size, so it is split by hashing on the third item.

Second item in 1,4,7: 145
Second item in 2,5,8, third item in 1,4,7: 124, 457
Second item in 2,5,8, third item in 2,5,8: 125, 458

Insert {1 5 9}: first item 1 goes left, second item 5 goes to the middle, third item 9 goes right, which starts a new leaf.

Second item in 1,4,7: 145
Second item in 2,5,8, third item in 1,4,7: 124, 457
Second item in 2,5,8, third item in 2,5,8: 125, 458
Second item in 2,5,8, third item in 3,6,9: 159

Insert {1 3 6}: first item 1 goes left, second item 3 goes right, which starts a new leaf in the left subtree.

Second item in 1,4,7: 145
Second item in 2,5,8, third item in 1,4,7: 124, 457
Second item in 2,5,8, third item in 2,5,8: 125, 458
Second item in 2,5,8, third item in 3,6,9: 159
Second item in 3,6,9: 136

Insert {2 3 4} and {5 6 7}: their first items (2 and 5) hash to the middle branch of the root.

Middle leaf of the root: 234, 567
Left subtree (unchanged): 145 | 124, 457 | 125, 458 | 159 | 136

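Insert the remaining itemsets {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}: their first items (3 or 6) hash to the right branch of the root, which is split on the second item once it overflows.

Right subtree:
Second item in 1,4,7: 345
Second item in 2,5,8: 356, 357, 689
Second item in 3,6,9: 367, 368

Middle leaf of the root: 234, 567
Left subtree (unchanged): 145 | 124, 457 | 125, 458 | 159 | 136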
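
The insertion sequence can be replayed with a small hash tree sketch. Two assumptions are not stated in the slides: the hash function is taken as h(v) = v mod 3, and the max leaf size is taken as 3, which is the value consistent with the splits shown. The `Node` class and the `leaves` helper are made up for this example.

```python
CANDIDATES = [
    (1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
    (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
    (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8),
]
MAX_LEAF_SIZE = 3  # assumption inferred from when the slides split a leaf


class Node:
    def __init__(self, depth):
        self.depth = depth      # index of the item this node hashes on
        self.itemsets = []      # contents while the node is a leaf
        self.children = None    # hash value -> Node once the node is interior

    def _child_for(self, itemset):
        key = itemset[self.depth] % 3  # assumed hash function h(v) = v mod 3
        if key not in self.children:
            self.children[key] = Node(self.depth + 1)
        return self.children[key]

    def insert(self, itemset):
        if self.children is not None:
            self._child_for(itemset).insert(itemset)
            return
        self.itemsets.append(itemset)
        # Split an overflowing leaf if there is still an item left to hash on.
        if len(self.itemsets) > MAX_LEAF_SIZE and self.depth < len(itemset):
            pending, self.itemsets, self.children = self.itemsets, [], {}
            for stored in pending:
                self._child_for(stored).insert(stored)


def leaves(node, path=()):
    """Yield (hash values along the path, itemsets) for every leaf."""
    if node.children is None:
        yield path, node.itemsets
    else:
        for key, child in node.children.items():
            yield from leaves(child, path + (key,))


root = Node(depth=0)
root.children = {}  # the root hashes on the first item from the start
for candidate in CANDIDATES:
    root.insert(candidate)

tree = dict(leaves(root))
for path, itemsets in tree.items():
    print(path, itemsets)
```

A path such as `(1, 2, 0)` reads as left, middle, right in the slide's picture, because hash values 1, 2, 0 correspond to the groups 1,4,7, 2,5,8 and 3,6,9.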

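Hash Tree
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4}, {5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

Hash Function: 1,4,7 (left), 2,5,8 (middle), 3,6,9 (right)

Candidate Hash Tree (indentation shows depth; each level hashes on the next item)

Root
  1,4,7:
    1,4,7: 145
    2,5,8:
      1,4,7: 124, 457
      2,5,8: 125, 458
      3,6,9: 159
    3,6,9: 136
  2,5,8: 234, 567
  3,6,9:
    1,4,7: 345
    2,5,8: 356, 357, 689
    3,6,9: 367, 368

Hash on 1, 4 or 7: the left branch of the root.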

Hash on 2, 5 or 8: the middle branch of the root, which leads to the leaf 234, 567.

Hash on 3, 6 or 9: the right branch of the root, which leads to the leaves 345; 356, 357, 689; and 367, 368.

Subset Operation
Given a transaction t, what are the possible subsets of size 3?

Transaction t: 1 2 3 5 6
Items: 5
Subset size: 3
Total = 10

Level 1 (fix the first item):
  1 + 2 3 5 6
  2 + 3 5 6
  3 + 5 6

Level 2 (fix the second item):
  1 2 + 3 5 6
  1 3 + 5 6
  1 5 + 6
  2 3 + 5 6
  2 5 + 6
  3 5 + 6

Level 3 (subsets of 3 items):
  123, 125, 126, 135, 136, 156, 235, 236, 256, 356
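
The level-by-level enumeration can be sketched as a recursive generator that fixes one item per level, and checked against `itertools.combinations`. The function name `enumerate_subsets` is illustrative only.

```python
from itertools import combinations
from math import comb


def enumerate_subsets(items, k, prefix=()):
    """Yield all k-subsets of the sorted tuple `items`, fixing one item per level."""
    if len(prefix) == k:
        yield prefix
        return
    needed = k - len(prefix)
    # Stop early enough that `needed` items can still be picked.
    for i in range(len(items) - needed + 1):
        yield from enumerate_subsets(items[i + 1:], k, prefix + (items[i],))


transaction = (1, 2, 3, 5, 6)
subsets = list(enumerate_subsets(transaction, 3))

print(len(subsets), "subsets:", subsets)
print("matches itertools:", subsets == list(combinations(transaction, 3)))
print("C(5, 3) =", comb(5, 3))
```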


Subset Operation Using Hash Tree
Transaction: 1 2 3 5 6

At the root, hash on each item that can be the first item of a 3-subset:
  1 + 2 3 5 6   (left branch: 1,4,7)
  2 + 3 5 6     (middle branch: 2,5,8)
  3 + 5 6       (right branch: 3,6,9)

Subset Operation Using Hash Tree
Transaction: 1 2 3 5 6

Root level:
  1 + 2 3 5 6
  2 + 3 5 6
  3 + 5 6

Second level, under the left branch reached with item 1 (hash on the second item):
  1 2 + 3 5 6   (middle branch: 2,5,8)
  1 3 + 5 6     (right branch: 3,6,9)
  1 5 + 6       (middle branch: 2,5,8)
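
The traversal can be sketched end to end: build the candidate hash tree, then walk it with the transaction and compare only the itemsets in the leaves that are reached. As before, h(v) = v mod 3 and a max leaf size of 3 are assumptions inferred from the figures, and the `Node` and `match` names are hypothetical.

```python
CANDIDATES = [
    (1, 4, 5), (1, 2, 4), (4, 5, 7), (1, 2, 5), (4, 5, 8),
    (1, 5, 9), (1, 3, 6), (2, 3, 4), (5, 6, 7), (3, 4, 5),
    (3, 5, 6), (3, 5, 7), (6, 8, 9), (3, 6, 7), (3, 6, 8),
]
K = 3               # size of every candidate itemset
MAX_LEAF_SIZE = 3   # assumption inferred from the splits in the slides


class Node:
    def __init__(self, depth):
        self.depth = depth
        self.itemsets = []
        self.children = None  # None while the node is a leaf

    def _child_for(self, itemset):
        # Assumed hash function h(v) = v mod 3.
        return self.children.setdefault(itemset[self.depth] % 3, Node(self.depth + 1))

    def insert(self, itemset):
        if self.children is not None:
            self._child_for(itemset).insert(itemset)
            return
        self.itemsets.append(itemset)
        if len(self.itemsets) > MAX_LEAF_SIZE and self.depth < K:
            pending, self.itemsets, self.children = self.itemsets, [], {}
            for stored in pending:
                self._child_for(stored).insert(stored)


def match(node, transaction, start, found):
    """Collect the candidates contained in `transaction` (a sorted tuple)."""
    if node.children is None:
        items = set(transaction)
        found.update(s for s in node.itemsets if set(s) <= items)
        return
    # The item chosen at this depth must leave enough items for the rest.
    last = len(transaction) - (K - node.depth)
    for i in range(start, last + 1):
        child = node.children.get(transaction[i] % 3)
        if child is not None:
            match(child, transaction, i + 1, found)


root = Node(depth=0)
root.children = {}
for candidate in CANDIDATES:
    root.insert(candidate)

found = set()
match(root, (1, 2, 3, 5, 6), 0, found)
print(sorted(found))
```

At the root the loop tries items 1, 2 and 3 only, which corresponds to the 1 +, 2 + and 3 + prefixes in the slide.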
