
CS423

DATA WAREHOUSING AND DATA MINING

Chapter 6d

Frequent Pattern Tree

Dr. Hammad Afzal

hammad.afzal@mcs.edu.pk

Department of Computer Software Engineering


National University of Sciences and Technology (NUST)
USING VERTICAL DATA FORMAT
 ▪ From Book
ECLAT: MINING BY EXPLORING VERTICAL DATA FORMAT

 ➢ Vertical format: t(AB) = {T11, T25, …}

 ➢ tid-list: the list of transaction IDs containing an itemset

 ➢ The support count of an itemset is the length of its tid-list; t(AB) is obtained by intersecting t(A) and t(B).


Itemset   tid-list
I1        T1, T4, T5, T7, T8, T9
I2        T1, T2, T3, T4, T6, T8, T9
I3        T3, T5, T6, T7, T8, T9
I4        T2, T4
I5        T1, T8
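
The tid-list idea can be sketched in a few lines of Python. This is a minimal illustration rather than the book's exact procedure: the names `VERTICAL_DB`, `eclat`, and `mine` are made up for this example, and the minimum support count of 2 is an assumption, since the slide does not state a threshold.

```python
from typing import Dict, List, Set, Tuple

# Vertical database copied from the table above: item -> tid-list (as a set).
VERTICAL_DB: Dict[str, Set[str]] = {
    "I1": {"T1", "T4", "T5", "T7", "T8", "T9"},
    "I2": {"T1", "T2", "T3", "T4", "T6", "T8", "T9"},
    "I3": {"T3", "T5", "T6", "T7", "T8", "T9"},
    "I4": {"T2", "T4"},
    "I5": {"T1", "T8"},
}


def eclat(prefix: Tuple[str, ...],
          candidates: List[Tuple[str, Set[str]]],
          min_count: int,
          out: Dict[Tuple[str, ...], int]) -> None:
    """Depth-first extension of `prefix`; support is the size of the tid-list."""
    for i, (item, tids) in enumerate(candidates):
        if len(tids) < min_count:
            continue  # infrequent, so no superset can be frequent
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        # tid-list of (itemset + other) is the intersection of the two tid-lists
        extensions = [(other, tids & other_tids)
                      for other, other_tids in candidates[i + 1:]]
        eclat(itemset, extensions, min_count, out)


def mine(vertical_db: Dict[str, Set[str]],
         min_count: int) -> Dict[Tuple[str, ...], int]:
    """Return every frequent itemset with its support count."""
    out: Dict[Tuple[str, ...], int] = {}
    items = sorted(vertical_db.items(), key=lambda kv: kv[0])
    eclat((), items, min_count, out)
    return out


# min_count=2 is an assumed threshold, not taken from the slide.
frequent = mine(VERTICAL_DB, min_count=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

For instance, t(I1 I2) = t(I1) ∩ t(I2) = {T1, T4, T8, T9}, so the support count of {I1, I2} is 4. No scan of the original transactions is needed once the tid-lists are built.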

INTERESTINGNESS MEASURE:
CORRELATIONS (LIFT)
 ➢ Suppose a DB with 10,000 transactions
 ▪ 6,000 include games
 ▪ 7,500 include videos
 ▪ 4,000 include both
 ➢ A data mining algorithm with min_sup = 30% and min_conf = 60% finds the rule
 ▪ Buy Game → Buy Video [support 40%, confidence 66.7%]
 ➢ This rule is misleading
 ▪ The overall share of transactions that include videos is 75% > 66.7%.

             Game    Not Game   Sum (row)
Video        4000    3500       7500
Not Video    2000    500        2500
Sum (col.)   6000    4000       10000
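
The support and confidence figures follow directly from the counts in the table. The short sketch below recomputes them; the variable names are chosen for this illustration only.

```python
# Counts taken from the contingency table above.
n_total = 10_000
n_game = 6_000
n_video = 7_500
n_both = 4_000

min_sup, min_conf = 0.30, 0.60

support = n_both / n_total      # fraction of transactions with game and video
confidence = n_both / n_game    # fraction of game transactions that include video
p_video = n_video / n_total     # baseline share of transactions with video

# The rule passes both thresholds ...
passes = support >= min_sup and confidence >= min_conf
# ... yet buying a game makes a video purchase less likely than the baseline.
misleading = confidence < p_video

print(f"support={support:.2f} confidence={confidence:.3f} P(video)={p_video:.2f}")
print("passes thresholds:", passes, "| below baseline:", misleading)
```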


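INTERESTINGNESS MEASURE:
CORRELATIONS (LIFT)
 ➢ Measure of dependent/correlated events: lift

$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$

where P(A ∪ B) is the probability that a transaction contains both A and B.

 ➢ If lift = 1, A and B are independent; if lift < 1, they are negatively correlated; if lift > 1, they are positively correlated.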
             Game    Not Game   Sum (row)
Video        4000    3500       7500
Not Video    2000    500        2500
Sum (col.)   6000    4000       10000

$$\mathrm{lift}(\text{Game}, \text{Video}) = \frac{0.4}{0.6 \times 0.75} \approx 0.89$$
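
As a sketch, lift can be computed from raw counts with a small helper. The function name `lift` and its parameters are hypothetical, introduced only for this example; the counts come from the table above.

```python
def lift(n_both: int, n_a: int, n_b: int, n_total: int) -> float:
    """lift(A, B) = P(A and B) / (P(A) * P(B)), computed from counts."""
    p_both = n_both / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_both / (p_a * p_b)


# Game and Video: 4000 joint, 6000 games, 7500 videos.
lift_game_video = lift(4000, 6000, 7500, 10_000)
# Game and Not Video: 2000 joint, 6000 games, 2500 without video.
lift_game_not_video = lift(2000, 6000, 2500, 10_000)

print(round(lift_game_video, 2))      # below 1: negatively correlated
print(round(lift_game_not_video, 2))  # above 1: positively correlated
```

The second call uses the "Not Video" row of the same table: buying a game is positively correlated with not buying a video, which is the mirror image of the first result.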

INTERESTINGNESS MEASURE:
CORRELATIONS (χ²)
 ➢ Measure of dependent/correlated events: χ², computed from the observed counts and the expected counts (shown in parentheses)

             Game          Not Game      Sum (row)
Video        4000 (4500)   3500 (3000)   7500
Not Video    2000 (1500)   500 (1000)    2500
Sum (col.)   6000          4000          10000
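
The values in parentheses are the counts expected under independence, (row sum × column sum) / total. Assuming the standard Pearson statistic, χ² = Σ (observed − expected)² / expected, the table can be checked with the sketch below; the variable names are illustrative only.

```python
# Observed counts from the table: rows = Video / Not Video, cols = Game / Not Game.
observed = [[4000, 3500],
            [2000, 500]]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
n_total = sum(row_sums)

# Expected count under independence: row sum * column sum / total.
expected = [[r * c / n_total for c in col_sums] for r in row_sums]

# Pearson chi-square statistic summed over all four cells.
chi2 = sum((o - e) ** 2 / e
           for obs_row, exp_row in zip(observed, expected)
           for o, e in zip(obs_row, exp_row))

print(expected)
print(round(chi2, 1))
```

The expected counts reproduce the parenthesised values in the table. The observed Game-and-Video count (4000) is below its expected value (4500), so the large χ² value reflects a negative correlation, in agreement with lift < 1.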

ARE LIFT AND χ² GOOD MEASURES OF CORRELATION?

 ➢ “Buy walnuts → buy milk [1%, 80%]” is misleading if 85% of customers buy milk

 ➢ Support and confidence alone are not good indicators of correlation

 ➢ Over 20 interestingness measures have been proposed.
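
To see why the walnuts rule is misleading, note that lift can also be written as confidence divided by the support of the consequent. The sketch below applies that identity to the numbers quoted above; the helper name `lift_from_confidence` is hypothetical.

```python
def lift_from_confidence(confidence: float, consequent_support: float) -> float:
    """lift(A, B) = P(B | A) / P(B), an equivalent form of P(A and B) / (P(A) P(B))."""
    return confidence / consequent_support


# Rule: buy walnuts -> buy milk, confidence 80%; 85% of customers buy milk.
walnut_milk_lift = lift_from_confidence(0.80, 0.85)
print(round(walnut_milk_lift, 2))
```

The result is slightly below 1, so despite its 80% confidence the rule indicates that walnut buyers are, if anything, a little less likely than average to buy milk.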

SUMMARY

 ▪ Basic concepts: association rules, support-confidence framework, closed and max-patterns
 ▪ Scalable frequent pattern mining methods
 ➢ Apriori (candidate generation and test)
 ➢ Projection-based (FP-growth)
 ➢ Vertical format approach (ECLAT)
 ▪ Which patterns are interesting?
 ➢ Pattern evaluation methods
