
Association rules

Thanh Nghi dtnghi@cit.ctu.edu.vn

Outline
  Introduction
  Association rules
  Applications


Transactions
TID  Products
1    MILK, BREAD, EGGS
2    BREAD, SUGAR
3    BREAD, CEREAL
4    MILK, BREAD, SUGAR
5    MILK, CEREAL
6    BREAD, CEREAL
7    MILK, CEREAL
8    MILK, BREAD, CEREAL, EGGS
9    MILK, BREAD, CEREAL

Transaction
TID  Products
1    A, B, E
2    B, D
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C
8    A, B, C, E
9    A, B, C

ITEMS: A = milk, B = bread, C = cereal, D = sugar, E = eggs

Instances = Transactions

Transaction
TID  Products
1    A, B, E
2    B, D
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C
8    A, B, C, E
9    A, B, C

Attributes converted to binary flags

TID  A  B  C  D  E
1    1  1  0  0  1
2    0  1  0  1  0
3    0  1  1  0  0
4    1  1  0  1  0
5    1  0  1  0  0
6    0  1  1  0  0
7    1  0  1  0  0
8    1  1  1  0  1
9    1  1  1  0  0
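The conversion from transactions to binary flags can be sketched in a few lines of Python. This is only an illustration: the function name `to_binary_flags` and the dictionary layout are assumptions made for the example, not something the slides prescribe.

```python
# Transactions from the slides (A = milk, B = bread, C = cereal, D = sugar, E = eggs).
transactions = {
    1: {"A", "B", "E"},
    2: {"B", "D"},
    3: {"B", "C"},
    4: {"A", "B", "D"},
    5: {"A", "C"},
    6: {"B", "C"},
    7: {"A", "C"},
    8: {"A", "B", "C", "E"},
    9: {"A", "B", "C"},
}


def to_binary_flags(transactions):
    """Return (sorted item list, {tid: [0/1 flag per item]})."""
    # Hypothetical helper: one column per distinct item, in sorted order.
    items = sorted(set().union(*transactions.values()))
    flags = {
        tid: [1 if item in itemset else 0 for item in items]
        for tid, itemset in transactions.items()
    }
    return items, flags


items, flags = to_binary_flags(transactions)
print("TID", *items)
for tid, row in flags.items():
    print(tid, *row)
```

Each row of the printed output corresponds to one row of the binary table above.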

Definitions
Item: an attribute = value pair, or simply a value
Itemset I: a set of items
  e.g., I = {A, B, E} (order does not matter)

Transaction: (TID, itemset)
  TID is the transaction ID

Support and Frequent Itemsets

Support of an itemset
  sup(I) = number of transactions t that contain I
  e.g., sup({A, B, E}) = 2, sup({B, C}) = 4

A frequent itemset I is an itemset whose support is at least the minimum support
  sup(I) >= minsup
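As a minimal sketch, support counting over the nine example transactions could look like the following. The names `sup` and `is_frequent` are chosen for the illustration, and support is taken as an absolute count, as in the slides.

```python
# The nine example transactions, in TID order.
transactions = [
    {"A", "B", "E"},
    {"B", "D"},
    {"B", "C"},
    {"A", "B", "D"},
    {"A", "C"},
    {"B", "C"},
    {"A", "C"},
    {"A", "B", "C", "E"},
    {"A", "B", "C"},
]


def sup(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def is_frequent(itemset, transactions, minsup):
    """True when the itemset's support reaches the minimum support."""
    return sup(itemset, transactions) >= minsup


print(sup({"A", "B", "E"}, transactions))  # 2
print(sup({"B", "C"}, transactions))       # 4
print(is_frequent({"A", "B", "E"}, transactions, minsup=2))  # True
```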

Subset property

Every subset of a frequent set is frequent!
  e.g., suppose {A, B} is frequent. Every transaction that contains both A and B also contains A and contains B, so A alone and B alone occur at least as often and are frequent too.

All association rule algorithms rely on the subset property
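The subset property can be checked empirically on the example data. The sketch below (helper names are illustrative) compares the support of {A, B, E} with the support of each of its non-empty proper subsets; it is a sanity check on one itemset, not a proof.

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]


def sup(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def nonempty_proper_subsets(itemset):
    """Yield every subset of size 1 .. len(itemset) - 1."""
    items = sorted(itemset)
    for k in range(1, len(items)):
        for combo in combinations(items, k):
            yield frozenset(combo)


itemset = frozenset({"A", "B", "E"})
for subset in nonempty_proper_subsets(itemset):
    # A subset can never be rarer than the full itemset.
    print(sorted(subset), sup(subset), ">=", sup(itemset))
```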


Association rules
Association rule R: Itemset1 => Itemset2
  Itemset1 and Itemset2 are disjoint, and Itemset2 is non-empty
  Meaning: if a transaction contains Itemset1, then it also contains Itemset2

Examples
  A, B => E, C
  A => B, C


Association rules
Given the frequent set {A, B, E}, the possible association rules are
  A => B, E
  A, B => E
  A, E => B
  B => A, E
  B, E => A
  E => A, B
  __ => A, B, E (empty rule), or true => A, B, E
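The seven candidate rules listed above can be enumerated mechanically: every split of the itemset into an antecedent and a non-empty consequent gives one rule. The sketch below assumes a hypothetical helper `candidate_rules`; an empty antecedent stands for the "true => ..." rule.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield (antecedent, consequent) pairs with a non-empty consequent."""
    items = sorted(itemset)
    # Antecedent sizes 0 .. n-1, so the consequent always keeps at least one item.
    for k in range(len(items)):
        for antecedent in combinations(items, k):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


for antecedent, consequent in candidate_rules({"A", "B", "E"}):
    left = ", ".join(antecedent) if antecedent else "true"
    print(f"{left} => {', '.join(consequent)}")
```

For an itemset of n items this yields 2^n - 1 candidate rules, which is 7 for {A, B, E}.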


Differences between classification rules and association rules

Classification rules
  focus on 1 target attribute
  measure: accuracy

Association rules
  many target attributes
  measures: support, confidence, lift


Support and Confidence
Consider a rule R: I => J
  sup(R) = sup(I ∪ J)
    the support of the itemset I ∪ J

  conf(R) = sup(R) / sup(I)
    the confidence of rule R

Association rules that meet the minimum support and minimum confidence are usually called strong rules
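A small sketch of these two definitions on the example transactions follows. The function names are illustrative, and the code assumes the antecedent occurs in at least one transaction (otherwise the division is undefined).

```python
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]


def sup(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rule_support(I, J, transactions):
    """sup(R) for R: I => J, i.e. the support of I united with J."""
    return sup(set(I) | set(J), transactions)


def conf(I, J, transactions):
    """conf(R) = sup(R) / sup(I); assumes sup(I) > 0."""
    return rule_support(I, J, transactions) / sup(I, transactions)


print(rule_support({"A", "B"}, {"E"}, transactions))  # 2
print(conf({"A", "B"}, {"E"}, transactions))          # 0.5
```

With an empty antecedent, `sup` returns the total number of transactions, so the "true => ..." rule gets confidence sup(J) / 9 on this data.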



Association rules
For the frequent set {A, B, E}, the association rules with minsup = 2 and minconf = 50%:

  A, B => E : conf = 2/4 = 50%
  A, E => B : conf = 2/2 = 100%
  B, E => A : conf = 2/2 = 100%
  E => A, B : conf = 2/2 = 100%

Rules that do not qualify:
  A => B, E     : conf = 2/6 = 33% < 50%
  B => A, E     : conf = 2/7 = 29% < 50%
  __ => A, B, E : conf = 2/9 = 22% < 50%

TID  List of items
1    A, B, E
2    B, D
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C
8    A, B, C, E
9    A, B, C


Finding strong rules
Strong rules are those with sup >= minsup and conf >= minconf
  sup(R) >= minsup and conf(R) >= minconf

Find all frequent itemsets


Finding itemsets
Apriori algorithm (Agrawal & Srikant, 1994)
Idea: use 1-item sets to generate 2-item sets, 2-item sets to generate 3-item sets, ...
  If (A B) is a frequent itemset, then (A) and (B) must be frequent itemsets
  If X is a frequent k-item set, then all (k-1)-item subsets of X are also frequent
  Compute k-item sets by merging (k-1)-item sets
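The level-wise idea above can be sketched as follows. This is a compact illustration rather than the optimized algorithm from the original paper: candidates are built by merging pairs of frequent (k-1)-item sets, pruned with the subset property, and then counted by a full scan. The function name `apriori` and its return format are assumptions for the example.

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]


def apriori(transactions, minsup):
    """Return {frozenset: support} for every itemset with support >= minsup."""

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = sorted(set().union(*transactions))
    current = {frozenset([i]): sup(frozenset([i])) for i in items}
    current = {s: n for s, n in current.items() if n >= minsup}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge pairs of frequent (k-1)-item sets into k-item sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-item subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: keep only candidates that reach minsup.
        current = {c: sup(c) for c in candidates}
        current = {c: n for c, n in current.items() if n >= minsup}
        frequent.update(current)
        k += 1
    return frequent


frequent = apriori(transactions, minsup=2)
for itemset, count in sorted(frequent.items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), count)
```

On the nine example transactions with minsup = 2, the only frequent 3-item sets are {A, B, C} and {A, B, E}; the candidate {A, B, C, E} is pruned because {A, C, E} and {B, C, E} are not frequent.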


Generating association rules
2 steps:
  1. Find the frequent itemsets with the Apriori algorithm
  2. For each frequent itemset I
       for each non-empty subset J of I, form the association rule I-J => J

Main idea: the subset property
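The second step can be sketched for a single frequent itemset as below. The helper `rules_from_itemset` is hypothetical; it takes the frequent itemset as given (step 1 is assumed to have been done already) and keeps the rules I-J => J whose confidence reaches minconf.

```python
from itertools import combinations

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]


def sup(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def rules_from_itemset(itemset, transactions, minconf):
    """For each non-empty subset J of I, test the rule I-J => J."""
    items = frozenset(itemset)
    total = sup(items, transactions)  # sup(R) is the same for every split of I
    strong = []
    for k in range(1, len(items) + 1):
        for J in combinations(sorted(items), k):
            antecedent = items - frozenset(J)
            confidence = total / sup(antecedent, transactions)
            if confidence >= minconf:
                strong.append((antecedent, frozenset(J), confidence))
    return strong


strong = rules_from_itemset({"A", "B", "E"}, transactions, minconf=0.5)
for antecedent, consequent, confidence in strong:
    print(sorted(antecedent), "=>", sorted(consequent), f"{confidence:.0%}")
```

For {A, B, E} with minconf = 50%, four of the seven candidate rules pass: A, B => E (50%), and A, E => B, B, E => A, E => A, B (100% each).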


Example: generating association rules from an itemset

Frequent itemset from the weather dataset:
  Humidity = Normal, Windy = False, Play = Yes (4)

7 potential rules, with their confidence:
  If Humidity = Normal and Windy = False then Play = Yes              4/4
  If Humidity = Normal and Play = Yes then Windy = False              4/6
  If Windy = False and Play = Yes then Humidity = Normal              4/6
  If Humidity = Normal then Windy = False and Play = Yes              4/7
  If Windy = False then Humidity = Normal and Play = Yes              4/8
  If Play = Yes then Humidity = Normal and Windy = False              4/9
  If True then Humidity = Normal and Windy = False and Play = Yes     4/14


Association rules for weather

Rules with support > 1 and confidence = 100%:

      Association rule                                   Sup.  Conf.
1     Humidity=Normal Windy=False => Play=Yes            4     100%
2     Temperature=Cool => Humidity=Normal                4     100%
3     Outlook=Overcast => Play=Yes                       4     100%
4     Temperature=Cool Play=Yes => Humidity=Normal       3     100%
...   ...                                                ...   ...
58    Outlook=Sunny Temperature=Hot => Humidity=High     2     100%

3 rules have support 4, 5 rules have support 3, and 50 rules have support 2



Filtering association rules
Large datasets => the number of generated rules is very large, even when confidence and support are used
Find ways to filter or select the useful rules:
  use other measures (see Howard Hamilton's material on association rule mining)
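Lift, which the slides list next to support and confidence as a measure for association rules, is one such additional filter. The slides do not define it, so the sketch below assumes the commonly used definition lift(I => J) = conf(I => J) / (sup(J) / N), where N is the number of transactions; values above 1 suggest that I and J occur together more often than if they were independent.

```python
transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"},
    {"A", "B", "D"}, {"A", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"},
]


def sup(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def lift(I, J, transactions):
    """Assumed definition: conf(I => J) divided by the relative frequency of J."""
    n = len(transactions)
    confidence = sup(set(I) | set(J), transactions) / sup(I, transactions)
    return confidence / (sup(J, transactions) / n)


print(lift({"A", "B"}, {"E"}, transactions))  # 2.25
print(lift({"B"}, {"C"}, transactions))       # below 1
```

A rule such as B => C has confidence 4/7 on this data, yet its lift is below 1 because C already appears in 6 of the 9 transactions.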


Applications
Market basket analysis
  Store layout, client offers
  Wal-Mart knows that customers who buy Barbie dolls have a 60% likelihood of buying one of three types of candy bars. What does Wal-Mart do with information like that? 'I don't have a clue,' says Wal-Mart's chief of merchandising, Lee Scott.
  See KDnuggets 98:01 for many ideas: www.kdnuggets.com/news/98/n01.html
  Diapers and beer "urban legend"

Finding unusual events
  WSARE: What is Strange About Recent Events
