
Faculty of Computer Science & Engineering
Ho Chi Minh City University of Technology

Chapter 6: Association Rules
Graduate Program in Computer Science
Electronic course material compiled by: Dr. Võ Thị Ngọc Châu (chauvtn@cse.hcmut.edu.vn)
Semester 1, 2011-2012

References
[1] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Second Edition, Morgan Kaufmann Publishers, 2006.
[2] David Hand, Heikki Mannila, Padhraic Smyth, Principles of Data Mining, MIT Press, 2001.
[3] David L. Olson, Dursun Delen, Advanced Data Mining Techniques, Springer-Verlag, 2008.
[4] Graham J. Williams, Simeon J. Simoff, Data Mining: Theory, Methodology, Techniques, and Applications, Springer-Verlag, 2006.
[5] Hillol Kargupta, Jiawei Han, Philip S. Yu, Rajeev Motwani, Vipin Kumar, Next Generation of Data Mining, Taylor & Francis Group, LLC, 2009.
[6] Daniel T. Larose, Data Mining Methods and Models, John Wiley & Sons, Inc, 2006.
[7] Ian H. Witten, Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Elsevier Inc, 2005.
[8] Florent Masseglia, Pascal Poncelet, Maguelonne Teisseire, Successes and New Directions in Data Mining, IGI Global, 2008.
[9] Oded Maimon, Lior Rokach, Data Mining and Knowledge Discovery Handbook, Second Edition, Springer Science+Business Media, LLC, 2005, 2010.

Contents
  Chapter 1: Overview of data mining
  Chapter 2: Data preprocessing issues
  Chapter 3: Data regression
  Chapter 4: Data classification
  Chapter 5: Data clustering
  Chapter 6: Association rules
  Chapter 7: Data mining and database technology
  Chapter 8: Data mining applications
  Chapter 9: Research topics in data mining
  Chapter 10: Review

Chapter 6: Association Rules
  6.1. Overview of association rule mining
  6.2. Representation of association rules
  6.3. Mining frequent patterns
  6.4. Mining association rules from frequent patterns
  6.5. Constraint-based association rule mining
  6.6. Correlation analysis
  6.7. Summary

6.0. Scenario 1: Market basket analysis

6.0. Scenario 2: Cross-marketing

6.0. Scenarios
  Basket data analysis
  Cross-marketing
  Catalog design
  Classification and clustering with frequent patterns

6.1. Overview of association rule mining
  The association rule mining process
  Basic concepts
  Types of association rules

The association rule mining process
[Figure: Raw Data -> Preprocessing -> Items of Interest -> Mining -> Relationships among Items (Rules) -> Postprocessing -> User]

The association rule mining process: the market basket analysis problem
[Figure: Raw Data (transactional/relational data) -> Preprocessing -> Items of Interest (items) -> Mining -> Relationships among Items (association rules) -> Postprocessing -> User]

Transaction   Items_bought
-----------   ------------
2000          A, B, C
1000          A, C
4000          A, D
5000          B, E, F

Items: A, B, C, D, F, ...
Example rule: A => C (support 50%, confidence 66.6%)
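The figures quoted for the rule A => C can be checked directly against the four transactions shown on the slide. The sketch below is only an illustration; the helper names `support` and `confidence` are chosen here and do not come from the slides.

```python
# Toy transaction table from the slide (transaction id -> items bought).
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transaction table."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

rule_support = support({"A", "C"}, transactions)
rule_confidence = confidence({"A"}, {"C"}, transactions)
print(f"A => C: support={rule_support:.1%}, confidence={rule_confidence:.1%}")
```

A and C occur together in two of the four transactions, and A occurs in three, which gives a support of 2/4 and a confidence of 2/3.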

Sample AllElectronics data (after preprocessing)
[Table not recovered]

Basic concepts
  Item
  Itemset
  Transaction
  Association and association rule
  Support
  Confidence
  Frequent itemset
  Strong association rule

Sample AllElectronics data (after preprocessing), annotated
  Item: I4
  Itemsets: {I1, I2, I5}, {I2}, ...
  Transaction: T800

Item
  An element, sample, or object of interest.
  J = {I1, I2, ..., Im}: the set of all m items that can occur in the data set.

Itemset
  A set of items.
  An itemset containing k items is called a k-itemset.

Transaction
  One interaction with the system (for example, a customer purchase).
  Each transaction is associated with a set T of the items involved in it.

Association and association rule
  Association: items that occur together in one or more transactions.
    It expresses a relationship between items or itemsets.
  Association rule: a conditional co-occurrence rule between itemsets.
    It expresses a (conditional) relationship between itemsets.
    Given itemsets A and B, the association rule between A and B is written A => B: B occurs given that A occurs.

Support
  A measure of how frequently items or itemsets occur.
  Minimum support threshold
    The smallest acceptable support value, specified by the user.

Confidence
  A measure of how frequently one itemset occurs given that another itemset occurs.
  Minimum confidence threshold
    The smallest acceptable confidence value, specified by the user.

Frequent itemset
  An itemset whose support satisfies the minimum support threshold.
  Given an itemset A:
    A is a frequent itemset iff support(A) >= minimum support threshold.

Strong association rule
  An association rule whose support and confidence satisfy the minimum support threshold and the minimum confidence threshold.
  Given an association rule A => B between itemsets A and B:
    A => B is a strong association rule iff support(A => B) >= minimum support threshold and confidence(A => B) >= minimum confidence threshold.

Types of association rules
  Boolean association rule / quantitative association rule
  Single-dimensional association rule / multidimensional association rule
  Single-level association rule / multilevel association rule
  Association rule / correlation rule

Boolean association rule / quantitative association rule
  Boolean association rule: a rule describing an association between the presence or absence of items.
    Computer => Financial_management_software [support=2%, confidence=60%]
  Quantitative association rule: a rule describing an association between quantitative items or attributes.
    Age(X, 30..39) ∧ Income(X, 42K..48K) => buys(X, high resolution TV)

Single-dimensional association rule / multidimensional association rule
  Single-dimensional association rule: a rule that involves items or attributes of only one data dimension.
    Buys(X, computer) => Buys(X, financial_management_software)
  Multidimensional association rule: a rule that involves items or attributes of more than one dimension.
    Age(X, 30..39) => Buys(X, computer)

Single-level association rule / multilevel association rule
  Single-level association rule: a rule that involves items or attributes at a single level of abstraction.
    Age(X, 30..39) => Buys(X, computer)
    Age(X, 18..29) => Buys(X, camera)
  Multilevel association rule: a rule that involves items or attributes at different levels of abstraction.
    Age(X, 30..39) => Buys(X, laptop computer)
    Age(X, 30..39) => Buys(X, computer)

Association rule / correlation rule
  Association rule: a strong association rule A => B, that is, a rule that meets the minimum support threshold and the minimum confidence threshold.
  Correlation rule: a strong association rule A => B that also meets a requirement on the statistical correlation between A and B.

6.2. Representation of association rules
  Rule form: A => B [support, confidence]
    Given a minimum support threshold (min_sup) and a minimum confidence threshold (min_conf).
    A and B are itemsets:
      Frequent itemsets/subsequences/substructures
      Closed frequent itemsets
      Maximal frequent itemsets
      Constrained frequent itemsets
      Approximate frequent itemsets
      Top-k frequent itemsets

Frequent itemsets/subsequences/substructures
  An itemset/subsequence/substructure X is frequent if support(X) >= min_sup.
    Itemsets: sets of items.
    Subsequences: ordered sequences of events/items.
    Substructures: structural forms (graph, lattice, tree, sequence, set, ...).

Closed frequent itemsets
  An itemset X is closed in J if no proper superset Y of X in J has the same support as X.
    X ⊆ J, X is closed iff ∀Y ⊆ J with X ⊂ Y: support(Y) <> support(X).
  X is a closed frequent itemset in J if X is a frequent itemset and X is closed in J.

Maximal frequent itemsets
  An itemset X is a maximal frequent itemset in J if X is frequent and no proper superset Y of X in J is a frequent itemset.
    X ⊆ J, X is a maximal frequent itemset iff X is frequent and ∀Y ⊆ J with X ⊂ Y: Y is not a frequent itemset.
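The two definitions above can be illustrated on a very small data set. The sketch below reuses the four-transaction table from Section 6.1 and assumes a minimum support count of 2; both that threshold and the brute-force enumeration are choices made for this illustration only.

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
min_count = 2  # assumed minimum support count for this illustration

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

# Brute-force enumeration of all frequent itemsets (fine for a toy example).
items = sorted(set().union(*transactions))
frequent = {}
for k in range(1, len(items) + 1):
    for combo in combinations(items, k):
        count = support_count(frozenset(combo))
        if count >= min_count:
            frequent[frozenset(combo)] = count

def proper_frequent_supersets(x):
    return [y for y in frequent if x < y]

# A superset with the same count as a frequent itemset is itself frequent,
# so comparing against frequent supersets is enough to decide closedness.
closed = {x for x, c in frequent.items()
          if all(frequent[y] != c for y in proper_frequent_supersets(x))}
maximal = {x for x in frequent if not proper_frequent_supersets(x)}

print("closed: ", sorted(sorted(x) for x in closed))
print("maximal:", sorted(sorted(x) for x in maximal))
```

Here {C} is frequent but not closed, because {A, C} has the same support count, and {A} is closed but not maximal. Every maximal frequent itemset is also closed.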

Constrained frequent itemsets
  Frequent itemsets that satisfy user-defined constraints.

Approximate frequent itemsets
  Frequent itemsets from which the (approximate) support of the frequent itemsets to be mined can be derived.

Top-k frequent itemsets
  The k most frequent itemsets, where k is specified by the user.

Boolean, single-level, single-dimensional association rules between frequent itemsets: A => B [support, confidence]
  A and B are frequent itemsets.
  The rules are single-dimensional, single-level, and Boolean.
  Support(A => B) = Support(A U B) >= min_sup
  Confidence(A => B) = Support(A U B)/Support(A) = P(B|A) >= min_conf

6.3. Mining frequent patterns
  Apriori algorithm: mining frequent patterns with candidate sets
    R. Agrawal, R. Srikant. Fast algorithms for mining association rules. In VLDB 1994, pp. 487-499.
  FP-Growth algorithm: mining frequent patterns with an FP-tree
    J. Han, J. Pei, Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD 2000, pp. 1-12.

Apriori algorithm
  Uses prior knowledge about the properties of frequent itemsets.
  Iterative approach that searches for frequent itemsets one level at a time (level-wise search).
    (k+1)-itemsets are generated from k-itemsets.
    At each search level, the entire data set is scanned.
  The Apriori property reduces the search space: all nonempty subsets of a frequent itemset must also be frequent.
    Proof?
    Antimonotone: if a set cannot pass a test, all of its supersets will fail the same test as well.
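The slide leaves the proof of the Apriori property open. One possible argument, using only the definition of support count, is sketched below; the symbols $D$ (the set of transactions) and $\sigma$ (the support count) are notation introduced here, not taken from the slides, and min_sup is read as a count.

```latex
\begin{align*}
&\text{Let } \sigma(X) = |\{T \in D : X \subseteq T\}| \text{ for any itemset } X.\\
&\text{If } \emptyset \neq A \subseteq B, \text{ then every } T \text{ with } B \subseteq T \text{ also satisfies } A \subseteq T,\\
&\text{so } \{T \in D : B \subseteq T\} \subseteq \{T \in D : A \subseteq T\} \text{ and hence } \sigma(A) \ge \sigma(B).\\
&\text{If } B \text{ is frequent, then } \sigma(A) \ge \sigma(B) \ge \mathit{min\_sup}, \text{ so } A \text{ is frequent.}\\
&\text{Contrapositive: if } A \text{ is not frequent, no superset } B \supseteq A \text{ is frequent (anti-monotonicity).}
\end{align*}
```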

Apriori algorithm
[Figures on two slides not recovered]

Sample AllElectronics data (after preprocessing)
[Table not recovered]

min_sup = 2/9, that is, a minimum support count of 2
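The level-wise search with min_sup = 2/9 can be sketched as follows. The transaction table did not survive in these slides, so the code ASSUMES the nine-transaction AllElectronics table printed in Han & Kamber [1]. The function name `apriori` and the simplified join step (uniting any two frequent k-itemsets whose union has k+1 items) are choices made for this sketch, not the exact procedure of the original paper.

```python
from itertools import combinations

# ASSUMPTION: the nine AllElectronics transactions as printed in Han & Kamber [1];
# the table itself is missing from the slides.
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def apriori(transactions, min_count):
    """Return {frozenset: support_count} for all frequent itemsets."""
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        # Join step (simplified): unite pairs of frequent k-itemsets
        # whose union has exactly k + 1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # One scan of the data per level to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

frequent = apriori(transactions, min_count=2)
for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On the assumed table the search stops at level 4, because the only 4-itemset that can be joined, {I1, I2, I3, I5}, has an infrequent 3-subset and is pruned before any counting.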

Apriori algorithm: characteristics
  Generates many candidate sets
    10^4 frequent 1-itemsets lead to more than 10^7 candidate 2-itemsets (10^4(10^4 - 1)/2).
    Discovering a k-itemset requires at least 2^k - 1 candidate itemsets beforehand.
  Scans the data set many times
    The cost grows as the size of the itemsets increases.
    If k-itemsets are discovered, the data set must be scanned k+1 times.
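The two counts on this slide are plain combinatorics and can be checked in a few lines. The function name `nonempty_subsets` is introduced here for illustration.

```python
from math import comb

# Candidate 2-itemsets built from 10^4 frequent 1-itemsets: 10^4 * (10^4 - 1) / 2.
n_frequent_1_itemsets = 10**4
candidate_2_itemsets = comb(n_frequent_1_itemsets, 2)
print(candidate_2_itemsets)

def nonempty_subsets(k):
    """Number of nonempty subsets of a k-itemset: 2^k - 1."""
    return 2**k - 1

# Every nonempty subset of a frequent k-itemset is itself frequent,
# so each of them has to show up as a candidate at some level.
for k in (3, 10, 30):
    print(k, nonempty_subsets(k))
```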

Apriori algorithm: improvements
  Hash-based technique
    A k-itemset whose hashing bucket count is below the minimum support threshold cannot be a frequent itemset.
  Transaction reduction
    A transaction that contains no frequent k-itemset does not need to be scanned in later passes (for (k+1)-itemsets).
  Partitioning
    An itemset must be frequent in at least one partition in order to be frequent in the whole data set.
  Sampling
    Mine only a given subset of the data with a lower support threshold; a method is needed to determine completeness.
  Dynamic itemset counting
    Add candidate itemsets only when all of their subsets are estimated to be frequent.
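The partitioning idea can be sketched in two passes: collect the itemsets that are locally frequent in some partition, then verify those candidates with one scan of the full data. The code below is only an illustration under stated assumptions. It reuses the nine AllElectronics transactions as printed in Han & Kamber [1] (the table is missing from the slides), mines each partition by brute force instead of Apriori to stay short, and uses function names invented here.

```python
from fractions import Fraction
from itertools import combinations

# ASSUMPTION: AllElectronics transactions as printed in Han & Kamber [1].
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def frequent_itemsets(transactions, min_sup):
    """Brute-force frequent itemsets of a small data set; min_sup is relative."""
    items = sorted(set().union(*transactions))
    result = set()
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in transactions if set(combo) <= t)
            if Fraction(count, len(transactions)) >= min_sup:
                result.add(frozenset(combo))
    return result

def partitioned_mining(transactions, min_sup, n_parts=2):
    size = -(-len(transactions) // n_parts)  # ceiling division
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]
    # Pass 1: itemsets that are locally frequent in at least one partition.
    candidates = set().union(*(frequent_itemsets(p, min_sup) for p in parts))
    # Pass 2: one full scan keeps the candidates that are globally frequent.
    n = len(transactions)
    return {c for c in candidates
            if Fraction(sum(1 for t in transactions if c <= t), n) >= min_sup}

result = partitioned_mining(transactions, Fraction(2, 9))
print(len(result), "frequent itemsets")
```

Exact fractions are used for the relative threshold so that the local test count/size >= min_sup is not affected by floating-point rounding.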

FP-Growth algorithm
  Compresses the data set into a tree structure (Frequent Pattern tree, FP-tree)
    Reduces the cost of handling the whole data set during mining.
      Infrequent items are discarded early.
    Guarantees that the mining result is not affected.
  Divide-and-conquer method
    The mining process is divided into smaller tasks:
      1. Build the FP-tree.
      2. Mine frequent itemsets from the FP-tree.
  Avoids generating candidate sets
    Each step examines only a part of the data set.

FP-Growth algorithm: step 1, building the FP-tree
  1.1. Scan the data set and find the frequent 1-itemsets.
  1.2. Sort the frequent 1-itemsets in descending order of support count (frequency).
  1.3. Scan the data set again and build the FP-tree.
    Create the root of the FP-tree, labeled null {}.
    Each transaction corresponds to one branch of the FP-tree.
    Each node on a branch corresponds to one item of the transaction.
    The items of a transaction are sorted in descending order of support count.
    Each node carries the support count of its item.
    Transactions that share items form branches with a common prefix.
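Steps 1.1 to 1.3 can be sketched as below. The class `FPNode`, the functions `build_fp_tree` and `show`, and the tie-breaking rule (items with equal counts are ordered by name) are assumptions of this sketch. The data is ASSUMED to be the nine AllElectronics transactions printed in Han & Kamber [1], since the table is missing from the slides.

```python
from collections import Counter

# ASSUMPTION: AllElectronics transactions as printed in Han & Kamber [1].
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

class FPNode:
    def __init__(self, item, parent):
        self.item = item        # None for the root ("null {}")
        self.count = 0
        self.parent = parent
        self.children = {}      # item -> FPNode

def build_fp_tree(transactions, min_count):
    # Step 1.1: first scan, count every item and keep the frequent ones.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    # Step 1.2: global order, descending support count (ties broken by name).
    ordered = sorted(frequent, key=lambda i: (-frequent[i], i))
    rank = {item: r for r, item in enumerate(ordered)}
    # Step 1.3: second scan, insert each transaction as a path from the root.
    root = FPNode(None, None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
            child.count += 1    # shared prefixes accumulate counts
            node = child
    return root, rank

def show(node, depth=0):
    """Print the tree, one node per line, indented by depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

root, rank = build_fp_tree(transactions, min_count=2)
show(root)
```

Because seven of the nine assumed transactions contain I2, they all share the single node I2:7 directly under the root, which is where the compression comes from.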

FP-Growth algorithm
[Figures on two slides not recovered]

FP-Growth algorithm: step 2, mining frequent itemsets from the FP-tree
  2.1. Build the conditional pattern base for each node of the FP-tree.
    Accumulate the prefix paths of the node together with their frequencies.
  2.2. Build a conditional FP-tree from each conditional pattern base.
    Accumulate the frequency of each item in the base.
    Build the conditional FP-tree from the frequent items of the base.
  2.3. Mine the conditional FP-tree and grow frequent itemsets recursively.
    If the conditional FP-tree has a single path, enumerate all itemsets on that path.
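Steps 2.1 to 2.3 can be sketched as a short recursive routine. This is a simplified illustration, not the authors' implementation: it rebuilds a small tree for every conditional pattern base and omits the single-path shortcut of step 2.3, letting the recursion enumerate those itemsets instead. The names `Node`, `build_tree`, and `fp_growth` are invented here, and the data is ASSUMED to be the AllElectronics table printed in Han & Kamber [1].

```python
from collections import Counter, defaultdict

# ASSUMPTION: AllElectronics transactions as printed in Han & Kamber [1].
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children = 0, {}

def build_tree(weighted, min_count):
    """FP-tree from (items, count) pairs; returns (header table, item counts)."""
    counts = Counter()
    for items, n in weighted:
        for item in set(items):
            counts[item] += n
    keep = {i for i, c in counts.items() if c >= min_count}
    root, header = Node(None, None), defaultdict(list)
    for items, n in weighted:
        node = root
        for item in sorted(keep & set(items), key=lambda i: (-counts[i], i)):
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n
    return header, counts

def fp_growth(weighted, min_count, suffix=frozenset()):
    """Yield (itemset, support_count) for every frequent itemset containing `suffix`."""
    header, counts = build_tree(weighted, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, counts[item]
        # Step 2.1: conditional pattern base = prefix paths of `item` with counts.
        base = []
        for node in nodes:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        # Steps 2.2 and 2.3: build the conditional FP-tree and recurse on it.
        yield from fp_growth(base, min_count, itemset)

patterns = dict(fp_growth([(t, 1) for t in transactions], min_count=2))
for itemset, count in sorted(patterns.items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

For item I5 on the assumed data, the conditional pattern base consists of the prefix paths (I2, I1) and (I2, I1, I3), each with count 1; I3 is infrequent in that base, so the conditional tree contains only I1 and I2.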

FP-Growth algorithm
[Figures on two slides not recovered]

FP-Growth algorithm: characteristics
  Does not generate candidate itemsets
    No test of whether candidate itemsets are actually frequent itemsets.
  Uses a data structure that compresses the data set.
  Reduces the cost of scanning the data set.
  The main cost is counting and building the initial FP-tree.
  Efficient and scalable for mining both long and short frequent itemsets.

Comparison of the Apriori and FP-Growth algorithms
[Figure: scalability with the support threshold]
[Figure: linear scalability with the number of transactions]

6.4. Mining association rules from frequent patterns
  Strong association rules A => B
    With relative support:
      Support(A => B) = Support(A U B) >= min_sup
      Confidence(A => B) = Support(A U B)/Support(A) = P(B|A) >= min_conf
    With support counts (min_sup given as a count):
      Support(A => B) = Support_count(A U B) >= min_sup
      Confidence(A => B) = P(B|A) = Support_count(A U B)/Support_count(A) >= min_conf

Generating strong association rules from the set of frequent itemsets
  For each frequent itemset l, generate the nonempty proper subsets of l.
    Support_count(l) >= min_sup
  For each such subset s of l, output the rule s => (l - s) if Support_count(l)/Support_count(s) >= min_conf.

Min_conf = 50%
  I1 ∧ I2 => I5
  I1 ∧ I5 => I2
  I2 ∧ I5 => I1
  I5 => I1 ∧ I2
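The rule generation procedure can be sketched for one frequent itemset. The code ASSUMES the nine AllElectronics transactions printed in Han & Kamber [1], because the table is missing from the slides; `support_count` and `rules_from` are names introduced for this illustration.

```python
from itertools import combinations

# ASSUMPTION: AllElectronics transactions as printed in Han & Kamber [1].
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset):
    return sum(1 for t in transactions if itemset <= t)

def rules_from(l, min_conf):
    """Yield (s, l - s, confidence) for every strong rule s => (l - s)."""
    l = frozenset(l)
    for size in range(1, len(l)):              # nonempty proper subsets s of l
        for s in map(frozenset, combinations(sorted(l), size)):
            conf = support_count(l) / support_count(s)
            if conf >= min_conf:
                yield s, l - s, conf

strong = list(rules_from({"I1", "I2", "I5"}, min_conf=0.5))
for s, rest, conf in strong:
    print(" ∧ ".join(sorted(s)), "=>", " ∧ ".join(sorted(rest)), f"({conf:.0%})")
```

On the assumed data, the rules I1 => I2 ∧ I5 and I2 => I1 ∧ I5 fall below the 50% threshold (2/6 and 2/7), which leaves four strong rules for this itemset.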

6.5. Constraint-based association rule mining
  Constraints
    Guide the process of mining patterns and rules.
    Limit the search space during mining.
    Kinds of constraints
      Knowledge type constraints
      Data constraints
      Level/dimension constraints
      Interestingness constraints
      Rule constraints

Kinds of constraints
  Knowledge type constraints
    Association rules or correlation rules.
  Data constraints
    Task-relevant data (for association rule mining).
  Level/dimension constraints
    Data dimensions (attributes) or levels of abstraction/concept.
  Interestingness constraints
    Thresholds on the measures.
  Rule constraints
    The form of the rules to be mined.

Constraint-based rule mining
  Makes the mining process more effective and more efficient.
    More effective: rules are discovered according to the requirements (constraints) of the user.
    More efficient: an optimizer can be used to exploit the user's constraints.

Rule mining based on rule constraints
  Rule form (meta-rule guided mining)
    Metarules: specify the (syntactic) form of the rules to be discovered.
  Rule content
    Constraints between the variables in A and/or B in the rule A => B:
      Superset/subset relationships
      Value domains
      Aggregate functions

Metarules
  Specify the (syntactic) form of the rules to be discovered.
  Are based on the experience, expectations, and intuition of the data analyst.
  Form a hypothesis about the relationships in the rules the user is interested in.
  Mining = discovering association rules + searching for rules that match the given metarules.

Metarules
  Rule template: P1 ∧ P2 ∧ ... ∧ Pl => Q1 ∧ Q2 ∧ ... ∧ Qr
    P1, P2, ..., Pl, Q1, Q2, ..., Qr: instantiated predicates or predicate variables.
    Usually involves several dimensions/attributes.
  Example
    Metarule: P1(X, Y) ∧ P2(X, W) => buys(X, office software)
    Rule satisfying the metarule: age(X, 30..39) ∧ income(X, 41k..60k) => buys(X, office software)

Constraints between the variables S1, S2, ... in A and/or B in the rule A => B
  Superset/subset relationships: S1 ⊂ S2 / S1 ⊄ S2
  Value domains
    S1 θ value, θ ∈ {=, <>, <, <=, >, >=}
    value ∈ S1 / value ∉ S1
    ValueSet θ S1 or S1 θ ValueSet, θ ∈ {=, <>, ⊂, ⊆, ⊄}
  Aggregate functions
    Agg(S1) θ value, Agg() ∈ {min, max, sum, count, avg}, θ ∈ {=, <>, <, <=, >, >=}

Properties of constraints
  Anti-monotone
  Monotone
  Succinctness
  Convertible

Properties of constraints
  Anti-monotone
    A constraint Ca is anti-monotone iff for any pattern S not satisfying Ca, none of the super-patterns of S can satisfy Ca.
    Example: sum(S.Price) <= value (for non-negative prices)
  Monotone
    A constraint Cm is monotone iff for any pattern S satisfying Cm, every super-pattern of S also satisfies it.
    Example: sum(S.Price) >= value (for non-negative prices)
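An anti-monotone constraint can be pushed into a level-wise search: once an itemset violates it, none of its supersets need to be generated. The sketch below illustrates this for sum(S.Price) <= value. The item names, the prices, and the function name `itemsets_with_sum_at_most` are hypothetical, and the pruning is only valid under the assumption that all prices are non-negative.

```python
from itertools import combinations

prices = {"a": 10, "b": 20, "c": 40, "d": 80}   # hypothetical, non-negative prices

def total(itemset):
    return sum(prices[i] for i in itemset)

def itemsets_with_sum_at_most(limit):
    """Level-wise enumeration pruned by the anti-monotone constraint sum <= limit."""
    level = [frozenset([i]) for i in prices if prices[i] <= limit]
    result = list(level)
    while level:
        # Extend only the survivors: a violating set has no satisfying superset.
        extended = {s | {i} for s in level for i in prices if i not in s}
        level = [s for s in extended if total(s) <= limit]
        result.extend(level)
    return set(result)

satisfying = itemsets_with_sum_at_most(50)
print(sorted(sorted(s) for s in satisfying))

# Reference answer without pruning, to confirm that nothing is lost.
brute_force = {frozenset(c)
               for k in range(1, len(prices) + 1)
               for c in combinations(prices, k)
               if total(c) <= 50}
print(satisfying == brute_force)
```

The monotone counterpart sum(S.Price) >= value works the other way around: once an itemset satisfies it, every superset does too, so the check does not have to be repeated for them.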

Properties of constraints
  Succinctness
    A subset of items Is is a succinct set if it can be expressed as σp(I) for some selection predicate p, where σ is a selection operator.
    SP ⊆ 2^I is a succinct power set if there is a fixed number of succinct sets I1, ..., Ik ⊆ I, such that SP can be expressed in terms of the strict power sets of I1, ..., Ik using union and minus.
    A constraint Cs is succinct provided SAT_Cs(I) is a succinct power set.
    The sets that satisfy a succinct constraint can be generated explicitly and precisely.
    Example: min(S.Price) <= value

Properties of constraints
  Convertible
    Constraints that are not anti-monotone, monotone, or succinct.
    Such constraints become either anti-monotone or monotone if the items of the itemset under test are ordered.
    Example:
      If the items are sorted in ascending order, avg(I.price) <= 100 is a convertible anti-monotone constraint.
      If the items are sorted in descending order, avg(I.price) <= 100 is a convertible monotone constraint.
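The ascending-order case can be illustrated with a small prefix-growth enumeration. Itemsets are grown only by appending items that come later in ascending price order, and a prefix whose average already exceeds the limit is never extended. The item names, prices, and function names below are hypothetical and serve only this sketch.

```python
from itertools import combinations

prices = {"pen": 20, "book": 60, "lamp": 120, "desk": 300}  # hypothetical prices
LIMIT = 100

def avg(items):
    return sum(prices[i] for i in items) / len(items)

ascending = sorted(prices, key=prices.get)

def grow(prefix, start, found):
    """Extend `prefix` only with items that come later in ascending price order."""
    for pos in range(start, len(ascending)):
        candidate = prefix + [ascending[pos]]
        if avg(candidate) <= LIMIT:
            found.add(frozenset(candidate))
            grow(candidate, pos + 1, found)
        # Otherwise prune: later items cost at least as much as the last one,
        # which is at least the current average, so the average cannot drop.
    return found

satisfying = grow([], 0, set())
print(sorted(sorted(s, key=prices.get) for s in satisfying))

# Reference answer without pruning.
brute_force = {frozenset(c)
               for k in range(1, len(prices) + 1)
               for c in combinations(prices, k)
               if avg(c) <= LIMIT}
print(satisfying == brute_force)
```

Without the ordering the constraint is not anti-monotone: {lamp} violates avg <= 100 while its superset {pen, lamp} satisfies it. In ascending order, however, {lamp} is never a prefix of (pen, lamp), so pruning violating prefixes loses nothing.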


Mining rules/frequent itemsets that satisfy the constraints
  Direct approach
    Apply the traditional algorithms.
    Check the constraints against each result obtained.
      If a result satisfies the constraints, return it as a final result.
  Approach based on the properties of the constraints
    Analyze the properties of the constraints comprehensively.
    Check the constraints as early as possible while mining rules/frequent itemsets.
      The data space is narrowed as early as possible.

6.6. Correlation analysis
  Strong association rules A => B
    Based on the frequency of occurrence of A and B (min_sup).
    Based on the conditional probability of B given A (min_conf).
    The support and confidence thresholds depend on the subjective choice of the user.
      A very large number of association rules may be returned.
  Example: among 10,000 transactions, 6,000 include computer games, 7,500 include videos, and 4,000 include both computer games and videos.
    Buys(X, computer games) => Buys(X, videos) [support = 40%, confidence = 66%]

Correlation analysis for an association rule A => B
  Checks the correlation and mutual dependence between A and B.
  Is based on statistics of the data.
  Uses objective measures that do not depend on the user.
  Example: among 10,000 transactions, 6,000 include computer games, 7,500 include videos, and 4,000 include both computer games and videos.
    Buys(X, computer games) => Buys(X, videos) [support = 40%, confidence = 66%]
    P(videos) = 75% > 66%: computer games and videos are negatively correlated.

Correlation rules: A => B [support, confidence, correlation]
  correlation: a measure of the correlation between A and B.
  Correlation measures: lift, χ2 (chi-square), all_confidence, cosine
    lift: tests whether A and B occur independently, based on probabilities.
    χ2 (chi-square): tests the independence of A and B, based on expected and observed values.
    all_confidence: evaluates the rule based on the maximum support value.
    cosine: similar to lift, but removes the dependence on the total number of transactions.
  all_confidence and cosine work well for large data sets, because they do not depend on transactions that contain none of the itemsets under test (null-transactions).
  all_confidence and cosine are null-invariant measures.
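Null-invariance can be seen numerically by padding a data set with transactions that contain neither itemset. The slides describe the measures only informally, so the formulas in the sketch below follow the usual textbook definitions and should be read as an assumption: lift = P(A U B)/(P(A)P(B)), all_confidence = P(A U B)/max(P(A), P(B)), cosine = P(A U B)/sqrt(P(A)P(B)). The padding of 90,000 null-transactions is hypothetical.

```python
from math import sqrt

def measures(n_ab, n_a, n_b, n_total):
    """lift, all_confidence and cosine computed from support counts."""
    p_ab, p_a, p_b = n_ab / n_total, n_a / n_total, n_b / n_total
    return {
        "lift": p_ab / (p_a * p_b),
        "all_confidence": p_ab / max(p_a, p_b),
        "cosine": p_ab / sqrt(p_a * p_b),
    }

# Counts from the computer games / videos example in this section.
base = measures(4000, 6000, 7500, 10_000)
# Same co-occurrence counts plus 90,000 null-transactions (hypothetical padding).
padded = measures(4000, 6000, 7500, 100_000)

for name in base:
    print(f"{name:15s} {base[name]:.4f} -> {padded[name]:.4f}")
```

Only lift changes when the null-transactions are added; all_confidence and cosine depend solely on the counts of A, B, and A U B, so the total number of transactions cancels out.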

The lift correlation measure
  lift(A, B) < 1: A is negatively correlated with B.
  lift(A, B) > 1: A is positively correlated with B.
  lift(A, B) = 1: A and B are independent; there is no correlation.

$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{P(B \mid A)}{P(B)} = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)}$$

  lift({game} => {video}) = 0.89 < 1
  {game} and {video} are negatively correlated.
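The lift value quoted above can be reproduced from the counts given earlier in this section (10,000 transactions, 6,000 with computer games, 7,500 with videos, 4,000 with both). The sketch also computes the chi-square statistic for the same 2x2 contingency table, using the standard sum of (observed - expected)^2 / expected; the variable names are chosen here.

```python
n, n_game, n_video, n_both = 10_000, 6_000, 7_500, 4_000

# lift = P(game and video) / (P(game) * P(video))
lift = (n_both / n) / ((n_game / n) * (n_video / n))

# 2x2 contingency table: observed counts vs. counts expected under independence.
observed = {
    (True, True): n_both,
    (True, False): n_game - n_both,
    (False, True): n_video - n_both,
    (False, False): n - n_game - n_video + n_both,
}
chi_square = 0.0
for (game, video), obs in observed.items():
    row = n_game if game else n - n_game
    col = n_video if video else n - n_video
    expected = row * col / n
    chi_square += (obs - expected) ** 2 / expected

print(f"lift = {lift:.2f}, chi-square = {chi_square:.1f}")
```

The observed count for (game, video) is 4,000 while independence would predict 4,500, which matches the negative correlation indicated by a lift below 1.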

6.7. Summary
  Association rule mining
    Is regarded as one of the most important contributions of the database community to knowledge discovery.
  Kinds of rules: Boolean/quantitative association rules, single-dimensional/multidimensional association rules, single-level/multilevel association rules, association rules/correlation rules.
  Kinds of items/patterns: frequent itemsets/subsequences/substructures, closed frequent itemsets, maximal frequent itemsets, constrained frequent itemsets, approximate frequent itemsets, top-k frequent itemsets.
  Mining frequent itemsets: the Apriori algorithm and the FP-Growth algorithm, which uses an FP-tree.

Questions & Answers

Further reading
  R. Agrawal, R. Srikant. Fast algorithms for mining association rules. In VLDB 1994, pp. 487-499.
  J. Han, J. Pei, Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD 2000, pp. 1-12.
  J. Hipp, U. Guntzer, G. Nakhaeizadeh (2000). Algorithms for association rule mining: a general survey and comparison. SIGKDD Explorations 2:1, pp. 58-64.
  W-J Lee, S-J Lee (2004). Discovery of fuzzy temporal association rules. IEEE Transactions on Systems, Man, and Cybernetics, Part B 34:6, pp. 2330-2342.

Chapter 6: Association Rules
Appendix

Appendix contents
  Fuzzy association rules
    [3], chapter 5: Fuzzy Sets in Data Mining, pp. 79-86.
  Incremental association rule mining
    Hong, T.P., Lin, C.W., Wu, Y.L. (2008), Incrementally fast updated frequent pattern trees, Expert Systems with Applications, 34(4), pp. 2424-2435.
    Lin, C.W., Hong, T.P., Lu, W.H. (2009), The Pre-FUFP algorithm for incremental mining, Expert Systems with Applications, 36(5), pp. 9498-9505.
  Summary of the appendix
