Association Rules
Ho Chi Minh
Chapter 6: Association Rules
Graduate Program in Computer Science
Electronic course material compiled by: Dr. Võ Thị Ngọc Châu (chauvtn@cse.hcmut.edu.vn)
Semester 1, 2011-2012
Contents
Chapter 1: Overview of data mining
Chapter 2: Data preprocessing issues
Chapter 3: Data regression
Chapter 4: Data classification
Chapter 5: Data clustering
Chapter 6: Association rules
Chapter 7: Data mining and database technology
Chapter 8: Data mining applications
Chapter 9: Research topics in data mining
Chapter 10: Review
Chapter 6: Association Rules
6.1. Overview of association rule mining
6.2. Representing association rules
6.3. Discovering frequent patterns
6.4. Discovering association rules from frequent patterns
6.5. Constraint-based association rule mining
6.6. Correlation analysis
6.7. Summary
[Figure labels: Mining, Raw Data, Items of Interest, Postprocessing, User]
Items: A, B, C, D, F, ...
A ⇒ C (support 50%, confidence 66.6%)
[Figure labels: Item: I4; Transaction: T800]
Multidimensional association rule: a rule involving items/attributes from more than one dimension.
Age(X, 30..39) ⇒ Buys(X, computer)
Multilevel association rule: a rule involving items/attributes at different levels of abstraction.
Age(X, 30..39) ⇒ Buys(X, laptop computer)
Age(X, 30..39) ⇒ Buys(X, computer)
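$$\mathrm{Support}(A \Rightarrow B) = \mathrm{Support}(A \cup B) \ge \mathit{min\_sup}$$
$$\mathrm{Confidence}(A \Rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)} = P(B \mid A) \ge \mathit{min\_conf}$$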
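The support and confidence definitions can be evaluated directly on a list of transactions. The following is a minimal Python sketch, not part of the original slides: the function names `support` and `confidence` and the four toy transactions are assumptions chosen for illustration.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t) / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Support(A ∪ B) / Support(A), an estimate of P(B | A).

    Assumes the antecedent occurs in at least one transaction.
    """
    a = frozenset(antecedent)
    b = frozenset(consequent)
    return support(a | b, transactions) / support(a, transactions)


# Hypothetical toy data (an assumption for illustration only).
transactions = [frozenset(t) for t in (
    ["A", "B", "C"],
    ["A", "C"],
    ["A", "D"],
    ["B", "E", "F"],
)]

sup_ac = support({"A", "C"}, transactions)        # 2 of 4 transactions
conf_a_c = confidence({"A"}, {"C"}, transactions)  # 2 of the 3 transactions with A
print(f"support = {sup_ac:.3f}, confidence = {conf_a_c:.3f}")
```

A rule is reported only when both values reach the user-chosen thresholds min_sup and min_conf.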
The Apriori property reduces the search space: all nonempty subsets of a frequent itemset must also be frequent.
Proof? Antimonotone: if a set cannot pass a test, all of its supersets will fail the same test as well.
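The property holds because any transaction that contains an itemset also contains each of its subsets, so a subset's support can never be smaller than that of the itemset. A minimal Python sketch of level-wise mining that uses this property to prune candidates is given here; the function name `apriori`, the absolute threshold `min_sup_count`, and the toy transactions are assumptions for illustration, not material from the slides.

```python
from itertools import combinations


def apriori(transactions, min_sup_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_sup_count}
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        # Count the surviving candidates with one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_sup_count}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical toy data (an assumption for illustration only).
toy = [["A", "B", "C"], ["A", "C"], ["A", "D"], ["B", "E", "F"]]
result = apriori(toy, min_sup_count=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is where the antimonotone test pays off: a candidate with any infrequent subset is discarded before the data is scanned for it.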
Sampling
Mine only a subset of the given data with a lower support threshold; a method is needed to verify completeness.
Avoid generating candidate sets
Examine one part of the data set at a time
2.3. Mine the conditional FP-tree and grow frequent itemsets recursively
If the conditional FP-tree has a single path, enumerate all the itemsets.
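The single-path case can be sketched as follows. This is an illustrative Python fragment under stated assumptions, not the slides' own code: the function name `itemsets_from_single_path`, the representation of a path as a list of `(item, count)` pairs, and the example path are all hypothetical, and every node on the path is assumed to already meet the minimum support.

```python
from itertools import combinations


def itemsets_from_single_path(path, suffix):
    """Enumerate the itemsets produced by a single-path conditional FP-tree.

    path:   list of (item, count) pairs from the root to the leaf.
    suffix: items that the conditional tree is conditioned on.
    Returns {itemset: support count}.
    """
    suffix = frozenset(suffix)
    results = {}
    for r in range(1, len(path) + 1):
        for combo in combinations(path, r):
            items = frozenset(item for item, _ in combo) | suffix
            # The support of a combination is the smallest count among its nodes.
            results[items] = min(count for _, count in combo)
    return results


# Hypothetical conditional tree for suffix I5 with the single path I2:2 -> I1:2.
patterns = itemsets_from_single_path([("I2", 2), ("I1", 2)], suffix=["I5"])
for itemset, count in patterns.items():
    print(sorted(itemset), count)
```

Because every combination of nodes on one path co-occurs in the same transactions, no further recursion is needed for such a tree.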
Uses a data structure that compresses the data from the data set
Reduces the cost of scanning the data set
The main cost is counting and building the initial FP-tree
Efficient and scalable for discovering both long and short frequent itemsets
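To make the compression and the two-pass construction cost concrete, here is a minimal Python sketch of building an FP-tree. It is an assumption-laden illustration rather than the slides' implementation: the class `FPNode`, the functions `build_fp_tree` and `count_nodes`, and the toy transactions are hypothetical, and the header table with node links used by full FP-Growth is omitted.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item, its count, and its children."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_sup_count):
    # Pass 1: count items and keep only the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in counts.items() if c >= min_sup_count}

    # Pass 2: insert each transaction, frequent items ordered by descending count
    # (ties broken by name) so that common prefixes share nodes.
    root = FPNode(None)
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, frequent


def count_nodes(node):
    """Number of item nodes below `node`."""
    return sum(1 + count_nodes(child) for child in node.children.values())


# Hypothetical toy data (an assumption for illustration only).
toy = [["a", "b"], ["b", "c"], ["a", "b", "c"], ["b"]]
root, frequent = build_fp_tree(toy, min_sup_count=2)
print(count_nodes(root), "nodes for", sum(len(t) for t in toy), "item occurrences")
```

In this toy case eight item occurrences are stored in four nodes, and after the two passes the original data set does not need to be scanned again.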
[Example: rules generated over the items I1, I2, I5 with min_conf = 50%]
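Generating rules from one frequent itemset under a min_conf threshold can be sketched as follows. This Python fragment is an illustration only: the function name `generate_rules` is hypothetical, and the support counts are assumed values, not figures taken from the slides.

```python
from itertools import combinations


def generate_rules(itemset, support_count, min_conf):
    """Return (antecedent, consequent, confidence) for every rule A => (itemset - A)
    whose confidence reaches min_conf.

    support_count maps frozensets to counts and must contain the itemset
    and all of its nonempty subsets.
    """
    itemset = frozenset(itemset)
    rules = []
    for r in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), r):
            a = frozenset(antecedent)
            conf = support_count[itemset] / support_count[a]
            if conf >= min_conf:
                rules.append((a, itemset - a, conf))
    return rules


# Assumed support counts (hypothetical values for illustration).
support_count = {
    frozenset({"I1"}): 6,
    frozenset({"I2"}): 7,
    frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4,
    frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2,
    frozenset({"I1", "I2", "I5"}): 2,
}

rules = generate_rules({"I1", "I2", "I5"}, support_count, min_conf=0.5)
for a, b, conf in rules:
    print(sorted(a), "=>", sorted(b), f"confidence = {conf:.2f}")
```

No extra pass over the data is needed here, because every count the confidence formula uses was already collected while mining the frequent itemsets.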
Examples of metarules
Metarule: P1(X, Y) ∧ P2(X, W) ⇒ buys(X, office software)
Rule satisfying the metarule: age(X, 30..39) ∧ income(X, 41k..60k) ⇒ buys(X, office software)
Aggregate functions
Agg(S1) θ value, where Agg() ∈ {min, max, sum, count, avg} and θ ∈ {=, <>, <, <=, >, >=}
Monotone
A constraint Cm is monotone iff, for any pattern S satisfying Cm, every super-pattern of S also satisfies it.
Example: sum(S.Price) >= value (assuming non-negative prices)
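The monotone behaviour of a sum constraint can be checked by brute force on a small item set. The following Python sketch is illustrative only: the function names `satisfies` and `is_monotone_on` and both price tables are assumptions, not part of the slides.

```python
from itertools import combinations


def satisfies(itemset, prices, value):
    """The constraint sum(S.Price) >= value."""
    return sum(prices[i] for i in itemset) >= value


def is_monotone_on(prices, value):
    """Check by enumeration that whenever S satisfies the constraint,
    every proper superset of S (within the given items) satisfies it too."""
    items = sorted(prices)
    subsets = [frozenset(c)
               for r in range(len(items) + 1)
               for c in combinations(items, r)]
    return all(
        satisfies(t, prices, value)
        for s in subsets if satisfies(s, prices, value)
        for t in subsets if s < t
    )


# Hypothetical price tables (assumptions for illustration only).
non_negative = {"pen": 2, "book": 15, "lamp": 30}
with_negative = {"pen": 2, "coupon": -10, "lamp": 30}

print(is_monotone_on(non_negative, 20))
print(is_monotone_on(with_negative, 25))
```

With non-negative prices, adding an item can only raise the sum, so a pattern that passes never needs to be re-tested when it grows; the second table shows how a negative price breaks that guarantee.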
Of 10,000 transactions, 6,000 include computer games, 7,500 include videos, and 4,000 include both computer games and videos.
Buys(X, computer games) ⇒ Buys(X, videos) [support = 40%, confidence = 66%]
all_confidence and cosine work well for large data sets because they do not depend on transactions that contain none of the itemsets under examination (null-transactions). all_confidence and cosine are null-invariant measures.
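Null-invariance can be observed by adding null-transactions and recomputing both measures. The sketch below is a hedged Python illustration: it assumes the common definitions all_confidence(A, B) = sup(A ∪ B) / max(sup(A), sup(B)) and cosine(A, B) = sup(A ∪ B) / sqrt(sup(A) · sup(B)), and the function name `measures` and the 20 toy transactions (scaled to the same proportions as the 10,000-transaction computer games and videos example) are assumptions.

```python
import math


def measures(transactions, a, b):
    """Return (all_confidence, cosine) for items a and b, computed from counts."""
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    all_conf = n_ab / max(n_a, n_b)
    cos = n_ab / math.sqrt(n_a * n_b)
    return all_conf, cos


# Hypothetical toy data: 8 with both items, 4 games only, 7 videos only, 1 with neither.
base = ([{"games", "videos"}] * 8 + [{"games"}] * 4
        + [{"videos"}] * 7 + [{"other"}] * 1)

# The same data with 1,000 extra null-transactions (neither games nor videos).
padded = base + [{"other"}] * 1000

before = measures(base, "games", "videos")
after = measures(padded, "games", "videos")
print(before, after)
```

Neither formula refers to the total number of transactions, which is why the two results coincide.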
$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)}$$
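As a worked check, lift can be computed for the computer games and videos example (10,000 transactions, 6,000 with computer games, 7,500 with videos, 4,000 with both). This is a small Python sketch; the function name `lift` and its count-based parameters are assumptions for illustration.

```python
def lift(n_a: int, n_b: int, n_ab: int, n_total: int) -> float:
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B)), estimated from transaction counts."""
    p_a = n_a / n_total
    p_b = n_b / n_total
    p_ab = n_ab / n_total
    return p_ab / (p_a * p_b)


value = lift(n_a=6000, n_b=7500, n_ab=4000, n_total=10000)
print(f"lift = {value:.3f}")
```

A value below 1 indicates that the two items are negatively correlated, a value above 1 that they are positively correlated, and a value of 1 that they are independent. Here 0.4 / (0.6 × 0.75) is about 0.89, so the rule is misleading despite its 66% confidence.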
6.7. Summary
Association rule mining
Regarded as one of the most important contributions of the database community to knowledge discovery
Types of rules: Boolean/quantitative association rules, single-dimensional/multidimensional association rules, single-level/multilevel association rules, association rules/statistical correlation rules
Types of items/patterns: frequent itemsets/subsequences/substructures, closed frequent itemsets, maximal frequent itemsets, constrained frequent itemsets, approximate frequent itemsets, top-k frequent itemsets
Discovering frequent itemsets: the Apriori algorithm and the FP-Growth algorithm using the FP-tree
Questions & Answers
Further reading
R. Agrawal, R. Srikant. Fast algorithms for mining association rules. In VLDB 1994, pp. 487-499.
J. Han, J. Pei, Y. Yin. Mining frequent patterns without candidate generation. In SIGMOD 2000, pp. 1-12.
J. Hipp, U. Guntzer, G. Nakhaeizadeh (2000). Algorithms for association rule mining: a general survey and comparison. SIGKDD Explorations 2:1, pp. 58-64.
W-J Lee, S-J Lee (2004). Discovery of fuzzy temporal association rules. IEEE Transactions on Systems, Man, and Cybernetics, Part B 34:6, pp. 2330-2342.
Chapter 6: Association Rules
Appendix
Appendix contents
Fuzzy association rules
[3], chapter 5: Fuzzy Sets in Data Mining, pp. 79-86.
Appendix summary