Data Mining - Chapter 6
Semester 1, 2009-2010
Contents
6.7. Summary
Basic concepts
Figure: the association rule mining process: Raw Data → Mining → Items of Interest → Relationships among Items (Rules) → Postprocessing → User
Figure: the same process applied to transactional/relational data. The items of interest are A, B, C, D, E, F, and the relationship found among them is an association rule.

Transaction ID | Items bought
2000           | A, B, C
1000           | A, C
4000           | A, D
5000           | B, E, F

Association rule: A ⇒ C (support 50%, confidence 66.6%)
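The following minimal Python sketch recomputes the two figures of the rule A ⇒ C from the four transactions in the table. It reads the pair (50%, 66.6%) as (support, confidence). The function names are illustrative only.

```python
# Transactions from the example table: transaction ID -> items bought
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}

def support_count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)

def rule_support_confidence(lhs, rhs, db):
    """Support and confidence of the rule lhs => rhs, both as fractions."""
    both = support_count(lhs | rhs, db)
    support = both / len(db)                      # P(lhs and rhs)
    confidence = both / support_count(lhs, db)    # P(rhs | lhs)
    return support, confidence

sup, conf = rule_support_confidence({"A"}, {"C"}, transactions)
print(f"A => C: support = {sup:.1%}, confidence = {conf:.1%}")
```

It prints support = 50.0% and confidence = 66.7%. The slide truncates 2/3 to 66.6%.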
Item
Support
Figure labels: itemsets {I1, I2, I5} and {I2}; item I4; transaction T800.
Itemset: a set of items
Support
Let A be an itemset.
A is a frequent itemset iff support(A) ≥ the minimum support threshold (min_sup).
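The definition can be applied literally by testing every possible itemset. The sketch below does this for the four example transactions. The 50% threshold is a hypothetical value chosen only for illustration. Over six items the function examines all 2^6 − 1 = 63 nonempty itemsets, which is exactly the cost that Apriori's pruning is designed to avoid.

```python
from itertools import combinations

# The four example transactions (IDs omitted)
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
min_sup = 0.5  # hypothetical threshold for illustration: 50% of the transactions

def frequent_itemsets_brute_force(db, min_sup):
    """Test every nonempty itemset against support(X) >= min_sup."""
    items = sorted(set().union(*db))
    min_count = min_sup * len(db)
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            count = sum(1 for t in db if set(combo) <= t)
            if count >= min_count:
                result[frozenset(combo)] = count
    return result

found = frequent_itemsets_brute_force(transactions, min_sup)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

It reports {A}, {B}, {C} and {A, C} as frequent. This is consistent with the 50% support of A ⇒ C.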
A and B are itemsets.
Frequent itemsets/subsequences/substructures
An itemset/subsequence/substructure X is frequent if support(X) ≥ min_sup.
Itemsets: sets of items
A and B are frequent itemsets.
single-dimensional
single-level
Boolean
(k+1)-itemsets are generated from k-itemsets.
Proof?
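Below is a standard argument for the property behind this step: every nonempty subset of a frequent itemset is frequent. It uses the support_count notation from later in this chapter. D denotes the transaction database, a symbol introduced here for the proof.

$$
\begin{aligned}
&A \subseteq B \;\Rightarrow\; \{T \in D : B \subseteq T\} \subseteq \{T \in D : A \subseteq T\}\\
&\phantom{A \subseteq B} \;\Rightarrow\; \mathrm{support\_count}(B) \le \mathrm{support\_count}(A)\\
&\phantom{A \subseteq B} \;\Rightarrow\; \big(\mathrm{support}(A) < \mathit{min\_sup} \;\Rightarrow\; \mathrm{support}(B) < \mathit{min\_sup}\big)
\end{aligned}
$$

Consequently, a candidate (k+1)-itemset with any infrequent k-subset can be discarded without counting it in the database.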
min_sup = 2/9
minimum support count = 2
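A threshold of min_sup = 2/9 suggests a 9-transaction database. The sketch below assumes the 9-transaction AllElectronics table from Han & Kamber's textbook. The figure labels above (I4, T800) appear to come from that table, but the table itself is not reproduced in this text, so treat the data as an assumption.

For brevity, the join step takes unions of any two frequent k-itemsets that have size k+1, rather than the lexicographic prefix join. After the prune step this yields the same candidates.

```python
from itertools import combinations

# ASSUMPTION: Han & Kamber's AllElectronics transactions T100..T900
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
MIN_SUP_COUNT = 2  # min_sup = 2/9 of 9 transactions

def apriori(db, min_count):
    """Level-wise search: frequent k-itemsets generate the (k+1)-candidates."""
    # L1: count single items in one scan
    counts = {}
    for t in db:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 1
    while level:
        prev = list(level)
        # Join step: unions of two frequent k-itemsets that have k+1 items
        candidates = {a | b for i, a in enumerate(prev)
                      for b in prev[i + 1:] if len(a | b) == k + 1}
        # Prune step (Apriori property): every k-subset must be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # One database scan counts the surviving candidates
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent

frequent = apriori(transactions, MIN_SUP_COUNT)
for size in sorted({len(s) for s in frequent}):
    level_k = sorted(sorted(s) for s in frequent if len(s) == size)
    print(f"L{size}:", [(s, frequent[frozenset(s)]) for s in level_k])
```

On this data, L1 has 5 itemsets and L2 has 6. L3 is {I1, I2, I3} and {I1, I2, I5}, each with count 2. The only 4-item candidate, {I1, I2, I3, I5}, is pruned because {I1, I3, I5} is infrequent.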
Characteristics
Generates a very large number of candidate sets:
10^4 frequent 1-itemsets → more than 10^7 (10^4(10^4 − 1)/2) candidate 2-itemsets.
Finding a k-itemset requires generating at least 2^k − 1 candidate itemsets beforehand.
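The arithmetic behind these figures can be checked directly. The values 10^4 and k = 100 are example sizes: the first comes from the slide and the second is chosen here for illustration. 2^k − 1 counts the nonempty subsets of a k-itemset. By the Apriori property each of these subsets must be frequent, so each is generated as a candidate along the way.

```python
from math import comb

n_frequent_1 = 10**4
# Each pair of frequent 1-itemsets is a candidate 2-itemset
candidate_2 = comb(n_frequent_1, 2)
print(candidate_2)   # 49995000, about 5 * 10**7

# Nonempty subsets of a single frequent 100-itemset
k = 100
subsets = 2**k - 1
print(f"{subsets:.3e}")  # about 1.268e+30
```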
Sampling: mine only a subset of the given data, using a lower support threshold. A method is then needed to determine completeness.
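Below is a minimal sketch of the sampling idea, assuming a small synthetic database. The function names and the parameters `sample_frac` and `slack` are hypothetical. The idea is to mine a random sample with a lowered threshold and then verify the candidates in one scan of the full data.

The sketch does not implement the completeness check the slide mentions. Itemsets that are frequent in the full data but missed in the sample are not recovered. Brute-force counting up to `max_size` keeps the example short.

```python
import random
from itertools import combinations

def frequent_itemsets(db, min_count, max_size=3):
    """Brute-force counting of itemsets up to max_size items."""
    items = sorted(set().union(*db))
    found = {}
    for k in range(1, max_size + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            n = sum(1 for t in db if s <= t)
            if n >= min_count:
                found[s] = n
    return found

def sample_then_verify(db, min_sup, sample_frac=0.3, slack=0.8, seed=0):
    """Mine a random sample with a lowered threshold, then verify on all of db."""
    rng = random.Random(seed)
    sample = rng.sample(db, max(1, int(len(db) * sample_frac)))
    # Lowered threshold on the sample reduces (but does not remove) misses
    candidates = frequent_itemsets(sample, slack * min_sup * len(sample))
    # One scan of the full database keeps only truly frequent candidates
    verified = {}
    for s in candidates:
        n = sum(1 for t in db if s <= t)
        if n >= min_sup * len(db):
            verified[s] = n
    return verified

# Synthetic transactions over items A..H, for illustration only
rng = random.Random(42)
db = [set(rng.sample("ABCDEFGH", rng.randint(2, 5))) for _ in range(200)]
approx = sample_then_verify(db, min_sup=0.3)
print(len(approx), "itemsets verified as frequent on the full data")
```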
Avoid generating candidate sets
1. Build the FP-tree
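The sketch below shows FP-tree construction in two database scans. It assumes the same 9-transaction AllElectronics data as the Apriori example, which is not reproduced in this text, and a minimum support count of 2. Ties in item frequency are broken by item name. The header table is a plain list of nodes per item rather than linked node-links, and the mining step (FP-growth) is not shown.

```python
from collections import Counter

# ASSUMPTION: Han & Kamber's AllElectronics transactions
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(db, min_count):
    """Scan 1 counts items; scan 2 inserts each transaction as an ordered path."""
    counts = Counter(item for t in db for item in t)
    # Frequent items ranked by descending count (ties broken by name)
    ranked = sorted((i for i, c in counts.items() if c >= min_count),
                    key=lambda i: (-counts[i], i))
    order = {item: rank for rank, item in enumerate(ranked)}
    root = FPNode(None, None)
    header = {item: [] for item in order}  # item -> all nodes holding that item
    for t in db:
        node = root
        # Keep only frequent items, in descending-frequency order
        for item in sorted((i for i in t if i in order), key=order.get):
            child = node.children.get(item)
            if child is None:  # start a new branch
                child = FPNode(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += 1   # shared prefixes only increment counts
            node = child
    return root, header

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

root, header = build_fp_tree(transactions, 2)
show(root)
```

The root gets two branches, I2:7 and I1:2. Transactions that share a prefix, such as I2, I1, share nodes, which is what makes the tree compact.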
Characteristics
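$$\mathrm{support}(A \Rightarrow B) = \mathrm{support\_count}(A \cup B) \ge \mathit{min\_sup}$$
$$\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \frac{\mathrm{support\_count}(A \cup B)}{\mathrm{support\_count}(A)} \ge \mathit{min\_conf}$$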
Rules generated from the frequent itemset {I1, I2, I5}, with min_conf = 50%:
I1 ∧ I2 ⇒ I5
I1 ∧ I5 ⇒ I2
I2 ∧ I5 ⇒ I1
I5 ⇒ I1 ∧ I2
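This sketch generates rules from a single frequent itemset. For every nonempty proper subset s, it emits s ⇒ (itemset − s) when support_count(itemset) / support_count(s) ≥ min_conf. It assumes the same 9-transaction AllElectronics data as the earlier sketches, which is not reproduced in this text.

```python
from itertools import combinations

# ASSUMPTION: Han & Kamber's AllElectronics transactions
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def support_count(itemset, db):
    return sum(1 for t in db if itemset <= t)

def rules_from_itemset(itemset, db, min_conf):
    """Emit (lhs, rhs, confidence) for each rule lhs => itemset - lhs meeting min_conf."""
    itemset = frozenset(itemset)
    total = support_count(itemset, db)
    rules = []
    for r in range(1, len(itemset)):          # nonempty proper subsets only
        for lhs in combinations(sorted(itemset), r):
            lhs = frozenset(lhs)
            conf = total / support_count(lhs, db)
            if conf >= min_conf:
                rules.append((sorted(lhs), sorted(itemset - lhs), conf))
    return rules

rules = rules_from_itemset({"I1", "I2", "I5"}, transactions, 0.5)
for lhs, rhs, conf in rules:
    print(" ^ ".join(lhs), "=>", " ^ ".join(rhs), f"({conf:.0%})")
```

Under this assumed data, the four rules listed above pass. I1 ∧ I2 ⇒ I5 has exactly 50% confidence (2/4), and the other three have 100%. I1 ⇒ I2 ∧ I5 (2/6) and I2 ⇒ I1 ∧ I5 (2/7) are rejected.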
Thresholds of the measures
Metarules
Examples of metarules
Metarule:
P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "office software")
Set containment (superset/subset) relations: S1 ⊆ S2
Value domain: value ∈ S1
Aggregate functions
Anti-monotone
Monotone
Succinctness
Convertible
Convertible
Example:
If the items are sorted in ascending order of price, avg(I.price) ≤ 100 is a convertible anti-monotone constraint.
If the items are sorted in descending order of price, avg(I.price) ≤ 100 is a convertible monotone constraint.
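This sketch checks both claims exhaustively on a handful of hypothetical item prices. It enumerates every itemset together with every extension that appends items later in the chosen order.

In ascending order, appended items cost at least as much as the current average. A violating prefix therefore stays violating, which is anti-monotone behavior. In descending order, appended items cost at most the current average, so a satisfying prefix stays satisfying, which is monotone behavior. Without an order the constraint is neither: {d} violates it but {d, a} satisfies it.

```python
from itertools import combinations

# Hypothetical item prices, for illustration only
prices = {"a": 20, "b": 60, "c": 90, "d": 150, "e": 300}

def satisfies(items, v=100):
    """The constraint avg(I.price) <= v."""
    return sum(prices[i] for i in items) / len(items) <= v

def prefix_extensions(order):
    """Yield (prefix, extension): extension appends items placed after the prefix in `order`."""
    for n in range(1, len(order) + 1):
        for pos in combinations(range(len(order)), n):
            prefix = [order[p] for p in pos]
            rest = order[pos[-1] + 1:]
            for m in range(1, len(rest) + 1):
                for tail in combinations(rest, m):
                    yield prefix, prefix + list(tail)

ascending = sorted(prices, key=prices.get)
descending = ascending[::-1]

# Anti-monotone w.r.t. ascending order: a violating prefix never leads to a satisfying extension
anti = all(not satisfies(ext)
           for pre, ext in prefix_extensions(ascending) if not satisfies(pre))
# Monotone w.r.t. descending order: a satisfying prefix only leads to satisfying extensions
mono = all(satisfies(ext)
           for pre, ext in prefix_extensions(descending) if satisfies(pre))
print(anti, mono)                              # True True
print(satisfies(["d"]), satisfies(["d", "a"]))  # False True
```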
The support and confidence thresholds depend on the user's subjective judgment.
A very large number of association rules may be returned.
Measures based on statistics of the data
$$\mathrm{lift}(A, B) = \frac{P(A \cup B)}{P(A)\,P(B)} = \frac{P(B \mid A)}{P(B)} = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)}$$
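The sketch below computes lift both ways on the four-transaction example from the start of the chapter (IDs 2000, 1000, 4000, 5000). As in the support formula, P(A ∪ B) is the fraction of transactions that contain both A and B.

```python
# The four example transactions (IDs omitted)
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

def p(itemset, db):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)

def lift(a, b, db):
    """P(A u B) / (P(A) P(B))."""
    return p(a | b, db) / (p(a, db) * p(b, db))

def lift_via_confidence(a, b, db):
    """confidence(A => B) / support(B)."""
    confidence = p(a | b, db) / p(a, db)
    return confidence / p(b, db)

print(lift({"A"}, {"C"}, transactions))
```

Here lift(A, C) = 0.5 / (0.75 × 0.5) = 4/3. A value above 1 is conventionally read as A and C occurring together more often than independence would predict.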
6.7. Summary
Association rule mining
Q & A