
Digitized by the Learning Resource Center, Thai Nguyen University http://www.lrc-tnu.edu.vn


THAI NGUYEN UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY

LE THU HA

THE ASSOCIATION RULE METHOD
AND ITS APPLICATIONS

Master's thesis: Computer Science

Thai Nguyen - 2009

THAI NGUYEN UNIVERSITY
FACULTY OF INFORMATION TECHNOLOGY

LE THU HA

THE ASSOCIATION RULE METHOD
AND ITS APPLICATIONS

Major: Computer Science
Code: 60 48 01

Master's Thesis in Computer Science

SCIENTIFIC SUPERVISOR: Assoc. Prof. Dr. VU DUC THI

Thai Nguyen - 2009

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
LIST OF FIGURES
INTRODUCTION
Chapter 1 OVERVIEW OF KNOWLEDGE DISCOVERY AND DATA MINING
1.1. Knowledge discovery and data mining
1.2. The process of knowledge discovery from databases
1.2.1. Defining the problem
1.2.2. Data collection and preprocessing
1.2.3. Mining the data
1.2.4. Presentation and evaluation
1.2.5. Putting the results into practice
1.3. Data mining
1.3.1. Views of data mining
1.3.2. Tasks of data mining
1.3.3. Deploying data mining
1.3.4. Some applications of data mining
1.3.5. Data mining techniques
1.3.6. Architecture of a data mining system
1.3.7. The data mining process
1.3.8. Difficulties in data mining
Chapter 2 ASSOCIATION RULES IN DATA MINING
2.1. The classic problem that led to association rule mining
2.2. Definition of association rules
2.3. Some approaches to association rule mining
Chapter 3 SOME ALGORITHMS FOR DISCOVERING ASSOCIATION RULES
3.1. The AIS algorithm
3.2. The SETM algorithm
3.3. The Apriori algorithm
3.4. The Apriori-TID algorithm
3.5. The Apriori-Hybrid algorithm
3.6. The FP_growth algorithm
3.7. The PARTITION algorithm [Savasere 95]
Chapter 4 MINING ASSOCIATION RULES IN THE EQUIPMENT MANAGEMENT PROBLEM AT CHU VAN AN HIGH SCHOOL, THAI NGUYEN
4.1. Problem statement
4.2. The database of the problem
4.3. Discretizing the original attributes into binary attributes
4.4. The database in binary form
4.5. Results of mining association rules with the Apriori algorithm
4.6. Results of mining the equipment management database of Chu Van An High School, Thai Nguyen
CONCLUSION
REFERENCES


INTRODUCTION
In recent years, the rapid development of information technology has greatly increased the capacity of information systems to collect and store information. In addition, the rapid, large-scale computerization of production, business and many other fields of activity has produced an enormous volume of data that must be stored. Millions of databases are in use in production, business, management and other activities, and many of them are extremely large, on the order of gigabytes or even terabytes.
This explosion has created an urgent need for new techniques and tools that automatically transform that enormous volume of data into useful knowledge. As a result, data mining has become a topical field in information technology, both worldwide and in Vietnam. Data mining is being widely applied in many areas of business and daily life: marketing, finance, banking and insurance, science, healthcare, security, the internet and so on. Many large organizations and companies around the world have applied data mining techniques to their production and business activities and have gained great benefits.
The purpose of this thesis is to study data mining techniques and the issues related to association rule mining, which aims to discover and report the relationships among data values in a database, and to apply them to the problem of managing equipment and supplies at Chu Van An High School, Thai Nguyen Province.
Research objectives of the thesis:
- Summarize the most fundamental knowledge related to association rule discovery and the search for knowledge in data.
- Building on the theory summarized, study the association rule method in depth and build an experimental program based on the Apriori algorithm.
Scientific significance of the thesis:
- This is a method that many scientists have studied and that has contributed to practice.
- The thesis can be regarded as a fairly complete and clear reference on the basic knowledge of association rule discovery.
Research method:
- Draw up a plan, a procedure and a schedule for the work.
- Consult a wide range of related documents and seek the opinions of experts in the field.
Scope of the research:
The most fundamental knowledge about association rule discovery, within the scope of a master's thesis.
Research results achieved:
- A summary of the most fundamental knowledge of association rule mining.
- The thesis can serve as a reference for those who wish to learn about data mining and association rule mining.
- An experimental software program based on the Apriori algorithm.
The thesis consists of 4 chapters:
Chapter 1: Presents an overview of knowledge discovery and data mining, covering the concepts of knowledge and data, the knowledge discovery process, and the tasks and techniques of knowledge discovery.
Chapter 2: Presents association rules, including their concepts, definitions and properties.
Chapter 3: Presents several techniques for mining association rules.
Chapter 4: Implements a program that finds association rules and applies it to the management of equipment and supplies at Chu Van An High School, Thai Nguyen Province.
This thesis was completed in a fairly short period of time. Nevertheless, it has achieved some good results, and I am continuing to refine the program so that it can be put to practical use in managing the equipment of Chu Van An High School, Thai Nguyen Province. I would be grateful for comments from teachers, colleagues and friends so that the thesis and the program can be improved further.

Chapter 1
OVERVIEW OF KNOWLEDGE DISCOVERY AND DATA MINING

1.1. Knowledge discovery and data mining
In an era of explosive growth in information technology, data storage technologies keep improving, which makes it easier for organizations to collect data. In business in particular, enterprises have recognized the importance of capturing and processing information so that their managers can draw up timely business strategies that bring large profits. For all these reasons, agencies, organizations and enterprises have each accumulated enormous volumes of data, on the order of gigabytes or even terabytes.
Data stored on this scale surely contains some value. According to statistics, however, only a small fraction of it (about 5% to 10%) is ever analyzed. Organizations do not know what they should or could do with the rest, yet they keep collecting it at great expense for fear that something important will be missed and needed later. Moreover, in a competitive environment people need more and more information, quickly, to support decision making, and a growing number of qualitative questions have to be answered from an enormous volume of existing data. For these reasons, traditional methods of database management and exploitation increasingly fail to meet practical needs, and this has given rise to a new technical trend: knowledge discovery and data mining (KDD - Knowledge Discovery and Data Mining).
We usually regard data as a string of bits, or numbers and symbols, or "objects" that carry some meaning when sent to a program in a particular form. We use bits to measure information, and we regard information as data from which redundancy has been removed, reduced to the minimum needed to characterize the data. Knowledge can be viewed as integrated information, comprising facts and the relationships among them. These relationships can be understood, discovered or learned. In other words, knowledge can be regarded as data with a high degree of abstraction and organization.
Knowledge discovery in databases is a process of identifying patterns or models in data that are valid, novel, useful and understandable. Data mining is one step in the knowledge discovery process; it consists of specialized data mining algorithms that, within acceptable limits on computational efficiency, find patterns or models in the data. In other words, the purpose of knowledge discovery and data mining is to find the patterns and/or models that exist in databases but remain hidden under mountains of data.
Many people treat data mining and knowledge discovery in databases as the same thing. In fact, data mining is only one essential step in the process of knowledge discovery in databases.
1.2. The process of knowledge discovery from databases
The knowledge discovery process can be divided into the following steps:
- Data cleaning: remove noisy or irrelevant data.
- Data integration: combine data from different sources.
- Data selection: select the data directly relevant to the task.
- Data transformation: convert the data into forms suitable for mining.
- Data mining: apply techniques to extract useful information or typical patterns from the data.
- Pattern evaluation: evaluate the patterns or knowledge obtained.
- Knowledge presentation: present the mined knowledge to the user.
[Figure 1.1: staged diagram with the steps "1. Understand and define the problem", "2. Collect and preprocess the data", "3. Mine the data: extract patterns/models", "4. Present and evaluate the knowledge", "5. Put the results into practice"]
Figure 1.1. The process of knowledge discovery from databases
Figure 1.1 depicts the 5 stages of the process of knowledge discovery from databases. Although it has 5 stages, the process is interactive and iterative: it repeats in a continuous spiral in which each iteration is more complete than the one before. In addition, each stage builds on the results of the preceding stage, in waterfall fashion. This is a dialectical, scientific process in the field of knowledge discovery, and it is the methodology for building knowledge discovery systems.
1.2.1. Defining the problem
This is a qualitative process whose purpose is to identify the domain in which knowledge is to be discovered and to formulate the overall problem. In practice, databases are specialized and divided by domain, such as products, business, finance and so on. A piece of discovered knowledge may be valuable in one domain yet carry little meaning in another. Identifying the domain and defining the problem therefore helps to orient the next stage, data collection and preprocessing.
1.2.2. Data collection and preprocessing
The databases obtained usually contain a great many attributes, yet they are incomplete and heterogeneous and contain many errors and special values. The data collection and preprocessing stage is therefore very important in the process of knowledge discovery from databases. It can be said that this stage accounts for 70% to 80% of the cost of the whole problem.
The data collection and preprocessing stage is divided into the following tasks: data selection, cleaning, enrichment and encoding. The tasks are carried out in sequence to produce a database suitable for the later stages. However, the process is adjusted to fit the particular data, and a suitable method is devised for each kind of data.
a. Data selection: This step selects the relevant data from the different data sources. The information is selected so that it contains as much as possible that is relevant to the knowledge discovery domain identified in the problem definition stage.
b. Data cleaning: Real data, especially data drawn from many different sources, is usually not uniform. It must therefore be processed into a unified database ready for mining. Data cleaning usually includes:
- Data reconciliation: This task reduces the inconsistency that arises because the data comes from many different sources. The usual method is to eliminate duplicate data and to unify the notation. For example, one customer may have several records because the name was entered incorrectly or because some personal details changed, which gives the false impression of several different customers.
- Handling missing values: Incomplete data may contain missing values. This is quite common. Several methods are available for handling missing values: ignoring the tuples that have missing values, filling them in by hand, filling them in with a global constant, using the mean value of all records in the same class, or using the most frequently occurring values.
- Handling noise and exceptions: Noise in data may be random noise or abnormal values. To clean out noise, one can use smoothing methods or use algorithms that detect the exceptions so that they can be handled.
c. Data enrichment: Data collection sometimes fails to ensure that the data is complete. Some important information may be missing or incomplete. For example, customer data taken from an external source may have no income information, or only partial income information. If income is important when the data is mined to analyze customer behavior, then records with missing values clearly cannot be accepted.
Enrichment also includes integrating and transforming the data. Data from many different sources is integrated into a single unified store. Different data formats are converted and recalculated into one common form that is convenient for analysis. Sometimes new attributes can be constructed from the old ones.

d. Encoding: The methods used to select, clean and enrich the data are encoded as procedures, programs or utilities in order to automate the extraction, transformation and movement of data. These subsystems can be run periodically to refresh the data used for analysis.
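Two of the missing-value strategies listed under data cleaning above (the mean of records in the same class, and the most frequent value) can be sketched as follows. This is a minimal illustration, not the thesis program: the record layout, the field names `segment`, `income` and `city`, and the sample values are all assumptions made for the example.

```python
from collections import Counter
from statistics import mean


def fill_with_class_mean(records, field, class_field):
    """Replace None in `field` with the mean of `field` over records of the same class.

    Assumes every class has at least one known value; otherwise a KeyError is raised.
    """
    known = {}
    for r in records:
        if r[field] is not None:
            known.setdefault(r[class_field], []).append(r[field])
    filled = []
    for r in records:
        r = dict(r)  # copy so the caller's records are left untouched
        if r[field] is None:
            r[field] = mean(known[r[class_field]])
        filled.append(r)
    return filled


def fill_with_mode(records, field):
    """Replace None in `field` with the most frequent known value of that field."""
    counts = Counter(r[field] for r in records if r[field] is not None)
    most_common_value = counts.most_common(1)[0][0]
    return [
        dict(r, **{field: most_common_value}) if r[field] is None else dict(r)
        for r in records
    ]


# Hypothetical customer records with gaps in two fields.
customers = [
    {"segment": "A", "income": 10.0, "city": "Hanoi"},
    {"segment": "A", "income": None, "city": "Hanoi"},
    {"segment": "A", "income": 14.0, "city": None},
    {"segment": "B", "income": 30.0, "city": "Hanoi"},
]

cleaned = fill_with_mode(fill_with_class_mean(customers, "income", "segment"), "city")
```

Which strategy is appropriate depends on the data; dropping incomplete tuples or filling by hand, also mentioned above, may be preferable when few values are missing.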
1.2.3. Mining the data
The data mining stage begins once the data has been collected and processed. The main work in this stage is to define the data mining problem, choose a mining method suited to the available data, and extract the required knowledge.
Data mining problems usually fall into two kinds: descriptive problems, which report the most general properties of the data, and predictive problems, which involve performing inference on the data. The data mining methods are chosen to suit the problem that has been defined.
1.2.4. Presentation and evaluation
The knowledge discovered from the database needs to be summarized in reports that serve different decision support purposes.
Because many mining methods can be applied, the results vary in quality. Evaluating the results is necessary and helps to provide a basis for strategic decisions. The results are usually summarized and compared using charts, then verified and computerized. This work is usually done by experts, analysts and decision makers.
1.2.5. Putting the results into practice
The results of the knowledge discovery process can be applied in different fields. Because the results may be predictions or descriptions, they can be fed into decision support systems in order to automate that process.

The knowledge discovery process can be carried out in the steps above. During mining, improvements and upgrades can also be made as appropriate.
1.3. Data mining
1.3.1. Views of data mining
The following are several views of data mining:
Data mining is a collection of algorithms for extracting useful information from enormous data stores.
Data mining is defined as a process of discovering patterns in data. The process may be automatic or semi-automatic, though it is mostly semi-automatic. The patterns discovered are useful in the sense that they give the user some advantage, usually an economic one.
Data mining resembles a process of finding and describing patterns in data. The data is a collection of things or events, and the output of the data mining process is a set of predictions about new things or events.
Data mining is applied to relational databases, transactional databases and spatial databases, as well as to unstructured data stores, of which the World Wide Web is a typical example.
Knowledge discovery is a process of identifying patterns or models in data that are valid, novel, useful and understandable. Data mining is one step in the knowledge discovery process; it consists of specialized data mining algorithms that, within acceptable limits on computational efficiency, find patterns and models in the data.
Thus the purpose of knowledge discovery and data mining is to find the patterns or models that exist in databases but remain hidden by the enormous amount of data.

1.3.2. Tasks of data mining
Problems in data mining are, in essence, statistical problems. What distinguishes data mining techniques from the familiar tools of statistical computation is the volume of computation required. Once the data has become enormous, steps such as data collection, preprocessing and processing all have to be automated. The final step, however, analyzing the results after the data has been mined, always remains a human task.
Because it is a multidisciplinary field, data mining draws on other scientific fields such as artificial intelligence, databases, data visualization, marketing, mathematics, operations research, bioinformatics, pattern recognition and statistical computation.
What data mining does very well is to find strong hypotheses before statistical tools are applied. A predictive model uses clustering to divide things and events into groups, then derives rules that characterize each group, and finally proposes a model. For example, the long-term subscribers of a magazine can be grouped by many different criteria (age, gender, income and so on), after which the magazine uses the characteristics of each group to set the most suitable annual subscription fee.
The most basic tasks of data mining are:
- Clustering, categorization, grouping and classification. The task is to answer the question: to which group does a newly collected data item belong? This process is usually automatic.
- Association rule mining. The task is to discover similar relationships among transaction records. An association rule X=>Y has the general form: if a transaction has the properties X, then it also has the properties Y, to some degree. Association rule mining can be understood as asking: given the properties X, which properties are the properties Y?
- Predictive modeling, which covers two tasks: either assigning data to one or more predefined classes, or using the given fields of a database to predict the occurrence (or non-occurrence) of other cases.
- Outlier analysis: A database may contain objects that do not conform to the model of the data. Such data objects are called outliers. Most data mining methods treat outliers as noise and discard them. In some applications, however, such as noise detection, rarely occurring events attract more attention than those encountered regularly. The analysis of outlier data is regarded as outlier mining. Several methods are used to detect outliers: statistical tests based on a data distribution or a probability model for the data; distance measures, under which objects that lie a considerable distance from every cluster are regarded as outliers; and deviation-based methods, which examine differences in the main characteristics of groups of objects.
- Evolution analysis: Evolution analysis describes and models the regularities or trends of objects whose behavior changes over time. It may include characterization, discrimination, association rule finding, classification or clustering of time-related data, time series analysis, periodic pattern matching and similarity-based data analysis.
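The statistical approach to outlier detection mentioned in the list above can be illustrated with a simple z-score test. This is only a sketch under stated assumptions: it assumes a single numeric attribute that is roughly normally distributed, and the threshold of 2 standard deviations and the sample amounts are hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical transaction amounts; one value is far from the rest.
amounts = [10, 12, 11, 9, 10, 11, 10, 95]
suspicious = zscore_outliers(amounts)
```

A single extreme value inflates both the mean and the standard deviation, so on small samples a robust statistic such as the median may be a better basis for the test.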

1.3.3. Deploying data mining
Cabena et al. propose deploying the data mining process in 5 steps:
Step 1: Define clearly the business objective to be mined.
Step 2: Prepare the data (collect it, preprocess it, and convert its format if necessary).
Step 3: Mine the data (choose a suitable algorithm).
Step 4: Analyze the results (is there anything interesting?).
Step 5: Assimilate the knowledge gathered (in order to plan how to exploit the new information).
Another author also describes a 5-step data mining process, with a similar point of view:
1. Extract, transform and load the data into the data warehouse system.
2. Store and manage the data in a multidimensional database.
3. Define the mining objectives (using analysis tools and operational systems).
4. Use data analysis software to mine the data.
5. Present the mining results in a useful format, such as tables and charts.
1.3.4. Some applications of data mining
In the 1990s, data mining was regarded as a process of analyzing databases to discover new and valuable information, usually in the form of previously unknown relationships among variables. These findings were used to make enterprises more effective in the face of market competition. By analyzing customer data, an enterprise is able to predict some aspects of customer behavior.

In recent years, data mining (sometimes also called data discovery or knowledge discovery) has been regarded as a process of analyzing data from different perspectives and extracting useful information: information that can be used to increase profit, to cut costs, or both. Data mining software is an analytical tool for analyzing data. It allows users to analyze data from many different angles, to categorize it according to particular points of view, and to summarize the relationships that have been identified. From a technical standpoint, data mining is a process of searching for correlations among patterns hidden in dozens of data fields in a large relational database.
At present, data mining techniques are widely applied in many different areas of business and daily life, such as:
- Commerce: analysis of sales and market data, investment analysis, loan approval, fraud detection and so on.
- Manufacturing information: control and planning, management systems, analysis of test results and so on.
- Scientific information: weather forecasting; biological databases such as gene banks; geography, such as earthquake prediction; and so on.
- Healthcare, marketing, banking, telecommunications, tourism, the internet and so on.
The gains have been well worth it, as practice shows. In healthcare, diagnosis based on test results has helped health insurers detect many cases of unjustified testing, saving substantial funds each year. In telecommunications, groups of people who frequently call one another on mobile phones have been identified, yielding millions of USD in revenue. IBM Surf-Aid has applied data mining to analyze visits to market-related Web pages in order to discover customer preferences, and from these to evaluate the effectiveness of Web marketing and improve the operation of Websites. The online shopping site Amazon has also increased its revenue by applying data mining to analyze its customers' buying preferences.
1.3.5. Data mining techniques
Data mining techniques are usually divided into two main groups:
- Descriptive data mining techniques: These describe the general properties or characteristics of the data in the existing database. They include clustering, summarization, visualization, evolution and deviation analysis, association rule analysis and so on.
- Predictive data mining techniques: These make predictions based on inference from the current data. They include classification, regression and so on.
However, only a few methods are in the most common use: data clustering, data classification, regression and association rule mining.
a. Data clustering:
The main goal of data clustering is to group similar objects in the data set into clusters, so that objects in the same cluster are similar and objects in different clusters are dissimilar. Data clustering is an example of unsupervised learning. Unlike data classification, data clustering does not require training samples to be defined in advance. Data clustering can therefore be regarded as learning by observation, whereas data classification is learning by example. With this method it is impossible to know at the outset what the resulting clusters will look like, so a domain expert is usually needed to evaluate them. Data clustering is widely used in applications such as market segmentation, customer segmentation, pattern recognition and Web page classification. Data clustering can also be used as a preprocessing step for other data mining algorithms.
b. Data classification:
The goal of data classification is to predict class labels for data samples. The classification process usually has two steps: building a model, and using the model to classify data.
- Step 1: A model is built by analyzing the available data samples. Each sample belongs to a class, determined by an attribute called the class attribute. These samples are called the training data set. The class labels of the training data set must all be known before the model is built.
- Step 2: The model is used to classify data. First, the accuracy of the model must be computed. If the accuracy is acceptable, the model is used to predict class labels for other data samples in the future.
Examples of classification in data mining include classifying trends in financial markets and automatically identifying objects of interest in large image databases.
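The two classification steps described above (build a model from labeled training samples, then measure its accuracy before using it on new samples) can be sketched with a nearest-neighbor classifier. The choice of classifier is an assumption made for brevity; the text does not prescribe one, and the feature vectors and the labels "low" and "high" are hypothetical.

```python
def nearest_neighbor_predict(training, point):
    """Predict the label of `point` as the label of the closest training sample.

    `training` is a list of (features, label) pairs with numeric feature tuples.
    """
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(training, key=lambda pair: squared_distance(pair[0], point))
    return label


def accuracy(training, labeled_test):
    """Fraction of held-out labeled samples that the model classifies correctly."""
    correct = sum(
        1 for features, label in labeled_test
        if nearest_neighbor_predict(training, features) == label
    )
    return correct / len(labeled_test)


# Step 1: the training data set, whose class labels are known in advance.
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

# Step 2: estimate accuracy on held-out labeled samples, then predict a new sample.
held_out = [((2.0, 1.5), "low"), ((8.5, 9.5), "high")]
model_accuracy = accuracy(training, held_out)
new_label = nearest_neighbor_predict(training, (7.0, 7.0))
```

Keeping the held-out samples separate from the training set matters: accuracy measured on the training data itself would overstate how well the model handles unseen samples.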
c. Regression:
Regression differs from data classification in that regression is used to predict continuous values, whereas classification is used only to predict discrete values.
Regression learns a function that maps a data item to a real-valued prediction variable. Data mining has many applications that involve regression, for example estimating a patient's probability of death from the results of diagnostic tests, or predicting consumer demand for a new product as a function of advertising expenditure.
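The last example above, predicting demand as a function of advertising expenditure, can be sketched with ordinary least squares on a single predictor. This assumes a linear relationship, which the text does not state, and the figures below are invented for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # raises ZeroDivisionError if all xs are equal
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical advertising spend and observed demand (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0]
demand = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(ad_spend, demand)
predicted_demand = slope * 5.0 + intercept  # continuous-valued prediction for a new spend
```

Unlike the classifier in the previous subsection, the output here is a real number rather than a class label, which is exactly the distinction the text draws.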
d. Association rule mining:
The goal of this method is to discover and report the relationships among data values in a database. The output of the mining algorithm is the set of association rules found. For example, analysis of a sales database may show that customers who buy a computer tend to buy financial management software in the same purchase, which is expressed by the following association rule: "Computer => Financial management software" (support: 2%, confidence: 60%).
Support and confidence are two measures of how interesting a rule is. They reflect the usefulness and the certainty of the discovered rule. A support of 2% means that in 2% of all the transactions under analysis, a computer and financial management software were bought together. A confidence of 60% means that 60% of the customers who bought a computer also bought the software. Association rule mining is carried out in two steps:
Step 1: Find all frequent itemsets. An itemset is frequent if its computed support meets the minimum support.
Step 2: Generate strong association rules from the frequent itemsets. The rules must satisfy both the minimum support and the minimum confidence.
This method is used very effectively in fields such as targeted marketing, decision analysis, business management and market price analysis.
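The two measures described above can be computed directly from a list of transactions. The sketch below is an illustration only: the five transactions are invented, so the resulting figures differ from the 2% and 60% quoted in the text.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent together) divided by support of the antecedent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical sales transactions.
transactions = [
    frozenset({"computer", "finance software"}),
    frozenset({"computer", "finance software", "printer"}),
    frozenset({"computer"}),
    frozenset({"printer"}),
    frozenset({"computer", "printer"}),
]

rule_support = support(transactions, {"computer", "finance software"})
rule_confidence = confidence(transactions, {"computer"}, {"finance software"})
```

Here the rule "computer => finance software" has support 2/5 and confidence 2/4. Whether it counts as a strong rule depends on the minimum support and minimum confidence thresholds chosen in the two steps above.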
1.3.6. Architecture of a data mining system
As described above, data mining is one stage in the process of discovering knowledge from large amounts of data stored in databases, data warehouses or other repositories. This step may interact with the user or with a knowledge base; the interesting patterns are presented to the user or stored as new knowledge in the knowledge base.

[Figure 1.2: layered diagram with the components "User interface", "Pattern evaluation", "Data mining engine", "Database or data warehouse server", "Database", "Data warehouse" and "Knowledge base"]
Figure 1.2. Architecture of a data mining system

The architecture of a data mining system (Figure 1.2) has the following components:
- Database, data warehouse: one or a collection of databases, data warehouses and so on. Data cleaning, integration and filtering techniques can be applied to this data.
- Database or data warehouse server: fetches the relevant data on the basis of the user's data mining request.
- Knowledge base: the domain knowledge used to guide the search or to evaluate the resulting patterns.
- Data mining engine: a set of functional modules that perform tasks such as characterization, association, classification and clustering.
- Pattern evaluation: this component uses interestingness measures and interacts with the data mining modules to focus the search on interesting patterns.

- User interface: the module between the user and the data mining system. It lets the user interact with the system through queries or tasks and supplies information for the search.
1.3.7. The data mining process
Data mining algorithms are usually described as programs that operate directly on a data file. With earlier machine learning and statistical methods, the first step was usually to load the entire data file into memory. When moving to industrial applications that mine large data warehouses, this model cannot cope: not only is it impossible to load all the data into memory, it is also hard to extract simple files for analysis.
The data mining process (Figure 1.3) begins by defining precisely the problem to be solved. Next, the relevant data for building a solution is identified. The following step is to collect the relevant data and process it into a form that the mining algorithm can understand.

[Figure 1.3: diagram with the boxes "Define the task", "Identify the relevant data", "Collect and preprocess the data" and "Data mining algorithm", and the labels "Direct data" and "Patterns"]
Figure 1.3. The data mining process

A suitable data mining algorithm is then chosen and run to find meaningful patterns in the corresponding representation (association rules, decision trees and so on).
The resulting patterns must be novel. Novelty can be measured relative to changes in the data or to existing knowledge. The novelty of a pattern is usually evaluated by a logical function or by a novelty measure. The patterns should also be potentially useful.
Different algorithms and different data mining tasks produce very diverse kinds of patterns.
1.3.8. Difficulties in data mining
Research into data mining techniques and their application faces many difficulties. These are not insurmountable, but they need to be understood so that the field can develop further. The difficulties arise because real data is usually dynamic, incomplete, large and noisy. In other cases it is not known whether the database contains the information needed for mining, or how to deal with the surplus of irrelevant information.
- Large data: Databases today have hundreds of fields and tables and millions of records, and can reach gigabytes in size. Current approaches include setting a threshold on the database, sampling, approximation methods and parallel processing.
- High dimensionality: Not only is the number of records large, the number of fields in the database is large as well. The size of the problem therefore grows, which enlarges the search space. It also raises the chance that a data mining algorithm will find spurious patterns. The remedy is to reduce the effective dimensionality of the problem and to use prior knowledge to identify irrelevant variables.

- D liu ng: c im c bn ca hu ht cc c s d liu l ni
dung ca chng thay i lin tc. Chng hn nh cc bin trong c s
d liu ca ng dng cho chng c th b thay i, b xa hoc l
tng ln theo thi gian. D liu c th thay i theo thi gian v vic
khai ph d liu b nh hng bi thi im quan st d liu, do c
th lm cho mu khai thc c trc mt gi tr. Vn ny c
gii quyt bng gii php tng trng nng cp cc mu v coi
nhng thay i nh l c hi khai thc bng cch s dng n tm
kim cc cu b thay i.
- Cc trng d liu khng ph hp: Mt c im quan trng khc l
tnh khng thch hp ca d liu ngha l mc d liu tr thnh
khng thch hp vi trng tm hin ti ca vic khai thc. Bn cnh ,
tnh ng dng ca mt thuc tnh i vi mt tp con ca c s d liu
cng l mt vn i khi cng lin quan dn ph hp.
- Cc gi tr b thiu: S c mt hay vng mt ca gi tr cc thuc tnh
d liu ph hp c th nh hng n vic khai ph d liu. Trong h
thng tng tc, s thiu vng d liu quan tng c th dn ti yu cu
cho gi tr ca n hoc kim tra xc nh gi tr ca n. Hoc cng
c th s vng mt ca d liu c coi nh mt iu kin, thuc tnh
b mt c th c xem nh mt gi tr trung gian v ga tr khng bit.
- Cc trng d liu b thiu: Mt quan st khng y c s d liu c
th lm cho d liu c gi tr b xem nh c li. Vic quan st c s d
liu phi pht hin c ton b cc thuc tnh c th dng thut
ton khai ph d liu c th p dng gii quyt bi ton. Gi s ta c
cc thuc tnh phn bit cc tnh hung ng quan tm. Nu chng
khng lm c iu th c ngha l c li trong d liu. y

S ha bi Trung tm Hc liu i hc Thi Nguyn http://www.lrc-tnu.edu.vn

24
cng l vn thng xy ra trong c s d liu kinh doanh. Cc thuc
tnh quan trng c th s b thiu d liu khng c chun b.
- Overfitting: When an algorithm searches for the best parameters of a
model using a finite data set, it may overfit the data (that is, search
beyond what is needed, so that the model fits only that data set and
cannot cope with unseen data), and the model then performs very poorly on
test data. Remedies include cross-validation, regularization, and other
statistical methods.
- Understandability of patterns: In many applications it is important
that the discoveries be as understandable to humans as possible.
Solutions therefore usually include graphical representations, rule
structures with directed graphs, natural-language representations, and
other techniques for presenting knowledge and data.
- User interaction and prior knowledge: Many data mining tools and
methods are not truly interactive and cannot easily incorporate prior
knowledge. The use of domain knowledge is very important in data mining.
Several remedies have been proposed, such as using deductive databases to
discover knowledge that is then used to guide the data mining search, or
using a prior probability distribution over the data as a way of encoding
prior knowledge.

Chapter 2
ASSOCIATION RULES IN DATA MINING
2.1. The classic problem that led to association rule mining
The supermarket market-basket problem.
Suppose we have a large number of items, for example bread, milk, and so
on (regarded as attributes or fields). Customers visiting the supermarket
put a number of items into their baskets, and we want to find out which
items customers usually buy together; we do not even need to know who the
individual customers are. Managers use this information to adjust the
supermarket's stock, or simply to place related items near each other, or
to sell them as a bundle, which saves customers the effort of searching.
The same problem applies to other fields. For example:
- Basket = document, item = word. Words that often occur together help
us quickly identify the phrases or concepts present in the documents.
- Basket = document, item = sentence. Documents that share many
sentences reveal plagiarism or mirror websites.
Association rule mining is described as finding correlations between
events, namely events that frequently occur together. Its main task is to
discover the sets of items that occur together in a large volume of
transactions of a given database. In other words, association rule mining
algorithms produce rules that describe how events occur together
(frequently). These algorithms proceed in two phases: the first finds the
frequently occurring events, the second derives the rules.

2.2. Definition of association rules
Definition:
Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of $m$ distinct attributes
(items). Let $D$ be a database in which each record contains a subset $T$
of the attributes (that is, $T \subseteq I$) and has its own identifier.
An association rule is an implication of the form $X \Rightarrow Y$, where
$X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets $X$ and $Y$ are
called itemsets. $X$ is called the antecedent and $Y$ the consequent.
Two measures are important for association rules: support and
confidence, defined below.
Definition: support
Definition 2.1: The support of an itemset $X$ in database $D$ is the ratio
of the number of records $T \in D$ that contain $X$ to the total number of
records in $D$ (that is, the percentage of records in $D$ that contain
$X$). It is denoted support(X) or supp(X); the support is computed when
the algorithm is run.
$$\mathrm{supp}(X) = \frac{|\{T \in D : T \supseteq X\}|}{|D|}$$
We have $0 \le \mathrm{supp}(X) \le 1$ for every itemset $X$.
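Definition 2.2: The support of an association rule $X \Rightarrow Y$ is
the ratio of the number of records that contain $X \cup Y$ to the total
number of records in $D$. It is denoted $\mathrm{supp}(X \Rightarrow Y)$.
$$\mathrm{supp}(X \Rightarrow Y) = \frac{|\{T \in D : T \supseteq X \cup Y\}|}{|D|}$$
Saying that a rule has support 50% means that 50% of all records contain
$X \cup Y$. Support thus expresses the statistical significance of the
rule.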
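Definitions 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration: the function name `support` and the four sample baskets are assumptions made for the example, not part of the thesis.

```python
from typing import FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions T that contain every item of `itemset`."""
    wanted = frozenset(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)


# Hypothetical sample data: four baskets.
transactions = [
    frozenset({"bread", "butter", "eggs"}),
    frozenset({"butter", "eggs", "milk"}),
    frozenset({"butter"}),
    frozenset({"bread", "butter"}),
]

# Support of an itemset (Definition 2.1).
print(support({"milk"}, transactions))
# Support of the rule bread => butter is the support of {bread, butter} (Definition 2.2).
print(support({"bread", "butter"}, transactions))
```

The support of a rule never needs a separate function: it is the support of the union of antecedent and consequent.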

In some cases we are interested only in rules with high support (for
example, association rules in a grocery store). In other cases a rule
deserves attention even though its support is low (for example, rules
about the causes of communication failures at telephone exchanges).
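Definition: confidence
Definition 2.3: The confidence of an association rule $X \Rightarrow Y$ is
the ratio of the number of records in $D$ that contain $X \cup Y$ to the
number of records in $D$ that contain $X$. The confidence of a rule $r$ is
denoted conf(r), and $0 \le \mathrm{conf}(r) \le 1$.
Remark: support and confidence have the following probabilistic
interpretation, where $P(X \cup Y)$ is the probability that a record
contains all items of $X \cup Y$:
$$\mathrm{supp}(X \Rightarrow Y) = P(X \cup Y)$$
$$\mathrm{conf}(X \Rightarrow Y) = P(Y \mid X) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}$$
Confidence can also be defined as follows:
Definition 2.4: The confidence of an association rule $X \Rightarrow Y$ is
the ratio of the number of records that contain $X \cup Y$ to the total
number of records that contain $X$.
Saying that a rule has confidence 90% means that 90% of the records that
contain $X$ also contain $Y$. In probabilistic terms, the conditional
probability of event $Y$ is 90%, the condition being that event $X$
occurs.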
Confidence thus expresses the correlation between $X$ and $Y$. It
measures the strength of a rule, and people are mostly interested in
rules with high confidence. A rule that looks for the causes of failures
in a telephone exchange, or that concerns the items customers often buy
together with a main item, is of no use to management if its confidence
is low.
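The confidence of Definition 2.3 can be computed directly from two counts. The sketch below is illustrative only; the function name `confidence` and the five sample baskets are assumptions.

```python
def confidence(x, y, transactions):
    """conf(X => Y): records containing X and Y divided by records containing X."""
    x = frozenset(x)
    xy = x | frozenset(y)
    n_x = sum(1 for t in transactions if x <= t)
    if n_x == 0:
        raise ValueError("the antecedent never occurs, confidence is undefined")
    n_xy = sum(1 for t in transactions if xy <= t)
    return n_xy / n_x


# Hypothetical data: 4 baskets with milk, 3 of which also contain bread.
baskets = (
    [frozenset({"milk", "bread"})] * 3
    + [frozenset({"milk"})]
    + [frozenset({"bread"})]
)
print(confidence({"milk"}, {"bread"}, baskets))  # 3 of the 4 milk baskets
```

Dividing the two counts is equivalent to dividing supp(X ∪ Y) by supp(X), because the common factor |D| cancels.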

Mining association rules from a database is the task of finding all rules
whose support and confidence are at least the thresholds specified in
advance by the user. The support and confidence thresholds are denoted
minsup and minconf.
Example: Analyzing the baskets of shoppers in a supermarket may yield a
rule such as: 85% of the customers who buy milk also buy bread, and 30%
of the customers buy both. Here "buys milk" is the antecedent and "buys
bread" is the consequent of the rule. The figure 30% is the support of
the rule and 85% is its confidence.
The knowledge conveyed by association rules of this form differs greatly
from the information obtained from ordinary data queries such as SQL
statements. It consists of previously unknown, predictive relationships
hidden in the data. Such knowledge is not simply the result of grouping,
summing, or sorting; it is the outcome of a fairly complex computation.
Definition: frequent itemset
Definition 2.5: An itemset $X$ is called a frequent itemset if
$\mathrm{supp}(X) \ge \mathit{minsup}$, where minsup is a given support
threshold. The collection of such itemsets is denoted FI.
Property 2.1: Let $A, B \subseteq I$ be two itemsets with $A \subseteq B$.
Then $\mathrm{supp}(A) \ge \mathrm{supp}(B)$, because every record that
contains $B$ also contains $A$.
Property 2.2: Let $A, B \subseteq I$ be two itemsets. If $B$ is frequent
and $A \subseteq B$, then $A$ is also frequent.
Indeed, if $B$ is frequent then $\mathrm{supp}(B) \ge \mathit{minsup}$,
and every subset $A$ of $B$ is frequent in $D$ because
$\mathrm{supp}(A) \ge \mathrm{supp}(B)$ (Property 2.1).
Property 2.3: Let $A, B$ be two itemsets with $A \subseteq B$. If $A$ is
not frequent, then $B$ is not frequent either.

Definition 2.6: An itemset $X$ is called closed if none of its proper
supersets has the same support, that is, there is no itemset $X'$ such
that $X \subset X'$ and $t(X) = t(X')$ (where $t(X)$ and $t(X')$ are the
sets of transactions containing $X$ and $X'$, respectively). The
collection of frequent closed itemsets is denoted FCI.
Definition 2.7: If $X$ is frequent and no proper superset of $X$ is
frequent, then $X$ is called a maximal frequent itemset. The collection of
all maximal frequent itemsets is denoted MFI. Clearly
$\mathrm{MFI} \subseteq \mathrm{FCI} \subseteq \mathrm{FI}$.
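The inclusion MFI ⊆ FCI ⊆ FI can be checked on a small example. The following sketch enumerates all itemsets by brute force, which is only feasible for a handful of items; the function names and the sample baskets are assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Return {itemset: support count} for every itemset with support >= minsup."""
    baskets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*baskets))
    found = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            count = sum(1 for b in baskets if s <= b)
            if count / len(baskets) >= minsup:
                found[s] = count
    return found


def closed_and_maximal(fi):
    """Split the frequent itemsets into the closed ones and the maximal ones."""
    # Closed: no proper superset with the same support count (Definition 2.6).
    closed = {x for x in fi if not any(x < y and fi[y] == fi[x] for y in fi)}
    # Maximal: no frequent proper superset at all (Definition 2.7).
    maximal = {x for x in fi if not any(x < y for y in fi)}
    return closed, maximal


baskets = [
    {"bread", "butter", "eggs"},
    {"butter", "eggs", "milk"},
    {"butter"},
    {"bread", "butter"},
]
fi = frequent_itemsets(baskets, minsup=0.4)
fci, mfi = closed_and_maximal(fi)
print(len(fi), len(fci), len(mfi))
```

Only frequent supersets need to be inspected for closedness: a superset with the same support as a frequent itemset is itself frequent.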
Association rule mining is the task of discovering the association rules
that satisfy a given support threshold (minsup) and confidence threshold
(minconf). The problem is split into two subproblems or, as is commonly
said, solved in two phases:
Phase 1: Find all frequent itemsets (FI) in the database D.
Phase 2: Use the FI found in phase 1 to generate the confident rules
(interesting rules). The general idea is that if ABCD and AB are frequent
itemsets, then the rule $AB \Rightarrow CD$ can be evaluated with
confidence
$$\mathrm{conf} = \frac{\mathrm{supp}(ABCD)}{\mathrm{supp}(AB)}$$
If $\mathrm{conf} \ge \mathit{minconf}$, the rule is kept (it also
satisfies the minimum support because ABCD is frequent).
In practice, most of the running time of association rule mining is
spent in phase 1. When the data contain very long patterns (patterns with
many items), generating all frequent itemsets (FI) or even all closed
itemsets (FCI) is impractical. Moreover, in many applications it is
enough to generate the maximal frequent itemsets (MFI), for example when
discovering combinatorial patterns in biological applications.
Much research has addressed methods that efficiently generate all
frequent itemsets and all maximal frequent itemsets. When the frequent
patterns are long (15 to 20 items), FI and even FCI become very large,
and most traditional methods have to count too many itemsets to be
feasible. Algorithms based on Apriori count all $2^k$ subsets of every
k-itemset they encounter and are therefore unsuitable for long itemsets.
Other methods use "lookaheads" to reduce the number of itemsets counted.
However, most of these algorithms search breadth-first, that is, they
find all k-itemsets before considering any (k+1)-itemset.
This limits the effectiveness of lookaheads, because the longer frequent
patterns that would be useful have not yet been found.
Algorithm 1: Basic algorithm
Input: I, D, minsup, minconf
Output: The association rules that satisfy the support threshold minsup
and the confidence threshold minconf.
Algorithm:
1) Find all itemsets whose support is not less than minsup.
2) From the itemsets found, generate the association rules whose
confidence is not less than minconf.
Illustrative example:
Consider 4 items (attributes) in a grocery store with a small transaction
database containing only 4 transactions (baskets), given in the
following tables:
Transaction    Items bought
T1             Bread, Butter, Eggs
T2             Butter, Eggs, Milk
T3             Butter
T4             Bread, Butter
Table 2.1. Purchase transactions

Given the two thresholds minsup = 40% and minconf = 60%, we compute the
support of the itemsets.
Itemset                      Records      Support       Meets the 40% support threshold
Bread                        {1,4}        2/4 = 50%     Yes
Butter                       {1,2,3,4}    4/4 = 100%    Yes
Eggs                         {1,2}        2/4 = 50%     Yes
Milk                         {2}          1/4 = 25%     No
Bread, Butter                {1,4}        2/4 = 50%     Yes
Bread, Eggs                  {1}          1/4 = 25%     No
Bread, Milk                  ∅            0/4 = 0%      No
Butter, Eggs                 {1,2}        2/4 = 50%     Yes
Butter, Milk                 {2}          1/4 = 25%     No
Eggs, Milk                   {2}          1/4 = 25%     No
Bread, Butter, Eggs          {1}          1/4 = 25%     No
Bread, Butter, Milk          ∅            0/4 = 0%      No
Bread, Eggs, Milk            ∅            0/4 = 0%      No
Butter, Eggs, Milk           {2}          1/4 = 25%     No
Bread, Butter, Eggs, Milk    ∅            0/4 = 0%      No
Table 2.2. Support of the itemsets

Association rule     Confidence     Meets the 60% confidence threshold
Bread ⇒ Butter       2/2 = 100%     Yes
Butter ⇒ Bread       2/4 = 50%      No
Eggs ⇒ Butter        2/2 = 100%     Yes
Butter ⇒ Eggs        2/4 = 50%      No
Table 2.3. The association rules and their confidence
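The two steps of Algorithm 1 can be reproduced on the data of Table 2.1 with a brute-force sketch. It enumerates every itemset, so it only illustrates the definitions and is not an efficient method; the names `TRANSACTIONS` and `mine_rules` are assumptions introduced for the example.

```python
from itertools import combinations

# The four baskets of Table 2.1.
TRANSACTIONS = {
    "T1": {"bread", "butter", "eggs"},
    "T2": {"butter", "eggs", "milk"},
    "T3": {"butter"},
    "T4": {"bread", "butter"},
}


def mine_rules(transactions, minsup, minconf):
    baskets = [frozenset(t) for t in transactions.values()]
    n = len(baskets)
    items = sorted(set().union(*baskets))
    # Step 1: keep every itemset whose support reaches minsup.
    count = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = sum(1 for b in baskets if s <= b)
            if c / n >= minsup:
                count[s] = c
    # Step 2: split each frequent itemset S into X and S - X.
    rules = []
    for s, c in count.items():
        for r in range(1, len(s)):
            for x in combinations(sorted(s), r):
                x = frozenset(x)
                conf = c / count[x]  # X is frequent because S is (Property 2.2)
                if conf >= minconf:
                    rules.append((tuple(sorted(x)), tuple(sorted(s - x)), conf))
    return count, rules


frequent, rules = mine_rules(TRANSACTIONS, minsup=0.4, minconf=0.6)
for antecedent, consequent, conf in rules:
    print(antecedent, "=>", consequent, conf)
```

With minsup = 40% and minconf = 60% the sketch keeps five frequent itemsets and the two rules whose antecedent is Bread or Eggs and whose consequent is Butter.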

Agrawal pointed out that enumerating the itemsets to compute their
support and to check whether it reaches the given threshold minsup is
very time-consuming (exponential complexity). Once the itemsets
satisfying this condition (called frequent itemsets) have been
determined, mining the association rules takes much less time. Agrawal
proposed the following algorithm.
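Algorithm 2: Finding the association rules once the frequent itemsets
are known
Input: I, D, minsup, minconf, and the collection $\mathcal{S}$ of frequent itemsets
Output: The association rules that satisfy the support threshold minsup
and the confidence threshold minconf.
Algorithm:
1) Take a frequent itemset $S \in \mathcal{S}$ and a subset $X \subseteq S$.
2) Consider the association rule $X \Rightarrow (S \setminus X)$ and check
whether its confidence is less than minconf.
In effect, the itemset $S$ plays the role of the union $S = X \cup Y$, and
since $X \cap (S \setminus X) = \emptyset$, we take $Y = S \setminus X$.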
Algorithms for association rule mining mainly propose ways to speed up
step 1 of Algorithm 1. The next chapter reviews several of these
algorithms.
2.3. Some approaches to association rule mining
Association rule mining has so far been studied and developed in many
directions. Some proposals improve the algorithms, others search for more
meaningful rules, and so on. The main directions are the following:
- Binary association rules: the earliest line of research on association
rules. In this kind of rule only the presence or absence of an item in
the transaction database matters, not its quantity or frequency of
occurrence. The most representative algorithm for mining such rules is
Apriori.
- Quantitative and categorical association rules: real databases usually
have attributes of various kinds (binary, numeric, categorical, ...)
rather than a single uniform type. To mine association rules in such
databases, researchers have proposed discretization methods that convert
these rules into binary form so that the existing algorithms can be
applied.
- Association rules based on rough sets (mining association rules based
on rough sets): the search for association rules relies on rough set
theory.
- Multi-level association rules: this approach additionally looks for
rules of the form: buys a PC ⇒ buys the Windows operating system AND
buys the Microsoft Office suite, ...
- Fuzzy association rules: because of the difficulties met when
discretizing numeric attributes, researchers have proposed fuzzy
association rules to overcome these limitations and bring association
rules to a form that is more natural for users.
- Association rules with weighted items: the attributes in a database
usually do not play equal roles. Some attributes are more important and
receive more emphasis than others. During the search for rules,
attributes are therefore assigned weights according to a given level of
importance. In this way we obtain "rare" rules (rules with low support
that are nevertheless highly meaningful).

- Parallel mining of association rules: parallel and distributed
processing is needed because data sizes keep growing, so processing speed
must be ensured.
The above are variants of association rule mining that allow flexible
searching for association rules in large databases. In addition,
researchers focus on proposing algorithms that speed up the search for
association rules in databases.

Chapter 3
SOME ALGORITHMS FOR DISCOVERING ASSOCIATION RULES
3.1. The AIS algorithm
The algorithm was proposed by Agrawal in 1993. It focuses on mining
association rules of the form $X \Rightarrow Y$ where $Y$ consists of a
single attribute (a one-element set). The algorithm gradually builds the
candidate sets for the frequent itemsets. Because the attributes are
numbered in lexicographic order, extending a candidate set with a new
element never produces duplicates, which saves computing time.
Too many candidate sets can cause memory overflow. The algorithm proposes
a memory management scheme for this case: the candidates are not kept in
memory but are written directly to disk (disk-resident).
The main content of the AIS algorithm is as follows:
Input: database D, minsup
Output: the frequent itemsets
1. L_1 = {frequent 1-itemsets};
2. for (k = 2; L_{k-1} ≠ ∅; k++) do begin
3.   C_k = ∅;
4.   forall transactions t ∈ D do begin
5.     L_t = Subset(L_{k-1}, t); // frequent itemsets of L_{k-1} contained in transaction t
6.     forall frequent itemsets l_t ∈ L_t do begin
7.       C_t = extensions of l_t by one item that occurs in transaction t;
8.       forall candidates c ∈ C_t do
9.         if (c ∈ C_k) then
             increase the count of the corresponding entry of C_k by 1
           else add c to C_k with count 1;
10.    end
11.  L_k = {c ∈ C_k | c.count ≥ minsup}
12. end
13. Answer = ∪_k L_k;
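A compact in-memory sketch of the AIS pass structure is given below. It is an illustration under stated assumptions: the threshold is taken as an absolute count (`minsup_count`), itemsets are sorted tuples, and the disk-resident candidate handling mentioned above is left out.

```python
from collections import defaultdict


def ais(transactions, minsup_count):
    """Return {itemset tuple: count} for all frequent itemsets (AIS-style passes)."""
    baskets = [set(t) for t in transactions]
    counts = defaultdict(int)
    for t in baskets:
        for item in t:
            counts[(item,)] += 1
    level = {s for s, c in counts.items() if c >= minsup_count}
    answer = {s: counts[s] for s in level}
    while level:
        candidates = defaultdict(int)
        for t in baskets:
            items = sorted(t)
            for l in level:
                if set(l) <= t:
                    # Extend l only with items that come later in the ordering,
                    # so the same candidate is never generated twice per basket.
                    for item in items:
                        if item > l[-1]:
                            candidates[l + (item,)] += 1
        level = {s for s, c in candidates.items() if c >= minsup_count}
        answer.update({s: candidates[s] for s in level})
    return answer


baskets = [
    {"bread", "butter", "eggs"},
    {"butter", "eggs", "milk"},
    {"butter"},
    {"bread", "butter"},
]
result = ais(baskets, minsup_count=2)
print(sorted(result))
```

Candidates are created while the transactions are being scanned, which is why AIS counts many itemsets (such as {butter, milk} here) that turn out to be infrequent.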
The algorithm was applied successfully to the database of a retail
company and found association rules relating customers' purchasing
behaviour to the company's 63 departments, after examining 46,873
purchase transactions.
3.2. The SETM algorithm
The algorithm was proposed by Houtsma in 1995. It also builds the
candidate sets by adding one element at a time (starting from one-element
sets). A notable change is that the algorithm stores the ID of the
transaction together with each candidate set. Agrawal pointed out that
this algorithm not only lacks a memory management scheme but also assumes
that the entire candidate set of the previous step fits in memory so that
the next step can use it. Sarawagi showed that this algorithm is
inefficient.
The SETM algorithm is described formally as follows:
Input: database D, minsup
Output: the frequent itemsets
1. L_1 = {frequent 1-itemsets};
2. L'_1 = {frequent 1-itemsets together with their TIDs, sorted on TID};
3. for (k = 2; L_{k-1} ≠ ∅; k++) do begin
4.   C'_k = ∅;
5.   forall transactions t ∈ D do begin
6.     L_t = {l ∈ L'_{k-1} | l.TID = t.TID}; // frequent (k-1)-itemsets in transaction t
7.     forall frequent itemsets l_t ∈ L_t do begin
8.       C_t = extensions of l_t by one item that occurs in transaction t; // candidates in t
9.       C'_k += {<t.TID, c> | c ∈ C_t};
10.    end
11.  end
12.  Sort C'_k on itemsets;
13.  delete the itemsets c ∈ C'_k with c.count < minsup, giving L'_k;
14.  L_k = {<l.itemset, count of l in L'_k> | l ∈ L'_k}; // combined with step 13
15.  Sort L'_k on TID;
16. end
17. Answer = ∪_k L_k;
3.3. The Apriori algorithm
The algorithm was proposed by Agrawal in 1994 and was assessed by Cheung
as a historic contribution to association rule mining, since it far
surpassed the algorithms known in the field at the time. It is based on a
rather simple observation: every subset of a frequent itemset is also
frequent. Therefore, when searching for candidate sets, it only needs the
candidate sets produced in the immediately preceding step, not all the
candidate sets found so far. As a result, a considerable amount of memory
is freed.
1/ Step 1: Given a support threshold $0 \le \mathit{minsup} \le 1$, find
all frequent items. Note that a supermarket may have up to 100,000 items.
The set found is denoted $L_1$.

2/ Step 2: We pair up the elements of $L_1$ (order does not matter) to
obtain the set $C_2$, called the set of 2-element candidates. They are
only called candidates because they are not necessarily frequent. After
checking them (using the definition), we filter out the frequent
2-element itemsets. This set is denoted $L_2$.
3/ Step 3: Keeping the above observation in mind (the monotonicity of
frequent itemsets), we look for the 3-element candidates (built from the
elements of $L_1$). Call this set $C_3$. Note that for {A, B, C} to be a
candidate, the 2-element sets {A, B}, {B, C}, {C, A} must all be
frequent, that is, they must all belong to $L_2$. We check each member of
$C_3$ and filter out the frequent 3-element itemsets. This set is denoted
$L_3$.
4/ Step 4: We look for the n-element candidates. Call their set $C_n$,
and from it filter out $L_n$, the set of frequent n-element itemsets.
To see how this algorithm helps, consider the following example:
The SQL statement below generates pairs; it processes 10 million baskets,
each containing 10 items on average, assuming the supermarket has about
100,000 items:
SELECT b1.item, b2.item, COUNT(*)
FROM Baskets b1, Baskets b2
WHERE b1.BID = b2.BID AND b1.item < b2.item
GROUP BY b1.item, b2.item
HAVING COUNT(*) >= s;
The WHERE clause ensures that no pair is generated twice (since the order
of the elements does not matter).
The HAVING clause ensures that the selected itemsets are frequent.

Remark: When Baskets is joined with itself, each basket yields 45
candidate pairs [since (10*9)/2 = 45], and with 10 million baskets we
have to examine $45 \times 10^7$ cases to filter out the frequent pairs.
With the Apriori algorithm, in contrast, we first reduce the size of
Baskets considerably, because step 1 finds the frequent elements (items).
SELECT item, COUNT(*)
FROM Baskets
GROUP BY item
HAVING COUNT(*) >= s;
The reduction in the size of Baskets is not the key point. The key point
is that when we join to find pairs, the reduction is squared.
The core of the Apriori algorithm is the function apriori_gen(), proposed
by Agrawal in 1994. It works in two steps. In the first step, $L_{k-1}$
is joined with itself to produce the candidate set $C_k$. Then
apriori_gen() removes every itemset that has a (k-1)-element subset not
in $L_{k-1}$ (such an itemset cannot be frequent, by the initial
observation).
Method: apriori_gen() [Agrawal1994]
Input: the class of frequent itemsets with (k-1) elements, denoted L_{k-1}
Output: the class of candidate itemsets with k elements, denoted C_k
// Self-join step
// I_i denotes the i-th item of an itemset
insert into C_k
select p.I_1, p.I_2, ..., p.I_{k-1}, q.I_{k-1}
from L_{k-1} as p, L_{k-1} as q
where p.I_1 = q.I_1 and ... and p.I_{k-2} = q.I_{k-2} and p.I_{k-1} < q.I_{k-1}
// Prune step
forall itemsets c ∈ C_k do
  forall (k-1)-subsets s of c do
    if (s ∉ L_{k-1}) then
      delete c from C_k
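The join and prune steps translate almost literally into Python. The sketch below assumes that itemsets are represented as sorted tuples, which is a choice made for the example and not prescribed by the text.

```python
from itertools import combinations


def apriori_gen(prev_level):
    """Generate the candidate k-itemsets from the frequent (k-1)-itemsets.

    `prev_level` is a collection of sorted tuples that all have length k-1.
    """
    prev_set = set(prev_level)
    prev = sorted(prev_set)
    if not prev:
        return set()
    size = len(prev[0])
    # Join step: combine two itemsets that agree on their first k-2 items.
    joined = set()
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:-1] == q[:-1]:  # p < q in sorted order, hence p[-1] < q[-1]
                joined.add(p + (q[-1],))
    # Prune step: drop candidates that have an infrequent (k-1)-subset.
    return {c for c in joined
            if all(s in prev_set for s in combinations(c, size))}


L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(apriori_gen(L3))
```

In this run the join produces (1, 2, 3, 4) and (1, 3, 4, 5); the second is pruned because its subset (1, 4, 5) is not in `L3`.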

The following function scans the transactions and counts the support of
each itemset. In other words, in the first step Agrawal uses the function
count() to find the frequent 1-element itemsets.
Function count(C: a set of itemsets, D: database)
begin
  for each transaction T ∈ D = ∪ D_i do
  begin
    forall subsets x ⊆ T do if x ∈ C then x.count++;
  end
end
The complete Apriori algorithm is given below.
Algorithm 3: Apriori [Agrawal1994]
Input: I, D, minsup
Output: L
Algorithm:
// Apriori algorithm proposed by Agrawal R., Srikant R. [Agrawal1994]
// procedure LargeItemsets
1) C_1 := I; // candidate set of 1-element itemsets
2) Generate L_1 by counting the occurrences of each item in the transactions;
3) for (k = 2; L_{k-1} ≠ ∅; k++) do begin
   // Generate the candidate sets:
   // the k-element candidates are generated from the frequent (k-1)-element itemsets.
4)   C_k = apriori_gen(L_{k-1});
   // Compute the support of C_k
5)   count(C_k, D)
6)   L_k = {c ∈ C_k | c.count ≥ minsup}
7) end
8) L := ∪_k L_k
Table 3.1 below illustrates the algorithm on the preceding example of
Table 2.1 (minsup = 40%). In each pass the whole database is scanned to
compute the support of the candidates.
Pass 1
C_1 (1-element candidates)    Support    In L_1
{Bread}                       50%        Yes
{Butter}                      100%       Yes
{Eggs}                        50%        Yes
{Milk}                        25%        No
Pass 2
C_2 (2-element candidates)    Support    In L_2
{Bread, Butter}               50%        Yes
{Bread, Eggs}                 25%        No
{Butter, Eggs}                50%        Yes
Pass 3
C_3 = ∅, hence L_3 = ∅
Table 3.1. Using the Apriori algorithm to compute the frequent itemsets
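The level-wise loop of Algorithm 3 can be sketched as follows and run on the baskets of Table 2.1. The representation (sorted tuples, relative `minsup`) and the function name `apriori` are assumptions made for this illustration; the candidate generation inlines the join and prune steps of apriori_gen().

```python
from itertools import combinations


def apriori(transactions, minsup):
    """Return {itemset tuple: support} for all frequent itemsets."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def frequent(candidates):
        # One scan of the database per level.
        kept = {}
        for c in candidates:
            count = sum(1 for b in baskets if b.issuperset(c))
            if count / n >= minsup:
                kept[c] = count / n
        return kept

    items = sorted(set().union(*baskets))
    level = frequent({(i,) for i in items})  # L_1
    answer = dict(level)
    k = 2
    while level:
        prev = sorted(level)
        # Join step of apriori_gen.
        joined = {p + (q[-1],)
                  for i, p in enumerate(prev)
                  for q in prev[i + 1:]
                  if p[:-1] == q[:-1]}
        # Prune step of apriori_gen.
        candidates = {c for c in joined
                      if all(s in level for s in combinations(c, k - 1))}
        level = frequent(candidates)  # L_k
        answer.update(level)
        k += 1
    return answer


baskets = [
    {"bread", "butter", "eggs"},
    {"butter", "eggs", "milk"},
    {"butter"},
    {"bread", "butter"},
]
result = apriori(baskets, minsup=0.4)
for itemset in sorted(result, key=lambda s: (len(s), s)):
    print(itemset, result[itemset])
```

The third pass generates no candidate at all, because the two frequent pairs do not share a first item; this matches the empty $C_3$ of Table 3.1.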
Agrawal himself remarked that Apriori is more efficient than AIS and
SETM. In one illustrative example, at the fourth step Apriori had pruned
everything and kept a single candidate set, whereas the other two
algorithms still proposed 5 candidates. To reach the same result as
Apriori, the other two algorithms therefore need additional computation.
The improved Apriori algorithm also handles two bad situations, namely
when $C_k$ or $L_{k-1}$ is too large to fit in working memory. In that
case the function apriori_gen() must be adjusted slightly.
* The binary Apriori algorithm:
The binary Apriori algorithm uses bit vectors for the attributes: an
n-dimensional binary vector corresponds to the n transactions in the
database. The database can be represented by a binary matrix in which row
i corresponds to transaction (record) $t_i$ and column j corresponds to
item (attribute) $i_j$. The matrix of an example database is given in the
table below:
TID A B C D E
1 1 1 0 1 1
2 0 1 1 0 1
3 1 1 0 1 1
4 1 1 1 0 1
5 1 1 1 1 1
6 0 1 1 1 0
Table 3.2. Matrix representation of the database
The binary vectors of the 1-attribute sets are as follows:
{A} {B} {C} {D} {E}
1 1 0 1 1
0 1 1 0 1
1 1 0 1 1
1 1 1 0 1
1 1 1 1 1
0 1 1 1 0
Table 3.3. Binary vectors of the 1-attribute sets


The binary vectors of the 2-attribute sets are as follows:
{A,B} {A,C} {A,D} {A,E} {B,C} {B,D} {B,E} {C,D} {C,E} {D,E}
1 0 1 1 0 1 1 0 0 1
0 0 0 0 1 0 1 0 1 0
1 0 1 1 0 1 1 0 0 1
1 1 0 1 1 0 1 0 1 0
1 1 1 1 1 1 1 1 1 1
0 0 0 0 1 1 0 1 0 0
Table 3.4. Binary vectors of the 2-attribute sets
The vectors show that {A,C} and {C,D} have support 33%, which is below
the given minimum support MinSupp = 50%, so they are discarded.
The binary vectors of the 3-attribute sets are as follows:
{A,B,D} {A,B,E} {B,C,E} {B,D,E}
1 1 0 1
0 0 1 0
1 1 0 1
0 1 1 0
1 1 1 1
0 0 0 0
Table 3.5. Binary vectors of the 3-attribute sets
The binary vectors of the 4-attribute sets are as follows:
{A,B,C,D} {A,B,C,E} {A,C,D,E} {B,C,D,E}
0 0 0 0
0 0 0 0
0 0 0 0
0 1 0 0
1 1 1 1
0 0 0 0
Table 3.6. Binary vectors of the 4-attribute sets
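The vectors of Tables 3.3 to 3.6 are obtained by a bitwise AND of the item vectors, and the support is the number of 1 bits divided by the number of transactions. The sketch below stores each vector in a Python integer; the names `ROWS`, `item_vectors` and `bit_support` are assumptions for the example.

```python
# Rows of Table 3.2: one bit string per transaction, columns A B C D E.
ROWS = {1: "11011", 2: "01101", 3: "11011", 4: "11101", 5: "11111", 6: "01110"}
ITEMS = "ABCDE"


def item_vectors(rows):
    """One integer bit vector per item; bit i is set when the i-th transaction has it."""
    vectors = {item: 0 for item in ITEMS}
    for position, tid in enumerate(sorted(rows)):
        for item, bit in zip(ITEMS, rows[tid]):
            if bit == "1":
                vectors[item] |= 1 << position
    return vectors


def bit_support(itemset, vectors, n):
    """AND the vectors of the items and count the remaining 1 bits."""
    acc = (1 << n) - 1  # all transactions
    for item in itemset:
        acc &= vectors[item]
    return bin(acc).count("1") / n


VECTORS = item_vectors(ROWS)
for itemset in ("AC", "CD", "AE", "ABE", "ABCE"):
    print(itemset, bit_support(itemset, VECTORS, len(ROWS)))
```

A whole column of the matrix is intersected in one machine operation, which is the attraction of the binary representation.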

3.4. The Apriori-TID algorithm
The Apriori-TID algorithm extends the basic approach of Apriori. Instead
of relying on the raw database, Apriori-TID represents each transaction
internally by the current candidates it contains.
As we have seen, Apriori has to scan the entire database to compute the
support of the candidate sets at every step, which is very wasteful.
Based on the idea of estimating and evaluating support, Agrawal proposed
improving Apriori so that the database is scanned only in the first pass,
when the support of the 1-element itemsets is computed. From the second
step on, Apriori-TID stores the ID of each transaction alongside its
candidates, so that support can be evaluated without rescanning the whole
database.
The Apriori-TID algorithm
Input: the set of transactions D, minsup
Output: the set Answer of the frequent itemsets of D
Method:
L_1 = {large 1-itemsets};
C'_1 = database D;
for (k = 2; L_{k-1} ≠ ∅; k++) do
begin
  C_k = apriori_gen(L_{k-1}); // new candidates
  C'_k = ∅;
  forall entries t ∈ C'_{k-1} do
  begin
    // determine the candidate itemsets of C_k
    // contained in the transaction with identifier t.TID
    C_t = {c ∈ C_k | (c − c[k]) ∈ t.set_of_itemsets ∧ (c − c[k−1]) ∈ t.set_of_itemsets};
    forall candidates c ∈ C_t do
      c.count++;
    if (C_t ≠ ∅) then
      C'_k += <t.TID, C_t>;
  end
  L_k = {c ∈ C_k | c.count ≥ minsup};
end
Answer = ∪_k L_k;
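The idea of replacing the database by the per-transaction candidate sets can be sketched as follows. The dictionary `encoded` plays the role of $C'_k$; the threshold is an absolute count and itemsets are sorted tuples, both of which are assumptions of this illustration.

```python
from itertools import combinations


def apriori_tid(transactions, minsup_count):
    """transactions: {TID: set of items}. Returns {itemset tuple: count}."""
    # C'_1: every transaction encoded as the set of its 1-itemsets.
    encoded = {tid: {(i,) for i in items} for tid, items in transactions.items()}
    counts = {}
    for itemsets in encoded.values():
        for c in itemsets:
            counts[c] = counts.get(c, 0) + 1
    level = {c for c, n in counts.items() if n >= minsup_count}
    answer = {c: counts[c] for c in level}
    k = 2
    while level:
        prev = sorted(level)
        joined = {p + (q[-1],) for i, p in enumerate(prev)
                  for q in prev[i + 1:] if p[:-1] == q[:-1]}
        candidates = {c for c in joined
                      if all(s in level for s in combinations(c, k - 1))}
        counts = dict.fromkeys(candidates, 0)
        next_encoded = {}
        for tid, itemsets in encoded.items():
            # c is in the transaction iff both of its generating subsets are:
            # c - c[k] is c[:-1], and c - c[k-1] is c[:-2] + c[-1:].
            present = {c for c in candidates
                       if c[:-1] in itemsets and c[:-2] + c[-1:] in itemsets}
            for c in present:
                counts[c] += 1
            if present:  # transactions without candidates are dropped
                next_encoded[tid] = present
        encoded = next_encoded
        level = {c for c, n in counts.items() if n >= minsup_count}
        answer.update({c: counts[c] for c in level})
        k += 1
    return answer


db = {
    1: {"bread", "butter", "eggs"},
    2: {"butter", "eggs", "milk"},
    3: {"butter"},
    4: {"bread", "butter"},
}
result = apriori_tid(db, minsup_count=2)
print(sorted(result.items()))
```

After the first pass the original transactions are never read again; transaction 3, which contains no 2-element candidate, disappears from the encoded set.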

The difference between Apriori and AprioriTID is that the database is not
used for counting support after the first pass. After the first pass the
frequent 1-itemsets ($L_1$) are generated; they are used to filter out of
the transactions every item that is not frequent, as well as every
transaction in $C'_1$ that contains only infrequent items. The result is
placed in $C'_2$ and used in the next pass. The size of $C'_2$ is
therefore considerably smaller than that of $C'_1$.
The similarity between the two algorithms is that both use the pruning
step of the function apriori_gen().
3.5. The Apriori-Hybrid algorithm
The Apriori-Hybrid algorithm is a combination of Apriori and Apriori-TID.
Apriori-Hybrid uses Apriori in the initial passes and switches to
Apriori-TID once it is confident that the set $C'_k$ fits in main memory.
Apriori-Hybrid is considered better than both Apriori and AprioriTID.
Based on the observation that Apriori runs fairly fast in the first
passes while Apriori-TID runs fast in the later passes (and,
unfortunately, rather slowly in the first ones), Agrawal proposed a
hybrid scheme: the same algorithm need not be used for all passes.
Apriori is run in the first passes; once the encoded candidate set is
small enough to fit in working memory, Apriori-TID takes over.
Srikant added a further remark: switching from Apriori to Apriori-TID is
relatively expensive, so the hybrid Apriori-Hybrid does not pay off when
the switch takes place near the end of the search for frequent itemsets,
because the cost of switching is then incurred without enough remaining
passes to benefit from it.
3.6. The FP_growth algorithm
As we know, Apriori was a breakthrough in mining frequent itemsets, using
pruning to reduce the size of the candidate sets. However, when the
number of itemsets is large, the itemsets are long, or the support
threshold is low, the algorithm incurs two major costs:
- It generates a huge number of candidate itemsets. For example, if there
are $10^4$ frequent 1-itemsets, more than $10^7$ candidate 2-itemsets are
generated and tested for frequency. Moreover, to discover a frequent
itemset of size n, the algorithm has to examine $2^n - 2$ potential
frequent itemsets.
- It scans the database many times. The number of database scans made by
Apriori equals the length of the longest frequent itemset found. With
long frequent itemsets and a large database this becomes infeasible.
Apriori suits sparse databases; on dense databases it performs poorly.
To overcome these costs, in 2000 Jiawei Han, Jian Pei and Yiwen Yin
introduced a new algorithm called FP_growth, which finds the frequent
itemsets efficiently without generating candidate itemsets from the
previous frequent itemsets, using the following three techniques:

First, the algorithm uses the frequent pattern tree FP_Tree to compress
the data. The FP_Tree is an extension of the prefix tree. The nodes of
the tree are items of length 1, labelled with the item name and arranged
by the frequency of the items, so that items occurring more often are
shared more.
Second, mining grows pattern fragments based on the FP_Tree: starting
from the frequent patterns of size 1, it examines only their conditional
pattern base, builds the conditional FP_Tree, and mines this tree
recursively. The resulting patterns are obtained by concatenating the
suffix pattern with the new patterns generated from the conditional
FP_Tree.
Third, it partitions the search space and uses divide and conquer to
split the mining task into smaller tasks and to restrict the patterns,
which reduces the search space.
The frequent pattern tree
The frequent pattern tree is a tree structure defined as follows:
Definition: An FP_Tree consists of a root node labelled Null, a set of
prefix subtrees that are the children of the root, and a header table of
the frequent items.
Each node of a prefix subtree has 3 fields: Item_name, count, and
node_link. Item_name is the label of the node, count is the number of
transactions represented by the path reaching this node, and node_link
points to the next node in the tree with the same Item_name, or is Null
if there is none.
Each entry of the header table has two fields: Item_name and node_link;
node_link points to the first node in the FP_Tree labelled Item_name.
Example: Consider the database below, in which the frequent items of each
transaction are sorted in decreasing order of support (minsup = 3/5):


TID     Items in the transaction      Sorted frequent items
T100    f, a, c, d, g, i, m, p        f, c, a, m, p
T200    a, b, c, f, l, m, o           f, c, a, b, m
T300    b, f, h, j, o                 f, b
T400    b, c, k, s, p                 c, b, p
T500    a, f, c, l, p, m, n           f, c, a, m, p
Table 3.7. Transactions of the database
From the definition above we obtain the following algorithm for building
the FP_Tree:
Algorithm for building the FP_Tree
Input: a database and a support threshold minsup
Output: the frequent pattern tree FP_Tree
Method:
Step 1: Scan the database to count the occurrences of the items in the
transactions, determine the frequent items and their support, and sort
the frequent items in decreasing order of support to obtain the sorted
list L.
Step 2: Build the FP_Tree. First create the root node; then, for each
transaction t, select its frequent items, sort them according to the
order of L, and insert them into the FP_Tree by calling
insert_tree(p|P, T), updating the count fields accordingly.
Example: For the database of Table 3.7 we have:



Scan the database to find the frequent items and sort them in decreasing
order of support:
Item    Occurrences
f       4
c       4
a       3
b       3
m       3
p       3
Create the tree T with a root labelled Null.
Scan the database a second time. For each transaction, discard the
infrequent items, sort the remaining items in decreasing order of
occurrence, insert the resulting sequence of frequent items into the
tree, and update the counts accordingly. The construction process is
shown in Figure 3.9.
Header table (Item, Head of node_link): f, c, a, b, m, p
Root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
Figure 3.8. A frequent pattern tree
Tree after each transaction (the children of a node are listed in brackets):
T100 (f, c, a, m, p): Root [f:1 [c:1 [a:1 [m:1 [p:1]]]]]
T200 (f, c, a, b, m): Root [f:2 [c:2 [a:2 [m:1 [p:1], b:1 [m:1]]]]]
T300 (f, b): Root [f:3 [c:2 [a:2 [m:1 [p:1], b:1 [m:1]]], b:1]]
T400 (c, b, p): Root [f:3 [c:2 [a:2 [m:1 [p:1], b:1 [m:1]]], b:1], c:1 [b:1 [p:1]]]
T500 (f, c, a, m, p): Root [f:4 [c:3 [a:3 [m:2 [p:2], b:1 [m:1]]], b:1], c:1 [b:1 [p:1]]]
Figure 3.9. The construction of the FP_Tree
The complete FP_Tree obtained is the following:
Root
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
Figure 3.10. The FP_Tree of the database in Table 3.7
Procedure for inserting the frequent items into the FP_Tree:
Procedure Insert_Tree(string [p|P], Tree T)
// p is the first item of the sequence and P is the rest of the sequence
{ If T has a child N with N.Item_name = p Then N.count++
  Else
  { Create a new node N;
    N.Item_name := p; N.count := 1;
    Update the node links for p;
  }
  If P ≠ ∅ Then
    Insert_Tree(P, N);
}
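The two-scan construction and the Insert_Tree procedure can be sketched in Python as follows. The class `Node`, the functions `build_fp_tree` and `nodes_of`, and the optional `order` argument are assumptions introduced for this illustration; `order` is passed explicitly so that ties between equally frequent items are broken as in Table 3.7.

```python
from collections import Counter


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}
        self.link = None  # next node in the tree that carries the same item


def build_fp_tree(transactions, minsup_count, order=None):
    # Scan 1: count the items and keep the frequent ones.
    freq = Counter(i for t in transactions for i in set(t))
    if order is None:
        order = sorted(freq, key=lambda i: (-freq[i], i))
    rank = {i: r for r, i in enumerate(order) if freq[i] >= minsup_count}
    root = Node(None, None)
    header = {i: None for i in rank}
    # Scan 2: insert the sorted frequent items of every transaction.
    for t in transactions:
        node = root
        for item in sorted((i for i in set(t) if i in rank), key=rank.get):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                # Prepend to the node-link chain (the chain order is not significant here).
                child.link, header[item] = header[item], child
            child.count += 1
            node = child
    return root, header


def nodes_of(header, item):
    """Follow the node links of one item."""
    found, node = [], header[item]
    while node is not None:
        found.append(node)
        node = node.link
    return found


DB = ["facdgimp", "abcflmo", "bfhjo", "bcksp", "afclpmn"]  # Table 3.7
root, header = build_fp_tree(DB, minsup_count=3, order=list("fcabmp"))
print(root.children["f"].count, root.children["c"].count)
print(sorted(n.count for n in nodes_of(header, "m")))
```

The loop over the sorted items is the iterative form of the recursive Insert_Tree call: each iteration handles the head p and moves on to the rest P.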
Mining frequent itemsets
Once the FP_Tree of the database has been built, the frequent itemsets
are found using the FP_Tree alone, without scanning the database again.



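Property: When searching for the patterns that contain item $a_i$, only
the nodes of the prefix subtrees P of $a_i$ need to be considered; the
number of occurrences of each node in a prefix path is taken equal to the
number of occurrences of the corresponding node $a_i$.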
The FP_Growth algorithm proceeds as follows:
Starting from the bottom of the header table and of the tree, for each
item A the node links are used to visit all the nodes of the tree
labelled A. For each node N with N.Item_name = A, find the path from the
root of the tree to N. From these paths we build the conditional pattern
tree for A, and then find the frequent itemsets containing A from this
conditional tree. For example, consider the items from the bottom up,
p, m, ..., f:
Starting with item p: in the FP_Tree of Figure 3.10 there are two paths
containing p: (f:4, c:3, a:3, m:2, p:2) and (c:1, b:1, p:1).
According to these paths, the itemset fcam occurs 2 times together with
p, and cb occurs 1 time together with p. Item p occurs 2 + 1 = 3 times.
We therefore look for the frequent itemsets containing p, counting each
prefix with the frequency of p on its path.
This gives the two prefix paths of p, {(f:2, c:2, a:2, m:2)} and
{(c:1, b:1)}, which form its conditional pattern base. Building the
frequent pattern tree on this conditional pattern base gives the
conditional FP_Tree, which is mined recursively. This tree has only one
branch (c:3), so the only frequent itemset obtained is (cp), which
satisfies the threshold minsup = 3/5.
Item m occurs 3 times, and there are two paths containing m:
(f:4, c:3, a:3, m:2) and (f:4, c:3, a:3, b:1, m:1).
(Item p need not be considered, because all the frequent itemsets
containing p were found when p was processed.)
From these two paths we obtain the conditional pattern base
{(f:2, c:2, a:2), (f:1, c:1, a:1, b:1)}. Building the conditional tree on
it gives the single path <f:3, c:3, a:3>, which is then mined
recursively. Figure 3.11 shows the process of mining the frequent
itemsets. Mining first the nodes labelled a, c, f in turn gives the
frequent itemsets am, cm, fm; next, processing the frequent pattern
(am:3), whose conditional tree is <f:3, c:3>|am, gives the itemsets cam,
fam and fcm. Processing <f:3>|cam gives fcam. Thus, for a single path,
the result of mining is the set of all combinations of the items on the
path.

Conditional pattern base of m: (f:2, c:2, a:2), (f:1, c:1, a:1, b:1)
Conditional FP_Tree of m: Root [f:3 [c:3 [a:3]]], with header table f, c, a
Conditional pattern base of am: (f:3, c:3); conditional FP_Tree of am: Root [f:3 [c:3]]
Conditional pattern base of cm: (f:3); conditional FP_Tree of cm: Root [f:3]
Conditional pattern base of cam: (f:3); conditional FP_Tree of cam: Root [f:3]
Figure 3.11. The conditional FP_Trees

The FP_Growth algorithm
Procedure FP_Growth(Tree, α)
{ If (Tree contains a single path P) then
    For each combination (denoted β) of the nodes in path P Do
      Generate pattern β ∪ α with support = the minimum support of the nodes in β
  Else
    For each a_i in the header of Tree Do
    { Generate pattern β = a_i ∪ α with support = a_i.support
      Find the conditional pattern base of β and build its conditional FP_Tree, Tree_β
      If Tree_β ≠ ∅ Then FP_Growth(Tree_β, β)
    }
}
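
The recursive branch of the procedure above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used in the thesis: the names `Node`, `build_tree` and `fp_growth` are hypothetical, the single-path shortcut is omitted (the general branch handles that case too, only less directly), and `DATABASE` is assumed to hold the five example transactions already reduced to their frequent items. Items with equal counts are ordered alphabetically here, so the tree may differ in shape from the one in the figure, but the mined itemsets do not depend on how ties are broken.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 0
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree; return (header table, counts of frequent items)."""
    counts = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            counts[item] += weight
    frequent = {i: c for i, c in counts.items() if c >= min_count}
    root = Node(None, None)
    header = defaultdict(list)  # item -> nodes holding it (the node-links)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties by name).
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += weight
    return header, frequent


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, support count) for every frequent itemset."""
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, support in frequent.items():
        pattern = suffix | {item}
        yield pattern, support
        # Conditional pattern base: the prefix path of each node of `item`.
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        if base:
            yield from fp_growth(base, min_count, pattern)


# Assumed example data: each transaction has weight 1.
DATABASE = [("fcamp", 1), ("fcabm", 1), ("fb", 1), ("cbp", 1), ("fcamp", 1)]
result = dict(fp_growth(DATABASE, min_count=3))
print(result[frozenset("cp")], result[frozenset("fcam")], len(result))  # 3 3 18
```

Each recursive call receives a conditional pattern base as a list of weighted paths, so the same `build_tree` routine serves for both the global tree and every conditional tree.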
FP_growth is efficient because it scans the database only twice: once to determine the frequent items and once to build the FP_Tree. Thanks to the FP_Tree structure, mining the frequent patterns requires no further database scans. The algorithm starts from each item a_i in the header table and generates its conditional pattern base, and an a_i that has already been processed is not considered again when later items are processed.
The algorithm partitions the search space in order to shrink it, using a divide-and-conquer approach that decomposes the problem into smaller tasks; this is the source of its efficiency. Sorting the items in decreasing order of frequency means that the more frequent items are shared by more paths.
The algorithm suits sparse data, dense data and long patterns alike. It also discards infrequent items at the outset.
3.7. The PARTITION algorithm [Savasere 95]
The Partition algorithm uses breadth-first search together with the intersection of transaction-identifier lists (TID-List Intersection).

Partition is an Apriori-like algorithm that uses set intersection to determine support values. As presented above, Apriori determines the support of all (k-1)-candidates before computing the k-candidates.
The difficulty is that Partition wants to use the TID-lists of the frequent (k-1)-itemsets to generate the TID-lists of the k-candidates, and the size of these intermediate results can easily exceed the physical memory of an ordinary computer.
To solve this problem, Partition divides the database into several parts that are processed independently of one another. The size of each part is chosen so that its TID-lists fit in main memory.
After the frequent itemsets of each part have been determined, one more pass over the entire database is needed to make sure that the locally frequent itemsets are also globally frequent.
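
The TID-list intersection mentioned above can be illustrated with a short Python sketch. The toy database and the helper names `build_tidlists` and `join_tidlists` are assumptions made for illustration and do not come from the thesis.

```python
# Hypothetical toy database: TID -> set of items (not thesis data).
transactions = {
    1: {"a", "b", "c"},
    2: {"a", "c"},
    3: {"a", "d"},
    4: {"b", "c"},
}


def build_tidlists(transactions):
    """Map each single item to the set of TIDs of transactions containing it."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(frozenset([item]), set()).add(tid)
    return tidlists


def join_tidlists(tidlists, left, right):
    """The TID-list of left ∪ right is the intersection of both TID-lists."""
    return left | right, tidlists[left] & tidlists[right]


tidlists = build_tidlists(transactions)
itemset, tids = join_tidlists(tidlists, frozenset("a"), frozenset("c"))
print(sorted(itemset), sorted(tids), len(tids))  # ['a', 'c'] [1, 2] 2
```

The support count of the joined itemset is simply the length of its TID-list, so no further pass over the transactions is needed once the single-item lists are in memory.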
The Partition algorithm reduces the number of database scans [18]. It divides the database into small parts, each of which can be held in main memory; let these parts be D_1, D_2, ..., D_p. In the first scan, it finds the local large itemsets in each D_i (1 ≤ i ≤ p); the local large itemsets L_i can be found with a level-wise algorithm such as Apriori. Each part can be sized to fit the available memory. In the second scan, it counts the candidate itemsets in every part.
Input: I, σ, D_1, D_2, ..., D_p.
Output: L
Algorithm:
// Find the σ-frequent itemsets in each partition
1) for i from 1 to p do
2)   L_i = Apriori(I, D_i, σ); // L_i is the set of σ-frequent itemsets in D_i
// Merge the local sets to form the candidate set
3) C = ∪_i L_i
4) count(C, D), with D = ∪_i D_i; // count every candidate over the whole database
5) return L = {x | x ∈ C, x.count ≥ σ × |D|};
This algorithm proves effective when the data in the database are unevenly distributed (skewed).
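
The two scans of the Partition algorithm can be sketched in Python as follows. This is a sketch under stated assumptions rather than the original implementation: `local_frequent` is a simplified level-wise search standing in for Apriori, it counts by scanning transactions instead of intersecting TID-lists, and the toy `DATABASE`, the function names and the equal-size split are all hypothetical.

```python
from itertools import combinations


def local_frequent(partition, sigma):
    """Level-wise search for the sigma-frequent itemsets of one partition."""
    min_count = sigma * len(partition)
    level = {frozenset([i]) for t in partition for i in t}
    found = set()
    while level:
        level = {c for c in level
                 if sum(c <= t for t in partition) >= min_count}
        found |= level
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        level = {a | b for a, b in combinations(level, 2)
                 if len(a | b) == len(a) + 1}
    return found


def partition_mine(database, sigma, p):
    """Return {itemset: global count} for the sigma-frequent itemsets."""
    size = -(-len(database) // p)  # ceiling division
    parts = [database[i:i + size] for i in range(0, len(database), size)]
    # Scan 1: local frequent itemsets of each partition; their union is C.
    candidates = set().union(*(local_frequent(d, sigma) for d in parts))
    # Scan 2: count each candidate over the whole database.
    counts = {c: sum(c <= t for t in database) for c in candidates}
    return {c: n for c, n in counts.items() if n >= sigma * len(database)}


# Hypothetical toy database of six transactions (not thesis data).
DATABASE = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"},
            {"b", "c"}, {"a", "b"}, {"c"}]
result = partition_mine(DATABASE, sigma=0.5, p=2)
print({"".join(sorted(k)): v for k, v in result.items()})
```

In this toy run the itemset ac is frequent in the first partition only, so it enters the candidate set C but is rejected by the second scan, which is exactly why the global pass is required.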

Chapter 4
MINING ASSOCIATION RULES FOR EQUIPMENT MANAGEMENT
AT CHU VĂN AN HIGH SCHOOL, THÁI NGUYÊN
4.1. Problem statement
Chu Văn An High School in Thái Nguyên Province was the first upper secondary school in the province to be recognized by the Ministry of Education and Training as meeting the national standard for the period 2001-2010, and was the 16th school nationwide to meet the standard at that time (2003). Today the school is a leader among upper secondary schools in the effective use of information and communication technology in management and teaching. These achievements are due to a teaching staff that is 100% qualified to standard and to the school's modern facilities.
Besides the facilities found in other schools (classrooms, tables, chairs), Chu Văn An High School manages 150 computers, 27 projectors, 9 printers, and so on, and 100% of its classrooms are fully equipped with a computer and a projector. With so much modern equipment, keeping track of all the school's equipment and supplies on paper is a very heavy burden for the person in charge.
To ease this difficulty, an equipment management program is needed to support the manager in tasks such as choosing which equipment and supplies to buy. Which related items should be bought together? In what quantity? When replacement is needed, which groups of equipment should be replaced together to avoid waste? What equipment does a 70 m² practice room need?
Applying association rule mining to equipment management helps the manager understand the equipment profile of each type of room and the list of items that tend to go together, so that when purchases, repairs or replacements are needed the manager has an effective tool for reaching decisions quickly.
The program is implemented with the binary Apriori algorithm. As is well known, binary Apriori rests on a simple observation: every subset of a σ-frequent itemset is itself σ-frequent. Consequently, when searching for candidate itemsets, it only needs the candidate sets produced at the immediately preceding step, not all candidate sets generated so far. A considerable amount of memory is thereby freed.
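
The observation above can be sketched in Python: the k-item candidates are built from the frequent (k-1)-itemsets of the previous step alone, and a candidate is discarded as soon as one of its (k-1)-subsets is not frequent. The function name `generate_candidates` and the sample sets are assumptions for illustration, not code from the thesis program.

```python
from itertools import combinations


def generate_candidates(prev_frequent):
    """Build k-item candidates from the frequent (k-1)-itemsets only."""
    prev_frequent = set(prev_frequent)
    candidates = set()
    for a, b in combinations(prev_frequent, 2):
        union = a | b
        if len(union) != len(a) + 1:
            continue  # the two sets differ in more than one item
        # Prune: every (k-1)-subset of the candidate must be frequent.
        if all(union - {item} in prev_frequent for item in union):
            candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets.
L2 = {frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")}
C3 = generate_candidates(L2)
print(sorted("".join(sorted(c)) for c in C3))  # ['abc']
```

Here abd and bcd are pruned because ad and cd are not frequent, so only abc has to be counted against the database.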
4.2. The database of the problem
- Table listing the rooms whose equipment is managed

Figure 4.1. Table of rooms

The structure and example data of the table are as follows:
+ Maphong: room code
+ Loaiphong: room type (meeting room, classroom or practice room)
+ Tenphong: specific name of the room

+ Nha: name of the building the room is in
+ Tang: floor
- Table of detailed equipment counts per room

Figure 4.2. Table of detailed equipment counts per room

+ Maphong field: room code
+ The remaining fields are the names of the managed equipment, such as Attomat (circuit breaker), Ampli (amplifier), Banhs (student desk), DieuHoa (air conditioner), and so on; each field holds the quantity of that equipment in the room.




4.3. Discretizing the original attributes into binary attributes

Figure 4.3. Registration table of discretized attribute names
The table has the following fields:
+ Mã TT gốc: code of the original attribute
+ Mã TT rời rạc: code of the attribute split off (discretized) from the original attribute.
An original attribute is split into n binary attributes (attributes whose values are 0 or 1).
Example: for the original attribute Attomat we create three attributes At1, At2 and At3.
If the number of Attomat is <= 2, then At1 = 1 and At2 = At3 = 0.
If the number of Attomat is >= 3 and < 6, then At2 = 1 and At1 = At3 = 0.
If the number of Attomat is >= 6, then At3 = 1 and At1 = At2 = 0.
Concretely, if the Attomat field takes the values 1, 3 and 4, the fields At1, At2 and At3 take the values shown below:
Attomat  At1  At2  At3
1        1    0    0
3        0    1    0
4        0    1    0
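
The rule for Attomat can be written as a small Python function. The thresholds and the attribute names At1, At2 and At3 come from the text; the function name `discretize_attomat` is an assumption made for this sketch.

```python
def discretize_attomat(quantity):
    """Map an Attomat count to the binary attributes At1, At2 and At3."""
    return {
        "At1": int(quantity <= 2),
        "At2": int(3 <= quantity < 6),
        "At3": int(quantity >= 6),
    }


for quantity in (1, 3, 4):
    print(quantity, discretize_attomat(quantity))
```

Exactly one of the three attributes is 1 for any count, so each room contributes one item per original attribute to its transaction.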


The fields that store other equipment, such as curtains, computers and air conditioners, are discretized in the same way.
4.4. The database in binary form
After the original table, which details the names and quantities of the equipment in each room of the institution, has been transformed into binary form, we obtain the following binary data table:

Figure 4.4. Database table in binary form

4.5. Results of association rule mining with the Apriori algorithm
With minimum support (Min Support) = 0.65 and minimum confidence (Min Confidence) = 0.7:
Total number of transactions = 18
Total number of attributes = 35
Total number of frequent itemsets = 32
Total number of rules = 180
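
How the two thresholds act on a candidate rule X -> Y can be sketched in Python. Only the threshold values 0.65 and 0.7 come from the text; the four binary rows, the attribute names Am1 and Dh1, and the helper functions are hypothetical and serve only to show the computation.

```python
MIN_SUPPORT, MIN_CONFIDENCE = 0.65, 0.7  # thresholds quoted in the text


def support(rows, items):
    """Fraction of rows in which every attribute in `items` equals 1."""
    return sum(all(row[i] == 1 for i in items) for row in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """conf(X -> Y) = supp(X ∪ Y) / supp(X)."""
    return support(rows, antecedent | consequent) / support(rows, antecedent)


def accept(rows, antecedent, consequent):
    """Keep a rule only if it meets both thresholds."""
    return (support(rows, antecedent | consequent) >= MIN_SUPPORT
            and confidence(rows, antecedent, consequent) >= MIN_CONFIDENCE)


# Hypothetical binary table: 4 rooms, 3 binary attributes (not thesis data).
rows = [
    {"At1": 1, "Am1": 1, "Dh1": 1},
    {"At1": 1, "Am1": 1, "Dh1": 0},
    {"At1": 1, "Am1": 1, "Dh1": 0},
    {"At1": 1, "Am1": 0, "Dh1": 1},
]
print(support(rows, {"At1", "Am1"}), confidence(rows, {"At1"}, {"Am1"}))
print(accept(rows, {"At1"}, {"Am1"}), accept(rows, {"At1"}, {"Dh1"}))
```

In this toy table the rule At1 -> Am1 has support 0.75 and confidence 0.75 and is kept, while At1 -> Dh1 has support 0.5 and is rejected by the support threshold.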

4.6. Results of mining the equipment management database of Chu Văn An High School, Thái Nguyên
Results of association rule mining on the room statistics database, which has 100 transactions (corresponding to the information of 100 rooms) and 43 attributes:

Minimum support   Minimum confidence   Execution time   Frequent itemsets   Rules
(Minsupp)         (Min confidence)
60                0.7                  5 min 29 s       63                  602
50                0.7                  6 min 12 s       126                 1932

CONCLUSION
Data mining is one of the important and topical techniques today, not only for Vietnam but for the IT industry worldwide. The global explosion of information and data in every aspect of social life, together with the ever wider development and application of information technology in every field, has made the automatic, fast and accurate processing of huge volumes of data into useful information and knowledge a key factor in the success of agencies, organizations and individuals around the world. Data mining is being applied widely in many areas of business and daily life: marketing, finance, banking and insurance, science, healthcare, security, the internet, and others. Many large organizations and companies around the world have applied data mining techniques to their production and business activities and have gained great benefits.
One of the most important and fundamental data mining methods, and the one this thesis studies in depth, is association rule mining. Its goal is to discover and report relationships among data values in a database; the output patterns of the mining algorithm are the association rules found. The method is used very effectively in areas such as targeted marketing, decision analysis, business management and market price analysis.
Within a limited time, the thesis has summarized the most basic knowledge of association rule mining, and it can be regarded as a fairly complete and clear reference on the fundamentals of association rule discovery. In addition, drawing on the study of data mining techniques and of the issues involved in mining association rules to discover relationships among data values in a database, the thesis has applied them to an experimental problem, the management of equipment and supplies at Chu Văn An High School, Thái Nguyên Province, based on the Apriori algorithm.



Directions for further development of the thesis:
Since one of the central tasks of association rule mining is finding all frequent itemsets in the database, future work will extend the research toward applying parallel algorithms to the mining of fuzzy association rules, that is, association rules over sets of fuzzy attributes.
The parallel algorithm divides the database and the candidate set evenly among the processors, and the candidate sets assigned to the individual processors are completely independent of one another; the aim is to reduce the cost of finding fuzzy association rules and the time spent on data processing.
The equipment management system will continue to be improved and may be applied to other fields such as training, banking and supermarkets.

REFERENCES

[1] Lê Hoài Bắc (2002), Bài giảng về khám phá tri thức và khai thác dữ liệu: tìm luật kết hợp theo mục đích người dùng [Lectures on knowledge discovery and data mining: finding association rules according to user goals], Đại học Quốc gia TP. Hồ Chí Minh.
[2] Đỗ Phúc (2002), Nghiên cứu và phát triển một số thuật giải, mô hình ứng dụng khai thác dữ liệu (data mining) [Research and development of several algorithms and application models for data mining], PhD thesis in mathematics, Đại học Quốc gia TP. Hồ Chí Minh.
[3] Rakesh Agrawal, Tomasz Imielinski, and Arun Swami (1993), Mining association rules between sets of items in large databases, In Proc. of the ACM SIGMOD Conference on Management of Data, Washington, D.C.
[4] Rakesh Agrawal, Ramakrishnan Srikant (1996), Mining Quantitative Association Rules in Large Relational Tables, In Proc. of the ACM SIGMOD Conference on Management of Data, Montreal, Canada.
[5] Usama M. Fayyad, Gregory Piatetsky-Shapiro (1996), Advances in Knowledge Discovery and Data Mining, AAAI Press/The MIT Press.
[6] Krzysztof J. Cios, Witold Pedrycz, and Roman W. Swiniarski (1998), Data Mining Methods for Knowledge Discovery, Kluwer Academic Publishers, Boston/Dordrecht/London.
[7] R. Agrawal and R. Srikant (1994), Fast algorithms for mining association rules, The International Conference on Very Large Databases, pp. 487-499.
[8] D. Phuc, H. Kiem (2000), Discovering the binary and fuzzy association rules from database, In Proc. of Int'l Conf. AFSS 2000, Tsukuba, Japan, pp. 981-986.
[9] R. Agrawal and R. Srikant (1995), Mining sequential patterns, In P. S. Yu and A. L. P. Chen, editors, Proc. 11th Int. Conf. Data Engineering, ICDE.
[10] N. F. Ayan, A. U. Tansel, and M. E. Arkun (1999), An efficient algorithm to update large itemsets with early pruning, In Knowledge Discovery and Data Mining.
[11] John Wang (2003), Data Mining: Opportunities and Challenges, Idea Group Publishing.


[12] Jiawei Han and Micheline Kamber (2002), Data Mining: Concepts and Techniques, University of Illinois, Morgan Kaufmann Publishers.
[13] N. Pasquier et al. (1999), Discovering frequent closed itemsets for association rules, In Proc. of the 7th Int'l Conference ICDT'99, pp. 398-410, Israel.
[14] Osmar R. Zaïane, Mohammad El-Hajj, and Paul Lu (200), Fast Parallel Association Rule Mining without Candidacy Generation, University of Alberta, Edmonton, Alberta, Canada.
