TABLE OF CONTENTS
LIST OF ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
PREFACE
Chapter 1. OVERVIEW OF DATA MINING
1.1 Introduction to data mining
1.2 Tasks of data mining
1.3 Types of data that can be mined
1.4 History of data mining
1.5 Applications of data mining
1.6 Classification
1.7 Some challenges for data mining
Chapter summary
Chapter 2. PROCESS AND METHODS OF DATA MINING
2.1 General process of data mining
2.2 The knowledge discovery process for a specific problem
2.3 Data preprocessing
2.3.1 Data cleaning
2.3.1.1 Missing values
2.3.1.2 Noisy data
2.3.2 Data integration and transformation
2.3.2.1 Data integration
2.3.2.2 Data transformation
2.3.3 Data reduction
2.3.3.1 Data reduction using histograms
2.3.3.2 Sampling
2.3.4 Data discretization and concept hierarchy generation
2.3.4.1 Discretization by intuitive partitioning for numeric data
2.3.4.2 Concept hierarchy generation for categorical data
2.3 Data mining methods
2.4 Some techniques used in data mining
2.4.1 Decision trees
2.4.1.1 General introduction
2.4.1.2 Types of decision trees
2.4.1.3 Advantages of decision trees
2.4.2 Association rules
2.4.2.1 Statement of the association rule mining problem
2.4.2.2 Approaches to association rule mining
2.4.3 Multidimensional data model
2.4.3.1 Definition
2.4.3.2 Operations on the dimensions of an MDDM
2.4.4 Shortest distance
2.4.5 K-nearest neighbors
2.4.6 Clustering
LIST OF ABBREVIATIONS
AS: Analysis Services
BIDS: Business Intelligence Development Studio
BI Dev Studio: Business Intelligence Development Studio
CSDL: Cơ sở dữ liệu (database)
DM: Data mining
DMX: Data Mining eXtensions
DSV: Data Source View
DTS: Data Transformation Services
IDS/IPS: Intrusion Detection System / Intrusion Prevention System
KDD: Knowledge Discovery and Data Mining
KTDL: Khai thác dữ liệu (data mining)
KDL: Kho dữ liệu (data warehouse)
MDDM: Multi-Dimensional Data Model
MMPB: Mining Model Prediction Builder
MSE: Mining Structure Editor
Microsoft SQL Server
OLAP: Online Analytical Processing
SRSWOR: Simple random sample without replacement
SRSWR: Simple random sample with replacement
LIST OF TABLES
Table 2.1: Observed frequencies
Table 3.1: Golf-playing data
Table 3.2: Example of a transaction database D
Table 3.3: Frequent itemsets with minsup = 50%
Table 3.4: Association rules generated from the frequent itemset ABE
Table 3.5: Survey data on ownership of household amenities
Table 3.6: Sample customer data
Table 3.7: Some examples using the k-nearest-neighbor technique
Table 3.8: Contingency table for binary variables
Table 3.9: A relational table in which patients are described by binary variables
Table 3.10: Sample data table containing variables of mixed types
LIST OF FIGURES
Figure 2.1: Data mining as one step in the knowledge discovery process
Figure 2.2: Overview of the data mining process
Figure 2.3: Forms of data preprocessing
Figure 2.4: A histogram for price using singleton buckets, each representing one price value/frequency pair
Figure 2.5: An equal-width histogram for price
Figure 2.6: Sampling methods
Figure 2.7: A concept hierarchy for price
Figure 2.8: Automatic generation of a concept hierarchy based on the number of distinct values of the attributes
Figure 3.1: Result of the decision tree
Figure 3.2: Geometric representation of the n-dimensional data model (with n = 3)
Figure 3.3: Converting a two-dimensional table into the n-dimensional data model
Figure 3.4: Records represented as points in a space defined by their attributes; the distance between them can be measured
Figure 3.6: Plot based on two measures
Figure 3.7: Interactive three-dimensional plot
Figure 3.8: Illustration of a neural network architecture
Figure 3.5: Illustration of the k-means algorithm
PREFACE
The development of information technology, and its application in many areas of life, the economy and society over the years, has meant that the volume of data collected and stored by organizations keeps growing. They store this data because they believe it holds some hidden value. According to statistics, however, only a small fraction of it (roughly 5% to 10%) is ever analyzed. For the rest, organizations do not know what they should or could do with it, yet they keep collecting it at great cost for fear that something important, which may be needed later, would otherwise be missed. Traditional database management and exploitation methods cannot meet this expectation, which is why Knowledge Discovery and Data Mining (KDD) techniques emerged.
Knowledge discovery and data mining techniques have been, and continue to be, studied and applied in many fields around the world. In Vietnam they are still relatively new, but they are being researched and gradually put into practice. Within the scope of this study, I present the basic concepts of data mining and the application of data mining in IDS/IPS systems.
V th Vn, Faculty of Information Security, Academy of Cryptography Techniques
Chapter 1
OVERVIEW OF DATA MINING
1.1 Introduction to data mining
Data mining is defined as the process of extracting valuable, hidden information from the large volumes of data stored in databases and data warehouses. More specifically, it is the process of filtering out and producing knowledge or patterns that are hidden and previously unknown, yet useful, from large databases. It is also the process of generalizing the discrete facts in the data into general, rule-like knowledge that actively supports decision making.
Besides the term data mining, several other terms with a similar meaning are in use: knowledge mining from databases, data extraction, data/pattern analysis, data archaeology, and data dredging. Many people treat data mining and another common term, Knowledge Discovery in Databases (KDD), as synonyms. In practice, however, data mining is only one essential step in the process of knowledge discovery in databases. To picture this, consider a simple example: data mining is like looking for a needle in a haystack. In this example ...
Chapter summary
This chapter introduced:
- The concept of data mining
- The tasks of data mining
- Classification in data mining
- The application areas of data mining
- Some challenges in data mining
Figure 2.1: Data mining as one step in the knowledge discovery process
2.2 The knowledge discovery process for a specific problem
Because the goal is to discover knowledge implicit in a database, the mining process usually has to pass through a number of necessary stages: preparing the data to be mined, mining the data, and finally converting the mining results into knowledge that people can understand. The steps are summarized as follows:
Stage 1: Develop an understanding of the application domain and the relevant prior knowledge. Identify the goal of the data mining process from the user's point of view.
Stage 2: Prepare the data to be mined; collect the data and sample data.
Stage 3: Preprocess the data: remove noise from the data, eliminate duplicated data, and decide on a strategy for handling missing data.
Figure 2.2: Overview of the data mining process
The mining process involves interaction and iteration between any two steps; its basic steps are illustrated in the figure above. Most earlier work concentrated on step 7, the data mining stage itself. The remaining steps are no less important, however, and contribute greatly to the success of the whole data mining process. The preprocessing part of the process is examined in detail below.
2.3 Data preprocessing
Real-world data is usually not clean and not consistent. Data preprocessing techniques can improve the quality of the data and thereby make the mining process more accurate and more efficient. Data preprocessing is an important step in the knowledge discovery process, because the quality of the
Figure 2.3: Forms of data preprocessing
2.3.1 Data cleaning
Real-world data is usually incomplete, noisy, and inconsistent. Data cleaning attempts to fill in missing values, remove noise, and correct inconsistencies in the data.
2.3.1.1 Missing values
Methods for handling missing values:
1. Ignore the tuple that has missing values: This is usually done when the class label is missing (typically in a classification task). The method is not very effective unless the tuple contains several attributes with missing values. It is especially poor when the percentage of missing values per attribute is considerable.
2. Fill in the missing values manually: This approach is time-consuming and is not feasible for a large data set with many missing values.
3. Use a global constant to fill in the missing values: Replace all missing attribute values with the same constant, such as "Unknown" or −∞. If missing values are replaced by a constant such as "Unknown", the mining program may mistake it for a meaningful concept, because the tuples all share the common value "Unknown". So although this method is simple, it is not foolproof.
4. Use the attribute mean to fill in the missing values.
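As a rough illustration of methods 3 and 4 above, the sketch below fills missing entries first with a global constant and then with the attribute mean. The function names and the sample `incomes` list are hypothetical and chosen only for this example; missing entries are assumed to be represented by `None`.

```python
from statistics import mean

def fill_with_constant(values, constant="Unknown"):
    """Method 3: replace every missing entry (None) with one global constant."""
    return [constant if v is None else v for v in values]

def fill_with_mean(values):
    """Method 4: replace every missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: the attribute has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical attribute with two missing values
incomes = [30, None, 50, 40, None]
print(fill_with_constant(incomes))
print(fill_with_mean(incomes))
```

Note that the constant-filled column mixes strings and numbers, which is one practical reason the text calls this method simple but not foolproof.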
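$$r_{A,B} = \frac{\sum_{i=1}^{N}(a_i - \bar{A})(b_i - \bar{B})}{N\sigma_A\sigma_B} = \frac{\sum_{i=1}^{N}(a_i b_i) - N\bar{A}\bar{B}}{N\sigma_A\sigma_B}$$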
where:
- N is the number of tuples, and a_i and b_i are the values of attributes A and B in tuple i.
- σ_A and σ_B are the standard deviations of A and B.
- Ā and B̄ are the mean values of A and B.
Note that −1 ≤ r_{A,B} ≤ 1. If r_{A,B} > 0, A and B are positively correlated, meaning that B increases as A increases; the larger r_{A,B}, the stronger the correlation. If r_{A,B} < 0, A and B are negatively correlated, meaning that B decreases as A increases. If r_{A,B} = 0, A and B are uncorrelated (there is no linear relationship between them).
Note that correlation does not imply causality: if A and B are correlated, this does not mean that A causes B or that B causes A. For example, when analyzing demographic data we may find that the number of hospitals and the number of cars in a region are correlated. This does not mean that one attribute causes the other. In fact, both attributes are related to a third attribute, population.
For discrete data, a correlation relationship between two attributes A and B can be discovered with a χ² (chi-square) test. Suppose A has c distinct values a_1, a_2, ..., a_c, and B has r distinct values b_1, b_2, ..., b_r. The tuples described by A and B can be shown as a table with the c values of A as columns and the r values of B as rows. Let (A_i, B_j) denote the joint event that A = a_i and B = b_j. Then χ² is computed as:
$$\chi^2 = \sum_{i=1}^{c}\sum_{j=1}^{r}\frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where o_ij is the observed frequency of the joint event (A_i, B_j) and e_ij is the expected frequency of (A_i, B_j), computed as:
$$e_{ij} = \frac{count(A = a_i) \times count(B = b_j)}{N}$$
The χ² statistic tests the hypothesis that A and B are independent. The test is based on a significance level, with (r−1) x (c−1) degrees of freedom. If the hypothesis is rejected, we conclude that A and B are dependent.
Example 2.2: Correlation analysis of categorical attributes using χ². Suppose a group of 1500 people was surveyed. The gender of each person was noted, and each person was asked whether or not they liked fiction. We therefore have two attributes: gender and preferred_reading. The observed frequency (or count) of each possible joint event is recorded in the following table (Table 2.1):
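The χ² statistic and the expected frequencies defined above can be sketched in a few lines of Python. The 2 x 2 counts used here are an assumption: Table 2.1 itself is not reproduced in this text, so the counts were chosen to total 1500 and to be consistent with the four per-cell terms quoted in the result of Example 2.2. The function name `chi_square` is likewise only illustrative.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `observed` is a list of rows, each row a list of observed counts o_ij.
    """
    n = sum(sum(row) for row in observed)
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected frequency e_ij
            chi2 += (o - e) ** 2 / e
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, dof

# Assumed counts (not taken from the text):
# rows = preferred_reading (fiction, non-fiction), columns = gender (male, female)
table = [[250, 200],
         [50, 1000]]
chi2, dof = chi_square(table)
print(round(chi2, 2), dof)
```

Under these assumed counts the statistic comes out at about 507.94 with 1 degree of freedom; the small difference from 507.93 arises because the text sums terms that were already rounded to two decimals.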
= 284.44 + 121.90 + 71.11 + 30.48 = 507.93
For this 2 x 2 table the number of degrees of freedom is (2−1) x (2−1) = 1. For 1 degree of freedom, the χ² value needed to reject the hypothesis at the 0.001 significance level is 10.828 (taken from the table of the χ² distribution, which is available in most statistics textbooks). Since the computed value is far above this, we can reject the hypothesis that preferred_reading and gender are independent and conclude that the two attributes are correlated.
2.3.2.2 Data transformation
In data transformation, the data is transformed or consolidated into a form suitable for mining. Data transformation involves the following:
- Smoothing, which removes noise from the data. Techniques include binning, regression, and clustering.
- Aggregation, in which summary or aggregation operations are applied to the data. For example, daily sales data may be aggregated into monthly or yearly totals. This step is typically used when building a data cube for analyzing the data at multiple levels.
- Generalization, in which low-level or raw data is replaced by higher-level concepts.
- Normalization, in which attribute values are scaled to fall within a small specified range, such as −1 to 1 or 0 to 1.
- Attribute construction (or feature construction), in which new attributes are constructed and added to the given set of attributes to help the mining process.
Data normalization methods:
1. Min-max normalization: performs a linear transformation on the original data. Suppose that min_A and max_A are the minimum and maximum values of attribute A. Min-max normalization maps a value v of A to a value v' in the range [new_min_A, new_max_A] using the expression:
$$v' = \frac{v - min_A}{max_A - min_A}(new\_max_A - new\_min_A) + new\_min_A$$
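The min-max mapping can be illustrated with a short Python sketch. It assumes the attribute is given as a list of numbers and takes min_A and max_A from that list; the function name and the sample income values are hypothetical.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly map each value v of an attribute into [new_min, new_max]."""
    lo, hi = min(values), max(values)      # min_A and max_A
    if hi == lo:
        raise ValueError("min-max normalization is undefined when all values are equal")
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

# Hypothetical income values
incomes = [12000, 73600, 98000]
print(min_max_normalize(incomes))            # mapped into [0, 1]
print(min_max_normalize(incomes, -1.0, 1.0)) # mapped into [-1, 1]
```

Because the mapping depends on the observed minimum and maximum, a single extreme value compresses all other values into a narrow part of the target range.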
where Ā and σ_A are the mean and standard deviation of attribute A. This normalization method is useful when the actual minimum and maximum of the attribute are unknown, or when the attribute has outliers that would dominate min-max normalization.
3. Normalization by decimal scaling: a value v is normalized to v' by the following computation:
$$v' = \frac{v}{10^j}$$
where j is the smallest integer such that max(|v'|) < 1.
In attribute construction, new attributes are built from the given ones to help improve accuracy and the understanding of structure in high-dimensional data. For example, we may construct the attribute area from the attributes length and width. By combining attributes, attribute construction can uncover missing information about the relationships between attributes that may be useful for knowledge discovery.
2.3.3 Data reduction
Data reduction techniques can be applied to obtain a reduced representation of the data set that is much smaller in volume yet preserves the integrity of the original data. That is, mining the reduced data should be more efficient than mining the original data. The data reduction strategies are as follows:
1. Data cube aggregation, in which aggregation operations are applied to the data in the construction of a data cube.
2. Attribute subset selection, in which irrelevant, weakly relevant, or redundant attributes or dimensions can be detected and removed.
3. Dimensionality reduction, in which encoding mechanisms are used to reduce the size of the data set.
4. Numerosity reduction, in which the data is replaced or estimated by alternative, smaller representations, such as parametric models (only the model parameters need to be stored instead of the actual data) or nonparametric methods such as clustering, sampling, and histograms.
5. Discretization and concept hierarchy generation, in which raw attribute values are replaced by ranges or higher conceptual levels. Discretization is a form of numerosity reduction and is very useful for the automatic generation of concept hierarchies. Discretization and concept hierarchy generation are
Figure 2.4: A histogram for price using singleton buckets, each representing one price value/frequency pair
Figure 2.5: An equal-width histogram for price
Partitioning the attribute values:
- Equal-width: in an equal-width histogram, the width of each bucket range is constant (such as the width of $10 for the buckets).
- Equal-depth (or equal-height): in an equal-depth histogram, the buckets are created so that the frequency of each bucket is roughly constant (that is, each bucket contains about the same number of contiguous data samples).
- V-Optimal: if we consider all possible histograms for a given number of buckets, the V-Optimal histogram is the one with the least variance. Histogram variance is a weighted sum of the original values that each bucket represents, where the weight of a bucket is the number of values in it.
- MaxDiff: in a MaxDiff histogram, we consider the difference between each pair of adjacent values. A bucket boundary is placed between each pair for the pairs having the β−1 largest differences, where β is the number of buckets specified by the user.
2.3.3.2 Sampling
Sampling can be used as a data reduction technique because it allows a large data set to be represented by a much smaller random sample of the data. Suppose a data set D contains N tuples. The ways in which D can be sampled for data reduction include:
- Simple random sample without replacement (SRSWOR) of size n: created by drawing n of the N tuples from D (n < N), where the probability of drawing any tuple in D is 1/N, that is, all tuples are equally likely to be drawn.
- Simple random sample with replacement (SRSWR) of size n: similar to SRSWOR, except that each time a tuple is drawn from D it is recorded and then replaced. That is, after a tuple is drawn it is placed back in D so that it may be drawn again.
- Cluster sample
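The two simple random sampling schemes described above can be sketched with Python's standard `random` module. The function names and the toy data set `D` are hypothetical; the optional `rng` argument is only there so that a seeded generator can be passed in for reproducible runs.

```python
import random

def srswor(data, n, rng=random):
    """Simple random sample WITHOUT replacement: n distinct tuples from data."""
    return rng.sample(data, n)

def srswr(data, n, rng=random):
    """Simple random sample WITH replacement: each drawn tuple is put back,
    so the same tuple may be drawn more than once."""
    return [rng.choice(data) for _ in range(n)]

# Hypothetical data set D with N = 10 tuples
D = list(range(1, 11))
rng = random.Random(42)
print(srswor(D, 4, rng))
print(srswr(D, 4, rng))
```

With replacement, the sample size n may even exceed N, which is not possible for SRSWOR.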
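Returning to the bucket partitioning rules of Section 2.3.3.1, the equal-width and equal-depth rules can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figures; the function names and the `prices` list are hypothetical.

```python
def equal_width_buckets(values, width):
    """Count the values falling in each fixed-width bucket [k*width, (k+1)*width)."""
    counts = {}
    for v in values:
        start = (v // width) * width          # lower edge of the bucket containing v
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def equal_depth_buckets(values, n_buckets):
    """Split the sorted values into n_buckets contiguous buckets of (nearly) equal size."""
    ordered = sorted(values)
    size, extra = divmod(len(ordered), n_buckets)
    buckets, start = [], 0
    for k in range(n_buckets):
        end = start + size + (1 if k < extra else 0)   # spread the remainder over the first buckets
        buckets.append(ordered[start:end])
        start = end
    return buckets

# Hypothetical price list
prices = [1, 1, 5, 5, 5, 8, 8, 10, 10, 12, 14, 14, 15, 18, 18, 20, 21, 25, 28, 30]
print(equal_width_buckets(prices, 10))
print(equal_depth_buckets(prices, 4))
```

Equal-width buckets keep the ranges uniform but may hold very different numbers of values, whereas equal-depth buckets keep the counts uniform at the cost of uneven ranges.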
Figure 2.6: Sampling methods
2.3.4 Data discretization and concept hierarchy generation
Data discretization techniques can be used to reduce the number of values of a given continuous attribute by dividing the range of the attribute into intervals. The interval labels can then be used in place of the actual data values. Replacing a large number of values of a continuous attribute with a small number of labeled intervals reduces the size of the original data and simplifies it. This leads to a concise, easy-to-use, knowledge-level representation of the mining results.
A concept hierarchy for a given numeric attribute defines a discretization of that attribute. Concept hierarchies can be used to reduce the collected data by replacing low-level concepts with higher-level ones. Although detail is lost through such data generalization, the generalized data may be more meaningful and easier to interpret. This contributes to a consistent representation of mining results across multiple mining tasks, which is a common requirement. In addition, mining a reduced data set requires fewer input/output operations and is more efficient than mining a larger, ungeneralized data set. Because of these benefits, data discretization techniques and
Figure 2.7: A concept hierarchy for price
2.3.4.1 Discretization by intuitive partitioning for numeric data
Although the discretization methods above are useful for generating numeric hierarchies, many users prefer to see numeric ranges partitioned into uniform intervals that are easy to read, intuitive, and natural. For example, annual salaries are more naturally divided into ranges such as [$50,000, $60,000) than into ranges such as [$51,263.98, $60,872.34) obtained from some complex cluster analysis.
The 3-4-5 rule can be used to segment numeric data into relatively uniform, natural intervals. In general, the rule partitions a given range of data into 3, 4, or 5 intervals of similar width, recursively and level by level, based on the value range at the most significant digit. The rule works as follows:
- If an interval covers 3, 6, 7, or 9 distinct values at the most significant digit, partition the range into 3 intervals (3 equal-width intervals for 3, 6, and 9; 3 intervals in the grouping 2-3-2 for 7).
- If an interval covers 2, 4, or 8 distinct values at the most significant digit, partition the range into 4 equal-width intervals.
- If an interval covers 1, 5, or 10 distinct values at the most significant digit, partition the range into 5 equal-width intervals.
The rule can be applied recursively to each sub-interval, creating a concept hierarchy for the given numeric attribute.
Real-world data often contains outliers, which can distort a top-down discretization method based on the minimum and maximum values. For example, the assets of a few people may be far larger than those of everyone else in the same data set. Discretization based on the largest asset values may lead to a highly biased hierarchy. The top-level discretization can therefore be performed on the range of values that covers the majority of the given data (for example, the middle range of the data after
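One level of the 3-4-5 rule stated above can be sketched as follows. This is a simplified reading of the rule, under two assumptions that are not spelled out in the text: the range is first rounded outward to the unit of its most significant digit, and the number of "distinct values at the most significant digit" is taken to be the rounded range divided by that unit. The function name and the sample ranges are hypothetical.

```python
import math

def partition_345(low, high):
    """Apply one level of the 3-4-5 rule to the range [low, high].

    Returns a list of (start, end) interval boundaries.
    """
    msd = 10 ** math.floor(math.log10(high - low))   # unit of the most significant digit
    lo = math.floor(low / msd) * msd                  # round low down to that unit
    hi = math.ceil(high / msd) * msd                  # round high up to that unit
    distinct = round((hi - lo) / msd)                 # distinct values at the most significant digit
    if distinct == 7:
        widths = [2, 3, 2]                            # the 2-3-2 grouping
    elif distinct in (3, 6, 9):
        widths = [distinct // 3] * 3                  # 3 equal-width intervals
    elif distinct in (2, 4, 8):
        widths = [distinct / 4] * 4                   # 4 equal-width intervals
    elif distinct in (1, 5, 10):
        widths = [distinct / 5] * 5                   # 5 equal-width intervals
    else:
        raise ValueError(f"the rule does not cover {distinct} distinct values")
    edges = [lo]
    for w in widths:
        edges.append(edges[-1] + w * msd)
    return list(zip(edges[:-1], edges[1:]))

# Hypothetical range covering the bulk of some profit data
print(partition_345(-159876, 1838761))
```

To build a full hierarchy, the same function would be applied again to each returned interval; handling outliers (for instance by passing in percentile bounds rather than the raw minimum and maximum) is left to the caller.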
Figure 2.8: Automatic generation of a concept hierarchy based on the number of distinct values of the attributes
Besides the methods above, there are other discretization methods such as binning, clustering, and entropy-based discretization.
2.3 Data mining methods
From the tasks described above, we can see that data mining is not simply a matter of using one single technique. Any method that supports good information discovery will be used. Different methods can be chosen depending on the task, and each method has its own strengths and limitations. Data mining methods can be classified into the following groups:
The dependent variable y is the variable that we want to understand, classify, or generalize. x1, x2, x3, ... are the variables that help us do so.
2.4.1.2 Types of decision trees
Decision trees come in two types:
- Regression tree: estimates functions whose values are real numbers instead of being used for classification tasks (for example, estimating the price of a house or the length of a patient's hospital stay).
- Classification tree: used when y is a categorical variable such as gender (male or female) or the result of a match (win or lose).
Example: we use the following example to explain decision trees. David is the manager of a well-known golf club. He has a problem with members showing up or not showing up. On some days everyone wants to play golf and the club does not have enough staff to serve them. On other days, for no apparent reason, nobody comes to play and the club has too many staff. David's goal is to optimize the number of staff on duty each day by using the weather forecast to predict when people will come to play golf. To do that, he needs to understand why customers decide to play and whether there is any explanation for it.
So over two weeks he collected information about:
- The outlook (sunny, clouded, or raining).
- The temperature in degrees Fahrenheit.
- The humidity.
- Whether it was windy or not.
- And, of course, the number of people who came to play golf that day.
David obtained a data set of 14 rows and 5 columns.
Golf-playing data. Columns: Outlook, Temperature, Humidity, Windy (independent variables); Play.
Figure 3.1: Result of the decision tree
A decision tree is a data model that encodes the distribution of the class label (that is, y) in terms of the attributes used for prediction. It is a directed acyclic graph in the form of a tree. The root node (the node at the top) represents the entire data set. The classification tree algorithm finds that the best way to explain the dependent variable, play, is to use the variable Outlook. Splitting on the values of Outlook gives three groups: people who play golf when it is sunny, people who play when it is clouded, and people who play when it is raining.
First conclusion: if it is clouded, people always play golf. And some enthusiasts play golf even when it is raining.
Figure 3.2: Geometric representation of the n-dimensional data model (with n = 3)
Each domain defines one dimension, so a two-dimensional table in a database can be converted into the n-dimensional data model:
Figure 3.3: Converting a two-dimensional table into the n-dimensional data model
2.4.3.2 Operations on the dimensions of an MDDM
We can operate on one, two, or several dimensions of an MDDM by fixing the remaining dimensions at specific values. This can be viewed as a parameterized projection onto some of the dimensions of the MDDM.
Example: suppose we have n−1 fixed values x2' ∈ D2, x3' ∈ D3, ..., xn' ∈ Dn and a mapping g defined as follows:
g(x1) = f(x1, x2 = x2', ..., xn = xn'), with x1 ∈ D1.
Then g(x1) is a one-dimensional data table (y1, y2, ..., yk), where k = Card(D1).
Similarly, if we fix the n−2 dimensions D3, ..., Dn at specific values x3'' ∈ D3, x4'' ∈ D4, ..., xn'' ∈ Dn, then the mapping on the two dimensions D1 and D2:
g(x1, x2) = f(x1, x2, x3 = x3'', ..., xn = xn''), with x1 ∈ D1, x2 ∈ D2
gives a two-dimensional data table with k rows corresponding to the k values of domain D1 and l columns corresponding to the l values of domain D2. In the same way, we can define the extraction of m dimensions from the original n-dimensional MDDM f.
Because the values in the extracted tables are real numbers, we can apply sum, arithmetic mean, min, max, variance, and standard deviation to the cells of one column (letting x1 vary over D1 while the other domains are fixed at specific values), to the cells of one row (letting x2 vary over D2 while the other domains are fixed at specific values), or to all the cells of the table. The result of each of these operations is again a real number.
We now apply the n-dimensional data model to a simple example: a survey of living conditions. Suppose we have two domains, D1 = {101, 102, 103, 104, 105, 106}, the list of surveyed household codes, and D2 = {TV, refrigerator, motorbike, washing machine, air conditioner}, the household amenities that may be owned. If we define a mapping f on D1 x D2, we obtain a table tracking
The number of items considered is 2. The selection indicator then takes the values: TV = 1; refrigerator = 1; motorbike = 0; washing machine = 0; air conditioner = 0. Over D2 this gives the vector (1, 1, 0, 0, 0). Define the projection g on the domain D1 as follows:
Here c is the number of items considered (c = 2). Then g(x1) is a one-dimensional table: g(x1) = {0, 1, 0, 0, 1, 1}. The number of 1s in the vector g(x1) is the number of households that own both of the amenities TV and refrigerator.
By creating new mappings (MDDMs) and applying arithmetic operations and aggregate functions to them, we can obtain interesting analyses of the tables in a database. The n-dimensional data model is well suited to the analysis of statistical data. Microsoft's online analytical processing (OLAP) tool was developed on the basis of this data model.
2.4.4 Shortest distance
This method treats records as points in a multidimensional data space. Applying this idea, the distance between two records in the data space can be interpreted as follows: records that are related lie close to each other, and records that are far apart have little in common. The sample database contains
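The household example can be sketched in Python as below. Two things here are assumptions rather than content from the text: the ownership table `f` is hypothetical (the survey table is not reproduced here) and was chosen only so that the projection reproduces the vector {0, 1, 0, 0, 1, 1}, and the definition of `g` (1 when a household owns all c selected items, 0 otherwise) is one plausible reading of the projection described above.

```python
households = [101, 102, 103, 104, 105, 106]                          # domain D1
amenities = ["tv", "refrigerator", "motorbike", "washer", "aircon"]  # domain D2

# Hypothetical ownership table f(x1, x2) with values in {0, 1}
f = {
    101: [1, 0, 1, 0, 0],
    102: [1, 1, 1, 0, 0],
    103: [0, 1, 0, 1, 0],
    104: [0, 0, 1, 0, 0],
    105: [1, 1, 0, 1, 1],
    106: [1, 1, 1, 1, 0],
}

selected = [1, 1, 0, 0, 0]   # items of interest: tv and refrigerator
c = sum(selected)            # number of items considered

def g(x1):
    """Assumed projection: 1 if household x1 owns all c selected items, else 0."""
    owned = sum(own * sel for own, sel in zip(f[x1], selected))
    return 1 if owned == c else 0

projection = [g(h) for h in households]
print(projection, sum(projection))

# A column of the table (fix x2, let x1 vary) supports ordinary aggregates:
tv_column = [f[h][amenities.index("tv")] for h in households]
print(sum(tv_column))
```

Summing the projection counts the households that own both items, and summing a single column counts the owners of one amenity, which mirrors the row and column aggregates discussed in Section 2.4.3.2.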
Figure 3.4: Records represented as points in a space defined by their attributes; the distance between them can be measured
Example: age ranges from 1 to 100, while income ranges from 0 to 100,000 dollars per month. If this data is used without proper adjustment, income becomes a far more discriminating attribute than age, which is not what we want. We therefore divide income by 1000 to bring it to a scale comparable to age, and do the same for the credit attribute. Once all attributes are measured on the same scale, we have a reliable distance measure for comparing records. In the example, using the Euclidean measure, the distance between customer 1 and customer 2 is 15.
2.4.5 K-nearest neighbors
When records are interpreted as points in a multidimensional data space, we can define the notion of a neighbor: records that are close to each other are neighbors.
Suppose we want to predict the attitude of a set of customers from a database of records describing them. The basic assumption needed to make a prediction is that customers of the same type will have the same attitude. In the metaphor of the multidimensional data space, a type is simply a region of that space. In other words, records of the same type lie close together in the data space: they are each other's neighbors. Building on this insight, we obtain a powerful yet very simple algorithm, the k-nearest-neighbor algorithm. The basic idea of k-nearest neighbors is "do as your neighbors do". To predict the attitude of a particular individual, we look at the attitudes of the ten people closest to that individual in the data space. We compute the average attitude of these 10 people, and this average is the basis for the prediction for the individual
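The rescaling, the Euclidean distance, and the "average of the k nearest neighbors" prediction described above can be sketched as follows. The customer records and attitude scores are hypothetical (the sample customer table is not reproduced here), records are assumed to be (age, monthly income) pairs, and k is left as a parameter rather than fixed at 10.

```python
import math

def scale(record):
    """Bring attributes onto comparable scales: income is divided by 1000, as in the text."""
    age, income = record
    return (age, income / 1000)

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(query, records, attitudes, k):
    """Average the attitudes of the k records nearest to the query."""
    q = scale(query)
    ranked = sorted(range(len(records)),
                    key=lambda i: euclidean(q, scale(records[i])))
    nearest = ranked[:k]
    return sum(attitudes[i] for i in nearest) / k

# Hypothetical customers: (age, monthly income in dollars) with an attitude score each
customers = [(25, 20000), (30, 24000), (45, 60000), (50, 65000), (28, 22000)]
attitudes = [1, 1, 0, 0, 1]
print(knn_predict((27, 21000), customers, attitudes, k=3))
```

Without the `scale` step the income differences (in the thousands) would swamp the age differences, which is exactly the distortion the rescaling in the text is meant to avoid.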
Table 3.7: Some examples using the k-nearest-neighbor technique
2.4.6 Clustering
Data clustering is a form of unsupervised learning in which the training samples are not labeled. The purpose of clustering is to find representative patterns, or to group similar data (according to some evaluation criterion) into clusters. Data points in different clusters are less similar to each other than data points in the same cluster.
Cluster analysis has a wide range of applications, including market research, pattern recognition, data analysis, and image processing. In business, cluster analysis can help marketers discover the differences between customer groups based on customer information, and characterize customer groups based on purchasing patterns. In biology, it can be used to classify plants and animals, and to group genes with similar functions. Cluster analysis can also classify land by function or actual use to support appropriate planning policies, and classify documents on the Web.
The basic requirements of cluster analysis in data mining are:
- Ability to work efficiently with large amounts of data: clustering on a sample of a large data set may lead to biased results. Clustering algorithms that scale to large databases are needed.
- Ability to handle different types of data: many algorithms are designed to handle numeric data. However, applications may require clustering
Figure 3.6: Plot based on two measures
In the example in the figure above, we create a plot based on two measures: income and age. We can see that middle-aged people with low incomes tend to read music magazines. A much better way to explore a data set is through an interactive three-dimensional environment, as Figure 3.7 illustrates.
Figure 3.7: Interactive three-dimensional plot
2.4.8 Neural networks
2.4.8.1 Overview
An artificial neural network (ANN) is an information processing model based on the way biological nervous systems, such as the brain, operate. The key element of this model is the particular structure of the information processing system. It consists of a large number of highly interconnected processing elements (called neurons) working in unison to solve specific problems. An ANN is configured for a specific application, such as pattern recognition or data classification, through a learning process. Learning in such a system consists of adjusting the synaptic connections that exist between the neurons.
The first artificial neuron was created in 1943 by the neurophysiologist Warren McCulloch and the logician Walter Pitts. However, the technology of the time did not allow the neuron to realize its strengths. Today's neural networks incorporate many improvements and meet the requirements of the problems posed. Some advantages of today's neural networks over earlier ones are:
Today neural networks can solve many problems that are complex for humans, and they are applied in many fields such as recognition, formatting, classification, and signal and image processing. Using the neural network technique, a database can be analysed as shown in Figure 3.8.
Figure 3.8: Simulated neural network architecture

The figure above depicts a simple neural network architecture used to represent the analysis of a marketing database. The attributes are divided into three layers
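[Flowchart of the genetic algorithm: encode the variables, evaluate fitness, selection, crossover, mutation; on "No" (stopping condition not met) repeat, otherwise end]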
General structure of a genetic algorithm

Begin
  t = 0;
  Initialize P(t);
  Evaluate the fitness of the individuals in P(t);
  While (the stopping condition is not satisfied) loop
    t = t + 1;
    Select P(t);
    Crossover P(t);
    Mutate P(t);
  End loop
End
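The loop above can be sketched in Python roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the bit-string encoding, binary tournament selection, single-point crossover, the fixed generation budget used as the stopping condition, and every parameter name are assumptions made for the example.

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=20, generations=50,
                      crossover_rate=0.9, mutation_rate=0.02, seed=0):
    """Maximise `fitness` over bit strings (hypothetical helper)."""
    rng = random.Random(seed)
    # t = 0: initialise P(t) with random individuals
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def pick(pop):
        # Selection (assumed): binary tournament on fitness
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    # Stopping condition (assumed): a fixed number of generations
    for _ in range(generations):
        next_population = []
        while len(next_population) < pop_size:
            p1, p2 = pick(population), pick(population)
            # Crossover: single cut point
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
        # Keep the best individual seen so far
        best = max(population + [best], key=fitness)
    return best
```

For instance, `genetic_algorithm(sum, n_bits=8)` searches for the 8-bit string with the most ones; with a fixed `seed` the run is reproducible.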
Chapter 3
APPLYING DATA MINING TECHNIQUES IN IDS SYSTEMS
3.1 IDS systems

3.1.1 Introduction

The continuous development of computer technology today is undeniable. Many companies have set up Internet trading portals, customers are increasingly interested in buying and selling online, and the volume of collected information is growing rapidly. The Internet has become a common tool for communication. Along with this technological progress, we face the need to strengthen security. The accessibility, openness and complexity of the Internet make information system security more urgent than ever. Connecting an information network to networks such as the Internet and the public telephone system further increases the potential risk to the system.

This thesis aims to explore and examine the problems that can arise with an intrusion detection system (IDS), a very important part of computer security in particular and network security in general. An IDS cannot by itself prevent security breaches, but it detects threats by monitoring unwanted activity. The goal of this thesis, combining practical experience with references found online, is to build an IDS that can learn attack behaviour and identify new attacks without having to update the system. This is a promising direction. Such a flexible system would not need a manually updated database of attack signatures; it could also identify new attacks based on learned patterns, without depending on third-party rules.

3.1.2 Intrusion detection systems (IDS)

3.1.2.1 What is an IDS?

Placing alarms on the doors and windows of your house is like installing an intrusion detection system (IDS) in your home. An intrusion detection system protects your computer network in a similarly simple way. An IDS is a suitable combination of software and hardware that recognizes threats capable of attacking your network. It detects intrusive activity that enters your network. You can identify the activities
3.1.2.2 Role and functions of an IDS

* Detecting attacks and unauthorized access
- This is the main role of an IDS: identifying attacks on, and unauthorized access to, the internal network.
- An IDS helps detect security threats to the network that other systems (such as firewalls) cannot. Combined with an intrusion prevention system (IPS), it helps the system stop and limit attacks and intrusions from outside.

* Increasing awareness of what is happening on the network
An IDS provides intrusion monitoring and security reporting capabilities that give an overall picture of what is running on the network, from both the application and the network perspective. Together with the ability to link to security analysis and investigation, it produces information about the system that helps administrators grasp and understand what is taking place on the network.

* Alerting and supporting attack prevention
- An IDS can operate as a passive monitoring device (sniffer mode) that supports active monitoring devices, or as an active prevention device (able to drop suspicious traffic).
- An IDS helps security systems make decisions about traffic based on IP address or port as well as on attack characteristics.
- Examples: attack patterns, protocol anomalies, or traffic coming from unauthorized hosts.
3.1.2.4 Internal structure and operation of an IDS

An intrusion detection system consists of three main modules:
- Information and data collection module
- Analysis and attack detection module
- Response module
* Information and data collection module:
- This module collects packets on the network for analysis.
- The practical question is where in the network the IDS should be deployed. Normally the IDS is placed at the points that need to be monitored.
- There are two main data collection models:
+ Out-of-band model.
+ In-line model.

Out-of-band data collection model
- The out-of-band model does not intervene directly in the data flow. Traffic entering and leaving the network is copied, and the copy is forwarded to the data collection module.
- With this approach the IDS does not affect the throughput of the network.
In-line data collection model
This is the most important module; its task is to detect attacks. The module is divided into three stages: preprocessing, analysis and alerting.

Preprocessing: gathering the data and reformatting packets.
- Data are arranged by category and class.
- The format of the input data is determined (the data are split up by category).
- In addition, packets can be reassembled (defragmented) and arranged in sequence.

Analysis:
- Misuse detection (misuse detection models): pattern-based; its advantage is accuracy.
+ Analyse system activity and look for events that match previously known attack patterns.
NIDS (Network Intrusion Detection System): placed at important points of the network to detect intrusions for the area
* Comparison of HIDS and NIDS:

NIDS | HIDS
Applied over a wide scope (monitors all network activity) | Applied within the scope of a single host
Figure 2: Sample TCPDUMP data

3.2.1.2 Processing raw audit data and building attributes

Audit data are collected from network sensors or some other source in raw, binary form. Before using them we need to process them and derive rules from them. The first and most basic task is to build a database from the audit data and to gain an initial understanding of the rules. With these two things and an association rule algorithm, we obtain new rule sets and specific attack descriptions. We can then apply these rules to incoming events to detect new, previously unknown attacks.

Before applying any data mining rule, the raw binary audit data obtained from the sensors must be preprocessed. This is done with TCPDUMP or BSM. To preprocess the raw audit data, the group at Columbia University used BAM (Basic Auditing Model), which they created themselves, instead of BSM (Basic Security Module). Preprocessing means that the input is raw audit data and the output is the same audit data in organized form, with attributes such as source IP, destination IP, source port, destination port, protocol (TCP, UDP, ...), timestamp and duration.

The next step is to apply data mining algorithms to the preprocessed data. These include general classification rules, classification rules, association rules and the frequent episodes algorithm. Some studies focus on a single kind of rule, some on a combination of two or three, and some on improved variants such as rules that generate
Figure 4: Example RIPPER Rules from Telnet Records ([LS00] page: 233)

Here we see that RIPPER indeed selects distinctive attribute values to identify intrusions. These rules can first be reviewed and edited by security experts and then incorporated into a misuse detection system. The accuracy of a classification model depends directly on the set of attributes supplied in the training data. For example, if the attributes hot, compromised and root shell are moved from the records in the table above, RIPPER will be able to produce the
The algorithm proceeds as follows. For each point xi in P, we compute d(x, xi). If d(x, xi) < dmin, we are guaranteed that xi is closer to x than every point in the clusters of C. In that case, we remove xi from P and add it to K. If we cannot guarantee this for any element of P (including the case where P is empty), we "open" the nearest cluster by adding all of its points to P and removing the cluster from C. Note that when we remove clusters from C, dmin increases. Once K has k elements, we stop.

Most of the computational cost lies in checking the distances from the points in D to the cluster centers. This is considerably more efficient than computing the distances between all pairs of points. The choice of the width w does not affect the k-NN result; it affects only the efficiency of the computation. Intuitively, we want to choose a w that splits the data into clusters of reasonable size.

4. Clustering

Although it has already been mentioned, clustering is central to unsupervised anomaly detection. This is because clustering looks for the
Figure 8: Example of frequent sequences with interleaved actions or noise

The main purpose of this report is to identify sequences of alerts caused by normal actions. Frequent episodes are alert sequences that occur often. They are important for the following two reasons:

A common sequence of alerts is unlikely to be an intrusion, because attackers cannot repeat the same thing over and over without being detected.

Normal actions are performed more frequently, so a frequent episode is the result of normal behaviour.

Analysing frequent sequences and removing them from the list of possible attacks greatly reduces the stream of false alarms, because normal actions always outnumber dangerous ones.

In their experiment, they analysed more than one million intrusion alerts collected from 7 sensors in one network. The experiment lasted two weeks. They loaded the log files into a relational database. The basic schema of the log is Log(Event, FromIP, ToIP, time). After they
Figure 9: Developing custom filters with data mining ([CG00] page: 2)

In this model they use an existing commercial network intrusion detection system (they do not name a specific system, being concerned only with how a misuse detection system works in principle), which collects connection records from the sensors, performs basic processing and, as its output, generates alerts. Before this work, data mining had not been put into operation in intrusion detection systems, and all systems were signature-based misuse detection systems. Such a system tries to match traffic against normal signatures; if no match is found, the traffic is flagged as an alert. This results in a very large number of false alarms.

They then pass these alerts through custom filters. In the filter, frequent episode rules are used to find the bulk of the normal sequences. Alerts raised for frequent patterns are discarded, and only the remaining intrusion alerts are passed to the data mining tool (based on the Query Flocks algorithm, an extension of association rule mining), where any data mining technique can be used to search for intrusion patterns. In other words, the whole system can then also perform anomaly detection.

In this study they essentially do not develop a new model. Their main focus is filtering false alarms using an algorithm executed
Figure 11: The training phase of ADAM ([BCJ+01] page: 5)

In this first step, a database of frequent itemsets from normal, attack-free traffic that meet a minimum support is created. This database serves as a profile against which frequent itemsets found later are compared. The profile database stores the frequent itemsets in a specific format for the attack-free portions of the data. The algorithm used in this step can be any conventional association rule mining algorithm, although they use a customized algorithm for better speed. In this first step, therefore, they create a profile of normal behaviour. The profile mainly contains data about normal network connections, that is, sets or combinations of values of source IP, destination IP, source port, destination port, connection duration, timestamp and normal flag values.

In the second step they again use the training data, the profile of normal behaviour and an on-line association rule algorithm whose output consists of frequent itemsets that may be attacks. The suspicious itemsets, together with a set of attributes extracted from the data by a feature selection module, are used as training data for a decision-tree-based classifier.

Now consider how the dynamic on-line association rule algorithm works. It is driven by a sliding window of adjustable size. It outputs the itemsets that receive strong support within the given window. It compares every itemset with the profile database; if there is a match, the data are normal. Otherwise it keeps a counter that tracks the support of the itemset. If the support exceeds a
Figure 12: Discovering intrusion with ADAM (Phase 2) ([BCJ+01] page: 6)

In this phase the classifier has been trained and can classify any attack as known, unknown or a false alarm. This phase uses the same dynamic on-line algorithm to produce suspicious data; together with the feature selection module and the profile, these suspicious items are sent to the trained classifier. The classifier then outputs the type of attack that the data match. If the item is a false alarm, the classifier removes it from the list of attacks and does not forward it to the system administrator.

In conclusion, this work showed an effective way of using data mining techniques at that time. Its main weakness is that it uses only association rules, and that the classifier's results produce many rules, a number of which are redundant. They have no technique for dealing with redundant and irrelevant rules. For example, suppose a rule is (A, B) → C, meaning that if A and B occur then C occurs. Suppose that C also occurs whenever B occurs. The algorithm will still compute B → C as a separate rule, so it generates unnecessary extended rules. Later, many studies followed this approach and introduced a range of measures (such as interestingness) to improve the model.

2. A framework for constructing features and models for intrusion detection systems (MADAM ID):

MADAM ID is a prominent IDS in this field. Their goal is to develop a more systematic and automated method for building an IDS. They developed a set of tools that can be applied
Figure 13: MADAMID workflow ([LS00] page: 231)

In the first step, raw audit data are collected in binary form. They are then processed into ASCII network packet information. For example, the data start as bytes of 0s and 1s; once converted to ASCII they are easy to read. Suppose the first 16 bits give the source port: converting these 16 bits to hexadecimal or decimal lets us read the source port. After decoding all of the packet header information, we summarize the packets into connection records containing a number of basic features such as service and connection duration.

Various data mining programs, such as association rules and frequent episode rules, are then applied to these connection records. Their output is a set of initial features, which are later used as rules in the model. For example, suppose the connection records show many packets from the same source IP trying to reach many destination IPs on the same port. In the event records or packets, all of this information is scattered; it is grouped into connection/session records on the basis of a fixed time window (5 minutes in their experiment). After applying the data mining rules (association, frequent episode, classification) to these records, we learn a feature stating that if the above condition occurs, it may be an anomaly or an attack, and from this stage we obtain a rule describing the situation. Finally this rule is applied to
Figure 14: Sample Training Dataset ([MC03] page: 602)

Here there are four attributes: Port, Word1, Word2 and Word3.

Step 1: in this step we generate all possible rules and relations from the samples. The algorithm is as follows:
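As a rough companion to the algorithm figure, the candidate-rule generation of Step 1 can be sketched in Python as below. This is an illustrative reading of the walkthrough in this section, not the code from [MC03]: the function name, the tuple representation of a rule and the optional `order` argument (which replaces the random choice of attributes so that the example is reproducible) are all assumptions.

```python
import random

def generate_candidate_rules(s1, s2, attributes, max_rules=4,
                             order=None, rng=None):
    """Build candidate rules from two samples (hypothetical helper).

    Each rule is a tuple (antecedent, consequent, value), read as
    "if all (attribute, value) pairs in antecedent hold,
    then consequent = value".
    """
    # Attributes on which the two samples agree
    matching = [a for a in attributes if s1[a] == s2[a]]
    if order is None:
        # Random choice of attributes, as in the description
        rng = rng or random.Random()
        order = matching[:]
        rng.shuffle(order)
    else:
        # Caller-supplied order (assumption, for reproducibility)
        order = [a for a in order if a in matching]

    rules, antecedent, consequent = [], [], None
    for m, a in enumerate(order[:max_rules], start=1):
        if m == 1:
            consequent = a                   # first pick: consequent
        else:
            antecedent.append((a, s1[a]))    # later picks: conditions
        rules.append((tuple(antecedent), consequent, s1[consequent]))
    return rules
```

Called with the two HTTP samples of this section and `order=["Word1", "Port", "Word3"]`, it yields the three rules r1, r2 and r3 in that order.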
Figure 15: LERAD Algorithm ([MC03] page: 602)

In the first line of the algorithm, we randomly choose two samples S1 and S2 from the sample set S. Suppose S1 = {80, GET, /, HTTP/1.0} and S2 = {80, GET, /index.html, HTTP/1.0}. In the second line we collect the attributes on which S1 and S2 agree. The matching attributes are (Port, Word1, Word3). In the third line we start a loop that runs from 1 to M; assume M = 4 in this case.

We now enter the loop. We randomly pick Word1 as a from the attribute list A and remove it from A, so a = Word1 and A = {Port, Word3}. On the first pass m = 1, so we enter the if branch and create rule r1: Word1 = GET. We add this rule to the rule set.

On the second pass m = 2 and the attribute set A is not empty, so we go through the loop again. This time we randomly remove another attribute, Port, as a. So now a = Port and A = {Word3}. Since m is not 1 we go to the else branch. S1[Port] = 80, and we add it as the antecedent of the second rule, r2: if Port = 80 then Word1 = GET. We add this rule to the rule set.

On the third pass m < 4 and A is not empty. We pick the only remaining attribute, Word3, and remove it from A. Now a = Word3 and A = { }. We then go to the else branch. S1[Word3] = HTTP/1.0, which we add to the antecedent to form r3: if Port = 80 and Word3 = HTTP/1.0 then Word1 = GET. We add this rule to the rule set.

On the fourth pass m = 4 and A is empty, so we leave the loop. This shows how the algorithm generates rules; the whole process continues until all rules have been generated. Our final rule set is:

R = {
r1: Word1 = GET,
r2: if Port = 80 then Word1 = GET,
r3: if Port = 80 and Word3 = HTTP/1.0 then Word1 = GET
}

Step 2: in this step we sort the rules in descending order and remove redundant rules. To sort the rules, we use the ratio n/r,
Figure 16: LERAD algorithm (Part 2) ([MC03] page: 602)

For example, in our case, after training on S and sorting by n/r, the rules are:

r2: if Port = 80 then Word1 = GET (n/r = 2/1)
r3: if Port = 80 and Word3 = HTTP/1.0 then Word1 = GET (n/r = 2/1)
r1: Word1 = GET or HELO (n/r = 3/2)

To make this clear, consider how n and r are obtained for r1. The number of samples with Word1 = GET or HELO is 3, and the number of allowed values is 2. As another example, for r2 the number of samples that satisfy the antecedent is 2 (the first and second rows of the table) and the only allowed value is GET, so r = 1. The n/r values of r2 and r3 are the same.

Removing redundant rules: r2 marks the two GET values in S. r3 would mark the same two values and no new ones, so we discard it. r1 marks HELO in the third sample in addition to the values already marked, so we keep this rule.

We can therefore see that many current algorithms are used to improve the ADAM algorithm introduced in recent research. ADAM is widely regarded as the founding idea, and researchers are concentrating on how to improve it.

2. Entropy-based intrusion detection:

In this study the authors briefly present two data mining models, one based on ADAM and one based on entropy. They then compare the two systems and show the advantages of the entropy-based system over the ADAM system. A typical ADAM-based intrusion detection method works as follows:

1) First build a profile of the normal or harmless behaviour of the computer system and network.
2) Then treat anomalies that deviate from this normal behaviour as potential intrusions.

The ADAM algorithm is used in step 1 to mine association rules from the database. It finds all association rules whose support exceeds the minimum support specified by the user. The ADAM algorithm thus finds typical, common event sequences to serve as the system profile. Although it is the most thoroughly studied and most widely used algorithm, it requires a very careful choice of the minimum support parameter. For example, suppose we have a database in which there is a rule A with 100 items
Figure 17: Workflow of Entropy based Intrusion Detection ([Yo03] page: 842)
Figure 18: MINDS System ([ELK+04] page: 4)

The input to MINDS is NetFlow version 5 data collected with flow tools (for details see www.splintered.net/sw/flow-tools) instead of tcpdump data. Flow tools capture only packet header information, not packet contents. Like tcpdump data, the header information contains source IP, destination IP, source port, destination port, timestamp, flags and connection duration. They use a time window of 10 minutes.

All data on the Internet travel as packets, and every packet has header information and data. The system captures only the header information of all packets that passed through in the last 10 minutes. The data are stored and, before they are passed to the main system, a data filtering step removes network traffic that is not of interest for the analysis. For example, the filtered data may contain traffic from trusted sources. At the University of Windsor, for instance, when an access sends a request to a port in the range 40000
Figure 20: Connection-window based features ([ELK+04] page: 5)

Phase 2: detecting known attacks

Once we have obtained all the features of the connections, the next step is to compare these features with known anomalies. If we find a match, we send it directly to the
Figure 21: MINDS Association Analysis Module ([ELK+04] page: 14)

This module summarizes the network connections that the anomaly detection module has ranked as highly anomalous. The purpose of mining association patterns is to discover patterns that occur frequently in the anomalous class or in the normal class. In this step they apply association rules to build rule sets for the anomalous class and for the normal class. For example, the activity of a particular service may be summarized by the following frequent set:

sourceIP=X, destinationPort=Y

If many connections in the frequent set were ranked highly in the previous step, the frequent set may be a suitable candidate signature to add to a signature-based system. Conversely, if the following frequent set has a lower score and occurs many times, we can say that it is normal and represents Web browsing behaviour:

Protocol=TCP, destinationPort=80, NumPackets=36

Their system also looks for discriminating patterns. To find them they use several measures: ratio, precision, recall and F1 (the harmonic mean of precision and recall). Using these measures, the module ranks the patterns, groups similar patterns together and presents them to the analyst. The measures used to order the patterns are summarized as follows:
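A minimal Python sketch of these ordering measures is given below, for a pattern that occurs c1 times among n1 anomalous connections and c2 times among n2 normal connections. The ratio follows the (c1/n1)/(c2/n2) form used in this section; the precision and recall formulas are the conventional definitions and are an assumption here, as are the function name and the handling of zero denominators.

```python
def pattern_measures(c1, c2, n1, n2):
    """Ordering measures for one pattern (hypothetical helper).

    c1, c2: occurrences in the anomalous / normal class
    n1, n2: number of anomalous / normal connections
    """
    # Assumed conventional definitions of precision and recall
    precision = c1 / (c1 + c2) if (c1 + c2) else 0.0
    recall = c1 / n1 if n1 else 0.0
    if c2 == 0:
        # Pattern never seen in the normal class
        ratio = float("inf") if c1 > 0 else 0.0
    else:
        ratio = (c1 / n1) / (c2 / n2)
    if precision + recall:
        # F1: harmonic mean of precision and recall
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"ratio": ratio, "precision": precision,
            "recall": recall, "f1": f1}
```

A pattern seen once in the anomalous class and never in the normal class, for example `pattern_measures(1, 0, 100, 1000)`, gets the maximum ratio and precision but a very small recall and F1, which is why ratio or precision alone is not a sufficient ranking criterion.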
Figure 22: Measures for ordering patterns ([ELK+04] page: 13)

Consider a set of features that occurs c1 times in the anomalous class and c2 times in the normal class, and let n1 and n2 be the numbers of anomalous and normal connections in the data set. Suppose we are interested only in finding profiles of the anomalous class. The ratio of c1/n1 to c2/n2 then indicates how well the pattern distinguishes anomalous connections from normal ones.

Ratio or precision alone is not enough, because high values often characterize only a very small number of anomalous connections. In the extreme case, a rare pattern that occurs only once in the anomalous class and never in the normal class has the maximum value of ratio and precision, yet may still be unimportant. To account for the importance of a pattern, recall can be used as an alternative measure. Unfortunately, a pattern with high recall is not necessarily discriminating. The F1 measure is the harmonic mean of precision and recall and provides a good balance between the two.

In the final step, the summary of all rules is presented to the analyst, who can update or build the normal profile or label the signatures of a new attack. This is how MINDS works as an unsupervised intrusion detection system.

OVERVIEW OF NIDS RESEARCH:

System | Misuse/Anomaly | Based on existing alerts | Data mining technique used | Training data used | Strengths | Weaknesses
ADAM | Anomaly | No | Association rules, classification rules | TCPDump | Pioneering study | Generates redundant rules; requires careful choice of minsupport
MINDS | Anomaly | No
LERAD | Anomaly | No
Anomaly | No | Scores made it easy to determine anomalies | Does not examine 60% of the data, which may contain important rules
Data prepared from the sensors | Can be added to any system | Requires custom filters for different environments
TCPDump | No redundancy | minsupport must be chosen carefully
Audit data | An alternative to ADAM | Like ADAM, requires careful selection of minsupport
Chapter 4
BUILDING A DoS ATTACK DETECTION PROGRAM USING DATA MINING TECHNIQUES
4.1 Clustering algorithms

4.1.1 Introduction

Although virus propagation has become widespread, 90% of enterprises state that denial-of-service attacks are the most troublesome and most frequently encountered problem in their companies. These attacks date from the late 1990s. They originated when security experts, while looking for system flaws in the Windows 98 operating system, discovered that sending a single oversized ping packet was enough to paralyse a target server. Hackers immediately used this discovery to knock out the targets they intended to attack. This was the birth of the primitive form of DoS (Denial of Service). DDoS (Distributed Denial of Service), in contrast, relies on sending a ping to a list of many servers (called amplifiers, because they amplify the attack on the target) in which the source IP address of the ping packet is spoofed to be the IP of the victim. When the servers answer the ping request, they flood the victim with replies, called pongs.

For this reason, this part of the thesis studies and demonstrates data mining for detecting denial-of-service attacks, and the technique used is clustering. Clustering is an unsupervised anomaly detection technique. Unsupervised anomaly detection algorithms can run on unlabeled data, which are easy to obtain because they are simply the raw audit data collected from a system. In practice, unsupervised anomaly detection has several advantages over supervised anomaly detection. The main one is that it does not require a purely normal data set for training. Moreover, the set of normal user behaviour is extremely large, and when collecting clean data for training a supervised detector there is no guarantee that the data contain no intrusions, whereas the set of intrusive behaviour is much smaller. The technique also has many other advantages, as presented in the sections above.

4.1.2 Types of data in cluster analysis

Suppose a data set for cluster analysis contains n objects (the objects may be people, houses, documents, ...). Clustering algorithms usually operate on one of the following two data structures:
(6.1)

(2) Dissimilarity matrix: to represent the distances between pairs of points (objects) in a data space of n objects described by p attributes, we use the dissimilarity matrix

$$
\begin{bmatrix}
0 \\
d(2,1) & 0 \\
d(3,1) & d(3,2) & 0 \\
\vdots & \vdots & \vdots & \ddots \\
d(n,1) & d(n,2) & \cdots & \cdots & 0
\end{bmatrix}
\tag{6.2}
$$

where d(i,j) is the distance between object i and object j. In general, d(i,j) is a non-negative number that approaches 0 as objects i and j become more similar. Because d(i,j) = d(j,i) and d(i,i) = 0, we obtain matrix (6.2).

4.2.2.1 Interval-scaled variables

Interval-scaled variables are continuous measurements on a simple linear scale, such as weight, height, temperature and age. The measurement unit strongly affects the clustering result. For example, changing the unit from inches to meters for height, or from pounds to kilograms for weight, can lead to different cluster structures. In general, expressing a variable in smaller units gives it a larger range and therefore a larger influence on the clustering result. To avoid dependence on the choice of units, the data must be standardized. Standardization methods try to give all variables equal weight. This is particularly useful when we have no prior knowledge of the data. In some applications, however, the user may want certain variables to have more influence than others. For example, when clustering candidates for basketball, height takes priority.

To standardize measurements, one option is to convert the original measurements to unitless variables. For a variable f with measurements x1f, x2f, ..., xnf, standardization can be carried out as follows:

(1) Compute the mean absolute deviation sf:
$$
s_f = \frac{1}{n}\left(|x_{1f} - m_f| + |x_{2f} - m_f| + \dots + |x_{nf} - m_f|\right)
$$

$$
m_f = \frac{1}{n}\left(x_{1f} + x_{2f} + \dots + x_{nf}\right)
\tag{6.4}
$$
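The two quantities above can be computed as in the following Python sketch. The function names are hypothetical, and the final z-score step, (x - m_f) / s_f, is the usual way these quantities are used for standardization; that step is an assumption here, since it is not visible in this part of the text.

```python
def mean_absolute_deviation(values):
    """Return (m_f, s_f) for the measurements of one variable f."""
    n = len(values)
    m = sum(values) / n                       # mean m_f
    s = sum(abs(x - m) for x in values) / n   # mean absolute deviation s_f
    return m, s

def standardize(values):
    """Assumed z-score step: (x - m_f) / s_f for every measurement."""
    m, s = mean_absolute_deviation(values)
    if s == 0:
        # All measurements equal: nothing to scale
        return [0.0 for _ in values]
    return [(x - m) / s for x in values]
```

For heights of 150, 160, 170 and 180 (in any unit) this gives m_f = 165 and s_f = 10, and the standardized values no longer depend on the unit chosen.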
From formula (6.4) we see that the larger the mean absolute deviation, the weaker the effect of outliers. The measure chosen therefore affects the result of outlier analysis.

Commonly used measures for interval-scaled variables:

(1) Euclidean distance:

$$
d(i,j) = \sqrt{|x_{i1} - x_{j1}|^2 + |x_{i2} - x_{j2}|^2 + \dots + |x_{in} - x_{jn}|^2}
\tag{6.5}
$$

where i = (x_{i1}, x_{i2}, ..., x_{in}) and j = (x_{j1}, x_{j2}, ..., x_{jn}) are two n-dimensional data objects.

(2) Manhattan distance:

$$
d(i,j) = |x_{i1} - x_{j1}| + |x_{i2} - x_{j2}| + \dots + |x_{in} - x_{jn}|
\tag{6.6}
$$

Both the Euclidean and the Manhattan distance satisfy the mathematical requirements of a distance function:
a. d(i,j) >= 0: a distance is a non-negative number.
b. d(i,i) = 0: the distance from an object to itself is zero.
c. d(i,j) = d(j,i): distance is a symmetric function.
d. d(i,j) <= d(i,h) + d(h,j): the triangle inequality.

(3) Minkowski distance:

$$
d(i,j) = \left(|x_{i1} - x_{j1}|^p + |x_{i2} - x_{j2}|^p + \dots + |x_{in} - x_{jn}|^p\right)^{1/p}
\tag{6.7}
$$

(4) Weighted distance:

$$
d(i,j) = \left(w_1|x_{i1} - x_{j1}|^p + w_2|x_{i2} - x_{j2}|^p + \dots + w_n|x_{in} - x_{jn}|^p\right)^{1/p}
\tag{6.8}
$$
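The distance measures (6.5) to (6.8) differ only in the exponent p and the weights, so a single function can illustrate all of them. The sketch below is a minimal illustration with a hypothetical function name; it assumes the weights are non-negative and that the two objects have the same number of dimensions.

```python
def minkowski(x, y, p=2, weights=None):
    """Weighted Minkowski distance between two n-dimensional objects.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance;
    weights=None means every attribute has weight 1.
    """
    if weights is None:
        weights = [1.0] * len(x)
    total = sum(w * abs(a - b) ** p
                for w, a, b in zip(weights, x, y))
    return total ** (1.0 / p)
```

For example, `minkowski((0, 0), (3, 4), p=2)` is the Euclidean distance 5 and `minkowski((0, 0), (3, 4), p=1)` is the Manhattan distance 7; giving the first attribute a larger weight increases its influence on the result.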
The weighted distance is an improvement on the Minkowski distance that takes into account the influence of each attribute on the distance between two objects. The larger the weight w of an attribute, the more it affects the distance d. The choice of weights depends on the application and the specific goal.

A binary variable is asymmetric if one of its states is more important than the other (it is usually coded as 1). In that case there is a tendency to favour the preferred state. In medical diagnosis, for example, one conclusion is usually preferred over the other, so unclear states (such as unclear symptoms) may also be recorded as 1 in order to prioritize further diagnosis or isolation and monitoring. An example of an asymmetric binary variable is HIV, with two states: positive (1) and negative (0).

                 Object j
                 1      0      sum
Object i    1    q      r      q+r
            0    s      t      s+t
            sum  q+s    r+t    p
(6.9)
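The dissimilarity of two objects based on asymmetric binary variables (asymmetric binary dissimilarity), where q is the number of variables equal to 1 for both objects, r the number equal to 1 for object i but 0 for object j, and s the number equal to 0 for object i but 1 for object j:

$$
d(i,j) = \frac{r + s}{q + r + s}
\tag{6.10}
$$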
We can also measure the distance between two binary variables in terms of similarity rather than dissimilarity:

sim(i,j) = 1 - d(i,j) (6.11)

The coefficient sim(i,j) is called the Jaccard coefficient.

Example 6.1: Dissimilarity between binary variables. Suppose that a table of patient records (Table 3.9) contains the attributes name, gender, fever, cough, test-1, test-2, test-3 and test-4, where name is an identifier, gender is a symmetric attribute, and the remaining attributes are asymmetric binary attributes. For the asymmetric attributes, let the values Y (yes) and P (positive) be 1, and the value N (no or negative) be 0. Suppose that the distance between objects (patients) is computed on the asymmetric attributes only.

Name | Gender | Fever | Cough | Test-1 | Test-2 | Test-3 | Test-4
Jack | M | Y | N | P | N | N | N
Mary | F | Y | N | P | N | P | N
Jim | M | Y | Y | N | N | N | N
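The example can be checked with a few lines of Python. The sketch below encodes the six asymmetric attributes of each patient as 0/1 values (Y and P as 1, N as 0) and applies the asymmetric binary dissimilarity (r + s) / (q + r + s); the function name and the dictionary layout are illustrative assumptions.

```python
def asymmetric_binary_dissimilarity(a, b):
    """(r + s) / (q + r + s) for two 0/1 vectors of equal length."""
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both 1
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # only a is 1
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # only b is 1
    if q + r + s == 0:
        return 0.0   # no positive value in either object
    return (r + s) / (q + r + s)

# fever, cough, test-1, test-2, test-3, test-4
patients = {
    "Jack": [1, 0, 1, 0, 0, 0],
    "Mary": [1, 0, 1, 0, 1, 0],
    "Jim":  [1, 1, 0, 0, 0, 0],
}

d_jack_mary = asymmetric_binary_dissimilarity(patients["Jack"], patients["Mary"])
d_jack_jim = asymmetric_binary_dissimilarity(patients["Jack"], patients["Jim"])
d_jim_mary = asymmetric_binary_dissimilarity(patients["Jim"], patients["Mary"])
```

With these encodings, d(Jack, Mary) = 1/3, d(Jack, Jim) = 2/3 and d(Jim, Mary) = 3/4, so the pair Jim and Mary is the most dissimilar.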
The measurements indicate that Mary and Jim are unlikely to have the same illness, because d(Mary, Jim) is the largest.

4.2.2.3 Categorical (nominal) variables, ordinal variables and ratio-scaled variables

1. Categorical variables: A categorical variable is a variable that can take more than two states. For example, a colour variable may have the states yellow, green and blue. Let the number of states of a nominal variable be M. The states can be denoted by letters, symbols, or a set of integers such as 1, 2, 3, ..., M. Note that these integers are used only to present the data and do not represent any particular numerical value. The distance between two objects i and j on categorical variables can be computed with the simple matching coefficient:

$$
d(i,j) = \frac{p - m}{p}
\tag{6.12}
$$
where m is the number of categorical attributes on which objects i and j have matching values, and p is the total number of categorical attributes.

Example 6.2: Dissimilarity between categorical attributes. Suppose we have the sample data in Table 3.10, and that only the object identifier and the categorical variable test-1 are considered (test-2 and test-3 are used in later examples).

Object identifier | Test-1 (categorical) | Test-2 (ordinal) | Test-3 (ratio-scaled)
1 | Code-A | Excellent | 445
2 | Code-B | Fair | 22
3 | Code-C | Good | 164
4 | Code-A | Excellent | 1,210

Table 3.10: Sample data table containing variables of mixed types

The dissimilarity matrix for the objects in the table above is:
$$
\begin{bmatrix}
0 \\
d(2,1) & 0 \\
d(3,1) & d(3,2) & 0 \\
d(4,1) & d(4,2) & d(4,3) & 0
\end{bmatrix}
$$
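As a small illustration, the entries of this matrix can be filled in for the test-1 column with the simple matching coefficient d(i,j) = (p - m) / p. The Python sketch below uses hypothetical function names and represents each object as a tuple of its categorical values (here p = 1, since only test-1 is used).

```python
def categorical_dissimilarity(a, b):
    """Simple matching: (p - m) / p for two tuples of categorical values."""
    p = len(a)                                     # number of variables
    m = sum(1 for x, y in zip(a, b) if x == y)     # number of matches
    return (p - m) / p

def lower_triangular(objects, dist):
    """Lower-triangular dissimilarity matrix, row i holds d(i,1)..d(i,i)."""
    return [[dist(objects[i], objects[j]) for j in range(i + 1)]
            for i in range(len(objects))]

test1 = [("Code-A",), ("Code-B",), ("Code-C",), ("Code-A",)]
matrix = lower_triangular(test1, categorical_dissimilarity)
```

Only objects 1 and 4 share the same code, so d(4,1) = 0 and every other off-diagonal entry is 1.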
2. Ordinal variables: An ordinal variable is a variable whose set of values has an ordering defined on it, for example the ranks gold, silver and bronze medal. Ordinal variables may be discrete or continuous. The values of an ordinal variable f have Mf states, and the ordered states define the ranking 1, ..., Mf. Suppose f is one of a set of ordinal variables describing n objects. The measure for the ordinal variable f is built as follows:

(1) The value of f for the i-th object is xif, and f has Mf ordered states, representing the ranks 1, ..., Mf. Replace each xif by its corresponding rank, rif ∈ {1, ..., Mf}.

(2) Map the rank of each variable onto the interval [0,1] by replacing the rank of object i in variable f with:

$$
z_{if} = \frac{r_{if} - 1}{M_f - 1}
\tag{6.13}
$$
(3) Compute the dissimilarity using any of the known methods for interval-scaled variables, applied to zif.

Example 6.3: Dissimilarity between ordinal variables. Suppose we have the sample data given in table 6.3. The variable test-2 is ordinal. It has three states in the following order: fair, good and excellent, so Mf = 3. In step 1 we replace each value of test-2 by its rank; the four objects are assigned the ranks 3, 1, 2, 3 respectively. Step 2 normalizes the ranks by mapping them with formula (6.13), giving rank 1 --> 0, rank 2 --> 0.5 and rank 3 --> 1.0. In step 3 we use the Euclidean distance; the result is shown in the following dissimilarity matrix:

$$
\begin{bmatrix}
0 \\
1.0 & 0 \\
0.5 & 0.5 & 0 \\
0 & 1.0 & 0.5 & 0
\end{bmatrix}
$$
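The three steps of this example can be reproduced with the short Python sketch below. The function name is hypothetical; with a single variable the Euclidean distance reduces to the absolute difference of the normalized ranks, which is what the code uses.

```python
def ordinal_dissimilarity(values, ordered_states):
    """Lower-triangular dissimilarity matrix for one ordinal variable."""
    # Step 1: replace each value by its rank 1..M_f
    rank = {state: r for r, state in enumerate(ordered_states, start=1)}
    m_f = len(ordered_states)
    # Step 2: map ranks onto [0, 1] with z = (r - 1) / (M_f - 1)
    z = [(rank[v] - 1) / (m_f - 1) for v in values]
    # Step 3: distance on the normalized values (1-D Euclidean)
    return [[abs(z[i] - z[j]) for j in range(i + 1)]
            for i in range(len(z))]

test2 = ["excellent", "fair", "good", "excellent"]
matrix = ordinal_dissimilarity(test2, ["fair", "good", "excellent"])
```

The resulting rows are [0], [1.0, 0], [0.5, 0.5, 0] and [0, 1.0, 0.5, 0], matching the dissimilarity matrix of the example.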
3. Ratio-scaled variables: A ratio-scaled variable is a positive measurement on a non-linear scale, for example a quantity that follows an exponential function such as Ae^{Bt}, where A and B are positive constants and t is a variable representing time. In most cases the measures for interval-scaled variables cannot be applied directly to this kind of variable, because doing so can cause large errors. A better approach is to preprocess the data with a logarithmic transformation, yif = log(xif), and then treat the transformed values as interval-scaled or ordinal variables.
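A minimal Python sketch of this preprocessing, applied to the test-3 column of the sample data (445, 22, 164, 1210), is shown below. The base of the logarithm is not stated in the text; base 10 is assumed here, and the function name is hypothetical.

```python
import math

def log_dissimilarity(values):
    """Dissimilarity matrix after the transformation y = log10(x)."""
    y = [math.log10(v) for v in values]      # assumed base-10 logarithm
    # Treat y as interval-scaled: 1-D distance is the absolute difference
    return [[abs(y[i] - y[j]) for j in range(i + 1)]
            for i in range(len(y))]

matrix = log_dissimilarity([445, 22, 164, 1210])
```

After the transformation, objects 2 and 4 (22 and 1210) are the farthest apart, while the raw difference between objects 1 and 4 no longer dominates the matrix.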
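4. Variables of mixed types

A database may contain all six types of variables described above. A weighted formula can be used to combine the effects of the individual variables:

$$
d(i,j) = \frac{\sum_{f=1}^{p} \delta_{ij}^{(f)} d_{ij}^{(f)}}{\sum_{f=1}^{p} \delta_{ij}^{(f)}}
$$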
Trong ij ( f ) c tnh nh sau: - ij ( f ) =0 khi xjf hay xif khng tn ti hoc xif = xjf = 0. - ij ( f ) = 1 trong cc trng hp khc. Ngoi ra dij(f) c tnh nh sau: - i vi cc bin tr khong hay th t: d ij(f) l khong cch c chun ha. - i vi cc bin nh phn hay phn loi: + dij(f) = 0 khi xif = xjf = 0 + dij(f) = 1 trong cc trng hp khc 4.2.3 Cc phng php gom cm 4.2.3.1 Cc phng php phn hoch y l phng php phn hoch CSDL D c n i tng thnh k cm sao cho i) Mi cm cha t nht mt i tng. ii) Mi i tng thuc v mt cm duy nht. iii) k l s cm c cho trc. y l tiu chun chung ca cc phng php phn hoch truyn thng. Gn y xut hin nhiu phng php phn hoch da trn l thuyt tp m th tiu chun (ii) l khng quan trng m thay vo l mc thuc v mt cm ca mt i tng no , mc ny c gi tr t 0 n 1. Cc phng php tip cn phn hoch: K-means: Mi cm c biu din bng gi tr trung bnh ca cc i tng trong cm. K-medoids: Mi cm c biu din bng mt trong cc i tng nm gn tm ca cm. V th Vn_Khoa An ton Thng tin_Hc vin K thut Mt m
Here p denotes an object belonging to cluster C_i, and m_i is the centroid of cluster C_i.

4.2.4.1 The k-means algorithm
Input:
+ k: the number of clusters.
+ D: a data set containing n objects.
Figure 3.5: Illustration of the k-means algorithm

Figure 3.5a: Three objects are chosen at random as the initial centroids of three clusters; they are marked with "+". Each remaining object is assigned to the cluster whose centroid is nearest to it. This gives three clusters, outlined with dash-dotted lines in the figure.

Figure 3.5b: The centroid of each cluster is updated; the updated centroids are marked with "+". Using the new centroids, the objects are redistributed among the clusters according to their nearest centroid. The resulting clusters are outlined with dotted lines.

Figure 3.5c: The steps of Figure 3.5b are repeated. The resulting clusters are outlined with solid lines. Repeating the steps once more produces no change, so the algorithm stops. The result is the three clusters outlined with solid lines.

Example 6.6: Given the set of points
x1 = {1, 3} = {x11, x12}
x2 = {1.5, 3.2} = {x21, x22}
x3 = {1.3, 2.8} = {x31, x32}
x4 = {3, 1} = {x41, x42}
use the k-means algorithm to cluster them with k = 2.

Initialization step:
Return to step 1:
- Assign the objects to the clusters. Compute the Euclidean distances:

\[
\begin{aligned}
d(v_1, x_1) &= \sqrt{(x_{11} - v_{11})^2 + (x_{12} - v_{12})^2} = \sqrt{(1 - 1.15)^2 + (3 - 2.9)^2} = 0.180 \\
d(v_2, x_1) &= \sqrt{(x_{11} - v_{21})^2 + (x_{12} - v_{22})^2} = \sqrt{(1 - 2.25)^2 + (3 - 2.1)^2} = 1.540
\end{aligned}
\]

Assign x1 to cluster c1.

\[
\begin{aligned}
d(v_1, x_2) &= \sqrt{(x_{21} - v_{11})^2 + (x_{22} - v_{12})^2} = \sqrt{(1.5 - 1.15)^2 + (3.2 - 2.9)^2} = 0.461 \\
d(v_2, x_2) &= \sqrt{(x_{21} - v_{21})^2 + (x_{22} - v_{22})^2} = \sqrt{(1.5 - 2.25)^2 + (3.2 - 2.1)^2} = 1.331
\end{aligned}
\]

Assign x2 to cluster c1.

\[
\begin{aligned}
d(v_1, x_3) &= \sqrt{(x_{31} - v_{11})^2 + (x_{32} - v_{12})^2} = \sqrt{(1.3 - 1.15)^2 + (2.8 - 2.9)^2} = 0.180 \\
d(v_2, x_3) &= \sqrt{(x_{31} - v_{21})^2 + (x_{32} - v_{22})^2} = \sqrt{(1.3 - 2.25)^2 + (2.8 - 2.1)^2} = 1.180
\end{aligned}
\]

Assign x3 to cluster c1.

\[
\begin{aligned}
d(v_1, x_4) &= \sqrt{(x_{41} - v_{11})^2 + (x_{42} - v_{12})^2} = \sqrt{(3 - 1.15)^2 + (1 - 2.9)^2} = 2.652 \\
d(v_2, x_4) &= \sqrt{(x_{41} - v_{21})^2 + (x_{42} - v_{22})^2} = \sqrt{(3 - 2.25)^2 + (1 - 2.1)^2} = 1.331
\end{aligned}
\]

Assign x4 to cluster c2.

- Update the partition matrix:

      x1  x2  x3  x4
c1     1   1   1   0
c2     0   0   0   1

Step 2: Update the centroids of the clusters
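\[
\begin{aligned}
v_{11} &= \frac{m_{11}x_{11} + m_{12}x_{21} + m_{13}x_{31} + m_{14}x_{41}}{m_{11} + m_{12} + m_{13} + m_{14}} = \frac{1 \cdot 1 + 1 \cdot 1.5 + 1 \cdot 1.3 + 0 \cdot 3}{1 + 1 + 1 + 0} = 1.27 \\
v_{12} &= \frac{m_{11}x_{12} + m_{12}x_{22} + m_{13}x_{32} + m_{14}x_{42}}{m_{11} + m_{12} + m_{13} + m_{14}} = \frac{1 \cdot 3 + 1 \cdot 3.2 + 1 \cdot 2.8 + 0 \cdot 1}{1 + 1 + 1 + 0} = 3 \\
v_{21} &= \frac{m_{21}x_{11} + m_{22}x_{21} + m_{23}x_{31} + m_{24}x_{41}}{m_{21} + m_{22} + m_{23} + m_{24}} = \frac{0 \cdot 1 + 0 \cdot 1.5 + 0 \cdot 1.3 + 1 \cdot 3}{0 + 0 + 0 + 1} = 3 \\
v_{22} &= \frac{m_{21}x_{12} + m_{22}x_{22} + m_{23}x_{32} + m_{24}x_{42}}{m_{21} + m_{22} + m_{23} + m_{24}} = \frac{0 \cdot 3 + 0 \cdot 3.2 + 0 \cdot 2.8 + 1 \cdot 1}{0 + 0 + 0 + 1} = 1
\end{aligned}
\]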
The partition matrix has changed, so we return to step 1 and continue until the partition matrix no longer changes.

* Advantages and disadvantages of the algorithm
a. Advantages:
+ Relatively fast. The complexity of the algorithm is O(tkn), where:
- n: the number of objects in the data space.
- k: the number of clusters to form.
- t: the number of iterations (t is usually quite small compared with n).
+ K-means is well suited to clusters with a spherical shape.
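The iteration worked through in Example 6.6 can be reproduced with a short Python script. This is a minimal sketch, not the implementation used in the thesis: it assumes the usual square-error formulation of k-means (each centroid is the mean of its members), and it starts from the centroids v1 = (1.15, 2.9) and v2 = (2.25, 2.1) that appear in the distance calculations of the example. The function name `kmeans` is illustrative.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain k-means: alternate assignment and centroid update until stable."""
    assignment = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # the partition no longer changes
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


points = [(1.0, 3.0), (1.5, 3.2), (1.3, 2.8), (3.0, 1.0)]
# Centroids taken from the distance calculations in Example 6.6.
assignment, centroids = kmeans(points, [(1.15, 2.9), (2.25, 2.1)])
print(assignment)
print([tuple(round(v, 2) for v in c) for c in centroids])
```

Cluster indices 0 and 1 correspond to c1 and c2. The loop stops as soon as an assignment pass leaves the partition unchanged, which matches the stopping rule stated in the example.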
Here p is the point in space representing an object in cluster C_j, and o_j is the representative object of C_j. In the general case, the algorithm iterates until every representative object is a true medoid, that is, until it lies at the center of its cluster. This is the basic idea of the k-medoids method.

The k-medoids algorithm
Input: data set D; number of clusters k.
Output: k clusters.
Method:
(1) Randomly choose k objects O_i (i = 1..k) as the initial centers (medoids) of the clusters.
(2) Repeat
(3) Assign (or reassign) each remaining object to the cluster whose medoid is nearest to it.
(4) For each medoid O_i:
- Consider in turn each non-medoid object x.
- Compute the gain S obtained by swapping O_i with x.
S is determined as follows:
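The swap step of k-medoids can be sketched in Python as follows. This is a hedged illustration only: it assumes that S is the change in the total absolute error (the sum of distances from every object to its nearest medoid) caused by a swap, and that a swap is accepted when S < 0, as in the common PAM formulation. The function names and the sample points are illustrative, not taken from the thesis.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid (absolute error)."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def k_medoids(points, medoids):
    """PAM-style k-medoids; `medoids` holds indices into `points`."""
    medoids = list(medoids)
    improved = True
    while improved:
        improved = False
        for i in range(len(medoids)):
            for x in range(len(points)):
                if x in medoids:
                    continue  # only non-medoid objects are swap candidates
                candidate = medoids[:i] + [x] + medoids[i + 1:]
                # Assumed gain: S = cost after the swap minus cost before it.
                s = total_cost(points, candidate) - total_cost(points, medoids)
                if s < 0:  # the swap lowers the total error, so accept it
                    medoids = candidate
                    improved = True
    # Final assignment of every point to its nearest medoid.
    labels = [min(medoids, key=lambda m: math.dist(p, points[m])) for p in points]
    return medoids, labels


points = [(1.0, 3.0), (1.5, 3.2), (1.3, 2.8), (3.0, 1.0)]
medoids, labels = k_medoids(points, [0, 1])
print(medoids, labels)
```

Unlike k-means, the cluster representatives here are always actual objects of the data set, which is why the result is reported as indices into `points`.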
Figure: General principle of an intrusion detection process using clustering (diagram labels: processing mechanism; transformed database)

4.2.1 Data collection and preprocessing
4.2.1.1 Data collection
On *NIX systems, the widely used tool tcpdump can collect and filter data passing through a network interface. Tcpdump runs as a background process and writes the required information (selected through command-line parameters) to a file. At each interval, n packets are collected and stored for later processing and for use in classification, after which collection resumes. When packet information is saved to a file in each processing cycle, tcpdump parameters are used to reduce it to what is needed, for example:

tcpdump -c 50 -w dump host victim-machine

For each Ethernet packet received:
Convert RemoteHost to an integer by adding a lookup table.
Group the data by a fixed time window and store the grouped result in an intermediate table, TCP_F.
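The two preprocessing steps above can be sketched in Python. This is an illustration under assumptions: the real column layout of the TCP and TCP_F tables is not given in the text, so each connection is reduced here to a hypothetical (timestamp, RemoteHost) pair, the function name `preprocess` is invented, and the addresses are made-up sample values.

```python
from collections import Counter


def preprocess(connections, window_seconds=60):
    """Map RemoteHost strings to integer ids and count connections per time window.

    connections: iterable of (timestamp_in_seconds, remote_host) pairs.
    Returns (host_ids, counts), where counts maps (window_index, host_id) to the
    number of connections, playing the role of an intermediate table like TCP_F.
    """
    host_ids = {}        # lookup table: RemoteHost -> integer id
    counts = Counter()
    for timestamp, host in connections:
        host_id = host_ids.setdefault(host, len(host_ids) + 1)
        window = int(timestamp // window_seconds)
        counts[(window, host_id)] += 1
    return host_ids, counts


log = [(0, "10.0.0.5"), (12, "10.0.0.5"), (30, "10.0.0.9"), (61, "10.0.0.5")]
host_ids, tcp_f = preprocess(log)
print(host_ids)
print(dict(tcp_f))
```

Keeping the lookup table separate means the integer ids can be translated back to host names when results are displayed.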
4.2.2 Data mining to detect denial-of-service attacks
4.2.2.1 Anomalous patterns of denial-of-service attacks
Field surveys, combined with the knowledge of experts in the security field, show the following:
- Denial-of-service attacks are carried out mainly over the HTTP protocol. A normal HTTP request contains only a Web address (URL), so it is very small. In many attacks, however, the attacker injects code to be executed in the victim's browser (cross-site scripting), for example: http://www.microsoft.com/education/? ID=MCTN&target=http://www.microsoft.com/education/? ID=MCTN&target=<script>alert(document.cookie)</script>, uses scripts in the form of Flash files, or inserts SQL queries into the URL. This makes such requests larger than normal. Studies therefore indicate that HTTP requests with a size >= 350 bytes are anomalous patterns that may be attacks.
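The size rule above can be expressed as a simple check. The sketch below is a hedged example: the text does not say exactly how the request size is measured, so the encoded length of the request line is assumed here, and the function name and sample requests are hypothetical.

```python
SIZE_THRESHOLD = 350  # bytes, the threshold quoted in the text


def is_suspicious_request(request_line: str, threshold: int = SIZE_THRESHOLD) -> bool:
    """Flag an HTTP request whose size reaches the threshold."""
    # Assumption: size is the UTF-8 encoded length of the request line.
    return len(request_line.encode("utf-8")) >= threshold


normal = "GET /education/?ID=MCTN HTTP/1.1"
padded = "GET /education/?ID=MCTN&target=" + "A" * 400 + " HTTP/1.1"
print(is_suspicious_request(normal), is_suspicious_request(padded))
```

A size check of this kind only marks a request as a candidate anomaly; it does not by itself show that the request is an attack.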
In data mining for this intrusion detection system, the results are not only displayed but are also used for further processing, for example to raise alerts or to take action against the threats that were discovered. These matters, however, are beyond the scope of this project.
Figure 5.1: Main interface
In this interface, the components work as follows:
- The "Choose file" button selects a .txt or .log file to mine. The file is mined for the number of connections from a single RemoteHost within a period of time (the time window).
- The "HTTP" button opens a new interface for mining anomalies in the HTTP protocol, along with several other options.
- "Automatic" opens a new interface that runs the mining automatically at a chosen interval.
- "Collect data" reads the data from the selected file into the TCP table, converts some attributes from text to numeric form (such as the timestamp and the connection duration), and fills in missing data.
- "Preprocess" opens a new interface that preprocesses the data collected above.
- "Redo" lets the user choose a file again and restart the mining from the beginning.
- "Exit" quits the program.

The Preprocessing interface:
Figure 5.2: Data preprocessing
Processing time: sets the period of time (the time window) to process, in seconds. The default value is 60 seconds.
Preprocess: converts the connection data into a form suitable for the algorithm, for example by converting RemoteHost to a number and grouping the connections by time.
Result: displays the preprocessing result in the "Preprocessing result" area.
Mine: opens the interface used to mine for anomalies.
Exit: returns to the main screen.

The mining screen:
Figure 5.3: Mining interface
Frequency: sets the frequency at which a connection is regarded as anomalous. This is the connection-count threshold used as the input parameter of the mining algorithm, expressed as a number of occurrences. To avoid wrong results, this value must be chosen appropriately, mainly from experiments and from the user's experience. The threshold is not expressed as a percentage, because the total number of connections is often very small. If the total is 2, for example, a threshold <= 50% will always report an anomaly, whereas a threshold > 50% gives badly wrong results when the total number of connections in a time window is very large.
Result: displays the mining result (the clusters of anomalies).
Back: returns to the Preprocessing window so that preprocessing can be repeated.
Exit: quits the program.

Mining window based on the HTTP protocol:
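The argument for an absolute connection count rather than a percentage can be illustrated with a small Python comparison. The function names, host names, and counts below are invented for the example; they are not taken from the program described here.

```python
def anomalous_hosts(counts, min_count):
    """Return hosts whose connection count in one window reaches min_count."""
    return sorted(h for h, c in counts.items() if c >= min_count)


def anomalous_hosts_percent(counts, min_share):
    """Same test, but against a share of the window's total (for comparison)."""
    total = sum(counts.values())
    if total == 0:
        return []
    return sorted(h for h, c in counts.items() if c / total >= min_share)


# A quiet window: only 2 connections in total.
quiet = {"host-a": 1, "host-b": 1}
# A busy window: one host opens 400 of 1000 connections.
busy = {"host-a": 400, **{f"h{i}": 5 for i in range(120)}}

print(anomalous_hosts(quiet, min_count=10))
print(anomalous_hosts_percent(quiet, min_share=0.5))
print(anomalous_hosts(busy, min_count=10))
print(anomalous_hosts_percent(busy, min_share=0.5))
```

With these sample numbers, the percentage rule flags both hosts in the quiet window and misses the heavy host in the busy window, while the absolute count behaves sensibly in both cases.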
Figure 5.4: Data mining screen for the HTTP protocol
Choose Audit file: works as in the main interface.
Choose table in the database: allows mining on data already stored in our database (TCP).
Choose time period: the function described for the Preprocessing screen.
HTTP: if selected, mining looks for anomalies in the HTTP protocol. When this option is selected, the connection threshold and the size threshold for HTTP requests must be entered.
All protocols: mining is based on the number of requests across all protocols.
Mine: the button that runs the mining on the collected data with the chosen parameters.
Redo: select again and rerun the mining process.
Back: returns to the main screen.
Exit: quits the program.

Screen for automatic mining:
Figure 5.5: Automatic mining screen
- Choose Audit data file: as above.
- Choose time period: as above.
- Threshold: the connection-count threshold.
- Current time mark: because the mined data is standalone data, an initial time mark must be chosen from which to start mining. When the tool is integrated into a real system this step is unnecessary, because the current system time is used.
- Automatic: runs the mining automatically with the parameters above. The entered parameters are checked for validity.
- Stop: temporarily pauses automatic mining; the program state is kept. When resumed, mining continues from the position where it stopped.
- Redo: restarts the automatic process from the beginning.
- Back: returns to the main screen.
- Exit: quits the system.

Execution speed:
- Time window: 60 seconds

Total connections    Processing time (s)
18                   0.046875
14                   0.03125
11                   0.0625
11                   0.015625
- The processing speed depends on many factors. The important ones of interest here are the length of the time window used for grouping and the number of connections in each group, and how they affect the processing time.

5.3 Conclusion
In completing this project I gained a certain amount of knowledge, but I also came to see that data mining in general, and data mining in IDS/IPS systems in particular, is a broad and promising field of research. The project has presented the basic issues of data mining: its importance, the approaches to it, and its techniques. It uses a clustering algorithm, specifically the k-medoids method, to mine data for intrusion detection.

The project demonstrates that artificial intelligence can be applied in specialized areas of computer science and in other fields, especially computer networks and the Internet. It proposes an operating model for an intelligent system that helps users and network administrators detect intrusions, working toward the detection of denial-of-service attacks.

However, because of limited time and knowledge, the research has so far stopped at a demo system covering a few simple kinds of denial-of-service attack.

As the amount of data collected and stored keeps growing, together with the need to extract information from it, the tasks set for data mining become ever more important. Its applicability to many areas, including the economy and society, national defense and security, and network security, is another strength of data mining. With these hopes
Applying Data Mining Techniques in IDS Systems