PhanLopDuLieu Khoa 1
Course report: Data Mining & Data Warehousing
Data classification is currently one of the most extensively studied topics in data mining, with work concentrated mainly in statistics, machine learning, and neural networks. Classification is regarded as one of the most widely used data mining techniques and has many extensions. Combining classification techniques with databases is a promising field because it addresses a critical requirement of database applications: high flexibility. Given this importance, this report studies classification techniques in depth: the different approaches to classification, recent improvements reported in papers at international data mining conferences, and the use of classification in the commercial product Microsoft SQL Server 2000.
Bui Thanh Hieu, Master's program, Class 1
Data classification is a technique that builds a model from a training set and the values (class labels) of a classifying attribute, and then uses that model to classify new data. In other words, classification predicts categorical class labels. A closely related technique is prediction: classification predicts categorical class labels, whereas prediction models continuous-valued functions.
Classification proceeds in two steps: model construction and model usage.
Model construction: describing a set of predetermined classes. Each tuple or sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of tuples used for model construction is called the training set. The model is represented as classification rules, decision trees, or mathematical formulas.
Model usage: the model is used to classify future data or objects whose class is not yet known. Before the model is used, its accuracy is usually estimated: the known label of each test sample is compared with the class assigned by the model, and accuracy is the percentage of test samples that the model classifies correctly. The test set must be independent of the training set.
Classification is a form of supervised learning: the training data (observations, measurements, ...) are accompanied by labels indicating the class of each observation, and new data are classified based on the training set. In contrast, in unsupervised learning the class labels of the training data are unknown.
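As a minimal sketch of the accuracy estimate described above, the following Python function computes the fraction of test samples whose known label matches the label assigned by a model. The names `accuracy` and `toy_model`, and the toy threshold rule, are illustrative assumptions and do not come from the report.

```python
from typing import Any, Callable, Sequence, Tuple

def accuracy(classify: Callable[[Any], str],
             test_set: Sequence[Tuple[Any, str]]) -> float:
    """Fraction of test samples whose known label equals the predicted label."""
    if not test_set:
        raise ValueError("test set must not be empty")
    correct = sum(1 for sample, label in test_set if classify(sample) == label)
    return correct / len(test_set)

# Hypothetical model: a single threshold rule on one numeric feature.
def toy_model(x: float) -> str:
    return "high" if x >= 5 else "low"

# The test set is kept separate from whatever data produced the rule.
toy_test = [(1, "low"), (7, "high"), (4, "high"), (9, "high")]
print(accuracy(toy_model, toy_test))  # 3 of the 4 samples are classified correctly
```

Keeping the test samples out of the training set matters because a model evaluated on its own training data tends to look more accurate than it is.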
2. Classification by decision tree induction:
2.1. The decision tree concept:
A decision tree is a flowchart-like tree structure: each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label or a class distribution.
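Information gain is the measure used for attribute selection: the attribute with the highest information gain is chosen. Assume there are two classes, P and N, and that the set of examples S contains p elements of class P and n elements of class N. The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as:
$$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$
Assume that attribute A partitions S into subsets S_1, ..., S_v, where S_i contains p_i examples of P and n_i examples of N. The expected information needed to classify the objects in all subsets S_i is:
$$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$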
The information gained by branching on A is: Gain(A) = I(p, n) - E(A)
2.4. The basic decision tree learning algorithm ID3:
ID3 is a decision tree learning algorithm developed by Ross Quinlan (1983). Its basic idea is to build the decision tree by a top-down search through the given sets, testing each attribute at every node of the tree. To select the attribute that is most useful for classifying the given sets, we introduce a measure called information gain. To find an optimal way to classify a set of examples, we need a quantity to minimize (for example, the height of the tree). We therefore need a function that measures which choice gives the most balanced split. The information gain measure is such a function.
ID3(Learning Set S, Attribute Set A, Attribute Values V) returns Decision Tree
Begin
  Load the learning set, create the root node 'rootNode' of the decision tree, and add the learning set S to the root node as its subset.
  For rootNode, first compute Entropy(rootNode.subset).
  If Entropy(rootNode.subset) == 0, then rootNode.subset consists of records that all have the same value of the decision attribute; return a leaf node with decision attribute: attribute value.
  If Entropy(rootNode.subset) != 0, then compute the information gain of each remaining attribute (one not yet used for splitting) and find the attribute A with Maximum(Gain(S, A)). Create the child nodes of rootNode and add them to rootNode in the decision tree.
  For each child of rootNode, apply ID3 recursively.
... their subsets (subset), then compute: Entropy(rootNode.subset) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
2. Compute the information gain of each attribute:
Gain(S, Windy) = Entropy(S) - (8/14) Entropy(S_false) - (6/14) Entropy(S_true) = 0.048
Gain(S, Humidity) = 0.151
Gain(S, Temperature) = 0.029
Gain(S, Outlook) = 0.246
... is the split on the attribute outlook.
4. Apply ID3 to each child node of this root node until a leaf node or a node with entropy = 0 is reached.
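The entropy and gain figures in the worked example can be checked with a short Python sketch. The 14-row table below is an assumption: it is the standard play-tennis data set, which the text does not list but which reproduces the quoted values (0.940, 0.246, 0.151, 0.048, 0.029). The function names and the nested-dict tree representation are likewise illustrative choices, not the report's own implementation.

```python
import math
from collections import Counter

# Assumed data: standard play-tennis table (outlook, temperature, humidity,
# windy, class). Not given in the text; chosen because it matches its numbers.
ATTRS = ["outlook", "temperature", "humidity", "windy"]
ROWS = [
    ("sunny", "hot", "high", "false", "n"),
    ("sunny", "hot", "high", "true", "n"),
    ("overcast", "hot", "high", "false", "p"),
    ("rain", "mild", "high", "false", "p"),
    ("rain", "cool", "normal", "false", "p"),
    ("rain", "cool", "normal", "true", "n"),
    ("overcast", "cool", "normal", "true", "p"),
    ("sunny", "mild", "high", "false", "n"),
    ("sunny", "cool", "normal", "false", "p"),
    ("rain", "mild", "normal", "false", "p"),
    ("sunny", "mild", "normal", "true", "p"),
    ("overcast", "mild", "high", "true", "p"),
    ("overcast", "hot", "normal", "false", "p"),
    ("rain", "mild", "high", "true", "n"),
]

def entropy(rows):
    """Entropy, in bits, of the class label stored in the last column."""
    counts = Counter(r[-1] for r in rows)
    total = len(rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def gain(rows, col):
    """Information gain obtained by splitting rows on the attribute at index col."""
    parts = {}
    for r in rows:
        parts.setdefault(r[col], []).append(r)
    expected = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(rows) - expected

def id3(rows, cols):
    """Return a class label (leaf) or a dict {attribute: {value: subtree}}."""
    if entropy(rows) == 0 or not cols:
        # Pure subset, or no attribute left: return the majority class.
        return Counter(r[-1] for r in rows).most_common(1)[0][0]
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = id3(subset, rest)
    return {ATTRS[best]: branches}

print(round(entropy(ROWS), 3))
for i, name in enumerate(ATTRS):
    print(name, round(gain(ROWS, i), 3))
print(id3(ROWS, list(range(len(ATTRS)))))
```

With this data the root split is on outlook, in agreement with step 3 of the example; the sunny branch is then split on humidity and the rain branch on windy.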
2.5. Shortcomings of the ID3 algorithm:
First shortcoming: an important limitation of ID3 is that the space of legal splits at a node is impoverished. A split is a partition of the instance space that results from placing a test at a decision node. ID3 and its descendants allow a test on only a single attribute and branch on the outcome of that test.
Second shortcoming: ID3 relies heavily on the quantity of the input data. Handling noise in the input data is essential when decision tree learning is applied to the real world. For example, when the input data contain noise, or when the number of examples is too small to be a representative sample of the true target function, ID3 can produce wrong decisions.
Many extensions of the basic ID3 algorithm have been developed to apply decision tree learning to real-world problems, such as post-pruning of trees, handling real-valued attributes, handling missing attribute values, and using attribute selection criteria other than information gain.
2.6. Extending basic decision tree induction:
Continuous-valued attributes: dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals.
Missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values.
Attribute construction: create new attributes based on existing ones that are sparsely represented. This reduces fragmentation, repetition, and replication.
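Two of the extensions listed above, replacing a missing value with the most common value and mapping a continuous value to an interval, can be sketched in a few lines of Python. The function names, the use of `None` as the missing-value marker, and the cut points are assumptions made for illustration only.

```python
from bisect import bisect_right
from collections import Counter

def fill_missing(values, missing=None):
    """Replace missing entries with the most common observed value."""
    observed = [v for v in values if v is not missing]
    if not observed:
        raise ValueError("no observed values to impute from")
    most_common = Counter(observed).most_common(1)[0][0]
    return [most_common if v is missing else v for v in values]

def discretize(value, cut_points):
    """Map a continuous value to the index of the interval it falls in.

    cut_points must be sorted; index i means
    cut_points[i-1] <= value < cut_points[i].
    """
    return bisect_right(cut_points, value)

print(fill_missing(["high", None, "normal", "high"]))
print([discretize(t, [15.0, 25.0]) for t in (10.0, 15.0, 30.0)])
```

The second option mentioned in the text, assigning a probability to each possible value, would need fractional example weights and is not shown here.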
2.7. The C4.5 extension:
C4.5 extends the ID3 algorithm in several respects. When building the decision tree, it can handle training sets containing records with unknown attribute values by evaluating the information gain, or the gain ratio, for ...
Analogous to the Gain measure above, IBM introduced another measure for classification, the gini index. If a data set T contains examples from n classes, the gini index gini(T) is defined as:
$$gini(T) = 1 - \sum_{j=1}^{n} p_j^2$$
where p_j is the relative frequency of class j in T. If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively (N = N1 + N2), the gini index of the split data, gini_split(T), is defined as:
$$gini_{split}(T) = \frac{N_1}{N}\, gini(T_1) + \frac{N_2}{N}\, gini(T_2)$$
The attribute that yields the smallest gini_split(T) is chosen to split the node.
Knowledge can be represented in the form of IF-THEN rules. One rule is created for each path from the root to a leaf. Each attribute-value pair along a path forms a conjunction, and the leaf node holds the class prediction. The resulting rules are easy for humans to understand.
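A minimal Python sketch of the gini index and of the gini index of a binary split, under the usual definitions gini(T) = 1 - sum of p_j squared and a size-weighted average over the two subsets. The function names and the sample label lists are illustrative assumptions.

```python
from collections import Counter

def gini(labels):
    """gini(T) = 1 - sum_j p_j^2, with p_j the relative frequency of class j."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def gini_split(left, right):
    """Size-weighted gini index of a binary split of T into T1 and T2."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

labels = ["p"] * 9 + ["n"] * 5
print(round(gini(labels), 4))

# A split that separates the classes perfectly versus one that does not.
pure = gini_split(["p"] * 4, ["n"] * 4)
mixed = gini_split(["p", "p", "n", "n"], ["p", "p", "n", "n"])
print(pure, mixed)
```

The perfectly separating split has the smaller value, which is why the attribute with the smallest gini_split(T) is preferred.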
The generated tree may overfit the training data. Overfitting shows up as follows: the tree has too many branches, some of which may reflect anomalies caused by noise or outliers, and the result is poor accuracy on unseen samples.
There are two approaches to avoiding overfitting.
Prepruning: halt tree construction early; do not split a node if this would cause the goodness measure to fall below a threshold. The difficulty with prepruning is choosing an appropriate threshold.
Postpruning: remove branches from a fully grown tree, producing a sequence of progressively pruned trees, and use a set of data different from the training data to decide which is the best pruned tree.
Approaches to determining the final tree size: Usually the data are divided into a training set (2/3) and a test set (1/3), or cross-validation is used. Another option is to use all the data for training but apply a statistical test to estimate whether expanding or pruning a node is likely to improve the overall distribution. A third option is the minimum description length principle: stop growing the tree when the encoding is minimized.
3. Decision tree classification in large databases:
Classification is a classical problem that has been studied extensively by statisticians and machine learning researchers. The current direction is to classify data sets with billions of examples and hundreds of attributes at reasonable speed. Decision tree induction is highly regarded in large-scale data mining for the following reasons: its learning speed is relatively fast compared with other classification methods, and the trees can be converted into simple, easy-to-understand classification rules.
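The conversion of a tree into classification rules mentioned above can be sketched as follows: one IF-THEN rule is produced per root-to-leaf path. The nested-dict tree representation and the sample tree for the play-tennis example are assumptions introduced for illustration.

```python
def tree_to_rules(tree, conditions=()):
    """Yield one IF-THEN rule for each root-to-leaf path of a nested-dict tree."""
    if not isinstance(tree, dict):           # leaf: holds the class prediction
        body = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        yield f"IF {body} THEN class = {tree}"
        return
    (attribute, branches), = tree.items()    # exactly one tested attribute per node
    for value, subtree in branches.items():
        yield from tree_to_rules(subtree, conditions + ((attribute, value),))

# Hypothetical tree for the play-tennis example (assumed, for illustration).
tree = {"outlook": {
    "sunny": {"humidity": {"high": "n", "normal": "p"}},
    "overcast": "p",
    "rain": {"windy": {"true": "n", "false": "p"}},
}}

for rule in tree_to_rules(tree):
    print(rule)
```

Each attribute-value pair on a path becomes one term of the rule's conjunction, and the leaf supplies the predicted class.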
DataMining
This method presents a decision-tree-based classification algorithm that removes the memory restrictions, runs fast, and is scalable. The algorithm is designed to be easily parallelized, allowing multiple processors to work together to build a single consistent model. In SLIQ the class list must reside in memory, which is a bottleneck because the class list can be large. SPRINT puts the class information into the attribute lists and keeps no class list. Parallel classification: partition the attribute lists.
3.3. PUBLIC: Integrating decision tree building and pruning:
This method proposes an improved decision tree classifier that integrates the pruning phase with the initial building phase. In PUBLIC, a node is not expanded during the building phase if it is determined that it would be pruned during the subsequent pruning phase. To make this decision for a node before it is expanded, PUBLIC computes a lower bound on the minimum cost of the subtree rooted at that node. PUBLIC uses this estimate to identify the nodes that are certain to be pruned and spends no effort on splitting them. Integrating growing and pruning: at each node, check the cost of growing its subtrees.
3.4. RainForest: A Generic Framework:
This method presents a unifying framework for decision tree classifiers that separates the scalability aspects of the algorithms for constructing a decision tree from the central features that determine the quality of the tree. This generic algorithm is easy to instantiate with specific algorithms from the literature, including ...
Decision tree induction on cube-based data integrates generalization with decision tree induction. Cube-based multi-level classification raises two important issues: relevance analysis at multiple levels, and information-gain analysis with dimension and level.
4. Bayesian classification:
Bayesian reasoning provides a probabilistic approach to inference. It is based on the assumption that the quantities of interest are governed by probability distributions and that optimal decisions can be made by reasoning about these probabilities together with the observed data. It is important to machine learning because it provides a quantitative approach to weighing the evidence that supports alternative hypotheses. Bayesian reasoning provides the basis for learning algorithms that manipulate probabilities directly, as well as a framework for analyzing the operation of algorithms that do not manipulate probabilities explicitly.
Probabilistic learning: calculate explicit probabilities for hypotheses; this is among the most practical approaches to certain types of learning problems.
Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct. Prior knowledge can be combined with the observed data.
Probabilistic prediction: predict multiple hypotheses, each weighted by its probability.
Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making.
Bayes' theorem: given a training set D, the posterior probability of a hypothesis h, P(h|D), is given by Bayes' theorem:
$$P(h|D) = \frac{P(D|h)\,P(h)}{P(D)}$$
Maximum a posteriori (MAP) hypothesis:
$$h_{MAP} = \arg\max_{h \in H} P(h|D) = \arg\max_{h \in H} P(D|h)\,P(h)$$
The practical difficulty of Bayesian classification is that it requires initial knowledge of many probabilities and has a significant computational cost.
Naive Bayes classification: a simplifying assumption is made that the attributes are conditionally independent given the class.
This greatly reduces the computational cost, since only the class distribution has to be counted. Given a training set, we can compute the required probabilities.
The classification problem can be formalized using a-posteriori probabilities as follows: P(C|X) = the probability that the sample x = <x1, ..., xk> belongs to class C. Idea: assign sample X to the class label C for which P(C|X) is maximal.
Bayes' theorem states: P(C|X) = P(X|C).P(C)/P(X), where P(X) is constant for all classes and P(C) is the relative frequency of class C samples. The C for which P(C|X) is maximal is the C for which P(X|C).P(C) is maximal.
Naive Bayesian classification: the naive assumption is attribute independence, P(x1, ..., xk|C) = P(x1|C)...P(xk|C). If the i-th attribute is categorical, P(xi|C) is estimated as the relative frequency of samples in class C that have value xi for the i-th attribute. If the i-th attribute is continuous, P(xi|C) is estimated through a Gaussian density function. The computation is easy in both cases.
Play-tennis example: classifying X. Given the unseen sample X = <rain, hot, high, false>
P(X|p).P(p) = P(rain|p).P(hot|p).P(high|p).P(false|p).P(p) = 3/9.2/9.3/9.6/9.9/14 = 0.010582
P(X|n).P(n) = P(rain|n).P(hot|n).P(high|n).P(false|n).P(n) = 2/5.2/5.4/5.2/5.5/14 = 0.018286
Sample X is classified into class n.
The Naive Bayes algorithm written in pseudocode is as follows:
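The worked example above can be reproduced with a minimal Python sketch of Naive Bayes for categorical attributes. The training table is an assumption: it is the standard 14-row play-tennis data set, which is not listed in the text but yields the same conditional probabilities (3/9, 2/9, 3/9, 6/9 for class p and 2/5, 2/5, 4/5, 2/5 for class n). The function names are illustrative, and no smoothing is applied, so a value never seen with a class gives a score of zero.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Assumed training data: (outlook, temperature, humidity, windy, class).
ROWS = [
    ("sunny", "hot", "high", "false", "n"),
    ("sunny", "hot", "high", "true", "n"),
    ("overcast", "hot", "high", "false", "p"),
    ("rain", "mild", "high", "false", "p"),
    ("rain", "cool", "normal", "false", "p"),
    ("rain", "cool", "normal", "true", "n"),
    ("overcast", "cool", "normal", "true", "p"),
    ("sunny", "mild", "high", "false", "n"),
    ("sunny", "cool", "normal", "false", "p"),
    ("rain", "mild", "normal", "false", "p"),
    ("sunny", "mild", "normal", "true", "p"),
    ("overcast", "mild", "high", "true", "p"),
    ("overcast", "hot", "normal", "false", "p"),
    ("rain", "mild", "high", "true", "n"),
]

def train(rows):
    """Count class frequencies and, per class, the values of each attribute."""
    class_counts = Counter(r[-1] for r in rows)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    for r in rows:
        for i, v in enumerate(r[:-1]):
            value_counts[(i, r[-1])][v] += 1
    return class_counts, value_counts

def score(x, label, class_counts, value_counts):
    """P(X|C).P(C) under the naive independence assumption (exact fractions)."""
    total = sum(class_counts.values())
    result = Fraction(class_counts[label], total)
    for i, v in enumerate(x):
        result *= Fraction(value_counts[(i, label)][v], class_counts[label])
    return result

def classify(x, class_counts, value_counts):
    """Return the class C that maximizes P(X|C).P(C)."""
    return max(class_counts, key=lambda c: score(x, c, class_counts, value_counts))

model = train(ROWS)
x = ("rain", "hot", "high", "false")
for label in ("p", "n"):
    print(label, round(float(score(x, label, *model)), 6))
print(classify(x, *model))
```

Exact fractions are used so that the printed scores can be compared digit for digit with the hand calculation.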
The independence hypothesis:
The n-dimensional input vector x is mapped to the variable y by means of a scalar product and a nonlinear mapping function.
Network training: the ultimate objective of training is to obtain a set of weights that classifies almost all the tuples in the training set correctly. Steps of the training process:
Similarly for real-valued target functions. The method is robust to noisy data because it averages over the k nearest neighbors. Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes. To overcome this, stretch the axes or eliminate the least relevant attributes.
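A minimal Python sketch of k-nearest-neighbor prediction for both discrete and real-valued targets, assuming Euclidean distance and unweighted voting or averaging. The function names and the toy data are illustrative assumptions; `math.dist` requires Python 3.8 or later.

```python
import math
from collections import Counter

def k_nearest(train, query, k):
    """Return the k (point, target) pairs closest to query by Euclidean distance."""
    return sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]

def knn_classify(train, query, k=3):
    """Discrete target: most common label among the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Real-valued target: mean of the k nearest neighbors' values."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Toy data: the third labelled point is a noisy "b" inside the "a" cluster.
labelled = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "b"),
            ((5.0, 5.0), "b"), ((5.2, 4.9), "b")]
valued = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 3.0), ((10.0,), 50.0)]

print(knn_classify(labelled, (1.0, 0.9), k=3))
print(knn_regress(valued, (1.1,), k=3))
```

In the classification call the single noisy neighbor is outvoted by the other two, which illustrates why taking k neighbors makes the method more robust to noise. Axis stretching would correspond to weighting each coordinate inside the distance function, which this sketch does not do.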
Genetic algorithms: based on an analogy to biological evolution. Each rule is represented by a string of bits. An initial population is created consisting of randomly generated rules. Based on the notion of survival of the fittest, the fittest rules are retained; the fitness of a rule is measured by its classification accuracy on a set of training examples. Offspring are generated by crossover and mutation.
7.3. Rough set approach:
Rough sets are used to approximately, or "roughly", define equivalence classes. A rough set for a class C is approximated by two sets: a lower approximation (certainly in C) and an upper approximation. Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix can be used to reduce the computational effort.
7.4. Fuzzy set approach:
Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (for example, using a fuzzy membership graph). Attribute values are converted to fuzzy values. For a given new example, more than one fuzzy value may apply. Each applicable rule contributes a vote for membership in the categories. Typically, the truth values for each predicted category are summed.
7.5. Classification by case-based reasoning:
Instances are represented by rich symbolic descriptions (for example, function graphs). Multiple retrieved cases may be combined, using knowledge-based reasoning and problem solving.
8. Prediction and classification:
Prediction is similar to classification. It is carried out as follows: first, construct a model; second, use the model to predict the unknown values. The main method for prediction is regression. There are several kinds of regression: linear regression, multiple regression, and non-linear regression. Prediction differs from classification in that classification predicts categorical class labels, whereas prediction models continuous-valued functions.
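As a small illustration of prediction by linear regression, the following Python sketch fits a straight line y = a + b*x by ordinary least squares and uses it to predict a continuous value. The function name `fit_line` and the sample data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns the pair (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope
    a = mean_y - b * mean_x      # intercept
    return a, b

# Hypothetical data lying exactly on the line y = 1 + 2x.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
print(a + b * 4)   # predicted continuous value for the unseen input x = 4
```

Unlike a classifier, the fitted model returns a number rather than a class label, which is the distinction between prediction and classification drawn above.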
Why OLE DB for data mining? An industry standard is critical for data mining development, usage, interoperability, and exchange. OLE DB for data mining is a natural evolution from OLE DB and OLE DB for OLAP. Building mining applications over relational databases is nontrivial: it requires different customized data mining algorithms and significant work on the part of application builders. Goal: ease the burden of developing mining applications in large relational databases.
Motivation of OLE DB for data mining: Facilitate the deployment of data mining models: generate data mining models; store, maintain, and refresh models as the data are updated; use the model programmatically on other data sets; browse the models. Enable enterprise application developers to take part in building data mining solutions.
Features of OLE DB for data mining:
A data mining model (DMM) is analogous to a table in SQL:
Create a data mining model object: CREATE MINING MODEL [model_name]
Insert training data into the model and train it: INSERT INTO [model_name]
Use the data mining model: SELECT relation_name.[id], [model_name].[predict_attr]; consult the DMM content in order to make predictions and browse the statistics held by the model.
Use DELETE to empty or reset the model.
Predictions on data sets: a prediction join between a model and a data set (table). A DMM is deployed just by writing SQL statements.
Two main components:
Cases and casesets: the input data. A table or nested tables (for hierarchical data).
Data mining model: a special type of table. A caseset is associated with a data mining model, together with meta-information, when the DMM is created. The model stores the mining algorithm and the resulting abstraction instead of the data itself.
Use the Data Shaping Service to generate a hierarchical rowset. It is part of the Microsoft Data Access Components (MDAC) products.
Nested tables: the storage subsystem does not need to support nested records. Cases are only instantiated as nested rowsets prior to training or predicting with data mining models. The same physical data may be used to generate different casesets.
Defining a data mining model: the definition of a data mining model includes the following: the name of the model; the algorithm and its input parameters; the columns of the caseset and the relationships among the columns; the source columns and the prediction columns. Example:
CREATE MINING MODEL [Age Prediction]                          -- name of the model
(
    [Customer ID]       LONG KEY,                             -- source column
    [Gender]            TEXT DISCRETE,                        -- source column
    [Age]               DOUBLE DISCRETIZED() PREDICT,         -- prediction column
    [Product Purchases] TABLE                                 -- source column (nested table)
    (
        [Product Name]  TEXT KEY,                             -- source column
        [Quantity]      DOUBLE NORMAL CONTINUOUS,             -- source column
        [Product Type]  TEXT DISCRETE RELATED TO [Product Name]  -- source column
    )
)
USING Microsoft_Decision_Trees   -- assumption: the source example omits the required algorithm clause
Column specifiers:
KEY
ATTRIBUTE
RELATION (RELATED TO clause)
QUALIFIER (OF clause)
PROBABILITY: [0, 1]
VARIANCE
SUPPORT
PROBABILITY-VARIANCE
ORDER
TABLE
Attribute types:
DISCRETE
ORDERED
CYCLICAL
CONTINUOUS
DISCRETIZED
SEQUENCE_TIME
Populating a data mining model: use the INSERT INTO statement. The data mining model consumes the cases it is given. Use the SHAPE statement to create the nested table from the input data. Example:
INSERT INTO [Age Prediction]
(
    [Customer ID], [Gender], [Age],
    [Product Purchases] (SKIP, [Product Name], [Quantity], [Product Type])
)
SHAPE
    {SELECT [Customer ID], [Gender], [Age] FROM Customers ORDER BY [Customer ID]}
APPEND
    (
        {SELECT [CustID], [Product Name], [Quantity], [Product Type] FROM Sales ORDER BY [CustID]}
        RELATE [Customer ID] TO [CustID]
    ) AS [Product Purchases]
Browsing the data mining model: browsing a data mining model is a process of data visualization.
Conclusion: OLE DB for data mining integrates data mining with database systems, and it is a good standard for building data mining applications.
10. Creating decision trees in Microsoft SQL Server 2000
This section describes how Microsoft Analysis Services is used to implement decision tree models in Microsoft SQL Server 2000. We cover creating decision tree models from two kinds of source: standard relational tables and OLAP cubes.
Creating the model: the first step in a data mining operation is to create the model. A data mining model is created from the records contained in a data source. Some data sources can be connected through ...
Note that there are several Analysis Manager folders that contain the elements needed to create OLAP cubes and data mining models. The Analysis server contains the following components:
Databases: each Analysis Server contains one or more databases, and each database is represented by an icon. Under each database icon there are four folders and one icon.
Before data mining can be carried out, a database must first be created.
Creating a database: creating a database is simple. Right-click the server and choose New Database. The Database dialog box appears; type the name of the database and, optionally, a description.
Mining Model Wizard: Microsoft products provide wizards for tasks that have a limited and predictable number of steps. The Mining Model Wizard guides us step by step through creating a model.
1. Select the source (Select source).
2. Select the case table or tables for the data mining model.
3. Select the data mining technique (algorithm).
4. Edit the joins between the tables selected as the source in the previous steps.
5. Select the Case Key column.
6. Select the input and prediction columns.
7. Finish.
Select Source: choose whether the data mining model is built from cases in relational tables or in OLAP cubes.
Select case tables: the connections available for the relational model are displayed in the Select Case Table screen. This screen also offers the option of creating a new connection by clicking New Data Source.
Select a data mining technique: the Mining Model Wizard provides two data mining algorithms, or "techniques" as they are called in the wizard, to choose from. For our purposes, we choose Microsoft Decision Trees in the Select Data Mining Technique screen.
Create and edit joins: if multiple tables were selected in the previous steps, the Create and Edit Joins screen appears next. This screen lets you join the tables graphically by dragging columns from parent tables to their child tables. If only a single table was selected, this step is skipped.
Select the key column: the next step is to choose an ID as the Case Key column. The choice of ID has an important effect on the output because the key determines the uniqueness of a record. Selecting a key is mandatory, so a key must be created in the SQL Server database if one does not already exist.
Select input and prediction columns: on this screen, pick at least one column for the mining model from the list of available columns in the left pane. The input columns represent the actual data used to train the data mining model. If Microsoft Decision Trees was chosen as the technique, also select at least one prediction column.
Finish: finally, once the parameters of the data mining model have been defined, enter a name for the model.
Relational Mining Model Editor: convenient as wizards are, they limit flexibility at each step because, to keep things simple, a wizard has to rely on default values and implicit decisions to complete a task. By using the Relational Mining Model Editor, we can bypass the wizard.
Visualizing the model: one of the most valuable characteristics of a decision tree is the simplicity of the logic within its structure. The Data Mining Model Editor has two tabs at the bottom of the screen: the Schema tab, which is used to change the structure of the model, and the Content tab, which displays the classified data organized as a tree. The Content tab is a quick and convenient way to view the model, its structure, and its attributes.
Dependency Network Browser: the Dependency Network Browser is a tool for viewing the dependencies and relationships among the objects in a data mining model. To open it from the Analysis Manager tree pane, right-click a data mining model and choose Browse Dependency Network. In the Dependency Network Browser, a data mining model is shown as a network of attributes. Within the model we can identify data dependencies and predictions among related attributes. Dependencies are shown by arrows. The direction of a prediction is indicated by the arrowhead and by the color-coding of the nodes.
Select Case: in the Select Case screen, choose the dimension that contains the data used to train the data mining model. Optionally, also choose the level of interest. If no level is chosen, the wizard selects the lowest level of the dimension.
Select Predicted Entity: in the Select Predicted Entity screen, there are three options for the source of the prediction.
+ A measure of the source cube.
+ A member property of the case level.
+ Members of another dimension.
A measure of the source cube: to make predictions about measures, the numeric values in the cube, choose a measure.
A member property of the case level: every dimension level in OLAP can contain member properties that were added to describe the level.
Members of another dimension: if there is a relationship between the dimension that contains the cases and other dimensions, the related dimension can be used as the source of the predicted attribute.
Select training data: in the next step, choose the data used to train the model. The dimension chosen in the Select Case screen is selected by default.
Create a dimension and a virtual cube: the next step is optional, but it provides a powerful feature that is available only when OLAP is the data source and Microsoft Decision Trees is the data mining algorithm. Regardless of the options chosen, the data mining model creates its structure in Analysis Services.
Dimension: the dimension holds the output of the data mining model. Looking at a few OLAP dimensions, we notice that they take the form of a hierarchical tree in which branches can have sub-branches, and each sub-branch can have sub-branches of its own.
Virtual cube: the virtual cube is almost identical to the source cube, except that it also contains the dimension that has just been created.
Finishing the data mining model: the last step is to name the data mining model.
Definition of a transaction: several tasks are completed in succession when a cube or a data mining model is processed. They include:
1. Creating the structures.
2. Querying the data source.
3. Inserting the data into the structures.
4. Creating the calculated fields.
Before Analysis Services declares the data mining model complete, it checks that all steps have finished. If any step fails, the preceding steps are not committed.
OLAP Mining Model Editor: when processing has finished, click the Close button and wait for the OLAP Mining Model Editor to appear. The editor works essentially like the relational mining model editor. The slight difference is that the source of the model is OLAP rather than a relational database.
Content detail pane: the first thing to notice is that the nodes in the tree do not have valid field names as they do with a relational database.
Prediction tree list: the prediction tree list contains the different decision tree structures present in the model. Each decision tree is identified by the decision field that is derived by using it.
Analyzing data with an OLAP data mining model: creating a model from OLAP is in many ways similar to creating one from a relational database source. The underlying data mining tasks we seek to accomplish are the same regardless of the source, except that OLAP, unlike the relational model, offers the possibility of integrating the data mining model itself into the OLAP cube that is used as its source.