Introduction to Online Analytical Processing (OLAP) with SQL SERVER
OLAP (On-Line Analytical Processing) with SQL SERVER
Nguyễn Văn Chức, chucnv@ud.edu.vn
In data warehouse technology, OLAP is the primary technique for accessing data in the warehouse. Data in a DW is organized as multidimensional data cubes, and OLAP is used to analyze that cube data.
This article shows how to implement OLAP on the SQL Server DBMS, version 2005 or later.
Data Warehouse (DW): a collection of subject-oriented, historical databases integrated from multiple data sources through extraction, consolidation, transformation, and cleaning.
Data Cube: Data in the warehouse is represented in multidimensional form, called a cube. Each dimension describes one characteristic of the data. For example, in a sales data cube the item dimension (Item) describes the details of the goods, the time dimension (time) describes when the sale took place, the branch dimension (Branch) describes the sales agents, and so on.
To make the data cube concept clearer, the figure below illustrates how sales data moves from a table (spreadsheet) to cube form with 3 dimensions: Location (Cities), Time (Quarters), and Item (Types).
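The move from a flat table to a cube can be sketched in a few lines of Python. This is only an illustration of the idea, not how Analysis Services stores cubes; the sample rows and the `total` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical flat sales records: (city, quarter, item_type, units_sold).
rows = [
    ("Danang", "Q1", "CPU", 10),
    ("Danang", "Q1", "RAM", 4),
    ("Danang", "Q2", "CPU", 7),
    ("HCM", "Q1", "CPU", 12),
    ("HCM", "Q2", "RAM", 5),
]

# The cube maps one coordinate per dimension to an aggregated cell value.
cube = defaultdict(int)
for city, quarter, item_type, units in rows:
    cube[(city, quarter, item_type)] += units


def total(city=None, quarter=None, item_type=None):
    """Sum the cells that match the fixed coordinates; None means 'all'."""
    return sum(
        value
        for (c, q, i), value in cube.items()
        if city in (None, c) and quarter in (None, q) and item_type in (None, i)
    )


print(total(city="Danang"))                  # 21: Danang, all quarters and items
print(total(quarter="Q1", item_type="CPU"))  # 22: CPUs in Q1, all cities
```

Fixing a coordinate on one dimension and summing over the others is the basic operation that the cube browser performs later in this article.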
Star Schema: This is the model used to represent DW data. A star schema basically consists of a fact table and several dimension tables. The fact table tracks changes in the data; its structure consists of foreign keys that are the primary keys of the dimension tables. The dimension tables describe the characteristics of each dimension, such as the time dimension, the customer dimension, the item dimension, and so on.
The star schema of the sales problem is illustrated below. The same data is used in the following sections to demonstrate OLAP on SQL Server. The fact table is Sales, and the 4 dimension tables are time (time dimension), item (item dimension), location (location dimension), and branch (branch dimension).
Measure: A quantity that can be computed from the attributes of the fact table. Measures are the goal of OLAP and must be defined before the analysis begins. Examples are the total sales amount of a branch, or the revenue of each item by quarter.
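A measure is, in effect, an aggregate over fact-table columns grouped by a dimension. The sketch below uses Python's built-in `sqlite3` module as a stand-in for SQL Server; the table layout and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE sales (branch_key INTEGER, dollars_sold REAL, units_sold INTEGER);
INSERT INTO branch VALUES (1, 'Danang'), (2, 'HCM');
INSERT INTO sales VALUES (1, 100.0, 2), (1, 50.0, 1), (2, 300.0, 3);
""")

# Three measures per branch: number of sales rows, total units, total dollars.
measures = conn.execute("""
SELECT b.branch_name,
       COUNT(*)            AS sales_count,
       SUM(s.units_sold)   AS units_sold,
       SUM(s.dollars_sold) AS dollars_sold
FROM sales AS s
JOIN branch AS b ON b.branch_key = s.branch_key
GROUP BY b.branch_name
ORDER BY b.branch_name
""").fetchall()

for row in measures:
    print(row)
```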
Hierarchies: This concept describes the ordered levels of detail in the data. For the time dimension, for example, the order is day < week < month < quarter < year. Similarly, for the location dimension the order is street < city < province_or_state < country. Data analysis relies heavily on hierarchies to summarize or drill into each category of data in the DW.
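Rolling data up a hierarchy amounts to regrouping the same facts under a coarser key. The following is a minimal sketch for part of the time hierarchy; the daily figures and the `roll_up` helper are hypothetical, and the week level is left out for brevity.

```python
from collections import Counter
from datetime import date

# Hypothetical units sold per day (the finest level of the time hierarchy).
daily_units = {
    date(2008, 1, 15): 5,
    date(2008, 2, 3): 3,
    date(2008, 4, 20): 7,
    date(2009, 1, 2): 4,
}

# Each level maps a day to the key of the coarser group that contains it.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "quarter": lambda d: (d.year, (d.month - 1) // 3 + 1),
    "year": lambda d: (d.year,),
}


def roll_up(level):
    """Aggregate the daily units to the requested hierarchy level."""
    totals = Counter()
    for day, units in daily_units.items():
        totals[LEVELS[level](day)] += units
    return dict(totals)


print(roll_up("quarter"))
print(roll_up("year"))
```

Drilling down is the reverse direction: moving from `year` back to `quarter` or `month` over the same underlying rows.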
The problem described in this section is the sales problem, with 1 fact table, Sales, and 4 dimension tables: time, item, location, and branch (see the star schema above).
Fact Table (Sales): Records the sales activity. It contains the foreign keys of the 4 dimension tables and 2 attributes: the sale amount (dollars_sold) and the quantity sold (units_sold).
The dimension tables:
Note: To get the OLAP analysis tools, you must install the full Developer or Enterprise Edition of SQL Server 2005 (2008), and during setup remember to select SQL Server Database Services and Analysis Services. The tool used to perform OLAP is SQL Server Business Intelligence Development Studio (BIDS). BIDS is installed automatically with the editions above.
Steps: Start SQL Server Management Studio, create a database named DW as follows, and insert some records into the tables for analysis.
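The table definitions are not reproduced in this text. As a rough sketch of what the DW schema could look like, the following builds the star schema with Python's built-in `sqlite3` module instead of SQL Server. Only the table names, `dollars_sold`, `units_sold`, and the hierarchy levels come from the article; every other column name and all sample values are assumptions.

```python
import sqlite3

# Assumed column names; the fact table holds one foreign key per dimension.
SCHEMA = """
CREATE TABLE "time" (
    time_key INTEGER PRIMARY KEY,
    day INTEGER, week INTEGER, month INTEGER, quarter INTEGER, year INTEGER);
CREATE TABLE item (
    item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE branch (
    branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location (
    location_key INTEGER PRIMARY KEY,
    street TEXT, city TEXT, province_or_state TEXT, country TEXT);
CREATE TABLE sales (
    time_key INTEGER REFERENCES "time"(time_key),
    item_key INTEGER REFERENCES item(item_key),
    branch_key INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    dollars_sold REAL,
    units_sold INTEGER);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# One hypothetical row per dimension, then a fact row that references them.
conn.execute('INSERT INTO "time" VALUES (1, 15, 3, 1, 1, 2008)')
conn.execute("INSERT INTO item VALUES (1, 'CPU', 'Intel', 'Hardware')")
conn.execute("INSERT INTO branch VALUES (1, 'Danang')")
conn.execute("INSERT INTO location VALUES (1, 'Main St', 'Danang', 'Danang', 'Vietnam')")
conn.execute("INSERT INTO sales VALUES (1, 1, 1, 1, 250.0, 2)")

tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

On SQL Server the same structure would be written in T-SQL; the point here is only the shape of the schema, with the fact table in the center and the dimension tables around it.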
In the Solution Explorer window of the OLAP_DW project, right-click Data Source to create a connection to the data used for analysis.
Name the Data Source and click Finish to complete the connection to the database. Next, create a Data Source View to pick the data tables needed for analysis: right-click Data Source View in the Solution Explorer window and choose New Data Source View.
Note: If you want to select the fact table together with the dimension tables related to it, just move the fact table to the right-hand pane and click the "Add Related Tables" button to pull in the related dimension tables automatically. When this is done, the fact and dimension tables look like this:
After creating the Data Source and the Data Source View, create the data cube for analysis by right-clicking Cube in Solution Explorer and choosing New Cube.
Click Next and select the data source for the cube (DW); the system will automatically detect the fact and dimension tables.
The result is as follows:
Click Next to set up the time dimension. Note that time is a very important dimension in data warehouses in general and in OLAP analysis in particular. If you do not specify a time dimension, the system will automatically create one to manage it.
Click Next to define the measures for the analysis. Recall that measures are the quantities that reflect the goals of the analysis and calculation. They are operations on the computable attributes of the fact table.
Click Next; the system will automatically detect the hierarchies in the dimension tables.
After creating the data cube for analysis, run OLAP by right-clicking the project name in Solution Explorer and choosing Deploy.
Once the project has been deployed, you can perform OLAP analyses to support management work: right-click the cube in Solution Explorer and choose Browse to bring up the analysis view:
The right-hand panel is divided into 2 windows. The upper window is used to specify the analysis conditions, and the lower window shows the values of the measures once you drag and drop them from the left-hand panel. Set up the analysis expressions to suit the purpose of the analysis. For example, the setup below asks for the number of sales (Sales Count) and the total quantity (Unit Sold) of goods sold by the Danang branch.
The setup below shows how many times items of the Intel brand were sold, and in what total quantity, at the HCM branch.
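The two browser setups just described correspond to filtered aggregate queries over the star schema. The sketch below reproduces them with Python's built-in `sqlite3` module as a stand-in for the cube; the table columns, the sample rows, and the `browse` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE sales (item_key INTEGER, branch_key INTEGER, units_sold INTEGER);
INSERT INTO item VALUES (1, 'CPU', 'Intel'), (2, 'CPU', 'AMD');
INSERT INTO branch VALUES (1, 'Danang'), (2, 'HCM');
INSERT INTO sales VALUES (1, 1, 4), (1, 2, 6), (1, 2, 2), (2, 2, 9);
""")


def browse(branch_name, brand=None):
    """Return (sales_count, units_sold) for one branch, optionally one brand."""
    sql = """
        SELECT COUNT(*), COALESCE(SUM(s.units_sold), 0)
        FROM sales AS s
        JOIN branch AS b ON b.branch_key = s.branch_key
        JOIN item AS i ON i.item_key = s.item_key
        WHERE b.branch_name = ? AND (? IS NULL OR i.brand = ?)
    """
    return conn.execute(sql, (branch_name, brand, brand)).fetchone()


print(browse("Danang"))        # all items sold by the Danang branch
print(browse("HCM", "Intel"))  # Intel items sold by the HCM branch
```

The filter conditions play the role of the upper window in the cube browser, and the two aggregates play the role of the measures dropped into the lower window.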
The OLAP design screen is flexible and very easy to use: you can drag and drop dimensions and measures from the left-hand panel to the right-hand panel. For example, drag the Branch Name attribute of the Branches dimension to the right-hand panel and the system will show the quantity and the number of sales of products for each branch, as follows:
Depending on your analysis needs, you can create slices of the data along different dimensions to produce the summaries you need from the warehouse quickly and conveniently. The figure below shows the quantity and the number of sales of each item for each branch, based on a slice over the 2 dimensions Branches and Items.
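A two-dimensional slice of this kind is essentially a cross-tabulation of the fact rows by branch and item. A minimal sketch in plain Python, with hypothetical sample rows:

```python
from collections import defaultdict

# Hypothetical fact rows: (branch_name, item_name, units_sold).
sales = [
    ("Danang", "CPU", 4),
    ("Danang", "RAM", 3),
    ("HCM", "CPU", 6),
    ("HCM", "CPU", 2),
    ("HCM", "RAM", 9),
]

# One cell per (branch, item) pair, holding both measures.
pivot = defaultdict(lambda: {"sales_count": 0, "units_sold": 0})
for branch, item, units in sales:
    cell = pivot[(branch, item)]
    cell["sales_count"] += 1
    cell["units_sold"] += units

for (branch, item), cell in sorted(pivot.items()):
    print(branch, item, cell["sales_count"], cell["units_sold"])
```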
The Dimension Usage, Calculations, KPIs, Actions, Partitions, Perspectives, and Translations tabs are used to extend the analytical capabilities of OLAP. Besides OLAP analysis, SQL Server Business Intelligence Development Studio also provides data mining techniques such as Regression, Association, Decision Tree, Time Series, and Clustering under Mining Structure, which are powerful and convenient for building data mining models (to be covered in another article).
Please send all comments to chucnv@ud.edu.vn. Thank you!
Keywords: data warehouse, OLAP, online analytical processing, Data Mining, Business Intelligence, management informatics, Management Information Systems