Slideshare - VN: System Integration Lecture, Lesson 4, University of Economics Ho Chi Minh City (DH Kinh Te TP HCM)
Course lecture
SYSTEM INTEGRATION
LESSON 4: DATA WAREHOUSE

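Objectives
After completing this lesson, students will be able to:
  Understand the concept of a data warehouse and the characteristics of the data warehouse model
  Know the multidimensional data integration models
  Grasp the data warehouse architecture
  Grasp the methods for analysis and mining on a data warehouse
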
References
Paulraj Ponniah, Data Warehousing, 2001
W. H. Inmon, Building the Data Warehouse (Third Edition), 2002

Contents
Data warehouse concepts
Multidimensional data model
Data warehouse architecture

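Data warehouse concepts
A data warehouse is defined as:
  A decision-support database that is maintained separately from the organization's operational database.
  A platform that supports information processing by providing consolidated data for analysis.
  A subject-oriented, integrated, time-variant, and non-volatile collection of data that supports management's decision-making process.
Four characteristics: subject-oriented, integrated, time-variant, and non-volatile.
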
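Data warehouse: concept
A data warehouse:
  Provides an integrated, enterprise-wide view of the business
  Makes the enterprise's current and historical information readily available for decision making
  Makes decision-support transactions possible without hindering the operational systems
  Provides consistency of enterprise information
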
Data warehouse architecture

Data warehousing
The process of building and using a data warehouse

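Data warehouse characteristic: subject-oriented
Organized around major subjects, such as customer, product, and sales.
Focuses on modeling and analyzing data for decision making.
Provides a simple, concise view of particular subject issues in the decision-making process.
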
Data warehouse characteristic: subject-oriented (2)
[Figure: operational applications versus data warehouse subjects]

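Data warehouse characteristic: integrated
Built by integrating multiple heterogeneous data sources:
  Relational databases, flat files (database contents encoded in special formats such as .txt or .ini), and online transaction records
Data cleaning and data integration techniques are applied.
  They ensure consistency in naming conventions, encoding structures, attribute measures, and so on across the different data sources.
  Example, hotel price: currency, tax, whether breakfast is included
Data is converted when it is moved to the data warehouse.
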
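The conversion step described above can be sketched as follows. This is only an illustration of the hotel-price example: the record layout, the exchange rates, the tax rate, and the breakfast surcharge are all made-up assumptions, not values from the lecture.

```python
from dataclasses import dataclass

# Illustrative assumptions only: fixed exchange rates to USD, a flat tax rate,
# and a flat breakfast surcharge. None of these values come from the lecture.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}
TAX_RATE = 0.10
BREAKFAST_USD = 5.0


@dataclass
class HotelPrice:
    """A price record as it might arrive from one heterogeneous source."""
    hotel: str
    amount: float
    currency: str
    tax_included: bool
    breakfast_included: bool


def to_warehouse_price(p: HotelPrice) -> float:
    """Convert to one warehouse convention: USD, tax and breakfast included."""
    usd = p.amount * RATES_TO_USD[p.currency]
    if not p.tax_included:
        usd *= 1 + TAX_RATE
    if not p.breakfast_included:
        usd += BREAKFAST_USD
    return round(usd, 2)


sources = [
    HotelPrice("A", 100.0, "USD", tax_included=False, breakfast_included=True),
    HotelPrice("B", 80.0, "EUR", tax_included=True, breakfast_included=False),
]
normalized = {p.hotel: to_warehouse_price(p) for p in sources}
print(normalized)
```

After conversion, prices from both sources follow the same convention and can be compared directly.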
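Data warehouse characteristic: time-variant
The time horizon of a data warehouse is significantly longer than that of an operational database system.
  Operational database: current-value data.
  Data warehouse data: provides information from a historical perspective (for example, the past 5-10 years).
Every key structure in the data warehouse
  Contains a time element
  The key of operational data, however, may or may not contain a time element.
April 26, 2014
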
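Data warehouse characteristic: time-variant (2)
Operational system:
  Time horizon: current to 60-90 days
  Records are updated
  Key structure may or may not contain a time element
Data warehouse:
  Time horizon: 5-10 years
  Key structure contains a time element
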
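Data warehouse characteristic: non-volatile
Data transferred from the operational environment is stored in a physically separate store.
Operational updates of data do not occur in the data warehouse environment.
  No transaction processing, recovery, or concurrency control mechanisms are required.
  Only two data access operations are required:
    Loading data and accessing data. Source data is not modified in the data warehouse.
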
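A minimal sketch of the two permitted operations, loading and read access. The class name `NonVolatileStore` and its methods are hypothetical and serve only to illustrate the idea; a real warehouse is of course not an in-memory list.

```python
class NonVolatileStore:
    """Hypothetical append-only store: it supports loading and reading only."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Initial or periodic load: rows are appended, never modified in place.
        self._rows.extend(tuple(r) for r in rows)

    def query(self, predicate=lambda row: True):
        # Read-only access: returns a new list of immutable tuples.
        return [r for r in self._rows if predicate(r)]


store = NonVolatileStore()
store.load([("2014-01", "TV", 120), ("2014-02", "TV", 90)])
store.load([("2014-03", "TV", 150)])  # a later load adds history, it does not overwrite

tv_over_100 = store.query(lambda r: r[2] > 100)
print(tv_over_100)
```

Because there is deliberately no update or delete method, the store needs no transaction, recovery, or concurrency-control logic.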
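Data warehouse versus operational DBMS
OLTP (online transaction processing)
  The main task of a traditional relational DBMS
  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (online analytical processing)
  The main task of a data warehouse system
  Data analysis and decision making
Distinguishing features (OLTP <> OLAP):
  User and system orientation: customer <> market
  Data contents: current, detailed <> historical, consolidated
  Database design: ER + application <> star + subject
  View: current, local <> evolutionary, integrated
  Access patterns: update access <> read-only access with complex queries
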
OLTP <> OLAP
                      OLTP                              OLAP
Users                 Clerks, IT professionals          Knowledge workers
Function              Day-to-day operations             Decision support
Database design       Application-oriented              Subject-oriented
Data                  Current, up-to-date, detailed,    Historical, summarized, integrated,
                      flat relational, isolated         multidimensional, consolidated
Usage                 Repetitive                        Ad hoc
Access                Read/write;                       Many scans
                      index/hash on primary key
Unit of work          Short, simple transaction         Complex query
# records accessed    Tens                              Millions
# users               Thousands                         Hundreds
Database size         100 MB-GB                         100 GB-TB
Metric                Transaction throughput            Query throughput, response time

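The difference in the unit of work can be illustrated with a small sketch using Python's built-in sqlite3 module. The `orders` table and its rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, year, amount) VALUES (?, ?, ?)",
    [("an", 2012, 10.0), ("binh", 2012, 20.0),
     ("an", 2013, 30.0), ("binh", 2013, 40.0)],
)

# OLTP-style unit of work: a short transaction that updates one row,
# located through the primary key.
with conn:
    conn.execute("UPDATE orders SET amount = amount + 5 WHERE order_id = ?", (1,))

# OLAP-style unit of work: a read-only query that scans and summarizes many rows.
yearly = conn.execute(
    "SELECT year, SUM(amount) FROM orders GROUP BY year ORDER BY year"
).fetchall()
print(yearly)
```

On a table this small the two run equally fast; the contrast in the table above appears at scale, where the first statement touches one record and the second scans millions.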
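A separate data warehouse
High performance for both systems
  DBMS tuned for OLTP: access methods, indexing, concurrency control, recovery
  Warehouse tuned for OLAP: complex OLAP queries, multidimensional views, consolidation
Different functions and different data:
  Missing data: decision support requires historical data, which operational databases typically do not maintain
  Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources
  Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled
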
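Conceptual modeling of data warehouses
Data warehouse modeling: dimensions and measures
  Star schema: a central fact table connected to a set of dimension tables
  Snowflake schema: a refinement of the star schema in which some dimension structures are normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
  Fact constellation schema: multiple fact tables share dimension tables, which can be viewed as a collection of stars; it is therefore also called a galaxy schema or fact constellation
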
Example of a star schema
Sales Fact Table: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension table time: time_key, day, day_of_the_week, month, quarter, year
Dimension table item: item_key, item_name, brand, type, supplier_type
Dimension table branch: branch_key, branch_name, branch_type
Dimension table location: location_key, street, city, state_or_province, country

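The star schema above can be sketched with Python's built-in sqlite3 module. Table and column names follow the slide; the inserted rows are made-up sample data, and avg_sales is left out here on the assumption that it can be derived from the other two measures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time     (time_key INTEGER PRIMARY KEY, day INTEGER, day_of_the_week TEXT,
                       month INTEGER, quarter TEXT, year INTEGER);
CREATE TABLE item     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT,
                       type TEXT, supplier_type TEXT);
CREATE TABLE branch   (branch_key INTEGER PRIMARY KEY, branch_name TEXT, branch_type TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT, city TEXT,
                       state_or_province TEXT, country TEXT);
-- Central fact table: one foreign key per dimension plus the measures.
CREATE TABLE sales (
    time_key     INTEGER REFERENCES time(time_key),
    item_key     INTEGER REFERENCES item(item_key),
    branch_key   INTEGER REFERENCES branch(branch_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO time VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 15, "Tue", 1, "Q1", 2013), (2, 20, "Sat", 4, "Q2", 2013)])
conn.executemany("INSERT INTO item VALUES (?, ?, ?, ?, ?)",
                 [(1, "TV-40", "BrandX", "TV", "wholesale"),
                  (2, "PC-1", "BrandY", "PC", "retail")])
conn.execute("INSERT INTO branch VALUES (1, 'B1', 'store')")
conn.execute("INSERT INTO location VALUES (1, '1 Main St', 'HCMC', 'HCMC', 'Vietnam')")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 2, 800.0), (1, 2, 1, 1, 1, 500.0), (2, 1, 1, 1, 3, 1200.0)])

# A typical star join: the fact table joined to two of its dimension tables.
rows = conn.execute("""
    SELECT t.quarter, i.type, SUM(s.units_sold), SUM(s.dollars_sold)
    FROM sales AS s
    JOIN time AS t ON s.time_key = t.time_key
    JOIN item AS i ON s.item_key = i.item_key
    GROUP BY t.quarter, i.type
    ORDER BY t.quarter, i.type
""").fetchall()
print(rows)
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.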
Example of a snowflake schema
Sales Fact Table: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Dimension table time: time_key, day, day_of_the_week, month, quarter, year
Dimension table item: item_key, item_name, brand, type, supplier_key
  Dimension table supplier: supplier_key, supplier_type
Dimension table branch: branch_key, branch_name, branch_type
Dimension table location: location_key, street, city_key
  Dimension table city: city_key, city, state_or_province, country

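To show what the normalization changes in practice, the following sketch models only the location branch of the snowflake (location and city) with sqlite3. The fact table is reduced to one key and one measure, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The location dimension is normalized: city attributes move to their own table.
CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT,
                       state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
-- Reduced fact table (assumption: only one dimension key and one measure).
CREATE TABLE sales    (location_key INTEGER REFERENCES location(location_key),
                       dollars_sold REAL);

INSERT INTO city VALUES (1, 'Vancouver', 'BC', 'Canada'), (2, 'Chicago', 'IL', 'USA');
INSERT INTO location VALUES (10, '1 First St', 1), (11, '2 Second St', 1), (12, '3 Third St', 2);
INSERT INTO sales VALUES (10, 100.0), (11, 50.0), (12, 70.0);
""")

# Reaching the country now takes two joins instead of one.
by_country = conn.execute("""
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales AS s
    JOIN location AS l ON s.location_key = l.location_key
    JOIN city AS c ON l.city_key = c.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()
print(by_country)
```

The city attributes are stored once per city rather than once per street address, at the cost of an extra join in queries.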
Example of a fact constellation
Dimension table time: time_key, day, day_of_the_week, month, quarter, year
Dimension table item: item_key, item_name, brand, type, supplier_type
Sales Fact Table: time_key, item_key, branch_key
Shipping Fact Table: time_key, item_key, shipper_key, from_location

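Measures: three categories
  Distributive: the result obtained by applying the function to n aggregate values is the same as the result obtained by applying the function to all the data without partitioning.
    For example, count(), sum(), min(), max().
  Algebraic: the measure can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function.
    For example, avg(), min_N(), standard_deviation().
  Holistic: there is no constant bound on the storage size needed to describe a sub-aggregate.
    For example, median(), mode(), rank().
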
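The three categories can be checked on a small example. The data values and the way they are partitioned below are arbitrary assumptions chosen for illustration.

```python
import statistics

data = [1, 2, 3, 4, 100, 200, 320]
partitions = [data[:3], data[3:5], data[5:]]  # an arbitrary partitioning

# Distributive: combining the per-partition sums gives the global sum.
partial_sums = [sum(p) for p in partitions]
total = sum(partial_sums)

# Algebraic: avg is computed from two distributive aggregates (sum and count),
# so each partition only needs to keep two numbers.
partial_counts = [len(p) for p in partitions]
avg_from_parts = total / sum(partial_counts)

# Holistic: the median of the partition medians is in general NOT the global
# median, so no fixed-size per-partition summary is enough.
median_of_medians = statistics.median(statistics.median(p) for p in partitions)
global_median = statistics.median(data)

print(total, avg_from_parts, median_of_medians, global_median)
```

In this sketch the combined sum and average match the values computed on the whole data set, while the median of medians does not match the true median.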
Multidimensional data
Sales volume as a function of product, month, and region
  Dimensions: Product, Location, Time
  Hierarchical summarization paths
[Figure: hierarchical summarization paths; surviving level labels: Office, Day, Month]

A sample data cube
[Figure: data cube with three dimensions: Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, PC, VCR, sum), and Country (U.S.A., Canada, Mexico, sum). The highlighted cell is the total annual sales of TV in U.S.A.]

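A cube with a "sum" member on every dimension, like the one in the figure, can be sketched with a plain dictionary. The sales figures below are made up for illustration, and `build_cube` is a hypothetical helper, not part of any library.

```python
from itertools import product

# Made-up sales figures keyed by (product, quarter, country); illustrative only.
sales = {
    ("TV", "1Qtr", "U.S.A."): 10,
    ("TV", "2Qtr", "U.S.A."): 20,
    ("TV", "1Qtr", "Canada"): 5,
    ("PC", "1Qtr", "U.S.A."): 7,
    ("VCR", "4Qtr", "Mexico"): 3,
}


def build_cube(facts):
    """Add a 'sum' member to each dimension by aggregating every fact into
    all 2^3 combinations of (actual value | 'sum')."""
    cube = {}
    for (item, quarter, country), value in facts.items():
        for key in product((item, "sum"), (quarter, "sum"), (country, "sum")):
            cube[key] = cube.get(key, 0) + value
    return cube


cube = build_cube(sales)
print(cube[("TV", "sum", "U.S.A.")])  # total annual sales of TV in U.S.A. -> 30
print(cube[("sum", "sum", "sum")])    # grand total over all dimensions -> 45
```

The cell highlighted in the figure corresponds to the key `("TV", "sum", "U.S.A.")`: the Date dimension is summed out while Product and Country stay fixed.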
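Using the data warehouse
Three kinds of data warehouse applications
  Information processing
    Supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts, and graphs
  Analytical processing
    Multidimensional analysis of the data in the warehouse
    Supports basic OLAP operations: roll-up, drill-down, pivoting
  Data mining
    Discovery of hidden knowledge
    Supports association analysis, building analytical models, performing classification and prediction, and presenting mining results using visualization tools
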
Data warehouse design: a business analysis framework
Four views regarding the design of a data warehouse
  Top-down view
    Allows selection of the relevant information needed for the data warehouse
  Data source view
    Exposes the information being captured, stored, and managed by the operational systems
  Data warehouse view
    Consists of the fact tables and dimension tables
  Business query view
    Shows the perspective of the data in the warehouse from the end user's point of view

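Data warehouse design process
Top-down, bottom-up, or a combination of both approaches
  Top-down: starts with overall design and planning (mature)
  Bottom-up: starts with experiments and prototypes (rapid)
From a software engineering point of view
  Waterfall: structured and systematic analysis at each step before proceeding to the next
  Spiral: rapid generation of increasingly functional systems, with short, quick turnaround cycles
Typical data warehouse design process
  Choose a business process to model, such as orders, deliveries, etc.
  Choose the grain of data for the business process
  Choose the dimensions that will apply to each fact table record
  Choose the measures that will populate each fact table record
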
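The four choices of the typical design process can be recorded in a small structure. The class `FactTableDesign` is hypothetical, and the example values reuse the sales star schema shown earlier in the lesson; the grain chosen here is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FactTableDesign:
    """Hypothetical record of the four design choices for one fact table."""
    business_process: str                            # step 1: process to model
    grain: str                                       # step 2: level of detail per record
    dimensions: list = field(default_factory=list)   # step 3: dimensions per record
    measures: list = field(default_factory=list)     # step 4: measures per record

    def columns(self):
        # One foreign key per dimension, followed by the measures.
        return [f"{d}_key" for d in self.dimensions] + list(self.measures)


design = FactTableDesign(
    business_process="sales",
    grain="one row per item, per branch, per day",  # assumed grain
    dimensions=["time", "item", "branch", "location"],
    measures=["units_sold", "dollars_sold"],
)
print(design.columns())
```

Fixing the grain before the dimensions and measures matters because it determines what a single fact table record means.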
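Multi-tiered architecture
[Figure: multi-tiered architecture. Operational DBs and other sources feed an Extract / Transform / Load / Refresh step, together with a Monitor & Integrator and Metadata, into the Data Warehouse and Data Marts; an OLAP Server serves Analysis, Query, Reports, and Data mining tools.]
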
Three data warehouse models
  Enterprise warehouse
    Collects all of the information about subjects spanning the entire enterprise
  Data mart
    A subset of enterprise-wide data that is of value to a specific group of users. Its scope is confined to specific, selected groups, for example a marketing data mart.
    Independent <> dependent (sourced directly from the data warehouse) data marts
  Virtual warehouse
    A set of views over operational databases

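A virtual warehouse can be sketched as a summary view defined directly on an operational table, here with sqlite3. The `orders` table, the `monthly_sales` view, and the rows are assumptions made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Operational table (made-up sample data).
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product TEXT, month TEXT, amount REAL);
INSERT INTO orders (product, month, amount) VALUES
    ('TV', '2014-01', 100.0), ('TV', '2014-01', 50.0), ('PC', '2014-02', 70.0);

-- The "virtual warehouse": a summary view; no data is copied or stored.
CREATE VIEW monthly_sales AS
    SELECT product, month, SUM(amount) AS total
    FROM orders
    GROUP BY product, month;
""")

before = conn.execute("SELECT * FROM monthly_sales ORDER BY product, month").fetchall()

# An operational insert is immediately visible through the view.
conn.execute("INSERT INTO orders (product, month, amount) VALUES ('PC', '2014-02', 30.0)")
after = conn.execute("SELECT * FROM monthly_sales ORDER BY product, month").fetchall()
print(before, after)
```

The view is cheap to set up, but every analytical query runs against the operational database, so it competes with the operational workload.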
Multidimensional data model
Business managers tend to think multidimensionally. For example, they tend to describe what the company does as follows:
  "We sell products in various markets, and we measure our performance over time."
Data warehouse (DWH) designers usually listen carefully and add special emphasis:
  "We sell PRODUCTS in various MARKETS, and we measure our performance over TIME."

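Multidimensional data model (2)
Modeling the dimensions of a business
Intuition: the business as a cube of data:
  There are labels on each edge of the cube.
  Points inside the cube are at the intersections of the edges.
For the business description above:
  The edges are Product, Market, and Time.
  It is easy to understand and imagine that the points inside the cube are measures of business performance for a combination of Product, Market, and Time values.
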
ONLINE ANALYTICAL PROCESSING
An OLAP (Online Analytical Processing) system
A management system that allows data to be analyzed by:
  Slicing the data along many different aspects
  Drilling down to a more detailed level
  Rolling up to a more summarized level
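
The three operations can be sketched on a small in-memory fact list. The field layout, the helper names `aggregate` and `slice_`, and the sample rows are assumptions for illustration only; an actual OLAP server works on precomputed cubes rather than Python lists.

```python
from collections import defaultdict

# Illustrative facts at the most detailed level:
# (product, city, country, month, quarter, units).
facts = [
    ("TV", "Hanoi", "Vietnam", "Jan", "Q1", 4),
    ("TV", "Hanoi", "Vietnam", "Feb", "Q1", 6),
    ("TV", "Hue", "Vietnam", "Apr", "Q2", 5),
    ("PC", "Hanoi", "Vietnam", "Jan", "Q1", 2),
]
FIELDS = ("product", "city", "country", "month", "quarter")


def aggregate(rows, by):
    """Sum the units grouped by the named fields."""
    idx = [FIELDS.index(f) for f in by]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def slice_(rows, field, value):
    """Slice: fix one dimension to a single value."""
    i = FIELDS.index(field)
    return [r for r in rows if r[i] == value]


# Roll-up: summarize the time dimension at the quarter level.
by_quarter = aggregate(facts, ["product", "quarter"])
# Drill-down: go back to the more detailed month level.
by_month = aggregate(facts, ["product", "month"])
# Slice: keep only product = TV, then summarize by city.
tv_by_city = aggregate(slice_(facts, "product", "TV"), ["city"])

print(by_quarter, by_month, tv_by_city)
```

Roll-up and drill-down are the same aggregation taken at different levels of a dimension hierarchy (here month and quarter), while a slice restricts the data before aggregating.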
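The core nature of OLAP
  Data is extracted from the data warehouse or from a data mart (a subject-specific data warehouse)
  Data is converted into a multidimensional model
  Data is stored in a multidimensional data store.
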
ONLINE ANALYTICAL PROCESSING (2)
The main object of OLAP is the cube: a multidimensional representation of detailed and summarized data.
Recall: a cube consists of one fact table (Fact) and one or more