
Data Warehouse

Business Intelligence
Combination of technologies like
Data Warehousing (DW)
On-Line Analytical Processing (OLAP)
Data Mining (DM)
Data Visualization (VIS)
Decision Analysis (what-if)
Customer Relationship Management (CRM)
Operational Data
Presents a dynamic view of the business
Must be kept up-to-date and current at all times
Updated by transactions entered by data-entry operators or specially trained end users
Is maintained in detail
Utilization is predictable. Systems can be optimized for projected workloads
High volume of transactions, each of which affects a small portion of the data
Users do not need to understand data structures
Functional orientation
Analytical Data
Presents a static view of the business
End-user access is usually read-only
More concerned with summary information
Usage is unpredictable in terms of depth of information needed by the user
Smaller number of queries, each of which may access large amounts of data
Users need to understand the structure of the data (and business rules) to draw meaningful conclusions from the data
Subject orientation
Database
Broadly classified into
1) OLTP (Online Transactional Processing) DB
2) OLAP (Online Analytical Processing) DB
OLAP
Slicing and dicing of data is called Online Analytical Processing (OLAP).
OLAP, rather than OLTP, serves the needs of data warehousing.
OLAP systems allow ad hoc processing and support access to data over time periods.
OLAP systems are the aggregation, transformation, integration and historical collection of OLTP data from one or more systems.
Typical OLAP operations:
1) Roll up (drill up)
- summarize data by climbing a hierarchy or by dimension reduction.
2) Drill down (roll down)
- go from a higher level summary to a lower level summary or detailed data, or
- introduce new dimensions
3) Slice and dice
- project and select
4) Pivot (rotate)
- reorient the cube for visualization, e.g. 3D to a series of 2D planes.
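The operations listed above can be tried out on a tiny in-memory cube. The following is only a sketch: the sample rows, the dimension names (year, quarter, region, product) and the helper functions are assumptions made for this illustration and do not come from any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
ROWS = [
    (2004, "Q1", "East", "Pen", 100),
    (2004, "Q1", "West", "Pen", 80),
    (2004, "Q2", "East", "Ink", 50),
    (2005, "Q1", "East", "Pen", 120),
    (2005, "Q1", "West", "Ink", 70),
]
COLUMNS = ("year", "quarter", "region", "product")


def aggregate(rows, dims):
    """Sum sales grouped by the named dimensions."""
    idx = [COLUMNS.index(d) for d in dims]
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[i] for i in idx)] += row[4]
    return dict(totals)


def slice_rows(rows, dim, value):
    """Slice: keep only the rows where one dimension has a fixed value."""
    i = COLUMNS.index(dim)
    return [r for r in rows if r[i] == value]


def pivot(rows, row_dim, col_dim):
    """Pivot: arrange totals as a 2D table {row value: {column value: sales}}."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(rows, (row_dim, col_dim)).items():
        table[r][c] = total
    return dict(table)


# Drill down adds the quarter level; roll up removes it again.
by_quarter = aggregate(ROWS, ("year", "quarter"))
by_year = aggregate(ROWS, ("year",))
east_only = slice_rows(ROWS, "region", "East")
region_by_year = pivot(ROWS, "region", "year")
```

Here by_year is the rolled-up view of by_quarter, east_only is a slice on the region dimension, and region_by_year is one 2D plane of the cube.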
OLAP vs OLTP
Sl. no. | OLTP | OLAP
1) | Transaction oriented | Decision oriented (reports)
2) | Complex data model (fully normalized) | Simple data model (multidimensional/denormalized)
3) | Smaller data volume (little historical data) | Larger data volumes (collection of historical data)
4) | Many "small" queries | Fewer, but "bigger" queries
5) | Frequent updates | Frequent reads, infrequent updates (daily)
6) | Huge number of users (clerks) | Only a few users (management personnel)
Objective of Data Warehouse
The primary purpose of a data warehouse is to provide easy access to specially prepared data that can be used with decision support applications, such as management reporting, queries, decision support systems, and executive information systems.
Decision Support
A Decision Support System (DSS) is a system that provides managers with the information they need to make decisions. These systems have the effect of empowering employees at all levels, providing them access to business and financial information that directly impacts their productivity and quality of work.
Executive information systems
An Executive information system (EIS) is a concise snapshot of how the company is doing today. Consider it as an electronic executive briefing. EIS allows greater flexibility in "slicing-and-dicing" data, i.e., it allows exploration of data through multiple dimensions or views.
Why Data Warehouse?
By centralizing data
1) Queries can be answered locally without accessing the original information sources. Thus, high query performance can be obtained for the complex aggregation queries that are needed for in-depth analysis, decision support and data mining (a way of extracting relevant data from a vast database).
2) On-line Analytical Processing (OLAP) is decoupled (separated) as much as possible from On-line Transaction Processing (OLTP). This makes information accessible to decision makers while avoiding interference of OLAP with local processing at the operational sources.
Data Warehouse
A decision support database that is maintained separately from the organization's operational databases.
A Data Warehouse is an enterprise-wide collection of
Subject oriented
Integrated
Time variant
Non-volatile
data in support of management's decision making process.
- W. H. Inmon, 1993
- Subject Oriented - Data warehouses focus on high-level business entities like sales, marketing, etc.
- Integrated - Data in the warehouse is obtained from multiple sources and kept in a consistent format.
- Time-Varying - Every data component in the data warehouse is associated with some point in time, such as a week, month, quarter or year.
- Non-volatile - The DW stores historical data. Data does not change once it gets into the warehouse; it is only loaded/refreshed.
Data from the operational systems are
Extracted
Cleansed
Transformed
1) case conversion,
2) data trimming,
3) concatenation,
4) datatype conversion
Aggregated
Loaded into DW
Periodically refreshed to reflect updates at the sources and purged from the warehouse onto slower archival storage.
Use of DWH
Ad-hoc analyses and reports
Data mining: identification of trends
Management Information Systems
Designing a database for a Data Warehouse
1) Define user requirements, considering different views of users from different departments.
2) Identify data integrity, synchronization and security issues/bottlenecks.
3) Identify technology, performance, availability & utilization requirements.
4) Review normalized view of relational data to identify entities.
5) Identify dimensions.
6) Create and organize hierarchies of dimensions.
7) Identify attributes of dimensions.
8) Identify fact table(s).
9) Create data repository (metadata).
10) Add calculations.
Datamart
A datamart is a subset of a data warehouse, designed for a particular line of business, such as sales, marketing, or finance.
In a dependent data mart, data is derived from an enterprise-wide data warehouse.
In an independent data mart, data is collected directly from sources.
May be structured for specific access tools
Datamart is the data warehouse you really use
Why Datamart?
1) Data warehouse projects are very expensive and time-consuming.
2) The success rate of DWH projects is very low.
To avoid a single point of loss, we identify department-wise needs and build a datamart. If it succeeds, we go on to other departments and integrate all datamarts into a data warehouse.
Advantages
Improve data access performance
Simplify end-user data structures
Facilitate ad hoc reporting
Sl. no. | Data Warehouse | Data Mart
1) | A DW operates on an enterprise level and contains all data used for reporting and analysis | A Data Mart is used by a specific business department and is focused on a specific subject (business area).
A DM is a subset of the DWH.
DWH ARCHITECTURE
Data warehouse architecture is a way of representing the overall structure of data, communication, processing and presentation that is planned for end-user computing within the enterprise. The architecture has the following main parts:
Operational database
Information access layer
Data access layer
Data dictionary (metadata) layer
Process management layer
Application messaging layer
Processing (data warehouse) layer
Data staging layer
Operational data is the information related to the day-to-day functioning of an organization. An operational database stores business transactions critical to the functioning of the organization.
Information access layer is the layer that the end user deals with directly. Examples of these are ad-hoc query tools like Business Objects, Power Play and Impromptu.
Data access layer is the data interchange layer. This layer provides the interface between operational databases and information access layers. The common data language used is 'SQL'. A familiar example of a data access layer is 'ODBC'.
Metadata layer holds a repository of metadata information. Metadata is defined as data about data, resulting in an intelligent, efficient way to manage data. Metadata provides the structure and content of the data warehouse, source and mapping information, transformation/integration description and business rules. It is essential for quality improvement in a Data Warehouse.
Process management layer is involved in scheduling the various tasks that must be executed to build and maintain the data warehouse and data repository. It also helps to keep the Data Warehouse up-to-date.
Application messaging layer transports information around the enterprise's computing network. It also acts as 'middleware' and isolates applications from the exact data format on either end.
Processing (data warehouse) layer is the logical view of the informational data. It also performs the summarization, loading and processing of data from operational databases.
Data staging layer manages data replication across servers. It also manages data transformation.
ETL
1) ETL means Extraction, Transformation, and Loading.
2) ETL refers to the methods involved in accessing and manipulating source data and loading it into a target database.
ETL Process
ETL is a process that involves the following tasks:
extracting data from source operational or archive systems, which are the primary source of data for the data warehouse
transforming the data, which may involve cleaning, filtering, validating and applying business rules
loading the data into a data warehouse or any other database or application that houses data
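A minimal sketch of the three tasks follows. It assumes a small CSV text as the source and an in-memory SQLite database standing in for the warehouse; the column names, the business rule (reject non-positive amounts) and the table name sales_fact are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; a real source would be an operational system.
SOURCE_CSV = """order_id,customer,amount
1, alice ,10.50
2,BOB,-3.00
3,carol,7.25
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean, validate and apply a simple business rule."""
    cleaned = []
    for row in rows:
        amount = float(row["amount"])
        if amount <= 0:  # assumed rule: drop non-positive amounts
            continue
        customer = row["customer"].strip().title()
        cleaned.append((int(row["order_id"]), customer, amount))
    return cleaned


def load(conn, rows):
    """Load: insert the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute(
    "SELECT customer, amount FROM sales_fact ORDER BY order_id"
).fetchall()
```

Keeping the three tasks as separate functions mirrors the E, T and L split and lets each stage be checked on its own.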
Transform
1) Denormalize data
2) Data cleaning
3) Case conversion
4) Data trimming
5) String concatenation
6) Datatype conversion
7) Decoding
8) Calculation
9) Data correction
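Several of the transformations listed above are simple string or type operations. The sketch below applies them to one hypothetical source record; the field names and the gender code mapping used for decoding are assumptions for illustration only.

```python
from datetime import datetime

# Assumed decoding table: source code -> descriptive value.
GENDER_CODES = {"M": "Male", "F": "Female"}


def transform_record(raw):
    """Apply trimming, case conversion, concatenation, datatype
    conversion, decoding and a calculation to one source record."""
    first = raw["first_name"].strip().title()   # trimming + case conversion
    last = raw["last_name"].strip().title()
    quantity = int(raw["quantity"])             # datatype conversion (text to int)
    unit_price = float(raw["unit_price"])       # datatype conversion (text to float)
    gender_code = raw["gender"].strip().upper()
    return {
        "full_name": first + " " + last,        # string concatenation
        "gender": GENDER_CODES.get(gender_code, "Unknown"),  # decoding
        "order_date": datetime.strptime(raw["order_date"], "%d-%m-%Y").date(),
        "total": round(quantity * unit_price, 2),  # calculation
    }


record = transform_record({
    "first_name": "  john ",
    "last_name": "SMITH",
    "gender": "m",
    "quantity": "3",
    "unit_price": "19.99",
    "order_date": "15-01-2005",
})
```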
Cleansing
The process of resolving inconsistencies and fixing the anomalies in source data, typically as part of the ETL process.
Data Staging Area
1) Most complex part of the architecture.
2) A place where data is processed before entering the warehouse.
3) It involves:
Extraction (E)
Transformation (T)
Load (L)
Indexing
Popular ETL Tools
Tool Name | Company Name
Informatica | Informatica Corporation
DT/Studio | Embarcadero Technologies
DataStage | IBM
Ab Initio | Ab Initio Software Corporation
Data Junction | Pervasive Software
Oracle Warehouse Builder | Oracle Corporation
Microsoft SQL Server Integration | Microsoft
TransformOnDemand | Solonde
Transformation Manager | ETL Solutions
Dimensional Modeling
Means storing data in fact and dimension tables.
Here data is fully denormalized.
Dimension table
1) A dimension table gives the descriptive attributes of a business.
2) Dimension tables are fully denormalized.
3) It has a primary key.
4) Data is arranged in a hierarchical manner (product to category; month to year), so it can be used for drill-down and drill-up analysis.
5) Has relatively few records.
6) Has a large number of columns.
7) Heavily indexed.
8) Dimension tables are sometimes called lookup or reference tables.
Types of Dimensions
1) Normal Dimension
2) Conformed Dimension
3) Junk Dimension
4) Degenerated Dimension
5) Role Playing Dimension
Conformed Dimension
A dimension table used by more than one fact table is called a conformed dimension (dimensions that are linked to multiple fact tables).
[Figure: fact tables FT1, FT2 and FT3 linked to shared dimension tables such as D1, D2 and D3]
Adv:
1) Avoids unnecessary space
2) Reduces time
3) Allows drill across fact tables
Junk Dimension
A junk dimension is an abstract dimension that reduces the number of foreign keys in the fact table. This is achieved by combining 2 or more dimensions into a single dimension.
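As a sketch of the idea, the example below folds two low-cardinality flags into one junk dimension, so that a fact row carries a single foreign key instead of two. The flag names (payment_type, is_gift), the key numbering and the fact row are hypothetical.

```python
from itertools import product

PAYMENT_TYPES = ["cash", "card"]
GIFT_FLAGS = [False, True]

# Junk dimension: one row per combination of flag values, keyed by a surrogate key.
junk_dimension = {
    key: {"payment_type": pay, "is_gift": gift}
    for key, (pay, gift) in enumerate(product(PAYMENT_TYPES, GIFT_FLAGS), start=1)
}

# Reverse lookup used while loading facts.
junk_key_of = {
    (row["payment_type"], row["is_gift"]): key
    for key, row in junk_dimension.items()
}

# The fact row stores one junk key instead of two separate flag keys.
fact_row = {
    "product_key": 7,
    "junk_key": junk_key_of[("card", True)],
    "amount": 25.0,
}
```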
Degenerated Dimension
Means a key value or dimension which does not have descriptive attributes, i.e., a column that is neither a foreign key nor a numerical measure and is used for grouping purposes.
Ex: Invoice Number, Ticket Number
Role Playing Dimension
Means a single physical dimension table plays different roles with the help of views.
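A common illustration is a date dimension that acts both as order date and as ship date. The sketch below uses an in-memory SQLite database; the table, view and column names are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE order_fact (order_id INTEGER, order_date_key INTEGER, ship_date_key INTEGER);

-- One physical table, two roles exposed through views.
CREATE VIEW order_date_dim AS
    SELECT date_key AS order_date_key, calendar_date AS order_date FROM date_dim;
CREATE VIEW ship_date_dim AS
    SELECT date_key AS ship_date_key, calendar_date AS ship_date FROM date_dim;

INSERT INTO date_dim VALUES (1, '2005-01-01'), (2, '2005-01-05');
INSERT INTO order_fact VALUES (100, 1, 2);
""")

# The fact table joins to the same physical dimension twice, once per role.
rows = conn.execute("""
    SELECT f.order_id, o.order_date, s.ship_date
    FROM order_fact f
    JOIN order_date_dim o ON o.order_date_key = f.order_date_key
    JOIN ship_date_dim s ON s.ship_date_key = f.ship_date_key
""").fetchall()
```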
Fact Table
1) The centralized table in a star schema is called the FACT table.
2) A fact table typically has two types of columns:
Numerical measures and
Foreign keys to dimension tables.
3) The primary key of a fact table is usually a composite key that is made up of all of its foreign keys.
4) Fact tables store different types of measures like
additive,
non additive and
semi additive measures
5) A fact table might contain either detail level facts or facts that have been aggregated.
6) A fact table usually contains facts with the same level of aggregation.
7) Has millions of records
Measure Types
Additive - Measures that can be summed across all dimensions.
o Ex: sales
Non Additive - Measures that cannot be summed across any dimension.
o Ex: averages
Semi Additive - Measures that can be summed across some dimensions but not across others.
o Ex: inventory levels
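The difference between the three kinds of measure shows up as soon as the facts are summed. The following sketch uses hypothetical daily rows with one additive measure (sales) and one semi-additive measure (stock on hand); all names and numbers are made up for illustration.

```python
# Hypothetical daily facts: (day, store, sales, stock_on_hand)
FACTS = [
    ("Mon", "A", 10, 100),
    ("Mon", "B", 20, 50),
    ("Tue", "A", 30, 90),
    ("Tue", "B", 40, 45),
]

# Additive: sales can be summed over every dimension (days and stores).
total_sales = sum(sales for _, _, sales, _ in FACTS)

# Semi-additive: stock can be summed across stores for a single day ...
stock_on_tuesday = sum(stock for day, _, _, stock in FACTS if day == "Tue")
# ... but summing it across days counts the same goods more than once.
misleading_stock_total = sum(stock for _, _, _, stock in FACTS)

# Non-additive: an average is recomputed from its additive parts
# rather than obtained by adding pre-computed averages.
average_sale = total_sales / len(FACTS)
```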
Factless Fact
A fact table that contains no measures or facts is called a Factless Fact table.
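A typical use is recording that an event happened. In the sketch below, the attendance example and its key names are assumptions; each row holds only foreign keys, and analysis is done by counting rows.

```python
from collections import Counter

# Hypothetical factless fact rows: only foreign keys, no numeric measure.
# Each row records the event "a student attended a class on a date".
attendance_fact = [
    {"date_key": 20050110, "student_key": 1, "class_key": 10},
    {"date_key": 20050110, "student_key": 2, "class_key": 10},
    {"date_key": 20050111, "student_key": 1, "class_key": 11},
]

# Analysis is done by counting rows rather than summing a measure.
attendance_per_class = Counter(row["class_key"] for row in attendance_fact)
```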
Slowly Changing Dimensions
1) Dimensions that change over time are called Slowly Changing Dimensions.
2) Slowly Changing Dimensions are often categorized into three types, namely
Type 1,
Type 2 and
Type 3
Type 1 SCD:
Used if history is not required
Overwriting the old values.
Product Price in 2004:
Product ID(PK) | Year | Product Name | Product Price
1 | 2004 | Product1 | $150
Product Price in 2005:
Product ID(PK) | Year | Product Name | Product Price
1 | 2005 | Product1 | $250
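A minimal sketch of a Type 1 change, using the Product1 price change from 2004 to 2005 as data. The dictionary layout and the function name apply_scd_type1 are assumptions for illustration.

```python
# Product dimension keyed by product ID, holding the Product1 example row.
product_dim = {
    1: {"year": 2004, "product_name": "Product1", "product_price": 150},
}


def apply_scd_type1(dim, product_id, year, price):
    """Type 1: overwrite the changed attributes in place; no history is kept."""
    dim[product_id]["year"] = year
    dim[product_id]["product_price"] = price


apply_scd_type1(product_dim, 1, 2005, 250)
```

After the call there is still exactly one row for the product, and the 2004 price can no longer be recovered from the dimension.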
Type 2 SCD:
If history and current value needed
Creating another additional record (new record with new changes and new surrogate key)
Mostly preferred in dimensional modeling
Product
Product ID(PK) | Effective DateTime(PK) | Year | Product Name | Product Price | Expiry DateTime
1 | 01-01-2004 12.00AM | 2004 | Product1 | $150 | 12-31-2004 11.59PM
1 | 01-01-2005 12.00AM | 2005 | Product1 | $250 |
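A minimal sketch of a Type 2 change with the same Product1 data. The list-of-rows layout, the explicit surrogate_key column and the function name apply_scd_type2 are assumptions for illustration; the current row is marked by an empty expiry.

```python
from datetime import datetime, timedelta

# Each version of a product is a separate row with its own surrogate key.
product_dim = [
    {"surrogate_key": 1, "product_id": 1, "product_name": "Product1",
     "product_price": 150, "effective": datetime(2004, 1, 1), "expiry": None},
]


def apply_scd_type2(dim, product_id, new_price, effective):
    """Type 2: expire the current row and append a new row for the change."""
    for row in dim:
        if row["product_id"] == product_id and row["expiry"] is None:
            row["expiry"] = effective - timedelta(minutes=1)
    current_name = next(
        r["product_name"] for r in dim if r["product_id"] == product_id
    )
    dim.append({
        "surrogate_key": max(r["surrogate_key"] for r in dim) + 1,
        "product_id": product_id,
        "product_name": current_name,
        "product_price": new_price,
        "effective": effective,
        "expiry": None,  # open-ended: this is the current version
    })


apply_scd_type2(product_dim, 1, 250, datetime(2005, 1, 1))
```

Both the $150 and the $250 versions remain queryable, each with its own validity period.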
Type 3 SCD:
Used if changes are very rare
Only one previous level of history is available
Creating new fields.
Product Price in 2005
Product ID(PK) | Current Year | Product Name | Current Product Price | Old Product Price | Old Year
1 | 2005 | Product1 | $250 | $150 | 2004
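A minimal sketch of a Type 3 change with the same Product1 data. The column names follow the table layout of this section, while the dictionary layout and the function name apply_scd_type3 are assumptions for illustration.

```python
product_dim = {
    1: {"current_year": 2004, "product_name": "Product1",
        "current_product_price": 150,
        "old_product_price": None, "old_year": None},
}


def apply_scd_type3(dim, product_id, year, price):
    """Type 3: move the current values into the 'old' columns, then
    overwrite. Only one previous level of history survives."""
    row = dim[product_id]
    row["old_product_price"] = row["current_product_price"]
    row["old_year"] = row["current_year"]
    row["current_product_price"] = price
    row["current_year"] = year


apply_scd_type3(product_dim, 1, 2005, 250)
```

A further price change would push 2005 into the old columns and lose the 2004 values.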
Surrogate Keys
Surrogate keys are always numeric and unique at table level, which makes it easy to distinguish and track values changed over time.
Surrogate keys are integers that are assigned sequentially as needed to populate a dimension.
Surrogate keys merely serve to join dimension tables to the fact table.
Surrogate keys are beneficial for the following reasons:
1) Reduces space used by the fact table
2) Faster retrieval of data (since alphanumerical retrieval is costlier than numerical retrieval)
3) Maintaining an index is easier with a numeric key.
4) Helps maintain all slowly changing dimensions.
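Sequential assignment can be sketched with a small generator that hands out the next integer the first time it sees a natural key. The class name SurrogateKeyGenerator and the customer codes are hypothetical.

```python
from itertools import count


class SurrogateKeyGenerator:
    """Hand out sequential integer keys, one per distinct natural key."""

    def __init__(self):
        self._next = count(1)
        self._keys = {}

    def key_for(self, natural_key):
        # Reuse the key if this natural key was seen before,
        # otherwise assign the next integer in sequence.
        if natural_key not in self._keys:
            self._keys[natural_key] = next(self._next)
        return self._keys[natural_key]


keys = SurrogateKeyGenerator()
customer_keys = [keys.key_for(code) for code in ["CUST-A17", "CUST-B02", "CUST-A17"]]
```

The fact table would then store the small integers rather than the alphanumeric source codes.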
Data Warehouse Design
The data warehouse design essentially consists of four steps, which are as follows:
1) Identifying facts and dimensions
2) Designing fact tables
3) Designing dimension tables
4) Designing database schemas
Types of database schemas
There are three main types of database schemas:
1) Star Schema,
2) Snowflake Schema and
3) Starflake Schema.
Star Schema
1) It is the simplest form of data warehouse schema that contains one or more dimensions and fact tables.
2) It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star where one fact table is connected to multiple dimensions.
3) The center of the star schema consists of a large fact table and it points towards the dimension tables.
4) Fact Table: Highly normalized
Dimension Table: Highly denormalized.
5) It can be very effective to treat fact data as primarily read-only data, and dimensional data as data that will change over a period of time.
Advantages:
Star schema is easy to define.
It reduces the number of physical joins.
Provides very simple metadata.
Drawbacks:
Storing summary data in fact tables (such as sales amount by region, district or year) yields poor performance for summary levels and huge dimension tables.
Steps in designing Star Schema
1) Identify a business process for analysis (like sales).
2) Identify measures or facts (sales dollar).
3) Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
4) List the columns that describe each dimension (region name, branch name, employee name).
5) Determine the lowest level of summary in a fact table (sales dollar).
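The steps above can be sketched as a small star schema in an in-memory SQLite database. The table names, column names and sample rows are assumptions chosen to match the sales example; a real design would carry more dimensions and attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, branch_name TEXT, region_name TEXT);
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);

-- Fact table: foreign keys to each dimension plus the numeric measure.
CREATE TABLE sales_fact (
    product_key  INTEGER REFERENCES product_dim(product_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    time_key     INTEGER REFERENCES time_dim(time_key),
    sales_dollar REAL,
    PRIMARY KEY (product_key, location_key, time_key)
);

INSERT INTO product_dim  VALUES (1, 'Pen', 'Stationery'), (2, 'Ink', 'Stationery');
INSERT INTO location_dim VALUES (1, 'Branch1', 'North'), (2, 'Branch2', 'South');
INSERT INTO time_dim     VALUES (1, 'Jan', 2005), (2, 'Feb', 2005);
INSERT INTO sales_fact   VALUES (1, 1, 1, 100.0), (2, 1, 1, 40.0), (1, 2, 2, 60.0);
""")

# One join per dimension used in the query, straight from the fact table.
sales_by_region = conn.execute("""
    SELECT l.region_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN location_dim l ON l.location_key = f.location_key
    GROUP BY l.region_name
    ORDER BY l.region_name
""").fetchall()
```

The composite primary key of sales_fact is made up of its foreign keys, and the region attribute sits directly in the denormalized location dimension.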
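Fact constellation:
Dimension tables will, in turn, have their own dimension tables. In this case, the Store dimension will contain District ids and Region ids, which will reference the district and region dimensions of the Store dimension, respectively. This schema is called a Fact Constellation Schema. (In common usage, a fact constellation is a schema in which several fact tables share dimension tables; dimension tables that reference further dimension tables are more usually described as snowflaking.)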
Snowflake schema
1) A snowflake schema is a term that describes a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.
2) It represents the dimensional hierarchy directly by normalizing the dimension tables, i.e. all dimensional information is stored in third normal form.
3) This implies dividing the dimension tables into more tables, thus preventing non-key attributes from depending on each other.
Advantages:
Snowflake schema provides best performance when queries involve aggregation.
Disadvantages:
Maintenance is complicated.
Increase in the number of tables.
More joins will be needed
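The extra joins can be seen in a small sketch where a store dimension is normalized into store, district and region tables. It uses an in-memory SQLite database; the table names and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The store hierarchy is split into separate, normalized tables.
CREATE TABLE region_dim   (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE district_dim (district_key INTEGER PRIMARY KEY, district_name TEXT,
                           region_key INTEGER REFERENCES region_dim(region_key));
CREATE TABLE store_dim    (store_key INTEGER PRIMARY KEY, store_name TEXT,
                           district_key INTEGER REFERENCES district_dim(district_key));
CREATE TABLE sales_fact   (store_key INTEGER REFERENCES store_dim(store_key),
                           sales_dollar REAL);

INSERT INTO region_dim   VALUES (1, 'North'), (2, 'South');
INSERT INTO district_dim VALUES (1, 'D1', 1), (2, 'D2', 1), (3, 'D3', 2);
INSERT INTO store_dim    VALUES (1, 'S1', 1), (2, 'S2', 2), (3, 'S3', 3);
INSERT INTO sales_fact   VALUES (1, 10.0), (2, 20.0), (3, 5.0);
""")

# Reaching the region level takes three joins here; with a single
# denormalized store dimension it would take one.
sales_by_region = conn.execute("""
    SELECT r.region_name, SUM(f.sales_dollar)
    FROM sales_fact f
    JOIN store_dim s    ON s.store_key = f.store_key
    JOIN district_dim d ON d.district_key = s.district_key
    JOIN region_dim r   ON r.region_key = d.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
```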
[Figure: Snowflake Schema]
Starflake Schema
1) A combination of denormalized Star and normalized Snowflake schemas.
Star Schema vs Snowflake Schema
Sl. no. | Star Schema | Snowflake Schema
1) | A dimension table will not have any parent table | A dimension table will have one or more parent tables
2) | Hierarchies for the dimensions are stored in the dimension table itself | Hierarchies are broken into separate tables
Granularity
Means what level of detail is to be stored in the fact table
Types of Granularity
1) Transactional Level Granularity
2) Periodic Snapshot Granularity
Transactional Level Granularity
Mostly used
Each and every transaction is stored in the fact table
Drill down and drill up analysis can be done
Disadvantage
1) Size increases.
Periodic Snapshot Granularity
Data summarized over a period is stored in the fact table
Adv: Faster retrieval (fewer records)
Disadv: Detail information not available
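The trade-off between the two grains can be sketched by deriving a monthly snapshot from transaction-level rows. The sample transactions and the choice of a monthly period are assumptions for illustration.

```python
from collections import defaultdict

# Transactional grain: one fact row per transaction (date, product, amount).
transactions = [
    ("2005-01-03", "Pen", 10.0),
    ("2005-01-17", "Pen", 15.0),
    ("2005-01-20", "Ink", 5.0),
    ("2005-02-02", "Pen", 20.0),
]

# Periodic snapshot grain: one row per (month, product) with the period total.
snapshot = defaultdict(float)
for day, product, amount in transactions:
    month = day[:7]  # 'YYYY-MM'
    snapshot[(month, product)] += amount
monthly_snapshot = dict(snapshot)
```

The snapshot has fewer rows than the transaction list, but the individual January transactions for Pen can no longer be told apart.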
FAQ
Hierarchy

1) Hierarchies are logical structures that use ordered levels as a means of organizing data.
2) A hierarchy can be used to define data aggregation.
Example
country > state > city > zip
In a time dimension, a hierarchy might be used to aggregate data from the Month level to the Quarter level, and from the Quarter level to the Year level.
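Aggregation along the time hierarchy can be sketched as two successive roll-ups. The monthly figures and the helper quarter_of are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical month-level sales keyed by (year, month number).
monthly_sales = {(2005, 1): 10, (2005, 2): 20, (2005, 4): 30, (2005, 12): 40}


def quarter_of(month):
    """Map a month number (1-12) to its quarter number (1-4)."""
    return (month - 1) // 3 + 1


# Month level -> Quarter level.
quarterly_sales = defaultdict(int)
for (year, month), amount in monthly_sales.items():
    quarterly_sales[(year, quarter_of(month))] += amount

# Quarter level -> Year level.
yearly_sales = defaultdict(int)
for (year, _quarter), amount in quarterly_sales.items():
    yearly_sales[year] += amount
```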
Level
A position in a hierarchy. For example, a time dimension might have a hierarchy that represents data at the Month, Quarter, and Year levels.
Operational Data Store
In recent times, OLAP functionality has been built into OLTP systems; this is called an ODS (operational data store).
A physical set of tables sitting between the operational systems and the data warehouse, or a specially administered hot partition of the data warehouse itself.
The main purpose of an ODS is to provide immediate reporting of operational results if neither the operational system nor the regular data warehouse can provide satisfactory access.
Since an ODS is necessarily an extract of the operational data, it may also play the role of source for the data warehouse.
Data Staging Area
1) A storage area that cleans, transforms, combines, de-duplicates and prepares source data for use in the data warehouse.
2) The data staging area is everything in between the source system and the data presentation server.
3) No querying should be done in the data staging area because the data staging area normally is not set up to handle fine-grained security, indexing or aggregation for performance.
Data Warehouse Bus Matrix
1) The matrix helps prioritize which dimensions should be tackled first for conformity, given their prominent roles.
2) The matrix allows us to communicate effectively within and across data mart teams.
3) The columns of the matrix represent the common dimensions.
4) The rows identify the organization's business processes.
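A bus matrix can be sketched as a mapping from business processes (rows) to the dimensions they use (columns). The process and dimension names below are hypothetical, and ranking by number of uses is just one simple way to decide which dimensions to conform first.

```python
# Rows: business processes. Columns: dimensions. (All names are hypothetical.)
BUS_MATRIX = {
    "Sales":     {"Date", "Product", "Store", "Customer"},
    "Inventory": {"Date", "Product", "Store"},
    "Shipping":  {"Date", "Product", "Customer"},
}

# Count how many business processes use each dimension.
usage = {}
for dimensions in BUS_MATRIX.values():
    for dimension in dimensions:
        usage[dimension] = usage.get(dimension, 0) + 1

# Dimensions shared by the most processes are candidates to conform first;
# ties are broken alphabetically.
priority = sorted(usage, key=lambda d: (-usage[d], d))
```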
Degenerated Dimension
Operational control numbers such as invoice numbers, order numbers and bill of lading numbers look like dimension keys in a fact table but do not join to any actual dimension table. They give rise to an empty dimension, hence we refer to them as Degenerated Dimensions (DD).
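A short sketch of how such a column is used: the invoice number stays in the fact rows and serves for grouping, with no dimension table to join to. The fact rows and column names are hypothetical.

```python
from collections import defaultdict

# Hypothetical line-item facts; invoice_number has no dimension table behind it.
sales_fact = [
    {"invoice_number": "INV-1001", "product_key": 1, "amount": 10.0},
    {"invoice_number": "INV-1001", "product_key": 2, "amount": 5.0},
    {"invoice_number": "INV-1002", "product_key": 1, "amount": 7.5},
]

# The degenerate dimension still groups the line items of one invoice.
invoice_totals = defaultdict(float)
for row in sales_fact:
    invoice_totals[row["invoice_number"]] += row["amount"]
```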
