
Introduction to Data Warehousing
MARIA CHAUDHRY
maria.vfbu@gmail.com
(Bahria University Lahore Campus) 2018
Reference Material

•  1. Data Warehousing Fundamentals, 2nd Edition, Paulraj Ponniah, 2010, John Wiley & Sons Inc., NY.
•  2. Building the Data Warehouse, 4th Edition, W. H. Inmon, 2005, John Wiley & Sons Inc., NY.
•  3. The Data Warehouse Toolkit, 2nd Edition, Ralph Kimball and Margy Ross, 2002, John Wiley & Sons Inc., NY.

Reference Material (Contd. )

•  4. Research papers, journals, case studies and magazines online
•  5. Hand-outs as per lecture requirement


Chapter 2

DATA WAREHOUSE –
THE BUILDING BLOCKS
CHAPTER 2 - OUTLINE
•  DATA WAREHOUSE : BILL INMON DEFINITION
•  DATA WAREHOUSE : KEY FEATURES
•  DATA WAREHOUSE VS. DATA MARTS
•  DWH – BASIC ARCHITECTURAL TYPES
•  DWH – MAIN COMPONENTS / BUILDING BLOCKS

DWH – BILL INMON DEFINITION
•  “A Data Warehouse is a subject oriented,
integrated, nonvolatile, and time variant
collection of data in support of management’s
decisions.”

•  Sean Kelly [DWH PRACTITIONER]:
•  The data in the data warehouse is: separate,
available, integrated, time stamped, subject
oriented, non volatile and accessible.

DATA WAREHOUSE : KEY FEATURES

•  The data in the data warehouse is:
•  Separate
•  Available
•  Integrated
•  Time stamped
•  Subject oriented
•  Non volatile
•  Accessible
1. Subject Oriented Data
2. Integrated Data
3. Time Variant Data
•  Operational systems → current values of data (EXAMPLE)
•  Data Warehouse → historical data plus current (?) (EXAMPLE)
•  Data is stored as snapshots over past and current
periods
•  Changes to data are tracked and recorded
•  Reports reflect changes if needed
•  TIME VARIANCE is important for the design & implementation
phase of DWH:
•  Allows for analysis of the past
•  Relates information to the present
•  Enables forecasts for the future
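A minimal sketch of time-variant storage, assuming a hypothetical in-memory list of balance snapshots (the names `snapshots`, `record_snapshot` and `balance_as_of` are illustrative, not from the reference texts). Each change is appended as a new dated row instead of overwriting the old value, so a question about the past can still be answered.

```python
from datetime import date

# Hypothetical snapshot store: each row is (customer_id, snapshot_date, balance).
snapshots = []

def record_snapshot(customer_id, snapshot_date, balance):
    """Append a new snapshot; existing rows are never overwritten."""
    snapshots.append((customer_id, snapshot_date, balance))

def balance_as_of(customer_id, as_of):
    """Return the most recent balance on or before `as_of`, or None."""
    rows = [r for r in snapshots if r[0] == customer_id and r[1] <= as_of]
    if not rows:
        return None
    return max(rows, key=lambda r: r[1])[2]

record_snapshot("C1", date(2018, 1, 31), 500)
record_snapshot("C1", date(2018, 2, 28), 650)
record_snapshot("C1", date(2018, 3, 31), 400)

# An operational system would only hold the latest value.
current = balance_as_of("C1", date(2018, 3, 31))
# The warehouse can also answer a question about an earlier date.
past = balance_as_of("C1", date(2018, 2, 15))
```

An operational system would keep only the value held in `current`; the value in `past` is recoverable only because older snapshots were retained.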
4. Non volatile data
5. Data Granularity

DATA WAREHOUSE VS. DATA MARTS
•  “The single most important issue facing the IT manager
this year is whether to build the data warehouse first or
the data mart first.”

•  Some fundamental questions before DWH design

•  Top-down or bottom-up approach?
•  Enterprise-wide or departmental?
•  Which first: data warehouse or data mart?
•  Build pilot or go with a full-fledged implementation?
•  Dependent or independent data marts?
[Figures not reproduced] COURTESY : Aalborg Univ@cs.aau.dk.

DATA WAREHOUSE VS. DATA MARTS

• CONCLUSION

“A data warehouse means a collection
of the constituent data marts.”
DWH – BASIC ARCHITECTURAL TYPES

1. Centralized DWH
•  Considers enterprise-level information
requirements
•  Atomic level normalized data
•  Lowest level of granularity
•  Occasionally, some summarized data if required
•  Applications access the normalized data in the
central data warehouse.
•  There are no separate data marts
2. Independent Data Marts
•  Used in companies with several organizational units
→ separate data marts → specific purposes
•  Each data mart serves one particular organizational
unit
•  Independent data marts (without a single unified
version)
•  Independent marts → inconsistent data definitions
and standards
•  Analysis of data across data marts becomes difficult
•  EXAMPLE : two independent data marts, sales and
shipments → related subjects → difficult to analyze
sales and shipments data together
3. Federated DWH
•  Used in companies with a legacy of DSS
•  Old DSS structures, extracted data sets, primitive
data marts
•  Starting a DWH design from scratch is not
possible
•  Data integration : physical or logical through
shared key fields
•  Global metadata, distributed queries
•  No single overall Data Storage Unit
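A small sketch of logical integration through a shared key field, assuming two hypothetical legacy stores (`sales_mart` and `shipments_dss`) that both key their rows by product id. The stores stay physically separate; they are combined only at query time.

```python
# Two hypothetical legacy stores that stay physically separate.
sales_mart = {"P100": {"units_sold": 40}, "P200": {"units_sold": 15}}
shipments_dss = {"P100": {"units_shipped": 38}, "P300": {"units_shipped": 7}}

def federated_query(product_id):
    """Combine rows from both stores at query time via the shared key."""
    result = {"product_id": product_id}
    result.update(sales_mart.get(product_id, {}))
    result.update(shipments_dss.get(product_id, {}))
    return result

row = federated_query("P100")
partial = federated_query("P300")  # present in only one store
```

A product that exists in only one store yields a partial row, which illustrates why a federated design depends on consistent shared keys and global metadata.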
4. Hub and Spoke DWH
•  Centralized Architecture ++
•  overall enterprise-wide data warehouse (INMON
approach)
•  Atomic data stored in centralized DWH
•  Dependent Data Marts obtaining data from
centralized DWH
•  Centralized DWH = hub, several data marts = spokes
•  Dependent Data Marts ↔ Variety of purposes
•  Most queries directed to dependent data marts (top
down DWH approach)
5. Data-mart Bus
•  KIMBALL conformed super marts approach
•  Begin by analyzing requirements for a specific
business subject
•  Incremental approach of developing data marts
•  data marts contain atomic data
•  EXAMPLE : orders, shipments, billings, insurance
claims, car rentals, and so on
•  logically integrated super marts provide an
enterprise view of data
•  Bottom-up approach to DWH design
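A hedged sketch of how conformed data marts can give an enterprise view, assuming a hypothetical shared product dimension (`product_dim`) and two marts holding atomic fact rows. Because both marts use the same product keys, their totals can be lined up side by side.

```python
# One conformed product dimension shared by every data mart (hypothetical data).
product_dim = {1: "Sedan", 2: "SUV"}

# Atomic fact rows in each mart: (product_key, amount).
orders_mart = [(1, 100), (2, 250), (1, 50)]
shipments_mart = [(1, 90), (2, 250)]

def totals_by_product(facts):
    """Sum amounts per product name using the shared dimension."""
    totals = {}
    for product_key, amount in facts:
        name = product_dim[product_key]
        totals[name] = totals.get(name, 0) + amount
    return totals

def enterprise_view():
    """Line up ordered and shipped totals for every product."""
    ordered = totals_by_product(orders_mart)
    shipped = totals_by_product(shipments_mart)
    return {
        name: (ordered.get(name, 0), shipped.get(name, 0))
        for name in product_dim.values()
    }

view = enterprise_view()
```

With independent data marts, each mart would carry its own product codes and this side-by-side comparison would first need a reconciliation step.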

Data Warehouse : Basic Architecture

DWH – COMPONENTS/BUILDING BLOCKS

•  Major components
•  Source data component
•  Data staging component
•  Data storage component
•  Information delivery component
•  Metadata component
•  Management and control component
1. Source Data Components
•  Source data can be grouped into 4 components
•  Production data
•  Segments of data from different operational
systems
•  Data from multiple vertical apps
•  Variations in data formats, DB, OS, h/w platforms
•  Predictable and narrow queries
•  Narrow scope, e.g. order details
•  DISPARITY : No conformance of data among
various operational systems
•  Solution: standardize → transform → integrate
•  Internal data
•  Private spreadsheets, documents, files,
departmental DB’s
•  E.g. Customer profiles for specific offering
•  Special strategies to transform it for DWH
1. Source Data Components (Contd.)
•  Archived data
•  Old data is archived periodically
•  Time span : as per circumstances in the organization
•  Yearly, 5 years, till life and so on
•  DWH have snapshots of historical data
•  METHOD : Staged Archival
•  External data
•  Executives depend upon data from external sources
•  E.g. external agencies, national statistical offices
•  E.g. Car rental company → production schedules of
automobile manufacturers
•  E.g. Retail company → tax regulatory standards
•  Important to spot industry trends & performance
comparison
•  Conformance issues: internal vs. external format &
data types
2. Data Staging Components
•  After data is extracted, it must be prepared
•  Data extracted from sources needs to be made
ready in a suitable format
•  Cycle : CCCCD (Clean, Change, Combine, Convert,
De-duplicate)
•  Three major functions to make data ready
•  Extract
•  Transform
•  Load
2. Data Staging Components (Contd.)
•  ETL Cycle
•  Extract (E)
•  Numerous data sources → numerous formats
•  Numerous formats → numerous data models
•  Tools for extraction (high initial COST)
•  In-house programs for extraction (maintenance /
upgrade COST)
•  Process : DWH implementation team → extract
source in a separate physical environment → flat
files / RDBMS → easy movement to DWH
2. Data Staging Components (Contd.)
•  ETL Cycle
•  Transform (T)
•  E.g. magazine subscription system
•  Platform : manual → file → DB → RDB
•  Conversion from prior systems is a must
•  Process at DWH level : data transformation → feed
1 → initial load → data transformation revised →
feed 2 → load 2 → iteration
•  Tasks :
•  Clean misspellings, resolve conflicting codes,
eliminate duplicates, set defaults for missing values
•  Standardize data types, field lengths, semantics,
synonyms, homonyms
•  Sorting and merging pieces of data
•  Summarization as per keys
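A minimal sketch of a few of the transformation tasks listed above (cleaning, resolving conflicting codes, defaulting missing values, eliminating duplicates, sorting), assuming hypothetical subscriber records. The `GENDER_CODES` mapping and the record layout are illustrative assumptions, not part of any particular ETL tool.

```python
# Hypothetical code mappings: source systems encode gender differently.
GENDER_CODES = {"m": "M", "male": "M", "1": "M",
                "f": "F", "female": "F", "2": "F"}

def transform(records):
    """Clean, standardize, and de-duplicate raw subscriber records."""
    seen = set()
    cleaned = []
    for rec in records:
        # Clean: collapse stray whitespace and normalize letter case.
        name = " ".join(rec["name"].split()).title()
        # Resolve conflicting codes; "U" is the default for missing values.
        raw_gender = str(rec.get("gender") or "").strip().lower()
        gender = GENDER_CODES.get(raw_gender, "U")
        # Eliminate duplicates that only differed in formatting.
        key = (name, gender)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({"name": name, "gender": gender})
    # Sort so that later merging steps see a stable order.
    return sorted(cleaned, key=lambda r: r["name"])

raw = [
    {"name": "  alice  SMITH", "gender": "female"},
    {"name": "Alice Smith", "gender": "2"},
    {"name": "bob jones", "gender": None},
]
result = transform(raw)
```

The first two raw records describe the same subscriber in different formats, so only one survives; the third has no gender and receives the default code.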
2. Data Staging Components (Contd.)
•  ETL Cycle
•  Load (L)

•  Initial load
•  First time DWH goes live
•  Substantial time involved → large volumes of data

•  Iterative load
•  Incremental
•  Less time involved → small volumes of data
•  ETL cycle
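The difference between the initial load and an incremental load can be sketched as follows, assuming a hypothetical `updated_at` counter on each source row and an in-memory dictionary standing in for the warehouse table. This is one common way to pick up changes (a high-water mark), not the only one.

```python
warehouse = {}  # hypothetical target table keyed by order id

def initial_load(source_rows):
    """Full load: copy every source row the first time the DWH goes live."""
    warehouse.clear()
    for row in source_rows:
        warehouse[row["id"]] = row
    return len(source_rows)

def incremental_load(source_rows, last_loaded_at):
    """Load only rows changed after the previous load's high-water mark."""
    changed = [r for r in source_rows if r["updated_at"] > last_loaded_at]
    for row in changed:
        warehouse[row["id"]] = row
    return len(changed)

source = [
    {"id": 1, "updated_at": 10, "amount": 100},
    {"id": 2, "updated_at": 12, "amount": 200},
]
full_count = initial_load(source)

# Later, one new row arrives in the source system.
source.append({"id": 3, "updated_at": 20, "amount": 300})
delta_count = incremental_load(source, last_loaded_at=12)
```

The initial load touches every row, while the incremental load moves only the rows that changed since the last run, which is why it takes less time.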
3. Data Storage Components
Operational System DB             | DWH DB
Day to day storage                | Periodic storage
Examples                          | Examples
Current data                      | Historic data
Normalized storage                | Large volumes storage
Fast efficient processing         | Not very quick retrievals
Random queries                    | Analysis purpose queries
Moment by moment data change      | Data update is not continuous
READ WRITE UPDATE data repository | READ ONLY data repository
Tools from many vendors
Normally open DB
RDBMS (proprietary product)
MDDBS (proprietary product)
4. Information Delivery Component

• Categories of users
•  Novice user
•  Untrained user, Preset queries, pre fabricated
reports
•  Casual user
•  Pre packaged information
•  Power user
•  Complex analysis, custom reports, ad hoc queries
•  Drill across data layers
4. Information Delivery Component (Contd.)
5. Meta Data Component
•  Operational Meta Data
•  Extraction and transformation Meta Data
•  End User Meta Data

6. Management & Control Component
•  On top of all other components
•  Monitoring of data movements in between all
other components
CONCLUSION

• Refer to the chapter outline / chapter summary
• Questions
