Data Warehousing
Basics of Data Warehousing
Session Objectives
At the end of this session, you will be able to answer
What, Why and Where -- Data Warehouse
Terminology / technical jargon used
OLTP and OLAP
DW Architecture
ETL process in a Data Warehousing environment
Understand the Evolution of ETL tools
DW – 3 W’s
Data Warehouse --- What? Why? Where?
Decision Support System
In order to make correct decisions, accurate, meaningful information about
business environments, external issues, and internal workings must be available
in a timely fashion.
What is a Data Warehouse?
A data warehouse is a subject-oriented, integrated,
nonvolatile, time-variant collection of data in
support of management's decisions.
- W. H. Inmon
W. H. (Bill) Inmon - regarded as the father of data warehousing
Integrated - Characteristics of a Data Warehouse
Operational applications → Data warehouse
Encoding: Appl A - m,f; Appl B - 1,0; Appl C - male,female → m,f
Physical format: Appl A - balance dec fixed (13,2); Appl B - balance pic 9(9)V99; Appl C - balance pic S9(7)V99 comp-3 → balance dec fixed (13,2)
Field name: Appl A - bal-on-hand; Appl B - current-balance; Appl C - cash-on-hand → Current balance
Date format: Appl A - date (julian); Appl B - date (yymmdd); Appl C - date (absolute) → date (julian)
Integrated View Is The Essence Of A Data Warehouse
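The integration step above can be sketched in Python. This is a minimal illustration, not a prescribed method: the mapping tables, the `integrate` and `parse_source_date` helpers, and the readings of the date formats (Appl A's "julian" as yyddd, Appl B's as yymmdd, Appl C's "absolute" date as a Python ordinal day count) are all assumptions, as is treating Appl B's 1 as "m" and 0 as "f". Decoding COBOL comp-3 packed decimals is assumed to happen earlier.

```python
from datetime import date, datetime
from decimal import Decimal

# Hypothetical source-to-warehouse mappings for the three applications above.
# Reading Appl B's "1" as "m" and "0" as "f" is an assumption for illustration.
GENDER_CODES = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},
    "C": {"male": "m", "female": "f"},
}

# Each application's field name for the same attribute (current balance).
BALANCE_FIELDS = {"A": "bal-on-hand", "B": "current-balance", "C": "cash-on-hand"}


def parse_source_date(app: str, raw) -> date:
    """Decode each application's date convention into a Python date.

    Assumptions: Appl A's "julian" date is yyddd, Appl B's is yymmdd, and
    Appl C's "absolute" date is a day count as used by date.fromordinal().
    """
    if app == "A":
        return datetime.strptime(str(raw), "%y%j").date()
    if app == "B":
        return datetime.strptime(str(raw), "%y%m%d").date()
    if app == "C":
        return date.fromordinal(int(raw))
    raise ValueError(f"unknown application {app!r}")


def integrate(app: str, record: dict) -> dict:
    """Map one source record onto the single warehouse representation."""
    return {
        # Warehouse encoding: m/f.
        "gender": GENDER_CODES[app][str(record["gender"]).strip().lower()],
        # Warehouse format: decimal with 2 places, as in dec fixed (13,2).
        # Packed-decimal (comp-3) decoding is assumed to have happened upstream.
        "current_balance": Decimal(str(record[BALANCE_FIELDS[app]])).quantize(Decimal("0.01")),
        # Warehouse date format: yyddd ("julian").
        "date": parse_source_date(app, record["date"]).strftime("%y%j"),
    }
```

For example, `integrate("B", {"gender": "1", "current-balance": "1234.5", "date": "240115"})` returns gender "m", a balance of `Decimal("1234.50")`, and the date "24015". Keeping the mappings in tables rather than in code makes it easy to add another application without touching the logic.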
Non-volatile - Characteristics of a Data Warehouse
Operational data: insert, change, delete, replace
Warehouse: load, read-only access
Data Warehouse Is Relatively Static In Nature
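A minimal sketch of the contrast above, assuming in-memory Python structures stand in for both systems (the `OperationalStore` and `Warehouse` classes are hypothetical): the operational store overwrites and deletes values, while the warehouse only appends dated loads and offers read access. The load date attached to each snapshot is also what makes the warehouse time-variant.

```python
from datetime import date


class OperationalStore:
    """Operational data: current values are inserted, changed, and deleted in place."""

    def __init__(self):
        self.rows = {}

    def upsert(self, key, value):
        self.rows[key] = value  # the previous value is overwritten and lost

    def delete(self, key):
        self.rows.pop(key, None)


class Warehouse:
    """Warehouse data: loaded as dated snapshots, then only read."""

    def __init__(self):
        self._snapshots = []  # append-only list of (load_date, key, value)

    def load(self, load_date, rows):
        # Bulk load: a snapshot of the operational data is appended, never updated.
        for key, value in rows.items():
            self._snapshots.append((load_date, key, value))

    def history(self, key):
        # Read-only access: builds a new list, so callers cannot alter stored history.
        return [(d, v) for d, k, v in self._snapshots if k == key]


# Usage: the operational balance is overwritten, the warehouse keeps both snapshots.
ops = OperationalStore()
ops.upsert("acct-1", 100)
wh = Warehouse()
wh.load(date(2024, 1, 31), ops.rows)
ops.upsert("acct-1", 250)
wh.load(date(2024, 2, 29), ops.rows)
print(ops.rows)              # {'acct-1': 250}
print(wh.history("acct-1"))  # [(datetime.date(2024, 1, 31), 100), (datetime.date(2024, 2, 29), 250)]
```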
Historical Look at Informational Processing
The goal of Informational Processing is to turn data into
information!
Why?
Because business questions are answered using information and
the knowledge of how to apply that information to a given problem.
Data → Information → Knowledge
A Need For New Technology
Government and industrial entities have been collecting data in electronic format
since the 1960s.
Today, organizations collect millions of pieces of information about every
aspect of their operation on a daily basis.
Data is obtained from multiple disparate sources.
Often information is replicated, leading to confusion.
Related data is often retained in seemingly heterogeneous and incompatible
platforms.
Common data attributes are represented in nonstandard formats and naming
constructs across systems.
Most systems are built for data collection (transaction based).
Designed to support On-Line Transaction Processing (OLTP).
Designed to support day-to-day business operations.
Very specific applications are built to support interaction with the data.
Perform best when handling small, specific volumes of data.
Do not readily accept information from dissimilar sources.
Capable of answering questions of a specific nature and time frame.
How many items do I have in stock today?
How many tickets were sold on a specific date?
What is the current price of an item?
Transaction based systems experience great difficulty in answering analytical
and decision support questions.
Analysis takes a long time, interfering with:
transaction performance
daily operations
The nature of the data is dynamic and dispersed.
Most organizations have created a “spider web” of systems and data sources.
[Figure: a “spider web” of interconnected databases and applications]
All of this has created “data overload” and “data confusion”.
What do I do with all of this data?
What does it mean?
Do I really need this data?
I am overwhelmed with the amount of data I am confronted with.
I cannot make a timely decision (too much data from too many sources).
Data, Data everywhere
• I can’t get the data I need
– need an expert to get the data
• I can’t find the data I need
– data is scattered over the network
– many versions, subtle differences
• I can’t understand the data I found
– available data poorly documented
• I can’t use the data I found
– results are unexpected
– data needs to be transformed from one form to another
Data Warehousing
Data warehousing is:
A large historical database designed to accept key analytical data from
multiple, disparate sources that handle the day-to-day management of
enterprise data. Furthermore, the role of the warehouse is to transform
transaction data into corporate information.
The information in the warehouse is provided to users in a read-only fashion.
A data warehouse will provide:
The ability to ask business analysis questions in a real-time, iterative
fashion, obtaining decision support information readily and quickly.
A data warehouse is not:
A repository for all corporate data.
A data warehouse will not:
Single-handedly solve all of the problems associated with an enterprise.
DW Jargon
OLTP – Online Transaction Processing
OLAP – Online Analytical Processing
ETL – Extraction, Transformation and Loading
DSS – Decision Support System
Metadata – Data About Data
Fact – Numeric values
Dimensions – Subject Areas of Business
ODS – Operational Data Store
STG – Data Staging Layer
Cube
Star Schema
Snowflake Schema
What is an Operational System / OLTP?
Operational systems are just what their name implies; they are the systems that help
us run day-to-day enterprise operations.
These are the backbone systems of any enterprise, such as order entry and inventory.
The classic examples are airline reservations, credit-card
authorizations, and ATM withdrawals.
Characteristics of Operational Systems
Continuous availability
Predefined access paths
Transaction integrity
Volume of transaction - High
Data volume per query - Low
Used by operational staff
Supports day-to-day control operations
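To make the "transaction integrity" characteristic concrete, here is a hedged sketch of an ATM-style withdrawal using Python's built-in `sqlite3` module. The `account` and `txn_log` tables and the `withdraw` helper are hypothetical; the point is that the balance update and the log entry are committed together or rolled back together.

```python
import sqlite3

# Hypothetical OLTP tables: one row per account, updated in place by each transaction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("CREATE TABLE txn_log (account_id TEXT, amount INTEGER)")
conn.execute("INSERT INTO account VALUES ('acct-1', 500)")
conn.commit()


def withdraw(conn, account_id, amount):
    """ATM-style withdrawal: both writes commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, account_id),
            )
            conn.execute("INSERT INTO txn_log VALUES (?, ?)", (account_id, -amount))
            (balance,) = conn.execute(
                "SELECT balance FROM account WHERE id = ?", (account_id,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")  # triggers the rollback
        return True
    except ValueError:
        return False
```

Calling `withdraw(conn, "acct-1", 200)` succeeds and leaves a balance of 300; a following `withdraw(conn, "acct-1", 1000)` fails and is rolled back, so the balance stays 300 and the log keeps a single entry. Each transaction touches one account row, matching the high-volume, low-data-per-query profile listed above.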
What is OLAP?
OLAP (On-Line Analytical Processing) applications are designed for online,
ad hoc data access and analysis.
Data is organized into multiple dimensions.
Provides access to analytical content such as time series and trend analysis views
and summary-level information.
A set of functionality that attempts to facilitate multidimensional analysis.
Offers drill-down, drill-across, and slice-and-dice capabilities.
OLAP - Fast Analysis
• On-Line: no piles of paper, please!
• Analytical: establish patterns
• Processing: data-based
• FASMI: Fast Analysis of Shared Multidimensional Information
What is ETL?
ETL (Extraction, Transformation and Loading) is a process by which data is
integrated and transformed from the operational systems into the data
warehouse environment.
[Figure: ETL flow]
Operational systems → Filters and Extractors
→ Cleanser, applying Cleaning Rules (Rule 1, Rule 2, Rule 3); rejected rows go to an Error View to check and correct
→ Transformation Engine, applying Transformation Rules (Rule 1, Rule 2, Rule 3)
→ Integrator; rejected rows go to an Error View to check and correct
→ Loader → Warehouse
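The flow above can be sketched as a small Python pipeline. The rule lists, the field names (`order_id`, `amount`, `region`), the `EtlRun`/`run_etl` names, and the choice to reduce the integrator to duplicate detection on `order_id` are illustrative assumptions, not part of any particular ETL tool.

```python
from dataclasses import dataclass, field


@dataclass
class EtlRun:
    warehouse: list = field(default_factory=list)
    error_view: list = field(default_factory=list)  # rows held for "check and correct"


# Hypothetical cleaning rules: each returns an error message, or None if the row passes.
CLEANING_RULES = [
    lambda r: None if r.get("order_id") else "missing order_id",
    lambda r: None if str(r.get("amount", "")).replace(".", "", 1).isdigit() else "non-numeric amount",
]

# Hypothetical transformation rules: each maps a clean row to a new row.
TRANSFORMATION_RULES = [
    lambda r: {**r, "amount": float(r["amount"])},
    lambda r: {**r, "region": r.get("region", "unknown").strip().upper()},
]


def extract(sources):
    """Filters and extractors: pull rows from every operational source in turn."""
    for source in sources:
        yield from source


def run_etl(sources):
    run = EtlRun()
    seen_keys = set()
    for row in extract(sources):
        # Cleanser: apply every cleaning rule; failing rows go to the error view.
        errors = [msg for rule in CLEANING_RULES if (msg := rule(row)) is not None]
        if errors:
            run.error_view.append((row, errors))
            continue
        # Transformation engine: apply the transformation rules in order.
        for rule in TRANSFORMATION_RULES:
            row = rule(row)
        # Integrator: one row per order_id across all sources; duplicates are flagged.
        if row["order_id"] in seen_keys:
            run.error_view.append((row, ["duplicate order_id"]))
            continue
        seen_keys.add(row["order_id"])
        # Loader: append the integrated row to the warehouse.
        run.warehouse.append(row)
    return run
```

Rows that fail a cleaning rule or collide during integration are kept in `error_view` rather than dropped, mirroring the "check and correct" loop in the figure.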
Data Warehousing Architecture Overview
[Figure: Data Warehouse Architecture. Components: legacy data sources, an external data source, data transformation, the data warehouse, an information directory (repository), and a data warehouse management layer.]
Data Warehousing
Model concepts:
Fact table(s)
A table containing multiple measurable descriptors relating to a
specific area of business
Each fact can be viewed, calculated, and aggregated against various
defining areas of the business (time, geography, customer)
Dimension Table(s)
Retains information (product description, geography description,
customer description) that is descriptive and remains moderately
constant over time
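A minimal in-memory sketch of the two table types, assuming plain Python dictionaries (the table contents and the `aggregate` helper are hypothetical): fact rows carry dimension keys plus numeric measures, and a measure is aggregated by looking up a descriptive attribute in a dimension table.

```python
from collections import defaultdict

# Hypothetical dimension tables: descriptive attributes keyed by a surrogate key.
product_dim = {
    1: {"description": "Laptop", "category": "Computers"},
    2: {"description": "Mouse", "category": "Accessories"},
    3: {"description": "Keyboard", "category": "Accessories"},
}
time_dim = {
    20240115: {"month": "2024-01", "quarter": "2024-Q1"},
    20240210: {"month": "2024-02", "quarter": "2024-Q1"},
}

# Hypothetical fact table: foreign keys to the dimensions plus numeric measures.
sales_facts = [
    {"product_key": 1, "time_key": 20240115, "units": 2, "revenue": 2000.0},
    {"product_key": 2, "time_key": 20240115, "units": 10, "revenue": 150.0},
    {"product_key": 3, "time_key": 20240210, "units": 5, "revenue": 250.0},
    {"product_key": 2, "time_key": 20240210, "units": 4, "revenue": 60.0},
]


def aggregate(facts, measure, dim, dim_key, attribute):
    """Sum one fact measure, grouped by a descriptive attribute of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[dim[fact[dim_key]][attribute]] += fact[measure]
    return dict(totals)
```

`aggregate(sales_facts, "revenue", product_dim, "product_key", "category")` gives `{"Computers": 2000.0, "Accessories": 460.0}`; swapping in `time_dim` and `"month"` aggregates the same facts over time instead.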
Data Warehouse Modeling
Special modeling techniques must be applied to provide rapid query response on
large volumes of data.
OLTP systems are built with update operations in mind; the resulting
normalization greatly reduces browse (read) performance.
Common data model techniques are as follows:
star schema
snowflake
fact constellation
relational
Sample Star Schema Model
[Figure: a central SALES fact table (Sales Facts) linked directly to the TIME, GEOGRAPHY, STORE, and CUSTOMER dimension tables.]
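The star schema can be expressed as SQL tables. The sketch below uses Python's built-in `sqlite3` with an in-memory database; the table and column names and the sample rows are hypothetical. Each dimension is a single table joined directly to the central fact table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (denormalized: one table per dimension)
CREATE TABLE time_dim      (time_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT, month TEXT);
CREATE TABLE geography_dim (geo_key INTEGER PRIMARY KEY, region TEXT, country TEXT);
CREATE TABLE store_dim     (store_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE customer_dim  (customer_key INTEGER PRIMARY KEY, segment TEXT);

-- Central fact table: one foreign key per dimension plus numeric measures
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim,
    geo_key      INTEGER REFERENCES geography_dim,
    store_key    INTEGER REFERENCES store_dim,
    customer_key INTEGER REFERENCES customer_dim,
    units        INTEGER,
    revenue      REAL
);

INSERT INTO time_dim VALUES (1, 2024, 'Q1', 'Jan'), (2, 2024, 'Q2', 'Apr');
INSERT INTO geography_dim VALUES (1, 'East', 'US'), (2, 'West', 'US');
INSERT INTO store_dim VALUES (1, 'Store 1'), (2, 'Store 2');
INSERT INTO customer_dim VALUES (1, 'Retail'), (2, 'Business');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 3, 300.0),
    (1, 2, 2, 2, 1, 120.0),
    (2, 1, 1, 2, 2, 250.0),
    (2, 2, 2, 1, 4, 80.0);
""")

# A typical star join: every dimension is one join away from the fact table.
STAR_QUERY = """
SELECT t.quarter, g.region, SUM(f.revenue) AS revenue
FROM sales_fact AS f
JOIN time_dim      AS t ON f.time_key = t.time_key
JOIN geography_dim AS g ON f.geo_key  = g.geo_key
GROUP BY t.quarter, g.region
ORDER BY t.quarter, g.region
"""
for row in conn.execute(STAR_QUERY):
    print(row)
```

The query returns one row per quarter and region: `('Q1', 'East', 300.0)`, `('Q1', 'West', 120.0)`, `('Q2', 'East', 250.0)` and `('Q2', 'West', 80.0)`.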
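Sample Snowflake Model
[Figure: the SALES fact table (Sales Facts) with TIME, GEOGRAPHY, STORE, and CUSTOMER dimensions, where some dimensions are further normalized into sub-tables: TIME into Year, Qtr, and Month; GEOGRAPHY into North, South, East, and West; STORE into East Region and West Region.]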
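For comparison, a hedged sketch of a snowflaked TIME dimension, again with `sqlite3` and hypothetical table names: the Year, Qtr, and Month levels become separate tables, so rolling up to quarter requires one join per hierarchy level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked TIME dimension: the Year > Qtr > Month hierarchy is normalized
CREATE TABLE year_dim    (year_key INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE quarter_dim (quarter_key INTEGER PRIMARY KEY, quarter TEXT,
                          year_key INTEGER REFERENCES year_dim);
CREATE TABLE month_dim   (month_key INTEGER PRIMARY KEY, month TEXT,
                          quarter_key INTEGER REFERENCES quarter_dim);
CREATE TABLE sales_fact  (month_key INTEGER REFERENCES month_dim, revenue REAL);

INSERT INTO year_dim VALUES (1, 2024);
INSERT INTO quarter_dim VALUES (1, 'Q1', 1), (2, 'Q2', 1);
INSERT INTO month_dim VALUES (1, 'Jan', 1), (2, 'Feb', 1), (3, 'Apr', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 50.0), (3, 75.0);
""")

# Rolling revenue up to quarter now needs a join per level of the hierarchy.
SNOWFLAKE_QUERY = """
SELECT y.year, q.quarter, SUM(f.revenue) AS revenue
FROM sales_fact AS f
JOIN month_dim   AS m ON f.month_key   = m.month_key
JOIN quarter_dim AS q ON m.quarter_key = q.quarter_key
JOIN year_dim    AS y ON q.year_key    = y.year_key
GROUP BY y.year, q.quarter
ORDER BY y.year, q.quarter
"""
print(conn.execute(SNOWFLAKE_QUERY).fetchall())
```

The result is `[(2024, 'Q1', 150.0), (2024, 'Q2', 75.0)]`. Normalizing removes redundant values from the dimension but adds joins to every query, which is the usual trade-off against the star schema.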
Sample Fact Constellation Model
[Figure: several fact tables (Regional Sales, District Sales, Store Sales) sharing the TIME, GEOGRAPHY, STORE, and CUSTOMER dimension tables.]
Data Mart
Data mart is:
A functional segment of an enterprise restricted for purposes of security,
locality, performance, or business necessity using modeling and information
delivery techniques identical to data warehousing.
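One lightweight way to carve out a data mart, sketched with `sqlite3` under hypothetical table and column names, is a view that exposes only one functional area of an enterprise fact table. This is an assumption for illustration: many data marts are physically separate databases, and in a server DBMS the security restriction would come from granting users access to the view only (SQLite has no `GRANT`).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical enterprise-wide fact table covering several functional areas
CREATE TABLE enterprise_fact (area TEXT, account TEXT, month TEXT, amount REAL);
INSERT INTO enterprise_fact VALUES
    ('finance',   'payables', '2024-01', 1200.0),
    ('finance',   'payables', '2024-02',  800.0),
    ('logistics', 'freight',  '2024-01',  300.0),
    ('contracts', 'services', '2024-01',  950.0);

-- Financial data mart: same dimensional model, restricted to one functional area
CREATE VIEW financial_mart AS
    SELECT account, month, amount FROM enterprise_fact WHERE area = 'finance';
""")

MART_QUERY = "SELECT month, SUM(amount) FROM financial_mart GROUP BY month ORDER BY month"
print(conn.execute(MART_QUERY).fetchall())
```

Querying the mart returns `[('2024-01', 1200.0), ('2024-02', 800.0)]`; the logistics and contract rows are not visible through it.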
Why build a data mart?
Allows an organization to visualize the large but focus on the small and
attainable.
Provides a platform for rapid delivery of an operational system.
Minimizes risk.
A corporate warehouse can be constructed from the union of the enterprise
data marts.
Data Mart: the data warehouse populates the data marts
[Figure: data from transaction sources flows into the data warehouse; the Financial, Logistics, and Contract data marts are updated from the warehouse.]
Data Mart: the data marts populate the data warehouse
[Figure: data from transaction sources flows into the Financial, Logistics, and Contract data marts; the data warehouse is updated from the data marts.]
Data Mart: virtual data warehouse
The data warehouse layer manages the data marts as a warehouse; data is moved through the abstract layer on demand.
[Figure: data from transaction sources flows into the Financial, Logistics, and Contract data marts; an Abstract Data Warehouse Access Layer sits on top of them.]
OLAP
OLAP is a powerful graphics-oriented tool used to access the data warehouse.
OLAP supports
Business analysis queries
Data visualization
Trend analysis
Scenario analysis
User defined queries
OLAP Operations
Drill Down: move from summary to detail
Roll Up: move from detail to summary
Slice and Dice: look at a specific area of interest to the business
Pivot and Rotate: look at data from varying perspectives
Drill Through: move to a near-transaction level of detail
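The operations above can be illustrated on a tiny in-memory cube. This is a sketch under assumptions: the cells, the dimension names, and the helper functions (`aggregate`, `slice_`, `dice`, `pivot`) are hypothetical, and drill through is not shown because the example has no transaction-level data behind it.

```python
from collections import defaultdict

# Hypothetical detail-level cells: (year, quarter, region, product) -> sales.
CELLS = {
    (2024, "Q1", "East", "Laptop"): 100,
    (2024, "Q1", "West", "Laptop"): 80,
    (2024, "Q2", "East", "Mouse"): 30,
    (2024, "Q2", "West", "Laptop"): 90,
    (2024, "Q2", "West", "Mouse"): 20,
}
DIMS = ("year", "quarter", "region", "product")


def aggregate(cells, keep):
    """Roll up / drill down: sum the measure over every dimension not in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)


def slice_(cells, dim, value):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {k: v for k, v in cells.items() if k[i] == value}


def dice(cells, **allowed):
    """Dice: keep cells whose members fall in a chosen set for several dimensions."""
    checks = [(DIMS.index(d), set(vals)) for d, vals in allowed.items()]
    return {k: v for k, v in cells.items() if all(k[i] in s for i, s in checks)}


def pivot(table):
    """Pivot: swap the two axes of a 2-D summary keyed by (row, column)."""
    return {(c, r): v for (r, c), v in table.items()}
```

Rolling up with `aggregate(CELLS, ["year"])` gives `{(2024,): 320}`; drilling down with `aggregate(CELLS, ["year", "quarter"])` gives `{(2024, "Q1"): 180, (2024, "Q2"): 140}`; `slice_(CELLS, "region", "East")` keeps the two East cells; and `pivot(aggregate(CELLS, ["region", "product"]))` re-keys the region-by-product summary as product-by-region.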
The flavors of OLAP
Multidimensional On-Line Analytical Processing (MOLAP)
Relational On-Line Analytical Processing (ROLAP)
Hybrid On-Line Analytical Processing (HOLAP)
MOLAP
Produces a hypercube
Pre-aggregated and pre-calculated
Rapid response times
Limited in the amount of data that can be managed
ROLAP
Data remains in a relational format
Some degree of aggregation
Slower response times
Scales to large amounts of data
HOLAP
Can manage data both as ROLAP and MOLAP
Currently evolving
MOLAP vendors are finding it easier to move into the HOLAP market space
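A hedged sketch of the MOLAP/ROLAP trade-off, using hypothetical data and helpers (`rolap_query`, `build_molap_cube`, `molap_query`): the ROLAP-style function scans detail rows at query time, while the MOLAP-style function pre-computes every aggregate once and then answers by lookup.

```python
from itertools import combinations

# Hypothetical detail rows: (region, product, sales).
ROWS = [("East", "Laptop", 100), ("West", "Laptop", 170),
        ("East", "Mouse", 30), ("West", "Mouse", 20)]
DIMS = ("region", "product")


def rolap_query(rows, **filters):
    """ROLAP style: scan the relational rows and aggregate at query time."""
    idx = {d: i for i, d in enumerate(DIMS)}
    return sum(r[-1] for r in rows
               if all(r[idx[d]] == v for d, v in filters.items()))


def build_molap_cube(rows):
    """MOLAP style: pre-compute aggregates for every subset of dimensions once."""
    cube = {}
    for n in range(len(DIMS) + 1):
        for dims in combinations(range(len(DIMS)), n):
            for r in rows:
                key = tuple((DIMS[i], r[i]) for i in dims)  # keys follow DIMS order
                cube[key] = cube.get(key, 0) + r[-1]
    return cube


def molap_query(cube, **filters):
    """Answer by dictionary lookup in the pre-computed cube."""
    key = tuple((d, filters[d]) for d in DIMS if d in filters)
    return cube.get(key, 0)
```

Both approaches return 320 in total and 190 for the West region. The pre-computed cube here already holds 9 cells for 4 detail rows, and it grows with every added dimension, which is why MOLAP is described above as limited in the amount of data it can manage.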
Data Mining
As defined by the Gartner Group in 1995, data mining is:
“…the process of discovering meaningful new correlations, patterns, and
trends by sifting through large amounts of data stored in a repository, using
pattern recognition technologies and statistical and mathematical
techniques.”
Data mining requires an analyst who is familiar with the domain to appropriately
model scenarios.
Data mining assists analysts in uncovering nontrivial data relationships.
Analysis must be conducted to determine the meanings of these newly identified
relationships.
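As a small, hypothetical example of pattern discovery, the sketch below counts which item pairs appear together in a set of baskets and keeps those above a minimum support threshold (the share of baskets containing both items). This is the first step of association-rule mining, chosen here as an illustration rather than as the method the definition refers to; the data and the `frequent_pairs` helper are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions (market baskets).
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]


def frequent_pairs(baskets, min_support=0.4):
    """Return item pairs found together in at least `min_support` of the baskets,
    mapped to their support (share of baskets containing both items)."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))  # sorted: one key per pair
    n = len(baskets)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}
```

With the default threshold of 0.4, it finds ("bread", "milk"), ("bread", "butter"), and ("beer", "chips"), each with support 0.4. As noted above, an analyst still has to decide whether such a pattern is meaningful.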
Why Use a Data Warehouse?
Data warehousing is a must for anyone who uses multiple data sources to make
decisions and understand business (trends, forecasting).
Those who do not move to warehousing will not be capable of responding to
problems and business conditions, thus falling behind the competition.
For organizations wanting to minimize costs and maximize productivity,
warehousing is a must.
Individuals who spend time gathering data instead of analyzing data require the
assistance of a warehouse.
Organizations that collect data but have difficulty determining meanings and
impacts need a data warehouse.
Making the Warehouse a Reality
Think big but work small.
Match technology to requirements.
Build for the future (scalability).
Work closely with the users.
Requirements
Rapid Application Development (RAD)
Periodic releases to the user community
Real World Success Stories
Radio Shack
Sales and stocking analysis
Marketing (regionalized mailings)
Wal-Mart
Sales and stock analysis
Trend analysis
Vendor analysis
Naval Surface Warfare Center (NSWC)
Procurement
Supply
Workload
Harris Semiconductor
Yield
Product
Personnel productivity
A Few Observations About Data Warehouses
Industry and our experience indicate that:
Warehouses that succeed average an ROI of 400%, with the top end being as
much as 600% in the first year.
The incremental approach is most successful (build the warehouse a
functional area at a time).
The average time to gather requirements, perform a design, and deploy a
warehouse increment is six months.
New tools may be required that differ from the transaction environment.
Software oriented toward intelligent analysis and query of the data
warehouse
Hardware oriented to support the massive storage requirements and
analytical queries
Keys to Success
Do you understand why you are building the warehouse?
Have you identified both technical and business professionals that you will need
to build the warehouse?
Do you have a strong management sponsor?
Are you managing the expectations of the users?
Careers in Data Warehousing
System Administration
DBA
DW Architect
Application Developer
Data Architect
Data Cleansing/Transformation Analyst
DW Manager
Business Analyst
DW Administrator
Management
Decision Support Analysts