You are on page 1of 26

Data Warehousing – An

Introductory Perspective

DWCC BBSR
Agenda
 Why Data Warehouse
 Definition and Architecture
 Terminology
The Business Need

Business Decisions It
hin
Are not made by I g k…
ue . e
Rolling Dices ss rrr
so r,

We Don’t know
What we don’t
know
Current Business Environment
 Competitive
 Ever Changing
 Chaotic
 Global
 Urgency to make decisions
 Competitive advantages stems from well informed
decisions
 Based on an understanding of:
 Your Products
 Your Customers Preferences
 The Competition
 Your own company strengths
The Value Pyramid

Each layer provides


Value en route to a
targeted business
Outcome

Business Outcomes

Actions

Decisions

Knowledge
Increased revenue
Increased productivity
Information
Reduced costs
Competitive advantage Data
Definitions
 A collection of integrated, subject oriented
databases designed to support the DSS function
where each unit of data is relevant at some moment
of time (Inmon 1991)
 A copy of transaction data specifically structured to
Query and Analysis (Kimball 1996)
 Data Warehouse is NOT a specific technology
 It is a series of processes, procedures and tools that
help the enterprise understand more about itself, its
products, its customers and the market it services.
 It is NOT possible to purchase a Data Warehouse
 But, it is possible to build one.
sachin_kambhoj
sachin_kambhoj
: : sachin_kambhoj
sachin_kambhoj
::

FEATURES
•Non Volatile - Used mainly for reporting purpose and it is
independent of transactional data.

•Subject Orientation- All relevant data is stored together.


Ex: Sales, Finance, Marketing, Customer data etc.

•Historical data- Can contain data of several years


depending on company requirements.
Subject Orientation.

 Operational Datawarehouse

AUTO Customer

HEALTH Policy

LIFE Premium

CASUALTY Claims

Applications Subjects
Goals and Applications

 Goals of a Data Warehouse


 Provide reliable, High performance access
 Consistent view of Data: Same query, same data. All users
should be warned if data load has not come in.
 Slice and dice capability
 Quality of data is a driver for business re-engineering.

 Data Warehousing Applications:


 Customer Profitability Analysis
 Customer satisfaction and retention
 Buyer behavior.
 Pricing, Promotion Analysis
 Market research
 Inventory optimization
OLTP v/s Data Warehouse
OLTP system runs the business, Data Warehouses
tell you how to run the business
Characteristic OLTP Data Warehouse
Orientation Transaction Analysis
Data Access Record at a time Set at a time
Updates Frequent & Periodic &
Unscheduled Scheduled
Response time Seconds required Minutes
acceptable
Concurrent users Many Few
Availability Guaranteed As needed
Data structures Highly normalized Often de-
normalized
Data nature Current historical
If most of your business needs are
 To report on data in a single transaction processing system
 All the historical data you need are in the system
 Data in the system is clean
 Your hardware can support reporting against the live system data
 The structure of the system data is relatively simple
 Your firm does not have much interest in end user adhoc
query/report tools

Data warehousing may not be for your business!!


Modeling Constructs

 Entity Relationship Diagram


 Star schema
 Snow flake schema

Within the implementation of a warehouse,


several of these constructs may be
integrated to form an optimal design
Entity Relationship Diagram

• Based on set theory and SQL


• Highly normalized
• Optimized for update and fast transaction
turnaround
• Not suited for querying in a data warehouse
environment
• diagrams like these are very difficult for users to
visualize and memorize.
Star Schema
A central fact table surrounded by a number of dimension
tables.

Dimensions are business entities on which calculations are


done. They can be numeric or alphanumeric.
Example: Product table comprising brand name,
category, packaging type, size.

Facts are numerical measurements of business with


respect to dimensions.They are numeric and additive
(summable across any combination)
e.g. A sales fact table could contain time, product and store
key along with dollars sold, units sold, dollars cost.
Snow Flake Schema

Normalized version of the star schema with the addition of


normalized dimension tables.

Normalization helps to reduce redundancy in the dimension


tables, but affects performance and user comprehension.
DW Terminology
Granularity
 Granularity (or grain) defines the level of detail stored in the
physical warehouse
 Low granularity indicates lot of detail while high granularity
indicates less detail.
 Example: A commercial airline is building a data warehouse. What
will the granularity be?
 Choice A: Each record represents a flight

 Choice B: Each record represents the customer on a flight

There is no correct answer. To a large extent, the granularity


depends on the business User’s exploitation needs.
However, you should be aware that the granularity of data
affects
 Volumes of Data, Data Maintenance, Indexing

 Level of Data Exploration

 Query and Reporting constraints


DW Terminology
Metadata
 At all levels of the data warehouse, information is
required to support the maintenance and use of the
data warehouse.
Metadata is data about data.
There are two views of Metadata
 Business – are warehouse attributes and properties
for use by business users
 Technical – describe data flow from Operational
systems into the data warehouse
OLAP
 Online Analytical processing
 Tool(s) for Analytical Reporting including Graphical
capabilities.
DW Terminology
OLAP Tools available for exploring the information built
in a DW :
 Multi-dimensional On-line Analytical Processing
(MOLAP)
 The data from data warehouse is queried and dumped
periodically on to a server on local network to a data
storage called Multi-dimensional Database (MDDB)
provided by the OLAP tool. This MDDB forms a Data Mart
which is then used for querying and reporting.
 Relational On-Line Analytical Processing (ROLAP)
 Refers to the ability to conduct OLAP analysis directly
against a relational warehouse without any constraints on
the number of dimensions, database size, analytical
complexity, or number and type of users.
 Hybrid On-line Analytical Processing (HOLAP)
 An environment with a combination of MOLAP and ROLAP
data storage. Summarized information is typically stored
in an MDDB and detailed data is stored in a Relational
environment.
Terminology

Data Mart- Contains Data about a specific subject. Eg. Official


data, Customer data, Campaign data etc.

Metadata- Data about data. Describes the data stored in Data


warehouse.

Data Cubes- Central object of data containing information in a


multidimensional structure.

Data Cleansing- Regular cleaning of data.

ETL- Extraction, Transformation and Loading of Data.

Data Mining- A mechanism which uses intelligent algorithms to


discover patterns, clusters and models from data.
Stages

Heterogeneous Extraction, Business


Source Systems Transformation & Intelligence
Loading (ETL)
Query &
Reporting
Operational

Staging
Legacy Area Data OLAP
Warehouse

External
Data Mining
A Typical Data Warehouse

Data Warehouse
Summarized Data Facilitates in firing
Meta Data queries on
detailed data.

Detailed Data

Data Data Data Data marts contain


Mart Mart Mart data specific to a
subject.
MOLAP/ROLAP/HOLAP

MDD MDD
Proprietary Proprietary
API API
SQL
Query Tool OLAP Custom Data
by MDD Vendor
Engine Loader Warehouse

Rows Rows (RDBMS)


Cubes MDD
Periodic,
Database
Manual
Storage
Data
Load
OLAP Terminology
Region
State
District
no
gi
Re

Location

Month

Product
 Analytical technique whereby the user navigates
from the most summarized to the most detailed
level.
OLAP Terminology
 Rotation Or Dicing
n

th
o
gi

on
Re

M
M
P
O R
O
N
D
T U
C
h
t

Product Region
OLAP Terminology
gi
o Slicing
n
Re

M
O
N
T
h

Product
Products and Vendors

 Data Warehouses
 Oracle
 Sybase
 DB2
 OLAP tools
 Oracle Express
 Hyperion Essbase
 Data Mining
 Oracle Darwin
 IBM Intelligent Data Miner
 Querying & Reporting
 Oracle Discoverer
 Business Objects

You might also like