
Data Warehouse and OLAP

Week-2
06-nov-2018

By
Dr.T.GopiKrishna
Bad decisions can lead to disaster!
• Data Warehousing is at the base of decision support
systems
Why are Data Warehousing & OLAP important?
They help to:
• Understand the information hidden within the organization’s
data.

• See data from different angles: product, client, time, geographical area

• Get adequate statistics to get your point across

• Get a glimpse of the future…


What is a Data Warehouse?
A Practitioner's Viewpoint

“A data warehouse is simply a single, complete, and consistent store of
data obtained from a variety of sources and made available to end
users in a way they can understand and use it in a business context.”
-- Barry Devlin, IBM Consultant
What is a Data Warehouse?
An Alternative Viewpoint
“A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant
collection of data in support of management's decisions.”
-- W. H. Inmon, regarded as the father of data warehousing

Put simply, it is a collection of data that is used primarily in
organizational decision making.


Subject-Oriented - Characteristics of a Data Warehouse

[Figure: Operational systems vs. Data Warehouse. Labels shown: Leads, Prospects, Customers, Products, Quotes, Regions, Time, Orders]

Focus is on Subject Areas rather than Applications


Subject oriented contd..
• The data in the data warehouse is organized so that
all the data elements relating to the same real-world
event or object are linked together
• Typical subject areas in DWs are
Customer,
Product,
Order,
Claim,
Account,…
Subject oriented contd..
• Case study: customer as subject in a DW
– The DW is organized in this case by customer
– It may consist of 10, 100 or more physical tables, all related.
Integrated
• The data warehouse contains data from most
or all of an organization's operational systems,
and this data is made consistent
– Use cases: gender encodings, units of measurement, conflicting keys
Integrated (use case) - Characteristics of a Data Warehouse

Appl A - m,f
Appl B - 1,0
Appl C - male,female
Data Warehouse - m,f

Appl A - balance dec fixed (13,2)
Appl B - balance pic 9(9)V99
Appl C - balance pic S9(7)V99 comp-3
Data Warehouse - balance dec fixed (13,2)

Appl A - bal-on-hand
Appl B - current-balance
Appl C - cash-on-hand
Data Warehouse - Current balance

Appl A - date (julian)
Appl B - date (yymmdd)
Appl C - date (absolute)
Data Warehouse - date (julian)

Integrated View Is The Essence Of A Data Warehouse
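
The gender use case above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the mapping table `GENDER_MAPS` and the function `integrate_gender` are hypothetical names, and the reading of Appl B's codes (1 = male, 0 = female) is an assumption based on the order in which the slide lists them.

```python
# Hypothetical per-application encodings, following the slide's use case.
GENDER_MAPS = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},  # assumption: 1 = male, 0 = female
    "C": {"male": "m", "female": "f"},
}


def integrate_gender(app: str, raw: str) -> str:
    """Translate an application-specific gender code to the warehouse code (m/f)."""
    return GENDER_MAPS[app][raw.strip().lower()]


# Records arriving from the three operational applications.
records = [("A", "f"), ("B", "1"), ("C", "Female")]
unified = [integrate_gender(app, raw) for app, raw in records]
print(unified)  # ['f', 'm', 'f']
```

The same pattern (one lookup table per source system, one target encoding) would apply to balances, field names and date formats.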


Non-volatile
– Data in the data warehouse is never overwritten or deleted. Once committed, the
data is static, read-only, and retained for future reporting.
– Data is loaded, but not updated.
– When subsequent changes occur, a new snapshot record is written.
Non-volatile contd.. - Characteristics of a Data Warehouse

[Figure: Operational systems allow insert, change, delete and replace; the Data Warehouse allows only load and read-only access]

Data Warehouse Is Relatively Static In Nature
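
A minimal sketch of the "load, never update" idea, assuming a simple in-memory store. The class `SnapshotStore`, its methods and the sample customer data are hypothetical and only illustrate the principle: every change is written as a new snapshot record, and old records stay readable.

```python
from datetime import date


class SnapshotStore:
    """Append-only store: a change adds a new snapshot, nothing is overwritten."""

    def __init__(self):
        self._rows = []  # list of (key, as_of_date, value)

    def load(self, key, as_of, value):
        # The only write operation: append a new snapshot record.
        self._rows.append((key, as_of, value))

    def history(self, key):
        # All snapshots for a key, oldest first.
        return sorted((d, v) for k, d, v in self._rows if k == key)

    def value_as_of(self, key, when):
        # Latest snapshot taken on or before the given date, if any.
        candidates = [(d, v) for d, v in self.history(key) if d <= when]
        return candidates[-1][1] if candidates else None


store = SnapshotStore()
store.load("cust-1", date(2018, 1, 1), "North region")
store.load("cust-1", date(2018, 6, 1), "South region")  # change = new snapshot

print(store.value_as_of("cust-1", date(2018, 3, 1)))  # North region
print(len(store.history("cust-1")))                   # 2
```

Because earlier snapshots are retained, reports can be reproduced for any past date.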


Time-varying
– The changes to the data in the data
warehouse are tracked and recorded so that
reports can be produced showing changes over
time.
– Different environments have different time
horizons associated with them.
• While a 60-to-90 day time horizon is normal
for operational systems, a data warehouse has a
5-to-10 year horizon
Time Variant contd.. - Characteristics of a Data Warehouse

Operational | Data Warehouse
Current value data | Snapshot data
Time horizon: 60-90 days | Time horizon: 5-10 years

• The data warehouse stores historical data

Data Warehouse Typically Spans Across Time


A More General Definition of a DW

A DW is a
– Repository of an organization’s electronically stored data.
– Designed to facilitate reporting and analysis.
Typical Features of a DW
Data warehouses typically:
– Reside on computers dedicated to this function
– Run on a DBMS such as Oracle, IBM DB2, Teradata or Microsoft SQL Server
– Retain data for long periods of time
– Consolidate data obtained from a variety of sources
– Are built around their own carefully designed data model
Evolution of Data Warehousing
1. 1960 - 1985 : MIS Era
• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports
Evolution of Data Warehousing
2. 1985 - 1990 : Querying Era

• Ad hoc, unstructured access to corporate data

• SQL as interface not scalable

• Cannot handle complex analysis


Evolution of Data Warehousing
3. 1990 - 20xx : Analysis Era
• Trend Analysis
• What If ?
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery
Need for Data Warehousing
• Better business intelligence for end-users

• Reduction in time to locate, access, and analyze information

• Consolidation of disparate information sources

• Strategic advantage over competitors

• Faster time-to-market for products and services

• Replacement of older, less-responsive decision support systems

• Reduction in demand on IS to generate reports


OLTP (Online Transaction Processing):

Also known as operational data, it represents day-to-day operational business activities:

• Purchasing, sales, production, distribution, …

– Typically used for data entry and retrieval transaction processing.

– Reflects only the current state of the data.


OLTP vs. DW

• A DW, in contrast, supports front-end analytics based on
the DW repository.
– It provides information for activities like
resource planning, capital budgeting,
marketing initiatives, …
– It is decision oriented.
OLTP Systems Vs Data Warehouse
Remember: between OLTP and Data Warehouse systems
• users are different
• data content is different
• data structures are different
• hardware is different
Understanding The Differences Is The Key
Properties
OLTP Vs Warehouse contd..
Operational System | Data Warehouse
Transaction Processing | Query Processing
Predictable CPU Usage | Random CPU Usage
Time Sensitive | History Oriented
Operator View | Managerial View
Normalized Efficient Design for TP | Denormalized Design for Query Processing
OLTP Vs Warehouse contd..
Operational System | Data Warehouse
Designed for Atomicity, Consistency, Isolation and Durability | Designed for a quiet or static database
Organized by transactions (Order, Input, Inventory) | Organized by subject (Customer, Product)
Relatively smaller database | Large database size
Many concurrent users | Relatively few concurrent users
Volatile Data | Non Volatile Data
OLTP Vs Warehouse contd..
Operational System | Data Warehouse
Stores all data | Stores relevant data
Performance Sensitive | Less Sensitive to performance
Not Flexible | Flexible
Efficiency | Effectiveness
Different kinds of Information Needs
• Current: Is this medicine available in stock?
• Recent: What are the tests this patient has completed so far?
• Historical: Has the incidence of Tuberculosis increased in the last 5 years in the Southern region?
OLAP (Online Analytical Processing)

• Data warehouse systems are well suited for On-Line Analytical Processing.
• OLAP describes the processing done at the warehouse.
• Examples of OLAP operations include drill-down and roll-up, which allow the user to
view the data at differing degrees of summarization.
OLAP Contd..
• OLAP consists of Summarization,
Consolidation, Aggregation, Different angle
view / Multidimensional Analysis for decision
making.
OLTP Vs OLAP
Standard DB (OLTP) | Warehouse (OLAP)
Mostly updates | Mostly reads
Many small transactions | Queries are long and complex
MB - GB of data | GB - TB of data
Current snapshot | History
Index/hash on p.k. | Lots of scans
Raw data | Summarized, reconciled data
Thousands of users (e.g., clerical users) | Hundreds of users (e.g., decision-makers, analysts)
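
The contrast between the two workloads can be sketched with Python's built-in `sqlite3` module. This is an illustration only: the `sales` table, its columns and its values are hypothetical, and a single in-memory database stands in for what would normally be two separate systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [
        ("North", 2017, 100.0),
        ("North", 2018, 150.0),
        ("South", 2017, 80.0),
        ("South", 2018, 120.0),
    ],
)

# OLTP-style work: a small transaction that touches one row via its primary key.
conn.execute("UPDATE sales SET amount = amount + 10 WHERE id = ?", (1,))
one_row = conn.execute("SELECT amount FROM sales WHERE id = ?", (1,)).fetchone()

# OLAP-style work: a read-only query that scans many rows and summarizes them.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(one_row)  # (110.0,)
print(totals)   # [('North', 260.0), ('South', 200.0)]
```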

Comparison between OLTP and OLAP
systems Contd..
OLTP Vs ODS Vs DWH
Characteristic | OLTP | ODS | Data Warehouse
Audience | Operating personnel | Analysts | Managers and analysts
Data access | Individual records, transaction driven | Individual records, transaction or analysis driven | Set of records, analysis driven
Data content | Current, real-time | Current and near-current | Historical
Data granularity | Detailed | Detailed and lightly summarized | Summarized and derived
Data organization | Functional | Subject-oriented | Subject-oriented
Data quality | All application specific detailed data needed to support a business activity | All integrated data needed to support a business activity | Data relevant to management information needs
OLTP Vs ODS Vs DWH Contd…
Characteristic | OLTP | ODS | Data Warehouse
Data redundancy | Non-redundant within system; unmanaged redundancy among systems | Somewhat redundant with operational databases | Managed redundancy
Data stability | Dynamic | Somewhat dynamic | Static
Data update | Field by field | Field by field | Controlled batch
Data usage | Highly structured, repetitive | Somewhat structured, some analytical | Highly unstructured, heuristic or analytical
Database size | Moderate | Moderate | Large to very large
Database structure stability | Stable | Somewhat stable | Dynamic
OLTP Vs ODS Vs DWH Contd…
Characteristic | OLTP | ODS | Data Warehouse
Development methodology | Requirements driven, structured | Data driven, somewhat evolutionary | Data driven, evolutionary
Operational priorities | Performance and availability | Availability | Access flexibility and end user autonomy
Philosophy | Support day-to-day operation | Support day-to-day decisions & operational activities | Support managing the enterprise
Predictability | Stable | Mostly stable, some unpredictability | Unpredictable
Response time | Sub-second | Seconds to minutes | Seconds to minutes
Return set | Small amount of data | Small to medium amount of data | Small to large amount of data
OLAP Operations

Since OLAP servers are based on a multidimensional view of data, we will
discuss OLAP operations on multidimensional data.
Here is the list of OLAP operations:
• Roll-up
• Drill-down
• Slice and dice
• Pivot (rotate)
Roll-up
Roll-up performs aggregation on a data cube in any of the following ways:
• By climbing up a concept hierarchy for a dimension
• By dimension reduction
On rolling up, the data is aggregated by ascending the location hierarchy from
the level of city to the level of country.
The data is then grouped by country rather than by city.
When roll-up is performed by dimension reduction, one or more dimensions are
removed from the data cube.
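
Both forms of roll-up can be sketched in plain Python. The fact rows, the numbers and the function names below are hypothetical and serve only as an illustration of the two techniques.

```python
from collections import defaultdict

# Hypothetical fact rows: (city, country, quarter, units_sold)
facts = [
    ("Toronto", "Canada", "Q1", 10),
    ("Toronto", "Canada", "Q2", 20),
    ("Vancouver", "Canada", "Q1", 30),
    ("Chicago", "USA", "Q1", 40),
    ("New York", "USA", "Q2", 50),
]


def roll_up_to_country(rows):
    """Climb the location hierarchy: aggregate city-level rows per country."""
    totals = defaultdict(int)
    for _city, country, quarter, units in rows:
        totals[(country, quarter)] += units
    return dict(totals)


def roll_up_drop_time(rows):
    """Dimension reduction: remove the time dimension, keep totals per city."""
    totals = defaultdict(int)
    for city, _country, _quarter, units in rows:
        totals[city] += units
    return dict(totals)


by_country = roll_up_to_country(facts)
by_city = roll_up_drop_time(facts)
print(by_country[("Canada", "Q1")])  # 40
print(by_city["Toronto"])            # 30
```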
OLAP Operations
Drill-down

Drill-down is the reverse operation of roll-up.
It is performed in either of the following ways:
• By stepping down a concept hierarchy for a dimension
• By introducing a new dimension
On drilling down, the time dimension is descended from the level of quarter to
the level of month.
When drill-down is performed by introducing a new dimension, one or more
dimensions are added to the data cube.
It navigates the data from less detailed data to highly detailed data.
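
A small sketch of drilling down from quarter to month, assuming the month-level detail is still available in the warehouse. The monthly figures and the names `QUARTER_OF`, `by_quarter` and `drill_down` are hypothetical.

```python
# Concept hierarchy for the time dimension: month -> quarter.
QUARTER_OF = {
    "Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
    "Apr": "Q2", "May": "Q2", "Jun": "Q2",
}

# Hypothetical units sold per month (the most detailed level stored).
monthly = {"Jan": 100, "Feb": 120, "Mar": 90, "Apr": 150, "May": 130, "Jun": 110}


def by_quarter(monthly_units):
    """Summarized view: total units per quarter."""
    out = {}
    for month, units in monthly_units.items():
        quarter = QUARTER_OF[month]
        out[quarter] = out.get(quarter, 0) + units
    return out


def drill_down(quarter, monthly_units):
    """Step down the hierarchy: show the months behind one quarter's total."""
    return {m: u for m, u in monthly_units.items() if QUARTER_OF[m] == quarter}


summary = by_quarter(monthly)
detail = drill_down("Q1", monthly)
print(summary)  # {'Q1': 310, 'Q2': 390}
print(detail)   # {'Jan': 100, 'Feb': 120, 'Mar': 90}
```

The detail rows always add up to the summarized value they were drilled down from.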
OLAP Operations
Slice
The slice operation selects a single value along one dimension of a given cube and provides
a new sub-cube. Here slice is performed for the dimension "time" using the criterion time = "Q1".

The result is a new sub-cube over the remaining dimensions.
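
As an illustration, a cube can be modelled as a dictionary keyed by (location, time, item). The cell values and the function name `slice_time` are hypothetical.

```python
# Hypothetical cube: (location, time, item) -> units sold
cube = {
    ("Toronto", "Q1", "Mobile"): 5,
    ("Toronto", "Q2", "Mobile"): 7,
    ("Vancouver", "Q1", "Modem"): 3,
    ("Vancouver", "Q2", "Modem"): 4,
}


def slice_time(cube, quarter):
    """Fix time to one value; the sub-cube keeps only location and item."""
    return {
        (location, item): units
        for (location, time, item), units in cube.items()
        if time == quarter
    }


q1 = slice_time(cube, "Q1")
print(q1)  # {('Toronto', 'Mobile'): 5, ('Vancouver', 'Modem'): 3}
```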


OLAP Operations
Dice
Dice selects values on two or more dimensions of a given cube and provides a new sub-cube.
The dice operation on the cube based on the following selection criteria involves three
dimensions:

(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
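
The selection criteria above can be sketched as a filter over a cube stored as a dictionary. The cell values, the extra cells ("Chicago", "Q3", "Phone") and the function name `dice` are hypothetical.

```python
# Hypothetical cube: (location, time, item) -> units sold
cube = {
    ("Toronto", "Q1", "Mobile"): 5,
    ("Vancouver", "Q2", "Modem"): 4,
    ("Chicago", "Q1", "Mobile"): 9,  # excluded: location not selected
    ("Toronto", "Q3", "Modem"): 2,   # excluded: time not selected
    ("Toronto", "Q1", "Phone"): 6,   # excluded: item not selected
}


def dice(cube, locations, quarters, items):
    """Keep the cells whose coordinates match the selection on every dimension."""
    return {
        key: units
        for key, units in cube.items()
        if key[0] in locations and key[1] in quarters and key[2] in items
    }


sub_cube = dice(
    cube,
    locations={"Toronto", "Vancouver"},
    quarters={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
print(sub_cube)
# {('Toronto', 'Q1', 'Mobile'): 5, ('Vancouver', 'Q2', 'Modem'): 4}
```

Unlike a slice, the diced sub-cube keeps all three dimensions; only the range of values on each one shrinks.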
OLAP Operations
Pivot
The pivot operation is also known as rotation. It rotates the data axes in view in
order to provide an alternative presentation of data. 
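
A pivot can be illustrated on a two-dimensional view stored as nested dictionaries. The figures and the function name `pivot` are hypothetical; the data itself does not change, only which dimension is shown on rows and which on columns.

```python
# Hypothetical 2-D view: rows are locations, columns are quarters.
table = {
    "Toronto": {"Q1": 5, "Q2": 7},
    "Vancouver": {"Q1": 3, "Q2": 4},
}


def pivot(table):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_key, columns in table.items():
        for col_key, value in columns.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


rotated = pivot(table)
print(rotated)
# {'Q1': {'Toronto': 5, 'Vancouver': 3}, 'Q2': {'Toronto': 7, 'Vancouver': 4}}
```

Rotating twice returns the original view.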
OLAP Operations
Types of OLAP Servers

We have four types of OLAP servers:
• Relational OLAP (ROLAP)
• Multidimensional OLAP (MOLAP)
• Hybrid OLAP (HOLAP)
• Specialized SQL Servers
Applications of Data Warehouse
Data Warehouse Applications by Industry
Typical Data Warehouse Architecture

[Figure: Multi-tiered Data Warehouse with ODS. Labels shown: Operational Systems/Data; Data Preparation (Select, Extract, Transform, Integrate, Maintain); ODS; Data Preparation (Select, Extract, Transform, Load); Data Warehouse; Data Marts; Metadata]


Thank You!
