An Introduction to Data Warehousing

► At the end of this lesson, you will know

• What is Data Warehousing
• The evolution of Data Warehousing
• Need for Data Warehousing
• OLTP Vs Warehouse Applications
• Data marts Vs Data Warehouses
• Operational Data Stores

What is a Data Warehouse ?
A data warehouse is a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions. - WH Inmon

WH Inmon - Regarded As Father Of Data Warehousing

Subject-Oriented - Characteristics of a Data Warehouse

Focus is on Subject Areas rather than Applications

Integrated - Characteristics of a Data Warehouse

Gender encoding
• Appl A - m,f
• Appl B - 1,0
• Appl C - male,female
• Data Warehouse - m,f

Balance format
• Appl A - balance dec fixed (13,2)
• Appl B - balance pic 9(9)V99
• Appl C - balance pic S9(7)V99 comp-3
• Data Warehouse - balance dec fixed (13,2)

Balance name
• Appl A - bal-on-hand
• Appl B - current-balance
• Appl C - cash-on-hand
• Data Warehouse - Current balance

Date format
• Appl A - date (julian)
• Appl B - date (yymmdd)
• Appl C - date (absolute)
• Data Warehouse - date (julian)

Integrated View Is The Essence Of A Data Warehouse
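The integration step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the `integrate` function, and the reading of Appl B's 1 as male and 0 as female are hypothetical, since the slide gives the encodings but not their meaning.

```python
from decimal import Decimal

# Map each application's gender encoding onto the warehouse convention (m, f).
GENDER_MAP = {
    "A": {"m": "m", "f": "f"},
    "B": {"1": "m", "0": "f"},  # assumption: 1 means male, 0 means female
    "C": {"male": "m", "female": "f"},
}

# Each application uses a different field name for the same balance.
BALANCE_FIELD = {
    "A": "bal-on-hand",
    "B": "current-balance",
    "C": "cash-on-hand",
}


def integrate(app, record):
    """Convert one application record into the single warehouse format."""
    gender = GENDER_MAP[app][str(record["gender"]).lower()]
    # Store every balance as a decimal with two places, whatever the source format.
    balance = Decimal(str(record[BALANCE_FIELD[app]])).quantize(Decimal("0.01"))
    return {"gender": gender, "current_balance": balance}


row_a = integrate("A", {"gender": "f", "bal-on-hand": "10"})
row_b = integrate("B", {"gender": 1, "current-balance": "1200.5"})
row_c = integrate("C", {"gender": "Male", "cash-on-hand": "75.25"})
```

After this step all three rows share the same field names and encodings, so a query against the warehouse does not need to know which application a row came from.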

Non-volatile - Characteristics of a Data Warehouse

Operational System: insert, change, delete
Data Warehouse: load, read only access


Data Warehouse Is Relatively Static In Nature

Time Variant - Characteristics of a Data Warehouse

Data Warehouse: Snapshot data
• time horizon : 5-10 years
• key has an element of time
• data warehouse stores historical data

Operational System: Current Value data
• time horizon : 60-90 days
• key may not have element of time

Data Warehouse Typically Spans Across Time
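The difference between current value data and snapshot data comes down to whether the key carries a time element. A minimal Python sketch, assuming a hypothetical account balance feed (the names `current_balance`, `balance_snapshots`, and `record_balance` are illustrative only):

```python
from datetime import date

# Operational store: one current value per key; an update overwrites the old value.
current_balance = {}

# Warehouse store: the key includes a date, so every load adds a new snapshot.
balance_snapshots = {}


def record_balance(account_id, as_of, amount):
    """Apply the same balance to both stores to show how they diverge."""
    current_balance[account_id] = amount
    balance_snapshots[(account_id, as_of)] = amount


record_balance("ACC-1", date(2023, 1, 31), 100)
record_balance("ACC-1", date(2023, 2, 28), 250)
```

After both calls the operational store holds only the latest balance, while the snapshot store still holds the January value next to the February one.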

Alternate Definitions
A collection of integrated, subject oriented databases designed to support the DSS function, where each unit of data is relevant to some moment of time - Imhoff

Alternate Definitions
A Data Warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End-user-oriented data access and reporting tools let users get at the data for decision support - Babcock

Evolution of Data Warehousing
1960 - 1985 : MIS Era

• Unfriendly
• Slow
• Dependent on IS programmers
• Inflexible
• Analysis limited to defined reports

Focus on Reporting

Evolution of Data Warehousing
1985 - 1990 : Querying Era

• Ad hoc, unstructured access to corporate data
• SQL as interface not scalable
• Cannot handle complex analysis

Focus on Online Querying

Evolution of Data Warehousing
1990 - 20xx : Analysis Era

• Trend Analysis
• What If ?
• Moving Averages
• Cross Dimensional Comparisons
• Statistical profiles
• Automated pattern and rule discovery

Focus on Online Analysis
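Of the analyses listed for this era, the moving average is the easiest to show in code. The sketch below is a plain-Python illustration with made-up monthly sales figures; the function name and data are assumptions, not part of the original material.

```python
def moving_average(values, window):
    """Return the simple moving average of values over a fixed-size window."""
    if window <= 0:
        raise ValueError("window must be positive")
    # One average per position where a full window of values is available.
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


monthly_sales = [10, 20, 30, 40]  # hypothetical figures
smoothed = moving_average(monthly_sales, 2)
```

Such a calculation needs several periods of history for the same measure, which is why it fits a warehouse better than a system that keeps only current values.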

Need for Data Warehousing
► Better business intelligence for end-users
► Reduction in time to locate, access, and analyze information
► Consolidation of disparate information sources
► Strategic advantage over competitors
► Faster time-to-market for products and services
► Replacement of older, less-responsive decision support systems
► Reduction in demand on IS to generate reports

Remember

Between OLTP and Data Warehouse systems
• users are different
• data content is different
• data structures are different
• hardware is different

Understanding The Differences Is The Key

OLTP Systems Vs Data Warehouse

OLTP Vs Warehouse
Operational System
• Transaction Processing
• Predictable CPU Usage
• Time Sensitive
• Operator View
• Normalized Efficient Design for TP

Data Warehouse
• Query Processing
• Random CPU Usage
• History Oriented
• Managerial View
• Denormalized Design for Query Processing
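The normalized versus denormalized contrast can be sketched with Python's built-in `sqlite3` module. The tables, columns, and rows below are hypothetical and chosen only to show the idea: the operational design keeps customers and orders in separate tables, while the query-oriented design repeats customer attributes on every order row so that analysis needs no join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized design for transaction processing: each fact is stored once.
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customer VALUES (1, 'Asha', 'South'), (2, 'Ben', 'North');
INSERT INTO orders VALUES (10, 1, 50.0), (11, 1, 25.0), (12, 2, 40.0);

-- Denormalized design for query processing: customer data repeated per order.
CREATE TABLE sales_flat AS
  SELECT o.order_id, c.name, c.region, o.amount
  FROM orders o JOIN customer c ON c.customer_id = o.customer_id;
""")

# An analytical query now reads a single table.
sales_by_region = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales_flat GROUP BY region")
)
conn.close()
```

The flat table costs extra storage and would be awkward to update row by row, which is acceptable when data is loaded in bulk and then only read.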

OLTP Vs Warehouse
Operational System
• Designed for Atomicity, Consistency, Isolation and Durability
• Organized by transactions (Order, Input, Inventory)
• Many concurrent users
• Volatile Data
• Relatively smaller database

Data Warehouse
• Designed for quiet or static database
• Organized by subject (Customer, Product)
• Relatively few concurrent users
• Non Volatile Data
• Large database size

OLTP Vs Warehouse
Operational System
• Stores all data
• Performance Sensitive
• Not Flexible
• Efficiency

Data Warehouse
• Stores relevant data
• Less Sensitive to performance
• Flexible
• Effectiveness

Examples Of Some Applications
► Target Marketing
► Market Segmentation
► Budgeting
► Credit Rating Agencies
► Financial Reporting and Consolidation
► Market Basket Analysis - POS Analysis
► Churn Analysis
► Profitability Management
► Event tracking

(Figure labels: Manufacturers, Retailers, Customers)

Do we need a separate database ?

► OLTP and data warehousing require two very differently configured systems
► Isolation of Production System from Business Intelligence System
► Significant and highly variable resource demands of the data warehouse
► Cost of disk space no longer a concern
► Production systems not designed for query


Data Marts
► Enterprise-wide data warehousing projects have a very long cycle time
► Getting consensus between multiple parties may also be difficult
► Departments may not be satisfied with priority accorded to them
► Sometimes individual departmental needs may be strong enough to warrant a local implementation
► Application/database distribution is also an important factor

Data Marts
Subject or Application Oriented Business View of Warehouse
» Finance, Manufacturing, Sales etc.
» Smaller amount of data used for Analytic Processing
» Frequently implemented using MDDBs

A Logical Subset of The Complete Data Warehouse

Data Warehouse and Data Mart
Data Warehouse
• Application Neutral
• Centralized, Shared
• Cross LOB/enterprise
• Historical Detailed data
• Some summary
• Multiple subject areas

Data Marts
• Specific Application Requirement
• LOB, department
• Business Process Oriented
• Detailed (some history)
• Summarized
• Single Partial subject
• Multiple partial subjects
• OLTP snapshots

Row labels: Data Perspective, Subjects

Data Warehouse and Data Mart
Data Sources
• Data Warehouse: Many; Operational/External Data
• Data Marts: Few; Operational, external data; OLTP snapshots

Implement Time Frame
• Data Warehouse: 9-18 months for first stage; Multiple stage implementation
• Data Marts: 4-12 months

Characteristics
• Data Warehouse: Flexible, extensible; Durable/Strategic; Data orientation
• Data Marts: Restrictive, non extensible; Short life/tactical; Project Orientation

Warehouse or Mart First ?
Data Warehouse First
• Expensive
• Large development cycle
• Change management is difficult
• Difficult to obtain continuous corporate support
• Technical challenges in building large databases

Data Mart first
• Relatively cheap
• Delivered in < 6 months
• Easy to manage change
• Can lead to independent and incompatible marts
• Cleansing, transformation, modeling techniques may be incompatible

Operational Data Store - Definition
"Can I see credit report from Accounts, Sales from marketing and open order report from order entry for this customer?"

A subject oriented, integrated, volatile, current valued data store containing only corporate detailed data

• Data from multiple sources is integrated for a subject
• Data stored only for current period. Old Data is either archived or moved to Data Warehouse
• Identical queries may give different results at different times
• Supports analysis requiring current data
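The customer question on this slide can be sketched as a small Python example. The three source dictionaries and the `customer_view` function are hypothetical stand-ins for the Accounts, marketing, and order entry systems; the point is only that the integrated view is current valued, so the same query returns a different answer once a source changes.

```python
# Hypothetical current data held by three operational sources, keyed by customer.
accounts = {"C1": {"credit_rating": "A"}}
marketing = {"C1": {"sales": 1200}}
order_entry = {"C1": {"open_orders": 3}}


def customer_view(customer_id):
    """Integrate the current data about one customer from all three sources."""
    view = {"customer_id": customer_id}
    for source in (accounts, marketing, order_entry):
        view.update(source.get(customer_id, {}))
    return view


before = customer_view("C1")
order_entry["C1"]["open_orders"] = 1  # an operational update arrives
after = customer_view("C1")
```

Here `before` and `after` come from the identical query yet differ in `open_orders`, and no history of the earlier value is kept unless it is archived or moved to the data warehouse.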

Operational Data Store
► Increasingly becoming integrated with the data warehouse
► Are nothing but more responsive real time data warehouses
► Data Mining has anyway forced Data Warehouses to store transactional level data

Different kinds of Information Needs
► Current
Is this medicine available in stock?

► Recent
What are the tests this patient has completed so far?

► Historical
Has the incidence of Tuberculosis increased in the last 5 years in the Southern region?

Audience
• OLTP: Operating Personnel
• ODS: Analysts
• Data Warehouse: Managers and analysts

Data access
• OLTP: Individual records, transaction driven
• ODS: Individual records, transaction or analysis driven
• Data Warehouse: Set of records, analysis driven

Data content
• OLTP: Current, real-time
• ODS: Current and near-current
• Data Warehouse: Historical

Data granularity
• OLTP: Detailed
• ODS: Detailed and lightly summarized
• Data Warehouse: Summarized and derived

Data organization
• OLTP: Functional
• ODS: Subject-oriented
• Data Warehouse: Subject-oriented

Data quality
• OLTP: All application specific detailed data needed to support a business activity
• ODS: All integrated data needed to support a business activity
• Data Warehouse: Data relevant to management information needs

Data redundancy
• OLTP: Non-redundant within system; Unmanaged redundancy among systems
• ODS: Somewhat redundant with operational databases
• Data Warehouse: Managed redundancy

Data stability
• OLTP: Dynamic
• ODS: Somewhat dynamic
• Data Warehouse: Static

Data update
• OLTP: Field by field
• ODS: Field by field
• Data Warehouse: Controlled batch

Data usage
• OLTP: Highly structured, repetitive
• ODS: Somewhat structured, some analytical
• Data Warehouse: Highly unstructured, heuristic or analytical

Database size
• OLTP: Moderate
• ODS: Moderate
• Data Warehouse: Large to very large

Database structure stability
• OLTP: Stable
• ODS: Somewhat stable
• Data Warehouse: Dynamic

Development methodology
• OLTP: Requirements driven, structured
• ODS: Data driven, somewhat evolutionary
• Data Warehouse: Data driven, evolutionary

Operational priorities
• OLTP: Performance and availability
• ODS: Availability
• Data Warehouse: Access flexibility and end user autonomy

Philosophy
• OLTP: Support day-to-day operation
• ODS: Support day-to-day decisions & operational activities
• Data Warehouse: Support managing the enterprise

Predictability
• OLTP: Stable
• ODS: Mostly stable, some unpredictability
• Data Warehouse: Unpredictable

Response time
• OLTP: Sub-second
• ODS: Seconds to minutes
• Data Warehouse: Seconds to minutes

Return set
• OLTP: Small amount of data
• ODS: Small to medium amount of data
• Data Warehouse: Small to large amount of data
