You are on page 1of 23

Data Warehouse Prerequisites

 Familiarity with Microsoft SQL Server


 System Administration for Microsoft SQL Server 7.0
and Implementing a Database on Microsoft SQL
Server 7.0
 Knowledge of Transact-SQL Usage in Developing
OLTP Systems
 Basic Understanding of Programming Principles and
Experience with a Scripting Language
 Understanding of Basic Database Design,
Administration, and Implementation Concepts
What is Data Warehousing?

Peter Rawsthorne

Special Thanks to Bill Inmon, the “grandfather”


of data warehousing.
OLTP vs. DSS
 Online Transaction Processing (OLTP)
 Decision Support System (DSS)

 OLTP
– ATM, Bank Teller, Ticket Master, POS…
 DSS
– Marketing, What if?, Inventory, Health (Walmart)…
– Click-through analysis
In-class Exercise
1. Break into teams of three
2. Think of a business or subject area
3. Determine three OLTP systems required to support
business or subject

4. Determine two DSS systems required to support


business or subject

5. Provide one example of how the DSS system could


be used to predict the future
Exercise Example
 Business: Yacht Club
 OLTP
– Membership System
– Accounting System
– Yacht Racing Results System
 DSS
– Quarterly and Yearly Expenses
– Race Results

 How much beer will we need for next years regatta


for the male non-members who are crew on yachts
over 40 feet?
What then is a data warehouse?

A data warehouse is a:
 subject oriented,
 integrated,
 time variant,
 non volatile

collection of data in support of


management's decision
making process.
Subject Orientation
 Data is organized via
subject rather than process
or business function.
 The application world is
concerned both with data
base design and process
design.
 The data warehouse world
focuses on data modeling
and database design
exclusively.
Integration
 Easily the most important
aspect of the data
warehouse environment is
that data found within the
data warehouse is
integrated. ALWAYS. WITH
NO EXCEPTIONS.
 consistent naming
conventions,
 consistent measurement of
variables,
 consistent encoding
structures,
 consistent physical
attributes of data,
 and so forth.
Time Variant
OPERATIONAL DATA WAREHOUSE
 Current valued data  Snapshot data
 Time horizon: 60 – 90 days  Time horizon: 5 – 10 days
 Key fields may or may not
 Keys do not have an
have an element of time
element of time
 Data can be updated
 Once snapshot is made,
records cannot be updated
non volatile
 Inserts, deletes, and changes - are done regularly to the
operational environment on a record by record basis.
 There are only two kinds of operations that occur in the data
warehouse - the initial loading of data, and the access of
data.
The structure of the warehouse

The different components of


the data warehouse are:
 metadata,
 current detail data,
 old detail data,
 lightly summarized data,
and
 highly summarized data.
Current Detail Data
 Most recent happenings
 Voluminous
 Lowest level of granularity
 Almost always stored on disk storage
 Fast to access
 Expensive and complex to manage
Older detail data
 Stored on some form of mass storage
 Infrequently accessed
 Stored at a level of detail consistent with current
detailed data
 Often stored on an alternate storage medium
 Anticipated large volume
Lightly summarized data
 Distilled from the low level of detail
 Almost always stored on disk
 Design issues facing the data architect are;
– what unit of time

– what contents – attributes

 Frequently mined data, a lot of “what if?”


Highly summarized data
 Compact and easily accessible
 Sometimes found in the data warehouse
 Sometimes found outside the data warehouse
 In any case, the highly summarized data is part of
the data warehouse
 Yearly or multi year summaries
Metadata
 Sits in a different dimension
 Contains no data directly taken from the
operational environment
 Special and very important role
 Metadata is used as:
– a directory to locate the contents

– a guide to the mapping of data

– a guide to the algorithms used for summarization


Metadata – levels of summarization
Flow of Data
 data enters from the
operational environment, it
is transformed
 data goes into the current
detail level of detail
 It resides there and is used
there until one of three
events occurs:
– it is purged,
– it is summarized, and/or
– it is archived.
Using the Data Warehouse
Example
Summary
 A data warehouse is a
subject oriented,
integrated,
time variant,
non volatile
collection of data in support of management's decision needs.
 Four levels of data warehouse data:
old detail,
current detail,
lightly summarized data, and
highly summarized data.
 Metadata is a very important part
Lab deliverables

 W2KS Install

 SQL7.0 Install

 SQL7.0 OLAP Services Install

 MSPress install

 Complete MSPress Chapter 1


Contact Information
Peter Rawsthorne, B.Tech, MCSD, MCT, CCR
President, Eclectic Endeavours Inc.

559A Artisan Lane


PO Box 281
Bowen Island, BC
CANADA V0N 1G0

Phone: 604-947-2760
Fax: 604-947-2715
email: peterr@endeavours.com
web: http://www.endeavours.com

You might also like