Copyright © 2005, Oracle. All rights reserved.

Data Warehouse Design
Objectives
After completing this lesson, you should be able to do
the following:
• Differentiate OLTP and data warehousing design
techniques
• Describe effective data warehouse design
• Identify data warehousing schemas
• Explain implementation models
• List data warehousing objects
Characteristics of a Data Warehouse
• A data warehouse is a database designed for
querying, reporting, and analysis.
• A data warehouse contains historical data derived
from transaction data.
• Data warehouses separate analysis workload from
transaction workload.
• A data warehouse is primarily
an analytical tool.
Comparing OLTP and Data Warehouses
                               OLTP                  Data Warehouse
Joins                          Many                  Some
Data accessed by queries       Comparatively lower   Large amount
Duplicated data                Normalized DBMS       Denormalized DBMS
Derived data and aggregates    Rare                  Common
Data Warehouse Architectures
Basic Data Warehouse
[Diagram: data sources (operational systems, flat files) load the warehouse (metadata, raw data, materialized views), which users query for analysis, reporting, and data mining.]
Data Warehouse Architectures
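Data Warehouse with Staging Area
[Diagram: data sources (operational systems, flat files) load a staging area, which feeds the warehouse (metadata, raw data, materialized views); users query the warehouse for analysis, reporting, and data mining.]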
Data Warehouse Architectures
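Data Warehouse with Staging Area
[Diagram: data sources (operational systems, flat files) load a staging area, which feeds the warehouse (metadata, raw data, materialized views); the warehouse in turn feeds Sales, Purchasing, and Inventory data marts, which users query for analysis, reporting, and data mining.]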
Data Warehouse Design
Key data warehouse design considerations:
• Identify the specific data content.
• Recognize the critical relationships within and
between groups of data.
• Define the system environment
supporting your data warehouse.
• Identify the required data
transformations.
• Calculate the frequency at which
the data must be refreshed.
Logical Design
• A logical design is conceptual and
abstract.
• Entity-relationship (ER) modeling
is useful in identifying logical
information requirements.
– An entity represents a chunk of data.
– The properties of entities are known as attributes.
– The links between entities are known as
relationships.
• Dimensional modeling is a specialized
type of ER modeling useful in data warehouse
design.
Oracle Warehouse Builder
• Oracle Database provides tools to implement the
ETL process.
– Oracle Warehouse Builder is a tool to help in this
process.
• Oracle Warehouse Builder generates the following
types of code:
– SQL data definition language (DDL) scripts
– PL/SQL programs
– SQL*Loader control files
– XML Processing Description Language (XPDL)
– ABAP code (used to extract data from SAP
systems)
Data Warehousing Schemas
• Objects can be arranged in data warehousing
schema models in a variety of ways:
– Star schema
– Snowflake schema
– Third normal form (3NF) schema
– Hybrid schemas
• The source data model and user
requirements should steer the data
warehouse schema.
• Implementation of the logical model may require
changes to enable you to adapt it to your physical
system.
Schema Characteristics
• Star schema
– Characterized by one or more large fact tables and
a number of much smaller dimension tables
– Each dimension table joined to the fact table using
a primary key to foreign key join
• Snowflake schema
– Dimension data grouped into multiple tables
instead of one large table
– Increased number of dimension tables, requiring
more foreign key joins
• Third normal form (3NF) schema
– A classical relational-database model that
minimizes data redundancy through normalization
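The difference between the star and snowflake layouts can be sketched with a small in-memory example. This is an illustration only: it uses Python's sqlite3 module as a stand-in for Oracle Database, and every table name, column name, and data value is hypothetical. The same question costs one join against the star layout and two joins against the snowflake layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: the whole product dimension lives in one denormalized table.
CREATE TABLE products_star (
    prod_id       INTEGER PRIMARY KEY,
    prod_name     TEXT NOT NULL,
    category_name TEXT NOT NULL
);
-- Snowflake: the same dimension split into two normalized tables.
CREATE TABLE categories (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE products_snow (
    prod_id     INTEGER PRIMARY KEY,
    prod_name   TEXT NOT NULL,
    category_id INTEGER NOT NULL REFERENCES categories (category_id)
);
-- Fact table shared by both variants (hypothetical data).
CREATE TABLE sales (
    prod_id INTEGER NOT NULL,
    amount  REAL NOT NULL
);
INSERT INTO categories VALUES (1, 'Audio'), (2, 'Video');
INSERT INTO products_star VALUES (10, 'Radio', 'Audio'), (20, 'TV', 'Video');
INSERT INTO products_snow VALUES (10, 'Radio', 1), (20, 'TV', 2);
INSERT INTO sales VALUES (10, 5.0), (10, 7.0), (20, 30.0);
""")

# Star schema: one primary key to foreign key join reaches the category.
star_rows = conn.execute("""
    SELECT p.category_name, SUM(s.amount)
    FROM sales s
    JOIN products_star p ON p.prod_id = s.prod_id
    GROUP BY p.category_name
    ORDER BY p.category_name
""").fetchall()

# Snowflake schema: a second join is needed to reach the category name.
snowflake_rows = conn.execute("""
    SELECT c.category_name, SUM(s.amount)
    FROM sales s
    JOIN products_snow p ON p.prod_id = s.prod_id
    JOIN categories c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
""").fetchall()

print(star_rows)
print(snowflake_rows)
```

Both queries return the same totals; the snowflake variant stores each category name once at the cost of the extra join.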
Data Warehousing Objects
• Fact tables
– Fact tables are the large tables that store business
measurements.
• Dimension tables
– A dimension is a structure composed of one or
more hierarchies that categorizes data.
– Unique identifiers are specified for one distinct
record in a dimension table.
• Relationships
– Relationships guarantee
integrity of business
information.
Fact Tables
• A fact table must be defined for each star schema.
• Fact tables are the large tables that store business
measurements.
• A fact table contains either detail-level or
aggregated facts.
• A fact table usually contains facts with the same
level of aggregation.
• The primary key of the fact table is
usually a composite key made up
of all its foreign keys.
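A fact table whose primary key is the combination of its foreign keys might look like the sketch below. It assumes SQLite (through Python's sqlite3 module) in place of Oracle Database; the cust_id, prod_id, and cust_last_name columns follow the course diagram, while amount_sold, prod_name, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_last_name TEXT);
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, prod_name TEXT);
CREATE TABLE sales (
    cust_id     INTEGER NOT NULL REFERENCES customers (cust_id),
    prod_id     INTEGER NOT NULL REFERENCES products (prod_id),
    amount_sold REAL NOT NULL,
    PRIMARY KEY (cust_id, prod_id)   -- composite key built from the foreign keys
);
INSERT INTO customers VALUES (1, 'Smith');
INSERT INTO products VALUES (10, 'Radio');
INSERT INTO sales VALUES (1, 10, 25.0);
""")

def try_insert(row):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

duplicate_accepted = try_insert((1, 10, 99.0))  # same key combination as an existing row
orphan_accepted = try_insert((2, 10, 5.0))      # cust_id 2 has no dimension row
fact_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

print(duplicate_accepted, orphan_accepted, fact_count)
```

The composite key keeps one fact row per combination of dimension values, and the foreign keys keep every fact row tied to existing dimension rows.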
Dimensions and Hierarchies
• A dimension is a structure
composed of one or more
hierarchies that categorizes data.
• Dimensional attributes help to
describe the dimensional value.
• Dimension data is collected at the
lowest level of detail and aggregated
into higher level totals.
• Hierarchies are structures that use
ordered levels to organize data.
• In a hierarchy, each level is
connected to the levels above and
below it.
CUSTOMERS dimension hierarchy (by level), from highest to lowest:
REGION, SUBREGION, COUNTRY, STATE, CITY, CUSTOMER
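Aggregating detail data up the levels of the CUSTOMERS hierarchy can be sketched as follows. The level names come from the slide; the customers, their attribute values, and the sales amounts are hypothetical, and plain Python dictionaries stand in for dimension and fact tables.

```python
from collections import defaultdict

# Levels from the slide, lowest to highest.
LEVELS = ["CUSTOMER", "CITY", "STATE", "COUNTRY", "SUBREGION", "REGION"]

# Hypothetical dimension rows: one value per level above CUSTOMER.
customers = {
    "C1": {"CITY": "Austin", "STATE": "TX", "COUNTRY": "US",
           "SUBREGION": "Northern America", "REGION": "Americas"},
    "C2": {"CITY": "Dallas", "STATE": "TX", "COUNTRY": "US",
           "SUBREGION": "Northern America", "REGION": "Americas"},
    "C3": {"CITY": "Toronto", "STATE": "ON", "COUNTRY": "CA",
           "SUBREGION": "Northern America", "REGION": "Americas"},
}

# Hypothetical facts collected at the lowest level (customer, amount).
sales = [("C1", 100.0), ("C2", 50.0), ("C3", 70.0), ("C1", 30.0)]

def roll_up(level):
    """Total the detail rows at the requested hierarchy level."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(float)
    for cust_id, amount in sales:
        key = cust_id if level == "CUSTOMER" else customers[cust_id][level]
        totals[key] += amount
    return dict(totals)

by_city = roll_up("CITY")
by_state = roll_up("STATE")
by_region = roll_up("REGION")

print(by_city)
print(by_state)
print(by_region)
```

Each higher level groups the same detail rows more coarsely, so the grand total is the same at every level.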
Dimensions and Hierarchies
[Diagram: the SALES fact table (cust_id, prod_id) sits at the center and is related to the dimension tables CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province), PRODUCTS (#prod_id), TIMES, CHANNELS, and PROMOTIONS. Callouts mark a hierarchy, a unique identifier (#), and a relationship.]
Physical Design
[Diagram: mapping from logical design to physical design]
Logical: entities, relationships, attributes, unique identifiers
Physical (tablespaces): tables, columns, indexes, integrity constraints (primary key, foreign key, not null), materialized views, dimensions
Data Warehouse Physical Structures
Tables and partitioned tables
• Partitioned tables enable you to split
large data volumes into smaller,
more manageable pieces.
• Expect performance benefits from:
– Partition pruning
– Intelligent parallel processing
• Compressed tables offer scaleup opportunities for
read-only operations.
• Table compression saves disk space.

Data Warehouse Physical Structures
• Views:
– Are tailored presentations of data contained in one
or more tables or views
– Do not require any space in the database
• Materialized views:
– Are query results that have been stored in advance
– Are used transparently, like indexes, and improve
performance
• Integrity constraints:
– Are used in data warehouses for query rewrite
• Dimensions:
– Are containers of logical relationships and do not
require any space in the database
Managing Large Volumes of Data
Work smarter in your data warehouse:
• Partitioning
• Bitmap indexes/Star transformation
• Data compression
• Query rewrite
Work harder in your data warehouse:
• Parallelism for all operations
– DBA tasks, such as loading, index creation, table
creation, data modification, backup and recovery
– End-user operations, such as queries
– Unbounded scalability: Real Application Clusters
I/O Performance in Data Warehouses
• I/O is typically the primary determinant of data
warehouse performance.
• Data warehouse storage configurations should be
chosen by I/O bandwidth, not storage capacity.
• Every component of the I/O
subsystem should provide
enough bandwidth:
– Disks
– I/O channels
– I/O adapters
• In data warehouses, maximizing
sequential I/O throughput is critical.
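The reason every component needs enough bandwidth is that the slowest one caps the whole path. A minimal arithmetic sketch, with made-up throughput figures and a made-up table size (none of these numbers come from the course):

```python
# Hypothetical per-component throughput limits, in MB/s.
components = {"disks": 1600, "io_channels": 800, "io_adapters": 400}

# The I/O subsystem can deliver no more than its slowest component.
bottleneck = min(components, key=components.get)
deliverable_mb_per_s = components[bottleneck]

# Time for a full sequential scan of a hypothetical 100 GB table.
table_size_mb = 100 * 1024
scan_seconds = table_size_mb / deliverable_mb_per_s

print(bottleneck, deliverable_mb_per_s, scan_seconds)
```

In this made-up configuration the disks could deliver four times what the adapters pass through, so adding disks would add capacity without improving scan time.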
Performance of Sequential I/Os
• In data warehouses, drive arrays generally see
random large I/Os (1 MB) spread across the
devices.
– This is known as multiuser sequential workload.
• The host operating system, device drivers, or
storage array may fracture large I/Os into smaller
I/Os.
– It is common in default Linux configurations to
fracture large I/Os into smaller ones (up to 32 KB).
• This level of I/O fracturing can have a disastrous
effect on the total throughput.
• The implementation of query rewrite has a positive
effect on minimizing I/O requests.
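The cost of fracturing can be made concrete with the sizes quoted above: a 1 MB I/O split into pieces of at most 32 KB. The helper name below is hypothetical, and the sketch counts requests only; it does not model device latency.

```python
KB = 1024
MB = 1024 * KB

def fractured_requests(io_size, max_request_size):
    """Number of device requests needed to serve one logical I/O."""
    return -(-io_size // max_request_size)  # ceiling division

requests_at_32k = fractured_requests(1 * MB, 32 * KB)
requests_at_1m = fractured_requests(1 * MB, 1 * MB)

print(requests_at_32k, requests_at_1m)
```

One logical read becomes 32 device requests instead of one, each paying its own per-request overhead.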
Minimizing I/O Requests
Partition pruning
    SELECT SUM(sales_amount)
    FROM sales
    WHERE sales_date
          BETWEEN DATE '2005-03-01' AND DATE '2005-05-31';
• Only the relevant partitions are accessed.
• The optimizer knows or finds the relevant
partitions.
– Static pruning uses values known in advance.
– Dynamic pruning uses internal recursive SQL
to find the relevant partitions.
• Pruning can provide order-of-magnitude performance
gains.
[Diagram: the SALES table partitioned by month, 2005-JAN through 2005-JUN; the query above needs only the 2005-MAR, 2005-APR, and 2005-MAY partitions.]
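The idea behind pruning can be sketched outside the database. The snippet below is a simplified model, not the optimizer's algorithm: it assumes one partition per calendar month with known date bounds and keeps only the partitions whose bounds overlap the query's date range. The function name is hypothetical.

```python
from datetime import date

# Monthly partitions of the SALES table: name -> (first day, last day).
partitions = {
    "2005-JAN": (date(2005, 1, 1), date(2005, 1, 31)),
    "2005-FEB": (date(2005, 2, 1), date(2005, 2, 28)),
    "2005-MAR": (date(2005, 3, 1), date(2005, 3, 31)),
    "2005-APR": (date(2005, 4, 1), date(2005, 4, 30)),
    "2005-MAY": (date(2005, 5, 1), date(2005, 5, 31)),
    "2005-JUN": (date(2005, 6, 1), date(2005, 6, 30)),
}

def prune(range_start, range_end):
    """Return the partitions whose date range overlaps the query range."""
    return [
        name
        for name, (first_day, last_day) in partitions.items()
        if first_day <= range_end and last_day >= range_start
    ]

# Date range from the query on the slide: 1 March to 31 May 2005.
relevant = prune(date(2005, 3, 1), date(2005, 5, 31))

print(relevant)
```

Three of the six partitions are read and the other three are skipped without any I/O.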
Minimizing I/O Requests
Bitmap indexes
• Bitmap indexes are usually 3 to 20 times
smaller than B-tree indexes.
• They are ideal for set-based operations.
• Star transformation uses bitmap indexes to
identify base table records of interest.
• Full table access is replaced with bitmap
index access.
• Bitmap indexes minimize I/O.
Example index entries (key, rowid, bitmap):
<Blue, <rowid>, 1000100100010010100>
<Green, <rowid>, 0001010000100100000>
<Red, <rowid>, 0100000011000001001>
<Orange, <rowid>, 0010001000001000010>
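Set-based operations on bitmaps reduce to bitwise arithmetic. The sketch below reuses the four bitmaps from the slide and assumes, for illustration, that bit position i (counted from the left, starting at 0) stands for row i of a 19-row table; the real mapping from bits to rowids is internal to the database.

```python
# Bitmaps copied from the slide, one per key value.
bitmaps = {
    "Blue":   "1000100100010010100",
    "Green":  "0001010000100100000",
    "Red":    "0100000011000001001",
    "Orange": "0010001000001000010",
}
n_rows = len(bitmaps["Blue"])
as_int = {color: int(bits, 2) for color, bits in bitmaps.items()}

def rows_matching(mask):
    """Row positions (0-based, left to right) whose bit is set in the mask."""
    bits = format(mask, f"0{n_rows}b")
    return [i for i, bit in enumerate(bits) if bit == "1"]

# WHERE color = 'Blue' OR color = 'Red' becomes a bitwise OR.
blue_or_red = as_int["Blue"] | as_int["Red"]

# WHERE color = 'Blue' AND color = 'Green' becomes a bitwise AND (no row can match).
blue_and_green = as_int["Blue"] & as_int["Green"]

print(rows_matching(blue_or_red))
print(rows_matching(blue_and_green))
```

Only the rows whose bits survive the combination need to be fetched from the base table.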
Minimizing I/O Requests

Query rewrite:
• Reduces I/O requests by employing materialized
views with precomputed aggregates and joins
• Transforms a SQL statement expressed in terms
of tables or views into a statement accessing
materialized views based on the detail tables
– The transformation is transparent.
• Is implemented using materialized views that can
be added or dropped like indexes without
invalidating your SQL statements
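The effect of query rewrite can be imitated by hand. SQLite, used here through Python's sqlite3 module, has no materialized views or automatic rewrite, so this sketch stores a precomputed aggregate in an ordinary table and runs the "rewritten" statement explicitly; in Oracle Database the optimizer performs that substitution transparently. Table names, column names, and data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sales_month TEXT NOT NULL, sales_amount REAL NOT NULL);
INSERT INTO sales VALUES
    ('2005-03', 10.0), ('2005-03', 15.0), ('2005-04', 20.0), ('2005-05', 5.0);
-- Stand-in for a materialized view: the aggregate is computed once and stored.
CREATE TABLE sales_by_month_mv AS
    SELECT sales_month, SUM(sales_amount) AS total_amount
    FROM sales
    GROUP BY sales_month;
""")

# Statement as the user writes it, against the detail table.
original_sql = """
    SELECT sales_month, SUM(sales_amount)
    FROM sales
    GROUP BY sales_month
    ORDER BY sales_month
"""
# Equivalent statement against the precomputed aggregate.
rewritten_sql = """
    SELECT sales_month, total_amount
    FROM sales_by_month_mv
    ORDER BY sales_month
"""

detail_result = conn.execute(original_sql).fetchall()
summary_result = conn.execute(rewritten_sql).fetchall()
detail_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
summary_rows = conn.execute("SELECT COUNT(*) FROM sales_by_month_mv").fetchone()[0]

print(detail_result == summary_result, detail_rows, summary_rows)
```

Both statements return the same rows, but the second reads the smaller precomputed table and skips the aggregation.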
I/O Scalability
Parallel execution:
• Reduces response time for data-intensive operations
on large databases
• Benefits systems with the following characteristics:
– Multiprocessors, clusters, or massively parallel systems
– Sufficient I/O bandwidth
– Sufficient memory to support memory-intensive
processes such as sorts, hashing, and I/O buffers
[Diagram: a coordinator dispatches work to query servers; four scanners read the data on disk and feed four sorters (aggregators), labeled Sort Q1 through Sort Q4.]
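The coordinator and query server pattern can be sketched with a thread pool: the coordinator splits the data into chunks, each worker scans one chunk, and the coordinator combines the partial results. This is a structural illustration with hypothetical names and data; Python threads show the division of work, not the speedup a database gets from real parallel I/O and CPUs.

```python
from concurrent.futures import ThreadPoolExecutor

DEGREE_OF_PARALLELISM = 4          # hypothetical number of query servers
data_on_disk = list(range(1, 101))  # hypothetical rows to aggregate

def scan(chunk):
    """Work done by one query server: scan its chunk and return a partial sum."""
    return sum(chunk)

def parallel_sum(rows, degree):
    """Coordinator: dispatch chunks to workers, then combine the partial results."""
    chunk_size = max(1, -(-len(rows) // degree))  # ceiling division
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        partial_sums = list(pool.map(scan, chunks))
    return sum(partial_sums)

total = parallel_sum(data_on_disk, DEGREE_OF_PARALLELISM)

print(total)
```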
I/O Scalability
Automatic Storage Management (ASM)
• Configuring storage for a DB depends on many
variables:
– Which data to put on which disk
– Logical unit number (LUN) configurations
– DB types and workloads: data warehouse, OLTP,
DSS
– Trade-offs between available options
• ASM provides solutions to storage issues
encountered in data warehouses.
I/O Scalability

Automatic Storage Management: Overview
• Portable and high-performance
cluster file system
• Manages Oracle database files
• Data spread across disks
to balance load
• Integrated mirroring across
disks
• Solves many storage
management challenges
[Diagram: storage stack of application, database, file system, volume manager, and operating system; ASM takes the place of the file system and volume manager layers.]
I/O Scalability
ASM benefits
• Stripes files rather than
logical volumes
• Online disk reconfiguration
and dynamic rebalancing
• Provides redundancy on a
file basis
• Automatic database file
management
• EM-based graphical
management interface
• Hot spots and manual I/O
tuning eliminated
I/O Scalability
Real Application Clusters
• Real Application Clusters (RAC) provides linear
scalability and availability for data warehouses.
• RAC provides redundancy so that if a node goes
down, the other nodes will continue to execute.
• RAC nodes can share all work equally or perform
dedicated tasks such as ETL or query processing.
Typical Data Warehouse Cluster
• Four nodes, each with four 2 GHz CPUs
• 1 Gigabit Ethernet interconnects
• Two 16-port switches
• Sixteen storage arrays, each with 10–20 disks
Parallel Execution with RAC
Parallel execution servers have node affinity with the execution
coordinator, but execution expands to other nodes if needed.
[Diagram: Node 1 through Node 4 attached to shared disks; the execution coordinator runs on one node, with parallel execution servers on the nodes.]
Summary
In this lesson, you should have learned how to:
• Differentiate OLTP and data warehousing design
techniques
• Describe effective data warehouse design
• Identify data warehousing schemas
• Explain implementation models
• List data warehousing objects
