

Best Practices for Implementing a Data Warehouse on Oracle Exadata


Rekha Balwada
Principal Product Manager

Agenda
Oracle Exadata Database Machine
Oracle Reference Architecture
Oracle Staging Area Practices for Data Loading
Oracle Foundation Layer - 3NF Data Model
Oracle Access & Performance Layer - Star Schema
Pre-Built Data Models

Oracle Exadata Database Machine

Best Machine For...
Mixed Workloads: Warehousing, OLTP, DB Consolidation
All Tiers: Disk, Flash, Memory

DB Consolidation: Lower Costs, Increase Utilization, Reduce Management
Tier Unification: Cost of Disk, I/Os of Flash, Speed of DRAM

Standardized and Simple to Deploy
All Database Machines Are the Same
- Delivered Ready-to-Run
- Thoroughly Tested
- Highly Supportable
- No Unique Configuration Issues
- Identical to Config Used by Oracle Engineering
Runs Existing OLTP and DW Applications
- 30 Years of Oracle DB Capabilities
- No Exadata Certification Required
Deploy in Days, Not Months
Leverages Oracle Ecosystem
- Skills, Knowledge Base, People, & Partners

Exadata Innovations
Exadata Storage Server Software
Intelligent Storage
- Smart Scan query offload
- Scale-out storage
Hybrid Columnar Compression
- 10x compression for warehouses
- 15x compression for archives
Smart Flash Cache
- Accelerates random I/O up to 30x
- Doubles data scan rate
[Figure: uncompressed vs. compressed copies of the primary, standby, backup, test and dev databases. Data remains compressed for scans and in Flash, so the compression benefits multiply.]

Exadata in the Marketplace


Rapid Adoption In All Geographies and Industries

Oracle Reference Architecture


Information Management

Oracle Reference Architecture


Staging Layer

Efficient Data Loading
External tables:
- Full usage of SQL capabilities directly on the data
- Automatic use of parallel capabilities
- No need to stage the data again

Pre-Processing in an External Table
- Allows flat files to be processed automatically during load
- Example: decompression of large zipped files
Pre-processing doesn't support automatic granulation
- Need to supply multiple data files; the number of files will determine the DOP
CREATE TABLE sales_external ( ... )  -- column list not shown on the slide
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir1
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'zcat'
    FIELDS TERMINATED BY '|'
  )
  LOCATION ( ... )  -- list of compressed data files not shown on the slide
);
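With a preprocessor, each input file is an indivisible unit of work. That is why the slide says the number of files determines the DOP. The toy Python model below makes the rule concrete. The function names, and the assumption that the usable DOP is simply the smaller of the requested DOP and the file count, are illustrative only. This is not Oracle's scheduler.

```python
def effective_load_dop(requested_dop: int, num_files: int, preprocessing: bool) -> int:
    """Toy upper bound on how many parallel servers can read an external table.

    Assumption: with a PREPROCESSOR each file is one granule, so at most
    num_files servers have work; without it, large files can be split into
    granules and the requested DOP is not capped by the file count.
    """
    if requested_dop < 1 or num_files < 1:
        raise ValueError("DOP and file count must be positive")
    if preprocessing:
        return min(requested_dop, num_files)
    return requested_dop


def split_into_files(total_rows: int, target_dop: int) -> list:
    """Hypothetical planning helper: rows per file so every server gets one file."""
    base, extra = divmod(total_rows, target_dop)
    return [base + (1 if i < extra else 0) for i in range(target_dop)]


# One big zipped file caps a DOP 16 load at 1; 16 files let all 16 servers work.
print(effective_load_dop(16, 1, preprocessing=True))   # 1
print(effective_load_dop(16, 16, preprocessing=True))  # 16
print(split_into_files(100, 8))  # [13, 13, 13, 13, 12, 12, 12, 12]
```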

Direct Path Load
Data is written directly to the database storage, bypassing the buffer cache, using multiple blocks per I/O request and asynchronous writes
A CTAS (CREATE TABLE AS SELECT) command always uses direct path, but an IAS (INSERT AS SELECT) needs an APPEND hint:
INSERT /*+ APPEND */ INTO Sales PARTITION (p2)
SELECT * FROM ext_tab_for_sales_data;

Ensure you do direct path loads in parallel
- Specify the parallel degree either with a hint or on both tables
- Enable parallel DML by issuing an ALTER SESSION command:
ALTER SESSION ENABLE PARALLEL DML;

Data Loading Best Practices
Never locate the staging data files on the same disks as the RDBMS
- DBFS on a Database Machine is an exception
The number of files might determine the maximum DOP, so plan for it
- Always true when pre-processing is used
Ensure proper space management
- Use a bigfile ASSM tablespace
- Autoallocate extents preferred
Ensure sufficiently large data extents for the target
- Set INITIAL and NEXT to 8 MB for non-partitioned tables
Use parallelism: manual (DOP) or Auto DOP
More on data loading best practices can be found on OTN:
http://www.oracle.com/technetwork/database/focus-areas/bi-datawarehousing/twpdwbestpractices-for-loading-11g-404400.pdf

Partition Exchange Loading
Steps performed by the DBA:
1. Create an external table for the flat files
2. Use a CTAS command to create the nonpartitioned table TMP_SALES
3. Create indexes on TMP_SALES
4. Gather statistics on TMP_SALES
5. ALTER TABLE Sales EXCHANGE PARTITION May_24_2008 WITH TABLE tmp_sales
[Figure: the Sales table with daily partitions from May 18th 2008 to May 24th 2008. After the exchange, the Sales table has all the data.]
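Step 5 is fast because exchanging a partition is a data dictionary operation. The partition and the standalone table swap their underlying segments, and no rows are moved. The toy Python model below illustrates that idea. All class and function names are hypothetical, and it is not how Oracle stores segments internally.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Stand-in for the physical storage behind a table or a partition."""
    rows: list = field(default_factory=list)


@dataclass
class PartitionedTable:
    partitions: dict  # partition name -> Segment


def exchange_partition(table, name, standalone):
    """Toy EXCHANGE PARTITION: swap segment references; no rows are copied.

    Returns the segment that used to back the partition; after a real exchange
    that (here empty) segment becomes the standalone table's data.
    """
    previous = table.partitions[name]
    table.partitions[name] = standalone
    return previous


# Daily partitions May_18_2008 .. May_24_2008 (left empty in this toy).
sales = PartitionedTable({f"May_{day}_2008": Segment() for day in range(18, 25)})
# Steps 1-4: tmp_sales is loaded via CTAS, indexed and analyzed.
tmp_sales = Segment(rows=[("2008-05-24", 100.0), ("2008-05-24", 42.5)])

old_segment = exchange_partition(sales, "May_24_2008", tmp_sales)
print(sales.partitions["May_24_2008"] is tmp_sales)  # True: same object, nothing copied
print(len(old_segment.rows))  # 0
```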

Oracle Reference Architecture


Foundation Layer

What does a 3NF schema look like?
- Looks like an OLTP schema
- Multiple fact tables
Objective is to store a data point only once
- A process called normalization
- Large number of tables due to normalization
- Relationships between tables are chained
- Lots of large table joins

Optimizing 3rd Normal Form
Requires 3 Ps: Power, Partitioning, Parallelism
Power: a balanced hardware configuration
- The weakest link will define throughput
Partition larger tables or fact tables
- Use composite range-hash partitioning
- Range to facilitate the data load and data elimination
- Hash on the join column to facilitate partition-wise joins
- Number of hash partitions should be a power of 2 (#CPU x 2)
Parallel execution should be used
- Instead of one process doing all the work, multiple processes work concurrently on smaller units
- Parallel degree should be a power of 2
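The power-of-2 advice for hash partitions is usually explained by how rows are placed. The toy Python model below assumes a simplified linear-hashing scheme, which is not Oracle's actual hash function. It shows that 8 partitions receive equal shares of uniformly distributed keys, while 6 partitions leave two of them with double the rows. The helper `suggested_hash_partitions` applies the slide's "#CPU x 2" rule and rounds up to a power of 2. That rounding step is an assumption, not from the slide.

```python
from collections import Counter


def linear_hash_bucket(h: int, num_partitions: int) -> int:
    """Simplified linear-hashing placement (an assumed model, not Oracle's hash function).

    Hash into the next power of two; buckets past the last partition fold back
    onto the lower half, so those target partitions receive twice the rows.
    """
    size = 1
    while size < num_partitions:
        size *= 2
    bucket = h % size
    if bucket >= num_partitions:
        bucket = h % (size // 2)
    return bucket


def rows_per_partition(num_keys: int, num_partitions: int) -> list:
    """Place keys 0..num_keys-1 (identity 'hash') and count rows per partition."""
    counts = Counter(linear_hash_bucket(k, num_partitions) for k in range(num_keys))
    return [counts[p] for p in range(num_partitions)]


def suggested_hash_partitions(cpu_count: int) -> int:
    """#CPU x 2 from the slide, rounded up to a power of 2 (the rounding is an assumption)."""
    target, size = 2 * cpu_count, 1
    while size < target:
        size *= 2
    return size


print(rows_per_partition(8000, 8))  # [1000, 1000, 1000, 1000, 1000, 1000, 1000, 1000]
print(rows_per_partition(8000, 6))  # [1000, 1000, 2000, 2000, 1000, 1000]
print(suggested_hash_partitions(16))  # 32
```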

Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
- Eliminates the long-standing tradeoff between scalability, availability and cost
Database Grid
- 8 dual-processor x64 database servers, or
- 2 eight-processor x64 database servers
Intelligent Storage Grid
- 14 high-performance, low-cost storage servers
- 100 TB high-performance disk or 504 TB high-capacity disk
- 5.3 TB PCI flash
- Data mirrored across storage servers
InfiniBand Network
- Redundant 40 Gb/s switches
- Unified server & storage network

Partitioning
Range partition large fact tables, typically on a date column
- Consider data loading frequency
  - Is an incremental load required?
  - How much data is involved: a day, a week, a month?
- Partition pruning for queries
  - What range of data do the queries touch: a quarter, a year?
Subpartition by hash to improve join performance between fact tables and/or dimension tables
- Pick the common join column
- If all dimensions have different join columns, use the join column of the largest dimension or of the most common join in the queries

Partition Pruning
Q: What were the total sales for May 20 - 22, 2008?
SELECT SUM(sales_amount)
FROM SALES
WHERE sales_date >= TO_DATE('05/20/2008', 'MM/DD/YYYY')
AND sales_date < TO_DATE('05/23/2008', 'MM/DD/YYYY');
[Figure: the Sales table, range partitioned by day from May 18th 2008 to May 24th 2008]
Only the 3 relevant partitions (May 20th, 21st and 22nd 2008) are accessed
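Pruning compares the predicate's range with each partition's bounds and skips the partitions that cannot qualify. The toy Python model below assumes daily partitions that each cover a half-open range [lower, upper). A half-open predicate (>= May 20, < May 23) touches exactly the May 20, 21 and 22 partitions. BETWEEN May 20 AND May 23 also admits rows stamped exactly May 23 at midnight, so it must visit a fourth partition. The function name `prune` is hypothetical.

```python
from datetime import date, timedelta

# Toy model: daily range partitions May 18-24, 2008, each covering [lower, upper).
PARTITIONS = [
    (date(2008, 5, 18) + timedelta(days=i), date(2008, 5, 19) + timedelta(days=i))
    for i in range(7)
]


def prune(lo: date, hi: date, hi_inclusive: bool) -> list:
    """Return the lower bounds of partitions that a sales_date predicate can touch.

    hi_inclusive=False models  sales_date >= lo AND sales_date <  hi
    hi_inclusive=True  models  sales_date BETWEEN lo AND hi
    """
    touched = []
    for lower, upper in PARTITIONS:
        starts_in_range = lower <= hi if hi_inclusive else lower < hi
        if starts_in_range and upper > lo:  # partition range overlaps the predicate
            touched.append(lower)
    return touched


print(len(prune(date(2008, 5, 20), date(2008, 5, 23), hi_inclusive=False)))  # 3
print(len(prune(date(2008, 5, 20), date(2008, 5, 23), hi_inclusive=True)))   # 4
```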

Partition Wise Join
SELECT SUM(amount_sold)
FROM sales s, customer c
WHERE s.cust_id = c.cust_id;
[Figure: the Sales table, range partitioned by day (e.g. May 18th 2008) and hash subpartitioned into sub parts 1-4, next to the Customer table, hash partitioned into parts 1-4. Each sub part is joined with the part of the same number.]
Both tables are partitioned the same way on the join column (cust_id), into the same number of hash partitions, so each pair of partitions can be joined by its own parallel server
A large join is divided into multiple smaller joins, each joining a pair of partitions in parallel
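Equipartitioning works because rows with the same cust_id always land in the same bucket number. Partition p of Sales therefore only ever needs partition p of Customer. The toy Python model below uses key modulo 4 as the hash, an assumption chosen for determinism. It checks that joining the 4 partition pairs gives the same SUM as one big join.

```python
from collections import defaultdict

NUM_PARTS = 4  # same number of hash (sub)partitions on both tables, as on the slide


def hash_partition(rows, key_index):
    """Split rows into NUM_PARTS buckets on the join key (toy hash: key modulo NUM_PARTS)."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[key_index] % NUM_PARTS].append(row)
    return parts


def partition_wise_join_sum(sales, customers):
    """SUM(amount_sold) for sales JOIN customers ON cust_id, one partition pair at a time.

    Assumes cust_id is unique in customers (its primary key).
    """
    s_parts = hash_partition(sales, 0)
    c_parts = hash_partition(customers, 0)
    total = 0
    for p in range(NUM_PARTS):  # in Oracle, each pair could go to its own PX server
        cust_ids = {cust_id for cust_id, _city in c_parts[p]}
        total += sum(amount for cust_id, amount in s_parts[p] if cust_id in cust_ids)
    return total


def naive_join_sum(sales, customers):
    """Reference answer: one big join with no partitioning."""
    cust_ids = {cust_id for cust_id, _city in customers}
    return sum(amount for cust_id, amount in sales if cust_id in cust_ids)


SALES = [(1, 10), (2, 20), (5, 5), (7, 1), (9, 100)]           # (cust_id, amount_sold)
CUSTOMERS = [(1, "BOSTON"), (2, "NYC"), (5, "SF"), (7, "LA")]  # no customer 9
print(partition_wise_join_sum(SALES, CUSTOMERS), naive_join_sum(SALES, CUSTOMERS))  # 36 36
```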

Exadata Smart Scan
Improve Query Performance by 10x or More
[Figure: "What were yesterday's sales?" The query (SELECT SUM(sales) ... WHERE salesdate = '22-Jan-2010') is offloaded to storage, which returns only the sales for Jan 22 2010; the database server computes the sum.]
Off-load data-intensive processing to Exadata Storage Server
- Exadata Storage Server only returns relevant rows and columns
- Wide InfiniBand connections eliminate network bottlenecks
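The point of offload is where the filtering and projection happen. The toy Python model below contrasts a storage cell that applies the WHERE clause and returns only the needed column with one that ships every row and column to the database server. The row layout and function names are hypothetical. In Exadata this work is done by the storage server software, not by application code.

```python
from datetime import date

# Toy rows on a storage cell: (salesdate, store_id, product_id, sales)
ROWS = [
    (date(2010, 1, 21), 1, 10, 50.0),
    (date(2010, 1, 22), 1, 11, 20.0),
    (date(2010, 1, 22), 2, 12, 30.0),
    (date(2010, 1, 23), 3, 13, 70.0),
]


def cell_smart_scan(rows, wanted_date):
    """Storage side with offload: apply the predicate, project only the sales column."""
    return [r[3] for r in rows if r[0] == wanted_date]


def conventional_scan(rows):
    """Storage side without offload: ship every row and every column."""
    return list(rows)


# Database side: the SUM is computed on whatever storage returned.
returned = cell_smart_scan(ROWS, date(2010, 1, 22))
print(sum(returned), len(returned))  # 50.0 2
shipped = conventional_scan(ROWS)
print(sum(r[3] for r in shipped if r[0] == date(2010, 1, 22)), len(shipped))  # 50.0 4
```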

Exadata Storage Index
Transparent I/O Elimination with No Overhead
[Figure: a table with columns A, B, C, D stored in two regions. The storage index records Min B = 1, Max B = 5 for the first region and Min B = 3, Max B = 8 for the second. For SELECT * FROM Table WHERE B < 2, only the first set of rows can match.]
- Maintain summary information about table data in memory
- Eliminate disk I/Os if MIN / MAX never match the WHERE clause
- Completely automatic and transparent
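The figure's example can be replayed with a toy Python model. Each region keeps an in-memory MIN/MAX of column B, and a region whose MIN already fails `B < limit` is never read. The class and function names are hypothetical, and real storage indexes track several columns per region automatically. A predicate such as `B > limit` would consult `max_b` instead.

```python
from dataclasses import dataclass, field


@dataclass
class Region:
    """One storage region plus the in-memory min/max summary kept for column B."""
    rows: list  # tuples (A, B, C, D)
    min_b: int = field(init=False)
    max_b: int = field(init=False)

    def __post_init__(self):
        # Built once, then consulted before any disk read.
        self.min_b = min(row[1] for row in self.rows)
        self.max_b = max(row[1] for row in self.rows)


def scan_b_less_than(regions, limit):
    """Evaluate WHERE B < limit; return (matching rows, number of regions read)."""
    matches, regions_read = [], 0
    for region in regions:
        if region.min_b >= limit:  # MIN already too large: no row can match, skip the I/O
            continue
        regions_read += 1
        matches.extend(row for row in region.rows if row[1] < limit)
    return matches, regions_read


# The two regions from the slide: B values 1, 3, 5 and 5, 8, 3.
REGIONS = [
    Region([("a1", 1, "c1", "d1"), ("a2", 3, "c2", "d2"), ("a3", 5, "c3", "d3")]),
    Region([("a4", 5, "c4", "d4"), ("a5", 8, "c5", "d5"), ("a6", 3, "c6", "d6")]),
]
print(scan_b_less_than(REGIONS, 2))  # ([('a1', 1, 'c1', 'd1')], 1)
```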

Benefits Multiply
Converting Terabytes to Gigabytes
10 TB of user data
- With 10x compression: 1 TB
- With partition pruning: 100 GB
- With storage indexes: 20 GB
- With Smart Scan: 5 GB
Result: a sub-second scan of 10 TB of user data, with no indexes
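The reductions multiply rather than add. The per-step factors implied by the slide's figures are 10 x 10 x 5 x 4, which gives 2000x less data to process. The short Python calculation below derives those factors from the slide's volumes. It assumes decimal units (1 TB = 1000 GB) and adds no measured data.

```python
# Data volumes from the slide, in GB (decimal units assumed: 1 TB = 1000 GB).
steps = [
    ("User data", 10_000),
    ("With 10x compression", 1_000),
    ("With partition pruning", 100),
    ("With storage indexes", 20),
    ("With Smart Scan", 5),
]

previous = steps[0][1]
for label, gb in steps:
    print(f"{label:<24} {gb:>6} GB  (step reduction {previous / gb:g}x)")
    previous = gb

overall = steps[0][1] / steps[-1][1]
print(f"Overall: {overall:g}x less data to process")  # Overall: 2000x less data to process
```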

IM Reference Architecture
Access & Performance Layer

Access and Performance Layer


Dimensional Model
A form of (physical) analytical design in which data is pre-classified into Facts and Dimensions
A model that is all about performance
- Physical performance through optimisation
- User access through a simplified model
Various forms exist: Star, Star with Embedded Aggregates, and Snowflakes
- Plus multidimensional cubes (analytic workspaces, or AWs, in an Oracle Database)

What Does a Star Schema Look Like?

Called a star schema because the diagram resembles a star
- The center of the star consists of one or more fact tables
- The points of the star are the dimension tables
- Each dimension is composed of levels
- Levels are organized in hierarchies
Example: assume this schema belongs to a retail chain.
- The fact will be sales revenue (money)
- How you want to view the fact data is called a dimension: customers, distribution channels, products and time

Optimizing Star Schema


Create bitmap indexes on the foreign key columns in the fact table
Set STAR_TRANSFORMATION_ENABLED to TRUE
Goal is Star Transformation
- Powerful optimization technique that rewrites or transforms SQL
- Executes the query in two phases
- The first phase retrieves the necessary rows (row set) from the fact table
  - Bitmap indexes on all of the foreign key columns are combined (bitmap AND) to find those rows
- The second phase joins this row set to the dimension tables
  - The join back to the dimension tables is done using a hash join

Star Transformation in Detail


SELECT SUM(quantity_sold)
FROM Sales s, Customers c, Products p, Times t
WHERE s.cust_id = c.cust_id
AND s.prod_id = p.prod_id
AND s.time_id = t.time_id
AND c.cust_city = 'BOSTON'
AND p.product = 'UMBRELLA'
AND t.month = 'MAY'
AND t.year = 2008;

Step 1: Oracle rewrites / transforms the query to retrieve only the necessary rows from the fact table using bitmap indexes on foreign key columns:
SELECT SUM(quantity_sold)
FROM Sales s
WHERE s.cust_id IN
  (SELECT c.cust_id FROM Customers c WHERE c.cust_city = 'BOSTON')
AND s.prod_id IN
  (SELECT p.prod_id FROM Products p WHERE p.product = 'UMBRELLA')
AND s.time_id IN
  (SELECT t.time_id FROM Times t WHERE t.month = 'MAY' AND t.year = 2008);

Step 2: Oracle joins the rows from the fact table to the dimension tables
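Phase one of the transformation can be mimicked with Python integers used as bitmaps. Each distinct foreign-key value has a bitmap of the fact rows holding it. The bitmaps for the keys a dimension subquery returns are ORed, and the per-dimension results are ANDed. The toy model below uses hypothetical ids and sample rows. Because this query only sums a fact column, it stops after phase one.

```python
# Toy fact table: each row is (cust_id, prod_id, time_id, quantity_sold).
SALES = [
    (1, 100, 200805, 3),
    (2, 100, 200805, 5),
    (1, 101, 200805, 7),
    (1, 100, 200806, 11),
    (3, 100, 200805, 13),
]

# Keys returned by each dimension subquery (hypothetical ids).
BOSTON_CUSTS = {1, 3}      # customers.cust_city = 'BOSTON'
UMBRELLA_PRODS = {100}     # products.product = 'UMBRELLA'
MAY_2008_TIMES = {200805}  # times.month = 'MAY' AND times.year = 2008


def build_bitmap_index(rows, col):
    """Bitmap index: one integer per distinct value; bit i is set if row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[col]] = index.get(row[col], 0) | (1 << i)
    return index


def keys_to_bitmap(index, keys):
    """OR together the bitmaps of all keys returned by a dimension subquery."""
    bitmap = 0
    for k in keys:
        bitmap |= index.get(k, 0)
    return bitmap


cust_idx, prod_idx, time_idx = (build_bitmap_index(SALES, c) for c in range(3))
hits = (keys_to_bitmap(cust_idx, BOSTON_CUSTS)
        & keys_to_bitmap(prod_idx, UMBRELLA_PRODS)
        & keys_to_bitmap(time_idx, MAY_2008_TIMES))  # bitmap AND across dimensions

total = sum(row[3] for i, row in enumerate(SALES) if (hits >> i) & 1)
print(total)  # 16
```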

Summary Management
Improve Response Time with Materialized Views
[Figure: a SQL query against the relational star schema (Region, Date, Products and Channel dimensions) is transparently rewritten to use the materialized views Sales by Region, Sales by Date, Sales by Product and Sales by Channel]
Materialized Views
- Pre-summarized information stored within Oracle Database 11g
- Separate database object, transparent to queries
- Supports sophisticated transparent query rewrite
- Fast incremental refresh of changed data
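A materialized view pays off because a summary can answer the question that would otherwise scan the detail. Incremental refresh folds in only the new rows instead of recomputing everything. The toy Python model below shows both ideas. The names and data are hypothetical. Real query rewrite happens inside the optimizer, and fast refresh relies on materialized view logs, neither of which is modeled here.

```python
from collections import defaultdict

# Toy detail rows from the star schema: (region, channel, amount)
SALES = [("East", "Web", 10.0), ("East", "Store", 5.0), ("West", "Web", 7.0)]


def build_sales_by_region(rows):
    """Toy 'Sales by Region' materialized view: SUM(amount) GROUP BY region."""
    summary = defaultdict(float)
    for region, _channel, amount in rows:
        summary[region] += amount
    return dict(summary)


def fast_refresh(summary, new_rows):
    """Toy incremental refresh: fold only the newly loaded rows into the summary."""
    for region, _channel, amount in new_rows:
        summary[region] = summary.get(region, 0.0) + amount


def total_for_region(region, summary=None):
    """Answer from the summary when available (like query rewrite), else scan detail rows."""
    if summary is not None:
        return summary.get(region, 0.0)
    return sum(amount for r, _channel, amount in SALES if r == region)


mv = build_sales_by_region(SALES)
print(total_for_region("East", mv), total_for_region("East"))  # 15.0 15.0

new_rows = [("West", "Store", 3.0)]
SALES.extend(new_rows)      # new detail data arrives
fast_refresh(mv, new_rows)  # summary updated without recomputing from scratch
print(total_for_region("West", mv))  # 10.0
```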

Cube Organized Materialized Views
[Figure: a SQL query is rewritten to use summaries held in an OLAP cube with Region, Date, Products and Channel dimensions; the summaries are refreshed automatically]
Exposes Oracle OLAP cubes as relational materialized views
- Provides SQL access to data stored in an OLAP cube
- Any BI tool or SQL application can leverage OLAP cubes

Exadata Smart Flash Cache
OLAP Cubes on Flash Cache
Exadata has 5 TB of flash
- 56 Flash PCI cards avoid disk controller bottlenecks
Intelligently manages flash
- Smart Flash Cache holds hot data
- Avoids large scan wipe-outs of the cache
- Gives the speed of flash at the cost of disk
5X more I/Os than a 1000-disk enterprise storage array
Exadata flash cache achieves:
- Over 1.5 million I/Os per second from SQL (8K)
- Sub-millisecond response times
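"Avoids large scan wipe-outs of the cache" means a big table scan should not evict the hot blocks that random reads keep reusing. The toy LRU cache below illustrates this under one simplifying assumption: reads flagged as part of a large scan are served without being inserted. The class is hypothetical and is not the Smart Flash Cache algorithm.

```python
from collections import OrderedDict


class ToyFlashCache:
    """Toy LRU cache in which large-scan reads bypass insertion (an assumed policy)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0

    def read(self, block_id: int, large_scan: bool = False) -> None:
        if block_id in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(block_id)  # mark as most recently used
            return
        if large_scan:
            return  # served from disk and not cached, so hot data is not evicted
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


cache = ToyFlashCache(capacity=3)
for b in (1, 2, 3):              # hot OLTP blocks
    cache.read(b)
for b in range(1000, 2000):      # a large table scan
    cache.read(b, large_scan=True)
for b in (1, 2, 3):              # hot blocks are still cached
    cache.read(b)
print(cache.hits)  # 3
```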

Let's build our DW...

Custom Solutions
Key Implementation Tasks
Assemble & Configure System
- Assemble hardware
- Install specialized software
Design Data Model
- Requirements collection
- ERD specification
Implement Data Model
- Define tables, views, cubes, analytics
- Implement indexes, partitions
ETL
- Identify sources, map to targets, cleanse data
- Map detail data to summaries
Define Metrics & Reports
- Create BI metadata
- Implement reports and dashboards
Optimize Performance
- Ensure system is balanced
- Modify performance structures (MVs, indexes, etc.)

Challenges
- Design a data model that satisfies existing and future requirements
- Employ a diverse skill set
- Integrate, implement, administer and tune disparate technologies

DW Reference Architecture
Industry Data Model Fit
[Figure: the Oracle Industry Data Model (OIDM), an enterprise-wide data model for the industry, mapped onto the reference architecture: Base, Reference and Lookup Tables in the Foundation Layer; Derived and Aggregate tables, populated through transformation, in the Analytic Layer; Sample OBIEE Metadata & Reports for Information Access]

Oracle Industry Data Model
- Over 1,300 tables and 16,000 attributes
- Over 1,000 industry measures and KPIs
- Industry standards conformant
- Designed & optimized for VLDB, including Exadata
- Prebuilt mining models, OLAP cubes and sample reports
- Automatic data movement across the warehouse
- Easily extensible and customizable
- Application and tools agnostic
- Central repository for atomic-level data
- Complete metadata (end-to-end)
- Rapid implementation

Oracle Industry Data Model
Best Practice Implementation
[Figure: Exadata, the Industry Data Model, the Oracle Best Practice DW Methodology and Reporting & Analysis, with the labels Assembled, Designed, Implemented, Accelerated, Fast and Ready]
Complete. Fast Results. Lower Risk.

Benefits
- Utilize an industry-specific enterprise-wide data model
- Leverage your existing Oracle expertise
- Implemented using Oracle data warehousing best practices

Oracle Industry Data Model
Complete. Fast Results. Lower Risk.
[Chart: Time to Implement, Custom Warehouse vs. Oracle Industry Data Model with Exadata, across the phases Sizing and Configuration, Design Model, Implement Model, ETL, Define Metrics & Reports and Optimize Performance. The Industry Data Model approach is shown as 3x faster and 8x more complete.]
2 Weeks of POC for Data Model at No Cost

Summary
Engineered System for Data Warehousing
- Oracle Exadata Database Machine
A Single Source of Truth
- Oracle Reference Architecture
Data Modeling for your Business
- Oracle is optimized for any kind of modeling technique
Fast Time to Market
- Leverage pre-built Data Models

Additional Resources
Oracle.com - www.oracle.com/exadata
Follow-on Best Practice Exadata Webcast Series:
- Best Practices for Workload Management of a Data Warehouse on Oracle Exadata, April 19th, 2012
  http://www.oracle.com/us/dm/sev100056475-wwmk11051130mpp016-1545274.html
- Best Practices for Extreme Data Warehouse Performance on Oracle Exadata, May 10th, 2012
  http://www.oracle.com/us/dm/sev100056475-wwmk11051130mpp016-1545274.html
