Best Practices Implementing Final
Best Practices for Implementing a Data Warehouse on Oracle Exadata

Rekha Balwada
Principal Product Manager
Agenda
Oracle Exadata Database Machine
Oracle Reference Architecture
Oracle Staging Area Practices for Data Loading
Oracle Foundation Layer 3NF Data Model
Oracle Access & Performance Layer - Star Schema

Pre-Built Data Models

Best Machine For
Mixed Workloads: Warehousing, OLTP, DB Consolidation
All Tiers: Disk, Flash, Memory
DB Consolidation: Lower Costs, Increase Utilization, Reduce Management
Tier Unification: Cost of Disk, IOs of Flash, Speed of DRAM
Standardized and Simple to Deploy

All Database Machines Are The Same
Delivered Ready-to-Run
Thoroughly Tested
Highly Supportable
No Unique Configuration Issues
Identical to Config Used by Oracle Engineering
Runs Existing OLTP and DW Applications
30 Years of Oracle DB Capabilities

No Exadata Certification Required
Deploy in Days, Not Months
Leverages Oracle Ecosystem
Skills, Knowledge Base, People, & Partners
Exadata Innovations
Exadata Storage Server Software
Intelligent Storage: Smart Scan query offload, scale-out storage
Hybrid Columnar Compression: 10x compression for warehouses, 15x compression for archives
Smart Flash Cache: accelerates random I/O up to 30x, doubles data scan rate
Data remains compressed for scans and in Flash
[Figure: Uncompressed versus Compressed, showing the primary, standby, test, devt and backup copies; caption: Benefits Multiply]
Exadata in the Marketplace

Rapid Adoption In All Geographies and Industries

Information Management

Staging Layer
Efficient Data Loading
Full usage of SQL capabilities directly on the data

Automatic use of parallel capabilities
No need to stage the data again
Pre-Processing in an External Table

Allows flat files to be processed automatically during load
Decompression of large zipped files
Pre-processing doesn't support automatic granulation

Need to supply multiple data files; the number of files will determine the DOP
CREATE TABLE sales_external (...)  -- column list elided in the source
ORGANIZATION EXTERNAL
(
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY data_dir1
  ACCESS PARAMETERS
  (
    RECORDS DELIMITED BY NEWLINE
    PREPROCESSOR exec_dir:'zcat'
    FIELDS TERMINATED BY '|'
  )
  LOCATION (...)  -- data file list elided in the source
);
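Because the number of data files caps the DOP when a preprocessor is used, one option is to split a large flat file into several compressed pieces before the load. The following is a minimal sketch of that idea, not part of the original material: the function name `split_into_gzip_files`, the round-robin distribution and the output file naming are all assumptions made for illustration.

```python
import gzip
from pathlib import Path


def split_into_gzip_files(source: Path, out_dir: Path, pieces: int) -> list[Path]:
    """Split a newline-delimited flat file into `pieces` gzip files.

    Records are dealt out round-robin so the pieces end up roughly equal
    in size. Hypothetical helper: each resulting file would be listed in
    the external table's LOCATION clause and decompressed by zcat.
    """
    if pieces < 1:
        raise ValueError("pieces must be at least 1")
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = [out_dir / f"{source.stem}_{i:02d}.dat.gz" for i in range(pieces)]
    writers = [gzip.open(p, "wt", encoding="utf-8") for p in paths]
    try:
        with source.open("r", encoding="utf-8") as src:
            for line_no, line in enumerate(src):
                # Whole records only: a line is never split across files.
                writers[line_no % pieces].write(line)
    finally:
        for writer in writers:
            writer.close()
    return paths
```

Splitting on record boundaries matters here, since each file is decompressed and parsed independently of the others.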
Direct Path Load

Data is written directly to the database storage using multiple blocks per I/O request using asynchronous writes
A CTAS (CREATE TABLE AS SELECT) command always uses direct path, but an IAS (INSERT AS SELECT) needs an APPEND hint
INSERT /*+ APPEND */ INTO sales PARTITION (p2)
SELECT * FROM ext_tab_for_sales_data;
Ensure you do direct path loads in parallel

Specify the parallel degree either with a hint or on both tables
Enable parallel DML by issuing an ALTER SESSION command
ALTER SESSION ENABLE PARALLEL DML;
Data Loading Best Practices

Never locate the staging data files on the same disks as the RDBMS
DBFS on a Database Machine is an exception
Number of files might determine the maximum DOP, so plan for it

Always true when pre-processing is used
Ensure proper space management

Use bigfile ASSM tablespace
Auto allocate extents preferred
Ensure sufficiently large data extents for the target

Set INITIAL and NEXT to 8 MB for non-partitioned tables
Use parallelism: manual (DOP) or Auto DOP

More on data loading best practices can be found on OTN
http://www.oracle.com/technetwork/database/focus-areas/bi-datawarehousing/twpdwbestpractices-for-loading-11g-404400.pdf
Partition Exchange Loading

1. Create external table for flat files
2. Use CTAS command to create non-partitioned table TMP_SALES
3. Create indexes
4. Gather Statistics
5. Alter table Sales exchange partition May_24_2008 with table tmp_sales
Sales table now has all the data
[Figure: the DBA loads the Tmp_sales table; the Sales table, with daily partitions May 18th 2008 through May 24th 2008, is shown before and after the exchange]
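The exchange step is fast because it swaps which segment belongs to which object rather than moving rows. The toy model below sketches that idea under simplifying assumptions: the classes `PartitionedTable` and `StagingTable` are hypothetical and stand in for the Sales table and TMP_SALES; nothing here reflects how the database implements the operation internally.

```python
class StagingTable:
    """Non-partitioned table holding freshly loaded rows (TMP_SALES in the steps above)."""

    def __init__(self, rows):
        self.rows = rows


class PartitionedTable:
    """Toy model: a table is a mapping from partition name to a row segment."""

    def __init__(self, partitions):
        self.partitions = dict(partitions)

    def exchange_partition(self, name, staging):
        """Swap the named partition's segment with the staging table's segment.

        Only references are swapped; no rows are copied, so the cost does
        not depend on the data volume.
        """
        if name not in self.partitions:
            raise KeyError(f"no such partition: {name}")
        self.partitions[name], staging.rows = staging.rows, self.partitions[name]

    def row_count(self):
        return sum(len(segment) for segment in self.partitions.values())


sales = PartitionedTable({
    "MAY_23_2008": [("2008-05-23", 10.0)],
    "MAY_24_2008": [],  # empty partition waiting for the new day's data
})
tmp_sales = StagingTable([("2008-05-24", 5.0), ("2008-05-24", 7.5)])
loaded_segment = tmp_sales.rows

sales.exchange_partition("MAY_24_2008", tmp_sales)
print(sales.row_count())
```

After the call, the partition holds the very same list object that was loaded into the staging table, and the staging table holds the formerly empty segment.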

Foundation Layer
What does a 3NF schema look like?

Looks like an OLTP schema
Multiple fact tables
Objective is to store a data point only once, a process called normalization
Large number of tables due to normalization
Relationships between tables are chained
Lots of large table joins
Optimizing 3rd Normal Form

Requires 3 Ps - Power, Partitioning, Parallelism
Power: a balanced hardware configuration
Weakest link will define throughput
Partition larger tables or fact tables
Use composite range-hash partitioning

Range to facilitate the data load and data elimination
Hash on the join column to facilitate partition-wise joins
Number of hash partitions should be a power of 2 (#CPU x 2)
Parallel execution should be used

Instead of one process doing all the work, multiple processes work concurrently on smaller units
Parallel degree should be a power of 2
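One way to read the "(#CPU x 2)" guideline is: take twice the CPU count and round up to the next power of 2. That reading is an assumption, and the helper name `hash_partition_count` is hypothetical; the sketch only shows the arithmetic.

```python
def hash_partition_count(cpu_count: int) -> int:
    """Return the smallest power of 2 that is >= 2 * cpu_count."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    target = 2 * cpu_count
    # (target - 1).bit_length() is the exponent of the next power of 2.
    return 1 << (target - 1).bit_length()


for cpus in (8, 12, 24):
    print(cpus, hash_partition_count(cpus))
```

When twice the CPU count is already a power of 2 the value is returned unchanged; otherwise it is rounded up.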
Exadata Hardware Architecture

Scalable Grid of industry standard servers for Compute and Storage
Eliminates long-standing tradeoff between Scalability, Availability, Cost
Database Grid
8 Dual-processor x64 database servers OR 2 Eight-processor x64 database servers
Intelligent Storage Grid
14 High-performance low-cost storage servers
100 TB High Performance disk or 504 TB High Capacity disk
5.3 TB PCI Flash
Data mirrored across storage servers
InfiniBand Network
Redundant 40Gb/s switches
Unified server & storage network
Partitioning
Range partition large fact tables typically on date column
Consider data loading frequency
Is an incremental load required?
How much data is involved, a day, a week, a month?
Partition pruning for queries
What range of data do the queries touch - a quarter, a year?
Sub partition by hash to improve join performance between fact tables and / or dimension tables
Pick the common join column
If all dimensions have different join columns, use the join column for the largest dimension or the most common join in the queries
Partition Pruning
Q: What was the total sales for the weekend of May 20 - 22 2008?
Select sum(sales_amount)
From SALES
Where sales_date >= to_date('05/20/2008','MM/DD/YYYY')
And sales_date < to_date('05/23/2008','MM/DD/YYYY');
[Figure: Sales Table with daily partitions May 18th 2008 through May 24th 2008; the partitions for May 20th, 21st and 22nd are the ones touched]
Only the 3 relevant partitions are accessed
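The pruning decision for the weekend question (May 20 through May 22) can be modelled as a range-overlap test against each partition's bounds. This is an illustrative sketch only: the helper names `daily_partitions` and `prune`, and the choice of ISO dates as partition names, are assumptions.

```python
from datetime import date, timedelta


def daily_partitions(first_day: date, days: int):
    """Build (name, lower_inclusive, upper_exclusive) for one partition per day."""
    parts = []
    for i in range(days):
        lower = first_day + timedelta(days=i)
        parts.append((lower.isoformat(), lower, lower + timedelta(days=1)))
    return parts


def prune(partitions, start: date, end_exclusive: date):
    """Keep only the partitions whose range overlaps [start, end_exclusive)."""
    return [name for name, lower, upper in partitions
            if lower < end_exclusive and upper > start]


sales_partitions = daily_partitions(date(2008, 5, 18), 7)  # May 18 .. May 24
touched = prune(sales_partitions, date(2008, 5, 20), date(2008, 5, 23))
print(touched)
```

Note the half-open upper bound: a predicate that included midnight of May 23 would also overlap the May 23 partition, giving four partitions instead of three.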
Partition Wise Join

SELECT sum(amount_sold)
FROM sales s, customer c
WHERE s.cust_id = c.cust_id;
[Figure: Sales is range partitioned (partition May 18th 2008 shown) with hash sub parts 1 to 4; Customer is hash partitioned into parts 1 to 4; each Sales sub part is joined to the matching Customer part]
Both tables have the same degree of parallelism and are partitioned the same way on the join column (cust_id)
A large join is divided into multiple smaller joins, each of which joins a pair of partitions in parallel
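The reason a partition-wise join is valid is that rows with the same join key always land in the same hash bucket on both sides, so bucket i of one table only ever needs to meet bucket i of the other. The sketch below illustrates this with in-memory lists; the function names and the sample rows are made up for the example, and the per-pair joins run sequentially here where a database would run them in parallel.

```python
from collections import defaultdict


def hash_partition(rows, key_index, n_parts):
    """Distribute rows into n_parts buckets by hashing the join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        parts[hash(row[key_index]) % n_parts].append(row)
    return parts


def hash_join(left, right, left_key, right_key):
    """Plain in-memory hash join of two row lists."""
    lookup = defaultdict(list)
    for rrow in right:
        lookup[rrow[right_key]].append(rrow)
    return [lrow + rrow for lrow in left for rrow in lookup.get(lrow[left_key], [])]


def partition_wise_join(left, right, left_key, right_key, n_parts):
    """Join partition i of `left` only with partition i of `right`."""
    left_parts = hash_partition(left, left_key, n_parts)
    right_parts = hash_partition(right, right_key, n_parts)
    result = []
    for lpart, rpart in zip(left_parts, right_parts):
        # Each pair is independent of the others, which is what allows
        # the smaller joins to be executed concurrently.
        result.extend(hash_join(lpart, rpart, left_key, right_key))
    return result


# (cust_id, amount_sold) and (cust_id, name): illustrative data only
sales = [(1, 10), (2, 20), (3, 30), (1, 5), (6, 60)]
customer = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")]

pairs = partition_wise_join(sales, customer, 0, 0, n_parts=4)
print(sum(row[1] for row in pairs))  # sum(amount_sold) over the joined rows
```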
Exadata Smart Scan

Improve Query Performance by 10x or More
[Figure: the question "What were yesterday's sales?" is issued as Select sum(sales) where salesdate = '22-Jan-2010'; the storage servers return the sales for Jan 22 2010 and the sum is computed]
Off-load data-intensive processing to Exadata Storage Server

Exadata Storage Server only returns relevant rows and columns
Wide InfiniBand connections eliminate network bottlenecks
Exadata Storage Index

Transparent I/O Elimination with No Overhead
[Figure: a table with columns A, B, C, D and a storage index on column B; the first set of rows has B values 1, 3, 5 (Min B = 1, Max B = 5); the second set has B values 5, 8, 3 (Min B = 3, Max B = 8)]
Select * from Table where B<2: only the first set of rows can match
Maintain summary information about table data in memory
Eliminate disk I/Os if MIN / MAX never match where clause

Completely automatic and transparent
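The MIN / MAX elimination can be sketched in a few lines using the B values from the figure. This is a simplified model: the function names are hypothetical, the "regions" are plain lists, and only a less-than predicate is handled.

```python
def build_storage_index(regions, column):
    """Record (min, max) of `column` for each storage region."""
    return [(min(row[column] for row in region), max(row[column] for row in region))
            for region in regions]


def scan_less_than(regions, index, column, bound):
    """Return rows with column < bound, skipping regions that cannot match."""
    rows, regions_read = [], 0
    for region, (lowest, _highest) in zip(regions, index):
        if lowest >= bound:
            continue  # no value in this region is below the bound: skip the I/O
        regions_read += 1
        rows.extend(row for row in region if row[column] < bound)
    return rows, regions_read


regions = [
    [{"B": 1}, {"B": 3}, {"B": 5}],  # Min B = 1, Max B = 5
    [{"B": 5}, {"B": 8}, {"B": 3}],  # Min B = 3, Max B = 8
]
index = build_storage_index(regions, "B")
rows, regions_read = scan_less_than(regions, index, "B", 2)
print(rows, regions_read)
```

For the predicate B < 2 the second region is never read, because its minimum of 3 already rules out a match.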
Benefits Multiply
Converting Terabytes to Gigabytes
10 TB of User Data
With 10x Compression: 1 TB of User Data
With Partition Pruning: 100 GB of User Data
With Storage Indexes: 20 GB of User Data
With Smart Scan: 5 GB of User Data
Sub second 10 TB Scan, No Indexes
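The slide's figures imply a chain of reduction factors that multiply together. The sketch below works through that arithmetic; the individual factors (10, 10, 5, 4) are derived from the volumes on the slide, the pairing of storage indexes with 20 GB and Smart Scan with 5 GB is inferred from the slide layout, and the function name is hypothetical.

```python
GB_PER_TB = 1000  # decimal units, matching the round numbers on the slide

# (technique, reduction factor implied by the slide's volumes)
stages = [
    ("10x compression", 10),    # 10 TB  -> 1 TB
    ("partition pruning", 10),  # 1 TB   -> 100 GB
    ("storage indexes", 5),     # 100 GB -> 20 GB
    ("Smart Scan", 4),          # 20 GB  -> 5 GB
]


def remaining_volume_gb(user_data_tb, stages):
    """Apply each factor in turn; return (technique, remaining GB) per stage."""
    volume = user_data_tb * GB_PER_TB
    trail = []
    for name, factor in stages:
        volume = volume / factor
        trail.append((name, volume))
    return trail


trail = remaining_volume_gb(10, stages)
for name, volume in trail:
    print(f"after {name}: {volume:g} GB")
```

Multiplied together the factors come to 2000, which is how 10 TB of user data shrinks to 5 GB of data actually processed.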
IM Reference Architecture
Access & Performance Layer
Access and Performance Layer

Dimensional Model
A form of (physical) analytical design in which data is pre-classified into Facts and Dimensions
Model that is all about Performance
Physical performance through optimisation
User access through simplified model
Various forms exist: Star, Star with Embedded Aggregates, and Snowflakes
Plus multidimensional cubes (analytic workspaces, AWs, in an Oracle Database)
What Does a Star Schema Look Like?
Called star schema because diagram resembles a star

Center of the star consists of one or more fact tables
Points of the star are the dimension tables
Each dimension is composed of levels
Levels are organized in hierarchies
Example: assume this is the schema of a retail chain.
- The fact is sales revenue (money)
- How you want to view the fact data is called a dimension: customers, distribution channels, products and time
Optimizing Star Schema

Create bitmap indexes on the foreign key columns in the fact table
Set STAR_TRANSFORMATION_ENABLED to TRUE
Goal is Star Transformation

Powerful optimization technique that rewrites or transforms SQL
Executes the query in two phases
The first phase retrieves the necessary rows (row set) from the fact table
Bitmap joins between bitmap indexes on all of the foreign key columns
The second phase joins this row set to the dimension tables
The join back to the dimension tables is done using a hash join
Star Transformation in Detail

Select SUM(quantity_sold)
From Sales s, Customers c, Products p, Times t
Where s.cust_id = c.cust_id
And s.prod_id = p.prod_id
And s.time_id = t.time_id
And c.cust_city = 'BOSTON'
And p.product = 'UMBRELLA'
And t.month = 'MAY'
And t.year = 2008;

Step 1: Oracle rewrites / transforms the query to retrieve only the necessary rows from the fact table using bitmap indexes on foreign key columns

Select SUM(quantity_sold)
From Sales s
Where s.cust_id IN
(Select c.cust_id From Customers c Where c.cust_city = 'BOSTON')
And s.prod_id IN
(Select p.prod_id From Products p Where p.product = 'UMBRELLA')
And s.time_id IN
(Select t.time_id From Times t Where t.month = 'MAY' And t.year = 2008);

Step 2: Oracle joins the rows from the fact table to the dimension tables
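The two query forms return the same answer, which can be checked on a tiny data set. The sketch below uses Python's built-in sqlite3 module purely as a convenient SQL engine: SQLite has no bitmap indexes and performs no star transformation, so this only demonstrates the logical equivalence of the join form and the IN-subquery form. The sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_city TEXT);
CREATE TABLE products  (prod_id INTEGER PRIMARY KEY, product TEXT);
CREATE TABLE times     (time_id INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE sales     (cust_id INTEGER, prod_id INTEGER, time_id INTEGER,
                        quantity_sold INTEGER);
INSERT INTO customers VALUES (1, 'BOSTON'), (2, 'CHICAGO');
INSERT INTO products  VALUES (10, 'UMBRELLA'), (11, 'HAT');
INSERT INTO times     VALUES (100, 'MAY', 2008), (101, 'JUNE', 2008);
INSERT INTO sales VALUES
    (1, 10, 100, 3),   -- Boston, umbrella, May 2008: qualifies
    (1, 10, 100, 4),   -- qualifies
    (1, 11, 100, 9),   -- wrong product
    (2, 10, 100, 5),   -- wrong city
    (1, 10, 101, 7);   -- wrong month
""")

ORIGINAL = """
SELECT SUM(s.quantity_sold)
FROM sales s, customers c, products p, times t
WHERE s.cust_id = c.cust_id
  AND s.prod_id = p.prod_id
  AND s.time_id = t.time_id
  AND c.cust_city = 'BOSTON'
  AND p.product = 'UMBRELLA'
  AND t.month = 'MAY'
  AND t.year = 2008
"""

TRANSFORMED = """
SELECT SUM(s.quantity_sold)
FROM sales s
WHERE s.cust_id IN (SELECT c.cust_id FROM customers c WHERE c.cust_city = 'BOSTON')
  AND s.prod_id IN (SELECT p.prod_id FROM products p WHERE p.product = 'UMBRELLA')
  AND s.time_id IN (SELECT t.time_id FROM times t
                    WHERE t.month = 'MAY' AND t.year = 2008)
"""

original_total = conn.execute(ORIGINAL).fetchone()[0]
transformed_total = conn.execute(TRANSFORMED).fetchone()[0]
print(original_total, transformed_total)
```

The equivalence holds here because each dimension key is unique, so the join cannot duplicate fact rows.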
Summary Management
Improve Response Time with Materialized Views
[Figure: a SQL query against the relational star schema (Region, Date, Products, Channel) passes through query rewrite and is answered from materialized views: Sales by Region, Sales by Date, Sales by Product, Sales by Channel]
Pre-summarized information stored within Oracle Database 11g
Separate database object, transparent to queries

Supports sophisticated transparent query rewrite
Fast incremental refresh of changed data
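The idea behind summaries and transparent rewrite can be sketched as follows: a pre-aggregated result is kept per dimension, and a request is answered from it when one exists, otherwise from the detail rows. This is a toy model only. The class `SummaryManager`, its methods and the sample data are hypothetical, and real query rewrite works on SQL text and optimizer cost, which this sketch does not attempt.

```python
from collections import defaultdict


def build_summary(detail_rows, key):
    """Pre-aggregate sales amounts by one dimension, like a summary table."""
    totals = defaultdict(float)
    for row in detail_rows:
        totals[row[key]] += row["amount"]
    return dict(totals)


class SummaryManager:
    """Holds detail rows plus pre-built summaries for selected dimensions."""

    def __init__(self, detail_rows, summary_keys):
        self.detail_rows = detail_rows
        self.summaries = {key: build_summary(detail_rows, key) for key in summary_keys}

    def total_by(self, key):
        """Answer 'sales by <key>' and report which source was used."""
        if key in self.summaries:
            return self.summaries[key], "summary"  # no scan of the detail rows
        return build_summary(self.detail_rows, key), "detail"


detail = [
    {"region": "EAST", "product": "UMBRELLA", "amount": 10.0},
    {"region": "EAST", "product": "HAT", "amount": 5.0},
    {"region": "WEST", "product": "UMBRELLA", "amount": 2.5},
]
manager = SummaryManager(detail, summary_keys=["region"])
print(manager.total_by("region"))
print(manager.total_by("product"))
```

The caller asks the same question either way, which mirrors the point that the summary is a separate object transparent to queries.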
Cube Organized Materialized Views

[Figure: a SQL query goes through query rewrite to cube-based summaries over the Region, Date, Products and Channel dimensions, with automatic refresh]
Exposes Oracle OLAP cubes as relational materialized views

Provides SQL access to data stored in OLAP cubes
Any BI tool or SQL application can leverage OLAP cubes
Exadata Smart Flash Cache

OLAP Cubes on Flash Cache
Exadata has 5 TB of flash
56 Flash PCI cards avoid disk controller bottlenecks
Intelligently manages flash

Smart Flash Cache holds hot data
Avoids large scan wipe-outs of cache
Gives speed of flash, cost of disk
5X more I/Os than a 1000-disk enterprise storage array
Exadata flash cache achieves:

Over 1.5 million IO/sec from SQL (8K)
Sub-millisecond response times
Let's build our DW ...
Custom Solutions
Key Implementation Tasks
Assemble & Configure System: assemble hardware; install specialized software; ensure system is balanced
Design Data Model: requirements collection; ERD specification
Implement Data Model: define tables, views, cubes, analytics; implement indexes, partitions
ETL: identify sources, map to targets, cleanse data; map detail data to summaries
Define Metrics & Reports: create BI metadata; implement reports and dashboards
Optimize Performance: modify performance structures (MVs, indexes, etc.)
Challenges
Design a data model that satisfies existing and future requirements
Employ a diverse skill set
Integrate, implement, administer and tune disparate technologies
DW Reference Architecture
Industry Data Model Fit
OIDM
Oracle Industry Data Model

Enterprise wide data model for industry
[Figure: Oracle Industry Data Model layers: Sample OBIEE Metadata & Reports (Information Access); Derived and Aggregate (Analytic Layer); Transformation; Base, Reference and Lookup Tables (Foundation Layer)]
Over 1,300 tables and 16,000 attributes

Over 1,000 industry measures and KPIs
Industry Standards conformant
Designed & Optimized for VLDB including Exadata
Prebuilt mining models, OLAP cubes and sample reports
Automatic data movement across the warehouse
Easily extensible and customizable
Application and Tools agnostic
Central repository for atomic level data
Complete metadata (end-to-end)
Rapid implementation

Best Practice Implementation
[Figure: building blocks Exadata, Industry Data Model, Oracle Best Practice DW Methodology, and Reporting & Analysis, labelled ASSEMBLED, DESIGNED, IMPLEMENTED, ACCELERATED, FAST, READY]
Complete. Fast Results. Lower Risk.

Benefits
Utilize an industry specific enterprise-wide data model
Leverage your existing Oracle expertise
Implemented using Oracle data warehousing best practices

Time to Implement
[Figure: implementation timeline for a Custom Warehouse compared with Oracle Industry Data Model with Exadata, across the phases Sizing and Configuration, Design Model, Implement Model, ETL, Define Metrics & Reports and Optimize Performance; callouts: 3x Faster, 8x More Complete, 2 Weeks of POC for Data Model at No Cost]
Summary
Engineered System for Data Warehousing
A Single Source of Truth

Data Modeling for your Business

Oracle is optimized for any kind of modeling technique
Fast Time to Market

Leverage pre-built Data Models

Additional Resources
Oracle.com - www.oracle.com/exadata
Follow-on webcasts in the Best Practice Exadata Webcast Series:
Best Practices for Workload Management of a Data Warehouse on Oracle Exadata, April 19th, 2012
http://www.oracle.com/us/dm/sev100056475-wwmk11051130mpp016-1545274.html
Best Practices for Extreme Data Warehouse Performance on Oracle Exadata, May 10th, 2012
http://www.oracle.com/us/dm/sev100056475-wwmk11051130mpp016-1545274.html
