
For Oracle employees and authorized partners only. Do not distribute to third parties.

© 2008–2009 Oracle Corporation – Proprietary and Confidential

Exadata Oracle Database Machine Overview

Student Manual

Module Agenda

• Why Exadata?

• Exadata features for increasing resources


• Exadata features for reducing demand
• Exadata features for ensuring efficiency
• Exadata benefits
• Exadata sizing and licensing
• Summary


Why Exadata?

Database machine
Requirements with 11g R2

• Data integrity
• MVRC
• Data Guard
• ASM

Database machine
Requirements with 11g R2

• Data integrity
• Performance
• MVRC
• Efficient caching
• Bitmap indexes
• Partitioning
• Materialized views

Database machine
Requirements with 11g R2

• Data integrity
• Performance
• Scalability
• RAC
• Powerful platforms

Database machine
Requirements

• Data integrity
• Performance
• Scalability

Oracle delivers!

Database Machine
What more?

• Efficiency

Efficiency = Resources / Demands

Business impact of efficiency
• Business queries if 10 TB of data must be scanned

Bandwidth      Query rate            Business impact
1 GB/sec       3 queries/work day    Don't Ask
10 GB/sec      3 queries/hour        Ask Tomorrow
100 GB/sec     35 queries/hour       Ask Anything
Exadata Database Machine
Extreme efficiency

• Fast Predictable Performance

• Lower Ongoing Costs

• The Fastest Time to Value & Lowest Risk


Exadata features for


increasing resources

Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
• Eliminates the long-standing tradeoff between scalability, availability, and cost

Database Grid
• 8 dual-processor x64 database servers, OR
• 2 eight-processor x64 database servers

Intelligent Storage Grid
• 14 high-performance, low-cost storage servers
• 100 TB High Performance disk, or 336 TB High Capacity disk
• 5.3 TB PCI flash
• Data mirrored across storage servers

InfiniBand Network
• Redundant 40 Gb/s switches
• Unified server and storage network

Exadata Smart Flash Cache
Extreme Performance for OLTP & DW

• Exadata has 5 TB of flash
• 56 flash PCI cards avoid disk controller bottlenecks
• Intelligently manages flash
• Smart Flash Cache holds hot data
• Avoids large scan wipe-outs of the cache
• Gives the speed of flash at the cost of disk
• Exadata flash cache achieves:
• Over 1 million I/Os per second from SQL (8K)
• Sub-millisecond response times
• 5X more I/Os than a 1,000-disk enterprise storage array


Exadata features for


reducing demand

Exadata Intelligent Storage

• Exadata storage servers also run more complex operations in storage:
• Join filtering
• Incremental backup filtering
• I/O prioritization
• Storage indexing
• Database-level security
• Offloaded scans on encrypted data
• Data mining model scoring
• A 10x reduction in data sent to DB servers is common

Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• With traditional storage, all database intelligence resides in the database hosts
• Table extents are identified, I/Os are issued, and 1 terabyte of data is returned to the hosts
• A very large percentage of the data returned from storage is discarded by the database servers
• The DB host reduces a terabyte of data to the 1,000 customer names that are returned to the client
• Discarded data consumes valuable resources and impacts the performance of other workloads
Exadata Smart Scan Processing
Reduces demand

SELECT customer_name FROM calls WHERE amount > 200;

• Extents and metadata are sent to storage; Smart Scan identifies the rows and columns within the terabyte table that match the request
• Only the relevant columns (customer_name) and required rows (where amount > 200) are returned to the hosts – about 2 MB of data is returned to the server
• The consolidated result set is built from all cells
• CPU consumed by predicate evaluation is offloaded
• Moving scan processing off the database host frees host CPU cycles and eliminates massive amounts of unproductive messaging
• Returns the needle, not the entire haystack

Additional Smart Scan functionality
Reduces demand
• Join filtering
• Filtering is performed within Exadata storage cells
• Join predicates are transformed into filters
• Backups
• Only changed blocks are returned
• Create Tablespace (file creation)
• Formatting of tablespace extents eliminates the I/O associated with
the creation and writing of tablespace blocks
• Smart Scan offload for encrypted tablespaces and
columns
• Offload of Data Mining Model scoring

Exadata Hybrid Columnar Compression
Highest Capacity, Lowest Cost

• Data is organized and compressed by column
• Dramatically better compression
• Speed-optimized Query mode for data warehousing
• 10X compression typical
• Runs faster because of Exadata offload!
• Space-optimized Archival mode for infrequently accessed data
• 15X to 50X compression typical
• Benefits multiply: faster and simpler backup, DR, caching, reorg, and clone

Exadata Storage Index
Transparent I/O Elimination with No Overhead

• Exadata Storage Indexes maintain summary information about table data in memory
• They store the MIN and MAX values of columns
• Typically one index entry for every MB of disk
• Disk I/Os are eliminated if the MIN and MAX can never match the "where" clause of a query
• Completely automatic and transparent

Example: with one storage region holding Min B = 1, Max B = 5 and the next holding Min B = 3, Max B = 8, the query SELECT * FROM Table WHERE B < 2 can only match rows in the first region.


Exadata features for


ensuring efficiency

Exadata I/O Resource Management
Mixed Workloads and Multi-Database Environments

• Ensure different databases are allocated the correct relative amount of I/O bandwidth
• Database A: 33% of I/O resources
• Database B: 67% of I/O resources
• Ensure different users and tasks within a database are allocated the correct relative amount of I/O bandwidth
• Database A:
• Reporting: 60% of I/O resources
• ETL: 40% of I/O resources
• Database B:
• Interactive: 30% of I/O resources
• Batch: 70% of I/O resources


Exadata benefits

Exadata Benefits
Fast Predictable Performance

• More predictable timeliness of results


• Faster results by moving Oracle database
intelligence to disk storage

• Properly configured out-of-the-box


• Ready to run - plug it in

• More capabilities to support more


business analysts
• Scale to support an enterprise

Brian Camp
SVP, Infrastructure Services
KnowledgeBase Marketing

“After carefully testing several data warehouse platforms, we chose the


Oracle Database Machine. Oracle Exadata was able to speed up one of
our critical processes from days to minutes. The Oracle Database Machine
will allow us to improve service levels and expand our service offerings.”

Performance: Query Throughput / Query Throughput with Flash

Why is Oracle Faster?
• DB processing in storage
• Better compression (10x)
• Smart Flash Cache
• Faster interconnect (40 Gb/sec)
• More disks
• Faster disks (15K RPM)

(Chart: relative query throughput, with and without flash, for the Hitachi USP V, Teradata 2550, Netezza TwinFin 12, and Sun Oracle Database Machine.)
Exadata Performance Scales

• Exadata delivers brawny hardware for use by Oracle's brainy software
• Performance scales with size
• Result:
• More business insight
• Better decisions
• Improved competitiveness

(Chart: table scan time versus table size from 1 TB to 100 TB, comparing a typical warehouse with Exadata.)


Exadata sizing and


licensing

Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
• Eliminates the long-standing tradeoff between scalability, availability, and cost

Database Grid
• 8 dual-processor x64 database servers, OR
• 2 eight-processor x64 database servers

Intelligent Storage Grid
• 14 high-performance, low-cost storage servers
• 100 TB High Performance disk, or 336 TB High Capacity disk
• 5.3 TB PCI flash
• Data mirrored across storage servers

InfiniBand Network
• Redundant 40 Gb/s switches
• Unified server and storage network

Standardized and Simple to Deploy

• All Database Machines are the same
• Delivered ready-to-run
• Tested
• Highly supportable
• No unique configuration issues
• Identical to the configuration used by Oracle Engineering
• Runs existing OLTP and DW applications
• Full 30 years of Oracle DB capabilities
• No Exadata certification required
• Leverages the Oracle ecosystem
• Skills, knowledge base, people, partners
• Deploy in days, not months

Paul Hartley
General Manager
LGR Telecommunications

"You can easily remove six months of the implementation cycle…"

"…we estimate there's up to a 70 percent reduction in terms of cost of ownership compared to custom solutions, just in terms of the personnel savings."

from Profit Magazine, February 2009

Exadata Storage Server Building Block

• High-performance storage server built from industry-standard components
• Hardware by Sun
• Software by Oracle
• 12 disks: 600 GB 15,000 RPM High Performance SAS or 2 TB 7,200 RPM High Capacity SAS
• 2 six-core Intel Xeon processors (L5640)
• Dual-ported 40 Gb/sec InfiniBand
• 4 x 96 GB flash cards
• Intelligent Exadata Storage Server software

New - Exadata Database Machine X2-8 Full Rack
Extreme Performance for Consolidation, Large OLTP and DW

• 2 x64 Eight-processor Database servers (Sun Fire X4800)


• High Core, High Memory Database Servers
• 128 CPU cores (64 per server)
• 2 TB memory (1 TB per server)
• 10 GigE connectivity to Data Center
• 16 x 10GbE ports (8 per server)
• 14 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 3 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Redundant Power Distribution Units (PDUs)

Add more racks for additional scalability


Exadata Database Machine X2-2 Full Rack
Pre-Configured for Extreme Performance

• 8 x64 Dual-processor Database Servers (Sun Fire X4170 M2)


• 96 cores (12 per server)
• 768 GB memory (96GB per server)
• 10 GigE connectivity to Data Center
• 16 x 10GbE ports (2 per server)
• 14 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 3 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Keyboard, Video, Mouse (KVM) hardware
• Redundant Power Distribution Units (PDUs)

Add more racks for additional scalability


Exadata Database Machine X2-2 Half Rack
Pre-Configured for Extreme Performance

• 4 x64 Dual-processor Database Servers (Sun Fire X4170 M2)
• 48 cores (12 per server)
• 384 GB memory (96GB per server)
• 10 GigE connectivity to Data Center
• 8 x 10GbE ports (2 per server)
• 7 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 3 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Keyboard, Video, Mouse (KVM) hardware
• Redundant Power Distribution Units (PDUs)

Can Upgrade to a Full Rack


Exadata Database Machine X2-2 Quarter Rack
Pre-Configured for Extreme Performance

• 2 x64 Dual-processor Database Servers (Sun Fire X4170 M2)


• 24 cores (12 per server)
• 192 GB memory (96GB per server)
• 10 GigE connectivity to Data Center
• 4 x 10GbE ports (2 per server)
• 3 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 2 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Keyboard, Video, Mouse (KVM) hardware
• Redundant Power Distribution Units (PDUs)

Can Upgrade to a Half Rack


Start Small and Grow

• Quarter Rack → Half Rack → Full Rack

Exadata Product Capacity

                                 X2-8        X2-2        X2-2        X2-2
                                 Full Rack   Full Rack   Half Rack   Quarter Rack
Raw Disk(1)     High Perf Disk   100 TB      100 TB      50 TB       21 TB
                High Cap Disk    336 TB      336 TB      168 TB      72 TB
Raw Flash(1)                     5.3 TB      5.3 TB      2.6 TB      1.1 TB
User Data(2)    High Perf Disk   28 TB       28 TB       14 TB       6 TB
(no compression) High Cap Disk   100 TB      100 TB      50 TB       21 TB

1 – Raw capacity calculated using 1 GB = 1000 x 1000 x 1000 bytes and 1 TB = 1000 x 1000 x 1000 x 1000 bytes.
2 – User Data: actual space for end-user data, computed after single mirroring (ASM normal redundancy) and after allowing space for database structures such as temp, logs, undo, and indexes. Actual user data capacity varies by application. User Data capacity calculated using 1 TB = 1024 x 1024 x 1024 x 1024 bytes.

Exadata Product Performance

                                     X2-8        X2-2        X2-2        X2-2
                                     Full Rack   Full Rack   Half Rack   Quarter Rack
Raw Disk Data      High Perf Disk    25 GB/s     25 GB/s     12.5 GB/s   5.4 GB/s
Bandwidth(1,4)     High Cap Disk     14 GB/s     14 GB/s     7 GB/s      3 GB/s
Raw Flash Data Bandwidth(1,4)        50 GB/s     50 GB/s     25 GB/s     11 GB/s
Disk IOPS(3,4)     High Perf Disk    50,000      50,000      25,000      10,800
                   High Cap Disk     25,000      25,000      12,500      5,400
Flash IOPS(3,4)                      1,000,000   1,000,000   500,000     225,000
Data Load Rate(4)                    5 TB/hr     5 TB/hr     2.5 TB/hr   1 TB/hr

1 – Bandwidth is peak physical disk scan bandwidth, assuming no compression.
2 – Max User Data Bandwidth assumes scanned data is compressed by a factor of 10 and is on flash.
3 – IOPS based on I/O requests of size 8K.
4 – Actual performance will vary by application.

Database Server Operating System Choices

• Two Operating System Choices on the database servers


• Oracle Linux
• Solaris 11 Express (x86) – coming soon
• Customers will choose their preferred Database Server
OS at installation time
• Exadata Storage Servers will continue to be Oracle Linux

Exadata Licensing
Database nodes
Required Products
Oracle Database 11g Enterprise Edition
Oracle Exadata Storage Server Software
Highly recommended products
RAC
Partitioning Option
Other Recommended Software
Advanced Compression Option
Enterprise Manager Packs: Diagnostics, Provisioning, Tuning
OLAP Option
Data Mining Option
Advanced Security Option
Real Application Testing
Oracle Business Intelligence Enterprise Edition Plus


Exadata summary

Exadata Database Machine Summary
Extreme Performance for all Data Management

• Best for Data Warehousing


• Smart scan of 10x compressed tables
• Parallel query on in-memory data
• Overall up to 5x faster than 11.1 for Warehousing

• Best for OLTP


• Only database that scales real-world applications on grid
• Smart Flash Cache for 20x IOPs or 20x fewer disks
• Smart Flash cache can hold entire working set
• Up to 50x compression for archival data
• Secure, fault tolerant

• Best for Consolidation


• Only database machine that runs and scales all workloads
• Predictable response times in multi-database, multi-application, multi-user
environments


Smart features

Module Agenda

• Smart Scans

• Smart Scan feature support


• How Smart Scans work
• Smart Scans and Oracle features
• Other Smart features
• Smart Scan benefits

Smart Scans
Smart Scan

• Finite resources can lead to performance bottlenecks


• I/O is the chief source of bottlenecks in modern
computing systems
• Smart Scan is designed to reduce the amount of data
flowing from the storage devices to the database
servers

Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• A telco wants to identify customers that spend more than $200 on a single phone call
• The information about these premium customers occupies 2 MB in a 1-terabyte table
Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• With traditional storage, all database intelligence resides in the database hosts
• Database server nodes must identify all table extents that may contain the requested data
• Partitioning may help to eliminate some extents
Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• The database server issues I/O requests for all potentially relevant data
• The storage system returns all relevant data to the database server, using I/O bandwidth
• The storage system returns blocks of data – 1 terabyte of data is returned to the hosts
Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• The database server must discard irrelevant data by checking values against the selection criteria
• Final results are sent to the client – the DB host reduces a terabyte of data to the 1,000 customer names that are returned to the client
• Large use of resources:
• CPU/memory for mapping extents
• I/O bandwidth from disk for data which will be discarded
• CPU to impose the selection criteria
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• The same SQL request is issued
• Smart Scan is completely transparent to applications and users
• Even if a cell fails during operations
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• The database server sends table extents and metadata to the Exadata Storage Server cells
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• Table extents and metadata are sent to the cells
• Smart Scan processing on the Exadata Storage cells scans the data blocks to identify the rows and columns within the terabyte table that match the request
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• Only the relevant rows and columns are returned to the database server – about 2 MB of data
• Blocks are not returned when Smart Scan is used
• Blocks will still be returned when appropriate
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• The database server only has to assemble the returned relevant data into the consolidated result set built from all cells
• No wasted I/O bandwidth or database server CPU

Smart Scan feature


support
Smart Scan
Row filtering
• Predicate filtering
• >, <, =, !=, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR
• Most SQL functions
• For the full list:
• SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
Smart Scan
Column projection

• Smart Scan only returns the columns requested by the query
• Significant reduction in I/O bandwidth
• For example, SELECT B, C FROM tablea; returns only the requested columns from a table with columns A through E
Smart Scan
Join filtering

• Join filtering for star schemas


• Joins large tables to smaller tables
• Uses Bloom filters
• A way to indicate membership in a set in a compact way
• Bloom filters are used to reduce potential row candidates
for join, reducing the data sent to the database server for
join processing

Smart Scans – how they


work
Smart Scan
Uses direct reads
• Direct reads are not new to Exadata
• Direct reads read data into PGA buffers rather than into the buffer cache used for caching data blocks
• Direct reads make sense when the ratio of cache to the data to be read is very small
• If the cache is very small relative to the data to be read, the buffers would be evicted anyway, possibly at the cost of adversely affecting OLTP-type applications
• Exadata for DW environments involves scanning large volumes of data and returning results in formatted data blocks – these should not go to the buffer cache
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
• The optimizer makes an execution plan whose access step is a FULL ACCESS
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
• The optimizer makes an execution plan
• Full scan access – Smart Scan eligible
• Smart Scan is not performed if the query columns include LOBs or other disqualifying conditions apply
• Smart Scan processing
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
• The optimizer makes an execution plan
• Full scan access – Smart Scan
• Smart Scan processing
• Selected rows and projected columns are returned to the PGA
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
• The optimizer makes an execution plan (no FULL ACCESS step)
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
• The optimizer makes an execution plan
• Not scan access – block request
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
• The optimizer makes an execution plan
• Not scan access – block request
• Blocks are returned to the buffer cache
• Normal block processing

Smart Scans and Oracle


features
Smart Scan and Oracle features

• All standard Oracle features continue to work as normal,


including
• Consistent reads
• Locking
• Chained rows
• Compressed table
• Partitioned tables
• Materialized views
• National Language Processing
• Date arithmetic
• Regular expression searches

. . . and everything else . . .



Other Smart features


Exadata Software features 11.2
Offloaded data mining scoring

• Data mining scoring is executed in Exadata (the scoring function runs in the Exadata cells):

select cust_id
from customers
where region = 'US'
and prediction_probability(churnmod, 'Y' using *) > 0.8;

• All data mining scoring functions are offloaded to Exadata
• Up to 10x performance gains
• Reduced CPU utilization on the Database Server
Exadata Software Features
Smart incremental backup
• Recovery Manager does Block Change Tracking
• Maintains a list of groups of blocks where data has changed
• Incremental backup only backs up the marked groups of blocks
• Exadata Storage Server improves the granularity of the tracking units, reducing the size of the backup even more

(Diagram: change tracking file content for 1 MB of data, driving a smart incremental backup request.)


Exadata Software Features
Fast file creation

• Standard tablespace creation/extension


• Tablespaces created by the database are initialized
• Full blocks initialized as part of process by database server
and written to storage
• Exadata tablespace creation/extension
• Only metadata is sent by database server to Exadata Storage
Server
• Initialization is done by the Exadata Storage Server software
on the drives
• Tremendous reduction in I/O between database and
storage system
• Corresponding reduction in overhead
Exadata software features
Encrypted data

• Exadata Storage Servers perform Smart Scans on


encrypted data in tablespaces and columns
• Data is decrypted by Exadata cells before being sent
to the database servers
• X2 Exadata Database Machines use hardware decryption

Smart Scan benefits


Smart Scans
Benefits

• Normal scans return all data blocks to the database


server
• Scan speed of storage throttled by limitation in data
flow from storage to database server
• Smart Scan can perform scans at the full 1.5 GB/sec,
while only returning relevant data
• Smaller amount of relevant data does not cause I/O
bottleneck
• Since no storage-database server bottleneck, more
cells scale for higher throughput
Smart Scans
Determining benefits

• Single query
• EXPLAIN PLAN
• The operation name and predicate information will use the keyword "storage"
Smart Scan
CELL_OFFLOAD_PLAN_DISPLAY
• Controls whether the offload status of a step in an
execution plan is displayed
• Set with ALTER SYSTEM or ALTER SESSION
commands
• Values
• AUTO (default) – displays predicates if cell is present and table is
on the cell
• ALWAYS – shows option whether cell is present or not
• NEVER – does not display offload status
• Be aware – optimizer does not control if processing is
actually offloaded, just if it is eligible
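A minimal sketch of checking offload eligibility for a single statement, reusing the CALLS query from earlier in this module (the table and column names are illustrative); when a cell is present, offloadable steps appear as TABLE ACCESS STORAGE FULL with a storage(...) entry in the Predicate Information section:

ALTER SESSION SET cell_offload_plan_display = ALWAYS;

EXPLAIN PLAN FOR
  SELECT customer_name FROM calls WHERE amount > 200;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Look for "TABLE ACCESS STORAGE FULL" and the storage(...) predicate
-- to confirm the step is eligible for Smart Scan offload.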
Monitoring Smart Scan
Efficiency

cell session smart scan efficiency =
  (cell IO uncompressed bytes + cell physical IO bytes saved by storage index) /
  cell physical IO interconnect bytes returned by smart scan

SQL> SELECT b.name, a.value
     FROM v$mystat a, v$statname b
     WHERE a.statistic# = b.statistic#
     AND b.name = 'cell session smart scan efficiency';

NAME                                     VALUE
---------------------------------------- -----
cell session smart scan efficiency        11.9
Monitoring Smart Scan
V$SQL statistics
• Statistics for individual SQL statements
• IO_CELL_OFFLOAD_ELIGIBLE_BYTES
• IO_CELL_OFFLOAD_RETURNED_BYTES
• OPTIMIZED_PHY_READ_BYTES
• And others
• Also available in
• V$SQLAREA
• V$SQLAREA_PLAN_HASH
• V$SQLSTATS
• V$SQLSTATS_PLAN_HASH
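A hedged example of reading these columns for recent statements; the SQL_TEXT filter is only illustrative:

SELECT sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_offload_returned_bytes
FROM   v$sql
WHERE  io_cell_offload_eligible_bytes > 0
AND    sql_text LIKE 'SELECT customer_name FROM calls%';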

Smart Scans
Building on benefits

• 10 TB of user data requires 10 TB of I/O
• 1 TB with compression
• 100 GB with partition pruning
• 20 GB with Storage Indexes
• 5 GB with Smart Scans
• Subsecond on the Database Machine

Data is 10x smaller, scans are 2,000x faster


Compression

Module Agenda

• Oracle compression options

• Advanced Compression Option (ACO)


• Advanced Compression in the real world
• Exadata Hybrid Columnar Compression
(EHCC)
• EHCC in the real world
• Tips and techniques

Oracle compression
options
Oracle Database Compression

Use Case                   Product                              Feature
OLTP                       Database 11g Advanced Compression    Advanced Compression
Unstructured (File) Data   Database 11g Advanced Compression    SecureFiles Compression, SecureFiles Deduplication
Backup Compression         Database 11g Advanced Compression    RMAN Compression, Data Pump Compression
Network Compression        Database 11g Advanced Compression    Data Guard Redo Transport Compression
Data Warehouses            Exadata V2 EHCC                      Warehouse Compression
Cold / Historical Data     Exadata V2 EHCC                      Archive Compression


Advanced Compression
Option
Advanced Compression
Compress All Your Data

• Compress large application tables


• Transaction processing, data warehousing
• Compress all data types
• Structured and unstructured data types
• Improve query performance
• Cascade storage savings throughout data
center
Up to 4X compression
Advanced Compression Option
Table Compression
• Oracle Database 11g extends table compression for
OLTP (and other) data
• Support for conventional DML operations
• Average storage savings of 2-4x
• New algorithm significantly reduces write overhead
• Improved performance for queries accessing large
amounts of data
• Compression enabled at either the table or partition
level
• Completely transparent to applications
Table Compression
Block-Level Batch Compression

• Patent pending algorithm minimizes performance overhead and


maximizes compression
• Individual INSERTs and UPDATEs do not cause recompression
• Compression cost is amortized over several DML operations
• Block-level (local) compression keeps up with frequent data
changes in OLTP environments
• Competitors use static, fixed size dictionary table thereby
compromising compression benefits
Table Compression

(Diagram: an initially uncompressed block of the Employee table. After the block header, the block holds the rows 1•John•Doe, 2•Jane•Doe, 3•John•Smith, and 4•Jane•Doe, with free space remaining.)

Employee Table
ID   FIRST_NAME   LAST_NAME
1    John         Doe
2    Jane         Doe
3    John         Smith
4    Jane         Doe

INSERT INTO EMPLOYEE
VALUES (5, 'Jack', 'Smith');
COMMIT;
Table Compression

(Diagram: the same block after compression. The block header now holds a local symbol table – John=|Doe=|Jane=|Smith= – and the rows, including the new row 5•Jack•Smith, are stored as short references into it, leaving more free space.)

Employee Table
ID   FIRST_NAME   LAST_NAME
1    John         Doe
2    Jane         Doe
3    John         Smith
4    Jane         Doe
5    Jack         Smith

Table Compression Syntax
OLTP Table Compression Syntax:
CREATE TABLE emp (
emp_id NUMBER
, first_name VARCHAR2(128)
, last_name VARCHAR2(128)
) COMPRESS FOR OLTP;

Direct Load Compression Syntax (default):


CREATE TABLE emp (
emp_id NUMBER
, first_name VARCHAR2(128)
, last_name VARCHAR2(128)
) COMPRESS [BASIC];
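Existing tables can also be rebuilt compressed; a small sketch, assuming a SALES table and an index that must be rebuilt after the move:

ALTER TABLE sales MOVE COMPRESS FOR OLTP;
-- A MOVE invalidates the table's indexes, so rebuild them afterwards:
ALTER INDEX sales_pk REBUILD;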
Advanced Compression Option
Table Compression Advisor
• Available in 11g Release 2
• Available on OTN *
• Supports Oracle Database 9i Release 2 through 11g Release 1
• Shows projected compression ratio for uncompressed tables
• Reports actual compression ratio for compressed tables (11g Only)

* http://www.oracle.com/technology/products/database/compression/compression-advisor.html
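In 11g Release 2 the advisor is exposed through the DBMS_COMPRESSION package; a hedged sketch, assuming a SALES table owned by SH and a scratch tablespace named USERS:

SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',
    ownname        => 'SH',
    tabname        => 'SALES',
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_OLTP,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  -- Report the projected compression ratio for the requested compression type
  DBMS_OUTPUT.PUT_LINE('Projected ratio: ' || ROUND(l_cmp_ratio, 1) ||
                       ' (' || l_comptype_str || ')');
END;
/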
Advanced Compression Option
SecureFiles
• Next-generation high performance LOB
• Superset of LOB interfaces allows easy migration from LOBs
• Transparent deduplication, compression, and encryption
• Leverage the security, reliability, and scalability of database
• Enables consolidation of file data with associated relational data
• Single security model
• Single view of data
• Single management of data
• Scalable to any level using SMP scale-up or grid scale-out
• SecureFiles standard with Oracle Database 11g
• Compression and deduplication with Advanced Compression
Option
• Encryption with Advanced Security Option
SecureFiles
Deduplication


• Enables storage of a single physical image for duplicate data


• Significantly reduces space consumption
• Dramatically improves writes and copy operations
• No adverse impact on read operations
• May actually improve read performance for cached data
• Duplicate detection happens within a table, partition or sub-partition
• Very useful for content management, email applications and data
archival applications
SecureFiles
Compression
• Significant storage savings for unstructured data
• Three levels of compression (LOW/[MEDIUM]/ HIGH ) provide desired
ratios
• 2-3x compression for typical files (combination of doc, pdf, xml)
• Compression Level LOW (NEW in 11.2)
• Compression algorithm optimized for high performance
• 3x less CPU utilization than default SecureFiles Compression
• Maintains 80% compression of default SecureFiles Compression

• Allows for random reads and writes to compressed SecureFile data


• Can be specified at a partition level
• Automatically detects if SecureFile LOB data is compressible
• Independent of table or index compression
Network Compression
Data Guard Redo Transport Services
• Compress network traffic between primary and standby databases
• Lower bandwidth networks (<100Mbps)
• 15-35% less time required to transmit 1 GB of data
• Bandwidth consumption reduced up to 35%
• High bandwidth networks (>100 Mbps)
• Compression will not reduce transmission time
• But will reduce bandwidth consumption up to 35%
• Syntax:
LOG_ARCHIVE_DEST_3='SERVICE=denver SYNC
COMPRESSION=ENABLE|[DISABLE]'

• Ref. MetaLink 729551.1 “Redo Transport Compression in a Data


Guard Environment”
Redo Transport Compression

(Charts: redo transport rate in Mbit/sec over time, with and without compression – roughly 2X compression for an OLTP workload and 5X compression for a batch workload.)

• More efficient bandwidth utilization, up to a 5x compression ratio
• Compression did not impact throughput or response time

Validation performed by CTC in collaboration with Oracle Japan Grid Center
http://www.ctc-g.co.jp/en/

Advanced Compression
in the real world
Advanced Compression
Oracle's Internal E-Business Application DB
• Oracle's Internal E-Business Suite Production System deployed ACO in 2009
• 4-node Sun E25K RAC, 11gR1
• Average overall storage savings 3x
• Table compression 4x
• Index compression 2x
• LOB compression 2.3x
• 65TB of realized storage savings primary, standby and test systems
• Additional benefits were also accrued in Dev clones and Backups
• Payroll, Order-2-Cash, AP/AR batch flows, Self-Service flows run without regression,
Queries involving full table scans show speedup
Advanced Compression
Oracle's Internal Beehive Email DB
• Production system on 11gR1 & Exadata for Primary and Standby
• Using Exadata Storage Servers for storage
• Average Compression Ratio: 2x
• Storage savings add up with standby, mirroring, flash recovery area
• Compression went production in 2009
• Consolidate 90K employees on this email server, more being migrated
• Savings As of April 2010
• Beehive Saved 365TB of storage using Advanced Compression
• Incrementally saves 2.6TB/day based on db size growth
• Savings higher with Sun user migration
• Compression also helped improve performance by caching only
compressed emails in memory and reducing I/O latencies
Advanced Compression
SAP R/3, BW, Leading Global Company
• Compression on SAP databases
at leading global company
• Oracle Database 11g Release 2
• SAP R/3 DB
• 4.67TB Uncompressed
• 1.93 TB Compressed
• 2.4x compression ratio
• SAP BW DB
• 1.38 TB Uncompressed
• 0.53 TB Compressed
• 2.6x compression ratio
• Leverage 11g compression for
Tables, Indexes and LOB data

Exadata Hybrid
Columnar Compression
Exadata Hybrid Columnar Compression

• New in Exadata Version 2
• Hybrid columnar compressed tables
• A new approach to compressed table storage
• Useful for data that is bulk loaded and queried
• Update activity is light
• How it works
• Tables are organized into Compression Units (CUs)
• CUs are a multiple of the database block size
• Within a Compression Unit, data is organized by column instead of by row
• Column organization brings similar values close together, enhancing compression
• Typically a 10x to 15x reduction
Exadata Hybrid Columnar Compression
Compression Units
• Compression Unit
• Logical structure spanning multiple database blocks
• Data organized by column during data load
• The number of rows in a CU is determined at load time, based on row size and estimated compression
• Each column is compressed separately
• All column data for a set of rows is stored in the compression unit

(Diagram: a logical compression unit spanning several database blocks, with a CU header followed by the column data C1 through C8.)
EHCC tables
Details
• Data loaded using direct load uses Hybrid Columnar
Compression
• Parallel DML, INSERT /*+ APPEND */, Direct Path SQL*LDR
• Optimized algorithms avoid or greatly reduce overhead
of decompression during query
• Individual row lookups consume more CPU than row format
• Need to reconstitute row from columnar format
EHCC tables
Details
• Updated rows automatically migrate to lower
compression level to support frequent transactions
• Table size will increase moderately
• All rows in Compression Unit are locked during
updates
• Data loaded using conventional INSERTs use lower
compression level
Exadata Hybrid Columnar Compression
Integration with Oracle features

• Fully supported with…


• B-Tree, Bitmap Indexes, Text indexes
• Materialized Views
• Exadata Server and Cells including offload
• Partitioning
• Parallel Query, PDML, PDDL
• Schema Evolution support, online, metadata-only add/drop
columns
• Data Guard Physical Standby Support
• Logical Standby (as of 11.2.0.2)
• Streams supported in a future release
Exadata Hybrid Columnar Compression

Warehouse Compression – optimized for speed
• 10x average storage savings
• 10x reduction in scan I/O
• Smaller warehouse, faster performance

Archive Compression – optimized for space
• 15x average storage savings, up to 70x on some data
• For cold or historical data
• Reclaim 93% of disks, keep data online

OLTP and hybrid columnar compression can be mixed by partition for ILM
Exadata Hybrid Columnar Compression
Warehouse Compression
• 10x average storage savings
• A 100 TB database compresses to 10 TB
• Reclaim 90 TB of disk space
• Space for 9 more '100 TB' databases
• 10x average scan improvement
• 1,000 IOPS reduced to 100 IOPS
Exadata Hybrid Columnar Compression
Archive compression
• Compression algorithm optimized for maximum storage
savings
• Benefits any application with data retention requirements
• Best approach for ILM and data archival
• Minimum storage footprint
• No need to move data to tape or less expensive disks
• Data is always online and always accessible
• Run queries against historical data (without recovering from tape)
• Update historical data
• Supports schema evolution (add/drop columns)
Exadata Hybrid Columnar Compression
Archive compression
• Optimal workload characteristics for Archive compression
• Any application (OLTP, Data Warehouse)
• Cold or historical data
• Data loaded with bulk load operations or compressed using in-
database bulk compression operations
• Minimal access and update requirements

• 15x average storage savings


• 100 TB database compresses to 6.6 TB
• Keep historical data online forever
• Up to 70x savings seen on production customer data
EHCC Syntax

Warehouse Compression Syntax:


CREATE TABLE emp (…)
COMPRESS FOR QUERY [LOW | HIGH];

Online Archival Compression Syntax:


CREATE TABLE emp (…)
COMPRESS FOR ARCHIVE [LOW | HIGH];
Exadata Hybrid Columnar Compression
Comparisons

(Charts: table size, scan time, and single-row lookup time for uncompressed, basic table compression, hybrid columnar, and pure columnar formats.)

• Hybrid Columnar Compression combines the best of row and column formats
• Best compression – matching full columnar
• Excellent scan time – 93% as good as full columnar
• Good single-row lookup – no full-columnar "cliff"
• Row format is best for workloads with updates or trickle feeds
Data Archiving Strategies

• OLTP Applications
• Table partitioning
• Heavily accessed data
• Partitions using OLTP Table Compression
• Cold or historical data
• Partitions using Online Archival Compression

• Data Warehouses
• Table partitioning
• Heavily accessed data
• Partitions using Warehouse Compression
• Cold or historical data
• Partitions using Online Archival Compression
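As a sketch of the ILM approach above (the table, column, and partition names are hypothetical), a range-partitioned table can mix compression types by partition:

CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
)
PARTITION BY RANGE (order_date) (
  PARTITION orders_2008 VALUES LESS THAN (DATE '2009-01-01')
    COMPRESS FOR ARCHIVE HIGH,   -- cold / historical data
  PARTITION orders_2009 VALUES LESS THAN (DATE '2010-01-01')
    COMPRESS FOR QUERY HIGH,     -- less heavily accessed warehouse data
  PARTITION orders_2010 VALUES LESS THAN (MAXVALUE)
    COMPRESS FOR OLTP            -- heavily accessed, actively updated data
);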
EHCC benefits
Efficient data movement
• Read/Write compressed data to disk
• Write compressed data to ASM mirrors
• Read/Write compressed data in Flash Cache
• 10x improvement for Flash price performance
• Send compressed data over Infiniband
• Write compressed data to Redo Logs
• Send compressed data to standby database
• 10x reduction in WAN bandwidth cost: makes ADG appealing for DW
• Write compressed data to Backups

EHCC benefits
Efficient queries

• Specialized columnar query processing engine runs in


Exadata Storage Server to run directly against compressed
data
• Column optimized processing of query projection and filtering
• Vector processing techniques used to fully leverage columnar format
• 10x to 100x smaller subset of qualifying data returned over
Infiniband to database server for further query processing
• Optimized single row lookups to perform efficient I/O of a
contiguous set of blocks that form a Compression Unit


EHCC in the real world


Exadata Hybrid Columnar Compression
Storage savings
• Retail
• Top Global Retailer 4x
• Scientific Data Customer (EHCC, Archive Compression)
• Top R&D customer (with PBs of data): 28x
• OLTP Customer (EHCC, Archive Compression)
• SAP R/3 Application, Top Global Retailer: 28x
• Oracle E-Business Suite, Oracle Corp.: 23x
• Custom Call Center Application, Top Telco: 15x
Exadata Hybrid Columnar Compression
Storage savings
• Financial (EHCC, Data Warehouse Compression)
• Top Financial Services 1: 11x
• Top Financial Services 2: 24x
• Top Financial Services 2: 19x
• Telco (EHCC DW Compression)
• Top Telco 1: 8x
• Top Telco 2: 14x
• Top Telco 3: 6x
• Top Telco 4: 7x
Real World DW Performance
(Leading Financial Company)

• Compression Ratios
• Query High: 11x
• Archive High: 16x

• Load Performance
• data pump loading from flat file
• 28% increase in elapsed time

• Query Performance
• 40% faster to execute 60
queries in customer workload

EHCC benefits
Table scan performance
• Table scans of EHCC data run significantly faster than
uncompressed
• Sample test run (uses Call Data Record data, 46
columns)
• Compression ratio: 14x
• Load takes 55% more time
• Table Scan runs 5.5x faster (less disk I/O)

Exadata Hybrid Columnar Compression
Estimating savings
• EHCC Compression Advisor
• Runs on any 11.2 setup (non-Exadata too)
• Given a sample of customer data, provides compression ratio
estimates
• Patch available for 11.2 (8896202)

Tips and techniques


Tips and techniques
Compression Advisor

• Too little data can reduce compression ratios


• It is best to try with a big dataset, if possible
• By default, advisor does sampling. You can specify it to use
all rows.
• Run the Advisor with the data co-located the way the customer is going to use it
• Do not perform an extra sort or partitioning step
• UNIFORM tablespaces can have unused blocks.
• Advisor cannot be used on UNIFORM tablespaces
Tips and techniques
When to use EHCC
• Designed for data warehouse workloads
• What if the customer has a lot of DML in the workload?
• EHCC can be changed per partition
• Use ILM to compress older, less frequently updated partitions
• Use ALTER TABLE MOVE when a partition has stabilized
• How can I determine if I should do ALTER TABLE MOVE?
• Use dbms_compression.get_compression_type (see the sketch below)
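A hedged sketch of checking how a given row is currently compressed (the owner, table, and ROWID are illustrative); the function returns one of the DBMS_COMPRESSION constants such as COMP_NOCOMPRESS, COMP_FOR_OLTP, or the EHCC levels:

SELECT DBMS_COMPRESSION.GET_COMPRESSION_TYPE(
         ownname => 'SH',
         tabname => 'SALES',
         row_id  => 'AAAVxJAAEAAAAfPAAA') AS comp_type
FROM   dual;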
Tips and techniques
Loading data

• 1 - 2 TB/hour compressed loads on a full Exadata rack
• EHCC load speeds are comparable to basic compression
• Loading speeds depend on the data and the compression level
• If the customer wants higher load speeds
• A high-speed load mode is available – Query Low
• EHCC can be turned off temporarily during critical loads
• Or load uncompressed and compress the partition later
• Make sure loads are direct path
• No EHCC for single-row or buffered row inserts
Tips and techniques
Loading data

• Use DBFS as a staging file system


• Check the data distribution
• If all the data is going into few partitions, speed can appear
slow
Tips and techniques
Storage savings
• Storage savings are very dependent on the data
• Can vary from 2x to 200x
• Compression ratios can be misleading when compared to other competitors
• The ratio depends on the efficiency of the non-compressed storage
• Always compare the final size of a table on disk
• If the customer wants higher storage savings
• The higher storage-saving mode Archive Low can be used
• Don't use UNIFORM tablespaces
• UNIFORM tablespaces can cause extra blocks to be allocated
Tips and techniques
Performance
• Highest benefit for I/O-bound queries
• If query is disk-bound, it can speed up by compression ratio
• CPU-bound queries may not see as much
performance improvement
• Storage saving benefits still attractive
• Most queries see speed ups somewhere in between
• Look at customer queries to see if they can be sped
up
Tips and techniques
ILM
• You can assign compression techniques based on
partitions
• For active partitions
• Advanced Compression
• Compresses data as it is updated and added
• For less active partitions
• EHCC, warehouse mode
• Better compression, little performance impact
• For historical partitions
• EHCC, archive mode
• Best compression

Storage indexes


Storage indexes
Exadata Storage Index 11.2
Transparent I/O Elimination with No Overhead

• Exadata Storage Indexes maintain summary information about table data in memory
• They store the MIN and MAX values of columns
• Typically one index entry for every MB of disk
• Disk I/Os are eliminated if the MIN and MAX can never match the "where" clause of a query
• Completely automatic and transparent

Example: with one storage region holding Min B = 1, Max B = 5 and the next holding Min B = 3, Max B = 8, the query SELECT * FROM Table WHERE B < 2 can only match rows in the first region.
Storage indexes
How they work

• Storage indexes are used to filter out data from


consideration
• Indexes help you find data, storage indexes help you filter
data
• Index values are created for 1 MB storage regions
• Each storage region's index can have its own set of columns
• Based on a heuristic evaluation of the data distribution
• Minimum and maximum values are kept for multiple columns in each storage region
Exadata Storage Indexes
Sample Table SALES

(Diagram: the SALES table – Order_date, Ship_date, Cust_ID, Prod_ID, Amount – split into two data chunks, each with an in-memory synopsis of MIN and MAX values.)

Data chunk #1 synopsis: Order_date 03-SEP-2009 to 03-SEP-2009; Ship_date 05-SEP-2009 to 07-OCT-2009; Cust_ID 10075 to 20098; Prod_ID 20010 to 32932; Amount 10,000 to 20,000
Data chunk #2 synopsis: Order_date 03-SEP-2009 to 03-SEP-2009; Ship_date 01-OCT-2009 to 03-NOV-2009; Cust_ID 10000 to 80300; Prod_ID 2030 to 30000; Amount 10,000 to 40,000

• A synopsis for frequently used columns is automatically collected
• Stored in memory within the Exadata Storage Server
Exadata Storage Indexes
Sample Table SALES

WHERE ship_date between '01-SEP-2009' and '30-SEP-2009'

• Data chunk #1 (Ship_date MIN 05-SEP-2009, MAX 07-OCT-2009) may contain matching rows and is scanned
• Data chunk #2 (Ship_date MIN 01-OCT-2009, MAX 03-NOV-2009) can never match and is eliminated
• The Storage Index eliminates data chunks of no interest
• Provides 'partition-pruning'-like functionality
Storage indexes
Monitoring

• I/O savings can be monitored from v$sysstat using the statistic "cell physical IO bytes saved by storage index", as in the sketch below
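A minimal sketch of checking the statistic for the current session (v$mystat) – the same join against v$sysstat gives the instance-wide value:

SELECT n.name, s.value
FROM   v$statname n, v$mystat s
WHERE  n.statistic# = s.statistic#
AND    n.name = 'cell physical IO bytes saved by storage index';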
Storage indexes
Conditions

• Works with Smart Scan queries


• Predicate selection uses storage indexes if appropriate
• Works with <, <=, =, !=, >=, >, IS [NOT] NULL
• Storage index works with joins based on Bloom filters
• Works with uncompressed tables, OLTP
compression, EHCC, tablespace encryption
Storage indexes
Conditions - continued

• NLS columns and LOBs will not be used in a storage


index
• Writes for Hybrid Columnar Compression and
tablespace encryption invalidate storage region index
• Non-configurable
Storage indexes
“Maintenance”

• Storage indexes lost in the event of a cell reboot


• Portions of SI may be invalidated as a result of write
operations
• Rebuilt as Smart Scan queries touch storage regions
• Heuristically adjusted in response to distribution of
predicate columns in Smart Scan queries
• Think of storage index maintenance as cyclical
• Loading data in sorted order can result in good use of
storage indexes
Storage Index with partitions
Example

Orders table (Order_Date is the partitioning column):
Order#   Order_Date   Ship_Date   Item
1        2007         2007
2        2008         2008
3        2009         2009

• Queries on Ship_Date do not benefit from Order_Date partitioning
• However, Ship_Date and Order# are highly correlated with Order_Date
• e.g. ship dates are usually near order dates and are never less
• The storage index provides partition-pruning-like performance for queries on Ship_Date and Order#
• Takes advantage of the ordering created by partitioning or sorted loading

Resource Manager

Module Agenda

• Resource Manager overview

• Contending CPU workloads


• Parallel execution workload management
• Database consolidation
• Server consolidation

Resource Manager
overview
Resource Manager
Overview

• Allows you to allocate


over-subscribed
resources
• Key tool for guaranteeing
SLAs
• Works with Oracle
databases since 8i
• Works transparently,
based on session ID
Resource Manager
Implementation

1. Group sessions with similar performance objectives into


Consumer Groups
2. Allocate resources to consumer groups using Resource
Plans
3. Enable Resource Plan
Creating Consumer Groups
• Create Consumer Groups for each type of workload, e.g.
• OLTP consumer group
• Reports consumer group
• Low-Priority consumer group
• Create rules to dynamically map sessions to consumer groups, based
on session attributes

Mapping Rules Consumer Groups

OLTP
service = 'Customer_Service'
client program name = 'Siebel Call Center'
Oracle username = 'Mark Marketer'
Reports
module name = 'AdHoc'
query has been running > 1 hour
estimated execution time of query > 1 hour
Low-Priority
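A minimal PL/SQL sketch of these two steps using DBMS_RESOURCE_MANAGER (group names and mapping values are illustrative, not taken from the original example):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  -- Consumer groups for each type of workload
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('OLTP',    'interactive order entry');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('REPORTS', 'reporting sessions');
  -- Rules that map sessions to consumer groups based on session attributes
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'CUSTOMER_SERVICE', 'OLTP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    DBMS_RESOURCE_MANAGER.MODULE_NAME, 'ADHOC', 'REPORTS');
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/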
Creating Resource Plans

Priority-Based Plan
  Priority 1: OLTP
  Priority 2: Reports
  Priority 3: Ad-Hoc

Ratio-Based Plan
  OLTP 60%, Reports 30%, Low-Priority 10%

Hybrid Plan
  Level 1: OLTP 90%
  Level 2: Reports 60%, Low-Priority 40%
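A hedged PL/SQL sketch of the hybrid plan above (it assumes the OLTP, REPORTS, and LOW_PRIORITY consumer groups already exist; names and percentages are illustrative):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('HYBRID_PLAN', 'OLTP first, reports next');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'OLTP',
    'level 1', mgmt_p1 => 90);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'REPORTS',
    'level 2', mgmt_p2 => 60);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'LOW_PRIORITY',
    'level 2', mgmt_p2 => 40);
  -- Every plan needs a directive for OTHER_GROUPS (unmapped sessions)
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'OTHER_GROUPS',
    'catch-all', mgmt_p3 => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/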
Enable Resource Management

• Manually
• Set resource_manager_plan parameter
• Automatically
• Set resource plan for a scheduler window
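A sketch of both approaches (the plan name carries over from the example above; the window name and schedule are illustrative):

-- Manually
ALTER SYSTEM SET resource_manager_plan = 'HYBRID_PLAN';

-- Automatically, by attaching the plan to a Scheduler window
BEGIN
  DBMS_SCHEDULER.CREATE_WINDOW(
    window_name     => 'DAYTIME_WINDOW',
    resource_plan   => 'HYBRID_PLAN',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=8',
    duration        => INTERVAL '10' HOUR);
END;
/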
<Insert Picture Here>

Contending CPU
workloads
Resource Manager
Contending CPU workloads

When a database host has insufficient CPU for all workloads, the workloads
will compete for CPU. Performance of all workloads will degrade!

What if you cannot tolerate performance degradations for certain workloads?

[Chart: CPU usage when running OLTP only, Reports only, and ETL + Reports]
Resource Manager
Contending CPU workloads
With Resource Manager, you control how CPU resources should be allocated.

[Chart: CPU usage for OLTP only, Reports only, and two OLTP + Reports runs
with Resource Manager enabled, one with OLTP prioritized and one with
Reports prioritized]
Resource Manager
CPU management details

• Very fine-grained scheduling


• Resource Manager schedules at a 100 ms quantum
• Low-priority session yields to a high-priority session in ~1
quantum
• Background processes are not managed
• Backgrounds are either high-priority or not CPU-intensive
• Maximize CPU utilization
• If one consumer group doesn't use its allocation, it is
redistributed to other consumer groups based on the resource
plan
<Insert Picture Here>

Parallel execution
workload management
Parallel execution
Potential problems

• Parallel servers are a limited resource


• Limit specified by parallel_max_servers
• Too many concurrent parallel statements causes thrashing
• When there are no more parallel servers
• Critical statements may run serially
• When parallel servers free up, no way to boost DOP of
running statements
• Non-ideal solutions
• Size system for maximum load, inefficient
• Manually schedule large queries during off hours
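The relevant limits can be checked from the instance parameters, for example:

SQL> SELECT name, value
  2  FROM V$PARAMETER
  3  WHERE name IN ('parallel_max_servers', 'parallel_servers_target');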
Parallel Statement Queuing

• Introduced in 11.2.0.1
• Goals:
1. Run enough parallel statements to fully utilize system
resources
2. Ensure appropriate degree of parallelism for all statements
• Enable by setting parallel_degree_policy = AUTO
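For example, system-wide or per session for testing:

ALTER SYSTEM SET parallel_degree_policy = AUTO;
-- or, to experiment in a single session only:
ALTER SESSION SET parallel_degree_policy = AUTO;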
Parallel Statement Queuing

• A statement is parsed and Oracle automatically determines its DOP
• If enough parallel servers are available, the statement executes
  immediately
• If not enough parallel servers are available, the statement is placed on
  a FIFO queue
• When the required number of parallel servers becomes available, the
  statement at the head of the queue is dequeued and executed

[Diagram: SQL statements with DOPs such as 8, 16, 32, 64, and 128 flowing
through the FIFO queue]
Parallel Statement Queuing
With Resource Manager
• One Consumer Group can flood the system and
queue with queries
• Critical queries are forced to queue
• Critical queries are stuck behind batched queries
 Limit the DOP for queries from a Consumer Group
 Limit the percentage of parallel servers a Consumer
Group can use
• Reserves parallel servers for critical parallel queries
• Coming soon…

For example, parallel queries from the Batch consumer


group can only use 50% of the parallel servers
Parallel Statement Queuing
With Resource Manager

• DBAs want to control the order that parallel queries


are dequeued
• Prioritize tactical queries over batch and ad-hoc queries
• Impose a user-defined policy for ordering queued parallel
statements

• Coming soon…
 Separate queues per Consumer Group
 Resource Plan specifies which queues parallel
statements are issued next
Parallel Statement Queuing
With Resource Manager
Current Resource Plan:
  Priority 1: Tactical
  Priority 2, 70%: Normal
  Priority 2, 30%: Ad-Hoc

[Diagram: separate parallel statement queues for the Tactical, Normal, and
Ad-Hoc consumer groups; the next parallel query to run is selected from
these queues according to the resource plan]
Test Results: 2 Concurrent Workloads

Without Resource Manager:
  Critical analytics: 150% degradation
  Non-critical reporting: 9% degradation

With Resource Manager:
  Critical analytics: 16% degradation
  Non-critical reporting: 10% degradation
<Insert Picture Here>

Database consolidation
Database consolidation challenges
Service levels

• Ensuring service levels for all applications
  • A surge in one application's workload should not affect another's
  • Need a minimum, guaranteed amount of CPU and I/O per application

Use CPU Resource Manager to allocate CPU


Use Exadata I/O Resource Manager to allocate I/O
Database consolidation challenges
Consistent performance

• Ensuring consistent performance
  • An application's performance should be consistent, even if all other
    applications are idle
  • Need a way to limit CPU and I/O utilization!

Specify maximum CPU and I/O utilization per


Consumer Group in Resource Plan
• CPU utilization limit and I/O utilization limit – new in 11.2.0.1
Maximum Utilization Limit
• The "max_utilization_limit" directive limits an application's CPU and I/O
  utilization

DB Consolidation Plan #1                 DB Consolidation Plan #2

        Resource     Maximum                       Maximum
        Allocation   Utilization Limit             Utilization Limit
App 1   50%          50%                  App 1    50%
App 2   20%          50%                  App 2    20%
App 3   20%          50%                  App 3    20%
App 4   10%          50%                  App 4    10%

Plan #1 specifies minimum and maximum CPU and I/O utilization limits;
Plan #2 specifies maximum CPU and I/O utilization limits only.
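A hedged PL/SQL sketch of a Plan #1-style directive (the APP1 consumer group is assumed to exist; max_utilization_limit is the directive named on this slide):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('DB_CONSOLIDATION_PLAN', 'per-application caps');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('DB_CONSOLIDATION_PLAN', 'APP1',
    'min 50%, capped at 50%', mgmt_p1 => 50, max_utilization_limit => 50);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('DB_CONSOLIDATION_PLAN', 'OTHER_GROUPS',
    'everything else', mgmt_p1 => 50, max_utilization_limit => 50);
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/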
Test Results: CPU Utilization Limit
Setting limit to 25%, 50%, and 75%
Workload is a mix of
OLTP transactions,
parallel queries, and
DMLs from Oracle
Financials
Test Scenario: I/O Utilization Limit

[Chart: disk utilization over 14 minutes for a TPC-H workload with no I/O
utilization limit and with limits of 75%, 50%, and 25%]
<Insert Picture Here>

Server consolidation
Server Consolidation
Challenges
• Common theme in today's data centers
• Many test, development, and small production databases
• Low loads
• Not critical
• Cannot fully utilize today's powerful servers!
• Solution – server consolidation
• Run multiple database instances on the same server
• But there may be problems
• Contention for CPU, memory, and I/O
• Unexpected workload surges on one instance can wreak
havoc on other databases
Server consolidation challenge
Instance Caging

• Limits the CPU consumption of a database instance


• Advantages over virtualization
• No I/O overhead
• No new license
• No sys-admin overhead
• Advantages over O/S workload managers
• Available on all platforms
• Easy to configure
Instance Caging
Configuration

• Just 2 steps:
1. Set the "cpu_count" parameter
   • Maximum number of CPUs the instance can use at any time
2. Set the "resource_manager_plan" parameter
   • Enable any CPU resource plan
   • E.g. the out-of-box plan "DEFAULT_PLAN"
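For example, to cage an instance to 2 CPUs:

ALTER SYSTEM SET cpu_count = 2;
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';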
Instance caging
Over-provisioning approach
• Scenario
  • Multiple database instances sharing a server
  • Instances are typically well-behaved
  • Server's CPUs not typically fully utilized
• Use Instance Caging to over-provision
  • Limit each instance's CPU usage

[Diagram: instances A-D with cpu_counts summing to 12 on a server with a
total of 4 CPUs]
Instance caging
Partitioning approach
• Scenario
  • Multiple database instances sharing a server
  • Performance-critical databases
  • Cannot afford any interference from each other
• Use Instance Caging to partition

[Diagram: instances A-D with cpu_counts summing to 32 on a server with a
total of 32 CPUs]
Instance Caging
Results
• Swingbench OLTP application
• 4 CPU Linux server
• Oracle 11gR2
• Instance Caging enabled with 2 CPUs

[Chart: Swingbench CPU utilization (user, sys, idle) with Instance Caging
off and with Instance Caging on]
Instance Caging
Results

• 2 sysbench applications
• 6 CPU Linux server
• Oracle 11gR2
• Instance Caging enabled to partition the server

[Chart: transactions per second for Instance 1 and Instance 2 as the
cpu_count split <Instance 1>,<Instance 2> varies from 0,6 through 6,0,
compared with running Instance 1 only with no Instance Caging]
Exadata I/O Resource Manager

Need to limit the disk utilization of a database?

Maximum disk utilization limits for I/O:


 Coming soon!
 Provides predictable, consistent performance
 Configure via inter-database resource manager plan
 Specifies the maximum disk utilization for each
database
<Insert Picture Here>

I/O Resource Manager

Module Agenda

• Shared storage issues <Insert Picture Here>

• I/O Resource Manager overview


• IORM resource management
• IORM examples
• IORM at work
• Enabling IORM
<Insert Picture Here>

Shared storage issues


Issues with shared storage

• Storage can be shared by multiple types of workloads


and multiple databases
• Sharing lowers administration costs
• Sharing leads to more efficient usage of storage
• But, workloads may not happily coexist
• ETL jobs interfere with DSS query performance
• One production data warehouse can interfere with another
• How do you gain benefits of shared storage without
introducing inconsistent performance?
Issues with shared storage
Traditional Solutions
• Over-provision the storage system
• Configure your storage based on the maximum expected load
• Wasteful and expensive
• Place performance-critical databases on dedicated
storage
• Still need to ensure that administrative tasks like backups or
data loads don‘t interfere
• High administrative costs and expensive storage
• Schedule non-critical tasks at off-peak hours
• Cumbersome and prone to problems
Issues with shared storage
Exadata solution
• Efficient utilization of I/O bandwidth
• Goal is to have 100% utilization
• Consistent performance
• Goal is to avoid 100+% utilization
• Prioritization of workloads
• Goal is high priority workloads get enough bandwidth
<Insert Picture Here>

I/O Resource Manager


overview
Sample Exadata Configuration
[Diagram: a single-instance database and a RAC database connected through
an InfiniBand switch/network to three Exadata Cells]

• Databases are deployed across multiple Exadata cells


• Database enhanced to work in cooperation with Exadata
intelligent storage
• ASM implements striping and mirroring for Exadata
• Exadata Storage Servers can support multiple databases
I/O Bandwidth Limits
Extreme consequences

• Each Exadata Cell has an I/O bandwidth limit
• If the databases issue I/O over this limit, performance will degrade

[Diagram: a production database running Ad-Hoc Queries (desired bandwidth
500 MB/s) and Critical Reports (desired bandwidth 1000 MB/s) and a
development database running Reports (desired bandwidth 800 MB/s) share
storage with an available I/O bandwidth of 1200 MB/s; total desired
bandwidth is 500 + 1000 + 800 = 2300 MB/s]
Managing the I/O Bandwidth with IORM

• I/O Resource Manager provides a way to manage how multiple workloads and
  databases share the available I/O bandwidth

[Diagram: with IORM, the production database's Ad-Hoc Queries are throttled
from 500 MB/s to 100 MB/s, Critical Reports keep 1000 MB/s, and the
development database's Reports are throttled from 800 MB/s to 100 MB/s, so
the actual bandwidth of 100 + 1000 + 100 = 1200 MB/s matches the available
I/O bandwidth of 1200 MB/s]
When Does I/O Resource Manager
Help the Most?
• Conflicting Workloads
• Multiple consumer groups in a Database (e.g. ad hoc queries,
critical reports)
• Multiple databases (e.g. production, test)
• Concurrent database administration - backups, ETL, file
creation
• I/O is a bottleneck
• Significant proportion of the wait events are for I/O
• Any data warehouse workload!
<Insert Picture Here>

IORM resource
management
IORM Possible Scenarios
[Diagram: I/O Resource Manager scenarios. Inside one database, a mixed
workload is handled by intra-database resource management. Across multiple
databases, dueling databases are handled by inter-database resource
management and cooperative databases by category resource management.]
IORM Resource Management
Intra-database

• Used to manage multiple workloads in a single database


• Enabled at the database level by Database Resource Manager
and resource plans
• Group sessions with similar performance objectives into
consumer groups
• Create a resource plan that specifies how I/O requests should be
prioritized
Creating Consumer Groups
• Create consumer groups for each type of workload, e.g.
• Priority DSS consumer group
• DSS consumer group
• Maintenance consumer group
• Create rules to dynamically map sessions to consumer groups,
based on session attributes
Consumer Groups
Mapping Rules

Priority DSS
service = 'PRIORITY'
Oracle username = 'LARRY'
Oracle username = 'DEV'
client program name = 'ETL'
DSS
function = 'BACKUP'
query has been running > 1 hour
Maintenance
Creating Resource Plans

Priority-Based Plan
  Priority 1: Priority DSS
  Priority 2: DSS
  Priority 3: Maintenance

Ratio-Based Plan
  Priority DSS 60%, DSS 30%, Maintenance 10%

Hybrid Plan
  Level 1: Priority DSS 90%
  Level 2: DSS 100%
  Maintenance: 5%
Configuring Consumer Groups & Plans

• Consumer groups and plans are configured on the database


• Configure using dbms_resource_manager PL/SQL package
• Configure using Resource Manager section in Enterprise
Manager
• Plans are used for both CPU and I/O resource management
• Multiple plans can be defined
• E.g. daytime plan, evening plan, emergency maintenance
plan
• Set plans using the "resource_manager_plan" parameter
• Only one plan can be enabled at any time
• Use the Job Scheduler to automatically enable plans
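The configured plans and the currently enabled plan can be checked from the data dictionary, for example:

SQL> SELECT plan, comments FROM DBA_RSRC_PLANS;
SQL> SELECT value FROM V$PARAMETER WHERE name = 'resource_manager_plan';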
IORM Resource Management
Inter-database
• Can I/O Resource Manager allow multiple databases to effectively
share Exadata storage with the following requirements?
• Partition resources among multiple production databases
• Don't allow standby, development, and test databases to impact
  production databases

[Diagram: a Sales Data Warehouse, a Finance Data Warehouse, a Customer
Service Standby Database, a Sales Development Database, and a Sales Test
Database all sharing Exadata Storage]
IORM Resource Management
Inter-database

• Inter-database plan allocates resources for each database


• Divides resources among production databases
• Allocates unconsumed resources to test databases
• Configure and enable inter-database plans via CellCLI on each
  Exadata Server
• Can have multiple levels of plan
• Each sublevel uses resources left over from superior level
Exadata Inter-database Plans
Usage
• Only inter-database plan configured
– IORM picks a database I/O request using the inter-database
plan
• Inter-database plans can be configured along with intra-database plans
  – Inter-database plans manage I/O among databases
  – Intra-database plans manage I/O among consumer groups within a database
  – IORM first picks a database I/O request using the inter-database plan
  – Then picks a consumer group from that database, using its
    intra-database plan
Exadata Inter-database plans
[Diagram: on the Exadata cell, the Sales and Finance data warehouses each
have a Priority DSS consumer group queue and a DSS consumer group queue;
the I/O Resource Manager selects requests from these queues to fill the
cell's disk queue]
IORM Resource Management
Categories
• Categories are used to group consumer groups, based on the nature of the
  workload
• Goal: workload priority should depend on its type, not just which
  database it's running on

Database            Consumer Group   Workload Type
Sales Production    Priority DSS     Critical
                    DSS              Somewhat critical
                    Maintenance      Not critical
Finance Production  Priority DSS     Critical
                    DSS              Somewhat critical
                    Maintenance      Not critical
Sales Development   Priority DSS     Not critical
                    DSS              Not critical
Categories
Grouping consumer groups

• Category plan allocates resources for each Category


• Category is an attribute of each Consumer Group
• Category Plans are enabled and configured via CellCLI on each
  Exadata Cell

Priority-Based Category Plan


Priority 1: Critical
Priority 2: Somewhat Critical
Priority 3: Not Critical
Categories
With other plans

• First, categories (if present) allocate I/O requests for cell


• Second, inter-database plans (if present) allocate I/O
requests for multiple databases per cell
• Finally, intra-database plans (if present) allocate I/O for
consumer groups within a cell
IORM Resource Management
Levels

• Levels are a way to give priority to some consumer


groups over others
• Each lower level gets to allocate I/O resources that were not
allocated by the previous level
• You can specify up to 8 levels of resource allocation
• Each level assigns percentages to consumer groups,
databases or categories
<Insert Picture Here>

IORM examples
IORM Possibilities

• Give 70% of my storage performance capacity to Data Warehouse


finance, 30% to Data Warehouse sales
 Enable an Inter-Database plan
• Prioritize my Production Databases over my Test and Development
Databases
 Enable an Inter-Database plan
• Prioritize my OLTP workloads over my maintenance workloads
 Enable a Category plan
• For my Standby Databases, prioritize apply I/O‘s over read-only queries
 IORM does this automatically
• Always prioritize Control File and other critical I/O‘s
 IORM does this automatically
• Automatically pace ASM rebalances and RMAN jobs
 IORM does this automatically
Scenario 1: OLTP vs Report

• Database has 2 workloads


• Critical OLTP workload: Order Entry application
• Non-critical workload: Report, based on Order Items table
• Your goal: protect the performance of critical OLTP workload
• Solution: use a priority-based resource plan

Priority-Based Plan
Priority 1: Interactive
Priority 2: Batch
Scenario 1: OLTP vs Report
Results
• I/O Resource Manager boosts OLTP performance by 408%!
• Report has small effect on OLTP performance (8%)
• Report data uses significant disk space, resulting in longer seek times
• Storage system is fully utilized
• OLTP workload: 376 IOPS per disk
• Report workload: 5 MBps per disk
[Chart: OLTP performance in transactions per second for OLTP only, OLTP &
Report without IORM, and OLTP & Report with IORM; more than a 4x
improvement with IORM]
Scenario 2: DSS Query vs DSS Query

• Two different Data Warehouses are running DSS queries


• Production Data Warehouse: Critical
• Development Data Warehouse: Non-Critical
• Our goal: protect the performance of Critical DSS query
• Solution: use an Inter-Database plan to prioritize production data
warehouse

Priority-Based Plan
Priority 1: Production Data Warehouse
Priority 2: Development Data Warehouse
Scenario 2: DSS Query vs DSS Query
Results
• I/O Resource Manager boosts critical query time by 41%
• Non-critical query has small effect on critical query (9%)
• Report data uses significant disk space, resulting in longer seek times
• Running queries together is 17% more efficient than running them
serially

[Chart: critical query elapsed time in seconds for the critical query
alone, both queries without IORM, and both queries with IORM; a 41%
improvement with IORM]
<Insert Picture Here>

IORM at work
How IORM operates

• Resource limits only take effect when I/O bandwidth is


100% utilized
• Any resource group or category can access
bandwidth until I/O bandwidth saturation is reached
• Once I/O bandwidth is taken, I/O requests are queued
according to the IORM plan(s)
• Sub-plans allocate the resources given to the owner of the plan
I/O Scheduling
Traditional way
• With traditional storage, I/O schedulers are black boxes
• You cannot influence their behavior!
• I/O requests are processed in FIFO order
• Some reordering may be done to improve disk efficiency
• Elevator algorithms, deadline scheduling

[Diagram: I/O requests from high-priority and low-priority workloads flow
from the RDBMS server to traditional storage, where they sit in a single
disk queue in arrival order (H L H L L L)]
I/O Scheduling
Exadata way
• Exadata executes requests, based on the user‘s prioritization
scheme
• Exadata may internally queue I/O requests to prevent a low-
priority intensive workload from flooding the disk

[Diagram: on Exadata, I/O requests from the RDBMS are placed in separate
high-priority and low-priority workload queues; the I/O Resource Manager
decides which queue to service, so the disk queue holds mostly
high-priority requests (H L H H) while low-priority requests (L L L L)
wait]
IORM Resource Plans
• I/O Resource Manager issues enough I/O requests to the disk to
keep it busy and efficient
• One queue for each consumer group
• When IORM is ready to issue the next request, it uses the
Resource Plan to select a consumer group queue
• Percentage for each queue determined by overall resource plan

[Diagram: I/O requests from the RDBMS are queued per consumer group
(a Priority DSS queue and a DSS queue) on the cell; IORM uses the resource
plan to choose which consumer group queue supplies the next request for
the disk queue]
IORM
Background I/Os

• Redo and control file I/Os always take top priority


• DBWR writes take priority specified in plan
IORM allocations
Categories, inter-database, intra-database

Category     Inter-database plan   Intra-database plan   Share of Cell 1
High (70%)   Database 1 (60%)      CG 1 – 50%            21%
             = 42% of the cell     CG 2 – 50%            21%
             Database 2 (40%)      CG 3 – 50%            14%
             = 28% of the cell     CG 4 – 50%            14%
Low (30%)    Database 1 (60%)      CG 5 – 75%            13.5%
             = 18% of the cell     CG 6 – 25%            4.5%
             Database 2 (40%)      CG 7 – 80%            9.6%
             = 12% of the cell     CG 8 – 20%            2.4%

The category plan is applied first, then the inter-database plan, then each
database's intra-database plan; e.g. 70% x 60% x 50% = 21% of Cell 1.
<Insert Picture Here>

Enabling IORM
Enabling IORM
Steps
• Define consumer groups with DBRM
• You must assign sessions to consumer groups, either manually or
through consumer group mapping rules
• Create intra-database plan with Database Resource
Manager
• [Assign categories to consumer groups with DBRM]
• [Create inter-database plan with CellCLI]
• Enable plan with RESOURCE_MANAGER_PLAN
parameter
• Enable IORMPLAN on all cells
• DBPLAN and CATPLAN
Enabling IORM

• You can switch Database Resource Manager IORM


plans at runtime
• IORM plans persist through cell reboots
<Insert Picture Here>

Flash Cache

Module Agenda

• Flash Cache basics <Insert Picture Here>

• Configuring Flash Cache


• Flash Cache usage
• Flash Cache at work
• Flash Cache monitoring
• Flash Cache troubleshooting
<Insert Picture Here>

Flash Cache basics


Why Flash?

• Disk drives hold vast amounts of data


• But are limited to a few hundred I/Os per second

• Flash technology holds much less data


• But can run tens of thousands of I/Os per second

• Exadata v2+ solution:


• Keep most data on disk for low cost
• Transparently move hot data to flash
Sun Exadata Storage Server

[Photo callouts: dual-redundant, hot-swappable power supplies; 24 GB DRAM
(6 x 4 GB); ILOM; 12 x 3.5" disk drives; disk controller HBA with 512M BBC;
2 quad-core Intel® Xeon® E5540 processors; dual-port InfiniBand QDR
(40 Gb/s) card; 4 x 96 GB Sun Flash PCIe cards]
Sun Flash Accelerator F20
• 96GB Storage Capacity
• 4 x 24GB Flash modules/DOM
• 6GB reserved for failures
– Advanced Wear Leveling, Page Erase
Management, Performance Pipelining, Bad Block
Mapping
• x8 PCIe card
• Avoid disk controller limitations
• Super Capacitor backup
• Built-in write-back cache
• Measured end-to-end performance
• 3.6GB/sec/cell
• 75,000 read IOPs/cell
Smart Flash Cache benefits
Performance
• 50GB/s throughput
• 1 million IOPs
• Use PCIe cards instead of SSDs to avoid slow disk interface
• Exadata storage, InfiniBand and PCIe can drive higher levels of
performance
• Traditional storage arrays and SANs already have internal
bottlenecks which prevent them from exploiting the full spinning
disk performance and hence are unable to leverage the higher
performance of flash technology
Smart Flash Cache benefits
Capacity
• Linearly scalable – no bottlenecks as you add more
storage
• Efficient compression increases effective performance
and capacity by up to 10X
Smart Flash Cache benefits
Smart caching
• Integrated database and Exadata Storage Server software ensures that
  only frequently accessed data is cached
  • Automatically skips caching of data that will not be frequently
    accessed or that will not fit in the cache
• Backups, mirrored copies, ASM rebalance, Data Pump, etc.
• Database awareness enables caching only data likely to be
accessed again
• User can fine-tune caching policies online
• Hardware flash cannot distinguish between relevant
database data and other data
• Much lower cache efficiency
• Much higher cost
<Insert Picture Here>

Configuring Flash Cache


Flash Cache
Organization

• 4 x 24 GB flash memory modules per card
• 4 cards per cell
• 384 GB flash memory per cell – 16 flash cell disks
Flash Cache
Management
• Managed using MS CellCLI command tool
• By default, automatically created at cell creation
• CellCLI> CREATE CELL <Name> …
• Uses all available flash space by default
• Can be dropped at any time
• CellCLI> DROP FLASHCACHE
• Can be re-created at any time
• CellCLI> CREATE FLASHCACHE ALL [SIZE=…]
Flash Cache
Usage

• Flash-based cell disks can be used for


• Smart Flash Cache
• Uses all available space by default
• Managed automatically for maximum efficiency
• Flash-based grid disks
• Premium persistent DB storage
• Requires deliberate planning for efficient usage
Flash Cache
Flash-based grid-disks

[Diagram: flash-based cell disks are partitioned into grid disks (Grid
Disk 1 ... Grid Disk n), which are presented to ASM disk groups]
Flash Cache
Creating grid disks
• Flash-based cell disks and grid disks
CellCLI> LIST CELLDISK DETAIL
name: FD_00_cell01
diskType: FlashDisk
. . .
name: CD_00_cell01
diskType: HardDisk
. . .

CellCLI> CREATE GRIDDISK ALL FLASHDISK -
         PREFIX='FAST', SIZE=10G
GridDisk FAST_FD_00_cell01 successfully created
GridDisk FAST_FD_01_cell01 successfully created
. . .
<Insert Picture Here>

Flash Cache usage


Flash Cache usage
Prioritization
• Prioritization levels
• DEFAULT
• KEEP
• NONE
• Assigned to table, index, partition or LOB column
• Can be modified with an ALTER statement
Flash Cache usage
Prioritization syntax

CREATE TABLE pt (c1 number)


PARTITION BY RANGE(c1)
(PARTITION p1 VALUES LESS THAN (100)
STORAGE (CELL_FLASH_CACHE DEFAULT),
PARTITION p2 VALUES LESS THAN (200)
STORAGE (CELL_FLASH_CACHE KEEP));

ALTER INDEX tkbi STORAGE (CELL_FLASH_CACHE NONE);


Prioritizing Flash Cache usage
KEEP option
• Impact of KEEP objects
• Cached more aggressively
• Cannot be pushed out by ‗default‘ objects
• 80% upper limit on KEEP cache size
• Do not add more data than KEEP can hold at one time
• Keep blocks are automatically ‗un-pinned‘ if
• Object is dropped, shrunk, or truncated
• Object is not accessed on the cell within 48 hours
• Block is not accessed on the cell within 24 hours
• Downgraded to ‗DEFAULT‘ behavior
• Changing priority from KEEP to NONE marks blocks in
cache as DEFAULT
<Insert Picture Here>

Flash Cache at work


Flash Cache at work
Database server prep

• SQL statement is optimized


• SQL statement is sent to Exadata Storage Server
(CellSRV), with Flash Cache prioritization for objects

[Diagram: the database server sends the statement to CellSRV on the cell,
which manages the Flash Cache and the disks]
Flash Cache at work
Read operations

• Checks to see if Smart Scan candidate


• If no, or if object has KEEP attribute
• Checks to see if object is in the Flash Cache
• Else, go to disk
• For some operations, may go to Flash Cache and
disk, increasing overall bandwidth

[Diagram: for reads, CellSRV decides whether to serve the request from the
Flash Cache or from disk]
Flash Cache at work
Write operations

• Writes directly to disk


• Acknowledge write to database server
• Does not interfere with speed of write operations

[Diagram: for writes, CellSRV writes directly to disk and acknowledges the
write to the database server]
Flash Cache at work
Post operation (read or write)

• Checks to see if data should be cached


• Mirrored I/Os, log writes, control file writes, etc., never cached
• If block not in Flash Cache
• Checks to see Flash Cache attribute
• KEEP – store in Flash Cache
• NONE – do not store in Flash Cache
• DEFAULT – if read operation and small I/O, store in Flash
Cache
[Diagram: after the operation, CellSRV decides whether the data should be
stored in the Flash Cache]
Flash Cache at work
Large I/Os not cached

• Flash Cache improves response time for small I/Os


• Flash Cache increases bandwidth for large I/Os
• No improvement for response time for large I/Os with Flash
Cache
• CellSRV will use bandwidth of both disk reads and Flash
Cache when appropriate, increasing overall bandwidth
• Usage patterns determine if queries require increased
bandwidth for large I/Os
<Insert Picture Here>

Flash Cache monitoring


Flash Cache monitoring
MS Metrics
• Get general information about Smart Flash Cache

CellCLI> LIST FLASHCACHE DETAIL

name: cell01_FLASHCACHE
cellDisk: FD_00_cell01,FD_01_cell01
. . .
FD_14_cell01,FD_15_cell01
creationTime: 2009-10-19T17:18:35-07:00
id: b79b3376-7b89-4de8-8051-6eefc
size: 365.25G
status: normal
Flash Cache monitoring
MS Metrics
• Get overall statistics for Smart Flash Cache on a Cell

CellCLI> LIST METRICCURRENT WHERE -
         objectType='FLASHCACHE'
FC_BY_USED           72119 MB
FC_IO_RQ_R           55395828 IO requests
FC_IO_RQ_R_MISS      123184 IO requests
...

CellCLI> LIST METRICDEFINITION FC_BY.*_USED DETAIL


name: FC_BY_USED
description: “Megabytes used on FlashCache”

name: FC_BYKEEP_USED
description: “Megabytes used for
keep objects on FlashCache"
Flash Cache monitoring
Finding if an object is cached
• Cell-level caching statistics for a DB object

SQL> SELECT object_id FROM DBA_OBJECTS


2 WHERE object_name='EMP';
OBJECT_ID
---------
57435
CellCLI> LIST FLASHCACHECONTENT
WHERE objectNumber=57435 DETAIL
cachedKeepSize: 0
cachedSize: 495438874
dbID: 70052
hitCount: 415483
missCount: 2059
objectNumber: 57435
tableSpaceNumber: 1
Flash Cache monitoring
Statistics and wait events
• Use standard Oracle tools
• AWR
• Enterprise Monitor
• End-to-End
• V$SYSSTATS statistics
• cell flash cache read hits
• physical read total bytes optimized
• cell physical IO bytes saved by storage index
• V$SQL
• OPTIMIZED_PHY_READ_REQUESTS
Flash Cache monitoring
Database level
• System statistics
SQL> SELECT name, value FROM V$SYSSTAT WHERE
2 NAME IN ('physical read total IO requests',
3 'cell flash cache read hits');
NAME VALUE
physical read total IO requests 15673
cell flash cache read hits 14664

• AWR report

Segments by UnOptimized Reads


Tablespc UnOptimized
Name Object Type Reads %Total
CUST_0 CUST TABLE 7,322,866 31.95
IORDL_0 IORDL INDEX 3,787,324 16.52
Flash Response monitoring
Database level
• Segment Statistics
SQL> SELECT object_id FROM DBA_OBJECTS
2 WHERE object_name='EMP';
OBJECT_ID
---------
57435
SQL> SELECT statistic_name, value
  2  FROM V$SEGMENT_STATISTICS
  3  WHERE dataobj#= 57435 AND ts#=5 AND
  4  statistic_name='optimized physical reads';

STATISTIC_NAME VALUE
------------------------ ------
optimized physical reads 743502
Flash Response monitoring
Mapping cell disks

CellCLI> LIST LUN WHERE cellDisk='FD_00_cell03' DETAIL


name: 1_1
cellDisk: FD_00_cell03
deviceName: /dev/sdn
diskType: FlashDisk
physicalDrives: [9:0:2:0]

CellCLI> LIST PHYSICALDISK '[9:0:2:0]' DETAIL


name: [9:0:2:0]
physicalFirmware: D20R
slotNumber: "PCI Slot: 1; FDOM: 1"
<Insert Picture Here>

Flash Cache
troubleshooting
Data integrity
Protection
• Flash Cache is less stable than disk
• Flash Cache includes read-cache-verification
• A few 'check bytes' are stored in memory for every 4 KB of data written
  to flash
• During flash reads the 'check bytes' are verified
• If verification fails, data is read from disk
• Checking for Flash Cache errors
• CellCLI> LIST METRICCURRENT FC_IO_ERRS
CELLSRV alert.log file

• Verification level may be changed


<Insert Picture Here>

Oracle Exadata Database Machine performance

Module Agenda

• Performance and I/O <Insert Picture Here>

• Data flow exchanges


• Data flow exchange capacities
• Case studies
<Insert Picture Here>

Performance and I/O


Performance fundamentals
I/O constrained workloads

• Host(s) must be able to generate I/O requests


• CPU bound systems cannot generate more I/O
• Storage must be able to deliver the I/O
• Conventional storage bottlenecks abound
• Drawer, Loop, Storage Processor

• Host(s) must be able to ingest the I/O


• Must have adequate I/O adaptors
• Must have balanced ―bus‖ / memory bandwidth
• Must have adequate CPU bandwidth
• CPUs saturated by data in-flow cannot generate more I/O
Performance fundamentals
Data flow exchanges

• Data flow exchanges


• There exists a ―producer‖ / ―consumer‖ relationship between
the database grid and the storage grid.
• Points of data-flow between producers and consumers are
called data flow exchanges
• Producer/consumer relationships are the foundation of
throughput
<Insert Picture Here>

Data flow exchanges


Sun Oracle Database Machine
Data flow exchanges

• Exchange 1: Within a cell. The flow of data between:
  • HDD/flash <-> memory <-> CPU <-> iDB

[Diagram: the database grid and the storage grid (each server shown as
CPUs & memory) connected by InfiniBand, with Exchange 1 marked inside each
storage cell]
Sun Oracle Database Machine
Data flow exchanges

• Exchange 1: Within a cell. The flow of data between:
  • HDD/flash <-> memory <-> CPU <-> iDB
• Exchange 2: The flow of data between a single cell and the database grid
  via iDB.
  • Realizable bandwidth is roughly 2.5 GB/s

[Diagram: as above, with Exchange 2 marked on the InfiniBand link between
a single cell and the database grid]
Sun Oracle Database Machine
Data flow exchanges

• Exchange 1: Within a cell. The flow of data between:
  • HDD/flash <-> memory <-> CPU <-> iDB
• Exchange 2: The flow of data between a single cell and the database grid
  via iDB.
  • Realizable bandwidth is roughly 2.5 GB/s
• Exchange 3: The flow of data between a single database host and the
  storage grid.
  • Realizable bandwidth is roughly 2.5 GB/s

[Diagram: as above, with Exchange 3 marked between a single database host
and the storage grid]
Sun Oracle Database Machine
Data flow enhancement
• The Sun Oracle Database Machine implements best
practices for throughput
• Balanced configuration, designed to avoid bottlenecks at all
points in the data flow
• Software designed to reduce volume of data required to flow
• Elimination of
• Excess rows (predicate evaluation, join filtering, storage
indexes)
• Excess columns (column projection)
• Software designed to reduce disk I/O
• Storage indexes, Exadata Smart Flash Cache
• Software designed to efficiently allocate data flow bandwidth
• IORM
<Insert Picture Here>

Data flow exchange


capacities
Database Machine data flows
Maximum bandwidth
• Exchange 1: Within a cell.
• 125 MB/s * 12 HDD == 1.5 GB/s
• 3.6 GB/s (FLASH) + 1.2 GB/s (HDD) == ~4.8 GB/s
• Exchange 2: Between a single cell and the database (1:M) grid via iDB
• 2.5 GB/s
• Flow Control: A cell in the storage grid cannot produce/consume 2.5
GB/s unless the database grid can produce/consume the data.
• Exchange 3: Between a single database host and the storage grid via
iDB (1:N)
• 2.5 GB/s
• Flow Control: A host in the database grid cannot produce/consume 2.5
GB/s unless the storage grid can produce/consume the data.
Database Machine data flows
Scaling – Exchange 1

• Exchange 1:
• Scales horizontally to a maximum aggregate rate of roughly
67 GB/s
• Achieving this maximum theoretical rate involves parallel
scanning of Flash and HDD on all cells in a full rack.
• At this rate, data cannot flow through Exchange 2. That is,
the data cannot leave the cells at this rate.
• Think of a query that looks for a non-existent needle in
a haystack:
• SQL> SELECT base FROM payroll WHERE base >
8,000,000 ;
• Many chefs

Database Machine data flows
Scaling – Exchange 2

• Exchange 2:
• Scales out to a maximum aggregate rate of roughly 20 GB/s
• This is the aggregate rate of data flow between the storage
grid and the database grid.
• The difference between Exchange 1 and Exchange 2 is the
Smart Scan effectiveness.
• Cells must reduce payload through projection/filtration to fit
within Exchange 2 bandwidth.
• The aggregate out-flow rate from Exchange 2 must fit
within about 20 GB/s
Database Machine data flows
Scaling – Exchange 2

• Exchange 2:
• All hosts in the database grid must participate in order to
accommodate maximum Exchange 2 data flow
• That is, less than 8 hosts cannot ingest this flow of data
• NOTE: A single Oracle foreground (no PQ) can drive
storage at roughly 20 GB/s but no data can flow from the
storage grid to the single foreground process at this rate.
Think of a fully offloaded query.
• 1 order, many eaters
Database Machine data flows
Scaling – Exchange 3

• Exchange 3:
• Scales horizontally to a maximum aggregate rate of roughly
20 GB/s
• This is the aggregate rate of data flow between the database
grid and the storage grid.
• PQ server must have sufficient CPU bandwidth else disk I/O
is throttled.
• All database hosts must participate to realize maximum
theoretical Exchange 3 bandwidth.
• Many orders, 1 dish
Sun Oracle Database Machine
Data flow maximums

Exchange 3: 2.5 GB/s per host, 20 GB/s aggregate
Exchange 2: 2.5 GB/s per cell, 20 GB/s aggregate
Exchange 1: 4.8 GB/s per cell, 67 GB/s aggregate

[Diagram: the database grid and storage grid connected by InfiniBand, with
the per-host/per-cell and aggregate bandwidths marked at each exchange]
<Insert Picture Here>

Case studies
Case studies
Principles

• Lowest effective bandwidth through any exchange


limits overall throughput
• Exadata storage intelligence is used to limit required
data flow from cells to database grid
• RAC is used to combine consumer capability of
database grid
• Disks spin, but heads only move to read or write when
someone asks for something
• I/O and CPU utilization are linked
Case studies
Schema
SQL> desc all_card_trans
Name Null? Type
----------------------------------------- -------- ----------------------------
CARD_NO NOT NULL VARCHAR2(20)
CARD_TYPE CHAR(20)
MCC NOT NULL NUMBER(6)
PURCHASE_AMT NOT NULL NUMBER(6)
PURCHASE_DT NOT NULL DATE
MERCHANT_CODE NOT NULL NUMBER(7)
MERCHANT_CITY NOT NULL VARCHAR2(40)
MERCHANT_STATE NOT NULL CHAR(3)
MERCHANT_ZIP NOT NULL NUMBER(6)
Case studies
Queries

• Light, lightweight Scan


• SQL> select max(mcc) from all_card_trans where mcc < 0;
• Lightweight scan 50% selectivity
• SQL> select max(mcc) from all_card_trans where
purchase_amt > 60;
• Busy Storage Server CPUs

• Complex Query
• Synopsis:
• 5-table join – Busy database CPUs
• Heavy predicate evaluation – Busy Storage Server CPUs
• See next slide
Case studies
Complex query

with yyy as (
  select custid, sum(refund_amt) returns from CUST_SERVICE cs where return_dt > ( SYSDATE - 180)
  and CLUB_CARD_NUM > 0 and CC_NUMBER > 0 and cs.club_card_num not like '%A%' group by custid
),
xxx as ( select cmr.custid, cf.aff_cc_num, cmr.returns, sum(os.trans_amt) from yyy cmr, CF_BASE2 cf, OS_BASE2 os
  where cf.custid = cmr.custid and cf.custid = os.custid and os.club_points_earned > 0 and os.STORE_CODE > 0
  and os.TRANS_ID > 0 and os.CUSTID > 0 and ( cf.club_card_num not like '%A%' and cf.AFF_CC_NUM not like '%A%'
  and cf.CUST_SHIPTO_DETAIL2 not like '%NO DETAIL%' and cf.CUSTDETAIL1 not like '%NO DETAIL%'
  ) and os.trans_dt > ( SYSDATE - 180) group by cmr.custid, cf.aff_cc_num, cmr.returns
  having (returns / sum(os.trans_amt) * 100) > 2
)
select card_no, sum(purchase_amt) sales
from ACT_BASE2 act where act.purchase_dt > ( SYSDATE - 180)
and card_no in ( select aff_cc_num from xxx) and act.merchant_code not in
  ( select merchant_code from PARTNER_MERCHANTS where store_zip > 0 and store_name not like '%ACME%')
and MERCHANT_CITY not like '%Frankfort%' and MERCHANT_STATE not like '%KY%'
and MCC > 1 and CARD_TYPE not like '%NO CARD%' and PURCHASE_AMT > 0
group by card_no
having sum(purchase_amt) > 10;
Case study 1
Results
• Light, lightweight scan
• Fully offloaded – no data returned to the server
• Caveats: this full rack is configured with 450 GB SAS drives and is
  missing one 96 GB Flash card.
• Maximum HDD disk throughput is 20 GB/s; combined Flash+HDD is 54 GB/s.

SQL> SELECT MAX(MCC) FROM ALL_CARD_TRANS WHERE MCC < 0;

Storage     Query                               Throughput (GB/s)    CPU %busy        Query     CPU      Effective
Source                                          From Storage  iDB    Cells  Database  Tm (sec)  Seconds  GB/s
FLASH+HDD   Light, lightweight Scan             49            0.007  35     3         51        4194
FLASH+HDD   Light, lightweight Scan (HCC 6:1)   23            0.007  90     2         20        4124     125
HDD         Light, lightweight Scan             20            0.003  15     3         125       4682
HDD         Light, lightweight Scan (HCC 6:1)   19            0.012  70     5         24        3956     104
Case study 1
Results (continued; the table on the previous slide applies)

• Data Flow Lessons:
  • The HCC query-time benefit is metered by cell CPU bandwidth
  • HCC data scanning from FLASH+HDD is, of course, the fastest even though
    the scan rate from HDD+FLASH is only 23 GB/s
  • HDD non-compressed -> FLASH+HDD compressed is 6.25x faster and uses 12%
    fewer total CPU seconds
  • If the performance you are measuring does not meet your expectations,
    remember the 3 Data Flow Exchanges
Case study 2
Results

• Lightweight scan – 50% selectivity

SQL> select max(mcc) from all_card_trans where purchase_amt > 60;

Storage     Query                                        Throughput (GB/s)    CPU %busy        Query     CPU      Effective
Source                                                   From Storage  iDB    Cells  Database  Tm (sec)  Seconds  GB/s
FLASH+HDD   Lightweight Scan 50% Selectivity             50            3.8    80     15        52        10317
FLASH+HDD   Lightweight Scan 50% Selectivity (HCC 6:1)   13            2.7    90     50        37        9827     70
HDD         Lightweight Scan 50% Selectivity             20            1.5    28     10        126       9520
HDD         Lightweight Scan 50% Selectivity (HCC 6:1)   13            2.6    90     50        36        9657     69
Case study 2
Results (continued; the table on the previous slide applies)

• Data Flow Lessons:
  • Producer CPU utilization throttles disk throughput
  • Cell CPU reaches a critical level when scanning/filtering/projecting
    data from either HDD or combined HDD+FLASH; both media sources generate
    the same effective throughput
  • Use assets wisely. Do all tables need to be pinned in the cell Flash
    Cache?
  • Expect increased CPU utilization in both grids when querying HCC data
  • If the performance you are measuring does not meet your expectations,
    remember the 3 data flow exchanges
Data Flow Dynamics
Case Study Examples. Complex Query Case.
HDD, Non-HCC. Average disk I/O 9 GB/s, iDB 6 GB/s.

[Charts: "Complex Query HDD Phys I/O" – disk and iDB throughput in MB/s
over roughly 30 seconds of the query; "Complex Query CPU" – cell and
database grid CPU utilization (%) over the same interval]
Data Flow Dynamics
Case Study Examples. Complex Query Case.

[Chart: "Complex Query CPU" – cell and database grid CPU utilization (%)
during the complex query]

• Complex Query Lessons:
  • Heavy joins throttle I/O
  • Heavy filtration/projection throttles I/O
  • If the performance you are measuring does not meet your expectations,
    remember the 3 Data Flow Exchanges
<Insert Picture Here>

CellCLI, DCLI and ADRCI

Module Agenda

• Exadata software architecture <Insert Picture Here>

• CellCLI
• DCLI
• ADRCI
<Insert Picture Here>

Exadata software
architecture
Exadata software architecture

[Diagram: cell software components – CellCLI, the Management Server (MS),
CellSRV (which hosts IORM), the Restart Server (RS), the iDB protocol to
the database servers, and the cell's disks]
<Insert Picture Here>

CellCLI
CellCLI
Overview
• Command line utility for managing cell resources
• CellCLI runs on the cell
• Run locally from a shell prompt
• Run remotely via ssh or dcli
• Run automatically by EM agent with Exadata EM plugin
• Can run non-interactively

[celladmin@cell01 ~]# cellcli


CellCLI: Release 11.1.3.0.0 - Production on Tue Oct 04 22:13:21 PDT 2008

Copyright (c) 2007, 2008, Oracle. All rights reserved.


Cell Efficiency ratio: 73.1

CellCLI>
CellCLI
Syntax
• Commands are not case sensitive
• - character for line continuation
• ; optional command terminator
• REM, REMARK, or -- indicate comments
CellCLI
Commands

• Administration commands -- Similar to SQLPLUS:


• HELP: displays syntax and usage descriptions for all CellCLI
commands
• SET : sets parameter options in the CellCLI environment.
• SPOOL: writes results of commands to the specified file on
the cell file system.
• EXIT or QUIT: return control to invoking shell
• START or @: runs the CellCLI commands in the specified
script file.
CellCLI
Help Command
CellCLI> help

HELP [topic]
Available Topics:
ALTER
ALTER ALERTHISTORY
ALTER CELL
ALTER CELLDISK
ALTER GRIDDISK
ALTER IORMPLAN
ALTER LUN
ALTER THRESHOLD
ASSIGN KEY
CALIBRATE
CREATE
CREATE CELL
CREATE CELLDISK
CREATE GRIDDISK

CellCLI>
CellCLI
Object commands
• List and change cell resources
• Syntax: <verb> <object-type> [ALL |object-name] [<options>]
• Generic verbs: ALTER, CREATE, DROP, and LIST
used to change, create, remove, and display objects
CellCLI> create griddisk all prefix=data
GridDisk data_CD_1_stsd2s3 successfully created
GridDisk data_CD_2_stsd2s3 successfully created
GridDisk data_CD_3_stsd2s3 successfully created
GridDisk data_CD_4_stsd2s3 successfully created
GridDisk data_CD_5_stsd2s3 successfully created
...

CellCLI> alter griddisk data_CD_1_stsd2s3 availableTo="+ASM"


GridDisk data_CD_1_stsd2s3 successfully altered
CellCLI
Cell Object Types
• Resource-related object types represent hardware
and software configuration:
CELL, CELLDISK, GRIDDISK, IORMPLAN, KEY,
LUN, PHYSICALDISK
• Performance metric object types: ACTIVEREQUEST,
METRICCURRENT, METRICDEFINITION,
METRICHISTORY
• Failure alert object types:
ALERTDEFINITION, ALERTHISTORY,THRESHOLD.

• Objects types indicated by RED


• List only
• Automatically created
CellCLI
Object Attributes
• Each object has attributes, listed by the DESCRIBE
command.
• Attributes which can be modified by ALTER
commands are listed as "modifiable"
CellCLI> describe griddisk
name modifiable
availableTo modifiable
cellDisk
comment modifiable
creationTime
errorCount
id
offset
size modifiable
status
Object Attributes
List options
• LIST command results can be limited by "where"
predicate on attribute values
• LIST output fields can be specified with the "attributes" clause
• The "where" clause uses standard comparison operators
• LIST DETAIL option provides display of all attributes
CellCLI> list celldisk where freespace > 100G
CD_1_stsd2s3 normal
CD_2_stsd2s3 normal
CD_3_stsd2s3 normal
CD_4_stsd2s3 normal
CellCLI> list griddisk attributes name,size,status where name like 'data.*'
data_CD_1_stsd2s3 928M active
data_CD_2_stsd2s3 136G active
data_CD_3_stsd2s3 136G active
data_CD_4_stsd2s3 136G active
CellCLI
CELL Object Type
• CELL is the local server to which disks are attached
and on which the CellCLI utility runs.
• One Cell object (default cell name = domain name)
• Automatically created, but can use CREATE CELL
CellCLI> list cell
cell01 online
CellCLI> alter cell smtpServer='my_mail.example.com', -
smtpFromAddr='john.doe@example.com', -
smtpFrom='John Doe', -
smtpToAddr='jane.smith@example.com', -
snmpSubscriber=((host=host1),(host=host2)), -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail,snmp'
Cell cell01 successfully altered
CellCLI> alter cell shutdown services all
Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
LIST CELL
CellCLI> list cell detail
name: cell01
bmcConfigured: FALSE
bmcType: "ILO 2.0"
cellVersion: OSS_MAIN_LINUX_081120
cpuCount: 4
idLEDStatus: off
interconnectCount: 5
interconnect1: bond0
iormBoost: 0.0
ipaddress1: 144.25.214.119/22
kernelVersion: 2.6.18-53.1.el5.sage
makeModel: HP DL series smart array ILO
metricHistoryDays: 7
offloadEfficiency: 73.6
status: online
temperatureReading: 47.0
temperatureStatus: normal
upTime: 12 days, 19:55
cellsrvStatus: running
msStatus: running
rsStatus: running
CellCLI
Disk hierarchy

Physical disk -> LUN -> CELLDISK -> GRIDDISK
CellCLI
PHYSICALDISK and LUN Object Types
• PHYSICALDISK: A physical disk on the cell.
• LUN: the address for each individual physical disk.
• PHYSICALDISK objects are discovered when the cell is started.
LUN objects are automatically created for each physical disk.

CellCLI> list physicaldisk attributes name, status, physicalsize


1I:0_1:1 normal 146G
1I:0_1:2 normal 146G
1I:3_1:5 normal 146G
...
CellCLI> list lun
3_1 normal
3_2 normal
3_3 normal
...
CellCLI
CELLDISK and GRIDDISK Object Types
• CELLDISK is associated with a logical unit number (LUN).
One physical disk is associated with each cell disk.
• GRIDDISK is a logical partition of a cell disk. It is exposed
on network (as ASM disks) to the database hosts.
CellCLI> create celldisk all
CellDisk CD_1_stsd2s3 successfully created
CellDisk CD_2_stsd2s3 successfully created
...
CellCLI> create griddisk all prefix=data,size=10G
GridDisk data_CD_1_stsd2s3 successfully created
GridDisk data_CD_2_stsd2s3 successfully created
...
CellCLI> alter griddisk all availableto='+ASM'
GridDisk data_CD_1_stsd2s3 successfully altered
GridDisk data_CD_2_stsd2s3 successfully altered
...
CellCLI
IORMPLAN Object Type
• The IORMPLAN object contains the set of directives that determine the
  allocation of I/O resources among multiple databases (inter-database
  plan) on the cell.
• There is one IORMPLAN object for the cell.

CellCLI> ALTER IORMPLAN dbPlan=((name=sales, level=1, allocation=80), -


(name=finance_prod, level=1, allocation=20), -
(name=sales_dev, level=2, allocation=100), -
(name=sales_test, level=3, allocation=50), -
(name=other, level=3, allocation=50))
IORMPLAN successfully altered

CellCLI> alter iormplan active


IORMPLAN successfully altered
CellCLI
ALERTHISTORY and THRESHOLD Object Types
• ALERTHISTORY: A list of alerts that have occurred on the
cell.
• THRESHOLD objects describe the rules for generating
alerts based on a specific metric.
CellCLI> LIST ALERTHISTORY WHERE begintime > 'Jun 1, 2008 11:37:00 AM PDT'
39 2008-10-02T12:26:53-07:00 "ORA-07445: exception encountered: core dump “
40 2008-10-06T23:28:06-07:00 "RS-7445 [unknown_function] [signum: 6] []"
41 2008-10-07T00:50:42-07:00 "RS-7445 [Serv MS not responding] []“
42 2008-10-07T02:21:19-07:00 "RS-7445 [unknown_function] [signum: 6] []"

CellCLI> CREATE THRESHOLD db_io_rq_sm_sec.db123 comparison='>', critical=120


Threshold db_io_rq_sm_sec.db123 successfully created

CellCLI> list threshold detail


name: db_io_rq_sm_sec.db123
comparison: >
critical: 120.0
CellCLI
METRIC* Object Types

• METRICDEFINITION objects describe the metrics.


• METRICCURRENT objects are the set of current observations
• Flushed to METRICHISTORY every hour

CellCLI> list METRICDEFINITION attributes name,description where objecttype='cell'


CL_CPUT "Cell CPU Utilization is the percentage of time over the
previous minute that the system CPUs were not idle (from /proc/stat)."
CL_FANS "Number of working fans on the cell"
...
CellCLI> list METRICCURRENT cl_cput
CL_CPUT stado54 3.3 %
CellCLI
METRIC* Object Types

• METRICHISTORY a collection of past metric observations

CellCLI> list METRICDEFINITION attributes name,description where objecttype='cell'


CL_CPUT "Cell CPU Utilization is the percentage of time over the
previous minute that the system CPUs were not idle (from /proc/stat)."
CL_FANS "Number of working fans on the cell"
...
CellCLI> list METRICCURRENT cl_cput
CL_CPUT stado54 3.3 %

CellCLI> list metrichistory cl_cput where –


collectiontime>'2008-11-18T11:46:32-08:00'
CL_CPUT stado54 3.3 % 2008-11-18T11:47:32-08:00
CL_CPUT stado54 2.8 % 2008-11-18T11:48:32-08:00
CL_CPUT stado54 3.3 % 2008-11-18T11:49:32-08:00
...
CellCLI
Other object commands

• CREATE KEY displays a random hex security key


• ASSIGN KEY assigns a security key for an ASM or
DB client
• CALIBRATE runs raw performance tests on cell disks.
• EXPORT CELLDISK: prepares cell disks before
moving (importing) the cell disk to a different cell.
• IMPORT CELLDISK: reinstates exported cell disks on
a cell where you moved the physical drives that
contain the cell disks
CellCLI
Exadata Storage Server users

• cellmonitor – can only use LIST commands


• celladmin – All functions except for CALIBRATE
• root – All functions
<Insert Picture Here>

DCLI
DCLI
Overview
• The DCLI script runs commands on multiple cells in
parallel threads.
• File copy and command execution occur on a set of cells in
parallel.
• Command output is collected and displayed after file copy
and command execution is finished on all cells.
• Setup:
• Copy DCLI from cell (/opt/oracle/cell/cellsrv/bin/dcli) to host from
which management is done.
• Create files which contain a list of cells to which commands are
issued, e.g. mycells
• Run "dcli -k -g mycells" to create ssh key equivalence on the cells
DCLI
Return codes
• DCLI returns one of the following values
• 0 – The command(s) were copied and run on all designated cells
• 1 – One or more cells could not be reached or returned a non-zero
error code
• 2 – A local error prevented execution of any commands
DCLI
Example 1
$ scp celladmin@stsd2s3:/opt/oracle/cell/cellsrv/bin/dcli .
dcli 100% 32KB 31.6KB/s 00:00

$ cat - > mycells


# cells to be managed
stsd2s1
stsd2s2
stsd2s3

$ dcli -g mycells -k
celladmin@stsd2s1's password:
celladmin@stsd2s2's password:
stsd2s1: ssh key added
stsd2s2: ssh key added
stsd2s3: ssh key already exists

$ dcli -g mycells cellcli -e list cell


stsd2s1: stsd2s1 online
stsd2s2: stsd2s2 online
stsd2s3: stsd2s3 online
DCLI
Example 2
$ dcli
Error: No command specified.
usage: dcli [options] [command]

options:
--version show program's version number and exit
-cCELLS comma-separated list of cells
-fFILE file to be copied
-gGROUPFILE file containing list of cells
-h, --help show help message and exit
-k push ssh key to cell's authorized_keys file
-lUSERID user to login as on remote cells (default: celladmin)
-n abbreviate non-error output
-rREGEXP abbreviate output lines matching a regular expression
-sSSHOPTIONS string of options passed through to ssh
--scp=SCPOPTIONS string of options passed through to scp if different
from sshoptions
-t list target cells
-v print extra messages to stdout
--vmstat=VMSTATOPS vmstat command options
-xEXECFILE file to be copied and executed
DCLI
Example 3
$ dcli -c mycells cellcli -e create griddisk all prefix="data", size=120G
stsd2s1: GridDisk data_CD_2_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_3_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_4_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_5_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_6_stsd2s1 successfully created
...
$ dcli -c mycells 'cellcli -e alter griddisk all availableTo=\"+ASM,dbm\"'
stsd2s1: GridDisk data_CD_1_1_stsd2s1 successfully altered
stsd2s1: GridDisk data_CD_2_stsd2s1 successfully altered
stsd2s1: GridDisk data_CD_3_stsd2s1 successfully altered
stsd2s1: GridDisk data_CD_4_stsd2s1 successfully altered
...
$ dcli -g mycells cellcli -e assign key for dbm='1212824bf214e59f3b60d1553b784cf0'
stsd2s1: Key for dbm successfully altered
stsd2s2: Key for dbm successfully altered
stsd2s3: Key for dbm successfully altered

$ dcli -g mycells cellcli -e alter iormplan active


stsd2s1: IORMPLAN successfully altered
stsd2s2: IORMPLAN successfully altered
stsd2s3: IORMPLAN successfully altered
DCLI
Example 4
$ dcli -g mycells --vmstat='3 10'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
13:03:15: r b swpd free buff cache si so bi bo in cs us sy id wa st
stsd2s1: 1 0 451180 13008 3456 61460 0 0 180 347 5 4 1 2 96 0 0
stsd2s2:21 0 350260 14480 3500 62444 0 0 330 252 0 2 1 2 97 0 0
stsd2s3: 0 0 128 13880 23556 511432 0 0 370 25 9 2 1 2 97 0 0
Minimum: 0 0 128 13008 3456 61460 0 0 180 25 0 2 1 2 96 0 0
Maximum:21 0 451180 14480 23556 511432 0 0 370 347 9 4 1 2 97 0 0
Average: 7 0 267189 13789 10170 211778 0 0 293 208 4 2 1 2 96 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
13:03:20: r b swpd free buff cache si so bi bo in cs us sy id wa st
stsd2s1: 2 0 451180 13168 3480 61508 25 0 10857 34240 28862 30560 12 27 60 0 0
stsd2s2: 1 0 350260 12144 3524 62496 0 0 10912 34196 12344 31365 11 17 71 0 0
stsd2s3: 0 0 128 14576 23576 511480 0 0 0 0 1005 16498 0 0 100 0 0
Minimum: 0 0 128 12144 3480 61508 0 0 0 0 1005 16498 0 0 60 0 0
Maximum: 2 0 451180 14576 23576 511480 25 0 10912 34240 28862 31365 12 27 100 0 0
Average: 1 0 267189 13296 10193 211828 8 0 7256 22812 14070 26141 7 14 77 0 0

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
13:03:24: r b swpd free buff cache si so bi bo in cs us sy id wa st
stsd2s1: 2 0 451180 12344 3504 61508 0 0 14145 42561 35069 31306 13 30 57 0 0
stsd2s2: 1 0 350260 11768 3548 62496 0 0 13958 42532 13328 31422 12 19 68 0 0
stsd2s3: 0 0 128 15952 23624 511484 0 0 0 111 1010 15093 0 0 100 0 0
Minimum: 0 0 128 11768 3504 61508 0 0 0 111 1010 15093 0 0 57 0 0
Maximum: 2 0 451180 15952 23624 511484 0 0 14145 42561 35069 31422 13 30 100 0 0
Average: 1 0 267189 13354 10225 211829 0 0 9367 28401 16469 25940 8 16 75 0 0

DCLI
Other uses
• Shutting down and starting up the CRS stack on all nodes
• Changing or checking a configuration parameter on all nodes
• Checking process state info on all nodes
• Checking/gathering hardware info on all nodes for HP tickets (ex:
ipmitool, dmidecode)
• Starting/Stopping/Checking cell services on all nodes
• Gathering/aggregating system stats on all nodes (ex: vmstat, iostat,
etc)
• Verifying network connectivity across all nodes (IP and rds - ie
normal pings and rds-pings)
• Setting up ssh across all nodes
• Setting up security - both cell side (cellcli commands) and database
side (cellkey.ora population)
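As an illustration, a minimal sketch of how a few of these tasks might look with dcli. The group file, cell service command and script name below are assumptions modeled on the earlier examples (healthcheck.sh is a hypothetical local script), not prescribed procedures:

$ dcli -g mycells "service celld status"          # check cell services on every cell
$ dcli -g mycells cellcli -e list alerthistory    # gather alert history from every cell
$ dcli -g mycells vmstat 3 2                      # one-off OS stats (see also --vmstat)
$ dcli -g mycells -x healthcheck.sh               # copy a local script to each cell and run it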
<Insert Picture Here>

ADRCI
ADRCI
Utility Overview

• ADRCI: command line tool for viewing diagnostic data within
  a cell's ADR (Automatic Diagnostic Repository).
• This support is similar to that provided with the RDBMS in
  11g.
• The tool is invoked simply by running 'adrci'.
• ADRCI includes the Incident Packaging System (IPS), which
  allows you to identify and package all of the relevant
  diagnostic data for a critical error.
• Critical errors are seen from the CellCLI LIST ALERTHISTORY
  command, from an email or SNMP notification, or from the EM
  plug-in alert list.
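For example, a critical incident can often be spotted from the cell alert history before packaging it with ADRCI. A hedged sketch using the CellCLI command mentioned above (filter attributes are illustrative):

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' DETAIL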
ADRCI
Incident Packaging Example
$ adrci

ADRCI: Release 11.1.0.7.0 - Production on Thu Nov 20 16:32:15 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.

adrci> show homes
ADR Homes:
diag/asm/cell/cell01

adrci> set home diag/asm/cell/cell01

adrci> show incidents
ADR Home = /opt/oracle/cell/log/diag/asm/cell/cell01:
*************************************************************************
INCIDENT_ID          PROBLEM_KEY           CREATE_TIME
-------------------- --------------------- ---------------------------------
5                    RS 7445               2008-11-19 22:38:56.228289 -05:00

adrci> ips create package incident 5
Created package 1 based on incident id 5, correlation level typical

adrci> ips generate package 1 in /tmp
Generated package 1 in file /tmp/RS7445_20081120163628_COM_1.zip, mode complete
<Insert Picture Here>

Exadata and availability
Module Agenda

• Availability with Exadata <Insert Picture Here>

• ASM and Exadata


• Exadata disk availability
• MAA and Exadata
• MAA best practices for Exadata
• Data Guard and Exadata
• Patching and upgrades
<Insert Picture Here>

Availability with Exadata


Availability and Exadata
Hardware redundancy

• Redundant database servers


• Redundant storage cells
• Redundant disks within cells
• Redundant connectivity
• Redundant power supplies
Availability and Exadata
Software redundancy

• RAC
• ASM
<Insert Picture Here>

ASM and Exadata


Exadata
Storage layout overview
• Physical disks (LUN) map to a Cell Disks
• Cell Disks partitioned into one or multiple Grid Disks
• ASM diskgroups created from Grid Disks
• Transparent above the ASM layer

ASM diskgroup
Grid Disk 1
Cell
Physical …
Disk Disk
Grid Disk n ASM diskgroup
Sys Area Sys Area
Exadata storage
Cell disks

Cell
Disk Exadata Cell Exadata Cell

• Cell Disk is the entity that represents a physical


disk residing within a Exadata Storage Cell
• Automatically discovered and activated
Exadata storage
Grid disks

Grid Exadata Cell Exadata Cell

Disk

• Cell Disks are logically partitioned into Grid Disks


• Grid Disk is the entity allocated to ASM as an ASM disk
• Minimum of one Grid Disk per Cell Disk
• Can be used to allocate "hot", "warm" and "cold" regions of a
  Cell Disk or to separate databases sharing Exadata Cells
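A minimal sketch of this layering, assuming a "data" and a "reco" grid disk prefix on every cell and an ASM instance on the database grid; the names, size and attribute values are illustrative only:

CellCLI> CREATE GRIDDISK ALL PREFIX=data, size=300G
CellCLI> CREATE GRIDDISK ALL PREFIX=reco

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY
     DISK 'o/*/data_*'
     ATTRIBUTE 'compatible.rdbms'        = '11.2.0.0.0',
               'compatible.asm'          = '11.2.0.0.0',
               'cell.smart_scan_capable' = 'TRUE',
               'au_size'                 = '4M';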
Exadata storage
Grid disks
• The first grid disk defined is placed on the outer
  (fastest) portion of the disk
• Typical configuration is to use the first grid disk
  for data and the second grid disk for the Fast
  Recovery Area (FRA)

Exadata storage
ASM Disk Groups and mirroring

Hot ASM
Disk Group
Exadata Cell Exadata Cell Cold ASM
Disk Group
Hot Hot Hot Hot Hot Hot

Cold Cold Cold Cold Cold Cold

• Two ASM disk groups defined


• One for the active, or "hot", portion of the database and a
  second for the "cold", or inactive, portion
• ASM striping evenly distributes I/O across the disk group
• ASM mirroring is used to protect against disk failures
• Optional for one or both disk groups
Exadata storage
ASM mirroring and failure groups

ASM ASM
Exadata Cell Exadata Cell
Failure Group Failure Group

Hot Hot Hot Hot Hot Hot

Cold Cold Cold Cold Cold Cold

ASM
Disk Group
• ASM mirroring is used to protect against disk failures
• ASM failure groups are used to protect against cell
failures
Exadata storage
ASM interactions

• Grid disks cannot span multiple cells, but disk groups can
• Redundancy is handled by ASM
• Availability settings, such as disk_repair_time, are
ASM settings
• DISKMON is used to communicate between ASM and
the Exadata Storage Server cells
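As an example of the availability settings point, the repair timer is a disk group attribute set from the ASM instance, not on the cells. A hedged sketch (the disk group name and the 8.5h value are arbitrary):

SQL> ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8.5h';
SQL> SELECT group_number, name, value
     FROM   v$asm_attribute
     WHERE  name = 'disk_repair_time';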
Exadata storage
Intelligent Data Placement

• ASM mirrors data within a grid disk


• I/Os to outer sectors of a Physical Disk are faster
than inner sectors
• ASM places the primary extents in the first half of the
  disk, from the outer edge towards the middle, and the
  secondaries in the second half, from the middle
  towards the spindle
• Create cell disks with optional INTERLEAVING
attribute set to normal_redundancy or
high_redundancy
Intelligent Data Placement (IDP)

• Normal Redundancy: First half


(50% of the disk) is considered
HOT, while second half (50% of the
disk) is considered COLD
• IDP places primary extents in the 50%
HOT Zone
• IDP places secondary extents (mirror
copies) in the 50% COLD Zone
Intelligent Data Placement (IDP)

• High Redundancy: the first third (33%) of the disk is
  considered HOT, while the remaining two-thirds (67%)
  are considered COLD
• IDP places primary extents in the 33% HOT Zone
• IDP places secondary extents (2 mirror copies) in the 67% COLD
Zone
Intelligent Data Placement (IDP)
Interaction with ASM

• If an ASM disk group with high redundancy is desired


with IDP
• Create cell disks with INTERLEAVING='high_redundancy'
• Create grid disks from these cell disks
• Add grid disks to the ASM Diskgroup

• If an ASM disk group with normal redundancy is


desired with IDP
• Create cell disks with INTERLEAVING='normal_redundancy'
• Create grid disks from these cell disks
• Add grid disks to the ASM Diskgroup
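A hedged sketch of the normal-redundancy sequence just described; the disk group name, grid disk prefix and disk string are illustrative:

CellCLI> CREATE CELLDISK ALL HARDDISK INTERLEAVING='normal_redundancy'
CellCLI> CREATE GRIDDISK ALL PREFIX=data

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY DISK 'o/*/data_*';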
<Insert Picture Here>

Exadata disk availability


Exadata Storage Server
Availability – Case 1

Cell becomes
unreachable

ONLINE OFFLINE

Cell becomes
reachable
Exadata Storage Server
Availability – Case 2

HDD / Flash Card removed


from cage / PCIe slot

ONLINE OFFLINE

HDD / Flash Card put back


into cage / PCIe slot
Exadata Storage Server
Availability – Case 3

Alter GridDisk/CellDisk
Inactive

ONLINE OFFLINE

Alter GridDisk/CellDisk
Active
Exadata disk availability
Automatic disk online

• ONLINE -> OFFLINE when: the cell is unreachable, the disk is pulled
  out, the disk is inactivated, or the user takes the disk offline
• OFFLINE -> SYNC -> ONLINE when: the cell is reachable again, the disk
  is pushed in, the disk is activated, or the disk group is mounted
What is automated ?

• Inactivating a GridDisk or CellDisk in the cell will


automatically initiate an OFFLINE operation in the ASM
instance.
• Cell admin will be able to query the ASM instance
using cellcli for the following:
1. 'mode_status' of a GridDisk. Possible values for this are
   ONLINE, OFFLINE, SYNCING or UNKNOWN. If for any reason
   the query cannot succeed, a suitable error message is
   printed. A state of UNKNOWN is returned if, for example, the
   disk group is not mounted.
2. Whether a GridDisk can be taken OFFLINE without ASM losing all
   mirror copies.
Exadata/ASM disk availability
Adding and dropping grid disks

• States: ONLINE, OFFLINE, ADD, DROP
• Events that take a disk OFFLINE or trigger a DROP: the disk went
  dead, the cell became inaccessible, disk_repair_time expired, or
  the user issued drop disk force
• Events that trigger an ADD or bring a disk back ONLINE: a blank
  replacement disk is pushed in, the cell becomes accessible again,
  or the disk is online after rebalance
What is automated ?

• DROP and ADD operation of an ASM disk. Following are the


scenarios when this operation will be initiated.
• A physical disk that went bad is replaced with a new blank disk. All
ASM disks that were hosted on the failed disk will be dropped and
added back. Likewise for flash card hosting GridDisks/ASM disks.
• An ASM disk (grid disk) that is in the OFFLINE state is dropped
  forcefully due to disk_repair_time expiry. Subsequently, the disk
  becomes accessible again (for example, because the cell/CellSRV
  becomes accessible, or the disk is plugged back into its cage).
• The ASM admin initiated a drop disk command with the 'force' option.
Automation will try to add the disk back to the diskgroup when any
of the trigger conditions for disk ONLINE automation operation
happens.
How it works ?

• When a physical disk is plugged in, the lun gets


automatically enabled.
• As long as the disk is not marked as IMPORT
required CellSRV will make the grid disks on that
physical disk available to ASM.
• If, however, the physical disk is new (replacing a dead
  disk), CellSRV will re-create the cell disk and grid disks
  that were hosted on the dead disk. Once this step
  completes, the disk will be made accessible to ASM.
How it works ?
• Querying ASM disk 'mode_status' and 'asmdeactivationoutcome'
  from cellcli. One can pass a specific griddisk name as well.

CellCLI> list griddisk attributes name, asmmodestatus


datafile1 OFFLINE
datafile2 OFFLINE
datafile3 OFFLINE

Possible values: ONLINE | OFFLINE | SYNCING | UNKNOWN

CellCLI> list griddisk attributes name, asmdeactivationoutcome


datafile1 Yes
datafile2 Yes
datafile3 Yes
Possible values: Yes | No
How it works ?
• CellSRV maintains a file called griddisk.owners.dat which has
details such as:
– ASM disk name
– ASM diskgroup name
– ASM failgroup name
– Cluster identifier
– Requires DROP/ADD
for all GridDisks that are part of ASM diskgroups.
Exadata disk availability
What requires intervention?

• Disk group remount
  • Disk group dismounted due to loss of all mirrors
  • Manually remount the disk group when the disks
    become accessible
• Cell disk export/import
  • Move a disk from one cell to another
  • Import the cell disk on the new cell
  • Manually online the disk in ASM
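A minimal sketch of these manual steps, assuming a disk group named DATA and a cell disk being moved between cells; names are illustrative and the exact procedure should follow the Exadata documentation:

-- Remount the disk group once the disks are accessible again
SQL> ALTER DISKGROUP data MOUNT;

-- Move a cell disk: export on the source cell, import on the new cell
CellCLI> EXPORT CELLDISK CD_03_cell01
CellCLI> IMPORT CELLDISK CD_03_cell01

-- Bring the corresponding ASM disks back online
SQL> ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP cell02;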
<Insert Picture Here>

MAA and Exadata


Oracle's Database HA Solution Set
Protection against all sources of downtime

• Unplanned downtime
  • Server failures: Real Application Clusters
  • Data failures: Flashback, RMAN & Oracle Secure Backup,
    ASM, Data Guard, GoldenGate
• Planned downtime
  • System changes: Online Reconfiguration, Rolling Upgrades
  • Data changes: Online Redefinition
  • App changes: Edition-based Redefinition
• Oracle MAA Best Practices cover all of the above
Maximum Availability Architecture (MAA)

• Real Application Clusters
• Active Data Guard
• Secure backups to disk, tape or cloud
• Automatic Storage Management
• Fast Recovery Area
<Insert Picture Here>

MAA best practices for


Exadata
MAA best practices
ASM disk groups
• Standard protection disk groups
• DATA – normal redundancy
• Data files only (OUTER)
• RECO – high redundancy
• One controlfile, online redo logs (1 member), archives,
flashback logs, spfile, voting disks and OCR
• Potentially a DBFS disk group for staging on the innermost section
of disk, normal redundancy
• If double partner disk or double cell failure occurs, then database
can be restored from tape and full recovery with zero data loss is
achievable
• Can restore from tape and recovery procedures
• Downtime ensues
MAA best practices
ASM disk groups

• Alternative redundancy schemes


• Both DATA and RECO high redundancy
• Higher protection reduces potential downtime
• More disk used for mirrors
• High redundancy for DATA
• Best practice to store archive logs in alternate location
• Advantages
• Full bandwidth available from all cells
• Reduced maintenance and administration
• IORM can be used to set priority of IO operations
MAA best practices
Flashback Database

• Enable Flashback Database


• Minimum impact to OLTP workloads (< 2%)
• Minimum impact to DW loads if operational practices and
recommended patches are in place
• Refer to Support Note 565535.1
• Size the fast recovery area to a minimum of:
  redo rate x DB_FLASHBACK_RETENTION_TARGET
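A hedged example of enabling Flashback Database; the destination, size and 24-hour retention below are illustrative and should be derived from the redo-rate formula above:

SQL> ALTER SYSTEM SET db_recovery_file_dest_size = 2T;
SQL> ALTER SYSTEM SET db_recovery_file_dest = '+RECO';
SQL> ALTER SYSTEM SET db_flashback_retention_target = 1440;   -- minutes
SQL> ALTER DATABASE FLASHBACK ON;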
MAA best practices
Data Corruption Protection

• Oracle protects from data corruptions by:


• DB_BLOCK_CHECKING checks block semantics
• DB_BLOCK_CHECKSUM calculates and validates checksum in redo or
data blocks
• DB_LOST_WRITE_PROTECT detects stray and lost writes.
• Exadata uses ASM redundancy enabling auto-correction
of corrupt blocks during writes using the mirror copy
• ASM redundancy enables Oracle to automatically try the mirrored copy if
it detects a corrupt block
MAA best practices
Data Corruption Protection

• On primary and Data Guard standby databases:


• For OLTP and DW, set DB_BLOCK_CHECKSUM=FULL
and DB_LOST_WRITE_PROTECT=TYPICAL
• Observed less than 5% performance impact for batch and
OLTP workloads
• Do not change the DB_BLOCK_CHECKING initialization
parameter without first conducting a performance impact
analysis since the impact varies on workload
• Evaluate impact on both primary and standby
databases – there is a benefit to setting this value to
medium, true or full at the standby database even if it
is not set at the primary
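A minimal sketch of the recommended settings; apply them on both the primary and the standby and, as noted above, evaluate DB_BLOCK_CHECKING separately:

SQL> ALTER SYSTEM SET db_block_checksum     = FULL    SCOPE=BOTH;
SQL> ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE=BOTH;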
<Insert Picture Here>

Data Guard and Exadata


Data Guard Redo transport
Options
• SYNC transport and Maximum Protection
• Provides zero data loss failover
• If standby is down, primary halts
• Recommended if network latency between primary and standby is < 5 ms
in order to minimize primary performance impact
• Higher network latencies may be acceptable for some applications -
performance testing is required because synchronous redo transport
will impact primary database performance at higher network
latencies
• ASYNC transport and Maximum Performance
• Minimal impact to primary performance regardless of latency
• Monitor hit ratio using x$logbuf_readhist and increase LOG_BUFFER to
avoid excessive LNS disk reads from the online redo logs
Data Guard Redo transport
Configuration best practices

• Network considerations
• Configure with Infiniband for local standby database for
approximately 2 GB/sec bandwidth using IPoIB
• Otherwise GigE will provide approximately 120 MB/sec
• Standbys can use VIP interface or dedicated interface
• Use dedicated network interface for redo transport
• Refer to Support Note 960510.1 for complete details
Data Guard Redo transport
Configuration best practices

• Tune OS & network parameters


• Tune network parameters that affect network buffer sizes and
queue lengths
• A minimum of 10 MB network buffer size is recommended
• Ensure sufficient network bandwidth for maximum database
redo rate + other activities

Refer to Primary Site and Network Configuration Best Practices


http://www.oracle.com/technology/deploy/availability/pdf/MAA_DG_NetBestPrac.pdf
Active Data Guard Apply Rate
Extreme Performance on Exadata

11.1.0.7 requires BLR 8619827


11.2.0.1 gets great performance out of the box
Data Guard and Exadata
Standby machines and EHCC
• A Data Guard standby database can be a non-Exadata
Database Machine
• Must follow Data Guard mixed configuration specification (Note
413484.1)
• If EHCC is used, Data Guard will work, but a non-Exadata
  Database Machine standby will be unable to read the
  compressed data
• Upon failover
• Tables with EHCC will require ALTER TABLE MOVE statements to
become readable
• Storage required for uncompressed data
• Performance of non-Database Machine will be different
Active Data Guard Summary
Optimized for Database Machine

• Active Data Guard is the best availability, data


protection and disaster recovery solution for OLTP
and Data Warehouse
• Generic practices still apply
• Validated and optimized for the Database Machine
• Redo Apply
• Archiving
• Proven technology: multiple customers have already
deployed Data Guard standby databases on a
Database Machine
<Insert Picture Here>

Patching and upgrades


Patching & Upgrading
Scenarios
• All standard planned maintenance solutions apply
• Database Machine upgrades may require
• Exadata Storage server software changes
• Exadata software, firmware, OFED, OS
• Database Server software changes
• Oracle database software, firmware, OFED, OS
• Switch software (InfiniBand, Ethernet)
• Patches and upgrades situations
• Exadata Storage Server patch
• Database software patch
• Database server system patch (OS or firmware)
Exadata Storage Server
Online Patching
• Exadata Storage Server patches supplied by Oracle
maintain all aspects of OS, firmware, and software
• No additional software (Linux RPMs or otherwise) is
  allowed
• Only software supplied through Oracle patches is permitted
• Manual firmware changes are not allowed
• Patches are one of two types
• Overlay - Restart Exadata cell services, automatic reconnect
• Staged - Restart Exadata Storage Server, resync interim changes
with ASM fast mirror resync
• Installed by whoever manages the Exadata Cells
• Use patch installation tool (patchmgr.sh) – see README
• Most patches installed using root account
Database Server Patching
• Database software patches installed by DBA w/ OPatch
• Contact Oracle Support if one-off patch conflicts with Exadata required
patches
• Operating system and firmware patches
• Verify new patch meets Exadata requirements
• IB HCA and OFED versions must match storage servers
• Additional software allowed
• Maintain compliance with Exadata requirements for all dependencies
• RAC rolling upgrade
• Database software patches
• Firmware changes
• Certified operating system upgrades
• Data Guard rolling upgrades
Exadata Database Machine
Software Maintenance Documents
• Two My Oracle Support (MOS) notes document:
• Software/firmware requirements
• Compatibility requirements between components
• Software patches and upgrades
• Procedures for download and installation
• MOS note 835032.1 documents requirements for Oracle Database
11.1 (V1) systems
• MOS note 888828.1 documents requirements for Oracle Database
11.2 (V1 HW and V2 HW) systems
• Customers should sign up for automated alerts for changes to these
MOS notes
• In the future OCM will provide automatic notification of patches and
configuration changes
<Insert Picture Here>

Backup and recovery
Module Agenda

• Backup and recovery overview <Insert Picture Here>

• Best practices for disk-based backup and


recovery
• Best practices for tape-based backup and
recovery
• Backup & recovery with Data Guard
<Insert Picture Here>

Backup and recovery


overview
Backup, restore and recovery rates
Database Machine

• Backup rates
• 18 TB/Hr full image backups
• 10-46 TB/Hr effective backup rate for incremental backups
• Restore rates
• 24 TB/Hr restore rates
• Recovery rates
• 2.1 TB/Hr recovery rates
• Above rates pertain to physical files. With
compression, effective backup/restore rates will
multiply
• It all comes down to bandwidth of your slowest
component
Backup, restore and recovery operations
Database Machine
• Simple operations with standard RMAN commands
• Automatically parallelized across all storage servers
• Data aware
• Detection of block corruptions
• Auto repair and manual block repair options
• Integrated and transparent
• OLTP and data warehouse databases
• RAC, Data Guard, flashback technologies, ASM, Exadata
• Oracle native compression capabilities
• OLTP (typically 3 X compression)
• Exadata Hybrid Columnar Compression (typically 10-15 X
compression)
<Insert Picture Here>

Best Practices
for disk-based backup
and recovery
Disk-based backup and recovery
Exadata Storage Server Grid Disk layout

The faster (outer) 40% of the disk is assigned to the DATA Area
The slower (inner) 60% of the disk is assigned to the RECO Area

• Recommended disk group configuration


• Will be configured automatically during deployment
Disk-based backup & recovery
Effective rates
• Backup (and restore) rates
• 18 TB/Hr for Full Rack configurations – X2
• 5.4TB/Hr for Quarter Rack configuration – V2
• Effectively 10-46 TB/Hr for incremental backups
• Restore rates into existing files
• 24TB/hr for Full Rack configuration
• 13TB/hr for Half Rack configuration
• 5.6TB/hr for Quarter Rack configuration
• Typical Redo Apply (recovery) rates
• 200MB redo/sec (720GB redo/hour) for OLTP workloads
• 600MB redo/sec (2.1TB redo/hour) for Direct Load workloads
Disk-based backup & recovery
Strategy and advantages

• Use RMAN incrementally updated backups


• Image copy stored in the Fast Recovery Area and created
once on the initial backup
• Nightly incremental backups created in the Fast Recovery
Area
• Incremental backups merged into image copies on a 24 hour
delay basis
• Key advantages over tape-only-based backups
strategies
• Potential for using backups directly with no restore
• Reduce backup windows and resources with incremental
backups
• Faster recovery for corruptions and some Tablespace Point In
Time Recovery (TSPITR) cases
Disk-based backup & recovery
Exadata best practices for backups
• Create a Database Service "backup" that runs on a
  maximum of two instances
• Use incremental backups and block change tracking
• Data block inspection is offloaded to Exadata
• For highest throughput allocate 8 RMAN Channels
• Listener load balancing distributes the connections between the
  two instances
• Use fewer channels if highest throughput is not needed
• Set init.ora parameter
_file_size_increase_increment=2143289344
• Maximum observed CPU impact
• Less than 2 CPU cores used on the two DB nodes if all 8 RMAN
channels are utilized
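A hedged sketch of the two-instance backup service, assuming an admin-managed database named dbm with instances dbm1 and dbm2; the service name, SCAN address and connect string are illustrative:

$ srvctl add service -d dbm -s backup -r dbm1,dbm2
$ srvctl start service -d dbm -s backup
$ rman target sys/<password>@dbm-scan:1521/backup    # channels load balanced across the two instances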
Disk-based backup & recovery
Exadata best practices for restores

• For restore into existing files


• Create a Database Service "restore" that runs on all the
  instances of the database.
• Use 2 RMAN channels per database instance for Half Rack
and larger systems
• Use 4 RMAN channels per database instance for Quarter
Rack
• For restore into a new ASM Disk Group
• Create a Database Service "restore" that runs on a maximum
  of two instances
• Allocate a total of 8 RMAN Channels for the restore
Disk-based backup & recovery
Script examples on Database Machine
• RMAN configuration
configure default device type to disk;
configure device type disk parallelism 8;

• RMAN script for nightly incremental level 1 backup


run {
  backup
    incremental level 1
    for recover of copy
    with tag full_database
    database;
  recover
    copy of database
    with tag full_database;
}
Disk-based backup & recovery
Alternative FRA on Exadata

Exadata Database Oracle Exadata


Machine Storage Servers

InfiniBand Network
Disk Based Backup & Recovery
Alternative FRA on Exadata

• Allocate additional (SATA) Exadata Storage Servers


for a dedicated Fast Recovery Area
• Additional Exadata Storage Servers must be installed
in another rack
• Key benefits
• Better failure isolation when using separate backup hardware
• Allows use of lower cost space for backup
Disk-based backup & recovery
Using non-Exadata storage

• Performance and complexity will vary


• No MAA best practices
• Considerations
• Utilize IP based protocols like iSCSI or NFS
• SAN HBA and network rates may limit the backup rate
• If SAN-based storage is used, an intermediate server that acts
  as an iSCSI or NFS server is required
• Similar to the way the Media Server bridges between the
Exadata DB Machine and the tape library
<Insert Picture Here>

Best Practices
for tape-based backup
and recovery
MAA Validated Architecture

Exadata Database Sun StorageTek


Machine SL500
Sun Fire X4170
Oracle Secure Backup
Admin Servers

2 Sun Fire X4275


Oracle Secure
InfiniBand Backup Media Fiber Channel
Network Servers SAN
Tape-based backup and recovery
Exadata Storage Server Grid Disk layout

The faster (outer) 80% of the disk is assigned to the DATA Area
The slower (inner) 20% of the disk is assigned to the RECO Area

• Recommended disk group configuration


• Can be configured automatically during deployment
Tape-based backup & recovery
Rates and configurations
• Backup rates
• Limited by number of tape drives
• 179MB/sec per LTO4 tape drive
• 8.6TB/Hr for 14 tape drives
• 29TB/Hr with Exadata Database Machine Full Rack Configuration
and 64 LTO4 tape drives.
• Restore rates (into existing files)
• Limited by number of tape drives
• 162MB/sec per tape drive
• 7.8TB/hr for Half and Full Rack Configuration (14 tape drives)
• 6.1TB/hr for Quarter Rack Configuration
• Restore rates (into empty disk group)
• 5.4 TB/hr for Quarter and 7.1 TB/hr for Half and Full (14 tape drives)
Tape-based backup & recovery
Strategy and implementation

• Oracle Database tape backup strategy:


• Weekly RMAN level 0 (full) backup
• Daily RMAN cumulative incremental level 1 backup
• To scale and maintain availability:
• For HA, start with at least two media servers with a dual
ported Host Channel Adapter (HCA) per media server,
bonded for HA
• Add tape drives until all of the media server's HBA or HCA
  bandwidth is consumed
• Add media servers and associated tape drives when the
  media servers' HCA bandwidth is consumed
• Tape-based backups scale linearly by adding Media Servers
and tape drives
Tape-based backup & recovery
Benefits and trade-offs of tape solution

• Benefits
• Fault isolation from Exadata Storage Server
• Maximizes Database Machine capacity and bandwidth
• Move backup off-site easily
• Keep multiple copies of backups in a cost effective manner
• Trade-offs
• Disk-based solutions have better recovery times for data and
logical corruptions and certain tablespace point in time
recovery scenarios
• No differential incremental backups are available
Tape-based backup & recovery
Configuration best practices for tape
• Ethernet or InfiniBand based configuration only
• Hardware changes to Database Machine are not supported
• Smaller databases can use Gigabit Ethernet
• Use a dedicated interface for the transport to eliminate impact to
client access network
• Typically a dedicated backup network is in place
• Maximum throughput with the GigE network is 120 MB/sec X
Number of Database Servers
• For a full Database Machine, 960 MB/sec possible
• Use InfiniBand for best performance
• Bigger database needing faster backup rates
• Lower CPU overhead
Tape-based backup & recovery
InfiniBand configuration best practices for tape

• Database nodes and Media Server configuration


• Use Oracle Enterprise Linux on the Media Server
• Use same kernel and OFED packages as used on Exadata
Database Machine
• Enable IPoIB connected mode and MTU changes on the
Media Server
• No changes on database nodes needed
• Minimal CPU impact
• Observed less than 1 CPU Core used per instance
Tape-based backup & recovery
Configuration best practices for tape backup

• For tape-based backup create a Database Service
  "backup" that runs on all the instances of the
  database
• Use incremental backups and block change tracking
• Data block inspection is automatically offloaded to Exadata
• Use tape hardware compression in addition to Oracle DBMS
OLTP and EHCC compression
• Allocate 1 RMAN channel per tape drive for the
backup
• Let Listener Load Balancing distribute the connections
between all the instances
• Spreads the backup I/Os evenly over all database nodes
Tape-based backup & recovery
Configuration best practices for tape restore

• For restore into existing files


• Create a Database Service "restore" that runs on all the
  database instances
• Allocate 1 RMAN Channel per tape drive
• For restore into a new ASM Disk Group
• i.e. restore after loss of the ASM Disk Group
• Create a Database Service "restore" that runs on a maximum
  of two database instances
• Allocate 1 RMAN Channel per tape drive
Tape-based backup & recovery
Script examples
• RMAN configuration
configure default device type to sbt;
configure device type sbt parallelism 14;

• RMAN script for weekly backup


run {
backup incremental level 0 database tag 'weekly_level0';
backup archivelog all not backed up;
}

• RMAN script for daily backup


run {
  backup cumulative incremental level 1 database tag 'daily_level1';
  backup archivelog all not backed up tag 'archivelogs';
}
Tape-based backup & recovery
Oracle Secure Backup advantages

• Oracle Secure Backup (OSB) tape-based backup


advantages
• Fastest database backup to tape via tight integration with
RMAN
• Unused block compression
• Inactive Undo blocks not backed up
• Very low cost
• MAA Validated
Tape-based backup & recovery
Oracle Secure Backup best practices

• Configure the Preferred Network Interface (PNI) to


direct the OSB traffic over the InfiniBand network
interface

ob> lspni (List Preferred Network Interface)


mediaserver1:
PNI 1:
interface: mediaserver1-ib
clients: dbnode1, dbnode2, dbnode3, dbnode4, dbnode5, dbnode6,
dbnode7, dbnode8
PNI 2:
interface: mediaserver1
clients: adminserver
Database Machine backup & recovery
Documentation

• Backup and Recovery Performance and Best


Practices for Sun Oracle Database Machine and
Exadata

• http://www.oracle.com/technology/products/bi/db/exadata/pdf/maa_tech_wp_sundbm_backup_final.pdf
3rd Party Media Management Vendor
No additional complexity

• Third party vendors test and validate their own


products
• Contact the MMV for configuration best practices
• No additional certification specific to Exadata required
• Tune the network communication within the MMV to
exploit the full potential of the InfiniBand or GigE
networks
• Production customers are using third party tape
backup products to backup Exadata systems today
<Insert Picture Here>

Backup and recovery


with Data Guard
Backup & recovery with Data Guard
Offload backup operations to standby database

• Both disk and tape based backups can be performed


from the physical standby Data Guard environment
• Offloads the backup to the standby environment
• Reduce backup times with fast incremental backups
• Eliminate impact to the primary environment
• Additional Data Guard benefits
• Auto block repair with zero impact on application
• Offload reads and reporting, backups, and testing
• Used for planned maintenance and rolling database upgrade
• Used for disaster recovery or high availability with Data Guard
Fast-Start Failover
Data Guard & the Database Machine
Data Guard Best Practices

• Oracle Data Guard: Disaster Recovery for Sun Oracle


Database Machine and Exadata

• http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_dr_dbm.pdf
<Insert Picture Here>

Best practices for data loading
Module Agenda

• Data loading and Exadata <Insert Picture Here>

• Oracle Database File System (DBFS)


• DBFS performance expectations
• Configuration and implementation
<Insert Picture Here>

Data loading and


Exadata
Data loading
Definitions
• External tables
• Allow flat files to be accessed via SQL and PL/SQL as if they were tables
• Enable complex data transformations and data cleansing to
  occur 'on the fly'
• Avoids space wastage
• Direct Path loads in parallel
• Bypasses buffer cache and writes data directly to disk via multi-
block async I/O
• Use parallelism to speed up the load
• Remember to use ALTER SESSION ENABLE PARALLEL DML
• Range partitioning
• Enables partition exchange loads, with minimal service
interruption
Data loading
Exadata challenges
• The optimal method for loading a data warehouse is using
external tables.
• The Oracle Database Machine consists of a scalable storage grid
and a scalable database grid with Real Application Clusters.
• You can't run a cluster-parallelized SQL statement against an
  external table unless it resides in shared storage.
• You can't maximize the throughput of the database grid of the
  Oracle Database Machine with a simple single-headed NFS filer
  or other such bottlenecked solution.
<Insert Picture Here>

Oracle Database File


System
Oracle Database File System (DBFS)
Architecture Overview
• FUSE (Filesystem in Userspace)
  • An API and Linux kernel module used to implement Linux
    file systems in user space
Oracle Database File System (DBFS)
Architecture Overview
• DBFS is a file system interface on top of SecureFiles
• The DBFS Content Repository implements a file server
  • A PL/SQL package implements the file calls:
    create, open, read, write, list, etc.
• Files are stored as SecureFile LOBs
• Directories and file metadata are stored in tables and indexes
• On the client host, the application calls dbfs_client (OCI),
  which talks to the DBFS instance over SQL*Net
Oracle Database File System (DBFS)
Architecture Overview
• Combining DBFS with FUSE offers mountable file systems
• With RAC, DBFS is a scalable, distributed file system
<Insert Picture Here>

DBFS performance
expectations
Oracle Database File System (DBFS)
Performance expectations
• Full Rack Oracle Database Machine can load a little more
than 5TB/h (V2)
• Data staged In DBFS
• DBFS tablespaces reside on same disks as data warehouse
tablespaces
• Data loaded into normal redundancy ASM Disk Group so double
writes
• Total I/O == 15.6TB/h or 4.4 GB/s
Oracle Database File System (DBFS)
Performance expectations
• 4.4 GB/sec total:
  • 1.5 GB/s flowing from a file system housed in one Oracle database
  • 2.9 GB/s of writes (ASM normal redundancy)
• Could you achieve the same outside of DBFS on the Database
  Machine?
  • Supplying 1.5 GB/s would take 13 active line-rate GbE paths, or
  • 2 active IB paths with NFS via TCP over IB from a high-end NAS
    device
• DBFS solves the problem without any additional resources
outside the rack
<Insert Picture Here>

Configuration and
execution
Configuration
DBFS
• House DBFS in a dedicated database
• Use DBCA with OLTP template to create database
• AMM or ASMM are fine…prefer ASMM though
• 8GB SGA buffer pool, 1GB shared pool
• Redo logs should be at least 2GB
• Create bigfile tablespace for the file system (8K, 16K
blocksize)
• Create a DBFS user (e.g., dbfs identified by dbfs)
• Grant create session, create table, create procedure and
dbfs_role to DBFS user
• Grant quota unlimited on the DBFS tablespace to DBFS user
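A minimal SQL sketch of that setup, assuming an ASM disk group named +DBFS_DG and a user dbfs/dbfs; sizes and names are illustrative:

SQL> CREATE BIGFILE TABLESPACE dbfs_ts
     DATAFILE '+DBFS_DG' SIZE 100G AUTOEXTEND ON NEXT 8G;
SQL> CREATE USER dbfs IDENTIFIED BY dbfs
     DEFAULT TABLESPACE dbfs_ts QUOTA UNLIMITED ON dbfs_ts;
SQL> GRANT CREATE SESSION, CREATE TABLE, CREATE PROCEDURE, dbfs_role TO dbfs;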
DBFS
Implementation
• Create the DBFS file system
  • cd to $ORACLE_HOME/rdbms/admin
  • Start SQL*Plus as the DBFS user and run:
    SQL> @dbfs_create_filesystem_advanced.sql <TS Name> <FS Name> nocompress nodeduplicate noencrypt non-partition
• Mount the file system
  $ nohup $ORACLE_HOME/bin/dbfs_client dbfs@ -o allow_root,direct_io /data < passwd.txt &
DBFS
Implementation
• Move flat files to DBFS using FTP, SCP
• Define external tables with CREATE command
• You can move compressed files to save network bandwidth
• Use the PREPROCESSOR directive to decompress the files when the external table is read
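A hedged sketch of an external table over a gzip-compressed file staged in DBFS; the directory objects (dbfs_dir pointing into the DBFS mount, exec_dir containing zcat), the columns and the file name are all illustrative:

SQL> CREATE TABLE ext_sales (
       sale_id    NUMBER,
       sale_date  DATE,
       amount     NUMBER )
     ORGANIZATION EXTERNAL (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY dbfs_dir
       ACCESS PARAMETERS (
         RECORDS DELIMITED BY NEWLINE
         PREPROCESSOR exec_dir:'zcat'
         FIELDS TERMINATED BY ',' )
       LOCATION ('sales_2008.csv.gz') )
     PARALLEL
     REJECT LIMIT UNLIMITED;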
Data loading best practices
External Tables
• Full usage of SQL capabilities directly on the data
• Automatic use of parallel capabilities (just like a table)
• No need to stage the data again
• Better allocation of space when storing data
• High watermark brokering
• Additional capabilities
• Optional sorting at load time (think improved compression)
Data loading best practices
Direct Path loads
• Data is written directly to the database storage using
multiple blocks per I/O request using asynchronous
writes
• Data bypasses buffer caches
• A CTAS command always uses direct path
• An INSERT AS SELECT needs an APPEND hint to
go direct

INSERT /*+ APPEND */ INTO sales PARTITION (p2)
SELECT * FROM ext_tab_for_sales_data;
Data loading best practices
Parallelize the load

• Specify parallel attribute either with hint or in both


table definitions
• CTAS will go parallel automatically when DOP is
specified
• IAS will not automatically parallelize
• Needs parallel DML to be enabled

ALTER SESSION ENABLE PARALLEL DML;
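For instance, a hedged sketch contrasting the two cases; the DOP of 16 and the table names are illustrative:

SQL> CREATE TABLE tmp_sales PARALLEL 16 AS
     SELECT * FROM ext_tab_for_sales_data;        -- CTAS parallelizes automatically

SQL> ALTER SESSION ENABLE PARALLEL DML;
SQL> INSERT /*+ APPEND */ INTO sales
     SELECT * FROM ext_tab_for_sales_data;        -- IAS needs parallel DML enabled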


Data loading best practices
Partition exchange
1. Create an external table for the flat files
2. Use a CTAS command to create the non-partitioned table TMP_SALES
3. Create indexes on TMP_SALES
4. ALTER TABLE sales EXCHANGE PARTITION may_24_2008 WITH TABLE tmp_sales
5. Gather statistics
The SALES table (partitioned by day: May 18th 2008 ... May 24th 2008)
now contains all of the data.
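A hedged SQL sketch of the exchange itself; the table, partition and index names are illustrative:

SQL> CREATE TABLE tmp_sales PARALLEL AS
     SELECT * FROM ext_tab_for_sales_data;
SQL> CREATE INDEX tmp_sales_ix ON tmp_sales (sale_id) PARALLEL;
SQL> ALTER TABLE sales EXCHANGE PARTITION may_24_2008
     WITH TABLE tmp_sales INCLUDING INDEXES WITHOUT VALIDATION;
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES', -
       partname => 'MAY_24_2008', granularity => 'PARTITION')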
<Insert Picture Here>

Receiving and injecting


Staging flat files
Receiving files
Provider System
SCP,FTP, etc

RAC Node | DBFS Client RAC Node | DBFS Client RAC Node | DBFS Client
DBFS Instance 1 DBFS Instance 2 DBFS Instance 3
/data/FS1 /data/FS1 /data/FS1 …

Oracle Database

DBFS
Content Repository: FS1

Oracle Database Machine


Staging flat files
Injecting files
• Provider systems must support the FUSE client
  • Linux x64, Linux x32, Solaris x64, Solaris Sparc, HP-UX PA-
    RISC64, HP-UX IA64, AIX PPC64
• The dbfs_client executable can be used on the provider
  system to "inject" data into the DBFS repository
• Eliminates the need to mount the DBFS file system just to stage
files
• Required libraries:
• libfuse.so.2, libclntsh.so.11.1, libnnz11.so
• Very network-efficient
• Tremendous relief in Database Machine processor utilization
• Customers may choose TCPS protocol (SQL*Net)
Injecting data with dbfs_client
Architectural Overview
$ ./dbfs_client dbfs@DBFS1 --command cp \
    /data/stage1/all_card_trans.ul \
    dbfs:/FS1/stage1/all_card_trans.ul

RAC Node
Provider System DBFS Instance 1 …
SQL*Net
dbfs_client executable (OCI)
Oracle Database

DBFS
Content Repository: FS1

Oracle Database Machine


Injecting data with dbfs_client
Performance comparison

• Without any tuning, injecting data into the DBFS repository


from the Provider System via TCP over Gigabit Ethernet is
nearly 80% more efficient than scp+ssh

SCP vs dbfs_client "injection" (Gigabit Ethernet, untuned):
                 % CPU      MB/s
scp                46         99
dbfs_client        10        107
<Insert Picture Here>

Consolidation of mixed workloads
Module Agenda

• Consolidation challenges and questions <Insert Picture Here>

• Consolidation configuration options


<Insert Picture Here>

Consolidation challenges
Typical consolidation challenges

• Packaged applications
• Schema name collisions
• Different SLAs
• 24/7 versus 8/5
• Daytime (<2 secs) vs Night (batch only)
• Workload types: OLTP, DW, hybrid
• Sizing for availability
• Predictable response times
• Application tier scalability
Key question
Consolidation of mixed workloads

• Can you mix workloads?


• Should you mix workloads?
• Can <> Should
• We also allow you to partition a table such that each row is in a
separate partition.
Mixed Workloads
Should you consolidate?
• Mixed workloads can be consolidated when one or more
of these exist:
• Excess capacity
• Inverted profiles
• Clear workload priorities
• Not good consolidation candidates when (for example):
• SLAs are incompatible
• Cannot use tools and techniques to provide separation
• IORM can't be employed effectively (e.g. flash scans)
• No substitute for real testing
• Not yet enough field experience to derive best practices
<Insert Picture Here>

Consolidation
configuration options
High-level consolidation options
Database
• Single RAC database
• Place all schemas in one database
• Pro:
• Better resource control
• Less overhead
• Focus on one database's performance and management
• Con:
• One set of instance-level params
• Outage affects all tenants
• Migration to single database can be challenging, since they
were separate for a reason
High-level consolidation options
Database
• Multiple RAC databases
• Move databases with minimal changes
• Pro:
• Flexibility for different params and versions/patches
• Simple platform migration
• Security and isolation more easily achieved
• Con:
• Resource control more difficult
• More moving parts to manage
• Most common choice
• RAC One Node
• Single-instance databases, one cluster
High-level consolidation options
Storage
• Single diskgroup, all cells
• Stripe all data across all cells (DBFS, DATA, RECO only)
• Pro:
• Maximum throughput/bandwidth
• Centralized resource control
• Less overhead
• Simpler management
• Con:
• Loss of two cells (normal redundancy) may cause an outage
• Recommended option
High-level consolidation options
Storage
• Segregate groups of cells
• Isolated environments, more simultaneous failures tolerated
• Pro:
• Can sustain more simultaneous failures (potentially)
• Little chance of one database impacting other
• Con:
• Reduced throughput/bandwidth
• Management overhead
• Fewer CPUs to operate on decompression
• Sizing such environments is difficult, especially performance
High-level consolidation options
Storage
If you are going to run mixed workloads successfully on
one Exadata system, one of the following has to be
true:
1.All priority workloads are OLTP.
2.All priority workloads are data warehouse.
3.The Exadata Smart Flash Cache is mainly used by
OLTP workloads.
1. KEEP attribute on objects
2. Flash disks
High level consolidation options
Exadata Smart Flash Cache considerations

• Size KEEP objects to fit in cache


• No software limits; up to you to size properly to fit physical limits
• Up to 80% of total flash cache on each cell can hold KEEP objects,
20% always reserved for "hot" objects.
• Keeping too much ~ no KEEP at all
• If SUM(bytes of KEEP objects) > (80% of SUM(bytes of ESFC)),
cannot depend on an object being in ESFC
• Overall recommendation is to start with no KEEP objects
• Currently, IORM does not help manage which objects are
cached
• Flash scans will also read from disk while reading from flash
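For example, pinning and un-pinning objects in the Exadata Smart Flash Cache is a segment storage attribute; the table names below are illustrative:

SQL> ALTER TABLE hot_orders  STORAGE (CELL_FLASH_CACHE KEEP);
SQL> ALTER TABLE big_history STORAGE (CELL_FLASH_CACHE DEFAULT);   -- revert to normal caching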
High level consolidation options
Backup and DR considerations
• Backups
• Scheduling may require adjustment
• Currently dedicated hardware may need replacement or evaluation
• Reconciling differing SLAs if merging into single database
• Many databases: backup concurrency needs to be considered
• Disaster Recovery
• Disaster could impact all applications on consolidated platform,
which may cause conflicts in SLAs.
• Is DR site using Exadata? (EHCC considerations)
High level consolidation options
The only way to *know*: Test
• Best: Real Application Testing
• Real database load from actual recorded load
• Before and after statistics compare directly
• Note: Shared servers or connection pools may make Database
Replay difficult or impossible.
• Not Bad: Load testing tools
• Challenge: simulate production workload mix
• Expensive, difficult to implement
• Not Bad: Parallel production run, real users, real load
• Challenge: simulate real production workload mix
• Subjective user feedback: "slower" or "faster" or "crashed"
• Unfortunate: See if it "works"
• Too common
• Usually done under the "it should work" notion
<Insert Picture Here>

Consolidation tools
Oracle features
Tools for successful consolidation

• Database Resource Manager (DBRM)


• I/O Resource Manager (IORM)
• Instance caging
• RAC
• Services
• Database server pools
• RAC One Node
Oracle features for consolidation
Database Resource Manager
• Allows control over:
• How CPU is shared among multiple applications
• Maximum CPU utilization of an application
• Manage runaway queries (based on execution time estimate)
• Degree of parallelism
• Multiple levels of prioritization
• Consumer groups are based on services, username, and
other session attributes
• Available to all Enterprise Edition databases
• Allocation scheme (resource plan) changes are dynamic
• Can also include I/O Resource Manager
Oracle features for consolidation
Database Resource Manager
• I/O Resource Manager
• Inter-database I/O resource management (via storage cell config)
• Intra-database I/O resource management (via DBRM config)
• When using categories, can provide additional granularity
• Instance caging at the server level
• Provides a way to limit the amount of CPU an instance can use
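A hedged sketch of these mechanisms working together; the core count, plan name, database names and allocation percentages are illustrative only:

-- Instance caging: cap this instance at 4 cores and activate a resource plan
SQL> ALTER SYSTEM SET cpu_count = 4;
SQL> ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';

-- Inter-database IORM plan, configured on each storage cell
CellCLI> ALTER IORMPLAN dbplan=((name=sales, level=1, allocation=70), -
                                (name=hr, level=1, allocation=30), -
                                (name=other, level=2, allocation=100))
CellCLI> ALTER IORMPLAN active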
Oracle features for consolidation
RAC and related features
• Scalability and availability
• Oracle Services
• Enables workload management and workload placement
• Parallel servers follow service placement
• Services can be designated by
• Application
• Group of users (DBRM)
• Workload type
• Combination of these
Oracle features for consolidation
RAC and related features
• Database server pools
• Service associated with server pool
• One server (SINGLETON) or all (UNIFORM)
• Server pools have minimum and maximum number of servers
• Server pools have priorities
• High priority server pool can grab servers from lower priority server
pools when required
• A method to implement different SLAs
• Quality of Service (QoS) with server pools (coming in 11.2.0.2)
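As an illustration, a hedged srvctl sketch of a policy-managed setup; the pool name, sizes, importance and service names are assumptions:

$ srvctl add srvpool -g oltp_pool -l 2 -u 4 -i 10       # min 2, max 4 servers, importance 10
$ srvctl add service -d dbm -s oltp  -g oltp_pool -c UNIFORM
$ srvctl add service -d dbm -s batch -g oltp_pool -c SINGLETON
$ srvctl start service -d dbm -s oltp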
Oracle features for consolidation
RAC and related features
• Quality of Service management
• Identify existing server pools to manage
• Define Performance Classes based upon workloads
• Associate Performance Classes to databases services
• Map Performance Classes to SLAs
• Create Performance Policies
• Rank Performance Classes to map to SLA priorities
• Set a Performance Objective per Performance Class
• When Performance Objective is not met for a Performance Class
• Identifies bottlenecked resource and sends alert.
• If CPU is bottleneck,
• Adjusts CPU shares through DBRM
• Increases size of server pool within SP Override constraints.
• Maintains performance and audit record.
Oracle features for consolidation
RAC and related features
• RAC One Node
• Enables seamless single-instance failover utilizing RAC features
• Allows single instances to utilize Exadata features
Oracle features for consolidation
Options for database CPU provisioning

Feature          Non-RAC  RAC  Granularity      Granularity      Change scope   Active before      I/O
                               (measure)        (manage)                        oversubscription
DBRM             Y        Y    % of total CPUs  % of total CPUs  Resource plan  N                  Y (without ma…)
Instance caging  Y        Y    Core             Core             Resource plan  Y                  N
Services         N        Y    Server           Server           Server         Y                  N
Server pools     N        Y    Server           Server           Server group   Y                  N

Note: Database Resource Manager does more than provision CPU.


<Insert Picture Here>

Consolidation sizing
Sizing for consolidation
Considerations
• Cumulative resource requirements
• Utilize AWR to determine current requirements
• Sizing presentation describes sizing in general
• Database Machines will probably have
• Faster CPUs
• Faster storage – more IOPS
• Greater network bandwidth – more MBs/second
• Reductions in amount of data moved to CPU
<Insert Picture Here>

Sizing for the Database Machine
Module Agenda

• Sizing challenges <Insert Picture Here>

• Sizing options
• Comparative sizing method
<Insert Picture Here>

Sizing challenges
Sizing challenges
What's the big deal?
• One of three answers –
• Quarter
• Half
• Full
• A simple answer does not imply a simple process for
arriving at the answer
Sizing challenges
Issues
• Capacity sizing is simple
• A single number for each resource category
• Workload sizing is complex
• Reflects cumulative amounts of resource consumption across a
broad range of heterogeneous database operations
• Real world workload sizing is even more complex
• Interaction of workload demands, both average and peak, over
time against resources
• Additional resource demands stem from the interactions
  between different workloads
Sizing challenges
Impact of sizing decision
• System sizing drives solution price
• Under-sizing will reduce price & under-cut competitors
• Under-sizing will reduce pricing/discount pressure
• Under-sizing will result in business impact
• Impact can be MANY TIMES greater than system price
• Example: $60,000 under-investment ==> $1M+ business loss
• Can result in serious customer satisfaction issues or even
lawsuits
Sizing challenges
Impact of sizing decision
• Process is key to customer satisfaction
• Need a defined & documented process
• Process must produce the same results given same inputs
• Need to retain historical sizing documents
• Accuracy & transparency
• Must be reasonably accurate (often a RANGE of sizes)
• Must be able to explain the process (transparency)
<Insert Picture Here>

Sizing options
Sizing processes
Analytic processes
• Comparative sizing
• System refresh
• System replacement
• Competitive
• Predictive sizing
• New application deployments
• Depend on accurate metrics of workload and real-world
comparisons
• Use predictive sizing to check comparative approach
• Hybrid approach
• Very scalable method for sizing customer systems
• Produces relatively accurate result
Sizing processes
Benchmark-based sizing

• Involves building and executing a benchmark or POC


• Enormous amount of work of questionable value
• Needs to simulate production data & volumes
• Needs to simulate end-user workloads
• Accuracy based on accuracy of simulation
• Many (most?) POCs do not properly model data & workloads
• Mocked-up data often includes improperly skewed data
• Not a scalable process for sizing thousands of
systems
• Benchmarks provide valuable data to fine-tune
analytic sizing
Sizing approach
Exadata considerations
• Database Machine implementation is unique
• Sum of Exadata features means much higher effective I/O
throughput
• I/O configuration is balanced and predetermined by CPU
count
• Can scale up (larger/more Database Machines) or out (more
Exadata Storage Servers), but minimum I/O bandwidth is fixed
• Memory is balanced and predetermined by CPU count
• Not enough data for real world workloads and
configurations to go with predictive approach
• The first phase of Exadata sizing approach focuses on
comparative approach built around database server CPUs
Comparative sizing
Real world needs
• Near-term, most Exadata deployments will be system
  replacements; an "80/20" rule applies.
• In order of increasing complexity and risk:
  • Sizing for system replacement (comparative sizing)
  • Sizing for DBMS migration (either or both approaches)
  • New application deployments (predictive sizing)
<Insert Picture Here>

Comparative sizing
method
Comparative Sizing
Steps

1. Gather inputs from customer


2. Perform DB tier comparison
3. Validate storage requirements
4. Validate current system utilization
5. Evaluate growth projections
6. Quantify the resulting Exadata benefits
7. Conduct read-out & provide report to customer
8. Post-production follow-up
Comparative sizing
Gather inputs
Gather inputs — existing configuration
  DBMS & Ver.     Only Oracle is supported in V1 of the tool
  DBMS options    RAC, Secure Files, OLAP, etc.
  O/S             Windows, Linux, Unix
  Server H/W      Vendor & model (HP, Dell, Sun, IBM, etc.)
  Cluster Conf.   Number of nodes (symmetric/asymmetric)
  CPU             CPU model, speed, cores & number
  SAN/Disk        DAS, SAN, NAS, vendor, model, speeds
  DB Storage      ASM, OCFS2, VCFS, etc.
  Utilization     AWR or other helpful (not mandatory)
  Customer Pain   Business and/or technical pains
  Perf. KPIs      Customer measures of performance


Comparative sizing
DB tier CPU comparison
• Compare the existing configuration against Exadata using
  M-Values, SPECint, development benchmarks and POC results
  to define the comparison metrics and process (X4170 server)
• Server-only sizing at this stage
  • Equivalent sizing only; not yet sized for growth,
    performance, etc.
Comparative sizing
CPU comparisons
• Find the type, speed, and number of CPU cores of the
system that the Database Machine is competing against
or replacing
• Use SPECint comparisons to find the equivalent number
of Database Machine cores needed
• Adjust number of cores upwards if database/application moving
from single instance to RAC
• Adjust number of cores downwards, if competing or replacing
slower CPUs than shown in the tables (the table lists the best
case)
• Pick the size of the Database Machine (Quarter, Half, Full,
  Multiple Racks) that's closest to the number of cores
  needed
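As a hedged worked example (assuming V2 database-grid core counts of 16 per Quarter Rack, 32 per Half Rack and 64 per Full Rack):

Existing system: Sun E25K with 144 UltraSparc IV+ cores at 8.5 SPECint_rate per core
Equivalent Database Machine cores = 144 x (8.5 / 26.6) = 144 x 0.32 = ~46 cores
46 cores falls between a Half Rack (32 cores) and a Full Rack (64 cores);
adjust for RAC, measured utilization and growth before choosing the size.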
Sun Sparc – SPECint Comparison
System     Processor                        CINT2006_rates                  Equivalent Database
                                                                            Machine Cores
M-Series   Sparc64 VII (2.75 GHz)           49.1 / 4 cores   = 12.3/core    0.45
M-Series   Sparc64 VI (2.4 GHz)             352 / 32 cores   = 11/core      0.40
E25K       UltraSparc IV+ (1.95 GHz)        1230 / 144 cores = 8.5/core     0.32
V890       UltraSparc IV+ (2.1 GHz)         154 / 16 cores   = 9.6/core     0.35
T5xxx      UltraSparc T2 Plus (1.6 GHz)     97 / 8 cores     = 12.125/core  0.45

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
IBM Power – SPECint Comparison
System     Processor                        CINT2006_rates                 Equivalent Database
                                                                           Machine Cores
pSeries    Power7 Eight-Core (3.86 GHz)     652 / 16 cores  = 40.8/core    1.53
pSeries    Power6 Dual-Core (5.0 GHz)       2180 / 64 cores = 34/core      1.28

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
Note: IBM does have faster per-core numbers for the Power7, but these are based on a
quad-core version where they effectively plug in an 8-core chip, turn off 4 cores and
run the remaining 4 cores at a faster speed. This is a benchmark special and not cost
effective for the customer.
HP Itanium – SPECint Comparison
System      Processor                           CINT2006_rates                  Equivalent Database
                                                                                Machine Cores
Integrity   Itanium Quad-Core 9350 (1.73 GHz)   134 / 8 cores    = 16.75/core   0.63
Integrity   Itanium Dual-Core 9050 (1.6 GHz)    53.9 / 4 cores   = 13.5/core    0.50
Superdome   Intel Itanium 2 (1.66 GHz)          1650 / 128 cores = 12.9/core    0.48

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
AMD Opteron – SPECint Comparison
System         Processor                          CINT2006_rates                Equivalent Database
                                                                                Machine Cores
HP DL185 G5    Opteron Dual-Core 2222 (3.0 GHz)   61 / 4 cores   = 15.25/core   0.57
HP DL385 G5p   Opteron Quad-Core 2389 (2.9 GHz)   143 / 8 cores  = 17.9/core    0.67
HP DL585 G5    Opteron Six-Core 8439 (2.8 GHz)    416 / 24 cores = 17.3/core    0.65
HP DL385 G7    Opteron 12-Core 6176 (2.3 GHz)     398 / 24 cores = 16.6/core    0.63

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
Intel – SPECint Comparison
System         Processor                          CINT2006_rates                Equivalent Database
                                                                                Machine Cores
HP DL380 G5    Xeon Dual-Core X5270 (3.5 GHz)     90.7 / 4 cores = 22.7/core    0.85
HP DL380 G5    Xeon Quad-Core X5365 (3.0 GHz)     116 / 8 cores  = 14.5/core    0.55
HP DL360 G5    Xeon Quad-Core X5470 (3.33 GHz)    150 / 8 cores  = 18.75/core   0.70
HP DL360 G6    Xeon Quad-Core X5570 (2.93 GHz)    251 / 8 cores  = 31.4/core    1.18

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
Comparative sizing
Validate storage requirements
• Disk capacity is fixed by the rack size; compare need vs capacity (SAS is assumed)
• Compression: assume not more than 2X with Advanced Compression (not HCC)
• SAS vs SATA: assume SAS unless there is a demand for SATA
• Expansion cabinet: must justify why extra cells would be acceptable
• Output: DB node & storage sizing
Comparative sizing
Validate CPU utilization and growth potential
• Automated data collection (AWR, sar, etc.) of utilization would be
  ideal, but cannot be collected in all cases
• Key peak statistics: % CPU busy, % memory utilization, IOPS, MBPS,
  data growth, process growth
• Adjust the sizing for these statistics, and possibly for some
  performance KPIs, to arrive at the final sizing for growth
Comparative sizing
Quantify Exadata benefits

• Gather inputs: server (CPU), disk size, utilization, growth, X-factor
• Customer pains: performance KPIs, technology adoption, integration, business pain, IT/tech pain
• Map Exadata advantages to those pains – not a single number – this justifies the purchase
• Map to Exadata metrics
  • Quantifiable metrics, e.g. I/O bandwidth, IOPS, offload %
  • Subjective advantages, e.g. ease/speed of deployment
• Process flow: gather inputs → report → feedback
Migrating to the Database Machine
Module Agenda

• Initial considerations

• Migration strategies
• Migration methods overview
• Physical migration
• Logical migration
• Migration methods in practice
• Bulk data movement
Initial considerations
Database Machine software
Considerations

• Exadata Storage Server software and Oracle Database


• Versions must match
• Cannot run 11.2 Exadata with 11.1 Database (or vice versa)
• 11.1 and 11.2 cannot coexist on same machine
• Important consideration for migration from v1 to v2
• Sun hardware 11.2 only
• HP hardware either 11.1 or 11.2
• Operating system
• Oracle Enterprise Linux (OEL5) Linux x86_64
• Little endian format
Migration strategies
Migration strategy
Migration method considerations
• Determine what to migrate
• Because of Exadata's unique features (e.g. Smart Scan), expect differences between the source and Exadata warehouse databases
• Fewer indexes, fewer materialized views, potentially different
partitioning strategy, compression
• Avoid methods that migrate what you will discard
• Consider configuration of source system
• Not all migration methods available for all source environments
• Non-Oracle: Not covered in this presentation, although many
methods work if you take into consideration platform differences
• Oracle: Source database version and platform matters
• Target system fixed: 11.2, ASM, Linux x86-64
Migration strategy
Migration method considerations

• Implement best practices


• Will the migration method accommodate best practices?
Examples
• Large extents (8MB) for large segments – set at extent allocation time
• Don't consider the migration method in isolation – avoid methods that prevent proper best practices

• Minimize downtime
• Yes, but implementing best practices is more important (your
future performance depends on it)
Migration methods
overview
Migration methods
Overview

Physical migration
• Data remains in datafiles (block-for-block)
• Most methods are whole database migration
• Generally more restrictive

Logical migration
• Data unloaded from source, loaded into Exadata database w/ SQL
• Easier to migrate a subset
• Easier to implement structural best practices
• Generally less restrictive
http://www.oracle.com/technology/products/bi/db/exadata/pdf/migration-to-exadata-whitepaper.pdf
Migration methods
Migration method choice

• No single best method for all cases, but in general …

Data Warehouse
• Typical strategy
  • Change structure – reduce / remove indexes, MVs
  • Change storage – use new compression (EHCC), optimize extent sizing
  • Change platform – source big endian
• Migration method choice
  • 1st: Logical
  • 2nd: Physical

OLTP
• Typical strategy
  • Structure intact
• Migration method choice
  • 1st: Physical
  • 2nd: Logical
Physical migration
Physical migration
Basics

• Data remains in datafiles (block-for-block)


• Database extent sizes remain the same
• Most methods perform whole database migration
(except TTS)
• Inherit legacy database configuration
• indexes, MVs, no compression
• Stricter requirements
• Platform and version changes restricted
Physical migration
Challenges

• Best practices challenged


• Suboptimal sizing
• Migrate unnecessary objects
• Objects can be recreated post migration, but
• Why not use logical method in the first place?
Physical migration
Methods at a glance

• Physical standby
• Transportable database (TDB)
• Transportable tablespaces (TTS)

If best practices not already implemented on source


database, consider logical migration method
Physical migration methods
Physical standby

• Overview (Note 1055938.1)


• Create physical standby on DBM
• Data Guard switchover
• Source system criteria
• 11.2 on Linux (or Windows – see Note 413484.1)
• Use this method when migrating from an HP DBM running 11.2
• Outage time
• Data Guard switchover
• Consider
• Archivelog mode and LOGGING required
• New DB_UNIQUE_NAME needed
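As a minimal sketch (assuming a Data Guard broker configuration; the connect string, password and standby name are placeholders), the final switchover to the standby built on the Database Machine could look like:

    # dbm_prod is the DB_UNIQUE_NAME of the standby created on the DBM.
    dgmgrl sys/oracle@srcdb <<'EOF'
    SHOW CONFIGURATION;
    SWITCHOVER TO 'dbm_prod';
    EOF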
Physical migration methods
Physical standby plus database upgrade
• Overview (Note 1055938.1)
• Create physical standby on DBM
• Apply archives
• Activate standby
• Run database upgrade scripts
• Source system criteria
• 11.1+ on Linux
• Outage time
• Time to apply archives + run database upgrade scripts
• Consider
• Archivelog mode and LOGGING required
• New DB_UNIQUE_NAME needed
Physical migration methods
Transportable database (TDB)
• Overview
• RMAN CONVERT DATABASE
• Transfer datafiles to Exadata storage
• CONVERT subset of datafiles, as required (up to 2GB/s) (Note:732053.1)
• Run transport script
• Source system criteria
• 11.2 on little endian
• Outage time
• Transfer all datafiles + partial CONVERT + transport script
• Consider
• Do not use source system conversion
• Staging space requirement – size of files that need CONVERT
• OLAP AWs need special consideration (Note 352306.1)
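A rough sketch of the TDB conversion step (paths and the database name are illustrative; per the guidance above, the actual file conversion is run on the destination, not the source):

    # Run against the source database opened read only.
    rman target / <<'EOF'
    CONVERT DATABASE ON DESTINATION PLATFORM
      CONVERT SCRIPT   '/stage/convert_dbm.rman'
      TRANSPORT SCRIPT '/stage/transport_dbm.sql'
      NEW DATABASE 'dbm'
      FORMAT '/stage/%U';
    EOF
    # Transfer the datafiles to Exadata storage, run the convert script there
    # for the files that require it, then finish with the transport script.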
Physical migration methods
Transportable tablespace (TTS)
• Overview
• Build empty 11.2 Exadata database
• TTS export source system metadata
• Transfer files to Exadata (CONVERT if source system big endian)
• TTS import metadata into Exadata database
• Source system criteria
• 10.1 or later, any endian
• Outage time
• TTS export + Transfer files + CONVERT (if necessary) + TTS import
• Consider
• If source system big endian, CONVERT on source system
• Staging space requirement - size of files that need CONVERT
• OLAP AWs need special consideration (Note 352306.1)
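The TTS steps above might be sketched as follows (tablespace, directory and file names are placeholders; the RMAN CONVERT step applies only when the source is big endian; password prompts are omitted):

    # Source: make the tablespace read only and export its metadata.
    sqlplus / as sysdba <<< "ALTER TABLESPACE dw_data READ ONLY;"
    expdp system DIRECTORY=stage_dir DUMPFILE=tts_meta.dmp \
          TRANSPORT_TABLESPACES=dw_data TRANSPORT_FULL_CHECK=y

    # Big-endian source only: convert the datafiles for Linux x86-64.
    rman target / <<'EOF'
    CONVERT TABLESPACE dw_data
      TO PLATFORM 'Linux x86 64-bit'
      FORMAT '/stage/%U';
    EOF

    # Exadata: plug the transferred files into the empty 11.2 database.
    impdp system DIRECTORY=stage_dir DUMPFILE=tts_meta.dmp \
          TRANSPORT_DATAFILES='+DATA/dbm/datafile/dw_data_01.dbf'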
Physical migration
Method selection

Method                       When to use
Physical standby             Linux source on 11.2, archiving and LOGGING
Transportable database       Little endian source on 11.2
Transportable tablespaces    Big endian source >= 10.1
                             Little endian source >= 10.1, < 11.2
Logical migration
Migration methods
Logical migration

• Data unloaded from source, loaded into Exadata


database w/ SQL
• Move only the user data
• Best practices can be added
• 4MB ASM AU size set for new disk groups
• Large extents (8MB) for large database segments
• Table compression, if desired
• Partitioning (added or changed), if desired
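A minimal sketch of how these best practices might be put in place on the target before loading (the disk group, tablespace and size values are placeholders, not a prescribed configuration):

    # 4MB ASM allocation units for the new disk group.
    sqlplus / as sysasm <<'EOF'
    CREATE DISKGROUP DATA NORMAL REDUNDANCY
      DISK 'o/*/DATA*'
      ATTRIBUTE 'au_size' = '4M',
                'compatible.asm'   = '11.2.0.0.0',
                'compatible.rdbms' = '11.2.0.0.0';
    EOF
    # 8MB uniform extents for tablespaces holding large segments.
    sqlplus / as sysdba <<'EOF'
    CREATE BIGFILE TABLESPACE dw_data
      DATAFILE '+DATA' SIZE 500G AUTOEXTEND ON
      EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M;
    EOF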
Logical migration
Methods at a glance

• Logical standby
• GoldenGate / Streams
• Data Pump
• Create Table As Select (CTAS) or Insert As Select
(IAS)
Logical migration methods
Logical standby
• Overview
• Steps depend on starting point - See following slides
1. Source database 11.2
2. Source database < 11.2 (including HP DBM)
• Source system criteria
• Linux (check Note 413484.1 for cross-platform support)
• Outage time
• Typically Data Guard switchover + application failover
• Consider
• Archivelog mode, LOGGING, and supplemental logging required
• Data type support
• Can apply catch up?
Logical migration methods
Logical standby – source system 11.2
• Overview
• Create logical standby on 11.2 DBM
• Change table storage characteristics, as desired (Note:737460.1)
• Data Guard switchover
• When to use this method
• Table storage characteristics will be changed
• If not, use physical standby method
Logical migration
Logical standby – source system < 11.2

• Overview (Note 1055938.1)


• Create logical standby on source system (e.g. 11.1 HP DBM)
• Shutdown and copy logical standby + controlfile to 11.2 DBM
• RMAN: duplicate target database for standby from
active database
• Upgrade logical standby to 11.2 (run upgrade scripts
manually)
• Enable redo transport and standby apply to catch up
• Change table storage characteristics, as desired
(Note:737460.1)
• DG switchover
• When to use
• Table storage characteristics will be changed or
• Rolling database upgrade
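The "duplicate for standby from active database" step above could be sketched like this (connect strings and credentials are placeholders; the auxiliary instance on the DBM is assumed to be started NOMOUNT):

    rman <<'EOF'
    CONNECT TARGET sys/oracle@srcdb
    CONNECT AUXILIARY sys/oracle@dbm_stby
    DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE;
    EOF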
Logical migration
GoldenGate / Streams
• Overview
• Create and upgrade replica on DBM
• Stop apply
• Implement best practices on replica (e.g. unload, recreate, reload)
• Start apply to catch up
• Disconnect users from primary, reconnect to DBM
• Source system criteria
• 10.1+ on any platform (GoldenGate allows different DBMS, too)
• Outage time
• Application reconnection
• Consider
• Archivelog mode, LOGGING, and supplemental logging required
• Data type support
• Can apply catch up?
Logical migration
Data Pump

• Overview
• Create Exadata database
• Import user data into Exadata using Data Pump
• Network mode - Direct import from source via dblink
• Can result in large UNDO on target
• File mode - Export to dump file(s), transfer file(s), Import
• Source system criteria
• 10.1 or later on any platform
• Outage time
• Network mode - 1x data movement
• File mode - 3x data movement and 2x staging space
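A network-mode import as described above might be invoked roughly as follows (the database link, schema and degree of parallelism are placeholders):

    # Pulls data straight from the source over a database link – no dump files.
    impdp system NETWORK_LINK=src_link \
          SCHEMAS=dw_owner \
          EXCLUDE=INDEX EXCLUDE=MATERIALIZED_VIEW \
          PARALLEL=16 \
          LOGFILE=stage_dir:imp_dw.log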
Logical migration methods
CTAS / IAS
• Overview
• Create Exadata database
• CTAS or IAS
• From external tables in DBFS staging area
• From dblink to source database
• Source system criteria
• Any version or platform
• Outage time
• Significant (3x) variation depending on partitioning (and what
scheme), compression, target data type
• Consider
• Use DBFS for staging external tables, not local filesystem
• Dblink - Manually parallelize
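For example, an initial bulk load via CTAS from an external table staged on DBFS might look like this (table names, compression level and degree of parallelism are illustrative):

    sqlplus dw_owner <<'EOF'
    ALTER SESSION ENABLE PARALLEL DDL;
    -- CTAS from an external table over files staged in DBFS, adding EHCC
    -- compression and parallelism on the way in.
    CREATE TABLE sales
      COMPRESS FOR QUERY HIGH
      PARALLEL 16
      AS SELECT * FROM sales_ext;
    -- Or IAS over a database link (parallelize manually, per the note above):
    -- INSERT /*+ APPEND */ INTO sales SELECT * FROM sales@src_link;
    EOF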
Logical migration
Method selection

Method                        When to use
Logical standby               Rolling database upgrade requirement
                              Table storage characteristics will be changed
Oracle GoldenGate / Streams   Minimal downtime requirement
                              Different source platform
Data Pump                     Data type restriction with other methods
CTAS / IAS                    Initial bulk load
Migration methods in
practice
Migration methods
In practice

• Currently, most data warehouses are not on Linux x86-64 and not running 11g, so most physical methods are eliminated
• Most data warehouses replaced by Exadata are running
either Oracle on big-endian UNIX, or competitor (e.g. DB2,
Netezza, Teradata)
• Customers only want tables with user data in order to
implement new database configuration determined
during testing
Migration methods
In practice

• Most common methods used thus far


• Combination for staged migration
• CTAS/IAS or Data Pump for the initial bulk load into
Exadata while source remains in use
• Perform daily loads (external tables) into both source and
Exadata
• Initially users serviced by source database
• Move users over to Exadata
• Stop daily load into source
Migration Scenario
From 11.1 HP DBM

• Restriction
• RDBMS 11.1 cannot use Exadata 11.2
• RDBMS 11.2 cannot use Exadata 11.1
• Option #1 - Physical Standby + Database Upgrade
• Option #2 – Logical Standby source system < 11.2
• Reduce downtime – rolling database upgrade
Migration Scenario
From 10gR2 / 11gR1 on Big Endian

• Option #1 – Transportable Tablespaces


• Option #2 – Data Pump
• Implement best practices not in source database
• Option #3 – GoldenGate, Streams
• Reduce downtime
• Implement best practices not in source database
Migration Scenario
From 10gR2 / 11gR1 on Little Endian (non-DBM)
• Option #1 – Physical Standby + Database Upgrade
• Check Note 413484.1 for cross platform standby support
• Option #2 - Logical Standby source system < 11.2
• Reduce downtime – rolling database upgrade
• Check Note 413484.1 for cross platform standby support
• Option #3 - Data Pump
• Implement best practices not in source database
• No cross platform standby support
• Option #4 – GoldenGate, Streams
• Reduce downtime
• Implement best practices not in source database
Bulk data movement


Bulk data movement

• Performance criteria
• Network
• Protocol
• Source system
• Target system (i.e. DBM)

Note: Bulk data movement to the DB servers – you do


NOT move data directly to the storage – it always
goes through an instance on a DB server first.
Bulk data movement
Network

• 2 networks can get data to DB servers on DBM


• InfiniBand (IB) 4x QDR 40Gb/s per link
• Gigabit Ethernet (GbE) 1Gb/s
• eth1 and eth2 can be bonded for aggregation

• In practice, IB is about 20x faster than single GbE


• IB 2GB/s vs GbE 110MB/s for single connection (TCP)

Use IB network
Bulk data movement
Protocol
• TCP over IB (TCPoIB)
  • On the source system: use IP connected mode (CM) and set a large MTU (65520)
  • DBM DB servers are already configured
• RDS – only used by Oracle for RAC and storage traffic
• SDP – stick with TCP
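On a Linux source system, connected mode and the large MTU might be set along these lines (the interface name and persistence mechanism depend on the distribution; treat this as a sketch, not the supported procedure):

    # Switch the IPoIB interface to connected mode and raise the MTU.
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520
    # To persist on OEL/RHEL, e.g. in ifcfg-ib0: CONNECTED_MODE=yes, MTU=65520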
Bulk data movement
Protocol

• Oracle Net TCP


• Set SDU=32767
• Yields more efficient writes by Oracle Net to the socket buffer
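For example (a sketch; the DBM side is already configured, so this applies to the source system):

    # Set the default Oracle Net session data unit on the source system.
    cat >> $ORACLE_HOME/network/admin/sqlnet.ora <<'EOF'
    DEFAULT_SDU_SIZE=32767
    EOF
    # SDU can also be set per alias in the tnsnames.ora connect descriptor.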
Bulk data movement
Source system

• Source system
• I/O subsystem must deliver
• Fast IB network can't compensate for slow I/O
• CPU usage varies
• Data transfer with very fast networks can cause high CPU usage
• One CPU may be pegged while others have headroom (e.g.
interrupt handling)
• Use mpstat(1) to investigate
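For example, per-CPU utilization during a transfer can be watched with standard sysstat tooling:

    # Report per-processor statistics every 5 seconds; look for one CPU pegged
    # (often the one handling network interrupts) while the others are idle.
    mpstat -P ALL 5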
Bulk data movement
Target system

• Target system (DB servers of the Exadata system)


• ASM for staging
• Stored in Exadata
• Oracle-structured files only (e.g. data files, DP dump files)
• Excellent disk I/O throughput
• Oracle tool required to move data (DFT, ASMCMD CP, RMAN
BACKUP AS COPY AUXILIARY, XDB FTP)
• DFT 115 MB/s for single connection (use multiple to scale)
• Double (or triple) writes for ASM redundancy
• 600MB/s network rate translates to 1200+MB/s ASM write rate
Bulk data movement
Target system

• Target system (DB servers of the Exadata system)


• DBFS for staging (Note 1054431.1)
• File system in a database, using Exadata storage
• Standard OS tools
• Local disk file system for staging
• Do NOT use it for staging
• Not designed for performance
• Use DBFS – better performance, higher capacity, shared
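Mounting a DBFS store for staging might look roughly like this (user, service and mount point are placeholders; see Note 1054431.1 for the supported procedure):

    # Mount the DBFS file system on a DBM database server, then use normal OS
    # tools (cp, scp, external tables) against /dbfs_stage.
    mkdir -p /dbfs_stage
    nohup dbfs_client dbfs_user@dbm -o direct_io /dbfs_stage &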
Why you don't need a Database Machine
Module Agenda

• Magic?

• The goal
• Build your own database machine
• Do you need the Database Machine?
Magic?
Looking back

• At this point, you understand more about Exadata


technology (hopefully)
• Software
• 11g
• Exadata Storage Server Software
• Hardware
• Components
• Flash
• Balanced configuration
Is it magic?
Well, is it?
• No
• You could build a machine to deliver the same
performance
• As long as it can achieve the same proven throughput as a
Database Machine
The goal
The goal
Winter Corporation Exadata Proof of Concept

• Workload
• Execute 4 complex, concurrent queries
• Vast amounts of data, query I/O rate peaked at 14
GB/s and queries complete in 99 seconds
• Sun Oracle Database Machine completes these
queries in 48 seconds—without using any V2 software
features.
• 48 seconds == 20.8 GB/s
The goal
Don't forget!

• Remember –
• The Sun Oracle Database Machine is a balanced
configuration, so you must guarantee I/O
throughput capabilities in every section of the
machine you will build
• Database Machine also provides for balance
across components and high availability
Build your own Database Machine
Build your own Database Machine
Network bandwidth

• To achieve 20.8 GB/s
• 53 active 4GFC Fibre Channel paths
• Fibre Channel SAN arrays put disks in drawers
• Drawers are connected to the array controller with 4GFC FC cabling
• Need 53 drawers (maybe 26, depending on the array)
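The path count follows from simple arithmetic, assuming roughly 400 MB/s of usable bandwidth per active 4GFC path:

    echo "scale=1; 20.8 / 0.4" | bc   # = 52.0 -> ~53 active 4GFC paths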
Build your own Database Machine
Disk bandwidth

• To achieve 20.8 GB/s


• Exadata offers this 20.8 GB/s with 168 SAS disks.
• 15K RPM SAS disks are the same as 15K RPM FC drives
• You have to spread 168 disks over 53 drawers
Build your own Database Machine
Array controllers

• To achieve 20.8 GB/s


• ~3 disks per drawer == massive wasted cabinet space
• More drawers == more array controllers
Build your own Database Machine
HA for storage cabling and switches

• To achieve 20.8 GB/s


• 53 active paths need HA protection
• Dual port HBAs
• Now I need 106 runs of FC cabling from storage to switch and
106 from switch to hosts
• Ugh, I need multiple switches.
• Director/high end switches?
Build your own Database Machine
CPUs

• To achieve 20.8 GB/s
• And now the fun begins: the data has to get to a set of database hosts
• The V2 Database Machine has 8 2s8c16t Nehalem EP based servers. Great, that's easy. But Exadata does offload processing…hmmm…
• I need 39 CPUs just to match the Exadata offload processing
Build your own Database Machine
CPUs for offload processing

• To achieve 20.8 GB/s


• I also need the 8 database hosts (2s8c16t Nehalem EP) that
the Database Machine used
• So:
• If using 2s8c16t servers I need 103 cores…round up to 13
servers.
• OK, go make a 13 node RAC cluster and work out the 53
active Fibre Channel paths…all with balance!
Build your own Database Machine
Create and build the cluster

• To achieve 20.8 GB/s


• OK, go make a 13 node RAC cluster and work out the 53
active Fibre Channel paths…all with balance!
• 53/13 is roughly 4 HBAs per host.
• 2s8c16t servers generally don't support 4 dual-port HBAs…but maybe I found some that do…
Is it worth it?
Typical DW technical architecture
Hardware needed to achieve 6 GB/s
Component                    Team               Vendor
Ethernet Interconnect        Network Team       Switch Vendor
Database                     DBAs               DB Vendor
Unix/Linux OS                Unix Sys Admin     OS Vendor
HBA                          H/W Admin          HBA Vendor
Volume Manager               Storage Design     LVM Vendor
FC Switches / Data Fabric                       FC Switch Vendor
LUNs / Storage Array         Storage Admin      Storage Vendor / Vendor Support

Massively shared infrastructure – what chance of getting it right?
Virtually impossible to scale
Typical DW technical architecture
Hardware needed to achieve 18 GB/s (v2)

or . . . .
Typical DW technical architecture
Hardware needed to match X2-8 (2x cores)

or . . . .
Database Machine
Hardware needed to achieve 18 GB/s
Is it worth it?
Database versus purpose-built

Purpose-built
• Does not require benefits of Exadata smarts
• Replaced software with hardware
• Very complex to implement and manage
• 13 node RAC grid is totally saturated
• Potentially unpleasant place to be

Database Machine
• Not fully saturated
• Single vendor
• Scalable
• Preconfigured
DIY Exadata-like Performance?
Heresy?
"If IT guys go out and build infrastructure under an Oracle
database in their enterprise IT shop, it's a major design
project. […] the IT team has to go in and figure out what's
the right servers to buy, what's the right storage to buy, how
do I connect them all together properly into a cluster or a
SAN or whatever they're doing.

"And this is a big deal: it takes months and months, and lots
of negotiating with lots of vendors, and at the end of the day
they have this completely unique system that they built—and
it's really good, but they're the only ones in the world who
have this unique system. Which means that if there's any
problem, they're going to be the first ones to find it, right?"

--Andrew Mendelsohn
Is it worth it?
Well, is it?