
For Oracle employees and authorized partners only. Do not distribute to third parties.

© 2008–2009 Oracle Corporation – Proprietary and Confidential

Exadata Oracle Database Machine Overview

Student Manual

Module Agenda

• Why Exadata?

• Exadata features for increasing resources


• Exadata features for reducing demand
• Exadata features for ensuring efficiency
• Exadata benefits
• Exadata sizing and licensing
• Summary


Why Exadata?

Database machine
Requirements with 11g R2

• Data integrity
• MVRC
• Data Guard
• ASM

Database machine
Requirements with 11g R2

• Data integrity
• Performance
• MVRC
• Efficient caching
• Bitmap indexes
• Partitioning
• Materialized views

Database machine
Requirements with 11g R2

• Data integrity
• Performance
• Scalability
• RAC
• Powerful platforms

Database machine
Requirements

• Data integrity
• Performance
• Scalability

Oracle delivers!

Database Machine
What more?

• Efficiency

Efficiency = Resources / Demands

Business impact of efficiency
• Business queries if 10 TB of data must be scanned

Bandwidth      Query rate            Business impact
1 GB/sec       3 queries/work day    Don't Ask
10 GB/sec      3 queries/hour        Ask Tomorrow
100 GB/sec     35 queries/hour       Ask Anything
Exadata Database Machine
Extreme efficiency

• Fast Predictable Performance

• Lower Ongoing Costs

• The Fastest Time to Value & Lowest Risk


Exadata features for


increasing resources

Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
• Eliminates the long-standing tradeoff between scalability, availability, and cost

Database Grid
• 8 dual-processor x64 database servers, OR
• 2 eight-processor x64 database servers

Intelligent Storage Grid
• 14 high-performance, low-cost storage servers
• 100 TB High Performance disk, or 336 TB High Capacity disk
• 5.3 TB PCI flash
• Data mirrored across storage servers

InfiniBand Network
• Redundant 40 Gb/s switches
• Unified server and storage network

Exadata Smart Flash Cache
Extreme Performance for OLTP & DW

• Exadata has 5 TB of flash
• 56 flash PCI cards avoid disk controller bottlenecks
• Intelligently manages flash
• Smart Flash Cache holds hot data
• Avoids large scan wipe-outs of the cache
• Gives the speed of flash at the cost of disk
• Exadata flash cache achieves:
• Over 1 million I/Os per second from SQL (8K)
• Sub-millisecond response times
• 5X more I/Os than a 1,000-disk enterprise storage array


Exadata features for


reducing demand

Exadata Intelligent Storage

• Exadata storage servers also run more complex operations in storage:
• Join filtering
• Incremental backup filtering
• I/O prioritization
• Storage indexing
• Database-level security
• Offloaded scans on encrypted data
• Data mining model scoring
• A 10x reduction in data sent to DB servers is common

Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• With traditional storage, all database intelligence resides in the database hosts
• Table extents are identified, I/Os are issued, and 1 terabyte of data is returned to the hosts
• A very large percentage of the data returned from storage is discarded by the database servers
• The DB host reduces a terabyte of data to the 1,000 customer names that are returned to the client
• Discarded data consumes valuable resources and impacts the performance of other workloads
Exadata Smart Scan Processing
Reduces demand

SELECT customer_name FROM calls WHERE amount > 200;

• Extents and metadata are sent to storage; Smart Scan identifies the rows and columns within the terabyte table that match the request
• Only the relevant columns (customer_name) and required rows (where amount > 200) are returned to the hosts – about 2 MB of data is returned to the server
• The consolidated result set is built from all cells
• CPU consumed by predicate evaluation is offloaded
• Moving scan processing off the database host frees host CPU cycles and eliminates massive amounts of unproductive messaging
• Returns the needle, not the entire haystack

Additional Smart Scan functionality
Reduces demand
• Join filtering
• Filtering is performed within Exadata storage cells
• Join predicates are transformed into filters
• Backups
• Only changed blocks are returned
• Create Tablespace (file creation)
• Formatting of tablespace extents eliminates the I/O associated with
the creation and writing of tablespace blocks
• Smart Scan offload for encrypted tablespaces and
columns
• Offload of Data Mining Model scoring

Exadata Hybrid Columnar Compression
Highest Capacity, Lowest Cost

• Data is organized and compressed by column
• Dramatically better compression
• Speed-optimized Query mode for data warehousing
• 10X compression typical
• Runs faster because of Exadata offload!
• Space-optimized Archival mode for infrequently accessed data
• 15X to 50X compression typical
• Benefits multiply: faster and simpler backup, DR, caching, reorg, and clone

Exadata Storage Index
Transparent I/O Elimination with No Overhead

• Exadata Storage Indexes maintain summary information about table data in memory
• They store the MIN and MAX values of columns
• Typically one index entry for every MB of disk
• Disk I/Os are eliminated if the MIN and MAX can never match the "where" clause of a query
• Completely automatic and transparent

Example: with one storage region holding Min B = 1, Max B = 5 and the next holding Min B = 3, Max B = 8, the query SELECT * FROM Table WHERE B < 2 can only match rows in the first region.


Exadata features for


ensuring efficiency

Exadata I/O Resource Management
Mixed Workloads and Multi-Database Environments

• Ensure different databases are allocated the correct relative amount of I/O bandwidth
• Database A: 33% of I/O resources
• Database B: 67% of I/O resources
• Ensure different users and tasks within a database are allocated the correct relative amount of I/O bandwidth
• Database A:
• Reporting: 60% of I/O resources
• ETL: 40% of I/O resources
• Database B:
• Interactive: 30% of I/O resources
• Batch: 70% of I/O resources


Exadata benefits

Exadata Benefits
Fast Predictable Performance

• More predictable timeliness of results


• Faster results by moving Oracle database
intelligence to disk storage

• Properly configured out-of-the-box


• Ready to run - plug it in

• More capabilities to support more


business analysts
• Scale to support an enterprise

Brian Camp
SVP, Infrastructure Services
KnowledgeBase Marketing

“After carefully testing several data warehouse platforms, we chose the


Oracle Database Machine. Oracle Exadata was able to speed up one of
our critical processes from days to minutes. The Oracle Database Machine
will allow us to improve service levels and expand our service offerings.”

Performance: Query Throughput / Query Throughput with Flash

Why is Oracle Faster?
• DB processing in storage
• Better compression (10x)
• Smart Flash Cache
• Faster interconnect (40 Gb/sec)
• More disks
• Faster disks (15K RPM)

(Chart: relative query throughput, with and without flash, for the Hitachi USP V, Teradata 2550, Netezza TwinFin 12, and Sun Oracle Database Machine.)
Exadata Performance Scales

• Exadata delivers brawny hardware for use by Oracle's brainy software
• Performance scales with size
• Result:
• More business insight
• Better decisions
• Improved competitiveness

(Chart: table scan time versus table size from 1 TB to 100 TB, comparing a typical warehouse with Exadata.)


Exadata sizing and


licensing

Exadata Hardware Architecture
Scalable grid of industry-standard servers for compute and storage
• Eliminates the long-standing tradeoff between scalability, availability, and cost

Database Grid
• 8 dual-processor x64 database servers, OR
• 2 eight-processor x64 database servers

Intelligent Storage Grid
• 14 high-performance, low-cost storage servers
• 100 TB High Performance disk, or 336 TB High Capacity disk
• 5.3 TB PCI flash
• Data mirrored across storage servers

InfiniBand Network
• Redundant 40 Gb/s switches
• Unified server and storage network

Standardized and Simple to Deploy

• All Database Machines are the same
• Delivered ready-to-run
• Tested
• Highly supportable
• No unique configuration issues
• Identical to the configuration used by Oracle Engineering
• Runs existing OLTP and DW applications
• Full 30 years of Oracle DB capabilities
• No Exadata certification required
• Leverages the Oracle ecosystem
• Skills, knowledge base, people, partners
• Deploy in days, not months

Paul Hartley
General Manager
LGR Telecommunications

"You can easily remove six months of the implementation cycle…"

"…we estimate there's up to a 70 percent reduction in terms of cost of ownership compared to custom solutions, just in terms of the personnel savings."

from Profit Magazine, February 2009

Exadata Storage Server Building Block

• High-performance storage server built from industry-standard components
• Hardware by Sun
• Software by Oracle
• 12 disks: 600 GB 15,000 RPM High Performance SAS or 2 TB 7,200 RPM High Capacity SAS
• 2 six-core Intel Xeon processors (L5640)
• Dual-ported 40 Gb/sec InfiniBand
• 4 x 96 GB flash cards
• Intelligent Exadata Storage Server software

New - Exadata Database Machine X2-8 Full Rack
Extreme Performance for Consolidation, Large OLTP and DW

• 2 x64 Eight-processor Database servers (Sun Fire X4800)


• High Core, High Memory Database Servers
• 128 CPU cores (64 per server)
• 2 TB memory (1 TB per server)
• 10 GigE connectivity to Data Center
• 16 x 10GbE ports (8 per server)
• 14 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 3 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Redundant Power Distribution Units (PDUs)

Add more racks for additional scalability


Exadata Database Machine X2-2 Full Rack
Pre-Configured for Extreme Performance

• 8 x64 Dual-processor Database Servers (Sun Fire X4170 M2)


• 96 cores (12 per server)
• 768 GB memory (96GB per server)
• 10 GigE connectivity to Data Center
• 16 x 10GbE ports (2 per server)
• 14 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 3 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Keyboard, Video, Mouse (KVM) hardware
• Redundant Power Distribution Units (PDUs)

Add more racks for additional scalability


Exadata Database Machine X2-2 Half Rack
Pre-Configured for Extreme Performance

• 4 x64 Dual-processor Database Servers (Sun Fire X4170 M2)
• 48 cores (12 per server)
• 384 GB memory (96GB per server)
• 10 GigE connectivity to Data Center
• 8 x 10GbE ports (2 per server)
• 7 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 3 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Keyboard, Video, Mouse (KVM) hardware
• Redundant Power Distribution Units (PDUs)

Can Upgrade to a Full Rack


Exadata Database Machine X2-2 Quarter Rack
Pre-Configured for Extreme Performance

• 2 x64 Dual-processor Database Servers (Sun Fire X4170 M2)


• 24 cores (12 per server)
• 192 GB memory (96GB per server)
• 10 GigE connectivity to Data Center
• 4 x 10GbE ports (2 per server)
• 3 Exadata Storage Servers X2-2
• All with High Performance 600GB SAS disks
OR
• All with High Capacity 2 TB SAS disks
• 2 Sun Datacenter InfiniBand Switch 36
• 36-port Managed QDR (40Gb/s) switch
• 1 "Admin" Cisco Ethernet switch
• Keyboard, Video, Mouse (KVM) hardware
• Redundant Power Distribution Units (PDUs)

Can Upgrade to a Half Rack


Start Small and Grow

• Quarter Rack → Half Rack → Full Rack

Exadata Product Capacity

                                 X2-8        X2-2        X2-2        X2-2
                                 Full Rack   Full Rack   Half Rack   Quarter Rack
Raw Disk(1)     High Perf Disk   100 TB      100 TB      50 TB       21 TB
                High Cap Disk    336 TB      336 TB      168 TB      72 TB
Raw Flash(1)                     5.3 TB      5.3 TB      2.6 TB      1.1 TB
User Data(2)    High Perf Disk   28 TB       28 TB       14 TB       6 TB
(no compression) High Cap Disk   100 TB      100 TB      50 TB       21 TB

1 – Raw capacity calculated using 1 GB = 1000 x 1000 x 1000 bytes and 1 TB = 1000 x 1000 x 1000 x 1000 bytes.
2 – User Data: actual space for end-user data, computed after single mirroring (ASM normal redundancy) and after allowing space for database structures such as temp, logs, undo, and indexes. Actual user data capacity varies by application. User Data capacity calculated using 1 TB = 1024 x 1024 x 1024 x 1024 bytes.

Exadata Product Performance

                                     X2-8        X2-2        X2-2        X2-2
                                     Full Rack   Full Rack   Half Rack   Quarter Rack
Raw Disk Data      High Perf Disk    25 GB/s     25 GB/s     12.5 GB/s   5.4 GB/s
Bandwidth(1,4)     High Cap Disk     14 GB/s     14 GB/s     7 GB/s      3 GB/s
Raw Flash Data Bandwidth(1,4)        50 GB/s     50 GB/s     25 GB/s     11 GB/s
Disk IOPS(3,4)     High Perf Disk    50,000      50,000      25,000      10,800
                   High Cap Disk     25,000      25,000      12,500      5,400
Flash IOPS(3,4)                      1,000,000   1,000,000   500,000     225,000
Data Load Rate(4)                    5 TB/hr     5 TB/hr     2.5 TB/hr   1 TB/hr

1 – Bandwidth is peak physical disk scan bandwidth, assuming no compression.
2 – Max User Data Bandwidth assumes scanned data is compressed by a factor of 10 and is on flash.
3 – IOPS based on I/O requests of size 8K.
4 – Actual performance will vary by application.

Database Server Operating System Choices

• Two Operating System Choices on the database servers


• Oracle Linux
• Solaris 11 Express (x86) – coming soon
• Customers will choose their preferred Database Server
OS at installation time
• Exadata Storage Servers will continue to be Oracle Linux

Exadata Licensing
Database nodes
Required Products
Oracle Database 11g Enterprise Edition
Oracle Exadata Storage Server Software
Highly recommended products
RAC
Partitioning Option
Other Recommended Software
Advanced Compression Option
Enterprise Manager Packs: Diagnostics, Provisioning, Tuning
OLAP Option
Data Mining Option
Advanced Security Option
Real Application Testing
Oracle Business Intelligence Enterprise Edition Plus


Exadata summary

Exadata Database Machine Summary
Extreme Performance for all Data Management

• Best for Data Warehousing


• Smart scan of 10x compressed tables
• Parallel query on in-memory data
• Overall up to 5x faster than 11.1 for Warehousing

• Best for OLTP


• Only database that scales real-world applications on grid
• Smart Flash Cache for 20x IOPs or 20x fewer disks
• Smart Flash cache can hold entire working set
• Up to 50x compression for archival data
• Secure, fault tolerant

• Best for Consolidation


• Only database machine that runs and scales all workloads
• Predictable response times in multi-database, multi-application, multi-user
environments


Smart features

Module Agenda

• Smart Scans

• Smart Scan feature support


• How Smart Scans work
• Smart Scans and Oracle features
• Other Smart features
• Smart Scan benefits

Smart Scans
Smart Scan

• Finite resources can lead to performance bottlenecks


• I/O is the chief source of bottlenecks in modern
computing systems
• Smart Scan is designed to reduce the amount of data
flowing from the storage devices to the database
servers

Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• A telco wants to identify customers that spend more than $200 on a single phone call
• The information about these premium customers occupies 2 MB in a 1-terabyte table
Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• With traditional storage, all database intelligence resides in the database hosts
• Database server nodes must identify all table extents that may contain the requested data
• Partitioning may help to eliminate some extents
Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• The database server issues I/O requests for all potentially relevant data
• The storage system returns all relevant data to the database server, using I/O bandwidth
• The storage system returns blocks of data – 1 terabyte of data is returned to the hosts
Traditional Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Traditional Scan Example:
• The database server must discard irrelevant data by checking values against the selection criteria
• Final results are sent to the client – the DB host reduces a terabyte of data to the 1,000 customer names that are returned to the client
• Large use of resources:
• CPU/memory for mapping extents
• I/O bandwidth from disk for data which will be discarded
• CPU to impose the selection criteria
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• The same SQL request is issued
• Smart Scan is completely transparent to applications and users
• Even if a cell fails during operations
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• The database server sends table extents and metadata to the Exadata Storage Server cells
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• Table extents and metadata are sent to the cells
• Smart Scan processing on the Exadata Storage cells scans the data blocks to identify the rows and columns within the terabyte table that match the request
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• Only the relevant rows and columns are returned to the database server – about 2 MB of data
• Blocks are not returned when Smart Scan is used
• Blocks will still be returned when appropriate
Exadata Smart Scan Processing

SELECT customer_name FROM calls WHERE amount > 200;

• Smart Scan Example:
• The database server only has to assemble the returned relevant data into the consolidated result set built from all cells
• No wasted I/O bandwidth or database server CPU

Smart Scan feature


support
Smart Scan
Row filtering
• Predicate filtering
• >, <, =, !=, <=, >=, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR
• Most SQL functions
• For the full list:
• SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';
Smart Scan
Column projection

• Smart Scan only returns the columns requested by the query
• Significant reduction in I/O bandwidth
• For example, SELECT B, C FROM tablea; returns only the requested columns from a table with columns A through E
Smart Scan
Join filtering

• Join filtering for star schemas


• Joins large tables to smaller tables
• Uses Bloom filters
• A way to indicate membership in a set in a compact way
• Bloom filters are used to reduce potential row candidates
for join, reducing the data sent to the database server for
join processing

Smart Scans – how they


work
Smart Scan
Uses direct reads
• Direct reads are not new to Exadata
• Direct reads read data into PGA buffers rather than into the buffer cache used for caching data blocks
• Direct reads make sense when the ratio of cache to the data to be read is very small
• If the cache is very small relative to the data to be read, the buffers would be evicted anyway, possibly at the cost of adversely affecting OLTP-type applications
• Exadata for DW environments involves scanning large volumes of data and returning results in formatted data blocks – these should not go to the buffer cache
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
• The optimizer makes an execution plan whose access step is a FULL ACCESS
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
• The optimizer makes an execution plan
• Full scan access – Smart Scan eligible
• Smart Scan is not performed if the query columns include LOBs or other disqualifying conditions apply
• Smart Scan processing
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount > 200;
• The optimizer makes an execution plan
• Full scan access – Smart Scan
• Smart Scan processing
• Selected rows and projected columns are returned to the PGA
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
• The optimizer makes an execution plan (no FULL ACCESS step)
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
• The optimizer makes an execution plan
• Not scan access – block request
Smart Scan at work

• Query submitted: SELECT customer_name FROM calls WHERE amount = 200;
• The optimizer makes an execution plan
• Not scan access – block request
• Blocks are returned to the buffer cache
• Normal block processing

Smart Scans and Oracle


features
Smart Scan and Oracle features

• All standard Oracle features continue to work as normal,


including
• Consistent reads
• Locking
• Chained rows
• Compressed table
• Partitioned tables
• Materialized views
• National Language Processing
• Date arithmetic
• Regular expression searches

. . . and everything else . . .



Other Smart features


Exadata Software features 11.2
Offloaded data mining scoring

• Data mining scoring is executed in Exadata (the scoring function runs in the Exadata cells):

select cust_id
from customers
where region = 'US'
and prediction_probability(churnmod, 'Y' using *) > 0.8;

• All data mining scoring functions are offloaded to Exadata
• Up to 10x performance gains
• Reduced CPU utilization on the Database Server
Exadata Software Features
Smart incremental backup
• Recovery Manager does Block Change Tracking
• Maintains a list of groups of blocks where data has changed
• Incremental backup only backs up the marked groups of blocks
• Exadata Storage Server improves the granularity of the tracking units, reducing the size of the backup even more

(Diagram: change tracking file content for 1 MB of data, driving a smart incremental backup request.)


Exadata Software Features
Fast file creation

• Standard tablespace creation/extension


• Tablespaces created by the database are initialized
• Full blocks initialized as part of process by database server
and written to storage
• Exadata tablespace creation/extension
• Only metadata is sent by database server to Exadata Storage
Server
• Initialization is done by the Exadata Storage Server software
on the drives
• Tremendous reduction in I/O between database and
storage system
• Corresponding reduction in overhead
Exadata software features
Encrypted data

• Exadata Storage Servers perform Smart Scans on


encrypted data in tablespaces and columns
• Data is decrypted by Exadata cells before being sent
to the database servers
• X2 Exadata Database Machines use hardware decryption

Smart Scan benefits


Smart Scans
Benefits

• Normal scans return all data blocks to the database


server
• Scan speed of storage throttled by limitation in data
flow from storage to database server
• Smart Scan can perform scans at the full 1.5 GB/sec,
while only returning relevant data
• Smaller amount of relevant data does not cause I/O
bottleneck
• Since no storage-database server bottleneck, more
cells scale for higher throughput
Smart Scans
Determining benefits

• Single query
• EXPLAIN PLAN
• The operation name and predicate information will use the keyword "storage"
Smart Scan
CELL_OFFLOAD_PLAN_DISPLAY
• Controls whether the offload status of a step in an
execution plan is displayed
• Set with ALTER SYSTEM or ALTER SESSION
commands
• Values
• AUTO (default) – displays predicates if cell is present and table is
on the cell
• ALWAYS – shows option whether cell is present or not
• NEVER – does not display offload status
• Be aware – optimizer does not control if processing is
actually offloaded, just if it is eligible
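A minimal sketch of checking offload eligibility for a single statement, reusing the CALLS query from earlier in this module (the table and column names are illustrative); when a cell is present, offloadable steps appear as TABLE ACCESS STORAGE FULL with a storage(...) entry in the Predicate Information section:

ALTER SESSION SET cell_offload_plan_display = ALWAYS;

EXPLAIN PLAN FOR
  SELECT customer_name FROM calls WHERE amount > 200;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- Look for "TABLE ACCESS STORAGE FULL" and the storage(...) predicate
-- to confirm the step is eligible for Smart Scan offload.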
Monitoring Smart Scan
Efficiency

cell session smart scan efficiency =
  (cell IO uncompressed bytes + cell physical IO bytes saved by storage index) /
  cell physical IO interconnect bytes returned by smart scan

SQL> SELECT b.name, a.value
     FROM v$mystat a, v$statname b
     WHERE a.statistic# = b.statistic#
     AND b.name = 'cell session smart scan efficiency';

NAME                                     VALUE
---------------------------------------- -----
cell session smart scan efficiency        11.9
Monitoring Smart Scan
V$SQL statistics
• Statistics for individual SQL statements
• IO_CELL_OFFLOAD_ELIGIBLE_BYTES
• IO_CELL_OFFLOAD_RETURNED_BYTES
• OPTIMIZED_PHY_READ_BYTES
• And others
• Also available in
• V$SQLAREA
• V$SQLAREA_PLAN_HASH
• V$SQLSTATS
• V$SQLSTATS_PLAN_HASH
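A hedged example of reading these columns for recent statements; the SQL_TEXT filter is only illustrative:

SELECT sql_id,
       io_cell_offload_eligible_bytes,
       io_cell_offload_returned_bytes
FROM   v$sql
WHERE  io_cell_offload_eligible_bytes > 0
AND    sql_text LIKE 'SELECT customer_name FROM calls%';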

Smart Scans
Building on benefits

• 10 TB of user data requires 10 TB of I/O
• 1 TB with compression
• 100 GB with partition pruning
• 20 GB with Storage Indexes
• 5 GB with Smart Scans
• Subsecond on the Database Machine

Data is 10x smaller, scans are 2,000x faster


Compression

Module Agenda

• Oracle compression options

• Advanced Compression Option (ACO)


• Advanced Compression in the real world
• Exadata Hybrid Columnar Compression
(EHCC)
• EHCC in the real world
• Tips and techniques

Oracle compression
options
Oracle Database Compression

Use Case                   Product                              Feature
OLTP                       Database 11g Advanced Compression    Advanced Compression
Unstructured (File) Data   Database 11g Advanced Compression    SecureFiles Compression, SecureFiles Deduplication
Backup Compression         Database 11g Advanced Compression    RMAN Compression, Data Pump Compression
Network Compression        Database 11g Advanced Compression    Data Guard Redo Transport Compression
Data Warehouses            Exadata V2 EHCC                      Warehouse Compression
Cold / Historical Data     Exadata V2 EHCC                      Archive Compression


Advanced Compression
Option
Advanced Compression
Compress All Your Data

• Compress large application tables


• Transaction processing, data warehousing
• Compress all data types
• Structured and unstructured data types
• Improve query performance
• Cascade storage savings throughout data
center
Up to 4X compression
Advanced Compression Option
Table Compression
• Oracle Database 11g extends table compression for
OLTP (and other) data
• Support for conventional DML operations
• Average storage savings of 2-4x
• New algorithm significantly reduces write overhead
• Improved performance for queries accessing large
amounts of data
• Compression enabled at either the table or partition
level
• Completely transparent to applications
Table Compression
Block-Level Batch Compression

• Patent pending algorithm minimizes performance overhead and


maximizes compression
• Individual INSERTs and UPDATEs do not cause recompression
• Compression cost is amortized over several DML operations
• Block-level (local) compression keeps up with frequent data
changes in OLTP environments
• Competitors use static, fixed size dictionary table thereby
compromising compression benefits
Table Compression

(Diagram: an initially uncompressed block of the Employee table. After the block header, the block holds the rows 1•John•Doe, 2•Jane•Doe, 3•John•Smith, and 4•Jane•Doe, with free space remaining.)

Employee Table
ID   FIRST_NAME   LAST_NAME
1    John         Doe
2    Jane         Doe
3    John         Smith
4    Jane         Doe

INSERT INTO EMPLOYEE
VALUES (5, 'Jack', 'Smith');
COMMIT;
Table Compression

(Diagram: the same block after compression. The block header now holds a local symbol table – John=|Doe=|Jane=|Smith= – and the rows, including the new row 5•Jack•Smith, are stored as short references into it, leaving more free space.)

Employee Table
ID   FIRST_NAME   LAST_NAME
1    John         Doe
2    Jane         Doe
3    John         Smith
4    Jane         Doe
5    Jack         Smith

Table Compression Syntax
OLTP Table Compression Syntax:
CREATE TABLE emp (
emp_id NUMBER
, first_name VARCHAR2(128)
, last_name VARCHAR2(128)
) COMPRESS FOR OLTP;

Direct Load Compression Syntax (default):


CREATE TABLE emp (
emp_id NUMBER
, first_name VARCHAR2(128)
, last_name VARCHAR2(128)
) COMPRESS [BASIC];
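Existing tables can also be rebuilt compressed; a small sketch, assuming a SALES table and an index that must be rebuilt after the move:

ALTER TABLE sales MOVE COMPRESS FOR OLTP;
-- A MOVE invalidates the table's indexes, so rebuild them afterwards:
ALTER INDEX sales_pk REBUILD;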
Advanced Compression Option
Table Compression Advisor
• Available in 11g Release 2
• Available on OTN *
• Supports Oracle Database 9i Release 2 through 11g Release 1
• Shows projected compression ratio for uncompressed tables
• Reports actual compression ratio for compressed tables (11g Only)

* http://www.oracle.com/technology/products/database/compression/compression-advisor.html
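In 11g Release 2 the advisor is exposed through the DBMS_COMPRESSION package; a hedged sketch, assuming a SALES table owned by SH and a scratch tablespace named USERS:

SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(100);
BEGIN
  DBMS_COMPRESSION.GET_COMPRESSION_RATIO(
    scratchtbsname => 'USERS',
    ownname        => 'SH',
    tabname        => 'SALES',
    partname       => NULL,
    comptype       => DBMS_COMPRESSION.COMP_FOR_OLTP,
    blkcnt_cmp     => l_blkcnt_cmp,
    blkcnt_uncmp   => l_blkcnt_uncmp,
    row_cmp        => l_row_cmp,
    row_uncmp      => l_row_uncmp,
    cmp_ratio      => l_cmp_ratio,
    comptype_str   => l_comptype_str);
  -- Report the projected compression ratio for the requested compression type
  DBMS_OUTPUT.PUT_LINE('Projected ratio: ' || ROUND(l_cmp_ratio, 1) ||
                       ' (' || l_comptype_str || ')');
END;
/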
Advanced Compression Option
SecureFiles
• Next-generation high performance LOB
• Superset of LOB interfaces allows easy migration from LOBs
• Transparent deduplication, compression, and encryption
• Leverage the security, reliability, and scalability of database
• Enables consolidation of file data with associated relational data
• Single security model
• Single view of data
• Single management of data
• Scalable to any level using SMP scale-up or grid scale-out
• SecureFiles standard with Oracle Database 11g
• Compression and deduplication with Advanced Compression
Option
• Encryption with Advanced Security Option
SecureFiles
Deduplication


• Enables storage of a single physical image for duplicate data


• Significantly reduces space consumption
• Dramatically improves writes and copy operations
• No adverse impact on read operations
• May actually improve read performance for cached data
• Duplicate detection happens within a table, partition or sub-partition
• Very useful for content management, email applications and data
archival applications
SecureFiles
Compression
• Significant storage savings for unstructured data
• Three levels of compression (LOW/[MEDIUM]/ HIGH ) provide desired
ratios
• 2-3x compression for typical files (combination of doc, pdf, xml)
• Compression Level LOW (NEW in 11.2)
• Compression algorithm optimized for high performance
• 3x less CPU utilization than default SecureFiles Compression
• Maintains 80% compression of default SecureFiles Compression

• Allows for random reads and writes to compressed SecureFile data


• Can be specified at a partition level
• Automatically detects if SecureFile LOB data is compressible
• Independent of table or index compression
Network Compression
Data Guard Redo Transport Services
• Compress network traffic between primary and standby databases
• Lower bandwidth networks (<100Mbps)
• 15-35% less time required to transmit 1 GB of data
• Bandwidth consumption reduced up to 35%
• High bandwidth networks (>100 Mbps)
• Compression will not reduce transmission time
• But will reduce bandwidth consumption up to 35%
• Syntax:
LOG_ARCHIVE_DEST_3='SERVICE=denver SYNC
COMPRESSION=ENABLE|[DISABLE]'

• Ref. MetaLink 729551.1 “Redo Transport Compression in a Data


Guard Environment”
Redo Transport Compression

(Charts: redo transport rate in Mbit/sec over time, with and without compression – roughly 2X compression for an OLTP workload and 5X compression for a batch workload.)

• More efficient bandwidth utilization, up to a 5x compression ratio
• Compression did not impact throughput or response time

Validation performed by CTC in collaboration with Oracle Japan Grid Center
http://www.ctc-g.co.jp/en/

Advanced Compression
in the real world
Advanced Compression
Oracle's Internal E-Business Application DB
• Oracle's Internal E-Business Suite Production System deployed ACO in 2009
• 4-node Sun E25K RAC, 11gR1
• Average overall storage savings 3x
• Table compression 4x
• Index compression 2x
• LOB compression 2.3x
• 65TB of realized storage savings primary, standby and test systems
• Additional benefits were also accrued in Dev clones and Backups
• Payroll, Order-2-Cash, AP/AR batch flows, Self-Service flows run without regression,
Queries involving full table scans show speedup
Advanced Compression
Oracle's Internal Beehive Email DB
• Production system on 11gR1 & Exadata for Primary and Standby
• Using Exadata Storage Servers for storage
• Average Compression Ratio: 2x
• Storage savings add up with standby, mirroring, flash recovery area
• Compression went production in 2009
• Consolidate 90K employees on this email server, more being migrated
• Savings As of April 2010
• Beehive Saved 365TB of storage using Advanced Compression
• Incrementally saves 2.6TB/day based on db size growth
• Savings higher with Sun user migration
• Compression also helped improve performance by caching only
compressed emails in memory and reducing I/O latencies
Advanced Compression
SAP R/3, BW, Leading Global Company
• Compression on SAP databases
at leading global company
• Oracle Database 11g Release 2
• SAP R/3 DB
• 4.67TB Uncompressed
• 1.93 TB Compressed
• 2.4x compression ratio
• SAP BW DB
• 1.38 TB Uncompressed
• 0.53 TB Compressed
• 2.6x compression ratio
• Leverage 11g compression for
Tables, Indexes and LOB data

Exadata Hybrid
Columnar Compression
Exadata Hybrid Columnar Compression

• New in Exadata Version 2
• Hybrid columnar compressed tables
• A new approach to compressed table storage
• Useful for data that is bulk loaded and queried
• Update activity is light
• How it works
• Tables are organized into Compression Units (CUs)
• CUs are a multiple of the database block size
• Within a Compression Unit, data is organized by column instead of by row
• Column organization brings similar values close together, enhancing compression
• Typically a 10x to 15x reduction
Exadata Hybrid Columnar Compression
Compression Units
• Compression Unit
• Logical structure spanning multiple database blocks
• Data organized by column during data load
• The number of rows in a CU is determined at load time, based on row size and estimated compression
• Each column is compressed separately
• All column data for a set of rows is stored in the compression unit

(Diagram: a logical compression unit spanning several database blocks, with a CU header followed by the column data C1 through C8.)
EHCC tables
Details
• Data loaded using direct load uses Hybrid Columnar
Compression
• Parallel DML, INSERT /*+ APPEND */, Direct Path SQL*LDR
• Optimized algorithms avoid or greatly reduce overhead
of decompression during query
• Individual row lookups consume more CPU than row format
• Need to reconstitute row from columnar format
EHCC tables
Details
• Updated rows automatically migrate to lower
compression level to support frequent transactions
• Table size will increase moderately
• All rows in Compression Unit are locked during
updates
• Data loaded using conventional INSERTs use lower
compression level
Exadata Hybrid Columnar Compression
Integration with Oracle features

• Fully supported with…


• B-Tree, Bitmap Indexes, Text indexes
• Materialized Views
• Exadata Server and Cells including offload
• Partitioning
• Parallel Query, PDML, PDDL
• Schema Evolution support, online, metadata-only add/drop
columns
• Data Guard Physical Standby Support
• Logical Standby (as of 11.2.0.2)
• Streams supported in a future release
Exadata Hybrid Columnar Compression

Warehouse Compression – optimized for speed
• 10x average storage savings
• 10x reduction in scan I/O
• Smaller warehouse, faster performance

Archive Compression – optimized for space
• 15x average storage savings, up to 70x on some data
• For cold or historical data
• Reclaim 93% of disks, keep data online

OLTP and hybrid columnar compression can be mixed by partition for ILM
Exadata Hybrid Columnar Compression
Warehouse Compression
• 10x average storage savings
• A 100 TB database compresses to 10 TB
• Reclaim 90 TB of disk space
• Space for 9 more '100 TB' databases
• 10x average scan improvement
• 1,000 IOPS reduced to 100 IOPS
Exadata Hybrid Columnar Compression
Archive compression
• Compression algorithm optimized for maximum storage
savings
• Benefits any application with data retention requirements
• Best approach for ILM and data archival
• Minimum storage footprint
• No need to move data to tape or less expensive disks
• Data is always online and always accessible
• Run queries against historical data (without recovering from tape)
• Update historical data
• Supports schema evolution (add/drop columns)
Exadata Hybrid Columnar Compression
Archive compression
• Optimal workload characteristics for Archive compression
• Any application (OLTP, Data Warehouse)
• Cold or historical data
• Data loaded with bulk load operations or compressed using in-
database bulk compression operations
• Minimal access and update requirements

• 15x average storage savings


• 100 TB database compresses to 6.6 TB
• Keep historical data online forever
• Up to 70x savings seen on production customer data
EHCC Syntax

Warehouse Compression Syntax:


CREATE TABLE emp (…)
COMPRESS FOR QUERY [LOW | HIGH];

Online Archival Compression Syntax:


CREATE TABLE emp (…)
COMPRESS FOR ARCHIVE [LOW | HIGH];
Exadata Hybrid Columnar Compression
Comparisons

(Charts: table size, scan time, and single-row lookup time for uncompressed, basic table compression, hybrid columnar, and pure columnar formats.)

• Hybrid Columnar Compression combines the best of row and column formats
• Best compression – matching full columnar
• Excellent scan time – 93% as good as full columnar
• Good single-row lookup – no full-columnar "cliff"
• Row format is best for workloads with updates or trickle feeds
Data Archiving Strategies

• OLTP Applications
• Table partitioning
• Heavily accessed data
• Partitions using OLTP Table Compression
• Cold or historical data
• Partitions using Online Archival Compression

• Data Warehouses
• Table partitioning
• Heavily accessed data
• Partitions using Warehouse Compression
• Cold or historical data
• Partitions using Online Archival Compression
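As a sketch of the ILM approach above (the table, column, and partition names are hypothetical), a range-partitioned table can mix compression types by partition:

CREATE TABLE orders (
  order_id   NUMBER,
  order_date DATE,
  amount     NUMBER
)
PARTITION BY RANGE (order_date) (
  PARTITION orders_2008 VALUES LESS THAN (DATE '2009-01-01')
    COMPRESS FOR ARCHIVE HIGH,   -- cold / historical data
  PARTITION orders_2009 VALUES LESS THAN (DATE '2010-01-01')
    COMPRESS FOR QUERY HIGH,     -- less heavily accessed warehouse data
  PARTITION orders_2010 VALUES LESS THAN (MAXVALUE)
    COMPRESS FOR OLTP            -- heavily accessed, actively updated data
);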
EHCC benefits
Efficient data movement
• Read/Write compressed data to disk
• Write compressed data to ASM mirrors
• Read/Write compressed data in Flash Cache
• 10x improvement for Flash price performance
• Send compressed data over Infiniband
• Write compressed data to Redo Logs
• Send compressed data to standby database
• 10x reduction in WAN bandwidth cost: makes ADG appealing for DW
• Write compressed data to Backups

EHCC benefits
Efficient queries

• Specialized columnar query processing engine runs in


Exadata Storage Server to run directly against compressed
data
• Column optimized processing of query projection and filtering
• Vector processing techniques used to fully leverage columnar format
• 10x to 100x smaller subset of qualifying data returned over
Infiniband to database server for further query processing
• Optimized single row lookups to perform efficient I/O of a
contiguous set of blocks that form a Compression Unit


EHCC in the real world


Exadata Hybrid Columnar Compression
Storage savings
• Retail
• Top Global Retailer 4x
• Scientific Data Customer (EHCC, Archive Compression)
• Top R&D customer (with PBs of data): 28x
• OLTP Customer (EHCC, Archive Compression)
• SAP R/3 Application, Top Global Retailer: 28x
• Oracle E-Business Suite, Oracle Corp.: 23x
• Custom Call Center Application, Top Telco: 15x
Exadata Hybrid Columnar Compression
Storage savings
• Financial (EHCC, Data Warehouse Compression)
• Top Financial Services 1: 11x
• Top Financial Services 2: 24x
• Top Financial Services 2: 19x
• Telco (EHCC DW Compression)
• Top Telco 1: 8x
• Top Telco 2: 14x
• Top Telco 3: 6x
• Top Telco 4: 7x
Real World DW Performance
(Leading Financial Company)

• Compression Ratios
• Query High: 11x
• Archive High: 16x

• Load Performance
• data pump loading from flat file
• 28% increase in elapsed time

• Query Performance
• 40% faster to execute 60
queries in customer workload

EHCC benefits
Table scan performance
• Table scans of EHCC data run significantly faster than
uncompressed
• Sample test run (uses Call Data Record data, 46
columns)
• Compression ratio: 14x
• Load takes 55% more time
• Table Scan runs 5.5x faster (less disk I/O)

Exadata Hybrid Columnar Compression
Estimating savings
• EHCC Compression Advisor
• Runs on any 11.2 setup (non-Exadata too)
• Given a sample of customer data, provides compression ratio
estimates
• Patch available for 11.2 (8896202)

Tips and techniques


Tips and techniques
Compression Advisor

• Too little data can reduce compression ratios


• It is best to try with a big dataset, if possible
• By default, advisor does sampling. You can specify it to use
all rows.
• Run the Advisor with the data co-located the way the customer is going to use it
• Do not perform an extra sort or partitioning step
• UNIFORM tablespaces can have unused blocks.
• Advisor cannot be used on UNIFORM tablespaces
Tips and techniques
When to use EHCC
• Designed for data warehouse workloads
• What if the customer has a lot of DML in the workload?
• EHCC can be changed per partition
• Use ILM to compress older, less frequently updated partitions
• Use ALTER TABLE MOVE when a partition has stabilized
• How can I determine if I should do ALTER TABLE MOVE?
• Use dbms_compression.get_compression_type (see the sketch below)
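A hedged sketch of checking how a given row is currently compressed (the owner, table, and ROWID are illustrative); the function returns one of the DBMS_COMPRESSION constants such as COMP_NOCOMPRESS, COMP_FOR_OLTP, or the EHCC levels:

SELECT DBMS_COMPRESSION.GET_COMPRESSION_TYPE(
         ownname => 'SH',
         tabname => 'SALES',
         row_id  => 'AAAVxJAAEAAAAfPAAA') AS comp_type
FROM   dual;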
Tips and techniques
Loading data

• 1 - 2 TB/hour compressed loads on a full Exadata rack
• EHCC load speeds are comparable to basic compression
• Loading speeds depend on the data and the compression level
• If the customer wants higher load speeds
• A high-speed load mode is available – Query Low
• EHCC can be turned off temporarily during critical loads
• Or load uncompressed and compress the partition later
• Make sure loads are direct path
• No EHCC for single-row or buffered row inserts
Tips and techniques
Loading data

• Use DBFS as a staging file system


• Check the data distribution
• If all the data is going into few partitions, speed can appear
slow
Tips and techniques
Storage savings
• Storage savings are very dependent on the data
• Can vary from 2x to 200x
• Compression ratios can be misleading when compared to other competitors
• The ratio depends on the efficiency of the non-compressed storage
• Always compare the final size of a table on disk
• If the customer wants higher storage savings
• The higher storage-saving mode Archive Low can be used
• Don't use UNIFORM tablespaces
• UNIFORM tablespaces can cause extra blocks to be allocated
Tips and techniques
Performance
• Highest benefit for I/O-bound queries
• If query is disk-bound, it can speed up by compression ratio
• CPU-bound queries may not see as much
performance improvement
• Storage saving benefits still attractive
• Most queries see speed ups somewhere in between
• Look at customer queries to see if they can be sped
up
Tips and techniques
ILM
• You can assign compression techniques based on
partitions
• For active partitions
• Advanced Compression
• Compresses data as it is updated and added
• For less active partitions
• EHCC, warehouse mode
• Better compression, little performance impact
• For historical partitions
• EHCC, archive mode
• Best compression

Storage indexes


Storage indexes
Exadata Storage Index 11.2
Transparent I/O Elimination with No Overhead

• Exadata Storage Indexes maintain summary information about table data in memory
• They store the MIN and MAX values of columns
• Typically one index entry for every MB of disk
• Disk I/Os are eliminated if the MIN and MAX can never match the "where" clause of a query
• Completely automatic and transparent

Example: with one storage region holding Min B = 1, Max B = 5 and the next holding Min B = 3, Max B = 8, the query SELECT * FROM Table WHERE B < 2 can only match rows in the first region.
Storage indexes
How they work

• Storage indexes are used to filter out data from


consideration
• Indexes help you find data, storage indexes help you filter
data
• Index values are created for 1 MB storage regions
• Each storage region's index can have its own set of columns
• Based on a heuristic evaluation of the data distribution
• Minimum and maximum values are kept for multiple columns in each storage region
Exadata Storage Indexes
Sample Table SALES

(Diagram: the SALES table – Order_date, Ship_date, Cust_ID, Prod_ID, Amount – split into two data chunks, each with an in-memory synopsis of MIN and MAX values.)

Data chunk #1 synopsis: Order_date 03-SEP-2009 to 03-SEP-2009; Ship_date 05-SEP-2009 to 07-OCT-2009; Cust_ID 10075 to 20098; Prod_ID 20010 to 32932; Amount 10,000 to 20,000
Data chunk #2 synopsis: Order_date 03-SEP-2009 to 03-SEP-2009; Ship_date 01-OCT-2009 to 03-NOV-2009; Cust_ID 10000 to 80300; Prod_ID 2030 to 30000; Amount 10,000 to 40,000

• A synopsis for frequently used columns is automatically collected
• Stored in memory within the Exadata Storage Server
Exadata Storage Indexes
Sample Table SALES

WHERE ship_date between '01-SEP-2009' and '30-SEP-2009'

• Data chunk #1 (Ship_date MIN 05-SEP-2009, MAX 07-OCT-2009) may contain matching rows and is scanned
• Data chunk #2 (Ship_date MIN 01-OCT-2009, MAX 03-NOV-2009) can never match and is eliminated
• The Storage Index eliminates data chunks of no interest
• Provides 'partition-pruning'-like functionality
Storage indexes
Monitoring

• I/O savings can be monitored from v$sysstat using the statistic "cell physical IO bytes saved by storage index", as in the sketch below
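A minimal sketch of checking the statistic for the current session (v$mystat) – the same join against v$sysstat gives the instance-wide value:

SELECT n.name, s.value
FROM   v$statname n, v$mystat s
WHERE  n.statistic# = s.statistic#
AND    n.name = 'cell physical IO bytes saved by storage index';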
Storage indexes
Conditions

• Works with Smart Scan queries


• Predicate selection uses storage indexes if appropriate
• Works with <, <=, =, !=, >=, >, IS [NOT] NULL
• Storage index works with joins based on Bloom filters
• Works with uncompressed tables, OLTP
compression, EHCC, tablespace encryption
Storage indexes
Conditions - continued

• NLS columns and LOBs will not be used in a storage


index
• Writes for Hybrid Columnar Compression and
tablespace encryption invalidate storage region index
• Non-configurable
Storage indexes
“Maintenance”

• Storage indexes lost in the event of a cell reboot


• Portions of SI may be invalidated as a result of write
operations
• Rebuilt as Smart Scan queries touch storage regions
• Heuristically adjusted in response to distribution of
predicate columns in Smart Scan queries
• Think of storage index maintenance as cyclical
• Loading data in sorted order can result in good use of
storage indexes
Storage Index with partitions
Example

Orders table (Order_Date is the partitioning column):
Order#   Order_Date   Ship_Date   Item
1        2007         2007
2        2008         2008
3        2009         2009

• Queries on Ship_Date do not benefit from Order_Date partitioning
• However, Ship_Date and Order# are highly correlated with Order_Date
• e.g. ship dates are usually near order dates and are never less
• The storage index provides partition-pruning-like performance for queries on Ship_Date and Order#
• Takes advantage of the ordering created by partitioning or sorted loading

Resource Manager

Module Agenda

• Resource Manager overview

• Contending CPU workloads


• Parallel execution workload management
• Database consolidation
• Server consolidation

Resource Manager
overview
Resource Manager
Overview

• Allows you to allocate


over-subscribed
resources
• Key tool for guaranteeing
SLAs
• Works with Oracle
databases since 8i
• Works transparently,
based on session ID
Resource Manager
Implementation

1. Group sessions with similar performance objectives into


Consumer Groups
2. Allocate resources to consumer groups using Resource
Plans
3. Enable Resource Plan
Creating Consumer Groups
• Create Consumer Groups for each type of workload, e.g.
• OLTP consumer group
• Reports consumer group
• Low-Priority consumer group
• Create rules to dynamically map sessions to consumer groups, based
on session attributes

Mapping Rules Consumer Groups

OLTP
service = 'Customer_Service'
client program name = 'Siebel Call Center'
Oracle username = 'Mark Marketer'
Reports
module name = 'AdHoc'
query has been running > 1 hour
estimated execution time of query > 1 hour
Low-Priority
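A minimal PL/SQL sketch of these two steps using DBMS_RESOURCE_MANAGER (group names and mapping values are illustrative, not taken from the original example):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  -- Consumer groups for each type of workload
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('OLTP',    'interactive order entry');
  DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('REPORTS', 'reporting sessions');
  -- Rules that map sessions to consumer groups based on session attributes
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'CUSTOMER_SERVICE', 'OLTP');
  DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
    DBMS_RESOURCE_MANAGER.MODULE_NAME, 'ADHOC', 'REPORTS');
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/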
Creating Resource Plans

Priority-Based Plan
  Priority 1: OLTP
  Priority 2: Reports
  Priority 3: Ad-Hoc

Ratio-Based Plan
  OLTP 60%, Reports 30%, Low-Priority 10%

Hybrid Plan
  Level 1: OLTP 90%
  Level 2: Reports 60%, Low-Priority 40%
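A hedged PL/SQL sketch of the hybrid plan above (it assumes the OLTP, REPORTS, and LOW_PRIORITY consumer groups already exist; names and percentages are illustrative):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('HYBRID_PLAN', 'OLTP first, reports next');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'OLTP',
    'level 1', mgmt_p1 => 90);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'REPORTS',
    'level 2', mgmt_p2 => 60);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'LOW_PRIORITY',
    'level 2', mgmt_p2 => 40);
  -- Every plan needs a directive for OTHER_GROUPS (unmapped sessions)
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('HYBRID_PLAN', 'OTHER_GROUPS',
    'catch-all', mgmt_p3 => 100);
  DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/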
Enable Resource Management

• Manually
• Set resource_manager_plan parameter
• Automatically
• Set resource plan for a scheduler window
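A sketch of both approaches (the plan name carries over from the example above; the window name and schedule are illustrative):

-- Manually
ALTER SYSTEM SET resource_manager_plan = 'HYBRID_PLAN';

-- Automatically, by attaching the plan to a Scheduler window
BEGIN
  DBMS_SCHEDULER.CREATE_WINDOW(
    window_name     => 'DAYTIME_WINDOW',
    resource_plan   => 'HYBRID_PLAN',
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY;BYHOUR=8',
    duration        => INTERVAL '10' HOUR);
END;
/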
<Insert Picture Here>

Contending CPU
workloads
Resource Manager
Contending CPU workloads

When a database host has insufficient CPU for all workloads, the workloads
will compete for CPU. Performance of all workloads will degrade!

What if you cannot tolerate performance degradations for certain workloads?

[Chart: CPU usage when running OLTP only, Reports only, and ETL + Reports]
Resource Manager
Contending CPU workloads
With Resource Manager, you control how CPU resources should be allocated.

[Chart: CPU usage for OLTP only, Reports only, and two OLTP + Reports runs
with Resource Manager enabled, one with OLTP prioritized and one with
Reports prioritized]
Resource Manager
CPU management details

• Very fine-grained scheduling


• Resource Manager schedules at a 100 ms quantum
• Low-priority session yields to a high-priority session in ~1
quantum
• Background processes are not managed
• Backgrounds are either high-priority or not CPU-intensive
• Maximize CPU utilization
• If one consumer group doesn't use its allocation, it is
redistributed to other consumer groups based on the resource
plan
<Insert Picture Here>

Parallel execution
workload management
Parallel execution
Potential problems

• Parallel servers are a limited resource


• Limit specified by parallel_max_servers
• Too many concurrent parallel statements causes thrashing
• When there are no more parallel servers
• Critical statements may run serially
• When parallel servers free up, no way to boost DOP of
running statements
• Non-ideal solutions
• Size system for maximum load, inefficient
• Manually schedule large queries during off hours
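The relevant limits can be checked from the instance parameters, for example:

SQL> SELECT name, value
  2  FROM V$PARAMETER
  3  WHERE name IN ('parallel_max_servers', 'parallel_servers_target');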
Parallel Statement Queuing

• Introduced in 11.2.0.1
• Goals:
1. Run enough parallel statements to fully utilize system
resources
2. Ensure appropriate degree of parallelism for all statements
• Enable by setting parallel_degree_policy = AUTO
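For example, system-wide or per session for testing:

ALTER SYSTEM SET parallel_degree_policy = AUTO;
-- or, to experiment in a single session only:
ALTER SESSION SET parallel_degree_policy = AUTO;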
Parallel Statement Queuing

• A statement is parsed and Oracle automatically determines its DOP
• If enough parallel servers are available, the statement executes
  immediately
• If not enough parallel servers are available, the statement is placed on
  a FIFO queue
• When the required number of parallel servers becomes available, the
  statement at the head of the queue is dequeued and executed

[Diagram: SQL statements with DOPs such as 8, 16, 32, 64, and 128 flowing
through the FIFO queue]
Parallel Statement Queuing
With Resource Manager
• One Consumer Group can flood the system and
queue with queries
• Critical queries are forced to queue
• Critical queries are stuck behind batched queries
 Limit the DOP for queries from a Consumer Group
 Limit the percentage of parallel servers a Consumer
Group can use
• Reserves parallel servers for critical parallel queries
• Coming soon…

For example, parallel queries from the Batch consumer


group can only use 50% of the parallel servers
Parallel Statement Queuing
With Resource Manager

• DBAs want to control the order that parallel queries


are dequeued
• Prioritize tactical queries over batch and ad-hoc queries
• Impose a user-defined policy for ordering queued parallel
statements

• Coming soon…
 Separate queues per Consumer Group
 Resource Plan specifies which queues parallel
statements are issued next
Parallel Statement Queuing
With Resource Manager
Current Resource Plan:
  Priority 1: Tactical
  Priority 2, 70%: Normal
  Priority 2, 30%: Ad-Hoc

[Diagram: separate parallel statement queues for the Tactical, Normal, and
Ad-Hoc consumer groups; the next parallel query to run is selected from
these queues according to the resource plan]
Test Results: 2 Concurrent Workloads

Without Resource Manager:
  Critical analytics: 150% degradation
  Non-critical reporting: 9% degradation

With Resource Manager:
  Critical analytics: 16% degradation
  Non-critical reporting: 10% degradation
<Insert Picture Here>

Database consolidation
Database consolidation challenges
Service levels

• Ensuring service levels for all applications
  • A surge in one application's workload should not affect another's
  • Need a minimum, guaranteed amount of CPU and I/O per application

Use CPU Resource Manager to allocate CPU


Use Exadata I/O Resource Manager to allocate I/O
Database consolidation challenges
Consistent performance

• Ensuring consistent performance
  • An application's performance should be consistent, even if all other
    applications are idle
  • Need a way to limit CPU and I/O utilization!

Specify maximum CPU and I/O utilization per


Consumer Group in Resource Plan
• CPU utilization limit and I/O utilization limit – new in 11.2.0.1
Maximum Utilization Limit
• The "max_utilization_limit" directive limits an application's CPU and I/O
  utilization

DB Consolidation Plan #1                 DB Consolidation Plan #2

        Resource     Maximum                       Maximum
        Allocation   Utilization Limit             Utilization Limit
App 1   50%          50%                  App 1    50%
App 2   20%          50%                  App 2    20%
App 3   20%          50%                  App 3    20%
App 4   10%          50%                  App 4    10%

Plan #1 specifies minimum and maximum CPU and I/O utilization limits;
Plan #2 specifies maximum CPU and I/O utilization limits only.
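A hedged PL/SQL sketch of a Plan #1-style directive (the APP1 consumer group is assumed to exist; max_utilization_limit is the directive named on this slide):

BEGIN
  DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();
  DBMS_RESOURCE_MANAGER.CREATE_PLAN('DB_CONSOLIDATION_PLAN', 'per-application caps');
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('DB_CONSOLIDATION_PLAN', 'APP1',
    'min 50%, capped at 50%', mgmt_p1 => 50, max_utilization_limit => 50);
  DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE('DB_CONSOLIDATION_PLAN', 'OTHER_GROUPS',
    'everything else', mgmt_p1 => 50, max_utilization_limit => 50);
  DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
END;
/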
Test Results: CPU Utilization Limit
Setting limit to 25%, 50%, and 75%
Workload is a mix of
OLTP transactions,
parallel queries, and
DMLs from Oracle
Financials
Test Scenario: I/O Utilization Limit

[Chart: disk utilization over 14 minutes for a TPC-H workload with no I/O
utilization limit and with limits of 75%, 50%, and 25%]
<Insert Picture Here>

Server consolidation
Server Consolidation
Challenges
• Common theme in today's data centers
• Many test, development, and small production databases
• Low loads
• Not critical
• Cannot fully utilize today's powerful servers!
• Solution – server consolidation
• Run multiple database instances on the same server
• But there may be problems
• Contention for CPU, memory, and I/O
• Unexpected workload surges on one instance can wreak
havoc on other databases
Server consolidation challenge
Instance Caging

• Limits the CPU consumption of a database instance


• Advantages over virtualization
• No I/O overhead
• No new license
• No sys-admin overhead
• Advantages over O/S workload managers
• Available on all platforms
• Easy to configure
Instance Caging
Configuration

• Just 2 steps:
1. Set the "cpu_count" parameter
   • Maximum number of CPUs the instance can use at any time
2. Set the "resource_manager_plan" parameter
   • Enable any CPU resource plan
   • E.g. the out-of-box plan "DEFAULT_PLAN"
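For example, to cage an instance to 2 CPUs:

ALTER SYSTEM SET cpu_count = 2;
ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';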
Instance caging
Over-provisioning approach
• Scenario
  • Multiple database instances sharing a server
  • Instances are typically well-behaved
  • Server's CPUs not typically fully utilized
• Use Instance Caging to over-provision
  • Limit each instance's CPU usage

[Diagram: instances A-D with cpu_counts summing to 12 on a server with a
total of 4 CPUs]
Instance caging
Partitioning approach
• Scenario
  • Multiple database instances sharing a server
  • Performance-critical databases
  • Cannot afford any interference from each other
• Use Instance Caging to partition

[Diagram: instances A-D with cpu_counts summing to 32 on a server with a
total of 32 CPUs]
Instance Caging
Results
• Swingbench OLTP application
• 4 CPU Linux server
• Oracle 11gR2
• Instance Caging enabled with 2 CPUs

[Chart: Swingbench CPU utilization (user, sys, idle) with Instance Caging
off and with Instance Caging on]
Instance Caging
Results

• 2 sysbench applications
• 6 CPU Linux server
• Oracle 11gR2
• Instance Caging enabled to partition the server

[Chart: transactions per second for Instance 1 and Instance 2 as the
cpu_count split <Instance 1>,<Instance 2> varies from 0,6 through 6,0,
compared with running Instance 1 only with no Instance Caging]
Exadata I/O Resource Manager

Need to limit the disk utilization of a database?

Maximum disk utilization limits for I/O:


 Coming soon!
 Provides predictable, consistent performance
 Configure via inter-database resource manager plan
 Specifies the maximum disk utilization for each
database
<Insert Picture Here>

I/O Resource Manager

Module Agenda

• Shared storage issues <Insert Picture Here>

• I/O Resource Manager overview


• IORM resource management
• IORM examples
• IORM at work
• Enabling IORM
<Insert Picture Here>

Shared storage issues


Issues with shared storage

• Storage can be shared by multiple types of workloads


and multiple databases
• Sharing lowers administration costs
• Sharing leads to more efficient usage of storage
• But, workloads may not happily coexist
• ETL jobs interfere with DSS query performance
• One production data warehouse can interfere with another
• How do you gain benefits of shared storage without
introducing inconsistent performance?
Issues with shared storage
Traditional Solutions
• Over-provision the storage system
• Configure your storage based on the maximum expected load
• Wasteful and expensive
• Place performance-critical databases on dedicated
storage
• Still need to ensure that administrative tasks like backups or
data loads don‘t interfere
• High administrative costs and expensive storage
• Schedule non-critical tasks at off-peak hours
• Cumbersome and prone to problems
Issues with shared storage
Exadata solution
• Efficient utilization of I/O bandwidth
• Goal is to have 100% utilization
• Consistent performance
• Goal is to avoid 100+% utilization
• Prioritization of workloads
• Goal is high priority workloads get enough bandwidth
<Insert Picture Here>

I/O Resource Manager


overview
Sample Exadata Configuration
[Diagram: a single-instance database and a RAC database connected through
an InfiniBand switch/network to three Exadata Cells]

• Databases are deployed across multiple Exadata cells


• Database enhanced to work in cooperation with Exadata
intelligent storage
• ASM implements striping and mirroring for Exadata
• Exadata Storage Servers can support multiple databases
I/O Bandwidth Limits
Extreme consequences

• Each Exadata Cell has an I/O bandwidth limit
• If the databases issue I/O over this limit, performance will degrade

[Diagram: a production database running Ad-Hoc Queries (desired bandwidth
500 MB/s) and Critical Reports (desired bandwidth 1000 MB/s) and a
development database running Reports (desired bandwidth 800 MB/s) share
storage with an available I/O bandwidth of 1200 MB/s; total desired
bandwidth is 500 + 1000 + 800 = 2300 MB/s]
Managing the I/O Bandwidth with IORM

• I/O Resource Manager provides a way to manage how multiple workloads and
  databases share the available I/O bandwidth

[Diagram: with IORM, the production database's Ad-Hoc Queries are throttled
from 500 MB/s to 100 MB/s, Critical Reports keep 1000 MB/s, and the
development database's Reports are throttled from 800 MB/s to 100 MB/s, so
the actual bandwidth of 100 + 1000 + 100 = 1200 MB/s matches the available
I/O bandwidth of 1200 MB/s]
When Does I/O Resource Manager
Help the Most?
• Conflicting Workloads
• Multiple consumer groups in a Database (e.g. ad hoc queries,
critical reports)
• Multiple databases (e.g. production, test)
• Concurrent database administration - backups, ETL, file
creation
• I/O is a bottleneck
• Significant proportion of the wait events are for I/O
• Any data warehouse workload!
<Insert Picture Here>

IORM resource
management
IORM Possible Scenarios
[Diagram: I/O Resource Manager scenarios. Inside one database, a mixed
workload is handled by intra-database resource management. Across multiple
databases, dueling databases are handled by inter-database resource
management and cooperative databases by category resource management.]
IORM Resource Management
Intra-database

• Used to manage multiple workloads in a single database


• Enabled at the database level by Database Resource Manager
and resource plans
• Group sessions with similar performance objectives into
consumer groups
• Create a resource plan that specifies how I/O requests should be
prioritized
Creating Consumer Groups
• Create consumer groups for each type of workload, e.g.
• Priority DSS consumer group
• DSS consumer group
• Maintenance consumer group
• Create rules to dynamically map sessions to consumer groups,
based on session attributes
Consumer Groups
Mapping Rules

Priority DSS
service = 'PRIORITY'
Oracle username = 'LARRY'
Oracle username = 'DEV'
client program name = 'ETL'
DSS
function = 'BACKUP'
query has been running > 1 hour
Maintenance
Creating Resource Plans

Priority-Based Plan
  Priority 1: Priority DSS
  Priority 2: DSS
  Priority 3: Maintenance

Ratio-Based Plan
  Priority DSS 60%, DSS 30%, Maintenance 10%

Hybrid Plan
  Level 1: Priority DSS 90%
  Level 2: DSS 100%
  Maintenance: 5%
Configuring Consumer Groups & Plans

• Consumer groups and plans are configured on the database


• Configure using dbms_resource_manager PL/SQL package
• Configure using Resource Manager section in Enterprise
Manager
• Plans are used for both CPU and I/O resource management
• Multiple plans can be defined
• E.g. daytime plan, evening plan, emergency maintenance
plan
• Set plans using the "resource_manager_plan" parameter
• Only one plan can be enabled at any time
• Use the Job Scheduler to automatically enable plans
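The configured plans and the currently enabled plan can be checked from the data dictionary, for example:

SQL> SELECT plan, comments FROM DBA_RSRC_PLANS;
SQL> SELECT value FROM V$PARAMETER WHERE name = 'resource_manager_plan';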
IORM Resource Management
Inter-database
• Can I/O Resource Manager allow multiple databases to effectively
share Exadata storage with the following requirements?
• Partition resources among multiple production databases
• Don't allow standby, development, and test databases to impact
  production databases

[Diagram: a Sales Data Warehouse, a Finance Data Warehouse, a Customer
Service Standby Database, a Sales Development Database, and a Sales Test
Database all sharing Exadata Storage]
IORM Resource Management
Inter-database

• Inter-database plan allocates resources for each database


• Divides resources among production databases
• Allocates unconsumed resources to test databases
• Configure and enable inter-database plans via CellCLI on each
  Exadata Server
• Can have multiple levels of plan
• Each sublevel uses resources left over from superior level
Exadata Inter-database Plans
Usage
• Only inter-database plan configured
– IORM picks a database I/O request using the inter-database
plan
• Inter-database plans can be configured along with intra-database plans
  – Inter-database plans manage I/O among databases
  – Intra-database plans manage I/O among consumer groups within a database
  – IORM first picks a database I/O request using the inter-database plan
  – Then picks a consumer group from that database, using its
    intra-database plan
Exadata Inter-database plans
[Diagram: on the Exadata cell, the Sales and Finance data warehouses each
have a Priority DSS consumer group queue and a DSS consumer group queue;
the I/O Resource Manager selects requests from these queues to fill the
cell's disk queue]
IORM Resource Management
Categories
• Categories are used to group consumer groups, based on the nature of the
  workload
• Goal: workload priority should depend on its type, not just which
  database it's running on

Database            Consumer Group   Workload Type
Sales Production    Priority DSS     Critical
                    DSS              Somewhat critical
                    Maintenance      Not critical
Finance Production  Priority DSS     Critical
                    DSS              Somewhat critical
                    Maintenance      Not critical
Sales Development   Priority DSS     Not critical
                    DSS              Not critical
Categories
Grouping consumer groups

• Category plan allocates resources for each Category


• Category is an attribute of each Consumer Group
• Category Plans are enabled and configured via CellCLI on each
  Exadata Cell

Priority-Based Category Plan


Priority 1: Critical
Priority 2: Somewhat Critical
Priority 3: Not Critical
Categories
With other plans

• First, categories (if present) allocate I/O requests for cell


• Second, inter-database plans (if present) allocate I/O
requests for multiple databases per cell
• Finally, intra-database plans (if present) allocate I/O for
consumer groups within a cell
IORM Resource Management
Levels

• Levels are a way to give priority to some consumer


groups over others
• Each lower level gets to allocate I/O resources that were not
allocated by the previous level
• You can specify up to 8 levels of resource allocation
• Each level assigns percentages to consumer groups,
databases or categories
<Insert Picture Here>

IORM examples
IORM Possibilities

• Give 70% of my storage performance capacity to Data Warehouse


finance, 30% to Data Warehouse sales
 Enable an Inter-Database plan
• Prioritize my Production Databases over my Test and Development
Databases
 Enable an Inter-Database plan
• Prioritize my OLTP workloads over my maintenance workloads
 Enable a Category plan
• For my Standby Databases, prioritize apply I/O‘s over read-only queries
 IORM does this automatically
• Always prioritize Control File and other critical I/O‘s
 IORM does this automatically
• Automatically pace ASM rebalances and RMAN jobs
 IORM does this automatically
Scenario 1: OLTP vs Report

• Database has 2 workloads


• Critical OLTP workload: Order Entry application
• Non-critical workload: Report, based on Order Items table
• Your goal: protect the performance of critical OLTP workload
• Solution: use a priority-based resource plan

Priority-Based Plan
Priority 1: Interactive
Priority 2: Batch
Scenario 1: OLTP vs Report
Results
• I/O Resource Manager boosts OLTP performance by 408%!
• Report has small effect on OLTP performance (8%)
• Report data uses significant disk space, resulting in longer seek times
• Storage system is fully utilized
• OLTP workload: 376 IOPS per disk
• Report workload: 5 MBps per disk
[Chart: OLTP performance in transactions per second for OLTP only, OLTP &
Report without IORM, and OLTP & Report with IORM; more than a 4x
improvement with IORM]
Scenario 2: DSS Query vs DSS Query

• Two different Data Warehouses are running DSS queries


• Production Data Warehouse: Critical
• Development Data Warehouse: Non-Critical
• Our goal: protect the performance of Critical DSS query
• Solution: use an Inter-Database plan to prioritize production data
warehouse

Priority-Based Plan
Priority 1: Production Data Warehouse
Priority 2: Development Data Warehouse
Scenario 2: DSS Query vs DSS Query
Results
• I/O Resource Manager boosts critical query time by 41%
• Non-critical query has small effect on critical query (9%)
• Report data uses significant disk space, resulting in longer seek times
• Running queries together is 17% more efficient than running them
serially

[Chart: critical query elapsed time in seconds for the critical query
alone, both queries without IORM, and both queries with IORM; a 41%
improvement with IORM]
<Insert Picture Here>

IORM at work
How IORM operates

• Resource limits only take effect when I/O bandwidth is


100% utilized
• Any resource group or category can access
bandwidth until I/O bandwidth saturation is reached
• Once I/O bandwidth is taken, I/O requests are queued
according to the IORM plan(s)
• Sub-plans allocate the resources given to the owner of the plan
I/O Scheduling
Traditional way
• With traditional storage, I/O schedulers are black boxes
• You cannot influence their behavior!
• I/O requests are processed in FIFO order
• Some reordering may be done to improve disk efficiency
• Elevator algorithms, deadline scheduling

[Diagram: I/O requests from high-priority and low-priority workloads flow
from the RDBMS server to traditional storage, where they sit in a single
disk queue in arrival order (H L H L L L)]
I/O Scheduling
Exadata way
• Exadata executes requests, based on the user‘s prioritization
scheme
• Exadata may internally queue I/O requests to prevent a low-
priority intensive workload from flooding the disk

[Diagram: on Exadata, I/O requests from the RDBMS are placed in separate
high-priority and low-priority workload queues; the I/O Resource Manager
decides which queue to service, so the disk queue holds mostly
high-priority requests (H L H H) while low-priority requests (L L L L)
wait]
IORM Resource Plans
• I/O Resource Manager issues enough I/O requests to the disk to
keep it busy and efficient
• One queue for each consumer group
• When IORM is ready to issue the next request, it uses the
Resource Plan to select a consumer group queue
• Percentage for each queue determined by overall resource plan

[Diagram: I/O requests from the RDBMS are queued per consumer group
(a Priority DSS queue and a DSS queue) on the cell; IORM uses the resource
plan to choose which consumer group queue supplies the next request for
the disk queue]
IORM
Background I/Os

• Redo and control file I/Os always take top priority


• DBWR writes take priority specified in plan
IORM allocations
Categories, inter-database, intra-database

Category     Inter-database plan   Intra-database plan   Share of Cell 1
High (70%)   Database 1 (60%)      CG 1 – 50%            21%
             = 42% of the cell     CG 2 – 50%            21%
             Database 2 (40%)      CG 3 – 50%            14%
             = 28% of the cell     CG 4 – 50%            14%
Low (30%)    Database 1 (60%)      CG 5 – 75%            13.5%
             = 18% of the cell     CG 6 – 25%            4.5%
             Database 2 (40%)      CG 7 – 80%            9.6%
             = 12% of the cell     CG 8 – 20%            2.4%

The category plan is applied first, then the inter-database plan, then each
database's intra-database plan; e.g. 70% x 60% x 50% = 21% of Cell 1.
<Insert Picture Here>

Enabling IORM
Enabling IORM
Steps
• Define consumer groups with DBRM
• You must assign sessions to consumer groups, either manually or
through consumer group mapping rules
• Create intra-database plan with Database Resource
Manager
• [Assign categories to consumer groups with DBRM]
• [Create inter-database plan with CellCLI]
• Enable plan with RESOURCE_MANAGER_PLAN
parameter
• Enable IORMPLAN on all cells
• DBPLAN and CATPLAN
Enabling IORM

• You can switch Database Resource Manager IORM


plans at runtime
• IORM plans persist through cell reboots
<Insert Picture Here>

Flash Cache

Module Agenda

• Flash Cache basics <Insert Picture Here>

• Configuring Flash Cache


• Flash Cache usage
• Flash Cache at work
• Flash Cache monitoring
• Flash Cache troubleshooting
<Insert Picture Here>

Flash Cache basics


Why Flash?

• Disk drives hold vast amounts of data


• But are limited to a few hundred I/Os per second

• Flash technology holds much less data


• But can run tens of thousands of I/Os per second

• Exadata v2+ solution:


• Keep most data on disk for low cost
• Transparently move hot data to flash
Sun Exadata Storage Server

[Photo callouts: dual-redundant, hot-swappable power supplies; 24 GB DRAM
(6 x 4 GB); ILOM; 12 x 3.5" disk drives; disk controller HBA with 512M BBC;
2 quad-core Intel® Xeon® E5540 processors; dual-port InfiniBand QDR
(40 Gb/s) card; 4 x 96 GB Sun Flash PCIe cards]
Sun Flash Accelerator F20
• 96GB Storage Capacity
• 4 x 24GB Flash modules/DOM
• 6GB reserved for failures
– Advanced Wear Leveling, Page Erase
Management, Performance Pipelining, Bad Block
Mapping
• x8 PCIe card
• Avoid disk controller limitations
• Super Capacitor backup
• Built-in write-back cache
• Measured end-to-end performance
• 3.6GB/sec/cell
• 75,000 read IOPs/cell
Smart Flash Cache benefits
Performance
• 50GB/s throughput
• 1 million IOPs
• Use PCIe cards instead of SSDs to avoid slow disk interface
• Exadata storage, InfiniBand and PCIe can drive higher levels of
performance
• Traditional storage arrays and SANs already have internal
bottlenecks which prevent them from exploiting the full spinning
disk performance and hence are unable to leverage the higher
performance of flash technology
Smart Flash Cache benefits
Capacity
• Linearly scalable – no bottlenecks as you add more
storage
• Efficient compression increases effective performance
and capacity by up to 10X
Smart Flash Cache benefits
Smart caching
• Integrated database and Exadata Storage Server software ensures that
  only frequently accessed data is cached
  • Automatically skips caching of data that will not be frequently
    accessed or that will not fit in the cache
• Backups, mirrored copies, ASM rebalance, Data Pump, etc.
• Database awareness enables caching only data likely to be
accessed again
• User can fine-tune caching policies online
• Hardware flash cannot distinguish between relevant
database data and other data
• Much lower cache efficiency
• Much higher cost
<Insert Picture Here>

Configuring Flash Cache


Flash Cache
Organization

• 4 x 24 GB flash memory modules per card
• 4 cards per cell
• 384 GB flash memory per cell – 16 flash cell disks
Flash Cache
Management
• Managed using MS CellCLI command tool
• By default, automatically created at cell creation
• CellCLI> CREATE CELL <Name> …
• Uses all available flash space by default
• Can be dropped at any time
• CellCLI> DROP FLASHCACHE
• Can be re-created at any time
• CellCLI> CREATE FLASHCACHE ALL [SIZE=…]
Flash Cache
Usage

• Flash-based cell disks can be used for


• Smart Flash Cache
• Uses all available space by default
• Managed automatically for maximum efficiency
• Flash-based grid disks
• Premium persistent DB storage
• Requires deliberate planning for efficient usage
Flash Cache
Flash-based grid-disks

[Diagram: flash-based cell disks are partitioned into grid disks (Grid
Disk 1 ... Grid Disk n), which are presented to ASM disk groups]
Flash Cache
Creating grid disks
• Flash-based cell disks and grid disks
CellCLI> LIST CELLDISK DETAIL
name: FD_00_cell01
diskType: FlashDisk
. . .
name: CD_00_cell01
diskType: HardDisk
. . .

CellCLI> CREATE GRIDDISK ALL FLASHDISK -
         PREFIX='FAST', SIZE=10G
GridDisk FAST_FD_00_cell01 successfully created
GridDisk FAST_FD_01_cell01 successfully created
. . .
<Insert Picture Here>

Flash Cache usage


Flash Cache usage
Prioritization
• Prioritization levels
• DEFAULT
• KEEP
• NONE
• Assigned to table, index, partition or LOB column
• Can be modified with an ALTER statement
Flash Cache usage
Prioritization syntax

CREATE TABLE pt (c1 number)


PARTITION BY RANGE(c1)
(PARTITION p1 VALUES LESS THAN (100)
STORAGE (CELL_FLASH_CACHE DEFAULT),
PARTITION p2 VALUES LESS THAN (200)
STORAGE (CELL_FLASH_CACHE KEEP));

ALTER INDEX tkbi STORAGE (CELL_FLASH_CACHE NONE);


Prioritizing Flash Cache usage
KEEP option
• Impact of KEEP objects
• Cached more aggressively
• Cannot be pushed out by ‗default‘ objects
• 80% upper limit on KEEP cache size
• Do not add more data than KEEP can hold at one time
• Keep blocks are automatically ‗un-pinned‘ if
• Object is dropped, shrunk, or truncated
• Object is not accessed on the cell within 48 hours
• Block is not accessed on the cell within 24 hours
• Downgraded to ‗DEFAULT‘ behavior
• Changing priority from KEEP to NONE marks blocks in
cache as DEFAULT
<Insert Picture Here>

Flash Cache at work


Flash Cache at work
Database server prep

• SQL statement is optimized


• SQL statement is sent to Exadata Storage Server
(CellSRV), with Flash Cache prioritization for objects

[Diagram: the database server sends the statement to CellSRV on the cell,
which manages the Flash Cache and the disks]
Flash Cache at work
Read operations

• Checks to see if Smart Scan candidate


• If no, or if object has KEEP attribute
• Checks to see if object is in the Flash Cache
• Else, go to disk
• For some operations, may go to Flash Cache and
disk, increasing overall bandwidth

[Diagram: for reads, CellSRV decides whether to serve the request from the
Flash Cache or from disk]
Flash Cache at work
Write operations

• Writes directly to disk


• Acknowledge write to database server
• Does not interfere with speed of write operations

[Diagram: for writes, CellSRV writes directly to disk and acknowledges the
write to the database server]
Flash Cache at work
Post operation (read or write)

• Checks to see if data should be cached


• Mirrored I/Os, log writes, control file writes, etc., never cached
• If block not in Flash Cache
• Checks to see Flash Cache attribute
• KEEP – store in Flash Cache
• NONE – do not store in Flash Cache
• DEFAULT – if read operation and small I/O, store in Flash
Cache
[Diagram: after the operation, CellSRV decides whether the data should be
stored in the Flash Cache]
Flash Cache at work
Large I/Os not cached

• Flash Cache improves response time for small I/Os


• Flash Cache increases bandwidth for large I/Os
• No improvement for response time for large I/Os with Flash
Cache
• CellSRV will use bandwidth of both disk reads and Flash
Cache when appropriate, increasing overall bandwidth
• Usage patterns determine if queries require increased
bandwidth for large I/Os
<Insert Picture Here>

Flash Cache monitoring


Flash Cache monitoring
MS Metrics
• Get general information about Smart Flash Cache

CellCLI> LIST FLASHCACHE DETAIL

name: cell01_FLASHCACHE
cellDisk: FD_00_cell01,FD_01_cell01
. . .
FD_14_cell01,FD_15_cell01
creationTime: 2009-10-19T17:18:35-07:00
id: b79b3376-7b89-4de8-8051-6eefc
size: 365.25G
status: normal
Flash Cache monitoring
MS Metrics
• Get overall statistics for Smart Flash Cache on a Cell

CellCLI> LIST METRICCURRENT WHERE -
         objectType='FLASHCACHE'
FC_BY_USED           72119 MB
FC_IO_RQ_R           55395828 IO requests
FC_IO_RQ_R_MISS      123184 IO requests
...

CellCLI> LIST METRICDEFINITION FC_BY.*_USED DETAIL


name: FC_BY_USED
description: “Megabytes used on FlashCache”

name: FC_BYKEEP_USED
description: “Megabytes used for
keep objects on FlashCache"
Flash Cache monitoring
Finding if an object is cached
• Cell-level caching statistics for a DB object

SQL> SELECT object_id FROM DBA_OBJECTS


2 WHERE object_name='EMP';
OBJECT_ID
---------
57435
CellCLI> LIST FLASHCACHECONTENT
WHERE objectNumber=57435 DETAIL
cachedKeepSize: 0
cachedSize: 495438874
dbID: 70052
hitCount: 415483
missCount: 2059
objectNumber: 57435
tableSpaceNumber: 1
Flash Cache monitoring
Statistics and wait events
• Use standard Oracle tools
• AWR
• Enterprise Monitor
• End-to-End
• V$SYSSTATS statistics
• cell flash cache read hits
• physical read total bytes optimized
• cell physical IO bytes saved by storage index
• V$SQL
• OPTIMIZED_PHY_READ_REQUESTS
Flash Cache monitoring
Database level
• System statistics
SQL> SELECT name, value FROM V$SYSSTAT WHERE
2 NAME IN ('physical read total IO requests',
3 'cell flash cache read hits');
NAME VALUE
physical read total IO requests 15673
cell flash cache read hits 14664

• AWR report

Segments by UnOptimized Reads


Tablespc UnOptimized
Name Object Type Reads %Total
CUST_0 CUST TABLE 7,322,866 31.95
IORDL_0 IORDL INDEX 3,787,324 16.52
Flash Response monitoring
Database level
• Segment Statistics
SQL> SELECT object_id FROM DBA_OBJECTS
2 WHERE object_name='EMP';
OBJECT_ID
---------
57435
SQL> SELECT statistic_name, value
  2  FROM V$SEGMENT_STATISTICS
  3  WHERE dataobj#= 57435 AND ts#=5 AND
  4  statistic_name='optimized physical reads';

STATISTIC_NAME VALUE
------------------------ ------
optimized physical reads 743502
Flash Response monitoring
Mapping cell disks

CellCLI> LIST LUN WHERE cellDisk='FD_00_cell03' DETAIL


name: 1_1
cellDisk: FD_00_cell03
deviceName: /dev/sdn
diskType: FlashDisk
physicalDrives: [9:0:2:0]

CellCLI> LIST PHYSICALDISK '[9:0:2:0]' DETAIL


name: [9:0:2:0]
physicalFirmware: D20R
slotNumber: "PCI Slot: 1; FDOM: 1"
<Insert Picture Here>

Flash Cache
troubleshooting
Data integrity
Protection
• Flash Cache is less stable than disk
• Flash Cache includes read-cache-verification
• A few 'check bytes' are stored in memory for every 4 KB of data written
  to flash
• During flash reads the 'check bytes' are verified
• If verification fails, data is read from disk
• Checking for Flash Cache errors
• CellCLI> LIST METRICCURRENT FC_IO_ERRS
CELLSRV alert.log file

• Verification level may be changed


<Insert Picture Here>

Oracle Exadata Database Machine performance

Module Agenda

• Performance and I/O <Insert Picture Here>

• Data flow exchanges


• Data flow exchange capacities
• Case studies
<Insert Picture Here>

Performance and I/O


Performance fundamentals
I/O constrained workloads

• Host(s) must be able to generate I/O requests


• CPU bound systems cannot generate more I/O
• Storage must be able to deliver the I/O
• Conventional storage bottlenecks abound
• Drawer, Loop, Storage Processor

• Host(s) must be able to ingest the I/O


• Must have adequate I/O adaptors
• Must have balanced ―bus‖ / memory bandwidth
• Must have adequate CPU bandwidth
• CPUs saturated by data in-flow cannot generate more I/O
Performance fundamentals
Data flow exchanges

• Data flow exchanges


• There exists a ―producer‖ / ―consumer‖ relationship between
the database grid and the storage grid.
• Points of data-flow between producers and consumers are
called data flow exchanges
• Producer/consumer relationships are the foundation of
throughput
<Insert Picture Here>

Data flow exchanges


Sun Oracle Database Machine
Data flow exchanges

• Exchange 1: Within a cell. The flow of data between:
  • HDD/flash <-> memory <-> CPU <-> iDB

[Diagram: the database grid and the storage grid (each server shown as
CPUs & memory) connected by InfiniBand, with Exchange 1 marked inside each
storage cell]
Sun Oracle Database Machine
Data flow exchanges

• Exchange 1: Within a cell. The flow of data between:
  • HDD/flash <-> memory <-> CPU <-> iDB
• Exchange 2: The flow of data between a single cell and the database grid
  via iDB.
  • Realizable bandwidth is roughly 2.5 GB/s

[Diagram: as above, with Exchange 2 marked on the InfiniBand link between
a single cell and the database grid]
Sun Oracle Database Machine
Data flow exchanges

• Exchange 1: Within a cell. The flow of data between:
  • HDD/flash <-> memory <-> CPU <-> iDB
• Exchange 2: The flow of data between a single cell and the database grid
  via iDB.
  • Realizable bandwidth is roughly 2.5 GB/s
• Exchange 3: The flow of data between a single database host and the
  storage grid.
  • Realizable bandwidth is roughly 2.5 GB/s

[Diagram: as above, with Exchange 3 marked between a single database host
and the storage grid]
Sun Oracle Database Machine
Data flow enhancement
• The Sun Oracle Database Machine implements best
practices for throughput
• Balanced configuration, designed to avoid bottlenecks at all
points in the data flow
• Software designed to reduce volume of data required to flow
• Elimination of
• Excess rows (predicate evaluation, join filtering, storage
indexes)
• Excess columns (column projection)
• Software designed to reduce disk I/O
• Storage indexes, Exadata Smart Flash Cache
• Software designed to efficiently allocate data flow bandwidth
• IORM
<Insert Picture Here>

Data flow exchange


capacities
Database Machine data flows
Maximum bandwidth
• Exchange 1: Within a cell.
• 125 MB/s * 12 HDD == 1.5 GB/s
• 3.6 GB/s (FLASH) + 1.2 GB/s (HDD) == ~4.8 GB/s
• Exchange 2: Between a single cell and the database (1:M) grid via iDB
• 2.5 GB/s
• Flow Control: A cell in the storage grid cannot produce/consume 2.5
GB/s unless the database grid can produce/consume the data.
• Exchange 3: Between a single database host and the storage grid via
iDB (1:N)
• 2.5 GB/s
• Flow Control: A host in the database grid cannot produce/consume 2.5
GB/s unless the storage grid can produce/consume the data.
Database Machine data flows
Scaling – Exchange 1

• Exchange 1:
• Scales horizontally to a maximum aggregate rate of roughly
67 GB/s
• Achieving this maximum theoretical rate involves parallel
scanning of Flash and HDD on all cells in a full rack.
• At this rate, data cannot flow through Exchange 2. That is,
the data cannot leave the cells at this rate.
• Think of a query that looks for a non-existent needle in
a haystack:
• SQL> SELECT base FROM payroll WHERE base >
8,000,000 ;
• Many chefs

Database Machine data flows
Scaling – Exchange 2

• Exchange 2:
• Scales out to a maximum aggregate rate of roughly 20 GB/s
• This is the aggregate rate of data flow between the storage
grid and the database grid.
• The difference between Exchange 1 and Exchange 2 is the
Smart Scan effectiveness.
• Cells must reduce payload through projection/filtration to fit
within Exchange 2 bandwidth.
• The aggregate out-flow rate from Exchange 2 must fit
within about 20 GB/s
Database Machine data flows
Scaling – Exchange 2

• Exchange 2:
• All hosts in the database grid must participate in order to
accommodate maximum Exchange 2 data flow
• That is, less than 8 hosts cannot ingest this flow of data
• NOTE: A single Oracle foreground (no PQ) can drive
storage at roughly 20 GB/s but no data can flow from the
storage grid to the single foreground process at this rate.
Think of a fully offloaded query.
• 1 order, many eaters
Database Machine data flows
Scaling – Exchange 3

• Exchange 3:
• Scales horizontally to a maximum aggregate rate of roughly
20 GB/s
• This is the aggregate rate of data flow between the database
grid and the storage grid.
• PQ server must have sufficient CPU bandwidth else disk I/O
is throttled.
• All database hosts must participate to realize maximum
theoretical Exchange 3 bandwidth.
• Many orders, 1 dish
Sun Oracle Database Machine
Data flow maximums

Exchange 3: 2.5 GB/s per host, 20 GB/s aggregate
Exchange 2: 2.5 GB/s per cell, 20 GB/s aggregate
Exchange 1: 4.8 GB/s per cell, 67 GB/s aggregate

[Diagram: the database grid and storage grid connected by InfiniBand, with
the per-host/per-cell and aggregate bandwidths marked at each exchange]
<Insert Picture Here>

Case studies
Case studies
Principles

• Lowest effective bandwidth through any exchange


limits overall throughput
• Exadata storage intelligence is used to limit required
data flow from cells to database grid
• RAC is used to combine consumer capability of
database grid
• Disks spin, but heads only move to read or write when
someone asks for something
• I/O and CPU utilization are linked
Case studies
Schema
SQL> desc all_card_trans
Name Null? Type
----------------------------------------- -------- ----------------------------
CARD_NO NOT NULL VARCHAR2(20)
CARD_TYPE CHAR(20)
MCC NOT NULL NUMBER(6)
PURCHASE_AMT NOT NULL NUMBER(6)
PURCHASE_DT NOT NULL DATE
MERCHANT_CODE NOT NULL NUMBER(7)
MERCHANT_CITY NOT NULL VARCHAR2(40)
MERCHANT_STATE NOT NULL CHAR(3)
MERCHANT_ZIP NOT NULL NUMBER(6)
Case studies
Queries

• Light, lightweight Scan


• SQL> select max(mcc) from all_card_trans where mcc < 0;
• Lightweight scan 50% selectivity
• SQL> select max(mcc) from all_card_trans where
purchase_amt > 60;
• Busy Storage Server CPUs

• Complex Query
• Synopsis:
• 5-table join – Busy database CPUs
• Heavy predicate evaluation – Busy Storage Server CPUs
• See next slide
Case studies
Complex query

with yyy as (
  select custid, sum(refund_amt) returns from CUST_SERVICE cs where return_dt > ( SYSDATE - 180)
  and CLUB_CARD_NUM > 0 and CC_NUMBER > 0 and cs.club_card_num not like '%A%' group by custid
),
xxx as ( select cmr.custid, cf.aff_cc_num, cmr.returns, sum(os.trans_amt) from yyy cmr, CF_BASE2 cf, OS_BASE2 os
  where cf.custid = cmr.custid and cf.custid = os.custid and os.club_points_earned > 0 and os.STORE_CODE > 0
  and os.TRANS_ID > 0 and os.CUSTID > 0 and ( cf.club_card_num not like '%A%' and cf.AFF_CC_NUM not like '%A%'
  and cf.CUST_SHIPTO_DETAIL2 not like '%NO DETAIL%' and cf.CUSTDETAIL1 not like '%NO DETAIL%'
  ) and os.trans_dt > ( SYSDATE - 180) group by cmr.custid, cf.aff_cc_num, cmr.returns
  having (returns / sum(os.trans_amt) * 100) > 2
)
select card_no, sum(purchase_amt) sales
from ACT_BASE2 act where act.purchase_dt > ( SYSDATE - 180)
and card_no in ( select aff_cc_num from xxx) and act.merchant_code not in
  ( select merchant_code from PARTNER_MERCHANTS where store_zip > 0 and store_name not like '%ACME%')
and MERCHANT_CITY not like '%Frankfort%' and MERCHANT_STATE not like '%KY%'
and MCC > 1 and CARD_TYPE not like '%NO CARD%' and PURCHASE_AMT > 0
group by card_no
having sum(purchase_amt) > 10;
Case study 1
Results
• Light, lightweight scan
• Fully offloaded – no data returned to the server
• Caveats: this full rack is configured with 450 GB SAS drives and is
  missing one 96 GB Flash card.
• Maximum HDD disk throughput is 20 GB/s; combined Flash+HDD is 54 GB/s.

SQL> SELECT MAX(MCC) FROM ALL_CARD_TRANS WHERE MCC < 0;

Storage     Query                               Throughput (GB/s)    CPU %busy        Query     CPU      Effective
Source                                          From Storage  iDB    Cells  Database  Tm (sec)  Seconds  GB/s
FLASH+HDD   Light, lightweight Scan             49            0.007  35     3         51        4194
FLASH+HDD   Light, lightweight Scan (HCC 6:1)   23            0.007  90     2         20        4124     125
HDD         Light, lightweight Scan             20            0.003  15     3         125       4682
HDD         Light, lightweight Scan (HCC 6:1)   19            0.012  70     5         24        3956     104
Case study 1
Results (continued; the table on the previous slide applies)

• Data Flow Lessons:
  • The HCC query-time benefit is metered by cell CPU bandwidth
  • HCC data scanning from FLASH+HDD is, of course, the fastest even though
    the scan rate from HDD+FLASH is only 23 GB/s
  • HDD non-compressed -> FLASH+HDD compressed is 6.25x faster and uses 12%
    fewer total CPU seconds
  • If the performance you are measuring does not meet your expectations,
    remember the 3 Data Flow Exchanges
Case study 2
Results

• Lightweight scan – 50% selectivity

SQL> select max(mcc) from all_card_trans where purchase_amt > 60;

Storage     Query                                        Throughput (GB/s)    CPU %busy        Query     CPU      Effective
Source                                                   From Storage  iDB    Cells  Database  Tm (sec)  Seconds  GB/s
FLASH+HDD   Lightweight Scan 50% Selectivity             50            3.8    80     15        52        10317
FLASH+HDD   Lightweight Scan 50% Selectivity (HCC 6:1)   13            2.7    90     50        37        9827     70
HDD         Lightweight Scan 50% Selectivity             20            1.5    28     10        126       9520
HDD         Lightweight Scan 50% Selectivity (HCC 6:1)   13            2.6    90     50        36        9657     69
Case study 2
Results (continued; the table on the previous slide applies)

• Data Flow Lessons:
  • Producer CPU utilization throttles disk throughput
  • Cell CPU reaches a critical level when scanning/filtering/projecting
    data from either HDD or combined HDD+FLASH; both media sources generate
    the same effective throughput
  • Use assets wisely. Do all tables need to be pinned in the cell Flash
    Cache?
  • Expect increased CPU utilization in both grids when querying HCC data
  • If the performance you are measuring does not meet your expectations,
    remember the 3 data flow exchanges
Data Flow Dynamics
Case Study Examples. Complex Query Case.
HDD, Non-HCC. Average disk I/O 9 GB/s, iDB 6 GB/s.

[Charts: "Complex Query HDD Phys I/O" – disk and iDB throughput in MB/s
over roughly 30 seconds of the query; "Complex Query CPU" – cell and
database grid CPU utilization (%) over the same interval]
Data Flow Dynamics
Case Study Examples. Complex Query Case.

[Chart: "Complex Query CPU" – cell and database grid CPU utilization (%)
during the complex query]

• Complex Query Lessons:
  • Heavy joins throttle I/O
  • Heavy filtration/projection throttles I/O
  • If the performance you are measuring does not meet your expectations,
    remember the 3 Data Flow Exchanges
<Insert Picture Here>

CellCLI, DCLI and ADRCI

Module Agenda

• Exadata software architecture <Insert Picture Here>

• CellCLI
• DCLI
• ADRCI
<Insert Picture Here>

Exadata software
architecture
Exadata software architecture

[Diagram: cell software components – CellCLI, the Management Server (MS),
CellSRV (which hosts IORM), the Restart Server (RS), the iDB protocol to
the database servers, and the cell's disks]
<Insert Picture Here>

CellCLI
CellCLI
Overview
• Command line utility for managing cell resources
• CellCLI runs on the cell
• Run locally from a shell prompt
• Run remotely via ssh or dcli
• Run automatically by EM agent with Exadata EM plugin
• Can run non-interactively

[celladmin@cell01 ~]# cellcli


CellCLI: Release 11.1.3.0.0 - Production on Tue Oct 04 22:13:21 PDT 2008

Copyright (c) 2007, 2008, Oracle. All rights reserved.


Cell Efficiency ratio: 73.1

CellCLI>
CellCLI
Syntax
• Commands are not case sensitive
• - character for line continuation
• ; optional command terminator
• REM, REMARK, or -- indicate comments
CellCLI
Commands

• Administration commands -- Similar to SQLPLUS:


• HELP: displays syntax and usage descriptions for all CellCLI
commands
• SET : sets parameter options in the CellCLI environment.
• SPOOL: writes results of commands to the specified file on
the cell file system.
• EXIT or QUIT: return control to invoking shell
• START or @: runs the CellCLI commands in the specified
script file.
CellCLI
Help Command
CellCLI> help

HELP [topic]
Available Topics:
ALTER
ALTER ALERTHISTORY
ALTER CELL
ALTER CELLDISK
ALTER GRIDDISK
ALTER IORMPLAN
ALTER LUN
ALTER THRESHOLD
ASSIGN KEY
CALIBRATE
CREATE
CREATE CELL
CREATE CELLDISK
CREATE GRIDDISK

CellCLI>
CellCLI
Object commands
• List and change cell resources
• Syntax: <verb> <object-type> [ALL |object-name] [<options>]
• Generic verbs: ALTER, CREATE, DROP, and LIST
used to change, create, remove, and display objects
CellCLI> create griddisk all prefix=data
GridDisk data_CD_1_stsd2s3 successfully created
GridDisk data_CD_2_stsd2s3 successfully created
GridDisk data_CD_3_stsd2s3 successfully created
GridDisk data_CD_4_stsd2s3 successfully created
GridDisk data_CD_5_stsd2s3 successfully created
...

CellCLI> alter griddisk data_CD_1_stsd2s3 availableTo="+ASM"


GridDisk data_CD_1_stsd2s3 successfully altered
CellCLI
Cell Object Types
• Resource-related object types represent hardware
and software configuration:
CELL, CELLDISK, GRIDDISK, IORMPLAN, KEY,
LUN, PHYSICALDISK
• Performance metric object types: ACTIVEREQUEST,
METRICCURRENT, METRICDEFINITION,
METRICHISTORY
• Failure alert object types:
ALERTDEFINITION, ALERTHISTORY,THRESHOLD.

• Objects types indicated by RED


• List only
• Automatically created
CellCLI
Object Attributes
• Each object has attributes, listed by the DESCRIBE
command.
• Attributes which can be modified by ALTER
commands are listed as "modifiable"
CellCLI> describe griddisk
name modifiable
availableTo modifiable
cellDisk
comment modifiable
creationTime
errorCount
id
offset
size modifiable
status
Object Attributes
List options
• LIST command results can be limited by "where"
predicate on attribute values
• LIST output fields can be specified with the "attributes" clause
• The "where" clause uses standard comparison operators
• LIST DETAIL option provides display of all attributes
CellCLI> list celldisk where freespace > 100G
CD_1_stsd2s3 normal
CD_2_stsd2s3 normal
CD_3_stsd2s3 normal
CD_4_stsd2s3 normal
CellCLI> list griddisk attributes name,size,status where name like 'data.*'
data_CD_1_stsd2s3 928M active
data_CD_2_stsd2s3 136G active
data_CD_3_stsd2s3 136G active
data_CD_4_stsd2s3 136G active
CellCLI
CELL Object Type
• CELL is the local server to which disks are attached
and on which the CellCLI utility runs.
• One Cell object (default cell name = domain name)
• Automatically created, but can use CREATE CELL
CellCLI> list cell
cell01 online
CellCLI> alter cell smtpServer='my_mail.example.com', -
smtpFromAddr='john.doe@example.com', -
smtpFrom='John Doe', -
smtpToAddr='jane.smith@example.com', -
snmpSubscriber=((host=host1),(host=host2)), -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail,snmp'
Cell cell01 successfully altered
CellCLI> alter cell shutdown services all
Stopping the RS, CELLSRV, and MS services...
The SHUTDOWN of services was successful.
LIST CELL
CellCLI> list cell detail
name: cell01
bmcConfigured: FALSE
bmcType: "ILO 2.0"
cellVersion: OSS_MAIN_LINUX_081120
cpuCount: 4
idLEDStatus: off
interconnectCount: 5
interconnect1: bond0
iormBoost: 0.0
ipaddress1: 144.25.214.119/22
kernelVersion: 2.6.18-53.1.el5.sage
makeModel: HP DL series smart array ILO
metricHistoryDays: 7
offloadEfficiency: 73.6
status: online
temperatureReading: 47.0
temperatureStatus: normal
upTime: 12 days, 19:55
cellsrvStatus: running
msStatus: running
rsStatus: running
CellCLI
Disk hierarchy

Physical disk -> LUN -> CELLDISK -> GRIDDISK
CellCLI
PHYSICALDISK and LUN Object Types
• PHYSICALDISK: A physical disk on the cell.
• LUN: the address for each individual physical disk.
• PHYSICALDISK objects are discovered when the cell is started.
LUN objects are automatically created for each physical disk.

CellCLI> list physicaldisk attributes name, status, physicalsize


1I:0_1:1 normal 146G
1I:0_1:2 normal 146G
1I:3_1:5 normal 146G
...
CellCLI> list lun
3_1 normal
3_2 normal
3_3 normal
...
CellCLI
CELLDISK and GRIDDISK Object Types
• CELLDISK is associated with a logical unit number (LUN).
One physical disk is associated with each cell disk.
• GRIDDISK is a logical partition of a cell disk. It is exposed
on network (as ASM disks) to the database hosts.
CellCLI> create celldisk all
CellDisk CD_1_stsd2s3 successfully created
CellDisk CD_2_stsd2s3 successfully created
...
CellCLI> create griddisk all prefix=data,size=10G
GridDisk data_CD_1_stsd2s3 successfully created
GridDisk data_CD_2_stsd2s3 successfully created
...
CellCLI> alter griddisk all availableto='+ASM'
GridDisk data_CD_1_stsd2s3 successfully altered
GridDisk data_CD_2_stsd2s3 successfully altered
...
CellCLI
IORMPLAN Object Type
• The IORMPLAN object contains the set of directives that determine the
  allocation of I/O resources among multiple databases (inter-database
  plan) on the cell.
• There is one IORMPLAN object for the cell.

CellCLI> ALTER IORMPLAN dbPlan=((name=sales, level=1, allocation=80), -


(name=finance_prod, level=1, allocation=20), -
(name=sales_dev, level=2, allocation=100), -
(name=sales_test, level=3, allocation=50), -
(name=other, level=3, allocation=50))
IORMPLAN successfully altered

CellCLI> alter iormplan active


IORMPLAN successfully altered
CellCLI
ALERTHISTORY and THRESHOLD Object Types
• ALERTHISTORY: A list of alerts that have occurred on the
cell.
• THRESHOLD objects describe the rules for generating
alerts based on a specific metric.
CellCLI> LIST ALERTHISTORY WHERE begintime > 'Jun 1, 2008 11:37:00 AM PDT'
39 2008-10-02T12:26:53-07:00 "ORA-07445: exception encountered: core dump “
40 2008-10-06T23:28:06-07:00 "RS-7445 [unknown_function] [signum: 6] []"
41 2008-10-07T00:50:42-07:00 "RS-7445 [Serv MS not responding] []“
42 2008-10-07T02:21:19-07:00 "RS-7445 [unknown_function] [signum: 6] []"

CellCLI> CREATE THRESHOLD db_io_rq_sm_sec.db123 comparison='>', critical=120


Threshold db_io_rq_sm_sec.db123 successfully created

CellCLI> list threshold detail


name: db_io_rq_sm_sec.db123
comparison: >
critical: 120.0
CellCLI
METRIC* Object Types

• METRICDEFINITION objects describe the metrics.


• METRICCURRENT objects are the set of current observations
• Flushed to METRICHISTORY every hour

CellCLI> list METRICDEFINITION attributes name,description where objecttype='cell'


CL_CPUT "Cell CPU Utilization is the percentage of time over the
previous minute that the system CPUs were not idle (from /proc/stat)."
CL_FANS "Number of working fans on the cell"
...
CellCLI> list METRICCURRENT cl_cput
CL_CPUT stado54 3.3 %
CellCLI
METRIC* Object Types

• METRICHISTORY a collection of past metric observations

CellCLI> list METRICDEFINITION attributes name,description where objecttype='cell'


CL_CPUT "Cell CPU Utilization is the percentage of time over the
previous minute that the system CPUs were not idle (from /proc/stat)."
CL_FANS "Number of working fans on the cell"
...
CellCLI> list METRICCURRENT cl_cput
CL_CPUT stado54 3.3 %

CellCLI> list metrichistory cl_cput where –


collectiontime>'2008-11-18T11:46:32-08:00'
CL_CPUT stado54 3.3 % 2008-11-18T11:47:32-08:00
CL_CPUT stado54 2.8 % 2008-11-18T11:48:32-08:00
CL_CPUT stado54 3.3 % 2008-11-18T11:49:32-08:00
...
CellCLI
Other object commands

• CREATE KEY displays a random hex security key


• ASSIGN KEY assigns a security key for an ASM or
DB client
• CALIBRATE runs raw performance tests on cell disks.
• EXPORT CELLDISK: prepares cell disks before
moving (importing) the cell disk to a different cell.
• IMPORT CELLDISK: reinstates exported cell disks on
a cell where you moved the physical drives that
contain the cell disks
CellCLI
Exadata Storage Server users

• cellmonitor – can only use LIST commands


• celladmin – All functions except for CALIBRATE
• root – All functions
<Insert Picture Here>

DCLI
DCLI
Overview
• The DCLI script runs commands on multiple cells in
parallel threads.
• File copy and command execution occur on a set of cells in
parallel.
• Command output is collected and displayed after file copy
and command execution is finished on all cells.
• Setup:
• Copy DCLI from cell (/opt/oracle/cell/cellsrv/bin/dcli) to host from
which management is done.
• Create files which contain a list of cells to which commands are
issued, e.g. mycells
• Run "dcli -k -g mycells" to create ssh key equivalence on the cells
DCLI
Return codes
• DCLI returns one of the following values
• 0 – The command(s) were copied and run on all designated cells
• 1 – One or more cells could not be reached or returned a non-zero
error code
• 2 – A local error prevented execution of any commands
DCLI
Example 1
$ scp celladmin@stsd2s3:/opt/oracle/cell/cellsrv/bin/dcli .
dcli 100% 32KB 31.6KB/s 00:00

$ cat - > mycells


# cells to be managed
stsd2s1
stsd2s2
stsd2s3

$ dcli -g mycells -k
celladmin@stsd2s1's password:
celladmin@stsd2s2's password:
stsd2s1: ssh key added
stsd2s2: ssh key added
stsd2s3: ssh key already exists

$ dcli -g mycells cellcli -e list cell


stsd2s1: stsd2s1 online
stsd2s2: stsd2s2 online
stsd2s3: stsd2s3 online
DCLI
Example 2
$ dcli
Error: No command specified.
usage: dcli [options] [command]

options:
--version show program's version number and exit
-cCELLS comma-separated list of cells
-fFILE file to be copied
-gGROUPFILE file containing list of cells
-h, --help show help message and exit
-k push ssh key to cell's authorized_keys file
-lUSERID user to login as on remote cells (default: celladmin)
-n abbreviate non-error output
-rREGEXP abbreviate output lines matching a regular expression
-sSSHOPTIONS string of options passed through to ssh
--scp=SCPOPTIONS string of options passed through to scp if different
from sshoptions
-t list target cells
-v print extra messages to stdout
--vmstat=VMSTATOPS vmstat command options
-xEXECFILE file to be copied and executed
DCLI
Example 3
$ dcli -c mycells cellcli -e create griddisk all prefix="data", size=120G
stsd2s1: GridDisk data_CD_2_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_3_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_4_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_5_stsd2s1 successfully created
stsd2s1: GridDisk data_CD_6_stsd2s1 successfully created
...
$ dcli -c mycells 'cellcli -e alter griddisk all availableTo=\"+ASM,dbm\"'
stsd2s1: GridDisk data_CD_1_1_stsd2s1 successfully altered
stsd2s1: GridDisk data_CD_2_stsd2s1 successfully altered
stsd2s1: GridDisk data_CD_3_stsd2s1 successfully altered
stsd2s1: GridDisk data_CD_4_stsd2s1 successfully altered
...
$ dcli -g mycells cellcli -e assign key for dbm='1212824bf214e59f3b60d1553b784cf0'
stsd2s1: Key for dbm successfully altered
stsd2s2: Key for dbm successfully altered
stsd2s3: Key for dbm successfully altered

$ dcli -g mycells cellcli -e alter iormplan active


stsd2s1: IORMPLAN successfully altered
stsd2s2: IORMPLAN successfully altered
stsd2s3: IORMPLAN successfully altered
DCLI
Example 4
$ dcli -g mycells --vmstat='3 10'
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
13:03:15: r b swpd free buff cache si so bi bo in cs us sy id wa st
stsd2s1: 1 0 451180 13008 3456 61460 0 0 180 347 5 4 1 2 96 0 0
stsd2s2:21 0 350260 14480 3500 62444 0 0 330 252 0 2 1 2 97 0 0
stsd2s3: 0 0 128 13880 23556 511432 0 0 370 25 9 2 1 2 97 0 0
Minimum: 0 0 128 13008 3456 61460 0 0 180 25 0 2 1 2 96 0 0
Maximum:21 0 451180 14480 23556 511432 0 0 370 347 9 4 1 2 97 0 0
Average: 7 0 267189 13789 10170 211778 0 0 293 208 4 2 1 2 96 0 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
13:03:20: r b swpd free buff cache si so bi bo in cs us sy id wa st
stsd2s1: 2 0 451180 13168 3480 61508 25 0 10857 34240 28862 30560 12 27 60 0 0
stsd2s2: 1 0 350260 12144 3524 62496 0 0 10912 34196 12344 31365 11 17 71 0 0
stsd2s3: 0 0 128 14576 23576 511480 0 0 0 0 1005 16498 0 0 100 0 0
Minimum: 0 0 128 12144 3480 61508 0 0 0 0 1005 16498 0 0 60 0 0
Maximum: 2 0 451180 14576 23576 511480 25 0 10912 34240 28862 31365 12 27 100 0 0
Average: 1 0 267189 13296 10193 211828 8 0 7256 22812 14070 26141 7 14 77 0 0

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
13:03:24: r b swpd free buff cache si so bi bo in cs us sy id wa st
stsd2s1: 2 0 451180 12344 3504 61508 0 0 14145 42561 35069 31306 13 30 57 0 0
stsd2s2: 1 0 350260 11768 3548 62496 0 0 13958 42532 13328 31422 12 19 68 0 0
stsd2s3: 0 0 128 15952 23624 511484 0 0 0 111 1010 15093 0 0 100 0 0
Minimum: 0 0 128 11768 3504 61508 0 0 0 111 1010 15093 0 0 57 0 0
Maximum: 2 0 451180 15952 23624 511484 0 0 14145 42561 35069 31422 13 30 100 0 0
Average: 1 0 267189 13354 10225 211829 0 0 9367 28401 16469 25940 8 16 75 0 0

DCLI
Other uses
• Shutting down and starting up the CRS stack on all nodes
• Changing or checking a configuration parameter on all nodes
• Checking process state info on all nodes
• Checking/gathering hardware info on all nodes for HP tickets (ex:
ipmitool, dmidecode)
• Starting/Stopping/Checking cell services on all nodes
• Gathering/aggregating system stats on all nodes (ex: vmstat, iostat,
etc)
• Verifying network connectivity across all nodes (IP and rds - ie
normal pings and rds-pings)
• Setting up ssh across all nodes
• Setting up security - both cell side (cellcli commands) and database
side (cellkey.ora population)
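As an illustration, a minimal sketch of how a few of these tasks might look with dcli. The group file, cell service command and script name below are assumptions modeled on the earlier examples (healthcheck.sh is a hypothetical local script), not prescribed procedures:

$ dcli -g mycells "service celld status"          # check cell services on every cell
$ dcli -g mycells cellcli -e list alerthistory    # gather alert history from every cell
$ dcli -g mycells vmstat 3 2                      # one-off OS stats (see also --vmstat)
$ dcli -g mycells -x healthcheck.sh               # copy a local script to each cell and run it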
<Insert Picture Here>

ADRCI
ADRCI
Utility Overview

• ADRCI: command line tool for viewing diagnostic data within
  a cell's ADR (Automatic Diagnostic Repository).
• This support is similar to that provided with the RDBMS in
  11g.
• The tool is invoked simply by running 'adrci'.
• ADRCI includes the Incident Packaging System (IPS), which
  allows you to identify and package all of the relevant
  diagnostic data for a critical error.
• Critical errors are seen from the CellCLI LIST ALERTHISTORY
  command, from an email or SNMP notification, or from the EM
  plug-in alert list.
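For example, a critical incident can often be spotted from the cell alert history before packaging it with ADRCI. A hedged sketch using the CellCLI command mentioned above (filter attributes are illustrative):

CellCLI> LIST ALERTHISTORY WHERE severity = 'critical' DETAIL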
ADRCI
Incident Packaging Example
$ adrci

ADRCI: Release 11.1.0.7.0 - Production on Thu Nov 20 16:32:15 2008
Copyright (c) 1982, 2007, Oracle. All rights reserved.

adrci> show homes
ADR Homes:
diag/asm/cell/cell01

adrci> set home diag/asm/cell/cell01

adrci> show incidents
ADR Home = /opt/oracle/cell/log/diag/asm/cell/cell01:
*************************************************************************
INCIDENT_ID          PROBLEM_KEY           CREATE_TIME
-------------------- --------------------- ---------------------------------
5                    RS 7445               2008-11-19 22:38:56.228289 -05:00

adrci> ips create package incident 5
Created package 1 based on incident id 5, correlation level typical

adrci> ips generate package 1 in /tmp
Generated package 1 in file /tmp/RS7445_20081120163628_COM_1.zip, mode complete
<Insert Picture Here>

Exadata and availability
Module Agenda

• Availability with Exadata <Insert Picture Here>

• ASM and Exadata


• Exadata disk availability
• MAA and Exadata
• MAA best practices for Exadata
• Data Guard and Exadata
• Patching and upgrades
<Insert Picture Here>

Availability with Exadata


Availability and Exadata
Hardware redundancy

• Redundant database servers


• Redundant storage cells
• Redundant disks within cells
• Redundant connectivity
• Redundant power supplies
Availability and Exadata
Software redundancy

• RAC
• ASM
<Insert Picture Here>

ASM and Exadata


Exadata
Storage layout overview
• Physical disks (LUN) map to a Cell Disks
• Cell Disks partitioned into one or multiple Grid Disks
• ASM diskgroups created from Grid Disks
• Transparent above the ASM layer

ASM diskgroup
Grid Disk 1
Cell
Physical …
Disk Disk
Grid Disk n ASM diskgroup
Sys Area Sys Area
Exadata storage
Cell disks

Cell
Disk Exadata Cell Exadata Cell

• Cell Disk is the entity that represents a physical


disk residing within a Exadata Storage Cell
• Automatically discovered and activated
Exadata storage
Grid disks

Grid Exadata Cell Exadata Cell

Disk

• Cell Disks are logically partitioned into Grid Disks


• Grid Disk is the entity allocated to ASM as an ASM disk
• Minimum of one Grid Disk per Cell Disk
• Can be used to allocate "hot", "warm" and "cold" regions of a
  Cell Disk or to separate databases sharing Exadata Cells
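A minimal sketch of this layering, assuming a "data" and a "reco" grid disk prefix on every cell and an ASM instance on the database grid; the names, size and attribute values are illustrative only:

CellCLI> CREATE GRIDDISK ALL PREFIX=data, size=300G
CellCLI> CREATE GRIDDISK ALL PREFIX=reco

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY
     DISK 'o/*/data_*'
     ATTRIBUTE 'compatible.rdbms'        = '11.2.0.0.0',
               'compatible.asm'          = '11.2.0.0.0',
               'cell.smart_scan_capable' = 'TRUE',
               'au_size'                 = '4M';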
Exadata storage
Grid disks
• The first grid disk defined is placed on the outer
  (fastest) portion of the disk
• Typical configuration is to use the first grid disk
  for data and the second grid disk for the Fast
  Recovery Area (FRA)

Exadata storage
ASM Disk Groups and mirroring

Hot ASM
Disk Group
Exadata Cell Exadata Cell Cold ASM
Disk Group
Hot Hot Hot Hot Hot Hot

Cold Cold Cold Cold Cold Cold

• Two ASM disk groups defined


• One for the active, or "hot", portion of the database and a
  second for the "cold", or inactive, portion
• ASM striping evenly distributes I/O across the disk group
• ASM mirroring is used to protect against disk failures
• Optional for one or both disk groups
Exadata storage
ASM mirroring and failure groups

ASM ASM
Exadata Cell Exadata Cell
Failure Group Failure Group

Hot Hot Hot Hot Hot Hot

Cold Cold Cold Cold Cold Cold

ASM
Disk Group
• ASM mirroring is used to protect against disk failures
• ASM failure groups are used to protect against cell
failures
Exadata storage
ASM interactions

• Grid disks cannot span multiple cells, but disk groups can
• Redundancy is handled by ASM
• Availability settings, such as disk_repair_time, are
ASM settings
• DISKMON is used to communicate between ASM and
the Exadata Storage Server cells
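As an example of the availability settings point, the repair timer is a disk group attribute set from the ASM instance, not on the cells. A hedged sketch (the disk group name and the 8.5h value are arbitrary):

SQL> ALTER DISKGROUP data SET ATTRIBUTE 'disk_repair_time' = '8.5h';
SQL> SELECT group_number, name, value
     FROM   v$asm_attribute
     WHERE  name = 'disk_repair_time';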
Exadata storage
Intelligent Data Placement

• ASM mirrors data within a grid disk


• I/Os to outer sectors of a Physical Disk are faster
than inner sectors
• ASM places the primary extents in the first half of the
  disk, from the outer edge towards the middle, and the
  secondaries in the second half, from the middle
  towards the spindle
• Create cell disks with optional INTERLEAVING
attribute set to normal_redundancy or
high_redundancy
Intelligent Data Placement (IDP)

• Normal Redundancy: First half


(50% of the disk) is considered
HOT, while second half (50% of the
disk) is considered COLD
• IDP places primary extents in the 50%
HOT Zone
• IDP places secondary extents (mirror
copies) in the 50% COLD Zone
Intelligent Data Placement (IDP)

• High Redundancy: the first third (33%) of the disk is
  considered HOT, while the remaining two-thirds (67%)
  are considered COLD
• IDP places primary extents in the 33% HOT Zone
• IDP places secondary extents (2 mirror copies) in the 67% COLD
Zone
Intelligent Data Placement (IDP)
Interaction with ASM

• If an ASM disk group with high redundancy is desired


with IDP
• Create cell disks with INTERLEAVING='high_redundancy'
• Create grid disks from these cell disks
• Add grid disks to the ASM Diskgroup

• If an ASM disk group with normal redundancy is


desired with IDP
• Create cell disks with INTERLEAVING='normal_redundancy'
• Create grid disks from these cell disks
• Add grid disks to the ASM Diskgroup
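A hedged sketch of the normal-redundancy sequence just described; the disk group name, grid disk prefix and disk string are illustrative:

CellCLI> CREATE CELLDISK ALL HARDDISK INTERLEAVING='normal_redundancy'
CellCLI> CREATE GRIDDISK ALL PREFIX=data

SQL> CREATE DISKGROUP data NORMAL REDUNDANCY DISK 'o/*/data_*';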
<Insert Picture Here>

Exadata disk availability


Exadata Storage Server
Availability – Case 1

Cell becomes
unreachable

ONLINE OFFLINE

Cell becomes
reachable
Exadata Storage Server
Availability – Case 2

HDD / Flash Card removed


from cage / PCIe slot

ONLINE OFFLINE

HDD / Flash Card put back


into cage / PCIe slot
Exadata Storage Server
Availability – Case 3

Alter GridDisk/CellDisk
Inactive

ONLINE OFFLINE

Alter GridDisk/CellDisk
Active
Exadata disk availability
Automatic disk online

• ONLINE -> OFFLINE when: the cell is unreachable, the disk is pulled
  out, the disk is inactivated, or the user takes the disk offline
• OFFLINE -> SYNC -> ONLINE when: the cell is reachable again, the disk
  is pushed in, the disk is activated, or the disk group is mounted
What is automated ?

• Inactivating a GridDisk or CellDisk in the cell will


automatically initiate an OFFLINE operation in the ASM
instance.
• Cell admin will be able to query the ASM instance
using cellcli for the following:
1. 'mode_status' of a GridDisk. Possible values for this are
   ONLINE, OFFLINE, SYNCING or UNKNOWN. If for any reason
   the query cannot succeed, a suitable error message is
   printed. A state of UNKNOWN is returned if, for example, the
   disk group is not mounted.
2. Whether a GridDisk can be taken OFFLINE without ASM losing all
   mirror copies.
Exadata/ASM disk availability
Adding and dropping grid disks

• States: ONLINE, OFFLINE, ADD, DROP
• Events that take a disk OFFLINE or trigger a DROP: the disk went
  dead, the cell became inaccessible, disk_repair_time expired, or
  the user issued drop disk force
• Events that trigger an ADD or bring a disk back ONLINE: a blank
  replacement disk is pushed in, the cell becomes accessible again,
  or the disk is online after rebalance
What is automated ?

• DROP and ADD operation of an ASM disk. Following are the


scenarios when this operation will be initiated.
• A physical disk that went bad is replaced with a new blank disk. All
ASM disks that were hosted on the failed disk will be dropped and
added back. Likewise for flash card hosting GridDisks/ASM disks.
• An ASM disk (grid disk) that is in the OFFLINE state is dropped
  forcefully due to disk_repair_time expiry. Subsequently, the disk
  becomes accessible again (for example, because the cell/CellSRV
  becomes accessible, or the disk is plugged back into its cage).
• The ASM admin initiated a drop disk command with the 'force' option.
Automation will try to add the disk back to the diskgroup when any
of the trigger conditions for disk ONLINE automation operation
happens.
How it works ?

• When a physical disk is plugged in, the lun gets


automatically enabled.
• As long as the disk is not marked as IMPORT
required CellSRV will make the grid disks on that
physical disk available to ASM.
• If, however, the physical disk is new (replacing a dead
  disk), CellSRV will re-create the cell disk and grid disks
  that were hosted on the dead disk. Once this step
  completes, the disk will be made accessible to ASM.
How it works ?
• Querying ASM disk 'mode_status' and 'asmdeactivationoutcome'
  from cellcli. One can pass a specific griddisk name as well.

CellCLI> list griddisk attributes name, asmmodestatus


datafile1 OFFLINE
datafile2 OFFLINE
datafile3 OFFLINE

Possible values: ONLINE | OFFLINE | SYNCING | UNKNOWN

CellCLI> list griddisk attributes name, asmdeactivationoutcome


datafile1 Yes
datafile2 Yes
datafile3 Yes
Possible values: Yes | No
How it works ?
• CellSRV maintains a file called griddisk.owners.dat which has
details such as:
– ASM disk name
– ASM diskgroup name
– ASM failgroup name
– Cluster identifier
– Requires DROP/ADD
for all GridDisks that are part of ASM diskgroups.
Exadata disk availability
What requires intervention?

• Disk group remount
  • Disk group dismounted due to loss of all mirrors
  • Manually remount the disk group when the disks
    become accessible
• Cell disk export/import
  • Move a disk from one cell to another
  • Import the cell disk on the new cell
  • Manually online the disk in ASM
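A minimal sketch of these manual steps, assuming a disk group named DATA and a cell disk being moved between cells; names are illustrative and the exact procedure should follow the Exadata documentation:

-- Remount the disk group once the disks are accessible again
SQL> ALTER DISKGROUP data MOUNT;

-- Move a cell disk: export on the source cell, import on the new cell
CellCLI> EXPORT CELLDISK CD_03_cell01
CellCLI> IMPORT CELLDISK CD_03_cell01

-- Bring the corresponding ASM disks back online
SQL> ALTER DISKGROUP data ONLINE DISKS IN FAILGROUP cell02;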
<Insert Picture Here>

MAA and Exadata


Oracle's Database HA Solution Set
Protection against all sources of downtime

• Unplanned downtime
  • Server failures: Real Application Clusters
  • Data failures: Flashback, RMAN & Oracle Secure Backup,
    ASM, Data Guard, GoldenGate
• Planned downtime
  • System changes: Online Reconfiguration, Rolling Upgrades
  • Data changes: Online Redefinition
  • App changes: Edition-based Redefinition
• Oracle MAA Best Practices cover all of the above
Maximum Availability Architecture (MAA)

• Real Application Clusters
• Active Data Guard
• Secure backups to disk, tape or cloud
• Automatic Storage Management
• Fast Recovery Area
<Insert Picture Here>

MAA best practices for


Exadata
MAA best practices
ASM disk groups
• Standard protection disk groups
• DATA – normal redundancy
• Data files only (OUTER)
• RECO – high redundancy
• One controlfile, online redo logs (1 member), archives,
flashback logs, spfile, voting disks and OCR
• Potentially a DBFS disk group for staging on the innermost section
of disk, normal redundancy
• If double partner disk or double cell failure occurs, then database
can be restored from tape and full recovery with zero data loss is
achievable
• Can restore from tape and recovery procedures
• Downtime ensues
MAA best practices
ASM disk groups

• Alternative redundancy schemes


• Both DATA and RECO high redundancy
• Higher protection reduces potential downtime
• More disk used for mirrors
• High redundancy for DATA
• Best practice to store archive logs in alternate location
• Advantages
• Full bandwidth available from all cells
• Reduced maintenance and administration
• IORM can be used to set priority of IO operations
MAA best practices
Flashback Database

• Enable Flashback Database


• Minimum impact to OLTP workloads (< 2%)
• Minimum impact to DW loads if operational practices and
recommended patches are in place
• Refer to Support Note 565535.1
• Size the fast recovery area to a minimum of:
  redo rate x DB_FLASHBACK_RETENTION_TARGET
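A hedged example of enabling Flashback Database; the destination, size and 24-hour retention below are illustrative and should be derived from the redo-rate formula above:

SQL> ALTER SYSTEM SET db_recovery_file_dest_size = 2T;
SQL> ALTER SYSTEM SET db_recovery_file_dest = '+RECO';
SQL> ALTER SYSTEM SET db_flashback_retention_target = 1440;   -- minutes
SQL> ALTER DATABASE FLASHBACK ON;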
MAA best practices
Data Corruption Protection

• Oracle protects from data corruptions by:


• DB_BLOCK_CHECKING checks block semantics
• DB_BLOCK_CHECKSUM calculates and validates checksum in redo or
data blocks
• DB_LOST_WRITE_PROTECT detects stray and lost writes.
• Exadata uses ASM redundancy enabling auto-correction
of corrupt blocks during writes using the mirror copy
• ASM redundancy enables Oracle to automatically try the mirrored copy if
it detects a corrupt block
MAA best practices
Data Corruption Protection

• On primary and Data Guard standby databases:


• For OLTP and DW, set DB_BLOCK_CHECKSUM=FULL
and DB_LOST_WRITE_PROTECT=TYPICAL
• Observed less than 5% performance impact for batch and
OLTP workloads
• Do not change the DB_BLOCK_CHECKING initialization
parameter without first conducting a performance impact
analysis since the impact varies on workload
• Evaluate impact on both primary and standby
databases – there is a benefit to setting this value to
medium, true or full at the standby database even if it
is not set at the primary
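A minimal sketch of the recommended settings; apply them on both the primary and the standby and, as noted above, evaluate DB_BLOCK_CHECKING separately:

SQL> ALTER SYSTEM SET db_block_checksum     = FULL    SCOPE=BOTH;
SQL> ALTER SYSTEM SET db_lost_write_protect = TYPICAL SCOPE=BOTH;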
<Insert Picture Here>

Data Guard and Exadata


Data Guard Redo transport
Options
• SYNC transport and Maximum Protection
• Provides zero data loss failover
• If standby is down, primary halts
• Recommended if network latency between primary and standby is < 5 ms
in order to minimize primary performance impact
• Higher network latencies may be acceptable for some applications -
performance testing is required because synchronous redo transport
will impact primary database performance at higher network
latencies
• ASYNC transport and Maximum Performance
• Minimal impact to primary performance regardless of latency
• Monitor hit ratio using x$logbuf_readhist and increase LOG_BUFFER to
avoid excessive LNS disk reads from the online redo logs
Data Guard Redo transport
Configuration best practices

• Network considerations
• Configure with Infiniband for local standby database for
approximately 2 GB/sec bandwidth using IPoIB
• Otherwise GigE will provide approximately 120 MB/sec
• Standbys can use VIP interface or dedicated interface
• Use dedicated network interface for redo transport
• Refer to Support Note 960510.1 for complete details
Data Guard Redo transport
Configuration best practices

• Tune OS & network parameters


• Tune network parameters that affect network buffer sizes and
queue lengths
• A minimum of 10 MB network buffer size is recommended
• Ensure sufficient network bandwidth for maximum database
redo rate + other activities

Refer to Primary Site and Network Configuration Best Practices


http://www.oracle.com/technology/deploy/availability/pdf/MAA_DG_NetBestPrac.pdf
Active Data Guard Apply Rate
Extreme Performance on Exadata

11.1.0.7 requires BLR 8619827


11.2.0.1 gets great performance out of the box
Data Guard and Exadata
Standby machines and EHCC
• A Data Guard standby database can be a non-Exadata
Database Machine
• Must follow Data Guard mixed configuration specification (Note
413484.1)
• If EHCC is used, Data Guard will work, but a non-Exadata
  Database Machine standby will be unable to read the
  compressed data
• Upon failover
• Tables with EHCC will require ALTER TABLE MOVE statements to
become readable
• Storage required for uncompressed data
• Performance of non-Database Machine will be different
Active Data Guard Summary
Optimized for Database Machine

• Active Data Guard is the best availability, data


protection and disaster recovery solution for OLTP
and Data Warehouse
• Generic practices still apply
• Validated and optimized for the Database Machine
• Redo Apply
• Archiving
• Proven technology: multiple customers have already
deployed Data Guard standby databases on a
Database Machine
<Insert Picture Here>

Patching and upgrades


Patching & Upgrading
Scenarios
• All standard planned maintenance solutions apply
• Database Machine upgrades may require
• Exadata Storage server software changes
• Exadata software, firmware, OFED, OS
• Database Server software changes
• Oracle database software, firmware, OFED, OS
• Switch software (InfiniBand, Ethernet)
• Patches and upgrades situations
• Exadata Storage Server patch
• Database software patch
• Database server system patch (OS or firmware)
Exadata Storage Server
Online Patching
• Exadata Storage Server patches supplied by Oracle
maintain all aspects of OS, firmware, and software
• No additional software (Linux RPMs or otherwise) is
  allowed
• Only software supplied through Oracle patches is permitted
• Manual firmware changes are not allowed
• Patches are one of two types
• Overlay - Restart Exadata cell services, automatic reconnect
• Staged - Restart Exadata Storage Server, resync interim changes
with ASM fast mirror resync
• Installed by whoever manages the Exadata Cells
• Use patch installation tool (patchmgr.sh) – see README
• Most patches installed using root account
Database Server Patching
• Database software patches installed by DBA w/ OPatch
• Contact Oracle Support if one-off patch conflicts with Exadata required
patches
• Operating system and firmware patches
• Verify new patch meets Exadata requirements
• IB HCA and OFED versions must match storage servers
• Additional software allowed
• Maintain compliance with Exadata requirements for all dependencies
• RAC rolling upgrade
• Database software patches
• Firmware changes
• Certified operating system upgrades
• Data Guard rolling upgrades
Exadata Database Machine
Software Maintenance Documents
• Two My Oracle Support (MOS) notes document:
• Software/firmware requirements
• Compatibility requirements between components
• Software patches and upgrades
• Procedures for download and installation
• MOS note 835032.1 documents requirements for Oracle Database
11.1 (V1) systems
• MOS note 888828.1 documents requirements for Oracle Database
11.2 (V1 HW and V2 HW) systems
• Customers should sign up for automated alerts for changes to these
MOS notes
• In the future OCM will provide automatic notification of patches and
configuration changes
<Insert Picture Here>

Backup and recovery
Module Agenda

• Backup and recovery overview <Insert Picture Here>

• Best practices for disk-based backup and


recovery
• Best practices for tape-based backup and
recovery
• Backup & recovery with Data Guard
<Insert Picture Here>

Backup and recovery


overview
Backup, restore and recovery rates
Database Machine

• Backup rates
• 18 TB/Hr full image backups
• 10-46 TB/Hr effective backup rate for incremental backups
• Restore rates
• 24 TB/Hr restore rates
• Recovery rates
• 2.1 TB/Hr recovery rates
• Above rates pertain to physical files. With
compression, effective backup/restore rates will
multiply
• It all comes down to bandwidth of your slowest
component
Backup, restore and recovery operations
Database Machine
• Simple operations with standard RMAN commands
• Automatically parallelized across all storage servers
• Data aware
• Detection of block corruptions
• Auto repair and manual block repair options
• Integrated and transparent
• OLTP and data warehouse databases
• RAC, Data Guard, flashback technologies, ASM, Exadata
• Oracle native compression capabilities
• OLTP (typically 3 X compression)
• Exadata Hybrid Columnar Compression (typically 10-15 X
compression)
<Insert Picture Here>

Best Practices
for disk-based backup
and recovery
Disk-based backup and recovery
Exadata Storage Server Grid Disk layout

The faster (outer) 40% of the disk is assigned to the DATA Area
The slower (inner) 60% of the disk is assigned to the RECO Area

• Recommended disk group configuration


• Will be configured automatically during deployment
Disk-based backup & recovery
Effective rates
• Backup (and restore) rates
• 18 TB/Hr for Full Rack configurations – X2
• 5.4TB/Hr for Quarter Rack configuration – V2
• Effectively 10-46 TB/Hr for incremental backups
• Restore rates into existing files
• 24TB/hr for Full Rack configuration
• 13TB/hr for Half Rack configuration
• 5.6TB/hr for Quarter Rack configuration
• Typical Redo Apply (recovery) rates
• 200MB redo/sec (720GB redo/hour) for OLTP workloads
• 600MB redo/sec (2.1TB redo/hour) for Direct Load workloads
Disk-based backup & recovery
Strategy and advantages

• Use RMAN incrementally updated backups


• Image copy stored in the Fast Recovery Area and created
once on the initial backup
• Nightly incremental backups created in the Fast Recovery
Area
• Incremental backups merged into image copies on a 24 hour
delay basis
• Key advantages over tape-only-based backups
strategies
• Potential for using backups directly with no restore
• Reduce backup windows and resources with incremental
backups
• Faster recovery for corruptions and some Tablespace Point In
Time Recovery (TSPITR) cases
Disk-based backup & recovery
Exadata best practices for backups
• Create a Database Service "backup" that runs on a
  maximum of two instances
• Use incremental backups and block change tracking
• Data block inspection is offloaded to Exadata
• For highest throughput allocate 8 RMAN Channels
• Listener load balancing distributes the connections between the
  two instances
• Use fewer channels if highest throughput is not needed
• Set init.ora parameter
_file_size_increase_increment=2143289344
• Maximum observed CPU impact
• Less than 2 CPU cores used on the two DB nodes if all 8 RMAN
channels are utilized
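A hedged sketch of the two-instance backup service, assuming an admin-managed database named dbm with instances dbm1 and dbm2; the service name, SCAN address and connect string are illustrative:

$ srvctl add service -d dbm -s backup -r dbm1,dbm2
$ srvctl start service -d dbm -s backup
$ rman target sys/<password>@dbm-scan:1521/backup    # channels load balanced across the two instances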
Disk-based backup & recovery
Exadata best practices for restores

• For restore into existing files


• Create a Database Service "restore" that runs on all the
  instances of the database.
• Use 2 RMAN channels per database instance for Half Rack
and larger systems
• Use 4 RMAN channels per database instance for Quarter
Rack
• For restore into a new ASM Disk Group
• Create a Database Service "restore" that runs on a maximum
  of two instances
• Allocate a total of 8 RMAN Channels for the restore
Disk-based backup & recovery
Script examples on Database Machine
• RMAN configuration
configure default device type to disk;
configure device type disk parallelism 8;

• RMAN script for nightly incremental level 1 backup


run {
  backup
    incremental level 1
    for recover of copy
    with tag full_database
    database;
  recover
    copy of database
    with tag full_database;
}
Disk-based backup & recovery
Alternative FRA on Exadata

Exadata Database Oracle Exadata


Machine Storage Servers

InfiniBand Network
Disk Based Backup & Recovery
Alternative FRA on Exadata

• Allocate additional (SATA) Exadata Storage Servers


for a dedicated Fast Recovery Area
• Additional Exadata Storage Servers must be installed
in another rack
• Key benefits
• Better failure isolation when using separate backup hardware
• Allows use of lower cost space for backup
Disk-based backup & recovery
Using non-Exadata storage

• Performance and complexity will vary


• No MAA best practices
• Considerations
• Utilize IP based protocols like iSCSI or NFS
• SAN HBA and network rates may limit the backup rate
• If SAN-based storage is used, an intermediate server that acts
  as an iSCSI or NFS server is required
• Similar to the way the Media Server bridges between the
Exadata DB Machine and the tape library
<Insert Picture Here>

Best Practices
for tape-based backup
and recovery
MAA Validated Architecture

Exadata Database Sun StorageTek


Machine SL500
Sun Fire X4170
Oracle Secure Backup
Admin Servers

2 Sun Fire X4275


Oracle Secure
InfiniBand Backup Media Fiber Channel
Network Servers SAN
Tape-based backup and recovery
Exadata Storage Server Grid Disk layout

The faster (outer) 80% of the disk is assigned to the DATA Area
The slower (inner) 20% of the disk is assigned to the RECO Area

• Recommended disk group configuration


• Can be configured automatically during deployment
Tape-based backup & recovery
Rates and configurations
• Backup rates
• Limited by number of tape drives
• 179MB/sec per LTO4 tape drive
• 8.6TB/Hr for 14 tape drives
• 29TB/Hr with Exadata Database Machine Full Rack Configuration
and 64 LTO4 tape drives.
• Restore rates (into existing files)
• Limited by number of tape drives
• 162MB/sec per tape drive
• 7.8TB/hr for Half and Full Rack Configuration (14 tape drives)
• 6.1TB/hr for Quarter Rack Configuration
• Restore rates (into empty disk group)
• 5.4 TB/hr for Quarter and 7.1 TB/hr for Half and Full (14 tape drives)
Tape-based backup & recovery
Strategy and implementation

• Oracle Database tape backup strategy:


• Weekly RMAN level 0 (full) backup
• Daily RMAN cumulative incremental level 1 backup
• To scale and maintain availability:
• For HA, start with at least two media servers with a dual
ported Host Channel Adapter (HCA) per media server,
bonded for HA
• Add tape drives until all of the media server's HBA or HCA
  bandwidth is consumed
• Add media servers and associated tape drives when the
  media servers' HCA bandwidth is consumed
• Tape-based backups scale linearly by adding Media Servers
and tape drives
Tape-based backup & recovery
Benefits and trade-offs of tape solution

• Benefits
• Fault isolation from Exadata Storage Server
• Maximizes Database Machine capacity and bandwidth
• Move backup off-site easily
• Keep multiple copies of backups in a cost effective manner
• Trade-offs
• Disk-based solutions have better recovery times for data and
logical corruptions and certain tablespace point in time
recovery scenarios
• No differential incremental backups are available
Tape-based backup & recovery
Configuration best practices for tape
• Ethernet or InfiniBand based configuration only
• Hardware changes to Database Machine are not supported
• Smaller databases can use Gigabit Ethernet
• Use a dedicated interface for the transport to eliminate impact to
client access network
• Typically a dedicated backup network is in place
• Maximum throughput with the GigE network is 120 MB/sec X
Number of Database Servers
• For a full Database Machine, 960 MB/sec possible
• Use InfiniBand for best performance
• Bigger database needing faster backup rates
• Lower CPU overhead
Tape-based backup & recovery
InfiniBand configuration best practices for tape

• Database nodes and Media Server configuration


• Use Oracle Enterprise Linux on the Media Server
• Use same kernel and OFED packages as used on Exadata
Database Machine
• Enable IPoIB connected mode and MTU changes on the
Media Server
• No changes on database nodes needed
• Minimal CPU impact
• Observed less than 1 CPU Core used per instance
Tape-based backup & recovery
Configuration best practices for tape backup

• For tape-based backup create a Database Service
  "backup" that runs on all the instances of the
  database
• Use incremental backups and block change tracking
• Data block inspection is automatically offloaded to Exadata
• Use tape hardware compression in addition to Oracle DBMS
OLTP and EHCC compression
• Allocate 1 RMAN channel per tape drive for the
backup
• Let Listener Load Balancing distribute the connections
between all the instances
• Spreads the backup I/Os evenly over all database nodes
Tape-based backup & recovery
Configuration best practices for tape restore

• For restore into existing files


• Create a Database Service "restore" that runs on all the
  database instances
• Allocate 1 RMAN Channel per tape drive
• For restore into a new ASM Disk Group
• i.e. restore after loss of the ASM Disk Group
• Create a Database Service "restore" that runs on a maximum
  of two database instances
• Allocate 1 RMAN Channel per tape drive
Tape-based backup & recovery
Script examples
• RMAN configuration
configure default device type to sbt;
configure device type sbt parallelism 14;

• RMAN script for weekly backup


run {
backup incremental level 0 database tag 'weekly_level0';
backup archivelog all not backed up;
}

• RMAN script for daily backup


run {
  backup cumulative incremental level 1 database tag 'daily_level1';
  backup archivelog all not backed up tag 'archivelogs';
}
Tape-based backup & recovery
Oracle Secure Backup advantages

• Oracle Secure Backup (OSB) tape-based backup


advantages
• Fastest database backup to tape via tight integration with
RMAN
• Unused block compression
• Inactive Undo blocks not backed up
• Very low cost
• MAA Validated
Tape-based backup & recovery
Oracle Secure Backup best practices

• Configure the Preferred Network Interface (PNI) to


direct the OSB traffic over the InfiniBand network
interface

ob> lspni (List Preferred Network Interface)


mediaserver1:
PNI 1:
interface: mediaserver1-ib
clients: dbnode1, dbnode2, dbnode3, dbnode4, dbnode5, dbnode6,
dbnode7, dbnode8
PNI 2:
interface: mediaserver1
clients: adminserver
Database Machine backup & recovery
Documentation

• Backup and Recovery Performance and Best


Practices for Sun Oracle Database Machine and
Exadata

• http://www.oracle.com/technology/products/bi/db/exadata/pdf/maa_tech_wp_sundbm_backup_final.pdf
3rd Party Media Management Vendor
No additional complexity

• Third party vendors test and validate their own


products
• Contact the MMV for configuration best practices
• No additional certification specific to Exadata required
• Tune the network communication within the MMV to
exploit the full potential of the InfiniBand or GigE
networks
• Production customers are using third party tape
backup products to backup Exadata systems today
<Insert Picture Here>

Backup and recovery


with Data Guard
Backup & recovery with Data Guard
Offload backup operations to standby database

• Both disk and tape based backups can be performed


from the physical standby Data Guard environment
• Offloads the backup to the standby environment
• Reduce backup times with fast incremental backups
• Eliminate impact to the primary environment
• Additional Data Guard benefits
• Auto block repair with zero impact on application
• Offload reads and reporting, backups, and testing
• Used for planned maintenance and rolling database upgrade
• Used for disaster recovery or high availability with Data Guard
Fast-Start Failover
Data Guard & the Database Machine
Data Guard Best Practices

• Oracle Data Guard: Disaster Recovery for Sun Oracle


Database Machine and Exadata

• http://www.oracle.com/technology/deploy/availability/pdf/maa_wp_dr_dbm.pdf
<Insert Picture Here>

Best practices for data loading
Module Agenda

• Data loading and Exadata <Insert Picture Here>

• Oracle Database File System (DBFS)


• DBFS performance expectations
• Configuration and implementation
<Insert Picture Here>

Data loading and


Exadata
Data loading
Definitions
• External tables
• Allow flat files to be accessed via SQL and PL/SQL as if they were tables
• Enable complex data transformations and data cleansing to
  occur 'on the fly'
• Avoids space wastage
• Direct Path loads in parallel
• Bypasses buffer cache and writes data directly to disk via multi-
block async I/O
• Use parallelism to speed up the load
• Remember to use ALTER SESSION ENABLE PARALLEL DML
• Range partitioning
• Enables partition exchange loads, with minimal service
interruption
Data loading
Exadata challenges
• The optimal method for loading a data warehouse is using
external tables.
• The Oracle Database Machine consists of a scalable storage grid
and a scalable database grid with Real Application Clusters.
• You can't run a cluster-parallelized SQL statement against an
  external table unless it resides in shared storage.
• You can't maximize the throughput of the database grid of the
  Oracle Database Machine with a simple single-headed NFS filer
  or other such bottlenecked solution.
<Insert Picture Here>

Oracle Database File


System
Oracle Database File System (DBFS)
Architecture Overview
• FUSE (Filesystem in Userspace)
  • An API and Linux kernel module used to implement Linux
    file systems in user space
Oracle Database File System (DBFS)
Architecture Overview
• DBFS is a file system interface on top of SecureFiles
• The DBFS Content Repository implements a file server
  • A PL/SQL package implements the file calls:
    create, open, read, write, list, etc.
• Files are stored as SecureFile LOBs
• Directories and file metadata are stored in tables and indexes
• On the client host, the application calls dbfs_client (OCI),
  which talks to the DBFS instance over SQL*Net
Oracle Database File System (DBFS)
Architecture Overview
• Combining DBFS with FUSE offers mountable file systems
• With RAC, DBFS is a scalable, distributed file system
<Insert Picture Here>

DBFS performance
expectations
Oracle Database File System (DBFS)
Performance expectations
• Full Rack Oracle Database Machine can load a little more
than 5TB/h (V2)
• Data staged In DBFS
• DBFS tablespaces reside on same disks as data warehouse
tablespaces
• Data loaded into normal redundancy ASM Disk Group so double
writes
• Total I/O == 15.6TB/h or 4.4 GB/s
Oracle Database File System (DBFS)
Performance expectations
• 4.4 GB/sec total:
  • 1.5 GB/s flowing from a file system housed in one Oracle database
  • 2.9 GB/s of writes (ASM normal redundancy)
• Could you achieve the same outside of DBFS on the Database
  Machine?
  • Supplying 1.5 GB/s would take 13 active line-rate GbE paths, or
  • 2 active IB paths with NFS via TCP over IB from a high-end NAS
    device
• DBFS solves the problem without any additional resources
outside the rack
<Insert Picture Here>

Configuration and
execution
Configuration
DBFS
• House DBFS in a dedicated database
• Use DBCA with OLTP template to create database
• AMM or ASMM are fine…prefer ASMM though
• 8GB SGA buffer pool, 1GB shared pool
• Redo logs should be at least 2GB
• Create bigfile tablespace for the file system (8K, 16K
blocksize)
• Create a DBFS user (e.g., dbfs identified by dbfs)
• Grant create session, create table, create procedure and
dbfs_role to DBFS user
• Grant quota unlimited on the DBFS tablespace to DBFS user
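A minimal SQL sketch of that setup, assuming an ASM disk group named +DBFS_DG and a user dbfs/dbfs; sizes and names are illustrative:

SQL> CREATE BIGFILE TABLESPACE dbfs_ts
     DATAFILE '+DBFS_DG' SIZE 100G AUTOEXTEND ON NEXT 8G;
SQL> CREATE USER dbfs IDENTIFIED BY dbfs
     DEFAULT TABLESPACE dbfs_ts QUOTA UNLIMITED ON dbfs_ts;
SQL> GRANT CREATE SESSION, CREATE TABLE, CREATE PROCEDURE, dbfs_role TO dbfs;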
DBFS
Implementation
• Create the DBFS file system
  • cd to $ORACLE_HOME/rdbms/admin
  • Start SQL*Plus as the DBFS user and run:
    SQL> @dbfs_create_filesystem_advanced.sql <TS Name> <FS Name> nocompress nodeduplicate noencrypt non-partition
• Mount the file system
  $ nohup $ORACLE_HOME/bin/dbfs_client dbfs@ -o allow_root,direct_io /data < passwd.txt &
DBFS
Implementation
• Move flat files to DBFS using FTP, SCP
• Define external tables with CREATE command
• You can move compressed files to save network bandwidth
• Use the PREPROCESSOR directive to decompress the files when the external table is read
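A hedged sketch of an external table over a gzip-compressed file staged in DBFS; the directory objects (dbfs_dir pointing into the DBFS mount, exec_dir containing zcat), the columns and the file name are all illustrative:

SQL> CREATE TABLE ext_sales (
       sale_id    NUMBER,
       sale_date  DATE,
       amount     NUMBER )
     ORGANIZATION EXTERNAL (
       TYPE ORACLE_LOADER
       DEFAULT DIRECTORY dbfs_dir
       ACCESS PARAMETERS (
         RECORDS DELIMITED BY NEWLINE
         PREPROCESSOR exec_dir:'zcat'
         FIELDS TERMINATED BY ',' )
       LOCATION ('sales_2008.csv.gz') )
     PARALLEL
     REJECT LIMIT UNLIMITED;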
Data loading best practices
External Tables
• Full usage of SQL capabilities directly on the data
• Automatic use of parallel capabilities (just like a table)
• No need to stage the data again
• Better allocation of space when storing data
• High watermark brokering
• Additional capabilities
• Optional sorting at load time (think improved compression)
Data loading best practices
Direct Path loads
• Data is written directly to the database storage using
multiple blocks per I/O request using asynchronous
writes
• Data bypasses buffer caches
• A CTAS command always uses direct path
• An INSERT AS SELECT needs an APPEND hint to
go direct

INSERT /*+ APPEND */ INTO sales PARTITION (p2)
SELECT * FROM ext_tab_for_sales_data;
Data loading best practices
Parallelize the load

• Specify parallel attribute either with hint or in both


table definitions
• CTAS will go parallel automatically when DOP is
specified
• IAS will not automatically parallelize
• Needs parallel DML to be enabled

ALTER SESSION ENABLE PARALLEL DML;
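For instance, a hedged sketch contrasting the two cases; the DOP of 16 and the table names are illustrative:

SQL> CREATE TABLE tmp_sales PARALLEL 16 AS
     SELECT * FROM ext_tab_for_sales_data;        -- CTAS parallelizes automatically

SQL> ALTER SESSION ENABLE PARALLEL DML;
SQL> INSERT /*+ APPEND */ INTO sales
     SELECT * FROM ext_tab_for_sales_data;        -- IAS needs parallel DML enabled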


Data loading best practices
Partition exchange
1. Create an external table for the flat files
2. Use a CTAS command to create the non-partitioned table TMP_SALES
3. Create indexes on TMP_SALES
4. ALTER TABLE sales EXCHANGE PARTITION may_24_2008 WITH TABLE tmp_sales
5. Gather statistics
The SALES table (partitioned by day: May 18th 2008 ... May 24th 2008)
now contains all of the data.
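A hedged SQL sketch of the exchange itself; the table, partition and index names are illustrative:

SQL> CREATE TABLE tmp_sales PARALLEL AS
     SELECT * FROM ext_tab_for_sales_data;
SQL> CREATE INDEX tmp_sales_ix ON tmp_sales (sale_id) PARALLEL;
SQL> ALTER TABLE sales EXCHANGE PARTITION may_24_2008
     WITH TABLE tmp_sales INCLUDING INDEXES WITHOUT VALIDATION;
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES', -
       partname => 'MAY_24_2008', granularity => 'PARTITION')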
<Insert Picture Here>

Receiving and injecting


Staging flat files
Receiving files
Provider System
SCP,FTP, etc

RAC Node | DBFS Client RAC Node | DBFS Client RAC Node | DBFS Client
DBFS Instance 1 DBFS Instance 2 DBFS Instance 3
/data/FS1 /data/FS1 /data/FS1 …

Oracle Database

DBFS
Content Repository: FS1

Oracle Database Machine


Staging flat files
Injecting files
• Provider systems must support the FUSE client
  • Linux x64, Linux x32, Solaris x64, Solaris Sparc, HP-UX PA-
    RISC64, HP-UX IA64, AIX PPC64
• The dbfs_client executable can be used on the provider
  system to "inject" data into the DBFS repository
• Eliminates the need to mount the DBFS file system just to stage
files
• Required libraries:
• libfuse.so.2, libclntsh.so.11.1, libnnz11.so
• Very network-efficient
• Tremendous relief in Database Machine processor utilization
• Customers may choose TCPS protocol (SQL*Net)
Injecting data with dbfs_client
Architectural Overview
$ ./dbfs_client dbfs@DBFS1 --command cp \
    /data/stage1/all_card_trans.ul \
    dbfs:/FS1/stage1/all_card_trans.ul

RAC Node
Provider System DBFS Instance 1 …
SQL*Net
dbfs_client executable (OCI)
Oracle Database

DBFS
Content Repository: FS1

Oracle Database Machine


Injecting data with dbfs_client
Performance comparison

• Without any tuning, injecting data into the DBFS repository


from the Provider System via TCP over Gigabit Ethernet is
nearly 80% more efficient than scp+ssh

SCP vs dbfs_client "injection" (Gigabit Ethernet, untuned):
                 % CPU      MB/s
scp                46         99
dbfs_client        10        107
<Insert Picture Here>

Consolidation of mixed workloads
Module Agenda

• Consolidation challenges and questions <Insert Picture Here>

• Consolidation configuration options


<Insert Picture Here>

Consolidation challenges
Typical consolidation challenges

• Packaged applications
• Schema name collisions
• Different SLAs
• 24/7 versus 8/5
• Daytime (<2 secs) vs Night (batch only)
• Workload types: OLTP, DW, hybrid
• Sizing for availability
• Predictable response times
• Application tier scalability
Key question
Consolidation of mixed workloads

• Can you mix workloads?


• Should you mix workloads?
• Can <> Should
• We also allow you to partition a table such that each row is in a
separate partition.
Mixed Workloads
Should you consolidate?
• Mixed workloads can be consolidated when one or more
of these exist:
• Excess capacity
• Inverted profiles
• Clear workload priorities
• Not good consolidation candidates when (for example):
• SLAs are incompatible
• Cannot use tools and techniques to provide separation
• IORM can't be employed effectively (e.g. flash scans)
• No substitute for real testing
• Not yet enough field experience to derive best practices
<Insert Picture Here>

Consolidation
configuration options
High-level consolidation options
Database
• Single RAC database
• Place all schemas in one database
• Pro:
• Better resource control
• Less overhead
• Focus on one database's performance and management
• Con:
• One set of instance-level params
• Outage affects all tenants
• Migration to single database can be challenging, since they
were separate for a reason
High-level consolidation options
Database
• Multiple RAC databases
• Move databases with minimal changes
• Pro:
• Flexibility for different params and versions/patches
• Simple platform migration
• Security and isolation more easily achieved
• Con:
• Resource control more difficult
• More moving parts to manage
• Most common choice
• RAC One Node
• Single-instance databases, one cluster
High-level consolidation options
Storage
• Single diskgroup, all cells
• Stripe all data across all cells (DBFS, DATA, RECO only)
• Pro:
• Maximum throughput/bandwidth
• Centralized resource control
• Less overhead
• Simpler management
• Con:
• Loss of two cells (normal redundancy) may cause an outage
• Recommended option
High-level consolidation options
Storage
• Segregate groups of cells
• Isolated environments, more simultaneous failures tolerated
• Pro:
• Can sustain more simultaneous failures (potentially)
• Little chance of one database impacting other
• Con:
• Reduced throughput/bandwidth
• Management overhead
• Fewer CPUs to operate on decompression
• Sizing such environments is difficult, especially performance
High-level consolidation options
Storage
If you are going to run mixed workloads successfully on
one Exadata system, one of the following has to be
true:
1.All priority workloads are OLTP.
2.All priority workloads are data warehouse.
3.The Exadata Smart Flash Cache is mainly used by
OLTP workloads.
1. KEEP attribute on objects
2. Flash disks
High level consolidation options
Exadata Smart Flash Cache considerations

• Size KEEP objects to fit in cache


• No software limits; up to you to size properly to fit physical limits
• Up to 80% of total flash cache on each cell can hold KEEP objects,
20% always reserved for "hot" objects.
• Keeping too much ~ no KEEP at all
• If SUM(bytes of KEEP objects) > (80% of SUM(bytes of ESFC)),
cannot depend on an object being in ESFC
• Overall recommendation is to start with no KEEP objects
• Currently, IORM does not help manage which objects are
cached
• Flash scans will also read from disk while reading from flash
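For example, pinning and un-pinning objects in the Exadata Smart Flash Cache is a segment storage attribute; the table names below are illustrative:

SQL> ALTER TABLE hot_orders  STORAGE (CELL_FLASH_CACHE KEEP);
SQL> ALTER TABLE big_history STORAGE (CELL_FLASH_CACHE DEFAULT);   -- revert to normal caching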
High level consolidation options
Backup and DR considerations
• Backups
• Scheduling may require adjustment
• Currently dedicated hardware may need replacement or evaluation
• Reconciling differing SLAs if merging into single database
• Many databases: backup concurrency needs to be considered
• Disaster Recovery
• Disaster could impact all applications on consolidated platform,
which may cause conflicts in SLAs.
• Is DR site using Exadata? (EHCC considerations)
High level consolidation options
The only way to *know*: Test
• Best: Real Application Testing
• Real database load from actual recorded load
• Before and after statistics compare directly
• Note: Shared servers or connection pools may make Database
Replay difficult or impossible.
• Not Bad: Load testing tools
• Challenge: simulate production workload mix
• Expensive, difficult to implement
• Not Bad: Parallel production run, real users, real load
• Challenge: simulate real production workload mix
• Subjective user feedback: "slower" or "faster" or "crashed"
• Unfortunate: See if it "works"
• Too common
• Usually done under the "it should work" notion
<Insert Picture Here>

Consolidation tools
Oracle features
Tools for successful consolidation

• Database Resource Manager (DBRM)


• I/O Resource Manager (IORM)
• Instance caging
• RAC
• Services
• Database server pools
• RAC One Node
Oracle features for consolidation
Database Resource Manager
• Allows control over:
• How CPU is shared among multiple applications
• Maximum CPU utilization of an application
• Manage runaway queries (based on execution time estimate)
• Degree of parallelism
• Multiple levels of prioritization
• Consumer groups are based on services, username, and
other session attributes
• Available to all Enterprise Edition databases
• Allocation scheme (resource plan) changes are dynamic
• Can also include I/O Resource Manager
Oracle features for consolidation
Database Resource Manager
• I/O Resource Manager
• Inter-database I/O resource management (via storage cell config)
• Intra-database I/O resource management (via DBRM config)
• When using categories, can provide additional granularity
• Instance caging at the server level
• Provides a way to limit the amount of CPU an instance can use
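A hedged sketch of these mechanisms working together; the core count, plan name, database names and allocation percentages are illustrative only:

-- Instance caging: cap this instance at 4 cores and activate a resource plan
SQL> ALTER SYSTEM SET cpu_count = 4;
SQL> ALTER SYSTEM SET resource_manager_plan = 'DEFAULT_PLAN';

-- Inter-database IORM plan, configured on each storage cell
CellCLI> ALTER IORMPLAN dbplan=((name=sales, level=1, allocation=70), -
                                (name=hr, level=1, allocation=30), -
                                (name=other, level=2, allocation=100))
CellCLI> ALTER IORMPLAN active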
Oracle features for consolidation
RAC and related features
• Scalability and availability
• Oracle Services
• Enables workload management and workload placement
• Parallel servers follow service placement
• Services can be designated by
• Application
• Group of users (DBRM)
• Workload type
• Combination of these
Oracle features for consolidation
RAC and related features
• Database server pools
• Service associated with server pool
• One server (SINGLETON) or all (UNIFORM)
• Server pools have minimum and maximum number of servers
• Server pools have priorities
• High priority server pool can grab servers from lower priority server
pools when required
• A method to implement different SLAs
• Quality of Service (QoS) with server pools (coming in 11.2.0.2)
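As an illustration, a hedged srvctl sketch of a policy-managed setup; the pool name, sizes, importance and service names are assumptions:

$ srvctl add srvpool -g oltp_pool -l 2 -u 4 -i 10       # min 2, max 4 servers, importance 10
$ srvctl add service -d dbm -s oltp  -g oltp_pool -c UNIFORM
$ srvctl add service -d dbm -s batch -g oltp_pool -c SINGLETON
$ srvctl start service -d dbm -s oltp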
Oracle features for consolidation
RAC and related features
• Quality of Service management
• Identify existing server pools to manage
• Define Performance Classes based upon workloads
• Associate Performance Classes to databases services
• Map Performance Classes to SLAs
• Create Performance Policies
• Rank Performance Classes to map to SLA priorities
• Set a Performance Objective per Performance Class
• When Performance Objective is not met for a Performance Class
• Identifies bottlenecked resource and sends alert.
• If CPU is bottleneck,
• Adjusts CPU shares through DBRM
• Increases size of server pool within SP Override constraints.
• Maintains performance and audit record.
Oracle features for consolidation
RAC and related features
• RAC One Node
• Enables seamless single-instance failover utilizing RAC features
• Allows single instances to utilize Exadata features
Oracle features for consolidation
Options for database CPU provisioning

Feature          Non-RAC  RAC  Granularity      Granularity      Change scope   Active before      I/O
                               (measure)        (manage)                        oversubscription
DBRM             Y        Y    % of total CPUs  % of total CPUs  Resource plan  N                  Y (without ma…)
Instance caging  Y        Y    Core             Core             Resource plan  Y                  N
Services         N        Y    Server           Server           Server         Y                  N
Server pools     N        Y    Server           Server           Server group   Y                  N

Note: Database Resource Manager does more than provision CPU.


<Insert Picture Here>

Consolidation sizing
Sizing for consolidation
Considerations
• Cumulative resource requirements
• Utilize AWR to determine current requirements
• Sizing presentation describes sizing in general
• Database Machines will probably have
• Faster CPUs
• Faster storage – more IOPS
• Greater network bandwidth – more MBs/second
• Reductions in amount of data moved to CPU
<Insert Picture Here>

Sizing for the Database Machine
Module Agenda

• Sizing challenges <Insert Picture Here>

• Sizing options
• Comparative sizing method
<Insert Picture Here>

Sizing challenges
Sizing challenges
What's the big deal?
• One of three answers –
• Quarter
• Half
• Full
• A simple answer does not imply a simple process for
arriving at the answer
Sizing challenges
Issues
• Capacity sizing is simple
• A single number for each resource category
• Workload sizing is complex
• Reflects cumulative amounts of resource consumption across a
broad range of heterogeneous database operations
• Real world workload sizing is even more complex
• Interaction of workload demands, both average and peak, over
time against resources
• Additional resource demands stem from the interactions
  between different workloads
Sizing challenges
Impact of sizing decision
• System sizing drives solution price
• Under-sizing will reduce price & under-cut competitors
• Under-sizing will reduce pricing/discount pressure
• Under-sizing will result in business impact
• Impact can be MANY TIMES greater than system price
• Example: $60,000 under-investment ==> $1M+ business loss
• Can result in serious customer satisfaction issues or even
lawsuits
Sizing challenges
Impact of sizing decision
• Process is key to customer satisfaction
• Need a defined & documented process
• Process must produce the same results given same inputs
• Need to retain historical sizing documents
• Accuracy & transparency
• Must be reasonably accurate (often a RANGE of sizes)
• Must be able to explain the process (transparency)
<Insert Picture Here>

Sizing options
Sizing processes
Analytic processes
• Comparative sizing
• System refresh
• System replacement
• Competitive
• Predictive sizing
• New application deployments
• Depend on accurate metrics of workload and real-world
comparisons
• Use predictive sizing to check comparative approach
• Hybrid approach
• Very scalable method for sizing customer systems
• Produces relatively accurate result
Sizing processes
Benchmark-based sizing

• Involves building and executing a benchmark or POC


• Enormous amount of work of questionable value
• Needs to simulate production data & volumes
• Needs to simulate end-user workloads
• Accuracy based on accuracy of simulation
• Many (most?) POCs do not properly model data & workloads
• Mocked-up data often includes improperly skewed data
• Not a scalable process for sizing thousands of
systems
• Benchmarks provide valuable data to fine-tune
analytic sizing
Sizing approach
Exadata considerations
• Database Machine implementation is unique
• Sum of Exadata features means much higher effective I/O
throughput
• I/O configuration is balanced and predetermined by CPU
count
• Can scale up (larger/more Database Machines) or out (more
Exadata Storage Servers), but minimum I/O bandwidth is fixed
• Memory is balanced and predetermined by CPU count
• Not enough data for real world workloads and
configurations to go with predictive approach
• The first phase of Exadata sizing approach focuses on
comparative approach built around database server CPUs
Comparative sizing
Real world needs
• Near-term, most Exadata deployments will be system
  replacements; an "80/20" rule applies.
• In order of increasing complexity and risk:
  • Sizing for system replacement (comparative sizing)
  • Sizing for DBMS migration (either or both approaches)
  • New application deployments (predictive sizing)
<Insert Picture Here>

Comparative sizing
method
Comparative Sizing
Steps

1. Gather inputs from customer


2. Perform DB tier comparison
3. Validate storage requirements
4. Validate current system utilization
5. Evaluate growth projections
6. Quantify the resulting Exadata benefits
7. Conduct read-out & provide report to customer
8. Post-production follow-up
Comparative sizing
Gather inputs
Gather inputs — existing configuration
  DBMS & Ver.     Only Oracle is supported in V1 of the tool
  DBMS options    RAC, Secure Files, OLAP, etc.
  O/S             Windows, Linux, Unix
  Server H/W      Vendor & model (HP, Dell, Sun, IBM, etc.)
  Cluster Conf.   Number of nodes (symmetric/asymmetric)
  CPU             CPU model, speed, cores & number
  SAN/Disk        DAS, SAN, NAS, vendor, model, speeds
  DB Storage      ASM, OCFS2, VCFS, etc.
  Utilization     AWR or other helpful (not mandatory)
  Customer Pain   Business and/or technical pains
  Perf. KPIs      Customer measures of performance


Comparative sizing
DB tier CPU comparison
• Compare the existing configuration against Exadata using
  M-Values, SPECint, development benchmarks and POC results
  to define the comparison metrics and process (X4170 server)
• Server-only sizing at this stage
  • Equivalent sizing only; not yet sized for growth,
    performance, etc.
Comparative sizing
CPU comparisons
• Find the type, speed, and number of CPU cores of the
system that the Database Machine is competing against
or replacing
• Use SPECint comparisons to find the equivalent number
of Database Machine cores needed
• Adjust number of cores upwards if database/application moving
from single instance to RAC
• Adjust number of cores downwards, if competing or replacing
slower CPUs than shown in the tables (the table lists the best
case)
• Pick the size of the Database Machine (Quarter, Half, Full,
  Multiple Racks) that's closest to the number of cores
  needed
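As a hedged worked example (assuming V2 database-grid core counts of 16 per Quarter Rack, 32 per Half Rack and 64 per Full Rack):

Existing system: Sun E25K with 144 UltraSparc IV+ cores at 8.5 SPECint_rate per core
Equivalent Database Machine cores = 144 x (8.5 / 26.6) = 144 x 0.32 = ~46 cores
46 cores falls between a Half Rack (32 cores) and a Full Rack (64 cores);
adjust for RAC, measured utilization and growth before choosing the size.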
Sun Sparc – SPECint Comparison
System     Processor                        CINT2006_rates                  Equivalent Database
                                                                            Machine Cores
M-Series   Sparc64 VII (2.75 GHz)           49.1 / 4 cores   = 12.3/core    0.45
M-Series   Sparc64 VI (2.4 GHz)             352 / 32 cores   = 11/core      0.40
E25K       UltraSparc IV+ (1.95 GHz)        1230 / 144 cores = 8.5/core     0.32
V890       UltraSparc IV+ (2.1 GHz)         154 / 16 cores   = 9.6/core     0.35
T5xxx      UltraSparc T2 Plus (1.6 GHz)     97 / 8 cores     = 12.125/core  0.45

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
IBM Power – SPECint Comparison
System     Processor                        CINT2006_rates                 Equivalent Database
                                                                           Machine Cores
pSeries    Power7 Eight-Core (3.86 GHz)     652 / 16 cores  = 40.8/core    1.53
pSeries    Power6 Dual-Core (5.0 GHz)       2180 / 64 cores = 34/core      1.28

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
Note: IBM does have faster per-core numbers for the Power7, but these are based on a
quad-core version where they effectively plug in an 8-core chip, turn off 4 cores and
run the remaining 4 cores at a faster speed. This is a benchmark special and not cost
effective for the customer.
HP Itanium – SPECint Comparison
System      Processor                           CINT2006_rates                  Equivalent Database
                                                                                Machine Cores
Integrity   Itanium Quad-Core 9350 (1.73 GHz)   134 / 8 cores    = 16.75/core   0.63
Integrity   Itanium Dual-Core 9050 (1.6 GHz)    53.9 / 4 cores   = 13.5/core    0.50
Superdome   Intel Itanium 2 (1.66 GHz)          1650 / 128 cores = 12.9/core    0.48

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
AMD Opteron – SPECint Comparison
System         Processor                          CINT2006_rates                Equivalent Database
                                                                                Machine Cores
HP DL185 G5    Opteron Dual-Core 2222 (3.0 GHz)   61 / 4 cores   = 15.25/core   0.57
HP DL385 G5p   Opteron Quad-Core 2389 (2.9 GHz)   143 / 8 cores  = 17.9/core    0.67
HP DL585 G5    Opteron Six-Core 8439 (2.8 GHz)    416 / 24 cores = 17.3/core    0.65
HP DL385 G7    Opteron 12-Core 6176 (2.3 GHz)     398 / 24 cores = 16.6/core    0.63

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
Intel – SPECint Comparison
System         Processor                          CINT2006_rates                Equivalent Database
                                                                                Machine Cores
HP DL380 G5    Xeon Dual-Core X5270 (3.5 GHz)     90.7 / 4 cores = 22.7/core    0.85
HP DL380 G5    Xeon Quad-Core X5365 (3.0 GHz)     116 / 8 cores  = 14.5/core    0.55
HP DL360 G5    Xeon Quad-Core X5470 (3.33 GHz)    150 / 8 cores  = 18.75/core   0.70
HP DL360 G6    Xeon Quad-Core X5570 (2.93 GHz)    251 / 8 cores  = 31.4/core    1.18

Note: These are the best case numbers on a per-core basis. Database Machine CPU
is 26.6/core.
Comparative sizing
Validate storage requirements
• Disk capacity is fixed by the rack size; compare need vs capacity (SAS is assumed)
• Compression: assume not more than 2X with Advanced Compression (not HCC)
• SAS vs SATA: assume SAS unless there is a demand for SATA
• Expansion cabinet: must justify why extra cells would be acceptable
• Output: DB node & storage sizing
Comparative sizing
Validate CPU utilization and growth potential
• Automated data collection (AWR, sar, etc.) of utilization would be
  ideal, but cannot be collected in all cases
• Key peak statistics: % CPU busy, % memory utilization, IOPS, MBPS,
  data growth, process growth
• Adjust the sizing for these statistics, and possibly for some
  performance KPIs, to arrive at the final sizing for growth
Comparative sizing
Quantify Exadata benefits

• Gather inputs: server (CPU), disk size, utilization, growth, X-factor
• Customer pains: performance KPIs, technology adoption, integration, business pain, IT/tech pain
• Map Exadata advantages to those pains – not a single number – this justifies the purchase
• Map to Exadata metrics
  • Quantifiable metrics, e.g. I/O bandwidth, IOPS, offload %
  • Subjective advantages, e.g. ease/speed of deployment
• Process flow: gather inputs → report → feedback
Migrating to the Database Machine
Module Agenda

• Initial considerations

• Migration strategies
• Migration methods overview
• Physical migration
• Logical migration
• Migration methods in practice
• Bulk data movement
Initial considerations
Database Machine software
Considerations

• Exadata Storage Server software and Oracle Database


• Versions must match
• Cannot run 11.2 Exadata with 11.1 Database (or vice versa)
• 11.1 and 11.2 cannot coexist on same machine
• Important consideration for migration from v1 to v2
• Sun hardware 11.2 only
• HP hardware either 11.1 or 11.2
• Operating system
• Oracle Enterprise Linux (OEL5) Linux x86_64
• Little endian format
Migration strategies
Migration strategy
Migration method considerations
• Determine what to migrate
• Because of Exadata's unique features (e.g. Smart Scan), expect differences between the source and Exadata warehouse databases
• Fewer indexes, fewer materialized views, potentially different
partitioning strategy, compression
• Avoid methods that migrate what you will discard
• Consider configuration of source system
• Not all migration methods available for all source environments
• Non-Oracle: Not covered in this presentation, although many
methods work if you take into consideration platform differences
• Oracle: Source database version and platform matters
• Target system fixed: 11.2, ASM, Linux x86-64
Migration strategy
Migration method considerations

• Implement best practices


• Will the migration method accommodate best practices?
Examples
• Large extents (8MB) for large segments – set at extent allocation time
• Don't consider the migration method in isolation – avoid methods that prevent proper best practices

• Minimize downtime
• Yes, but implementing best practices is more important (your
future performance depends on it)
Migration methods
overview
Migration methods
Overview

Physical migration
• Data remains in datafiles (block-for-block)
• Most methods are whole database migration
• Generally more restrictive

Logical migration
• Data unloaded from source, loaded into Exadata database w/ SQL
• Easier to migrate a subset
• Easier to implement structural best practices
• Generally less restrictive
http://www.oracle.com/technology/products/bi/db/exadata/pdf/migration-to-exadata-whitepaper.pdf
Migration methods
Migration method choice

• No single best method for all cases, but in general …

Data Warehouse
• Typical strategy
  • Change structure – reduce / remove indexes, MVs
  • Change storage – use new compression (EHCC), optimize extent sizing
  • Change platform – source big endian
• Migration method choice
  • 1st: Logical
  • 2nd: Physical

OLTP
• Typical strategy
  • Structure intact
• Migration method choice
  • 1st: Physical
  • 2nd: Logical
Physical migration
Physical migration
Basics

• Data remains in datafiles (block-for-block)


• Database extent sizes remain the same
• Most methods perform whole database migration
(except TTS)
• Inherit legacy database configuration
• indexes, MVs, no compression
• Stricter requirements
• Platform and version changes restricted
Physical migration
Challenges

• Best practices challenged


• Suboptimal sizing
• Migrate unnecessary objects
• Objects can be recreated post migration, but
• Why not use logical method in the first place?
Physical migration
Methods at a glance

• Physical standby
• Transportable database (TDB)
• Transportable tablespaces (TTS)

If best practices not already implemented on source


database, consider logical migration method
Physical migration methods
Physical standby

• Overview (Note 1055938.1)


• Create physical standby on DBM
• Data Guard switchover
• Source system criteria
• 11.2 on Linux (or Windows – see Note 413484.1)
• Use this method when migrating from an HP DBM running 11.2
• Outage time
• Data Guard switchover
• Consider
• Archivelog mode and LOGGING required
• New DB_UNIQUE_NAME needed
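As a minimal sketch (assuming a Data Guard broker configuration; the connect string, password and standby name are placeholders), the final switchover to the standby built on the Database Machine could look like:

    # dbm_prod is the DB_UNIQUE_NAME of the standby created on the DBM.
    dgmgrl sys/oracle@srcdb <<'EOF'
    SHOW CONFIGURATION;
    SWITCHOVER TO 'dbm_prod';
    EOF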
Physical migration methods
Physical standby plus database upgrade
• Overview (Note 1055938.1)
• Create physical standby on DBM
• Apply archives
• Activate standby
• Run database upgrade scripts
• Source system criteria
• 11.1+ on Linux
• Outage time
• Time to apply archives + run database upgrade scripts
• Consider
• Archivelog mode and LOGGING required
• New DB_UNIQUE_NAME needed
Physical migration methods
Transportable database (TDB)
• Overview
• RMAN CONVERT DATABASE
• Transfer datafiles to Exadata storage
• CONVERT subset of datafiles, as required (up to 2GB/s) (Note:732053.1)
• Run transport script
• Source system criteria
• 11.2 on little endian
• Outage time
• Transfer all datafiles + partial CONVERT + transport script
• Consider
• Do not use source system conversion
• Staging space requirement – size of files that need CONVERT
• OLAP AWs need special consideration (Note 352306.1)
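A rough sketch of the TDB conversion step (paths and the database name are illustrative; per the guidance above, the actual file conversion is run on the destination, not the source):

    # Run against the source database opened read only.
    rman target / <<'EOF'
    CONVERT DATABASE ON DESTINATION PLATFORM
      CONVERT SCRIPT   '/stage/convert_dbm.rman'
      TRANSPORT SCRIPT '/stage/transport_dbm.sql'
      NEW DATABASE 'dbm'
      FORMAT '/stage/%U';
    EOF
    # Transfer the datafiles to Exadata storage, run the convert script there
    # for the files that require it, then finish with the transport script.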
Physical migration methods
Transportable tablespace (TTS)
• Overview
• Build empty 11.2 Exadata database
• TTS export source system metadata
• Transfer files to Exadata (CONVERT if source system big endian)
• TTS import metadata into Exadata database
• Source system criteria
• 10.1 or later, any endian
• Outage time
• TTS export + Transfer files + CONVERT (if necessary) + TTS import
• Consider
• If source system big endian, CONVERT on source system
• Staging space requirement - size of files that need CONVERT
• OLAP AWs need special consideration (Note 352306.1)
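The TTS steps above might be sketched as follows (tablespace, directory and file names are placeholders; the RMAN CONVERT step applies only when the source is big endian; password prompts are omitted):

    # Source: make the tablespace read only and export its metadata.
    sqlplus / as sysdba <<< "ALTER TABLESPACE dw_data READ ONLY;"
    expdp system DIRECTORY=stage_dir DUMPFILE=tts_meta.dmp \
          TRANSPORT_TABLESPACES=dw_data TRANSPORT_FULL_CHECK=y

    # Big-endian source only: convert the datafiles for Linux x86-64.
    rman target / <<'EOF'
    CONVERT TABLESPACE dw_data
      TO PLATFORM 'Linux x86 64-bit'
      FORMAT '/stage/%U';
    EOF

    # Exadata: plug the transferred files into the empty 11.2 database.
    impdp system DIRECTORY=stage_dir DUMPFILE=tts_meta.dmp \
          TRANSPORT_DATAFILES='+DATA/dbm/datafile/dw_data_01.dbf'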
Physical migration
Method selection

Method                       When to use
Physical standby             Linux source on 11.2, archiving and LOGGING
Transportable database       Little endian source on 11.2
Transportable tablespaces    Big endian source >= 10.1
                             Little endian source >= 10.1, < 11.2
Logical migration
Migration methods
Logical migration

• Data unloaded from source, loaded into Exadata


database w/ SQL
• Move only the user data
• Best practices can be added
• 4MB ASM AU size set for new disk groups
• Large extents (8MB) for large database segments
• Table compression, if desired
• Partitioning (added or changed), if desired
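A minimal sketch of how these best practices might be put in place on the target before loading (the disk group, tablespace and size values are placeholders, not a prescribed configuration):

    # 4MB ASM allocation units for the new disk group.
    sqlplus / as sysasm <<'EOF'
    CREATE DISKGROUP DATA NORMAL REDUNDANCY
      DISK 'o/*/DATA*'
      ATTRIBUTE 'au_size' = '4M',
                'compatible.asm'   = '11.2.0.0.0',
                'compatible.rdbms' = '11.2.0.0.0';
    EOF
    # 8MB uniform extents for tablespaces holding large segments.
    sqlplus / as sysdba <<'EOF'
    CREATE BIGFILE TABLESPACE dw_data
      DATAFILE '+DATA' SIZE 500G AUTOEXTEND ON
      EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M;
    EOF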
Logical migration
Methods at a glance

• Logical standby
• GoldenGate / Streams
• Data Pump
• Create Table As Select (CTAS) or Insert As Select
(IAS)
Logical migration methods
Logical standby
• Overview
• Steps depend on starting point - See following slides
1. Source database 11.2
2. Source database < 11.2 (including HP DBM)
• Source system criteria
• Linux (check Note 413484.1 for cross-platform support)
• Outage time
• Typically Data Guard switchover + application failover
• Consider
• Archivelog mode, LOGGING, and supplemental logging required
• Data type support
• Can apply catch up?
Logical migration methods
Logical standby – source system 11.2
• Overview
• Create logical standby on 11.2 DBM
• Change table storage characteristics, as desired (Note:737460.1)
• Data Guard switchover
• When to use this method
• Table storage characteristics will be changed
• If not, use physical standby method
Logical migration
Logical standby – source system < 11.2

• Overview (Note 1055938.1)


• Create logical standby on source system (e.g. 11.1 HP DBM)
• Shutdown and copy logical standby + controlfile to 11.2 DBM
• RMAN: duplicate target database for standby from
active database
• Upgrade logical standby to 11.2 (run upgrade scripts
manually)
• Enable redo transport and standby apply to catch up
• Change table storage characteristics, as desired
(Note:737460.1)
• DG switchover
• When to use
• Table storage characteristics will be changed or
• Rolling database upgrade
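The "duplicate for standby from active database" step above could be sketched like this (connect strings and credentials are placeholders; the auxiliary instance on the DBM is assumed to be started NOMOUNT):

    rman <<'EOF'
    CONNECT TARGET sys/oracle@srcdb
    CONNECT AUXILIARY sys/oracle@dbm_stby
    DUPLICATE TARGET DATABASE FOR STANDBY FROM ACTIVE DATABASE;
    EOF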
Logical migration
GoldenGate / Streams
• Overview
• Create and upgrade replica on DBM
• Stop apply
• Implement best practices on replica (e.g. unload, recreate, reload)
• Start apply to catch up
• Disconnect users from primary, reconnect to DBM
• Source system criteria
• 10.1+ on any platform (GoldenGate allows different DBMS, too)
• Outage time
• Application reconnection
• Consider
• Archivelog mode, LOGGING, and supplemental logging required
• Data type support
• Can apply catch up?
Logical migration
Data Pump

• Overview
• Create Exadata database
• Import user data into Exadata using Data Pump
• Network mode - Direct import from source via dblink
• Can result in large UNDO on target
• File mode - Export to dump file(s), transfer file(s), Import
• Source system criteria
• 10.1 or later on any platform
• Outage time
• Network mode - 1x data movement
• File mode - 3x data movement and 2x staging space
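A network-mode import as described above might be invoked roughly as follows (the database link, schema and degree of parallelism are placeholders):

    # Pulls data straight from the source over a database link – no dump files.
    impdp system NETWORK_LINK=src_link \
          SCHEMAS=dw_owner \
          EXCLUDE=INDEX EXCLUDE=MATERIALIZED_VIEW \
          PARALLEL=16 \
          LOGFILE=stage_dir:imp_dw.log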
Logical migration methods
CTAS / IAS
• Overview
• Create Exadata database
• CTAS or IAS
• From external tables in DBFS staging area
• From dblink to source database
• Source system criteria
• Any version or platform
• Outage time
• Significant (3x) variation depending on partitioning (and what
scheme), compression, target data type
• Consider
• Use DBFS for staging external tables, not local filesystem
• Dblink - Manually parallelize
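For example, an initial bulk load via CTAS from an external table staged on DBFS might look like this (table names, compression level and degree of parallelism are illustrative):

    sqlplus dw_owner <<'EOF'
    ALTER SESSION ENABLE PARALLEL DDL;
    -- CTAS from an external table over files staged in DBFS, adding EHCC
    -- compression and parallelism on the way in.
    CREATE TABLE sales
      COMPRESS FOR QUERY HIGH
      PARALLEL 16
      AS SELECT * FROM sales_ext;
    -- Or IAS over a database link (parallelize manually, per the note above):
    -- INSERT /*+ APPEND */ INTO sales SELECT * FROM sales@src_link;
    EOF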
Logical migration
Method selection

Method                        When to use
Logical standby               Rolling database upgrade requirement
                              Table storage characteristics will be changed
Oracle GoldenGate / Streams   Minimal downtime requirement
                              Different source platform
Data Pump                     Data type restriction with other methods
CTAS / IAS                    Initial bulk load
Migration methods in
practice
Migration methods
In practice

• Currently, most data warehouses are not on Linux x86-64 and not running 11g, so most physical methods are eliminated
• Most data warehouses replaced by Exadata are running
either Oracle on big-endian UNIX, or competitor (e.g. DB2,
Netezza, Teradata)
• Customers only want tables with user data in order to
implement new database configuration determined
during testing
Migration methods
In practice

• Most common methods used thus far


• Combination for staged migration
• CTAS/IAS or Data Pump for the initial bulk load into
Exadata while source remains in use
• Perform daily loads (external tables) into both source and
Exadata
• Initially users serviced by source database
• Move users over to Exadata
• Stop daily load into source
Migration Scenario
From 11.1 HP DBM

• Restriction
• RDBMS 11.1 cannot use Exadata 11.2
• RDBMS 11.2 cannot use Exadata 11.1
• Option #1 - Physical Standby + Database Upgrade
• Option #2 – Logical Standby source system < 11.2
• Reduce downtime – rolling database upgrade
Migration Scenario
From 10gR2 / 11gR1 on Big Endian

• Option #1 – Transportable Tablespaces


• Option #2 – Data Pump
• Implement best practices not in source database
• Option #3 – GoldenGate, Streams
• Reduce downtime
• Implement best practices not in source database
Migration Scenario
From 10gR2 / 11gR1 on Little Endian (non-DBM)
• Option #1 – Physical Standby + Database Upgrade
• Check Note 413484.1 for cross platform standby support
• Option #2 - Logical Standby source system < 11.2
• Reduce downtime – rolling database upgrade
• Check Note 413484.1 for cross platform standby support
• Option #3 - Data Pump
• Implement best practices not in source database
• No cross platform standby support
• Option #4 – GoldenGate, Streams
• Reduce downtime
• Implement best practices not in source database
Bulk data movement


Bulk data movement

• Performance criteria
• Network
• Protocol
• Source system
• Target system (i.e. DBM)

Note: Bulk data movement to the DB servers – you do


NOT move data directly to the storage – it always
goes through an instance on a DB server first.
Bulk data movement
Network

• 2 networks can get data to DB servers on DBM


• InfiniBand (IB) 4x QDR 40Gb/s per link
• Gigabit Ethernet (GbE) 1Gb/s
• eth1 and eth2 can be bonded for aggregation

• In practice, IB is about 20x faster than single GbE


• IB 2GB/s vs GbE 110MB/s for single connection (TCP)

Use IB network
Bulk data movement
Protocol
• TCP over IB (TCPoIB)
  • On the source system: use IP connected mode (CM) and set a large MTU (65520)
  • DBM DB servers are already configured
• RDS – only used by Oracle for RAC and storage traffic
• SDP – stick with TCP
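On a Linux source system, connected mode and the large MTU might be set along these lines (the interface name and persistence mechanism depend on the distribution; treat this as a sketch, not the supported procedure):

    # Switch the IPoIB interface to connected mode and raise the MTU.
    echo connected > /sys/class/net/ib0/mode
    ip link set ib0 mtu 65520
    # To persist on OEL/RHEL, e.g. in ifcfg-ib0: CONNECTED_MODE=yes, MTU=65520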
Bulk data movement
Protocol

• Oracle Net TCP


• Set SDU=32767
• Yields more efficient writes by Oracle Net to the socket buffer
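For example (a sketch; the DBM side is already configured, so this applies to the source system):

    # Set the default Oracle Net session data unit on the source system.
    cat >> $ORACLE_HOME/network/admin/sqlnet.ora <<'EOF'
    DEFAULT_SDU_SIZE=32767
    EOF
    # SDU can also be set per alias in the tnsnames.ora connect descriptor.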
Bulk data movement
Source system

• Source system
• I/O subsystem must deliver
• Fast IB network can't compensate for slow I/O
• CPU usage varies
• Data transfer with very fast networks can cause high CPU usage
• One CPU may be pegged while others have headroom (e.g.
interrupt handling)
• Use mpstat(1) to investigate
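For example, per-CPU utilization during a transfer can be watched with standard sysstat tooling:

    # Report per-processor statistics every 5 seconds; look for one CPU pegged
    # (often the one handling network interrupts) while the others are idle.
    mpstat -P ALL 5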
Bulk data movement
Target system

• Target system (DB servers of the Exadata system)


• ASM for staging
• Stored in Exadata
• Oracle-structured files only (e.g. data files, DP dump files)
• Excellent disk I/O throughput
• Oracle tool required to move data (DFT, ASMCMD CP, RMAN
BACKUP AS COPY AUXILIARY, XDB FTP)
• DFT 115 MB/s for single connection (use multiple to scale)
• Double (or triple) writes for ASM redundancy
• 600MB/s network rate translates to 1200+MB/s ASM write rate
Bulk data movement
Target system

• Target system (DB servers of the Exadata system)


• DBFS for staging (Note 1054431.1)
• File system in a database, using Exadata storage
• Standard OS tools
• Local disk file system for staging
• Do NOT use it for staging
• Not designed for performance
• Use DBFS – better performance, higher capacity, shared
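Mounting a DBFS store for staging might look roughly like this (user, service and mount point are placeholders; see Note 1054431.1 for the supported procedure):

    # Mount the DBFS file system on a DBM database server, then use normal OS
    # tools (cp, scp, external tables) against /dbfs_stage.
    mkdir -p /dbfs_stage
    nohup dbfs_client dbfs_user@dbm -o direct_io /dbfs_stage &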
Why you don't need a Database Machine
Module Agenda

• Magic?

• The goal
• Build your own database machine
• Do you need the Database Machine?
Magic?
Looking back

• At this point, you understand more about Exadata


technology (hopefully)
• Software
• 11g
• Exadata Storage Server Software
• Hardware
• Components
• Flash
• Balanced configuration
Is it magic?
Well, is it?
• No
• You could build a machine to deliver the same
performance
• As long as it can achieve the same proven throughput as a
Database Machine
The goal
The goal
Winter Corporation Exadata Proof of Concept

• Workload
• Execute 4 complex, concurrent queries
• Vast amounts of data, query I/O rate peaked at 14
GB/s and queries complete in 99 seconds
• Sun Oracle Database Machine completes these
queries in 48 seconds—without using any V2 software
features.
• 48 seconds == 20.8 GB/s
The goal
Don't forget!

• Remember –
• The Sun Oracle Database Machine is a balanced
configuration, so you must guarantee I/O
throughput capabilities in every section of the
machine you will build
• Database Machine also provides for balance
across components and high availability
Build your own Database Machine
Build your own Database Machine
Network bandwidth

• To achieve 20.8 GB/s
• 53 active 4GFC Fibre Channel paths
• Fibre Channel SAN arrays put disks in drawers
• Drawers are connected to the array controller with 4GFC FC cabling
• Need 53 drawers (maybe 26, depending on the array)
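The path count follows from simple arithmetic, assuming roughly 400 MB/s of usable bandwidth per active 4GFC path:

    echo "scale=1; 20.8 / 0.4" | bc   # = 52.0 -> ~53 active 4GFC paths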
Build your own Database Machine
Disk bandwidth

• To achieve 20.8 GB/s


• Exadata offers this 20.8 GB/s with 168 SAS disks.
• 15K RPM SAS disks are the same as 15K RPM FC drives
• You have to spread 168 disks over 53 drawers
Build your own Database Machine
Array controllers

• To achieve 20.8 GB/s


• ~3 disks per drawer == massive wasted cabinet space
• More drawers == more array controllers
Build your own Database Machine
HA for storage cabling and switches

• To achieve 20.8 GB/s


• 53 active paths need HA protection
• Dual port HBAs
• Now I need 106 runs of FC cabling from storage to switch and
106 from switch to hosts
• Ugh, I need multiple switches.
• Director/high end switches?
Build your own Database Machine
CPUs

• To achieve 20.8 GB/s
• And now the fun begins: the data has to get to a set of database hosts
• The V2 Database Machine has 8 2s8c16t Nehalem EP based servers. Great, that's easy. But Exadata does offload processing…hmmm…
• I need 39 CPUs just to match the Exadata offload processing
Build your own Database Machine
CPUs for offload processing

• To achieve 20.8 GB/s


• I also need the 8 database hosts (2s8c16t Nehalem EP) that
the Database Machine used
• So:
• If using 2s8c16t servers I need 103 cores…round up to 13
servers.
• OK, go make a 13 node RAC cluster and work out the 53
active Fibre Channel paths…all with balance!
Build your own Database Machine
Create and build the cluster

• To achieve 20.8 GB/s


• OK, go make a 13 node RAC cluster and work out the 53
active Fibre Channel paths…all with balance!
• 53/13 is roughly 4 HBAs per host.
• 2s8c16t servers generally don't support 4 dual-port HBAs…but maybe I found some that do…
Is it worth it?
Typical DW technical architecture
Hardware needed to achieve 6 GB/s
Component                    Team               Vendor
Ethernet Interconnect        Network Team       Switch Vendor
Database                     DBAs               DB Vendor
Unix/Linux OS                Unix Sys Admin     OS Vendor
HBA                          H/W Admin          HBA Vendor
Volume Manager               Storage Design     LVM Vendor
FC Switches / Data Fabric                       FC Switch Vendor
LUNs / Storage Array         Storage Admin      Storage Vendor / Vendor Support

Massively shared infrastructure – what chance of getting it right?
Virtually impossible to scale
Typical DW technical architecture
Hardware needed to achieve 18 GB/s (v2)

or . . . .
Typical DW technical architecture
Hardware needed to match X2-8 (2x cores)

or . . . .
Database Machine
Hardware needed to achieve 18 GB/s
Is it worth it?
Database versus purpose-built

Purpose-built
• Does not require benefits of Exadata smarts
• Replaced software with hardware
• Very complex to implement and manage
• 13 node RAC grid is totally saturated
• Potentially unpleasant place to be

Database Machine
• Not fully saturated
• Single vendor
• Scalable
• Preconfigured
DIY Exadata-like Performance?
Heresy?
"If IT guys go out and build infrastructure under an Oracle
database in their enterprise IT shop, it's a major design
project. […] the IT team has to go in and figure out what's
the right servers to buy, what's the right storage to buy, how
do I connect them all together properly into a cluster or a
SAN or whatever they're doing.

"And this is a big deal: it takes months and months, and lots
of negotiating with lots of vendors, and at the end of the day
they have this completely unique system that they built—and
it's really good, but they're the only ones in the world who
have this unique system. Which means that if there's any
problem, they're going to be the first ones to find it, right?"

--Andrew Mendelsohn
Is it worth it?
Well, is it?