

Oracle Real Application Clusters: Sizing and Capacity Planning Then and Now

Su Tang, Sri Subramaniam, RACPACK

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.

Agenda
- Capacity Planning in GRID/RAC Environment
- Scalable Infrastructure Design
- On Demand Capacity Addition and Utilization
- Criteria to Add More Capacity
- Real World Customer Example
- Questions


Capacity Planning

RAC Capacity Planning


All current capacity planning practices still apply:
- Network storage sizing
- Interconnect network capacity
- Server capacity
- Application service design

RAC flexibility offers these advantages:
- A good initial estimate is sufficient
- Growth is easily accommodated
- The emphasis shifts to capacity utilization


Storage Network

Networked Storage
RAC works with both SAN and NAS storage. The optimal storage selection depends on:
- Estimated I/O response time
  - Typically single-block I/O requests
  - A common characteristic of most OLTP applications
  - Measured in IOPS
- Estimated I/O bandwidth
  - Large multi-block I/Os
  - Data warehouse and mixed-workload environments
  - Occurs during backup/recovery operations
- Estimates should include the requirements for both normal and backup I/Os

Storage Capacity Planning

Estimate the initial data size and growth rate for all applications (e.g., 500 GB initially, doubling over two years: 1 TB total)

Add the fault tolerance requirements (e.g., 2 TB with RAID 1, 1.2 TB with RAID 5)

Add the backup requirements to the size (e.g., an additional 1 TB for a full backup and another 1 TB for 5 incrementals)
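The three sizing steps above reduce to simple arithmetic. The following Python sketch is a minimal illustration, not an Oracle tool: the function name is hypothetical, the RAID overhead factors (2.0 for RAID 1 mirroring, 1.2 for RAID 5, e.g. five data disks plus one parity disk) are assumptions chosen to reproduce the example figures, and backup space is added without RAID overhead, as in the example.

```python
def raw_storage_tb(initial_tb: float, growth_factor: float,
                   raid_overhead: float, backup_tb: float) -> float:
    """Estimate raw storage (TB) from data size, growth, RAID overhead and backup space."""
    data_tb = initial_tb * growth_factor        # e.g. 0.5 TB doubling -> 1 TB
    protected_tb = data_tb * raid_overhead      # fault tolerance (RAID) overhead
    return protected_tb + backup_tb             # plus full + incremental backups


# Figures from the example: 500 GB doubling over two years,
# 1 TB for a full backup and 1 TB for 5 incrementals.
raid1_total = raw_storage_tb(0.5, 2.0, 2.0, 1.0 + 1.0)   # 4.0 TB
raid5_total = raw_storage_tb(0.5, 2.0, 1.2, 1.0 + 1.0)   # about 3.2 TB
print(f"RAID 1: {raid1_total:.1f} TB, RAID 5: {raid5_total:.1f} TB")
```

Keeping each step as a separate term makes it easy to swap in a different RAID layout or backup retention policy without reworking the rest of the estimate.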

Storage Capacity Planning

Estimate the aggregate throughput and IOPS (e.g., 2 GB/sec or 300,000 IOPS)

Calculate the total bandwidth requirement per node (e.g., 2 GB/sec across 16 nodes = 128 MB/sec per node, or 300,000 / 16 = 18,750 IOPS per node)

Choose the appropriate storage class and build the configuration (e.g., 1,200 IOPS per spindle, 16-way striped = 19,200 IOPS per LUN)
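The per-node and stripe-width steps can be sketched as follows. This is a minimal illustration that assumes the load is split evenly across nodes and that a striped LUN's IOPS scale linearly with the number of spindles. The function names are hypothetical, and 2 GB/sec is taken as 2048 MB/sec to match the 128 MB/sec per node above.

```python
import math


def per_node(total: float, nodes: int) -> float:
    """Evenly split an aggregate requirement (MB/sec or IOPS) across cluster nodes."""
    return total / nodes


def spindles_for(iops_target: float, iops_per_spindle: float) -> int:
    """Smallest stripe width whose combined IOPS meets the target."""
    return math.ceil(iops_target / iops_per_spindle)


mb_per_node = per_node(2048, 16)                    # 128.0 MB/sec per node
iops_per_node = per_node(300_000, 16)               # 18750.0 IOPS per node
stripe_width = spindles_for(iops_per_node, 1_200)   # 16 spindles
lun_iops = stripe_width * 1_200                     # 19200 IOPS per LUN
```

Rounding up the spindle count is what produces the 16-way stripe in the example: 18,750 / 1,200 = 15.6, so 15 spindles would fall short.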


Interconnect Network

Interconnect Capacity Planning

RAC interconnect usage:
- Oracle Clusterware
  - Very small messages exchanged periodically
  - Sensitive to response time and load, but not a big bandwidth consumer
- Oracle RAC Database
  - Primary user of interconnect capacity
  - Exchanges both small and large messages between nodes
  - Key driver in deciding the network configuration

RAC Messages
- Small 256-byte messages: used by GES and GCS
- Cache Fusion block messages: sized by db_block_size
- Parallel Query messages: sized by parallel_execution_message_size (default 8K)

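Interconnect Bandwidth

Messages received per second (M):
$$M = \#\text{GES messages received} + \#\text{GCS messages received}$$

Blocks received per second (B), in MTU-sized packets:
$$B = \frac{\text{db\_block\_size} \times (\#\text{CR blocks received} + \#\text{current blocks received})}{\text{MTU size}}$$

PQ messages received per second (P), in MTU-sized packets:
$$P = \frac{\text{parallel\_execution\_message\_size} \times \#\text{PX remote messages received}}{\text{MTU size}}$$

Total bandwidth required per second, as a fraction of the maximum network transmit capacity:
$$\text{Total bandwidth required} = \frac{M + B + P}{85{,}000}$$

Here 85,000 approximates the number of packets per second a single Gigabit Ethernet link can carry at a 1500-byte MTU (10^9 bits/sec divided by 1500 × 8 bits per packet is roughly 83,000 packets/sec).

A similar equation applies to the send side.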

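Example from AWR Report

Values from the AWR report:
- GCS/GES messages received: 2,534
- Global Cache blocks received: 811
- PX remote messages recv'd: 65

Configuration: db_block_size = 8192, parallel_execution_message_size = 8192, MTU size = 1500, one Gigabit Ethernet interface for the interconnect.

$$\text{Total bandwidth required} = \frac{M + B + P}{85{,}000} = \frac{2534 + \frac{811 \times 8192}{1500} + \frac{65 \times 8192}{1500}}{85{,}000} = \frac{2534 + 4429 + 355}{85{,}000} \approx 8.6\%$$

That is, about 8.6% capacity utilization.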
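The (M + B + P) / 85,000 model can be written as a small function. This sketch is a direct transcription of the formulas above. The function and parameter names are hypothetical, `blocks_per_sec` is the sum of CR and current blocks received, and the defaults mirror the example configuration (8 KB blocks and PX messages, 1500-byte MTU, one Gigabit link).

```python
def interconnect_utilization(msgs_per_sec: float, blocks_per_sec: float,
                             px_msgs_per_sec: float, db_block_size: int = 8192,
                             px_message_size: int = 8192, mtu: int = 1500,
                             max_packets_per_sec: float = 85_000) -> float:
    """Fraction of interconnect capacity used, per the (M + B + P) / 85,000 model."""
    m = msgs_per_sec                              # small GES/GCS messages, one packet each
    b = db_block_size * blocks_per_sec / mtu      # Cache Fusion blocks, in MTU-sized packets
    p = px_message_size * px_msgs_per_sec / mtu   # parallel query messages, in packets
    return (m + b + p) / max_packets_per_sec


# Receive-side figures used in the worked example above.
util = interconnect_utilization(msgs_per_sec=2534, blocks_per_sec=811, px_msgs_per_sec=65)
print(f"{util:.1%}")  # 8.6%
```

To size the send side, call the same function with the corresponding "sent" statistics from the AWR report.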

Interconnect Bandwidth
The available interconnect bandwidth in an IP-based network depends on the number of network packets transmitted, not just the bytes. Comparing against the theoretical bandwidth using total bytes transmitted is therefore not accurate.

[Chart: Available Network Bandwidth, throughput in MB/sec (axis 0 to 120) versus message size in bytes (256, 512, 1024, 2048, 8192)]

RAC Interconnect
Experience shows that a single Gigabit Ethernet interface is adequate for most applications. For planning purposes, 70% utilization is a reasonable point at which to add interfaces.
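The 70% planning threshold can be turned into an interface count. This is a minimal sketch with a hypothetical function name. It assumes traffic spreads evenly across bonded links on both the send and receive sides, which in practice depends on the bonding and switch configuration.

```python
import math


def interfaces_needed(single_link_utilization: float, threshold: float = 0.70) -> int:
    """Links required so that each one stays at or below the planning threshold.

    single_link_utilization is the load expressed as a fraction of ONE link,
    e.g. the output of the (M + B + P) / 85,000 calculation.
    """
    return max(1, math.ceil(single_link_utilization / threshold))


print(interfaces_needed(0.086))  # 1: a single Gigabit link is ample
print(interfaces_needed(0.90))   # 2
```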


Server Capacity

Server Capacity Planning

To size the server optimally:
- Consider the total number of concurrent processes
- Estimate the CPU utilization of critical queries (Grid Control or SQL Trace should provide this data)
- Plan for a maximum run-queue length of 2 * number of CPUs
- During high-utilization periods, never exceed 70% overall CPU on the box

Factor in the percentage of capacity each server adds:
- This helps you attain your high availability goals
- In planned outage situations, it helps determine whether the surviving nodes can support the workload
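One way to check whether the surviving nodes can absorb the workload is sketched below. It rests on several stated assumptions: capacity is proportional to CPU count, the whole workload moves to the survivors unchanged, and the largest nodes are the ones that fail (the worst case). The node sizes and function names are hypothetical.

```python
def utilization_after_outage(node_cpus: list[int], avg_utilization: float,
                             failed_nodes: int = 1) -> float:
    """CPU utilization of the surviving nodes if the largest nodes go down."""
    busy_cpus = avg_utilization * sum(node_cpus)                     # workload in CPU units
    survivors = sorted(node_cpus)[: len(node_cpus) - failed_nodes]   # drop the largest
    if not survivors:
        raise ValueError("no surviving nodes")
    return busy_cpus / sum(survivors)


def max_run_queue(cpus: int) -> int:
    """Planning ceiling for run-queue length: 2 * number of CPUs."""
    return 2 * cpus


nodes = [4, 4, 4, 4]  # hypothetical: four 4-CPU servers
for load in (0.50, 0.60):
    after = utilization_after_outage(nodes, load)
    print(f"{load:.0%} -> {after:.0%}", "OK" if after <= 0.70 else "over 70%")
# 50% -> 67% OK
# 60% -> 80% over 70%
```

In this example, a four-node cluster running at 60% would breach the 70% ceiling after losing a single node.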

Server Capacity Planning

- Ensure an optimal number of HBAs is available
  - To get the desired I/O response time and bandwidth
  - Plan for 50-70% capacity utilization
- Ensure an optimal number of NICs is available
  - For both public and cluster interconnects
  - And for NAS storage, if used
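The 50-70% utilization target translates into a port count for HBAs (or NICs). In this minimal sketch, the function name is hypothetical, and the per-port throughput figures in the examples are placeholders, not vendor ratings; substitute measured numbers, for instance from ORION for HBAs or IPERF for NICs. The `min_ports=2` default is an assumption that keeps a redundant path even when one port could carry the load.

```python
import math


def ports_needed(required_mb_per_sec: float, port_mb_per_sec: float,
                 target_utilization: float = 0.70, min_ports: int = 2) -> int:
    """Ports needed so that each one stays at or below the target utilization."""
    usable_per_port = port_mb_per_sec * target_utilization
    return max(min_ports, math.ceil(required_mb_per_sec / usable_per_port))


# Hypothetical 400 MB/sec ports.
print(ports_needed(128, 400))                           # 2 (redundancy floor)
print(ports_needed(1000, 400))                          # 4 at 70% target
print(ports_needed(1000, 400, target_utilization=0.5))  # 5 at 50% target
```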


Infrastructure Design

Scalable Infrastructure Design

Scalable infrastructure design is a very critical aspect of any new capacity planning exercise. Its critical elements are:
- Networked storage
- Interconnect network
- Optimally sized servers
- Software and application service

Infrastructure Design
- 2 SAN switches
- Low-end SAN storage
- 2 ports from each Storage Processor (SP) connected to each SAN switch
- Equal-size RAID 5 LUNs distributed among all SPs
- If a Storage Processor in the array fails, its LUNs fail over

[Diagram: storage farm (Storage 01 to Storage NN) attached to SAN Fabric 1 and SAN Fabric 2]

Infrastructure Design
- 2-CPU and 4-CPU servers
- Each server has a 2-port HBA
- LUNs are load-balanced across both ports
- Protects against failure of an SP, an array port, a single HBA, or a single SAN switch

[Diagram: server farm (a001 to aNNN, b001 to bNNN) connected through SAN Fabric 1 and SAN Fabric 2 to the storage farm (Storage 01 to Storage NN)]

Server and storage farms are horizontally scalable (scaling out).

Infrastructure Design
[Diagram: server farm (a001 to aNNN, b001 to bNNN) attached to separate IP networks (WAN, Public/App-DB, Private Interconnect, NAS/iSCSI to NAS NN, and Management LAN) and, through SAN Fabric 1 and SAN Fabric 2, to the storage farm (Storage 01 to Storage NN)]

Infrastructure Design
- Use separate switches for the public, private, NAS (if used), and management networks
- Use redundant networks for public, private, and NAS
- For most configurations, active/failover should be sufficient
- Where load balancing is used, ensure the correct network redundancy option is chosen to provide both send- and receive-side load balancing: IEEE 802.3ad aggregates the ports on the switch and bonds the interfaces in the host

Storage Network
- Implement zoning/masking using a simple scheme in which all LUNs are visible across all nodes, if the cluster infrastructure is used by multiple databases
- Create equal-sized LUNs that meet the planned I/O characteristics
- Ensure each LUN can support the combined throughput of all concurrent RAC node access
- Avoid inter-switch links (ISLs) in the SAN switch design by sizing the SAN switch appropriately
- In an ASM diskgroup, add disks with similar storage characteristics and capacity

Interconnect Network
- Ensure a proper VLAN for the cluster interconnect network
- Avoid cascading switches
- If NIC bonding is used, ensure the switch ports are configured to provide both send- and receive-side load balancing
- Ensure the NICs teamed in the host are from the same vendor

Server Design
- Ensure similar-sized servers are clustered together
- Ensure remote administration has been set up correctly
- Use automated procedures to check the consistency of OS, firmware, and application software versions and revision levels

Tools:
- Cluster Verification Tool: verifies infrastructure, Clusterware, and RAC configurations
- ORION: measures available I/O bandwidth and response time
- IPERF: measures and reports network performance


Software Considerations

Cluster Software Design

If multiple databases use a common cluster infrastructure:
- Ensure similar-sized nodes are clustered together
- Install a single, separate CLUSTER_HOME
- Install a single, separate ASM_HOME
- Install or add DB_HOMEs as required


Adding Capacity

When to Add More Capacity

These guidelines assume that:
- All configuration best practices are followed
- All necessary SQL and database tuning has been performed

Key thresholds to monitor for disk I/O (average wait times):
- db file sequential read > 25 msec
- db file scattered read > 30 msec
- log file parallel write > 3 msec

When a threshold is exceeded, determine the source of the bottleneck: host, HBA, SAN switch, or storage array.
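The disk I/O thresholds above can be checked mechanically once average wait times have been extracted, for example from an AWR report. This is a minimal sketch. The function name is hypothetical, and it assumes that the averages have already been collected into a dictionary keyed by the Oracle wait event names.

```python
# Average wait-time thresholds (ms) from the disk I/O guideline.
IO_THRESHOLDS_MS = {
    "db file sequential read": 25.0,
    "db file scattered read": 30.0,
    "log file parallel write": 3.0,
}


def io_events_over_threshold(avg_wait_ms: dict[str, float]) -> list[str]:
    """Return the monitored wait events whose average wait exceeds its threshold."""
    return sorted(
        event for event, ms in avg_wait_ms.items()
        if event in IO_THRESHOLDS_MS and ms > IO_THRESHOLDS_MS[event]
    )


sample = {  # hypothetical averages
    "db file sequential read": 31.2,
    "db file scattered read": 12.0,
    "log file parallel write": 2.1,
}
print(io_events_over_threshold(sample))  # ['db file sequential read']
```

A non-empty result only signals that something is wrong. You still need to locate the bottleneck (host, HBA, SAN switch, or array) before adding capacity.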

When to Add More Capacity


Thresholds to monitor for the interconnect network assume the following prerequisites:
- Host CPUs on the RAC instance nodes are not maxed out
- Correct network configuration and best practices are followed
- log file parallel write is not > 3 msec

Under these conditions, consider adding capacity if Cache Fusion message latencies exceed the following limits:

| AWR Report Latency Name | Lower Bound | Typical | Upper Bound |
|---|---|---|---|
| Average time to process cr block request | 0.1 | 1 | 10 |
| Avg global cache cr block receive time (ms) | 0.3 | 4 | 12 |
| Average time to process current block request | 0.1 | 3 | 23 |
| Avg global cache current block receive time (ms) | 0.3 | 8 | 30 |

[Figure: AWR Report RAC Statistics]
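The prerequisites and the upper bounds from the latency table can be combined into a single check. This is a minimal sketch. The dictionary keys are hypothetical shorthand for the AWR latency names (mapped in the comments), and the "correct network configuration" prerequisite is assumed to have been verified separately.

```python
from typing import Optional

# Upper bounds (ms) from the AWR latency table.
UPPER_BOUND_MS = {
    "cr_block_process": 10.0,        # Average time to process cr block request
    "cr_block_receive": 12.0,        # Avg global cache cr block receive time
    "current_block_process": 23.0,   # Average time to process current block request
    "current_block_receive": 30.0,   # Avg global cache current block receive time
}


def interconnect_latency_breaches(latency_ms: dict[str, float], cpu_maxed: bool,
                                  log_file_parallel_write_ms: float) -> Optional[list[str]]:
    """Latencies above their upper bound, or None if the prerequisites are not met.

    If a host CPU is maxed out or log file parallel write exceeds 3 ms, high
    Cache Fusion latency is not by itself a signal to add interconnect capacity.
    """
    if cpu_maxed or log_file_parallel_write_ms > 3.0:
        return None
    return sorted(name for name, ms in latency_ms.items()
                  if name in UPPER_BOUND_MS and ms > UPPER_BOUND_MS[name])


observed = {"cr_block_receive": 14.5, "current_block_receive": 9.0}  # hypothetical
print(interconnect_latency_breaches(observed, cpu_maxed=False, log_file_parallel_write_ms=1.8))
# ['cr_block_receive']
print(interconnect_latency_breaches(observed, cpu_maxed=True, log_file_parallel_write_ms=1.8))
# None
```

Returning `None` rather than an empty list keeps "prerequisites not met, look elsewhere" distinct from "latencies are within bounds".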

When to Add Capacity


Server
- Overall CPU utilization constantly exceeds 70%
- Run-queue length is > 2 * CPUs for long periods of time
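"Constantly" and "for long periods" need a concrete definition before they can be monitored. The following sketch assumes, hypothetically, that a breach means either condition holding for six consecutive samples taken at a fixed interval (for example, six 10-minute snapshots). Tune `min_consecutive` to your own sampling interval. The function name is also hypothetical.

```python
def sustained_server_breach(samples: list[tuple[float, float]], cpus: int,
                            cpu_limit: float = 0.70, min_consecutive: int = 6) -> bool:
    """True if CPU > cpu_limit or run queue > 2 * cpus for min_consecutive samples in a row.

    samples: (cpu_utilization in 0..1, run_queue_length) taken at a fixed interval.
    """
    streak = 0
    for cpu_util, run_queue in samples:
        if cpu_util > cpu_limit or run_queue > 2 * cpus:
            streak += 1
            if streak >= min_consecutive:
                return True
        else:
            streak = 0  # a healthy sample resets the streak
    return False


busy = [(0.82, 12)] * 6                 # hypothetical 8-CPU host, CPU above 70%
blip = [(0.90, 20)] * 5 + [(0.50, 4)]   # short spike, then back to normal
print(sustained_server_breach(busy, cpus=8))  # True
print(sustained_server_breach(blip, cpus=8))  # False
```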


Real World Example

Mercado Libre
- eBay in Latin America: runs the marketplace from search to bid
- In 2004, moved from a mid-range SMP to a 4*4-node Itanium2 Linux RAC cluster
  - 16 GB RAM in each node
  - NFS filer storage
  - Initially estimated at 400,000 TP/hour, good for 2 years

Mercado Libre
Scaled incrementally as the marketplace grew

[Chart: Business Volume (axis 0 to 1,600,000) and number of Nodes, 2004 to 2006]

Mercado Libre
Performance characteristics of MercadoLibre's 13-node Linux Itanium cluster:
- 460 GB RAM clusterwide
- 286 GB SGA
- 14,500 URLs/second
- 47 GB redo/day
- Uses at most 40% of the capacity of a single Gigabit Ethernet interconnect

Summary
- Plan initial sizing with a good estimate
- Design a scalable infrastructure
- Grow capacity with business volume
- Resource utilization is the key driver

For More Information

http://search.oracle.com (search for: REAL APPLICATION CLUSTERS)
or otn.oracle.com/rac
