
Beyond 2 Node RAC
Workload Management in the Grid

Erik Peterson
Real Application Clusters Development
Agenda

• A Gradual Approach to the Grid
• An Architectural Mindset Change
• Automatic Workload Management
• More Customer Examples
• Operates over 3,700 retail stores
• Presence in over 910 cities across 18 countries/regions
• 13 retail brands
• Manufactures a full range of leading beverages: bottled water, fruit juices, soft drinks and tea products
• Employs over 64,000 staff
• 2003 turnover HK$60bn


Phase 1 – EBIS Single Node DB and Application server. (DONE)
RAC being designed.

[Diagram: Users → Application server Node → single Database Node]

Phase 2 – EBIS Dual Node DB and Application server. (DONE)
RAC being released.

[Diagram: Users → Application server Node → Database Nodes with Veritas VCS in failover mode; the failover node is usually idle]

Phase 3 – EBIS Dual Node RAC DB and Application server. (DONE)
RAC being adopted.

[Diagram: Users → Application server Node → Database Nodes with Veritas DBAC cluster and RAC; load is balanced between both nodes]

Phase 4 – EBIS + PNS Dual Node RAC DB and Application server.
RAC being proved. 10G being released.

[Diagram: EBIS and Park n Shop (Retek RMS) Application server Nodes → Database Nodes with Veritas DBAC cluster and RAC; load is balanced between both nodes]

Phase 5 – EBIS + PNS Four Node RAC DB and Application server + PNS D/W.
RAC being adopted. 10G being proved.

[Diagram: EBIS and Park n Shop (Retek RMS) Application server Nodes → four-node RAC database cluster, plus the Park n Shop + Fortress RDW data warehouse]
Business benefit is that we can use spare capacity on EBIS for PnS.

Phase 6 – Multi-Node RAC DB and Application servers + RAC Development.

[Diagram: Application server Nodes → Production Cluster running EBIS, PnS Retek RMS, PnS RDW and Fortress RDW, alongside a separate Development cluster]

Phase 7 – Full development and Production Cluster.

[Diagram: User Population → load balancers → Intel/Linux based blade servers for applications and web front end → Production Cluster and Development Cluster]
Nodes can be moved from Development to Production to cope with seasonal demands.

An Architectural Mindset Change
Utilization of the dedicated servers by workload:

                      Email   Payroll   OE     DW
Monday Morning        100%    20%       50%    40%
Payroll Run (night)   10%     100%      10%    80%
Xmas Orders           40%     20%       100%   40%

Xmas Orders – Desire
[Diagram: desired allocation of capacity across Payroll, OE, Email and DW]

Xmas Orders: Before Failure
[Diagram: DW, OE, Payroll and Email workloads before a failure]

Xmas Orders: After Failure
[Diagram: DW, OE, Payroll and Email workloads after a failure]

Paradigm Shift

[Chart: Traditional computing (60~70%) versus Grid Computing]

Requirements

• Location Transparency
• Workload Balancing
• Resource Management
• Manageable as a Single Environment
Automatic Workload Management

• Application workloads can be defined as Services (a creation sketch follows this list)
  – Individually managed and controlled
  – Assigned to instances during normal startup
  – On instance failure, automatic re-assignment
  – Service performance individually tracked
  – Finer grained control with Resource Manager
  – Integrated with other Oracle tools / facilities
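
As an illustration (not from the original slides), a service can be declared and started from PL/SQL with the DBMS_SERVICE package; in a RAC environment the service and its preferred/available instances would more typically be set up with srvctl or Enterprise Manager. The service name ERP and network name ERP.example.com are hypothetical.

  -- Minimal sketch: declare and start a service from PL/SQL.
  -- Names below (ERP, ERP.example.com) are illustrative only.
  BEGIN
    DBMS_SERVICE.CREATE_SERVICE(
      service_name => 'ERP',
      network_name => 'ERP.example.com');
    DBMS_SERVICE.START_SERVICE(service_name => 'ERP');
  END;
  /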
Services are a unit for performance

• A new dimension for performance tuning
  – workloads are visible and measurable
  – tuning by "service and SQL" replaces "session and SQL" in most systems where sessions are shared
  – performance measures for real transactions
• Alerts and actions when performance goals are violated
Automatic Workload Management
Provides Visibility
Useful Service Views

• Service status in V$ACTIVE_SERVICES, DBA_SERVICES, V$SESSION, V$ACTIVE_SESSION_HISTORY (sample queries below)
• Service performance in V$SERVICE_STATS, V$SERVICE_EVENTS, V$SERVICE_WAIT_CLASSES, V$SERVICEMETRIC, V$SERVICEMETRIC_HISTORY
• Service, MODULE, and ACTION performance in V$SERV_MOD_ACT_STATS
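
As a sketch of how the status views get used (queries assumed, not from the slides):

  -- Which services are active on this instance?
  SELECT name, network_name
  FROM   v$active_services;

  -- How many sessions is each service carrying right now?
  SELECT service_name, COUNT(*) AS sessions
  FROM   v$session
  GROUP  BY service_name;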
Example Service Metrics

Service time – current

NAME                        ELA(s)/CALL  CPU(s)/CALL
--------------------------- ------------ ------------
ERP                               0.1940       0.0082

Service time – history, every 60 seconds

NAME                        ELA(s)/CALL  CPU(s)/CALL
--------------------------- ------------ ------------
ERP                               0.1940       0.0082
                                  0.2046       0.0085
                                  0.2154       0.0093
                                  0.2248       0.0105
                                  0.2160       0.0097
                                  0.2185       0.0104
                                  0.2211       0.0104
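
Output like the above comes from the metric views listed earlier; a query along these lines would produce it (column names are assumed from the 10g reference, and the per-call times are not reported directly in seconds, so check the documented units before converting):

  -- Current per-call service time for one service.
  SELECT service_name, elapsedpercall, cpupercall
  FROM   v$servicemetric
  WHERE  service_name = 'ERP';

  -- 60-second history of the same metric.
  SELECT begin_time, elapsedpercall, cpupercall
  FROM   v$servicemetric_history
  WHERE  service_name = 'ERP'
  ORDER  BY begin_time;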
Service Thresholds and Alerts

• DBMS_SERVER_ALERT.SET_THRESHOLD (see the sketch below)
  – SERVICE_ELAPSED_TIME
  – SERVICE_CPU_TIME
  – Warning and critical levels for observed periods
  – Import from EM baselines
• Comparison of response time against accepted minimum levels
  – a desire for the wall clock time to be, at most, a certain value
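
A minimal sketch of setting such a threshold from PL/SQL. The service name and threshold values are illustrative, the elapsed-time metric is assumed to map to the ELAPSED_TIME_PER_CALL constant, and the exact units for each metric should be checked in the documentation:

  BEGIN
    DBMS_SERVER_ALERT.SET_THRESHOLD(
      metrics_id              => DBMS_SERVER_ALERT.ELAPSED_TIME_PER_CALL,
      warning_operator        => DBMS_SERVER_ALERT.OPERATOR_GE,
      warning_value           => '500000',    -- illustrative value
      critical_operator       => DBMS_SERVER_ALERT.OPERATOR_GE,
      critical_value          => '1000000',   -- illustrative value
      observation_period      => 15,          -- minutes observed
      consecutive_occurrences => 3,
      instance_name           => NULL,
      object_type             => DBMS_SERVER_ALERT.OBJECT_TYPE_SERVICE,
      object_name             => 'ERP');
  END;
  /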
30 service, module, action statistics

• user calls
• DB time – response time
• DB CPU – CPU/service
• parse count (total)
• parse time elapsed
• parse time cpu
• execute count
• sql execute elapsed time
• sql execute cpu time
• opened cursors cumulative
• session logical reads
• physical reads
• physical writes
• redo size
• user commits
• user rollbacks
• workarea executions – optimal
• workarea executions – onepass
• workarea executions – multipass
• session cursor cache hits
• db block changes
• gc cr blocks received
• gc cr block receive time
• gc current blocks received
• gc current block receive time
• cluster wait time
• concurrency wait time
• application wait time
• user I/O wait time
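
These statistics are collected per service automatically; to break them down further by module and action, aggregation has to be switched on, roughly as sketched here (the service and module names are made up):

  -- Enable statistics aggregation for one module (and all its actions)
  -- under the ERP service.
  BEGIN
    DBMS_MONITOR.SERV_MOD_ACT_STAT_ENABLE(
      service_name => 'ERP',
      module_name  => 'ORDER_ENTRY');
  END;
  /

  -- Then read the rolled-up figures.
  SELECT aggregation_type, service_name, module, action, stat_name, value
  FROM   v$serv_mod_act_stats
  WHERE  service_name = 'ERP';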
Automatic Workload Management
Provides Visibility

[Chart: per-service activity for services A, B, C and D plotted across the day, hours 9–18]
AWR Automatically Measures Service

• AWR measures response time and resources used
  – automatically, for work done in every service
• AWR monitors thresholds, sends AWR alerts
  – response time, CPU used
  – maintains runtime history every 30 minutes
• Statistics collection and tracing are HA
  – persistent across service relocation / instance restart
  – enabled and disabled globally for RAC (tracing sketch below)
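
For the tracing half of that, a sketch of turning SQL trace on for a whole service (names are again made up); because it is keyed by service rather than session, it follows the workload wherever the service runs in the cluster:

  -- Trace every session running under the ERP service / ORDER_ENTRY
  -- module, on all RAC instances.
  BEGIN
    DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(
      service_name => 'ERP',
      module_name  => 'ORDER_ENTRY',
      waits        => TRUE,
      binds        => FALSE);
  END;
  /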
Service High Availability using RAC 10g

• Focus is on protecting the application services
  – more flexible and more cost effective than HA approaches that focus on availability of single physical systems
• Services are available continuously, with load shared across one or more instances
• Any server in the RAC cluster can offer services
  – in response to failures
  – in response to planned maintenance
  – in response to runtime demands
Automatic Workload Management
Integration with Other Tools

• Job Scheduler
  – Job classes mapped to services (see the sketch below)
• Parallel Query / DML
  – Query coordinator connects to a service like any other client
  – Parallel slaves inherit the service from the coordinator
• Oracle Streams Advanced Queuing
  – Queues accessed by service
  – Achieves location transparency
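
A sketch of the Job Scheduler mapping (the class and service names are hypothetical): tying a job class to a service makes every job in that class run wherever the service is currently offered.

  BEGIN
    DBMS_SCHEDULER.CREATE_JOB_CLASS(
      job_class_name => 'NIGHTLY_BATCH',
      service        => 'BATCH',
      comments       => 'Batch jobs follow the BATCH service around the cluster');
  END;
  /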
Failure Notification (FaN) with Oracle Application Server 10g

• Fast, coordinated recovery without human intervention
  – Oracle Database 10g RAC signals JDBC 10g Fast Connection Failover when instance failures occur
  – Immediate recovery for RAC mid-tiers
• < 4 seconds, down from 15 minutes
• Self correcting

[Diagram: Oracle Clusterware notifies the App Server 10g mid-tier of instance failures]
Failure Notification (FaN)
JDBC Fast Connection Failover Processing

• Supports multiple connection caches
• Datasource for each cache mapped to a service
• Keeps track of service and instance for each connection
• Distributes new work requests across available instances

[Diagram: JDBC / mid-tier connection caches for SERVICE 1, 2 and 3 mapped to database-tier instances X, Y and Z]
Initial State

[Diagram: connection pool with connections C1–C6 for a service spread across Instance 1 and Instance 2]
Instance Join

[Diagram: Instance 3 joins the service; the pool still holds connections C1–C6 against Instance 1 and Instance 2]
Instance Join – Undesired Cases
Option 1: Nothing Happens

[Diagram: the pool keeps connections C1–C6 on Instance 1 and Instance 2; the new Instance 3 receives no connections]
Instance Join – Undesired Cases
Option 2: Add New Connections Randomly

[Diagram: new connections C7–C9 are added and spread at random across Instance 1, Instance 2 and Instance 3]
Instance Join – Desired Result

[Diagram: new connections C7–C9 are created against Instance 3, so the pool is balanced across Instance 1, Instance 2 and Instance 3]
Rebalancing upon Instance Join

• Desire: create new connections to new instances and potentially disable some old ones
• Method:
  – 10gR1: Fast Application Notification (FAN), using the HA up and down events for services in RAC
  – 9i: DBMS session disconnect (manual steps – sketch below)
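
A rough sketch of the manual route (illustrative only): in 10g the whole service can be nudged with DBMS_SERVICE, while in 9i individual sessions were typically disconnected by hand so the pool would reconnect and land on the new instance.

  -- 10g: ask all sessions of a service to disconnect, so the mid-tier
  -- pool re-creates them across the currently available instances.
  BEGIN
    DBMS_SERVICE.DISCONNECT_SESSION(service_name => 'ERP');
  END;
  /

  -- 9i-style manual step: pick sessions and disconnect them individually.
  -- The sid and serial# below are placeholders.
  ALTER SYSTEM DISCONNECT SESSION '123,4567' POST_TRANSACTION;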
Node Leaves

[Diagram: a node leaves the cluster; the pool's connections C1–C9 end up spread across the remaining Instance 1 and Instance 2]
10gR2 Connection Pool Load Balancing

[Diagram: each work request arriving at the connection pool is routed to a connection on the instance currently rated Good, rather than OK or Bad, for the service]
Customer Examples
Chicago Stock Exchange
New Distribution Possible

[Diagram: four-node cluster; services Critical 1–4 mapped one per node, with Medium 1, Medium 2, Low and Batch distributed across the nodes]
Chicago Stock Exchange
Resource Management Plan – Daytime

Level  Allocations
P1     Critical 1: 50%   Critical 2: 25%   Remainder: 25%
P2     Medium 1: 30%     Medium 2: 40%     Remainder: 30%
P3     Low: 80%          Batch: 20%

Need flexibility to change plan to meet changing demands (a Resource Manager sketch follows).
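
A rough Resource Manager sketch of the P1 level of such a plan (group names reuse the slide's labels and the percentages above; this is illustrative, not the exchange's actual scripts). Consumer groups are created, CPU percentages are assigned at priority level 1, and each service is mapped to its group so sessions are classified automatically:

  BEGIN
    DBMS_RESOURCE_MANAGER.CREATE_PENDING_AREA();

    DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('CRITICAL1_GRP', 'Critical 1 work');
    DBMS_RESOURCE_MANAGER.CREATE_CONSUMER_GROUP('CRITICAL2_GRP', 'Critical 2 work');

    DBMS_RESOURCE_MANAGER.CREATE_PLAN('DAYTIME', 'Daytime plan');

    -- P1 allocations: 50% / 25% / 25% as on the slide.
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
      plan => 'DAYTIME', group_or_subplan => 'CRITICAL1_GRP',
      comment => 'Critical 1', cpu_p1 => 50);
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
      plan => 'DAYTIME', group_or_subplan => 'CRITICAL2_GRP',
      comment => 'Critical 2', cpu_p1 => 25);
    DBMS_RESOURCE_MANAGER.CREATE_PLAN_DIRECTIVE(
      plan => 'DAYTIME', group_or_subplan => 'OTHER_GROUPS',
      comment => 'Remainder', cpu_p1 => 25);

    -- Classify sessions by the service they connect through
    -- (switch-privilege grants are omitted in this sketch).
    DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
      DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'CRITICAL1', 'CRITICAL1_GRP');
    DBMS_RESOURCE_MANAGER.SET_CONSUMER_GROUP_MAPPING(
      DBMS_RESOURCE_MANAGER.SERVICE_NAME, 'CRITICAL2', 'CRITICAL2_GRP');

    DBMS_RESOURCE_MANAGER.VALIDATE_PENDING_AREA();
    DBMS_RESOURCE_MANAGER.SUBMIT_PENDING_AREA();
  END;
  /

Switching between the daytime and night-time plans is then a matter of changing (or scheduling a change of) the RESOURCE_MANAGER_PLAN parameter, which is where the flexibility noted above comes from.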


Dell
European Operations DW

                    Priority  Node-1   Node-2   Node-3   Node-4
Critical Reports    P1        0-100%   0-100%   0-100%   0-100%
Adhoc Reports       P3        0-100%   0-100%   0-100%   0-100%
Loads/Aggregation   P2        0-100%   0-100%   0-100%   0-100%

Gains: manages priorities, visibility of use, can turn off reporting during aggregation.
Talk America

[Diagram: six-node cluster (Node-1 to Node-6) running the OLTP 1, OLTP 2, OLTP 3, OLTP 4, Reporting, Batch and DW services]
Retailer
Consolidated eCommerce Project

[Diagram: one cluster running eCommerce, Order Management, DW, Real Time Reporting and Backups, with spare capacity]
Large Node Proofpoints

• Amazon.com – 16 node Clickstream DW (17 TB), 8 node ETL DW on Linux
• A US Telecom – 16 node Grid on Solaris
• Telstra – 8 node, 18 TB DW on Linux
• Mercadolibre – 8 node OLTP on Linux/Itanium
• Gas Natural – 8 node DW, 7 node SAP BI system
• AC3 – 63 node POC on Linux
• OpenWorld Japan with Sun & Cisco – 128 node cluster
Closing Remarks

• Use 4+ nodes – greater HA & scalability
• Connect by Service: not SID, not Instance, not Host
• Instrument your code to use Module & Action (see the sketch below)
• Implement Services – whether your DB consists of one application or many
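
A sketch of that instrumentation point (the module and action names are made up): setting these at the start of each business operation is what makes the per-service MODULE/ACTION statistics and tracing shown earlier line up with real transactions.

  -- Tag the work this session is about to do; clear it when finished.
  BEGIN
    DBMS_APPLICATION_INFO.SET_MODULE(
      module_name => 'ORDER_ENTRY',
      action_name => 'CREATE_ORDER');

    -- ... application work runs here ...

    DBMS_APPLICATION_INFO.SET_MODULE(module_name => NULL, action_name => NULL);
  END;
  /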
QUESTIONS
ANSWERS