
Oracle Real Application Clusters (RAC)

Comprehensive Concepts Overview, Insight, Recommendations, Best Practices and a whole lot more

A “BrainSurface” Presentation
www.brainsurface.com
Disclaimer
• The views/content in this document are those of the author and do not necessarily reflect those of Oracle Corporation and/or its affiliates/subsidiaries. The material in this document is for informational purposes only and is published with no guarantee or warranty, express or implied.
About “Tariq Farooq”
• Tariq Farooq – www.brainsurface.com
• Founder of BrainSurface | The After-Oracle ConnectSpace - Next-Generation Social Networking for the Oracle, Java & MySQL Communities
• Oracle ACE – Oracle Technologist for 16+ years
• Oracle Certified Expert – Real Application Clusters
• Oracle Certified Professional – DBA – 8i, 9i, 10g, 11g & 10g Apps DBA
• Oracle Certified Professional – Internet Application Developer 2, 6i, 9i
• Oracle Certified Professional – E-Business Suite 11i
• Author, Speaker, Blogger, Forumizer & Community Organizer
What is Clustering? – Synopsis & Overview
• A Cluster is a collection of networked computers working together to form a single logical machine.
• Cluster nodes are typically connected to each other through a fast network connection on a local area network within a small geographical site, such as a building or a group of buildings. This is the norm; however, there are exceptions, such as Extended Distance Clusters that span WANs.
• Having redundant nodes within a cluster provides continuous availability of computing services, as well as more processing power, since the computing load is balanced among several nodes instead of resting on a single expensive, powerful machine.
• Clustering is also known as / synonymous with Massively Parallel Processing (MPP).
What is Oracle RAC? – Synopsis & Overview
• Real Application Clusters (RAC) is the Oracle trade name for its database server cluster product, which provides load-balancing, scalability and automatic failover protection in case of server failure (the key words here being server failure, not site failure; protection against site failure is provided by Oracle's DataGuard product).
• Essentially, Oracle RAC is a parallelization product that provides Load-Balancing, Scalability, Elasticity and High-Availability by keeping the Oracle database server product available and running across a set of multiple server nodes accessing a common database on Shared Storage.
• Real Application Clusters (RAC) was released in 2001 with Oracle 9i.
• Oracle RAC employs the Shared-Everything approach.
• The predecessor of Oracle RAC was Oracle Parallel Server (OPS). Oracle was the first vendor to run a Parallel Server product at the database level, with Oracle Database Server 6.2 for the DEC VAX platform.
• Oracle RAC allows multiple servers to run Oracle RDBMS Server Instances that concurrently read/write to a single “Clustered” database on Shared Storage.
• Oracle 9i RAC required 3rd-Party Clusterware, e.g. Veritas SFRAC, Sun Cluster, HP TruCluster, IBM HACMP etc., on all major Unix OS brands; on Linux and Windows, Oracle provided its own cluster manager.
• Oracle Cluster Ready Services (CRS), introduced in 10g, was renamed Oracle Clusterware in 10g R2.
• 3rd-Party Clusterware is no longer required OR recommended by Oracle with 10g and up.
• With 11gR2, Grid Infrastructure combines Clusterware and ASM in a single unified Oracle HOME.
• Oracle RAC is at the most complex end of the Oracle RDBMS family spectrum and needs sophisticated management tools such as Oracle Enterprise Manager (OEM) Grid Control.
• Enables Dynamic Provisioning for Grid/Cloud Computing.
• Oracle RAC 10g R2 supports up to 100 nodes.
• Oracle RAC provides:
  - Scalability
  - Load Balancing / Workload Distribution
  - Elasticity
  - Fault Tolerance
• Oracle RAC is used primarily for Load-Balancing.
• Oracle RAC is used secondarily for Fail-Over.


• Each Oracle RAC Instance has its own separate/distinct Redo Log files and Alert/Trace Log files.
• Each Oracle RAC Instance has its own separate/distinct System Global Area (SGA) and set of background processes.
• An instance's Redo Log files can be read by all instances (e.g. for instance recovery), but are written to only by the instance that owns them.
Oracle RAC Architecture - Overview
• A typical Oracle RAC cluster comprises the following components:
  - A single RAC Database (comprising shared Control and Data Files) on Shared Storage, accessed by Multiple Instances.
  - Nodes, typically built from Low-Cost/Commodity Hardware.
  - Multiple Instances running on Multiple Nodes.
  - Every RAC Instance has its own Redo Log files and Undo Segments/Tablespaces.
  - Data, Control and INIT files are shared across Instances.
  - Cache Fusion/Synchronization enables concurrent transaction processing across all Instances using the Private Cluster Interconnect.
Oracle RAC Architecture - Overview

Figure/Diagram from Oracle Documentation


Maximum Availability Architecture - Overview

Figure/Diagram from Oracle Documentation


Oracle RAC with DataGuard Architecture - Overview

Figure/Diagram from Oracle Documentation


Oracle RAC: Vertical Scalability vs. Horizontal Scalability
• Vertical Scalability within a single server has well-known limits that prevent an application/database from growing beyond a certain threshold.
• Oracle RAC is the only viable solution for scaling out the Oracle RDBMS server product horizontally to support a Very Large User Base (VLUB).
SMP vs. RAC (MPP)

Figure/Diagram from Oracle Documentation


Oracle RAC: Application Scalability
• With the advent of Cache Fusion in Oracle RAC 9i (and up):
  - Applications typically scale out-of-the-box with zero/minimal tuning.
  - More nodes can be added/removed in HOT MODE, with zero database downtime, to provide elasticity and scalability.
  - Database Files residing on a Shared Disk Cluster File System provide a uniform, fast and read-consistent image to the end-user.
Quote from Oracle Documentation


• Applications that behave correctly in a Single-Server/Single-Instance Oracle database typically scale just fine on an Oracle RAC database without code changes.
• Cache Fusion is the driving technology behind Oracle RAC that enables Applications to scale out across multiple servers/instances.
Oracle RAC: Instance Failure: Failover/Switchover Times
• Instance Failover Time: within seconds to a few minutes.
• Application Switch-Over Time: within seconds to a few minutes.
• Overall Downtime for the subset of users affected by the downed instance: within seconds to a few minutes.
What is Oracle Clusterware? – Synopsis & Overview
• Oracle Clusterware is the layer between the OS and the Database.
• It manages multiple resources within the Cluster and presents the clustered database as a single logical machine to the end-user.
• Oracle Clusterware provides:
  - Global Resource Management
  - Group Services
  - Node Membership
  - High-Availability Functions
• A unified, cohesive solution with a 3-tiered architecture:
  - Cluster Synchronization Services (CSS)
  - Cluster Ready Services (CRS)
  - Event Manager (EVM)


• Physical Components:
  - Oracle Cluster Registry (OCR)
  - Voting Disk
Oracle Clusterware Architecture – Synopsis & Overview

Figure/Diagram from Oracle Documentation


Oracle Clusterware – Daemons/Processes - Overview
• CRS daemon (CRSD): Cluster resource management.
• Event Manager (EVM): Event publishing.
• Cluster Synchronization Services Daemon (CSSD): Node membership.
• OPROCD: Cluster monitoring.
• Oracle Notification Service (ONS): used by EVM.
Oracle Clusterware Framework – Synopsis & Overview
• Framework: Scripting Interface.
• Can be used to restart and relocate an application in case of node failure.
• C-based API: used for managing resources.
• Action Script/Agent: starts, checks and stops an application.
Cache Fusion – Synopsis & Overview
• Cache Fusion, a mechanism within Oracle RAC, employs a Shared Cache Architecture that fuses the in-memory data buffer caches of all nodes into a single logical, read-consistent buffer cache available to all instances.
• Cache Fusion is very fast because disk writes are eliminated when other instances request blocks for updates: the block is shipped directly from one instance's cache to another instead of being written to disk and re-read.
• DB Blocks are synchronized, NOT mirrored = faster performance.
• DB Blocks are transferred in memory from instance cache to instance cache over the Cluster Interconnect when requested, after the proper global locks have been obtained.
• Some useful Dynamic Performance Views for monitoring Cache Fusion:
  - gv$file_cache_transfer
  - gv$temp_cache_transfer
  - gv$cache_transfer
  - gv$class_cache_transfer
• Global Cache Service (GCS) is used for FAST instance-to-instance block buffer transfer and establishes/implements Cache Coherency: a block request never takes more than 3 hops.
• Global Enqueue Service (GES), descended from the Distributed Lock Manager (DLM) of Oracle Parallel Server, is used for global enqueue (non-block) locking.
• Global Resource Directory (GRD) is used for keeping track of Block Buffer Location/Mode/Role information.
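To make the "never more than 3 hops" rule concrete, here is a simplified Python model of the message path for one block request. The instance names are hypothetical, and the model ignores disk reads, lock conversions and message sizes; it only counts hops on the critical path: 2 when the requester or the current holder is also the block's master (as recorded in the GRD), 3 when all three are different instances.

```python
def cache_fusion_hops(requester: str, master: str, holder: str) -> int:
    """Hops on the critical path when `requester` asks for a block that is
    mastered by `master` and currently cached by `holder` (simplified model)."""
    if requester == holder:
        return 0  # block is already in the requester's own buffer cache
    if requester == master or master == holder:
        return 2  # 2-way: one request message, then the block is shipped back
    return 3      # 3-way: requester -> master -> holder -> requester


cases = [("inst1", "inst1", "inst2"),   # requester masters the block
         ("inst1", "inst2", "inst2"),   # master also holds the block
         ("inst1", "inst2", "inst3")]   # three different instances
for requester, master, holder in cases:
    print(f"{requester} asks, {master} masters, {holder} holds:",
          cache_fusion_hops(requester, master, holder), "hops")
```

The 3-hop case is the worst case, which is why the hop count stays bounded regardless of how many instances are in the cluster.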
• Global Cache Service (GCS) Waits = Cross-Instance Block Transfer Waits = a measure of Data Block Transfer Efficiency.
Cache Fusion Architecture - Overview

Figure/Diagram from Oracle Documentation


Oracle Cluster Registry (OCR) – Synopsis & Overview
• OCR is the central repository for storing all information about all Clusterware resources.
• OCR is automatically backed up every 4 hours in the $ORACLE_CRS_HOME/cdata directory.
• Recommendation: OCR should be mirrored for Redundancy/High-Availability (up to two copies).
Voting Disk – Synopsis & Overview
• Synonymous with Quorum Disk.
• Used for preventing Split-Brain scenarios.
• Used for managing Cluster Membership.
• Recommendation: Voting Disks should be mirrored for Redundancy/High-Availability (up to 3 copies).
• Voting Disks can be dynamically added in 11g.
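Split-Brain protection rests on a majority rule: a node may remain a cluster member only while it can access more than half of the voting disks. The Python sketch below (disk counts illustrative) shows why an odd number of voting disks is used: an even count tolerates no more disk failures than the odd count just below it.

```python
def node_survives(reachable_disks: int, total_disks: int) -> bool:
    """A node keeps its membership only if it can reach a strict majority of voting disks."""
    return reachable_disks > total_disks // 2


def tolerated_disk_failures(total_disks: int) -> int:
    """How many voting disks can fail while a node still sees a strict majority."""
    return (total_disks - 1) // 2


for total in (1, 2, 3, 4, 5):
    print(total, "disk(s): tolerates", tolerated_disk_failures(total), "failure(s)")
```

With 3 disks, one may fail; adding a fourth still allows only one failure, because 2 of 4 is not a majority.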
Node Evictions: Overview
• Network Heartbeat (MissCount, in seconds):
  - A node unable to send a network heartbeat for MissCount seconds = Node Eviction.
• Disk Heartbeat (I/O Timeout, in seconds):
  - A disk heartbeat not updated within the I/O Timeout = Node Eviction.
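The two eviction rules can be expressed as a small check. This is a minimal Python sketch of the rules as stated, not how CSS is implemented: the node names, timestamps and the 30 s / 200 s limits are illustrative assumptions, and real Clusterware handles further subtleties not modeled here.

```python
from dataclasses import dataclass

# Illustrative limits only (assumed values, not read from any cluster).
MISSCOUNT_S = 30.0     # network heartbeat limit, seconds
IO_TIMEOUT_S = 200.0   # voting-disk heartbeat limit, seconds


@dataclass
class Heartbeats:
    node: str
    last_network_hb: float  # time (s) of the last network heartbeat seen from the node
    last_disk_hb: float     # time (s) the node last updated its voting-disk heartbeat


def eviction_reason(hb, now, misscount=MISSCOUNT_S, io_timeout=IO_TIMEOUT_S):
    """Apply the two eviction rules; return the reason, or None if the node is healthy."""
    if now - hb.last_network_hb > misscount:
        return "network heartbeat missed for longer than MissCount"
    if now - hb.last_disk_hb > io_timeout:
        return "disk heartbeat not updated within the I/O Timeout"
    return None


now = 1000.0
for hb in (Heartbeats("rac1", 995.0, 990.0),   # both heartbeats recent
           Heartbeats("rac2", 960.0, 999.0),   # no network heartbeat for 40 s
           Heartbeats("rac3", 999.0, 700.0)):  # no disk heartbeat for 300 s
    print(hb.node, eviction_reason(hb, now) or "healthy")
```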
Cluster InterConnect – Synopsis & Overview
• Dedicated “Private” Network for Oracle RAC.
• A dedicated Gigabit (or faster) Ethernet Switch is required for the Private Cluster Interconnect.
• Used for block transfers among instances to enable Cache Fusion.
• The same Interconnect should be used for both Clusterware and Database.
• Recommendation: Test Cluster Interconnect latency/bandwidth with 3rd-Party Tools.
• Recommendation: Don’t use Cascading Switches.
• Recommendation: Gigabit Ethernet Adapters should be teamed/bonded together to provide higher bandwidth/fault tolerance.
• Recommendation: Disable Unicast Storm Control.
• Recommendation: Enable Flow Control.
• Recommendation: Use Jumbo Frames with Gigabit Ethernet.
• Recommendation: Use Full Duplex Mode.
• Recommendation: Tune NIC Ring Buffers.
• Recommendation: Enable Rapid Spanning Tree Protocol (RSTP).
• Recommendation: Turn on PortFast.
• Recommendation: UDP Send/Receive Buffers: use the maximum setting.
• UDP is recommended over TCP because of its lower latency.
• Setting the highest supported bit rate explicitly is recommended over Auto-Negotiate.
• Full Duplex Mode is recommended.
• The MTU size should be identical on all adapters on all nodes.
• The NIC should be on the fastest PCI bus.
• TCP Settings.
• Flow Control Settings.
• Network Interrupts for CPU.
• Socket Receive Buffer.
• Bandwidth required per second = (Messages received per second + Blocks received per second + PQ messages received per second) / Maximum Network transmit capacity, i.e.:

    (M + B + P) / 85000

  where:
  - M (Messages received per second) = No. of GES messages + No. of GCS messages
  - B (Blocks received per second) = DB Block Size × (No. of CR Blocks received + No. of Current Blocks received) / MTU Size
  - P (PQ messages received per second) = (PQ Message Size × No. of PX Remote Messages Received) / MTU Size

Formula from Oracle Documentation
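The formula can be evaluated with a short Python function. The per-second input rates below are hypothetical (in practice they would be derived from database statistics, e.g. deltas between two AWR snapshots), `mtu` defaults to the standard 1500-byte Ethernet MTU as an assumption (use your interconnect's actual value, e.g. a larger one with Jumbo Frames), and 85000 is the capacity constant given in the formula. The result is the fraction of that capacity the estimated traffic would consume.

```python
def interconnect_load(ges_msgs, gcs_msgs, cr_blocks, current_blocks,
                      px_remote_msgs, db_block_size, pq_msg_size,
                      mtu=1500, capacity=85000):
    """Evaluate (M + B + P) / capacity; all counts are per second."""
    m = ges_msgs + gcs_msgs                                  # M: messages received
    b = db_block_size * (cr_blocks + current_blocks) / mtu   # B: blocks received, in MTUs
    p = pq_msg_size * px_remote_msgs / mtu                   # P: PQ messages, in MTUs
    return (m + b + p) / capacity


# Hypothetical per-second rates for one instance.
load = interconnect_load(ges_msgs=2000, gcs_msgs=3000, cr_blocks=1500,
                         current_blocks=500, px_remote_msgs=100,
                         db_block_size=8192, pq_msg_size=2048)
print(f"{load:.1%} of the 85000 capacity figure")
```

Block traffic dominates here: 2000 blocks/sec of 8 KB each contribute roughly twice as much as all GES/GCS messages combined.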


Virtual IP (VIP) – Synopsis & Overview
• Virtual connection over the Public Interface.
• Each node must have its own virtual IP (VIP), which is a unique and unused IP address within the same network subnet.
• The VIP must be a DNS-known address.
• The VIP is stored in the OCR.
• Upon Node failure, EVM generates an event and Oracle Clusterware transfers the VIP address to a surviving node.
• One active VIP per node.
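A simplified Python model of VIP relocation follows. The node and VIP names are hypothetical, and the choice of target (the first surviving node by name) is an assumption for illustration; Oracle Clusterware makes the real placement decision. The point of moving the VIP is that clients trying to reach a failed node's VIP get an immediate error from the surviving node and can move on to the next address, rather than waiting for a TCP/IP timeout.

```python
def place_vips(vip_home, node_up):
    """Return which node hosts each VIP: its home node while that node is up,
    otherwise a surviving node (simplified: the first survivor by name)."""
    survivors = sorted(node for node, up in node_up.items() if up)
    if not survivors:
        raise RuntimeError("no surviving nodes to host the VIPs")
    return {vip: home if node_up.get(home, False) else survivors[0]
            for vip, home in vip_home.items()}


vip_home = {"rac1-vip": "rac1", "rac2-vip": "rac2", "rac3-vip": "rac3"}
placement = place_vips(vip_home, {"rac1": True, "rac2": False, "rac3": True})
print(placement)
```

Each node still has exactly one VIP of its own; the relocated VIP only serves to fail connection attempts fast.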


• Fast Failover: Database clients no longer have to wait for TCP/IP timeouts.
• Application VIP (10gR2 and up): Node-independent connections.
• Recommendation: Use the VIP for database connections, NOT the physical IP address.
Shared Storage – Synopsis & Overview
• Comprises Control Files, Data Files, Redo Log files and Undo Files.
• Shared Storage is required: Oracle RAC supports the following types of Shared Storage:
  - Automatic Storage Management (ASM)
  - Oracle Cluster File System (OCFS)
  - RAW Devices
• Automatic Storage Management (ASM) goes hand-in-hand with Oracle RAC and is highly recommended by Oracle Corp. for RAC deployments.
• Recommendation: Enable Multi-Pathing (Active-Active I/O Paths) between servers and SAN storage.
• Typical thresholds for Disk I/O (average wait time per wait event):
  - log file parallel write > 3 ms
  - db file scattered read > 30 ms
  - db file sequential read > 25 ms
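These thresholds can be applied mechanically to measured average wait times. In the Python sketch below, the measured averages are hypothetical; on a live system they might be computed per event as TIME_WAITED_MICRO / TOTAL_WAITS from V$SYSTEM_EVENT (or GV$SYSTEM_EVENT, per instance) and converted to milliseconds.

```python
# Thresholds from the list above, in milliseconds.
IO_THRESHOLDS_MS = {
    "log file parallel write": 3,
    "db file scattered read": 30,
    "db file sequential read": 25,
}


def flag_slow_io(avg_wait_ms):
    """Return, sorted, the events whose average wait exceeds its threshold."""
    return sorted(event for event, avg in avg_wait_ms.items()
                  if event in IO_THRESHOLDS_MS and avg > IO_THRESHOLDS_MS[event])


# Hypothetical measured averages (ms per wait).
measured = {"log file parallel write": 4.2,
            "db file scattered read": 12.0,
            "db file sequential read": 27.5}
print(flag_slow_io(measured))
```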


Automatic Storage Management (ASM) – Synopsis & Overview
• Recommendation: Use ASM with Oracle RAC.
• Features:
  - Elasticity = Cloud Computing.
  - Cluster File System.
  - Volume Manager.
  - HOT Mode: Add/remove disks online.
  - Load balancing.
  - Striping of data across disks.
  - Mirroring.
  - Eliminates/significantly minimizes I/O performance tuning.
  - Built for Oracle by Oracle.
• An ASM instance going down on one node = surviving cluster nodes keep running.
• Recommendation: Each ASM Diskgroup should contain disks with similar characteristics: avoid uneven architecture within each ASM disk group.
Automatic Storage Management (ASM) – HOT Data Migration

Figure/Diagram from Oracle Documentation


Workload Management – Synopsis & Overview
• Virtualization/Grid Computing/Abstraction.
• Segregation of jobs into various workloads depending on their individual/common characteristics.
• Parallelization: split large jobs into smaller units and execute them in parallel = much better performance.
• Define Services to accomplish Workload Management.
Figure/Diagram from Oracle Documentation


Services – Synopsis & Overview
• Introduced in 1999 with the release of Oracle 8i.
• Workload Management: Services.
• Services decouple the hard-coded mapping between a connection request and a RAC instance.
• Services split up the workload into different classes.
• Enable automatic Fail-over/Recovery of applications.
• Up to 100 services can be created in 10g R2.
• Services are a must for using the Load-Balancing Advisory and Runtime Connection Load Balancing.
• Each Instance can offer multiple Services.
• Each Service can be offered by multiple Instances.
• Each Service comprises:
  - Thresholds & Priorities.
  - Job Classes to be managed.
  - Priority of Execution.
  - Date Range of Job Class.
• Can be created/managed in 10g with the following:
  - Oracle Enterprise Manager (OEM) Grid Control.
  - Database Configuration Assistant (DBCA).
  - SRVCTL Command-Line Utility.
  - DBMS_SERVICE PL/SQL database package in SQL*Plus.
• Automatic Workload Repository (AWR) collects/measures full-blown statistics about Services.
• Integrated with Oracle Database Resource Manager (DRM).
• Services can be relocated across nodes.
• Service Level Goals (3 options):
  - NONE
  - SERVICE_TIME: used for non-uniform completion times, e.g. an eCommerce application.
  - THROUGHPUT: used for uniform completion times, e.g. a Stock-Trading System.
• Distributed Transaction Processing (DTP) is supported in 10g R2 to provide tight coupling with a single instance.
• Connection Load-Balancing Goal:
  - CLB_GOAL_SHORT: used for short-duration connections.
  - CLB_GOAL_LONG: default value; used for longer-duration connections.
• TAF Policy options:
  - NONE
  - BASIC
  - PRECONNECT
• Services: some useful DBA & Dynamic Performance Views:
  - DBA_SERVICES
  - V$SESSION
  - V$SERV_MOD_ACT_STATS
  - V$ACTIVE_SESSION_HISTORY
  - V$ACTIVE_SERVICES
  - V$SERVICE_WAIT_CLASS
  - V$SERVICEMETRIC
  - V$SERVICEMETRIC_HISTORY
  - V$SERVICE_STATS
  - V$SERVICE_EVENT
Oracle Database Resource Manager – Synopsis & Overview
• Features:
  - Kill/prevent runaway queries automatically.
  - Limit session idle time.
  - CPU resource distribution among user classes.
  - Degree of parallelism control.
  - Limit the undo space for user classes.


• Consumer Groups:
  - Grouping of user sessions into groups.
  - Pre-configured rules.
  - Set priority level: Low, High.
  - Attributes: User, Service, Module, Action.
• Resource Manager Plans: Plan Allocation:
  - CPU Allocation.
Fast Application Notification (FAN): Synopsis & Overview
• Fast Application Notification (FAN) events are used to notify applications using RAC of cluster component failures, configuration changes and failure recovery.
• FAN includes functionality for dynamically starting and stopping applications and other related events.
• Enables Fail-over/Recovery of Applications in case of cluster component failure, thereby providing High-Availability & Scalability.
• Typical FAN Events are published for the following:
  - Downed Services.
  - TCP/IP Time-outs.
  - Connection Time-outs.
  - Non-Load Balancing of applications upon restart/scale-out of services.
  - Dead/Crashed/Slow nodes.
  - Cluster Configuration Changes.
• FAN enables the following:
  - Users are directed to Available Instances.
  - Server-side callouts notify Administrative Personnel, log events and create Problem-Tickets.
  - Disruption of service is minimized/mitigated.
• 3 types of FAN events:
  - Load Balancing Events
  - Node Events
  - Service Events
• Applications can subscribe to FAN events using the Oracle Notification Service (ONS) API or the Oracle Call Interface (OCI).
• FAN is integrated with the following Oracle Clients (handled automatically; no code changes required):
  - Oracle Call Interface (OCI)
  - Oracle JDBC
  - Oracle ODP.NET
• Server-side Callouts:
  - Used to run an Event-Handling Shell Script/Compiled Executable placed in the CRS HOME/racg/usrco directory.
  - Executed asynchronously upon the occurrence of a condition, e.g. when FAN posts an event to ONS about a change in state such as the Startup/Stopping of an Instance, Database or Service.
• Server-side Callouts can be used for:
  - Logging of events, e.g. generation of a log file for each event.
  - Paging Administrative Personnel, e.g. via Email, Paging etc.
  - Starting/Stopping of Remote Daemons/Processes.
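As an example of a server-side callout used for event logging, here is a minimal Python sketch. It assumes that Clusterware passes each FAN event to the callout as command-line arguments (an event type followed by name=value pairs); check the exact payload for your release. The log path is hypothetical, and the script would need to be executable and placed in the CRS HOME/racg/usrco directory on the nodes where it should run.

```python
#!/usr/bin/env python3
"""Hypothetical FAN server-side callout that appends each event to a log file."""
import sys
from datetime import datetime, timezone

LOG_FILE = "/tmp/fan_events.log"  # hypothetical location; choose your own


def parse_fan_event(args):
    """Split callout arguments into the event type and its name=value fields."""
    if not args:
        return "UNKNOWN", {}
    fields = {}
    for arg in args[1:]:
        name, sep, value = arg.partition("=")
        if sep:  # ignore anything that is not a name=value pair
            fields[name] = value
    return args[0], fields


def log_event(args, path=LOG_FILE):
    """Append one timestamped line per event; callouts run asynchronously."""
    event_type, fields = parse_fan_event(args)
    stamp = datetime.now(timezone.utc).isoformat()
    details = " ".join(f"{k}={v}" for k, v in sorted(fields.items()))
    with open(path, "a", encoding="utf-8") as log:
        log.write(f"{stamp} {event_type} {details}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    log_event(sys.argv[1:])
```

Keeping the callout small and append-only matters because it runs for every event; paging or ticket creation could be added to `log_event` in the same way.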
Figure/Diagram from Oracle Documentation


Oracle Notification Service (ONS) - Overview
• Publisher/Subscriber model for messaging.
• Utilized by FAN to publish HA/Load-Balancing events.
Load Balancing Advisory – Synopsis & Overview
• Used for load distribution among the various instances.
• The Load Balancing Advisory analyzes the service/workload level of the nodes within a RAC cluster and sends a Fast Application Notification (FAN) event to the application so that requests are sent to the best instance offering the service at the time of the request.
• Detects and avoids sending jobs to slow and hung nodes.
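Here is a simplified Python model of how a connection pool might act on Load Balancing Advisory output: each request is routed to an instance with probability proportional to that instance's advisory percentage, so a hung node advertised at 0% receives no work. The instance names and percentages are hypothetical, and this is not Oracle's actual client-side algorithm.

```python
import random
from collections import Counter


def pick_instance(advice, rng):
    """Choose an instance with probability proportional to its advisory percentage."""
    instances = list(advice)
    weights = [advice[name] for name in instances]
    return rng.choices(instances, weights=weights, k=1)[0]


# Hypothetical advice for one service: inst3 is hung, so it is advertised at 0%.
advice = {"inst1": 70, "inst2": 30, "inst3": 0}
rng = random.Random(42)  # seeded so the run is repeatable
counts = Counter(pick_instance(advice, rng) for _ in range(10_000))
print(counts)
```

Weighted routing, rather than always picking the single best instance, avoids stampeding all new work onto one node between advisory updates.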
• Integrated with Automatic Workload Repository (AWR) & Connection Pools.
• V$SERVICE & V$SERVICEMETRIC_HISTORY updated every hour.
Figure/Diagram from Oracle Documentation


Runtime Connection Load Balancing – Architecture Overview

Figure/Diagram from Oracle Documentation


Various types of Load Balancing in Oracle - Overview
• Load Balancing: Parallel Execution.
• Connection Pooling: Use Fast Connection Failover + Runtime Load Balancing.
• Automatic Workload Management = Services.
• Connection Load Balancing: Oracle NET Services:
  - Server-Side Load Balancing.
  - Client-Side Load Balancing.
Parallel Query – Synopsis & Overview
• Parallel Query Options:
  - Standard parallel query: utilizes all available resources in the cluster.
  - Restricted parallel query: processing limited to specific assigned nodes in the cluster. Achieved by:
    - Services
    - Parallel Instance Groups
Figure/Diagram from Oracle Documentation


Oracle RAC Software: Storage & Organization – Overview

Figure/Diagram from Oracle Documentation


Oracle RAC: Shared HOME(s) vs. Non-Shared HOME(s) – Synopsis & Overview
• Shared Oracle HOME:
  - One copy of the Oracle HOME shared by all nodes within the cluster on a Shared File System.
  - Shared Oracle HOME(s) cannot be used for rolling upgrades.
  - The OS needs to be cross-node compatible.
• Non-Shared Oracle HOME:
  - Each node has its own set of Oracle HOMEs, independent of the other nodes.
• Recommendation: Use Non-Shared Oracle HOME(s).


Rolling Upgrades - Overview

Figure/Diagram from Oracle Documentation




Oracle RAC – Administration & Management
• Three levels of Management/Administration:
  - Cluster-Level Administration
  - Database-Level Administration
  - Instance-Level Administration
• GUI: Oracle Enterprise Manager (OEM) Database/Grid Control
• GUI: Database Configuration Assistant (DBCA)
• GUI: Virtual IP Configuration Assistant (VIPCA)
• Command Line: Cluster Verification Utility (CVU)
• Command Line: Oracle Interface Configuration Tool (OIFCFG), SQL*Plus, SRVCTL, Oracle Clusterware Command-Line Interface.
Command-Line Utilities:
• CRS_STAT
• CRSCTL
• CRS_STOP
• CRS_START
• SRVCTL
Cluster Verification Utility (CVU) – Synopsis & Overview
• Cluster Verification Utility (CVU/CLUVFY) is a very useful tool for performing pre- and post-component-level checks at various stages of the Oracle RAC Install/Patch/Upgrade process, in addition to various other system-level checks at all major stages of the deployment cycle.
• Two scripts are provided for running CVU:
  - runcluvfy (runcluvfy.sh on Unix / runcluvfy.bat on Windows): runs from the installation media, before Oracle Clusterware is installed.
  - cluvfy (cluvfy on Unix / cluvfy.bat on Windows): runs from an installed Oracle Clusterware home.
Figure/Diagram from Oracle Documentation


Oracle RAC over Commodity Hardware
• Oracle RAC works on low-cost commodity hardware to lower the total cost of ownership and produce a high-availability, parallelized Grid/Cloud computing environment.
• Oracle, along with Dell, EMC and Intel, launched Project MegaGrid in 2004 to demonstrate the cost-effectiveness, reliability and functionality of Grid Computing Infrastructure, employing Oracle Real Application Clusters on inexpensive commodity hardware as an economical and powerful alternative to the conventional SMP computing paradigm.
Exadata & Oracle RAC
• Oracle Exadata – The world’s fastest series of database machines:
  - Is based on and comes preconfigured with Oracle Real Application Clusters (RAC).
  - No Single Point of Failure.
  - Infiniband, Intelligence/Compute Capability at the Storage tier, Compression, PCI Flash & Flash Cache = The Ultimate Database Consolidation Platform.
  - Concurrent Queries/Updates with MultiVersion Read Consistency, beneficial for Trickle-Feed Data Loads.
  - Extreme Performance: 25 GB/sec I/O bandwidth, up to 50 GB/sec with FLASH; load up to 5 TB/hour.
  - Replace distributed systems with a consolidated system, utilizing the same Oracle skill set that you currently have.
  - Get up to 10x query performance on mixed workloads.
  - 5 TB of Flash Cache = 56 Flash PCI Cards per Exadata Rack; 50 GB/sec Flash Bandwidth = Hypersonic Speeds!

Figure/Diagram from Oracle Documentation
Deploying Oracle RAC: A balanced system approach

Figure/Diagram from Oracle Documentation


• Few large OR many small nodes: either approach is fine/feasible from a scalability perspective.
• Recommendation: Cluster nodes with uniform/even performance properties.
• Recommendation: Avoid uneven architecture within Oracle RAC.
Deploying Oracle RAC: Destructive Testing

Figure/Diagram from Oracle Documentation


Oracle RAC: Project Deployment Phases/Goals
• Deploy/Implement a TEST Cluster.
• QA Regression/Stress Testing.
• QA Beta Testing.
• Database worst-case-scenario testing for possible long-running/slow queries that pose a bottleneck/domino threat to the system.
• Test the System Infrastructure as a whole to identify/correct any Integration Flaws.
• Expansive Real Application Testing (where possible, in releases after 10g).
• RAW Database Load Testing.
• Simulate Peak Performance load on the database, collect statistics and analyze results.
• Comprehensive and Repeated Rehearsal of deployment to Production.
Deploying Applications on Oracle RAC – Recommendations & Best Practices
• Automatic Storage Management (ASM)
• Workload Management: Services
• SPFILEs on Shared Storage
• Automatic Undo Management
• Automatic Segment Space Management
• Automatic Database Diagnostic Monitor (ADDM) in conjunction with Automatic Workload Repository (AWR)
General Recommendations for a healthier/faster RAC system
• Keep Batch & OLTP processes on separate instances.
• Commit sizes should be reduced for faster operations.
• Use Reverse-Key and Hash-Partitioned indexes.
• Increase the Cache for High-Usage Sequences and change them to NOORDER.
• Use Automatic Segment Space Management (ASSM).
• Remove PCTUSED, FREELISTS and FREELIST GROUPS settings.
• Use Network Time Protocol (NTP) to synchronize the time on all nodes within an Oracle RAC cluster.
• Fewer LMS processes with high utilization = more efficiency.
• No. of LMS processes <= No. of CPUs.
• High-DML tables should have fewer rows per block: ALTER TABLE <table_name> MINIMIZE RECORDS_PER_BLOCK.
• Tune INITRANS and FREELISTS to mitigate block contention.
• Partition High-Usage Database Segments to minimize resource contention.
• Move as much PL/SQL code as possible from the client to the server side, e.g. Server-side Stored Procedures/Packages.
• Global Dynamic Performance Views = GV$ Prefix.
High Availability: Redundancy is Crucial
• High-Availability requires redundant components to be built into every layer of the Infrastructure Stack to eliminate SPOFs (Single Points of Failure).
• Geographical Redundancy:
  - Multiple Data Centers at geographically distant locations.
  - Multiple Power/Air-Conditioning/Other Critical Resource units within each Data Center site.
  - Oracle DataGuard provides protection against Site-Level failure.
• Software Redundancy at each Data Center site:
  - Multiple Web Servers.
  - Multiple Application/ File/ BI/ Batch/ Transaction/ Interface/ Reporting/ Services/ Support/ Other/ Management/ Monitoring Servers.
  - Oracle RAC: Multiple Oracle Database Instances.
• Storage Redundancy at each Data Center site:
  - Multiple SAN(s).
  - RAID groups within each SAN.
• Hardware Redundancy at each Data Center site:
  - Multiple Global Traffic Managers (GTM).
  - Multiple Local Traffic Managers (LTM).
  - Multiple Servers at every tier-level.
  - Multiple Storage Area Networks (SANs).
  - Multiple Fibre/Infiniband Switches between the SAN and servers.
  - Multiple Network Switches/VLANs.
  - Multiple HBAs in each server.
  - Multiple Network Interface Cards (NICs) within each server.
  - Multiple Power, Cooling, Hard Drives, CPUs etc. within each server.
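The value of redundancy at every layer can be quantified with basic availability arithmetic: redundant copies combine in parallel, and layers combine in series. In this Python sketch, the per-component availabilities are hypothetical, and the parallel formula assumes that redundant copies fail independently; correlated failures (shared power, a site outage) break that assumption, which is why geographical redundancy is also called for.

```python
import math


def redundant(availability, copies):
    """Availability of `copies` interchangeable components in parallel, assuming
    independent failures: the layer is down only if every copy is down."""
    return 1 - (1 - availability) ** copies


def stack(layer_availabilities):
    """Availability of layers in series: the stack is up only if every layer is up."""
    return math.prod(layer_availabilities)


# Hypothetical figures: web, application and database tiers at 0.99 each, one SAN at 0.999.
single = stack([0.99, 0.99, 0.99, 0.999])           # one server per tier
paired = stack([redundant(0.99, 2)] * 3 + [0.999])  # every server tier doubled
print(f"single servers: {single:.4f}, paired servers: {paired:.4f}")
```

Note that the single SAN remains a series term in both cases, so it caps the whole stack's availability until it, too, is made redundant.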
Attaining High Availability: It’s not JUST the technology!
• Processes and People, along with Technology, are crucial in implementing and achieving HA.
• Comprehensive Knowledge Transfer (CKT): Every new technology comes with a learning curve.
• The Right Mindset: HA is expensive to acquire, deploy and maintain, but pays off in the medium/long run.
• Processes/Management need to be in place: Problems, Downtime Planning, Incidents, Change Control, Risk, Releases etc.
• SLOs and SLAs must be negotiated, agreed upon, honored & monitored.
• Elaborate and Exhaustive Testing must be performed at all levels of the Infrastructure Stack.
Monitoring Oracle RAC
• Establish Baselines.
• Workload Monitoring: Peak/Average/Various times of day.
• Resource Monitoring: Network, CPU, Memory, I/O, Transactions.
• Interconnect Monitoring: Latency, Efficiency.
• CPU Consumption > 70% = Add another node.
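The 70% CPU rule of thumb can be turned into a rough capacity estimate. This Python sketch assumes the total load spreads evenly across nodes and stays constant as nodes are added; in a real RAC cluster, Cache Fusion traffic means scaling is not perfectly linear, so treat the result as a lower bound. The input figures are hypothetical.

```python
import math


def nodes_needed(avg_cpu_pct, nodes, threshold_pct=70.0):
    """Smallest node count that keeps average CPU at or below the threshold,
    assuming the total load spreads evenly and stays constant as nodes are added."""
    total_load = avg_cpu_pct * nodes  # in units of "percent of one node"
    return max(nodes, math.ceil(total_load / threshold_pct))


for cpu, n in ((85, 4), (60, 2), (95, 3)):
    print(f"{n} nodes at {cpu}% CPU -> {nodes_needed(cpu, n)} node(s)")
```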
• 3 levels of monitoring:
  - OS
  - Application
  - Database
• Compare/Match statistics/metrics reported by Oracle with statistics/metrics reported by OS/3rd-Party tools.
• Monitoring Tools:
  - VMSTAT
  - IOSTAT
  - NETSTAT
  - OS Watcher
• Database Monitoring:
  - AWR
  - ADDM
  - Statspack
  - ASH
• OEM Grid Control:
  - Enables/Facilitates all of the above in an intuitive, easy-to-use GUI.
Troubleshooting Oracle RAC
• Split-Brain: A cluster’s worst nightmare.
• Node Eviction.
• I/O Fencing.
• STONITH: Shoot The Other Node In The Head.
• RACDIAG.SQL: Script to Collect RAC Diagnostic Information [MetaLink ID 135714.1]
• RACDDT 2.0.5 User Guide [MetaLink ID 360926.1]
• Log files:
  - Resource-specific logs
  - Cluster Network Communication logs
  - CRS alert logs
  - CSS logs
  - CRS logs
  - EVM logs
  - OPMN logs
  - SRVM logs
  - Listener logs
• Trace files:
  - BDUMP
  - UDUMP
  - CDUMP
• Master Note for Real Application Clusters (RAC) Oracle Clusterware and Oracle Grid Infrastructure [MetaLink ID 1096952.1].
• RAC: Frequently Asked Questions [MetaLink ID 220970.1].
• Resolving Instance Evictions on Windows Platforms [MetaLink ID 297498.1].
• Data Gathering for Instance Evictions in a RAC environment (ORA-29740) [MetaLink ID 412884.1].
• OS Watcher (OSW) for various Unix flavors (Tru64, AIX, Solaris, HP-UX, Linux).
• Perfmon for the Windows family of OS.


Summary
• To summarize, Oracle RAC is a mature, stable and robust clustered version of Oracle's database server product that provides fault tolerance against single points of server failure. It is used by government entities, corporations and organizations across the planet to provide continuous service, load-balancing and scalability, and as a lower-cost alternative to Mainframe-like SMP (Symmetric Multi-Processing) models of computing. Learn more about Oracle RAC at Oracle's RAC homepage:
http://www.oracle.com/technology/products/database/clustering/index.html
