
RAC 10g Best Practices

on Linux
15 September 2004

Roland Knapp
RAC Pack

Agenda
! Planning Best Practices
! Implementation Best Practices
! Operational Best Practices

Understand the Architecture

[Architecture diagram: a three-node cluster. Each node (Node1-Node3) runs an Operating System, CRS, an ASM instance (1-3), and a database instance (1-3). Clients reach the nodes over the public network via VIP1-VIP3; the nodes communicate over a redundant cluster interconnect. All nodes attach to shared storage holding the redo logs of all instances, the database & control files, the OCR & voting disk, and (optionally) a shared oracle_home.]

Understand the Architecture


! Cluster terminology
! Functional basics

  HA by eliminating the node and Oracle as Single Points of Failure (SPOF)
  Scalability by making additional processing capacity available incrementally

! Hardware components

  Private interconnect/network switch
  Shared storage/concurrent access/storage switch

! Software components

  OS, Cluster Manager, DBMS/RAC, Application
  Differences between cluster managers

Plan the Architecture


! Cluster interconnects
Gigabit Ethernet
! Public networks
Ethernet, FastEthernet, Gigabit Ethernet
! Server Recommendations
Minimum 2 CPUs per server
2 and 4 CPU servers normally most cost effective
1-2 GB of memory per CPU
! Fibre Channel, SCSI, or NAS storage connectivity

Plan the Architecture


! Cluster Interconnect redundancy

  RHEL 3.0 NIC bonding:
  www.kernel.org/pub/linux/kernel/people/marcelo/linux-2.4/Documentation/networking/bonding.txt

! Local ORACLE_HOME

  OCFS V2 will support a shared ORACLE_HOME

! OCR and Voting Disk on raw devices
! Database files on ASM

Plan the Architecture


! OCR and the Voting Disk are not mirrored by Oracle.
! This will be a new feature in 10gR2
! Use LVM and OS functionality to mirror these two
devices

Unbreakable Linux Distributions


! Red Hat Enterprise Linux AS and ES 2.1 and 3.0
! SuSE Linux Enterprise Server 8 (SuSE Linux AG)
! Oracle will support Oracle products running with other
distributions but will not support the operating system.

RAC Certification for Unbreakable Linux

! Certification

  Enterprise-class OS distribution (e.g. RH AS 2.1 and 3.0, SuSE SLES 8)
  Clusterware (Oracle OSD Clusterware)
  Network Attached Storage (e.g. NetApp filers)
  Most SCSI and SAN storage is compatible

! For more details on software certification:


http://technet.oracle.com/support/metalink/content.html

! Discuss hardware configuration with your HW vendor

Agenda
! Planning Best Practices
! Implementation Best Practices
! Operational Best Practices

Implementation Flowchart

1. Configure HW
2. Configure OS, Public Network, Private Interconnect
3. Configure shared storage
4. Install Oracle CRS
5. Install Oracle software, including RAC and ASM
6. Run VIPCA (automatically launched from RDBMS root.sh)
7. Create database with DBCA
8. Validate cluster/RAC configuration

Operating System Configuration

! Confirm OS requirements from

  Platform-specific install documentation
  Quick install guides (if available) from Metalink/OTN
  Release notes

! Follow these steps on EACH node of the cluster

  Configure ssh
  ! 10g OUI uses ssh/scp if configured, otherwise rsh/rcp

  Configure the Private Interconnect
  ! Use UDP and GigE
  ! Non-routable IP addresses (e.g. 10.0.0.x)
  ! Redundant switches as the standard configuration for ALL cluster sizes
  ! NIC teaming configuration (platform dependent)

  Configure the Public Network
  ! The VIP and its name must be DNS-registered in addition to the standard static IP information
  ! The VIP will not be visible until the VIPCA install is complete
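
The host-naming scheme above can be sketched in /etc/hosts; all hostnames and addresses below are hypothetical examples (the VIP names must additionally be registered in DNS):

```
# Public network
192.168.1.10   mars
192.168.1.11   venus
# Private interconnect (non-routable addresses)
10.0.0.1       mars-priv
10.0.0.2       venus-priv
# Virtual IPs (also registered in DNS, configured later by VIPCA)
192.168.1.20   mars-vip
192.168.1.21   venus-vip
```

The node names mars/venus match the examples used later in this presentation.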

Linux x86 requirements


! Operating System Requirements, x86 and Itanium systems

  Red Hat Enterprise Linux AS/ES 2.1 (Update 3 or higher)
  Red Hat Enterprise Linux AS/ES 3 (Update 2 or higher)
  SuSE Linux Enterprise Server (SLES) 8 (Service Pack 3 or higher)
  SuSE Linux Enterprise Server 9

Linux x86 requirements (cont.)


! The system must be running the following kernel version (or a higher version):

  Red Hat Enterprise Linux 2.1 (x86): 2.4.9, errata e.34 (for example 2.4.9-e.34)
  Red Hat Enterprise Linux 2.1 (Itanium): 2.4.18, errata e.40 (for example 2.4.18-e.40)
  Red Hat Enterprise Linux 3: 2.4.21-15.EL
  SuSE Linux Enterprise Server 8 (x86): 2.4.21-138
  SuSE Linux Enterprise Server 9 (x86): 2.6.5-7.5
  SuSE Linux Enterprise Server 9 (Itanium): 2.6.5-7.5
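
A scripted check of the kernel requirement can be sketched as follows; the comparison helper and the example minimum version are illustrative, not an Oracle-supplied tool:

```shell
#!/bin/sh
# version_ge A B -> succeeds if version A >= version B,
# comparing the dotted fields numerically via sort.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -t. -k1,1n -k2,2n -k3,3n | head -1)" = "$2" ]
}

required="2.4.21"                      # e.g. minimum for RHEL 3 per the table above
running="$(uname -r | cut -d- -f1)"    # strip the errata suffix: 2.4.21-15.EL -> 2.4.21

if version_ge "$running" "$required"; then
    echo "kernel $running meets minimum $required"
else
    echo "kernel $running is older than required $required"
fi
```

Run this on each node; the errata level itself (e.g. -15.EL) still has to be checked against the list above.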

Linux x86 requirements (cont.)


! Operating System Requirements
To determine whether OCFS is installed, enter the following
command:
# rpm -qa | grep ocfs
If you want to install the Oracle database files on an OCFS file
system and the packages are not installed, download them from
the following Web site. Follow the instructions listed with the kit to
install the packages and configure the file system:
http://oss.oracle.com/projects/ocfs/

Linux x86 requirements (cont.)


! Operating System Requirements

  For a detailed list of all required packages please see the Installation Guide:
  10g Release 1 (10.1) for UNIX Systems: AIX-Based Systems, HP HP-UX, HP Tru64 UNIX, Linux, and Solaris Operating System
  Part No. B10811-03

! SuSE SLES8 (x86) is due to be certified with patch set 10.1.0.3 (CD cut 24 September)

Linux Itanium
! Operating System Requirements

  For a detailed list of all required packages please see the Installation Guide:
  10g Release 1 (10.1) for UNIX Systems: AIX-Based Systems, HP HP-UX, HP Tru64 UNIX, Linux, and Solaris Operating System
  Part No. B10811-03

Prepare Linux Environment


! Follow these steps on EACH node of the cluster

  Set kernel parameters in /etc/sysctl.conf
  Add hostnames / VIPs to the /etc/hosts file
  Establish a file system or location for ORACLE_HOME (writable for the oracle userid)
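
A sketch of typical 10g kernel settings in /etc/sysctl.conf (values follow the 10g Release 1 install guide defaults; verify against your platform release notes before applying):

```
kernel.shmmax = 2147483648
kernel.shmmni = 4096
kernel.shmall = 2097152
kernel.sem = 250 32000 100 128
fs.file-max = 65536
net.ipv4.ip_local_port_range = 1024 65000
```

Apply immediately with sysctl -p; the file is re-read automatically at boot.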

NIC Bonding
! Required for private interconnect resiliency.
! Various 3rd-party vendor solutions are available for Linux:

  ! NIC bonding in RHEL 3.0 ES
    http://www.kernel.org/pub/linux/kernel/people/marcelo/linux-2.4/Documentation/networking/bonding.txt
  ! Intel Advanced Network Services (ANS)
    http://www.intel.com/support/network/adapter/1000/linux/ans.htm
  ! HANIC
    http://oss.oracle.com/projects/hanic/
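
A minimal active-backup bonding setup on RHEL 3 can be sketched as below; the device names, addresses, and mode are examples (see bonding.txt above for the full option list):

```
# /etc/modules.conf -- load the bonding driver
alias bond0 bonding
options bond0 miimon=100 mode=1      # mode=1: active-backup failover

# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for the second slave NIC)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

Oracle (and CRS) then only sees the bond0 interface as the private interconnect.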

Shared Storage Configuration


! Configure devices for the Voting Disk and OCR file.

  Voting Disk >= 20MB, OCR >= 100MB
  Use storage mirroring to protect these devices

! Configure shared storage (for ASM)

  Use a large number of similarly sized disks
  Confirm shared access to the storage disks from all nodes
  Include space for the flash recovery area

! Configure I/O multi-pathing

  ASM must only see a single (virtual) path to the storage
  Multi-pathing configuration is platform specific (e.g. PowerPath, SecurePath, ...)

! Establish a file system or location for ORACLE_HOME (and the CRS & ASM HOMEs)

Use devlabel
! Use devlabel to create a unique binding to the OCR and Voting devices.

  To use the command, as root:

  devlabel add -s /dev/raw/raw1 -d /dev/sda1
  devlabel add -s /dev/raw/raw2 -d /dev/sdb1

  This adds an entry to /etc/sysconfig/devlabel, something like this:

  /dev/raw/raw1 /dev/sda1 S83.3:6006016024d20c003a3cb11d3484d811DGCRAID5sector1005268

  The third column is the SCSI unique id (actually a UUID), which is stored on the disk for the life of the LUN and never changes.

! http://www1.us.dell.com/content/topics/global.aspx/power/en/ps1q03_lerhaupt
! http://linux.dell.com/devlabel/devlabel.html
! For SuSE use the following option:

  rpm -ivh --nodeps devlabel-0.48.01-1.i386.rpm

Installation Flowchart for ASMLib

1. Download the latest ASMLib rpms from http://oss.oracle.com/
2. Install the rpms on all nodes
3. Configure ASMLib with the script /etc/init.d/oracleasm (configure option)
4. Make disks available for ASM with /etc/init.d/oracleasm createdisk VOL1 /dev/sdg
5. Set the discovery string to ORCL

For detailed installation instructions please check
http://otn.oracle.com/tech/linux/asmlib/install.html
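
The flowchart steps can be sketched as a command sequence; the volume name VOL1 and device /dev/sdg are hypothetical examples, and the script only runs the commands when ASMLib is actually installed:

```shell
#!/bin/sh
# ASMLib setup sketch -- run as root on each node.
ASMCTL=/etc/init.d/oracleasm

if [ -x "$ASMCTL" ]; then
    $ASMCTL configure                    # prompts for owner/group (e.g. oracle/dba)
    $ASMCTL createdisk VOL1 /dev/sdg     # label the disk -- first node only
    $ASMCTL scandisks                    # remaining nodes pick up the new label
    $ASMCTL listdisks                    # should now print VOL1
else
    echo "oracleasm not installed; commands shown for reference only"
fi
```

DBCA/ASM then discovers the labeled disks via the ORCL:* discovery string.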

Installation Flowchart CRS

1. Create two raw devices for the OCR and voting disk
2. Load/Install the hangcheck-timer module
3. Install the CRS/CSS stack with the Oracle Universal Installer
4. Start the Oracle stack for the first time with $CRS_HOME/root.sh

Oracle Cluster Manager (CRS) Installation

! CRS is REQUIRED to be installed and running prior to installing 10g RAC.
! CRS must be installed in a location different from the ORACLE_HOME (e.g. ORA_CRS_HOME).
! Shared location(s) or devices for the Voting File and OCR file must be available PRIOR to installing CRS.

  Reinstallation of CRS requires re-initialization of the devices, including permissions.

! CRS and RAC require that the private and public network interfaces be configured prior to installing CRS or RAC.
! Specify the interconnect for CRS communication.

CRS Installation cont.


! Only one set of CRS daemons can be running per
RAC node.
! The CRS stack is run from entries in /etc/inittab with
respawn.
! The supported method to start CRS is to boot the machine.
! The supported method to stop CRS is to shut down the machine or use "/etc/init.d/init.crs stop".

Installation Flowchart Oracle

1. Install Oracle software
2. Root.sh on all nodes
3. Define VIP (VIPCA)
4. NETCA
5. DBCA
6. Verify cluster and database configuration

Oracle Installation
! The Oracle 10g installation can be performed after CRS is installed and running on all nodes.
! Start the runInstaller (do not cd into your /mnt/cdrom directory)
! Run root.sh on all nodes

  Running root.sh on the first node will invoke VIPCA, which will configure your Virtual IPs on all nodes.
  After root.sh has finished on the first node, run it sequentially on the remaining nodes.

VIP Installation
! The VIP Configuration Assistant (vipca) starts automatically from $ORACLE_HOME/root.sh
! After the welcome screen, choose only the public interface(s)
! The next screen asks for the Virtual IPs of the cluster nodes; add your /etc/hosts-defined name under IP Alias Name.

  The VIP must be a DNS-known IP address because the VIP is used for the tnsnames connect.

! After finishing this you will see a new VIP interface, e.g. eth0:1. Use ifconfig (on most platforms) to verify this.

VIP Installation cont.


! If a cluster is moving to a new datacenter (or subnet), it is necessary to change IPs. The VIP is stored within the OCR, and any modification or change to the IP requires additional administrative steps.

  Please see Metalink Note 276434.1 for details

! The listener.ora has the VIP IP address configured; this has to be changed as well.

Create RAC database using DBCA

! Set MAXINSTANCES, MAXLOGFILES, MAXLOGMEMBERS, MAXLOGHISTORY, MAXDATAFILES (auto with DBCA)
! Create tablespaces as Locally Managed (auto with DBCA)
! Create all tablespaces with ASSM (auto with DBCA)
! Configure automatic UNDO management (auto with DBCA)
! Use SPFILE instead of multiple init.oras (auto with DBCA)

Validate Cluster Configuration

! Query the OCR to confirm the status of all defined services: crs_stat -t

  Use the script from Note 259301.1 to improve output formatting/readability

HA Resource                    Target  State
ora.RO.RO1.inst                ONLINE  ONLINE on mars
ora.RO.RO2.inst                ONLINE  ONLINE on venus
ora.RO.db                      ONLINE  ONLINE on venus
ora.mars.LISTENER_MARS.lsnr    ONLINE  ONLINE on mars
ora.mars.gsd                   ONLINE  ONLINE on mars
ora.mars.ons                   ONLINE  ONLINE on mars
ora.mars.vip                   ONLINE  ONLINE on mars
ora.venus.LISTENER_VENUS.lsnr  ONLINE  ONLINE on venus
ora.venus.gsd                  ONLINE  ONLINE on venus
ora.venus.ons                  ONLINE  ONLINE on venus
ora.venus.vip                  ONLINE  ONLINE on venus
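
The formatting script from Note 259301.1 essentially pivots the NAME=/TARGET=/STATE= records of plain crs_stat output into one row per resource; a minimal sketch of that idea (not the note's actual script) is:

```shell
#!/bin/sh
# Turn crs_stat key=value records into one aligned row per resource.
# Sample input is embedded so the sketch runs without a cluster.
format_crsstat() {
    awk -F= '
        /^NAME=/   { name = $2 }
        /^TARGET=/ { target = $2 }
        /^STATE=/  { printf "%-32s %-8s %s\n", name, target, $2 }
    '
}

# On a real node, pipe the live output instead:  crs_stat | format_crsstat
format_crsstat <<'EOF'
NAME=ora.mars.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on mars

NAME=ora.venus.vip
TYPE=application
TARGET=ONLINE
STATE=ONLINE on venus
EOF
```

Each resource's TYPE record is simply ignored; only NAME, TARGET, and STATE end up in the table.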

Validate RAC Configuration


! Instances running on all nodes

  SQL> select * from gv$instance;

! RAC communicating over the private interconnect

  SQL> oradebug setmypid
  SQL> oradebug ipc
  SQL> oradebug tracefile_name
  /home/oracle/admin/RAC92_1/udump/rac92_1_ora_1343841.trc

  Check the trace file in the user_dump_dest:

  SSKGXPT 0x2ab25bc flags
  info for network 0
  socket no 10 IP 10.0.0.1 UDP 49197
  sflags SSKGXPT_UP
  info for network 1
  socket no 0 IP 0.0.0.0 UDP 0
  sflags SSKGXPT_DOWN

Validate RAC Configuration


! RAC is using desired IPC protocol: Check Alert.log
cluster interconnect IPC version:Oracle UDP/IP
IPC Vendor 1 proto 2 Version 1.0
PMON started with pid=2

! Use cluster_interconnects only if necessary

  RAC will use the same virtual interconnect selected during the CRS install
  To check which interconnect is used and where it came from, use: select * from x$ksxpia;

ADDR             INDX  INST_ID P PICK NAME_KSXPIA IP_KSXPIA
---------------- ----- ------- - ---- ----------- ---------
00000003936B8580     0       1   OCR  eth1        10.0.0.1

  PICK: OCR = Oracle Clusterware
        OSD = Operating System dependent
        CI  = the init.ora parameter cluster_interconnects is specified

SRVCTL
! SRVCTL is a very powerful tool
! SRVCTL uses information from the OCR file
! GSD in 10g runs only for compatibility, to serve 9i clients when 9i and 10g run on the same cluster
! srvctl status nodeapps -n <nodename> shows all services running on a node

  SRVCTL commands are documented in Appendix B of the RAC Admin Guide at:
  http://download-west.oracle.com/docs/cd/B14117_01/rac.101/b10765/toc.htm

Post Installation
! Enable asynchronous I/O if available:

  cd $ORACLE_HOME/rdbms/lib; make -f ins_rdbms.mk async_on ioracle

! To enable asynchronous I/O you must re-link Oracle to use skgaioi.o and install the Patch Set Exception for bug 3208258 (base bug 3016968) for RH2.1 and RH3.0.
! The workaround until 10.1.0.3 is to use this command:

  $ make PL_ORALIBS=-laio -f ins_rdbms.mk async_on

Post Installation
! Install the 10.1.0.2 patch ARU 6076422 to fix the use of the private interconnect.
! Adjust the UDP send/receive buffer sizes to 256K:

  sysctl -w net.core.rmem_max=262144
  sysctl -w net.core.wmem_max=262144
  sysctl -w net.core.rmem_default=262144
  sysctl -w net.core.wmem_default=262144

! If a buffer cache > 1.7GB is required, use a 64-bit platform.
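
The sysctl -w settings above do not survive a reboot; to make them persistent, the same values can be added to /etc/sysctl.conf (re-read at boot, or applied immediately with sysctl -p):

```
net.core.rmem_max = 262144
net.core.wmem_max = 262144
net.core.rmem_default = 262144
net.core.wmem_default = 262144
```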

Large SGA & Address Space


! Please read Note 260152.1 for details
! RHAS 2.1 for ia32

  2.4.9-e.XX: uniprocessor kernel
  2.4.9-e.XX-smp: SMP kernel capable of handling up to 4GB of physical memory
  2.4.9-e.XX-enterprise: SMP kernel capable of handling up to about 16GB of physical memory

! RHEL3 for ia32

  2.4.21-XX.EL: uniprocessor kernel
  2.4.21-XX.ELsmp: SMP kernel capable of handling up to 16GB of physical memory
  2.4.21-XX.ELhugemem: SMP kernel capable of handling beyond 16GB, up to 64GB

  (XX = number of the errata kernel)

NTP Protocol
! We recommend setting up the Network Time Protocol
(NTP) on all cluster nodes.
This will synchronize the clocks among all nodes, and
facilitate analysis of tracing information based on
timestamps.
! Note that adjusting clocks by more than 15 minutes can cause instance evictions. It is strongly advised to shut down all instances before date/time adjustments.
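
One common way to avoid large clock steps on a running node is to run ntpd in slew mode, so corrections are applied gradually; the file shown is the Red Hat convention, and the exact flag placement is an example:

```
# /etc/sysconfig/ntpd (Red Hat)
# -x makes ntpd slew the clock instead of stepping it,
# avoiding sudden time jumps on running cluster nodes
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid"
```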

Backing up the OCR and Voting Disk

! After installing Oracle RAC 10g and after ensuring that the system is functioning properly, make a backup of the OCR device and Voting disk.
! In addition, make a backup of the OCR and Voting disk contents after you complete any node additions or node deletions and after running any de-installation procedures.
! Use dd or copy, depending on whether your OCR and Voting files are on raw devices or OCFS.

  Please see Note 279793.1 for additional information

Optimizing failover time

! Set the init.ora parameter fast_start_mttr_target to a value between 30 and 60 seconds to achieve an instance recovery time within the chosen value.
! To tune the node monitor misscount value (the default on Linux is 60 seconds), please contact Support or Consulting. We don't recommend adjusting this value after CRS/CSS is up and running.

Linux Monitoring and Configuration

Overall tools            sar, vmstat
CPU                      /proc/cpuinfo, mpstat, top
Memory                   /proc/meminfo, /proc/slabinfo, free
Disk I/O                 iostat
Network                  /proc/net/dev, netstat, mii-tool
Kernel version and rel.  cat /proc/version
Types of I/O cards       lspci -vv
Kernel modules loaded    lsmod, cat /proc/modules
List all PCI devices     lspci -v
Startup changes          /etc/sysctl.conf, /etc/rc.local
Kernel messages          /var/log/messages, /var/log/dmesg
OS error codes           /usr/src/linux/include/asm/errno.h
OS calls                 /usr/sbin/strace -p <pid>
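
For the Network row above, a small helper can pull the RX/TX error counters for one NIC out of /proc/net/dev, which is useful for spotting a flaky interconnect interface; the field positions follow the standard /proc/net/dev layout, and the demo feeds a sample line so the sketch runs anywhere:

```shell
#!/bin/sh
# nic_errors <ifname>  -- reads /proc/net/dev-format lines on stdin.
# After replacing ':' with a space, field 4 is RX errs and field 12 is TX errs.
nic_errors() {
    grep "$1:" | tr ':' ' ' | awk '{ printf "%s RX errs=%s TX errs=%s\n", $1, $4, $12 }'
}

# Live use on a node:  nic_errors eth1 < /proc/net/dev
# Demo with a sample line in the standard 16-column format:
nic_errors eth1 <<'EOF'
  eth1: 98765432  123456    0    0    0     0          0         0 12345678   65432    2    0    0     0       0          0
EOF
```

Steadily growing error counters on the private interconnect NIC are worth investigating (cabling, switch port, or bonding failovers) before they show up as cluster timeouts.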

Agenda
! Planning Best Practices
! Implementation Best Practices
! Operational Best Practices

Locally Managed Tablespaces


! Create all tablespaces as locally managed with
automatic segment space management
  CREATE TABLESPACE xx ...
  EXTENT MANAGEMENT LOCAL ...
  SEGMENT SPACE MANAGEMENT AUTO ...

Done automatically in DBCA

ASSM
! Automatic Segment Space Management (ASSM)

  Eliminates the complex process of computing PCTUSED, FREELISTS and FREELIST GROUPS
  Allows dynamic affinity of space to instances and avoids the hard partitioning of space inherent with free list groups
  ! Contention during concurrent access is removed and space usage is optimized
  Doesn't need any maintenance
  Allows you to support any number of instances without any changes to the object
  Use online rebuild features to move objects from Free List Groups to ASSM
  Configured automatically in DBCA

Application Deployment
! Same guidelines as single instance

SQL Tuning
Sequence Caching
Partition large objects
Use different block sizes
Avoid DDL
Use LMTs and ASSM as noted earlier

Operations
! Same DBA procedures as single instance, with some
minor, mostly mechanical differences.
! Managing the Oracle environment

  Starting/stopping the Oracle cluster stack with server boot/reboot
  Managing multiple redo log threads

! Startup and shutdown of the database

Use Grid Control

! Backup and recovery


! Performance Monitoring and Tuning
! Production migration

Avoid false node evictions

! May get heartbeat failures if critical processes are unable to respond quickly

  Enable real-time priority for LMS
  Do not run the system at 100% CPU over a long period
  Ensure good I/O response times for the control file and voting disk

NEWS NEWS NEWS NEWS


! Oracle Sets Record TPC-H Performance Result for 8
CPU Systems With Clustered Linux Servers

http://biz.yahoo.com/prnews/040913/sfm121_1.html
http://www.tpc.org/

QUESTIONS & ANSWERS