RAC and Oracle Clusterware Best Practices and Starter Kit (AIX) [ID 811293.1]

Modified: Feb 26, 2013   Type: BULLETIN   Status: PUBLISHED   Priority: 1

In this Document

Purpose
Scope
Details
  RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices (Generic)
  RAC Platform Specific Starter Kits and Best Practices
  RAC on AIX Step by Step Installation Instructions
  RAC on AIX Best Practices
  OS Configuration Considerations
  Storage Considerations
  Network Considerations
  Oracle Software Considerations
  Community Discussions
References

Applies to:

Oracle Database - Enterprise Edition - Version 10.2.0.1 to 11.2.0.3 [Release 10.2 to 11.2]
IBM AIX on POWER Systems (64-bit)
IBM AIX Based Systems (64-bit)

Purpose

The goal of the Oracle Real Application Clusters (RAC) series of Best Practice and Starter Kit
notes is to provide customers with quick knowledge transfer of generic and platform specific best
practices for implementing, upgrading and maintaining an Oracle RAC system. This document is
compiled and maintained based on Oracle's experience with its global RAC customer base.

This Starter Kit is not meant to replace or supplant the Oracle Documentation set, but rather, it is
meant as a supplement to the same. It is imperative that the Oracle Documentation be read,
understood, and referenced to provide answers to any questions that may not be clearly
addressed by this Starter Kit.
All recommendations should be carefully reviewed by your own operations group and should
only be implemented if the potential gain as measured against the associated risk warrants
implementation. Risk assessments can only be made with a detailed knowledge of the system,
application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation,
including implementations of Oracle RAC, is predicated on a successful test environment. It is
thus imperative that any recommendations from this Starter Kit are thoroughly tested and
validated using a testing environment that is a replica of the target production environment
before being implemented in the production environment to ensure that there is no negative
impact associated with the recommendations that are made.

Scope

This article applies to all new and existing RAC implementations as well as RAC upgrades.

Details

RAC Assurance Support Team: RAC and Oracle Clusterware Starter Kit and Best Practices
(Generic)

The following document focuses on RAC and Oracle Clusterware Best Practices that are
applicable to all platforms. It includes a white paper on available RAC System Load Testing
Tools and RAC System Test Plan outlines for 10gR2, 11gR1, and 11gR2:

Document 810394.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Platform
Independent)

RAC Platform Specific Starter Kits and Best Practices

The following notes contain detailed platform specific best practices including Step-By-Step
installation cookbooks (downloadable in PDF format):

Document 811306.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
Document 811280.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
Document 811271.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
Document 811293.1 RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
Document 811303.1 RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)

RAC on AIX Step by Step Installation Instructions


A Step By Step guide for installing Oracle RAC 10gR2 on AIX is attached as RACGuides_Rac10gR2OnAIX.pdf.
A Step By Step guide for installing Oracle RAC 11gR1 on AIX is attached as RACGuides_Rac11gR1OnAIX.pdf.
A Step By Step guide for installing Oracle RAC 11gR2 on AIX is attached as RACGuides_Rac11gR2OnAIX.pdf.

RAC on AIX Best Practices

The Best Practices in this section are specific to the AIX Platform. That said, it is essential that
the Platform Independent Best Practices found in Document 810394.1 also be reviewed.

OS Configuration Considerations

 It is essential that the joint IBM/Oracle White Paper "Oracle Real Application Clusters on
IBM AIX Best practices in memory tuning and configuring for system stability" be
reviewed by ALL customers running RAC on AIX.
 For 11gR2, start with Document 1427855.1 - AIX: Top Things to DO NOW to Stabilize
11gR2 GI/RAC Cluster
 Validate your hardware/software configuration against the RAC Technologies Matrix for
Unix.
 Ensure all required OS packages are installed and system prerequisites have been
properly implemented for your particular release of Oracle. This information is
documented in Document 169706.1 as well as the install guides for your particular
release.
 If deploying on an AIX virtualized system, review Document 1470654.1 to gain an
understanding of the resource utilization in such configurations.
 If running AIX 6.1, ensure that the fix for APAR IV04047 has been installed to avoid
potential instance hangs and node evictions. Additional details can be found in Document
1393041.1.
 To ensure system stability, verify that all mandatory patches for AIX (5L and 6) that are
documented in Document 282036.1 have been applied.
 Tune Virtual memory parameters. IBM recommended numbers are:

minperm%=3
maxperm%=90
maxclient%=90
lru_file_repage=0
strict_maxperm=0
strict_maxclient=1
page_steal_method=1

Example script for setting these parameters:

#!/usr/bin/ksh
# -p applies the change to the running system and to the next boot
vmo -p -o maxperm%=90
vmo -p -o minperm%=3
vmo -p -o maxclient%=90
vmo -p -o strict_maxperm=0
vmo -p -o strict_maxclient=1
vmo -p -o lru_file_repage=0
# -r applies the change at the next boot only; a reboot is needed for it to take effect
vmo -r -o page_steal_method=1
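
To confirm that the recommended values are in place, the current settings can be compared against the list above. The following is a minimal sketch, not an Oracle or IBM supplied tool: check_vmo_tunables is a hypothetical helper name, and it assumes input lines in the "name = value" form printed by vmo -a.

```shell
# check_vmo_tunables (hypothetical helper): reads "name = value" lines on
# stdin and compares them with the IBM-recommended values from this note.
# Exit status is 0 when every tunable is present and matches, 1 otherwise.
check_vmo_tunables() {
    awk '
        BEGIN {
            bad = 0
            want["minperm%"] = 3
            want["maxperm%"] = 90
            want["maxclient%"] = 90
            want["lru_file_repage"] = 0
            want["strict_maxperm"] = 0
            want["strict_maxclient"] = 1
            want["page_steal_method"] = 1
        }
        # Assumed input format, e.g. "  minperm% = 3"
        $2 == "=" && ($1 in want) {
            seen[$1] = 1
            if ($3 != want[$1]) {
                printf "MISMATCH %s: current=%s recommended=%s\n", $1, $3, want[$1]
                bad = 1
            }
        }
        END {
            for (name in want)
                if (!(name in seen)) {
                    printf "MISSING %s\n", name
                    bad = 1
                }
            exit bad
        }
    '
}
```

On an AIX host it could be used as: vmo -a | check_vmo_tunables. A tunable that does not appear in the input is reported as MISSING, which may simply mean that the release in use does not list it in the default vmo -a output.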

 On AIX 5.3, apply APAR IY84780 to fix a known kernel issue with per-cpu freelists. For
details on this APAR, refer to IY84780: KERNEL MEMORY GARBAGE
COLLECTOR FAILS TO FREE LISTS.

Note: This fix is also included in Technology Level 4 (TL4) and higher. If necessary,
check with IBM for any superseding fixes.

 Set AIXTHREAD_SCOPE=S in the environment (export AIXTHREAD_SCOPE=S) for
improved performance. S is the default on AIX 6.1 and above. Refer to Document
458403.1 (Why AIXTHREAD_SCOPE should be set to 'S' on AIX) for additional
details.
 When using the Processor Folding feature (default), it is essential that the Fix Packs for
AIX 5.3 and 6.1 are applied to prevent system hangs.
 If not using HACMP then HACMP filesets must not be installed.
 Do not use filesystems mounted with the "cio" option for Oracle Homes, software staging or
temp. The "cio" mount option is not supported and will cause installation, relinking and
other unexpected failures. See Document 869644.1 for details.
 Ensure that the GI and ORACLE owner accounts have the CAP_NUMA_ATTACH,
CAP_BYPASS_RAC_VMM, and CAP_PROPAGATE capabilities. This is required per
the 11gR2 installation guide and it is also required for all pre-11gR2 installations. An
example of checking and setting the capabilities for the grid user, as root:

# /usr/bin/lsuser -a capabilities grid
# /usr/bin/chuser capabilities=CAP_NUMA_ATTACH,CAP_BYPASS_RAC_VMM,CAP_PROPAGATE grid
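
The capability check above can be scripted. The following is a minimal sketch under stated assumptions: missing_caps is a hypothetical helper name, and it assumes the lsuser output is a single line of the form "grid capabilities=CAP_A,CAP_B" with no trailing whitespace.

```shell
# missing_caps (hypothetical helper): $1 is the output line of
# "lsuser -a capabilities <user>". Prints the required capabilities that
# are absent and returns 1, or prints nothing and returns 0.
missing_caps() {
    caps=${1#*capabilities=}            # keep only the comma-separated list
    if [ "$caps" = "$1" ]; then
        caps=""                         # no capabilities attribute in the line
    fi
    missing=""
    for need in CAP_NUMA_ATTACH CAP_BYPASS_RAC_VMM CAP_PROPAGATE; do
        case ",$caps," in
            *",$need,"*) ;;                       # capability is present
            *) missing="$missing $need" ;;        # capability is absent
        esac
    done
    if [ -n "$missing" ]; then
        echo "missing:$missing"
        return 1
    fi
    return 0
}
```

On an AIX host it could be called as: missing_caps "$(/usr/bin/lsuser -a capabilities grid)". The same call applies to the ORACLE owner account by substituting the user name.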

Storage Considerations

 Ensure that SAN Storage is capable of read/write concurrency (writing at the same time
from any member of the RAC cluster) through its drivers. This means that the
"reserve_policy" attribute of the discovered disks (hdisk, hdiskpower, dlmfdrv, etc.)
must be capable of being set to "no_reserve" or "no_lock".
See Document 422075.1 for details.
 Do not assign PVIDs (Physical Volume IDs) to disks or volumes that are being used for
ASM Diskgroups. PVIDs should be cleared on all nodes from any candidate disks or
volumes prior to being added to an ASM Diskgroup. Once a disk or volume is added to
an ASM Diskgroup, PVIDs should never be assigned after the fact, from any node in the
cluster, including nodes that are being added to an existing cluster. Reference Document
353761.1 for more details on this issue.

CAUTION:  Assigning PVIDs to ASM disks will corrupt the disk header resulting in
catastrophic data loss!!
 Set FSCSI Device Attribute FC_ERR_RECOV to FAST_FAIL for Voting Disk and
ASM storage. This setting has been shown to avoid reboots in situations where a SAN
storage outage of the volumes hosting one of 3 voting disks caused reboots to occur. 
See Document 560077.1 for details.
 When implementing GPFS, be sure to review Document 302806.1 for recommendations
on LUN configuration, filesystem blocksize, AIO configuration, inodes, and
implementation examples.
 Users of AIX may encounter long interactive-application response times when other
applications in the system are running large writes to disk. Configuring I/O pacing limits
the number of outstanding I/O requests against a file. AIX 6.1 enables I/O pacing by
default, and its default values (minpout=4096 and maxpout=8193) are appropriate for AIX 6.1.
In AIX 5.3, however, this feature must be enabled explicitly.

Oracle's testing has shown that starting values of 8 for minpout and 12 for maxpout are a good
baseline for most Oracle customers. However, every environment is different, and therefore
different values may very well be acceptable, if the system has been properly tuned and shown to
perform with differing values. To configure I/O pacing on the system using Oracle's
recommended baseline values, run the following as root, either interactively via SMIT or
directly with chdev:
# smitty chgsys
# chdev -l sys0 -a minpout=8 -a maxpout=12

 On AIX, ASM can use concurrent RAW logical volumes or RAW partitions. When using
multipath technologies with ASM, ASM must access the devices via the appropriate
multipath device. The device paths for major multipath technologies are documented
in Document 294869.1.
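
Two of the storage checks above, the reserve policy and the absence of PVIDs on ASM candidate disks, lend themselves to a read-only script. The following is a minimal sketch, not a supported tool. The helper names are hypothetical. It assumes that lsattr -El <disk> -a reserve_policy prints a line whose first two fields are the attribute name and its value, and that lspv prints the disk name in the first column and the literal word "none" in the second column when no PVID is assigned.

```shell
# reserve_policy_ok (hypothetical helper): reads lsattr output on stdin and
# succeeds only if reserve_policy is no_reserve or no_lock.
reserve_policy_ok() {
    awk '
        $1 == "reserve_policy" {
            seen = 1
            ok = ($2 == "no_reserve" || $2 == "no_lock")
        }
        END { exit ((seen && ok) ? 0 : 1) }
    '
}

# disks_with_pvid (hypothetical helper): reads lspv output on stdin; the
# arguments are the ASM candidate disk names. Prints each candidate that
# carries a PVID and returns 1 if any were found, 0 otherwise.
disks_with_pvid() {
    candidates=" $* "
    awk -v cand="$candidates" '
        index(cand, " " $1 " ") && $2 != "none" { print $1; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}
```

On an AIX host these could be used as: lsattr -El hdisk4 -a reserve_policy | reserve_policy_ok, and lspv | disks_with_pvid hdisk4 hdisk5, where hdisk4 and hdisk5 stand in for the actual candidate disks. Both helpers only read command output; neither changes any device attribute.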

Network Considerations

 Ensure that the network tuning parameters are set in accordance with the following to
ensure optimal interconnect performance: 

tcp_recvspace = 65536
tcp_sendspace = 65536
udp_sendspace = ((DB_BLOCK_SIZE * DB_MULTIBLOCK_READ_COUNT) + 4 KB), but no lower than 65536
udp_recvspace = 655360 (minimum recommended value is 10x udp_sendspace; the value must be less than sb_max)
rfc1323 = 1
sb_max = 4194304 
ipqmaxlen = 512

NOTE: Failure to set the udp_sendspace will result in failure of root.sh for
11.2.0.2 GI installations, see Document 1280234.1.

 The Oracle Clusterware VIP addresses and their corresponding node names must not be in
use on the network prior to Oracle Clusterware installation. Do not create any AIX alias on
the public network interface; the Clusterware installation will do it. Reserve one VIP and
its hostname per RAC node. Oracle Clusterware VIP addresses and their corresponding node
names are to be defined in DNS.
 Installations using AIX VIO must review Document 1305174.1 - AIX VIO: Block Lost
or IPC Send Timeout Possible Without Fix of APAR IZ97457.
 HAIP may randomly fail to start on cluster startup on 11.2.0.3 with: category: -2,
operation: SETIF, loc: bpfopen:21,o, OS error: 6, other: dev /dev/bpf0.  This is the result
of Bug 13989181 which is fixed in Patch 13989181.
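
The udp_sendspace and udp_recvspace rules in the network tuning list above are simple arithmetic and can be worked through in a script. The following is a minimal sketch: the function names are hypothetical, "4 KB" is taken to mean 4096 bytes, and udp_recvspace is computed as exactly 10x udp_sendspace, which the note gives as the minimum recommended value.

```shell
# udp_sendspace_for (hypothetical helper)
# $1 = DB_BLOCK_SIZE in bytes, $2 = DB_MULTIBLOCK_READ_COUNT
# Prints (DB_BLOCK_SIZE * DB_MULTIBLOCK_READ_COUNT) + 4096, floored at 65536.
udp_sendspace_for() {
    size=$(( $1 * $2 + 4096 ))
    if [ "$size" -lt 65536 ]; then
        size=65536
    fi
    echo "$size"
}

# udp_recvspace_for (hypothetical helper)
# $1 = udp_sendspace, $2 = sb_max
# Prints 10x udp_sendspace, or fails if that is not below sb_max.
udp_recvspace_for() {
    recv=$(( $1 * 10 ))
    if [ "$recv" -ge "$2" ]; then
        echo "udp_recvspace $recv is not below sb_max $2" >&2
        return 1
    fi
    echo "$recv"
}
```

For example, with an 8 KB block size and DB_MULTIBLOCK_READ_COUNT of 16, udp_sendspace_for 8192 16 prints 135168, and udp_recvspace_for 135168 4194304 prints 1351680, which is below the recommended sb_max of 4194304.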

Oracle Software Considerations

The Software Considerations in this section are specific to the AIX Platform. That said, it is
essential that the Platform Independent Best Practices found in Document 810394.1 also be
reviewed.

 For 10.2.0.4 and 11.1.0.7 installations on AIX systems using the IBM Logical Host
Ethernet Adapter (LHEA) interfaces, the fix for Bug 8725020 must be applied to
ensure VIP functionality. This fix is included in 10.2.0.5 and 11.1.0.7 CRS bundle#1
(and above). See Document 959746.1 for additional details around this issue.
 To ensure that critical process threads are running with the proper priority (to prevent
node evictions), apply the fix for Bug 13940331 (AIX Specific). Bug 13940331 is fixed
in 11.2.0.4. At present, one-off patches are available for 10.2.0.5 and 11.2.0.3 under Patch
13940331.
 For 11.2.0.2 installations and/or upgrades apply the 11.2.0.2.4 GI PSU  Patch
12827731 (or later) prior to running root.sh or rootupgrade.sh to prevent failure of these
scripts (due to Bug 10370797, fixed in 11.2.0.2.4).  Instructions on how to apply the
11.2.0.2.4 GI PSU Patch 12827731 prior to running root.sh or rootupgrade.sh are as
follows:

Note:  These instructions were written for the 11.2.0.2.4 GI PSU.  Though the patch
numbers will differ, the same instructions are applicable for later GI PSUs.

1. Perform an Oracle Grid Infrastructure 11.2.0.2 installation or upgrade


2. Right before the first root.sh (or rootupgrade.sh) is supposed to be run, leave the
current installation behind:

o Do NOT run root.sh or rootupgrade.sh
o Do NOT close the installer or abort an operation in progress.
o Do leave the current installation as-is and open a new terminal.

3. Download and prepare the application of Patch 12827731 by unzipping the patch into
an empty directory on EVERY node in the cluster.
4. Download and install the latest version of OPatch to apply the patch.  The latest
version of OPatch can be found under Patch 6880880.  Install OPatch into the GI Home
on ALL nodes as follows:

$ unzip <OPATCH-ZIP> -d <ORACLE_HOME>


5. Contrary to what is described in the patch readme:

o Do NOT use "opatch auto"
o Since this is a fresh install that has not been configured, do NOT execute
"rootcrs.pl -unlock" or "rootcrs.pl -patch"
o Do use "opatch napply -local" as the software install owner (e.g. grid):

$GI_HOME/OPatch/opatch napply -local <patch_location>/12827731
$GI_HOME/OPatch/opatch napply -local <patch_location>/12827726

Note: Because OPatch is run with the "-local" flag, this operation must be performed on
every node.

6. After you have patched every node in the cluster, return to the original installation
7. Proceed to run the root.sh (rootupgrade.sh) on all nodes and follow the instructions on
the OUI screen.

 On pre-11.2 AIX systems (without vendor clusterware) OPROCD by default is not
running in the AIX global run queue (Bug 13623902), which can cause false reboots by
OPROCD. Corrective action on this issue is to modify the /etc/init.cssd file as follows:

Note:  The instructions below are performed in a rolling method to avoid a complete
outage of the database.

1. Stop the Clusterware stack on the local node.
2. Modify the /etc/init.cssd as follows:

From:

   # Run oprocd synchronously and look for its status code
   cd $OPROCDIR

   # startup the some diagnostic collection scripts if any
   StartDiagCollect;

   $OPROCD run -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
      $OPROCD_DEFAULT_HISTOGRAM $FATALARG
   RC=$?

To:

   # Run oprocd synchronously and look for its status code
   cd $OPROCDIR

   # startup the some diagnostic collection scripts if any
   StartDiagCollect;

   RT_GRQ=ON
   export RT_GRQ
   $OPROCD run -t $OPROCD_DEFAULT_TIMEOUT -m $OPROCD_DEFAULT_MARGIN \
      $OPROCD_DEFAULT_HISTOGRAM $FATALARG
   RC=$?

3. Restart the Clusterware stack on the local node.
4. Repeat Steps 1-3 on all remaining cluster nodes.
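
Because the change has to be repeated on every node, it can be useful to confirm that a given copy of init.cssd already contains it. The following is a minimal sketch: has_rt_grq is a hypothetical helper name, and it only checks that the two added lines are present somewhere in the file, not that they sit directly before the $OPROCD run command.

```shell
# has_rt_grq (hypothetical helper): $1 is the path to init.cssd or a copy.
# Succeeds only if the file contains both "RT_GRQ=ON" and "export RT_GRQ"
# at the start of a line (leading whitespace allowed).
has_rt_grq() {
    grep -q '^[[:space:]]*RT_GRQ=ON' "$1" &&
    grep -q '^[[:space:]]*export[[:space:]]\{1,\}RT_GRQ' "$1"
}
```

On a cluster node it could be called as: has_rt_grq /etc/init.cssd && echo "RT_GRQ already set". The helper is read-only and does not modify the file.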

Community Discussions

Still have questions? Search for similar discussions or start a new discussion on this
subject in the Database - RAC/Scalability Community:
https://communities.oracle.com/portal/server.pt/community/scalability_rac/253

References

NOTE:294869.1 - Oracle ASM and Multi-Pathing Technologies


NOTE:353761.1 - Assigning a Physical Volume ID (PVID) To An Existing ASM Disk Corrupts
the ASM Disk Header
NOTE:422075.1 - Error ORA-27091, ORA-27072 When Mounting Diskgroup
NOTE:560077.1 - Asm Hangs After Loss Of Failgroup on AIX
NOTE:810394.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Platform
Independent)
NOTE:811271.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
NOTE:811280.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
NOTE:811293.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
NOTE:869644.1 - Having an ORACLE_HOME on a Filesystem Mounted With "cio" Option is
Not Supported and Will Have Issues
BUG:8725020 - VIP WONT RUN (LHEA) ADAPTER 5.3 TL9
NOTE:1305174.1 - AIX VIO: Block Lost or IPC Send Timeout Possible Without Fix of APAR
IZ97457
NOTE:959746.1 - AIX: 10.2/11.1 VIP Fails to Come Up with "Invalid Parameters, Or Failed To
Bring Up VIP"
NOTE:811303.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)
NOTE:811306.1 - RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
NOTE:1393041.1 - AIX 6.1 Instance Hang Then Node Reboot due to High Load IV04047
NOTE:1427855.1 - AIX: Top Things to DO NOW to Stabilize 11gR2 GI/RAC Cluster
NOTE:169706.1 - Oracle Database (RDBMS) on Unix AIX,HP-UX,Linux,Mac OS
X,Solaris,Tru64 Unix Operating Systems Installation and Configuration Requirements Quick
Reference (8.0.5 to 11.2)
NOTE:282036.1 - Minimum Software Versions and Patches Required to Support Oracle
Products on IBM Power Systems

Attachments

 RACGuides_Rac10gR2OnAIX.pdf (2.16 MB)

 RACGuides_Rac11gR1OnAIX.pdf (12.23 MB)

 RACGuides_Rac11gR2OnAIX.pdf (1.71 MB)
