RAC: Frequently Asked Questions
Type: FAQ
Status: PUBLISHED
Modified: 08-MAR-2010
Applies to:
Oracle Server - Enterprise Edition - Version: 9.2.0.1 to 11.2.0.1 - Release: 9.2 to 11.2
Purpose
Frequently Asked Questions for Real Application Clusters and Grid Infrastructure.
What are the dependencies between OCFS and ASM in Oracle Database 10g ?
What software is necessary for Oracle RAC? Does it have a separate installation CD to
order?
Where can I find a list of supported solutions to ensure NIC availability / redundancy (for
the interconnect) per platform?
I have changed my spfile with alter system set parameter_name =.... scope=spfile. The
spfile is on ASM storage and the database will not start.
What combinations of Oracle Clusterware, Oracle RAC and ASM versions can I use?
I had a 3 node Oracle RAC. One of the nodes had to be completely rebuilt as a result of a
problem. As there are no backups, what is the proper procedure to remove the 3rd node
from the cluster so it can be added back in?
Does Weblogic (WLS) support Services, FAN/FCF, and the Load Balancing Advisory
(LBA) with Oracle RAC?
Where do I find Oracle Clusterware binaries and ASM binaries with Oracle Database 11g
Release 2?
If my OCR and Voting Disks are in ASM, can I shutdown the ASM instance?
I have the 11.2 Grid Infrastructure installed and now I want to install an earlier version of
Oracle Database (11.1 or 10.2), is this supported ?
What is SCAN?
I get an error with DBCA from 10.2 or 11.1 after I have installed the 11.2 Grid
Infrastructure?
I get the following error starting my Oracle RAC database, what do I do?
WARNING: No cluster interconnect has been specified.
Are block devices supported for OCR, Voting Disks, ASM devices?
When configuring the NIC cards and switch for a GigE Interconnect should it be set to
FULL or Half duplex in Oracle RAC?
How can a NAS storage vendor certify their storage solution for Oracle RAC ?
Can I run Oracle 9i RAC and Oracle RAC 10g in the same cluster?
What are the restrictions on the SID with an Oracle RAC database? Is it limited to 5
characters?
I want to use rconfig to convert a single instance to Oracle RAC but I am using raw
devices in Oracle RAC. Does rconfig support RAW ?
Can we designate the place of archive logs on both ASM disk and regular file system,
when we use SE RAC?
Can my customer use Veritas Agents to manage their Oracle RAC database on Unix with
SFRAC installed?
Can I run more than one clustered database on a single Oracle RAC cluster?
I could not get the user equivalence check to work on my Solaris 10 server when trying to
install 10.2.0.1 Oracle Clusterware. The install ran fine without issue. << Message:
Result: User equivalence check failed for user "oracle". >>
Why does the NOAC attribute need to be set on NFS mounted RAC Binaries?
Are there any issues for the interconnect when sharing the same switch as the public
network by using VLAN to separate the network?
Why does netca always create the listener listening on the public IP and not on the VIP only?
Does changing uid or gid of the Oracle User affect Oracle Clusterware?
Can we output the backupset onto regular file system directly (not onto flash recovery
area) using RMAN command, when we use SE RAC?
Should the SCSI-3 reservation bit be set for our Oracle Clusterware only installation?
RAC Assistance
High Availability
How do I configure FCF with BPEL so I can use Oracle RAC 10g in the backend?
Can I change my SCAN after I have completed my Grid Infrastructure 11g Release 2
install?
Why do we have a Virtual IP (VIP) in Oracle RAC 10g or 11g? Why does it just return a
dead connection when its primary node fails?
What do the VIP resources do once they detect a node has failed/gone down? Are the
VIPs automatically acquired, and published, or is manual intervention required? Are
VIPs mandatory?
If I use Services with Oracle RAC, do I still need to set up Load Balancing ?
How can a customer mask the change in their clustered database configuration from their
client or application? (I.E. So I do not have to change the connection string when I add a
node to the Oracle RAC database)
What are my options for load balancing with Oracle RAC? Why do I get an uneven
number of connections on my instances?
I have a 2 node Oracle RAC cluster, if I pull the interconnect on node 1 to simulate
failure, why does node 2 reboot?
Can our Oracle RAC 10g VIP fail over from NIC to NIC as well as from node to node ?
Is there a way to provide or configure HA for the interconnect using Infiniband on AIX ?
I am using shared servers with the following set in init.ora:
dispatchers=(protocol=TCP)(listener=listeners_nl01)(con=500)(serv=oltp). I stopped my
service with srvctl stop service but it is still registered with the listener and accepting
connections. Is this expected?
Is it possible to use srvctl start database with a user account other than oracle (that is,
other than the owner of the Oracle software)?
With three primary load balancing options (client-side connect-time LB, server-side
connect-time LB, and the runtime connection load balancing) Is it fair to say Runtime
Connection Load Balancing is the only option to leverage FAN up/down events?
What is Server-side Transparent Application Failover (TAF) and how do I use it?
What does the Virtual IP service do? I understand it is for failover but do we need a
separate network card? Can we use the existing private/public cards? What would happen
if we used the public ip?
I want to configure a secure environment for ONS so have added a Wallet however I am
seeing errors (SSL handshake failed) after adding the wallet?
Do I need to install the ONS on all my mid-tier serves in order to enable JDBC Fast
Connection Failover (FCF)?
Can I use the 10.2 JDBC driver with 10.1 database for FCF?
How does the datasource properties initialLimit, minLimit, and maxLimit affect Fast
Connection Failover processing with JDBC?
What type of callbacks are supported with OCI when using FAN/FCF?
Scalability
I am seeing the wait events 'ges remote message', 'gcs remote message', and/or 'gcs for
action'. What should I do about these?
What are the changes in memory requirements from moving from single instance to
RAC?
Will adding a new instance to my Oracle RAC database (new node to the cluster) allow
me to scale the workload?
Can I have different servers in my Oracle RAC? Can they be from different vendors? Can
they be different sizes?
A customer is currently using RAC in a 2 node environment. How should one review the
ability to scale out to 4, 6, 8 or even more nodes? What should the requirements of a scale
out test?
What are my options for setting the Load Balancing Advisory GOAL on a Service?
How can I validate the scalability of my shared storage? (Tightly related to RAC /
Application scalability)
Does Database blocksize or tablespace blocksize affect how the data is passed across the
interconnect?
Manageability
How should I deal with space management? Do I need to set free lists and free list
groups?
I was installing Oracle 9i RAC and my Oracle files did not get copied to the remote
node(s). What went wrong?
If I am using Vendor Clusterware such as Veritas, IBM, Sun or HP, do I still need Oracle
Clusterware to run Oracle RAC 10g or Oracle RAC 11g?
How is Oracle Enterprise Manager integrated with the Oracle RAC 11g Release 2 stack?
What storage option should I use for Oracle RAC on Linux? ASM / OCFS / Raw Devices
/ Block Devices / Ext3 ?
What are the implications of using srvctl disable for an instance in my Oracle RAC
cluster? I want to have it available to start if I need it but at this time to not want to run
this extra instance for this database.
How do I identify which node was used to install the cluster software and/or database
software?
Are the Oracle Clusterware bundle patches cumulative, do they conflict with one
another?
I have added a second network to my cluster, can I load balance my users across this
network?
Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however
sqlplus can start it on both nodes? What is the problem?
When I look at ALL_SERVICES view in my database I see services I did not create,
what are they for?
I have 2 clusters named "crs" (the default), how do I get Grid Control to recognize them
as targets?
I found in 10.2 that the EM "Convert to Cluster Database" wizard would always fall over
on the last step where it runs emca and needs to log into the new cluster database as
dbsnmp to create the cluster database targets etc. I changed the password for the dbsnmp
account to be dbsnmp (same as username) and it worked OK. Is this a known issue?
What versions of the database can I use the cluster verification utility (cluvfy) with?
Platform Specific
Is it possible to run Oracle RAC on logical partitions (i.e. LPARs) or virtually separate
servers?
Can the Oracle Database Configuration Assistant (DBCA) be used to create a database
with Veritas DBE / AC 3.5?
After installing patchset 9013 and patch_2313680 on Linux, the startup was very slow
Oracle Clusterware fails to start after a reboot due to permissions on raw devices
reverting to default values. How do I fix this?
How do I configure raw devices in order to install Oracle Clusterware 10g on RHEL5 or
OEL5?
Can different releases of Oracle RAC be installed and run on the same physical Linux
cluster?
Is the hangcheck timer still needed with Oracle RAC 10g and 11g?
Customer did not load the hangcheck-timer before installing RAC. Can the customer just
load the hangcheck-timer?
Does Oracle Support Oracle RAC with Solaris 10 Containers (aka Zones)?
In Solaris 10, do we need Sun Cluster to provide redundancy for the interconnect and
multiple switches?
Can I configure HP's Autoport aggregation for NIC Bonding after the install? (i.e. not
present beforehand)
Is HMP supported with Oracle RAC 10g or Oracle RAC 11g on all HP platforms ?
Does the Oracle Cluster File System (OCFS) support network access through NFS or
Windows Network Shares?
When running Oracle RAC on Windows 2003, what is the recommended OS level?
Can I run my Oracle 9i RAC and Oracle RAC 10g on the same Windows cluster?
When using MS VSS on Windows with Oracle RAC, do I need to run the VSS on each
node where I have an Oracle RAC instance?
What do I do when I get an ORA-01031 error logging into the ASM instance?
The OracleCRService does not start with my windows Oracle RAC implementation, what
do I do?
How do I verify that Host Bus Adapter Node Local Caching has been disabled for the
disks I will be using in my RAC cluster?
My customer has a failsafe cluster installed, what are the benefits of moving their system
to RAC?
My customer wants to understand what type of disk caching they can use with their
Windows RAC Cluster, the install guide tells them to disable disk caching?
Is HACMP needed for RAC on AIX 5.2 using GPFS file system?
Can I run Oracle RAC 10g on my IBM Mainframe Sysplex environment (z/OS)?
Can I use Oracle Clusterware for failover of the SAP Enqueue and VIP services when
running SAP in a RAC environment?
Diagnosibility
How do I gather all relevant Oracle and OS log/trace files in an Oracle RAC cluster to
provide to Support?
What is the optimal migration path to be used while migrating the E-Business suite to
Oracle RAC?
What are the Best Practices for using a clustered file system with Oracle RAC?
Can I use a cluster file system for OCR, Voting Disk, Binaries as well as database files?
Is Sun QFS supported with Oracle RAC? What about Sun GFS?
Is Red Hat GFS (Global File System) certified by Oracle for use with Oracle Real
Application Clusters?
What is the maximum number of nodes I can have in my cluster if I am using OCFS2?
Oracle Clusterware
When does the Oracle node VIP fail over to another node and subsequently return to its
home node?
How do I use multiple network interfaces to provide High Availability and/or Load
Balancing for my interconnect with Oracle Clusterware?
Can the Network Interface Card (NIC) device names be different on the nodes in a
cluster, for both public and private?
When the customer runs the command 'onsctl start', it fails with the message "Unable to
open libhasgen10.so". Why?
Voting Files stored in ASM - How many disks per disk group do I need?
With GNS, do ALL public addresses have to be DHCP managed (public IP, public VIP,
public SCAN VIP) ?
I am trying to move my voting disks from one diskgroup to another and getting the error
"crsctl replace votedisk not permitted between ASM Disk Groups." Why?
Can I run the fixup script generated by the 11.2 OUI or CVU on a running system?
What should the permissions be set to for the voting disk and ocr when doing an Oracle
RAC Install?
How is the Oracle Cluster Registry (OCR) stored when I use ASM?
I am trying to install Oracle Clusterware (10.2) and when I run the OUI, at the Specify
Cluster Configuration screen, the Add, Edit and Remove buttons are grayed out. Nothing
comes up in the cluster nodes either. Why?
I am installing Oracle Clusterware with a 3rd party vendor clusterware however in the
"Specify Cluster Configuration Page" , Oracle Clusterware installer doesn't show the
existing nodes. Why?
I made a mistake when I created the VIP during the install of Oracle Clusterware, can I
change the VIP?
How should I test the failure of the public network (IE Oracle VIP failover) in my Oracle
RAC environment?
Can I change the public hostname in my Oracle Database 10g Cluster using Oracle
Clusterware?
Does the hostname have to match the public name or can it be anything else?
I have a 2-node RAC running. I notice that it is always node2 that is evicted when I test
private network failure scenario by disconnecting the private network cable. Doesn't
matter whether it is node1's or node2's private network cable that is disconnected, it is
always the node2 that is evicted. What happens in a 3-nodes RAC cluster if node1's cable
is disconnected?
Can I use Oracle Clusterware to provide cold failover of my single instance Oracle
Databases?
What are the licensing rules for Oracle Clusterware? Can I run it without RAC?
In the course of failure testing in an extended RAC environment we find entries in the
cssd logfile which indicate actions like 'diskShortTimeout set to (value)' and
'diskLongTimeout set to (value)'.
Can anyone please explain the meaning of these two timeouts in addition to disktimeout?
During Oracle Clusterware installation, I am asked to define a private node name, and
then on the next screen asked to define which interfaces should be used as private and
public interfaces. What information is required to answer these questions?
Can I change the name of my cluster after I have created it when I am using Oracle
Clusterware?
Why does Oracle Clusterware use an additional 'heartbeat' via the voting disk, when other
cluster software products do not?
Why does Oracle still use the voting disks when other cluster software is present?
Customer is hitting bug 4462367 with an error message saying low open file descriptor.
How do I work around this until the fix is released with the Oracle Clusterware Bundle for
10.2.0.3 or 10.2.0.4?
Does Oracle Clusterware have to be the same or higher release than all instances running
on the cluster?
Can I set up failover of the VIP to another card in the same machine or what do I do if I
have different network interfaces on different nodes in my cluster (I.E. eth0 on node1,2
and eth1 on node 3,4)?
How can I register the listener with Oracle Clusterware in RAC 10g Release 2?
Why is the home for Oracle Clusterware not recommended to be subdirectory of the
Oracle base directory?
How do I put my application under the control of Oracle Clusterware to achieve higher
availability?
Is it a requirement to have the public interface linked to ETH0, or does it only need to be
on an ETH lower than the private interface (e.g. public on ETH1, private on ETH2)?
Can I use ASM as mechanism to mirror the data in an Extended RAC cluster?
How should voting disks be implemented in an extended cluster environment? Can I use
standard NFS for the third site voting disk?
When I run 10.2 CLUVFY on a system where RAC 10g Release 1 is running I get
following output:
Package existence check failed for "SUNWscucm:3.1".
Package existence check failed for "SUNWudlmr:3.1".
Package existence check failed for "SUNWudlm:3.1".
Package existence check failed for
"ORCLudlm:Dev_Release_06/11/04,_64bit_3.3.4.8_reentrant".
Package existence check failed for "SUNWscr:3.1".
Package existence check failed for "SUNWscu:3.1".
Checking this Solaris system I don't see those packages installed. Can I continue my
install?
What are the default values for the command line arguments?
How do I check the Oracle Clusterware stack and other sub-components of it?
Is there a way to verify that the Oracle Clusterware is working properly before
proceeding with RAC install?
At what point is cluvfy usable? Can I use cluvfy before installing Oracle Clusterware?
What is a stage?
What is a component?
What is nodelist?
How do I know about cluvfy commands? The usage text of cluvfy does not show
individual commands.
Do I have to type the nodelist every time for the CVU commands? Is there any shortcut?
Why does the peer comparison with -refnode report passed when the group or user does
not exist?
If a current customer has an Enterprise License Agreement (ELA), are they entitled to use
Oracle RAC One Node?
Is Oracle RAC One Node supported with 3rd party clusterware and/or 3rd party CFS?
How does RAC One Node compare with traditional cold fail over solutions like HP
Serviceguard, IBM HACMP, Sun Cluster and Symantec, and Veritas Cluster Server?
How does RAC One Node compare with a single instance Oracle Database protected
with Oracle Clusterware?
What is Oracle Real Application Clusters One Node (RAC One Node)?
If I add or remove nodes from the cluster, how do I inform RAC One Node?
How do I get Oracle Real Application Clusters One Node (Oracle RAC One Node)?
How does RAC One Node compare with database DR products like DataGuard or
Golden Gate?
How do I install the command line tools for RAC One Node?
How does RAC One Node compare with virtualization solutions like VMware?
Can I use Oracle RAC One Node for Standard Edition Oracle RAC?
If the root.sh script fails on a node during the install of the Grid Infrastructure with
Oracle Database 11g Release 2, can I re-run it?
How do I respond to a customer who is concerned about the following phrase in the
documentation?
Oracle Clusterware Administration and Deployment Guide 11g Release 2 (11.2)
E10717-04
2-27
"If Oracle ASM fails, then OCR is not accessible on the node on which Oracle ASM
failed, but the cluster remains operational. The entire cluster only fails if the Oracle ASM
instance on the OCR master node fails, if the majority of the OCR locations are in Oracle
ASM, and if there is an OCR read or write access, then the crsd stops and the node
becomes inoperative. "
Is it recommended that we put the OCR/Voting disk on ASM disk and, if so, is it
preferable to create a separate disk group for them?
Answers
What is Cache Fusion and how does this affect applications?
Cache Fusion is a new parallel database architecture for exploiting clustered computers to
achieve scalability of all types of applications. Cache Fusion is a shared cache architecture that
uses high speed low latency interconnects available today on clustered systems to maintain
database cache coherency. Database blocks are shipped across the interconnect to the node where
access to the data is needed. This is accomplished transparently to the application and users of
the system. Because Cache Fusion uses at most a three-point protocol, it scales easily to
clusters with a large number of nodes. Additional information can be found at:
Note: 139436.1 Understanding 9i Real Application Clusters Cache Fusion
cause contention with header blocks of tables and indexes as multiple instances vie for the same
block. This may cause a performance problem and require data partitioning. However, the need
for these changes should be rare.
Recommendation: use automatic segment space management (ASSM) to handle these changes
automatically. ASSM replaces freelists and freelist groups and performs better. The database
requires one redo thread and one undo tablespace for each instance, which are easily added
with SQL commands or with Enterprise Manager tools. NOTE: With Oracle RAC 11g Release 2,
you do not need to pre-create redo threads or undo tablespaces if you are using Oracle
Managed Files (e.g. ASM).
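As a sketch, the per-instance redo thread and undo tablespace can be added with SQL along these lines (the disk group, file sizes, log group numbers, and the SID orcl2 are illustrative examples, not taken from this note):

```sql
-- Add and enable a second redo thread (group numbers and sizes illustrative)
ALTER DATABASE ADD LOGFILE THREAD 2
  GROUP 3 ('+DATA') SIZE 100M,
  GROUP 4 ('+DATA') SIZE 100M;
ALTER DATABASE ENABLE PUBLIC THREAD 2;

-- Create an undo tablespace for the second instance and assign it to that SID
CREATE UNDO TABLESPACE undotbs2 DATAFILE '+DATA' SIZE 500M AUTOEXTEND ON;
ALTER SYSTEM SET undo_tablespace='undotbs2' SID='orcl2' SCOPE=SPFILE;
```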
Datafiles will need to be moved to shared storage, such as a clustered file system (CFS) or
ASM, so that all nodes can access them; Oracle recommends the use of Automatic Storage
Management (ASM). Also, the MAXINSTANCES parameter in the control file must be greater
than or equal to the number of instances you will start in the cluster.
For more detailed information, please see Migrating from single-instance to RAC in the Oracle
Documentation.
With Oracle Database 10g Release 2, the $ORACLE_HOME/bin/rconfig tool can be used to
convert a single instance database to RAC. The tool takes an XML input file and converts the
single instance database described in it. You can run the tool in "verify only" mode prior to
performing the actual conversion. This is documented in the Oracle RAC Administration book,
and a sample XML file can be found at
$ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC.xml. The tool only supports
databases using a clustered file system or ASM; you cannot use it with raw devices.
Grid Control 10g Release 2 provides an easy-to-use wizard to perform this function.
Oracle Enterprise Manager also includes workflows to assist with migrations (e.g. migrating
to ASM, creating a standby, converting a standby to RAC); the migration is automated in
Enterprise Manager Grid Control 10.2.0.5.
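A typical rconfig run, sketched below, first does a "verify only" pass. The file name my_convert.xml is an example, and the verify attribute lives in the XML's Convert element, so check the sample shipped with your release:

```shell
# Run on a cluster node as the database software owner (sketch only).
cd $ORACLE_HOME/assistants/rconfig/sampleXMLs
cp ConvertToRAC.xml my_convert.xml
# Edit my_convert.xml: source/target homes, SID, storage, and set the
# Convert element's verify attribute to "ONLY" for a dry run.
$ORACLE_HOME/bin/rconfig my_convert.xml
# After a clean verify, change verify="ONLY" to "YES" and re-run to convert.
```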
What are the dependencies between OCFS and ASM in Oracle Database 10g ?
In an Oracle RAC 10g environment, there is no dependency between Automatic Storage
Management (ASM) and Oracle Cluster File System (OCFS).
OCFS is not required if you are using Automatic Storage Management (ASM) for database files.
You can use OCFS on Windows (Version 2 on Linux) for files that ASM does not handle:
binaries (shared Oracle home), trace files, etc. Alternatively, you could place these files on
local file systems, even though that is less convenient given the multiple locations.
If you do not want to use ASM for your database files, you can still use OCFS for database files
in Oracle Database 10g.
Each node of a cluster that is being used for a clustered database will typically have the database
and Oracle RAC software loaded on it, but not actual datafiles (these need to be available via
shared disk). For example, if you wish to run Oracle RAC on 2 nodes of a 4-node cluster, you
would need to install the clusterware on all nodes, Oracle RAC on 2 nodes and it would only
need to be licensed on the two nodes running the Oracle RAC database. Note that using a
clustered file system, or NAS storage can provide a configuration that does not necessarily
require the Oracle binaries to be installed on all nodes.
With Oracle RAC 11g Release 2, if you are using policy managed databases, then you should
have the Oracle RAC binaries accessible on all nodes in the cluster.
What software is necessary for Oracle RAC? Does it have a separate installation
CD to order?
Oracle Real Application Clusters is an option of Oracle Database and therefore part of the Oracle
Database CD. With Oracle 9i, Oracle 9i RAC is part of Oracle9i Enterprise Edition. If you install
9i EE onto a cluster, and the Oracle Universal Installer (OUI) recognizes the cluster, you will be
provided the option of installing RAC. Most UNIX platforms require an OSD installation for the
necessary clusterware. For Intel platforms (Linux and Windows), Oracle provides the OSD
software within the Oracle9i Enterprise Edition release.
With Oracle Database 10g, Oracle RAC is an option of EE and available as part of SE. Oracle
provides Oracle Clusterware on its own CD included in the database CD pack.
Please check the certification matrix (Note 184875.1) or with the appropriate platform vendor for
more information.
With Oracle Database 11g Release 2, Oracle Clusterware and Automatic Storage Management
are installed as a single set of binaries called the grid infrastructure. The media for the grid
infrastructure is on a separate CD or under the grid directory. For standalone servers, Automatic
Storage Management and Oracle Restart are installed as the grid infrastructure for a standalone
server which is installed from the same media.
design. Serializing contention makes applications less scalable. If your customer uses standard
SQL and schema tuning, that alone solves more than 80% of performance problems.
Some of the scalability pitfalls they should look for are:
* Serializing contention on a small set of data/index blocks
--> monotonically increasing key
--> frequent updates of small cached tables
--> segment without automatic segment space management (ASSM) or Free List Group (FLG)
* Full table scans
--> Optimization for full scans in 11g can save CPU and latency
* Frequent invalidation and parsing of cursors
--> Requires data dictionary lookups and synchronizations
* Concurrent DDL ( e.g. truncate/drop )
Look for:
* Indexes with right-growing characteristics
--> Use reverse key indexes
--> Eliminate indexes which are not needed
* Frequent updates and reads of small tables
--> small=fits into a single buffer cache
--> Use Sparse blocks ( PCTFREE 99 ) to reduce serialization
* SQL which scans large amounts of data
--> Perhaps more efficient when parallelized
--> Direct reads do not need to be globally synchronized ( hence less CPU for global cache )
I have changed my spfile with alter system set parameter_name =.... scope=spfile.
The spfile is on ASM storage and the database will not start.
How to recover:
In $ORACLE_HOME/dbs:
. oraenv <instance_name>
sqlplus "/ as sysdba"
startup nomount
create pfile='recoversp' from spfile
/
shutdown immediate
quit
Now edit the newly created pfile to change the parameter to something sensible. Then:
sqlplus "/ as sysdba"
startup pfile='recoversp' (or whatever you called it in step one)
create spfile='+DATA/GASM/spfileGASM.ora' from pfile='recoversp'
/
N.B. The name of the spfile is in your original init<instance_name>.ora, so adjust to suit.
shutdown immediate
startup
quit
What combinations of Oracle Clusterware, Oracle RAC and ASM versions can I
use?
See Note:337737.1 for a detailed support matrix. Basically the Clusterware version must be at
least the highest release of ASM or Oracle RAC. ASM must be at least 10.1.0.3 to work with
10.2 database.
Note: With Oracle Database 11g Release 2, You must upgrade Oracle Clusterware and ASM to
11g Release 2 at the same time.
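Before mixing versions, the clusterware itself can report what is installed and what the cluster is running at; for example (output varies by release):

```shell
crsctl query crs softwareversion   # Clusterware software installed on this node
crsctl query crs activeversion     # version the cluster is actively running at
```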
I had a 3 node Oracle RAC. One of the nodes had to be completely rebuilt as a
result of a problem. As there are no backups, what is the proper procedure to
remove the 3rd node from the cluster so it can be added back in?
Follow the documentation for removing a node but you can skip all the steps in the node-removal
doc that need to be run on the node being removed, like steps 4, 6 and 7 (See Chapter 10 of
Oracle RAC Admin and Deployment Guide). Make sure that you remove any database instances
that were configured on the failed node with srvctl, and the listener resources as well;
otherwise rootdeletenode.sh will have trouble removing the nodeapps.
Just running rootdeletenode.sh isn't really enough, because you need to update the installer
inventory as well, otherwise you won't be able to add back the node using addNode.sh. And if
you don't remove the instances and listeners you'll also have problems adding the node and
instance back again.
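As a sketch, deregistering the failed node's instance and refreshing the installer inventory might look like this (the database name orcl, instance orcl3, and node names are illustrative examples, not from this note):

```shell
# Run from a surviving node as the Oracle software owner (sketch only).
srvctl remove instance -d orcl -i orcl3     # deregister the failed instance
# Update the OUI inventory so addNode.sh can be used later:
$ORACLE_HOME/oui/bin/runInstaller -updateNodeList \
    ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={node1,node2}"
```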
Does Weblogic (WLS) support Services, FAN/FCF, and the Load Balancing
Advisory (LBA) with Oracle RAC?
Currently the integration is incomplete; however, it is being actively worked on.
The current recommendation from Oracle Fusion Middleware is to use WLS Multi Pools with
Oracle RAC.
Where do I find Oracle Clusterware binaries and ASM binaries with Oracle
Database 11g Release 2?
With Oracle Database 11g Release 2, the binaries for Oracle Clusterware and Automatic Storage
Management (ASM) are distributed in a single set of binaries called the grid infrastructure. To
install the grid infrastructure, go to the grid directory on your 11g Release 2 media and run the
Oracle Universal Installer. Choose Grid Infrastructure for a Cluster. If you are installing ASM
for a single instance of Oracle Database on a standalone server, choose Grid Infrastructure
for a Standalone Server; this installation includes Oracle Restart.
If my OCR and Voting Disks are in ASM, can I shutdown the ASM instance?
No. You will have to stop the clusterware on that node, using either crsctl stop cluster or
crsctl stop crs.
I have the 11.2 Grid Infrastructure installed and now I want to install an earlier
version of Oracle Database (11.1 or 10.2), is this supported ?
Yes, however you need to "pin" the nodes in the cluster before trying to create a database
using an earlier version of Oracle Database (i.e. not 11.2). The command to pin a node is
crsctl pin css -n nodename. You should also apply the patch for Bug 8288940 to make DBCA
work in an 11.2 cluster.
What is SCAN?
Single Client Access Name (SCAN) is a single name that allows client connections to connect to
any database in an Oracle cluster independently of which node in the cluster the database (or
service) is currently running. The SCAN should be used in all client connection strings and does
not change when you add/remove nodes from the cluster. SCAN allows clients to use EZConnect
or a simple JDBC URL:
sqlplus system/manager@sales1-scan:1521/oltp
jdbc:oracle:thin:@sales1-scan:1521/oltp
The SCAN is defined as a single name resolving to 3 IP addresses in either the cluster's GNS or
your corporate DNS.
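One quick way to confirm the SCAN setup is a DNS lookup (the name sales1-scan continues the example above):

```shell
nslookup sales1-scan
# A correctly configured SCAN resolves to three IP addresses, and repeated
# lookups return them in a rotating (round-robin) order.
```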
I get an error with DBCA from 10.2 or 11.1 after I have installed the 11.2 Grid
Infrastructure?
You will need to apply the patch for Bug 8288940 to your database home in order for it to
recognize ASM running from the new grid infrastructure home. Also make sure you have
"pinned" the nodes.
crsctl pin css -n nodename
I get the following error starting my Oracle RAC database, what do I do?
WARNING: No cluster interconnect has been specified.
It simply means you do not have the cluster_interconnects parameter set and nothing was set
in the OCR, so the private interconnect is picked at random by the database, hence the
warning. You can either set the cluster_interconnects parameter in the init.ora to the private
interconnect IP, or use oifcfg getif and setif (type oifcfg without arguments for a help message):
$ oifcfg getif
eth0 138.2.236.0 global public
eth2 138.2.238.0 global cluster_interconnect
Note that if the hardware is not identical you will have to provide each node with its own
correct value; if the hardware is identical you can use the -global switch.
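Registering the private interface in the OCR with setif looks like this (the interface name and subnet continue the example getif output above):

```shell
oifcfg setif -global eth2/138.2.238.0:cluster_interconnect
# Verify the change:
oifcfg getif
```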
Are block devices supported for OCR, Voting Disks, ASM devices?
Block devices are only supported on Linux. On other Unix platforms, direct I/O semantics are
not applicable (or rather, not implemented) for block devices.
Note: The desupport of raw/block devices is scheduled for Oracle Database 12g. The Oracle
Database 10g OUI does not support block devices, however Oracle Clusterware and ASM do.
With Oracle RAC 11g Release 2, the Oracle Universal Installer and the Configuration Assistants
do not support raw or block devices. The Command Line Interfaces still support raw/block.
When configuring the NIC cards and switch for a GigE Interconnect should it be
set to FULL or Half duplex in Oracle RAC?
You must use full duplex for all network communication; half duplex means you can only
send or receive at any one time, not both.
How can a NAS storage vendor certify their storage solution for Oracle RAC ?
As of January 2007 the OSCP has been discontinued.
Please refer to the Oracle RAC Technologies Matrix on OTN for details (storage being part of it).
Old answer text:
They should obtain an OCE test kit and complete the required Oracle RAC tests. They can submit the request for an OCE kit to ocesup_ie@oracle.com.
The list of certified NAS vendors/solutions is posted on OTN under the OSCP program.
Can I run Oracle 9i RAC and Oracle RAC 10g in the same cluster?
YES. However, Oracle Clusterware (CRS) will not support an Oracle 9i RAC database, so you will have to leave the current configuration in place. You can install Oracle Clusterware and Oracle RAC 10g into the same cluster. On Windows and Linux, you must run the 9i Cluster Manager for the 9i database and Oracle Clusterware for the 10g database. When you install Oracle Clusterware, your 9i srvconfig file will be converted to the OCR. Both Oracle 9i RAC and Oracle RAC 10g will use the OCR. Do not restart the 9i gsd after you have installed Oracle Clusterware. With Oracle Clusterware 11g Release 2, the GSD resource is disabled by default. You only need to enable this resource if you are running Oracle 9i RAC in the cluster.
Remember to check certify for details of what vendor clusterware can be run with Oracle
Clusterware.
For example on Solaris, your Oracle 9i RAC will be using Sun Cluster. You can install Oracle
Clusterware and Oracle RAC 10g in the same cluster that is running Sun Cluster and Oracle 9i
RAC.
What are the restrictions on the SID with an Oracle RAC database? Is it limited
to 5 characters?
The SID prefix in 10g Release 1 and prior versions was restricted to five characters by the install/config tools, so that an ORACLE_SID of at most 5+3=8 characters could be supported in an Oracle RAC environment. The SID prefix limit is relaxed to eight characters in 10g Release 2; see Bug 4024251 for more information.
With Oracle RAC 11g Release 2, SIDs in Oracle RAC with Policy Managed database are
dynamically allocated by the system when the instance starts. This supports a dynamic grid
infrastructure which allows the instance to start on any server in the cluster.
I want to use rconfig to convert a single instance to Oracle RAC but I am using
raw devices in Oracle RAC. Does rconfig support RAW ?
No. rconfig supports ASM and shared file system only.
Can we designate the place of archive logs on both ASM disk and regular file
system, when we use SE RAC?
Yes, - customers may want to create a standby database for their SE RAC database so placing the
archive logs additionally outside ASM is OK.
Can my customer use Veritas Agents to manage their Oracle RAC database on
Unix with SFRAC installed?
For details on the support of SFRAC and Veritas Agents with Oracle RAC 10g, please see the notes Oracle's Policy for Supporting Oracle RAC 10g (applies to Oracle RAC 11g too) with Symantec SFRAC on Unix and Using Oracle Clusterware with Vendor Clusterware FAQ.
Can I run more than one clustered database on a single Oracle RAC cluster?
You can run multiple databases in an Oracle RAC cluster: either one instance per node (with different databases using different subsets of nodes in the cluster), or multiple instances per node
(all databases running across all nodes) or some combination in between. Running multiple
instances per node does cause memory and resource fragmentation, but this is no different from
running multiple instances on a single node in a single instance environment which is quite
common. It does provide the flexibility of being able to share CPU on the node, but the Oracle
Resource Manager will not currently limit resources between multiple instances on one node.
You will need to use an OS level resource manager to do this.
What combinations of Oracle Clusterware, Oracle RAC and ASM versions can I use?
Yes. The Oracle Clusterware should always run at the highest level. With Oracle Clusterware 11g, you can run both Oracle RAC 10g and Oracle RAC 11g databases. If you are using ASM for storage, you can use either Oracle Database 10g ASM or Oracle Database 11g ASM; however, to get the 11g features you must be running Oracle Database 11g ASM, which is the recommended choice.
Note: When you upgrade to 11g Release 2, you must upgrade both Oracle Clusterware and
Automatic Storage Management to 11g Release 2. This will support Oracle Database 10g and
Oracle Database 11g (both RAC and single instance).
Yes, you can run Oracle 9i RAC in the cluster as well. 9i RAC requires the clusterware that is
certified with Oracle 9i RAC to be running in addition to Oracle Clusterware 11g.
I could not get the user equivalence check to work on my Solaris 10 server when
trying to install 10.2.0.1 Oracle Clusterware. The install ran fine without issue.
<< Message: Result: User equivalence check failed for user "oracle". >>
Cluvfy and the OUI try to find SSH on Solaris under /usr/local/bin. The workaround is to create a softlink from /usr/bin/ssh into /usr/local/bin.
Note: User equivalence is required for installations (i.e. using OUI) and patching. DBCA, NETCA, and DB Control also require user equivalence.
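A sketch of the workaround, run as root (including scp as well is an assumption, since the install tools also copy files between nodes):

```
mkdir -p /usr/local/bin
ln -s /usr/bin/ssh /usr/local/bin/ssh
ln -s /usr/bin/scp /usr/local/bin/scp
```

This must be done on every node in the cluster before rerunning cluvfy or the OUI.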
Why does the NOAC attribute need to be set on NFS mounted RAC Binaries?
The noac attribute is required because the installer determines sharedness by creating a file and checking for that file's existence on the remote node. If the noac attribute is not enabled, this test will incorrectly fail, which confuses the installer and OPatch. The spfile kept in the default $ORACLE_HOME/dbs is also affected.
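For illustration, a Linux NFS mount for a shared Oracle home typically carries options along the following lines (the filer path and mount point are assumptions; exact options vary by platform and should be taken from the platform-specific documentation):

```
mount -o rw,bg,hard,nointr,tcp,vers=3,timeo=600,rsize=32768,wsize=32768,noac nas:/vol/orahome /u01/app/oracle
```

The key option for the behavior described above is noac, which disables attribute caching so both nodes see a consistent view of the shared home.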
Are there any issues for the interconnect when sharing the same switch as the
public network by using VLAN to separate the network?
RAC and Clusterware deployment best practices recommend that the interconnect be deployed on a stand-alone, physically separate, dedicated switch. Many customers have consolidated these stand-alone switches into larger managed switches. A consequence of this consolidation is a merging of IP networks on a single shared switch, segmented by VLANs. There are caveats associated with such deployments. RAC cache fusion exercises the IP network more rigorously than non-RAC Oracle databases. The latency, bandwidth, and availability requirements of the RAC/Clusterware interconnect IP network are more in line with high performance computing. Deploying the RAC/Clusterware interconnect on a shared switch, segmented VLAN may expose the interconnect links to congestion and instability in the larger IP network topology. If deploying the interconnect on a VLAN, there should be a 1:1 mapping of VLAN to non-routable subnet, and the interconnect should not span multiple VLANs (tagged) or multiple switches. Deployment concerns in this environment include Spanning Tree loops when the larger IP network topology changes, asymmetric routing that may cause packet flooding, and lack of fine-grained monitoring of the VLAN/port.
Why does netca always create a listener which listens on the public IP and not on the VIP only?
This is for backward compatibility with existing clients: consider a pre-10g to 10g server upgrade. If the upgraded listener listened only on the VIP, clients that did not upgrade would no longer be able to reach it.
Does changing uid or gid of the Oracle User affect Oracle Clusterware?
There are a lot of files in and outside of the Oracle Clusterware home that are chgrp'ed to the appropriate groups for security and appropriate access. The filesystem records the numeric uid and gid (not the user or group name), so if you change the uid or gid of the Oracle user, those files end up owned by the wrong user or group.
Can we output the backupset onto regular file system directly (not onto flash
recovery area) using RMAN command, when we use SE RAC?
Yes, - customers might want to backup their database to offline storage so this is also supported.
Should the SCSI-3 reservation bit be set for our Oracle Clusterware only
installation?
If you are using only Oracle Clusterware (no Veritas CM), then you do not need SCSI-3 PGR enabled, since Oracle Clusterware does not require it for I/O fencing. If the reservation is set, you will get inconsistent results, so ask your storage vendor to disable the reservation. Veritas RAC requires that the storage array support SCSI-3 PGR, since this is how Veritas handles I/O fencing. SCSI-3 PGR is set at the array level; for example, at the EMC hypervolume level.
If the antivirus software is known to lock files whilst it scans, then it is a good idea to exclude the Oracle datafiles/controlfiles/logfiles from a regular AV scan.
For more information on troubleshooting this error, see the following note:
Note: 219361.1 Troubleshooting ORA-29740 in a RAC Environment
How do I configure FCF with BPEL so I can use Oracle RAC 10g in the
backend?
Note: 372456.1 describes the procedure to set up BPEL with a Oracle RAC 10g Release 1
database.
If you are using SSL, ensure the SSL enable attribute of ONS in opmn.xml file has same value,
either true or false, for all OPMN servers in the Farm. To troubleshoot OPMN at the application
server level, look at appendix A in Oracle Process Manager and Notification Server
Administrator's Guide.
Until a valid backup is restored, the Oracle Clusterware will not start up due to the corrupt/missing OCR file.
The interesting question is what happens if you have the OCR mirrored and one of the copies gets corrupt. You would expect that everything will continue to work seamlessly. Almost: the real answer depends on when the corruption takes place.
If the corruption happens while the Oracle Clusterware stack is up and running, the corruption will be tolerated and the Oracle Clusterware will continue to function without interruption, despite the corrupt copy. The DBA is advised to repair the hardware/software problem that prevents the OCR from accessing the device as soon as possible; alternatively, the DBA can replace the failed device with a healthy device using the ocrconfig utility with the -replace flag.
If however the corruption happens while the Oracle Clusterware stack is down, it will not be possible to start it up until the failed device comes back online or some administrative action is taken using the ocrconfig utility with the -overwrite flag. When the Clusterware attempts to start you will see messages similar to:
total id sets (1), 1st set (1669906634,1958222370), 2nd set (0,0) my votes (1), total votes (2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0 (/dev/raw/raw1) doesn't
have enough votes (1,2)
2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in accessing physical
storage [26]
This is because the software cannot determine which OCR copy is the valid one. In the above example one of the OCR mirrors was lost while the Oracle Clusterware was down. There are three ways to fix this failure:
a) Fix whatever problem (hardware/software) prevents the OCR from accessing the device.
b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command overrides the vote check built into the OCR when it starts up. Basically, if the OCR is configured with a mirror, the OCR assigns each device one vote. The rule is that more than 50% of the total votes (a quorum) is needed to safely ensure the available devices contain the latest data. In two-way mirroring the total vote count is 2, so 2 votes are required to achieve the quorum; in the example above there are not enough votes to start if only one device with one vote is available. (In the earlier example, where the device failed while the OCR was running, the OCR assigns two votes to the surviving device, which is why that surviving device, now with two votes, can start after the cluster is down.) See the warning below.
c) This method is not recommended to be performed by customers. It is possible to manually modify ocr.loc to delete the failed device and restart the cluster. The OCR will not do the vote check if no mirror is configured. See the warning below.
EXTREME CAUTION should be exercised if choosing option b or c above, since data loss can occur if the wrong file is manipulated; please contact Oracle Support for assistance before proceeding.
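To make the repair options concrete, a sketch of the pre-11.2 ocrconfig invocations (the raw device path is an assumption):

```
# Replace a failed OCR mirror with a healthy device while the stack is up
ocrconfig -replace ocrmirror /dev/raw/raw3

# Force a start when quorum is lost and the Clusterware stack is down --
# only with Oracle Support guidance, per the caution above
ocrconfig -overwrite
```

Both commands are run as root on one node of the cluster.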
For the best availability and to ensure the application receives all FAN events, yes, you should update the configuration. To a certain degree, ONS will discover nodes: ONS runs on each node in the cluster and is aware of all other nodes. As long as the ONS on the middle tier can find at least one node in the cluster when it starts, it will find the rest of the nodes. However, if the only node up when the middle tier starts is the new node (absent from the middle tier's configuration), the middle tier will not find the cluster.
Why do we have a Virtual IP (VIP) in Oracle RAC 10g or 11g? Why does it just
return a dead connection when its primary node fails?
The goal is application availability.
When a node fails, the VIP associated with it is automatically failed over to some other node.
When this occurs, the following things happen.
(1) the VIP detects the public network failure, which generates a FAN event.
(2) the new node re-arps the world, indicating a new MAC address for the IP.
(3) connected clients subscribing to FAN immediately receive an ORA-3113 error or equivalent. Those not subscribing to FAN will eventually time out.
(4) new connection requests rapidly traverse the tnsnames.ora address list, skipping over the dead nodes instead of having to wait on TCP/IP timeouts.
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP
timeout period (which can be up to 10 min) before getting an error.
As a result, you don't really have a good HA solution without using VIPs and FAN. The easiest
way to use FAN is to use an integrated client with Fast Connection Failover (FCF) such as
JDBC, OCI, or ODP.NET.
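For illustration, a pre-11.2 client alias listing the per-node VIPs (the host and service names are assumptions) would look like:

```
ORCL =
  (DESCRIPTION =
    (ADDRESS_LIST =
      (LOAD_BALANCE = ON)
      (FAILOVER = ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = racnode1-vip)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = racnode2-vip)(PORT = 1521))
    )
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )
```

When a node fails, its VIP answers from another node with an immediate error, so the client moves to the next ADDRESS entry without waiting on a TCP timeout.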
What do the VIP resources do once they detect a node has failed/gone down? Are
the VIPs automatically acquired, and published, or is manual intervention
required? Are VIPs mandatory?
With Oracle RAC 10g or higher, each node requires a VIP. With Oracle RAC 11g Release 2, three additional SCAN VIPs are required for the cluster. When a node fails, the VIP associated with the failed node is automatically failed over to one of the other nodes in the cluster. When this occurs, two things happen:
1. The new node re-arps the world indicating a new MAC address for this IP address. For
directly connected clients, this usually causes them to see errors on their connections to
the old address;
2. Subsequent packets sent to the VIP go to the new node, which will send error RST
packets back to the clients. This results in the clients getting errors immediately.
In the case of existing SQL connections, errors will typically be in the form of ORA-3113 errors,
while a new connection using an address list will select the next entry in the list. Without using
VIPs, clients connected to a node that died will often wait for a TCP/IP timeout period before
getting an error. This can be as long as 10 minutes or more. As a result, you don't really have a
good HA solution without using VIPs.
With Oracle RAC 11g Release 2, you can delegate the management of the VIPs to the cluster. If you do this, the Grid Naming Service (part of Oracle Clusterware) will automatically allocate and manage all VIPs in the cluster. This requires a DHCP service on the public network.
If I use Services with Oracle RAC, do I still need to set up Load Balancing ?
Yes. Services give you a granular definition of workload, and the DBA can dynamically define which instances provide the service. Connection load balancing (provided by Oracle Net Services) still needs to be set up to allow user connections to be balanced across all instances providing a service. With Oracle RAC 10g Release 2 or higher, set CLB_GOAL on the service to define the type of load balancing you want: SHORT for short-lived connections (i.e. connection pools) or LONG (the default) for applications that hold connections open for long periods (i.e. Oracle Forms applications).
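As a sketch using 11g Release 2 srvctl syntax (the database and service names are assumptions):

```
# Set the connection load balancing goal to SHORT for a pooled workload
srvctl modify service -d orcl -s oltp -j SHORT
```

On 10g Release 2 the same attribute can be set through the DBMS_SERVICE package instead.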
How can a customer mask the change in their clustered database configuration
from their client or application? (I.E. So I do not have to change the connection
string when I add a node to the Oracle RAC database)
The combination of Server Side load balancing and Services allows you to easily mask cluster
database configuration changes. As long as all instances register with all listeners (use the
LOCAL_LISTENER and REMOTE_LISTENER parameters), server side load balancing will
allow clients to connect to the service on currently available instances at connect time.
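A sketch of the corresponding init.ora entries (the instance names and TNS aliases are assumptions; the aliases must resolve in tnsnames.ora):

```
*.remote_listener='LISTENERS_ORCL'       # alias resolving to the listeners on all nodes
orcl1.local_listener='LISTENER_ORCL1'    # per-instance local listener
orcl2.local_listener='LISTENER_ORCL2'
```

With these set, every instance registers with every listener, so any listener can redirect a connection to whichever instance currently offers the requested service.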
The load balancing advisory (activated by setting a goal on the service) gives advice on how many connections to send to each instance currently providing the service. When a service is enabled on an instance, as long as that instance registers with the listeners, clients can start getting connections to the service and the load balancing advisory will include that instance in its advice.
With Oracle RAC 11g Release 2, the Single Client Access Name (SCAN) provides a single
name to be put in the client connection string (as the address). Clients using SCAN never have to
change even if the cluster configuration changes such as adding nodes.
Oracle Clusterware and Oracle Real Application Clusters both support rolling upgrades of the OS software when the version of the Oracle Database is certified on both releases of the OS (and the OS is the same: no mixing Linux and Windows, AIX and Solaris, or 32-bit and 64-bit, etc.). This applies to a patch to the operating system, a patchset (such as EL4u4 to EL4u6), or a release (EL4 to EL5). Stay within a 24-hour upgrade window and fully test this path, as it is not possible for Oracle to test all the different paths and combinations.
What are my options for load balancing with Oracle RAC? Why do I get an
uneven number of connections on my instances?
All the types of load balancing available currently (9i-10g) occur at connect time.
This means that it is very important how one balances connections and what these connections
do on a long term basis.
Since establishing connections can be very expensive for your application, it is good
programming practice to connect once and stay connected. This means one needs to be careful as
to what option one uses. Oracle Net Services provides load balancing or you can use external
methods such as hardware based or clusterware solutions.
The following options exist prior to Oracle RAC 10g Release 2 (for 10g Release 2 see Load
Balancing Advisory):
Random
Either client side load balancing or hardware based methods will randomize the connections to
the instances.
On the negative side this method is unaware of load on the connections or even if they are up
meaning they might cause waits on TCP/IP timeouts.
Load Based
Server side load balancing (by the listener) redirects connections by default depending on the RunQ length of each of the instances. This is great for short-lived connections but terrible for persistent connections or login storms. Do not use this method for connections from connection pools or application servers.
Session Based
Server side load balancing can also be used to balance the number of connections to each instance. Session count balancing is the method used when you set the listener parameter prefer_least_loaded_node_<listener_name>=off. Note that the listener name is the actual name of the listener, which is different on each node in your cluster and by default is listener_<nodename>. Session based load balancing takes into account the number of sessions connected to each node and then distributes new connections to balance the number of sessions across the different nodes.
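For illustration, in listener.ora on a node whose listener is named LISTENER_NODE1 (a hypothetical name), the entry would be:

```
PREFER_LEAST_LOADED_NODE_LISTENER_NODE1 = OFF
```

Each node needs its own entry, since the listener name differs per node.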
ONS somewhere in the farm. Oracle RAC has its own ONS server, for which SSL is disabled by default. You must either enable SSL for the Oracle RAC ONS, or disable it for the OID ONS (OPMN). You need to create a wallet for each Oracle RAC ONS server, or copy one of the wallets from OPMN on the OID instances.
In ons.conf you need to specify the wallet file and password:
walletfile=
walletpassword=
ONS only uses SSL between servers, and so ONS clients will not be affected. You specify the
wallet password when you create the wallet. If you copy a wallet from an OPMN instance, then
use the same password configured in opmn.xml. If there is no wallet password configured in
opmn.xml, then you don't need to specify a wallet password in ons.conf either.
during the failure. This is a good time to review operating procedures and document recovery
procedures. Destructive testing should include tests such as node failure, instance failure, public
network failure, interconnect failures, storage failure, storage network failure, voting disk failure,
loss of an OCR, and loss of ASM.
Using features of Oracle Real Application Clusters and Oracle Clients including Fast Application
Notification (FAN), Fast Connection Failover (FCF), Oracle Net Service Connection Load
Balancing, and the Load Balancing Advisory, applications can mask most failures and provide a
very highly available application. For details on implementing best practices, see the MAA
document Client Failover Best Practices for Highly Available Oracle Databases and the Oracle
RAC Administration and Deployment Guide.
Can our Oracle RAC 10g VIP fail over from NIC to NIC as well as from node to
node ?
Yes, the Oracle RAC 10g VIP implementation is capable of failing over within a node from NIC to NIC (and back once the failed NIC is online again), as well as failing over between nodes. The NIC to NIC failover is fully redundant if redundant switches are installed.
I am using shared servers with the following set in the init.ora (shown via SQL> show parameters): dispatchers=(protocol=TCP)(listener=listeners_nl01)(con=500)(serv=oltp). I stopped my service with srvctl stop service but it is still registered with the listener and accepting connections. Is this expected?
YES. This is by design of dispatchers, which are part of Oracle Net Services. If you specify the service attribute of the dispatchers init.ora parameter, the service specified cannot be managed by the DBA.
Is it possible to use srvctl start database with a user account other than oracle (that is, other than the owner of the Oracle software)?
YES. When you create a RAC database as a user different from the home/software owner (oracle), the database creation assistant sets the correct permissions/ACLs on the CRS resources that control the database/instances etc. This assumes that you had set up group membership for this user in the dba group of the home (find it using oracle_home/bin/osdbagrp), that the user is also in the CRS home owner's primary group (usually oinstall), and that there is group write permission on the oracle_home.
With three primary load balancing options (client-side connect-time LB, server-side connect-time LB, and runtime connection load balancing), is it fair to say Runtime Connection Load Balancing is the only option to leverage FAN up/down events?
No. The listener is a subscriber to all FAN events (both the load balancing advisory events and the HA events). Therefore server side connection load balancing leverages FAN HA events as well as load balancing advisory events.
With the Oracle JDBC driver 10g Release 2, if you enable Fast Connection Failover, you also
enable Runtime Connection Load Balancing (one knob for both).
What does the Virtual IP service do? I understand it is for failover but do we
need a separate network card? Can we use the existing private/public cards?
What would happen if we used the public ip?
The 10g Virtual IP address (VIP) exists on every RAC node for public network communication. All client communication should use the VIPs in their TNS connection descriptors: the TNS ADDRESS_LIST entries should direct clients to VIPs rather than hostnames. During normal runtime the behaviour is the same as with hostnames; however, when the node goes down or is shut down, the VIP is hosted elsewhere in the cluster and does not accept connection requests. This results in an immediate TCP/IP error rather than a timeout, and the client fails over immediately to the next TNS address. If a network interface fails within a node, the VIP can be configured to use alternate interfaces in the same node. The VIP must use the public interface cards. There is no requirement to purchase additional public interface cards (unless you want to take advantage of within-node card failover).
Do I need to install the ONS on all my mid-tier serves in order to enable JDBC
Fast Connection Failover (FCF)?
With 10g Release 1, the middle tier must have ONS running (started by the same user as the application). ONS is not included on the Client CD; however, it is part of the Oracle Database 10g CD.
With 10g Release 2 or later, you do not need to install ONS on the middle tier. The JDBC driver allows the use of remote ONS (i.e. it uses the ONS running in the RAC cluster). Just use the datasource parameter ods.setONSConfiguration("nodes=racnode1:4200,racnode2:4200");
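A minimal sketch of a 10g Release 2 JDBC datasource configured for FCF with remote ONS; the host names, ports, URL, and credentials are assumptions, and the Oracle JDBC driver must be on the classpath:

```java
import java.sql.Connection;
import oracle.jdbc.pool.OracleDataSource;

public class FcfExample {
    public static void main(String[] args) throws Exception {
        OracleDataSource ods = new OracleDataSource();
        ods.setURL("jdbc:oracle:thin:@//racnode1-vip:1521/oltp"); // hypothetical URL
        ods.setUser("scott");
        ods.setPassword("tiger");
        ods.setConnectionCachingEnabled(true);      // Implicit Connection Cache
        ods.setFastConnectionFailoverEnabled(true); // subscribe to FAN events
        ods.setONSConfiguration("nodes=racnode1:4200,racnode2:4200"); // remote ONS
        try (Connection conn = ods.getConnection()) {
            // use the connection; on a DOWN event, affected cached
            // connections are cleaned up by the driver
        }
    }
}
```

Note that enabling FCF requires the Implicit Connection Cache to be enabled, as shown.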
Can I use the 10.2 JDBC driver with 10.1 database for FCF?
Yes. With the patch for Bug 5657975 on 10.2.0.3, the 10.2 JDBC driver will work with a 10.1 database. The fix will be part of the 10.2.0.4 patchset. If you do not have the patch, then for FCF use the 10.2 JDBC driver with a 10.2 database; if the database is 10.1, use the 10.1 JDBC driver.
and OCI. Other applications can integrate with FAN by using the API to subscribe to the FAN events.
Note: If you are using a 3rd party application server, then you can only use FCF if you use the Oracle driver and (except for OCI) its connection pool. If you are using the connection pool of the 3rd party application server, then you do not get FCF. Your customer can subscribe directly to FAN events, but that is a development project for the customer. See the white paper Workload Management with Oracle RAC 10g on OTN.
How does the datasource properties initialLimit, minLimit, and maxLimit affect
Fast Connection Failover processing with JDBC?
The initialLimit property on the Implicit Connection Cache is effective only when the cache is first created. For example, if initialLimit is set to 10, you will have 10 connections pre-created and available when the connection cache is first created. Please do not confuse minLimit with initialLimit. The current behavior is that after a DOWN event, once the affected connections are cleaned up, it is possible for the number of connections in the cache to be lower than minLimit.
An UP event is processed both for (a) a new instance joining and (b) a down followed by an instance UP. This has no relevance to initialLimit, or even minLimit. When an UP event comes into the JDBC Implicit Connection Cache, some new connections are created. Assuming your listener load balancing is set up properly, those connections should go to the instance that was just started. When your application gets a connection from the pool, it will be given an idle connection; if you are running 10.2 and have the load balancing advisory turned on for the service, the session will be allocated based on the defined goal to provide the best service level.
MaxLimit, when set, defines the upper boundary for the connection cache. By default, maxLimit is unbounded - your database sets the limit.
What type of callbacks are supported with OCI when using FAN/FCF?
There are two separate callbacks supported. The HA Events (FAN) callback is called when an
event occurs. When a down event occurs, for example, you can clean up a custom connection
pool. i.e. purge stale connections. When the failover occurs, the TAF callback is invoked. At
failover time you can customize the newly created database session. Both FAN and TAF are
client-side callbacks. FAN also has a separate server side callout that should not be confused
with the OCI client callback.
I am seeing the wait events 'ges remote message', 'gcs remote message', and/or
'gcs for action'. What should I do about these?
These are idle wait events and can be safely ignored. The 'ges remote message' event might show up in a 9.0.1 statspack report as one of the top wait events. To stop it from showing up, you can add the event to the PERFSTAT.STATS$IDLE_EVENT table so that it is not listed in Statspack reports.
What are the changes in memory requirements from moving from single
instance to RAC?
If you keep the workload requirements per instance the same, then about 10% more buffer cache and 15% more shared pool is needed. The additional memory requirement is due to data structures for coherency management. These values are heuristic and are mostly upper bounds. Actual resource usage can be monitored by querying the current and maximum columns for the gcs resource/locks and ges resource/locks entries in V$RESOURCE_LIMIT.
In general, though, take into consideration that memory requirements per instance are reduced when the same user population is distributed over multiple nodes. Assuming the same user population, with N the number of nodes and M the buffer cache for a single system, each instance needs roughly:
(M / N) + ((M / N) * 0.10) [ + extra memory to compensate for failed-over users ]
For example, with M = 2G, N = 2 and no extra memory for failed-over users:
(2G / 2) + ((2G / 2) * 0.10) = 1G + 100M
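The heuristic above can be sketched as a small calculation (the function name and the failed-over-users parameter are inventions for illustration):

```python
def rac_buffer_cache_per_instance(single_cache_mb, nodes, overhead=0.10,
                                  failover_extra_mb=0.0):
    """Per-instance buffer cache estimate from the FAQ heuristic:
    split the single-instance cache across N nodes and add ~10%
    for RAC coherency data structures."""
    share = single_cache_mb / nodes
    return share * (1 + overhead) + failover_extra_mb

# A 2048 MB single-instance cache across 2 nodes: 1024 MB + 10% overhead
print(rac_buffer_cache_per_instance(2048, 2))
```

Remember these are upper-bound estimates; monitor V$RESOURCE_LIMIT for actual usage.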
Will adding a new instance to my Oracle RAC database (new node to the cluster)
allow me to scale the workload?
YES! Oracle RAC allows you to dynamically scale out your workload by adding another node to
the cluster. You must remember that adding more work to the database means that in addition to
the CPU and Memory that the new node brings, you will have to ensure that your I/O subsystem
can support the additional I/O requirements. In an Oracle RAC environment, you need to look at
the total I/O across all instances in the cluster.
Can I have different servers in my Oracle RAC? Can they be from different
vendors? Can they be different sizes?
Oracle Real Application Clusters (RAC) requires all the nodes to run the same operating system binary in a cluster (i.e. all nodes must be Windows 2008, or all nodes must be OEL 4). All nodes must also be the same architecture: all 32-bit or all 64-bit, and on HP-UX all PA-RISC, since you cannot mix PA-RISC with Itanium.
Oracle RAC does support a cluster with nodes that have different hardware configurations. An
example is a cluster with 3 nodes with 4 CPUs and another node with 6 CPUs. This can easily
occur when adding a new node after the cluster has been in production for a while. For this type
of configuration, customers must consider some additional features to get the optimal cluster
performance. The servers used in the cluster can be from different vendors; this is fully
supported as long as they run the same binaries. Since many customers implement Oracle RAC
for high availability, you must make sure that your hardware vendor will support the
configuration. If you have a failure, will you get support for the hardware configuration?
The installation of Oracle Clusterware expects the network interface to be the same name on all
nodes in the cluster. If you are using different hardware, you may need to work with your
operating system vendor to make sure the network interface names are the same name on all
nodes (IE eth0). Customers implementing uneven cluster configurations need to consider how
they will balance the workload across the cluster. Some customers have chosen to manually
assign different workloads to different nodes. This can be done using database services however
it is often difficult to predict workloads and the system cannot dynamically react to changes in
workload. Changes to workload require the DBA to modify the service. You will also need to
consider how you will survive failures in the cluster. Will the service levels be maintained if the
larger node in the cluster fails? Especially in a small cluster, the impact of losing a node could
impact the ability to continue processing the application workload.
The impact of different sized nodes depends on how large the difference is. If there is a large difference between the nodes in terms of memory and CPU, then the "bigger" nodes will obviously attract more load, and in the case of failure the "smaller" node(s) will become overloaded. In such a case, static routing of workload via services (e.g. batch and certain services, which can be suspended/stopped if the large node fails and the cluster has significantly reduced capacity) may be advisable. The general recommendation is that the nodes should be sized in such a way that the aggregated peak load of the large node(s) can be absorbed by the smaller node(s), i.e. the smaller nodes should have sufficient capacity to run the essential services alone. Another option is to add another small node to the cluster on demand in case the large one fails.
It should also be noted, especially if there is a large difference between the sizes of the nodes, that the small nodes can slow down the larger node. This can be critical if a smaller node is very busy and must serve data to the large node.
To help balance workload across a cluster, Oracle RAC 10g Release 2 and above provides the
Load Balancing Advisory (LBA). The load balancing advisory runs in an Oracle RAC database
and monitors the work executed by the service on all instances where the service is active in the
cluster. The LBA provides recommendations to the subscribed clients about the state of the
service and where the client should direct connection requests. Setting the GOAL on the service
activates the load balancing advisory. Clients that can utilize the load balancing advisory are
Oracle JDBC Implicit Connection Cache, Oracle Universal Connection Pool for Java, Oracle
Call Interface Session Pool, ODP.NET Connection Pool, and Oracle Net Services Connection
Manager. The Oracle Listener also uses the Load Balancing Advisory if the CLB_GOAL
parameter is set to SHORT (the recommended best practice when using one of the integrated
Oracle clients mentioned here). If CLB_GOAL is set to LONG (the default), the Listener will
balance the number of sessions for the service across the instances where the service is available.
See the Oracle Real Application Clusters Administration and Deployment Guide for details on
implementing services and the various parameter settings.
8. Get familiar with Enterprise Manager Grid Control to manage the cluster; it helps eliminate
much of the complexity of managing many nodes.
9. Why stop at 6 nodes? A maximum of 3-way messaging ensures that Oracle RAC can scale
much, much further.
What are my options for setting the Load Balancing Advisory GOAL on a
Service?
The load balancing advisory is enabled by setting the GOAL on your service, either through the
PL/SQL DBMS_SERVICE package or the EM Database Control Clustered Database Services
page. There are 3 options for GOAL:
NONE - the default setting; turns off the advisory.
THROUGHPUT - work requests are directed based on throughput. This should be used when the
work in a service completes at homogeneous rates. An example is a trading system where work
requests are of similar lengths.
SERVICE_TIME - work requests are directed based on response time. This should be used when
the work in a service completes at varying rates. An example is an internet shopping system
where work requests are of various lengths.
Note: If using GOAL, you should set CLB_GOAL=SHORT
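As an illustration, the settings above can be applied with the DBMS_SERVICE package; this is a sketch only, and the service name OLTP is hypothetical:

```sql
-- Sketch only: 'OLTP' is a hypothetical service name; adjust to your environment.
-- GOAL_SERVICE_TIME activates the Load Balancing Advisory for response-time-based
-- balancing; CLB_GOAL_SHORT lets the listener use the advisory as well.
BEGIN
  DBMS_SERVICE.MODIFY_SERVICE(
    service_name => 'OLTP',
    goal         => DBMS_SERVICE.GOAL_SERVICE_TIME,
    clb_goal     => DBMS_SERVICE.CLB_GOAL_SHORT);
END;
/
```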
Does Database blocksize or tablespace blocksize affect how the data is passed
across the interconnect?
Oracle ships database block buffers, i.e. blocks in a tablespace configured with a 16K block size
will be shipped as 16K data buffers, blocks residing in a tablespace with the base block size (8K)
will be shipped as base blocks, and so on; the data buffers are broken down into packets of MTU
size.
How should I deal with space management? Do I need to set free lists and free
list groups?
Manually setting free list groups is a complexity that is no longer required.
We recommend using Automatic Segment Space Management rather than trying to manage
space manually. Unless you are migrating from an earlier database version with OPS and have
already built and tuned the necessary structures, Automatic Segment Space Management is the
preferred approach.
Automatic Segment Space Management is NOT the default; you need to set it.
For more information see:
Note: 180608.1 Automatic Space Segment Management in RAC Environments
I was installing Oracle 9i RAC and my Oracle files did not get copied to the
remote node(s). What went wrong?
First make sure the cluster is running and is available on all nodes. You should be able to see all
nodes when running an 'lsnodes -v' command.
If lsnodes shows that all members of the cluster are available, then you may have an rcp/rsh
problem on Unix, or shares may not have been configured on Windows.
You can test rcp/rsh on Unix by issuing the following from each node:
[node1]/tmp> touch test.tst
[node1]/tmp> rcp test.tst node2:/tmp
[node2]/tmp> touch test.tst
[node2]/tmp> rcp test.tst node1:/tmp
On Windows, ensure that each node has administrative access to all these directories within the
Windows environment by running the following at the command prompt:
NET USE \\host_name\C$
Each machine has a slightly different clock frequency and, as a result, a slightly different time
drift. NTP computes this time drift about every 15 minutes and stores the information in a "drift"
file; it then adjusts the system clock based on this known drift, as well as comparing it to a given
time server.
How is Oracle Enterprise Manager integrated with the Oracle RAC 11g Release
2 stack?
Oracle Enterprise Manager (EM) is available in 2 versions: Oracle EM Grid Control and Oracle
EM Database Control. Oracle EM Grid Control follows a different release cycle than the
Oracle Database, while a new version of Oracle EM Database Control is available with
every new database release.
At the time of writing, Oracle EM Grid Control is available in version 10.2.0.5. This version
does not support the new features of Oracle Database 11g Release 2. An Oracle Database 11g
Release 2, however, can be managed with the current version of Oracle EM Grid Control with
some restrictions (no 11.2 feature support).
With Oracle Database and Grid Infrastructure 11g Release 2, Oracle EM Database Control is
now able to manage the full Oracle RAC 11g Release 2 stack. This includes: Oracle RAC
Databases, Oracle Clusterware, and Oracle Automatic Storage Management.
The new feature that needs to be noted here is the full management of Oracle Clusterware 11g
Release 2 with Oracle EM Database Control 11g Release 2. For more information and details,
see publicly available Technical White Paper: The New Oracle Enterprise Manager Database
Control 11g Release 2 - Now Managing Oracle Clusterware
What storage option should I use for Oracle RAC on Linux? ASM / OCFS / Raw
Devices / Block Devices / Ext3 ?
The recommended way to manage large amounts of storage in an Oracle RAC environment is
ASM (Automatic Storage Management). If you really need/want a clustered filesystem, then
Oracle offers OCFS (Oracle Clustered File System); for 2.4 kernel (RHEL3/SLES8) use OCFS
Version 1, and for the 2.6 kernel (RHEL4/SLES9) use OCFS2. All these options are free to use
and completely supported; ASM is bundled with the RDBMS software, and OCFS as well as
ASMLib are freely downloadable from Oracle's OSS (Open Source Software) website.
EXT3 is out of the question, since its data structures are not cluster-aware; if you mount an ext3
filesystem from multiple nodes, it will quickly become corrupted.
Other options, of course, are NFS and iSCSI; both are outside the scope of this FAQ but are
included for completeness.
If for any reason the above options (ASM/OCFS) are not good enough and you insist on using
'raw devices' or 'block devices', here are the details on the two (this information is still very
useful to know in the context of ASM and OCFS).
On Unix/Linux there are two types of devices:
block devices (e.g. /dev/sde9) are **BUFFERED** devices! Unless you explicitly open them
with O_DIRECT, you will get buffered (Linux buffer cache) IO.
character devices (e.g. /dev/raw/raw9) are **UNBUFFERED** devices! No matter how you open
them, you always get unbuffered IO, hence there is no need to specify O_DIRECT on the file
open call.
The above is not a typo: block devices on Linux do buffered IO by default (cached in the Linux
buffer cache), which means that RAC cannot operate on them (unless they are opened with
O_DIRECT), since the IOs would not be immediately visible to other nodes.
You may check if a device is block or character device by the first letter printed with the "ls -l"
command:
crw-rw---- 1 root disk 162, 1 Jan 23 19:53 /dev/raw/raw1
brw-rw---- 1 root disk 8, 112 Jan 23 14:51 /dev/sdh
Above, "c" stands for character device, and "b" for block devices.
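A quick way to check the device type without parsing ls output is stat; /dev/null is used below only because it exists on every Linux system, so substitute your own device paths (e.g. /dev/sdh or /dev/raw/raw1):

```shell
# Print the file type of a device node; "block special file" vs
# "character special file" is what matters for the buffering behavior above.
# /dev/null is a stand-in; use your raw/block device paths on a real system.
stat -c '%F' /dev/null
```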
Starting with Oracle 10.1, an RDBMS fix added the O_DIRECT flag to the open call (the
O_DIRECT flag tells the Linux kernel to bypass the Linux buffer cache and write directly to
disk). In the case of a block device, that meant that a create datafile on '/dev/sde9' would succeed
(you need to set filesystemio_options=directio in the init.ora). This enhancement was well
received, and shortly after, bug 4309443 was fixed (by adding the O_DIRECT flag on the OCR
file open call), meaning that starting with 10.2 (there are several 10.1 backports available) the
Oracle OCR file could also access block devices directly. For the voting disk to be opened with
O_DIRECT you need the fix for bug 4466428 (5021707 is a duplicate). This means that both
voting disks and OCR files can live on block devices. However, due to OUI bug 5005148, there
is still a need to configure raw devices for the voting or OCR files during installation of RAC;
this is not such a big deal, since it is just 5 files in most cases. It is not possible to ask for a
backport of this bug, since that would mean a full re-release of 10g; if raw devices are not a good
option, one alternative is to use 11g Clusterware (with a 10g RAC database).
By using block devices you no longer have to live with the limit of 255 raw devices per node;
you can access as many block devices as the system can support. Block devices also carry
persistent permissions across reboots, while with raw devices one would have to customize the
permissions after installation; otherwise the Clusterware stack or database would fail to start up
due to permission issues.
ASM or ASMLib can be given the raw devices (e.g. /dev/raw/raw2), as was done in initial
deployments of 10g Release 1, or, the more recommended way, ASM/ASMLib can be given the
block devices directly (e.g. /dev/sde9).
Since raw devices are being phased out of Linux in the long term, it is recommended that
everyone switch to using block devices (meaning, pass the block devices to ASM, OCFS/OCFS2,
or Oracle Clusterware).
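For example, pointing ASM at block devices is typically just a matter of the discovery string; this is a sketch only, and the path pattern is hypothetical:

```sql
-- Sketch only: run in the ASM instance; '/dev/sde*' is a hypothetical pattern
-- matching the block devices you want ASM to discover.
ALTER SYSTEM SET ASM_DISKSTRING = '/dev/sde*' SCOPE=SPFILE;
```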
Note: With Oracle Database 11g Release 2, the Oracle Clusterware files (OCR and Voting Disks)
can be stored in ASM, and this is the Best Practice. The Oracle Universal Installer and the
configuration assistants (i.e. DBCA, NETCA) will not support raw/block devices. All command
line interfaces will still support raw/block devices for this release. Therefore, if you are using
raw/block devices today, you can continue to use them, and upgrading to 11g Release 2 will not
change the location of any files. However, due to the desupport in the next release, you are
recommended to plan a migration to a supported storage option. Files supported natively in
ASM will not be supported in production on the ASM Cluster File System (ACFS).
What are the implications of using srvctl disable for an instance in my Oracle
RAC cluster? I want to have it available to start if I need it, but at this time I do
not want to run this extra instance for this database.
During a node reboot, any disabled resources will not be started by the Clusterware, so the
instance will not be restarted. It is recommended that you leave the VIP, ONS, and GSD enabled
on that node. For example, the VIP address for the node is present in the address list of the
database services, so a client connecting to these services will still reach some other database
instance providing that service via listener redirection. Just be aware that by disabling an
instance on a node, all that
means is that the instance itself is not starting. However, if the database was originally created
with 3 instances, that means there are 3 threads of redo. So, while the instance itself is disabled,
the redo thread is still enabled, and will occasionally cause log switches. The archived logs for
this 'disabled' instance will still be needed in any potential database recovery scenario. So, if you
are going to disable the instance through srvctl, you may also want to consider disabling the redo
thread for that instance.
srvctl disable instance -d orcl -i orcl2
SQL> alter database disable public thread 2;
Do the reverse to enable the instance.
SQL> alter database enable public thread 2;
srvctl enable instance -d orcl -i orcl2
GSD is only needed for configuration/management of a cluster database. Once the database has
been configured and is up, GSD can be safely stopped provided you do not run any of the
'srvctl', 'dbca' or 'dbua' tools. In Oracle 9i RAC, the GSD does not write anywhere unless
tracing was turned on, in which case traces go to stdout.
Once the database has been configured and started and you don't use 'srvctl or EM' to manage or
'dbca to extend/remove' or 'dbua to upgrade' this database, GSD can be stopped.
Note: With Oracle RAC 11g Release 2, the gsd resource is disabled by default. You will only
need to enable the resource if you are running Oracle 9i RAC in the same cluster.
How do I identify which node was used to install the cluster software and/or
database software?
You can find out which node by running the olsnodes command. The node which is returned first
is the node from which the software was installed and from which patches should be installed.
Note: When applying patches in a rolling fashion, you are recommended to run the rolling scripts
from the last node added to the cluster first and follow the list in reverse order.
Are the Oracle Clusterware bundle patches cumulative, do they conflict with one
another?
Fix-wise, the Oracle Clusterware bundles are cumulative, that is, CRS bundle #3 fixes all the
issues that bundle #2 did, and some additional ones, see Note:405820.1 for complete list of bugs
fixed in each bundle.
However, OPatch does not allow you to apply ANY patch if there are overlapping libraries or
binaries between an already installed patch and the to-be-installed patch. If two patches touch a
particular file, e.g. kcb.o, then the existing patch must be manually removed before the new one
is applied.
So, bundle patches are cumulative; however, they do conflict with one another due to the way
OPatch handles patch application, hence the previous bundle must be manually removed before a
new one is applied.
To check if any two patches conflict invoke OPatch as per Note:458485.1 or using:
$ OPatch/opatch prereq CheckConflictAmongPatches -phbasefile patchlist
where patchlist is a text file containing all the patch numbers to be checked, separated by a
newline.
I have added a second network to my cluster, can I load balance my users across
this network?
Server-side load balancing will only work on a single network, which is configured as the public
network with the Oracle VIPs. If you add a second network with a second listener, do not add
this new listener to the LOCAL_LISTENER and REMOTE_LISTENER parameters. You can use
client-side load balancing and failover for users connecting to this network; however, you will be
unable to use server-side load balancing or receive FAN events for this network.
Oracle RAC 11g Release 2 adds the support for multiple public networks. Connections will be
load balanced across the instances. Each network will have its own service. To enable load
balancing use the LISTENER_NETWORKS parameter instead of LOCAL_LISTENER and
REMOTE_LISTENER.
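A minimal sketch of the 11g Release 2 parameter, assuming a hypothetical second network with its own listener alias and SCAN:

```sql
-- Sketch only: network2, LISTENER_NET2 and rac-scan2 are hypothetical names;
-- substitute the listener aliases configured for your second public network.
ALTER SYSTEM SET LISTENER_NETWORKS =
  '((NAME=network2)(LOCAL_LISTENER=LISTENER_NET2)(REMOTE_LISTENER=rac-scan2:1522))'
  SCOPE=BOTH SID='*';
```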
Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215,
however sqlplus can start it on both nodes? What is the problem?
This could be many things, but a popular issue is when you have a separate ASM home and the
listener is running out of that home (it was the first home installed). Srvctl needs a TNS_ADMIN
alias to the network/admin directory of that home instead of using the default
ORACLE_HOME/network/admin of the database you are trying to start. For srvctl to work you
must run
srvctl setenv nodeapps -n <node> -T TNS_ADMIN=<full path>
on each node in the cluster.
You cannot rely on a TNS_ADMIN environment variable.
See Note 420977.1
Another cause is non-existent spfile, see Note 732683.1
I have 2 clusters named "crs" (the default), how do I get Grid Control to
recognize them as targets?
There are 2 options:
a) If the Grid Control agent install (which is a separate install) has already been done and has
picked up the name of the cluster as it was configured in CRS, one can go to the EM console as-is
for the first cluster and, for the second, manually delete and rediscover the target. When you
rediscover the target, give it whatever display name you like.
b) Prior to performing the Grid Control agent install, set the CLUSTER_NAME environment
variable and then run the install. This variable needs to be set only for that install session; there is
no need to set it every time the agent starts.
I found in 10.2 that the EM "Convert to Cluster Database" wizard would always
fall over on the last step where it runs emca and needs to log into the new cluster
database as dbsnmp to create the cluster database targets etc. I changed the
password for the dbsnmp account to be dbsnmp (same as username) and it
worked OK. Is this a known issue?
The conversion to a cluster database happens successfully, but the EM monitoring credentials for
the converted database are not properly set due to this bug. This is resolved in the next patchset.
In the interim, the user can set the monitoring password from the "Monitoring Configuration"
screen for the RAC database in the GC console and proceed.
This issue has been fixed in the 10.2.0.3 database patchset, and to get the complete functionality
you will also need the 10.2.0.2 Grid Control patch, as the fix is spread between the two pieces of
software. For now you can proceed by setting the password for the dbsnmp user to be the same
as that of the sys user.
environment. The wide domain of deployment of CVU ranges from initial hardware setup
through a fully operational cluster for RAC deployment, and covers all the intermediate stages of
installation and configuration of the various components. Cluvfy does not take any corrective
action following the failure of a verification task, does not enter into the areas of performance
tuning or monitoring, does not perform any cluster or RAC operations, and does not attempt to
verify the internals of the cluster database or cluster elements.
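Typical invocations look like the following sketch; the node names are hypothetical, and cluvfy must be run from the staged software or an installed home:

```shell
# Verify hardware/OS setup after cabling and storage configuration
cluvfy stage -post hwos -n node1,node2 -verbose

# Verify prerequisites before installing Oracle Clusterware
cluvfy stage -pre crsinst -n node1,node2 -verbose
```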
What versions of the database can I use the cluster verification utility (cluvfy)
with?
The cluster verification utility is released with Oracle Database 10g Release 2, but it can also be
used with Oracle Database 10g Release 1.
clocking, and will thus not function. These NICs were made cheaper by assuming that the switch
would provide the clock. Unfortunately, there is no way to know which NICs lack that clock.
b) Media-sense behaviour on various OSs (most notably Windows) will bring a NIC down when
a cable is disconnected. Either of these issues can lead to cluster instability and to ORA-29740
errors (node evictions).
Due to the benefits and stability provided by a switch, its affordability (about $200 for a simple
16-port GigE switch), and the expense and time involved in dealing with issues when one does
not exist, this is the only supported configuration.
From a purely technical point of view, Oracle does not care whether the customer uses crossover
cables, routers, or switches to deliver a message. However, we know from experience that a lot
of adapters misbehave when used in a crossover configuration and cause a lot of problems for
RAC. Hence we have stated on Certify that we do not support crossover cables, to avoid false
bugs and finger-pointing amongst the various parties: Oracle, hardware vendors, OS vendors, etc.
Please check the certification matrix available through Metalink for your specific release.
After installing patchset 9013 and patch_2313680 on Linux, the startup was very
slow
Please carefully read the following new information about configuring Oracle Cluster
Management on Linux, provided as part of the patch README:
Three parameters affect the startup time:
soft_margin (defined at watchdog module load)
-m (watchdogd startup option)
WatchdogMarginWait (defined in nmcfg.ora).
Oracle Clusterware fails to start after a reboot due to permissions on raw devices
reverting to default values. How do I fix this?
After a successful installation of Oracle Clusterware, a simple reboot and Oracle Clusterware
fails to start. This is because the permissions on the raw devices for the OCR and voting disks,
e.g. /dev/raw/raw{x}, revert to their default values (root:disk) and become inaccessible to Oracle.
This change of behavior started with the 2.6 kernel: in RHEL4, OEL4, RHEL5, OEL5, SLES9,
and SLES10. In RHEL3 the raw devices maintained their permissions across reboots, so this
symptom was not seen.
The way to fix this on RHEL4, OEL4, and SLES9 is to create /etc/udev/permissions.d/40-
udev.permissions (you must choose a number lower than 50). You can do this by copying
/etc/udev/permissions.d/50-udev.permissions and removing the lines that are not needed
(50-udev.permissions gets replaced on upgrades, so you do not want to edit it directly; also, a
typo in 50-udev.permissions can render the system unusable). Example permissions file:
# raw devices
raw/raw[1-2]:root:oinstall:0640
raw/raw[3-5]:oracle:oinstall:0660
Note that this applies to all raw device files; here only the voting and OCR devices were
specified.
Can different releases of Oracle RAC be installed and run on the same physical
Linux cluster?
Yes. However, Oracle Clusterware (CRS) will not support an Oracle 9i RAC database, so you
will have to leave the current 9i configuration in place. You can install Oracle Clusterware and
Oracle RAC 10g or 11g into the same cluster. On Windows and Linux, you must run the 9i
Cluster Manager for the 9i database and Oracle Clusterware for the 10g database. When you
install Oracle Clusterware, your 9i srvconfig file will be converted to the OCR. Oracle 9i RAC,
Oracle RAC 10g, and Oracle RAC 11g will all use the OCR. Do not restart the 9i gsd after you
have installed Oracle Clusterware. Remember to check Certify for details of what vendor
clusterware can be run with Oracle Clusterware. Oracle Clusterware must be the highest version
in the cluster (down to the patchset level); i.e. Oracle Clusterware 11g Release 2 will support
Oracle RAC 10g and Oracle RAC 11g databases, while Oracle Clusterware 10g can only support
Oracle RAC 10g databases.
Is the hangcheck timer still needed with Oracle RAC 10g and 11g?
YES! The hangcheck-timer module monitors the Linux kernel for extended operating system
hangs that could affect the reliability of a RAC node (I/O fencing) and cause database
corruption. To verify that the hangcheck-timer module is running on every node:
as root user:
/sbin/lsmod | grep hangcheck
If the hangcheck-timer module is not listed enter the following command as
the root user:
9i: /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
hangcheck_reboot=1
10g & 11g: /sbin/insmod hangcheck-timer hangcheck_tick=1 hangcheck_margin=10
hangcheck_reboot=1
To ensure the module is loaded every time the system reboots, verify that the local system
startup file (/etc/rc.d/rc.local) contains the command above.
For additional information please review the Oracle RAC Install and Configuration Guide (5-41)
and note:726833.1.
Customer did not load the hangcheck-timer before installing RAC, Can the
customer just load the hangcheck-timer ?
YES. The hangcheck-timer is a kernel module that is shipped with the Linux kernel; all you have
to do is load it as follows:
9i: /sbin/insmod hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
hangcheck_reboot=1
would have been serious. But since the customer is installing this on a fresh new box, they can
continue the install.
Does Oracle Support Oracle RAC with Solaris 10 Containers (aka Zones)?
No. Oracle RAC is currently not supported in Solaris 10 local containers. You can use a global
container, but remember: one global container per system or per domain. So, if your hardware is
capable of being split into domains, you may have more than one global container on the whole
system (hardware), that is, one per domain.
In local containers, you cannot manipulate hardware in any way, shape, or form. You cannot
plumb or unplumb network interfaces, even as the local container root user; you can only do this
in the global container. We rely on the uadmin command to quickly bring down a node if an
urgent condition is detected, and you cannot do that from a local container either. CRS has to
maintain the ability to manipulate hardware, and that just is not going to happen in a local
container.
The answer is the same if you are using Vendor Clusterware such as Veritas SF RAC or Sun
Cluster.
In Solaris 10, do we need Sun Cluster to provide redundancy for the interconnect
and multiple switches?
Link Aggregation (GLDv3) is bundled in the OS as of Solaris 10. IPMP is available for Solaris
10 and Solaris 9. Neither require Sun Cluster to be installed. For the interconnect and switch
redundancy, as a best practice, avoid VLAN trunking across the switches. We can configure
stand-alone redundant switches that do not require the VLAN to be trunked between them, nor
the need for an inter-switch link (ISL). If the interconnect VLAN is trunked with other VLANs
between the redundant switches, ensure that the interconnect VLAN is pruned from the trunk to
avoid unnecessary traffic propagation through the corporate network. For ease of configuration
(e.g. fewer IP address requirements), use IPMP with link mode failure detection in
primary/standby configuration. This will give you a single failover IP which you will define in
cluster_interconnects init.ora parameter. Remove any interfaces for the interconnect from the
OCR using `oifcfg delif`. AND TEST THIS RIGOROUSLY. For now, as Link Aggregation
(GLDv3) cannot span multiple switches from a single host, you will need to configure the switch
redundancy and the host NICs with IPMP. When configuring IPMP for the interconnect with
multiple switches available, configure IPMP as active/standby and *not* active/active. This is to
avoid potential latencies in switch failure detection/failover which may impact the availability of
the rdbms. Note, IPMP spreads/load balances outbound packets on the bonded interfaces, but
inbound packets are received on a single interface. In an active/active configuration this makes
send/receive problems difficult to diagnose. Both Link Aggregation (GLDv3) and IPMP are core
OS packages SUNWcsu, SUNWcsr respectively and do not require Sun Clusterware.
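Defining the single IPMP failover address in the init.ora can be sketched as follows; the address and instance name are hypothetical:

```sql
-- Sketch only: 192.168.10.1 is the hypothetical IPMP failover address for the
-- interconnect; set one value per instance (SID) in the cluster.
ALTER SYSTEM SET CLUSTER_INTERCONNECTS = '192.168.10.1'
  SCOPE=SPFILE SID='racdb1';
```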
Can I configure HP's Autoport aggregation for NIC Bonding after the install?
(i.e. not present beforehand)
You are able to add NIC bonding after the installation, although this is more complicated than
doing it beforehand.
There are several notes on WebIV regarding this.
Note 276434.1 Modifying the VIP of a Cluster Node
Regarding the private interconnect, please use oifcfg delif / setif to modify this.
Is HMP supported with Oracle RAC 10g or Oracle RAC 11g on all HP platforms
?
Does the Oracle Cluster File System (OCFS) support network access through
NFS or Windows Network Shares?
No, in the current release the Oracle Cluster File System (OCFS) is not supported for use by
network access approaches like NFS or Windows Network Shares.
Can I run my Oracle 9i RAC and Oracle RAC 10g on the same Windows
cluster?
Yes, but the Oracle 9i RAC database must use the 9i Cluster Manager, and you must run Oracle
Clusterware for the Oracle Database 10g. The 9i Cluster Manager can coexist with Oracle
Clusterware 10g.
Be sure to use the same 'cluster name' in the appropriate OUI field for both 9i and 10g when you
install both together in the same cluster.
The OracleCMService9i service will remain intact during the Oracle Clusterware 10g install; as
an Oracle 9i RAC database requires the 9i OracleCMService9i, it should be left running. The
information for the 9i database will be migrated to the OCR during the Oracle Clusterware
installation. Then, for future database management, you would use the 9i srvctl to manage the 9i
database and the 10g srvctl to manage any new 10g databases. Both srvctl commands will use
the OCR. The same applies for Oracle RAC 11g.
When using MS VSS on Windows with Oracle RAC, do I need to run the VSS on
each node where I have an Oracle RAC instance?
There is no need to run Oracle VSS writer instance on each Oracle RAC node (even though it is
installed and enabled by default on all nodes). And the documentation in Windows Platform Doc
What do I do when I get an ORA-01031 error logging into the ASM instance?
This sounds like the ORA_DBA group on Node2 is empty, or else does not have the correct
username in it. Double-check which user account you are using to log on to Node2 (a 'set'
command will show you the USERNAME and USERDOMAIN values) and then make sure that
this account is part of ORA_DBA.
The other issue to check is that SQLNET.AUTHENTICATION_SERVICES=(NTS) is set in the
SQLNET.ORA
How do I verify that Host Bus Adapter Node Local Caching has been disabled
for the disks I will be using in my RAC cluster?
Disabling write caching is a standard practice when volume managers/file systems are shared.
Go to My Computer -> Manage -> Storage -> Disk Management -> Disk Properties -> Policies
and uncheck "Enable Write Caching on Disk". This will disable the write caching.
3rd-party HBAs may have their own management tools to modify these settings. Just remember
that centralized, shared cache is generally OK; it is the node-local cache that you need to turn off.
How exactly you do this will vary from HBA vendor to HBA vendor.
My customer has a failsafe cluster installed, what are the benefits of moving their
system to RAC?
Fail Safe development is continuing. Most work on the product will be around accommodating
changes in the supported resources (new releases of the RDBMS, AS, etc.) and in the underlying
Microsoft Cluster Services and Windows operating system.
A Fail Safe protected instance is an active/passive instance and, as such, does not benefit much
from adding more nodes to a cluster. Microsoft has a limit on the number of nodes in an MSCS
cluster (typically 8 nodes, but it varies). RAC is active/active, so you get the dual benefits of
increased scalability and availability every time you add a node to a cluster. We have a limit of
100 nodes in a RAC cluster (we do not use MSCS). Your customer should really consider more
than 2 nodes (because of the aggregate compute power available on node failure). If the choice is
two 4-CPU nodes or four 2-CPU nodes, then I would go for the 2-CPU nodes. Customers are
using both Windows Itanium RAC and Windows x64 RAC; Windows x64 seems more popular.
Keep in mind, though, that for Fail Safe, if the server is 64-Bit, regardless of flavor, Fail Safe
Manager must be installed on a 32-Bit client, which will complicate things just a bit. There is no
such restriction for RAC, as all management for RAC can be done via Grid Control or Database
Control.
For EE RAC you can implement an 'extended cluster' where there is a distance between the
nodes in the cluster (usually less than 20 KM).
My customer wants to understand what type of disk caching they can use with
their Windows RAC Cluster, the install guide tells them to disable disk caching?
If the write cache identified is local to the node then that is bad for RAC. If the cache is visible to
all nodes as a 'single cache', typically in the storage array, and is also 'battery backed' then that is
OK.
Is HACMP needed for RAC on AIX 5.2 using GPFS file system?
The newest version of GPFS can be used without HACMP; if that version is available for
AIX 5.2, then you do not need HACMP.
The prerequisites doc for AIX clearly says:
-----
On AIX it is important to set reserve_lock=no (or reserve_policy=no_reserve) on the devices
in order to allow AIX to access them from more than one node simultaneously.
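On AIX that attribute is set per disk with chdev; an illustrative form (hdisk3 is a placeholder, and which attribute applies depends on the disk driver), run on each node while the disks are not in use:

```shell
# AIX-only commands, shown for illustration; hdisk3 is a placeholder.
chdev -l hdisk3 -a reserve_policy=no_reserve   # MPIO devices
chdev -l hdisk3 -a reserve_lock=no             # drivers using the older attribute
lsattr -El hdisk3 | grep reserve               # verify the setting
```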
Can I run Oracle RAC 10g on my IBM Mainframe Sysplex environment (z/OS)?
YES! There is no separate documentation for RAC on z/OS. What you would call "clusterware"
is built into the OS and the native file systems are global. IBM z/OS documentation explains
how to set up a Sysplex Cluster; once the customer has done that, it is trivial to set up a RAC
database. The few steps involved are covered in Chapter 14 of the Oracle for z/OS System
Admin Guide. There is also an Install Guide for Oracle on z/OS, but I don't think there are any
RAC-specific steps in the installation. By the way, RAC on z/OS does not use Oracle's
clusterware (CSS/CRS/OCR).
Can I use Oracle Clusterware for failover of the SAP Enqueue and VIP services
when running SAP in a RAC environment?
Oracle has created SAPCTL to do this, and it is available for certain platforms. SAPCTL is
available for download on the SAP Services Marketplace for AIX and Linux. For Solaris, it will
not be available in 2007; use Veritas or Sun Cluster instead.
How do I gather all relevant Oracle and OS log/trace files in an Oracle RAC
cluster to provide to Support?
Use RAC-DDT (RAC Diagnostic Data Tool), User Guide is in Note: 301138.1. Quote from the
User Guide:
RACDDT is a data collection tool designed and configured specifically for
gathering diagnostic data related to Oracle's Real Application Cluster (RAC)
technology. RACDDT is a set of scripts and configuration files that is run on
one or more nodes of an Oracle RAC cluster. The main script is written in
Perl, while a number of proxy scripts are written using Korn shell. RACDDT
will run on all supported Unix and Linux platforms, but is not supported on
any Windows platforms.
Newer versions of RDA (Remote Diagnostic Agent) have the RAC-DDT functionality, so going
forward RDA is the tool of choice. The RDA User Guide is in Note: 314422.1
What is the optimal migration path to be used while migrating the E-Business
suite to Oracle RAC?
Following is the recommended and most optimal path to migrate your E-Business Suite to an
Oracle RAC environment:
1. Migrate the existing application to new hardware, if applicable.
2. Use a clustered file system (ASM recommended) for all database files, or migrate all database
files to raw devices (use dd on Unix or ocopy on NT).
3. Install/upgrade to the latest available E-Business Suite.
4. Ensure the database version is supported with Oracle RAC.
5. In step 4, install the Oracle RAC option and use the Installer to perform the install for all the
nodes.
6. Clone the Oracle Application code tree.
Reference Documents:
Oracle E-Business Suite Release 11i with 9i RAC: Installation and Configuration : Note:
279956.1
E-Business Suite 11i on RAC : Configuring Database Load balancing & Failover: Note:
294652.1
Oracle E-Business Suite 11i and Database - FAQ : Note: 285267.1
- Datafiles
- Control Files
- Redo Logs
- Archive Logs
- SPFILE
The Oracle Clusterware files (OCR and Voting Disk) can be put on OCFS2; however, best
practice is to put them on raw or block devices.
What are the Best Practices for using a clustered file system with Oracle RAC?
Can I use a cluster file system for OCR, Voting Disk, Binaries as well as database
files?
Oracle Best Practice for using Cluster File Systems (CFS) with Oracle RAC
* Oracle Clusterware binaries should not be placed on a CFS as this reduces cluster functionality
while CFS is recovering, and also limits the ability to perform rolling upgrades of Oracle
Clusterware.
* Oracle Clusterware voting disks and the Oracle Cluster Registry (OCR) should not be placed
on a CFS, as the I/O freeze during CFS reconfiguration can lead to node eviction, or cause
cluster management activities (i.e. start, stop, or check of a resource) to fail.
* Oracle Database 10g binaries are supported on CFS for Oracle RAC 10g and for Oracle
Database. The system should be configured to support multiple ORACLE_HOMEs in order
to maintain the ability to perform a rolling patch application.
* Oracle Database 10g database files (e.g. datafiles, trace files, and archive log files) are
supported on CFS.
Check Certify for certified cluster file systems.
Rolling Upgrades with Cluster File Systems in General
It is not recommended to use a cluster file system (CFS) for the Oracle Clusterware binaries.
Oracle Clusterware supports in-place rolling upgrades. Using a shared Oracle Clusterware home
results in a global outage during patch application and upgrades. A workaround is available to
clone the Oracle Clusterware home for each upgrade. This is not common practice.
If a patch is marked for rolling upgrade, then it can be applied to an Oracle RAC database in a
rolling fashion. Oracle supports rolling upgrades for Oracle Database Automatic Storage
Management (ASM) after you have upgraded to Oracle Database 11g. When using a CFS for the
database and ASM Oracle homes, the CFS should be configured to use context dependent
symbolic links (CDSLs) or equivalent, and these should be configured to work in conjunction
with rolling upgrades and downgrades. This includes updating the database and ASM homes in the OCR to
Is Sun QFS supported with Oracle RAC? What about Sun GFS?
From certify, check there for the latest details.
Sun Cluster - Sun StorEdge QFS (9.2.0.5 and higher, 10g and 10gR2):
There are no restrictions on the placement of files on QFS: Oracle binary executables, database
data files, archive logs, the Oracle Cluster Registry (OCR), the Oracle Cluster Ready Services
voting disk, and the recovery area can all be placed on QFS.
Solaris Volume Manager for Sun Cluster can be used for host-based mirroring.
Supports up to 8 nodes.
Is Red Hat GFS(Global File System) is certified by Oracle for use with Oracle
Real Application Clusters?
The Sistina cluster filesystem is not part of the standard Red Hat kernel and therefore is not
certified by Oracle; it falls under a kernel extension. This, however, does not mean that Oracle
RAC is not certified with it. In fact, Oracle RAC does not certify against a filesystem per se, but
against an operating system. If, as is the case with the Sistina filesystem, the filesystem is
certified with the operating system, it only means that Oracle does not provide direct support
for, or fixes to, the filesystem in case of an error. The customer will have to contact the
filesystem provider for support.
Theoretically you can have up to 255 nodes; however, it has been tested with up to 16 nodes.
When does the Oracle node VIP fail over to another node and subsequently
return to its home node?
The handling of the VIP with respect to a failover to another node and subsequent return to its
home node is handled differently depending on the Oracle Clusterware version. In general, one
can distinguish between Oracle Clusterware 10g & 11g Release 1 and Oracle Clusterware 11g
Release 2 behavior.
For Oracle Clusterware 10g & 11g Release 1 the VIP will fail over to another node either after a
network or a node failure. However, the VIP will automatically return to its home node only after
a node failure and a subsequent restart of the node. Since the network is not constantly monitored
in this Oracle Clusterware version, there is no way that Oracle Clusterware can detect the
recovery of the network and initiate an automatic return of the node VIP to its home node.
Exception: With Oracle Patch Set 10.2.0.3, a new behavior was introduced that allowed the node
VIP to return to its home node after the network recovered. The required network check was part
of the database instance check. However, this new check introduced quite a few side effects and
hence was disabled with subsequent bundle patches and Oracle Patch Set 10.2.0.4.
Starting with 10.2.0.4 and for Oracle Clusterware 11g Release 1 the default behavior is to
avoid an automatic return of the node VIP to its home node after the network recovered.
This behavior can be activated, if required, using the "ORA_RACG_VIP_FAILBACK"
parameter. This parameter should only be used after reviewing support note 805969.1 (VIP does
not relocate back to the original node starting from 10.2.0.4 and 11.1 even after the public
network problem is resolved.)
With Oracle Clusterware 11g Release 2 the default behavior is to automatically initiate a return
of the node VIP to its home node as soon as the network recovered after a failure. It needs to be
noted that this behavior is not based on the parameter mentioned above and therefore does not
induce the same side effects. Instead, a new network resource is used in Oracle Clusterware 11g
Release 2, which monitors the network constantly, even after the network failed and the resource
became "OFFLINE". This feature is called "OFFLINE resource monitoring" and is enabled by
default for the network resource.
Can the Network Interface Card (NIC) device names be different on the nodes in
a cluster, for both public and private?
All public NICs must have the same name on all nodes in the cluster
Similarly, all private NICs must also have the same names on all nodes
Do not mix NICs with different interface types (infiniband, ethernet, hyperfabric, etc.) for the
same subnet/network.
For Oracle RAC 10g, rerunning root.sh after the initial successful install of Oracle
Clusterware is expressly discouraged and unsupported. We strongly recommend not doing it.
In cases where root.sh fails to execute on an initial install (or on a new node joining an
existing cluster), it is OK to re-run root.sh after the cause of the failure is corrected (permissions,
paths, etc.). In this case, please run rootdelete.sh to undo the local effects of root.sh before re-running root.sh.
When the customer runs the command 'onsctl start', they receive the message "Unable to open
libhasgen10.so". Any idea why?
Most likely you are trying to start ONS from ORACLE_HOME instead of Oracle Clusterware
(or Grid Infrastructure in 11.2) home. Please try to start it from the Oracle Clusterware home.
Voting Files stored in ASM - How many disks per disk group do I need?
If Voting Files are stored in ASM, the ASM disk group that hosts the Voting Files will place the
appropriate number of Voting Files in accordance to the redundancy level. Once Voting Files are
managed in ASM, a manual addition, deletion, or replacement of Voting Files will fail, since
users are not allowed to manually manage Voting Files in ASM.
If the redundancy level of the disk group is set to "external", 1 Voting File is used.
If the redundancy level of the disk group is set to "normal", 3 Voting Files are used.
If the redundancy level of the disk group is set to "high", 5 Voting Files are used.
Note that Oracle Clusterware stores the Voting Files directly on the disks within the disk group;
Oracle Clusterware does not rely on ASM to access the Voting Files.
In addition, note that there can be only one Voting File per failure group. In the above list of
rules, it is assumed that each disk that is supposed to hold a Voting File resides in its own,
dedicated failure group.
In other words, a disk group that is supposed to hold the above mentioned number of Voting
Files needs to have the respective number of failure groups with at least one disk. (1 / 3 / 5
failure groups with at least one disk)
Consequently, a normal redundancy ASM disk group, which is supposed to hold Voting
Files, requires 3 disks in separate failure groups, while a normal redundancy ASM disk group
that is not used to store Voting Files requires only 2 disks in separate failure groups.
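The 1/3/5 rule and its failure-group implication can be captured in a tiny helper (illustrative only; the function name is made up):

```shell
# Map an ASM disk group redundancy level to the number of Voting Files
# Oracle Clusterware places in it -- which is also the minimum number of
# failure groups (one disk each) the disk group needs to hold them.
voting_files_for() {
  case "$1" in
    external) echo 1 ;;
    normal)   echo 3 ;;
    high)     echo 5 ;;
    *)        echo "unknown redundancy level: $1" >&2; return 1 ;;
  esac
}

voting_files_for normal    # prints 3
```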
If the (RAC) databases use ASM, too, they cannot access their data on this node anymore during
the time the ASM instance is down. If a RAC database is used, access to the same data can be
established from another node.
If the CRSD process running on the node affected by the ASM instance failure is the OCR
writer, AND the majority of the OCR locations are stored in ASM, AND an I/O is attempted on
the OCR during the time the ASM instance is down on this node, THEN CRSD stops and
becomes inoperable. Hence cluster management is affected on this particular node.
Under no circumstances will the failure of one ASM instance on one node affect the whole
cluster.
With GNS, do ALL public addresses have to be DHCP managed (public IP,
public VIP, public SCAN VIP) ?
No. The choice to use DHCP for the hostname is made outside of Oracle; Oracle Clusterware
and Oracle RAC will work with both static and DHCP hostnames. When you use GNS, DHCP is
used for the VIPs, which includes the node VIPs and the SCAN VIPs.
I am trying to move my voting disks from one diskgroup to another and getting
the error "crsctl replace votedisk not permitted between ASM Disk
Groups." Why?
You need to review the ASM and crsctl logs to see why the command is failing.
To put your voting disks in ASM, you must have the diskgroup set up properly. There must be
enough failure groups to support the redundancy of the voting disks as set by the redundancy of
the disk group, e.g. normal redundancy requires 3 failure groups, high redundancy 5 failure
groups. Note: by default each disk in a diskgroup is put in its own failure group. The
compatible.asm attribute of the diskgroup must be set to 11.2, and you must be using the 11.2
version of Oracle Clusterware and ASM.
Can I run the fixup script generated by the 11.2 OUI or CVU on a running
system?
It depends on the problems that were listed to be fixed. The fixup scripts can change system
parameters, so you should not change system parameters while applications are running.
However, if an earlier version of Oracle Database is already running on the system, there should
not be any need to change the system parameters.
What should the permissions be set to for the voting disk and ocr when doing an
Oracle RAC Install?
The Oracle Real Application Clusters install guide is correct. It describes the PRE-INSTALL
ownership/permission requirements for the OCR and voting disk. This step is needed to make
sure that the Oracle Clusterware install succeeds. Please don't use those values to determine what
the ownership/permission should be POST INSTALL. The root script will change the
ownership/permission of the OCR and voting disk as part of the install. The POST INSTALL
permissions will end up being: OCR - root:oinstall - 640; Voting Disk - oracle:oinstall - 644.
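Those modes can be reproduced and checked with chmod/stat; a sketch on scratch files (the root:oinstall and oracle:oinstall owners, and the real OCR/voting devices, obviously require the actual install):

```shell
# Stand-in files for the OCR and voting disk.
touch ocr.scratch votedisk.scratch
chmod 640 ocr.scratch        # OCR:         rw-r-----
chmod 644 votedisk.scratch   # Voting Disk: rw-r--r--
stat -c '%a %n' ocr.scratch votedisk.scratch
```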
How is the Oracle Cluster Registry (OCR) stored when I use ASM?
The OCR is stored in a manner similar to how Oracle Database files are stored. The extents are
spread across all the disks in the diskgroup, and the redundancy (which is at the extent level) is
based on the redundancy of the disk group. You can have only one OCR in a diskgroup. Best
practice for ASM is to have 2 diskgroups; best practice for the OCR in ASM is to have a copy of
the OCR in each diskgroup.
I am trying to install Oracle Clusterware (10.2) and when I run the OUI, at the
Specify Cluster Configuration screen, the Add, Edit and Remove buttons are
grayed out. Nothing comes up in the cluster nodes either. Why?
Check for 3rd party vendor clusterware (such as Sun Cluster or Veritas Cluster) that was not
completely removed, i.e. look for the /opt/ORCLcluster directory; it should be removed.
I made a mistake when I created the VIP during the install of Oracle
Clusterware, can I change the VIP?
Yes. The details of how to do this are described in Metalink Note 276434.1.
How should I test the failure of the public network (IE Oracle VIP failover) in
my Oracle RAC environment?
Prior to 10.2.0.3, it was possible to test VIP failover by simply running
ifconfig <interface_name> down.
The intended behaviour was that the VIP would fail over to another node. In 10.2.0.3 this is still
the behaviour on Linux; however, on other operating systems the VIP will NOT fail over.
Instead, the interface will be plumbed again. To test VIP failover on platforms other than Linux,
the switch can be turned off or the physical cable pulled.
This is the best way to test. NOTE: if you have other DBs that share the same IPs, then they
will be affected. Your tests should simulate production failures, which are generally switch
errors or interface errors.
At most only one sub-cluster will continue and a split brain will be avoided.
Can I change the public hostname in my Oracle Database 10g Cluster using
Oracle Clusterware?
Hostname changes are not supported by Oracle Clusterware (CRS), unless you perform a
deletenode followed by a new addnode operation.
The hostname is used to store, among other things, the flag files, and the Oracle Clusterware
stack will not start if the hostname is changed.
Does the hostname have to match the public name or can it be anything else?
When there is no vendor clusterware, only Oracle Clusterware, then the public node name must
match the host name. When vendor clusterware is present, it determines the public node names,
and the installer doesn't present an opportunity to change them. So, when you have a choice,
always choose the hostname.
I have a 2-node RAC running. I notice that it is always node2 that is evicted
when I test private network failure scenario by disconnecting the private
network cable. Doesn't matter whether it is node1's or node2's private network
cable that is disconnected, it is always the node2 that is evicted. What happens in
a 3-nodes RAC cluster if node1's cable is disconnected?
The node with the lower node number (the first node to join the cluster) will survive. In the case
of 3 nodes, 2 nodes will survive and the node whose cable you pulled will be evicted. With 4
nodes, the sub-cluster containing the lower node number will survive.
What are the licensing rules for Oracle Clusterware? Can I run it without RAC?
Check the Oracle Database Licensing Information 11g Release 1 (11.1) Part Number
B28287-01 Look in the Special Use section under Oracle Database Editions.
Can I change the name of my cluster after I have created it when I am using
Oracle Clusterware?
No, you must properly deinstall Oracle Clusterware and then re-install it. To properly deinstall
Oracle Clusterware, you MUST follow the directions in the Installation Guide, Chapter 10. This
will ensure the OCR gets cleaned out.
Why does Oracle Clusterware use an additional 'heartbeat' via the voting disk,
when other cluster software products do not?
Oracle uses this implementation because Oracle clusters always have access to a shared disk
environment. This is different from classical clustering which assumes shared nothing
architectures, and changes the decision of what strategies are optimal when compared to other
environments. Oracle also supports a wide variety of storage types, instead of limiting it to a
specific storage type (like SCSI), allowing the customer quite a lot of flexibility in configuration.
Why does Oracle still use the voting disks when other cluster software is present?
Voting disks are still used when 3rd party vendor clusterware is present, because vendor
clusterware is not able to monitor/detect all failures that matter to Oracle Clusterware and the
database. For example one known case is when the vendor clusterware is set to have its heartbeat
go over a different network than RAC traffic. Continuing to use the voting disks allows CSS to
resolve situations which would otherwise end up in cluster hangs.
Customer is hitting bug 4462367 with an error message saying low open file
descriptor. How do I work around this until the fix is released with the Oracle
Clusterware Bundle for 10.2.0.3 or 10.2.0.4?
The fix for "low open file descriptor" problem is to increase the ulimit for Oracle Clusterware.
Please be careful when you make this type of change and make a backup copy of the
init.crsd before you start! To do this, you can modify init.crsd as follows while you wait for the
patch:
1. Stop Oracle Clusterware on the node (crsctl stop crs)
2. Copy /etc/init.d/init.crsd
3. Modify the file, changing:
# Allow the daemon to drop a diagnostic core file/
ulimit -c unlimited
ulimit -n unlimited
to
# Allow the daemon to drop a diagnostic core file/
ulimit -c unlimited
ulimit -n 65536
4. restart Oracle Clusterware in the node (crsctl start crs)
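The edit in step 3 can also be scripted; a minimal sed sketch, run here against a scratch copy rather than the real /etc/init.d/init.crsd:

```shell
# Create a scratch stand-in for /etc/init.d/init.crsd.
cat > init.crsd.scratch <<'EOF'
# Allow the daemon to drop a diagnostic core file/
ulimit -c unlimited
ulimit -n unlimited
EOF

cp init.crsd.scratch init.crsd.scratch.bak                           # step 2: backup
sed -i 's/^ulimit -n unlimited$/ulimit -n 65536/' init.crsd.scratch  # step 3

grep '^ulimit -n' init.crsd.scratch    # prints: ulimit -n 65536
```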
Does Oracle Clusterware have to be the same or higher release than all instances
running on the cluster?
Yes - Oracle Clusterware must be the same or a higher release with regards to the RDBMS or
ASM Homes.
Please refer to Note#337737.1
Can I set up failover of the VIP to another card in the same machine or what do I
do if I have different network interfaces on different nodes in my cluster (I.E.
eth0 on node1,2 and eth1 on node 3,4)?
With srvctl, you can modify the nodeapps for the VIP to list the NICs it can use. The VIP will
then try to start on the eth0 interface and, if that fails, try the eth1 interface:
./srvctl modify nodeapps -n <node_name> -A <vip_address>/<netmask>/eth0\|eth1
Note how the interfaces are a list separated by the | symbol, and how you need to quote
this with a \ character or the Unix shell will interpret the character as a pipe. So
on a node called ukdh364 with a VIP address of ukdh364vip and a netmask of (say)
255.255.255.0, we have:
./srvctl modify nodeapps -n ukdh364 -A ukdh364vip/255.255.255.0/eth0\|eth1
To check which interfaces are configured as public or private use oifcfg getif
example output:
eth0 138.2.238.0 global public
eth1 138.2.240.0 global public
eth2 138.2.236.0 global cluster_interconnect
An ifconfig on your machine will show the hardware names for the installed interface cards.
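The backslash is needed because `|` is the shell's pipe operator; a quick demonstration of what actually reaches the command as an argument:

```shell
# Escaped (or single-quoted), the literal string survives shell parsing;
# unescaped, the shell would treat eth0|eth1 as a pipeline of two commands.
arg=eth0\|eth1        # equivalent to arg='eth0|eth1'
echo "$arg"           # prints: eth0|eth1
```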
As long as you can confirm via the CSS daemon logfile that it thinks the voting disk is bad, you
can restore the voting disk from backup while the cluster is online. This is the backup that you
took with dd (as the manual requests) after the most recent addnode, deletenode, or install
operation. If by accident you restore a voting disk that the CSS daemon thinks is NOT bad, then
the entire cluster will probably go down.
crsctl add css votedisk - adds a new voting disk
crsctl delete css votedisk - removes a voting disk
Note: the cluster has to be down. You can also restore the backup via dd when the cluster is
down.
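The dd backup and restore themselves are plain byte copies; a sketch against a scratch file standing in for the voting device (real device paths are site-specific):

```shell
# Fake voting device and its dd backup.
dd if=/dev/zero of=votedisk.img bs=4096 count=8 2>/dev/null   # stand-in device
dd if=votedisk.img of=votedisk.bak bs=4096 2>/dev/null        # take the backup
# ...later, restore the backup over the bad voting disk
# (cluster down, or per the online-restore case described above):
dd if=votedisk.bak of=votedisk.img bs=4096 2>/dev/null
cmp -s votedisk.img votedisk.bak && echo restore-verified     # prints: restore-verified
```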
How can I register the listener with Oracle Clusterware in RAC 10g Release 2?
NetCA is the only tool that configures the listener, and you should always use it. It will register
the listener with Oracle Clusterware. There are no other supported alternatives.
application running in the cluster. Oracle Database 10g Real Application Clusters (RAC)
databases and associated Oracle processes (E.G. listener) are automatically managed by the
clusterware.
Necessary Connections
Interconnect, SAN, and IP Networking need to be kept on separate channels, each with required
redundancy. Redundant connections must not share the same Dark Fiber (if used), switch, path,
or even building entrances. Keep in mind that cables can be cut.
The SAN and Interconnect connections need to be on dedicated point-to-point connections. No
WAN or Shared connection allowed. Traditional cables are limited to about 10 km if you are to
avoid using repeaters. Dark Fiber networks allow the communication to occur without repeaters.
Since latency is limited, Dark Fiber networks allow for a greater distance in separation between
the nodes. The disadvantage of Dark Fiber networks are they can cost hundreds of thousands of
dollars, so generally they are only an option if they already exist between the two sites.
If direct connections are used (for short distances), this is generally done by just stringing long
cables from a switch. If DWDM or CWDM is used, then these are directly connected via a
dedicated switch on either side.
Note of caution: Do not run the RAC interconnect over a WAN. This is the same as running it
over the public network, which is not supported, and other uses of the network (e.g. large FTPs)
can cause performance degradation or even node evictions.
For SAN networks make sure you are using SAN buffer credits if the distance is over 10km.
At the moment in Oracle 10g, if Oracle Clusterware is being used, we also require that a single
subnet be setup for the public connections so we can fail over VIPs from one side to another.
Can I use ASM as mechanism to mirror the data in an Extended RAC cluster?
Yes, but it cannot replicate everything that needs replication.
ASM works well to replicate any object you can put in ASM. But you cannot put the OCR or
Voting Disk in ASM.
In 10gR1 they can either be mirrored using a different mechanism (which could then be used
instead of ASM) or the OCR needs to be restored from backup and the Voting Disk can be
recreated.
In the future we are looking at providing Oracle redundancy for both.
This support is for 10gR2 onwards and has the following limitations:
1. As in any extended RAC environment, the additional latency induced by distance will affect
I/O and cache fusion performance. This effect will vary by distance, and the customer is
responsible for ensuring that the impact in their environment is acceptable for their
application.
2. The OCR must be mirrored across both sites using Oracle-provided mechanisms.
3. Voting Disk redundancy must exist across both sites, and at a 3rd site to act as an arbiter.
This third site may be connected via a WAN.
4. Storage at each site must be set up as separate failure groups and use ASM mirroring, to
ensure at least one copy of the data at each site.
5. The customer must have a separate and dedicated test cluster, also in an extended
configuration, set up using the same software and hardware components (it can have fewer or
smaller nodes).
6. The customer must be aware that in 10gR2 ASM does not provide partial resilvering. Should
a loss of connectivity between the sites occur, one of the failure groups will be marked invalid.
When the site rejoins the cluster, the failure groups will need to be manually dropped and added.
When I run 10.2 CLUVFY on a system where RAC 10g Release 1 is running I get
following output:
Package existence check failed for "SUNWscucm:3.1".
Package existence check failed for "SUNWudlmr:3.1".
Package existence check failed for "SUNWudlm:3.1".
Package existence check failed for
"ORCLudlm:Dev_Release_06/11/04,_64bit_3.3.4.8_reentrant".
Package existence check failed for "SUNWscr:3.1".
Package existence check failed for "SUNWscu:3.1".
Checking this Solaris system I don't see those packages installed. Can I continue
my install?
Note that cluvfy checks all possible prerequisites and tells you whether your system passes the
check or not. You can then cross-reference with the install guide to see if the checks that failed
are required for your type of installation. In the above case, if you are not planning on using Sun
Cluster, then you can continue the install. The checks that failed are for Sun Cluster required
packages and are not needed on your cluster. As long as everything else checks out
successfully, you can continue.
What are the default values for the command line arguments?
Here are the default values and behavior for different stage and component commands:
For component nodecon:
If no -i or -a argument is provided, then cluvfy will go into discovery mode.
How do I check the Oracle Clusterware stack and other sub-components of it?
Cluvfy provides commands to check a particular sub-component of the CRS stack as well as the
whole CRS stack. You can use the 'comp ocr' command to check the integrity of the OCR.
Similarly, you can use the 'comp crs' and 'comp clumgr' commands to check the integrity of the
crs and clustermanager sub-components. To check the entire CRS stack, run the stage command
'cluvfy stage -post crsinst'.
Is there a way to verify that the Oracle Clusterware is working properly before
proceeding with RAC install?
Yes. You can use the post-check command for cluster services setup (-post clusvc) to verify
CRS status. A more appropriate test would be to use the pre-check command for database
installation (-pre dbinst). This will check whether the current state of the system is suitable for
a RAC install.
At what point is cluvfy usable? Can I use cluvfy before installing Oracle
Clusterware?
You can run cluvfy at any time, even before CRS installation. In fact, cluvfy is designed to assist
the user as soon as the hardware and OS is up. If you invoke a command which requires CRS or
RAC on local node, cluvfy will report an error if those required products are not yet installed.
What is a stage?
CVU supports the notion of Stage verification. It identifies all the important stages in RAC
deployment and provides each stage with its own entry and exit criteria. The entry criteria for a
stage define a specific set of verification tasks to be performed before initiating that stage. This
pre-check saves the user from entering into a stage unless its pre-requisite conditions are met.
The exit criteria for a stage define another specific set of verification tasks to be performed after
completion of the stage. The post-check ensures that the activities for that stage have been
completed successfully. It identifies any stage-specific problem before it propagates to
subsequent stages, where its root cause would be difficult to find. An example of a stage is
"pre-check of database installation", which checks whether the system meets the criteria for a
RAC install.
What is a component?
CVU supports the notion of Component verification. The verifications in this category are not
associated with any specific stage. The user can verify the correctness of a specific cluster
component. A component can range from a basic one, like free disk space, to a complex one,
like the CRS stack. The integrity check for the CRS stack will transparently span verification of
the multiple sub-components associated with the CRS stack. This encapsulation of a set of tasks
within a specific component verification should be of great ease to the user.
What is nodelist?
Nodelist is a comma-separated list of hostnames without domain. Cluvfy will ignore any domain
while processing the nodelist. If duplicate entries exist after removing the domain, cluvfy will
eliminate the duplicate names while processing. Wherever supported, you can use '-n all' to
check all the cluster nodes. See the shortcut question below for more information on nodelists
and shortcuts.
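The domain stripping and de-duplication described above can be mimicked with standard tools (a sketch of the described behavior, not cluvfy's actual implementation):

```shell
# Strip any domain part from each entry of a comma-separated nodelist,
# then drop the duplicates that result, as cluvfy does when processing -n.
normalize_nodelist() {
  printf '%s' "$1" | tr ',' '\n' | cut -d. -f1 | awk '!seen[$0]++' | paste -sd, -
}

normalize_nodelist 'node1.us.example.com,node2,node1'   # prints: node1,node2
```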
How do I know about cluvfy commands? The usage text of cluvfy does not show
individual commands.
Cluvfy has context-sensitive help built into it. Cluvfy shows the most appropriate usage text
based on the cluvfy command line arguments. If you type 'cluvfy' at the command prompt,
cluvfy displays the high-level generic usage text, which describes the valid stage and component
syntax. If you type 'cluvfy comp -list', cluvfy will show the valid components with a brief
description of each. If you type 'cluvfy comp -help', cluvfy will show the detailed syntax for
each of the valid components. Similarly, 'cluvfy stage -list' and 'cluvfy stage -help' will list the
valid stages and their syntax respectively. If you type an invalid command, cluvfy will show the
appropriate usage for that particular command. For example, if you type 'cluvfy stage -pre
dbinst', cluvfy will show the syntax for the pre-check of the dbinst stage.
Do I have to type the nodelist every time for the CVU commands? Is there any
shortcut?
You do not have to type the nodelist every time for the CVU commands. Typing the nodelist for
a large cluster is painful and error prone. Here are a few shortcuts. To provide all the nodes of
the cluster, type '-n all'. Cluvfy will attempt to get the nodelist in the following order:
1. If vendor clusterware is available, it will pick all the configured nodes from the vendor
clusterware using the lsnodes utility.
2. If CRS is installed, it will pick all the configured nodes from Oracle Clusterware using the
olsnodes utility.
3. If none of the above, it will look for the CV_NODE_ALL environment variable. If this
variable is not defined, it will complain.
To provide a partial list (some of the nodes of the cluster), you can set an environment variable
and use it in the CVU command. For example:
setenv MYNODES node1,node3,node5
cluvfy comp nodecon -n $MYNODES
How do I get detailed output of a check?
By default, cluvfy reports in non-verbose mode. To get detailed output of a check, use the flag
'-verbose' on the command line. This will produce detailed output of individual checks and,
where applicable, will show per-node results in a tabular fashion.
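Putting the nodelist shortcut and the '-verbose' flag together, a node-connectivity check might look like this sketch (the node names are placeholders):

```shell
# csh-style variable, as in the example above; node names are placeholders
setenv MYNODES node1,node3,node5

# Detailed, per-node tabular output of the connectivity check
cluvfy comp nodecon -n $MYNODES -verbose
```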
Why does the peer comparison with -refnode say passed when the group or user
does not exist?
Peer comparison with the -refnode option acts like a baseline feature. It compares the system
properties of the other nodes against the reference node. If a value does not match (i.e., is not
equal to the reference node's value), it flags that as a deviation from the reference node. If a
group or user does not exist on the reference node as well as on the other node, it will report
this as 'matched', since there is no deviation from the reference node. For the same reason, it
will report 'mismatched' for a node with more total memory than the reference node.
the disk bindings (e.g. /dev/raw/raw1) as valid storage paths or identifiers. Please use the
underlying disk (e.g. /dev/sdm) for the storage path or identifier. 6. The current version of
CVU for RedHat 2.1 complains about the missing cvuqdisk package. This will be corrected in a
future release; users should ignore this error. Note that the 'cvuqdisk' package should be
installed only on the RedHat Linux 3.0 distribution. Discovery of SCSI disks for RedHat Linux
2.1 is not supported.
Is Oracle RAC One Node supported with 3rd party clusterware and/or 3rd party
CFS?
No. Oracle RAC One Node is only supported with version 11.2 (and above) of Oracle Grid
Infrastructure.
How does RAC One Node compare with traditional cold failover solutions like
HP Serviceguard, IBM HACMP, Sun Cluster, and Symantec Veritas Cluster
Server?
RAC One Node is a better high availability solution than traditional cold failover solutions.
RAC One Node operates in a cluster, but only a single instance of the database runs on one
node in the cluster. If that database instance has a problem, RAC One Node detects it and can
attempt to restart the instance on that node. If the whole node fails, RAC One Node detects that
and brings up the database instance on another node in the cluster. Unlike traditional cold
failover solutions, Oracle Clusterware sends out notifications (FAN events) to clients to speed
reconnection after failover; 3rd-party solutions may simply wait for potentially lengthy
timeouts to expire.
RAC One Node goes beyond traditional cold failover functionality by offering administrators
the ability to proactively migrate instances from one node in the cluster to another. For
example, let's say you wanted to upgrade the operating system on the node where the RAC One
Node database is running. The administrator would activate "OMotion," a new Oracle facility
that migrates the instance to another node in the cluster. Once the instance and all of its
connections have migrated, the server can be shut down, upgraded, and restarted. OMotion can
then be invoked again to migrate the instance and the connections back to the now-upgraded
node. This non-disruptive rolling upgrade and patching capability of RAC One Node exceeds
the current functionality of traditional cold failover solutions.
Also, RAC One Node provides a load balancing capability that is attractive to DBAs and
system administrators. For example, if you have two different database instances running on a
RAC One Node server and it becomes apparent that the load against these two instances is
impacting performance, the DBA can invoke OMotion and migrate one of the instances to
another, less-used node in the cluster. RAC One Node offers this load balancing capability;
traditional cold failover solutions do not.
Lastly, many 3rd-party solutions do not support ASM storage. This can slow down failover and
prevent consolidation of storage across multiple databases, increasing the management burden
on the DBA.
The following features summarize the differences between RAC One Node and 3rd-party
failover solutions:
- Single vendor
- Fast failover
- Rolling DB patching; OS, Clusterware, and ASM patching and upgrades
- Workload management
- Storage virtualization
How does RAC One Node compare with a single instance Oracle Database
protected with Oracle Clusterware?
Rolling DB patching; OS, Clusterware, and ASM patching and upgrades:
- RAC One Node: can online migrate a database from one server to another to enable online
rolling patching. Most connections should migrate with no disruption.
- EE with Oracle Clusterware: the database must be failed over from one node to another,
which means all connections will be dropped and must reconnect. Some transactions will be
dropped and must reconnect. Reconnection could take several minutes.

Workload management:
- RAC One Node: can online migrate a database from one server to another to enable load
balancing of databases across servers in the cluster. Most connections should migrate with no
disruption.
- EE with Oracle Clusterware: the database must be failed over from one node to another,
which means all connections will be dropped and must reconnect. Some transactions will be
dropped and must reconnect. Reconnection could take several minutes.

Upgrade to multi-node RAC:
- RAC One Node: RAC and RAC One Node use the same tools, management interfaces, and
processes.
- EE with Oracle Clusterware: EE and RAC use different tools, management interfaces, and
processes.

The comparison also covers supportability and DB Control support.
What is Oracle Real Application Clusters One Node (RAC One Node)?
Oracle RAC One Node is an option available with Oracle Database 11g Release 2. Oracle RAC
One Node is a single instance of Oracle RAC running on one node in a cluster.
This option adds to the flexibility that Oracle offers for reducing costs via consolidation. It
allows customers to more easily consolidate their less mission-critical, single-instance
databases into a single cluster, with most of the high availability benefits provided by Oracle
Real Application Clusters (automatic restart/failover, rolling patches, rolling OS and
clusterware upgrades), and many of the benefits of server virtualization solutions like VMware.
RAC One Node offers better high availability functionality than traditional cold failover
cluster solutions because of a new Oracle technology, OMotion, which can intelligently
relocate database instances and connections to other cluster nodes for high availability and
system load balancing.
If I add or remove nodes from the cluster, how do I inform RAC One Node?
You must re-run raconeinit to update the candidate server list for each RAC One Node Database.
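As a sketch (raconeinit is an interactive utility; this assumes the RAC One Node patch installed the command-line tools into $ORACLE_HOME/bin), refreshing the candidate server list after a cluster change looks like:

```shell
# Assumption: tools delivered by the RAC One Node patch live in $ORACLE_HOME/bin.
# raconeinit prompts for the database and the updated candidate server list.
$ORACLE_HOME/bin/raconeinit
```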
How do I get Oracle Real Application Clusters One Node (Oracle RAC One
Node)?
Oracle RAC One Node is only available with Oracle Database 11g Release 2. Oracle Grid
Infrastructure 11g Release 2 must be installed as a prerequisite. Download and apply Patch
9004119 to your Oracle RAC 11g Release 2 home to obtain the code associated with RAC One
Node (this patch was released after 11.2.0.1 and is only available for Linux; support for other
platforms will be added with 11.2.0.2). The documentation is the Oracle RAC One Node User
Guide.
How does RAC One Node compare with database DR products like DataGuard
or Golden Gate?
The products are entirely complementary. RAC One Node is designed to protect a single
database. It can be used for rolling database patches, OS upgrades/patches, and grid
infrastructure (ASM/Clusterware) rolling upgrades and patches. This is less disruptive than
switching to a database replica. Switching to a replica for patching, or for upgrading the OS or
grid infrastructure, requires that you choose to run Active/Active (and deal with potential
conflicts) or Active/Passive (and wait for work on the active primary database to drain before
allowing work on the replica). You need to make sure replication supports all the data types
you are using. You need to make sure the replica can keep up with your load. You need to
figure out how to re-point your clients to the replica (not an issue with RAC One Node because
it is the same database, and we use VIPs). And lastly, RAC One Node allows a spare node to be
used 10 days per year without licensing. Our recommendation is to use RAC or RAC One
Node to protect from local failures and to support rolling maintenance activities. Use Data
Guard or replication technology for DR, data protection, and rolling database upgrades. Both
are required as part of a comprehensive HA solution.
How do I install the command line tools for RAC One Node?
The command line tools are installed when you install the RAC One Node patch 9004119 on top
of 11.2.0.1.
How does RAC One Node compare with virtualization solutions like VMware?
RAC One Node offers greater benefits and performance than VMware in the following ways:
- Server consolidation: VMware offers physical server consolidation but imposes a 10%+
processing overhead to enable this consolidation and to have the hypervisor control access to
the system's resources. RAC One Node enables both physical server consolidation and
database consolidation without the additional overhead of a hypervisor-based solution like
VMware.
- High availability: VMware offers the ability to fail over a failed virtual machine, but
everything running in that VM must be restarted and connections re-established in the event of
a virtual machine failure. VMware cannot detect a failed process within the VM, just a failed
virtual machine. RAC One Node offers a finer-grained, more intelligent, and less disruptive
high availability model. RAC One Node can monitor the health of the database within a
physical or virtual server. If it fails, RAC One Node will either restart it or migrate the
database instance to another server. Oftentimes, database issues or problems will manifest
themselves before the whole server or virtual machine is affected. RAC One Node will
discover these problems much sooner than a VMware solution and take action to correct them.
Also, RAC One Node allows database and OS patches or upgrades to be applied without
taking a complete database outage: RAC One Node can migrate the database instance to
another server, patches or upgrades can be installed on the original server, and then RAC One
Node will migrate the instance back. VMware offers a facility, VMotion, that will do a
memory-to-memory transfer of a virtual machine from one physical server to another. This
DOES NOT allow any OS or other patches or upgrades to occur in a non-disruptive fashion
(an outage must be taken). It does allow for the hardware to be dusted and vacuumed, however.
Can I use Oracle RAC One Node for Standard Edition Oracle RAC?
No, Oracle RAC One Node is only part of Oracle Database 11g Release 2 Enterprise Edition. It
is not licensed or supported for use with any other editions.
If the root.sh script fails on a node during the install of the Grid Infrastructure
with Oracle Database 11g Release 2, can I re-run it?
Yes; however, you should first fix the problem that caused it to fail. Then:
1. Run <Grid_home>/crs/install/rootcrs.pl -deconfig -force
2. Rerun root.sh
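As a sketch (the Grid home path below is an assumption; adjust it to your installation), the recovery sequence on the failed node is:

```shell
# Run as root on the node where root.sh failed.
GRID_HOME=/u01/app/11.2.0/grid   # assumption: adjust to your Grid home

# Deconfigure the partially configured Clusterware stack on this node
perl $GRID_HOME/crs/install/rootcrs.pl -deconfig -force

# After fixing the underlying problem, rerun root.sh
$GRID_HOME/root.sh
```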
How do I explain to a customer who is concerned about the following phrase in
the Oracle Clusterware Administration and Deployment Guide 11g Release 2
(11.2), E10717-04, page 2-27?
"If Oracle ASM fails, then OCR is not accessible on the node on which Oracle
ASM failed, but the cluster remains operational. The entire cluster only fails if
the Oracle ASM instance on the OCR master node fails, if the majority of the
OCR locations are in Oracle ASM, and if there is an OCR read or write access,
then the crsd stops and the node becomes inoperative. "
This was a documentation bug and has been fixed. Here is the updated write up (posted in the
online version).
If an Oracle ASM instance fails on any node, then OCR becomes unavailable on that particular
node. If the crsd process running on the node affected by the Oracle ASM instance failure is the
OCR writer, the majority of the OCR locations are stored in Oracle ASM, and you attempt I/O
on OCR during the time the Oracle ASM instance is down on this node, then crsd stops and
becomes inoperable. Cluster management is now affected on this particular node. Under no
circumstances will the failure of one Oracle ASM instance on one node affect the whole cluster.
Is it recommended that we put the OCR/Voting disk on ASM disk and, if so, is it
preferable to create a separate disk group for them?
With 11g Release 2, it is recommended to put the OCR and Voting Disks in ASM, using the
same disk group you use for your data. For full redundancy coverage of the OCR, place an
OCR location in two different disk groups.
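A sketch of adding a second OCR location in a different disk group (run as root; the disk group name +OCRMIRROR is a placeholder):

```shell
# Placeholder disk group name; substitute a disk group that exists in your cluster.
# Add a second OCR location in a different ASM disk group:
ocrconfig -add +OCRMIRROR

# Verify the configured OCR locations and their integrity:
ocrcheck

# List the voting disks (managed automatically when stored in ASM):
crsctl query css votedisk
```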
What is the Grid Naming Service (GNS) and when should I use it?
The Grid Naming Service (GNS) is part of the Grid Plug and Play feature of Oracle RAC 11g
Release 2. It provides name resolution for the cluster. If you have a larger cluster (greater than
4-6 nodes) or a requirement for a dynamic cluster (you expect to add or remove nodes), then
you should implement GNS. If you are implementing a small cluster (4 nodes or less), you do
not need GNS. Note: selecting GNS during install assumes that you have a DHCP server
running on the public subnet, from which Oracle Clusterware can obtain IP addresses for the
node VIPs and the SCAN VIPs.
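A sketch of verifying a GNS setup after install (the SCAN name below is a placeholder for your cluster's SCAN in the GNS-delegated subdomain):

```shell
# Show the GNS configuration registered with Oracle Clusterware:
srvctl config gns

# Placeholder SCAN name; with working DNS delegation it should resolve
# to the DHCP-assigned SCAN VIPs served by GNS.
nslookup mycluster-scan.grid.example.com
```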
Related
Products
Oracle Database Products > Oracle Database > Oracle Database > Oracle Server -
Enterprise Edition
Errors
CRS-215; RFC-1918; PRKP-1001; 273120 ERROR; 721236 ERROR; 3113 ERROR;
29740 ERROR; 01031 ERROR; ORA-29740; ORA-3113; ORA-1031
Document 220970.1