DBA FAQ

RAC
Date : 17th August 2010
ORAFACT

If my OCR and Voting Disks are in ASM, can I shutdown the ASM instance?
No. You will have to stop Oracle Clusterware on that node, using either crsctl stop cluster or crsctl stop crs.

Can I run Oracle 9i RAC and Oracle RAC 10g in the same cluster?
YES. However, Oracle Clusterware (CRS) will not support an Oracle 9i RAC database, so you will have to leave the current configuration in place. You can install Oracle Clusterware and Oracle RAC 10g into the same cluster. On Windows and Linux, you must run the 9i Cluster Manager for the 9i Database and Oracle Clusterware for the 10g Database. When you install Oracle Clusterware, your 9i srvconfig file will be converted to the OCR. Both Oracle 9i RAC and Oracle RAC 10g will use the OCR. Do not restart the 9i gsd after you have installed Oracle Clusterware. With Oracle Clusterware 11g Release 2, the GSD resource is disabled by default. You only need to enable this resource if you are running Oracle 9i RAC in the cluster. Remember to check Certify for details of what vendor clusterware can be run with Oracle Clusterware.

I want to use rconfig to convert a single instance to Oracle RAC but I am using raw devices in Oracle RAC. Does rconfig support RAW?
No. rconfig supports ASM and shared file system only.

How many NICs do I need to implement Oracle RAC?
At minimum you need 2: external (public) and interconnect (private). When storage for Oracle RAC is provided by Ethernet-based networks (e.g. NAS/NFS or iSCSI), you will need a third interface for I/O, so a minimum of 3. Anything less will cause performance and stability problems under load. From an HA perspective, you want these to be redundant, thus needing a total of 6.

Can I run more than one clustered database on a single Oracle RAC cluster?
You can run multiple databases in an Oracle RAC cluster, either one instance per node (with different databases having different subsets of nodes in a cluster), or multiple instances per node (all databases running across all nodes), or some combination in between. Running multiple instances per node does cause memory and resource fragmentation, but this is no different from running multiple instances on a single node in a single instance environment, which is quite common. It does provide the flexibility of being able to share CPU on the node, but the Oracle Resource Manager will not currently limit resources between multiple instances on one node. You will need to use an OS level resource manager to do this.

Is it supported to install Oracle Clusterware and Oracle RAC as different users?
Yes, Oracle Clusterware and Oracle RAC can be installed as different users. The Oracle Clusterware user and the Oracle RAC user must both have OINSTALL as their primary group. Every Database home can have a different OSDBA group with a different username.

Does changing uid or gid of the Oracle User affect Oracle Clusterware?
There are a lot of files in the Oracle Clusterware home and outside of the Oracle Clusterware home that are chgrp'ed to the appropriate groups for security and appropriate access. The filesystem records the uid.

It does not record the username, so if you exchange the names, the files are now owned by the wrong group.

Can we output the backupset onto regular file system directly (not onto flash recovery area) using RMAN command, when we use SE RAC?
Yes. Customers might want to back up their database to offline storage, so this is also supported.

I am receiving an ORA-29740 error. What should I do?
This error can occur when problems are detected on the cluster:

Error: ORA-29740 (ORA-29740)
Text: evicted by member %s, group incarnation %s
---------------------------------------------------------------------------
Cause: This member was evicted from the group by another member of the cluster database for one of several reasons, which may include a communications error in the cluster, failure to issue a heartbeat to the control file, etc.
Action: Check the trace files of other active instances in the cluster group for indications of errors that caused a reconfiguration.

Why do we have a Virtual IP (VIP) in Oracle RAC 10g or 11g? Why does it just return a dead connection when its primary node fails?
The goal is application availability. When a node fails, the VIP associated with the failed node is automatically failed over to one of the other nodes in the cluster. When this occurs, the following things happen:
(1) VIP detects public network failure, which generates a FAN event.
(2) The new node re-arps the world, indicating a new MAC address for the IP.
(3) Connected clients subscribing to FAN immediately receive an ORA-3113 error or equivalent. Those not subscribing to FAN will eventually time out.
(4) New connection requests rapidly traverse the tnsnames.ora address list, skipping over the dead nodes, instead of having to wait on TCP-IP timeouts.
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs and FAN. The easiest way to use FAN is to use an integrated client with Fast Connection Failover (FCF) such as JDBC, OCI, or ODP.NET.

What do the VIP resources do once they detect a node has failed/gone down? Are the VIPs automatically acquired, and published, or is manual intervention required? Are VIPs mandatory?
With Oracle RAC 10g or higher, each node requires a VIP. With Oracle RAC 11g Release 2, 3 additional SCAN VIPs are required for the cluster. When a node fails, the VIP associated with it is automatically failed over to some other node. When this occurs, two things happen:
1. The new node re-arps the world, indicating a new MAC address for this IP address. For directly connected clients, this usually causes them to see errors on their connections to the old address.
2. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

In the case of existing SQL connections, errors will typically be in the form of ORA-3113 errors, while a new connection using an address list will select the next entry in the list. Without using VIPs, clients connected to a node that died will often wait for a TCP/IP timeout period before getting an error. This can be as long as 10 minutes or more. As a result, you don't really have a good HA solution without using VIPs.
With Oracle RAC 11g Release 2, you can delegate the management of the VIPs to the cluster. If you do this, the Grid Naming Service (part of Oracle Clusterware) will automatically allocate and manage all VIPs in the cluster. This requires a DHCP service on the public network.

If I use Services with Oracle RAC, do I still need to set up Load Balancing?
Yes. Services allow you granular definition of workload, and the DBA can dynamically define which instances provide the service. Connection Load Balancing (provided by Oracle Net Services) still needs to be set up to allow the user connections to be balanced across all instances providing a service. With Oracle RAC 10g Release 2 or higher, set the CLB_GOAL on the service to define the type of load balancing you want: SHORT for short-lived connections (e.g. a connection pool) or LONG (default) for applications that have connections active for long periods (e.g. an Oracle Forms application).

Will adding a new instance to my Oracle RAC database (new node to the cluster) allow me to scale the workload?
YES! Oracle RAC allows you to dynamically scale out your workload by adding another node to the cluster. You must remember that adding more work to the database means that, in addition to the CPU and memory that the new node brings, you will have to ensure that your I/O subsystem can support the additional I/O requirements. In an Oracle RAC environment, you need to look at the total I/O across all instances in the cluster.

What is the Cluster Verification Utility (cluvfy)?
The Cluster Verification Utility (CVU) is a validation tool that you can use to check all the important components that need to be verified at different stages of deployment in a RAC environment. The wide domain of deployment of CVU ranges from initial hardware setup through a fully operational cluster for RAC deployment, and covers all the intermediate stages of installation and configuration of various components. Cluvfy does not take any corrective action following the failure of a verification task, does not enter into areas of performance tuning or monitoring, does not perform any cluster or RAC operation, and does not attempt to verify the internals of the cluster database or cluster elements.

How do you backup voting disk?
A:
    # dd if=voting_disk_name of=backup_file_name

How do I identify the voting disk location?
A:
    # crsctl query css votedisk

How do I identify the OCR file location?
A: Check /var/opt/oracle/ocr.loc or /etc/ocr.loc (depending on the platform).
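The two voting disk answers above fit together: list the voting disks with crsctl query css votedisk, then copy each one with dd. The following Python sketch only illustrates that workflow on captured text. The sample output mirrors the 11.1-style layout quoted elsewhere in this FAQ, and the function names and the /backup/votedisk directory are assumptions made for the example. Other Clusterware releases print a different layout, so the pattern would need adjusting.

```python
import re
from typing import List

# Sample text in the layout this FAQ shows for 11.1: "index. number path".
SAMPLE_OUTPUT = """\
 0.     0    /dev/sda3
 1.     0    /dev/sda5
 2.     0    /dev/sda6
Located 3 voting disk(s).
"""

# One voting disk per line; the summary line does not match this pattern.
VOTEDISK_LINE = re.compile(r"^\s*\d+\.\s+\d+\s+(\S+)\s*$")


def voting_disk_paths(output: str) -> List[str]:
    """Extract the voting disk paths from captured crsctl output."""
    paths = []
    for line in output.splitlines():
        match = VOTEDISK_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def dd_backup_commands(paths: List[str],
                       backup_dir: str = "/backup/votedisk") -> List[str]:
    """Build one dd command per voting disk (backup_dir is hypothetical)."""
    commands = []
    for path in paths:
        name = path.strip("/").replace("/", "_")
        commands.append(f"dd if={path} of={backup_dir}/{name}.bak")
    return commands


paths = voting_disk_paths(SAMPLE_OUTPUT)
for command in dd_backup_commands(paths):
    print(command)
```

The sketch only prints the commands rather than running them, since anything that touches the voting disks would normally be reviewed and executed by hand as root.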

The OCR location is also reported by:
    # ocrcheck

Is ssh required for normal Oracle RAC operation?
A: "ssh" is not required for normal Oracle RAC operation. However, "ssh" should be enabled for Oracle RAC and patchset installation.

What is the purpose of Private Interconnect?
A: Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol. RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

Why do we have a Virtual IP (VIP) in Oracle RAC?
A: Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs. When a node fails, the VIP associated with it is automatically failed over to some other node, and the new node re-arps the world, indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

What is SCAN?
A: Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change if you add or remove nodes in the cluster.

How many nodes are supported in a RAC Database?
A: 10g Release 2 supports 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

Srvctl cannot start instance, I get the following error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes? How do you identify the problem?
A: Set the environment variable SRVM_TRACE to true, and start the instance with srvctl. Now you will get a detailed error stack.

What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR Report?
A: This is most likely due to a fault in the interconnect network. Check netstat -s to see whether it reports "fragments dropped" or "packet reassemblies failed". Work with your system administrator to find the fault with the network.

What are Oracle Clusterware Components?
A: Voting Disk: Oracle RAC uses the voting disk to manage cluster membership by way of a health check.

The voting disk also arbitrates cluster ownership among the instances in case of network failures. The voting disk must reside on shared disk.
Oracle Cluster Registry (OCR): Maintains cluster configuration information as well as configuration information about any cluster database within the cluster. The OCR must reside on shared disk that is accessible by all of the nodes in your cluster.

How do you backup the OCR?
A: There is an automatic backup mechanism for OCR. The default location is: $ORA_CRS_HOME/cdata/"clustername"/
To display backups:
    # ocrconfig -showbackup
To restore a backup:
    # ocrconfig -restore
With Oracle RAC 10g Release 2 or later, you can also use the export command:
    # ocrconfig -export -s online
and use the -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manual backup of the OCR with the command:
    # ocrconfig -manualbackup

How do we verify an existing current backup of OCR?
A: We can verify the current backup of OCR using the following command:
    ocrconfig -showbackup

What are the performance views in an Oracle RAC environment?
A: We have V$ views that are instance specific. In addition we have GV$ views, called global views, that have an INST_ID column of numeric data type. GV$ views obtain information from the individual V$ views.

What are Oracle database background processes specific to RAC?
• LMS: Global Cache Service Process
• LMD: Global Enqueue Service Daemon
• LMON: Global Enqueue Service Monitor
• LCK0: Instance Enqueue Process
To ensure that each Oracle RAC database instance obtains the block that it needs to satisfy a query or transaction, Oracle RAC instances use two processes, the Global Cache Service (GCS) and the Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of each data file and each cached block using a Global Resource Directory (GRD). The GRD contents are distributed across all of the active instances.

What are the types of connection load-balancing?
A: There are two types of connection load-balancing: server-side load balancing and client-side load balancing.

What is the difference between server-side and client-side connection load balancing?

A: Client-side balancing happens at the client side, where load balancing is done using the listener. In the case of server-side load balancing, the listener uses a load-balancing advisory to redirect connections to the instance providing the best service.

Give the usage of srvctl:
    srvctl start instance -d db_name -i "inst_name_list" [-o start_options]
    srvctl stop instance -d name -i "inst_name_list" [-o stop_options]
    srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
    srvctl start database -d name [-o start_options]
    srvctl stop database -d name [-o stop_options]
    srvctl start database -d orcl -o mount

How do we remove ASM from a Oracle RAC environment?
A: We need to stop and delete the instance in the node first, in interactive or silent mode. After that, ASM can be removed using the srvctl tool as follows:
    srvctl stop asm -n node_name
    srvctl remove asm -n node_name
We can verify that ASM has been removed by issuing the following command:
    srvctl config asm -n node_name

How do we verify that an instance has been removed from OCR after deleting an instance?
A: Issue the following srvctl command:
    srvctl config database -d database_name
    cd CRS_HOME/bin
    ./crs_stat

Which enable the load balancing of applications in RAC?
A: Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

WHAT ARE THE DEPENDENCIES BETWEEN OCFS AND ASM IN ORACLE DATABASE 10G?
In an Oracle RAC 10g environment, there is no dependency between Automatic Storage Management (ASM) and Oracle Cluster File System (OCFS). OCFS is not required when ASM is used for database files. You can use OCFS on Windows (Version 2 on Linux) for files that ASM does not handle. If you do not want to use ASM for your database files, you can still use OCFS for database files in Oracle Database 10g.

WHAT IS CACHE FUSION AND HOW DOES THIS AFFECT APPLICATIONS?
* Cache Fusion is a new parallel database architecture for exploiting clustered computers to achieve scalability of all types of applications.
* Cache Fusion is a shared cache architecture that uses high speed, low latency interconnects available on clustered systems to maintain database cache coherency. Database blocks are shipped across the interconnect to the node where access to the data is needed.

HOW DO I DETERMINE WHICH NODE IN THE CLUSTER IS THE "MASTER" NODE?
* For the cluster synchronization service (CSS), the master can be found by searching the following log:

    ORACLE_HOME/log/cssd/ocssd.log
where ORACLE_HOME is the Oracle home for Oracle Clusterware (this is the Grid Infrastructure home in Oracle Database 11g Release 2).
* For the master of an enqueue resource with Oracle RAC, you can select from v$ges_resource. There should be a master_node column.

WHAT IS CRS?
Cluster Ready Services (CRS) is the primary program that manages high availability operations in a RAC environment. The crs process manages designated cluster resources, such as databases, services, and listeners.

CAN I RUN ORACLE RAC 10G WITH ORACLE RAC 11G?
Yes. Oracle Clusterware should always run at the highest level. With Oracle Clusterware 11g, you can run both Oracle RAC 10g and Oracle RAC 11g databases. If you are using ASM for storage, you can use either Oracle Database 10g ASM or Oracle Database 11g ASM; however, to get the 11g features, you must be running Oracle Database 11g ASM. It is recommended to use Oracle Database 11g ASM.

CAN RMAN BACKUP ORACLE REAL APPLICATION CLUSTER DATABASES?
* Absolutely. RMAN can be configured to connect to all nodes within the cluster to parallelize the backup of the database files and archive logs.
* If files need to be restored, using set AUTOLOCATE ON alerts RMAN to search for backed up files and archive logs on all nodes.

WHAT ARE ALL THE RAC BACKGROUND PROCESSES?
DIAG: Diagnosability Daemon.
LCKx: This process manages the global enqueue requests and the cross-instance broadcast. Workload is automatically shared.
LMDx: The Global Enqueue Service Daemon. The LMD process also handles deadlock detection and remote enqueue requests. Remote resource requests are the requests originating from another instance.
LMON: The Global Enqueue Service Monitor (LMON) monitors the entire cluster to manage the global enqueues and the resources.
LMSx: The Global Cache Service Processes (LMSx) are the processes that handle remote Global Cache Service (GCS) messages.

CRSCTL COMMANDS?
Note: Any command which just needs to query information can be run as the oracle user. But anything which alters Oracle Clusterware requires root privileges.
Enable Oracle Clusterware
    # crsctl enable crs
Start Oracle Clusterware
    # crsctl start crs
Stop Oracle Clusterware
    # crsctl stop crs
Disable Oracle Clusterware
    # crsctl disable crs

CHECKING VOTING DISK LOCATION?
    $ crsctl query css votedisk
     0.     0    /dev/sda3
     1.     0    /dev/sda5
     2.     0    /dev/sda6
    Located 3 voting disk(s).
Add Voting disk
    # crsctl add css votedisk path
Remove Voting disk
    # crsctl delete css votedisk path

Check CRS Status
HOW TO SEE THE PARTICULAR DAEMON STATUS?
    $ crsctl check cssd
    Cluster Synchronization Services appears healthy.
    $ crsctl check crsd
    Cluster Ready Services appears healthy.
    $ crsctl check evmd
    Event Manager appears healthy.
You can also check Clusterware status on both the nodes using:
    $ crsctl check cluster
    prod01 ONLINE
    prod02 ONLINE

CHECKING ORACLE CLUSTERWARE VERSION:
To determine the software version (binary version of the software on a particular cluster node) use:
    $ crsctl query crs softwareversion
    Oracle Clusterware version on node [prod01] is [11.1.0.6.0]
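Because the software version is reported per node, it can be useful to collect the line from every node and compare them, for instance while a patch is being rolled out. Below is a minimal Python sketch of that comparison, assuming the message format shown above. The prod02 line is invented for the example, and the function names are illustrative rather than part of any Oracle tool.

```python
import re
from typing import Dict, Tuple

Version = Tuple[int, ...]

# Matches the message format quoted above for "crsctl query crs softwareversion".
VERSION_LINE = re.compile(
    r"Oracle Clusterware version on node \[(?P<node>[^\]]+)\] "
    r"is \[(?P<version>[\d.]+)\]"
)


def parse_software_versions(output: str) -> Dict[str, Version]:
    """Map each node name to its version as a tuple of integers."""
    versions = {}
    for match in VERSION_LINE.finditer(output):
        parts = match.group("version").split(".")
        versions[match.group("node")] = tuple(int(p) for p in parts)
    return versions


def mixed_versions(versions: Dict[str, Version]) -> bool:
    """True when the nodes do not all report the same binary version."""
    return len(set(versions.values())) > 1


# One line per node, as if gathered by running the command on each node.
# The prod02 line is hypothetical.
sample = (
    "Oracle Clusterware version on node [prod01] is [11.1.0.6.0]\n"
    "Oracle Clusterware version on node [prod02] is [11.1.0.6.0]\n"
)

versions = parse_software_versions(sample)
for node, version in sorted(versions.items()):
    print(node, ".".join(str(part) for part in version))
print("mixed versions:", mixed_versions(versions))
```

Keeping versions as integer tuples rather than strings means they also compare correctly with < and >, which plain string comparison would not do for values such as 10.2 and 9.2.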

FOR CHECKING ACTIVE VERSION ON CLUSTER, USE
    $ crsctl query crs activeversion
    Oracle Clusterware active version on the cluster is [11.1.0.6.0]

VIEWING OCR DISK INFORMATION:
    [root@node1-pub ~]# ocrcheck
    Status of Oracle Cluster Registry is as follows :
        Version                  : 2
        Total space (kbytes)     : 262120
        Used space (kbytes)      : 3848
        Available space (kbytes) : 258272
        ID                       : 744414276
        Device/File Name         : /u02/ocfs2/ocr/OCRfile_0
                                   Device/File integrity check succeeded
        Device/File Name         : /u02/ocfs2/ocr/OCRfile_1
                                   Device/File integrity check succeeded
        Cluster registry integrity check succeeded.

RESTORING VOTEDISKS?
    crsctl stop crs
    crsctl query css votedisk
    dd if=<backup of Votedisk> of=<Votedisk file>    (do this for all the votedisks)
    crsctl start crs

ENABLE THE NODEAPPS, ASM, DATABASE INSTANCES FOR ALL THE NODES:
    srvctl enable instance -d test -i test1,test2
    srvctl enable asm -n node1-pub
    srvctl enable asm -n node2-pub
    srvctl enable nodeapps -n node1-pub
    srvctl enable nodeapps -n node2-pub

VIEWING NO. OF NODES CONFIGURED IN CLUSTER:
    olsnodes -n -p -i
    [root@node1-pub ~]# olsnodes -n -p -i
    node1-pub 1 node1-prv node1-vip
    node2-pub 2 node2-prv node2-vip

WHAT ARE THE MAJOR RAC WAIT EVENTS?
In a RAC environment the buffer cache is global across all instances in the cluster, and hence the processing differs. The most common wait events related to this are:
GC CR REQUEST: The time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries.

Poorly tuned queries increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.
GC BUFFER BUSY: It is the time the remote instance locally spends accessing the requested data block.

What is RAC and how is it different from non RAC databases?
RAC stands for Real Application Clusters. It allows multiple nodes in a clustered system to mount and open a single database that resides on shared disk storage. Should a single system (node) fail, the database service will still be available on the remaining nodes. A non-RAC database is only available on a single system. If that system fails, the database service will be down (single point of failure).

Can any application be deployed on RAC?
Most applications can be deployed on RAC without any modifications and still scale linearly (well, almost). However, applications with 'hot' blocks (the same data blocks continuously accessed by processes on different nodes) may not work well. This is because data blocks will constantly be moved from one Oracle Instance to another. In such cases the application may need to be partitioned based on function or data to eliminate the contention.

Do you need special hardware to run RAC?
RAC requires the following hardware components:
• A dedicated network interconnect, which might be as simple as a fast network connection between nodes; and
• A shared disk subsystem.

How many OCR and voting disks should one have?
For redundancy, one should have at least two OCR disks and three voting disks (raw disk partitions). These disk partitions should be spread across different physical disks.

How does one convert a single instance database to RAC?
Oracle 10gR2 introduces a utility called rconfig (located in $ORACLE_HOME/bin) that will convert a single instance database to a RAC database.
    $ cp $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC.xml racconv.xml
    $ vi racconv.xml
    $ rconfig racconv.xml
One can also use dbca and Enterprise Manager to convert the database to RAC mode.
For prior releases, follow these steps:
• Shut down your database:
    SQL> CONNECT SYS AS SYSDBA
    SQL> SHUTDOWN NORMAL
• Enable RAC. On Unix this is done by relinking the Oracle software.
• Make the software available on all computer systems that will run RAC. This can be done by copying the software to all systems or to a shared clustered file system.

• Each instance requires its own set of Redo Log Files (called a thread). Create additional log files:
    SQL> CONNECT SYS AS SYSDBA
    SQL> STARTUP EXCLUSIVE
    SQL> ALTER DATABASE ADD LOGFILE THREAD 2
           GROUP 4 ('RAW_FILE1') SIZE 500k,
           GROUP 5 ('RAW_FILE2') SIZE 500k,
           GROUP 6 ('RAW_FILE3') SIZE 500k;
    SQL> ALTER DATABASE ENABLE PUBLIC THREAD 2;
• Each instance requires its own set of Undo segments (rollback segments). To add undo segments for new nodes:
    UNDO_MANAGEMENT = auto
    UNDO_TABLESPACE = undots2
• Edit the SPFILE/INIT.ORA files and number the instances 1, 2, ...:
    CLUSTER_DATABASE = TRUE (PARALLEL_SERVER = TRUE prior to Oracle9i)
    INSTANCE_NUMBER = 1
    THREAD = 1
    UNDO_TABLESPACE = undots1 (or ROLLBACK_SEGMENTS if you use UNDO_MANAGEMENT=manual)
    # Include %T for the thread in the LOG_ARCHIVE_FORMAT string.
    # Set LM_PROCS to the number of nodes * PROCESSES
    # etc....
• Create the dictionary views needed for RAC by running catclust.sql (previously called catparr.sql):
    SQL> START ?/rdbms/admin/catclust.sql
• On all the computer systems, start up the instances:
    SQL> CONNECT / as SYSDBA
    SQL> STARTUP;

How does one stop and start RAC instances?
There is no difference between the way you start a normal database and a RAC database, except that a RAC database needs to be started from multiple nodes. The CLUSTER_DATABASE=TRUE (PARALLEL_SERVER=TRUE) parameter needs to be set before a database can be started in cluster mode.
In Oracle 10g one can use the srvctl utility to start instances and listeners across the cluster from a single node. Here are some examples:
    $ srvctl status database -d RACDB
    $ srvctl start database -d RACDB
    $ srvctl start instance -d RACDB -i RACDB1
    $ srvctl start instance -d RACDB -i RACDB2
    $ srvctl stop database -d RACDB
    $ srvctl start asm -n node2

How Can I test if a database is running in RAC mode?

Use the DBMS_UTILITY package to determine if a database is running in RAC mode or not. Example:
    BEGIN
      IF dbms_utility.is_cluster_database THEN
        dbms_output.put_line('Running in SHARED/RAC mode.');
      ELSE
        dbms_output.put_line('Running in EXCLUSIVE mode.');
      END IF;
    END;
    /
Another method is to look at the database parameters. For example, from SQL*Plus:
    SQL> show parameter CLUSTER_DATABASE
If the value of CLUSTER_DATABASE is FALSE, then the database is not running in RAC mode.

How can I keep track of active instances?
You can keep track of active RAC instances by executing one of the following queries:
    SELECT * FROM SYS.V_$ACTIVE_INSTANCES;
    SELECT * FROM SYS.V_$THREAD;

Can one see how connections are distributed across the nodes?
Select from gv$session. Some examples:
    SELECT inst_id, count(*) "DB Sessions"
      FROM gv$session
     WHERE type = 'USER'
     GROUP BY inst_id;
With login time (hour):
    SELECT inst_id, TO_CHAR(logon_time, 'DD-MON-YYYY HH24') "Hour when connected", count(*) "DB Sessions"
      FROM gv$session
     WHERE type = 'USER'
     GROUP BY inst_id, TO_CHAR(logon_time, 'DD-MON-YYYY HH24')
     ORDER BY inst_id, TO_CHAR(logon_time, 'DD-MON-YYYY HH24');

What are Oracle Clusterware processes for 10g on Unix and Linux?
Cluster Synchronization Services (ocssd): Manages cluster node membership and runs as the oracle user; failure of this process results in cluster restart.
Cluster Ready Services (crsd): The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.
Event manager daemon (evmd): A background process that publishes events that crs creates.
Process Monitor Daemon (OPROCD): This process monitors the cluster and provides I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.
RACG (racgmain, racgimon): Extends clusterware to support Oracle-specific requirements and complex resources. Runs server callout scripts when FAN events occur.
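The per-instance session counts returned by the first gv$session query on this page can be turned into a quick balance check. Here is a minimal Python sketch, assuming the rows have already been fetched as (inst_id, session_count) pairs. The function name and the sample numbers are made up for illustration, and no database driver is involved.

```python
from typing import Dict, Iterable, Tuple


def session_share(rows: Iterable[Tuple[int, int]]) -> Dict[int, float]:
    """Return each instance's percentage of the total user sessions."""
    counts = dict(rows)
    total = sum(counts.values())
    if total == 0:
        # No sessions at all: report 0% everywhere instead of dividing by zero.
        return {inst_id: 0.0 for inst_id in counts}
    return {inst_id: 100.0 * n / total for inst_id, n in counts.items()}


# Hypothetical result of: SELECT inst_id, count(*) ... GROUP BY inst_id
rows = [(1, 30), (2, 50), (3, 20)]

shares = session_share(rows)
for inst_id in sorted(shares):
    print(f"instance {inst_id}: {shares[inst_id]:.1f}% of user sessions")
```

A strongly uneven split is not necessarily a problem (services may be placed on a subset of instances on purpose), so the percentages are best read next to the service configuration.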

How do you troubleshoot node reboot?
Please check Metalink:
Note 265769.1 Troubleshooting CRS Reboots
Note 559365.1 Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions.

How do I identify the OCR file location?
Check /var/opt/oracle/ocr.loc or /etc/oracle/ocr.loc (depends upon platform), or run # ocrcheck.

Is ssh required for normal Oracle RAC operation?
No. "ssh" is not required for normal Oracle RAC operation. However, "ssh" should be enabled for Oracle RAC and patchset installation.

What is the purpose of the Private Interconnect?
Clusterware uses the private interconnect for cluster synchronization (network heartbeat) and daemon communication between the clustered nodes. This communication is based on the TCP protocol. RAC uses the interconnect for cache fusion (UDP) and inter-process communication (TCP). Cache Fusion is the remote memory mapping of Oracle buffers, shared between the caches of participating nodes in the cluster.

Why do we have a Virtual IP (VIP) in Oracle RAC?
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error. As a result, you don't really have a good HA solution without using VIPs. When a node fails, the VIP associated with it is automatically failed over to some other node, and the new node re-arps the world, indicating a new MAC address for the IP. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.

What is SCAN?
Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that clients using SCAN do not need to change if you add or remove nodes in the cluster.

How many nodes are supported in a RAC Database?
With 10g Release 2, we support 100 nodes in a cluster using Oracle Clusterware, and 100 instances in a RAC database.

Srvctl cannot start the instance and I get the error PRKP-1001 CRS-0215, however sqlplus can start it on both nodes. How do you identify the problem?
Set the environment variable SRVM_TRACE to true and start the instance with srvctl. You will now get a detailed error stack.

What do you do if you see GC CR BLOCK LOST in the top 5 Timed Events in an AWR Report?
This is most likely due to a fault in the interconnect network. Check netstat -s to see whether it reports "fragments dropped" or "packet reassemblies failed". Work with your system administrator to find the fault in the network.
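The netstat -s check for GC CR BLOCK LOST can be scripted so that two snapshots taken a few minutes apart are easy to compare. The following is only a minimal sketch: the function names, the sample text and the exact counter wording are assumptions (netstat -s output differs between platforms), and it works on text you have already captured rather than running netstat itself.

```python
import re

# Counter wording is an assumption: the FAQ quotes "fragments dropped" and
# "packet reassemblies failed", while some platforms print "reassembles".
PATTERNS = {
    "fragments_dropped": re.compile(
        r"^\s*(\d+)\s+fragments dropped", re.MULTILINE),
    "reassembly_failures": re.compile(
        r"^\s*(\d+)\s+packet reassembl(?:ie|e)s failed", re.MULTILINE),
}


def interconnect_counters(netstat_output: str) -> dict:
    """Sum the fragment-related counters found in captured `netstat -s` text."""
    return {
        name: sum(int(value) for value in pattern.findall(netstat_output))
        for name, pattern in PATTERNS.items()
    }


def counters_increased(before: dict, after: dict) -> bool:
    """True if any counter grew between two snapshots."""
    return any(after[name] > before[name] for name in after)


# Hypothetical captured output, shortened for illustration.
SAMPLE_BEFORE = """Ip:
    1200 total packets received
    7 fragments dropped after timeout
    3 packet reassembles failed
"""
SAMPLE_AFTER = SAMPLE_BEFORE.replace("7 fragments", "19 fragments")

if __name__ == "__main__":
    first = interconnect_counters(SAMPLE_BEFORE)
    second = interconnect_counters(SAMPLE_AFTER)
    print(first, second, counters_increased(first, second))
```

Counters that keep growing between snapshots are the signal to take to the system administrator; a non-zero but static value may only reflect an old incident.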

What is the purpose of the ONS daemon?
The Oracle Notification Service (ONS) daemon is a daemon started by the CRS clusterware as part of the nodeapps. There is one ONS daemon started per clustered node. The ONS daemon receives a subset of published clusterware events via the local evmd and racgimon clusterware daemons and forwards those events to application subscribers and to the local listeners. This is in order to facilitate:
a. the FAN or Fast Application Notification feature, allowing applications to respond to database state changes.
b. the 10gR2 Load Balancing Advisory, the feature that permits load balancing across different RAC nodes depending on the load on the different nodes. The RDBMS MMON process creates an advisory for distribution of work every 30 seconds and forwards it via racgimon and ONS to listeners and applications.

What is Global Cache Service?
Global Cache Service (GCS) is the main component of Oracle Cache Fusion technology. It is represented by the background process LMSn. There can be a maximum of 10 LMS processes for an instance. The main function of GCS is to track the status and location of data blocks. Status of a data block means the mode and role of the data block (I will explain mode and role further). GCS is the main mechanism by which cache coherency among the multiple caches is maintained. GCS is also responsible for block transfer between the instances.

What is Global Enqueue Service?
Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. This involves all non-cache-fusion inter-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and transactions. It performs this operation for resources that are accessed by more than one instance.

Enqueue services are also present in a single instance database. These are responsible for locking the rows on a table using different locking modes.

What is Global Resource Directory?
GES and GCS together maintain the Global Resource Directory (GRD). GRD is like an in-memory database which contains details about all the blocks that are present in cache. GRD knows the location of the latest version of a block, the mode of the block, the role of the block (mode and role will be discussed shortly) etc. Whenever a user asks for any data block, GCS gets all the information from GRD. GRD is a distributed resource, meaning that each instance maintains some part of GRD. This distributed nature of GRD is key to the fault tolerance of RAC. GRD is stored in the SGA.
Typically GRD contains the following and more information:
• Data Block Address – This is the address of the data block being modified
• Location of most current version of data block
• Modes of data block
• Roles of data block
• SCN of data block
• Image of data block – Could be current image or past image.

What are GCS resource modes and roles?
The mode of a data block is decided based on whether a resource holder intends to modify the data or read the data. The modes are as follows:
1 Null (N) Mode: Null mode is the least restrictive mode. It indicates no access rights. It acts as a place holder.
2 Shared (S) Mode: Shared mode indicates that the database block is being read and not modified. However, another session can read the data block.
3 Exclusive (X) Mode: Exclusive mode indicates exclusive access to the block. Other resources cannot write to this data block. However, they can have a consistent read on this data block.
GCS resources also have roles. Following are the different roles present:
1 Local: When a data block is first read into the instance from the disk it has a local role, meaning that only 1 copy of the data block exists in the cache. No other instance cache has a copy of this block.
2 Global: Global role indicates that multiple copies of the data block exist in the clustered instances.
For example, a user connected to one of the instances requests a data block. This data block is read from disk into an instance. The role granted is local. If another instance requests the same block, this block will get copied to the requesting instance and the role becomes global. This role and mode information is maintained in GRD (Global Resource Directory) by GCS (Global Cache Service).

What is RAC?
RAC stands for Real Application Clusters. It is a clustering solution from Oracle Corporation that ensures high availability of databases by providing instance failover and media failover features.

What is RAC and how is it different from non-RAC databases?
RAC stands for Real Application Clusters. You have n instances running on their own separate nodes, all based on the shared storage. The cluster is the key component and is a collection of servers operating as one unit. RAC is the best solution for high performance and high availability.
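As a rough illustration of the GCS resource modes and roles described above, the toy model below tracks which instance holds a block in which mode and derives the role from the number of holders. It is a sketch only: the class name, the instance names and the downgrade rules are assumptions made for this example, and it does not model past images or anything else that the real GCS and GRD do internally.

```python
from dataclasses import dataclass, field


@dataclass
class BlockResource:
    """Toy GRD entry for one data block (hypothetical structure)."""
    # instance name -> mode held: "N" (null), "S" (shared) or "X" (exclusive)
    holders: dict = field(default_factory=dict)

    @property
    def role(self) -> str:
        # Local while at most one instance is recorded for the block,
        # global once a second instance has received a copy.
        return "local" if len(self.holders) <= 1 else "global"

    def request(self, instance: str, mode: str) -> None:
        """Record that `instance` was granted the block in `mode`."""
        if mode not in ("S", "X"):
            raise ValueError("mode must be 'S' or 'X'")
        for other, held in self.holders.items():
            if other == instance:
                continue
            if mode == "X":
                # Assumption of this sketch: every other holder keeps
                # only a null-mode placeholder.
                self.holders[other] = "N"
            elif held == "X":
                # Assumption of this sketch: an exclusive holder is
                # downgraded to shared when someone else reads.
                self.holders[other] = "S"
        self.holders[instance] = mode


if __name__ == "__main__":
    block = BlockResource()
    block.request("inst1", "S")   # first read from disk
    print(block.role, block.holders)
    block.request("inst2", "S")   # second instance gets a copy
    print(block.role, block.holders)
    block.request("inst2", "X")   # second instance wants to modify
    print(block.role, block.holders)
```

The walk-through in the main block mirrors the example in the text: the first request leaves the role local, and the request from a second instance turns it global.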

Non-RAC databases have a single point of failure in case of hardware failure or server crash.

Give the usage of srvctl:
srvctl start instance -d db_name -i "inst_name_list" [-o start_options]
srvctl stop instance -d name -i "inst_name_list" [-o stop_options]
srvctl stop instance -d orcl -i "orcl3,orcl4" -o immediate
srvctl start database -d name [-o start_options]
srvctl stop database -d name [-o stop_options]
srvctl start database -d orcl -o mount

Mention the Oracle RAC software components:
Oracle RAC is composed of two or more database instances. They are composed of memory structures and background processes, the same as a single instance database. Oracle RAC instances use two processes, GES (Global Enqueue Service) and GCS (Global Cache Service), that enable cache fusion. Oracle RAC instances are composed of the following background processes:
ACMS: Atomic Controlfile to Memory Service
GTX0-j: Global Transaction Process
LMON: Global Enqueue Service Monitor
LMD: Global Enqueue Service Daemon
LMS: Global Cache Service Process
LCK0: Instance Enqueue Process
RMSn: Oracle RAC Management Processes
RSMN: Remote Slave Monitor

What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintain records of the statuses of each datafile and each cached block using the Global Resource Directory. This process is referred to as cache fusion and helps in data integrity.

Give details on ACMS:
ACMS stands for Atomic Controlfile to Memory Service. In an Oracle RAC environment ACMS is an agent that ensures a distributed SGA memory update, i.e. SGA updates are globally committed on success or globally aborted in the event of a failure.

What is Cache Fusion?
Cache fusion is the mechanism to transfer a data block from the memory of one node to the memory of another. If two nodes require the same block for query or update, the block must be transferred from the cache of one node to the other. A RAC system must be equipped with a low-latency, high-speed interconnect to make this happen.

What are the different network components in 10g RAC?
Public, private, and VIP components. Private interfaces are for inter-node communication. VIP is all about availability of the application. When a node fails, the VIP component fails over to some other node. This is the reason that all applications should be based on VIP components, meaning that TNS entries should have the VIP entries in the host list.

What are the major RAC wait events?
In a RAC environment the buffer cache is global across all instances in the cluster and hence the processing differs. The most common wait events related to this are gc cr request and gc buffer busy.

GC CR request: the time it takes to retrieve the data from the remote cache.
Reason: RAC traffic using a slow connection, or inefficient queries (poorly tuned queries will increase the amount of data blocks requested by an Oracle session. The more blocks requested typically means the more often a block will need to be read from a remote instance via the interconnect.)

What components in RAC must reside in shared storage?
All datafiles, controlfiles, SPFILEs and redo log files must reside on cluster-aware shared storage.

What is the significance of using cluster-aware shared storage in an Oracle RAC environment?
All instances of an Oracle RAC database can access all the datafiles, control files, SPFILEs and redo log files when these files are hosted on cluster-aware shared storage, which is a group of shared disks.

Give a few examples of solutions that support cluster storage:
ASM (Automatic Storage Management), raw disk devices, network file system (NFS), OCFS2 and OCFS (Oracle Cluster File Systems).

What is an interconnect network?
An interconnect network is a private network that connects all of the servers in a cluster. The interconnect network uses a switch (or multiple switches) that only the nodes in the cluster can access.

What is the use of the cluster interconnect?
The cluster interconnect is used by Cache Fusion for inter-instance communication.

How can we configure the cluster interconnect?
Configure User Datagram Protocol (UDP) on Gigabit Ethernet for the cluster interconnect. On Unix and Linux systems, UDP and RDS (Reliable Datagram Sockets) are the protocols used by Oracle Clusterware. Windows clusters use the TCP protocol.

Can we use crossover cables with Oracle Clusterware interconnects?
No, crossover cables are not supported with Oracle Clusterware interconnects.

What is the use of a service in an Oracle RAC environment?
Applications should use the services feature to connect to the Oracle database. Services enable us to define rules and characteristics to control how users and applications connect to database instances.

What are the characteristics controlled by the Oracle services feature?
The characteristics include a unique name, workload balancing and failover options, and high availability characteristics.

How do users connect to the database in an Oracle RAC environment?
Users can access a RAC database using a client/server configuration or through one or more middle tiers, with or without connection pooling. Users can use the Oracle services feature to connect to the database.
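To make the idea of connecting through a service more concrete, the sketch below assembles an Oracle Net connect descriptor that names a service rather than an instance and lists one address per node. The helper name, host names, service name and port are hypothetical placeholders chosen for this example; the descriptor keywords (ADDRESS_LIST, LOAD_BALANCE, FAILOVER, SERVICE_NAME) are standard Oracle Net parameters, but check them against the documentation for your release before relying on this layout.

```python
def build_connect_descriptor(hosts, service_name, port=1521):
    """Return a connect descriptor string for a service reachable on `hosts`.

    `hosts` would normally be the VIP host names of the cluster nodes;
    every value passed in here is an assumption of the caller.
    """
    if not hosts:
        raise ValueError("at least one host is required")
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={host})(PORT={port}))"
        for host in hosts
    )
    return (
        "(DESCRIPTION="
        "(ADDRESS_LIST=(LOAD_BALANCE=on)(FAILOVER=on)" + addresses + ")"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name}))"
        ")"
    )


if __name__ == "__main__":
    # Hypothetical two-node cluster and service name.
    print(build_connect_descriptor(["node1-vip", "node2-vip"], "oltp_srv"))
```

Because clients name the service and not a particular instance, the same descriptor keeps working when the service is relocated to another instance.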

What enables the load balancing of applications in RAC?
Oracle Net Services enable the load balancing of application connections across all of the instances in an Oracle RAC database.

What is a virtual IP address or VIP?
A virtual IP address or VIP is an alternate IP address that the client connections use instead of the standard public IP address. To configure a VIP address, we need to reserve a spare IP address for each node, and the IP addresses must use the same subnet as the public network.

What is the use of VIP?
If a node fails, then the node's VIP address fails over to another node, on which the VIP address can accept TCP connections but cannot accept Oracle connections.

Give situations under which VIP address failover happens:
VIP address failover happens when the node on which the VIP address runs fails, when all interfaces for the VIP address fail, or when all interfaces for the VIP address are disconnected from the network.

What is the significance of VIP address failover?
When a VIP address failover happens, clients that attempt to connect to the VIP address receive a rapid connection refused error. They don't have to wait for TCP connection timeout messages.
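The difference between a rapid "connection refused" and a TCP timeout can be seen with a plain socket probe. This is a generic sketch, not Oracle client code: the function names are made up for this example, and the demo simply connects to a local port with no listener, which plays the role of a failed-over VIP that accepts TCP packets but has no Oracle listener behind it.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Try one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except ConnectionRefusedError:
        # The peer answered with RST: the client learns about it at once.
        return "refused"
    except socket.timeout:
        # Nobody answered: the client waited for the whole timeout.
        return "timeout"
    except OSError:
        return "unreachable"


def unused_local_port() -> int:
    """Ask the OS for a free port, then release it so nothing listens there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    print(probe("127.0.0.1", unused_local_port()))
```

A client that gets the refused outcome can move on to the next address in its list immediately, which is the behaviour the VIP failover is meant to produce.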