
Advantages of RAC (Real Application Clusters)

Reliability - if one node fails, the database won't fail


Availability - nodes can be added or replaced without having to shut down the database
Scalability - more nodes can be added to the cluster as the workload increases

===================================================================================
==========================================

What is Oracle RAC One Node?


Oracle RAC One Node is a single instance running on one node of the cluster while the second node is in cold standby mode. If the instance fails for some reason, RAC One Node detects it and restarts the instance on the same node, or relocates the instance to the second node in case of a failure or fault on the first node. The benefit of this feature is that it provides a cold failover solution and automates instance relocation without manual intervention. Oracle introduced this feature with the release of 11gR2 (available with Enterprise Edition).
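
From 11.2.0.2 the relocation can also be driven manually with srvctl Online Database Relocation; a minimal sketch, assuming a hypothetical database name RACONE and target node rac2:

$ srvctl status database -d RACONE                   # shows which node currently hosts the instance
$ srvctl relocate database -d RACONE -n rac2 -w 30   # relocate online, allowing 30 minutes for transactions to drain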

===================================================================================
==========================================


* Public Interface: Used for normal network communications to the node

* Private Interface: Used as the cluster interconnect

* Virtual (Public) Interface: Used for failover and RAC management

Location of the IP addresses: /etc/hosts

Command: $ /sbin/ifconfig -a

OCR location in the ASM diskgroup:

# /usr/sbin/oracleasm configure
ORACLEASM_ENABLED=false
ORACLEASM_UID=
ORACLEASM_GID=
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""

[root@myrac2 ~]# /usr/sbin/oracleasm configure -i


Configuring the Oracle ASM library driver.

vi /etc/sysconfig/oracleasm

oracleasm_uid=grid

===================================================================================
=========================================
********How to check which components are installed on this database?

select comp_name,status,version from dba_registry;


(or)
$ORA_BASE/oraInventory/logs/installActions******.log

===================================================================================
==========================================

crsctl check crs <<---------- local node CRS info


crsctl check cluster <<------ remote nodes in the cluster
crsctl stat res -t ---------- Check status of cluster resources.
crsctl stat res -t -init----- Check status of local-node (init) resources

-bash-3.2$ crsctl check crs


CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

===================================================================================
==========================================

Sequence to Start and stop RAC Services


=======================================

Stop RAC {NODEAPPS, ASM, DATABASE, services}


Start RAC {Services, DATABASE, ASM, NODEAPPS}
Status RAC {NODE, ASM, INSTANCE, DATABASE, Services}

Follow the steps below to start individual application resources.
crs_stat -t
srvctl start nodeapps -n <node1 hostname>
srvctl start nodeapps -n <node2 hostname>
srvctl start asm -n <node1 hostname>
srvctl start asm -n <node2 hostname>
srvctl start database -d <database name>
srvctl start service -d <database name> -s <service name>
crs_stat -t

Follow the steps below to stop individual application resources.
crs_stat -t
srvctl stop service -d <database name> -s <service name>
srvctl stop database -d <database name>
srvctl stop asm -n <node1 hostname>
srvctl stop asm -n <node2 hostname>
srvctl stop nodeapps -n <node1 hostname>
srvctl stop nodeapps -n <node2 hostname>
crs_stat -t

===================================================================================
==========================================

*******What is Local listener and what is Remote_listener?

Local_listener is the listener which serves the instances that register with it (as local_listener).
Remote_listener(s) should be all the listeners which serve the other instances belonging to the database.
Every PMON tells all the local and remote listeners the load of the system where it resides.
In server-side load balancing, the listener redirects the client to the listener on the system with the lowest load.
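
A hedged sketch of how these parameters are typically set (host names, SCAN name and port below are hypothetical):

SQL> ALTER SYSTEM SET local_listener='(ADDRESS=(PROTOCOL=TCP)(HOST=rac1-vip)(PORT=1521))' SCOPE=BOTH SID='RACDB1';
SQL> ALTER SYSTEM SET remote_listener='rac-scan:1521' SCOPE=BOTH SID='*';

With these set, each PMON registers with its local listener and cross-registers with the remote listeners, which is what makes server-side load balancing possible.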

===================================================================================
==========================================

Cluster Ready Services (crsd) - The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.

===================================================================================
==========================================

The voting disk ----is used to determine if a node has failed, i.e. become separated from the majority. If a node is deemed to no longer belong to the majority then it is forcibly rebooted and will, after the reboot, rejoin the surviving cluster nodes.

===================================================================================
==========================================

****What is Cache Fusion and how does this affect applications?


Cache Fusion is a shared cache architecture that uses the high-speed interconnect: database blocks are shipped across the interconnect to the node where access to the data is needed.

****Give Details on Cache Fusion:-


Oracle RAC is composed of two or more instances. When a block of data is read from a datafile by an instance within the cluster and another instance is in need of the same block, it is faster to get the block image from the instance which has the block in its SGA than to read it from disk. To enable inter-instance communication Oracle RAC makes use of the interconnect. The Global Enqueue Service (GES) monitors, and the Instance Enqueue Process manages, Cache Fusion.

===================================================================================
==========================================

How many voting disks are you maintaining & Why we need to keep odd number of
voting disks ?

By default Oracle will create 3 voting disk files in ASM.

Oracle expects that you will configure at least 3 voting disks for redundancy
purposes. You should always configure an odd number of voting disks >= 3. This is
because loss of more than half your voting disks will cause the entire cluster to
fail.

You should plan on allocating 280MB for each voting disk file. For example, if you
are using ASM and external redundancy then you will need to allocate 280MB of disk
for the voting disk. If you are using ASM and normal redundancy you will need
560MB.

===================================================================================
==========================================
What is the use of VIP?

If a node fails, then the node's VIP address fails over to another node on which
the VIP address can accept TCP connections but it cannot accept Oracle connections.

===================================================================================
==========================================
What are Oracle database background processes specific to RAC?
LMS - Global Cache Service Process
LMD - Global Enqueue Service Daemon
LMON - Global Enqueue Service Monitor
LCK0 - Instance Enqueue Process

Oracle RAC instances use two processes, the Global Cache Service (GCS) and the
Global Enqueue Service (GES). The GCS and GES maintain records of the statuses of
each data file and each cached block using a Global Resource Directory (GRD). The
GRD contents are distributed across all of the active instances

From SQL, the GRD can be examined by joining x$kjbl with x$le; block mastering can be checked via dba_objects and v$gcspfmaster_info.
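
For example, a query (run as a privileged user) that shows which instance currently masters the GCS resources of a given object; note that v$gcspfmaster_info reports instance numbers starting from 0, and the object name 'EMP' is just a placeholder:

SQL> SELECT o.object_name, g.current_master, g.previous_master, g.remaster_cnt
     FROM v$gcspfmaster_info g, dba_objects o
     WHERE g.data_object_id = o.data_object_id
     AND o.object_name = 'EMP';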

LMS - Global Cache Service Process


------------------------------------
The LMS process maintains records of the datafile statuses and each cached block by
recording information in a Global Resource Directory (GRD). The LMS process also
controls the flow of messages to remote instances and manages global data block
access and transmits block images between the buffer caches of different instances.
This processing is part of the Cache Fusion feature.

LMD - Global Enqueue Service Daemon


------------------------------------
The LMD process manages incoming remote resource requests within each instance.

LMON - Global Enqueue Service Monitor


------------------------------------

The LMON process monitors global enqueues and resources across the cluster and
performs global enqueue recovery operations.

LCK0 - Instance Enqueue Process


------------------------------------
The LCK0 process manages non-Cache Fusion resource requests such as library and row
cache requests.

===================================================================================
==========================================

Check nodes in cluster

[root@Rac1 bin]# olsnodes -n -p -i


rac1 1
rac2 2

You can also give the -s option to see which nodes are active or inactive in case of node eviction.

===================================================================================
==========================================
OCR info:
---------
cd /u01/app/11.2.0/grid/bin/ ./ocrcheck --------check the status of OCR
ocrconfig -showbackup--------check the backups info
ocrconfig -manualbackup <<--Physical Backup of OCR
ocrconfig -export /tmp/ocr_exp.dat -s online <<-- Logical Backup of OCR
ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2 ----remove
ocrconfig -restore /u01/app/crs/cdata/test-crs/backup00.ocr-------restore
Backing Up OCR

Oracle performs a physical backup of the OCR devices every 4 hours under the default backup directory, $ORA_CRS_HOME/cdata/<CLUSTER_NAME>, and then rolls that forward to daily, weekly and monthly backups. You can get the backup information by executing the command below.

ocrconfig -showbackup
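
The automatic backup location can also be moved off the default directory (shared storage is recommended so backups survive a node loss); a sketch with a hypothetical path:

# ocrconfig -backuploc /u02/ocr_backups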

===================================================================================
==========================================

Voting Disk:
------------
[root@Rac1 bin]# ./crsctl query css votedisk---status check

[root@Rac1 bin]# ./crsctl query crs activeversion

Oracle Clusterware active version on the cluster is [[Link].0]

[root@Rac1 bin]# ./crsctl query crs softwareversion

Oracle Clusterware version on node [rac1] is [[Link].0]

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0---backup command

crsctl add css votedisk /u02/ocfs2/vote/VDFile_3---add
crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3---remove
dd if=$ORACLE_BASE/bkp/vd/VDFile_0 of=/u02/ocfs2/vote/VDFile_0 ---restore command

===================================================================================
==========================================

crsctl get css disktimeout


crsctl get css misscount
crsctl get css reboottime
./crsctl get css reboottime

===================================================================================
==========================================
=================Adding Services to Cluster========================

1. Add ASM instance(s) to OCR:

srvctl add asm -n <node_name> -i <asm_instance_name> -o <oracle_home>

[oracle@rac1 bin]$ pwd
/u01/crs/oracle/product/10.2.0/crs/bin
[oracle@rac1 bin]$ ./srvctl add asm -i +ASM1 -n rac1 -o /u01/app/oracle/product/10.2.0/db_1

2. Add DATABASE to OCR:

srvctl add database -d <db_name> -o <oracle_home>

[oracle@rac1 bin]$ ./srvctl add database -d cdbs -o /u01/app/oracle/product/10.2.0/db_1

3. Add INSTANCE(S) to OCR:

srvctl add instance -d <db_name> -i <instance_name> -n <node_name>

[oracle@rac1 bin]$ ./srvctl add instance -d cdbs -i cdbs1 -n rac1

4. Add SERVICE(S) to OCR:

srvctl add service -d <db_name> -s <service_name> -r <preferred_instances> -P <TAF_policy>

[oracle@rac1 bin]$ ./srvctl add service -d cdbs -s cdbs_srvc -r cdbs1,cdbs2 -P BASIC
===================================================================================
==========================================

What is SCAN? (11gR2 feature)


Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC) 11g Release 2 feature that provides a single name for clients to access an Oracle Database running in a cluster. The benefit is that client configuration does not need to change if you add or remove nodes in the cluster.

SCAN provides a single domain name (via DNS), allowing end-users to address a RAC cluster as if it were a single IP address. SCAN works by replacing a hostname or IP list with virtual IP addresses (VIPs).

Single Client Access Name (SCAN) is meant to provide a single name for all Oracle clients to connect to the cluster database, irrespective of the number of nodes and node location. Until now, we had to keep adding multiple address records in all clients' tnsnames.ora when a new node was added to or deleted from the cluster.

Single Client Access Name (SCAN) eliminates the need to change the TNSNAMES entry when nodes are added to or removed from the cluster. RAC instances register with the SCAN listeners as remote listeners. Oracle recommends assigning 3 addresses to SCAN, which will create 3 SCAN listeners, even if the cluster has dozens of nodes. SCAN is a domain name registered to at least one and up to three IP addresses, either in DNS (Domain Name Service) or GNS (Grid Naming Service). The SCAN must resolve to at least one address on the public network. For high availability and scalability, Oracle recommends configuring the SCAN to resolve to three addresses.

What are SCAN components in a cluster?
1. SCAN Name
2. SCAN IPs (3)
3. SCAN Listeners (3)

An example client entry using SCAN is sketched below.
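
For illustration, a client tnsnames.ora entry using SCAN (the SCAN name, port and service name below are hypothetical):

ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = orcl)
    )
  )

This single entry keeps working when nodes are added or removed, because the SCAN listeners hand each connection off to a node VIP.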

===================================================================================
==========================================

How to find location of OCR file when CRS is down?

When the CRS is down:


Look into the 'ocr.loc' file; the location of this file changes depending on the OS:
On Linux: /etc/oracle/ocr.loc
On Solaris: /var/opt/oracle/ocr.loc

When CRS is UP:


Set ASM environment or CRS environment then run the below command:
ocrcheck
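
For example, on Linux the ocr.loc file typically looks like this (the diskgroup name is hypothetical):

# cat /etc/oracle/ocr.loc
ocrconfig_loc=+DATA
local_only=FALSE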

===================================================================================
==========================================

How to find IP information in RAC?

Check the /etc/hosts file, as shown below:
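
A minimal sketch of a two-node layout (all names and addresses below are hypothetical):

# Public
192.168.1.101   rac1
192.168.1.102   rac2
# Private interconnect
10.0.0.1        rac1-priv
10.0.0.2        rac2-priv
# Virtual IPs (VIP)
192.168.1.111   rac1-vip
192.168.1.112   rac2-vip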


===================================================================================
==========================================

Why we need to have configured SSH or RSH on the RAC nodes?

SSH (Secure Shell, 10g+) or RSH (Remote Shell, 9i+) allows the 'oracle' UNIX account to connect to another RAC node and copy files or run commands as the local 'oracle' UNIX account.
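
A quick sanity check is that each node can run a command on every other node without a password prompt (node names hypothetical):

$ ssh rac2 date    # run from rac1 as oracle; must return immediately with no prompt
$ ssh rac1 date    # run from rac2 as oracle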

===================================================================================
==========================================

What is the Load Balancing Advisory?

To assist in the balancing of application workload across designated resources,


Oracle Database 10g Release 2 provides the Load Balancing Advisory. This advisory monitors the current workload activity across the cluster for each instance where a service is active; it provides a percentage value of how much of the total workload should be sent to this instance, as well as a service quality flag.
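
As a hedged sketch, the advisory goal is configured per service; with 11gR2 srvctl syntax and hypothetical database/service names it might look like:

$ srvctl modify service -d ORCL -s OLTP -B SERVICE_TIME -j SHORT

where -B sets the runtime load balancing goal (SERVICE_TIME or THROUGHPUT) and -j the connection load balancing goal.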

===================================================================================
==========================================

What is the Cluster Verification Utility (cluvfy)?


The Cluster Verification Utility (CVU) is a validation tool that you can use to
check all the important components that need to be verified at different stages of
deployment in a RAC environment.
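
Typical invocations (node names hypothetical):

$ cluvfy stage -pre crsinst -n rac1,rac2 -verbose   # before installing Clusterware
$ cluvfy stage -post crsinst -n all                 # after installation
$ cluvfy comp nodecon -n all                        # check node connectivity only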

===================================================================================
==========================================

What is hangcheck timer used for ?


The hangcheck timer regularly checks the health of the system. If the system hangs or stops, the node will be restarted automatically (see the example after the parameter list).

There are 2 key parameters for this module:

-> hangcheck-tick: this parameter defines the period of time between checks of system health. The default value is 60 seconds; Oracle recommends setting it to 30 seconds.

-> hangcheck-margin: this defines the maximum hang delay that should be tolerated before hangcheck-timer resets the RAC node.
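
For example, the module is typically loaded as below; the same options line usually also goes into /etc/modprobe.conf so it persists across reboots (values here follow the recommendation above, not a verified site standard):

# /sbin/modprobe hangcheck-timer hangcheck_tick=30 hangcheck_margin=180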

===================================================================================
==========================================

Name two specific RAC background processes


RAC processes are: LMON, LMDx, LMSn, LCKx and DIAG.

===================================================================================
==========================================

How do you backup voting disk


#dd if=voting_disk_name of=backup_file_name

48. How do I identify the voting disk location


#crsctl query css votedisk
49. How do I identify the OCR file location
check /var/opt/oracle/ocr.loc or /etc/oracle/ocr.loc (depends upon platform)
or
#ocrcheck

50. How do you backup the OCR


There is an automatic backup mechanism for the OCR. The default location is:
$ORA_CRS_HOME/cdata/<clustername>/
To display backups:
#ocrconfig -showbackup
To restore a backup:
#ocrconfig -restore <backup_file>
With Oracle RAC 10g Release 2 or later, you can also use the export command:
#ocrconfig -export <file> -s online, and use the -import option to restore the contents back.
With Oracle RAC 11g Release 1, you can do a manual backup of the OCR with the command:
# ocrconfig -manualbackup

===================================================================================
==========================================

-bash-3.2$ crsctl check crs


CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Cluster Ready Services (crsd) - The crs process manages cluster resources (which could be a database, an instance, a service, a Listener, a virtual IP (VIP) address, an application process, and so on) based on the resource's configuration information that is stored in the OCR. This includes start, stop, monitor and failover operations. This process runs as the root user.

Event manager daemon (evmd) - A background process that publishes events that crs creates.

Process Monitor Daemon (OPROCD) - This process monitors the cluster and provides I/O fencing. OPROCD performs its check, stops running, and if the wake up is beyond the expected time, then OPROCD resets the processor and reboots the node. An OPROCD failure results in Oracle Clusterware restarting the node. OPROCD uses the hangcheck timer on Linux platforms.

===================================================================================
==========================================

*********What do you do if you see GC CR BLOCK LOST in top 5 Timed Events in AWR
Report?

This is most likely due to a fault in the interconnect network.

Check netstat -s
If you see "fragments dropped" or "packet reassemblies failed", work with your system administrator to find the fault in the network.
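
For example:

$ netstat -s | grep -iE 'fragment|reassembl'   # non-zero failure counters point at interconnect/MTU problems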

RAC Wait events:


***gc buffer busy acquire and gc buffer busy release

The gc buffer busy acquire and gc buffer busy release wait events record the time a session spends waiting for a data block that is already busy with global cache processing. In Oracle 11g you will see the gc buffer busy acquire wait event when the global cache open request originated from the local instance, and gc buffer busy release when the open request originated from a remote instance. In Oracle 10g these two wait events were represented by a single gc buffer busy wait, and in Oracle 9i and prior the "gc" was spelled out as "global cache" in the global cache buffer busy wait event. These wait events are very similar to the buffer busy wait events in a single-instance database and are often the result of the same kinds of block contention.

***gc cr request

The gc cr request wait event specifies the time it takes to retrieve the data from
the remote cache. In Oracle 9i and prior, gc cr request was known as global cache
cr request.

===================================================================================
==========================================

CRSCTL commands in Oracle 11g Release 2

How to Check current status of CRS:


----------------------------------
$crsctl check crs
$crsctl check cluster [-node node_name]

How to Check CSS, CRS and EVMD:


------------------------------
$crsctl check cssd
$crsctl check crsd
$crsctl check evmd

How to check VIP status is ONLINE / OFFLINE:


----------------------------------------
$crs_stat or
$crsctl stat res -t ------> 11gr2

How to Start & Stop CRS and CSS:


-------------------------------
$crsctl start crs
$crsctl stop crs

#/etc/init.d/init.crs start
#/etc/init.d/init.crs stop

How to Enable & Disable CRS:


---------------------------
$crsctl enable crs
$crsctl disable crs

#/etc/init.d/init.crs enable
#/etc/init.d/init.crs disable
How to Check current Version of Clusterware:
-------------------------------------------
$crsctl query crs activeversion
$crsctl query crs softwareversion [node_name]

How to List the Voting disks currently used by CSS:


--------------------------------------------------
$crsctl check css votedisk
$crsctl query css votedisk

How to Add and Delete any voting disk:


-------------------------------------
$crsctl add css votedisk <PATH>
$crsctl delete css votedisk <PATH>

How to start clusterware resources:


----------------------------------
$crsctl start resources
$crsctl stop resources

How to STOP the Oracle RAC resources:


------------------------------------
#srvctl stop instance -d <database_name> -n <node_name>
#srvctl stop vip -n <node_name> -f

How to check current VIP configuration:


--------------------------------------
$srvctl config nodeapps -a

How to verify VIP status:


------------------------
$ifconfig -a

===================================================================================
==========================================

SCAN IP address in 11gR2 RAC

SCAN in RAC


How to start SCAN and SCAN Listener:


-----------------------------------
#$GRID_HOME/bin/srvctl start scan
#$GRID_HOME/bin/srvctl start scan_listener
How to STOP the SCAN LISTENER and the SCAN VIP resources:
--------------------------------------------------------
#$GRID_HOME/bin/srvctl stop scan_listener
#$GRID_HOME/bin/srvctl stop scan

How to check the STATUS of the SCAN LISTENER and the SCAN VIP resources:
-----------------------------------------------------------------------
#$GRID_HOME/bin/srvctl status scan_listener
SCAN listener LISTENER_SCAN1 is enabled
SCAN listener LISTENER_SCAN1 is not running
SCAN listener LISTENER_SCAN2 is enabled
SCAN listener LISTENER_SCAN2 is not running
SCAN listener LISTENER_SCAN3 is enabled
SCAN listener LISTENER_SCAN3 is not running

#$GRID_HOME/bin/srvctl status scan


SCAN VIP scan1 is enabled
SCAN VIP scan1 is not running
SCAN VIP scan2 is enabled
SCAN VIP scan2 is not running
SCAN VIP scan3 is enabled
SCAN VIP scan3 is not running

How to check SCAN-VIP in the resource file:


------------------------------------------
#$GRID_HOME/bin/srvctl config scan
SCAN name: rac-scan, Network: 1/[Link]/[Link]/eth1
SCAN VIP name: scan1, /[Link]/10.100.10.21
SCAN VIP name: scan2, /[Link]/10.100.10.22
SCAN VIP name: scan3, /[Link]/10.100.10.23

How to check SCAN IP addresses on DNS:


----------------------------------
$nslookup <scan-name>

$nslookup [Link]
Server: [Link]
Address: [Link]#53

Name:[Link]
Address: [Link]
Name:[Link]
Address: [Link]
Name:[Link]
Address: [Link]

===================================================================================
==========================================

Log Directory Structure in RAC (RAC Logs)


-------------------------------------------
Each component in the CRS (Cluster Ready Services) stack has its respective
directories created under the CRS home.

$ORA_CRS_HOME/crs/log-------->>> Contains trace files for the CRS resources.

$ORA_CRS_HOME/crs/init-------->>> Contains trace files of the CRS daemon during startup. A good place to start with any CRS startup problems.

$ORA_CRS_HOME/css/log-------->>> The Cluster Synchronization Services (CSS) logs indicate all actions such as reconfigurations, missed check-ins, connects, and disconnects from the client CSS listener. In some cases, the logger logs messages with the category of auth.crit for the reboots done by Oracle. This can be used for checking the exact time when a reboot occurred.

$ORA_CRS_HOME/css/init-------->>> Contains core dumps from the Oracle Cluster Synchronization Services daemon (OCSSd) and the process ID (PID) for the CSS daemon whose death is treated as fatal. If abnormal restarts for CSS exist, the core files will have the format of core.<pid>.

$ORA_CRS_HOME/evm/log-------->>> Log files for the Event Manager (EVM) and evmlogger daemons. Not used as often for debugging as the CRS and CSS directories.

$ORA_CRS_HOME/evm/init-------->>> PID and lock files for EVM. Core files for EVM
should also be written here.

$ORA_CRS_HOME/srvm/log-------->>> Log files for Oracle Cluster Registry (OCR),


which contains the details at the Oracle cluster level.

$ORA_CRS_HOME/log/<hostname>-------->>> Log files for Oracle Clusterware (known as the cluster alert log), which contains diagnostic messages at the Oracle cluster level. This is available from Oracle Database 10g R2.

===================================================================================
==========================================

RAC Commands:

Cluster Related Commands


--------------- --------
crs_stat -t Shows HA resource status (hard to read)
crsstat Output of crs_stat -t formatted nicely
ps -ef|grep d.bin Shows the Clusterware daemons (crsd.bin, ocssd.bin, evmd.bin)
crsctl check crs CSS,CRS,EVM appears healthy
crsctl stop crs Stop crs and all other services
crsctl disable crs* Prevents CRS from starting on reboot
crsctl enable crs* Enables CRS start on reboot
crs_stop -all Stops all registered resources
crs_start -all Starts all registered resources

Database Related Commands


-------------------------
srvctl start instance -d <db_name> -i <inst_name> Starts an instance
srvctl stop instance -d <db_name> -i <inst_name> Stops an instance
srvctl status instance -d <db_name> -i <inst_name> Checks an individual
instance
srvctl start database -d <db_name> Starts all instances
srvctl stop database -d <db_name> Stops all instances, closes
database
srvctl status database -d <db_name> Checks status of all
instances
srvctl start service -d <db_name> -s <service_name> Starts a service
srvctl stop service -d <db_name> -s <service_name> Stops a service
srvctl status service -d <db_name> Checks status of a service
srvctl start nodeapps -n <node_name> Starts gsd, vip, listener,
and ons
srvctl stop nodeapps -n <node_name> Stops gsd, vip and listener

BACKGROUND PROCESSES
----------------------------------
There are three main background processes you can see when doing a ps -ef|grep init. They are normally started by init during the operating system boot process. They can be started and stopped manually by issuing the command
/etc/init.d/init.crs {start|stop|enable|disable}

/etc/rc.d/init.d/init.crsd
/etc/rc.d/init.d/init.cssd
/etc/rc.d/init.d/init.evmd

===================================================================================
==========================================

ORACLE CLUSTERWARE:

Oracle Cluster Ready Services becomes Oracle Clusterware


Oracle RAC 10g Release 1 introduced Oracle Cluster Ready Services (CRS), a
platform-independent set of system services for cluster environments. In Release 2,
Oracle has renamed this product to Oracle Clusterware.

Clusterware maintains two files: the Oracle Cluster Registry (OCR) and the Voting
Disk. The OCR and the Voting Disk must reside on shared disks as either raw
partitions or files in a cluster filesystem.

What is Clusterware composed of?


The Oracle Clusterware is comprised primarily of two components: the voting disk
and the OCR (Oracle Cluster Registry). The voting disk is nothing but a file that
contains and manages information of all the node memberships and the OCR is a file
that manages the cluster and RAC configuration. Let's take a quick look at
administering the voting disks and the OCR.

****Components of Oracle Clusterware:-

Oracle Clusterware is made up of components like the voting disk and the Oracle Cluster Registry (OCR).

****What is a CRS resource?

Oracle Clusterware is used to manage high-availability operations in a cluster. Anything that Oracle Clusterware manages is known as a CRS resource. Some examples of CRS resources are a database, an instance, a service, a listener, a VIP address, an application process, etc.

****How does Oracle Clusterware manage CRS resources?

Oracle Clusterware manages CRS resources based on the configuration information of CRS resources stored in the OCR (Oracle Cluster Registry).

****voting disk
A file that manages information about node membership.

The Voting Disk File is a file on the shared cluster system or a shared raw device file. It helps to avoid the split-brain syndrome.

****split brain syndrome


Where two or more instances attempt to control a cluster database. In a two-node environment, for example, both instances attempt to manage updates simultaneously.

****What is Cache Fusion and how does this affect applications?


Cache Fusion is a shared cache architecture that uses the high-speed interconnect: database blocks are shipped across the interconnect to the node where access to the data is needed.

****Give Details on Cache Fusion:-


Oracle RAC is composed of two or more instances. When a block of data is read from a datafile by an instance within the cluster and another instance is in need of the same block, it is faster to get the block image from the instance which has the block in its SGA than to read it from disk. To enable inter-instance communication Oracle RAC makes use of the interconnect. The Global Enqueue Service (GES) monitors, and the Instance Enqueue Process manages, Cache Fusion.

****What is FAN?
Fast Application Notification (FAN) relates to events concerning instances, services and nodes. It is a notification mechanism that Oracle RAC uses to notify other processes about configuration and service-level information, including service status changes such as UP or DOWN. Applications can respond to FAN events and take immediate action.

****Transparent application failover (TAF)


A runtime failover for high-availability environments, such as Real Application
Clusters and Oracle Real Application Clusters Guard, TAF refers to the failover and
re-establishment of application-to-service connections. It enables client
applications to automatically reconnect to the database if the connection fails,
and optionally resume a SELECT statement that was in progress. This reconnect
happens automatically from within the Oracle Call Interface (OCI) library.
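
For illustration, a client-side tnsnames.ora entry enabling TAF with the BASIC method (all host, service and alias names below are hypothetical):

CDBS_TAF =
  (DESCRIPTION =
    (LOAD_BALANCE = yes)
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = cdbs_srvc)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )

TYPE=SELECT lets an in-progress SELECT resume on the surviving instance; METHOD=BASIC establishes the backup connection only at failover time.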

****VIP:
Each cluster node is assigned a virtual IP address (VIP). In the event of node failure, the failed node's IP address can be reassigned to a surviving node to allow applications to continue accessing the database through the same IP address.

What is GRD?
GRD stands for Global Resource Directory. The GES and GCS maintain records of the statuses of each datafile and each cached block using the Global Resource Directory. This process is referred to as Cache Fusion and helps ensure data integrity.

***What enables the load balancing of applications in RAC?

Oracle Net Services enables the load balancing of application connections across all of the instances in an Oracle RAC database.

***What is the use of VIP?


If a node fails, then the node's VIP address fails over to another node on which
the VIP
address can accept TCP connections but it cannot accept Oracle connections.

***How do we verify that RAC instances are running?


SQL>select * from V$ACTIVE_INSTANCES;

***What two parameters must be set at the time of starting up an ASM instance in a RAC environment?
The parameters CLUSTER_DATABASE and INSTANCE_TYPE must be set.

***CLUSTER_DATABASE

This option needs to be set to 'TRUE' for obvious reasons in RAC operation, BUT there are exceptions when you have to patch. Say you have to shut down all the nodes except one; then you set the parameter for that node to 'FALSE' and carry on with the operations of patching/upgrading the instance, as sketched below.
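
A hedged sketch of that patching scenario (database name and steps are illustrative; cluster_database is a static parameter, hence SCOPE=SPFILE):

SQL> ALTER SYSTEM SET cluster_database=FALSE SCOPE=SPFILE SID='*';
$ srvctl stop database -d cdbs          -- stop all instances
SQL> STARTUP UPGRADE                    -- on the one remaining node
-- ... apply the patch/upgrade steps ...
SQL> ALTER SYSTEM SET cluster_database=TRUE SCOPE=SPFILE SID='*';
SQL> SHUTDOWN IMMEDIATE
$ srvctl start database -d cdbs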

***ACTIVE_INSTANCE_COUNT

This is used in a 2-node RAC. It enables you to assign one instance in a two-
instance RAC cluster as the primary instance and the other instance as the
secondary instance. So obviously, it has no use in a RAC that consists of more than
two nodes.

It is pretty simple. You do:

ALTER SYSTEM SET ACTIVE_INSTANCE_COUNT=1

The first starting node becomes the primary node and the other one is the secondary node. This means that the primary node goes ahead and accepts client connections, and should that fail then the secondary node takes over the failed connections from the first node, thus making the secondary node the primary node. Should the failed primary node come back online, it starts operating as the secondary node and will not accept client connections unless the (current) primary node has failed.

RAC Background Processes:-


===========================

RAC does have a few unique background processes that do not play any role in a
single instance configuration. The functionality of these background processes is
described below:

Oracle RAC instances are composed of following background processes:


ACMS - Atomic Control file to Memory Service
GTX0-j - Global Transaction Process
LMON - Global Enqueue Service Monitor
LMD - Global Enqueue Service Daemon
LMS - Global Cache Service Process
LCK0 - Instance Enqueue Process
DIAG - Diagnosability Daemon
RMSn - Oracle RAC Management Processes
RSMN - Remote Slave Monitor
DBRM - Database Resource Manager (from 11g R2)
PING - Response Time Agent (from 11g R2)

GCS
Global Cache Service processes (GCS) are processes that, when spawned by Oracle,
copy blocks directly from the holding instance's buffer cache and send a read
consistent copy of the block to the requesting foreground process on the requesting
instance to be placed into the buffer cache. RAC software provides for up to 10 GCS
processes (0 thru 9), depending on the amount of messaging traffic. However, there
is, by default, one GCS process per pair of CPUs.

GES
Global Enqueue Service Daemon is a background agent process that manages requests
for resources to control access to blocks and global enqueues. It manages lock
manager service requests for GCS resources and sends them to a service queue to be
handled by the GCS process. The GES process also handles global deadlock detection
and remote resource requests (remote resource requests are requests originating
from another instance).
LCK
The Lock Process (LCK) manages non-cache fusion resource requests such as library
and row cache requests and lock requests that are local to the server. LCK manages
instance resource requests and cross-instance call operations for shared resources.
It builds a list of invalid lock elements and validates lock elements during
recovery. Because the LMS process handles the primary function of lock management,
only a single LCK process exists in each instance. There is only one LCK process
per instance in RAC.

DIAG
The Diagnosability Daemon (DIAG) background process monitors the health of the
instance and captures diagnostic data about process failures within instances. The
operation of this daemon is automated and updates an alert log file to record the
activity that it performs.
Oracle 11g Processes

In addition, Oracle 11g added the following new processes:

- ACMS: The Atomic Controlfile to Memory Service (ACMS) process ensures SGA memory updates are applied on all nodes or none.
- GTX: The Global Transaction Process supports XA transactions within RAC.
- RMS: The RAC Management Process creates resources that allow new database instances to be added to the cluster.
- RSM: The Remote Slave Monitor creates slave processes for processes running on other RAC instances within the cluster.

****What are the types of connection load-balancing?

There are two types of connection load-balancing: server-side load balancing and client-side load balancing.

*****What is the difference between server-side and client-side connection load balancing?
Client-side balancing happens at the client side, where load balancing is done using the listener. In case of server-side load balancing, the listener uses a load-balancing advisory to redirect connections to the instance providing the best service.

===================================================================================
==========================================

*****Oracle RAC initialization parameters


The list of parameters that must be identical on every instance is given below; a quick cross-instance check follows the list.
ACTIVE_INSTANCE_COUNT
ARCHIVE_LAG_TARGET
COMPATIBLE
CLUSTER_DATABASE
CLUSTER_DATABASE_INSTANCES
CONTROL_FILES
CONTROL_FILES
DB_BLOCK_SIZE
DB_DOMAIN
DB_FILES
DB_NAME
DB_RECOVERY_FILE_DEST
DB_RECOVERY_FILE_DEST_SIZE
DB_UNIQUE_NAME
INSTANCE_TYPE (RDBMS or ASM)
PARALLEL_MAX_SERVERS
REMOTE_LOGIN_PASSWORD_FILE
UNDO_MANAGEMENT
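
A quick way to confirm a parameter really is identical across instances is to compare it in gv$parameter, for example:

SQL> SELECT inst_id, value FROM gv$parameter WHERE name = 'db_block_size';

Any difference across INST_ID values for a parameter in the list above indicates a misconfiguration.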

===================================================================================
==========================================

Verify if the Clusterware has all the nodes registered using the olsnodes command.
[oracle@oradb1 oracle]$ olsnodes -n
oradb1
oradb2
oradb3
oradb4
oradb5

SQL> select * from v$active_instances;

INST_NUMBER INST_NAME
----------- -----------------------------------
1 oradb1:SSKY1
2 oradb2:SSKY2
3 oradb3:SSKY3
4 oradb4:SSKY4
5 oradb5:SSKY5

The database services:


[oracle@oradb1 oracle]$ srvctl status service -d SSKYDB
Service CRM is running on instance(s) SSKY1
Service CRM is running on instance(s) SSKY2
Service CRM is running on instance(s) SSKY3
Service CRM is running on instance(s) SSKY4
Service CRM is running on instance(s) SSKY5
Service PAYROLL is running on instance(s) SSKY1
Service PAYROLL is running on instance(s) SSKY5

*****Starting of the CRS services

crs_start -all
crs_stop -all

Verify that the cluster services are started, using the crs_stat command.


[oracle@oradb1 oracle]$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
[Link] application ONLINE ONLINE oradb1
[Link] application ONLINE ONLINE oradb1
[Link] application ONLINE ONLINE oradb1
[Link] application ONLINE ONLINE oradb2
[Link] application ONLINE ONLINE oradb3
[Link] application ONLINE ONLINE oradb4
[Link] application ONLINE ONLINE oradb4
[Link] application ONLINE ONLINE oradb4
[Link] application ONLINE ONLINE oradb5
[Link] application ONLINE ONLINE oradb5
[Link] application ONLINE ONLINE oradb5

The VIP configuration could be verified using the ifconfig command at the OS level.

[oracle@oradb5 oracle]$ ifconfig -a


===================================================================================
==========================================

Oracle RAC Interview Questions/FAQs Part1

1. What is the use of RAC?

2. What are the prerequisites for RAC setup?

3. What are Oracle Clusterware/Daemon processes and what they do?


Ans:
ocssd, crsd, evmd, oprocd, racgmain, racgimon

4. What are the special background processes for RAC (or) what is the difference between stand-alone database & RAC database background processes?
DIAG, LCKn, LMD, LMSn, LMON

5. What are structural changes in 11g R2 RAC?


Ans:
Grid & ASM are in one home,
Voting disk & OCR file can be on ASM,
SCAN,
By using srvctl, we can manage diskgroups, home, ons, eons, filesystem, srvpool, server, scan, scan_listener, gns, vip, oc4j,
GSD

6. What are the new features in 11g (R2) RAC?


Ans:
Grid & ASM are in one home,
Voting disk & OCR file can be on ASM,
SCAN,
By using srvctl, we can manage diskgroups, home, ons, eons, filesystem, srvpool, server, scan, scan_listener, gns, vip, oc4j,
GSD

7. What is cache fusion?


Ans:
Transferring of data between RAC instances by using private network. Cache Fusion
is the remote memory mapping of Oracle buffers, shared between the caches of
participating nodes in the cluster. When a block of data is read from datafile by
an instance within the cluster and another instance is in need of the same block,
it is easy to get the block image from the instance which has the block in its SGA
rather than reading from the disk.

8. What is the purpose of Private Interconnect?


Ans:
Clusterware uses the private interconnect for cluster synchronization (network
heartbeat) and daemon communication between the clustered nodes. This communication
is based on the TCP protocol. RAC uses the interconnect for cache fusion (UDP) and
inter-process communication (TCP).

9. What are the Clusterware components?


Ans:
Voting Disk - Oracle RAC uses the voting disk to manage cluster membership by way
of a health check and arbitrates cluster ownership among the instances in case of
network failures. The voting disk must reside on shared disk.

Oracle Cluster Registry (OCR) - Maintains cluster configuration information as well


as configuration information about any cluster database within the cluster. The OCR
must reside on shared disk that is accessible by all of the nodes in your cluster.
The daemon OCSSd manages the configuration info in OCR and maintains the changes to
cluster in the registry.

Virtual IP (VIP) - When a node fails, the VIP associated with it is automatically
failed over to some other node and new node re-arps the world indicating a new MAC
address for the IP. Subsequent packets sent to the VIP go to the new node, which
will send error RST packets back to the clients. This results in the clients
getting errors immediately.
crsd - Cluster Resource Services Daemon
cssd - Cluster Synchronization Services Daemon
evmd - Event Manager Daemon
oprocd / hangcheck_timer - Node hang detector

10. What is OCR file?


Ans:
RAC configuration information repository that manages information about the cluster
node list and instance-to-node mapping information. The OCR also manages
information about Oracle Clusterware resource profiles for customized applications.
Maintains cluster configuration information as well as configuration information
about any cluster database within the cluster. The OCR must reside on shared disk
that is accessible by all of the nodes in your cluster. The daemon OCSSd manages
the configuration info in OCR and maintains the changes to cluster in the registry.

11. What is Voting file/disk and how many files should be there?
Ans:
Voting Disk File is a file on the shared cluster system or a shared raw device
file. Oracle Clusterware uses the voting disk to determine which instances are
members of a cluster. Voting disk is akin to the quorum disk, which helps to avoid
the split-brain syndrome. Oracle RAC uses the voting disk to manage cluster
membership by way of a health check and arbitrates cluster ownership among the
instances in case of network failures. The voting disk must reside on shared disk.

12. How to take backup of OCR file?


Ans:
#ocrconfig -manualbackup
#ocrconfig -export file_name.dmp
#ocrdump -backupfile my_file
$cp -p -R /u01/app/crs/cdata /u02/crs_backup/ocrbackup/RAC1

13. How to recover OCR file?


Ans:
#ocrconfig -restore backup_file.ocr
#ocrconfig -import file_name.dmp

14. What is local OCR?


Ans:
/etc/oracle/ocr.loc
/var/opt/oracle/ocr.loc

15. How to check backup of OCR files?


Ans:
#ocrconfig -showbackup

16. How to take backup of voting file?


Ans:
dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
crsctl backup css votedisk -- from 11g R2

17. How do I identify the voting disk location?


Ans:
# crsctl query css votedisk

18. How do I identify the OCR file location?


check /var/opt/oracle/ocr.loc or /etc/oracle/ocr.loc
Ans:
# ocrcheck

19. If the voting disk/OCR file got corrupted and we don't have backups, how to get them?
Ans:
We have to install Clusterware.

20. Who will manage OCR files?


Ans:
cssd will manage OCR.

===================================================================================
==========================================

11g RAC Administration and Maintenance Tasks and Utilities:

1. Checking CRS Status


2. Viewing Cluster name
3. Viewing No. Of Nodes configured in Cluster
4. Viewing Votedisk Information
5. Viewing OCR Disk Information
6. Various Timeout Settings in Cluster
7. Add/Remove OCR file in Cluster
8. Add/Remove Votedisk file in Cluster
9. Backing Up OCR
10. Restoring Votedisk
11. Changing Public and Virtual IP Address

1. Checking CRS Status:

The two commands below are generally used to check the status of CRS. The first command lists the status of CRS on the local node, whereas the other command shows the CRS status across all the nodes in the cluster.

crsctl check crs <<-- for the local node

[root@rak3 bin]# ./crsctl check crs

CRS-4638: Oracle High Availability Services is online


CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

crsctl check cluster <<-- for remote nodes in the cluster


[root@rak3 bin]# ./crsctl check cluster

CRS-4537: Cluster Ready Services is online


CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

Checking Visibility of CSS across nodes:

crsctl check cluster


For this command to run, CSS needs to be running on the local node. The "ONLINE"
status for remote node says that CSS is running on that node.
When CSS is down on the remote node, the status of "OFFLINE" is displayed for that
node.

[root@rac1]# crsctl check cluster


rac1 ONLINE
rac2 ONLINE

2. Viewing Cluster name:

Use the command below to get the name of the cluster. You can also dump the OCR and view the name from the dump file.

[oracle@rak1 ~]$ cd /u01/app/11.2.0/grid/bin


[oracle@rak1 bin]$ ./cemutlo -n
rak-scan

or
[oracle@rak1 bin]$ ./olsnodes -c
rak-scan

or
Only the master node takes backups of the OCR, and any node can become the master node (depending on node evictions), so the "ocrdump" command below works on the master node only.

ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'

[root@rac1]# ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk '{print $3}'
rak-scan
[root@rac1]#

or

ocrconfig -export /tmp/ocr_exp.dat -s online


for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done

[root@rac1]# ocrconfig -export /tmp/ocr_exp.dat -s online


[root@rac1]# for i in `strings /tmp/ocr_exp.dat | grep -A 1 clustername` ; do if [ $i != 'SYSTEM.css.clustername' ]; then echo $i; fi; done
rak-scan

[root@rac1]#

OR

Oracle creates a directory with the same name as Cluster under the
$ORA_CRS_HOME/cdata. you can get the cluster name from this directory as well.

[root@rac1]# ls /u01/app/11.2.0/grid/cdata
localhost
rak-scan
3. Viewing No. of Nodes configured in Cluster:

The below command can be used to find out the number of nodes registered into the
cluster.
It also displays the node's Public name, Private name and Virtual name along with
their numbers.

olsnodes

[oracle@rak1 bin]$ olsnodes -n -s


rak1 1 Active
rak3 2 Active

Usage: olsnodes [ [-n] [-i] [-s] [-t] [<node> | -l [-p]] | [-c] ] [-g] [-v]
where
-n print node number with the node name
-p print private interconnect address for the local node
-i print virtual IP address with the node name
<node> print information for the specified node
-l print information for the local node
-s print node status - active or inactive
-t print node type - pinned or unpinned
-g turn on logging
-v Run in debug mode; use at direction of Oracle Support only.
-c print clusterware name

4. Viewing Votedisk Information:

The below command is used to view the no. of Votedisks configured in the Cluster.

crsctl query css votedisk

[oracle@rak1 bin]$ ./crsctl query css votedisk


## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 1c8b4a5e50684fc0bfd2cfc5fd3a1df0 (ORCL:DISK1) [DATA]
2. ONLINE 1cc7906f37314f8ebfd7a737ad917cd2 (ORCL:DISK2) [DATA]
3. ONLINE 814c85dfefb04f7cbfeb175d0e7b7831 (ORCL:DISK3) [DATA]
Located 3 voting disk(s).

5. Viewing OCR Disk Information:

The command below is used to view the number of OCR files configured in the cluster. It also displays the version of the OCR as well as storage space information. You can have at most 2 OCR files. Run this command as the root user; if run as the oracle user you get the message "logical corruption check bypassed due to non-privileged user".

ocrcheck

[root@rak3 bin]# ./ocrcheck


Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2776
Available space (kbytes) : 259344
ID : 33615009
Device/File Name : +DATA
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

6. Various Timeout Settings in Cluster:

Disktimeout:
Disk Latencies in seconds from node-to-Votedisk. Default Value is 200. (Disk IO)

Misscount:
Network Latencies in second from node-to-node (Interconnect). Default Value is 60
Sec (Linux) and 30 Sec in Unix platform. (Network IO)

Misscount < Disktimeout

NOTE: Do not change them without contacting Oracle Support. This may cause logical
corruption to the Data.

IF
(Disk IO Time > Disktimeout) OR (Network IO time > Misscount)
THEN
REBOOT NODE
ELSE
DO NOT REBOOT
END IF;

crsctl get css disktimeout


crsctl get css misscount
crsctl get css reboottime

Disktimeout:
[root@rac1]# crsctl get css disktimeout
200

Misscount:
[root@rac1]# crsctl get css misscount
Configuration parameter misscount is not defined.
<<<<< This message indicates that misscount is not set manually and is set to its default value. On Linux, it defaults to 60 seconds. If you want to change it, you can do so as below. (Not recommended)

[root@rac1]# crsctl set css misscount 80


Configuration parameter misscount is now set to 80

[root@rac1]# crsctl get css misscount


80
The below command sets the value of misscount back to its Default values:

crsctl unset css misscount


[oracle@rak1 bin]$ ./crsctl unset css misscount
Configuration parameter misscount is reset to default operation value.

[oracle@rak1 bin]$ ./crsctl get css misscount


60

Reboottime:
[oracle@rak1 bin]$ ./crsctl get css reboottime
3
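
As a quick sketch (run as root with crsctl in the PATH; the output shown is
illustrative), you can read all three CSS settings in one shell loop:

[root@rac1]# for p in misscount disktimeout reboottime; do echo -n "$p: "; crsctl get css $p; done
misscount: 60
disktimeout: 200
reboottime: 3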

7. Add/Remove OCR file in Cluster:

Removing OCR File

(1) Get the Existing OCR file information by running ocrcheck utility.

[root@rac1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0 <-- OCR
Device/File integrity check succeeded
Device/File Name : /u02/ocfs2/ocr/OCRfile_1 <-- OCR Mirror
Device/File integrity check succeeded

Cluster registry integrity check succeeded

(2) The First command removes the OCR mirror (/u02/ocfs2/ocr/OCRfile_1). If you
want to remove the OCR
file (/u02/ocfs2/ocr/OCRfile_1) run the next command.

ocrconfig -replace ocrmirror


ocrconfig -replace ocr

[root@rac1]# ocrconfig -replace ocrmirror


[root@rac1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
Device/File integrity check succeeded

Device/File not configured <-- OCR mirror no longer exists

Cluster registry integrity check succeeded

Adding OCR
--------------------
You need to add an OCR or OCR mirror file when you want to move the existing OCR
to a different device. The below command adds the OCR mirror file if the OCR file
already exists.

(1) Get the Current status of OCR:


[root@rac1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0 <<-- OCR File
Device/File integrity check succeeded

Device/File not configured <-- OCR Mirror does not exist

Cluster registry integrity check succeeded

As you can see, there is only one OCR file and no OCR mirror.
So, I can add the second OCR (the OCR mirror) with the below command.

ocrconfig -replace ocrmirror <File name>

[root@rac1]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_1


[root@rac1]# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 3852
Available space (kbytes) : 258268
ID : 744414276
Device/File Name : /u02/ocfs2/ocr/OCRfile_0
Device/File integrity check succeeded
Device/File Name : /u02/ocfs2/ocr/OCRfile_1
Device/File integrity check succeeded

Cluster registry integrity check succeeded

You can have at most 2 OCR devices (the OCR itself and its single mirror) in a cluster.
Adding an extra mirror gives you the below error message:

[root@rac1]# ocrconfig -replace ocrmirror /u02/ocfs2/ocr/OCRfile_2


PROT-21: Invalid parameter
[root@rac1]#

Add/Remove Votedisk file in Cluster:


---------------------------------------

Adding Votedisk:

Get the existing vote disks associated with the cluster. To be safe, bring the CRS
stack down on all the nodes except the one from which you are going to add the
votedisk.
(1) Stop CRS on all the nodes in the cluster but one.

[root@rac2]# crsctl stop crs

(2) Get the list of Existing Vote Disks

crsctl query css votedisk

[root@rac1]# crsctl query css votedisk


0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).

(3) Backup the VoteDisk file

Back up the existing votedisks as below, as the oracle user:

dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0

[root@rac1]# su - oracle
[oracle@rac1 ~]$ dd if=/u02/ocfs2/vote/VDFile_0 of=$ORACLE_BASE/bkp/vd/VDFile_0
41024+0 records in
41024+0 records out
[oracle@rac1 ~]$

(4) Add an Extra Votedisk into the Cluster:

If it is on OCFS, then touch the file as the oracle user. On raw devices, initialize
the raw device using the "dd" command.

touch /u02/ocfs2/vote/VDFile_3 <<-- as oracle


crsctl add css votedisk /u02/ocfs2/vote/VDFile_3 <<-- as oracle
crsctl query css votedisk

[root@rac1]# su - oracle
[oracle@rac1 ~]$ touch /u02/ocfs2/vote/VDFile_3
[oracle@rac1 ~]$ crsctl add css votedisk /u02/ocfs2/vote/VDFile_3
Now formatting voting disk: /u02/ocfs2/vote/VDFile_3.
Successful addition of voting disk /u02/ocfs2/vote/VDFile_3.

(5) Confirm that the file has been added successfully:

[root@rac1]# ls -l /u02/ocfs2/vote/VDFile_3
-rw-r----- 1 oracle oinstall 21004288 Oct 6 16:31 /u02/ocfs2/vote/VDFile_3
[root@rac1]# crsctl query css votedisks
Unknown parameter: votedisks
[root@rac1]# crsctl query css votedisk
0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
3. 0 /u02/ocfs2/vote/VDFile_3
Located 4 voting disk(s).
[root@rac1]#

Removing Votedisk:

Removing a votedisk from the cluster is very simple. The below command removes the
given votedisk from the cluster configuration.
crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3

[root@rac1]# crsctl delete css votedisk /u02/ocfs2/vote/VDFile_3


Successful deletion of voting disk /u02/ocfs2/vote/VDFile_3.
[root@rac1]#

[root@rac1]# crsctl query css votedisk


0. 0 /u02/ocfs2/vote/VDFile_0
1. 0 /u02/ocfs2/vote/VDFile_1
2. 0 /u02/ocfs2/vote/VDFile_2
Located 3 voting disk(s).
[root@rac1]#

***********Backing Up OCR

Oracle performs a physical backup of the OCR devices every 4 hours under the default
backup directory $ORA_CRS_HOME/cdata/<CLUSTER_NAME>
and then rolls that forward into daily and weekly backups. You can get the backup
information by executing the below command.

ocrconfig -showbackup

[root@rac1]# ocrconfig -showbackup


rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
[root@rac1]#

*********Manually backing up the OCR

ocrconfig -manualbackup <<--Physical Backup of OCR

The above command backs up the OCR under the default backup directory. You can export
the contents of the OCR using the below command (logical backup).

ocrconfig -export /tmp/ocr_exp.dat -s online <<-- Logical Backup of OCR
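
Building on that, a minimal sketch of a scheduled logical OCR backup script (the
backup directory and the keep-one-file-per-weekday naming scheme are assumptions,
not Oracle defaults):

#!/bin/bash
# Nightly logical OCR export; overwrites last week's file for the same weekday
BKP_DIR=/backup/ocr                                   # assumed backup location
$ORA_CRS_HOME/bin/ocrconfig -export $BKP_DIR/ocr_exp_$(date +%a).dat -s online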

*********Restoring OCR

The below command is used to restore the OCR from a physical backup. Shut down CRS
on all nodes first.

ocrconfig -restore <file name>

[root@rac2]# ocrconfig -restore /u01/app/crs/cdata/test-crs/[Link]

The above command restores the OCR from the week-old backup.
If you have a logical backup of the OCR (taken using the export option), then you can
import it with the below command.

ocrconfig -import /tmp/ocr_exp.dat
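
Putting the restore steps together, a typical physical-restore sequence looks like
this (node names are illustrative; pick a real backup file from ocrconfig
-showbackup):

[root@rac1]# crsctl stop crs     <<-- repeat on every node in the cluster
[root@rac2]# crsctl stop crs
[root@rac1]# ocrconfig -restore /u01/app/crs/cdata/test-crs/week.ocr
[root@rac1]# crsctl start crs    <<-- then start CRS on the remaining nodes as well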

*********Locate the available Backups


[root@rac1]# ocrconfig -showbackup
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac2 2007/09/03 [Link] /u01/app/crs/cdata/test-crs/[Link]
rac1 2007/10/07 [Link] /u01/app/crs/cdata/test-crs/backup_20071007_135041.ocr

*********Restoring Votedisks

- Shut down CRS on all the nodes in the cluster.
- Locate the current location of the votedisks.
- Restore each of the votedisks using the "dd" command from the previous good backup
of the votedisk taken using the same "dd" command.
- Start CRS on all the nodes.
crsctl stop crs
crsctl query css votedisk
dd if=<backup of Votedisk> of=<Votedisk file> <<-- do this for all the votedisks
crsctl start crs

Changing Public and Virtual IP Address:

Current Config Changed to

Node 1:

Public IP: [Link] [Link]


VIP: [Link] [Link]
subnet: [Link] [Link]
Netmask: [Link] [Link]
Interface used: eth0 eth0
Hostname: [Link] [Link]

Node 2:

Public IP: [Link] [Link]


VIP: [Link] [Link]
subnet: [Link] [Link]
Netmask: [Link] [Link]
Interface used: eth0 eth0
Hostname: [Link] [Link]

=======================================================================
(A)

Take the Services, Database, ASM Instances and nodeapps down on both the Nodes in
Cluster. Also disable the nodeapps, asm and database instances to prevent them from
restarting in case if this node gets rebooted during this process.
srvctl stop service -d test
srvctl stop database -d test
srvctl stop asm -n rac1
srvctl stop asm -n rac2
srvctl stop nodeapps -n rac1,rac2
srvctl disable instance -d test -i test1,test2
srvctl disable asm -n rac1
srvctl disable asm -n rac2
srvctl disable nodeapps -n rac1
srvctl disable nodeapps -n rac2
(B)
Modify the /etc/hosts and/or DNS, ifcfg-eth0 (local node) with the new IP values
on All the Nodes

(C)
Restart the specific network interface in order to use the new IP.
ifconfig eth0 down
ifconfig eth0 up

Or, you can restart the network.


CAUTION: on NAS, restarting the entire network may cause the node to reboot.

(D)
Update the OCR with the New Public IP.
In case of public IP, you have to delete the interface first and then add it back
with the new IP address.

As oracle user, Issue the below command:


oifcfg delif -global eth0
oifcfg setif -global eth0/[Link]:public

(E)
Update the OCR with the New Virtual IP.
Virtual IP is part of the nodeapps and so you can modify the nodeapps to update the
Virtual IP information.

As privileged user (root), Issue the below commands:


srvctl modify nodeapps -n rac1 -A [Link]/[Link]/eth0 <-- for Node 1
srvctl modify nodeapps -n rac2 -A [Link]/[Link]/eth0 <-- for Node 2

(F)
Enable the nodeapps, ASM, database Instances for all the Nodes.
srvctl enable instance -d test -i test1,test2
srvctl enable asm -n rac1
srvctl enable asm -n rac2
srvctl enable nodeapps -n rac1
srvctl enable nodeapps -n rac2

(G)
Update the [Link] file on each node with the correct IP addresses in case it
uses IP addresses instead of hostnames.

(H)
Restart the Nodeapps, ASM and Database instance
srvctl start nodeapps -n rac1
srvctl start nodeapps -n rac2
srvctl start asm -n rac1
srvctl start asm -n rac2
srvctl start database -d test


---------------------------------------------------------------------------------

1) Check status of cluster resources

[oracle@Rac2 ~]$ crsctl stat res -t


--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
[Link]
ONLINE ONLINE rac1
ONLINE ONLINE rac2
[Link]
ONLINE ONLINE rac1
ONLINE ONLINE rac2
[Link]
ONLINE ONLINE rac1 Started
ONLINE ONLINE rac2 Started
[Link]
ONLINE ONLINE rac1
ONLINE ONLINE rac2
[Link]
OFFLINE OFFLINE rac1
OFFLINE OFFLINE rac2
[Link]
ONLINE ONLINE rac1
ONLINE ONLINE rac2
[Link]
ONLINE ONLINE rac1
ONLINE ONLINE rac2
[Link]
ONLINE ONLINE rac1
ONLINE ONLINE rac2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
1 ONLINE ONLINE rac1
ora.LISTENER_SCAN2.lsnr
1 ONLINE ONLINE rac2
ora.LISTENER_SCAN3.lsnr
1 ONLINE ONLINE rac2
ora.oc4j
1 OFFLINE OFFLINE
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac1 Open
2 ONLINE ONLINE rac2 Open
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2

2) Check status of local RAC Background Processes

[oracle@Rac2 ~]$ crsctl stat res -t -init


--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
[Link]
1 ONLINE ONLINE rac2 Started
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2 ACTIVE:0
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2
[Link]
1 ONLINE ONLINE rac2

[oracle@Rac1 ~]$ crsctl stat res -t -init


--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
[Link]
1 ONLINE ONLINE rac1 Started
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1 ACTIVE:0
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1
[Link]
1 ONLINE ONLINE rac1

3) Check the status of the OCR


[root@Rac1 ~]# cd /u01/app/11.2.0/grid/bin/
[root@Rac1 bin]# ./orcrcheck
-bash: ./orcrcheck: No such file or directory
[root@Rac1 bin]# ./ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 3044
Available space (kbytes) : 259076
ID : 9093549
Device/File Name : +DATA
Device/File integrity check succeeded

Device/File not configured

Device/File not configured

Device/File not configured

Device/File not configured

Cluster registry integrity check succeeded

Logical corruption check succeeded

[root@Rac1 bin]# cat /etc/oracle/ocr.loc


ocrconfig_loc=+DATA
local_only=FALSE

4) Get information about the voting disk

[root@Rac1 bin]# ./crsctl query css votedisk


## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 2a3ec883eca14fd9bf55866be66341ef (ORCL:DISK2) [DATA]
Located 1 voting disk(s).

5) Find the active and software version of the Grid Install

[root@Rac1 bin]# ./crsctl query crs activeversion


Oracle Clusterware active version on the cluster is [[Link].0]
[root@Rac1 bin]# ./crsctl query crs softwareversion
Oracle Clusterware version on node [rac1] is [[Link].0]

6) Enable, Disable and check status of auto-restart of Clusterware

[root@Rac1 bin]# crsctl config crs


CRS-4622: Oracle High Availability Services autostart is enabled.

[root@Rac1 bin]# crsctl disable crs


CRS-4621: Oracle High Availability Services autostart is disabled.

[root@Rac1 bin]# crsctl enable crs


CRS-4622: Oracle High Availability Services autostart is enabled.
7) Check nodes in cluster

[root@Rac1 bin]# olsnodes -n


rac1 1
rac2 2

You can also give the -s option to see which nodes are active or inactive in case of
node eviction.

crs_start -all

*********How many OCR and voting disks should one have?

For redundancy, one should have at least two OCR disks and three voting disks (raw
disk partitions). These disk partitions should be spread across different physical
disks.

*********How does one convert a single instance database to RAC?

Oracle 10gR2 introduces a utility called rconfig (located in $ORACLE_HOME/bin) that
will convert a single instance database to a RAC database.

$ cp $ORACLE_HOME/assistants/rconfig/sampleXMLs/ConvertToRAC.xml convert.xml
$ vi convert.xml
$ rconfig convert.xml
One can also use dbca and enterprise manager to convert the database to RAC mode.

For prior releases, follow these steps:

Shut Down your Database:


SQL> CONNECT SYS AS SYSDBA
SQL> SHUTDOWN NORMAL

Enable RAC - On Unix this is done by relinking the Oracle software.

Make the software available on all computer systems that will run RAC. This can be
done by copying the software to all systems or to a shared clustered file system.
Each instance requires its own set of Redo Log Files (called a thread). Create
additional log files:
SQL> CONNECT SYS AS SYSDBA
SQL> STARTUP EXCLUSIVE

SQL> ALTER DATABASE ADD LOGFILE THREAD 2


SQL> GROUP G4 ('RAW_FILE1') SIZE 500k,
SQL> GROUP G5 ('RAW_FILE2') SIZE 500k,
SQL> GROUP G6 ('RAW_FILE3') SIZE 500k;

SQL> ALTER DATABASE ENABLE PUBLIC THREAD 2;


Each instance requires its own set of Undo segments (rollback segments). To add
undo segments for New Nodes:
UNDO_MANAGEMENT = auto
UNDO_TABLESPACE = undots2
Edit the SPFILE/init.ora files and number the instances 1, 2,...:
CLUSTER_DATABASE = TRUE (PARALLEL_SERVER = TRUE prior to Oracle9i).
INSTANCE_NUMBER = 1
THREAD = 1
UNDO_TABLESPACE = undots1 (or ROLLBACK_SEGMENTS if you use
UNDO_MANAGEMENT=manual)
# Include %T for the thread in the LOG_ARCHIVE_FORMAT string.
# Set LM_PROCS to the number of nodes * PROCESSES
# etc....
Create the dictionary views needed for RAC by running catclust.sql (previously
called catparr.sql):
SQL> START ?/rdbms/admin/catclust.sql
On all the computer systems, startup the instances:
SQL> CONNECT / as SYSDBA
SQL> STARTUP;

*********How Can I test if a database is running in RAC mode?

SQL> show parameter CLUSTER_DATABASE


If the value of CLUSTER_DATABASE is FALSE then database is not running in RAC Mode.
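
For example (illustrative output from a two-node RAC instance):

SQL> show parameter cluster_database

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------
cluster_database                     boolean     TRUE
cluster_database_instances           integer     2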

*********How can I keep track of active instances?


You can keep track of active RAC instances by executing one of the following
queries:

SELECT * FROM SYS.V_$ACTIVE_INSTANCES;


SELECT * FROM SYS.V_$THREAD;

*********Can one see how connections are distributed across the nodes?
Select from gv$session. Some examples:

SELECT inst_id, count(*) "DB Sessions" FROM gv$session


WHERE type = 'USER' GROUP BY inst_id;
With login time (hour):
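
A sketch of one such query (grouping the sessions by instance and logon hour; the
exact columns are a matter of choice):

SELECT inst_id, TO_CHAR(logon_time,'HH24') "Hour", count(*) "DB Sessions"
FROM gv$session
WHERE type = 'USER'
GROUP BY inst_id, TO_CHAR(logon_time,'HH24')
ORDER BY inst_id, 2;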

========================

*********Using Transparent Application Failover

After an Oracle RAC node crashes (usually from a hardware failure), all new
application transactions are automatically rerouted to a specified backup node. The
challenge in rerouting is to not lose transactions that were "in flight" at the
exact moment of the crash.

SELECT failover.

With SELECT failover, Oracle Net keeps track of all SELECT statements issued during
the transaction, tracking how many rows have been fetched back to the client for
each cursor associated with a SELECT statement. If the connection to the instance
is lost, Oracle Net establishes a connection to another Oracle RAC node and re-
executes the SELECT statements, repositioning the cursors so the client can
continue fetching rows as if nothing has happened. The SELECT failover approach is
best for data warehouse systems that perform complex and time-consuming
transactions.

SESSION failover.
When the connection to an instance is lost, SESSION failover results only in the
establishment of a new connection to another Oracle RAC node; any work in progress
is lost. SESSION failover is ideal for online transaction processing (OLTP)
systems, where transactions are small.
Oracle TAF also offers choices on how to restart a failed transaction. The Oracle
DBA may choose one of the following failover methods:

BASIC failover.

In this approach, the application connects to a backup node only after the primary
connection fails. This approach has low overhead, but the end user experiences a
delay while the new connection is created.

PRECONNECT failover.

In this approach, the application simultaneously connects to both a primary and a


backup node. This offers faster failover, because a pre-spawned connection is ready
to use. But the extra connection adds everyday overhead by duplicating connections.

Currently, TAF will fail over standard SQL SELECT statements that were in flight
at the moment of a node crash. In the current release of TAF, however, some types
of transactions must be restarted from the beginning of the transaction.

The following types of transactions do not automatically fail over and must be
restarted by TAF:

Transactional statements. Transactions involving INSERT, UPDATE, or DELETE


statements are not supported by TAF.

ALTER SESSION statements. ALTER SESSION and SQL*Plus SET statements do not fail
over.
The following do not fail over and cannot be restarted:

Temporary objects. Transactions using temporary segments in the TEMP tablespace and
global temporary tables do not fail over.

PL/SQL package states. PL/SQL package states are lost during failover.

[Link] =
(DESCRIPTION_LIST =
(FAILOVER = true)
(LOAD_BALANCE = true)
(DESCRIPTION =
(ADDRESS =
(PROTOCOL = TCP)
(HOST = redneck)(PORT = 1521))
(CONNECT_DATA =
(SERVICE_NAME = bubba)
(SERVER = dedicated)
.......................
(LOAD_BALANCE = yes)................>>>>>Load Balancing
....................
(FAILOVER_MODE = .................>>>>>To Enable TAF Services
(BACKUP=cletus)
(TYPE=select)
(METHOD=preconnect)
(RETRIES=20)
(DELAY=3)
...........................................
)
)
)
)

The failover_mode section of the tnsnames.ora file lists the TAF parameters and
their values.

To see information about connections that have been transferred:

select
username,
sid,
serial#,
failover_type,
failover_method,
failed_over
from
v$session
where
username not in ('SYS','SYSTEM',
'PERFSTAT')
and
failed_over = 'YES';

You can run this script against the backup node after an instance failure to see
those transactions that have been reconnected with TAF. Remember, TAF will quickly
redirect transactions, so you'll only see entries for a short period of time
immediately after the failover. A backup node can have a variety of concurrent
failover transactions, because the tnsnames.ora file on each Oracle Net client
specifies the backup node, the failover type, and the failover method.

Conclusion

Oracle RAC, TAF, and Cache Fusion work together to provide continuous availability
and scalability. To summarize, here's a short description of each component:

Oracle RAC. The clustering component of Oracle that allows the creation of
multiple, independent Oracle instances, all sharing a single database.

Cache Fusion. The shared RAM component of Oracle RAC that provides fast interchange
of Oracle data blocks between SGA regions.

===================================================================================
===================================================================================
=

============================================
Example 1 Determining the Active Version
crsctl query crs activeversion

Example 2 Determining the Software Version


crsctl query crs softwareversion

1. Recent Copies of OCR Backups?


acmrac2-> ocrconfig -showbackup

=============================================

********To start all the resources on RAC database

crsctl start resources


or
crs_start -all

********Explain the difference between a FUNCTION, PROCEDURE and PACKAGE.

A function and a procedure are the same in that they are intended to be a collection
of PL/SQL code that carries out a single task. While a procedure does not have to
return any values to the calling application, a function will return a single
value. A package on the other hand is a collection of functions and procedures that
are grouped together based on their commonality to a business function or
application.

********How would you determine what sessions are connected and what resources they
are waiting for?

Use of V$SESSION and V$SESSION_WAIT
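
For example, a minimal sketch joining the two views (on 10g and later the same wait
columns are also available directly in V$SESSION):

SELECT s.sid, s.username, w.event, w.state, w.seconds_in_wait
FROM v$session s, v$session_wait w
WHERE s.sid = w.sid
AND s.username IS NOT NULL;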

********Describe what redo logs are.

Redo logs are logical and physical structures that are designed to hold all the
changes made to a database and are intended to aid in the recovery of a database.

********How would you force a log switch?


ALTER SYSTEM SWITCH LOGFILE;

********Give two methods you could use to determine what DDL changes have been
made.

You could use Logminer or Streams

********What does coalescing a tablespace do?

Coalescing is only valid for dictionary-managed tablespaces and de-fragments space
by combining neighboring free extents into larger single extents.
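
For example (assuming USERS is a dictionary-managed tablespace):

SQL> ALTER TABLESPACE users COALESCE;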

********What is the difference between a TEMPORARY tablespace and a PERMANENT
tablespace?

A temporary tablespace is used for temporary objects such as sort structures while
permanent tablespaces are used to store those objects meant to be used as the true
objects of the database.
********Name a tablespace automatically created when you create a database.

The SYSTEM tablespace.

********When creating a user, what permissions must you grant to allow them to
connect to the database?

Grant the CONNECT role to the user.

********How can you rebuild an index?

ALTER INDEX <index_name> REBUILD;

********Explain what partitioning is and what its benefit is.

Partitioning is a method of taking large tables and indexes and splitting them into
smaller, more manageable pieces.
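
A minimal range-partitioning sketch (the table and column names are made up):

SQL> CREATE TABLE sales_hist (
       sale_id   NUMBER,
       sale_date DATE
     )
     PARTITION BY RANGE (sale_date)
     ( PARTITION p2011 VALUES LESS THAN (TO_DATE('01-JAN-2012','DD-MON-YYYY')),
       PARTITION pmax  VALUES LESS THAN (MAXVALUE) );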

********How can you gather statistics on a table?

The ANALYZE command, or the DBMS_STATS package (preferred in later releases).
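
For example, with DBMS_STATS (the schema and table names are placeholders):

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(ownname => 'SCOTT', tabname => 'EMP');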

********How can you enable a trace for a session?

Use DBMS_SESSION.SET_SQL_TRACE or
ALTER SESSION SET SQL_TRACE = TRUE;

********What is the difference between the SQL*Loader and IMPORT utilities?

These two Oracle utilities are used for loading data into the database. The
difference is that the import utility relies on the data being produced by another
Oracle utility EXPORT while the SQL*Loader utility allows data to be loaded that
has been produced by other utilities from different data sources just so long as it
conforms to ASCII formatted or delimited files.

********Name two files used for network connection to a database.

tnsnames.ora and sqlnet.ora

********Technical - UNIX ********


=================================================================
Every DBA should know something about the operating system that the database will
be running on. The questions here are related to UNIX but you should equally be
able to answer questions related to common Windows environments.

1. How do you list the files in an UNIX directory while also showing hidden files?

ls -ltra

2. How do you execute a UNIX command in the background?


Use the "&"
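
For example (the script name is made up):

$ nohup ./nightly_report.sh &    <-- runs in the background and keeps running after logout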

3. What UNIX command will control the default file permissions when files are
created?

umask
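
For example:

$ umask 022    <-- new files default to mode 644, new directories to 755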

4. Explain the read, write, and execute permissions on a UNIX directory.

Read allows you to see and list the directory contents. Write allows you to create
and delete files within the directory. Execute allows you to cd into the directory
and access its contents.

6. Give the command to display space usage on the UNIX file system.

df -lk

7. Explain iostat, vmstat and netstat.

Iostat reports on terminal, disk and tape I/O activity.

Vmstat reports on virtual memory statistics for processes, disk, tape and CPU
activity.

Netstat reports on the contents of network data structures.

9. Give two UNIX kernel parameters that affect an Oracle install

SHMMAX & SHMMNI
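
On Linux you can check the current values with sysctl (values shown are
illustrative; parameter names vary by platform):

$ /sbin/sysctl kernel.shmmax kernel.shmmni
kernel.shmmax = 4294967295
kernel.shmmni = 4096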

10. Briefly, how do you install Oracle software on UNIX.

Basically, set up disks, kernel parameters, and run orainst.

I hope that these interview questions were not too hard. Remember these are "core"
DBA questions and not necessarily related to the Oracle options that you may
encounter in some interviews. Take a close look at the requirements for any job and
try to extract questions that interviewers may ask from manuals and real life
experiences. For instance, if they are looking for a DBA to run their databases in
RAC environments, you should try to determine what hardware and software they are
using BEFORE you get to the interview. This would allow you to brush up on
particular environments and not be caught off-guard. Good luck!

===============================================================

Which CRS home?
/opt/oracle/orabase/product/crs/bin

cd /opt/oracle/orabase/product/crs/bin/
[root@semldslx5075 ~]# /opt/oracle/orabase/product/crs/bin/crsctl stop crs
$CRS_HOME/bin/crs_stat -t
$ORA_CRS_HOME/bin/crsctl start resources
$ORA_CRS_HOME/bin/crsctl start crs
$ORA_CRS_HOME/bin/crsctl check crs
export/oracle/product/9.0.2/bin/gsdctl start

The Cluster Ready Services Daemon (crsd) Log Files


The CRS daemon (crsd) manages cluster resources based on the configuration
information that is stored in OCR for each resource. This includes start, stop,
monitor, and failover operations. The crsd process generates events when the status
of a resource changes. When you have Oracle RAC installed, the crsd process
monitors the Oracle database instance, listener, and so on, and automatically
restarts these components when a failure occurs.
Log files for the CRSD process (crsd) can be found in the following directories:
CRS home/log/hostname/crsd

Oracle Cluster Registry (OCR) Log Files


The Oracle Cluster Registry (OCR) records log information in the following
location:
CRS Home/log/hostname/client

Cluster Synchronization Services (CSS) Log Files


Cluster Synchronization Services (CSS): Manages the cluster configuration by
controlling which nodes are members of the cluster and by notifying members when a
node joins or leaves the cluster. If you are using certified third-party
clusterware, then the CSS process interfaces with your clusterware to manage node
membership information.

You can find CSS information that the OCSSD generates in log files in the following
locations: CRS Home/log/hostname/cssd

===============================================

********What is SCAN?

Single Client Access Name (SCAN) is a new Oracle Real Application Clusters (RAC)
11g Release 2 feature that provides a single name for clients to access an Oracle
Database running in a cluster. The benefit is that clients using SCAN do not need
to change their configuration if you add or remove nodes in the cluster.
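
For example, a client can connect through the SCAN with a single EZConnect string
(the host and service names below are made up); the SCAN listeners then hand the
connection to a suitable node:

$ sqlplus system@//rac-scan.example.com:1521/orcl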

********Various Timeout Settings in Cluster:

-bash-3.2$ crsctl get css disktimeout


200

-bash-3.2$ crsctl get css misscount


30

-bash-3.2$ crsctl get css reboottime


3

Disktimeout:
Disk latency in seconds from node to votedisk. Default value is 200 seconds. (Disk I/O)

Misscount:
Network latency in seconds from node to node over the interconnect. Default value is
60 seconds on Linux and 30 seconds on other Unix platforms. (Network I/O)

Misscount < Disktimeout

NOTE: Do not change these without contacting Oracle Support; doing so may cause
logical corruption of the data.

********Check nodes in cluster

[root@Rac1 bin]# olsnodes -n


rac1 1
rac2 2

You can also give the -s option to see which nodes are active or inactive in case of
node eviction.

********Starting and Stopping your Database.

[oracle@rac1 ~]$ srvctl status database -d orcl


Instance orcl1 is running on node rac1
Instance orcl2 is running on node rac2

[oracle@rac1 ~]$ srvctl stop database -d orcl


[oracle@rac1 ~]$ srvctl start database -d orcl

[oracle@rac1 ~]$ srvctl status instance -d orcl -i orcl1


[oracle@rac1 ~]$ srvctl stop instance -d orcl -i orcl1
[oracle@rac1 ~]$ srvctl start instance -d orcl -i orcl1

[grid@rac1 ~]$ srvctl stop diskgroup -g data -n rac1,rac2


[grid@rac1 ~]$ srvctl stop diskgroup -g fra -n rac1,rac2

[oracle@rac1 ~]$ srvctl stop scan_listener


[oracle@rac1 ~]$ srvctl stop scan

stop nodeapps on all nodes.

[oracle@rac1 ~]$ srvctl stop nodeapps -f

Oracle RAC 11gR2 Clusterware and Database Administration (September 17, 2012).

Prior to 11gR2, "crs_stat -t" was used for checking the status of all the resources,
but now "crsctl status resource -t" can be used in its place.

I have a test two-node RAC system on which I am going to demonstrate some of
the database and clusterware administration commands.

Here are some important notes from Oracle Docs on "SRVCTL" & "CRSCTL" utilities
which are
going to be used in this article.

********How do we verify that RAC instances are running?


select * from V$ACTIVE_INSTANCES
********How do we verify that the primary RAC instances are running?
Check the ACTIVE_INSTANCE_COUNT initialization parameter.

********What are the types of connection load-balancing?


There are two types of connection load-balancing: server-side load balancing and
client-side load balancing.

********Recover Corrupt/Missing OCR and Voting Disk with No Backup

The OCR location


Details in /u01/crs/oracle/product/10.2.0/crs/log/rac1/client/ocrcheck_20186.log.
[root@rac1 bin]#
[root@rac1 bin]# ./crsctl query css votedisk

1. Execute rootdelete.sh from All Nodes.


The rootdelete.sh script can be found at $ORA_CRS_HOME/install/rootdelete.sh on all
nodes in the cluster.

2. Execute rootdeinstall.sh from the Primary Node.


[root@rac1 install]# ./rootdeinstall.sh

3. Execute root.sh from the Primary Node.


# [root@rac1 install]# pwd
# /u01/crs/oracle/product/10.2.0/crs
[root@rac1 crs]# ./root.sh

# Local node checking complete.


# Run root.sh on remaining nodes to start CRS daemons.

###################################################################################
##############################################

********Add ASM INSTANCE(S) to OCR:

srvctl add asm -n <node_name> -i <asm_instance_name> -o <oracle_home>


[oracle@rac1 bin]$ pwd
/u01/crs/oracle/product/10.2.0/crs/bin
[oracle@rac1 bin]$ ./srvctl add asm -i +ASM1 -n rac1 -o
/u01/app/oracle/product/10.2.0/db_1

Add DATABASE to OCR:

srvctl add database -d <db_unique_name> -o <oracle_home>


[oracle@rac1 bin]$ ./srvctl add database -d cdbs -o
/u01/app/oracle/product/10.2.0/db_1

Add INSTANCE(S) to OCR:

srvctl add instance -d <db_unique_name> -i <instance_name> -n <node_name>


[oracle@rac1 bin]$ ./srvctl add instance -d cdbs -i cdbs1 -n rac1

Add SERVICE(S) to OCR:


srvctl add service -d <db_unique_name> -s <service_name> -r <preferred_instances> -P <TAF_policy>
[oracle@rac1 bin]$ ./srvctl add service -d cdbs -s cdbs_srvc -r cdbs1,cdbs2 -P
BASIC

******** Explain the use of setting GLOBAL_NAMES equal to TRUE.

Setting GLOBAL_NAMES dictates how you might connect to a database. This variable is
either TRUE or FALSE and if it is set to TRUE it enforces database links to have
the same name as the remote database to which they are linking.
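
A sketch (the link, user and TNS alias names are made up): with GLOBAL_NAMES = TRUE,
a link to a remote database whose GLOBAL_NAME is PROD.EXAMPLE.COM must itself be
named PROD.EXAMPLE.COM:

SQL> ALTER SYSTEM SET GLOBAL_NAMES = TRUE;
SQL> CREATE DATABASE LINK prod.example.com
     CONNECT TO scott IDENTIFIED BY tiger USING 'prod_tns';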

===================================================================================
===================================================================================

What is the difference between server-side and client-side connection load
balancing?
Client-side balancing happens at the client, where load balancing is done using the
tnsnames.ora entry. In case of server-side load balancing, the listener uses a
load-balancing advisory to redirect connections to the instance providing the best
service.
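
As a sketch, client-side balancing is driven by a tnsnames.ora entry such as the
following (the hosts and service name are illustrative); server-side balancing
additionally requires the instances to register with all listeners via the
REMOTE_LISTENER initialization parameter:

ORCL =
  (DESCRIPTION =
    (LOAD_BALANCE = on)
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac2-vip)(PORT = 1521))
    (CONNECT_DATA = (SERVICE_NAME = orcl))
  )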

What are the major RAC wait events?


In a RAC environment the buffer cache is global across all instances in the
cluster, and hence the processing differs. The most common wait events related to
this are gc cr request and gc buffer busy.

GC CR request: the time it takes to retrieve the data from the remote cache

Reason: RAC Traffic Using Slow Connection or Inefficient queries (poorly tuned
queries will increase the amount of data blocks
requested by an Oracle session. The more blocks requested typically means the more
often a block will need to be read from a remote instance via the interconnect.)

GC BUFFER BUSY: It is the time the remote instance locally spends accessing the
requested data block.
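
A quick way to see how much time an instance has spent on these events (a sketch
against V$SYSTEM_EVENT; use GV$SYSTEM_EVENT for a cluster-wide view):

SQL> SELECT event, total_waits, time_waited
     FROM v$system_event
     WHERE event LIKE 'gc%'
     ORDER BY time_waited DESC;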

3. What kind of storage we can use for the shared Clusterware files?
- OCFS (Release 1 or 2)
- raw devices
- third party cluster file system such as GPFS or Veritas

4. What kind of storage we can use for the RAC database storage?
- OCFS (Release 1 or 2)
- ASM
- raw devices
- third party cluster file system such as GPFS or Veritas
===================================================================================
==========================================
##################### RAC Real Time Scenarios
######################
===================================================================================
==========================================

Scenario 1: crsd fails to start up on the 2nd node:


============================================
ISSUE:
crsd fails to start on the 2nd node after a server reboot in 11gR2 RAC.

Solution: Verified the logs; the ASM instance on the node had not come up. We
started it manually and then started CRS on node2. Issue solved.

NODE 1:

# crsctl check crs


CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
However, after both servers rebooted, Cluster Ready Services on the second node does
not start, but on the first node it works fine.

NODE 2:

# crsctl check crs


CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

As per the Oracle documentation (which is great, by the way):

$ORA_CRS_HOME/crs/log Contains trace files for the CRS resources.


$ORA_CRS_HOME/crs/init Contains trace files of the CRS daemon during startup. Good
place to start with any CRS login problems.

To see more information on why the cluster stack did not start up, and if it does
not start up the second time, look into the following logfiles:

$GI_HOME/log/<hostname>/alert*.log

For the specific process (crsd)


$GI_HOME/log/<hostname>/[Link]
$GI_HOME/log/<hostname>/[Link]

[root@galaxy crsd]# su - grid


[grid@galaxy ~]$ asmcmd
Connected to an idle instance.
ASMCMD> startup
ASM instance started

Total System Global Area 283930624 bytes


Fixed Size 2212656 bytes
Variable Size 256552144 bytes
ASM Cache 25165824 bytes
ASM diskgroups mounted
ASM diskgroups volume enabled
ASMCMD>

[root@galaxy ~]# crsctl start cluster


CRS-2672: Attempting to start '[Link]' on 'galaxy'
CRS-2676: Start of '[Link]' on 'galaxy' succeeded

===================================================================================
==========================================

Scenario 2: one of the instances was not coming up, throwing the below error
==========================================================================

CRS-0215: Could not start resource 'ora.test_prm.test1.inst'. On a 3-node RAC
database, one of the instances was not coming up, throwing the below error:

$ srvctl start instance -d test_prm -i test1 -o open


PRKP-1001 : Error starting instance test1 on node nn4040
CRS-0215: Could not start resource 'ora.test_prm.test1.inst'.
I was asked to look into it, so the first thing I tried was to check whether the
instance comes up using the startup command from SQL*Plus, and it did. So, it was
time to check the database configuration stored in the OCR, using srvctl config.

$ srvctl config database -d test_prm -a


nn4040 test1 /u01/app/oracle/product/rdbms/10205
nn4041 test2 /u01/app/oracle/product/rdbms/10205
nn4042 test3 /u01/app/oracle/product/rdbms/10205
DB_UNIQUE_NAME: test_prm
DB_NAME: null
ORACLE_HOME: /u01/app/oracle/product/rdbms/10205
SPFILE: null
DOMAIN: null
DB_ROLE: null
START_OPTIONS: open
POLICY: AUTOMATIC
ENABLE FLAG: DB ENABLED
So, as per the output of the OCR configuration, "SPFILE: null", and that was the
reason the instance was not coming up using the srvctl start instance command.

Modified the configuration and then started up the instance :)


$ srvctl modify database -d test_prm -p
'/u01/oraadmin/test/admin/spfile/[Link]' -s open

$srvctl config database -d test_prm -a


nn4040 test1 /u01/app/oracle/product/rdbms/10205
nn4041 test2 /u01/app/oracle/product/rdbms/10205
nn4042 test3 /u01/app/oracle/product/rdbms/10205
DB_UNIQUE_NAME: test_prm
DB_NAME: null
ORACLE_HOME: /u01/app/oracle/product/rdbms/10205
SPFILE: /u01/oraadmin/test/admin/spfile/[Link]
DOMAIN: null
DB_ROLE: null
START_OPTIONS: open
POLICY: AUTOMATIC
ENABLE FLAG: DB ENABLED

$ srvctl start instance -d test_prm -i test1 -o open


$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.5.0 - Production on Tue Mar 17 2011

Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SYS@test1 >

===================================================================================
==========================================

Scenario 3: node crash of one node in a 3-node RAC database
==========================================================================

Recently we had a node crash of one node in a 3-node RAC database. The alert log
showed -

ORA-00600:[kcbsor_2], [3], [2]

Thu Apr 26 [Link] 2012


Hex dump of (file 488, block 345039) in trace file
/u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_dbw1_16218.trc
Corrupt block relative dba: 0x7a0543cf (file 488, block 345039)
Bad header found during preparing block for write
Data in bad block:
Hex dump of (file 488, block 345039) in trace file
/u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc
Reading datafile '/u10/oradata/matrix/test_idx_01_177.dbf' for corruption at rdba:
0x7a0543cf (file 488, block 345039)
Reread (file 488, block 345039) found same corrupt data (logically corrupt)
Errors in file
/u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc
(incident=48017):
ORA-00600: internal error code, arguments: [kcbsor_2], [3], [2], [], [], [], [],
[], [], [], [], []
Errors in file
/u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc:
ORA-00600: internal error code, arguments: [kcbsor_2], [3], [2], [], [], [], [],
[], [], [], [], []
PMON (ospid: 16123): terminating the instance due to error 472
Thu Apr 26 [Link] 2012

The trace file
/u01/oradiag/diag/rdbms/matrix_adc/matrix3/trace/matrix3_pmon_16123.trc

Logical corruption happens when a data block has a valid checksum, etc., but the
block contents are logically inconsistent.
We tried recover datafile 488 block 345039

RMAN> RECOVER DATAFILE 488 BLOCK 345039;

Starting recover at 26-APR-12


using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=2550 instance=matrix1 device type=DISK

starting media recovery


media recovery complete, elapsed time: [Link]

Finished recover at 26-APR-12


Thu Apr 26 [Link] 2012
alter database recover datafile list clear
Completed: alter database recover datafile list clear

There wasn't any error, but the media recovery completed with a negligible elapsed
time, which made us suspicious. We tried starting the instance and again it got
stuck at the same "RECOVERY OF THREAD 3 STUCK AT BLOCK 345039 OF FILE 488". Meanwhile
the other 2 instances also got evicted because of the high load average on the
server. We then performed datafile recovery

Thu Apr 26 [Link] 2012

ALTER DATABASE RECOVER datafile '/u10/oradata/matrix/test_idx_01_177.dbf'


Media Recovery Start
Serial Media Recovery started
Recovery of Online Redo Log: Thread 1 Group 2 Seq 86819 Reading mem 0
Mem# 0: /u02/oraredo/matrix/[Link]
Mem# 1: /u05/oraredo/matrix/[Link]
Recovery of Online Redo Log: Thread 2 Group 5 Seq 62761 Reading mem 0
Mem# 0: /u02/oraredo/matrix/[Link]
Mem# 1: /u05/oraredo/matrix/[Link]
Recovery of Online Redo Log: Thread 3 Group 22 Seq 58470 Reading mem 0
Mem# 0: /u02/oraredo/matrix/[Link]
Mem# 1: /u05/oraredo/matrix/[Link]
Thu Apr 26 [Link] 2012
Recovery of Online Redo Log: Thread 3 Group 19 Seq 58471 Reading mem 0
Mem# 0: /u02/oraredo/matrix/[Link]
Mem# 1: /u05/oraredo/matrix/[Link]
Recovery of Online Redo Log: Thread 2 Group 6 Seq 62762 Reading mem 0
Mem# 0: /u02/oraredo/matrix/[Link]
Mem# 1: /u05/oraredo/matrix/[Link]
Thu Apr 26 [Link] 2012
Recovery of Online Redo Log: Thread 1 Group 3 Seq 86820 Reading mem 0
Mem# 0: /u02/oraredo/matrix/[Link]
Mem# 1: /u05/oraredo/matrix/[Link]
Media Recovery Complete (matrix1)
Completed: ALTER DATABASE RECOVER datafile
'/u10/oradata/matrix/test_idx_01_177.dbf'

The datafile was recovered and the instance started. All 3 nodes were brought up.

===================================================================================
==========================================

Scenario 6: ASM Disk Groups Dismounted:


======================================

SQL> startup nomount


ORA-01078: failure in processing system parameters
ORA-01565: error in identifying file '+DATA_POC/dddddd/[Link]'
ORA-17503: ksfdopn:2 Failed to open file +DATA_POC/dddddd/[Link]
ORA-15077: could not locate ASM instance serving a required diskgroup
SQL> exit

2.- check the disk on ASMLib (as ROOT):

/etc/init.d/oracleasm listdisks
/etc/init.d/oracleasm querydisk

SQL> select name, state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
CTRLLOG MOUNTED
DATA MOUNTED
FRA MOUNTED
DATA_POC DISMOUNTED
FRA_POC DISMOUNTED
LOGCTL_POC DISMOUNTED

3.- Mounting the diskgroups (on all 3 ASM instances) was fairly simple:

SQL> alter diskgroup data_poc mount;


Diskgroup altered.

SQL> select name, state from v$asm_diskgroup;


NAME                           STATE
------------------------------ -----------
CTRLLOG MOUNTED
DATA MOUNTED
FRA MOUNTED
DATA_POC MOUNTED
FRA_POC DISMOUNTED
LOGCTL_POC DISMOUNTED

and the same for fra_poc and logctl_poc.

Once all 3 diskgroups were mounted on all 3 nodes, I tried to start the database
again using srvctl, as in the example below.
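
For example (the database name is a placeholder):

$ srvctl start database -d poc
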
Scenario 7: ASM Disk Groups Dismounted on RAC HA Clusters:
=============================================================

SQL> select name, state from v$asm_diskgroup;

NAME                           STATE
------------------------------ -----------
DATA DISMOUNTED
ARCH DISMOUNTED
POC MOUNTED
LOGCTL_POC MOUNTED

SQL> ALTER DISKGROUP ARCH mount;

Diskgroup altered.

SQL> ALTER DISKGROUP DATA mount;

Diskgroup altered.

semldslx5077.PRE030N3$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE OFFLINE
ora....[Link] application ONLINE OFFLINE
[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE OFFLINE
ora....[Link] application ONLINE OFFLINE
ora....[Link] application OFFLINE OFFLINE
ora....[Link] application ONLINE OFFLINE
[Link] application OFFLINE OFFLINE
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
semldslx5077.PRE030N3$ srvctl start database -d PRE030
semldslx5077.PRE030N3$ srvctl status database -d PRE030
Instance PRE030N3 is running on node semldslx5077
Instance PRE030N4 is running on node semldslx5078
semldslx5077.PRE030N3$ crs_stat -t
Name Type Target State Host
------------------------------------------------------------
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE OFFLINE
ora....[Link] application ONLINE OFFLINE
[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE OFFLINE
ora....[Link] application ONLINE OFFLINE
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5078
[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5075
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5076
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5077
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078
ora....[Link] application ONLINE ONLINE seml...5078

semldslx5078.PRE030N4$ ps -ef|grep pmon


oracle 6935 14162 0 15:28 pts/0 [Link] grep pmon
oracle 19178 1 0 11:19 ? [Link] asm_pmon_+ASM4
oracle 22303 1 0 15:21 ? [Link] ora_pmon_PRE030NA

8) Load Balance Testing on Oracle RAC:


=======================================
Posted by Kamran Agayev A. on 14th January 2013

If you've installed Oracle RAC (Real Application Clusters) and want to test how
load balancing works, you can run the following shell script and check the
GV$SESSION view:

#!/bin/bash
. /home/oracle/.bash_profile
for ((i=1; i <= 50 ; i++))
do
nohup sqlplus -S system/oracle@racdb<<eof &
begin
dbms_lock.sleep(10);
end;
/

eof
done

This will open 50 sessions in the background. Check the GV$SESSION view before and
after running the script with this query:

SQL> select inst_id,count(*) from gv$session where username is not null group by
inst_id;

   INST_ID   COUNT(*)
---------- ----------
1 10
2 9

Run the following command from a different session:

[oracle@node1 ~] ./check_load_balancing.sh

SQL> /

   INST_ID   COUNT(*)
---------- ----------
1 33
2 36

Wait for 10 seconds (as we've specified 10 seconds in the DBMS_LOCK.SLEEP call) and
run the query again:

SQL> /

   INST_ID   COUNT(*)
---------- ----------
1 10
2 9

Scenario 9: Problem while instantiating the ASM disks when doing scandisks on the
second node in the cluster.
===================================================================================
=====================

[root@myrac2 ~]# /usr/sbin/oracleasm scandisks


Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR"
Unable to instantiate disk "OCR"
Instantiating disk "VD"
Unable to instantiate disk "VD"
Instantiating disk "DATA"
Unable to instantiate disk "DATA"
Instantiating disk "FRA"
Unable to instantiate disk "FRA"

Solution:

[root@myrac2 ~]# /usr/sbin/oracleasm configure


ORACLEASM_ENABLED=false
ORACLEASM_UID=
ORACLEASM_GID=
ORACLEASM_SCANBOOT=true
ORACLEASM_SCANORDER=""
ORACLEASM_SCANEXCLUDE=""
[root@myrac2 ~]# /usr/sbin/oracleasm configure -i
Configuring the Oracle ASM library driver.

This will configure the on-boot properties of the Oracle ASM library
driver. The following questions will determine whether the driver is
loaded on boot and what permissions it will have. The current values
will be shown in brackets ("[]"). Hitting <ENTER> without typing an
answer will keep that current value. Ctrl-C will abort.

Default user to own the driver interface []: grid


Default group to own the driver interface []: asmadmin
Start Oracle ASM library driver on boot (y/n) [n]: y
Scan for Oracle ASM disks on boot (y/n) [y]: y
Writing Oracle ASM library driver configuration: done
[root@ms2rac2 ~]# ls -ltr /etc/sysconfig/oracleasm
lrwxrwxrwx 1 root root 24 Dec 11 2008 /etc/sysconfig/oracleasm -> oracleasm-
_dev_oracleasm

[root@myrac2 ~]# /usr/sbin/oracleasm scandisks


Reloading disk partitions: done
Cleaning any stale ASM disks...
Scanning system for ASM disks...
Instantiating disk "OCR"
Instantiating disk "VD"
Instantiating disk "DATA"
Instantiating disk "FRA"

[root@myrac2 ~]# /usr/sbin/oracleasm listdisks


DATA
FRA
OCR
VD

If you still get the following error:

[root@ms2rac2 ~]# /usr/sbin/oracleasm scandisks


Reloading disk partitions: done
Cleaning any stale ASM disks...
Cleaning disk "FRA"
Cleaning disk "OCR"
Cleaning disk "VD"
Scanning system for ASM disks...
Instantiating disk "CRS"
Instantiating disk "FRA"
Unable to fix permissions on ASM disk "CRS"

Then check the grid and oracle user definition on all nodes.
id oracle

id grid

Check,

ORACLEASM_UID=grid
ORACLEASM_GID=asmadmin

===================================================================================
==========================================

===================================================================================
==========================================
Scenario 10: Recover Corrupt/Missing OCR and Voting Disk without Backup

It happens. Not very often, but it can happen. You are faced with a corrupt or
missing Oracle Cluster Registry (OCR) and have no backup to recover from.

/u01/crs/oracle/product/10.2.0/crs/log/rac1/[Link]

[root@racnode1 ~]# echo $ORA_CRS_HOME


/u01/app/crs

[root@racnode1 ~]# which ocrcheck


/u01/app/crs/bin/ocrcheck

[oracle@racnode1 ~]$ ocrcheck


Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 262120
Used space (kbytes) : 4660
Available space (kbytes) : 257460
ID : 1331197
Device/File Name : /u02/oradata/racdb/OCRFile <-- OCR (primary)
Device/File integrity check succeeded
Device/File not configured <-- OCR Mirror (not
configured)
Cluster registry integrity check succeeded

[oracle@racnode1 ~]$ ocrconfig -showbackup

racnode1 2009/09/29 [Link] /u01/app/crs/cdata/crs


racnode1 2009/09/29 [Link] /u01/app/crs/cdata/crs
racnode1 2009/09/29 [Link] /u01/app/crs/cdata/crs
racnode1 2009/09/28 [Link] /u01/app/crs/cdata/crs
racnode1 2009/09/22 [Link] /u01/app/crs/cdata/crs

1. Recent Copies of OCR Backups?


acmrac2-> ocrconfig -showbackup
rac1 2007/06/26 [Link] /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
rac1 2007/06/26 [Link] /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
rac1 2007/06/25 [Link] /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
rac1 2007/06/25 [Link] /u01/app/oracle/product/10.2.0/crs_1/cdata/crs
rac1 2007/06/17 [Link] /u01/app/oracle/product/10.2.0/crs_1/cdata/crs

2. You can use OCRDUMP to review the contents of the backup.


$ ocrdump -backupfile file_name # file_name is the name of the backup file.

3. What is the frequency of OCR backups automatically done by Oracle Clusterware?


Oracle Clusterware automatically creates OCR backups every 4 hours, each full day
and at the end of each week.

1. Before using crs_unregister, it's always recommended to back up the OCR and
voting disks. Make sure that you have a documented version to restore if things go
wrong.

2. Nodeapps needs to be stopped on all the nodes (In my case I have 2 nodes).

3. You can use crs_unregister to remove the entries which are not required. Here in
my example I did it with a listener which I had configured wrongly and whose entry I
wanted removed from CRS.
crs_unregister ora.testrac1p.LISTENER_RACDB_TESTRAC1P.lsnr
crs_unregister ora.testrac2p.LISTENER_RACDB_TESTRAC2P.lsnr

4. Once you have removed the entries using crs_unregister, use srvctl to reflect
the removed entries.
$ srvctl config listener -n testrac1p
$ srvctl config listener -n testrac2p
$ crs_stat -v

5. Once the listener entries are removed, you can use netca (NETCA) to configure
the Listeners.
6. Start the Nodeapps on all the nodes

===================================================================================
==========================================


The cluster resources can be viewed using the cluster state (crsstat) utility, shown
as follows:
[oracle@oradb4 oracle]$ crsstat -t

Verify the health of the Oracle Clusterware daemon processes


with the following:
[oracle@oradb4 oracle]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[oracle@oradb4 oracle]$

===================================================================================
==========================================

RAC DRM (RAC Dynamic Remastering):


----------------------------------

Get data_object_id for scott.emp

SYS> col owner for a10


col data_object_id for 9999999
col object_name for a15
select owner, data_object_id, object_name
from dba_objects
where owner = 'SCOTT'
and object_name = 'EMP';

OWNER      DATA_OBJECT_ID OBJECT_NAME
---------- -------------- ---------------
SCOTT               73181 EMP

- Get file_id and block_id of the emp table

SQL>select empno, dbms_rowid.rowid_relative_fno(rowid),


dbms_rowid.rowid_block_number(rowid)
from scott.emp
where empno in (7788, 7369);

- Check that the current master of the block has changed to node2 (numbering starts
from 0)
- Previous master = 0 (Node1)
- REMASTER_CNT = 2, indicating the object has been remastered twice
SYS>select o.object_name, m.CURRENT_MASTER,
m.PREVIOUS_MASTER, m.REMASTER_CNT
from dba_objects o, v$gcspfmaster_info m
where o.data_object_id=74625
and m.data_object_id = 74625 ;

OBJECT CURRENT_MASTER PREVIOUS_MASTER REMASTER_CNT
------ -------------- --------------- ------------
EMP                 1               0            2

- Find the master and owner of the block.
- Note that the current owner of the block is Node2 (KJBLOWNER=1, the node from
which the query was issued).
- The current master of the block has been changed to Node2 (KJBLMASTER=1).
SYS> select kj.kjblname, kj.kjblname2, kj.kjblowner,
kj.kjblmaster
from (select kjblname, kjblname2, kjblowner,
kjblmaster, kjbllockp
from x$kjbl
where kjblname = '[0x97][0x4],[BL]'
) kj, x$le le
where le.le_kjbl = kj.kjbllockp
order by le.le_addr;

KJBLNAME               KJBLNAME2    KJBLOWNER KJBLMASTER
---------------------- ------------ --------- ----------
[0x97][0x4],[BL]       151,4,BL             1          1

===================================================================================
==========================================

Oracle RAC Node Eviction - How to Analyze:
=======================================================

WHAT IS NODE EVICTION


The Oracle Clusterware is designed to perform a node eviction by removing one or
more nodes from the cluster if some critical problem is detected. A critical
problem could be a node not responding via a network heartbeat, a node not
responding via a disk heartbeat, a hung or severely degraded machine, or a hung
ocssd.bin process. The purpose of this node eviction is to maintain the overall
health of the cluster by removing bad members.

PROCESS ROLES FOR REBOOTS

OCSSD (aka CSS daemon)


The primary responsibilities of this daemon are internode health monitoring and
RDBMS instance endpoint discovery.
The health monitoring includes a network heartbeat and a disk heartbeat (to the
voting files).
OCSSD can also evict a node after escalation of a member kill from a client (such
as a database LMON process).

CSSDAGENT
This process provides the following functionality (these services were
formerly, in 10g and 11.1, provided by oprocd):
- Monitoring for node hangs (via oprocd functionality)
- Monitoring the OCSSD process for hangs (via oclsomon functionality)
- Monitoring vendor clusterware (via vmon functionality)
This is a multi-threaded process that runs at an elevated priority and runs
as the root user.

CSSDMONITOR
- This process monitors for node hangs (via oprocd functionality)
- Monitors the OCSSD process for hangs (via oclsomon functionality)
- Monitors vendor clusterware (via vmon functionality)
This is a multi-threaded process that runs at an elevated priority and runs as
the root user.

Review these files to figure out what is going on:

Clusterware alert log in $GRID_HOME/log/<nodename>
The cssdagent log(s) in $GRID_HOME/log/<nodename>/agent/ohasd/oracssdagent_root
The cssdmonitor log(s) in $GRID_HOME/log/<nodename>/agent/ohasd/oracssdmonitor_root
The ocssd log(s) in $GRID_HOME/log/<nodename>/cssd
The lastgasp log(s) in /etc/oracle/lastgasp or /var/opt/oracle/lastgasp
IPD/OS or OS Watcher data
'opatch lsinventory -detail' output for the GRID home

Messages files:
Linux: /var/log/messages
Sun: /var/adm/messages
HP-UX: /var/adm/syslog/syslog.log
IBM: /bin/errpt -a > errpt.out

Common Causes of OCSSD Evictions

- Network failure or latency between nodes. It would take 30 consecutive missed
checkins (by default, determined by the CSS misscount) to cause a node eviction.
- Problems writing to or reading from the CSS voting disk. If the node cannot
perform a disk heartbeat to the majority of its voting files, then the node will
be evicted.
- A member kill escalation. For example, the database LMON process may request
CSS to remove an instance from the cluster via the instance eviction mechanism.
If this times out it can escalate to a node kill.
- An unexpected failure of the OCSSD process; this can be caused by any of the
above issues or something else.
- An Oracle bug.
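
If voting disk access is suspected, a quick first check (illustrative 10g-style
output; the path and format vary by version and configuration):

$ crsctl query css votedisk
 0.     0    /dev/raw/raw1
located 1 votedisk(s).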

Common Causes of CSSDAgent or CSSDMonitor Evictions

- An OS scheduler problem. For example, the OS is locked up in a driver or
hardware, or there is an excessive amount of load on the machine, preventing
the scheduler from behaving reasonably.
- A hung thread (or threads) within the CSS daemon.
- An Oracle bug.

Node Evictions in RAC environment :
==========================
Node evictions happen from time to time in RAC environments on any platform, and
troubleshooting and finding the root cause is important for DBAs so the problem
can be avoided in the future. On almost all platforms there are two RAC
processes that decide on and initiate node evictions.

1. OCSSD : This process is primarily responsible for internode health monitoring
and instance endpoint recovery. It runs as the oracle user. It also provides
basic cluster locking and group services, and can run with or without vendor
clusterware. Abnormal termination or killing of this process will cause the node
to be rebooted by the init.cssd script. If the init.cssd script itself is
killed, the ocssd process survives and the node keeps functioning; the script is
called from an /etc/inittab entry, so it is respawned and tries to start its own
ocssd process. Since one ocssd process is already running, this second attempt
to start ocssd fails and the second init.cssd script reboots the node.

2. OPROCD : This process checks for hangs and driver freezes on the machine. It
is not used on Linux prior to 10.2.0.4, as the same function is performed there
by the Linux hangcheck-timer kernel module. Starting from 10.2.0.4 it is started
as part of clusterware startup and runs as root. Killing this process will
reboot the node. If the machine hangs for a long time, this process must kill
the node to stop I/O to disk so that the remaining nodes can remaster the
resources. The executable sets a signal handler and sets its interval time in
milliseconds. It takes two parameters, described below.

a. Timeout -t : This is the length of time between executions. By default it is
1000 (milliseconds).
b. Margin -m : This is the acceptable difference between dispatches. By default
it is 500 (milliseconds).
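
The values actually in effect can usually be read from the process arguments
(the path and PID below are illustrative):

$ ps -ef | grep [o]procd
root   4253   1  0 Jan10 ?  00:00:00 /u01/crs/bin/oprocd run -t 1000 -m 500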

When we set diagwait to 13, the margin becomes 13 - 3 (reboottime seconds) = 10
seconds, so the value of -m becomes 10000 (milliseconds).
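
Setting diagwait follows the standard procedure (run as root, with the
clusterware stopped on every node first):

# crsctl stop crs              (repeat on all nodes)
# crsctl set css diagwait 13 -force
# crsctl get css diagwait      (verify)
# crsctl start crs             (repeat on all nodes)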

There are two kinds of heartbeat mechanisms responsible for node reboots and the
reconfiguration of the remaining clusterware nodes.

a. Network heartbeat : This indicates that the node can participate in cluster
activities like group membership changes. When it is missing for too long,
cluster membership changes as a result of a reboot. This "too long" value is
determined by the CSS misscount parameter, which is 30 seconds on most platforms
but can be changed depending on the network configuration of a particular
environment. If it does need to be changed, it is advisable to contact Oracle
Support and follow their recommendations.

b. Disk heartbeat : This means heartbeats to the voting disk file, which has the
latest information about node members. Connectivity to a majority of voting
files must be maintained for a node to stay alive. The voting disk file uses
kill blocks to notify nodes that they have been evicted, after which the
remaining nodes go through reconfiguration; per Oracle's algorithm, the
surviving node with the lowest node number generally becomes the master. By
default this value is 200 seconds, which is the CSS disktimeout parameter;
again, changing this parameter requires Oracle Support's recommendation. When a
node can no longer communicate through the private interconnect but other nodes
can still see its heartbeats in the voting file, it is evicted using the voting
disk kill block functionality.
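
The thresholds in effect on a given cluster can be checked directly:

$ crsctl get css misscount
$ crsctl get css disktimeout
$ crsctl get css reboottime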

Network split resolution : When the network fails and nodes are not able to
communicate with each other, one side has to fail to maintain data integrity.
The surviving nodes should form an optimal subcluster of the original cluster.
Each node writes its own vote to the voting file, and the reconfiguration
manager component reads these votes to calculate an optimal subcluster. Nodes
that are not to survive are evicted via communication through the network and
the disk.

Causes of reboot by clusterware processes


======================================
Now we will briefly discuss the causes of reboot by these processes and, at the
end, which files to review and upload to Oracle Support for further diagnosis.
Reboot by OCSSD
============================
1. Network failure : 30 consecutive missed checkins will reboot a node, where
heartbeats are issued once per second. Look for messages in the ocssd.log such
as "heartbeat fatal, eviction in xx seconds". Here there are two things to
check:
a. If the node eviction time in the messages log file is less than the missed
checkins, then the node eviction is likely not due to missed checkins.
b. If the node eviction time in the messages log file is greater than the
missed checkins, then the node eviction is likely due to missed checkins.
2. Problems writing to the voting disk file : some kind of hang in accessing
the voting disk.
3. High CPU utilization : When the CPU is highly utilized, the CSS daemon does
not get CPU in time to ping the voting disk; as a result it cannot write its own
vote to the voting disk file and the node is rebooted.
4. The disk subsystem is unresponsive due to storage issues.
5. Killing the ocssd process.
6. An Oracle bug.

Finding What Caused the Eviction


=========================
Very often with node evictions you will need to engage Oracle Support;
Clusterware is complex enough that it takes the support tools Oracle Support
has available to diagnose the problem. However, there are some initial things
you can do that might help solve basic problems, like node misconfigurations.
Some things you might want to do are:

Determine the time the node rebooted, using the uptime UNIX command for
example. This will help you determine where in the various logs to look for
additional information.
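
For example (a minimal sketch; the output of last reboot depends on the
platform):

$ uptime
$ last reboot | head -3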

Check the following logfiles to begin with:

/var/log/messages
GRID_HOME/log/<host>/cssd/ocssd.log
GRID_HOME/log/<host>/alert<host>.log

Perhaps the biggest causes for node evictions are missed network heartbeats,
problems accessing the voting disk, and a hung or severely loaded machine, as
described in the sections above.

Reboot by OPROCD
============================
When a problem is detected by oprocd, it will reboot the node for the following
reasons:
1. An OS scheduler algorithm problem.
2. High CPU utilization, due to which oprocd does not get CPU to perform its
hang checks at the OS level.
3. An Oracle bug.

Also, to share an experience from one client site: the LMS processes were
running at a low scheduling priority, so when CPU utilization was high LMS could
not get CPU in time to communicate with the clusterware processes. Instance
eviction was delayed, and oprocd was observed to reboot the node. This should
not have happened; the root cause was LMS running at the lower scheduling
priority.
Determining which process caused the reboot
==============================================

1. The reboot was likely caused by the ocssd process if there are messages like
these in the logfiles:
a. "Reboot due to cluster integrity" in the syslog or messages file.
b. Any error prior to the reboot in the ocssd.log file.
c. Missed checkins in the syslog file, with an eviction time prior to the node
reboot time.
2. The reboot was likely caused by the oprocd process if there are messages
like these in the logfiles:
a. A "Resetting" message in the messages logfile on Linux.
b. Any error in the oprocd log (in the /etc/oracle/oprocd directory) matching
the timestamp of the reboot or just prior to it.
3. If there are other messages, such as Ethernet issues or other errors in the
messages or syslog file, please check with the sysadmins. On AIX, errpt -a
output gives a lot of information about the cause of a reboot.
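
A few grep starting points for the files above (paths as discussed earlier;
adjust <nodename> for your environment):

$ grep -iE "reboot|restart|shutdown" /var/log/messages
$ grep -iE "clssnm|evict" $ORACLE_CRS_HOME/log/<nodename>/cssd/ocssd.log | tail -50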
Log files to collect after a node reboot
==============================================
Whenever a node reboot occurs in a clusterware environment, please review the
logfiles below to find the reason for the reboot; these files are also
necessary to upload to Oracle Support for node eviction diagnosis.
a. CRS log files (for releases 10.2.0 and above)
=============================================
1. $ORACLE_CRS_HOME/log/<nodename>/crsd/crsd.log
2. $ORACLE_CRS_HOME/log/<nodename>/cssd/ocssd.log
3. $ORACLE_CRS_HOME/log/<nodename>/evmd/evmd.log
4. $ORACLE_CRS_HOME/log/<nodename>/alert<nodename>.log
5. $ORACLE_CRS_HOME/log/<nodename>/client/cls*.log (not all files, only the
latest files matching the timestamp of the node reboot)
6. $ORACLE_CRS_HOME/log/<nodename>/racg/ (check for files and directories
matching the timestamp of the reboot; copy them only if found)
7. The latest oprocd log file from /etc/oracle/oprocd or
/var/opt/oracle/oprocd (Solaris)

Note: We can use $ORACLE_CRS_HOME/bin/diagcollection.pl to collect the above
files, but it does not collect the OPROCD logfiles, OS log files, or OS Watcher
logfiles, and it may take a lot of time to run and consume resources, so it is
better to copy them manually.
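
If you do run the collection script, invoke it as root; running it without
arguments prints the usage for your version, since the flags differ across
releases:

# cd $ORACLE_CRS_HOME/bin
# ./diagcollection.pl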
b. OS log files (these get overwritten, so copy them soon)
====================================================
1. /var/log/syslog
2. /var/adm/messages
3. errpt -a > error_<timestamp>.log (AIX only)

c. OS Watcher log files (these get overwritten, so copy them soon)
=======================================================
Check the crontab to see where OS Watcher is installed, go to that directory's
archive folder, and collect the files from each subdirectory that match the
timestamp of the node reboot.
1. OS_WATCHER_HOME/archive/oswtop
2. OS_WATCHER_HOME/archive/oswvmstat
3. OS_WATCHER_HOME/archive/oswmpstat
4. OS_WATCHER_HOME/archive/oswnetstat
5. OS_WATCHER_HOME/archive/oswiostat
6. OS_WATCHER_HOME/archive/oswps
7. OS_WATCHER_HOME/archive/oswprvtnet
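
A minimal sketch for bundling the matching OS Watcher files for upload
(OSW_HOME and the timestamp pattern are assumptions; adjust both to your
installation):

$ OSW_HOME=/opt/oswatcher      # assumption: your OS Watcher install directory
$ TS=13.06.18.1400             # assumption: archive-file timestamp of the reboot window
$ cd $OSW_HOME/archive
$ tar czf /tmp/osw_$(hostname).tar.gz osw*/*${TS}*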
