Contents
1. Overview
2. Remove the Instance
3. Remove the Node from the Cluster
Overview
With any RAC configuration, it is common for the DBA to encounter a scenario where
he or she needs to remove a node from the RAC environment. It may be that a server is
being underutilized in the cluster and could be better used in another business unit.
Another scenario is a node failure. In this case, a node can be removed from the cluster
while the remaining nodes continue to service ongoing requests.
If a node needs to be removed from an Oracle10g RAC database, even if the node will
no longer be available to the environment, there is a certain amount of cleanup that
needs to be done. The remaining nodes need to be informed of the change of status of
the departing node.
The three most important steps that need to be followed, and that will be discussed in this
article, are removing the database instance, removing the ASM instance (if the database
uses Automatic Storage Management), and removing the node from the cluster.
I will be removing node linux3, along with all metadata associated with it. Most of the
operations to remove the node from the cluster will need to be performed from a pre-
existing node that is available and will remain in the cluster. For this article, I will be
performing all of these actions from linux1 to remove linux3.
When removing a node from an Oracle10g RAC cluster, the DBA will first need to
remove the instance that is (or was) accessing the clustered database. This includes the
ASM instance if the database is making use of Automatic Storage Management. Most
of the actions to remove the instance need to be performed on a pre-existing node in the
cluster that is available and will remain available after the removal.
For this section, I will be removing the instance(s) on linux3 and performing all of
these operations from linux1:
This section provides two ways to perform the action of removing the instance(s): using
DBCA or the command-line utility (srvctl). When possible, always attempt to use the
DBCA method.
Using DBCA
The following steps can be used to remove an Oracle10g instance from a clustered
database using DBCA - even if the instance on the node is not available.
12. Next, run the DBCA from one of the nodes you are going
to keep. The database should remain up, and the departing instance
can be left up and running as well (if it is available).
$ dbca &
If the database is in archive log mode, the DBA may receive the following errors:
ORA-00350 or ORA-00312
This may occur because DBCA cannot drop the current log group while it still needs
to be archived. This issue is fixed in the 10.1.0.3 patchset. If the DBA encounters this
error, click the [Ignore] button; when DBCA completes, manually archive the logs for
the deleted instance and drop the log group. If for any reason the redo thread is not
disabled, disable the thread as well.
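A minimal sketch of those statements, assuming the departing instance used redo thread 3
and online log group 5 (both values are illustrative and should be confirmed in V$THREAD
and V$LOG first):

SQL> ALTER SYSTEM ARCHIVE LOG ALL;        -- archive the remaining logs
SQL> ALTER DATABASE DISABLE THREAD 3;     -- disable the deleted instance's redo thread
SQL> ALTER DATABASE DROP LOGFILE GROUP 5; -- drop its online redo log group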
21. Verify that the instance was removed from the Oracle
Configuration Repository (OCR) using the srvctl config
database -d <db_name> command. The following example
assumes the name of the clustered database is orcl:
$ srvctl config database -d orcl
linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1
24. If the node had an ASM instance and the node will no
longer be a part of the cluster, the DBA should remove the ASM
instance using the following, assuming the node being removed is
linux3:
$ srvctl stop asm -n linux3
$ srvctl remove asm -n linux3
Verify that the ASM instance was removed (a minimal check is sketched below).
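Assuming this release of srvctl supports config asm with the -n flag:

$ srvctl config asm -n linux3

No output should be returned for the removed node.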
Using SRVCTL
The following steps can be used to remove an Oracle10g instance from a clustered
database using the command-line utility srvctl - even if the instance on the node is not
available.
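A minimal sketch of the srvctl calls themselves, assuming the clustered database is orcl
and the departing instance is orcl3 (both names are illustrative):

$ srvctl stop instance -d orcl -i orcl3     # stop the departing instance if it is still running
$ srvctl remove instance -d orcl -i orcl3   # remove its definition from the OCR (confirm with y)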
15. Verify that the instance was removed from the Oracle
Configuration Repository (OCR) using the srvctl config
database -d <db_name> command:
$ srvctl config database -d orcl
linux1 orcl1 /u01/app/oracle/product/10.1.0/db_1
linux2 orcl2 /u01/app/oracle/product/10.1.0/db_1
18. If the node had an ASM instance and the node will no
longer be a part of the cluster, the DBA should remove the ASM
instance using the following, assuming the name of the clustered
database is named orcl and the node being removed is linux3:
$ srvctl stop asm -n linux3
$ srvctl remove asm -n linux3
Verify that the ASM instance was removed, using the same srvctl config asm check
sketched in the DBCA section above.
Now that the instance has been removed (and the ASM instance, if applicable), we
need to remove the node from the cluster. This is a manual method performed using
scripts that need to be run on the deleted node (if available) to remove the CRS install, as
well as scripts that should be run from one of the existing nodes (i.e. linux1).
Before proceeding to the steps for removing the node, we need to determine the node
name and the CRS-assigned node number for each node stored in the Oracle Cluster
Registry. This can be run from any of the existing nodes (linux1 for this example).
$ $ORA_CRS_HOME/bin/olsnodes -n
linux1 1
linux2 2
linux3 3
Now that we have the node name and node number, we can start the steps to remove the
node from the cluster. Here are the steps that should be executed from a pre-existing
(available) node in the cluster (i.e. linux1):
$ $ORA_CRS_HOME/bin/crs_stat
For example, verify that the node to be removed is not running any
database resources. Look for the record of type:
NAME=ora.<db_name>.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on <node>
Assuming the name of the clustered database is orcl, this is the record
that was returned from the crs_stat command on my system:
NAME=ora.orcl.db
TYPE=application
TARGET=ONLINE
STATE=ONLINE on linux1
I am safe here since the resource is running on linux1 and not linux3 -
the node I want to remove. Had the resource been running on the node being
removed, it could first have been relocated to one of the remaining nodes:
$ $ORA_CRS_HOME/bin/crs_relocate ora.<db_name>.db
13. The next step is to update the node list using the updateNodeList
option to the OUI as the oracle user. This procedure will remove the
node to be deleted from the list of node locations maintained by the OUI
by listing only those remaining nodes. The only file that I know of that
gets modified is
$ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is
the command I used for removing linux3 from the list. Notice that the
DISPLAY variable needs to be set even though the GUI does not run.
$ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

$ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
      ORACLE_HOME=/u01/app/oracle/product/10.1.0/db_1 \
      CLUSTER_NODES=linux1,linux2
Note that the command above will produce an error that can safely be ignored.
18. If the node to be removed is still available and running the CRS
stack, the DBA will need to stop the CRS stack and remove the ocr.loc
file. These tasks should be performed as the root user account and on
the node that is to be removed from the cluster. The nosharedvar option
assumes the ocr.loc file is not on a shared file system (which is the case
in my example). If the file does exist on a shared file system, then
specify sharedvar. From the node to be removed (i.e. linux3) and as
the root user, run the following:
$ su
Password: xxxx

# cd $ORA_CRS_HOME/install
# ./rootdelete.sh remote nosharedvar
Running Oracle10 root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/app/oracle/product/10.1.0/crs
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
Shutting down Oracle Cluster Ready Services (CRS):
/etc/init.d/init.crsd: line 188: 29017 Aborted
$ORA_CRS_HOME/bin/crsd -2

Shutting down CRS daemon.
Shutting down EVM daemon.
Shutting down CSS daemon.
Shutdown request successfully issued.
Checking to see if Oracle CRS stack is down...
Oracle CRS stack is not running.
Oracle CRS stack is down now.
Removing script for Oracle Cluster Ready services
Removing OCR location file '/etc/oracle/ocr.loc'
Cleaning up SCR settings in '/etc/oracle/scls_scr/linux3'
42. Next, using the node name and CRS-assigned node number for
the node to be deleted, run the rootdeletenode.sh command as
follows. Keep in mind that this command should be run from a pre-
existing / available node (i.e. linux1) in the cluster as the root UNIX
user account:
$ su
Password: xxxx

# cd $ORA_CRS_HOME/install
# ./rootdeletenode.sh linux3,3
Running Oracle10 root.sh script...

The following environment variables are set as:
    ORACLE_OWNER= oracle
    ORACLE_HOME= /u01/app/oracle/product/10.1.0/crs
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
clscfg: EXISTING configuration version 2 detected.
clscfg: version 2 is 10G Release 1.
Successfully deleted 13 values from OCR.
Key SYSTEM.css.interfaces.nodelinux3 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
'linux3,3' deleted successfully
To verify that the node was successfully removed, use the following as
either the oracle or root user:
$ $ORA_CRS_HOME/bin/olsnodes -n
linux1 1
linux2 2
60. Now, switch back to the oracle UNIX user account on the same
pre-existing node (linux1) and run the runInstaller command to
update the OUI node list, this time for the CRS installation
($ORA_CRS_HOME). This procedure will remove the node to be deleted
from the list of node locations maintained by the OUI by listing only
those remaining nodes. The only file that I know of that gets modified is
$ORACLE_BASE/oraInventory/ContentsXML/inventory.xml. Here is
the command I used for removing linux3 from the list. Notice that the
DISPLAY variable needs to be set even though the GUI does not run.
$ DISPLAY=<machine_or_ip_address>:0.0; export DISPLAY

$ $ORACLE_HOME/oui/bin/runInstaller -ignoreSysPrereqs -updateNodeList \
      ORACLE_HOME=/u01/app/oracle/product/10.1.0/crs \
      CLUSTER_NODES=linux1,linux2
Note that each of these commands will produce an error that can safely be ignored.
The OUI now contains the valid nodes that are part of the cluster!
65. Now that the node has been removed from the cluster, the DBA
should manually remove all Oracle10g RAC installation files from the
deleted node. Obviously, this applies only if the removed node is still
accessible and only if the files are not on a shared file system that is still
being accessed by other nodes in the cluster!
From the deleted node (linux3) I performed the following tasks as the
root UNIX user account:
3. Remove all init scripts and soft links (for Linux). For a
list of init scripts and soft links for other UNIX platforms, see
Metalink Note: 269320.1
# rm -f /etc/init.d/init.cssd
# rm -f /etc/init.d/init.crs
# rm -f /etc/init.d/init.crsd
# rm -f /etc/init.d/init.evmd
# rm -f /etc/rc2.d/K96init.crs
# rm -f /etc/rc2.d/S96init.crs
# rm -f /etc/rc3.d/K96init.crs
# rm -f /etc/rc3.d/S96init.crs
# rm -f /etc/rc5.d/K96init.crs
# rm -f /etc/rc5.d/S96init.crs
# rm -Rf /etc/oracle/scls_scr
It is not very easy to read this output if you have a large number of nodes with lots of
resources configured on them. You can use the "-t" option with crs_stat to see the
output in a tabular form:
crs_stat -t
However, this output is designed for a fixed terminal width of 60 characters, so the
resource names are truncated. This makes it even more difficult to see which resource is
in which state.
Thankfully, there are some scripts out there that parse the default output of crs_stat
and provide a tabular output in a wider form, so you can see what you are looking for.
I also have an alias, my_crs_stat, for this command so I don't have to type it all the time.
alias my_crs_stat='crs_stat | awk -F= '\''/NAME=/{n=$2}/TYPE=/
{t=$2}/TARGET=/{g=$2}/STATE=/{s=$2; printf("%-45s%-15s%-10s%-30s\n",
n,t,g,s)}'\'''
my_crs_stat
* crs_stat
* crs_register
* crs_unregister
* crs_start
* crs_stop
* crs_getperm
* crs_profile
* crs_relocate
* crs_setperm
* crsctl check crsd
* crsctl check cssd
* crsctl check evmd
* crsctl debug log
* crsctl set css votedisk
* crsctl start resources
* crsctl stop resources
Logged to:
$ORA_CRS_HOME/racg/dump
$ORA_CRS_HOME/log/<nodename>/racg

FAN event format:
<event_type> VERSION=<n.n>
service=<service_name.db_domain_name>
[database=<db_unique_name> [instance=<instance_name>]]
[host=<hostname>]
status=<event_status> reason=<event_reason> [card=<n>]
timestamp=<event_date> <event_time>
event_type       Description
SERVICE          Primary application service event
SRV_PRECONNECT   Preconnect application service event (TAF)
SERVICEMEMBER    Application service on a specific instance event
DATABASE         Database event
INSTANCE         Instance event
ASM              ASM instance event
NODE             Cluster node event
#FAN events can control the workload per instance for each service
[$ORA_CRS_HOME/opmn/conf/ons.config]
localport=6100
remoteport=6200
loglevel=3
useocr=on
onsctl start/stop/ping/reconfig/debug/detailed
[/etc/inittab]
...
h1:35:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:35:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:35:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
crs_stat
#Tested, as root
#Lists the status of an application profile and resources
#crs_stat [resource_name [...]] [-v] [-l] [-q] [-c cluster_node]
$ORA_CRS_HOME/bin/crs_stat -t
Name         Type         Target   State    Host
------------------------------------------------------------
ora.e2.gsd   application  ONLINE   ONLINE   e2
ora.e2.ons   application  ONLINE   ONLINE   e2
ora.e2.vip   application  ONLINE   ONLINE   e2

VIP - normal operation:
Name         Type         Target   State    Host
------------------------------------------------------------
ora.e2.vip   application  ONLINE   ONLINE   e2
ora.e3.vip   application  ONLINE   ONLINE   e3

VIP - node e3 is down (its VIP has failed over to e2):
Name         Type         Target   State    Host
------------------------------------------------------------
ora.e2.vip   application  ONLINE   ONLINE   e2
ora.e3.vip   application  ONLINE   ONLINE   e2
crs_stat -p ...
AUTO_START = 2   # with a value of 2, CRS will not start the resource after a system boot
crs_stat
NAME=ora.RAC.RAC1.inst
TYPE=application
TARGET=ONLINE
STATE=ONLINE on london1

NAME=ora.RAC.SERVICE1.RAC1.srv
TYPE=application
TARGET=OFFLINE
STATE=OFFLINE
Voting disk
On shared storage; used by CSS; contains the nodes that are currently
available within the cluster.
If the voting disks are lost and no backup is available, Oracle
Clusterware must be reinstalled.
Three-way multiplexing is ideal.
crsctl
#Tested, as oracle
$ORA_CRS_HOME/bin/crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy
#verify restore
cluvfy comp ocr -n all
ocrcheck
#OCR integrity check; validates the accessibility of the device and its
#block integrity
#logs to the current directory or to $ORA_CRS_HOME/log/<node>/client
ocrdump
#dumps the OCR content to a text file; if it succeeds, the integrity of the
#backups is verified
#OCRDUMP can also dump individual keys (for example, SYSTEM.css.interfaces
#identifies the interconnect being used):
$ORA_CRS_HOME/bin/ocrdump.bin -stdout -keyname SYSTEM.css.misscount -xml
Post installation
- Back up root.sh
- Set up other user accounts
- Verify Enterprise Manager / the Cluster Registry by running:
  srvctl config database -d db_name
SRVCTL
Stores its information in the OCR; manages:
Database, Instance, Service, Node applications, ASM, Listener
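A few illustrative srvctl calls (the database name orcl and node name london1 are
assumptions):

$ srvctl status database -d orcl           # status of every instance of the database
$ srvctl stop instance -d orcl -i orcl2    # stop a single instance
$ srvctl status nodeapps -n london1        # VIP, GSD, ONS and listener status on a node
$ srvctl config database -d orcl           # node/instance/home mapping stored in the OCR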
Services
Changes made with srvctl are recorded in the OCR only! Use DBMS_SERVICE to update
the data dictionary as well.
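A minimal sketch of keeping the dictionary in step with a service that was added through
srvctl; the service name reporting is an assumption:

-- run once as SYSDBA from any instance
BEGIN
  DBMS_SERVICE.CREATE_SERVICE(service_name => 'reporting',
                              network_name => 'reporting');
END;
/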
Views
GV$SERVICES
GV$ACTIVE_SERVICES
GV$SERVICEMETRIC
GV$SERVICEMETRIC_HISTORY
GV$SERVICE_WAIT_CLASS
GV$SERVICE_EVENT
GV$SERVICE_STATS
GV$SERV_MOD_ACT_STATS
VIP (virtual IP)
- Both application and RAC VIPs fail over if the related application fails, and
  accept new connections
- Sharing a RAC VIP among database instances is recommended, but sharing among
  different applications is not, because...
- ...the VIP fails over if the application fails over
- A failed-over application VIP accepts new connections
- Each VIP requires an unused and resolvable IP address
- VIP addresses should be registered in DNS
- VIP addresses should be on the same subnet as the public network
- VIPs are used to prevent connection-request timeouts during client
  connection attempts
Changing a VIP
1- Stop the VIP-dependent cluster components on one node
2- Make the changes in DNS
3- Change the VIP using SRVCTL (see the sketch after this list)
4- Restart the VIP-dependent components
5- Repeat the above on the remaining nodes
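A minimal sketch of steps 1, 3 and 4, assuming the node is london1 and the new VIP is
147.43.1.210/255.255.255.0 on eth0 (all values are illustrative):

$ srvctl stop nodeapps -n london1                                        # step 1 (as oracle)
# srvctl modify nodeapps -n london1 -A 147.43.1.210/255.255.255.0/eth0   # step 3 (as root)
$ srvctl start nodeapps -n london1                                       # step 4 (as oracle)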
oifcfg
Allocates and deallocates network interfaces; reads its values from the OCR
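A couple of illustrative calls (the interface names and subnets are assumptions):

$ oifcfg getif                                                  # list the interfaces stored in the OCR
eth0  147.43.1.0   global  public
eth1  192.168.1.0  global  cluster_interconnect

$ oifcfg setif -global eth1/192.168.1.0:cluster_interconnect    # register an interface/subnet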
Listener parameters
local_listener='(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =
192.168.0.13) (PORT = 1521)))'
#allow pmon to register with local listener when not using 1521 port
remote_listener = '(ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP)(HOST =
192.168.2.9) (PORT = 1521)) (ADDRESS = (PROTOCOL = TCP)(HOST
=192.168.2.10)(PORT = 1521)))'
#make the listener aware of the load of the listeners of other nodes
Shared contents
datafiles, control files, spfiles, redo logs

Shared or local?                                        RAW dev  NFS  OCFS  ASM
- Datafiles               : shared, mandatory
- Control files           : shared, mandatory
- Redo log                : shared, mandatory
- SPfile                  : shared, mandatory
- OCR and voting disk     : shared, mandatory              Y     Y     N
- Archived log            : shared, not mandatory          N     Y     N     Y
- Undo                    : local
- Flash Recovery          : shared                         Y     Y     Y
- Data Guard broker conf. : shared (primary & standby)     Y     Y
Install
- See official Note 239998.1 for removing a CRS installation
- See http://startoracle.com/2007/09/30/so-you-want-to-play-with-oracle-11gs-rac-heres-how/
  to install 11g RAC on VMware
- See http://www.oracle.com/technology/pub/articles/hunter_rac10gr2_iscsi.html
  to install on Linux with iSCSI disks
- See http://www.oracle-base.com/articles/10g/OracleDB10gR2RACInstallationOnCentos4UsingVMware.php
  to install on VMware
- See OCFS (Oracle Cluster Filesystem)
Prerequisites check
#check node connectivity and Clusterware integrity
./runcluvfy.sh stage -pre dbinst -n all
./runcluvfy.sh stage -post hwos -n "linuxes,linuxes1" -verbose
WARNING:
Package cvuqdisk not installed.
WARNING:
Unable to determine the sharedness of /dev/sdf on nodes:
linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes1,linuxes,linuxes,linuxes,linuxes,linuxes,linuxes
/etc/hosts example
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1      localhost
147.43.1.101   london1
147.43.1.102   london2
# The VIPs are usable only after the VIPCA utility has been run and
# should be created on the public interface. Remember that VIPCA is a GUI tool.
147.43.1.201   london1-vip
147.43.1.202   london2-vip
192.168.1.1    london1-priv
192.168.1.2    london2-priv
Kernel Parameters(/etc/sysctl.conf) Recommended Values
kernel.sem (semmsl) 250
kernel.sem (semmns) 32000
kernel.sem (semopm) 100
kernel.sem (semmni) 128
kernel.shmall 2097152
kernel.shmmax Half the size of physical memory
kernel.shmmni 4096
fs.file-max 65536
net.core.rmem_default 262144
net.core.rmem_max 262144
net.core.wmem_default 262144
net.core.wmem_max 262144
net.ipv4.ip_local_port_range 1024 to 65000
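A sketch of the corresponding /etc/sysctl.conf entries; the kernel.shmmax value assumes
a 4 GB server and must be set to half of the actual physical memory:

# /etc/sysctl.conf -- values from the table above
kernel.sem = 250 32000 100 128
kernel.shmall = 2097152
kernel.shmmax = 2147483648
kernel.shmmni = 4096
fs.file-max = 65536
net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144
net.ipv4.ip_local_port_range = 1024 65000

Apply the settings without a reboot using /sbin/sysctl -p.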
RAC restrictions
- dbms_alert: publisher and subscriber must be on the same instance; AQ is the
  workaround
- dbms_pipe: only works within the same instance; AQ is the workaround
- UTL_FILE, directories, external tables and BFILEs need to be on shared storage
Implementing the HA (High Availability) Framework
Use srvctl to start/stop applications
#Manually create an action script that Oracle Clusterware will use to start/stop/check the application
#As the oracle user, register the VIP with Oracle Clusterware:
$ORA_CRS_HOME/bin/crs_register hafdemovip
#As the root user, set the owner of the application VIP to root:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -o root
#As the root user, grant the oracle user permission to run the script:
$ORA_CRS_HOME/bin/crs_setperm hafdemovip -u user:oracle:r-x
$ORA_CRS_HOME/bin/crs_start hafdemo
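For completeness, the application VIP profile that the registration above expects can be
created with crs_profile; the interface, address and netmask below are assumptions:

# as root, create the application VIP profile before registering it
$ORA_CRS_HOME/bin/crs_profile -create hafdemovip -t application \
  -a $ORA_CRS_HOME/bin/usrvip \
  -o oi=eth0,ov=147.43.1.210,on=255.255.255.0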
CRS commands
crs_profile
crs_register
crs_unregister
crs_getperm
crs_setperm
crs_start
crs_stop
crs_stat
crs_relocate
Server side callouts
Oracle instance up(/down?)
Service member down(/up?)
Shadow application service up(/down?)
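Callout scripts live in $ORA_CRS_HOME/racg/usrco/ on each node and receive the FAN
event payload as command-line arguments; a minimal logging sketch (the file name and
log path are illustrative):

#!/bin/bash
# $ORA_CRS_HOME/racg/usrco/callout_log.sh -- must be executable by the oracle user
# Appends every FAN event payload this node receives to a log file.
LOGFILE=/tmp/fan_events.log
echo "$(date '+%Y-%m-%d %H:%M:%S') $*" >> $LOGFILE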
Adding a new node
- Configure the hardware and OS
- With NETCA, reconfigure the listeners and add the new one
- Run $ORA_CRS_HOME/oui/bin/addnode.sh from one of the existing nodes to define
  the new node to all existing nodes
- Run $ASM_HOME/oui/bin/addnode.sh from one of the existing nodes (if using ASM)
- Run $ORACLE_HOME/oui/bin/addnode.sh from one of the existing nodes
- Run racgons add_config from one of the existing nodes to add the new node's ONS
  metadata to the OCR (see the sketch after this list)
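A sketch of the last step, assuming the new node is london3 and its remote ONS port is
6200 (both values are illustrative):

# run from one of the existing nodes to record the new node's ONS daemon in the OCR
$ORA_CRS_HOME/bin/racgons add_config london3:6200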