Testing Veritas Cluster

Actual commands are in black.

0. Check Veritas Licenses - for FileSystem, Volume Manager AND Cluster
    vxlicense -p

If any licenses are invalid or expired, get them FIXED before continuing! All licenses should say "No expiration". If ANY license has an actual expiration date, the test failed. Permanent licenses do NOT have an expiration date. Non-essential licenses may be moved; however, a senior admin should do this.
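The "No expiration" rule can be checked mechanically. The following is only a sketch: the function name check_license_report is made up for this example, and it assumes that every license entry in the report has a line containing the word "expiration". Verify that against real output on your systems before trusting it.

```shell
# check_license_report: hypothetical helper, not part of Veritas.
# Reads a license report on stdin and prints every expiration line that
# does not say "No expiration".
# ASSUMPTION: each license entry has a line containing "expiration".
# Returns 0 if all expiration lines say "No expiration", 1 if any line
# carries a real date, 2 if no expiration lines were found at all.
check_license_report() {
    report=$(cat)
    total=$(printf '%s\n' "$report" | grep -ci 'expiration' || true)
    bad=$(printf '%s\n' "$report" | grep -i 'expiration' | grep -vi 'no expiration' || true)
    if [ "$total" -eq 0 ]; then
        echo "no expiration lines found; check the report by hand" >&2
        return 2
    fi
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
    return 0
}

# Possible use (2>&1 in case the report is written to stderr):
#   vxlicense -p 2>&1 | check_license_report || echo "LICENSE TEST FAILED"
```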

1. Hand check SystemList & AutoStartList
On either machine:

    grep SystemList /etc/VRTSvcs/conf/config/main.cf

You should get:

    SystemList = { system1, system2 }

Then:

    grep AutoStartList /etc/VRTSvcs/conf/config/main.cf

You should get:

    AutoStartList = { system1, system2 }

Each list should contain both machines. If not, many of the next tests will fail.

If your lists do NOT contain both systems, you will probably need to modify them with the commands that follow.

    more /etc/VRTSvcs/conf/config/main.cf   (see if it is reasonable; it is likely that the systems aren't fully set up)
    haconf -makerw   (this lets you write the conf file)
    hagrp -modify oragrp SystemList system1 0 system2 1
    hagrp -modify oragrp AutoStartList system1 system2
    haconf -dump -makero   (this makes the conf file read-only again)
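The hand check of both lists can also be scripted. This is a minimal sketch, assuming a POSIX shell and a main.cf with a single service group (only the first matching line is examined). The function name list_has_systems is hypothetical.

```shell
# list_has_systems FILE ATTR SYSTEM...
# Hypothetical helper: succeeds only if the first line of FILE that
# mentions ATTR names every SYSTEM given.
# ASSUMPTION: one service group, so one SystemList / AutoStartList line.
list_has_systems() {
    file=$1
    attr=$2
    shift 2
    line=$(grep "$attr" "$file" | head -n 1)
    if [ -z "$line" ]; then
        echo "$attr not found in $file" >&2
        return 1
    fi
    # Turn braces, commas, equals signs and tabs into spaces so that each
    # system name becomes a separate word.
    words=" $(printf '%s' "$line" | tr '{},=\t' '     ') "
    for sys in "$@"; do
        case "$words" in
            *" $sys "*) ;;
            *) echo "$attr is missing $sys" >&2
               return 1 ;;
        esac
    done
    return 0
}

# Possible use:
#   cf=/etc/VRTSvcs/conf/config/main.cf
#   list_has_systems "$cf" SystemList system1 system2 &&
#   list_has_systems "$cf" AutoStartList system1 system2 &&
#   echo "lists OK"
```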

2. Verify Cluster is Running
First verify that Veritas is up and running:

    hastatus -summary

If this command could NOT be found, add the following to root's path in /.profile:

    vi /.profile

Add /opt/VRTSvcs/bin to your PATH variable.
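If the profile gets sourced more than once, a plain append keeps growing PATH. One way to avoid that is sketched here, assuming a POSIX shell; the function name add_to_path is made up for illustration.

```shell
# add_to_path DIR
# Hypothetical helper: appends DIR to PATH only if it is not already there.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;              # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

add_to_path /opt/VRTSvcs/bin
```

Running the last line a second time leaves PATH unchanged.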

If /.profile does not already exist, use this one:

    PATH=/usr/bin:/usr/sbin:/usr/ucb:/usr/local/bin:/opt/VRTSvcs/bin:/sbin:$PATH
    export PATH

Then source it:

    . /.profile

Re-verify that the command now runs if you changed /.profile:

    hastatus -summary

Here is the expected result (your SYSTEMs/GROUPs may vary). One system should be OFFLINE and one system should be ONLINE, e.g.:

    # hastatus -summary

    -- SYSTEM STATE
    -- System          State          Frozen
    A  e4500a          RUNNING        0
    A  e4500b          RUNNING        0

    -- GROUP STATE
    -- Group    System    Probed    AutoDisabled    State
    B  oragrp   e4500a    Y         N               ONLINE
    B  oragrp   e4500b    Y         N               OFFLINE

If your systems do not show the above status, try these debugging steps:

• If NO systems are up, run hastart on both systems and run hastatus -summary again.

• If only one system is shown, start the other system with hastart. Note: one system should ALWAYS be OFFLINE for the way we configure systems here. (If we ran Oracle Parallel Server, this could change -- but currently we run standard Oracle server.)

• If both systems are up but are OFFLINE, hastart did NOT correct the problem, and the Oracle filesystems are not running on either system, the cluster needs to be reset. (This happens under strange network situations with GE Access.) [You ran hastart and that wasn't enough to get the full cluster to work.]

Verify that the systems have the following EXACT status (though your machine names will vary for other customers):

    gedb002# hastatus -summary

    -- SYSTEM STATE
    -- System          State          Frozen
    A  gedb001         RUNNING        0
    A  gedb002         RUNNING        0

    -- GROUP STATE
    -- Group    System    Probed    AutoDisabled    State
    B  oragrp   gedb001   Y         N               OFFLINE
    B  oragrp   gedb002   Y         N               OFFLINE

    gedb002# hares -display | grep ONLINE
    nic-qfe3    State    gedb001    ONLINE
    nic-qfe3    State    gedb002    ONLINE

    gedb002# vxdg list
    NAME      STATE      ID
    rootdg    enabled    957265489.1025.gedb002

    gedb001# vxdg list
    NAME      STATE      ID
    rootdg    enabled    957266358.1025.gedb001

Recovery Commands:

    hastop -all
    hastart             (on one machine)
    (wait a few minutes)
    hastart             (on the other machine)
    hastatus -summary   (make sure one is OFFLINE && one is ONLINE)

If none of these steps resolved the situation, contact Lorraine or Luke (possibly Russ Button or Jen Redman if they made it to Veritas Cluster class) or a Veritas Consultant.

3. Verify Services Can Switch Between Systems
Once hastatus -summary works, note the GROUP name used. Usually it will be "oragrp", but the installer can use any name, so please determine its name.

First check if the group can switch back and forth. On the system that is running (system1), switch Veritas to the other system (system2):

    hagrp -switch groupname -to system2   [e.g.: hagrp -switch oragrp -to e4500b]

Watch the failover with hastatus -summary. Once it is failed over, switch it back:

    hagrp -switch groupname -to system1

4. Verify OTHER System Can Go Up & Down Smoothly For Maintenance
On the system that is OFFLINE (should be system 2 at this point), reboot the computer.

    ssh system2 /usr/sbin/shutdown -i6 -g0 -y

Make sure that the system comes up and is running after the reboot. That is, when the reboot is finished, the second system should say it is offline using hastatus.

    hastatus -summary

Once this is done, switch the group to system2 and repeat the reboot for the other system:

    hagrp -switch groupname -to system2
    ssh system1 /usr/sbin/shutdown -i6 -g0 -y

Verify that system1 is in the cluster once rebooted:

    hastatus -summary

5. Test Actual Failover For System 2 (and pray db is okay)
To do this, we will kill off the listener process, which should force a failover. This test SHOULD be okay for the db (that is why we choose LISTENER) but there is a very small chance things will go wrong ... hence the "pray" part :).

On the system that is online (should be system2), kill off the ORACLE LISTENER process:

    ps -ef | grep LISTENER

Output should be like:

    root    1415   600  0 20:43:58 pts/0    0:00 grep LISTENER
    oracle   831     1  0 20:27:06 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit

    kill -9 process-id   (the first number on the tnslsnr line; in this case 831)

Failover will take a few minutes. You will note that system 2 is faulted -- and system 1 is now online.

You need to CLEAR the fault before trying to fail back over. Find the resource that is failed (in this case, LISTENER):

    hares -display | grep FAULT

Clear the fault:

    hares -clear resource-name -sys faulted-system   [e.g.: hares -clear LISTENER -sys e4500b]
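The "one ONLINE, one OFFLINE" check that follows every switch, reboot and failover can be sketched as a small filter over hastatus -summary output. This assumes the group rows have the layout shown in this document (B, group, system, Probed, AutoDisabled, State) and an awk that accepts -v (nawk on older Solaris). The function name check_group_state is hypothetical.

```shell
# check_group_state GROUP
# Hypothetical helper: reads `hastatus -summary` output on stdin and
# succeeds only if GROUP is ONLINE on exactly one system and OFFLINE on
# exactly one other. Any other state (for example a faulted group) is
# counted as "other" and makes the check fail.
# ASSUMPTION: group rows look like
#   B  <group>  <system>  <probed>  <autodisabled>  <state>
check_group_state() {
    awk -v grp="$1" '
        $1 == "B" && $2 == grp {
            if ($NF == "ONLINE")       online++
            else if ($NF == "OFFLINE") offline++
            else                       other++
        }
        END {
            printf "online=%d offline=%d other=%d\n", online, offline, other
            exit (online == 1 && offline == 1 && other == 0) ? 0 : 1
        }'
}

# Possible use:
#   hastatus -summary | check_group_state oragrp || echo "cluster state NOT as expected"
```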

6. Test Actual Failover For System 1 (and pray db is okay)
Now we do the same thing for the other system. First verify that the other system is NOT faulted:

    hastatus -summary

Now do the same thing on this system. To do this, we will kill off the listener process, which should force a failover.

On the system that is online (should be system1), kill off the ORACLE LISTENER process:

    ps -ef | grep LISTENER

Output should be like:

    oracle   987     1  0 20:49:19 ?        0:00 /apps/oracle/product/8.1.5/bin/tnslsnr LISTENER -inherit
    root    1330   631  0 20:58:29 pts/0    0:00 grep LISTENER

    kill -9 process-id   (the first number on the tnslsnr line; in this case 987)

Failover will take a few minutes. You will note that system 1 is faulted -- and system 2 is now online.

You need to CLEAR the fault before trying to fail back over. Find the resource that is failed (in this case, LISTENER):

    hares -display | grep FAULT

Clear the fault:

    hares -clear resource-name -sys faulted-system   [e.g.: hares -clear LISTENER -sys e4500a]

Run hastatus -summary to make sure everything is okay.
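Picking the process-id by eye risks grabbing the number from the grep line. A small filter can pull out the right one. This is a sketch, assuming ps -ef output in the format shown in steps 5 and 6 (PID in the second column) and an awk that accepts -v; the function name listener_pid is made up for this example.

```shell
# listener_pid NAME
# Hypothetical helper: reads `ps -ef` output on stdin and prints the PID
# (second column) of the tnslsnr process whose listener name is NAME.
# Lines belonging to a grep command are skipped.
listener_pid() {
    awk -v name="$1" '
        /tnslsnr/ && !/grep/ {
            for (i = 1; i <= NF; i++)
                if ($i == name) { print $2; exit }
        }'
}

# Possible use:
#   pid=$(ps -ef | listener_pid LISTENER)
#   [ -n "$pid" ] && kill -9 "$pid"
```

Fed the sample output from step 5, this prints 831; with the sample from step 6 it prints 987.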
