Sun Cluster 3.2 Cheat Sheet

Published by jigspaps


Published by: jigspaps on Oct 01, 2009
Copyright: Attribution Non-commercial







Cluster Configuration Repository (CCR)
Important Files
Global Services
One node acts as the primary for each specific global service. All other nodes communicate with the global services (devices, filesystems) via the cluster interconnect.
Global Naming (DID Devices)
/dev/did/dsk and /dev/did/rdsk
DID is used only for global naming, not for access
DID device names cannot be/are not used in VxVM
DID device names are used in Sun/Solaris Volume Manager
Global Devices
Global devices provide global access to devices irrespective of their physical location. Most commonly, SDS/SVM/VxVM devices are used as global devices. The LVM software is unaware of the global nature of these devices.
is an integer representing the node in the cluster
Global Filesystems
 mount -o global,logging /dev/vx/dsk/nfsdg/vol01 /global/nfs
or edit the
file to contain the following:
/dev/vx/dsk/nfsdg/vol01 /dev/vx/rdsk/nfsdg/vol01 /global/nfs ufs 2 yes global,logging
A global filesystem is also known as a Cluster Filesystem (CFS) or PxFS (Proxy File System).
Local failover filesystems (i.e. directly attached to a storage device) cannot be used for scalable services; one would have to use global filesystems for those.
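As a rough sketch, the global mounts in a vfstab-style file can be listed by checking the mount-options field for the global option. The function name global_mounts is hypothetical (it is not a Sun Cluster command), and the sketch assumes the standard seven-field layout used by the entry above: device to mount, device to fsck, mount point, FS type, fsck pass, mount at boot, mount options.

```shell
# Hypothetical helper, not part of Sun Cluster.
# Reads vfstab-style lines on stdin and prints the mount point of
# every entry whose mount options include "global".
global_mounts() {
    awk '
        /^[ \t]*#/ { next }                 # skip comment lines
        NF >= 7 {
            n = split($7, opts, ",")        # field 7 holds the mount options
            for (i = 1; i <= n; i++) {
                if (opts[i] == "global") {
                    print $3                # field 3 is the mount point
                    break
                }
            }
        }'
}
```

It reads from standard input, so it can be pointed at whichever file holds the entries, for example by redirecting that file into global_mounts.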
Console Software
There are three variants of the cluster console software:
(access the node consoles through the TC or other remote console access method)
as underlying transport)
as underlying transport)
/opt/SUNWcluster/bin/ &
Cluster Control Panel
/opt/SUNWcluster/bin/ccp [ clustername ] &
All necessary info for cluster admin is stored in the following two files:
sc-cluster sc-node1 sc-node2
/etc/serialports
sc-node1 sc-tc 5002 # Connect via TCP port on TC
sc-node2 sc-tc 5003
sc-10knode1 sc10k-ssp 23 # Connect via E10K SSP
sc-10knode2 sc10k-ssp 23
sc-15knode1 sf15k-mainsc 23 # Connect via 15K Main SC
e250node1 RSCIPnode1 23 # Connect via LAN RSC on an E250
node1 sc-tp-ws 23 # Connect via a tip launchpad
sf1_node1 sf1_mainsc 5001 # Connect via passthru on midframe
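To illustrate how the /etc/serialports entries are laid out (node name, console access host, port, optional comment), here is a minimal sketch that looks up the console access host and port for one node. The function name console_for is hypothetical, and the field layout is inferred from the sample entries above rather than from a format specification.

```shell
# Hypothetical helper, not part of the cluster console software.
# Reads serialports-style lines on stdin and prints "<host> <port>"
# for the node given as the first argument. Returns non-zero if the
# node has no entry.
console_for() {
    awk -v node="$1" '
        { sub(/#.*/, "") }                  # drop trailing comments
        $1 == node { print $2, $3; found = 1; exit }
        END { exit found ? 0 : 1 }'
}
```

Feeding it the sample entries and asking for sc-node2 would print the access host sc-tc and port 5003.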
Sun Cluster Set up
Don't mix PCI and SBus SCSI devices
Quorum Device Rules
A quorum device must be available to both nodes in a 2-node cluster
Quorum device info is maintained globally in the CCR db
A quorum device can contain user data
Max and optimal number of votes contributed by quorum devices must be N-1 (where N == number of nodes in the cluster)
If # of quorum devices >= # of nodes, the cluster cannot come up easily if there are too many failed/errored quorum devices
Quorum devices are not required in clusters with more than 2 nodes, but are recommended for higher cluster availability
Quorum devices are manually configured after Sun Cluster s/w installation is done
Quorum devices are configured using DID devices
Quorum Math and Consequences
A running cluster is always aware of (Math):
Total possible Q votes (number of nodes + disk quorum votes)
Total present Q votes (number of booted nodes + available quorum device votes)
Total needed Q votes (> 50% of possible votes)
Consequences:
A node that cannot find adequate Q votes will freeze, waiting for other nodes to join the cluster
A node that is booted in the cluster but can no longer find the needed number of votes kernel panics
Flag - allows cluster nodes to be rebooted after/during initial installation without causing the other (active) node(s) to panic.
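The quorum arithmetic can be sketched in a few lines of shell. This is only an illustration: the function names needed_votes and has_quorum are hypothetical, and it assumes that "needed" means a strict majority (more than half) of the total possible votes, with one vote per node and per quorum device.

```shell
# Hypothetical helpers, not part of Sun Cluster.

# Votes needed for quorum: a strict majority of all possible votes.
needed_votes() {
    possible=$1
    echo $(( possible / 2 + 1 ))
}

# Succeeds (exit status 0) if the present votes reach the needed majority.
has_quorum() {
    possible=$1
    present=$2
    [ "$present" -ge "$(needed_votes "$possible")" ]
}

# Example: 2 nodes + 1 quorum device = 3 possible votes.
needed_votes 3    # prints 2
```

Under these assumptions, a 2-node cluster with one quorum device keeps running on one node plus the quorum device (2 of 3 votes), while a lone node without the quorum device (1 of 3 votes) does not have quorum.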
Cluster status
Reporting the cluster membership and quorum vote information
/usr/cluster/bin/scstat -q
Verifying cluster configuration info
scconf -p
to correct any configuration mistakes and/or to:
add or remove quorum disks
add, remove, enable, disable cluster transport components
register/unregister VxVM device groups
add/remove node access from a VxVM device group
change cluster private host names
change cluster name
Shutting down cluster on all nodes
scshutdown -y -g 15
#verifies cluster status
Cluster Daemons
lahirdx@aescib1:/home/../lahirdx >
 ps -ef|grep cluster|grep -v grep
root 4 0 0 May 07 ? 352:39 cluster
root 111 1 0 May 07 ? 0:00 /usr/cluster/lib/sc/qd_userd
root 120 1 0 May 07 ? 0:00 /usr/cluster/lib/sc/failfastd
root 123 1 0 May 07 ? 0:00 /usr/cluster/lib/sc/clexecd
root 124 123 0 May 07 ? 0:00 /usr/cluster/lib/sc/clexecd
root 1183 1 0 May 07 ? 46:45 /usr/cluster/lib/sc/rgmd
root 1154 1 0 May 07 ? 0:07 /usr/cluster/lib/sc/rpc.fed
root 1125 1 0 May 07 ? 23:49 /usr/cluster/lib/sc/sparcv9/rpc.pmfd
root 1153 1 0 May 07 ? 0:03 /usr/cluster/lib/sc/cl_eventd
root 1152 1 0 May 07 ? 0:04 /usr/cluster/lib/sc/cl_eventlogd
root 1336 1 0 May 07 ? 2:17 /var/cluster/spm/bin/scguieventd -d
root 1174 1 0 May 07 ? 0:03 /usr/cluster/bin/pnmd
root 1330 1 0 May 07 ? 0:01 /usr/cluster/lib/sc/scdpmd
root 1339 1 0 May 07 ? 0:00 /usr/cluster/lib/sc/cl_ccrad
FF panic rule: failfast will shut down the node (panic the kernel) if the specified daemon is not restarted within 30 seconds.
 - System proc created by the kernel to encap kernel threads that make up the core kernel range of operations. It directly panics the kernel if it's sent a
signal (
). Other signals have no effect.
 - This is used by cluster kernel threads to execute userland cmds (such as
cmds). It is also used to run cluster cmds remotely (e.g.
). A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
 - This daemon registers and forwards cluster events (e.g. nodes entering and leaving the cluster). With a min of SC 3.1 10/03, user apps can register themselves to receive cluster events. The daemon is automatically respawned by
if it is killed.
 - This is the resource group mgr, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted in 30 seconds.
 - This is the "fork-and-exec" daemon, which handles reqs from
to spawn methods for specific data services. Failfast will panic the node if this daemon is killed and not restarted in 30 seconds.
 - This daemon processes cluster events for the SunPlex or Sun Cluster Mgr GUI, so that the display can be updated in real time. It is not automatically restarted if it stops. If you are having trouble with SunPlex or Sun Cluster Mgr, you might have to restart the daemon or reboot the specific node.
 - This is the process monitoring facility. It is used as a general mech to initiate restarts and failure action scripts for some cluster f/w daemons, and for most app daemons and app fault monitors. FF panic rule holds good.
 - This is the public network mgt daemon, and manages n/w status info received from the local IPMP (
) running on each node in the cluster. It is automatically restarted by
if it dies.
 - Multi-threaded DPM daemon that runs on each node. The DPM daemon is started by an rc script when a node boots. It monitors the availability of the logical paths that are visible thru various multipath drivers (MPxIO, HDLM, PowerPath, etc.). Automatically restarted by
if it dies.
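A quick way to spot a missing cluster daemon is to compare the process listing against a list of expected names. The sketch below is an assumption-laden illustration: missing_daemons is a hypothetical helper, it matches on the last path component of any field in ps -ef style output, and the daemon names passed to it are taken from the listing above.

```shell
# Hypothetical helper, not part of Sun Cluster.
# Reads `ps -ef` style output on stdin and prints each daemon name
# given as an argument that does not appear in that output.
missing_daemons() {
    ps_output=$(cat)
    for d in "$@"; do
        # A daemon counts as running if any field's basename equals its name.
        printf '%s\n' "$ps_output" | awk -v name="$d" '
            {
                for (i = 1; i <= NF; i++) {
                    n = split($i, parts, "/")
                    if (parts[n] == name) found = 1
                }
            }
            END { exit found ? 0 : 1 }' || echo "$d"
    done
}
```

For example, piping the output of ps -ef into missing_daemons rgmd rpc.fed rpc.pmfd pnmd would print only the names that are not currently running, and nothing if all four are present.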
Validating basic cluster config
) cmd validates the cluster configuration:
is the repository where it stores the reports generated.
Disk Path Monitoring
scdpm -p all:all
prints all disk paths in the cluster and their status
scinstall -pv
checks the cluster installation status: package revisions, patches applied, etc.
Cluster release file:
Shutting down cluster
scshutdown -y -g 30
Booting nodes in non-cluster mode
 boot -x
Placing node in maintenance mode
scconf -c -q node=,maintstate
Reset the maintenance mode by rebooting the node or running
scconf -c -q reset
By placing a cluster node in maintenance mode, we reduce the number of reqd. quorum votes and ensure that cluster operation is not disrupted as a result.
SunPlex or Sun Cluster Manager is available on
VxVM Rootdg requirements for Sun Cluster
major number has to be identical on all nodes of the cluster (check for
entry in
installed on all nodes physically connected to shared storage. On non-storage nodes, VxVM can be used to encapsulate and mirror the boot disk. If not using VxVM on a non-storage node, use SVM. All that is required in such a case is that the
major number be identical to all other nodes of the cluster (add an entry in
file).
A VxVM license is reqd. on all nodes not connected to an A5x00 StorEdge array.
A std rootdg is created on all nodes where VxVM is installed. Options to initialize rootdg on each node are:
Encap the boot disk so it can be mirrored. This preserves all data, creating volumes inside rootdg to encap
If the disk has more than 5 slices on it, it cannot be encap'ed.
Initialize other local disks into rootdg.
Unique volume name and minor number across the nodes for the
file system if the boot disk is encap'ed. The
file system must be on devices with a unique name on each node, because it's mounted on each node for the same reason. The normal Solaris OS
logic predates global fs and still demands that each device have a unique major/minor number. VxVM doesn't support changing minor numbers of individual volumes. The entire disk group has to be re-minored.
Use the following command:
vxdg [ -g diskgroup ] [ -f ] reminor [diskgroup ] new-base-minor
From the
man pages:
reminor Changes the base minor number for a disk group, and renumbers all devices in the disk group to a range starting at that number. If the device for a volume is open, then the old device number remains in effect until the system is rebooted or until the disk group is deported and re-imported.
Also, if you close an open volume, then the user can execute vxdg reminor again to cause the renumbering to take effect without rebooting or reimporting.
A new device number may also overlap with a temporary renumbering for a volume device. This also requires a reboot or reimport for the new device numbering to take effect. A temporary renumbering can happen in the following situations:
when two volumes (for example, volumes in two different disk groups) share the same permanently assigned device number, in which case one of the volumes is renumbered temporarily to use an alternate device number;
or when the persistent device number for a volume was changed, but the active device number could not be changed to match.
