Sun Cluster 3.2 Cheat Sheet
Important Files
/etc/cluster/ccr (directory)
/etc/cluster/ccr/infrastructure
Global Services
One node is the primary for a given global service. All other nodes communicate with the global services (devices, filesystems) via the cluster interconnect.
Global Devices
Global devices provide global access to devices irrespective of their physical location. Most commonly, SDS/SVM/VxVM devices are used as global devices. The LVM software itself is unaware of the global nature implemented on top of these devices.
/global/.devices/node@nodeID (nodeID is an integer representing the node in the cluster)
Global Filesystems
# mount -o global,logging /dev/vx/dsk/nfsdg/vol01 /global/nfs
or edit the /etc/vfstab file to contain the following:
/dev/vx/dsk/nfsdg/vol01 /dev/vx/rdsk/nfsdg/vol01 /global/nfs ufs 2 yes global,logging
The global filesystem is also known as (aka) the Cluster Filesystem (CFS) or PxFS (Proxy File System).
Note: local failover filesystems (i.e. directly attached to a storage device) cannot be used for scalable services; global filesystems must be used instead.
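A quick way to sanity-check such a vfstab entry is to pull out the mount-options field (field 7 of a Solaris vfstab line) and look for the global option. A minimal sketch, run against a copy of the example line above:

```shell
# Parse the vfstab example line above and confirm the options field
# (field 7: device, raw device, mount point, fstype, pass, mount-at-boot, options)
# includes the "global" option.
line='/dev/vx/dsk/nfsdg/vol01 /dev/vx/rdsk/nfsdg/vol01 /global/nfs ufs 2 yes global,logging'
opts=$(echo "$line" | awk '{print $7}')
case ",$opts," in
  *,global,*) result="global option present" ;;
  *)          result="global option missing" ;;
esac
echo "$result"
```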
Console Software
SUNWccon
There are three variants of the cluster console software, all living under /opt/SUNWcluster/bin/:
cconsole (accesses the node consoles through the TC or another remote console access method)
crlogin (uses rlogin as the underlying transport)
ctelnet (uses telnet as the underlying transport)
Cluster Control Panel: /opt/SUNWcluster/bin/ccp [ clustername ] &
All necessary info for cluster administration is stored in the following two files:
/etc/clusters, e.g.:
sc-cluster sc-node1 sc-node2
/etc/serialports, e.g.:
sc-node1 sc-tc 5002          # connect via TCP port on TC
sc-node2 sc-tc 5003
sc-10knode1 sc10k-ssp 23     # connect via E10K SSP
sc-10knode2 sc10k-ssp 23
sc-15knode1 sf15k-mainsc 23  # connect via 15K Main SC
e250node1 RSCIPnode1 23      # connect via LAN RSC on an E250
node1 sc-tp-ws 23            # connect via a tip launchpad
sf1_node1 sf1_mainsc 5001    # connect via passthru on midframe
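The console tools resolve a node name to a console host and TCP port through /etc/serialports. A small sketch of that lookup, using a two-line sample copied from the example above:

```shell
# Resolve a node's console access point the way cconsole does:
# /etc/serialports maps "nodename consolehost port", one node per line.
cat > /tmp/serialports.sample <<'EOF'
sc-node1 sc-tc 5002
sc-node2 sc-tc 5003
EOF
console=$(awk '$1 == "sc-node2" {print $2 ":" $3}' /tmp/serialports.sample)
echo "$console"    # the host:port the console software connects to for sc-node2
```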
Quorum Devices
Quorum devices are not required in clusters with more than two nodes, but are recommended for higher cluster availability.
Quorum devices are manually configured after the Sun Cluster software installation is done.
Quorum devices are configured using DID devices.
Quorum Math and Consequences
A running cluster is always aware of (Math):
Total possible Q votes (number of nodes + disk quorum votes)
Total present Q votes (number of booted nodes + available quorum device votes)
Total needed Q votes (>= 50% of possible votes)
Consequences:
A node that cannot find adequate Q votes freezes, waiting for other nodes to join the cluster.
A node that is booted in the cluster but can no longer find the needed number of votes panics the kernel.
The installmode flag allows cluster nodes to be rebooted during/after initial installation without causing the other (active) node(s) to panic.
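The ">= 50% of possible votes" rule works out in practice to a simple majority. A worked example for the classic two-node cluster with one quorum disk, assuming the standard rule that a quorum device carries one vote fewer than the number of nodes attached to it:

```shell
# Quorum vote math for a two-node cluster with one dual-hosted quorum disk.
nodes=2
qdisk_votes=$((nodes - 1))          # quorum device votes = attached nodes - 1 = 1
possible=$((nodes + qdisk_votes))   # total possible Q votes = 3
needed=$((possible / 2 + 1))        # majority needed to stay in the cluster = 2
echo "possible=$possible needed=$needed"
```

With 3 possible votes, a single surviving node plus the quorum disk holds 2 votes and keeps running, while the isolated node holds 1 vote and panics, which is exactly the split-brain protection the quorum disk exists for.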
Cluster status
Reporting the cluster membership and quorum vote information:
# /usr/cluster/bin/scstat -q
Verifying cluster configuration info:
# scconf -p
Run scsetup to correct any configuration mistakes and/or to:
add or remove quorum disks
add, remove, enable, disable cluster transport components
register/unregister VxVM device groups
add/remove node access from a VxVM device group
change cluster private host names
change the cluster name
Shutting down the cluster on all nodes:
# scshutdown -y -g 15
# scstat    # verifies cluster status
Cluster Daemons
lahirdx@aescib1:/home/../lahirdx > ps -ef | grep cluster | grep -v grep
root     4    0  0 May 07 ? 352:39 cluster
root   111    1  0 May 07 ?   0:00 /usr/cluster/lib/sc/qd_userd
root   120    1  0 May 07 ?   0:00 /usr/cluster/lib/sc/failfastd
root   123    1  0 May 07 ?   0:00 /usr/cluster/lib/sc/clexecd
root   124  123  0 May 07 ?   0:00 /usr/cluster/lib/sc/clexecd
root  1183    1  0 May 07 ?  46:45 /usr/cluster/lib/sc/rgmd
root  1154    1  0 May 07 ?   0:07 /usr/cluster/lib/sc/rpc.fed
root  1125    1  0 May 07 ?  23:49 /usr/cluster/lib/sc/sparcv9/rpc.pmfd
root  1153    1  0 May 07 ?   0:03 /usr/cluster/lib/sc/cl_eventd
root  1152    1  0 May 07 ?   0:04 /usr/cluster/lib/sc/cl_eventlogd
root  1336    1  0 May 07 ?   2:17 /var/cluster/spm/bin/scguieventd -d
root  1174    1  0 May 07 ?   0:03 /usr/cluster/bin/pnmd
root  1330    1  0 May 07 ?   0:01 /usr/cluster/lib/sc/scdpmd
root  1339    1  0 May 07 ?   0:00 /usr/cluster/lib/sc/cl_ccrad
FF Panic Rule
failfast will shut down the node (panic the kernel) if the specified daemon is not restarted within 30 seconds.
cluster System process created by the kernel to encapsulate the kernel threads that make up the core kernel range of operations. It directly panics the kernel if it is sent a KILL signal (SIGKILL); other signals have no effect.
clexecd Used by cluster kernel threads to execute userland commands (such as the run_reserve and dofsck commands). It is also used to run cluster commands remotely (e.g. scshutdown). A failfast driver panics the kernel if this daemon is killed and not restarted within 30 seconds.
cl_eventd This daemon registers and forwards cluster events (e.g. nodes entering and leaving the cluster). As of SC 3.1 10/03, user applications can register themselves to receive cluster events. The daemon is automatically respawned by rpc.pmfd if it is killed.
rgmd This is the resource group manager, which manages the state of all cluster-unaware applications. A failfast driver panics the kernel if this daemon is killed and not restarted within 30 seconds.
rpc.fed This is the "fork-and-exec" daemon, which handles requests from rgmd to spawn methods for specific data services. failfast will panic the node if this is killed and not restarted within 30 seconds.
scguieventd This daemon processes cluster events for the SunPlex or Sun Cluster Manager GUI, so that the display can be updated in real time. It is not automatically restarted if it stops. If you are having trouble with SunPlex or Sun Cluster Manager, you might have to restart this daemon or reboot the specific node.
rpc.pmfd This is the process monitoring facility. It is used as a general mechanism to initiate restarts and failure action scripts for some cluster framework daemons, and for most application daemons and application fault monitors. The FF panic rule holds good.
pnmd This is the public network management daemon; it manages network status information received from the local IPMP (in.mpathd) running on each node in the cluster. It is automatically restarted by rpc.pmfd if it dies.
scdpmd The multi-threaded DPM daemon runs on each node and is started by an rc script when the node boots. It monitors the availability of logical paths visible through the various multipath drivers (MPxIO, HDLM, PowerPath, etc.). It is automatically restarted by rpc.pmfd if it dies.
Validating basic cluster config
The sccheck (/usr/cluster/bin/sccheck) command validates the cluster configuration; /var/cluster/sccheck is the repository where it stores the generated reports.
Disk Path Monitoring
scdpm -p all:all    # prints all disk paths in the cluster and their status
scinstall -pv       # checks the cluster installation status: package revisions, patches applied, etc.
Cluster release file: /etc/cluster/release
Shutting down the cluster: scshutdown -y -g 30
Booting nodes in non-cluster mode: boot -x
Placing a node in maintenance mode: scconf -c -q node=<node>,maintstate
Reset the maintenance mode by rebooting the node or running scconf -c -q reset
By placing a cluster node in maintenance mode, we reduce the number of required quorum votes and ensure that cluster operation is not disrupted as a result.
SunPlex or Sun Cluster Manager is available via HTTPS on port 3000 (https://<nodename>:3000).
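The daemon inventory above can be spot-checked with a small loop. This is a generic sketch (on a machine that is not a cluster node, every daemon will simply report NOT running):

```shell
# Check whether each of the restartable cluster daemons described above
# has a live process; pgrep -x matches the process name exactly.
daemons="failfastd clexecd rgmd rpc.fed rpc.pmfd cl_eventd pnmd scdpmd"
status=$(for d in $daemons; do
    if pgrep -x "$d" >/dev/null 2>&1; then
        echo "$d running"
    else
        echo "$d NOT running"
    fi
done)
echo "$status"
```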
The active number may be left unchanged after a persistent device number change, either because the volume device was open or because the new number was in use as the active device number for another volume. vxdg fails if you try to use a range of numbers that is currently in use as a persistent (not a temporary) device number. You can force use of the number range with the -f option. With -f, some device renumberings may not take effect until a reboot or a re-import (just as with open volumes). Also, if you force volumes in two disk groups to use the same device number, one of the volumes is temporarily renumbered on the next reboot. Which volume device is renumbered should be considered random, except that device numberings in the rootdg disk group take precedence over all others. The -f option should be used only when swapping the device number ranges used by two or more disk groups: to swap the ranges for two disk groups, use -f when renumbering the first disk group to the range of the second; renumbering the second disk group to the first range does not require -f.
Sun Cluster does not work with Veritas DMP. DMP can be disabled before installing the software by putting in dummy symlinks, etc.
scvxinstall is a shell script that automates VxVM installation in a Sun Clustered environment. It automates the following:
tries to disable DMP (vxdmp)
installs the correct cluster package
automatically negotiates a vxio major number and properly edits /etc/name_to_major
automates the rootdg initialization process and encapsulates the boot disk
gives different device names to the /global/.devices/node@# volumes on each side
edits the vfstab properly for this same volume (the problem is that this particular line has a DID device on it, and VxVM doesn't understand DID devices)
installs a script to "reminor" the rootdg on the reboot
reboots the node so that VxVM operates properly
returns node1:/dev/rdsk/c1t1d0 /dev/did/rdsk/d7
Then use the following commands to update and verify the DID info:
scdidadm -R d7
scdidadm -l -o diskid d7   # returns a large string with the disk id
Replacing a failed disk in an A5200 array (similar concept with other FC disk arrays):
vxdisk list                # get the failed disk name
vxprint -g dgname          # determine the state of the volume(s) that might be affected
On the hosting node, replace the failed disk:
luxadm remove enclosure,position
luxadm insert enclosure,position
On either node of the cluster (that hosts the device group):
scdidadm -l c#t#d#
scdidadm -R d#
On the hosting node:
vxdctl enable
vxdiskadm                  # replace the failed disk in VxVM
vxprint -g dgname
vxtask list                # ensure that resyncing is completed
Remove any relocated submirrors/plexes (if hot-relocation had to move something out of the way):
vxunreloc repaired-diskname
Solaris Volume Manager (SDS) in a Sun Clustered Environment
The preferred method of using soft partitions is to use single slices to create mirrors and then create volumes (soft partitions) from them (similar in concept to the VxVM public region on an initialized disk).
Shared Disksets and Local Disksets
Only disks that are physically located in multi-ported storage can be members of shared disksets. Disks in the same diskset operate as a unit: they can be used together to build mirrored volumes, and primary ownership of the diskset transfers as a whole from node to node. Boot disks belong to the local disksets; this is a prerequisite for having shared disksets.
Replica management
Add local replicas manually. Put local state database replicas on slice 7 of the disks (as a convention) to maintain uniformity; shared disksets must have their replicas on slice 7. Spread local replicas evenly across disks and controllers. Support for shared disksets is provided by package SUNWmdm.
Modifying /kernel/drv/md.conf:
nmd == max number of volumes (default 128)
md_nsets == max is 32, default 4
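The md.conf tunables above would look like this in /kernel/drv/md.conf; the values are illustrative only, and a reconfiguration reboot is needed before new nmd/md_nsets values take effect. A small sketch that writes and reads back such a fragment:

```shell
# Example /kernel/drv/md.conf fragment (written to /tmp here so the sketch is
# harmless to run); nmd raises the max volumes per set, md_nsets the max disksets.
cat > /tmp/md.conf.example <<'EOF'
nmd=256;
md_nsets=8;
EOF
nmd=$(sed -n 's/^nmd=\([0-9]*\);/\1/p' /tmp/md.conf.example)
echo "nmd=$nmd"
```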
Creating shared disksets and mediators (placeholders in angle brackets stand in for your own set, host, and disk names):
scdidadm -l c1t3d0                      # returns d17 as the DID device
scdidadm -l d17
metaset -s <setname> -a -h <host1> <host2>                  # creates the diskset
metaset -s <setname> -a -m <host1> <host2>                  # creates the mediators
metaset -s <setname> -a /dev/did/rdsk/d9 /dev/did/rdsk/d17  # adds the disks
metaset                                 # returns values
metadb -s <setname>
medstat -s <setname>                    # reports mediator status
Remaining syntax vis-a-vis Sun Cluster is identical to that for VxVM.
IPMP and Sun Cluster
IPMP is cluster-unaware. To work around that, Sun Cluster uses the cluster-specific public network management daemon (pnmd) to integrate IPMP into the cluster. The pnmd daemon has two capabilities:
populate the CCR with public network adapter status
facilitate application failover
When pnmd detects that all members of a local IPMP group have failed, it consults a file called /var/cluster/run/pnm_callbacks. This file contains entries that would have been created by the activation of LogicalHostname and SharedAddress resources. It is the job of hafoip_ipmp_callback to decide whether to migrate resources to another node.
scstat -i   # view IPMP configuration
file systems (to fail over, a local file system must reside on global device groups with affinity switchovers enabled)
A Data Service Agent is specially written software that allows a data service in a cluster to operate properly. A Data Service Agent (or Agent) does the following for a standard application:
stops/starts the application
monitors faults
validates the configuration
provides a registration information file that allows Sun Cluster to store all the info about the methods
Sun Cluster 2.x runs fault monitoring components on the failover node and can initiate a takeover. In Sun Cluster 3.x this is not allowed: the monitor can only restart the application or fail it over from the primary (active) node.
Failover resource groups:
Logical host resource: SUNW.LogicalHostname
Data storage resource: SUNW.HAStoragePlus
NFS resource: SUNW.nfs
Shut down a resource group: scswitch -F -g <rg>
Turn on a resource group: scswitch -Z -g <rg>
Switch a failover group over to another node: scswitch -z -g <rg> -h <node>
Restart a resource group: scswitch -R -h <node> -g <rg>
Evacuate all resources and resource groups from a node: scswitch -S -h <node>
Disable a resource and its fault monitor: scswitch -n -j <res>
Enable a resource and its fault monitor: scswitch -e -j <res>
Clear the STOP_FAILED flag: scswitch -c -j <res> -h <node> -f STOP_FAILED
How to add a diskgroup and volume to the cluster configuration
1. Create the disk group and volume.
2. Register the disk group with the cluster:
root@aesnsra1:../ # scconf -a -D type=vxvm,name=patroldg2,nodelist=aesnsra2
root@aesnsra2:../ # scswitch -z -h aesnsra2 -D patroldg2
3. Create your file system.
4. Update /etc/vfstab to change the '-' boot options. Example:
/dev/vx/dsk/patroldg2/patroldg02 /dev/vx/rdsk/patroldg2/patroldg02 \
/patrol02 vxfs 3 no suid
5. Set up a resource group with a HAStoragePlus resource for the local filesystem:
root@aesnsra2:../ # scrgadm -a -g aescib1-hastp-rg -h aescib1
root@aesnsra2:../ # scrgadm -a -g aescib1-hastp-rg -j sapmntdg01-rs \
-t SUNW.HAStoragePlus -x FilesystemMountPoints=/sapmnt
6.
Bring the resource group online, which will mount the specified filesystem:
root@aesnsra2:../ # scswitch -Z -g hastp-aesnsra2-rg
7. Enable the resource:
root@aesnsra2:../ # scswitch -e -j osdumps-dev-rs
8. (Optional) Reboot and test.
-x FilesystemMountpoints=/global/nfs -x AffinityOn=True
Create a SUNW.nfs resource:
scrgadm -a -j nfs-res -g nfs-rg -t SUNW.nfs -y Resource_dependencies=nfs-stor
Print the various resource/resource group dependencies via scrgadm:
scrgadm -pvv | grep -i depend   # and then parse this output
Enable resources and resource monitors, manage the resource group, and switch it to the online state:
scswitch -Z -g nfs-rg
scstat -g
Show the current resource group configuration:
scrgadm -p[v[v]] [ -t resource_type_name ] [ -g resgrpname ] [ -j resname ]
Resizing a VxVM/VxFS volume/filesystem under Sun Cluster
# vxassist -g aesnfsp growby saptrans 5g
# scconf -c -D name=aesnfsp,sync
root@aesrva1:../ # vxprint -g aesnfsp -v saptrans
TY NAME     ASSOC KSTATE  LENGTH    PLOFFS STATE  TUTIL0 PUTIL0
v  saptrans fsgen ENABLED 188743680 -      ACTIVE -
root@aesrva1:../ # fsadm -F vxfs -b 188743680 /saptrans
UX:vxfs fsadm: INFO: /dev/vx/rdsk/aesnfsp/saptrans is currently 178257920 sectors - size will be increased
root@aesrva1:../ # scconf -c -D name=aesnfsp,sync
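The sector counts in the fsadm output above can be sanity-checked with shell arithmetic, since VxVM and VxFS report sizes in 512-byte sectors; the growby 5g takes the volume from 85 GiB to 90 GiB:

```shell
# Convert the before/after sector counts from the fsadm output to GiB
# (sector = 512 bytes, GiB = 1073741824 bytes) to confirm the 5g grow.
old=178257920
new=188743680
old_gib=$((old * 512 / 1073741824))
new_gib=$((new * 512 / 1073741824))
echo "before=${old_gib}GiB after=${new_gib}GiB grown=$((new_gib - old_gib))GiB"
```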