Veritas Cluster 2 0
Overview
Course lessons: Introduction; Installing VCS; Managing Cluster Services; Using Cluster Manager; Resources and Agents; NFS Resources; Cluster Communication; Installing Applications; Troubleshooting
Copyright 2001 VERITAS Software
VCS_2.0_Solaris_R1.0_20011130
Objectives
After completing this lesson, you will be able to:
  Define VCS terminology.
  Describe cluster communication basics.
  Describe VERITAS Cluster Server architecture.
Clusters
  Several networked systems
  Shared storage
  Single administrative entity
  Peer monitoring
[Figure: cluster systems connected to a local area network and to Fibre switches]
Systems
  Members of a cluster
  Referred to as nodes
  Contain copies of:
    Communication protocol configuration files
    VCS configuration files
    VCS libraries and directories
    VCS scripts and daemons
Service Groups
  A service group is a related collection of resources.
  Resources in a service group must be available to the system.
  Resources and service groups have interdependencies.
[Figure: NFS Service Group containing IP, Share, and Mount resources]
Parallel
  Can be partially or fully online on multiple servers simultaneously
  Examples:
    Oracle Parallel Server
    Web, FTP servers
Resources
  VCS objects that correspond to hardware or software components
  Monitored and controlled by VCS
  Classified by type
  Identified by unique names and attributes
  Can depend on other resources within the same service group
Resource Types
  General description of the attributes of a resource
  Example Mount resource type attributes:
    MountPoint
    BlockDevice
Agents
  Processes that control resources
  One agent per resource type
  Agent controls all resources of that type.
  Agents can be added into VCS agent framework.
[Figure: agents and the resources they control: Mount (/data), Disk (c1t0d0s0, c1t0d1s0), NIC (hme0, qfe1), IP (10.1.2.4)]
Dependencies
  Resources can depend on other resources.
  Parent resources depend on child resources.
  Service groups can depend on other service groups.
  Resource types can depend on other resource types.
  Rules govern service group and resource dependencies.
  No cyclic dependencies are allowed.
[Figure: Mount (parent) depends on Disk (child)]
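The parent/child rule can be illustrated with a small Python model. This is only a sketch, not VCS code: the function names and the dictionary layout are assumptions made for illustration. It assumes children must come online before their parents, and it shows why a cyclic dependency has no valid order.

```python
from graphlib import TopologicalSorter, CycleError

def online_order(requires):
    """Return resources in an order where every child precedes its parent.

    `requires` maps each parent resource to the set of children it
    depends on (hypothetical structure, not a VCS data format).
    """
    # TopologicalSorter emits a node only after all of its predecessors,
    # which here are the child resources.
    return list(TopologicalSorter(requires).static_order())

def offline_order(requires):
    """Offline is assumed to be the reverse: parents first, then children."""
    return list(reversed(online_order(requires)))

deps = {"Mount": {"Disk"}, "Disk": set()}
print(online_order(deps))   # ['Disk', 'Mount']
print(offline_order(deps))  # ['Mount', 'Disk']

try:
    online_order({"Mount": {"Disk"}, "Disk": {"Mount"}})
except CycleError:
    print("cyclic dependency rejected")
```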
Private Network
  Minimum two communication channels with separate infrastructure:
    Multiple NICs (not just ports)
    Separate hubs, if used
  Heartbeat communication determines which systems are members of the cluster.
  Cluster configuration broadcast updates cluster systems with status of each resource and service group.
LLT
[Figure: LLT in the kernel on SystemA and SystemB, above the hardware, connected by the private network]
GAB
[Figure: GAB above LLT in the kernel on SystemA and SystemB, connected by the private network]
hashadow
[Figure: hashadow on SystemA and SystemB, above LLT and the hardware]
VCS Architecture
[Figure: SystemA and SystemB share the cluster configuration in memory. On each system, agents (Mount, Disk, NIC, IP) manage resources (/v, c1d0t0s0, hme0, 10.1.2.4); had and hashadow run above GAB and LLT in the kernel, above the hardware.]
Summary
You should now be able to:
  Define VCS terminology.
  Describe cluster communication basics.
  Describe VERITAS Cluster Server architecture.
Objectives
After completing this lesson, you will be able to:
  Describe VCS software, hardware, and licensing prerequisites.
  Describe the general VCS hardware requirements.
  Configure SCSI controllers for a shared disk storage environment.
  Add VCS executable and manual page paths to the environment variables.
  Install VCS using the installation script.
Hardware:
  Check latest VCS release notes.
  Contact VERITAS Support.
Licenses:
  Keys are required on a per-system or per-site basis.
  Contact VERITAS Sales for a new license, or VERITAS Support for upgrades.
[Figure: SYSTEM A and SYSTEM B, each with an OS disk, a SCSI1 controller, and NICs, connected to the public network]
scsi-initiator-id
[Figure: SCSI IDs for the OS disk and the SCSI1 and SCSI2 controllers on SYSTEM A and SYSTEM B; the labels show IDs 0, 5, and 7]
MANPATH
setenv MANPATH ${MANPATH}:/opt/VRTS/man
Installation Settings
Information required by installvcs:
  Cluster name
  Cluster number
  System names
  License key
  Network ports for private network
  Web Console configuration:
    Virtual IP address
    Subnet mask
    Network interface
Summary
You should now be able to:
  Describe VCS software, hardware, and licensing prerequisites.
  Describe the general VCS hardware requirements.
  Configure SCSI controllers for a shared disk storage environment.
  Add VCS executable and manual page paths to the environment variables.
  Install VCS using the installation script.
[Figure: lab systems train1 and train2]
# ./installvcs
Objectives
After completing this lesson, you will be able to:
  Describe the cluster configuration mechanisms.
  Start the VCS engine on cluster systems.
  Stop the VCS engine.
  Modify the cluster configuration.
  Describe cluster transition states.
Cluster Configuration
[Figure: SystemA and SystemB share the cluster configuration in memory; on each system, had (with main.cf) and hashadow run above GAB and LLT]
Starting VCS
[Figures: numbered startup sequence on System1, System2, and System3 over the private network. hastart starts had and hashadow; a system with a main.cf builds the cluster configuration; one system is shown with no valid configuration.]
Stopping VCS
[Figure: numbered sequence for stopping had on System1 and System2, which run service groups SGA and SGB]
Options:
  -local [-force | -evacuate]
  -sys sys_name [-force | -evacuate]
  -all [-force]
Example:
  hastop -sys train4 -evacuate
Options:
  -group service_group
  -sum[mary]
Example:
  hastatus -group OracleSG
1. haconf -makerw: Cluster configuration opened; .stale file created
2. hares -add: Resources added to cluster configuration in memory; main.cf out of sync with memory configuration
3. Changes saved to disk; .stale removed
Options:
  -makerw         Opens configuration
  -dump           Saves configuration
  -dump -makero   Saves and closes configuration
Example:
  haconf -dump -makero
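The open, modify, and save cycle can be pictured as a small state model. The Python sketch below is illustrative only: the class and its attribute names are hypothetical, and it assumes the .stale marker is removed when the configuration is saved and closed.

```python
class ClusterConfigModel:
    """Toy model of the in-memory configuration versus main.cf on disk."""

    def __init__(self):
        self.writable = False   # True after the configuration is opened
        self.stale = False      # models the presence of the .stale file
        self.in_memory = []     # resources in the running configuration
        self.on_disk = []       # resources recorded in main.cf

    def makerw(self):
        # Corresponds to opening the configuration (haconf -makerw).
        self.writable = True
        self.stale = True

    def add_resource(self, name):
        # Corresponds to adding a resource while the configuration is open.
        if not self.writable:
            raise PermissionError("configuration is read-only")
        self.in_memory.append(name)

    def dump(self, makero=False):
        # Corresponds to saving (haconf -dump), optionally closing (-makero).
        self.on_disk = list(self.in_memory)
        if makero:
            self.writable = False
            self.stale = False  # assumption: marker cleared on close

cfg = ClusterConfigModel()
cfg.makerw()
cfg.add_resource("MyNFSMount")
print(cfg.on_disk, cfg.stale)   # [] True
cfg.dump(makero=True)
print(cfg.on_disk, cfg.stale)   # ['MyNFSMount'] False
```

Between the second and third steps the disk copy lags the memory copy, which is the condition the .stale marker protects against at the next startup.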
[Figures: starting VCS when main.cf is marked .stale on one or more of System1, System2, and System3, using hastart and hastart -force]
Options:
  -force system_name
  -list
  -display system_name
  -delete system_name
  -add system_name
hastart -stale
  The -stale option causes these systems to wait until a running configuration is available from which they can build.
3. Start VCS on the system with the main.cf
Options:
  -stale
  -force
Example:
  hastart -force
[Figures: system state transitions. Startup states shown: CURRENT_PEER_WAIT (branches: no peer; peer in RUNNING), STALE_ADMIN_WAIT (peer starts LOCAL_BUILD), LOCAL_BUILD, and ADMIN_WAIT (disk error). Shutdown states shown: LEAVING, EXITING, EXITING_FORCIBLY, EXITED, and FAULTED.]
Summary
You should now be able to:
  Describe the cluster configuration mechanisms.
  Start VCS.
  Stop VCS.
  Modify the cluster configuration.
  Explain the transition states of the cluster.
Objectives
After completing this lesson, you will be able to:
  Install Cluster Manager.
  Control access to VCS administration.
  Demonstrate Cluster Manager features.
  Create a service group.
  Create resources.
  Manage resources and service groups.
  Use the Web Console to administer VCS.
Can manage multiple clusters from a single workstation
Uses TCP port 14141 by default; to change the port, add an entry such as the following to /etc/services:
  vcs 12345/tcp
Cluster Operator: All cluster, service group, and resource-level operations
Cluster Guest: Read-only access; new users are created as Cluster Guest accounts by default.
Group Administrator: All service group operations for a specified service group, except deleting service groups
Group Operator: Online and offline service groups and resources; temporarily freeze or unfreeze service groups
[Figure: user privilege hierarchy showing "includes privileges for" relationships among Cluster Operator, Group Administrator, Group Operator, and Cluster Guest]
The VCS user account admin is created with Cluster Administrator privilege by the installvcs utility.
To change a password:
hauser -update user_name
hagui&
[Figures: Cluster Manager screenshots with numbered callouts, covering: Service Groups; Creating a Resource; Changing MonitorInterval; Faulted Resources; Log Desk; Command Log; Command Center; Shell Tool]
Web Console:
  Cannot be used to create resources or service groups
  Runs on any system with a Java-enabled Web browser
Cluster Manager:
  Can be used for all VCS administrative tasks
  Requires Cluster Manager and Java to be installed on the administration system
http://IP_alias:8181/vcs
[Figure: Cluster Summary page; labels: Display Refresh, Navigation buttons, Log entries]
[Figure: System View page; labels: Selected View, Navigation trail]
Summary
You should now be able to:
  Install Cluster Manager.
  Control access to VCS administration.
  Demonstrate Cluster Manager features.
  Create a service group.
  Create resources.
  Manage resources and service groups.
  Use the Web Console to administer VCS.
[Figure: lab configuration for Student Blue: service group BlueGuiSG; resources RedFile (/tmp/RedFile) and BlueFile (/tmp/BlueFile)]
Objectives
After completing this lesson, you will be able to:
  Describe how application services relate to service groups.
  Translate application requirements to service group resources.
  Define common service group attributes.
  Create a service group using the command line interface.
  Perform basic service group operations.
Application Service
[Figure: database requests served by Web and Database application services that can run on SystemA or SystemB]
Analyzing Applications
1. Specify application services corresponding to service groups.
2. [...] service group type, failover or parallel.
3. Specify which systems run which services and the desired failover policy.
4. Identify the hardware and software objects required for each service group and their dependencies.
5. [...] hardware and software objects.
[Figure: Web and Database application services]
Service Groups
Create a service group using the command line interface:
  Syntax: hagrp -add group_name
  Example: hagrp -add mySG
SystemList Attribute
  Defines the systems that can run the service group
  The system with the lowest priority number has the highest priority when VCS determines the target system for failover.
  To define SystemList attribute:
Syntax:
  hagrp -modify group_name SystemList \
      system1 priority1 system2 priority2
Example:
  hagrp -modify mySG SystemList \
      train1 0 train2 1
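As a rough illustration of how the priority numbers in SystemList could drive target selection, the Python sketch below picks the available system with the lowest number. It is a simplified model, not the VCS engine's logic; the function name and arguments are hypothetical.

```python
def failover_target(system_list, available, current):
    """Pick the available system with the lowest priority number.

    system_list: dict of system name -> priority number (as in SystemList)
    available:   set of systems currently able to run the group
    current:     system the group is failing over from
    """
    candidates = [
        (priority, name)
        for name, priority in system_list.items()
        if name in available and name != current
    ]
    # min() compares priority first, so the lowest number wins.
    return min(candidates)[1] if candidates else None

system_list = {"train1": 0, "train2": 1}
print(failover_target(system_list, {"train1", "train2"}, "train1"))  # train2
```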
Examples:
  hagrp -modify myManualSG AutoStart 0
  hagrp -modify mySG AutoStartList train0
AutoStartIfPartial Attribute
  Allows VCS to bring a service group with disabled resources online
  All enabled resources must be probed.
  Default is 1, enabled.
  If 0, the service group cannot come online with disabled resources.
  To define AutoStartIfPartial attribute:
Syntax:
  hagrp -modify group_name \
      AutoStartIfPartial value
Example:
  hagrp -modify mySG \
      AutoStartIfPartial 0
Parallel Attribute
Parallel service groups:
  Run on more than one system at the same time
  Respond to system faults by:
    Staying online on remaining systems
    Failing over to the specified target system
Example:
  hagrp -modify myparallelSG Parallel 1
Must set Parallel attribute before adding resources
Default value: 0 (failover)
[Figures: service group resources (IP, Mount, NIC, Disk) shown before, in progress, and after service group operations]
Example:
  hagrp -switch mySG -to train8
Syntax: hagrp -flush group_name -sys system_name
Example: hagrp -flush mySG -sys train8
Summary
You should now be able to:
  Describe how application services relate to service groups.
  Translate application requirements to service group resources.
  Define common service group attributes.
  Create a service group using the command line interface.
  Perform basic service group operations.
[Figure: lab service groups RedGuiSG, BlueGuiSG, RedNFSSG, and BlueNFSSG]
Objectives
After completing this lesson, you will be able to:
  Describe the components required to create and share a file system using NFS.
  Prepare NFS resources.
  Describe the VCS network environment.
  Manually migrate the NFS services between two systems.
  Describe the process of automating high availability.
Network-related resources:
  IP address
  Network interface
Disk Resources
[Figure: System 1 and System 2 both access partition 3 of disk1 on shared storage as /dev/(r)dsk/c1t1d0s3]
[Figure: nfsd and mountd run on both System 1 and System 2; partition 3 of disk1 is on shared storage]
running:
Note: The file system should not be shared automatically at boot time.
[Figure: NFS components: File System, NFS, Disk Partition]
Application IP addresses
  Added as a virtual IP address to the network interface, such as qfe1:1
  Associated with an application service
  Controlled by the high availability software
  Migrated to other systems if the current system fails
  Also called service group or floating IP addresses
1. vi /etc/hostname.qfe1
   train14_qfe1
2. Edit /etc/hosts and assign an IP address
   vi /etc/hosts
   166.98.112.14 train14_qfe1
3. Reboot the system.
to the IP address.
vi /etc/hosts 166.98.112.114
[Figure: nfs_services components: Share, Network Interface, File System, NFS, Disk Partition]
Alternately:
  touch /mount_point/sub_dir/.testfile
  rm /mount_point/sub_dir/.testfile
ifconfig -a
1. Make sure that the target system is available.
2. Make sure that the disk is accessible from the target system.
3. Make sure that the target system is connected to the network.
4. Bring the NFS services down on the first system following the dependencies:
   a. Configure the application IP address down.
   b. Stop sharing the file system.
   c. Unmount the file system.
5. Bring the NFS services up on the target system following the resource dependencies:
   a. Check and mount the file system.
   b. Start the NFS daemons if they are not already running.
   c. Share the file system.
   d. Configure and bring the application IP address up.
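The manual migration procedure can be summarized as an ordered plan. The Python sketch below only builds and prints that plan; it runs no system commands. The function name, the check names, and the system names are placeholders chosen for illustration.

```python
def migration_plan(source, target, target_checks):
    """Return (system, action) steps for moving NFS services by hand.

    target_checks maps a precondition name to True or False; the plan is
    refused if any precondition on the target system is not met.
    """
    failed = [name for name, ok in target_checks.items() if not ok]
    if failed:
        raise RuntimeError("target not ready: " + ", ".join(failed))
    return [
        # Stop in top-down dependency order on the first system.
        (source, "configure the application IP address down"),
        (source, "stop sharing the file system"),
        (source, "unmount the file system"),
        # Start in bottom-up dependency order on the target system.
        (target, "check and mount the file system"),
        (target, "start the NFS daemons if not already running"),
        (target, "share the file system"),
        (target, "configure and bring the application IP address up"),
    ]

checks = {"system available": True, "disk accessible": True, "network connected": True}
for system, action in migration_plan("system1", "system2", checks):
    print(f"{system}: {action}")
```

Every step in this plan is something an administrator would otherwise do by hand and in order, which is the work that high availability software takes over.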
Script the start and stop processes.
Use high availability software to automate:
  Maintain communication between systems to verify that the target system is available for failover.
  Observe dependencies during starting and stopping.
  Define actions to take when a fault is detected.
Summary
You should now be able to:
  Describe the components required to create and share a file system using NFS.
  Prepare NFS resources.
  Describe the VCS network environment.
  Manually migrate the NFS services between two systems.
  Describe the process of automating high availability.
[Figure: lab configuration: service groups BlueGuiSG, RedNFSSG, and BlueNFSSG; c1t8d0s0 mounted at /Redfs and c1t15d0s0 mounted at /Bluefs]
Objectives
After completing this lesson, you will be able to:
  Describe how resources and resource types are defined in VCS.
  Describe how agents work.
  Describe cluster configuration files.
  Modify the cluster configuration.
  Use the Disk resource and agent.
  Use the Mount resource and agent.
  Create a service group.
  Configure resources.
  Perform resource operations.
Resources
[Figure: NFS Service Group with resources including IP and Disk]
Attributes and attribute values:
Mount MyNFSMount (
    MountPoint = "/test"
    BlockDevice = "/dev/dsk/c1t2d0s4"
    FSType = vxfs
    )
Persistent resources
Operations=OnOnly Operations=None
Resource Types
[Figure: resource type IP with resources NFS_IP, WEB_IP, and ORACLE_IP]
Attribute Types
Resource types: Mount, MultiNICA, NFS, NIC, Phantom, Process, Proxy, ServiceGroupHB, Share, Volume
Agents
  Periodically monitor resources and send status information to the VCS engine.
  Bring resources online when requested by the VCS engine.
  Take resources offline upon request.
  Restart resources when they fault (depending on the resource configuration).
  Send a message to the VCS engine and the agent log file when errors are detected.
[Figure: IPAgent communicating with the VCS engine]
Enterprise Agents
  Database Edition / HA 2.2 for Oracle
  Informix
  VERITAS NetBackup
  Oracle
  PC NetLink
  Sun Internet Mail Server (SIMS)
  Sybase
  VERITAS NetApp
  Apache
  Firewall (Checkpoint and Raptor)
  Netscape SuiteSpot
Service Group
Mount MyNFSMount (
    MountPoint = /data
    BlockDevice = /dev/dsk/c1t1d0s3
    FSType = vxfs
    )
Disk MyNFSDisk (
    Partition = c1t1d0s3
    )
MyNFSMount requires MyNFSDisk
[Callouts: Resources, Resource Attributes, Resource Dependencies]
Offline configuration:
  Edit main.cf.
  Restart VCS.
Offline configuration:
  Edit types.cf to change existing resource type definitions.
  Edit main.cf to add include statements for new agents with their own types file.
  Restart VCS.
Use CLI.
Edit types.cf.
Required attributes:
Partition
No optional attributes
must exist.
Required attributes:
  BlockDevice
  FSType
  MountPoint
Optional attributes:
  FsckOpt, MountOpt, SnapUmount
Sample configuration:
Mount myNFSMount (
    MountPoint = /export1
    BlockDevice = /dev/dsk/c1t1d0s3
    FSType = vxfs
    MountOpt = -o ro
    )
When setting MountOpt with hares, use % to escape arguments starting with dash (-):
  hares -modify myNFSMount MountOpt %-o ro
Configuring a Resource
[Flowchart: steps Add Resource, Set Non-Critical, Modify Attributes, Enable Resource, Bring Online, then Link Resources; decision points Online?, Waiting to Online?, and Faulted?; recovery actions Flush Group, Check Log, Disable Resource, Clear Resource, and Check Logs/Fix; ends at Done]
Adding a Resource
Suggest using service group name as a prefix for resource names
Modifying a Resource
  Enter values for each required attribute.
  Modify optional attributes, if necessary.
  See Bundled Agents Reference Guide for a complete description of all attributes.
Enabling a Resource
  Resources must be enabled in order to be managed by the agent.
  If necessary, the agent initializes the resource when it is enabled.
  All required attributes of a resource must be set before the resource is enabled.
  By default, resources are not enabled.
  Parent resources cannot be persistent type resources.
  You cannot link resources in different service groups.
  Resources can have an unlimited number of parent and child resources.
  Cyclical dependencies are not allowed.
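These linking rules can be expressed as a small validation routine. The Python sketch below is a model for illustration only; the function, the resource dictionary layout, and the sample resource names are hypothetical and do not reflect how VCS stores its configuration.

```python
def link_error(parent, child, resources, links):
    """Return a reason the link is not allowed, or None if it is allowed.

    resources: name -> {"group": str, "persistent": bool}
    links:     set of existing (parent, child) pairs
    """
    if resources[parent]["persistent"]:
        return "parent is a persistent resource"
    if resources[parent]["group"] != resources[child]["group"]:
        return "resources are in different service groups"
    # Walk down from the proposed child; reaching the parent means the
    # new link would close a cycle.
    seen, stack = set(), [child]
    while stack:
        node = stack.pop()
        if node == parent:
            return "link would create a cyclic dependency"
        if node not in seen:
            seen.add(node)
            stack.extend(c for (p, c) in links if p == node)
    return None

resources = {
    "MyNFSShare": {"group": "NFS", "persistent": False},
    "MyNFSMount": {"group": "NFS", "persistent": False},
    "MyNFSDisk": {"group": "NFS", "persistent": True},
    "WebIP": {"group": "Web", "persistent": False},
}
links = {("MyNFSShare", "MyNFSMount")}
print(link_error("MyNFSMount", "MyNFSDisk", resources, links))   # None
print(link_error("MyNFSMount", "MyNFSShare", resources, links))  # cyclic
```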
Linking Resources
Clearing Faults
  Faulted resources must be cleared before they can be brought online.
  Persistent resources are cleared when the problem is fixed and they are probed by the agent.
  Offline resources are probed periodically.
  Resources can be manually probed.
Disabling a Resource
  VCS calls the agent on each system in SystemList.
  The agent calls the Close entry point, if present, to reset the resource.
  Nonpersistent resources are brought offline.
  The agent stops monitoring disabled resources.
Deleting a Resource
Before deleting a resource:
  Take all parent resources offline.
  Take resource offline.
  Disable resource.
  Unlink any dependent resources.
Summary
You should now be able to:
  Describe how resources and resource types are defined in VCS.
  Describe how agents work.
  Describe cluster configuration files.
  Modify the cluster configuration.
  Use the Disk resource and agent.
  Use the Mount resource and agent.
  Create a service group.
  Configure resources.
  Perform resource operations.
[Figure: lab configuration: service groups BlueGuiSG, BlueNFSSG (BlueNFS Mount, BlueNFS Disk), and RedNFSSG (RedNFS Mount, RedNFS Disk)]
Objectives
After completing this lesson, you will be able to:
  Prepare NFS services for the VCS environment.
  Describe the Share resource and agent.
  Describe the NFS resource and agent.
  Describe the NIC resource and agent.
  Describe the IP resource and agent.
  Configure and test an NFS service group.
Major and minor numbers for block devices used for NFS services must be the same on each system.
[Figure: an NFS request before failover and after failover]
On System B:
  grep ^vx /etc/name_to_major
  vxdmp 89
  vxio 90
  vxspec 91
Each system must have the same major number for the shared volume. Major numbers must also be unique within a system.
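One way to check this requirement is to compare the driver entries from /etc/name_to_major on both systems. The Python sketch below parses text in the "name number" layout shown in the grep output; the System B values come from that output, while the System A values and the function names are made up for illustration.

```python
def parse_name_to_major(text):
    """Parse 'driver major' lines into a dict of driver -> major number."""
    majors = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and not parts[0].startswith("#"):
            majors[parts[0]] = int(parts[1])
    return majors

def major_mismatches(system_a, system_b, drivers=("vxdmp", "vxio", "vxspec")):
    """Return driver -> (major on A, major on B) for drivers that differ."""
    return {
        d: (system_a.get(d), system_b.get(d))
        for d in drivers
        if system_a.get(d) != system_b.get(d)
    }

system_b = parse_name_to_major("vxdmp 89\nvxio 90\nvxspec 91\n")
system_a = parse_name_to_major("vxdmp 89\nvxio 95\nvxspec 91\n")  # hypothetical
print(major_mismatches(system_a, system_b))  # {'vxio': (95, 90)}
```

An empty result means the listed drivers use the same major numbers on both systems.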
Required attributes:
  PathName: Pathname of the file system
Required attributes: None
Optional attributes: Nservers (default=16)
Configuration prerequisites: None
Sample configuration:
NFS mySGNFS (
    Nservers = 24
    )
Required attributes:
  Device
Optional attributes:
  NetworkType, PingOptimize, NetworkHosts
Sample configuration:
NIC mySGNIC (
    Device = qfe1
    NetworkHosts = { 192.20.47.254, 192.20.47.253 }
    )
Monitor
Required attributes:
  Device
  Address
Optional attributes:
  NetMask, Options, ArpDelay (default=1s), IfconfigTwice (default=0)
IP Resource Configuration
Configuration prerequisites: Configure a NIC resource.
Sample configuration:
IP mySGIP (
    Device = qfe1
    Address = "192.20.47.61"
    )
Troubleshooting Resources
  hares -modify mySGIP Enabled 0
  hagrp -flush mySG -sys sys1
[Flowchart: decision points Online?, Waiting to Online?, and Faulted?; actions Flush Group, Check Log, Disable Resource, and Clear Resource; ends at Done]
hares -modify mySGIP Critical 1
hares -modify mySGNIC Critical 1
hares -modify
Summary
You should now be able to:
  Prepare NFS services for the VCS environment.
  Describe the Share resource and agent.
  Describe the NFS resource and agent.
  Describe the NIC resource and agent.
  Describe the IP resource and agent.
  Configure and test an NFS service group.
[Figure: lab resources RedNFS Disk and BlueNFS Disk]
Objectives
After completing this lesson, you will be able to:
  Describe the VCS notifier component.
  Configure the notifier to signal changes in cluster status.
  Describe SNMP configuration.
  Describe event triggers.
  Configure triggers to provide notification.
Notification
How VCS performs notification:
1. The had daemon sends a message to the notifier daemon when an event occurs.
2. The notifier daemon formats the event message and sends an SNMP trap or e-mail message (or both) to designated recipients.
[Figure: had daemons pass events to the notifier daemon, which sends SMTP and SNMP messages]
Message Queues
1. The had daemon stores a message in a queue when an event is detected.
2. The message is sent over the private cluster network to all other had daemons to replicate the message queue.
3. The notifier daemon can be started on another system in case of failure without loss of messages.
[Figure: had daemons on two systems share a replicated queue; a notifier daemon on either system can send SNMP and SMTP messages]
Configuring Notifier
  The notifier daemon can be started and monitored by the NotifierMngr resource.
  Attributes define recipients and severity levels. For example:
    SmtpServer = "smtp.acme.com"
    SmtpRecipients = { "admin@acme.com" = Warning }
[Figure: NotifierMngr and NIC resources]
Example resource configuration:
NotifierMngr Notify_Ntfr (
    PathName = "/opt/VRTSvcs/bin/notifier"
    SnmpConsoles = { snmpserv = Information }
    SmtpServer = "smtp.your_company.com"
    SmtpRecipients = { "vcsadmin@your_company.com" = SevereError }
    )
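The severity level attached to each recipient acts as a threshold. The Python sketch below models that filtering under the assumption that the levels are ordered Information, Warning, Error, SevereError and that a recipient receives events at its configured level or higher. The function name is hypothetical, and the recipient addresses reuse the sample values from the configuration examples.

```python
# Assumed ordering, lowest to highest severity.
SEVERITIES = ["Information", "Warning", "Error", "SevereError"]

def recipients_for(event_severity, smtp_recipients):
    """Return recipients whose threshold is at or below the event severity.

    smtp_recipients: address -> minimum severity (as in SmtpRecipients)
    """
    level = SEVERITIES.index(event_severity)
    return sorted(
        address
        for address, minimum in smtp_recipients.items()
        if SEVERITIES.index(minimum) <= level
    )

smtp_recipients = {
    "admin@acme.com": "Warning",
    "vcsadmin@your_company.com": "SevereError",
}
print(recipients_for("Error", smtp_recipients))        # ['admin@acme.com']
print(recipients_for("Information", smtp_recipients))  # []
```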
SNMP Configuration
  Load MIB for VCS traps into SNMP console.
  For HP OpenView Network Node Manager, merge events:
    xnmevents -merge vcs_trapd
  VCS SNMP configuration files:
    /etc/VRTSvcs/snmp/vcs.mib
    /etc/VRTSvcs/snmp/vcs_trapd
Event Triggers
How VCS performs notification:
1. VCS determines if notification is enabled.
   If disabled, no action is taken.
   If enabled, VCS runs hatrigger with event-specific parameters.
2. The hatrigger script invokes the event-specific trigger script with parameters passed by VCS.
3. The event trigger script performs the notification tasks.
Types of Triggers
Trigger          Description                               Script Name
ResFault         Resource faulted                          resfault
ResNotOff        Resource not offline                      resnotoff
ResStateChange   Resource changed state                    resstatechange
SysOffline       System went offline                       sysoffline
InJeopardy       Cluster in jeopardy                       injeopardy
NoFailover       Service group cannot failover             nofailover
Violation        Resource online on more than one system   violation
LoadWarning      System is overloaded                      loadwarning
PreOnline        Service group about to come online        preonline
PostOnline       Service group went online                 postonline
PostOffline      Service group went offline                postoffline
Configuring Triggers
Triggers enabled by presence of script file:
  ResFault
  ResNotOff
  SysOffline
  InJeopardy
  Violation
  NoFailover
  PostOffline
  PostOnline
  LoadWarning
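The idea that a trigger is enabled simply by its script being present can be modeled in a few lines of Python. This is a sketch, not the hatrigger implementation: the function is hypothetical, the script name is assumed to be the lowercase trigger name, and a temporary directory stands in for the real trigger directory.

```python
import os
import tempfile

def trigger_script(event, trigger_dir):
    """Return the script path for an event if the script file exists.

    Assumes the script name is the lowercase trigger name, for example
    ResFault -> resfault. Returns None when the trigger is not enabled.
    """
    path = os.path.join(trigger_dir, event.lower())
    return path if os.path.isfile(path) else None

def demo():
    # A temporary directory stands in for the trigger directory.
    with tempfile.TemporaryDirectory() as trigger_dir:
        open(os.path.join(trigger_dir, "resfault"), "w").close()
        return (
            trigger_script("ResFault", trigger_dir) is not None,
            trigger_script("SysOffline", trigger_dir) is not None,
        )

print(demo())  # (True, False)
```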
Sample Triggers
  Sample trigger scripts include example code to send an e-mail message.
  Mail must be configured on the system invoking trigger to use sample e-mail code.
# Here is a sample code to notify a bunch of users.
# @recipients=("username@servername.com");
# $msgfile="/tmp/resnotoff$2";
# `echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
#
# foreach $recipient (@recipients) {
#     # Must have elm setup to run this.
#     `elm -s resnotoff $recipient < $msgfile`;
# }
# `rm $msgfile`;
ResFault Trigger
Provides notification that a resource has faulted
Arguments to resfault:
    system: Name of the system where the resource faulted
    resource: Name of the faulted resource
ResNotOff Trigger
Provides notification that a resource has not been taken offline
If a resource is not offline on one system, the service group cannot be brought online on another.
VCS cannot fail over the service group in the event of a fault, because the resource will not come offline.
Arguments to resnotoff:
    system: Name of the system where the resource is not offline
    resource: Name of the resource that is not offline
ResStateChange Trigger
Provides notification that a resource has changed state
Set at the service group level by the TriggerResStateChange attribute:
    hagrp -modify serv_grp TriggerResStateChange 1
Arguments to resstatechange:
    system: Name of the system where the resource changed state
    resource: Name of the resource that changed state
    previous_state: State of the resource before change
    new_state: State of the resource after change
SysOffline Trigger
Provides notification that a system has gone offline
Executed on another system when no heartbeat is detected
Arguments to sysoffline:
    system: Name of the system that went offline
    systemstate: Value of the SysState attribute for the offline system
NoFailover Trigger
Run when VCS determines that a service group cannot fail over
Executed on the lowest-numbered system in a running state when the condition is detected
Arguments to nofailover:
    systemlastonline: Name of the last system where the service group was online or partially online
    service_group: Name of the service group that cannot fail over
Summary
You should now be able to: Describe the VCS notifier component. Configure the notifier to signal changes in cluster status. Describe SNMP configuration. Describe event triggers. Configure triggers to provide notification.
Objectives
After completing this lesson, you will be able to: Describe how VCS responds to faults. Implement failover policies. Set limits and prerequisites. Use system zones to control failover. Control failover behavior using attributes. Clear faults. Probe resources. Flush service groups. Test failover.
Practice Exercise
[Figure: dependency diagram of resources 1 through 9; resource 4 faults.]
[Table: cases A through F; columns Case, NonCritical, Offline, Taken offline due to fault, Starts on another system.]
Practice Answers
[Figure and table as in the exercise, with the answer columns filled in.]
Failover Attributes
AutoFailOver indicates whether automatic failover is enabled for the service group. Default value is 1, enabled.
FailOverPolicy specifies how a target system is selected:
    Priority: System with the lowest priority number in the list is selected (default).
    RoundRobin: System with the least number of active service groups is selected.
    Load: System with greatest available capacity is selected.
Example configuration:
    hagrp -modify group AutoFailOver 0
    hagrp -modify group FailOverPolicy Load
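The three FailOverPolicy values can be compared with a small Python sketch. It is a simplified model, not the VCS engine: the function select_target, the dictionary keys, and the sample numbers are all hypothetical, and available capacity is assumed to be capacity minus current load.

```python
def select_target(policy, candidates):
    """Pick a failover target from a list of candidate system dicts.

    Each candidate has: name, priority (lower is preferred),
    groups (number of active service groups), capacity, and load.
    """
    if not candidates:
        return None
    if policy == "Priority":
        best = min(candidates, key=lambda s: s["priority"])
    elif policy == "RoundRobin":
        best = min(candidates, key=lambda s: s["groups"])
    elif policy == "Load":
        # Assumed: available capacity = capacity - load.
        best = max(candidates, key=lambda s: s["capacity"] - s["load"])
    else:
        raise ValueError("unknown FailOverPolicy: %s" % policy)
    return best["name"]

systems = [
    {"name": "SmSvr1", "priority": 0, "groups": 2, "capacity": 100, "load": 30},
    {"name": "SmSvr2", "priority": 1, "groups": 1, "capacity": 100, "load": 20},
    {"name": "LgSvr1", "priority": 2, "groups": 1, "capacity": 200, "load": 100},
]
```

With these sample values, Priority selects SmSvr1, RoundRobin selects SmSvr2, and Load selects LgSvr1.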
FailOverPolicy: Priority
Lowest numbered system in SystemList selected
FailOverPolicy: RoundRobin
System with fewest running service groups selected
FailOverPolicy: Load
SmSvr1: Capacity = 100, AvailableCapacity = 70 (running AP1, Load = 30)
SmSvr2: Capacity = 100, AvailableCapacity = 80 (running AP2, Load = 20)
LgSvr1: Capacity = 200, AvailableCapacity = 100 (running DB1, Load = 100)
[Figure: service groups G1 (Load=20), G6 (Load=30), G3 (Load=30), and G7 (Load=20) running on systems Svr1 through Svr4.]
System Svr3 (
    Capacity=100
    LoadWarningLevel=80
    LoadTimeThreshold=600
    )
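A rough Python sketch of how LoadWarningLevel and LoadTimeThreshold might interact. It assumes the LoadWarning trigger applies once load has stayed at or above LoadWarningLevel percent of Capacity for LoadTimeThreshold seconds; the function name and the sample format are hypothetical.

```python
def load_warning_due(samples, capacity=100, warning_level=80, time_threshold=600):
    """samples: list of (timestamp_seconds, load) pairs in time order.

    Returns True once load has stayed at or above warning_level percent
    of capacity for at least time_threshold seconds.
    """
    limit = capacity * warning_level / 100.0
    since = None  # time at which load first reached the limit
    for t, load in samples:
        if load >= limit:
            if since is None:
                since = t
            if t - since >= time_threshold:
                return True
        else:
            since = None  # load dropped below the limit; restart the clock
    return False
```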
Dynamic Load
The DynamicLoad attribute is used in conjunction with load-estimation software. It is set using the hasys command.
Example: SmSvr1 runs GA, GC, and GD with Capacity = 100 and AvailableCapacity = 10. SmSvr1 is 90 percent loaded.
Failover Zones
Preferred failover zone for the Database service group: sysa, sysb
Preferred failover zone for the Web service group: sysc, sysd, syse, sysf
The SystemList for both service groups includes all systems in the cluster.
SystemZones Attribute
Used to define the preferred failover zones for each service group.
If the service group is online in a system zone, it fails over to other systems in the same zone based on the FailOverPolicy until there are no further systems available in that zone.
When there are no other systems for failover in the same zone, VCS chooses a system in a new zone from the SystemList based on the FailOverPolicy.
To define SystemZones:
Syntax:
    hagrp -modify group_name SystemZones \
        sys1 zone# sys2 zone# ...
Example:
    hagrp -modify OracleSG SystemZones sysa 0 \
        sysb 0 sysc 1 sysd 1 syse 1 sysf 1
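The zone preference described above can be sketched in Python. This is a simplified model under stated assumptions: the function zone_candidates is hypothetical, and it only narrows the candidate list; the FailOverPolicy would then choose among the systems returned.

```python
def zone_candidates(system_zones, current_system, available):
    """Return the systems to consider first for failover.

    system_zones: dict of system name -> zone number (SystemZones).
    available: systems from SystemList that can accept the group.
    """
    zone = system_zones.get(current_system)
    same_zone = [s for s in available
                 if s != current_system and system_zones.get(s) == zone]
    if same_zone:
        return same_zone
    # No system left in the current zone: fall back to the other zones.
    return [s for s in available if s != current_system]

zones = {"sysa": 0, "sysb": 0, "sysc": 1, "sysd": 1, "syse": 1, "sysf": 1}
```

With the OracleSG zones, a group on sysa considers sysb first and only moves to zone 1 when sysb is unavailable.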
RestartLimit
ConfInterval
Determines the time window within which the tolerance or restart counter can be incremented
Default: 600 seconds
ToleranceLimit
Enables the monitor entry point to return OFFLINE several times before the resource is declared FAULTED Default: 0
Restart Example
RestartLimit=1
    Resource to be restarted one time within the ConfInterval timeframe
ConfInterval=180
    Resource can be restarted once within a three-minute interval.
MonitorInterval=60 seconds (default value)
    Resource is monitored every 60 seconds.
[Figure: timeline of resource state across MonitorInterval and ConfInterval: Online, Offline, Restart, Online, Offline, Faulted.]
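The restart example can be traced with a short Python simulation. It is a sketch under one stated assumption: the restart counter is cleared once the resource has stayed online for ConfInterval seconds after its last restart. The function simulate_faults is hypothetical and ignores ToleranceLimit.

```python
def simulate_faults(fault_times, restart_limit=1, conf_interval=180):
    """Return the agent's action for each fault time (in seconds)."""
    actions = []
    restarts = 0
    last_restart = None
    for t in fault_times:
        # Assumption: the counter resets once the resource has stayed
        # online for ConfInterval seconds since its last restart.
        if last_restart is not None and t - last_restart >= conf_interval:
            restarts = 0
        if restarts < restart_limit:
            restarts += 1
            last_restart = t
            actions.append("restart")
        else:
            actions.append("faulted")
            break
    return actions
```

Two failures 60 seconds apart end in a fault; two failures 300 seconds apart are both handled by a restart.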
Adjusting Monitoring
MonitorInterval
Default value is 60 seconds for most resource types. Consider reducing to 10 or 20 seconds for testing. Use caution when changing this value:
Load is increased on cluster systems. Resources can fault if they cannot respond in the interval specified.
OfflineMonitorInterval
Default is 300 seconds for most resource types. Consider reducing to 60 seconds for testing.
Preventing Failover
A frozen service group does not fail over when a critical resource faults. The service group must be unfrozen to enable failover. To freeze a service group:
hagrp -freeze service_group [-persistent]
A persistent freeze:
    Requires the cluster configuration to be open
    Remains in effect even if VCS is stopped and restarted
Clearing Faults
Verify that the faulted resource is offline. Fix the problem that caused the fault and clean up any residual effects. To clear a fault, type:
hares -clear resource_name [-sys system_name]
Probing Resources
Causes VCS to immediately monitor the resource To probe a resource, type:
hares -probe resource_name -sys system_name
You can clear a faulted persistent resource by probing it after the underlying problem has been fixed.
Testing Failover
Use test resources, such as FileOnOff, when applicable. Set lower values for MonitorInterval, OfflineMonitorInterval, and ConfInterval to detect faults more quickly. Manually online, offline, and switch the service group among all systems. Simulate failure of each resource in the service group. Simulate failover of the entire system.
Testing Examples
Force a resource to fault. Reboot a system. Halt and reboot a system. Remove power from a system.
Summary
You should now be able to: Describe how VCS responds to faults. Implement failover policies. Set limits and prerequisites. Use system zones to control failover. Control failover behavior using attributes. Clear faults. Probe resources. Flush service groups. Test failover.
Objectives
After completing this lesson, you will be able to: Describe the benefits of keeping applications available during planned maintenance. Freeze service groups and systems. Upgrade a system in a running cluster. Describe the differences in application upgrades. Apply guidelines for installing new applications in the cluster.
[Figure: WebSG and DatabaseSG frozen during a Web server application upgrade.]
Freezing a System
Freezing a system prevents service groups from failing over to it.
Failover can still occur from a frozen system.
Freeze a system while maintenance is being performed.
Persistent freeze remains in effect through VCS restarts.
Evacuate moves service groups off the frozen system.
Syntax:
    hasys -freeze [-persistent] [-evacuate] systemA
    hasys -unfreeze [-persistent] systemA
Persistent freeze remains in effect, even if VCS is stopped and restarted throughout the cluster.
Syntax:
    hagrp -freeze service_group [-persistent]
More systems to upgrade?
    Yes: Continue with the next system.
    No: Move service groups to appropriate systems:
            hagrp -switch mySG -to systemA
        Close the configuration:
            haconf -dump -makero
Done
Disadvantages:
Rolling upgrades cannot be performed. Downtime increased during maintenance
Disadvantages:
Must maintain multiple copies of the application Not scalable due to maintenance overhead in clusters with large numbers of service groups and systems
Summary
You should now be able to:
Describe the benefits of keeping applications available during planned maintenance. Freeze service groups and systems. Upgrade a system in a running cluster. Describe the differences in application upgrades. Apply guidelines for installing new applications in the cluster.
Objectives
After completing this lesson, you will be able to: Describe how Volume Manager enhances high availability. Describe Volume Manager storage objects. Configure shared storage using Volume Manager. Create a service group with Volume Manager resources. Configure Process resources. Configure Application resources.
Volume Management
[Figure: physical disks presented as virtual volumes to System1 and System2.]
VxVM objects: VxVM disks, volumes, subdisks, disk groups, plexes
Disk Groups
[Figure: physical disks Disk1, Disk2, and Disk3 as VxVM disks in disk group testDG.]
VxVM objects cannot span disk groups.
Disk groups represent management and configuration boundaries.
Disk groups enable high availability.
VxVM Volume
[Figure: Volume1 built from physical disks Disk1, Disk2, and Disk3.]
Create a volume:
    vxassist -g disk_group make vol_name size
system. 3. Verify that the file system is accessible. 4. Unmount the file system. 5. Deport the disk group.
same name. Import the disk group. Start the volume. Mount and verify the file system. Unmount the file system. Deport the disk group.
[Figure: service group VMSG with resources Mount, VMVol, and VMDG.]
Required attributes:
DiskGroup Name of the disk group
Optional attributes:
StartVolumes, StopVolumes
Configuration Prerequisites:
Disk group and volume must be configured.
Required attributes:
DiskGroup Name of the disk group Volume Name of the volume
Configuring a Resource
[Flowchart steps: Add Resource; Set Non-Critical; Modify Attributes; Enable Resource; Bring Online. Decision points: Online?, Waiting to Online?, Faulted?. Recovery steps: Check Log, Flush Group, Disable Resource, Clear Resource, Check Logs/Fix. Final steps: Link Resources, Done.]
Required attributes:
PathName Full path of the executable file
Optional attributes:
Arguments
    Use % to escape dashed arguments:
    hares -modify myProc Arguments "%-db -q1h"
Sample Configuration:
Process sendmail (
    PathName = "/usr/lib/sendmail"
    Arguments = "-db -q1h"
    )
Required Attributes:
StartProgram: Name of executable to start application
StopProgram: Name of executable to stop application
One or more of the following:
    MonitorProgram: Name of executable to monitor application
    MonitorProcesses: List of processes to be monitored
    PidFiles: List of pid files that contain the process ID of the processes to be monitored
Optional Attributes:
CleanProgram, User
Sample configuration:
Application samba_app (
    StartProgram = "/usr/sbin/samba start"
    StopProgram = "/usr/sbin/samba stop"
    PidFiles = { "/var/lock/samba/smbd.pid" }
    MonitorProcesses = { "smbd" }
    )
Summary
You should now be able to:
Describe how Volume Manager enhances high availability. Describe Volume Manager Storage Objects. Configure shared storage using Volume Manager. Create a service group with Volume Manager resources. Configure Process resources. Configure Application resources.
[Lab figure: RedNFSSG uses ProdDG, ProdVol, and /prod; BlueNFSSG uses TestDG, TestVol, and /test.]
Objectives
After completing this lesson, you will be able to: Describe how systems communicate in a cluster. Describe the LLT and GAB configuration files and commands. Reconfigure LLT and GAB. Describe the effects of cluster communication failures. Recover from communication failures. Configure the InJeopardy trigger. Troubleshoot LLT and GAB.
Cluster Communication
[Figure: agents running on System A and System B; Systems A, B, C, and D form Cluster 1.]
Cluster State
GAB tracks all changes in configuration and resource status. Sends atomic broadcast to immediately transmit new configuration and status
Configuring LLT
Required configuration files:
    /etc/llttab
    /etc/llthosts
Optional configuration file:
    /etc/VRTSvcs/conf/sysname
0 - 31
# /etc/VRTSvcs/conf/sysname
sysb
# /etc/llttab
set-node 1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
start
Fields called out on the slide: Device:Unit, Link Type, MTU
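As a way to check an llttab file before starting LLT, the directives shown above can be read with a few lines of Python. This is an illustrative sketch only: parse_llttab is a hypothetical helper, it handles just the set-node, set-cluster, link, and link-lowpri directives, and it does no validation.

```python
def parse_llttab(text):
    """Parse llttab text into node ID, cluster ID, and link lists."""
    config = {"node": None, "cluster": None, "links": [], "lowpri": []}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if fields[0] == "set-node":
            config["node"] = fields[1]
        elif fields[0] == "set-cluster":
            config["cluster"] = int(fields[1])
        elif fields[0] == "link":
            config["links"].append((fields[1], fields[2]))
        elif fields[0] == "link-lowpri":
            config["lowpri"].append((fields[1], fields[2]))
    return config
```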
node_number name
Example entries:
GAB Configuration
GAB configuration file: /etc/gabtab
GAB configuration command entry:
    /sbin/gabconfig -c -n seed_number
Seed number is set to number of systems in the cluster.
Starts GAB under normal conditions
Other options discussed later
1. Stop VCS
2. Stop GAB
3. Stop LLT
4. Edit Files
5. Start LLT
6. Start GAB
7. Start VCS
Stop LLT:
/sbin/lltconfig -U
Starting LLT
Edit configuration files on each system before starting LLT on any system.
Start LLT on each system in the cluster:
    /sbin/lltconfig -c
LLT starts if configuration files are correct.
Starting GAB
Start LLT before starting GAB.
Start GAB on each system, specifying a value for -n equal to the number of systems in the cluster:
    /sbin/gabconfig -c -n #
MTU   Addrlen  Xmit  Recv  ..
1500  6        3732  3678  0
1500  6        3731  3674  0
1500  6        1584  6719  0
GAB is communicating.
Communication Failures
Network partition:
Failure of all Ethernet heartbeat links between one or more systems:
    Occurs when one or more systems fail
    Also occurs when all Ethernet heartbeat links fail
Split brain:
Failure of Ethernet heartbeat links is misinterpreted as failure of one or more systems.
Multiple systems start running the same failover application.
Leads to data corruption if the applications use shared storage
Split-Brain Condition
[Figure: two systems both changing block 20460 on shared storage.]
Jeopardy Condition
A special type of cluster membership called jeopardy is formed when one or more systems have only a single Ethernet heartbeat link. Service groups continue to run, and the cluster functions normally. Failover and switching at operator request are unaffected. The service groups running on a system in jeopardy are not taken over by another system if a system failure is detected by VCS.
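The link-count rule behind jeopardy and network partition can be written as a small Python sketch. The function membership_state and its return labels are hypothetical names chosen for illustration; only the thresholds come from the slides.

```python
def membership_state(heartbeat_links_up):
    """Classify a system by how many Ethernet heartbeat links still work."""
    if heartbeat_links_up == 0:
        # Peers cannot tell a failed system from a failure of all links.
        return "partitioned-or-down"
    if heartbeat_links_up == 1:
        return "jeopardy"
    return "regular"
```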
Jeopardy Example
SG_1 SG_2 SG_3
B 1
Recovery Behavior
When a private network is reconnected after a network partition, VCS and GAB are stopped and restarted as follows:
Two-system cluster:
    System with the lowest LLT node number continues to run VCS.
    VCS is stopped on the higher-numbered system.
Multi-system cluster:
    Mini-cluster with the most systems running continues to run VCS.
    VCS is stopped on the systems in the smaller mini-cluster(s).
    If split into two equal-size mini-clusters, the cluster containing the lowest node number continues to run VCS.
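The recovery rules can be expressed as one comparison in Python. This is a sketch, not GAB's implementation: surviving_partition is a hypothetical function, and applying the lowest-node-number tie-break to more than two equal mini-clusters is an assumption beyond what the slide states.

```python
def surviving_partition(partitions):
    """partitions: list of lists of LLT node numbers, one per mini-cluster.

    Returns the mini-cluster that keeps running VCS after reconnection.
    """
    # Largest mini-cluster wins; ties go to the one with the lowest node number.
    return max(partitions, key=lambda p: (len(p), -min(p)))
```

For a two-system cluster split into [0] and [1], node 0 keeps running VCS.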
Seeding
Prevents split brain
Only seeded systems can run VCS.
Systems are seeded only if GAB can communicate with other systems.
Seeding determines the number of systems that must be communicating to allow VCS to start.
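The seeding rule reduces to a single comparison, sketched here in Python. The function is_seeded and its manual_override parameter are hypothetical; the override stands in for an administrator seeding the cluster by hand.

```python
def is_seeded(seed_number, systems_communicating, manual_override=False):
    """Return True if GAB may seed, allowing VCS to start."""
    # seed_number corresponds to the value given with -n in /etc/gabtab.
    return manual_override or systems_communicating >= seed_number
```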
Overrides -n; allows GAB to immediately seed the cluster so VCS can build a running configuration
Use when the number of systems available is less than the number specified by -n in /etc/gabtab.
Use on only one system in the cluster; the others then seed from the first system.
Summary
You should now be able to: Describe how systems communicate in a cluster. Configure the Low Latency Transport (LLT). Configure the Group Membership and Atomic Broadcast (GAB) mechanism. Start and stop LLT and GAB. Configure the InJeopardy trigger. Troubleshoot LLT and GAB.
Objectives
After completing this lesson, you will be able to: Monitor system and cluster status. Apply troubleshooting techniques in a VCS environment. Detect and solve VCS communication problems. Identify and solve VCS engine problems. Correct service group problems. Solve problems with agents. Resolve problems with resources. Plan for disaster recovery.
Monitoring VCS
VCS log files System log files The hastatus utility SNMP traps Event notification triggers Cluster Manager
Troubleshooting Guide
Primary types of problems:
Cluster communication VCS engine startup Service groups and resources
VCS engine startup problem indicated by systems with WAIT status
Service group and resource problems indicated when VCS engine in RUNNING state
# gabconfig -a
GAB Port Memberships
===================================
Port a gen 24110002 membership 01
Port h gen 65510002 membership
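Output in this form can be checked programmatically. The Python sketch below maps each port letter to its membership string; parse_gab_ports is a hypothetical helper, and it assumes port lines always follow the "Port x gen n membership m" layout shown above. In VCS, port a is GAB itself and port h is the VCS engine, so an empty port h membership points to an engine startup problem.

```python
def parse_gab_ports(output):
    """Map each GAB port letter to its membership string."""
    ports = {}
    for line in output.splitlines():
        fields = line.split()
        # Port lines look like: Port a gen 24110002 membership 01
        if len(fields) >= 5 and fields[0] == "Port" and fields[4] == "membership":
            ports[fields[1]] = "".join(fields[5:])
    return ports
```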
STALE_ADMIN_WAIT
To recover from STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine whether it is valid.
2. Edit the main.cf file, if necessary.
3. Verify the syntax of main.cf, if modified.
main.cf file:
ADMIN_WAIT
A system can be in the ADMIN_WAIT state under these circumstances:
    A .stale flag exists and the main.cf file has a syntax problem.
    A disk error occurs affecting main.cf during a local build.
    The system is performing a remote build and the last running system fails.
hagrp -display service_group
Service group not configured to run on the system:
Check the SystemList attribute. Verify that the system name is included.
Check the Probed attribute for resources:
    hares -display
To probe resources:
    hares -probe resource -sys system
Concurrency Violations
Occurs when a failover service group is online or partially online on more than one system
Notification provided by the Violation trigger:
    Invoked on the system that caused the concurrency violation
    Notifies the administrator and takes the service group offline on the system causing the violation
    Configured by default with the violation script in /opt/VRTSvcs/bin/triggers
    Can be customized:
        Send message to the system log.
        Display warning on all cluster systems.
        Send e-mail messages.
Clearing Faults
After external problems are fixed:
1. Clear any faults on nonpersistent resources:
       hares -clear resource -sys system
   Check attribute fields for incorrect or missing data.
2. Flush wait states:
       hagrp -flush service_group -sys system
   Bring resources offline first before bringing them online.
Summary
You should now be able to: Monitor system and cluster status. Apply troubleshooting techniques in a VCS environment. Identify and solve VCS engine problems. Correct service group problems. Solve problems with agents. Resolve problems with resources. Plan for disaster recovery.
Lab Exercise
Lesson 14 Troubleshooting
Overview
This lesson provides a guide for managing certain situations in a cluster environment:
    VCS upgrades
    VCS patches
    System changes: Adding, removing, and replacing cluster systems
Objectives
After completing this lesson, you will be able to: Upgrade VCS software to version 2.0 from any earlier versions. Install a VCS patch. Add systems to a running VCS cluster. Remove systems from a running VCS cluster. Replace systems in a running VCS cluster.
II. Stop the existing VCS software.
III. Remove the existing VCS software and add the new VCS version.
IV. Verify the configuration and make changes as needed.
V. Start VCS on one system and propagate the configuration to others.
Done
Open the cluster configuration and freeze all service groups persistently:
    haconf -makerw
    hagrp -list
    hagrp -freeze group_name -persistent
Stop the VCS engine on all systems leaving the application services running:
    hastop -all -force
Compare and merge any necessary changes to VCS scripts. Verify the configuration files:
    hacf -verify /etc/VRTSvcs/conf/config
Start the VCS engine on the system where the changes were made:
    hastart
Start the VCS engine on all other systems in the cluster in a stale state:
    hastart -stale
Open the configuration, unfreeze the service groups, and save and close the configuration:
    haconf -makerw
    hagrp -unfreeze group_name -persistent
    haconf -dump -makero
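When a cluster has many service groups, the unfreeze step can be scripted. The Python sketch below only builds the command strings shown above for a given list of group names and does not execute them; unfreeze_commands and the sample group names are hypothetical.

```python
def unfreeze_commands(groups):
    """Build the command sequence that unfreezes each service group."""
    cmds = ["haconf -makerw"]  # open the configuration
    cmds += ["hagrp -unfreeze %s -persistent" % g for g in groups]
    cmds.append("haconf -dump -makero")  # save and close the configuration
    return cmds
```

The group names would normally come from the output of hagrp -list.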
Add the new VCS patch on each system using the provided utility.
./vcs_install_patch
Open the configuration, unfreeze the service groups, and save and close the configuration:
    haconf -makerw
    hagrp -unfreeze group_name -persistent
    haconf -dump -makero
network.
the cluster to add the system name and node ID of the new system.
5. Start LLT, GAB, and VCS on the new system. 6. Change the SystemList attribute for each
Switch all running service groups to other systems and freeze the system.
Stop VCS on the system using hastop -local.
Stop GAB on the system:
    gabconfig -U
    modinfo | grep gab
    modunload -i modid
Edit /etc/llthosts on all systems to delete the entry for the system to be removed.
Remove llttab and gabtab files on that system.
1. Evacuate any service groups running on the system to be replaced.
2. Make the VCS configuration read/write, freeze the system persistently, save and close the configuration.
       haconf -makerw
       hasys -freeze -persistent system_name
       haconf -dump -makero
3. Physically replace the system with a new one using the same VCS configuration (same cluster number, node ID, and system name).
4. Connect the new system to the private network.
5. Start LLT, GAB, and VCS on the new system.
6. Make the VCS configuration read/write, unfreeze the system, save and close the configuration.
       haconf -makerw
       hasys -unfreeze -persistent system_name
       haconf -dump -makero
Summary
You should now be able to: Upgrade VCS software to version 2.0. Install a VCS patch. Add systems to a running VCS cluster. Remove systems from a running VCS cluster. Replace systems in a running VCS cluster.
Introduction
[Figure: applications and services (NFS, WWW, FTP, DB) running under VCS on systems connected by a public network and a private network.]
VCS Features
Availability
Monitor and restart
Scalability
Distribute services Add systems and
Network
Clustered Databases
Manageability
Use Java or Web
HA management software
Site replication Fault detection, notification, and failover Storage management Backup and recovery
Redundant hardware
Power supplies Network interface cards, hubs, switches Storage
Data Replication
VERITAS VVR & Support for Array-Based Replication
Parallel Extensions
VERITAS Cluster Volume Manager and File System
Foundation Products
VERITAS Volume Manager and File System
[Figure: VCS clusters in Tokyo and London.]
Course Overview
Troubleshooting
Cluster Communication Installing Applications Resources and Agents Managing Cluster Service NFS Resources Using Cluster Manager
Introduction
Installing VCS
Lab Overview
[Figure: lab systems train1 and train2 (Red student and Blue student) with shared SCSI JBOD storage, a private network, and a public network.]