
Please turn off pagers and cell phones during class.

Thank you.

VERITAS Cluster Server for Solaris


Lesson 1 VCS Terms and Concepts

Overview
Course roadmap: Introduction; Terms and Concepts; Installing VCS; Managing Cluster Services; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Faults and Failovers; Cluster Communication; Installing Applications; Using Volume Manager; Event Notification; Troubleshooting.

Objectives
After completing this lesson, you will be able to:
- Define VCS terminology.
- Describe cluster communication basics.
- Describe VERITAS Cluster Server architecture.

Clusters
(Diagram: several systems on a local area network, attached through fibre switches and SCSI buses to shared storage, including SCSI JBODs.)
- Several networked systems
- Shared storage
- Single administrative entity
- Peer monitoring

Systems
- Members of a cluster, referred to as nodes
- Contain copies of: communication protocol configuration files, VCS configuration files, VCS libraries and directories, and VCS scripts and daemons
- Share a single dynamic cluster configuration
- Provide application services

Service Groups
- A service group is a related collection of resources.
- Resources in a service group must be available to the system.
- Resources and service groups have interdependencies.
(Diagram: an NFS service group containing IP, Share, NIC, NFS, Mount, and Disk resources.)

Service Group Types
Failover:
- Can be partially or fully online on only one server at a time
- VCS controls stopping and restarting the service group when components fail
Parallel:
- Can be partially or fully online on multiple servers simultaneously
- Examples: Oracle Parallel Server; Web and FTP servers

Resources
- VCS objects that correspond to hardware or software components
- Monitored and controlled by VCS
- Classified by type
- Identified by unique names and attributes
- Can depend on other resources within the same service group

Resource Types
- General description of the attributes of a resource
- Example Mount resource type attributes: MountPoint, BlockDevice
- Other example resource types: Disk, Share, IP, NIC

Agents
- Processes that control resources
- One agent per resource type; the agent controls all resources of that type
- Agents can be added into the VCS agent framework
(Diagram: the Mount agent managing /data; the Disk agent managing c1t0d0s0 and c1t0d1s0; the NIC agent managing hme0 and qfe1; the IP agent managing 10.1.2.4.)

Dependencies
- Resources can depend on other resources; parent resources depend on child resources.
- Service groups can depend on other service groups.
- Resource types can depend on other resource types.
- Rules govern service group and resource dependencies.
- No cyclic dependencies are allowed.
(Diagram: a Mount resource (parent) depends on a Disk resource (child).)

Private Network
- Minimum two communication channels with separate infrastructure: multiple NICs (not just ports), and separate hubs, if used
- Heartbeat communication determines which systems are members of the cluster.
- Cluster configuration broadcasts update cluster systems with the status of each resource and service group.

Low Latency Transport (LLT)
- Provides fast, kernel-to-kernel communications
- Is connection oriented
- Is not routable
- Uses the Data Link Provider Interface (DLPI) over Ethernet
(Diagram: LLT runs in the kernel on SystemA and SystemB, communicating over the private network hardware.)

Group Membership Services/Atomic Broadcast (GAB)
- Manages cluster membership
- Maintains cluster state
- Uses broadcasts
- Runs in the kernel over Low Latency Transport (LLT)
(Diagram: GAB runs in the kernel above LLT on SystemA and SystemB.)

VCS Engine (had)
- Maintains configuration and state information for all cluster resources
- Uses GAB to communicate among cluster systems
- Is monitored by the hashadow process
(Diagram: on each system, had and hashadow run in user space over GAB and LLT in the kernel, communicating across the private network.)

VCS Architecture
(Diagram: on each system, agents (Mount, Disk, NIC, IP) manage resources such as /v, c1d0t0s0, hme0, and 10.1.2.4 and report to had, which is monitored by hashadow. had communicates through GAB and LLT in the kernel over the private network, so the cluster configuration is shared in memory across SystemA and SystemB.)

Summary
You should now be able to:
- Define VCS terminology.
- Describe cluster communication basics.
- Describe VERITAS Cluster Server architecture.

VERITAS Cluster Server for Solaris


Lesson 2 Installing VERITAS Cluster Server

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Describe VCS software, hardware, and licensing prerequisites.
- Describe the general VCS hardware requirements.
- Configure SCSI controllers for a shared disk storage environment.
- Add VCS executable and manual page paths to the environment variables.
- Install VCS using the installation script.

Software and Hardware Requirements
Software:
- Solaris 2.6, 7, and 8 (32-bit and 64-bit)
- Recommended: Solaris patches
- VERITAS Volume Manager (VxVM) 3.1.P1 or later
- VERITAS File System (VxFS) 3.3.1 or later
Hardware:
- Check the latest VCS release notes.
- Contact VERITAS Support.
Licenses:
- Keys are required on a per-system or per-site basis.
- Contact VERITAS Sales for new licenses, or VERITAS Support for upgrades.

General Hardware Layout
(Diagram: System A and System B, each with a local OS disk, connected by private Ethernet heartbeat links on dedicated NICs, by shared SCSI buses (SCSI1 and SCSI2) to the shared data disks, and by additional NICs to the public network.)

SCSI Controller Configuration
(Diagram: System A and System B share data disks on the SCSI1 and SCSI2 buses, with the disks at SCSI target IDs 1 through 4; each system also has a local OS disk. System A's controllers use scsi-initiator-id 5, while System B's keep the default scsi-initiator-id 7, so the two initiators do not conflict on the shared buses.)

SCSI Controller Setup
- Use unique SCSI initiator IDs for each system.
- Check the scsi-initiator-id setting using the eeprom command.
- Change the scsi-initiator-id if needed.
- The controller ID can also be changed on a controller-by-controller basis.
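As a concrete sketch (the ID value 5 matches the diagram on the previous slide; your site's value may differ), checking and changing the global initiator ID from a running system looks like this:

eeprom scsi-initiator-id        # display the current setting
eeprom scsi-initiator-id=5      # set a new global initiator ID
init 6                          # reboot so the OpenBoot PROM setting takes effect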

Setting Environment Variables
For Bourne or Korn shell (sh or ksh), add to /.profile:
PATH=$PATH:/sbin:/opt/VRTSvcs/bin:/opt/VRTSllt
export PATH
MANPATH=$MANPATH:/opt/VRTS/man
export MANPATH
For C shell (csh or tcsh):
setenv PATH ${PATH}:/sbin:/opt/VRTSvcs/bin:/opt/VRTSllt
setenv MANPATH ${MANPATH}:/opt/VRTS/man

The installvcs Utility
- Uses pkgadd to install the VCS packages on all the systems in the cluster: VRTSllt, VRTSgab, VRTSperl, VRTSvcs, VRTSweb, VRTSvcsw, VRTSvcsdc
- Requires remote root access to the other systems in the cluster while the script is being run (/.rhosts file). Note: You can remove the .rhosts files after VCS installation.
- Configures two private network links for VCS communications
- Brings the cluster up without any services
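The remote root access requirement is usually satisfied with /.rhosts entries on each node; a minimal sketch for the class systems train7 and train8 would be:

train7 root
train8 root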

Installation Settings
Information required by installvcs:
- Cluster name
- Cluster number
- System names
- License key
- Network ports for the private network
- Web Console configuration: virtual IP address, subnet mask, network interface
- SMTP/SNMP notification configuration (discussed later)

Starting VCS Installation
# ./installvcs
Please enter the unique Cluster Name : mycluster
Please enter the unique Cluster ID (a number from 0-255) : 200
Enter the systems on which you want to install.
(system names separated by spaces) : train7 train8
Analyzing the system for install.
Enter the license key for train7 : XXXX XXXX
Applying the license key to all systems in the cluster

Installing the Private Network
Following is the list of discovered NICs:
Sr. No.  NIC Device
1.       /dev/hme:0
2.       /dev/qfe:0
3.       /dev/qfe:1
4.       /dev/qfe:2
5.       /dev/qfe:3
6.       Other
From the list above, please enter the serial number (the number appearing in the Sr. No. column) of the NIC for First PRIVATE network link: 1
From the list above, please enter the serial number (the number appearing in the Sr. No. column) of the NIC for Second PRIVATE network link: 2
Do you have the same network cards set up on all systems (Y/N)? y

Configuring the Web Console
Do you want to configure the Cluster Manager (Web Console) (Y/N)[Y] ? y
Enter the Virtual IP address for the Web Server : 192.168.27.9
Enter Subnet [255.255.255.0]: <enter>
Enter the NIC Device for this Virtual IP address (public network) on train7 [hme0]: <enter>
Do you have the same NIC Device on all other systems (Y/N)[Y] ? y
Do you want to configure SNMP and/or SMTP (e-mail) notification (Y/N)[Y] ? n
Summary information for ClusterService Group setup:
Cluster Manager (Web Console):
  Virtual IP Address  : 192.168.27.9
  Subnet              : 255.255.255.0
  Public Network link : train7 train8 : hme0
  URL to access       : http://192.168.27.9:8181/vcs

Completing VCS Installation
Installing on train7. Copying VRTSperl binaries. .....
Installing on train8. Copying VRTSperl binaries. ....
Copying Cluster configuration files... Done.
Installation successful on all systems.
Installation can start the Cluster components on the following system/s.
train7 train8
Do you want to start these Cluster components now (Y/N)[Y] ? y
Loading GAB and LLT modules and starting VCS on train7:
Starting LLT...Start GAB....Start VCS
Loading GAB and LLT modules and starting VCS on train8:
Starting LLT...Start GAB....Start VCS

Summary
You should now be able to:
- Describe VCS software, hardware, and licensing prerequisites.
- Describe the general VCS hardware requirements.
- Configure SCSI controllers for a shared disk storage environment.
- Add VCS executable and manual page paths to the environment variables.
- Install VCS using the installation script.

Lab 2: Installing VCS
(Diagram: train1 and train2, each with a local OS disk, share data disks at SCSI target IDs 1 through 4 on the SCSI1 and SCSI2 buses; one system uses scsi-initiator-id 5 and the other the default 7.)
# ./installvcs

VERITAS Cluster Server for Solaris


Lesson 3 Managing Cluster Services

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Describe the cluster configuration mechanisms.
- Start the VCS engine on cluster systems.
- Stop the VCS engine.
- Modify the cluster configuration.
- Describe cluster transition states.

Cluster Configuration
(Diagram: on SystemA and SystemB, had, monitored by hashadow, loads main.cf and maintains the shared cluster configuration in memory, communicating over GAB and LLT.)

Starting VCS
(Diagram: running hastart on System1 starts had and hashadow; had reads a valid main.cf, builds the cluster configuration in memory, and offers it over the private network. System2, also started with hastart but holding no valid configuration, waits; System3 is not yet started.)

Starting VCS: Second System
(Diagram: System2 builds its in-memory cluster configuration remotely from System1 over the private network and writes its own copy of main.cf.)

Starting VCS: Third System
(Diagram: System1, System2, and System3 all run had and hashadow and share the cluster configuration in memory over the private network.)

Stopping VCS
(Diagram: three ways to stop VCS on System1 while System2 keeps running:
1. hastop -local takes service groups SGA and SGB offline on System1 and stops had.
2. hastop -local -evacuate migrates SGA and SGB to System2 before stopping had.
3. hastop -local -force stops had but leaves the service groups' applications running.)

The hastop Command
The hastop command stops the VCS engine.
Syntax:
hastop -option [arg] [-option]
Options:
-local [-force | -evacuate]
-sys sys_name [-force | -evacuate]
-all [-force]
Example:
hastop -sys train4 -evacuate

Displaying Cluster Status
The hastatus command displays the status of items in the cluster.
Syntax:
hastatus -option [arg] [-option arg]
Options:
-group service_group
-sum[mary]
Example:
hastatus -group OracleSG

Protecting the Cluster Configuration
1. haconf -makerw: the cluster configuration is opened and a .stale file is created.
2. hares -add ...: resources are added to the cluster configuration in memory; main.cf is now out of sync with the in-memory configuration.
3. haconf -dump -makero: changes are saved to disk and the .stale file is removed.
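Putting the three steps together, a typical change session might look like the following (the resource name, type, and group are illustrative):

haconf -makerw                      # open the configuration; .stale created
hares -add myNFSDisk Disk myNFSSG   # modify the in-memory configuration
haconf -dump -makero                # save main.cf and close; .stale removed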

Opening and Saving the Cluster Configuration
The haconf command opens, closes, and saves the cluster configuration.
Syntax:
haconf -option [-option]
Options:
-makerw         Opens configuration
-dump           Saves configuration
-dump -makero   Saves and closes configuration
Example:
haconf -dump -makero

Starting VCS with a Stale Configuration
(Diagram: a system whose configuration directory contains main.cf and a .stale file is started with hastart; had does not build from the stale configuration, but waits on the private network for a running configuration from which it can build.)

Forcing VCS to Start on the Local System
(Diagram: hastart -force on a system whose configuration directory contains main.cf and a .stale file makes had build the cluster configuration from the local main.cf anyway.)

Forcing a System to Start
(Diagram: all three systems hold a .stale file, so each had waits. Running hasys -force System2 makes System2 build the cluster configuration from its local main.cf; the other systems then build remotely from it over the private network.)

The hasys Command
The hasys command alters or queries the state of had.
Syntax:
hasys -option [arg]
Options:
-force system_name
-list
-display system_name
-delete system_name
-add system_name
Example:
hasys -force train11

Propagating a Specific Configuration
1. Stop VCS on all systems in the cluster, leaving applications running:
hastop -all -force
2. Start VCS stale on all other systems:
hastart -stale
The -stale option causes these systems to wait until a running configuration is available from which they can build.
3. Start VCS on the system with the main.cf that you are propagating:
hastart

Summary of Start Options
The hastart command starts the had and hashadow daemons.
Syntax:
hastart [-option]
Options:
-stale
-force
Example:
hastart -force

Validating the Cluster Configuration
The hacf utility checks the syntax of the main.cf file.
Syntax:
hacf -verify config_directory
Example:
hacf -verify /etc/VRTSvcs/conf/config

Modifying Cluster Attributes
The haclus command is used to view and change cluster attributes.
Syntax:
haclus -option [arg]
Options:
-display
-help [-modify]
-modify modify_options
-value attribute
-notes
Example:
haclus -value ClusterLocation

Startup States and Transitions
(State diagram, summarized: hastart moves a system from UNKNOWN through INITING to CURRENT_DISCOVER_WAIT if there is a valid configuration on disk, or to STALE_DISCOVER_WAIT if the configuration is stale. From either wait state, the system goes to ADMIN_WAIT if a peer is in ADMIN_WAIT, or to REMOTE_BUILD if a peer is in LOCAL_BUILD or RUNNING. With a valid configuration and no running peer, the system performs a LOCAL_BUILD and enters RUNNING; a disk error during LOCAL_BUILD leads to ADMIN_WAIT. With only a stale configuration and no building peer, the system sits in STALE_ADMIN_WAIT until a peer starts a LOCAL_BUILD. CURRENT_PEER_WAIT and STALE_PEER_WAIT are entered while waiting on a peer, for example when the only peer in the RUNNING state crashes.)

Shutdown States and Transitions
(State diagram, summarized: from RUNNING, hastop moves the system to LEAVING; resources are taken offline and agents stopped, then the system moves through EXITING to EXITED. hastop -force moves the system to EXITING_FORCIBLY, leaving applications running. If the running configuration is lost, the system enters ADMIN_WAIT; if had exits unexpectedly, the system is marked FAULTED.)

Summary
You should now be able to:
- Describe the cluster configuration mechanisms.
- Start VCS.
- Stop VCS.
- Modify the cluster configuration.
- Explain the transition states of the cluster.

Lab 3: Managing Cluster Services
To complete this lab exercise:
- Use commands to start and stop cluster services, as described in the detailed lab instructions.
- Observe the cluster status by running hastatus in a terminal window.

VERITAS Cluster Server for Solaris


Lesson 4 Using the Cluster Manager Graphical User Interface

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Install Cluster Manager.
- Control access to VCS administration.
- Demonstrate Cluster Manager features.
- Create a service group.
- Create resources.
- Manage resources and service groups.
- Use the Web Console to administer VCS.

Installing Cluster Manager
Cluster Manager requirements on Solaris:
- 128 MB RAM
- 1280 x 1024 display resolution
- Minimum 8-bit monitor color depth; 24-bit is recommended
To install Cluster Manager:
pkgadd -d pkg_location VRTScscm

Cluster Manager Properties
- Can be run from a remote system: Windows NT, or a Solaris system (cluster member or nonmember)
- Can manage multiple clusters from a single workstation
- Uses TCP port 14141 by default; to change the port, add an entry such as the following to /etc/services:
vcs 12345/tcp

Controlling Access to VCS: User Accounts
Cluster Administrator: full privileges
Cluster Operator: all cluster, service group, and resource-level operations
Cluster Guest: read-only access; new users are created as Cluster Guest accounts by default
Group Administrator: all service group operations for a specified service group, except deleting service groups
Group Operator: can online and offline service groups and resources; can temporarily freeze or unfreeze service groups

VCS User Account Hierarchy
Each account level includes the privileges of the levels below it:
Cluster Administrator > Cluster Operator > Group Administrator > Group Operator > Cluster Guest

Adding Users and Setting Privileges
- The cluster configuration must be open.
- Users are added using the hauser command:
hauser -add username
- Additional privileges can then be added:
haclus -modify Administrators -add user
haclus -modify Operators -add user
hagrp -modify group Administrators -add user
hagrp -modify group Operators -add user
- The VCS user account admin is created with Cluster Administrator privilege by the installvcs utility.

Modifying User Accounts
To display account information:
hauser -display user_name
To change a password:
hauser -update user_name
To delete a VCS user account:
hauser -delete user_name

Controlling Access to the VCS Command Line Interface
- By default there is no mapping between UNIX and VCS user accounts, except root, which has Cluster Administrator privilege.
- Nonroot users are prompted for a VCS account name and password when executing VCS commands using the command line interface.
- The cluster attribute AllowNativeCliUsers can be set to map UNIX account names to VCS accounts. A VCS account must exist with the same name as the UNIX user, with appropriate privileges.
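For example, enabling the mapping is an ordinary cluster attribute change, wrapped in an open and save of the configuration:

haconf -makerw
haclus -modify AllowNativeCliUsers 1
haconf -dump -makero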

Cluster Manager Demonstration
The Cluster Manager demonstration covers:
- Configuration and logging on
- Creating a service group and a resource
- Manual and automatic failover
- Log Desk, Command Log, Command Center, and Cluster Shell
Refer to your participant's guide; the steps are listed in the notes. If the demonstration cannot be given in class, the following slides walk through it.

(Screenshot walkthrough slides: Configuring Cluster Manager (started with hagui &); Logging In to Cluster Manager (cluster panel, member systems, heartbeats, service groups); VCS Cluster Explorer; Creating a Service Group; Creating a Resource; Bringing a Resource Online; Resource and Service Group Status; Switching the Service Group to Another System; Service Group Switched; Changing MonitorInterval; Setting the Critical Attribute; Faulted Resources; Clearing a Faulted Resource; Log Desk; Command Log; Command Center; Shell Tool.)

Administering User Profiles
(Screenshots: adding a user account; removing or modifying a user account.)

Using the Web Console
Web Console:
- Manages existing resources and service groups: online and offline operations; clearing faults and probing resources; switching, flushing, and freezing service groups
- Cannot be used to create resources or service groups
- Runs on any system with a Java-enabled Web browser
Java Console:
- Configures service groups and resources: add, delete, modify
- Can be used for all VCS administrative tasks
- Requires Cluster Manager and Java to be installed on the administration system

Connecting to the Web Console
Point a browser at http://IP_alias:8181/vcs and log in with a VCS account and password.

Cluster Summary
(Screenshot: the cluster summary page, with display refresh and navigation buttons and log entries.)

System View
(Screenshot: the system view page, with the selected view and a navigation trail.)

Summary
You should now be able to:
- Install Cluster Manager.
- Control access to VCS administration.
- Demonstrate Cluster Manager features.
- Create a service group.
- Create resources.
- Manage resources and service groups.
- Use the Web Console to administer VCS.

Lab 4: Using Cluster Manager
(Diagram: Student Red builds service group RedGuiSG containing resource RedFile, which manages /tmp/RedFile; Student Blue builds BlueGuiSG containing BlueFile, which manages /tmp/BlueFile.)

VERITAS Cluster Server for Solaris


Lesson 5 Service Group Basics

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Describe how application services relate to service groups.
- Translate application requirements to service group resources.
- Define common service group attributes.
- Create a service group using the command line interface.
- Perform basic service group operations.

Application Service
(Diagram: database requests arrive over the network at an IP address configured on a NIC; the database software reads and writes its data and log storage.)

High Availability Applications
VCS must be able to perform these operations on an application:
- Start it using a defined startup procedure.
- Stop it using a defined shutdown procedure.
- Monitor it using a defined procedure.
- Share its storage with other systems; the application must store data to disk, rather than maintaining it in memory.
- Restart it to a known state.
- Migrate it to other systems.

Example Service Groups
(Diagram: a Web parallel service group is online on both SystemA and SystemB at the same time; a Database failover service group is online on only one of them.)

Analyzing Applications
1. Specify application services corresponding to service groups.
2. Determine the high availability level and service group type, failover or parallel.
3. Specify which systems run which services and the desired failover policy.
4. Identify the hardware and software objects required for each service group and their dependencies.
5. Map the service group resources to actual hardware and software objects.

Example Application Services
Database service group: database processes; /oracle/data; /oracle/log; c1t1d0s5; c1t2d0s4; 192.168.3.55; qfe1
Web service group: httpd; /data; c1t3d0s3; 192.168.3.56; qfe1

Identify Physical Resources
Database service group:
- Database application
- File system /oracle/data (contains the data files) on physical disk 1, c1t1d0s5
- File system /oracle/log (contains the log files) on physical disk 2, c1t2d0s4
- IP address 192.168.3.55 on network port qfe1

Map Physical Objects to VCS Resources
The database service group in the example requires:
- Two Disk resources to monitor the availability of the shared log disk and the shared data disk
- Two Mount resources that mount, unmount, and monitor the required log and data file systems
- A NIC resource to check the network connectivity on port qfe1
- An IP resource to configure the IP address that will be used by database clients to access the database
- An Oracle resource to start, stop, and monitor the Oracle database application

Service Groups
Create a service group using the command line interface:
Syntax: hagrp -add group_name
Example: hagrp -add mySG
Modify service group attributes to define behavior:
hagrp -modify group_name attribute value [values]

SystemList Attribute
- Defines the systems that can run the service group
- The lowest numbered system has the highest priority in determining the target system for failover.
To define the SystemList attribute:
Syntax:
hagrp -modify group_name SystemList system1 priority1 system2 priority2
Example:
hagrp -modify mySG SystemList train1 0 train2 1

AutoStart and AutoStartList Attributes
A service group is automatically started on a system when VCS is started (if it is not already online somewhere else in the cluster) under the following conditions:
- The AutoStart attribute is set to 1.
- The system is listed in its AutoStartList attribute.
- The system is listed in its SystemList attribute.
To define the AutoStart attribute (default is 1):
hagrp -modify group_name AutoStart value
To define the AutoStartList attribute:
hagrp -modify group_name AutoStartList system1 system2
Examples:
hagrp -modify myManualSG AutoStart 0
hagrp -modify mySG AutoStartList train0

AutoStartIfPartial Attribute
- Allows VCS to bring a service group with disabled resources online
- All enabled resources must be probed.
- Default is 1 (enabled); if 0, the service group cannot come online with disabled resources.
To define the AutoStartIfPartial attribute:
Syntax:
hagrp -modify group_name AutoStartIfPartial value
Example:
hagrp -modify group_name AutoStartIfPartial 0

Parallel Attribute
Parallel service groups:
- Run on more than one system at the same time
- Respond to system faults by staying online on the remaining systems, or by failing over to the specified target system
To set the Parallel attribute:
Syntax:
hagrp -modify group_name Parallel value
Example:
hagrp -modify myparallelSG Parallel 1
- The Parallel attribute must be set before adding resources.
- Default value: 0 (failover)

Configuring a Service Group
(Flowchart: add the service group; set SystemList; set optional attributes; add and test each resource (see the resource flowchart); link resources; test switching; test failover; set critical resources. If a step fails, check the logs and fix before continuing; when there are no more resources to add, the group is done.)
Service Group Operations
Bringing the service group online:
hagrp -online group_name -sys system_name
Taking the service group offline:
hagrp -offline group_name -sys system_name
Displaying service group properties:
hagrp -display group_name
Example command lines:
hagrp -online oraclegroup -sys train8
hagrp -offline oraclegroup -sys train8
hagrp -display oraclegroup

Bringing a Service Group Online
(Diagram: before, all resources are offline; while the group comes online, the child resources (Disk, NIC) come online first, then Mount and IP, and finally the Oracle process; after, the whole dependency tree is online.)

Taking a Service Group Offline
(Diagram: the reverse of onlining; the parent resources (the Oracle process, then IP and Mount) go offline first, and the child resources (NIC, Disk) go offline last.)

Partially Online Service Groups
A service group is partially online if:
- One or more nonpersistent resources are online, and
- At least one resource that is autostart-enabled and critical is offline.
(Diagram: a dependency tree in which some resources are online while others, such as the Oracle process, remain offline.)

Switching a Service Group
A manual failover can be accomplished by taking the service group offline on one system and bringing it online on another system. To switch a service group from one system to another with a single command:
Syntax:
hagrp -switch group_name -to system_name
Example:
hagrp -switch mySG -to train8
To switch using Cluster Manager:
Right-click the group > Switch To > system.

Flushing a Service Group
Misconfigured resources can cause agent processes to hang. Flush the service group to stop all pending online and offline operations.
To flush a service group using the command line:
Syntax:
hagrp -flush group_name -sys system_name
Example:
hagrp -flush mySG -sys train8
To flush a service group using Cluster Manager:
Right-click the group > Flush > system.

Deleting a Service Group
Before deleting a service group:
1. Bring all resources offline.
2. Disable the resources.
3. Delete the resources.
To delete a service group using the command line:
Syntax: hagrp -delete group_name
Example: hagrp -delete mySG
To delete a service group using Cluster Manager:
Right-click the group > Delete.

Summary
You should now be able to:
- Describe how application services relate to service groups.
- Translate application requirements to service group resources.
- Define common service group attributes.
- Create a service group using the command line interface.
- Perform basic service group operations.

Lab 5: Creating Service Groups
(Diagram: Student Red creates RedNFSSG alongside the existing RedGuiSG; Student Blue creates BlueNFSSG alongside BlueGuiSG.)

VERITAS Cluster Server for Solaris


Lesson 6 Preparing Resources

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Describe the components required to create and share a file system using NFS.
- Prepare NFS resources.
- Describe the VCS network environment.
- Manually migrate the NFS services between two systems.
- Describe the process of automating high availability.

Operating System Components Related to NFS
File system-related resources:
- Hard disk partition
- File system to be mounted
- Directory to be shared
- NFS daemons
Network-related resources:
- IP address
- Network interface

Disk Resources
(Diagram: System 1 and System 2 both address the same shared partition, /dev/(r)dsk/c1t1d0s3, which is partition 3 of the shared disk disk1.)

File System and Share Resources
(Diagram: on either system, the vxfs file system on /dev/(r)dsk/c1t1d0s3 is mounted at /data and served by the nfsd and mountd daemons; the underlying storage is shared partition 3 of disk1.)

Creating File System Resources
Format a disk and create a slice:
- Needs to be done on one system only
- Use the format command.
- The device must have the same major and minor numbers on both systems (for NFS).
Create a file system on the slice, from one system only:
mkfs -F fstype /dev/rdsk/device_name
(You can use newfs for UFS file systems.)
Create a directory for a mount point on each system:
mkdir /mount_point
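A quick way to compare the device numbers across systems (device name as in the examples later in this course) is to list the block device on each node and compare the major,minor pair in the output:

ls -lL /dev/dsk/c1t1d0s3    # run on each system; the numbers such as 32,134 must match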

Sharing the File System
1. Mount the file system. (The file system should not be mounted automatically at boot time.) Check the file system first, if necessary:
fsck -F fstype /dev/rdsk/device_name
mount -F fstype /dev/dsk/device_name mount_point
2. Start the NFS daemons, if they are not already running:
/usr/lib/nfs/nfsd -a nservers
/usr/lib/nfs/mountd
3. Share the file system. (The file system should not be shared automatically at boot time.)
share mount_point

NFS Resource Dependencies
(Dependency diagram: the Share depends on the File System and on NFS; the File System depends on the Disk Partition.)

IP Addresses in a VCS Environment
Administrative IP addresses:
- Associated with the physical network interface, such as qfe1
- Assigned a unique hostname and IP address by the operating system at boot time
- Available only when the system is up and running
- Used for checking network connectivity
- Also called base or maintenance IP addresses
Application IP addresses:
- Added as a virtual IP address to the network interface, such as qfe1:1
- Associated with an application service
- Controlled by the high availability software
- Migrated to other systems if the current system fails
- Also called service group or floating IP addresses

Configuring an Administrative IP Address
1. Create /etc/hostname.interface containing the desired interface name:
vi /etc/hostname.qfe1
train14_qfe1
2. Edit /etc/hosts and assign an IP address to the interface name:
vi /etc/hosts
166.98.112.14   train14_qfe1
3. Reboot the system.
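After the reboot, it is worth confirming that the interface came up as expected (interface and hostname as in the example above):

ifconfig qfe1           # should show the administrative address with the UP flag
ping train14_qfe1       # resolves through /etc/hosts and checks reachability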

Configuring Application IP Addresses
- Requires the administrative IP address to be configured on the interface
- Do not create a hostname file for the application address.
To set up manually:
1. Configure the IP address using ifconfig:
ifconfig qfe1:1 inet 166.98.112.114 netmask +
2. Bring up the IP address:
ifconfig qfe1:1 plumb
ifconfig qfe1:1 up
3. Assign a virtual hostname (the application service name) to the IP address:
vi /etc/hosts
166.98.112.114   nfs_services
Clients use the application IP address to connect to the application services.

NFS Services Resource Dependencies
(Dependency diagram: the Application IP depends on the Share and on the Network Interface; the Share depends on the File System and on NFS; the File System depends on the Disk Partition.)

Monitoring NFS Resources
To verify the file system:
mount | grep mount_point
To verify the disk:
prtvtoc /dev/dsk/device_name
Alternately:
touch /mount_point/sub_dir/.testfile
rm /mount_point/sub_dir/.testfile
To verify the share:
share | grep mount_point
To verify the NFS daemons:
ps -ef | grep nfs

Monitoring the Network
To verify network connectivity, use ping to connect to other hosts on the same subnet as the administrative IP address:
ping 166.98.112.253
166.98.112.253 is alive
To verify the application IP address, use ifconfig to determine whether the IP address is up:
ifconfig -a

Migrating NFS Services
1. Make sure that the target system is available.
2. Make sure that the disk is accessible from the target system.
3. Make sure that the target system is connected to the network.
4. Bring the NFS services down on the first system, following the dependencies:
a. Configure the application IP address down.
b. Stop sharing the file system.
c. Unmount the file system.
5. Bring the NFS services up on the target system, following the resource dependencies:
a. Check and mount the file system.
b. Start the NFS daemons if they are not already running.
c. Share the file system.
d. Configure and bring up the application IP address.
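As a command-level sketch, the manual migration maps to steps like these (device, mount point, interface, and address follow the earlier examples in this lesson):

On the first system:
ifconfig qfe1:1 down                       # take the application IP address down
unshare /data                              # stop sharing the file system
umount /data                               # unmount the file system

On the target system:
fsck -F vxfs /dev/rdsk/c1t1d0s3            # check the file system
mount -F vxfs /dev/dsk/c1t1d0s3 /data      # mount it
share /data                                # NFS daemons must already be running
ifconfig qfe1:1 plumb
ifconfig qfe1:1 inet 166.98.112.114 netmask +
ifconfig qfe1:1 up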

Automating High Availability
- Resources are created once; this is not part of HA operation.
- Script the monitoring process: How often should each resource be monitored? What is the impact of monitoring on processing power? Are there any resources to be monitored on the target system even before failing over?
- Script the start and stop processes.
- Use high availability software to automate: maintaining communication between systems to verify that the target system is available for failover; observing dependencies during starting and stopping; defining actions to take when a fault is detected.

Summary
You should now be able to:
- Describe the components required to create and share a file system using NFS.
- Prepare NFS resources.
- Describe the VCS network environment.
- Manually migrate the NFS services between two systems.
- Describe the process of automating high availability.

Lab 6: Preparing NFS Resources
(Diagram: Student Red prepares c1t8d0s0 mounted at /Redfs for RedNFSSG; Student Blue prepares c1t15d0s0 mounted at /Bluefs for BlueNFSSG.)

VERITAS Cluster Server for Solaris


Lesson 7 Resources and Agents

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Describe how resources and resource types are defined in VCS.
- Describe how agents work.
- Describe cluster configuration files.
- Modify the cluster configuration.
- Use the Disk resource and agent.
- Use the Mount resource and agent.
- Create a service group.
- Configure resources.
- Perform resource operations.

Resources
(Diagram: the NFS service group dependency tree: IP depends on Share and NIC; Share depends on NFS and Mount; Mount depends on Disk.)

Resource Definitions (main.cf)
A resource definition gives the resource type, a unique name, and attribute values:
Mount MyNFSMount (
    MountPoint = "/test"
    BlockDevice = "/dev/dsk/c1t2d0s4"
    FSType = vxfs
)

Nonpersistent and Persistent Resources
- Nonpersistent resources: Operations = OnOff
- Persistent resources: Operations = OnOnly or Operations = None
Example types.cf entry:
type Disk (
    static str ArgList[] = { Partition }
    NameRule = resource.Partition
    static str Operations = None
    str Partition
)

Resource Types
(Diagram: resources NFS_IP, WEB_IP, and ORACLE_IP are all of resource type IP; NFS_NIC_qfe1 and ORACLE_NIC_qfe2 are of resource type NIC.)

Resource Type Definitions (types.cf)
A type definition begins with the type keyword and the unique type name, lists the arguments passed to the agent, gives the name rule, and declares each attribute's type:
type Mount (
    static str ArgList[] = { MountPoint, BlockDevice, FSType, MountOpt, FsckOpt, SnapUmount }
    NameRule = resource.MountPoint
    str MountPoint
    str BlockDevice
    str FSType
    str MountOpt
    str FsckOpt
    int SnapUmount = 0
)

Bundled Resource Types
Application, Disk, DiskGroup, DiskReservation, ElifNone, FileNone, FileOnOff, FileOnOnly, IP, IPMultiNIC, Mount, MultiNICA, NFS, NIC, Phantom, Process, Proxy, ServiceGroupHB, Share, Volume

Agents
Agents:
- Periodically monitor resources and send status information to the VCS engine
- Bring resources online when requested by the VCS engine
- Take resources offline upon request
- Restart resources when they fault (depending on the resource configuration)
- Send a message to the VCS engine and the agent log file when errors are detected

How Agents Work
(Diagram: the VCS engine reads the resource definition from main.cf and instructs the IP agent to online myNFSIP; the agent, using the ArgList from types.cf, runs its online entry point, which in effect executes: ifconfig qfe1:1 192.20.47.11 up.)
types.cf:
type IP (
    static str ArgList[] = { Device, Address, Netmask, Options, ArpDelay, IfconfigTwice }
    ...
)
main.cf:
IP myNFSIP (
    Device = qfe1
    Address = "192.20.47.11"
)

Enterprise Agents
Database Edition / HA 2.2 for Oracle; Informix; VERITAS NetBackup; Oracle; PC NetLink; Sun Internet Mail Server (SIMS); Sybase; VERITAS NetApp; Apache; Firewall (Checkpoint and Raptor); Netscape SuiteSpot

The main.cf File
The main.cf file contains:
- Cluster-wide configuration
- Service groups
- Resources
- Resource dependencies
- Service group dependencies
- Resource types (by way of include statements)

Cluster Definition (main.cf)
The cluster definition includes the type definition files, names the cluster and its Cluster Manager users, and lists the member systems:
include "types.cf"
cluster mycluster (
    UserNames = { admin = "cDRpdxPmHpzS." }
    CounterInterval = 5
)
system train7
system train8

Service Group Definition (main.cf)
A service group definition names the group, sets service group attributes, defines each resource and its attributes, and states the resource dependencies:
group MyNFSSG (
    SystemList = { train8 = 1, train7 = 2 }
    AutoStartList = { train8 }
)
Mount MyNFSMount (
    MountPoint = "/data"
    BlockDevice = "/dev/dsk/c1t1d0s3"
    FSType = vxfs
)
Disk MyNFSDisk (
    Partition = c1t1d0s3
)
MyNFSMount requires MyNFSDisk

Modifying the Cluster Configuration
Online configuration:
- Use Cluster Manager or the command line interface.
- Changes are made in the in-memory configuration on each system while the cluster is running.
- Save the cluster configuration from memory to disk: File > Save Configuration, or haconf -dump.
Offline configuration:
- Edit main.cf.
- Restart VCS.

Modifying Resource Types
Online configuration:
- Use Cluster Manager or the hatype command.
- Save changes to synchronize the in-memory configuration with the configuration files on disk.
Offline configuration:
- Edit types.cf to change existing resource type definitions.
- Edit main.cf to add include statements for new agents with their own types file.
- Restart VCS.

Changing Agent Behavior
- Use Cluster Manager.
- Use the CLI:
hatype -modify Disk MonitorInterval 30
- Edit types.cf:
type Disk (
    static str ArgList[] = { Partition }
    NameRule = group.Name + "_" + resource.Partition
    static str Operations = None
    str Partition
    int MonitorInterval = 30
)

The Disk Resource and Agent
Functions:
- Online: none (the Disk type is persistent)
- Offline: none
- Monitor: determines whether the disk is online by reading from the raw device
Required attributes:
- Partition: the UNIX partition device name (if no path is specified, it is assumed to be in /dev/rdsk)
No optional attributes.
Configuration prerequisites: the UNIX device file must exist.
Sample configuration:
Disk MyNFSDisk (
    Partition = c1t0d0s0
)

The Mount Resource and Agent
Functions:
- Online: mounts a file system
- Offline: unmounts a file system
- Monitor: checks mount status using stat and statvfs
Required attributes:
- BlockDevice: the UNIX file system device name
- FSType: the file system type
- MountPoint: the directory used to mount the file system
Optional attributes: FsckOpt, MountOpt, SnapUmount

Mount Resource Configuration
Configuration prerequisites:
- Create the file system on the disk partition (or volume).
- Create the mount point directory on each system.
- Configure the VCS Disk resource on which Mount depends.
- Verify that there is no entry in /etc/vfstab.
Sample configuration:
Mount myNFSMount (
    MountPoint = "/export1"
    BlockDevice = "/dev/dsk/c1t1d0s3"
    FSType = vxfs
    MountOpt = "-o ro"
)
When setting MountOpt with hares, use % to escape arguments starting with a dash (-):
hares -modify myNFSMount MountOpt %-o ro

Configuring a Service Group (flowchart repeated from Lesson 5)

Configuring a Resource
(Flowchart: add the resource; set it non-critical; modify its attributes; enable it; bring it online. If it comes online, the resource is done. If it faults or is stuck waiting to online, check the log, disable the resource, flush the group, clear the resource, and try again.)

Adding a Resource
(Screenshot.) Suggestion: use the service group name as a prefix for resource names.

Modifying a Resource
- Enter values for each required attribute.
- Modify optional attributes, if necessary.
- See the Bundled Agents Reference Guide for a complete description of all attributes.

Setting the Critical Attribute
- If a critical resource is faulted or taken offline due to a fault, the entire service group fails over.
- By default, all resources are critical.
- Set the Critical attribute to 0 to make a resource noncritical.
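From the command line, this is an ordinary attribute change; for example, using the resource naming convention from the NFS slides later in the course:

hares -modify mySGIP Critical 0    # make the resource noncritical while testing
hares -modify mySGIP Critical 1    # restore it once the configuration is proven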

Enabling a Resource
- Resources must be enabled in order to be managed by the agent.
- If necessary, the agent initializes the resource when it is enabled.
- All required attributes of a resource must be set before the resource is enabled.
- By default, resources are not enabled.

Bringing a Resource Online
Resources in a failover service group cannot be brought online if any resource in the service group is:
- Online on another system
- Waiting to go online on another system

Creating Resource Dependencies
- Parent resources depend on child resources: the child resource must be online before the parent resource can come online, and the parent resource must go offline before the child resource can go offline.
- Parent resources cannot be persistent type resources.
- You cannot link resources in different service groups.
- Resources can have an unlimited number of parent and child resources.
- Cyclical dependencies are not allowed.
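On the command line, a dependency is created by linking the parent to the child; a minimal example using the resource names from the NFS lesson:

hares -link mySGIP mySGNIC    # mySGIP (parent) now depends on mySGNIC (child)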

Linking Resources
(Screenshot: linking resources in Cluster Manager.)

Taking a Resource Offline
- Take individual resources offline in order, from the top of the dependency tree to the bottom.
- Use Offline Propagate to take all resources offline.
- The selected resource must be the top online resource in the dependency tree and must have no online parent resources.

Clearing Faults
- Faulted resources must be cleared before they can be brought online.
- Persistent resources are cleared when the problem is fixed and they are probed by the agent.
- Offline resources are probed periodically.
- Resources can be manually probed.
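The corresponding commands (resource and system names are illustrative; the -clear form also appears in the troubleshooting flow in the next lesson):

hares -clear mySGIP                 # clear the fault, optionally adding -sys system
hares -probe mySGDisk -sys train7   # force an immediate probe of a resource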

Disabling a Resource
- VCS calls the agent on each system in the SystemList.
- The agent calls the Close entry point, if present, to reset the resource.
- Nonpersistent resources are brought offline.
- The agent stops monitoring disabled resources.

Deleting a Resource
Before deleting a resource:
1. Take all parent resources offline.
2. Take the resource offline.
3. Disable the resource.
4. Unlink any dependent resources.
Delete all resources before deleting a service group.
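As a command sketch (names are illustrative), the deletion sequence maps to:

hares -offline mySGIP -sys train7   # parents first, then the resource itself
hares -modify mySGIP Enabled 0      # disable the resource
hares -unlink mySGIP mySGNIC        # remove its dependency links
hares -delete mySGIP                # delete the resource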

Summary
You should now be able to:
- Describe how resources and resource types are defined in VCS.
- Describe how agents work.
- Describe cluster configuration files.
- Modify the cluster configuration.
- Use the Disk resource and agent.
- Use the Mount resource and agent.
- Create a service group.
- Configure resources.
- Perform resource operations.

Lab 7: Configuring Resources
(Diagram: Student Red adds RedNFSMount and RedNFSDisk resources to RedNFSSG for c1t8d0s0 mounted at /Redfs on disk1; Student Blue adds BlueNFSMount and BlueNFSDisk to BlueNFSSG for c1t15d0s0 mounted at /Bluefs on disk2.)

VERITAS Cluster Server for Solaris


Lesson 8 Network File System (NFS) Resources

Overview (course roadmap, as in Lesson 1)

Objectives
After completing this lesson, you will be able to:
- Prepare NFS services for the VCS environment.
- Describe the Share resource and agent.
- Describe the NFS resource and agent.
- Describe the NIC resource and agent.
- Describe the IP resource and agent.
- Configure and test an NFS service group.

NFS Service Group
(Dependency diagram: IP depends on Share and NIC; Share depends on NFS and Mount; Mount depends on Disk.)

NFS Setup for VCS
Major and minor numbers for block devices used for NFS services must be the same on each system.
(Diagram: before failover, clients' NFS requests are answered normally; after failover to a system whose device has different major/minor numbers, clients receive stale file handle errors.)

Major/Minor Numbers for Partitions
- Each system must have the same major and minor numbers for the shared partition.
- Major/minor numbers must also be unique within a system.
On System A:
ls -lL /dev/dsk/c1t1d0s3
brw-r-----  root sys  32,134 Dec 3 11:50 /dev/dsk/c1t1d0s3
On System B:
ls -lL /dev/dsk/c1t1d0s3
brw-r-----  root sys  36,134 Dec 3 11:55 /dev/dsk/c1t1d0s3
To make the major numbers the same on all systems:
haremajor -sd major_number
Example:
haremajor -sd 36

Major Numbers for Volumes
Verify that the major numbers match on all systems:
On System A:
grep ^vx /etc/name_to_major
vxdmp 87
vxio 88
vxspec 89
On System B:
grep ^vx /etc/name_to_major
vxdmp 89
vxio 90
vxspec 91

Changing Major Numbers for Volumes
To make the major numbers the same on all systems:
- Before running vxinstall: edit /etc/name_to_major manually, change the VM major numbers to be the same on both systems, and reboot the systems where the change was made.
- After running vxinstall:
haremajor -vx major_num1 major_num2
Example:
haremajor -vx 91 92
- Each system must have the same major number for the shared volume.
- Major numbers must also be unique within a system.

The Share Resource and Agent


Functions:
Online   Shares an NFS file system
Offline  Unshares an NFS file system
Monitor  Reads the /etc/dfs/sharetab file to check for an entry for the file system

Required attributes:
PathName  Pathname of the file system

Optional attributes: Options

Configuration prerequisites:
The file system to be shared should not be listed in /etc/dfs/dfstab.
Mount and NFS resources must be configured.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-182
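The slide gives no sample configuration, so here is a minimal sketch in the style of the other resource examples in this lesson; the resource name and path are hypothetical.

Share mySGShare (
    PathName = "/export/home"
    )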

The NFS Resource and Agent


Functions:
Online   Starts the nfsd and mountd processes if they are not already running
Offline  None (NFS is an OnOnly resource.)
Monitor  Checks for the nfsd, mountd, lockd, and statd processes

Required attributes: None
Optional attributes: Nservers (default=16)
Configuration prerequisites: None

Sample configuration:
NFS mySGNFS (
    Nservers = 24
    )

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-183

The NIC Resource and Agent


Functions:
Online   None (NIC is persistent.)
Offline  None
Monitor  Uses ping to check connectivity and determine whether the interface is up

Required attributes:
Device  NIC device name

Optional attributes:
NetworkType, PingOptimize, NetworkHosts

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-184

NIC Resource Configuration


Configuration prerequisites:
Configure Solaris to plumb the interface during system boot:
Edit these files: /etc/hosts, /etc/hostname.interface
Reboot the system.

Sample configuration:
NIC mySGNIC (
    Device = qfe1
    NetworkHosts = { "192.20.47.254", "192.20.47.253" }
    )

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-185

The IP Resource and Agent


Functions:
Online   Configures a virtual IP address on an interface
Offline  Removes the IP address from the interface
Monitor  Determines whether the virtual IP address is present on the interface

The virtual IP address is the address that users connect to and that fails over between systems in the cluster.

Required attributes:
Device   Name of the NIC
Address  Unique application (virtual) IP address

Optional attributes:
NetMask, Options, ArpDelay (default=1s), IfconfigTwice (default=0)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-186

IP Resource Configuration
Configuration prerequisites: Configure a NIC resource.
Sample configuration:
IP mySGIP (
    Device = qfe1
    Address = "192.20.47.61"
    )

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-187

Configuring an NFS Service Group

Add service group:       hagrp -add mySG
Set SystemList:          hagrp -modify mySG SystemList sys1 0 sys2 1
Set optional attributes: hagrp -modify mySG Attribute Value

(Flow chart: add and test each resource using the resource flow chart; repeat while more resources remain; then test the service group.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-188

Configuring NFS Resources

Add resource:      hares -add mySGIP IP mySG
Set non-critical:  hares -modify mySGIP Critical 0
Modify attributes: hares -modify mySGIP Attribute Value
Enable resource:   hares -modify mySGIP Enabled 1
Bring online:      hares -online mySGIP -sys sys1

(Flow chart: if the resource comes online, done; otherwise troubleshoot the resource.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-189
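Putting the service group and resource flow charts together, a complete command-line build of a small group might look like this sketch. Every group, resource, system, and device name is illustrative, and the Disk agent's Partition attribute and the Mount agent's attribute names are assumptions based on the bundled agents.

# Create the group and define where it can run
hagrp -add mySG
hagrp -modify mySG SystemList sys1 0 sys2 1
hagrp -modify mySG AutoStartList sys1

# Add each resource non-critical first, set attributes, then enable
hares -add mySGDisk Disk mySG
hares -modify mySGDisk Critical 0
hares -modify mySGDisk Partition c1t1d0s3
hares -modify mySGDisk Enabled 1

hares -add mySGMount Mount mySG
hares -modify mySGMount Critical 0
hares -modify mySGMount MountPoint /export/home
hares -modify mySGMount BlockDevice /dev/dsk/c1t1d0s3
hares -modify mySGMount FSType vxfs
hares -modify mySGMount Enabled 1

# Link the parent to its child, then test onlining the group
hares -link mySGMount mySGDisk
hagrp -online mySG -sys sys1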

Troubleshooting Resources

If the resource faults or waits to go online:
Check the log.
Disable the resource: hares -modify mySGIP Enabled 0
Flush the group:      hagrp -flush mySG -sys sys1
Clear the resource:   hares -clear mySGIP
Modify attributes, re-enable the resource, and bring it online again; repeat until the resource comes online.

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-190

Testing the Service Group

Test switching:  hagrp -switch mySG -to sys2
Link resources:  hares -link mySGIP mySGNIC
Test failover, then set the resources critical:
hares -modify mySGIP Critical 1
hares -modify mySGNIC Critical 1
hares -modify ...

(Flow chart: on success, done; on failure, check the logs and fix.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-191

Summary
You should now be able to:
Prepare NFS services for the VCS environment.
Describe the Share resource and agent.
Describe the NFS resource and agent.
Describe the NIC resource and agent.
Describe the IP resource and agent.
Configure and test an NFS service group.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-192

Lab 8: Creating an NFS Service Group

Student Red: RedNFSSG (RedNFS IP, RedNFS Share, RedNFS Mount, RedNFS NIC, RedNFS NFS, RedNFS Disk)
Student Blue: BlueNFSSG (BlueNFS IP, BlueNFS Share, BlueNFS Mount, BlueNFS NIC, BlueNFS NFS, BlueNFS Disk)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-193

VERITAS Cluster Server for Solaris


Lesson 9 Event Notification

Overview

(Course roadmap diagram: the Event Notification lesson highlighted among the course lessons.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-195

Objectives
After completing this lesson, you will be able to:
Describe the VCS notifier component.
Configure the notifier to signal changes in cluster status.
Describe SNMP configuration.
Describe event triggers.
Configure triggers to provide notification.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-196

Notification
How VCS performs notification:
1. The had daemon sends a message to the notifier daemon when an event occurs.
2. The notifier daemon formats the event message and sends an SNMP trap or e-mail message (or both) to designated recipients.

(Diagram: had daemons on the cluster systems feed the notifier daemon, which sends SMTP and SNMP notifications.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-197

Message Severity Levels


Information  Service group is online.
Warning      Agent has faulted.
Error        Resource has faulted.
SevereError  Concurrency violation.

(Diagram: had daemons send events to the notifier, which dispatches SMTP and SNMP messages according to severity.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-198

Message Queues
1. The had daemon stores a message in a queue when an event is detected.
2. The message is sent over the private cluster network to all other had daemons to replicate the message queue.
3. The notifier daemon can be started on another system in case of failure without loss of messages.

(Diagram: a replicated message queue shared by the had daemons; a notifier daemon on either system sends SMTP and SNMP messages.)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-199

Configuring Notifier
The notifier daemon can be started and monitored by the NotifierMngr resource. Attributes define recipients and severity levels. For example: SmtpServer = "smtp.acme.com" SmtpRecipients = { "admin@acme.com" = Warning }

(Diagram: on each system, a NotifierMngr resource and a NIC resource; the notifier daemon runs under the NotifierMngr resource.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-200

The NotifierMngr Agent


Functions: Starts, stops, and monitors the notifier daemon

Required attribute:
PathName  Full path of the notifier daemon

Required attributes for SMTP e-mail notification:
SmtpServer      Host name of the SMTP e-mail server
SmtpRecipients  E-mail address and message severity level for each recipient

Required attribute for SNMP notification:
SnmpConsoles  Name of the SNMP manager and message severity level

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-201

The NotifierMngr Resource


Optional attributes:
MessagesQueue          Size of the message queue; default = 30
NotifierListeningPort  TCP/IP port number; default = 14144
SnmpdTrapPort          TCP/IP port to which SNMP traps are sent; default = 162
SnmpCommunity          Community ID for the SNMP manager; default = "public"

Example resource configuration:
NotifierMngr Notify_Ntfr (
    PathName = "/opt/VRTSvcs/bin/notifier"
    SnmpConsoles = { snmpserv = Information }
    SmtpServer = "smtp.your_company.com"
    SmtpRecipients = { "vcsadmin@your_company.com" = SevereError }
    )
Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-202

SNMP Configuration
Load the MIB for VCS traps into the SNMP console. For HP OpenView Network Node Manager, merge events:
xnmevents -merge vcs_trapd
VCS SNMP configuration files:
/etc/VRTSvcs/snmp/vcs.mib
/etc/VRTSvcs/snmp/vcs_trapd

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-203

Event Triggers
How VCS performs notification:
1. VCS determines whether notification is enabled. If disabled, no action is taken. If enabled, VCS runs hatrigger with event-specific parameters.
2. The hatrigger script invokes the event-specific trigger script with the parameters passed by VCS.
3. The event trigger script performs the notification tasks.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-204

Types of Triggers
Trigger         Description                               Script Name
ResFault        Resource faulted                          resfault
ResNotOff       Resource not offline                      resnotoff
ResStateChange  Resource changed state                    resstatechange
SysOffline      System went offline                       sysoffline
InJeopardy      Cluster in jeopardy                       injeopardy
NoFailover      Service group cannot fail over            nofailover
Violation       Resource online on more than one system   violation
LoadWarning     System is overloaded                      loadwarning
PreOnline       Service group about to come online        preonline
PostOnline      Service group went online                 postonline
PostOffline     Service group went offline                postoffline

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-205

Configuring Triggers
Triggers enabled by presence of script file:
ResFault ResNotOff SysOffline InJeopardy Violation NoFailover PostOffline PostOnline LoadWarning

Triggers configured by service group attributes:


PreOnline ResStateChange

Triggers configured by default:


Violation

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-206

Sample Triggers
Sample trigger scripts include example code to send an e-mail message. Mail must be configured on the system invoking the trigger to use the sample e-mail code.

# Here is a sample code to notify a bunch of users.
# @recipients=("username@servername.com");
# $msgfile="/tmp/resnotoff$2";
# `echo system = $ARGV[0], resource = $ARGV[1] > $msgfile`;
#
# foreach $recipient (@recipients) {
#     # Must have elm setup to run this.
#     `elm -s resnotoff $recipient < $msgfile`;
# }
# `rm $msgfile`;

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-207

ResFault Trigger
Provides notification that a resource has faulted.
Arguments to resfault:
system: Name of the system where the resource faulted
resource: Name of the faulted resource

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-208
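A minimal resfault script might do nothing more than mail the two arguments to an administrator. This is a sketch: the recipient address is hypothetical, and mailx is assumed to be configured on the system.

#!/bin/sh
# /opt/VRTSvcs/bin/triggers/resfault (example sketch)
# $1 = system where the resource faulted, $2 = name of the faulted resource
SYSTEM=$1
RESOURCE=$2

echo "Resource $RESOURCE faulted on system $SYSTEM" | \
    mailx -s "VCS resfault: $RESOURCE on $SYSTEM" admin@example.com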

ResNotOff Trigger
Provides notification that a resource has not been taken offline.
If a resource is not offline on one system, the service group cannot be brought online on another. VCS cannot fail over the service group in the event of a fault, because the resource will not come offline.
Arguments to resnotoff:
system: Name of the system where the resource is not offline
resource: Name of the resource that is not offline
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-209

ResStateChange Trigger
Provides notification that a resource has changed state.
Set at the service group level by the TriggerResStateChange attribute:
hagrp -modify serv_grp TriggerResStateChange 1
Arguments to resstatechange:
system: Name of the system where the resource changed state
resource: Name of the resource that changed state
previous_state: State of the resource before the change
new_state: State of the resource after the change
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-210

SysOffline Trigger
Provides notification that a system has gone offline. Executed on another system when no heartbeat is detected.
Arguments to sysoffline:
system: Name of the system that went offline
systemstate: Value of the SysState attribute for the offline system

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-211

NoFailover Trigger
Run when VCS determines that a service group cannot fail over. Executed on the lowest-numbered system in a running state when the condition is detected.
Arguments to nofailover:
systemlastonline: Name of the last system where the service group was online or partially online
service_group: Name of the service group that cannot fail over

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-212

Summary
You should now be able to:
Describe the VCS notifier component.
Configure the notifier to signal changes in cluster status.
Describe SNMP configuration.
Describe event triggers.
Configure triggers to provide notification.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-213

Lab 9: Event Notification

Student Red: RedNFSSG; Student Blue: BlueNFSSG
ClusterService group: webip, webnic, notifier
Triggers (both students): resfault, nofailover, sysoffline

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-214

VERITAS Cluster Server for Solaris


Lesson 10 Faults and Failovers

Overview

(Course roadmap diagram: the Faults and Failovers lesson highlighted among the course lessons.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-216

Objectives
After completing this lesson, you will be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-217

How VCS Responds to Resource Faults


1. Calls the ResFault trigger, if present.
2. Takes offline all resources in the path of the fault, from the faulted resource up to the top of the dependency tree.
3. If an online critical resource is part of the path, takes the entire service group offline in preparation for failover.
4. Starts the service group on another system in the service group's SystemList (if possible).
5. If no other systems are available, the service group remains offline and the NoFailover trigger is invoked, if present.


Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-218

Practice Exercise

(Table: for each case A-F, a set of non-critical resources is given, such as 4; 4,6; 4,6,7; or 6,7. Determine which resources are taken offline due to the fault and whether the group starts on another system. Diagram: a dependency tree of resources 1-9; resource 4 faults.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-219

Practice Answers

(Completed table: for each case, the non-critical resources, the resources taken offline due to the fault, such as 6,7, and which resources start on another system: All, or All but 7.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-220

Failover Attributes
AutoFailOver indicates whether automatic failover is enabled for the service group. The default value is 1, enabled. FailOverPolicy specifies how a target system is selected:
Priority    The system with the lowest priority number in the list is selected (default).
RoundRobin  The system with the least number of active service groups is selected.
Load        The system with the greatest available capacity is selected.

Example configuration:
hagrp -modify group AutoFailOver 0
hagrp -modify group FailOverPolicy Load
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-221

FailOverPolicy: Priority
The lowest-numbered system in SystemList is selected.

(Diagram: AP1 on Svr1 with SystemList = {Svr1 = 0, Svr2 = 1}; AP2 on Svr2 with SystemList = {Svr2 = 0, Svr1 = 1}; DB on Svr3 with SystemList = {Svr3 = 0, Svr1 = 1, Svr2 = 2}.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-222

FailOverPolicy: RoundRobin
The system with the fewest running service groups is selected.

(Diagram: servers Svr1 through Svr4.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-223

FailOverPolicy: Load
(Diagram: SmSvr1, Capacity = 100, AvailableCapacity = 70, running AP1 with Load = 30; SmSvr2, Capacity = 100, AvailableCapacity = 80, running AP2 with Load = 20; LgSvr1, Capacity = 200, AvailableCapacity = 100, running DB1 with Load = 100; LgSvr2, running DB2 with Load = 100.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-224

Setting Load and Capacity


The Load and Capacity attributes are user-defined values. Set the attributes using the hagrp and hasys commands. Examples:
hasys -modify SmSvr1 Capacity 100
hagrp -modify AP1 Load 30

AvailableCapacity is calculated by VCS: Capacity minus Load equals AvailableCapacity.


Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-225

Load-Based Failover Example


G4 migrates to Svr1 [SystemList = {Svr1, Svr2, Svr3, Svr4}].
G5 migrates to Svr3 [SystemList = {Svr1, Svr2, Svr3, Svr4}].

(Diagram: Svr1, Capacity = 100, AvailableCapacity = 50, running G1 Load = 20 and G6 Load = 30; Svr3, Capacity = 100, AvailableCapacity = 50, running G3 Load = 30 and G7 Load = 20; Svr2, Capacity = 100, AvailableCapacity = 20, running G2 Load = 40 and G8 Load = 40; Svr4, Capacity = 100, AvailableCapacity = 40, running G4 Load = 10 and G5 Load = 50.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-226

The LoadWarning Trigger


Svr3 runs the LoadWarning trigger when AvailableCapacity is 20 or less (80 percent of Capacity consumed) for 10 minutes (600 seconds).

(Diagram: Svr1 AvailableCapacity = 40; Svr2 AvailableCapacity = 20; Svr3 AvailableCapacity = 0; each with Capacity = 100.)

System Svr3 (
    Capacity = 100
    LoadWarningLevel = 80
    LoadTimeThreshold = 600
    )

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-227

Dynamic Load
The DynamicLoad attribute is used in conjunction with load-estimation software. It is set using the hasys command.

SmSvr1 is 90 percent loaded (Capacity = 100, AvailableCapacity = 10): hasys -load 90
LgSvr2 is 80 percent loaded (Capacity = 200, AvailableCapacity = 40): hasys -load 160

(Diagram: service groups GA, GC, and GD on SmSvr1; GB and GH on LgSvr2.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-228

Limits and Prerequisites


DB1 or DB2 can fail over to either SmSvr1 or SmSvr2. Both AP1 and AP2 can fail over to either LgSvr1 or LgSvr2.

(Diagram:
LgSvr1 and LgSvr2: Limits = { Mem=100, Processors=12 }, CurrentLimits = { Mem=50, Processors=8 }, running DB1 and DB2
SmSvr1 and SmSvr2: Limits = { Mem=75, Processors=6 }, CurrentLimits = { Mem=50, Processors=4 }, running AP1 and AP2
DB1 and DB2: Prerequisites = { Mem=50, Processors=4 }
AP1 and AP2: Prerequisites = { Mem=25, Processors=2 })

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-229

Combining Capacity and Limits


When used together, VCS determines the failover target as follows:
Limits and Prerequisites are used to determine a subset of potential failover targets.
Of this subset, the system with the highest value for AvailableCapacity is selected.
If multiple systems have the same AvailableCapacity, the first system in SystemList is selected.
Limits are hard values: if a system does not meet the Prerequisites, the service group cannot be started on that system.
Capacity is a soft limit: the system with the highest AvailableCapacity is selected, even if the resulting AvailableCapacity is a negative number.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-230

Failover Zones
(Diagram: sysa and sysb form the preferred failover zone for the Database service group; sysc, sysd, syse, and sysf form the preferred failover zone for the Web service group.)
The SystemList for both service groups includes all systems in the cluster.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-231

SystemZones Attribute
Used to define the preferred failover zones for each service group. If the service group is online in a system zone, it fails over to other systems in the same zone based on the FailOverPolicy until no further systems are available in that zone. When there are no other systems for failover in the same zone, VCS chooses a system in a new zone from the SystemList based on the FailOverPolicy.
To define SystemZones:
Syntax: hagrp -modify group_name SystemZones sys1 zone# sys2 zone# ...
Example:
hagrp -modify OracleSG SystemZones sysa 0 sysb 0 sysc 1 sysd 1 syse 1 sysf 1
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-232

Controlling Failover Behavior with Resource Type Attributes

RestartLimit
Affects how the agent responds to a resource fault. Default: 0

ConfInterval
Determines the amount of time within which a tolerance or restart counter can be incremented. Default: 600 seconds

ToleranceLimit
Enables the monitor entry point to return OFFLINE several times before the resource is declared FAULTED. Default: 0
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-233
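For example, to let every Process-type resource restart once and tolerate one spurious OFFLINE report before faulting, the type attributes might be tuned as in this sketch; the values are illustrative only.

# Allow one automatic restart attempt per confidence interval
hatype -modify Process RestartLimit 1

# Reset the restart/tolerance counters after 180 seconds
hatype -modify Process ConfInterval 180

# Let monitor report OFFLINE once before the resource is declared FAULTED
hatype -modify Process ToleranceLimit 1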

Restart Example
RestartLimit=1: the resource can be restarted one time within the ConfInterval time frame.
ConfInterval=180: the resource can be restarted once within a three-minute interval.
MonitorInterval=60 seconds (default value): the resource is monitored every 60 seconds.

(Timeline diagram: the resource goes offline and is restarted once within ConfInterval; if it goes offline again within the interval, it is declared faulted.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-234

Adjusting Monitoring
MonitorInterval
Default value is 60 seconds for most resource types. Consider reducing to 10 or 20 seconds for testing. Use caution when changing this value:
Load is increased on cluster systems. Resources can fault if they cannot respond in the interval specified.

OfflineMonitorInterval
Default is 300 seconds for most resource types. Consider reducing to 60 seconds for testing.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-235

Modifying Resource Type Attributes


Can be used to optimize agents Applied to all resources of the specified type Command line example:
hatype -modify FileOnOff MonitorInterval 5

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-236

Preventing Failover
A frozen service group does not fail over when a critical resource faults. The service group must be unfrozen to enable failover.
To freeze a service group:
hagrp -freeze service_group [-persistent]
To unfreeze a service group:
hagrp -unfreeze service_group [-persistent]
A persistent freeze:
Requires the cluster configuration to be open
Remains in effect even if VCS is stopped and restarted throughout the cluster

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-237

Clearing Faults
Verify that the faulted resource is offline. Fix the problem that caused the fault and clean up any residual effects. To clear a fault, type:
hares -clear resource_name [-sys system_name]

To clear all faults in a service group, type:


hagrp -clear group_name [-sys system_name]

Persistent resources are cleared by probing:


hares -probe resource_name [-sys system_name]

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-238

Probing Resources
Causes VCS to immediately monitor the resource To probe a resource, type:
hares -probe resource_name -sys system_name

You can clear a persistent resource by probing it after the underlying problem has been fixed.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-239

Flushing Service Groups


All online/offline agent processes are stopped. All resources in transitional states waiting to go online are taken offline. Propagation of the offline operation is stopped, but resources waiting to go offline remain in the transitional state. You must verify the physical or software resources are stopped at the operating system level after flushing to avoid creating a concurrency violation. To flush a service group, type:
hagrp -flush group_name -sys system_name
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-240

Testing Failover
Use test resources, such as FileOnOff, when applicable. Set lower values for MonitorInterval, OfflineMonitorInterval, and ConfInterval to detect faults more quickly. Manually online, offline, and switch the service group among all systems. Simulate failure of each resource in the service group. Simulate failover of the entire system.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-241
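As a sketch of the first two points, a throwaway FileOnOff resource can be added to an existing group and then faulted on purpose; the resource, group, system, and file names are hypothetical.

# Add a non-critical test resource and bring it online
hares -add mySGTest FileOnOff mySG
hares -modify mySGTest Critical 0
hares -modify mySGTest PathName /tmp/mySG_testfile
hares -modify mySGTest Enabled 1
hares -online mySGTest -sys sys1

# Simulate a fault by removing the file the agent monitors,
# then watch VCS detect it at the next monitor cycle
rm /tmp/mySG_testfile
hastatus -sum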

Testing Examples
Force a resource to fault. Reboot a system. Halt and reboot a system. Remove power from a system.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-242

Summary
You should now be able to:
Describe how VCS responds to faults.
Implement failover policies.
Set limits and prerequisites.
Use system zones to control failover.
Control failover behavior using attributes.
Clear faults.
Probe resources.
Flush service groups.
Test failover.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-243

Lab 10: Faults and Failovers

Student Red: RedNFSSG; Student Blue: BlueNFSSG
Triggers (both students): resfault, nofailover, sysoffline

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-244

VERITAS Cluster Server for Solaris


Lesson 11 Installing and Upgrading Applications in the Cluster

Overview

(Course roadmap diagram: the Installing Applications lesson highlighted among the course lessons.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-246

Objectives
After completing this lesson, you will be able to:
Describe the benefits of keeping applications available during planned maintenance.
Freeze service groups and systems.
Upgrade a system in a running cluster.
Describe the differences in application upgrades.
Apply guidelines for installing new applications in the cluster.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-247

Maintenance and Downtime


(Pie chart of downtime causes: Software 40%, Planned Downtime 30%, People 15%, Hardware 10%, Environment 5%, Client <1%, LAN/WAN Equipment <1%.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-248

Operating System Update

(Diagram: a frozen system undergoes an operating system update while the web server continues to handle web requests on another system.)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-249

Application Upgrade
(Diagram: WebSG is frozen while the web application is updated; DatabaseSG continues to run.)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-250

Freezing a System
Freezing a system prevents service groups from failing over to it. Failover can still occur from a frozen system. Freeze a system while maintenance is being performed. A persistent freeze remains in effect through VCS restarts. The -evacuate option moves service groups off the frozen system.
Syntax:
hasys -freeze [-persistent] [-evacuate] systemA
hasys -unfreeze [-persistent] systemA

Use hasys to determine whether a system is frozen:
hasys -display systemA -attribute Frozen
hasys -display systemA -attribute TFrozen
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-251

Freezing a Service Group


Freezing a service group prevents it from being taken offline, brought online, or failed over, even if a concurrency violation occurs.
Example update scenario:
1. Freeze the service group.
2. Update the application on the system(s) not currently running the application.
3. Unfreeze the service group.
4. Move the service group to an updated system and apply the application update on the original system.

A persistent freeze remains in effect even if VCS is stopped and restarted throughout the cluster.
Syntax:
hagrp -freeze service_group [-persistent]
Use hagrp to determine whether a group is frozen:
hagrp -display service_group -attribute Frozen
hagrp -display service_group -attribute TFrozen
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-252

Upgrading a SystemReboot Required


1. Open the configuration: haconf -makerw
2. Freeze and evacuate the system: hasys -freeze -persistent -evacuate systemA
3. Stop VCS on the system: hastop -sys systemA
4. Perform the upgrade.
5. Reboot the system.
6. Unfreeze the system: hasys -unfreeze -persistent systemA
7. Repeat for any other systems to upgrade.
8. Move service groups back to the appropriate systems: hagrp -switch mySG -to systemA
9. Close the configuration: haconf -dump -makero

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-253

Differences in Application Upgrades


Rolling upgrades
No simple reversion from an upgrade
Multiple installation directories
Upgrading without rebooting

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-254

Installing Applications: Program Files on Shared Storage


Advantages:
Simplifies application setup and maintenance
The application service group is self-contained: all program and data files are located on file systems within the service group.

Disadvantages:
Rolling upgrades cannot be performed.
Downtime is increased during maintenance.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-255

Binaries on Local Storage


Advantages:
Minimizes downtime during application maintenance
May be able to perform rolling upgrades (depending on the application)

Disadvantages:
Must maintain multiple copies of the application
Not scalable, due to maintenance overhead in clusters with large numbers of service groups and systems

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-256

Application Installation Guidelines


Determine where to install program files (locally or on shared disk) based on your cluster environment.
Install application data files on a shared storage partition that is accessible to each system that can run the application.
Specify identical installation options.
Use the same mount point when installing the application on each system.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-257

Summary
You should now be able to:
Describe the benefits of keeping applications available during planned maintenance.
Freeze service groups and systems.
Upgrade a system in a running cluster.
Describe the differences in application upgrades.
Apply guidelines for installing new applications in the cluster.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-258

Lab 11: Installing Applications in the Cluster

Student Red: RedNFSSG; Student Blue: BlueNFSSG
Task: Install Volume Manager

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-259

VERITAS Cluster Server for Solaris


Lesson 12 Volume Manager and Process Resources

Overview

(Course roadmap diagram: the Using Volume Manager lesson highlighted among the course lessons.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-261

Objectives
After completing this lesson, you will be able to:
Describe how Volume Manager enhances high availability.
Describe Volume Manager storage objects.
Configure shared storage using Volume Manager.
Create a service group with Volume Manager resources.
Configure Process resources.
Configure Application resources.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-262

Volume Management
(Diagram: physical disks presented as virtual volumes, shared by System1 and System2.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-263

Volume Manager Objects


(Diagram: physical disks become VxVM disks within a disk group; subdisks are grouped into plexes, and plexes form volumes.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-264

Disk Groups
(Diagram: physical disks Disk1, Disk2, and Disk3 as VxVM disks in disk group testDG.)
VxVM objects cannot span disk groups.
Disk groups represent management and configuration boundaries.
Disk groups enable high availability.
Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-265

VxVM Volume
(Diagram: Volume1 built from VxVM disks Disk1, Disk2, and Disk3 in disk group testDG.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-266

Volume Manager Configuration


Initialize disk(s).
vxdisksetup -i device

Create a disk group.


vxdg init disk_group disk_name=device

Create a volume.
vxassist -g disk_group make vol_name size

Make a file system.


mkfs -F vxfs volume_device

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-267
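Tying the four steps together, a hypothetical setup of the testDG disk group used later in this lesson might look like the following sketch; the device names and volume size are illustrative.

# Initialize two disks for VxVM use
vxdisksetup -i c1t1d0
vxdisksetup -i c1t2d0

# Create a disk group on the first disk, then add the second
vxdg init testDG disk01=c1t1d0
vxdg -g testDG adddisk disk02=c1t2d0

# Create a 2 GB volume and put a VxFS file system on it
vxassist -g testDG make testVol 2g
mkfs -F vxfs /dev/vx/rdsk/testDG/testVol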

Testing Volume Manager Configuration


On the first system:
1. Create a mount point directory.
2. Mount the VMVol file system.
3. Verify that the file system is accessible.
4. Unmount the file system.
5. Deport the disk group.

On the next system(s):
1. Create a mount point directory with the same name.
2. Import the disk group.
3. Start the volume.
4. Mount and verify the file system.
5. Unmount the file system.
6. Deport the disk group.
VCS_2.0_Solaris_R1.0_20011130

I-268
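In command form, the test might look like the following sketch; the mount point is hypothetical, and the disk group and volume names follow the lesson's examples.

# On the first system
mkdir -p /testmnt
mount -F vxfs /dev/vx/dsk/testDG/testVol /testmnt
ls /testmnt                      # verify the file system is accessible
umount /testmnt
vxdg deport testDG

# On the next system
mkdir -p /testmnt
vxdg import testDG
vxvol -g testDG start testVol    # start the volume after the import
mount -F vxfs /dev/vx/dsk/testDG/testVol /testmnt
umount /testmnt
vxdg deport testDG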

Volume Manager Resources


(Diagram: VMSG service group dependency tree: Proc at the top, over Mount, over VMVol (Volume resource), over VMDG (DiskGroup resource).)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-269

DiskGroup Resource and Agent


Functions:
Online   Imports a Volume Manager disk group
Offline  Deports the disk group
Monitor  Determines the state of the disk group using vxdg

Required attributes:
DiskGroup  Name of the disk group

Optional attributes:
StartVolumes, StopVolumes

Configuration prerequisites:
The disk group and volume must be configured.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-270

Volume Resource and Agent


Functions:
Online   Starts a volume
Offline  Stops the volume
Monitor  Reads a byte of data from the raw device interface for the volume

Required attributes:
DiskGroup  Name of the disk group
Volume     Name of the volume

Optional attributes: None

Configuration prerequisites:
The disk group and volume must be configured.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-271
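A minimal main.cf fragment pairing the two resource types, plus the Mount resource above them, might look like this sketch. The DiskGroup and Volume resource names follow the VMSG diagram; the Mount resource name and mount point are hypothetical.

DiskGroup VMDG (
    DiskGroup = testDG
    )

Volume VMVol (
    Volume = testVol
    DiskGroup = testDG
    )

Mount VMMount (
    MountPoint = "/testmnt"
    BlockDevice = "/dev/vx/dsk/testDG/testVol"
    FSType = vxfs
    )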

Configuring a Service Group


(Flow chart, as shown earlier: add the service group; set SystemList and optional attributes; add and test each resource using the resource flow chart; link resources; test switching and failover; set resources critical; on failure, check the logs and fix.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-272

Configuring a Resource

(Flow chart, as shown earlier: add the resource non-critical; modify attributes; enable the resource; bring it online. If it waits to go online or faults: check the log, disable the resource, flush the group, clear the resource, and retry.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-273

Process Resource and Agent


Functions:
Online   Starts a daemon process
Offline  Stops the process
Monitor  Determines whether the process is running, using procfs

Required attributes:
PathName  Full path of the executable file

Optional attributes:
Arguments  Use % to escape dashed arguments:
hares -modify myProc Arguments %-db q1h

Sample configuration:
Process sendmail (
    PathName = "/usr/lib/sendmail"
    Arguments = "-db -q1h"
    )
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-274

The Application Resource and Agent


Functions:
Online   Brings the application online using StartProgram
Offline  Takes the application offline using StopProgram
Monitor  Monitors the status of the application in a number of ways
Clean    Takes the application offline using CleanProgram, or kills all the processes specified for the application

Required attributes:
StartProgram  Name of the executable that starts the application
StopProgram   Name of the executable that stops the application
One or more of the following:
MonitorProgram    Name of an executable that monitors the application
MonitorProcesses  List of processes to be monitored
PidFiles          List of pid files that contain the process IDs of the processes to be monitored

Optional attributes:
CleanProgram, User
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-275

Application Resource Configuration


Configuration prerequisites:
The application should have its own start and stop programs.
It should be possible to monitor the application either by running a program that returns 0 for failure and 1 for success or by checking a list of processes.

Sample configuration:
Application samba_app (
    StartProgram = "/usr/sbin/samba start"
    StopProgram = "/usr/sbin/samba stop"
    PidFiles = { "/var/lock/samba/smbd.pid" }
    MonitorProcesses = { "smbd" }
    )

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-276

Summary
You should now be able to:
Describe how Volume Manager enhances high availability.
Describe Volume Manager storage objects.
Configure shared storage using Volume Manager.
Create a service group with Volume Manager resources.
Configure Process resources.
Configure Application resources.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-277

Lab 12: Volume Manager and Process Resources

Student Red: ProdSG (Prod Loopy over Prod Mount over ProdVol over ProdDG); disk group ProdDG, volume ProdVol, mounted at /prod; plus RedNFSSG
Student Blue: TestSG (Test Loopy over Test Mount over TestVol over TestDG); disk group TestDG, volume TestVol, mounted at /test; plus BlueNFSSG

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-278

VERITAS Cluster Server for Solaris


Lesson 13 Cluster Communication

Overview

(Course roadmap diagram: the Cluster Communication lesson highlighted among the course lessons.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-280

Objectives
After completing this lesson, you will be able to:
Describe how systems communicate in a cluster.
Describe the LLT and GAB configuration files and commands.
Reconfigure LLT and GAB.
Describe the effects of cluster communication failures.
Recover from communication failures.
Configure the InJeopardy trigger.
Troubleshoot LLT and GAB.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-281

Cluster Communication
(Diagram: on System A and System B, agents communicate with the agent framework and had; had communicates cluster-wide through GAB over LLT.)

VCS_2.0_Solaris_R1.0_20011130

I-282

GAB Membership Status


Determines cluster membership using heartbeat signals
Heartbeats transmitted by LLT
Membership determined by cluster ID number

(Diagram: Systems A through D, each running GAB over LLT, forming Cluster 1.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-283

Cluster State
GAB tracks all changes in configuration and resource status. Sends atomic broadcast to immediately transmit new configuration and status
(Diagram: when a resource is added on one system, GAB's atomic broadcast delivers the configuration change to every system in the same order, so all systems converge on the same cluster state.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-284

Low Latency Transport (LLT)


Provides traffic distribution across all private links
Sends and receives heartbeats
Transmits cluster configuration data
Determines whether connections are reliable (more than one exists) or unreliable
Runs in the kernel for best performance
Connection-oriented
Uses DLPI over Ethernet
Nonroutable
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-285

Configuring LLT
Required configuration files: /etc/llttab /etc/llthosts Optional configuration file: /etc/VRTSvcs/conf/sysname

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-286

The llttab File


set-node train1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
start

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-287

Setting Node Number and Name


# /etc/llttab (node number range: 0 - 31; cluster number range: 0 - 255)
set-cluster 10
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
start

# /etc/llthosts
3 sysa
7 sysb

# /etc/VRTSvcs/conf/sysname
sysb
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-288

The link Directive


# /etc/llttab
set-node 1
set-cluster 10
# Solaris example
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
link-lowpri qfe1 /dev/qfe:1 - ether - -
start

(link directive fields, in order: tag name, device:unit, node range ("-" means all), link type, SAP, and MTU.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-289

Low Priority Link


Public network link used as a redundant private network link
LLT sends only heartbeats on the low-priority link while other private network links are functional.
The heartbeat rate is reduced to limit traffic.
The low-priority link is used for all cluster communication if all private links fail.
The public network can become saturated with cluster traffic.
Risk of system panics if the same system ID/cluster ID is present on the network
Configured with the link-lowpri directive
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-290

Other LLT Directives


# For verbose messages from lltconfig,
# add this line first in llttab:
set-verbose 1

# The following causes only nodes 0-7
# to be valid for cluster participation:
exclude 8-31

# peerinact specifies how long a link is
# down before it is marked inactive:
set-timer peerinact:1600

# These regulate the heartbeat interval:
set-timer heartbeat:50
set-timer heartbeatlo:100

start
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-291

The llthosts File


Format:
node_number name

Example entries:
1 systema
2 systemb
3 systemc

No spaces before the number
Same entries on all systems
Unique node numbers required
System names match llttab and main.cf
System names match sysname, if used
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-292

The sysname File


Enables llttab and llthosts to be identical on all systems Must be different on each system Contains unique system name Removes dependency on UNIX node name System name must be in llthosts System name must match main.cf

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-293
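Bringing the three files together, a consistent two-node configuration might look like this sketch; the host names, devices, and cluster number are illustrative.

# /etc/llttab (identical on both systems)
set-cluster 10
set-node /etc/VRTSvcs/conf/sysname
link qfe0 /dev/qfe:0 - ether - -
link hme0 /dev/hme:0 - ether - -
start

# /etc/llthosts (identical on both systems)
0 train11
1 train12

# /etc/VRTSvcs/conf/sysname (unique on each system)
train11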

GAB Configuration
GAB configuration file: /etc/gabtab GAB configuration command entry: /sbin/gabconfig -c -n seed_number Seed number is set to number of systems in the cluster. Starts GAB under normal conditions Other options discussed later

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-294
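For a two-node cluster, /etc/gabtab would then contain a single entry such as:

/sbin/gabconfig -c -n 2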

Changing Communication Configuration

(Flow chart: Stop VCS -> Stop GAB -> Stop LLT -> Edit files -> Start LLT -> Start GAB -> Start VCS.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-295

Stopping GAB and LLT


Stop VCS engine first. Stop GAB on each system:
/sbin/gabconfig -U

Stop LLT:
/sbin/lltconfig -U

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-296

Starting LLT
Edit configuration files on each system before starting LLT on any system. Start LLT on each system in the cluster: /sbin/lltconfig -c LLT starts if configuration files are correct.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-297

Starting GAB
Start LLT before starting GAB. Start GAB on each system, specifying a value for -n equal to the number of systems in the cluster: /sbin/gabconfig -c -n #

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-298

Starting LLT and GAB Automatically


Startup files added when VCS is installed: /etc/rc2.d/S70llt /etc/rc2.d/S92gab

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-299

The LinkHbStatus Attribute


Internal VCS system attribute that provides link status information Use hasys command to view status:
hasys -display system -attribute LinkHbStatus hme:0 UP qfe:0 UP

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-300

The lltstat Command


train12# lltstat -nvv | pg
LLT node information:
  Node          State  Link   Status  Address
* 0 train12     OPEN   link1  UP      08:00:20:B4:0C:3B
                       link2  UP      08:00:20:B4:0C:3B
                       link3  UP      08:00:20:B4:0C:3B
  1 train11     OPEN   link1  UP      08:00:20:AD:BC:78
                       link2  UP      08:00:20:AD:BC:79
                       link3  UP      08:00:20:B7:08:5C

(The asterisk marks the system on which the command runs.)

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-301

Other lltstat Options


train12# lltstat -c
LLT configuration information:
  node: 20
  name: train3
  cluster: 10
  version: 1.1
  nodes: 20 - 21
  max nodes: 32
  max ports: 3
  (...)

train12# lltstat -l
LLT link information:
Link  Tag   State  Type   Pri     SAP     MTU   Addrlen  Xmit  Recv
0     hme0  on     ether  hipri   0xCAFE  1500  6        3732  3678
1     qfe0  on     ether  hipri   0xCAFE  1500  6        3731  3674
2     qfe1  on     ether  lowpri  0xCAFE  1500  6        1584  6719

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-302

The lltconfig Command


train12# lltconfig -a list
Link 0 (qfe0):
  Node 0 : 08:00:20:AD:BC:78 permanent
  Node 1 : 08:00:20:AC:BE:76 permanent
  Node 2 : 08:00:20:AD:BB:89 permanent
Link 1 (hme0):
  Node 0 : 08:00:20:AD:BC:79 permanent
  Node 1 : 08:00:20:AC:BE:77 permanent
  Node 2 : 08:00:20:AD:BB:80 permanent

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-303

GAB Membership Notation


# /sbin/gabconfig -a
GAB Port Memberships
===============================================
Port a gen a36e003 membership 01              ;12
Port h gen fd57002 membership 01              ;12

Port a indicates GAB is communicating; port h indicates had is communicating.
"01" means nodes 0 and 1 are members. The ";" is a placeholder for the tens digit (a 0 is displayed there if node 10 is a member); in this example, ";12" indicates nodes 21 and 22.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-304

Communication Failures
Network partition:
Failure of all Ethernet heartbeat links between one or more systems.
Occurs when one or more systems fail.
Also occurs when all Ethernet heartbeat links fail.

Split brain:
Failure of the Ethernet heartbeat links is misinterpreted as failure of one or more systems.
Multiple systems start running the same failover application.
Leads to data corruption if the applications use shared storage.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-305

Split-Brain Condition
(Diagram: two systems each changing block 20460 on shared storage; the block becomes invalid.)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-306

Preventing Split-Brain Condition


Redundant heartbeat channels:
Multiple private network heartbeats
Public network heartbeat
Disk heartbeats
Service group heartbeat

SCSI disk reservation
Jeopardy
Autodisabling
Seeding
PreOnline trigger


Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-307

Jeopardy Condition
A special type of cluster membership called jeopardy is formed when one or more systems have only a single Ethernet heartbeat link.
Service groups continue to run, and the cluster functions normally.
Failover and switching at operator request are unaffected.
The service groups running on a system in jeopardy are not taken over by another system if a system failure is detected by VCS.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-308

Jeopardy Example
(Diagram: three systems running SG_1, SG_2, and SG_3; one system has lost a single heartbeat link. Regular membership: A, B; jeopardy membership: C.)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-309

Network Partition Example


(Diagram: the cluster partitions into {A, B} and {C}. In the A, B membership, SG_3 is autodisabled for C; in the new membership, SG_1 and SG_2 are autodisabled for A and B. No jeopardy membership on either side.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-310

Split Brain Example


(Diagram: after a partition in which the service groups were not autodisabled, both memberships bring SG_1, SG_2, and SG_3 online, causing split brain.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-311

Recovery Behavior
When a private network is reconnected after a network partition, VCS and GAB are stopped and restarted as follows:
Two-system cluster:
The system with the lowest LLT node number continues to run VCS.
VCS is stopped on the higher-numbered system.
Multi-system cluster:
The mini-cluster with the most systems continues to run VCS.
VCS is stopped on the systems in the smaller mini-cluster(s).
If the cluster is split into two equal-size mini-clusters, the mini-cluster containing the lowest node number continues to run VCS.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-312

Configuring Recovery Behavior


Modify /etc/gabtab. For example:
/sbin/gabconfig -c -n 2 -j
Causes the higher-numbered node to panic if GAB tries to start after all Ethernet connections simultaneously stop and then restart
A split-brain avoidance mechanism

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-313

Preexisting Network Partitions


This condition is caused by failure of the private network communication channels while systems are down. A preexisting network partition can lead to split brain when the systems are started. VCS uses seeding to prevent a split-brain condition in the case of a preexisting network partition.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-314

Seeding
Prevents split brain Only seeded systems can run VCS. Systems are seeded only if GAB can communicate with other systems. Seeding determines the number of systems that must be communicating to allow VCS to start.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-315

Manually Seeding the Cluster


To start GAB and seed the system on which the command runs:
gabconfig -c -x
Warning: Do not use the -x option in gabtab.
Overrides -n; allows GAB to immediately seed the cluster so VCS can build a running configuration.
Use when the number of systems available is less than the number specified by -n in /etc/gabtab.
Use on only one system in the cluster; the others then seed from the first system.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-316

The InJeopardy Trigger


To configure, add an injeopardy script to /opt/VRTSvcs/bin/triggers.
The trigger is called when a system transitions from regular cluster membership to jeopardy.
Arguments are the name of the system in jeopardy and the system state.
The trigger is invoked on all systems that are part of the jeopardy membership.
The InJeopardy trigger is not run when:
A system loses its last network link.
A system loses both private network links at once.
A system transitions from any other state (such as the down state) to the jeopardy state.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-317
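In the same spirit as the resfault sketch in the Event Notification lesson, a minimal injeopardy script might log and mail the event; the recipient address is hypothetical, and mailx is assumed to be configured.

#!/bin/sh
# /opt/VRTSvcs/bin/triggers/injeopardy (example sketch)
# $1 = system entering jeopardy, $2 = its system state
SYSTEM=$1
STATE=$2

logger -p daemon.warning "VCS jeopardy: $SYSTEM (state $STATE)"
echo "System $SYSTEM entered jeopardy membership (state: $STATE)" | \
    mailx -s "VCS injeopardy: $SYSTEM" admin@example.com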

The lltdump Command


train12# lltdump -f /dev/qfe:0 -V -A -R DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000b9 len 0132 ack 0000007c 01 01 64 05 00 00 00 01 00 07 89 00 DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000bb len 0166 01 01 64 05 00 00 00 01 00 07 88 00 DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000bc len 0166 ack 00000080 01 01 64 05 00 00 00 01 00 07 89 00 DAT C 100 S 01 D 00 P 007 rdy 80000081 seq 000000bf len 0176 ack 00000083 01 01 64 05 00 00 00 01 00 07 89 00

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-318

The lltshow Command


train12# lltshow -n 0 |pg === LLT node 0: nid= 0 state= 4 OPEN my_gen= 3a89ec14 peer_gen= 0 flags= 0 links= 3 opens= ffffffff readyports= 0 rexmitcnt= 0 nxtlink= 0 lastacked= 0 nextseq= 0 recv_seq= 0 xmit_head= 0 xmit_tail= 0 xmit_next= 0 xmit_count= 0 recv_reseq= 0 oos= 0 retrans= 0 retrans2= 0 link [0]: hb= 0 hb2= 0 peerinact= 0 lasthb= 0 valid= 1 perm= 1 flags= 0 stat= 1 arpmode= 0 addr= 08 00 20 AD BC 78 00 00 00 00 dlpi_hdr= 00 00 00 07 00 00 00 08 00 00 00 14 00 00 00 64 00 00 00 00 08 00 20 AD BC 78 CA FE 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Identifies LLT Packets on Public Network


Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-319

Common LLT Problems


Node or cluster number out of range:
Node number must be between 0 and 31. Cluster number must be between 0 and 255.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-320

Incorrect LLT Specification


Incorrectly specified Ethernet link device: qf3 should be qfe LLT not started: Check /etc/llttab for the start directive.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-321

Common GAB Problems


No GAB membership:
Check with gabconfig -a
Seed with gabconfig -c -n N

GAB starts and then shuts down:
Check cabling.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-322

Problems with main.cf


VCS does not start:
Check main.cf for incorrect entries.

hacf -verify aborts:
Check the system names in main.cf to verify that they match llthosts and llttab.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-323

Summary
You should now be able to:
Describe how systems communicate in a cluster.
Configure the Low Latency Transport (LLT).
Configure the Group Membership and Atomic Broadcast (GAB) mechanism.
Start and stop LLT and GAB.
Configure the InJeopardy trigger.
Troubleshoot LLT and GAB.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-324

Lab 13: Cluster Communication

Student Red: ProdSG (Prod Loopy, Prod Mount, ProdVol, ProdDG) and RedNFSSG
Student Blue: TestSG (Test Loopy, Test Mount, TestVol, TestDG) and BlueNFSSG
Trigger: injeopardy

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-325

VERITAS Cluster Server for Solaris


Lesson 14 Troubleshooting

Overview

(Course roadmap diagram: the Troubleshooting lesson highlighted among the course lessons.)

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-327

Objectives
After completing this lesson, you will be able to:
Monitor system and cluster status.
Apply troubleshooting techniques in a VCS environment.
Detect and solve VCS communication problems.
Identify and solve VCS engine problems.
Correct service group problems.
Solve problems with agents.
Resolve problems with resources.
Plan for disaster recovery.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-328

Monitoring VCS
VCS log files System log files The hastatus utility SNMP traps Event notification triggers Cluster Manager

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-329

VCS Log Entries


Engine log: /var/VRTSvcs/log/engine_A.log
TAG_D 2001/04/03 12:17:44 VCS:11022:VCS engine (had) started
TAG_D 2001/04/03 12:17:44 VCS:10114:opening GAB library
TAG_C 2001/04/03 12:17:45 VCS:10526:IpmHandle::recv peer exited errno 10054
TAG_E 2001/04/03 12:17:52 VCS:10077:received new cluster membership
TAG_E 2001/04/03 12:17:52 VCS:10080:Membership: 0x3, Jeopardy: 0x0
TAG_D 2001/04/03 12:17:52 VCS:10322:Node '1' changed state from 'UNKNOWN' to 'INITING'
TAG_B 2001/04/03 12:17:52 VCS:10455:Operation 'haclus -modify(0xc13)' rejected.
Sysstate=CURRENT_DISCOVER_WAIT, Channel=BCAST, Flags=0x40000
(The most recent entries appear at the end of the log.)
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-330

Agent Log Entries


Agent logs are kept in /var/VRTSvcs/log
Log files are named AgentName_A.log
LogLevel attribute settings: none, error (default), info, debug, all
To change the log level:
hatype -modify res_type LogLevel debug

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-331

Troubleshooting Guide
Primary types of problems:
Cluster communication
VCS engine startup
Service groups and resources

Determine the troubleshooting path based on hastatus output:
A cluster communication problem is indicated by the message:
Cannot connect to server -- Retry Later
A VCS engine startup problem is indicated by systems with a WAIT status.
Service group and resource problems are indicated when the VCS engine is in the RUNNING state.
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-332
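A quick triage session combining these checks might proceed as follows; this is a sketch, and the output varies by site.

# Is GAB seeded, and is had running? Look for port a and port h memberships.
gabconfig -a

# Is LLT configured, and can the nodes see each other?
lltconfig
lltstat -n

# If GAB and LLT look healthy, check the engine and service groups.
hastatus -sum

# Examine the engine log for details.
tail -50 /var/VRTSvcs/log/engine_A.log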

Cluster Communication Problems


Run gabconfig -a.
No port a membership indicates a communication problem.
No port h membership indicates a VCS engine (had) startup problem.

Communication problem (GAB not seeded):
# gabconfig -a
GAB Port Memberships
===================================

VCS engine not running (GAB and LLT functioning):
# gabconfig -a
GAB Port Memberships
===================================
Port a gen 24110002 membership 01
Port h gen 65510002 membership

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-333

Problems with GAB and LLT


If GAB is not seeded (no port memberships):
Run lltconfig to determine whether LLT is running.
Run lltstat -n to determine whether the systems can see each other on the LLT links.
Check the physical network connection(s) if LLT cannot see each node.
Check gabtab for the correct seed value (-n) if the LLT links are functional.
Manually seed the cluster, if necessary.

lltconfig
LLT is running
lltstat -n
LLT node information:
  Node          State  Links
* 0 train11     OPEN   2
  1 train12     OPEN   2
Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-334

VCS Engine Startup Problems


Start the VCS engine using hastart.
Check hastatus to determine the system state.
If the engine is not running:
If the state is ADMIN_WAIT or STALE_ADMIN_WAIT, see the next sections.
Check the logs.
Verify that the llthosts file exists and that the system entries match the cluster configuration (main.cf).
Check gabconfig.

Copyright 2001 VERITAS Software

VCS_2.0_Solaris_R1.0_20011130

I-335

STALE_ADMIN_WAIT
To recover from the STALE_ADMIN_WAIT state:
1. Visually inspect the main.cf file to determine whether it is valid.
2. Edit the main.cf file, if necessary.
3. Verify the syntax of main.cf, if modified:
   hacf -verify config_dir
4. Start VCS on the system with the valid main.cf file:
   hasys -force system_name
5. All other systems perform a remote build from the system now running.

Copyright 2001 VERITAS Software VCS_2.0_Solaris_R1.0_20011130 I-336

ADMIN_WAIT
A system can be in the ADMIN_WAIT state under these circumstances:
A .stale flag exists and the main.cf file has a syntax problem.
A disk error affecting main.cf occurs during a local build.
The system is performing a remote build and the last running system fails.

Restore main.cf and use the procedure for STALE_ADMIN_WAIT.


Service Group Not Configured to AutoStart or Run


If a service group is not brought online automatically when VCS starts:
  Check the AutoStart and AutoStartList attributes:
  hagrp -display service_group
If a service group is not configured to run on the system:
  Check the SystemList attribute. Verify that the system name is included.
An example of correcting these attributes follows.
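
A sketch of correcting the attributes for a failover group (the group and system names are illustrative):

# haconf -makerw
# hagrp -modify websg SystemList train11 0 train12 1
# hagrp -modify websg AutoStartList train11
# haconf -dump -makero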


Service Group AutoDisabled


A service group is autodisabled when:
  GAB sees a system, but had is not running on that system.
  The resources of the service group are not fully probed on all systems in the SystemList.
  A particular system is visible through disk heartbeat only.
To recover:
  Make sure that the service group is offline on all systems in the SystemList attribute.
  Clear the AutoDisabled attribute:
  hagrp -autoenable service_group -sys system
  Bring the service group online.

Service Group Waiting for Dependencies


Check service group dependencies:
hagrp -dep service_group
Check resource dependencies:
hares -dep resource


Service Group Not Fully Probed


Usually the result of misconfigured resource attributes.
Check the ProbesPending attribute:
hagrp -display service_group
Check which resources are not probed:
hastatus -sum
Check the Probes attribute for resources:
hares -display
Probe the resources:
hares -probe resource -sys system

Service Group Frozen


Verify the values of the Frozen and TFrozen attributes:
hagrp -display service_group
Unfreeze the service group:
hagrp -unfreeze group [-persistent]
If you froze the group persistently, you must unfreeze it persistently.


Service Group Is Not Offline Elsewhere


Determine which resources are online or offline:
hastatus -sum
Verify the State attribute:
hagrp -display service_group
Take the group offline on the other system:
hagrp -offline service_group -sys system
Flush the service group:
hagrp -flush service_group -sys system


Service Group Waiting for Resource


Review the IState attribute of all resources to determine which resource is waiting to go online.
Use hastatus to identify the resource.
Make sure the resource is offline at the operating system level.
Clear the internal state of the service group:
hagrp -flush service_group -sys system
Take all other resources in the service group offline, and try to bring the resources online on another system.
Verify that the resource works properly outside VCS.
Check for errors in attribute values.

Incorrect Local Name


1. Create /etc/VRTSvcs/conf/sysname with the correct system name shown in main.cf.
2. Stop VCS on the local system.
3. Start VCS.
4. List all system names.
5. Open the configuration.
6. Delete any systems with incorrect names.
7. Save the configuration.
A sketch of this sequence follows.
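
As a sketch, assuming the correct name is train12 and the stray entry is badname (both names are illustrative):

# echo train12 > /etc/VRTSvcs/conf/sysname
# hastop -local
# hastart
# hasys -list
# haconf -makerw
# hasys -delete badname
# haconf -dump -makero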


Concurrency Violations
A concurrency violation occurs when a failover service group is online or partially online on more than one system.
Notification is provided by the Violation trigger:
  Invoked on the system that caused the concurrency violation
  Notifies the administrator and takes the service group offline on the system causing the violation
  Configured by default with the violation script in /opt/VRTSvcs/bin/triggers
  Can be customized to:
    Send a message to the system log.
    Display a warning on all cluster systems.
    Send e-mail messages.
A customization sketch follows.
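
A minimal customized trigger sketch, assuming VCS passes the local system name and the service group name as the first two arguments (verify against the shipped script before deploying):

#!/bin/sh
# Hypothetical replacement for /opt/VRTSvcs/bin/triggers/violation.
SYSTEM=$1
GROUP=$2

# Record the violation in the system log.
logger -p daemon.err "VCS concurrency violation: $GROUP on $SYSTEM"

# Notify the administrator (the address is illustrative).
echo "Concurrency violation: $GROUP on $SYSTEM" | \
    mailx -s "VCS concurrency violation" root

# Take the offending group offline on this system.
hagrp -offline $GROUP -sys $SYSTEM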


Service Group Waiting for Resource to Go Offline


Identify which resource is not offline:
hastatus -summary
Check the logs.
Manually take the resource offline, if necessary.
Configure the ResNotOff trigger for notification or action.


Agent Not Running


Determine whether the agent for the resource is FAULTED:
hastatus -summary
Use the ps command to verify that the agent process is not running (see the example below).
Verify the values of the ArgList and ArgListValues type attributes:
hatype -display res_type
Restart the agent:
haagent -start res_type -sys system
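
Agent processes are typically named after the resource type; for example, to check for the Mount agent (the type name is illustrative):

# ps -ef | grep MountAgent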


Problems Bringing Resources Online


Possible causes of failure while bringing resources online:
  Waiting for child resources
  Stuck in a WAIT state
  Agent not running


Problems Bringing Resources Offline


Possible causes of failure while taking resources offline:
  Waiting for parent resources to go offline
  Waiting for a resource to respond
  Agent not running


Critical Resource Faults


Determine which critical resource has faulted:
hastatus -summary
Make sure that the resource is offline.
Examine the engine log.
Fix the problem.
Verify that the resource works properly outside of VCS.
Clear the fault in VCS.


Clearing Faults
After external problems are fixed:
1. Clear any faults on nonpersistent resources:
   hares -clear resource -sys system
2. Check attribute fields for incorrect or missing data.
If the service group is partially online:
1. Flush wait states:
   hagrp -flush service_group -sys system
2. Take resources offline before bringing them online.

Planning for Disaster Recovery


Back up key VCS files:
  types.cf and customized types files
  main.cf
  main.cmd
  sysname
  LLT and GAB configuration files
  Customized trigger scripts
  Customized agents
Use hagetcf to create an archive.


The hagetcf Utility


# hagetcf
Enter path where configuration can be saved (default is /tmp):
Collecting package info
Checking VCS package integrity
Collecting VCS information
Collecting system configuration ..
Saving 0.13 MB
Compressing /tmp/vcsconf.train12.tar to /tmp/vcsconf.train12.tar.gz
Done. Please e-mail /tmp/vcsconf.train12.tar.gz to your support provider.

Summary
You should now be able to:
  Monitor system and cluster status.
  Apply troubleshooting techniques in a VCS environment.
  Identify and solve VCS engine problems.
  Correct service group problems.
  Solve problems with agents.
  Resolve problems with resources.
  Plan for disaster recovery.

Lab Exercise
Lesson 14 Troubleshooting

VERITAS Cluster Server for Solaris


Appendix D Special Situations

Overview
This lesson provides a guide for managing certain situations in a cluster environment:
  VCS upgrades
  VCS patches
  System changes: adding, removing, and replacing cluster systems


Objectives
After completing this lesson, you will be able to:
  Upgrade VCS software to version 2.0 from an earlier version.
  Install a VCS patch.
  Add systems to a running VCS cluster.
  Remove systems from a running VCS cluster.
  Replace systems in a running VCS cluster.


Preparations for VCS Upgrade


Acquire the new VCS software.
Contact VERITAS Technical Support.
Read the release notes.
Write scripts to automate as much of the process as possible.
If available, deploy on a test cluster first.


VCS Upgrade Process


I.   Complete initial preparation.
II.  Stop the existing VCS software.
III. Remove the existing VCS software and add the new VCS version.
IV.  Verify the configuration and make changes as needed.
V.   Start VCS on one system and propagate the configuration to the others.

Step I - Initial Preparation


1. Open the cluster configuration and freeze all service groups persistently (a scripted sketch follows):
   haconf -makerw
   hagrp -list
   hagrp -freeze group_name -persistent
2. Save and close the VCS configuration:
   haconf -dump -makero
3. Make a backup of the full configuration, including:
   All configuration files
   Any custom-developed agents
   Any modified VCS scripts
4. Rename the existing types.cf file:
   mv /etc/VRTSvcs/conf/config/types.cf \
      /etc/VRTSvcs/conf/config/types.save
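
A sketch of freezing every group in one pass; it assumes hagrp -list prints one group name per line (the awk/sort pipeline tolerates extra columns):

haconf -makerw
for grp in `hagrp -list | awk '{print $1}' | sort -u`
do
    hagrp -freeze $grp -persistent
done
haconf -dump -makero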

Step II - Stopping VCS Software


1. Stop the VCS engine on all systems, leaving the application services running:
   hastop -all -force
2. Remove any heartbeat disk configurations:
   gabdiskhb -l
   gabdiskx -l
   gabdiskhb -d disk_name
   gabdiskx -d device_name
3. Stop and unload GAB:
   gabconfig -U
   modinfo | grep gab
   modunload -i modid
4. Stop and unload LLT:
   lltconfig -U
   modinfo | grep llt
   modunload -i modid

Step III - Removing Old and Adding New VCS Software


1. Remove the existing VCS (pre-2.0) software packages:
   pkgrm VRTScscm VRTSvcs VRTSgab VRTSllt \
         VRTSperl
2. Add the new VCS software packages:
   pkgadd -d /package_directory

Step IV - Verifying and Changing the Configuration


1. Determine the differences between the existing and new types.cf files:
   diff /etc/VRTSvcs/conf/config/types.save \
        /etc/VRTSvcs/conf/config/types.cf
2. Merge the new and old versions of the types.cf files:
   a. Check for changes in attribute names.
   b. Check modified resource type attributes.
3. Compare and merge any necessary changes to the VCS scripts.
4. Verify the configuration files:
   hacf -verify /etc/VRTSvcs/conf/config

Step V - Starting the VCS Cluster

1. On all systems in the cluster, start LLT and GAB:
   lltconfig -c
   gabconfig -c -n number_of_systems
2. Start the VCS engine on the system where the changes were made:
   hastart
3. Start the VCS engine on all other systems in the cluster in a stale state:
   hastart -stale
4. Open the configuration, unfreeze the service groups, and save and close the configuration:
   haconf -makerw
   hagrp -unfreeze group_name -persistent
   haconf -dump -makero

Installing a VCS Patch


I.   Carry out the initial preparation (same as in the VCS upgrade).
II.  Stop the old VCS software (same as in the VCS upgrade).
III. Install and verify the new patch.
IV.  Start the VCS software.

Step III - Installing and Verifying the New Patch


1. Verify that the VRTS* packages are all version 2.0:
   pkginfo -l VRTSgab VRTSllt VRTSvcs \
           VRTSperl | grep VERSION
2. Add the new VCS patch on each system using the provided utility:
   ./vcs_install_patch
3. Verify that the new patch has been installed:
   showrev -p | grep VRTS

Step IV - Starting the VCS Cluster


1. Start LLT, GAB, and VCS on all systems in the cluster:
   lltconfig -c
   gabconfig -c -n number_of_systems
   hastart
2. Open the configuration, unfreeze the service groups, and save and close the configuration:
   haconf -makerw
   hagrp -unfreeze group_name -persistent
   haconf -dump -makero

Adding Systems to a Running VCS Cluster


1. Configure LLT with the same cluster number and a unique node ID on the new system.
2. Configure GAB.
3. Connect the new system to the private network.
4. Edit the /etc/llthosts file on all systems in the cluster to add the system name and node ID of the new system.
5. Start LLT, GAB, and VCS on the new system.
6. Change the SystemList attribute for each service group that can run on the new system.
Example configuration files for steps 1, 2, and 4 follow.
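
A sketch of the files for a third node joining a two-node training cluster; the node name, cluster number, and network devices are illustrative:

/etc/llttab on the new system:
set-node train13
set-cluster 10
link qfe0 /dev/qfe:0 - ether - -
link qfe1 /dev/qfe:1 - ether - -

/etc/gabtab on the new system (the seed count is typically raised to the new cluster size; check the value used on the existing systems):
/sbin/gabconfig -c -n3

/etc/llthosts on every system, including the new one:
0 train11
1 train12
2 train13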

Removing Systems from a Running VCS Cluster


1. Switch all running service groups to other systems and freeze the system (a sketch follows).
2. Stop VCS on the system using hastop -local.
3. Stop and unload GAB on the system:
   gabconfig -U
   modinfo | grep gab
   modunload -i modid
4. Stop and unload LLT on the system:
   lltconfig -U
   modinfo | grep llt
   modunload -i modid
5. Remove the system from the cluster configuration:
   hasys -delete system_name
6. Edit /etc/llthosts on all systems to delete the entry for the removed system.
7. Remove the llttab and gabtab files on that system.
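
A sketch of step 1 for a group running on the departing node (the group and system names are illustrative):

# hagrp -switch websg -to train11
# haconf -makerw
# hasys -freeze train12 -persistent
# haconf -dump -makero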

Replacing Systems in a Running VCS Cluster


1. Evacuate any service groups running on the system to be replaced.
2. Make the VCS configuration read/write, freeze the system persistently, and save and close the configuration:
   haconf -makerw
   hasys -freeze system_name -persistent
   haconf -dump -makero
3. Physically replace the system with a new one that uses the same VCS configuration (same cluster number, node ID, and system name).
4. Connect the new system to the private network.
5. Start LLT, GAB, and VCS on the new system.
6. Make the VCS configuration read/write, unfreeze the system, and save and close the configuration:
   haconf -makerw
   hasys -unfreeze system_name -persistent
   haconf -dump -makero

Summary
You should now be able to:
  Upgrade VCS software to version 2.0.
  Install a VCS patch.
  Add systems to a running VCS cluster.
  Remove systems from a running VCS cluster.
  Replace systems in a running VCS cluster.


Lab: Installing VCS Patches


(Diagram: Student Red installs the patch on the system running the RedSG service group; Student Blue installs the patch on the system running the BlueSG service group.)

VERITAS Cluster Server for Solaris

Introduction

VERITAS Cluster Server


(Diagram: clients access application services (NFS, WWW, FTP, DB) over the public network; the VCS systems are connected by a private network and attached to shared storage.)

VCS Features
Availability
  Monitor and restart applications.
  Set failover policies.
Scalability
  Distribute services.
  Add systems and storage to running clusters.
Manageability
  Use the Java or Web graphical interfaces.
  Manage multiple clusters.
(Diagram: clustered databases and clustered Web servers on a network.)

High Availability Design


HA-aware applications
  Restart capability
  Crash tolerance
HA management software
  Site replication
  Fault detection, notification, and failover
  Storage management
  Backup and recovery
Redundant hardware
  Power supplies
  Network interface cards, hubs, switches
  Storage

VERITAS Clustering and Replication Products


Cluster Management: VERITAS Global Cluster Manager
Application Availability Agents: Informix, Oracle, Sybase, Apache
High Availability Clustering: VERITAS Cluster Server
Data Replication: VERITAS Volume Replicator (VVR) and support for array-based replication
Parallel Extensions: VERITAS Cluster Volume Manager and File System
Foundation Products: VERITAS Volume Manager and File System

VERITAS High Availability Solutions


(Diagram: the Global Cluster Manager coordinates VCS clusters in Tokyo and London across a WAN; each site runs VxVM and VxFS, with the Volume Replicator replicating data between the sites.)

References for High Availability


Blueprints for High Availability: Designing Resilient Distributed Systems, by Evan Marcus and Hal Stern
High Availability: Design, Techniques, and Processes, by Floyd Piedad and Michael Hawkins
Designing Storage Area Networks, by Tom Clark
Storage Area Network Essentials: A Complete Guide to Understanding and Implementing SANs, by Richard Barker and Paul Massiglia
VERITAS High Availability Fundamentals (Web-based training)

Course Overview
(Course map: Introduction; Terms and Concepts; Installing VCS; Using Cluster Manager; Service Group Basics; Preparing Resources; Resources and Agents; NFS Resources; Faults and Failovers; Managing Cluster Services; Installing Applications; Using Volume Manager; Cluster Communication; Event Notification; Troubleshooting)

Lab Overview
(Diagram: the Red student uses the odd/low-numbered system, train1; the Blue student uses the even/high-numbered system, train2. The two systems share a SCSI JBOD and are connected by both private and public networks.)
