Symantec Cluster
Server 6.x for UNIX:
Administration
Fundamentals
Lessons
100-002839-A
COURSE DEVELOPERS
Raj Kiran Prasad Thota

LEAD SUBJECT MATTER EXPERTS
Graeme Gofton
Sean Nockles
Brad Willer
Gaurav Dong

TECHNICAL CONTRIBUTORS AND REVIEWERS
Geoff Bergren
Kelli Cameron
Tomer Gurantz
Anthony Herr
James Kenney
Bob Lucas
Paul Johnston
Rod Pixley
Clifford Barcliff
Danny Yonkers
Antonio Antonucci
Satoko Saito
Steve Evans
Feng Liu
Maurizio Lancia

Copyright © 2014 Symantec Corporation. All rights reserved. Symantec, the Symantec Logo, and VERITAS are trademarks or registered trademarks of Symantec Corporation or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners.

THIS PUBLICATION IS PROVIDED “AS IS” AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. SYMANTEC CORPORATION SHALL NOT BE LIABLE FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES IN CONNECTION WITH THE FURNISHING, PERFORMANCE, OR USE OF THIS PUBLICATION. THE INFORMATION CONTAINED HEREIN IS SUBJECT TO CHANGE WITHOUT NOTICE.

No part of the contents of this book may be reproduced or transmitted in any form or by any means without the written permission of the publisher.

Symantec Cluster Server 6.x for UNIX: Administration Fundamentals
Symantec Corporation
World Headquarters
350 Ellis Street
Mountain View, CA 94043
United States
http://www.symantec.com
Course Introduction
Clustering concepts
The term cluster refers to multiple independent systems connected into a
management framework.
Types of clusters
A variety of clustering solutions are available for various computing purposes.
• HA clusters: Provide resource monitoring and automatic startup and failover
• Parallel processing clusters: Break large computational programs into smaller
tasks executed in parallel on multiple systems
• Load balancing clusters: Monitor system load and distribute applications
automatically among systems according to specified criteria
• High performance computing clusters: Use a collection of computing
resources to enhance application performance
Unlike the N-to-1 configuration, after the failed server is repaired, it can
become the redundant server.
• N-to-N—This configuration is an active/active configuration that supports
multiple application services running on multiple servers. Each application
service is capable of being failed over to different servers in the cluster.
In the example shown in the slide, utilization is increased by reconfiguring four
active/passive clusters and one active/active cluster into one N-to-1 cluster and one
N-to-N cluster. This enables a savings of four systems.
Campus clusters
The campus or stretch cluster environment is a single cluster stretched over
multiple locations, connected by an Ethernet subnet for the cluster interconnect
and a Fibre Channel SAN, with storage mirrored at each location.
Advantages of this configuration are:
• It provides local high availability within each site as well as protection against
site failure.
• It is a cost-effective solution; replication is not required.
Global clusters
Global clusters, or wide-area clusters, contain multiple clusters in different
geographical locations. Global clusters protect against site failures by providing
data replication and application failover to remote data centers.
Global clusters are not limited by distance because cluster communication uses
TCP/IP. Replication can be provided by hardware vendors or by a software
solution, such as Veritas Volume Replicator, for heterogeneous array support.
HA application services
An application service is a collection of hardware and software components
required to provide a service, such as a Web site, that an end-user can access by
connecting to a particular network IP address or host name. Each application
service typically requires components of the following three types:
• Application binaries (executables)
• Network
• Storage
If an application service needs to be switched to another system, all of the
components of the application service must migrate together to re-create the
service on another system.
These are the same components that the administrator must manually move from a
failed server to a working server to keep the service available to clients in a
nonclustered environment.
Application service examples include:
• A Web service consisting of a Web server program, IP addresses, associated
network interfaces used to allow access into the Web site, a file system
containing Web data files, and a volume and disk group containing the file
system.
• A database service may consist of one or more IP addresses, database
management software, a file system containing data files, a volume and disk
group on which the file system resides, and a NIC for network access.
External dependencies
Whenever possible, it is good practice to eliminate or reduce reliance by high
availability applications on external services. If it is not possible to avoid outside
dependencies, ensure that those services are also highly available.
For example, network name and information services, such as DNS (Domain
Name System) and NIS (Network Information Service), are designed with
redundant capabilities.
VCS terminology
VCS cluster
A VCS cluster is a collection of independent systems working together under the
VCS management framework for increased service availability.
VCS clusters have the following components:
• Up to 64 systems—sometimes referred to as nodes or servers
Each system runs its own operating system.
• A cluster interconnect, which enables cluster communications
• A public network, connecting each system in the cluster to a LAN for client
access
• Shared storage (optional), accessible by each system in the cluster that needs to
run the application
• The list of cluster systems on which you want the group to start automatically
Resource categories
• Persistent: never taken offline
– None
VCS can only monitor persistent resources—these resources cannot be brought online or taken offline.
Resource dependencies
Resources depend on other resources because of application or operating system
requirements. Dependencies are defined to configure VCS for these requirements.
Dependency rules
These rules apply to resource dependencies:
• A parent resource depends on a child resource. In the diagram, the Mount
resource (parent) depends on the Volume resource (child). This dependency
illustrates the operating system requirement that a file system cannot be
mounted without the Volume resource being available.
• Dependencies are homogeneous. Resources can depend only on other
resources.
• No cyclical dependencies are allowed. There must be a clearly defined
starting point.
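The Mount-on-Volume rule above can be written in main.cf syntax. The following sketch uses hypothetical resource names and an abbreviated attribute list, not the actual course configuration:

```
Mount appmnt (
    MountPoint = "/appdata"
    BlockDevice = "/dev/vx/dsk/appdatadg/appdatavol"
    FSType = vxfs
    )
appmnt requires appvol
```

The requires statement makes appmnt the parent and appvol the child, so VCS brings the volume online before attempting the mount.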
The difference between offline and clean is that offline is an orderly termination
and clean is a forced termination. In UNIX, this can be thought of as the difference
between exiting an application and sending the kill -9 command to the
process.
Each resource type needs a different way to be controlled. To accomplish this,
each agent has a set of predefined entry points that specify how to perform each of
the four actions. For example, the startup entry point of the Mount agent mounts a
block device on a directory, whereas the startup entry point of the IP agent uses the
ifconfig (Solaris, AIX, HP-UX) or ip addr add (Linux) command to set
the IP address on a unique IP alias on the network interface.
VCS provides both predefined agents and the ability to create custom agents.
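As a sketch of what an entry point does, the online action of the IP agent on Linux amounts to commands along these lines. The interface name and address are illustrative assumptions, not values from the course environment:

```shell
# Roughly what the IP agent's online entry point performs on Linux
ip addr add 10.10.21.198/24 dev eth0 label eth0:1   # plumb the virtual IP as an alias
ip addr show dev eth0                               # monitor checks that the alias exists
```

The real agent adds error handling and reads these values from resource attributes.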
Note: The Veritas Cluster User’s Guide provides an appendix with a complete
description of attributes for all cluster objects.
To obtain PDF versions of product documentation for VCS and agents, see the
SORT Web site.
Low-Latency Transport
Clustering technologies from Symantec use a high-performance, low-latency
protocol for communications. LLT is designed for the high-bandwidth and low-
latency needs of not only Veritas Cluster Server, but also Veritas Cluster File
System and Veritas Storage Foundation for Oracle RAC.
LLT runs directly on top of the Data Link Provider Interface (DLPI) layer over
Ethernet and has several major functions:
• Sending and receiving heartbeats over network links
• Monitoring and transporting network traffic over multiple network links to
every active system
• Balancing the cluster communication load over multiple links
• Maintaining the state of communication
I/O fencing
The fencing driver implements I/O fencing, which prevents multiple systems from
accessing the same Volume Manager-controlled shared storage devices in the
event that the cluster interconnect is severed. In the example of a two-node cluster
displayed in the diagram, if the cluster interconnect fails, each system stops
receiving heartbeats from the other system.
GAB on each system determines that the other system has failed and passes the
cluster membership change to the fencing module.
The fencing modules on both systems contend for control of the disks according to
an internal algorithm. The losing system is forced to panic and reboot. The
winning system is now the only member of the cluster, and it fences off the shared
data disks so that only systems that are still part of the cluster membership (only
one system in this example) can access the shared storage.
The winning system takes corrective action as specified within the cluster
configuration, such as bringing service groups online that were previously running
on the losing system.
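When fencing is configured, its state can be inspected from the command line. A hedged sketch; the exact output varies by platform and configuration:

```shell
vxfenadm -d      # display the fencing mode and current cluster membership
gabconfig -a     # a 'port b' membership entry indicates the fencing driver is running
```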
VCS architecture
Maintaining the cluster configuration
HAD maintains configuration and state information for all cluster resources in
memory on each cluster system. Cluster state refers to tracking the status of all
resources and service groups in the cluster. When any change to the cluster
configuration occurs, such as the addition of a resource to a service group, HAD
on the initiating system sends a message to HAD on each member of the cluster by
way of GAB atomic broadcast, to ensure that each system has an identical view of
the cluster.
Atomic means that all systems receive updates, or all systems are rolled back to the
previous state, much like a database atomic commit.
The cluster configuration in memory is created from the main.cf file on disk in
the case where HAD is not currently running on any cluster systems, so there is no
configuration in memory. When you start VCS on the first cluster system, HAD
builds the configuration in memory on that system from the main.cf file.
Changes to a running configuration (in memory) are saved to disk in main.cf
when certain operations occur. These procedures are described in more detail later
in the course.
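The save-to-disk behavior described above is driven by the haconf command. A minimal sketch, assuming a service group named websg and systems s1 and s2:

```shell
haconf -makerw                            # open the in-memory configuration read-write
hagrp -modify websg AutoStartList s1 s2   # example change; HAD broadcasts it to all nodes
haconf -dump -makero                      # write main.cf to disk on each node and close
```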
are restarted.
Labs and solutions for this lesson are located on the following pages.
• “Lab environment,” page A-3.
Cluster interconnect
Veritas Cluster Server requires a minimum of two heartbeat channels for the
cluster interconnect.
Loss of the cluster interconnect results in downtime and, in nonfencing
environments, can result in a split-brain condition (described in detail later in the
course).
Configure a minimum of two physically independent Ethernet connections on each
cluster system.
Shared storage
VCS is designed primarily as a shared data high availability product; however, you
can configure a cluster that has no shared storage.
For shared storage clusters, consider these recommendations:
• One HBA minimum for shared and one for nonshared (boot) disks:
When a Solaris system in a VCS cluster is paused with the Stop-A key sequence,
the system stops producing VCS heartbeats. This causes the other systems to
consider it a failed node.
Ensure that the only action possible after an abort is a reset. To ensure that you
never issue a go function after an abort, create an alias for the go function that
displays a message. See the Veritas Cluster Server Installation Guide for the
detailed procedure.
Preparation assistance
Several tools are available from the Symantec Operations Readiness Tools (SORT)
Web site to help you prepare your environment to implement clustering.
• Data collection and reporting tools
A data collector can be run from the Web site, or downloaded locally, to gather
system information, run preinstallation checks, and generate reports.
• Documentation and compatibility lists
All product documentation, as well as software and hardware compatibility
lists are available from SORT.
• Preparation checklists
Platform-specific checklists can be created to assist in preparing an
environment for clustering.
• Patch management
SORT provides access to all products in the Storage Foundation HA family.
• Risk assessment
Checklists and reports can be used to analyze your environment, identify
risks, and recommend remedies.
• Error code lookup
SORT enables you to search for additional information about error messages.
You can also request help for undocumented error codes.
• Inventory management service
Inventory management is a service that provides the ability to gather license
information from Storage Foundation HA deployments.
Alternatively, you can run installvcs from the location of your VCS product
distribution to check your environment, and examine the resulting log file to
assess readiness to install VCS.
cd sw_location
./installvcs -precheck system1 system2
For more information about these selections, see the Veritas Cluster Server
Installation Guide.
Lab solutions for this lesson are located on the following pages.
• “Lab 2: Validating site preparation,” page A-47
The Web installer supports most features of the installer utility. See the Veritas
Cluster Server Installation Guide for a description of supported options. The guide
also includes the browser types and versions supported by the Web installer.
If you are using VCS with shared storage devices that support SCSI-3 Persistent
Reservations, configure fencing after VCS is initially installed.
SCSI-3-based fencing provides the highest level of protection for data that is
located on shared storage and accessed by multiple cluster nodes.
You can configure fencing at any time using the installvcs -fencing
utility, as described in the “I/O Fencing” lesson. However, if you set up fencing
after you have service groups running, you must stop and restart VCS for fencing
to take effect.
Product documentation is not included with the software packages. You can
download all documentation from the SORT Web site.
This file contains the command line that is used to start GAB.
Cluster communication is described in detail later in the course.
Note: This command line shows status only if a module is using LLT, such as
GAB. If GAB is not running, the output shows a comm wait state.
The configured and active options show only nodes where LLT is
configured or active.
The lltconfig command just displays whether LLT is running, with no detail.
LLT is discussed in more detail later in the course. For now, you can see that LLT
is running using these commands.
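For reference, a sketch of the commands commonly used to check LLT and GAB status; output formats vary by platform and version:

```shell
lltconfig        # reports only whether LLT is running
lltstat -nvv     # verbose per-node, per-link LLT status
gabconfig -a     # GAB port memberships (port a is GAB itself, port h is HAD)
```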
Lab solutions for this lesson are located on the following pages.
• “Lab 3: Installing Storage Foundation HA 6.x,” page A-65.
• Determine the virtual IP address for the websg service group.
hares -value webip Address
#Resource Attribute System Value
webip Address global 10.10.27.93
• Determine the state of a resource on each cluster system.
hares -state webip
#Resource Attribute System Value
webip State s1 OFFLINE
webip State s2 ONLINE
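The tabular output above can be filtered with standard text tools. This sketch embeds the example output as sample text, so the parsing logic can be tried without a running cluster:

```shell
# Sample 'hares -state webip' output, embedded here for illustration
sample='#Resource Attribute System Value
webip State s1 OFFLINE
webip State s2 ONLINE'

# Print the system on which the resource is ONLINE (field 3 of matching rows)
online_sys=$(printf '%s\n' "$sample" | awk '$4 == "ONLINE" {print $3}')
echo "$online_sys"
```

Against the sample above, this prints s2.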
Provide the service group name and the name of the system where the service
group is to be brought online.
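For example, assuming the websg service group and the systems shown in the earlier output:

```shell
hagrp -online websg -sys s1    # bring websg online on system s1
```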
Note: The service group shown in the slide is partially online after the webdg
resource is brought online. This is depicted by the textured coloring of the
service group circle.
Lab solutions for this lesson are located on the following pages.
• “Lab 4: Performing common VCS operations,” page A-85.
The s1 system is now in the VCS local build state, meaning that VCS is building
the cluster configuration in memory from the local main.cf file.
The startup process is repeated on each system until all members have identical
copies of the cluster configuration in memory and matching main.cf files on
local disks. Synchronization is maintained by data transfer through LLT and GAB.
Use caution with this option. VCS does not warn you if the configuration is
open and you stop VCS using the -force option.
• The -local option causes the service group to be taken offline on s1 and
stops the VCS engine (had) on s1.
• The -local -evacuate options cause the service group on s1 to be
migrated to s2 and then stop had on s1.
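The stop variants discussed above can be sketched as follows, using the s1 system from the example:

```shell
hastop -local                # take service groups offline on s1, then stop had on s1
hastop -local -evacuate      # migrate service groups from s1 first, then stop had on s1
hastop -all -force           # stop had on all systems but leave applications running
```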
The had daemon communicates the configuration change to had on all other
nodes in the cluster, and each had daemon changes the in-memory configuration.
When the command to save the configuration is received from Cluster Manager,
had communicates this command to all cluster systems, and each system’s had
daemon writes the in-memory configuration to the main.cf file on its local disk.
The VCS command-line interface is an alternate online configuration tool. When
you run ha commands, had responds in the same fashion.
By default, only the UNIX root account is able to use VCS ha commands to
administer VCS from the command line.
Note: The effect of halogin only applies for that shell session.
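A hedged example; the user name is a placeholder for a VCS account that has already been created:

```shell
halogin vcsadmin password    # cache credentials; ha commands now work in this shell session
```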
Note: In non-secure mode, if you change a UNIX account, this change is not
reflected in the VCS configuration automatically. You must manually modify
accounts in both places if you want them to be synchronized.
Lab solutions for this lesson are located on the following pages.
• “Lab 5: Starting and stopping VCS,” page A-97.
– Network interfaces
• Application-related resources:
– Identical installation and configuration procedures
– Procedures to manage and monitor the application
– The location of application binary and data files
The following sections describe the aspects of these components that are critical to
understanding how VCS manages resources.
Note: If your systems are not configured identically, you must note those
differences in the design worksheet. The “Online Configuration” lesson
shows how you can configure a resource with different attribute values for
different systems.
Note: Although examples used throughout this course are based on Veritas
Volume Manager, VCS also supports other volume managers. VxVM is
shown for simplicity—objects and commands are essentially the same on
all platforms. The agents for other volume managers are described in the
Veritas Cluster Server Bundled Agents Reference Guide.
Preparing shared storage, such as creating disk groups, volumes, and file systems,
is performed once, from one system. Then you must create mount point directories
on each system.
• Apply licenses.
• Set up configuration files.
This ensures that you have correctly identified the information used by the VCS
agent scripts to control the application.
Note: The shutdown procedure should be a graceful stop, which performs any
cleanup operations.
AIX
mount -V vxfs /dev/vx/dsk/appdatadg/appdatavol /appdata
Linux
mount -t vxfs /dev/vx/dsk/appdatadg/appdatavol /appdata
Solaris/HP-UX
mount -F vxfs /dev/vx/dsk/appdatadg/appdatavol /appdata
Note: The admin IP address on s2 is also configured during system startup. This
address is unique and associated with only this system, unlike the virtual IP
address.
Note: These virtual IP addresses are only configured temporarily for testing
purposes. You must not configure the operating system to manage the
virtual IP addresses.
Note: In each case, you can edit /etc/hosts to assign a virtual host name
(application name) to the virtual IP address.
10.10.21.198 eweb.com
Follow the guidelines for your platform to remove an application from operating
system control in preparation for configuring VCS to control the application.
Note: To test the network resources, access one or more well-known addresses
outside of the cluster, such as local routers, or primary and secondary DNS
servers.
This helps you identify any potential configuration problems before you test the
service as a whole, as described in the “Testing the Integrated Components”
section.
exported file system, verify that you can mount the exported file system from a
client on the network. This is described in more detail later in the course.
Linux
ifdown eth0:1
Solaris
ifconfig e1000g0 removeif 10.10.21.198
Lab solutions for this lesson are located on the following pages.
• “Lab 6: Preparing application services,” page A-113.
• NetworkHosts: The list of hosts on the network that are used to determine if
the network connection is alive
It is recommended that you specify the IP address of the host rather than the
host name to prevent the monitor cycle from timing out due to DNS problems.
• Example device attribute values:
AIX: en0; HP-UX: lan2; Linux: eth0; Solaris: e1000g0
Optional Attributes
• NetMask: Netmask associated with the application IP address
– The value may be specified in decimal (base 10) or hexadecimal (base 16).
The default is the netmask corresponding to the IP address class.
– This is a required attribute on AIX.
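Putting these attributes together, a NIC resource definition might look like this main.cf sketch. The device name and network host address are illustrative assumptions:

```
NIC appnic (
    Device = eth0
    NetworkHosts = { "10.10.2.1" }
    )
```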
Note: As of version 4.1, VCS sets the vxdg autoimport option to no, which
disables autoimporting of disk groups.
• StartVolumes: Starts all volumes after the disk group is imported. This also
starts layered volumes by running vxrecover -s. The default is 1 (enabled)
on all UNIX platforms except Linux.
• StopVolumes: Stops all volumes with vxvol before the disk group is deported.
The default is 1 (enabled) on all UNIX platforms except Linux.
Note: The example operating system commands for unmounting a locked file
system are specific to Solaris. Other operating systems may use different
commands.
Note: Some resources must be disabled and reenabled. Only resources whose
agents have open and close entry points, such as MultiNICB, require you to
disable and enable again after fixing the problem. By contrast, a Mount
resource does not need to be disabled if, for example, you incorrectly
specify the MountPoint attribute.
Note: Faulted persistent resources should be probed to force the agent to monitor
the resource immediately. Otherwise, the resource is not shown as online
until the next OfflineMonitorInterval cycle, which can be up to five minutes.
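For example, to probe a resource immediately (resource and system names are illustrative):

```shell
hares -probe appnic -sys s1    # force the agent to monitor the resource now
```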
Test procedure
For simplicity, the example service group uses the default Priority failover policy.
That is, if a critical resource in appsg faults, the service group is taken offline and
brought online on the system with the lowest priority value that is available for
failover.
The “Handling Resource Faults” lesson provides additional information about
configuring and testing failover behavior. Additional failover policies are also
described in the Veritas Cluster Server for UNIX: Cluster Management participant
guide.
//{
//IP appip
// {
// NIC appnic
// }
//}
Note: You cannot use the // characters as general comment delimiters. VCS
strips out all lines with // upon startup and re-creates these lines based on the
requires statements in the main.cf file.
Note: When you set an attribute to its default value, the attribute is removed from
main.cf. For example, after you set Critical to 1 (its default) for a resource,
the Critical line no longer appears in the resource definition.
To see the values of all attributes for a resource, use the hares command. For
example:
hares -display appdg
Lab solutions for this lesson are located on the following pages.
• “Lab 7: Online configuration of a service group,” page A-125.
You can use the VOM to create and test a cluster configuration on Windows and
then copy the finalized configuration files into a real cluster environment. The
VOM enables you to create configurations for all supported UNIX, Linux, and
Windows platforms.
This only applies to the cluster configuration. You must perform all preparation
tasks to create and test the underlying resources, such as virtual IP addresses,
shared storage objects, and applications.
After the cluster configuration is copied to the real cluster and VCS is restarted,
you must perform complete testing of all objects, as shown later in this lesson.
VRTSvcs/conf/config directory.
3 Stop VCS.
Stop VCS on all cluster systems. This ensures that there is no possibility of
another administrator changing the cluster configuration while you are
modifying the main.cf file.
4 Edit the configuration files.
You must choose a system on which to modify the main.cf file. You can
choose any system. However, you must then start VCS first on that system.
5 Verify the configuration file syntax.
Note: The hacf command only identifies syntax errors, not configuration errors.
First system
Designate one system as the primary change management node. This makes
troubleshooting easier if you encounter problems with the configuration.
1 Save and close the configuration.
Save and close the cluster configuration before you start making changes. This
ensures that the working copy has the latest in-memory configuration.
Note: The dot (.) argument indicates that the current working directory is used as
the path to the configuration files. You can run hacf -verify from any
directory by specifying the path to the configuration directory:
hacf -verify /etc/VRTSvcs/conf/config
8 Stop VCS.
Stop VCS on all cluster systems after making configuration changes. To leave
applications running, use the -force option, as shown in the diagram.
main.cf file and loads the cluster configuration into local memory on s1.
5 Verify that VCS is in a local build or running state on s1 using hastatus
-sum.
Resource dependencies
Ensure that you create the resource dependency definitions at the end of the
service group definition. Add the links using the syntax shown in the slide.
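For example, dependency links at the end of a service group definition might read as follows; the resource names are illustrative:

```
appip requires appnic
appmnt requires appvol
appvol requires appdg
```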
Note: You cannot include comment lines in the main.cf file. The lines you see
starting with // are generated by VCS to show resource dependencies. Any
lines starting with // are stripped out during VCS startup.
Note: You must ensure that VCS is in the local build or running state on the
system with the recovered main.cf file before starting VCS on other
systems.
7 When HAD is in a running state on s1, this state change is broadcast on the
cluster interconnect by GAB.
8 Next, run hastart on s2 to start HAD.
9 HAD on s2 checks for a valid main.cf file. This system has an old version of
the main.cf.
10 HAD on s2 then checks for another node in a local build or running state.
11 Since s1 is in a local build or running state, HAD on s2 performs a remote
build from the configuration on s1.
12 HAD on s2 copies the cluster configuration into the local main.cf and
types.cf files after moving the original files to backup copies with
timestamps.
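The startup sequence above can be sketched as a pair of commands, assuming the configuration was edited on s1:

```shell
# On s1, the system with the edited main.cf:
hastart            # performs a local build from the main.cf file
hastatus -sum      # wait until s1 reports a local build or running state
# Then on s2:
hastart            # performs a remote build from s1's in-memory configuration
```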
Lab solutions for this lesson are located on the following pages.
• “Lab 8: Offline configuration,” page A-157.
Notification overview
When VCS detects certain events, you can configure the notifier to:
• Generate an SNMP (V2) trap to specified SNMP consoles.
• Send an e-mail message to designated recipients.
Message queue
VCS ensures that no event messages are lost while the VCS engine is running,
even if the notifier daemon stops or is not started. The had daemons
throughout the cluster communicate to maintain a replicated message queue.
If the service group with notifier configured as a resource fails on one of the nodes,
notifier fails over to another node in the cluster. Because the message queue is
guaranteed to be consistent and replicated across nodes, notifier can resume
message delivery from where it left off after it fails over to the new node.
Messages are stored in the queue until one of these conditions is met:
• The notifier daemon sends an acknowledgement to had that at least one
recipient has received the message.
• The queue is full. The queue is circular—the last (oldest) message is deleted in
order to write the current (newest) message.
• Messages that remain in the queue for one hour are deleted if notifier is
unable to deliver them to a recipient.
Note: Before the notifier daemon connects to had, messages are stored
permanently in the queue until one of the last two conditions is met.
• Cannot be autodisabled
• Switches to another node upon hastop -local on the online system
• Attempts to start on all miniclusters if a network partition occurs
ClusterService is also used to manage the wide-area connector process in a global
cluster environment.
Notification configuration
These high-level tasks are required to manually configure highly available
notification within the ClusterService group.
1 Add a NotifierMngr type of resource to the ClusterService group and link it to
the csgnic resource that is present in the group.
2 If SMTP notification is required:
a Modify the SmtpServer and SmtpRecipients attributes of the NotifierMngr
type of resource.
b Optionally, modify the ResourceOwner attribute of individual resources.
c Optionally, specify a GroupOwner e-mail address for each service group.
3 If SNMP notification is required:
a Modify the SnmpConsoles attribute of the NotifierMngr type of resource.
b Verify that the SNMPTrapPort attribute value matches the port configured
for the SNMP console. The default is port 162.
c Configure the SNMP console to receive VCS traps (described later in the
lesson).
4 Modify any other optional attributes of the NotifierMngr type of resource.
See the manual pages for notifier and hanotify for a complete description
of notification configuration options.
Note: Before modifying resource attributes, ensure that you take the resource
offline and disable it. The notifier daemon must be stopped and
restarted with new parameters in order for changes to take effect.
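For reference, a NotifierMngr resource configured for both SMTP and SNMP notification might look like this in main.cf (the host names, recipients, and severity levels are illustrative placeholders):

```
NotifierMngr ntfr (
    SmtpServer = "smtp.example.com"
    SmtpRecipients = { "admin@example.com" = SevereError }
    SnmpConsoles = { "snmpconsole.example.com" = Error }
    )

ntfr requires csgnic
```

The requires line links the NotifierMngr resource to the csgnic resource, as described in step 1.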
Overview of triggers
Using triggers
VCS provides an additional method for notifying users of important events. When
VCS detects certain events, you can configure a trigger to notify an administrator
or perform other actions. You can use event triggers in place of, or in conjunction
with, notification.
Triggers are executable programs (batch files, or shell or Perl scripts) associated with the predefined event types supported by VCS that are shown in the slide.
Triggers are configured by specifying one or more keys in the TriggersEnabled
attribute. Some keys are specific to service groups or resources.
The RESSTATECHANGE, RESRESTART, and RESFAULT keys apply to both
resources and service groups. When one of these keys is specified in TriggersEnabled
at the service group level, the trigger applies to each resource in the service group.
Examples of some trigger keys include:
• POSTOFFLINE: The service group went offline from a PARTIAL or ONLINE
state.
• POSTONLINE: The service group went online from OFFLINE state.
• RESFAULT: A resource faulted.
• RESRESTART: A resource was restarted after a fault.
For a complete description of triggers, see the Veritas Cluster Server
Administrator’s Guide.
Location of triggers
Trigger executable programs (batch files, or shell or Perl scripts) reside in /opt/VRTSvcs/bin/triggers by default.
You can change the location of triggers by specifying the TriggerPath attribute at
the service group or resource level. This attribute enables you to set up different
trigger programs for resources or service groups. In previous versions of VCS, the
same triggers applied to all resources or service groups in the cluster.
The value of the TriggerPath attribute is appended to /opt/VRTSvcs (also
referred to as VCS_HOME) to form a directory containing the trigger programs. In
the example shown in the slide, TriggerPath is set to bin/websg. Therefore, the
files executed when the PREONLINE key is specified for the websg service group
must be located in /opt/VRTSvcs/bin/websg.
The example portion of the main.cf file shows the PREONLINE trigger enabled
for websg on both s1 and s2, and the trigger path customized to map to
/opt/VRTSvcs/bin/websg.
Copy the script or program to each system in the cluster that can run the trigger.
Finally, modify the TriggersEnabled attribute to specify the key for each system
that can run the trigger.
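As a minimal sketch of what a trigger program can look like, the following shell script logs resource fault events to a file. The argument order and the log path are illustrative assumptions; see the Veritas Cluster Server Administrator's Guide for the exact arguments VCS passes to each trigger.

```shell
#!/bin/sh
# Sketch of a resfault-style trigger script. VCS invokes triggers with
# event details as positional arguments; the order shown here is an
# illustrative assumption.

log_fault() {
    system=$1       # system on which the resource faulted
    resource=$2     # name of the faulted resource
    prev_state=$3   # resource state before the fault
    logfile=${TRIGGER_LOG:-/var/tmp/vcs_resfault.log}

    # Record the event for later review; a real trigger might also
    # send mail or open a ticket here.
    echo "resfault: $resource on $system (was $prev_state)" >> "$logfile"
}

# When invoked by VCS, the event details are on the command line.
[ $# -ge 3 ] && log_fault "$@"
```

Such a script must be executable and must reside in the trigger directory (by default /opt/VRTSvcs/bin/triggers, or the customized TriggerPath location) on every system that can run it.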
Lab solutions for this lesson are located on the following pages.
• “Lab 9: Configuring notification,” page A-173.
You can customize failover behavior by setting one or more optional service group attributes. Failover determination and behavior are described throughout this lesson.
ManageFaults
The ManageFaults attribute can be used to prevent VCS from taking any automatic
actions whenever a resource failure is detected. Essentially, ManageFaults
determines whether VCS or an administrator handles faults for a service group.
If ManageFaults is set to the default value of ALL, VCS manages faults by executing the clean entry point for that resource to ensure that the resource is completely offline, as shown previously.
If this attribute is set to NONE, VCS places the resource in an ADMIN_WAIT
state and waits for administrative intervention. This is often used for service
groups that manage database instances. You may need to leave the database in its
FAULTED state in order to perform problem analysis and recovery operations.
Note: This attribute is set at the service group level. This means that any resource
fault within that service group requires administrative intervention if the
ManageFaults attribute for the service group is set to NONE.
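For example, a database service group might be configured with ManageFaults set to NONE directly in main.cf (the group and system names are illustrative):

```
group dbsg (
    SystemList = { s1 = 0, s2 = 1 }
    ManageFaults = NONE
    )
```

With this setting, any resource fault in dbsg leaves the faulted resource in the ADMIN_WAIT state until an administrator intervenes.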
AutoFailOver
This attribute determines whether automatic failover takes place when a resource
or system faults. The default value of 1 indicates that the service group should be
failed over to other available systems if at all possible. However, if the attribute is
set to 0, no automatic failover is attempted for the service group, and the service
group is left in an OFFLINE | FAULTED state.
MonitorInterval
This is the duration (in seconds) between two consecutive monitor calls for an
online or transitioning resource.
The default is 60 seconds for most resource types.
OfflineMonitorInterval
This is the duration (in seconds) between two consecutive monitor calls for an offline resource. The default is 300 seconds (five minutes) for most resource types.
Restart example
This example illustrates how the RestartLimit and ConfInterval attributes can be configured to modify the behavior of VCS when a resource faults.
Setting RestartLimit = 1 and ConfInterval = 180 has this effect when a resource
faults:
1 The resource stops after running for 10 minutes.
2 The next monitor returns offline.
3 The ConfInterval counter is set to 0.
4 The agent checks the value of RestartLimit.
5 The resource is restarted because RestartLimit is set to 1, which allows one restart within the ConfInterval time.
6 The next monitor returns online.
7 The ConfInterval counter is now 60; one monitor cycle has completed.
8 The resource stops again.
9 The next monitor returns offline.
10 The ConfInterval counter is now 120; two monitor cycles have completed.
11 The resource is not restarted because the RestartLimit counter is now 1 and the
ConfInterval counter is 120 (seconds). Because the resource has not been
online for the ConfInterval time of 180 seconds, it is not restarted.
12 VCS faults the resource.
If the resource had remained online for 180 seconds, the internal RestartLimit
counter would have been reset to 0.
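The settings used in this example correspond to the following static attribute values on the resource type, shown here as a types.cf sketch for the Process type (the type chosen is illustrative; as described below, these static attributes can also be overridden per resource):

```
type Process (
    static int RestartLimit = 1
    static int ConfInterval = 180
    static str ArgList[] = { PathName, Arguments }
    str PathName
    str Arguments
)
```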
Some predefined static resource type attributes (those resource type attributes that
do not appear in types.cf unless their value is changed, such as
MonitorInterval) and all static attributes that are not predefined (static attributes
that are defined in the type definition file) can be overridden. For a detailed list of
predefined static attributes that can be overridden, refer to the VERITAS Cluster
Server User’s Guide.
Note: You can also run hagrp -clear group [-sys system] to clear
all FAULTED resources in a service group. However, you have to ensure
that all of the FAULTED resources are completely offline and the faults are
fixed on all the corresponding systems before running this command.
The FAULTED status of a resource is cleared when the monitor returns an online
status for that resource. Note that offline resources are monitored according to the
value of OfflineMonitorInterval, which is 300 seconds (five minutes) by default.
To avoid waiting for the periodic monitoring, you can initiate the monitoring of the
resource manually by probing the resource.
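For example, the following command transcript (resource, group, and system names are illustrative) probes a repaired persistent resource and clears the FAULTED status, either per resource or for the whole group:

```
hares -probe webnic -sys s1
hares -clear webproc -sys s1
hagrp -clear websg -sys s1
```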
Lab solutions for this lesson are located on the following pages.
• “Lab 10: Configuring resource fault behavior,” page A-197.
Lab solutions for this lesson are located on the following pages.
• “Lab 11: IMF and AMF,” page A-243.
Note: The port a, port b, and port h generation numbers change each time the
membership changes.
• Specify the network device names used for the cluster interconnect.
• Modify LLT behavior, such as heartbeat frequency.
Note: Ensure that there is only one set-node line in the llttab file.
Note: The system (node) name does not need to be the UNIX host name found
using the hostname command. However, Symantec recommends that
you keep the names the same to simplify administration, as described in the
next section.
See the llthosts manual page for a complete description of the file.
Note: You can use the same cluster interconnect network infrastructure for
multiple clusters. The llttab file must specify the appropriate cluster ID
to ensure that there are no conflicting node IDs.
If you bypass the installer mechanisms for ensuring the cluster ID is unique and
LLT detects multiple systems with the same node ID and cluster ID on a private
network, the LLT interface is disabled on the node that is starting up. This prevents
a possible split-brain condition, where a service group might be brought online on
the two systems with the same node ID.
Note: If the sysname file contains a different name from the llttab/
llthosts/main.cf files, this “phantom” system is added to the cluster
upon cluster startup.
The sysname file can be specified for the set-node directive in the llttab
file. In this case, the llttab file can be identical on every node, which may
simplify reconfiguring the cluster interconnect in some situations.
See the sysname manual page for a complete description of the file.
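A minimal llttab using these directives might look like this (the cluster ID and Linux-style device names are illustrative; link directive syntax varies by platform):

```
set-node /etc/VRTSvcs/conf/sysname
set-cluster 10
link eth1 eth1 - ether - -
link eth2 eth2 - ether - -
```

Because set-node points at the sysname file rather than naming the node directly, this file can be identical on every node in the cluster.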
Note: Other gabconfig options are discussed later in this lesson. See the
gabconfig manual page for a complete description of the file.
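GAB is typically started at boot from the /etc/gabtab file, which contains a single gabconfig command. For a three-node cluster, a typical entry is:

```
/sbin/gabconfig -c -n3
```

The -c option configures the GAB driver and -n3 directs GAB to wait until three systems are communicating before seeding the cluster.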
Where HAD has been stopped with the hastop -all -force option, the
resources are marked as online.
In this example, there are two Ethernet LLT links for the cluster interconnect.
Prior to any failures, systems s1, s2, and s3 are part of the regular membership of
cluster number 1. When the s3 system fails, it is no longer part of the cluster
membership. Service group C fails over and starts up on either s1 or s2, according
to the SystemList and FailOverPolicy values.
The time required for the VCS policy module to determine the target system is
negligible, less than one second in all cases, in comparison to the other factors.
• Bring the service group online on another system in the cluster.
As described in an earlier lesson, the time required for the application service
to start up is a key factor in determining the total failover time.
CAUTION Only manually seed the cluster when you are sure that no other
systems have GAB seeded. In clusters that do not use I/O fencing,
you can potentially create a split-brain condition by using
gabconfig improperly.
After you have started GAB on one system, start GAB on other systems using
gabconfig with only the -c option. You do not need to force GAB to start with
the -x option on other systems. When GAB starts on the other systems, it
determines that GAB is already seeded and starts up.
and s2.
• Service groups A, B, and C continue to run and all other cluster functions
remain unaffected.
• Failover due to a resource fault or an operator request to switch a service group
is unaffected.
• If system s3 now faults or its last LLT link is lost, service group C is not started
on systems s1 or s2.
If an application starts on multiple systems and can gain control of what are normally exclusive resources, such as disks in a shared storage device, a split-brain condition results and data can be corrupted.
VCS uses the low-priority link only for heartbeats (at half the normal rate), unless
it is the only remaining link in the cluster interconnect.
Lab solutions for this lesson are located on the following pages.
• “Lab 12: Cluster communications,” page A-257.
Failure of the cluster interconnect presents identical symptoms. In this case, both
nodes determine that their peer has departed and attempt to take corrective action.
This can result in data corruption if both nodes are able to take control of storage in
an uncoordinated manner.
Other scenarios can cause this situation. If a system is so busy that it appears to be hung, it can appear to have failed from the perspective of another system in the cluster. The second
system would then take the corrective action of starting the services of the hung
system. This can also happen on systems where the hardware supports a break and
resume function. If the system is dropped to command-prompt level with a break
and subsequently resumed, the system can appear to have failed. The cluster is
reformed and then the system recovers and begins writing to shared storage again.
blocking access to other nodes. Persistent reservations are persistent across SCSI
bus resets and also support multiple paths from a host to a disk.
Coordinator disks
The coordinator disks act as a global lock mechanism used by the fencing driver to
determine which nodes are currently registered in the cluster. This registration is
represented by a unique key associated with each node that is written to the
coordinator disks. In order for a node to access a data disk, that node must have a
Note: The registration key is not actually written to disk, but is stored in the drive
electronics or RAID controller.
node 2 would be CVCS, and so on. For simplicity, these are shown as A and B in
the diagram.
After registering with the data disks, a Write Exclusive Registrants Only
reservation is set on the data disk. This reservation means that only the registered
system can write to the data disk.
in the disk group belonging to the dbsg service group. Node 1 is registered to write
to the data disks in the disk group belonging to the appsg service group.
After registering with the data disk, Volume Manager sets a Write Exclusive
Registrants Only reservation on the data disk.
because the SCSI-PR protocol says that only a member can eject a member.
This condition means that only one system can win.
3 Node 0 also wins the race for the second coordinator disk.
Node 0 is favored to win the race for the second coordinator disk according to
the algorithm used by the fencing driver. Because node 1 lost the race for the
first coordinator disk, node 1 has to sleep for one second (default) before it
tries to eject the other node’s key. This favors the winner of the first
coordinator disk to win the remaining coordinator disks. Therefore, node 1
does not gain control of the second or third coordinator disks.
Because VxVM controls access to the storage, adding or deleting disks is not a
problem. VxVM fences any new drive added to a disk group and removes keys
when drives are removed. VxVM also determines if new paths are added and
fences these, as well.
HAD starts service groups.
Using the coordinator=on option to vxdg for the coordinator disk group
ensures that the coordinator disk group has exactly three disks. This flag is set by
default when fencing is configured using the installer.
scsi3_disk_policy=dmp
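The scsi3_disk_policy setting shown above belongs in the /etc/vxfenmode file, which might contain entries such as the following (a sketch; your installation may include additional settings):

```
vxfen_mode=scsi3
scsi3_disk_policy=dmp
```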
Lab solutions for this lesson are located on the following pages.
• “Lab 13: Configuring SCSI3 disk-based I/O fencing,” page A-285.
Note: Before bringing VCS into the environment, ensure that all components are
properly configured, as described in the “Preparing Services for VCS”
lesson in the Veritas Cluster Server for UNIX: Install and Configure
participant guide.
agent process is not started on that system. The agent may be running on other
systems in the cluster if they are configured to run a resource of that type.
• A resource cannot be managed without an agent.
Custom and bundled VCS agents are located within subdirectories of the VCS bin
directory, /opt/VRTSvcs/bin. Database and other enterprise agents are located
in /opt/VRTSagents/ha/bin.
relationships among VCS components. The agent does not read the
configuration files directly. The VCS engine has the configuration in
memory and passes the configuration information to the agent when it
starts, when a new resource is created, and when an existing resource
configuration is modified. The agent then stores the configuration
information in memory.
Entry points
• Online: Runs StartProgram with the specified parameters in the specified user
context
• Offline: Runs StopProgram with the specified parameters in the specified user
context
• Monitor: If no MonitorProgram is specified, verifies that all processes listed in the PidFiles and MonitorProcesses attributes are running
Note: If you use only PidFiles for monitoring, you may receive a false indication
of online if the application has not cleared out the process IDs upon
restarting. The PIDs can be cleared using a startup script or an open entry
point.
Note: The MonitorProcesses values must match the output displayed by the ps
command exactly. For example, if the processes are displayed with full
path names, you must include the full path name when specifying the
processes to monitor.
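Putting these entry points together, an Application resource might be defined in main.cf as follows (all paths and process names are hypothetical placeholders):

```
Application app1 (
    User = root
    StartProgram = "/opt/myapp/bin/start"
    StopProgram = "/opt/myapp/bin/stop"
    PidFiles = { "/var/run/myapp.pid" }
    MonitorProcesses = { "/opt/myapp/bin/myappd -n listener" }
    )
```

Because MonitorProcesses entries must match ps output exactly, the full path and arguments are specified.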
3—Performs intelligent resource monitoring for both online and offline
resources (default for Application resource type)
The MonitorFreq key determines how often a resource is monitored by traditional
polling. When set to an integer greater than 0, the value of MonitorFreq is
multiplied by the value of the MonitorInterval and OfflineMonitorInterval
attributes to determine the frequency of running the poll-based monitor entry point
for online and offline resources, respectively.
RegisterRetry determines how many times the agent tries to register the resource
with the IMF notification module.
IMFRegList specifies the attributes registered with the IMF notification module
and should not be modified.
CONFIDENTIAL - NOT FOR DISTRIBUTION
Supported configurations
Intelligent monitoring is supported for the Application agent only under specific
configurations. The complete list of such configurations is provided in the table in
the slide.
See the Veritas Cluster Server Administrator’s Guide and Bundled Agents
Reference Guide for details about configuring IMF for the Application agent.
Lab solutions for this lesson are located on the following pages.
• “Lab 14: Configuring an Application resource,” page A-307.
Basic monitoring
The instance can be monitored by scanning the process table for the process IDs
(PIDs) for critical database processes. The processes monitored vary by database.
For example, the Oracle agent monitors the ora_smon, ora_dbw, ora_pmon,
and ora_lgwr processes.
a database server.
Note: Some databases, such as Oracle, install updates in a new directory. This directory can be on shared storage, which provides a way to use the updated installation from any system in the cluster.
Linux
Shared memory settings:
– For drivers built into the kernel, append parameters to the kernel command
line using the boot loader.
– For kernel modules, use /etc/modules.conf.
– For tunable parameters, use sysctl and /etc/sysctl.conf.
Solaris
Changes to /etc/system require a reboot to take effect.
Network configuration
Each database service group requires at least one IP address for client connections.
This IP address should fail over together with the database in case of any major
faults.
Therefore, you need to use an IP resource (or an IPMultiNIC resource) and
configure the host name of the service group IP address in the database. The clients
connect to the host name corresponding to this virtual IP address and not to the
local host names of the servers.
See the platform-specific database agent guides for details about how to design
your VCS configuration to meet your high availability requirements.
• NIC: Monitors one or more network interface cards used for remote client connections
The example shown on the slide assumes that the Oracle binaries are located on
local storage. The data files are located on a file system (rather than raw volumes).
The clients access Oracle services using the service group IP address defined by
the IP resource.
logs. In this case, you can create an SQL script to perform these actions, and
this script is called when you set StartUpOpt to CUSTOM.
You must create the script in /opt/VRTSagents/ha/bin/Oracle with
the name of start_custom_Sid.sql, where Sid is the same as the value
of the Sid attribute.
Home = "/hr_ora"
EnvFile = "/oracle/.ora_envfile"
AutoEndBkup = 0
Encoding = eucJP
)
The example value for the Encoding attribute sets encoding to the Japanese
language set. For a complete list of optional attributes, see the Veritas Cluster
Server Agent for Oracle Installation and Configuration Guide for your platform.
• MonScript: The executable script file containing the SQL statements VCS uses
when writing to the table
• EnvFile: The file containing environment variables sourced by the agent
Configuration prerequisites
• Create the database user and password for use by VCS.
• Create a test table within the monitored database.
• Create an executable script with SQL statements.
In this example, the user scott with the password tiger should be defined in
the HR database with update privileges to the table called testtable. This table
should be created in the database before the additional monitoring is enabled.
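Using the values from this example, the detail-monitoring attributes on the Oracle resource might be configured as follows (a sketch; <encrypted-password> stands for the string produced by the VCS encryption utility, and the MonScript path is assumed to be the agent's default script location, which you should verify for your version):

```
User = scott
Pword = <encrypted-password>
Table = testtable
MonScript = "./bin/Oracle/SqlTest.pl"
```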
Encrypting passwords
You can use the VCS encryption utility to encrypt database passwords before
configuring the Pword attribute in the Oracle agent configuration.
Note: The value of Pword is automatically encrypted when you use VOM or the
VCS Java GUI to configure the resource.
Note: Consider minimizing the number of volumes and disk groups used in
database service groups. Large numbers of objects complicate
administration and can slow service group startup.
value.
Otherwise, the database in the backup mode on the failover system cannot be
opened and VCS cannot bring the Oracle resource online. The following errors are
displayed to indicate this condition:
ORA-1110 "data file %s: ’%s’"
ORA-1113 "file %s needs media recovery"
Before VCS can bring the Oracle resource online on the failover system, you must
take the tablespaces out of backup mode and shut down the database instance so
that it can be reopened. Refer to the Oracle documentation for instructions on how
to change the state of the tablespaces.
Additional Oracle agent functions
The Oracle agent supports two additional entry points you can use to manage
database functions from within VCS:
• Action: Performs specified actions, such as backing up the Oracle database,
changing the database state, and suspending and resuming a database instance
This can be useful for scripting common database administration tasks that can be initiated by a VCS operator or administrator.
• Info: Checks the status of the instance
Lab solutions for this lesson are located on the following pages.
• “Lab 15: Configuring an Oracle service group,” page A-327.
• You invest a considerable amount of time, expense, and expertise to prepare for and complete a Symantec technical exam, which is undermined by those who engage in exam misconduct.
• Exam misconduct enables less qualified individuals to compete for the jobs and benefits YOU deserve.
• Exam misconduct erodes confidence in both Symantec programs and your skills as a certified IT professional and can lead to security and liability risks for your customers and/or employer.
• To confidentially report suspected cases of misconduct, please contact global_exams@symantec.com.
Symantec is committed to maintaining the security and integrity of its brand and
certification and accreditation exams. This ensures that our products are installed and
maintained by qualified IT Professionals and provides end users with the confidence
that their system software is operating at maximum efficiency. Symantec actively
investigates and takes corrective action against individuals and organizations who
attempt to compromise the security of our exams or engage in any form of exam
misconduct. To learn more about Symantec Testing Policies and Exam Security, visit
http://www.symantec.com/business/training/certification/path.jsp?pathID=policies
To learn more about the Symantec Certification Program and exams, visit http://go.symantec.com/certification