Professional Documents
Culture Documents
Legal Notices
The information contained in this document is subject to change without
notice.
Hewlett-Packard makes no warranty of any kind with regard to this
manual, including, but not limited to, the implied warranties of
merchantability and fitness for a particular purpose. Hewlett-Packard
shall not be liable for errors contained herein or direct, indirect, special,
incidental or consequential damages in connection with the furnishing,
performance, or use of this material.
Copyright 1999 Hewlett-Packard Company.
This document contains information which is protected by copyright. All
rights are reserved. Reproduction, adaptation, or translation without
prior written permission is prohibited, except as allowed under the
copyright laws.
Corporate Offices:
Hewlett-Packard Co.
3000 Hanover St.
Palo Alto, CA 94304
Contents
Contents
39
41
41
41
42
65
65
65
66
Performance Considerations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
System Performance Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Network Performance Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Testing Monitor Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Testing Disk Monitor Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Testing Cluster Monitor Requests . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Testing Network Monitor Requests . . . . . . . . . . . . . . . . . . . . . . . . . . .
Testing System Resource Monitor Requests . . . . . . . . . . . . . . . . . . . .
68
68
68
68
68
Contents
Contents
Printing History
Table 1
Printing Date
Part Number
Edition
March 1999
B7612-90009
Edition 1
Preface
This guide describes how to use Event Monitoring Service (EMS) alone,
and in conjunction with high availability software such as
MC/ServiceGuard and enterprise management products like IT/O. The
chapters are as follows:
Understanding the Event Monitoring Service
Installing and Using EMS
Selecting Resources to Monitor
Defining a Monitoring Request
EMS Operations
Configuring MC/ServiceGuard Package Dependencies
Troubleshooting
Related
Publications
10
11
12
Chapter 1
Understanding EMS
EMS can be configured to poll a local system or application resource and
send messages when events occur. It can also be configured to listen for
event messages from asynchronous monitors. An event is any action or
state you want to know about. For example, you may want to be alerted
when a disk fails or when available filesystem space falls below a certain
level. EMS allows you to configure what you consider an event for any
monitored system resource.
EMS sends event notification to a wide variety of software using multiple
protocols (opcmsg, SNMP, TCP, UDP, syslog, console, textlog, email). For
example, you can configure EMS to send a message to MC/ServiceGuard
and IT/Operations when a disk fails. These applications can then use
that message to trigger package failover and to send a message to an
administrator to fix the disk.
EMS operates with a registry framework, a collection of monitors, and a
configuration interface that runs under SAM (HP-UX System
Administration Manager). EMS starts and stops the monitors, stores
information used by the monitors, and directs monitors where to send
events.
Figure 1-1 illustrates these EMS relationships.
Figure 1-1
Client, such as
EMS the SAM interface to monitors,
MC/ServiceGuard package configuration
Target, such as
IT/Operations,
MC/ServiceGuard
Framework
Registrar
API
MONITORS
Chapter 1
Resource
dictionary
System and
application
resources
13
14
Chapter 1
/interfaces
/pv_summary
/lv
/lan
/lv_summary
/copies /pv_pvlink
/status
/status
/lvName
/lvName
/deviceName
/status
/system
contains all job
queue, user, and
filesystem status
/jobQueue1Min
/jobQueue5Min
/LANname
/cluster
contains all package,
node, and cluster
status
/filesystem
/jobQueue15Min
/availMB
/numUsers
/localNode /status
/package
/status
/clusterName
/clusterName
/fsName
/service_status /package_status
/package_name
/package_name
Bold Italics - Instances, replaced with an actual name
Bold - Resource instance
/service_name
The full path of a resource includes the resource class hierarchy and
instance. An example of a full resource path for the physical volume
status of the device /dev/dsk/c0t1d2 belonging to volume group
vgDataBase, is /vg/vgDataBase/pv_pvlink/status/c0t1d2.
Creating a monitoring request means indentifying the full pathname of
the resources to be monitored and specifying the conditions under which
Chapter 1
15
Understanding Monitors
Monitors are applications written to gather and report information about
specific resources on the system. They use system information stored in
places like /etc/lvmtab and the MIB database. When you make a
request to a monitor the following sequence of events occurs:
The monitor polls the resource to collect information
The monitor sends a message to the framework
The framework evaluates the data to determine if an event has
occurred
If an event has occurred, the framework sends notification in the
appropriate format
To obtain additional information about installed monitors:
1. Go to the /etc/opt/resmon/dictionary directory.
Each monitor registered with EMS has an information file that is
stored in this directory.
2. View the monitor dictionary file.
The file name is descriptive of each monitor. The file extension is
.dict. For example, the mibmonitor dictionary filename is
mibmond.dict.
A standard API provides a method for adding new monitors to EMS.
To create your own monitor, refer to the Writing Monitors for the Event
Monitoring Service (EMS) Developers Kit:
1. Go to the high availability web site:
http://www.software.hp.com.
2. From the web site select High Availability, then select Event
Monitoring Service Developers Kit.
16
Chapter 1
Chapter 1
17
2. From the web site select High Availability, then select Event
Monitoring Service Developers Kit.
18
Chapter 1
20
Chapter 2
Installing EMS
EMS checks the resources for local systems only. Install EMS on every
system you wish to monitor.
NOTE
The EMS bundle (Part Number B7609BA) version A.03.00 contains the
following file sets:
EMS-CoreEMS framework
EMS-ConfigSAM interface to EMS
To install EMS:
Use swinstall, or
Use the Software Management area in SAM
If you have many systems, it may be easier to install over the network
from a central location:
1. Create a network depot according to the instructions in Managing
HP-UX Software with SD-UX.
2. rlogin or telnet to the remote host on which you are installing
EMS.
3. Install over the network from the depot.
When monitors are updated, or when you re-install a monitor on top of
an existing monitor, your requests are retained. This is part of the
functionality provided by the persistence client.
Chapter 2
21
Removing EMS
To remove EMS, use swremove or the Software Management products
under SAM. Note that the monitors are persistent, that is, they are
always automatically started if they are stopped. Therefore, it is likely
you will have warnings in your removal log file that say:
Could not shut down process
or errors that say:
File /etc/opt/resomn/lbin/p_client could not be removed.
Even if you see these warnings, monitors are removed and any dirty files
are cleaned up on reboot.
CAUTION
When you remove EMS, all log files are also removed. If you wish to save
any of the EMS log files, rename them or store them off before you
remove EMS.
22
Chapter 2
Chapter 2
23
24
Chapter 2
Selecting Resources
All resources are divided into classes. To select a resource to monitor:
1. Click on the Actions menu.
2. Select Add Monitoring Request
The top-level resource classes for all installed monitors are
dynamically discovered and then listed as shown in the figure below.
Some Hewlett-Packard products, such as ATM Adapter for HP/9000
Servers, HP OTS 9000 or Support Tools Manager (STM) for HP 9000
hardware monitoring, include their own monitors within their
product hierarchy. If those products are installed on the system, then
their top-level resource classes also appear here.
NOTE
26
Chapter 3
Chapter 3
27
28
Chapter 3
Chapter 3
29
30
Chapter 3
32
Chapter 4
Chapter 4
33
At each interval
34
Chapter 4
NOTE
Updated monitors may have new status values that change the meaning
of your monitoring requests, or generate new alerts. For example,
assume you have a request for notification if status > 3 for a resource
with a values range of 1 through 7. You receive alerts each time the value
equals 4, 5, 6, or 7. If the updated version of the monitor has a new status
value of 1 through 8, you also see alerts when the resource equals 8.
Chapter 4
35
Repeat
Return
36
Chapter 4
Chapter 4
37
38
Chapter 4
message group to HA
Chapter 4
39
Chapter 4
Email Option
This sends event notification to the email address indicated for that
request.
To set for an email notification:
1. Select the Email option from the list in the Notify via area.
2. Specify the full email address in the Email Address field.
Console Option
This sends event notification to the system console.
To set for a console notification:
Select the Console option from the list in the Notify via area.
Syslog Option
This sends event notification to the system log.
An abnormal event message (error) is returned under the following
conditions:
The When value is . . . condition changes from FALSE to TRUE
The When value changes when the return value does not match
the previous return value
For an abnormal event, a system logging level of error will be associated
with the logged message.
To set for a system log notification:
Chapter 4
41
Textlog Option
This sends event notification to the file indicated for that request.
To set for a text log notification:
1. Select the Textlog option from the list in the Notify via area.
2. Specify the filename and path in the File Path field.
The default path, /var/opt/resmon/log/event.log, displays when
the Textlog option is selected.
42
Chapter 4
Chapter 4
43
44
Chapter 4
EMS Operations
This chapter describes the following:
Copying Monitoring Requests
Modifying Monitoring Requests
45
46
Chapter 5
47
48
Chapter 5
Chapter 5
49
50
Chapter 5
Chapter 5
51
52
Chapter 5
Configuring MC/ServiceGuard
Package Dependencies
This chapter describes how to use SAM to create package resource
dependencies on EMS resources. That is how to create a monitoring
53
NOTE
54
Chapter 6
Chapter 6
55
56
Chapter 6
7. If you have not previously done so, specify a Package Name and Node
and specify a Package SUBNET Address. Then click on Specify
Package Resource Dependencies... to add EMS resources as
package dependencies. A screen similar to Figure 6-3 displays.
Figure 6-3
Chapter 6
57
58
Chapter 6
Figure 6-5
NOTE
If you select UP, the package fails over if the value is anything but UP.
If you select UP and PVG-UP, the package fails over if the pv_summary
value is not equal to UP or PVG_UP; in other words, if pv_summary is
SUSPECT or DOWN.
The polling interval determines the maximum amount of elapsed
time before the monitor knows about a change in resource status. For
critical resources, you may want to set a short polling interval, such
as 30 seconds, but this could adversely affect system performance.
With longer polling intervals you gain system performance, but you
Chapter 6
59
/vg/vg01/pv_summary
60
= UP
= PVG_UP
60
Chapter 6
Troubleshooting
This section gives hints on testing your monitoring requests, and gives
you some information about log files and monitor behavior that will help
you determine the cause of problems. For information on fixing problems
detected by monitors, see the list of related publications in the Preface.
61
Troubleshooting
62
Chapter 7
Troubleshooting
EMS Directories and Files
Chapter 7
63
Troubleshooting
EMS Directories and Files
Table 7-1
/opt/resmon/resls
This command lists the latest polled status of the specified resource
on a specified system.
This can be used to discover and get details on resources available
for monitoring.
/opt/resmon/resdata
This is used to get additional information from EMS regarding
events or monitor restarts.
64
Chapter 7
Troubleshooting
Logging and Tracing
EMS Logging
As mentioned in the previous section, log files in /etc/opt/resmon/log
/ contain information logged by the monitors.
Look at the client.log if you seem to be having a problem with the
SAM, or any other client, interfaces to EMS or MC/ServiceGuard. With
the default level of logging, only audit and error messages are logged. An
example of an audit message is:
User event occurred at Thu Jul 31 16:13:31 1997
Process ID: 10404 (client)
Log Level:Audit
+ /vg/vg00/lv/copies/* (8 instances)
If (<1), OpC (m/n), 18000s, Thu Jul 31 16:13:31 1997
The plus (+) means that request has been added. A minus (-) indicates
a removal. A minus (-) followed by a plus (+) indicates a modification.
Events sent to targets are marked with period (.). Errors are marked
with Log Level: Error or with Log Level: Warning.
Look at the api.log if you seem to be having a problem with a specific
monitor. Check for warnings or errors.
Some monitors have their own logs, refer to the man page for individual
monitors.
High Availability Monitors
High availability monitors provide additional logging support.
NOTE
Logging will occur at every polling interval. This can create a very large
syslog file, so you may only want to use logging when you are
Chapter 7
65
Troubleshooting
Logging and Tracing
troubleshooting.
EMS Tracing
Some monitors provide tracing which can be used for debugging monitor
code.
Use the -d option to turn on tracing for EMS. Tracing should only be
used at the request of your HP support personnel when trying to
determine if there may be a problem with EMS. To turn on tracing,
modify the .dict file in /etc/opt/resmon/dictionary and add -d to
the monitor you would like to trace:
MONITOR:
/etc/opt/resmon/lbin/diskmond -l -d
Kill the monitor process. The monitor will automatically restart with
tracing enabled. To speed up monitor restart, use the resls command
with the top level of the resource class as an argument, for example,
resls /vg.
Tracing is customarily logged to
/etc/opt/resmon/log/monitor_name.log. The monitor_name usually
matches the name used for the monitor in the dictionary file. For
example, the MIBmonitor uses mibmond.dict and mibmond.log.
66
Chapter 7
Troubleshooting
Performance Considerations
Performance Considerations
Monitoring your system, is important to maintain high availability, but
monitoring consumes system resources. You must carefully consider your
performance needs against your need to know as soon as possible when a
failure threatens availability.
Chapter 7
67
Troubleshooting
Testing Monitor Requests
Chapter 7
Troubleshooting
Testing Monitor Requests
load. The EMS system resource monitor should return the same values
that this command does.
Chapter 7
69
Troubleshooting
Testing Monitor Requests
70
Chapter 7
Glossary
A-H
alert An event. A message sent to
warn a user or application when
certain conditions are met.
asynchronous monitor A
monitor that monitors resource
instances (or resource class)
asynchronously. It is event driven
and send notifications when events
occur. It does not keep track of the
current state or value of each
resource it monitors.
client The application that creates
or cancels requests to monitor
particular resources. The
consumer of a resource status
message. A user of the Resource
Monitor framework. This user may
browse resources, request status,
and make requests to have
resources monitored. Examples are
MC/ServiceGuard as it starts a
package or the SAM interface to
EMS.
dictionary See Resource
Dictionary.
EMS (Event Monitoring Service)
The interface between resource
monitors, the client and target
applications.
I-L
ITO HP OpenView IT/Operations,
formerly known as Operations
Center
logical extent The basic
allocation unit for a logical volume
is called a logical extent. For
mirrored logical volumes, either
two or three physical extents are
mapped for each logical extent,
depending on whether you are
using 2-way or 3-way mirroring.
logical volume A collection of
disk space from one or more disks.
Each collection appears to the
operating system as a single disk.
Like disks, logical volumes can be
used to hold file systems, raw data
areas, dump areas, or swap areas.
Unlike disks, logical volumes can
be given a size when they are
created, and a logical volume can
71
M
MIB (Management Information
Base). A document that describes
objects to be managed. A MIB is
created using a grammar defined
in Structure of Management
Information (SMI) format. This
grammar concisely defines the
objects being managed, the data
types these objects take,
descriptions of how the objects can
be used, whether the objects are
read-only or read-write, and
identifiers for the objects.
MIB II (MIB2) A MIB that
defines information about the
system, the network interface
cards it contains, routing
information it contains, the TCP
and UDP sockets it contains and
72
N-P
notification See alert.
physical extent LVM divides
each physical disk into addressed
units called physical extents.
physical volume A disk that has
been initialized by LVM.
polling The process by which a
monitor obtains the most recent
status of a resource.
PVG (physical volume group)
A grouping of physical devices
(host adapters, busses, controllers,
or disks), that allow LVM to
manage redundant links or
mirrored disks and access the
redundant hardware when the
primary hardware fails.
Q-R
registrar The registrar process
provides the link between resource
status consumers (clients) and
resource status providers (resource
monitors). The central part of the
resource monitor framework which
uses the resource dictionary to act
as an intermediary between client
systems and resource monitors.
resource May be any entity a
monitor application developer
names. Examples include a
network interface, CPU statistics,
a MIB object, or a network service.
resource class A category of
resources useful during
configuration. For example,
/net/interfaces/lan/status is
provided as a resource class.
resource dictionary A file
describing the hierarchy of
resources that can be monitored
and the processes that perform the
resource monitoring.
S-T
SNMP (Simple Network
Management Protocol) Standard
protocol for network based
retrieval of information about
system resources.
state The current value of a
resource (UP or DOWN). For some
resource instances, a monitor may
73
U-Z
volume group In LVM, a set of
physical volumes whose extents
are grouped together and then
made available to users as logical
volumes. A volume group can be
activated by only one node at a
time unless you are using
MC/LockManager.
MC/ServiceGuard can activate a
volume group when it starts a
package. A given disk can belong
to only one volume group. A logical
volume can belong to only one
volume group.
74
Index
A
api.log file, 65
asynchronous
monitors, 37
asynchronous monitors, 28
C
classes, 28
client.log file, 65
cluster, 54
cluster monitor request
testing, 68
console
notification option, 41
copying requests, 47
creating a monitoring request,
33
creating package dependencies,
54
D
dictionary, 63
disk monitor request
testing, 68
E
email
notification option, 41
EMS
files and directories, 63
logging, 65
restarting, 69
SAM interface, 13, 23
starting, 23
testing monitoring requests,
68
tracing, 66
EMS, high availability, 17
event
notification, 34
notification frequency, 36
Index
moidfying requests, 49
monitor, 16
API, 16
asynchronous, 28, 37
finding information, 26
view information, 16
monitor daemons, 63
monitor persistence, 69
monitoring request
copying, 47
creating, 33
creating comments, 43
modifying, 49
polling interval, 37
removing, 50
testing, 68
viewing, 51
monitors
updating, 35
N
Network Node Manager
send event, 17
notification, 34
event frequency, 36
notification comment, 43
notification option
console, 41
email, 41
syslog, 41
TCP, 39
textlog, 42
UDP, 39
notification options, 36
notification protocol, 39
SNMP, 39
Notify at each interval, 34
Notify when value changes, 34
Notify when value is..., 34
O
opcmsg
75
Index
event notification, 38
options for notification, 36
P
p_client, 69
package configuration file, 60
package dependencies, 60
creating, 54
performace, 67
persistence, 35, 69
polling interval, 37, 60, 67
protocol
event notification options, 38
protocol, sending notification, 39
R
registrar.log file, 66
removing requests, 50
repeat notification, 36
resource
as MC/ServiceGuard
dependency, 54
selecting, 26
UP values, 59
viewing descriptions, 30
resource classes, 14, 28
resource dictionary, 63
resource monitor, 16
restarting EMS, 69
return notification, 36
S
SAM interface
EMS, 13
SAM interface to EMS, 23
SAM interface to
MC/ServiceGuard, 57
selecting a resource, 26
sending
events, 34
severity options
ITO, 38
76
SNMP, 39
SNMP
severity options, 39
SNMP traps, 39
starting
EMS, 23
syslog
notification option, 41
system load, 67
system requirements, 14
T
TCP
notification option, 39
testing monitoring requests, 68
textlog
notification option, 42
tracing, 66
U
UDP
notification option, 39
UP value, 59
updating monitors, 35
V
viewing
resource descriptions, 30
viewing requests, 51
W
wildcard, 29
Index