EMS On Sun Cluster

A Technical Document
Enterprise Integration Framework: TIBCO EMS on Sun Cluster

This document is part of the Enterprise Integration Framework at COMPANY XYZ. It describes the rationale for deploying TIBCO Enterprise Message Service (EMS) on Sun Cluster, along with the detailed steps to achieve a functioning implementation.
Document Revisions
Version | Date | Author | Comments
1.0 | 11/01/2004 | | Initial Draft. Portions taken from EMS Best Practices. Sun Cluster section taken from Sun Cluster Overview for Solaris.
Document Approvals
Name Signature Date
Document Owners
Name
Proprietary and Confidential

TIBCO Software Inc. http://www.tibco.com 3303 Hillview Avenue Palo Alto, CA 94304 1-800-420-8450
This document contains information that is confidential to both COMPANY XYZ and TIBCO Software Inc.
2004 TIBCO Software Inc. All Rights Reserved. TIBCO Confidential and Proprietary
EIF- EMS on Sun Cluster 1.0
Copyright Notice
2004 TIBCO Software Inc. This document is unpublished and the foregoing notice is affixed to protect TIBCO Software [...] inadvertent publication. All rights reserved. No part of this document may be reproduced in any form, including [...] transmission electronically to any computer, without prior written consent of TIBCO Software Inc. The information [...] document is confidential and proprietary to TIBCO Software Inc. and may not be used or disclosed except as expressly [...] by TIBCO Software Inc. Copyright protection includes material generated from our software programs displayed on [...] icons, screen displays, and the like.
[...] described herein are either covered by existing patents or patent applications are in progress. All brand and product names [...] registered trademarks of their respective holders and are hereby acknowledged.
[...] this document is subject to change without notice. This document contains information that is confidential and [...] TIBCO Software Inc. and may not be copied, published, or disclosed to others, or used for any purposes other than review, [...] authorization of an officer of TIBCO Software Inc. Submission of this document does not represent a commitment to [...] of this specification in the products of the submitters.
[...] this document is subject to change without notice. THIS DOCUMENT IS PROVIDED "AS IS" AND TIBCO MAKES [...] EXPRESS, IMPLIED, OR STATUTORY, INCLUDING BUT NOT LIMITED TO ALL WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. TIBCO Software Inc. shall not be liable for errors contained [...] incidental or consequential damages in connection with the furnishing, performance or use of this material.
For more information, please contact:
TIBCO Software Inc., 3303 Hillview Avenue, Palo Alto, CA 94304
1 Introduction
This document is part of the Enterprise Integration Framework (EIF) for COMPANY XYZ. COMPANY XYZ have chosen TIBCO Enterprise Message Service (EMS) as the messaging backbone for all of their integration projects. EMS must therefore be implemented in a way that delivers the level of service required by the business. In addition, COMPANY XYZ want to make best use of server hardware by avoiding standby servers that sit idle until a failure occurs. They are also current and reasonably experienced users of Sun Cluster software.
1.1 Audience
The audience for this document is:
- Developers attempting to understand the rationale for a clustered deployment
- Administration staff involved in implementing or supporting such a deployment
1.2 Purpose
This document addresses the following questions:
- What do we mean by the terms Fault Tolerant and Highly Available in the context of EMS?
- What benefits does Sun Cluster provide?
- How do we install the EMS components into a Sun Cluster?
- What role will TIBCO Administrator play?
- What role will TIBCO Hawk play?
Although this document is not intended to be a study of Sun Cluster software, Sun Cluster uses certain principles to achieve its objectives that are common to other forms of cluster software. These principles can therefore be treated as patterns or templates for re-use.
1.3 Related Documentation

TIBCO Product Documentation for the following products:
- TIBCO Enterprise Message Service
- TIBCO Administrator
- TIBCO Runtime Agent
- TIBCO Domain Utility
Sun Documentation for the following products:
- Sun Cluster
EIF Documentation:
- Server Installation Guide
- Message Server Package
2 Enterprise Message Server (EMS)

EMS is the TIBCO implementation of the JMS (Java Message Service) specification (v1.1), an API specification from Sun that was co-written by many other companies, including TIBCO. However, TIBCO EMS provides a more complete suite of messaging products and tools than the JMS API alone. It is also a critical infrastructure component that has implications for servers, storage, and network infrastructure. This section does not cover JMS basics or the API, which are described in a variety of other sources. It assumes that readers have basic knowledge of JMS and are familiar with the TIBCO EMS product. Instead, it focuses on those features and capabilities of the product that provide an industrial-strength messaging foundation for applications. This provides a basis for understanding the reasons behind the decisions made in the Sun Cluster implementation described in Chapter 4.
2.1 Datastore
The TIBCO EMS Server requires storage for persistent messages, state metadata and configuration data, known as its datastore. This consists of three disk files as follows:
- meta.db: stores information required by the server, but stores no messages
- sync-msgs.db: stores data for queues or topics defined as failsafe
- async-msgs.db: stores data for queues or topics NOT defined as failsafe
A large amount of business data will pass through the EMS Server and be stored on disk. This places some stringent requirements on the disk storage:
- performance: it must be fast, robust and reliable
- size: the storage allocated should be able to grow dynamically over time
- recovery: in the event of a disaster, some if not all of the data should be recoverable
SAN-based storage is the best option due to its ability to deliver each of the above requirements.
2.2 Client Fault Tolerance

TIBCO EMS delivers a truly Fault Tolerant solution from the perspective of the client. In the event of brief interruptions in service from an individual EMS Server, another physical server process can take its place with no loss of data from the perspective of the client. From an infrastructure perspective, the most robust solution is to have the second process hosted on a separate physical server. This allows for the complete hardware failure of the first server without any loss of data. However, this introduces some complexities around storage of message data that is in transit.
The TIBCO EMS Server requires storage for persistent messages, state metadata and configuration data, known as its datastore (see above). However, only a single active server process can access the datastore at any one time without corruption occurring due to overlapping writes. There are two basic ways to manage the use of the datastore:
- Allow TIBCO EMS to control access to the datastore through the use of file locking protocols
- Use an external means to control access to the datastore
The first option is achieved by running the Fault Tolerant pair of EMS servers simultaneously and configuring them to be aware of each other via a TCP connection. If the primary server fails, the backup server detects the failure and attempts to gain control of the datastore by locking it. This option works well where the datastore is local to the servers, but is complicated when the datastore resides on a network device or on a SAN. Inexpensive network file locking, such as that provided by NFS, is notoriously unreliable, whereas commercial products that provide this functionality reliably are prohibitively expensive.
The second option is achieved through the use of TIBCO Hawk or clustering software. TIBCO Hawk can be used to ensure that only a single instance of a process is running, but clustering software has the added advantage that it can detect network malfunctions and can also guarantee, through the mounting and unmounting of disk partitions, that only a single server has access to the datastore.
NOTE: Even though the combination of the above features provides a reasonable level of Fault Tolerance, it cannot mitigate every possible failure mode. Failures involving both physical server nodes, or prolonged network outages, will eventually result in client disconnects. However, these situations can be handled by TIBCO Hawk rulebases running locally to the client, in conjunction with good process design, to shut down clients and restart them when the EMS Servers come back up, thus preventing many spurious error conditions.
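The first option relies on an exclusive lock that only one process can hold at a time. As a minimal sketch of that idea (an analogy only, not the locking code used by TIBCO EMS), the Python below takes an advisory `flock` lock on a file on a POSIX system. The helper name `try_lock_datastore` is hypothetical and introduced purely for illustration.

```python
import fcntl
from typing import IO, Optional


def try_lock_datastore(path: str) -> Optional[IO[bytes]]:
    """Try to take an exclusive, non-blocking advisory lock on a datastore file.

    Returns the open file (which holds the lock) on success, or None when
    another holder already has the lock. Hypothetical helper for illustration.
    """
    handle = open(path, "ab")  # creates the file if missing, never truncates it
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle
```

In this sketch a backup process would call the helper periodically and become active only once it receives a handle; closing the handle releases the lock. Whether such a lock is honoured reliably depends on the underlying file system, which is the concern raised above for network storage.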
2.3 Connection Factory

In order to make use of the Client Fault Tolerance capabilities of TIBCO EMS, clients must ensure that their connections are created correctly and have appropriate values for retry and timeout settings. A Fault Tolerant connection URL takes the form:
tcp://<server 1>:<port>,tcp://<server 2>:<port>
The retry count and timeout settings are properties on the Tibjms object which control the attempts by the client to reconnect to one of the possible servers, as follows:
- reconnect_attempt_count: After losing its server connection, a client program iterates through its URL list until it re-establishes a connection with an EMS server. This property determines the maximum number of iterations. When absent, the default is 4.
- reconnect_attempt_delay: When attempting to reconnect, the client sleeps for this interval (in milliseconds) between iterations through the URL list. When absent, the default is 500 milliseconds.
While this can be done on an individual basis through code, the recommended method is to use a JNDI call to a Connection Factory object. This allows the retrieval of the above parameters from the server, thus centralizing control and administration. A Fault Tolerant JNDI Connection Factory URL takes the form: tibjmsnaming://<server 1>:<port>, tibjmsnaming://<server 2>:<port>
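The reconnect behaviour described in this section can be sketched as a simple loop. The Python below is an illustration of the iteration logic only, not the EMS client library: the function names, the `try_connect` callback, the host names and the port number are all assumptions made for the example, while the defaults of 4 iterations and 500 milliseconds are the ones quoted above.

```python
import time
from typing import Callable, Optional, Sequence


def fault_tolerant_url(servers: Sequence[str], port: int, scheme: str = "tcp") -> str:
    """Build a comma-separated URL list, one URL per server."""
    return ",".join(f"{scheme}://{host}:{port}" for host in servers)


def reconnect(
    url_list: str,
    try_connect: Callable[[str], bool],
    attempt_count: int = 4,        # default quoted in the text
    attempt_delay_ms: int = 500,   # default quoted in the text
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Iterate through the URL list until one URL accepts a connection.

    Returns the URL that connected, or None when every iteration failed.
    """
    urls = url_list.split(",")
    for iteration in range(attempt_count):
        for url in urls:
            if try_connect(url):
                return url
        # Sleep between iterations through the list, not after the last one.
        if iteration < attempt_count - 1:
            sleep(attempt_delay_ms / 1000.0)
    return None


# Hypothetical host names and port, for illustration only.
url = fault_tolerant_url(["ems-node1", "ems-node2"], 7222)
```

With a JNDI Connection Factory, these values would come from the server rather than from client code, which is why that approach is recommended above.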
2.4 Log Files

Depending upon the tracing and logging parameters, the server can generate different amounts of log file output. It is recommended for performance reasons that these files reside on a fast disk. The settings for logging should be set as follows for production systems:
- Message Tracing should not be permanently enabled in production
- At a minimum, the WARNING mode should be set
- A maximum file size should be specified to prevent uncapped growth
Where possible these files should reside on the SAN. This will allow access to diagnose issues in the event that the server node cannot be immediately recovered.
3 Sun Cluster Software

A full description of the capabilities of Sun Cluster software is beyond the scope of this document. However, an understanding of its operation is essential to comprehending the decisions made in deploying TIBCO EMS on such a cluster.
3.1 Introduction
A cluster is two or more systems, or nodes, that work together as a single, continuously available system to provide applications, system resources, and data to users. Each node on a cluster is a fully functional standalone system. However, in a clustered environment, the nodes are connected by an interconnect and work together as a single entity to provide increased availability and performance.
Figure 1. Sun Cluster Hardware Components
Highly available clusters provide nearly continuous access to data and applications by keeping the cluster running through failures that would normally bring down a single server system. No single failure (hardware, software, or network) can cause a cluster to fail. By contrast, fault-tolerant hardware systems provide constant access to data and applications, but at a higher cost because of specialized hardware. Fault-tolerant systems usually have no provision for software failures.
An application is highly available if it survives any single software or hardware failure in the system. Failures that are caused by bugs or data corruption within the application itself are excluded. The following apply to highly available applications:
- Recovery is transparent to the applications that use a resource.
- Resource access is fully preserved across node failure.
- Applications cannot detect that the hosting node has been moved to another node.
- Failure of a single node is completely transparent to programs on remaining nodes that use the files, devices, and disk volumes attached to this node.
A failover service provides high availability through redundancy. When a failure occurs, you can configure an application that is running to either restart on the same node, or be moved to another node in the cluster, without user intervention.
The Sun Cluster system makes the path between users and data highly available by using multihost disks, multipathing, and a global file system. The Sun Cluster system monitors failures for the following:
- Applications: Most of the Sun Cluster data services supply a fault monitor that periodically probes the data service to determine its health. A fault monitor verifies that the application daemon or daemons are running and that clients are being served. Based on the information that is returned by probes, a predefined action such as restarting daemons or causing a failover can be initiated.
- Disk-Paths: Sun Cluster software supports disk-path monitoring (DPM). DPM improves the overall reliability of failover and switchover by reporting the failure of a secondary disk path.
- Internet Protocol (IP) Multipath: Solaris IP network multipathing software on Sun Cluster systems provides the basic mechanism for monitoring public network adapters. IP multipathing also enables failover of IP addresses from one adapter to another adapter when a fault is detected.
The following sections describe some of the key terms and definitions used when discussing clustering using Sun Cluster software.
3.2 Cluster Nodes

A cluster node is a server running both Solaris and the Sun Cluster software. This can be an entire physical server, or a Solaris domain within a larger server. Sun Cluster allows from two to eight nodes in a cluster. Cluster nodes are generally attached to one or more disks. Nodes not attached to disks use the cluster file system to access the multihost disks. Every node in the cluster is aware when another node joins or leaves the cluster. Also, every node in the cluster is aware of the resources that are running locally as well as the resources that are running on the other cluster nodes. Nodes in the same cluster should have similar processing, memory, and I/O capability to enable failover to occur without significant degradation in performance. Because of the possibility of failover, each node should have sufficient capacity to meet service level agreements if a node fails.
3.3 Cluster Interconnect

The cluster interconnect is the physical configuration of devices that are used to transfer cluster-private communications and data service communications between cluster nodes. Redundant interconnects enable operation to continue over the surviving interconnects while system administrators isolate failures and repair communication. The Sun Cluster software detects, repairs, and automatically reinitiates communication over a repaired interconnect. All nodes must be connected by the cluster interconnect through at least two redundant physically independent networks, or paths, to avoid a single point of failure. While two interconnects are required for redundancy, up to six can be used to spread traffic to avoid bottlenecks and improve redundancy and scalability. The Sun Cluster interconnect uses Fast Ethernet, Gigabit-Ethernet, Sun Fire Link, or the Scalable Coherent Interface (SCI, IEEE 1596-1992), enabling high-performance cluster-private communications. The reliable detection of interconnect issues is one area where cluster software is superior to the traditional use of Hawk to control processes on geographically dispersed systems. While Hawk can detect loss of network connectivity, it lacks the multipath facilities and quorum features of the cluster where failing components are removed or fenced to prevent them attempting to regain control of system resources.
3.4 Cluster Membership

The Cluster Membership Monitor (CMM) is a distributed set of agents that exchange messages over the cluster interconnect to complete the following tasks:
- Enforcing a consistent membership view on all nodes (quorum)
- Driving synchronized reconfiguration in response to membership changes
- Handling cluster partitioning
- Ensuring full connectivity among all cluster members by leaving unhealthy nodes out of the cluster until they are repaired
The main function of the CMM is to establish cluster membership, which requires a cluster-wide agreement on the set of nodes that participate in the cluster at any time. The CMM detects major cluster status changes on each node, such as loss of communication between one or more nodes. The CMM relies on the transport kernel module to generate heartbeats across the transport medium to other nodes in the cluster. When the CMM does not detect a heartbeat from a node within a defined time-out period, the CMM considers the node to have failed and initiates a cluster reconfiguration to renegotiate cluster membership. To determine cluster membership and to ensure data integrity, the CMM performs the following tasks:
- Accounting for a change in cluster membership, such as a node joining or leaving the cluster
- Ensuring that an unhealthy node leaves the cluster
- Ensuring that an unhealthy node remains inactive until it is repaired
- Preventing the cluster from partitioning itself into subsets of nodes
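The heartbeat time-out rule described in this section can be illustrated with a few lines of Python. This is a simplified sketch under stated assumptions, not the CMM algorithm itself: the function names are hypothetical, timestamps are plain numbers in seconds, and the real CMM reaches agreement across nodes rather than deciding locally.

```python
from typing import Dict, Set


def failed_nodes(last_heartbeat: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Return the nodes whose most recent heartbeat is older than the time-out."""
    return {node for node, seen in last_heartbeat.items() if now - seen > timeout}


def reconfigure(members: Set[str], last_heartbeat: Dict[str, float],
                now: float, timeout: float) -> Set[str]:
    """Drop the nodes considered failed and return the new membership."""
    return members - failed_nodes(last_heartbeat, now, timeout)
```

For example, with a 10 second time-out, a node last heard from 11 seconds ago is removed from the membership while a node heard from 1 second ago remains.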
3.5 Cluster Configuration Repository

The Cluster Configuration Repository (CCR) is a private, cluster-wide, distributed database for storing information that pertains to the configuration and state of the cluster. To avoid corrupting configuration data, each node must be aware of the current state of the cluster resources. The CCR ensures that all nodes have a consistent view of the cluster. The CCR is updated when error or recovery situations occur or when the general status of the cluster changes. The CCR structures contain the following types of information:
- Cluster and node names
- Cluster transport configuration
- The names of Solaris Volume Manager disk sets or VERITAS disk groups
- A list of nodes that can master each disk group
- Operational parameter values for data services
- Paths to data service callback methods
- DID device configuration
- Current cluster status
3.6 Fault Monitors

The Sun Cluster system makes all components on the path between users and data highly available by monitoring the applications themselves, the file system, and network interfaces. The Sun Cluster software detects a node failure quickly and creates an equivalent server for the resources on the failed node. The Sun Cluster software ensures that resources unaffected by the failed node are constantly available during the recovery and that resources of the failed node become available as soon as they are recovered.
3.6.1 Data Services Monitoring
Each Sun Cluster data service supplies a fault monitor that periodically probes the data service to determine its health. A fault monitor verifies that the application daemon or daemons are running and that clients are being served. Based on the information returned by probes, predefined actions, such as restarting daemons or causing a failover, can be initiated.
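One way to picture the probe-driven actions is as a small decision loop. The Python below is a sketch of one plausible policy, assuming a fixed limit on local restarts before a failover is requested; the function name, the `max_restarts` parameter and the action labels are hypothetical and are not Sun Cluster property names.

```python
from typing import Iterable, List


def monitor_actions(probe_results: Iterable[bool], max_restarts: int = 2) -> List[str]:
    """Map a sequence of probe results to the action taken after each probe.

    A healthy probe resets the restart counter. A failed probe triggers a
    local restart until max_restarts is used up, then a failover.
    """
    actions: List[str] = []
    restarts = 0
    for healthy in probe_results:
        if healthy:
            restarts = 0
            actions.append("none")
        elif restarts < max_restarts:
            restarts += 1
            actions.append("restart")
        else:
            restarts = 0  # the instance starts fresh on the other node
            actions.append("failover")
    return actions
```

Restarting locally first keeps short-lived faults from moving the service between nodes, while the limit stops a persistently failing instance from being restarted indefinitely on the same node.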
3.6.2 Disk-Path Monitoring
Sun Cluster software supports disk-path monitoring (DPM). DPM improves the overall reliability of failover and switchover by reporting the failure of a secondary disk-path.
3.6.3 IP Multipath Monitoring
Each cluster node has its own IP network multipathing configuration, which can differ from the configuration on other cluster nodes. IP network multipathing monitors the following network communication failures:
- The transmit and receive path of the network adapter has stopped transmitting packets.
- The attachment of the network adapter to the link is down.
- The port on the switch does not transmit or receive packets.
- The physical interface in a group is not present at system boot.
3.7 Quorum Devices

A quorum device is a disk shared by two or more nodes that contributes votes that are used to establish a quorum for the cluster to run. The cluster can operate only when a quorum of votes is available. The quorum device is used when a cluster becomes partitioned into separate sets of nodes to establish which set of nodes constitutes the new cluster. Both cluster nodes and quorum devices vote to form quorum. By default, cluster nodes acquire a quorum vote count of one when they boot and become cluster members. Nodes can have a vote count of zero when the node is being installed, or when an administrator has placed a node into the maintenance state. Quorum devices acquire quorum vote counts that are based on the number of node connections to the device. When you set up a quorum device, it acquires a maximum vote count of N-1 where N is the number of connected votes to the quorum device. For example, a quorum device that is connected to two nodes with nonzero vote counts has a quorum count of one (two minus one).
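The vote arithmetic above can be checked with a short calculation. The Python below is a minimal sketch: the function names are hypothetical, and each node's vote count is passed in explicitly (1 for a normal member, 0 for a node being installed or in the maintenance state).

```python
from typing import Sequence


def quorum_device_votes(connected_node_votes: Sequence[int]) -> int:
    """Votes contributed by a quorum device: N - 1, where N is the number of
    votes held by the nodes connected to the device."""
    connected = sum(connected_node_votes)
    return max(connected - 1, 0)


def total_cluster_votes(node_votes: Sequence[int], device_votes: int) -> int:
    """Total configured votes: all node votes plus the quorum device votes."""
    return sum(node_votes) + device_votes
```

For the two-node case in the text, `quorum_device_votes([1, 1])` gives 1, so the cluster has three configured votes in total. If one of the two nodes is in the maintenance state, the device is connected to a single vote and contributes none.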
3.8 Data Integrity

The Sun Cluster system attempts to prevent data corruption and ensure data integrity. Because cluster nodes share data and resources, a cluster must never split into separate partitions that are active at the same time. The CMM guarantees that only one cluster is operational at any time. Two types of problems can arise from cluster partitions:
- Split Brain
- Amnesia
Split brain occurs when the cluster interconnect between nodes is lost and the cluster becomes partitioned into subclusters, each of which believes that it is the only partition. A subcluster that is not aware of the other subclusters could cause a conflict in shared resources, such as duplicate network addresses and data corruption.
Amnesia occurs if all the nodes leave the cluster in staggered groups. An example is a two-node cluster with nodes A and B. If node A goes down, the configuration data in the CCR is updated on node B only, and not node A. If node B goes down at a later time, and if node A is rebooted, node A will be running with old contents of the CCR. This state is called amnesia and might lead to running a cluster with stale configuration information.
Sun Cluster avoids split brain and amnesia by giving each node one vote and mandating a majority of votes for an operational cluster. A partition with the majority of votes has a quorum and is enabled to operate. This majority vote mechanism works well if more than two nodes are in the cluster. In a two-node cluster, a majority is two. If such a cluster becomes partitioned, an external vote enables a partition to gain quorum. This external vote is provided by a quorum device. A quorum device can be any disk that is shared between the two nodes.
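The majority rule can be worked through for a partitioned cluster. The Python below is a sketch under simplifying assumptions: partitions are given as a mapping from a label to the node votes in that partition, at most one partition holds the quorum device, and all names are hypothetical.

```python
from typing import Dict, Optional


def votes_required(total_votes: int) -> int:
    """Smallest number of votes that is a strict majority of the total."""
    return total_votes // 2 + 1


def partition_with_quorum(partition_node_votes: Dict[str, int],
                          device_votes: int = 0,
                          device_holder: Optional[str] = None) -> Optional[str]:
    """Return the partition that holds a majority of votes, or None.

    The quorum device votes count towards the partition named in device_holder.
    """
    total = sum(partition_node_votes.values()) + device_votes
    needed = votes_required(total)
    for name, votes in partition_node_votes.items():
        held = votes + (device_votes if name == device_holder else 0)
        if held >= needed:
            return name  # at most one partition can hold a strict majority
    return None
```

In a two-node cluster without a quorum device, neither node alone reaches the required two votes after a split. With a one-vote quorum device, the node that holds the device has two of three votes and continues to operate, while the other node does not.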
3.9 Failure Fencing

A major issue for clusters is a failure that causes the cluster to become partitioned (called split brain). When this situation occurs, not all nodes can communicate, so individual nodes or subsets of nodes might try to form individual or subset clusters. Each subset or partition might believe it has sole access and ownership to the multihost disks. Attempts by multiple nodes to write to the disks can result in data corruption. Failure fencing limits node access to multihost disks by preventing access to the disks. When a node leaves the cluster (it either fails or becomes partitioned), failure fencing ensures that the node can no longer access the disks. Only current member nodes have access to the disks, ensuring data integrity.
3.10 Data Services

A data service is the combination of software and configuration files that enables an application to run without modification in a Sun Cluster configuration. When running in a Sun Cluster configuration, an application runs as a resource under the control of the Resource Group Manager (RGM). A data service enables you to configure an application such as Sun Java System Web Server or Oracle database to run on a cluster instead of on a single server. The software of a data service provides implementations of Sun Cluster management methods that perform the following operations on the application:
- Starting the application
- Stopping the application
- Monitoring faults in the application and recovering from these faults
The configuration files of a data service define the properties of the resource that represents the application to the RGM. The RGM controls the disposition of the failover and scalable data services in the cluster. The RGM is responsible for starting and stopping the data services on selected nodes of the cluster in response to cluster membership changes. The RGM enables data service applications to utilize the cluster framework. The RGM controls data services as resources. These implementations are either supplied by Sun or created by a developer who uses a generic data service template, the Data Service Development Library API (DSDL API), or the Resource Management API (RMAPI). The cluster administrator creates and manages resources in containers that are called resource groups. RGM and administrator actions cause resources and resource groups to move between online and offline states.
3.10.1 Resource Types

A resource type is a collection of properties that describe an application to the cluster. This collection includes information about how the application is to be started, stopped, and monitored on nodes of the cluster. A resource type also includes application-specific properties that need to be defined in order to use the application in the cluster. Sun Cluster data services have several predefined resource types. For example, Sun Cluster HA for Oracle is the resource type SUNW.oracle-server and Sun Cluster HA for Apache is the resource type SUNW.apache.
3.10.2 Resources
A resource is an instance of a resource type that is defined cluster-wide. The resource type enables multiple instances of an application to be installed on the cluster. When you initialize a resource, the RGM assigns values to application-specific properties and the resource inherits any properties on the resource type level. Data services utilize several types of resources. Applications such as Apache Web Server or Sun Java System Web Server utilize network addresses (logical hostnames and shared addresses) on which the applications depend. Application and network resources form a basic unit that is managed by the RGM.
3.10.3 Resource Groups

Resources that are managed by the RGM are placed into resource groups so that they can be managed as a unit. A resource group is a set of related or interdependent resources. For example, a resource derived from a SUNW.LogicalHostname resource type might be placed in the same resource group as a resource derived from an Oracle database resource type. A resource group migrates as a unit if a failover or switchover is initiated on the resource group.
3.10.4 Data Service Types

Data services enable applications to become highly available, and scalable services help prevent significant application interruption after any single failure within the cluster. When a data service is configured, it must be configured as one of the following data service types:
- Failover data service
- Scalable data service
- Parallel data service
3.10.4.1 Failover Data Services
Failover is the process by which the cluster automatically relocates an application from a failed primary node to a designated redundant secondary node. Failover applications have the following characteristics:
- Capable of running on only one node of the cluster
- Not cluster-aware
- Dependent on the cluster framework for high availability
If the fault monitor detects an error, it either attempts to restart the instance on the same node, or to start the instance on another node (failover), depending on how the data service has been configured. Failover services use a failover resource group, which is a container for application instance resources and network resources (logical hostnames). Logical hostnames are IP addresses that can be configured up on one node, and later, automatically configured down on the original node and configured up on another node. Clients might have a brief interruption in service and might need to reconnect after the failover has finished. However, clients are not aware of the change in the physical server that is providing the service.
3.10.4.2 Scalable Data Services
The scalable data service enables application instances to run on multiple nodes simultaneously. Scalable services use two resource groups. The scalable resource group contains the application resources and the failover resource group contains the network resources (shared addresses) on which the scalable service depends. The scalable resource group can be online on multiple nodes, so multiple instances of the service can be running simultaneously. The failover resource group that hosts the shared address is online on only one node at a time. All nodes that host a scalable service use the same shared address to host the service.
The cluster receives service requests through a single network interface (the global interface). These requests are distributed to the nodes, based on one of several predefined algorithms that are set by the load-balancing policy. The cluster can use the load-balancing policy to balance the service load between several nodes.
3.10.4.3 Parallel Applications
Sun Cluster systems provide an environment that shares parallel execution of applications across all the nodes of the cluster by using parallel databases. Sun Cluster Support for Oracle Parallel Server/Real Application Clusters is a set of packages that, when installed, enables Oracle Parallel Server/Real Application Clusters to run on Sun Cluster nodes. This data service also enables Sun Cluster Support for Oracle Parallel Server/Real Application Clusters to be managed by using Sun Cluster commands. A parallel application has been instrumented to run in a cluster environment so that the application can be mastered by two or more nodes simultaneously. In an Oracle Parallel Server/Real Application Clusters environment, multiple Oracle instances cooperate to provide access to the same shared database. The Oracle clients can use any of the instances to access the database. Thus, if one or more instances have failed, clients can connect to a surviving instance and continue to access the database.
4 EMS on Sun Cluster
4.1 Conceptual Architecture

Each EMS Business Service is implemented as a separate Sun Cluster Data Service along with its associated Logical Host, Datastore and Application Resources.
[Diagram: Cluster Node A and Cluster Node B, each running TIBCO Software and a TIBCO Runtime Agent (TRA), are attached to a Storage Area Network. The Merchandising Resource Group and the Supply Chain Resource Group each contain a Datastore Resource (data store, logs and config) and an Application Resource, with an EMS Primary on one node and an EMS Secondary on the other; the Primary and Secondary roles are reversed between the two groups. Exclusive access to each data store is granted to the Primary.]
Figure 2. Conceptual Architecture
Additionally, TIBCO Runtime Agent is installed on each server node in the cluster and bound to the physical name and IP address of each server. TIBCO Runtime Agent is NOT under cluster control and is started at system boot via the usual init.d mechanism.
4.2 EMS Configuration

EMS is installed normally on each server, then a series of changes is made. Note that this section only describes the reason for and nature of each change and does not describe the overall sequence of events, which is detailed in the next chapter.
4.2.1 Control Scripts
When an EMS server is registered into the TIBCO Administration Domain, a control shell script is created as follows:
$TIBCO_HOME/ems/bin/domain/<domain name>/TIBCOServers-E4JMS_<port number>.sh
When a second server is added anywhere in the domain with the same port number, the control script is created with a different name as follows:
$TIBCO_HOME/ems/bin/domain/<domain name>/TIBCOServers-E4JMS-1_<port number>.sh
The COMPANY XYZ EMS installation package creates a Unix shell script, tibco_ems.sh, that is the main script used to start, stop and check an EMS service. It takes two arguments, the EMS listening port number and the action (start, stop or check), and utilizes whichever of the above shell scripts is present to start or stop the given EMS server. Using the tibco_ems.sh script whenever interacting with EMS at the command line ensures that a consistent state will always be reported in TIBCO Administrator. The script is also used by the Sun Cluster software to check whether EMS is running and to start or stop it as necessary. The contents of the tibco_ems.sh script are listed in Appendix 8.1.
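The lookup of "whichever control script is present" can be illustrated with a minimal sketch. It builds a throwaway directory rather than touching a real installation; the domain name demo_domain and the function name find_control_script are hypothetical, and only the file-name pattern is taken from the script naming described above.

```shell
#!/bin/sh
# Sketch: locate the domain control script for a given EMS port.
# The layout below is created in a temporary folder purely for illustration;
# "demo_domain" is a hypothetical domain name.

find_control_script() {
    # $1 = directory holding the domain control scripts, $2 = EMS port
    # The wildcard absorbs the optional "-1", "-2" ... suffix.
    find "$1" -name "TIBCOServers-E4JMS*_$2.sh" -print
}

DEMO_DIR=$(mktemp -d)
trap 'rm -rf "$DEMO_DIR"' EXIT
mkdir -p "$DEMO_DIR/demo_domain"
touch "$DEMO_DIR/demo_domain/TIBCOServers-E4JMS_7020.sh"     # first server added
touch "$DEMO_DIR/demo_domain/TIBCOServers-E4JMS-1_7030.sh"   # added second, gets "-1"

FIRST_ADDED=$(find_control_script "$DEMO_DIR/demo_domain" 7020)
SECOND_ADDED=$(find_control_script "$DEMO_DIR/demo_domain" 7030)
echo "7020 -> $(basename "$FIRST_ADDED")"
echo "7030 -> $(basename "$SECOND_ADDED")"
```

Because the port number is preceded by an underscore in the pattern, a lookup for one port does not pick up the script of another port that merely ends in the same digits.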
4.2.2 Configuration Files
Configuration files are created in advance for each server and contain the Queue, Topic and ACL definitions modeled in lower environments and promoted through change management procedures. These files are originally located in the $CONFIG_ROOT root folder as designated in the tibco.sh environment control file. Under the ems sub-folder there is a folder for each individual server containing the set of configuration files required by EMS. This is the location specified in the Domain Utility when adding the EMS server to the TIBCO Administration Domain.
$CONFIG_ROOT
    ems
        7020
            tibemsd.conf
            factories.conf
            users.conf
        7030
        7040
    hawk
Figure 3. EMS Configuration Files Directory Structure
[Figure annotations mark the per-port folders as holding the config files for the Merch Business Domain and the config files for the Supply Chain Business Domain.]
When first installed on the server the configuration files have the following important characteristics:
- A copy physically resides on each server
- They point to logfiles local to each server
- They point to a datastore local to each server
- They contain the same server name, which is the name of the Business Domain (e.g. EMSMERCH)
- They contain the listen parameter tcp://<port number>, which binds EMS to the default interface for the given server
- They do NOT contain any Fault-Tolerant setup parameters
This configuration allows each server to be registered into the domain and tested prior to placing them under Sun Cluster control. Once the Sun Cluster configuration has been created and tested, the following modifications are made:
- A single copy of the configuration files is copied to the Sun Cluster partition
- A logical link is created from the original config folder to the above folder
- The central tibemsd.conf file is edited to place the datastore on the Sun Cluster partition
- The central tibemsd.conf file is edited to place logfiles on the Sun Cluster partition
- The central factories.conf file is edited to configure the FT Connection Factories
Note that the servers are NOT configured to be aware of each other in the traditional Fault-Tolerant setup configuration. At no time will the two servers ever be allowed to be both running simultaneously. This is controlled by the configuration of the Sun Cluster software. The centralized configuration file factories.conf that controls the Connection Factory parameters is modified to add the reconnect_attempt_count and reconnect_attempt_delay parameters as follows:
[FTTopicConnectionFactory]
type = topic
url = tcp://<server a>:<port number>,tcp://<server b>:<port number>
reconnect_attempt_count = 60
reconnect_attempt_delay = 5

[FTQueueConnectionFactory]
type = queue
url = tcp://<server a>:<port number>,tcp://<server b>:<port number>
reconnect_attempt_count = 60
reconnect_attempt_delay = 5
The settings above will allow for the client libraries to attempt to re-connect every 5 seconds for up to 5 minutes. These settings will be subject to change with further experience and testing.
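The 5-minute figure follows directly from the two factory settings. The sketch below reproduces the arithmetic; the function name reconnect_window is hypothetical, and the delay is treated as seconds because that is how the text above reads the setting.

```shell
#!/bin/sh
# Sketch: total time a client keeps retrying, given the factory settings.
# reconnect_window is a hypothetical helper; the delay unit (seconds) follows
# the reading used in this document.

reconnect_window() {
    # $1 = reconnect attempt count, $2 = delay between attempts in seconds
    echo $(( $1 * $2 ))
}

WINDOW_SECONDS=$(reconnect_window 60 5)
WINDOW_MINUTES=$(( WINDOW_SECONDS / 60 ))
echo "Clients keep retrying for up to ${WINDOW_SECONDS}s (${WINDOW_MINUTES} min)"
```

Changing either value scales the window linearly, so the same helper can be used to compare candidate settings as testing experience accumulates.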
4.3 Cluster Configuration

The installation and configuration of the Sun Cluster software on each node is beyond the scope of this document and will normally be carried out by the Unix Services team. This section covers the salient points of the configuration of the Resource Groups to support the EMS servers and any configuration changes or scripts required. The following Resources are required within an EMS Resource Group:
- Logical Host Resource: although not used to connect to EMS, this is required
- Datastore Resource: this is the disk partition that will be mounted on only the single active primary node for each EMS server
- Application Resource: this defines the application in terms of how to start/stop it and how to check its status
For example, the following Resource Groups were created to support the Merchandising and Supply Chain Business Domains:
Resource Group     Resources                Description
ctibco_merch_rg    lh_cert_tibcomerch       Logical Host resource for Merchandising
                   HA_ctibco_merch_store    Datastore resource for Merchandising
                   ctibco_merch_app         Application Resource for Merchandising
ctibco_suppch_rg   lh_cert_tibcosuppch      Logical Host resource for Supply Chain
                   HA_ctibco_suppch_store   Datastore resource for Supply Chain
                   ctibco_suppch_app        Application Resource for Supply Chain
Figure 4. Resource Groups created for testing
These resources are set up such that the ctibco_merch_rg items are active on node a and inactive on node b, and the ctibco_suppch_rg items are active on node b and inactive on node a. When creating an Application Resource, Sun Cluster requires three parameters:
- A script to start the Application
- A script to stop the Application
- A script to check the Application, returning 0 if it is running correctly and a non-zero value otherwise (this is the convention implemented by the script in Appendix 8.2)
These requirements are fulfilled by the tibco_ems_cluster.sh script, which provides all three services via a single command line argument: start, stop or check. It utilizes the tibco_ems.sh script and translates the output of that script into the format required by Sun Cluster. The tibco_ems_cluster.sh script is listed in full in Appendix 8.2 and is installed by the Unix Services team as part of the cluster Resource Groups creation. During the installation they will rename the script as appropriate (e.g. tibco_ems_merch.sh) and change the port number as required.
The cluster software is configured to automatically start the Application Resource items on their primary cluster nodes at startup. It is also configured to check them at regular intervals and attempt to restart them if they are not running. If the application will not restart after a given number of attempts, it is failed over to the other cluster node. The monitoring interval and number of restart attempts are configurable and were set to 15 seconds and 3 restart attempts during testing.
It is also important to note that the Application Resource is created as Non-network aware. This means that the cluster software will not attempt to assess the status of the EMS servers by connecting to a tcp port at regular intervals. Instead it will rely on the information returned from the check script as configured above. A full listing of the Sun Cluster configuration as returned by the command scstat -p can be found in Appendix 8.3.
Note that the EMS listen port is not bound to the logical or virtual IP address of the logical host. Due to the limitations of the current EMS Administrator Plugin, the server must be listening on the default interface of the host on which it is running in order to be administered through TIBCO Administrator. This limitation may be removed in future releases.
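The translation from script output to an exit status can be sketched in isolation. In the example below, fake_ems_check is a hypothetical stand-in for "tibco_ems.sh <port> check" so that the logic can be exercised without an EMS installation; the "running with pid" marker and the 0/1 exit convention are taken from the scripts in Appendices 8.1 and 8.2.

```shell
#!/bin/sh
# Sketch of the check translation: map the text printed by the control script
# to an exit status (0 = running, non-zero = not running).
# fake_ems_check is a hypothetical stand-in for "tibco_ems.sh <port> check".

fake_ems_check() {
    # $1 = port, $2 = simulated server state ("up" or "down")
    if [ "$2" = "up" ]; then
        echo "TIBCO Enterprise Messaging Server ($1) running with pid 12345"
    else
        echo "TIBCO Enterprise Messaging Server ($1) not running"
    fi
}

probe() {
    # Succeeds only when the output carries the "running with pid" marker.
    fake_ems_check "$1" "$2" | grep "running with pid" >/dev/null
}

if probe 7020 up; then
    echo "7020 healthy"
fi
if ! probe 7020 down; then
    echo "7020 would be restarted by the cluster software"
fi
```

Keying on the output text rather than on a TCP connection matches the Non-network aware setting described above: the cluster only ever sees the exit status of the check.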
4.4 Domain Design

TIBCO Runtime Agent is installed on each node in the cluster and is NOT placed under cluster control. Both machines are added to the TIBCO Administration Domain and this allows the monitoring of the resources and health of each server independently. The EMS servers on each cluster node for a given Business Domain (Merchandising, Supply Chain etc) are added to the TIBCO Administration Domain. Thus, within TIBCO Administrator, for each Business Domain, the EMS server instances on both cluster nodes will appear in the display. Under normal conditions, one of them will appear as Running, the other as Stopped as shown below:
Figure 5.
Display of Clustered EMS Servers in TIBCO Administrator GUI
Figure 6.
Display after failover of the 7020 server instance to its secondary cluster node
Note: It is recommended that, for consistency, the proposed Primary Server node be added first and the secondary afterwards, because the Administrator adds the -1 suffix to the second server. This ensures that whenever an operator sees a service with a -1 suffix running, they know that the server is a secondary server. Should a failover occur, the TIBCO Administrator display will automatically update to show the correct status of the two servers. This ensures that operators always receive coherent information on the status of the servers regardless of whether they use TIBCO Administrator, the Sun Cluster Manager or the command line shell scripts.
4.5 Hawk
TIBCO Hawk is not used to control the lifecycle of the EMS servers as they are controlled by the cluster software.
However, Hawk can be and is used to monitor the health of the cluster nodes and to notify clients if for some reason the EMS Service becomes completely unavailable.
The Hawk rulebases used for this monitoring are deployed into the TIBCO Runtime Agents running on each cluster node.
5 Installation Steps
The step-by-step instruction guide can be found in the Message Server Installation Guide. The following sections describe the reason for each step and the order in which they must be completed.
5.1 Pre-Requisites
5.1.1 Configuration Files
Prior to installation, a set of configuration files will be created which will control the installation of all the COMPANY XYZ Packages based on the server's purpose. These files are un-tarred into the $CONFIG_ROOT folder and will contain (amongst other things) a folder for each EMS Service to be created, denoted by its EMS port number. See section 4.2.2 for details.
5.1.2 Systems Management Package
Before the Message Server Package can be installed the Systems Management Package (consisting of TIBCO Runtime Agent and Domain Membership) must be installed. The Systems Management Package will create the TIBCO Runtime Agent for the server nodes based on the configuration files loaded on the machine in the previous step.
5.2 Message Server Installation

TIBCO EMS can now be installed on each cluster node using the COMPANY XYZ Message Server Package. This will install the EMS software along with the control script tibco_ems.sh.
5.3 Domain Membership

Each server should now be added to the TIBCO Administration Domain using the TIBCO Domain Utility. Note that this utility must be run via an X-Windows session. Ensure that for each Business Domain, the EMS server on the intended Primary Cluster Node is added to the domain first for the reasons described in sections 4.2.1 and 4.4. In the case of a standard COMPANY XYZ install the Domain Utility parameters will look like the following:
EMS Version         : 4.1.0
Port                : See Tables below
Home Path           : /opt/tibco/ems
Configuration File  : /opt/tibco/config/ems/<port number>/tibemsd.conf
User Name           : admin
Password            : *****
The password will be set to the current administrator password for the environment. This will be set in the EMS Servers in a subsequent step. It is imperative that this information, especially the password, be specified correctly as the only way of changing it is to remove the EMS Server from the domain and add it again.
5.4 Test Run Servers

It is prudent at this stage to confirm that each of the Server Instances can be started. Because they are currently using local configuration files, the servers can be started simultaneously or individually. For each Server Instance, start the server from TIBCO Administrator and confirm that the Service displays Started. Then proceed to the command line and confirm that tibco_ems.sh reports the service as running, as follows:
$ cd /opt/tibco/scripts
$ ./tibco_ems.sh <port number> check
TIBCO Enterprise Messaging Server (<port number>) running with pid <pid>
Now stop the Server Instance from the command line as follows:
$ ./tibco_ems.sh <port number> stop
TIBCO Enterprise Messaging Server (<port number>) stopping
Confirm that the Administrator eventually shows the Server Instance as Stopped. Repeat this process for each Server Instance on each Cluster Node.
5.5 Placing Servers under Cluster Control

The next step is to have the Unix Services team create the cluster configuration and place the servers under cluster control. Once this is finished, have the Unix Services team start the Server Resource Groups. The servers should display in TIBCO Administrator as Running on their allocated Primary Cluster Node and Stopped on their allocated Secondary Cluster Node.
It is useful at this stage to verify that the Server Instances can be failed over between their Primary and Secondary Cluster Nodes from the Cluster Management console. Also verify that the cluster Data Store Resource fails over between the nodes and is mounted correctly on only the currently active node. It is also prudent to check that the Cluster Software restarts the currently active Server Instance if it is stopped from within TIBCO Administrator.
Remember that at this stage, clicking on the link in Administrator for a given Server Instance will not work, as the password has not yet been set on the EMS Server Instance.
5.6 Final Configuration Changes

Due to the fact that the EMS Server only accesses the configuration files at startup or when changes are made, the following changes can be made with the Server Instance running.
5.6.1 Move Configuration Files to Cluster File System
For a given Server Instance, on the currently active Cluster Node, move the configuration folder to the mounted folder for that Server Instance and replace it with a link to the folder's new location. For example, on the testing servers:
$ cd /opt/tibco/config/ems
$ mkdir /var/ems_data/ems_merch/config
$ mv 7020/* /var/ems_data/ems_merch/config/
$ rmdir 7020
$ ln -s /var/ems_data/ems_merch/config 7020
On the currently inactive cluster node, simply delete the existing configuration files and create a link to where the folder will be mounted. Even though the folder is not currently mounted, the link will be valid when the Cluster Software fails the Server Instance over along with its Data Store Resource.
$ cd /opt/tibco/config/ems
$ rm -rf 7020
$ ln -s /var/ems_data/ems_merch/config 7020
Repeat this process for each pair of EMS Server Instances using the correct port numbers and mount point folders.
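Before running the move on a real node, the sequence can be rehearsed in a throwaway sandbox. The sketch below is illustrative only: SANDBOX, LOCAL_CONF and SHARED are hypothetical stand-ins for /opt/tibco/config/ems and the cluster-mounted folder, and the file content is a placeholder.

```shell
#!/bin/sh
# Sketch: rehearse the move-and-link steps in a temporary sandbox.
# SANDBOX, LOCAL_CONF and SHARED are illustrative stand-ins for
# /opt/tibco/config/ems and the cluster-mounted configuration folder.
SANDBOX=$(mktemp -d)
trap 'rm -rf "$SANDBOX"' EXIT
LOCAL_CONF="$SANDBOX/config/ems"
SHARED="$SANDBOX/shared/ems_merch/config"

# Starting point: a local per-port folder with a placeholder config file.
mkdir -p "$LOCAL_CONF/7020"
echo "listen = tcp://7020" > "$LOCAL_CONF/7020/tibemsd.conf"

# Active node: move the files to the shared folder, then leave a link behind.
mkdir -p "$SHARED"
mv "$LOCAL_CONF/7020"/* "$SHARED/"
rmdir "$LOCAL_CONF/7020"
ln -s "$SHARED" "$LOCAL_CONF/7020"

# The original path still resolves, now through the link.
cat "$LOCAL_CONF/7020/tibemsd.conf"
```

The path registered in the Domain Utility therefore stays valid on both nodes, while only one physical copy of the files exists.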
5.6.2 Point Datastore to Cluster File System
On the currently active Cluster Node for a given Server Instance, edit the tibemsd.conf file and set the store parameter to point to the desired folder under the mounted partition.
########################################################################
# Persistent Storage.
store = /var/tibco_ems/ems_merch/datastore
EMS will create the datastore folder at startup if it does not already exist.
5.6.3 Point Log File to Cluster File System
On the currently active Cluster Node for a given Server Instance, edit the tibemsd.conf file and set the logfile parameter to point to the desired folder under the mounted partition.
#######################################################################
# Log file name and tracing parameters.
logfile = /var/tibco_ems/ems_merch/logs
Create the logs folder manually as follows:
$ mkdir /var/tibco_ems/ems_merch/logs
5.6.4 Reset the Admin user password
Although the configuration files should be pre-built with a blank Admin password, it is worthwhile confirming this as follows. On the currently active Cluster Node for a given Server Instance, edit the users.conf file and identify the following line:
admin:<misc text>:"Administrator"
Remove any text between the two colons to leave the line as follows:
admin::"Administrator"
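For repeated installs the same edit can be expressed as a one-line substitution. The sketch below works on a temporary sample file; the function name blank_admin_password, the second user entry and the placeholder password text are all hypothetical, and it assumes the stored password text contains no colon.

```shell
#!/bin/sh
# Sketch: clear the text between the two colons on the admin line only.
# blank_admin_password is a hypothetical helper; it assumes the stored
# password text itself contains no colon.

blank_admin_password() {
    # $1 = path to users.conf; prints the edited content to stdout
    sed 's/^admin:[^:]*:/admin::/' "$1"
}

USERS_CONF=$(mktemp)
trap 'rm -f "$USERS_CONF"' EXIT
cat > "$USERS_CONF" <<'EOF'
admin:placeholder-text:"Administrator"
user1:other-text:"Hypothetical Test User"
EOF

CLEANED=$(blank_admin_password "$USERS_CONF")
echo "$CLEANED"
```

Printing to standard output rather than editing in place leaves the original file untouched until the result has been reviewed.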
5.6.5 Other Configuration File Entries
Although the configuration files are pre-built, it is worthwhile confirming that the following settings are correct for each Server Instance:
- The listen parameter in tibemsd.conf is set to tcp://<port number>
- The Fault-tolerant Setup parameters in tibemsd.conf are all empty
- The server parameter in tibemsd.conf is correct
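These checks lend themselves to a small script. The following is a sketch under stated assumptions: the function names are hypothetical, the sample file is a cut-down placeholder rather than a real tibemsd.conf, and the Fault-tolerant Setup parameters are assumed to be the entries whose names begin with ft_.

```shell
#!/bin/sh
# Sketch: sanity-check a tibemsd.conf before restarting the server.
# Function names are hypothetical; Fault-tolerant Setup parameters are
# assumed to be the entries whose names start with "ft_".

check_listen() {
    # $1 = config file, $2 = expected port; succeeds on an exact match
    grep "^listen *= *tcp://$2 *\$" "$1" >/dev/null
}

check_server_name() {
    # $1 = config file, $2 = expected server name (the Business Domain)
    grep "^server *= *$2 *\$" "$1" >/dev/null
}

ft_params_empty() {
    # Succeeds when no "ft_" entry carries a value after the equals sign.
    ! grep '^ft_[a-z_]* *= *[^ ]' "$1" >/dev/null
}

CONF=$(mktemp)
trap 'rm -f "$CONF"' EXIT
cat > "$CONF" <<'EOF'
server = EMSMERCH
listen = tcp://7020
ft_active =
ft_heartbeat =
EOF

check_listen "$CONF" 7020 && echo "listen OK"
check_server_name "$CONF" EMSMERCH && echo "server name OK"
ft_params_empty "$CONF" && echo "fault-tolerant parameters empty"
```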
5.7 Restart the Server Instance

Go to TIBCO Administrator and stop the currently active Server Instance. The Cluster Software will restart it. At this point the configuration changes entered above will have taken effect.
5.8 Set the Server Password

The last remaining step is to set the EMS Server password to the same value as was entered in the TIBCO Domain Utility in section 5.3. At the command line on either Cluster Node, start the TIBCO EMS Administration Tool as follows:
$ cd /opt/tibco/ems/bin
$ ./tibemsadmin
TIBCO Enterprise Message Service Administration Tool.
Copyright 2003-2004 by TIBCO Software Inc.
All rights reserved.
Version 4.1.0 V6 6/21/2004
Type 'help' for commands help, 'exit' to exit:
> connect tcp://<node a>:<port number>,tcp://<node b>:<port number>
Press return twice to login as the admin user with no password:
Login name (admin): (Press Return)
Password: (Press Return)
Connected to: tcp://<server>:7020
tcp://<server>:7020>
At this prompt, set the password of the admin user to the value entered in the Domain Utility.
5.9 Restart the Server Instance

Go to TIBCO Administrator and stop the currently active Server Instance. The Cluster Software will restart it. At this point the link in TIBCO Administrator will display the EMS Server Administration pages. Confirm that the following settings show the correct values:
- Server Name on the General Page
- Log File Name on the General Page
- Server Store Directory on the Server Page
- FTQueueConnectionFactory on the Resources Page
- FTTopicConnectionFactory on the Resources Page
6 Testing Results
6.1 Test Clients
6.1.1 Java Client (Connection Factory)
This client connected to the EMS Server Instance via a Connection Factory URL of the form:
tibjmsnaming://<server a>:<port>,tibjmsnaming://<server b>:<port>
The test program tibjmsFactoryQueueSender.java was created from the existing sample program tibjmsMsgProducer.java and was modified to look up the Connection Factory through JNDI and create its queue connection from that factory, as follows:
String providerContextFactory = "com.tibco.tibjms.naming.TibjmsInitialContextFactory";
String defaultTopicConnectionFactory = "FTTopicConnectionFactory";
String defaultQueueConnectionFactory = "FTQueueConnectionFactory";
String providerUrls = "tibjmsnaming://localhost:7222,tibjmsnaming://localhost:7222";

// Point JNDI at the EMS naming service on both cluster nodes
Hashtable env = new Hashtable();
env.put(Context.INITIAL_CONTEXT_FACTORY, providerContextFactory);
env.put(Context.PROVIDER_URL, providerUrls);
InitialContext jndiContext = new InitialContext(env);

// Look up the fault-tolerant factory and connect through it
QueueConnectionFactory factory =
    (QueueConnectionFactory) jndiContext.lookup(defaultQueueConnectionFactory);
QueueConnection connection = factory.createQueueConnection(userName, password);
In addition, the message sending code was modified to send the same test message 1000 times in a loop. The full listing is contained in Appendix 8.4. During testing, the program prints the following output:
Sent message(1):
Sent message(2):
Sent message(3):
During the failover from one Cluster Node to the other, the output pauses, then continues on uninterrupted.
6.1.2 Java Client (FT URL)
This client connects to the EMS Server Instance via a Fault-Tolerant URL of the form:
tcp://<server a>:<port>,tcp://<server b>:<port>
The test program tibjmsFactoryQueueSender.java was modified only slightly to incorporate the message sending loop described in the previous section and to increase the default Fault-Tolerant connection retry and timeout settings as follows.
// 60 reconnect attempts, 5000 ms apart
String reconnect = new String("60, 5000");
Tibjms.setReconnectAttempts(reconnect);
System.out.println("After change for reconnections: " + Tibjms.getReconnectAttempts());
ConnectionFactory factory = new com.tibco.tibjms.TibjmsConnectionFactory(serverUrl);
The full listing is contained in Appendix 8.5. This test client behaved identically to the one using the Connection Factory URL.
6.1.3 BW Client
This test client consisted of a BW process using a timer instance to create and send a JMS message to a test queue every second. A second process subscribed to the same queue and pulled off the message. The number of process instances created was monitored via TIBCO Administrator. No errors were seen during the failover testing.
6.2 Failover Testing

6.2.1 Manual
In this mode of testing the Unix Services Team used the Sun Cluster console to force the migration of a Server Instance from one node to the other. The observed migration time from the client's perspective was approximately 15 seconds.
6.2.2 Process Failure
In order to simulate a real-world problem, the executable permissions were removed from the $TIBCO_HOME/ems/bin/tibemsd file and the running process terminated with a kill signal. After going through its retry loop, the Cluster Software failed the process over to the other Cluster Node. The test clients were paused for a longer period of time, approximately 60 seconds which is three retry periods of 15 seconds plus the failover time of 15 seconds.
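The 60-second estimate can be reproduced from the configured values. The sketch below uses the figures quoted in this document (15-second monitoring interval, 3 restart attempts, 15-second failover, and a client reconnect budget of 60 attempts 5 seconds apart); the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: reproduce the pause estimate for the process-failure test.
PROBE_INTERVAL_SECONDS=15   # monitoring interval used during testing
RESTART_ATTEMPTS=3          # restart attempts before failing over
FAILOVER_SECONDS=15         # observed failover time (section 6.2.1)

EXPECTED_PAUSE=$(( RESTART_ATTEMPTS * PROBE_INTERVAL_SECONDS + FAILOVER_SECONDS ))

# Client reconnect budget from factories.conf: 60 attempts, 5 seconds apart.
RECONNECT_WINDOW=$(( 60 * 5 ))

echo "Expected client pause: ${EXPECTED_PAUSE}s"
if [ "$EXPECTED_PAUSE" -le "$RECONNECT_WINDOW" ]; then
    echo "Within the ${RECONNECT_WINDOW}s client reconnect window"
fi
```

The margin between the two figures indicates how far the monitoring interval or retry count could be raised before clients would give up reconnecting.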
6.2.3 Machine Failure
In an effort to simulate a catastrophic machine failure, the active Cluster Node for one of the EMS Server Instances was forcefully rebooted. The Cluster Software detected the failure after the configured timeout period and migrated the EMS Server Instance to the other Node. The test clients were paused for approximately 30 seconds.
7 Conclusions
7.1 Failover time
The low failover time (circa 15 seconds) in conjunction with the uninterrupted operation of clients makes the use of more expensive distributed lock manager systems unnecessary at COMPANY XYZ. It is felt that this solution meets the business needs for COMPANY XYZ at the present time. Other parameters will affect the failover time, such as:
- Size of datastore file system
- Number of messages in datastore
- Number of clients attempting to reconnect at failover
However, these factors are common to both a clustered and distributed locking solution and are therefore excluded from the decision making process.
7.2 Use of hardware

The solution detailed here houses primary EMS servers on every server in the cluster, so no node sits idle as a pure standby and the available hardware is put to best use.
7.3 Monitoring and Management

The configuration detailed here provides good consistent visibility from TIBCO Administrator, Unix command line and Sun Cluster Console. In addition, the fact that the TIBCO Runtime Agent is installed and configured on each Cluster Node with no unusual changes allows TIBCO Hawk rules to be used to monitor the health of each Cluster Node and the EMS Server Instances.
7.4 Local tibemsd.conf

tibemsd.conf may have to reside locally to each server and point to centralized users.conf, factories.conf etc under the following conditions:
- SSL parameters required are unique to each physical server
- A specific interface must be entered into the listen parameter
8 Appendices
8.1 tibco_ems.sh
#!/bin/sh
# For all Unix platforms
#
# #########################################################################
# Boot script for Unix platforms
# This script takes two arguments: the EMS port number, followed by
# "start", "stop" or "check".
#
# #########################################################################
# Copyright 2004 TIBCO Software Inc. All rights reserved.

TIBCO_ROOT=/opt/tibco
export TIBCO_ROOT

# All environment variables are set in tibco.sh. Can't proceed further
# if file is missing
if [ -f $TIBCO_ROOT/tibco.sh ]; then
    . $TIBCO_ROOT/tibco.sh
else
    echo "File not found $TIBCO_ROOT/tibco.sh"
    exit 1
fi

# Check that the correct number of options have been passed
if [ $# -ne 2 ]
then
    echo "Usage: $0 [EMS Port] [start|stop|check]" 1>&2
    exit 1
fi

EMS_PORT=$1
EMS_BIN=$TIBCO_ROOT/ems/bin/domain/$TIBCO_DOMAIN_NAME

# Find the script that controls the server on the given Port Number
# Secondary servers have '-1', '-2' etc inserted in the script name
# (the pattern is quoted so that find, not the shell, expands the wildcard)
SCRIPT_FILE=`/usr/bin/find $EMS_BIN -name "TIBCOServers-E4JMS*_$EMS_PORT.sh" -print`

# Check that an EMS Server has been installed for this Port Number
if [ -z "$SCRIPT_FILE" ]; then
    echo "EMS Server for port $EMS_PORT not installed"
    exit 1
fi

NOHUP="nohup"
OS_TYPE=`uname -a | awk '{print $1}'`
case $OS_TYPE in
    'SunOS')
        ulimit -n 256
        ;;
    *)
        ;;
esac

# #########################################################################
#
# This function checks for a running process.
#
# Takes a single argument : A string to search for in the process table
#
# WARNING: PS COMMAND IN UNIX TRUNCATES OUTPUT IF IT EXCEEDS
#          80 COLUMNS IN WIDTH, THEREFORE, IF THE PATH POINTING
#          TO THE JRE/JAVA IS TOO LONG THEN THE GREP FOR
#          "tibcoadmin --" BELOW MAY FAIL.
#
# #########################################################################
findPid()
{
    case $OS_TYPE in
        'Linux')
            procpid=`/bin/ps awxf | /bin/grep "$1" |/bin/fgrep -v '\\_'| awk '{print $1}'`
            echo $procpid
            ;;
        *)
            procpid=`/usr/bin/ps -ef | grep "$1" | grep -v "grep" | awk '{print $2}'`
            echo $procpid
            ;;
    esac
}

case "$2" in
# #########################################################################
#
# Start TIBCO Enterprise Messaging Server
#
# #########################################################################
'start')
    procname="$EMS_PORT/tibemsd.conf"
    pid=`findPid "$procname"`
    if [ "$pid" != "" ]; then
        echo "TIBCO Enterprise Messaging Server ($EMS_PORT) already running"
    else
        cd $CONFIG_ROOT/ems
        if [ -x $SCRIPT_FILE ]; then
            echo "TIBCO Enterprise Messaging Server ($EMS_PORT) starting..."
            $NOHUP $SCRIPT_FILE >/dev/null 2>&1 &
            echo "Started TIBCO Enterprise Messaging Server ($EMS_PORT)"
        else
            echo "EMS Server for port $EMS_PORT not installed"
        fi
    fi
    ;;
# #########################################################################
#
# Stop TIBCO Enterprise Messaging Server
#
# #########################################################################
'stop')
    procname="$EMS_PORT/tibemsd.conf"
    pid=`findPid "$procname"`
    if [ "$pid" != "" ]; then
        echo "TIBCO Enterprise Messaging Server ($EMS_PORT) stopping."
        kill $pid
    else
        echo "TIBCO Enterprise Messaging Server ($EMS_PORT) not running"
    fi
    ;;
# #########################################################################
#
# Check if TIBCO Enterprise Messaging Server is running
#
# #########################################################################
'check')
    procname="$EMS_PORT/tibemsd.conf"
    pid=`findPid "$procname"`
    if [ "$pid" != "" ]; then
        echo "TIBCO Enterprise Messaging Server ($EMS_PORT) running with pid $pid"
    else
        echo "TIBCO Enterprise Messaging Server ($EMS_PORT) not running"
    fi
    ;;
# #########################################################################
#
# Unrecognized Option
#
# #########################################################################
*)
    echo "usage: $0 [EMS Port] [start|stop|check]" 1>&2
    ;;
esac
8.2 tibco_ems_cluster.sh
#!/bin/sh
# Cluster control script for TIBCO EMS
# Greg Mabrito - Oct 25, 2004

TIBCO_HOME="/opt/tibco"
TIBCO_SCRIPTS="$TIBCO_HOME/scripts"
TIBCO_EMS_PORT="7020"

# process command line parameters, if any
case "$1" in
    start)
        su - tibco -c "$TIBCO_SCRIPTS/tibco_ems.sh $TIBCO_EMS_PORT start"
        ;;
    stop)
        su - tibco -c "$TIBCO_SCRIPTS/tibco_ems.sh $TIBCO_EMS_PORT stop"
        ;;
    check)
        RET_VAL=`su - tibco -c "$TIBCO_SCRIPTS/tibco_ems.sh $TIBCO_EMS_PORT check" | grep "running with pid"`
        if [ -n "$RET_VAL" ] ; then
            exit 0
        else
            exit 1
        fi
        ;;
    *)
        echo "Usage: $0 {start|stop|check}"
        exit 1
        ;;
esac

exit 0
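The start branch of tibco_ems.sh launches the server in the background and prints "Started" immediately, so a check issued right after a start may still report the server as not running. The following is a minimal sketch, not part of the tested configuration, of a polling helper that retries a check command a fixed number of times. The function name wait_until_ok and its argument order are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts).
# Polls a command until it exits 0 or the attempts run out.
#   $1 = maximum number of attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the check command and its arguments
wait_until_ok() {
    attempts=$1
    delay=$2
    shift 2
    n=0
    while [ "$n" -lt "$attempts" ]; do
        # The check command's own output is discarded; only its exit status matters.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        n=`expr $n + 1`
        sleep "$delay"
    done
    return 1
}
```

Assuming the cluster control script is in the current directory, a call such as `wait_until_ok 10 3 ./tibco_ems_cluster.sh check` would try the check up to ten times, three seconds apart, and return 0 as soon as the check succeeds.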
8.3 Sun Cluster Configuration (scstat -p)

This is the scstat -p printout from the testing configuration on nodes sys99115 and sys99116.
------------------------------------------------------------------

-- Cluster Nodes --

                    Node name           Status
                    ---------           ------
  Cluster node:     sys99115            Online
  Cluster node:     sys99116            Online

------------------------------------------------------------------

-- Cluster Transport Paths --

                    Endpoint            Endpoint            Status
                    --------            --------            ------
  Transport path:   sys99115:qfe1       sys99116:qfe1       Path online
  Transport path:   sys99115:eri0       sys99116:eri0       Path online

------------------------------------------------------------------

-- Quorum Summary --

  Quorum votes possible:      3
  Quorum votes needed:        2
  Quorum votes present:       3

-- Quorum Votes by Node --

                    Node Name           Present Possible Status
                    ---------           ------- -------- ------
  Node votes:       sys99115            1        1       Online
  Node votes:       sys99116            1        1       Online

-- Quorum Votes by Device --

                    Device Name         Present Possible Status
                    -----------         ------- -------- ------
  Device votes:     /dev/did/rdsk/d8s2  1        1       Online

------------------------------------------------------------------

-- Device Group Servers --

                         Device Group           Primary    Secondary
                         ------------           -------    ---------
  Device group servers:  tibco_ems_data_merch   sys99115   sys99116
  Device group servers:  tibco_ems_data_suppch  sys99116   sys99115

-- Device Group Status --

                         Device Group           Status
                         ------------           ------
  Device group status:   tibco_ems_data_merch   Online
  Device group status:   tibco_ems_data_suppch  Online

------------------------------------------------------------------

-- Resource Groups and Resources --

             Group Name         Resources
             ----------         ---------
  Resources: ctibco_merch_rg    lh_cert_tibcomerch HA_ctibco_merch_store ctibco_merch_app
  Resources: ctibco_suppch_rg   lh_cert_tibcosuppch HA_ctibco_suppch_store ctibco_suppch_app

-- Resource Groups --

             Group Name         Node Name   State
             ----------         ---------   -----
      Group: ctibco_merch_rg    sys99115    Online
      Group: ctibco_merch_rg    sys99116    Offline

      Group: ctibco_suppch_rg   sys99116    Online
      Group: ctibco_suppch_rg   sys99115    Offline

-- Resources --

             Resource Name            Node Name   State     Status Message
             -------------            ---------   -----     --------------
   Resource: lh_cert_tibcomerch       sys99115    Online    Online - LogicalHostname online.
   Resource: lh_cert_tibcomerch       sys99116    Offline   Offline - LogicalHostname offline.

   Resource: HA_ctibco_merch_store    sys99115    Online    Online
   Resource: HA_ctibco_merch_store    sys99116    Offline   Offline

   Resource: ctibco_merch_app         sys99115    Online    Online
   Resource: ctibco_merch_app         sys99116    Offline   Offline

   Resource: lh_cert_tibcosuppch      sys99116    Online    Online - LogicalHostname online.
   Resource: lh_cert_tibcosuppch      sys99115    Offline   Offline

   Resource: HA_ctibco_suppch_store   sys99116    Online    Online
   Resource: HA_ctibco_suppch_store   sys99115    Offline   Offline

   Resource: ctibco_suppch_app        sys99116    Online    Online
   Resource: ctibco_suppch_app        sys99115    Offline   Offline

------------------------------------------------------------------

-- IPMP Groups --

               Node Name   Group     Status   Adapter   Status
               ---------   -----     ------   -------   ------
  IPMP Group:  sys99115    ipmp827   Online   qfe0      Online
  IPMP Group:  sys99116    ipmp827   Online   qfe0      Online
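When testing failover it is often useful to know which node currently hosts a resource group. The sketch below shows one way this could be extracted from the "Resource Groups" part of the printout above. It assumes the line layout shown in this listing (Group:, group name, node name, state); the function name rg_online_node is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original scripts).
# Reads scstat-style text on stdin and prints the node on which
# resource group $1 is Online. Assumes lines of the form
#   Group: <group name> <node name> <state>
# Note: the legacy /usr/bin/awk on Solaris does not accept -v;
# nawk or /usr/xpg4/bin/awk would be needed there.
rg_online_node() {
    awk -v rg="$1" '$1 == "Group:" && $2 == rg && $4 == "Online" { print $3 }'
}
```

For example, `scstat -p | rg_online_node ctibco_merch_rg` would be expected to print sys99115 for the configuration listed above. The function prints nothing if the group is offline on every node or does not exist.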
8.4 tibjmsFactoryQueueSender.java
import javax.jms.*;
import javax.naming.*;
import java.util.*;
import com.tibco.tibjms.Tibjms;

public class tibjmsFactoryQueueSender implements ExceptionListener
{
    String userName  = null;
    String password  = null;
    String queueName = "queue.sample";
    Vector data      = new Vector ();

    static final String providerContextFactory = "com.tibco.tibjms.naming.TibjmsInitialContextFactory";
    static final String defaultProviderURLs = "tibjmsnaming://localhost:7222, tibjmsnaming://localhost:7222";
    static final String defaultTopicConnectionFactory = "FTTopicConnectionFactory";
    static final String defaultQueueConnectionFactory = "FTQueueConnectionFactory";

    String providerUrls = defaultProviderURLs;

    public tibjmsFactoryQueueSender ( String[] args )
    {
        parseArgs ( args );

        /* print parameters */
        System.out.println ( "\n------------------------------------------------------------------------" );
        System.out.println ( "tibjmsQueueSender SAMPLE" );
        System.out.println ( "------------------------------------------------------------------------" );
        System.out.println ( "Provider URL................. " + providerUrls );
        System.out.println ( "User......................... " + ( userName != null ? userName:"(null)" ) );
        System.out.println ( "Queue........................ " + queueName );
        System.out.println ( "------------------------------------------------------------------------\n" );

        if ( queueName == null )
        {
            System.err.println ( "Error: must specify queue name" );
            usage ();
        }

        if ( 0 == data.size () )
        {
            System.err.println ( "Error: must specify at least one message text" );
            usage ();
        }

        System.err.println ( "Publishing into queue: '" + queueName + "'\n" );

        try
        {
            /*
             * Init JNDI Context.
             */
            Hashtable env = new Hashtable ();
            env.put ( Context.INITIAL_CONTEXT_FACTORY, providerContextFactory );
            env.put ( Context.PROVIDER_URL, providerUrls );

            if ( null != userName )
            {
                env.put ( Context.SECURITY_PRINCIPAL, userName );
                if ( null != password )
                {
                    env.put ( Context.SECURITY_CREDENTIALS, password );
                }
            }

            InitialContext jndiContext = new InitialContext ( env );

            QueueConnectionFactory factory =
                (QueueConnectionFactory)jndiContext.lookup ( defaultQueueConnectionFactory );

            QueueConnection connection = factory.createQueueConnection ( userName, password );
            connection.setExceptionListener ( this );
            Tibjms.setExceptionOnFTSwitch ( true );

            QueueSession session = connection.createQueueSession ( false, javax.jms.Session.AUTO_ACKNOWLEDGE );

            /*
             * Use createQueue() to enable sending into dynamic queues.
             */
            javax.jms.Queue queue = session.createQueue ( queueName );
            QueueSender sender = session.createSender ( queue );

            javax.jms.TextMessage message = session.createTextMessage ();
            String text = (String)data.elementAt ( 0 );
            message.setText ( text );

            /* publish messages */
            for ( int i=0; i < 1000 ; i++ )
            {
                sender.send ( message );
                System.err.println ( "Sent message(" + i + "): " + text );
                try
                {
                    Thread.sleep ( 1000 );
                }
                catch ( Exception e )
                {
                }
            }

            connection.close ();
        }
        catch ( NamingException e )
        {
            e.printStackTrace ();
            System.exit ( 0 );
        }
        catch ( JMSException e )
        {
            e.printStackTrace ();
            System.exit ( 0 );
        }
    }

    public static void main ( String args[] )
    {
        tibjmsFactoryQueueSender t = new tibjmsFactoryQueueSender ( args );
    }

    void usage ()
    {
        System.err.println ( "\nUsage: java tibjmsQueueSender [options]" );
        System.err.println ( "       <message-text1 ... message-textN>" );
        System.err.println ( "" );
        System.err.println ( "   where options are:" );
        System.err.println ( "" );
        System.err.println ( " -provider <provider URL>  EMS server URL, default is local server" );
        System.err.println ( " -user     <user name>     user name, default is null" );
        System.err.println ( " -password <password>      password, default is null" );
        System.err.println ( " -queue    <queue-name>    queue name, default is \"queue.sample\"" );
        System.err.println ( " -help-ssl                 help on ssl parameters\n" );
        System.exit ( 0 );
    }

    void parseArgs ( String[] args )
    {
        int i = 0;

        while ( i < args.length )
        {
            if ( args[i].compareTo ( "-provider" ) == 0 )
            {
                if ( (i+1) >= args.length ) usage ();
                providerUrls = args[i+1];
                i += 2;
            }
            else if ( args[i].compareTo ( "-queue" ) == 0 )
            {
                if ( (i+1) >= args.length ) usage ();
                queueName = args[i+1];
                i += 2;
            }
            else if ( args[i].compareTo ( "-user" ) == 0 )
            {
                if ( (i+1) >= args.length ) usage ();
                userName = args[i+1];
                i += 2;
            }
            else if ( args[i].compareTo ( "-password" ) == 0 )
            {
                if ( (i+1) >= args.length ) usage ();
                password = args[i+1];
                i += 2;
            }
            else if ( args[i].compareTo ( "-help" ) == 0 )
            {
                usage ();
            }
            else if ( args[i].compareTo ( "-help-ssl" ) == 0 )
            {
                tibjmsUtilities.sslUsage ();
            }
            else if ( args[i].startsWith ( "-ssl" ) )
            {
                i += 2;
            }
            else
            {
                data.addElement ( args[i] );
                i++;
            }
        }
    }

    public void onException ( JMSException exception )
    {
        String strErrCode = exception.getErrorCode ();
        String strFTSwitch = "FT-SWITCH";

        /* getErrorCode() may return null for exceptions other than an FT switch */
        if ( strErrCode != null && strErrCode.startsWith ( strFTSwitch ) )
        {
            String strNewServer = strErrCode.substring ( strFTSwitch.length () + 2 );
            System.out.println ( "FT Connection switched to: " + strNewServer );
        }
        else
        {
            exception.printStackTrace ();
        }
    }
}
8.5 tibjmsMsgProducer.java
import javax.jms.*;
import javax.naming.*;
import java.util.Vector;
import com.tibco.tibjms.Tibjms;

public class tibjmsMsgProducer implements ExceptionListener
{
    /*-----------------------------------------------------------------------
     * Parameters
     *----------------------------------------------------------------------*/
    String  serverUrl = null;
    String  userName  = null;
    String  password  = null;
    String  name      = "topic.sample";
    Vector  data      = new Vector();
    boolean useTopic  = true;

    /*-----------------------------------------------------------------------
     * Variables
     *----------------------------------------------------------------------*/
    Connection      connection  = null;
    Session         session     = null;
    MessageProducer msgProducer = null;
    Destination     destination = null;

    public tibjmsMsgProducer ( String[] args )
    {
        parseArgs ( args );

        try
        {
            tibjmsUtilities.initSSLParams ( serverUrl, args );
        }
        catch ( JMSSecurityException e )
        {
            System.err.println ( "JMSSecurityException: "+e.getMessage ()+", provider=" + e.getErrorCode () );
            e.printStackTrace ();
            System.exit ( 0 );
        }

        /* print parameters */
        System.err.println ( "\n------------------------------------------------------------------------" );
        System.err.println ( "tibjmsMsgProducer SAMPLE" );
        System.err.println ( "------------------------------------------------------------------------" );
        System.err.println ( "Server....................... " +((serverUrl!=null)?serverUrl:"localhost" ) );
        System.err.println ( "User......................... " +((userName !=null)?userName : "(null)" ) );
        System.err.println ( "Destination.................. " + name );
        System.err.println ( "Message Text................. " );
        for ( int i = 0 ; i < data.size () ; i++ )
        {
            System.err.println ( data.elementAt ( i ) );
        }
        System.err.println ( "------------------------------------------------------------------------\n" );

        try
        {
            if ( data.size () == 0 )
            {
                System.err.println ( "***Error: must specify at least one message text\n" );
                usage ();
            }

            /* Increase FT Reconnection Settings */
            String reconnect = new String ( "60, 5000" );
            Tibjms.setReconnectAttempts ( reconnect );
            System.out.println ( "After change for reconnections: " + Tibjms.getReconnectAttempts () );

            System.err.println ( "Publishing to destination '" + name + "'\n" );

            ConnectionFactory factory = new com.tibco.tibjms.TibjmsConnectionFactory ( serverUrl );

            connection = factory.createConnection ( userName, password );
            connection.setExceptionListener ( this );
            Tibjms.setExceptionOnFTSwitch ( true );

            /* create the session */
            session = connection.createSession ( false, javax.jms.Session.AUTO_ACKNOWLEDGE );

            /* create the destination */
            if ( useTopic )
                destination = session.createTopic ( name );
            else
                destination = session.createQueue ( name );

            /* create the producer */
            msgProducer = session.createProducer ( null );

            TextMessage message = session.createTextMessage ();
            String text = (String)data.elementAt ( 0 );
            message.setText ( text );

            /* publish messages */
            for ( int i=0; i < 1000 ; i++ )
            {
                /* publish message */
                msgProducer.send ( destination, message );
                System.err.println ( "Sent message(" + i + "): " + text );
                try
                {
                    Thread.sleep ( 1000 );
                }
                catch ( Exception e )
                {
                }
            }

            /* close the connection */
            connection.close ();
        }
        catch ( JMSException e )
        {
            e.printStackTrace ();
            System.exit ( -1 );
        }
    }

    /*-----------------------------------------------------------------------
     * usage
     *----------------------------------------------------------------------*/
    private void usage ()
    {
        System.err.println ( "\nUsage: java tibjmsMsgProducer [options] [ssl options]" );
        System.err.println ( "       <message-text-1>" );
        System.err.println ( "       [<message-text-2>] ..." );
        System.err.println ( "\n" );
        System.err.println ( "   where options are:" );
        System.err.println ( "" );
        System.err.println ( " -server   <server URL> - EMS server URL, default is local server" );
        System.err.println ( " -user     <user name>  - user name, default is null" );
        System.err.println ( " -password <password>   - password, default is null" );
        System.err.println ( " -topic    <topic-name> - topic name, default is \"topic.sample\"" );
        System.err.println ( " -queue    <queue-name> - queue name, no default" );
        System.err.println ( " -help-ssl              - help on ssl parameters" );
        System.exit ( 0 );
    }

    /*-----------------------------------------------------------------------
     * parseArgs
     *----------------------------------------------------------------------*/
    void parseArgs(String[] args)
    {
        int i=0;

        while(i < args.length)
        {
            if (args[i].compareTo("-server")==0)
            {
                if ((i+1) >= args.length) usage();
                serverUrl = args[i+1];
                i += 2;
            }
            else if (args[i].compareTo("-topic")==0)
            {
                if ((i+1) >= args.length) usage();
                name = args[i+1];
                i += 2;
            }
            else if (args[i].compareTo("-queue")==0)
            {
                if ((i+1) >= args.length) usage();
                name = args[i+1];
                i += 2;
                useTopic = false;
            }
            else if (args[i].compareTo("-user")==0)
            {
                if ((i+1) >= args.length) usage();
                userName = args[i+1];
                i += 2;
            }
            else if (args[i].compareTo("-password")==0)
            {
                if ((i+1) >= args.length) usage();
                password = args[i+1];
                i += 2;
            }
            else if (args[i].compareTo("-help")==0)
            {
                usage();
            }
            else if (args[i].compareTo("-help-ssl")==0)
            {
                tibjmsUtilities.sslUsage();
            }
            else if(args[i].startsWith("-ssl"))
            {
                i += 2;
            }
            else
            {
                data.addElement(args[i]);
                i++;
            }
        }
    }

    /*-----------------------------------------------------------------------
     * main
     *----------------------------------------------------------------------*/
    public static void main ( String[] args )
    {
        tibjmsMsgProducer t = new tibjmsMsgProducer ( args );
    }

    public void onException ( JMSException exception )
    {
        String strErrCode = exception.getErrorCode ();
        String strFTSwitch = "FT-SWITCH";

        /* getErrorCode() may return null for exceptions other than an FT switch */
        if ( strErrCode != null && strErrCode.startsWith ( strFTSwitch ) )
        {
            String strNewServer = strErrCode.substring ( strFTSwitch.length () + 2 );
            System.out.println ( "FT Connection switched to: " + strNewServer );
        }
        else
        {
            exception.printStackTrace ();
        }
    }
}

