Temenos T24 and Microsoft SQL Server HADR White Paper
A deployment reference architecture and guidance for implementing a high-availability and disaster-recovery solution for TEMENOS T24 running on the Microsoft Application Platform
Technical White Paper
Published: May 2012
Applies to: Microsoft SQL Server 2012
Authors: Igor Pagliai (Microsoft), Dammika Wickramasinghe (Temenos)
Abstract
Temenos and Microsoft worked together to define a deployment architecture/topology that provides high availability and disaster recovery for the TEMENOS T24 core banking solution using the Microsoft Application platform and Microsoft technologies. This white paper describes the results of this joint effort.
The Microsoft High Availability and Disaster Recovery Solution for TEMENOS T24
© 2012 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.
Table of Contents
Introduction
Technical Overview of TEMENOS T24
SQL Server AlwaysOn
Recovery Objectives
Fault Tolerance and Disaster Recovery Architecture
High Availability and Disaster Recovery Solution
Setup and Configuration
SQL Server 2012 HADR Configuration
Windows Server Firewall Configurations
T24 File Share Configuration
Active Directory Domain Services DNS Configuration
Application-Tier NLB Configuration
T24 Application Server Configuration
Web-Tier NLB Configuration
T24Browser Configuration
Disaster Recovery Procedures
DNS Switching
SQL Server 2012 HADR Failover
Findings and Carryovers
Recommended Hotfixes and Service Packs
Additional Resources
SQL Server 2012
Windows Server Failover Cluster
Network Load Balancing
About Temenos
About Microsoft
Introduction
TEMENOS T24 (T24) is a fully integrated, modular core banking solution that covers a broad spectrum of functional requirements for the retail, private, corporate, universal, and Islamic banking and microfinance sectors. T24 provides a single, real-time view of clients across the entire enterprise, making it possible for banks to maximize returns while also streamlining costs.

Microsoft SQL Server 2012 data management software provides an ideal data management framework for T24. With this foundation, T24 customers can experience faster funds transfers, higher security-trades volumes, and quicker close-of-business processes; they can benefit from open, state-of-the-art technologies to accelerate innovation, which helps to greatly increase the speed and effectiveness with which new products and services are created.

As part of their strategic alliance, Microsoft and Temenos worked together to define a recommended deployment architecture that provides high availability and disaster recovery (HADR) for T24 running on the Microsoft Application Platform and using Microsoft technologies. This joint effort was conducted in the Temenos Hemel Hempstead lab. One of the main drivers for developing the architecture/topology was to reduce the cost of Microsoft software licenses and the use of specialized hardware (such as load balancers) to minimize the total cost of ownership (TCO). Therefore, the recommended software topologies can be customized to meet customers' needs.

The following considerations apply to the recommended architecture:
• The SQL Server 2012 Availability Group feature, part of the AlwaysOn technology set, was selected instead of storage area network (SAN)-level synchronous storage replication to avoid the cost of an additional SAN device and the licensing cost for SAN replication software.
• A SQL Server 2012 Failover Cluster Instance (FCI) was adopted for the primary site instead of two standalone instances to reduce licensing cost, minimize management and performance overhead, and increase the likelihood that an existing deployment based on a typical Windows Server Failover Clustering (WSFC) configuration can be reused.
• The Network Load Balancing (NLB) feature of Windows Server 2008 R2 was chosen to eliminate the need for an expensive hardware load balancer device in front of the JBoss servers.
• The NLB feature of Windows Server 2008 R2 was chosen to provide better load balancing performance than the native T24 capabilities in front of T24 servers.
• Two cluster nodes in the primary site with shared SAN storage were used to provide high availability for the T24 application file share.
The implementation and requirements of HADR solutions can vary based on a variety of factors, including service level agreements (SLAs), cost, number of sites, and network infrastructure. Therefore, the requirements of individual HADR solutions need to be determined on a case-by-case basis for each deployment.
• Use an existing highly available network storage for the cluster file share witness. Used in combination with the previous option, a highly available network storage for the cluster file share witness can render the installation of a Windows Server Failover Cluster unnecessary.
  o NOTE: Distributed File System Replication (DFS-R) can be used to replicate files from the primary site to the disaster recovery site with a less frequent schedule. Using DFS-R with continuous replication to local folders as a way to avoid a clustered file share, however, is not recommended because of the possible performance impact.
• Use an additional node in the disaster recovery site with shared SAN storage between the nodes. With this alternative, a second SQL Server 2012 FCI can be used, providing high availability at the level of the disaster recovery site as well.
  o This second instance must be installed only on the nodes in the disaster recovery site.
  o This instance is distinct from the one used in the primary site.
  o This instance should be configured for synchronous replication in the availability group.
  o The shared SAN storage between the nodes in the disaster recovery site is not linked/replicated to the shared storage between the nodes in the primary site.
IMPORTANT In the proposed scenario, the minimum number of servers has been used in the disaster recovery site to reduce costs. This means that in the case of a complete primary site disaster, the disaster recovery site will operate in an exposed configuration that is not highly available. For this reason, it is highly recommended that you recover the primary site as soon as possible or use an additional node in the disaster recovery site with shared SAN storage between the nodes, as mentioned previously.
• Multi-subnet failover clustering: Windows Server 2008 R2 and SQL Server 2012 support this type of configuration, but it has not been tested as a means of reducing the downtime caused by Domain Name System (DNS) replication latency. The following links provide more information.
  o SQL Server Multi-Subnet Clustering (http://msdn.microsoft.com/en-us/library/ff878716.aspx)
  o SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sql-server-2012alwayson_3a00_-multisite-failover-cluster-instance.aspx)
• Flexible failover policy: SQL Server 2012 introduces a new health detection mechanism for clustered installations that can be modified so that Windows Server Failover Clustering is more alert to possible SQL Server 2012 health problems. The following links provide more information.
  o Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664.aspx)
  o Configure FailureConditionLevel Property Settings (http://msdn.microsoft.com/en-us/library/ff878667.aspx)
Document Scope
The following are considered in the scope of this white paper:
• This document applies to T24 R11 and R12 (Temenos Application Framework C) with T24Browser as a channel.
• This document focuses only on HADR functionality.
• The document applies to the following software:
  o Windows Server 2008 R2 with Service Pack 1 (SP1)
  o Windows Server 2008 R2 Network Load Balancing (NLB)
  o Windows Server 2008 R2 clustering
  o Windows Server 2008 R2 clustered file share
  o Windows Server 2008 R2 Distributed File System (DFS) Replication
  o SQL Server 2012 AlwaysOn Availability Group
  o Windows Server 2008 R2 domain controller
  o JBoss 5.1.0 GA
The following are considered out of the scope of this white paper:
• Performance tuning recommendations.
• T24 channels other than T24Browser, such as TWS.NET, TOCF.NET, and BizTalk Adapter.
• Administration and monitoring of the software.
• Hardware configurations, such as RAID and network adapter teaming.
• Security.
• Local area network (LAN)/wide area network (WAN) configurations and recommendations.
[Figure: TEMENOS T24 architecture diagram. Labels shown: Channels (T24 Browser, TWS.NET, ARC IB, ARC Mobile, TOCF.NET, TWS (EE), TOCF (EE)); Connectivity; Temenos T24; Management; Security; Application (T24 Agent, T24, TAFC, DCD; modules FX, EB, AA, DX, AC); Database Driver; T24 Monitor; Message Queue.]
Table 1 provides a description of the components. Note that the HADR solution recommended in this white paper focuses on T24 with T24Browser as a channel.
Component | Description
T24 Agent | T24 Agent is a server-side jBASE component that is responsible for accepting and processing incoming client requests. Communication is established via TCP socket connections and by means of a well-defined protocol. T24 Agent is a socket server listening on a user-defined TCP port, and has the capability to serve a wide range of client applications as long as they speak the same protocol.
T24 TAFC | T24 is the banking business logic written by using jBC, which is used to generate C / C++ code. The Temenos Application Framework C (TAFC) version provides additional runtime services that are currently not available in jBC.
Database Driver | Direct Connect Driver (DCD) is the T24 data abstraction layer that decouples T24 business logic from the underlying data storage/structure.
T24 Monitor | T24 Monitor is a Java Management Extensions (JMX) and web-based online monitoring tool for T24, offering real-time statistics, as well as historical views of a particular T24 system.
Message Queue | Message Queue is an optional middleware infrastructure that lets T24 use message-driven communication with the channel layer.
Database | The jBASE or vendor-provided relational database management system (RDBMS); currently supported platforms are Oracle, Microsoft SQL Server, and IBM DB2.
• AlwaysOn Availability Groups: AlwaysOn Availability Groups are new in SQL Server 2012. They greatly enhance the capabilities of database mirroring, help ensure availability of application databases, and enable zero data loss through log-based data movement for data protection without shared disks. Availability groups provide an integrated set of options, including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
• AlwaysOn Failover Cluster Instances (FCIs): FCIs enhance the SQL Server failover clustering feature and support multi-site clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.
Recovery Objectives
Data redundancy is a key component of a high-availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in-flight might be rolled back, or they might be lost on the secondary instances because of delays in data propagation. You can measure the impact and set recovery goals in terms of how long it takes to get back in business and how much time latency there is in the last transaction recovered:
• Recovery Time Objective (RTO): The RTO is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.
• Recovery Point Objective (RPO): The RPO is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending on the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.
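The two definitions above can be expressed as simple timestamp arithmetic. The following Python sketch is illustrative only: the function names, the incident timeline, and the objective values are hypothetical and are not taken from the test environment.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_commit_before_failure, last_recovered_commit,
                     outage_start, service_restored):
    """Return (data-loss window, outage duration) as timedeltas."""
    data_loss = last_commit_before_failure - last_recovered_commit
    outage = service_restored - outage_start
    return data_loss, outage


def meets_objectives(data_loss, outage, rpo, rto):
    """True when the measured values stay within the agreed RPO and RTO."""
    return data_loss <= rpo and outage <= rto


# Hypothetical incident timeline (all values are assumptions for illustration).
failure = datetime(2012, 5, 1, 10, 0, 0)
data_loss, outage = recovery_metrics(
    last_commit_before_failure=failure - timedelta(seconds=1),
    last_recovered_commit=failure - timedelta(seconds=4),
    outage_start=failure,
    service_restored=failure + timedelta(minutes=3),
)

within_goals = meets_objectives(
    data_loss, outage, rpo=timedelta(seconds=5), rto=timedelta(minutes=5)
)
zero_loss_goal = meets_objectives(
    data_loss, outage, rpo=timedelta(0), rto=timedelta(minutes=5)
)
```

In this made-up timeline the measured data-loss window is three seconds, so a zero-data-loss objective would not be met, which is the kind of gap that a synchronous-commit configuration is intended to close.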
You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health. The business goals for RTO and RPO should be key drivers in selecting a SQL Server technology for your high-availability and disaster-recovery solution. Table 2 offers a rough comparison of the type of results that those different solutions may achieve.
SQL Server HADR Solution | Potential Data Loss (RPO) | Automatic Failover | Readable Secondaries (1)
AlwaysOn Availability Group, synchronous-commit | Zero | Yes (2) | 0-2
AlwaysOn Availability Group, asynchronous-commit | Seconds | No | 0-4
AlwaysOn Failover Cluster Instance | NA (3) | Yes |
Database Mirroring (4), high-safety (sync + witness) | Zero | Yes |
Database Mirroring (4), high-performance (async) | Seconds (5) | No |
Log Shipping | Minutes (5) | No |
Backup, Copy, Restore (6) | Hours (5) | No |

(1) An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
(2) Automatic failover of an Availability Group is not supported to or from a failover cluster instance.
(3) The FCI itself does not provide data protection; data loss is dependent upon the storage system implementation.
(4) This feature will be removed in future versions of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
(5) This is highly dependent upon the workload, data volume, and failover procedures.
(6) Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.
• Infrastructure level: Server-level fault-tolerance and intra-node network communication use Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
• SQL Server instance level: A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).
• Database level: An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
• Client connectivity: Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and Availability Group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
The following decisions were made in the solution design. Refer to Figure 3 for further information.
• The disaster recovery site used for testing had only one server for each tier. If the disaster recovery site also requires high availability, the configuration used in the primary site should be used for the disaster recovery site.
• The Windows Server 2008 R2 NLB feature is used to load balance the traffic into the JBoss application servers in the primary site. The same feature can be used for the disaster recovery site if there will be two or more disaster recovery nodes.
• A DNS host record was created for the web-tier NLB IP to make the failover to the disaster recovery site transparent to the users (for example, T24Browser.CoE.Temenos.com).
• T24Browser is a stateful application that is normally deployed with a sticky-session configuration. Although this configuration provides the required functionality, it reduces the scalability of the T24 web tier. It also reduces availability, because users might lose their sessions if an application server goes down. The solution presented in this white paper eliminates these limitations by removing sticky sessions. This is achieved by persisting the JBoss session state in the SQL Server database and configuring NLB to Affinity: None. Using NLB and a DNS host record and avoiding the use of sticky sessions lets you add or remove web-tier servers transparently, without affecting users.
• T24Browser is capable of performing simple load balancing among the available T24 application servers when a load balancing solution is not available in the application tier. This feature is disabled in the recommended solution, and NLB is used instead with the Affinity: None configuration to achieve the best possible load balancing.
• A DNS host record was created for the application-tier NLB IP so that you have the option of failing over only the application tier to the disaster recovery site if necessary (for example, T24Server.CoE.Temenos.com). This is an optional configuration that is only required if a facility needs to simplify server maintenance and keep the T24Browser configurations identical in both sites. However, this option does create an additional step in the disaster recovery procedures.
• Using the NLB Affinity: None configuration makes it possible to add or remove application-tier servers transparently, without affecting online transactions.
• The SQL Server 2012 HADR AlwaysOn (HADRON) configuration with a SQL Server 2012 failover cluster instance for the primary site is used to reduce the number of required SQL Server 2012 licenses. The primary site can have two standalone instances of SQL Server 2012 instead of the failover cluster instance if you need to remove the shared storage; however, this will require licenses for each SQL Server 2012 instance, while the failover cluster instance requires only one license regardless of the number of nodes in the cluster.
• The disaster recovery instance of SQL Server 2012 is configured as a SQL Server 2012 HADRON synchronous AlwaysOn replica for zero data loss. Synchronous replication requires a fast and stable network connection in order to work as expected. This needs to be taken into account when setting up the network. If you do not have a fast and stable network connection, implement asynchronous replication instead, but understand that asynchronous replication does have a possibility of data loss.
• The same Windows Server Failover Cluster that hosts the SQL Server 2012 clustered instance is used to host a clustered file share to keep T24 shared files and folders. The clustered file share increases the availability of the T24 shared files and folders.
• The disaster recovery site has a local folder for T24 shared files/folders. Windows Server 2008 R2 Distributed File System Replication (DFS-R) is implemented with an Active Directory Domain Services (AD DS)-published namespace to make the file share failover to the disaster recovery site transparent and to replicate T24 shared files/folders. Making the T24 shared files available in the disaster recovery site is not mandatory because T24 can recover without them. However, having the T24 shared files available has a positive impact. Therefore, DFS-R is scheduled to occur several times per day to reduce the overhead of the replication.
• T24 typically accesses shared files and folders via a mapped drive letter in each T24 server. Because accidentally removing or changing the mapped drive letter can cause failures, file and folder symbolic links were created by using the Windows mklink utility and were used instead of the mapped drive letters. Symbolic links make the shared files and folders imitate local entities, and therefore T24 can access them directly.
• A JBoss session persistence database was created in the same SQL Server 2012 HADRON configuration as the T24 database, and therefore has the same high availability and disaster recovery capabilities. This makes management easier and reduces the steps in disaster-recovery procedures. You can, however, implement the JBoss session persistence database as a different instance, if required.
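The symbolic-link approach described above can be sketched in Python. This is an illustration under stated assumptions, not the procedure used in the test environment: the helper name and folder names are hypothetical, and temporary folders stand in for the clustered file share. On Windows, `os.symlink` with `target_is_directory=True` is roughly comparable to `mklink /D <link> <target>` and usually requires an elevated prompt.

```python
import os
import tempfile


def link_shared_folder(link_path, target_path):
    """Create a directory symbolic link at link_path that points to target_path.

    Refuses to overwrite an existing path so that an existing folder or link
    is never replaced by accident.
    """
    if os.path.lexists(link_path):
        raise FileExistsError(f"{link_path} already exists")
    os.symlink(target_path, link_path, target_is_directory=True)
    return os.path.realpath(link_path)


# Demo with temporary folders (hypothetical names, not the real share paths).
workspace = tempfile.mkdtemp()
share = os.path.join(workspace, "share")        # stand-in for the file share
link = os.path.join(workspace, "t24_shared")    # hypothetical local link name
os.mkdir(share)

resolved = link_shared_folder(link, share)

# A file written through the link is stored in the shared folder.
with open(os.path.join(link, "example.txt"), "w") as handle:
    handle.write("shared")
shared_files = os.listdir(share)
```

Because the application only ever sees the link path, the target can later be pointed at a different share without touching application settings, which is the property the design relies on.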
The Windows Server Failover Cluster consists of three nodes: two nodes in the primary site and one node in the disaster recovery site, with a SAN shared only between the two nodes in the primary site. The disaster recovery instance has only local storage, to which the database content is replicated by using the availability group. The cost of the solution is reduced because there is no shared storage between nodes in the primary site and the node in the disaster recovery site, because there is no SAN in the secondary site, and because you do not need an expensive storage-level synchronization mechanism to replicate disk data content.

A clustered SQL Server 2012 instance is primarily used to reduce the number of SQL Server 2012 licenses that are required. The primary site could have two standalone instances of SQL Server 2012 instead of the failover cluster instance if this is required to remove the shared storage; however, this option requires licenses for each SQL Server 2012 instance, while the failover cluster instance requires only one license regardless of the number of nodes in the cluster.
If the disaster recovery site also requires high availability, the same configuration used in the primary site needs to be available in the disaster recovery site. When the recommended solution was tested, all of the SQL Server instances were created as named instances to make them easy to identify during maintenance and monitoring. Table 3 lists the names that were used in the test environment during setup; these names can be used as a reference guideline.
Table 3. Names of SQL Server instances
Name | Description
SQL11HA | SQL Server 2012 instance name of the primary site. Since the named instance uses a dynamic TCP port, static TCP port 1533 was configured via the SQL Server Configuration Manager.
SQL11DR | SQL Server 2012 instance name of the disaster recovery site. Since the named instance uses a dynamic TCP port, static TCP port 1533 was configured via the SQL Server Configuration Manager.
T24AG | SQL Server 2012 AlwaysOn Availability Group name. This name is not used by T24, and is used in SQL Server Management Studio when required to fail over to the disaster recovery instance. The JBoss session persistence database was added to the same availability group in the test environment. This makes management easier, and disaster recovery failover becomes a single process for both the databases.
T24AgListener | SQL Server 2012 AlwaysOn Availability Group listener name. This is the name T24 uses to connect to the SQL Server 2012 HADRON instance. When creating the listener, 1433 (the SQL Server default port) was used as the TCP port number to avoid having to change the T24 connection parameters to use a different port number.
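To illustrate how a client addresses the listener rather than an individual instance, the following Python sketch builds an ODBC connection string for a connectivity check. It is only a sketch: T24 itself connects through its own database driver, the helper name is hypothetical, the use of Windows authentication (`Trusted_Connection=yes`) is an assumption, and the database name T24R12 is taken from the T24 configuration record shown later in this paper. The driver name corresponds to the SQL Server 2008 R2 Native Client.

```python
def odbc_connection_string(server, port, database,
                           driver="SQL Server Native Client 10.0"):
    """Build an ODBC connection string that targets a TCP endpoint.

    Windows authentication is assumed here purely for illustration.
    """
    return (
        f"Driver={{{driver}}};"
        f"Server=tcp:{server},{port};"
        f"Database={database};"
        "Trusted_Connection=yes;"
    )


# Listener name and port as listed in Table 3.
listener_dsn = odbc_connection_string("T24AgListener", 1433, "T24R12")
```

Pointing clients at the listener name means that the same string remains valid after the availability group fails over to the disaster recovery replica.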
Description
Inbound firewall exception rule for TCP port 1533, which is the static port configured for the SQL Server instance.
Inbound firewall exception rule for UDP port 1434, which is required for the SQL Server Browser when named instances exist.
Inbound firewall exception rule for TCP port 5022, which is required for the SQL Server 2012 HADRON Availability Group.
Inbound firewall exception rule for TCP port 1433, which is configured for the SQL Server 2012 HADRON Availability Group Listener.
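The four inbound rules above could be scripted rather than created by hand. The following Python sketch only generates `netsh advfirewall firewall add rule` command lines as text; it does not run them. The rule names are hypothetical labels chosen for this example, and the commands should be reviewed against local security policy before use.

```python
# (rule name, protocol, port); the names are illustrative, not from the paper.
RULES = [
    ("SQL Server instance static port", "TCP", 1533),
    ("SQL Server Browser", "UDP", 1434),
    ("AlwaysOn Availability Group endpoint", "TCP", 5022),
    ("Availability Group listener", "TCP", 1433),
]


def netsh_command(name, protocol, port):
    """Return the netsh command line for one inbound allow rule."""
    return (
        f'netsh advfirewall firewall add rule name="{name}" '
        f"dir=in action=allow protocol={protocol} localport={port}"
    )


commands = [netsh_command(*rule) for rule in RULES]

if __name__ == "__main__":
    for command in commands:
        print(command)
```

Generating the commands from one list keeps the port numbers in a single place, which makes it easier to apply the same rule set to every SQL Server node.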
If the disaster recovery site also uses a T24 multi-server configuration, the same type of file share needs to be created in the disaster recovery site. However, because the test environment had only a single T24 instance, a local folder was created with the same shared folder name.
While shorter TTL values can increase the load on the DNS server, they can be useful with critical services like web servers, application servers, and load balancers. TTL values are often reduced by the DNS administrator before service is moved to minimize disruptions. Table 5 shows the DNS host records that were created in the test environment.
Table 5. DNS host records
Name | Description
T24Browser.CoE.Temenos.com | The Domain Name System (DNS) host record of the T24 web-tier load balancer that was used in the web browser URL to connect to T24Browser. The TTL value was set to one minute for testing.
T24Server.CoE.Temenos.com | An optional DNS host record created for the T24 application-tier load balancer to test transparent failover of the application tier independently of the web tier. This was used by the T24Browser (configured in t24ds.xml) to connect to the load balancer in the test environment. The TTL value was set to one minute for testing.
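After a DNS host record is switched, it can be useful to confirm which site a name currently resolves to from a given client. The following Python sketch shows one way to do that with the standard library resolver. The function names and the site address map are hypothetical; the real NLB virtual IP addresses are not given in this paper, so the demo uses a loopback address, and results on a real client may lag the change by up to the record's TTL because of caching.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses that the local resolver reports."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return sorted({info[4][0] for info in infos})


def active_sites(hostname, site_addresses):
    """Return the names of the sites whose address the hostname resolves to.

    site_addresses maps a site name to the virtual IP expected for that site.
    """
    resolved = set(resolve_ipv4(hostname))
    return [site for site, ip in site_addresses.items() if ip in resolved]


# Demo with a literal loopback address so that no real DNS lookup is needed.
demo_sites = {"primary": "127.0.0.1", "disaster-recovery": "10.0.0.2"}
demo_result = active_sites("127.0.0.1", demo_sites)
```

In practice the first argument would be a host record such as T24Browser.CoE.Temenos.com, and the map would hold the web-tier NLB address of each site.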
If the disaster recovery site has multiple T24 application servers, an NLB cluster needs to be configured in those servers as well. Table 6 shows the NLB configurations used.
Table 6. NLB configurations
Description
Multicast operation mode was used to keep the network adapter's built-in media access control (MAC) address. This was because the test servers had only one network adapter, and this network adapter had to be used for server management as well. If the server has multiple network adapters, the cluster operation mode can be set to Unicast.
The protocol used for communication with T24 was TCP/IP. The port range was limited to 20002, which is the T24 agent port configuration. Affinity: None was selected to achieve the best possible load balancing.
The simple load balancing feature of T24Browser was disabled, and the NLB cluster name (T24Server.CoE.Temenos.com) was used as the T24 instance. This lets the network load balancing route the connections to the T24 instances in the cluster.
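The effect of the affinity setting can be illustrated with a small simulation. With Affinity: Single, the host is chosen from the client IP address alone; with Affinity: None, the client port also takes part in the choice, so connections from one client can spread across hosts. The Python sketch below uses SHA-256 as a stand-in hash, which is an assumption made for illustration and not the algorithm NLB actually uses; the host names and client address are also hypothetical.

```python
import hashlib


def pick_host(hosts, client_ip, client_port, affinity):
    """Map a connection to a host with a stand-in hash (not NLB's real one)."""
    if affinity == "none":
        key = f"{client_ip}:{client_port}"   # IP and port both affect the choice
    elif affinity == "single":
        key = client_ip                      # only the client IP affects the choice
    else:
        raise ValueError(f"unsupported affinity: {affinity}")
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]


hosts = ["T24APP1", "T24APP2"]      # hypothetical application server names
web_server_ip = "10.0.0.15"         # hypothetical web-tier server address
ports = range(50000, 50200)         # 200 outgoing connections from that server

hosts_with_single = {pick_host(hosts, web_server_ip, p, "single") for p in ports}
hosts_with_none = {pick_host(hosts, web_server_ip, p, "none") for p in ports}
```

In the application tier the clients are the few web-tier servers, so an IP-only mapping would pin each web server to a single T24 server, whereas including the port lets the connections from one web server reach every T24 server in the cluster.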
Using NLB with the Affinity: None configuration lets you add or remove application-tier servers transparently, without affecting online transactions.
The Temenos Application Framework C (TAFC) is the execution environment for the T24 application. Install TAFC and the T24 application on all application servers (for installation guidance, contact Temenos).
Following is a description of how the T24 application servers were configured:
• All the T24 instances in the test environment used multiple server configurations with the required licenses. To use one instance of T24 on multiple servers, install the multiple application server module.
• When using multiple application servers, define port ranges for each T24 application server to avoid conflicts or deadlock situations during close of business. Ports can be assigned by using the following variable in each application server:
    JBCPORTNO= {port range}
• The same jbase_agent port must be used on all T24 application servers. The default jbase_agent port 20002 was used in the test environment. The same port must be used because requests to the T24 servers are controlled by the load balancer, and therefore T24Browser sees only a single instance of T24 (load balancing cluster name), regardless of the number of T24 application servers available.
• An inbound Windows firewall exception rule for TCP port 20002 was created to make the jbase_agent port accessible from T24Browser.
• The T24 database driver (Direct Connect Driver [DCD]) requires the SQL Server client to be installed on the server. At the time of testing, the DCD for the SQL Server 2012 Native Client was still in development. For this reason, the SQL Server 2008 R2 Native Client was used.
• Because the SQL Server 2012 HADR configuration is used for the database tier, the T24 database must be accessed via the SQL Server 2012 AlwaysOn Availability Group. Therefore, the availability group listener name was used in the T24 configuration instead of the database server IP address.
    File jedi_config , Record 'XMLMSSQL_FRMWRK'
    Command->
    0001 R12.100203                          Direct connect driver version.
    0002 T24AgListener]T24R12                DB Server name] DB name
    0003 T24User]uHdE9oJj8B5Y0cUF0hGh0A==]   DB User/Password encrypted
Default database locking (SQL Server application lock) was used for the testing.
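The port-range rule for multiple application servers can be illustrated with a short Python sketch that hands out one non-overlapping range per server. The server names, the starting port, the range size, and the start-end text form printed at the end are assumptions made for illustration only; confirm the exact JBCPORTNO syntax with Temenos.

```python
def allocate_port_ranges(servers, first_port, ports_per_server):
    """Give each server its own contiguous, non-overlapping port range."""
    ranges = {}
    start = first_port
    for name in servers:
        end = start + ports_per_server - 1
        if end > 65535:
            raise ValueError("port range exceeds 65535")
        ranges[name] = (start, end)
        start = end + 1
    return ranges


def overlaps(a, b):
    """Return True when two inclusive (start, end) ranges share a port."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    # Hypothetical server names and ports, for illustration only.
    allocated = allocate_port_ranges(["T24APP1", "T24APP2"], 50000, 100)
    for server, (low, high) in allocated.items():
        print(f"{server}: JBCPORTNO={low}-{high}")
```

Running the overlaps check across every pair of servers before close of business is one simple way to catch a misconfigured range early.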
When the SQL Server 2012 Native Client is certified for use with T24, the considerations for client availability features shown in Table 7 will apply.
Table 7. Client type considerations
Driver
Multi-subnet failover
Application intent
Read-only routing
Multi-subnet failover: faster single-subnet endpoint failover Yes No Future date
Multi-subnet failover: named instance resolution for SQL Server clustered instances Yes No Future date
SQL Server Native Client 11.0 ODBC
SQL Server Native Client 11.0 OLE DB
ADO.NET with Microsoft .NET Framework 4.0 update 4.0.2*
ADO.NET with .NET Framework 3.5
Microsoft Java Database Connectivity (JDBC) driver 4.0 for SQL Server
Yes No Yes
*ADO.NET with .NET Framework 4.0.2 patch download for connectivity improvement (http://support.microsoft.com/kb/2544514). For more information about connection string keywords, see: Using Connection String Keywords with SQL Server Native Client (http://msdn.microsoft.com/en-us/library/ms130822(v=sql.110).aspx).
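To make the connection string keywords concrete, the following Python sketch assembles an ODBC connection string for SQL Server Native Client 11.0 that targets the availability group listener. It only builds the string and does not open a connection. The listener and database names come from the test configuration described earlier; the TCP port 1433 and the use of Windows authentication (Trusted_Connection) are assumptions, and the keywords apply only once the SQL Server 2012 Native Client is certified for use with T24.

```python
def build_odbc_connection_string(listener, database, port=1433,
                                 multi_subnet_failover=True,
                                 application_intent="ReadWrite"):
    """Assemble a SQL Server Native Client 11.0 ODBC connection string."""
    parts = [
        "Driver={SQL Server Native Client 11.0}",
        f"Server=tcp:{listener},{port}",      # port 1433 is an assumption
        f"Database={database}",
        "Trusted_Connection=yes",             # assumed authentication mode
        f"MultiSubnetFailover={'Yes' if multi_subnet_failover else 'No'}",
        f"ApplicationIntent={application_intent}",
    ]
    return ";".join(parts) + ";"


if __name__ == "__main__":
    print(build_odbc_connection_string("T24AgListener", "T24R12"))
```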
If the disaster recovery site has multiple web-tier servers, an NLB cluster needs to be configured on those servers as well. Table 8 shows the NLB configurations used.
Table 8. NLB configurations
Description Multicast operation mode was used to keep the network adapter's built-in media access control (MAC) address. This was necessary because the test servers had only one network adapter, and this network adapter had to be used for server management as well. If the server has multiple network adapters, the cluster operation mode can be set to Unicast.
TCP was used because HTTP traffic is transported over TCP/IP. The port range was limited to 8080, which was the JBoss web site port configured in the test environment. Affinity: None was selected to achieve the best possible load balancing. Typically, T24Browser requires the Affinity: Single (sticky session) configuration because it is a stateful application. However, in the recommended solution, JBoss is configured to persist session state in the SQL Server database; therefore, it is possible to use the Affinity: None configuration in the load balancer.
To make it possible to fail over the web tier to the disaster recovery site transparently, the DNS host record (T24Browser.CoE.Temenos.com) is used for the NLB cluster IP address. Therefore, the web browser URL remains unchanged, even if there is a failover to the disaster recovery site. Not using sticky sessions increases the availability of the site; in addition, using NLB with the DNS host record allows web-tier servers to be added or removed transparently, without affecting the users.
T24Browser Configuration
The T24 web tier is configured with two JBoss instances with T24Browser (nodes: Web Node 1 and Web Node 2) in the primary site and a single instance (node: Web Node 3) in the disaster recovery site. It is possible to have multiple JBoss/T24Browser instances (web server nodes) in the disaster recovery site if high availability is a requirement for the disaster recovery site. The Windows Server 2008 R2 NLB feature was used to balance the load across the JBoss server nodes. Figure 9 shows the T24 web tier.
JBoss Configuration
JBoss application server 5.1.0 GA was used in the test environment to host the T24Browser Java Servlet application. No clustered instance of JBoss was installed on the web-tier servers. The following configurations were made after successfully installing JBoss:
Because of the limitations of JBoss cluster session replication, and to avoid using sticky sessions, JBoss session persistence was implemented using a SQL Server database.
A JBoss session persistence database was created in the same SQL Server 2012 HADR configuration as the T24 database. Therefore, the JBoss session persistence database has the same high availability and disaster recovery capabilities as the T24 database. This makes management easier and reduces the number of steps in the disaster recovery procedures. (Note that the JBoss session persistence database can be implemented on a different instance if required.)
An inbound Windows firewall exception rule for TCP port 8080 was created to make JBoss accessible to users.
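As an illustration, an inbound rule like this one can also be scripted with the netsh advfirewall command. The Python sketch below only builds the command line and prints it; the rule name is a hypothetical label, and the command would need to be run from an elevated prompt on the web-tier server.

```python
def firewall_rule_command(rule_name, tcp_port):
    """Build a netsh command that allows inbound TCP traffic on one port."""
    if not 1 <= tcp_port <= 65535:
        raise ValueError("tcp_port must be between 1 and 65535")
    return [
        "netsh", "advfirewall", "firewall", "add", "rule",
        f"name={rule_name}",        # hypothetical rule name
        "dir=in",
        "action=allow",
        "protocol=TCP",
        f"localport={tcp_port}",
    ]


if __name__ == "__main__":
    # Print the command instead of executing it.
    print(" ".join(firewall_rule_command("JBoss-HTTP-8080", 8080)))
```

The same helper could describe the jbase_agent rule on the application servers by passing port 20002.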
Description Configuration of the connection to the T24 server. The AGENT connection method was used for the testing.
ConnectionTimeout
The connection expiration time if T24Browser does not get a response from the T24 application server. This was set to 20 seconds.
RetryCount
The number of retry attempts that T24Browser makes if it cannot reach T24 to successfully execute a transaction. This was set to 20.
RetryWait
When retrying, the number of seconds to wait before attempting to retry the transaction. This was set to 5 seconds.
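The interplay of the retry settings can be sketched in Python as follows. This is not T24Browser code; it is a minimal model that assumes RetryCount counts the retries made after the first failed attempt and that RetryWait is applied between attempts. The send callable and the injectable sleep function are hypothetical names introduced for the example.

```python
import time


def call_with_retry(send, retry_count=20, retry_wait=5.0, sleep=time.sleep):
    """Call send(); on a connection failure, wait and retry up to retry_count times."""
    last_error = None
    for attempt in range(retry_count + 1):      # first attempt plus the retries
        try:
            return send()
        except (ConnectionError, TimeoutError) as exc:
            last_error = exc
            if attempt < retry_count:
                sleep(retry_wait)               # pause before the next attempt
    raise last_error


if __name__ == "__main__":
    attempts = []

    def flaky_send():
        # Fails twice, then succeeds, to imitate a short failover window.
        attempts.append(1)
        if len(attempts) < 3:
            raise ConnectionError("T24 server not reachable")
        return "OK"

    print(call_with_retry(flaky_send, sleep=lambda seconds: None))
```

Under these assumptions, and if every attempt runs into the 20-second connection timeout, a request could wait up to 21 × 20 + 20 × 5 = 520 seconds before failing, which gives a rough upper bound on the failover window that the settings can ride through.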
Description A comma-separated list of available T24 servers. Because the NLB feature in Windows Server 2008 R2 is configured at the application tier, the name of the load balancing cluster needs to be used instead of the names of the T24 servers. The load balancing cluster T24Server.CoE.Temenos.com was used in the test environment.
Ports
The jbase_agent TCP port number. All T24 instances in the test environment are configured to use TCP port 20002; therefore, 20002 is used as the jbase_agent port number.
loadBalancing
Enables or disables the simple load balancing feature in T24Browser. This is set to false because the NLB feature in Windows Server 2008 R2 performs the load balancing in the recommended solution.
actionTimeout
The number of seconds that the jbase_agent waits for a response from the T24 application server. This was set to 60 seconds in the test environment.
The steps required for the failover activities are described in detail in the sections that follow. Note that the steps in all sections need to be completed to successfully fail over to the disaster recovery site.
DNS Switching
Web-tier and application-tier DNS switching requires changing the IP address of each DNS host record to the IP address of the relevant server (node) in the disaster recovery site. To change the IP addresses of the DNS host records:
1. Log on to the domain controller as the administrator.
2. Navigate to Server Manager.
3. Expand Roles, expand DNS Server, expand DNS, expand Server Name, and then expand Forward Lookup Zones.
4. Select the domain name (for example, CoE.Temenos.com). Note that T24Browser and, optionally, T24Server are the DNS host records that require the IP changes (Figure 11).
5. Right-click the DNS host record T24Browser, and then select Properties (Figure 12).
6. Change the address in the IP address field to the IP address of the web-tier server in the disaster recovery site, and then click OK. If the disaster recovery site has more than one web-tier server, the IP address should be the IP address of the web-tier load balancer (NLB cluster).
7. If the T24Server DNS host record is also available, right-click the DNS host record, and then select Properties. Change the address in the IP address field to the IP address of the application-tier server in the disaster recovery site, and then click OK (Figure 13).
If the disaster recovery site has more than one application-tier server, the IP address should be the IP address of the application-tier load balancer (NLB cluster).
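After the host records are changed, it can be useful to confirm from a client machine that the names resolve to the disaster recovery addresses. The following Python sketch shows one way to do that. The IP address in the example is a placeholder, and the resolve parameter is a hypothetical hook that defaults to the standard library resolver; cached DNS entries on clients may delay the change becoming visible.

```python
import socket


def dns_points_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True when hostname currently resolves to expected_ip."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:
        # Covers socket.gaierror, raised when the name cannot be resolved.
        return False


if __name__ == "__main__":
    # 10.0.2.50 is a placeholder for the disaster recovery web-tier address.
    host = "T24Browser.CoE.Temenos.com"
    print(host, "switched:", dns_points_to(host, "10.0.2.50"))
```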
For more information about planned manual failover, see: Perform a Planned Manual Failover of an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh231018.aspx).
Failover Procedure
To fail over the SQL Server 2012 HADR configuration to the disaster recovery site:
1. Connect to the primary or secondary (disaster recovery) instance of SQL Server by using SQL Server 2012 Management Studio (Figure 16).
2. Right-click on the Availability Group (for example, T24AG), and then select Failover (Figure 17).
3. In the Fail Over Availability Group Wizard, click Next (Figure 18).
4. In the Select New Primary Replica page, select the secondary SQL Server instance if it is not already selected, and then click Next (Figure 19).
Figure 19. Fail Over Availability Group wizard Select New Primary Replica page
5. In the Connect to Replica page, connect to the secondary instance by providing the credentials, and then click Next (Figure 20).
Figure 20. Fail Over Availability Group wizard Connect to Replica page
6. Click Finish at the Summary page to start the failover (Figure 21).
7. After the successful failover, the wizard will show a Results page similar to the following (Figure 22).
The Validating WSFC quorum vote configuration warning appears because of the special quorum configuration used in this solution and is safe to ignore (Figure 23).
Figure 23. Fail Over Availability Group wizard WSFC quorum configuration warning
8. Check the database status and Availability Group status in SQL Server 2012 Management Studio to verify the failover (Figure 24).
Figure 24. Management Studio after Fail Over Availability Group wizard
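The wizard steps above have a Transact-SQL equivalent, ALTER AVAILABILITY GROUP ... FAILOVER, which is run while connected to the secondary replica that is to become the new primary. The Python sketch below only builds that statement so that it can be reviewed or passed to a database client of your choice; it does not connect to SQL Server, and the helper names are hypothetical.

```python
def quote_identifier(name):
    """Wrap a SQL Server identifier in brackets, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def planned_failover_statement(availability_group):
    """Build the T-SQL for a planned manual failover (no data loss)."""
    return f"ALTER AVAILABILITY GROUP {quote_identifier(availability_group)} FAILOVER;"


if __name__ == "__main__":
    # T24AG is the example availability group name used in this document.
    print(planned_failover_statement("T24AG"))
```

A planned failover of this kind requires the target secondary replica to be synchronized; the forced variant used after a site failure is a different statement.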
(http://msdn.microsoft.com/en-us/library/ms366279.aspx).
Failover Procedure
When the primary site or the primary site database servers are not available, the only accessible database server is the disaster recovery instance. Figure 25 and Figure 26 show how the Windows Server Failover Cluster and the SQL Server instance appear on the disaster recovery database server.
To bring the database online in the disaster recovery site, first start the Windows Server Failover Cluster with forced quorum, and then perform a forced failover of the SQL Server 2012 availability group. The following subsections provide the steps required to bring the database online. The steps in all sections need to be completed to successfully fail over to the disaster recovery site.
Force Cluster Start with Force Quorum
To force the cluster to start in the disaster recovery site with force quorum:
1. Log on to the disaster recovery database server with a domain account that has administrator privileges on the local computer.
2. Open Server Manager, expand Features, and then expand Failover Cluster Manager. Select the cluster (Figure 27).
4. Confirm the action by selecting the Yes - Force my cluster to start option (Figure 29).
5. The cluster takes some time to start; wait until it starts successfully (Figure 30).
6. After the cluster starts, the cluster will look like the following figure in the Failover Cluster Manager (Figure 31).
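The forced cluster start performed above in Failover Cluster Manager can also be issued from the command line, either with the Start-ClusterNode cmdlet and its -FixQuorum switch or with net.exe start clussvc /forcequorum. The Python sketch below only assembles the two command lines for review; the node name is a hypothetical placeholder, and the commands would need to be run with administrative privileges on the disaster recovery database server.

```python
def force_quorum_commands(node_name):
    """Return two alternative command lines that force-start the cluster service."""
    powershell_script = (
        "Import-Module FailoverClusters; "
        f"Start-ClusterNode -Name '{node_name}' -FixQuorum"
    )
    return {
        "powershell": ["powershell.exe", "-NoProfile", "-Command", powershell_script],
        "net": ["net.exe", "start", "clussvc", "/forcequorum"],
    }


if __name__ == "__main__":
    # DRSQLNODE1 is a made-up node name, for illustration only.
    for label, command in force_quorum_commands("DRSQLNODE1").items():
        print(label, "->", command)
```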
Force Failover SQL Server 2012 Availability Group
Once the Windows Server Failover Cluster is online with force quorum, force failover of the SQL Server 2012 availability group as follows:
1. Open SQL Server 2012 Management Studio and connect to the SQL Server disaster recovery instance (Figure 32).
2. Right-click on the Availability Group (for example, T24AG), and then select Failover (Figure 33).
3. In the Fail Over Availability Group Wizard, click Next (Figure 34).
4. In the Select New Primary Replica page, select the secondary SQL Server instance if it is not already selected. Also note the warning. Click Next (Figure 35 and Figure 36). Because the cluster quorum is forced, the quorum status is showing as Forced Quorum.
Figure 35. Fail Over Availability Group wizard Select New Primary Replica page
Figure 36. Fail Over Availability Group wizard Select New Primary Replica page warning
5. Select and confirm failover with potential data loss, and then click Next (Figure 37). Because the database status is not synchronized, SQL Server warns about potential data loss. However, there is no data loss if the databases were in the Synchronized state at the time of the site failure.
Figure 37. Fail Over Availability Group wizard Potential Data Loss confirmation
6. Click Finish on the Summary page to start the failover (Figure 38).
Figure 38. Fail Over Availability Group wizard Force Failover Summary page
7. After the successful force failover, the wizard shows the Results page (Figure 39).
The Validating WSFC quorum vote configuration warning appears because of the special quorum configuration that is used in the recommended solution and is safe to ignore (Figure 40).
Figure 40. Fail Over Availability Group wizard WSFC Quorum Configuration warning
8. After successful force failover, the database status and availability group status in SQL Server 2012 Management Studio will look like the following figure (Figure 41).
Figure 41. Management Studio after Fail Over Availability Group wizard
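The forced failover steps above correspond to the Transact-SQL statement ALTER AVAILABILITY GROUP ... FORCE_FAILOVER_ALLOW_DATA_LOSS, run on the disaster recovery instance. The Python sketch below builds that statement and adds a small helper that mirrors the data-loss reasoning in step 5. It assumes that the last known synchronization state of each database was recorded before the failure (for example, from the synchronization_state_desc column of sys.dm_hadr_database_replica_states); the helper names and the sample dictionary are hypothetical.

```python
SYNCHRONIZED = "SYNCHRONIZED"


def potential_data_loss(last_known_states):
    """Return True if any database was not SYNCHRONIZED when last observed."""
    return any(state != SYNCHRONIZED for state in last_known_states.values())


def forced_failover_statement(availability_group):
    """Build the T-SQL for a forced manual failover."""
    quoted = "[" + availability_group.replace("]", "]]") + "]"
    return f"ALTER AVAILABILITY GROUP {quoted} FORCE_FAILOVER_ALLOW_DATA_LOSS;"


if __name__ == "__main__":
    # Hypothetical snapshot taken by monitoring before the primary site failed.
    snapshot = {"T24R12": "SYNCHRONIZED"}
    print("Potential data loss:", potential_data_loss(snapshot))
    print(forced_failover_statement("T24AG"))
```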
Additional Considerations
It is highly recommended that you change the cluster quorum configuration if planned (scheduled maintenance) or unplanned (primary site disaster) shutdown of all cluster nodes in the primary site occurs, and if the disaster recovery SQL Server 2012 instance becomes active as the primary instance for an extended period of time. If you do not change the cluster quorum configuration, the entire cluster might shut down because of insufficient quorum vote availability.
Change the value of the NodeWeight property to 1 for the disaster recovery cluster node and to 0 for the cluster nodes in the primary site. For more information, see the Microsoft Support article at http://support.microsoft.com/kb/2494036/en-us. Shutting down only one node in the primary site does not affect cluster availability as long as the second node in the primary site is still up and running along with the File Share Witness (FSW).
If the FSW in the primary site is not available and cannot be contacted by the cluster node in the disaster recovery site, change the FSW location to the disaster recovery site.
Running the entire system with only one node in the disaster recovery site will not guarantee high availability. Therefore, this should only be done for a limited amount of time. Otherwise, it is highly recommended that you add a second node in the disaster recovery site and modify the cluster quorum configuration accordingly.
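The reasoning behind these recommendations can be checked with simple vote arithmetic. The Python sketch below is a simplified model of a node-and-file-share majority quorum: each node contributes its NodeWeight, the FSW contributes one vote, and the cluster keeps quorum only while more than half of all votes are online. The node names and the initial weights (1 for each primary node, 0 for the disaster recovery node) are assumptions inferred from the text, and the model ignores the forced-quorum state.

```python
def has_quorum(node_weights, online_nodes, witness_online):
    """Return True when the online votes form a strict majority of all votes."""
    total_votes = sum(node_weights.values()) + 1          # +1 for the FSW
    online_votes = sum(node_weights[node] for node in online_nodes)
    if witness_online:
        online_votes += 1
    return online_votes > total_votes // 2


if __name__ == "__main__":
    # Assumed original weights: both primary nodes vote, the DR node does not.
    original = {"primary1": 1, "primary2": 1, "dr1": 0}
    print(has_quorum(original, ["primary2", "dr1"], witness_online=True))   # one primary node down
    print(has_quorum(original, ["dr1"], witness_online=False))              # primary site lost

    # Weights after the recommended change, with the FSW moved to the DR site.
    adjusted = {"primary1": 0, "primary2": 0, "dr1": 1}
    print(has_quorum(adjusted, ["dr1"], witness_online=True))
```

In this model the disaster recovery node on its own cannot hold quorum under the original weights, which is why the cluster has to be force-started and then reconfigured if it is to run from the disaster recovery site for an extended period.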
Windows Server DFS-R with DFS Namespace published in Active Directory Domain Services provides a unique URL that can be used to refer to the file share, regardless of whether the system is operating in the primary or the disaster recovery environment.
File and folder symbolic links make the shared file/folder access more resilient.
A clustered instance of SQL Server 2012 for high availability reduces licensing requirements.
A SQL Server 2012 AlwaysOn Availability Group eliminates SAN replications.
DNS host records used for the load balancer IP addresses make disaster recovery failover transparent at the web and application tiers.
Regularly check and apply the pertinent hotfixes mentioned in the following knowledge base (KB) article to enhance stability and fix known critical bugs (not security related).
Recommended hotfixes and updates for Windows Server 2008 R2-based server clusters http://support.microsoft.com/kb/980054/en-us
As a special out-of-band recommended hotfix for Windows Server 2008 R2, install the following hotfix on all the cluster nodes in the primary and disaster recovery sites.
A hotfix that improves the performance of the "AlwaysOn Availability Group" feature in SQL Server 2012 is available for Windows Server 2008 R2 http://support.microsoft.com/kb/2687741/en-us
Regularly check and apply all the security hotfixes for SQL Server 2012.
o NOTE: At the time of writing (May 2012), no security hotfixes have been released for SQL Server 2012.
Regularly check and apply the latest available service pack for SQL Server 2012 after checking with Temenos about supportability.
o NOTE: At the time of writing (May 2012), no service pack has been released for SQL Server 2012.
As a special out-of-band recommended hotfix for SQL Server 2012, install the following update package on all the SQL Server 2012 instances in the primary and disaster recovery sites. Cumulative update package 1 for SQL Server 2012 http://support.microsoft.com/kb/2679368/en-us NOTE If a more recent update is available, it is not necessary to install the previous hotfix.
Regularly check for the latest cumulative update (CU) release for SQL Server 2012, review the fixed bugs, and install it only if you are affected and after checking with Temenos about supportability. For a list of released CUs for SQL Server 2012, see the following KB article. The SQL Server 2012 builds that were released after SQL Server 2012 was released http://support.microsoft.com/kb/2692828/en-us
Finally, it is highly recommended that you check periodically with the Microsoft Support Service for any recommended non-security related hotfixes for Windows Server 2008 R2 and SQL Server 2012.
Additional Resources
Following are links for further information.
Database Availability Step-by-Step Guide:
o Deploying a new Availability Group http://msdnstage.redmond.corp.microsoft.com/enus/library/ff877884.aspx#RelatedTasks
o Create or Configure an Availability Group Listener (SQL Server) http://go.microsoft.com/fwlink/?LinkId=201271
o Perform a Forced Manual Failover of an Availability Group (SQL Server) http://msdn.microsoft.com/en-us/library/ff877957.aspx
Instance Availability Step-by-Step Guide:
o SQL Server Multi-Subnet Clustering http://msdn.microsoft.com/en-us/library/ff878716.aspx
o Configure FailureConditionLevel Property Settings http://msdn.microsoft.com/en-us/library/ff878667.aspx
o View and Read Failover Cluster Instance Diagnostics Log http://msdn.microsoft.com/en-us/library/ff878700.aspx
AlwaysOn FAQ for SQL Server 2012 http://msdn.microsoft.com/en-us/sqlserver/gg508768(l=en-us)
Hardware and Software Requirements for Installing SQL Server 2012 http://msdn.microsoft.com/en-us/library/ms143506.aspx
Introducing SQL Server AlwaysOn http://msdn.microsoft.com/en-us/sqlserver/gg490638
Overview of AlwaysOn Availability Groups http://msdn.microsoft.com/en-us/library/ff877884.aspx
Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups http://msdn.microsoft.com/en-us/library/ff878487.aspx#SystemReqsForAOAG
Before Installing Failover Clustering http://msdn.microsoft.com/en-us/library/ms189910.aspx
Create a New SQL Server Failover Cluster (Setup) http://msdn.microsoft.com/en-us/library/ms179530.aspx
Add or Remove Nodes in a SQL Server Failover Cluster (Setup) http://msdn.microsoft.com/en-us/library/ms191545.aspx
Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26FEF9550EFD44/Microsoft%20SQL%20Server%20AlwaysOn%20Solutions%20Guide%20for%20High%20Availability%20and%20Disaster%20Recovery.docx
Availability Modes http://msdn.microsoft.com/en-us/library/ff877931.aspx
AlwaysOn Failover Cluster Instances http://msdn.microsoft.com/en-us/library/ms189134.aspx
Enable and Disable AlwaysOn Availability Groups (SQL Server) http://msdn.microsoft.com/en-us/library/ff878259.aspx
Creating an Availability Group (SQL Server) http://msdn.microsoft.com/en-us/library/ff878176.aspx
Create or Configure an Availability Group Listener (SQL Server) http://msdn.microsoft.com/en-us/library/hh213080.aspx
Monitor Availability Groups http://msdn.microsoft.com/en-us/library/ff878305.aspx
AlwaysOn Availability Groups Dynamic Management Views and Functions http://msdn.microsoft.com/en-us/library/ff877943.aspx
Manually Prepare a Secondary Database for an Availability Group (SQL Server) http://msdn.microsoft.com/en-us/library/ff878349.aspx
SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sql-server-2012alwayson_3a00_-multisite-failover-cluster-instance.aspx
Perform a Forced Manual Failover of an Availability Group http://msdn.microsoft.com/en-us/library/ff877957.aspx
Availability Group Listeners, Client Connectivity, and Application Failover (SQL Server) http://msdn.microsoft.com/en-us/library/hh213417.aspx
Configure Read-Only Access on an Availability Replica (SQL Server) http://msdn.microsoft.com/en-us/library/hh213002.aspx
Configure Read-Only Routing on an Availability Group (SQL Server) http://msdn.microsoft.com/en-us/library/hh710054.aspx
Client Connection Access to Availability Replicas (SQL Server) http://msdn.microsoft.com/en-us/library/hh510184.aspx
Configure Read-Only Access on an Availability Replica http://msdn.microsoft.com/en-us/library/hh213002.aspx
Configure the Windows Firewall to Allow SQL Server Access http://msdn.microsoft.com/en-us/library/cc646023.aspx
How to use Kerberos authentication in SQL Server http://support.microsoft.com/kb/319723/en-us
How to transfer the logins and the passwords between instances of SQL Server 2005 and SQL Server 2008 http://support.microsoft.com/kb/918992/en-us
SQL Server Web site http://www.microsoft.com/sqlserver
SQL Server Tech Center http://technet.microsoft.com/en-us/sqlserver
SQL Server Dev Center http://msdn.microsoft.com/en-us/sqlserver
Recommended hotfixes and updates for Windows Server 2008 R2-based server clusters http://support.microsoft.com/kb/980054/en-us
A hotfix that improves the performance of the "AlwaysOn Availability Group" feature in SQL Server 2012 is available for Windows Server 2008 R2 http://support.microsoft.com/kb/2687741/en-us
About Temenos
Founded in 1993 and listed on the Swiss Stock Exchange (SIX: TEMN), Temenos Group AG is the market-leading provider of banking software systems to retail, corporate, universal, private, Islamic, microfinance, and community banks. Headquartered in Geneva with more than 60 offices worldwide, Temenos serves more than 1,500 customers in 125 countries. Temenos software products provide advanced technology and rich functionality, incorporating best-practice processes that take advantage of Temenos' experience in 700 implementations around the globe. For more information, visit: www.temenos.com
About Microsoft
Founded in 1975, Microsoft (Nasdaq "MSFT") is the worldwide leader in software, services, and solutions that help people and businesses realize their full potential. For more information, visit: www.microsoft.com