
The Microsoft High Availability and Disaster Recovery Solution for TEMENOS T24

A deployment reference architecture and guidance for implementing a high-availability and disaster-recovery solution for TEMENOS T24 running on the Microsoft Application Platform

Technical White Paper Published: May 2012 Applies to: Microsoft SQL Server 2012 Authors: Igor Pagliai (Microsoft) Dammika Wickramasinghe (Temenos)

Abstract
Temenos and Microsoft worked together to define a deployment architecture/topology that provides high availability and disaster recovery for the TEMENOS T24 core banking solution using the Microsoft Application platform and Microsoft technologies. This white paper describes the results of this joint effort.


© 2012 Microsoft Corporation. All rights reserved. This document is provided "as-is." Information and views expressed in this document, including URL and other Internet Web site references, may change without notice. You bear the risk of using it. This document does not provide you with any legal rights to any intellectual property in any Microsoft product. You may copy and use this document for your internal, reference purposes.


Table of Contents
Introduction
Technical Overview of TEMENOS T24
SQL Server AlwaysOn
    Recovery Objectives
    Fault Tolerance and Disaster Recovery Architecture
High Availability and Disaster Recovery Solution
Setup and Configuration
    SQL Server 2012 HADR Configuration
    Windows Server Firewall Configurations
    T24 File Share Configuration
    Active Directory Domain Services DNS Configuration
    Application-Tier NLB Configuration
    T24 Application Server Configuration
    Web-Tier NLB Configuration
    T24Browser Configuration
Disaster Recovery Procedures
    DNS Switching
    SQL Server 2012 HADR Failover
Findings and Carryovers
Recommended Hotfixes and Service Packs
Additional Resources
    SQL Server 2012
    Windows Server Failover Cluster
    Network Load Balancing
    About Temenos
    About Microsoft


Introduction
TEMENOS T24 (T24) is a fully integrated, modular core banking solution that covers a broad spectrum of functional requirements for the retail, private, corporate, universal, and Islamic banking and microfinance sectors. T24 provides a single, real-time view of client computers across the entire enterprise, making it possible for banks to maximize returns while also streamlining costs.

Microsoft SQL Server 2012 data management software provides an ideal data management framework for T24. With this foundation, T24 customers can experience faster funds transfers, higher security-trades volumes, and quicker close-of-business processes. They can also benefit from open, state-of-the-art technologies to accelerate innovation, which helps to greatly increase the speed and effectiveness with which new products and services are created.

As part of their strategic alliance, Microsoft and Temenos worked together to define a recommended deployment architecture that provides high availability and disaster recovery (HADR) for T24 running on the Microsoft Application Platform and using Microsoft technologies. This joint effort was conducted in the Temenos Hemel Hempstead lab. One of the main drivers for developing the architecture/topology was to reduce the cost of Microsoft software licenses and the use of specialized hardware (such as load balancers) to minimize the total cost of ownership (TCO). Therefore, the recommended software topologies can be customized to meet customers' needs.

The following considerations apply to the recommended architecture:
• The SQL Server 2012 Availability Group feature, part of the AlwaysOn technology set, was selected instead of storage area network (SAN)-level synchronous storage replication to avoid the cost of an additional SAN device and the licensing cost for SAN replication software.
• A SQL Server 2012 Failover Cluster Instance (FCI) was adopted for the primary site instead of two standalone instances to reduce licensing cost, minimize management and performance overhead, and make it easier to reuse an existing deployment based on a typical Windows Server Failover Clustering (WSFC) configuration.
• The Network Load Balancing (NLB) feature of Windows Server 2008 R2 was chosen to eliminate the need for an expensive hardware load balancer device in front of the JBoss servers.
• The NLB feature of Windows Server 2008 R2 was also chosen in front of the T24 servers because it provides better load balancing performance than the native T24 capabilities.
• Two cluster nodes in the primary site with shared SAN storage were used to provide high availability for the T24 application file share.


The implementation and requirements of HADR solutions can vary based on a variety of factors, including service level agreements (SLAs), cost, number of sites, and network infrastructure. Therefore, the requirements of individual HADR solutions need to be determined on a case-by-case basis for each deployment.

Alternatives to the Recommended Architecture


The architecture proposed in this white paper is not the only one possible using SQL Server 2012 AlwaysOn features, but it has been thoroughly tested. Possible alternatives to the recommended architecture include the following:

• Use two standalone SQL Server 2012 instances (in an AlwaysOn Availability Group) instead of a single SQL Server 2012 Failover Cluster Instance. This lets you avoid using shared SAN storage for the cluster nodes in the primary site.
  o If you are using an availability group, all nodes must still be part of a cluster, and a standalone SQL Server 2012 instance must still be installed on each node. The cost savings with this alternative come from eliminating the need for shared storage.
  o To ensure that there is no local data loss if there is a local failover between instances in the primary site, the two standalone SQL Server 2012 instances, along with the one (or more) in the disaster recovery site, must be configured for synchronous replication.
  o In this configuration, automatic failover can be provided by the AlwaysOn Availability Group feature, but extra care must be taken to avoid unwanted failover to the remote disaster recovery site.

• Use an existing highly available network storage for the cluster file share witness. Used in combination with the previous option, a highly available network storage for the cluster file share witness can render the installation of a Windows Server Failover Cluster unnecessary.
  o NOTE Distributed File System Replication (DFS-R) can be used to replicate files from the primary site to the disaster recovery site with a less frequent schedule. However, using DFS-R with continuous replication to local folders as a way to avoid a clustered file share is not recommended because of the possible performance impact.

• Use an additional node in the disaster recovery site with shared SAN storage between the nodes. With this alternative, a second SQL Server 2012 FCI can be used, providing high availability at the level of the disaster recovery site as well.
  o This second instance must be installed only on the nodes in the disaster recovery site.
  o This instance is distinct from the one used in the primary site.
  o This instance should be configured for synchronous replication in the availability group.
  o The shared SAN storage between the nodes in the disaster recovery site is not linked or replicated to the shared storage between the nodes in the primary site.

IMPORTANT In the proposed scenario, the minimum number of servers has been used in the disaster recovery site to reduce costs. This means that in the case of a complete primary site disaster, the disaster recovery site will operate in an exposed configuration that is not highly available. For this reason, it is highly recommended that you recover the primary site as soon as possible or use an additional node in the disaster recovery site with shared SAN storage between the nodes, as mentioned previously.

Additional SQL Server 2012 HADR Capabilities for Future Consideration


Note that the following SQL Server 2012 HADR capabilities were not tested prior to publication of this white paper because of time, resource, and configuration constraints. They should be considered future enhancements to the recommended architecture, and should be tested in custom deployments and/or lab testing sessions:

• Readable secondary for Availability Group replicated databases
  This feature presents no theoretical risks and could be used to better utilize hardware resources in the disaster recovery site (including read-only queries, reporting, backups, and integrity checks), but T24 would need to be modified to take advantage of this capability (for read-only queries only). The following links provide more information:
  o Active Secondaries: Readable Secondary Replicas (http://msdn.microsoft.com/en-us/library/ff878253.aspx)
  o Configure Read-Only Access on an Availability Replica (http://msdn.microsoft.com/en-us/library/hh213002.aspx)
  NOTE In the recommended configuration, read-only access is not enabled on the secondary replicas for the availability group replicated databases, but it can be easily activated with no downtime.

• Availability Group Read-Only Routing and Application Intent
  These features cannot be used because they require the SQL Server 2012 Native Open Database Connectivity (ODBC) client to be installed on the T24 servers. As a future enhancement, this version of the client should be tested for T24 use. The following links provide more information:
  o Configure Read-Only Routing for an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh710054.aspx)
  o Client Connection Access to Availability Replicas (SQL Server) (http://msdn.microsoft.com/en-us/library/hh510184.aspx)

• Multi-subnet failover clustering
  Windows Server 2008 R2 and SQL Server 2012 support this type of configuration, but it has not been tested as a way to reduce the downtime caused by Domain Name System (DNS) replication latency. The following links provide more information:
  o SQL Server Multi-Subnet Clustering (http://msdn.microsoft.com/en-us/library/ff878716.aspx)
  o SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance (http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sql-server-2012alwayson_3a00_-multisite-failover-cluster-instance.aspx)

• Flexible failover policy
  SQL Server 2012 introduces a new health detection mechanism for clustered installations. It can be tuned so that Windows Server Failover Clustering is more sensitive to possible SQL Server 2012 health problems. The following links provide more information:
  o Failover Policy for Failover Cluster Instances (http://msdn.microsoft.com/en-us/library/ff878664.aspx)
  o Configure FailureConditionLevel Property Settings (http://msdn.microsoft.com/en-us/library/ff878667.aspx)

Document Scope
The following are considered in the scope of this white paper:
• This document applies to T24 R11 and R12 (Temenos Application Framework C) with T24Browser as a channel.
• This document focuses only on HADR functionality.
• The document applies to the following software:
  o Windows Server 2008 R2 with Service Pack 1 (SP1)
  o Windows Server 2008 R2 Network Load Balancing (NLB)
  o Windows Server 2008 R2 clustering
  o Windows Server 2008 R2 clustered file share
  o Windows Server 2008 R2 Distributed File System (DFS) Replication
  o SQL Server 2012 AlwaysOn Availability Group
  o Windows Server 2008 R2 domain controller
  o JBoss 5.1.0 GA

The following are considered out of the scope of this white paper:
• Performance tuning recommendations.
• T24 channels other than T24Browser, such as TWS.NET, TOCF.NET, and BizTalk Adapter.
• Administration and monitoring of the software.
• Hardware configurations, such as RAID and network adapter teaming.
• Security.
• Local area network (LAN)/wide area network (WAN) configurations and recommendations.

Technical Overview of TEMENOS T24


The various components of a T24-based solution are shown in Figure 1.
[Figure 1 image not reproduced. Labels recoverable from the diagram: layers Channels, Connectivity, Application, Management, and Security; channels T24 Browser, TWS.NET, ARC IB, ARC Mobile, TOCF.NET, TWS (EE), and TOCF (EE); application components TAFC Agent, T24, TAFC, DCD (Database Driver), T24 Monitor, and Message Queue; module labels FX, EB, AA, DX, and AC; platform components Windows Server 2008 R2, Internet Information Services (IIS) 7.5, Windows Server 2008 R2 Active Directory, and SQL Server 2012.]

Figure 1. T24 logical component view

Table 1 provides a description of the components. Note that the HADR solution recommended in this white paper focuses on T24 with T24Browser as a channel.


Table 1. Component descriptions

Component | Description
T24 Agent | T24 Agent is a server-side jBASE component that is responsible for accepting and processing incoming client requests. Communication is established via TCP socket connections and by means of a well-defined protocol. T24 Agent is a socket server listening on a user-defined TCP port, and has the capability to serve a wide range of client applications as long as they speak the same protocol.
T24 | T24 is the banking business logic written by using jBC, which is used to generate C / C++ code.
TAFC | The Temenos Application Framework C (TAFC) version provides additional runtime services that are currently not available in jBC.
Database Driver | Direct Connect Driver (DCD) is the T24 data abstraction layer that decouples T24 business logic from the underlying data storage/structure.
T24 Monitor | T24 Monitor is a Java Management Extensions (JMX) and web-based online monitoring tool for T24, offering real-time statistics, as well as historical views of a particular T24 system.
Message Queue | Message Queue is an optional middleware infrastructure that lets T24 use message-driven communication with the channel layer.
Database | The jBASE or vendor-provided relational database management system (RDBMS); currently supported platforms are Oracle, Microsoft SQL Server, and IBM DB2.

SQL Server AlwaysOn


SQL Server AlwaysOn is a new integrated, flexible, and cost-efficient HADR solution. AlwaysOn can provide data and hardware redundancy within and across data centers, and it can improve application failover time to increase the availability of mission-critical applications. AlwaysOn is flexible and lets you reuse existing hardware investments. A solution using AlwaysOn can take advantage of two major SQL Server 2012 features for configuring availability at both the database and the instance level:


• AlwaysOn Availability Groups
  AlwaysOn Availability Groups are new in SQL Server 2012. They greatly enhance the capabilities of database mirroring, help ensure availability of application databases, and enable zero data loss through log-based data movement for data protection without shared disks. Availability groups provide an integrated set of options, including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
• AlwaysOn Failover Cluster Instances (FCIs)
  FCIs enhance the SQL Server failover clustering feature and support multi-site clustering across subnets, which enables cross-data-center failover of SQL Server instances. Faster and more predictable instance failover is another key benefit that enables faster application recovery.

Recovery Objectives
Data redundancy is a key component of a high-availability database solution. Transactional activity on your primary SQL Server instance is synchronously or asynchronously applied to one or more secondary instances. When an outage occurs, transactions that were in flight might be rolled back, or they might be lost on the secondary instances because of delays in data propagation. You can measure the impact and set recovery goals in terms of how long it takes to get back in business and how much latency there is in the last transaction recovered:

• Recovery Time Objective (RTO)
  The RTO is the duration of the outage. The initial goal is to get the system back online in at least a read-only capacity to facilitate investigation of the failure. However, the primary goal is to restore full service to the point that new transactions can take place.
• Recovery Point Objective (RPO)
  The RPO is often referred to as a measure of acceptable data loss. It is the time gap or latency between the last committed data transaction before the failure and the most recent data recovered after the failure. The actual data loss can vary depending on the workload on the system at the time of the failure, the type of failure, and the type of high availability solution used.

You should use RTO and RPO values as goals that indicate business tolerance for downtime and acceptable data loss, and as metrics for monitoring availability health. The business goals for RTO and RPO should be key drivers in selecting a SQL Server technology for your high-availability and disaster-recovery solution. Table 2 offers a rough comparison of the type of results that those different solutions may achieve.
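The two definitions above reduce to simple timestamp arithmetic. The following Python fragment is a minimal sketch, not part of the tested solution: the function name `recovery_metrics` and the sample timestamps are hypothetical and only illustrate how observed values could be compared against the RTO and RPO goals after an outage or a failover drill.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_commit_before_failure, last_commit_recovered,
                     failure_time, service_restored_time):
    """Return (rpo, rto) as timedelta values for a single outage."""
    # RPO: gap between the last transaction committed before the failure
    # and the most recent transaction still present after recovery.
    rpo = last_commit_before_failure - last_commit_recovered
    # RTO: duration of the outage, from failure to restored service.
    rto = service_restored_time - failure_time
    return rpo, rto


# Hypothetical outage: three seconds of committed work did not reach the
# secondary, and full service returned two minutes after the failure.
failure = datetime(2012, 5, 1, 10, 0, 0)
rpo, rto = recovery_metrics(
    last_commit_before_failure=failure - timedelta(seconds=1),
    last_commit_recovered=failure - timedelta(seconds=4),
    failure_time=failure,
    service_restored_time=failure + timedelta(minutes=2),
)
print(rpo.total_seconds(), rto.total_seconds())
```

Recording both values for every failover test gives a concrete baseline for judging whether a given HADR option meets the business goals.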


Table 2. Comparison of SQL Server HADR solutions

SQL Server HADR Solution | Potential Data Loss (RPO) | Potential Recovery Time (RTO) | Automatic Failover | Readable Secondaries [1]
AlwaysOn Availability Group (synchronous-commit) | Zero | Seconds | Yes [2] | 0-2
AlwaysOn Availability Group (asynchronous-commit) | Seconds | Minutes | No | 0-4
AlwaysOn Failover Cluster Instance | NA [3] | Seconds-to-minutes | Yes | NA
Database Mirroring [4], high-safety (sync + witness) | Zero | Seconds | Yes | NA
Database Mirroring [4], high-performance (async) | Seconds [5] | Minutes [5] | No | NA
Log Shipping | Minutes [5] | Minutes-to-hours [5] | No | Not during a restore
Backup, Copy, Restore [6] | Hours [5] | Hours-to-days [5] | No | Not during a restore

Fault Tolerance and Disaster Recovery Architecture


SQL Server AlwaysOn solutions help provide fault tolerance and disaster recovery across several logical and physical layers of infrastructure and application components. Historically, it has been common practice to separate duties and responsibilities for the various audiences and roles involved, so that each was predominately concerned with only a portion of those solution layers. This section describes each of those layers and offers guidance for your design discussions and implementation decisions. A successful SQL Server AlwaysOn solution requires understanding and collaboration across these solution layers:

Notes to Table 2:
[1] An AlwaysOn Availability Group can have no more than a total of four secondary replicas, regardless of type.
[2] Automatic failover of an Availability Group is not supported to or from a failover cluster instance.
[3] The FCI itself does not provide data protection; data loss is dependent upon the storage system implementation.
[4] This feature will be removed in future versions of Microsoft SQL Server. Use AlwaysOn Availability Groups instead.
[5] This is highly dependent upon the workload, data volume, and failover procedures.
[6] Backup, Copy, Restore is appropriate for disaster recovery, but not for high availability.


• Infrastructure level
  Server-level fault-tolerance and intra-node network communication use Windows Server Failover Clustering (WSFC) features for health monitoring and failover coordination.
• SQL Server instance level
  A SQL Server AlwaysOn Failover Cluster Instance (FCI) is a SQL Server instance that is installed across and can fail over to server nodes in a WSFC cluster. The nodes that host the FCI are attached to robust symmetric shared storage (SAN or SMB).
• Database level
  An availability group is a set of user databases that fail over together. An availability group consists of a primary replica and one to four secondary replicas. Each replica is hosted by an instance of SQL Server (FCI or non-FCI) on a different node of the WSFC cluster.
• Client connectivity
  Database client applications can connect directly to a SQL Server instance network name, or they may connect to a virtual network name (VNN) that is bound to an availability group listener. The VNN abstracts the WSFC cluster and Availability Group topology, logically redirecting connection requests to the appropriate SQL Server instance and database replica.
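Two of the constraints stated in this section (one to four secondary replicas, and no automatic failover to or from an FCI) can be checked mechanically before a topology is built. The following Python sketch is illustrative only: the `Replica` class and `validate_availability_group` function are hypothetical names, the FCI rule is simplified to a per-replica flag, and the replica names SQL11HA and SQL11DR are the instance names used in the test environment for this paper.

```python
from dataclasses import dataclass

MAX_SECONDARIES = 4  # limit stated in this paper for SQL Server 2012


@dataclass
class Replica:
    name: str
    is_fci: bool = False              # hosted by a Failover Cluster Instance
    automatic_failover: bool = False  # requested failover mode


def validate_availability_group(primary, secondaries):
    """Return a list of rule violations; an empty list means the layout passes."""
    problems = []
    if not 1 <= len(secondaries) <= MAX_SECONDARIES:
        problems.append("an availability group needs one to four secondary replicas")
    for replica in [primary, *secondaries]:
        # Simplified form of the rule: an FCI-hosted replica cannot take
        # part in automatic failover of the availability group.
        if replica.is_fci and replica.automatic_failover:
            problems.append(
                f"{replica.name}: automatic failover is not supported to or from an FCI"
            )
    return problems


# Layout described in this paper: FCI in the primary site, one standalone
# instance in the disaster recovery site, manual failover between them.
recommended = validate_availability_group(
    Replica("SQL11HA", is_fci=True),
    [Replica("SQL11DR")],
)
print(recommended)
```

A check of this kind is most useful when evaluating the alternative topologies, where the number of replicas and the failover modes differ from the tested configuration.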

Figure 2 shows a logical topology of a representative AlwaysOn solution.

Figure 2. Logical representation of an AlwaysOn solution


High Availability and Disaster Recovery Solution


The recommended HADR solution for T24 deployments was designed based on the following goals:
• Incurring zero data loss when failing over to the disaster recovery site, assuming that there is a compatible network connection between the sites that is capable of synchronous data replication.
• Reducing the cost of Microsoft software licenses and specialized hardware (such as load balancers) to minimize the total cost of ownership.
• Maximizing use of any Windows Server 2008 R2 features and capabilities that complement T24.

The following decisions were made in the solution design. Refer to Figure 3 for further information.
• The disaster recovery site used for testing had only one server for each tier. If the disaster recovery site also requires high availability, the configuration used in the primary site should be used for the disaster recovery site.
• The Windows Server 2008 R2 NLB feature is used to load balance the traffic to the JBoss application servers in the primary site. The same feature can be used for the disaster recovery site if there will be two or more disaster recovery nodes.
• A DNS host record was created for the web-tier NLB IP to make the failover to the disaster recovery site transparent to the users (for example, T24Browser.CoE.Temenos.com).
• T24Browser is a stateful application that is normally deployed with a sticky-session configuration. Although this configuration provides the required functionality, it reduces the scalability of the T24 web tier. It also reduces availability, because the user might lose the session if an application server goes down. The solution presented in this white paper eliminates these limitations by removing sticky sessions. This is achieved by persisting the JBoss session state in the SQL Server database and configuring NLB with Affinity: None.
• Using NLB and a DNS host record and avoiding the use of sticky sessions lets you add or remove web-tier servers transparently, without affecting users.
• T24Browser is capable of performing simple load balancing among the available T24 application servers when a load balancing solution is not available in the application tier. This feature is disabled in the recommended solution, and NLB is used instead with the Affinity: None configuration to achieve the best possible load balancing.
• A DNS host record was created for the application-tier NLB IP so that you have the option of failing over only the application tier to the disaster recovery site if necessary (for example, T24Server.CoE.Temenos.com). This is an optional configuration that is only required if a facility needs to simplify server maintenance and keep the T24Browser configurations identical in both sites. However, this option does create an additional step in the disaster recovery procedures.

• Using the NLB Affinity: None configuration makes it possible to add or remove application-tier servers transparently, without affecting online transactions.
• The SQL Server 2012 HADR AlwaysOn (HADRON) configuration with a SQL Server 2012 Failover Cluster Instance for the primary site is used to reduce the number of required SQL Server 2012 licenses. The primary site can have two standalone instances of SQL Server 2012 instead of the failover cluster instance if you need to remove the shared storage; however, this will require licenses for each SQL Server 2012 instance, while the failover cluster instance requires only one license regardless of the number of nodes in the cluster.

• The disaster recovery instance of SQL Server 2012 is configured as a SQL Server 2012 HADRON synchronous AlwaysOn replica for zero data loss. Synchronous replication requires a fast and stable network connection in order to work as expected. This needs to be taken into account when setting up the network. If you do not have a fast and stable network connection, implement asynchronous replication instead, but understand that asynchronous replication does have a possibility of data loss.

• The same Windows Server Failover Cluster that hosts the SQL Server 2012 clustered instance is used to host a clustered file share that holds the T24 shared files and folders. The clustered file share increases the availability of the T24 shared files and folders.
• The disaster recovery site has a local folder for T24 shared files/folders. Windows Server 2008 R2 Distributed File System Replication (DFS-R) is implemented with an Active Directory Domain Services (AD DS)-published namespace to make the file share failover to the disaster recovery site transparent and to replicate T24 shared files/folders.
• Making the T24 shared files available in the disaster recovery site is not mandatory because T24 can recover without them. However, having the T24 shared files available has a positive impact. Therefore, DFS-R is scheduled to run several times per day, rather than continuously, to reduce the overhead of the replication.

• T24 typically accesses shared files and folders via a mapped drive letter in each T24 server. Because accidentally removing or changing the mapped drive letter can cause failures, file and folder symbolic links were created by using the Windows mklink utility and are used instead of the mapped drive letters. Symbolic links make the shared files and folders appear as local entities, and therefore T24 can access them directly.
• A JBoss session persistence database was created in the same SQL Server 2012 HADRON configuration as the T24 database, so it has the same high availability and disaster recovery capabilities. This makes management easier and reduces the steps in disaster recovery procedures. You can, however, implement the JBoss session persistence database as a different instance, if required.
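The reason that removing sticky sessions improves availability can be shown with a small model. The following Python sketch is an illustration under stated assumptions, not the NLB or JBoss implementation: `SharedSessionStore` stands in for the session persistence database, `WebServer` for a web-tier node, and the round-robin `dispatch` function is a simplification of how NLB spreads requests with Affinity: None. All three names are hypothetical.

```python
import itertools


class SharedSessionStore:
    """Stands in for the session persistence database shared by all servers."""

    def __init__(self):
        self._data = {}

    def load(self, session_id):
        return self._data.setdefault(session_id, {"requests": 0})

    def save(self, session_id, state):
        self._data[session_id] = state


class WebServer:
    """A web-tier node that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def handle(self, session_id):
        state = self.store.load(session_id)
        state["requests"] += 1
        self.store.save(session_id, state)
        return self.name, state["requests"]


def dispatch(servers, session_ids):
    """Spread requests across servers without regard to session ownership."""
    ring = itertools.cycle(servers)
    return [next(ring).handle(session_id) for session_id in session_ids]


store = SharedSessionStore()
web1, web2 = WebServer("web1", store), WebServer("web2", store)

# Three requests from one user land on alternating servers.
first = dispatch([web1, web2], ["user-a"] * 3)
# web2 is taken out of rotation; the session continues on web1.
second = dispatch([web1], ["user-a"])
print(first, second)
```

Because every server reads and writes the same store, the request counter keeps increasing regardless of which node serves the request or whether a node has been removed, which is the behavior the design decisions above rely on.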


Figure 3. HADR solution


Setup and Configuration


This section describes how to configure the HADR solution.

SQL Server 2012 HADR Configuration


SQL Server 2012 HADR is configured with a clustered instance for the primary site and a standalone instance in the disaster recovery site. The configuration uses the AlwaysOn Availability Group to replicate database content and to provide transparent failover. The disaster recovery instance is configured as a synchronous replica for zero data loss. Figure 4 shows a schematic of the solution.

Figure 4. SQL Server 2012 HADR solution
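The difference between a synchronous and an asynchronous replica, in terms of data loss at failover, can be sketched with a toy model. The following Python fragment is a simplified illustration and not a description of SQL Server internals: `LogReplica`, `commit`, and `run` are hypothetical names, and log shipping is reduced to appending to a list.

```python
class LogReplica:
    """A replica represented only by the transactions in its log."""

    def __init__(self):
        self.log = []


def commit(primary, secondary, txn, synchronous, pending):
    """Record txn on the primary; return when the client would be acknowledged."""
    primary.log.append(txn)
    if synchronous:
        # Synchronous commit: the secondary has the record before the
        # client is told that the transaction committed.
        secondary.log.append(txn)
    else:
        # Asynchronous commit: acknowledge now, ship the record later.
        pending.append(txn)


def run(synchronous):
    """Commit three transactions, lose the primary site, report what is missing."""
    primary, secondary, pending = LogReplica(), LogReplica(), []
    acknowledged = []
    for txn in ["t1", "t2", "t3"]:
        commit(primary, secondary, txn, synchronous, pending)
        acknowledged.append(txn)
    # The primary site fails before any pending record is shipped.
    return [txn for txn in acknowledged if txn not in secondary.log]


print(run(synchronous=True))
print(run(synchronous=False))
```

In the synchronous run nothing that was acknowledged is missing on the secondary. In the asynchronous run every acknowledged transaction that was still pending is lost, which is why the synchronous option depends on a fast and stable link between the sites.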

The Windows Server Failover Cluster consists of three nodes: two nodes in the primary site and one node in the disaster recovery site, with a SAN shared only between the two nodes in the primary site. The disaster recovery instance has only local storage, to which the database content is replicated by using the availability group. The cost of the solution is reduced because there is no shared storage between the nodes in the primary site and the node in the disaster recovery site, there is no SAN in the secondary site, and no expensive storage-level synchronization mechanism is needed to replicate disk content.

A clustered SQL Server 2012 instance is primarily used to reduce the number of SQL Server 2012 licenses that are required. The primary site could have two standalone instances of SQL Server 2012 instead of the failover cluster instance if this is required to remove the shared storage; however, this option requires licenses for each SQL Server 2012 instance, while the failover cluster instance requires only one license regardless of the number of nodes in the cluster.


If the disaster recovery site also requires high availability, the same configuration used in the primary site needs to be available in the disaster recovery site. When the recommended solution was tested, all of the SQL Server instances were created as named instances to make them easy to identify during maintenance and monitoring. Table 3 lists the names that were used in the test environment during setup; these names can be used as a reference guideline.
Table 3. Names of SQL Server instances

Name | Description
SQL11HA | SQL Server 2012 instance name of the primary site. Because a named instance uses a dynamic TCP port by default, static TCP port 1533 was configured via SQL Server Configuration Manager.
SQL11DR | SQL Server 2012 instance name of the disaster recovery site. Because a named instance uses a dynamic TCP port by default, static TCP port 1533 was configured via SQL Server Configuration Manager.
T24AG | SQL Server 2012 AlwaysOn Availability Group name. This name is not used by T24; it is used in SQL Server Management Studio when a failover to the disaster recovery instance is required. The JBoss session persistence database was added to the same availability group in the test environment. This makes management easier, and disaster recovery failover becomes a single process for both databases.
T24AgListener | SQL Server 2012 AlwaysOn Availability Group listener name. This is the name T24 uses to connect to the SQL Server 2012 HADR instance. When the listener was created, 1433 (the SQL Server default port) was used as the TCP port number to avoid having to change the T24 connection parameters to use a different port number.

Windows Server Firewall Configurations


The Windows Server Firewall is on by default; therefore, you need to create relevant inbound firewall exceptions in the servers for the configuration to work as expected. Table 4 shows the inbound firewall rules that need to be created in all the database servers.


Table 4. Firewall rules

Name | Description
SQL11 (1533) | Inbound firewall exception rule for TCP port 1533, which is the static port configured for the SQL Server instance.
SQL11 Browser (1434) | Inbound firewall exception rule for UDP port 1434, which is required for the SQL Server Browser when named instances exist.
SQL11 AG (5022) | Inbound firewall exception rule for TCP port 5022, which is required for the SQL Server 2012 AlwaysOn Availability Group.
SQL11 AG Listener (1433) | Inbound firewall exception rule for TCP port 1433, which is configured for the SQL Server 2012 AlwaysOn Availability Group listener.
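
The rules in Table 4 can also be scripted instead of being created by hand. The following Python sketch only builds `netsh advfirewall firewall add rule` command lines for the four rules and does not execute them. The rule names and ports come from Table 4; the helper name `build_firewall_commands` is illustrative and not part of the tested solution.

```python
# Rules from Table 4 as (rule name, protocol, local port).
FIREWALL_RULES = [
    ("SQL11 (1533)", "TCP", 1533),
    ("SQL11 Browser (1434)", "UDP", 1434),
    ("SQL11 AG (5022)", "TCP", 5022),
    ("SQL11 AG Listener (1433)", "TCP", 1433),
]


def build_firewall_commands(rules=FIREWALL_RULES):
    """Return one netsh command line per inbound allow rule."""
    commands = []
    for name, protocol, port in rules:
        commands.append(
            "netsh advfirewall firewall add rule "
            f'name="{name}" dir=in action=allow '
            f"protocol={protocol} localport={port}"
        )
    return commands


if __name__ == "__main__":
    # Print the commands for review before running them on each database server.
    for command in build_firewall_commands():
        print(command)
```

The generated lines would need to be run from an elevated command prompt on every database server in the cluster.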

T24 File Share Configuration


In the multi-server configuration, T24 requires a shared location for its working files and folders. Any single file is created or written by only one T24 instance and is read by all instances. There is no concern about file write locks; however, the share needs to be resilient for the multi-server configuration to function properly.

If T24 fails over to the disaster recovery site, making the T24 shared files available in the disaster recovery site is not mandatory because T24 can recover without them. However, having the shared files available does have a positive impact. A resilient file share solution with infrequent (once or twice a day) file replication to the disaster recovery site is therefore a good solution. Windows Server Clustered File Server, in conjunction with DFS-R, provides an optimal solution and does not require any additional licenses.

For simplicity, an Active Directory Domain Services (AD DS)-published DFS Namespace is used to refer to the shared file folder. T24 can therefore refer to the same path (namespace) for shared files, whether it is in the primary site or in the disaster recovery site. Figure 5 shows the T24 file share and file replication configuration.


Figure 5. File share and file replication

Windows Server Clustered File Share Configuration


The recommended SQL Server 2012 HADR configuration uses a Windows Server Cluster. Using the same cluster to host the file server reduces the complexity of the solution and simplifies management and monitoring. Since only the primary site servers in the cluster have access to the shared storage, the only possible owners of the file server are the servers in the primary site. The file server, therefore, does not fail over to the disaster recovery site, and the disaster recovery instance of T24 will only have access to its local folders. A shared folder called T24FileShare was created in the file server and used as the resilient file share location of the primary site.


If the disaster recovery site also uses a T24 multi-server configuration, the same type of file share needs to be created in the disaster recovery site. However, because the test environment had only a single T24 instance, a local folder was created with the same shared folder name.

Distributed File System Replication Configuration


DFS-R was used to periodically replicate T24 shared files between the primary site and the disaster recovery site. The replication frequency was set to the lowest possible (once or twice per day) to avoid any performance implications, and because having the shared files available in the disaster recovery site is not mandatory for T24. The disaster recovery site of the test environment had a single instance of T24; therefore, the folder for the shared files was created locally on the same server. DFS replication was set up to replicate the files between the clustered file share in the primary site and the local folder on the T24 disaster recovery instance.

Active Directory Domain Services DNS Configuration


To make the web-tier failover transparent to the users, you must have a DNS host record that users can reference to reach T24Browser instead of the load balancer IP address. Failover to disaster recovery then only requires changing the IP address of the DNS host record, and users do not need to use a different URL. In the test environment, the DNS host record T24Browser.CoE.Temenos.com was created for the web-tier Network Load Balancing IP address.

You can also create a DNS host record for the application-tier servers if you need to fail over the application tier transparently and independently of the web tier. Note that this is an optional configuration that is helpful if you need to ease server maintenance and keep the T24Browser configurations identical in both sites. However, this configuration does add a step to the disaster recovery procedures. The DNS host record T24Server.CoE.Temenos.com was created for the application-tier NLB IP address in the test environment.

One drawback of using DNS host records is that the client application using the name caches the IP address. Therefore, even if the IP address of the DNS host record is changed on the server side in a disaster recovery failover, the client application might still use the old IP address, and this old IP address might no longer be available. To minimize the chance of this happening, the time to live (TTL) value of the DNS host record needs to be adjusted. In the test environment, the TTL value was set to one minute, which means that a client application cached the IP address for at most one minute before checking the DNS host record with the server again.


While shorter TTL values can increase the load on the DNS server, they can be useful with critical services like web servers, application servers, and load balancers. TTL values are often reduced by the DNS administrator before service is moved to minimize disruptions. Table 5 shows the DNS host records that were created in the test environment.
Table 5. DNS host records

DNS Host Record | Description
T24Browser.CoE.Temenos.com | The Domain Name System (DNS) host record of the T24 web-tier load balancer that was used in the web browser URL to connect to T24Browser. The TTL value was set to one minute for testing.
T24Server.CoE.Temenos.com | An optional DNS host record created for the T24 application-tier load balancer to test transparent failover of the application tier independently of the web tier. This was used by T24Browser (configured in t24-ds.xml) to connect to the load balancer in the test environment. The TTL value was set to one minute for testing.
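
The effect of the TTL on failover can be illustrated with a toy model. The following Python sketch simulates a client-side cache that honors a 60-second TTL; it is not how any particular resolver is implemented. The class name `TtlCache` and the IP addresses (taken from the reserved documentation ranges) are assumptions made for illustration.

```python
class TtlCache:
    """Toy client-side DNS cache that honors a record's TTL."""

    def __init__(self, lookup, ttl_seconds):
        self._lookup = lookup      # function: name -> IP currently held by the DNS server
        self._ttl = ttl_seconds
        self._entries = {}         # name -> (ip, time at which the entry expires)

    def resolve(self, name, now):
        cached = self._entries.get(name)
        if cached is not None and now < cached[1]:
            return cached[0]       # still within the TTL: the server is not asked
        ip = self._lookup(name)
        self._entries[name] = (ip, now + self._ttl)
        return ip


# Hypothetical addresses; the real NLB cluster addresses are site-specific.
server_records = {"T24Browser.CoE.Temenos.com": "192.0.2.10"}
cache = TtlCache(server_records.__getitem__, ttl_seconds=60)

first = cache.resolve("T24Browser.CoE.Temenos.com", now=0)
server_records["T24Browser.CoE.Temenos.com"] = "198.51.100.10"  # record switched to DR
stale = cache.resolve("T24Browser.CoE.Temenos.com", now=30)     # cached entry still valid
fresh = cache.resolve("T24Browser.CoE.Temenos.com", now=61)     # TTL expired, server re-queried
```

In this model, a client that resolved the name just before the switch keeps using the primary-site address for up to one TTL, which is why the TTL bounds how long a client can keep using the old address after a failover.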

Application-Tier NLB Configuration


T24Browser is capable of performing simple load balancing among the available T24 application servers when a load balancing solution is not available in the application tier. However, specialized load balancing solutions can provide better load balancing capabilities. The NLB feature in Windows Server is a software load balancing solution that does not require additional licenses and complements T24 by providing a specialized load balancing solution. In the recommended solution, the NLB feature in Windows Server is enabled and configured on the T24 application servers in the primary site, and an NLB cluster consisting of the two servers is created. Figure 6 shows the application-tier NLB cluster.


Figure 6. Application-tier NLB cluster

If the disaster recovery site has multiple T24 application servers, an NLB cluster needs to be configured in those servers as well. Table 6 shows the NLB configurations used.
Table 6. NLB configurations

Configuration | Description
Cluster operation mode | Multicast operation mode was used to keep the network adapter's built-in media access control (MAC) address. This was because the test servers had only one network adapter, and this network adapter had to be used for server management as well. If the server has multiple network adapters, the cluster operation mode can be set to Unicast.
Protocol | The protocol used for communication with T24 was TCP/IP.
Port range | The port range was limited to 20002, which is the T24 agent port configuration.
Filtering mode | Affinity: None was selected to achieve the best possible load balancing.

The simple load balancing feature of T24Browser was disabled, and the NLB cluster name (T24Server.CoE.Temenos.com) was used as the T24 instance. This lets NLB route the connections to the T24 instances in the cluster.


Using NLB with the Affinity: None configuration lets you add or remove application-tier servers transparently, without affecting online transactions.

T24 Application Server Configuration


The T24 application tier is configured with two T24 instances (nodes: App Node 1 and App Node 2) in the primary site and a single instance (node: App Node 3) in the disaster recovery site. Note that it is possible to have multiple T24 instances (application server nodes) in the disaster recovery site if high availability is a requirement for the disaster recovery site. The Windows Server 2008 R2 NLB feature was used to balance the load across the T24 application servers. The HADR solution for the T24 file share is implemented by using a Windows Server 2008 R2 clustered file share and DFS-R. Figure 7 shows the T24 application tier configuration.

Figure 7. T24 application tier

The Temenos Application Framework C (TAFC) is the execution environment for the T24 application. Install TAFC and the T24 application on all application servers (for installation guidance, contact Temenos).


Following is a description of how the T24 application servers were configured:
- All the T24 instances in the test environment used multiple server configurations with the required licenses. To use one instance of T24 on multiple servers, install the multiple application server module.
- When using multiple application servers, define port ranges for each T24 application server to avoid conflicts or deadlock situations during close of business. Ports can be assigned by using the following variable on each application server:
    JBCPORTNO= {port range}
- The same jbase_agent port must be used on all T24 application servers. The default jbase_agent port 20002 was used in the test environment. The same port must be used because requests to the T24 servers are controlled by the load balancer, and therefore T24Browser sees only a single instance of T24 (the load balancing cluster name), regardless of the number of T24 application servers available.
- An inbound Windows firewall exception rule for TCP port 20002 was created to make the jbase_agent port accessible from T24Browser.
- The T24 database driver (Direct Connect Driver [DCD]) requires the SQL Server client to be installed on the server. At the time of testing, the DCD for the SQL Server 2012 Native Client was still in development. For this reason, the SQL Server 2008 R2 Native Client was used.
- Because the SQL Server 2012 HADR configuration is used for the database tier, the T24 database must be accessed via the SQL Server 2012 AlwaysOn Availability Group. Therefore, the availability group listener name was used in the T24 configuration instead of the database server IP address.
    File jedi_config, Record 'XMLMSSQL_FRMWRK'
    Command->
    0001 R12.100203                             Direct connect driver version
    0002 T24AgListener]T24R12                   DB server name]DB name
    0003 T24User]uHdE9oJj8B5Y0cUF0hGh0A==]      DB user/password (encrypted)
- Default database locking (SQL Server application lock) was used for the testing.
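
Because overlapping port ranges are the failure mode that the JBCPORTNO guidance above is meant to prevent, a planned assignment can be checked before it is applied. The following Python sketch assumes that each range is written as `start-end`, which is an assumption about the value format rather than something this paper specifies. The server names, port numbers, and function names are hypothetical.

```python
from itertools import combinations


def parse_range(text):
    """Parse 'start-end' into an inclusive (start, end) tuple."""
    start, end = (int(part) for part in text.split("-"))
    if start > end:
        raise ValueError(f"invalid range: {text}")
    return start, end


def find_overlaps(ranges_by_server):
    """Return pairs of server names whose port ranges share at least one port."""
    parsed = {name: parse_range(text) for name, text in ranges_by_server.items()}
    overlaps = []
    for (a, (a_lo, a_hi)), (b, (b_lo, b_hi)) in combinations(parsed.items(), 2):
        # Two inclusive ranges overlap when each one starts before the other ends.
        if a_lo <= b_hi and b_lo <= a_hi:
            overlaps.append((a, b))
    return overlaps


# Hypothetical assignments; the real ranges are site-specific.
good = {"AppNode1": "50000-50999", "AppNode2": "51000-51999", "AppNode3": "52000-52999"}
bad = {"AppNode1": "50000-51000", "AppNode2": "51000-51999"}

print(find_overlaps(good))
print(find_overlaps(bad))
```

An empty result means that no two servers share a port. In the second assignment both servers claim port 51000, so that pair is reported.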


Limitations of Using the SQL 2008 R2 Native Client


During the testing, the SQL Server 2008 R2 Native Client was used with the T24 database driver because the DCD did not support SQL Server 2012 client libraries. The following limitations therefore apply to the SQL Server 2012 HADR AlwaysOn functionalities:
- Read-only routing for the availability group is not available.
- Application intent is not available.
- Optimizations for fast multi-subnet failover clustering are not available.

When the SQL Server 2012 Native Client is certified for use with T24, the considerations for client availability features shown in Table 7 will apply.
Table 7. Client type considerations

Driver | Multi-subnet failover | Application intent | Read-only routing | Multi-subnet failover: faster single subnet endpoint failover | Multi-subnet failover: named instance resolution for SQL Server clustered instances
SQL Server Native Client 11.0 ODBC | Yes | Yes | Yes | Yes | Yes
SQL Server Native Client 11.0 OLE DB | No | Yes | Yes | No | No
ADO.NET with Microsoft .NET Framework 4.0 update 4.0.2* | Yes | Yes | Yes | Future date | Future date
ADO.NET with .NET Framework 3.5 | Future date | Future date | Future date | Future date | Future date
Microsoft Java Database Connectivity (JDBC) driver 4.0 for SQL Server | Yes | Yes | Yes | Yes | Future date

*ADO.NET with .NET Framework 4.0.2 patch download for connectivity improvement (http://support.microsoft.com/kb/2544514). For more information about connection string keywords, see: Using Connection String Keywords with SQL Server Native Client (http://msdn.microsoft.com/en-us/library/ms130822(v=sql.110).aspx).


T24 Shared Files


For T24 multiple server installation, it is necessary to share certain files and folders among T24 application servers. T24 typically accesses shared files and folders via a mapped drive letter in each T24 server. However, accidentally removing or changing the mapped drive letter can cause failures. Therefore, file and folder symbolic links were created by using the Windows mklink utility instead of mapped drive letters to avoid unintended mistakes. Symbolic links make the shared files and folders act as local entities, so T24 can directly access them. If there are additional folders/files that need to be shared, appropriate symbolic links should be created.
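
As an illustration of the symbolic link approach, the following Python sketch builds a `mklink` command line for a directory link. In `mklink`, the link path comes first and the target second, and `/D` requests a directory symbolic link. Both paths are hypothetical placeholders, and the helper name `mklink_command` is not part of T24 or Windows.

```python
def mklink_command(link_path, target_path, is_directory=True):
    """Build a Windows mklink command line (link first, then target)."""
    switch = "/D " if is_directory else ""
    return f'mklink {switch}"{link_path}" "{target_path}"'


# Hypothetical paths; substitute the real T24 folder and the DFS namespace path.
command = mklink_command(r"C:\T24\shared", r"\\example.local\T24\T24FileShare")
print(command)
```

Because `mklink` is a built-in command of the Windows command interpreter, the generated line would be run from a command prompt, typically an elevated one.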

Web-Tier NLB Configuration


When the web tier has multiple servers (nodes), there needs to be a mechanism to route the requests to the servers and to provide a single address to the requester (web browser), regardless of the number of servers in the tier. This functionality is typically provided by a proxy server and/or a load balancer, with redundancy to increase the availability of the service. The Network Load Balancing (NLB) feature of Windows Server does not have a single point of failure because the service works on the network layer of all the servers. Because it is a readily available feature in Windows Server, the NLB feature does not require additional licenses. The NLB feature is enabled and configured on the web servers in the primary site, and an NLB cluster consisting of the two servers is created. Figure 8 shows the web-tier NLB cluster.

Figure 8. Web-tier NLB cluster


If the disaster recovery site has multiple web-tier servers, an NLB cluster needs to be configured in those servers as well. Table 8 shows the NLB configurations used.
Table 8. NLB configurations

Configuration | Description
Cluster operation mode | Multicast operation mode was used to keep the network adapter's built-in media access control (MAC) address. This was because the test servers had only one network adapter, and this network adapter had to be used for server management as well. If the server has multiple network adapters, the cluster operation mode can be set to Unicast.
Protocol | TCP was used because HTTP traffic is transported over TCP/IP.
Port range | The port range was limited to 8080, which was the JBoss web site port configured in the test environment.
Filtering mode | Affinity: None was selected to achieve the best possible load balancing. Typically, T24Browser requires the Affinity: Single (sticky session) configuration because it is a stateful application. However, in the recommended solution, JBoss is configured to persist session states in the SQL Server database; therefore, it is possible to use the Affinity: None configuration in the load balancer.

To make it possible to fail over the web tier to the disaster recovery site transparently, the DNS host record (T24Browser.CoE.Temenos.com) is used for the NLB cluster IP address. Therefore, the web browser URL remains unchanged, even if there is a failover to the disaster recovery site. Not using sticky-sessions increases the availability of the site; in addition, using NLB with the DNS host record allows for adding or removing web-tier servers transparently and without affecting the users.
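
The difference between the two affinity settings can be shown with a toy model. The following Python sketch does not reproduce the NLB hashing algorithm. It only illustrates the idea that Affinity: Single maps every connection from one client IP address to the same host, whereas Affinity: None also takes the client port into account, so connections from one client can be spread across hosts. The function name, client address, and port range are assumptions.

```python
import hashlib


def pick_host(hosts, client_ip, client_port, affinity):
    """Toy host selection: hash the client identity and map it to a host."""
    if affinity == "None":
        key = f"{client_ip}:{client_port}"   # every connection is hashed separately
    elif affinity == "Single":
        key = client_ip                      # all connections from one IP share a key
    else:
        raise ValueError(f"unsupported affinity: {affinity}")
    digest = hashlib.sha256(key.encode("ascii")).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]


hosts = ["Web Node 1", "Web Node 2"]
ports = range(50000, 50200)   # hypothetical client source ports

# Hosts reached by 200 connections from the same client under each setting.
single = {pick_host(hosts, "192.0.2.25", p, "Single") for p in ports}
none = {pick_host(hosts, "192.0.2.25", p, "None") for p in ports}
```

Under the Single setting all 200 connections land on one node. Under None they are spread over both nodes, which is only safe here because the session state is persisted in the SQL Server database rather than on a single web server.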


T24Browser Configuration
The T24 web tier is configured with two JBoss instances with T24Browser (nodes: Web Node 1 and Web Node 2) in the primary site and a single instance (node: Web Node 3) in the disaster recovery site. It is possible to have multiple JBoss/T24Browser instances (web server nodes) in the disaster recovery site if high availability is a requirement for the disaster recovery site. The Windows Server 2008 R2 NLB feature was used to balance the load on the JBoss server nodes. Figure 9 shows the T24 web tier.

Figure 9. T24 web tier

JBoss Configuration
JBoss Application Server 5.1.0 GA was used in the test environment to host the T24Browser Java servlet application. No clustered instance of JBoss was installed on the web-tier servers. The following configurations were made after successfully installing JBoss:
- Because of the limitations of JBoss cluster session replication, and to avoid using sticky sessions, JBoss session persistence was implemented by using a SQL Server database. A JBoss session persistence database was created in the same SQL Server 2012 HADR configuration as the T24 database. Therefore, the JBoss session persistence database has the same high availability and disaster recovery capabilities as the T24 database. This makes management easier and reduces the number of steps in the disaster recovery procedures. (Note that the JBoss session persistence database can be implemented as a different instance if required.)
- An inbound Windows firewall exception rule for TCP port 8080 was created to make JBoss accessible to users.

T24Browser with AGENT Connection Method


After successful installation of the JBoss application server, T24Browser can be deployed and configured to use one of the two supported connection methods, AGENT or JMS. Detailed step-by-step setup and configuration guidance can be requested from Temenos. For the online transactions used in this testing, the AGENT configuration is recommended. Tables 9 and 10 show the settings that were configured in T24Browser.
Table 9. Settings in browserParameters.xml

Parameter name | Description
Server Connection Method | Configuration of the connection to the T24 server. The AGENT connection method was used for the testing.
ConnectionTimeout | The connection expiration time if T24Browser does not get a response from the T24 application server. This was set to 20 seconds.
RetryCount | The number of retry attempts that T24Browser should make if it cannot reach T24 to successfully execute a transaction. This was set to 20 times.
RetryWait | When retrying, the number of seconds to wait before attempting to retry the transaction. This was set to 5 seconds.
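
The interaction of RetryCount and RetryWait can be pictured as a simple retry loop. The following Python sketch is not T24Browser code. It assumes that RetryCount counts the retries made after the first failed attempt and that RetryWait is the pause before each retry. The function name, the stand-in request, and the use of `ConnectionError` are all illustrative.

```python
import time


def call_with_retry(operation, retry_count=20, retry_wait=5, sleep=time.sleep):
    """Call operation(); on ConnectionError pause and retry up to retry_count times."""
    retries = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if retries == retry_count:
                raise                  # retries exhausted: report the failure
            retries += 1
            sleep(retry_wait)          # pause before the next attempt


# Stand-in for a request that fails twice (for example, during a failover)
# and then succeeds.
failures_left = [2]
waits = []


def flaky_request():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise ConnectionError("T24 server not reachable")
    return "OK"


# The waits are recorded instead of slept so that the example finishes instantly.
result = call_with_retry(flaky_request, sleep=waits.append)
```

Under these assumptions, the values used in the test environment give a request up to 20 x 5 = 100 seconds of waiting, plus the time spent on the attempts themselves, before an error reaches the user. That window is what allows a short failover to complete without the transaction being abandoned.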


Table 10. Settings in t24-ds.xml

Property name | Description
Host | A comma-separated list of available T24 servers. Because the NLB feature in Windows Server 2008 R2 is configured at the application tier, the name of the load balancing cluster needs to be used instead of the names of the T24 servers. The load balancing cluster T24Server.CoE.Temenos.com was used in the test environment.
Ports | The jbase_agent TCP port number. All T24 instances in the test environment are configured to use TCP port 20002; therefore, 20002 is used as the jbase_agent port number.
loadBalancing | Enables or disables the simple load balancing feature in T24Browser. This is set to false because the NLB feature in Windows Server 2008 R2 performs the load balancing in the recommended solution.
actionTimeout | The number of seconds that the jbase_agent waits for a response from the T24 application server. This was set to 60 seconds in the test environment.

Disaster Recovery Procedures


The high availability solution described in this document implements automatic failover between the primary site servers (nodes). Human intervention is therefore not required. However, the disaster recovery failover is intentionally designed to be manual, because this is typically part of the business continuity plan. Therefore, the disaster recovery failover might require additional procedures to be followed. This section describes the disaster recovery procedures that were successfully tested for the recommended solution. Figure 10 shows the three failover activities that are required. Note that the second failover activity is optional, and can be used if application-tier failover is implemented to ease maintenance activities. In addition, if the optional DFS-R is implemented, the DFS Namespace fails over automatically and manual failover is not required.


Figure 10. Failover to disaster recovery site

The steps required for the failover activities are described in detail in the sections that follow. Note that the steps in all sections need to be completed to successfully fail over to the disaster recovery site.


DNS Switching
Web-tier and application-tier DNS switching requires changing the IP address of the DNS host records to the IP address of the relevant server (node) in the disaster recovery site. Following are the steps to change the IP addresses of the DNS host records:
1. Log on to the domain controller as the administrator.
2. Navigate to Server Manager.
3. Expand Roles, expand DNS Server, expand DNS, expand Server Name, and then expand Forward Lookup Zones.
4. Select the domain name (for example, CoE.Temenos.com). Note that T24Browser and the optional T24Server are the DNS host records that require the IP changes (Figure 11).

Figure 11. Select the DNS host record


5. Right-click the DNS host record T24Browser, and then select Properties (Figure 12).

Figure 12. T24Browser DNS host record properties

6. Change the address in the IP address field to the IP address of the web-tier server in the disaster recovery site, and then click OK. If the disaster recovery site has more than one web-tier server, the IP address should be that of the web-tier load balancer (NLB cluster).


7. If the T24Server DNS host record is also available, right-click the DNS host record, and then select Properties. Change the address in the IP address field to the IP address of the application-tier server in the disaster recovery site, and then click OK (Figure 13).

Figure 13. T24Server DNS host record properties

If the disaster recovery site has more than one application-tier server, the IP address should be the IP address of the application-tier load balancer (NLB cluster).
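
After the records have been changed, it can be useful to confirm from a client machine that the names now resolve to the disaster recovery addresses. The following Python sketch wraps the standard `socket.gethostbyname` call. The function name and the addresses (from the reserved documentation ranges) are hypothetical, and the demonstration uses a stub resolver so that it does not depend on a real DNS server.

```python
import socket


def record_points_to(host_name, expected_ip, resolve=socket.gethostbyname):
    """Return True when host_name currently resolves to expected_ip."""
    try:
        return resolve(host_name) == expected_ip
    except OSError:
        return False   # the name could not be resolved at all


def stub_resolver(name):
    """Stand-in for the real resolver, holding the post-failover records."""
    records = {
        "T24Browser.CoE.Temenos.com": "198.51.100.10",
        "T24Server.CoE.Temenos.com": "198.51.100.20",
    }
    if name not in records:
        raise OSError(f"cannot resolve {name}")
    return records[name]


web_ok = record_points_to("T24Browser.CoE.Temenos.com", "198.51.100.10", resolve=stub_resolver)
app_ok = record_points_to("T24Server.CoE.Temenos.com", "198.51.100.20", resolve=stub_resolver)
```

When the function is called without the `resolve` argument, it uses the operating system resolver, so a result may still reflect a locally cached address until the TTL of the record has expired.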

SQL Server 2012 HADR Failover


The SQL Server 2012 HADR failover to the disaster recovery site might be required in the following two scenarios:
- Planned manual failover: The primary site database servers are available, but a failover to the disaster recovery site is required.
- Unplanned forced failover: The complete primary site or the primary site database servers have failed, and the database servers in the primary site are not accessible.


Planned Manual Failover


When the failover is planned, there is no server downtime in the primary site, the Windows Server Failover Cluster (WSFC) is active, and databases are in Synchronized state in both primary and disaster recovery instances of SQL Server. Therefore, before starting the failover procedure, make sure that the databases are in Synchronized state in both primary and disaster recovery instances of SQL Server (Figure 14 and Figure 15).

Figure 14. SQL Server primary instance database status


Figure 15. SQL Server disaster recovery instance database status

For more information about planned manual failover, see: Perform a Planned Manual Failover of an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/hh231018.aspx).

Limitations and Restrictions


- A failover command returns as soon as the target secondary replica has accepted the command. However, database recovery occurs asynchronously after the availability group has finished failing over.
- Cross-database consistency across databases within the availability group is not maintained during failover. Cross-database transactions and distributed transactions are not supported by AlwaysOn Availability Groups. For more information, see: Cross-Database Transactions Not Supported for Database Mirroring or AlwaysOn Availability Groups (SQL Server) (http://msdn.microsoft.com/en-us/library/ms366279.aspx).


Prerequisites and Restrictions


- The target secondary replica and the primary replica must both be running in synchronous-commit availability mode.
- The target secondary replica must currently be synchronized with the primary replica. This requires that all the secondary databases on this secondary replica have been joined to the availability group and are synchronized with their corresponding primary databases (that is, the local secondary databases must be synchronized). To determine the failover readiness of a secondary replica, query the is_failover_ready column in the sys.dm_hadr_database_replica_cluster_states dynamic management view (see: http://msdn.microsoft.com/en-us/library/hh213319.aspx) or look at the Failover Readiness column of the AlwaysOn Group Dashboard (see: http://msdn.microsoft.com/en-us/library/hh213474.aspx).
- This task is supported only on the target secondary replica. You must be connected to the server instance that hosts the target secondary replica.
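
The readiness check described above can be sketched as a query plus a small evaluation step. In the following Python sketch, the T-SQL text reads `is_failover_ready` from `sys.dm_hadr_database_replica_cluster_states` and restricts the result to the local replica, assuming that it is run on the target secondary instance. Executing the query requires a database driver, which is not shown. The sample rows and the JBoss database name are hypothetical.

```python
# T-SQL to run on the target secondary instance (assumption: executed there,
# so is_local = 1 selects the rows for the replica that will become primary).
READINESS_QUERY = """
SELECT drcs.database_name, drcs.is_failover_ready
FROM sys.dm_hadr_database_replica_cluster_states AS drcs
JOIN sys.dm_hadr_availability_replica_states AS ars
  ON drcs.replica_id = ars.replica_id
WHERE ars.is_local = 1;
"""


def databases_not_ready(rows):
    """Return the names of databases whose is_failover_ready flag is not set."""
    return [name for name, ready in rows if not ready]


# Hypothetical result rows as (database_name, is_failover_ready).
rows = [("T24R12", 1), ("JBossSessions", 0)]
blocked = databases_not_ready(rows)
print(blocked)
```

A planned failover should only be started when the returned list is empty, that is, when every database in the availability group reports that it is ready.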

Failover Procedure
Following are the steps to fail over the SQL Server 2012 HADR configuration to the disaster recovery site.
1. Connect to the primary or secondary (disaster recovery) instance of SQL Server by using SQL Server 2012 Management Studio (Figure 16).

Figure 16. SQL Server 2012 primary instance


2. Right-click on the Availability Group (for example, T24AG), and then select Failover (Figure 17).

Figure 17. Select "Failover"

3. In the Fail Over Availability Group Wizard, click Next (Figure 18).

Figure 18. Fail Over Availability Group wizard Introduction page


4. In the Select New Primary Replica page, select the secondary SQL Server instance if it is not already selected, and then click Next (Figure 19).

Figure 19. Fail Over Availability Group wizard Select New Primary Replica page

5. In the Connect to Replica page, connect to the secondary instance by providing the credentials, and then click Next (Figure 20).

Figure 20. Fail Over Availability Group wizard Connect to Replica page


6. Click Finish at the Summary page to start the failover (Figure 21).

Figure 21. Fail Over Availability Group wizard Summary page

7. After the successful failover, the wizard will show a Results page similar to the following (Figure 22).

Figure 22. Fail Over Availability Group Wizard Results Page


The Validating WSFC quorum vote configuration warning appears because of the special quorum configuration used in this solution and is safe to ignore (Figure 23).

Figure 23. Fail Over Availability Group wizard WSFC quorum configuration warning

8. Check the database status and Availability Group status in SQL Server 2012 Management Studio to verify the failover (Figure 24).
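
The wizard steps above have a Transact-SQL equivalent that can be useful for scripted runbooks. The following Python sketch only builds the `ALTER AVAILABILITY GROUP ... FAILOVER` statement for a planned manual failover; it does not connect to SQL Server. The statement is meant to be executed on the instance that hosts the target secondary replica. The helper names are illustrative.

```python
def quote_name(identifier):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + identifier.replace("]", "]]") + "]"


def planned_failover_statement(group_name):
    """Build the T-SQL for a planned manual failover of an availability group."""
    return f"ALTER AVAILABILITY GROUP {quote_name(group_name)} FAILOVER;"


# T24AG is the availability group name used in the test environment.
statement = planned_failover_statement("T24AG")
print(statement)
```

As with the wizard, the statement should only be run after confirming that the databases on the target secondary replica are in the Synchronized state.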

Figure 24. Management Studio after Fail Over Availability Group wizard


Unplanned Forced Failover


When the primary site or the database servers (nodes) in the primary site are not available, the Windows Server Failover Cluster (WSFC) does not have quorum to bring the cluster online. Therefore, the WSFC needs to be deliberately started (forced) before the database failover. After the WSFC is brought online with a forced quorum, the SQL Server 2012 AlwaysOn Availability Group must be forced to fail over to the disaster recovery instance. For more information about unplanned forced failover, see: Perform a Forced Manual Failover of an Availability Group (SQL Server) (http://msdn.microsoft.com/en-us/library/ff877957(SQL.110).aspx).
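
The order of the two actions matters, so it can help to write them down as an explicit plan. The following Python sketch lists command-line equivalents of the graphical procedure used in this paper: `net start clussvc /forcequorum` to start the cluster service with forced quorum on the disaster recovery node, followed by the Transact-SQL forced failover. It builds the plan only and executes nothing. The function name is hypothetical, and using the command line instead of Failover Cluster Manager and Management Studio is an assumption, not the tested method.

```python
def forced_failover_plan(group_name):
    """Return ordered (where to run, command) steps for an unplanned forced failover."""
    quoted = "[" + group_name.replace("]", "]]") + "]"
    return [
        # Step 1: bring the cluster online on the surviving node without quorum.
        ("elevated command prompt on the DR node",
         "net start clussvc /forcequorum"),
        # Step 2: force the availability group over; data loss is possible.
        ("DR instance of SQL Server",
         f"ALTER AVAILABILITY GROUP {quoted} FORCE_FAILOVER_ALLOW_DATA_LOSS;"),
    ]


for where, command in forced_failover_plan("T24AG"):
    print(f"{where}: {command}")
```

The second step uses the data-loss variant of the failover command, so it should only be used when the primary replica is no longer running.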

Limitations and Restrictions


Data loss is possible during the forced failover of an availability group. In addition, if the primary replica is running when you initiate a forced failover, client computers might still be connected to former primary databases. Therefore, it is strongly recommended that you force failover only if the primary replica is no longer running and if you are willing to risk losing data to restore access to databases in the availability group.

When a database on a secondary replica is in the REVERTING or INITIALIZING state, forcing failover causes the database to fail to start as a primary database. If the database was in the INITIALIZING state, you will need to apply the missing log records from a database backup or fully restore the database from scratch. If the database was in the REVERTING state, you will need to fully restore the database from backups.

A failover command returns as soon as the target secondary replica has accepted the command. However, database recovery occurs asynchronously after the availability group has finished failing over.

Cross-database consistency across databases within the availability group is not maintained upon failover.

Cross-database transactions and distributed transactions are not supported by AlwaysOn Availability Groups. For more information, see: Cross-Database Transactions Not Supported for Database Mirroring or AlwaysOn Availability Groups (SQL Server) (http://msdn.microsoft.com/en-us/library/ms366279.aspx).
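The limitations above can be read as a pre-check before forcing failover. The following Python sketch is illustrative only and is not part of the tested solution: the `DatabaseState` type, the function name, and the database names are hypothetical, and the state strings are assumed to have been read beforehand from the target secondary replica (for example, from the `synchronization_state_desc` column of `sys.dm_hadr_database_replica_states`).

```python
from dataclasses import dataclass

# States named in the limitations above in which a forced failover leaves
# the database unable to start as a primary database.
BLOCKING_STATES = {"REVERTING", "INITIALIZING"}


@dataclass
class DatabaseState:
    name: str   # database name (hypothetical in the examples below)
    state: str  # synchronization state reported by the secondary replica


def forced_failover_blockers(databases, primary_running):
    """Return human-readable reasons why a forced failover should wait."""
    reasons = []
    if primary_running:
        reasons.append(
            "primary replica is still running; clients may still be connected to it"
        )
    for db in databases:
        if db.state.upper() in BLOCKING_STATES:
            reasons.append(
                f"database {db.name} is {db.state.upper()} "
                "and would fail to start as a primary database"
            )
    return reasons


if __name__ == "__main__":
    # Hypothetical snapshot of the disaster recovery replica.
    snapshot = [
        DatabaseState("T24", "NOT SYNCHRONIZING"),
        DatabaseState("JBossSessions", "REVERTING"),
    ]
    for reason in forced_failover_blockers(snapshot, primary_running=False):
        print("Do not force failover yet:", reason)
```

An empty result does not mean that the failover is free of data loss; it only means that none of the blocking conditions listed above were detected.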


Prerequisites and Restrictions


Windows Server Failover Cluster (WSFC) needs to be brought online with a forced quorum. For more information about the forced quorum procedure, see: WSFC Disaster Recovery through Forced Quorum (SQL Server) (http://msdn.microsoft.com/en-us/library/hh270277.aspx).

You must be able to connect to the server instance that hosts the target secondary replica.

Failover Procedure
When the primary site or the primary site database servers are not available, the only accessible database server is the disaster recovery instance. Figure 25 and Figure 26 show how the Windows Server Failover Cluster and the SQL Server instance appear on the disaster recovery database server.

Figure 25. Windows Server Failover Cluster without quorum


Figure 26. SQL Server 2012 primary site failure

To bring the database online in the disaster recovery site, you first need to start the Windows Server Failover Cluster with forced quorum, and then force failover of the SQL Server 2012 availability group. The following subsections provide the required steps. The steps in both subsections must be completed to fail over successfully to the disaster recovery site.

Force Cluster Start with Force Quorum
Follow these steps to force the cluster to start in the disaster recovery site with forced quorum:

1. Log on to the disaster recovery database server with a domain account that has administrator privileges on the local computer.


2. Open Server Manager, expand Features, and then expand Failover Cluster Manager. Select the cluster (Figure 27).

Figure 27. Failed cluster due to quorum vote

3. Click Force Cluster Start in the Actions pane (Figure 28).

Figure 28. Cluster Manager - force cluster start option


4. Confirm the action by selecting the Yes Force my cluster to start option (Figure 29).

Figure 29. Confirm force cluster start

5. The cluster start takes some time. Wait until the cluster starts successfully (Figure 30).

Figure 30. Cluster force start in progress


6. After the cluster starts, the cluster will look like the following figure in the Failover Cluster Manager (Figure 31).

Figure 31. Cluster started with force quorum
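The same forced start can also be performed from a command line, as described in the referenced article Force a WSFC Cluster to Start Without a Quorum. The following Python sketch only assembles those command lines as text so that they can be reviewed before an administrator runs them on the disaster recovery node. The function name and the node name `DRSQLNODE` are hypothetical; `Start-ClusterNode -FixQuorum` and `net.exe start clussvc /forcequorum` are the forms used for Windows Server 2008 R2, and the sketch assumes that the cluster service on the node is currently stopped.

```python
import re

# Conservative check for a Windows computer name: letters, digits, hyphen.
_NODE_NAME = re.compile(r"[A-Za-z0-9-]{1,15}")


def force_quorum_commands(node_name):
    """Return command lines that force-start the cluster service on one node.

    Nothing is executed here; the caller decides how and where to run them.
    """
    if not _NODE_NAME.fullmatch(node_name):
        raise ValueError(f"unexpected node name: {node_name!r}")
    return {
        # PowerShell variant (FailoverClusters module).
        "powershell": [
            "Import-Module FailoverClusters",
            f"Start-ClusterNode -Name '{node_name}' -FixQuorum",
        ],
        # net.exe variant, run locally on the node itself.
        "net": ["net.exe start clussvc /forcequorum"],
    }


if __name__ == "__main__":
    # DRSQLNODE is a hypothetical name for the disaster recovery node.
    for shell, lines in force_quorum_commands("DRSQLNODE").items():
        print(f"# {shell}")
        for line in lines:
            print(line)
```

Either variant is sufficient on its own; both are listed only to show the two documented entry points.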

Force Failover SQL Server 2012 Availability Group
Once the Windows Server Failover Cluster is online with forced quorum, follow these steps to force failover of the SQL Server 2012 availability group:

1. Open SQL Server 2012 Management Studio and connect to the SQL Server disaster recovery instance (Figure 32).

Figure 32. SQL Server instance before forced failover


2. Right-click on the Availability Group (for example, T24AG), and then select Failover (Figure 33).

Figure 33. Start force failover

3. In the Fail Over Availability Group Wizard, click Next (Figure 34).

Figure 34. Fail Over Availability Group wizard - Introduction page


4. In the Select New Primary Replica page, select the secondary SQL Server instance if it is not already selected. Also note the warning. Click Next (Figure 35 and Figure 36). Because the cluster quorum is forced, the quorum status is showing as Forced Quorum.

Figure 35. Fail Over Availability Group wizard - Select New Primary Replica page

Figure 36. Fail Over Availability Group wizard - Select New Primary Replica page warning


5. Select and confirm failover with potential data loss, and then click Next (Figure 37). Because the database status is not synchronized, SQL Server warns about potential data loss. However, there is no data loss if the databases were in the Synchronized state at the time of the site failure.

Figure 37. Fail Over Availability Group wizard - Potential Data Loss confirmation


6. Click Finish on the Summary page to start the failover (Figure 38).

Figure 38. Fail Over Availability Group wizard - Force Failover Summary page

7. After the forced failover succeeds, the wizard shows the Results page (Figure 39).

Figure 39. Fail Over Availability Group wizard - Results page


The Validating WSFC quorum vote configuration warning appears because of the special quorum configuration that is used in the recommended solution and is safe to ignore (Figure 40).

Figure 40. Fail Over Availability Group wizard - WSFC Quorum Configuration warning

8. After successful force failover, the database status and availability group status in SQL Server 2012 Management Studio will look like the following figure (Figure 41).

Figure 41. Management Studio after Fail Over Availability Group wizard
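The wizard steps above correspond to a single Transact-SQL statement, `ALTER AVAILABILITY GROUP ... FORCE_FAILOVER_ALLOW_DATA_LOSS`, run on the disaster recovery instance. The following Python sketch builds that statement and a follow-up query that reports the local replica role; it does not connect to SQL Server. The function names are hypothetical, and `T24AG` is the example group name used in step 2.

```python
def quote_identifier(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "[" + name.replace("]", "]]") + "]"


def forced_failover_sql(availability_group):
    """Statement to run on the target secondary (disaster recovery) instance."""
    return (
        f"ALTER AVAILABILITY GROUP {quote_identifier(availability_group)} "
        "FORCE_FAILOVER_ALLOW_DATA_LOSS;"
    )


def local_role_sql():
    """Query that shows the role of the local replica for each group."""
    return (
        "SELECT ag.name, ars.role_desc "
        "FROM sys.availability_groups AS ag "
        "JOIN sys.dm_hadr_availability_replica_states AS ars "
        "ON ars.group_id = ag.group_id "
        "WHERE ars.is_local = 1;"
    )


if __name__ == "__main__":
    print(forced_failover_sql("T24AG"))
    print(local_role_sql())
```

After a successful forced failover, the second query is expected to report the PRIMARY role for the availability group on the disaster recovery instance, matching what Management Studio shows in Figure 41.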

Additional Considerations
It is highly recommended that you change the cluster quorum configuration if planned (scheduled maintenance) or unplanned (primary site disaster) shutdown of all cluster nodes in the primary site occurs, and if the disaster recovery SQL Server 2012 instance becomes active as the primary instance for an extended period of time. If you do not change the cluster quorum configuration, the entire cluster might shut down because of insufficient quorum vote availability.


Change the value of the NodeWeight property to 1 for the disaster recovery cluster node, and change it to 0 for the cluster nodes in the primary site. For more information, see the Microsoft Support article at http://support.microsoft.com/kb/2494036/en-us.

Shutting down only one node in the primary site does not affect cluster availability as long as the second node in the primary site is still up and running along with the File Share Witness (FSW).

If the FSW in the primary site is not available and cannot be contacted by the cluster node in the disaster recovery site, change the FSW location to the disaster recovery site.

Running the entire system with only one node in the disaster recovery site will not guarantee high availability. Therefore, this should only be done for a limited amount of time. Otherwise, it is highly recommended that you add a second node in the disaster recovery site and modify the cluster quorum configuration accordingly.
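The effect of these NodeWeight changes can be checked with simple vote arithmetic. The following Python sketch is a simplified model, not the WSFC algorithm itself: it assumes that each node contributes its NodeWeight, that the File Share Witness contributes one vote, and that the cluster keeps quorum only while more than half of all configured votes are reachable. The node names are hypothetical, and the initial layout (two voting primary nodes, a witness, and a non-voting disaster recovery node) is inferred from the text above. The helper `node_weight_commands` only formats the PowerShell assignments for review.

```python
def has_quorum(voters):
    """voters maps a name to (weight, is_online); True if a majority is online."""
    total = sum(weight for weight, _ in voters.values())
    online = sum(weight for weight, up in voters.values() if up)
    return total > 0 and 2 * online > total


def node_weight_commands(weights):
    """Format PowerShell assignments of NodeWeight for each cluster node."""
    return [
        f"(Get-ClusterNode '{node}').NodeWeight = {weight}"
        for node, weight in weights.items()
    ]


if __name__ == "__main__":
    # Assumed original layout with the primary site and its witness lost.
    before = {
        "PRISQL1": (1, False),
        "PRISQL2": (1, False),
        "FSW": (1, False),
        "DRSQL1": (0, True),
    }
    # Votes reassigned and the witness relocated to the disaster recovery site.
    after = {
        "PRISQL1": (0, False),
        "PRISQL2": (0, False),
        "FSW": (1, True),
        "DRSQL1": (1, True),
    }
    print("quorum before change:", has_quorum(before))
    print("quorum after change:", has_quorum(after))
    for line in node_weight_commands({"DRSQL1": 1, "PRISQL1": 0, "PRISQL2": 0}):
        print(line)
```

In this model, reassigning the node votes without also relocating an unreachable witness leaves one of two votes online, which is not a majority; that is one way to see why the FSW location matters.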

Findings and Carryovers


The following findings and carryovers were noted during the testing of the solution proposed in this document.

Using the NLB feature in Windows Server provides better stability, better scalability, and faster failover at no additional cost. NLB also lets you transparently add or remove nodes in the web and application tiers.

JBoss session persistence increases the reliability and provides better scalability for the solution.

Removing the sticky-session requirement in T24Browser makes the solution more reliable and scalable.

A JBoss session persistence database in the same SQL Server 2012 AlwaysOn Availability Group reduces the administrative work and the number of steps in the disaster recovery procedures.

SQL Server 2012 HADR and AlwaysOn provide simplified disaster recovery failover while maintaining a database replica in the disaster recovery site.

T24 works well with a configuration that uses the NLB feature in Windows Server and provides faster application-tier failover.


Windows Server DFS-R with a DFS Namespace published in Active Directory Domain Services provides a unique URL that can be used to refer to the file share, regardless of whether the system is operating in the primary or the disaster recovery environment.

File and folder symbolic links make shared file and folder access more resilient.

A clustered instance of SQL Server 2012 for high availability reduces licensing requirements.

A SQL Server 2012 AlwaysOn Availability Group eliminates SAN replications.

DNS host records used for the load balancer IP addresses make disaster recovery failover transparent at the web and application tiers.

Recommended Hotfixes and Service Packs


The following best practices apply to the recommended configuration:

Regularly check and apply all the security hotfixes for Windows Server 2008 R2.

Regularly check and apply the latest available service pack for Windows Server 2008 R2 after checking with Temenos about supportability.
o NOTE: Currently, Service Pack 1 (SP1) for Windows Server 2008 R2 is available and certified by both Microsoft and Temenos.

Regularly check and apply the pertinent hotfixes mentioned in the following knowledge base (KB) article to enhance stability and fix known critical bugs (not security related).
Recommended hotfixes and updates for Windows Server 2008 R2-based server clusters
http://support.microsoft.com/kb/980054/en-us

As a special out-of-band recommended hotfix for Windows Server 2008 R2, install the following hotfix on all the cluster nodes in the primary and disaster recovery sites.
A hotfix that improves the performance of the "AlwaysOn Availability Group" feature in SQL Server 2012 is available for Windows Server 2008 R2
http://support.microsoft.com/kb/2687741/en-us

Regularly check and apply all the security hotfixes for SQL Server 2012.
o NOTE: Currently, SQL Server 2012 does not have any security hotfixes released.

Regularly check and apply the latest available service pack for SQL Server 2012 after checking with Temenos about supportability.
o NOTE: Currently, there is no released service pack for SQL Server 2012.

As a special out-of-band recommended hotfix for SQL Server 2012, install the following update package on all the SQL Server 2012 instances in the primary and disaster recovery sites.
Cumulative update package 1 for SQL Server 2012
http://support.microsoft.com/kb/2679368/en-us
NOTE: If a more recent update is available, it is not necessary to install the previous hotfix.

Regularly check for the latest cumulative update (CU) release for SQL Server 2012, review the fixed bugs, and install it only if you are affected and after checking with Temenos about supportability. For a list of released CUs for SQL Server 2012, see the following KB article.
The SQL Server 2012 builds that were released after SQL Server 2012 was released
http://support.microsoft.com/kb/2692828/en-us

Finally, it is highly recommended that you check periodically with the Microsoft Support Service for any recommended non-security related hotfixes for Windows Server 2008 R2 and SQL Server 2012.

Additional Resources
Following are links for further information.

SQL Server 2012


Books Online for SQL Server 2012
http://msdn.microsoft.com/en-us/library/ms130214.aspx

Database Availability Key Capabilities and Concepts:
o Failover Clustering and AlwaysOn Availability Groups (SQL Server)
  http://msdn.microsoft.com/en-us/library/ff929171.aspx
o Active Secondaries: Readable Secondary Replicas (AlwaysOn Availability Groups)
  http://msdn.microsoft.com/en-us/library/ff878253.aspx

Database Availability Step-by-Step Guide:
o Deploying a new Availability Group
  http://msdnstage.redmond.corp.microsoft.com/en-us/library/ff877884.aspx#RelatedTasks
o Create or Configure an Availability Group Listener (SQL Server)
  http://go.microsoft.com/fwlink/?LinkId=201271
o Perform a Forced Manual Failover of an Availability Group (SQL Server)
  http://msdn.microsoft.com/en-us/library/ff877957.aspx

Instance Availability Key Capabilities and Concepts:
o Failover Policy for Failover Cluster Instances
  http://msdn.microsoft.com/en-us/library/ff878664.aspx

Instance Availability Step-by-Step Guide:
o SQL Server Multi-Subnet Clustering
  http://msdn.microsoft.com/en-us/library/ff878716.aspx
o Configure FailureConditionLevel Property Settings
  http://msdn.microsoft.com/en-us/library/ff878667.aspx
o View and Read Failover Cluster Instance Diagnostics Log
  http://msdn.microsoft.com/en-us/library/ff878700.aspx

AlwaysOn FAQ for SQL Server 2012
http://msdn.microsoft.com/en-us/sqlserver/gg508768(l=en-us)

Hardware and Software Requirements for Installing SQL Server 2012
http://msdn.microsoft.com/en-us/library/ms143506.aspx

Introducing SQL Server AlwaysOn
http://msdn.microsoft.com/en-us/sqlserver/gg490638

Overview of AlwaysOn Availability Groups
http://msdn.microsoft.com/en-us/library/ff877884.aspx

Prerequisites, Restrictions, and Recommendations for AlwaysOn Availability Groups
http://msdn.microsoft.com/en-us/library/ff878487.aspx#SystemReqsForAOAG

Before Installing Failover Clustering
http://msdn.microsoft.com/en-us/library/ms189910.aspx

Create a New SQL Server Failover Cluster (Setup)
http://msdn.microsoft.com/en-us/library/ms179530.aspx

Add or Remove Nodes in a SQL Server Failover Cluster (Setup)
http://msdn.microsoft.com/en-us/library/ms191545.aspx

Microsoft SQL Server AlwaysOn Solutions Guide for High Availability and Disaster Recovery
http://download.microsoft.com/download/D/2/0/D20E1C5F-72EA-4505-9F26-FEF9550EFD44/Microsoft%20SQL%20Server%20AlwaysOn%20Solutions%20Guide%20for%20High%20Availability%20and%20Disaster%20Recovery.docx

Availability Modes
http://msdn.microsoft.com/en-us/library/ff877931.aspx


AlwaysOn Failover Cluster Instances
http://msdn.microsoft.com/en-us/library/ms189134.aspx

Enable and Disable AlwaysOn Availability Groups (SQL Server)
http://msdn.microsoft.com/en-us/library/ff878259.aspx

Creating an Availability Group (SQL Server)
http://msdn.microsoft.com/en-us/library/ff878176.aspx

Create or Configure an Availability Group Listener (SQL Server)
http://msdn.microsoft.com/en-us/library/hh213080.aspx

Monitor Availability Groups
http://msdn.microsoft.com/en-us/library/ff878305.aspx

AlwaysOn Availability Groups Dynamic Management Views and Functions
http://msdn.microsoft.com/en-us/library/ff877943.aspx

Manually Prepare a Secondary Database for an Availability Group (SQL Server)
http://msdn.microsoft.com/en-us/library/ff878349.aspx

SQL Server 2012 AlwaysOn: Multisite Failover Cluster Instance
http://sqlcat.com/sqlcat/b/whitepapers/archive/2011/12/22/sql-server-2012-alwayson_3a00_-multisite-failover-cluster-instance.aspx

Perform a Forced Manual Failover of an Availability Group
http://msdn.microsoft.com/en-us/library/ff877957.aspx

Availability Group Listeners, Client Connectivity, and Application Failover (SQL Server)
http://msdn.microsoft.com/en-us/library/hh213417.aspx

Configure Read-Only Access on an Availability Replica (SQL Server)
http://msdn.microsoft.com/en-us/library/hh213002.aspx

Configure Read-Only Routing on an Availability Group (SQL Server)
http://msdn.microsoft.com/en-us/library/hh710054.aspx

Client Connection Access to Availability Replicas (SQL Server)
http://msdn.microsoft.com/en-us/library/hh510184.aspx

Configure Read-Only Access on an Availability Replica
http://msdn.microsoft.com/en-us/library/hh213002.aspx

Configure the Windows Firewall to Allow SQL Server Access
http://msdn.microsoft.com/en-us/library/cc646023.aspx


How to use Kerberos authentication in SQL Server
http://support.microsoft.com/kb/319723/en-us

How to transfer the logins and the passwords between instances of SQL Server 2005 and SQL Server 2008
http://support.microsoft.com/kb/918992/en-us

SQL Server Web site
http://www.microsoft.com/sqlserver

SQL Server Tech Center
http://technet.microsoft.com/en-us/sqlserver

SQL Server Dev Center
http://msdn.microsoft.com/en-us/sqlserver

Windows Server Failover Cluster


Windows Server | Failover Clustering and Node Balancing
http://www.microsoft.com/windowsserver2008/en/us/failover-clustering-main.aspx

Checklist: Create a Failover Cluster
http://technet.microsoft.com/en-us/library/cc755009.aspx

Failover Cluster Step-by-Step Guide: Validating Hardware for a Failover Cluster
http://technet.microsoft.com/en-us/library/cc732035(WS.10).aspx

Failover Cluster Step-by-Step Guide: Configuring the Quorum in a Failover Cluster
http://technet.microsoft.com/en-us/library/cc770620(v=ws.10).aspx

Failover Cluster Step-by-Step Guide: Configuring Accounts in Active Directory
http://technet.microsoft.com/en-us/library/cc731002(WS.10).aspx

Configure Cluster Quorum NodeWeight Settings
http://msdn.microsoft.com/en-us/library/hh270281(SQL.110).aspx

Force a WSFC Cluster to Start Without a Quorum
http://msdn.microsoft.com/en-us/library/hh270275(v=SQL.110).aspx

Failover Policy for Failover Cluster Instances
http://msdn.microsoft.com/en-us/library/ff878664(SQL.110).aspx

Checklist: Create a Clustered File Server
http://technet.microsoft.com/en-us/library/cc753969.aspx


Recommended hotfixes and updates for Windows Server 2008 R2-based server clusters
http://support.microsoft.com/kb/980054/en-us

A hotfix that improves the performance of the "AlwaysOn Availability Group" feature in SQL Server 2012 is available for Windows Server 2008 R2
http://support.microsoft.com/kb/2687741/en-us

Network Load Balancing


Network Load Balancing
http://technet.microsoft.com/en-us/library/cc770558(v=ws.10).aspx

NLB 101: How NLB balances network traffic
http://blogs.technet.com/b/networking/archive/2008/10/01/nlb-101-how-nlb-balances-network-traffic.aspx

Network Load Balancing parameters
http://technet.microsoft.com/en-us/library/cc778263.aspx

Specifying the Affinity and Load-Balancing Behavior of the Custom Port Rule
http://technet.microsoft.com/en-us/library/cc759039.aspx

Upgrading the Network Load Balancing Cluster (to 2008)
http://technet.microsoft.com/en-us/library/cc755161.aspx

Network Load Balancing: Configuration Best Practices for Windows 2000 and Windows Server 2003
http://www.microsoft.com/downloadS/details.aspx?FamilyID=d24c373e-bafc-4e31-b1b2-d86584a12ca4&displaylang=en


About Temenos
Founded in 1993 and listed on the Swiss Stock Exchange (SIX: TEMN), Temenos Group AG is the market-leading provider of banking software systems to retail, corporate, universal, private, Islamic, and microfinance and community banks. Headquartered in Geneva with more than 60 offices worldwide, Temenos serves more than 1,500 customers in 125 countries. Temenos software products provide advanced technology and rich functionality, incorporating best-practice processes that take advantage of Temenos experience in 700 implementations around the globe. For more information, visit: www.temenos.com

About Microsoft
Founded in 1975, Microsoft (Nasdaq "MSFT") is the worldwide leader in software, services, and solutions that help people and businesses realize their full potential. For more information, visit: www.microsoft.com
