
High Availability and DR Test Report

T24 Architecture with JMS Connectivity


Oracle Stack

Information in this document is subject to change without notice.

No part of this document may be reproduced or transmitted in any form or by any means, for any purpose,
without the express written permission of TEMENOS HEADQUARTERS SA.

COPYRIGHT 2016 TEMENOS HEADQUARTERS SA. All rights reserved.


T24 Reference Architecture Oracle Platform View

Table of Contents

Document History..................................................................................................................... 4
Contributors.............................................................................................................................. 4
Temenos............................................................................................................................... 4
Oracle................................................................................................................................... 5
Trademark................................................................................................................................ 5
References............................................................................................................................... 5
Introduction.............................................................................................................................. 6
Executive Summary................................................................................................................. 6
HA Tests with Online Traffic.................................................................................................6
HA Tests with COB............................................................................................................... 7
DR Tests with Online Traffic................................................................................................. 7
DR Tests with COB.............................................................................................................. 8
Solution Deployment................................................................................................................ 8
Solution Description.............................................................................................................. 8
Architecture Diagram............................................................................................................ 9
HA Design Considerations.................................................................................................... 9
DR Design Considerations................................................................................................. 10
Timeouts Considerations.................................................................................................... 11
Software Deployed............................................................................................................. 11
Issues Identified and Fixes Applied........................................................................................ 11
Issue 1: T24ConnectionFactory Load Balancing causing failures......................................11
Issue 2: Session replication of the BrowserWeb application is not working........................12
Issue 3: Missing managed server start up argument on the app layer...............................12
Issue 5: Failures to cast to XML Type.................................................................................12
Issue 6: tLockManager is corrupting the database.............................................................13
Issue 7: Node manager fails to restart an OHS process when killed..................................13
Issue 8: Running COB from servlet....................................................................................13
Issue 9: Errors reported by JMeter are not confirmed by missing records in database......13
Issue 10: The Temenos logs produced with Weblogic don’t have the right permissions....14
Testing Approach................................................................................................................... 14
Test Data............................................................................................................................ 14
HA Tests with Online Traffic............................................................................................... 14
HA Tests with COB............................................................................................................. 16
DR Tests with Online Traffic............................................................................................... 16

2 Competency Centre

DR Tests with COB............................................................................................................ 16


Baseline Tests........................................................................................................................ 16
Baseline Test with Online Traffic........................................................................................ 16
Baseline Test with COB...................................................................................................... 20
Application Layer HA Tests.................................................................................................... 21
Kill of MS, AS and NM Processes On App Layer...............................................................21
Graceful Shutdown and Start of MS Processes on App Layer...........................................24
Restart of App Layer VM Nodes......................................................................................... 27
Web Layer HA Tests.............................................................................................................. 30
Kill of MS, AS and NM Processes On Web Layer..............................................................30
Graceful Shutdown and Start of MS Processes on Web Layer..........................................33
Kill OHS Processes on Web Layer.....................................................................................36
Graceful Restart of OHS..................................................................................................... 39
Shutdown and Start of OHS Processes on Web Layer.......................................................39
Restart of Web Layer VM Nodes........................................................................................ 42
Data Layer HA Tests.............................................................................................................. 45
Shutdown of DB Nodes...................................................................................................... 45
Restart of Database VM Nodes.......................................................................................... 47
COB HA Tests........................................................................................................................ 50
COB with App Server VM Restart.......................................................................................50
COB with Database VM Restart......................................................................................... 50
DR Tests with Online Traffic................................................................................................... 51
Site Switchover................................................................................................................... 51
Site Failover........................................................................................................................ 52
DR Tests with COB................................................................................................................ 54
Site Switchover................................................................................................................... 54
Site Failover........................................................................................................................ 55


Document History

Author Version Date

Mohand Oussena 0.1 26/10/2016

Mohand Oussena 0.2 27/10/2016

Mohand Oussena 0.3 31/10/2016

Mohand Oussena 0.4 1/11/2016

Mohand Oussena 0.5 11/11/2016

Mohand Oussena 0.6 08/03/2017

Comments:

0.1 – First draft

0.2 – Applied the Temenos template to the document and made some other modifications

0.3 – Added sections HA Design Considerations, DR Design Considerations and Timeouts Considerations

0.4 – Captured comments from Nanda Badrappan

0.5 – Captured comments from Dylan

0.6 – Captured additional comments from Nanda

Contributors
Temenos
Name Role

Simon Henman Product Manager

Nanda Badrappan Project Manager

Surender Padakanti Technical Approver

Yanxin Zhao Database Developer

Julie Bennett Database Administrator

Sheeraz Junejo Solution Architect

Mohand Oussena Performance Architect


Oracle
Name Role

Felipe Garre Sales Consulting Manager

Dylan Lobo Sales Consultant

Richard Jacob Cloud Solution Architect

Diego Ibanez Caro Sales Consultant

Trademark

TEMENOS T24™ is a registered trademark of the TEMENOS GROUP and is referred to as ‘T24’.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may
be trademarks of their respective owners.

References
T24 TAFJ Runbook
Available under TAFJ_HOME/doc

Glossary
Acronym Description

TWS Temenos Web Services

TAFJ Temenos Application Framework Java

TCIB Temenos Connect Internet Banking

OEL Oracle Enterprise Linux

JDK Java Development Kit


Introduction
This document reports the results of the high availability and disaster recovery testing carried
out on the Temenos T24 banking product.
The T24 architecture tested is the one using JMS connectivity between the web and app
layers, deployed on the Oracle stack.

Executive Summary
HA Tests with Online Traffic
The solution is highly available and recovers within a few seconds, although not without
errors. The rate of errors is, however, very low: the error counts shown in the table below are
to be compared with a total of 33k transactions. Note that errors reported by JMeter reflect
errors that would be experienced by a real end user.
During a failure, the remaining servers were found to stay balanced.
The disturbance observed during a graceful shutdown of managed servers from the Weblogic
console was not expected.
The downtime caused by disturbances on the data layer may be reduced if an Active GridLink
data source is used. This will be investigated in the near future.
The numbers in the table are averages per test of the same kind. The “Disturbance time” in
the table is the period during which errors were registered by JMeter.

HA Test Description             JMeter errors   DB missing records   Disturbance time (s)

App Layer
NM and AS restarts                    0                0                     0
MS process kill                       5                2                     6
MS graceful shutdown & start          6                6                    27
VM restart                            7                5                    17

Web Layer
NM and AS restarts                    0                0                     0
MS process kill                       2                1                     4
MS graceful shutdown & start          3                1                     6
OHS process kill                      2                1                     3
Graceful OHS restart                  0                0                     0
OHS shutdown & start                  6                4                     8
VM restart                            4                3                     5

Data Layer
DB node non-graceful shutdown        17               17                    26
VM restart                           22               21                    45

HA Tests with COB


At the start of the COB, one tSA per app server was started. For the first test, the tSA on the
VM that was restarted stopped, and the COB carried on with 3 tSAs.
For the second test, one tSA stopped unexpectedly and had to be restarted.
The COB finished successfully with no errors.

HA Test Description     COB Duration (min)   Comment
App Layer VM restart    57                   4 tSAs, then 3 after restart
DB VM restart           40                   4 tSAs from beginning to end

The COB times are to be compared with the baseline tests with 4 tSAs, which took 43 min.
The first test, where an app server VM was restarted, took longer as expected, since the COB
continued with 3 tSAs.
For the second test, a tSA on one of the app servers had to be restarted manually so the test
could carry on with 4 tSAs.

DR Tests with Online Traffic


The system recovered after a certain downtime when either a switchover or a failover was
triggered.

Site Switchover
DB Switchover                        Load Balancer Switchover
Start     End       Duration         Start   End
11:59:59  12:01:13  00:01:14         12:00   instantaneous

JMeter Errors Capture                Error Count and Missing Records
Start     End       Downtime         Error count   Missing DB records
11:59:49  12:03:12  00:03:23         2613          1170

The “Downtime” in the table is the period during which JMeter recorded errors. The JMeter
error count covers read and write errors. Note that the downtime is longer than the site
switchover time.

Site Failover
DB Kill    Failover                             Load Balancer Switchover
Time       Start     End       Downtime         Start     End
14:01:17   14:02:36  14:06:55  00:04:19         14:01:20  instantaneous

JMeter Errors Capture                Error Count and Missing Records
Start     End       Downtime         Error count   Missing DB records
14:01:15   14:07:31  00:06:16        4443          2007

The “Downtime” in the table is the period during which JMeter recorded errors. The JMeter
error count covers read and write errors. The downtime here includes an additional voluntary
delay of about 2 minutes between killing the DB and triggering the failover.

DR Tests with COB


The COB finished successfully after a switchover or a failover to DR.
Note, however, that a different DBTools user than the one used on the live site had to be
used to create the UD subdirectories on the DR site before restarting the COB, because the
user used on the LIVE site was locked.

Site Switchover
Site Switchover
Start     End       DB Open State  Downtime
09:24:08  09:26:08  09:26:26       00:02:18

Start of COB on LIVE site                       Completion of COB on DR site
tSAs  Start     Stop      COB state  Duration   tSAs  Start     End       Duration  Total Duration
2     09:10:47  09:21:33  App @ 61%  00:10:46   1     09:35:27  10:39:34  01:04:07  01:14:53

What is relevant here is that the COB could be finished successfully.

Site Failover
DB Kill    Site Failover
Time       Start     End       Downtime
14:48:30   14:53:39  15:04:31  00:10:52

Start of COB on LIVE site                       Completion of COB on DR site
tSAs  Start     Stop      COB state  Duration   tSAs  Start     End       Duration  Total Duration
2     14:33:43  14:48:30  App @ 85%  00:14:47   1     15:11:23  16:06:12  00:54:49  01:09:36

What is relevant here is that the COB could be finished successfully.


After the COB finished, switching back to the live site was executed successfully and the live
DB came back in sync with the DR DB.

Solution Deployment
Solution Description
The solution has been deployed to be highly available using Oracle Cloud Service in a three-
tiered architecture: web layer, app layer and data layer. The app and web layers are
Weblogic clusters made of four managed servers each. The data layer is an Oracle RAC
database with two nodes. See the architecture diagram in the next section.
The web layer also has two Oracle HTTP Server (OHS) instances, collocated with web MS2
and web MS3. These OHS instances have been configured to forward requests to all
managed servers on the web layer. The benefit of using OHS is that it can be configured
to target the cluster; enabling the dynamic server list also makes it possible to scale up the
environment without having to restart anything. The other benefit is that, with replication
enabled, the OHS instances favour the primary session when distributing requests. The
drawback is that the web servers may not be very well balanced when the number of sessions
is low, but throughput is optimised.
A load balancer has also been configured to transfer requests to both OHS instances in a
round-robin (RR) fashion.
For the DR site to work, an infrastructure database has been added to host the schemas
required by the Oracle technology stack. The RAC database contains the T24 schema only.
The database on the DR site is kept in sync with the database on the live site using Oracle
DATAGUARD technology.


Architecture Diagram

HA Design Considerations

External Load Balancer


Online requests first hit a load balancer. It is recommended to use a highly available hardware
load balancer; as none was available, a non-redundant software load balancer has been used
instead.

Web Layer
The HA at the web layer has been achieved as follows:
• The Temenos code is deployed in a Weblogic cluster of four managed servers. The
code has been deployed with replication enabled, which means there is no need to
configure sticky sessions at the load balancer.
• Two collocated OHS servers in active/active mode, configured to forward requests
from the load balancer to all managed servers in the Weblogic cluster. These OHS
instances favour the location of the primary sessions, which optimises throughput.
However, managed servers may not be well balanced if the number of sessions is low.
• A foreign JMS server with a JNDI URL pointing to all app servers for redundancy.
Pointing to only two of them would have been enough to ensure high availability.

App Layer
The HA at the app layer has been achieved as follows:
• The Temenos code is deployed in a Weblogic cluster of four managed servers.
• Requests from the web layer are transferred to app servers using JMS technology.
o The browser request queue is configured to be uniformly distributed
(WebLogic Uniform Distributed Queue), accessed from the web layer through a
foreign JMS server. The latter has been configured to point to all app servers
to ensure high availability.
o The JMS connection factory has load balancing enabled, which ensures
requests are load balanced among all four app servers.
o A distributed reply queue was found to generate errors. Instead, four local
browser reply queues with local JNDI names have been configured. This setup
required a fix, described in the section “Issue 1: T24ConnectionFactory Load
Balancing causing failures”.
o A JMS server on each of the app servers has been configured for high
availability and load balancing.
• The Temenos shared libraries (TAFJ runtime and T24) have been installed on each
app server, as a reliable shared storage was not available.

Data Layer
The HA at the data layer has been achieved as follows:
• A RAC database with two nodes
• The app layer uses a URL with SCAN addresses to point to the database
• A generic data source type has been used.

An Active GridLink data source for fast failover has not been tested yet but is planned; it
may reduce the downtime caused by disturbances on the data layer.

DR Design Considerations
The LIVE and DR sites have been configured in Active/Standby mode.
An infrastructure database has been added to store the Weblogic schemas. The RAC
database described in the section “HA Design Considerations” contains the T24 database
only. This database is kept in sync with the DR RAC database, using DATAGUARD
technology.

Timeouts Considerations
Timeouts exist at various levels and need to be consistently configured:
• ConnectionTimeout, defined in BrowserParameters.xml located in the BrowserWeb.war
archive, currently set to 60s
• Web layer: Weblogic JTA timeout, currently set to 50s
• App layer: Weblogic JTA timeout, currently set to 40s
• ofsTimeout, defined in ejb-jar.xml located in TAFJJEE_EJB.jar, currently set to 30s
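The intent of these values is that each inner timeout fires before the one wrapping it, so a failure surfaces at the innermost layer first. A minimal sketch of that consistency check (illustrative only, not part of the Temenos tooling; the labels simply restate the list above):

```python
# Sanity check of the timeout hierarchy described above, ordered from
# innermost to outermost. Each timeout must be strictly shorter than
# the next outer one; otherwise an outer layer can time out first and
# report errors that the inner layer would have handled.
timeouts = [
    ("ofsTimeout (ejb-jar.xml in TAFJJEE_EJB.jar)", 30),
    ("App layer Weblogic JTA timeout", 40),
    ("Web layer Weblogic JTA timeout", 50),
    ("ConnectionTimeout (BrowserParameters.xml)", 60),
]

def hierarchy_ok(ts):
    # True when every inner timeout is strictly shorter than the next outer one.
    return all(inner[1] < outer[1] for inner, outer in zip(ts, ts[1:]))

print(hierarchy_ok(timeouts))  # True for the values above
```

The same check flags a misconfiguration such as the one described later in “Issue 9”, where the EJB timeout (300s) exceeded the JTA timeout.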

Software Deployed

Temenos
• TAFJ R16 SP1
• TAFJ Java Functions Version in DB: PB201510 08/18/2015
• T24 R16 AMR

Infrastructure
• JDK 1.7.0
• Oracle Linux Release 6.6
• Weblogic Server version 12.1.3.0.160419
• Oracle Database 12c Enterprise Edition Release 12.1.0.2.0

Issues Identified and Fixes Applied


Issue 1: T24ConnectionFactory Load Balancing causing failures

Issue Description
While generating online traffic with many concurrent users, a lot of errors occurred. The errors
have been tracked down to JMS exceptions resulting from timeouts while waiting for replies
on the T24BrowserReplyQueue.
We found that enabling affinity and disabling load balancing on the T24ConnectionFactory
gets rid of the errors; however, the application layer is then not balanced. We also found that,
with affinity set, a given web server has its requests handled by one given app server. This
means that, in case of an app server failure, all its requests would be redirected to one single
app server, which may create overload situations.
Note that when affinity is enabled the load balancing option becomes redundant.

Recommended Solution
Replace the uniformly distributed reply queue with its global JNDI name by local reply queues
with local JNDI names. The local JNDI name has to be the same on all app servers,
namely “jms/t24BrowserReplyQueue”.
This solution requires updating the ejb-jar.xml in TAFJJEE_MDB.jar as follows:
<message-driven>
  <display-name>Transacted Listener MDB for BROWSER</display-name>
  <ejb-name>BROWSERTransactedMDB</ejb-name>
  <ejb-class>com.temenos.tafj.mdb.TransactedMDB</ejb-class>
  <messaging-type>javax.jms.MessageListener</messaging-type>
  <transaction-type>Container</transaction-type>
  <message-destination-type>javax.jms.Queue</message-destination-type>
  <env-entry>
    <description>Enable jmsReplyTo feature of an MDB</description>
    <env-entry-name>com.temenos.tafj.mdb.TransactedMDB/sendToJmsReplyTo</env-entry-name>
    <env-entry-type>java.lang.Boolean</env-entry-type>
    <env-entry-value>true</env-entry-value>
  </env-entry>
  <ejb-local-ref>
  .
  .
  .
</message-driven>

Issue 2: Session replication of the BrowserWeb application is not working

Issue Description
When running traffic with sticky sessions disabled on the load balancer, errors are generated.

Recommended Solution
The following has been added to weblogic.xml in the BrowserWeb.war file in order to
enable replication:
<session-descriptor>
  <persistent-store-type>replicated_if_clustered</persistent-store-type>
</session-descriptor>

Issue 3: Missing managed server start up argument on the app layer

Issue Description
The following error occurs:
####<Aug 18, 2016 4:51:19 PM UTC> <Error> <HTTP> <applayer-live-wls-2.compute-
temoarch.oraclecloud.internal> <AppLayer_server_2> <[ACTIVE] ExecuteThread: '13' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <7b22a89e-fb5f-4dfa-820a-3079ff7cdcf0-
00007d40><1471539079942><BEA-101019>
<[ServletContext@2093712906[app:bea_wls_cluster_internal
module:bea_wls_cluster_internal.war path:null spec-version:3.0]] Servlet failed with an IOException.
java.io.NotSerializableException: com.sun.jersey.server.impl.cdi.CDIExtension

Recommended Solution
The issue and the fix are described in the following support knowledge article:
https://support.oracle.com/epmos/faces/DocumentDisplay?id=1490080.1

Add the following property to the app layer managed servers’ start-up argument list:
-Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true

Issue 5: Failures to cast to XML Type

Issue Description
The following errors have been observed:
[ERROR] 2016-08-21 12:50:53,990 [[ACTIVE] ExecuteThread: '14' for queue: 'weblogic.kernel.Default
(self-tuning)'] DATABASE - JDBC Read : Failed to cast directly to XMLType. Using OPAQUE
java.lang.NullPointerException
at com.temenos.tafj.dataaccess.specific.OracleSpecific.read(OracleSpecific.java:255)
at com.temenos.tafj.dataaccess.jTable.readWithFlags(jTable.java:970)


at com.temenos.tafj.dataaccess.jTable.read(jTable.java:822)
at com.temenos.tafj.dataaccess.JDBCDataAccessConductor.readWithFlags
(…)

Recommended Solutions
Apply TAFJ SP1 and add the following property to the tafj.properties file:
temn.tafj.jdbc.use.sqlxml.resultset = false

Issue 6: tLockManager is corrupting the database

Issue Description
During COB tests, for an unknown reason, the connection to tLockManager gets lost in
the Oracle cloud environment that was used. As a result, only a few lines get written to the
DB, leading to data corruption or loss.

Recommended Solutions
A TAFJ fix has been released in TAFJ R16 SP3: when a connection to tLockManager is lost,
a runtime exception is now generated, rolling back all the DB writes.

Issue 7: Node manager fails to restart an OHS process when killed.

Issue Description
During HA testing, it was found that Node Manager fails to restart the OHS process when it is
killed with a kill command.

Recommended Solutions
Increase “RestartDelaySeconds” to 10s and “RestartMax” to 5 in the “startup.properties” file
located as indicated below, and then restart Node Manager:
/u01/data/domains/WebLayer_domain/system_components/OHS/OHS_2/data/nodemanager/startup.properties
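The change above amounts to the following two settings in that startup.properties file (a sketch of the intended end state; restart Node Manager after editing):

```properties
# Allow Node Manager to retry restarting the OHS process after it is killed.
RestartDelaySeconds=10
RestartMax=5
```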

Issue 8: Running COB from servlet

Issue Description
The COB could be run in classic mode but not from the servlet.

Recommended Solution
Add the following argument to the server start arguments in Weblogic for every MS:
-Dhostname=<hostname>


Issue 9: Errors reported by JMeter are not confirmed by missing records in database

Issue Description
During some baseline tests, JMeter reported write errors that were not confirmed by the
database. This kind of mismatch creates an unnecessarily negative user experience.
The JMeter errors have been matched to the following kind of error messages in the app
server logs:
<Oct 20, 2016 2:02:36 PM UTC> <Error> <EJB> <BEA-010026> <Exception occurred during commit of
transaction Name=[EJB
com.temenos.tafj.mdb.TransactedMDB.onMessage(javax.jms.Message)],Xid=BEA1-
1416406F23FC17BCE447(790252712),Status=Rolled back.
[Reason=weblogic.transaction.internal.TimedOutException: Transaction timed out after 29 seconds

Recommended Solutions
See section “Timeouts Considerations” for more details.
The following is to be implemented:
 The timeout specified in the ejb-jar.xml on the TAFJEE_EJB.jar has to be shorter than
the JTA timeout on weblogic. This timeout has been decreased from 300s to 30s. Not
sure why it was set so high initially.
 The JTA timeout has been increased from 30s to 40s. This timeout has to be
increased at both cluster and domain level because of a bug. The relevant patch
cannot be applied on the cloud as it is a trial account, see link below:
o https://support.oracle.com/epmos/faces/DocumentDisplay?id=2180843.1

Issue 10: The Temenos logs produced with Weblogic don’t have the
right permissions

Issue Description
The Temenos logs produced with Weblogic don’t have the right permissions. Permission
denied errors are generated when working as the temenos user in classic mode and
executing the tRun command.
The file permissions are as follows:
-rw-r----- 1 oracle oracle 30512 Oct 20 14:27 database.log

The temenos user is configured to be part of oracle group, so rw permission is required for
the oracle group.

Recommended Solutions
The following changes have been made:
• Change the umask value to 002 in the NM script located at
$WL_HOME/server/bin/startNodeManager.sh
• Change the umask value to 002 in the script
$DOMAIN_HOME/bin/startWeblogic.sh
• Restart NM by running $DOMAIN_HOME/bin/startNodeManager.sh
• Restart the managed servers

The file permissions have changed to rw for the oracle group, as required.
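Why umask 002 has this effect can be sketched as follows: a log file created with mode 666 (rw-rw-rw-) is masked by the process umask, and the default 022 strips group write while 002 keeps it. A small illustrative calculation (not Temenos or Weblogic code):

```python
# A file created with mode 0o666 ends up with mode (0o666 & ~umask).
# With the default umask 022 the group loses write access; with 002
# the oracle group keeps rw, which is what the temenos user needs.
def effective_mode(create_mode, umask):
    return create_mode & ~umask

def rwx(mode):
    # Render a 9-bit permission mode the way `ls -l` does.
    bits = "rwxrwxrwx"
    return "".join(b if mode & (1 << (8 - i)) else "-" for i, b in enumerate(bits))

print(rwx(effective_mode(0o666, 0o022)))  # rw-r--r--  (before the fix)
print(rwx(effective_mode(0o666, 0o002)))  # rw-rw-r--  (after the fix)
```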


Testing Approach
Test Data
Temenos Model Bank

Tools
JMeter and AppDynamics

HA Tests with Online Traffic

Test Traffic Generation


• Traffic is generated by JMeter scripts.
• The requests are sent to the load balancer, which forwards them to the two OHS
instances located on the same boxes as web managed servers 2 and 3.
• These OHS instances are configured to route traffic to all managed servers on the
web layer.
• JMeter executes 10 concurrent threads, each doing 300 iterations of the following:
o Login as internal user
o Create customer
o Create two accounts for the customer
o Open till
o Local teller cash deposit on one of the accounts
o Account balance (ACCT.BAL.TODAY)
o Two statement requests (STMT.ENT.BOOK)
o Logoff
• Pacing has been used: a random delay of up to 500 ms has been inserted between
the transactions in the JMeter scripts. This makes the throughput less sensitive to
transaction response times.
• The LoopController used in JMeter to generate the 300 iterations does not reset the
cookie after the logoff step. This means the load balancer sees 10 concurrent sessions.
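The iteration and pacing scheme above can be sketched as follows (a minimal stand-in for the JMeter script; the transaction names restate the scenario, and send_request is a hypothetical placeholder for the real HTTP sampler):

```python
import random
import time

# One JMeter iteration: the transactions from login to logoff, with a
# random pacing delay of up to 500 ms between them. The pacing decouples
# the generated throughput from individual transaction response times.
TRANSACTIONS = [
    "login", "create_customer", "create_account_1", "create_account_2",
    "open_till", "teller_cash_deposit", "acct_bal_today",
    "stmt_ent_book_1", "stmt_ent_book_2", "logoff",
]

def run_iteration(send_request, max_pacing_ms=500):
    # send_request is a callable taking the transaction name; in the real
    # scripts this is the HTTP sampler hitting the load balancer.
    for name in TRANSACTIONS:
        send_request(name)
        time.sleep(random.uniform(0, max_pacing_ms) / 1000.0)
```

Each of the 10 threads runs 300 such iterations without resetting its cookie, which is why the load balancer sees 10 long-lived sessions.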

Test Validation
The JMeter scripts have robust response assertions. In addition, at the end of every test run,
the following SQL queries are executed against the database to count the total number of
records that have been inserted:
• select count(*) from FBNK.CUSTOMER; (there are none before the run)
• select count(*) from FBNK.ACCOUNT; (there are none before the run)
• select count(*) from FBNK.TELLER; (there are 16 before the run)
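The expected totals for these queries follow from the scenario, assuming every iteration inserts exactly one CUSTOMER record, two ACCOUNT records and one TELLER record (an assumption about the scenario drawn from the step list, not a Temenos specification):

```python
# Expected row counts for the end-of-run validation queries.
THREADS, ITERATIONS = 10, 300
TELLER_BASELINE = 16  # TELLER rows already present before the run

expected = {
    "FBNK.CUSTOMER": THREADS * ITERATIONS,        # one customer per iteration
    "FBNK.ACCOUNT":  THREADS * ITERATIONS * 2,    # two accounts per iteration
    "FBNK.TELLER":   TELLER_BASELINE + THREADS * ITERATIONS,
}

def missing_records(actual):
    # Compare SELECT COUNT(*) results against the expected totals.
    return {t: expected[t] - actual.get(t, 0) for t in expected}

print(expected["FBNK.CUSTOMER"])  # 3000
```

The “DB missing records” figures in the executive summary are the shortfalls this kind of comparison produces.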

JMeter Error Count


Every JMeter thread executes its requests sequentially, so if the login page fails during a
failure test, all subsequent transactions of that iteration fail as well. These additional failures
are a script limitation and should be discounted when counting the failures caused by the
failure test itself.
Note that errors reported by JMeter reflect what a real end user would see.
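The “Disturbance time” reported in the result tables is derived from the JMeter samples as the span between the first and the last registered error. A minimal sketch (illustrative names; samples are (timestamp_in_seconds, ok) pairs):

```python
# Disturbance time: the period during which JMeter registered errors,
# measured as the span between the first and the last failed sample.
def disturbance_time(samples):
    errors = [t for t, ok in samples if not ok]
    if not errors:
        return 0
    return max(errors) - min(errors)

samples = [(0, True), (5, False), (7, False), (11, False), (20, True)]
print(disturbance_time(samples))  # 6
```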


Test Execution

Killing of Processes
While traffic is running, the following commands will be executed:
• MS1:
o date;pgrep -lf java | awk '/AppLayer_admin/ {print $1,$8}'
o date;pgrep -lf java | awk '/AppLayer_server/ {print $1,$8}'
o date;pgrep -lf java | awk '/NodeManager/ {print $1,$(NF-1)}'
o date;kill -9 <process Id>
• MS2, MS3 and MS4:
o date;pgrep -lf java | awk '/AppLayer_server/ {print $1,$4}'
o date;pgrep -lf java | awk '/NodeManager/ {print $1,$(NF-1)}'
o date;kill -9 <process Id>
• Similar commands apply to the Web layer, replacing App with Web.

Graceful Shutdown of MS
While traffic is running, use Weblogic console to gracefully shut down and then start the
managed servers one at a time.

DB Node Shutdown
While traffic is running, the following command is used to shut down the DB instance:
SQL> shutdown abort;

Restart of VMs
While traffic is running, use Oracle Cloud Console to restart the relevant box.

HA Tests with COB


While the COB is running, an app server VM and then a DB server VM will be restarted using
OEM.

DR Tests with Online Traffic


Execution steps are as follows:
1. Generate traffic with JMeter and let it run for 10 minutes.
2. Switchover or failover the DB, depending on the test:
a. If switchover test: execute the switchover.
b. If failover test: run “srvctl stop database –d t24db –o abort”, wait a couple of
minutes and then execute the failover.
3. Update the load balancer by enabling routing to the DR site servers and then disabling
routing to the live site servers.

DR Tests with COB


Execution steps are as follows:
1. Start the COB on the LIVE site and wait for 10 min.
2. Switchover or failover the DB, depending on the test:
a. If switchover test: stop the COB on the live site first and then execute the switchover.
b. If failover test: run “srvctl stop database –d t24db –o abort”, wait a couple of
minutes and then execute the failover.
3. Continue the COB on the DR server:
a. If switchover test: set TSM and COB to START (ignore this step for a failover test).
b. Create the UD subdirectories on the DR server (this may not be necessary if shared
storage is used by both LIVE and DR sites).
c. Execute START.TSM from the servlet.

Baseline Tests
Baseline Test with Online Traffic

Results Summary
The table shows the following:
• Transaction name
• Total number of transactions
• Average, minimum and maximum response times in ms
• Standard deviation and error proportion
• Throughput in req/s or req/min and bandwidth rate in kB/s
• Average bytes per transaction

Web Layer hit rate


The OHS servers favour the location of the primary session when routing traffic. This
improves throughput, but the web servers may not be very well balanced, especially when the
number of sessions is low and is not a multiple of the number of managed servers.
We have 10 threads running on JMeter, each using a loop with 300 iterations to go through
the transactions from login to logoff, as described in a previous section. The loop controller
does not reset cookies after logoff; as a result, all requests from a given thread are treated as
one session by the load balancer.
The picture below is consistent with two servers handling 2 sessions and two servers handling
3 sessions. The system looks unbalanced because of the unusual situation we are in, where
one session consists of 300 iterations of one user going through all the steps described in
section “Test Traffic Generation”. Otherwise, an extra session would not have made such an
impact.
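The distribution described above follows directly from spreading 10 sessions over 4 servers:

```shell
#!/bin/sh
# 10 JMeter threads (one session each) spread across 4 web servers:
# some servers carry one extra session when the division is not exact.
sessions=10
servers=4
base=$((sessions / servers))     # minimum sessions per server
extra=$((sessions % servers))    # servers carrying one extra session
echo "${extra} servers with $((base + 1)) sessions, $((servers - extra)) servers with ${base} sessions"
```

With 10 sessions and 4 servers this gives two servers with 3 sessions and two with 2, matching the graph.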


App Layer hit rate


The app layer is very well balanced, which is as expected given that load balancing is
enabled on the T24ConnectionFactory.

Response Time
The initial spike on the response time is likely due to the system warming up.
The 95th percentile is less than 0.9s for all transactions.


Infrastructure Resources

CPU
No issue with CPU.


Memory
No issue with memory.

JDBC Connections
The connection pools were large enough.


Threads

Failures and Exceptions

There have been no errors for the whole run.

Validation
The execution of the relevant SQL queries at the end of the run showed that the expected
number of records had been inserted. The test run created the expected number of
customers (3000), accounts (6000) and cash deposits (3000).
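These totals are consistent with the traffic profile of 10 JMeter threads each running 300 loop iterations, with each iteration creating one customer, two accounts and one cash deposit (an inference from the expected counts, not a statement of the script internals):

```shell
#!/bin/sh
# Expected record counts implied by the load profile.
threads=10
loops=300
iterations=$((threads * loops))          # end-to-end iterations
customers=$iterations                    # 1 customer per iteration
accounts=$((2 * iterations))             # 2 accounts per iteration
deposits=$iterations                     # 1 cash deposit per iteration
echo "customers=$customers accounts=$accounts deposits=$deposits"
```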


Baseline Test with COB

Test Summary
A COB with one tSA on each of the four app servers has been executed using the TAFJEE
servlet.

Result Summary
Start and end times extracted from the COMO files.

COB test Start time End time Duration

Baseline 1 15:36:58 16:22:57 00:45:59

Baseline 2 12:34:41 13:17:17 00:42:36

Baseline 3 11:57:00 12:39:58 00:42:58

Error Summary
The COB finished successfully with no errors in the EB.EOD.ERROR.

Application Layer HA Tests


Kill of MS, AS and NM Processes On App Layer

Test Summary
The procedure used to execute the tests has been captured on the following file:

App_MS_AD_NM_kill.txt

Process killed Time

AS 10:47:46

NM1 10:50:02

NM2 10:52:25

NM3 10:55:31

NM4 10:58:15

MS1 11:01:15

MS2 11:06:35

MS3 11:12:01


MS4 11:18:33

NM1 11:30:36

NM2 11:32:49

NM3 11:35:30

NM4 11:37:01

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

AppLayer hit rate


Killing an MS seems to corrupt the counters AppDynamics is capturing. A logarithmic scale is
used in order to show that the app servers are still balanced once the killed managed server
recovers.


Response Time
Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 500, timeouts and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

Process killed Time JMeter Errors Adjusted Errors

Admin Server 10:47:46 0 0

NM1 10:50:02 0 0

NM2 10:52:25 0 0

NM3 10:55:31 0 0

NM4 10:58:15 0 0

MS1 11:01:15 14 6

MS2 11:06:35 2 2

MS3 11:12:01 24 7

MS4 11:18:33 11 5

NM1 11:30:36 0 0


NM2 11:32:49 0 0

NM3 11:35:30 0 0

NM4 11:37:01 0 0

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2996            4

Accounts           6000              5991            9

Cash deposits      3000              2993            7

Total missing                                        20

Adjusted missing                                     5 to 8

Given that 4 customers failed, we expect that to be followed by 8 accounts and 4 cash
deposits missing records. But there has been an additional 1 account and 3 cash deposits
missing records. Some of these cash deposit missing records may be legitimate as they may
have been caused by missing accounts. So these failure tests would have effectively
generated between 4+1=5 and 4+1+3 = 8 missing records.
Record of errors from JMeter on snapshot below.
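The adjusted-missing-records calculation used here and in the later tables can be sketched as a helper, assuming each created customer is followed by 2 accounts and 1 cash deposit:

```shell
#!/bin/sh
# min = missing customers + accounts not explained by the missing customers;
# max = min + cash deposits not explained by the missing customers.
adjusted_range() {
  cust=$1; acct=$2; dep=$3               # missing customers / accounts / deposits
  extra_acct=$((acct - 2 * cust))        # accounts beyond those implied by failed customers
  extra_dep=$((dep - cust))              # deposits beyond those implied by failed customers
  echo "$((cust + extra_acct)) to $((cust + extra_acct + extra_dep))"
}
```

For this table, adjusted_range 4 9 7 gives 5 to 8; for the DB node shutdown test later, adjusted_range 11 30 26 gives 19 to 34.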


Graceful Shutdown and Start of MS Processes on App Layer

Test Summary
The managed servers have been gracefully shut down and then started using the Weblogic
console (shut down when work complete). The relevant times have been extracted from the
managed server logs. The commands used are described on the attached file below:

App_MS_Shutdown.txt

Process / Action Req. Start Time

MS1 Shutdown 9:00:25

MS1 Start 9:08:08

MS2 Shutdown 9:13:27


MS2 Start 9:18:46

MS3 Shutdown 9:23:20

MS3 Start 9:26:34

MS4 Shutdown 9:31:49

MS4 Start 9:36:28

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

App Layer hit rate


When a server is restarted, the remaining ones pick up the load. The system stays balanced
all the time.
There is an odd dip that lasted about a couple of minutes around 9:32 when MS4 got shut
down. This dip has not been corroborated by the errors captured by JMeter. There was no
odd error spike around this time. However, errors captured by JMeter spread over 69 s. It is
very likely that requests got buffered.


Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Note the gap in requests that lasted about a couple of minutes around 9:32 when MS4 got
shut down, already discussed in the previous section.

Error Summary
Errors are HTTP 500, timeouts and some legitimate errors caused by the script limitation.


The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

Process / Action Req. Start Time Req. End Time JMeter errors Adjusted errors

MS1 Shutdown 9:00:25 9:01:31 11 5

MS1 Start 9:08:08 9:09:51 2 2

MS2 Shutdown 9:13:27 9:14:32 14 5

MS2 Start 9:18:46 9:20:12 0 0

MS3 Shutdown 9:23:20 9:24:23 2 2

MS3 Start 9:26:34 9:28:00 0 0

MS4 Shutdown 9:31:49 9:33:00 30 9

MS4 Start 9:36:28 9:37:58 14 2

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2996            4

Accounts           6000              5981            19

Cash deposits      3000              2989            11

Total missing                                        34

Adjusted missing                                     15 to 22

Given that 4 customers failed, we expect that to be followed by 8 accounts and 4 cash
deposits missing records. But there has been an additional 11 accounts and 7 cash deposits
missing records. Some of these cash deposit missing records may be legitimate as they may
have been caused by missing accounts. So these failure tests would have effectively
generated between 4+11=15 and 4+11+7 = 22 missing records.
Record of errors from JMeter on attached file below.

(Attached Microsoft Excel worksheet)


Restart of App Layer VM Nodes

Test Summary
Servers have been restarted using the Oracle Cloud Console. Times have been captured
from a clock prior to confirming request.

VM Restarted Time

VM1 (admin & MS1) restart 10:40

VM2 (MS2) restart 11:03

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

App Layer hit rate


The behaviour is as expected. When a server is restarted, the remaining ones pick up the
load. The system stays balanced all the time.


Response Time

Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 500, timeouts and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.


VM Restarted Time JMeter errors Adjusted errors

VM1 (admin & MS1) restart 10:41 10 6

VM1 back up 10:54 12 3

VM2 (MS2) restart 11:03 4 1

VM2 back up 11:13 5 3

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2999            1

Accounts           6000              5993            7

Cash deposits      3000              2996            4

Total missing                                        12

Adjusted missing                                     6 to 9

Given that one customer failed, we expect that to be followed by 2 accounts and one deposit
missing records. But there has been an additional 5 accounts and 3 cash deposits missing
records. Some of these cash deposit missing records may be legitimate as they may have
been caused by missing accounts. So these failure tests would have effectively generated
between 1+5=6 and 1+5+3 = 9 missing records.
Record of errors from JMeter on snapshot below.


Web Layer HA Tests


Kill of MS, AS and NM Processes On Web Layer

Test Summary
The procedure used to execute the tests has been captured on the following file:

Web_MS_AD_NM_kill.txt

Process killed Time

AS 14:31:31

NM1 14:33:04

NM2 14:35:15

NM3 14:38:15

NM4 14:44:28

MS1 14:47:44

MS2 14:54:06

MS3 15:00:59

MS4 15:06:32


AS 15:10:45

NM1 15:12:31

NM2 15:14:34

NM3 15:16:57

NM4 15:19:25

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

Web Layer hit rate


Behaviour is as expected. See baseline equivalent section for more details.


Response Time
Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 404, 500 and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

Process killed Time JMeter errors Adjusted errors

Admin Server 14:31:31 0 0

NM1 14:33:04 0 0

NM2 14:35:15 0 0

NM3 14:38:15 0 0

NM4 14:44:28 0 0

MS1 14:47:44 2 2

MS2 14:54:06 9 2

MS3 15:00:59 1 1

MS4 15:06:32 9 2

Admin Server 15:10:45 0 0


NM1 15:12:31 0 0

NM2 15:14:34 0 0

NM3 15:16:57 0 0

NM4 15:19:25 0 0

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2998            2

Accounts           6000              5996            4

Cash deposits      3000              2998            2

Total missing                                        8

Adjusted missing                                     2

Given that 2 customers failed, we expect that to be followed by 4 accounts and 2 cash
deposits missing records. So these failure tests would have effectively generated a total of 2
missing records.
Record of errors from JMeter on snapshot below.


Graceful Shutdown and Start of MS Processes on Web Layer

Test Summary
The managed servers have been gracefully shut down and then started using the Weblogic
console (shut down when work complete option). The relevant times have been extracted
from the managed server logs. The commands used are described on the file below:

Web_MS_Shutdown.txt

Process / Action Req. Start Time

MS1 Shutdown 15:57:53

MS1 Start 16:03:17

MS2 Shutdown 16:08:53

MS2 Start 16:13:17

MS3 Shutdown 16:18:14

MS3 Start 16:24:31

MS4 Shutdown 16:30:03

MS4 Start 16:36:32

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction


Web Layer hit rate


Behaviour is as expected. See baseline equivalent section for more details.

Response Time
Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 404, 500 and some legitimate errors caused by the script limitation.


The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

Process / Action Req. Start Time Req. End Time JMeter errors Adjusted errors

MS1 Shutdown 15:57:53 15:57:56 10 1

MS1 Start 16:03:17 16:04:43 0 0

MS2 Shutdown 16:08:53 16:08:56 1 1

MS2 Start 16:13:17 16:14:37 0 0

MS3 Shutdown 16:18:14 16:20:21 11 6

MS3 Start 16:24:31 16:25:47 0 0

MS4 Shutdown 16:30:03 16:32:10 13 5

MS4 Start 16:36:32 16:37:43 0 0

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2997            3

Accounts           6000              5994            6

Cash deposits      3000              2996            4

Total missing                                        13

Adjusted missing                                     4

Given that 3 customers are missing, this would generate 6 accounts and 3 cash deposits
missing records. So these failure tests would have effectively generated 3+1=4 missing
records. The other missing records are due to a script limitation.
Record of errors from JMeter on snapshot below.


Kill OHS Processes on Web Layer

Test Summary
The procedure used to execute the tests has been captured on the following file:

OHS_kill.txt

Process killed Time

OHS_2 21:13

OHS_3 21:23

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s


 Average Bytes per transaction

Web Layer hit rate


Behaviour is as expected. See baseline equivalent section for more details.

Response Time
Below is the response time for every transaction. The Y-axis is in ms.


Error Summary
Errors are security violations but some of them are legitimate errors caused by the script
limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

Process killed Time JMeter errors Adjusted errors

OHS_2 21:13 9 2

OHS_3 21:23 2 2

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2999            1

Accounts           6000              5998            2

Cash deposits      3000              2999            1

Total missing                                        4

Adjusted missing                                     1


Given that 1 customer failed, we expect that to be followed by 2 accounts and 1 cash deposit
missing record. So these failure tests would have effectively generated a total of 1 missing
record.
Record of errors from JMeter on snapshot below:

Graceful Restart of OHS

Test Summary
Using OEM, one can gracefully restart OHS. There has been no impact on traffic. The OHS
log below confirms that the restart actually happened.
==================First restart================


[2016-10-13T16:19:52.8833+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c] [host_id:


weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27427] [tid: 140677071062848] [user:
oracle] [VirtualHost: main]  SIGUSR1 received.  Doing graceful restart
[2016-10-13T16:19:54.0619+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c]
[host_id: weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] [tid:
140677071062848] [user: oracle] [VirtualHost: main]  Oracle-HTTP-Server-12c/12.1.3.0.0
(Unix) mod_ssl/12.1.3.0.0 OtherSSL/0.0.0 mod_plsql/11.1.1.0.0 configured -- resuming
normal operations

==================Second restart================
[2016-10-13T16:28:53.0747+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c]
[host_id: weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] [tid:
140677071062848] [user: oracle] [VirtualHost: main]  SIGUSR1 received.  Doing graceful
restart
[2016-10-13T16:28:54.0605+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c] [host_id:
weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] [tid: 140677071062848] [user:
oracle] [VirtualHost: main]  Oracle-HTTP-Server-12c/12.1.3.0.0 (Unix) mod_ssl/12.1.3.0.0
OtherSSL/0.0.0 mod_plsql/11.1.1.0.0 configured -- resuming normal operations
=================================

Shutdown and Start of OHS Processes on Web Layer

Test Summary
The OHS servers have been shut down and then started using the OEM console. The times
are when the request was given on the console.

Process / Action Req. Start Time

OHS1 Shutdown 17:37

OHS1 Start 17:40

OHS2 Shutdown 17:46

OHS2 Start 17:50

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction


Web Layer hit rate


Behaviour is as expected. See baseline equivalent section for more details.

Response Time
Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 404, 500 and some legitimate errors caused by the script limitation.


The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

Process / Action Req. Start Time JMeter errors Adjusted errors

OHS1 Shutdown 17:37 16 5

OHS1 Start 17:40 0 0

OHS2 Shutdown 17:46 22 6

OHS2 Start 17:50 0 0

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2997            3

Accounts           6000              5993            7

Cash deposits      3000              2994            6

Total missing                                        16

Adjusted missing                                     4 to 7

Given that three customers failed, we expect that to be followed by 6 accounts and 3 deposit
missing records. But there has been an additional 1 account and 3 cash deposits missing
records. Some of these cash deposit missing records may be legitimate as they may have
been caused by missing accounts. So these failure tests would have effectively generated
between 3+1=4 and 3+1+3 = 7 missing records.
Record of errors from JMeter on snapshot below.


Restart of Web Layer VM Nodes

Test Summary
Servers have been restarted using the Oracle Cloud Console. Restart request times have
been captured from a clock prior to confirming request.

VM Restarted Time

VM1 (admin & MS1) restart 12:32

VM2 (MS2 & OHS) restart 12:46

VM4 (MS4) restart 13:02

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction


Web Layer hit rate


See baseline equivalent section for more details on OHS load balancing on web layer.
The lack of requests around 12:46, which lasted 2-3 minutes, occurred when VM2, hosting
both an MS and OHS, got restarted. This graph is from AppDynamics, but it is confirmed by
the response time graph from JMeter.
This dip is strange because traffic from JMeter should have been distributed by the other
OHS instance on VM3. The dip has not been corroborated by the errors captured by JMeter:
there has been no unusual spike in errors, and the number of missing records in the DB is not
unusually high. The second graph from JMeter, showing hit rate, suggests that JMeter
stopped sending requests for a few minutes.

JMeter Hit Rate


Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Note the lack of requests around 12:46 already discussed in the previous section.

Error Summary
Errors for the VM1 and VM4 restarts are HTTP 404.
Errors for the VM2 restart are HTTP 500, 502, proxy errors, failures to respond and some
legitimate errors caused by the script limitation. This server hosts OHS as well as MS2.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

VM Restarted Time JMeter errors Adjusted errors

VM1 (admin & MS1) restart 12:32 1 1

VM1 back up 12:39 0 0

VM2 (MS2 & OHS) restart 12:46 18 7

VM2 back up 12:50 0 0

VM4 (MS4) restart 13:02 10 4

VM4 back up 13:08 0 0

SQL Records inserted:


Record Type        Expected number   Actual number   Missing records

Customers          3000              2999            1

Accounts           6000              5994            6

Cash deposits      3000              2994            6

Total missing                                        13

Adjusted missing                                     5 to 10

Given that one customer failed, we expect that to be followed by 2 accounts and one deposit
missing records. But there has been an additional 4 accounts and 5 cash deposits missing
records. Some of these cash deposit missing records may be legitimate as they may have
been caused by missing accounts. So these failure tests would have effectively generated
between 1+4=5 and 1+4+5 = 10 missing records.
Record of errors from JMeter on snapshot below.


Data Layer HA Tests


Shutdown of DB Nodes

Test Summary
Each node was shut down by running “shutdown abort;” from SQL*Plus on that node. The
instance shuts down and then restarts automatically.

DB node shut down Time

Node 1 15:15

Node 2 15:25

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

App Layer hit rate


The app servers stayed very well balanced.


Response Time
Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 500, timeouts, duplicate OFS message and some legitimate errors caused
by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.


DB node shut down Time JMeter Errors Adjusted Errors

Node 1 15:15 46 11

Node 2 15:25 154 22

SQL Records inserted:

Record Type        Expected number   Actual number   Missing records

Customers          3000              2989            11

Accounts           6000              5970            30

Cash deposits      3000              2974            26

Total missing                                        67

Adjusted missing                                     19 to 34

Given that 11 customers failed, we expect that to be followed by 22 accounts and 11 deposits
missing records. But there has been an additional 8 accounts and 15 cash deposits missing
records. Some of these cash deposit missing records may be legitimate as they may have
been caused by missing accounts. So these failure tests would have effectively generated
between 11+8=19 and 11+8+15 = 34 missing records.
Record of failures from JMeter are in the file below:

(Attached Microsoft Excel worksheet)

Restart of Database VM Nodes

Test Summary
Servers have been restarted using the Oracle Cloud Console. Restart request times have
been captured from a clock prior to confirming request.

VM restarted Time

VM1 restart 16:45

VM2 restart 17:01


Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

App Layer hit rate


The app servers stayed very well balanced.


Response Time
Below is the response time for every transaction. The Y-axis is in ms.

Error Summary
Errors are HTTP 500, timeouts, duplicate OFS message and some legitimate errors caused
by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.

VM Restarted Time JMeter Errors Adjusted Errors

VM 1 16:45 69 16

VM 2 17:02 168 27

SQL Records inserted:

Record Type Expected number Actual number Missing records

Customers 3000 2980 20

Accounts 6000 5949 51

Cash deposits 3000 2969 31


Total missing                                        102

Adjusted missing                                     31 to 42

Given that 20 customers failed, we expect that to be followed by 40 accounts and 20 deposit
missing records. But there has been an additional 11 accounts and 11 cash deposits missing
records. Some of these cash deposit missing records may be legitimate as they may have
been caused by missing accounts. So these failure tests would have effectively generated
between 20+11=31 and 20+11+11 = 42 missing records.
Record of errors from JMeter are in file below:

(Attached Microsoft Excel worksheet)

COB HA Tests
COB with App Server VM Restart

Test Summary
A COB with one tSA on each of the four app servers was started. When the application stage
reached 67%, one of the app layer servers was restarted.

Result Summary
Start and end times extracted from the COMO files.

Start time   VM restart   COB stage           End time   Duration   Baseline duration

23:06:10     16:45        Application @ 67%   00:03:14   57 min     43-46 min

When App server 1 got restarted, the COB carried on with a total of 3 tSAs, one running on
each of the three servers that had not been restarted.

Error Summary
The COB finished successfully with no errors in the EB.EOD.ERROR.


COB with Database VM Restart

Test Summary
A COB with one tSA on each of the four app servers got started. When application stage
reached 90%, one of the app layer servers got restarted.


Result Summary

Start and end times extracted from the COMO files.

Start time   VM restart   COB stage           End time   Duration   Baseline duration

10:51:01     10:59        Application @ 90%   11:31:24   40 min     43-46 min

The tSA on App server 1 has been restarted manually at 11:09 so the COB could carry on
with 4 tSAs, one on each of the four app servers, as in the baseline design.

Error Summary
The COB finished successfully with no errors in the EB.EOD.ERROR.

DR Tests with Online Traffic


Site Switchover

Test Summary
DB Switchover                             Load balancer switchover
Start      End        Duration           Start    End
11:59:59   12:01:13   00:01:14           12:00    instantaneous

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

App Layer hit rate


There are 4 managed servers on live, each one handling a quarter of the requests, and only
one managed server on DR handling all requests (red line).


Response Time
Below is the response time for every transaction. The Y-axis is in ms.
The downtime is from 11:59:49 to 12:03:12.

Error Summary
Errors are session timeouts.

JMeter errors capture: error count and missing records

Start      End        Downtime   Error count   Missing DB records
11:59:49   12:03:12   00:03:23   2613          1170
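The downtime column is simply the difference between the error-window timestamps; a quick way to check it:

```shell
#!/bin/sh
# Convert hh:mm:ss to seconds, subtract, and print the result as hh:mm:ss.
to_seconds() { echo "$1" | awk -F: '{print $1*3600 + $2*60 + $3}'; }

start="11:59:49"
end="12:03:12"
diff=$(( $(to_seconds "$end") - $(to_seconds "$start") ))
printf '%02d:%02d:%02d\n' $((diff / 3600)) $((diff % 3600 / 60)) $((diff % 60))
```

For this window the result is 00:03:23, matching the table above.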

Record of errors from JMeter in attached file below.


(Attached Microsoft Excel worksheet)

Site Failover

Test Summary
DB Kill     Failover                            Load balancer switchover
Time        Start      End        Downtime      Start      End
14:01:17    14:02:36   14:06:55   00:04:19      14:01:20   instantaneous

Result Summary
The table below shows the following:

 Transaction name
 Total number of transactions
 Average response time, minimum and maximum in ms
 Standard deviation and error proportion
 Throughput in req/s or req/min and bandwidth rate in kB/s
 Average Bytes per transaction

App Layer hit rate


There are 4 managed servers on live, each one handling a quarter of the requests, and only
one managed server on DR handling all requests (red line).


Response Time
Below is the response time for every transaction. The Y-axis is in ms.
The downtime is from 14:01:15 to 14:07:31.

Error Summary
Errors are session timeouts.

JMeter errors capture: error count and missing records

Start      End        Downtime   Error count   Missing DB records
14:01:15   14:07:31   00:06:16   4443          2007


Some of the downtime experienced by JMeter is deliberate, as we waited about a couple of
minutes between the DB kill command and the failover execution. Record of errors from
JMeter in attached file below.

(Attached Microsoft Excel worksheet)

DR Tests with COB


Site Switchover
We managed to complete a COB successfully after a switchover to DR site.

Start of COB on LIVE site 


tSAs Start Stop COB state Duration
2 09:10:47 09:21:33 App @ 61% 00:10:46

The COB on LIVE site used two app servers with one tSA each.

Site Switchover 
Start End DB Open State Downtime
09:24:08 09:26:08 09:26:26 00:02:18

Completion of COB on DR Site 


tSAs Start End Duration Total Duration
1 09:35:27 10:39:34 01:04:07 01:14:53

We had to use a different DBTools user than the one used in LIVE in order to create the UD
subdirectories. The user used in LIVE is locked.


Site Failover
We managed to complete a COB successfully after a failover to the DR site.

Start of COB on LIVE site

tSAs   Start      Stop (= DB kill)   COB state   Duration
2      14:33:43   14:48:30           App @ 85%   00:14:47

The COB on LIVE site used two app servers with one tSA each.

DB Kill Site Failover 


Time Start End Downtime
14:48:30 14:53:39 15:04:31 00:10:52

Completion of COB on DR site 


tSAs Start End Duration Total Duration
1 15:11:23 16:06:12 00:54:49 01:09:36

We had to use a different DBTools user than the one used in LIVE in order to create the UD
subdirectories. The user used in LIVE is locked.

Switching back to live site has been executed successfully too.

