No part of this document may be reproduced or transmitted in any form or by any means, for any purpose,
without the express written permission of TEMENOS HEADQUARTERS SA.
Table of Contents
Document History
Contributors
Temenos
Oracle
Trademark
References
Introduction
Executive Summary
HA Tests with Online Traffic
HA Tests with COB
DR Tests with Online Traffic
DR Tests with COB
Solution Deployment
Solution Description
Architecture Diagram
HA Design Considerations
DR Design Considerations
Timeouts Considerations
Software Deployed
Issues Identified and Fixes Applied
Issue 1: T24ConnectionFactory Load Balancing causing failures
Issue 2: Session replication of the BrowserWeb application is not working
Issue 3: Missing managed server start up argument on the app layer
Issue 5: Failures to cast to XML Type
Issue 6: tLockManager is corrupting the database
Issue 7: Node manager fails to restart an OHS process when killed
Issue 8: Running COB from servlet
Issue 9: Errors reported by JMeter are not confirmed by missing records in database
Issue 10: The Temenos logs produced with Weblogic don’t have the right permissions
Testing Approach
Test Data
HA Tests with Online Traffic
HA Tests with COB
DR Tests with Online Traffic
2 Competency Centre
T24 Reference Architecture Oracle Platform View
Document History
Comments:
0.2 – Applied the Temenos template to the document and made some other modifications
Contributors
Temenos
Name Role
Oracle
Name Role
Trademark
References
T24 TAFJ Runbook
Available under TAFJ_HOME/doc
Glossary
Acronym Description
Introduction
This document reports the results of the high availability (HA) and disaster recovery (DR)
testing carried out on the Temenos T24 banking product.
The T24 architecture tested uses JMS connectivity between the web and app layers and runs
on the Oracle stack.
Executive Summary
HA Tests with Online Traffic
The solution is highly available and recovers within a few seconds, but not without errors.
However, the error rate is very low: the error counts shown in the table below are to be
compared to a total of 33k transactions. Note that errors reported by JMeter reflect errors
that a real end user would experience.
During failures, the remaining servers were found to stay balanced.
The disturbance observed with a graceful shutdown of managed servers from the Weblogic
console was not expected.
The downtime caused by disturbances on the data layer may be reduced if an Active GridLink
data source is used. This will be investigated in the near future.
The numbers in the table are averages per test of the same kind. The “Disturbance time” in
the table is the time period during which JMeter registered errors.
App Layer
NM and AS restarts 0 0 0
MS process kill 5 2 6
VM Restart 7 5 17
Web Layer
NM and AS restarts 0 0 0
MS process kill 2 1 4
VM Restart 4 3 5
Data Layer
VM Restart 22 21 45
The COB times are to be compared to the baseline test with 4 tSAs, which took 43 min. The
first test, where an App Server VM was restarted, took longer, as expected, since the COB
continued with 3 tSAs.
For the second test, a tSA on one of the app servers had to be restarted manually so that the
test could carry on with 4 tSAs.
Site Switchover
DB Switchover Load balancer switchover
Start End Duration Start End
11:59:59 12:01:13 00:01:14 12:00 instantaneous
The “Downtime” in the table is the time period during which JMeter recorded errors. The JMeter
error counts include both read and write errors. Note that the downtime is higher than the site
switchover time.
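The durations reported in these tables are plain differences between wall-clock times. As a minimal sketch (the helper name `elapsed` is illustrative, and both times are assumed to fall on the same day), the DB switchover duration from the table above can be recomputed as follows:

```python
from datetime import datetime

def elapsed(start: str, end: str) -> str:
    """Return the HH:MM:SS difference between two same-day wall-clock times."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    total = int(delta.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"

# DB switchover start and end times taken from the table above
print(elapsed("11:59:59", "12:01:13"))  # 00:01:14
```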
Site Failover
DB Kill Failover Load balancer switchover
The “Downtime” in the table is the time period during which JMeter recorded errors. The JMeter
error counts include both read and write errors. The downtime here includes an additional
voluntary delay of about 2 minutes between killing the database and triggering the failover.
Site Switchover
Site Switchover
Start End DB Open State Downtime
09:24:08 09:26:08 09:26:26 00:02:18
tSAs Start Stop COB state Duration tSAs Start End Duration Total Duration
Site Failover
DB Kill Site Failover
Time Start End Downtime
14:48:30 14:53:39 15:04:31 00:10:52
tSAs Start Stop COB state Duration tSAs Start End Duration Total Duration
After the COB finished, the switch back to the live site was executed successfully and the
live database came back in sync with the DR database.
Solution Deployment
Solution Description
The solution has been deployed to be highly available using Oracle Cloud Service in a
three-tiered architecture: Web layer, App layer and Data layer. The app and web layers are
Weblogic clusters, each made of four managed servers. The data layer is an Oracle RAC
database with two nodes. See the architecture diagram in the next section.
The web layer also has two Oracle HTTP Server (OHS) instances, collocated with web MS2
and web MS3. These OHS instances have been configured to forward requests to all
managed servers on the web layer. One benefit of using OHS is that the instances can be
configured to target the cluster. Also, enabling the dynamic list makes it possible to scale up
the environment without having to restart anything. The other benefit is that, with replication
enabled, the OHS instances favour the primary session when distributing requests. The
drawback is that the web servers may not be very well balanced when the number of sessions
is low, but throughput is optimised.
A load balancer has also been configured to distribute requests to both OHS instances in a
round-robin (RR) fashion.
For the DR site to work, an infrastructure database has been added to host the schemas
required by Oracle technology. The RAC database contains the T24 schema only. The database
on the DR site is kept in sync with the database on the live site using Oracle Data Guard
technology.
Architecture Diagram
HA Design Considerations
Web Layer
The HA at the web layer has been achieved as follows:
The Temenos code is deployed in a Weblogic cluster of four managed servers. The
code has been deployed with replication enabled. This means there is no need to
configure sticky sessions at the load balancer.
App Layer
The HA at the app layer has been achieved as follows:
The Temenos code is deployed in a Weblogic cluster of four managed servers.
Requests from the Web layer are transferred to app servers using JMS technology.
o The browser request queue is configured to be uniformly distributed
(WebLogic Uniform Distributed Queue), accessed from the web layer using a
Foreign JMS server. The latter has been configured to point to all app servers
to ensure high availability.
o The JMS connection factory has load balancing enabled, which ensures
requests are load balanced among all four app servers.
o A distributed reply queue was found to generate errors. Instead, four local
browser reply queues with local JNDI names have been configured. This setup
required a fix, which is described in the section “Issue 1:
T24ConnectionFactory Load Balancing causing failures”.
o A JMS server on each of the app servers has been configured for high
availability and load balancing.
The Temenos shared libraries (TAFJ runtime and T24) have been installed on each
app server. A reliable shared storage was not available.
Data Layer
The HA at the data layer has been achieved as follows:
A RAC database with two nodes
The app layer uses a URL with SCAN addresses to point to the database
A generic data source type has been used.
An Active GridLink data source for fast failover has not been tested yet, but testing is
planned. It may reduce the downtime for disturbances on the data layer.
DR Design Considerations
The LIVE and DR sites have been configured in Active/Standby mode.
An infrastructure database has been added to store the Weblogic schemas. The RAC
database described in the section “HA Design Considerations” contains the T24 database
only. This database is kept in sync with the DR RAC database using Oracle Data Guard
technology.
Timeouts Considerations
Timeouts exist at various levels and need to be consistently configured.
ConnectionTimeout defined in BrowserParameters.xml located in BrowserWeb.war
archive, currently set to 60s
Software Deployed
Temenos
TAFJ R16 SP1
TAFJ Java Functions Version in DB: PB201510 08/18/2015
T24 R16 AMR
Infrastructure
JDK 1.7.0
Oracle Linux Release 6.6
Weblogic server version 12.1.3.0.160419
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
Issues Identified and Fixes Applied
Issue 1: T24ConnectionFactory Load Balancing causing failures
Issue Description
While generating online traffic with many concurrent users, a lot of errors occurred. The errors
have been traced to JMS exceptions resulting from timeouts while waiting for replies on the
T24BrowserReplyQueue.
We found that enabling affinity and disabling load balancing on the T24ConnectionFactory
gets rid of the errors. However, the application layer is then not balanced. We also found that,
with affinity set, a given web server has its requests handled by one given app server. This
means that, in case of an app server failure, all requests would be redirected to one single
app server, which may create overload situations.
Note that when affinity is enabled, the load balancing option becomes redundant.
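The imbalance described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the server names, the request count and the modulo mapping used to pick a surviving app server are all hypothetical, and the code does not reproduce WebLogic's actual JMS routing.

```python
def distribute(web_servers, app_servers, requests_per_web, affinity):
    """Toy model: return the number of requests handled by each app server."""
    load = {app: 0 for app in app_servers}
    for i, _web in enumerate(web_servers):
        if affinity:
            # With affinity, one web server sends everything to one app server
            load[app_servers[i % len(app_servers)]] += requests_per_web
        else:
            # Without affinity, each request goes to the next app server in turn
            for r in range(requests_per_web):
                load[app_servers[(i + r) % len(app_servers)]] += 1
    return load

webs = ["web1", "web2", "web3", "web4"]
surviving_apps = ["app2", "app3", "app4"]  # hypothetical: app1 has failed

print(distribute(webs, surviving_apps, 300, affinity=True))
print(distribute(webs, surviving_apps, 300, affinity=False))
```

In this model, with affinity one surviving app server carries the traffic of two web servers, whereas without affinity the load is spread evenly over the three survivors.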
Recommended Solution
Replace the uniformly distributed reply queue with global JNDI name by local reply queues
with local JNDI names. The local JNDI names have to be the same on all app servers,
namely “jms/t24BrowserReplyQueue”.
This solution requires updating the ejb-jar.xml in the TAFJEE_MDB.jar as follows:
<message-driven>
  <display-name>Transacted Listener MDB for BROWSER</display-name>
  <ejb-name>BROWSERTransactedMDB</ejb-name>
  <ejb-class>com.temenos.tafj.mdb.TransactedMDB</ejb-class>
  <messaging-type>javax.jms.MessageListener</messaging-type>
  <transaction-type>Container</transaction-type>
  <message-destination-type>javax.jms.Queue</message-destination-type>
  <env-entry>
    <description>Enable jmsReplyTo feature of an MDB </description>
    <env-entry-name>com.temenos.tafj.mdb.TransactedMDB/sendToJmsReplyTo</env-entry-name>
    <env-entry-type>java.lang.Boolean</env-entry-type>
    <env-entry-value>true</env-entry-value>
  </env-entry>
  <ejb-local-ref>
  .
  .
  .
</message-driven>
Issue 2: Session replication of the BrowserWeb application is not working
Issue Description
When running traffic with sticky session disabled on the load balancer, errors are generated.
Recommended Solution
The following has been added to the weblogic.xml in the BrowserWeb.war file in order to
enable replication:
<session-descriptor>
<persistent-store-type>replicated_if_clustered</persistent-store-type>
</session-descriptor>
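To confirm that a given weblogic.xml actually carries this setting, a small check can be run against the descriptor. The sketch below is an assumption-laden illustration: the helper name and the sample document are made up, and the namespace handling is there because a real weblogic.xml usually declares an XML namespace on its root element.

```python
import xml.etree.ElementTree as ET

def persistent_store_type(weblogic_xml: str):
    """Return the text of <persistent-store-type>, or None if it is absent."""
    root = ET.fromstring(weblogic_xml)
    for element in root.iter():
        # Strip a namespace prefix such as "{http://...}" if present
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "persistent-store-type":
            return (element.text or "").strip()
    return None

# Hypothetical minimal descriptor used only for illustration
sample = """<weblogic-web-app>
  <session-descriptor>
    <persistent-store-type>replicated_if_clustered</persistent-store-type>
  </session-descriptor>
</weblogic-web-app>"""

print(persistent_store_type(sample))
```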
Issue 3: Missing managed server start up argument on the app layer
Issue Description
The following error occurs:
####<Aug 18, 2016 4:51:19 PM UTC> <Error> <HTTP> <applayer-live-wls-2.compute-
temoarch.oraclecloud.internal> <AppLayer_server_2> <[ACTIVE] ExecuteThread: '13' for queue:
'weblogic.kernel.Default (self-tuning)'> <<WLS Kernel>> <> <7b22a89e-fb5f-4dfa-820a-3079ff7cdcf0-
00007d40><1471539079942><BEA-101019>
<[ServletContext@2093712906[app:bea_wls_cluster_internal
module:bea_wls_cluster_internal.war path:null spec-version:3.0]] Servlet failed with an IOException.
java.io.NotSerializableException: com.sun.jersey.server.impl.cdi.CDIExtension
Recommended Solution
The issue and its fix are described in the following support knowledge article:
https://support.oracle.com/epmos/faces/DocumentDisplay?id=1490080.1
Add the following property to the startup argument list of the app layer managed servers:
-Dcom.sun.jersey.server.impl.cdi.lookupExtensionInBeanManager=true
Issue 5: Failures to cast to XML Type
Issue Description
The following errors have been observed:
[ERROR] 2016-08-21 12:50:53,990 [[ACTIVE] ExecuteThread: '14' for queue: 'weblogic.kernel.Default
(self-tuning)'] DATABASE - JDBC Read : Failed to cast directly to XMLType. Using OPAQUE
java.lang.NullPointerException
at com.temenos.tafj.dataaccess.specific.OracleSpecific.read(OracleSpecific.java:255)
at com.temenos.tafj.dataaccess.jTable.readWithFlags(jTable.java:970)
at com.temenos.tafj.dataaccess.jTable.read(jTable.java:822)
at com.temenos.tafj.dataaccess.JDBCDataAccessConductor.readWithFlags
(…)
Recommended Solutions
Apply TAFJ SP1 and add the following property to tafj.properties file:
temn.tafj.jdbc.use.sqlxml.resultset = false
Issue 6: tLockManager is corrupting the database
Issue Description
During COB tests, the connection to tLockManager was lost, for an unknown reason, in the
Oracle cloud environment that was used. As a result, only a few lines were written to the
database, leading to data corruption or loss.
Recommended Solutions
A TAFJ fix has been released in TAFJ R16 SP3.
With this fix, when a connection to tLockManager is lost, a runtime exception is raised, which
rolls back all the database writes.
Issue 7: Node manager fails to restart an OHS process when killed
Issue Description
During HA testing, it was found that Node Manager fails to restart the OHS process when it is
killed with a kill command.
Recommended Solutions
Increase “RestartDelaySeconds” to 10s and “RestartMax” to 5 in the “startup.properties” file
located at the path below, and then restart Node Manager.
/u01/data/domains/WebLayer_domain/system_components/OHS/OHS_2/data/nodemanager/startup.properties
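The change to the properties file can also be scripted. The following is a minimal sketch that works on the file content as text; the function name and the sample content (including its initial values) are hypothetical and only serve to show the key=value rewrite.

```python
def set_properties(text: str, updates: dict) -> str:
    """Return properties text with the given keys set, appending missing ones."""
    remaining = dict(updates)
    out = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_property = "=" in line and not line.lstrip().startswith("#")
        if is_property and key in remaining:
            out.append(f"{key}={remaining.pop(key)}")
        else:
            out.append(line)
    # Keys that were not present in the file are appended at the end
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"

# Hypothetical existing file content, for illustration only
sample = "RestartMax=2\nRestartDelaySeconds=0\nAutoRestart=true\n"
print(set_properties(sample, {"RestartDelaySeconds": 10, "RestartMax": 5}))
```

Node Manager still has to be restarted afterwards for the new values to take effect, as stated above.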
Issue 8: Running COB from servlet
Issue Description
COB could be run in classic mode but not from servlet.
Recommended Solution
Add the following argument to the server start arguments of every MS on Weblogic:
-Dhostname=<hostname>
Issue 9: Errors reported by JMeter are not confirmed by missing records in database
Issue Description
During some baseline tests, JMeter reported some write errors that were not confirmed
by the database. This kind of mismatch creates an unnecessary negative user experience.
JMeter errors have been matched to the following kind of error message in the App server logs:
<Oct 20, 2016 2:02:36 PM UTC> <Error> <EJB> <BEA-010026> <Exception occurred during commit of
transaction Name=[EJB
com.temenos.tafj.mdb.TransactedMDB.onMessage(javax.jms.Message)],Xid=BEA1-
1416406F23FC17BCE447(790252712),Status=Rolled back.
[Reason=weblogic.transaction.internal.TimedOutException: Transaction timed out after 29 seconds
Recommended Solutions
See section “Timeouts Considerations” for more details.
The following is to be implemented:
The timeout specified in the ejb-jar.xml of the TAFJEE_EJB.jar has to be shorter than
the JTA timeout on Weblogic. This timeout has been decreased from 300s to 30s. It is
unclear why it was set so high initially.
The JTA timeout has been increased from 30s to 40s. This timeout has to be
increased at both cluster and domain level because of a bug. The relevant patch
cannot be applied on the cloud environment as it is a trial account; see the link below:
o https://support.oracle.com/epmos/faces/DocumentDisplay?id=2180843.1
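The ordering constraint between the two timeouts can be expressed as a simple check. This is a sketch only: the function name is illustrative, and it covers just the rule stated above (EJB timeout shorter than JTA timeout), not any other timeout in the stack.

```python
def timeouts_consistent(ejb_timeout_s: int, jta_timeout_s: int) -> bool:
    """True when the EJB timeout is strictly shorter than the JTA timeout."""
    return ejb_timeout_s < jta_timeout_s

# Values before the fix: EJB 300s, JTA 30s
print(timeouts_consistent(300, 30))  # False
# Values after the fix: EJB 30s, JTA 40s
print(timeouts_consistent(30, 40))   # True
```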
Issue 10: The Temenos logs produced with Weblogic don’t have the right permissions
Issue Description
The Temenos logs produced with Weblogic don’t have the right permissions. “Permission
denied” errors are generated when working as the temenos user in classic mode and executing
the tRun command.
The file permission is as follows:
-rw-r----- 1 oracle oracle 30512 Oct 20 14:27 database.log
The temenos user is configured to be part of the oracle group, so rw permission is required
for the oracle group.
Recommended Solutions
The following has been changed:
Change umask value to 002 for the NM script located here:
$WL_HOME/server/bin/startNodeManager.sh
Change umask value to 002 for following script
$DOMAIN_HOME/bin/startWeblogic.sh
Restart NM by running $DOMAIN_HOME/bin/startNodeManager.sh
Restart Managed Server
The file permissions have changed to become rw for the oracle group as required.
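The effect of the umask on the log file permissions can be worked out as follows. This is a sketch resting on two assumptions that the document does not state: that the logger requests mode 0666 when creating files, and that the previous umask was 027 (which is consistent with the `-rw-r-----` listing above).

```python
import stat

def new_file_mode(umask: int, requested: int = 0o666) -> str:
    """Return the ls-style permission string of a regular file created under umask."""
    mode = requested & ~umask
    return stat.filemode(stat.S_IFREG | mode)

print(new_file_mode(0o027))  # -rw-r----- (assumed previous umask)
print(new_file_mode(0o002))  # -rw-rw-r-- (umask applied by the fix)
```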
Testing Approach
Test Data
Temenos Model Bank
Tools
JMeter and AppDynamics
Test Validation
The JMeter scripts have robust response assertions. In addition, at the end of every test run,
the following SQL scripts will be executed against the database to count the total number of
records that have been inserted:
Select count(*) from FBNK.CUSTOMER; (there are none before the run)
Select count(*) from FBNK.ACCOUNT; (there are none before the run)
Select count(*) from FBNK.TELLER; (there are 16 before the run)
Test Execution
Killing of Processes
While traffic is running, the following commands will be executed:
MS1:
o date;pgrep -lf java | awk '/AppLayer_admin/ {print $1,$8}'
o date;pgrep -lf java | awk '/AppLayer_server/ {print $1,$8}'
o date;pgrep -lf java | awk '/NodeManager/ {print $1,$(NF-1)}'
o date;kill -9 <process Id>
MS2, MS3 and MS4
o date;pgrep -lf java | awk '/AppLayer_server/ {print $1,$4}'
o date;pgrep -lf java | awk '/NodeManager/ {print $1,$(NF-1)}'
o date;kill -9 <process Id>
Similar commands are used for the Web layer, replacing App with Web.
Graceful Shutdown of MS
While traffic is running, use Weblogic console to gracefully shut down and then start the
managed servers one at a time.
DB Node Shutdown
While traffic is running, the following command line will be used to shut down the DB service:
o SQL>shutdown abort;
Restart of VMs
While traffic is running, use Oracle Cloud Console to restart the relevant box.
Baseline Tests
Baseline Test with Online Traffic
Results Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
The initial spike on the response time is likely due to the system warming up.
The 95th percentile is less than 0.9s for all transactions.
Infrastructure Resources
CPU
No issue with CPU.
Memory
No issue with memory.
JDBC Connections
The connection pools were large enough.
Threads
Validation
The execution of the relevant SQL queries at the end of the run showed that the expected
number of records had been inserted. The test run created the expected number of
customers (3000), accounts (6000) and cash deposits (3000).
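The validation amounts to comparing the row counts after the run with the counts before the run plus the expected inserts. A minimal sketch follows, assuming that cash deposits are the records counted in FBNK.TELLER; the dictionary layout and function name are illustrative, and the counts would come from the SQL queries listed under Test Validation.

```python
# Row counts before the run, as stated under Test Validation
BASELINE = {"CUSTOMER": 0, "ACCOUNT": 0, "TELLER": 16}
# Records the baseline run is expected to create
EXPECTED_NEW = {"CUSTOMER": 3000, "ACCOUNT": 6000, "TELLER": 3000}

def missing_records(counts_after: dict) -> dict:
    """Return, per table, how many expected records are missing after a run."""
    return {
        table: BASELINE[table] + EXPECTED_NEW[table] - counts_after[table]
        for table in BASELINE
    }

# Counts matching a fully successful baseline run
print(missing_records({"CUSTOMER": 3000, "ACCOUNT": 6000, "TELLER": 3016}))
```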
Test Summary
A COB with one tSA on each of the four app servers has been executed using the TAFJEE
servlet.
Result Summary
Start and end times extracted from the COMO files.
Error Summary
The COB finished successfully with no errors in the EB.EOD.ERROR.
Test Summary
The procedure used to execute the tests has been captured on the following file:
App_MS_AD_NM_kill.txt
AS 10:47:46
NM1 10:50:02
NM2 10:52:25
NM3 10:55:31
NM4 10:58:15
MS1 11:01:15
MS2 11:06:35
MS3 11:12:01
MS4 11:18:33
NM1 11:30:36
NM2 11:32:49
NM3 11:35:30
NM4 11:37:01
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 500, timeouts and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.
NM1 10:50:02 0 0
NM2 10:52:25 0 0
NM3 10:55:31 0 0
NM4 10:58:15 0 0
MS1 11:01:15 14 6
MS2 11:06:35 2 2
MS3 11:12:01 24 7
MS4 11:18:33 11 5
NM1 11:30:36 0 0
NM2 11:32:49 0 0
NM3 11:35:30 0 0
NM4 11:37:01 0 0
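The document does not spell out the exact adjustment rule, so the following is only one possible reading of it: failures that follow a failed login within the same login/logoff session are treated as expected and are not counted again. The sample labels and data are hypothetical.

```python
def adjusted_error_count(samples):
    """Count errors for one JMeter thread, ignoring failures that follow a failed login.

    samples: ordered (label, success) pairs for a single thread.
    """
    adjusted = 0
    login_failed = False
    for label, success in samples:
        if label == "login":
            # A new session starts; a failed login counts as one error
            login_failed = not success
            if login_failed:
                adjusted += 1
        elif not success and not login_failed:
            # Failure in a session whose login succeeded: a genuine error
            adjusted += 1
    return adjusted

# Hypothetical sample sequence: one failed session, then one partial failure
samples = [
    ("login", False), ("customer", False), ("account", False), ("logoff", False),
    ("login", True), ("customer", True), ("account", False), ("logoff", True),
]
raw_errors = sum(1 for _label, ok in samples if not ok)
print(raw_errors, adjusted_error_count(samples))  # 5 2
```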
Test Summary
The managed servers have been gracefully shut down and then started using the Weblogic
console (shut down when work complete). The relevant times have been extracted from the
managed server logs. The commands used are described on the attached file below:
App_MS_Shutdown.txt
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
There is an odd gap in requests, lasting a couple of minutes around 9:32, when MS4 was
shut down. This dip is not corroborated by the errors captured by JMeter: there was no
error spike around this time. However, the errors captured by JMeter are spread over 69 s.
It is very likely that requests got buffered.
Error Summary
Errors are HTTP 500, timeouts and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.
Process / Action Req. Start Time Req. End Time JMeter errors Adjusted errors
Total missing 34
Adjusted missing 16 to 22
Given that 4 customers failed, we expect that to be followed by 8 accounts and 4 cash
deposits missing records. But there has been an additional 11 accounts and 7 cash deposits
missing records. Some of these cash deposit missing records may be legitimate as they may
have been caused by missing accounts. So these failure tests would have effectively
generated between 4+11=15 and 4+11+7 = 22 missing records.
Record of errors from JMeter: see the attached Microsoft Excel worksheet.
Test Summary
Servers have been restarted using the Oracle Cloud Console. Times have been captured
from a clock prior to confirming request.
VM Restarted Time
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 500, timeouts and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.
Total missing 12
Adjusted missing 6 to 9
Given that one customer failed, we expect that to be followed by 2 accounts and one deposit
missing records. But there has been an additional 5 accounts and 3 cash deposits missing
records. Some of these cash deposit missing records may be legitimate as they may have
been caused by missing accounts. So these failure tests would have effectively generated
between 1+5=6 and 1+5+3 = 9 missing records.
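The arithmetic behind the adjusted range can be written down explicitly. The sketch below assumes, as the text does, that each customer is followed by 2 accounts and 1 cash deposit; the function name and return layout are illustrative.

```python
ACCOUNTS_PER_CUSTOMER = 2
DEPOSITS_PER_CUSTOMER = 1

def adjusted_missing(failed_customers, missing_accounts, missing_deposits):
    """Return (total missing, adjusted lower bound, adjusted upper bound)."""
    # Records that are missing only because their customer was never created
    expected_accounts = failed_customers * ACCOUNTS_PER_CUSTOMER
    expected_deposits = failed_customers * DEPOSITS_PER_CUSTOMER
    extra_accounts = missing_accounts - expected_accounts
    extra_deposits = missing_deposits - expected_deposits
    total = failed_customers + missing_accounts + missing_deposits
    # Lower bound: extra deposits are all explained by missing accounts
    low = failed_customers + extra_accounts
    # Upper bound: every extra deposit is a genuine failure
    high = low + extra_deposits
    return total, low, high

# 1 failed customer, 2+5 missing accounts, 1+3 missing cash deposits
print(adjusted_missing(1, 7, 4))  # (12, 6, 9)
```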
Record of errors from JMeter on snapshot below.
Test Summary
The procedure used to execute the tests has been captured on the following file:
Web_MS_AD_NM_kill.txt
AS 14:31:31
NM1 14:33:04
NM2 14:35:15
NM3 14:38:15
NM4 14:44:28
MS1 14:47:44
MS2 14:54:06
MS3 15:00:59
MS4 15:06:32
AS 15:10:45
NM1 15:12:31
NM2 15:14:34
NM3 15:16:57
NM4 15:19:25
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 404, 500 and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.
NM1 14:33:04 0 0
NM2 14:35:15 0 0
NM3 14:38:15 0 0
NM4 14:44:28 0 0
MS1 14:47:44 2 2
MS2 14:54:06 9 2
MS3 15:00:59 1 1
MS4 15:06:32 9 2
NM1 15:12:31 0 0
NM2 15:14:34 0 0
NM3 15:16:57 0 0
NM4 15:19:25 0 0
Total missing 8
Adjusted missing 2
Given that 2 customers failed, we expect that to be followed by 4 accounts and 2 cash
deposits missing records. So these failure tests would have effectively generated a total of 2
missing records.
Record of errors from JMeter on snapshot below.
Test Summary
The managed servers have been gracefully shut down and then started using the Weblogic
console (shut down when work complete option). The relevant times have been extracted
from the managed server logs. The commands used are described on the file below:
Web_MS_Shutdown.txt
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 404, 500 and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.
Process / Action Req. Start Time Req. End Time JMeter errors Adjusted errors
Total missing 13
Adjusted missing 4
Given that 3 customers are missing, this would generate 6 accounts and 3 cash deposits
missing records. So these failure tests would have effectively generated 3+1=4 missing
records. The other missing records are due to a script limitation.
Record of errors from JMeter on snapshot below.
Test Summary
The procedure used to execute the tests has been captured on the following file:
OHS_kill.txt
OHS_2 21:13
OHS_3 21:23
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are security violations but some of them are legitimate errors caused by the script
limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails then
all subsequent transactions until logoff are expected to fail. So the error count has been
adjusted on this table.
OHS_2 21:13 9 2
OHS_3 21:23 2 2
Total missing 4
Adjusted missing 1
Given that 1 customer failed, we expect that to be followed by 2 account and 1 cash deposit
missing records. So these failure tests would have effectively generated a total of 1 missing
record.
Record of errors from JMeter on snapshot below:
Test Summary
OHS can be restarted gracefully using OEM. There has been no impact. The OHS log
confirms that the restarts actually happened.
==================First restart================
==================Second restart================
[2016-10-13T16:28:53.0747+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c] [host_id: weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] [tid: 140677071062848] [user: oracle] [VirtualHost: main] SIGUSR1 received. Doing graceful restart
[2016-10-13T16:28:54.0605+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c] [host_id: weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] [tid: 140677071062848] [user: oracle] [VirtualHost: main] Oracle-HTTP-Server-12c/12.1.3.0.0 (Unix) mod_ssl/12.1.3.0.0 OtherSSL/0.0.0 mod_plsql/11.1.1.0.0 configured -- resuming normal operations
=================================
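A check of this kind can be automated. The sketch below scans log text for the "Doing graceful restart" message shown above and returns the matching timestamps. It assumes that each entry in the actual log file occupies a single line and starts with a bracketed timestamp; the function name is hypothetical.

```python
import re
from datetime import datetime
from typing import List

# Matches the leading "[YYYY-MM-DDTHH:MM:SS" part of a log entry.
TIMESTAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")


def graceful_restarts(log_text: str) -> List[datetime]:
    """Return timestamps (to the second) of entries announcing a graceful restart."""
    found = []
    for entry in log_text.splitlines():
        if "Doing graceful restart" not in entry:
            continue
        match = TIMESTAMP.match(entry)
        if match:
            found.append(datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S"))
    return found


# The two entries from the second restart, one entry per line.
sample = (
    "[2016-10-13T16:28:53.0747+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c] "
    "[host_id: weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] "
    "[tid: 140677071062848] [user: oracle] [VirtualHost: main] "
    "SIGUSR1 received. Doing graceful restart\n"
    "[2016-10-13T16:28:54.0605+00:00] [OHS] [NOTIFICATION:16] [OHS-9999] [core.c] "
    "[host_id: weblayer-live-wls-2] [host_addr: 10.196.217.174] [pid: 27444] "
    "[tid: 140677071062848] [user: oracle] [VirtualHost: main] "
    "Oracle-HTTP-Server-12c/12.1.3.0.0 (Unix) mod_ssl/12.1.3.0.0 OtherSSL/0.0.0 "
    "mod_plsql/11.1.1.0.0 configured -- resuming normal operations\n"
)
restarts = graceful_restarts(sample)
print(restarts)
```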
Test Summary
The OHS servers have been shut down and then started using the OEM console. The times
shown are when the request was submitted on the console.
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
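For readers unfamiliar with these columns, the sketch below shows one way such figures can be derived from raw samples of a single transaction. It is not JMeter's own code: the `TimedSample` fields are hypothetical, the population standard deviation is assumed, and throughput is assumed to be the sample count divided by the time between the first request start and the last response end.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence


@dataclass
class TimedSample:
    start_ms: int        # request start, in milliseconds
    elapsed_ms: int      # response time, in milliseconds
    bytes_received: int
    success: bool


def summarise(samples: Sequence[TimedSample]) -> dict:
    """Compute the summary columns for one transaction from its samples."""
    if not samples:
        raise ValueError("at least one sample is required")
    elapsed = [s.elapsed_ms for s in samples]
    first_start = min(s.start_ms for s in samples)
    last_end = max(s.start_ms + s.elapsed_ms for s in samples)
    duration_s = (last_end - first_start) / 1000.0  # must be > 0
    total_bytes = sum(s.bytes_received for s in samples)
    return {
        "count": len(samples),
        "average_ms": mean(elapsed),
        "min_ms": min(elapsed),
        "max_ms": max(elapsed),
        "std_dev_ms": pstdev(elapsed),
        "error_pct": 100.0 * sum(not s.success for s in samples) / len(samples),
        "throughput_per_s": len(samples) / duration_s,
        "kb_per_s": total_bytes / 1024.0 / duration_s,
        "avg_bytes": total_bytes / len(samples),
    }


# Made-up samples, for illustration only.
stats = summarise([
    TimedSample(0, 100, 2048, True),
    TimedSample(500, 300, 2048, False),
    TimedSample(1000, 1000, 2048, True),
])
print(stats)
```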
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 404, 500 and some legitimate errors caused by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails, all
subsequent transactions until logoff are expected to fail, so the error count has been
adjusted in the table below.
Total missing 16
Adjusted missing 4 to 7
Given that three customers failed, we expect 6 account records and 3 deposit records to be
missing as a consequence. However, an additional 1 account record and 3 cash deposit
records are missing. Some of these missing cash deposit records may be legitimate, as they
may have been caused by the missing accounts. So these failure tests effectively generated
between 3+1=4 and 3+1+3=7 missing records.
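The lower and upper bounds can be reproduced with a short sketch. The lower bound assumes that every additional missing cash deposit was caused by a missing account; the upper bound assumes that none was. The function names are hypothetical, and the 2 accounts and 1 deposit per customer are taken from the figures above.

```python
from typing import Tuple


def adjusted_range(customers_failed: int,
                   extra_accounts: int,
                   extra_deposits: int) -> Tuple[int, int]:
    """Return (lower, upper) bounds for records lost to the failure itself."""
    lower = customers_failed + extra_accounts   # extra deposits all explained
    upper = lower + extra_deposits              # extra deposits all genuine
    return lower, upper


def expected_total(customers_failed: int,
                   extra_accounts: int,
                   extra_deposits: int) -> int:
    """Cross-check: each customer accounts for itself, 2 accounts and 1 deposit."""
    return customers_failed * (1 + 2 + 1) + extra_accounts + extra_deposits


print(adjusted_range(3, 1, 3))   # (4, 7)
print(expected_total(3, 1, 3))   # 16
```

The cross-check returns 16, which matches the total number of missing records reported for this test.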
The JMeter error records are shown in the snapshot below.
Test Summary
Servers have been restarted using the Oracle Cloud Console. Restart request times were
read from a clock just before each request was confirmed.
VM Restarted Time
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Note the lack of requests around 12:46, already discussed in the previous section.
Error Summary
Errors for the VM1 and VM4 restarts are HTTP 404.
Errors for the VM2 restart are HTTP 500 and 502, proxy errors, failures to respond and some
legitimate errors caused by the script limitation. This server contains OHS as well as MS2.
The JMeter threads execute transactions in series between login and logoff. If login fails, all
subsequent transactions until logoff are expected to fail, so the error count has been
adjusted in the table below.
Total missing 13
Adjusted missing 5 to 10
Given that one customer failed, we expect 2 account records and one deposit record to be
missing as a consequence. However, an additional 4 account records and 5 cash deposit
records are missing. Some of these missing cash deposit records may be legitimate, as they
may have been caused by the missing accounts. So these failure tests effectively generated
between 1+4=5 and 1+4+5=10 missing records.
The JMeter error records are shown in the snapshot below.
Test Summary
Each node was shut down by running “SQL>shutdown abort;” on it. The node shuts
down and then restarts automatically.
Node 1 15:15
Node 2 15:25
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 500, timeouts, duplicate OFS messages and some legitimate errors caused
by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails, all
subsequent transactions until logoff are expected to fail, so the error count has been
adjusted in the table below.
Node 1 15:15 46 11
Total missing 67
Adjusted missing 19 to 34
Given that 11 customers failed, we expect 22 account records and 11 deposit records to be
missing as a consequence. However, an additional 8 account records and 15 cash deposit
records are missing. Some of these missing cash deposit records may be legitimate, as they
may have been caused by the missing accounts. So these failure tests effectively generated
between 11+8=19 and 11+8+15=34 missing records.
The JMeter failure records are in the file below:
Microsoft Excel Worksheet
Test Summary
Servers have been restarted using the Oracle Cloud Console. Restart request times were
read from a clock just before each request was confirmed.
VM restarted Time
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
Error Summary
Errors are HTTP 500, timeouts, duplicate OFS messages and some legitimate errors caused
by the script limitation.
The JMeter threads execute transactions in series between login and logoff. If login fails, all
subsequent transactions until logoff are expected to fail, so the error count has been
adjusted in the table below.
VM 1 16:45 69 16
VM 2 17:02 168 27
Adjusted missing 31 to 42
Given that 20 customers failed, we expect 40 account records and 20 deposit records to be
missing as a consequence. However, an additional 11 account records and 11 cash deposit
records are missing. Some of these missing cash deposit records may be legitimate, as they
may have been caused by the missing accounts. So these failure tests effectively generated
between 20+11=31 and 20+11+11=42 missing records.
The JMeter error records are in the file below:
Microsoft Excel Worksheet
COB HA Tests
COB with App Server VM Restart
Test Summary
A COB was started with one tSA on each of the four servers. When the application stage
reached 67%, one of the app layer servers was restarted.
Result Summary
Start and end times were extracted from the COMO files.
When App server 1 was restarted, the COB carried on with a total of 3 tSAs, one on each
of the three servers that had not been restarted.
Error Summary
The COB finished successfully with no errors in the EB.EOD.ERROR.
Test Summary
A COB was started with one tSA on each of the four app servers. When the application
stage reached 90%, one of the app layer servers was restarted.
Result Summary
The tSA on App server 1 was restarted manually at 11:09 so that the COB could carry on
with 4 tSAs, one on each of the four app servers, as in the design base.
Error Summary
The COB finished successfully with no errors in the EB.EOD.ERROR.
Test Summary
DB switchover: start 11:59:59, end 12:01:13, duration 00:01:14
Load balancer switchover: start 12:00, end instantaneous
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
The downtime is from 11:59:49 to 12:03:12.
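The elapsed times can be checked with a small sketch. It takes the DB switchover start and end times from the Test Summary above (11:59:59 and 12:01:13) and the downtime window just stated, and computes the length of each interval. The `elapsed` helper is hypothetical and assumes both clock times fall within 24 hours of each other.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Return the time between two HH:MM:SS clock readings."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)  # the interval crosses midnight
    return delta


db_switchover = elapsed("11:59:59", "12:01:13")
observed_downtime = elapsed("11:59:49", "12:03:12")
print(db_switchover)       # 0:01:14
print(observed_downtime)   # 0:03:23
```

The downtime seen in the response-time data (3 minutes 23 seconds) is longer than the DB switchover itself (1 minute 14 seconds).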
Error Summary
Errors are session timeouts.
Microsoft Excel Worksheet
Site Failover
Test Summary
DB kill: time 14:01:17
Failover: start 14:02:36, end 14:06:55, downtime 00:04:19
Load balancer switchover: start 14:01:20, end instantaneous
Result Summary
Table shows the following:
Transaction name
Total number of transactions
Average response time, minimum and maximum in ms
Standard deviation and error proportion
Throughput in req/s or req/min and bandwidth rate in kB/s
Average Bytes per transaction
Response Time
Below is the response time for every transaction. The Y-axis is in ms.
The downtime is from 14:01:15 to 14:07:31.
Error Summary
Errors are session timeouts.
Microsoft Excel Worksheet
The COB on the LIVE site used two app servers with one tSA each.
Site Switchover
Start: 09:24:08, End: 09:26:08, DB Open State: 09:26:26, Downtime: 00:02:18
We had to use a different DBTools user than the one used in LIVE in order to create the UD
subdirectories. The user used in LIVE is locked.
Site Failover
We managed to complete a COB successfully after a switchover to the DR site.
The COB on the LIVE site used two app servers with one tSA each.
We had to use a different DBTools user than the one used in LIVE in order to create the UD
subdirectories. The user used in LIVE is locked.