
Report: DR Site Cluster Switchover

DR Drill No.    Activity  Start Date   End Date     Primary Data Center (PDC)  Near Data Center (NDC)
SAP-2021-02-21  Failover  20-FEB-2021  20-FEB-2021  Samanvay DC                Naranpura DC
SAP-2021-02-28  Failback  27-FEB-2021  27-FEB-2021  Naranpura DC               Samanvay DC

The check-marked systems were tested for failover scenarios.

SUSE Cluster AIX Cluster

ECC BW WEB-Dispatcher EP PO SOLMAN DMS ATSM FIORI

List of Nodes
The following servers (referred to as Node-X) were tested; they are listed with their physical location.
System               PDC             NDC
ECC                  AECCDB, AECC00  BECCDB, BECC00
BW                   ABWDB, ABWA0    BBWDB, BBWA0
EP                   AEPDBCS         BEPDBCS
PO                   APROORCH        BPROORCH
SOLMAN               ASOLMAN         BSOLMAN
DMS                  ADMS            BDMS
FIORI                AFFEDB          BFFEDB
Web Dispatcher       AWEBDIS         BWEBDIS
IBM Spectrum Backup  ATSM            BTSM
Node-X tag           Node-A          Node-B

Results of DR scenarios tested for SAP/Non-SAP systems – Failover to NDC


System          Overall Result (PASS/FAIL)  Issues / Shortcomings / Observations                      Detailed Test Process & Results
ECC             PASS                        Successful failover
BW              PASS                        Successful failover
EP              PASS                        Successful failover / EP application not starting         Refer: EP Issue
PO              PASS                        Successful failover / Issue came in traditional failover  Refer: PO Issue
SOLMAN          PASS                        Successful failover / Not connected to HANA DB            Refer: SOLMAN Issue
DMS             PASS                        Successful failover
TSM             PASS                        Successful failover
FIORI           PASS                        Successful failover
WEB-Dispatcher  PASS                        Successful failover

Prepared By                Approved By
Sign:                      Sign:
Name: Umang Patel          Name: Shri. Jagdish Trivedi (VP-IT)
Date: 16-Mar-2021          Date: __/ __ / ____
Functional Observation/Issues (if any):
(List here any non-technical or non-BASIS issue or observation that was identified during testing)
1. No functional issues reported by end-users or IT support team

Open Items (if any):

(List the activities that need to be carried out as follow-up to the testing)
1. IBM case TS005065070 for the issue “Site B automatically restarts if Site A is restarted”: we need to
apply the HACMP cluster fix (APAR) to resolve this issue.

EP Issue:
1. The EP cluster failed over successfully from Site A to Site B, but the application did not start automatically.

Observations / Findings:
• The AIX cluster of the EP system moved successfully from Site A (AEPDBCS) to Site B (BEPDBCS).
• After the cluster moved to Site B, we observed that the application was taking a long time to start.
• The initial startup load of EP needed more memory on the HANA tenant database TPP.
• The issue was resolved by increasing the memory of HANA tenant TPP from 100 GB to 125 GB.
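
The report does not state how the tenant memory was raised. As a hedged sketch only, assuming the limit in question is the tenant-level `allocationlimit` parameter in the `memorymanager` section of `global.ini` (an assumption; the value is given in MB and set from the system database), the following helper computes the new value and builds the statement for review. The function name is illustrative.

```python
def tenant_allocation_sql(tenant: str, limit_gb: int) -> str:
    """Build an ALTER SYSTEM statement that sets a tenant allocation limit.

    Assumption (not from the report): the limit is controlled by
    global.ini [memorymanager] allocationlimit, expressed in MB.
    """
    limit_mb = limit_gb * 1024  # convert GB to MB
    return (
        "ALTER SYSTEM ALTER CONFIGURATION ('global.ini', 'DATABASE', "
        f"'{tenant}') SET ('memorymanager', 'allocationlimit') = "
        f"'{limit_mb}' WITH RECONFIGURE"
    )


# Values taken from the report: tenant TPP, 100 GB raised to 125 GB.
old_gb, new_gb = 100, 125
increase_pct = (new_gb - old_gb) / old_gb * 100
statement = tenant_allocation_sql("TPP", new_gb)
print(f"Increase: {increase_pct:.0f}%")
print(statement)
```

The script only prints the statement; it does not connect to any database, so the value can be checked before anything is executed.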

PO Issues:
1) First Issue: The PO cluster did not fail over using the cluster "move resource group" method.

Observations / Findings:
• We stopped the SAP application, executed the command “smitty clstop” at Site A (APROORCH), and
selected the option: Move resource group.
• The cluster started moving to Site B (BPROORCH) but got stuck at one point with the warning
message below.

WARNING: Cluster has been running recovery program 'TE_RG_MOVE' for 360 seconds. Please
check cluster status.

• We waited for 45 minutes, but the move remained stuck at the same point.
• To resolve this, we restarted both nodes and started the cluster again at Site A.
• We then executed halt at Site A, and the cluster moved successfully to Site B.
• We raised case TS005063238 with IBM for the error encountered at the beginning.
• IBM's feedback was that this behaviour is a bug.
• Support quoted: “Here what I understood is the application was stopped by the user in advance,
since there was no monitor script configured the Power HA works as usual. When user-initiated
cl_stop command Power HA called the stop script and wait for the execution completion.”
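
A stuck resource group move like the one described here can be flagged automatically by scanning cluster log text for the quoted warning. The sketch below is a minimal illustration that assumes the warning appears in the log in the same wording as quoted in this report; the function name and the default threshold are illustrative choices.

```python
import re

# Pattern for the PowerHA warning quoted in the report.
WARNING_RE = re.compile(
    r"WARNING: Cluster has been running recovery program "
    r"'(?P<program>[^']+)' for (?P<seconds>\d+) seconds"
)


def find_stuck_events(log_text: str, threshold_seconds: int = 360):
    """Return (program, seconds) pairs at or above the threshold."""
    hits = []
    for match in WARNING_RE.finditer(log_text):
        seconds = int(match.group("seconds"))
        if seconds >= threshold_seconds:
            hits.append((match.group("program"), seconds))
    return hits


# The warning text as quoted in the report.
sample = (
    "WARNING: Cluster has been running recovery program 'TE_RG_MOVE' "
    "for 360 seconds. Please check cluster status."
)
stuck = find_stuck_events(sample)
print(stuck)
```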

2) Second Issue: Site B automatically restarts when Site A is restarted.

Observations / Findings:
• Situation: The cluster was online at Site B (BPROORCH) and Site A (APROORCH) was in halt mode.
• When we started the Site A (APROORCH) node from the HMC, the Site B (BPROORCH) node
restarted automatically.
• We created support case TS005065070 with IBM for this issue.

• Feedback from IBM: “When node APROORCH was restarted after the HALT, it caused node
BPROORCH to be rebooted by the RSCT (Reliable Scalable Cluster Technology) subsystem. This is a
known issue with RSCT subsystem and is resolved by installing apar IJ02843.”

Solution Manager Issue:


1) The Solman system did not connect to the HANA database during application startup.

Observations / Findings:
• The cluster was successfully moved from Site A (ASOLMAN) to Site B (BSOLMAN).
• Because of an old profile file (for the Sybase ASE DB) on the bsolman host, the application did not
connect to the HANA database.
• The issue was resolved after copying the new profile from asolman to bsolman.
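
Profile drift between the two nodes, as found here, can be caught before a drill by comparing the profile files parameter by parameter. The sketch below is a minimal illustration: the profile excerpts, the SID `SM1`, and the use of `dbms/type` as the differing parameter are hypothetical, since the actual profile contents are not part of this report.

```python
def parse_profile(text: str) -> dict:
    """Parse 'name = value' lines of a profile, skipping comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def profile_diff(a_text: str, b_text: str) -> dict:
    """Return {parameter: (value_on_a, value_on_b)} for differing entries."""
    a, b = parse_profile(a_text), parse_profile(b_text)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Hypothetical excerpts; the real files were not included in the report.
asolman_profile = "SAPSYSTEMNAME = SM1\ndbms/type = hdb\n"
bsolman_profile = "SAPSYSTEMNAME = SM1\ndbms/type = syb\n"
differences = profile_diff(asolman_profile, bsolman_profile)
print(differences)
```

An empty result means the two profiles agree on every parameter; any entry in the result is a candidate for review before the next failover.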

Results of DR scenarios tested for SAP/Non-SAP systems – Failback to Primary

System          Overall Result (PASS/FAIL)  Issues / Shortcomings / Observations  Detailed Test Process & Results
ECC             PASS                        Successful failback
BW              PASS                        Successful failback
EP              PASS                        Successful failback
PO              PASS                        Successful failback
SOLMAN          PASS                        Successful failback
DMS             PASS                        Successful failback
TSM             FAIL                        Cluster movement error at Site A.     Refer: TSM Issue
FIORI           PASS                        Successful failback
WEB-Dispatcher  PASS                        Successful failback
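
The overall results of the two drills can be cross-checked with a short script. The sketch below transcribes the PASS/FAIL values from the failover and failback tables in this report and lists the systems that need follow-up; the variable and function names are illustrative.

```python
# Overall results transcribed from the failover and failback tables.
SYSTEMS = [
    "ECC", "BW", "EP", "PO", "SOLMAN",
    "DMS", "TSM", "FIORI", "WEB-Dispatcher",
]
failover = {name: "PASS" for name in SYSTEMS}
failback = {name: "PASS" for name in SYSTEMS}
failback["TSM"] = "FAIL"  # cluster movement error at Site A


def failed(results: dict) -> list:
    """Return the sorted names of systems whose result is not PASS."""
    return sorted(name for name, result in results.items() if result != "PASS")


for label, results in (("Failover", failover), ("Failback", failback)):
    passed = len(results) - len(failed(results))
    print(f"{label}: {passed}/{len(results)} passed, failed: {failed(results)}")
```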

Functional Observation/Issues (if any):


(List here any non-technical or non-BASIS issue or observation that was identified during testing)
1. No functional issues reported by end-users or IT support team

Open Items (if any):

(List the activities that need to be carried out as follow-up to the testing)
1. We failed back the TSM-Backup Server cluster to Site-A after the “en2 adapter” error on the ATSM
host was resolved on 11-Mar-2021.

TSM Issue:
1. We will fail back the TSM-Backup Server cluster to Site-A after resolving the “en2 adapter” issue on the ATSM host.

Observations / Findings:
• We executed the cluster failback command at Site B (BTSM) to move the cluster to Site A (ATSM).
• The cluster command returned the error below: “The PowerHA System Mirror adapter en2 is not
available on node atsm.”
• We raised the issue with IBM under case number TS005108185.
• Initial Observation: The en2 adapter is in the Defined state at Site A (ATSM) and must be put in the
Available state. We are awaiting further instructions from IBM.
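
A pre-check for interfaces that are not in the Available state could have caught this before the failback command was run. The sketch below is a minimal illustration that parses device listing text in the layout printed by the AIX command `lsdev -Cc if` (name in the first column, state in the second). The sample listing is hypothetical and was not captured from the ATSM host.

```python
def unavailable_devices(lsdev_output: str) -> list:
    """Return names of devices whose state column is not 'Available'."""
    names = []
    for line in lsdev_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] != "Available":
            names.append(fields[0])
    return names


# Hypothetical listing; not captured from the ATSM host.
sample_output = """\
en0 Available  Standard Ethernet Network Interface
en1 Available  Standard Ethernet Network Interface
en2 Defined    Standard Ethernet Network Interface
"""
print(unavailable_devices(sample_output))
```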

Any other learnings:


1. We need to do a full re-sync of the HANA database from Site-B (Naranpura) to Site-C (Bangalore Sify) after
the failback activity is completed.
2. Before starting the full re-sync of the HANA database, we can start the HANA database at Site-C, take a backup, and
schedule the ECC-QA refresh activity after the HA drill activity.
3. The ECC and BW HANA databases each took approx. 20 minutes for the failover and for the failback.
