Selecting A DR Methodology
A Disaster Recovery Solution Selection Methodology
Learn and apply a Disaster Recovery Solution Selection Methodology
Cathy Warrick
John Sing
ibm.com/redbooks Redpaper
International Technical Support Organization
February 2004
Note: Before using this information and the product it supports, read the information in “Notices” on page v.
Notices
  Trademarks
Preface
  The team that wrote this Redpaper
  Become a published author
  Comments welcome
Related publications
  IBM Redbooks
  How to get IBM Redbooks
  Help from IBM
Index
This information was developed for products and services offered in the U.S.A.
IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive Armonk, NY 10504-1785 U.S.A.
The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions, therefore, this statement may not apply to you.
This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.
Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.
IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.
Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.
This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.
COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrates programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and
distribute these sample programs in any form without payment to IBM for the purposes of developing, using,
marketing, or distributing application programs conforming to IBM's application programming interfaces.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Other company, product, and service names may be trademarks or service marks of others.
This Redpaper will help you design a Disaster Recovery solution and presents a Disaster
Recovery Solution Selection Methodology to assist in this process.
Cathy Warrick is a Project Leader at the International Technical Support Organization, San
Jose Center. Before joining the ITSO, she worked in the IBM Storage Field Education group,
managing the Technical Leadership Program.
John Sing is a Senior Consultant with IBM Systems Group Business Continuance Strategy
and Planning, helping to plan and integrate IBM TotalStorage® products into the overall IBM
Business Continuance strategy and product portfolio. He started in the Disaster Recovery
arena in 1994 while on assignment to IBM Hong Kong S.A.R. of China and IBM China. In
1998, John joined the Enterprise Storage Server® (ESS) Planning team for PPRC, XRC, and
FlashCopy®; in 2000, John became the Marketing Manager for ESS Copy Services, and in
mid-2002, joined the Systems Group. John has been with IBM for 22 years.
Your efforts will help increase product acceptance and client satisfaction. As a bonus, you'll
develop a network of contacts in IBM development labs, and increase your productivity and
marketability.
Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html
We want our papers to be as helpful as possible. Send us your comments about this
Redpaper or other Redbooks™ in one of the following ways:
Use the online Contact us review redbook form found at:
ibm.com/redbooks
Send your comments in an Internet note to:
redbook@us.ibm.com
Mail your comments to:
IBM® Corporation, International Technical Support Organization
Dept. QXXE Building 80-E2
650 Harry Road
San Jose, California 95120-6099
These questions have vexed Disaster Recovery solution designers for a long time.
Developing the skill to perform this selection function effectively was often time consuming
and incomplete. It can be difficult to transfer these skills to other colleagues.
This methodology is not designed to replace in-depth skills. It is meant as a guideline and a
framework. Proper application of this methodology can significantly reduce the effort and time
required to identify proper solutions, and therefore accelerate the selection cycle.
For more information about this methodology, see the redbook IBM TotalStorage Solutions for
Disaster Recovery, SG24-6547.
The common problem in the past has been a tendency to view the Disaster Recovery solution
as individual product technologies and piece parts; see Figure 1-1. Instead, Disaster
Recovery solutions need to be viewed as a whole, integrated multiproduct solution.
In this chapter we propose a Disaster Recovery Solution Selection Methodology that can be
used to sort, summarize, and organize the various business requirements in a methodical
way. Then, we methodically use those business requirements to efficiently identify a proper
and valid subset of Disaster Recovery technologies to address the requirements.
Each vendor and product area tends to build separate pieces of the solution
Insufficient interlocking of the different areas
Business Continuance and Disaster Recovery need to be seen as an
integrated product solution
Many valid technologies, but how to choose among them?
All IT infrastructure necessary to support the Disaster Recovery solution can be inserted into
one of these five components; see Figure 1-2 on page 3.
Figure labels: Applications Staff, Management, Control
1. Servers
2. Storage
3. Software and Automation
4. Networking (includes Physical Infrastructure)
5. Skills and Services
Provide all five to assure: "On Time, On Budget, On Demand"
These five categories provide a framework to organize the various component evaluation
skills that will be needed. Gathering the proper mix of evaluation skills together facilitates an
effective comparison, contrast, and blending of all five product component areas to arrive at
an optimum solution.
By categorizing Disaster Recovery technology into the various tiers, we have the capability to
more easily match our desired RTO time with the optimum set of technologies. The reason for
multiple tiers is that as the RTO time decreases, the optimum Disaster Recovery technologies
for RTO must change. For any given RTO, there are always a particular set of optimum
price/performance Disaster Recovery technologies.
The tiers concept is flexible. As products and functions change and improve over time, the
Tiers chart only needs to be updated by the addition of that new technology into the
appropriate tier and RTO.
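As a rough illustration of matching a desired RTO to tiers, the following Python sketch uses the generalized RTO ranges that the Solution Matrix in this paper lists for each tier (for example, Tier 6 is "generally 1 to 6 hours"). The function name candidate_tiers and the treatment of range boundaries as inclusive are assumptions made for this example. The ranges overlap, so more than one tier can match, and a Tiers chart built for your own environment may use different values.

```python
# Generalized RTO ranges per tier, in hours, as listed in the Solution Matrix
# in this paper. Tiers 2 and 1 are "generally > 24 hours", so their upper
# bound is open-ended and represented here by infinity.
TIER_RTO_HOURS = {
    7: (0.0, 2.0),    # near continuous to 2 hours
    6: (1.0, 6.0),
    5: (4.0, 8.0),
    4: (6.0, 12.0),
    3: (12.0, 24.0),
    2: (24.0, float("inf")),
    1: (24.0, float("inf")),
}


def candidate_tiers(rto_hours):
    """Return the tiers whose generalized RTO range contains rto_hours.

    Boundaries are treated as inclusive (an assumption), except that the
    "> 24 hours" tiers require strictly more than 24 hours.
    """
    matches = []
    for tier, (low, high) in sorted(TIER_RTO_HOURS.items(), reverse=True):
        if high == float("inf"):
            if rto_hours > low:
                matches.append(tier)
        elif low <= rto_hours <= high:
            matches.append(tier)
    return matches


print(candidate_tiers(3))    # [6]
print(candidate_tiers(8))    # [5, 4]
print(candidate_tiers(48))   # [2, 1]
```

When several tiers match, as for an 8-hour RTO, choosing between them remains a judgment for the evaluation team.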
The Tiers chart, shown in Figure 1-3 on page 4, gives a generalized view of some of today’s
IBM Disaster Recovery technologies by tier. As the recovery time becomes shorter, then
Tiers chart (Figure 1-3), text recovered from the figure:
Tier 7 - Highly automated, business wide, integrated solution (example: GDPS/PPRC/VTS P2P, AIX HACMP/PPRC, OS/400 HABP...)
Tier 6 - Storage mirroring (example: XRC, PPRC, VTS Peer to Peer)
Tier 5 - Software two site, two phase commit (transaction integrity)
Tier 4 - Batch/Online database shadowing & journaling, Point in Time disk copy (FlashCopy), TSM-DRM
Tier 3 - Electronic Vaulting, TSM**, Tape
Tier 2 - PTAM, Hot Site, TSM**
Tier 1 - PTAM*
Application bands, top to bottom: applications with low tolerance to outage; applications somewhat tolerant to outage; applications very tolerant to outage.
Data recreation labels, top to bottom: zero or near zero; minutes to hours; up to 24 hours; 24-48 hours.
Vertical axis: Value. Horizontal axis (recovery time): 15 Min., 1-4 Hr., 4-8 Hr., 8-12 Hr., 12-16 Hr., 24 Hr., Days.
The concept and shape of the Tiers chart continues to apply even as the scale of the
application or applications changes. Large scale applications will tend to move the curve to
the right, and small scale applications will tend to move the curve to the left. But in both
cases, the general relationship of the various tiers and Disaster Recovery technologies to
each other remains the same. Finally, although some Disaster Recovery technologies fit into
multiple tiers, clearly there is not one Disaster Recovery technology that can be optimized for
all the tiers.
Of course, your technical staff can and should, when appropriate, create a specific version of
the Tiers chart for your particular environment. After the staff agrees on what tier or tiers and
corresponding RTO a solution delivers for your enterprise, then Disaster Recovery technical
evaluation and comparisons are much easier, and the technology alternatives can be tracked
and organized in relation to each other. Although the technology within the tiers has obviously
changed through time, the concept continues to be as valid today as when it was first
described by the U.S. SHARE user group in 1988.
To use the tiers to derive a blended, optimized enterprise Disaster Recovery architecture, we
suggest the following steps:
1. Categorize the business' entire set of applications into three bands: Low tolerance to
outage, Somewhat tolerant to outage, and Very tolerant to outage. Of course, some
applications that are not in and of themselves critical do feed the critical
applications. Those applications therefore need to be included in the higher tier.
2. Within each band, there are tiers. The individual tiers represent the major Disaster
Recovery technology choices for that band. It is not necessary to use all the tiers, and of
course, it is not necessary to use all the technologies.
3. After we have segmented the applications (as best we can) into the three bands, we
usually select one best strategic Disaster Recovery methodology for each band. The
contents of the tiers are the candidate technologies from which the strategic methodology
is chosen.
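The banding in step 1, including the rule that an application feeding a critical application moves up with it, can be sketched in Python as follows. The application names, the assign_bands function, and the representation of "feeds" as a dictionary are hypothetical and serve only to illustrate the idea; real segmentation depends on the Business Impact Analysis.

```python
# Bands ordered from most to least demanding.
BANDS = ["Low tolerance to outage",
         "Somewhat tolerant to outage",
         "Very tolerant to outage"]


def assign_bands(own_band, feeds):
    """Assign each application to a band, promoting feeder applications.

    own_band: application name -> band it would get on its own merits.
    feeds:    application name -> list of applications it feeds.
    An application ends up in the most demanding band among its own band
    and the final bands of every application it feeds.
    """
    rank = {band: i for i, band in enumerate(BANDS)}
    result = dict(own_band)
    changed = True
    # Repeat until stable so that promotion propagates along chains of feeders.
    while changed:
        changed = False
        for app, targets in feeds.items():
            for target in targets:
                if rank[result[target]] < rank[result[app]]:
                    result[app] = result[target]
                    changed = True
    return result


# Hypothetical applications for illustration only.
own = {
    "order-entry": "Low tolerance to outage",
    "pricing-feed": "Very tolerant to outage",
    "rates-import": "Very tolerant to outage",
    "reporting": "Very tolerant to outage",
}
feeds = {"pricing-feed": ["order-entry"], "rates-import": ["pricing-feed"]}

for app, band in assign_bands(own, feeds).items():
    print(app, "->", band)
```

In this sketch, pricing-feed and rates-import both end up in the Low tolerance band because they feed order-entry directly or indirectly, while reporting stays where it was.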
A blended architecture optimizes and maps the varying application recovery time
requirements with an appropriate technology at an optimized cost. The net resulting blended
tier Disaster Recovery architecture provides the best possible application coverage for the
minimum cost.
The tier concept is simple enough that non-technical personnel can see the end result of
technical evaluations in a straightforward fashion. Senior management does not need to
understand the technology that is inside the tier; but they can clearly see the Recovery Time
Objective and the associated cost versus RTO trade-off.
This ability to communicate the bottom line allows senior management to understand the
recommendation and the trade-offs, and therefore to make a decision quickly and efficiently.
This clarity of the choices and the associated financial cost should also result in a higher
likelihood of adequate funding for the Disaster Recovery project.
Figure text (two-site configuration: Site 1 and Site 2, each with applications, servers, and clustering facilities, plus common timers and primary and mirrored disk):
Figure callouts: #1 Byte Movers, #2 Data Integrity, #3 Transaction Integrity
eServer:
  zSeries: Geographically Dispersed Parallel Sysplex (GDPS) (Tier 7)
  pSeries: AIX/HACMP (High Availability Cluster Multi-Processing) with PPRC (Tier 7)
  iSeries: High Availability Business Partner software: Vision, Lakeview, DataMirror (Tier 7)
  xSeries: X-Architecture, Blades (Tier 6)
IBM TotalStorage:
  ESS PPRC (Tier 6)
  ESS XRC (Tier 6)
  Virtual Tape Server Peer to Peer (Tier 6)
  FAStT, SAN Volume Controller Mirroring (Tier 6)
  ESS, FAStT, SAN Volume Controller FlashCopy (Tier 4)
  3590, 3592, LTO tape (Tier 1,2,3,4)
  Storage software (Tier 1,2,3,4)
Software and Automation:
  DB2, IMS, CICS, WebSphere (Tier 5)
  WebSphere, MQ (Tier 5)
  Tivoli Storage Manager (Tier 2,3,4)
Networking and Infrastructure:
  IBM Global Services, IBM Business Partners, IBM Networking Partners
IGS, Business Partner Services
Figure text, in flow order:
Business requirements (CEO: "We need to be online 24x7"; "Hmm.... That means Oracle and SAP must be recovered"), Risk Analysis results, and BIA / RTO / RPO Analysis results
Define the Tier level for each application (using the Tiers of Disaster Recovery)
Identify the DR solution subset from the DR solution matrix
Eliminate DR solutions that do not apply to all requirements (using the detailed DR solution description table)
Valid preliminary candidate solution
Detailed Evaluation Team
Figure 1-5 Flow of the Disaster Recovery Solution Selection Methodology
Note that the prerequisite to entering the methodology is having already performed and
reached organizational agreement on the business requirements: Risk analysis, Business
Impact Analysis, application segmentation, and associated Recovery Time Objectives and
Recovery Point Objectives.
It is important to note what the Disaster Recovery Solution Selection Methodology cannot do:
Not intended to replace detailed solution recommendation configuration assistance.
Not intended to replace in-depth technical validation.
Not intended to replace detailed design and implementation skills and services.
Hourglass figure text ('At the Neck'): B. Use RTO to pick appropriate solution subset. Organize solutions by Tiers (creates RTO subset). Pick proper subset.
By segmenting the asking of questions into this hourglass concept and these three
categories, it becomes possible to efficiently subset the nearly endless permutations of
possible Disaster Recovery technology combinations and solutions into a manageable,
methodical process.
These are not all the possible questions, of course, but they are a valid starting point. You can
see additional questions in Appendix A, “Disaster Recovery Solution Selection Methodology
matrixes” on page 29.
Note that the specific order of the questions is by intent, designed to eliminate non-solutions
even as we are performing the information gathering phase.
The questions and how they are used in our hourglass concept are shown in the following
chart in Figure 1-8.
The questions above the neck of the hourglass define the basic business and IT
requirements. It is essential that these basic questions be answered fully, because a lack of
any of these answers means that it is not possible to properly evaluate what subset of
solutions are the ones we should investigate. In this way, the methodology enforces the
collection of proper business and infrastructure requirements before proceeding.
We must assure that the answers to these questions have gained consensus from the
enterprise’s management, business lines, application staff, in addition to the IT operations
staff.
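One way to picture this "answer everything before proceeding" rule is a small record of the Step A answers with a completeness check. The Python sketch below is only an illustration: the class name StepAAnswers, the field names, and the unanswered helper are assumptions, with the fields loosely modeled on the starter questions used in the sample scenarios of this paper. The questions about who will design and implement the solution are left out here because the scenarios answer them "To be determined".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StepAAnswers:
    """Answers to the starter questions; None means not yet answered."""
    applications: Optional[str] = None
    platforms: Optional[str] = None
    rto_hours: Optional[float] = None
    distance_km: Optional[float] = None
    connectivity: Optional[str] = None
    storage_vendors: Optional[str] = None
    rpo: Optional[str] = None
    data_amount_tb: Optional[float] = None
    level_of_recovery: Optional[str] = None


def unanswered(answers):
    """Return the names of questions that still have no answer."""
    return [f.name for f in fields(answers) if getattr(answers, f.name) is None]


# A partially completed set of answers: Step B should not start yet.
partial = StepAAnswers(applications="Heterogeneous",
                       platforms="Open Systems",
                       rto_hours=8)
missing = unanswered(partial)
if missing:
    print("Cannot proceed to Step B; still missing:", ", ".join(missing))
```

A check like this does not replace gaining consensus on the answers; it only makes gaps visible before the solution subset is looked up.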
Having identified the appropriate Level of Recovery, and in combination with the RTO, we now
reference the Solution Matrix in Appendix A, “Disaster Recovery Solution Selection
Methodology matrixes” on page 29.
An extract of the full Solution Matrix is shown for illustration purposes in Figure 1-9. Take the
identified Level of Recovery and RTO answers, and look into the Solution Matrix chart. You’ll
immediately identify the intersect of the Level of Recovery with the RTO/Tier. At the intersect,
in the contents of the intersection cell, are the initial candidate Disaster Recovery solutions
for this particular RTO.
Figure 1-9 Step B: Identify candidate RTO solutions using tabular Tiers chart, RTO, and Level of
Recovery
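The Step B lookup amounts to reading one cell of a two-dimensional table. The Python sketch below shows this with an extract of the Unplanned Outage row, using cell contents that appear in the simplified matrixes in this paper. The dictionary layout and the function name initial_candidates are assumptions for illustration, and the extract is deliberately incomplete; the full matrix is in Appendix A.

```python
# Extract of the Solution Matrix: (level of recovery, tier) -> candidates.
# Only some Unplanned Outage cells are included in this sketch.
SOLUTION_MATRIX = {
    ("Unplanned Outage", 7): ["GDPS/PPRC", "GDPS/XRC"],
    ("Unplanned Outage", 6): ["XRC", "GDPS Storage Manager with PPRC", "eRCMF"],
    ("Unplanned Outage", 4): ["PtP VTS", "FlashCopy", "PPRC-XD"],
    ("Unplanned Outage", 3): ["FlashCopy", "Tivoli Storage Manager", "tape"],
}


def initial_candidates(level_of_recovery, tier):
    """Return the candidate solutions in the intersection cell.

    An empty list means this extract has no entry for that cell.
    """
    return list(SOLUTION_MATRIX.get((level_of_recovery, tier), []))


print(initial_candidates("Unplanned Outage", 6))
# ['XRC', 'GDPS Storage Manager with PPRC', 'eRCMF']
```

The candidates returned here are only the starting point; they still have to pass the elimination step that follows.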
For the solutions in this paper, we supply a starter set of Eliminate Non-Solutions tables in
Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on page 29. An
extract from one such table is shown in Figure 1-10.
Figure text:
Lookup (Solution Matrix extract):
  Unplanned Outages: GDPS/PPRC, GDPS/XRC; XRC, GDPS Storage Mgr, eRCMF, etc.; Point in Time FlashCopy, VTS Peer to Peer
  Transaction Integrity: IMS RSR, DB2-specific....; Oracle, DB2-specific
My Questions and Answers: C. Use 'answers' to eliminate non-solutions ('Below the Neck')

Criterion                 XRC                         GDPS Storage Manager PPRC           eRCMF
Platform                  zSeries only                zSeries and z + Open heterogeneous  Open Systems only...
Distance                  any distance....            < 103 km                            < 103 km
Recovery Time Objective   2-4 hours                   1-4 hours                           1-4 hours
Connectivity.....         ESCON                       ESCON                               ESCON
Recovery Point Objective  few seconds to few minutes  zero data loss                      zero data loss
Valid Option?             No                          Yes                                 No
By applying the answers from Step A, on topics such as distance and non-support of
platforms, those candidate solutions that do not apply will be eliminated.
It is normal to have multiple possible solutions after we complete Step C. Whatever Disaster
Recovery candidate solution or solutions remain after this pass through Step C are therefore
valid Disaster Recovery candidate solutions.
This methodology also supports our current best Disaster Recovery practices of segmenting
the Disaster Recovery architecture into three blended tiers (and therefore three tiers of
solutions). To identify the solutions for the other bands of solutions, you would simply re-run
the methodology, and give the lower RTO Level of Recovery for those lower bands and
applications; you would find the corresponding candidate solution technologies in the
appropriate (lower) RTO solution subset cells.
The valid identified candidate solutions also dictate what mix of skills will be necessary on the
evaluation team.
The evaluation team will in all likelihood need to further configure the candidate solutions into
more detailed configurations to complete the evaluation. This is also normal. In the end, that
team will still make the final decision as to which of the identified options (or the blend of
them) is the one that should be selected.
In most cases, the questions being asked in either Step A or Step B will not need to change.
Let us suppose that the answers to our starter set of Disaster Recovery Solution Selection
Methodology questions turn out to be as follows:
1. What is the application or applications that need to be recovered? Heterogeneous
1.5.2 Step B: Use level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery to our Solution Matrix. A simplified version
is shown for illustration purposes in Figure 1-11 on page 15. A full version of this table is in
Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on page 29.
Unplanned Level of Recovery
Recovery Time Objective = Three hours
By intersecting the Tier 6 RTO column with the Unplanned Outage row, we find that the
preliminary candidate recommendations in our simplified table would be:
XRC
GDPS Storage Manager with PPRC
eRCMF
We examine the Step C Eliminate Non-Solutions table for Tier 6 Unplanned Outages, for
which a starter set is supplied in Appendix A, “Disaster Recovery Solution Selection
Methodology matrixes” on page 29. A simplified version of the Eliminate Non-Solutions table
for Tier 6 Unplanned Outages is shown in Figure 1-12 on page 16.
Criterion     XRC                          GDPS Storage Manager with PPRC   eRCMF
Platform      zSeries                      zSeries, Heterogeneous           pSeries, Linux, Sun, HP, Microsoft
                                           including zSeries                Windows, Heterogeneous (open)
Distance      <40 km, 40-103 km, >103 km   <40 km, 40-103 km                <40 km, 40-103 km
Connectivity  ESCON, FICON                 ESCON, Fibre Channel             ESCON, Fibre Channel
Vendor (1)    Any XRC-compliant z/OS       Any PPRC-compliant subsystem     IBM
              subsystem
Vendor (2)    Any z/OS subsystem           Same vendor as PPRC subsystem    IBM
RPO           Few seconds to few minutes   Near zero                        Near zero
Amt of Data   Any                          Any                              Any
As we apply the different criteria sequentially from top to bottom, we find that:
1. Because the platform is IBM eServer zSeries, we can eliminate eRCMF because it
does not support zSeries.
2. From a distance of 35 km, all remaining solutions qualify.
3. From a connectivity standpoint of ESCON, all remaining solutions qualify.
4. From a storage vendor hardware standpoint for site 1 of IBM ESS, all remaining solutions qualify.
5. From a storage vendor hardware standpoint for site 2 of IBM ESS, all remaining solutions qualify.
6. From an RPO standpoint of near zero, only GDPS Storage Manager with PPRC qualifies.
Therefore, we see that after applying the answers to the identified candidates and eliminating
non-solutions, this is the valid preliminary candidate solution:
GDPS Storage Manager with ESS PPRC
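The elimination pass in this example can be sketched in Python as follows. The criteria values are transcribed from the simplified Tier 6 Unplanned Outage Eliminate Non-Solutions table; the dictionary layout, the function name eliminate, the omission of the two storage vendor rows (every candidate passes them in this example), and the comparison of RPO by exact label are all assumptions made to keep the sketch short.

```python
# Criteria for the Tier 6 Unplanned Outage candidates. max_distance_km=None
# means the solution is listed for every distance band, including >103 km.
CANDIDATES = {
    "XRC": {
        "platforms": {"zSeries"},
        "max_distance_km": None,
        "connectivity": {"ESCON", "FICON"},
        "rpo": "few seconds to few minutes",
    },
    "GDPS Storage Manager with PPRC": {
        "platforms": {"zSeries", "heterogeneous including zSeries"},
        "max_distance_km": 103,
        "connectivity": {"ESCON", "Fibre Channel"},
        "rpo": "near zero",
    },
    "eRCMF": {
        "platforms": {"pSeries", "Linux", "Sun", "HP", "Windows",
                      "heterogeneous (open)"},
        "max_distance_km": 103,
        "connectivity": {"ESCON", "Fibre Channel"},
        "rpo": "near zero",
    },
}

# Step A answers used in this worked example.
ANSWERS = {"platform": "zSeries", "distance_km": 35,
           "connectivity": "ESCON", "rpo": "near zero"}


def eliminate(candidates, answers):
    """Apply the criteria in table order; return (survivors, reasons).

    reasons maps each eliminated solution to the first criterion it failed.
    """
    checks = [
        ("platform", lambda c: answers["platform"] in c["platforms"]),
        ("distance", lambda c: c["max_distance_km"] is None
         or answers["distance_km"] <= c["max_distance_km"]),
        ("connectivity", lambda c: answers["connectivity"] in c["connectivity"]),
        ("rpo", lambda c: c["rpo"] == answers["rpo"]),
    ]
    survivors = dict(candidates)
    reasons = {}
    for name, passes in checks:
        for solution in list(survivors):
            if not passes(survivors[solution]):
                reasons[solution] = name
                del survivors[solution]
    return list(survivors), reasons


survivors, reasons = eliminate(CANDIDATES, ANSWERS)
print(survivors)  # ['GDPS Storage Manager with PPRC']
print(reasons)    # {'eRCMF': 'platform', 'XRC': 'rpo'}
```

Recording the criterion that removed each candidate keeps the result easy to explain to the evaluation team, which still has to validate the surviving solution.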
The methodology can often result in more than one possible solution. This is normal.
In all cases, whether we have identified one or multiple possible solutions, the detailed
evaluation team step is necessary to validate this preliminary set of identified solutions, as
well as accommodate a large variety of environment-specific considerations. As stated
earlier, the methodology is not intended to be a perfect decision tree.
For additional examples, see Chapter 2, “Sample scenarios” on page 19, in which a
series of typical client Disaster Recovery requirements are distilled through this methodology,
and a preliminary solution is identified.
This methodology is not meant as a substitute for Disaster Recovery skill and experience, nor
is it possible for the methodology to be a perfect decision tree. Although there clearly will be
ambiguous circumstances (for which knowledgeable Disaster Recovery experts will be
required), the methodology still provides for the collection of the proper Disaster Recovery
business requirements information.
In this way, the methodology provides an efficient process by which the initial preliminary
Disaster Recovery solution selection can be consistently performed. In the end, this
methodology should assist you in mentally organizing and using the information in this
Redpaper, as well as navigating any Disaster Recovery technology evaluation process.
Let us suppose that the answers to our starter set of Disaster Recovery Solution Selection
Methodology questions turn out to be as follows:
1. What is the application or applications that need to be recovered? Heterogeneous
2. On what platform or platforms does it run? Various
3. What is the desired Recovery Time Objective? 24 hours
4. What is the distance between the recovery sites (if there is one)? 200 km
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that? Very low bandwidth envisioned
6. What are the specific storage vendor hardware and software configurations that need to
be recovered? Large collection of different vendors
7. What is the Recovery Point Objective? 24 hours
8. What is the amount of data that needs to be recovered? 4 TB
9. What is the desired level of recovery (Planned Outage/Unplanned Outage/Transaction
Integrity)? Unplanned Outage
10. Who will design the solution? To be determined
11. Who will implement the solution? To be determined
2.1.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery to the Solution Matrix in Appendix A,
“Disaster Recovery Solution Selection Methodology matrixes” on page 29. A simplified
version of that matrix is shown below for illustration purposes (Figure 2-1 on page 21).
Recovery Time Objective = 24 hours
Unplanned Outage Level of Recovery
By intersecting with the Tiers 3, 2 or 1 column and the Unplanned Outage row, we find that
the preliminary candidate recommendations would be:
IBM Tivoli® Storage Manager
Tape
Therefore, we see that after applying the answers to the identified candidates and eliminating
non-solutions, we are left with the valid candidate solutions from those covered in this
Redpaper:
Tivoli Storage Manager
Tape
These are the two solutions that we would then turn over to the evaluation team in Step D. It is
probable that the evaluation team would end up using both Tivoli Storage Manager and tape
to meet this particular environment’s Disaster Recovery needs.
Let us suppose that the answers to our starter set of Disaster Recovery Solution Selection
Methodology questions turn out to be as follows:
1. What is the application or applications that need to be recovered? Heterogeneous
2. On what platform or platforms does it run? Open Systems
3. What is the desired Recovery Time Objective? 8 hours
4. What is the distance between the recovery sites (if there is one)? 1200 km
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that? Long distance telecom lines,
Fibre Channel in the data center
6. What are the specific storage vendor hardware and software configurations that need to
be recovered? IBM Enterprise Storage Server
7. What is the Recovery Point Objective? 8 hours
8. What is the amount of data that needs to be recovered? 4 TB
9. What is the desired level of recovery (Planned Outage/Unplanned Outage/Transaction
Integrity)? Unplanned Outage
10. Who will design the solution? To be determined
11. Who will implement the solution? To be determined
Solution Matrix (simplified). Tier columns: 7; 6; 5; 4, 3; 2, 1
RTO:
  Tier 7: Generally near continuous to 2 hours
  Tier 6: Generally 1 to 6 hours
  Tier 5: Generally 4 to 8 hours
  Tiers 4, 3: Generally Tier 4: 6-12 hours; Tier 3: 12-24 hours
  Tiers 2, 1: Generally > 24 hours
Description:
  Tier 7: Highly automated integrated h/w s/w failover
  Tier 6: Storage and server mirroring
  Tier 5: S/W and database transaction integrity
  Tiers 4, 3: Hot site, Disk PiT copy, Tivoli Storage Manager-DRM, fast tape
  Tiers 2, 1: Backup software, tape
Planned Outage / data migrations - byte movers:
  Tier 6: PPRC, PPRC-XD, XRC, VTS Peer to Peer
  Tiers 4, 3: FlashCopy, PPRC-XD, VTS Peer to Peer, TSM, tape
  Tiers 2, 1: Tivoli Storage Manager, tape
Unplanned Outage Disaster Recovery, adds data integrity to byte movers:
  Tier 7: GDPS/PPRC, GDPS/XRC
  Tier 6: XRC, GDPS Storage Manager with PPRC, eRCMF
  Tier 4: PtP VTS, FlashCopy, PPRC-XD
  Tier 3: FlashCopy, Tivoli Storage Manager, tape
  Tiers 2, 1: Tivoli Storage Manager, tape
Database and application Transaction Integrity - adds Transaction Integrity to Unplanned Outage data integrity (cells in column order, left to right):
  Database-level Transaction Integrity layered on automated h/w recovery
  SAP, Oracle, DB2, SQL Server remote replication
  Tier 4: Database transaction recovery with journal forwarding
  Tier 3: Database transaction recovery with electronic vaulting or physical tape transport
By intersecting with the Tier 4 column and the Unplanned Outage row, we find that the
preliminary candidate recommendations would be:
PtP VTS
FlashCopy (multiple disk storage subsystems)
ESS PPRC-XD
RPO (columns left to right):
  Few minutes to 1 hour, 1-8 hours, greater than 8 hours
  Few seconds to few minutes, few minutes to 1 hour, 1-8 hours, greater than 8 hours
  1-8 hours, greater than 8 hours
Amt of data (columns left to right): Any; Any; Any
Figure 2-4 Tier 4 Unplanned Outage Eliminate Non-Solutions table
As we apply the different criteria sequentially from top to bottom, we find that:
1. Because the data is Open Systems, we eliminate PtP VTS.
2. There are no further eliminations.
Therefore, these are the valid Disaster Recovery preliminary candidate solutions:
ESS PPRC-XD
FlashCopy
These are the two solutions that we would then turn over to the evaluation team in Step D.
It is probable that the evaluation team would end up investigating the usage of the two
facilities (which are quite different), and choose one. If FlashCopy is chosen, it is likely that
tape would also be configured into the solution.
As you can see, the evaluation team’s experience is what turns a very high-level preliminary
selection into a valid final selection.
2.3.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery. For illustration purposes, Figure 2-5 on
page 26 is a simplified version of the full table that can be found in Appendix A, “Disaster
Recovery Solution Selection Methodology matrixes” on page 29.
Recovery Time Objective = 3 hours
Unplanned Level of Recovery
By intersecting the Tier 6 and Tier 7 columns, and the Unplanned Outage row, we find that the
preliminary candidate Disaster Recovery solutions would be:
XRC
GDPS Storage Manager
eRCMF
Platform       zSeries                    zSeries, Heterogeneous       pSeries, Linux, Sun, HP,
                                          including zSeries            Windows, Heterogeneous
                                                                       (open only)
Distance       <40 km, 40-103 km,         <40 km, 40-103 km            <40 km, 40-103 km
               >103 km
Connectivity   ESCON, FICON               ESCON, Fibre Channel         ESCON, Fibre Channel
Vendor (1)     Any XRC-compliant          Any PPRC-compliant           IBM ESS
               z/OS subsystem             subsystem
Vendor (2)     Any z/OS subsystem         Same vendor PPRC subsystem   IBM ESS
RPO            Few seconds to             Near zero                    Near zero
               few minutes
Amt of data    Any                        Any                          Any
As we apply the different criteria sequentially from top to bottom, we find that:
1. Because the platform is zSeries, we can eliminate eRCMF, because it does not support
zSeries.
2. At a distance of 1200 km, we eliminate those solutions that cannot reach this distance.
Only XRC qualifies.
3. From a connectivity standpoint of FICON, XRC qualifies.
4. From a storage vendor hardware standpoint for site 1, XRC qualifies.
5. From a storage vendor hardware standpoint for site 2, XRC qualifies.
6. From an RPO standpoint of near zero, XRC qualifies.
Therefore, we see that after applying the answers to the identified candidates and eliminating
non-solutions, this is a valid preliminary candidate solution:
XRC
We would now turn over this solution to the detailed evaluation team in Step D for
confirmation and detailed evaluation.
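The sequential elimination in this example can be sketched in a few lines of Python. This is an
illustrative sketch only, not part of the methodology itself. The `Candidate` class and the
`eliminate` function are hypothetical names. The attribute values are transcribed from the table
in this example, and the assignment of the three table columns to XRC, GDPS Storage Manager, and
eRCMF (in that order) is an assumption. The ">103 km" distance band is modeled as unlimited, and
only the platform, distance, and connectivity criteria are applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One preliminary candidate solution and the criteria values it supports."""
    name: str
    platforms: frozenset       # platforms the solution supports
    max_distance_km: float     # float("inf") where the table lists ">103 km"
    connectivity: frozenset    # supported connectivity types


# Values taken from the example table; the column-to-solution mapping is an assumption.
CANDIDATES = [
    Candidate("XRC", frozenset({"zSeries"}),
              float("inf"), frozenset({"ESCON", "FICON"})),
    Candidate("GDPS Storage Manager", frozenset({"zSeries", "Heterogeneous"}),
              103, frozenset({"ESCON", "Fibre Channel"})),
    Candidate("eRCMF",
              frozenset({"pSeries", "Linux", "Sun", "HP", "Windows", "Heterogeneous"}),
              103, frozenset({"ESCON", "Fibre Channel"})),
]


def eliminate(candidates, platform, distance_km, connectivity):
    """Apply the criteria from top to bottom and record what each step removes."""
    steps = [
        ("platform", lambda c: platform in c.platforms),
        ("distance", lambda c: distance_km <= c.max_distance_km),
        ("connectivity", lambda c: connectivity in c.connectivity),
    ]
    remaining = list(candidates)
    log = []
    for label, keep in steps:
        dropped = [c.name for c in remaining if not keep(c)]
        remaining = [c for c in remaining if keep(c)]
        log.append((label, dropped))
    return [c.name for c in remaining], log


survivors, log = eliminate(CANDIDATES, "zSeries", 1200, "FICON")
print("Remaining candidates:", survivors)
for label, dropped in log:
    print(f"{label}: eliminated {dropped}")
```

With the example answers (zSeries, 1200 km, FICON), the sketch removes eRCMF at the platform step
and GDPS Storage Manager at the distance step, leaving XRC as the only candidate, which matches
steps 1 through 3 above.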
Some of these questions will require the business line to answer them in a Risk and Business
Impact Analysis. Other questions are for the operations staff to answer from their knowledge
of the IT infrastructure.
Tolerance to   Low tolerance   Low tolerance   Low tolerance   Somewhat tolerant   Very tolerant
outage         to outage       to outage       to outage       to outage           to outage
The methodology allows room for product and Disaster Recovery experts to add their
expertise to the evaluation process after an initial preliminary set of candidate solutions is
identified.
The intent of the methodology is to provide a framework for efficiently organizing multiple
Disaster Recovery technologies, and for more quickly identifying the appropriate candidate
solutions for any given set of client requirements.
Recovery Point Objective   Near zero   Near zero   Near zero   Near zero
Other notes
Supported disk storage     PPRC-compliant   PPRC HyperSwap-      Any z/OS®           IBM ESS     IBM ESS
(secondary site)           storage          compliant storage    supported storage
Recovery Point Objective   Near zero        Near zero            Near zero           Near zero   Near zero
The solutions for Transaction Integrity are specific to the database and application software
being used. Because of this, the list of possible solutions is very broad, and it is not feasible to
be all-inclusive. You should involve a software specialist skilled in the application and
database set that you are using for detailed evaluation of Transaction Integrity recovery
specific to your database and application. We do show common examples of solutions in
these cells in the matrix.
Recovery Point Objective   Near zero   Near zero   Near zero, few seconds to   Near zero
                                                   few minutes, few minutes
                                                   to hours
Recovery Point Objective   Few seconds to few minutes   Near zero   Near zero   Near zero
Recovery Point Objective   Near zero, few seconds    Near zero, few seconds    Near zero, few seconds
                           to few minutes, minutes   to few minutes, minutes   to few minutes, minutes
                           to hours (defined by      to hours (defined by      to hours (defined by
                           user policy)              user policy)              user policy)
Other notes
Recovery Point Objective   Near zero, few seconds to      Depending on the log shipping   Near zero
                           few minutes, minutes to        mechanism, loss of only few
                           hours (dependent on specific   transactions possible
                           database, application, and
                           hardware)
Other notes
The scope of this paper is to focus on hardware and operating system-level IBM TotalStorage
Disaster Recovery solutions. You should involve a software specialist skilled in the application
and database set that you are using. However, as a general statement, robust databases
have integrated software functionalities to enhance and minimize Planned Outages.
Other notes
The scope of this Redpaper is to focus on hardware and operating system-level IBM
TotalStorage Disaster Recovery solutions. You should involve a software specialist skilled in
the application and database set that you are using. However, as a general statement, robust
databases have integrated software functions for Unplanned Outage recovery.
Other notes
The scope of this paper is to focus on hardware and operating system-level IBM TotalStorage
Disaster Recovery solutions. You should involve a software specialist skilled in the application
and database set that you are using.
Other notes
Connectivity
Recovery Point Objective   Minutes to hours   Minutes to hours   Minutes to hours   Minutes to hours   Hours
Other notes
Recovery Point Objective   Minutes to hours   Minutes to hours   Minutes to hours   Minutes to hours   Minutes to hours
Connectivity N/A
Other notes
You might partially or fully justify the requested investment for Disaster Recovery to senior
management by quantifying the following values, and then portraying the proposed Disaster
Recovery solution cost as only a portion of the anticipated ongoing daily benefits, as identified
by these questions.
Tactical
1. Employee idling labor cost
2. Cost of re-creation and recovery of lost data
3. Salaries paid to staff unable to undertake billable work
4. Salaries paid to staff to recover work backlog and maintain deadlines
5. Interest value on deferred billings
6. Penalty clauses invoked for late delivery and failure to meet service levels
7. Loss of interest on overnight balances; cost of interest on lost cash flow
8. Delays in client accounting, accounts receivable and billing/invoicing
9. Additional cost of working; administrative costs; travel and subsistence; and so on
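As a sketch of how the first few tactical values might be rolled up, the following Python fragment
totals a set of per-outage costs and compares the result with a proposed solution cost. All
figures, item amounts, and function names are hypothetical placeholders chosen for illustration;
real values would come from the Risk and Business Impact Analysis.

```python
from typing import Mapping


def idle_labor_cost(employees: int, hourly_cost: float, outage_hours: float) -> float:
    """Item 1: cost of paying staff who cannot work during the outage."""
    return employees * hourly_cost * outage_hours


def total_outage_cost(items: Mapping[str, float]) -> float:
    """Sum the individual tactical cost items for one outage."""
    return sum(items.values())


# Hypothetical inputs: 200 idle employees at 50.0 per hour for an 8-hour outage.
items = {
    "employee idling labor": idle_labor_cost(200, 50.0, 8),
    "re-creation and recovery of lost data": 15_000.0,
    "penalty clauses for late delivery": 25_000.0,
}

outage_cost = total_outage_cost(items)
solution_cost = 300_000.0  # hypothetical cost of the proposed solution
outages_to_break_even = solution_cost / outage_cost

for name, amount in items.items():
    print(f"{name}: {amount:,.0f}")
print(f"Total cost of one outage: {outage_cost:,.0f}")
print(f"Solution cost equals {outages_to_break_even:.1f} such outages")
```

Expressing the solution cost as a multiple of a single avoided outage is one simple way to
present the comparison to senior management; the remaining items in the list can be added to
`items` in the same way.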
We provide them here so that you can have a guideline for the types of information that will
need to be gathered and analyzed by the detailed evaluation teams to finalize an in-depth
recommendation.
Performance
54. Has a bandwidth analysis been performed by collecting and analyzing data on the
production applications?
55. What percentage of the workload is required to be mirrored?
56. What is the method of automation to be used (GDPS, other)?
57. Is cross-platform data consistency required?
58. What platforms?
59. What level of consistency?
Connectivity
60. What is the distance to the remote site (miles or kilometers)?
61. What is the infrastructure to the remote site (Dark Fibre, Fibre provider/DWDM, Telecom
line and its speed and type: T1 at 128 KB/sec, T3 at 5 MB/sec, OC3 at 19 MB/sec; IP)?
62. Will channel extenders be used? If so, which channel extender vendor is preferred (CNT,
Cisco, McData, Brocade, other)?
63. What is the write update rate (MB/sec, ops/sec), and how does it vary by time of day and
month?
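The answers to questions 55, 61, and 63 can be combined into a rough first-pass link estimate.
The sketch below uses the per-link throughput figures quoted in question 61, treats them as
sustained rates (an assumption), and ignores protocol overhead, compression, and time-of-day
variation. The function names and the sample workload are hypothetical. It is not a substitute
for the bandwidth analysis asked about in question 54.

```python
import math

# Throughput figures as quoted in question 61, expressed in MB/sec.
LINK_MB_PER_SEC = {"T1": 0.128, "T3": 5.0, "OC3": 19.0}


def mirrored_write_rate(peak_write_mb_per_sec, fraction_mirrored):
    """Write rate that must cross the link, given the share of workload mirrored."""
    return peak_write_mb_per_sec * fraction_mirrored


def links_needed(required_mb_per_sec, link_type):
    """Smallest whole number of links of one type that covers the required rate."""
    return math.ceil(required_mb_per_sec / LINK_MB_PER_SEC[link_type])


# Hypothetical workload: 24 MB/sec peak write rate, half of it mirrored.
required = mirrored_write_rate(24.0, 0.5)
for link_type in LINK_MB_PER_SEC:
    print(f"{link_type}: {links_needed(required, link_type)} link(s) "
          f"for {required} MB/sec")
```

Sizing against the peak write rate rather than the average is a deliberate choice in this
sketch; whether that is appropriate depends on the mirroring technique and on how the write
rate varies over time.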
The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this Redpaper.
IBM Redbooks
For information about ordering these publications, see “How to get IBM Redbooks” on
page 55. Note that some of the documents referenced here may be available in softcopy only.
IBM TotalStorage Solutions for Disaster Recovery, SG24-6547
The IBM TotalStorage Solutions Handbook, SG24-5250
A Disaster Recovery Solution Selection Methodology
Redpaper

Learn and apply a Disaster Recovery Solution Selection Methodology
How to find the right Disaster Recovery solution
Working with IBM TotalStorage products

There is a wide variety of IBM TotalStorage Disaster Recovery technologies and solutions. Each
is very powerful in its own way, and each has its own unique characteristics. How can we
select the optimum combination of solutions? How do we organize and manage all these valid
Disaster Recovery technologies?

These questions have vexed Disaster Recovery solution designers for a long time. Developing
the skill to perform this selection function effectively was often time consuming and
incomplete. It can be difficult to transfer these skills to other colleagues.

In this Redpaper, we offer a suggested Disaster Recovery Solution Selection Methodology that
is designed to help with this problem. The intent of our methodology is to allow us to navigate
the seemingly endless permutations of Disaster Recovery technology quickly and efficiently,
and to identify preliminary, valid, cost-justified solutions.

This methodology is not designed to replace in-depth skills. It is meant as a guideline and a
framework. Proper application of this methodology can significantly reduce the effort and time
required to identify proper solutions, and therefore accelerate the selection cycle.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts
from IBM, Customers and Partners from around the world create timely technical information
based on realistic scenarios. Specific recommendations are provided to help you implement IT
solutions more effectively in your environment.