
Front cover

A Disaster Recovery Solution Selection Methodology

Learn and apply a Disaster Recovery Solution Selection Methodology

How to find the right Disaster Recovery solution

Working with IBM TotalStorage products

Cathy Warrick
John Sing

ibm.com/redbooks Redpaper
International Technical Support Organization

A Disaster Recovery Solution Selection Methodology

February 2004
Note: Before using this information and the product it supports, read the information in “Notices” on page v.

First Edition (February 2004)

This edition applies to the gamut of IBM TotalStorage products.

© Copyright International Business Machines Corporation 2004. All rights reserved.


Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule
Contract with IBM Corp.
Contents

Notices
Trademarks

Preface
The team that wrote this Redpaper
Become a published author
Comments welcome

Chapter 1. Disaster Recovery Solution Selection Methodology
1.1 The challenge in selecting Disaster Recovery solutions
1.1.1 The nature of Disaster Recovery solutions
1.2 The tiers of Disaster Recovery
1.3 Blending tiers into an optimized solution
1.3.1 Using tiers as a communication tool to senior management and others
1.3.2 The use of tiers in this Redpaper
1.4 Disaster Recovery Solution Selection Methodology tutorial
1.4.1 Flowchart of the methodology
1.4.2 Intended usage and limitations of the methodology
1.4.3 Hourglass concept in Disaster Recovery Solution Methodology
1.4.4 Steps in the Disaster Recovery Solution Selection Methodology
1.4.5 Value of the Disaster Recovery Solution Selection Methodology
1.4.6 Step D: Turn over identified solutions to detailed evaluation team
1.4.7 Updating the methodology as technology advances
1.5 An example: Using the DR Solution Selection Methodology
1.5.1 Step A: Ask specific questions in a specific order
1.5.2 Step B: Use level of outage and Tier/RTO to identify RTO solution subset
1.5.3 Step C: Eliminate non-solutions
1.5.4 Step D: Turn over identified preliminary solutions to evaluation team
1.6 Summary

Chapter 2. Sample scenarios
2.1 Scenario 1: An efficient 24-hour recovery
2.1.1 Step A: Client requirements
2.1.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
2.1.3 Step C: Eliminate non-solutions
2.2 Scenario 2: A long distance recovery at Tier 4
2.2.1 Step A: Client requirements
2.2.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
2.2.3 Step C: Eliminate non-solutions
2.3 Scenario 3: Enterprise long distance recovery at Tier 6
2.3.1 Step A: Client requirements
2.3.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
2.3.3 Step C: Eliminate non-solutions

Appendix A. Disaster Recovery Solution Selection Methodology matrixes
Starter set of business requirement questions
Disaster Recovery Solution Matrix
Notes on the Solution Matrix cells
Eliminate non-solutions matrixes
© Copyright IBM Corp. 2004. All rights reserved. iii


Tier 7 Planned Outage
Tier 7 Unplanned Outage
Tier 7 Transaction Integrity
Tier 6 Planned Outage
Tier 6 Unplanned Outage
Tier 6 Transaction Integrity
Tier 5 Planned Outage
Tier 5 Unplanned Outage
Tier 5 Transaction Integrity
Tiers 4 and 3 Planned Outage
Tier 4 Unplanned Outage
Tier 4 Transaction Integrity
Tier 3 Unplanned Outage
Tier 3 Transaction Integrity
Tiers 2 and 1 Planned Outage
Tiers 2 and 1 Unplanned Outage
Tiers 2 and 1 Transaction Integrity
Additional business requirements questions
Justifying business continuance to the business
Business requirements questions for detailed evaluation team

Related publications
IBM Redbooks
How to get IBM Redbooks
Help from IBM

Index

iv A Disaster Recovery Solution Selection Methodology


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult
your local IBM representative for information on the products and services currently available in your area. Any
reference to an IBM product, program, or service is not intended to state or imply that only that IBM product,
program, or service may be used. Any functionally equivalent product, program, or service that does not
infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to
evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The
furnishing of this document does not give you any license to these patents. You can send license inquiries, in
writing, to:
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are
inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS
PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED,
INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT,
MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of
express or implied warranties in certain transactions; therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made
to the information herein; these changes will be incorporated in new editions of the publication. IBM may make
improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time
without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any
manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the
materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring
any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published
announcements or other publicly available sources. IBM has not tested those products and cannot confirm the
accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the
capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them
as completely as possible, the examples include the names of individuals, companies, brands, and products.
All of these names are fictitious and any similarity to the names and addresses used by an actual business
enterprise is entirely coincidental.

COPYRIGHT LICENSE:
This information contains sample application programs in source language, which illustrate programming
techniques on various operating platforms. You may copy, modify, and distribute these sample programs in
any form without payment to IBM, for the purposes of developing, using, marketing or distributing application
programs conforming to the application programming interface for the operating platform for which the sample
programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore,
cannot guarantee or imply reliability, serviceability, or function of these programs. You may copy, modify, and
distribute these sample programs in any form without payment to IBM for the purposes of developing, using,
marketing, or distributing application programs conforming to IBM's application programming interfaces.



Trademarks
The following terms are trademarks of the International Business Machines Corporation in the United States,
other countries, or both:
AIX®
AS/400®
DB2®
Enterprise Storage Server®
ESCON®
eServer (logo)®
FICON®
FlashCopy®
GDPS®
HyperSwap™
ibm.com®
IBM®
OS/390®
OS/400®
pSeries®
Redbooks™
Redbooks (logo)™
RS/6000®
S/390®
Tivoli®
TotalStorage®
z/OS®
zSeries®

The following terms are trademarks of other companies:

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Other company, product, and service names may be trademarks or service marks of others.



Preface

This Redpaper helps you design a Disaster Recovery solution by presenting a Disaster
Recovery Solution Selection Methodology to assist in the process.

The team that wrote this Redpaper


This Redpaper was produced by a team of specialists from around the world working at the
International Technical Support Organization, San Jose Center.

Cathy Warrick is a Project Leader at the International Technical Support Organization, San
Jose Center. Before joining the ITSO, she worked in the IBM Storage Field Education group,
managing the Technical Leadership Program.

John Sing is a Senior Consultant with IBM Systems Group Business Continuance Strategy
and Planning, helping to plan and integrate IBM TotalStorage® products into the overall IBM
Business Continuance strategy and product portfolio. He started in the Disaster Recovery
arena in 1994 while on assignment to IBM Hong Kong S.A.R. of China and IBM China. In
1998, John joined the Enterprise Storage Server® (ESS) Planning team for PPRC, XRC, and
FlashCopy®; in 2000, John became the Marketing Manager for ESS Copy Services, and in
mid-2002, joined the Systems Group. John has been with IBM for 22 years.

Thanks to the following people for their contributions to this project:


Rainer Eisele, SVA GmbH, an IBM Business Partner
Ray Pratts, Mark III System, an IBM Business Partner
Ulrich Walter, IBM
Ian R. Wright, IBM

Become a published author


Join us for a two- to six-week residency program! Help write an IBM Redbook dealing with
specific products or solutions, while getting hands-on experience with leading-edge
technologies. You'll team with IBM technical professionals, Business Partners and/or clients.

Your efforts will help increase product acceptance and client satisfaction. As a bonus, you'll
develop a network of contacts in IBM development labs, and increase your productivity and
marketability.

Find out more about the residency program, browse the residency index, and apply online at:
ibm.com/redbooks/residencies.html



Comments welcome
Your comments are important to us!

We want our papers to be as helpful as possible. Send us your comments about this
Redpaper or other Redbooks™ in one of the following ways:
• Use the online "Contact us" review redbook form found at:
ibm.com/redbooks
• Send your comments in an Internet note to:
redbook@us.ibm.com
• Mail your comments to:
IBM® Corporation, International Technical Support Organization
Dept. QXXE Building 80-E2
650 Harry Road
San Jose, California 95120-6099



Chapter 1. Disaster Recovery Solution Selection Methodology
There is a wide variety of IBM TotalStorage Disaster Recovery technologies and solutions.
Each is powerful in its own way, and each has its own unique characteristics. How can we
select the optimum combination of solutions? How do we organize and manage all these
valid Disaster Recovery technologies?

These questions have long vexed Disaster Recovery solution designers. Developing the skill
to perform this selection effectively has often been time-consuming, and the results were
often incomplete. These skills can also be difficult to transfer to colleagues.

In this Redpaper, we offer a suggested Disaster Recovery Solution Selection Methodology
that is designed to help address this problem. The intent of our methodology is to allow us to
navigate the seemingly endless permutations of Disaster Recovery technology quickly and
efficiently, and to identify valid, cost-justified preliminary solutions.

This methodology is not designed to replace in-depth skills. It is meant as a guideline and a
framework. Proper application of this methodology can significantly reduce the effort and time
required to identify proper solutions, and therefore accelerate the selection cycle.

For more information about this methodology, see the redbook IBM TotalStorage Solutions for
Disaster Recovery, SG24-6547.



1.1 The challenge in selecting Disaster Recovery solutions
From an IT infrastructure standpoint, there is a large variety of valid Disaster Recovery
products. The fundamental challenge is to select the optimum blend of all these Disaster
Recovery products and technologies.

In the past, the common problem has been a tendency to view the Disaster Recovery solution
as a collection of individual product technologies and piece parts; see Figure 1-1. Instead, a
Disaster Recovery solution needs to be viewed as a whole: an integrated, multiproduct solution.

In this chapter, we propose a Disaster Recovery Solution Selection Methodology that can be
used to sort, summarize, and organize the various business requirements in a methodical
way. We then use those business requirements to efficiently identify a proper and valid
subset of Disaster Recovery technologies that addresses the requirements.

• Each vendor and product area tends to build separate pieces of the solution.
• There is insufficient interlocking of the different areas.
• Business Continuance and Disaster Recovery need to be seen as an integrated product solution.
• There are many valid technologies, but how do we choose among them?

Figure 1-1 Historical challenges in selecting Disaster Recovery solutions

1.1.1 The nature of Disaster Recovery solutions


To combine and properly select among multiple products, disciplines, and skills to effect a
successful IT Disaster Recovery solution, we first observe that all valid Disaster Recovery IT
technologies can be categorized into five component domains:
• Servers
• Storage
• Software and automation
• Networking and physical infrastructure
• Skills and services required to implement and operate the above

All IT infrastructure necessary to support the Disaster Recovery solution can be assigned to
one of these five components; see Figure 1-2 on page 3.



The Solution: True Nature of Disaster Recovery
The figure depicts data, servers and operating systems (zSeries, pSeries, iSeries, xSeries,
Solaris, HP-UX, WinNT/2000), applications, management control, the telecom network,
physical facilities, and the operations, network, and applications staff. It concludes that a
comprehensive approach with the five IT component areas results in a solution:
1. Servers
2. Storage
3. Software and Automation
4. Networking (includes Physical Infrastructure)
5. Skills and Services
Provide all five to assure "On Time, On Budget, On Demand."

Figure 1-2 The five components

These five categories provide a framework to organize the various component evaluation
skills that will be needed. Gathering the proper mix of evaluation skills together facilitates an
effective comparison, contrast, and blending of all five product component areas to arrive at
an optimum solution.
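The five domains can also serve as a simple completeness check on a proposed solution. The following minimal Python sketch is not part of the methodology itself. It assumes that a proposal is recorded as a mapping from each domain to the elements chosen for it, and it reports any domain that the proposal leaves empty. The `Component` enum, the `missing_components` function, and the sample proposal are hypothetical illustrations.

```python
from enum import Enum


class Component(Enum):
    """The five IT component domains named in this section."""
    SERVERS = "Servers"
    STORAGE = "Storage"
    SOFTWARE_AUTOMATION = "Software and automation"
    NETWORKING_INFRASTRUCTURE = "Networking and physical infrastructure"
    SKILLS_SERVICES = "Skills and services"


def missing_components(solution: dict[Component, list[str]]) -> list[Component]:
    """Return, in definition order, the domains for which the proposal names nothing."""
    return [c for c in Component if not solution.get(c)]


# Hypothetical proposal: servers, storage, and software are chosen,
# but there is no network or staffing plan yet.
proposal = {
    Component.SERVERS: ["zSeries"],
    Component.STORAGE: ["ESS PPRC"],
    Component.SOFTWARE_AUTOMATION: ["Tivoli Storage Manager"],
}

for gap in missing_components(proposal):
    print("Not yet covered:", gap.value)
```

An empty result would mean that every domain has at least one named element. It would not mean that the elements are the right ones; judging that remains the work of the evaluation skills described above.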

1.2 The tiers of Disaster Recovery


The concept of tiers is a common method in today's best practices for Disaster Recovery
solution design. The concept is powerful and central to our selection methodology because it
recognizes that, for a given client Recovery Time Objective (RTO), all Disaster Recovery
products and technologies can be sorted into an RTO solution subset that addresses that
particular RTO range.

By categorizing Disaster Recovery technology into the various tiers, we can more easily
match the desired RTO with the optimum set of technologies. Multiple tiers are needed
because, as the RTO decreases, the optimum Disaster Recovery technologies for that RTO
must change. For any given RTO, there is always a particular set of Disaster Recovery
technologies with the best price/performance.

The tiers concept is flexible. As products and functions change and improve over time, the
Tiers chart needs only to be updated by adding each new technology to the appropriate tier
and RTO.
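To make the sorting concrete, here is a minimal Python sketch of a Tiers chart held as data. It returns the RTO solution subset for a target RTO and shows how the chart is extended. The technologies per tier follow Figure 1-3. The numeric RTO range given for each tier is a hypothetical assumption chosen only for illustration; your own Tiers chart should supply the real values. The names `Tier`, `TIERS`, `rto_solution_subset`, and `add_technology` are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tier:
    # Hypothetical RTO range, in hours, that this tier is assumed to address.
    min_rto_hours: float
    max_rto_hours: float
    technologies: list[str] = field(default_factory=list)


# Technologies per tier follow Figure 1-3; the RTO ranges are illustrative assumptions only.
TIERS: dict[int, Tier] = {
    7: Tier(0.0, 1.0, ["GDPS/PPRC/VTS P2P", "AIX HACMP/PPRC", "OS/400 HABP"]),
    6: Tier(0.25, 4.0, ["XRC", "PPRC", "VTS Peer to Peer"]),
    5: Tier(1.0, 8.0, ["Two-site, two-phase commit"]),
    4: Tier(4.0, 16.0, ["Database shadowing and journaling", "FlashCopy", "TSM-DRM"]),
    3: Tier(12.0, 24.0, ["Electronic vaulting", "TSM", "Tape"]),
    2: Tier(24.0, 72.0, ["PTAM", "Hot site", "TSM"]),
    1: Tier(48.0, 168.0, ["PTAM"]),
}


def rto_solution_subset(rto_hours: float) -> dict[int, list[str]]:
    """Return the tiers (highest first) whose RTO range covers the target, with their technologies."""
    return {
        number: list(tier.technologies)
        for number, tier in TIERS.items()
        if tier.min_rto_hours <= rto_hours <= tier.max_rto_hours
    }


def add_technology(tier_number: int, technology: str) -> None:
    """Keep the chart current by adding a new technology to the appropriate tier."""
    TIERS[tier_number].technologies.append(technology)


print(list(rto_solution_subset(2.0)))   # [6, 5]
add_technology(4, "Hypothetical new point-in-time copy")
print(rto_solution_subset(8.0)[4][-1])  # Hypothetical new point-in-time copy
```

The ranges deliberately overlap, because the chart shows neighboring tiers sharing parts of the recovery-time axis. A target RTO can therefore fall into more than one tier, and the lookup returns all of them as the RTO solution subset.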

The Tiers chart, shown in Figure 1-3 on page 4, gives a generalized view of some of today's
IBM Disaster Recovery technologies by tier. As the recovery time becomes shorter, more
aggressive Disaster Recovery technologies must be applied to achieve that RTO, carrying with
them an associated increase in value and capital cost.

Chapter 1. Disaster Recovery Solution Selection Methodology 3

Tiers of Disaster Recovery
Best Disaster Recovery practice is to blend tiers of solutions in order to maximize application
coverage at lowest possible cost. One size, one technology, or one methodology doesn't fit all
applications.

The chart plots value against recovery time (15 min., 1-4 hr., 4-8 hr., 8-12 hr., 12-16 hr.,
24 hr., days) and shows the following tiers:
Tier 7 - Highly automated, business wide, integrated solution (examples: GDPS/PPRC/VTS P2P,
AIX HACMP/PPRC, OS/400 HABP)
Tier 6 - Storage mirroring (examples: XRC, PPRC, VTS Peer to Peer)
Tier 5 - Software two site, two phase commit (transaction integrity)
Tier 4 - Batch/Online database shadowing and journaling, Point in Time disk copy (FlashCopy),
TSM-DRM
Tier 3 - Electronic Vaulting, TSM**, Tape
Tier 2 - PTAM, Hot Site, TSM**
Tier 1 - PTAM*

Data recreation ranges from zero or near zero at the top tiers, through minutes to hours
(Tier 4) and up to 24 hours (Tier 3), to 24-48 hours (Tiers 2 and 1). The tiers are grouped
from top to bottom into applications with low tolerance to outage, applications somewhat
tolerant to outage, and applications very tolerant to outage.

* PTAM = Pickup Truck Access Method with Tape
** TSM = Tivoli Storage Manager
GDPS = Geographically Dispersed Parallel Sysplex
Tiers based on SHARE definitions

Figure 1-3 Tiers of Disaster Recovery

The concept and shape of the Tiers chart continues to apply even as the scale of the
application or applications changes. Large scale applications will tend to move the curve to
the right, and small scale applications will tend to move the curve to the left. But in both
cases, the general relationship of the various tiers and Disaster Recovery technologies to
each other remains the same. Finally, although some Disaster Recovery technologies fit into
multiple tiers, no single Disaster Recovery technology can be optimized for all the tiers.

Of course, your technical staff can and should, when appropriate, create a specific version of
the Tiers chart for your particular environment. After the staff agrees on which tier or tiers,
and corresponding RTO, a solution delivers for your enterprise, Disaster Recovery technical
evaluation and comparisons become much easier, and the technology alternatives can be
tracked and organized in relation to each other. Although the technology within the tiers has
obviously changed over time, the concept remains as valid today as when it was first
described by the U.S. SHARE user group in 1988.

1.3 Blending tiers into an optimized solution


Best practice today in designing a Disaster Recovery solution is to use the tiers concept
further to derive a blended Disaster Recovery solution for the entire enterprise. The most
common result, from an enterprise standpoint, is a strategic architecture of three tiers in a
blended Disaster Recovery solution. Three tiers generally appear to be the optimum number.
At the enterprise level, two tiers are generally insufficiently optimized (in other words,
overkill in some places and underkill in others). Four tiers add complexity but generally do
not provide enough additional strategic benefit.

To use the tiers to derive a blended, optimized enterprise Disaster Recovery architecture, we
suggest the following steps:
1. Categorize the business' entire set of applications into three bands: Low tolerance to
outage, Somewhat tolerant to outage, and Very tolerant to outage. Some applications are
not critical in and of themselves but feed the critical applications; those applications need
to be included in the higher tier as well.
2. Within each band, there are tiers. The individual tiers represent the major Disaster
Recovery technology choices for that band. It is not necessary to use all the tiers, nor is it
necessary to use all the technologies.
3. After we have segmented the applications (as best we can) into the three bands, we
usually select one best strategic Disaster Recovery methodology for each band. The
contents of the tiers are the candidate technologies from which the strategic methodology
is chosen.
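Step 1 contains one rule that is easy to miss: a noncritical application that feeds a critical one must be protected along with it. The following minimal Python sketch applies that rule and then groups the applications by band for step 3. We interpret "the higher tier" as the band of the application being fed. The application names, the `feeds` relationship, and the function names are all hypothetical.

```python
from enum import IntEnum


class Band(IntEnum):
    # Lower value = less tolerant to outage, that is, more critical.
    LOW_TOLERANCE = 0
    SOMEWHAT_TOLERANT = 1
    VERY_TOLERANT = 2


def assign_bands(initial: dict[str, Band], feeds: dict[str, list[str]]) -> dict[str, Band]:
    """Step 1: raise each feeder application to at least the band of any application it feeds."""
    bands = dict(initial)
    changed = True
    while changed:  # repeat until stable so that chains of feeders are handled
        changed = False
        for app, targets in feeds.items():
            for target in targets:
                if bands[target] < bands[app]:
                    bands[app] = bands[target]
                    changed = True
    return bands


def group_by_band(bands: dict[str, Band]) -> dict[Band, list[str]]:
    """Step 3 input: the applications that will share one strategic methodology per band."""
    groups: dict[Band, list[str]] = {band: [] for band in Band}
    for app, band in bands.items():
        groups[band].append(app)
    return groups


# Hypothetical applications: "pricing" is not critical itself but feeds "order-entry",
# and "reference-data" in turn feeds "pricing".
initial = {
    "order-entry": Band.LOW_TOLERANCE,
    "pricing": Band.VERY_TOLERANT,
    "reference-data": Band.VERY_TOLERANT,
    "reporting": Band.SOMEWHAT_TOLERANT,
}
feeds = {"pricing": ["order-entry"], "reference-data": ["pricing"]}

bands = assign_bands(initial, feeds)
for band, apps in group_by_band(bands).items():
    print(band.name, apps)
```

Here both "pricing" and "reference-data" end up in the low-tolerance band. "reporting", which feeds nothing, keeps its own band.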

A blended architecture maps the varying application recovery time requirements to
appropriate technologies at an optimized cost. The resulting blended-tier Disaster Recovery
architecture provides the best possible application coverage for the minimum cost.
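The cost argument can be illustrated with simple arithmetic. The following Python sketch compares protecting every application at the most aggressive tier with protecting each band only at the tier chosen for it. All costs and application counts are hypothetical relative units invented for the illustration, not real prices. The names `COST_PER_APP`, `PLAN`, `blended_cost`, and `single_tier_cost` are also hypothetical.

```python
# Hypothetical relative cost of protecting one application at each tier (illustrative units, not real prices).
COST_PER_APP: dict[int, int] = {6: 10, 4: 4, 2: 1}

# Hypothetical plan: band -> (number of applications, tier chosen as that band's strategy).
PLAN: dict[str, tuple[int, int]] = {
    "Low tolerance to outage": (2, 6),
    "Somewhat tolerant to outage": (3, 4),
    "Very tolerant to outage": (5, 2),
}


def blended_cost(plan: dict[str, tuple[int, int]]) -> int:
    """Each band is protected only at the tier chosen for it."""
    return sum(count * COST_PER_APP[tier] for count, tier in plan.values())


def single_tier_cost(plan: dict[str, tuple[int, int]], tier: int) -> int:
    """Every application is protected at the same tier, regardless of its band."""
    return sum(count for count, _ in plan.values()) * COST_PER_APP[tier]


print(blended_cost(PLAN))         # 37  (2*10 + 3*4 + 5*1)
print(single_tier_cost(PLAN, 6))  # 100 (10 applications * 10)
```

With these made-up numbers, a single Tier 6 architecture is overkill for the two tolerant bands. A single Tier 2 architecture (`single_tier_cost(PLAN, 2)` gives 10) would be cheaper but underkill for the low-tolerance band. The blended plan meets every band's requirement at a cost between the two.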

1.3.1 Using tiers as a communication tool to senior management and others


The concept of tiers is also very useful as a communication tool regarding Disaster Recovery
solution recommendations to others in the department, and especially to senior management.

The tier concept is simple enough that non-technical personnel can see the end result of
technical evaluations in a straightforward fashion. Senior management does not need to
understand the technology inside each tier, but they can clearly see the Recovery Time
Objective and the associated cost versus RTO trade-off.

This ability to communicate the bottom line allows senior management to understand the
recommendation and the trade-offs, and therefore to make a decision quickly and efficiently.
The clarity of the choices and of their associated financial cost should also result in a higher
likelihood of adequate funding for the Disaster Recovery project.

1.3.2 The use of tiers in this Redpaper


We categorized the product information in this Redpaper by tiers. You will be able to
recognize easily which tier or tiers any given technology is likely to be in, and therefore where
a particular Disaster Recovery tool and solution can be used in your environment. Figure 1-4
on page 6 shows a partial summary list of the many IBM technologies that are categorized in
this Redpaper.



IBM eServer / TotalStorage Disaster Recovery Portfolio of Tools
The diagram shows two sites, each with servers and applications, linked by clustering
facilities and common timers, with primary disk at Site 1 and mirrored disk at Site 2. The
tools are grouped as follows:

#1: Byte Movers (IBM TotalStorage)
• ESS PPRC (Tier 6)
• ESS XRC (Tier 6)
• Virtual Tape Server Peer to Peer (Tier 6)
• FAStT, SAN Volume Controller Mirroring (Tier 6)
• ESS, FAStT, SAN Volume Controller FlashCopy (Tier 4)
• 3590, 3592, LTO tape (Tiers 1, 2, 3, 4)
• Storage software (Tiers 1, 2, 3, 4)

#2: Data Integrity (eServer)
• zSeries: Geographically Dispersed Parallel Sysplex (GDPS) (Tier 7)
• pSeries: AIX/HACMP (High Availability Cluster Multi-Processing) with PPRC (Tier 7)
• iSeries: Clustering High Availability Business Partner software: Vision, Lakeview,
DataMirror (Tier 7)
• xSeries: X-Architecture, Blades (Tier 6)

#3: Transaction Integrity (Software and Automation)
• DB2, IMS, CICS, WebSphere (Tier 5)
• WebSphere, MQ (Tier 5)
• Tivoli Storage Manager (Tiers 2, 3, 4)

Networking and Infrastructure: IBM Global Services, IBM Business Partners, IBM Networking
Partners
Services: IGS, Business Partner Services

Figure 1-4 Portfolio of tools

1.4 Disaster Recovery Solution Selection Methodology tutorial


Let us now look at the methodology in detail. The following sections provide a tutorial and
examples of using this methodology to identify and select, more efficiently, the Disaster
Recovery technology that best fits your IT environment.

1.4.1 Flowchart of the methodology


The Disaster Recovery Solution Selection Methodology is designed to provide a clear,
understandable, flexible, and repeatable method to efficiently subset, organize, and select
preliminary Disaster Recovery solution recommendations from the wide portfolio of possible
technologies.

Figure 1-5 on page 7 shows a flowchart of our suggested methodology.



Flow of Disaster Recovery Solution Selection Methodology
Business requirements start the flow. For example, the CEO says "We need to be online
24x7", and the CIO concludes "Hmm.... That means Oracle and SAP must be recovered".
Together with the Risk Analysis results and the BIA / RTO / RPO Analysis results, these
requirements drive the following sequence:
1. Define the tier level for each application (using The Tiers of Disaster Recovery table).
2. Identify the DR solution subset from the DR solution matrix.
3. Eliminate DR solutions that do not apply to all requirements (using the detailed DR
solution description).
4. Pass the valid preliminary candidate solutions to the Detailed Evaluation Team.

Figure 1-5 Flow of the Disaster Recovery Solution Selection Methodology

Note that the prerequisite for entering the methodology is that the business requirements
have already been determined and agreed on across the organization: risk analysis, Business
Impact Analysis, application segmentation, and the associated Recovery Time Objectives and
Recovery Point Objectives.
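Because these prerequisites come before the methodology, a team may want a quick readiness check before starting. The following minimal Python sketch assumes a simple per-application record of the requirements listed above and reports what is still missing. The `AppRequirements` record, its fields, the `entry_gaps` function, and the sample inventory are hypothetical; the methodology does not prescribe any particular format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AppRequirements:
    name: str
    segment: Optional[str]      # application segment (band), if already assigned
    rto_hours: Optional[float]  # Recovery Time Objective
    rpo_hours: Optional[float]  # Recovery Point Objective
    agreed: bool                # organizational agreement reached on these values


def entry_gaps(risk_analysis_done: bool, bia_done: bool, apps: list[AppRequirements]) -> list[str]:
    """List every unmet prerequisite; an empty list means the methodology can be entered."""
    gaps: list[str] = []
    if not risk_analysis_done:
        gaps.append("risk analysis not performed")
    if not bia_done:
        gaps.append("Business Impact Analysis not performed")
    for app in apps:
        if app.segment is None:
            gaps.append(f"{app.name}: not segmented")
        if app.rto_hours is None or app.rpo_hours is None:
            gaps.append(f"{app.name}: RTO or RPO missing")
        if not app.agreed:
            gaps.append(f"{app.name}: requirements not agreed")
    return gaps


# Hypothetical inventory: "reporting" has no RPO and no sign-off yet.
apps = [
    AppRequirements("order-entry", "Low tolerance to outage", 1.0, 0.0, True),
    AppRequirements("reporting", "Somewhat tolerant to outage", 24.0, None, False),
]
print(entry_gaps(risk_analysis_done=True, bia_done=True, apps=apps))
```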

The Disaster Recovery Solution Selection Methodology is designed to:
• Provide a method to quickly and repeatably identify valid initial configuration options,
which is intended to accelerate determining the final solution.
• Enforce the asking of the correct business and IT requirements questions for a proper
Disaster Recovery configuration.
• Provide a convergence discussion methodology for the multiple products and IT
disciplines that must make up an integrated Disaster Recovery solution.
• Be easily extendable as products and technologies evolve.
• Capture basic expert Disaster Recovery intellectual capital in a teachable, repeatable way,
and provide a framework to consistently propagate these basic skills to a worldwide
audience, remote or local.

1.4.2 Intended usage and limitations of the methodology


The Disaster Recovery Solution Selection Methodology is intended to be used early in the
selection cycle to establish a general view of what the requirements are, and of which kinds
of solutions and Disaster Recovery technologies to start investigating for any particular
set of Disaster Recovery requirements. See Figure 1-6 on page 8.



[Figure: Intended Use of DR Solution Selection Methodology. The figure places the methodology within the planning and evaluation cycle (develop business strategy, recognize needs and initiatives, evaluate technologies and options, select a solution and decide, resolve concerns, implement and evaluate success). Early in this cycle, the DR Solution Selection Methodology promotes asking the proper initial questions and collecting the proper information, identifies valid initial solutions, and turns over the identified subset of possible solutions to the Detailed Evaluation Team for detailed investigation and selection.]

Figure 1-6 Intended use of Disaster Recovery Selection Methodology

It is important to note what the Disaster Recovery Solution Selection Methodology is not
intended to do:
• Replace detailed solution recommendation and configuration assistance.
• Replace in-depth technical validation.
• Replace detailed design and implementation skills and services.

The detailed evaluation team will perform those functions.

The Disaster Recovery Solution Selection Methodology is not intended to be a perfect
decision tree. Rather, it is a framework for efficiently organizing multiple Disaster Recovery
technologies and for more quickly identifying the proper possible solutions for any given set
of client requirements.

1.4.3 Hourglass concept in Disaster Recovery Solution Methodology


The Disaster Recovery Solution Selection Methodology is built around an hourglass concept.
The hourglass concept allows us to organize and minimize the information that we need to
gather in order to arrive at a valid solution (see Figure 1-7 on page 9).



[Figure: Disaster Recovery Solution Selection Methodology "Hourglass" Concept.
A. Ask specific questions in a specific order ('above the neck'). Start here: ask the proper high-level questions (application, platform, RTO, distance, connectivity, RPO, vendor...). The order of questions is designed to eliminate many non-qualifying solutions up front. Document the answers.
B. Use RTO to pick the appropriate solution subset ('at the neck'). Organize solutions by Tiers, which creates the RTO subset.
C. From among the subset, use the question answers to eliminate non-solutions ('below the neck'), applying the answers gathered previously.
D. Turn over the remaining valid solutions (for example, Solution 1, Solution 2, Solution 3) to the detailed evaluation team. As appropriate, each solution can then be expanded into multiple flavors (Solution 1a..., Solution 3a...), tailored to the client's exact needs.]

Figure 1-7 Hourglass concept

By segmenting the questions according to this hourglass concept and its three categories
(above, at, and below the neck), it becomes possible to reduce the nearly endless
permutations of possible Disaster Recovery technology combinations and solutions to a
manageable, methodical process.

1.4.4 Steps in the Disaster Recovery Solution Selection Methodology


Let’s step through the concepts: Steps A through D.

Step A: Ask specific questions in a specific order


A series of business Disaster Recovery requirements questions is asked, in a specific order.
With these questions, the basic environment, infrastructure, and desired recovery times for
the Disaster Recovery solution are established. Below, we suggest a basic starter set of
specific questions. Some of the questions will require the business line to answer them in
their Risk and Business Impact Analysis. Other questions are for the Operations staff to
answer from their knowledge of the IT infrastructure.

Starter set of business requirements questions:


1. What is the application or applications that need to be recovered?
2. On what platform or platforms does it run?
3. What is the desired Recovery Time Objective?
4. What is the distance between the recovery sites (if there is one)?
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that?
6. What are the specific hardware and software configurations that need to be recovered?
7. What is the Recovery Point Objective?
8. What is the amount of data that needs to be recovered?
9. What is the desired level of recovery (Planned/Unplanned/Transaction Integrity)?
10. Who will design the solution?
11. Who will implement the solution?

These are not all the possible questions, of course, but they are a valid starting point. You can
see additional questions in Appendix A, “Disaster Recovery Solution Selection Methodology
matrixes” on page 29.

Note that the order of the questions is intentional: it is designed to eliminate non-solutions
even while we are performing the information-gathering phase.

Figure 1-8 shows the questions and how they are used in our hourglass concept.

[Figure: Step A: Ask Specific Questions in a Specific Order.
'Above the neck' (Step A: start here, gather answers to proper questions): 1. What applications or databases to recover? 2. What platform? (z, p, i, x and Windows, Linux, heterogeneous open, heterogeneous z+Open) 3. What is the desired Recovery Time Objective (RTO)? 4. What is the distance between the sites? (if there are two sites) 5. What is the connectivity, infrastructure, and bandwidth between sites? 6. What are the specific h/w equipment(s) that need to be recovered?
'At the neck' (Step B: identify proper possible solution subset): 7. What is the Level of Recovery? (Planned Outage, Unplanned Outage, Transaction Integrity)
'Below the neck' (Step C: eliminate non-solutions): 8. What is the Recovery Point Objective? 9. What is the amount of data to be recovered (in GB or TB)? 10. Who will design the solution? (IGS, BP, client) 11. Who will implement the solution? (IGS, BP, client) 12. Remaining solutions are valid choices to give to the detailed evaluation team.]

Figure 1-8 Step A: Ask specific questions in a specific order

The questions above the neck of the hourglass define the basic business and IT
requirements. It is essential that these basic questions be answered fully, because without
any one of these answers it is not possible to properly evaluate which subset of solutions we
should investigate. In this way, the methodology enforces the collection of proper business
and infrastructure requirements before proceeding.

We must ensure that the answers to these questions have gained the consensus of the
enterprise’s management, business lines, and application staff, in addition to the IT
operations staff.
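To make the enforcement idea concrete, here is a minimal Python sketch. It assumes the starter questions are recorded as a dictionary keyed by short hypothetical names (applications, platform, rto, and so on) and grouped by hourglass position as in Figure 1-8. It only checks that every above-the-neck question has an answer before Step B begins; it cannot judge whether those answers have reached consensus.

```python
# Hourglass position of each starter question, following the grouping in Figure 1-8.
# The question keys are hypothetical short names, not part of the methodology itself.
ABOVE_NECK = "above the neck"   # basic business and IT requirements (Step A)
AT_NECK = "at the neck"         # selects the RTO solution subset (Step B)
BELOW_NECK = "below the neck"   # eliminates non-solutions (Step C)

STARTER_QUESTIONS = {
    "applications": ABOVE_NECK,
    "platform": ABOVE_NECK,
    "rto": ABOVE_NECK,
    "distance": ABOVE_NECK,
    "connectivity": ABOVE_NECK,
    "hardware": ABOVE_NECK,
    "level_of_recovery": AT_NECK,
    "rpo": BELOW_NECK,
    "amount_of_data": BELOW_NECK,
    "designer": BELOW_NECK,
    "implementer": BELOW_NECK,
}


def unanswered_above_neck(answers: dict) -> list:
    """List the above-the-neck questions that have no answer yet."""
    return [
        key
        for key, phase in STARTER_QUESTIONS.items()
        if phase == ABOVE_NECK and not answers.get(key)
    ]


def require_basic_requirements(answers: dict) -> None:
    """Refuse to proceed to Step B until every above-the-neck answer is present."""
    missing = unanswered_above_neck(answers)
    if missing:
        raise ValueError(f"Cannot select a solution subset yet; missing: {missing}")


# Above-the-neck answers from the example in 1.5.
example_answers = {
    "applications": "Heterogeneous",
    "platform": "zSeries",
    "rto": "3 hours",
    "distance": "35 km",
    "connectivity": "ESCON, DWDM, 50 MB/sec",
    "hardware": "IBM ESS",
}
require_basic_requirements(example_answers)  # passes: nothing above the neck is missing
print(unanswered_above_neck({"platform": "zSeries"}))
```

Keeping the hourglass position as data, rather than hard-coding it in the check, means the question list can grow (see Appendix A) without changing the function.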



Step B: Use Tier/RTO and Level of Recovery to identify solution subset
We are now ready to identify the preliminary candidate solutions. To do that, let’s review one
final concept: the Level of Recovery (defined below). Note that each level builds on the
previous level. There are three levels because the Disaster Recovery technologies and
solutions used differ for Planned Outages, Unplanned Outages, and Transaction Integrity.
• Planned Outage: The solution is required only to facilitate Planned Outages or data
migrations. Unplanned Outage recovery is not necessary.
• Unplanned Outage: The solution is required, at the hardware and data integrity level, to
facilitate Unplanned Outage recovery. It implies that Planned Outage support is also
available in this solution. This level of recovery does not perform Transaction Integrity
recovery at the application or database level.
• Transaction Integrity: The solution is required to provide Unplanned Outage recovery at
the application and database Transaction Integrity level. This level relies on an underlying
assumption that hardware level Planned Outage and Unplanned Outage support is also
available.
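Because each Level of Recovery builds on the previous one, the levels can be modeled as an ordered enumeration. The following is a sketch only; the class and function names are hypothetical. It captures the "builds on" relationship, so that asking for Transaction Integrity also brings in the Planned and Unplanned Outage support it relies on.

```python
from enum import IntEnum


class LevelOfRecovery(IntEnum):
    """The three Levels of Recovery, ordered so that each builds on the previous one."""
    PLANNED_OUTAGE = 1          # planned outages or data migrations only
    UNPLANNED_OUTAGE = 2        # hardware and data integrity recovery; implies Planned Outage support
    TRANSACTION_INTEGRITY = 3   # application/database level; assumes both levels below are available


def levels_in_place(level: LevelOfRecovery) -> list:
    """Return the given level together with every lower level it relies on."""
    return [lvl for lvl in LevelOfRecovery if lvl <= level]


for level in LevelOfRecovery:
    print(level.name, "->", [lvl.name for lvl in levels_in_place(level)])
```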

Having identified the appropriate Level of Recovery, we combine it with the RTO and consult
the Solution Matrix in Appendix A, “Disaster Recovery Solution Selection Methodology
matrixes” on page 29.

An extract of the full Solution Matrix is shown for illustration purposes in Figure 1-9. Take the
identified Level of Recovery and RTO answers, and locate the cell in the Solution Matrix
where the Level of Recovery row intersects the RTO/Tier column. That intersection cell
contains the initial candidate Disaster Recovery solutions for this particular RTO.
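The Step B lookup is essentially a table access keyed by Level of Recovery and tier. This sketch assumes a hypothetical `SOLUTION_MATRIX` dictionary that holds only the Unplanned Outage row of the Figure 1-9 extract; a real matrix would carry every row and tier from Appendix A.

```python
# Illustrative extract of the Solution Matrix: only the Unplanned Outage row of Figure 1-9.
# Keys are (Level of Recovery, tier); values are the contents of the intersection cell.
SOLUTION_MATRIX = {
    ("Unplanned Outage", 7): ["GDPS/PPRC", "GDPS/XRC"],
    ("Unplanned Outage", 6): ["XRC", "GDPS Storage Manager", "eRCMF"],
    ("Unplanned Outage", 4): ["Point in Time FlashCopy", "VTS Peer to Peer"],
}


def rto_solution_subset(level: str, tier: int) -> list:
    """Return the initial candidate solutions where the level row meets the tier column."""
    # An empty list means this extract has nothing in that cell.
    return list(SOLUTION_MATRIX.get((level, tier), []))


print(rto_solution_subset("Unplanned Outage", 6))
# ['XRC', 'GDPS Storage Manager', 'eRCMF']
```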

[Figure: Step B: Use Level of Outage and Tier/RTO to identify RTO Solution Subset. B1. Use RTO and recovery level to select the subset ('at the neck'). The Disaster Recovery Solution Matrix extract (for illustration purposes) has columns Tier 7, Tier 6, and Tier 4 and rows Planned Outages, Unplanned Outages, and Transaction Integrity. In the Unplanned Outages row, Tier 7 lists GDPS/PPRC and GDPS/XRC; Tier 6 lists XRC, GDPS Storage Manager, eRCMF, etc.; and Tier 4 lists Point in Time FlashCopy and VTS Peer to Peer. The Planned Outages row lists PPRC and PPRC-XD, and the Transaction Integrity row lists IMS RSR, Oracle, and DB2-specific solutions. The RTO and Tiers identify the "RTO Solution Subset".]

Figure 1-9 Step B: Identify candidate RTO solutions using tabular Tiers chart, RTO, and Level of
Recovery



In the illustration in Figure 1-9 on page 11, the identified preliminary candidate solutions are
XRC, GDPS® Storage Manager, and eRCMF.

Step C: Eliminate non-solutions


Now that we have identified the preliminary candidate Disaster Recovery solutions, we
eliminate non-solutions by applying the other answers gathered in Step A to the candidate
solutions.

For the solutions in this paper, we supply a starter set of Eliminate Non-Solutions tables in
Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on page 29. An
extract from one of those tables is shown in Figure 1-10.

[Figure: Step C: Eliminate Non-Solutions. The RTO solution subset from Step B (the Tier 7, Tier 6, and Tier 4 extract of the Solution Matrix) is looked up, and "My Questions and Answers" then eliminate non-solutions: C. Use 'answers' to eliminate non-solutions ('below the neck'). For the Tier 6 Unplanned Outages candidates XRC, GDPS Storage Manager with PPRC, and eRCMF, the answers are compared by platform, distance (XRC: any distance; the other two: < 103 km), Recovery Time Objective (XRC: 2-4 hours; the other two: 1-4 hours), connectivity (ESCON), and Recovery Point Objective (XRC: few seconds to few minutes; the other two: zero data loss). In the illustration, only GDPS Storage Manager with PPRC is a valid option. Step D. Turn over identified solutions to detailed evaluation team.]

Figure 1-10 Step C: Eliminate non-solutions

By applying the answers from Step A on topics such as distance and platform support, we
eliminate the candidate solutions that do not apply.

It is normal to have multiple possible solutions after we complete Step C. Whatever candidate
solutions remain after this pass through Step C are valid Disaster Recovery candidate
solutions.
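Step C can be sketched as a filter that applies each answer, in the table's top-to-bottom order, against each remaining candidate. The sketch below assumes a simplified encoding of the Tier 6 Unplanned Outage table shown later in Figure 1-12: a set lists supported values, a number is an upper distance limit in km, and `None` stands for "Any". The vendor and data-amount rows are omitted, and all identifiers are hypothetical. With the answers from the example in 1.5 (zSeries, 35 km, ESCON, near-zero RPO), it also records which criterion dropped each non-solution.

```python
ANY = None  # an "Any" cell never eliminates a solution

# Simplified Tier 6 Unplanned Outage Eliminate Non-Solutions table (Figure 1-12).
# Criteria appear in the table's top-to-bottom order; vendor and data-amount rows omitted.
ELIMINATE_TABLE = {
    "XRC": {
        "Platform": {"zSeries"},
        "Distance (km)": ANY,  # < 40 km, 40-103 km, and > 103 km are all supported
        "Connectivity": {"ESCON", "FICON"},
        "RPO": {"few seconds to few minutes"},
    },
    "GDPS Storage Manager with PPRC": {
        "Platform": {"zSeries", "heterogeneous including zSeries"},
        "Distance (km)": 103,
        "Connectivity": {"ESCON", "Fibre Channel"},
        "RPO": {"near zero"},
    },
    "eRCMF": {
        "Platform": {"pSeries", "Linux", "Sun", "HP", "Microsoft Windows", "heterogeneous (open)"},
        "Distance (km)": 103,
        "Connectivity": {"ESCON", "Fibre Channel"},
        "RPO": {"near zero"},
    },
}


def cell_allows(cell, answer) -> bool:
    """Decide whether one table cell accepts one Step A answer."""
    if cell is ANY:
        return True
    if isinstance(cell, (int, float)):
        return answer <= cell  # numeric cells are upper limits (distance in km)
    return answer in cell      # set cells list the supported values


def eliminate_non_solutions(table: dict, answers: dict):
    """Apply each answer in turn; return survivors and the criterion that eliminated each other candidate."""
    survivors = list(table)
    eliminated = {}
    for criterion, answer in answers.items():
        for name in list(survivors):  # iterate over a copy so removal is safe
            if not cell_allows(table[name][criterion], answer):
                survivors.remove(name)
                eliminated[name] = criterion
    return survivors, eliminated


# Answers from the example in 1.5, listed in the table's row order.
ANSWERS = {"Platform": "zSeries", "Distance (km)": 35, "Connectivity": "ESCON", "RPO": "near zero"}

survivors, eliminated = eliminate_non_solutions(ELIMINATE_TABLE, ANSWERS)
print(survivors)   # ['GDPS Storage Manager with PPRC']
print(eliminated)  # {'eRCMF': 'Platform', 'XRC': 'RPO'}
```

The elimination log mirrors the numbered reasoning in 1.5.3 and can be handed to the detailed evaluation team along with the surviving candidates.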

This completes the solution identification steps of the methodology.

1.4.5 Value of the Disaster Recovery Solution Selection Methodology


As simple as this sounds, this process of quickly identifying proper candidate Disaster
Recovery solutions for a given set of requirements is of significant value.



Much less time and skill is necessary to reach this preliminary solution identification in the
evaluation cycle than would otherwise be needed. This methodology can manage the
preliminary evaluation phase more consistently and repeatably, and can be easily taught to
others.

This methodology also supports our current best practice of segmenting the Disaster
Recovery architecture into three blended tiers (and therefore three tiers of solutions). To
identify the solutions for the other bands, you would simply re-run the methodology,
supplying the lower RTO and the Level of Recovery for those lower bands and their
applications; you would find the corresponding candidate solution technologies in the
appropriate (lower) RTO solution subset cells.
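Re-running the methodology per band amounts to repeating the Step B lookup with each band's tier. A sketch, assuming the Unplanned Outage row of the simplified matrix in Figure 2-1 and a hypothetical three-band segmentation; the tiers a real enterprise assigns to its bands come from its own Business Impact Analysis.

```python
# Unplanned Outage row of the simplified Solution Matrix in Figure 2-1, keyed by tier column.
UNPLANNED_ROW = {
    "7": ["GDPS/PPRC", "GDPS/XRC"],
    "6": ["XRC", "GDPS Storage Manager with PPRC", "eRCMF"],
    "4": ["PtP VTS", "FlashCopy", "PPRC-XD"],
    "3": ["FlashCopy", "Tivoli Storage Manager", "tape"],
    "2, 1": ["Tivoli Storage Manager", "tape"],
}


def candidates_by_band(band_tiers: dict) -> dict:
    """Re-run the Step B lookup once for each application band."""
    return {band: list(UNPLANNED_ROW.get(tier, [])) for band, tier in band_tiers.items()}


# Hypothetical three-band segmentation; each lower band is given a lower RTO tier.
bands = {"band 1 (most critical)": "6", "band 2": "4", "band 3": "2, 1"}
for band, solutions in candidates_by_band(bands).items():
    print(f"{band}: {solutions}")
```

Each band's candidates would then go through Step C separately, since the bands can differ in platform, distance, and RPO as well as RTO.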

1.4.6 Step D: Turn over identified solutions to detailed evaluation team


Having identified a preliminary set of valid candidate Disaster Recovery solutions and
technologies, we turn over this set of candidate solutions to a skilled evaluation team, made
up of members qualified to contrast and compare the identified solutions in detail.

The valid identified candidate solutions also dictate what mix of skills will be necessary on the
evaluation team.

The evaluation team will in all likelihood need to further configure the candidate solutions into
more detailed configurations to complete the evaluation. This is also normal. In the end, that
team will still make the final decision as to which of the identified options (or which blend of
them) should be selected.

1.4.7 Updating the methodology as technology advances


This methodology is flexible. Because of the table-driven format, as technology changes, only
the contents of the Tiers chart will change; the methodology itself need not change.

In particular, as new Disaster Recovery technology is created, or existing technology is
enhanced so that its tier of Disaster Recovery capability improves, this methodology simply:
• Adds the new technology to the appropriate RTO/Tier cell.
• Adds that solution as a column to the Eliminate Non-Solutions table.

In most cases, the questions being asked in either Step A or Step B will not need to change.
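The table-driven design means that a new or improved technology is a data change, not a logic change. The sketch below is illustrative only: the two tables are cut down to the platform criterion, and "Example Replicator" is a hypothetical product, not a real offering.

```python
# Two data tables drive the methodology; the selection code below never changes.
solution_matrix = {
    ("Unplanned Outage", "6"): ["XRC", "GDPS Storage Manager with PPRC", "eRCMF"],
}
eliminate_table = {
    "XRC": {"Platform": {"zSeries"}},
    "GDPS Storage Manager with PPRC": {"Platform": {"zSeries", "heterogeneous including zSeries"}},
    "eRCMF": {"Platform": {"pSeries", "Linux", "Sun", "HP", "Microsoft Windows", "heterogeneous (open)"}},
}


def add_technology(name: str, level: str, tier: str, attributes: dict) -> None:
    """Register a technology: one entry in an RTO/Tier cell plus one Eliminate Non-Solutions column."""
    solution_matrix.setdefault((level, tier), []).append(name)
    eliminate_table[name] = attributes


def select(level: str, tier: str, platform: str) -> list:
    """Unchanged selection logic: matrix lookup (Step B), then elimination by platform (Step C)."""
    cell = solution_matrix.get((level, tier), [])
    return [s for s in cell if platform in eliminate_table[s]["Platform"]]


print(select("Unplanned Outage", "6", "Linux"))  # ['eRCMF']
# "Example Replicator" is hypothetical and only illustrates the table update.
add_technology("Example Replicator", "Unplanned Outage", "6", {"Platform": {"Linux"}})
print(select("Unplanned Outage", "6", "Linux"))  # ['eRCMF', 'Example Replicator']
```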

1.5 An example: Using the DR Solution Selection Methodology


To illustrate the use of the Disaster Recovery (DR) Solution Selection Methodology in
practice, here is an example. Further examples are shown in Chapter 2, “Sample scenarios”
on page 19.

1.5.1 Step A: Ask specific questions in a specific order


The first step in any Disaster Recovery solution evaluation is to gather the appropriate
business and IT infrastructure requirements by working within your organization to reach
agreement on the following questions.

Let us suppose that the answers to our starter set of Disaster Recovery Solution Selection
Methodology questions turn out to be as follows:
1. What is the application or applications that need to be recovered? Heterogeneous
2. On what platform or platforms does it run? zSeries®
3. What is the desired Recovery Time Objective? 3 hours
4. What is the distance between the recovery sites (if there is one)? 35 km
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that? ESCON®, DWDM, bandwidth =
50 MB/sec
6. What are the specific storage vendor hardware and software configurations that need to
be recovered? IBM ESS
7. What is the Recovery Point Objective? Near zero data loss
8. What is the amount of data that needs to be recovered? 4 TB
9. What is the desired level of recovery (Planned/Unplanned/Transaction Integrity)?
Unplanned Outage
10. Who will design the solution? To be determined
11. Who will implement the solution? To be determined

After this information is obtained, we proceed to Step B.

1.5.2 Step B: Use level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery to our Solution Matrix; a simplified version,
for illustration purposes, is shown in Figure 1-11 on page 15. A full version of this table is in
Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on page 29.
• Unplanned Level of Recovery
• Recovery Time Objective = Three hours



Rows: Planned Outage = Planned Outage / data migrations (byte movers); Unplanned Outage = Unplanned Outage Disaster Recovery (adds data integrity to byte movers); Transaction Integrity = Database and application Transaction Integrity (adds Transaction Integrity to Unplanned Outage data integrity).
Tier 7 | RTO generally near continuous to 2 hours | Highly automated, integrated h/w and s/w failover
  Unplanned Outage: GDPS/PPRC, GDPS/XRC, AIX HACMP-XD with ESS PPRC, Windows GeoDistance
  Transaction Integrity: DB2 with GDPS/PPRC
Tier 6 | RTO generally 1 to 6 hours | Storage and server mirroring
  Planned Outage: PPRC, PPRC-XD, XRC, VTS Peer to Peer
  Unplanned Outage: XRC, GDPS Storage Manager with PPRC, eRCMF
Tier 5 | RTO generally 4 to 8 hours | S/W and database transaction integrity
  Transaction Integrity: SAP, Oracle, DB2, SQL Server remote replication
Tiers 4, 3 | RTO generally Tier 4: 6-12 hours; Tier 3: 12-24 hours | Hot site, disk PiT copy, Tivoli Storage Manager-DRM, fast tape
  Planned Outage: FlashCopy, PPRC-XD, VTS Peer to Peer, Tivoli Storage Manager, tape
  Unplanned Outage: Tier 4: VTS Peer to Peer, FlashCopy, FlashCopy Migration Manager, PPRC-XD, eRCMF with PPRC-XD. Tier 3: FlashCopy, Tivoli Storage Manager, tape
  Transaction Integrity: Tier 3: MS SQL Server database cluster with physical tape transport
Tiers 2, 1 | RTO generally > 24 hours | Backup software, tape
  Planned Outage: Tivoli Storage Manager, tape
  Unplanned Outage: Tivoli Storage Manager, tape

Figure 1-11 IBM TotalStorage Disaster Recovery Solution Matrix

By intersecting the Tier 6 RTO column with the Unplanned Outage row, we find that the
preliminary candidate recommendations in our simplified table would be:
• XRC
• GDPS Storage Manager with PPRC
• eRCMF
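The RTO row of the matrix quotes overlapping "generally" ranges, so mapping an RTO onto a tier column can yield more than one match. This sketch assumes those ranges are read as inclusive bounds in hours (with "> 24 hours" exclusive) and returns every column whose range contains the RTO; the function name is hypothetical. For the 3-hour RTO in this example, only Tier 6 matches. An 8-hour RTO matches both Tier 5 and Tier 4, and the worked examples resolve such overlaps with judgment (Scenario 2 proceeds with the Tier 4 column).

```python
import math

# "Generally" RTO ranges in hours, read from the RTO row of Figure 1-11.
# The ranges overlap, so one RTO can fall into more than one tier column.
RTO_RANGES_HOURS = {
    "7": (0, 2),
    "6": (1, 6),
    "5": (4, 8),
    "4": (6, 12),
    "3": (12, 24),
    "2, 1": (24, math.inf),
}


def candidate_tiers(rto_hours: float) -> list:
    """Return every tier column whose generally-quoted RTO range contains rto_hours."""
    matches = []
    for tier, (low, high) in RTO_RANGES_HOURS.items():
        # Tiers 2 and 1 are quoted as "> 24 hours", so their lower bound is exclusive.
        above_low = rto_hours > low if tier == "2, 1" else rto_hours >= low
        if above_low and rto_hours <= high:
            matches.append(tier)
    return matches


print(candidate_tiers(3))   # ['6']
print(candidate_tiers(8))   # ['5', '4']
print(candidate_tiers(24))  # ['3']
```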

1.5.3 Step C: Eliminate non-solutions


We now use the information gathered in Step A for Step C: Eliminate non-solutions.

We examine the Step C Eliminate Non-Solutions table for Tier 6 Unplanned Outages, a starter
set of which is supplied in Appendix A, “Disaster Recovery Solution Selection Methodology
matrixes” on page 29. A simplified version of the Eliminate Non-Solutions table for Tier 6
Unplanned Outages is shown in Figure 1-12 on page 16.



Solution | XRC | GDPS Storage Manager with PPRC | eRCMF
Platform | zSeries | zSeries; heterogeneous including zSeries | pSeries, Linux, Sun, HP, Microsoft Windows; heterogeneous (open)
Distance | < 40 km, 40-103 km, > 103 km | < 40 km, 40-103 km | < 40 km, 40-103 km
Connectivity | ESCON, FICON | ESCON, Fibre Channel | ESCON, Fibre Channel
Vendor (1) | Any XRC-compliant z/OS subsystem | Any PPRC-compliant subsystem | IBM
Vendor (2) | Any z/OS subsystem | Same vendor as PPRC subsystem | IBM
RPO | Few seconds to few minutes | Near zero | Near zero
Amt of Data | Any | Any | Any

Figure 1-12 Tier 6 Unplanned Outage Eliminate Non-Solutions table

As we apply the different criteria sequentially from top to bottom, we find that:
1. Because the platform is IBM eServer zSeries, we can eliminate eRCMF, which does not
support zSeries.
2. At a distance of 35 km, all remaining solutions qualify.
3. With ESCON connectivity, all remaining solutions qualify.
4. With IBM ESS as the storage vendor hardware at site 1, all remaining solutions qualify.
5. With IBM ESS as the storage vendor hardware at site 2, all remaining solutions qualify.
6. With an RPO of near zero, only GDPS Storage Manager with PPRC qualifies.

Therefore, we see that after applying the answers to the identified candidates and eliminating
non-solutions, this is the valid preliminary candidate solution:
• GDPS Storage Manager with ESS PPRC

The methodology can often result in more than one possible solution. This is normal.

1.5.4 Step D: Turn over identified preliminary solutions to evaluation team


We would now turn over this solution or solutions to the detailed evaluation team in Step D.

In all cases, whether we have identified one or multiple possible solutions, the detailed
evaluation team step is necessary to validate this preliminary set of identified solutions and
to accommodate a large variety of environment-specific considerations. As stated earlier, the
methodology is not intended to be a perfect decision tree.

This completes the methodology example.

For additional examples, see Chapter 2, “Sample scenarios” on page 19, in which a
series of typical client Disaster Recovery requirements are distilled through this methodology,
and a preliminary solution is identified.



1.6 Summary
This methodology is meant as a framework and an organizational pattern for the efficient
preliminary identification of proper Disaster Recovery solutions. This methodology is
adaptable as technology or environment changes by updating the tables and questions used.
It provides a consistent, teachable, repeatable method of locating the proper preliminary
Disaster Recovery solutions.

This methodology is not meant as a substitute for Disaster Recovery skill and experience, nor
is it possible for the methodology to be a perfect decision tree. Although there clearly will be
ambiguous circumstances (for which knowledgeable Disaster Recovery experts will be
required), the methodology still provides for the collection of the proper Disaster Recovery
business requirements information.

In this way, the methodology provides an efficient process by which the initial preliminary
Disaster Recovery solution selection can be consistently performed. In the end, this
methodology should assist you in mentally organizing and using the information in this
Redpaper, as well as navigating any Disaster Recovery technology evaluation process.


Chapter 2. Sample scenarios


This chapter examines various client scenarios.

The Disaster Recovery Solution Selection Methodology discussed in Chapter 1, “Disaster
Recovery Solution Selection Methodology” on page 1, is applied to each of these client
scenarios to illustrate identifying valid preliminary Disaster Recovery solution candidates.

As detailed in Chapter 1, these preliminary identified candidate solutions should be expected
to be further refined by a detailed evaluation team.



2.1 Scenario 1: An efficient 24-hour recovery
Some client applications, due to their nature, are tolerant of outages and can be quite
satisfied with a 24-hour recovery. Let’s examine one of these scenarios.

2.1.1 Step A: Client requirements


To determine the client requirements, we start with Disaster Recovery Solution Selection
Methodology Step A.

Step A: Ask specific questions in a specific order


The first step is to gather the appropriate business and IT infrastructure requirements by
working within your organization to reach agreement on the following questions.

Let us suppose that the answers to our starter set of Disaster Recovery Solution Selection
Methodology questions turn out to be as follows:
1. What is the application or applications that need to be recovered? Heterogeneous
2. On what platform or platforms does it run? Various
3. What is the desired Recovery Time Objective? 24 hours
4. What is the distance between the recovery sites (if there is one)? 200 km
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that? Very low bandwidth envisioned
6. What are the specific storage vendor hardware and software configurations that need to
be recovered? Large collection of different vendors
7. What is the Recovery Point Objective? 24 hours
8. What is the amount of data that needs to be recovered? 4 TB
9. What is the desired level of recovery (Planned Outage/Unplanned Outage/Transaction
Integrity)? Unplanned Outage
10. Who will design the solution? To be determined
11. Who will implement the solution? To be determined

2.1.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery to the Solution Matrix in Appendix A,
“Disaster Recovery Solution Selection Methodology matrixes” on page 29. A simplified
version of that matrix is shown below for illustration purposes (Figure 2-1 on page 21).
• Recovery Time Objective = 24 hours
• Unplanned Outage Level of Recovery



Rows: Planned Outage = Planned Outage / data migrations (byte-movers); Unplanned Outage = Unplanned Outage Disaster Recovery (adds data integrity to byte-movers); Transaction Integrity = Database and application Transaction Integrity (adds Transaction Integrity to Unplanned Outage data integrity).
Tier 7 | RTO generally near continuous to 2 hours | Highly automated, integrated h/w and s/w failover
  Unplanned Outage: GDPS/PPRC, GDPS/XRC
  Transaction Integrity: Database-level Transaction Integrity layered on top of automated h/w recovery
Tier 6 | RTO generally 1 to 6 hours | Storage and server mirroring
  Planned Outage: PPRC, PPRC-XD, XRC, VTS Peer to Peer
  Unplanned Outage: XRC, GDPS Storage Manager with PPRC, eRCMF
Tier 5 | RTO generally 4 to 8 hours | S/W and database transaction integrity
  Transaction Integrity: SAP, Oracle, DB2, SQL Server remote replication
Tiers 4, 3 | RTO generally Tier 4: 6-12 hours; Tier 3: 12-24 hours | Hot site, disk PiT copy, Tivoli Storage Manager-DRM, fast tape
  Planned Outage: FlashCopy, PPRC-XD, VTS Peer to Peer, Tivoli Storage Manager, tape
  Unplanned Outage: Tier 4: PtP VTS, FlashCopy, PPRC-XD. Tier 3: FlashCopy, Tivoli Storage Manager, tape
  Transaction Integrity: Tier 4: Database transaction recovery with journal forwarding. Tier 3: Database transaction recovery with physical tape or electronic vaulting
Tiers 2, 1 | RTO generally > 24 hours | Backup software, tape
  Planned Outage: Tivoli Storage Manager, tape
  Unplanned Outage: Tivoli Storage Manager, tape

Figure 2-1 IBM TotalStorage Disaster Recovery Solution Matrix

By intersecting the Tier 3, 2, or 1 columns with the Unplanned Outage row, we find that
the preliminary candidate recommendations would be:
• IBM Tivoli® Storage Manager
• Tape

2.1.3 Step C: Eliminate non-solutions


We apply the supplied Step C Eliminate Non-Solutions table for Tier 2 Unplanned
Outages; see Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on
page 29. A simplified version of the table is shown in Figure 2-2 for illustration purposes.

Tier 2 Unplanned Outage - Eliminate Non-Solutions
Solution | Tape | Tivoli Storage Manager
Platform | Any | Any
Distance | Any | Any
Connectivity | Any | Any
Vendor (1) | Any | Any
Vendor (2) | Any | Any
RPO | 1 to 8 hours, greater than 8 hours | 1 to 8 hours, greater than 8 hours
Amt of data | Any | Any
Figure 2-2 Tier 2 Unplanned Outage Eliminate Non-Solutions table



As we apply the different criteria sequentially from top to bottom, we find that there is no
additional elimination of non-solutions.

Therefore, we see that after applying the answers to the identified candidates and eliminating
non-solutions, we are left with the valid candidate solutions from those covered in this
Redpaper:
• Tivoli Storage Manager
• Tape

These are the two solutions that we would then turn over to the evaluation team in Step D. It is
probable that the evaluation team would end up using both Tivoli Storage Manager and tape
to meet this particular environment’s Disaster Recovery needs.

2.2 Scenario 2: A long distance recovery at Tier 4


Some client applications have a moderate tolerance of outages and can be acceptably
recovered within a matter of hours. Let’s examine one of these scenarios.

2.2.1 Step A: Client requirements


To determine the client requirements, we start with Disaster Recovery Solution Selection
Methodology Step A.

Step A: Ask specific questions in a specific order


The first step is to gather the appropriate business and IT infrastructure requirements by
working within your organization to reach agreement on the following questions.

Let us suppose that the answers to our starter set of DR Solution Selection
Methodology questions turn out to be as follows:
1. What is the application or applications that need to be recovered? Heterogeneous
2. On what platform or platforms does it run? Open Systems
3. What is the desired Recovery Time Objective? 8 hours
4. What is the distance between the recovery sites (if there is one)? 1200 km
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that? Long distance telecom lines,
Fibre Channel in the data center
6. What are the specific storage vendor hardware and software configurations that need to
be recovered? IBM Enterprise Storage Server
7. What is the Recovery Point Objective? 8 hours
8. What is the amount of data that needs to be recovered? 4 TB
9. What is the desired level of recovery (Planned Outage/Unplanned Outage/Transaction
Integrity)? Unplanned Outage
10. Who will design the solution? To be determined
11. Who will implement the solution? To be determined



2.2.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery to the Solution Matrix in Appendix A,
“Disaster Recovery Solution Selection Methodology matrixes” on page 29. A simplified
version of that matrix is shown below for illustration purposes (Figure 2-3).
• Recovery Time Objective = 8 hours
• Unplanned Level of Recovery

Rows: Planned Outage = Planned Outage / data migrations (byte movers); Unplanned Outage = Unplanned Outage Disaster Recovery (adds data integrity to byte movers); Transaction Integrity = Database and application Transaction Integrity (adds Transaction Integrity to Unplanned Outage data integrity).
Tier 7 | RTO generally near continuous to 2 hours | Highly automated, integrated h/w and s/w failover
  Unplanned Outage: GDPS/PPRC, GDPS/XRC
  Transaction Integrity: Database-level Transaction Integrity layered on automated h/w recovery
Tier 6 | RTO generally 1 to 6 hours | Storage and server mirroring
  Planned Outage: PPRC, PPRC-XD, XRC, VTS Peer to Peer
  Unplanned Outage: XRC, GDPS Storage Manager with PPRC, eRCMF
Tier 5 | RTO generally 4 to 8 hours | S/W and database transaction integrity
  Transaction Integrity: SAP, Oracle, DB2, SQL Server remote replication
Tiers 4, 3 | RTO generally Tier 4: 6-12 hours; Tier 3: 12-24 hours | Hot site, disk PiT copy, Tivoli Storage Manager-DRM, fast tape
  Planned Outage: FlashCopy, PPRC-XD, VTS Peer to Peer, Tivoli Storage Manager, tape
  Unplanned Outage: Tier 4: PtP VTS, FlashCopy, PPRC-XD. Tier 3: FlashCopy, Tivoli Storage Manager, tape
  Transaction Integrity: Tier 4: Database transaction recovery with journal forwarding. Tier 3: Database transaction recovery with electronic vaulting or physical tape transport
Tiers 2, 1 | RTO generally > 24 hours | Backup software, tape
  Planned Outage: Tivoli Storage Manager, tape
  Unplanned Outage: Tivoli Storage Manager, tape

Figure 2-3 Scenario 2: Solution Matrix

By intersecting the Tier 4 column with the Unplanned Outage row, we find that the
preliminary candidate recommendations would be:
򐂰 PtP VTS
򐂰 FlashCopy (multiple disk storage subsystems)
򐂰 ESS PPRC-XD
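The intersection in Step B amounts to looking up the matrix cells keyed by tier and Level of Recovery. The following Python sketch transcribes only the Unplanned Outage row of the simplified matrix in Figure 2-3; the `SOLUTION_MATRIX` dictionary and the `candidates` function are hypothetical names used for illustration, not part of the methodology or of any IBM tool.

```python
# Unplanned Outage row of the simplified Solution Matrix (Figure 2-3).
# Tiers 4 and 3 share one column in the figure; their entries are split here.
SOLUTION_MATRIX = {
    (7, "unplanned"): ["GDPS/PPRC", "GDPS/XRC"],
    (6, "unplanned"): ["XRC", "GDPS Storage Manager with PPRC", "eRCMF"],
    (4, "unplanned"): ["PtP VTS", "FlashCopy", "PPRC-XD"],
    (3, "unplanned"): ["FlashCopy", "Tivoli Storage Manager", "tape"],
    (2, "unplanned"): ["Tivoli Storage Manager", "tape"],
    (1, "unplanned"): ["Tivoli Storage Manager", "tape"],
}


def candidates(tiers, level):
    """Collect the solutions where the given tier columns meet the level row.

    Duplicates are dropped while first-seen order is kept. A tier with no
    entry in the dictionary simply contributes nothing.
    """
    result = []
    for tier in tiers:
        for solution in SOLUTION_MATRIX.get((tier, level), []):
            if solution not in result:
                result.append(solution)
    return result


print(candidates([4], "unplanned"))
# ['PtP VTS', 'FlashCopy', 'PPRC-XD']
```

Keeping the cells as plain lists makes it easy to extend the sketch to the other rows of the full matrix in Table A-1.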

2.2.3 Step C: Eliminate non-solutions


For this Tier 4 Unplanned Outage set of requirements, we apply the supplied Step C:
Eliminate Non-Solutions table. For illustration purposes, a simplified version of the
appropriate Eliminate Non-Solutions table is shown in Figure 2-4 on page 24. A full version
can be found in Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on
page 29.

Chapter 2. Sample scenarios 23


Tier 4 Unplanned Outage - Eliminate Non-Solutions
Solution: PtP VTS | ESS PPRC-XD | FlashCopy
Platform: zSeries | Any platform | Any platform
Distance: Any distance | Any distance | Any distance
Connectivity: ESCON, FICON | ESCON, Fibre Channel | Any connectivity
Vendor (1): IBM | IBM | IBM
Vendor (2): IBM | IBM | Any vendor
RPO: Few minutes to 1 hour, 1-8 hours, greater than 8 hours | Few seconds to few minutes, few minutes to 1 hour, 1-8 hours, greater than 8 hours | 1-8 hours, greater than 8 hours
Amt of data: Any | Any | Any
Figure 2-4 Tier 4 Unplanned Outage Eliminate Non-Solutions table

As we apply the different criteria sequentially from top to bottom, we find that:
1. Because the platform is Open Systems, we eliminate PtP VTS, which supports only zSeries.
2. There are no further eliminations.

Therefore, these are the valid Disaster Recovery preliminary candidate solutions:
򐂰 ESS PPRC-XD
򐂰 FlashCopy

These are the two solutions that we would then turn over to the evaluation team in Step D.

The evaluation team would probably investigate both facilities, which are quite different,
and choose one. If FlashCopy is chosen, tape would likely also be configured into the
solution.

As you can see, the evaluation team’s experience is what turns a very high-level preliminary
selection into a valid final selection.
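The sequential elimination in Step C above can be sketched as a filter that applies one criterion at a time and reports what each step removes. This Python sketch is illustrative only: it encodes just the platform and RPO rows of Figure 2-4, uses `None` to stand for an "Any platform" cell, and treats an RPO requirement as met when a solution's tightest listed RPO band is at least as good as the requirement. That RPO rule, and the names `eliminate`, `platform_ok`, and `rpo_ok`, are assumptions rather than rules stated in the table.

```python
# RPO bands from Figure 2-4, ordered from most to least stringent.
RPO_BANDS = ["near zero", "few seconds to few minutes", "few minutes to 1 hour",
             "1-8 hours", "greater than 8 hours"]

# Figure 2-4 (Tier 4 Unplanned Outage), reduced to the rows used here.
# platform=None stands for "Any platform"; best_rpo is the tightest band listed.
TIER4_UNPLANNED = {
    "PtP VTS":     {"platform": {"zSeries"}, "best_rpo": "few minutes to 1 hour"},
    "ESS PPRC-XD": {"platform": None,        "best_rpo": "few seconds to few minutes"},
    "FlashCopy":   {"platform": None,        "best_rpo": "1-8 hours"},
}


def platform_ok(row, platform):
    return row["platform"] is None or platform in row["platform"]


def rpo_ok(row, rpo_band):
    # Qualifies if the solution's tightest RPO is at least as good as required.
    return RPO_BANDS.index(row["best_rpo"]) <= RPO_BANDS.index(rpo_band)


def eliminate(table, checks):
    """Apply each (name, predicate) check in order, printing what it removes."""
    remaining = list(table)
    for name, check in checks:
        removed = [s for s in remaining if not check(table[s])]
        remaining = [s for s in remaining if s not in removed]
        print(f"{name}: eliminated {removed if removed else 'nothing'}")
    return remaining


# Scenario 2: Open Systems data and an 8-hour RPO (the "1-8 hours" band).
survivors = eliminate(TIER4_UNPLANNED, [
    ("platform", lambda row: platform_ok(row, "Open Systems")),
    ("RPO", lambda row: rpo_ok(row, "1-8 hours")),
])
print(survivors)
```

Printing each step's eliminations mirrors the top-to-bottom walk-through, which makes it easier for the evaluation team to see why a candidate dropped out.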

2.3 Scenario 3: Enterprise long distance recovery at Tier 6


Here, we have a large enterprise-class client that requires very fast recovery times.

2.3.1 Step A: Client requirements


To determine the client requirements, we start with Disaster Recovery Solution Selection
Methodology Step A.

Step A: Ask specific questions in a specific order


The first step is to gather the appropriate business and IT infrastructure requirements by
working within your organization to reach agreement on the following questions.



For the following, let us suppose that the answers to our starter set of Disaster Recovery
Solution Selection Methodology questions turn out to be as follows:
1. Which application or applications need to be recovered? Heterogeneous
2. On what platform or platforms does it run? zSeries
3. What is the desired Recovery Time Objective? 3 hours
4. What is the distance between the recovery sites (if there is one)? 1200 km
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that? Long distance telecom lines,
FICON® in the data center
6. What are the specific storage vendor hardware and software configurations that need to
be recovered? IBM Enterprise Storage Server, Hitachi Lightning
7. What is the Recovery Point Objective? Few seconds to few minutes
8. What is the amount of data that needs to be recovered? 4 TB
9. What is the desired level of recovery (Planned Outage/Unplanned Outage/Transaction
Integrity)? Unplanned Outage
10. Who will design the solution? To be determined
11. Who will implement the solution? To be determined

2.3.2 Step B: Level of outage and Tier/RTO to identify RTO solution subset
We now apply our Tier/RTO and Level of Recovery. For illustration purposes, Figure 2-5 on
page 26 is a simplified version of the full table that can be found in Appendix A, “Disaster
Recovery Solution Selection Methodology matrixes” on page 29.
򐂰 Recovery Time Objective = 3 hours
򐂰 Unplanned Level of Recovery
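Mapping an RTO onto tiers can be sketched as a range lookup over the "generally" RTO ranges of Table A-1. This Python sketch is an illustration only: the `tiers_for_rto` function is hypothetical, and treating the ranges as inclusive, overlapping bounds is an assumption, because the table gives only approximate figures.

```python
import math

# Approximate RTO ranges (in hours) per tier, from the RTO row of Table A-1.
# The ranges overlap because the table says "generally"; bounds are treated
# as inclusive here (an assumption).
TIER_RTO_HOURS = [
    (7, 0.0, 2.0),        # near continuous to 2 hours
    (6, 1.0, 6.0),        # generally 1 to 6 hours
    (5, 4.0, 8.0),        # generally 4 to 8 hours
    (4, 6.0, 12.0),       # Tier 4: 6-12 hours
    (3, 12.0, 24.0),      # Tier 3: 12-24 hours
    (2, 24.0, math.inf),  # Tiers 2 and 1: more than 24 hours
    (1, 24.0, math.inf),
]


def tiers_for_rto(rto_hours: float) -> list[int]:
    """Return every tier whose approximate RTO range contains rto_hours."""
    return [tier for tier, low, high in TIER_RTO_HOURS if low <= rto_hours <= high]


print(tiers_for_rto(3))  # this scenario
print(tiers_for_rto(8))  # Scenario 2
```

With the 3-hour RTO of this scenario, the sketch returns only Tier 6. An 8-hour RTO, as in Scenario 2, falls in both the Tier 5 and Tier 4 ranges, so overlapping ranges still call for judgment about which tier column to use.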



Simplified Solution Matrix (by tier: RTO, description, and solutions for each Level of Recovery)
Tier 7, RTO generally near continuous to 2 hours: highly automated, integrated h/w and s/w failover
   Unplanned Outage: GDPS/PPRC, GDPS/XRC
   Transaction Integrity: database transaction integrity layered on automated recovery
Tier 6, RTO generally 1 to 6 hours: storage and server mirroring
   Planned Outage/data migrations: PPRC, PPRC-XD, XRC, VTS Peer to Peer
   Unplanned Outage: XRC, GDPS Storage Manager with PPRC, eRCMF
   Transaction Integrity: database transaction integrity layered on automated recovery
Tier 5, RTO generally 4 to 8 hours: S/W and database transaction integrity
   Transaction Integrity: SAP, Oracle, DB2, SQL Server remote replication
Tiers 4 and 3, RTO generally 6-12 hours (Tier 4) and 12-24 hours (Tier 3): hot site, disk PiT copy, TSM-DRM, fast tape
   Planned Outage/data migrations: FlashCopy, PPRC-XD, VTS Peer to Peer, Tivoli Storage Manager, tape
   Unplanned Outage: Tier 4: PtP VTS, FlashCopy, PPRC-XD; Tier 3: FlashCopy, Tivoli Storage Manager, tape
   Transaction Integrity: Tier 4: database transaction recovery with journal forwarding; Tier 3: database transaction recovery with electronic vaulting or physical tape transport
Tiers 2 and 1, RTO generally > 24 hours: backup software, tape
   Planned Outage/data migrations: Tivoli Storage Manager, tape
   Unplanned Outage: Tivoli Storage Manager, tape
Level of Recovery rows: Planned Outage/data migrations = byte-movers; Unplanned Outage = Disaster Recovery, adds data integrity to byte-movers; Database and application Transaction Integrity = adds Transaction Integrity to h/w Unplanned Outage data integrity.

Figure 2-5 Solution Matrix

By intersecting the Tier 6 column (a 3-hour RTO falls in the Tier 6 range of 1 to 6 hours)
with the Unplanned Outage row, we find that the preliminary candidate Disaster Recovery
solutions would be:
򐂰 XRC
򐂰 GDPS Storage Manager
򐂰 eRCMF

2.3.3 Step C: Eliminate non-solutions


We apply the supplied Step C: Eliminate Non-Solutions table for this Tier 6 Unplanned
Outage set of requirements. A simplified version of the full table is in Figure 2-6 on page 27.
See Appendix A, “Disaster Recovery Solution Selection Methodology matrixes” on page 29 for
the full version.



Solution: XRC | GDPS Storage Manager with PPRC | eRCMF
Platform: zSeries | zSeries, heterogeneous including zSeries | pSeries, Linux, Sun, HP, Windows, heterogeneous (open only)
Distance: <40 km, 40-103 km, >103 km | <40 km, 40-103 km | <40 km, 40-103 km
Connectivity: ESCON, FICON | ESCON, Fibre Channel | ESCON, Fibre Channel
Vendor (1): Any XRC-compliant z/OS subsystem | Any PPRC-compliant subsystem | IBM ESS
Vendor (2): Any z/OS subsystem | Same vendor PPRC subsystem | IBM ESS
RPO: Few seconds to few minutes | Near zero | Near zero
Amt of data: Any | Any | Any

Figure 2-6 Scenario 3: Eliminate non-solutions Tier 6 Unplanned Outage

As we apply the different criteria sequentially from top to bottom, we find that:
1. Because the platform is zSeries, we eliminate eRCMF, which does not support zSeries.
2. At a distance of 1200 km, we eliminate the solutions that cannot reach this distance.
Only XRC qualifies.
3. For FICON connectivity, XRC qualifies.
4. For the storage vendor hardware at site 1, XRC qualifies.
5. For the storage vendor hardware at site 2, XRC qualifies.
6. For an RPO of a few seconds to a few minutes, XRC qualifies.

Therefore, after applying the answers to the identified candidates and eliminating
non-solutions, the valid preliminary candidate solution is:
򐂰 XRC

We would now turn over this solution to the detailed evaluation team in Step D for
confirmation and detailed evaluation.
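The same filtering applies to Figure 2-6, with one extra step: the 1200 km distance must first be mapped onto the table's distance bands. This Python sketch assumes the bands are <40 km, 40-103 km, and >103 km, with 40 km and 103 km both falling in the middle band; it omits the vendor and data-volume rows, and it reuses the assumed rule that a solution meets an RPO requirement when its tightest RPO is at least as good. The `distance_band` helper and the other names are hypothetical.

```python
RPO_BANDS = ["near zero", "few seconds to few minutes", "few minutes to 1 hour",
             "1-8 hours", "greater than 8 hours"]


def distance_band(km):
    """Map a site-to-site distance onto the bands used in Figure 2-6 (bounds assumed)."""
    if km < 40:
        return "<40 km"
    if km <= 103:
        return "40-103 km"
    return ">103 km"


# Figure 2-6 (Tier 6 Unplanned Outage); vendor and data-volume rows omitted.
TIER6_UNPLANNED = {
    "XRC": {
        "platforms": {"zSeries"},
        "distances": {"<40 km", "40-103 km", ">103 km"},
        "connectivity": {"ESCON", "FICON"},
        "best_rpo": "few seconds to few minutes",
    },
    "GDPS Storage Manager with PPRC": {
        "platforms": {"zSeries", "Heterogeneous"},
        "distances": {"<40 km", "40-103 km"},
        "connectivity": {"ESCON", "Fibre Channel"},
        "best_rpo": "near zero",
    },
    "eRCMF": {
        "platforms": {"pSeries", "Linux", "Sun", "HP", "Windows"},
        "distances": {"<40 km", "40-103 km"},
        "connectivity": {"ESCON", "Fibre Channel"},
        "best_rpo": "near zero",
    },
}

# Scenario 3 answers: zSeries, 1200 km, FICON, RPO of a few seconds to a few minutes.
checks = [
    ("platform", lambda r: "zSeries" in r["platforms"]),
    ("distance", lambda r: distance_band(1200) in r["distances"]),
    ("connectivity", lambda r: "FICON" in r["connectivity"]),
    ("RPO", lambda r: RPO_BANDS.index(r["best_rpo"])
                      <= RPO_BANDS.index("few seconds to few minutes")),
]

remaining = list(TIER6_UNPLANNED)
for name, check in checks:
    # Keep only the solutions whose cell for this criterion covers the requirement.
    remaining = [s for s in remaining if check(TIER6_UNPLANNED[s])]
    print(f"after {name} check: {remaining}")
```

As in the walk-through, the platform check removes eRCMF, the distance check leaves only XRC, and the remaining checks confirm it.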




Appendix A. Disaster Recovery Solution Selection Methodology matrixes
This appendix provides the following tools for use with the Disaster Recovery Solution
Selection Methodology described in Chapter 1, “Disaster Recovery Solution Selection
Methodology” on page 1:
򐂰 “Starter set of business requirement questions” on page 30 provides the starter set of
business requirements questions for the methodology.
򐂰 “Disaster Recovery Solution Matrix” on page 30 is the Solution Matrix that organizes the
collection of IBM TotalStorage solutions in this Redpaper into tiers.
򐂰 “Eliminate non-solutions matrixes” on page 32 are the Eliminate Non-Solutions tables, one
for each cell in the Solution Matrix.
򐂰 Additional questions useful for building business justification for the Disaster Recovery
solution, as well as further information needed by the detailed evaluation team, are
included in “Additional business requirements questions” on page 47.

© Copyright IBM Corp. 2004. All rights reserved. 29


Starter set of business requirement questions
The following is a suggested starter set of Disaster Recovery business requirements
questions to be answered before entering the methodology. These questions are designed to
elicit enough basic information to start the process.

A detailed list of additional questions is supplied in “Additional business requirements
questions” on page 47.

Some of these questions must be answered by the business line as part of a Risk and
Business Impact Analysis. Others are for the operations staff to answer from their knowledge
of the IT infrastructure.

The starter set of business requirements questions is as follows:


1. Which application or applications need to be recovered?
2. On what platform or platforms does it run?
3. What is the desired Recovery Time Objective?
4. What is the distance between the recovery sites (if there is one)?
5. What is the form of connectivity or infrastructure transport that will be used to transport the
data to the recovery site? How much bandwidth is that?
6. What are the specific hardware and software configurations that need to be recovered?
7. What is the Recovery Point Objective?
8. What is the amount of data that needs to be recovered?
9. What is the desired Level Of Recovery (Planned/Unplanned/Transaction Integrity)?
10. Who will design the solution?
11. Who will implement the solution?
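One way to check that the starter set is complete before entering the methodology is to record the answers in a structured form. The Python sketch below is a hypothetical illustration: the `StarterAnswers` class and its field names simply mirror the eleven questions and are not part of the methodology. It is filled in with the Scenario 3 answers from Chapter 2, where "To be determined" counts as an answer.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class StarterAnswers:
    """Answers to the eleven starter questions (field names are illustrative)."""
    applications: Optional[str] = None
    platforms: Optional[str] = None
    rto: Optional[str] = None
    site_distance: Optional[str] = None
    connectivity: Optional[str] = None
    configurations: Optional[str] = None
    rpo: Optional[str] = None
    data_amount: Optional[str] = None
    level_of_recovery: Optional[str] = None
    designer: Optional[str] = None
    implementer: Optional[str] = None

    def unanswered(self) -> list[str]:
        """List the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


scenario3 = StarterAnswers(
    applications="Heterogeneous", platforms="zSeries", rto="3 hours",
    site_distance="1200 km",
    connectivity="Long distance telecom lines, FICON in the data center",
    configurations="IBM Enterprise Storage Server, Hitachi Lightning",
    rpo="Few seconds to few minutes", data_amount="4 TB",
    level_of_recovery="Unplanned Outage",
    designer="To be determined", implementer="To be determined",
)
print(scenario3.unanswered())  # []
```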

Disaster Recovery Solution Matrix


The full Solution Matrix for use with the IBM TotalStorage solutions in this paper is shown in
Table A-1 on page 31.



Table A-1 Disaster Recovery Solution Matrix
Level of Recovery rows: Planned = Planned Outages, data migrations, “byte movers”; Unplanned = Unplanned Outage, adds data integrity to “byte movers”; Transaction Integrity = adds Transaction Integrity to hardware “Unplanned Outage” data integrity.

Tier 7
RTO: Generally near continuous to 2 hours
Description: Highly automated, integrated hardware and software failover
Planned: GDPS/PPRC; GDPS/PPRC with HyperSwap™; GDPS/XRC; AIX® HACMP-XD with PPRC; Windows® GeoDistance
Unplanned: GDPS/PPRC; GDPS/PPRC with HyperSwap; GDPS/XRC; AIX HACMP/XD with PPRC; Windows GeoDistance
Transaction Integrity: Database-level recovery on top of any Tier 7 Unplanned Outage recovery. Examples: DB2® with GDPS or AIX HACMP-XD, SAP and DB2 remote replication, shadow database with forward recovery, split mirror, etc.
Tolerance to outage: Low tolerance to outage

Tier 6
RTO: Generally 1 to 6 hours
Description: Storage and server mirroring
Planned: PPRC; PPRC-XD; XRC; PtP VTS
Unplanned: XRC; GDPS Storage Manager with PPRC; eRCMF; PPRC Migration Manager; AIX LVM; AIX HACMP
Transaction Integrity: Database-level recovery on top of any Tier 6 Unplanned Outage recovery. Examples: DB2 UDB with AIX HACMP/XD, SAP and DB2 remote replication, shadow database with forward recovery, split mirror, etc.
Tolerance to outage: Low tolerance to outage

Tier 5
RTO: Generally 4 to 8 hours
Description: Software, application, and database Transaction Integrity
Planned: Software, application, and database-level facilities
Unplanned: Software, application, and database-level facilities
Transaction Integrity: Remote replication with DB2, Oracle, SQL Server, etc.; software, application, and database-level facilities. Examples: shadow database with forward recovery, split mirror, etc.
Tolerance to outage: Low tolerance to outage

Tiers 4 and 3
RTO: Generally 6-12 hours (Tier 4); 12-24 hours (Tier 3)
Description: Hotsite, disk PiT copy, database journaling and forwarding, comprehensive backup software, fast tape, electronic vaulting
Planned: FlashCopy; PPRC-XD; PtP VTS; Tivoli Storage Manager - DRM; tape
Unplanned: Tier 4: PtP VTS; FlashCopy; FlashCopy Manager; PPRC-XD; eRCMF with PPRC-XD. Tier 3: FlashCopy; Tivoli Storage Manager; tape
Transaction Integrity: Tier 4: Database-level journal file forwarding and remote application. Tier 3: Database-level recovery with physical tape transport
Tolerance to outage: Somewhat tolerant to outage

Tiers 2 and 1
RTO: Generally > 24 hours
Description: Backup software, physical transport of tape
Planned: Tivoli Storage Manager; tape
Unplanned: Tivoli Storage Manager; tape
Transaction Integrity: Database-level recovery with physical tape transport
Tolerance to outage: Very tolerant to outage

Appendix A. Disaster Recovery Solution Selection Methodology matrixes 31


Notes on the Solution Matrix cells
As a general comment, recall that the Disaster Recovery Solution Selection Methodology is
not intended to be a perfect decision tree. Of necessity, the boundaries and contents of the
cells are general guidelines rather than an all-inclusive list.

The methodology allows room for product and Disaster Recovery experts to add their
expertise to the evaluation process after an initial preliminary set of candidate solutions is
identified.

The intent of the methodology is to provide a framework for efficiently organizing multiple
Disaster Recovery technologies, and for more quickly identifying possible solutions for any
given set of client requirements.

Tiers 7, 6, and 5 Transaction Integrity


The solutions for Transaction Integrity are specific to the database and application software
being used. Because of this, the list of possible solutions is very broad, and it is not feasible to
list them all. You should involve a software specialist skilled in the application and database
set that you are using for a detailed evaluation of Transaction Integrity recovery specific to
your database and application. The cells in the matrix show common examples of solutions.

Eliminate non-solutions matrixes


The following tables are used to eliminate non-solutions. There is one matrix set for each cell
in the Solution Matrix.

Tier 7 Planned Outage


This is the matrix for Tier 7 and Planned Outage.

Table A-2 Tier 7 and Planned Outage matrix


Solution: GDPS/PPRC | GDPS/PPRC with HyperSwap | AIX HACMP-XD with PPRC | Windows GeoDistance
Platform: zSeries, zSeries + Open | zSeries | AIX | Windows
Distance: <103 km | <103 km | <103 km | <103 km
Connectivity: ESCON, Fibre Channel, FICON | ESCON, Fibre Channel, FICON | Fibre Channel, TCP/IP | Fibre Channel, TCP/IP
Supported disk storage (primary site): PPRC-compliant storage | PPRC HyperSwap-capable storage | IBM ESS | IBM ESS
Supported disk storage vendor (secondary site): PPRC-compliant storage | PPRC HyperSwap-capable storage | IBM ESS | IBM ESS
Recovery Point Objective: Near zero | Near zero | Near zero | Near zero
Amount of data: Any | Any | Any | Any
Other notes: - | - | - | -



Tier 7 Unplanned Outage
This is the matrix for Tier 7 and Unplanned Outage.

Table A-3 Tier 7 and Unplanned Outage matrix


Solution: GDPS/PPRC | GDPS/PPRC with HyperSwap | GDPS/XRC | AIX HACMP-XD with PPRC | Windows GeoDistance
Platform: zSeries, zSeries + Open | zSeries | zSeries | AIX | Windows
Distance: <103 km | <103 km | <103 km | <103 km | <103 km
Connectivity: ESCON, Fibre Channel, FICON | ESCON, Fibre Channel, FICON | ESCON, Fibre Channel, FICON | Fibre Channel, TCP/IP | Fibre Channel, TCP/IP
Supported disk storage (primary site): PPRC-compliant storage | PPRC HyperSwap-compliant storage | XRC-capable storage | IBM ESS | IBM ESS
Supported disk storage (secondary site): PPRC-compliant storage | PPRC HyperSwap-compliant storage | Any z/OS®-supported storage | IBM ESS | IBM ESS
Recovery Point Objective: Near zero | Near zero | Near zero | Near zero | Near zero
Amount of data: Any | Any | Any | Any | Any
Other notes: Delivered as IBM Global Services Offering | Delivered as IBM Global Services Offering | Delivered as IBM Global Services Offering | - | -



Tier 7 Transaction Integrity
This matrix is for Tier 7 and Transaction Integrity. For readability, its columns are split into
two parts, Parts 1 and 2.

The solutions for Transaction Integrity are specific to the database and application software
being used. Because of this, the list of possible solutions is very broad, and it is not feasible to
list them all. You should involve a software specialist skilled in the application and database
set that you are using for a detailed evaluation of Transaction Integrity recovery specific to
your database and application. The cells in the matrix show common examples of solutions.

Table A-4 Tier 7 and Transaction Integrity: Part 1


Solution: Database transaction recovery layered on GDPS/PPRC | Database transaction recovery layered on GDPS/PPRC with HyperSwap | Database transaction recovery layered on AIX HACMP-XD with ESS PPRC | Database transaction recovery layered on Windows GeoDistance
Platform: zSeries, zSeries + Open | zSeries | pSeries® | Windows, Microsoft® Clustering
Distance: <103 km | <103 km | Any | <103 km
Connectivity: ESCON, Fibre Channel, FICON | ESCON, Fibre Channel, FICON | Fibre Channel, TCP/IP | Fibre Channel, TCP/IP
Supported disk storage (primary site): PPRC-compliant storage | PPRC HyperSwap-compliant storage | IBM ESS | IBM ESS
Supported disk storage (secondary site): PPRC-compliant storage | PPRC HyperSwap-compliant storage | IBM ESS | IBM ESS
Recovery Point Objective: Near zero | Near zero | Near zero, few seconds to few minutes, few minutes to hours | Near zero
Amount of data: Any | Any | Any | Any
Other notes: Delivered as IBM Global Services Offering | Delivered as IBM Global Services Offering | - | -



Table A-5 Tier 7 and Transaction Integrity: Part 2
Solution: Shadow database with forward recovery | Split Mirror database with PPRC
Platform: Any | Any
Distance: Any | <103 km
Connectivity: Any | ESCON, Fibre Channel, FICON
Supported disk storage (primary site): Any | PPRC-compliant disk subsystem
Supported disk storage (secondary site): Any | PPRC-compliant disk subsystem
Recovery Point Objective: Depending on the log shipping mechanism, loss of only a few transactions possible | Near zero
Amount of data: Any | Any
Other notes: - | -

Tier 6 Planned Outage


For readability, the columns of this matrix are split into two parts, Parts 1 and 2.

Table A-6 Tier 6 Planned Outage: Part 1


Solution: PPRC | PPRC-XD | XRC
Platform: Any | Any | zSeries
Distance: <103 km | Any | Any
Connectivity: ESCON, Fibre Channel | ESCON, Fibre Channel | FICON, ESCON
Supported disk storage (primary site): PPRC-compliant storage | IBM ESS | XRC-capable storage
Supported disk storage (secondary site): PPRC-compliant storage | IBM ESS | Any z/OS-supported storage
Recovery Point Objective: Near zero | Few seconds to few minutes | Few seconds to few minutes
Amount of data: Any | Any | Any
Other notes: - | - | -



Table A-7 Tier 6 Planned Outage: Part 2
Solution: PtP VTS Synchronous | PtP VTS Asynchronous
Platform: zSeries | zSeries
Distance: <43 km | Any
Connectivity: ESCON, Fibre Channel | ESCON, Fibre Channel
Supported disk storage (primary site): Any z/OS-supported storage | Any z/OS-supported storage
Supported disk storage (secondary site): Any z/OS-supported storage | Any z/OS-supported storage
Recovery Point Objective: Near zero | Few seconds to few minutes, minutes to hours (defined by user policy)
Amount of data: Any | Any
Other notes: - | -

Tier 6 Unplanned Outage


For readability, the columns of this matrix are split into two parts, Parts 1 and 2.

Table A-8 Tier 6 Unplanned Outage: Part 1


Solution: XRC | GDPS Storage Manager with PPRC | eRCMF Storage Manager with PPRC | PPRC Migration Manager
Platform: zSeries | zSeries | Open System | zSeries
Distance: Any | <103 km | <103 km | <103 km
Connectivity: FICON, ESCON | Fibre Channel, TCP/IP | Fibre Channel, TCP/IP | ESCON, Fibre Channel
Supported storage (primary site): XRC-capable storage | PPRC-compliant storage | IBM ESS | IBM ESS
Supported storage (secondary site): Any z/OS-supported storage | PPRC-compliant storage | IBM ESS | IBM ESS
Recovery Point Objective: Few seconds to few minutes | Near zero | Near zero | Near zero
Amount of data: Any | Any | Any | Any
Other notes: - | Delivered as IBM Global Services Offering | Delivered as IBM Global Services Offering | Lower cost special bid offering for circumstances that cannot justify GDPS solutions



Table A-9 Tier 6 Unplanned Outage: Part 2
Solution: AIX LVM | AIX HACMP | AIX HACMP/XD
Platform: pSeries | pSeries | pSeries
Distance: 10 km | Metropolitan distances | Any
Connectivity: Fibre Channel | Fibre Channel, TCP/IP | Fibre Channel, TCP/IP
Supported storage (primary site): Any storage supported by AIX | Any storage supported by AIX | Any storage supported by AIX
Supported storage (secondary site): Any storage supported by AIX | Any storage supported by AIX | Any storage supported by AIX
Recovery Point Objective: Near zero, few seconds to few minutes, minutes to hours (defined by user policy) | Near zero, few seconds to few minutes, minutes to hours (defined by user policy) | Near zero, few seconds to few minutes, minutes to hours (defined by user policy)
Amount of data: Any | Any | Any
Other notes: - | - | -



Tier 6 Transaction Integrity
The solutions for Transaction Integrity are specific to the database and application software
being used. Because of this, the list of possible solutions is very broad, and it is not feasible to
list them all. You should involve a software specialist skilled in the application and database
set that you are using for a detailed evaluation of Transaction Integrity recovery specific to
your database and application. The cells in the matrix show common examples of solutions.

This matrix shows examples for Tier 6 and Transaction Integrity.

Table A-10 Tier 6 and Transaction Integrity


Solution: Database-level transaction recovery on top of any Tier 6 Unplanned Outage recovery | Shadow database with forward recovery | Split mirror database with PPRC
Platform: Database and application specific | Any | Any
Distance: Database and application specific | Any | <103 km
Connectivity: TCP/IP | Any | ESCON, Fibre Channel, FICON
Supported disk storage (primary site): Any | Any | PPRC-compliant disk subsystem
Supported disk storage (secondary site): Any | Any | PPRC-compliant disk subsystem
Recovery Point Objective: Near zero, few seconds to few minutes, minutes to hours (dependent on specific database, application, and hardware) | Depending on the log shipping mechanism, loss of only a few transactions possible | Near zero
Amount of data: Any | Any | Any
Other notes: - | - | -



Tier 5 Planned Outage
The solutions in Tier 5 are specifically defined as database and application software
functions for Planned, Unplanned, and Transaction Integrity recovery. These solutions
depend on the capabilities of each individual software product.

This paper focuses on hardware and operating system-level IBM TotalStorage Disaster
Recovery solutions. You should involve a software specialist skilled in the application and
database set that you are using. However, as a general statement, robust databases have
integrated software functions that minimize Planned Outages.

Table A-11 Tier 5 Planned Outage


Solution: Software, application, and database-level facilities
Platform: Software, application, and database specific
Distance: Software, application, and database specific
Connectivity: Fibre Channel, TCP/IP
Supported disk storage (primary site): Any
Supported disk storage (secondary site): Any
Recovery Point Objective: Software, application, and database specific, typically defined by software policy
Amount of data: Any
Other notes: -



Tier 5 Unplanned Outage
The solutions in Tier 5 are specifically defined as database and application software
functions for Unplanned and Transaction Integrity recovery. These solutions depend on the
capabilities of each individual software product.

This Redpaper focuses on hardware and operating system-level IBM TotalStorage Disaster
Recovery solutions. You should involve a software specialist skilled in the application and
database set that you are using. However, as a general statement, robust databases have
integrated software functions to perform Unplanned Outage recovery.

Table A-12 Tier 5 Unplanned Outage


Solution: Software, application, and database-level facilities
Platform: Software, application, and database specific
Distance: Software, application, and database specific
Connectivity: Fibre Channel, TCP/IP
Supported disk storage (primary site): Any
Supported disk storage (secondary site): Any
Recovery Point Objective: Software, application, and database specific, typically defined by software policy
Amount of data: Any
Other notes: -



Tier 5 Transaction Integrity
The solutions in Tier 5 are specifically defined as database and application software
functions for Transaction Integrity recovery. These solutions depend on the capabilities of
each individual software product.

This paper focuses on hardware and operating system-level IBM TotalStorage Disaster
Recovery solutions. You should involve a software specialist skilled in the application and
database set that you are using.

However, as a general statement, robust databases have integrated software functions to
perform remote replication. To maintain Transaction Integrity, the database functionality must
be integrated into whatever replication architecture is being used.

Table A-13 Tier 5 Transaction Integrity


Solution: Remote replication Transaction Integrity with DB2, Oracle, SQL Server, and so on
Platform: Software, application, and database specific
Distance: Software, application, and database specific
Connectivity: Fibre Channel, TCP/IP
Supported disk storage (primary site): Any
Supported disk storage (secondary site): Any
Recovery Point Objective: Near zero, few seconds to few minutes, minutes to hours; dependent on user policy
Amount of data: Any
Other notes: -



Tiers 4 and 3 Planned Outage
This matrix is for Tiers 4 and 3 and Planned Outage.

Table A-14 Tiers 4 and 3 and Planned Outage matrix


Solution: FlashCopy | PPRC-XD | PtP VTS | Tivoli Storage Manager - Disaster Recovery Manager | Fast tape
Platform: Any | Any | zSeries | Any open | Any
Distance: Any | Any | Any | Any | Any
Connectivity: - | - | - | - | -
Supported disk storage (primary site): FlashCopy-capable storage | IBM ESS | Any | Any | Any
Supported disk storage (secondary site): FlashCopy-capable storage | IBM ESS | Any | Any | Any
Recovery Point Objective: Minutes to hours | Minutes to hours | Minutes to hours | Minutes to hours | Hours
Amount of data: Any | Any | Any | Any | Any
Other notes: - | - | - | - | -

Tier 4 Unplanned Outage


This matrix is for Tier 4 and Unplanned Outage.

Table A-15 Tier 4 and Unplanned Outage matrix


Solution: PtP VTS Asynch | FlashCopy | FlashCopy Manager | PPRC-XD | eRCMF with PPRC-XD
Platform: zSeries | Any | zSeries | Any | Any
Distance: Any | Any | Any | Any | Any
Connectivity: FICON, ESCON | N/A | N/A | Fibre Channel, ESCON | Fibre Channel, ESCON
Supported disk storage (primary site): Any | Any | Any | IBM ESS | IBM ESS
Supported disk storage (secondary site): Any | Any | Any | IBM ESS | IBM ESS
Recovery Point Objective: Minutes to hours | Minutes to hours | Minutes to hours | Minutes to hours | Minutes to hours
Amount of data: Any | Any | Any | Any | Any
Other notes: - | - | IBM Storage Services Offering | - | IBM Global Services Offering



Tier 4 Transaction Integrity
This matrix is for Tier 4 and Transaction Integrity.

Table A-16 Tier 4 and Transaction Integrity matrix


Solution: Database-level journal file forwarding and remote application
Platform: Software, application, and database specific
Distance: Software, application, and database specific
Connectivity: Fibre Channel, TCP/IP
Supported disk storage (primary site): Any
Supported disk storage vendor (secondary site): Any
Recovery Point Objective: Minutes to hours; dependent on user policy
Amount of data: Any
Other notes: -

Tier 3 Unplanned Outage


This matrix is for Tier 3 and Unplanned Outage.

Table A-17 Tier 3 and Unplanned Outage matrix


Solution: FlashCopy | Tivoli Storage Manager | Tape
Platform: Any | Any | Any
Distance: Any | Any | Any
Connectivity: N/A | TCP/IP | N/A
Supported disk storage (primary site): FlashCopy-capable storage | Any | Any
Supported disk storage (secondary site): FlashCopy-capable storage | Any | Any
Recovery Point Objective: Minutes to hours | Minutes to hours | Minutes to hours
Amount of data: Any | Any | Any
Other notes: - | - | -



Tier 3 Transaction Integrity
This matrix is for Tier 3 and Transaction Integrity.

Table A-18 Tier 3 and Transaction Integrity matrix


Solution: Database-level recovery using electronic tape vaulting
Platform: Software, application, and database specific
Distance: Software, application, and database specific
Connectivity: Fibre Channel, TCP/IP
Supported disk storage (primary site): Any
Supported disk storage (secondary site): Any
Recovery Point Objective: Minutes to hours; dependent on user policy
Amount of data: Any
Other notes: -



Tiers 2 and 1 Planned Outage
This matrix is for Tiers 2 and 1 and Planned Outage.

Table A-19 Tiers 2 and 1 and Planned Outage matrix


Solution: Tivoli Storage Manager | Tape
Platform: Software, application, and database specific | Any
Distance: Software, application, and database specific | Any
Connectivity: Fibre Channel, TCP/IP | Any
Supported disk storage (primary site): Any | Any
Supported disk storage (secondary site): Any | Any
Recovery Point Objective: Minutes to hours; dependent on user policy | Minutes to hours; dependent on user policy
Amount of data: Any | Any
Other notes: - | -



Tiers 2 and 1 Unplanned Outage
This matrix is for Tiers 2 and 1 and Unplanned Outage.

Table A-20 Tiers 2 and 1 and Unplanned Outage matrix


Solution: Tivoli Storage Manager | Tape
Platform: Software, application, and database specific | Any
Distance: Software, application, and database specific | Any
Connectivity: Fibre Channel, TCP/IP | Any
Supported disk storage (primary site): Any | Any
Supported disk storage (secondary site): Any | Any
Recovery Point Objective: Minutes to hours; dependent on user policy | Minutes to hours; dependent on user policy
Amount of data: Any | Any
Other notes: - | -



Tiers 2 and 1 Transaction Integrity
This matrix is for Tiers 2 and 1 and Transaction Integrity.

Table A-21 Tiers 2 and 1 and Transaction Integrity matrix


Solution                                  Database-level recovery using physical tape transport
Platform                                  Software, application, and database specific
Distance                                  Software, application, and database specific
Connectivity                              N/A
Supported disk storage (primary site)     Any
Supported disk storage (secondary site)   Any
Recovery Point Objective                  - Hours to days
                                          - Dependent on user policy
Amount of data                            Any
Other notes
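The matrixes are used to eliminate solutions that cannot meet a requirement. As a minimal sketch, a few entries from Tables A-18, A-20, and A-21 can be encoded as data and filtered by outage scenario and required RPO. The ordinal RPO scale, the choice to compare against the worst end of each RPO range, and all names in the code are assumptions for illustration; the “Dependent on user policy” qualifier is not modeled.

```python
from dataclasses import dataclass

# Hypothetical ordinal scale for the RPO wording used in the matrixes (smaller is better).
RPO_SCALE = {"seconds": 0, "minutes": 1, "hours": 2, "days": 3}


@dataclass(frozen=True)
class MatrixEntry:
    table: str
    tiers: str
    scenario: str
    solution: str
    rpo_worst: str  # upper end of the RPO range shown in the matrix


ENTRIES = [
    MatrixEntry("A-18", "3", "Transaction Integrity",
                "Database-level recovery using electronic tape vaulting", "hours"),
    MatrixEntry("A-20", "2 and 1", "Unplanned Outage", "Tivoli Storage Manager", "hours"),
    MatrixEntry("A-20", "2 and 1", "Unplanned Outage", "Tape", "hours"),
    MatrixEntry("A-21", "2 and 1", "Transaction Integrity",
                "Database-level recovery using physical tape transport", "days"),
]


def candidates(entries: list[MatrixEntry], scenario: str, required_rpo: str) -> list[str]:
    """Keep solutions for the scenario whose worst-case RPO meets the requirement."""
    limit = RPO_SCALE[required_rpo]
    return [
        e.solution
        for e in entries
        if e.scenario == scenario and RPO_SCALE[e.rpo_worst] <= limit
    ]


print(candidates(ENTRIES, "Transaction Integrity", "hours"))
# ['Database-level recovery using electronic tape vaulting']
```

Comparing against the worst end of each range is deliberately conservative: a solution survives only if even its slowest recovery point fits the requirement.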

Additional business requirements questions


The following is a list of additional business requirements questions that can, and should, be
answered before entering the Disaster Recovery Solution Selection Methodology.

Justifying business continuance to the business


Because Disaster Recovery solutions are, by their very nature, a form of insurance, the following
questions can help identify the ongoing daily payback value of a proposed Disaster Recovery
solution.

You might partially or fully justify the requested Disaster Recovery investment to senior
management by quantifying the following values, and then presenting the cost of the proposed
Disaster Recovery solution as only a portion of the anticipated ongoing daily benefits that
these questions identify.
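Once each benefit has been quantified, this justification reduces to simple arithmetic. The following is a minimal sketch, assuming every benefit category has been converted to an annual dollar figure; the `AnnualBenefit` type, the `cost_as_share_of_benefits` function, the category names, and all dollar amounts are hypothetical illustrations, not values from this paper.

```python
from dataclasses import dataclass


@dataclass
class AnnualBenefit:
    """One quantified, ongoing payback value (hypothetical category and amount)."""
    name: str
    dollars_per_year: float


def cost_as_share_of_benefits(annual_dr_cost: float, benefits: list[AnnualBenefit]) -> float:
    """Return the annualized Disaster Recovery cost as a fraction of total annual benefits."""
    total = sum(b.dollars_per_year for b in benefits)
    if total <= 0:
        raise ValueError("at least one positive benefit is required")
    return annual_dr_cost / total


benefits = [
    AnnualBenefit("Planned Outage reduction", 250_000),
    AnnualBenefit("Overtime savings", 60_000),
    AnnualBenefit("Lower testing cost", 40_000),
]
share = cost_as_share_of_benefits(175_000, benefits)
print(f"DR cost is {share:.0%} of anticipated annual benefits")
# DR cost is 50% of anticipated annual benefits
```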

Tangible, compelling IT values


Tangible, compelling IT values include the following benefits.

Savings due to Planned Outage reductions


1. Benefits, savings, and revenue increases from the business being able to operate without
the Planned Outage
2. Benefits in personnel productivity
3. Savings from eliminating overtime compensation for Planned Outages
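Items 1 and 3 can be estimated with a simple model. In this sketch, the function name, its parameters, and the figures are hypothetical. It assumes revenue is lost at a constant hourly rate during each Planned Outage and that overtime is paid for the full outage window. Item 2 (personnel productivity) is not modeled.

```python
def planned_outage_savings(
    outages_per_year: int,
    outage_hours: float,
    revenue_per_hour: float,
    staff_on_overtime: int,
    overtime_rate_per_hour: float,
) -> dict[str, float]:
    """Estimate annual savings if Planned Outages are eliminated (illustrative model only)."""
    hours_per_year = outages_per_year * outage_hours
    # Item 1: revenue retained because the business keeps operating.
    revenue = hours_per_year * revenue_per_hour
    # Item 3: overtime no longer paid to staff who work during the outage window.
    overtime = hours_per_year * staff_on_overtime * overtime_rate_per_hour
    return {"revenue": revenue, "overtime": overtime, "total": revenue + overtime}


savings = planned_outage_savings(
    outages_per_year=12, outage_hours=4, revenue_per_hour=5_000,
    staff_on_overtime=6, overtime_rate_per_hour=75,
)
print(savings)
# {'revenue': 240000, 'overtime': 21600, 'total': 261600}
```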



Savings due to better testability/maintainability of recovery solution
1. Lowered cost for every test:
– Savings due to lowered system resource impact for test
– Savings due to better control of testing
– Savings due to better reliability in testing
– Savings due to better speed in test completion
2. Savings in maintainability costs, because GDPS insulates the Disaster Recovery method:
– From application changes
– From hard-to-define or hard-to-manage applications
– From data that is hard or impossible to manage

Benefits of absolute confidence in switch or cutover


1. Better information flow to the decision team through automated status messaging
2. Lowered cost of maintaining the solution
3. Increased accuracy of testing and switching due to automation

Savings due to personnel cost reductions


1. Savings from reduced labor and cost compared with a custom, roll-your-own implementation.
2. Savings from reduced labor costs due to automation.
3. Automation reduces the salary and overtime cost of personnel to perform a recovery.
4. Difference, before and after installation, in the number of staff required for change
windows, Unplanned Outages, and Disaster Recovery practice and execution.
5. Savings because less costly and more readily available “B” and “C” personnel teams can
perform Planned and Unplanned Outage recovery.
6. Lowered skill requirements for the operations or recovery team.
7. Survivability without requiring key personnel.

Benefit of providing DR after large storage or server consolidation


1. Benefits of providing efficient, trustworthy recovery of a large, consolidated data center
of servers and storage.

Cost savings of bringing DR in-house versus using an outsourced service provider


1. For the same expenditure: better recoverability, no dependency on the service provider's
other clients, and no time limit on use of the recovery center.
2. Savings from eliminating the outsourced recovery center while keeping equivalent
functionality.

Tangible, compelling business values


Tangible, compelling business values include the following benefits.

Strategic and competitive advantage


1. 24x7 Internet client availability required on new applications
2. Worldwide client availability required on new applications
3. Meeting mandatory regulatory requirements
4. Avoidance of a large financial impact on the business from a disaster (for example, a client
outage cost of $1000/hr.)
5. Exploitation of existing investment in installed equipment
6. An affordable, strategic approach to future regulatory compliance



Confidence
1. Regulatory agency confidence
2. Shareholder confidence
3. Financial markets confidence
4. Senior management confidence and trust in the recovery
5. Maintenance of brand image
6. Willingness to invoke the recovery or switch because of confidence in the switch

Tactical
1. Employee idling labor cost
2. Cost of re-creation and recovery of lost data
3. Salaries paid to staff unable to undertake billable work
4. Salaries paid to staff to recover work backlog and maintain deadlines
5. Interest value on deferred billings
6. Penalty clauses invoked for late delivery and failure to meet service levels
7. Loss of interest on overnight balances; cost of interest on lost cash flow
8. Delays in client accounting, accounts receivable and billing/invoicing
9. Additional cost of working; administrative costs; travel and subsistence; and so on

Intangible, compelling values: IT


Intangible compelling values for IT include the following:
1. Value of Disaster Recovery strategy that resolves failed previous Disaster Recovery
methods.
2. Personnel:
– Savings due to reduced number of storage administrators required per TB of disk
storage
– Recruitment costs for new staff on staff turnover
– Training/retraining costs for staff
3. Confidence in recoverability because of:
– More frequent tests
– Success of tests
4. Planned Outage reductions create new options in testing or site maintenance:
– Confidence and accuracy value due to more frequent testing
– Savings due to less expensive testing
– High confidence in switch
– Value of the difference between prior and post “Planned Outage minutes/year”
– Business impact of Planned Outages/year (Planned Outage client cost * Planned
Outage minutes)
5. Testing:
– Assuring successful recovery through increased frequency of testing
– Catching errors in recovery through increased frequency of testing
– Repeatability
6. Automation value:
– Repeatability
– Trustability
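The business-impact formula in item 4 (Planned Outage client cost multiplied by Planned Outage minutes) can be applied before and after a proposed solution to value the reduction. A minimal sketch, assuming the client cost is expressed per minute of outage; the function name and all figures are hypothetical:

```python
def planned_outage_impact(cost_per_minute: float, minutes_per_year: float) -> float:
    """Business impact of Planned Outages per year: client cost/minute * outage minutes/year."""
    return cost_per_minute * minutes_per_year


# Hypothetical prior and post "Planned Outage minutes/year" for one application.
before = planned_outage_impact(cost_per_minute=100, minutes_per_year=2_880)
after = planned_outage_impact(cost_per_minute=100, minutes_per_year=240)
print(f"before={before:,.0f} after={after:,.0f} value={before - after:,.0f}")
# before=288,000 after=24,000 value=264,000
```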

Intangible compelling values: Business


Intangible compelling business values include the following benefits.

Unplanned Outage revenue loss avoidances


1. Lost revenue
2. Loss of cash flow
3. Loss of clients (lifetime value of each) and market share
4. Loss of profits

Unplanned Outage cost avoidances: IT


1. Cost of replacement of buildings and plant
2. Cost of replacing equipment
3. Cost of replacing software

Unplanned Outage business impacts


1. Brand image recovery.
2. Fines and penalties for noncompliance.
3. Liability claims.
4. Additional cost of advertising, PR, and marketing to reassure clients and prospects and
retain market share.
5. Loss of share value.
6. Loss of control over debtors.
7. Loss of credit control and increased bad debt.
8. Delayed realization of benefits or profits from new projects or products.
9. Loss of revenue for service contracts from failure to provide service or meet service levels.
10. Lost ability to respond to contract opportunities.
11. Penalties from failure to produce annual accounts or make timely tax payments.
12. Where company share value underpins loan facilities, share prices could drop and loans
could be called in or re-rated at higher interest levels.
13. Additional cost of credit through reduced credit rating.

Business requirements questions for detailed evaluation team


The detailed evaluation team will need to address the following questions in the course of
quantifying, justifying, and designing the Disaster Recovery solution. Some questions are
business in nature; others are IT or infrastructure in nature. They are the expanded superset
from which the basic starter set of business requirements questions in “Starter set of
business requirement questions” on page 30 is derived.

We provide them here as a guideline to the types of information that the detailed evaluation
teams will need to gather and analyze to finalize an in-depth recommendation.



Business profile
1. What is the client's business or industry?
2. What is the compelling reason for the client to act at this time?
3. Who is the sponsor within the organization?
4. What budget is allocated for this project?
5. When do they expect to have this implemented?
6. What goals do you consider important for a successful project?
7. Which business sponsors do we need to engage with to properly determine the critical
success factors for the project?
8. Will funding come from these mission-critical business sponsors or from within the
previously constructed IT budget? Are funds allocated?
9. Have you designed an IT recovery program that incorporates various “speeds” of
recovery in the event of interruption? Who is your current business continuance provider?
When does the current contract expire?
10. Does your recovery plan take into account an acceptable level of transaction data loss
and unrecoverable data? Explain.
11. What would be the financial impact on your company of an interruption caused by an
unexpected, unplanned catastrophic event?
12. Which business processes require an advanced level of recoverability in the event of an
unplanned medium- to long-term interruption of IT services?
13. Do you back up all of your company’s critical data on a regular basis? How often? If a
declared disaster occurred, would you be ready and able to restore your company’s critical
data to the point of failure?
14. Are critical applications replicated offsite in case of disaster? Can your staff access the
site quickly, within the time you have established?
15. If you don’t have a business continuance program in place, what is the motivating factor
associated with this change in strategy? Why are you interested in doing this now?
16. What is your current yearly cost associated with business continuance? If the program is
internal, what is the approximate cost?
17. What is your current time frame for the business continuance project?
18. What type of disk do you currently use? Manufacturer? Total capacity? Mixed
environment? Utility software? Upgrade plans? Explain.
19. Can you supply a complete inventory of all current server hardware?
20. Is your company a current IBM Hot Site client?

Disaster Recovery planning and infrastructure sizing


21. Has a sizing exercise been done (Disk Magic for PPRC or XRC sizing, or both)?
22. Is the implementation for data migration or Disaster Recovery?
23. If the implementation is for data migration, is the plan to minimize the amount of time in
duplex by using the hardware bitmaps?
24. Has the client been made aware of the various Disaster Recovery documents and tools?
25. Will IBM Global Services or a Business Partner be involved in the implementation?
26. Has FlashCopy or some other Point-in-Time (PiT) copy been considered?
27. What is the current Disaster Recovery objective (RTO or RPO)?
28. Recovery Time Objective (RTO): What is your desired elapsed time from the time of
disaster until full recovery and accessibility to end users? (This includes database
recovery time.)
29. Recovery Point Objective (RPO): When recovery is complete, how much data will have been
lost and need to be re-created? (Measured in seconds, minutes, or hours of updates.)
30. How long does it take to Initial Program Load (IPL) the system following an unplanned
system failure?
31. What is the critical application restart time after a system failure (after the system is
IPLed)?
32. What is the planned system restart time under normal conditions (length of time to bring
up your system)?
33. What is the planned system shutdown time (length of time to stop all applications and the
system)?
34. What platforms are required to be recovered?
– z/OS
– S/390®
– OS/390®
– VM
– VSE
– TPF
– Linux/390
– UNIX®
– IBM eServer pSeries
– RS/6000®
– AIX (non-clustered or clustered?)
– Sun Solaris (non-clustered or clustered?)
– HP-UX (non-clustered or clustered?)
– IBM eServer iSeries (AS/400®, OS/400®)
– IBM eServer xSeries
– Microsoft Windows NT®
– Microsoft Windows 2000
– Linux
– Other
35. What storage mirroring technologies will be used (ESS XRC, ESS PPRC, PtP VTS,
other)?
36. Are coupling facilities being used?
37. How many? What type, model, and vendor?
38. Are facilities to handle data integrity included?
39. Are there adequate resources for managing Internet security and intrusion, with ongoing
monitoring and management?
40. Is the IT recovery strategy in line with the business objectives? Do business operations,
IT operations, or both hinge on the availability of an individual person’s skills?
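The answers to questions 27 through 31 can be combined into a rough check of whether a candidate solution is likely to meet the stated objectives. This sketch assumes that IPL, application restart, and database recovery happen one after another and that together they make up the recovery time; real recovery also includes phases not modeled here, such as the decision to declare a disaster. The `RecoveryEstimate` type, the `meets_objectives` function, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryEstimate:
    """Hypothetical answers to questions 28-31, all in minutes."""
    ipl_minutes: float                # question 30: IPL after an unplanned failure
    app_restart_minutes: float        # question 31: critical application restart
    database_recovery_minutes: float  # included in the RTO, per question 28
    data_loss_minutes: float          # expected age of the recovered data (RPO side)

    def recovery_time(self) -> float:
        # Assumes the phases run sequentially, not in parallel.
        return self.ipl_minutes + self.app_restart_minutes + self.database_recovery_minutes


def meets_objectives(est: RecoveryEstimate, rto_minutes: float, rpo_minutes: float) -> dict[str, bool]:
    """Compare the estimate with the RTO and RPO from question 27."""
    return {
        "rto": est.recovery_time() <= rto_minutes,
        "rpo": est.data_loss_minutes <= rpo_minutes,
    }


est = RecoveryEstimate(ipl_minutes=30, app_restart_minutes=20,
                       database_recovery_minutes=90, data_loss_minutes=15)
print(est.recovery_time(), meets_objectives(est, rto_minutes=120, rpo_minutes=60))
# 140 {'rto': False, 'rpo': True}
```

In this example the RPO is met but the RTO is missed, which points the evaluation team at database recovery time as the largest phase to reduce.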

Primary side hardware


41. How many primary control units will be installed?
42. Who is the vendor?
43. How many volumes or LUNs are expected to be recovered?
44. What processors are installed?
45. How many? What type, model, and vendor?
46. Are tape drives involved in this proposal? (If yes, describe.)

Secondary side hardware


47. Is the secondary site client-owned, or are you using a business recovery center? If you
are using a recovery center, which one?
48. How many secondary control units will be installed?
49. Who is the vendor?
50. How many volumes are expected?
51. What processors are installed?
52. How many? What type, model, and vendor?
53. Are there tape drives? (If yes, describe.)

Performance
54. Has a bandwidth analysis been performed by collecting and analyzing data on the
production applications?
55. What percentage of the workload is required to be mirrored?
56. What method of automation will be used (GDPS, other)?
57. Is cross-platform data consistency required?
58. If so, for which platforms?
59. What level of consistency?

Connectivity
60. What is the distance to the remote site (miles or kilometers)?
61. What is the infrastructure to the remote site (dark fibre, fibre provider/DWDM, telecom
line of a given speed and type such as T1: 128 KBytes/sec, T3: 5 MBytes/sec, or OC3:
19 MBytes/sec, IP)?
62. Will channel extenders be used? If so, which channel extender vendor is preferred (CNT,
Cisco, McData, Brocade, other)?
63. What is the write update rate (MB/sec, operations/sec), and how does it vary by time of
day or month?
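A first-pass check of the answers to questions 61 and 63 compares the peak write update rate with link capacity. This is a rough sketch only, using the link throughputs quoted in question 61. The `headroom` margin, the `links_needed` function, and the 12 MB/sec example rate are hypothetical. The sketch ignores protocol overhead, latency, and compression. Sizing to the peak rather than the average rate is a conservative assumption, and a proper sizing analysis (for example, with Disk Magic, as mentioned in question 21) is still required.

```python
import math

# Link throughputs as quoted in question 61, in MBytes/sec
# (T1 is quoted as 128 KBytes/sec, which is 0.125 MBytes/sec).
LINK_MB_PER_SEC = {"T1": 0.125, "T3": 5.0, "OC3": 19.0}


def links_needed(peak_write_mb_per_sec: float, link: str, headroom: float = 1.25) -> int:
    """Return how many links of one type are needed to carry the peak write rate.

    headroom is a hypothetical safety margin applied to the measured peak.
    """
    required = peak_write_mb_per_sec * headroom
    return math.ceil(required / LINK_MB_PER_SEC[link])


peak = 12.0  # hypothetical peak write update rate from question 63, in MB/sec
print({link: links_needed(peak, link) for link in LINK_MB_PER_SEC})
# {'T1': 120, 'T3': 3, 'OC3': 1}
```

The T1 result shows how quickly low-speed lines become impractical as the write update rate grows.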

Related publications

The publications listed in this section are considered particularly suitable for a more detailed
discussion of the topics covered in this Redpaper.

IBM Redbooks
For information about ordering these publications, see “How to get IBM Redbooks” on
page 55. Note that some of the documents referenced here may be available in softcopy only.
– IBM TotalStorage Solutions for Disaster Recovery, SG24-6547
– The IBM TotalStorage Solutions Handbook, SG24-5250

How to get IBM Redbooks


You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft
publications, and additional materials, as well as order hardcopy Redbooks or CD-ROMs, at
this Web site:
ibm.com/redbooks

Help from IBM


IBM Support and downloads
ibm.com/support

IBM Global Services


ibm.com/services

© Copyright IBM Corp. 2004. All rights reserved. 55


Index

D
Disaster Recovery
  business requirements questions 47
  challenge in selecting a solution 2
  eliminate non-solutions 32
  example of Solution Selection Methodology 13
  hourglass concept in methodology 8
  nature of solutions 2
  Solution Selection Methodology 6
  Solution Selection Methodology steps 9
  Tiers 3
  usage of methodology 7
  value of Solution Selection Methodology 12
Disaster Recovery Solution Selection Methodology
  hourglass concept 8
  tutorial 6

H
hourglass concept 8

M
Methodology
  example 13
  steps 9

R
Redbooks Web site 55
  Contact us viii

T
Tier 2,1
  Planned Outage matrix 45
  Transaction Integrity matrix 47
  Unplanned Outage matrix 46
Tier 3
  Transaction Integrity matrix 44
  Unplanned Outage matrix 43
Tier 4
  Transaction Integrity matrix 43
  Unplanned Outage matrix 42
Tier 4,3
  Planned Outage matrix 42
Tier 5
  Planned Outage 39
  Transaction Integrity matrix 41
  Unplanned Outage matrix 40
Tier 6
  Planned Outage matrix 35
  Transaction Integrity matrix 38
  Unplanned Outage matrix 36
Tier 7
  Planned Outage matrix 32
  Transaction Integrity matrix 34
  Unplanned Outage matrix 33
Tiers
  blending 4
  Disaster Recovery 3
tutorial
  Disaster Recovery Solution Selection Methodology 6

Back cover

A Disaster Recovery Solution Selection Methodology
Redpaper

Learn and apply a Disaster Recovery Solution Selection Methodology
How to find the right Disaster Recovery solution
Working with IBM TotalStorage products

There is a wide variety of IBM TotalStorage Disaster Recovery technologies and solutions.
Each is very powerful in its own way, and each has its own unique characteristics. How can
we select the optimum combination of solutions? How do we organize and manage all these
valid Disaster Recovery technologies?

These questions have vexed Disaster Recovery solution designers for a long time. Developing
the skill to perform this selection function effectively was often time consuming and
incomplete. It can be difficult to transfer these skills to other colleagues.

In this Redpaper, we offer a suggested Disaster Recovery Solution Selection Methodology that
is designed to help with this problem. The intent of our methodology is to allow us to
navigate the seemingly endless permutations of Disaster Recovery technology quickly and
efficiently, and to identify initial, preliminary, valid, cost-justified solutions.

This methodology is not designed to replace in-depth skills. It is meant as a guideline and a
framework. Proper application of this methodology can significantly reduce the effort and
time required to identify proper solutions, and therefore accelerate the selection cycle.

For more information about this methodology, see the redbook IBM TotalStorage Solutions for
Disaster Recovery, SG24-6547.

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts
from IBM, Customers and Partners from around the world create timely technical information
based on realistic scenarios. Specific recommendations are provided to help you implement IT
solutions more effectively in your environment.

For more information:
ibm.com/redbooks