
Classes of Recovery

WHITE PAPER

Developing a truly effective business continuity plan, one that will result in successful
recovery should the need actually arise, requires more effort than just implementing
a backup solution. This white paper, part of a series on the IT infrastructure of business
continuity, will outline a methodology for setting recovery priorities based on business
requirements and matching recovery technology to recovery objectives that will result in
cost-effective business continuity systems and timely recoveries.


Table of Contents
Introduction
Recovery methodologies
Prioritizing the recovery
Business requirements
Recovery objectives
Techniques
Technology
Cost
Assessing the classes of recovery
Conclusion

Introduction
Backup and recovery solutions are often thought of as expensive ways for IT personnel
to have peace of mind in the event of an interruption to important business functions.
This is true in many cases because the solutions are based on budgetary constraints
instead of business needs. Enterprises would benefit instead by developing a methodology
to determine the characteristics of a recovery, and by placing each business function
into an appropriate class of recovery. By classifying each business function and assigning a
corresponding priority, performing an actual recovery will be more successful and will
result in smaller loss to the company in the event of an outage.

Recovery methodologies
We need to ask ourselves, “How successful have most disaster recovery efforts been?”
The answer is not particularly easy to obtain, since there have been very few major
disasters, and most IT managers have never actually experienced one. When a business does
attempt a recovery, they often discover that it can’t be done in the required timeframe.
This occurs because the backup capabilities can’t support the recovery objectives needed
by the business.

Many companies prepare for disaster recoveries by performing disaster recovery tests,
which for the most part are minimally successful. One of the major issues that CNT sees
repeatedly during these tests is a lack of prioritization of recovery activities. Instead,
companies attempt to recover all of the infrastructure and data as quickly as possible.

What is needed is a recovery strategy methodology that meets the needs of the business
and allows the business functions to be recovered in a pre-determined order. In this way,
a company will be better equipped to develop and maintain the right solution at the
right cost. Using such a methodology instead of the all-or-nothing approach will benefit
the company by providing the proper level of protection for each of the business units
instead of being excessive or insufficient.

Prioritizing the recovery


All too often, backup/recovery solutions are devised in a vacuum without much regard
for the actual needs of the business. Since the primary purpose of recovery is to reinstate
critical business function(s) in the event of an outage, it is imperative that the recovery
scenario is designed with business needs in mind. This must be done well before the
disaster occurs. Trying to recover is difficult enough; trying to prioritize during the recovery
is an impossible task.

The issue that slows down most disaster recoveries is attempting to restore everything at
once. The sensible thing is to first restore only what is absolutely necessary, less critical
functions second, and non-vital business functions last. Recovering 50 percent of the
infrastructure is certainly easier and quicker than recovering the entire enterprise. By
reducing the number of tapes, servers, and disk requirements, a seemingly impossible
task can turn into an achievable one.

CNT believes that each enterprise should try to categorize their business functions into
several classes of recovery. We recommend using the following general approach:
• Class 0: no reason to recover during a disaster recovery
• Class 1: non-vital business functions
• Class 2: business functions that are vital to the company, but are not the most important
• Class 3: critical, “must have” business functions
• Class 4: continuously available, absolutely cannot go offline for any reason
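
The classes above lend themselves to a simple ordering rule: recover the highest class first and skip Class 0 entirely. The following Python fragment is only a sketch of that idea. The names `RecoveryClass` and `recovery_order`, and the sample business functions, are hypothetical and do not come from any CNT tool.

```python
from enum import IntEnum


class RecoveryClass(IntEnum):
    """Classes of recovery as listed above; a higher value is recovered earlier."""
    NO_RECOVERY = 0   # no reason to recover during a disaster recovery
    NON_VITAL = 1     # non-vital business functions
    VITAL = 2         # vital, but not the most important
    CRITICAL = 3      # critical, "must have" business functions
    CONTINUOUS = 4    # continuously available, cannot go offline


def recovery_order(functions):
    """Return function names in restore order: highest class first, Class 0 omitted.

    `functions` maps a business function name to its RecoveryClass.
    Ties are broken alphabetically so the order is deterministic.
    """
    to_recover = [
        (name, cls) for name, cls in functions.items()
        if cls != RecoveryClass.NO_RECOVERY
    ]
    to_recover.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in to_recover]


# Hypothetical business functions, for illustration only.
example = {
    "order entry": RecoveryClass.CRITICAL,
    "payroll": RecoveryClass.VITAL,
    "intranet news": RecoveryClass.NON_VITAL,
    "test lab": RecoveryClass.NO_RECOVERY,
}
print(recovery_order(example))
```

Deciding this order ahead of time, rather than during the outage, is the point of the classification.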

Breaking down each component into recovery classes makes it easier to determine what
the company’s needs are, what the recovery capabilities currently are, and what needs to
be done to achieve the desired class of recovery should the current capabilities prove to
be inadequate. These requirements can vary from company to company as well as from
industry to industry.

Class 0 and Class 4 require the least amount of detail pertaining to continuance of business.
Class 4 simply must continue uninterrupted at all costs. This class is normally reserved
for the most critical of processes, such as financial market transactions, air traffic control,
critical health care systems, and infrastructure such as power and communication.
Class 0 environments are not recovered at all in the event of an outage. While these systems
may support the overall IT infrastructure of a company, they contain no critical
data and would be replaced rather than recovered if lost in a disaster. We feel it is important
to acknowledge these different recovery classes, as many systems are designated with
these recovery objectives. The remainder of this paper will focus on Class 1, Class 2, and
Class 3. This is where business continuity is achieved for most business functions.

Business requirements
Figure 1 shows the high-level overview of the process needed to assess the recovery
requirements for each business function.

The initial step in this process involves assessing each of the business requirements and
classifying them by their relative importance to the company (see Figure 2, next page).
This is traditionally done through a business impact analysis (BIA), which rates the severity
of the impact on the company should the business function become unavailable. The
impact to the company is perceived in terms of operating costs, infrastructure costs, regulatory
fines or sanctions, financial losses, or damage to the company’s reputation (loss
of market share, decreased customer satisfaction, etc.). While the impact on reputation
is intangible and certainly cannot be quantified, it is a real threat and can eventually
result in some financial reverses.

Business functions that are most important and critical will fall into Class 3. Functions
that are vital but can be recovered after critical functions are restored will be designated
as Class 2. Non-vital functions that can be performed via alternate methods for an
extended period of time will be designated Class 1.
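
As an illustration only, the classification rule in the paragraph above can be reduced to two yes/no questions drawn from Figure 2. The function name and parameters below are hypothetical, and a real business impact analysis weighs far more than two answers.

```python
def classify_business_function(can_operate_without: bool,
                               for_extended_period: bool) -> int:
    """Map two BIA answers to a class of recovery (1, 2, or 3).

    can_operate_without: can the company do business without this process?
    for_extended_period: if so, can it do so for an extended period of time?
    """
    if not can_operate_without:
        return 3  # critical: cannot do business without the process
    if not for_extended_period:
        return 2  # vital: can do without it, but not for long
    return 1      # non-vital: alternate methods suffice for an extended period


print(classify_business_function(False, False))  # 3
print(classify_business_function(True, False))   # 2
print(classify_business_function(True, True))    # 1
```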

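Figure 1: assessing recovery requirements for each business function. The figure shows a
cycle: business requirements define recovery objectives, recovery objectives dictate
techniques, techniques determine technology, technology drives cost, and cost leads to a
reassessment of business requirements.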

Recovery objectives
After each business requirement is classified as to the relative impact to the business in
the event of an outage, the recovery objectives for each class must be quantified (see
Figure 3). The class of recovery mandated by the business requirements defines the
recovery objectives for each business function. For example, business functions that have
Class 3 business requirements can only have Class 3 recovery objectives.

These objectives also need to be quantified in terms of recovery time objectives (RTO)
and recovery point objectives (RPO). The RTO is the time it takes to restore the business
function to a functioning level. The RPO is the specific point in time the data needs
to be restored to in order to effect a successful recovery. Although the RTO and RPO in
each class of recovery will vary from company to company, some general guidelines are
suggested in Figure 4.
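
A planned or measured recovery can be checked against the guideline values of Figure 4 with a small lookup table. The sketch below is an assumption-laden illustration: the names `GUIDELINES` and `within_guideline` are hypothetical, only the upper bound of each range in Figure 4 is encoded, and a "less than" limit is treated as "less than or equal" for simplicity.

```python
from datetime import timedelta

# (site, class) -> (RTO upper bound, RPO upper bound), taken from Figure 4.
# None means the figure gives no upper bound for that entry.
GUIDELINES = {
    ("primary", 3): (timedelta(minutes=30), timedelta(minutes=1)),
    ("primary", 2): (timedelta(hours=8), timedelta(hours=24)),
    ("primary", 1): (None, timedelta(hours=24)),
    ("secondary", 3): (timedelta(hours=8), timedelta(minutes=15)),
    ("secondary", 2): (timedelta(hours=72), timedelta(hours=24)),
    ("secondary", 1): (timedelta(weeks=1), timedelta(weeks=1)),
}


def within_guideline(site, recovery_class, rto, rpo):
    """Return True if the given RTO and RPO fall within the Figure 4 guideline."""
    rto_limit, rpo_limit = GUIDELINES[(site, recovery_class)]
    rto_ok = rto_limit is None or rto <= rto_limit
    rpo_ok = rpo_limit is None or rpo <= rpo_limit
    return rto_ok and rpo_ok


# A Class 3 function restored at the primary site in 20 minutes,
# losing 30 seconds of data.
print(within_guideline("primary", 3, timedelta(minutes=20), timedelta(seconds=30)))
```

Because these values vary from company to company, the table is meant to be replaced with an organization's own figures.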

To keep things in perspective and make sure the actions are derived within a realistic
timeframe, remember that recovery comprises several factors:
• Time to restore hardware infrastructure:
  • Server(s)
  • Disk
  • Network components
  • Storage area network (SAN) components
  • Tape drives/libraries
• Time to restore operating system(s)
• Time to establish connectivity
• Time to restore application software and data
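
One way to test whether a recovery time objective is realistic is to add up the components listed above and compare the total with the objective. The sketch below assumes the steps run strictly one after another, which gives a conservative estimate because real recoveries usually overlap some of them. The function names and the sample durations are hypothetical.

```python
from datetime import timedelta


def estimated_recovery_time(hardware, operating_system, connectivity,
                            application_and_data):
    """Sum the recovery components, assuming they run one after another."""
    return hardware + operating_system + connectivity + application_and_data


def meets_rto(estimate, rto):
    """Return True if the estimated recovery time fits within the RTO."""
    return estimate <= rto


# Hypothetical durations, for illustration only.
estimate = estimated_recovery_time(
    hardware=timedelta(hours=4),
    operating_system=timedelta(hours=1),
    connectivity=timedelta(minutes=30),
    application_and_data=timedelta(hours=3),
)
print(estimate)                                 # 8:30:00
print(meets_rto(estimate, timedelta(hours=8)))  # False
```

In this example the estimate misses an 8-hour objective by 30 minutes, so either the objective or the recovery technique would need to change.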

Each of these factors must be considered when planning for recovery. There will be
overlap and timing issues that vary from company to company, but breaking down the
recovery in a logical manner will result in realistic recovery objectives.

Figure 2: classifying business requirements

Class 3 (critical)
• Disaster has immediate effect on company
• Cannot do business without process

Class 2 (vital)
• Disaster will adversely affect company after some period of time
• Can do business without process but not for extended time period

Class 1 (non-vital)
• Disaster has no short or long term effect on company
• Can do business without process for an extended period of time

One could also include a Class 4 for business functions which require solutions
that are fully fault-tolerant at the hardware level with transaction replication.
However, these scenarios are rarely used and are really only practical in mainframe
environments.

Figure 3: quantifying recovery objectives

Class 3 (critical)
• Critical process
• Requires shortest recovery and time-to-data timeframes

Class 2 (vital)
• Vital process
• Must be restored as soon as possible after the critical processes
• Next shortest recovery and time-to-data timeframes

Class 1 (non-vital)
• Non-vital process
• Can use alternate processes to perform function
• Longest recovery and time-to-data timeframes

Techniques
The techniques used for each class of recovery are dictated by the recovery objective
time frames. Figure 5 details these techniques according to their recovery class. These
techniques must take into account the nature of the outage and the extent of the recovery
effort. For instance, if a company does not wish to recover from a total site disaster
(e.g., fire, flood, earthquake) then the techniques involving any sort of replication of
processors or data at a remote site are not pertinent.

Class 3 recovery techniques are the most robust, and provide the shortest recovery time
and the fastest time to data. These include hot backup systems, remote mirrored disk
arrays, electronic tape vaulting, SANs, local and remote redundant networks, and use of
snapshot disk volumes.

Class 2 recovery techniques are less robust than Class 3 techniques, but still rely on
high-end technology.

Business functions that only require Class 1 recovery methods employ the low-end
techniques such as no redundancy, no failover capability, manual backup processes, and static
database backups. Backups are usually performed on a server-by-server basis and the
responsibility for them lies with a single person. For the most part, if recovery is
required for a Class 1 function, it will be handled at that time with little or no planning.

Just as important as recovery techniques are backup techniques. For instance, it is
imperative to use enterprise class backup software for the business functions requiring
Class 3 recovery. Disk mirrors are also very useful, since they protect against disk media
failure. Should a virus, sabotage, or software bug corrupt the data in the file or database,
however, the ability to recover will rest upon a volume clone or tape backup.

Figure 4: typical recovery time and recovery point objectives

Primary site
       Class 3 (critical)         Class 2 (vital)           Class 1 (non-vital)
RTO    5 to 30 minutes            30 minutes to 8 hours     > 8 hours
RPO    < 1 minute before event    1 minute to 24 hours      24 hours (last backup)

Secondary site
       Class 3 (critical)         Class 2 (vital)           Class 1 (non-vital)
RTO    < 8 hours                  8 hours to 72 hours       72 hours to 1 week
RPO    < 15 minutes before event  Last backup (< 24 hours)  Last full backup (< 1 week)

This table points out that RTO and RPO should be stated differently depending
on whether recovery can take place at the primary site as opposed to the
secondary site.

It is important to remember that no matter what sort of state-of-the-art technology is
employed, the ability to recover successfully is only as good as the last backup. It is for
this reason that the backup process must be deemed a critical business function. All too
often the backup process is insufficient to handle the volume of data required for recovery;
if this process does not complete in the allotted time it is cancelled without hesitation.
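
Whether a backup can finish in its allotted time is simple arithmetic once the data volume and the sustained backup throughput are known. The sketch below is a rough illustration under that assumption. The function names and all figures are hypothetical, and real throughput varies with the mix of files, the network, and the tape or disk target.

```python
def backup_hours(data_gb, throughput_gb_per_hour):
    """Estimate how long a backup takes at a steady throughput."""
    if throughput_gb_per_hour <= 0:
        raise ValueError("throughput must be positive")
    return data_gb / throughput_gb_per_hour


def fits_window(data_gb, throughput_gb_per_hour, window_hours):
    """Return True if the backup is expected to finish within the window."""
    return backup_hours(data_gb, throughput_gb_per_hour) <= window_hours


# Hypothetical figures: 2,000 GB at 200 GB/hour against an 8-hour window.
print(backup_hours(2000, 200))     # 10.0
print(fits_window(2000, 200, 8))   # False

# Backing up only the 1,200 GB needed to recover the business function.
print(fits_window(1200, 200, 8))   # True
```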

Given the astronomical growth rate of data in most companies, the ability to manage this
data becomes a necessity and warrants strong storage management policies and procedures,
as well as the personnel to enforce them. Good storage management helps reduce
the amount of data being backed up by ensuring that only the data required to recover a
business function is backed up. This is a much more effective paradigm than the
all-or-nothing approach.

Note that network recovery needs to be included as part of the overall recovery scenario.
Both local and remote redundant networks require a level of planning and testing
well above that required by the lower classes of recovery. This will ensure that neither
performance nor connectivity is compromised.

Technology
The technology for each class of recovery is determined by the technique required to
support the desired recovery objective. Figure 6 details the technology according to
recovery class.

A Note on Business Continuity


Although the two are often thought of as one and the same, there is a critical distinction
between disaster recovery and business continuity. A disaster recovery plan should be just one
component of a broader business continuity strategy to keep business operations continuing
as usual no matter what kind of disruption occurs, planned or unplanned. For further
reading, please see CNT’s paper titled “The IT Infrastructure of Business Continuity.”

Figure 5: classifying techniques used for each class of recovery

Primary site

Class 3 (critical)
• Automatic failover capabilities
• Snapshot volumes of databases
• No single points of failure

Class 2 (vital)
• Manual failover capabilities
• Enterprise backup and recovery solutions
• Split mirror database backups
• Online database backups

Class 1 (non-vital)
• Ad hoc backup and recovery solutions
• Static database backups
• No failover capabilities

Secondary site

Class 3 (critical)
• Real-time or near real-time data replication
• Hot failover servers (OS, App online and dedicated to production support)
• No single point of failure

Class 2 (vital)
• Remote tape vaulting
• Warm servers available for manual reconfig (OS and app loaded, used for
  non-production work)

Class 1 (non-vital)
• Cold servers: rebuild at someone else’s site
• Offsite tape storage

Cost
The cost associated with each class of recovery is driven by the technology needed to
employ the appropriate recovery technique for that class (see Figure 7). In general, the
more critical the business function, the more expensive the recovery solution.

When the cost of recovery is deemed to be too high, the business requirements should
be reassessed to see whether or not any business functions could be reclassified to a
lower recovery class. All too often, this process fails because instead of reassessing the
business requirements, the technology is compromised without regard for the business
functions. This jeopardizes the integrity of the recovery class for the business functions in
that class. Those in charge of the business functions must decide what level of risk is
acceptable to the company, and the assumption of risk should never be made solely on
the basis of budget constraints.

Finding the optimum cost of protection is rarely an easy task. How much is enough? Most
companies merely look at the cost associated with the technology needed to protect the
business functions. A simple benefit analysis would help justify the cost of protection.

Figure 6: determining technology used for each class of recovery

Local

Class 4
• Fault tolerant hardware

Class 3
• Server clustering
• WAN server clustering
• Enterprise storage solutions
• Redundant networks
• Networked storage

Class 2
• Departmental storage solutions
• Duplicated hardware configs
• Automated tape library
• Dedicated backup networks

Class 1
• JBOD
• Locally attached tape drives
• Manual tape movers

Remote

Class 4
• Fault tolerant hardware
• Transaction replication software

Class 3
• Duplicate hot sites
• WAN server clustering
• Storage controller-based disk replication
• Storage virtualization replication
• Software based replication
• Application based replication

Class 2
• Co-location sites
• Departmental storage solutions
• Duplicated hardware configs
• Remote accessed automated tape library
• Dedicated backup networks

Class 1
• Rented recovery sites
• Tapes stored at rented off-site location
To do this, first determine the expected loss if there is an interruption to a business
function and no protection in place (Ln). Then calculate the expected loss due to an
interruption of a business function with protection in place (Lp). The purported benefit
of protection for a business function is represented as follows: Benefit = Ln - Lp.
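
A short worked example may help. The sketch below computes the benefit exactly as defined above and then compares it with the cost of protection. That comparison is an added assumption about how the benefit might be used, not a rule stated in this paper. The function names and the dollar amounts are hypothetical.

```python
def protection_benefit(loss_unprotected, loss_protected):
    """Benefit = Ln - Lp, the reduction in expected loss due to protection."""
    return loss_unprotected - loss_protected


def is_cost_justified(loss_unprotected, loss_protected, cost_of_protection):
    """Assumed decision rule: protection is justified if benefit covers its cost."""
    benefit = protection_benefit(loss_unprotected, loss_protected)
    return benefit >= cost_of_protection


# Hypothetical figures: Ln = $500,000, Lp = $50,000, protection costs $120,000.
print(protection_benefit(500_000, 50_000))          # 450000
print(is_cost_justified(500_000, 50_000, 120_000))  # True
```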

This is only one method of trying to justify the cost associated with protection.
Quantifying the expected loss in the event of a business function interruption should
always be done and used as one factor in determining what the cost of protection
should be for that business function.

Assessing the classes of recovery


Once the business requirements have been run top-down through the process and the
appropriate technologies have been selected for each class of recovery, it is necessary to
assess the current technology of each business function and assign each one a specific
class of recovery. Doing this provides a view of the present class of recovery capability
versus the class of recovery needed by the business. If the current class of recovery is
greater than that required by the business function, the level of protection is excessive.
On the other hand, if it is lower than the required class of recovery, the level of protection
is deficient and needs to be improved. Comparing the differences in technology
between the classes of recovery for each business function makes it easy to determine
what needs to change in order to achieve the required class of recovery. Only at this
point can a realistic upgrade plan be developed.
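
The comparison described above amounts to a gap analysis between the current and required class of recovery for each business function. The following sketch is illustrative only. The function names, the label "adequate" for a matching class, and the sample data are all hypothetical.

```python
def assess_protection(current_class, required_class):
    """Compare the current class of recovery with the class the business needs."""
    if current_class > required_class:
        return "excessive"
    if current_class < required_class:
        return "deficient"
    return "adequate"


def gap_report(functions):
    """Map each business function to its assessment.

    `functions` maps a name to a (current_class, required_class) pair.
    """
    return {
        name: assess_protection(current, required)
        for name, (current, required) in functions.items()
    }


# Hypothetical business functions, for illustration only.
report = gap_report({
    "order entry": (2, 3),
    "payroll": (2, 2),
    "intranet news": (3, 1),
})
print(report)
```

Functions reported as deficient are the candidates for an upgrade plan, while those reported as excessive may be consuming budget that is better spent elsewhere.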

Conclusion
Following the classes of recovery methodology outlined in this paper is beneficial to a
company for several reasons. First, it ensures that recovery methods selected for each
business function are driven by the needs of the business. Second, it allows a company to
easily assess their current recovery capabilities and develop a viable strategy to correct
deficiencies in their existing recovery scheme. Basing the classes of recovery on the
amount of risk that can be assumed by each business function ensures that the expense
required to preserve the business correlates directly to its importance to the company.

Further reading
Another paper in this series, “Evaluating Your Exposure,” details the process of cost-justifying
business continuity. It begins by discussing factors to consider in establishing the cost
of disrupted service for a business function. It also describes a study known as a business
impact analysis. This paper is co-authored by CNT’s strategic partner in business continuity,
Strohl Systems, experts in business impact analysis and business continuity planning.

Figure 7: costs associated with each class of recovery

Class 3 (critical)
• Most expensive to implement
• Least loss incurred from disaster

Class 2 (vital)
• Moderate expense to implement
• Moderate loss incurred from disaster

Class 1 (non-vital)
• Least expensive to implement
• Highest loss incurred from disaster

After establishing the desired recovery class of your application environments, and
after justifying the costs by understanding your exposure, you are ready to talk technology.
Additional white papers in this series focus on current technologies necessary
to achieve specific recovery objectives. One, “Primary Site Recovery Techniques,”
focuses on business continuation technology within the primary data center. A second
paper, “Secondary Site Recovery Techniques,” focuses on continuation technology at an
alternate location. The technology solutions will be compared against others available
within a given recovery class. These comparisons will give consideration to costs to
implement and complexity to manage.

CNT has nearly two decades’ worth of experience assessing, designing, and
deploying IT solutions to support business continuity objectives. Our professional
consulting organization can help you effectively evaluate and plan your
optimal solution. From business continuity architecture assessments, design, and
integration, to remote network management and support, we help you streamline
the decision making process, accelerate technology deployment, and meet
your IT recovery objectives.

CNT is one of the world’s largest providers of comprehensive storage networking solutions.
For over 20 years, our experts have analyzed, designed, and built enterprise storage networks.
Visit www.cnt.com to learn about our solutions, products, partnerships, career
opportunities, and more.

© 2003 by Computer Network Technology Corporation (Nasdaq: CMNT). All rights reserved.
Any reproduction of these materials without the prior written consent of CNT is strictly
prohibited. CNT, the CNT logo, Channelink, and UltraNet are registered trademarks of
Computer Network Technology Corporation. All other trademarks identified herein are the
property of their respective owners. CNT is an equal opportunity employer. CNT corporate
headquarters’ QMS is registered to ISO 9001: 2000. Certificate #006765. PL563 | 0803

USA: 1-800-638-8324
Canada: 905-595-1500
UK: 44-1753-792400
France: 33-1-4130-1212
Australia: 61-2-9540-5486
Germany: 49-89-42 74 11-0
Switzerland: 41-1-73 35-733
Belgium: 32-2-737 76 42
Italy: 39-06-51 49 31
Brazil: 55-11-5509-1504
Japan: 813-5403-4858
Other locations: 1-763-268-6000