Drexel University
Professor James D. Baranello
CT-415 Disaster Recovery & Continuity Planning
Fall Quarter 2012
Friday, November 2nd, 2012 (Part One)
Saturday, December 15th, 2012 (Part Two)
Page 1 of 40
Table of Contents
Abstract.......................................................................................................................................................4
Disclaimer...................................................................................................................................................5
Section One.................................................................................................................................................6
Introduction.............................................................................................................................................6
Organizational Background.........................................................................................................................7
Backup Strategy......................................................................................................................................7
FEMA Disaster Scenario: Earthquake.......................................................................................................11
Emergency Scenario..............................................................................................................................11
Threat, Risk and Impact.........................................................................................................................11
Recovery Strategy.................................................................................................................................12
FEMA Disaster Scenario: Explosion.........................................................................................................15
Emergency Scenario..............................................................................................................................15
Threat, Risk and Impact.........................................................................................................................15
Recovery Strategy.................................................................................................................................17
Section Two...............................................................................................................................................20
Introduction...........................................................................................................................................20
Disaster Recovery Plan..............................................................................................................................21
Mission Statement.................................................................................................................................21
Definitions.............................................................................................................................................21
Stakeholders..........................................................................................................................................22
Scope of Work.......................................................................................................................................25
Disaster Recovery Timeframe...............................................................................................................26
Team Participants and Responsibility Matrix........................................................................................27
Resources..............................................................................................................................................28
Tools......................................................................................................................................................29
Sites.......................................................................................................................................................29
Recovery Procedure...............................................................................................................................30
Section Three.............................................................................................................................................33
Introduction...........................................................................................................................................33
Testing the DRP........................................................................................................................................34
Semi Annual Review.............................................................................................................................34
Semi Annual Testing.............................................................................................................................34
Project Summary.......................................................................................................................................37
Appendices................................................................................................................................................38
Appendix I.............................................................................................................................................38
Appendix II...........................................................................................................................................39
References.................................................................................................................................................40
________
Abstract:
________
plan. The company is Archive Concepts, LLC. I am the legitimate owner of Archive Concepts.
However, some of the facts about the company need to be changed to accurately describe what a
disaster recovery plan for a medium to large business should consist of. This paper is divided
into three parts: section one, section two and section three. Section one will discuss the
background of the organization, define and explain two disaster scenarios, analyze the risk, threat
and impact to the organization and discuss the recovery strategy. Section two will contain the
recovery team mission statement, scope of work for the recovery team, the timeframe for
recovery, team participants, tools, sites in which to recover and the recovery procedures.
Stakeholders will be identified, along with how they are impacted and how they will be updated
throughout the recovery process. Section three discusses the importance of testing the disaster
recovery plan and reviewing it on a regular basis for accuracy and relevance. The end goal is to
__________
Disclaimer:
__________
The company Archive Concepts, LLC is a real business entity. However, some of the
details about the company are false. Some regional offices are needed along with data centers to
accurately describe what a disaster recovery plan should present for a medium to large
organization. The company assets, resources, personnel, and other pertinent information will be
dramatized for the purpose of writing this paper to fulfill the requirements of the assignment and
___________
Section One:
___________
Introduction:
Section one discusses the background of the organization, defines and explains two
disaster scenarios, analyzes the risk, threat and impact to the organization, and discusses the
recovery strategy for the organization. The disaster scenarios are chosen from the Federal
Emergency Management Agency (FEMA) categories available from the document “Are You
disaster scenarios chosen and discussed in this section are earthquakes and explosions.
________________________
Organizational Background:
________________________
Archive Concepts LLC (AC) is a full-service I.T. company offering a wide variety of
products and services. Archive Concepts has been serving the Pocono and surrounding areas
since the year 2000 and regionally since 2005. The company specializes in corporate networking
management systems, virtual desktops, data storage solutions and managed hosting. Archive
Concepts has its corporate headquarters in East Stroudsburg, Pennsylvania, with primary
datacenter operations in a nearby Tier 1 facility and two regional offices, one in Atlanta, Georgia
The primary datacenter facility of the company uses the latest in technological advances
and utilizes a large VMware infrastructure connected to a SAN. The facility is Tier 1 and,
being located in a mountainous area outside of any flood zone, is hardened against
environmental disasters such as flooding, hurricanes, and tornadoes. The facility has standard
security features such as electronic key card entry, hand scan, fire prevention, video
surveillance, backup power and efficient floor cooling. The secondary datacenter is located in a
Tier 2 facility in San Diego. That facility has the same standard security features: electronic
key card entry, hand scan, fire prevention, video surveillance, backup power and efficient floor
cooling but is more susceptible
Backup Strategy:
The backup solution for Archive Concepts is quite elaborate but does not take advantage
of a hot site as at this time there is no justification for one. The primary backup strategy
leverages storage solutions offered by NetApp, ONTAP Snap Technology, DoubleTake
Replication Software, Symantec BackupExec Software and Overland LTO5 Tape Libraries. The
company refers to the backup as the “Company-Wide Backup Solution” (CWBS). The company
has critical data in all offices. CWBS consists of DoubleTake replication software loaded on any
server in the field that needs data to be backed up (referred to as source servers). Those source
servers replicate the data to the IDC on two CWBS target servers connected to our NetApp SAN
(Storage Area Network) system. All datastores in the VMware environment are also stored on
the NetApp. The NetApp appliances are the primary location for all data. The following chart
contains a list of technologies all covered by a 24/7 four hour response premium support contract
that will fully replace all hardware that fails or is damaged during a disaster event:
IDC-NETAPP01
SDI-NETAPP01
SDI-NETAPP02
Model FAS2240-4 Cluster
Backup storage solution for all managed hosting data, VMware datastores and other CWBS data.
None
FibreChannel
As mentioned, all data is stored on the primary NetApp appliances located at the Tier 1
datacenter facility. AC has a ten node clustered VMware virtual infrastructure environment
where all datastores are located on IDC-NETAPP01 and IDC-NETAPP02 appliances. The two
servers IDC-CWBS01 and IDC-CWBS02 are within the virtual environment. All critical field
data is replicated in real time via DoubleTake replication software to partitions on IDC-CWBS01
and IDC-CWBS02. The filer controllers IDC-NETAPP01 and IDC-NETAPP02 have Data
ONTAP snap technology enabled. Local snapshots of all volumes (which contain VM datastores
and other data) are enabled and occur on an hourly, weekly and monthly basis as defined within
The NetApp filers SDI-NETAPP01 and SDI-NETAPP02 are located in the San Diego
Tier 2 datacenter facility. Data ONTAP snapmirror technology replicates the local snapshots of
weekly and monthly basis as defined in the data protection policies. Controller SDI-NETAPP01
then “vaults” each of those snapshots to the second controller SDI-NETAPP02 for longer term
retention. The server SDI-BACKUP01 via the Symantec BackupExec server executes a backup
against the vaulted snapshots contained on filer SDI-NETAPP02. For backup rotations, AC
The Grandfather rotation runs 2-3 times a month, depending on how long the month is. It
always starts on a Friday night and runs until completion throughout the weekend. This
rotation always demands a new tape; it never appends to tapes that may already have
data on them. This is because when we take the tapes offsite, no other rotation will
depend on the missing data.
The Father rotation runs 2 times a month. It always starts on a Friday night and runs until
completion throughout the weekend. Like the Grandfather rotation, it always demands a
new tape – it never appends to tapes that may already have data on them. This is because
if we take the tapes offsite, no other rotation will depend on the missing data.
The Son rotation occurs every Monday through Thursday night of each week. It
performs a differential backup from the last Grandfather or Father that was run. This
rotation appends to other Son tapes to maximize the space on each tape.
All Grandfather tapes are shipped off site to an Iron Mountain secure facility for long
term storage. The Grandfather tapes contain a full backup of the vaulted snapshots and therefore
all data contained within them. The Father rotations are recycled after 180 days and Son rotations
are recycled after 45 days. The Father and Son tapes are taken off site with local IT resources.
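The rotation rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say which Fridays belong to which rotation, so the sketch assumes the Grandfather runs on the 1st, 3rd and 5th Friday of a month and the Father on the 2nd and 4th, which is consistent with "2-3 times a month" and "2 times a month". The function names rotation_for and recycle_date are hypothetical.

```python
from datetime import date, timedelta
from typing import Optional

# Retention periods taken from the text above.
FATHER_RETENTION_DAYS = 180
SON_RETENTION_DAYS = 45


def rotation_for(day: date) -> Optional[str]:
    """Return the rotation that starts on the given day, or None if no job starts."""
    weekday = day.weekday()  # Monday == 0 ... Sunday == 6
    if weekday <= 3:
        # Monday through Thursday night: differential Son backup.
        return "son"
    if weekday == 4:
        # ASSUMPTION: odd-numbered Fridays are Grandfather, even-numbered are Father.
        nth_friday = (day.day - 1) // 7 + 1
        return "grandfather" if nth_friday % 2 == 1 else "father"
    # Saturday and Sunday: the Friday job may still be running, nothing new starts.
    return None


def recycle_date(day: date) -> Optional[date]:
    """Return the date a tape written on `day` may be recycled, if ever."""
    rotation = rotation_for(day)
    if rotation == "father":
        return day + timedelta(days=FATHER_RETENTION_DAYS)
    if rotation == "son":
        return day + timedelta(days=SON_RETENTION_DAYS)
    # Grandfather tapes go to long term offsite storage and are not recycled here.
    return None


if __name__ == "__main__":
    for d in (date(2012, 11, 2), date(2012, 11, 5), date(2012, 11, 9)):
        print(d.isoformat(), rotation_for(d), recycle_date(d))
```

Under this assumed split, a month with five Fridays gets three Grandfather runs and two Father runs, and a month with four Fridays gets two of each.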
_________________________________
FEMA Disaster Scenario: Earthquake:
_________________________________
Emergency Scenario:
A severe earthquake is among the most destructive natural disasters. Depending on the severity of
the event, it could cause total destruction of a datacenter facility. In this scenario we discuss the
risk, threat and impact of a complete loss of the datacenter and the subsequent recovery effort to
bring the production environment back online. The scenario consists of a high impact earthquake
striking either datacenter location, East Stroudsburg or San Diego. The result of the
earthquake to the organization is a complete loss of the datacenter, but the devastation is not so
heavy to the area that the facility could not be rebuilt.
Threat, Risk and Impact:
The first objective is to identify the threat. Knowing the threat, one can then focus on the
assets that are directly affected by it. The most vulnerable assets will be at the top
of the list and will be those that are recovered first. Once the threat to the organization
has been identified, one can then determine the risk that the threat poses to the assets. Lastly,
after determining the risk, each asset should be analyzed as to the impact its loss would cause
to the organization.
In this case the threat is an earthquake. The risk of an earthquake affecting datacenter
operations at the primary Tier 1 facility in East Stroudsburg, Pennsylvania is very low. The
datacenter is located in a mountainous area away from any major fault lines; yet as slim as the
probability is, the possibility still exists. The risk of an earthquake affecting datacenter
operations at the secondary Tier 2 facility in San Diego, California is much higher than at the
primary facility, since California is known to be earthquake prone. The assets most critical to
Archive Concepts are its personnel, the datacenter facility itself and all of the equipment
contained within the facility. The specific most vulnerable assets are listed in table 1 above.
The primary datacenter contains all live production data. The secondary datacenter
contains a replica of the production data from the primary datacenter. A loss of the primary
datacenter facility would have a greater impact on operations as it is the live environment. This is
precisely the reason why the primary datacenter facility is located in an area where very few
natural disasters occur. If an earthquake were to hit the primary datacenter and completely
destroy the facility, there would be service interruption to company personnel and customers.
However, with minimal effort, the primary production processes can be transferred to the
secondary datacenter facility since it holds a fully replicated mirror of the production data. A
temporary VMware environment that can access the mirrored data can be implemented quickly
at the secondary location. If an earthquake were to hit the secondary datacenter and
completely destroy the facility, there would be no service interruption to company personnel and
customers.
Recovery Strategy:
First, it is most prudent to address the recovery of the primary datacenter facility in the
event of a massive earthquake. A complete loss of the facility and operations would cause
interruption of service. The recovery strategy is to transfer datacenter operations to the secondary
San Diego facility. Converting the mirrored snapshots of the primary facility to the backup
NetApp SAN will take place first. Hardware for creating a temporary VMware environment is in
inventory. Configuring ESXi is a straightforward and relatively quick process.
The VMware environment can then be attached to the already existing mirrored copies of
VMware boot LUNs and datastores. Client services can be completely restored within a 24 hour
period.
Once restoration of services has been implemented at the backup facility, the corporate
facilities manager can then turn to recovering the primary facility. The facilities
manager will coordinate with insurance, vendors and contractors to restore the primary
datacenter facility to its fully functioning state. Other senior level management and information
technology professionals will also be involved in the recovery effort. The below table describes
Responsibility Matrix
These plans/points assume employee safety is taken first priority

Role | Has Ability to | Actions Responsible for
General Manager | Declare Emergency; Request DR plan to be activated | Ensuring impacted staff are provided real-time instructions; GM to keep RGM informed; Keeps clients and others updated; General status communications / point person.
Regional General Manager | Declare Emergency; Request DR plan to be activated; Approval authority to activate DR plan | Keeps HR director updated; Keeps Facility director updated; Keeps CIO updated; Ensuring General Manager is aware of options; Keeping Senior Management updated on event handling; Coordination with Human Resources on event handling; Participates in general status communications
Senior Management | Declare Emergency; Request DR plan to be activated; Approval authority to activate DR plan | Overall Oversight / set direction; Coordination with Executive Management Team
Information Technology | Request DR plan to be activated | Initiates Technology tasks per DR plan; Communicates status of technology; Updates GM as to IT tasks/progress of DR plan action status
Human Resources (Dir/Delegate) | Request DR plan to be activated; Approval authority to activate DR plan | Ensures any legal, employee safety concerns are being addressed.
Facilities (Dir/Delegate) | Request DR plan to be activated | Keeping Senior Management updated on event handling
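The "Has Ability to" column of the matrix can also be read as a small lookup table. The following Python sketch is illustrative only: the role and ability names come from the matrix, while the helper names roles_with and can_activate, and the rule that activation needs one role able to request and one role holding approval authority, are assumptions made for the example.

```python
# Abilities as named in the "Has Ability to" column of the matrix.
DECLARE = "Declare Emergency"
REQUEST = "Request DR plan to be activated"
APPROVE = "Approval authority to activate DR plan"

# Role -> abilities, transcribed from the Responsibility Matrix.
ABILITIES = {
    "General Manager": {DECLARE, REQUEST},
    "Regional General Manager": {DECLARE, REQUEST, APPROVE},
    "Senior Management": {DECLARE, REQUEST, APPROVE},
    "Information Technology": {REQUEST},
    "Human Resources (Dir/Delegate)": {REQUEST, APPROVE},
    "Facilities (Dir/Delegate)": {REQUEST},
}


def roles_with(ability: str) -> list:
    """Return the roles holding the given ability, sorted by name."""
    return sorted(role for role, held in ABILITIES.items() if ability in held)


def can_activate(requester: str, approver: str) -> bool:
    """ASSUMED rule: activation needs a valid requester and a valid approver."""
    return (REQUEST in ABILITIES.get(requester, set())
            and APPROVE in ABILITIES.get(approver, set()))


if __name__ == "__main__":
    print(roles_with(APPROVE))
    print(can_activate("Information Technology", "Senior Management"))
```

Keeping the matrix as data makes it easy to check, during a semi annual review, that at least one role with approval authority is still staffed.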
When the primary datacenter has been restored to its original state, the data at the San Diego
facility will be considered the production data. The data will need to be mirrored to the primary
datacenter facility using snap technology. The VMware environment will be rebuilt and a cutover
to the mirrored data will be scheduled. Primary production processes will be transferred to
the East Stroudsburg datacenter facility and San Diego will once again become the backup site.
Regularly scheduled snapshots, mirrors and vaults will resume on the pre-disaster schedule.
Secondly, if the San Diego facility were to be destroyed, the recovery will be a
straightforward process since it is the backup site and its loss does not impact users or
customers. The facilities manager will coordinate with insurance, contractors and vendors to
restore the facility to its original state. Information technology engineers will install the
replacement hardware and return it to its original configuration. Replication of data from the
primary datacenter facility can then be re-established.
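The two recovery paths described above can be summarized as ordered checklists. The sketch below is a minimal illustration in Python, not an official runbook: the step wording paraphrases this section, and the names recovery_steps and service_interrupted are hypothetical.

```python
from typing import List

PRIMARY = "East Stroudsburg"
SECONDARY = "San Diego"


def service_interrupted(lost_site: str) -> bool:
    """Only the loss of the primary (live) site interrupts service."""
    return lost_site == PRIMARY


def recovery_steps(lost_site: str) -> List[str]:
    """Return the ordered recovery steps for the loss of the given site."""
    if lost_site == PRIMARY:
        return [
            "Convert the mirrored snapshots on the San Diego NetApp SAN",
            "Configure ESXi on the temporary VMware hardware held in inventory",
            "Attach VMware to the mirrored boot LUNs and datastores",
            "Restore client services from San Diego",
            "Rebuild the East Stroudsburg facility with insurance, vendors and contractors",
            "Mirror the San Diego production data back to East Stroudsburg",
            "Rebuild the primary VMware environment and schedule the cutover",
            "Resume regularly scheduled snapshots, mirrors and vaults",
        ]
    if lost_site == SECONDARY:
        return [
            "Rebuild the San Diego facility with insurance, contractors and vendors",
            "Install the replacement hardware",
            "Re-establish replication from East Stroudsburg",
        ]
    raise ValueError(f"unknown datacenter site: {lost_site!r}")


if __name__ == "__main__":
    for number, step in enumerate(recovery_steps(PRIMARY), start=1):
        print(number, step)
```

Writing the steps as an ordered list makes the dependency explicit: failover to San Diego comes first, and the rebuild and failback of the primary site follow only after services are restored.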
_______________________________
FEMA Disaster Scenario: Explosion:
_______________________________
Emergency Scenario:
result of a terrorist attack. A terrorist could potentially inflict a great deal of damage
by destroying a Tier 1 datacenter facility. This is especially true if it belonged to a major
carrier such as AT&T, where many customers, some very large and important, have their technology
infrastructure hosted. The potential impact to the entire country could be massive. Depending on
the severity of the event, it could cause total destruction of a datacenter facility. In this scenario
we discuss the risk, threat and impact of a complete loss of the datacenter and the subsequent
recovery effort to bring the production environment back online. The scenario consists of an
explosion due to terrorist activity striking either datacenter location, East Stroudsburg or
San Diego. The result of the terrorist attack to the organization is a complete loss of the
datacenter, but the devastation is not so heavy to the area that the facility could not be rebuilt.
Threat, Risk and Impact:
The first objective is to identify the threat. Knowing the threat, one can then focus on the
assets that are directly affected by it. The most vulnerable assets will be at the top
of the list and will be those that are recovered first. Once the threat to the organization
has been identified, one can then determine the risk that the threat poses to the assets. Lastly,
after determining the risk, each asset should be analyzed as to the impact its loss would cause
to the organization.
In this case the threat is an explosion caused by a terrorist attack. The risk of a terrorist
attack affecting datacenter operations at the primary Tier 1 facility in East Stroudsburg,
Pennsylvania is very low. The datacenter is located in a mountainous area outside of any major
cities or heavily populated areas; yet as slim as the probability is, the possibility still exists. The
risk of a terrorist attack affecting datacenter operations at the secondary Tier 2 facility in San
Diego, California is much higher than at the primary facility since it is a more heavily populated
area. Terrorists are known to want to take as many lives as possible; they do not necessarily think
about technological infrastructure damage at a datacenter facility. The assets most critical to
Archive Concepts are its personnel, the datacenter facility itself and all of the equipment
contained within the facility. The specific most vulnerable assets are listed in table 1 above.
The primary datacenter contains all live production data. The secondary datacenter
contains a replica of the production data from the primary datacenter. A loss of the primary
datacenter facility would have a greater impact on operations as it is the live environment. This is
precisely the reason why the primary datacenter facility is located in an area where very few
natural disasters occur and is located outside of any densely populated area. If a terrorist attack
were to happen at the primary datacenter and completely destroy the facility, there would be
service interruption to company personnel and customers. However, with minimal effort, the
primary production processes can be transferred to the secondary datacenter facility since it holds
a fully replicated mirror of the production data. A temporary VMware environment that can access
the mirrored data can be implemented quickly at the secondary location. If a terrorist
attack were to happen at the secondary datacenter and completely destroy the facility, there
would be no service interruption to company personnel and customers as that site’s purpose is
Recovery Strategy:
The recovery strategy for a terrorist attack will be the same as for an earthquake, as in both
scenarios the facilities are destroyed but still recoverable over time. First, it is most prudent to
address the recovery of the primary datacenter facility in the event of a terrorist attack. A
complete loss of the facility and operations would cause interruption of service. The recovery
strategy is to transfer datacenter operations to the secondary San Diego facility. Converting the
mirrored snapshots of the primary facility to the backup NetApp SAN will take place first.
Hardware for creating a temporary VMware environment is in inventory. Configuring ESXi is a
straightforward and relatively quick process. The VMware environment
can then be attached to the already existing mirrored copies of VMware boot LUNs and
datastores. Client services can be completely restored within a 24 hour period.
Once restoration of services has been implemented at the backup facility, the corporate
facilities manager can then turn to recovering the primary facility. The facilities
manager will coordinate with insurance, vendors and contractors to restore the primary
datacenter facility to its fully functioning state. Other senior level management and information
technology professionals will also be involved in the recovery effort. The below table describes
Responsibility Matrix
These plans/points assume employee safety is taken first priority

Role | Has Ability to | Actions Responsible for
General Manager | Declare Emergency; Request DR plan to be activated | Ensuring impacted staff are provided real-time instructions; GM to keep RGM informed; Keeps clients and others updated; General status communications / point person.
Regional General Manager | Declare Emergency; Request DR plan to be activated; Approval authority to activate DR plan | Keeps HR director updated; Keeps Facility director updated; Keeps CIO updated; Ensuring General Manager is aware of options; Keeping Senior Management updated on event handling; Coordination with Human Resources on event handling; Participates in general status communications
Senior Management | Declare Emergency; Request DR plan to be activated; Approval authority to activate DR plan | Overall Oversight / set direction; Coordination with Executive Management Team
Information Technology | Request DR plan to be activated | Initiates Technology tasks per DR plan; Communicates status of technology; Updates GM as to IT tasks/progress of DR plan action status
Human Resources (Dir/Delegate) | Request DR plan to be activated; Approval authority to activate DR plan | Ensures any legal, employee safety concerns are being addressed.
Facilities (Dir/Delegate) | Request DR plan to be activated | Keeping Senior Management updated on event handling
When the primary datacenter has been restored to its original state, the data at the San Diego
facility will be considered the production data. The data will need to be mirrored to the primary
datacenter facility using snap technology. The VMware environment will be rebuilt and a cutover
to the mirrored data will be scheduled. Primary production processes will be transferred to
the East Stroudsburg datacenter facility and San Diego will once again become the backup site.
Regularly scheduled snapshots, mirrors and vaults will resume on the pre-disaster schedule.
Secondly, if the San Diego facility were to be destroyed, the recovery will be a
straightforward process since it is the backup site and its loss does not impact users or
customers. The facilities manager will coordinate with insurance, contractors and vendors to
restore the facility to its original state. Information technology engineers will install the
replacement hardware and return it to its original configuration. Replication of data from the
primary datacenter facility can then be re-established.
___________
Section Two:
___________
Introduction:
Section two defines the mission statement of the Archive Concepts disaster recovery
team, the scope of work for the team, the timeframe for recovering from the disaster, team
participants, resources, tools, sites that will be recovered, and recovery procedures. The Archive
Concepts stakeholders will be identified, along with how they are impacted and how the team will
respond and inform them during the disasters. Essentially, section two is the disaster recovery
plan for the Archive Concepts organization. In section one, technology and scenarios are discussed.
Section two applies these technologies and formulates the actual disaster recovery plan recovery
____________________
Disaster Recovery Plan:
____________________
Mission Statement:
The following plan is designed to provide Archive Concepts staff with information, guidance
and other direction regarding an operationally impacting event. The mission of the disaster
recovery team and the recovery effort is to protect all Archive Concepts company assets, to
ensure the safety of all company associates, and to ensure the continued high level of service to
the Archive Concepts customer and user base. The disaster recovery team will be prepared to
recover all mission critical business systems and services to another datacenter location and
ensure pre-arranged vendor agreements are in place. The mission of the disaster recovery plan
(DRP) is to clearly define responsibilities, actions and procedures to recover the Archive Concepts
system, communication and network environments in the event of a disaster. The DRP has three main
objectives: recover the physical network, recover the applications, and minimize the impact on
the business, all within acceptable time frames as defined by the disaster recovery team and
executive management.
Definitions:
during an event.
Disaster: Any situation, with or without advance notice, that poses a
severe potential risk to employees and/or an ongoing and possibly sustained impact to
operations.
Disaster Recovery Plan: The document that defines the resources, actions, tasks, and data
required to manage the business recovery process in the event of a disaster. The plan is
designed to assist in restoring the business processes within the stated disaster recovery
goals. Tim. J Smith states, “The business objective of Disaster Recovery is to manage
system outages” which aligns with the ideals of the Archive Concepts disaster recovery
Recovery Time Objective: Amount of down time before an outage threatens the survival
of the organization and its mission critical processes through lost revenue and reputation
loss.
Stakeholders: Any individuals who have a vested interest in the outcome of the disaster
Stakeholders:
The internal stakeholders of the Archive Concepts disaster recovery plan policies and
Managers, Information Technology professionals, Human Resources and the Facilities Manager.
Impact:
In the event of a complete disruption of services, all stakeholders will be affected in some
fashion. A business impact analysis was conducted to determine the impact of a disaster on the
operations of each operating unit within Archive Concepts. The business impact analysis
complements the disaster recovery plan by identifying those applications and systems with the
greatest impact on the business in the event of a disaster. Performing an impact analysis
allows the most effective recovery time period to be defined for each system and application.
The following chart outlines the Archive Concepts classification of systems, critical or
Impact Matrix

System | Classification | Impact to Operations | Coordinated Action
Email | Critical | Email is a critical application for business purposes. If this service is unavailable, staff members are unable to effectively communicate with each other and customers. | Upstream SPAM partner has the ability to hold company email in the queue, thus preventing undeliverable messages.
SQL Databases | Critical | SQL databases contain critical customer and Archive Concepts proprietary data. With services offline none of the staff can work. | SQL databases can be brought online at the disaster recovery site (San Diego).
Internet Access | Critical | Internet access is most critical for email communication. With no Internet access there would be no email communication. This would impact business and internal communication operations. | If necessary, VPN can be established to the disaster recovery facility for individuals to access the Internet.
ADP Payroll System | Non-Critical | ADP Payroll is accessed by the Payroll Department. This is a non-business impacting service that can be offline for a reasonable amount of time. If necessary Payroll operations can temporarily be transferred to ADP. | The ADP Payroll system is in the virtualized environment. The VMware environment will be brought online at the disaster recovery site (San Diego).
iPro | Critical | iPro contains customer data. Customers who rely on the ability to access this data would be impacted negatively if the service were unavailable. | The iPro system is in the virtualized environment. The VMware environment will be brought online at the disaster recovery site (San Diego).
Web Applications | Critical | These are customer facing web applications. These applications interface with data such as iPro. | All web applications are in the virtualized environment. The VMware environment will be brought online at the disaster recovery site (San Diego).
ACentral | Non-Critical | ACentral is an internal web application for use by Archive Concepts only. This system can be offline for a reasonable amount of time. | The ACentral system is in the virtualized environment. The VMware environment will be brought online at the disaster recovery site (San Diego).
Great Plains | Critical | Microsoft Great Plains is a critical internal application for the operations, financial and human resources departments. If service is disrupted, staff cannot work effectively. | The Great Plains system is in the virtualized environment. The VMware environment will be brought online at the disaster recovery site (San Diego).
VMware Environment | Critical | Archive Concepts has an extensive VMware environment consisting of many production virtual servers. A service disruption would cause reputation damage and lost revenue. | The virtualized environment data is completely replicated to the recovery site in real time. The VMware environment will be brought online at the disaster recovery site (San Diego).
Storage Systems Critical Many systems are interlocked with The primary NetApp
the NetApp storage systems. In storage system is
particular, VMware houses all the mirrored in real time to
datastores on the NetApp storage the disaster recovery
system. It is business critical to site. Primary storage
keep the storage system online and operations will be
accessible. Operations would be shifted to the San Diego
crippled with a service interruption. disaster recovery site.
Network Layer Critical The network layer is the backbone IT will assess failure
of the IT operations. Service point and work to define
interruption would bring all work-around or send
services offline and would be replacement equipment.
customer and revenue impacting.
Phone Critical Customers would not be able to IT staff will use phone
Communication communicate with Archive system (if available) to
Systems Concepts if the communication forward calls or attempt
systems are offline. This would to forward calls at the
affect revenue and reputation. carrier level.
Status Updates:
Status updates will be given by various individuals as defined in the responsibility matrix.
Scope of Work:
Responsibility for the development and maintenance of the Archive Concepts disaster
responsibility for ensuring the plan is maintained and tested is assigned to the Infrastructure
Team within the Information Technology department. The end user community is responsible to
coordinate with the Help Desk Manager for their information technology requirements in the
event of a disaster.
outage in building.
Major Infrastructure events - Critical equipment failure, such as network router, voice
router failure.
Cisco Phones (incoming calls on primary line/extensions, inter office calling, dialing
out).
Workstations (desktops/laptops)
To determine the maximum time frame allowable, the following Archive Concepts
Information Technology
Operations
Purchasing
Human Resources
Facilities
Customer Service
By interviewing all individuals, the recovery team is able to determine the maximum time
frame that each department can be without functionality of a given system without incurring
severe operational impact. The Recovery Time Objective is defined in business days as the
elapsed time between the disaster and the point at which the systems must be functional again.
The recovery plan involves restoring the most critical systems first, namely the network and storage
systems. Without these two systems online it would not be possible to bring VMware and the
application servers back online. The least critical services, such as ACentral and the ADP Payroll
system, will be restored last. Archive Concepts must have critical systems back online within a
24-hour period to sufficiently conduct business and care for its customers' needs.
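The restore sequence described above can be sketched as a dependency ordering. The following Python example is a minimal illustration only: the dependency edges and the names DEPENDS_ON, NON_CRITICAL and recovery_order are assumptions made for this sketch, based on the statement that VMware and the application servers cannot come online before the network and storage systems.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each system maps to the systems that must be online before it.
# The network/storage -> VMware -> application chain follows the text;
# the exact edges are an illustrative assumption.
DEPENDS_ON = {
    "Network Layer": set(),
    "Storage Systems": set(),
    "VMware Environment": {"Network Layer", "Storage Systems"},
    "iPro": {"VMware Environment"},
    "Great Plains": {"VMware Environment"},
    "ACentral": {"VMware Environment"},
    "ADP Payroll System": {"VMware Environment"},
}

# Systems the plan classifies as non-critical; they are restored last.
NON_CRITICAL = {"ACentral", "ADP Payroll System"}


def recovery_order():
    """Return a restore order that honors dependencies, critical systems first."""
    sorter = TopologicalSorter(DEPENDS_ON)
    sorter.prepare()
    order = []
    while sorter.is_active():
        # Systems whose prerequisites are all restored; critical ones go first.
        ready = sorted(sorter.get_ready(), key=lambda s: (s in NON_CRITICAL, s))
        order.extend(ready)
        sorter.done(*ready)
    return order


if __name__ == "__main__":
    for step, system in enumerate(recovery_order(), start=1):
        print(step, system)
```

Keeping the dependencies in one table means a newly added system only needs one new entry to be placed correctly in the sequence.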
The following chart outlines the recovery time frame for each system:
Disaster Recovery Plan”, “A team needs to be assembled that will respond in the event of a
disaster. This team should include a member or representative of Senior Management, members
from the IS Department that will perform the assessment and recovery, representatives from
Facilities, and members from the Business and User Communities to determine what level of
recovery is needed and to verify that recovery is complete” (Bahan, 2003). The Archive
Concepts disaster recovery plan reflects this opinion as outlined in the following table:
Responsibility Matrix
These plans/points assume that employee safety takes first priority.

Role: General Manager
Has Ability to: Declare Emergency; Request DR plan to be activated
Actions Responsible for: Ensuring impacted staff are provided real-time instructions; GM to keep RGM informed; Keeps clients and others updated; General status communications / point person

Role: Regional General Manager
Has Ability to: Declare Emergency; Request DR plan to be activated; Approval authority to activate DR plan
Actions Responsible for: Keeps HR director updated; Keeps Facility director updated; Keeps CIO updated; Ensuring General Manager is aware of options; Keeping Senior Management updated on event handling; Coordination with Human Resources on event handling; Participates in general status communications

Role: Senior Management
Has Ability to: Declare Emergency; Request DR plan to be activated; Approval authority to activate DR plan
Actions Responsible for: Overall Oversight / set direction; Coordination with Executive Management Team

Role: Information Technology
Has Ability to: Request DR plan to be activated
Actions Responsible for: Initiates Technology tasks per DR plan; Communicates status of technology; Updates GM as to IT tasks/progress of DR plan action status

Role: Human Resources (Dir/Delegate)
Has Ability to: Request DR plan to be activated; Approval authority to activate DR plan
Actions Responsible for: Ensures any legal, employee safety concerns are being addressed

Role: Facilities (Dir/Delegate)
Has Ability to: Request DR plan to be activated
Actions Responsible for: Keeping Senior Management updated on event handling

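The "Has Ability to" column of the Responsibility Matrix can be expressed as a simple lookup, which makes it easy to answer questions such as who may approve activation of the plan. This is a minimal Python sketch; the names ABILITIES and roles_able_to and the shortened ability labels are assumptions introduced for illustration, not part of the Archive Concepts plan.

```python
# Abilities per role, transcribed from the Responsibility Matrix.
# The short labels ("declare emergency", etc.) are illustrative stand-ins
# for the matrix wording.
ABILITIES = {
    "General Manager": {"declare emergency", "request activation"},
    "Regional General Manager": {
        "declare emergency", "request activation", "approve activation",
    },
    "Senior Management": {
        "declare emergency", "request activation", "approve activation",
    },
    "Information Technology": {"request activation"},
    "Human Resources": {"request activation", "approve activation"},
    "Facilities": {"request activation"},
}


def roles_able_to(ability):
    """Return the roles holding the given ability, sorted by name."""
    return sorted(role for role, held in ABILITIES.items() if ability in held)


if __name__ == "__main__":
    print("May approve activation:", ", ".join(roles_able_to("approve activation")))
```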
Resources:
The Archive Concepts disaster recovery team will leverage many internal and external
resources. Internal resources can include business analysts, end users, application owners,
managers and any other resource that is required. External resources will mainly consist of
primary vendors such as NetApp, VMware, CDW, and a variety of others on an as needed basis.
Tools:
The primary tool used for disaster recovery is the NetApp SnapMirror technology, which
allows all data from the primary East Stroudsburg datacenter facility to be replicated to the
secondary San Diego backup facility. Since Archive Concepts systems are virtualized and stored
on the NetApp storage system, this makes for a safe and easy recovery process. The following flow
All other flat files that may be contained within other storage systems are taken care of by
DoubleTake replication technology. All data is destined to a NetApp storage system where it is
Sites:
There are two sites covered within this disaster recovery document, the primary East
Recovery Procedure:
leadership will determine the appropriate data recovery strategy. The data recovery processes
shall reflect Archive Concepts’ information system priorities as outlined in the disaster recovery
timeframe matrix. Data recovery activities shall take place in a pre-planned sequential fashion so
that system components can be restored in a logical manner and should take into consideration:
Personnel: The IT leadership and workforce members as well as all the disaster recovery team
members involved in disaster recovery processes will be the most valuable resource. These
individuals may be asked to work above and beyond normal working hours. Archive Concepts
Communication: Notification of internal and external business partners will be carried out by
Salvage of Existing IT Equipment: Initial data recovery efforts will be targeted at protecting and
preserving the current media, equipment, applications and systems. A priority will be to identify
if the network layer and NetApp storage system are recoverable. The IT equipment will be further
protected from the elements or removed to a safe location away from the disaster site if
necessary, and an immediate shift in focus to the secondary backup site in San Diego will be
initiated.
Designate Recovery Site: It will be necessary to determine if the data recovery efforts can be
carried out at the original primary East Stroudsburg site or moved to the secondary backup San
Diego location. The choice of using the primary site or the secondary backup site will be
dependent on the damage and estimated recovery time of the primary data center location.
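The site decision can be illustrated with a small sketch. The plan states only that the choice depends on the damage and the estimated recovery time of the primary data center; comparing that estimate against the 24-hour target for critical systems is an assumption made here for illustration, as are the function and parameter names.

```python
# Target from the plan: critical systems back online within 24 hours.
CRITICAL_RTO_HOURS = 24


def choose_recovery_site(primary_usable, estimated_primary_repair_hours):
    """Pick the site where recovery work should take place.

    primary_usable: True if the damage assessment found the East Stroudsburg
        facility safe and operable (hypothetical input).
    estimated_primary_repair_hours: estimated time to restore the primary site.

    The threshold rule is an illustrative assumption, not a rule from the plan.
    """
    if primary_usable and estimated_primary_repair_hours <= CRITICAL_RTO_HOURS:
        return "East Stroudsburg (primary)"
    return "San Diego (secondary)"


if __name__ == "__main__":
    print(choose_recovery_site(True, 6))
    print(choose_recovery_site(True, 48))
    print(choose_recovery_site(False, 1))
```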
Backup Equipment: The datacenter facility in San Diego has mirrored data to an existing NetApp
storage system. Primary operations can be transferred to this facility. A new VMware
environment can be quickly built and attached to the datastores already existing at the backup
facility. However, the recovery process will rely heavily on the ability of the Archive Concepts
vendors to quickly provide replacements for the resources which were not thought of or cannot
be salvaged from the primary facility. Emergency procurement processes will be implemented to
allow the IT leadership to quickly replace equipment, supplies, software and any other items
Restoration of Data from Backups: Data recovery will rely on the availability of the backup data
from the secondary backup site. Initial data recovery efforts will focus on restoring the VMware
Restoration of Applications: IT leadership will work with the individual departments and
application owners to restore each running application. This should be a painless process since
all VM images contain working application profiles that were replicated in real time prior to
the disaster. However, it is possible that application owners must address issues that arise as a
result of the new environment.
Move Back to Primary East Stroudsburg Site: Since the disaster recovery process has taken place
at the secondary backup San Diego site, the systems that have been brought online at the
secondary site will need to be replicated to the original site when it becomes available. After
that, the primary data center operations can be shifted back to the original location in East
Data is replicated to the secondary San Diego facility in real time. All Archive Concepts
systems are in a virtualized environment. All VM datastores, which contain the virtual server images,
are stored on the NetApp systems. Since all data is current, an
interim VM environment can be used at the secondary backup facility. When the primary
datacenter facility has been restored, current data from San Diego will be replicated back to East
____________
Section Three:
____________
Introduction:
Concepts disaster recovery plan document, policies and procedures. The disaster recovery plan
documentation is considered a living document and will need to be regularly reviewed for
accuracy given the changes in the business environment. The objective of tests is to obtain the
most value from the disaster recovery procedures. The use of test objectives and success criteria
makes it possible to measure the effectiveness of the disaster recovery plan and supports business continuity.
_______________
Maintenance of the Archive Concepts disaster recovery plan and the other business
continuity policies is critical to the success of an actual recovery and the stability of the
company. The disaster recovery plan must reflect changes to the system and networking
environments that are defined within the disaster recovery plan and be updated as appropriate. If
new technology is implemented then it needs to be addressed and added to the disaster recovery
plan. Conversely, if systems are decommissioned, they need to be removed from the disaster
recovery plan and objectives need to be redefined appropriately. The Archive Concepts disaster
recovery plan and other business continuity policies should be reviewed at least on a semiannual
basis; the review is currently scheduled quarterly. Each department should review the systems
that it owns and report to IT any changes that have been made to the environment since the
last review. The IT department will then update the disaster recovery plan appropriately.
Testing a disaster recovery plan can be very complex depending on the environment. The
overall objective of a disaster recovery plan is to replicate parts of or the entire existing IT
production environment at an alternate site until normal operations have been resumed at the
primary facility. An actual test of the recovery procedures defined within the Archive Concepts
disaster recovery plan should be conducted at least twice a year to ensure the backup
technologies and procedures are still functioning and relevant. With the current technologies in
use by Archive Concepts this should be a relatively painless process. Since all critical data is on
the NetApp system, engineers can check the replication status at any time to ensure compliance.
If there are any faults in replication status, they can be addressed at the time of discovery. If all
mirror replications have a mirrored status then all is well. Since all of the production data,
including CIFS shares, VM datastores and an assortment of other LUNs, is present on the primary
NetApp storage system, all of this data will be replicated to the secondary San Diego facility.
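The replication check described above amounts to flagging every mirror relationship whose status is anything other than mirrored. A minimal Python sketch follows; the record layout (source, destination, state), the volume names and the function name are hypothetical and do not reproduce the output of any NetApp command.

```python
def unhealthy_mirrors(relationships):
    """Return the relationships whose state is not 'mirrored'.

    relationships: iterable of (source, destination, state) tuples.
    This tuple layout is a hypothetical format chosen for the sketch.
    """
    return [r for r in relationships if r[2].strip().lower() != "mirrored"]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        ("vol_vmware", "dr_vol_vmware", "mirrored"),
        ("vol_cifs", "dr_vol_cifs", "pending"),
    ]
    for source, destination, state in unhealthy_mirrors(sample):
        print(f"Needs attention: {source} -> {destination} ({state})")
```

An empty result corresponds to the case where all mirror replications have a mirrored status.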
replicated VM datastore to another volume at the San Diego facility and attaching that volume to
the test VM environment. This is exactly the procedure during disaster recovery minus the need
for snapmirroring to another volume. Once the datastore is mounted to the test VM environment,
access to the individual virtual server images and configuration files will be granted. At that
point the test VM can be powered on to test access to the operating system and production
data. Any errors uncovered during the testing can be addressed without any impact to the
production environment.
The testing procedures should not interrupt production processes. Testing the transfer of
the network layer to the secondary San Diego facility is not possible during business hours.
However, a simulation of the transfer can be completed with software such as Cisco Packet
Tracer, where network equipment configs can be loaded into simulated IOS devices to model the
actual Archive Concepts network layer. Accurate tests can be performed using this method to
ensure packets are routed between facilities in the event of a disaster at the primary East
Stroudsburg facility.
Once all testing is complete and errors are addressed, an update to the disaster recovery
should review and give final authorization to the recommended documentation changes. Again,
the disaster recovery plan as well as the testing processes are living entities and should be
________________
Project Summary:
________________
plan. The company is Archive Concepts, LLC. I am the legitimate owner of Archive Concepts.
However, some of the facts about the company needed to be changed to accurately describe what a
disaster recovery plan for a medium to large business should consist of. This paper is divided
into three parts: section one, section two and section three. Section one discussed the background
of the organization, defined and explained two disaster scenarios, analyzed the risks, threats and
impacts to the organization and discussed the recovery strategy. Section two contained the
recovery team mission statement, scope of work for the recovery team, the timeframe to
recovery, team participants, tools, sites in which to recover and the recovery procedures.
Stakeholders were identified, along with how they are impacted and how they would be updated
throughout the recovery process. Section three discussed the importance of testing the disaster
recovery plan and reviewing it on a regular basis for accuracy and relevance. The end goal is to
This paper gave me the opportunity to explore different ideas and write about my own
engineering experiences. The topics explained in this document are actual technologies I have
used for successful backup procedures and disaster recoveries. In the end I can use the
information contained in this document for future reference to draft disaster recovery plans for
other companies I deal with. I will obviously continue my research into new technologies to
__________
Appendices:
__________
Appendix I:
This Visio file shows the logical flow of data replication to the secondary San Diego
datacenter facility.
Appendix II:
__________
References:
__________
Bahan, C. (2003, June). The Disaster Recovery Plan. Retrieved December 10, 2012, from
SANS.org: http://www.sans.org/reading_room/whitepapers/recovery/disaster-recovery-plan_1164
Federal Emergency Management Agency. (2004, August 22). Are You Ready Guide. Retrieved October 20, 2012, from
Ready.gov: http://www.fema.gov/pdf/areyouready/areyouready_full.pdf
Smith, T. J. (2002, November). CSA Explains Disaster Recovery. Retrieved December 10,
2012, from The Wiglaf Journal:
http://www.wiglafjournal.com/uncategorized/2002/11/csa-explains-disaster-recovery/
Swanson, M. (2010, May). Contingency Planning Guide for Federal Information Systems.
Retrieved December 10, 2012, from csrc.nist.gov:
http://csrc.nist.gov/publications/nistpubs/800-34-rev1/sp800-34-rev1_errata-Nov11-2010.pdf