State Data Center

Oregon

Disaster Recovery Overview
Presented to: Disaster Recovery Coordinators Meeting Date: 3/19/2009

SDC Service Continuity View

[Figure: scale of impact to service delivery]
Impact to service delivery, lowest to highest: Normal Operations, Severity 4, Severity 3, Severity 2, Severity 1, Disaster
• Low end of the scale: bug or minor issue where application is still functioning
• High end of the scale: major issue with high impact - equipment not usable
SDC state across the scale: SDC can be used, SDC partially usable, SDC not usable
• Operational Recovery - restoration of normal service is part of standard rates
• Disaster Recovery - additional costs apply
• Grey area: expected time to return equipment to normal service will determine whether DR is invoked

Successful DR Requires Cooperation

Participating Agency
• Plans DR based on business needs and priorities
• Acquires DR services for out-of-scope IT
• Funds DR planning, backup & recovery
• Prioritizes recovery sequence within agency
• Tests agency DR plans
• Determines scope and declares disaster for out-of-scope IT
• Arranges for backups of data and applications
• Keeps vendor informed of changes

SDC
• Plans internal continuity of service delivery if infrastructure must relocate
• Contracts with DR vendor to provide infrastructure environment in case of SDC disaster
• Determines scope and declares SDC disaster to DR vendor
• Coordinates Cross-Agency priority sequencing
• Tests SDC DR plans
• Coordinates movement of people, backup resources, and communications connectivity
• Keeps vendor informed of changes

Disaster Recovery Vendor
• Provides environment for disaster recovery and testing ("hot sites" and portable sites)
• Hosts DR tests
• May provide consulting and operations support as contracted

What BCP Coordinators need to know
• Who is your DR coordinator?
• Has your agency done DR planning?
• What applications are needed to support critical business functions?
• Where are those applications hosted?
• What are the disaster recovery time objective (RTO) and recovery point objective (RPO) for each of those applications?
• Are the applications and their data backed up frequently enough to meet RPO?
• Is the recovery option and grouping of backups for each application reasonable for the RTO?
• Will the agency's budget planning support the cost associated with meeting the desired RTO and RPO level?
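The question of whether backups are frequent enough to meet RPO can be checked mechanically. The following is a minimal sketch, not an SDC tool: it assumes the worst-case data loss equals one full interval between backups, and the function name, application names, and figures are hypothetical.

```python
from datetime import timedelta


def backups_meet_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case data loss fits within the RPO.

    Assumption: a disaster just before the next backup loses one full
    backup interval of data. Backup duration and off-site transfer time
    are ignored in this sketch.
    """
    return backup_interval <= rpo


# Hypothetical applications: (name, time between backups, RPO)
applications = [
    ("payroll", timedelta(hours=24), timedelta(days=1)),
    ("licensing", timedelta(days=7), timedelta(days=1)),
]

for name, interval, rpo in applications:
    verdict = "meets RPO" if backups_meet_rpo(interval, rpo) else "backups too infrequent"
    print(f"{name}: {verdict}")
```

An application that fails this check needs either more frequent backups or a longer RPO agreed with the business.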

What DR Coordinators need to know
• Answers to all the questions in What do BCP Coordinators need to know, plus:
  - Who is your BCP coordinator?
  - What agency infrastructure will need to be recovered before recovered applications and data will be accessible to users? (e.g., DNS, Active Directory, LDAP, networks)
  - What communications vehicles are expected to be available during a disaster? (e.g., BlackBerry, email, IM)
  - What are the recovery procedures for agency infrastructure, communications, applications, and data?
  - What are DR testing plans?
  - What are the procedures for keeping all of this up to date?

SDC DR Project Actions
• Develop DR planning framework and templates with SunGard
• Identify, scope and develop backup and recovery for SDC core infrastructure and infrastructure needed to support agency recovery requirements
• Assist agencies with identifying and scoping DR requirements for their infrastructure, applications and data
• Develop and implement tiered DR strategies
• Develop DR test plans and execute initial tests
• Develop and implement DR maintenance process

Working with the SDC on DR Planning
• Submit request for DR planning and preparation through normal agency procedures
• Provide initial information on DR requirements
• Once potential solutions are scoped and priced, get agency approval to proceed
• Provide detailed planning information
• Plan agency testing

Key data for DR Planning

Needed for getting to more detail
• Agency: Your agency acronym
• Application Name or ACRONYM: How is the application most commonly known?
• Technical Contact: Who could answer questions about the infrastructure needs of the application?
• Technical Contact phone/email: What is the best way to reach the primary technician?

Needed for planning the best recovery strategy
• Recovery Time Objectives - Days (RTO): If a disaster occurred at this moment, how long could the business work around not having this application available?
• Recovery Time Objectives (RTO) Special Notes: Special conditions for restoration within this RTO - Would the RTO differ at different times of the month or year?

Needed for planning the best backup strategy
• Recovery Point Objectives - Days (RPO): How many days' worth of new data can be lost or recreated by other means? Are yesterday's backups good enough to recover from? Last week's?

Needed for estimating cost and aggregating need
• Software Component / Database Name: A single application can consist of one or many components. Please list all primary components, e.g. Crystal Reports, ColdFusion, Database name, etc.
• Software Component Vendor: Who is the component manufacturer? e.g. Microsoft, Oracle, IBM, etc.
• Software Component Type(s): What's the component's primary function? e.g. dbms, reporting, connectivity, data conversion, etc.
• Server or LPAR Name: List every computer or server or appliance that makes up the entire application environment.
• Server or LPAR Operating System: Generally what type of server or LPAR is it? e.g. Windows, Unix, Intel Linux, zLinux, iSeries, Mainframe, etc.
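An agency that tracks this planning data electronically could hold one record per application. The sketch below is illustrative only: the class and field names are hypothetical, and modeling components and servers as two separate lists is an assumption, since the slide does not say how they relate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SoftwareComponent:
    name: str                   # e.g. a reporting tool or database name
    vendor: str                 # component manufacturer
    component_types: List[str]  # primary function(s), e.g. ["dbms"]


@dataclass
class Server:
    name: str                   # server or LPAR name
    operating_system: str       # e.g. "Windows", "zLinux"


@dataclass
class DRPlanningRecord:
    agency: str
    application_name: str
    technical_contact: str
    technical_contact_reach: str  # phone or email
    rto_days: float
    rpo_days: float
    rto_special_notes: str = ""
    components: List[SoftwareComponent] = field(default_factory=list)
    servers: List[Server] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """List the fields that still need to be supplied (hypothetical check)."""
        required_text = [
            "agency",
            "application_name",
            "technical_contact",
            "technical_contact_reach",
        ]
        missing = [n for n in required_text if not getattr(self, n).strip()]
        if not self.components:
            missing.append("components")
        if not self.servers:
            missing.append("servers")
        return missing
```

Calling `missing_fields()` before submitting a planning request gives a quick completeness check on the record.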

Recovery Options

Relative Cost for DR | Recovery Category | Recovery Option | Recovery Time Target | Comments
$$$$$ | A++ | Immediate recovery | 0 hrs | 'Mirroring', 'load balancing', or 'split site' - automatic fail-over to site away from home site
$$$$ | A+ | Fast Recovery | < 48 hrs | Hot standby - 'mirroring' at site away from home site - fail-over requires some actions
$$$ | A | Intermediate recovery | 48 - 72 hrs | Warm standby - first wave of recoveries
$$$ | B | Intermediate recovery | > 72 hrs, < 1 week | Warm standby - subsequent waves
$$ | C | Gradual Recovery | 1 - 4 weeks | cold standby or lower priority recovery in warm standby site
$$ | D | Gradual Recovery | > 4 weeks | cold standby or lower priority recovery in warm standby site
$ | E | Manual Workarounds | as time permits | wait until return to home site
$ | F | Replace | none | System will be replaced or rebuilt from scratch
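The recovery time targets for categories A++ through D can be read as a lookup from an application's RTO to a recovery category. The following is a rough sketch under stated assumptions: the function name is hypothetical, one week is taken as 168 hours, and the handling of exact boundary values (48 hrs, 72 hrs, 1 week, 4 weeks) is a guess, since the slide does not define it. Categories E and F have no time target and are chosen by business decision, so the lookup never returns them.

```python
HOURS_PER_WEEK = 7 * 24


def recovery_category(rto_hours: float) -> str:
    """Map an RTO in hours to a recovery category (A++ through D)."""
    if rto_hours < 0:
        raise ValueError("RTO cannot be negative")
    if rto_hours == 0:
        return "A++"  # immediate recovery, automatic fail-over
    if rto_hours < 48:
        return "A+"   # fast recovery, hot standby
    if rto_hours <= 72:
        return "A"    # warm standby, first wave
    if rto_hours < HOURS_PER_WEEK:
        return "B"    # warm standby, subsequent waves
    if rto_hours <= 4 * HOURS_PER_WEEK:
        return "C"    # gradual recovery, 1 - 4 weeks
    return "D"        # gradual recovery, more than 4 weeks


for hours in (0, 24, 60, 100, 300, 1000):
    print(f"RTO {hours} hrs: category {recovery_category(hours)}")
```

Because relative cost rises toward A++, a lookup like this also gives a first indication of what a requested RTO will cost.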

Recovery Timeline

[Figure: recovery timeline]
Events, in order: Last backup or data replication, Disaster event, Systems recovered, Business process meeting SLAs, Business continuity protection restored
Intervals shown:
• RPO - lost transactions
• RTO - work backlog, workaround procedures
• Work Recovery - recover lost transactions, accomplish backlogged work
• Restoration Time - rebuild business continuity systems
• MAD

* Source: Building a Business Impact Analysis: The Keystone to Effective Business Continuity Planning by Richard Jones, Burton Group, v110 7/30/2008.

Definitions for Recovery Timeline
• MAD - Maximum Allowable Downtime, the maximum amount of time the business can suffer an inoperable business process before significant negative consequences are felt. Also called Maximum Acceptable Outage (MAO), Maximum Allowable Outage (MAO), Maximum Acceptable Downtime (MAD), Maximum Tolerable Outage (MTO), Maximum Tolerable Downtime (MTD), and Maximum Tolerable Period of Disruption (MTPD).
• RPO - Recovery Point Objective, the amount of IT systems data or transaction loss that can be tolerated by the business process. Lost transactions must be recovered manually and procedures should be in place to accomplish this work.
• RTO - Recovery Time Objective, the time IT organizations have to recover their systems to an agreed upon operational state so that workers may then recover the lost time of the outage to bring the business process back to acceptable service levels.
• Work Recovery - The work time required to recover the lost transactions of the RPO time plus the backlog of work created during the system outage.
• Restoration time - Time to bring the business process back to a state of full business continuity protection. Basically this is backing up the recovered system and restoring redundancy capabilities.
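After a DR test or an actual event, the intervals defined above can be measured from the timeline's event timestamps and compared with the objectives. This is a minimal sketch with hypothetical names and dates. Two choices in it are assumptions, not statements from the slides: business downtime (the figure to compare with MAD) is measured from the disaster event until the business process meets SLAs, and restoration time is measured from that point until business continuity protection is restored.

```python
from datetime import datetime, timedelta
from typing import Dict


def timeline_intervals(
    last_backup: datetime,
    disaster: datetime,
    systems_recovered: datetime,
    meeting_slas: datetime,
    protection_restored: datetime,
) -> Dict[str, timedelta]:
    """Compute the measured intervals between recovery timeline events."""
    events = [last_backup, disaster, systems_recovered, meeting_slas, protection_restored]
    if events != sorted(events):
        raise ValueError("timeline events must be in chronological order")
    return {
        "data_loss_window": disaster - last_backup,         # compare with RPO
        "system_recovery": systems_recovered - disaster,    # compare with RTO
        "work_recovery": meeting_slas - systems_recovered,
        "business_downtime": meeting_slas - disaster,       # compare with MAD (assumed span)
        "restoration": protection_restored - meeting_slas,  # assumed span
    }


# Hypothetical event times
measured = timeline_intervals(
    last_backup=datetime(2009, 3, 18, 23, 0),
    disaster=datetime(2009, 3, 19, 9, 0),
    systems_recovered=datetime(2009, 3, 20, 9, 0),
    meeting_slas=datetime(2009, 3, 21, 9, 0),
    protection_restored=datetime(2009, 3, 23, 9, 0),
)
for label, span in measured.items():
    print(f"{label}: {span}")
```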
