Wipro Storage Practice

Issues and Challenges in BCP/DR Implementation

Evolution of BCP



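[Figure: evolution of BCP. Column headings: Business Focus, Requirements Driven By, Recovery Expectations, Decision. Business focus labels: restore and recover; high availability; mission critical (application driven) 24x7. Driver labels: hardware, competition, regulation. Recovery expectation labels: days/hours; minutes/seconds; continuous. Decision label: mandatory.]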


Source : http://www.snia.org

Information Service Availability

» BCP
» Disaster Recovery
» Replication
» Local High Availability (Clustering)
» Disk/Volume Management (Mirroring)
» Stable Backups

Information Service Availability

The time to restore business operations is a function of COST, COMPLEXITY and AVAILABILITY REQUIREMENTS.

[Figure: recovery spectrum with the labels continuous availability, rapid recovery and slow recovery. Techniques shown: mirroring, synchronous replication, asynchronous replication, tape backup.]




Availability: Myths of 9's

% Uptime    % Downtime   Downtime / Year   Downtime / Week
98          2            7.3 days          3.4 hrs
99          1            3.65 days         1.7 hrs
99.9        0.1          8.76 hrs          10.1 mts
99.99       0.01         52.6 mts          1 mt
99.999      0.001        5.26 mts          6 sec
99.9999     0.0001       31.5 sec          0.6 sec

If there are 7 components in the solution, each with 99.99% availability, overall availability comes down to 99.93%.
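The arithmetic behind the "nines" can be sketched in a few lines. This is a minimal illustration, assuming a 365-day year, a 7-day week, and components that fail independently and are all required (a serial chain); the function names are made up for this example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600
SECONDS_PER_WEEK = 7 * 24 * 3600


def downtime_seconds(availability_pct, period_seconds):
    """Expected downtime in a period for a given availability percentage."""
    return (1 - availability_pct / 100) * period_seconds


def serial_availability(component_pcts):
    """Overall availability (%) when every component must be up.

    Assumes independent failures, so the availabilities multiply.
    """
    overall = 1.0
    for pct in component_pcts:
        overall *= pct / 100
    return overall * 100


if __name__ == "__main__":
    for pct in (98, 99, 99.9, 99.99, 99.999, 99.9999):
        per_year_h = downtime_seconds(pct, SECONDS_PER_YEAR) / 3600
        per_week_min = downtime_seconds(pct, SECONDS_PER_WEEK) / 60
        print(f"{pct:>8}%  {per_year_h:10.3f} h/year  {per_week_min:9.3f} min/week")

    # Seven components at 99.99% each, all required for the service
    print(f"{serial_availability([99.99] * 7):.2f}%")
```

The serial case is the point of the slide: each extra component in the chain lowers the availability of the whole solution, even when every component looks highly available on its own.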

Enablers of BCP

What a DISASTER means for your org?
Disaster recovery is the invocation of previously developed policies and procedures for data recovery when a threat turns into a REAL EVENT.

Some of the basic enablers required for BCP are protection for:
» HARDWARE
» APPLICATIONS
» NETWORK (IP AND STORAGE)

Key elements to enable BCP

Hardware Protection
Some of the common techniques used today are
» RAID
» Clustering

Data Protection
Some of the common techniques used today are
» Backups to secondary media
» Snapshots
» Replication

Disaster Recovery Planning

Disaster Recovery Planning (DR)

A disaster is any event that disrupts normal business processes.

A disaster recovery plan is a set of procedures to
» Avoid or reduce the risk of a disaster
» Minimize the effects of a disaster
» Quickly re-establish business critical processes after a disaster

Disaster recovery planning involves an analysis of business processes and the attendant continuity needs.

Note (slide 8, End User, 11/8/2003): Data protection technologies still need to be surrounded by adequate processes to guarantee their effectiveness. E.g.: backup job verification; rotation schemes to suit secondary media life.

Murphy's Laws:
» "Anything that can go wrong will go wrong".
» "If anything simply cannot go wrong ..."

» Data drives competitive advantage and market share.
» Business liability depends heavily on the safety and accessibility of data.
» Government and legal regulations mandate data protection and privacy.

"An industry survey suggests that it takes about 20 days and costs thousands of dollars to re-type 20 MB of sales and accounting data."

As of early 2003, 80% of all US-based companies and 91% of all European-based companies did not have a formal disaster recovery/business continuance plan.

Note (slide 9, End User, 11/9/2003): The US $ numbers are relevant to Indian business as a rough measure. The reliance of the service sector on the US economy adds to the appropriateness. Give examples based on audience business mix. E.g.: clients might choose a competitor based on the DR/DP processes in place. Business liability (Data Protection Act, privacy laws etc.): India is drafting one as we speak.

Before embarking on a DR plan, it is important to remember that...
» Buy-in from the management is crucial for success.
» DR for IT infrastructure and services is only one component of an effective business continuity plan.
» Look beyond "just backups".
» DR planning is a continuous process.
» No single plan can fit every organization's goals.
» It is better to have an alternate plan than none at all.
» Consult with peers in your industry who have implemented, or are looking at implementing, a DR plan.

Key to successful DR Implementation
» Pre-Planning: Project Initiation
» Risk and Business Impact Analysis
» Requirements Definition
» Mitigation Plan Development
» Evaluate DR/BCP Solutions
» Solution Implementation
» Testing/Exercising
» Sustenance Program

Pre-Planning: Project Initiation

Building up the "Management Buy-In"
» Provide management with a comprehensive understanding of the total effort required to develop and maintain an effective recovery plan*.
» Document the impact of an extended loss to operations and key business functions.
» Illustrate past failures and their adverse impact on the organization and its customer base.
» Get an agreement of support from the CIO/CEO.

Describe how the plan can be used as an advantage in the marketplace.
» Reduces the risk of losing existing and new customers due to downtime
» Reduces client risk
» Shows the company's organizational maturity

Build a case around protecting data in accordance with government regulations or meeting legal requirements that help a corporation avoid liability.

NPV (Net Present Value) is a compelling basis for assessing the value of managing risk.

Note (slide 12, End User, 11/8/2003): Budgets are shrinking and any spending that does not directly drive revenue will be severely scrutinized. Total cost of ownership (TCO), return on investment (RoI) or the amortization of cost as overhead fall short of adequately quantifying the value of a DR plan. One approach to measuring the business value of data protection that is working well for IT professionals engaged in risk analysis is Net Present Value (NPV). NPV accumulates today's value of future cash flows over a given period.
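The NPV idea mentioned in this note can be sketched as follows. This is only an illustration of the standard discounted cash flow formula; the discount rate, the up-front cost and the yearly "avoided loss" figures are hypothetical and do not come from the presentation.

```python
def npv(rate, cash_flows):
    """Net present value of a series of yearly cash flows.

    cash_flows[0] happens today (t = 0); cash_flows[t] happens at the
    end of year t and is discounted by (1 + rate) ** t.
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


if __name__ == "__main__":
    # Hypothetical DR investment: 100,000 spent today, and an expected
    # 45,000 of avoided downtime losses in each of the next three years.
    dr_cash_flows = [-100_000, 45_000, 45_000, 45_000]
    discount_rate = 0.10  # assumed cost of capital
    print(f"NPV of the DR investment: {npv(discount_rate, dr_cash_flows):,.2f}")
```

A positive result suggests the expected risk reduction is worth more, in today's money, than the plan costs; the hard part in practice is estimating the avoided losses, which is what the business impact analysis is for.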

Pre-Planning: Project Initiation

DR Planning Awareness Program
» Evangelize the need for and benefits of a DR plan to all the affected constituents.
» Build support within your organization, for the success of the project and its further maintenance.

Schedule Interviews
» Develop an understanding of the company's business processes. This should cover as many of the business units and locations as possible.
» Assess the resources available internally.

PROFESSIONAL DR TEAM FORMATION


Risk and Business Impact Analysis

A Disaster Recovery Plan should prioritize and assess the impact of these (and other) possible events:
» Computer software or hardware failures
» Power disruptions or failure
» Computer shutdowns due to worms, viruses, hackers etc.
» Loss of key personnel
» Natural disasters (flood, earthquakes, storms, fire)
» Man-made disasters (civil strife, terrorist acts, international war)

A Business Impact Assessment Report should document the impact of the above events on
» Every major business operation (Customer Support, Services, Sales, Marketing, Finance, HR, Legal etc.)
» Critical applications and systems
» List of restored functions in order of priority

Define "DISASTER" for every IT asset (h/w, application).

Note (slide 14, End User, 11/14/2003): Page 19 Table.

Requirements Definition

Document existing systems and processes. This should cover at a minimum
» Information Systems (Email, DB, File Servers, ERP etc.)
» Network and Operations Services
» Voice Communications
» Technology Support
» Key Business Units Priorities and Processes

Analyze the BIA report.

Use the following metrics for developing the requirements
» RPO (Recovery Point Objective)
» RTO (Recovery Time Objective)
» RCA (Risk Coverage Allocation): Personnel, Capital, On-going costs
» Capital Costs
» NPV of Risk

Prioritize all your requirements
» Dependency based

Develop the base project management plan.
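One possible reading of "dependency based" prioritization is to order recovery so that every system comes after the systems it relies on, which is a topological sort. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the systems and their dependencies are hypothetical examples, not taken from the presentation.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be
# restored before it can be brought back.
DEPENDS_ON = {
    "Network": set(),
    "Storage": set(),
    "DNS": {"Network"},
    "Database": {"Storage", "Network"},
    "Email": {"Network", "DNS"},
    "ERP": {"Database", "Network"},
}


def recovery_order(depends_on):
    """Return one restore order in which prerequisites always come first."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    for step, system in enumerate(recovery_order(DEPENDS_ON), start=1):
        print(step, system)
```

Several orders can be valid for the same map; business priority (from the BIA) would then break ties among systems whose prerequisites are already restored.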

Note (slide 15, End User, 11/14/2003): An effective BIA will assess the impact of disaster over time. Typically, there is a breathing space before the impact begins to bite. After all, many organizations close down for weekends and public holidays. The length of time depends in part on the process and in part on the industry. In real-time financial operations, the time window may be minutes. For other organizations, it may be days or even weeks. The impact analysis has to identify what this time window is by which recovery has to be in place. This time window is commonly known as the ...

Recovery Metrics

[Figure: a timeline centred on the disaster event, with both directions scaled in secs, mins, hours, days and weeks. One side is labelled Lost Data, the other Time To Resume Business. A Cost label appears on each side.]

» Recovery Point Objective: max data loss you can tolerate
» Recovery Time Objective: max downtime you can tolerate

Define and freeze the SLAs for all applications.
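The two metrics can be turned into a simple check of a proposed protection scheme against an application's SLA. This is a minimal sketch: the class and function names are invented for illustration, and the SLA and scheme figures are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    rpo: timedelta  # max data loss you can tolerate
    rto: timedelta  # max downtime you can tolerate


def check_plan(objective, recovery_point_interval, estimated_restore_time):
    """Return (rpo_met, rto_met) for a proposed protection scheme.

    Worst case, a failure strikes just before the next recovery point is
    taken, so the data lost equals the full interval between points.
    """
    rpo_met = recovery_point_interval <= objective.rpo
    rto_met = estimated_restore_time <= objective.rto
    return rpo_met, rto_met


if __name__ == "__main__":
    # Hypothetical SLA: lose at most 15 minutes of data, resume within 4 hours.
    sla = RecoveryObjective(rpo=timedelta(minutes=15), rto=timedelta(hours=4))

    # Nightly tape backup with an estimated 12-hour restore.
    print(check_plan(sla, timedelta(hours=24), timedelta(hours=12)))

    # Asynchronous replication every 5 minutes, failover in 30 minutes.
    print(check_plan(sla, timedelta(minutes=5), timedelta(minutes=30)))
```

Running the same check per application makes it easier to see which applications can stay on cheaper schemes and which ones need replication.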

Mitigation Plan Development

Plan Scope, Objectives and Assumptions

Assemble Team
» Define team responsibilities and roles
» All critical functions should have multiple owners
» Maintain a personnel directory. Every team member should have a hard copy.

Plan Progress Binder

Develop prevention processes
» Maintain good general housekeeping
» Observe physical security procedures
» Observe information security procedures

Recovery Preparedness
» Cold sites and hot sites
» On-site spares
» Remote skills and manpower allocation
» Latest set of recovery documentation

Note (slide 17, End User, 11/14/2003): Status Report Form. Document discussions, progress, gaps.

Evaluate DR/BCP Solutions

Classifying solutions
» Primary (local) vs. Remote
» Active vs. Passive

Classify servers/applications/databases

Local Solutions
» High availability through RAID, clustering
» Internal vs. external storage
» SAN attached storage subsystems
» RAID across multiple storage subsystems
» Backup to tape or disk
» Media issues
» Failure rate of backups
» Recovery issues
» Snapshots
» Extended SAN solutions (MAN) via DWDM

Remote Solutions (>120 KM)
» Offsite storage of tape
» Replication
» Second site
» Service provider
» SAN extensions
» FCIP

Evaluate DR/BCP Solutions

Primary (Local) vs. Remote
» High availability through local or WAN based clusters
» Application level clustering (databases etc.), could be Active-Active
» Replication solutions: volume, file and app level
» Distance limitations
» Storage based replication or host based replication

Active-Passive
» Standby servers
» No performance impact on app; distance agnostic

Infrastructure
» UPS, generators
» Use multiple vendors for WAN connectivity
» Backup lines

Generate a CBA (cost-benefit analysis) for these solutions and arrive at the best!

Example DP/DR Scenario

[Figure: example topology. Sites labelled: Branch Office; Second Data Center; Service Provider: Remote DR Site; Cold Site: Secure Offsite Tape Vault. Connectivity labelled: WAN (public/private), firewall + router (two instances), monitoring. Equipment labelled: 10 TB DAS, backup server, database servers, app servers, NAS filer, redundant SAN fabric, storage arrays, tape library, SAN attached tape libraries.]

Network/WAN based considerations

Data replication to remote sites requires additional planning
• IP address space
• DNS resolution
• Redundant DR links
  » The distance between sites determines the options
  » Multiple link providers
  » At least one high speed link desirable
• Initial data synchronization strategy

A practical observation: with reasonable latency and 30% network overhead, one can expect to transfer 630 MB/hour on an E1 link.
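The 630 MB/hour figure can be reproduced with a short calculation. This sketch assumes the E1 rate is rounded to 2 Mb/s and that 1 MB is 8 megabits; with the nominal E1 line rate of 2.048 Mb/s the same formula gives about 645 MB/hour. The function name is invented for this example.

```python
def effective_mb_per_hour(link_mbps, overhead_fraction):
    """Usable throughput in megabytes per hour.

    link_mbps is the link rate in megabits per second; overhead_fraction
    is the share of that rate lost to protocol and network overhead.
    """
    usable_mbps = link_mbps * (1 - overhead_fraction)
    return usable_mbps / 8 * 3600  # bits to bytes, seconds to hours


if __name__ == "__main__":
    print(f"E1 rounded to 2 Mb/s: {effective_mb_per_hour(2.0, 0.30):.0f} MB/hour")
    print(f"E1 at 2.048 Mb/s:     {effective_mb_per_hour(2.048, 0.30):.0f} MB/hour")
```

At this rate the initial synchronization of a multi-terabyte data set over the WAN alone is impractical, which is why the initial data synchronization strategy appears on the planning list.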

An Info Tabloid

Moving 10 TB requires:
» 2.25 hours using OC-192 (10 Gb/s)
» 9 hours using OC-48 (2.5 Gb/s)
» 14 hours using "2G" FC (1600 Mb/s)
» 28 hours using "1G" FC (800 Mb/s)
» 35.7 hours using OC-12 (622 Mb/s)
» 6 days using OC-3 (155 Mb/s)

...if the pipe is fully utilized!
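These transfer times follow from dividing the data size by the link rate. The sketch below assumes decimal units (1 TB = 10^12 bytes) and uses the nominal rates quoted on the slide; the `utilization` parameter is an added assumption for modelling a pipe that is not fully used.

```python
# Link rates in megabits per second, as quoted on the slide.
LINKS_MBPS = {
    "OC-192": 10_000,
    "OC-48": 2_500,
    "2G FC": 1_600,
    "1G FC": 800,
    "OC-12": 622,
    "OC-3": 155,
}


def transfer_hours(terabytes, link_mbps, utilization=1.0):
    """Hours needed to move `terabytes` over a link of `link_mbps`.

    utilization is the fraction of the link rate actually achieved
    (1.0 means the pipe is fully utilized).
    """
    bits = terabytes * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * utilization)
    return seconds / 3600


if __name__ == "__main__":
    for name, mbps in LINKS_MBPS.items():
        hours = transfer_hours(10, mbps)
        print(f"{name:>7}: {hours:7.1f} hours ({hours / 24:.1f} days)")
```

Lowering `utilization` to a measured value gives a more realistic estimate for the initial synchronization window.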

Solution Implementation

Detailed Project Planning

Solution Training and Trials: Test bed

Scheduling
» Downtime
» Vendor/service provider coordination
» DR drills on test bed

Preventive measures
» Strict change control procedures for every production application
  >> Back up data before changing

Testing/Exercising

Testing Goals and Strategies
» Define the test purpose and approach
» Identify the test team. Use non-DR team members (INVOLVE END USERS)
» Match DR requirements
» Prioritize: most important first vs. least impact first

Testing Procedures
» Detailed recovery procedures for every restore and possible disaster
» Initial test report: detailed report of the testing plan
» Analyze test results and modify the plans as appropriate

Retest!

TEST AND DOCUMENT

Sustenance Program

» Establish a corporate DR cycle
  Should include all the above considerations
  Periodic DR drill
» Result evaluation by a review council
  Include top management
» Document shortcomings and failures
  Gaps between requirements and objectives met
» Retraining
  Refreshing the DR team
» Update the recovery documentation periodically

Look Out for...

» Lack of management commitment.
» Under-allocation of resources.
» Lack of periodic testing of the plan.
» Lack of upkeep on the documentation.
» Failing to adapt plans to organizational changes.
  » New systems
  » Different priorities

Never take DR related decisions in a hurry. Patience and persistence are the most important cornerstones to achieve tough "milestones".

...Slow and steady can still win the race...

