You are on page 1of 109

Chapter - 6

Cybersecurity Incident Management,


Business Continuity, and Disaster
Recovery

1
Outline
• Contingency Planning
• Incident Management
• Running a business continuity and disaster recovery
planning project
• Developing business continuity and disaster recovery
plans
• Testing business continuity and disaster recovery plans
• Training users
• Maintaining business continuity and disaster recovery
plans
2
Contingency Planning

3
Introduction
• Need planning for the unexpected event,
– when the use of technology is disrupted and
– business operations come close to a dead end

• Procedures are required that will permit the


organization to continue essential functions,
– if IT support is interrupted

• Over 40% of businesses that don't have a disaster


plan go out of business after a major loss.

4
What Is Contingency Planning?

• Contingency planning (CP)


– Overall planning for unexpected events
– to prepare for, detect, react to, and recover from events that
threaten the security of information resources and assets.

• Main goal:
– restoration to normal modes of operation
– with minimum cost and disruption to normal business
activities
– after an unexpected event.
5
CP Components
• Incident response (IRP)
Contingency
– focuses on immediate Planning
response
• Disaster recovery (DRP) Incident Disaster Business
– focuses on restoring response recovery continuity
operations at the primary
site after disasters occur
• Business continuity (BCP)
– facilitates establishment of
operations at an alternate
site

6
CP Operations

• Four teams are involved in contingency planning and


contingency operations:
– CP team
– Incident recovery (IR) team
– Disaster recovery (DR) team
– Business continuity plan (BC) team

7
Contingency Plan
Implementation Timeline

8
Putting a Contingency Plan
Together
• The CP team should include:
– Project Manager
– Team Members
• Business managers
• Information technology managers
• Information security managers

9
Major Tasks in Contingency
Planning (for CP team)

10
Cybersecurity Incident Management

11
Key definitions
• One of the primary goals of any security program is to
prevent security incidents.
• Cybersecurity event
– A cyber security change that may have an impact on organizational
operations (including mission, capabilities, or reputation).
• Cybersecurity incident
– A single or a series of unwanted or unexpected cybersecurity events
that are likely to compromise organizational operations.
• Cybersecurity incident management
– Processes for preparing, detecting, reporting, assessing, responding to,
dealing with and learning from cyber security incidents.

12
Common Security incidents

• Examples of common events that most organization


classifies as security incidents, such as the following:
– Any attempted network intrusion
– Any attempted denial-of-service attack
– Any detection of malicious software
– Any unauthorized access of data
– Any violation of security policies

13
Common Security incidents…
• Unauthorized access to a system
– unauthorized viewing of information
– bypassing access controls
– using another user’s access
– denying access to another user
• Unauthorized modification of info on a system
– corrupting information
– changing information without authorization
– unauthorized processing of information
14
Incident-handling procedures
• Before the Incident
– Planners draft a set of procedures, those tasks that must
be performed in advance of the incident

• Include:
– Details of data backup schedules
– Disaster recovery preparation
– Training schedules
– Testing plans
– Business continuity plans

15
Incident-handling procedures…
• During Incident
– Planners develop and document the procedures that must
be performed during the incident
• These procedures are grouped and assigned to various roles
• Planning committee drafts a set of function-specific procedures

• After the Incident


– Planners develop and document the procedures that must
be performed immediately after the incident has ceased
• Separate functional areas may develop different procedures

16
Incident Management Steps
• Effective incident response management is handled in
several steps or phases.

Detection Response Mitigation Reporting Recovery

Remediation Lesson Learned

17
Incident Detection
• Challenge
– determining whether an event is routine system use or an actual
incident
• Incident classification:
– process of examining a possible incident and
– determining whether or not it constitutes actual incident
• Ways to track and detect incident candidates
– Initial reports from end users, intrusion detection systems, host-
and network-based virus detection software, and systems
administrators
• Careful training and awareness allows everyone to convey
vital information to the IR team
18
Incident Indicators
• Probable Indicators • Definite Indicators
– Presence of unfamiliar files – Changes to logs
– Presence or execution of – Presence of hacker tools
unknown programs or processes
– Notifications by partner or peer
– Unusual consumption of – Notification by hacker
computing resources
– Use of dormant accounts
– Unusual system crashes
– Activities at unexpected times
– Presence of new accounts
– Reported attacks
– Notification from IDS

19
Occurrences of Actual
Incidents
• Loss of availability
• Loss of integrity
• Loss of confidentiality
• Violation of policy
• Violation of law

20
Incident Detection…
• The following list identifies many of the common
methods used to detect potential incidents.
• It also includes notes on how these methods report the incidents:
– IDS/IPS send alerts to administrators when an item of interest occurs.

– Anti-malware software will often display a pop-up window to indicate


when it detects malware.

– Many automated tools regularly scan audit logs looking for predefined
events, such as the use of special privileges.
• e.g. system integrity verification tools, log analysis tools, network and host
IDS/IPS, SIEM, etc…
• When they detect specific events, they typically send an alert to
administrators.
• Should be updated to reflect new attacks or vulnerabilities 21
Incident Detection…
• End users sometimes detect irregular activity and contact
administrators for help.
– when users report events such as the inability to access a
network resource, it alerts IT personnel about a potential
incident.
– train and encourage such reporting

• admins must monitor vulnerability reports

22
Handling An Actual Incident: Contain,
Eradicate, and Recovery

23
Handling An Actual Incident
• What should you do to regain control once you have
detected a cyber security incident?
• Important decisions will have to be taken about:
• how to contain the incident,
• how to eradicate it and
• how to recover from it.

• Validation of these decisions by organization's top


management is absolutely necessary.

24
Incident Response
• Once an actual incident has been confirmed and
properly classified,
– the IR team moves from detection phase to response phase

• In the incident response phase,


– Action steps taken by the IR team:
• must occur quickly and
• may occur concurrently

25
Incident Response…
• Many organizations have a designated incident response
team:
– Computer incident response team (CIRT), or
– Computer security incident response team (CSIRT).

• Team Members would assist with:


– investigating the incident,
– assessing the damage,
– collecting evidence,
– reporting the incident,
– Initiating recovery procedures,
– remediation and lessons learned stages,
– help with root cause analysis.
26
Incident Mitigation (containing)
• Mitigation steps attempt to contain an incident.
• One of the primary goals of an effective incident
response is to limit the effect or scope of an incident.

Recover quickly or gather evidence?


• Containing a cyber security incident is all about:
– limiting the damage and
– stopping the attacker.
• Find a way to both:
– limit the risk to an organization and
– keep it running at the same time.
27
Incident Containment Strategies
• Incident containment strategies focus on two tasks:
– Stopping the incident
– Recovering control of the systems

• IR team can stop the incident and attempt to recover control


by means of several strategies:
– Disconnect affected communication circuits
– Dynamically apply filtering rules to limit certain types of network
access
– Disable compromised user accounts
– Reconfigure firewalls to block problem traffic
– Temporarily disable compromised process/service
– Take down application or server
– Stop all computers and network devices
28
Containing a cybersecurity incidents…

• At the beginning of this phase, organization will have


to make an important strategic decision:
1. disconnect the systems immediately in order to recover as
quickly as possible Or
2. take time to collect evidence against the cybercriminal who
perpetrated the system

• Find a balanced solution between these two options.


• Which decision an organization will make will depend on
the scope, magnitude and impact of the incident.

29
Containing a cybersecurity incidents…
• The following criteria can help you to evaluate:
– What could happen if the incident were not contained?
– Is the attack or breach doing immediate severe damage?
– Is there (potential) damage and/or theft of assets?
– Is it necessary to preserve evidence? And if so, what sources
of evidence should the organization acquire? Where will the
evidence be stored? How long should evidence be reserved?
– Is it necessary to avoid alerting the hacker?
– Do you need to ensure service availability or is it OK to take
the system offline?

30
Disconnect Vs. Watch and learn
Disconnect Watch and learn
• Will you shut down the system
and/or disconnect the network in • Will you continue operations for the
order to be able to recover as fast as time being and monitor the activity in
possible? order to be able to perform advanced
forensics and gather as much evidence
– This approach does not allow you to do a
thorough investigation.
as possible?
– You will alert the attacker that you have • Forensics can take time, so it might
discovered him. take a little longer to get back to
• This is the fastest way back to business as usual.
business – This is a more thorough approach and you
are more likely to tackle the root causes of
• A quick response can help reduce the the problem and get rid of it for good
window of time an attacker or
malware has to spread into your • Gathering evidence is essential if you
system and networks. want to find and prosecute the
attacker.

31
Investigating: Gathering evidence
• If you want to:
– tackle the problem at its root and
– identify the perpetrator for prosecution,
• you will need to preserve the evidence.

• To gather evidence, forensic investigation must be


performed before you eradicate the incident.
• When incident violates civil or criminal law,
– it is organization’s responsibility to notify proper authorities
– Selecting appropriate law enforcement agency depends on
the type of crime committed:
• Federal, State, or Local 32
Things to remember
• In order to be acceptable in court, evidence should be collected
according to procedures that meet all applicable laws and
regulations.
• You should avoid compromising evidence.
• For example, it is not a good idea to:
• Immediately shut down your server
– You might not be able to identify the cause of the incident, or the
criminal.
– If you shut the server down, you clear out the memory on the server.
– This means you will not be able to perform memory forensics, because
you will have nothing left to analyze.
– You might be destroying crucial evidence, because RAM memory often
contains a lot of traces of malware.
– Before shutting down your server, it needs to be dumped on a USB 33
drive.
Things to remember…
• For example, it is not a good idea to:
• Restore your system from a back-up, if you are not sure the
back up is not infected itself
– Your backup may be infected: APTs can infect your network for a long
period without you noticing.
– That makes the risk of a backup infection likely.
– Installing an infected backup could recreate the infection.
• Reinstall on the same server without a foreinsic copy
• …

34
Incident Escalation
• An incident may increase in scope or severity to the
point that the IRP cannot adequately contain the
incident
• Each organization will have to determine, during the
business impact analysis, the point at which the
incident becomes a disaster
• The organization must also document when to
involve outside response

35
Eradication and Clean up
• The eradication can take many forms. It often implies actions
such as:
– Remove all components related to the incident, all artifacts left by the
attacker (malicious code, data, etc.) and
– Running a virus or spyware scanner to remove the offending files and
services
– Updating signatures
– Deleting malware
– Disabling breached user accounts
– Changing passwords of breached user accounts
– Identifying and mitigating all vulnerabilities that were exploited
– Informing employees about the threat and giving them instructions on
what to avoid in the future
– Informing external stakeholders such as the media and your customers 36
Incident Reporting

• Reporting refers to reporting an incident:


– within the organization
– to organizations and
– individuals outside the organization (law firms, media,…)
– Attorney offices, Federal Investigation offices
– INTERPOL

• Upper-level management need to know about serious


security breaches.

37
Initiating Incident Recovery
• Incident recovery
– After the incident has been contained and
– system control regained
• IR team must assess full extent of damage in order to
determine what must be done to restore systems

• Incident damage assessment


– determination of the scope of the breach of confidentiality,
integrity, and availability of information/information assets
– Document damage and
– Preserve evidence - in case the incident is part of a crime or
results in a civil action
38
Recovery Process…
• Once the extent of the damage has been determined, the
recovery process begins:
– Identify and resolve vulnerabilities that allowed incident to occur and
spread
– Address, install, and replace/upgrade safeguards that failed to stop or
limit the incident, or were missing from system
– Evaluate monitoring capabilities (if present) to improve detection and
reporting methods, or install new monitoring capabilities
– Restore data from backups as needed
– Restore services and processes in use where compromised and
interrupted
• services and processes must be examined, cleaned, and then restored
– Continuously monitor system
– Restore the confidence of the members of the organization’s
communities of interest 39
Recovery Process…
– For simple incident, rebooting
– For major incident, rebuilding a system, restoring
from recent backup
• (ACLs) configuration and ensuring that unneeded services
and protocols are disabled or removed,
• all up-to-date patches are installed, and
• that user accounts are modified from the defaults
•…
– Rebuild the system from zero

40
Incident remediation
• In the remediation stage, personnel:
– look at the incident and
– attempt to identify what allowed it to occur, and
– then implement methods to prevent it from happening again.
• This includes performing a root cause analysis.
• Example:
– Web server didn’t have up-to-date patches, allowing the attackers to gain
remote control of the server.
– Remediation steps might include implementing a patch management
program.
– Website application wasn’t using adequate input validation techniques,
allowing a successful SQL injection attack.
– Remediation would involve updating the application to include input
validation.
– Maybe the database is located on the web server instead of in a backend
database server.
– Remediation would mean moving the database to a server behind an 41
additional firewall.
Lesson Learned
• During the lessons learned stage, personnel examine the
incident and the response to see if there are any lessons to be
learned.
• Example,
– administrators may realize that attacks are getting through
undetected and
– increase their detection capabilities and
– recommend changes to their intrusion detection systems.
• Based on the findings, the team may recommend:
– changes to procedures,
– the addition of security controls, or
– even changes to policies.

42
Documenting an Incident
• While the notification process is underway, the team
should begin documentation
– Record the who, what, when, where, why and how of
each action taken while the incident is occurring
• Serves as a case study after the fact to determine:
– if right actions were taken and
– if they were effective
• Can also prove the organization did everything possible to
deter the spread of the incident.

43
After Action Review
• Before returning to routine duties, the IR team must
conduct an after-action review, or AAR
– AAR: detailed examination of events that occurred

• All team members:


– Review their actions during the incident
– Identify areas where the IR plan worked, didn’t work, or
should improve

44
Incident Management Steps: summary
• Effective incident response management is handled in
several steps or phases.

Detection Response Mitigation Reporting Recovery

Remediation Lesson Learned

45
Business Continuity and Disaster
Planning Basics
Disaster Recovery and
Business Continuity Planning

47
What Is a Disaster?
• Any natural or man-made event that disrupts the
operations of a business in such a significant way that
– a considerable and coordinated effort is required to
achieve a recovery.

• Disaster recovery plan


– Preparation for and recovery from a disaster,
whether natural or man made

48
Disaster Recovery…
• In general, an incident is a disaster when:
– organization is unable to contain or control the impact of
an incident
OR
– level of damage or destruction from incident is so severe
that the organization is unable to quickly recover

• Key role of DRP:


– defining how to reestablish operations at location where
organization is usually located (primary site)

49
Incident Response and
Disaster Recovery

50
Disaster Classifications
• A DRP can classify disasters in a number of ways
• Most common:
– Separate natural disasters from man-made disasters

• Another way: by speed of development


– Rapid onset disasters
• Earthquake, floods, storms, tornadoes
– Slow onset disasters
• Droughts, famines, environmental degradation, deforestation,

51
Natural Disasters
• Geological: earthquakes, volcanoes, tsunamis,
landslides, and sinkholes
• Meteorological: hurricanes, tornados, wind storms,
hail, ice storms, snow storms, rainstorms, and
lightning
• Other: avalanches, fires, floods, meteors and
meteorites, and solar storms
• Health: widespread illnesses, quarantines, and
pandemics

52
Man-made Disasters
• Labor: strikes, walkouts, and slow-downs that
disrupt services and supplies
• Social-political: war, terrorism, sabotage, vandalism,
civil unrest, protests, demonstrations, cyber attacks,
and blockades
• Utilities: power failures, communications outages,
water supply shortages, fuel shortages, and
radioactive fallout from power plant accidents
• Materials: fires, hazardous materials leaks
53
How Disasters Affect Businesses
• Direct damage to facilities and equipment
• Transportation infrastructure damage
– Delays deliveries, supplies, customers, employees going to
work
• Communications outages
• Utilities outages

54
Planning for Disaster

• Key points in the DRP:


– Clear delegation of roles and responsibilities
– Execution of alert roster and notification of key personnel
– Clear establishment of priorities
– Documentation of the disaster
– Action steps to mitigate the impact
– Alternative implementations for various systems components

55
Business Continuity Planning
(BCP)
• Ensures CRITICAL BUSINESS FUNCTIONS can continue in a disaster
• Most properly managed by top management of organization
• Activated and executed concurrently with the DRP when
needed
• Reestablishes critical functions at alternate site (DRP focuses
on reestablishment at primary site)
• Relies on identification of CRITICAL BUSINESS FUNCTIONS and the
resources to support them.

56
How BCP and DRP
Support Security
• BCP (Business Continuity Planning) and DRP
(Disaster Recovery Planning)
• Security pillars: C-I-A
– Confidentiality
– Integrity
– Availability
• BCP and DRP directly support availability

57
BCP and DRP Differences
and Similarities
• BCP
– Activities required to ensure the continuation of critical
business processes in an organization
– Alternate personnel, equipment, and facilities are required
– Often includes non-IT aspects of business
• DRP
– Assessment, recover, repair, and eventual restoration of
damaged facilities and systems
– Often focuses on IT systems

58
Industry Standards Supporting
BCP and DRP

• ISO 27001: Requirements for Information Security


Management Systems.
• Section 14 addresses business continuity management.
• ISO 27002: Code of Practice for Business
Continuity Management.

59
Industry Standards Supporting
BCP and DRP (cont.)
• NIST 800-34
– Contingency Planning Guide for Information
Technology Systems.
– Seven step process for BCP and DRP projects
– From U.S. National Institute for Standards and
Technology
• NFPA 1600
– From U.S. National Fire Protection Association
– Standard on Disaster / Emergency Management and
Business Continuity Programs
60
Industry Standards Supporting
BCP and DRP (cont.)

• NFPA 1620: The Recommended Practice for Pre-


Incident Planning.
• HIPAA: Requires a documented and tested
disaster recovery plan
– U.S. Health Insurance Portability and Accountability Act

61
Benefits of BCP and DRP Planning
• Reduce risk
• Process improvements
• Improve organizational maturity
• Improve availability and reliability
• Marketplace advantage

62
Business Continuity Strategies
• Several continuity strategies for business continuity
– Determining factor is usually cost

• Three exclusive-use options:


– Hot sites
– Warm sites
– Cold sites

• Three shared-use options:


– Timeshare
– Service bureaus
– Mutual agreements
63
Exclusive Use Options

• Hot Sites
– Fully configured computer facility with all services

• Warm Sites
– Like hot site, but software applications not kept fully
prepared

• Cold Sites
– Only basic services and facilities kept in readiness

64
Shared Use Options
• Timeshares
– Like an exclusive use site but leased
• Service Bureaus
– Agency that provides physical facilities
• Mutual Agreements
– Contract between two organizations to assist
• Specialized alternatives:
– Developing mobile site
– Externally stored resources

65
Off-Site Disaster Data Storage

• To get any BCP site running quickly, organization must


be able to recover data.

• Options include:
– Electronic vaulting: bulk batch-transfer of data to an off-
site facility.
• makes copies of files as they are modified and periodically transmits them
to an offsite backup site.
– Remote Journaling: transfer of live transactions to an off-
site facility.
– Database shadowing: storage of duplicate online
transaction data.
66
Running a BCP / DRP Project
Running a BCP / DRP Project
• Main phases
– Pre-project activities
– Perform a Business Impact Assessment (BIA)
– Develop business continuity and recovery plans
– Test resumption and recovery plans

68
Pre-project Activities
• Obtain executive support
• Formally define the scope of the project
• Choose project team members
• Develop a project plan
– Get a project manager
• Develop a project charter
– A document listing all these items, plus budget, and
milestones

69
Business Impact Assessment (BIA)
Business Impact Analysis
(BIA)
• BIA
– Provides information about systems/threats and detailed scenarios
for each potential attack
– Not a risk management focusing on identifying threats,
vulnerabilities, and attacks to determine controls
• BIA Assumption:
– controls have been bypassed or are ineffective and attack was successful
• CP team conducts BIA in the following stages:
– Threat attack identification
– Business unit analysis
– Attack success scenarios
– Potential damage assessment

71
Attack Success Scenarios
• Some examples of well-constructed risk scenarios are illustrated below.
Legend: Threat Event | Vulnerability | Asset | Consequence

Example 1: Risk Scenario

Attacker performs an SQL injection on an unpatched legacy web


application to download sensitive patient medical records.

Example 2: Risk Scenario

Internal staff makes a fraudulent payment instruction exceeding bank


account balance on the payment system with no set limit, resulting in a
bank overdraft.
72
Attack Success Scenarios …
Legend: Threat Event | Vulnerability | Asset | Consequence

Example 3: Risk Scenario

Unauthorized employee accesses the SCADA server using default login


credentials and execute shutdown command to disrupt the water
supply to the entire east side of Ethiopia.

Example 4: Risk Scenario

Attacker delivers spear-phishing email to unsuspecting user, which when


clicked, triggers the user account to perform SMB authentication with
malicious server and discloses hashed credentials.

73
Performing a Business
Impact Assessment (BIA)…
• Survey critical processes
• Perform risk analyses and threat assessment
• Determine Maximum Tolerable Downtime (MTD)
• Establish key recovery targets

74
Survey In-scope
Business Processes
• Develop interview template
• Interview a representative from each department
– Identify all important processes
– Identify dependencies on systems, people, equipment
• Collate/organize data into database or spreadsheets
– Gives a big picture, all-company view
– This phase is BUI (Business Unit Analysis) phase

75
Threat and Risk Analysis

• Identify threats, vulnerabilities, risks, for each key


process
– Rank according to probability, impact, cost
– Identify mitigating controls
– Add attack profile to it
• detail description of the threats that are likely to attack your
organization

76
Determine Maximum
Tolerable Downtime (MTD)

• The time available to recover disrupted systems and


resources (systems recovery time).
• For each business process
– Identify the maximum time that each business process can
be inoperative before significant damage or long-term
viability is threatened
– Probably an educated guess for many processes

77
Develop Statements of Impact

• For each process, describe the impact on the rest of


the organization if the process is incapacitated
• Examples
– Inability to process payments
– Inability to produce invoices
– Inability to access customer data for support purposes

78
Record Other Key Metrics

• Examples
– Cost to operate the process
– Cost of process downtime
– Profit derived from the process
• Useful for upcoming Criticality Analysis

79
Ascertain Current Continuity and Recovery
Capabilities
• For each business process
– Identify documented continuity capabilities
– Identify documented recovery capabilities
– Identify undocumented capabilities
• What if the disaster happen tomorrow

80
Develop Key Recovery Targets

• Recovery time objective (RTO)


– Period of time from disaster commencement to
resumption of business process

• Recovery point objective (RPO)


– The maximum amount of data – as measured by time –
that can be lost after a recovery from a disaster event
– Maximum period of data loss from onset of disaster
counting backwards
– Amount of work that will have to be done over
81
Develop Key Recovery
Targets (cont.)
• Obtain senior management buyoff on RTO and RPO
• Publish into the same database / spreadsheet listing
all business processes

82
Sample Recovery Time Objectives

RPO Technology(ies) required


8-14 days New equipment, data recovery from backup
4-7 days Cold systems, data recovery from backup
2-3 days Warm systems, data recovery from backup
12-24 hours Warm systems, recovery from high speed backup
media

83
Sample Recovery
Time Objectives (cont.)
RPO Technology(ies) required
6-12 hours Hot systems, recovery from high speed backup
media
3-6 hours Hot systems, data replication
1-3 hours Clustering, data replication
< 1 hour Clustering, near real time data replication

84
Criticality Analysis

• Rank processes by criticality criteria


– MTD (maximum tolerable downtime)
– RTO (recovery time objective)
– RPO (recovery point objective)
– Cost of downtime or other metrics
– Qualitative criteria
• Reputation, market share, goodwill

85
Improve System and
Process Resilience
• For the most critical processes (based upon ranking
in the criticality analysis)
– Identify the biggest risks
– Identify cost of mitigation
– Can several mitigating controls be combined
– Do mitigating controls follow best / common practices

86
Develop Business Continuity and
Recovery Plans
Select Recovery Team Members

• Selection criteria
– Location of residence, relative to work
and other key locations
– Skills and experience (determines effectiveness)
– Ability and willingness to respond
– Health and family (determines probability to serve)
– Identify backups
• Other team members, external resources

88
Emergency Response

• Personnel safety: includes first-aid, searching for


personnel, etc.
• Evacuation: evacuation procedures to prevent any
hazard to workers.
• Asset protection: includes buildings, vehicles, and
equipment.

89
Emergency Response (cont.)

• Damage assessment: this could involve


outside structural engineers to assess damage to
buildings and equipment.
• Emergency notification: response team
communication, and keeping management and
organization staff informed.

90
Damage Assessment and Salvage

• Determine damage to buildings, equipment, utilities


– Requires inside experts
– Usually requires outside experts
• Civil engineers to inspect buildings
• Government building inspectors
• Salvage
– Identify working and salvageable assets

91
Notification
• Many parties need to know the condition of the
organization
– Employees, suppliers, customers, regulators, authorities,
shareholders, community
• Methods of communication
– Telephone call trees, web site, media
– Alternate means of communication must be identified

92
Personnel Safety

• The number one concern in any disaster response


operation
– Emergency evacuation
– Accounting for all personnel
– Administering first-aid
– Emergency supplies
• Water, food, blankets, shelters
• On-site employees could be stranded for several days

93
Communications

• Communications essential during emergency


operations
• Considerations
– Avoid common infrastructure
• Don't have emergency communications through the same
wires as normal communications
– Diversify mobile services
– Consider two-way radios
– Consider satellite phones
– Consider amateur radio
94
Logistics and Supplies
• Food and drinking water
• Blankets and sleeping cots
• Sanitation (toilets, showers, etc.)
• Tools
• Spare parts
• Waste bins
• Information
• Communications
• Fire protection (extinguishers, sprinklers, smoke
alarms, fire alarms)
95
Business Resumption Planning

• Alternate work locations


• Alternate personnel
• Communications
– Emergency, support of business processes
• Standby assets and equipment
• Access to procedures, business records

96
Restoration and Recovery

• Repairs to facilities, equipment


• Replacement equipment
• Restoration of utilities
• Resumption of business operations in primary
business facilities

97
Improving System Resilience
and Recovery
• Off-site media storage
– Assurance of data recovery
• Server clusters
– Improve availability
– Geographic clusters: members far apart
• Data replication
– Application, DMBS, OS, or Hardware
– Maintains current data on multiple servers even in remote
places
98
Training Staff

• Everyday operations
• Recovery procedures
• Emergency procedures
• Resumption procedures

99
Combining the DRP and the
BCP
• DRP and BCP are closely related,
– Most organizations prepare them concurrently and may
combine them into a single document
– Such a comprehensive plan must be able to support
reestablishment of operations at two different locations
• Immediately at alternate site
• Eventually back at primary site

• Although a single planning team can develop combined


DRP/BRP, execution requires separate teams

100
Testing Business Continuity
and Disaster Recovery Plans
Testing Business Continuity
and Disaster Recovery Plans

• Five levels of testing


– Document review (Read through test)
– Walkthrough
– Simulation
– Parallel test
– Cutover test

102
Document Review
• Review of recovery, operations, resumption plans
and procedures
• Performed by individuals
• Provide feedback to document owners

103
Walkthrough
• Performed by teams
• Group discussion of recovery, operations,
resumption plans and procedures
• Brainstorming and discussion brings out new issues,
ideas
• Provide feedback to document owners

104
Simulation
• Walkthrough of recovery, operations, resumption
plans and procedures in a scripted “case study” or
“scenario”
• Performed by teams
• Places participants in a mental disaster setting that
helps them recognize real issues more easily

105
Parallel Test
• Full or partial workload is applied to recovery
systems
• Performed by teams
• Tests actual system readiness and accuracy of
procedures
• Production systems continue to operate and
support actual business processes

106
Cutover Test
• Production systems are shut down or disconnected;
recovery systems assume full actual workload
• Risk of interrupting real business
• Gives confidence in DR (Disaster Recovery) system
if it works

107
Maintaining Business Continuity and
Disaster Recovery Plans
Maintaining Business Continuity and
Disaster Recovery Plans
• Events that necessitate review and modification of
DRP and BCP procedures:
– Changes in business processes and procedures
– Changes to IT systems and applications
– Changes in IT architecture
– Additions to IT applications
– Changes in service providers
– Changes in organizational structure

109

You might also like