Professional Documents
Culture Documents
Chapter 6 Incident Managment Business Continutiy and Disaster Recovery
Chapter 6 Incident Managment Business Continutiy and Disaster Recovery
1
Outline
• Contingency Planning
• Incident Management
• Running a business continuity and disaster recovery
planning project
• Developing business continuity and disaster recovery
plans
• Testing business continuity and disaster recovery plans
• Training users
• Maintaining business continuity and disaster recovery
plans
2
Contingency Planning
3
Introduction
• Need planning for the unexpected event,
– when the use of technology is disrupted and
– business operations come close to a dead end
4
What Is Contingency Planning?
• Main goal:
– restoration to normal modes of operation
– with minimum cost and disruption to normal business
activities
– after an unexpected event.
5
CP Components
• Incident response (IRP)
Contingency
– focuses on immediate Planning
response
• Disaster recovery (DRP) Incident Disaster Business
– focuses on restoring response recovery continuity
operations at the primary
site after disasters occur
• Business continuity (BCP)
– facilitates establishment of
operations at an alternate
site
6
CP Operations
7
Contingency Plan
Implementation Timeline
8
Putting a Contingency Plan
Together
• The CP team should include:
– Project Manager
– Team Members
• Business managers
• Information technology managers
• Information security managers
9
Major Tasks in Contingency
Planning (for CP team)
10
Cybersecurity Incident Management
11
Key definitions
• One of the primary goals of any security program is to
prevent security incidents.
• Cybersecurity event
– A cyber security change that may have an impact on organizational
operations (including mission, capabilities, or reputation).
• Cybersecurity incident
– A single or a series of unwanted or unexpected cybersecurity events
that are likely to compromise organizational operations.
• Cybersecurity incident management
– Processes for preparing, detecting, reporting, assessing, responding to,
dealing with and learning from cyber security incidents.
12
Common Security incidents
13
Common Security incidents…
• Unauthorized access to a system
– unauthorized viewing of information
– bypassing access controls
– using another user’s access
– denying access to another user
• Unauthorized modification of info on a system
– corrupting information
– changing information without authorization
– unauthorized processing of information
14
Incident-handling procedures
• Before the Incident
– Planners draft a set of procedures, those tasks that must
be performed in advance of the incident
• Include:
– Details of data backup schedules
– Disaster recovery preparation
– Training schedules
– Testing plans
– Business continuity plans
15
Incident-handling procedures…
• During Incident
– Planners develop and document the procedures that must
be performed during the incident
• These procedures are grouped and assigned to various roles
• Planning committee drafts a set of function-specific procedures
16
Incident Management Steps
• Effective incident response management is handled in
several steps or phases.
17
Incident Detection
• Challenge
– determining whether an event is routine system use or an actual
incident
• Incident classification:
– process of examining a possible incident and
– determining whether or not it constitutes actual incident
• Ways to track and detect incident candidates
– Initial reports from end users, intrusion detection systems, host-
and network-based virus detection software, and systems
administrators
• Careful training and awareness allows everyone to convey
vital information to the IR team
18
Incident Indicators
• Probable Indicators • Definite Indicators
– Presence of unfamiliar files – Changes to logs
– Presence or execution of – Presence of hacker tools
unknown programs or processes
– Notifications by partner or peer
– Unusual consumption of – Notification by hacker
computing resources
– Use of dormant accounts
– Unusual system crashes
– Activities at unexpected times
– Presence of new accounts
– Reported attacks
– Notification from IDS
19
Occurrences of Actual
Incidents
• Loss of availability
• Loss of integrity
• Loss of confidentiality
• Violation of policy
• Violation of law
20
Incident Detection…
• The following list identifies many of the common
methods used to detect potential incidents.
• It also includes notes on how these methods report the incidents:
– IDS/IPS send alerts to administrators when an item of interest occurs.
– Many automated tools regularly scan audit logs looking for predefined
events, such as the use of special privileges.
• e.g. system integrity verification tools, log analysis tools, network and host
IDS/IPS, SIEM, etc…
• When they detect specific events, they typically send an alert to
administrators.
• Should be updated to reflect new attacks or vulnerabilities 21
Incident Detection…
• End users sometimes detect irregular activity and contact
administrators for help.
– when users report events such as the inability to access a
network resource, it alerts IT personnel about a potential
incident.
– train and encourage such reporting
22
Handling An Actual Incident: Contain,
Eradicate, and Recovery
23
Handling An Actual Incident
• What should you do to regain control once you have
detected a cyber security incident?
• Important decisions will have to be taken about:
• how to contain the incident,
• how to eradicate it and
• how to recover from it.
24
Incident Response
• Once an actual incident has been confirmed and
properly classified,
– the IR team moves from detection phase to response phase
25
Incident Response…
• Many organizations have a designated incident response
team:
– Computer incident response team (CIRT), or
– Computer security incident response team (CSIRT).
29
Containing a cybersecurity incidents…
• The following criteria can help you to evaluate:
– What could happen if the incident were not contained?
– Is the attack or breach doing immediate severe damage?
– Is there (potential) damage and/or theft of assets?
– Is it necessary to preserve evidence? And if so, what sources
of evidence should the organization acquire? Where will the
evidence be stored? How long should evidence be reserved?
– Is it necessary to avoid alerting the hacker?
– Do you need to ensure service availability or is it OK to take
the system offline?
30
Disconnect Vs. Watch and learn
Disconnect Watch and learn
• Will you shut down the system
and/or disconnect the network in • Will you continue operations for the
order to be able to recover as fast as time being and monitor the activity in
possible? order to be able to perform advanced
forensics and gather as much evidence
– This approach does not allow you to do a
thorough investigation.
as possible?
– You will alert the attacker that you have • Forensics can take time, so it might
discovered him. take a little longer to get back to
• This is the fastest way back to business as usual.
business – This is a more thorough approach and you
are more likely to tackle the root causes of
• A quick response can help reduce the the problem and get rid of it for good
window of time an attacker or
malware has to spread into your • Gathering evidence is essential if you
system and networks. want to find and prosecute the
attacker.
31
Investigating: Gathering evidence
• If you want to:
– tackle the problem at its root and
– identify the perpetrator for prosecution,
• you will need to preserve the evidence.
34
Incident Escalation
• An incident may increase in scope or severity to the
point that the IRP cannot adequately contain the
incident
• Each organization will have to determine, during the
business impact analysis, the point at which the
incident becomes a disaster
• The organization must also document when to
involve outside response
35
Eradication and Clean up
• The eradication can take many forms. It often implies actions
such as:
– Remove all components related to the incident, all artifacts left by the
attacker (malicious code, data, etc.) and
– Running a virus or spyware scanner to remove the offending files and
services
– Updating signatures
– Deleting malware
– Disabling breached user accounts
– Changing passwords of breached user accounts
– Identifying and mitigating all vulnerabilities that were exploited
– Informing employees about the threat and giving them instructions on
what to avoid in the future
– Informing external stakeholders such as the media and your customers 36
Incident Reporting
37
Initiating Incident Recovery
• Incident recovery
– After the incident has been contained and
– system control regained
• IR team must assess full extent of damage in order to
determine what must be done to restore systems
40
Incident remediation
• In the remediation stage, personnel:
– look at the incident and
– attempt to identify what allowed it to occur, and
– then implement methods to prevent it from happening again.
• This includes performing a root cause analysis.
• Example:
– Web server didn’t have up-to-date patches, allowing the attackers to gain
remote control of the server.
– Remediation steps might include implementing a patch management
program.
– Website application wasn’t using adequate input validation techniques,
allowing a successful SQL injection attack.
– Remediation would involve updating the application to include input
validation.
– Maybe the database is located on the web server instead of in a backend
database server.
– Remediation would mean moving the database to a server behind an 41
additional firewall.
Lesson Learned
• During the lessons learned stage, personnel examine the
incident and the response to see if there are any lessons to be
learned.
• Example,
– administrators may realize that attacks are getting through
undetected and
– increase their detection capabilities and
– recommend changes to their intrusion detection systems.
• Based on the findings, the team may recommend:
– changes to procedures,
– the addition of security controls, or
– even changes to policies.
42
Documenting an Incident
• While the notification process is underway, the team
should begin documentation
– Record the who, what, when, where, why and how of
each action taken while the incident is occurring
• Serves as a case study after the fact to determine:
– if right actions were taken and
– if they were effective
• Can also prove the organization did everything possible to
deter the spread of the incident.
43
After Action Review
• Before returning to routine duties, the IR team must
conduct an after-action review, or AAR
– AAR: detailed examination of events that occurred
44
Incident Management Steps: summary
• Effective incident response management is handled in
several steps or phases.
45
Business Continuity and Disaster
Planning Basics
Disaster Recovery and
Business Continuity Planning
47
What Is a Disaster?
• Any natural or man-made event that disrupts the
operations of a business in such a significant way that
– a considerable and coordinated effort is required to
achieve a recovery.
48
Disaster Recovery…
• In general, an incident is a disaster when:
– organization is unable to contain or control the impact of
an incident
OR
– level of damage or destruction from incident is so severe
that the organization is unable to quickly recover
49
Incident Response and
Disaster Recovery
50
Disaster Classifications
• A DRP can classify disasters in a number of ways
• Most common:
– Separate natural disasters from man-made disasters
51
Natural Disasters
• Geological: earthquakes, volcanoes, tsunamis,
landslides, and sinkholes
• Meteorological: hurricanes, tornados, wind storms,
hail, ice storms, snow storms, rainstorms, and
lightning
• Other: avalanches, fires, floods, meteors and
meteorites, and solar storms
• Health: widespread illnesses, quarantines, and
pandemics
52
Man-made Disasters
• Labor: strikes, walkouts, and slow-downs that
disrupt services and supplies
• Social-political: war, terrorism, sabotage, vandalism,
civil unrest, protests, demonstrations, cyber attacks,
and blockades
• Utilities: power failures, communications outages,
water supply shortages, fuel shortages, and
radioactive fallout from power plant accidents
• Materials: fires, hazardous materials leaks
53
How Disasters Affect Businesses
• Direct damage to facilities and equipment
• Transportation infrastructure damage
– Delays deliveries, supplies, customers, employees going to
work
• Communications outages
• Utilities outages
54
Planning for Disaster
55
Business Continuity Planning
(BCP)
• Ensures CRITICAL BUSINESS FUNCTIONS can continue in a disaster
• Most properly managed by top management of organization
• Activated and executed concurrently with the DRP when
needed
• Reestablishes critical functions at alternate site (DRP focuses
on reestablishment at primary site)
• Relies on identification of CRITICAL BUSINESS FUNCTIONS and the
resources to support them.
56
How BCP and DRP
Support Security
• BCP (Business Continuity Planning) and DRP
(Disaster Recovery Planning)
• Security pillars: C-I-A
– Confidentiality
– Integrity
– Availability
• BCP and DRP directly support availability
57
BCP and DRP Differences
and Similarities
• BCP
– Activities required to ensure the continuation of critical
business processes in an organization
– Alternate personnel, equipment, and facilities are required
– Often includes non-IT aspects of business
• DRP
– Assessment, recover, repair, and eventual restoration of
damaged facilities and systems
– Often focuses on IT systems
58
Industry Standards Supporting
BCP and DRP
59
Industry Standards Supporting
BCP and DRP (cont.)
• NIST 800-34
– Contingency Planning Guide for Information
Technology Systems.
– Seven step process for BCP and DRP projects
– From U.S. National Institute for Standards and
Technology
• NFPA 1600
– From U.S. National Fire Protection Association
– Standard on Disaster / Emergency Management and
Business Continuity Programs
60
Industry Standards Supporting
BCP and DRP (cont.)
61
Benefits of BCP and DRP Planning
• Reduce risk
• Process improvements
• Improve organizational maturity
• Improve availability and reliability
• Marketplace advantage
62
Business Continuity Strategies
• Several continuity strategies for business continuity
– Determining factor is usually cost
• Hot Sites
– Fully configured computer facility with all services
• Warm Sites
– Like hot site, but software applications not kept fully
prepared
• Cold Sites
– Only basic services and facilities kept in readiness
64
Shared Use Options
• Timeshares
– Like an exclusive use site but leased
• Service Bureaus
– Agency that provides physical facilities
• Mutual Agreements
– Contract between two organizations to assist
• Specialized alternatives:
– Developing mobile site
– Externally stored resources
65
Off-Site Disaster Data Storage
• Options include:
– Electronic vaulting: bulk batch-transfer of data to an off-
site facility.
• makes copies of files as they are modified and periodically transmits them
to an offsite backup site.
– Remote Journaling: transfer of live transactions to an off-
site facility.
– Database shadowing: storage of duplicate online
transaction data.
66
Running a BCP / DRP Project
Running a BCP / DRP Project
• Main phases
– Pre-project activities
– Perform a Business Impact Assessment (BIA)
– Develop business continuity and recovery plans
– Test resumption and recovery plans
68
Pre-project Activities
• Obtain executive support
• Formally define the scope of the project
• Choose project team members
• Develop a project plan
– Get a project manager
• Develop a project charter
– A document listing all these items, plus budget, and
milestones
69
Business Impact Assessment (BIA)
Business Impact Analysis
(BIA)
• BIA
– Provides information about systems/threats and detailed scenarios
for each potential attack
– Not a risk management focusing on identifying threats,
vulnerabilities, and attacks to determine controls
• BIA Assumption:
– controls have been bypassed or are ineffective and attack was successful
• CP team conducts BIA in the following stages:
– Threat attack identification
– Business unit analysis
– Attack success scenarios
– Potential damage assessment
71
Attack Success Scenarios
• Some examples of well-constructed risk scenarios are illustrated below.
Legend: Threat Event | Vulnerability | Asset | Consequence
73
Performing a Business
Impact Assessment (BIA)…
• Survey critical processes
• Perform risk analyses and threat assessment
• Determine Maximum Tolerable Downtime (MTD)
• Establish key recovery targets
74
Survey In-scope
Business Processes
• Develop interview template
• Interview a representative from each department
– Identify all important processes
– Identify dependencies on systems, people, equipment
• Collate/organize data into database or spreadsheets
– Gives a big picture, all-company view
– This phase is BUI (Business Unit Analysis) phase
75
Threat and Risk Analysis
76
Determine Maximum
Tolerable Downtime (MTD)
77
Develop Statements of Impact
78
Record Other Key Metrics
• Examples
– Cost to operate the process
– Cost of process downtime
– Profit derived from the process
• Useful for upcoming Criticality Analysis
79
Ascertain Current Continuity and Recovery
Capabilities
• For each business process
– Identify documented continuity capabilities
– Identify documented recovery capabilities
– Identify undocumented capabilities
• What if the disaster happen tomorrow
80
Develop Key Recovery Targets
82
Sample Recovery Time Objectives
83
Sample Recovery
Time Objectives (cont.)
RPO Technology(ies) required
6-12 hours Hot systems, recovery from high speed backup
media
3-6 hours Hot systems, data replication
1-3 hours Clustering, data replication
< 1 hour Clustering, near real time data replication
84
Criticality Analysis
85
Improve System and
Process Resilience
• For the most critical processes (based upon ranking
in the criticality analysis)
– Identify the biggest risks
– Identify cost of mitigation
– Can several mitigating controls be combined
– Do mitigating controls follow best / common practices
86
Develop Business Continuity and
Recovery Plans
Select Recovery Team Members
• Selection criteria
– Location of residence, relative to work
and other key locations
– Skills and experience (determines effectiveness)
– Ability and willingness to respond
– Health and family (determines probability to serve)
– Identify backups
• Other team members, external resources
88
Emergency Response
89
Emergency Response (cont.)
90
Damage Assessment and Salvage
91
Notification
• Many parties need to know the condition of the
organization
– Employees, suppliers, customers, regulators, authorities,
shareholders, community
• Methods of communication
– Telephone call trees, web site, media
– Alternate means of communication must be identified
92
Personnel Safety
93
Communications
96
Restoration and Recovery
97
Improving System Resilience
and Recovery
• Off-site media storage
– Assurance of data recovery
• Server clusters
– Improve availability
– Geographic clusters: members far apart
• Data replication
– Application, DMBS, OS, or Hardware
– Maintains current data on multiple servers even in remote
places
98
Training Staff
• Everyday operations
• Recovery procedures
• Emergency procedures
• Resumption procedures
99
Combining the DRP and the
BCP
• DRP and BCP are closely related,
– Most organizations prepare them concurrently and may
combine them into a single document
– Such a comprehensive plan must be able to support
reestablishment of operations at two different locations
• Immediately at alternate site
• Eventually back at primary site
100
Testing Business Continuity
and Disaster Recovery Plans
Testing Business Continuity
and Disaster Recovery Plans
102
Document Review
• Review of recovery, operations, resumption plans
and procedures
• Performed by individuals
• Provide feedback to document owners
103
Walkthrough
• Performed by teams
• Group discussion of recovery, operations,
resumption plans and procedures
• Brainstorming and discussion brings out new issues,
ideas
• Provide feedback to document owners
104
Simulation
• Walkthrough of recovery, operations, resumption
plans and procedures in a scripted “case study” or
“scenario”
• Performed by teams
• Places participants in a mental disaster setting that
helps them recognize real issues more easily
105
Parallel Test
• Full or partial workload is applied to recovery
systems
• Performed by teams
• Tests actual system readiness and accuracy of
procedures
• Production systems continue to operate and
support actual business processes
106
Cutover Test
• Production systems are shut down or disconnected;
recovery systems assume full actual workload
• Risk of interrupting real business
• Gives confidence in DR (Disaster Recovery) system
if it works
107
Maintaining Business Continuity and
Disaster Recovery Plans
Maintaining Business Continuity and
Disaster Recovery Plans
• Events that necessitate review and modification of
DRP and BCP procedures:
– Changes in business processes and procedures
– Changes to IT systems and applications
– Changes in IT architecture
– Additions to IT applications
– Changes in service providers
– Changes in organizational structure
109