INCIDENT MANAGEMENT STANDARD OPERATING PROCEDURE
Purpose:
This document provides step-by-step processes and procedures for incident management and
Service Level Agreements (SLAs).
Related Topics:
Preparation, Analysis, Recovery, Post Incident, Feedback
Preparation
To prepare for incidents, compile a list of necessary information about the incident. Set up
monitoring so you have a baseline of normal activity. Determine which types of security events
should be investigated and create detailed response steps for common types of incidents.
Analysis
Analysis involves identifying a baseline of normal activity for the affected systems, correlating
related events, and seeing if and how they deviate from normal behavior.
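The comparison against a baseline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `find_deviations`, the 3-standard-deviation threshold, and the sample "failed logins per hour" figures are all hypothetical and are not part of this procedure.

```python
from statistics import mean, stdev


def find_deviations(baseline, observed, threshold=3.0):
    """Return (index, value) pairs from `observed` that lie more than
    `threshold` standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Flat baseline: any value that differs from it is a deviation.
        return [(i, v) for i, v in enumerate(observed) if v != mu]
    return [
        (i, v)
        for i, v in enumerate(observed)
        if abs(v - mu) / sigma > threshold
    ]


# Hypothetical data: failed logins per hour during normal operation...
baseline = [4, 5, 6, 5, 4, 6, 5, 5]
# ...and during the period under investigation.
observed = [5, 6, 40, 4]

print(find_deviations(baseline, observed))  # [(2, 40)]
```

A real deployment would more likely rely on the alerting rules of the monitoring tool in use; the point here is only that "deviation from normal" needs an explicit baseline and an explicit threshold.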
Recovery
Recovery restores normal services. Your recovery strategy will depend on the level of damage the
incident can cause, the need to keep critical services available to the customers, and the
duration of the solution: a temporary solution for a few hours, days or weeks, or a permanent
solution.
Post Incident
Part of incident response methodology is learning from previous incidents to improve the process.
You should ask, investigate and document the answers to the following questions:
Step 1: - ANYONE:
- Announce the incident and ensure an incident coordinator or manager is
aware and starts the next step; otherwise, call a manager immediately.
Step 2: - INCIDENT COORDINATOR:
- Send the initial communication to stakeholders within 15 minutes.
- For the initial notice, a ticket number is required, but waiting for full details is
not. Incident members can be identified in the next update once
gathered.
Step 2 (continued): - INCIDENT COORDINATOR:
- Gather the team and schedule a Teams-bridge with a dial-in option for 3rd parties.
- Share the Teams-bridge in the Slack chat thread dedicated to this incident.
- Updates should include changes in roles when people change shifts or come in to
help.
- Anything touched must be documented in a timeline format. Document each
change, e.g. "Tech 1 made change 2" or "3rd-party agent made changes to
systems".
Step 3: - REPAIR AGENT or REPAIR MANAGER:
- Post updates to the Topic Thread for awareness and documentation.
- Test with the user and from the technical side. State here who the fix was
tested with to make sure the issue is resolved.
Step 4: - INCIDENT COORDINATOR: continue updates and comms until resolved.
a) Send out a final incident notice.
b) Update weekly incident resolution report.
c) Downgrade the incident to a ticket for post-incident resolution monitoring for 24 hours.
d) Close the incident ticket and open an RCA ticket for RCA report generation
with a due date of 3 business days.
Step 5: Create the RCA and set its ticket number.
Documentation Support shall capture the work performed and action(s)
implemented for future reference and job-aid drafting.
Step 6: Follow-up (this could be someone from Support, the DevOps Manager, etc., gathering
feedback from the user).
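The time limits named in the steps above (initial communication within 15 minutes, 24 hours of post-resolution monitoring, an RCA due in 3 business days) can be computed as in the following minimal Python sketch. The function names are hypothetical, and two points are assumptions rather than statements of this procedure: the incident ticket is taken to close when the monitoring window ends, and business days are taken to be Monday to Friday with no holiday calendar.

```python
from datetime import datetime, timedelta

INITIAL_NOTICE_WINDOW = timedelta(minutes=15)  # initial stakeholder notice
MONITORING_WINDOW = timedelta(hours=24)        # Step 4 c)
RCA_DUE_BUSINESS_DAYS = 3                      # Step 4 d)


def add_business_days(start, days):
    """Advance `start` by `days` business days, skipping Saturday and Sunday.
    Public holidays are not handled (assumption)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def incident_deadlines(declared_at, resolved_at):
    """Return the deadlines implied by the steps for one incident."""
    monitoring_ends = resolved_at + MONITORING_WINDOW
    return {
        "initial_notice_due": declared_at + INITIAL_NOTICE_WINDOW,
        "monitoring_ends": monitoring_ends,
        # Assumption: the incident ticket closes when monitoring ends,
        # and the RCA clock starts at that moment.
        "rca_due": add_business_days(monitoring_ends, RCA_DUE_BUSINESS_DAYS),
    }


# Hypothetical incident declared and resolved on Thursday 2 May 2024.
deadlines = incident_deadlines(
    declared_at=datetime(2024, 5, 2, 9, 0),
    resolved_at=datetime(2024, 5, 2, 11, 30),
)
for name, due in deadlines.items():
    print(f"{name}: {due:%a %Y-%m-%d %H:%M}")
```

In this example the monitoring window ends on Friday, so the RCA due date skips the weekend and lands on the following Wednesday.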
4.0 Reporting
An Incident report shall be created and approved by the Incident Repair Manager and the
Incident Manager prior to being presented to Upper Management.
Furthermore, a Root Cause Analysis report shall be created and signed off by the Incident Repair
Manager before being uploaded to the repository.
The Ticket Management team shall create a weekly Incident Summary report detailing, for each
incident worked on during that week, its status and details, the action taken, and the projected
resolution date if no resolution date is specified.
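One possible shape for an entry in the weekly Incident Summary report is sketched below in Python. The record fields mirror the items listed above; the class name, the line layout, the ticket numbers, and the sample incidents are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class IncidentRecord:
    ticket: str
    status: str
    details: str
    action_taken: str
    resolution_date: Optional[date] = None
    projected_resolution_date: Optional[date] = None


def summary_line(rec):
    """Format one incident; fall back to the projected resolution date
    only when no actual resolution date is recorded."""
    if rec.resolution_date is not None:
        when = f"resolved {rec.resolution_date.isoformat()}"
    elif rec.projected_resolution_date is not None:
        when = f"projected {rec.projected_resolution_date.isoformat()}"
    else:
        when = "resolution date not set"
    return (
        f"{rec.ticket} | {rec.status} | {when} | "
        f"{rec.details} | action: {rec.action_taken}"
    )


def weekly_summary(records):
    """Join one summary line per incident worked on during the week."""
    return "\n".join(summary_line(r) for r in records)


# Hypothetical incidents for one reporting week.
records = [
    IncidentRecord("INC-001", "Resolved", "Login service outage",
                   "Restarted auth service",
                   resolution_date=date(2024, 5, 2)),
    IncidentRecord("INC-002", "In progress", "Slow report exports",
                   "Added database index",
                   projected_resolution_date=date(2024, 5, 10)),
]
report = weekly_summary(records)
print(report)
```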
Escalation matrix: A document or system that defines when escalation should happen and who
should handle incidents at each escalation level.
Point of contact: A person that can be approached for information concerning the incident.
SLA: Service Level Agreement, the expectations agreed between the service provider and
the customer.