
2023

INCIDENT MANAGEMENT
STANDARD OPERATING
PROCEDURE

Purpose:

This document provides step-by-step processes and procedures for incident management and
Service Level Agreements (SLAs).

Related Topics:

• Incident response plan and Status feedback
• Incident Intake and Escalation
• Incident Workflow
• Service Level Agreement
• Reporting

1. Incident response plan and Status feedback

[Figure: incident response cycle with the stages Preparation, Analysis, Recovery, Post Incident, and Feedback]

Preparation
To prepare for incidents, compile a list of necessary information about the incident. Set up
monitoring so you have a baseline of normal activity. Determine which types of security events
should be investigated and create detailed response steps for common types of incidents.
Analysis
Analysis involves identifying a baseline of normal activity for the affected systems, correlating
related events, and determining whether and how they deviate from normal behavior.
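
As a minimal sketch of checking an observation against a baseline, the example below flags a value that lies more than a chosen number of standard deviations from the baseline mean. The metric (hourly failed logins), the sample values, and the threshold of 3 are illustrative assumptions, not part of this procedure.

```python
from statistics import mean, stdev


def deviates_from_baseline(baseline, observed, threshold=3.0):
    """Return True if `observed` is more than `threshold` standard
    deviations away from the mean of the `baseline` samples."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: any different value is a deviation.
        return observed != mu
    return abs(observed - mu) / sigma > threshold


# Hypothetical hourly counts of failed logins during normal operation.
baseline_failed_logins = [4, 6, 5, 7, 5, 6, 4, 5]

print(deviates_from_baseline(baseline_failed_logins, 6))   # False
print(deviates_from_baseline(baseline_failed_logins, 40))  # True
```

A real deployment would usually take the baseline from the monitoring set up during Preparation rather than from a hard-coded list.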

Recovery
Recovery restores normal service. Your recovery strategy will depend on the level of damage the
incident can cause, the need to keep critical services available to customers, and the
duration of the solution: a temporary solution for a few hours, days, or weeks, or a permanent
solution.

Post Incident

The post-incident phase is about learning from previous incidents to improve the process.

You should ask, investigate and document the answers to the following questions:

• What happened, and at what times?
• How well did the incident response team deal with the incident? Were processes
  followed, and were they sufficient?
• What information was needed sooner?
• Were any wrong actions taken that caused damage or inhibited recovery?
• What could staff do differently next time if the same incident occurred?
• Could staff have shared information better with other organizations or other
  departments?
• Have we learned ways to prevent similar incidents in the future?
• Have we discovered new precursors or indicators of similar incidents to watch for in the
  future?
• What additional tools or resources are needed to help prevent or mitigate similar
  incidents?

2. Service Level Agreement

Incident Priority is set by Severity and Impact: 

Category | Description | Resolution | Expected Response Time | Expected Resolution Time | Reporting Frequency
---------|-------------|------------|------------------------|--------------------------|--------------------
Feature Request | Feature does not affect normal operations | Resolution may not be required | Within 1 hour of confirmation | 4 hours | 4 hours
Low | Individual work hindrance acceptable | Resolution may not be required | Within hours of confirmation | 2 hours | 2 hours
Normal | Individual work hindrance not acceptable | Immediate resolution may not be required | Within 1 hour of confirmation | 1 hour | 1 hour
High | Interruption to critical processes | Immediate resolution is required | Within 20 minutes of confirmation | 45 minutes | 30 minutes
Urgent | Interruption to critical processes affecting many users or departments | Immediate resolution is required | Within 15 minutes of confirmation | 30 minutes | 15 minutes

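
The SLA table above can be encoded as a lookup so that target times are computed rather than worked out by hand. This is a sketch under stated assumptions: the names `SLA_TARGETS` and `sla_deadlines` are hypothetical, all targets are assumed to be measured from the time the incident is confirmed, and the Low response time is left as `None` because the table does not state a number for it.

```python
from datetime import datetime, timedelta

# (response, resolution, reporting frequency) per category, transcribed
# from the SLA table. None marks the value the table leaves unspecified.
SLA_TARGETS = {
    "Feature Request": (timedelta(hours=1), timedelta(hours=4), timedelta(hours=4)),
    "Low": (None, timedelta(hours=2), timedelta(hours=2)),
    "Normal": (timedelta(hours=1), timedelta(hours=1), timedelta(hours=1)),
    "High": (timedelta(minutes=20), timedelta(minutes=45), timedelta(minutes=30)),
    "Urgent": (timedelta(minutes=15), timedelta(minutes=30), timedelta(minutes=15)),
}


def sla_deadlines(category, confirmed_at):
    """Return target times for an incident confirmed at `confirmed_at`.

    Assumption: every target is counted from the confirmation time.
    """
    response, resolution, reporting = SLA_TARGETS[category]
    return {
        "respond_by": confirmed_at + response if response is not None else None,
        "resolve_by": confirmed_at + resolution,
        "first_update_by": confirmed_at + reporting,
    }


confirmed = datetime(2023, 1, 9, 10, 0)
for name, when in sla_deadlines("High", confirmed).items():
    print(name, when)
# respond_by 2023-01-09 10:20:00
# resolve_by 2023-01-09 10:45:00
# first_update_by 2023-01-09 10:30:00
```
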
Incident Escalation Matrix

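Incident Intake & Escalation | Priority assignment & incident risk assessment | Incident response and Status Feedback | Root Cause Analysis Report
-----------------------------|------------------------------------------------|---------------------------------------|---------------------------
Service Desk | Service Desk | Incident Coordinator | S.D Manager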

2.1 Incident Reception


Upon receiving an incident notice from the customer, the recipient must gather as much
information as possible about the incident, open a ticket as the master Incident ticket, and
hand it to the incident coordinator.
The information gathered helps the appropriate leadership and technical resources to:
a) Assess the seriousness of the incident.
b) Assess the extent of the damage.
c) Identify the vulnerability created.
d) Estimate the additional resources required to mitigate the incident.
Below is a set of critical information to gather during incident reception:
1. Practice name and location
   a. Phone number
2. Point of contact
   a. Name
   b. Title
3. Incident background
   a. Problem faced and when it started
   b. Device(s) affected [device name(s)]
   c. Business processes impacted
      i. Is business impacted?
      ii. Is there a financial impact?
      iii. What else cannot be completed?
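
One way to make sure nothing on the intake checklist is skipped is to capture it as a structured record and list what is still empty before handing the ticket over. The sketch below is illustrative only: the class name, field names, and sample values are hypothetical and simply mirror the checklist items above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class IncidentIntake:
    # 1. Practice name and location
    practice_name: str = ""
    practice_location: str = ""
    practice_phone: str = ""
    # 2. Point of contact
    contact_name: str = ""
    contact_title: str = ""
    # 3. Incident background
    problem_and_start: str = ""
    devices_affected: list = field(default_factory=list)
    business_impacted: str = ""
    financial_impact: str = ""
    blocked_work: str = ""

    def missing_fields(self):
        """Return the names of checklist items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical partially completed intake.
intake = IncidentIntake(
    practice_name="Example Practice",
    contact_name="J. Doe",
    devices_affected=["FRONTDESK-PC1"],
)
print(intake.missing_fields())
```
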
All incidents must be posted immediately in the #Incident-management Slack channel, and the
Incident Process must be started with initial customer communications. The priority of an
existing or new ticket is set based on the SLA matrix.

3.0 Incident Response Plan and Status feedback

3.1 Incident Response Plan

3.1.1 Resource Gathering


The following resources should be set up prior to contacting the Point of contact:
1. Incident coordinator
   a. Gathers people and resources to address the issue, and gathers updates to
      send to stakeholders based on the priority schedule.
2. Incident repair manager
   a. Approves needed emergency repairs and assists in tracking change
      tickets.
3. Incident repair technician
   a. Lead agent who repairs the issue.
4. Documentation Support
   a. Knowledge team member who assists with documentation retrieval,
      notes needed updates to existing documentation, and assists with ticket
      documentation for compliance.
3.1.2 Incident Handling

Step 1: ANYONE
- Announce the incident and ensure an incident coordinator or manager is
aware and starts the next step; otherwise, call a manager immediately.
Step 2: INCIDENT COORDINATOR
- Send initial communication to stakeholders within 15 minutes.
- The initial notice requires a ticket number but does not need to wait for full
details; incident team members can be identified in the next update once
gathered.
- Gather the team and schedule a Teams-bridge with a dial-in option for 3rd parties.
- Share the Teams-bridge in the Slack chat thread for this one incident.
- Updates should include changes in roles if people change shifts or come in to
help.
- Anything touched must be documented in a timeline format, e.g. "Tech 1 made
change 2" or "3rd-party agent made changes to systems".
Step 3: REPAIR AGENT or REPAIR MANAGER
- Post updates to the Topic Thread for awareness and documentation.
- Test with the user and from the technical side, and state who the fix was
tested with to confirm the issue is resolved.
Step 4: INCIDENT COORDINATOR
- Continue updates and communications until resolved.
a) Send out a final incident notice.
b) Update the weekly incident resolution report.
c) Downgrade the incident to a ticket for 24 hours of post-incident resolution monitoring.
d) Close the incident ticket and open an RCA ticket for RCA report generation
with a due date of 3 business days.
Step 5: Create the RCA and set its ticket number.
Documentation Support shall capture the work performed and the action(s)
implemented for future reference and Job-aid drafting.

Step 6: Follow up with the user to gather feedback (this could be done by someone from
Support, the Dev Ops Manager, etc.).
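
Step 2 requires that anything touched be documented in a timeline format. The sketch below shows one possible way to keep such a timeline, so that entries recorded out of order still render chronologically. The class names, the ticket number, and the sample entries are hypothetical; a real team would more likely record this in the ticket or the Slack thread.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TimelineEntry:
    at: datetime
    actor: str
    action: str


class IncidentTimeline:
    def __init__(self, ticket):
        self.ticket = ticket
        self._entries = []

    def record(self, at, actor, action):
        """Add one change to the timeline."""
        self._entries.append(TimelineEntry(at, actor, action))

    def render(self):
        """Return the timeline as text, oldest entry first."""
        lines = [f"Timeline for {self.ticket}"]
        for e in sorted(self._entries, key=lambda e: e.at):
            lines.append(f"{e.at:%Y-%m-%d %H:%M} | {e.actor} | {e.action}")
        return "\n".join(lines)


timeline = IncidentTimeline("INC-0001")  # hypothetical ticket number
timeline.record(datetime(2023, 1, 9, 10, 40), "3rd-party agent", "made changes to systems")
timeline.record(datetime(2023, 1, 9, 10, 25), "Tech 1", "made change 2")
print(timeline.render())
```
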

4.0 Reporting

• An Incident report shall be created and approved by the Incident Repair Manager and the
  Incident Manager prior to being presented to Upper Management.

• Furthermore, a Root Cause Analysis report shall be created and signed off by the Incident Repair
  Manager before being uploaded to the repository.

• The Ticket Management team shall create a weekly Incident Summary report detailing the
  status and details of all incidents worked on during that week, the projected resolution date if
  the resolution date is not specified, and the action taken for each incident worked on.

• Root Cause Analysis will be delivered within 24.
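
Step 4 d) of Incident Handling sets the RCA ticket due date to 3 business days after the incident ticket is closed. As a rough sketch of that arithmetic, the helper below counts forward over weekdays only. The function name is hypothetical, and the sketch assumes Monday to Friday are business days and ignores public holidays.

```python
from datetime import date, timedelta


def add_business_days(start, days):
    """Return the date `days` business days after `start`.

    Assumption: business days are Monday to Friday; holidays are ignored.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


closed_on = date(2023, 3, 9)  # a Thursday
print(add_business_days(closed_on, 3))  # 2023-03-14, the following Tuesday
```
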


Glossary

Escalation matrix: A document or system that defines when escalation should happen and who
should handle incidents at each escalation level. 

Impact: The effect or influence that an incident has on the organization’s daily business
processes.

Incident: An unplanned interruption to or quality reduction of an IT service.

Incident intake: Incident reception and information gathering.

Job-aid:  Clear instructions on how to perform a certain task.

Point of contact: A person that can be approached for information concerning the incident.

Process: A series of steps by which work is completed.

RCA: Root Cause Analysis report

Severity: The extent to which the incident affects the organization.

SLA: Service Level Agreement, the expectations agreed between the service provider and
the customer.

Teams-bridge: Audio conferencing in Microsoft Teams
