
Incident Management

For IT Support Staff at the University of Melbourne


Contents
Scope
Definition of an Incident
Goal
Incident Management Process overview
Links with other processes, activities and functions
Inputs, Activities and Outputs
    Inputs
    Activities
    Outputs
Incident Management Activity Workflow
Roles & Responsibilities
    1st Level - Service Desk
    Specialist Support Groups
    2nd Level Support
    3rd Level Support
    Service Desk Manager
    Incident Manager
Incident Handling
    End User based input methods
    Incident Status
    Incident Record Keeping
    Incident Record (Case) History
Escalation
    Functional
    Hierarchical
Use of the ITSM Toolset
Relationship to other Processes
Commonly Used Terms
Attachment: IS Priority Response Matrix
    Influencing factors
    Instructions for determining a Priority for an Incident
    Priority – Response Matrix
    Definitions
    Rating Matrix

Incident Management Process Page 1 of 11


Scope
This process is for all End User enquiries. Communication to staff will be governed by a self-regulatory
process, whilst written communications to Students will be governed by the Student Communication Group
(SCG).

Definition of an Incident
Any event which is not part of the standard operation of a service and which causes, or may cause, an
interruption to, or a reduction in, the quality of that service.

Goal
The primary goal of the Incident Management process is to restore normal service operation as quickly as
possible and minimise the adverse impact on business operations, thus ensuring that the best possible levels of
service quality and availability are maintained. ‘Normal service operation’ is defined here as service operation
within Service Level Agreement (SLA) limits.

Incident Management Process overview


[Figure: Incident Management Process overview. Incidents enter the process from the Service Desk, Networks
and other sources of Incidents. The process activities shown are: Incident detection and recording;
Classification and initial support; Investigation and diagnosis; Resolution and recovery; Incident closure;
and Incident ownership. The process links to Service Request Procedures, to the Change Management Process
(via RFC), to the Problem and error database (resolutions and workarounds) and to the CMDB (configuration
details). Resolutions and workarounds leave the process.]

Links with other processes, activities and functions


• Configuration Management Process.
• Change Management Process.
• Problem Management Process.
• Process, Procedures and documentation for Training and Knowledge Management.



Inputs, Activities and Outputs

Inputs
• Incident details, sourced (for example) from End Users via the Service Desk, or from networks or computer
operations via monitoring tools and manual detection during defined operational hours (see the Service
Catalogue)
• Configuration item (CI) details from the Configuration Management Database (CMDB)
• Response from Incident matching against Problems and Known Errors
• Resolution details
• Response on RFC to effect resolution for Incident(s)

Activities
• Detection, recording and alerting.
• Interrogation, classification, prioritisation and initial support.
• Investigation and diagnosis. A resolution or Workaround must be established as quickly as possible in
order to restore the service to End Users with minimum disruption to their work.
• Resolution and recovery. Resolution of the Incident and restoration of the agreed service.
• Closure.
• Incident ownership, monitoring, tracking and communication.

Outputs
• Resolved (via Workarounds or Known Errors) and closed Incidents.
• RFC for Incident resolution.
• Incident record information (including linkages to resolutions and/or Workarounds and/or CI data).
• Communication to Clients and End Users.
• Management information (reports and procedural information).



Incident Management Activity Workflow
[Figure: Incident Management Activity Workflow, drawn as three swimlanes (Level 1 - Service Desk, Level 2 -
Internal Support, Level 3 - External Support) alongside a ‘Communications with Client’ banner.]

1. Incident Detection and Recording.
   Is this a Service request?
   Yes: 2.(a) Classification and Priority, then 3.(a) Follow Service Request Process.
   No: continue to step 2.
2. Interrogation, Classification and Priority *
3. Level 1 Investigate and Diagnose. Able to resolve? Yes: go to step 4. No: go to 3.(b).
   3.(b) Level 2 Investigate and Diagnose. Able to resolve? Yes: go to step 4. No: go to 3.(c).
   3.(c) Level 3 Investigate and Diagnose. Able to resolve? Yes: go to step 4.
4. Resolution and Recovery
5. Confirmation and Closure **

Also shown: creation of a Problem, RFC or Known Error; and a PIR, required for Major Incidents.

* Check for duplicate incidents and known errors.
** There are links to Knowledge Management.
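
The routing logic of the workflow above can be sketched in a few lines of Python. This is an illustration only, not part of the ITSM toolset: the function name `route_incident` and the `can_resolve` callback are hypothetical, and the final "unresolved" step is an assumption, because the workflow does not show what happens if Level 3 cannot resolve the Incident.

```python
from typing import Callable, List

# Support levels in the order the workflow tries them (step label, level name).
LEVELS = [
    ("3.", "Level 1 - Service Desk"),
    ("3.(b)", "Level 2 - Internal Support"),
    ("3.(c)", "Level 3 - External Support"),
]


def route_incident(is_service_request: bool,
                   can_resolve: Callable[[str], bool]) -> List[str]:
    """Return the ordered workflow steps taken for one Incident.

    can_resolve is a hypothetical callback that answers the
    'Able to resolve?' decision for a given support level.
    """
    steps = ["1. Incident Detection and Recording"]
    if is_service_request:
        steps.append("2.(a) Classification and Priority")
        steps.append("3.(a) Follow Service Request Process")
        return steps

    steps.append("2. Interrogation, Classification and Priority")
    for label, level in LEVELS:
        steps.append(f"{label} {level}: Investigate and Diagnose")
        if can_resolve(level):
            steps.append("4. Resolution and Recovery")
            steps.append("5. Confirmation and Closure")
            return steps

    # Assumption: the workflow does not define this outcome.
    steps.append("Unresolved after Level 3 (outcome not shown in the workflow)")
    return steps
```

For example, `route_incident(False, lambda level: level.startswith("Level 2"))` passes through Level 1 and Level 2 investigation before reaching Resolution and Recovery.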



Roles & Responsibilities
1st Level - Service Desk
The Service Desk is responsible for monitoring the resolution process of all registered Incidents;
in effect, the Service Desk is the owner of all Incidents.

The Service Desk plays an important role in the Incident Management process, as follows:
• All Incidents are reported to and registered by the Service Desk. Where detected Incidents are
generated automatically, the process still includes registration of the Incident by the Service
Desk (automatically or manually).
• A goal of the Service Desk is that the majority of Incidents (perhaps up to 85% in a highly
skilled environment) will be resolved at the Service Desk.
• The Service Desk is the ‘independent’ function monitoring Incident resolution progress of all
registered Incidents.

On receipt of an incident notification, the responsibilities and main actions to be carried out by the (1st
level) Service Desk are:
• Incident detection and recording; record basic details – this includes timing data and details of
symptoms obtained
• Routing service requests to support groups when Incidents are not closed;
if a service request has been made, the request is handled in conformance with the organisation’s
standard procedures.
• Initial support, classification and prioritisation;
the Configuration Item (CI) reported as the cause of the Incident is selected from the CMDB to
complete the Incident record, the appropriate priority is derived from impact and urgency, and the
End User is given the unique system-generated Incident number (to be quoted at the beginning of
all further communication).
The priority is defined by using the Priority – Response Matrix (refer to the attachment ‘IS Priority
Response Matrix’).
• Resolution and recovery of Incidents not assigned to 2nd level support;
the Incident is resolved in agreement with the End User.
Incidents assigned to 2nd level support (i.e. a specialist group) following unsuccessful resolution at
1st level have the case history updated with the relevant details and are then assigned back to the
Service Desk, which notifies the End User.
• Closure of Incidents;
following the review of classification, the Incident record is closed; details of the resolution
action and the appropriate category code are added.
• Ownership, monitoring, tracking and communication.

Specialist Support Groups


Most IT departments and specialist groups contribute to handling and investigation of Incidents at some
time. Incidents that cannot be resolved immediately by the Service Desk are assigned to specialists
within 2nd and 3rd Level Support groups.

Support (specialist groups that may or may not be part of the Service Desk) will be involved in tasks
such as:
• Handling service requests
• Monitoring Incident details, including the Configuration Items affected
• Incident investigation and diagnosis (including resolution where possible)
• Detection of possible Problems and the assignment of them to the appropriate Problem
Management team for them to raise Problem records
• The resolution and recovery of assigned Incidents.



The definitions of 2nd and 3rd Level Support follow:

2nd Level Support


All support personnel that are employed by or work within the organisation are considered 2nd Level
Support staff (Providers); this means that they are part of the internal workforce, whether they are full
time, part time or contract staff.

When an incident requires additional 2nd Level resources from internal support teams to assist with
investigation and resolution of the error, the group assigned to the incident is responsible for engaging
the help of other 2nd Level resources as required.

Contractors who work for the organisation and hold an employee number are considered 2nd Level
resources.

Groups may consist of matrix-managed staff from different programs both inside and outside of
Information Services; these are also considered to be 2nd Level resources.

3rd Level Support


All support personnel that are external to the organisation are considered 3rd Level Support (Suppliers);
this means that they work for an external company, supplier or vendor.

When an incident requires 3rd Level resources from external support to assist with investigation of the
error, the 2nd Level support group assigned to the incident is responsible for engaging the help of those
extra resources.

Service Desk Manager


The Service Desk Manager has the responsibility of ensuring compliance with the process and ensuring
the highest standards for ongoing delivery of 1st Level support services.
Key responsibilities include ownership, monitoring, tracking and communication tasks, which cover:
• Monitoring the status and progress towards resolution of all open Incidents
• Keeping affected End Users informed about progress
• Escalating the process if necessary.

Incident Manager
The Incident Manager has the responsibility for:
• Driving the efficiency and effectiveness of the Incident Management process
• Producing management information
• Managing the workflow of the Incident Management Process
• Monitoring the effectiveness of Incident Management and making recommendations for
improvement
• Developing and maintaining the Incident Management systems.



Incident Handling
End User based input methods
There are several methods by which an End User can contact the Service Desk. They are split into
current methods and methods that are not yet in use but may be at some time in the future.

Current
Email, Web Form, Phone, Fax / Correspondence (Letter), Monitoring Tools, Service Desk (alternate
service point) and Walk-up

Future
SMS, Messenger (Synchronous Online Apps), from other End User tracking systems.

Incident Status
The status of an Incident reflects its current position in the Incident life-cycle, sometimes known as its
‘workflow position’. Everyone working with the Incident Management process must be aware of each
status and its meaning. Statuses within the current ITSM toolset include:
New - When a new Incident is being created
Assigned - When it is first saved and assigned to a support group
Work in progress - When a support person is working on the Incident
Pending - Incidents awaiting feedback from external sources/End Users
Resolved - Incident completed and feedback completed with End User
Closed - When an Incident is finished
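
As an illustration, the statuses listed above could be modelled as an enumeration. The six status names come from this document; the `ALLOWED` transition table is purely an assumption made for the sketch, since the process does not define which status changes are permitted, and `can_move` is a hypothetical helper.

```python
from enum import Enum


class IncidentStatus(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    WORK_IN_PROGRESS = "Work in progress"
    PENDING = "Pending"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumed transitions, for illustration only; the process document
# lists the statuses but not the permitted moves between them.
ALLOWED = {
    IncidentStatus.NEW: {IncidentStatus.ASSIGNED},
    IncidentStatus.ASSIGNED: {IncidentStatus.WORK_IN_PROGRESS},
    IncidentStatus.WORK_IN_PROGRESS: {
        IncidentStatus.PENDING,
        IncidentStatus.RESOLVED,
        IncidentStatus.ASSIGNED,  # e.g. reassigned to another group
    },
    IncidentStatus.PENDING: {IncidentStatus.WORK_IN_PROGRESS},
    IncidentStatus.RESOLVED: {
        IncidentStatus.CLOSED,
        IncidentStatus.WORK_IN_PROGRESS,  # e.g. End User reports it is not fixed
    },
    IncidentStatus.CLOSED: set(),
}


def can_move(current: IncidentStatus, target: IncidentStatus) -> bool:
    """Return True if the assumed transition table permits the change."""
    return target in ALLOWED[current]
```

Under these assumptions, `can_move(IncidentStatus.NEW, IncidentStatus.ASSIGNED)` is permitted, while nothing can leave `CLOSED`.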

Incident Record Keeping


Throughout the Incident lifecycle the record must be maintained so that Service Desk agents can
provide an End User with the most up-to-date progress report. Such activities will include:
• Update history details
• Modify status (e.g. ‘new’ to ‘work-in-progress’ or ‘pending’)
• Modify business impact/priority
• Enter time spent and costs
• Monitor escalation status.
The ITSM Toolset (Remedy) will be the authoritative tool used to record this information.

Incident Record (Case) History


The case history of the Incident shows the entire life-cycle of the case; it is therefore one of the most
important aspects of an Incident to keep up to date. Without a case history, ongoing process
improvements will not be possible. The accurate logging of case details will be enforced by the Service
Desk for all Incidents before they can be resolved. The work info (previously known as the work log)
shows a running sheet of the activity of the case. The solution of the case will be recorded in the history
and issued to the End User upon resolution.
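
The rule that case details must be logged before an Incident can be resolved can be sketched as follows. This is a minimal illustration, not the Remedy data model: the class names, field names and the `resolve` guard are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class WorkInfoEntry:
    """One line of the running sheet (work info) for a case."""
    timestamp: datetime
    author: str
    note: str
    minutes_spent: int = 0


@dataclass
class IncidentRecord:
    incident_number: str
    status: str = "New"
    work_info: List[WorkInfoEntry] = field(default_factory=list)
    solution: str = ""

    def add_work_info(self, author: str, note: str, minutes_spent: int = 0,
                      when: Optional[datetime] = None) -> None:
        """Append an entry to the case history."""
        entry = WorkInfoEntry(when or datetime.now(), author, note, minutes_spent)
        self.work_info.append(entry)

    def total_minutes(self) -> int:
        """Total time spent, summed over the case history."""
        return sum(entry.minutes_spent for entry in self.work_info)

    def resolve(self, solution: str) -> None:
        """Record the solution, refusing if no case history has been logged."""
        if not self.work_info:
            raise ValueError("case history must be logged before resolution")
        if not solution.strip():
            raise ValueError("a solution must be recorded on resolution")
        self.solution = solution
        self.status = "Resolved"
```

Calling `resolve` on a record with an empty work info list raises `ValueError`, which mirrors the enforcement described above.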



Escalation

Often, departments and (specialist) support groups other than the Service Desk are referred to as 2nd or
3rd Level support groups, having more specialist skills, time or other resources to solve Incidents. In this
respect, the Service Desk would be 1st Level support.

Note that 2nd, 3rd or further (n-level) support may eventually include external suppliers, who may have
direct access to the Incident registration tool (depending on safety rules and technical issues).

‘Escalation’ is the mechanism that assists timely resolution of an Incident. It takes place during every
activity in the resolution process.

Functional
Transferring an Incident from 1st Level to 2nd Level support groups or further is called ‘functional
escalation’ and primarily takes place because of lack of knowledge or expertise. Functional escalation
also takes place when agreed time intervals elapse; these intervals must not exceed the (SLA) agreed
resolution times.

Hierarchical
‘Hierarchical escalation’ takes place at any moment during the resolution process when it is likely that
resolution of an incident will not be in time or satisfactory. In case of lack of knowledge or expertise,
hierarchical escalation is performed manually (by the Service Desk or other support staff). Automatic
hierarchical escalation can be considered after a certain critical time interval, when it is likely that a
timely resolution will fail. Preferably, this takes place long enough before the (SLA) agreed resolution
time is exceeded so that corrective actions by authorised line management can be carried out, for
example hiring third-party specialists.
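
A time-based trigger for automatic hierarchical escalation might look like the sketch below. The function name and the `warn_fraction` parameter are hypothetical, and the default of 75% of the SLA resolution time is an assumption chosen for illustration; this document does not specify the critical time interval.

```python
from datetime import timedelta


def needs_hierarchical_escalation(elapsed: timedelta,
                                  sla_resolution: timedelta,
                                  warn_fraction: float = 0.75) -> bool:
    """Return True once the elapsed time reaches the critical interval.

    warn_fraction is an assumed share of the SLA resolution time after
    which a timely resolution is considered unlikely. It should leave
    line management enough time to take corrective action.
    """
    if not 0 < warn_fraction <= 1:
        raise ValueError("warn_fraction must be in the range (0, 1]")
    return elapsed >= sla_resolution * warn_fraction
```

With an 8-hour SLA resolution time and the assumed default, escalation is triggered from 6 hours onwards, leaving 2 hours for corrective action.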

Use of the ITSM Toolset

Remedy is the ITIL-focused ITSM toolset that will be used to automate the recording of Incidents. Not
all the steps in the Process are automatic; some may be manual and therefore may not be something that
can be automated within Remedy.

Relationship to other Processes

There is a close relationship between the Incident Management, Problem Management and
Change Management processes. These all link in directly with the function of the Service Desk.
Incidents that have an unknown cause can result in Problems being logged and their resolution can
result in a Request for Change (RFC). There is also a relationship with Service Level Management for
any reporting requirements.



Commonly Used Terms

Problem: The unknown underlying cause of one or more Incidents.

Service Request: (How-to, or Information request) Every Incident not being a failure in the IT
Infrastructure.

Known Error: An Incident or Problem for which the root cause is known and for which a temporary
Work-around or a permanent alternative has been identified. If a business case exists, an RFC will be
raised, but, in any event, it remains a known error unless it is permanently fixed by a Change.

RFC: Form, or screen, used to record details of a request for a Change to any CI (Configuration Item)
within an infrastructure or to procedures and items associated with the infrastructure (or to any aspect of
IT services).

Priority: The sequence in which an Incident or Problem needs to be resolved, based on impact and
urgency.

Priority Matrix: The tool used to calculate Priority of an Incident

End User: The person using or consuming the service on a daily basis

Client: The recipient of a service; usually the Client management will have responsibility for the
funding of the service.

Provider: The unit responsible for the provision of IT services.

Supplier: A third party responsible for supplying or supporting underpinning elements of the IT
services.

1st Level: The frontline, initial point of contact (Service Desk)

2nd Level: Provider (Internal Support)

3rd Level: Supplier (External Supplier, Contractor or Vendor)



Attachment: IS Priority Response Matrix
The Priority – Response Matrix has been developed to guide 1st Level Service Desk staff in the determination
of the overall priority placed on Incidents. The matrix is available to the University community so they can
ascertain the priority applied to their particular Incident. The user can state a priority, but the Service Desk
reserves the right to evaluate it against this matrix.

Influencing factors:
• The priority given to an Incident will be applied so long as the correct procedures are followed in requesting
a service or reporting an incident.
• The priority can change due to escalation or to the length of time a help request has already been in the
ITSM toolset.
• The response to an Incident must fall within the realm of Information Services, and there must be sufficient
resources to respond.

Instructions for determining a Priority for an Incident:


1. Using the Rating Matrix, score the Incident on each of the three IMPACT scales of scope / reputation /
business and on the URGENCY scale (each receives a score of 0 - 3), then add the four scores together to
arrive at a total rating number. For example, an Incident may have a Scope (impact) of 3, a Reputation
(impact) of 0, a Business (impact) of 1 and an Urgency of 2, giving a total rating of 6.
2. The total rating is then applied to the Priority – Response Matrix. The Priority – Response Matrix
determines the response offered by IS. Each Priority will have a dedicated process flow for the resolution
of the Incident.
3. The Priority in square brackets is the End User friendly wording, to be used in Announcements.
4. Higher rating tasks should be addressed before lower rating tasks. Where ratings or priorities are equal,
tasks are handled first come, first served.
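
The scoring in step 1 is a simple sum of four scores. A minimal Python sketch, assuming a hypothetical function name `total_rating` and simple range validation (neither is part of the ITSM toolset):

```python
def total_rating(scope: int, reputation: int, business: int, urgency: int) -> int:
    """Add the three IMPACT scores and the URGENCY score (each 0 to 3)."""
    scores = {
        "scope": scope,
        "reputation": reputation,
        "business": business,
        "urgency": urgency,
    }
    for name, value in scores.items():
        if value not in (0, 1, 2, 3):
            raise ValueError(f"{name} must be 0, 1, 2 or 3, got {value!r}")
    return sum(scores.values())


# The worked example from step 1: Scope 3, Reputation 0, Business 1, Urgency 2.
print(total_rating(scope=3, reputation=0, business=1, urgency=2))  # prints 6
```

Because each of the four scores is at most 3, the highest possible total rating is 12.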

Priority – Response Matrix:


Rating    Priority             Response
12        Critical             Immediate and sustained effort using all necessary resources until resolved.
          [Critical]           Emergency call out procedure in effect.
                               Action initiated - immediate; Resolution - undetermined
9 – 11    High                 IS Technicians respond immediately to assess the situation. Service Desk staff
          [Urgent]             may request immediate assistance from other IS staff directly.
                               Action initiated - within 1 business hour; Resolution - 1 business day
6 – 8     Medium               Respond using standard operating procedures and within the supervisory
          [Important]          structure.
                               Action initiated - within 2 hours; Resolution - 2 business days
0 – 5     Low                  Respond using standard operating procedures and as time allows.
          [For Information]    Action initiated - within 2 business days; Resolution - 10 business days
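
The matrix above is a range lookup, which could be expressed as a small table in Python. The rating ranges, priorities, End User wording and targets are taken from the matrix; the names `Band`, `BANDS` and `priority_for` are hypothetical.

```python
from typing import NamedTuple


class Band(NamedTuple):
    low: int               # lowest total rating in the band
    high: int              # highest total rating in the band
    priority: str          # internal priority name
    user_wording: str      # End User friendly wording (square brackets)
    action_initiated: str  # target for action to be initiated
    resolution: str        # target for resolution


BANDS = [
    Band(12, 12, "Critical", "Critical", "immediate", "undetermined"),
    Band(9, 11, "High", "Urgent", "within 1 business hour", "1 business day"),
    Band(6, 8, "Medium", "Important", "within 2 hours", "2 business days"),
    Band(0, 5, "Low", "For Information", "within 2 business days", "10 business days"),
]


def priority_for(rating: int) -> Band:
    """Look up the matrix row whose rating range contains the total rating."""
    for band in BANDS:
        if band.low <= rating <= band.high:
            return band
    raise ValueError(f"rating must be between 0 and 12, got {rating!r}")
```

For example, a total rating of 6 falls in the Medium band, so `priority_for(6).user_wording` is `"Important"`.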

Definitions:
Business Hours: As per agreed times listed within the Service Catalogue.
Business Days: Monday to Friday (excluding public holidays and University closures).
Action initiated (2nd Level): The correct 2nd level support team has assigned the job to a technician to
resolve. All 1st level actions are exhausted.
Resolution: Either a solution is provided, a workaround is developed, or an alternate manual method is
implemented.
Communication: Receipt of the request is acknowledged and the ITSM toolset holds the case details. Every
User will receive relevant updates from the start of a case, within 30 minutes of the case being submitted
to the ITSM Toolset.
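
Resolution targets expressed in business days can be turned into a due date using the Business Days definition above. The sketch below is illustrative only: `add_business_days` is a hypothetical helper, the list of public holidays and University closures must be supplied by the caller, and counting from the day after the start date is an assumption.

```python
from datetime import date, timedelta
from typing import Iterable


def add_business_days(start: date, days: int,
                      closures: Iterable[date] = ()) -> date:
    """Return the date that falls `days` business days after `start`.

    Business days are Monday to Friday, excluding the supplied public
    holidays and University closures. Counting starts on the day after
    `start` (an assumption made for this sketch).
    """
    if days < 0:
        raise ValueError("days must not be negative")
    closed = set(closures)
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() returns 0 for Monday through 6 for Sunday.
        if current.weekday() < 5 and current not in closed:
            remaining -= 1
    return current
```

For a case logged on Friday 1 March 2024 with a 2 business day target, the due date is Tuesday 5 March 2024; if Monday 4 March were a closure, it would move to Wednesday 6 March 2024.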



Rating Matrix:
Rating 3 = high
  Scope (IMPACT):       Affects everybody
                        OR
                        Affects a significant percentage (>50%) of users.
  Reputation (IMPACT):  Areas outside the University will respond negatively.
                        OR
                        Areas outside the University will respond positively.
  Business (IMPACT):    Interferes with core learning and teaching activities including classroom based
                        instruction.
                        OR
                        Impacts upon a mission critical business function.
                        OR
                        Involves potential loss of mission critical information.
  Urgency:              Activity or event is already in progress and cannot be made up or rescheduled and
                        immediate action could eliminate or mitigate the problem or the condition is ongoing
                        and persists until solved.

Rating 2 = medium
  Scope (IMPACT):       Affects some (>10 but <50) users
                        OR
                        Affects no more than 50% of users.
  Reputation (IMPACT):  University will respond negatively.
                        OR
                        University will respond positively.
  Business (IMPACT):    Interferes with non-instructional student activity
                        OR
                        Impacts upon a departmental business function that does not impact the functioning
                        of the university on the whole.
  Urgency:              Activity or event is scheduled to occur in the very near future but enough time
                        remains to remedy the request without impacting the event.

Rating 1 = normal
  Scope (IMPACT):       Affects 2 to several (10) users.
                        OR
                        Affects no more than 25% of users.
  Reputation (IMPACT):  Faculty or Division will respond negatively.
                        OR
                        Faculty or Division will respond positively.
  Business (IMPACT):    Interferes with the normal completion of general work.
                        OR
                        Tasks are more difficult but not impossible to complete.
  Urgency:              Activity or event can be postponed without loss of productivity or is scheduled far
                        enough in the future so that normal processes will lead to its completion on time.

Rating 0 = low
  Scope (IMPACT):       Affects a single user.
  Reputation (IMPACT):  Goodwill remains unchanged.
  Business (IMPACT):    Interferes with student recreational use of the technology.
                        OR
                        Interferes with staff non-business related use of office equipment.
  Urgency:              No scheduled completion time is required and normal work can continue in the
                        interim.

Notes:
1. “Scope” and “Urgency” impact values are automatically calculated when logging a case into the ITSM tool.
2. Additional weighting values must be added within the ITSM tool to ascertain “Reputation” and “Business” impact values.
3. a) When calculating “Scope” and one department is affected, use the top half of the calculation.
b) When calculating “Scope” and a large percentage of the University is affected, use the bottom half of the calculation.
4. Apply the total score to the Priority – Response Matrix to prioritise the incident.
No “negative” weighting is to be applied. If you have any queries, please consult your Team Leader.
