You are on page 1of 10

What is an Incident?

ITIL defines an incident as an unplanned interruption to or quality reduction of an IT service. The service
level agreements (SLA) define the agreed-upon service level between the provider and the customer.

Incidents differ from both problems and requests. An incident interrupts normal service; a problem is a
condition identified through a series of multiple incidents with the same symptoms.

Problem management resolves the root cause of the problem;


Incident management restores IT services to normal working levels.
Requests for fulfillment are formal requests to provide something. These may include training, account
credentials, new hardware, license allocation, and anything else that the IT service desk offers. A request
may need approvals before IT fulfills it.

Incidents interrupt normal service, such as when a user’s computer breaks, when the VPN won’t
connect, or when the printer jams. These are unplanned events that require help from the service
provider to restore normal function.

The Incident Process


In ITIL, incidents go through a structured workflow that encourages efficiency and best results for both
providers and customers. ITIL recommends the incident management process follow these steps:

Incident identification
Incident logging
Incident categorization
Incident prioritization
Incident response
Initial diagnosis
Incident escalation
Investigation and diagnosis
Resolution and recovery
Incident closure

Once identified as an incident, the service desk logs the incident as a ticket. The ticket should include
information, such as the user’s name and contact information, the incident description, and the date and
time of the incident report (for SLA adherence). The logging process can also include categorization,
prioritization, and the steps the service desk completes..

Incident categorization is a vital step in the incident management process.

Categorization involves assigning a category and at least one subcategory to the incident. This action
serves several purposes. First, it allows the service desk to sort and model incidents based on their
categories and subcategories. Second, it allows some issues to be automatically prioritized. For example,
an incident might be categorized as “network” with a sub-category of “network outage”. This
categorization would, in some organizations, be considered a high-priority incident that requires a major
incident response. The third purpose is to provide accurate incident tracking. When incidents are
categorized, patterns emerge. It’s easy to quantify how often certain incidents come up and point to
trends that require training or problem management. For example, it’s much easier to sell the CFO on
new hardware when the data supports the decision.
Incident prioritization is important for SLA response adherence. An incident’s priority is determined by
its impact on users and on the business and its urgency. Urgency is how quickly a resolution is required;
impact is the measure of the extent of potential damage the incident may cause.
Low-priority incidents are those that do not interrupt users or the business and can be worked around.
Services to users and customers can be maintained.
Medium-priority incidents affect a few staff and interrupt work to some degree. Customers may be
slightly affected or inconvenienced.
High-priority incidents affect a large number of users or customers, interrupt business, and affect service
delivery. These incidents almost always have a financial impact.

Once identified, categorized, prioritized, and logged, the service desk can handle and resolve the
incident. Incident resolution involves five steps:
Initial diagnosis: This occurs when the user describes his or her problem and answers troubleshooting
questions.
Incident escalation: This happens when an incident requires advanced support, such as sending an on-
site technician or assistance from certified support staff. As mentioned previously, most incidents should
be resolved by the first tier support staff and should not make it to the escalation step.
Investigation and diagnosis: These processes take place during troubleshooting when the initial incident
hypothesis is confirmed as being correct. Once the incident is diagnosed, staff can apply a solution, such
as changing software settings, applying a software patch, or ordering new hardware.
Resolution and recovery: This is when the service desk confirms that the user’s service has been
restored to the required SLA level.
Incident closure: At this point, the incident is considered closed and the incident process ends.

Incident Statuses
Incident statuses mirror the incident process and include:
New
Assigned
In progress
On hold or pending
Resolved
Closed

The new status indicates that the service desk has received the incident but has not assigned it to an
agent.

The assigned status means that an incident has been assigned to an individual service desk agent.

The in-progress status indicates that an incident has been assigned to an agent but has not been
resolved. The agent is actively working with the user to diagnose and resolve the incident.

The on-hold status indicates that the incident requires some information or response from the user or
from a third party. The incident is placed “on hold” so that SLA response deadlines are not exceeded
while waiting for a response from the user or vendor.
The resolved status means that the service desk has confirmed that the incident is resolved and that the
user’s service has restored to the SLA levels.

The closed status indicates that the incident is resolved and that no further actions can be taken.
Problem Management
A problem is defined by ITIL as the cause of one or more incidents. Some incidents, such as a
malfunctioning mouse at a user’s workstation, are not indicative of a problem. Other incidents,
such as repeated network outages, create a problem investigation due to their frequency. In this
case, problem management is reactive. Proactive problem management involves addressing the
state of hardware, software, and processes, and preemptively addressing issues before they
cause excessive incidents. Neither incident management nor request management has the
ability to be proactive like problem management.

The Purpose of Problem Management


When users continue to face the same incidents without resolution, they lose trust in the
service desk’s ability to resolve any problem. Hence the primary objective of problem
management is to identify, troubleshoot, document, and resolve the root causes of repeated
incidents. Incident information filters up to problem management and problem management, in
turn, provides the service desk with the known error and workaround information necessary to
mitigate problems in the short term.

Problems include issues such as failing hardware or an inadequately configured database query.
Problem management reduces incidents over the long term. Incident reduction decreases the
load on the service desk, improves end-user satisfaction, and decreases the long-term costs
associated with user and service downtime. When problems cannot be resolved, problem
management works with the service desk to mitigate the impact of the related incidents. The
end goal of problem management should always be to reduce the overall quantity of
preventable incidents and thereby increase the quality of service provided.

The Scope of Problem Management


Problem management has a very limited scope and includes the following activities:
• Problem detection
• Problem logging
• Problem categorization
• Problem prioritization
• Problem investigation and diagnosis
• Creating a known error record
• Problem resolution and closure
• Major problem review

The Main Function of Problem Management


While problem management involves several functions, the most important is the service desk.
While it is also known as a help desk, this is not the ITIL-preferred term and should be avoided.
In ITIL, this function acts as the single point of contact for service customers to report incidents
and submit service requests. Without a single point of contact, users may contact staff and
expect immediate service without prioritization limitations. Unfortunately, this means that
urgent incidents could be ignored while incidents that don't impact the business get handled
first. Another common scenario is that important but low-priority incidents are not handled for
weeks while the IT support staff take care of the most pressing issues on their desks, leaving no
time for smaller issues. The service desk allows the service provider to address everyone’s
issues promptly and sequentially. It also encourages knowledge transfer between departments,
collects data on IT trends, and feeds problem management.

This function can be divided into separate support levels called tiers. The first tier is for basic
issues. This includes low-priority issues such as basic computer troubleshooting. Tier one
incidents are the most likely to be turned into incident models, since these are easy to solve and
recur often. Tier-one incidents do not impact the business or other users. They can always be
worked around until the service desk resolves them. For example, a Microsoft ®Outlook® error
can be worked around by using the web-based email application instead.

Then there's tier two. The second-tier support level handles issues that have some impact on
the user but not on the business as a whole. Usually these incidents require more skill or access
to resolve. Tier-two incidents are medium priority, and require a more immediate response and
higher level of access or training than tier-one incidents.

Tier-three incidents affect the entire organization and many users. Sometimes, a VIP may fall
into a tier-two or tier-three categorization to provide a faster response time for these users.
Often, these incidents fall into the Major Incident Response (MIR) process. These incidents are
defined by ITIL as those that cause significant disruption to the business. These are always high
priority. Incidents that require MIR are good candidates as potential problems, since they affect
the business and likely have a different root cause than regular incidents.

You'll know that you've accurately assessed tiers and priorities when most incidents fall into tier
one/low priority, fewer fall into tier two, and only a few require escalation to tier three.

The service desk interfaces with the problem management team in several ways. The first
interaction is when a potential problem is raised. This often happens when an incident is
deemed unresolvable at the service desk and must be escalated. This also happens when an
incident occurs repeatedly despite normal troubleshooting and resolution steps. Finally, when
the problem management or continual service improvement team identifies problems
proactively, they may contact the service desk for more information or incident statistics.

The Problem Management Process


The ITIL problem management process has many steps, and each is vitally important to the
success of the process and the quality of service delivered.

The first step is to detect the problem. A problem is raised either through escalation from the
service desk, or through proactive evaluation of incident patterns and alerts from event
management or continual service improvement processes. Signs of a problem include incidents
that occur across the organization with similar conditions, incidents that repeat despite
otherwise successful troubleshooting, and incidents that are unresolvable at the service desk.

The second step is to log the problem. In an ITIL framework, problems are logged in a problem record. A
problem record is a compilation of every problem in an organization. This can be accomplished via a
ticketing system that allows for problem ticket types. Pertinent problem data, such as the time and date
of occurrence, the related incident(s), the symptoms, previous troubleshooting steps, and the problem
category all help the problem management team research the root cause.

The third step is to categorize the problem. Problem categorization should match incident
categorization. Incident [and problem] categorization involves assigning a main and secondary category
to the issue. This step is beneficial in several ways. One benefit is that it allows the service desk to sort
and model incidents that occur regularly. The modeling allows for automatic assignment of prioritization.
The third and most important benefit is the ability to gather and report on service desk data. This data
allows the organization to not only track problem trends, but also to assess its effect on service demand
and service provider capacity.

The fourth step is to prioritize the problem. A problem’s priority is determined by its impact on users
and on the business and its urgency. Urgency is how quickly the organization requires a resolution to the
problem. The impact is a measure of the extent of potential damage the problem can cause the
organization. Prioritizing the problem allows an organization to utilize investigative resources most
effectively. It also allows organizations to mitigate damage to the service level agreement (SLA) by
reallocating resources as soon as the issue is known.

The fifth step is a two-part process, which involves investigating and diagnosing the problem. The speed
at which a problem is investigated and diagnosed depends on its assigned priority. High-priority issues
should always be addressed first, as their impact on services is the greatest. Correct categorization helps
here, since identifying trends is easier when problem categories correlate to incident categories.
Diagnosis usually involves analyzing the incidents that lead to the problem report as well as further
testing that may not be possible at the service desk level, such as advanced log analysis.

The sixth step is to identify a workaround for the problem. A workaround should always be indicated,
because problems are not resolved at the incident level. A workaround enables the service desk to
restore services to users while the problem is being resolved. A problem can take anywhere from an
hour to months to resolve, therefore a workaround is vital. A problem is considered open until resolved,
so a workaround should only be considered a temporary measure.

Step seven is to raise a known error record. Once the workaround has been identified, it should be
communicated to staff within the organization as a known error. It’s good practice to record a known
error in both an incident knowledge base and what ITIL calls a known error database (KEDB).
Documenting the workaround allows the service desk to resolve incidents quickly and avoid further
problems being raised on the same issue.

Step eight is to resolve the problem. Problems should be resolved whenever possible. Resolution
resolves the underlying cause of a set of incidents and prevents those incidents from recurring. Some
resolutions may require the change management board, as they may affect service levels. For example, a
database switchover may cause slowness during the switchover period. All risks should be evaluated and
accounted for before implementing the resolution. Document the steps taken to resolve the problem in
the organization’s knowledge base.

The ninth step is to close the problem. This step should only occur after the problem has been raised,
categorized, prioritized, identified, diagnosed, and resolved. While many organizations stop at this step,
it isn't the last according to ITIL.

The final step is to review the problem. This is also known as a major problem review. The major
problem review is an organizational activity that prevents future problems. During the review, the
problem management team evaluates the problem documentation and identifies what happened and
why. Lessons learned, such as process bottlenecks, what went wrong, and what helped should be
discussed. This is where having a complete problem log will help. A completed log will work much better
than trying to pull the details from memory. This problem review should result in improved processes,
staff training, or more complete documentation.

IT Change Management Procedures


The IT change management process typically consists of different procedures:
Request for change review: Change coordinators use this procedure when they are dealing with
requests for change.

Change planning: Change coordinators and specialists employ this process to prepare the
implementation plans for changes.

Change approval: The change manager and approvers (e.g., customer representatives and
service owners) follow this procedure to approve planned changes.

Change implementation: Specialists use this process to implement infrastructure changes.

Change closure: Specialists follow this procedure when they perform production tests after
changes have been implemented, and change coordinators employ it when they close out
changes.

These procedures vary slightly for different types of IT changes and risk levels.

Wherever possible, IT organizations should standardize and automate the way that they process
requests.

Types of IT Changes
There are different types of change requests, or change classes, that are typically managed in
different ways:

• Standard changes are changes to a service or to the IT infrastructure where the


implementation process and the risks are known upfront. These changes are managed
according to policies that are the IT organization already has in place. Since these changes are
subject to established policies and procedures, they are the easiest to prioritize and implement,
and often don’t require approval from a risk management perspective.

• Normal changes are those that must go through the change process before being approved
and implemented. If they are determined to be high-risk, a change advisory board must decide
whether they will be implemented.

• Emergency changes arise when an unexpected error or threat occurs, such as when a flaw in
the infrastructure related to services needs to be addressed immediately. A security threat is
another example of an emergency situation that requires changes to be made immediately.

Latent changes, expedited changes, and no-impact changes are classes that BMC Remedy® ITSM
software handles, though they are typically not listed in ITIL documentation.

Change Management Roles


Before you begin utilizing ITIL procedures, you will need to assign the following roles to your
team:

The change initiator recognizes and identifies the need for change. Your change initiator should
be someone who works directly with support services tools. Members of your team who
provide support services to customers may be best suited for this position due to their frequent
interaction with the system.

The change coordinator assesses requests for change that originate from incident management,
problem management, release management, or continuity management. The change
coordinator registers changes as needed to handle requests for change or receives change
requests from other change initiators; determines the risk and impact for requested changes;
prepares implementation plans by creating tasks; and monitors the progress of changes.

The change manager is generally needed in mid-sized and larger organizations. If your IT
department is part of a larger company, you will need to pick one or multiple persons to
perform the role of change manager. These individuals are responsible for managing change
procedures, receiving and prioritizing change requests, evaluating the risk level associated with
requests, and keeping thorough records of the outcome of each change.

The change advisory board is responsible for authorizing changes and further evaluating
requests when the change manager determines that there is a high risk associated with these
requests. The board takes into account the impact that a requested change may have on all
affected parties. When these high-risk changes are brought to the attention of the change
advisory board, the board will schedule a meeting with a detailed agenda to determine how to
proceed.

The approver decides whether to approve or reject changes.

The change implementation team consists of the specialists on your team who are responsible
for actually making changes. You will likely be part of this team and employees directly under
you may also be assigned to implement changes. As an IT manager, you will often be
responsible for overseeing changes.

ITIL Change Management Process


This section reviews the various procedures that are part of the ITIL change management
process. Once you understand the course, you will be able to proceed through the process
without having to stop to determine what comes next.

Creating a Request for Change


If you are creating a request for change, you are responsible for documenting details that will
help others understand what change needs to be implemented and why you are making the
request. The initial change request submission often includes details about the risk and
implementation steps, if the initiator already knows this information. However, this is not
required information at this time.

Details that may be found in a change request include:

1. Incidents that necessitate the change

2. Description of how the change would be implemented


3. The impact that the change would have on all associated systems
4. A risk assessment
5. Contact information for everyone involved in the change
6. An outline of who will need to approve the request
7. A backup plan to follow in case the change is not successful

Reviewing and Assessing a Request for Change

If you are responsible for reviewing a request for change, you will need to evaluate the request
based on its practicality and priority. Your job is to determine whether the request is reasonable
and to give feedback related to the request. If requests relate to problems that have already
been addressed or are not practical to implement, they will be set aside.

Practical requests will be evaluated according to the originator of the request, the impact that
making a change would have on the company, the estimated return on any investment made in
relation to the request, and the resources that are needed to fulfill the request. You will also
determine who would be responsible for fulfilling the request, and the implementers’ ability to
dedicate time to making the change.

Planning the Change

Once a change request is made, you will need to plan the change as if it is going to occur. A
change plan outlines the course that the change will take, the resources that are needed to
complete the change, and a timeline for implementation.

Testing the Change


If a change relates to debugging software or otherwise changing a system, you may need to test
the change before it is approved. A small-scale test will demonstrate the procedure to be
followed in case the change request is approved. Testing the change also gives you the
opportunity to work out any problems in the procedures that you have developed.

Creating a Change Proposal

A change proposal outlines the type of change, the priority associated with a change request,
and the outcomes that could occur if the change is not made. Your proposal will be given to the
person empowered to authorize the change, so it is important that you provides a thorough
explanation of why a change needs to be made. For example, a change with a high-priority level
may result in outages that will affect customers and result in revenue losses. The people who
authorize changes must be aware of the severity of the impact if you do nothing.

Implementing Changes

Implementing a change is not a simple process. The change has to be built during the planning
process, and implementation is just one step in the change management process. Once the
change has been made, tests must be done to determine whether the desired results have been
achieved. If the change is not successful, remediation methods may be used to determine what
went wrong and to implement a backup plan to alleviate the issues that necessitated the
change request.

Reviewing Change Performance

The post-implementation review is an essential part of the change management process. As an


IT professional, you want to understand whether your change procedures are working as
expected. This includes reviewing records to determine whether the change was successful or
failed, and recording details about the time and expense of the change to determine the
accuracy of estimates that were made before a request was fulfilled. Reviewing change
performance gives you the opportunity to fine-tune your change management process for
better results in the future.

Closing the Process

Once the change process is complete, you must be sure that the entire process has been
documented in a database that all stakeholders can access. Once this documentation has been
made, the process is closed out.

You might also like