ITIL v3 / 2011

Service Operation

ITIL v3/2011 Processes

Objective
The objective of ITIL Service Operation is to make sure that IT services are delivered effectively and efficiently. This includes:
1. fulfilling user requests
2. resolving service failures
3. fixing problems
4. carrying out routine operational tasks

Processes
• Event Management
• Incident Management
• Request Fulfillment
• Access Management
• Problem Management
• IT Operations Control
• Facilities Management
• Application Management
• Technical Management

Event Management
• Objective: The objective of ITIL Event Management is to make sure CIs and services are constantly monitored. Event Management aims to filter and categorize Events in order to decide on appropriate actions if required.
• Process Description: Essentially, the activities and process objectives of the Event Management process are identical in ITIL V3 and V2. In ITIL 2011, Event Management has been updated to reflect the concept of 1st Level Correlation and 2nd Level Correlation.

Sub Processes
1. Maintenance of Event Monitoring Mechanisms and Rules - To set up and maintain the mechanisms for generating meaningful Events and effective rules for filtering and correlating them.
2. Event Filtering and 1st Level Correlation - To filter out Events which are merely informational and can be ignored, and to communicate any Warning and Exception Events.
3. 2nd Level Correlation and Response Selection - To interpret the meaning of an Event and select a suitable response if required.
4. Event Review and Closure - To check whether Events have been handled appropriately and may be closed. This process also makes sure that Event logs are analyzed in order to identify trends or patterns which suggest corrective action must be taken.
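The filtering and response-selection steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not something ITIL prescribes: the three event types (informational, Warning, Exception) come from the text, while the `Event` class, the sample CIs and the response names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass
class Event:
    ci: str            # Configuration Item that raised the Event (hypothetical field)
    type: EventType
    message: str


def first_level_filter(events):
    """Filtering and 1st Level Correlation: drop merely informational
    Events and pass Warning and Exception Events on."""
    return [e for e in events if e.type is not EventType.INFORMATIONAL]


def select_response(event):
    """2nd Level Correlation and Response Selection.
    The mapping below is an assumption made for this sketch."""
    if event.type is EventType.EXCEPTION:
        return "log_incident"
    return "notify_operator"


events = [
    Event("app-01", EventType.INFORMATIONAL, "backup finished"),
    Event("db-01", EventType.WARNING, "disk usage at 85%"),
    Event("web-02", EventType.EXCEPTION, "service not responding"),
]

significant = first_level_filter(events)
responses = [(e.ci, select_response(e)) for e in significant]
```

In practice the rules would live in the Event Monitoring system rather than in code like this; the sketch only shows how the two correlation levels divide the work.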

Definitions
The following ITIL terms and acronyms (information objects) are used in the ITIL Event Management process to represent process outputs and inputs:
• Event - See Event Record.
• Event Categorization Scheme - The Categorization Scheme for Events supports a consistent approach to dealing with specific types of Events. Ideally, this scheme should be harmonized with the schemes to categorize CIs, Incidents and Problems.
• Event Filtering and Correlation Rules - Rules and criteria used to determine if an Event is significant and to decide upon an appropriate response. Event Filtering and Correlation Rules are typically used by Event Monitoring systems. Some of those rules are defined during the Service Design stage, for example to ensure that Events are triggered when the required service availability is endangered.
• Event Record - A record describing a change of state which has significance for the management of a Configuration Item or service. The term Event is also used to mean an alert or notification created by any IT service, Configuration Item or monitoring tool. Events often require IT operations personnel to take actions, and may lead to Incidents being logged.
• Event Trends and Patterns - Any trends and patterns identified during analysis of significant Events, which suggest that improvements to the infrastructure are needed.

Responsibility Matrix: ITIL Event Management
ITIL Role / Sub-Process Maintenance of Event Monitoring Mechanisms and Rules Event Filtering and 1st Level Correlation 2nd Level Correlation and Response Selection Event Review and Closure
Remarks
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Event Management process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Event Management.
[3] In cooperation, as appropriate: IT Operations Manager, Access Manager, Capacity Manager, Availability Manager, IT Service Continuity Manager, Information Security Manager, Applications Analyst and/or Technical Analyst.

IT OPM A[1]R[2] A A AR

IT OPR R R -

EMS R R -

Other Roles R -

Incident Management
• Objective: ITIL Incident Management aims to manage the lifecycle of all Incidents. The primary objective of Incident Management is to return the IT service to users as quickly as possible.
• Parent Process: Service Operation
• Process Owner: Incident Manager

Process Description (v3)
• Incident Management according to ITIL V3 distinguishes between Incidents (Service Interruptions) and Service Requests (standard requests from users, e.g. password resets). Service Requests are no longer fulfilled by Incident Management; instead there is a new process called Request Fulfilment.
• There is a dedicated process in ITIL V3 for dealing with emergencies ("Major Incidents"). Furthermore, a process interface was added between Event Management and Incident Management: significant Events trigger the creation of an Incident.

Process Description (2011)


• Guidance has been improved in Incident Management on how to prioritize an Incident (see Checklist Incident Prioritization Guideline).
• Additional steps have been added to Incident Resolution by 1st Level Support to explain that Incidents should be matched (if possible) to existing Problems and Known Errors.
• Incident Resolution by 1st Level Support and Incident Resolution by 2nd Level Support have been considerably expanded to provide clearer guidance on when to invoke Problem Management from Incident Management. The emphasis is now on restoring services as quickly as possible, and on seeking the help of Problem Management if the underlying cause of an Incident cannot be resolved with a minor Change and/or within the committed resolution time.
• The Incident Management sub-process Incident Closure and Evaluation now states more clearly that it is important to check whether there are new Problems, Workarounds or Known Errors that must be submitted to Problem Management.
• The process overview of ITIL Incident Management shows the most important interfaces (see Figure 1).

Incident Prioritization Guideline
• This describes the rules for assigning priorities to Incidents, including the definition of what constitutes a Major Incident. Since Incident Management escalation rules are usually based on priorities, assigning the correct priority to an Incident is essential for triggering appropriate Incident escalations.
• An Incident’s priority is usually determined by assessing its impact and urgency, where
– Urgency is a measure of how quickly a resolution of the Incident is required
– Impact is a measure of the extent of the Incident and of the potential damage caused by the Incident before it can be resolved.

Incident Urgency (Categories)
• This section establishes categories of urgency. The definitions must suit the type of organization, so the following table is only an example.
• To determine the Incident’s urgency, choose the highest relevant category:
Category: Description
• High (H)
– Staff are not able to do their job
– Customers are being acutely disadvantaged in some way
• Medium (M)
– Staff are unable to do their job properly
– Customers are inconvenienced in some way
• Low (L)
– Staff are able to deliver an acceptable service but this requires extra effort
– Customers are inconvenienced but not in a significant way

Incident Impact (Categories)
• This section establishes categories of impact. The definitions must suit the type of organization, so the following table is only an example.
• To determine the Incident’s impact, choose the highest relevant category:
Category: Description
• High (H)
– A large number of users is affected
– A large number of customers is affected
– The financial impact of the Incident is (for example) likely to exceed $10,000
– The damage to the reputation of the business is likely to be high
– Someone has been injured
• Medium (M)
– A moderate number of users is affected
– A moderate number of customers is affected
– The financial impact of the Incident is (for example) likely to exceed $1,000 but will not be more than $10,000
– The damage to the reputation of the business is likely to be moderate
• Low (L)
– A minimal number of users is affected
– A minimal number of customers is affected
– The financial impact of the Incident is (for example) likely to be less than $1,000
– The damage to the reputation of the business is likely to be minimal

Incident Priority Classes
• Incident Priority is derived from urgency and impact. If classes are defined to rate urgency and impact (see above), an Urgency-Impact Matrix can be used to define priority classes, identified in this example by colors and priority codes:

Urgency / Impact | H | M | L
H | 1 | 2 | 3
M | 2 | 3 | 4
L | 3 | 4 | 5

Priority Code | Description | Target Response Time | Target Resolution Time
1 | Critical | Immediate | 1 Hour
2 | High | 10 Minutes | 4 Hours
3 | Medium | 1 Hour | 8 Hours
4 | Low | 4 Hours | 24 Hours
5 | Very low | 1 Day | 1 Week
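As an illustration, the example Urgency-Impact Matrix and priority classes above can be expressed as a simple lookup. The sketch assumes the matrix uses the categories H, M and L on both axes and that the priority codes 1 to 5 map to the descriptions and target times listed; the function and variable names are hypothetical.

```python
# (urgency, impact) -> priority code, as read from the example matrix
PRIORITY_MATRIX = {
    ("H", "H"): 1, ("H", "M"): 2, ("H", "L"): 3,
    ("M", "H"): 2, ("M", "M"): 3, ("M", "L"): 4,
    ("L", "H"): 3, ("L", "M"): 4, ("L", "L"): 5,
}

# priority code -> (description, target response time, target resolution time)
PRIORITY_CLASSES = {
    1: ("Critical", "Immediate", "1 Hour"),
    2: ("High", "10 Minutes", "4 Hours"),
    3: ("Medium", "1 Hour", "8 Hours"),
    4: ("Low", "4 Hours", "24 Hours"),
    5: ("Very low", "1 Day", "1 Week"),
}


def prioritize(urgency, impact):
    """Derive the priority class of an Incident from its urgency and impact."""
    code = PRIORITY_MATRIX[(urgency.upper(), impact.upper())]
    description, response, resolution = PRIORITY_CLASSES[code]
    return {
        "code": code,
        "description": description,
        "target_response": response,
        "target_resolution": resolution,
    }


example = prioritize("H", "M")
```

Keeping the matrix as data rather than as nested conditions makes it easy to adjust the scheme to the organization, which the text stresses is necessary.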

Circumstances that warrant the Incident to be treated as a Major Incident
• Major Incidents call for the establishment of a Major Incident Team and are managed through the Handling of Major Incidents process. The above prioritization scheme notwithstanding, it is often appropriate to define additional, readily understandable indicators for identifying Major Incidents (see also the comments below on identifying Major Incidents). Examples of such indicators are:
1. Certain (groups of) business-critical services, applications or infrastructure components are unavailable and the estimated time for recovery is unknown or exceedingly long (specify services, applications or infrastructure components)
2. Certain (groups of) Vital Business Functions (business-critical processes) are affected and the estimated time for restoring these processes to full operating status is unknown or exceedingly long (specify business-critical processes)

Identifying Major Incidents
• It is not easy to give clear guidelines on how to identify Major Incidents, although 1st Level Support often develops a "sixth sense" for these. It is also probably better to err on the side of caution in this respect. Major Incidents tend to be characterized by their impact, especially on customers. Consider some examples:
1. A high speed network communications link fails and part or all of the data communication to and from outside the organization is cut off.
2. A website grinds to a halt because of unexpected heavy demand prior to a deadline (for example to reserve tickets or make a legal submission), resulting in large numbers of customers failing to meet that deadline.
3. A key business database is found to be corrupted.
4. More than one business server is infected by a worm.
5. The private and confidential information of a significant number of individuals is accidentally disclosed in a public forum.
• Note also that all disasters (covered by the IT Service Continuity Strategy and underpinning ITSCM Plans) are Major Incidents, and that smaller incidents that are compounded by errors or inaction can become Major Incidents.

Some of the key characteristics that make these Major Incidents are:
• The ability of significant numbers of customers and/or key customers to use services or systems is or will be affected.
• The cost to customers and/or the service provider is or will be substantial, both in terms of direct and indirect costs (including consequential loss).
• The reputation of the Service Provider is likely to be damaged.
AND
• The amount of effort and/or time required to manage and resolve the incident is likely to be large and it is very likely that agreed service levels (target resolution times) will be breached.
• A Major Incident is also likely to be categorized as a critical or high priority incident.

9 Sub-Processes
• Incident Management Support - To provide and maintain the tools, processes, skills and rules for an effective and efficient handling of Incidents.
• Incident Logging and Categorization - To record and prioritize the Incident with appropriate diligence, in order to facilitate a swift and effective resolution.
• Immediate Incident Resolution by 1st Level Support - To solve an Incident (service interruption) within the agreed time schedule. The aim is the fast recovery of the IT service, where necessary with the aid of a Workaround. As soon as it becomes clear that 1st Level Support is not able to resolve the Incident itself or when target times for 1st level resolution are exceeded, the Incident is transferred to a suitable group within 2nd Level Support.
• Incident Resolution by 2nd Level Support - To solve an Incident (service interruption) within the agreed time schedule. The aim is the fast recovery of the service, where necessary by means of a Workaround. If required, specialist support groups or third-party suppliers (3rd Level Support) are involved. If the correction of the root cause is not possible, a Problem Record is created and the error correction is transferred to Problem Management.
• Handling of Major Incidents - To resolve a Major Incident. Major Incidents cause serious interruptions of business activities and must be resolved with greater urgency. The aim is the fast recovery of the service, where necessary by means of a Workaround. If required, specialist support groups or third-party suppliers (3rd Level Support) are involved. If the correction of the root cause is not possible, a Problem Record is created and the error correction is transferred to Problem Management.
• Incident Monitoring and Escalation - To continuously monitor the processing status of outstanding Incidents, so that counter-measures may be introduced as soon as possible if service levels are likely to be breached.
• Incident Closure and Evaluation - To submit the Incident Record to a final quality control before it is closed. The aim is to make sure that the Incident is actually resolved and that all information required to describe the Incident's life-cycle is supplied in sufficient detail. In addition to this, findings from the resolution of the Incident are to be recorded for future use.
• Pro-Active User Information - To inform users of service failures as soon as these are known to the Service Desk, so that users are in a position to adjust themselves to interruptions. Proactive user information also aims to reduce the number of inquiries by users. This process is also responsible for distributing other information to users, e.g. security alerts.
• Incident Management Reporting - To supply Incident-related information to the other Service Management processes, and to ensure that improvement potentials are derived from past Incidents.

Definitions
The following ITIL terms and acronyms (information objects) are used in the ITIL Incident Management process to represent process outputs and inputs:
• Incident - An Incident is defined as an unplanned interruption or reduction in quality of an IT service (a Service Interruption).
• Incident Escalation Rules - A set of rules defining a hierarchy for escalating Incidents, and triggers which lead to escalations. Triggers are usually based on Incident severity and resolution times. See also: Checklist Incident Priority
• Incident Management Report - A report supplying Incident-related information to the other Service Management processes.
• Incident Model - An Incident Model contains the pre-defined steps that should be taken for dealing with a particular type of Incident. This is a way to ensure that routinely occurring Incidents are handled efficiently and effectively.
• Incident Prioritization Guideline - The Incident Prioritization Guideline describes the rules for assigning priorities to Incidents, including the definition of what constitutes a Major Incident. Since Incident Management escalation rules are usually based on priorities, assigning the correct priority to an Incident is essential for triggering appropriate escalations. See also: Checklist Incident Prioritization Guideline
• Incident Record - A set of data with all details of an Incident, documenting the history of the Incident from registration to closure. An Incident is defined as an unplanned interruption or reduction in quality of an IT service. Every event that could potentially impair an IT service in the future is also an Incident (e.g. the failure of one hard-drive of a set of mirrored drives). See also: ITIL Checklist Incident Record
• Incident Status Information - A message containing the present status of an Incident sent to a user who earlier reported a service interruption. Status information is typically provided to users at various points during an Incident's lifecycle.
• Major Incident - Major Incidents cause serious interruptions of business activities and must be solved with greater urgency. See also: Checklist Incident Priority: Major Incidents
• Major Incident Review - A Major Incident Review takes place after a Major Incident has occurred. The review documents the Incident's underlying causes (if known) and the complete resolution history, and identifies opportunities for improving the handling of future Major Incidents.
• Notification of Service Failure - The reporting of a service failure to the Service Desk, for example by a user via telephone or e-mail, or by a system monitoring tool.
• Pro-Active User Information - A notification to users of existing or imminent service failures even if the users are not yet aware of the interruptions, so that users are in a position to prepare themselves for a period of service unavailability.
• Status Inquiry - An inquiry regarding the present status of an Incident or Service Request, usually from a user who earlier reported an Incident or submitted a request.
• Support Request - A request to support the resolution of an Incident or Problem, usually issued from the Incident or Problem Management processes when further assistance is needed from technical experts.
• User Escalation - Escalation regarding the processing of an Incident or Service Request, initiated by a user experiencing delays or a failure to restore their services.
• User FAQs - Self-help information for users supplied by the Service Desk, usually as part of the Support Pages on the intranet.

ITIL KPIs Incident Management
Key Performance Indicator (KPI) | Definition
Number of repeated Incidents | Number of repeated Incidents, with known resolution methods
Incidents resolved Remotely | Number of Incidents resolved remotely by the Service Desk (i.e. without carrying out work at the user's location)
Number of Escalations | Number of escalations for Incidents not resolved in the agreed resolution time
Number of Incidents | Number of Incidents registered by the Service Desk, grouped into categories
Average Initial Response Time | Average time taken between the time a user reports an Incident and the time that the Service Desk responds to that Incident
Incident Resolution Time | Average time for resolving an Incident, grouped into categories
First Time Resolution Rate | Percentage of Incidents resolved at the Service Desk during the first call, grouped into categories
Resolution within SLA | Rate of Incidents resolved during solution times agreed in the SLA, grouped into categories
Incident Resolution Effort | Average work effort for resolving Incidents, grouped into categories
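To make the KPI definitions concrete, the following sketch computes three of them (First Time Resolution Rate, Resolution within SLA and Average Initial Response Time) from a list of Incident Records. The record fields and the sample data are assumptions made for this example; a real Service Desk tool will have its own data model, and the grouping into categories is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IncidentRecord:
    reported: datetime            # time the user reported the Incident
    first_response: datetime      # time the Service Desk first responded
    resolved: datetime            # time the Incident was resolved
    resolved_on_first_call: bool
    sla_resolution: timedelta     # resolution time agreed in the SLA


def first_time_resolution_rate(incidents):
    """Percentage of Incidents resolved during the first call."""
    return 100.0 * sum(i.resolved_on_first_call for i in incidents) / len(incidents)


def resolution_within_sla(incidents):
    """Percentage of Incidents resolved within the agreed resolution time."""
    within = sum((i.resolved - i.reported) <= i.sla_resolution for i in incidents)
    return 100.0 * within / len(incidents)


def average_initial_response_time(incidents):
    """Average time between report and first Service Desk response."""
    total = sum((i.first_response - i.reported for i in incidents), timedelta())
    return total / len(incidents)


t0 = datetime(2024, 1, 1, 9, 0)   # arbitrary sample timestamp
m, h = timedelta(minutes=1), timedelta(hours=1)
incidents = [
    IncidentRecord(t0, t0 + 5 * m, t0 + 30 * m, True, 1 * h),
    IncidentRecord(t0, t0 + 15 * m, t0 + 5 * h, False, 4 * h),
    IncidentRecord(t0, t0 + 10 * m, t0 + 2 * h, False, 8 * h),
    IncidentRecord(t0, t0 + 10 * m, t0 + 20 * m, True, 1 * h),
]
```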

Checklists for Service Desk and Incident Management
• Checklist Incident Record
• Checklist Initial Analysis of an Incident
• Checklist Incident Escalation
• Checklist Closure of an Incident
• Checklist Incident Report

Checklist Incident Record - ITIL V2
The following data is recorded during the registration of an Incident:
• Unique ID of the Incident (usually allocated automatically by the system)
• Date and time of the creation (usually allocated automatically by the system)
• Service Desk agent responsible for the registration
• Caller/User data
• Incident type (Service Interruption, Service Request)
• Description of symptoms
• Affected IT Service(s)
• Relevant SLAs
• Relationship to CIs
• Product category, usually selected from a category-tree according to the following example:
– Client PC
  • Standard configuration 1
  • ...
– Printer
  • Manufacturer 1
  • ...
• Incident category, e.g.
– Hardware error
– Software error
– ...
• Link/Attribution to another Incident (if a similar outstanding Incident exists to which the new Incident can be attributed)

Checklist Initial Analysis of an Incident
Using the assignment of the Incident to CIs and to Product and Incident categories, the Support Knowledge Base is searched for:
• Known Solutions
• Known Workarounds
• Known Errors
If it becomes apparent during the initial analysis that the attributions originally assigned were not applicable, these are corrected:
• Relationships to CIs
• Product category, usually selected from a category-tree according to the following example:
– Client PC
  • Standard configuration 1
  • ...
– Printer
  • Manufacturer 1
  • ...
• Incident category, e.g.
– Hardware error
– Software error
– ...

Checklist Incident Escalation
The Escalation of Incidents follows pre-defined rules:
• Defined triggers for Escalations, i.e. combinations of
– Degree of severity of an Incident (severe Incidents are, for example, immediately escalated)
– Duration (an Escalation occurs if the Incident was not resolved within a pre-determined period, for example the maximum resolution times agreed within the SLAs)
– In an ideal case this would be system-controlled, triggered by customisable Escalation rules
• Defined Escalation levels in the form of an Escalation Hierarchy, for example
– 1st Level Support
– Incident Manager
– Manager of Data Processing Centre
– CIO
• Assigned triggers to the Escalation Hierarchy (conditions/rules which lead to the Escalation to a particular level within the Escalation Hierarchy)
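A system-controlled version of these rules could look roughly like the sketch below. The Escalation Hierarchy is the example from the checklist; the concrete trigger logic (severe Incidents start one level up, and every full overrun of the maximum resolution time moves the Incident one level further) is an assumption chosen purely for illustration, as are the function and parameter names.

```python
from datetime import timedelta

# Example Escalation Hierarchy from the checklist, lowest level first
ESCALATION_HIERARCHY = [
    "1st Level Support",
    "Incident Manager",
    "Manager of Data Processing Centre",
    "CIO",
]


def escalation_level(severity, open_for, max_resolution_time):
    """Return the hierarchy level responsible for an Incident.

    Assumed rules (hypothetical, not prescribed by ITIL):
    - a "severe" Incident is escalated immediately by one level
    - each full multiple of max_resolution_time that has elapsed
      escalates the Incident by one further level
    """
    level = 1 if severity == "severe" else 0
    overruns = int(open_for / max_resolution_time)  # timedelta / timedelta -> float
    level = min(level + overruns, len(ESCALATION_HIERARCHY) - 1)
    return ESCALATION_HIERARCHY[level]


sla = timedelta(hours=4)  # sample maximum resolution time agreed in the SLA
current = escalation_level("normal", timedelta(hours=5), sla)
```

Because the hierarchy and the thresholds are plain data and parameters, the rules stay customisable, as the checklist recommends.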

Checklist Closure of an Incident
The following entries of an Incident Record are checked for integrity and completeness during the closure of an Incident:
• Protocol of actions
– Person in charge
– Support Group
– Time and Date
– Description of the activity
• History of status changes, for example
– "New" into "Initial Analysis Completed"
– "Initial Analysis Completed" into "Assigned to 2nd Level Support"
– ...
– "Resolved" into "Closed"
• Documentation of applied Workarounds
• Documentation of the root cause of the Service interruption
• Documentation of the applied resolution to eliminate the root cause
• Date of the Incident resolution
• Date of the Incident closure

Checklist Incident Report
The Incident Manager's report includes the following information:
• Adherence to agreed Service Levels
– Agreed Service Levels
– Attained Service Levels
• Major Incidents causing breaches of agreed IT Service Levels
– In the past (prolonged IT Service failures etc.)
  • Type of event
  • Causes
  • Counter-measures for the elimination of the Incident
  • Measures for the future avoidance of similar occurrences
– In the future (e.g. planned prolonged downtimes to IT Services)
• Statistical evaluations
– Number of Incidents
  • Over time
  • According to categories
– Resolution times
  • According to duration
  • According to categories
– Initial resolution rate
  • Over time
  • According to categories
– Trend analyses
• Technical analysis of important or repetitive Incidents
– Description
– Applied resolution strategy
  • Elimination of the root cause
  • Workaround

Roles | Responsibilities
• Incident Manager (Process Owner) - The Incident Manager is responsible for the effective implementation of the Incident Management process and carries out the corresponding reporting. The Incident Manager represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.
• 1st Level Support - The responsibility of 1st Level Support is to register and classify received Incidents and to undertake an immediate effort in order to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also keeps users informed about their Incidents' status at agreed intervals.
• 2nd Level Support - 2nd Level Support takes over Incidents which cannot be solved immediately with the means of 1st Level Support. If necessary, it will request external support, e.g. from software or hardware manufacturers. The aim is to restore a failed IT service as quickly as possible. If no solution can be found, 2nd Level Support passes on the Incident to Problem Management.
• 3rd Level Support - 3rd Level Support is typically located at hardware or software manufacturers (third-party suppliers). Its services are requested by 2nd Level Support if required for solving an Incident. The aim is to restore a failed IT service as quickly as possible.
• Major Incident Team - A dynamically established team of IT managers and technical experts, usually under the leadership of the Incident Manager, formed to concentrate on the resolution of a Major Incident.

Responsibility Matrix
ITIL Role / Sub-Process Incident Manager A[1]R[2] A 1st Level Support R 2nd Level Support Major Incident Team -

Responsibility Matrix: ITIL Incident Management Applications Analyst[3] Technical Analyst[3] IT Operator[3] -

Incident Management Support
Incident Logging and Categorization Immediate Incident Resolution by 1st Level Support Incident Resolution by 2nd Level Support Handling of Major Incidents Incident Monitoring and Escalation Incident Closure and Evaluation Pro-Active User Information Incident Management Reporting

A

R

-

-

-

-

-

A AR AR A A AR

R R R R -

R -

R -

R[4] -

R[4] -

R[4] R -

Remarks
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Incident Management process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Incident Management.
[3] see → Role descriptions...
[4] In cooperation, as required. 2nd Level Support Groups often include Applications Analysts and/or Technical Analysts.

Request Fulfilment
• Process Objective: To fulfill Service Requests, which in most cases are minor (standard) Changes (e.g. requests to change a password) or requests for information.
• Process Description: Request Fulfilment was added as a new process to ITIL V3 with the aim of having a dedicated process dealing with Service Requests. This was motivated by a clear distinction in ITIL V3 between Incidents (Service Interruptions) and Service Requests (standard requests from users, e.g. password resets).
• In ITIL 2011, Request Fulfilment has been completely revised. To reflect the latest guidance, Request Fulfilment now consists of five sub-processes, which provide a detailed description of all activities and decision points.
• Request Fulfilment now contains interfaces with Incident Management (if a Service Request turns out to be an Incident) and with Service Transition (if fulfilling a Service Request requires the involvement of Change Management). The process overview of ITIL Request Fulfilment shows the most important interfaces (see Figure 1).
• A clearer explanation of the information that describes a Service Request and its life cycle has been added.
• The concept of Service Request Models is explained in more detail.

Service Request Fulfilment

Sub Processes
• Request Fulfilment Support - Process Objective: To provide and maintain the tools, processes, skills and rules for an effective and efficient handling of Service Requests.
• Request Logging and Categorization - Process Objective: To record and categorize the Service Request with appropriate diligence and check the requester's authorization to submit the request, in order to facilitate a swift and effective processing.
• Request Model Execution - Process Objective: To process a Service Request within the agreed time schedule.
• Request Monitoring and Escalation - Process Objective: To continuously monitor the processing status of outstanding Service Requests, so that counter-measures may be introduced as soon as possible if service levels are likely to be breached.
• Request Closure and Evaluation - Process Objective: To submit the Request Record to a final quality control before it is closed. The aim is to make sure that the Service Request is actually processed and that all information required to describe the request's life-cycle is supplied in sufficient detail. In addition to this, findings from the processing of the request are to be recorded for future use.

Definitions
• Request for Service - A formal request from a user for something to be provided – for example, a request for information or advice; to reset a password; or to install a workstation for a new user. The details of a Request for Service are recorded by Request Fulfilment in a Service Request Record.
• Service Request Model - A (Service) Request Model defines specific agreed steps that will be followed for a Service Request of a particular type (or category).
• Service Request Record - A record containing all details of a Service Request. Service Requests are formal requests from a user for something to be provided – for example, a request for information or advice; to reset a password; or to install a workstation for a new user.
• Service Request Status Information - A message containing the present status of a Service Request sent to a user who earlier requested a service. Status information is typically provided to users at various points during a Service Request's lifecycle.

Roles | Responsibilities
• Incident Manager (Process Owner) - The Incident Manager is responsible for the effective implementation of the Incident Management process and carries out the respective reporting. The Incident Manager represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.
• 1st Level Support - The responsibility of 1st Level Support is to register and classify received Incidents and to undertake an immediate effort in order to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also processes Service Requests and keeps users informed about their Incidents' status at agreed intervals.
• Service Request Fulfilment Group - Groups specializing in the fulfilment of certain types of Service Requests. Typically, 1st Level Support will process simpler requests, while others are forwarded to the specialized Fulfilment Groups.

Responsibility Matrix: ITIL Request Fulfilment

ITIL Role / Sub-Process | Incident Manager | 1st Level Support | Service Request Fulfilment Group
Request Fulfilment Support | A[1]R[2] | - | -
Request Logging and Categorization | A | R | -
Request Model Execution | A | R | R
Request Monitoring and Escalation | AR | R | -
Request Closure and Evaluation | A | R | -

Remarks
[1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Request Fulfilment process.
[2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Request Fulfilment.

Access Management
• Objective: ITIL Access Management aims to grant authorized users the right to use a service, while preventing access to non-authorized users. The Access Management processes essentially execute policies defined in Information Security Management. Access Management is sometimes also referred to as Rights Management or Identity Management.
• Part of: Service Operation
• Process Owner: Access Manager
• Process Description: Access Management was added as a new process to ITIL V3. The decision to include this dedicated process was motivated by Information Security reasons, as granting access to IT services and applications only to authorized users is of high importance from an Information Security viewpoint.
• In ITIL 2011 an interface between Access Management and Event Management has been added, to emphasize that (some) Event filtering and correlation rules should be designed by Access Management to support the detection of unauthorized access to services. The process overview of ITIL Access Management shows the most important interfaces (see Figure 1).
• A dedicated activity has been added to revoke access rights if required, to make this point clearer.
• In ITIL 2011 it has been made clearer in the Request Fulfilment and Incident Management processes that the requester's authorization must be checked.

Sub Processes
• Maintenance of Catalogue of User Roles and Access Profiles
Process Objective: To make sure that the catalogue of User Roles and Access Profiles is still appropriate for the services offered to customers, and to prevent unwanted accumulation of access rights.

• Processing of User Access Requests
Process Objective: To process requests to add, change or revoke access rights, and to make sure that only authorized users are granted the right to use a service.

Definitions
• Access Rights - A set of data defining what services a user is allowed to access. This definition is achieved by assigning the user, identified by their User Identity, to one or more User Roles.
• Request for Access Rights - A request to grant, change or revoke the right to use a particular service or access certain assets.
• User Identity Record - A set of data with all the details identifying a user or person. It is used to grant rights to that user or person.
• User Identity Request - A request to create, modify or delete a User Identity.
• User Role - A role as part of a catalogue or hierarchy of all the roles (types of users) in the organization. Access rights are based on the roles that individual users have as part of an organization.
• User Role Access Profile - A set of data defining the level of access to a service or group of services for a certain type of user (User Role). User Role Access Profiles help to protect the confidentiality, integrity and availability of assets by defining what information computer users can utilize, the programs that they can run, and the modifications that they can make.
• User Role Requirements - Requirements from the business side for the catalogue or hierarchy of user roles (types of users) in the organization. Access rights are based on the roles that individual users have as part of an organization.
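
The relationship between User Identities, User Roles, User Role Access Profiles and Access Rights can be illustrated with a small data model. This is a minimal sketch under stated assumptions: ITIL does not prescribe any data structure, and all class, function and field names here (`AccessProfile`, `UserIdentity`, `access_rights`, `process_access_request`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessProfile:
    """User Role Access Profile: the services one User Role may use."""
    role: str
    services: frozenset


@dataclass
class UserIdentity:
    """User Identity Record, reduced to an ID and the assigned User Roles."""
    user_id: str
    roles: set = field(default_factory=set)


def access_rights(user, catalogue):
    """Derive Access Rights as the union of the services of all assigned roles."""
    rights = set()
    for role in user.roles:
        profile = catalogue.get(role)
        if profile is not None:
            rights |= profile.services
    return rights


def process_access_request(user, action, role, catalogue):
    """Grant or revoke a role; roles outside the catalogue are rejected."""
    if role not in catalogue:
        raise ValueError(f"unknown role: {role}")
    if action == "grant":
        user.roles.add(role)
    elif action == "revoke":
        user.roles.discard(role)
    else:
        raise ValueError(f"unsupported action: {action}")
```

Because rights are derived from roles rather than stored per user, revoking a role removes every right that only that role conferred, which is one way to limit the unwanted accumulation of access rights mentioned above.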

Roles | Responsibilities - Matrix
• Access Manager – (Process Owner) grants authorized users the right to use a service, while preventing access to non-authorized users. The Access Manager essentially executes policies defined in Information Security Management.
Responsibility Matrix: ITIL Access Management
• Maintenance of Catalogue of User Roles and Access Profiles: Access Manager A[1]R[2]
• Processing of User Access Requests: Access Manager AR

Remarks [1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Access Management process. [2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Access Management.

Process Implementation: Notes
• There are a number of different approaches to implementing Access Management. Depending on the size of an organization, the methods applied can be rather complex. In this context, ITIL does not provide a detailed explanation of all aspects of Access Management.
• Well-defined interfaces between the business and Access Management are vital to achieve high security standards. Typically, the responsibilities of both sides are defined in a dedicated Information Security Policy. This policy would, for example, stipulate that HR is to inform Access Management without delay about employees entering or leaving the company.

Problem Management
• Objective: The objective of ITIL Problem Management is to manage the lifecycle of all Problems. The primary objectives of Problem Management are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. Proactive Problem Management analyzes Incident Records, and uses data collected by other IT Service Management processes to identify trends or significant Problems.
• Process Description: Essentially, the activities and process objectives of ITIL Problem Management are identical in ITIL V3 and ITIL V2. A new sub-process Major Problem Review was introduced in ITIL V3 to review the solution history of major Problems in order to prevent a recurrence and learn lessons for the future. In ITIL 2011 the new sub-process Proactive Problem Identification has been added to emphasize the importance of proactive Problem Management. In Problem Categorization and Prioritization, it has been made clearer that categorization and prioritization should be harmonized with the approach used in Incident Management, to facilitate matching between Incidents and Problems. The process overview of ITIL Problem Management shows the most important interfaces (see Figure 1). The concept of recreating Problems during Problem Diagnosis and Resolution is now more prominent. This sub-process has been completely revised to provide clearer guidance on how this process cooperates with Incident Management.
• Note: The new ITIL 2011 books also contain an expanded section on problem analysis techniques and examples for situations where the various techniques may be applied.


Sub-Processes

1. Proactive Problem Identification - To improve overall availability of services by proactively identifying Problems. Proactive Problem Management aims to identify and solve Problems and/or provide suitable Workarounds before (further) Incidents recur.
2. Problem Categorization and Prioritization - To record and prioritize the Problem with appropriate diligence, in order to facilitate a swift and effective resolution.
3. Problem Diagnosis and Resolution - To identify the underlying root cause of a Problem and initiate the most appropriate and economical Problem solution. If possible, a temporary Workaround is supplied.
4. Problem and Error Control - To constantly monitor outstanding Problems with regard to their processing status, so that where necessary corrective measures may be introduced.
5. Problem Closure and Evaluation - To ensure that, after a successful Problem solution, the Problem Record contains a full historical description, and that related Known Error Records are updated.
6. Major Problem Review - To review the resolution of a Problem in order to prevent recurrence and learn any lessons for the future. Furthermore it is to be verified whether the Problems marked as closed have actually been eliminated.
7. Problem Management Reporting - ITIL Problem Management Reporting aims to ensure that the other Service Management processes as well as IT Management are informed of outstanding Problems, their processing status and existing Workarounds (see "Problem Management Report").

Definitions
• Known Error - A Problem that has a documented root cause and a Workaround. Known Errors are managed throughout their lifecycle by the Problem Management process. The details of each Known Error are recorded in a Known Error Record stored in the Known Error Database (KEDB). As a rule, Known Errors are identified by Problem Management, but Known Errors may also be suggested by other Service Management disciplines, e.g. Incident Management, or by suppliers.
• Known Error Database (KEDB) - Created by Problem Management and used by Incident and Problem Management to manage all Known Error Records.
• Problem - The cause of one or more Incidents. The cause is not usually known at the time a Problem Record is created.
• Problem Management Report - A report supplying Problem-related information to the other Service Management processes.
• Problem Record - Contains all details of a Problem, documenting the history of the Problem from detection to closure (see: ITIL Checklist Problem Record).
• Suggested new Known Error - A suggestion to create a new entry in the Known Error Database, for example raised by the Service Desk or by Release Management. Known Errors are managed throughout their lifecycle by Problem Management.
• Suggested new Problem - A notification about a suspected Problem, handed over to Problem Management for further investigation, possibly leading to the formal logging of a Problem.
• Suggested new Workaround - A suggestion to enter a new Workaround in the Known Error Database, for example raised by the Service Desk or by Release Management. Workarounds are managed throughout their lifecycle by Problem Management.
• Workaround - A temporary solution aimed at reducing or eliminating the impact of Known Errors (and thus Problems) for which a full resolution is not yet available. As such, Workarounds are often applied to reduce the impact of Incidents or Problems if their underlying causes cannot be readily identified or removed.

ITIL KPIs Problem Management
Key Performance Indicator (KPI): Definition
• Number of Problems: Number of Problems registered by Problem Management, grouped into categories
• Problem Resolution Time: Average time for resolving Problems, grouped into categories
• Number of unresolved Problems: Number of Problems where the underlying root cause is not known at a particular time
• Number of Incidents per Known Problem: Number of reported Incidents linked to the same Problem after Problem identification
• Time until Problem Identification: Average time between first occurrence of an Incident and identification of the underlying root cause
• Problem Resolution Effort: Average work effort for resolving Problems, grouped into categories
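
Several of these KPIs can be computed directly from Problem Records. The sketch below is illustrative only: it assumes each record is a dictionary with the hypothetical keys `category`, `created`, `root_cause_identified` and `resolved` (the last two may be `None`), which is not a format defined by ITIL.

```python
from collections import defaultdict
from statistics import mean


def count_by_category(problems):
    """KPI 'Number of Problems': registered Problems grouped into categories."""
    counts = defaultdict(int)
    for problem in problems:
        counts[problem["category"]] += 1
    return dict(counts)


def mean_resolution_hours(problems):
    """KPI 'Problem Resolution Time': average hours to resolve, per category.

    Problems that are not yet resolved are left out of the average.
    """
    durations = defaultdict(list)
    for problem in problems:
        if problem["resolved"] is not None:
            delta = problem["resolved"] - problem["created"]
            durations[problem["category"]].append(delta.total_seconds() / 3600)
    return {category: mean(hours) for category, hours in durations.items()}


def unresolved_at(problems, moment):
    """KPI 'Number of unresolved Problems': root cause not known at `moment`."""
    return sum(
        1
        for problem in problems
        if problem["created"] <= moment
        and (
            problem["root_cause_identified"] is None
            or problem["root_cause_identified"] > moment
        )
    )
```

The `created`, `root_cause_identified` and `resolved` values are expected to be `datetime` objects, so that subtraction and comparison work as written.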

Checklists for Problem Management
Checklist Problem Record - ITIL V2
The following data is entered during the creation of a Problem Record:
• Unique Problem ID (usually assigned automatically by the system)
• Creation date and time (usually allocated automatically by the system)
• Person in charge of the creation
• Description of symptoms
• Affected IT Service(s)
• Relevant SLAs
• Relationship to CIs
• Product category, usually selected from a category-tree according to the following example:
  – Client PC
    • Standard configuration 1
    • ...
  – Printer
    • Manufacturer 1
    • ...
• Problem category, for example
  – Hardware error
  – Software error
  – ...
• Links to
  – Incidents associated with this Problem
  – Other Problems, whose resolution is associated with this Problem
• Workaround for the circumvention of the Problem, if known

Checklist Problem Record - ITIL 2011
A Problem Record typically contains the following information:
• Unique ID of the Problem (usually allocated automatically by the system)
• Date and time of detection
• Problem owner
• Description of symptoms
• Affected users/business areas
• Affected service(s)
• Prioritization, a function of the following components:
  – Urgency (available time until the resolution of the Problem), e.g.
    • Up to 5 working days
    • Up to 2 weeks
    • Up to 4 weeks
  – Degree of severity (damage caused to the business), e.g.
    • "High" (interruption to critical business processes)
    • "Normal" (interruption to the work of individual employees)
    • "Low" (hindrance to the work of individual employees, continuation of work possible by means of a circumventive solution)
  – Priority (for example in stages 1, 2 and 3): The result from the combination of urgency and the degree of severity
• Relationships to CIs
• Problem category, usually selected from a category-tree according to the following example (Problem categories should be harmonized with CI and Incident categories to support matching between Incidents, Problems and CIs):
  – Hardware error
    • Server A
      – Component x
        » Symptom a
        » Symptom b
      – Component y
      – …
    • Server B
    • …
  – Software error
    • System A
    • System B
    • …
  – Network error
  – ...
• Links to related Problem Records (if there are other outstanding Problems related to this one)
• Links to related Incident Records (if outstanding Incidents exist, whose solution depends on the solution of this Problem)
• Links to Known Errors and Workarounds (if Known Errors and Workarounds related to the Problem have been identified)
• Problem Recovery Procedures: Any procedures that are required to be performed to eliminate the Problem. These procedures may need to be performed as part of removing Workarounds that have been applied while solving related Incidents.
• Activity log/resolution history
  – Date and time
  – Person in charge
  – Description of activities
  – New Problem status (if the activity results in a change of status)

Checklist Problem Priority
The priority of a Problem is assigned according to the following rules:
• Urgency (available time until the resolution of the Problem), e.g.
  – 1: up to 4 hrs.
  – 2: up to 1 day
  – 3: up to 5 days
• Degree of severity (damage caused to the business), e.g.
  – 1: "High" (interruption to critical business processes)
  – 2: "Normal" (interruption to the work of individual employees)
  – 3: "Low" (hindrance to the work of individual employees, continuation of work possible by means of a circumventive solution)
• Priority (e.g. in stages 1, 2 and 3): A function of urgency and the degree of severity
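
The checklist states only that priority is a function of urgency and severity; it does not give the function. One common way to express such a rule is a lookup table. The sketch below is an illustration under that assumption: the cell values in `PRIORITY_MATRIX` are hypothetical and would need to be agreed within the organization, and the function name `problem_priority` is likewise invented for this example.

```python
# Hypothetical mapping (urgency, severity) -> priority, all on the 1..3 scales above.
# The individual cell values are an assumption, not taken from ITIL.
PRIORITY_MATRIX = {
    (1, 1): 1, (1, 2): 1, (1, 3): 2,
    (2, 1): 1, (2, 2): 2, (2, 3): 3,
    (3, 1): 2, (3, 2): 3, (3, 3): 3,
}


def problem_priority(urgency, severity):
    """Look up the priority stage for an urgency/severity pair."""
    try:
        return PRIORITY_MATRIX[(urgency, severity)]
    except KeyError:
        raise ValueError(
            f"urgency and severity must be 1, 2 or 3, got {urgency}, {severity}"
        ) from None
```

A table keeps the rule easy to review and change; an arithmetic formula would be shorter but harder to adjust cell by cell.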

Checklist Closure of a Problem
The following entries are checked for completeness and integrity during the closure of a Problem:
• Protocol of actions
  – Person in charge
  – Support group
  – Time and date
  – Description of the activity
• History of the change in status, e.g.
  – "New" into "Initial Analysis Completed"
  – "Initial Analysis Completed" into "Assigned to Specialists"
  – ...
  – "Resolved" into "Closed"
• Documentation of the root cause of the Problem (Known Error)
• Documentation of possible Workarounds
• Documentation of the applied (causal) resolution
• Date of Problem resolution
• Date of Problem closure
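
The closure checklist above can be sketched as an automated completeness check. This is a minimal illustration, assuming a Problem Record held as a dictionary; every key name (`actions`, `status_history`, `root_cause` and so on) is hypothetical, and the rule that the status history must end in "Closed" is inferred from the example transitions rather than stated by the checklist.

```python
# Hypothetical field names mirroring the checklist items.
ACTION_FIELDS = ("person_in_charge", "support_group", "timestamp", "description")
RECORD_FIELDS = ("root_cause", "workarounds", "resolution", "resolved_on", "closed_on")


def closure_gaps(record):
    """Return the checklist items that are missing from a Problem Record."""
    gaps = []

    # Documentation and date fields must be present (an empty list of
    # Workarounds is accepted, since a Problem may have none).
    for name in RECORD_FIELDS:
        if record.get(name) is None:
            gaps.append(f"missing {name}")

    # Every entry in the protocol of actions needs all four details.
    actions = record.get("actions") or []
    if not actions:
        gaps.append("empty protocol of actions")
    for index, action in enumerate(actions):
        for name in ACTION_FIELDS:
            if not action.get(name):
                gaps.append(f"action {index}: missing {name}")

    # Assumed rule: the last status transition must lead to "Closed".
    history = record.get("status_history") or []
    if not history or history[-1][1] != "Closed":
        gaps.append("status history does not end in Closed")

    return gaps
```

Here `status_history` is expected to be a list of `(old_status, new_status)` pairs; a record may be closed only when `closure_gaps` returns an empty list.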

Checklist Problem Report
The Problem Manager's report includes the following information:
• Statistical evaluations
  – Outstanding Problems
    • According to duration since creation of the Problem Record
    • According to categories
  – Resolution times of closed Problems
    • According to duration
    • According to categories
  – Trend analyses
• Problems with special importance regarding Availability, Capacity, IT Service Continuity and IT Security Management
  – Description
  – Problem cause
  – Applied resolution strategy
    • Elimination of the root cause
    • Possible Workarounds
  – Time schedule for the resolution of the Problem
• Other important Problems with extensive effects upon the quality of the IT Services
  – Description
  – Problem cause
  – Applied resolution strategy
    • Elimination of the root cause
    • Possible Workarounds
  – Time schedule for the resolution of the Problem

Roles | Responsibilities
Responsibility Matrix: ITIL Problem Management
Roles: Problem Manager, Applications Analyst[3], Technical Analyst[3]
• Proactive Problem Identification: Problem Manager A[1]R[2]
• Problem Categorization and Prioritization: Problem Manager AR
• Problem Diagnosis and Resolution: Problem Manager AR; Applications Analyst R; Technical Analyst R
• Problem and Error Control: Problem Manager AR
• Problem Closure and Evaluation: Problem Manager AR
• Major Problem Review: Problem Manager AR
• Problem Management Reporting: Problem Manager AR

Problem Manager – (Process Owner) is responsible for managing the lifecycle of all Problems. His primary objectives are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. To this purpose he maintains information about Known Errors and Workarounds.

Remarks [1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the Problem Management process. [2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Problem Management. [3] see Role descriptions...

IT Operations Control
• Objective: IT Operations Control aims to monitor and control the IT services and their underlying infrastructure. The process IT Operations Control executes day-to-day routine tasks related to the operation of infrastructure components and applications. This includes job scheduling, backup and restore activities, print and output management, and routine maintenance.
• Part of: Service Operation
• Process Owner: IT Operations Manager

Process Description
• ITIL does not provide a detailed explanation of all aspects of IT Operations, as the activities to be carried out will depend on the specific applications and infrastructure components in use. Rather, ITIL 2011 highlights common operational activities and assists in identifying important interfaces with other Service Management processes. The official ITIL publications treat IT Operations Control as a "function". The process overview of IT Operations Control shows the most important interfaces (see Figure 1).
• Remark: In ITIL V3, IT Operations Control activities were covered in the process "IT Operations Management".

Roles | Responsibilities
• IT Operations Manager - (Process Owner) An IT Operations Manager will be needed to take overall responsibility for a number of Service Operation activities. For instance, this role will ensure that all day-to-day operational activities are carried out in a timely and reliable way.
• IT Operator - IT Operators are the staff who perform the day-to-day operational activities. Typical responsibilities include performing backups, ensuring that scheduled jobs are performed, and installing standard equipment in the data center.

Responsibility Matrix: IT Operations Control
• IT Operations Control (no sub-processes specified): IT Operations Manager A[1]; IT Operator R[2]

Remarks [1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the IT Operations Control process. [2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within IT Operations Control.

IT Facilities Management
• Objective: The objective of ITIL Facilities Management is to manage the physical environment where the IT infrastructure is located. Facilities Management includes all aspects of managing the physical environment, for example power and cooling, building access management, and environmental monitoring.
• Part of: Service Operation
• Process Owner: Facilities Manager

Process Description
• ITIL Facilities Management was part of ICT Infrastructure Management in ITIL V2, where some aspects of managing facilities are described in more detail than in the new ITIL V3 books.
• Interfaces between Facilities Management and the other ITIL processes were adjusted in order to reflect the new ITIL V3 process structure. The process overview of ITIL Facilities Management shows the most important interfaces (see Figure 1).
• Note: The official ITIL publications treat Facilities Management as a "function".

Roles | Responsibilities
• Facilities Manager – (Process Owner) The Facilities Manager is responsible for managing the physical environment where the IT infrastructure is located. This includes all aspects of managing the physical environment, for example power and cooling, building access management, and environmental monitoring.
Responsibility Matrix: ITIL Facilities Management
• Facilities Management (no sub-processes specified): Facilities Manager A[1]R[2]

Remarks [1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the ITIL Facilities Management process. [2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within ITIL Facilities Management.

ITIL Application Management
• Objective: ITIL Application Management is responsible for managing applications throughout their lifecycle. This process plays an important role in the application-related aspects of designing, testing, operating and improving IT services, as well as in developing the skills required to operate the IT organization's applications. Application Management is an ongoing activity, as opposed to Application Development, which is typically a one-time set of activities to construct applications.
• Part of: Service Operation
• Process Owner: Applications Analyst

Process Description
• Application Management is treated in ITIL as a "function". It plays an important role in the management of applications and systems.
• Many, but not all, Application Management activities are embedded in various ITIL processes. For this reason, at IT Process Maps we decided to introduce an Application Management process as part of the ITIL Process Map, which contains the Application Management activities not covered in any other ITIL process.
• Application Management activities embedded in other processes are shown there, with responsibility assigned to the Applications Analyst role.
• The process overview of ITIL Application Management shows the most important interfaces (see Figure 1).

Definitions /Roles | Responsibilities
• Skills Inventory - Identifies the skills required to deliver IT services (now and in future), as well as the individuals who possess those skills. The Skills Inventory is the basis for developing training plans for individual employees.
• Applications Analyst - (Process Owner) An Application Management role which manages applications throughout their lifecycle. There is typically one Applications Analyst or team of analysts for every key application. This role plays an important part in the application-related aspects of designing, testing, operating and improving IT services. It is also responsible for developing the skills required to operate the applications required to deliver IT services.
Responsibility Matrix: ITIL Application Management
• Application Management (no sub-processes specified): Applications Analyst A[1]R[2]

Remarks [1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the ITIL Application Management process. [2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within Application Management.

ITIL Technical Management
• Objective: ITIL Technical Management provides technical expertise and support for the management of the IT infrastructure. Technical Management plays an important role in the technical aspects of designing, testing, operating and improving IT services, as well as in developing the skills required to operate the IT infrastructure.
• Part of: Service Operation
• Process Owner: Technical Analyst

Process Description
• Technical Management is treated in ITIL as a "function". It plays an important role in the management of the IT infrastructure.
• Many, but not all, Technical Management activities are embedded in various ITIL processes. For this reason, at IT Process Maps we decided to introduce a Technical Management process as part of the ITIL Process Map, which contains the Technical Management activities not covered in any other ITIL process.
• Technical Management activities embedded in other processes are shown there, with responsibility assigned to the Technical Analyst role.
• The process overview of ITIL Technical Management shows the most important interfaces (see Figure 1).

Roles | Responsibilities
• Technical Analyst - Process Owner - is a Technical Management role which provides technical expertise and support for the management of the IT infrastructure. There is typically one Technical Analyst or team of analysts for every key technology area. This role plays an important part in the technical aspects of designing, testing, operating and improving IT services. It is also responsible for developing the skills required to operate the IT infrastructure.
Responsibility Matrix: ITIL Technical Management
• Technical Management (no sub-processes specified): Technical Analyst A[1]R[2]

Remarks [1] A: Accountable according to the RACI Model: Those who are ultimately accountable for the correct and thorough completion of the ITIL Technical Management process. [2] R: Responsible according to the RACI Model: Those who do the work to achieve a task within ITIL Technical Management.

ITIL roles and boards - Service Operation
• 1st Level Support - Registers and classifies received Incidents and undertakes an immediate effort to restore a failed IT service as quickly as possible. If no ad-hoc solution can be achieved, 1st Level Support will transfer the Incident to expert technical support groups (2nd Level Support). 1st Level Support also processes Service Requests and keeps users informed about their Incidents' status at agreed intervals.
• 2nd Level Support - Takes over Incidents which cannot be solved immediately with the means of 1st Level Support. If necessary, it will request external support, e.g. from software or hardware manufacturers. The aim is to restore a failed IT Service as quickly as possible. If no solution can be found, 2nd Level Support passes on the Incident to Problem Management.
• 3rd Level Support - Is typically located at hardware or software manufacturers (third-party suppliers). Its services are requested by 2nd Level Support if required for solving an Incident. The aim is to restore a failed IT Service as quickly as possible.
• Access Manager - Grants authorized users the right to use a service, while preventing access by non-authorized users. The Access Manager essentially executes policies defined in Information Security Management.
• Facilities Manager - Is responsible for managing the physical environment where the IT infrastructure is located. This includes all aspects of managing the physical environment, for example power and cooling, building access management, and environmental monitoring.
• Incident Manager - Is responsible for the effective implementation of the Incident Management process and carries out the corresponding reporting. The Incident Manager represents the first stage of escalation for Incidents, should these not be resolvable within the agreed Service Levels.
• IT Operations Manager - Takes overall responsibility for a number of Service Operation activities. For instance, this role will ensure that all day-to-day operational activities are carried out in a timely and reliable way.
• IT Operator - IT Operators are the staff who perform the day-to-day operational activities. Typical responsibilities include performing backups, ensuring that scheduled jobs are performed, and installing standard equipment in the data center.
• Major Incident Team - A dynamically established team of IT managers and technical experts, usually under the leadership of the Incident Manager, formed to concentrate on the resolution of a Major Incident.
• Problem Manager - Is responsible for managing the lifecycle of all Problems. The Problem Manager's primary objectives are to prevent Incidents from happening, and to minimize the impact of Incidents that cannot be prevented. To this purpose the Problem Manager maintains information about Known Errors and Workarounds.
• Service Request Fulfilment Group - Fulfilment Groups specialize in the fulfilment of certain types of Service Requests. Typically, 1st Level Support will process simpler requests, while others are forwarded to the specialized Fulfilment Groups.
