
ITIL

Foundation

User Guide

Printed and Published by:

Key Skills
George House
Princes Court
Beam Heath Way
Nantwich
Cheshire
CW5 6GD

The company has endeavoured to make sure that the information contained within this User Guide is correct at
the time of its release.

The information given here must not be taken as forming part of or establishing any contractual or other
commitment by Key Skills and no warranty or representation concerning the information is given.

All rights reserved. This publication, in part or whole, may not be reproduced, stored in a retrieval system, or
transmitted in any form or by any means – electronic, electrostatic, magnetic disc or tape, optical disk,
photocopying, recording or otherwise without the express written permission of the publishers, Key Skills.

© Key Skills 2000

Contents

Page

Foreword 4

Section 1

Hardware/Software Pre-requisites 5

Section 2

Installation Procedure 6

Section 3

Operating the software 7


Sign-on procedure 7
The user interface 7

Section 4 - Course Notes

Topic 1 – Overview of IT Service Management and ITIL


Lesson 1a - Introduction 9

Topic 2 – Supporting the User of IT Services


Lesson 2a – Service Desk 13
Lesson 2b – Incident Management 20
Lesson 2c – Problem Management 24

Topic 3 – Control Processes


Lesson 3a – Configuration Management 30
Lesson 3b – Change Management 37
Lesson 3c – Release Management 46

Topic 4 – Service Delivery Building Blocks


Lesson 4a – Availability Management 51
Lesson 4b – Capacity Management 59

Topic 5 – Getting the Right Service Quality at the Right Price


Lesson 5a – Service Level Management 65
Lesson 5b – Financial Management for IT Services 72

Topic 6 – Protecting Business and IT Services


Lesson 6a – Continuity Management 77

Topic 7 – Exam Technique


Lesson 7a – Passing the ITIL Foundation Exam 82

Acronyms 84

Glossary of Terms 90

Foreword

Projects are essentially about change – and because managing change is an increasingly significant
fact of business life – project management is an essential key skill in today’s working environment.

Many people are involved in project work, either directly or in a supporting role, and yet they have
never received formal training in the basic techniques which can make the difference between a
successful project and an expensive failure.

For exactly the same reason, the introduction of computer-based project management tools can
lead to disappointing results. Training someone to simply operate a computerised project planning
tool does not make them a project manager – any more than teaching them to use a calculator
would make them an accountant!

Key Skills in Project Management (Fundamentals) is the first step in bridging this skills gap. For
many people it is all the training they will need to enable them to operate more effectively in a
project environment and to make more effective use of their planning software. Professional
project managers will find this course lays a solid foundation on which the other modules in the Key
Skills PM Portfolio will build to provide a career-enhancing programme of learning and
development.

Section 1: Hardware/Software Pre-requisites

For optimum performance, you should operate this multimedia course on a computer with the
following minimum specification:

Pentium P100 Processor
16 MB RAM
8x CD-ROM drive
Sound-card & speakers
SVGA Monitor (NB The course uses 800x600 resolution)
Mouse/Pointing device

A 32-bit Windows® operating system is also needed.

Section 2: Installation Procedure

2.1 From CD-ROM (Single User)

Place the CD in your CD drive and run START.EXE.

START.EXE will run the course directly from your CD-ROM drive and no runtime files will be copied
to your hard disk drive.

The first time you run the course you will be required to register it with Key Skills. Please follow the
accompanying registration instructions carefully.

2.2 Network Instructions

Subject to bandwidth and licensing terms, this multimedia training course can be installed and
operated over a local area network or a corporate intranet.

There are a number of ways in which installation and operation can be effected and you should
contact Key Skills Technical Support Section for advice.

If you have any problems, please call us on 01270 611600.

Section 3: Operating the Software

3.1 Sign-On Procedure

To start the course, double click on the course icon and the program will commence, with music
and an introductory title screen.

Once you have passed the title screen and copyright notices you will be asked to identify yourself
to the system.

If you are new to the course you must enter your name/identification and then confirm this to the
system. If you have used the course previously be sure to use the same name, otherwise your
bookmarks within the course will be invalid.

3.2 The User Interface

Once sign-on is completed you will be presented with the main menu.

[Figure: main menu screen]

Each topic is represented by one of the “panes” on the menu screen.

[Figure: example pane, with its two areas labelled “Go to Bookmarks” and “Start at First Page”]

Each of these lesson panes is divided into two distinct areas. If you click on the lesson title text
then you will be taken to the start of the corresponding lesson.

The left side of the pane is the bookmark area and a pink bar will appear in this area to show
whether you have partly or fully completed the corresponding lesson. By clicking in the bookmark
area you will be taken to your last point of study within the corresponding lesson.

Note: The bookmarking system is switched off as soon as you move around the course using
either the Index or the Contents buttons at the bottom of each page.

Throughout the course, the main user controls are located at the bottom of the screen.

[Figure: the main user controls and their functions]

Newcomers to the course will gain most benefit from starting at the beginning of the first lesson
and working their way through, sequentially, to the end. However, the package is also a valuable
source of reference and it is possible to re-visit specific lessons, or parts of a lesson, at any time.
The Contents and Index facilities are particularly useful for browsing in this way.

Lesson 1a Introduction
Section 4: Course Notes

Lesson 1a - Introduction

Welcome to this computer based training course in IT Service Management.

This course has been designed to provide you with sufficient knowledge to pass the ISEB and EXIN Foundation level exams.

People who will benefit from this course include:

• Individuals currently working in an organisation’s IT department

• Those wishing to develop skills in IT Service Management

• Organisations and their employees who have implemented or intend to implement an IT Service Management structure.

In this introductory lesson we will:

• Discover what ITIL is, and how ITIL fits in to a quality environment.

• Examine Service Management and the Organisation, the ICT infrastructure, and how we define a service in IT terms.

• Finally, examine the functions that make up the core ITIL processes.

So what is ITIL?

ITIL is an acronym for Information Technology Infrastructure Library. It consists of a library of reference books outlining good practice guidelines for IT Service Management.

It was conceived by the UK government, who approached various organisations and subject matter experts to write all of the books in the library, and it was originally published in the late 1980s.

The ITIL library is published by the Office of Government Commerce, or OGC, and in 2001 revised versions of the ITIL manuals were published to include, amongst other things, recent technological developments, such as the internet and e-commerce.

Further updates to the manuals were published late in 2002.

Since its inception ITIL has expanded from a library of books into a whole industry, with many organisations offering related products including training, consultancy and management tools.

The ITIL Library consists of seven volumes. The central core of the library consists of Service Delivery, Service Support, Business Perspective and Infrastructure Management, and at its centre, Application Management.

Application Management holds this central position as it’s the only volume in the library which deals with both Development and Service Delivery issues.

There are two further ancillary volumes, which provide additional guidance. They are: ‘Planning to Implement Service Management’ – used by project managers who are implementing ITIL, and ‘Security Management’ – which offers additional information on infrastructure.

For the purposes of this course we are interested in what’s known as ‘Core ITIL’. This core consists of two major volumes, ‘Service Support’ and ‘Service Delivery’.

In addition to the two main manuals we will also refer to a guidance overview booklet known as ‘little ITIL’. This overview booklet is published by the IT Service Management Forum or ITSMF, an independent user organisation dedicated to IT Service Management.

This course forms an ‘introductory overview’ to the content of both books, and you will find that much of the material is also covered in the ‘little ITIL’ book. This ‘overview’ will provide you with enough knowledge to sit the Foundation Certificate in Service Management.

ITIL and ISO9000

Today’s businesses need to concentrate on providing a ‘Quality Service’ and to adopt a more customer focussed approach. ITIL provides a best practice framework focusing on the provision of high quality services, and it places particular importance on customer-supplier relationships.

For example, areas within ‘Service Delivery’ address customer agreements and monitor targets within these agreements. On an operational level ‘Service Support’ processes address any changes or failings outlined in these agreements.

In both cases, there is a strong link between ITIL and recognised quality systems, such as ISO9000. ITIL’s non-prescriptive nature allows the tailoring of ‘Service Management’ implementation, allowing it to sit comfortably alongside a recognised quality system.

Many companies require their suppliers to become registered to ISO 9001 and because of


this, registered companies find that their market opportunities have increased. In addition, a company's compliance with ISO 9001 ensures that it has a sound Quality Assurance system.

Registered companies have had dramatic reductions in Customer complaints, significant reductions in operating costs and increased demand for their products and services.

Service Management and the Organisation

In any organisation, managing IT services is a fundamental part of day to day operations. As well as maintaining and servicing these ongoing business functions, an organisation develops new applications.

Each new application might be made up of a number of projects, or a group of projects, known as a programme. The relationship between these different projects needs to be understood and documented in order to monitor progress, change and so on.

As these projects develop they approach a transition point. A transition point is defined as the point at which responsibility for the project passes from the development team to the team responsible for end user delivery and support.

This transition point is also known as the implementation point, and it can vary depending on organisational structure and policy. For example, a development team might retain project responsibility until the end of a warranty period, at the end of which they hand over the completed project, and associated ownership, to service management staff.

ITIL defines a major process to handle the complex relationships which affect projects, and this is known as Application Management.

Application Management considers the whole ‘cradle to grave’ lifecycle of an application, considering issues from feasibility through productive life and final retirement of the application. It considers applications as ‘strategic resources’ that need to be managed throughout their life, understanding the implications that decisions made at one stage have on later stages.

Although this process isn’t examined in detail in this course, it is important to understand the relationship between Service Management guidance and the IT business as a whole.

The ICT Infrastructure

If service provision to business is to be effective, then its implementation should be as transparent as possible. It should be assumed that end-users have no Information & Communications Technology knowledge. IT Service Management staff must take a customer focused view and concentrate on providing high quality services that are available when users want them, that respond quickly to demand, and are easily maintainable. As IT management staff, you will be working alongside technical specialists helping to maintain the ICT infrastructure, and ensuring that delivered services are cost effective.

The ICT infrastructure is divided into three areas: Hardware, Software and Peopleware.

Hardware consists of all the ICT and environmental infrastructure, including mainframe computers, network equipment, workstations etc.

Software consists of network and mainframe operating systems, database management systems, development tools and general applications and computer data itself.

The inclusion of data here is contentious, as it’s suggested by some people that a fourth infrastructure category should exist, handling data as a separate corporate resource.

And finally, Peopleware: this includes skill sets, details of training products, documentation of both products and services, working practices and general procedures.

To deliver effective services to business, all three infrastructure components should be managed and controlled efficiently.

The management of Hardware and Software is dealt with in a separate ITIL guidance volume called ‘ICT Infrastructure Management’.

Our focus in this course is the management of ‘Peopleware’, its documents and procedures, and how it relates to Service Support and Service Delivery.


What does ITIL regard as a Service?

We all encounter business services in our everyday lives. When placing an order for goods or services, for example, or when checking into a hotel, we are being offered a business service.

In most cases businesses are underpinned by IT services. The IT service consists of a set of related functions provided by IT systems in support of the business, and is seen by the customer as a coherent and self-contained entity.

A key phrase in the definition of IT services is ‘end to end’. Broadly speaking ‘end to end’ means that we deal with all aspects of the service, its documentation, its support, the application software, its networks, hardware and so on. Obvious examples of IT services might include e-mail, payroll and order processing. However, there are other less obvious IT services, and these could include a wide area network or a UNIX server, or a customer database forming part of a service support IT system.

The ITSMF’s ‘little ITIL’ book defines Service as:

‘An integrated composite that consists of a number of components, such as management processes, hardware, software, facilities and people, that provides a capability to satisfy a stated management need or objective.’

The core ITIL processes are made up of eleven disciplines.

Five of these disciplines relate to service delivery.

These are:
• Service Level Management
• IT Financial Management
• Availability Management
• Capacity Management
• IT Service Continuity Management

Day to day Service Delivery functions might consist of technical support, and pro-active long-term planning of services.

The remaining six disciplines make up the Service Support function.

These are:
• Service Desk
• Incident Management
• Problem Management
• Change Management
• Release Management
• Configuration Management

All six disciplines relate to the day-to-day maintenance of a quality service.

Ten of the eleven disciplines relate to process management; the one exception is the Service Desk function.

Service Desk is seen as a function. Every organisation will have this function in place, operating a Service Desk, employing service desk staff and managed by a service desk manager.

The remaining 10 disciplines all relate to processes. For example we might have in place an Incident Management process, but may not have an Incident Manager. Our Incident Management function might be managed by a member of the Technical Support or Service Desk team.

ITIL does not mandate the creation of specific functional areas. So, for example, a Problem Management team need not be separate from a Capacity Management team and so on. In practice, many organisations do follow this model, but ITIL guidance allows you to form your own structures.

However, ITIL does suggest one good practice, that is for Configuration, Change and Release Management to ‘share’ staff, and to be managed by one individual. This shared management is known as the CCRM or Configuration, Change and Release Management function.
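
As a purely illustrative aid, the grouping of the eleven disciplines described above can be written down as a small Python data structure. This is a minimal sketch: the names `Discipline`, `Area`, `Kind`, `CCRM` and `by_area` are assumptions made for this example and are not ITIL terminology or part of any tool; only the discipline names and their grouping are taken from the lesson text.

```python
from dataclasses import dataclass
from enum import Enum


class Area(Enum):
    SERVICE_SUPPORT = "Service Support"
    SERVICE_DELIVERY = "Service Delivery"


class Kind(Enum):
    FUNCTION = "function"
    PROCESS = "process"


@dataclass(frozen=True)
class Discipline:
    name: str
    area: Area
    kind: Kind


def _make(name, area):
    # Per the lesson, only the Service Desk is a function; the other ten are processes.
    kind = Kind.FUNCTION if name == "Service Desk" else Kind.PROCESS
    return Discipline(name, area, kind)


DISCIPLINES = [
    _make("Service Desk", Area.SERVICE_SUPPORT),
    _make("Incident Management", Area.SERVICE_SUPPORT),
    _make("Problem Management", Area.SERVICE_SUPPORT),
    _make("Change Management", Area.SERVICE_SUPPORT),
    _make("Release Management", Area.SERVICE_SUPPORT),
    _make("Configuration Management", Area.SERVICE_SUPPORT),
    _make("Service Level Management", Area.SERVICE_DELIVERY),
    _make("IT Financial Management", Area.SERVICE_DELIVERY),
    _make("Availability Management", Area.SERVICE_DELIVERY),
    _make("Capacity Management", Area.SERVICE_DELIVERY),
    _make("IT Service Continuity Management", Area.SERVICE_DELIVERY),
]

# Hypothetical name for the three disciplines that ITIL suggests may share
# staff and a single manager (the CCRM function mentioned in the lesson).
CCRM = {"Configuration Management", "Change Management", "Release Management"}


def by_area(area):
    """Return the names of the disciplines belonging to one area."""
    return [d.name for d in DISCIPLINES if d.area is area]


if __name__ == "__main__":
    for area in Area:
        print(area.value, "->", ", ".join(by_area(area)))
```

Listing the disciplines this way makes the two counts in the text easy to verify: five under Service Delivery, six under Service Support, with the Service Desk as the only function.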


Although we have represented each function here as a separate entity, a great deal of interactivity exists between each of them. Each function communicates with others in the group. In fact there is a great deal of relationship management within IT Service Management.

For example, Service Level Management deals with the provision of high quality services, provided at the right cost levels. Consequently it interacts frequently with IT Financial Management.

Interaction between other functional departments might be less frequent. For example, Capacity Management and IT Service Continuity Management might work together to develop a cost effective and workable strategy to handle a major disaster, such as a flood.

In this scenario, information on available capacity at a remote site or location would be provided by Capacity Management.

The pre-determined level of support required for on-going business function would be managed by IT Service Continuity Management.

These 11 disciplines and the relationship between them form the basis of this course, and are the subject of the ISEB and EXIN examinations, leading to certificates in Foundation IT Service Management.

Summary

In this introductory lesson we have:

Briefly examined the history of the ITIL library, its make-up, and how Service Delivery and Service Support sit at its core.

We have discussed how ITIL’s flexibility allows easy integration into a recognised quality system, such as ISO9000.

We looked at the relationship between service management and the business organisation, and how ITIL defines Application Management as a major process designed to handle these complex relationships.

We looked at the ICT infrastructure and its three constituent components, Hardware, Software and Peopleware. We highlighted Peopleware, its documents and procedures as a primary focus of this course.

We defined ‘What a service is’ in IT terms, and examined some less obvious examples of ‘IT services’.

And finally we looked at the eleven disciplines which form the core ITIL processes, and the interactivity which exists between them within IT Service Management.

Lesson 2a Service Desk

Lesson 2a Service Desk

In this lesson we will be examining the IT Service Desk, which is described in Chapter 4 of the Service Support book of the IT Infrastructure Library.

When you have completed this lesson you will be able to:

• List the main reasons why the establishment of a service desk can have major benefits for the organisation, the end-user and the IT provider alike.

• Describe the importance of the Service Desk as a single point of contact for IT users.

• Identify three of the main approaches to structuring a service desk.

• Explain what is meant by “escalation” in a service desk context and identify two different types of escalation procedure.

• Name at least six technological aids that can be employed to improve the efficiency of a service desk.

Introduction

One of the most important considerations when delivering IT Services is to ensure the provision of proper support for the users, so that when a problem or a query comes up, they can contact someone who will provide a solution or an answer.

Often, time is of the essence and what the users want is either a rapid resolution or a work-around to their problem that will enable them to carry on with their work with a minimum of interruption.

In order to support users in this way, ITIL has three closely related chapters, namely:
• Service Desk
• Incident Management
• Problem Management

The Service Desk is meant to be the focal point for the reporting of incidents, requests for change, or any queries that a user may have about the service. On the other hand it also provides a channel for the IT provider to communicate information to users.

The Incident Management process enables the recording, tracking, monitoring and resolution of events that are a threat to “normal service”.

Problem Management addresses the underlying reasons for such incidents and seeks to implement permanent resolutions in order to prevent a recurrence.

We will be looking in more detail at both Incident Management and Problem Management in the remaining two lessons of this topic.

For the rest of this lesson we will be examining the Service Desk function.

When a Customer or User has a problem, complaint or question, they want answers quickly. More importantly they want a result: their problem solved.

Nothing is more frustrating than calling an organisation and getting passed around until you find the right person to speak to, provided, of course, that they are not out at lunch or on holiday or it's just after five o'clock.

ITIL Best Practice demands a single point of contact for users in their communication with the IT service provider.

Such a facility is known by various names in different organisations – some common ones being Help Desk, Call Centre or Customer Hotline. The name used by ITIL – and hence during this course – is “Service Desk”.

Obviously, what ITIL is referring to in this context is an IT Service Desk – but the principle can be, and often is, applied to many areas of a company’s business.

So, in addition to an IT Service Desk there may be a Service Desk where customers for the company’s products can call to get support. Another Service Desk may exist so that employees can get answers to queries relating to company policies, personnel issues and so on.

For the purposes of this course we will be making the assumption that the term Service Desk refers to an IT and Communications Technology – or ICT – Service Desk. The integration between IT and communications technology is so close these days that it makes sense to handle them via the same Service Desk.


Why Have A Service Desk?

The establishment and operation of an effective Service Desk is a relatively expensive proposition. So it is important to understand why such a facility might be needed and the benefits that it should provide.

The principle of a “single point of contact” that we have already mentioned is considered an essential element of ITIL Best Practice.

The users of our IT services and their managers are customers in every sense of the word.

Like all customers they would quickly become frustrated and unhappy if they were unable to find somebody who could help them when they had problems with the systems on which they depend.

So customer satisfaction and retention can also be listed as an important benefit.

Another guiding principle of ITIL is that IT should maintain a focus on the support of business goals. IT does not exist to provide ICT components or technology just for the sheer joy of playing with new equipment.

It is there to help the organisation achieve its business objectives. A well-staffed and efficient Service Desk is a critical element in proving to the business that IT is listening and responding to their needs.

An efficient Service Desk can help to reduce the overall cost of ownership of the IT department, and it can do this in a number of ways.

The alternative to a Service Desk is for each group of users to have their own “super-user”, to whom they can turn when things go wrong.

ITIL strongly suggests that IT costs can be reduced by not requiring high levels of IT skills within the business community, and by making it obvious to all how support can be achieved very quickly via a centralised Service Desk.

Making better use of skilled and expensive IT staff can also reduce costs. Straightforward issues can be resolved immediately by the Service Desk, leaving skilled network technicians or database experts, for example, to concentrate only on the complex problems or on improving the quality of the infrastructure.

It will usually be the case that the users or customers are performing a valuable function for the organisation. So, any time that they are unable to operate at full efficiency as a result of a problem with the IT Services that they use will be both disruptive and costly. An effective Service Desk will significantly reduce the likelihood of such problems.

A further consequence of this will be that the IT users will in turn be able to offer a better level of service to the external customers of the business.

This factor becomes even more crucial in an e-business context where the lack of service will directly impact on end-customers and certainly lead to loss of business.

Finally, another major benefit that a Service Desk brings is its contribution to the principle of continuous improvement of the services offered by IT.

The Service Desk will keep records of types of enquiry, the issues that are raised, the particular services, or aspects of a service, that seem to cause most problems and so on.

Identifying the most commonly occurring problems and feeding this information back quickly to the IT Service Management structure is a critical aspect of the Service Desk.

In this way, the Service Desk is the thermometer by which we can monitor the health of the IT services that are being provided.

Additionally, the service desk can also operate as a “shop window” – adding value to the business by making users aware of facilities that they may not know exist – or how to make better use, in a business sense, of the facilities that they are already using.

Points of Contact

There is often some confusion about the terms “user” and “customer” – so far in this course we have used the words interchangeably and for many people they mean pretty much the same thing.

ITIL, however, does draw a distinction between the two terms.

A User – or End-User – is taken to mean the person who actually uses the product or service under discussion: a machine operator, for example.

A Customer is the person who negotiates for the provision of the product or service, what the specification should be, any changes that may be needed and possibly the payment arrangements.


It may well be that the User and the Customer are the same person. But in many cases, for operational systems, they will be different groups of people, with customers normally being managers and users being the operators.

These definitions are relevant here because whilst the Service Desk is the main point of contact between the User and the IT service provider, the Service Level Management process is the main point of contact between the paying customer and the provider.

In both cases the key point of reference is the IT Service itself – as defined in the Service Level Agreements – which will contain statements about hours of availability, time to resolve issues, response times and so on.

The importance of this to the Service Desk is that they must be aware of what Service Level Agreements are in place and how these match up with the questions, complaints and issues that may be being raised by users.

It may well be, for example, that a user calls in complaining of a 2 second response time – when in fact the Service Level Agreement specifies that 95% of responses should be within 4 seconds.

Such an incident would be given a much lower priority than if the figures had been reversed.

So, the general point is that Service Level Agreements provide the link between the Customer, User and Service Level Management relationships and that the Service Desk has a responsibility to act on behalf of the User within the IT infrastructure.

Service Desk as a Single Point of Contact

As we have already seen, the idea of the Service Desk as a single point of contact is an important one in ITIL.

Some organisations will take this principle to its ultimate conclusion and have a single Service Desk as the point of contact for everything to do with the ability of the business to continue to function properly.

So staff within such an organisation could call the Service Desk if the lift broke down, or a light bulb in their area failed, or if they had a query on their pension arrangements.

This kind of Service Desk has the disadvantage of demanding a very wide range of skills to be available – which normally implies a referral system being used – which in turn reduces the chances of a problem being resolved directly and immediately at the desk.

Here, we’ll assume a Service Desk is the single point of contact for just Information and Communications Technology issues.

So, as the single contact point, the first duty of the Service Desk is to act as the IT user’s “friend” within the IT department.

This particularly relates to the role of the Service Desk in:

• Monitoring progress on incidents and queries.

• Reporting this progress back to the user.

• Chasing any experts that have been assigned responsibility for resolving an issue.

• Keeping an eye on any Service Level Agreements that may specify maximum acceptable response times for resolving user issues.

As the user’s friend, the Service Desk has the responsibility of communicating with the user, both reactively and proactively.

‘Reactively’ means in response to issues, problems and queries raised by the users, and ‘proactively’ means the Service Desk goes out to make users aware of issues that might affect them.

It is not uncommon, for example, for the Service Desk to publish regular electronic newsletters to the user community informing them of new facilities, changes to services and so on.

In order to operate effectively as a single point of contact and the user’s friend, the Service Desk should have the following ingredients:

• Well trained staff with good interpersonal skills.

• Well organised systems and processes for recording and tracking incidents and matching against previous incidents and solutions.

• Appropriate technology, such as automatic call distribution equipment and knowledge-based systems that assist in identifying solutions to problems.

• Enough technical competence to address users’ problems directly or to interface with technical experts if necessary.
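
The Service Level Agreement example given earlier in this lesson (95% of responses within 4 seconds) comes down to simple arithmetic, and a short Python sketch may help to make it concrete. This is an illustration only: the function names, the sample figures and the idea of holding measured response times in a plain list are assumptions made for this example, not part of ITIL or of any particular Service Desk tool.

```python
def percent_within_target(response_times, target_seconds):
    """Return the percentage of response times at or under the target."""
    if not response_times:
        raise ValueError("at least one response time is required")
    within = sum(1 for t in response_times if t <= target_seconds)
    return 100.0 * within / len(response_times)


def sla_met(response_times, target_seconds=4.0, required_percent=95.0):
    """True if enough responses fall within the target to satisfy the agreement."""
    return percent_within_target(response_times, target_seconds) >= required_percent


if __name__ == "__main__":
    # Hypothetical measurements: 19 responses of 2 seconds and one of 6 seconds.
    samples = [2.0] * 19 + [6.0]
    print("Percent within 4 seconds:", percent_within_target(samples, 4.0))
    print("Agreement met:", sla_met(samples))
```

With figures like these, a complaint about a 2 second response describes a service that is still inside the agreed level, which is why the Service Desk would give it a lower priority.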


In addition, the Service Desk must have all the
necessary linkages with other ITIL disciplines.

For example, there must be continuous
communication with the Problem Management
process – particularly when a major problem
has cropped up.

There will need to be liaison with Service Level
Management so that potential breaches of
Service Level Agreements can be recognised.

Configuration Management records will need to
be readily accessible so that, for example, a
caller’s IT equipment can be easily identified.

Conversely, the Availability Management
process will be keen to look at Service Desk
records of incidents for conducting their own
analyses and as part of their role in improving
service availability.

Service Desk Structure

A debate that always occurs early on in the
implementation of a service desk is how the
desk should be structured, from a geographical
perspective.

There are a number of strategies that will
usually be considered.

With a local service desk, for example, each
distinct site or region of the organisation has
its own service desk – and hence can provide
local expertise to solve local problems.

There are a couple of obvious disadvantages to
this approach, such as duplication of resources
and the difficulty of maintaining organisation-wide
standards and consistency. Also, lessons
learned in one area may not be passed on to
the others.

Such problems can be minimised by the use of
centralised logging of incidents and results and
by establishing a central configuration
management database that is accessible by all
the local service desks.

The big advantage of this approach, which is
local knowledge, will obviously become more
important the more geographically and
functionally dispersed the organisation’s sites
become. In these situations, the issue of
language alone may favour local service
desks.

The opposite extreme of the local service desk
is the central service desk, where all incidents
and queries are reported to and handled by a
single centralised structure.

Centralisation has the benefit of providing
consolidation of management information and
improves utilisation of resources – and
therefore can reduce operational costs.

There are dangers, however, in that a perceived
loss of local knowledge may tempt local sites to
set up their own super-users or unofficial help
desks.

Another major issue with this centralised
approach is the cost of communications.

Particularly in an international context, careful
planning will be needed, otherwise
long-distance telephone calls could easily drive up
the cost of providing the service to
unacceptable levels.

The Virtual Service Desk is based on the
concept that physical location is not relevant
and that whilst the Service Desk may be
perceived as a centralised point, it may actually
consist of several local service desks.

As far as the local users are concerned they are
contacting a local service desk – but in reality
their calls may be automatically routed to the
most appropriate desk, based on proximity,
time of day, staffing or whatever criteria apply.

This option is obviously much more demanding
on the use of technology, particularly telephony
and re-routing equipment, in order to ensure that
the whole process appears transparent to the
end user.

The logical extension of the virtual service desk
is what is sometimes called the “follow the sun”
option.

This is widely used by multi-national companies
– or even, these days, by local companies who
want to take advantage of cheaper labour rates
in other parts of the world.

So a typical “follow the sun” strategy might
consist of a service desk in Australia, operating
between the hours of 6am and 6pm local time
and a second desk in London operating the
same hours local time there.

The aim is to provide as close to 24 hour
coverage as possible for users in each
hemisphere with the European service desk
coming on line just as the Australian one is
closing down for the night – and vice versa.

So people in Europe requiring support during
the night will have their calls automatically
re-routed to Australia.
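
As a rough illustration of the “follow the sun” routing just described, the sketch below picks a desk for an incoming call. It is only a sketch under stated assumptions: the desk names, the fixed UTC offsets (which ignore daylight saving) and the `route_call` function are hypothetical and are not taken from ITIL or from any telephony product.

```python
from datetime import datetime, time, timedelta, timezone

# Hypothetical desks; fixed UTC offsets ignore daylight saving for simplicity.
DESKS = {
    "Australia": timezone(timedelta(hours=10)),
    "London": timezone(timedelta(hours=0)),
}
OPEN, CLOSE = time(6, 0), time(18, 0)  # 6am to 6pm local time, as in the text


def is_open(desk: str, now_utc: datetime) -> bool:
    """True if the desk's local clock is within its opening hours."""
    local = now_utc.astimezone(DESKS[desk])
    return OPEN <= local.time() < CLOSE


def route_call(home_desk: str, now_utc: datetime):
    """Prefer the caller's own desk; otherwise re-route to any open desk."""
    if is_open(home_desk, now_utc):
        return home_desk
    for desk in DESKS:
        if is_open(desk, now_utc):
            return desk
    return None  # coverage gap: no desk is open at this moment


night_in_europe = datetime(2000, 1, 10, 2, 0, tzinfo=timezone.utc)
print(route_call("London", night_in_europe))
```

With these assumed offsets there is a short period each day when neither desk is open, which is why the aim is stated as getting as close to 24 hour coverage as possible.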


This is in fact a major advantage of this
approach in that the local desk will tend to be
handling local calls during the period of peak
demand – so that overnight re-routing, and
hence long-distance traffic, should be relatively
minimal – but it’s there if needed.

Of course, “follow the sun” may well involve more
than two service desks, depending on the
location of users, time differences and coverage
required.

To make this work effectively it is imperative
that information about incidents is replicated or
shared between the different sites so that the
European Desk, for example, can continue to
support a user with a query that may have been
raised with the Australian Desk a few hours
earlier.

Although there are some complexities with this
approach, it clearly has many advantages and it
is becoming a very common arrangement for
multi-national organisations offering 24-hour,
7-day-a-week coverage – particularly those in
the e-commerce field.

Communicating with the Service
Desk

There are many mechanisms by which problems
and incidents can be communicated to the
service desk.

These can be categorised into two sorts
- human generated and machine generated.

Human users can communicate using a whole
range of options such as telephone, fax, voice
mail, e-mail, browser-based web-forms and so
on.

Machine generated communications could come
from some form of system monitoring tool. For
example, the loss of a particular
communications link in a network would usually
be reported via network monitoring software.
Such incidents are often referred to as
Operational Events.

So when a service desk is established, the
different inputs that will be encountered must
be anticipated and catered for.

Clearly, some of these inputs allow potential for
some form of automated response. If
something comes in via e-mail then at least an
acknowledgement of receipt can be generated
automatically.

It may even be possible to introduce a degree
of self-service where users register and track
their own incidents without the need for
inter-personal communication with service desk staff.

Be careful with this one though. It can all too
easily be used as an excuse for the service desk
not playing its role in monitoring and processing
incidents on behalf of the user as the user’s
friend.

Also, be careful with telephone calls. If they
are not handled properly it is possible that the
user will hang up in frustration and not re-dial.

Hence the information that would have been
gained about a particular incident or query will
be lost. All that would be recorded is that a call
had been dropped, which in turn will be used as
a key measure of service desk performance.

Lost calls of this kind are often referred to as
“fugitives”. There’s a problem out there that
cannot be investigated because it hasn’t been
recorded – and although the user could have
been more persistent, the fault is with the
service desk staff and/or their technology for
not making it easier for them to report the
incident.

Finally, the service desk needs the
automatically generated notifications about
operational events so that they can inform
users about a possible degradation in
performance caused by the fault or actions
necessary to repair it.

The role of the service desk is simply to act on
these reports and to ensure that they are
handled in the same way as user-reported
incidents as far as recording and classification
are concerned.

Escalation

Escalation Management is an important part of
running an effective service desk.

Escalation is the process of moving an incident
or query to the point where it is most ably
resolved.

So, if the initial recipient of the call is not able
to deal with the incident or query – who should
it be passed to so that resolution can be
achieved as quickly as possible?

ITIL distinguishes between Functional and
Hierarchical escalation.


For example, in a generic rather than
purely ICT service desk, calls that cannot be
directly handled by the service desk will be
directed to experts in the relevant functional
area.

The percentage of calls that get passed
upwards will be determined by the skill levels
and training of the service desk staff.

So functional escalation is the handing over of
responsibility to a functionally more competent
area, in order to tackle a particular issue.

Hierarchical escalation is where problems are
passed up the management chain - either
because they are very serious or need higher
level authority to sanction the resources needed
to provide a solution.

The first level of hierarchical escalation would
normally be to the service desk manager, who
is usually the owner of the incident management
process.

More serious issues may then go to the problem
manager, with a remit to call together the
necessary specialists to resolve the incident as
quickly as possible.

Very explicit parameters need to be established
to govern hierarchical escalation, otherwise it is
very easy for it to become the norm, rather
than the exception, which would clearly be
unacceptable.

Service Desk Capability

Related to the escalation procedures is the
general debate about how skilled and capable of
resolving issues the service desk staff should
be.

ITIL does not make any recommendations in
this respect because there is no absolute
answer – every case must be considered on its
merits.

Factors that are normally considered are the
increased costs of employing more highly skilled
staff against the improved service to the
end-users that will almost certainly result.

Also this may be a dynamic situation with the
optimum skill level changing over time.

Immediately following the introduction of a new
service, for example, it may be desirable to
have some experts available on the service
desk to handle the initial rush of calls about the
new system.

Once things have bedded down it may be
possible to relocate them to more productive
areas.

So at the one end of the scale we may have an
unskilled service desk, merely logging and
routing calls – and at the other would be an
expert desk capable of handling most, if not all,
the conceivable issues at the first point of call.

In between these would be what is often called
the skilled or semi-skilled service desk – and
this is considered by many to be the optimal
solution.

Achieving this optimal balance is an interesting
and difficult task. As we have said, there are
no hard and fast rules.

There is a school of thought that says a good
target is to have about 70% of all issues
resolved at the service desk, without further
referral. But this will vary considerably
depending on the service being offered and the
maturity level of that service.

Whatever skill level is adopted, the use of
diagnostic scripts will increase the rate of
resolution at first call, as will access to
knowledge databases, change schedules and so
on.

Service Level Agreements must also be
accessible so that work can be prioritised
depending on the SLA clauses.

Regardless of the technical skills that are put in
place on the Service Desk, all operators must
have certain basic attributes to make them
suitable for the job.

These will include:

• A customer-focussed attitude – where
helping the customer is far more important
and satisfying than playing with the latest
technology.

• An articulate nature – in particular the
ability to translate technical information
into something that is meaningful to the
business user. This can be particularly
challenging when dealing with customers
who are slow to catch on or who become
frustrated, irate or even abusive.

• A methodical approach to questioning and
the recording of facts – and the ability to
maintain that approach when under severe
pressure or when handling a difficult
customer.


• A good business perspective and
understanding of what are the business
critical services. This business culture is
often helped by recruiting service desk staff
from within the business itself.

• And finally - multi-lingual capability is
becoming an increasingly important
attribute for some service desk staff. This
is particularly true in the case of the virtual
service desk, as discussed earlier, or in
multi-national organisations.

Service Desk Technology

For the service desk to work effectively, some
investment in modern technology will be
needed.

Relevant technology can be categorised into
two types, telephony and software.

Examples of telephony technology might be
Automatic Call Distribution systems, which
ensure that a bank of service desk operators
are used in an optimal order and that work is
smoothed out as evenly as possible.

Conference call facilities can be useful in
allowing a second-line expert, for example, to be
included in the conversation with the end-user.

Computer-Telephony Integration can achieve
major gains in efficiency. An example of this
would be the identification of an incoming caller
based on their telephone number and the
linkage of this with configuration management.

This would allow all the details of the user, their
facilities and equipment, and possibly service
history, to be brought to the operator’s screen
before the call is even answered.

Useful software technology would be Intelligent
Knowledge-based systems that record
incidents, learn from them and identify patterns
over time and are able to suggest probable
causes and solutions.

Database access would provide fast
identification of known errors, problems or
anything that would help to provide a better
answer.

Automatic Referral or Escalation tools would
divert an issue to a pre-determined list of
second-line support staff after a certain period
of time.

And finally Automatic Tracking and Alert tools
could be used to monitor the status of an
incident as it progresses through the various
stages towards a resolution.

As with all business investments - the costs of
introducing this kind of technology must
be carefully weighed against the benefits that
it brings in terms of service improvements
and operational efficiency.

Benefits & Problems

The benefits of and potential difficulties with
Service Desk are listed on Page 14 of the little
ITIL book and in Section 4.1.8 of the Service
Support Manual.

Summary

In this lesson we have been looking at the
reasons for and functions of an ICT Service
Desk.

We have seen how the Service Desk’s role is to
act as a single point of contact and the user’s
friend in IT.

We have examined different strategies for
structuring and resourcing a service desk and
we have seen the skills and attributes that
service desk staff must have if they are to
operate effectively.

Finally we have seen some of the new
technology that can be employed to improve
the efficiency of operation of the service desk.
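
The time-based referral and escalation described in this lesson can be sketched roughly as follows. This is an illustration only: the guide speaks of “a certain period of time” and “very explicit parameters” without giving values, so the two thresholds, the `OpenIncident` class and the `check_escalation` function are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Hypothetical thresholds; real values would be agreed by the organisation.
FUNCTIONAL_AFTER = timedelta(minutes=30)
HIERARCHICAL_AFTER = timedelta(hours=4)


@dataclass
class OpenIncident:
    reference: str
    opened_at: datetime
    escalations: list = field(default_factory=list)


def check_escalation(incident: OpenIncident, now: datetime) -> list:
    """Return escalation steps that have newly become due, recording each once."""
    age = now - incident.opened_at
    due = []
    if age >= FUNCTIONAL_AFTER:
        # Functional escalation: hand over to a more competent support group.
        due.append("functional: second-line support")
    if age >= HIERARCHICAL_AFTER:
        # Hierarchical escalation: first level is the service desk manager.
        due.append("hierarchical: service desk manager")
    new_steps = [step for step in due if step not in incident.escalations]
    incident.escalations.extend(new_steps)
    return new_steps


incident = OpenIncident("INC-1", datetime(2000, 1, 10, 9, 0))
print(check_escalation(incident, datetime(2000, 1, 10, 9, 45)))
```

Keeping the thresholds as explicit named values reflects the point that hierarchical escalation should be governed by agreed parameters, so that it remains the exception rather than the norm.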


Lesson 2b
Incident Management

Objectives

In this lesson we will be examining Incident
Management, which is described in Chapter 5 of
the Service Support book of the IT
Infrastructure Library.

When you have completed this lesson you
will be able to:

• Define the term Incident Management
according to ITIL Best Practice.

• Understand the difference between Incident
Management and Problem Management.

• Identify the key stages in an Incident’s
Lifecycle.

• Assess how Incidents can be
prioritised based on a number of factors.

Incident Management
– Introduction

ITIL defines an incident as “Any event which is
not part of the standard operation of a service
and which causes, or may cause, an
interruption to, or a reduction in quality of, that
service”.

Historically, incidents were handled by a
fragmented set of processes where users faced
with a problem would contact IT staff direct and
any resolutions would not be documented.

Alternatively, system monitoring tools may
have alerted technical specialists who would
rectify the problem, but again with no central
recording or control.

This approach led to poor use of expensive
resources – the IT experts – and to a failure to learn
lessons from previous incidents. ITIL Best
Practice processes aim to resolve both of these
issues.

One of the main goals of Incident Management
is to restore normal service as quickly as
possible, with a minimum of disruption to the
business.

This has to be balanced against the efficient use
of resources – and the prioritisation of different
incidents that can occur simultaneously.

It is important to distinguish between Incident
Management and Problem Management – which
is the subject of the next lesson.

Incident Management is more aimed at a “quick
fix” or a workaround rather than a longer term
structural resolution to any fault. The priority
for Incident Management is recovery of service
as quickly and painlessly as possible.

Problem Management is more about identifying
the underlying cause of faults and finding ways
of engineering out these faults in the longer
term.

This can of course lead to some conflict
between the two disciplines when Incident
Management staff are driven to get a system back
up and running quickly.

Their colleagues in Problem Management, on
the other hand, would like to have the system
down for longer so that they can conduct
analyses and identify strategies for designing
out any problems that may exist.

The Scope of Incident Management

As we mentioned in the previous lesson, the
Service Desk often plays a key role in Incident
Management; recording incidents, monitoring their
progress and retaining ownership on behalf of
the user as long as the incident is still “open”.

It is considered good practice to record all
enquiries as incidents because they are often
evidence of poor quality training and/or
inadequate documentation.

It may be that following the initial logging, a
distinction is made between simple queries and
an incident that relates to a failure or
degradation of a system.

A request for a new product or service is usually
regarded as a Request for Change rather than
an Incident.

However, because the processes are essentially
similar, many organisations include Requests
for Change within the scope of Incident
Management.

Automatically registered events, such as the
failure of a disk drive or a network connection,
are often regarded as part of normal
operations. They are still included in the
definition of Incidents though – albeit that the
service to end-users may never be affected.


Incident Lifecycle

It’s very important to understand the process
that an incident goes through from its initial
detection right through to its point of closure.

The first step is the detection and recording of
the incident. It is vital that every incident is
logged with a unique ID reference – even if we
know that the problem has already been
reported and a fix is being produced.

Apart from the basic details about the incident,
the log will normally include details of how the
incident was reported and the services and
Configuration Items that are affected.

Incidents can also be classified into different
types for use in subsequent analysis.

The example classifications given in ITIL are
Hardware, Software and Service Requests – but
what is sensible here will obviously depend on
circumstances.

Also included in this part of the process will be
the matching of the details against previously
reported incidents to check for known errors,
and then assigning a priority to the incident.

We will be returning to the subject of priority in
a few minutes.

Initial Support may involve the application of a
work-around, some sort of temporary solution
that we know about from the existing problem
or incident database. Alternatively, a
work-around may come from the expertise of the
Service Desk staff – in which case it should be
recorded for future use.

In the event that the incident cannot be
immediately resolved at the Service Desk, one
of the vital jobs at this point of the life-cycle is
to identify the correct second-line support
group to whom the incident should be
functionally escalated.

Investigation and Diagnosis may result in a
direct resolution or the incident being routed to
the identified second-line support.

This activity may be iterative, in that several
attempts may be required in order to find the
best resources to tackle the problem.

This shuttling backwards and forwards of an
incident between different support groups is
one of the major issues for Incident
Management.

If this total process is taking too long then
hierarchical escalation procedures may end up
being used, as we discussed in the previous
lesson.

Resolution and Recovery may involve raising a
Request for Change and getting that change
implemented.

Recovery itself may entail the business in
further actions, such as re-entering or verifying
data. For example, if a disk has crashed, the
problem may have been resolved by replacing
the disk drive, based on an official request for
change. But the service has not been
recovered until the data is brought up to date
from the backup or archive copies.

Incident Closure should involve some
confirmation by the originating user and, where
appropriate, a revised classification.

It is quite likely, for example, that an initial
report of a printer problem was classified as a
hardware fault – but subsequent analysis
determined that the fault was actually with the
software. It is important that such corrections
are made to the incident classifications so that
an accurate record is maintained.

It is possible for an incident to be closed whilst
the underlying problem is still under
investigation. This would be true where a
work-around is available, for example.

Some organisations have an extra category
which is “Incident Closed and Underlying Cause
Resolved”, which they don’t use until the final
resolution of the underlying problem.
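
The lifecycle stages described above can be pictured as a simple sequence of states. The sketch below is a deliberately simplified illustration: the `Stage` and `Incident` names, the reference format and the strictly linear progression are assumptions made for the example (the text notes that investigation may in practice be iterative), not part of ITIL itself.

```python
from dataclasses import dataclass, field
from enum import Enum
from itertools import count


class Stage(Enum):
    DETECTION_AND_RECORDING = 1
    CLASSIFICATION_AND_INITIAL_SUPPORT = 2
    INVESTIGATION_AND_DIAGNOSIS = 3
    RESOLUTION_AND_RECOVERY = 4
    CLOSURE = 5


_next_id = count(1)  # hypothetical source of unique ID references


@dataclass
class Incident:
    summary: str
    classification: str = "Unclassified"
    stage: Stage = Stage.DETECTION_AND_RECORDING
    reference: str = field(default_factory=lambda: f"INC-{next(_next_id):05d}")
    history: list = field(default_factory=list)

    def advance(self) -> Stage:
        """Move to the next stage, keeping a record of the stages passed."""
        if self.stage is Stage.CLOSURE:
            raise ValueError("incident is already closed")
        self.history.append(self.stage)
        self.stage = Stage(self.stage.value + 1)
        return self.stage

    def close(self, user_confirmed: bool, revised_classification=None) -> None:
        """Close only after recovery and with the originating user's confirmation."""
        if self.stage is not Stage.RESOLUTION_AND_RECOVERY:
            raise ValueError("service must be resolved and recovered first")
        if not user_confirmed:
            raise ValueError("closure needs confirmation by the originating user")
        if revised_classification:
            self.classification = revised_classification
        self.advance()


printer = Incident("Printer will not print", classification="Hardware")
for _ in range(3):
    printer.advance()
printer.close(user_confirmed=True, revised_classification="Software")
print(printer.reference, printer.stage.name, printer.classification)
```

The usage lines mirror the printer example: the incident is first classified as a hardware fault and the classification is corrected to software at closure.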


Whilst all this is going on there are the issues of
ownership, monitoring, tracking and
communication to be maintained.
Additionally, there will be constant updating of
the status of the incident as it moves through
the various points of its life-cycle.

All of these are proactive activities carried out
by the incident management staff – which is
usually the Service Desk, acting on the user’s
behalf. It involves generating reports, keeping
users informed and managing escalations.

ITIL standard practice guidance says that all
these activities remain with the Service Desk
and the use of tools to help with automatic status
tracking is very important in the incident
lifecycle.

Finally, remember that everything should be
logged as an incident – even if it is a Service
Request, i.e. a request for a standard operational
item, such as a password reset for example.

If the Classification and Initial Support process
determines that the incident is in fact a Service
Request then the Service Request procedure
will be invoked.

Because the request was raised as an incident,
however, it will eventually have to be brought
back into the incident lifecycle at incident
closure, in order to achieve the close down of
that request procedure.

In understanding the full lifecycle of an incident
it is important to know what further records and
processes may be generated as a result of an
incident.

When an infrastructure fault is first reported it
is recorded as an incident, either by the Service
Desk or direct to the incident management
process by automated support tools.

Incidents can spawn problems if they are
recurring incidents, or if the Service Desk or
second or third-line support cannot ascertain
the underlying cause.

Some problems will justify the generation of a
“known error”, this being an admission or
statement that we are aware of the problem
and we have a resolution to it.

In other cases, it may well be that a
work-around is an adequate solution – at both the
incident and problem levels.

A good example of this might be ahead of a
major infrastructure change, where making
significant changes now would not be
worthwhile.

If a “known error” is generated then in most
cases this will lead to a Request for Change – in
order for the underlying fault to be corrected.
Unless, as we have just said, there are good
reasons why we should just live with the
problem for now because the cost of a
short-term fix is not justified.

Once a Request for Change has been through
the Change Management process as defined by
ITIL, then this will lead to the release of a
structural solution to the problem. This will be
a permanent fix to the underlying fault, not just
a work-around.

Whilst all this is going on, the Configuration
Management Database should be being updated
with information about the incident, any
problems and their links to incidents, about any
“known errors” and their links to problems, and
about requests for change and their links to
known errors.

So an integrated Configuration Management
Database not only contains configuration item
information but also related support records,
such as incidents, problems, known errors,
requests for change, and release records.

The absence of a Configuration Management
Database will make it very difficult to harmonise
separate incident recording, problem recording,
and change recording systems.

We will be looking in more detail at the
Configuration Management process in Lesson 3.

Assessing Priorities

Assessing the priority of an incident is a very
important process that needs to be carried out
early in the incident’s lifecycle, since it
determines what effort is going to be put into
its resolution.

Priority is determined mainly by the impact and
urgency of the incident or enquiry.

However, other things can also come into play.
Pragmatically, resource availability will also
have a bearing. So if nobody with the right


skills to solve the fault is immediately available
it may have to be put down the list a little.

Another factor affecting priority may be the
existence of a specific statement in a Service
Level Agreement that is threatened by the
incident.

Impact, in this definition, is the measure of the
effect of the incident on the business. This
could be measured in terms of numbers of
users affected or financial loss for example. So
it is important to work very closely with the
business in order to understand the factors that
are considered high or low impact.

Urgency concerns the time scale in which the
incident needs to be resolved.

For example, a fault with a payroll system that
occurs on the 2nd of the month may well be
considered less urgent than the same fault
occurring on the 20th.

These two factors together dominate the ITIL
model for determining priority. So a high
urgency does not always mean a high priority -
if the impact is considered to be relatively low.
For something to be high priority both the
impact and urgency must be high.

As we have already mentioned, Service Level
Agreements can also influence priority.

Let’s say that Incident A occurs and that this is
the fourth incident relating to a particular
service in the current month.

On the other hand Incident B occurs on a
different service and this is the second incident
to have occurred so far during the month.

In both cases, the Service Level Agreement for
the service states that only four incidents per
month are permissible.

In these circumstances – all other things being
equal – it would be reasonable to give Incident
A a higher priority.

The resources available are also likely to affect
the priority given to an incident. Although if
both the impact and urgency are high then it is
likely resources will just have to be made
available from whatever sources.

Where there are a number of medium priority
incidents to resolve then clearly the ones that
have suitable resources immediately available
will be tackled first.

Note that when a major incident occurs – in
other words one with a high impact, urgency
and SLA threat – Problem Management staff
must be informed so that they can provide
extra support to the Service Desk team.

Benefits & Problems of Incident
Management

The benefits of and potential difficulties with
Incident Management are listed on Page 18 of
the little ITIL book and in Section 5.4 of the
Service Support Manual.

Summary

In this lesson we have been examining Chapter
5 of the Service Support Manual – Incident
Management.

We have seen how Incident Management is
defined, the scope of Incident Management and
the differences between Incident Management
and Problem Management, which is the subject
of the next lesson.

We have followed the main stages through
which an Incident passes during its lifecycle
and looked at the records that must be kept
and the need for an integrated Configuration
Management Database.

We have also examined the different factors
that must be considered in determining the
priority of different incidents, which may be
competing for limited resources.
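
The priority rules from this lesson can be illustrated with a short sketch. The three-level scale, the multiplication of impact by urgency and the `sort_key` ordering are hypothetical conventions chosen for the example. The guide itself only states that impact and urgency dominate, that both must be high for a high priority, and that a threatened Service Level Agreement can raise an incident up the list.

```python
LEVELS = {"low": 1, "medium": 2, "high": 3}


def base_priority(impact: str, urgency: str) -> str:
    """High only when both impact and urgency are high (assumed scoring)."""
    score = LEVELS[impact] * LEVELS[urgency]
    if score == 9:
        return "high"
    if score >= 3:
        return "medium"
    return "low"


def sort_key(incident: dict) -> tuple:
    """Order by base priority, then put SLA-threatening incidents first."""
    order = {"high": 0, "medium": 1, "low": 2}
    threatened = incident["count_this_month"] >= incident["sla_limit"]
    rank = order[base_priority(incident["impact"], incident["urgency"])]
    return (rank, not threatened)


# Incident A is the fourth this month on its service, Incident B the second;
# both services have an SLA permitting four incidents per month.
incident_a = {"name": "A", "impact": "medium", "urgency": "medium",
              "count_this_month": 4, "sla_limit": 4}
incident_b = {"name": "B", "impact": "medium", "urgency": "medium",
              "count_this_month": 2, "sla_limit": 4}

queue = sorted([incident_b, incident_a], key=sort_key)
print([incident["name"] for incident in queue])
```

Resource availability, the other factor mentioned in the text, is left out of the sketch. It could be added as a further element of the sort key.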


Lesson 2c
Problem Management

Objectives

In this lesson we will be examining Problem
Management, which is described in Chapter 6 of
the Service Support book of the IT
Infrastructure Library.

When you have completed this lesson you will
be able to:

• Define the term Problem Management
according to ITIL best practice.

• Identify Problem Management’s reactive
and proactive activities.

• Recognise the standard set of activities for
problem control and error control.

• List the benefits gained from this process.

The final component in the IT Infrastructure
Library guidance for supporting the user of IT
services is Problem Management.

ITIL defines a problem as ‘the unknown
underlying cause of one or more incidents.’

It goes on to define the goal of Problem
Management, and that is to minimise the
adverse effect on the business of incidents and
problems caused by errors in the infrastructure,
and to proactively prevent the occurrence of
incidents, problems and errors.

Broadly speaking, Problem Management exists
to ensure that a process is in place which
identifies once and for all the root causes of
problems. It also helps minimise the effects as
well as preventing potential problems occurring
in the future, thereby attempting to minimise
underlying problems and their causes.

Problem Management processes are usually
carried out by teams of technically focused
specialists who work closely with Service Desk
and Incident Management staff, and with other
internal and external suppliers.

As is common to other ITIL processes, Problem
Management responds to incidents in a reactive
way, but also has a proactive element.

Proactive response adopts a forward-looking
approach. In trying to prevent issues occurring by
providing intelligent analysis of problem trends
and statistics, the team may even get involved in
making decisions about purchasing and IT
provision.

As the term suggests a proactive response is an
ongoing and methodical process. The intention
is to minimise occurrences of incidents by
identifying and resolving problems and known
errors. We will define the difference between
problems and known errors a little later in this
lesson.

The ‘reactive’ requirement of problem
management is to resolve Problems quickly,
effectively and permanently. It should identify
the underlying problems, which are causing
related incidents, and find an immediate
workaround.

Any workaround should allow the smooth
continuation of business. When a resolution is
implemented via the change management
process, it should be a permanent solution that
will resolve the problem and the related
incidents.

Once a problem has been identified, and a
satisfactory resolution found to that problem,
then the change will normally be implemented
through change management procedures.

Whether problem management acts reactively
or proactively, it is important that resources to
deal with problems are prioritised on a ‘business
needs’ basis.

This prioritisation is sometimes referred to as
‘prioritising in pain factor order’. The pain factor
relates to the number of people affected by
incidents, and the related problem, and the
seriousness of the impact on the business.

Remember that we said that Problem
Management processes are normally carried out
by technical staff, and with a combination of
Service Desk, Incident Management and
Problem Management, we aim to use skilled,
technical specialists in the most effective way
possible, allowing them to concentrate on major
incidents, where they support the incident
management process and the service desk, and
to spend more of their time on resolving underlying
causes of those incidents through problem
management processes.

As is common to other ITIL processes, the
communication of management information
between IT Service Management roles is very
important. This information is used both
internally, within the problem management
team itself, and distributed to other IT Service
Management roles, such as Availability
Management.

For example, if IT users were encountering lots
of problems caused by poor quality software
delivered and supported by a third party

supplier, then information gained from Problem Management would be very
useful to the Contract Management team. They could use this to help the
suppliers make improvements, or in evaluation or analysis of the
software or supplied service. In some instances they could also revoke
the contract.

So how do we define the responsibilities of staff working in Problem
Management? These responsibilities can be broken down into a number of
focused areas.

These are:
• Problem Control
• Error Control
• Assistance with handling major incidents
• Proactive prevention of problems
• Providing management information from problem data
• Conducting major problem reviews

Problem Control focuses on transforming Problems into Known Errors. It
does this by identifying the root cause of the problem and providing a
temporary workaround where possible. This process redefines a Problem as
a Known Error.

Error Control focuses on resolving Known Errors under the control of the
Change Management Process. The objective of Error Control is to be aware
of errors, to monitor them, and to eliminate them when feasible and
financially justifiable.

Error Control has become a common process in both the applications
development, enhancement and maintenance environment and the live
environment. Normally a service and its configuration items are
introduced to the live environment with some Known Errors. It is
important that these are recorded in a ‘Known Error Database’, so that
when related incidents are reported in the live environment they can
easily be identified.

Proactive Prevention of Problems, and Providing Management Information
from Problem Data, include techniques such as trend analysis, targeting
support action, and providing support to the organisation. Typically 80%
of incidents are caused by 20% of the IT infrastructure components.

This Configuration Item information can prove useful when attempting to
identify the underlying cause of incidents. The provision of management
information from problem data to Availability Management, for example,
can provide vital information on expected levels of availability, and as
a consequence, influence statements made about availability in Service
Level Agreements.

Ultimately, by redirecting the efforts of an organisation from reacting
to large numbers of incidents to preventing future incidents, you
provide a better overall service to your customers and make better use
of the IT support organisation’s resources.

Finally, conducting Major Problem Reviews. These reviews take place
after a problem causing a major incident or multiple related incidents
has been successfully resolved. It is the responsibility of the Problem
Management process to review the problem and to identify how to prevent
it reoccurring in the future. Additionally, information from these
reviews can identify weaknesses in problem management and incident
management processes.

These review procedures form part of a ‘Service Improvement Programme’,
a key task for any ITIL conformant organisation which aims to improve
value and quality.

So let’s look at some problem management definitions in more detail.
Firstly, the definition of a problem, which is ‘The unknown underlying
cause of one or more incidents’.

We defined how Problem Control focuses on transforming Problems into
Known Errors. A problem only exists from the point of identification to
the point when we have found the reason for the problem occurring. Once
this point is reached the Problem becomes a ‘Known Error’.

New Problem identification occurs when we are unable to find a match
amongst the definitions of existing problems, or existing Known Error
records. A Problem Record is then raised. One of the most effective
Problem Management techniques is to match a number of related incidents
against each other and realise that they have a common underlying cause.

These multiple related incidents are of particular concern to Service
Managers, as they can threaten reliability clauses within Service Level
Agreements or Contracts. For example, an SLA might specify that in any
rolling month there will be no more than two breaks in service
provision, and the duration of these breaks will be no greater than two
minutes. So any train of events causing us to approach these parameters
is a major concern. Hence Problem Management plays a very important role
in the ITIL Service Management structure, by providing early
identification of problems, and communicating this information to
relevant management areas.
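
As a rough illustration of the reliability clause in the SLA example
above, the check could be sketched in Python as follows. The 30-day
window standing in for a ‘rolling month’, the function name and the
layout of the break records are all assumptions made for this sketch;
they are not part of ITIL guidance.

```python
from datetime import datetime, timedelta

# Limits taken from the example SLA clause in the text.
MAX_BREAKS_PER_WINDOW = 2
MAX_BREAK_DURATION = timedelta(minutes=2)
ROLLING_WINDOW = timedelta(days=30)  # assumption: a "rolling month" is 30 days


def sla_breaches(breaks):
    """Return a list of breach descriptions for a list of service breaks.

    `breaks` is a list of (start, end) datetime pairs (hypothetical layout).
    """
    breaches = []
    ordered = sorted(breaks)

    # Rule 1: no single break may last longer than two minutes.
    for start, end in ordered:
        if end - start > MAX_BREAK_DURATION:
            breaches.append(f"break at {start:%Y-%m-%d %H:%M} lasted too long")

    # Rule 2: no more than two breaks may start within any rolling window.
    starts = [start for start, _ in ordered]
    for i, window_start in enumerate(starts):
        in_window = [s for s in starts[i:] if s - window_start < ROLLING_WINDOW]
        if len(in_window) > MAX_BREAKS_PER_WINDOW:
            breaches.append(
                f"{len(in_window)} breaks in the 30 days from {window_start:%Y-%m-%d}"
            )
            break  # one report of the count rule is enough for this sketch

    return breaches
```

Running a check like this whenever a new break is recorded would show
how close the service is to the agreed limits, which is the kind of
early warning the text asks Problem Management to pass on.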

The Problem Control process set consists of a standard set of control
activities.

These are:
• Identification
• Recording
• Classification
• Investigation
• Diagnosis
• Review & Closure

Each reported incident passes through this process set, so let’s take a
few moments to define each of these in more detail.

Identification
Problems can be generated from many sources. An incident might be
completely new and have no matching characteristics with records in
either existing Problem or Known Error databases. It may also be a
reoccurring incident, which has already been identified. Or it might
come about as a result of Problem Management’s proactive work, where a
trend has been identified and a problem identified as a result.

Recording
Once a problem has been identified, a record is created with a unique
identifier, and a link is generated to any associated records, such as
the incidents that caused it, and also to any Known Errors to which it
might relate.

It’s likely that the incident will pass through the change process, and
at this point it will be linked to requests for change. Throughout this
process records will also be linked to related configuration items,
within the configuration management database.

Classification
Problem Classification is often an extension of the incident
classification, and is used mainly to determine an appropriate
allocation of resources. For example, a problem might be identified in
the Local Area Network. This leads to the creation of a team of problem
solvers mainly drawn from network specialists. We will discuss this
classification process in more detail later in the course.

Investigation and Diagnosis
These two stages are defined separately because they form an iterative
process. Initial investigation results in initial diagnosis, which leads
to further investigation and so on. Ultimately the outcome from this
process should be a Known Error.

These two stages are complex, and require a good technical knowledge,
supported by problem solving and diagnostic skills. ITIL recommends,
amongst others, two techniques to help this process. These are Kepner
and Tregoe analysis and Ishikawa fishbone diagrams. Both are important
mechanisms, which allow those working in Problem Management to use a
structured approach to problem diagnosis.

In general it is important to record everything, and to be able to track
back. ITIL’s good practice guidance suggests that, regardless of the
type of fault, Known Error records are kept, although there is no
statement on how to do so.

Problem Management is unlikely to implement the resolution of an error.
Once a Known Error has been identified then it is handed to Error
Control. Although Error Control remains part of the Problem Management
Process Set, any resolution is likely to require some level of agreed
change, hence the responsibility for the resolution will transfer to
Change Management.

However, for particular types of problems, there are occasions when
Change Management may devolve authority to the Problem Management team.
Importantly, Problem Management must still raise the necessary change
records in order to do this.

Review and Closure
On resolution of every major Problem, Problem Management should complete
a major problem review. The appropriate people involved in the
resolution should be called to the review to determine:

• What was done right?
• What was done wrong?
• What could be done better next time?
• And finally, how can we prevent the Problem from happening again?

Problem closure is the last of the Problem Control Activities and is
often carried out automatically when a resolution to a Known Error is
implemented. However we should point out that an interim closure status
can exist. For example, when a Known Error has been identified and a
solution put in place, a status of ‘Closed pending Post Implementation
Review’ could be assigned to it in either the Incident, Known Error or
Problem records. ‘Closed pending PIR’ allows us to confirm the
effectiveness of the solution prior to final closure.

For incidents, this may involve nothing more than a telephone call to
the user to ensure that they are now content. For more serious Problems
or Known Errors, a formal review may be required.

Finally, remember an important part of Problem Management is to
continually monitor its own progress, and the progress of those
technical support staff that are called in when problem diagnosis,
investigation and resolution are necessary. This can be particularly
important when problem resolution is ‘time constrained’ by a Service
Level Agreement.

Problem Classification

When a Problem is identified, the amount of effort required to detect
and recover the failing Configuration Item has to be determined. It is
also important to be aware of the impact of the Problem on existing
service levels. This process is known as ‘classification’.

One of the main reasons for problem classification is to ensure that any
group of specialists that we bring together to solve a problem is the
most appropriate. If a problem is generated by the local area network,
then it’s important that we assemble LAN and desktop specialists.

Problem classification is also used to prioritise the sequence in which
problems are addressed. If we are experiencing a large number of
incidents related to several different areas of the business then
priority must be assigned appropriately. Every incident, problem or
change will have both an impact on the business services and an urgency.

Impact describes how vulnerable the business might be. For example, life
threatening, or merely a small inconvenience.

Urgency illustrates the time that is available to avert, or at least
reduce, this impact.

A problem’s classification may well change as a consequence of the
diagnosis activity. The first classification of a problem is described
as the ‘initial classification’. For example, what at first appeared to
be a problem with a network might actually be the result of a database
problem. The problem is then reclassified. However, it is usual to
retain both the initial and final classifications, so that resource
allocation to problem areas can be improved.

Sources of Problem and Error Identification
We discussed earlier in this lesson how problem management works
reactively to identify problems, by checking knowledge bases for records
of problems, Known Errors, changes etc.

A proactive activity involves the analysis of past incidents, and the IT
infrastructure as a whole. For example, analysis might identify that a
pre-existing problem at one site might reoccur at another site which has
a similar server, hardware and software configuration.

Also involved is the broader analysis of the IT infrastructure itself.
The examination of over-complex relationships, or single points of
failure, can identify any vulnerable points that are a potential threat
to business.

This analysis might indicate that a particular network route is more
heavily used than expected, and as a consequence is a potential future
risk.

Often this work is carried out in conjunction with Availability
Management staff, and involves careful analysis of paths through the
component infrastructure that make up the various services. For example,
a customer using on-line banking to read their balance may involve
hundreds of different paths.

Another element of proactive problem management involves working with
third party suppliers, and our own internal staff, to ensure all
procedures are adequate, for example testing procedures, release
procedures and so on. Internal staff can be encouraged to take part in
system reviews during development, ensuring a higher level of
maintainability is designed into the system.

And finally, providing access to ‘knowledge bases’. Service Desk staff
will be able to link recently occurring incidents to Known Errors and
Problems in these bases, resulting in a better understanding of the
underlying problems and Known Errors in the Organisation.
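
The proactive check described in this section, spotting sites whose
configuration resembles one where a problem has already occurred, might
be sketched as below. The site names, the configuration fields and the
idea of comparing them for exact equality are assumptions made for
illustration; a real analysis would draw this data from the
configuration management database and would probably use a looser notion
of ‘similar’.

```python
def sites_at_risk(site_configs, affected_site):
    """Return the other sites that share the affected site's configuration.

    `site_configs` maps a site name to a (server, hardware, software) tuple.
    Both the mapping and its field layout are hypothetical.
    """
    problem_config = site_configs[affected_site]
    return sorted(
        site
        for site, config in site_configs.items()
        if site != affected_site and config == problem_config
    )


# Illustrative data only.
configs = {
    "Leeds": ("file server A", "desktop model X", "office suite 2.1"),
    "Crewe": ("file server A", "desktop model X", "office suite 2.1"),
    "Derby": ("file server B", "desktop model Y", "office suite 2.1"),
}

print(sites_at_risk(configs, "Leeds"))
```

With this sample data, a problem recorded at Leeds flags Crewe for
attention because the two sites share the same configuration, while
Derby is left out.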

Error Control consists of four defined processes and these are:

• Error Identification and Recording
• Error Assessment
• Recording Error Resolution
• Error Closure

Error identification and recording only comes about when a root cause
and, if possible, a temporary workaround has been found.

Error assessment involves deciding on how to resolve the error and, if
this is valid, raising a request for change to achieve this.

Recording Error Resolution documents that the problem has ‘actually’
been resolved. Here Problem Management works closely with Change
Management and Release Management process teams, and the end-user.

And finally Error Closure. Closure only occurs when the relevant change
has led to the business finding a satisfactory resolution to the
underlying errors, problems and related incidents.

It’s worth noting that Problem Management is responsible for recording
errors discovered in both the live and development environments. A
situation might arise where, due to time or cost constraints, a product
is released which contains Known Errors. For the Service Desk to match
incidents in the faulty software to Known Errors, it is vitally
important that the pre-existing Known Errors are recorded in a Known
Error knowledge base or database.

All four of these processes are classified as reactive. Error Control
also has a proactive element. This proactive activity includes analysing
and maintaining the Known Error knowledge base, in order to provide
support to the Service Desk, and identifying underlying trends in Known
Errors.

Assisting Incident Management is a fundamental responsibility of Problem
Management. To identify incidents, and to assign actions to them,
information management moves them through an Incident matching process
model.

Let’s look at some example incidents and follow their path through the
model.

The first example is defined as a routine incident, and exits the model
at the routine procedures level.

The second example is defined as a non-routine incident, in other words,
one which isn’t recognised at the Service Desk. Initially we will
attempt to match it against our Known Error database. If a match is
found, then the incident moves to ‘inform user of workaround’ status,
and if the workaround exists the user is informed immediately.

The incident process moves on to:
Increase by one the incident count on the Known Error record.

Update the category data in the incident. This could involve
reclassification of the incident. An incident might have been initially
identified as a network error, but recognised in the Known Error
database as a database related error.

The next process is to extract any permanent resolution or circumvention
knowledge from the Known Error database. If a permanent resolution
exists, then the Service Desk can execute this, often with the support
of change management.

The third incident example has no match in the Known Error database.
However, as it’s a pre-existing Problem it does have a match in the
problem database. In this case the incident then follows a similar route
to our Known Error example.
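
The matching model that these examples walk through, including the case
where nothing matches, can be sketched roughly as below. The record
layout, the use of a simple symptom string as the matching key and the
function name are all assumptions for illustration; the model itself
only describes the order in which the checks are made.

```python
def match_incident(incident, known_errors, problems):
    """Route one incident through a simplified incident matching model.

    `incident` is a dict with 'symptom', 'category' and 'routine' keys.
    `known_errors` and `problems` map a symptom to a record dict.
    All of these structures are hypothetical.
    """
    # Routine incidents leave the model at the routine procedures level.
    if incident["routine"]:
        return "handled by routine procedure"

    symptom = incident["symptom"]

    # First attempt: match against the Known Error database.
    known_error = known_errors.get(symptom)
    if known_error is not None:
        known_error["incident_count"] += 1
        # Reclassify the incident if the Known Error records another category.
        incident["category"] = known_error["category"]
        if known_error.get("resolution"):
            return "apply resolution: " + known_error["resolution"]
        if known_error.get("workaround"):
            return "inform user of workaround: " + known_error["workaround"]
        return "linked to Known Error, no workaround yet"

    # Second attempt: match against the Problem database.
    problem = problems.get(symptom)
    if problem is not None:
        problem["incident_count"] += 1
        return "linked to existing Problem"

    # No match anywhere: raise a new Problem record and pass it on.
    problems[symptom] = {"category": incident["category"], "incident_count": 1}
    return "new Problem raised, forwarded to Problem Management"
```

In this sketch a permanent resolution is reported in preference to a
workaround when both exist; that ordering is a simplification chosen
here rather than something the model prescribes.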

Finally, the fourth example has no matches in either the Known Error or
Problem databases. This incident is identified as being caused by a new
problem, and a new record is raised in the Problem database. The
incident is then forwarded for further support to the problem management
team.

To achieve tangible benefits in an ITIL compliant organisation, Problem
Management cannot operate in isolation. To work effectively, it must
coexist with a structured incident management process. If ITIL
implementation resources are scarce, then it’s best to focus on the
reactive elements of problem and error control, leaving implementation
of proactive management areas until later, ideally once service
monitoring activities are in place and useable knowledge bases exist.

It also helps to focus on the key problems which are causing the
greatest ‘pain’ to the business. Remember Pareto? 20% of Problems may
cause 80% of service degradation.

Benefits & Problems

The benefits of and potential difficulties with Problem Management are
listed on Page 22 of the little ITIL book and in Section 6.4 of the
Service Support Manual.

Summary

In this lesson we have been examining Chapter 6 of the Service Support
Manual – Problem Management.

We have examined in detail the standard set of control activities, and
the Problem Classification, and Problem and Error identification
processes.

We finished by defining the four Error Control processes, and outlining
the benefits, and some possible drawbacks, of Problem Management
implementation.

We’ve looked at three interrelated areas, the Incident Management,
Problem Management and Service Desk functions, and the reasons in favour
of implementing Problem Management.

Lesson 3a Configuration Management

Lesson 3A

Configuration Management

Objectives

In this lesson we will be examining the first of the three ITIL Control
Processes, Configuration Management, which is described in Chapter 7 of
the Service Support book of the IT Infrastructure Library.

In this lesson we will:

• Examine the relationship between Configuration Management and the
  Service Delivery and Service Support functions

• Define a Configuration Item in ITIL terms

• Look at the Configuration Management Database, and the type of
  information and records it contains

• Describe the five Configuration Management sub-processes: Planning,
  Identification, Control, Status Accounting and Verification.

Configuration Management sits at the centre of the three ITIL Control
Processes.

The objective of these Processes is to:

• Ensure that the organisation has accurate records of its ICT assets

• Ensure that changes to the IT services are executed quickly and with
  the minimum of business risk

• Ensure an integrated set of data exists, recording details about
  services, their ICT components and any related support records.

ITIL guidance considers this process as the foundation on which a stable
organisation is built. In any organisation, knowing what assets we have
and their current status is fundamental to business stability. After
all, how can we build something without knowing what we are building on,
and what we have to build with?

This is how ITIL defines the four major Configuration Management goals.

To account for all IT assets and configurations within the organisation
and its services.

To know the total cost of the IT infrastructure, where it was sourced
from, who is responsible for maintaining it, and what dependencies exist
between different assets.

To provide accurate information on configurations and their
documentation to support all other service management processes.

This can be very useful for cost accounting of IT services: knowing what
we have, how much it cost and what depreciation model we are applying.
It is critical for Configuration Management to support IT Service
Continuity Management. Without a thorough understanding of what a ‘live
site’ contains, we can’t know what any ‘fallback site’ should contain.
In the same way, effective Capacity Management planning and Availability
Management planning can only take place if they are fully aware of all
ICT components and their relationship to each other.

To provide a sound basis for Incident Management, Problem Management,
Change Management and Release Management.

Information provided by Configuration Management is very useful to other
processes. For example, configuration information about a fault in one
type of workstation could help Problem Management rectify future
problems before they occur.

Verify Configuration records against the infrastructure and correct any
exceptions.

Configuration Management provides organisational confidence, providing
records that relate exactly to the real physical situation.

So let’s start by looking at how Configuration Management relates to
Service Delivery and Service Support as a whole.

ITIL places Service Level Management at the very top of our objectives
because it represents service delivery’s ‘shop window’ to customers and
users alike. It’s also a service to which guarantees are applied, in the
form of Service Level Agreements.

Service Level Management is supported by several Support and Delivery
processes, which amongst other things, enable Service Level Management
to negotiate and comply with SLA’s. This whole support structure is
underpinned by the configuration management process. ITIL guidance is
explicit on this point and states that ‘without effective configuration
management we are not likely to effectively implement the other ITIL
processes, and this will lead us to a failure to deliver a quality
service.’

In ITIL terms Configuration Management can be defined as Asset
Management plus relationships. By definition this statement broadens the
scope of Configuration Management. Most organisations have some sort of
asset management system in place, where they know the cost of equipment,
where it was purchased, and its current status. This system may only
cover hardware and bought-in software. However existing systems are
unlikely to cover the ‘relationships’ or linkages between these assets.

This linkage is very important: making changes to one asset can have a
knock-on effect on several others, so ITIL clearly focuses on Assets and
their relationships.

Because configuration management’s remit is wider than pure asset
management, we tend to refer to the information that Configuration
Management maintains as Configuration Items or CI’s, rather than IT
assets.

We have established that Configuration Management underpins all the
Delivery and Support Processes, and it defines IT assets and services as
Configuration Items. We’ve also established that it monitors the
inter-relationships or linkages between CI’s. So how does Configuration
Management store, manage and update this information? It does this by
entering all this information into a Configuration Management database
or CMDB.

A typical CMDB should contain information on:

• Hardware, Software, Peopleware, and related documentation.

• Services, and the relationship between Configuration Items.

• Incidents, problems and known errors.

• Changes and releases

Records at the highest level contain:

• Information about the organisation’s hardware, including servers,
  workstations, communications equipment and networks.

• Information relating to Software, including operating systems,
  application or script software, or any custom designed software.

• Details about Peopleware, including information related to IT service
  staff and their skills.

And finally,

• Information related to documentation, including procedures, contracts
  and so on.

The second level holds records related to IT services. A service might
be made up of several CI’s. For example a service for the personnel dept
might consist of hardware, software and related documentation, all of
which are individual configuration items. These items together can
provide a service, and the service itself can also be defined as a
configuration item.

ITIL suggests that we should be able to draw a map of how a service is
assembled from its constituent components. This graphical representation
can help us understand the impact of any changes we make to a CI on the
service as a whole.

The CMDB is also the ideal place to hold incident records, problem
records and known error records. If they are held on separate systems,
ITIL guidance suggests trying to link these databases, so that we can
link a record to any related configuration items. By doing so, future
searches on a particular CI will return information relating to
outstanding incident, problem or known error records.

In the change and release section of the CMDB, we may hold requests for
change, change records and so on. This information is used for tracking
the progress of change and release records. A release record will
contain information about a number of related CI’s, which make up a new
release, and will describe how to achieve a change defined in the change
records.

A CMDB can offer great benefits to an organisation. However the benefits
might not be immediately obvious to senior management, who might suggest
that a simple asset management system would be sufficient. However,
asset management only addresses higher value issues in the
infrastructure and doesn’t examine it to the same level of detail.

Perhaps more importantly, asset management systems wouldn’t contain the
linkages to incident, problem, or known error records, or to change and
release management records, and critically wouldn’t document the
relationships between CI’s and asset records that a CMDB would.

We briefly defined earlier in this lesson what constitutes a CI, and
ITIL defines a Configuration Item as ‘any component of an IT
Infrastructure, including a documentary item such as a Service Level
Agreement or Request for Change, which is, or is to be, under the
control of Configuration Management and therefore subject to formal
Change Control’.

CI’s will vary in type, distinguishing between hardware, software and
documentation, and in some circumstances, will sub-define lower level
configuration item records. For example, the hardware type might be made
up of workstations, servers, network equipment and so on.

Whatever the CI type, it will require a unique form of identification.
Firstly, a unique identifier, which should comply with a predefined
configuration policy. Also an ID type, which categorises the item into
hardware, software, peopleware and so on. Other common CI attributes
might include a manufacturer’s or developer’s ID, its location, purchase
date etc.

In addition to the CMDB, Configuration Management has linkages to two
other information repositories. These are the Definitive Software
Library or DSL, and the Definitive Hardware Store or DHS.

The DSL is the safe storage area for trusted software, and is managed by
the Release Management process.

The DHS houses spare parts for critical equipment, and replica
configuration models in the IT infrastructure. For example the DHS might
contain a fully configured standard server and workstation.

Again records relating to the contents of both the DSL and DHS are held
in the Configuration Management Database.

Also worth noting here is the management of software licences. This has
become a major issue for many organisations, and the repercussions of
illegal software use can be severe, so it’s considered good practice for
configuration management and release management to work jointly on this
process. In a fully ITIL implemented organisation, the configuration
management team would be expected to hold information about licences,
what they contain, and what they cover, as a CI in the CMDB. However, as
with the DHS and DSL, the physical licences might be held in a separate
repository.

ITIL suggests that Configuration Management is made up of five
sub-processes.

These are:
• Planning
• Identification
• Control
• Status Accounting
• Verification

Planning is carried out at the beginning of any process to establish a
configuration management plan, and should be revisited regularly.

The processes of Identification, Control, Status Accounting and
Verification are ongoing.

Let’s look at each of these processes in a little more detail.

The first of the Configuration Management sub-processes is planning.
ITIL suggests five key points which should be addressed in planning, and
these are:

• Strategy, policy, scope and objectives

• Processes, procedures, guidelines and responsibilities

• The relationships with other ITIL processes

• The relationship with other parties carrying out Configuration
  Management

• And finally tools and other resource requirements

We start by defining a strategy. For example, an organisation might want
to establish a Configuration Management system, but for its ‘live
systems’ only.

Another policy may define that all new bought-in or internally developed
systems or services are to be brought under Configuration Management
control at the point of hand over, but existing live systems will not be
within scope.

The scope might encompass desktop services, workstations and data
centres, but not the communication network. Accurate definition of the
scope is important in order to understand the amount of work involved,
and the resources required.

Once the strategy, policy and scope are defined, the objectives can be
outlined, along with a timeframe in which to achieve them. Remember the
objectives should be ‘SMART’ objectives, in other words Simple,
Measurable, Achievable, Realistic and Timely.

Having dealt with strategy, policy, scope and objectives, our next
action is to examine the processes, procedures, guidelines and
responsibilities.

The organisation might already have in place processes to control
assets, or change management processes. Although these may not be
formally identified as a Configuration Management process, they could be
adapted and improved upon.

Planning procedures should be created and maintained along with other
related guidelines. We will discuss this in more detail later in this
lesson.

And finally responsibility has to be allocated. After all, these plans,
processes and changes have to be carried out. So work should be
allocated to staff in either a configuration management group, or a
wider configuration, change and release management group.

If, in this example scenario, configuration management is being
introduced into the organisation after other ITIL processes, then it is
important to define how these other processes will have to change to
accommodate the new configuration management process. Alternatively, if
configuration management is implemented ahead of other processes, future
inter-process relationships will need to be considered.

Relationships with other parties who carry out Configuration Management
also require particular attention. Suppliers, external software vendors,
and developers might have their own CMDB with which we want to exchange
information.

The final point on planning is the use of tools, and other resource
requirements. Careful consideration needs to be given to CMDB
implementation, whether to design and build a CMDB from scratch, or to
purchase an off-the-shelf product. Vitally it should be possible to link
the CMDB to system and network management tools, with the benefit of
automatic CI recording to the CMDB via these tools.

The second of the five Configuration Management processes is
identification. The primary focus of the identification process is the
establishment of the ‘Configuration Item Level’. When defining a
configuration item we need to establish what level of detail is
appropriate. For example, a complete workstation might be considered as
a configuration item, or it could be further categorised into its
component parts, with each of these made a CI.

This logic must also apply to software, defining a CI as a program as a
whole, or a module or sub module of that program.

Generally speaking, select a configuration item level which is most
beneficial to the configuration management process. So, within any
organisation, greater levels of CI detail exist in some areas than
others. The greater the level of control required over an area or
service, the greater the amount of configuration management record
detail. Be careful in choosing the most appropriate level: balance
information availability and the level of independent control against
the resources and effort needed to support the CMDB at that level.

The key target is ‘maximum control with minimum records’.

It’s also worth noting that the level of configuration hierarchy could
be restricted by the support tools available. For example breaking down
a workstation into its monitor and screen, and then further down into
its motherboard, CPU and other component parts, may be impossible if the
depth of our CMDB system hierarchy is specified to two levels only.

A configuration item record may well contain information about candidate
configuration items below it in its hierarchy. For example, in the event
of a workstation failure, the policy might be to replace the whole
workstation rather than the failed component. However, CI information
about the failed component could be held in the CI for the workstation.
Also consider that a candidate CI might have linkages to CI’s other than
its immediate parent. In these circumstances the CI

information would show its linkage to its parent, and also a ‘used by’
relationship to other CIs. It would not be helpful to lose this level of
detail by incorporating details into the parent CI.

Documenting these linkages in the CMDB can have a huge impact on database
size. Each new CI added might identify three or four linkages. It’s good
practice to establish in advance the required levels of CIs in the
database, even if we don’t initially populate the database to this level.
With most CMDB tools, it’s far easier to have empty elements in the
database than to have to restructure the database at a later date.

Successfully building and maintaining a CMDB depends on accurately
identifying and labelling its configuration structures and CI versions
and types, and their linkages with other CIs. This is termed defining its
scope.

Defining scope identifies which items of hardware, software, peopleware
and documentation are to be included. Part of this process involves
identifying the number of ‘configuration types’, and what benefits their
identification will bring.

When identifying and refining CI types, we might come across candidate
CIs which are generally very similar, but have subtle differences. For
instance, two workstations exist, which, except for having monitors of
different sizes, are exactly the same. This slight difference in
specification wouldn’t justify the specification of a new CI type. To
help us accommodate these anomalies we can specify these as a ‘CI
variant’.

Version Identification needs to address the full lifecycle of the
Configuration Item, so, in addition to those items already in the live
environment, items in development and awaiting release are also included.
At the same time version numbers are assigned. These numbers should be
monitored carefully. If, for example, the development department assigns
its own version numbers, then it’s important that this information is
transferred to the CMDB at the point of handover.

In defining the inter-relationships between CIs, there are a number of
typical ‘types’ which can be used. The most frequently used in ITIL good
practice are Composition, Connection and Usage.

‘Composition’ is the simple parent child relationship: a workstation
being the parent, and the monitor, keyboard or system box being the child.

‘Connection’ describes the relationship between hardware items: the
relationship between a LAN and a server, for example.

‘Usage’ describes the interdependency between application usage of a
common software module, or the linkage from one category to the other.

Finally, having identified and documented information about CIs, the
items should be labelled. These labels might exist in electronic format,
or might be printed labels which we apply to identify the relevant CIs.

During development we might want to capture information about CIs and
their relationships, to reflect the position at a particular time. This
is known as ‘baselining’. This can be a very useful process, as
baselining can provide a rollback point if things go wrong. It can
provide a specification from which copies can be built, and can provide
valuable review information after the implementation of a request for
change.

During the baselining process, we should include the relevant related
items, including documentation, procedures, peopleware and so on.
Baselines should be established at formally agreed points of time, for
example before making significant change to the infrastructure. At any
point, the current configuration consists of the most recent baseline
plus any approved changes that have been implemented. It’s very common to
take baselines of standard workstation configurations to provide a
‘rollback’ position if recent changes prove unsatisfactory.

The third Configuration Management activity is Control. The control of
configuration items consists of three sub processes. These are: Register,
Update and Archive. An additional function of the control process is to
protect the integrity of configurations.

CIs are registered as they fall into the remit of IT service management.
If we receive new equipment from an external supplier, at the point of
handover, we should establish that information received from the supplier
is accurate. In many organisations this activity has a direct link with
procurement.

There are many reasons for updating a configuration item’s status: for
example, a change in the CI’s status from testing to ‘live’, a change of
financial asset value, a change of ownership, or changes brought about by
incidents, problems or known errors. All these


updates have to happen under the authority of the configuration
management process.

Archiving decommissioned CIs takes place when a component is no longer in
use. The definition of what constitutes a redundant CI, and the
decommissioning and timing details, would usually be specified in a
predefined policy document.

Archiving involves the removal of CIs from the CMDB and archiving onto
secure storage, and not necessarily the destruction of the record.

The protection process safeguards against illegal changes to CIs, and
procedures are maintained so that the CMDB and the information it
contains are secure. Protecting the integrity of the configurations
includes security against theft; protection against unauthorised change
or corruption; enforcing access control procedures; guarding against any
environmental damage; protection against viruses; and making back-up
copies of the CMDB information and storing these back-ups securely.

Configuration control scope must extend to ‘bought in’ CIs, such as
commercial ‘off the shelf’ software, sometimes known as ‘COTS’ packages.
By definition this will involve software licence issues, and we will be
examining this in more detail in the release management lesson.

Importantly, the protection procedures should be in place for the
definitive software library and definitive hardware store.

The fourth Configuration Management activity is Status Accounting.

ITIL defines status accounting as: ‘The reporting of all current and
historical data concerned with each CI throughout its lifecycle.’

Status accounting allows us to reveal a CI’s past status (what has
happened to it up to this point?), its present status (what state is the
CI in now?), and its future status (what plans are there for this CI in
the future?).

This accounting procedure enables changes to CIs and their records to be
tracked, and changes in a CI’s status to be documented, for example the
change from ‘live’ status to ‘withdrawn’. It can also help us establish
‘baselines’. By declaring a status of ‘trusted’ we save all the
configuration items and relationships as a baseline. If we encounter
problems at a later date, we can then retreat to this ‘baselined’ point.
Status accounting can also be used to monitor organisational procedures,
for instance, that a request for change on a configuration item was
properly authorised.

The fifth and final configuration management activity is Verification.

The primary function of Verification, or verification and audit as it is
sometimes known, is to establish that the information in the CMDB exactly
matches the real life environment. Configuration management offers little
benefit if the information that it provides is out of date or inaccurate.

This verification and audit procedure should be carried out regularly but
randomly. Deliberate avoidance of the change and configuration management
processes is most likely to be revealed by this ‘spot check’ approach.
These audits involve checking the physical whereabouts of equipment, and
installed software. In addition to the regular ‘spot checks’,
verification and audit would usually be carried out at the following
times:

• Before a new release, or before the preparation of a baseline.

• After a disaster, to establish that our records are accurate following
  a major failure in the IT infrastructure.

• Following detection of unauthorised changes to the infrastructure. A
  single unauthorised change might be concealing many others, with the
  result that the CMDB would not reflect the real life situation.

• Before the live implementation of a new Configuration Management
  database.

Carrying out a manual verification and audit can be a time consuming and
expensive procedure. ITIL recommends the use, where possible, of
automated verification tools. These tools are able to roam networks and
servers, reporting on installed hardware and software. Interestingly,
many manufacturers are building automated management functions into their
PCs.

It’s also worth remembering that some verification can be carried out by
the service desk staff. During calls from users, service desk staff can
ascertain what hardware and software are being used, and whether this
matches current configuration item records.

Finally, it’s worth noting that in many large organisations,
responsibility for the verification and audit process would rest with a
Configuration Librarian.
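
The comparison at the heart of verification and audit can be illustrated with a short sketch. This is not an ITIL-defined algorithm or the interface of any real discovery tool: the function name, the record layout (a dictionary of attributes per CI identifier) and the wording of the findings are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AuditFinding:
    ci_id: str   # hypothetical CI identifier, e.g. an asset label
    issue: str   # human-readable description of the discrepancy


def audit_cmdb(cmdb: dict[str, dict], discovered: dict[str, dict]) -> list[AuditFinding]:
    """Compare CMDB records with what a spot check or discovery tool found.

    Both arguments map a CI identifier to a dictionary of attributes.
    """
    findings = []
    for ci_id in sorted(cmdb.keys() | discovered.keys()):
        recorded = cmdb.get(ci_id)
        actual = discovered.get(ci_id)
        if actual is None:
            # The CMDB says the item exists, but the audit could not find it.
            findings.append(AuditFinding(ci_id, "recorded in CMDB but not found"))
        elif recorded is None:
            # The item exists but was never registered: a possible unauthorised change.
            findings.append(AuditFinding(ci_id, "found but not recorded in CMDB"))
        else:
            for attr in sorted(recorded.keys() | actual.keys()):
                if recorded.get(attr) != actual.get(attr):
                    findings.append(AuditFinding(
                        ci_id,
                        f"{attr}: CMDB has {recorded.get(attr)!r}, "
                        f"found {actual.get(attr)!r}",
                    ))
    return findings
```

An empty list of findings would mean the CMDB matches the sampled environment. Any finding would be followed up through the change and configuration management processes rather than corrected silently.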


As we discussed earlier in this lesson, configuration management is
closely linked with the overall Service Support and Service Delivery
processes, both supporting and depending on these processes.

When an incident is identified it passes through these processes, and
it’s important to realise how the CMDB, and configuration management as a
whole, support this.

Each of the service support processes reads and writes information in the
CMDB throughout the incident’s lifecycle.

For example, when an incident occurs we will record it in the CMDB. At
the same time we could examine the CIs which might be causing the
incident.

When the incident moves into the Problem process, we will be recording
the problem information in the CMDB, and also looking at the CMDB for
related incidents. The Known Error process will have links in the
database to problem records, which in turn are linked back to the
‘underlying cause’ configuration items.

When executing a Request for Change the Configuration Items, and their
interrelationships, will be examined in order to assess the impact of the
change. Change records will be stored, and their status changed as the
change moves through the build, test and implementation stages of the
change process. In this integrated environment we can see the fundamental
role of the configuration management database and configuration
management as a whole.

The ultimate update authority always lies with the configuration
management process, but this authority can be delegated in the case of
incident and problem records. Configuration management also remains
responsible for updating the CMDB during the change and release
processes, often acting on behalf of the change and release management
processes.

Benefits & Problems

The benefits of and potential difficulties with Configuration Management
are listed on Page 26 of the little ITIL book and in Section 7.4 of the
Service Support Manual.

Summary

In this lesson we have been looking at the configuration management
process.

We have seen how configuration management forms the foundation on which
service delivery and service support functions are built, and how all of
these processes support service level management.

In ITIL terms, configuration management can be defined as asset
management plus relationships, and we looked at how these assets are
defined as Configuration Items or CIs.

We went on to examine the configuration management database or CMDB, its
structure, and the type of information and records it should contain. We
also looked at how the CMDB links to the Definitive Software Library and
the Definitive Hardware Store.

We discussed in detail the five Configuration Management sub-processes,
Planning, Identification, Control, Status Accounting and Verification,
and we went on to look at the relationship between Service Support and
Service Delivery and the CMDB.

And finally we looked at the potential benefits and pitfalls when
implementing configuration management.
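
One idea from this lesson that lends itself to a short sketch is the statement that the current configuration consists of the most recent baseline plus any approved changes that have been implemented. The sketch below is only an illustration under assumed names: a configuration is represented as a dictionary of CI attributes, and each implemented change as a hypothetical (CI identifier, attribute, new value) tuple.

```python
from copy import deepcopy


def take_baseline(configuration):
    """Capture the configuration as it stands at a formally agreed point in time."""
    return deepcopy(configuration)


def current_configuration(baseline, implemented_changes):
    """Rebuild the current configuration from a baseline plus implemented changes.

    Each change is a (ci_id, attribute, new_value) tuple. The baseline itself
    is never modified, so it remains available as a rollback point.
    """
    config = deepcopy(baseline)
    for ci_id, attribute, new_value in implemented_changes:
        config.setdefault(ci_id, {})[attribute] = new_value
    return config


# Hypothetical standard workstation build.
baseline = take_baseline({"WS-STD": {"os": "1.0", "ram_mb": 64}})
approved = [("WS-STD", "ram_mb", 128)]
current = current_configuration(baseline, approved)
```

If the change proves unsatisfactory, discarding `current` and returning to `baseline` corresponds to the ‘rollback’ position described in the lesson.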

Lesson 3b Change Management

Lesson 3b

Change Management

Objectives

In this lesson we will be examining the second of the ITIL control
processes, Change Management, which is described in Chapter 8 of the
Service Support book of the IT Infrastructure Library.

In this lesson we will:

• Define what change is in ITIL terms, and the goal of Change Management.

• Examine the relationships between Change Management and other ITIL
  processes.

• Define a Request For Change or RFC, and examine some of its potential
  sources.

• Look at the role of the Change Advisory Board, and the Change Advisory
  Board Emergency Committee.

• Examine the Change Management process in detail.

The second control process within ITIL guidance is Change Management.

So what is Change Management? Well, let’s start by more accurately
defining the term change. It has many definitions, but possibly the
simplest one is the most apt.

‘Change is the process of moving from one defined state to another’.

ITIL defines the goal of change management in the following way.

‘To ensure that standardised methods and procedures are used for
efficient and prompt handling of all changes, in order to minimise the
impact of any related Incidents upon service’.

Change Management can either be restricted to changes to the ICT
infrastructure and the current ICT services offered in the live
environment, or it can be expanded to cover all changes, including those
in development areas, or changes which are the result of strategic
decisions.

There are a number of key points here which highlight why the change
management process is critical to a well run IT services organisation.

The first of these is the ability to handle changes promptly and
efficiently. When a need for simple and routine change occurs, Change
Management should handle it in a streamlined and pre-planned manner.
Where more significant and complex changes arise, they should be dealt
with efficiently, but to an appropriate level of detail.

Change Management is responsible for implementing changes in the
organisation with the minimum of disruption. Historically, making changes
to the IT infrastructure has resulted in a loss of business, and lost
production time. ITIL guidance addresses the potential impact of proposed
changes by suggesting the use of fixed change slots in what’s termed a
‘forward schedule of change’.

As a result users are informed about upcoming changes, what the change
entails, and when it will take place. As a further safety net, change
management carries out impact analysis on proposed changes, and produces
a backout plan, giving the organisation a point to which it can retreat
if a change proves unsatisfactory.

And finally, Change Management must balance the need for change against
the risks to the IT infrastructure of implementing it.

Change Management Relationships

For change management to be effective it must work very closely with
several other IT service management disciplines. These processes are
Release, Capacity, Availability and Configuration Management. We
mentioned earlier in the course that it’s quite common for change,
configuration and release management processes to be staffed and managed
as a single team.

So let’s look at some of these relationships in more detail. Change
Management has overall responsibility for assessing the potential impact
of any changes on the ICT infrastructure. It’s supported in this role by
Capacity and Availability Management.

Capacity Management will assess the impact on the business performance of
any proposed change. On the other hand, Availability Management will be
concerned about any impact the change has on service availability.
Capacity and Availability Management should be involved as early as
possible in the change process in order to judge the impact of the
proposed changes.


Any change to the infrastructure involving software, hardware, services
and so on, will result in changes to Configuration Items. As a
consequence Change Management must work closely with Configuration
Management. As we said earlier, part of Change Management’s
responsibility is the analysis of any proposed change. To do this
effectively it must understand what CIs will be affected by the change,
the way in which constituent CIs are linked, and if linked, how they make
up one or more services. So Configuration Management identifies CIs which
are likely to be affected, on behalf of Change Management.

By exchanging information with Capacity, Availability and Configuration
Management, Change Management is able to ‘assess the overall impact’ of
the change. Once it has been assessed we should be able to state that
the impact is manageable, the cost of change is reasonable, and business
benefits are worthwhile. At this point Change Management ‘authorises the
change’.

In many cases this authorisation is given with the help of other experts
who form a body known as the Change Advisory Board. In some cases, where
the change is a simple one, Change Management can be devolved. In these
cases it is common for the process to be devolved to Problem Management,
or even to operational staff.

Throughout the change management process, there is an ongoing update of
information within the Configuration Management database. For example, a
CI status can now be moved to ‘under change’, or a new CI is created if
we replace one piece of software with another and so on.

And finally when a change is ready for release to the wider user
community, be it affecting software, hardware, documentation or related
infrastructure components, it falls to Release Management to manage the
actual physical implementation. Remember however, that overall
responsibility for any change remains in the hands of change management.

The trigger for the Change Management process is the receipt of a Request
For Change or RFC.

ITIL defines a number of sources from which an RFC can be received. The
most common and well documented are those that form part of the incident
resolution lifecycle: for example, where a user identifies an incident
and reports it to the service desk staff, who in turn generate an RFC, or
where Problem Management generates an RFC after investigation of multiple
incidents has led to a known error and a proposed ‘structural’
resolution.

Another source of RFCs is the need for the introduction of new or
upgraded CIs. For example, if your organisation has recently purchased
new workstations, their installation, addition to the network,
recognition by the server, and the provision of help and user
documentation will all generate RFCs.

We may have a ‘New or changed business requirement for an IT service’,
often identified by the service level review process. Again this will
generate a Request for Change, and be passed on to the Change Management
Process.

An RFC might arise because of customer or user dissatisfaction with a
current service. This may not have been reported via incident or problem
management, and it might not be outside our current Service Level
Agreements. However, it’s important, where financially viable, to meet
customers’ requests.

Implementation of new or changed legislation might bring about an RFC.
Particular examples include legislative changes relating to privacy,
intellectual property rights, security and so on.

A major change in business requirements may generate a significant
Request for Change. Such a request may have already passed through a
conventional investment appraisal process, and enters the ITIL Service
Management process for a second review. The role of Service Management is
to ensure full impact analysis against effects on existing services, and
on the infrastructure as a whole.

Typically, a request for change will contain such information as the
sponsor, the requested date for implementation, an initial list of
configuration items affected, services affected, the reason for change
and initial costing information. The exact content will vary depending on
the origins of the RFC.

One of the main responsibilities of the Change Management Process is to
establish a ‘Change Advisory Board’ or CAB.

The role of the CAB is to consider RFCs, and in the light of the business
need make recommendations as to whether they should be accepted and
implemented, or rejected. It also ensures that any RFCs which don’t merit
detailed consideration by the CAB are recorded. The CAB will also advise
on the grouping of changes into ‘releases’ to minimise disruption to the
organisation and maximise benefits.
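
The typical RFC contents listed above (sponsor, requested implementation date, affected configuration items and services, reason for change and initial costing) can be sketched as a simple record. The class and field names below are assumptions chosen for illustration. ITIL does not prescribe a data format, and the exact content of an RFC varies with its origin.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class RequestForChange:
    sponsor: str
    reason: str
    requested_date: Optional[date] = None            # requested implementation date
    affected_cis: list[str] = field(default_factory=list)
    affected_services: list[str] = field(default_factory=list)
    initial_cost: Optional[float] = None              # initial costing information
    source: str = "unspecified"                       # e.g. incident, problem, business

    def missing_fields(self) -> list[str]:
        """List the optional details that have not yet been supplied."""
        optional = ("requested_date", "affected_cis", "affected_services", "initial_cost")
        return [name for name in optional if getattr(self, name) in (None, [])]


# Hypothetical RFC raised from a service level review.
rfc = RequestForChange(sponsor="Finance", reason="New reporting requirement")
```

A helper such as `missing_fields` gives the recipient of the RFC a quick way to see which details still need to be gathered before impact assessment.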


A CAB is typically made up of a Change Manager, who will usually chair
the meeting, plus representatives of the customer, users, developers,
other experts, consultants, outside contractors, and of course IT service
management staff.

At any CAB meeting there may be a different combination of staff
attending. However, the core members of the CAB should be the
chairperson, customer, user, and ITSM representatives.

In general a CAB is regarded as an advisory body, although in some
organisations it is defined as an approval board. Its role is considered
advisory because the ultimate responsibility for change lies with the
change management process and hence the change management staff. As a
consequence this provides a definitive mechanism for change approval, and
makes changes traceable.

When making decisions about a proposed change, the CAB should consider
the business, financial, technical and risk implications. It should also
consider the repercussions of not implementing the change at all.

One other area for consideration when deciding whether or not to
implement a change is its likely impact on IT continuity plans. Making
changes to the IT infrastructure without making changes to any fall back
sites can be very dangerous.

The CAB Emergency Committee

In many large organisations IT provision is now 24 hours a day, seven
days a week. In such environments the need for an RFC could occur at any
time. In such organisations it is usual to have a Change Advisory Board
Emergency Committee in place. The CABEC is usually called in at short
notice to analyse the impact of an RFC, and authorise any correcting
work.

The committee would usually consist of the Change Manager, who acts as
Chairperson, a senior business representative, and a senior IT
representative.

A word of caution here about CABEC activities. Often, due to time and
business pressures, comprehensive testing of changes isn’t possible. Nor
are configuration items always updated with status or change information.
Ultimately the CAB is responsible, through the emergency committee, for
ensuring that the change management and configuration management
processes work together to update relevant records and logs as soon as
possible.

Change Procedures

We established earlier in this lesson that the trigger for the Change
Management Process is the receipt of a request for change.

To address these RFCs, ITIL defines a comprehensive change management
process, and we will spend the next few minutes looking at this process
in some detail.

So let’s start with an incoming Request for Change, remembering that RFCs
can come from many sources, including the business, other service
management staff, or as a direct result of incidents or problems.

The initial recipient of the RFC is the Change Manager. At this point
RFCs are filtered, with the Change Manager rejecting those which, for
example, have been incorrectly requested, are requests for service
modification rather than changes, or are repeats of earlier requests.

It’s usual for RFCs to be logged in the CMDB ahead of this filtering
process. However, after filtering it’s common for RFCs to change status
and be redefined as change records.

If the change is accepted it moves to the next process, and the Change
Manager allocates a priority to the change. This involves assessing
changes for impact on the business and urgency. There are two possible
outcomes from this assessment: ‘urgent change’ or ‘standard change’.
Whether changes are standard or urgent, the principles for processing
them remain the same. However, urgent changes pass through a
‘streamlined’ version of the change management process, and we will be
looking at this process later in this lesson.

In this example, the change is considered non urgent, and so passes on to
the ‘Categorisation’ process.

Change categorisation involves an initial assessment of the actions and
resources required to make the change. There are four possible outcomes
from this process. These are: Standard, Minor, Significant and Major.

A ‘standard’ categorisation is assigned when a frequently occurring
change is identified. It can then be dealt with via a pre-existing set of
processes and authorisations. These change types are usually considered
low risk, and don’t require consideration by the CAB. An example of this
might be a hard disk replacement or upgrade on a user workstation.


The definitions of minor, significant and major will be set by individual
organisations, and will depend on the current status of the IT
infrastructure, and on the IT service management personnel’s current
attitude to risk.

A ‘minor change’ categorisation would usually be authorised by the Change
Manager, who will report their actions to the CAB after completion of the
change. The aim here is to reduce the number of RFCs forwarded to the CAB
by filtering out any low risk changes.

If the change is defined as either significant or major, then the CAB
will have a significant role. In both cases, the first action is for the
Change Manager to circulate RFCs to either the CAB, or in the case of a
major change, to the company Board or other senior management members.

As we saw earlier in this lesson, the CAB’s role is to give advice,
provide estimates on required resources and timescales, and put forward
schedules for change based on priority and resource availability. The CAB
will also perform detailed impact analysis, and this often requires input
from ITSM specialists, for example the Capacity Manager.

Eventually implementation dates and a schedule are decided upon. This
information is contained in a ‘forward schedule of change’, which is
passed to the relevant service management staff, and to the business as a
whole. If changes are likely to cause disruption to the business, then
this will be formally documented in a ‘Projected Service Availability
Report’.

Remember, not all RFCs considered by the CAB will be accepted. After
investigation, the potential risk or financial implications might be
considered too high, and outweigh any potential benefits the change might
bring.

The CAB activities of estimating and scheduling may well be iterative,
and the process continues until an approved change status is reached, or
the RFC is rejected, in which case it might re-enter the process at the
beginning. At the point of approval, the Configuration Manager updates
the Configuration Management Database.

The change has now reached the Change Building sub process. The Change
Builder may actually consist of several groups of internal or external
staff, who are involved in hardware, software, operating systems,
documentation and so on. Change Builders are not normally permanent
members of a Change Management Team, but are drawn from areas of
technical expertise.

Note that a failure during the change building process will almost
certainly result in the change returning to the CAB, possibly with a
request to modify the scope of the change. It’s important that all
changes have a back out plan, so that if an error occurs during
implementation, the change can be reversed and the service restored. At
this point the failed change will re-enter the process at the CAB level.

Once the change is complete it moves to an independent tester, where the
change is tested and quality checks are carried out. If at this point a
failure occurs, the change is returned to the Change Builder.

If the change is tested successfully it moves on to the Change Manager,
who coordinates the implementation of the change.

Remember that the Change Manager has overall responsibility for the
change, but that Release Management normally has control at a detailed
physical implementation level.

Note that throughout the cycle of building and testing, and during
implementation, the Configuration Management process is updating the
status of change records. Typical statuses include accepted, in build,
under test and so on. A change record will typically contain details of
the back out plan, when it was built, CAB recommendations and scheduled
implementation dates. As a consequence, the change record is frequently
changed.

It’s important to accurately manage the change record system within the
CMDB, so that we can carry out traceability tests. Change records are
usually linked to impacted infrastructure configuration item records, and
also to any related incident, problem or known error records.

If at the point of live implementation the change fails, then the Change
Builder instigates the back out plans. If however, the change is
implemented successfully, it’s important that the Change Manager reviews
the change.

The review process can provide valuable information about our change
management process, and can also identify vulnerable areas in the IT
infrastructure. A successful review will trigger the ‘closed’ status, and
the request for change or change record will be updated in the CMDB. Note
that the CAB itself might be involved in the review process. A failure at
the review stage would identify shortcomings in the implemented change.
This in turn would result in new requests for change entering the
process.
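
The flow of a non-urgent change described above can be summarised as a small state machine. This is a sketch under stated assumptions, not an official ITIL model. The statuses ‘accepted’, ‘approved’, ‘in build’, ‘under test’ and ‘closed’ come from the text. The others (‘logged’, ‘rejected’, ‘implementing’, ‘backed out’, ‘under review’) and all class and variable names are hypothetical labels added so that every step has a name.

```python
from enum import Enum


class Category(Enum):
    STANDARD = "standard"
    MINOR = "minor"
    SIGNIFICANT = "significant"
    MAJOR = "major"


# Who authorises each category, as described in the lesson.
AUTHORISER = {
    Category.STANDARD: "pre-authorised standard model",
    Category.MINOR: "Change Manager",
    Category.SIGNIFICANT: "Change Advisory Board",
    Category.MAJOR: "Board or senior management",
}

# Allowed status transitions, including the failure paths.
TRANSITIONS = {
    "logged": {"accepted", "rejected"},           # Change Manager filters RFCs
    "accepted": {"approved", "rejected"},         # assessment and authorisation
    "approved": {"in build"},
    "in build": {"under test", "accepted"},       # build failure returns to the CAB
    "under test": {"implementing", "in build"},   # test failure returns to the builder
    "implementing": {"under review", "backed out"},
    "backed out": {"accepted"},                   # failed change re-enters at CAB level
    "under review": {"closed"},                   # a failed review raises new RFCs instead
    "rejected": set(),
    "closed": set(),
}


class ChangeRecord:
    def __init__(self, rfc_id, category):
        self.rfc_id = rfc_id
        self.category = category
        self.authoriser = AUTHORISER[category]
        self.status = "logged"
        self.history = ["logged"]   # kept so that changes remain traceable

    def move_to(self, new_status):
        if new_status not in TRANSITIONS[self.status]:
            raise ValueError(f"cannot move from {self.status!r} to {new_status!r}")
        self.status = new_status
        self.history.append(new_status)
```

Keeping the full status history on the record reflects the lesson’s point that change records are updated frequently and must support traceability.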


In the previous few pages we have seen how the Change Management process
deals with a standard change. We will spend the next few minutes looking
at how Change Management deals with an RFC which has been given an Urgent
priority by the Change Manager.


The first action is for the Change Manager to We saw earlier in this lesson how the Change
call either a CAB meeting, or in an emergency Manager examines RFC’s and categorises them
situation, the CABEC. The aim of this meeting is as either, standard, using a standard change
to quickly evaluate the request for change, by model, minor, significant or major. To assign
assessing its impact, the resources required and one of these categories, the Change Manager
its urgency. The meeting should establish examines the RFC, and considers the following:
whether its urgent status is justified. If the
outcome suggests that the RFC status isn’t Impact
urgent, then it will be rejected, and will be dealt The impact the request for change will have on
with as a standard RFC. the business, considering such factors as the
number of users affected.
If, on the other hand, the RFC status is
confirmed as urgent, then it passes on to the Novelty
next process and in to the hands of the Change Is the change familiar? Has it occurred before?
Building Team. The Change Building Team then Together, Impact and Novelty can provide us
build the change and where technically possible, with some idea about the level of risk involved
prepares a back out plan. with the RFC. A RFC with high impact and high
novelty is certainly a higher risk.
When the change is complete, as much testing
as possible should be carried out. Completely Devolved Authorisation
untested Changes should not be implemented if Has the responsibility for change been devolved
at all avoidable. In this case, the Change from the CAB to the Change Manager? Or
Manager then coordinates the implementation further devolved to say the Service Desk.
of the change into the live environment.
Standard Model
If the implemented change fails, the Change Can the request for change be dealt with via a
Manager implements the back out plan. If the standard model, with a pre-established
change is successful, then the Change Manager implementation process?
firstly ensures that records are brought up to
date, carries out testing in the live So lets add some content to our table, We’ll
environment, and at a later date, reviews the start with column 1.
change. If after the review, the change is
considered successful, then it is closed, and the This RFC is regarded as low impact to the
Configuration Manager closes the RFC and business, and is a well known change, so the
updates the CMDB. novelty is also low. Authorisation has been
devolved to the change manager, and a
Lets take a few steps back, and look again at standard model exists. This is a high frequency
the process, assuming this time we have time RFC.
to test the change. This time our built change
passes from the Change Builder to the Column 2 is slightly different, again the RFC is
Independent Tester who carries out testing as regarded as low impact, but it hasn’t been done
quickly as possible. If tests are successful, then before, so its novelty is high, and as a
the change is forwarded to the Change Manager consequence, no standard model exists. Again
for coordination of implementation. If the authorisation is devolved, and it’s categorised
change fails during testing, then it returns to as a minor RFC. This type of RFC could act as a
the Change Builder process. trigger to build a new standard model.

The Change Management process deals with In our third example, the results are slightly
Requests For Change from many areas of the different. Our RFC has a high degree of novelty,
organisation, and with different levels of and no standard model exists. It will be
authorisation. Where RFC’s are frequent and forwarded to the CAB, so authorisation isn’t
repetitive, they can be dealt with via pre- devolved to the change manager. This RFC falls
existing and authorised processes. These into the significant category.
processes are known as a ‘standard model for
change’. The RFC in our fourth example has a standard
model, however, business impact is considered
Standard models needn’t be solutions to simple high, so devolution to the Change Manager
changes, often complex operations can have won’t take place, and it must be examined by
standard models. In general once a RFC is the CAB before the standard model processes
regularly repeated, we can create a standard are implemented. Hence this is regarded as a
model for that change. significant RFC.


As both the impact and novelty are high, the RFC in our fifth example must
also be considered by the CAB. This is also a ‘significant’ RFC.

In example six, we are considering a change which has very high business
impact. For example, changing from an ISDN based telephony system to ADSL.
Changes of this magnitude would normally be authorised at a higher level
than the CAB. It is categorised as a major RFC.

Finally, let’s examine a couple of examples which in general should be
avoided.

Firstly, a Change which is regarded as high impact, but which has devolved
authority, is likely to be considered very risky.

Secondly, a change which has no standard model but is low novelty should, by
definition, have a standard model in place, and shouldn’t be re-submitted to
the CAB.

Over time, we should expect the number of standard models, and the changes
passing through them, to increase. This should result in a reduction in the
number of changes forwarded to the CAB, and reduce the number of ad-hoc
change requests devolved to the Change Management Process.

Metrics & Audit for Change Management Process

We’ve seen in this lesson how Change Management improves the way in which an
organisation implements changes. To clearly identify these improvements,
Change Management measures process performance, and this is carried out in
accordance with our own standards.

Measuring performance usually takes place over time to show, for example,
that the number of urgent changes is reducing. So that the results can be
clearly understood at all levels in the organisation, this data is usually
represented in graphical form.

Regular summaries of the change process should be provided to service,
customer and user management. Different management levels are likely to
require different levels of information, ranging from the Service Manager,
who may require a detailed weekly report, to senior management committees
who only require a quarterly management summary.

Typical metrics for measuring the change management process are:

• The number of changes implemented during the measured period

• Number of changes backed out by reason code

• Number of Staff Training records up to date

• Cost per change versus estimated cost

• Number of urgent changes

By auditing the change management process we can check for compliance to
procedures. In general a change management audit should investigate:

All new software releases
Checking that they have been through a proper authorisation process

Incident Records
Usually selected at random, and tracked through the change process

Minutes of CAB meetings
Not only to check that CAB meetings have taken place, but also to see if
identified action points have been followed through

Forward schedule for change
To see if it has been accurately defined, and importantly, that it’s been
published to the user community, and is being adhered to.

And finally, that Change review records are in place for all changes.

Efficient Change management requires an ability to change things in an
orderly way, without making errors or wrong decisions. Effective change
management is indispensable to the satisfactory provision of services, and
requires an ability to absorb a high level of change.

Benefits & Problems

The benefits of and potential difficulties with Change Management are listed
on Page 33 of the little ITIL book and in Section 8.4 of the Service Support
Manual.

Summary

In this lesson we have been looking at Change Management, the second ITIL
control process.

We began the lesson by defining what change is, and the goal of Change
Management, in ITIL terms.


We looked closely at the relationships between Change Management and other
ITIL processes, particularly Release, Capacity, Availability and
Configuration Management.

We established that the trigger for the Change Management process is the
receipt of a Request For Change, and we looked in detail at some of the
sources of these requests.

We examined the role of the Change Advisory Board or CAB, its make up, and
the role it takes in the Change Management process. We went on to look at
the role of the Change Advisory Board Emergency Committee.

We studied in some detail the Change Management process for both standard
and urgent RFCs, and defined the standard, minor, significant and major RFC
categories.

Finally we discussed the use of metrics and auditing, in order to evaluate
the change process, and highlighted the benefits, and potential pitfalls, of
the Change Management process.
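
The RFC categories defined in this lesson can be condensed into a small
decision rule. The Python sketch below is illustrative only: the names
Rfc, categorise_rfc and avoidable_patterns, the three impact levels and the
order of the checks are assumptions generalised from the six worked
examples, not an ITIL-defined algorithm.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical impact scale; the lesson only speaks of low, high and very high.
LOW, HIGH, VERY_HIGH = "low", "high", "very high"


@dataclass
class Rfc:
    impact: str           # LOW, HIGH or VERY_HIGH
    high_novelty: bool    # True if the change has not been done before
    devolved: bool        # True if authorisation is devolved from the CAB
    standard_model: bool  # True if a standard model for change exists


def categorise_rfc(rfc: Rfc) -> str:
    """Return 'standard', 'minor', 'significant' or 'major' (assumed rule)."""
    if rfc.impact == VERY_HIGH:
        return "major"        # authorised at a higher level than the CAB
    if not rfc.devolved:
        return "significant"  # must be examined by the CAB
    if rfc.standard_model:
        return "standard"     # devolved and handled by a standard model
    return "minor"            # devolved, but no standard model exists yet


def avoidable_patterns(rfc: Rfc) -> List[str]:
    """Flag the two combinations the lesson says should generally be avoided."""
    found = []
    if rfc.impact != LOW and rfc.devolved:
        found.append("high impact with devolved authority is very risky")
    if not rfc.high_novelty and not rfc.standard_model:
        found.append("low novelty change should have a standard model")
    return found


# The six worked examples. Values the lesson does not state are assumptions.
examples = [
    Rfc(LOW, False, True, True),         # example 1
    Rfc(LOW, True, True, False),         # example 2
    Rfc(LOW, True, False, False),        # example 3 (impact assumed low)
    Rfc(HIGH, False, False, True),       # example 4 (novelty assumed low)
    Rfc(HIGH, True, False, False),       # example 5
    Rfc(VERY_HIGH, True, False, False),  # example 6 (novelty assumed high)
]
categories = [categorise_rfc(r) for r in examples]
print(categories)
```

Whether authorisation is devolved is treated here as an input, because the
lesson presents it as a decision taken by the organisation rather than
something derived from impact and novelty.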

Lesson 3c Release Management

Lesson 3C

Release Management

Objectives

In this final lesson on the ITIL control processes we will be looking at
Release Management, which is described in Chapter 9 of the Service Support
book of the IT Infrastructure Library.

When you have completed this lesson you will be able to:

• Describe why Release Management is needed

• List the major benefits, costs and possible problems of this process

• Understand how the Release Management process functions, and its
relationship with other IT and Service Management processes

• Describe what is meant by a Definitive Software Library (DSL), a
Definitive Hardware Store (DHS), a Release Schedule, a release policy and a
release metric

Introduction

The third and final ITIL control process is Release Management. ITIL defines
the goal of this process as:

‘To take a holistic view of a change to an IT Service and ensure all aspects
of a Release, both technical and non-technical, are considered together.’

Release Management implements new software or hardware releases into the
operational environment using the controlling processes of Configuration
Management and Change Management.

So why do we need Release Management?

Well, in simple terms it’s the control process which ensures that all
aspects of a release are handled properly, including the software, hardware
and documentation required. It focuses on protecting the live environment
and its services through the use of formal procedures and checks. This
process requires technical competence and its sub-processes are often
performed by technical staff under the overall authority of the Change
Manager.

A release is defined in ITIL as a collection of authorised changes to an IT
service.

Releases are often divided into:

Major Software Releases and Hardware Upgrades
These would usually contain large amounts of new functionality, some of
which may make intervening fixes to Problems redundant. A major upgrade or
release usually supersedes all preceding minor upgrades, releases and
emergency fixes.

Minor Software Releases and Hardware Upgrades
Usually containing small enhancements and fixes, some of which may have
already been issued as emergency fixes. A minor upgrade or release usually
supersedes all preceding emergency fixes.

And finally, Emergency software and hardware fixes, normally containing the
corrections to a small number of known problems.

Release Management’s holistic approach to IT service change ensures that the
business as a whole and any relevant technical areas are ready to accept,
implement and use that release. It is the responsibility of the Release
Management process to plan and oversee the ‘roll out’ of these changes.

‘Roll out’ includes distributing all the configuration items to wherever
they are used. This could be done in a number of ways, either via the
internet, by email, or by simply posting CDs. In general, use whatever means
best suits the business.

This all sounds very simple. However, the process becomes much more complex
when hundreds of servers need to be upgraded simultaneously throughout a
large geographic and cultural area. To ensure successful distribution, clear
and repeatable processes as well as technical and business skills will be
required.

As part of the Roll Out activities, it is likely that you will need to
provide scripts to help install the release, as well as passwords to
activate the release when needed. Release Management is also tasked with
ensuring that only the correct, authorised and tested versions are installed
into the ‘live’ infrastructure.

Additionally Release Management ensures that we can trace where a particular
version comes from, and the related changes it has undergone. This is
especially important for “due diligence and governance”. To make this
possible, software needs to be kept securely before, during and after the
move to the ‘live’ environment.

Release Management also agrees the exact contents of any release and a
detailed roll out plan with Change Management.

The Release Management process encompasses three defined areas of the
organisation.

The development area, its own area of pre-production, and finally the
production area, or live environment.

The migration from one area to the next is only permitted subject to
satisfactory results from reviews, tests and other appropriate quality
checks.

Release Management has full responsibility for the pre-production
environment, which contains both the Definitive Hardware Store, or DHS, and
the Definitive Software Library or DSL. Although we show the DHS & DSL
within the Pre-production area, it is important that they remain detached
from the development, pre-production and live environments. Remember, it’s
just as important to control a hardware change and release as it is to
manage the software equivalent.

Independent testing might include customer acceptance testing, operational
acceptance tests and so on. It may well be that significant customer
acceptance testing has already been carried out. However operational
acceptance tests are very important – they ensure that anything that goes
wrong in the live environment is supportable, maintainable and robust.

Also worth noting is that any back out plans which have been prepared should
also be tested.

Part of Change Management’s role is to decide on the particular contents of
the release, and it is very important that the release management team are
fully aware of the decisions that have been made by other organisational
elements.

Within the actual production environment we will have to deal with
distribution, potential rebuild and implementation of software and hardware
releases. There may be three separate stages: firstly to distribute
software, secondly to build it or rebuild it in the live environment, and
finally implementation.

Each of these three stages should be verified as accurate. For example,
before we attempt implementation, we should be absolutely certain that a
rebuild process has been achieved correctly.

Note that ITIL refers to specific steps called ‘Roll Out Management’ and
this may take place after independent testing to manage in more detail the
actual implementation stages that follow. Roll out management usually comes
into play when we’re dealing with very large and complex implementations or
‘roll outs’.

Throughout this process it is very important to update the CMDB. Information
on Release Records is held here, and any status changes to these records
should be documented.


Definitive Software Library and the Definitive Hardware Store

Release Management has responsibility for two critical repositories. These
are the Definitive Software Library or DSL, and the Definitive Hardware
Store, or DHS.

Information related to the contents of the DSL and the DHS is held in the
Configuration Management Database, and responsibility for keeping these
records up to date belongs to Configuration Management.

The DSL contains only trusted versions of software, for example software
which has been developed from valid earlier versions via correct Change
Management Processes.

The DSL may consist of one disk containing all bought in and created
software held in a single format. Commonly the DSL consists of separate disk
volumes or servers containing software for individual environments.
Additionally the DSL could contain other software media, such as diskettes,
CDs and so on, which might be stored in a separate cabinet.

Software assets are particularly vulnerable to unintended loss or
corruption, so it’s important to take very good care of the DSL, for example
by employing adequate security and access controls. Appropriate protection
against other threats, such as fire or flood, should also be in place.
Backup copies of critical elements of the DSL would usually be kept, often
at another location.

Finally, the DSL should be protected against virus infection by running
regular virus checks on any item entering the library.

The Definitive Hardware Store should be protected in a similar way, and
should have specific protection against physical removal. The contents of
the DHS should be updated as quickly as possible to reflect the live
environment.

Storing older versions of hardware can be useful: if the organisation
encounters significant problems with new configurations and software, it is
possible to revert by cloning these older versions.

Remember, responsibility for maintaining the contents of the DSL and the DHS
is shared between Release Management and Configuration Management.

One of the key activities of Release Management is deciding on the correct
‘release type’. Firstly it defines the ‘release unit’, which is defined as
‘that set of Configuration Items within the infrastructure which is normally
released together’.

The general aim is to decide the most appropriate Release-unit level for
each software item or type of software. This can be set at system,
application suite, program, or module level. Different release units will
exist in different parts of the infrastructure. For example an organisation
may decide that a normal release unit for its order processing service
should always be at system level, and as such a change to a CI which forms
part of that system will result in a full release for the whole of that
system. The same organisation may decide that a more appropriate Release
unit for PC software should be at suite level, and so on.

Once the ‘release unit’ is defined, Release Management moves on to address
the question of release type. Release types are divided into three
categories: full release, delta release and package release.

A full release is where all components of the release unit are built,
tested, distributed and released together. For example, if the release unit
is at program level, then the whole program would have to be rebuilt.

If it’s at suite level then the whole suite, which might include many
applications, would have to be rebuilt. Consequently full releases are
expensive to build, distribute and install. However they do give confidence
that all the elements of a service work together successfully. They are most
appropriate for major changes, and are usually scheduled over longer periods
of time.

A delta release involves distributing only the components that have changed
since the last release. Consequently this is a less expensive option. Delta
releases are most appropriate for fixes and urgent or emergency changes, and
as such are the most frequent form of release.

To reduce the frequency of Delta and Full releases, and to provide longer
periods of stability, ‘Package Releases’ can be used. A ‘Package Release’
might consist of groups of delta or full releases, or a combination of the
two.

Defining Release Type involves deciding on a form of Release Identification.
It’s normal to use a numbering structure which applies to two or three
levels. For example a new Payroll System might be assigned a release Id of
V:1.0. An additional minor release which involves changes to some of its
applications would generate a release Id of V:1.1. An emergency fix to a
small element of a module within that system might have a release Id of
V:1.1.1. Remember there is no absolute limit to the levels used.

Definitions of Release type and Release units should be documented in a
Release Policy. This policy should also clarify roles and responsibilities,
and give information on Release frequency.

The policy content is usually determined by the Release Manager, in
conjunction with the Change Manager and the CAB.

A Release Policy might also contain:

• Guidance on the level in the IT infrastructure to be controlled

• Details on release identification and numbering conventions

• A definition of major and minor releases, plus a policy on issuing
emergency fixes

• Expected deliveries for each type of release

We mentioned earlier in the lesson that Release Management is responsible
for the detailed planning of releases. Amongst other things, release
planning involves:

• Gaining agreement on Release Content

• Producing a high level release schedule

• Planning resource requirements

Release planning is responsible for verifying that all of the hardware and
software in use is as standard, and has been derived from the necessary
Definitive Software Library and Definitive Hardware Store.

In addition the Release Planner develops a Release Quality Plan, to ensure
all aspects of the release are quality managed, and produces a back-out
plan.

Where a release is going to be particularly complex it may require a
specific planning phase. To facilitate this, the Release Plan is extended to
Rollout planning. This expands the Release plan produced thus far, and adds
details of the exact installation process developed and the agreed
implementation plan.

Roll out planning involves:

• Producing a detailed timetable of events

• Listing all the CIs to be installed and decommissioned

• Producing Release notes and communications to End Users

• Planning Communication

Roll out planning, together with Release Management, decides on the type of
rollout approach. This might be a ‘big bang’, phased or pilot approach.

A Big Bang approach involves all sites receiving all functionality
simultaneously. The benefit of this approach is that it offers consistency
of use across the organisation. However, achieving a simultaneous upgrade
can be problematic.

In a phased approach all sites could receive some functionality at the same
time, with more coming later. In a Pilot approach a single site receives all
functionality ahead of other sites. Note however that combinations are
possible, for example a ‘phased pilot’ approach.

Compliance with software licence agreements has become critical to
businesses. Ensuring these obligations are met is the joint responsibility
of Release and Configuration Management. For example, when moving software
to the DSL, it is important to check that what has been purchased has
arrived, that it has been virus checked, and that the licence agreement has
been checked.

Remember, penalties for breaching the laws on software theft are applicable
to any responsible officer of the company, including those at the highest
level.

There are many legal precedents for holders of software intellectual
property rights arriving unannounced at premises, and impounding any
equipment which they believe contains unlicensed copies of their software.

Benefits & Problems

The benefits of and potential difficulties with Release Management are
listed on Page 39 of the little ITIL book and in Section 9.4 of the Service
Support Manual.


Summary

In this third and final lesson on the ITIL control processes, we have been
examining Release Management.

We started the lesson by defining the goals of ITIL’s Release Management,
and why Release Management is necessary.

We saw how a release can be divided into Major, Minor and Emergency
releases, and discussed Release Management’s holistic approach to IT service
change, and how, as part of this approach, it produces detailed release or
rollout plans.

We examined the Release Management process, and the linkages to its critical
repositories, the Definitive Software Library and Definitive Hardware Store,
as well as the Configuration Management Database.

We looked in some detail at release types, release units and release
identification, and we concluded the lesson by identifying some of the
benefits of, and potential problems with, the Release Management process.
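
The release identification scheme and the difference between full and delta
releases can be illustrated with a short Python sketch. It is a sketch under
stated assumptions, not a prescribed ITIL mechanism: the function names are
invented for illustration, the numbering is fixed at three levels (the
lesson notes there is no absolute limit), and the rule that a major release
moves to the next whole number is an assumption, since the lesson only shows
V:1.0, V:1.1 and V:1.1.1.

```python
from typing import Set


def next_release_id(current: str, kind: str) -> str:
    """Return the release Id that follows `current` for the given release kind.

    Assumes Ids of the form 'V:major.minor[.emergency]'.
    """
    prefix, _, numbers = current.partition(":")
    parts = [int(n) for n in numbers.split(".")]
    parts += [0] * (3 - len(parts))  # pad to major, minor, emergency
    major, minor, emergency = parts[:3]
    if kind == "major":
        # Assumed: a major release supersedes earlier minor and emergency levels.
        return f"{prefix}:{major + 1}.0"
    if kind == "minor":
        # A minor release supersedes preceding emergency fixes.
        return f"{prefix}:{major}.{minor + 1}"
    if kind == "emergency":
        return f"{prefix}:{major}.{minor}.{emergency + 1}"
    raise ValueError(f"unknown release kind: {kind}")


def release_contents(release_unit: Set[str], changed: Set[str],
                     release_type: str) -> Set[str]:
    """Return the components distributed for a full or a delta release."""
    if release_type == "full":
        return set(release_unit)       # everything in the release unit
    if release_type == "delta":
        return release_unit & changed  # only what changed since the last release
    raise ValueError(f"unknown release type: {release_type}")


payroll = next_release_id("V:1.0", "minor")
fix = next_release_id(payroll, "emergency")
print(payroll, fix)

suite = {"editor", "spreadsheet", "mail"}  # hypothetical PC software suite
print(sorted(release_contents(suite, {"mail"}, "delta")))
```

A package release is not modelled separately here, because the lesson
describes it as a grouping of full and delta releases rather than a distinct
way of selecting components.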

Lesson 4a Availability Management

Lesson 4a

Availability Management

Objectives

The topic for this lesson is Availability Management, which is described in
Chapter 8 of the Service Delivery book.

Once you have completed this lesson you will be able to define Availability
Management and describe how it relates to other ITSM components.

You will be able to recognise the main elements of the Availability
lifecycle and understand the terms MTBF, MTTR and MTBSI.

You will appreciate the main responsibilities of the Availability Management
process and be able to recognise several techniques which are of use in this
area.

Introduction

Despite the fact that the IT Infrastructure is becoming ever more reliable –
and hence Availability levels are generally better than they have ever been
– Availability Management is nonetheless a critical support process for
Service Level Management.

Availability is now regarded as one of the most important issues for IT
service management because of the growing dependence of businesses on their
IT services.

Availability Management supports Service Level Management by actively
managing the availability of services. For example it assists the Service
Level Manager in negotiating and monitoring service levels.

The Service Delivery book states that:

The goal of the Availability Management process is to optimise the
capability of the IT Infrastructure, services and supporting organisation to
deliver a Cost effective and sustained level of Availability that enables
the business to satisfy its business objectives.

The critical words here are ‘cost effective’.

The business can have almost any availability it likes provided it is
prepared to pay for it. One only has to look at the expenditure on safety
critical systems and on general aeronautical systems to understand this.

For most commercial and organisational systems there is a limit to the
benefit in extra availability that the business can afford by using more and
more advanced techniques and equipment.

Business of course is interested in the availability of its services, such
as e-mail, personnel records and so on, and is not directly concerned about
the availability of any components that may be vital in making up that
service.

In general, the availability of a service is influenced by the complexity of
that service and the systems that it is based on, by the reliability of the
items in the infrastructure, by both corrective and preventive maintenance
procedures - and also by our incident, problem and change management
procedures.

It is important for all staff involved to understand that if a business
service is unavailable because of an IT problem there will be a loss of
business productivity.

This may also lead to a loss of revenue, customer dissatisfaction and extra
costs in having to pay staff overtime for the work they couldn’t do when the
system was unavailable.

Availability Management - Relationships and Definitions

We will now explore the relationships that exist between Availability and
the various elements of the support organisation, such as Service Level
Agreements, IT Services and their customers.

A customer will negotiate a Service Level Agreement with IT Services, and
within the SLA there will be statements about service availability.

These statements might say that we expect 99% availability from a service
measured over a one month period, or they may say we expect no more than one
hour’s lost service over a four weekly period.

They may say we expect no more than three breaks of service totalling one
hour over a monthly period.

The definition of availability and the way we phrase that will be subject to
local discussions. The current best practice view is to make this statement
as business focused as possible and to think in terms of unavailability
rather than availability.

The generic definition of availability is: “The ability of an IT service or
component to perform its required function at a stated instant or over a
stated period of time.” (SD Manual 8.2.3)
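
The sample SLA statements quoted in this section can be compared by
converting each one into hours of permitted downtime. The short Python
calculation below is only a worked illustration: it assumes the agreed
service hours are 24 hours a day, 7 days a week and that a month has 30
days, neither of which is stated in the text, and the function names are
invented for the example.

```python
def allowed_downtime_hours(target_percent: float, service_hours: float) -> float:
    """Hours of lost service permitted by a percentage availability target."""
    return service_hours * (100.0 - target_percent) / 100.0


def availability_percent(service_hours: float, downtime_hours: float) -> float:
    """Availability as a percentage of the agreed service hours."""
    return 100.0 * (service_hours - downtime_hours) / service_hours


MONTH_HOURS = 30 * 24      # assumed 30-day month, round-the-clock service
FOUR_WEEK_HOURS = 28 * 24  # four weeks, round-the-clock service

# "99% availability over a one month period" expressed as lost hours.
downtime_at_99 = allowed_downtime_hours(99.0, MONTH_HOURS)

# "No more than one hour's lost service over four weeks" as a percentage.
one_hour_target = availability_percent(FOUR_WEEK_HOURS, 1.0)

print(round(downtime_at_99, 2), round(one_hour_target, 2))
```

Under these assumptions the 99% target tolerates roughly seven hours of
lost service in the month, so the one-hour statement is considerably
stricter. Stating the target as lost hours is one way of thinking in terms
of unavailability rather than availability.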


Related terms, which are also defined in the same section of the Service
Delivery manual, are Reliability, Maintainability and Serviceability.

In Service Level Agreements and in clauses with suppliers through
underpinning contracts, Availability is often expressed as a percentage -
the percentage of the agreed service hours for which the component or
service is available - and that is often used as a measure of how good or
bad the availability is.

To say that we require 99% availability of the service over a given period
is a fairly common way of defining what is needed by the business.

So, customers negotiate the SLA availability clauses with the IT service
through service level management processes and then, as we will be seeing in
later lessons, service level management processes require underpinning
support.

There are broadly two types of underpinning support, one through operational
level agreements with internal suppliers, the other through underpinning
contracts with external providers.

In the case of internal support, such as application support, hardware
support and so on, we’ll expect to find statements in the OLA on
availability, reliability and maintainability of the components that this
group is responsible for.

When we are talking about underpinning contracts the word ‘serviceability’
is often used as a contractual term and that is seen as covering
availability, reliability and maintainability when applied to components
supported by external suppliers.

The word Serviceability, in ITIL, is reserved for use where support is
provided by external parties and will incorporate statements about
availability, maintainability and reliability of their managed components
and services. Again, measuring the way the third party suppliers are
achieving availability would be of value to the organisation and should be
part of the role of availability management.

Availability Lifecycle

It is useful to think of Availability as having a lifecycle.

So imagine that we have a timeline with time running from left to right.

Now for a particular component, let’s say that a failure occurs at time X1.
This will be recorded in ITIL as an Incident.

There will then be a period of time that it takes to repair the faulty
component – this is usually referred to as the Mean Time To Recover or MTTR.

Be very careful here as the R in this acronym can have a number of
alternative meanings. We have defined it as “Recover” – but it is also
commonly taken to mean “Respond”, “Repair” or “Restore”. Imagine, for
example, that the failure is a crashed hard disk.

There will be a period of time that it takes to “Respond” to the incident,
to get an engineer on site. Then there will be a further period during which
the disk is being repaired or more likely replaced. Typically, it will then
take some time to “Restore” the data to the point where normal business can
be resumed.

In this course we will be using the term “Recover” to encompass all of this
– and the Mean Time To Recover is the average length of time that all of
this takes to achieve.

Be aware though, that it may be useful to understand these other measures as
they are often captured by service management organisations to check on
various aspects of the availability management process.

Once normal service has been recovered there will then be a hopefully long
period of time before the component fails again at time X2.

The period of time between the fault being recovered and the next failure is
known as the Mean Time Between Failure or MTBF.

Hence it is easy to see that the sum of the MTTR and MTBF will give what is
called the Mean Time Between System Incidents or MTBSI.
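
The three lifecycle measures can be worked through on a small incident log.
The Python sketch below is illustrative only: the incident times are
invented, the function name is an assumption, and the averages are taken
over complete failure-to-failure cycles so that MTBSI comes out as exactly
MTTR plus MTBF (the final incident therefore only marks the end of the last
cycle). The availability figure applies the percentage definition from this
lesson and assumes the agreed service hours cover the whole timeline.

```python
from statistics import mean

# Hypothetical incident log: (failed_at, recovered_at), in hours on the timeline.
incidents = [(100, 104), (300, 302), (600, 606)]


def availability_measures(log):
    """Return (MTTR, MTBF, MTBSI) averaged over complete cycles in the log."""
    downtimes, uptimes, cycles = [], [], []
    # Pair each incident with the one that follows it.
    for (failed, recovered), (next_failed, _) in zip(log, log[1:]):
        downtimes.append(recovered - failed)    # time to recover
        uptimes.append(next_failed - recovered)  # recovered until next failure
        cycles.append(next_failed - failed)      # failure to next failure
    return mean(downtimes), mean(uptimes), mean(cycles)


mttr, mtbf, mtbsi = availability_measures(incidents)
availability = 100.0 * mtbf / mtbsi  # share of each cycle spent in service

print(mttr, mtbf, mtbsi, round(availability, 1))
```

With these invented figures the service is down for 3 hours and up for 247
hours in an average 250 hour cycle. Reducing the recovery time or
lengthening the time between failures both raise the availability figure.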


MTBF, MTTR and MTBSI

We can now consider the relationships that exist between each of these
three parameters and the terms Availability, Reliability and
Maintainability that we have already discussed.

It is obvious from the diagram that a high Mean Time Between System
Incidents implies high Reliability. If components don’t fail very often
then the services which are based on them will be reliable services. So
high MTBSI is obviously a good thing.

On the other hand, a low Mean Time To Recover is good news, since this
implies a high Maintainability. This can be achieved, not only by
technical means but by having good support procedures within the IT
service management team so that there are no delays between an incident
being detected and repair work starting.

As you might expect, a high Mean Time Between Failure is very desirable
and directly equates to a high Availability.

So, typically we can see that if we want to achieve higher availability,
then either increasing the Mean Time Between Failure or reducing the
Mean Time To Recover, or a combination of the two, can achieve this.

All of these measures, MTBF, MTTR and MTBSI, can be applied at both the
component and overall service level.

Typically, if we want to increase the overall availability either of a
service or of an assembly of components, then this can be done either by
increasing the reliability of each component or the resilience of the
assembly or by improving the maintainability and the procedural aspects.

If an e-mail service is dependent on two servers and each has an MTBF of
5000 hours, what will be the MTBF of the e-mail service?

Increasing the MTBSI and MTBF figures and reducing the MTTR will all
cost money. There will be a limit as to how much we can spend to achieve
high reliability and high resilience and there will be a limit to how
much we can spend to achieve instantaneous reporting and repair.

As we said at the start of this lesson, the business can have almost
whatever availability it wants, provided that it is prepared to pay for
it.

The Business View of Availability

All businesses rely on their IT services, but some services, or parts of
services, will be more important to the business than others.

For example, in an EPOS service, the critical requirement is that we are
able to take payments. Other functions such as automatic updating of
stock levels are important but not as vital as servicing the immediate
customers. Therefore it may be necessary to aim for higher availability
of the first part of the service than the second part.

ITIL refers to such business-critical functions as Vital Business
Functions or VBFs.

The concept of Vital Business Functions is widely used in IT Service
Continuity Management and Availability Management within ITIL and is a
way of highlighting the services for which the business must have almost
100% availability.

Understanding each Vital Business Function allows the Cost of
Unavailability of a service to be measured and reported. Such costs may
be incurred through revenue loss, or overtime payments and so on, as we
discussed earlier.

Cost of Unavailability is a more effective way of reporting than
percentage availability because it relates the true cost of the loss of
service to the business directly.

It is important to report on trends and to agree on the measurement
period. For example, “Service was available for more than 98% of the
agreed service hours during the last month” may be very useful when
we’re reporting against service levels in Service Level Agreements,
which are often expressed in the same way.

Trends are very important in the whole of service management. Service
improvement programmes, for example, set out to move things forward, and
that relies on having some baseline against which to measure.

So, for example, we might want to say that we’ve moved forward in terms
of the number of breaches of Availability Agreements from last year to
this, with the number decreasing from 10 to 5, say.

Section 8.7.7 of the Service Delivery Manual uses what it calls an IT
Availability Metrics Model (ITAMM) as a framework for deciding on the
sort of reporting that needs to be done. Because it covers such a wide
range, from details of component availability right through


to services, it is a basis for all reporting both internal and external.

It is beyond the scope of a Foundation course to understand much more
about the ITAMM. The fact that it exists and is a basis for important
reporting is what we need to know.

Responsibilities of Availability Management

Page 64 of the Little ITIL Book gives a useful listing of the
responsibilities of the Availability Management process.

The first of these, concerning the optimisation of availability, is
self-evident and much of this lesson concerns that particular point.

The second point is about determining availability requirements in
business terms.

It is very important that we are able to work with the service level
manager and the customer so that their requirements for availability can
be expressed in terms with which they feel comfortable.

They are often much more comfortable discussing lost business, or
business downtime caused by loss of IT services, than they are with
percentages and fractions.

Hence we must be able to gather these requirements in the relevant terms
and translate them into meaningful technical terms for discussion with
suppliers of underpinning services, both internal and external.

Conversely, if we are producing technical information about
availability, MTBFs, MTBSIs and so on, it is our responsibility to help
the service level manager to turn these figures back into meaningful
business terms for the customer.

The third point, Predicting and Designing for expected levels of
availability and security, implies that availability management staff
are involved in the systems development process right from the very
beginning.

It is an ITIL recommendation that Availability Management staff should
be involved when the business case is being created for a new or
extended service and that they remain involved all the way through the
analysis and design process.

The aim is to ensure that the needs of availability management,
including maintainability and reliability, are built in along with
security elements. This implies availability management staff having
some familiarity with system development processes.

The Availability Plan should be a long-term plan for the proactive
improvement of IT service availability within the imposed cost
constraints.

A good plan should have goals, objectives and deliverables and should
look at all the issues of people, processes, tools and techniques as
well as looking at the technology.

In many ways the Availability Plan is analogous to the Capacity Plan and
should take account of current levels of availability against the
service level requirements, trends in terms of availability, new
technological options and knowledge of the way the business is
developing.

There is no absolute guideline on how far ahead the plan should look,
but following the capacity management analogy, it would be reasonable to
think in terms of one year at a time with a review at least every three
months.

The fifth item on the list of responsibilities is all about the
collection, analysis and maintenance of availability data. Monitoring
the various availability parameters can generate a large amount of data
and because of this it is not unusual to find an Availability Management
Database being created. This may be either as a separate entity or by
adding extra information to the Configuration Management Database.

Item six is arguably one of the most important areas and defines the
role of the availability manager.

This is all about monitoring service availability against the Service
Level Agreements, for the benefit of the service level manager.

The performance of internal and external suppliers against the
serviceability requirements in any underpinning contracts and targets
defined in the Operational Level Agreements must also be monitored as
part of this process.

The final point refers to the need for the Availability Management
process to be continually looking for improvements on a proactive basis.
In other words, not waiting for targets to be threatened before taking
action, but constantly reviewing current status and looking for
cost-effective ways of improving availability.

As with many of the other ITIL processes this proactive work is critical
but may be the last part of the process to be implemented.


There is an additional responsibility on the process owner, and that is
to monitor the effectiveness and efficiency of the availability
management processes.

This can often be done by looking at how many SLAs have been breached
because of availability issues and looking at how many components have
got measurement in place.

The Availability Management Process

Section 8.3 of the Service Delivery manual describes the Availability
Management process in some detail.

The inputs to the process include:

The Availability Requirements of the business, which are critical.

A business impact assessment, so that the Vital Business Functions and
the consequences of loss of availability are fully understood. This will
help in determining priorities when setting up the Availability
Management processes for the first time.

Part of the service level negotiation process will be to determine the
availability, reliability and maintainability requirements from the
business. Some of these will be for existing services while others will
be for services that are in conception.

Incident and Problem data will also need to be examined. Part of the
proactive work will be to investigate incidents and problems and to see
which of those are caused by unavailable equipment and what the impact
of these incidents or problems was on availability measures.

Configuration data will be very important since that will show the
relationships between configuration items and the chain of configuration
items that makes up a typical service.

This will enable us to look for sensible places where we might decide to
replace equipment by higher quality equipment with a higher reliability.

It will also show other areas where we might decide to mitigate a
possible single point of failure, or SPOF in ITIL terms, by looking for
alternative routing in a network or perhaps duplicating discs or
processors.

Remembering that one of the jobs of availability management is to ensure
we achieve service levels in the area of availability, we’ll be
constantly looking at records of service level achievement or service
level breaches or potential breaches.

Now let’s look at the key outputs from the process, which are:

• Availability and Recovery Design criteria for each new or enhanced IT
  Service. These are intended to help the development teams decide on
  how to achieve high availability.

• Details of the Availability techniques that will be deployed to
  provide additional Infrastructure resilience to prevent or minimise
  the impact of component failure to the IT Service.

• Agreed targets of Availability, reliability and maintainability for
  the IT Infrastructure components that underpin the IT Services.

• Reporting of Availability, reliability and maintainability to reflect
  the business, User and IT support organisation perspectives.

• The monitoring requirements for IT components to ensure that
  deviations in Availability, reliability and maintainability are
  detected and reported.

• And finally, an Availability Plan for the proactive improvement of the
  IT Infrastructure.

Security

It can be argued that the most valuable assets of IT services are the
data and the ability to process that data.

This is why security is such an important part of IT service management.

The basic logic behind managing these assets is:

• Make sure that access is denied to unauthorised people. In other
  words, maintain Confidentiality.

• Make sure that the assets are trustworthy. That is, maintain
  Integrity.

• And, make sure that assets are available to authorised people when
  they need them. Or, maintain Availability.

This may lead to some conflict and possible trade-offs. For example,
high availability is not


necessarily good if it compromises confidentiality or integrity.

Within ITIL, availability aspects are the responsibility of availability
management while the confidentiality and integrity issues are shared
responsibilities with security management.

Within an organisation, it may well be that the whole responsibility for
CIA (Confidentiality, Integrity and Availability) is devolved to the
availability management team. It is very important that such
responsibilities are clarified.

Techniques for Availability Management

One of the most basic techniques used in Availability Management is the
calculation of availability in terms of a percentage.

The basic calculation is straightforward. The availability of a service
or of an individual component or of a grouping of components is given by
the agreed service time (AST) minus the downtime (DT), divided by the
agreed service time, all times 100 to obtain a percentage value:

    Availability (%) = ((AST - DT) / AST) x 100

Note that component availability is very often expressed as a decimal
value, always less than one, rather than as a percentage.

In order to take account of the fact that one user losing access to the
system is significantly less serious than 100 users all losing access, a
weighted calculation can sometimes be more meaningful.

The way this is calculated is to replace the variables AST and DT with
End User Processing Time (EUPT) and End User Down Time.

End User Processing Time is defined as the Agreed Service Time
multiplied by the total number of users (Nt).

End User Down Time is found by multiplying the Down Time by the total
number of users affected.

So, if a system is meant to be available for 40 hours in a week and
there are 10 users of the system, EUPT will be 400.

If just one of the users is affected for four hours but the other 9
users are not affected at all over that period of measurement, then End
User Down Time would be equal to four hours downtime x 1, giving a value
of 4.

Therefore the overall availability would be 400 minus 4, divided by 400,
all times 100, giving a weighted availability of 99 percent.

Contrast this with the value given by the simpler basic calculation,
which would be only 90%.

It’s important to note that whichever way of calculating availability is
chosen has to be agreed with the users before it can be used as the
mechanism that we measure and report on.

Percentage availability may not always be the most useful measure from a
business point of view.

Absolute figures of up-time and down-time over an agreed period might be
more appropriate and may be more acceptable for the business.

So for example we could say that there were four hours of downtime out
of 400 potential service hours in the last week, and that may be a more
useful measure than turning that into a percentage value.

This is all about agreement and trust between customer and supplier and
whichever figures are chosen should be those most meaningful to the
business.

It is very important to understand and be consistent in the use of
reporting periods.

For example, an availability of 99% for a service to be achieved on each
and every day is much more demanding than the same percentage averaged
over a year-long reporting period.

It is possible to achieve a 99% availability whilst losing service for
perhaps two whole days in the year. In order to achieve 99% on a daily
basis, the allowable downtime on any one day would have to be reduced
down to just a few minutes.

Great care must be taken over the definition of what agreed service time
is.

For example, does it include downtime for maintenance? Is that already
factored in?

In most cases we would not want to be penalised for agreed downtime for
maintenance or upgrades.

In 24/7 systems however, where the requirement is for very high
availability, the figures often do include and are meant to include any
time for maintenance, which will need to be reduced to an absolute
minimum.

The pattern of downtime may also be critical and will need to be
understood. For example, depending on business circumstances, 10 losses


of service each of 10 minutes duration may be more damaging than a
single loss of service of 100 minutes for the same period of time.

The reporting requirement to cover such differences will need to be
closely examined and agreed with the business.

In reporting and discussing availability with end users and customers,
the main areas of interest will nearly always be based around services
and not around components.

However, internal reporting for service improvement purposes and for
supplier management mechanisms will often require reporting at the
component level.

Calculating the Availability of Multiple CIs

A very common requirement is to be able to understand and calculate how
the availability of an assembly of configuration items is governed by
the individual component availability.

An assembly is a grouping of more than one configuration item.

The formula for calculating End-to-End availability for items arranged
in series is fairly simple.

The overall availability AT is equal to the product of the availability
of each of the individual components.

So if we have two components, each of which is capable of delivering 90%
availability, the End-to-End availability of the assembly will be 0.9
times 0.9, or 81%. In other words, significantly less than each of the
components making up the assembly.

It is easy to see from this formula that the more items that are put in
series, the lower will be the End-to-End availability figure.

Calculating End-to-End availability for items arranged in parallel is a
little more complicated. The assembly is unavailable only when every
component is unavailable at the same time, so the overall availability
is one minus the product of the individual unavailabilities.

So for the same two components now arranged in parallel, the resulting
End-to-End availability will be 1 minus (0.1 times 0.1), or 99%.

Again it is easy to see that, unlike components arranged in series, the
more CIs that are put in parallel then the higher will be the overall
availability, but such duplication of components, or duplexing, will
necessarily increase costs.

There may also be some technical limitations in terms of how easy it is
to switch from one component to another when one fails, but the general
principle is one of significant improvement to assembly availability
achieved in this way.

One difficulty in both cases is finding good values for the component
availabilities A1 and A2.

Assuming they are hardware components, these could be derived from a
combination of manufacturers’ engineering specifications (NOT from their
sales literature), other similar installations and your own experience
gained during testing or development.

Using a combination of those three sources will tend to give realistic
values for the availability of individual components.

Once an initial base of figures has been established then monitoring of
availability over a period of time using monitoring tools and records
from the service desk of incidents can allow an iterative improvement in
the component availability figures.

Finally, there is a range of techniques designed to aid understanding of
why availability problems are occurring in particular parts of the
infrastructure and to find corrective ways of working.

The first of these techniques that we will look at is Component Failure
Impact Analysis (CFIA).

This is represented normally in a matrix showing configuration items
against the services supported.

For example, here we can see that service ‘B’ is dependent on all four
of the CIs 1 to 4 being available, whilst service ‘D’ only requires
items 3 and 4.

Looking another way, we can see that item 3 is essential to all 4
services; none of them can function without it.

It is important to realise that the CFIA matrix can be used by either
reading down the columns or across the rows to give us different
information.

If ‘B’ is a service which has vital business functions within it, then
it becomes critical to understand, at a more detailed level, how those
VBFs are dependent on the components.

As a first pass analysis of dependency and understanding of where single
points of failure could be critical, CFIA is very useful.
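
A CFIA matrix of the kind described above can be sketched in a few lines of code. The dependencies of services ‘B’ and ‘D’ and the role of item 3 follow the text; the entries for services ‘A’ and ‘C’ are hypothetical, since the original diagram is not reproduced here, and the function names are illustrative only.

```python
# CFIA matrix as a mapping: configuration item -> services that need it.
# Entries for services "A" and "C" are assumptions for illustration.
cfia = {
    "CI1": {"A", "B"},
    "CI2": {"B", "C"},
    "CI3": {"A", "B", "C", "D"},
    "CI4": {"B", "D"},
}


def services_affected_by(ci):
    """Read the matrix one way: which services fail if this CI fails?"""
    return sorted(cfia[ci])


def cis_required_by(service):
    """Read the matrix the other way: which CIs does this service need?"""
    return sorted(ci for ci, services in cfia.items() if service in services)


def most_critical_ci():
    """The CI whose failure affects the largest number of services."""
    return max(cfia, key=lambda ci: len(cfia[ci]))


print(cis_required_by("B"))         # every CI from CI1 to CI4
print(cis_required_by("D"))         # only CI3 and CI4
print(services_affected_by("CI3"))  # all four services
print(most_critical_ci())
```

Reading the same data in both directions is what makes the matrix useful as a first pass at spotting single points of failure.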


So in the example shown, CI3 is a very good candidate for attention,
such as replacement with a more reliable item or duplication by the
addition of a parallel assembly as a replacement for the single
component CI3.

More sophisticated information can be put in the CFIA such as
information that for service ‘B’ to run, either component 3 or component
4 needs to be there but not necessarily both.

This may require some extension to the notation, which is often home
grown or company-specific and which is beyond the scope of this course.

Another useful technique is called ‘Fault Tree Analysis’ or FTA.

This is a diagrammatic technique drawn initially from the world of
engineering, which identifies the chain of events leading to service
failure.

It is part of a family of techniques generally referred to as Failure
Mode & Effect Analysis or FMEA and this is covered in more detail in the
lesson on Problem Management.

Risk analysis can be done in a variety of ways. The way that’s favoured
in ITIL, because it originally comes from the same development source,
is known as CRAMM, the CCTA Risk Analysis and Management Method.

The CCTA, or Central Computer and Telecommunications Agency, was the
original name for the OGC or Office of Government Commerce. The name was
changed in 2001.

We’ll talk a bit more about CRAMM in the IT Service Continuity
Management lesson.

One of the key requirements of availability management is to be able to
achieve an understanding of why a particular lack of availability is
occurring and what to do about it.

There are a couple of techniques that can help us here and they are
called System Outage Analysis, SOA, and Technical Observation Posts or
T.O.P.

SOA involves a detailed analysis of service interruptions. It is really
a post-mortem about some of the more major incidents that have occurred
in the infrastructure and trying to find some common underlying theme or
cause for the availability losses.

It requires significant inter-disciplinary work between different teams
to make this work and tends to be managed as a small project with a
particular budget and reporting period.

Setting up a Technical Observation Post or T.O.P. is an expensive
process because it involves bringing together a team of people to look
at a service at a vulnerable period of its life.

If, for example, we know that on a monthly basis there are availability
problems while assembling data for end-of-month financial work, then a
Technical Observation Post might be set up to look at this particular
process.

In effect the T.O.P. would be watching the process go wrong in order to
more accurately understand what’s happening. This is particularly useful
in cases where it proves difficult in test conditions to simulate the
fault that is causing the loss of availability.

It requires an inter-disciplinary team and an acceptance from the
business that the only way of finding and resolving the issue is by
allowing some availability losses to occur.

It is worth noting that in addition to the techniques that we have
discussed in this section, the Availability Management process will
support and work closely with proactive problem management. So many of
the same techniques used in Problem Management may also help with
identifying the underlying reasons for lost availability.

Benefits and Problems of Availability Management

The benefits of and potential difficulties with Availability Management
are listed on Page 68 of the Little ITIL Book and in Section 8.3.5 of
the Service Delivery Manual.

They are also summarised here for your convenience.

Summary

In this lesson we have been examining the Availability Management
process.

Once you have completed this lesson you will be able to define
Availability Management and describe how it relates to other ITSM
components.

You will be able to recognise the main elements of the Availability
lifecycle and understand the terms MTBF, MTTR and MTBSI.

You will appreciate the main responsibilities of the Availability
Management process and be able to recognise several techniques which are
of use in this area.
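
As a recap of the availability calculations covered in this lesson, the sketch below works through the basic percentage, the user-weighted percentage, and the series and parallel End-to-End figures using the numbers from the lesson. The function and parameter names are illustrative only, and the parallel formula is the standard one-minus-product-of-unavailabilities form, which is assumed here because it reproduces the 99% result quoted in the lesson.

```python
def basic_availability(ast, dt):
    """Percentage availability from agreed service time and downtime."""
    return (ast - dt) / ast * 100


def weighted_availability(ast, total_users, dt, users_affected):
    """Percentage availability weighted by the number of users affected."""
    eupt = ast * total_users    # End User Processing Time
    eudt = dt * users_affected  # End User Down Time
    return (eupt - eudt) / eupt * 100


def series_availability(components):
    """End-to-End availability of components in series (decimal values)."""
    total = 1.0
    for availability in components:
        total *= availability
    return total


def parallel_availability(components):
    """End-to-End availability of components in parallel (decimal values).

    Assumed formula: the assembly is down only if every component is down.
    """
    all_down = 1.0
    for availability in components:
        all_down *= (1.0 - availability)
    return 1.0 - all_down


# 40 service hours in the week, 10 users, one user down for 4 hours.
print(round(basic_availability(40, 4), 2))            # 90.0
print(round(weighted_availability(40, 10, 4, 1), 2))  # 99.0

# Two components, each delivering 90% availability.
print(round(series_availability([0.9, 0.9]), 2))      # 0.81
print(round(parallel_availability([0.9, 0.9]), 2))    # 0.99
```

The same four hours of downtime reads as 90% or 99% depending on the method chosen, which is why the calculation method has to be agreed with the users before it is used for reporting.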

Lesson 4b Capacity Management

Lesson 4B
Capacity Management

Objectives

In this lesson we will be examining Capacity Management, which is
covered in Chapter 6 of the Service Delivery book in the IT
Infrastructure Library.

Once you have completed this lesson you will be able to:

• Define Capacity Management, and its three sub-processes of Business,
  Service and Resource Capacity Management

• Identify Capacity Management’s ongoing, ad hoc and regular activities

• Describe the contents of the Capacity Database and the Capacity Plan

What is Capacity Management?

In order that Service Level Agreements are met, it is critical that
sufficient capacity is available at all times to meet the agreed
business requirements.

Capacity Management ensures that IT processing and storage capacity
provision matches the evolving demands of the business in a
cost-effective and timely manner. Of all the ITIL processes this can be
regarded as one of the most proactive.

ITIL defines Capacity Management’s goal as:

‘To understand the future business requirements (the required service
delivery), the organisation’s operation (the current service delivery),
the IT infrastructure (the means of service delivery), and ensure that
all current and future capacity and performance aspects of the business
requirements are provided cost effectively.’

The Capacity Management process incorporates Performance Management,
Capacity Planning, and monitoring and tuning activities. In a large
organisation there may be many people working in a Capacity Management
team under the leadership of a specialist. In smaller organisations it
might be the role of a single individual who is supported by technical
specialists from Networking, desktop and so on.

The Capacity Manager role requires excellent technical and business
capabilities. The day-to-day activities include dealing with technical
specialists and service level managers. It’s not usual for the Capacity
Manager to communicate with customers, or to be responsible for
procurement of new equipment. However, Capacity Management will have a
significant input on purchasing decisions.

Capacity Management - a balancing act

The Capacity Management Process can be regarded as something of a
balancing act. The organisation must provide enough capacity to meet
justified business demands, balanced against the costs that the
organisation can afford to pay.

There are two ‘laws’ associated with Capacity Management, which offer an
insight into the demands placed on this process. The first is ‘Moore’s
Law’, which suggests that processing capacity doubles every 12 to 18
months.

The second is a variation on ‘Parkinson’s Law’, which states that data
expands to fit the space available for storage. This highlights a second
‘capacity’ problem, the one of supply and demand. As greater capacity
becomes available users will make use of it.

There is continual pressure from the business and customers to increase
capacity, but in doing so there are costs incurred to the business.
Ultimately, a decision has to be made over whether the cost of capacity
provision provides enough business benefit.

However, Capacity Management must justify the cost of any capacity
increases. Broadly speaking the objective is to provide the:

• Right Capacity, enough but not too much
• At the right cost
• And critically, at the right time

In theory, if Capacity Management processes are running well, providing
the right level of capacity at the right time, then they should be
invisible to the business, and to most aspects of Service Level
Management.

In any organisation, there can be a huge number of capacity elements to
be managed, which could impact on business.

Those shown in the question represent just a few of the IT components
which Capacity Management must address.

Interestingly, people are not usually thought of in capacity terms,
except where a shortage of people leads to other capacity problems. For


example, if we don’t have enough service desk Finally, Resource Capacity Management
staff to fulfill commitments made in Service concentrates on the underpinning technology
Level Agreements. resources that ‘enable’ business services. It also
ensures that these resources, or Configuration
As we mentioned earlier in this lesson, items, are not over used. This sub process is
providing capacity to the business at the right also responsible for monitoring future
time is critical. If capacity upgrades are too late development and capacity of technical
then the infrastructure could fail. Failures might components, and reporting these findings back
already be occurring, for example, through to the business, so that they can be integrated
incidents and complaints reported to the service into future plans.
desk. Or internal monitoring tools might
indicate that we are operating close to capacity. The Capacity Management process has a
number of ongoing, iterative activities. These
Buying in extra capacity at short notice leaves activities include: monitoring, analysis, tuning
little negotiating power with external suppliers, and implementation, and are carried out in
and as such, is likely to be very expensive. Resource Capacity Management and Service
Conversely, upgrading the infrastructure to Capacity Management. They are not normally
increase capacity, to then find it’s under used used in Business Capacity Management, except
could in itself lead to financial problems. during business reporting. For example, to
show, through analysis of data gathered
Capacity Management is also involved in the through these activities, that transaction
reduction of capacity or as it is sometimes responses are slowing down.
known, ‘managing shrinkage’. In any organisation the capacity of certain components is being reduced whilst the capacity of others is being increased. An example of this might be where a mainframe-based environment is gradually being replaced by a distributed service. The capacity requirements on the mainframe will be falling while the capacity requirements on the servers will be increasing rapidly.

Capacity Management Structure

Capacity Management consists of three inter-related sub-processes, each working at different levels in the organisational structure.

The three sub-processes are Business Capacity Management (BCM), Service Capacity Management (SCM) and Resource Capacity Management (RCM).

Business Capacity Management (BCM) focuses on the future services required by the business and tries to predict future capacity. This process is responsible for the production of a Capacity Plan, which is intended to forecast the future requirements for resource to support IT Services that underpin the business activities. To work effectively, BCM requires an insight into the business as a whole, and should be able to gather medium term plans and predictions about growth or shrinkage.

Service Capacity Management (SCM) is concerned with the services currently in place to support the business. It tries to ensure SLAs aren’t breached because of capacity problems, and tries to improve scarce resource utilisation through the use of Demand Management.

The monitoring activity should include the monitoring of thresholds, and baselines or profiles of the normal operating levels. Thresholds and baselines are set from the analysis of previously recorded data; they are the ‘yardstick’ by which Capacity Management can measure utilisation of IT infrastructure configuration items. All thresholds should be set below the level at which a resource is over-utilised, or below the targets in an SLA. For example, a threshold might specify that the usage on any individual CPU does not exceed 80% for a sustained period of one hour. If these thresholds are exceeded, alarms should be raised and exception reports produced.

In addition to exception reports, monitoring will also produce trend reports on a daily, weekly or monthly basis. Trend reports are intended to help predict future threshold breaches.

Monitoring leads on to the analysis activity, where the monitoring data is analysed to try to identify problems, and what type of problems they are. Analysis then leads on to reporting, and then on to tuning, where the problems are addressed, and the technical parameters of the system are fine-tuned to improve efficiency. Once a tuning decision has been made it is implemented through the change management process. Finally the activity returns to monitoring, and the iteration begins again.

Note that tuning is an optional activity. If no problems are identified in analysis, then tuning will be unnecessary. Tuning is an expensive activity, as it involves a high level of skill.
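
The CPU threshold described in this lesson (usage not exceeding 80% for a sustained period of one hour) could be checked along the following lines. This is a minimal sketch rather than part of the ITIL guidance: the function name `sustained_breach`, the five-minute sampling interval and the sample readings are all assumptions made for illustration.

```python
from typing import List, Optional


def sustained_breach(samples: List[float],
                     threshold: float = 80.0,
                     window: int = 12) -> Optional[int]:
    """Return the index of the sample that completes a run of `window`
    consecutive readings above `threshold`, or None if there is no such run."""
    run = 0
    for i, value in enumerate(samples):
        # Extend the current run while above the threshold, otherwise reset it.
        run = run + 1 if value > threshold else 0
        if run >= window:
            return i
    return None


# Assumption: one reading every five minutes, so 12 readings cover one hour.
readings = [70.0] * 3 + [85.0] * 12 + [60.0] * 2

breach_at = sustained_breach(readings)
if breach_at is not None:
    # In practice this is where an alarm would be raised and an
    # exception report produced.
    print(f"Exception report: CPU above 80% for one hour (sample {breach_at})")
```

A short spike does not trigger the report, because the run counter is reset as soon as a reading falls back to or below the threshold.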


Tuning can improve service delivery without incurring costs associated with equipment purchase. However, using skilled resources will incur costs, particularly if they are sourced from outside the business.

Tuning at service level can ensure that services don’t clash at times of peak demand. Any excess demand can be controlled by Demand Management, an activity that we will look at later in this lesson, or by sharing capacity, in a multi-server environment, across several servers.

Importantly, tuning should be carried out initially in a test environment. Only when we are confident that the change will be a benefit to the business should it be implemented through the conventional change management processes.

Activities in Capacity Management – What does the Capacity Manager do?

In the next few pages we will look at all of the capacity management activities in more detail, and how they relate to each of the Capacity Management sub-processes of Business Capacity Management, Service Capacity Management and Resource Capacity Management.

Remember Business Capacity Management is concerned with future business requirements for IT services, and with their planning and timely implementation.

Service Capacity Management is responsible for ensuring that the performance of all services detailed in SLRs and SLA targets is monitored, measured, recorded, analysed and reported.

Resource Capacity Management monitors and measures the individual components in the IT infrastructure.

The Capacity Management activities can be subdivided into three groups based on their frequency, and these are:

Ongoing, the day-to-day activities; Ad hoc, carried out as a result of a particular need; and Regular, which are carried out at fixed intervals.

Amongst the ongoing iterative activities are those of Monitoring, Analysis, Tune and Implement, which we looked at earlier in the lesson.

Remember this group of activities is mainly carried out at the Service and Resource sub-process level. Also note that these activities are used in Business Capacity Management’s reporting activity.

Another on-going Capacity Management activity is Demand Management. The main objective of Demand Management is to influence the demand for computing resource and the use of that resource.

This activity can be carried out as a short-term requirement because there is insufficient current Capacity to support the work being run, or, as a deliberate policy of IT management, to limit the required IT capacity in the long-term.

Short-term Demand Management might be needed if there is a partial failure of a critical resource in the IT Infrastructure. Service provision might have to be modified until a replacement or fix is found.

Long-term Demand Management might be used when an expensive upgrade to the IT infrastructure can’t be cost justified. The aim in this case is to influence patterns of use, by using mechanisms such as physical and financial constraints.

Physical constraints might involve restricting the number of concurrent users to a specific resource, a network router for example.

Financial constraints might involve the use of differential charging. An example of this might involve charging customers a premium to use network bandwidth during peak hours of demand.

Demand Management must be carried out sensitively, without causing damage to the business, customers, or the reputation of the IT organisation. It is essential that customers are kept informed of all the actions being taken.

Another ‘on-going’ Capacity Management activity is providing data to the Capacity Management Database or CDB. As you can see in the diagram, all of the other on-going and ad hoc Capacity Management activities provide information to the CDB. The CDB provides valuable information on who has used which resource and when. This data can be extremely useful for other ITIL processes, particularly IT Services Financial Management.

The CDB is the cornerstone of a successful Capacity Management process. Data in the CDB is stored and used by all the sub-processes of Capacity Management, because it is the repository that holds a number of different types of data including business, service, technical, financial and utilisation data.
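
The differential charging mentioned under Demand Management (a premium for network bandwidth used during peak hours) can be illustrated with a small calculation. The sketch below is purely illustrative: the function name `bandwidth_charge`, the base rate, the 50% premium and the 09:00 to 17:00 peak period are assumptions, not figures from the ITIL guidance.

```python
def bandwidth_charge(gb_used: float,
                     hour: int,
                     base_rate: float = 1.0,
                     peak_premium: float = 0.5,
                     peak_hours: range = range(9, 17)) -> float:
    """Charge for bandwidth used in a given hour of the day (0-23).

    Usage during `peak_hours` is charged at the base rate plus a premium,
    which gives customers a financial reason to move work off-peak.
    """
    if hour in peak_hours:
        rate = base_rate * (1 + peak_premium)
    else:
        rate = base_rate
    return round(gb_used * rate, 2)


print(bandwidth_charge(10, hour=11))  # peak-hours usage
print(bandwidth_charge(10, hour=20))  # off-peak usage
```

The same 10 GB costs more at 11:00 than at 20:00, which is the pattern of use that long-term Demand Management is trying to influence.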


However the CDB is unlikely to be a single database, and probably exists in several physical locations. We will look at the make-up of the CDB later in this lesson.

Ad hoc activities

Modelling is an example of an ad hoc activity, which is used in all Capacity sub-processes. Modelling tries to predict the behaviour of components and services under a given volume of work, particularly at peak times, and tries to understand the way in which current services and resources are used, and the impact of that usage on the IT infrastructure. It attempts to predict the future from our knowledge of the past. In order to do this we establish a ‘baseline’ model.

The baseline model reflects accurately the performance that is being achieved. Once a baseline is created, predictive modelling can be done.

We can ask the ‘what if?’ questions about planned changes to the IT infrastructure. If the baseline model is accurate then the results of the predicted changes should be accurate.

The major modelling types used by Capacity Management are:

• Trend Analysis
• Analytical Modelling
• Discrete Simulation
• Benchmarking

These modelling techniques vary in complexity and consequently cost, with Trend Analysis at the top being the simplest and cheapest, and Benchmarking at the bottom being the most complex and expensive. Let’s look briefly at each of these modelling types.

The Trend Analysis technique looks at various data over a period of time and attempts to draw a smooth curve through these figures, extrapolating the graph data forward into the future, as a way of predicting future trends.

Analytical Modelling uses mathematics to represent computer system behaviour. Typically a model is built using a software package, which can recreate a virtual version of a computer system. When the software is executed, ‘queuing theory’ is used to calculate response times, and if virtual response times are sufficiently close to those recorded in the ‘real life’ IT infrastructure, the model can be regarded as accurate.

Although Analytical Modelling requires less time and effort than other modelling types, typically the end results are less accurate.

Simulation modelling involves the modelling of discrete events, in other words what actually happens millisecond by millisecond, as a transaction passes from the local PC through the local area network, to the server and so on. This type of modelling can be very accurate in predicting the effect of changes, but it is time consuming, and therefore costly, as it can involve numbers of staff in producing physical event simulations.

However, Simulation Modelling can be cost justified in organisations with very large systems, where the cost and associated business implications are critical.

Finally, Benchmarking involves physically building a replica of part of the IT infrastructure, measuring such things as its response to a reduced workload, and extrapolating these results to see how it would perform under the ‘real’ workload. Because Benchmarking involves the purchase of equipment, building software and simulating significant workloads, this is the most expensive modelling option; however, it does give the most accurate predictive figures.

Another ad hoc Capacity Management activity is Application Sizing. The primary objective of Application Sizing is to estimate the resource requirements to support a modified or new application, and to ensure that it meets its required service levels.

Application Sizing has a finite lifespan. It is initiated at the beginning of a new application, or when there is likely to be a major change to an existing one. Application Sizing is complete when the completed application is accepted into the operational environment.

This activity is performed together with colleagues in system and service development, to ensure that we are fully aware of the likely impact of services being developed, designed or purchased, before they are implemented. This provides Capacity Management with important data on future resource requirements, and this can be integrated into the Capacity Plan, as well as providing valuable information for purchasing and the development team. How we make programming, database design and architecture design more resource efficient is also covered in the ‘Best Practice’ guidance.

Finally, a ‘regular’ Capacity Management activity is the production of a Capacity Plan, which is typically created annually. Information
gained from the activities of monitoring, demand management, modelling and application sizing will contribute to the production of a Capacity Plan. We will be looking at the Capacity Plan in more detail later in this lesson.

Inputs and Outputs of the Capacity Management Process

To fully appreciate the scope of Capacity Management, we will spend the next few minutes looking at the major inputs and outputs to the process, and how these relate to the sub-processes of Business, Service and Resource Capacity Management.

Inputs to the BCM sub-process include the external suppliers of new technology, existing service levels and current SLAs, along with proposed future services and related SLRs. Other important inputs to BCM include the Business Plans, and any strategic plans together with IS and ICT plans. Finally BCM requires the Capacity Plan as an input, if one exists.

The important inputs to the Service Capacity Management sub-process are: the service levels and SLAs; current information from monitoring tools related to systems, networks and services; the service review results, including any issues raised; incidents and problems related to capacity; and any SLA breaches.

RCM’s key inputs include incidents or problems related to a particular component, and monitoring information related to component utilisation. It is considered important to keep utilisation below certain industry standard levels for a component type.

Financial Plans and Budgets are a major input to all 3 sub-processes.

Outputs from the sub-processes include the Capacity Database, and baseline and threshold information, which we looked at earlier in this lesson. Capacity reports will be produced by all three sub-processes, including Trend, Ad hoc and Exception reports.

Other outputs include recommendations for SLAs and SLRs, as Capacity Management activity will turn initial SLRs into achievable and cost effective service level quality clauses. Charging and costing recommendations are also produced.

SCM and RCM will be suggesting ‘proactive changes’ and ‘Service Improvements’, to improve levels of capacity, or reduce costs – preferably both!

Carrying out ‘Effectiveness Reviews’ and creating ‘Audit Reports’ form a basis for checking that business benefits are being achieved, and that the process users are following the ‘rules’.

Contents of the Capacity Management Database and the Capacity Plan

Although the Capacity Management Database is represented in the ITIL guidance as a single entity, it is unlikely to exist in this form in many organisations. The main reason for this is that much of the data held in a CDB is common to that in a fully integrated Configuration Management Database; therefore, there is an argument for making the CDB part of a ‘Super’ integrated CMDB.

Software tools used by Capacity Management may have partial CDB functionality designed into them. If this information is accessible by other software, then a ‘virtual’ CDB can easily be created.

Remember the data contributors to the CDB are the key to its success. Input from the business includes the ‘business strategy’ and the business plan.

Service Management will provide information about SLAs and a full definition of the quality processes in place.

Data about manufacturers’ specifications for existing and new technology will be provided by the technical teams.

And finally, the IT Financial Management team will provide fiscal data. Additional financial information will be provided from the CMDB, in its role as a ‘super’ asset register.

The Capacity Plan

The Capacity Plan is a major output of the Capacity Management process. It has a standard structure and includes:

• Assumptions - about levels of growth.
• A Management Summary
• Business Scenarios
• A Summary of Existing Services, problems or issues with current services and current levels of utilisation
• A Resource Summary – which will show what has happened to particular components over the last year and since the last Capacity Plan
• The Capacity Plan will also contain suggestions for cost effective service improvements.
• A Cost Model will illustrate some costed recommendations
• Recommendations for the business – Capacity Management usually provides a number of alternatives for the business, and the plan should be produced in a timescale which allows the recommendations to be considered as part of the budget planning lifecycle.

One final note. Remember that the Capacity Plan should be updated regularly, in line with any revised business plan, or with unexpected changes in the IT infrastructure, for instance because new business is won or lost.

Critical Success Factors in Capacity Management

Managing the capacity of large distributed networks is becoming increasingly complex, and the financial commitment from business to IT continues to increase.

A corporate Capacity Management process ensures that the entire organisation’s capacity requirements are catered for. However, making the process work successfully depends on several critical factors. These include:

• Accurate business forecasts
• An understanding of current and future technologies
• A cost effective Capacity Management process
• Working closely with other effective Service Management processes, for example Problem and Change Management
• Effective financial management
• Links to Service Level Management - to ensure that any business commitments are realistic
• And finally, the ability to plan and implement the appropriate IT capacity to match business needs. This provides a longer-term proactive view.

There is a further list of potential benefits and problems associated with the Capacity Management process on page 51 of the ITSMF’s little ITIL book.

Benefits & Problems

The benefits of and potential difficulties with Capacity Management are listed on Page 57 of the little ITIL book and in Section 6.4 of the Service Delivery Manual.

Summary

In this lesson we have been looking at the ITIL process of Capacity Management.

We have defined the goal of Capacity Management in ITIL terms, and we have looked in detail at the three Capacity Management sub-processes of Business, Service and Resource Capacity Management.

We went on to examine the iterative Capacity Management activities of Monitoring, Analysis, Tuning and Implementation, and the ad hoc and regular activities of Demand Management, Modelling and Application Sizing.

We highlighted the major inputs and outputs of the Capacity Management process, and defined the contents of the Capacity Database and the Capacity Plan. We concluded the lesson by defining the critical factors for successful Capacity Management implementation.
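
As a recap of the Trend Analysis technique covered in this lesson, the sketch below fits a trend to recorded utilisation figures and extrapolates it forward to estimate when a threshold would be breached. It is only an illustration under stated assumptions: it uses a straight line fitted by least squares rather than a general smooth curve, and the function names and sample figures are invented for the example.

```python
from typing import List, Optional, Tuple


def fit_line(values: List[float]) -> Tuple[float, float]:
    """Least-squares straight line through equally spaced readings.

    Returns (slope, intercept), with the first reading at x = 0.
    Needs at least two readings.
    """
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def periods_until_breach(values: List[float],
                         threshold: float) -> Optional[float]:
    """Periods after the latest reading at which the fitted trend reaches
    `threshold`, or None if the trend is flat or falling."""
    slope, intercept = fit_line(values)
    if slope <= 0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - (len(values) - 1))


# Assumed monthly CPU utilisation figures (%), growing by 4 points a month.
monthly_utilisation = [50.0, 54.0, 58.0, 62.0, 66.0]
print(periods_until_breach(monthly_utilisation, threshold=80.0))
```

A result of this kind is what a trend report would use to warn of a future threshold breach before an exception report is ever raised.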


Lesson 5A

Service Level Management

Objectives

In this lesson we will be examining Service Level Management, which is covered in Chapter 4 of the Service Delivery book in the IT Infrastructure Library.

When you have completed this lesson you will be able to:

• Define Service Level Management according to ITIL best practice.
• Identify the core Service Level Management sub-processes and activities
• Understand the relationships between SLAs, OLAs and UPCs, and recognise the main sections of a Service Level Agreement.
• List the benefits gained from the Service Level Management process.

What is Service Level Management?

Service Level Management is considered by many to be the heart of ITIL-driven service management.

ITIL defines its goal as:

“To maintain and gradually improve business aligned IT service quality, through a constant cycle of agreeing, monitoring, reporting and reviewing IT service achievements and through instigating actions to eradicate unacceptable levels of service.”

Service Level Management exists to ensure that service targets, such as availability of services, response times and so on, are agreed and documented in a way that the business understands.

It is also there to ensure service achievements are monitored and reviewed on a regular basis.

Service Level Agreements, which are managed through the Service Level Management Process, provide specific targets against which the performance of the IT provider can be judged.

The Service Level Management Process is responsible for ensuring Service Level Agreements and underlying Operational Level Agreements or underpinning contracts are met.

Why do we need SLM?

Customers have become more aware of their dependency on IT for successful business operation. Hence they feel an increased need to formalise the contractual basis on which IT services are provided, and this is where Service Level Management can help.

Often, Service Level Management is a driver for a CSIP or SIP, that is, a Continuous Service Improvement Programme.

Such programmes are aimed at achieving cost-effective improvements to the services offered by the IT service provider, in a rapidly changing technical environment, without necessarily being driven by customer demand.

An example of this might be to take advantage of dramatically reduced networking costs to provide better response times than the customer originally specified, or alternatively to provide the same response times but at a much lower cost.

It’s the responsibility of Service Level Management to be aware of service improvement opportunities, before the customers themselves begin to ask about them.

Alternative Approaches to Service Provision

There are a number of ways that IT services can be provided, each having its merits and drawbacks.

In the simplest scenario there is just the external provider of the IT service and the customer organisation. Services will be provided on the basis of a contract between these two parties.

Whilst this has the benefit of simplicity, it is a risky strategy and one that generally leads to poor support for the users and poor value for money for the corporate customer.

The next approach is often said to involve an “intelligent customer” role. That is, somebody who negotiates on behalf of the customer with suppliers for service delivery. That customer has a Service Level Agreement with the Service Level Management process, and the service is underpinned by an ‘Underpinning Contract’ with the suppliers.

In this situation, the internal IT department adds little or no value. Such arrangements are common where an ‘off-the-shelf’ package solution is being provided by the supplier.


Probably the most common arrangement is where the customer has a ‘Service Level Agreement’ with the Service Level Management team.

In order for that service to be provided, it is necessary for the Service Level Management team to establish ‘Operational Level Agreements’ with their own internal IT departments, who in turn may have an ‘Underpinning Contract’ with the external suppliers of the various components.

Note that for any one service there may be several Operational Level Agreements and several Underpinning Contracts.

Finally, although it is much less common, the whole process can be purely internal, and no external contracts are therefore required. So the Customer has a Service Level Agreement with Service Level Management and they have an OLA with the internal IT department – and that’s it.

This last arrangement is fairly unusual because most systems will depend on some external supply.

It is, on the other hand, quite common for a total service to be provided on the basis of a combination of two or more of these strategies.

The SLA Structure

One of the early decisions that has to be made is the structure of the SLA procedure – which is a major determinant of how many SLAs will end up being produced.

For example, if we had 1000 customers and 50 services we could theoretically produce 50,000 Service Level Agreements. This would clearly be impractical.

Fortunately most businesses don’t have 1000 customers who are entirely independent of each other and so there is usually commonality of service requirements amongst groups of customers.

As an example, let’s suggest 10 major groups of customers, each of which has a common set of service requirements. So by producing SLAs at the Customer Group level the number required could be reduced to 500 (10 groups multiplied by 50 services) – more manageable but still excessive.

There are a number of ways in which this problem can be overcome – perhaps the most common one being the mapping of services onto customer groups.

Service Based Approach

So a particular service, say Service A, will be provided in a generalised format to Customer Groups 2 and 4. And in a similar way, Service D will be provided to Customer Groups 1 and 2.

This allows us to have just one SLA per service - so 50 in our previous example.

The drawback of this approach is that it tends to make each SLA more complicated, since they may have to cater for the fact that not all groups covered by a service have exactly the same requirements. If there are geographical differences between the groups as well, then this will also add to the complexity.

Despite this problem, this is the most common approach that you’re likely to encounter.

Customer Based Approach

An alternative approach is to turn the previous model on its head and map Customer Groups onto Services.

An SLA is created for each customer group, describing all of the services that each customer group will receive.

Here for example Customer Group 3 receives three services; however, they would have just one SLA, admittedly quite a complex one, detailing how they would receive Services A, B & C.

There are a couple of advantages to this approach. One is that the number of SLAs can be dramatically reduced – in our previous example with 10 customer groups and 50 services, we would end up with only 10 SLAs.

Also, it becomes relatively straightforward to introduce variances on standard services between the different customer groups.

The disadvantage is that the SLAs can be long and complex and contain a great deal of duplication from one to the other.

Multi-Level SLAs

A third approach to structuring SLAs is to have a Multi-level or hierarchical structure.

ITIL suggests three levels, namely: Corporate, Customer and Service.

Corporate is the highest level and contains any common features that are true of all services across all customer groups. This might cover things like service desk hours, escalation
procedures, contact points, roles and responsibilities, and so on.

The next level down is the Customer level. Each of the SLAs produced at this level is a description of the services for a particular group of customers. So in our previous example there would be 10 SLAs at this level.

At this level SLAs would contain everything that was common for that particular group of customers, but different from the generic services that appeared in the higher Corporate level.

Finally, Service Level sits at the bottom of the structure. Here we have a document representing each service used by that customer, and relevant to that particular customer group. It only contains information which differs from the corporate or customer level clauses.

Consequently we would have a larger number of SLAs, but each would be relatively short. This in itself makes change management easier.

If for example, we decided to change the standard hours of the service desk from 9am until 7pm, to 9am until 9pm, then that change would only appear in the corporate level SLA.

It’s important when using the hierarchical structure, that the correct level of authority is assigned to each level.

For example, at Corporate level the document would be authorised at the highest management level liaising with IT.

Customer level documents might be authorised by Department Heads, Finance, Planning, HR and so on. Individual Service Level Agreements would be authorised at the next management level down in each of these departments.

The general principle is that SLAs are authorised by paying customers on behalf of users in their part of the organisation.

So what exactly is an SLA?

Well, in structure SLAs are rather like contracts, but they are not in themselves legal documents. However they can be included in a legal contract, particularly when establishing SLAs directly with external suppliers. In such cases an SLA would be included in the contract as a schedule.

An SLA which is used internally between departments has no legal weight; it’s simply a document that has a contractual structure to it.

The purpose of an SLA is to document an agreement, and as such it shouldn’t be an imposition on either the business or IT. Importantly, it must always be written in unambiguous business language, and shouldn’t contain any technical references which make its intention unclear and leave the Business feeling uncomfortable authorising the agreement.

So we have established what constitutes an SLA. So what exactly is an OLA or Operational Level Agreement? Well, in simple terms OLAs are agreements that define the internal IT arrangements that support SLAs.

OLAs are also known as back-to-back agreements. The most common use of an OLA is to define the relationship between Service Desk and internal support groups.

OLAs are required to ensure that the SLA targets agreed between customer and IT provider can be delivered in practice. They describe each of the separate components of the overall service delivered to the customer, often with one OLA for each support group and a contract for each supplier.

A further additional contract exists to ensure that SLAs are supported, and this is an ‘underpinning contract’.

Underpinning contracts are put in place with external suppliers or vendors. It’s important that all targets contained within both SLAs and OLAs that rely on these external suppliers are ‘underpinned’ by the appropriate level of maintenance and support contracts.

For example, an internal software development team might have in place an OLA between themselves and Service Level Management. This OLA offers, amongst other things, a guaranteed response time to serious problems of no more than 2 hours.

In order to guarantee these service levels, the software development team might have an underpinning contract in place with their development software vendor, ensuring that problems can be resolved well within this 2 hour time frame.

A word of warning here: it’s critical that any commitments made in an OLA are directly supported by the underpinning contract. For example, committing to a 4 hour fix time in an OLA would be useless if our underpinning contract only commits our supplier to a 6 hour fix time!
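
The warning about OLA commitments needing support from the underpinning contract lends itself to a simple consistency check. The following is a minimal sketch only: the `Commitment` record, the function name and the party names are hypothetical, and a real review would compare many more clauses than fix time.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commitment:
    party: str             # who is making the commitment
    fix_time_hours: float  # maximum time to fix a problem


def unsupported_commitments(ola: Commitment,
                            contracts: List[Commitment]) -> List[Commitment]:
    """Return the underpinning contracts whose fix time is longer than
    the fix time promised in the OLA they are meant to support."""
    return [c for c in contracts if c.fix_time_hours > ola.fix_time_hours]


# The example from the text: a 4 hour OLA backed by a 6 hour contract.
ola = Commitment("Internal support group", fix_time_hours=4)
contracts = [Commitment("External supplier", fix_time_hours=6)]

for contract in unsupported_commitments(ola, contracts):
    print(f"{contract.party}: {contract.fix_time_hours} hour fix time "
          f"cannot support a {ola.fix_time_hours} hour OLA")
```

The same comparison applies one level up, between the OLAs and the SLA targets that they are required to deliver.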


In the last few pages we have been looking at So lets look at these 4 stages individually, and
those agreements and contracts, which form an see how they fit together to form a complete
important part of Service level Management. Service Level Management process.

But how do we establish which services are The first stage is Initial Generic. The first
available for inclusion in these agreements and activity at this stage, assuming that a Service
contracts, and which ones our customer or Level Management team is in place, is to build
users would like? the initial Service Catalogue. As we mentioned
on the previous page, this activity documents
Well, there are two other important documents all currently available services, and which
in Service Level Management, which can help customers or users are using them. It also
us with this decision, and these are ‘A Service records whether they are formally documented
Catalogue’, and ‘Service Level Requirements or in any SLA’s, and whether it’s a service which
SLR’s.’ needs to continue.

A Service Catalogue contains a list of all It isn’t possible to document every possible SLA
services used by each customer group. A clause in the catalogue, it’s more important to
service Catalogue could be used internally by understand the scope of the catalogue, and the
the service provider, for example, the Service services within it, and also any major problems
Desk might use it to help them identify those with services, and any suggested changes to
customers entitled to a higher level of service. them.
It can also be used externally as a marketing
tool, providing a shop window, showing all the The second related sub-process is planning the
services on offer to the business. Commonly, SLA structure and establishing which SLAs we
Organisations now make this available on their need to create. This activity involves prioritising
intranet as a form of advertising, and the modification of pre existing SLAs, in order
generating ‘buy in’ to the services. to re work them into standard formats. Ask
yourself – are there any new services being
Service Catalogues exist in a number of forms. developed or purchased from a software
They are often created as an internal document, provider that might provide a better starting
listing existing services when Service Level point?
Management is initially established. At a later
stage, it might be published to potential Assuming, we’ve built the Service Catalogue,
customers, and the wider business as a whole, agreed the SLA structure, and prioritised the
in a more ‘glossy’ format. work, we can move onto the second stage of
‘Initial per-service’, and its related sub
In order to establish their exact requirements, processes where we address customer specific
the customer develops a Service Level issues.
Requirement document. When doing so, the
customer should be realistic about potential The first point is to establish Service Level
levels of service, and related costs. Remember Requirements or SLRs. Find out what users
this is not a wish list, and sensible advice would really like from that service, and what
should be offered from the Service Level customers are prepared to pay. We should try
Management team. There is no specific format to establish SLRs by checking requirement
for SLR’s, and each organisation will document documents that exist for new services in
it in their own way. development.

It’s important to remember that these It’s not uncommon for organisations to arrange
documents, along with SLA’s. OLAs and UPCs training programmes for senior customers, to
are all subject to the ITIL Change Management help them understand what SLRs are, how they
Process. should be specified and what is a realistic
request in service level terms.
In the next few pages we will look in some
detail at the Service Level Management sub- The second sub-process uses those SLRs to
processes. These sub-processes can be grouped review the underpinning contracts and OLAs
into 4 stages: already in place with internal and external
service providers. This might involve
• Initial Generic discussions about upgrading current statements
• Initial Per Service on service level and provision.
• On-going Per Service
• On-going Generic Once we are happy with both our OLAs and
UPCs we can create a draft SLA. The intention is
to put actual metrics against various service

Lesson 5a Service Level Management

quality clauses, including fix times for problems, transaction response times
and so on. These statements should be supported by ITSM colleagues, such as
Service Desk, Capacity, Availability and Problem Management, amongst others.

When the draft SLA is available, agreement should be sought from customers and
users that it represents an adequate specification of service. This is a
process of negotiation, and might involve talking to external and internal
suppliers about the cost of improving service quality parameters to the
customer. It might require several iterations of the process before agreement
can be reached. Usually, the cost of providing certain levels of service
becomes apparent to customers fairly quickly, resulting in more realistic
negotiations.

Once the agreement is formally signed, the SLA must be implemented. This
involves informing all parties constrained by the SLA that it is in place,
for example service desk staff, third party suppliers, users and so on.

The third stage in the SLM process includes the on-going per service
activities of monitoring, reporting, and review and modify.

Monitoring involves using the technical tools available to those working in
Service Management to monitor the users’ important SLA clauses, such as
response times for enquiries at the Service Desk. SLM isn’t responsible for
the technical implementation of monitors; however, SLM takes responsibility
for ensuring that the necessary monitors are in place. Monitors can provide
useful reporting information to IT and the business, and we will be looking at
reporting in more detail later in the lesson.

Review and modification takes place at regular intervals via service review
meetings. These meetings are held at regular intervals; weekly isn’t uncommon,
but monthly is most likely. The objective of these meetings is to produce
short reports on the way the SLA is working, debate any problems or issues,
and discuss any changes to the SLA which might be needed. These reports should
be written in simple business language, and state whether we have met the SLA
or not, with descriptions of where we failed and explanations of how we are
going to prevent the failure occurring again. Remember, however, that any
suggested changes to SLAs should be authorised by the Change Management
Process.

The fourth and final Service Level Management process stage is defined as
ongoing generic. It involves sub-processes which relate to SLAs and Service
Level Management as a whole. These processes include maintaining the Service
Catalogue and updating it with new services. Some organisations have automated
document links from the Service Catalogue to individual SLAs, so when an SLA
is changed, that change is reflected in the catalogue. Remember the Service
Catalogue falls under the Change Management control process.

A further activity is to review the Service Level Management process itself.
By establishing Critical Success Factors (CSFs) we can measure performance; we
can also set KPIs, or Key Performance Indicators, for what is considered a
successful service.

The final activity is to consider a Service Improvement Programme or SIP.
Service Level Management should look at all provided services and their
associated quality requirements to see how we can improve service levels
without significant increases in cost to the business. This proactive SLM
activity involves talking with colleagues in Availability and Capacity
Management, and IT Infrastructure Management, to identify ways of improving
response times, and improving availability to the business. This activity uses
SLA contents as a trigger for service improvement.

Reporting on Service Level Achievements

We briefly mentioned the activity of reporting earlier in this lesson.
Reporting can be subdivided into either external or internal reporting.

Internal reporting involves monitoring service quality in SLAs and related
OLAs and UPCs. This detailed monitoring of service quality is normally set up
by the Capacity and Availability Management processes. They will be interested
in all activity which affects all service clauses, including breaks in
service, time to repair, response time to users and so on.

Monitoring OLAs and UPCs will help us understand why SLA breaches are
occurring, and also to identify future trends, and possible future SLA
breaches. Remember you can’t control things that you can’t monitor.

External reporting should be written in a simple and clear way. An exception
report is a typical example of external reporting, and it should simply point
out when, where and why SLA breaches or near breaches occurred. It should also
explain how we intend to prevent things from getting worse.
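
As an illustration of the exception report described in this lesson, the
following minimal Python sketch compares measured service levels with SLA
targets and lists only the breaches and near breaches. The clause names, the
target figures and the 10% "near breach" margin are hypothetical values chosen
for the example; they are not ITIL definitions.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    name: str        # SLA clause being monitored (hypothetical names)
    target: float    # agreed maximum, e.g. seconds or hours
    measured: float  # value observed by the monitoring tools


def classify(clause: Clause, near_margin: float = 0.10) -> str:
    """Return 'breach', 'near breach' or 'ok' for a 'lower is better' clause."""
    if clause.measured > clause.target:
        return "breach"
    # Assumption: anything within 10% of the target counts as a near breach.
    if clause.measured > clause.target * (1 - near_margin):
        return "near breach"
    return "ok"


def exception_report(clauses):
    """List only the clauses that need reporting to the customer."""
    return [(c.name, classify(c)) for c in clauses if classify(c) != "ok"]


clauses = [
    Clause("Service Desk response (seconds)", target=30, measured=35),
    Clause("Incident fix time (hours)", target=4, measured=3.8),
    Clause("Transaction response (seconds)", target=2, measured=1.2),
]
for name, status in exception_report(clauses):
    print(f"{status.upper()}: {name}")
```

The same three statuses map naturally onto a Red, Amber, Green presentation,
with clauses that are within target simply left off the exception report.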


A Service Level Agreement Monitoring Chart, or SLAM chart, is a popular
mechanism for external reporting, as are RAG, or Red, Amber, Green charts.
Both devices offer simple to understand graphical representations of service
level parameters, and show where breaches or potential breaches have occurred.

Trend graphs are another important monitoring tool. Businesses are very
interested in consistency of service as well as quality. For example, trend
graphs can show that, over a three month rolling period, the trend is for
greater throughput of activity and for fewer breaks in service. In displaying
these trends to customers, we can convince them that we are achieving service
level targets, and are likely to continue to do so.

Typical SLA contents

So what does a typical Service Level Agreement consist of? Well, broadly
speaking, its contents can be broken down into three sections: an
introduction, Agreed Service Levels, and general extra statements.

The SLA introduction describes the service, its scope, the intended customer
group, the commencement date and its duration. It should be written in clear
and concise business terms, and it should be authorised at an appropriate
level, by both parties.

‘Agreed Service Levels’ will define a number of measurable clauses, for
example, normal hours of service, availability and reliability of the service.

Clauses related to ‘throughput’ are also common, detailing the number of
transactions the service is expected to support in a defined period.

SLAs frequently contain clauses covering transaction response times. This is
often broken down into several response types, including system responses, a
request via mouse click on a PC for example, or an incident response,
detailing the maximum time allowable in responding to an incident report.

There may be as many as 20 different measurable clauses in an SLA, against
which customers will want us to report.

The third section in our SLA deals with the additional statements, such as
service charges and how they are structured. Mechanisms for change should also
be outlined in this section. Remember, however, that changes to SLA clauses
should be handled via the Change Management process.

Statements on provision of service in case of a disaster are also important.
It is the role of IT Service Continuity Management to create cost effective
plans to deal with potential disasters, such as fire and flood. It’s common to
state in SLAs at what level, and how quickly, service will be available after
a disaster.

Also included are statements of User and Customer responsibilities. Customer
statements might include defining the maximum number of Users at any one time,
or a commitment to provide data to the IT supplier in the event of weekend
working, for example. This can be a lengthy section of the SLA, and it’s
important to remember that an SLA is an agreement between the business and IT
with responsibilities on BOTH sides.

If a request is received to amend an SLA clause it is important that the
proposed change undergoes a thorough impact analysis. Changes in one SLA can
impact on others; for example, changing one SLA to allow more users on a
network might have an adverse effect on other customers using the same
network. This is where a Service Level Management process benefits the
organisation, because each service isn’t treated in isolation, and the whole
Service Level Management team work together to ensure quality of ALL services.

Reviews

In order to establish customers’ perceptions of its service, Service Level
Management should carry out regular service review meetings. Typically these
meetings involve customers rather than users and consequently shouldn’t be
used as a substitute for user questionnaires and so on.

Ahead of these meetings Service Level Management staff should review customer
related incident records from the service desk,


so that they are able to answer any questions about these incidents.

Review meetings can lead to suggestions for change. Remember, however, that
they are not the place where changes are authorised.

The Service Level Management process can carry out its own internal review.
This review should be carried out by the head of the Service Level Management
team, or process owner. A key activity in the review process is to review
KPIs. Some typical example KPIs might include Customer Perception ratings, the
number of service reviews held, and how many are held at the right time. ITIL
suggests that these reviews are held on an annual basis, although many
organisations hold them more frequently.

The role of the Service Level Manager

The SLM Process must be 'owned' in order to be effective and achieve
successfully the benefits of implementation. This isn’t meant to imply that
this should be a single post, unless that’s appropriate to your organisation.

The Service Level Manager must be at an appropriate level to be able to
negotiate with Customers on behalf of the organisation, and to initiate and
follow through actions required to improve or maintain agreed service levels.
This requires adequate seniority within the organisation and/or clearly
visible management support. It’s important that the role acts as a conduit
between IT specialists and the customer, interpreting technical language from
the IT groups into understandable business language, and vice versa.

In summary we could define the characteristics of a Service Level Manager as
being:
• A good negotiator, firm but fair
• A good communicator, both written and oral
• Business orientated, customer focused and technically aware
• Good under pressure. This can be a stressful role, as it interfaces between
  two very strong minded communities.

Benefits & Problems

The benefits of and potential difficulties with Service Level Management are
listed on Page 45 of the little ITIL book and in Section 4.2.1 of the Service
Delivery Manual.

Summary

In this lesson we have been looking at Service Level Management.

We have seen how ITIL defines the goal of Service Level Management, how it’s
often driven by a Service Improvement Programme, and why it’s regarded as
essential to the ITIL structure as a whole.

We examined the relationships between the customer, the IT provider, and
external suppliers, and went on to look at the structure of Service Level
Agreements and the different ways in which we can tailor service provision to
customer needs.

We went on to look at the structure of Service Level Agreements, and their
relationships with Operational Level Agreements and Underpinning Contracts,
and discussed how, by producing a Service Catalogue and a Service Level
Requirement document, we can better satisfy customers’ requirements.

We examined the Service Level Management sub-processes in detail, including
planning an SLA structure, and the monitor, report, review and modify
activities.

We listed the key characteristics of the Service Level Manager role, and
highlighted some of the potential benefits and possible problems associated
with implementing a Service Level Management process.

Lesson 5b Financial Management for IT Services

Lesson 5b

Financial Management for IT Services

When you have completed this lesson you will have a broad appreciation of
Financial Management in an IT services context.

• You will be able to explain the main reasons why financial management is
  necessary and you will be able to recognise the three main elements that
  define the scope of financial management.

• You will be able to identify 6 types of cost that are commonly encountered
  and then classify these into one of six accounting cost categories.

• Finally, you will be able to describe seven different charging policies that
  can be applied to IT services.

Introduction

The goal of Financial Management for IT services is to provide cost effective
stewardship of the IT assets and the financial resources used in providing IT
services.

In the vast majority of situations the Financial Management of IT Services
will have to operate within boundaries, and adhere to policies that are set by
the higher-level financial management authorities within the organisation.

For this reason the topic of Financial Management is often regarded as a
“cross-over” area for IT staff, because it requires some knowledge of
accounting processes, but it is very difficult to make a success of the
process without the full commitment of the management from the corporate
accounting functions.

Having said that, there are some special considerations in the management of
IT services’ finances and that is where the ITIL guidance helps.

Why Financial Management?

The major reason for having financial management for the IT services area is
to ensure the provision of value-for-money (VFM) IT services.

That is, providing maximum business value for the minimum financial outlay.

The mechanisms by which we achieve value for money services are:

By facilitating decision making

For example, ITSM decision making will include evaluating suggested changes
and formulating business cases. This work might include calculating return on
investment and is done by the IT Services Financial Management team on behalf
of the IT service management group.

Financial forecasting is also a critical element of the decision making
process and can help avoid cost over-runs, or resource shortages.

Containing Costs

This includes those costs incurred internally and externally, through any
supply contracts we may have. It is very important that we know about ALL our
costs and are able to manage them.

Through this mechanism we take into account total lifecycle costs, sometimes
called Total Cost of Ownership (TCO), where we look at the cost of both
developing something and then supporting it over its lifetime.

If as an IT organisation we don’t understand the costs of support, we won’t be
able to make correct project decisions about balancing rapid development
against high cost of support.

We can also contain costs through demand management. For example,
‘Differential Charging’ may – subject to agreement with the business – be used
to persuade people to use resources at different times.

Optimising Service Value is all about helping the business balance the quality
of the services it receives against the cost of providing that service
quality.

For example, if users insisted on a 99.99% availability of a particular
service during early SLA negotiations, IT financial management can help
service level management by producing an accurate cost/benefit analysis of
providing that level of availability as opposed to a reduced level.

It is often important to demonstrate achievement, perhaps through some form of
‘benchmarking’, where costs are compared with similar organisations. Such
information can be important when the IT function has to defend itself against
claims that it is spending more money than it should be for the level of
service being provided.
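
The cost/benefit analysis of a 99.99% availability request mentioned in this
lesson can be sketched with some simple arithmetic. The Python below is only
an illustration: the cost of an hour's outage and the extra annual cost of the
higher availability level are hypothetical figures, and a real analysis would
follow the organisation's own accounting standards.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours(availability_pct: float) -> float:
    """Maximum yearly downtime permitted by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


def net_benefit(current_pct, proposed_pct, cost_per_hour_down, extra_annual_cost):
    """Value of the downtime avoided minus the extra cost of providing it."""
    hours_saved = downtime_hours(current_pct) - downtime_hours(proposed_pct)
    return hours_saved * cost_per_hour_down - extra_annual_cost


# Hypothetical figures: an outage costs the business £2,000 per hour, and
# moving from 99.9% to 99.99% needs £40,000 a year of extra resilience.
print(round(downtime_hours(99.9), 2))   # yearly hours of downtime allowed
print(round(downtime_hours(99.99), 2))
print(round(net_benefit(99.9, 99.99, 2_000, 40_000)))
```

With these assumed figures the higher level saves just under eight hours of
downtime a year, which is worth less than it costs, so the net benefit is
negative. That is the kind of evidence that can support a more realistic SLA
negotiation.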


Finally, the Recovery of Costs from the users of the resources is also an
important element in the Value for Money equation. Any decision to recover IT
costs, either totally or partially, is a high level business decision usually
taken at Board level and is not considered mandatory within ITIL.

The Scope of Financial Management for IT Services

IT financial management for services is normally considered as having three
main areas, each of which has a number of sub-processes.

Those areas are Budgeting, Accounting and Charging.

Budgeting is concerned with:

• Predicting the money needed to deliver the IT service.

• Seeking to secure that money from the business, and;

• Monitoring and controlling IT spend against that budget over the given
  period.

Accounting, on the other hand, is the set of processes that allows the IT
service provider to demonstrate where, within IT, the money from that budget
has gone.

Together, the budgeting and accounting processes identify all the costs
incurred by the IT service management, and enable us to understand where that
money is going to, in terms of business support.

If we decide to use charging for IT services, then what we are attempting to
do is recover money from the customers of the services.

These charges must be demonstrated to be equitable, between IT and the
business.

As well as being equitable, charges must also bear some relationship to the
costs. How close that relationship is, is usually a matter of debate in
organisations.

The closer we want the charges to relate to the cost to the IT organisation of
service provision, the more complex the charging process will be and the more
the overhead in gathering the necessary data.

Once they’ve been agreed between the customers and the service level
management team, charges must be documented in the SLA for each service that
is charged for.

There is a hierarchical dependency between Budgeting, Accounting and Charging
which will often develop as an organisation’s financial policies become more
mature and increase in both scope and complexity.

The starting point might be to introduce budgeting on a one year ahead basis,
for example. This would tell us how much IT is costing, but shed no light on
how that figure is arrived at. Also, at this stage, we have done nothing to
recoup the costs from the business.

Knowing in detail where the money and the resources it buys are being used
only becomes possible once we introduce the accounting processes.

The ability to recoup this money only becomes possible once we move into
charging processes.

The ITIL guidance is that we should implement, at the very least, budgeting
and accounting. Charging, as we have said, is optional as far as ITIL is
concerned.

We should do this, almost certainly, before we attempt charging. It is
theoretically possible to charge without understanding who is using what
resource, but this is unlikely to be acceptable to the business.

Accounting without budgetary control makes little sense and in general
charging without accounting is not a good option.

Once accounting is in place, we have a vehicle for performing cost benefit
analysis and return on investment calculations.

IT financial management will expect such calculations to be done whenever
there are proposals for significant changes, or for the creation of new
services.

The exact models for this will be dependent on standards for accounting within
the organisation.

Types of Cost

It is very useful when creating a budget to understand all of the resources
that we have by breaking them down into various cost types.

The suggested high level cost types that ITIL recommends are:

Hardware, such as computers, networking equipment, data storage devices and so
on.

Software, which would include operating system software and applications.


People, in other words, salaries, taxes, expenses, benefits and other costs of
employment.

Accommodation, for example, offices, machine rooms, utilities, storage space
and so on.

External Service, covering items which might be outsourced, such as
development work, ISPs, disaster recovery facilities and the like.

And finally Transfer, which is used to account for the cross-charges that can
take place between different parts of the business. For example, if it was
necessary for an Excel expert in Finance to give two days training to someone
in Human Resources, because IT lacked the resource to do this, then IT would
expect a cross-charge for that person’s time to come from the Finance
Department.

A useful aide-memoire for these cost types is the acronym HAS PET – as you can
see.

Cost Classification

Once the cost elements have been identified and their types understood, they
will need to be classified for accounting and financial purposes.

As a minimum, ITIL recommends that costs need to be classified as either
Capital or Operational costs.

Capital expenditure is assumed to increase the total value of the company,
while Operational expenditure does not.

So capital costs relate to outright purchases of fixed assets and may apply to
accommodation, computers, and workstations, for example.

Operational costs, on the other hand, can be thought of as day-to-day running
costs. Once money is spent on these it is no longer available to the company
as an asset. Operational costs include salaries, rental of equipment or
buildings, and licences for software.

It is sometimes the case that organisations make capital purchases but want to
represent in their accounts the fact that this capital loses value over time.

So if £10,000, say, is spent on an item of equipment which is expected to last
for three years, the assets of the company will be immediately increased by
£10,000.

But in each of the next three years £3,333 will be taken out of the
operational expenditure and the assets will be decreased by that amount, until
at the end of three years the asset value stands at zero and the full £10,000
has been recovered from operational costs.

This is the process that accountants call depreciation.

Conversely, some companies try to roll up some operational costs and classify
those as capital, so that they too can be written off over a number of years.
A good example of this is software development.

A company may decide that if it has spent £100,000 on salaries to develop a
software application then, once it is completed, that application becomes an
asset of the company with a value of £100,000 – and then depreciate that asset
over the years of its life.

Capitalisation and depreciation policies are very much a concern for the
central accounting functions and are in many respects governed by laws to
prevent fraud and tax evasion. ITIL suggests that we take advice from the main
accounting section on the use of depreciation.

Direct and Indirect Costs

Costs can further be classified into direct and indirect costs.

Direct costs refer to a cost that is directly attributable to a customer or a
group of customers. For example, if we are asked to buy a package and a server
for the use of Human Resources only, then we could regard these package and
server costs as being a direct cost that can be ‘charged back’ to the HR
function.

Indirect costs cannot be allocated simply to one customer or group. They are
costs that are shared amongst groups.

There are commonly two types of indirect cost. The first is absorbed costs,
where the costs can be apportioned across a number of different groups based
on their respective usage of the resource concerned.

The second is unabsorbed costs, where it is too difficult to determine who is
using how much of the resource and so the cost is allocated as a simple
percentage uplift to all costs – in other words an overhead.

An example of this might be the cost of the service desk where, rather than
attempt to work out which group was behind every call and how much time that
took, we take the cost of the service desk in total and distribute it across
all of the customer groups, based on their usage of other resources.

Lesson 5b Financial Management for IT Services
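
The depreciation example in this lesson (£10,000 of equipment written down
over three years) can be worked through with a short Python sketch. It assumes
simple straight-line depreciation in whole pounds, and it makes one further
assumption that the text does not state: the odd £1 left over by rounding is
added to the final year's charge so that the asset value ends at exactly zero.

```python
def straight_line_schedule(cost: int, years: int):
    """Yield (year, charge, remaining asset value) in whole pounds.

    The yearly charge is rounded down; any remainder is added to the final
    year (an assumption of this sketch) so the value finishes at zero.
    """
    charge = cost // years
    value = cost
    for year in range(1, years + 1):
        this_charge = value if year == years else charge
        value -= this_charge
        yield year, this_charge, value


for year, charge, value in straight_line_schedule(10_000, 3):
    print(f"Year {year}: charge £{charge:,}, asset value £{value:,}")
```

Each yearly charge is the amount taken out of operational expenditure, and the
final column is the asset value that remains on the company's books. The
actual depreciation method and period should be agreed with the main
accounting function.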

Finally, there are fixed and variable costs. Fixed costs remain constant
regardless of usage, whereas variable costs increase in proportion to the
usage made of a resource.

An example of a fixed cost might be a leased communication line – the price of
which does not change regardless of how much or how little it is used. On the
other hand an ISDN line might be an example of a variable cost, because it may
be charged for on the basis of the amount of traffic that uses it.

The concept of fixed and variable can also be applied to charging. But there
are potential pitfalls here.

If a service that is charged for on a fixed price basis is based on cost
elements that are variable, then if the workload increases dramatically the
cost of providing the service may end up being greater than the money being
recouped.

The converse of this is also true – if charges are variable, but costs are
fixed, difficulties can arise if the volumes end up being less than predicted.

Section 5.3 of the Service Delivery manual – or Page 50 of the little ITIL
book – contains a useful illustration of how the different cost types and
categories that we have discussed can combine to build a cost model for
arriving at the total cost figure for a given customer.

It is worth spending some time in studying this cost model.

Charging Policies

Once budgeting and accounting procedures have been well established, then
possible charging policies can be considered.

The decision on whether to implement charging, and if so on what basis, is not
normally a decision for IT financial management – those kinds of high level
business decisions are almost always made at very senior management levels
within the business.

There are a number of general charging policies which are usually considered.

It is quite valid for the organisation to decide that they are not going to
charge for IT services.

One of the reasons for deciding on this policy might be that there are costs
involved in charging. There will need to be a mechanism for setting the
charges, for sending out bills and invoices and for resolving disputes. All of
that requires gathering and processing of data, and a mix of financial and IT
skills in order to be effective.

What is important is that costs are understood and that there is budgetary
control. People are then aware of how much their business is spending on IT
services, but they are not charged.

The problem with a no-charging policy is that it does not provide a means of
managing customer expectations or manipulating demand.

If it is decided to charge for services, then “Cost Recovery” – attempting to
get back from the other business units just the cost of providing IT to them –
is known as the ‘zero-balance’ policy.

Alternatively, a “Cost Plus” policy is where IT expects to recoup more than
they spend, perhaps as a mechanism for dealing with potential variation in
demand over a number of years, or possibly as a basis for funding investment
in new infrastructure components, which will be a benefit to the business as a
whole.

It is also possible to subsidise the service and to go for a ‘Cost Minus’
policy. Here, we are not attempting to recoup all of the costs from the
individual business units but do want to achieve some element of cost
consciousness.

The degree of ‘subsidy’ from the business as a whole will be a high level
management decision.

A “Going Rate” approach, in ITIL terms, allows the charges to be based on what
other internal departments charge for their services or what other IT
departments in similar organisations charge their internal customers.

‘Market rate’ charging uses an external cost comparison, where we see what
external providers would charge the business for the sort of services we’re
offering and use that figure as our charge. This is often a useful policy when
outsourcing is being considered.

Some organisations allow their IT departments to sell their services
externally to the company; in other words they become a profit centre in their
own right. This will tend to militate in favour of market-rate pricing, and
the business will need to decide how the extra money generated will be used.

Finally we might decide on a negotiated “Fixed Price” policy, where the actual
price we charge


is a result of an agreement between ourselves and the customer group.

Clearly it is very important to get those prices about right, otherwise an
over-recovery might discourage users from using our services. Conversely, an
under-recovery would mean that IT would have to be rescued at the end of the
year by the business as a whole.

Whatever charging policy is decided upon, when it comes to actual pricing for
services, ITIL best practice advises that charges should be fair,
understandable by the business, and subject to control by the business.

Benefits and Problems of ITFM

The benefits of and potential difficulties with Financial Management for IT
Services are listed on Page 51 of the little ITIL book and in Sections 5.1.7
and 5.1.9 of the Service Delivery Manual.

Summary

In this lesson we have learned what is meant by Financial Management in an IT
services context and why it is a necessary process within ITIL.

We have explored in some detail the three main elements that define the scope
of financial management – namely, Budgeting, Accounting and Charging.

We have considered 6 types of cost that are commonly encountered and have seen
how these can be classified into one of six accounting cost categories.

Finally, we have evaluated seven different charging policies that can be
applied to IT services.
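
The cost-based charging policies covered in this lesson (cost recovery, cost
plus and cost minus) can be illustrated with a small Python sketch. It first
builds a total cost for one customer group from a direct cost, a share of an
absorbed indirect cost and a percentage uplift for unabsorbed overheads, and
then applies the chosen policy. All of the figures, and the +10% and -20%
adjustments, are hypothetical; ITIL does not prescribe them, and this is not
the cost model printed in the Service Delivery manual.

```python
def customer_cost(direct, shared_cost, usage_share, uplift_pct):
    """Total cost for one customer group.

    direct       -- costs attributable to this customer alone
    shared_cost  -- an absorbed indirect cost shared between groups
    usage_share  -- this customer's fraction of the shared resource (0..1)
    uplift_pct   -- percentage uplift covering unabsorbed overheads
    """
    subtotal = direct + shared_cost * usage_share
    return subtotal * (1 + uplift_pct / 100)


# Hypothetical adjustment for each cost-based policy, as a percentage of cost.
POLICY_ADJUSTMENT = {"cost recovery": 0, "cost plus": 10, "cost minus": -20}


def charge(cost, policy):
    """Charge to the customer under the chosen cost-based policy."""
    return round(cost * (1 + POLICY_ADJUSTMENT[policy] / 100), 2)


cost = customer_cost(direct=12_000, shared_cost=50_000, usage_share=0.25, uplift_pct=5)
for policy in POLICY_ADJUSTMENT:
    print(f"{policy}: £{charge(cost, policy):,.2f}")
```

Going rate, market rate and fixed price policies are not derived from cost in
this way, so they are left out of the sketch; they would replace the
calculated figure with a benchmarked or negotiated price.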

Lesson 6a Continuity Management

Lesson 6A The BCM activity incorporates two elements, a


business focused element (Business Continuity
Continuity Management Planning) and a technology element (IT Service
Continuity Management Planning).
Objectives
Which of these processes is a sub set of the
The subject of this lesson is IT Service other depends on the nature of individual
Continuity Management, which is described in business, and the extent to which the business
chapter 7 of the ITIL Service Delivery book. depends on IT. In the ITIL guidance, it is
assumed that IT Service Continuity
Once you have completed this lesson you will; Management is a sub set of Business Continuity
Management, so we’ll follow their example in
• Understand the terms Business Continuity this lesson.
Management and IT Service Continuity
Management, and appreciate the Business Continuity Management is concerned
relationship between these two processes with evaluating business processes, and
considering the impact, if any, if these
• Be able to identify what a Vital Business Function considering the impact, if any, if these
is, in ITIL terms, and be aware of IT Service processes can’t be performed.
Continuity Management’s links to other ITIL Amongst other things, BCM will need to look at
processes. cost effective ways of;

• Have an understanding of the Business • Reducing the likelihood of a threat occurring


Continuity Lifecycle and have seen ITIL’s risk
analysis techniques, and recovery options • Minimising the impact on the business if the
threat does occur, and
Let’s start this lesson by posing a question.
What do you think would happen to your • Having a ‘disaster recovery’ mechanism in
business immediately after a ‘disaster’, for place to deal with any threat that does
example if your offices burnt down, or a local materialise, which prevents ‘business as
river flooded your offices? usual’ activities.

Your answer might be; ‘Well, the business is IT Service Continuity Management or ITSCM
insured, so the insurance company will sort focuses on the IT services that support the
everything out.’ business, and it’s this process, which the ITIL
guidance concentrates on. Remember,
But what happens at start of business tomorrow however, that there is no point in making huge
morning? Firstly and pretty obviously, day-to- efforts to maintain IT services under disaster
day business operations are going to stop, no conditions, if the business has no Business
office, means no staff accommodation, no staff Continuity Management process in place.
means no ongoing business activity. As a result
you can’t service customer accounts, take or So, if staff don’t know where they should go
despatch orders, collect payment and so on. It’s after a disaster, or the alternative office location
likely that you will lose existing and new hasn’t any chairs or desks, then there’s little
customers, sales and revenue. Ultimately the point in having a ITSCM process in place. Put
business could fail. simply, it’s important that IT service
management staff point out the critical need for
This all seems pretty unlikely, but if we consider the ‘business’ to have a Business Continuity
some other scenarios, such as a computer virus Plan.
infecting the servers via email, or a disgruntled
ex employee deleting critical data, these Vital Business Functions -VBF’s
potential threats seem more likely, particularly
when you consider that statistics suggest that Business Continuity Management, and so by
80% of businesses that suffer a ‘disaster’, go association, ITSCM are primarily concerned with
out of business within six months of it Vital Business Functions or VBFs.
happening.
VBF’s are the critical parts or components of a
So how does a business prepare for such service, and as such must be ‘reinstated’ as
eventualities? Well, one very good way, is to quickly as possible.
implement Business Continuity Management or
BCM. For example, your bank has a network of
ATM’s, which dispense cash and offer a
selection of other services, including printing or

displaying a balance. The bank might consider that the only VBF performed by the ATM is the dispensing of cash, and not the other services.

The role of ITSCM is to identify the IT VBFs and services, and agree with the business how quickly those VBFs and services need to be recovered.

Sometimes a service which is reinstated quickly might have components missing, or the throughput performance of the network might be reduced. It’s important that agreement is sought from the business that a ‘reduced service’ is better than no service at all.

Not all aspects of the IT services will require contingency plans in the event of a disaster. The business may be prepared to live without certain aspects of the IT infrastructure in the short term. So the focus of ITSCM is directed at the Vital Business Functions, and a relevant amount of the available budget is assigned to each. The amount of money assigned to a VBF is proportional to its business importance.

ITSCM has to have strong linkages with other ITIL disciplines, in particular Availability Management and Service Level Management. For example, statements in SLAs should define what service levels are likely to be available under ‘disaster’ as well as ‘normal’ operations.

Other linkages include:

Availability Management
– delivering risk reduction measures to maintain ‘business as usual’

Configuration Management
– defining the core infrastructure

Capacity Management
– ensuring that business requirements are fully supported through appropriate IT hardware resources

Change Management
– ensuring the currency and accuracy of the Continuity Plans through established processes and regular reviews

And finally, the use of statistics provided by the Service Desk and the Incident Management process.

The ITSCM Processes

It’s not possible to develop an effective ITSCM plan in isolation; it’s important that it supports the requirements of the business. In the next few pages we will be looking at the Business Continuity Lifecycle and its four stages, which are:

• Initiation
• Requirements and Strategy
• Implementation
• Operational Management

The Initiation Stage

The activities to be considered during the initiation process will depend on the level of contingency facilities already in place within the organisation. Some parts of the business may have already established individual continuity plans based on manual workarounds, and IT may have developed contingency plans for systems perceived to be critical. This can provide a worthwhile starting point for the process. However, effective ITSCM is dependent on supporting Vital Business Functions, and on ensuring that the available budget is applied in the most appropriate way.

The initiation process covers the whole of the organisation and consists of the following activities:

• Policy setting
• Specifying terms of reference and scope
• The allocation of resources
• Defining the project organisation and control structure
• And finally, agreeing the project and quality plans

The Requirements and Strategy Stage is, as the title suggests, split into two sections. Requirements performs Business Impact Analysis and Risk Assessment, and Strategy determines and agrees risk reduction measures and recovery options to support the requirements.

How much the organisation stands to lose as a result of a disaster or other service disruption is a key driver in determining ITSCM requirements.

Risk analysis techniques such as CRAMM, the CCTA’s Risk Analysis and Management Method, and Business Impact Analysis, or BIA, are performed in the Requirements and Strategy Stage. From this, the business can establish the level of criticality of its services. We will discuss Risk Analysis in more detail later in the lesson.

The Implementation Stage includes the detailed planning required to create the disaster recovery plan. This includes putting in place risk reduction and risk mitigation measures.

An example of this might be a smoking ban, and the introduction of an automated sprinkler system. Implementing counter measures can be very costly, so a business case might be required to justify the level of investment.

Also during the implementation phase, contracts will be signed with third party standby facility providers, if they are required.

The final stage, ‘Operational Management’, is responsible for educating all users and IT staff about the service continuity processes, and specifically what will happen in the event of a disaster.

Also remember that people will need to be trained in their ‘disaster recovery plan’ roles. For example, somebody will have to liaise with the press in a public relations role, and training might be needed for this.

Risk Assessment and Counter Measures

There are a number of other approaches to assessing risk. Perhaps the simplest looks at the probability of something occurring, and the impact if it did. This approach can be represented in a matrix format as shown here. The highest risk status is one with both a high probability and a high impact. Conversely, a low impact and a low probability would mean a ‘low risk’ category. A business response to this low level risk might be to ‘just deal with it if it happens.’

We mentioned the CCTA’s Risk Analysis and Management Method, or CRAMM, earlier. This involves the identification of risks, any associated threats, vulnerabilities and impacts, together with the subsequent implementation of cost justifiable counter measures.

CRAMM is a very useful method for looking at threats that might affect the availability of service, as it focuses on asset values. Assets could be hardware, software, people, buildings, telecommunications and so on. It then examines the various threats that could exist, and how vulnerable the assets are to these threats. The results can provide a ‘risk rating’ which is very useful to the business.

For example, we are generally aware there is a threat of flood. We might then find that our mainframe computer systems are vulnerable to this threat, because they are housed in a site which is below the water level of a tidal river. The asset would be significant in terms of the computer equipment and the services based on it, and therefore this would give us a ‘high risk’ rating. The business then has to take measures to deal with that risk.

In order to do this type of risk analysis, it is useful to have a Service Catalogue available. You’ll remember that the Service Catalogue featured in the lesson on Service Level Management, and it contains a list of services available to customers or users. We can use information from this document to help assess the risk levels on different IT services.

Risk Analysis can also be applied at component level, by looking at Configuration Items, and judging what risks they are subject to.

This analysis could identify a component failure risk in a particular service. We could mitigate the risk by sharing it with another service that is made up of the same components.

Any component within the IT infrastructure that has no backup capability, and can cause impact to the business and users when it fails, is known as a SPOF, or Single Point Of Failure. A particular concern of ITSCM is ‘Hidden SPOFs’. An example of a hidden SPOF might be the point where multiple data cables enter an office via an underground duct. A significant failure would occur if the cables were severed during building works.

Contingent Risk Countermeasures

ITIL suggests a number of possible options when dealing with a ‘disaster recovery’ situation.

The first option is ‘do nothing’. Surprisingly, this can be a valid response if the business has decided that the complete loss of some service in a disaster is acceptable. For example, the business might have insurance in place to cover any potential ‘loss of business’.

‘Manual back up’ can be an effective interim measure until the IT service is resumed. Any procedures should be well documented and understood. This is possibly the most unlikely option suggested by the ITIL guidance. Would it be possible, for example, to go back to manual ordering for a short period of time, rather than a computerised system?

The third option is a ‘Reciprocal Arrangement’, where organisations agree to back each other up in an emergency. This is rarely used now except for off-site storage, as it assumes that both organisations have enough spare capacity to fully support the other.

There are examples of Reciprocal Arrangements working effectively on an international basis, where there are significant time differences. This relies on a network switch to allow communication to an alternative processing environment ‘out of hours’.

The next three options are Gradual, Intermediate or Immediate recovery, any of which can be provided either internally, by the business itself, or externally by a contracted third party. In all cases the alternative environment that the business moves to can be either fixed or portable. If it is fixed, then the business goes to a particular location to make use of the services.

If it is portable then the services may be brought to the business premises. An example of a portable solution might consist of a ‘mobile computer room’ which is placed adjacent to the business’s existing building.

The main difference between these options is the time scale of recovery.

Gradual Recovery is also known as ‘Cold Standby’. This option involves providing only the essential services such as power, air conditioning, network wiring and so on. The facility doesn’t contain any computer equipment. This option is used when a business can function for a period of 72 hours or more without IT services.

Intermediate Recovery is also known as ‘Warm Standby’ and is used where recovery is needed between 24 and 72 hours. A ‘Warm Standby’ facility will have the required computer equipment in place, but it wouldn’t be configured or loaded with current operational software.

Immediate Recovery is also known as ‘Hot Standby’. This would normally involve the use of an alternative site with continuous mirroring of the live environment and data. Recovery could be almost instantaneous, but the general definition of immediate recovery is to allow up to 24 hours for full recovery.

There are potential risks from having a ‘hot standby’ site in very close proximity to the business’s main site. Although it reduces logistical and network issues, the whole site, including the ‘hot standby’, could be at risk from a disaster. So combinations of these options are sometimes used, and might include the use of a ‘hot standby’ third party site for two or three days, while the internal intermediate site is configured. This would reduce third party costs, but would involve moving site twice.

Testing IT Service Continuity Plans

We have seen so far in this lesson how ITSCM activities prepare the business against any internal or external threats, and document recovery procedures in an IT Service Continuity Plan. The question now is, how can we be sure that the plan will work successfully when we actually use it?

Well, the obvious answer is to test it! The ITIL guidance suggests that plans should be tested:

• After the plan has been written
• After any major changes, either in the plan, or to the IT infrastructure itself
• At least annually
• And after we’ve actually had to use the plan and have restored ‘normal’ service.

This last point sounds a bit odd, but it is a very good time to perform a test, to make sure that any lessons learned from the disaster, and the way the plan worked in practice, have been put into place. The plans can be tested all together as a ‘big-bang’ approach, or on a service by service basis.

Test types include ‘dry runs’, where we walk through the stages of the plan, and each staff member ‘plays’ their designated role. Next we can plan a test on an agreed future date, which might involve visiting the remote site, and trying to run critical services.

The most difficult and expensive test type is the full, unannounced test. This can be the most effective way of finding flaws in the plan, but it’s the most disruptive to the day-to-day business activities.

Key ITSCM Decisions

There are a number of critical decisions which must be made by the ITSCM process.

An important one is deciding on how many copies of the Continuity Plan we should have, and where they are going to be kept. For example, it would be very risky to have just one copy of the plan stored at the site it provides contingency for! Many organisations keep plans at the alternative site, or a local bank. The IT Service Business Continuity Manager may well keep copies at home.

Remember that all copies should be kept in ‘sync’ to reflect any changes to the infrastructure or the plan.

Another key decision is how and when to invoke the contingency plan. How long are we going to wait before we act after a major failure?

Invoking a disaster recovery plan is an expensive and complex process, so the temptation is to wait and hope that it’s an Availability issue rather than a Continuity Management issue. Deciding how long to wait before invoking the plan is difficult. Ultimately it will be driven by the business, and by the criticality of the services that are being disrupted.

Who does what during the disaster recovery period should also be documented. Questions like which team members go to the alternative site, and who books hotel accommodation, should be addressed. Importantly, everyone should understand their role.

The process of leaving the recovery site, and returning to normal working at the original site, should also be documented. Make sure that all the necessary work has been done at the home site before returning, and that clearly defined processes are in place for the move. Errors are easily made at this point, so particular attention should be paid to removing all commercial or confidential data from the back up site before departure.

A comprehensive list of all third party infrastructure suppliers should also be drawn up, including those for operational and recovery systems. It’s also important to tell them to visit the back up site if they are called out.

Similarly, the details of third party contractors, particularly those who are providing recovery services, should also be to hand. ITIL helps here, by providing a pro forma disaster recovery plan that can be used as a basis for creating our own version. This pro forma contains an annex for all the contact details.

And finally, ask the question: does our contingency supplier have its own contingency plan in place? There have been several recent examples where third party recovery service providers have been literally ‘deluged’ by demand. For example, serious flooding has resulted in them receiving multiple requests for help, leaving them unable to fulfil their contractual obligations.

Some Service Recovery Organisations have in place a switching facility, where they can transfer demand to other sites in other countries. Ultimately this makes their service more robust.

Benefits and Possible Problems with ITSCM

The key benefits of the ITSCM process include:

• The management of risk and, as a consequence, a reduced impact from failures in the IT infrastructure

• Potentially lower insurance premiums as a result of implementing good counter measures

• And the fulfilment of mandatory and regulatory requirements.

You can see a comprehensive list of benefits and potential problems associated with ITSCM on pages 62 and 63 of the itSMF’s IT Service Management guide, or the little ITIL book for short.

Lesson Summary

In this lesson we have been examining the Business Continuity Management and IT Service Continuity Management processes.

We looked at the relationship between the two processes, and have seen how ITSCM defines key IT activities as Vital Business Functions.

We saw how ITSCM links to other ITIL processes, and went on to look in some detail at the Business Continuity Lifecycle.

We have seen some of the Risk Analysis techniques used in the Requirements and Strategy stage of the lifecycle, and listed all of the ITIL Recovery options.

And finally, we looked at how to test IT Service Continuity Plans, and at some of the key decisions required by the IT Service Continuity Management process.
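As an illustration only, the probability and impact matrix and the recovery time bands described in this lesson could be sketched in Python along the following lines. The function names, the two-level ‘low’/‘high’ scale and the ‘medium’ rating for mixed cases are assumptions made for this sketch; the lesson itself only defines the high/high and low/low cases, and the handling of the exact 24 hour boundary is also a choice made here.

```python
# Illustrative sketch only. Names, the two-level scale and the "medium"
# band are assumptions for this example, not ITIL definitions.


def risk_rating(probability: str, impact: str) -> str:
    """Rate a threat using a two-by-two probability/impact matrix."""
    levels = {"low", "high"}
    if probability not in levels or impact not in levels:
        raise ValueError("probability and impact must be 'low' or 'high'")
    if probability == "high" and impact == "high":
        return "high"    # the highest risk status described in the lesson
    if probability == "low" and impact == "low":
        return "low"     # 'just deal with it if it happens'
    return "medium"      # assumption: the lesson does not name mixed cases


def recovery_option(tolerable_hours: float) -> str:
    """Map how long the business can function without IT to an option."""
    if tolerable_hours < 0:
        raise ValueError("tolerable_hours must not be negative")
    if tolerable_hours <= 24:
        # Immediate recovery allows up to 24 hours for full recovery.
        return "Immediate recovery (hot standby)"
    if tolerable_hours < 72:
        # Intermediate recovery covers needs between 24 and 72 hours.
        return "Intermediate recovery (warm standby)"
    # Gradual recovery suits a business that can wait 72 hours or more.
    return "Gradual recovery (cold standby)"


if __name__ == "__main__":
    print(risk_rating("high", "high"))
    print(recovery_option(48))
```

In practice a business would combine the two: services with a high risk rating and a short tolerable outage are the ones most likely to justify the cost of a hot standby.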

Lesson 7a Passing the ITIL Foundation Exam

Lesson 7a

Passing The ITIL Foundation Examination

Objectives

So far in this course, we have concentrated on your knowledge of ITIL – what it is, what it contains and how it works in practice.

Well, do you remember at school, that there were always some kids who may not have been that bright, and may not have worked that hard – but they always got through the exams OK. Every class had at least one of them!

The reason that they did get through was not down to luck – they were just good at taking exams.

They had the right mental approach, and they worked out how to stack the odds in their favour.

In this session we will be looking at how you too can increase your chances of getting a good result in the Foundation exam – not just by knowing ITIL well – but by approaching the examination in an objective and systematic manner.

Introduction & Background

The ITIL examinations are administered, on behalf of the OGC, by the ISEB, who are based in Swindon in England, and EXIN, who are a Dutch organisation.

There are currently three examination levels and associated internationally recognised ITIL certificates: Foundation, Practitioner’s and Manager’s.

This course only addresses the first of these, which is, as you would expect, the entry-level qualification. It is a prerequisite for going on to take the more advanced certificates.

The objective of the Foundation exam is to confirm a very broad-brush knowledge across the whole of ITIL, and it therefore does not demand a very detailed knowledge within any specific area.

In simple terms this is a test that you are broadly familiar with the contents of the Service Delivery and Service Support manuals.

You have already seen questions which are typical of those asked in the Foundation exam as we have worked through this course.

The exam itself consists of 40 such multiple choice questions and one hour is allowed. In order to achieve a pass at least 26 questions must be answered correctly – in other words 65% or more of the questions asked.

The examination is “closed book” – in other words you can take no notes or documentation of any kind into the exam room with you.

The first tip for doing well at Foundation level is therefore to do your homework.

Study this course material, read the manuals and the “little ITIL” books and practise in the exam simulator.

Once you are regularly scoring in excess of 30 out of 40 in the exam simulator you can be reasonably sure that you will pass the real Foundation exam.

The Foundation Exam

Assuming that you have done all your preparation and that you have all the required knowledge, the next step is to focus on the examination itself.

As we have seen, you are allowed 60 minutes to answer 40 questions. The vast majority of people finish the exam well within this time – so Tip 1 – Don’t feel under time pressure. Remain cool and stick to your game plan. You have plenty of time.

The Foundation questions can be categorised into three types:

• Those that you find really easy and can answer without too much thought. Do be careful, though, with the exact semantics of some of the questions and make sure that you have properly read the questions.

• Those that you probably know the answer to but where the wording of the question needs some digesting. There are a lot of “negative” type questions so do be careful over these.

• Those where, even though you understand the question, you are not 100% sure of the answer.

A good strategy is therefore to do the exam paper in three passes. This is something that you have not had the luxury of doing in the exam simulator.

When you are first presented with the paper, work your way through, answering all the questions where the right answer is immediately obvious to you. Avoid any temptation to deliberate too long over any question. If in doubt move on to the next one.

This first pass will ensure that – in the unlikely event that you do run out of time – at least you will have answered all the easy questions. For anybody who has done the right level of background study and preparation this alone will probably be enough to secure a pass.

Now go back to the beginning of the paper and start work on all the second category of questions.

Once you have worked out what a question means, if you know the answer then answer the question, otherwise move on to the next unanswered question.

This time when you get to the end of the paper you should have answered all the questions that you understand and whose answers you are confident of – hopefully by now you will have answered the majority of the 40 questions.

Now it’s time for the third and final pass. Go back to the start of the paper again and consider each of the questions that you have not yet answered.

At this stage you may need to be careful over the timing – what you don’t want to do is run out of time and leave any questions unanswered – even if you have to guess the answers.

Marks don’t get subtracted for wrong answers so if you have 4 or 5 questions that you just don’t know the answer to – make guesses – by random chance you will probably get at least one of them right.

So, count up how many questions still remain unanswered and allocate a maximum time for each one so that you will just get them all answered. If you have 5 unanswered questions and 5 minutes left – don’t spend more than one minute on any one question. Again – never submit a paper unless all the questions have been answered.

One final tip – be very careful about changing any of your answers. Experience has shown that about two thirds of changes that candidates make to their answers are in fact changing a correct answer to an incorrect one. Often your first instincts are the right ones.

Now one last exercise.

ITIL is nothing if not full of acronyms – and many of the questions in the Foundation exam assume that you are familiar with most of them.

So it is worthwhile running through the list of acronyms given in the little ITIL books and the manuals themselves and memorising the less obvious ones.

In the meantime try this little test. Use your mouse to drag and drop the right words into position to correctly interpret these acronyms.
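As a rough illustration of the arithmetic in this lesson, the pass mark and the time budget for the third pass could be sketched in Python as follows. The function names are hypothetical and chosen only for this example; the figures of 40 questions and a pass mark of 26 are the ones quoted in the lesson.

```python
# Illustrative sketch only. Function names are hypothetical; the figures
# come from the lesson text (40 questions, 26 correct answers to pass).

TOTAL_QUESTIONS = 40
PASS_MARK = 26


def is_pass(correct_answers: int) -> bool:
    """Return True when the score reaches the pass mark."""
    if not 0 <= correct_answers <= TOTAL_QUESTIONS:
        raise ValueError("score must be between 0 and 40")
    return correct_answers >= PASS_MARK


def seconds_per_question(unanswered: int, minutes_left: float) -> float:
    """Maximum time to allow each remaining question in the final pass."""
    if unanswered <= 0:
        raise ValueError("there must be at least one unanswered question")
    if minutes_left < 0:
        raise ValueError("minutes_left must not be negative")
    return minutes_left * 60 / unanswered


if __name__ == "__main__":
    # 5 unanswered questions with 5 minutes left: one minute each.
    print(seconds_per_question(5, 5))
    print(is_pass(26))
```

The same calculation works at any point in the exam: divide the time remaining by the number of questions still unanswered, and treat the result as a ceiling rather than a target.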

Acronyms

ACD Automatic Call Distribution

AMDB Availability Management Database

ASP Application Service Provider

AST Agreed Service Time

ATM Automatic Teller Machine

BCM Business Continuity Management

BCP Business Continuity Plan(ning)

BIA Business Impact Analysis

BITA Business IT Alignment

BQF British Quality Foundation

BRM Business Relationship Management

BSC Balanced Scorecard

BSi British Standards Institution

C&CM Configuration and Change Management

CAB Change Advisory Board

CAB/EC Change Advisory Board/Emergency Committee

CASE Computer-Aided Systems Engineering

CCTA Central Computer and Telecommunications Agency

CDB Capacity Database

CFIA Component Failure Impact Analysis

CI Configuration Item

CIA Confidentiality, Integrity and Availability

CMDB Configuration Management Database

CMM Capability Maturity Model

COP Code of Practice


CRAMM CCTA Risk Analysis & Management Method

CRM Customer Relationship Management

CSBC Computer Services Business Code

CSF Critical Success Factor

CSS Customer Satisfaction Survey

CTI Computer Telephony Integration

DBMS DataBase Management System

DHS Definitive Hardware Store

DHL Definitive Hardware Library

DISC Delivering Information Systems to Customers

DR Disaster Recovery

DRP Disaster Recovery Plan(ning)

DSL Definitive Software Library

DT Down Time

EDI Electronic Data Interchange

EFQM European Foundation for Quality Management

EUA End User Availability

EUDT End User Down Time

EUPT End User Processing Time

EXIN Examen Instituut (Dutch Examination Board)

FSC Forward Schedule of Change

FTA Fault Tree Analysis

GUI Graphical User Interface


HD Help Desk

ICAM Integrated Computer-Aided Manufacturing

ICT Information and Communication Technology(ies)

ID Identification

IDEF ICAM Definition

IP Internet Protocol

IR Incident Report

IS Information System(s) / Information Service(s)

ISEB Information Systems Examination Board

ISO International Standards Organisation

ISP Internet Service Provider

IT Information Technology

ITAMM IT Availability Metrics Model

ITIL Information Technology Infrastructure Library

ITSC IT Service Continuity

ITSCM IT Service Continuity Management

ITSM IT Service Management

itSMF IT Service Management Forum

IVR Interactive Voice Response

JD Job Description

KE Known Error

KEL Known Error Log

KER Known Error Record


KPI Key Performance Indicator

KSF Key Success Factor

LAN Local Area Network

MBNQA Malcolm Baldrige National Quality Award

MIM Major Incident Management

MTBF Mean Time Between Failures

MTBSI Mean Time Between System Incidents

MTTR Mean Time To Repair

OGC Office of Government Commerce

OLA Operational Level Agreement

OLTP On-line Transaction Processing

PAD Packet Assembly/Disassembly device

PC Personal Computer

PER Project Evaluation Review

PIR Post-Implementation Review

PM Problem Management

PKI Public Key Infrastructure

PR Problem Record

PRINCE2 Projects IN Controlled Environments

PSA Projected Service Availability

QA Quality Assurance

QMS Quality Management System


RAG Red-Amber-Green

RAID Redundant Array of Inexpensive Disks

RCM Resource Capacity Management

RFC Request For Change

RFS Request For Service (Service Request)

ROCE Return On Capital Employed

ROI Return On Investment

RWO Real World Object

SAC/D Service Acceptance Certificate/Document

SCI Software Configuration Item

SCM Software Configuration Management

SIP Service Improvement Programme

SLA Service Level Agreement

SLAM SLA Monitoring

SLM Service Level Management

SLO Service Level Objective

SLR Service Level Requirement

SMO Service Maintenance Objectives

SMT Service Management Team

SOA System Outage Analysis

SPICE Software Process Improvement Capability dEtermination

SPOF Single Point of Failure

SQP Service Quality Plan

SSADM Structured Systems Analysis and Design Method


TCO Total Cost of Ownership

TOP Technical Observation Post

TOR Terms of Reference

TP Transaction Processing

TQM Total Quality Management

UPS Uninterruptible Power Supply

VBF Vital Business Function

VOIP Voice Over Internet Protocol

VSI Virtual Storage Interrupt

WAN Wide Area Network

WFD Work Flow Diagram

WIP Work in Progress

Glossary of Terms

Absorbed overhead Overhead which, by means of absorption rates, is included in costs of
specific products or saleable services, in a given period of time. Under-
or over-absorbed overhead: the difference between overhead cost
incurred and overhead cost absorbed. It may be split into its two
constituent parts for control purposes.

Absorption costing A principle whereby fixed as well as variable costs are allotted to cost
units and total overheads are absorbed according to activity level. The
term may be applied where production costs only, or costs of all
functions are so allotted.

Action lists Defined actions, allocated to recovery teams and individuals, within a
phase of a plan. These are supported by reference data.

Alert phase The first phase of a business continuity plan in which initial emergency
procedures and damage assessments are activated.

Allocated cost A cost that can be directly identified with a business unit.

Apportioned cost A cost that is shared by a number of business units (an indirect cost).
This cost must be shared out between these units on an equitable basis.

Asset Component of a business process. Assets can include people,
accommodation, computer systems, networks, paper records, fax
machines, etc.

Asynchronous/synchronous In a communications sense, the ability to transmit each character as a
self-contained unit of information, without additional timing information.
This method of transmitting data is sometimes called start/stop.
Synchronous working involves the use of timing information to allow
transmission of data, which is normally done in blocks. Synchronous
transmission is usually more efficient than the asynchronous method.

Availability Ability of a component or service to perform its required function at a
stated instant or over a stated period of time. It is usually expressed as
the availability ratio, i.e. the proportion of time that the service is
actually available for use by the Customers within the agreed service
hours.

Balanced Scorecard An aid to organisational performance management. It helps to focus,
not only on the financial targets but also on the internal processes,
Customers and learning and growth issues.

Baseline A snapshot or a position which is recorded. Although the position may
be updated later, the baseline remains unchanged and available as a
reference of the original state and as a comparison against the current
position (PRINCE 2).

Bridge Equipment and techniques used to match circuits to each other ensuring
minimum transmission impairment.


BS7799 The British standard for Information Security Management. This


standard provides a comprehensive set of controls comprising best
practices in information security.

Build The final stage in producing a usable configuration. The process involves
taking one or more input Configuration Items and processing them
(building them) to create one or more output Configuration Items, e.g.
software compile and load.

Business function A business unit within an organisation, e.g. a
department, division, branch.

Business process A group of business activities undertaken by an
organisation in pursuit of a common goal. Typical business processes
include receiving orders, marketing services, selling products, delivering
services, distributing products, invoicing for services, accounting for
money received. A business process usually depends upon several business
functions for support, e.g. IT, personnel, accommodation. A business
process rarely operates in isolation, i.e. other business processes will
depend on it and it will depend on other processes.

Business recovery objective The desired time within which business
processes should be recovered, and the minimum staff, assets and services
required within this time.

Business recovery plan framework A template business recovery plan (or set
of plans) produced to allow the structure and proposed contents to be
agreed before the detailed business recovery plan is produced.

Business recovery plans Documents describing the roles, responsibilities
and actions necessary to resume business processes following a business
disruption.

Business recovery team A defined group of personnel with a defined role and
subordinate range of actions to facilitate recovery of a business function
or process.

Business unit A segment of the business entity by which both revenues are
received and expenditure is caused or controlled, such revenues and
expenditure being used to evaluate segmental performance.

Capital Costs Typically those costs applying to the physical (substantial) assets of the
organisation. Traditionally this was the accommodation and machinery
necessary to produce the enterprise's product. Capital Costs are the
purchase or major enhancement of fixed assets, for example computer
equipment (building and plant) and are often also referred to as 'one-
off' costs.

Capital investment appraisal The process of evaluating proposed investment
in specific fixed assets and the benefits to be obtained from their
acquisition. The techniques used in the evaluation can be summarised as
non-discounting methods (i.e. simple pay-back), return on capital employed
and discounted cash flow methods (i.e. yield, net present value and
discounted pay-back).

Capitalisation The process of identifying major expenditure as Capital,
whether there is a substantial asset or not, to reduce the impact on the
current financial year of such expenditure. The most common item for this
to be applied to is software, whether developed in-house or purchased.

Category Classification of a group of Configuration Items, Change documents
or problems.

Change The addition, modification or removal of approved, supported or
baselined hardware, network, software, application, environment,
system, desktop build or associated documentation.

Change Advisory Board A group of people who can give expert advice to
Change Management on the implementation of Changes. This board is likely to
be made up of representatives from all areas within IT and representatives
from business units.

Change authority A group that is given the authority to approve Change, e.g. by the
project board. Sometimes referred to as the Configuration Board.

Change control The procedure to ensure that all Changes are controlled, including the
submission, analysis, decision making, approval, implementation and
post-implementation of the Change.

Change document Request for Change, Change control form, Change order, Change
record.

Change history Auditable information that records, for example, what was done, when it
was done, by whom and why.

Change log A log of Requests for Change raised during the project, showing
information on each Change, its evaluation, what decisions have been
made and its current status, e.g. Raised, Reviewed, Approved,
Implemented, Closed.

Change Management Process of controlling Changes to the infrastructure or
any aspect of services, in a controlled manner, enabling approved Changes
with minimum disruption.

Change record A record containing details of which CIs are affected by an authorised
Change (planned or implemented) and how.

Charging The process of establishing charges in respect of business units,
and raising the relevant invoices for recovery from customers.

Classification Process of formally grouping Configuration Items by type
e.g. software, hardware, documentation, environment, application.
Process of formally identifying Changes by type e.g. project scope
change request, validation change request, infrastructure change
request. Process of formally identifying incidents, problems and known
errors by origin, symptoms and cause.

Closure When the Customer is satisfied that an Incident has been resolved.

Cold stand-by See 'Gradual Recovery'.

Command, control and communications The processes by which an organisation
retains overall co-ordination of its recovery effort during invocation of
business recovery plans.

Computer Aided Systems Engineering A software tool for programmers. It
provides help in the planning, analysis, design and documentation of
computer software.

Configuration Baseline (see also Baseline) Configuration of a product or
system established at a specific point in time, which captures both the
structure and details of the product or system, and enables that product or
system to be rebuilt at a later date.

Configuration control Activities comprising the control of Changes to
Configuration Items after formally establishing its configuration
documents. It includes the evaluation, co-ordination, approval or rejection
of Changes. The implementation of Changes includes changes, deviations and
waivers that impact on the configuration.

Configuration documentation Documents that define requirements, system
design, build, production, and verification for a configuration item.

Configuration identification Activities that determine the product
structure, the selection of Configuration Items, and the documentation of
the Configuration Item's physical and functional characteristics including
interfaces and subsequent Changes. It includes the allocation of
identification characters or numbers to the Configuration Items and their
documents. It also includes the unique numbering of configuration control
forms associated with Changes and Problems.

Configuration Item (CI) Component of an infrastructure - or an item, such
as a Request for Change, associated with an infrastructure - which is (or
is to be) under the control of Configuration Management. CIs may vary
widely in complexity, size and type - from an entire system (including all
hardware, software and documentation) to a single module or a minor
hardware component.

Configuration Management The process of identifying and defining the
Configuration Items in a system, recording and reporting the status of
Configuration Items and Requests for Change, and verifying the completeness
and correctness of configuration items.

Configuration Management Database A database which contains all relevant
details of each CI and details of the important relationships between CIs.

Configuration Management plan A document setting out the organisation and
procedures for the Configuration Management of a specific product, project,
system, support group or service.

Configuration Management Tool (CM Tool) A software product providing
automatic support for Change, Configuration or version control.

Configuration Structure A hierarchy of all the CIs that comprise a
configuration.

Contingency Planning Planning to address unwanted occurrences that may
happen at a later time. Traditionally, the term has been used to refer to
planning for the recovery of IT systems rather than entire business
processes.

Cost The amount of expenditure (actual or notional) incurred on, or
attributable to, a specific activity or business unit.

Cost effectiveness Ensuring that there is a proper balance between the quality of service
on the one side and expenditure on the other. Any investment that
increases the costs of providing IT services should always result in
enhancement to service quality or quantity.

Cost Management All the procedures, tasks and deliverables that are needed to fulfil an
organisation's costing and charging requirements.

Cost unit In the context of CSBC the cost unit is a functional cost unit which
establishes standard cost per workload element of activity, based on
calculated activity ratios converted to cost ratios.

Costing The process of identifying the costs of the business and of breaking
them down and relating them to the various activities of the
organisation.

Countermeasure A check or restraint on the service designed to enhance
security by reducing the risk of an attack (by reducing either the threat
or the vulnerability), reducing the Impact of an attack, detecting the
occurrence of an attack and/or assisting in the recovery from an attack.

Crisis management The processes by which an organisation manages the wider
impact of a disaster, such as adverse media coverage.

Customer Owner of the service; usually the Customer has responsibility for the
cost of the service, either directly through charging or indirectly in
terms of demonstrable business need. It is the Customer who will define
the service requirements.

Data transfer time The length of time taken for a block or sector of data to be read from or
written to an I/O device, such as a disk or tape.

Definitive Software Library (DSL) The library in which the definitive
authorised versions of all software CIs are stored and protected. It is a
physical library or storage repository where master copies of software
versions are placed. This one logical storage area may in reality consist
of one or more physical software libraries or filestores. They should be
separate from development and test filestore areas. The DSL may also
include a physical store to hold master copies of bought-in software, e.g.
fire-proof safe. Only authorised software should be accepted into the DSL,
strictly controlled by Change and Release Management.
The DSL exists not directly because of the needs of the Configuration
Management process, but as a common base for the Release
Management and Configuration Management processes.

Delta Release A release that includes only those CIs within the Release unit that have
actually changed or are new since the last full or Delta Release. For
example, if the Release unit is the program, a Delta Release contains
only those modules that have changed, or are new, since the last full
release of the program or the last Delta Release of the modules - see
also 'Full Release'.

Dependency The reliance, either direct or indirect, of one process or activity upon
another.

Depreciation The loss in value of an asset due to its use and/or the passage of time.
The annual depreciation charge in accounts represents the amount of
capital assets used up in the accounting period. It is charged in the cost
accounts to ensure that the cost of capital equipment is reflected in the
unit costs of the services provided using the equipment. There are
various methods of calculating depreciation for the period, but the
Treasury usually recommends the use of current cost asset valuation as
the basis for the depreciation charge.

Differential charging Charging business customers different rates for the
same work, typically to dampen demand or to generate revenue for spare
capacity. This can also be used to encourage off-peak or night time
running.

Direct cost A cost that is incurred for, and can be traced in full to, a product,
service, cost centre or department. This is an allocated cost. Direct
costs are direct materials, direct wages and direct expenses.

Disaster recovery planning A series of processes that focus only upon the
recovery processes, principally in response to physical disasters, that are
contained within BCM.

Discounted cash flow An evaluation of the future net cash flows generated
by a capital project by discounting them to their present-day value. The
two methods most commonly used are:

• Yield method, for which the calculation determines the internal
  rate of return (IRR) in the form of a percentage
• Net present value (NPV) method, in which the discount rate is
  chosen and the answer is a sum of money.
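
The two discounted cash flow methods can be sketched as follows. This is a
minimal illustration, not a prescribed costing procedure: the function
names, the assumption that cash flows arrive at yearly intervals starting
now (year 0), the bisection search used to find the yield, and the sample
cash flows are all assumptions made for the example.

```python
def npv(rate: float, cash_flows: list[float]) -> float:
    """Net present value: the discount rate is chosen, the answer is a sum of money.

    cash_flows[0] occurs now (year 0); cash_flows[t] occurs at the end of year t.
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


def irr(cash_flows: list[float], low: float = -0.99, high: float = 1.0,
        tol: float = 1e-9) -> float:
    """Yield method: find the rate at which the NPV is zero, by bisection.

    Assumes the NPV changes sign exactly once between `low` and `high`.
    """
    f_low = npv(low, cash_flows)
    if f_low * npv(high, cash_flows) > 0:
        raise ValueError("NPV does not change sign between low and high")
    while high - low > tol:
        mid = (low + high) / 2
        f_mid = npv(mid, cash_flows)
        if f_low * f_mid <= 0:
            high = mid            # the root lies in the lower half
        else:
            low, f_low = mid, f_mid  # the root lies in the upper half
    return (low + high) / 2


# Illustrative project: 1000 spent now, 500 received at the end of each of 3 years.
flows = [-1000.0, 500.0, 500.0, 500.0]
print(round(npv(0.10, flows), 2))   # 243.43 at a chosen discount rate of 10%
print(f"{irr(flows):.2%}")          # the yield, expressed as a percentage
```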

Discounting The offering to business customers of reduced rates for the use of off-
peak resources (see also Surcharging).

Disk cache controller Memory that is used to store blocks of data that have
been read from the disk devices connected to them. If a subsequent I/O
requires a record that is still resident in the cache memory, it will be
picked up from there, thus saving another physical I/O.

Duplex (full and half) Full duplex line/channel allows simultaneous
transmission in both directions. Half duplex line/channel is capable of
transmitting in both directions, but only in one direction at a time.

Echoing A reflection of the transmitted signal from the receiving end, a visual
method of error detection in which the signal from the originating device
is looped back to that device so that it can be displayed.

Elements of cost The constituent parts of costs according to the factors upon which
expenditure is incurred viz., materials, labour and expenses.

End User The person who uses the services on a day-to-day basis.

Environment A collection of hardware, software, network communications and
procedures that work together to provide a discrete type of computer
service. There may be one or more environments on a physical platform,
e.g. test, production. Environments have unique features and
characteristics that dictate how they are administered in similar, yet
diverse, manners.

Expert User In some organisations it is common to use 'expert' Users
(commonly known as Super or Expert Users) to deal with first-line support
problems and queries. This is typically in specific application areas, or
geographical locations, where there is not the requirement for full-time
support staff. This valuable resource, however, needs to be carefully
co-ordinated and utilised.

External Target One of the measures, against which a delivered IT service is compared,
expressed in terms of the customer's business.

Financial year An accounting period covering 12 consecutive months. In the
public sector this financial year generally coincides with the fiscal year
which runs from 1 April to 31 March.

Forward Schedule of Changes Contains details of all the Changes approved
for implementation and their proposed implementation dates. It should be
agreed with the Customers and the business, Service Level Management, the
Service Desk and Availability Management. Once agreed, the Service Desk
should communicate to the User community at large any planned
additional downtime arising from implementing the Changes, using the
most effective methods available.

Full cost The total cost of all the resources used in supplying a service i.e. the
sum of the direct costs of producing the output, a proportional share of
overhead costs and any selling and distribution expenses. Both cash
costs and notional (non-cash) costs should be included, including the
cost of capital.
See also 'Total Cost of Ownership'

Full Release All components of the Release unit are built, tested, distributed and
implemented together - see also 'Delta Release'.

Gateway Equipment which is used to interface networks so that a terminal on
one network can communicate with services or a terminal on another.

Gradual Recovery Previously called 'Cold stand-by', this is applicable to organisations that
do not need immediate restoration of business processes and can
function for a period of up to 72 hours, or longer, without a re-
establishment of full IT facilities. This may include the provision of
empty accommodation fully equipped with power, environmental
controls and local network cabling infrastructure, telecommunications
connections, and available in a disaster situation for an organisation to
install its own computer equipment.

Hard charging Descriptive of a situation where, within an organisation, actual funds are
transferred from the customer to the IT organisation in payment for the
delivery of IT services.

Hard fault The situation in a virtual memory system when the required page of
code or data, which a program was using, has been redeployed by the
operating system for some other purpose. This means that another
piece of memory must be found to accommodate the code or data, and
will involve physical reading/writing of pages to the page file.

Host A host computer comprises the central hardware and software resources
of a computer complex, e.g. CPU, memory, channels, disk and magnetic
tape I/O subsystems plus operating and applications software. The term
is used to denote all non-network items.

Hot stand-by See 'Immediate Recovery'.

ICT The convergence of Information Technology, Telecommunications and
Data Networking Technologies into a single technology.

Immediate Recovery Previously called 'Hot stand-by', provides for the
immediate restoration of services following any irrecoverable incident. It
is important to distinguish between the previous definition of 'hot
stand-by' and 'immediate recovery'. Hot stand-by typically referred to
availability of services within a short timescale such as 2 or 4 hours
whereas immediate recovery implies the instant availability of services.

Impact Measure of the business criticality of an Incident, Problem or
Request for Change. Often equal to the extent of a distortion of agreed or
expected Service Levels.

Impact analysis The identification of critical business processes, and the
potential damage or loss that may be caused to the organisation resulting
from a disruption to those processes. Business impact analysis identifies:

• the form the loss or damage will take
• how that degree of damage or loss is likely to escalate with time
  following an incident
• the minimum staffing, facilities and services needed to enable business
  processes to continue to operate at a minimum acceptable level
• the time within which they should be recovered.

The time within which full recovery of the business processes is to be
achieved is also identified.

Impact scenario Description of the type of impact on the business that could follow a
business disruption. Usually related to a business process and will
always refer to a period of time, e.g. customer services will be unable to
operate for two days.

Incident Any event which is not part of the standard operation of a service and
which causes, or may cause, an interruption to, or a reduction in, the
quality of that service.

Indirect cost A cost incurred in the course of making a product, providing a
service or running a cost centre or department, but which cannot be traced
directly and in full to the product, service or department, because it has
been incurred for a number of cost centres or cost units. These costs
are apportioned to cost centres/cost units. Indirect costs are also
referred to as overheads.

Informed Customer An individual, team or group with functional
responsibility within an organisation for ensuring that spend on IS/IT is
directed to best effect, i.e. that the business is receiving value for
money and continues to achieve the most beneficial outcome. In order to
fulfil its role the 'Informed' Customer function must gain clarity of
vision in relation to the business plans and assure that suitable
strategies are devised and maintained for achieving business goals.
The 'Informed' Customer function ensures that the needs of the
business are effectively translated into a business requirements
specification, that IT investment is both efficiently and economically
directed, and that progress towards effective business solutions is
monitored. The 'Informed' Customer should play an active role in the
procurement process, e.g. in relation to business case development, and
also in ensuring that the services and solutions obtained are used
effectively within the organisation to achieve maximum business
benefits. The term is often used in relation to the outsourcing of IT/IS.
Sometimes also called 'Intelligent Customer'.

Interface Physical or functional interaction at the boundary between
Configuration Items.

Intermediate Recovery Previously called 'Warm stand-by', typically involves
the re-establishment of the critical systems and services within a 24 to 72
hour period, and is used by organisations that need to recover IT
facilities within a predetermined time to prevent impacts to the business
process.

Internal target One of the measures against which supporting processes for the IT
service are compared. Usually expressed in technical terms relating
directly to the underpinning service being measured.

Invocation (of business recovery plans) Putting business recovery plans
into operation after a business disruption.

Invocation (of stand-by arrangements) Putting stand-by arrangements into
operation as part of business recovery activities.

Invocation and recovery phase The second phase of a business recovery plan.

ISO9001 The internationally accepted set of standards concerning quality
management systems.

ITIL The OGC IT Infrastructure Library - a set of guides on the management
and provision of operational IT services.

Known Error An Incident or Problem for which the root cause is known and for which
a temporary Work-around or a permanent alternative has been
identified. If a business case exists, an RFC will be raised, but, in any
event, it remains a Known Error unless it is permanently fixed by a
Change.

Latency The elapsed time from the moment when a seek was completed on a
disk device to the point when the required data is positioned under the
read/write heads. It is normally defined by manufacturers as being half
the disk rotation time.
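
The half-rotation rule above can be checked with simple arithmetic. In this
minimal sketch the function name and the 7200 rpm spindle speed are
assumptions chosen for illustration only.

```python
def average_latency_ms(rpm: float) -> float:
    """Average rotational latency, taken as half of one full disk rotation."""
    if rpm <= 0:
        raise ValueError("rpm must be positive")
    rotation_time_ms = 60_000 / rpm  # milliseconds for one full rotation
    return rotation_time_ms / 2


# Illustrative drive spinning at 7200 rpm: one rotation takes about 8.33 ms.
print(round(average_latency_ms(7200), 2))  # 4.17
```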

Lifecycle A series of states, connected by allowable transitions. The
lifecycle represents an approval process for Configuration Items, Problem
Reports and Change documents.

Logical I/O A read or write request by a program. That request may, or may not,
necessitate a physical I/O. For example, on a read request the required
record may already be in a memory buffer and therefore a physical I/O
is not necessary.

Marginal Cost The cost of providing the service now, based upon the investment
already made.

Maturity level/Milestone The degree to which BCM activities and processes
have become standard business practice within an organisation.

Metric Measurable element of a service process or function.

Operational Costs Those costs resulting from the day-to-day running of the IT Services
section, e.g. staff costs, hardware maintenance and electricity, and
relating to repeating payments whose effects can be measured within a
short timeframe, usually less than the 12-month financial year.

Operational Level Agreement An internal agreement covering the delivery of
services which support the IT organisation in their delivery of services.

Opportunity cost (or true cost) The value of a benefit sacrificed in favour
of an alternative course of action. That is the cost of using resources in
a particular operation expressed in terms of foregoing the benefit that
could be derived from the best alternative use of those resources.

Outsourcing The process by which functions performed by the organisation
are contracted out for operation, on the organisation's behalf, by third
parties.

Overheads The total of indirect materials, wages and expenses.

Packet assembly/disassembly device A device that permits terminals, which
do not have an interface suitable for direct connection to a packet
switched network, to access such a network. A PAD converts data to/from
packets and handles call set-up and addressing.

Page fault A program interruption that occurs when a page that is marked
'not in real memory' is referred to by an active page.

Paging The I/O necessary to read and write to and from the paging disks: real
(not virtual) memory is needed to process data. With insufficient real
memory, the operating system writes old pages to disk, and reads new
pages from disk, so that the required data and instructions are in real
memory.

PD0005 Alternative title for the BSI publication 'A Code of Practice for IT Service
Management'.

Percentage utilisation The amount of time that a hardware device is busy
over a given period of time. For example, if the CPU is busy for 1800
seconds in a one hour period, its utilisation is said to be 50%.
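
The CPU example above works out as follows. This is a minimal sketch; the
function and parameter names are assumptions made for the illustration.

```python
def percentage_utilisation(busy_seconds: float, period_seconds: float) -> float:
    """Share of the measurement period during which the device was busy, in percent."""
    if period_seconds <= 0:
        raise ValueError("period_seconds must be positive")
    return 100 * busy_seconds / period_seconds


# The glossary's own example: a CPU busy for 1800 seconds in a one hour period.
print(percentage_utilisation(1800, 3600))  # 50.0
```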

Phantom line error A communications error reported by a computer system
that is not detected by network monitoring equipment. It is often caused by
changes to the circuits and network equipment (e.g. re-routing circuits
at the physical level on a backbone network) while data communications
is in progress.

Physical I/O A read or write request from a program has necessitated a physical read
or write operation on an I/O device.

Prime cost The total cost of direct materials, direct labour and direct expenses. The
term prime cost is commonly restricted to direct production costs only
and so does not customarily include direct costs of marketing or
research and development.

PRINCE2 The standard UK government method for project management.

Priority Sequence in which an Incident or Problem needs to be resolved,
based on impact and urgency.
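
The glossary does not say how impact and urgency are combined, so the
following is only one possible sketch. The three-level scale, the additive
scoring and the 1 to 5 priority range are all hypothetical choices made for
illustration; an organisation would agree its own scheme.

```python
# Hypothetical three-level scale shared by impact and urgency, highest first.
LEVELS = ("high", "medium", "low")


def priority(impact: str, urgency: str) -> int:
    """Return a priority from 1 (resolve first) to 5 (resolve last).

    Adds the positions of impact and urgency on the LEVELS scale, so that
    high impact with high urgency gives 1 and low with low gives 5.
    """
    return LEVELS.index(impact) + LEVELS.index(urgency) + 1


# Order a few illustrative Incidents by the sequence in which to resolve them.
incidents = {"mail outage": ("high", "high"),
             "printer fault": ("low", "medium"),
             "slow report": ("medium", "low")}
for name in sorted(incidents, key=lambda n: priority(*incidents[n])):
    print(priority(*incidents[name]), name)
```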

Problem Unknown underlying cause of one or more Incidents.

Process A connected series of actions, activities, Changes etc, performed
by agents with the intent of satisfying a purpose or achieving a goal.

Process Control The process of planning and regulating, with the objective of performing
the process in an effective and efficient way.

Programme A collection of activities and projects that collectively
implement a new corporate requirement or function.

Queuing time Queuing time is incurred when the device, which a program wishes to
use, is already busy. The program therefore has to wait in a queue to
obtain service from that device.

RAID Redundant Array of Inexpensive Disks - a mechanism for providing data
resilience for computer systems using mirrored arrays of magnetic
disks. Different levels of RAID can be applied to provide for greater
resilience.

Reference data Information that supports the plans and action lists, such as names and
addresses or inventories, which is indexed within the plan.

Release A collection of new and/or changed CIs which are tested and introduced
into the live environment together.

Request for Form, or screen, used to record details of a request for a change to any
Change (RFC) CI within an infrastructure or to procedures and items associated with
the infrastructure.

Resolution Action which will resolve an Incident. This may be a Work-around.

Resource cost The amount of machine resource that a given task consumes. This
resource is usually expressed in seconds for the CPU or the number of
I/Os for a disk or tape device.

Resource profile The total resource costs that are consumed by an individual online
transaction, batch job or program. It is usually expressed in terms of
CPU seconds, number of I/Os and memory usage.

Resource unit costs Resource units may be calculated on a standard cost
basis to identify the expected (standard) cost for using a particular
resource. Because computer resources come in many shapes and forms, units
have to be established by logical groupings. Examples are:
a) CPU time or instructions
b) disk I/Os
c) print lines
d) communication transactions.

Resources The IT Services section needs to provide the customers with the
required services. The resources are typically computer and related
equipment, software, facilities or organisational (people).

Return to normal phase The phase within a business recovery plan which
re-establishes normal operations.

Risk A measure of the exposure to which an organisation may be subjected.
This is a combination of the likelihood of a business disruption occurring
and the possible loss that may result from such business disruption.

Risk Analysis The identification and assessment of the level (measure) of the risks
calculated from the assessed values of assets and the assessed levels of
threats to, and vulnerabilities of, those assets.

Risk Management The identification, selection and adoption of
countermeasures justified by the identified risks to assets in terms of
their potential impact upon services if failure occurs, and the reduction
of those risks to an acceptable level.

Risk reduction measure Measures taken to reduce the likelihood or
consequences of a business disruption occurring (as opposed to planning to
recover after a disruption).

Role A set of responsibilities, activities and authorisations.

Roll in roll out (RIRO) Used on some systems to describe swapping.

Rotational Position Sensing A facility which is employed on most mainframes
and some minicomputers. When a seek has been initiated the system can free
the path from a disk drive to a controller for use by another disk drive,
while it is waiting for the required data to come under the read/write
heads (latency). This facility usually improves the overall performance of
the I/O subsystem.

Seek Time Occurs when the disk read/write heads are not positioned on the
required track. It describes the elapsed time taken to move heads to the
right track.

Self-insurance A decision to bear the losses that could result from a disruption to the
business as opposed to taking insurance cover on the risk.

Service One or more IT systems which enable a business process.

Service achievement The actual service levels delivered by the IT
organisation to a customer within a defined life-span.

Service Catalogue Written statement of IT services, default levels and options.

Service Desk The single point of contact within the IT organisation for users of IT
services.

Service Improvement Programme A formal project undertaken within an
organisation to identify and introduce measurable improvements within a
specified work area or work process.

Service Level Agreement Written agreement between a service provider and
the Customer(s) that documents agreed Service Levels for a Service.

Service Level Management The process of defining, agreeing, documenting and
managing the levels of customer IT service that are required and cost
justified.

Service Management Management of Services to meet the Customer's
requirements.

Service provider Third-party organisation supplying services or products to customers.

Service quality plan The written plan and specification of internal targets
designed to guarantee the agreed service levels.

Service Request Every Incident not being a failure in the IT Infrastructure.

Services The deliverables of the IT Services organisation as perceived by
the Customers; the services do not consist merely of making computer
resources available for customers to use.

Simulation modelling Using a program to simulate computer processing by
describing in detail the path of a job or transaction. It can give
extremely accurate results. Unfortunately, it demands a great deal of time
and effort from the modeller. It is most beneficial in extremely large or
time-critical systems where the margin for error is very small.

Soft fault The situation in a virtual memory system when the operating system
has detected that a page of code or data was due to be reused, i.e. it is
on a list of 'free' pages, but it is still actually in memory. It is now
rescued and put back into service.

Software Configuration Item (SCI) As 'Configuration Item', excluding
hardware and services.

Software Environment Software used to support the application such as
operating system, database management system, development tools, compilers,
and application software.

Software Library A controlled collection of SCIs designated to keep those with like status
and type together and distinctly segregated, to aid in development,
operation and maintenance.

Software work unit Software work is a generic term devised to represent a
common base on which all calculations for workload usage and IT resource
capacity are then based. A unit of software work for I/O type equipment
equals the number of bytes transferred; and for central processors it is
based on the product of power and CPU-time.
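
The two measures described above can be sketched in a few lines of Python.
This is an illustration only: the function names are hypothetical, and the
unit used for processor 'power' (here an arbitrary relative rating) is an
assumption, since the definition does not specify one.

```python
def io_work_units(bytes_transferred: int) -> int:
    """Software work for I/O type equipment: the number of bytes transferred."""
    return bytes_transferred


def cpu_work_units(power: float, cpu_seconds: float) -> float:
    """Software work for a central processor: power multiplied by CPU-time.

    'power' is an assumed relative rating; the glossary does not name a unit.
    """
    return power * cpu_seconds


# Example: a job that transfers 4096 bytes and uses 12 CPU-seconds on a
# processor with an assumed power rating of 50.
print(io_work_units(4096))
print(cpu_work_units(50.0, 12.0))
```

Expressing both device types in one common unit is what allows workload usage
to be compared against resource capacity.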

Solid state devices Memory devices that are made to appear as if they are disk devices.
The advantages of such devices are that the service times are much
faster than real disks since there is no seek time or latency. The main
disadvantage is that they are much more expensive.

Specsheet Specifies in detail what the customer wants (external) and what
consequences this has for the service provider (internal) such as
required resources and skills.

Standard cost A pre-determined calculation of how much costs should be under
specified working conditions. It is built up from an assessment of the
value of cost elements and correlates technical specifications and the
quantification of materials, labour and other costs to the prices and/or
wages expected to apply during the period in which the standard cost is
intended to be used. Its main purposes are to provide bases for control
through variance accounting, for the valuation of work in progress and
for fixing selling prices.

Standard costing A technique which uses standards for costs and revenues for the
purposes of control through variance analysis.

Stand-by arrangements Arrangements to have available assets which have been
identified as replacements should primary assets be unavailable following a
business disruption. Typically, these include accommodation, IT systems and
networks, telecommunications and sometimes people.

Storage occupancy A defined measurement unit that is used for storage type
equipment to measure usage. The unit value equals the number of bytes stored.

Super User In some organisations it is common to use 'expert' Users (commonly
known as Super or Expert Users) to deal with first-line support problems
and queries. This is typically in specific application areas, or
geographical locations, where there is not the requirement for full-time
support staff. This valuable resource however needs to be carefully
co-ordinated and utilised.

Surcharging Surcharging is charging business users a premium rate for using
resources at peak times.
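
A minimal sketch of how a surcharge might be applied is shown below. The
peak period (09:00 to 17:00), the premium multiplier of 1.5 and the function
name are all assumptions made for illustration; an organisation would set
its own values in its charging policy.

```python
def charge(units_used: float, base_rate: float, hour: int,
           peak_hours: range = range(9, 17), premium: float = 1.5) -> float:
    """Return the charge for resource usage starting in the given hour (0-23).

    Usage in an assumed peak hour is billed at base_rate * premium;
    all other usage is billed at base_rate.
    """
    rate = base_rate * premium if hour in peak_hours else base_rate
    return units_used * rate


# The same 100 units cost more at 10:00 (peak) than at 20:00 (off-peak).
print(charge(100, 0.25, hour=10))
print(charge(100, 0.25, hour=20))
```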

Swapping The reaction of the operating system to insufficient real memory:
swapping occurs when too many tasks are perceived to be competing
for limited resources. It is the physical movement of an entire task (e.g.
all real memory pages of an address space may be moved at one time
from main storage to auxiliary storage).

System An integrated composite that consists of one or more of the processes,
hardware, software, facilities and people, that provides a capability to
satisfy a stated need or objective.

Terminal emulation Software running on an intelligent device, typically a PC
or workstation, which allows that device to function as an interactive
terminal connected to a host system. Examples of such emulation software
include IBM 3270 BSC or SNA, ICL C03, or Digital VT100.

Terminal I/O A read from, or a write to, an online device such as a VDU or remote
printer.

Third-party supplier An enterprise or group, external to the Customer's
enterprise, which provides services and/or products to that Customer's
enterprise.

Thrashing A condition in a virtual storage system where an excessive proportion
of CPU time is spent moving data between main and auxiliary storage.

Total Cost Of Ownership Calculated including depreciation, maintenance, staff
costs, accommodation, and planned renewal.
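
As a rough illustration, the calculation can be sketched as a sum of the five
elements named above. The class name and the figures are hypothetical, and
real calculations may include further elements not listed in this definition.

```python
from dataclasses import dataclass


@dataclass
class OwnershipCosts:
    """Annual cost elements for one asset, as named in the definition."""
    depreciation: float
    maintenance: float
    staff: float
    accommodation: float
    planned_renewal: float

    def total(self) -> float:
        """Sum all elements to give the total cost of ownership."""
        return (self.depreciation + self.maintenance + self.staff
                + self.accommodation + self.planned_renewal)


# Hypothetical annual figures for a single server, in pounds.
server = OwnershipCosts(depreciation=1200.0, maintenance=300.0, staff=2500.0,
                        accommodation=400.0, planned_renewal=600.0)
print(server.total())
```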

Tree structures In data structures, a series of connected nodes without cycles.
One node is termed the root and is the starting point of all paths; other
nodes, termed leaves, terminate the paths.
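
The definition can be made concrete with a small sketch. The Node class and
the helper functions are hypothetical names chosen for illustration; they
show one root, the leaves that terminate each path, and the paths themselves.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A tree node; a node with no children is a leaf."""
    name: str
    children: List["Node"] = field(default_factory=list)


def leaves(node: Node) -> List[str]:
    """Return the names of all leaves reachable from this node."""
    if not node.children:
        return [node.name]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def paths(node: Node, prefix: Tuple[str, ...] = ()) -> List[Tuple[str, ...]]:
    """Return every path that starts at this node and ends at a leaf."""
    here = prefix + (node.name,)
    if not node.children:
        return [here]
    return [p for child in node.children for p in paths(child, here)]


root = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
print(leaves(root))
print(paths(root))
```

Because there are no cycles, every leaf is reached by exactly one path from
the root.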

Underpinning contract A contract with an external supplier covering delivery of
services that support the IT organisation in their delivery of services.

Unit costs Costs distributed over individual component usage. For example, it
can be assumed that, if a box of paper with 1000 sheets costs £10, then
each sheet costs 1p. Similarly if a CPU costs £1m a year and it is used to
process 1,000 jobs that year, each job costs on average £1,000.
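
The two worked figures in this definition follow from a single division,
sketched below. The function name is hypothetical; amounts are in pounds.

```python
def unit_cost(total_cost: float, units: int) -> float:
    """Distribute a total cost evenly over the number of units used."""
    if units <= 0:
        raise ValueError("units must be a positive number")
    return total_cost / units


# A box of 1000 sheets costing 10 pounds: cost per sheet in pounds.
print(unit_cost(10, 1000))
# A CPU costing 1,000,000 pounds a year that processes 1,000 jobs.
print(unit_cost(1_000_000, 1000))
```

The result is an average: it assumes every sheet or job consumes an equal
share of the resource.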

Urgency Measure of the business criticality of an Incident or Problem based on
the impact and on the business needs of the Customer.

User The person who uses the service on a day-to-day basis.

Utility cost centre (UCC) A cost centre for the provision of support services
to other cost centres.

Variance analysis A variance is the difference between planned, budgeted or standard cost
and actual cost (or revenues). Variance analysis is an analysis of the
factors that have caused the difference between the pre-determined
standards and the actual results. Variances can be developed specifically
related to the operations carried out in addition to those mentioned
above.
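
A simple sketch of the first step, calculating the variance for each cost
element, is given below. The figures and names are hypothetical, and the sign
convention (actual minus standard, so a positive value means an overspend) is
an assumption; the analysis of causes remains a matter of judgement.

```python
def cost_variances(standard: dict, actual: dict) -> dict:
    """Return actual minus standard cost for each element in 'standard'.

    Assumed convention: positive = overspend, negative = underspend.
    """
    return {item: actual[item] - standard[item] for item in standard}


# Hypothetical budgeted and actual costs for one period, in pounds.
standard = {"hardware": 5000, "staff": 12000, "accommodation": 3000}
actual = {"hardware": 5400, "staff": 11500, "accommodation": 3000}

variances = cost_variances(standard, actual)
print(variances)
print(sum(variances.values()))  # net variance across all elements
```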

Version An identified instance of a Configuration Item within a product
breakdown structure or configuration structure for the purpose of
tracking and auditing change history. Also used for software
Configuration Items to define a specific identification released in
development for drafting, review or modification, test or production.

Version Identifier A version number; version date; or version date and time stamp.

Virtual memory system A system that enhances the size of hard memory by adding
an auxiliary storage layer residing on the hard disk.

Virtual storage interrupt (VSI) An ICL VME term for a page fault.

Vulnerability A weakness of the system and its assets, which could be exploited by
threats.

Warm stand-by See 'Intermediate Recovery'.

Waterline The lowest level of detail relevant to the customer.

Work-around Method of avoiding an Incident or Problem, either by a temporary
fix or by a technique that means the Customer is not reliant on a particular
aspect of the service that is known to have a problem.

Workloads In the context of Capacity Management Modelling, a set of forecasts
which detail the estimated resource usage over an agreed planning
horizon. Workloads generally represent discrete business applications
and can be further sub-divided into types of work (interactive,
timesharing, batch).

WORM (Device) Optical read only disks, standing for Write Once Read Many.
