IT Infrastructure Management

IT Infrastructure Library
Best Practice

IT Operations Management

Operations process

The Operations process comprises all
activities and measures necessary to
enable and/or maintain the intended use
of ICT services and infrastructure in
order to meet Service Level Agreements
and business targets.

An Operations process is required to
ensure a stable and secure foundation
on which to provide ICT services.
The Operations process has a strong
technology focus with an emphasis on
‘monitor and control’.

CI : Configuration Item Static
MO: Managed Object  Dinamics
Exp. FTP Server:
 CI : dB Configurations
 MO: Files, status (running, off-line etc.)
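
The CI/MO distinction can be sketched as two small data structures. This is only an illustration under stated assumptions: the class names, field names and FTP values below are hypothetical and do not come from any ITIL tool.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    RUNNING = "running"
    OFF_LINE = "off-line"


@dataclass(frozen=True)
class ConfigurationItem:
    """Static description of a component (frozen, so it cannot be mutated)."""
    name: str
    settings: tuple  # e.g. (("port", 21),); values are illustrative only


@dataclass
class ManagedObject:
    """Dynamic, observable state of the same component."""
    ci: ConfigurationItem
    status: Status = Status.OFF_LINE
    files: list = field(default_factory=list)


ftp_ci = ConfigurationItem(name="ftp-server", settings=(("port", 21),))
ftp_mo = ManagedObject(ci=ftp_ci)

# The MO changes during normal operation; the CI stays as recorded.
ftp_mo.status = Status.RUNNING
ftp_mo.files.append("report.csv")
```

Making the CI frozen while leaving the MO mutable mirrors the static/dynamic split: Operations reads and updates MO state continuously, while the CI record is only replaced deliberately.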

The goals
to operate, manage and maintain an end-to-end
ICT infrastructure that facilitates the
delivery of the ICT services to the business,
and that meets all its agreed requirements
and targets
to ensure that the ICT infrastructure is
reliable, robust, secure, consistent and
facilitates the efficient and effective business
processes of the organisation.


Inputs (Op)
Current ICT Infra.
OLA (op. Lv Agr.) SLM (Svc. Lv. Mgt)
Underpinning ContractConsistent SLA
Op. Processes & Procedures
Strategy, Plans, Policies & Standards

Processes (Op)
Event, Warning, Alert & Alarm
End-to-End Mgt.
Workload Scheduling
Housekeeping & Maintenance
Storage management
Relationship: Tech. Support, third party
Improvement/Remedial activities 
Change Mgt
Outputs (Op)
A Stable & Secure ICT Infra.
A Secure Operational Doc. Lib.: Op. Proc.,
Hand Over Proced.
A Set of Operational Scripts
A Set of Op Work Schedules
A Set of Operational Tools: Inf. on Op State
Mgt Reports & Inf.
Exception Reviews & Reports
Audit Reports for Effectiveness, Efficiency

Benefit
improved management and control of all ICT
infrastructure events, alerts and alarms
faster detection, response and resolution of incidents
and problems, with clear definition of responsibilities
more resilient infrastructure with improved service
availability with prevention of incidents and problems
a framework for financial management and long-term
cost reduction
help with the provision and maintenance of quality IT
clear definition of the roles and responsibilities of
individuals within Operations
a systematic approach to the assessment of staff
performance and promotion
Benefit (cont.)

a platform for the development of
automation in Operations and the use
of an operational bridge, offering
productivity and service quality
improved supplier relationships
the generation of a professional
environment where the performance of
everybody can rise to the level of the
Cost
adoption, tailoring and production of the
procedures described within this section
the selection and purchase of operational
management tools in conjunction with all
other areas of ICT and Service Management
the configuration and integration of
operational management tools and processes
with all other ICT and Service Management
tools and processes

Cost (cont.)

customising of management tools and the
development of operational scripts
additional software or management tools and
packages required
any additional equipment requirements, e.g.,
network backup devices and solutions
the establishment and maintenance of the
Operational environments
education and training for all operational

Problem
a lack of awareness and knowledge of service
and business targets and requirements
the establishment of effective end-to-end
operational management tools and processes
a lack of focus on service and business
lack of awareness and adherence to the
operational aspects of security policies and

Problem (cont.)

poor recognition of the need and importance of
operational tools and processes by senior ICT
resistance to change
creating service- and business-focused operational
documentation and adherence to agreed practices
and processes
maintenance of operational systems and services
while implementing process improvements
staff motivation and focus on repetitive operational

Roles and Responsibilities
Production Services Manager/Operations
Management/Shift Leader: 24x7 management of the
operational people, process and technology, facilitating
teamwork throughout ICT Operations
Operations Analyst: the day-to-day control of the
complete ICT operational infrastructure, working directly with
MOs and all the operational ICT infrastructure components
and services
Storage and Backup Management: management and
control of all information storage devices and space, and the
scheduling, testing and storage of all backups and their
subsequent recovery
Scheduling Analyst: management and control of all
aspects of the scheduling of operational workloads, including
printers and output
Database Administration: the management and
administration of all operational databases.
Interfaces to Other Areas
Application Management is concerned with all
aspects of the design and development, justification
and procurement of applications for use within the
organisation. Application Management should provide
Operations with details of the operational
requirements of all operational applications,
information and databases
Human Resources – on skill sets, personal
development plans, training plans and approved
training service suppliers
Security Management – on all aspects of security
including security policies, procedures and plans
Supplier Management – on contract details and
supplier performance.

Mgt Processes:
Mgt Events
Event monitoring
Event detection
Event logging
Event examination and filtering
Event processing, correlation and
Event resolution
Event closure
Management of the event lifecycle
Event grouping
Event reporting
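
The event activities listed above can be sketched as a small pipeline. This is a minimal illustration, not a description of any real event management product: the severity levels, state names, function names and sample events are all assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFO = 1
    WARNING = 2
    ALARM = 3


@dataclass
class Event:
    source: str
    message: str
    severity: Severity
    state: str = "detected"


def log_event(event_log, event):
    """Event logging: record every detected event."""
    event.state = "logged"
    event_log.append(event)


def filter_events(events, minimum=Severity.WARNING):
    """Event examination and filtering: keep events that need attention."""
    return [e for e in events if e.severity.value >= minimum.value]


def group_by_source(events):
    """Event grouping: count events per source as a crude correlation."""
    return Counter(e.source for e in events)


def close_event(event):
    """Event resolution and closure, collapsed into one step for brevity."""
    event.state = "closed"


event_log = []
for ev in (
    Event("ftp-server", "disk 91% full", Severity.WARNING),
    Event("ftp-server", "service stopped", Severity.ALARM),
    Event("print-queue", "job completed", Severity.INFO),
):
    log_event(event_log, ev)

actionable = filter_events(event_log)
groups = group_by_source(actionable)
for ev in actionable:
    close_event(ev)
```

Here the informational event is logged but filtered out, while the two FTP server events are grouped under one source before being closed.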
Op Control & Mgt Service
Operation control
Development and maintenance of an
operational management tool-set
Configuration and reconfiguration
Housekeeping and preventative
maintenance processes
Inventory and Asset Management
Storage Mgt, Backup & Recovery
Storage management and allocation: this activity manages
all aspects of the management, allocation and housekeeping of
media and information storage. It involves aspects of policy
making and implementation as well as the control and
management of media movements.
System backup and recovery: backup and recovery are
complementary in the sense that backup is almost always
scheduled in advance and recovery is usually reactive and
unscheduled. This means that there is rarely a pressing need for
a backup whereas there is nearly always a pressing reason for a
restore. The more thought that is put into backup and recovery
policies and strategies, in advance, the less disruption and
damage the procedure will cause.
Information management: this would include the use of
document and hierarchical information management systems.
This process should ensure that the right information is stored
in the appropriate media, with the right level of access and
speed of retrieval.
Database management and administration: responsible
for the regular management and administrative tasks necessary
for support and maintenance of all operational databases.
Security Mgt
Security monitoring: monitors, verifies and
 detection and containment of all intrusion
attempts and attempts at unauthorised access
 logging, management and reporting.
Security control:
 physical security: the element of Security
Management that prevents unauthorised,
unwanted and unnecessary physical access
 logical security: the component of Security
Management that prevents unauthorised,
unwanted and unnecessary logical access to ICT
information and systems by using measures on
classification, authentication and access controls.
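
As a rough illustration of how classification, authentication and access control combine in logical security, the sketch below denies access unless all three checks pass. The classification levels, role names and clearance table are hypothetical and are not taken from any security policy.

```python
from enum import IntEnum


class Classification(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical clearance table: highest classification each role may read.
CLEARANCE = {
    "operator": Classification.INTERNAL,
    "dba": Classification.CONFIDENTIAL,
}


def may_access(role, authenticated, resource_level):
    """Grant access only to authenticated roles with sufficient clearance."""
    if not authenticated:
        return False
    clearance = CLEARANCE.get(role)
    if clearance is None:  # unknown roles are denied by default
        return False
    return clearance >= resource_level
```

For example, `may_access("operator", True, Classification.CONFIDENTIAL)` is refused, while the same request from `"dba"` is allowed.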
Proactive Op Mgt Processes
review of the Operations processes: for
efficiency, effectiveness and compliance
internal or external audit of all operational
processes: or assessment and comparison with best
practice guidelines on a regular basis
instigation of remedial or improvement actions
to the operational infrastructure: this can be
achieved using analysis of event logs, incident data
and operational trend reports. By working with other
ICTIM and ITSM processes, weak areas within ICT
services and infrastructure can be identified and
improved under the control of Change Management

Proactive Op Mgt Processes (cont.)
instigation of remedial actions or
improvements to the Operations processes:
again, by analysing operational information, reports,
reviews and audits, remedial and improvement
changes can be made to process weak spots
operational tuning: this is the process of
determining the value of attributes and parameters of
operational ICT services and MOs in order to optimise
their performance. This can be achieved using
analysis and trending of operational information and
data. Again, these activities should only be completed
in coordination and conjunction with all other ICTIM
and ITSM processes, especially Capacity Management
and Change Management
The techniques of Op
Operations bridge: This enables the Operations
process and the Incident Management processes, people and
products (tools and technology) to be closely integrated.
Event analysis and trending
 Major operational event review:
 what went wrong?
 what aspects of the resolution were done well?
 what aspects of the resolution could be improved?
 how can this event be prevented from reoccurring?
 Event trend analysis
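
Event trend analysis can be as simple as counting events per source per day and checking whether the count keeps rising. The sketch below assumes events are available as (day, source) pairs; the sample data and function names are made up for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical sample: one (day, source) pair per logged event.
events = [
    (date(2024, 1, 1), "ftp-server"),
    (date(2024, 1, 2), "ftp-server"),
    (date(2024, 1, 2), "ftp-server"),
    (date(2024, 1, 3), "ftp-server"),
    (date(2024, 1, 3), "ftp-server"),
    (date(2024, 1, 3), "ftp-server"),
    (date(2024, 1, 3), "print-queue"),
]


def daily_counts(events, source):
    """Number of events per day for one source, in date order."""
    counts = Counter(day for day, src in events if src == source)
    return [counts[day] for day in sorted(counts)]


def is_rising(series):
    """True if every value exceeds the previous one (needs two or more points)."""
    return len(series) >= 2 and all(b > a for a, b in zip(series, series[1:]))


ftp_trend = daily_counts(events, "ftp-server")
needs_review = is_rising(ftp_trend)
```

A strictly rising count is a deliberately crude signal; a real review would look at longer periods and at the event types involved.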

The techniques of Op (cont.)

Internal review and assessment

External benchmarking
Continual process of improvement

The tools
Systems Management or network management tools
 categorisation, interpretation, recording and management of
ICT infrastructure events, warnings and alarms, leading to
possible incident recording and problem determination and
 monitoring and interrogation of ICT component status and
condition and all aspects of ICT infrastructure operation and
 ICT infrastructure equipment reload, reset and recovery
 monitoring, control and management of ICT infrastructure
 traffic monitoring, collection and storage of data on service
and performance levels of all aspects of the operational ICT

The Tools
Systems Management or network
management tools (cont.)
 ICT infrastructure component configuration
maintenance, including download and upload
 activation/deactivation of ICT infrastructure
 proactive preventative maintenance
 diagnostic monitoring, tracing and analysis
 ICT infrastructure testing and monitoring
 automation of operational procedures.

The Tools (cont.)

Environmental management tools

Service, application and database management tools
 user concurrency levels
 service, application and database transaction statistics,
volumes and response times
 transaction control, management, processing, queuing
times, tracing and diagnosis
 service, application and database security, access control
and management
 logging of all events, warnings, alarms and failures
 resource utilisation levels
 audit trail facilities.

The Tools (cont.)

Diagnostic tools
 from problem diagnosis to traffic simulation
and planning
Scheduling tools

The Tools (cont.)

Storage management tools

 setting and implementing space allocation,
hierarchical management, copying and backup
 mirroring, RAID sets and ‘hot swapping’ of failed
components with in-flight recovery
 monitoring, setting thresholds and reporting on
media storage usage
 security and access control facilities
 easy housekeeping and maintenance of the
storage media
 support of the efficient and effective use of
databases and file systems.
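
The threshold monitoring item above can be sketched with the Python standard library. The 80% and 95% thresholds and the level names are assumptions chosen for illustration, not recommended values.

```python
import shutil


def usage_percent(used, total):
    """Share of the medium in use, as a percentage."""
    return 100.0 * used / total if total else 0.0


def classify(percent, warn_at=80.0, alarm_at=95.0):
    """Map a usage percentage to an event level (thresholds are illustrative)."""
    if percent >= alarm_at:
        return "alarm"
    if percent >= warn_at:
        return "warning"
    return "ok"


def check_storage(path="."):
    """Report usage of the file system holding `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    percent = usage_percent(usage.used, usage.total)
    return round(percent, 1), classify(percent)
```

Calling `check_storage("/var")` on a schedule and raising an event for any result other than "ok" would tie this into the event handling described earlier.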
The Tools (cont.)

Performance tools
 real-time monitoring of ICT service response,
component response and client and customer
response times
 real-time monitoring of ICT service and
component usage and traffic volumes
 support of a database for recording past statistical
information and performance history
 statistical analysis facilities
 import and export facilities with other Capacity
Management tools
 trending and modelling facilities.
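
The statistical analysis a performance tool provides can be illustrated on a handful of response-time samples. The sample values, the 200 ms target and the nearest-rank percentile method are assumptions for this sketch.

```python
import math
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical response times in milliseconds for one ICT service.
response_ms = [120, 135, 110, 480, 125, 130, 140, 115, 128, 132]
TARGET_MS = 200  # illustrative service level target

mean_ms = statistics.mean(response_ms)
median_ms = percentile(response_ms, 50)
p95_ms = percentile(response_ms, 95)
breaches_target = p95_ms > TARGET_MS
```

In this sample a single slow transaction pulls the mean well above the median, which is why high percentiles are usually reported alongside averages.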

The technologies
to ensure that any new systems pass
their operational acceptance criteria,
meeting or exceeding their operational
to develop any new operational support
or management processes or
procedures before or during the
to develop any new skills or knowledge
required either by formal, workshop or
on-the-job training.