Improving the Performance of IT Operations

Creating, Managing, and Improving Your IT Metrics

Overview
Share our experiences with managing operational performance via metrics in IT
Our approach
  Choosing the right metrics
  Method of sharing
  Developing action plans and improvement

Examples
Mean Time to Repair (Support Groups)
Average Speed of Answer (Help Desk)
Enterprise Data Warehouse Data Currency

Q&A
10/23/2008 Improving the Performance of IT Operations

Historical Approach to Metrics


SPC
Monthly Ops Reviews
Large group formats
Metrics driven by ticketing system

Less About
1. Explaining month-to-month variances
2. Praise when better than target and dissatisfaction when worse
3. Reacting to bad news without analysis

More About
1. Understanding the natural variation in a process
2. Knowing how we are performing against customer desires
3. Developing thoughtful action plans

New Approach
Focus on metrics that are meaningful to both the support organization and their customers
Supportive and targeted
Individualized
Less about charting, more about analysis and improvement

Metric Development
Bring group management, key employees and customers together
Brainstorm ways that demonstrate that we:
  Don't let it break
  If it breaks, we fix it fast
  We fix it right the 1st time

Metric Development
Don't let it break
Customer Focused: Reactive, Proactive
Back Office Focused: Reactive, Proactive

Variants
Project Management
  Don't let it be late
  If it's late, minimize the impact fast
  Plan it right the first time

Architecture
  Don't let the design be incorrect
  If it's incorrect, fill the gap quickly
  Design it right the first time

Security
  Don't let them compromise data
  If they do, catch them fast
  Plan security right the 1st time

Sample IT Network Engineering & Support


Candidate metrics brainstormed under the three themes (Don't let it break; If it breaks, we fix it fast; We fix it right the 1st time):
Repeat troubles for customers or devices
Traffic blocked at firewall
% routers with unsaved configs
Ticket routing & assignment accuracy
Traffic analysis / utilization
% of successful changes
WAN interface utilization
Memory utilization on key network devices
Repeat TAC calls to vendor
Mean Time to Repair
Spanning Tree convergence time
Trunk utilization
3rd party response times
End of Life/Support hardware
Interface errors
Call quality for VOIP
Certifying device & config standards

Network Engineering & Support Comparison

Network Engineering & Support: Improving on Mean Time to Repair (MTTR)

Problem: It was taking too long to repair low-priority problems.
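
As a minimal sketch of how a group might compute MTTR per priority from ticket data, assuming each ticket records an opened and a resolved timestamp. The `Ticket` fields, the function name, and the sample values are hypothetical and do not come from the presentation.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Ticket:
    # Hypothetical ticket record; a real ticketing system will differ.
    priority: str
    opened: datetime
    resolved: datetime


def mttr_hours_by_priority(tickets):
    """Return mean time to repair, in hours, for each priority level."""
    durations = defaultdict(list)
    for t in tickets:
        hours = (t.resolved - t.opened).total_seconds() / 3600
        durations[t.priority].append(hours)
    return {p: sum(h) / len(h) for p, h in durations.items()}


# Illustrative data only.
sample = [
    Ticket("low", datetime(2008, 3, 3, 9), datetime(2008, 3, 4, 9)),     # 24 h
    Ticket("low", datetime(2008, 3, 5, 8), datetime(2008, 3, 5, 20)),    # 12 h
    Ticket("high", datetime(2008, 3, 6, 10), datetime(2008, 3, 6, 12)),  # 2 h
]
result = mttr_hours_by_priority(sample)
print(result)
```

Splitting the average by priority is what makes a problem like slow low-priority repairs visible; a single blended MTTR could hide it.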

Why-Because Pursuit

Fifty reasons why the process was not capable, but only 6 common themes


Common Themes
Understanding of requests
Ticket monitoring
External noise
Varying effort
Priorities
Timing of ticket entry

NES MTTR Control Chart and Process Break


[Figure: control chart, "January - August 2008 MTTR". Y-axis: MTTR (Hours); X-axis: Months (Jan through August). Center line X-bar = 20.5, UCL = 31.7, LCL = 9.4.]
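
The presentation does not say which chart type produced these limits. As a rough sketch, assuming an individuals (XmR) chart, the center line and control limits can be derived from the average moving range. The function name and the data below are hypothetical and are not the values behind the chart.

```python
def xmr_limits(values):
    """Center line and 3-sigma limits for an individuals (XmR) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of size 2.
    half_width = 2.66 * mr_bar
    return center - half_width, center, center + half_width


# Hypothetical MTTR observations in hours (not the presentation's data).
mttr = [20.0, 24.0, 18.0, 22.0, 16.0]
lcl, center, ucl = xmr_limits(mttr)
print(f"LCL={lcl:.1f}  X-bar={center:.1f}  UCL={ucl:.1f}")
```

When a deliberate change shifts the process (a "process break"), the limits are typically recalculated from the observations after the change only.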

Comparison of Capability

IT Help Desk: Improving Average Speed of Answer (ASA)

Problem: The IT Help Desk was not answering employee phone calls quickly enough.
Through Why-Because pursuit, identified four areas of opportunity:
  Incentives
  HR policies
  Shift schedule changes
  Additional customer choices

ASA Control Chart and Process Breaks

[Figure: ASA control chart; process breaks labeled 1 through 4.]

Results
Process now operates at 3 sigma
Reduced wait time by 4 minutes
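
The presentation does not define how its sigma level was calculated. One common reading, sketched here as an assumption, is the number of standard deviations between the process mean and the customer's upper limit. The function name, the ASA samples, and the 36-second limit are all hypothetical.

```python
from statistics import mean, stdev


def sigma_level(samples, upper_spec_limit):
    """Sample standard deviations between the mean and the upper limit."""
    spread = stdev(samples)
    if spread == 0:
        raise ValueError("samples have no variation")
    return (upper_spec_limit - mean(samples)) / spread


# Hypothetical daily ASA values in seconds and a hypothetical upper limit.
asa_seconds = [28.0, 32.0, 28.0, 32.0, 30.0]
level = sigma_level(asa_seconds, upper_spec_limit=36.0)
print(round(level, 2))
```

Under this reading, a process improves its sigma level either by moving its mean away from the limit or by reducing its variation.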

System Administration: Improving Mean Time to Repair (MTTR)

Problem: The System Administration group was not solving problems quickly enough.
Identified five areas of opportunity:
  Staff levels
  Improved ticket management
  Weekly awareness of metrics
  Optimized process for elevated permissions
  Developed incentive plan

System Administration Control Chart

Capability Comparison Over Time

Results
Problem appeared to be workload related, but proved to be both process and workload
Now operates at > 4 sigma

Enterprise Data Warehousing: Improving Load Times

Problem:
The EDW Nightlies Informatica data load needs to be more consistent on a day-to-day basis, resulting in fewer load errors, faster reaction times to errors, and overall improved load completion times.

Illustration of Improvement
[Figure: illustration contrasting loads labeled "NOT DONE YET!!" and "DONE".]

Problem Statement
The EDW Nightlies Informatica data load needs to be more consistent on a day-to-day basis, resulting in fewer load errors, faster reaction times to errors, and overall improved load completion times.

Action Plan:
A. Modify OBIEE parameter settings to reduce extra database processes from the application.
B. Modify PEDW data parameter to increase shared memory allocation to the database, thus eliminating database lockups.
C. Add SiteScope alerts on critical load failures (w/ Command Center callouts to EDW Support) by end of Sept.
D. Add SiteScope alerts when critical loads do not start on time (w/ Command Center callouts to EDW Support) by middle of Oct.
E. Upgrade PEDW to 10g on RAC by end of Oct.
F. Upgrade Informatica to 8.6 by end of Nov.

Current Status / Updates
OBIEE changes done and work well.
PEDW shared memory allocation parameters modified (8/20) and appear to have solved the lockup problems. We will continue monitoring.
SiteScope alert on failure in progress, working out bugs involving Command Center action.
SiteScope alert on load schedule miss is being designed.
PEDW upgrade to 10g/RAC planned for week of Oct 6 with Oct 13 completion date. SEDW testing going well.
Informatica 8.6 upgrade still being planned.

Expected Results
EDW Nightly load times and load error metrics will be capable as a result of these changes. The upgrade to 10g/RAC and Infa 8.6 should improve the completion times by at least 30% over the current 9i and Infa 7.1.3 infrastructure.
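
The planned alert for critical loads that do not start on time comes down to a simple check, sketched below in plain Python. This is not SiteScope configuration; the function name, load names, times, and 15-minute grace period are hypothetical.

```python
from datetime import datetime, timedelta


def late_loads(scheduled, started, now, grace=timedelta(minutes=15)):
    """Names of loads past scheduled start plus grace that have not started."""
    return sorted(
        name
        for name, due in scheduled.items()
        if name not in started and now > due + grace
    )


# Hypothetical load names and scheduled start times.
scheduled = {
    "sales_nightly": datetime(2008, 10, 1, 1, 0),
    "inventory_nightly": datetime(2008, 10, 1, 2, 0),
}
started = {"sales_nightly"}
overdue = late_loads(scheduled, started, now=datetime(2008, 10, 1, 2, 30))
print(overdue)
```

A grace period keeps the check from paging support over trivial delays; its length would need to be agreed with the people receiving the callouts.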

Lessons Learned
Those who volunteer first have been most successful
If you're not happy with the metrics, start over
Involve the front line employees
Review in small groups, monthly at a minimum
Share the results with your customers
Document your successes!

Wrap Up
Matured our metrics program
Collaborative approach to metric selection
Focused on action planning and improvement
Individual success stories

Q&A
