Professional Documents
Culture Documents
Abstract have come to question the widely held belief that au-
The high cost of IT operations has led to an intense fo- tomation of IT systems always reduces costs. In fact, our
cus on the automation of processes for IT service de- claim is that automation can increase cost if it is applied
livery. We take the heretical position that automation without a holistic view of the processes used to deliver
does not necessarily reduce the cost of operations since: IT services. This conclusion derives from the hidden
(1) additional effort is required to deploy and maintain costs of automation, costs that become apparent when
the automation infrastructure; (2) using the automation automation is viewed holistically. While automation
infrastructure requires the development of structured in- may reduce the cost of certain operational processes, it
puts that have up-front costs for design, implementation, increases other costs, such as those for maintaining the
and testing that are not required for a manual process; automation infrastructure, adapting inputs to structured
and (3) detecting and recovering from errors in an au- formats required by automation, and handling automa-
tomated process is considerably more complicated than tion failures. When these extra costs outweigh the bene-
for a manual process. Our studies of several data cen- fits of automation, we have a situation described by hu-
ters suggest that the up-front costs mentioned in (2) man factors experts as an irony of automation—a case
are of particular concern since many processes have a where automation intended to reduce cost has ironically
limited lifetime (e.g., 25% of the packages constructed ended up increasing it [1].
for software distribution were installed on fewer than To prevent these ironies of automation, we must take
15 servers). We describe a process-based methodology a holistic view when adding automation to an IT system.
for analyzing the benefits and costs of automation, and This requires a technique for methodically exposing the
hence for determining if automation will indeed reduce hidden costs of automation, and an analysis that weighs
the cost of IT operations. Our analysis provides a quan- these costs against the benefits of automation. The ap-
titative framework that captures several traditional rules proach proposed in this paper is based on explicit rep-
of thumb: that automating a process is beneficial if the resentations of IT operational processes and the changes
process has a sufficiently long lifetime, if it is relatively to those processes induced by automation. We illustrate
easy to automate (i.e., can readily be generalized from a our process-based approach using a running example of
manual process), and if there is a large cost reduction (or automated software distribution. We draw on data col-
leverage) provided by each automated execution of the lected from several real data centers to help illuminate
process compared to a manual invocation. the impact of automation and the corresponding costs,
and to give an example of how a cost-benefit analysis
1 Introduction can be used to determine when automation should and
The cost of information technology (IT) operations should not be applied. Finally, we broaden our analysis
dwarfs the cost of hardware and software, often account- into a general discussion of the trade-offs between man-
ing for 50% to 80% of IT budgets [8, 4, 16]. IBM, ual and automated processes and offer guidance on the
HP, and others have announced initiatives to address this best ways to apply automation.
problem. Heeding the call in the 7th HotOS for “futz-
free” systems, academics have tackled the problem as 2 Hidden Costs of Automation
well, focusing in particular on error recovery and prob- We begin our discussion of the hidden costs of automa-
lem determination. All of these initiatives have a com- tion by laying out a methodical approach to exposing
mon message: salvation through automation. This mes- them. Throughout, we use software distribution to server
sage has appeal since automation provides a way to re- machines as a running example since the proper man-
duce labor costs and error rates as well as increase the agement of server software is a critical part of operating
uniformity with which IT operations are performed. a data center. Our discussion applies to software pack-
After working with corporate customers, service de- age management on centrally-administered collections
livery personnel, and product development groups, we of desktop machines as well. Software distribution in-
(a) Manual Software Distribution
SW Request
System Administrator
Obtain Source More Y Validate Prereqs Y Configure Perform Install Y Verify Y
OK?
Distribution Targets? Prerequisites Met? Installer Installation Succeeds? Installation
N N
N N
Success Fail
Remove
Installation Fix Problem
Remnants
SW Req.
End
Select Invoke Check Y
Req. OK?
Research Y Targets Installer Results
Package
Available Exists? N
Packages
N Package Y
Problem?
Diagnose
N
Problem
Infrastructure
Automation
Maintainer
Update
Diagnose Repair
Endpoint Endpoint Upgrade
Upgrade
Distribution
Endpoints
Servers
Automation
N N
Remove Install
Remnants
Figure 1: Manual and automated processes for software distribution. Boxes with heavy lines indicate process steps that
contribute to variable (per-target) costs, as described in Section 3.
volves the selection of software components and their (1) the SA obtains the necessary software resources; (2)
installation on target machines. We use the term “pack- for each server, the SA repeatedly does the following—
age” to refer to the collection of software resources to in- (2a) checks prerequisites such as the operating system
stall and the step-by-step procedure (process) by which release level, memory requirements, and dependencies
this is done. on other packages; (2b) configures the installer, which
Our approach is based on the explicit representation of requires that the SA determine the values of various pa-
the processes followed by system administrators (SAs). rameters such as the server’s IP address and features to
These processes may be formal, e.g. derived from ITIL be installed; and (2c) performs the install, verifies the
best practices [13], or informal, representing the ad-hoc result, and handles error conditions that arise. While
methods used in practice. Regardless of their source, the Figure 1(a) abstracts heavily to illustrate similarities be-
first step is to document the processes as they exist be- tween software installs, we underscore that a particular
fore automation. Our approach accomplishes this with software install process has many steps and checks that
“swim-lane” diagrams—annotated flowcharts that allo- typically make it quite different from other seemingly
cate process activities across roles (represented as rows) similar software installs (e.g., which files are copied to
and phases (represented as columns). Roles are typi- what directories, pre-requisites, and the setting of con-
cally performed by people (and can be shared or consol- figuration parameters).
idated); we include automation as its own role to reflect Now suppose that we automate the process in Fig-
activities that have been handed over to an automated ure 1(a) so as to reduce the work done by the SA. That
system. is, in the normal case, the SA selects a software pack-
Figure 1(a) shows the “swim-lane” representation for age, and the software distribution infrastructure handles
the manual version of our example software distribu- the other parts of the process flow in Figure 1(a). Have
tion process. In the data centers we studied, the SA we simplified IT operations?
responds to a request to distribute software as follows: No. In fact, we may have made IT operations more
complicated. To understand why, we turn to our process- 1
driven analysis, and update our process diagram with the 0.9
is not the only impact of the automation. For one thing, 0.4
l
ua
1
Furthermore, at least 7% of the installs fail due to issues
an
M
related to configuration of the automation infrastructure,
a consideration that does not exist if a manual process
ed
at
m
is used. This back-of-the envelope analysis underscores
to
Au
the importance of considering the entire set of process
changes that occur when automation is deployed, partic- 1/N
ularly the extra operational processes created to handle
1
automation failures. It also suggests the need for a quan- Automation Leverage (L)