The Reference Guide to Data Center Automation
Don Jones and Anil Desai


Introduction

Introduction to Realtimepublishers
by Don Jones, Series Editor

For several years now, Realtime has produced dozens and dozens of high-quality books that just
happen to be delivered in electronic format—at no cost to you, the reader. We’ve made this
unique publishing model work through the generous support and cooperation of our sponsors,
who agree to bear each book’s production expenses for the benefit of our readers.
Although we’ve always offered our publications to you for free, don’t think for a moment that
quality is anything less than our top priority. My job is to make sure that our books are as good
as—and in most cases better than—any printed book that would cost you $40 or more. Our
electronic publishing model offers several advantages over printed books: You receive chapters
literally as fast as our authors produce them (hence the “realtime” aspect of our model), and we
can update chapters to reflect the latest changes in technology.
I want to point out that our books are by no means paid advertisements or white papers. We’re an
independent publishing company, and an important aspect of my job is to make sure that our
authors are free to voice their expertise and opinions without reservation or restriction. We
maintain complete editorial control of our publications, and I’m proud that we’ve produced so
many quality books over the past years.
I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if
you’ve received this publication from a friend or colleague. We have a wide variety of additional
books on a range of topics, and you’re sure to find something that’s of interest to you—and it
won’t cost you a thing. We hope you’ll continue to come to Realtime for your educational needs
far into the future.
Until then, enjoy.
Don Jones

Table of Contents

Introduction to Realtimepublishers.................................................................................................. i
An Introduction to Data Center Automation ...................................................................................1
Information Technology Infrastructure Library...............................................................................2
Benefits of ITIL ...................................................................................................................2
Improving Levels of Service....................................................................................3
Reducing IT Costs....................................................................................................3
Enforcing Well-Defined Processes ..........................................................................3
ITIL Framework Content Organization ...............................................................................3
ITIL Compliance..................................................................................................................6
ITIL Content and Resources ................................................................................................6
The Business Value of Data Center Automation.............................................................................7
Basic Benefits of IT .............................................................................................................7
Calculating the Value of IT..................................................................................................8
Identifying Costs......................................................................................................8
Discovering Business Benefits ................................................................................8
Communicating Strategic Business Value...............................................................9
Improving the Business Value of IT....................................................................................9
The Value of Data Center Automation ....................................................................9
Implementing Charge-Backs .................................................................................10
Enabling Better Decisions......................................................................................10
Service Provider.............................................................................................................................11
Benefits of Operating IT as a Service Provider .................................................................11
Implement the Service Provider Model .............................................................................11
Identifying Customers’ Needs ...............................................................................12
Determining “Product Pricing”..............................................................................12
Identifying Service Delivery Details .................................................................................12
Measuring Service Levels......................................................................................13
Prioritizing Projects ...............................................................................................13
Network Configuration Management ............................................................................................13
NCM Tasks ........................................................................................................................14
Configuration Management Challenges ............................................................................14
NCM Solutions ..................................................................................................................15
Benefits of Automating NCM............................................................................................15
Choosing an NCM Solution...............................................................................................16
Server Provisioning........................................................................................................................18
Challenges Related to Provisioning ...................................................................................18
Server-Provisioning Methods ............................................................................................19
Scripting.................................................................................................................19
Imaging ..................................................................................................................19
Evaluating Server-Provisioning Solutions.........................................................................20
Return on Investment.....................................................................................................................21
The Need for ROI Metrics .................................................................................................21
Calculating ROI .................................................................................................................21
Calculating Costs ...................................................................................................22
Calculating Benefits...............................................................................................22
Measuring Risk ......................................................................................................23
Using ROI Data..................................................................................................................23
Making Better Decisions........................................................................................24
ROI Example: Benefits of Automation..............................................................................25
ROI Analysis..........................................................................................................25
Change Advisory Board.................................................................................................................26
The Purpose of a CAB .......................................................................................................26
Benefits of a CAB..................................................................................................26
Roles on the CAB ..................................................................................................26
The Change-Management Process.....................................................................................27
Planning for Changes.............................................................................................28
Implementing Changes ..........................................................................................28
Reviewing Changes ...............................................................................................29
Planning for the Unplanned ...................................................................................29
Configuration Management Database............................................................................................29
The Need for a CMDB.......................................................................................................30
Benefits of Using a CMDB................................................................................................31
Implementing a CMDB Solution .......................................................................................32
Information to Track ..........................................................................................................32
Server Configuration..............................................................................................32
Desktop Configuration...........................................................................32
Network Configuration ..........................................................................................32
Software Configuration..........................................................................................33
Evaluating CMDB Features...............................................................................................33
Auditing .........................................................................................................................................35
The Benefits of Auditing ...................................................................................................35
Developing Auditing Criteria ............................................................................................36
Preparing for Audits...........................................................................................................38
Performing Audits..............................................................................................................38
Automating Auditing .........................................................................................................39
Customers ......................................................................................................................................40
Identifying Customers........................................................................................................40
Understanding Customers’ Needs......................................................................................41
Defining Products and Service Offerings ..........................................................................41
Communicating with Customers........................................................................................42
Managing Budgets and Profitability ..................................................................................43
Total Cost of Ownership................................................................................................................44
Measuring Costs.................................................................................................................44
Identifying Initial Capital Costs.............................................................................45
Enumerating Infrastructure Costs ..........................................................................45
Capturing Labor Costs ...........................................................................................46
Measuring TCO .................................................................................................................46
Reducing TCO Through Automation ................................................................................47
Reporting Requirements ................................................................................................................47
Identifying Reporting Needs..............................................................................................47
Configuration Reports............................................................................................47
Service Level Agreement Reporting......................................................................48
Real-Time Activity Reporting ...............................................................................49
Regulatory Compliance Reporting ........................................................................49
Generating Reports ............................................................................................................49
Using a Configuration Management Database ......................................................49
Automating Report Generation..............................................................................50
Network and Server Convergence .................................................................................................51
Convergence Examples......................................................................................51
Determining Application Requirements ............................................................52
The Roles of IT Staff .........................................................................................................52
Managing Convergence with Automation .........................................................................52
Service Level Agreements .............................................................................................................53
Challenges Related to IT Services Delivery ......................................................................53
Defining Service Level Requirements ...............................................................................53
Determining Organizational Needs........................................................................54
Identify Service Level Details ...............................................................................55
Developing SLAs...............................................................................................................55
Delivering Service Levels..................................................................................................56
The Benefits of Well-Defined SLAs..................................................................................56
Enforcing SLAs .................................................................................................................56
Examples of SLAs .............................................................................................................57
Monitoring and Automating SLAs ....................................................................................58
Network Business Continuity ........................................................................................................58
The Benefits of Continuity Planning .................................................................................58
Developing a Network Business Continuity Plan..............................................................59
Defining Business Requirements...........................................................................59
Identifying Technical Requirements......................................................................59
Preparing for Network Failover .........................................................................................60
Configuration Management ...................................................................................60
Managing Network Redundancy ...........................................................................60
Simulating Disaster Recovery Operations .............................................................60
Automating Network Business Continuity ........................................................................61
Remote Administration..................................................................................................................62
The Benefits of Remote Administration ............................................................................62
Remote Administration Scenarios .....................................................................................62
Remote Management Features...........................................................................................63
Securing Remote Management ..........................................................................................64
Choosing a Remote Management Solution .......................................................................65
Server Configuration Management................................................................................................66
Server Configuration Management Challenges .................................................................66
Technical Challenges .............................................................................................66
Process-Related Challenges ...................................................................................67
Automating Server Configuration Management................................................................67
Automated Server Discovery.................................................................................67
Applying Configuration Changes ..........................................................................67
Configuration Management and Change Tracking................................................68
Monitoring and Auditing Server Configurations...................................................68
Enforcing Policies and Processes...........................................................................68
Reporting................................................................................................................69
Evaluating Automated Solutions .......................................................................................69
IT Processes ...................................................................................................................................69
The Benefits of Processes ..................................................................................................70
Challenges Related to Process ...........................................................................................70
Characteristics of Effective Processes ...............................................................................70
Designing and Implementing Processes ............................................................................71
Managing Exceptions.........................................................................................................72
Delegation and Accountability ..........................................................................................72
Examples of IT Processes ..................................................................................................72
Automating Process Management .....................................................................................73
Application Infrastructure Management ........................................................................................74
Understanding Application Infrastructure .........................................................................74
Challenges of Application Infrastructure Management.........................................75
Inventorying Application Requirements................................................................75
Identifying Interdependencies................................................................................75
Automating Application Infrastructure Management........................................................76
Using Application Instrumentation........................................................................76
Managing Applications Instead of Devices ...........................................................77
Business Continuity for Servers.....................................................................................................77
The Value of Business Continuity .....................................................................................77
Identifying Mission-Critical Applications and Servers .........................................78
Developing a Business Continuity Plan for Servers ..........................................................78
Defining Business and Technical Requirements ...................................................79
Implementing and Maintaining a Backup Site...................................................................80
Automating Business Continuity .......................................................................................80
Using a Configuration Management Database ......................................................80
Change and Configuration Management ...............................................................81
Network and Server Maintenance..................................................................................................82
Network and Server Maintenance Tasks ...........................................................................82
Configuration Management ...................................................................................82
Applying System and Security Updates ................................................................82
Monitoring Performance........................................................................................83
Implementing Maintenance Processes...............................................................................83
Delegating Responsibility......................................................................................84
Developing Maintenance Schedules ......................................................................84
Verifying Maintenance Operations........................................................................84
The Benefits of Automation...................................................................................84
Asset Management.........................................................................................................................85
Benefits of Asset Management ..........................................................................................85
Developing Asset Management Requirements..................................................................86
Identifying Asset Types .........................................................................................87
Developing Asset Tracking Processes ...............................................................................89
Automating IT Asset Management....................................................................................89
Automated Discovery ............................................................................................90
Using a Configuration Management Database ......................................................90
Integration with Other Data Center Automation Tools .........................................90
Reporting................................................................................................................90
Flexible/Agile Management...........................................................................................................91
Challenges Related to IT Management..............................................................................91
The Agile Management Paradigm .....................................................................................91
Key Features of an Agile IT Department...........................................................................92
Automating IT Management..............................................................................................93
Policy Enforcement........................................................................................................................94
The Benefits of Policies .....................................................................................................94
Types of Policies....................................................................................................94
Defining Policies................................................................................................................94
Involving the Entire Organization .........................................................................95
Identifying Policy Candidates................................................................................96
Communicating Policies ........................................................................................96
Policy Scope...........................................................................................................96
Checking for Policy Compliance .......................................................................................96
Automating Policy Enforcement........................................................................................97
Evaluating Policy Enforcement Solutions .........................................................................97
Server Monitoring..........................................................................................................................98
Developing a Performance Optimization Approach..........................................................98
Deciding What to Monitor .................................................................................................98
Monitoring Availability .........................................................................................99
Monitoring Performance......................................................................................100
Verifying Service Level Agreements...................................................................100
Limitations of Manual Server Monitoring.......................................................................100
Automating Server Monitoring........................................................................................102
Change Tracking..........................................................................................................................103
Benefits of Tracking Changes..........................................................................................103
Defining a Change-Tracking Process ..............................................................................103
Establishing Accountability .................................................................................104
Tracking Change-Related Details ........................................................................104
Automating Change Tracking..........................................................................................105
Network Change Detection..........................................................................................................106
The Value of Change Detection.......................................................................................106
Unauthorized Changes .........................................................................................107
Manual Change Tracking.....................................................................................107
Challenges Related to Network Change Detection..........................................................108
Automating Change Detection.........................................................................................108
Committing and Tracking Changes .....................................................................108
Verifying Network Configuration........................................................................109
Notification Management ............................................................................................................109
The Value of Notifications...............................................................................................109
Managing Internal Notifications ..........................................................................109
Managing External Notifications.........................................................................110
Creating Notifications......................................................................................................110
What to Include in a Notification.........................................................................110
What to Avoid in a Notification...........................................................................111
Automating Notification Management ............................................................................111
Server Virtualization....................................................................................................................113
Understanding Virtualization...........................................................................................113
Current Data Center Challenges ..........................................................................113
Virtualization Architecture ..................................................................................113
Virtualization Terminology .................................................................................115
Benefits of Virtualization.................................................................................................116
Virtualization Scenarios...................................................................................................118
Limitations of Virtualization............................................................................................118
Automating Virtual Machine Management .....................................................................119
Remote/Branch Office Management ...........................................................................................119
Challenges of Remote Management ................................................................................119
Technical Issues ...................................................................................................120
Personnel Issues ...................................................................................................120
Business Issues.....................................................................................................120
Automating Remote Office Management........................................................................121
Patch Management.......................................................................................................................122
The Importance of Patch Management ............................................................................122
Challenges of Manual Patch Management ......................................................................122
Developing a Patch Management Process .......................................................................123
Obtaining Updates ...............................................................................................123
Identifying Affected Systems ..............................................................................123
Testing Updates ...................................................................................................123
Deploying Updates...............................................................................................124
Auditing Changes.................................................................................................124
Automating Patch Management.......................................................................................124
Benefits of Automated Patch Management .........................................................125
What to Look for in Patch Management Solutions..............................................125
Network Provisioning ..................................................................................................................126
Defining Provisioning Needs...........................................................................................126
Modeling and Testing Changes ...........................................................................127
Managing Device Configurations ........................................................................128


Auditing Device Configurations ..........................................................128
Using a Configuration Management Database ................................................128
Additional Benefits of Automation..................................................................................128
Network Security and Authentication..........................................................................................129
Understanding Security Layers........................................................................................129
Choosing a Network Authentication Method ..................................................................130
Security Protocols ................................................................................................130
Authentication Mechanisms.................................................................................130
Authorization .......................................................................................................131
Automating Security Management ..................................................................................131
Business Processes.......................................................................................................................132
The Benefits of Well-Defined Processes .........................................................................132
Defining Business Processes............................................................................................132
Deciding Which Processes to Create ...................................................................133
Identifying Process Goals ....................................................................................133
Developing Processes ..........................................................................................134
Documenting Business Processes ........................................................................134
Creating “Living” Processes ................................................................................135
Automating Business Process Workflow.........................................................................135
Business Process Example: Service Desk Processes ...................................................................136
Characteristic of an Effective Process .............................................................................136
Developing a Service Desk Operation Flow....................................................................136
Documenting Workflow Steps.............................................................................137
Tracking and Categorizing Issues........................................................................137
Escalation Processes and Workflow ....................................................................138
Creating a Service Desk Flowchart......................................................................138
Automating Service Desk Management ..........................................................................139
Executive Action Committee.......................................................................................................140
Goals of the Executive Action Committee ......................................................................140
Evaluating Potential Projects ...............................................................................140
Defining Committee Roles and Members........................................................................142
Implementing an Executive Action Process ....................................................................142
Centralized User Authentication..................................................................................................143


Major Goals of Authentication ........................................................................143
Authentication Mechanisms.............................................................................................143
Strengthening Password-Based Authentication...................................................144
Other Authentication Mechanisms ......................................................................145
Centralized Security.........................................................................................................146
Problems with Decentralized Security.................................................................146
Understanding Centralized Security ....................................................................147
Understanding Directory Services Solutions ...................................................................148
Features of Directory Services Solutions.........................................................................149
Directory Services Best Practices ....................................................................................150


Copyright Statement
© 2006 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that
have been created, developed, or commissioned by, and published with the permission
of, Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are
protected by international copyright and trademark laws.
THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND,
EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE,
TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice
and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web
site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be
held liable for technical or editorial errors or omissions contained in the Materials,
including without limitation, for any direct, indirect, incidental, special, exemplary or
consequential damages whatsoever resulting from the use of any information contained
in the Materials.
The Materials (including but not limited to the text, images, audio, and/or video) may not
be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any
way, in whole or in part, except that one copy may be downloaded for your personal, non-
commercial use on a single computer. In connection with such use, you may not modify
or obscure any copyright or other proprietary notice.
The Materials may contain trademarks, services marks and logos that are the property of
third parties. You are not permitted to use these trademarks, services marks or logos
without prior written consent of such third parties.
Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent
& Trademark Office. All other product or service names are the property of their
respective owners.
If you have any questions about these terms, or if you would like information about
licensing materials from Realtimepublishers.com, please contact us via e-mail at
info@realtimepublishers.com.

The Reference Guide to Data Center Automation

[Editor's Note: This eBook was downloaded from Realtime Nexus—The Digital Library. All
leading technology guides from Realtimepublishers can be found at
http://nexus.realtimepublishers.com.]

An Introduction to Data Center Automation


Over time, organizations have placed increasingly heavy demands on their IT departments.
Although budgets are limited, end users and other areas of the business rely increasingly on
computing resources and services to get their jobs done. This situation raises the important issue
of how IT staff can meet these demands in the best possible way. Despite the importance of IT in
strategic and tactical operations, many technical departments are run in an ad-hoc and reactive
way. Often, issues are addressed only after they have ballooned into major problems, and
support-related costs can be tremendous. From the end-user standpoint, IT departments can never
react quickly enough to the need for new applications or changing requirements. Clearly, there is
room for improvement.
This guide explores data center automation—methods through which hardware, software, and
processes can work together to streamline IT operations. Modern data center challenges include
increasing demands from business units with only limited resources to address those demands.
This guide focuses on topics in the following major areas:
• Business processes and frameworks—The fundamental purpose of IT is to support
business operations and to enable end users and other departments to perform their
functions as efficiently as possible. IT departments face many common challenges, and
various best practices have been developed to provide real-world recommendations for
ways to manage IT infrastructures. From a business standpoint, the specifics include
establishing policies and processes and implementing the tools and technology required
to support them.
• IT as a service provider—The perceived role of IT can vary dramatically among
organizations. One approach that helps IT managers better meet the needs of users is to
view IT as a service provider. In this approach, the “customers” are end users that rely
upon IT infrastructure to accomplish their tasks. This method can help in the
development of Service Level Agreements (SLAs) and IT processes and better
communicate the business value that IT organizations provide.
• Agile management—Modern IT environments are forced to constantly change in reaction
to new business requirements. In the early days of IT, it was quite common for network
administrators, server administrators, and application administrators to work in isolated
groups that had little interaction. These boundaries have largely blurred due to increasing
interdependencies of modern applications. With this convergence of servers and networks
comes new management challenges that require all areas of a technical environment to
work in concert.
• Network and server automation—The building blocks of IT infrastructure are servers and
network devices. In an ideal world, all these complex resources would manage
themselves. In the real world, significant time and effort is spent in provisioning and
deploying resources, managing configurations, monitoring performance, and reacting to
changes. All these operations are excellent opportunities for labor-reducing automation.


Through each of the topics in this guide, we’ll cover important terms and concepts that will
enable IT departments to perform more tasks with fewer resources. The importance and value of
automating standard IT operations can be significant in data centers of any size. The goal is to
significantly lower IT operational expenses while at the same time improving the end-user
experience. Whether you’re a CIO or IT manager looking for ways to improve efficiency or a
member of the down-in-the-trenches IT staff, you’ll find valuable concepts, methods, and
techniques for better managing your computing infrastructure.

Information Technology Infrastructure Library


The Information Technology Infrastructure Library (ITIL—http://www.itil.co.uk/) is a collection
of IT-related best practices. It is developed and maintained by the United Kingdom’s Office of
Government Commerce (OGC). ITIL was created to address the lack of standard
recommendations for managing IT resources. The goal of ITIL is to provide a framework and
guidelines that allow IT organizations to deliver high-quality services in a manageable way. The
original content was developed in the late 1980s and continues to be updated with improved
recommendations to support modern IT environments. The material is copyrighted by the UK
OGC and the information is available in a variety of different formats. ITIL has become one of
the most popular standards for IT-related best practices worldwide and is currently being used by
thousands of IT organizations.

Benefits of ITIL
Many IT organizations tend to operate in an ad-hoc and reactive fashion. They often respond to
issues after they occur, leading to problems such as downtime and lower quality of service
(QoS). In many cases, this scenario is understandable as IT organizations are often faced with
providing increased levels of service with insufficient staff and minimal budgets. Many
organizations either cannot afford to provide additional resources to IT departments or cannot
justify the reasons to increase investments.
On the surface, this problem might seem very difficult to solve. However, one approach—
increasing overall efficiency—can improve IT service delivery without requiring significant
expenditures. It is in this arena where the implementation of IT management best practices
comes in.
The recommendations included in ITIL were developed based on studies of methods used by
successful IT organizations worldwide. These approaches to solving common IT problems have
been compiled and organized into a set of recommendations. Although implementing ITIL
practices can take time and effort, most organizations will find that the potential benefits clearly
justify the cost. The following sections look at some of the potential ways in which
implementing ITIL practices can benefit IT operations.


Improving Levels of Service


The quality of an IT organization is often measured by its ability to respond to business-related
change requests and to provide reliability, availability, and performance. Many IT organizations
do not have an organized process for responding to new issues and requests, and some requests
simply “fall through the cracks.” ITIL prescribes ways in which organizations can improve
the reporting and management of problems and incidents. It helps IT organizations define how
particular problems should be addressed and how to communicate with end users. By better
managing these aspects of service delivery, IT departments can often identify potential areas for
improvement.

Reducing IT Costs
Many IT departments suffer from inefficiencies that lead to increased costs. Problems caused by
lack of communication, poor issue tracking, and ad-hoc changes can add up quickly. Often, IT
managers are unaware of the true costs of purchasing capital assets, configuring and deploying
new equipment, and maintaining this equipment. ITIL best practices include methods for
calculating true costs and for translating this information into business-related terms. This
information can then be used to make a strong case for investments in automation and other
labor-saving technologies.

Enforcing Well-Defined Processes


Policies and processes are crucial to a well-managed environment. When policies are
implemented and enforced, IT management can ensure that issues are dealt with consistently and
completely. ITIL recommendations provide suggestions for designing and implementing
successful processes.
Often, it seems that no matter how quickly issues are resolved, users expect even faster responses.
Through the use of SLAs, IT departments can communicate to users the type of response they
should expect for various problems. Developing an SLA is easier when service delivery is
managed through clearly defined processes.

ITIL Framework Content Organization


The total amount of information conveyed by ITIL encompasses hundreds of pages. To present
the information in a more manageable way, the content is divided into eight sets, each
of which focuses on a specific portion of the total framework. Figure 1 provides an overview of
the different sets and how they’re related. The most important point is that the box on the left
represents business requirements, and on the right is the actual technology. The ITIL framework
focuses on the content in between—the ways in which technology can be used to meet business
goals.


Figure 1: An overview of the ITIL framework.

Each set covers specific topics:


• Service Support (ISBN 0113300158)—An important aspect of IT operations is
determining how services are provided and how changes are managed. The beginning of
service operations is usually a request for a change from an end user, and the process
involves communicating with a service desk. It is the service desk’s responsibility to
ensure that the issue is documented and eventually resolved. Specifically, this area
includes problem management, incident management, configuration management, and
Help desk operations.
• Service Delivery (ISBN 0113300174)—Service Delivery focuses on defining and
establishing the types of services and infrastructure an IT department must provide to its
“customers.” Topics include creating SLAs, managing overall capacity, developing
availability goals, and determining financial management methods. These topics are
particularly useful for identifying the purpose and business value of IT. Both Service
Support and Service Delivery are parts of the overall Service Management topic.
• Planning to Implement Service Management (ISBN 0113308779)—Many organizations
quickly realize the value of using the ITIL approach and want to know how best to move
to this model. As few IT organizations have the luxury of starting completely from
scratch, it’s important to understand how to migrate to ITIL recommendations. This set
provides details about how an organization can develop a plan for implementing the best
practices suggested within the ITIL framework. It includes information about justifying
the use of ITIL (potential benefits). This area is an excellent starting point for IT
managers that are prepared to “sell” their organizations on the value of the ITIL
approach.


• Security Management (ISBN 011330014X)—In recent years, computer security has
moved to the forefront of issues for technical staff. As businesses store and provide larger
amounts of information, protecting that data has become a critical part of operations. This
set focuses on best practices for managing security throughout an IT organization.
• ICT Infrastructure Management (ISBN 0113308655)—The term Information
Communications Technology (ICT) refers to traditional computer-based resources such
as workstations and servers as well as the applications that they run (for example, office
productivity suites, accounting packages, and so on). The acronym ICT (which is not
widely used in the United States) generally refers to the end purpose of IT infrastructure.
This volume focuses on managing computing resources, including network service
management, operations management, and the installation and management of computing
resources.
• The Business Perspective (ISBN 0113308949)—It is important for both business leaders
and technologists to understand the overall benefits that can be provided by IT. This set
focuses on ways in which IT can meet requirements through managing changes,
establishing business continuity practices, and working with outside help through
outsourcing. These topics are all critical to the business value of an IT environment.
• Application Management (ISBN 0113308663)—The primary purpose of IT infrastructure
is to support the software that is required by users to perform their job functions. This set
covers best practices related to managing the entire application life cycle, beginning with
gathering and documenting the business requirements for the software. Although this
topic is particularly helpful for organizations that develop custom applications, the
practices are also useful for evaluating and implementing third-party products.
• Software Asset Management (ISBN 0113309430)—Managing applications throughout an
entire IT environment can be a daunting and time-consuming task. Furthermore, the
process must be ongoing as new programs are frequently added or updated. This set
describes best practices for creating an inventory of software applications and managing
the installed base. The topic enables IT to accurately track licensing compliance and to
ensure that purchasing does not exceed requirements.
The content is applicable to many different levels within an IT organization, ranging from CIOs
to systems administrators; it can also be helpful for business management professionals. From an
implementation standpoint, the ITIL framework is intended to provide a set of flexible
guidelines. There is definitely room for interpretation of the specific best practices, and it’s up to
IT management to determine the best way to implement the recommendations. It is important to
note that many of these areas are interrelated and the ideal infrastructure will take advantage of
all the best practices presented in the framework.


ITIL Compliance
In some cases, organizations might find that they’re already following at least some of the ITIL
practices (regardless of whether they have consciously done so). Using ITIL’s methodology and
recommendations can give structure to these efforts. In other cases, IT departments may be able
to benefit greatly from implementing the features of the framework.
Unlike some other business-related standards, there is no official certification or testing process
that can “approve” an organization’s use of ITIL. It is up to organizations to determine the best
methods and approaches for implementing these practices in their environments. There are,
however, voluntary certifications for individual practitioners: the Foundation Certificate, the
Practitioner’s Certificate, and the Manager’s Certificate (see Table 1 for resources related to
these certifications).

ITIL Content and Resources


The ITIL content is copyrighted and can be obtained through books, CD-ROMs, or licensed
intranet content. Many online book resellers offer the books and related media (they’re easiest to
find using the ISBNs listed earlier in this topic).
Table 1 provides a list of good online starting points for additional information. Additionally,
numerous independent books and papers have been written. Each of these focuses on one or
more of the topics presented by the ITIL framework. A web search for “ITIL” or any of the
specific content topics will also uncover numerous vendors and publishers that offer related
content. In addition, professionals that are looking for more information can join one of many
different online forums and professional organizations dedicated to the ITIL methodology.
• Office of Government Commerce ITIL Information site
  Provides an overview of the purpose and function of ITIL.
  http://www.ogc.gov.uk/index.asp?id=1000367
• ITIL “Open Guide”
  An open source-based version of basic ITIL content. The site provides resources that help
  define and organize the various terms and concepts used by the ITIL framework.
  http://itlibrary.org/
• IT Service Management Forum (itSMF)
  An independent, not-for-profit organization that focuses on IT best practices.
  http://www.itsmf.com/index.asp
• ITIL Community Forum
  A portal for ITIL-related information, including a discussion forum and links to various
  ITIL resources.
  http://www.itilcommunity.com/
• ITIL Certification Register
  A voluntary registration site for IT professionals who use the ITIL methodology.
  http://www.itlibrary.org/index.php?page=ITIL_Certification_Register

Table 1: ITIL-Related Web Sites.


The Business Value of Data Center Automation


Over time, modern businesses have grown increasingly reliant on their IT departments.
Networked machines, multi-tier applications, and Internet access are all absolute requirements in
order to complete mission-critical work. However, in many organizations, the clear business
value of IT is difficult to estimate. Unlike departments such as sales and marketing, there are
often few metrics available for quantifying how IT benefits the bottom line. Part of the reason for
this disparity is that IT departments have evolved out of necessity and have a history of
filling a utilitarian role. Instead of presenting clear business value propositions, they tend to grow
as needed and react to changing business requirements as quickly as possible. In many cases, this
situation has caused IT budgets to shrink even while organizations are placing a greater burden
on IT staff. Furthermore, business units often see IT as out of touch with the rest of the business.
To ensure success for modern companies, it’s critical that all areas of the business recognize
common goals and that all contribute toward achieving them. It’s difficult to deny the basic
business value of IT departments, but the quandary that emerges revolves around how to
measure, quantify, and communicate those benefits to business decision makers. This guide
looks at the specific business benefits of IT, including details related to measuring benefits and
costs. It then explores how data center automation can help increase the overall value that IT
departments provide to their organizations.
Basic Benefits of IT
Practically everything that a business does relies upon its underlying computing
infrastructure. Accordingly, IT departments’ internal “customers” expect a certain level of
service. They depend upon IT to perform various functions, including:
• Maintaining the infrastructure—If asked what their IT departments do, many end users
would point to the computing infrastructure: setting up workstations and servers, keeping
systems up-to-date, and installing and managing software. Reliable and high-performance
Internet connectivity has become almost as vital as electricity; without the Internet, many
business functions would cease. IT is responsible for implementing and maintaining an
efficient infrastructure that supports these requirements.
• Reacting to business changes—New business initiatives often place new (or at least
different) requirements on the computing infrastructure. For example, a new marketing
campaign might require new applications to be deployed and additional capacity to be
added. Alternatively, an engineering group might require a new test environment in order
to support the development of a new product. Usually, there is an organized process to be
followed whenever an employee starts or leaves the company. These changes often need
to be made as quickly as possible and in a cost-efficient manner.
• Troubleshooting—From the Help desk to critical network and server support, the service
desk is often the first point of contact with IT for users that are not able to do their jobs.
Users rely on these resources to quickly and efficiently resolve any issues that arise.
These benefits of IT generally point to tactical, maintenance-related operations. When
enumerating the benefits of IT, the first metrics that come to mind are often
those involving reliability, availability, and performance. Although these are certainly important
considerations, they do not necessarily demonstrate the strategic advantage of how IT initiatives
and projects can contribute to the bottom line. Consequently, it’s easy to simply look at IT as just
a cost center. Regardless of whether end users realize it, IT departments do much to help their
organizations.


Calculating the Value of IT


As with any business department, it’s important for management at all levels to see the benefits
that are provided to the business as a whole. In some cases, this can be quite simple. For
example, there are many metrics that can be used to measure sales and marketing performance.
Most organizations realize that in addition to these other business areas, IT is a vital portion of
operations.
To calculate the business value of IT, organizations should establish well-defined metrics that
reflect the overall business benefit of the computing infrastructure. The information required to
do so extends far beyond the boundaries of the IT organization. Instead, it must involve all areas
of the business, ranging from end users to executive management. The goal is to demonstrate
how IT affects the business.

Identifying Costs
An important consideration for IT management is to be able to calculate and clearly
communicate the real costs and benefits of the services that they provide. This identification
usually starts with determining the Total Cost of Ownership (TCO) of a specific portion of the
infrastructure. Often, when business leaders think of the costs related to increasing capacity, they
think only of capital expenditures (such as the purchase price of a workstation or a server). In
most environments, however, this cost represents only a very small portion of the total cost. IT
departments must add in network-related costs, labor costs (for installation, configuration, and
management), software licensing costs, and depreciation.
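The point that the purchase price is only a small fraction of the total can be made concrete with a rough tally. The sketch below is a minimal illustration only: the cost categories follow the ones named above, but every dollar figure is invented for the example and does not come from this guide.

```python
# Hypothetical three-year TCO tally for a single server. The cost categories
# follow the ones named in this section; every dollar figure is invented
# purely for illustration. (Depreciation is left out here, since it spreads
# the capital cost over time rather than adding a separate expense.)
costs = {
    "capital (purchase price)": 4000.00,
    "network (ports, cabling, bandwidth share)": 600.00,
    "labor (install, configure, manage)": 5400.00,
    "software licensing": 2500.00,
}

tco = sum(costs.values())
for category, amount in costs.items():
    print(f"{category:<45} ${amount:>10,.2f}")
print(f"{'three-year TCO':<45} ${tco:>10,.2f}")

# The purchase price is only a fraction of the total cost of ownership:
capital_share = costs["capital (purchase price)"] / tco
print(f"capital share of TCO: {capital_share:.0%}")
```

Even in this toy example, the capital outlay accounts for only about a third of the total, with labor dominating; this is exactly the kind of breakdown that helps business leaders understand the true cost of adding capacity.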
Often, just the act of collecting this information can provide visibility into an IT department’s
purpose and structure. It can also be very useful for identifying areas of improvement. Most
importantly, however, when true costs are communicated, other areas of the business can begin
to understand how their operations affect the overall finances of the company.

Discovering Business Benefits


Members of IT organizations have a tendency to think of the value of their services from a
technical standpoint. It’s easy to look at server racks or performance statistics as evidence of a
job well done. However, the best measurements of the value of IT involve the real impacts these
measures have had on the business. For example, suppose that a new test lab environment has
helped the Quality Assurance department reduce testing time by 25 percent; this metric is an
important one for business leaders to recognize. Similarly, if the implementation of new
anti-spam measures has increased productivity (if through nothing more than decreasing the
negative productivity impact of spam), it’s important to capture this information.
Enumerating business benefits requires strong communications with other areas of the
organization. A good first step is to identify which areas of the business are directly benefiting
from technology. IT leaders must understand how new infrastructure components such as servers
and workstations are being used. They must be sure that the implemented solutions closely fit the
problem. Based on this data, establishing metrics related to employee productivity and business
results (such as sales improvements) can be directly tied back to IT initiatives and projects.

8
The Reference Guide to Data Center Automation

Communicating Strategic Business Value


Once the costs and benefits have been identified, business and technical leaders can realize how
IT performs a strategic function—not just an operational one. In order to communicate strategic
business value, IT departments should focus on overall business goals. For example, a key goal
for a software company might be to reduce development time and shorten release cycles. The
implementation of a new server or technology (such as server virtualization) can often provide
dramatic benefits. If mobile sales personnel are having problems entering orders while on the
road, improvements to the network infrastructure and better training might help alleviate the
pain.

Improving the Business Value of IT


Once IT is identified as a critical part of business operations, the entire organization can work
together to improve overall value. This step often starts with the planning phases for new
projects and new initiatives. When business leaders can clearly see the benefits of investing in
their IT departments, overall business performance improves.
Decisions should be based on cost-benefit analysis calculations. Although this task might seem
simple on the surface, it can actually require a significant amount of information. The costs
related to processes should be as accurate as possible and should be based on capital asset costs
(including servers, workstations, network devices, and software) as well as personnel costs.
Additionally, there can be many “hidden fees,” including opportunity costs. Because IT
resources are often stretched to the limits, a new project or initiative might result in labor and
resource reductions for normal operations. All of these costs and potential tradeoffs should be
clearly communicated to business decision makers so that they can make informed decisions
about projects. When taking on new projects and initiatives, departments can work together with
IT to determine the best approach.
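One hedged way to frame such a cost-benefit calculation is a simple payback-period estimate. The dollar figures below are invented for illustration, and the monthly cost is assumed to already fold in "hidden fees" such as opportunity costs:

```python
import math

def payback_months(upfront_cost, monthly_benefit, monthly_cost):
    """Whole months until cumulative net benefit covers the upfront cost.

    Returns None when the project never pays for itself.
    """
    net = monthly_benefit - monthly_cost
    if net <= 0:
        return None
    return math.ceil(upfront_cost / net)

# A hypothetical automation project: $24,000 upfront, $5,000/month in labor
# savings, $2,000/month in ongoing costs (including the opportunity cost of
# staff pulled away from normal operations):
print(payback_months(24000, 5000, 2000))  # 8
```

A payback period is only one lens; organizations with formal financial processes may prefer NPV or IRR calculations.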

The Value of Data Center Automation


So far, we’ve seen how a major component of overall IT costs and overall service levels relate to
labor. It takes time and effort to maintain even small IT environments, and these factors can
clearly affect the bottom line. One initiative that can provide clear benefits and a quick return on
investment is data center automation. Data center automation solutions can dramatically reduce
charges for one of the most expensive resources—labor. Tools and features that allow for
automated deployment, provisioning, change management, and configuration tracking provide an
excellent payoff.


For example, a common challenge for most IT environments is that of keeping systems up to
date. Managing security patches and other software changes can easily use up large amounts of
time. Furthermore, the process tends to be error-prone: It’s easy for systems administrators to
accidentally overlook one or a few systems. Through the use of data center automation, the same
tasks can be performed in much less time with far less involvement from IT staff. This provides
numerous benefits, including freeing systems administrators to work on other tasks. Often,
automation increases the server-to-administrator ratio and reduces the amount of time required to
perform operations. Other benefits include improved consistency, the enforcement of policies
and processes, and improved security. Additionally, by implementing best practices (such as
those provided with the ITIL), efficiency and operational reliability can improve.
The bottom line is that data center automation can significantly improve the business value of IT.
By reducing costs and removing data center-related bottlenecks, data center automation enables
IT and business leaders to focus on more important tasks. The entire organization will be able to
react more quickly and surely to changes, providing both strategic and tactical advantages to the
entire enterprise.

Implementing Charge-Backs
A major problem for some IT organizations is that various departments often compete for
infrastructure resources such as new servers or workstations. IT managers are often in the
difficult position of deciding which projects are approved based on their limited resources and
budgets. This can lead to an adversarial relationship and to some less-than-ideal decisions.
One potential solution is to implement a system of charge-backs. In this system, the IT
department would pass costs for various projects back to the departments that request them. The
charges would affect these departments’ bottom lines. The idea is that business leaders will be
much more judicious in their decisions when they directly experience the costs to the business.
Although implementing and managing charge-backs can increase administration overhead, the
overall cost savings can justify it. Of course, in order for this system to be successful,
cooperation from the entire organization must be obtained.
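A minimal sketch of the charge-back idea, assuming usage can be measured in some common unit (server-hours, seats, or tickets); the department names and numbers are hypothetical:

```python
def charge_back(total_cost, usage_by_dept):
    """Split a shared IT cost among departments in proportion to usage."""
    total_usage = sum(usage_by_dept.values())
    return {dept: round(total_cost * usage / total_usage, 2)
            for dept, usage in usage_by_dept.items()}

# Allocate a $90,000 shared cost by percentage of measured usage:
charges = charge_back(90000, {"Sales": 40, "Engineering": 50, "Marketing": 10})
print(charges)  # {'Sales': 36000.0, 'Engineering': 45000.0, 'Marketing': 9000.0}
```

Proportional allocation is the simplest model; tiered or fixed-fee-plus-usage schemes are common variations.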

Enabling Better Decisions


IT can leverage business value data to help the entire organization make better decisions. For
example, when considering ways in which to improve organizational efficiency, IT initiatives
can play a pivotal role in controlling costs and adding capabilities. A well-managed IT
department will have standards and processes in place to ensure that all aspects of the
environment are properly managed. This can help answer important questions, such as “Are
resources being allocated optimally?” and “Are the right projects being worked on?” With this
new view, businesses can clearly see the IT department as a strategic partner instead of just a
cost center.


Service Provider
Modern organizations often rely upon many vendors and outside resources to meet business
objectives. For example, a marketing group might recruit outside talent to develop a Web site or
to work on creative aspects of a new campaign. Alternatively, engineering groups might rely on
outsourcing to contractors or consultants to build a portion of a product. IT departments,
however, are often seen as cost centers that provide only basic infrastructure services. By treating
IT departments as service providers, however, a strategic relationship can be established, and IT
can be seen as a business partner.

Benefits of Operating IT as a Service Provider


The value of a service provider is often measured by its ability to help its customers reach their goals. In this arena, customer service is paramount. When the IT department serves its customers in this arrangement, both parties can work together to ensure that the best projects and solutions—those that provide the most value to the individual business units—are delivered.
When IT works as a service provider, it should act like an independent business. Its “customers”
are the end users and departments that it serves, and its “products” are the services and
technology solutions that are provided for use by the customers. Although this concept might at
first seem like a strange approach for an internal department, there are many potential benefits.
First, IT services are better communicated so that end users know what to expect (and what to do
if expected service levels are not met). Second, all areas of the business can see how IT
operations are helping them achieve their objectives.

Implementing the Service Provider Model


There are several aspects that must be taken into consideration before an internal IT department
can be seen as a business partner. This section will look at parts of the overall approach of
becoming a service provider to internal customers.


Identifying Customers’ Needs


A good salesperson always works hard to determine a customer's needs. Salespeople who truly believe in their products can quickly identify which ones are relevant and which will provide the best benefit. For IT as a service provider, this process can start with meetings with individual
department leaders as well as end users that might have specific requirements. The overall goal
for the service provider is to focus on the business goals of the customer, and not on technology
itself.
The first step is to identify the primary purpose of the department. This includes details related to
how success is measured and approaches to achieving the success. The details will likely differ
dramatically between, for example, sales and engineering organizations. Next, it is important to
identify current “pain points”—problems or limitations that are reducing the level of success.
Based on this input, IT service providers can develop proposed solutions that address those
issues.
As with a pre-sales effort, it’s important for IT to gather as much information as possible early in
the game—well before any implementation has been discussed. During this phase, it’s important
to identify functionality that is absolutely required as well as items that are not required but
would be nice to have. If there is any ambiguity at this point, details and risks should be
identified. Important high-level questions to ask include whether the benefits justify the costs and
whether business demands truly present a need for the solution.

Determining “Product Pricing”


IT organizations should come up with complete pricing for their products and solutions. This
pricing scheme should include details related to capital asset charges (including hardware,
software, and network costs) as well as labor-related costs. Higher costs might be incurred by
using non-standard hardware or software. Presenting such costs will help the customer determine
whether a particular solution is cost-effective for their department and whether it benefits the
organization as a whole. Additionally, other factors (such as a lack of personnel or when other
high-priority projects are underway) should also be communicated to the customer.

Identifying Service Delivery Details


Once a customer has agreed to purchase a specific product or service from the IT department, it’s
time to look into the implementation details. It’s important to identify the key stakeholders and to
establish points of contact on the IT side and on the customer side. The goal should be to identify
who is responsible for which actions.
Milestones should be designed and mutually agreed upon before moving forward. Also,
processes for managing changing requirements will help eliminate any surprises during the
implementation of the solution. For larger projects, a change management process should be
created, complete with approval authority from the customer and the service provider.


Measuring Service Levels


An IT service provider can create products of various types. Some might be closed-ended
initiatives, such as the installation of a Customer Relationship Management (CRM) solution, or
the expansion of a development test lab. In those cases, service levels can be measured based on
milestones and the quality of the implementation. Stakeholders can sign off on project
completion just as they would with external vendors.
Other products might involve expected levels of service. For example, when new servers and
workstations are added, customers should know what type of response to expect when problems
occur. Service Level Agreements (SLAs) can be instrumental in developing mutually agreed-
upon expectations. For less-critical systems, longer turnaround times might be acceptable. For
mission-critical components, greater uptime and quicker response might be justified. Of course,
those services will likely come at a higher cost because they will involve additional staff
allocation, the purchase of high-availability solutions, and other features.
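Uptime-based SLA compliance reduces to simple arithmetic. The following sketch shows how a measured month might be checked against an agreed-upon level; the targets and downtime figures are illustrative:

```python
def uptime_percent(total_minutes, downtime_minutes):
    """Percentage of the measurement window during which service was up."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes

def meets_sla(total_minutes, downtime_minutes, target_percent):
    """True if measured uptime meets or exceeds the agreed-upon target."""
    return uptime_percent(total_minutes, downtime_minutes) >= target_percent

# A 30-day month has 43,200 minutes; a 99.9% target allows ~43 minutes down.
print(meets_sla(43200, 40, 99.9))  # True
print(meets_sla(43200, 50, 99.9))  # False
```

Real SLAs usually also define exclusions (planned maintenance windows) and per-incident response times, which this sketch ignores.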

Prioritizing Projects
All businesses face limits on how much they can produce, and IT departments are no exception. Based on labor capacity and technical constraints, only some of the proposed projects might prove feasible. In the traditional IT arrangement, departments often have to
compete for infrastructure resources. Often IT departments are faced with the difficult situation
of deciding which projects should continue and which simply cannot be taken on.
However, when IT works as a service provider, the vendor and customer can work together to
determine what is best for the business overall. If a particular implementation is extremely
costly, both can decide to hold off until more resources become available. However, if multiple
projects are similar and efficiency can be gained by combining them, the business will see an
overall benefit.

Network Configuration Management


When things are working properly, most users barely realize that the network is there. But when
network problems cause downtime, the costs to business operations can be tremendous. Still, IT
organizations are faced with the difficult task of managing increasingly complex and distributed
networks with limited staff and resources.
Although configuring and managing network devices is a task of critical importance, it can be
very difficult to perform accurately and consistently. Network Configuration Management
(NCM) refers to the use of an automated method to configure and manage network devices
throughout an IT environment.


NCM Tasks
The act of managing the components of a network can place a significant burden on IT staff. The
process starts with the deployment of new routers, switches, firewalls, and other devices. New
hardware has to be purchased and configured before it’s brought online. The deployment must be
tested, and network administrators must verify that it is working according to the network
guidelines. And that’s just the beginning.
Maintenance operations include regularly updating to the latest available security patches. Other
routine maintenance functions involve changing passwords and updating configurations.
Business-related changes can often require significant upgrades or modifications to the network infrastructure, and adding capacity is a regular task in growing organizations. The goal of configuration management is to respond rapidly to change requests that range from opening a single firewall port to redesigning entire subnets—without introducing new problems to the environment.

Configuration Management Challenges


There are many challenges that are related to managing network configurations. Some of these
challenges include:
• Making configuration changes—In all but the smallest of network environments, the time
and effort required to manually modify configuration settings on dozens or hundreds of
devices can be a tedious, time-consuming, and error-prone task.
• Enforcing processes—In many IT environments, it’s very easy to perform technical
operations in an ad-hoc manner. Due to the stress and pressure of reacting to business
demands, network administrators often take shortcuts and directly make modifications.
Although this might produce what seems like a quick response, it can lead to serious problems in network configurations later. Clearly, processes must be enforced.
• Adhering to best practices—Network security best practices include frequently changing
passwords, ensuring that patches are applied quickly, and monitoring devices for
suspicious activity. Often, due to time and resource limitations, these tasks are lowered in
priority. However, ensuring consistent configurations and adhering to change control
processes are critical for reliability of the network.
• Communication and coordination—Network administrators might understandably make a
change to resolve an urgent situation. Once the situation is resolved, however, they might
fail to communicate this to their peers. Should a problem occur in the future, this can
complicate tracking down the root cause of the issue. Distributed administration can also
cause problems. Although it’s often necessary for multiple network administrators to
have access to the same devices, when two or more administrators modify a device, they may inadvertently overwrite each other's changes. Such "collisions" can lead to complex problems that are difficult to troubleshoot.
Regardless of the amount of work involved, IT departments are often limited in labor resources
to perform these tasks. That is where automation steps in.


NCM Solutions
Automated NCM solutions can help address many of the challenges related to maintaining a
network infrastructure. The key feature of an automated NCM solution is that all modifications
are made through the system. Ideally, network administrators do not have direct access to the
actual device configurations themselves. All modifications must occur based on a specific
workflow and changes are tracked for later review (see Figure 2).

Figure 2: Configuration management using an NCM solution.

Benefits of Automating NCM


The list of benefits related to using an automated NCM solution is a long one. Specifically, NCM
solutions provide:
• Improved efficiency—Manually configuring routers, switches, and other devices can take
a significant amount of effort. Some important changes might simply require too much
time and effort and may never be performed at all. Automated solutions can handle
changes for hundreds of devices without requiring manual intervention. Network
administrators can use the time that they save to focus on other tasks that better use their
skills. The end result is that the network infrastructure can be more reactive to business
changes, and costs can be lowered.
• Policy enforcement—In a manually managed environment, it’s up to each individual to
be responsible for adhering to processes. It’s often difficult to remember all the
processes, and in some cases, network administrators might take shortcuts. Related
problems can be difficult to isolate and resolve. Through the use of automated
configuration management, IT managers can be assured that all changes are coordinated,
tracked, and done in accordance with the defined policies.
• Automated network discovery—Modern networks tend to be very complex and have
hundreds or even thousands of devices that must be accounted for and managed. It’s
understandably easy to overlook important pieces of the infrastructure. Automated
solutions aid in the process of collecting information and can store and display data about
the environment without requiring a manual scavenger hunt. This setup helps prevent
surprises when managing the entire environment.

• Improved security—Neglecting to keep network infrastructure devices up to date can lead to security violations or reliability issues. Automated NCM solutions can quickly identify and resolve any maintenance- or configuration-related problems according to company policies.
• Configuration consistency—When dealing with complex environments, consistency is an
important factor. Without automation, it’s very easy for human error to creep into
configuration files. Ad-hoc changes are difficult to detect, and a less-than-ideal
configuration may persist for months or years. In the worst case, the configuration
problem will be detected only after a security violation or downtime is experienced.
Making the change process easier can also avoid putting off important modifications
simply because of the amount of effort required. The improved responsiveness means
that significant changes can be performed with minimal disruption to the business.
• Backup and recovery—Network device configurations can be complex and vital to the
proper operations of a business. An automated configuration management tool can
regularly collect configuration information for an entire network environment and store it
securely. In the event of a device failure, the configuration can be quickly restored,
reducing downtime and the loss of setup details.
• Monitoring—Network performance is a critical aspect of business operations in many
environments. NCM tools can regularly measure performance statistics throughout an
environment and can report on any potential problems—often before users even notice
delays.
• Auditing and reporting—Various business processes can benefit from visibility into the
entire infrastructure that is supported by an organization. Auditing allows network
administrators to compare the intended configuration of a device with its actual
configuration. For organizations that must adhere to regulations such as the Sarbanes-
Oxley Act or the Health Insurance Portability and Accountability Act (HIPAA), auditing
can significantly help in proving compliance. Any inconsistencies can be quickly
identified and resolved. Additionally, reporting provides IT managers with the ability to
gain insight into what has been deployed along with how it’s being used.
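To illustrate the auditing benefit, here is a minimal sketch that compares an intended device configuration against the one actually running, using Python's standard difflib module; the hostnames and settings are hypothetical:

```python
import difflib

def config_drift(intended, actual):
    """Return unified-diff lines showing where the running configuration
    deviates from the approved one (an empty list means compliant)."""
    return list(difflib.unified_diff(
        intended.splitlines(), actual.splitlines(),
        fromfile="intended", tofile="actual", lineterm=""))

intended = "hostname edge-rtr-01\nntp server 10.0.0.5\nlogging host 10.0.0.9\n"
actual   = "hostname edge-rtr-01\nntp server 10.0.0.7\nlogging host 10.0.0.9\n"

# Print any drift between the approved and running configurations:
for line in config_drift(intended, actual):
    print(line)
```

An NCM product performs the same comparison at scale, against a version-controlled repository, and typically highlights the differences visually.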

Choosing an NCM Solution


The many benefits of using an automated NCM solution are difficult to overlook, but this leads
to the question of how to choose the best product. First and foremost, an NCM solution should
allow IT managers to define and enforce change control policies and processes. The solution
should ensure that changes can be made only by authorized individuals and only after the
appropriate steps (such as peer review) have been taken. All changes should be tracked, and the
solution should provide ways for auditing settings to ensure that everything is working as
desired. The solution should also provide for automatic backup and restore of configuration
data—automatic backups so that the latest authorized configuration is always in safe storage, and
automated restore to, for example, roll back after a failed change deployment or after an
unauthorized change deployment.
Organizations should look for solutions that use a centralized configuration management database and that allow for tracking details of other computing resources such as workstations
and servers. With all these features, the otherwise difficult task of maintaining a network
infrastructure can become simpler and much more efficient.

Selecting an NCM solution can be complicated. Your business' own requirements—both technical and procedural—will define the exact feature set you need. However, there are some
requirements for which most businesses and organizations can find common ground:
• Security—An NCM solution should provide granular, role-based security so that each
individual using the system can have exactly the permissions they need and no more.
This includes the ability for auditors (for example) to review configuration information
but not to make changes. Authentication should be centralized and, when appropriate,
integrated with any existing directory or authentication (such as TACACS+) service you
have in place.
• Configuration repository—The solution should provide a version-controlled repository,
enabling the retrieval of past versions of a device’s configuration. Capture of
configuration data into the repository should be made as part of change deployments, on
a regular basis, and on-demand.
• Logging—All activity should be logged. This should include pass-through Telnet or SSH
activity, where such logging usually takes the form of keystroke logging so that all
administrative activity can be properly audited.
• Workflow enforcement—If your company has a process for managing change, the
solution should help enforce that process. For example, solutions should be able to
enforce a peer review or managerial approval requirement before allowing changes to be
deployed.
• Notification—The solution should have built-in notification capabilities covering unauthorized changes, successful deployments, and other events. Such capabilities help alert your IT staff to problems that need their attention or to recent events that might require manual follow-up or verification.
• Configuration policies and remediation—When possible and desirable, you might want a
solution that is capable of analyzing device configurations and comparing them with a
standard configuration template or policy. By alerting you to nonstandard configurations,
the solution can help you identify devices that do not meet, for example, security or
compliance requirements. Automated remediation goes a step further by automatically
reconfiguring non-compliant devices to meet your configuration standards.
• Configuration comparison—The solution should provide the ability to compare different
versions of a device’s configuration, visually highlighting differences for quick
identification and review.
• Automation—When possible, the solution should respond automatically to configuration
events such as reconfigurations that occur outside the solution. This support might derive
from syslog or Simple Network Management Protocol (SNMP) monitoring or through
other means.
• Multiple vendor support—A solution should support, of course, every brand and model
of device you have in operation. Further, the solution should be architected in a way that
facilitates easy addition of additional device support, helping make the solution “future
proof.”
By using these broad requirements as a starting point, you can begin to identify key features and
capabilities that are important to your organization and conduct pilot programs and product
evaluations to locate products and solutions that meet your specific needs.
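The policy-and-remediation requirement described above can be pictured as rule checking. The following hedged sketch flags a configuration that violates a simple policy; the pattern syntax and example settings are invented for illustration and are not taken from any particular vendor:

```python
import re

def check_policy(config_text, required_patterns, forbidden_patterns):
    """Return a list of human-readable violations for one device config."""
    violations = []
    for pat in required_patterns:
        if not re.search(pat, config_text, re.MULTILINE):
            violations.append(f"missing required setting: {pat}")
    for pat in forbidden_patterns:
        if re.search(pat, config_text, re.MULTILINE):
            violations.append(f"forbidden setting present: {pat}")
    return violations

# A hypothetical config missing one required setting and containing one
# forbidden one:
config = "hostname edge-rtr-01\nip http server\nlogging host 10.0.0.9\n"
print(check_policy(
    config,
    required_patterns=[r"^logging host ", r"^service password-encryption"],
    forbidden_patterns=[r"^ip http server"]))
```

Automated remediation would go one step further, pushing a corrected configuration whenever this kind of check reports violations.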


Server Provisioning
Most IT users recognize that one of the most important—and visible—functions of their IT
departments is setting up new computers. Server provisioning is the process of readying a server
for production use. It generally involves numerous tasks, beginning with the purchase of server
hardware and the physical racking of the equipment. Next is the important (and tedious) task of
installing and configuring the operating system (OS). This step is followed by applying security
patches and OS updates, installing any required applications, and performing security
configuration.
When done manually, the entire process can be time consuming and error prone. For example, if
a single update is overlooked, the server may be vulnerable to security exploits. Furthermore,
even in the smallest IT environments, the task of server provisioning is never really “done”—
changes in business and technical requirements often force administrators to repurpose servers
with new configuration settings and roles.

Challenges Related to Provisioning


Modern OSs are extremely flexible and complicated pieces of software. They have hundreds of
configurable options to meet the needs of various roles they may take on. Therefore, the process
of readying a new server for production use can involve many different challenges. Some of
these include:
• Configuring OS options—New servers should meet corporate technical and business
standards before they’re brought online. Ensuring that new machines meet security
requirements might involve manual auditing of configurations—a process that is neither
fun nor reliable. Other important settings include computer names, network addresses,
and the overall software configuration. The goal should be to ensure consistency while
minimizing the amount of effort required—two aspects that are not usually compatible.
• Labor-related costs—Manual systems administration tasks can result in large costs for
performing routine operations. For example, manually installing an OS can take hours,
and the potential for errors in the configuration is high.
• Support for new platforms—Provisioning methods must constantly evolve to support new
hardware, OS versions, and service packs. New technologies, such as ultra-dense blade
server configurations and virtual machines, often require new images to be created and
maintained. And, there is always a learning curve and some “gotchas” associated with
supporting new machines.
• Redeployment of servers—Changing business requirements often necessitate that servers
be reconfigured, reallocated, and repurposed. Although it is difficult enough to prepare a
server for use the first time, it can be even more challenging to try to adapt the
configuration to changing requirements. Neither option (reconfiguration or reinstallation)
is ideal.
• Keeping servers up to date—The installation and management of security updates and OS
fixes can require a tremendous amount of time, even in smaller environments. Often,
these processes are managed on an ad-hoc basis, leading to windows of vulnerability.


• Technology refreshes—Even the fastest and most modern servers will begin to show their
age in a matter of just a few years. Organizations often have standards for technology
refreshes that require them to replace a certain portion of the server pool on a scheduled
basis. Migrating the old configuration to new hardware can be difficult and time
consuming when done manually.
• Support for remote sites—It’s often necessary to support remote branch offices and other
sites that might require new servers. Sometimes, the servers can be installed and
configured by the corporate IT department and then be physically shipped. In other cases,
IT staff might have to physically travel between sites. The costs and inefficiencies of this
process can add up quickly.
• Business-related costs—As users and business units await new server deployments, there
are often hidden costs associated with decreases in productivity, lost sales opportunities,
and associated business inefficiencies. These factors underscore the importance of quick
and efficient server provisioning.
Clearly, there is room for improvement in the manual server-provisioning process.

Server-Provisioning Methods
Many OS vendors are aware of the pain associated with deploying new servers. They have
included numerous tools and technologies that can make the process easier and smoother, but
these solutions also have their limitations. To address the challenges of server provisioning, there
are two main approaches that are typically used.

Scripting
The first is scripting. This method involves creating a set of “answer files” or scripts that are
used to provide configuration details to the OS installation process. Ideally, the entire process
will be automated—that is, no manual intervention is required. However, there are some
drawbacks to this approach. First, the process of installing an OS can take many hours because
all the hardware has to be detected and configured, drivers must be loaded, hard disks must be
formatted, and so on. The second problem is that the scripts must be maintained over time, and
they tend to be “fragile.” When hardware and software vendors make even small specification
changes, new drivers or versions might be required.
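The answer-file approach can be sketched as template rendering: per-server values live in data, and a script fills one template per machine. The file format, keys, and package names below are invented for illustration; real answer files (such as kickstart or unattend files) have vendor-specific formats:

```python
from string import Template

# Hypothetical answer-file template; values that must vary per machine
# are placeholders, everything else is shared.
ANSWER_TEMPLATE = Template(
    "hostname=$hostname\n"
    "ip_address=$ip_address\n"
    "timezone=UTC\n"
    "packages=base,ssh,monitoring-agent\n"
)

def render_answer_file(server):
    """Fill the template; raises KeyError if a required value is missing."""
    return ANSWER_TEMPLATE.substitute(server)

print(render_answer_file({"hostname": "web-01", "ip_address": "10.1.0.11"}))
```

Keeping the variable data separate from the template is one way to reduce the "fragility" noted above: when a format change is needed, only the single template is edited.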

Imaging
The other method of automating server provisioning is known as imaging. As its name suggests,
this approach involves performing a base installation of an OS (including all updates and
configuration), then simply making identical copies of the hard disks. The disk duplication may
be performed through dedicated hardware devices or through software. The major problems with
this approach include the creation and maintenance of images. As the hardware detection portion
of OS installation is bypassed, the images must be created for each hardware platform on which
the OS will be deployed. Hardware configuration changes often require the creation of new
images. Another problem is in managing settings that must be unique, including OS security
identifiers (SIDs), network addresses, computer names, and other details. Both approaches
involve some important tradeoffs and neither is an ideal solution for IT departments.
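The uniqueness problem can be illustrated with a short sketch: a shared base configuration is duplicated for each clone, and the settings that must differ are filled in per machine. All names and values here are invented for illustration.

```python
# Hypothetical sketch: after duplicating a disk image, each clone still
# needs unique settings (computer name, network address, and so on).
import copy

BASE_IMAGE_CONFIG = {
    "os_version": "base-image-v1",
    "hardware_profile": "vendor-x-blade",   # images are hardware-specific
    "computer_name": None,                  # must be made unique per clone
    "ip_address": None,                     # must be made unique per clone
}

def personalize(base: dict, computer_name: str, ip_address: str) -> dict:
    """Copy the shared image configuration and fill in per-machine values."""
    cfg = copy.deepcopy(base)
    cfg["computer_name"] = computer_name
    cfg["ip_address"] = ip_address
    return cfg

clones = [
    personalize(BASE_IMAGE_CONFIG, f"app{n:02d}", f"10.0.1.{10 + n}")
    for n in range(1, 4)
]
```

Note that the hardware profile is baked into the base image, which is exactly why a new hardware platform forces the creation of a new image.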

19
The Reference Guide to Data Center Automation

Evaluating Server-Provisioning Solutions


Automated server-provisioning tools allow IT departments to quickly and easily define server
configurations, install OSs, perform patches and updates, and get computers ready for use as
quickly as possible. When looking for an automated server-provisioning system, there are many
features that might help increase efficiency and better manage the deployment process. Features
to look for in an automated provisioning solution include:
• Broad OS compatibility—Ideally, a server-provisioning solution will support all of the
major OSs that your environment plans to deploy. Also, continuing updates for new OS
versions and features will help “future-proof” the solution.
• Integration with other data center automation tools—Server provisioning is often the first
step in many other related processes, such as configuration management and asset
tracking. A deployment solution that can automatically integrate with other IT operations
tools can help reduce chances for error and increase overall manageability.
• Hardware configuration—Modern server computer platforms often include advanced
management features for configuring the BIOS, disk arrays, and other options. Server-
provisioning tools can take advantage of these options to automate steps that might
otherwise have to be done manually.
• License tracking—Keeping track of OS and software licensing can easily be a full-time
job, even in smaller organizations. Server-provisioning tools that provide license-tracking
functionality can make the job much easier by recording which licenses are used and on
which machines.
• Support for network-based installation—A common deployment method involves using
network-based Pre-Boot eXecution Environment (PXE) booting. This method allows
computers that have no OS installed to connect to an installation server over a network
and begin the process. When all the components are in place, this method of provisioning
can be the most “hands-off” approach.
• Duplicating the configuration of a server—Upgrading servers to new hardware platforms
is a normal part of data center operations. Server-provisioning tools that allow for
backing up and restoring the configuration of an OS on new hardware can help make this
process quicker, easier, and safer.
• Ability to define configuration “templates”—Most IT departments have standards for the
configuration of their servers. These standards tend to specify network settings, security
configuration, and other details. When deploying new servers, it’s useful to have a
method for developing a template server configuration that can then be applied to other
machines.
• Support for remote sites—Deploying new servers is rarely limited to a single site or data
center, so the server provisioning tool should provide methods for performing and
managing remote deployments. Depending on the bandwidth available at the remote
sites, multiple installation sources might be required.
Overall, a well-designed automated server-provisioning tool can dramatically decrease the
amount of time it takes to get a new server ready for use and can help ensure that the
configuration meets all of an organization’s business and technical requirements.


Return on Investment
IT departments are often challenged to do more with less. They’re posed with the difficult
situation of having to increase service levels with limited budgets. This reality makes the task of
determining which investments to make far more important. The right decisions can dramatically
decrease costs and improve service; the worst decisions might actually increase overall costs. In many cases, IT managers intuitively know the benefits of particular technologies or implementations.
We can easily see how automation can reduce the time and effort required to perform certain
tasks. But the real challenge is related to how this information can be communicated to others
within the organization.
The basic idea is that one must make an investment in order to gain a favorable return. And most
investments involve at least some risk. Generally, there will be a significant delay between when you choose to make an investment and when you see the benefits of that venture. In the best
case, you’ll realize the benefits quickly and there will be a clear advantage. In the worst case, the
investment may never pay off. The following sections explore how Return on Investment (ROI)
can be calculated and how it can be used to make better IT decisions.

The Need for ROI Metrics


The concept of ROI focuses on comparing the potential benefits of a particular IT project with
the associated costs. From the standpoint of technology, IT managers must have a way of
communicating the potential benefits of investments in process improvements and other projects.
These are the details that business leaders will need in order to determine whether to fund the
project. Additionally, once projects are completed, IT managers should have a way of
demonstrating the benefits of the investment. Finally, no one can do it all—there are often far
more potential projects than staff and money to take them on.
ROI is a commonly used business metric that is familiar to CFOs and business leaders; it
compares the cost of an investment against the potential benefits. When considering investments
in ventures such as a new marketing campaign, it’s important to know how soon the investment
will pay off, and how much the benefit will be. Often, the costs are clear—it’s just a matter of
combining that with risks and potential gain. By using ROI-based calculations, businesses can
determine which projects can offer the most “bang-for-the-buck.” A high ROI is a strong factor
in ensuring the idea is approved.

Calculating ROI
Although there are many ways in which ROI can be determined, the basic concepts remain the
same: The main idea is to compare the anticipated benefit of an investment with its expected
cost. Terms such as “benefit” and “cost” can be ambiguous, but this section will show the
various types of information you’ll need in order to calculate those numbers.


Calculating Costs
IT-related costs can come from many areas. The first, and perhaps easiest to calculate, is related
to capital equipment purchases. This area includes the “hard costs” spent on workstations,
servers, network devices, and infrastructure equipment. The actual amounts spent can be divided
into meaningful values through metrics such as “average IT equipment cost per user.” In addition
to hardware, software might be required. Based on the licensing terms with the vendor, costs
may be one-time, periodic, or usage-based.
For most environments, a large portion of IT spending is related to labor—the effort necessary to
keep an environment running efficiently and in accordance with business requirements. These
costs might be measured in terms of hours spent on specific tasks. For example, managing
security updates might require, on average, 10 hours per server per year. Well-managed IT
organizations can often take advantage of tracking tools and management reports to determine
these costs. In some cases, independent analysis can help.
When considering an investment in an IT project, both capital and labor costs must be taken into
account. IT managers should determine how much time and effort will be required to make the
change, and what equipment will be required to support it. In addition, costs related to downtime
or any related business disruptions must be factored in. This might include, for example, a
temporary loss of productivity while a new accounting application is implemented. There will
likely be some “opportunity costs” related to the change: Time spent on this proposed project
might take attention away from other projects. All these numbers combined can help to identify
the total cost of a proposal.
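One simple way to organize the cost side is to total the categories described above. Every dollar figure in this sketch is invented purely for illustration.

```python
# Illustrative sketch: totaling the cost side of an ROI calculation.
# The categories follow the text; all dollar figures are made up.
proposal_costs = {
    "capital_equipment": 40_000,        # servers, network devices
    "software_licenses": 15_000,        # one-time, periodic, or usage-based
    "labor": 30_000,                    # design, implementation, training
    "downtime_and_disruption": 10_000,  # temporary loss of productivity
    "opportunity_cost": 5_000,          # attention taken from other projects
}

total_cost = sum(proposal_costs.values())
```

Breaking the total into named categories also makes it easier to defend the estimate later, since each line item can be tracked against what was actually spent.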

Calculating Benefits
So far, we’ve looked at the downside—the fact that there are costs related to making changes.
Now, let’s look at factors to take into account when determining potential benefits. An easy
place to start is by examining cost reductions related to hardware and software. Perhaps a new
implementation can reduce the number of required servers, or it can help make more efficient use
of network bandwidth. These benefits can be easy to enumerate and total because most IT
organizations already have a good idea of what they are. Still, it can sometimes be difficult for IT managers to spot areas for improvement in their own organizations. A third party can often shed
some light on the real costs and identify areas in which the IT teams stand to benefit most.
Other benefits are more difficult to quantify. Time savings and increases in productivity are
important factors that can determine the value of a project. In some cases, metrics (such as sales
projections or engineering quality reports) are readily available. If it is expected that the project
will yield improvements in these areas, the financial benefits can be determined. Along with
these “soft” benefits are aspects related to reduced downtime, reduced deployment times, and
increased responsiveness from the IT department.


Measuring Risk
Investment-related risks are just part of the game—there is rarely a “sure thing” when it comes to
making major changes. Common risks are related to labor and equipment cost overruns. Perhaps
designers and project managers underestimated the amount of effort it would require to
implement a new system. Or capacity estimates for new hardware were too optimistic. These
factors can dramatically reduce the potential benefit of an investment.
Although it is not possible to identify everything that could possibly go wrong, it’s important to
take into account the likelihood of cost overruns and the impacts of changing business
requirements. Some of these factors might be outside the control of the project itself, but they
can have an impact on the overall decision.

Using ROI Data


Once you’ve looked at the three major factors that can contribute to an ROI calculation—costs,
benefits, and risk—you must bring it all together. ROI can be expressed in various ways. The
first is as a percentage value. For example, consider that implementing a new software package
for the sales department will cost approximately $100,000 (including labor, software, and capital
equipment purchases). Business leaders have determined that, within a period of 2 years, the end
result will be an increase in sales efficiency that equates to an additional $150,000 in revenue. The potential return is the benefit minus the cost, or $50,000; dividing that return by the $100,000 cost expresses it as a percentage, so this project will provide a 50 percent ROI within 2 years.
ROI can also be expressed as a measure of time. Specifically, it can indicate how long it might
take to recover the value of an investment. For example, an organization might determine that it
will take approximately 1.5 years to reach a “break-even” point on a project. This is where the
benefits from the project have paid back the costs of the investment. This method is more useful
for ongoing projects, where continual changes are expected.
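Both expressions of ROI reduce to one-line formulas, shown here in Python using the figures from the example above.

```python
# The two ways of expressing ROI described in the text.
def roi_percent(cost: float, benefit: float) -> float:
    """ROI as a percentage: (benefit - cost) / cost * 100."""
    return (benefit - cost) / cost * 100

def payback_years(cost: float, annual_benefit: float) -> float:
    """Break-even time, assuming the benefit accrues evenly over time."""
    return cost / annual_benefit

# A $100,000 investment returning $150,000 over 2 years:
print(roi_percent(100_000, 150_000))        # -> 50.0 (percent)
print(payback_years(100_000, 150_000 / 2))  # ~1.33 years to break even
```

The percentage form is useful for comparing competing projects of different sizes; the payback-time form is more useful for ongoing projects where the benefit recurs each year.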
As with all statistical data of this type, ROI calculations can be highly subjective. It’s important
that your company develop its own standards for calculating ROI in order to provide consistent,
reliable results. Risk should be carefully considered—for example, although a new solution
might offer a department 20 percent better efficiency, what are the odds that new employees will
be added who have inherently lower efficiency and productivity during their first days and
weeks? Also, as you implement solutions, be sure to track the actual ROI, including out-of-plan
events (such as new hires) that may impact the overall ROI and result in a different actual return.


Making Better Decisions


IT and business leaders can use ROI information to make better decisions about their
investments. Once details related to the expected ROI for potential projects are determined, all
areas of an organization can make educated decisions based on the anticipated risk and rewards.
Factors to look for include rapid implementation times, clearly defined tangible benefits, and
quick returns. It’s important to tailor the communications of details based on the audiences. A
CFO might not care that new servers are 30 percent more efficient than previous ones, but she’s
likely to take notice if power, space, and cooling costs can be dramatically lowered. Similarly,
when users understand that they’ll experience decreased downtime, they’ll be more likely to
support a change.
Many different projects can be compared based on the needs of the business. If management is
ready to make significant investments, the higher-benefit/higher-cost projects might be best.
Otherwise, lower-cost projects may be chosen. In either case, the goal should be to invest in the
projects with the highest ROI. Figure 3 provides an example of a chart that might be used to
compare details of various investments.

Figure 3: A chart plotting potential return vs. investment.

ROI numbers can also be very helpful for communicating IT decisions throughout an
organization. When non-technical management can see the benefits of changes such as
implementing automated processes and tools, this insight can generate buy-in and support for IT
initiatives. For example, setting up new network services might seem disruptive at first, but if
business leaders understand the cost savings, they will be much more likely to support the effort.


Calculating ROI for some IT initiatives can be difficult. For example, security is one area in
which costs are difficult to determine. Although it would be useful if the IT industry had
actuarial statistics (similar to those used in, for example, the insurance industry), such data can
be difficult to come by. In these situations, IT managers should consider using known numbers,
such as the costs of downtime and damages caused by data loss, to help make their ROI-related
case. And it’s important to keep in mind that in most ROI calculations, subjectivity is
inevitable—you can’t always predict the future with total accuracy, and sometimes you must just
take your best guess.

ROI Example: Benefits of Automation


One area in which most IT departments can gain dramatic benefit is through data center
automation. By reducing the amount of manual time and effort required, substantial cost savings
can be realized in relatively short periods of time. This section will bring together these details to
help determine the potential ROI of an investment in automation.
In this hypothetical example, a company has decided that it is spending far too much money on
routine server maintenance operations (including deployment, configuration, maintenance, and
security). The environment supports 150 servers, and it estimates that it spends an average of
$1500 per year to maintain each server (including labor, software, and related expenses; this
figure is purely for illustrative and discussion purposes and will probably not reflect real-world
maintenance figures in your environment).
The organization has also found that, through the use of automation tools, it can reduce these
costs dramatically. By implementing automated server provisioning and patch management
solutions, it can reduce the operating cost to ~$300 per year per server. Using these numbers, the
overall cost savings would be a total of $1200 per server per year, or a grand total of $180,000
saved. The cost of purchasing and implementing the automation solution is expected to be
approximately $120,000, providing a net potential benefit of $60,000 within one year (again,
these numbers are purely for illustration and discussion and do not reflect an actual ROI analysis
of a real-world environment).
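Worked through in code, the example's arithmetic looks like this (the figures remain purely illustrative, as noted above):

```python
# The hypothetical figures from the example, checked end to end.
SERVERS = 150
COST_BEFORE = 1_500    # per server per year, manual maintenance
COST_AFTER = 300       # per server per year, with automation
SOLUTION_COST = 120_000  # purchasing and implementing the automation tools

annual_savings = SERVERS * (COST_BEFORE - COST_AFTER)  # $180,000 per year
net_first_year = annual_savings - SOLUTION_COST        # $60,000 net benefit
```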

ROI Analysis
Based on the numbers predicted, the implementation of automation tools seems to be a good
investment. The return is a substantial cost savings, and the results will be realized in a brief
period of time. There is an additional benefit to making improvements in automation—the time that IT staff spends on routine operations can be redirected to tasks that make better use of their skills. For example, time that is freed by automating security patch
deployment can often increase resources for testing patches. That might result in patches being
deployed more quickly, and fewer problems with the patch deployment process. The end result is
a better experience for the entire organization. In short, data center automation provides an
excellent potential ROI, and is likely to be a good investment for the organization as a whole.


Change Advisory Board


Regardless of how well-aligned IT departments are with the rest of their organizations, an
important factor in their overall success is how well IT can manage and implement change.
Given that change is inevitable, the challenge becomes implementing policies and processes that
are designed to ensure that only appropriate changes are made, and that the process involves
input from the entire organization.
Best practices defined within the IT Infrastructure Library (ITIL) recommend the creation of a Change Advisory Board (CAB). The CAB is a group of individuals whose purpose is to provide advice
related to change requests. Specifically, details related to the roles and responsibilities of the
CAB are presented in the Service Support book. The CAB itself should include members from
throughout an organization, and generally will include IT management and business leaders, as
required.

The Purpose of a CAB


A characteristic of well-managed IT organizations is having well-defined policies and processes.
It doesn’t take much imagination to see how having numerous systems and network
administrators making ad-hoc changes can lead to significant problems and inefficiencies. To
improve the implementation of change, a group of individuals from throughout the organization
is required. Members of the CAB are responsible for controlling which changes are made, how
they’re made, and when. The CAB performs tasks related to monitoring, evaluating, and
implementing all production-related IT changes. Its goal should be to minimize the risk and
maximize the benefits of suggested changes and to handle all change requests in an organized
way.

Benefits of a CAB
The main benefits of creating a CAB are related to managing a major source of potential IT
problems—changes to the existing environment. IT changes can often affect the entire
organization, so the purpose of the CAB is to determine which changes should occur and to
specify how and when they should be performed. The CAB can define a repeatable process that
ensures that requests have gone through an organized process and ad-hoc modifications are not
allowed. Through the CAB review process, some types of problems such as “collisions” caused
by multiple related changes being made by different people can be reduced.

Roles on the CAB


To be successful, the CAB must include representatives from various parts of the business. The
list of roles will generally begin with a change requester—the member of the organization who suggests that a new implementation or modification is required. The actual people who take on
this role will vary based on the needs of the organization, but often the requesters will be
designated by the company’s management. Sometimes, when groups of users are affected, one or
a few people may be appointed to this role.


The CAB roles that are most important from a process standpoint are the members who perform
the review of the change request. In simple cases, there may only be a single approver. But, for
larger changes, it’s important to have input from both the technical and business sides of the
organization. The specific individuals might be business unit managers, IT managers, or people
who have specific expertise in the type of change being requested.
The next set of roles involves those who actually plan for, test, and implement the change. These
individuals may or may not be part of the CAB. In either case, however, it is the responsibility of those who perform the changes to communicate with CAB members and to coordinate the changes with everyone involved.
As with many other organizational groups, it’s acceptable for one person to fill multiple roles.
However, as changes get more complex and have greater effects throughout the organization, it
is important for IT groups to work with the business units they support.

The Change-Management Process


To ensure that all change requests are handled efficiently, it’s important for the CAB to establish
a defined process. The process generally begins with the creation of a new request. Change
requests can come from any area within an organization. For example, the marketing department
might require additional capacity on public-facing servers to support a new campaign or the
engineering group might require hardware upgrades to support the development of new products.
Change requests can also come from within the IT department and might involve actions such as
performing security updates or installing a new version of important software on all servers.
Some change requests can be minor (such as increasing the amount of storage space available to
a group of users), while others might require weeks or months of planning.
Figure 4 provides an overview of the steps required in a successful change-management process.
Steps will need to be added to deal with issues such as changes that are rejected or implementations that fail.

Figure 4: A change-management process overview.

Ideally, the CAB will have established a uniform process for requesting changes. The request
should include details related to why the change is being requested, who will be affected by the
change, anticipated benefits, possible risks, and details related to what changes should occur.
Changes should be categorized based on various criteria, such as the urgency of the change
request. Organizations that must deal with large numbers of changes can also benefit from
automated systems that create and store requests in a central database.
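As a hypothetical sketch, a uniform change request might be modeled as a simple record like the following. Every field and value is illustrative rather than drawn from ITIL or any particular product.

```python
# Hypothetical sketch of a uniform change-request record of the kind a
# CAB might store in a central database. All field names are illustrative.
from dataclasses import dataclass, field
from enum import Enum

class Urgency(Enum):
    LOW = "low"
    NORMAL = "normal"
    EMERGENCY = "emergency"

@dataclass
class ChangeRequest:
    requester: str                 # who is asking for the change
    summary: str                   # what should occur
    reason: str                    # why the change is being requested
    affected: list = field(default_factory=list)  # who will be affected
    benefits: str = ""             # anticipated benefits
    risks: str = ""                # possible risks
    urgency: Urgency = Urgency.NORMAL
    status: str = "submitted"      # advances as the CAB reviews it

request = ChangeRequest(
    requester="marketing",
    summary="Add capacity to public-facing web servers",
    reason="New campaign is expected to increase traffic sharply",
    affected=["web farm", "load balancers"],
)
```

Capturing the same fields for every request is what makes categorization (by urgency, by affected units) and centralized review practical.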


When the CAB receives a new request, it can start the review process. It’s a good practice for the
CAB members to meet regularly to review new requests and discuss the status of those that are
in progress. During the review process, the CAB determines which requests should be
investigated further.

Planning for Changes


Once a request is initially approved, the CAB should solicit technical input from those who are
responsible for planning and testing the changes. This process may involve IT systems and
network administrators, software developers, and representatives from affected business units.
The goal of this team is to collect information related to the impact of the change. The questions
that should be asked include:
• Who will be affected? For most change requests, the effects will be seen outside of the IT
department. If specific individuals or business units will be affected by downtime,
changes in performance, or functional changes, the expected outcomes should be
documented.
• What are the costs? Even the simplest of change requests will require labor costs related
to implementing the changes. In many cases, IT organizations might need to purchase
more equipment to add capacity, or specific technical expertise might be required from
external vendors.
• What are the risks? Most changes have an inherent associated risk. Just the act of
changing something suggests that new or unexpected problems may arise. All portions of
the business should fully understand the risks before committing to making a change.
• What is the best way to make the change? Technical and business experts should research
the best way to meet the requirements of the change request and make recommendations.
This step usually involves several areas of the organization working together closely. The
goal is to provide maximum potential benefits while minimizing risk and effort required.
Based on all these details, the CAB can determine whether to proceed with the change. In some cases, the analysis might indicate that it's not prudent to make the change.

Implementing Changes
If the potential benefits outweigh the costs, and the risk is acceptable, the next step is to
implement the changes. An organization should follow a standardized change process, and the
CAB should be responsible for ensuring that the processes are followed. At a minimum, the service desk should be aware of what changes are occurring and any potential impacts. This awareness will allow its staff to respond to calls more efficiently and will help identify which issues are related to the change.
During the implementation portion of the process, good communication can help make for a
smoother ride. For quick and easy changes, all that might be required is an email reminder of the
change and its intended effects. For larger changes, regular status updates might be better. As
with the rest of the process, it’s very important that technical staff work with the affected
business units in a coordinated way.


Reviewing Changes
Although it might be tempting to “close out” a request as soon as a change is made, the
responsibilities of the CAB should include reviewing changes after they’re complete. The goal is
not only to determine whether the proper process was followed but also to look for areas of
improvement within the procedures. The documentation generated by this review (even if it’s
only a brief comment) can be helpful for future reference.

Planning for the Unplanned


Although the majority of changes should be performed through the CAB, some types of
emergencies might warrant a simplified process. For example, if a Web server farm has slowed
due to a Distributed Denial of Service (DDoS) attack, changes must be made immediately. If this
happens during the night or over a weekend, authorized staff should have the authority to make
the necessary decisions. The CAB might choose to create a “change request” after the fact, and
follow the same rigorous review steps at a later time.
Overall, through the implementation of a CAB, IT organizations can help organize the change
process. The end result is reduced risk and increased coordination throughout the organization.

Configuration Management Database


To make better business and technical decisions, all members of the IT staff need to have a way
of getting a single, unified view of “everything” that is running in their environments. A Configuration Management Database (CMDB) is a central information repository that stores details related to an IT environment. It contains data about hardware and software deployments and allows users to collect and report on the details of their environments.
The CMDB contains information related to workstations, servers, network devices, and software.
Various tools and data entry methods are available for populating the database, and most
solutions provide numerous configurable reports that can be run on-demand. The database itself
can be used to track and report on the relationships between various components of the IT
infrastructure, and it can serve as a centralized record of current configurations.
Figure 5 shows an overview of how a CMDB works with other IT automation tools. Various data
center automation tools can store information in the CMDB, and users can access the
information using an intranet server. The goal of using a CMDB is to provide IT staff with a way
to centrally collect, store, and manage network- and server-related configuration data.


Figure 5: Using a CMDB as part of data center automation.

The Need for a CMDB


Most IT organizations track information in a variety of different formats and locations. For
example, network administrators might use spreadsheets to store IP address allocation details.
Server administrators might store profiles in separate documents or perhaps in a simple custom-
developed database solution. Other important details might be stored on paper documents. Each
of these methods has weaknesses, including problems with collecting the information, keeping it
up-to-date, and making it accessible to others throughout the organization.
The end result is that many IT environments do not do an adequate job of tracking configuration-
related information. When asked about the network configuration of a particular device, for
example, a network administrator might prefer to connect directly to that device over the
network rather than refer to a spreadsheet that is usually out-of-date. Similarly, server
administrators might choose to undergo the tedious process of logging into various computers
over the network to determine the types and versions of applications that are installed instead of
relying on older documentation. If the same staff has to perform this task a few months later,
they will likely choose to do so manually again. It doesn’t take much imagination to recognize
that there is room for improvement in this process.


Benefits of Using a CMDB


A CMDB brings all the information tracked by IT organizations into a single centralized
database. The database stores details about various devices such as workstations, servers, and
network devices. It also maintains details related to how these items are configured and how they
participate in the infrastructure of the IT department. Although the specific details of what is
stored might vary by device type, all the data is stored within the centralized database solution.
The implementation of a CMDB can help make IT-related information much easier to collect,
track, and report on. Among the many benefits of using a CMDB are the following:
• Configuration auditing—IT environments tend to be complex, and there are often
hundreds of different settings that can have an impact on overall operations. Through the
use of a CMDB, IT staff can compare the expected settings of their computers with the
actual ones. Additionally, the CMDB solution can create and maintain an audit trail of
which users made which changes and when. These features can be instrumental in
demonstrating compliance with regulatory standards such as the Health Insurance
Portability and Accountability Act (HIPAA) or the Sarbanes-Oxley Act.
• Centralized reporting—Because a CMDB stores all configuration-related information in a central place, various reporting tools can be used to retrieve information about the entire network environment. In addition to running pre-packaged
tools, developers can generate database queries to obtain a wide variety of custom
information. Many CMDB reporting solutions provide users with the ability to
automatically schedule and generate reports. The reports can be stored for later analysis
via a Web site or may be automatically sent to the relevant users via email.
• Change tracking—Often, seemingly complicated problems can be traced back to what
might have seemed like a harmless change. A CMDB allows for a central place in which
all change-related information is stored, and the CMDB system can track the history of
configuration details. This functionality is particularly helpful in modern network
environments where it’s not uncommon for servers to change roles, network addresses,
and names in response to changing business requirements.
• Calculating costs—Calculating the bottom line in network environments requires the
ability to access data for software licenses and hardware configurations. Without a
centralized solution, the process of collecting this information can take many hours. In
addition, it’s difficult to trust the information because it tends to become outdated very
quickly. A CMDB can help obtain details related to licenses, support contracts, asset tags,
and other details that can help quickly assess and control costs.
Overall, a CMDB solution can help address many of the inefficiencies of other methods of
configuration data collection.
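As a sketch of how such a centralized database might be queried, the following uses an in-memory SQLite database with a deliberately minimal, hypothetical schema; the table and column names are illustrative and are not taken from any particular CMDB product.

```python
import sqlite3

# Hypothetical, minimal CMDB schema: one table of configuration items and
# one audit-trail table recording who changed what, and when.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE config_items (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        item_type TEXT NOT NULL,      -- 'server', 'workstation', 'router', ...
        os_version TEXT,
        patch_level TEXT
    );
    CREATE TABLE audit_trail (
        id INTEGER PRIMARY KEY,
        item_id INTEGER REFERENCES config_items(id),
        changed_by TEXT NOT NULL,
        changed_at TEXT NOT NULL,     -- ISO-8601 timestamp
        field TEXT NOT NULL,
        old_value TEXT,
        new_value TEXT
    );
""")

conn.executemany(
    "INSERT INTO config_items (name, item_type, os_version, patch_level) "
    "VALUES (?, ?, ?, ?)",
    [("web01", "server", "Windows Server 2003", "SP1"),
     ("web02", "server", "Windows Server 2003", "SP2"),
     ("wks042", "workstation", "Windows XP", "SP2")],
)
# Record that a user patched web02 (item id 2) from SP1 to SP2.
conn.execute(
    "INSERT INTO audit_trail (item_id, changed_by, changed_at, field, "
    "old_value, new_value) "
    "VALUES (2, 'adesai', '2006-05-01T10:15:00', 'patch_level', 'SP1', 'SP2')"
)

# A custom report: servers that are not at the expected patch level.
rows = conn.execute(
    "SELECT name, patch_level FROM config_items "
    "WHERE item_type = 'server' AND patch_level <> ?",
    ("SP2",),
).fetchall()
print(rows)   # [('web01', 'SP1')]
```

A commercial CMDB tracks far more attributes and enforces relationships between configuration items, but the underlying idea is the same: structured records plus a queryable audit trail.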

31
The Reference Guide to Data Center Automation

Implementing a CMDB Solution


The goal of a CMDB is to help record and model the organization of an IT network environment
within a central data storage point. Although the details of implementation can vary greatly
between organizations, the same basic information is usually collected. Vendors that provide
data center automation solutions often rely upon a CMDB to track what is currently running in
the environment and how these devices are set up.
Implementing a new CMDB solution often begins with the selection of an acceptable platform.
Although IT organizations might choose to develop in-house custom solutions, there are many
benefits to using pre-packaged CMDB products. This section will look at the details related to
what information should be tracked and which features can help IT departments get the most
from their databases.

Information to Track
The IT industry includes dozens of standards related to hardware, software, and network
configuration. A CMDB solution may provide support for many kinds of data, with the goal of
being able to track the interaction between the devices in the environment. That raises the
question of what information should be tracked.

Server Configuration
Server configurations can be complex and can vary significantly based on the specific OS
platform and version. The CMDB should be able to track the hardware configuration of server
computers, including such details as BIOS revisions, hard disk configurations, and any health-
related monitoring features that might be available. In addition, the CMDB should contain details
about the OS and which applications are installed on the computer. Finally, important
information such as the network configuration of the server should be recorded.

Desktop Configuration
One of the most critical portions of an IT infrastructure generally exists outside the data center.
End-user workstations, notebook computers, and portable devices all must be managed.
Information about the network configuration, hardware platform, and applications can be stored
within the CMDB. These details can be very useful for performing routine tasks, such as security
updates, and for ensuring that the computers adhere to the corporate computing policies.

Network Configuration
From a network standpoint, routers, switches, firewalls, and other devices should be documented
with the CMDB. Ideally, all important details from within the router configuration files will be
included in the data. As network devices often have to interact, network topology details
(including routing methods and inter-dependencies) should also be documented. Wherever
possible, network administrators should note the purpose of various device configurations within
the CMDB.


Software Configuration
Managing software can be a time-consuming and error-prone process in many environments.
Fortunately, the use of a CMDB can help. By keeping track of which software is installed on
which machines, and how many copies of the software are in use concurrently, systems
administrators and support staff can easily report on important details such as OS versions,
license counts, and security configurations. Often, organizations will find that they have
purchased too many licenses or that many users are relying on outdated versions of software.
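The kinds of details described in the preceding sections can be pictured as structured records. The sketch below uses hypothetical Python classes (the field selection is illustrative; a real CMDB tracks many more attributes per device class) to show how tracked software data supports a task such as license counting.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class InstalledApplication:
    name: str
    version: str
    license_key: str = ""

@dataclass
class ServerCI:
    """A simplified configuration item for a server."""
    name: str
    bios_revision: str                  # hardware details
    disks: List[str] = field(default_factory=list)
    os_version: str = ""
    ip_address: str = ""                # network configuration
    applications: List[InstalledApplication] = field(default_factory=list)

def license_count(servers: List[ServerCI], app_name: str) -> int:
    """Count how many tracked servers have a given application installed."""
    return sum(
        any(app.name == app_name for app in s.applications)
        for s in servers
    )

fleet = [
    ServerCI("db01", "A07", ["RAID1 72GB"], "Windows Server 2003",
             "10.0.0.5", [InstalledApplication("SQL Server", "2000")]),
    ServerCI("web01", "A05", ["36GB"], "Windows Server 2003", "10.0.0.6"),
]
print(license_count(fleet, "SQL Server"))   # 1
```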

Evaluating CMDB Features


Although the basic functionality of a CMDB is easy to define, there are many features and
options that can make the task of maintaining configuration information easier and more
productive. When evaluating CMDB solutions, you should keep the following features in mind:
• Automatic discovery—One of the most painful and tedious aspects of deploying a new
CMDB solution is performing the initial population of the database. Although some of
the tasks must be performed manually, vendors offer tools that can be used to
automatically discover and document information about devices on the network. This
feature not only saves time but can greatly increase the accuracy of data collection. Plus,
automatic discovery features can be used to automatically document new components as
they’re added to the IT infrastructure.
• Integration with data center automation tools—A CMDB solution should work with other
data center automation tools, including configuration management, Help desk, patch
management, and related products. When the tools work together, this combination
provides the best value to IT—the CMDB can continue to be kept up to date from other
sources of information.
• Broad device support—Details about various hardware devices can vary significantly
between vendors and models. Ideally, the CMDB solution will provide options for
tracking products from a variety of different manufacturers, and the vendor will continue
to make updates available as new devices are released.
• Usability features—To ensure that IT staff and other users learn to rely upon a solution, it
must be easy to use. Many CMDB solutions offer a Web-based presentation of
information that can be accessed via an organization’s intranet. If they’re well-designed,
all employees in an organization will be able to quickly and easily get the data they need
(assuming, of course, that they have the appropriate permissions). For some types of
operations, “smart client” applications might provide a better experience.
• Performance and scalability—CMDB systems tend to track large quantities of
information about all the devices in the environment. The solution should be able to scale
to support an environment’s current and projected size while providing adequate
performance in the areas of data storage and reporting.
• Distributed database—Many IT organizations support networks at multiple locations. The
CMDB solution should provide a method for remote sites (such as branch offices) to
communicate with the database. Based on their network capacity, organizations might
choose to maintain a single central database. Alternatively, copies of the database might
be made available at multiple sites for performance reasons.


• Security features—The CMDB will contain numerous details related to the design and
implementation of the network environment. In the wrong hands, this information can be
a security liability. To help protect sensitive data, the CMDB solution should provide a
method for implementing role-based security access. This setup will allow administrators
to control who has access to which information.
• Flexibility and extensibility—In an ideal world, you would set up your entire IT
environment at once and never have to change it. In reality, IT organizations frequently
need to adapt to changing business and technical requirements. New technologies, such
as blade servers and virtual machines, can place new requirements on tracking solutions.
A CMDB solution should be flexible enough to allow for documenting many different
types of devices and should support expandability for new technologies and device types.
The solution may even allow developers to create definitions of their own devices.
• Generation of reports—The main purpose of the CMDB is to provide information to IT
staff, so the solution should have a strong and flexible reporting engine. Features to look
for include the ability to create and save custom report definitions, and the ability to
automatically publish and distribute reports via email or an intranet site.
• Customizability/Application Programming Interface (API)—Although the pre-built
reports and functionality included with a CMDB tool can meet many users’
requirements, at some point, it might become necessary to create custom applications that
leverage the data stored in the CMDB. That is where a well-documented and supported API
can be valuable. Developers should be able to use the API to programmatically retrieve and
modify data. One potential application might be to integrate the CMDB with an
organization’s other IT systems.
Overall, through the use of a CMDB, IT organizations can better track, manage, and report on all
the important components of the IT infrastructure.
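To make one of these features concrete: the role-based security access described above amounts to mapping roles to the categories of configuration items their members may view. The role names and permission model below are hypothetical.

```python
# Hypothetical role-based access model for CMDB data: each role maps to
# the item types its members may read.
ROLE_PERMISSIONS = {
    "network_admin": {"router", "switch", "firewall"},
    "helpdesk": {"workstation"},
    "it_manager": {"router", "switch", "firewall", "server", "workstation"},
}

def can_view(role: str, item_type: str) -> bool:
    """Return True if the given role may read items of this type."""
    return item_type in ROLE_PERMISSIONS.get(role, set())

print(can_view("helpdesk", "workstation"))  # True
print(can_view("helpdesk", "firewall"))     # False
```

A real CMDB would layer finer-grained permissions (read versus modify, per-site scoping) on top of this basic idea.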


Auditing
The process of auditing involves systematic checks and examinations to ensure that a specific
aspect of a business is functioning as expected. In the financial world, auditing requires a review
of accounting records, and verification of the information that is recorded. The purpose is to
ensure that the details are consistent and that rules are being followed. From an IT standpoint,
auditing should be an important aspect of operations.

The Benefits of Auditing


Although some IT departments have established regular auditing processes, many tend to
perform these steps in a reactive way. For example, whenever a new problem arises (such as a
security violation or server downtime issue), systems administrators will manually examine the
configuration of a system to ensure that it is working as expected. A far better scenario is one in
which auditing is performed proactively and as part of a regular process. There are many benefits
of performing regular audits, including:
• Adhering to regulatory compliance requirements—Many companies are required to
adhere to government rules or industry-specific practices. These regulations might
specify how certain types of data should be handled or they might define how certain
processes should be performed. In fact, many regulations require that regular
audit reviews be carried out either by the organization itself or by a third
party.
• Verifying security—In many IT environments, security is difficult to manage. Every time
a new server or workstation is added to the network, systems administrators must be
careful to ensure that the device meets requirements specified in the security policy.
Overlooking even one system could lead to serious problems, including loss of data.
Additionally, IT departments must keep track of hardware and software licenses and
ensure that users don’t add new devices without authorization. By performing routine
security audits, some of these potential oversights can be detected before they lead to
unauthorized access or other problems.
• Enforcing processes—Auditing can help ensure that the proper IT processes are being
followed. IT departments that have implemented best practices such as those specified
within the IT Infrastructure Library (ITIL) can perform routine reviews to look for
potential problems and identify areas for improvement.
• Change tracking and troubleshooting—Even in relatively simple IT environments,
changes can have unintended consequences. In fact, many problems occur as a result of
changes that were made intentionally. An auditing process can help identify which
changes are being made and, if necessary, can help reduce troubleshooting time and
effort.
These are just some of the important reasons for performing regular auditing of IT environments.
The important point is that rather than being just a burden on an IT group, auditing can help
ensure that the organization is working properly.


Developing Auditing Criteria


When working on developing auditing requirements and criteria, an organization should start by
determining goals for the auditing process. The potential benefits already discussed are a good
starting point. However, IT groups should add specifics. Examples might include Sensitive
customer data should always be stored securely and Change and configuration management
processes should always be followed.
Process-related criteria pertain to how and when changes are made and are designed to ensure
that the intended IT service levels are being properly met. Organizations might develop auditing
requirements to ensure that a process, such as manual server provisioning, is being performed
correctly. These criteria often depend upon having well-documented processes that are enforced.
For example, ITIL provides recommendations for steps that should be included in the change and
configuration management process. Processes also pertain to standard operations in such areas as
physical data center security, adherence to approvals hierarchies, and the definition of employee
termination policies.
Configuration-related criteria focus on how workstations, servers, and network devices are set
up. For example, security policies might require that all workstations and servers are only one
version behind the latest set of security updates. Application-level configuration is also very
important, as the strength of an IT organization’s security system relies upon having programs up
to date with regard to patches and user authorization settings.
Inventory-related auditing criteria generally involve verification that equipment is being tracked
properly and that hardware devices are physically located where expected. Asset tracking
methods and manual inspection of data centers and remote locations can help ensure that these
criteria are being met.
Performance-related auditing criteria are designed to ensure that an IT department is providing
adequate levels of service based on business needs. Metrics might include reliability, uptime, and
responsiveness numbers. Furthermore, if the IT department has committed to specific Service
Level Agreements (SLAs), those can serve as a basis for the auditing criteria.
Table 2 provides examples of auditing criteria and metrics that a typical IT department might
develop.


Process — Change management process must be enforced
    Metric: Percentage of changes that obtained approval
    Target: 100% of changes should be approved prior to being performed
Process — All data center access is logged
    Metric: Accuracy of data center entrance and exit logs
    Target: 100% of data center visits are logged
Configuration — Servers are up to date with the latest security patches
    Metric: Percentage of servers that are up to date based on security policy
    Target: 100% of servers must be within one level of the latest security policy, and 50% must be running at the latest level
Configuration — CRM accounts are up to date
    Metric: Only required accounts and permissions are in place, according to the IT CRM configuration policy
    Target: All permissions and user accounts should be consistent with IT and HR records
Performance — Ensuring adequate IT server and network uptime
    Metric: Total unscheduled downtime (in minutes)
    Target: No more than 100% of the unscheduled downtime allowance, as specified in the SLA
Performance — Service desk resolution times are within stated levels
    Metric: Percentage of issues resolved within the times stated in the SLA
    Target: 95% of tier-one issues should be resolved within 4 hours
Inventory — Asset tracking
    Metric: Percentage of assets that are present according to previous audit results and change tracking details
    Target: 100% of devices should be physically verified

Table 2: Sample auditing criteria.


Preparing for Audits


In many IT environments, preparing for an audit is something that managers and staff dread. The
audit itself generally involves scrutinizing operational details and uncovering deficiencies. It also
requires the generation of a large amount of information. The following types of information can
be helpful in order to prepare for an audit:
• Meeting minutes and details from regular meetings, such as change and configuration
management reviews
• Asset inventory information, based on physical verification and any asset tracking
databases
• Results from previous audits, in order to ensure that previous deficiencies have been
addressed
• IT policy and process documents
• Information about processes and how they’re enforced
All this information can be difficult to obtain (especially if done manually), but is often required
in order to carry out auditing procedures.

Performing Audits
The process of performing an audit involves comparing the actual configuration of devices and
settings against their expected settings. For an IT department, a typical example might be a
security audit. The expected values will include details related to server patch levels, firewall
rules, and network configuration settings. The employees that actually perform the audit can
include members of the internal staff, including systems and network administrators and IT
management. The goal for internal staff should be to remain completely objective, wherever
possible. Alternatively, organizations can choose to employ outside professionals and consultants
to provide the audit. This method often leads to better accuracy, especially if the consultants
specialize in IT auditing.
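The core comparison performed during an audit, expected settings versus actual settings, can be sketched in a few lines; the setting names below are illustrative.

```python
def audit_settings(expected: dict, actual: dict) -> dict:
    """Compare expected device settings against actual values and return
    a mapping of setting -> (expected, actual) for every deviation."""
    deviations = {}
    for setting, want in expected.items():
        have = actual.get(setting)       # a missing setting counts as a deviation
        if have != want:
            deviations[setting] = (want, have)
    return deviations

expected = {"patch_level": "SP2", "antivirus": "enabled",
            "guest_account": "disabled"}
actual = {"patch_level": "SP1", "antivirus": "enabled"}
print(audit_settings(expected, actual))
# {'patch_level': ('SP2', 'SP1'), 'guest_account': ('disabled', None)}
```

In practice the "expected" side would come from a security policy or CMDB baseline, and the "actual" side from an automated inventory scan.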
Auditing can be performed manually by inspecting individual devices and settings, but there are
several potential problems with this method. First and foremost, the process can be tedious and
time consuming, even in small IT environments. Second, the process leaves much room for error,
as it’s easy to overlook a device or setting. Finally, performing routine audits can be difficult,
especially in large environments in which changes are frequent and thousands of devices must be
examined. Figure 6 shows an example of a manually generated auditing report. Although this
report is far from ideal, it does show the types of information that should be evaluated.


Figure 6: Manual spreadsheet-based auditing reports.

Automating Auditing
When performed manually, the processes related to designing, preparing for, and performing
auditing functions can add a significant burden to IT staff, who must define relevant
auditing criteria and work diligently to ensure that process and configuration
requirements are always being met. Additionally, the process of performing audits can be
extremely time consuming and is therefore generally undertaken only when absolutely required.
Fortunately, there are several ways in which data center automation tools and technologies can
help automate the auditing process. One of the most important is having a Configuration
Management Database (CMDB). A CMDB can centrally store all the details related to the
hardware, software, and network devices in an IT environment, so it serves as a ready source
against which expected settings can be compared. Asset tracking functionality provides IT
managers with the power of knowing where all of their hardware and software investments are
(or should be).
Change and configuration management tools can also help by allowing IT staff to quickly and
automatically make changes even in large environments. Whenever a change is made, it can be
recorded to the audit log. Furthermore, by restricting who can make changes and organizing the
change process, data center automation tools can greatly alleviate the burden of performing
auditing manually.
Although auditing can take time and effort to implement, the investment can quickly and easily
pay off. And, through the use of data center automation tools, the entire process can be managed
without additional burden to IT staff.


Customers
One of the many critical success factors for service-related organizations is customer service.
Businesses often go to great lengths to ensure that they understand their customers’ needs and
invest considerable time and effort in researching how to better serve them. The customer
experience can be a “make-or-break” factor in the overall success of the business.
Although the term “customer” is usually used to refer to individuals that work outside of an
organization, IT departments can gain insight into the users and business processes they support
by viewing them as customers. This shift in service delivery perspective can help improve
overall performance of IT departments and operations for an organization as a whole.

Identifying Customers
An important aspect of service delivery is to define who customers are. In the business world,
marketing organizations often spend considerable time, effort, and money in order to make this
determination. They understand the importance of defining their target markets. IT departments
can take a similar approach. Many IT departments tend to be reactive in that they respond to
requests as they come in. These requests may range from individual user needs (such as
password reset requests) to deployments of new enterprise applications (such as the deployment
of a new CRM application).
The first step in identifying customers is to attempt to group them together. End users might
form one group and represent typical desktop and workstation users from any department.
Another group might be mid-level management, who tend to frequently request new computer
installations or changes to existing ones. Finally, upper-level management often focuses on
strategic initiatives, many of which will require support from the IT department. Figure 7
provides an example of some of these groups.

Figure 7: Identifying IT departments’ customers.


Understanding Customers’ Needs


Once IT departments’ customers have been defined, it’s time to figure out what it is they really
need. In pre-sales discussions, traditional Sales staff will meet with representatives and decision
makers to identify what their customers are looking for. They’ll develop a set of requirements
and then come back with a proposed solution. The key portion of this process is to have both
business and IT representatives involved.
It’s important to be able to accurately identify “pain points” for customers—the areas that are
causing them the highest costs and most frustrations. Often, IT departments tend to spend a
significant portion of their time “fighting fires” instead of addressing the root causes of reliability
problems. If IT management consistently hears that response times and service delivery delays
are primary concerns, an investment in automation tools might help address these issues.

Defining Products and Service Offerings


Once their customers’ needs have been identified, IT staff can start trying to find the best
solutions to these problems. It is at this point that the real benefits of the “customer-
focused” model start to appear. Based on gathered requirements, IT management can start
developing a set of service offerings to meet these needs. For example, if the Engineering
department wants to be able to set up and deploy new machines as quickly as possible,
investments in server virtualization and automated deployment tools might make sense. If
reliability and uptime are primary concerns for several departments, investments in automated
monitoring tools might make sense.
The best businesses find a way to offer standardized products or services that apply to many of
their customers. Although it’s often impractical to try to meet everyone’s needs, the majority can
often benefit from these offerings. Ideally, systems and network administrators will be able to
find solutions that can benefit all areas of the organization with little or no customization. For
example, developing standard workstation upgrade and deployment processes might be of
interest to several different departments. True economies of scale can be realized when basic
services become repeatable and consistent.
IT organizations can use several different practices to ensure that customers are getting what they
“paid for.” In some cases, Service Level Agreements (SLAs) can help define and communicate
the responsibilities of the service provider. IT departments can commit to specific performance
metrics and constantly track their success against these numbers. Data center automation tools
can also be helpful in ensuring that SLAs are being met.
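For example, an uptime commitment from an SLA reduces to simple arithmetic; the 99.9% target and the downtime figure below are illustrative numbers, not recommendations.

```python
def uptime_percent(total_minutes: int, downtime_minutes: int) -> float:
    """Percentage of the period during which the service was available."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes

# One 30-day month, with 43 minutes of unscheduled downtime.
month = 30 * 24 * 60                    # 43,200 minutes
availability = uptime_percent(month, 43)
print(round(availability, 3))           # 99.9
sla_met = availability >= 99.9          # assumed SLA target of 99.9%
```

Tracking a metric like this month over month gives the IT department a concrete number to report against its commitments.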


Communicating with Customers


An important aspect of overall success is related to vendors’ ability to align their products and
services with what their customers need. It’s also important to recognize that, in some cases,
what customers ask for might not be what they really want. By taking time to work in a “pre-
sales” role to better identify the source of problems faced by customers, better solutions can be
developed.
It’s important to continue communications with customers, just as successful businesses work to
earn repeated business from their existing client base. IT departments might find that their
customers’ needs have changed in reaction to new business initiatives or changing focus. This
should serve as a good indicator that it might be time to hold a review and potentially update the
products and services that are being provided. IT organizations may choose to restructure their
offerings based on customers’ changing needs. For example, if too many resources are currently
allocated toward products or services that do not address major company initiatives, it will
become obvious that these efforts could be better spent in another way.
The benefits of continued communications with customers are numerous. In the traditional
business world, it’s commonly understood that the value of keeping customers happy over time
is high. Figure 8 shows an example of a continuing customer-focused process that involves
“service after the sale.”

Figure 8: The customer-focused IT process cycle.


Managing Budgets and Profitability


Before a business can truly be considered successful, it must be able to show a profit—that is, its
revenue must exceed its overall costs. In the case of internal IT departments, actual money might
never change hands. And, the organization is typically structured to be a cost center.
Still, it’s important to ensure that IT is delivering its products at the best possible price to
customers while staying within budget. In some organizations, business and IT management may
decide to implement this goal through inter-department charge-backs (a system by which the IT
department will “charge” its customers and all expenditures will affect the budget of the
department requesting products or services).
The goal of the IT department should be to reduce costs while maintaining service levels and
products to its customers. This goal can often be achieved by increasing efficiency through the
development of standard practices and the use of data center automation tools. Table 3
provides sample numbers that show how various technology investments can improve
profitability.
Workstation deployment — Investment in automated deployment tools
    Current cost: $350/workstation; New cost: $90/workstation
    Benefit: $260 savings per workstation deployed
Server deployment — Investment in automated server deployment and configuration tools
    Current cost: $450/server; New cost: $125/server
    Benefit: $325 savings per server deployed
Average time to resolve basic Help desk issues — Purchase of automated Help desk system
    Current service level: ~4 hours; New service level: ~1.25 hours
    Benefit: Reduction in average resolution time by ~2.75 hours per issue
Server patch management — Automated security management solution
    Current cost: ~12 hours per server per year; New cost: ~2 hours per server per year
    Benefit: Reduction in the time and effort required to maintain servers by approximately 10 hours per server per year

Table 3: Improving IT product profitability.
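Benefit figures such as those in Table 3 come from straightforward arithmetic. The deployment volume and tool price below are assumed values used only for illustration.

```python
def annual_savings(old_cost: float, new_cost: float, volume: int) -> float:
    """Yearly savings when a per-unit cost drops and `volume` units are handled."""
    return (old_cost - new_cost) * volume

def payback_months(investment: float, yearly_savings: float) -> float:
    """Months needed for the savings to cover the tool's purchase price."""
    return 12.0 * investment / yearly_savings

# Assumed scenario: 500 workstations deployed per year, and a hypothetical
# $50,000 price for the automated deployment tools.
savings = annual_savings(350.0, 90.0, 500)          # $130,000 per year
print(savings, round(payback_months(50_000.0, savings), 1))
```

Calculations like this let IT management express a proposed automation investment in the payback terms that business decision makers expect.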

Overall, there are numerous benefits that stand to be gained by having IT departments treat users
and other business units as customers. By identifying groups of users, determining their needs,
and developing products and services, IT organizations can take advantage of the many best
practices utilized by successful companies. Doing so will translate into a better alignment
between IT departments and other areas of the organization and can help to reduce costs.


Total Cost of Ownership


Determining total cost of ownership (TCO) involves enumerating all the time- and effort-related
costs of implementing and maintaining IT assets. The main concept of TCO is that the
initial purchase price of a technology is often just a very small portion of the total cost.
Organizations benefit from getting a better handle on complete expenses related to their
technology purchases and considering the many different types of charges that should be taken
into account. This information will illuminate ways in which data center automation can help
reduce costs and increase efficiencies.

Measuring Costs
Costs related to the management of IT hardware, software, and network devices can come from
many areas. Figure 9 illustrates the types of costs that might be associated with a typical IT
purchase.

 The numbers are hypothetical approximations, and they will vary significantly based on the size
and amount of automation in various data center environments.

Figure 9: TCO breakdown by cost category — Labor Costs 47%, Infrastructure Costs 25%, Initial Capital Costs 22%, Misc./Other Costs 6%.
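The category percentages in Figure 9 can be derived from dollar totals; the amounts below are hypothetical figures chosen to match the figure's category shares.

```python
# Hypothetical multi-year cost totals for a group of servers, grouped by
# the categories shown in Figure 9.
costs = {
    "initial_capital": 110_000,
    "infrastructure": 125_000,
    "labor": 235_000,
    "misc": 30_000,
}

total = sum(costs.values())                          # 500,000
shares = {k: round(100 * v / total) for k, v in costs.items()}
print(total, shares)
# 500000 {'initial_capital': 22, 'infrastructure': 25, 'labor': 47, 'misc': 6}
```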


Identifying Initial Capital Costs


When asked about the cost of a particular product or service, IT staff will generally think first
about the purchase or “sticker” price of the device. Servers, network infrastructure devices, and
workstations all have very readily visible hard costs associated with them. The list of these basic
costs should include at least the following:
• Capital equipment purchases—These costs are probably the ones that first come to mind
for IT managers. For example, a new network-attached storage device might cost $6500
to purchase. The charge is usually paid one time (at the time of purchase) and is related to
the physical device or technology itself.
• Financing charges—IT organizations that choose to lease or finance the initial purchase
price of capital assets will need to factor in financing charges. Depending on the terms of
the lease, amounts may be due monthly, quarterly, or annually.
• Handling and delivery charges—Procuring new IT hardware requires basic transportation
and handling costs. These are often included in part of the purchase transaction, but in
some cases, additional delivery fees might be required.
Generally, these costs are easy to see. They will show up on invoices and purchase orders, and
can be tracked by IT managers and accounting staff without much further research. However,
they account for only one portion of the total cost.

Enumerating Infrastructure Costs


IT organizations should keep a close watch on ongoing costs that are related to supporting and
maintaining the devices. Regardless of whether an IT organization owns its data center facilities,
it will need to factor in costs for several areas of operation:
• Power—Costs related to electricity can represent a significant portion of total IT
expenditures. Although it can be challenging to isolate the exact amount of power used
by each device, average costs per amp of power can be calculated and distributed based
on devices’ requirements. Power is required for basic functioning of the device as well as
for cooling management. Furthermore, as the price of electricity can vary significantly,
many financial predictions can partially hinge upon numbers that are outside an
organization’s control.
• Support contracts and maintenance fees—Many IT hardware and software vendors offer
(or require) support and maintenance contracts for their devices. These costs may be one-
time additions to the purchase price, but generally, they’re paid through a monthly or
annual subscription.
• Infrastructure costs—Whenever new IT equipment is deployed, physical space must be
made for the new devices. Supporting infrastructure resources must also be provided.
This infrastructure might include additional network capacity (switch ports and
bandwidth), rack space, and any required changes to cooling and power capabilities.
• Network bandwidth—The implementation of new devices on the network will often
require some incremental increase in network connectivity. Although additional costs are
usually not required for every new device, total network-related costs should be
distributed over all the workstations, servers, and networks that are supported.
By factoring in the ongoing support and maintenance costs, one more piece of the TCO puzzle is
added.
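The power-allocation approach described above can be sketched in a few lines of Python. The device names, amp ratings, and dollar amounts are hypothetical; a real inventory would supply the actual figures.

```python
# Hypothetical example: distributing a shared monthly power bill across
# devices in proportion to each device's amp draw.

def allocate_power_cost(total_monthly_cost, device_amps):
    """Split a power bill proportionally to each device's amp requirement."""
    total_amps = sum(device_amps.values())
    return {name: round(total_monthly_cost * amps / total_amps, 2)
            for name, amps in device_amps.items()}

# Invented devices and ratings for illustration.
devices = {"web-server": 4.0, "db-server": 6.0, "nas-device": 2.0}
print(allocate_power_cost(1200.00, devices))
```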

45
The Reference Guide to Data Center Automation

Capturing Labor Costs


Once the initial purchase price of the physical device and related infrastructure is factored in, it’s
time to look at human resources. Labor-related costs associated with the entire life cycle of an IT
purchase often account for a large portion of the TCO for a particular device. Typical areas
include:
• Selection—The time it takes to evaluate various solutions and determine configurations
can affect the overall cost of an IT investment.
• Deployment—After equipment is delivered, labor is required in order to physically
“rack” the new device and to configure it for use. Some cost reductions can be realized
when installing numerous devices at the same time.
• Configuration and testing—New computers and network devices rarely come from the
factory completely ready for use. Initial configuration often requires significant time,
especially if it is a new device with which the IT department is unfamiliar. Testing is
critical in order to ensure a smooth deployment experience.
• Systems administration—Application and operating system (OS) updates, performance
monitoring, security management, and other routine tasks can add up to significant
ongoing computing costs.
• Replacement—It’s no secret that all IT investments have limited useful life spans and
must be replaced eventually. The cost of removing and replacing old hardware should be
factored into the total cost.
To calculate labor-related costs, IT managers should group their employees into specific skill
areas and determine average per-hour costs for those personnel. In many cases, it might make
more sense to determine an average value for the number of hours spent on certain tasks. Some
tasks, such as fixing hardware failures, may be performed infrequently, and only on a few
machines in the environment. For these cases, IT management can calculate the total number of
hours spent repairing hardware problems, then divide that by the total number of servers in the
environment. The result is a useful “cost per server” amount that can then be factored into other
technology decisions.
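The "cost per server" calculation just described is simple arithmetic; a minimal sketch follows, with invented figures for the repair hours, hourly rate, and server count.

```python
# Hypothetical figures: 120 hours of repair labor per year at $75/hour,
# spread across 200 servers in the environment.

def labor_cost_per_server(total_hours, hourly_rate, server_count):
    """Average labor cost attributable to each server."""
    return (total_hours * hourly_rate) / server_count

print(labor_cost_per_server(120, 75.0, 200))  # 45.0 dollars per server
```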

Measuring TCO
There are many challenges that IT organizations will face when trying to calculate TCO for the
devices they support. The main problem is in determining cost-related numbers. Some of this
information can come from reports by IT staff, but that data is often incomplete. Asset
management tools can greatly help keep track of “hard costs,” especially those related to new
purchases. These tools generally allow factoring in finance costs, operating costs, and
depreciation—all of which can be important for determining TCO.
A good source for labor-related costs can be an automated Help desk solution and change and
configuration management tools. IT staff can easily report on the amount of time they’ve spent
on specific issues by using these tools.


Reducing TCO Through Automation


Many of the real costs of technology investments stem from deployment and ongoing
management. Data center automation tools can greatly help in measuring and reducing TCO.
Reports of the time and effort required to maintain servers, workstations, and network
devices give IT managers a more accurate picture of total costs.
Once an organization has a good idea of where its major operational costs are coming from, it
can use this information to start reducing those costs. Most IT organizations will find that they
spend significant amounts of money on basic labor-intensive operations that can be quickly and
easily automated through the use of the right tools. For example, if a major cost component
related to supporting servers is deployment, automated server provisioning tools can help lower
those costs. Similarly, if a large portion of the expenses comes from ongoing maintenance,
automated monitoring, change, and configuration management solutions can help dramatically.
Overall, by keeping in mind the components of TCO, IT departments can make better decisions
related to managing and lowering the costs associated with service delivery.

Reporting Requirements
An old management saying states that “If you can’t measure it, you can’t manage it.” The idea is
that, without knowing what is occurring within the business, managers will be unable to make
educated decisions. This idea clearly applies to IT environments, where major changes happen
frequently and often at a pace that is much faster than that of other areas of the business. That is
where reporting comes in—the goal is for IT management to be able to gain the insight they need
to make better decisions. It is worth knowing which types of reports are most valuable, and
how those reports can be generated.

Identifying Reporting Needs


The first step in defining reporting requirements is to determine what types of information
will be useful. Although it’s tempting for technical staff to generate every possible report (just
because it’s possible), the real initial challenge is identifying which information matters
most.

Configuration Reports
Configuration reports show IT managers the current status of the hardware, software, and
network environments that they support. Details might include the configuration of specific
network devices such as routers or firewalls, or the status of particular servers. Basic
configuration information can be obtained manually through the use of tools such as the
Windows System Information Application (as shown in Figure 10).


Figure 10: Viewing configuration details using the Windows System Information tool.

These reports are very helpful for identifying underutilized resources and for spotting
potential capacity or performance problems. They are also instrumental in
ensuring that all systems are kept up to date with security and application patches. Reporting
solutions should be able to track assets that are located in multiple sites (including those that are
hidden away in closets at small branch offices) to ensure that nothing is overlooked.
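As a cross-platform flavor of the same idea, the standard-library `platform` module can gather a few of the basic configuration details that tools such as the Windows System Information tool display. The report's field names are our own choice, not part of any reporting standard.

```python
# Collect a few basic configuration details using only the standard library.
import platform

def basic_config_report():
    return {
        "hostname": platform.node(),
        "os": platform.system(),
        "os_release": platform.release(),
        "architecture": platform.machine(),
    }

for field, value in basic_config_report().items():
    print(f"{field:14}{value}")
```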

Service Level Agreement Reporting


An effective method for ensuring that IT departments are meeting their users’ needs is through
the use of well-defined Service Level Agreements (SLAs). An SLA might specify, for example,
how much downtime is acceptable for a specific application, or it might define an expected
turnaround time for the deployment of a new server. These agreements can affect operations
throughout the business, so IT managers generally want to keep a close watch on them. SLA
reporting features will allow IT staff to specify thresholds for specific metrics (such as server
deployment time), then provide for the creation of reports that show the expected values
compared against the agreed-upon values. These reports can also be helpful in improving the
perception of IT throughout an organization (assuming, of course, that service levels are met).
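A minimal sketch of this comparison logic, assuming each metric is "lower is better" (such as deployment time): measured values are checked against the agreed-upon thresholds. Metric names and numbers are illustrative only.

```python
# Compare measured values against agreed SLA thresholds (lower is better).

def sla_report(agreed, measured):
    """Return (metric, agreed value, measured value, met?) rows."""
    return [(metric, agreed[metric], measured[metric],
             measured[metric] <= agreed[metric])
            for metric in agreed]

agreed = {"server_deploy_days": 3, "ticket_resolution_hours": 4}
measured = {"server_deploy_days": 2, "ticket_resolution_hours": 5}
for row in sla_report(agreed, measured):
    print(row)
```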


Real-Time Activity Reporting


Most IT environments are characterized by rapid change. Many of these
changes occur in reaction to changing business requirements; others are performed in order to
improve the IT infrastructure. It can be very difficult to keep track of all of the changes that are
occurring on workstations, servers, network devices, and applications. In this arena, real-time
reporting can help. The information in these reports is always kept up to date and can be
referenced many times during the day. Ideally, the reports will include details about what
changed, why the change was made, and who performed the change. This information can
greatly assist business and technical staff in coordinating their activities and troubleshooting
problems.
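A real-time change report boils down to records that carry the what, why, and who described above. The records, dates, and usernames below are invented examples of such a structure.

```python
# Each change record carries the what, why, and who of the change.
from datetime import datetime

changes = [
    {"when": datetime(2024, 5, 1, 9, 30), "who": "jsmith",
     "what": "Opened port 8443 on fw01", "why": "New monitoring agent"},
    {"when": datetime(2024, 5, 1, 14, 5), "who": "akumar",
     "what": "Patched web02", "why": "Security update"},
]

def changes_by(operator):
    """All changes performed by a given operator, for troubleshooting."""
    return [c for c in changes if c["who"] == operator]

for c in changes_by("jsmith"):
    print(c["when"], "-", c["what"], "-", c["why"])
```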

Regulatory Compliance Reporting


Many industries are required to comply with government and industry-specific regulations to
ensure that their operations are within guidelines. Examples include the Sarbanes-Oxley Act (for
public companies) and the Health Insurance Portability and Accountability Act (HIPAA—for the
healthcare industry). Affected organizations must not only follow the rules but also be able to
prove compliance. Regulatory
compliance reports generate information related to the metrics of the current IT environment,
then compare this data against specific regulatory requirements. With this information, IT
management can quickly identify any deficiencies that must be resolved.

Generating Reports
Once an organization has determined the requirements for its reports, it can start looking at how
the reports can be generated. There are many ways in which report creation and generation can
be simplified.

Using a Configuration Management Database


Determining the source of reporting data can be difficult in many IT organizations. Data tends to
be stored in a variety of different “systems,” including paper-based records, spreadsheets, custom
database solutions, and enterprise systems. It can be very difficult to bring all this information
together due to differences in the types of data and how information is structured. By using a
centralized Configuration Management Database (CMDB), IT departments can store the
information they need within a single solution. This data store greatly simplifies the creation and
generation of reports, and can help ensure that no information is overlooked (see Figure 11).


Figure 11: How a CMDB can help facilitate reporting.
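As a toy illustration of the CMDB idea, the standard-library `sqlite3` module can hold configuration items in one queryable store. The schema and sample rows are simplified far beyond any real CMDB design.

```python
# Toy CMDB: one queryable store for configuration items.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE configuration_items (
                    name TEXT PRIMARY KEY,
                    ci_type TEXT,
                    location TEXT)""")
conn.executemany(
    "INSERT INTO configuration_items VALUES (?, ?, ?)",
    [("web01", "server", "HQ"),
     ("fw01", "firewall", "HQ"),
     ("nas01", "storage", "Branch-3")])

# A single query now answers "what do we have at each site?" for reports.
rows = conn.execute("""SELECT location, COUNT(*)
                       FROM configuration_items
                       GROUP BY location
                       ORDER BY location""").fetchall()
print(rows)  # [('Branch-3', 1), ('HQ', 2)]
```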

Automating Report Generation


One of the challenges related to many types of reporting is that, as soon as the report is
generated, it’s out of date. Fortunately, electronic report distribution methods can help alleviate
this problem. Automated IT reporting solutions can provide numerous features:
• On-demand reporting—Whenever necessary, users should be able to generate up-to-the-
second reports on-demand. This type of reporting is particularly useful when managers
want to closely track information that might change during the day. In larger IT
environments, reports might take a significant amount of time to generate, so scheduling
options can be helpful.
• Automatic report distribution—Many business processes revolve around regular meetings
and review processes. Automated reporting solutions that have the ability to
automatically send reports based on a predefined schedule can help ensure that everyone
is kept up to date. Reports can be distributed via an intranet site or through email.
• Alerts—IT managers often expect their staff to notify them if some aspect of the
organization needs special attention. The same requirement is true for reporting.
Reporting solutions can provide the ability to set alerts and thresholds that can highlight
particularly important or interesting aspects within reports. For example, if downtime has
currently exceeded the limits specified by the service level for the Engineering
department, IT managers could have the report highlight this in red.
Overall, through the use of automated reporting, IT departments can gain the information they
need to make better decisions about their business and technical operations. The result is
reductions in cost and improvements in service levels.


Network and Server Convergence


Over time, IT applications have evolved to become increasingly reliant on many components of
an IT infrastructure. In the past, it was common for even enterprise-level applications to be
hosted on one or a few servers, and many organizations’ networks were centralized. The job of
the network was to ensure that clients could connect to these servers. Modern enterprise
applications are significantly more complex and often require dozens of different portions of
an IT environment to be working properly. Server and network management have converged
to a point at which they’re highly interdependent.

Convergence Examples
In typical IT environments, there are many examples of devices that blur the line between
network and server operations. Dedicated network appliances—such as network-attached storage
(NAS) devices, firewalls, proxy servers, caching devices, and embedded Web servers—all rely
on an underlying operating system (OS). For example, although some NAS devices are based
on a proprietary network OS, many devices include optimized versions of Windows (such as
the Windows Storage Server) or Linux platforms. Figure 12 shows an example of this
configuration.

Figure 12: Components of a typical NAS device “stack.”

In many of these systems, there are clear advantages to this type of configuration. For example,
several major firewall solutions run on either Windows or Linux platforms. The benefit is that
systems administrators can gain the usability features of the underlying OS while retaining the
desired functionality. From a management standpoint, however, this configuration might require
a change to the standard paradigm—to ensure that the device is performing optimally, network
and systems administrators must share these responsibilities.


Determining Application Requirements


One of the most important functions of IT departments is ensuring that users can access the
applications they need to perform their jobs. The applications themselves rely on server
resources as well as the network. An important first step in managing this convergence is to
identify the requirements that applications may have. Table 4 provides an example of some
typical types of applications and their dependencies.
• CRM—Servers required: AppServer01, WebServer01, DatabaseServer01. Networks
required: VPN access for all branch offices and traveling users.
• Engineering/QA Defect Tracker—Servers required: Engineering01, Engineering03,
EngineeringDB05. Networks required: Engineering network, Internet access, and VPN
access (home users).
• Intranet—Servers required: IntranetServer01 (cluster), KnowledgeMgmtDB. Networks
required: corporate LAN, branch office WANs, and VPN.

Table 4: Identifying application requirements and dependencies.

By highlighting these requirements, IT staff can better visualize all the network and server
infrastructure components that are required to support a specific application.
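Table 4 lends itself to a simple data structure that also lets IT staff query dependencies in the other direction, for example, finding every application that depends on a given server. The server and network names below come from the table; the query function is our own illustration.

```python
# Table 4 as a data structure for dependency queries.

app_dependencies = {
    "CRM": {
        "servers": ["AppServer01", "WebServer01", "DatabaseServer01"],
        "networks": ["VPN (branch offices and traveling users)"],
    },
    "Intranet": {
        "servers": ["IntranetServer01 (cluster)", "KnowledgeMgmtDB"],
        "networks": ["Corporate LAN", "Branch office WANs", "VPN"],
    },
}

def apps_using(server):
    """Which applications would be affected if this server went down?"""
    return [app for app, deps in app_dependencies.items()
            if server in deps["servers"]]

print(apps_using("WebServer01"))  # ['CRM']
```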

The Roles of IT Staff


In the past, IT operations tended to be specialized in numerous isolated roles. A typical staff
might include network specialists, database specialists, server administrators, and application
managers. It was often acceptable for each of these administrators to focus on his or her area of
expertise with limited knowledge of the other areas. For most modern IT organizations, this
structure has changed. Systems administrators, for example, often need strong network skills in
order to complete their job roles. And application developers must take into account the
underlying network and server infrastructure on which their programs will run.
Unfortunately, it’s impractical to expect all IT staff members to have strong skills in all of these
areas. To address this issue, IT departments must rely on strong coordination between the many
functional areas of operations to ensure that applications continue to function properly.

Managing Convergence with Automation


When handled manually, it can be challenging for IT staff to develop the levels of coordination
that are required to ensure that converged applications are managed properly. However, many of
the features of data center automation tools can help. First, by storing network- and server-
related configuration details in a single Configuration Management Database (CMDB), IT staff
can more easily see the inter-dependencies of the devices they support. Change and configuration
management tools can help ensure consistency in how these devices are managed. Overall, the IT
staff can better manage the complexity resulting from the convergence of network and server
management through the use of data center automation tools.


Service Level Agreements


The primary focus of IT departments should be meeting the requirements of other members of
their organizations. As businesses have become increasingly reliant on their technology
investments, people ranging from desktop users to executive management have specific
expectations related to the levels of service they should receive. Although these expectations
sometimes coincide with understandings within an IT organization, in many cases, there is a
large communications gap.
Service Level Agreements (SLAs) are intended to establish, communicate, and measure the
levels of service that will be provided by IT departments. They are mutually agreed-upon
definitions of scope, expected turnaround times, quality, reliability, and other metrics that are
important to the business as a whole.

Challenges Related to IT Services Delivery


In some areas of IT, the job can be rather thankless. In fact, it is sometimes said that no one even
thinks about IT until something goes wrong. Although many organizations see IT as a strategic
business investment, others see it only as a cost center. The main challenge is to reach an
understanding that reconciles the capabilities of the IT department with the expectations of
the “customers” it serves. That is where the idea of service levels comes in. To foster this
mindset, IT departments can think of themselves as outside vendors that
are selling products and services to other areas of their organization. Let’s look at some details
related to defining these agreements.

Defining Service Level Requirements


SLAs can be set up in a variety of ways, and there are several approaches that can be taken
toward developing them. One common factor, however, is that all areas of the organization must
be involved. SLAs are not something that can be developed by IT departments working in
isolation. The process will require research and negotiations in order to determine an appropriate
set of requirements. Figure 13 provides an overview of the considerations that should be taken
into account. Let’s look at the process of defining SLAs.


Figure 13: The process of developing, implementing, and evaluating SLAs.

Determining Organizational Needs


An IT department can benefit from thinking of its services as “products” and the users and
business processes it supports as “customers.” In this model, the goal of the IT department is to first
determine which services the customer needs. This is perhaps the single most important part of
the process: IT managers must meet with users and other managers throughout the organization
to determine what exactly they need in order to best accomplish their goals. This process can be
extremely valuable and enlightening by itself. It’s very important to keep the main goal in mind:
To determine what organizations truly need, rather than what would just be nice to have.


Identify Service Level Details


The next step is to start trying to define specific details related to what service levels should be
accepted. This process will ideally work as a negotiation. A manager from the Engineering
department might want all new server deployments to be completed within 2 days of the request.
Based on IT staff and resources, however, this might not be possible. The IT manager might
present a “counter-offer” of a turnaround time of 4 days. If this isn’t acceptable, the two can
discuss alternatives that might make the goal more attainable. In this example, an
investment in automated server deployment tools, virtualization, or additional dedicated staff
might all be possible ways to meet the requirements.
When discussing goals, it’s important for business leaders to avoid diving too far into technical
details. For example, rather than requesting a “clustered database solution for the CRM
application,” it is better for a Marketing manager to state the high-level business requirement,
“We need to ensure that, even in a worst-case scenario, our people can access the CRM
application.” In this particular case, it might well be that the best technical solution doesn’t
involve clustering at all. The bottom line is that it’s the job of IT to figure out how to meet the
requirements.
A major benefit of this negotiation process is that it forces both sides to communicate details of
their operations, and it allows each side to compromise to find a solution that works within given
constraints. Occasionally, it might seem impossible for an IT department to meet the needs of a
particular business area. In this case, either expectations have to be adjusted or budgetary and
staffing resources might be required. In any case, communicating these issues makes the topics
open and available for discussion. Once acceptable terms have been reached, it’s time to
determine what to include in the SLA.

Developing SLAs
There are several important points to include in a complete SLA. Of course, it begins with a
description of what level of service will be provided. At this point, the more detailed the
information, the better it will be for both sides. Details should include processes that will be used
to manage and maintain SLAs. For example, if a certain level is not being met, points of contact
should be established on the IT and business sides.
In many cases, IT departments might find that many different service level requirements overlap.
For example, several departments might require high availability of Virtual Private Network
(VPN) services in order to support traveling users and remote branch offices. This can help IT
managers prioritize initiatives to best meet their overall goals. In this example, by adding better
monitoring and redundancy features into the VPN, all areas of the organization can benefit.


Delivering Service Levels


IT managers might be apprehensive about committing to specific service levels. Due to
the nature of technology, it’s quite possible that situations could arise in which SLAs cannot be
met (at least not for all areas of the organization). An extreme example might be the “perfect
storm” of industry-wide hardware shortages combined with a lack of staff. In such a case,
circumstances beyond the control of an organization can cause failures to meet the predefined
goals.
Overall, IT departments and business leaders should treat SLAs like they would any other target
(such as sales-related goals or Engineering milestones). Ideally, the levels will always be met.
But, when they’re not, everyone involved should look into the issues that caused the problem and
look at how it can be resolved and avoided in the future. Even in the worst case, having some
well-defined expectations can help avoid miscommunications between IT and its customers.

The Benefits of Well-Defined SLAs


When implemented properly, SLAs can help make the cost and challenges related to IT
operations a part of the entire organization. By providing some level of visibility into IT
operations and costs, other departments can get an idea of the amount of work involved. This can
help manage expectations. For example, once the Accounting department understands the true
cost of ensuring automated failover, perhaps it might decide that some unplanned downtime is
acceptable.
IT management can benefit greatly from the use of SLAs. They can use these agreements to
justify expenditures and additional staff if appropriate resources are not available to meet the
required levels. By communicating these issues up front, either their service levels must be
lowered or necessary resources must be made available. Either way, the decision is one that the
organization can make as a whole.
Another major benefit of using SLAs is that investments in technologies such as data center
automation products can become much more evident. When relatively small investments can
quickly return increases in service levels, this is a clear win for both the IT department and the
users it supports.

Enforcing SLAs
When dealing with outside parties, an agreement is often only as strong as the terms of any
guarantee or related penalties. Because most IT departments tend to be located in-house, it’s
generally not appropriate to add financial penalties. Thus, the enforceability of SLAs will be up
to the professionalism of the management team. When goals are not being met, reasons should
be sought out and the team should work together to find a solution. SLAs should be seen as
flexible definitions, and business leaders should expect to adjust them regularly. As with other
performance metrics, organizations might choose to tie salary and performance bonuses to
SLA results.
Perhaps the biggest challenge is that of prioritization. Given a lack of labor resources, what is
more important: uptime for the CRM application or the deployment of new Engineering servers?
To help in these areas, IT managers might want to schedule regular meetings, both inside and
outside of the IT department, to be sure that everyone in the organization understands the
challenges.


Examples of SLAs
The actual details of SLAs for organizations will differ based on specific business needs.
However, there are some general categories that should be considered. One category is that of
application, hardware, and service uptime. Based on the importance of particular portions of the
IT infrastructure, availability and uptime goals can be developed. Other types of SLAs might
focus on deployment times or issue resolution times.
Table 5 provides some high-level examples of the types of SLAs that might be developed by
an organization. The examples focus on numerical metrics, but it’s also important to keep in
mind that “soft metrics” (such as overall satisfaction with the Service Desk) might also be
included.
• CRM Application Uptime—Metric: percent availability. Goal: 99.9% availability.
Notes/Terms: excludes planned downtime for maintenance operations and downtime due
to unrelated network issues; major application updates might require additional planned
downtime.
• Service Desk: Level 1 Issue Resolution—Metric: issue resolution time. Goal: 4 business
hours. Notes/Terms: include definition of “Level 1 Issues.”
• Service Desk: Level 2 Issue Resolution—Metric: issue resolution time. Goal: 8 business
hours. Notes/Terms: time is measured from original submission of the issue to the
Service Desk; include definition of “Level 2 Issues.”
• Engineering: New Server Deployments (Physical machine)—Metric: time to deployment.
Goal: 3 days. Notes/Terms: time is measured from when the formal change request has
been approved; SLA applies only to servers that will be hosted within the data center.
• Engineering: New Server Deployments (Virtual machine)—Metric: time to deployment.
Goal: 2 hours. Notes/Terms: virtual machines must use one of the three standard
configuration profiles; time is measured from when the formal change request has been
approved.

Table 5: Examples of SLAs.
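To make a goal such as the 99.9 percent uptime figure in Table 5 concrete, it helps to translate availability percentages into absolute downtime. This short calculation assumes a 30-day month.

```python
# How much absolute downtime does a 99.9% availability goal allow?

def allowed_downtime_minutes(availability_pct, days=30):
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

print(round(allowed_downtime_minutes(99.9), 1))  # about 43.2 minutes/month
```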

Now that we’ve looked at some examples, let’s see how IT organizations can keep track of
SLAs.


Monitoring and Automating SLAs


Once SLAs have been put into place, it’s up to the IT department to meet the goals that have
been agreed upon. Although some environments might attempt to handle issues only when they
arise, the ideal situation is one in which IT managers regularly produce reports showing SLA-
related performance. This can be done manually, but in many cases, the management and process
overhead related to tracking issue resolution times and uptime can be significant.
One important way in which SLAs can be better monitored and managed is through the use of
data center automation tools. Integrated platforms include features for monitoring uptime,
automating deployment, and tracking changes. They can also provide IT managers with the
ability to define service levels and measure their actual performance against them. Reports can
be generated comparing actual performance with expected performance. Without these reports,
people might have to guess whether SLAs are being met, and the inevitable perception issues
can negate many of the advantages of having created the SLAs in the first place.
Overall, through the establishment of SLAs, IT departments can verify that they are meeting
their customers’ requirements and ensure that the organization is receiving the expected value
from their IT investments.
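A sketch of the measurement side described above: computing actual availability from logged outage durations and comparing it with the agreed figure. The outage minutes are invented.

```python
# Compute actual availability from logged outage durations and compare it
# with an agreed 99.9% figure.

def availability_pct(period_hours, outage_minutes):
    downtime_hours = sum(outage_minutes) / 60
    return 100 * (1 - downtime_hours / period_hours)

outages = [12, 30, 8]  # minutes of unplanned downtime this month
actual = availability_pct(30 * 24, outages)
print(f"measured {actual:.3f}% vs. agreed 99.9% -> met: {actual >= 99.9}")
```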

Network Business Continuity


Network business continuity focuses on ensuring that network operations resume as quickly
as possible after a major outage or disaster. The goal is to limit the disruption
to service caused by the failure of a device, a network, or even an entire data center. Most
implementations will involve a backup site and a process for failing over to that site, when
needed. There are many factors that IT managers should keep in mind when developing network
business continuity plans.

The Benefits of Continuity Planning


Business continuity, in general, has become increasingly important for many types of
organizations. Customers and business partners have grown reliant on applications
and services, and even minor downtime can cause significant financial losses. For example, the
loss of connectivity lasting a few minutes for a financial institution can result in lost revenues
and reduced customer confidence, both of which would be difficult to regain. The list of things
that can go wrong is a long one, ranging from issues with electricity to widespread natural
disasters. Business continuity planning attempts to mitigate these risks by planning for processes
that will resume normal operations, even in a worst-case scenario.


Developing a Network Business Continuity Plan


The success of any continuity process hinges on its accuracy and alignment with business needs.
This section will look at the many considerations that should be taken into account when
developing a network business continuity plan. Figure 14 shows a high-level view of the
processes that should be included.

Figure 14: Example steps of a network business continuity plan.

Defining Business Requirements


The first step in developing a network business continuity plan is to determine the organization’s
requirements. Although all systems are important, certain areas of the network might be more
important than others. The key to determining requirements is to involve the entire
organization. The IT department shouldn’t rely solely on its own judgment to decide which
areas of the computing infrastructure matter most. Given infinite
resources, multiple duplicate network environments might be possible. In the real world, it’s
much more likely that budget and labor constraints will restrict the reasonable level of protection
against failures and disasters.
A realistic plan should include discussions of the costs of downtime, the effects of data loss, and
the importance of various areas of the network. Ideally, a list of critical systems will be
developed based on input from the organization’s entire management team.

Identifying Technical Requirements


Modern IT networks tend to be complicated. There are many interdependencies between devices
such as switches, routers, firewalls, and network caching devices. And this list doesn't even
include the servers and applications that rely on that infrastructure. When planning for
business continuity, IT staff should first develop a high-level overview of the network topology
and should outline critical systems. The goal is to ensure that the base levels of the infrastructure
(which will be required by all other systems) are identified.
The next step is to enumerate which devices will be required in the event of a failover process.
Core routers, switches, and firewalls will probably be the first items on the list. Next would be
devices required to support the most important applications and services on the network.
Considerations should include how the network can run with reduced capacity (particularly if the
budget doesn’t allow for full redundancy).


Preparing for Network Failover


In the event of a network outage, failover processes must be performed. But before these steps
can be taken, IT departments must ensure that they have the tools and information required. This
section will take a look at some of the most important considerations.

Configuration Management
Keeping track of network configuration files is an important first step to enabling the failover
process. In the event of a failover, restoring this information will help bring a network back to a
usable state. Whenever configuration changes are made, network administrators must be sure
that the change is recorded and replicated to any backup or standby devices.
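As a rough illustration, the replicate-and-verify step can be scripted. The sketch below is a generic Python example with hypothetical file paths — real deployments would pull configurations from devices through their management interfaces — that copies a saved configuration to a standby location and returns a checksum that can be recorded for change tracking:

```python
import hashlib
import shutil
from pathlib import Path

def replicate_config(primary_path: str, standby_dir: str) -> str:
    """Copy a saved device configuration to a standby location and
    return a checksum that can be recorded for change tracking."""
    src = Path(primary_path)
    dest = Path(standby_dir) / src.name
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dest)
    # A checksum lets administrators verify that the primary and
    # standby copies of the configuration are identical.
    return hashlib.sha256(dest.read_bytes()).hexdigest()
```

Storing the checksum alongside the change record makes it easy to confirm later that the standby copy has not drifted from the primary.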

Managing Network Redundancy


The implementation of redundancy is a major component of most business continuity plans.
When planning for redundancy, it’s important to start with defining acceptable downtime limits
and appropriate failover times. Most enterprise-level solutions offer options for enabling
automatic failover of routers, switches, firewalls, content caches, and other network devices. It is
important to keep in mind that, in the case of most failovers, the process might be noticeable to
users (although the impact will hopefully be limited to a few connections that need to be
reestablished).
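Defining acceptable downtime limits often starts with simple arithmetic: an availability target translates directly into a downtime budget, which in turn constrains how long failover is allowed to take. A minimal sketch, assuming round-the-clock operation:

```python
def allowed_downtime_minutes(availability_pct: float,
                             period_hours: float = 365 * 24) -> float:
    """Minutes of downtime permitted per period at a given availability."""
    return period_hours * 60 * (1 - availability_pct / 100)

# "Three nines" (99.9%) over a year allows roughly 525.6 minutes --
# a little under nine hours -- of total downtime; "four nines"
# (99.99%) allows only about 52.6 minutes.
```

If the annual budget is under an hour, even a few manual failovers per year can consume it, which is a strong argument for the automatic failover features mentioned above.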

Simulating Disaster Recovery Operations


An important—but often overlooked—aspect of any recovery process is to rehearse the failover
and business continuity plan. There are many benefits to walking through this process. First,
through a trial run, it’s likely that business and technical staff will find areas for improvement in
the plan. Even the best planning can overlook some of the details that are revealed when
performing the “real thing.” In the worst case, perhaps a critical system was completely
overlooked. Or, there may be various time-saving changes that can be made to improve the
process.
Another major benefit of simulating disaster recovery is that practice builds expertise. IT staff
should be well-versed in what is required to perform failover processes. There is one iron-clad
rule related to testing recovery processes: Immediately after the failure of a critical system is not
the time to start learning how to recover it.


Automating Network Business Continuity


There are many aspects of an organization's network that must be considered when developing
and preparing a business continuity plan. For most organizations, the tasks involved will require
a lot of work. Fortunately, automated data center management tools can help make the process
easier. For example, through the use of automated network discovery, network administrators
can easily look at the overall network and discover interdependencies. And, through the use of
configuration management (ideally with a configuration management database—CMDB),
accurate network device configuration details can be collected. The process of keeping routers,
switches, and firewalls up to date at a backup site can also be performed automatically. Figure 15
provides an example of how this process might work.

Figure 15: Maintaining a failover configuration using data center automation tools.

Developing a network business continuity plan is no small task for most IT departments.
Through the use of data center automation solutions, however, this critical task can be made
much more manageable.


Remote Administration
In modern IT environments, systems and network administrators are often tasked with managing
increasing numbers of devices without additional time and resources. In addition, the systems
might be spread out over numerous sites. Centralized management can help meet these needs by
increasing overall efficiency. IT staff should be able to manage devices that are located across
the world just as easily as they can manage the computing devices on their desks. Remote
administration can be used to improve systems and network administration in an IT environment.

The Benefits of Remote Administration


When thinking of desktop administration, the term “SneakerNet” (referring to the fact that
systems administrators often spend much of their time and effort walking between systems)
might come to mind. For this reason, remote administration is a concept that is usually an easy
“sell” to IT departments. When you factor in the labor costs and time associated with physically
traveling to remote offices and departments, it’s difficult to find a method that is less efficient.
Before looking at specific requirements related to remote administration, let’s quickly cover
some of the potential benefits of remote administration. First and foremost, by centrally
managing the configuration of hardware, software, and network devices, systems administrators
can work from the comfort and convenience of their own workstations. Although having to deal
with left-handed mice and custom keyboards might be a fun challenge, it’s clearly not efficient.
Time saved by avoiding walking around is another obvious benefit. For managing data
center operations, you can increase security by limiting physical access to servers. From an end-user
standpoint, having problems solved quickly and with minimal disruption to work is an
important goal. By now, the benefits are probably pretty obvious. Let's delve into what you
should look for in a remote management solution.

Remote Administration Scenarios


From a technical standpoint, remote administration can take many forms. Perhaps the most
familiar to systems administrators is that of managing servers located in the data center or
troubleshooting end users’ desktop machines. Network administrators can also perform remote
administration tasks to configure routers, switches, firewalls, and other devices. In distributed
environments, the remotely managed device might be located a few feet away or halfway across
the world.
Some terms to be familiar with include the remote management host (the computer or device to
which you are connecting), and the remote management client (which is usually implemented as
software that is run on users’ workstations). Additionally, an organization might have specific
tools for monitoring and managing machines remotely.


Remote Management Features


There are several important features to consider when evaluating and selecting a remote
management solution:
• Broad support—The ideal remote management solution will be able to support a variety
of device types, platforms, and versions. For example, in the area of desktop
administration, the remote administration client should be able to connect to all of the
operating systems (OSs) and versions that an organization regularly supports. Support for
future OSs and products should also be taken into account. All of these platforms should
be managed in a consistent manner.
• Reliability—As organizations depend on remote administration features for both routine
and emergency operations, reliability is a major concern. The client- and server-sides of
the remote management solution should be robust and dependable. Features that allow for
remotely restarting a non-responsive host device can be helpful in a pinch. In addition,
the ability to perform “out-of-band” management (that is, connections to a system by
using non-standard connection methods) can help ensure that services are available when
you need them most.
• Efficient bandwidth utilization—Remote management features should efficiently use
network bandwidth. In some cases, remote administration connections may be made over
high-bandwidth connections, so this won’t be an issue. However, when managing remote
data centers, small branch offices, and international locations, using an efficient protocol
can really help. Potential issues include low throughput rates and high latency on
networks (both of which can make a remote connection practically unusable). Specific
features to look for include the ability to provide for data compression, low average data
rates, and ways to minimize latency given a variety of different network scenarios. In the
area of desktop administration, for example, reducing the color depth, hiding desktop
backgrounds, and changing screen resolution can help decrease requirements (see Figure
16).

Figure 16: Configuring video settings in the Windows XP Remote Desktop client.


• File transfers—In addition to controlling remote computers, Help desk staff and systems
administrators might need a quick and easy way to transfer files. In some cases, transfers
can be handled outside of the remote administration solution by using standard network
file transfer methods. In other cases, such as when a connection is made to a remote
office or across multiple firewalls, a built-in solution that uses the same protocol and
connection as the remote connection can be helpful.
• Shadowing support—For training and troubleshooting purposes, the ability to “shadow” a
connection can be helpful. In this method, the remote user might have view-only
privileges on the remote device. Or, a trainer might be able to demonstrate an operation
on a remote computer without worrying about interruptions from a user.
In addition to these basic features, let’s look closer at details related to security.

Securing Remote Management


A critical concern related to remote management features is security. After all, if you're adding a
new way to access your users' computers (and the data they contain), what is to keep
unauthorized users from exploiting it? Fortunately, most modern remote management
tools offer many capabilities to help address these concerns.
First, authentication security—controlling who can remotely access a machine—must be
implemented. At its simplest, authentication security can take the form of a
"shared secret" username and password combination. But this approach leaves much to be
desired—by creating new login information, many potential security problems can be
introduced. In addition, it’s difficult to manually manage these settings. For example, what
happens when systems administrators enter and leave the company? For this purpose, reliance on
directory services (such as Windows Active Directory—AD) can help greatly. By centrally
managing security settings and permissions, systems administrators can keep track of which
users have access to remotely manage which resources.
The next type of security to consider is encryption. Most remote management tools will transfer
sensitive information in some form. Even keystrokes and converted video displays can be
misused if they’re intercepted. The security solution should provide for verification of the
identity of local and remote computers (through the use of certificates or machine-level
authentication) and should implement encryption of the packets that are being sent between the
client and server.
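For illustration, Python's standard ssl module shows what such a channel looks like in practice: the default context verifies the server's certificate against the system trust store and checks the hostname before any application data is exchanged. This is a generic sketch, not a feature of any particular remote management product, and the host and port are placeholders:

```python
import socket
import ssl

def open_secure_channel(host: str, port: int) -> ssl.SSLSocket:
    """Open a TLS connection that verifies the server's certificate
    and encrypts all traffic between client and server."""
    # create_default_context() enables both certificate validation
    # and hostname checking by default.
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=10)
    return context.wrap_socket(raw, server_hostname=host)
```

A hostname mismatch or an untrusted certificate causes the handshake to fail before the session starts — exactly the machine-identity verification described above.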
Finally, a remote management solution should provide administrators with the ability to
configure, review, and manage permissions related to remote management. In some cases, being
able to remotely manage a computer or other device will be an all-or-nothing proposition—either
the user will be able to fully control the device or they won’t. In other cases, such as in the case
of remote desktop management, you might choose to restrict the operations that some users can
perform. For example, a Level-1 Service Desk staff member might be allowed to only view a
remote desktop machine while the user is accessing it. This can help in the area of
troubleshooting, while maintaining adequate security and avoiding potential problems that might
be caused by accidental changes.


Choosing a Remote Management Solution


Most systems and network administrators already commonly use remote management features.
For example, on Windows desktop and server computers, the Remote Desktop feature is easily
accessible. And, for network devices, it’s a simple and straightforward process to connect over
the network rather than to a physical serial port or dedicated management port on the device
itself. Although these features might meet the basic needs of systems management, they do leave
a lot to be desired. Managing permissions, keeping track of logins, and controlling connection
details can make the process cumbersome and error-prone.
An ideal remote management solution will integrate with other IT data center operations tools,
utilities, and processes. For example, in the area of security, existing directory services will be
used for authentication and the management of permissions. Security can also be improved by
maintaining an audit log of which staff members connected to which devices (and when).
The remote management features may also integrate with change and configuration management
tools to keep track of any modifications that have been made. This functionality can greatly help
in isolating problems and ensuring compliance with IT standards. Additionally, processes should
be put in place to ensure that remote management features are used only when necessary. For
example, if automated tools can be used to change network address information on a server,
systems administrators should only connect to those machines if they need to perform a more
complicated task.
Overall, there are many potential benefits of working with remote management tools in
environments of any size. When managed and implemented correctly, remote administration can
save significant time and effort while improving IT operations and the end-user experience.


Server Configuration Management


The servers that an IT department manages for other members of the organization are among the
most visible and critical portions of the infrastructure. Whether hosting file shares, databases, or
other critical application services, servers must be available and properly configured at all
times. The challenge for IT staff is ensuring that these computers remain properly configured and
problems don’t crop up over time. This section will talk about details related to server
configuration management, including important things to keep in mind when documenting and
configuring servers. Based on that, it will then look at details related to simplifying and
improving the process through automation.

Server Configuration Management Challenges


When working in production data center environments, there are many challenges that can make
managing server configurations more difficult. They can broadly be categorized as technical
challenges and process-related challenges.

Technical Challenges
Regardless of the operating system (OS) platform or the applications that are supported, all
servers must be kept up to date by systems administrators. Common tasks that must be
performed include installing security patches, managing changes to system and network
configurations, and taking an inventory of installed services. These operations are fairly simple
to perform on one or a few servers, but in most data center environments, IT staff members must
manage dozens or hundreds of machines.
Technical challenges include the actual deployment of updates and configuration changes.
Performing this task manually is time-consuming and tedious, even when using remote
administration features. Also, it’s far too easy for systems administrators to accidentally overlook
one or a few machines. In the case of implementing security patches, the result could be serious
security vulnerabilities.
Other challenges are related to actually performing configuration changes. IT departments should
ensure that changes are made consistently, that they adhere to best practices, and that any
modifications are tracked and documented. It’s also important to ensure that only authorized
administrators are making changes and to track who made modifications. Although most systems
administrators would agree to this process, in the real world, it can be difficult to spend the time
and attention required to follow these steps every time.


Process-Related Challenges
It’s important for IT departments to implement and enforce processes related to change and
configuration management. The goal is to ensure that all changes are valid and authorized and to
avoid problems that might appear due to inappropriate modifications to server configurations.
Unfortunately, ensuring communications between IT staff, management, and the users they
support can be difficult. The result is that some changes can cause unexpected problems due to a
lack of coordination.
IT management should also consider “quality assurance” processes and auditing of server
configurations. Ideally, management would be able to quickly and easily view up-to-date details
related to the configuration of all servers in the environment, regardless of location. This can
help identify machines whose configurations are outdated or not in compliance with IT policies.

Automating Server Configuration Management


Server configuration management is an excellent candidate for automation in most data center
environments. Many of the tasks that must be routinely performed can occur within minutes
rather than days, weeks, or months. Let’s take a look at the many features and benefits of
automating server configuration management.

Automated Server Discovery


An important first step in managing an entire IT environment is to discover what is out there.
Instead of manually connecting to individual servers and collecting configuration details,
automated server discovery features can scan the network and discover all the servers that are
present on the network. Often, this will include computers that systems administrators weren’t
aware of, and machines whose purpose is unknown. The computers may be located in the
organization’s data centers, or within remote branch offices.
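A very simplified discovery pass can be sketched as a parallel TCP probe. Real discovery tools combine ping sweeps, SNMP queries, and agent inventories; the port and address list here are placeholders for illustration only:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def discover_hosts(addresses, port=22, timeout=0.5):
    """Return the subset of addresses accepting TCP connections on the
    given port -- a crude stand-in for an automated discovery scan."""
    def probe(addr):
        try:
            with socket.create_connection((addr, port), timeout=timeout):
                return addr
        except OSError:
            return None  # closed, filtered, or unreachable
    # Probe many addresses concurrently so a scan finishes quickly.
    with ThreadPoolExecutor(max_workers=32) as pool:
        return [a for a in pool.map(probe, addresses) if a is not None]
```

Even a crude scan like this tends to turn up the "unknown" machines mentioned above, which can then be investigated and brought under management.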

Applying Configuration Changes


Once an IT department has decided to make a change on all of its servers, it must begin the
tedious and time-consuming process of performing the changes. By using an automated solution,
however, a single change can be propagated throughout an entire network environment in a
matter of minutes or hours. The changes can be scheduled to occur during periods of low activity
and results can be automatically collected. An automated process enforces consistency and helps
ensure that some systems are not accidentally overlooked during the update process.
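The fan-out-and-collect pattern behind such a rollout might look like the following sketch, where `apply_fn` is a hypothetical callable that performs the change on one server. Failures are recorded per server rather than silently dropped, so nothing is overlooked:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def apply_change_everywhere(servers, apply_fn, max_workers=10):
    """Apply one configuration change to every server in parallel and
    collect per-server results, so no machine is silently skipped."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(apply_fn, s): s for s in servers}
        for fut in as_completed(futures):
            server = futures[fut]
            try:
                results[server] = ("ok", fut.result())
            except Exception as exc:  # record the failure, keep rolling out
                results[server] = ("failed", str(exc))
    return results
```

Because every server appears in the result map, a post-rollout report can list exactly which machines succeeded, which failed, and why.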


Configuration Management and Change Tracking


A basic fact of working in an IT environment is that server configurations will change over time.
In most cases, changes are based on authorized modifications due to business and technical
initiatives. A server configuration management tool can collect network configuration
information and OS details and conduct application inventories. All these details are obtained
automatically, either through the use of agent software or standard OS methods. This reduces the
chance for human error and allows for frequent validation of changes.
Additionally, all of the configuration-related data can be stored centrally in a single configuration
management database (CMDB). The data can then be correlated with other information about the
environment to ensure that configurations are consistent.
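At its simplest, a CMDB is a timestamped store of configuration snapshots that can always answer "what did this server look like last?" The following sketch uses SQLite and JSON purely for illustration; commercial CMDBs are far richer, and the schema here is invented:

```python
import json
import sqlite3
import time

def open_cmdb(path=":memory:"):
    """Open (or create) a toy CMDB with one snapshot table."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS config_snapshot (
                      server   TEXT NOT NULL,
                      taken_at REAL NOT NULL,
                      config   TEXT NOT NULL)""")
    return db

def record_snapshot(db, server, config: dict):
    """Store one server's collected configuration as a timestamped row."""
    db.execute("INSERT INTO config_snapshot VALUES (?, ?, ?)",
               (server, time.time(), json.dumps(config, sort_keys=True)))
    db.commit()

def latest_config(db, server) -> dict:
    """Return the most recent snapshot for a server, or {} if none."""
    row = db.execute("""SELECT config FROM config_snapshot
                        WHERE server = ?
                        ORDER BY taken_at DESC, rowid DESC LIMIT 1""",
                     (server,)).fetchone()
    return json.loads(row[0]) if row else {}
```

Keeping every snapshot (rather than overwriting) is what makes historical comparison and change tracking possible.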

Monitoring and Auditing Server Configurations


The process of auditing server configurations ensures that all servers are compliant with server
configuration policies. By using these solutions, IT managers can confidently state that all their
assets are being properly managed. When configuration details are properly tracked, systems
administrators can easily identify which servers might need to be updated. The process of
monitoring ensures that only authorized changes have been made and helps avoid unexpected
problems. In addition, it applies to the entire environment—not just one or a few servers that an
administrator might work with at a particular point in time.
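The core of a configuration audit is a comparison between the approved baseline and what is actually found on each server. A minimal sketch, treating configurations as flat key/value settings:

```python
def audit_config(baseline: dict, actual: dict) -> dict:
    """Compare a server's actual settings against the approved baseline
    and report drift as {setting: (expected, found)}."""
    drift = {}
    for key in baseline.keys() | actual.keys():
        expected = baseline.get(key, "<unset>")
        found = actual.get(key, "<unset>")
        if expected != found:
            drift[key] = (expected, found)
    return drift
```

An empty result means the server is compliant; anything else is an unauthorized or undocumented change that warrants investigation.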

Enforcing Policies and Processes


The importance of strong and consistent policies and processes cannot be overstated. IT
departments should develop and enforce methods for making changes to servers. Although many
IT managers might have developed approval processes on paper, in reality, many ad-hoc
changes often occur. An automated server configuration management solution can greatly help
enforce processes by restricting changes to only authorized users and validating that the proper
approvals have been obtained.
From a technical standpoint, security permissions can be greatly restricted. For example, only the
automation solution might have permissions to perform actual changes, and systems
administrators must make all their modifications through the tool (see Figure 17). This serves the
dual purpose of increasing accountability and ensuring that only authorized users are accessing
server assets.

Figure 17: Making configuration changes using data center automation tools.


Reporting
One of the most visible benefits of automating the server configuration management process is
the ability to generate on-demand reports. The information provided can range from software
installation details to security configurations to server uptime and availability reports. All
configuration and change data is stored in a central CMDB, so systems administrators and IT
managers can quickly obtain the information they need to make better decisions.
Reporting might also be required in order to demonstrate compliance with various regulatory
requirements. A process that was formerly time-consuming and inaccurate can be reduced to a
few simple steps. Better yet, individuals from areas outside of the IT department can view details
that are relevant to performing their jobs.
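Given per-server audit results, an on-demand compliance summary can be as simple as a count. This is a toy sketch — real reporting would query the CMDB directly and break results down by region, platform, and policy:

```python
from collections import Counter

def compliance_summary(audit_results: dict) -> Counter:
    """Summarize per-server audit results ({server: number of drifted
    settings}) into counts an IT manager can report on demand."""
    return Counter("compliant" if drift == 0 else "non-compliant"
                   for drift in audit_results.values())
```

Because the underlying data is already centralized, a summary like this is always current rather than a point-in-time manual tally.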

Evaluating Automated Solutions


In addition to looking for the already mentioned features, there are several factors IT decision
makers should keep in mind when evaluating automated server configuration management
solutions. They should be sure that most of the platforms they support are manageable using the
product. Considerations include hardware platforms, OS versions, and various system updates.
Ideally, the technology will be regularly updated to keep pace with new systems. Additionally,
the tool should enforce policies and processes to ensure that all changes are authorized and
coordinated. Finally, all details should be tracked centrally, and the ability to perform audits and
regular reporting can greatly help IT better manage its server investments.
Overall, through the implementation of an automated server configuration management solution,
IT departments can perform the vital task of keeping servers updated while avoiding much of the
manual work involved. The benefits are that servers are configured consistently and accurately
and IT staff is free to perform other important tasks.

IT Processes
Processes define a consistent set of steps that should be followed in order to complete a
particular task. From an IT standpoint, processes can range from details about Service Desk
escalations to communicating with end users. The goal of IT processes is to improve overall
operations within the IT department and the organization as a whole.
Implementing processes often requires additional effort and may add steps
to some jobs. These steps can be time-consuming and may result in resistance or non-compliance.
That raises the challenge: Processes must be worth far more than the “trouble” they cause in
order to be considered worthwhile. This section will look at details related to what makes a good
process, how you can enforce processes, and the benefits of automating process management.


The Benefits of Processes


Let’s first talk about the upside of designing and implementing processes. The major goals and
benefits include:
• Consistency—Tasks should be performed in the same way, regardless of who is
performing them. In fact, in many cases, it can be argued that having something done
consistently in a sub-optimal way is far better than having tasks sometimes completed
well and sometimes completed poorly. Ad-hoc changes are difficult to manage and can
lead to complex problems.
• Repeatability—It’s often easy for IT staff to make the same mistakes over and over or to
“reinvent the wheel.” The goal of defining processes is to ensure that the same task can
be completed multiple times in the same way. Simply allowing everyone to complete
goals in their own way might be fine for tasks that involve creativity, but it often
doesn't work well for operations that require a lot of coordination and many steps.
• Effectiveness—The process should indicate the best way to do things with respect to the
entire organization and all that are involved. The steps involved in the process should
enforce best practices.
These benefits might make the decision to implement processes an easy one, but the real
challenge is not related to “why” but rather “how” to implement processes.

Challenges Related to Process


For some IT staff members, the very thought of processes might conjure up images of the
Pointy-Haired Boss from the Dilbert comic strips. And there are some very good reasons for this:
Mainly, many processes are poorly implemented and can actually make work more difficult with
little or no benefit to anyone. Some of the problems with poorly implemented processes are
based on a lack of knowledge of the details of a particular task.
When out-of-touch management tries to single-handedly implement steps in an operation that it
does not understand, the result can be disastrous. Consequently, many IT staffers tend to resist
processes. They tend to circumvent them and do the bare minimum in order to meet
management's requirements. Worse, they don't see any benefits at all. OK, so that's the
bad news. Let’s look into what makes a good process (and one that people will like and follow).

Characteristics of Effective Processes


There are several aspects of processes that should be taken into account when implementing new
methods of doing things. First, the purpose of the process should be clearly defined before
diving into the details. Usually, the purpose is to define how a particular set of actions
should be performed. Change management processes are a typical example. Organizations might
implement formal change request documents and a Change Advisory Board (CAB) to keep track
of modifications.
Effective processes should be well aligned with the business and technical goals they’re trying to
accomplish. “Process for the sake of process” is often counter-productive. Some questions to ask
might include "Is this the best and most efficient way to accomplish a particular goal?" and "Is
the extra effort required by the process really worth it?” In some cases, if reporting and
documentation of actions aren’t useful, perhaps they can be removed to make the process
simpler.


The reasoning behind processes should be well-understood. IT staff will be much more likely to
adhere to processes that they understand and agree with. Managers should avoid implementing
unnecessarily rigid rules: Processes should not attempt to describe every possible action an
employee must take. Instead, implementers should be given some leeway in determining the best
method by which to complete smaller portions of the tasks. Presenting processes as flexible and
evolving guidelines can go a long way toward ensuring compliance.

Designing and Implementing Processes


When you choose to design and implement a new process, it’s important to solicit input from all
the individuals and business units that might be involved. Ideally, the process will have
collective ownership. Although you might be able to coerce employees to follow specific
sequences of steps, you might reduce overall productivity by hurting morale and overlooking
better ways to do things. The best processes will solicit and incorporate input from all of those
involved. Although it might be painful, sometimes one of the best things that IT managers can do
is get out of the way.
Another important consideration to keep in mind is that processes should never be considered
“final.” Instead, they should evolve when business and technical needs change. If you hear
systems administrators explaining that processes reflect the way things “used to be done,” it’s
probably time to update the process. In order to ensure that the proper steps are being followed,
however, IT staff should be encouraged to propose changes to the process. In fact (and at the risk
of sounding like a management fad), a process to control process changes might be in order.
Often, processes can require many steps, and it can be very difficult for everyone involved to
understand them. One useful method for communicating processes is the flowchart.
Figure 18 provides an example of a server deployment process. Note that decisions
and responsibilities are clearly identified, and roles for each step have been defined.

Figure 18: An example of a server deployment workflow process.
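A workflow like this can even be captured as a small data structure so that tooling can track progress through it. The steps and roles below are hypothetical, mirroring the way a flowchart makes owners explicit:

```python
# Hypothetical deployment workflow: each step names the role responsible,
# just as a flowchart assigns an owner to every box.
WORKFLOW = [
    ("Review server configuration", "Systems Engineer"),
    ("Complete security checklist", "Security Admin"),
    ("Obtain CAB approval", "Change Advisory Board"),
    ("Deploy and verify", "Operations"),
]

def next_step(completed: int):
    """Return the (task, role) pair that is next, or None when done."""
    return WORKFLOW[completed] if completed < len(WORKFLOW) else None
```

Encoding the sequence this way also makes it easy to report how far each deployment has progressed and who owns the current step.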

Overall, the key goals are that those who follow processes should clearly understand the benefits.
Without buy-in, the process will be seen as a chore that is forced by management.


Managing Exceptions
An unfortunate fact related to working in the real world is that most rules will have at least
occasional exceptions. For example, in an emergency downtime situation, you might not have
enough time to walk through all the steps in a change and configuration management process.
Clearly, in that case, resolving the problem as quickly as possible is the most important factor.
However, the goal should be for exceptions to be relatively rare. If exceptions do occur
frequently, it’s probably worth considering adding them to the current process or developing a
new process.

Delegation and Accountability


One crucial aspect related to developing and managing processes is the people involved.
Although it might be easy to define a process and just expect everyone to follow it, there will be
many cases in which this simply will not happen. Rapidly approaching deadlines, juggling
multiple responsibilities, and handling related concerns can all cause diligence around
processes to slip.
One way to ensure that processes are consistently enforced is to ensure that specific individuals
are tasked with reviewing steps and ensuring that they’re followed. Management can add
accountability and metrics to the individuals based on how closely processes are followed and
how many exceptions are made.

Examples of IT Processes
By now, it’s likely that you’re either considering updating existing procedures or putting new
processes in place. That raises the question of which operations can benefit most from well-
defined processes. In general, it’s best to focus on tasks that require multiple steps and multiple
operations in order to be completed. The tasks should happen frequently enough so that the
process will be used regularly. Other good candidates include tasks whose business goals are
often missed due to miscommunication or inconsistent handling.
Some specific examples of IT processes that organizations might either have in place or might be
considering are shown in Table 6.
Business Process: Change and Configuration Management
Possible Steps: Formal documentation of change requests and approval by a Change Advisory Board (CAB)
Notes: Standard forms for communicating changes can be helpful

Business Process: IT Purchasing
Possible Steps: Requests for multiple quotes (if possible), cost justification, ROI/TCO analysis, and approvals from senior management
Notes: Different processes or approval levels might apply based on the cost and business area related to the purchase

Business Process: Server Deployments
Possible Steps: Server configuration review, security configuration checklist, and management acceptance of the new configuration
Notes: The server should be based on one of the predefined supported configurations

Business Process: Service Desk
Possible Steps: Documentation of new requests, prioritization based on relevant Service Level Agreements (SLAs), and escalation of process details
Notes: At any given point in time, the issue must be “owned” by a specific individual

Table 6: Examples of IT processes.


Automating Process Management


One important way in which IT managers can better implement, enforce, and manage processes
is through the use of data center automation tools and utilities. Ideally, these tools will provide
the ability to quickly and easily define processes and workflow. The steps might involve
branching logic, approvals, and metrics that must be met along the way. By storing this
information consistently and in an accessible way, all people involved should be able to quickly
and easily view details about the steps required to complete a particular task.
If the solution can lead users through the steps required to complete a process
correctly, compliance will increase significantly. Additionally, automated process management
tools should provide the ability to audit and report on whether particular processes were closely
followed.
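As a rough illustration of the kind of workflow definition such a tool might support, the following Python sketch models a process with defined steps, a required approval, and an audit trail. The class names, step names, and roles are invented for this example and are not taken from any particular product:

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    name: str
    owner: str                      # role responsible for this step
    requires_approval: bool = False

@dataclass
class ProcessRun:
    """One execution of a defined process, recorded so it can be audited."""
    steps: list
    log: list = field(default_factory=list)

    def complete(self, step_name, performed_by, approved_by=None):
        step = next(s for s in self.steps if s.name == step_name)
        if step.requires_approval and approved_by is None:
            raise ValueError(f"'{step.name}' requires an approval")
        self.log.append((step.name, performed_by, approved_by))

    def audit(self):
        """Report which defined steps were actually completed."""
        done = {name for name, _, _ in self.log}
        return [(s.name, s.name in done) for s in self.steps]

# Hypothetical server deployment process
deploy = ProcessRun(steps=[
    Step("Review configuration", owner="SysAdmin"),
    Step("Apply change", owner="SysAdmin", requires_approval=True),
    Step("Verify result", owner="QA"),
])
deploy.complete("Review configuration", performed_by="alice")
deploy.complete("Apply change", performed_by="alice", approved_by="bob")
print(deploy.audit())   # "Verify result" is reported as not yet completed
```

In a real product, the process definitions, approval records, and audit reports would live in the tool's central database rather than in memory, which is what makes them reportable across the whole IT staff.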
Overall, when implemented and managed properly, IT processes are a significant characteristic
of a well-managed IT environment. Processes can help ensure that tasks are performed
consistently, efficiently, and in accordance with business requirements.


Application Infrastructure Management


In the old days of information technology (IT), applications frequently fit on floppy disks or
resided on a single mainframe computer. As long as the hardware platform met the minimum
system requirements, data center administrators could be fairly sure that the application would
run properly. And, ensuring uptime and reliability involved ensuring that the few computers that
ran the software were running properly. Times have definitely changed. Modern applications are
significantly more complicated, and can often rely on many different components of an overall
IT architecture.

Understanding Application Infrastructure


When considering hardware, software, network, and operating system (OS) requirements, the
entire infrastructure that is required to support an application can include dozens of computers
and devices. The actual number of independent parts adds complexity, which in turn can make it
much more difficult to manage overall systems.
For example, if a user complains of slow reporting performance related to a Web application, it’s
not always easy to pinpoint the problem. Perhaps the database server is bogged down fulfilling
other requests. Or, perhaps the problem is on the WAN link that connects the user to the Web
server. Or, a combination of factors might be leading to the problem. Figure 19 shows a
simplified path of interactions for creating a report. Each component in the figure is a potential
bottleneck.

Figure 19: Potential performance bottlenecks in a modern distributed environment.

This relatively simple situation highlights the importance of understanding the entire
infrastructure that is required to manage an application. For IT departments that support multiple
sites throughout the world and dozens of different line-of-business applications, the overall
problems can be far more complex.


Challenges of Application Infrastructure Management


Most IT organizations attempt to manage important applications without fully understanding
them. The theory is that as long as areas such as the network infrastructure are properly
configured, all the applications that depend on that infrastructure should also work optimally.
Although that can certainly be the case in some situations, some types of issues are far more
complicated to manage.
For example, in the area of change and configuration management, making a single change
might have unexpected consequences. Seemingly unrelated modifications to a firewall
configuration, for example, might cause connectivity issues in another application. The main
challenge for IT is to be able to identify all the inter-related components of an application and to
have the ability to compare and verify suggested changes before they’re made.

Inventorying Application Requirements


To get a handle on the complete requirements for complex applications, IT departments should
start by taking an inventory of important applications. For example, a Web-based CRM tool that
is hosted by an external provider might have relatively simple requirements: As long as users’
workstations can access the Internet, they will be able to use the application (although even a
Web application might impose other requirements, such as specific browser features). The
infrastructure requirements might include network connectivity to the desktop and the firewall
and access through edge routers.
Data center applications that require multiple servers can be significantly more complex. Often,
multi-tier applications consist of components that include routers, switches, firewalls, and
multiple servers. From a logical organization standpoint, the requirements for the application
should include all these devices.

Identifying Interdependencies
Infrastructure components that are shared by multiple applications can be identified after taking
an inventory of the application requirements. Often, the results can help provide greater insight
into operations. Figure 20 provides an example of a shared component that might be used by
multiple applications.


Figure 20: Shared application infrastructure requirements for a modern, distributed application.

As an example, a single low-end switch might be identified as a single point of failure for
multiple important applications. In this case, an investment in upgrading the hardware or
implementing redundancy might help protect overall resources. Also, whenever changes are
being planned, test and verification processes should include examining all of the applications
that use the same components.
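The inventory-and-intersection idea above can be sketched in a few lines of Python. The application names, device names, and inventory structure are hypothetical:

```python
from collections import Counter

# Hypothetical inventory: application -> infrastructure components it requires
app_components = {
    "CRM":       {"edge-router-1", "switch-7", "web-01", "db-01"},
    "Reporting": {"switch-7", "web-02", "db-01"},
    "Intranet":  {"switch-7", "web-03"},
}

# Count how many applications rely on each component; anything used by
# two or more applications is a candidate single point of failure.
usage = Counter(c for comps in app_components.values() for c in comps)
shared = {c: n for c, n in usage.items() if n > 1}
print(shared)   # {'switch-7': 3, 'db-01': 2} (key order may vary)
```

Components that appear in the `shared` result are exactly the ones that deserve redundancy review, and whose changes should trigger testing across every application that uses them.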

Automating Application Infrastructure Management


It’s probably evident that even in relatively simple IT environments, identifying and managing
application infrastructure components can be a complicated task. Fortunately, through the use of
data center automation solutions, much of the work can be managed automatically.
By storing application infrastructure information centrally in a Configuration Management
Database (CMDB), IT staff can quickly find details about all the devices that are required to
support an application or service. High-end solutions provide the ability to visualize
the interdependencies between hardware, software, and network resources in ways that are easy
to understand. Change and configuration management features can also help keep track of the
effects of modifications and can help avoid potential problems before they occur.

Using Application Instrumentation


Many third-party applications provide built-in methods for monitoring performance and
application activity. Collectively known as “instrumentation,” these features may take the form
of a custom Application Programming Interface (API), OS performance monitor counters, or log
files. IT departments should look for data center automation solutions that can collect and report
on this data.
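As one hedged example of consuming instrumentation, suppose an application writes response times to a plain-text log; the log format and field name below are invented for illustration. A collection script might summarize the data for reporting:

```python
import re
import statistics

# Assumed (invented) log format:
#   "2006-01-02 12:00:00 request=/report latency_ms=450"
LATENCY = re.compile(r"latency_ms=(\d+)")

def summarize(log_lines):
    """Pull latency samples out of instrumentation output and compute
    the basic statistics an automation tool could report or alert on."""
    samples = [int(m.group(1)) for line in log_lines
               if (m := LATENCY.search(line))]
    return {"count": len(samples),
            "mean_ms": statistics.mean(samples),
            "max_ms": max(samples)}

log = [
    "2006-01-02 12:00:00 request=/report latency_ms=450",
    "2006-01-02 12:00:05 request=/report latency_ms=1250",
    "2006-01-02 12:00:09 request=/login  latency_ms=80",
]
print(summarize(log))
```

A data center automation solution would do the same kind of collection continuously, across APIs and performance counters as well as log files, and feed the results into its reporting and alerting features.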


Managing Applications Instead of Devices


Although it’s easy to get bogged down in the heads-down technical details of maintaining an IT
environment, the overall success of operations is not based on routers, servers, and workstations.
Instead, the real goal is to manage the applications and services upon which users depend. Well-
designed data center automation tools can help IT staff visualize complex interdependencies
even in widely distributed environments. By focusing on the management of entire applications,
IT departments can significantly improve performance, reliability, and availability.

Business Continuity for Servers


It’s no secret that the success of enterprise environments is at least partially based on reliable and
available computing resources. In modern business environments, even minor disruptions in
service can result in large financial losses. Outages can be caused by power failures,
hardware failures, or even the unavailability of an entire data center. As most organizations have
become increasingly reliant on IT, technical managers have been tasked with ensuring that
services can continue, even in the case of major disasters.

The Value of Business Continuity


Generally, most IT and business leaders have a good idea about the value of business continuity
planning. Simply put, the goal is to avoid downtime and to minimize potential data loss.
Although it might be tempting to imagine a large meteor heading towards your data center
(perhaps targeting your mission-critical systems), there are many other reasons to protect against
disaster. Security breaches or malicious intruders could cause tremendous damage to
systems. In addition, most organizations must rely on infrastructure that is
out of their immediate control, such as electric grids and Internet infrastructure. Finally, good
old-fashioned user or administrator error can lead to downtime. When all of this is taken into
account, the reasons for implementing business continuity are compelling.
Unfortunately, maintaining complete redundancy can be an expensive proposition. Therefore, the
organization as a whole should work together to determine the high-level reasons for undertaking
a business continuity initiative. In some cases, the main drivers will be related to contractual
obligations or complying with regulatory requirements. In other cases, the financial impact of
downtime or data loss might create the business case. The important point is for the entire
business to realize the value of disaster planning.
Inevitably, organizations will need to determine what needs to be protected and how much is
appropriate to spend. The main point is that a successful business continuity approach will
include far more than just the IT department—the organization’s entire management team must
be involved in order for it to be successful.


Identifying Mission-Critical Applications and Servers


Given infinite resources, implementing business continuity would be simple: multiple redundant
environments could be created, and the infrastructure to support real-time synchronization of
data would be readily available. In the real world, financial and technical constraints make the
process much more difficult. Therefore, before looking at the technical aspects of implementing
disaster recovery measures, IT management should meet with business leaders to identify the
critical portions of the infrastructure that must be protected.
Assuming that not all resources can be completely protected, it’s important to determine the
value of each important asset. The first step in prioritization is to take an inventory of the most
important high-level functions of the IT department. For example, an online financial services
firm might rely heavily upon stock trading software. Next, the technical details of supporting the
application should be identified. Modern applications will have many different requirements,
including network connections and devices, authentication and security services, and many
physical computer systems. In order to provide continuity for the entire end-user service, it’s
important that none of these components is overlooked. Ideally, IT management will be able to
provide an estimate of the cost required to protect each system. In most environments, this
process can be challenging, but it’s absolutely critical to ensuring a reliable business continuity
plan.

Developing a Business Continuity Plan for Servers


When developing a plan for managing servers during disaster situations, it’s important to keep in
mind the overall goal—to allow business to continue. Often, systems and network administrators
will focus on the lower-level technical details of high availability. For example, redundant power
supplies and RAID disk configurations can help reduce the likelihood of downtime and data loss.
However, the overall approach to high availability should include details related to all areas of
operations. For example, even if data and hardware are protected, how will an actual failover
occur? Will users be required to implement any changes? What is the process for the IT team?
Immediately after a failure occurs is probably the worst time to “rehearse” this process.
Business continuity planning generally involves several major steps (see Figure 21). The process
begins with identifying which systems must be protected. Then specific business and technical
requirements should be defined. Finally, based on this information, the organization will be
ready to look at implementing the business continuity plan.

Figure 21: Steps to include in a server continuity plan.


Defining Business and Technical Requirements


A general best practice related to performing backups is to base the actual processes that are
performed on recovery requirements. When developing business continuity implementations,
there are several important factors to take into account:
• Acceptable data loss—Although most business managers would rather not think about it,
the potential for data loss during a disaster is difficult to avoid. Businesses should come
up with a realistic idea of how much data loss is acceptable. An important consideration
will be approximate costs. Is it worth a $1.2M investment to ensure that no more than 2
minutes of transactions will ever be lost? Or is it acceptable to lose up to an hour’s worth
of transactions to lower the implementation cost? Other considerations include the impact
to actual production systems. For example, two-phase commit (2PC) replication for
database servers can add single points of failure and can decrease overall production
performance.
• Automated failover—A disaster or system failure can occur at any time. One requirement
to ensure the highest levels of availability is that of automatic failover. Like other factors,
however, this comes at a significant cost. For a seamless failover to occur, many aspects
of the infrastructure must be prepared. Starting from the server side, machines must be
able to coordinate the removal of one server from active service and the promotion of
another one to take its place. This process usually requires a third “witness” server.
Additionally, the network infrastructure and configuration must be adapted. Finally,
changes might be required on the client-side. Although Web applications can often
failover without users noticing, full client-side applications might require users to change
connection settings or to log out and log back into the system. Clearly, there is a lot of
work to be done to ensure automatic failover, but in some business cases, this work is
unavoidable.
• Time for failover—When primary production servers become unavailable, it will
generally take some period of time for the backup site to take its place. There are many
challenges related to minimizing this time. For example, how long should systems wait
before determining that a failover should take place? And, how is a failure defined?
Businesses should decide on acceptable failover times, taking into account the cost and
feasibility of supporting those levels of availability. Furthermore, the entire process
should be tested to ensure that there are no surprises. Even multi-million-dollar
disaster recovery plans can fail due to seemingly minor configuration
discrepancies.
Now that we have a good idea of some of the business and technical considerations, let’s look at
how you can use this information to create a plan.
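The acceptable-data-loss question above is ultimately simple arithmetic: compare the annualized cost of lost transactions under each option against the cost of the protective investment. All figures below are invented placeholders; real inputs would come from a business impact analysis:

```python
def expected_annual_loss(outages_per_year, minutes_lost, cost_per_minute):
    """Rough expected cost of lost transactions per year."""
    return outages_per_year * minutes_lost * cost_per_minute

cost_per_minute = 5_000   # assumed revenue impact per minute of lost data
outages = 2               # assumed outages per year

# Option A: expensive solution limits loss to ~2 minutes per outage.
# Option B: cheaper solution tolerates up to 60 minutes of loss.
loss_a = expected_annual_loss(outages, 2, cost_per_minute)    # 20,000
loss_b = expected_annual_loss(outages, 60, cost_per_minute)   # 600,000

# The annual exposure avoided by Option A, to weigh against its
# (amortized) implementation cost:
print(loss_b - loss_a)   # 580000
```

With assumptions like these, a $1.2M solution would take roughly two years of avoided exposure to pay for itself; different assumptions can change the answer entirely, which is why the business, not just IT, must supply the inputs.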


Implementing and Maintaining a Backup Site


The most important aspect of implementing a business continuity plan involves the creation of a
secondary site that can be used in the event of a failure. A backup site will generally contain
enough hardware and infrastructure services to support critical backup operations from a remote
location. Setting up this new site generally involves purchasing new hardware and duplicating
the configuration of current production equipment. Although systems administrators are
generally aware of the steps required to perform these processes, it can be difficult to replicate
configurations exactly.
Once a backup site has been implemented, it’s time to look at details related to maintaining it. In
some cases, business requirements might allow for periodic backups and restores to be
performed. In those cases, some data loss is acceptable. In other situations, however, the backup
site must be kept up to date in real time and must be ready for a lossless failover in a matter of
seconds. For servers, various solutions such as clustering, replication, backup and recovery, and
related methods can be used. Regardless of the technical approach, however, a lot of time and
effort is usually required to implement and monitor synchronization for a disaster recovery site.

Automating Business Continuity


Implementing business continuity is generally no small undertaking. IT staff must have a
complete understanding of the resources that are to be protected, and all technical information
must be kept up to date. It’s simply unacceptable for changes to be made in the production
environment without an accompanying change within the disaster recovery site. Fortunately, data
center automation tools can greatly help reduce the amount of time and effort that is required to
maintain a disaster recovery site.

Using a Configuration Management Database


The purpose of a Configuration Management Database (CMDB) is to centrally store information
related to the entire infrastructure that is supported by an IT department. Specifically, related to
servers, the CMDB can store configuration details about the operating system (OS), security
patches, installed applications, and network configuration.
Using this information, systems administrators can quickly view and compare configuration
details for the disaster recovery site. One of the potential issues with maintaining redundant sites
is ensuring that a site that is effectively “offline” is ready for a failover. Therefore, reports can be
centrally run in order to ensure that there are no undetected problems with the backup site.


Change and Configuration Management


The operations related to keeping a backup site up to date leave a lot of room for error. If done
manually, the process involves a doubling of effort whenever configuration changes are made.
Data center automation tools that provide for server change and configuration management can
automatically commit the same change to multiple servers (see Figure 22). This is ideal for
situations in which a backup site must remain synchronized with the production site, and it
dramatically reduces the chances of human error.

Figure 22: Automating configuration management using data center automation tools.

Overall, the process of developing and implementing a business continuity plan for servers will
be a major undertaking for IT staff and management. However, through the use of data center
automation tools, the process can be significantly simplified, and administration overhead can be
minimized. The end result is increased protection of critical data and services at a reasonable
overall cost.


Network and Server Maintenance


Although it might not be the most glamorous aspect of IT, maintaining network and server
devices is a critical factor in managing overall IT services. Most systems administrators are well-
versed in performing standard maintenance tasks manually, but there are many advantages to
automating routine operations.
Most IT organizations have established at least some basic procedures and processes related to
the maintenance of devices in the data center. For example, patches might be installed on servers,
as needed, and device configurations might be routinely backed up.

Network and Server Maintenance Tasks


Perhaps one quick way of building a list of maintenance tasks is to ask IT administrators what
they least enjoy doing. Although a complete list of routine IT tasks could fill many books, this
section will focus on common maintenance areas. Each section will explore how data center
automation tools can provide significant benefits over manual processes.

Configuration Management
Over time, servers and network equipment will likely need to be updated to meet changing
business needs. For example, when a router is reconfigured, network address information may
need to change on multiple servers. Alternatively, the organization might implement stricter
security policies that must then be applied to hundreds of devices. In very small and simple
network environments, it might be reasonable to perform these changes manually. In most IT
environments, however, manually making changes is tedious and leaves a lot of room for error.
Data center automation solutions can ease the process of making changes on even hundreds of
devices. The process generally involves a member of the IT staff specifying the change that
should be made. Assuming that the staffer has the appropriate permissions, the actual
modifications can be scheduled or applied immediately. Often, the task can be completed in a
matter of minutes, and all that is left for the administrator to do is verify the change log.

Applying System and Security Updates


Computers and network devices often benefit from periodic updates. For example, operating
system (OS) vendors often release updates that fix functional problems or add new
features. And, security updates are critical to ensuring that important systems and data
remain protected. An automated patch management solution can quickly deploy a single update
to even thousands of devices with minimal effort.
Figure 23 illustrates an automated patch deployment process. In this example, a systems
administrator has tested an OS patch and has verified that it is ready to deploy to a set of
production servers. Instead of connecting to the servers individually, the change request is sent to
a data center automation solution. This server identifies which machines require the update and
then automatically manages the patch deployment process. While the updates are being
performed, the administrator can view the progress by using the central configuration console.
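The flow in Figure 23 can be sketched roughly as follows. The host names, patch identifier, and `apply_patch` callback are hypothetical stand-ins for whatever API a real automation tool exposes:

```python
def find_targets(inventory, patch_id):
    """Identify which machines still require the update."""
    return [host for host, patches in inventory.items()
            if patch_id not in patches]

def deploy(inventory, patch_id, apply_patch):
    """Apply the patch only where needed, recording per-host progress so
    an administrator can review it from a central console."""
    progress = {}
    for host in find_targets(inventory, patch_id):
        ok = apply_patch(host, patch_id)   # stand-in for the tool's API
        progress[host] = "applied" if ok else "failed"
        if ok:
            inventory[host].add(patch_id)
    return progress

# Hypothetical inventory of installed patches per server
inventory = {"web-01": {"KB101"}, "web-02": {"KB101", "KB102"}, "db-01": set()}
result = deploy(inventory, "KB102", apply_patch=lambda host, patch: True)
print(result)   # {'web-01': 'applied', 'db-01': 'applied'}
```

Note that `web-02` is skipped because it already has the patch; determining "which machines require the update" from inventory data is exactly what saves the administrator from connecting to each server individually.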


Figure 23: Applying updates using an automated system.

Monitoring Performance
All modern OSs require some standard maintenance operations in order to perform at their best.
Actions such as deleting unnecessary files and performing defragmentation can help keep
systems performing optimally. For certain types of applications, such as databases, other tasks
such as index defragmentation or consistency checks might be required. By implementing
automated monitoring solutions, administrators can often be alerted to potential problems before
users experience them. And, many types of problems can be fixed automatically, requiring no
manual intervention at all.

Implementing Maintenance Processes


In addition to the various categories of tasks we’ve covered thus far, there are several
considerations that IT departments should keep in mind when performing maintenance
operations.


Delegating Responsibility
An important best practice to keep in mind is that of delegating responsibility. Without
coordination between members of the IT team, statements like, “I thought you were going to take
care of that last week,” can lead to significant problems. Data center automation solutions can
allow IT managers to create and configure schedules for their staff members, and can assign
specific duties. This can make it much easier to handle vacation schedules and to ensure that no
area of the environment is left completely uncovered at any time.

Developing Maintenance Schedules


Systems and network administrators are likely all too familiar with spending cold nights and
evenings in the server room in order to work during “downtime windows.” Although the goal is
a good one—to minimize disruption to production services—the work itself is dreaded by IT
staff. Through the use of data center automation solutions, downtime windows can be scheduled
for any time, and changes can be applied automatically. Administrators can review and verify the
changes the next day to ensure that everything worked as planned.

Verifying Maintenance Operations


The very nature of performing maintenance often relegates important tasks to “back burner”
status. When performed manually, it’s far too easy for a network or systems administrator to
forget to update one or a few devices, or to be pulled away by other tasks. In addition to
automatically making changes, data center automation solutions can store expected configuration
information within a Configuration Management Database (CMDB). IT managers and staff can
then compare the actual configuration of devices to their expected configuration to find any
discrepancies. This process can quickly and easily ensure that maintenance operations are not
overlooked, and that all systems are up to specifications.
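In its simplest form, comparing the actual configuration of a device against the expected configuration stored in the CMDB is a dictionary diff. The setting names below are illustrative only:

```python
def find_drift(expected, actual):
    """Return settings whose live value differs from, or is missing
    relative to, the expected configuration stored in the CMDB."""
    return {key: (expected[key], actual.get(key))
            for key in expected
            if actual.get(key) != expected[key]}

# Illustrative settings only
expected = {"ntp_server": "10.0.0.5", "snmp": "enabled", "patch_level": "SP2"}
actual   = {"ntp_server": "10.0.0.5", "snmp": "disabled"}

print(find_drift(expected, actual))
# {'snmp': ('enabled', 'disabled'), 'patch_level': ('SP2', None)}
```

An empty result means the device matches its expected state; anything else is a discrepancy that should either be remediated or formally accepted as a change to the CMDB record.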

The Benefits of Automation


Overall, without data center automation solutions, the process of maintaining server and network
equipment can take a significant amount of time and effort. And, it’s rarely any IT staffer’s
favorite job. Through the use of automation, however, tasks that used to take days, weeks, or
months can be implemented in a matter of minutes. And, the process can be significantly more
reliable, leading to improved uptime, quicker changes, and a better experience for IT
departments and end users.


Asset Management
The goal of asset management is to track the fixed assets that an organization owns and controls.
From a general standpoint, asset management can include everything ranging from racks to
servers and from buildings to storage devices. IT departments are often tasked with managing
budgets and keeping track of inventory, even in highly distributed environments. Without proper
technology and processes in place, it can quickly become difficult to find and manage all of these
assets. The following sections focus on what to track and how to develop a process that will
allow for easily maintaining information about hardware, software, and other important aspects
of a distributed IT environment.

Benefits of Asset Management


Although many organizations perform some level of asset tracking using a variety of methods,
usually these processes leave much to be desired. For example, IT managers might be tasked
with performing a complete audit of software used by the PCs in a particular department, based
on the request of the CFO. The process of collecting, analyzing, and verifying the data can be
extremely painful. Often, systems administrators must manually log into many different devices
to get the information they need. It can take weeks or even months to complete, and still the
accuracy of the information might be difficult to verify.
By implementing best practices related to asset management, IT departments and the
organizations they support can quickly realize many important benefits:
• Lowering costs—Many IT departments are in a position to negotiate deals and discounts
with hardware and software vendors. Often, however, IT departments can leave “money
on the table” by not leveraging their bargaining power. In some cases, too much
equipment may be purchased, leading to unused systems. In other cases, the IT
departments are unaware of exactly how much they’re spending with a vendor, making it
difficult to use this information during pricing negotiations. Asset management practices
can shed light on the overall resource usage of the entire organization and can lead to
better decision making.
• Security—“Out of sight, out of mind,” can apply to many of the assets that are managed
by an IT department. During audits or when troubleshooting problems, IT staff might
find network devices that have not been patched, or servers for which there is no known
purpose. This ambiguity can lead to security problems. Through the use of asset
management tools, IT departments can be sure that the purpose and function of each
device is known, and they can help ensure that no system is overlooked when performing
critical system maintenance.
• Improved service levels—IT departments that are unaware of the location and purpose of
devices that they support are generally unable to provide high levels of service and
responsiveness whenever problems arise. When asset management can be used to provide
the entire IT staff with visibility of the entire environment, monitoring and
troubleshooting systems can become significantly easier and more efficient. The end
result is quicker and more thorough issue resolution.


• Regulatory compliance—The proper management of fixed computing assets is an
important part of many regulatory requirements. It is also an important financial practice.
IT managers must be able to identify and locate various capital purchases during an audit,
and must be able to provide details related to the purpose and history of the device.
• Software licensing—In most IT environments, a significant portion of overall capital
expenditures is related to software licensing. Operating systems (OSs), office
productivity applications, and specialized tools all incur costs related to purchasing,
installation, configuration, and maintenance. It’s not uncommon for an IT department to
support dozens or even hundreds of different applications. Without an asset management
solution, it can be difficult to produce an up-to-date picture of which software is installed
(and what is actually being used). However, with this information, IT departments can
quickly identify how many licenses are actually needed and whether licenses can be
reallocated. The information might indicate that reduced investments in some software
and upgrades of other products might be in order.
• Budgeting—Providing service while staying within budgetary constraints is one of the
most challenging tasks for IT departments. Often, purchasing is handled in a case-by-
case, ad-hoc manner. Whenever new assets are needed, IT managers must justify the
related expenditures to upper management. And, there are often surprises related to
unexpected expenses. By efficiently tracking current assets (and their levels of usage), IT
management can provide significantly more accurate predictions about ongoing and
future capital asset requirements to the rest of the business.
Once you’re sold on the benefits of asset management, it’s time to look at how you can
implement this technology.
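The license reconciliation described in the software-licensing point above can be sketched in a few lines of code. The inventory data, application names, and purchase counts below are purely illustrative; a real asset management tool would draw this information from its own discovery database:

```python
from collections import Counter

# Hypothetical inventory: one (hostname, application) pair per installation,
# as an automated inventory scan might report it.
installs = [
    ("ws-01", "OfficeSuite"), ("ws-02", "OfficeSuite"),
    ("ws-03", "OfficeSuite"), ("ws-01", "CADTool"),
]
purchased = {"OfficeSuite": 5, "CADTool": 1}  # illustrative license counts

# Count how many copies of each application are actually installed.
in_use = Counter(app for _, app in installs)

for app, owned in purchased.items():
    surplus = owned - in_use.get(app, 0)
    print(f"{app}: {in_use.get(app, 0)} installed, {owned} purchased, "
          f"{surplus} license(s) available for reallocation")
```

Even this trivial comparison makes over- and under-licensing visible at a glance, which is the information needed during renewal negotiations.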

Developing Asset Management Requirements


Before implementing an asset management solution, organizations should look at what
information they need to track. Although the basic needs of asset management are well-defined,
additional data can often help improve operations. At a minimum, IT departments should be able
to enumerate all the devices that they manage. They should be able to uniquely identify each
asset and find the current physical location of the device. In the case of a data center, this might
involve the row, rack, and position numbers. Figure 24 shows an example of a simple rack
diagram.


Figure 24: Developing a rack diagram for asset management.

In addition to basic information, IT departments should consider capturing details related to the
initial and ongoing costs for the device, details about its purpose, and any configuration
information that can help in troubleshooting and management.
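As a rough illustration, the minimum information described above could be captured in a record like the following. The field names and the row/rack/position location format are hypothetical, not drawn from any particular asset management product:

```python
from dataclasses import dataclass

# Hypothetical minimal asset record; field names are illustrative only.
@dataclass
class Asset:
    tag: str                  # unique asset-tag identifier
    category: str             # e.g., "server", "network device"
    purpose: str              # what the device is for
    row: int                  # data center row number
    rack: int                 # rack number within the row
    position: int             # U position within the rack
    initial_cost: float       # purchase cost
    annual_cost: float = 0.0  # ongoing support/maintenance cost

    def location(self) -> str:
        """Render the physical location as row/rack/position."""
        return f"R{self.row}-K{self.rack}-U{self.position}"

srv = Asset(tag="A-1001", category="server", purpose="intranet web server",
            row=3, rack=7, position=12, initial_cost=4200.0)
print(srv.location())  # R3-K7-U12
```

A real solution would add many more attributes (warranty, owner, network details), but the essentials, unique identity, location, cost, and purpose, are all present even in this sketch.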

Identifying Asset Types


Loosely defined, IT assets can include many different items. The granularity of what is tracked
could extend to physical buildings, office spaces, and even network cables. This raises the
question of what an IT department should practically track. The main rule is usually based
on asset value; for example, it might be determined that only devices that cost more than $250 should be
tracked. Table 7 provides a list of the types of assets that should generally be included by an
asset tracking solution.


Category: Software
Examples: Operating systems; office productivity applications; line-of-business applications; standard utilities (firewall software, anti-virus, anti-spyware)
Information to collect: The purpose and cost of each supported application; where software applications are installed; actual application usage; unauthorized software installations

Category: Workstations
Examples: End-user desktop computers; training and test lab computers
Information to collect: Computer name and model; hardware and network configuration details; asset cost and related information

Category: Servers
Examples: Intranet servers; application servers; database servers
Information to collect: Purpose of the server; computer name and model; hardware and network configuration details; asset cost and related information; support contract details

Category: Mobile devices
Examples: Laptop computers; PDAs; other “smart” portable devices
Information to collect: Current location of the asset; current “owner” of the device; purpose of the device; information about the capabilities of the device; security information

Category: Networking devices
Examples: Routers; switches; firewalls; content caches; intrusion detection/prevention systems
Information to collect: Device manufacturer and model; purpose of each device; physical location

Table 7: An example list of asset types.

It’s likely that IT departments will need to take into account other types of devices, as well. For
example, if a business uses specialized hardware in a testing lab, that hardware should be
included. Additionally, IT departments should take into account assets that are committed to
remote sites.


Developing Asset Tracking Processes


As with many other IT initiatives, developing solid asset-tracking processes is critical to ensuring
that information remains consistent and relevant. Although software solutions can meet some of
these needs, defined and enforced processes remain essential. To facilitate the
accurate tracking of asset data, organizations should physically place asset tags on devices.
Doing so helps uniquely identify a device and requires no technical expertise to match up an
asset with its information.
All IT staff must be responsible for keeping asset management up to date through the use of the
asset management system. For example, if a router is removed from service, this information
should be captured by the asset management tool. A best practice is to include asset-tracking
details in any change and configuration management process. Figure 25 shows some possible
steps to the process.

Figure 25: Steps in an asset management process.

For organizations that have implemented the IT Infrastructure Library (ITIL) best practices, the
Software Asset Management topic can be particularly useful. For more information, see the ITIL Web
site at http://www.itil.co.uk/.

Automating IT Asset Management


It’s likely obvious at this point that the process of asset management can be greatly simplified
through the use of automation tools. The tasks of collecting, storing, and maintaining up-to-date
data are often well-suited for computer systems. When examining asset management solutions,
IT departments should look for features that fit into their overall automation tools frameworks.
For example, a Web-based user interface (UI) can make accessing asset-related data easy for
non-IT users. In addition, support for regular updates can help maintain the accuracy of
information. Many IT industry hardware and software vendors have included asset tracking
features in their solutions. Asset management products that can utilize this type of information
should be preferred. The following sections look at other useful features that should be
considered when evaluating asset management solutions.


Automated Discovery
One of the largest barriers related to implementing asset management is the difficulty associated
with collecting data about all the devices that must be supported in an IT environment. In some
cases, this task might be done manually by physically or remotely connecting to each device and
recording details. Of course, apart from the tedium of the process, it’s easy for certain devices to
be overlooked altogether.
Many asset management solutions can leverage an automated discovery feature to
programmatically scan the network and find devices and nodes that are currently active. The
process can often be performed very quickly and can include details about devices located
throughout a distributed environment. Furthermore, routine audits can be performed to ensure
that devices are still available and to track any changes that might have occurred.
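To illustrate the general idea (not any particular product's implementation), the following sketch probes a list of candidate addresses for a few common service ports. The subnet and port list are assumptions for the example; commercial discovery tools use many more techniques, such as SNMP queries and agent-based inventory:

```python
import socket

def probe(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP service is accepting connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def discover(hosts, ports=(22, 80, 443)):
    """Scan candidate hosts; return {host: [open ports]} for responsive hosts."""
    found = {}
    for host in hosts:
        open_ports = [p for p in ports if probe(host, p)]
        if open_ports:
            found[host] = open_ports
    return found

# A hypothetical management subnet to sweep:
candidates = [f"10.0.0.{n}" for n in range(1, 255)]
# inventory = discover(candidates)  # left commented: slow on a real network
```

The result of such a sweep can then be reconciled against the asset database to flag devices that are on the network but not recorded, or recorded but no longer responding.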

Using a Configuration Management Database


Asset-related data is most useful when it can be combined with other details from throughout an
IT environment. For this reason, using a Configuration Management Database (CMDB) is
beneficial. The CMDB can centrally store details related to assets and their configuration. The
CMDB should also store change-related data in order to ensure that data is always up to date.
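As a toy illustration of the idea, the following sketch stores configuration items keyed by asset tag and appends a change history on every update, so records stay auditable. The structure is hypothetical and far simpler than a real CMDB:

```python
import datetime

# Toy CMDB: configuration items keyed by asset tag, with a change history
# appended on every update so that each record remains auditable.
cmdb = {}

def update_ci(tag, **attrs):
    """Apply an attribute change to a configuration item, recording it."""
    ci = cmdb.setdefault(tag, {"attrs": {}, "history": []})
    stamp = datetime.datetime.now(datetime.timezone.utc).isoformat()
    ci["history"].append((stamp, dict(attrs)))
    ci["attrs"].update(attrs)

update_ci("A-1001", purpose="intranet web server", rack=7)
update_ci("A-1001", rack=9)  # the server was moved to a new rack
print(cmdb["A-1001"]["attrs"]["rack"])   # 9
print(len(cmdb["A-1001"]["history"]))    # 2
```

The important property, kept even in this sketch, is that current state and change history live together, so "what is it now?" and "how did it get that way?" can both be answered from one place.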

Integration with Other Data Center Automation Tools


Ideally, an asset management solution will integrate with other automation tools used by an IT
department. For example, service desk application users should be able to quickly and easily
access details about workstation, server, and network devices. This ability can help them more
quickly isolate and troubleshoot problems. In addition, systems administrators should be able to
update configuration details about a server and have the information automatically update asset-
related details such as physical location, network details, and purpose of the computer. Many
integrated data center automation solutions provide the ability to make assets easier to track and
maintain with minimal effort from systems administrators and IT managers.

Reporting
The key goal of asset management is to facilitate reporting. IT managers should be able to
generate on-demand information about hardware, software, and network devices, as needed.
Many asset management solutions will provide the ability to create real-time reports. Products
often allow for Web-based report design and customization. By making asset-related information
available to managers throughout the organization, IT departments can better ensure that they are
meeting overall business needs. Overall, by developing an asset management approach and
selecting an appropriate data center automation tool, IT organizations can realize the benefits of
tracking the devices they support with minimal time and effort.


Flexible/Agile Management
In just about any IT environment, changes are frequent and inevitable. Successful businesses
must often make significant modifications to business and technical processes to keep pace with
customer demands and increasing competition. In business and IT terms, agility refers to the
ability to quickly and efficiently adapt to changes. The faster an IT organization can react to
changes, the better aligned it will be with business units—and that will translate to overall
success for the entire enterprise.

Challenges Related to IT Management


In some cases, the problems related to agile management might seem paradoxical. On one hand,
IT managers work hard to define and enforce processes to ensure that changes are performed
consistently and in a reproducible manner. This often requires additional steps to record and
track changes, and processes to support them. On the other hand, IT departments must remain
flexible enough to support changes that might come at a moment’s notice. This raises the
question: How can an IT department plan for the future when anything could change at a
moment’s notice?
It’s important not to confuse agility with a lack of processes. As is the case with all areas of the
business, chaos resulting from ad-hoc changes is rarely productive and can lead to far more
complicated problems in the future. The main point for IT managers to remember is that they
must preserve standard operating best practices, even when making large changes in a small
period of time.

The Agile Management Paradigm


The term agile management is often heard in reference to managing software development
projects. The central theme is to ensure that designers and programmers are ready to
accommodate change at a moment’s notice. For many environments, the standard year-long
cycles of designing, prototyping, implementing, and testing are no longer adequate. Business
leaders want to see changes occur with little delay and are unwilling to accept the time and cost
related to entire application rewrites. Therefore, the teams must work in much smaller cycles,
and each portion of the development process should result in usable code.
Many of the same goals and concepts also translate into the area of managing data center
environments. Rather than setting up servers and network infrastructure and considering the job
“done,” systems and network administrators must be ready to make major changes when they’re
required.


Key Features of an Agile IT Department


Although there are many aspects of IT management that can affect the overall quality of
operations, there are common areas that should be kept in mind. The key features of an agile IT
department include the following:
• Coordination with business objectives—Agile IT departments recognize that their main
function is to support business initiatives. IT managers and systems administrators must
have a high level of awareness of the systems they support, and the reasons that they
exist. This awareness can help IT immediately identify which areas might change due to
shifts in business strategy rather than waiting until it becomes completely obvious that
the systems no longer fit requirements. To keep on top of changes that might be coming,
IT representatives should be included in business strategy meetings.
• Consistent and repeatable processes—A well-managed IT environment will adhere to
best practices and processes such as those presented by the Information Technology
Infrastructure Library (ITIL). Although it might seem that processes could get in the way
of quick reactions, well-designed processes can usually be adapted to meet new
requirements. Specifically, change and configuration management practices can help IT
departments quickly react to new needs.
• Communications—Too often, IT departments tend to work in a way that is isolated from
other areas of the organization. In some cases, IT doesn’t find out about the needs of its
users until just before changes are required. This situation should be avoided by
proactively talking with users and business managers to help prioritize any changes that
might be required. In many cases, simple solutions can be developed that minimize
disruptive impact while meeting business goals.
• Efficient administration—Most IT departments lack the resources to spend days, weeks,
or months manually making system configuration changes. Rather, the changes must be
made as quickly as possible, but still in a reliable way. Tasks such as the deployment of
new software, upgrades to existing equipment, and the deployment of new computing
hardware can take significant amounts of time and effort when performed manually.
Through the use of dedicated tools for managing the IT infrastructure, even organization-
wide changes can be implemented quickly and reliably.
Many other features can also help make IT departments more agile. However, the general rule is
that greater agility comes from efficient and coordinated IT departments.


Automating IT Management
Obviously, all these requirements related to automating IT management can necessitate a
significant amount of expertise, time, and effort. As with many other areas of improving IT
efficiency, data center automation tools can significantly help IT departments increase their
flexibility. Especially when budgets and personnel resources are limited, investments in
automation can decrease the overhead related to changes.
Specific areas from which organizations can benefit include change and configuration
management, server and network provisioning and deployment, automatic updates, asset
management, and reporting. For example, there are significant benefits to storing all IT-related
information in a centralized Configuration Management Database (CMDB). The combined data
can help IT and business leaders quickly identify which systems might need to be updated to
accommodate business changes.
Overall, the process of making an IT department more flexible and agile can provide tremendous
advantages throughout an entire organization. By quickly adapting to changing needs, the role of
IT can transform from a rate-of-change limitation to a strategic advantage. And, through the use
of data center automation technology and best practices, IT organizations can quickly work
towards the features that can help make them agile.


Policy Enforcement
Well-managed IT departments are characterized by having defined, repeatable processes that are
communicated throughout the organization. However, sometimes that alone isn’t enough—it’s
important for IT managers and systems administrators to be able to verify that their standards are
being followed throughout the organization.

The Benefits of Policies


It usually takes time and effort to implement policies, so let’s start by looking at the various
benefits of putting them in place. The major advantage to having defined ways of doing things in
an IT environment is that of ensuring that processes are carried out in a consistent way. IT
managers and staffers can develop, document, and communicate best practices related to how to
best manage the environment.

Types of Policies
Policies can take many forms. For example, one common policy is related to password strength
and complexity. These requirements usually apply to all users within the organization and are
often enforced using technical features in operating systems (OSs) and directory services
solutions. Other types of policies might define response times for certain types of issues or
specify requirements such as approvals before important changes are made. Some policies are
mandated by organizations outside of the enterprise’s direct control. The Health Insurance
Portability and Accountability Act (HIPAA), the Sarbanes-Oxley Act, and related governmental
regulations fall into this category.

Defining Policies
Simply defined, policies specify how areas within an organization are expected to perform their
responsibilities. For an IT department, there are many ways in which policies can be used. On
the technical side, IT staff might create a procedure for performing system updates. The
procedure should include details of how downtime will be scheduled and any related technical
procedures that should be followed. For example, the policy might require systems
administrators to verify system backups before performing major or risky changes.
On the business and operations side, the system update policy should include details about who
should be notified of changes, steps in the approvals process, and the roles of various members
of the team, such as the service desk and other stakeholders.


Figure 26: An overview of a sample system update policy.

Involving the Entire Organization


Some policies might apply only to the IT department within an organization. For example, if a
team decides that it needs a patch or update management policy, it can determine the details
without consulting other areas of the business. More often, however, input from throughout the
organization will be important to ensuring the success of the policy initiatives. A good way to
gather information from organization members is to implement an IT Policy committee. This
group should include individuals from throughout the organization. Figure 27 shows some of the
areas of a typical organization that might be involved. In addition, representation from IT
compliance staff members, HR personnel, and the legal department might be appropriate based
on the types of policies. The group should meet regularly to review current policies and change
requests.

Figure 27: The typical areas of an organization that should be involved in creating policies.

IT departments should ensure that policies such as those that apply to passwords, email usage,
Internet usage, and other systems and services are congruent with the needs of the entire
organization. In some cases, what works best for IT just doesn’t fit with the organization’s
business model, so compromise is necessary. The greater the “buy-in” for a policy initiative, the
more likely it is to be followed.


Identifying Policy Candidates


For some IT staffers, the mere mention of implementing new policies will conjure up images of
the pointy-haired boss from the Dilbert comic strips. Cynics will argue that processes can slow
operations and often provide little value. That raises the question of what characterizes a well-
devised and effective policy. Sometimes, having too many policies (and steps within those
policies) can actually prevent people from doing their jobs effectively.
So, the first major question should center around whether a policy is needed and the potential
benefits of establishing one. Good candidates for policies include those areas of operations that
are either well defined or need to be. Sometimes, the needs are obvious. Examples might include
discovering several servers that haven’t been updated to the latest security patch level, or
problems related to reported issues “falling through the cracks.” Also, IT risk assessments
(which can be performed in-house or by outside consultants) can be helpful in identifying areas
in which standardization can streamline operations. In all of these cases, setting up
policies (and verifying that they are being followed) can be helpful.

Communicating Policies
Policies are most effective when all members of the organization understand them. In many
cases, the most effective way to communicate a policy is to post it on an intranet or other shared
information site. Doing so will allow all staff to view the same documentation, and it will help
encourage updates when changes are needed.

Policy Scope
Another consideration related to defining policies is determining how detailed and specific
policies should be. In many cases, if policies are too detailed, they may defeat their purpose—
either IT staffers will ignore them or will feel stifled by overly rigid requirements. In those cases,
productivity will suffer. Put another way, policy for the sake of policy is generally a bad idea.
When writing policies, major steps and interactions should be documented. For example, if a
policy requires a set of approvals to be obtained, details about who must approve the action
should be spelled out. Additional information such as contact details might also be provided.
Ultimately, however, it will be up to the personnel involved to ensure that everything is working
according to the process.

Checking for Policy Compliance


Manually verifying policy compliance can be a difficult and tedious task. Generally, this task
involves comparing the process that was performed to complete certain actions against the
organization’s definitions. Even in situations that require IT staffers to thoroughly document
their actions, the process can be difficult because of the overhead involved in manually auditing
those actions. Realistically, most organizations will choose to perform
auditing on a “spot-check” basis, where a small portion of the overall policies are verified.


Automating Policy Enforcement


For organizations that tend to perform most actions on an ad-hoc basis, defining policies and
validating their enforcement might seem like it adds a significant amount of overhead to the
normal operations. And, even for organizations that have defined policies, it’s difficult to verify
that policies and processes are being followed. Often, it’s not until a problem occurs that IT
managers look back at how changes have been made.
Fortunately, through the use of integrated data center automation tools, IT staff can have the
benefits of policy enforcement while minimizing the amount of extra work that is required. This
is possible because it’s the job of the automated system to ensure that the proper prerequisites are
met before any change is carried out. Figure 28 provides an example.

Figure 28: Making changes through a data center automation tool.

Evaluating Policy Enforcement Solutions


When evaluating automation utilities, there are numerous factors to keep in mind. First, the
better integrated the system is with other IT tools, the more useful it will be. As policies are often
involved in many types of modifications to the environment, combining policy enforcement with
change and configuration management makes a lot of sense.
Whenever changes are to be made, an automated data center suite can verify whether the proper
steps have been carried out. For example, it can ensure that approvals have been obtained, and
that the proper systems are being modified. It can record who made which changes, and when.
Best of all, through the use of a few mouse clicks, a change (such as a security patch) can be
deployed to dozens or hundreds of machines in a matter of minutes. Any time a change is made,
the modification can be compared against the defined policies. If the changes meet the
requirements, they are committed. If not, they are either prevented or a warning is sent to the
appropriate managers.
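The prerequisite checks described above can be illustrated with a small sketch. The required approver roles and the individual checks are assumptions for the example; a real automation suite would define them as part of its change and configuration management workflow:

```python
# Illustrative approver roles; a real policy would define its own.
REQUIRED_APPROVERS = {"change_manager", "service_owner"}

def prerequisites_met(change: dict) -> tuple:
    """Return (ok, problems) for a proposed change request."""
    problems = []
    approvals = set(change.get("approvals", set()))
    if not REQUIRED_APPROVERS.issubset(approvals):
        missing = sorted(REQUIRED_APPROVERS - approvals)
        problems.append(f"missing approvals: {missing}")
    if not change.get("backup_verified", False):
        problems.append("system backup not verified")
    if not change.get("targets"):
        problems.append("no target systems specified")
    return (not problems, problems)

change = {"approvals": {"change_manager"}, "backup_verified": True,
          "targets": ["srv-web-01"]}
ok, problems = prerequisites_met(change)
print(ok, problems)  # ok is False: the service_owner approval is missing
```

The point is that the policy becomes executable: a change that fails any prerequisite can be blocked automatically, or flagged to the appropriate manager, instead of relying on spot-check audits after the fact.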
Additionally, through the use of a centralized Configuration Management Database (CMDB),
users of the system can quickly view details about devices throughout the environment. This
information can be used to determine which systems might not meet the organization’s
established standards, and which changes might be required. Overall, through the use of
automation, IT organizations can realize the benefits of enforcing policies while at the same time
streamlining policy compliance.


Server Monitoring
In many IT departments, the process of performing monitoring is done on an ad-hoc basis. Often,
it’s only after numerous users complain about slow response times or throughput when accessing
a system that IT staff gets involved. The troubleshooting process generally requires multiple
steps. Even in the best case, however, the situation is highly reactive—users have already run
into problems that are affecting their work. Clearly, there is room for improvement in this
process.

Developing a Performance Optimization Approach


It’s important for IT organizations to develop and adhere to an organized approach to
performance monitoring and optimization. All too often, systems and network administrators
will simply “fiddle with a few settings” and hope that it will improve performance. Figure 29
provides an example of a performance optimization process that follows a consistent set of steps.

Note that the process can be repeated, based on the needs of the environment. The key point is that
solid performance-related information is required in order to support the process.

Figure 29: A sample performance optimization process.

Deciding What to Monitor


Over time, desktop, server, and network hardware will require certain levels of maintenance or
monitoring. These are generally complex devices that are actively used within the organization.
There are two main aspects to consider when implementing monitoring. The first is related to
uptime (which can report when servers become unavailable) and the other is performance (which
indicates the level of end-user experience and helps in troubleshooting).


Monitoring Availability
If asked about the purpose of their IT departments, most managers and end users would specify
that it is the task of the IT department to ensure that systems remain available for use. Ideally, IT
staff would be alerted when a server or application becomes unavailable, and would be able to
quickly take the appropriate actions to resolve the situation.
There are many levels at which availability can be monitored. Figure 30 provides an overview of
these levels. At the most basic level, simple network tests (such as a PING request) can be used
to ensure that a specific server or network device is responding to network requests. Of course,
it’s completely possible that the device is responding, but that it is not functioning as requested.
Therefore, a higher-level test can verify that specific services are running.

Figure 30: Monitoring availability at various levels.

Tests can also be used to verify that application infrastructure components are functioning
properly. On the network side, route verifications and communications tests can ensure that the
network is running properly. On the server side, isolated application components can be tested by
using procedures such as test database transactions and HTTP requests to Web applications. The
ultimate (and most relevant) test is to simulate the end-user experience. Although it can
sometimes be challenging to implement, it’s best to simulate actual use cases (such as a user
performing routine tasks in a Web application). These tests will take into account most aspects of
even complex applications and networks and will help ensure that systems remain available for
use.
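The layering described above can be expressed as an ordered list of checks, run from the lowest level up, with the first failure indicating roughly where the problem lies. The check functions here are stand-ins; in practice they would wrap a PING probe, a TCP service test, an HTTP request, and a scripted end-user transaction:

```python
def availability_level(checks):
    """Run ordered (name, check_fn) pairs from lowest level to highest;
    return the name of the first level that fails, or None if all pass."""
    for name, check in checks:
        try:
            if not check():
                return name
        except Exception:
            return name  # a crashing check counts as a failure at that level
    return None

# Stand-in checks for the example; only the HTTP-level check "fails" here.
checks = [
    ("network (PING)",       lambda: True),
    ("service (TCP port)",   lambda: True),
    ("application (HTTP)",   lambda: False),  # simulated failure
    ("end-user transaction", lambda: True),
]
print(availability_level(checks))  # application (HTTP)
```

Running the checks in this order narrows down a failure quickly: if the network and service levels pass but the application level fails, troubleshooting can start at the application tier rather than the network.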


Monitoring Performance
For most real-world applications, it’s not enough for an application or service to be available.
These components must also respond within a reasonable amount of time in order to be useful.
As with the monitoring of availability, the process of performance monitoring can be carried out
at many levels. The more closely a test mirrors end-user activity, the more relevant the resulting
performance information will be. For complex applications that involve multiple servers
and network infrastructure components, it’s best to begin with a real-world case load that can be
simulated. For example, in a typical Customer Relationship Management (CRM) application,
developers and systems administrators can work together to identify common operations (such as
creating new accounts, running reports, or updating customers’ contact details). Each set of
actions can be accompanied by expected response times.
All this information can help IT departments proactively respond to issues, ideally before users
are even aware of them. As businesses increasingly rely on their computing resources, this data
can help tremendously.
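A simple way to implement such checks is to time each simulated operation and compare it with its expected response time. The operations and targets below are illustrative placeholders for real CRM transactions:

```python
import time

def timed(operation, expected_seconds):
    """Run one simulated user operation; return (elapsed, within_target)."""
    start = time.perf_counter()
    operation()
    elapsed = time.perf_counter() - start
    return elapsed, elapsed <= expected_seconds

# Hypothetical operations with illustrative response-time targets (seconds).
operations = {
    "create account": (lambda: time.sleep(0.01), 2.0),
    "run report":     (lambda: time.sleep(0.02), 10.0),
}
for name, (op, target) in operations.items():
    elapsed, ok = timed(op, target)
    print(f"{name}: {elapsed:.3f}s (target {target}s) {'OK' if ok else 'SLOW'}")
```

In a monitoring tool, the same measurement would run on a schedule, with results stored for trending and compared against thresholds to trigger alerts.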

Verifying Service Level Agreements


One non-technical issue of managing systems in an IT department is related to perception and
communication of requirements. For organizations that have defined and committed to Service
Level Agreements (SLAs), monitoring can be used to compare actual performance statistics
against the desired levels. For example, SLAs might specify how quickly specific types of
reports can be run or outline the overall availability requirements for specific servers or
applications. Reports can provide details related to how closely the goals were met, and can even
provide insight into particular problems. When this information is readily available to managers
throughout the organization, it can enable businesses to make better decisions about their IT
investments.
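For example, an availability SLA can be verified with simple arithmetic over recorded outage minutes. The 30-day month and the 99.9 percent target are assumptions for the example:

```python
MINUTES_IN_MONTH = 30 * 24 * 60  # simplifying assumption: a 30-day month

def availability_pct(outage_minutes):
    """Percentage of the month the service was available, given outage durations."""
    up = MINUTES_IN_MONTH - sum(outage_minutes)
    return 100.0 * up / MINUTES_IN_MONTH

outages = [12, 45]  # two outages this month, in minutes
target = 99.9       # illustrative SLA target
actual = availability_pct(outages)
print(f"{actual:.3f}% (target {target}%) -> "
      f"{'met' if actual >= target else 'missed'}")
```

Here, 57 minutes of downtime yields roughly 99.868 percent availability, just below a 99.9 percent target; reports built on this kind of calculation make SLA conversations concrete rather than anecdotal.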

Limitations of Manual Server Monitoring


It’s possible to implement performance and availability monitoring in most environments using
existing tools and methods. Many IT devices offer numerous ways in which performance and
availability can be measured. For example, network devices usually support the Simple Network
Management Protocol (SNMP) standard, which can be used to collect operational data. On the
server side, operating systems (OSs) and applications include instrumentation that can be used to
monitor performance and configure alert thresholds. For example, Figure 31 shows how a
performance-based alert can be created within the built-in Windows performance tool.


Figure 31: Defining performance alerts using Windows System Monitor.

Although tools such as the Windows System Monitor utility can help monitor one or a few
servers, it quickly becomes difficult to manage monitoring for an entire environment. Therefore,
most systems administrators will use these tools only when they must troubleshoot a problem in
a reactive way. Also, it’s very easy to overlook critical systems when implementing monitoring
throughout a distributed environment. Overall, there are many limitations to the manual
monitoring process. In the real world, this means that most IT departments work in a reactive
way when dealing with their critical information systems.


Automating Server Monitoring


Although manual performance monitoring can be used in a reactive situation for one or a few
devices, most IT organizations require visibility into their entire environments in order to provide
the expected levels of service. Fortunately, data center automation tools can dramatically
simplify the entire process. There are numerous benefits related to this approach, including:
• Establishment of performance thresholds—Systems administrators can quickly define
levels of acceptable performance and have an automated solution routinely verify
whether systems are performing optimally. At its most basic level, the system might
perform PING requests to verify whether a specific server or network device is
responding to network requests. A much better test would be to execute certain
transactions and measure the total time for them to complete. For example, the host of an
electronic commerce Web site could create a workflow that simulates the placing of an
order and routinely measure the amount of time it takes to complete the entire process at
various times during the day. The system can also take into account any SLAs that might
be established and can provide regular reports related to the actual levels of service.
• Notifications—When systems deviate from their expected performance, systems
administrators should be notified as quickly as possible. The notifications can be sent
using a variety of methods, but email is common in most environments. The automated
system should allow managers to develop and update schedules for their employees and
should take into account “on-call” rotation schedules, vacations, and holidays.
• Automated responses—Although it might be great to know that a problem has occurred
on a system, wouldn’t it be even better if the automated solution could start the
troubleshooting process? Data center automation tools can be configured to automatically
take corrective actions whenever a certain problem occurs. For example, if a particularly
troublesome service routinely stops responding, the system can be configured to
automatically restart the service. In some cases, this setup might resolve the situation
without human intervention. In all cases, however, automated actions can at least start the
troubleshooting process.
• Integration with other automation tools—By storing performance and availability
information in a Configuration Management Database (CMDB), data center automation
tools can help show IT administrators the “track record” for particular devices or
applications. Additionally, integrated solutions can use change tracking and configuration
management features to help isolate the potential cause of new problems with a server.
The end result is that systems and network administrators can quickly get the information
they need to resolve problems.
• Automated test creation—As mentioned earlier, the better a test can simulate what end
users are doing, the more useful it will be. Some automation tools might allow systems
administrators and developers to create actual user interface (UI) interaction tests. In the
case of Web applications, tools can automatically record the sequence of clicks and
responses that are sent to and from a server. These tests can then be repeated regularly to
monitor realistic performance. Additionally, the data can be tracked over time to isolate
any slow responses during periods of high activity.
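The transaction-timing approach described in the first bullet can be sketched in Python. The URL, the two-second threshold, and the warning bands below are illustrative assumptions, not prescribed values:

```python
import time
import urllib.request

def timed_transaction(url, timeout=10.0):
    """Run one test transaction (here, a simple HTTP GET) and return its duration in seconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start

def check_sla(duration, threshold):
    """Classify a measured duration against an SLA threshold."""
    if duration <= threshold:
        return "OK"
    if duration <= threshold * 2:
        return "WARNING"
    return "CRITICAL"

# Hypothetical usage: the simulated order workflow must finish within 2 seconds.
# status = check_sla(timed_transaction("https://shop.example.com/order-test"), threshold=2.0)
```

Run at various times during the day, the recorded durations become exactly the kind of SLA evidence described above.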


Overall, through the use of data center automation tools, IT departments can dramatically
improve visibility into their environments. They can quickly and easily access information that
will help them more efficiently troubleshoot problems, and they can report on the most critical
aspects of their systems: availability and performance.

Change Tracking
An ancient adage states, “The only constant is change.” This certainly applies well to most
modern IT environments and the businesses they support. Often, as soon as systems are
deployed, it’s time to update them or make modifications to address business needs. And keeping
up with security patches can take significant time and effort. Although the ability to quickly
adapt can increase the agility of organizations as a whole, with change comes the potential for
problems.

Benefits of Tracking Changes


In an ad-hoc IT environment, actions are generally performed whenever a systems or network
administrator deems them to be necessary. Often, there’s a lack of coordination and
communication. Responses such as, “I thought you did that last week,” are common and,
frequently, some systems are overlooked.
There are numerous benefits related to performing change tracking. First, this information can be
instrumental in the troubleshooting process or when identifying the root cause of a new problem.
Second, tracking change information provides a level of accountability and can be used to
proactively manage systems throughout an organization.

Defining a Change-Tracking Process


When implemented manually, the process of keeping track of changes takes a significant amount
of commitment from users, systems administrators, and management. Figure 32 provides a high-
level example of a general change-tracking process. As it relies on manual maintenance, the
change log is only as useful as the data it contains. Missing information can greatly reduce the
value of the log.

Figure 32: A sample of a manual change tracking process.


Establishing Accountability
It’s no secret that most IT staffers are extremely busy keeping up with their normal tasks.
Therefore, it should not be surprising that network and systems administrators will forget to
update change-tracking information. When performed manually, policy enforcement generally
becomes a task for IT managers. In some cases, frequent reminders and reviews of policies and
processes are the only way to ensure that best practices are being followed.

Tracking Change-Related Details


When implementing change tracking, it’s important to consider what information to track. The
overall goal is to collect the most relevant information that can be used to examine changes
without requiring a significant amount of overhead. The following types of information are
generally necessary:
• The date and time of the change—It probably goes without saying that the time at which
a change occurs is important. The time tracked should take into account differences in
time zones and should allow for creating a serial log of all changes to a particular set of
configuration settings.
• The change initiator—For accountability purposes, it’s important that the person who
actually made the change be included in the auditing information. This requirement helps
ensure that the change was authorized, and provides a contact person from whom more
details can be obtained.
• The initial configuration—A simple fact of making changes is that sometimes they can
result in unexpected problems. An auditing system should be able to track the state of a
configuration setting before a change was made. In some cases, this can help others
resolve the problem or undo the change, if necessary.
• The configuration after the change—This information will track the state of the audited
configuration setting after the change has been made. In some cases, this information
could be obtained by just viewing the current settings. However, it’s useful to be able to
see a serial log of changes that were made.
• Categories—Types of changes can be grouped to help the appropriate staff find what they
need to know. For example, a change in the “Backups” category might not be of much
interest to an application developer, while systems administrators might need to know
about the information contained in this category.
• Comments—This is one area in which many organizations fall short. Most IT staff (and
most people, for that matter) don't like having to document changes. An auditing
system should require individuals to provide details related to why a change was made.
IT processes should require that this information be included (even if it seems obvious to
the person making the change).
In addition to these types of information, the general rule is that more detail is better. IT
departments might include details that require individuals to specify whether change
management procedures were followed and who authorized the change.
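The fields listed above map naturally onto a simple record structure. The following Python sketch is one hypothetical shape for a change-log entry; it stores timestamps in UTC (so differences in time zones are handled consistently) and rejects entries with empty comments:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ChangeRecord:
    """One entry in a change log, covering the fields discussed above."""
    initiator: str                 # who made the change
    systems_affected: list         # which devices or servers were touched
    initial_configuration: str     # state before the change
    new_configuration: str         # state after the change
    category: str                  # e.g., "Security patches"
    comments: str                  # why the change was made
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Enforce the policy that every change must be explained.
        if not self.comments.strip():
            raise ValueError("A change record must explain why the change was made")
```

Appending each `ChangeRecord` to a shared log yields the serial history of changes that the text calls for.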


Table 8 shows an example of a simple, spreadsheet-based audit log. Although this system is
difficult and tedious to administer, it does show the types of information that should be collected.
Unfortunately, it does not facilitate advanced reporting, and it can be difficult to track changes
that affect complex applications that have many dependencies.

Date/Time  | Change Initiator | System(s) Affected            | Initial Configuration       | New Configuration                                              | Categories
7/10/2006  | Jane Admin       | DB009 and DB011               | Security patch level 7.3    | Security patch level 7.4                                       | Security patches; server updates
7/12/2006  | Joe Admin        | WebServer007 and WebServer012 | CRM application version 3.1 | CRM application version 3.5                                    | Vendor-based application update
07/15/2006 | Dana DBA         | DB003 (All databases)         | N/A                         | Created archival backups of all databases for off-site storage |

Table 8: A sample audit log for server management.

Automating Change Tracking


Despite the numerous benefits related to change tracking, IT staff members might be resistant to
the idea. In many environments, the processes related to change tracking can cause significant
overhead related to completing tasks. Unfortunately, this can lead to either non-compliance (for
example, when systems administrators neglect documenting their changes) or reductions in
response times (due to additional work required to keep track of changes).
Fortunately, through the use of data center automation tools, IT departments can gain the benefits
of change tracking while minimizing the amount of effort that is required to track changes. These
solutions often use a method by which changes are defined and requested using the automated
system. The system, in turn, is actually responsible for committing the changes.
There are numerous benefits to this approach. First and foremost, only personnel that are
authorized to make changes will be able to do so. In many environments, the process of directly
logging into a network device or computer can be restricted to a small portion of the staff. This
can greatly reduce the number of problems that occur due to inadvertent or unauthorized
changes. Second, because the automated system is responsible for the tedious work on dozens or
hundreds of devices, it can keep track of which changes were made and when they were
committed. Other details such as the results of the change and the reason for the change
(provided by IT staff) can also be recorded. Figure 33 shows an overview of the process.


Figure 33: Committing and tracking changes using an automated system.

By using a Configuration Management Database (CMDB), all change and configuration data can
be stored in a single location. When performing troubleshooting, systems and network
administrators can quickly run reports to help isolate any problems that might have occurred due
to a configuration change. IT managers can also generate enterprise-wide reports to track which
changes have occurred. Overall, automation can help IT departments implement reliable change
tracking while minimizing the amount of overhead incurred.
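The pattern in which the automation system, rather than the administrator, commits and records each change can be sketched as follows. The authorized-user set, the audit list standing in for a CMDB, and the apply callback are all simplified placeholders:

```python
AUTHORIZED = {"jane.admin", "joe.admin"}  # hypothetical set of authorized users
AUDIT_LOG = []                            # stands in for the CMDB

def commit_change(requester, device, setting, new_value, reason, apply_fn):
    """Authorize a change, apply it through the automation layer, and record it."""
    if requester not in AUTHORIZED:
        raise PermissionError(f"{requester} is not authorized to change {device}")
    old_value = apply_fn(device, setting, new_value)  # callback returns the prior value
    AUDIT_LOG.append({
        "requester": requester,
        "device": device,
        "setting": setting,
        "before": old_value,
        "after": new_value,
        "reason": reason,
    })
```

Because every change flows through `commit_change`, the before and after states and the stated reason are captured without extra effort from the administrator, and unauthorized requests never reach the device.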

Network Change Detection


Network-related configuration changes can occur based on many requirements. Perhaps the most
common is the need to quickly adapt to changing business and technical requirements. The
introduction of new applications often necessitates an upgrade of the underlying infrastructure,
and growing organizations seem to constantly outgrow their capacity. Unfortunately, changes
can lead to unforeseen problems that might result in a lack of availability, downtime, or
performance issues. Therefore, IT organizations should strongly consider implementing methods
for monitoring and tracking changes.

The Value of Change Detection


We already covered some of the important causes for change, and in most organizations, these
are inevitable. Coordinating changes can become tricky in even small IT organizations. Often,
numerous systems need to be modified at the same time, and human error can lead to some
systems being completely overlooked. Additionally, when roles and responsibilities are
distributed, it’s not uncommon for IT staff to “drop the ball” by forgetting to carry out certain
operations. Figure 34 shows an example of some of the many people that might be involved in
applying changes.


Figure 34: Multiple “actors” making changes on the same device.

Unauthorized Changes
In stark contrast to authorized changes that have the best of intentions, network-related changes
might also be committed by unauthorized personnel. In some cases, a junior-level network
administrator might open a port on a firewall at the request of a user without thoroughly
considering the overall ramifications. In worse situations, a malicious attacker from outside the
organization might purposely modify settings to weaken overall security.

Manual Change Tracking


All these potential problems point to the value of network change detection. Comparing the
current configuration of a device against its expected configuration is a great first step. Doing so
allows network administrators to find any systems that don’t comply with current requirements.
Even better is the ability to view a serial log of changes, along with the reasons the changes were
made. Table 9 provides a simple example of tracking information in a spreadsheet or on an
intranet site.


Date of Change | Devices/Systems Affected  | Change                          | Purpose of Change                          | Comments
5/5/2006       | Firewall01 and Firewall02 | Opened TCP port 1178 (outbound) | User request for access to Web application | Port is only required for 3 days.
5/7/2006       | Corp-Router07             | Upgraded firmware               | Addresses a known security vulnerability   | Update was tested on spare hardware

Table 9: An example of a network change log.

Of course, there are obvious drawbacks to this manual process. The main issue is that the
information is only useful when all members of the network administration team place useful
information in the “system.” When data is stored in spreadsheets or other files, it’s also difficult
to ensure that the information is always up to date.

Challenges Related to Network Change Detection


Network devices tend to store their configuration settings in text files (or can export to this
format). Although it’s a convenient and portable option, these types of files don’t lend
themselves to being easily compared—at least not without special tools that understand the
meanings of the various options and settings. Add to this the lack of a standard configuration file
type between vendors and models, and you have a large collection of disparate files that must be
analyzed.
In many environments, it is a common practice to create backups of configuration files before a
change is made. Ideally, multiple versions of the files would also be maintained so that network
administrators could view a history of changes. This “system,” however, generally relies on
network administrators diligently making backups. Even then, it can be difficult to determine
who made a change, and (most importantly) why the change was made. Clearly, there’s room for
improvement.
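For plain-text configurations, even a standard-library diff goes some way toward the comparison problem described above (though, as noted, it cannot interpret what the settings mean). A minimal Python sketch, using hypothetical firewall configuration fragments:

```python
import difflib

def config_diff(old_text, new_text, label="device-config"):
    """Return unified-diff lines between two saved versions of a configuration file."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=f"{label} (before)", tofile=f"{label} (after)", lineterm=""))

# Hypothetical firewall configuration fragments:
before = "hostname Firewall01\nno service telnet\n"
after = "hostname Firewall01\nservice telnet\n"
for line in config_diff(before, after, label="Firewall01"):
    print(line)
```

This shows what changed, but not who changed it or why, which is exactly the gap an automated tracking system fills.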

Automating Change Detection


Network change detection is an excellent candidate for automation—it involves relatively simple
tasks that must be carried out consistently, and it can be tedious to manage these settings
manually. Data center automation applications can alleviate much of this pain in several ways.

Committing and Tracking Changes


It’s a standard best practice in most IT environments to limit direct access to network devices
such as routers, switches, and firewalls. Data center automation tools help implement these
limitations while still allowing network administrators to accomplish their tasks. Instead of
making changes directly to specific network hardware, the changes are first requested within the
automation tool. The tool can perform various checks, such as ensuring that the requester is
authorized to make the change and verifying that any required approvals have been obtained.


Once a change is ready to be deployed, the network automation utility can take care of
committing the changes automatically. Hundreds of devices can be updated simultaneously or
based on a schedule. Best of all, network administrators need not connect to any of the devices
directly, thereby increasing security.

Verifying Network Configuration


Data center automation utilities also allow network administrators to define the expected settings
for their network devices. If, for example, certain routing features are not supported by the IT
group, the system can quickly check the configuration of all network devices to ensure that those
features have not been enabled.
Overall, automated network change detection can help IT departments ensure that critical parts
of their infrastructure are configured as expected and that no unwanted or unauthorized changes
have been committed.
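A compliance check of this kind amounts to comparing a device's reported settings against an expected policy. The setting names below are hypothetical; a real tool would pull the actual values from each device:

```python
def find_violations(expected, actual):
    """Return settings whose actual value differs from, or is missing versus, the policy."""
    violations = {}
    for setting, expected_value in expected.items():
        actual_value = actual.get(setting)
        if actual_value != expected_value:
            violations[setting] = {"expected": expected_value, "actual": actual_value}
    return violations

# Hypothetical policy: source routing must stay disabled; SNMPv3 is required.
policy = {"ip_source_route": "disabled", "snmp_version": "v3"}
device_settings = {"ip_source_route": "enabled", "snmp_version": "v3"}
print(find_violations(policy, device_settings))
```

An automated tool runs this comparison against every managed device and flags the violations for review.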

Notification Management
It’s basic human nature to be curious about how IT systems and applications are performing, but
it can become a mission-critical concern whenever events related to performance or availability
occur. In those cases, it's the responsibility of the IT department to ensure that problems are
addressed quickly and that any affected members of the business are notified of the status.

The Value of Notifications


One of the worst parts of any outage is not being informed of the current status of the situation.
Most people would feel much more at ease knowing that the electricity will come back on after a
few hours instead of (quite literally) sitting in the dark trying to guess what’s going on. There are
two broad categories of communication for an IT organization: internal and external
notifications.

Managing Internal Notifications


There are many types of events that are specific to the IT staff itself. For example, creating
backups and updating server patch levels might require only a few members of the team to be
notified. These notifications can be considered “internal” to the IT department.
When sending notifications, an automated system should take into account the roles and
responsibilities of staff members. In general, the rule should be to notify only the appropriate
staff, and to provide detailed information. Sending a simple message stating “Server Alert” to the
entire IT staff is usually not very useful. In most situations, it’s appropriate to include technical
details, and the format of the message can be relatively informal. Also, escalation processes
should be defined to make sure that no issue is completely ignored.


Managing External Notifications


When business systems and applications are affected, it’s just as important to keep staff outside
of the IT department well informed. Users might assume that “IT is working on it,” but often
they need more information. For example, how long are the systems expected to be unavailable?
If the outage is only for a few minutes, users might choose to just wait. If it’s going to be longer,
perhaps the organization should switch to “Plan B” (which might involve using an alternative
system or resorting to pen-and-paper data collection).

Creating Notifications
In many IT environments, IT departments are notorious for delivering vague, ambiguous, and
overly technical communications. The goal for the content of notifications is to make them
concise and informative in a way that users and non-technical management can understand.

What to Include in a Notification


There are several important points that should be included in any IT communication. Although
the exact details will vary based on the type of situation and the details of the audience, the
following list highlights some aspects to keep in mind when creating notifications:
• Message details—The date and time of the notification, along with a descriptive subject
line, are a good start. Some users might want to automatically filter messages based on their
content. Keep in mind that, for some users, the disruptions caused by notifications that
don’t affect them might actually reduce productivity. IT departments should develop
consistent nomenclature for the severity of problems and for identifying who might be
affected. A well-organized message can help users find the information they need quickly
and easily.
• Acknowledgement of the problem—This portion of the notification can quickly assure
users that the IT staff is aware of a particular problem such as the lack of availability of
an application. It’s often best to avoid technical details. Users will be most concerned
about the fact that they cannot complete their jobs. Although it might be interesting to
know that a database server’s disk array is unavailable or there is a problem on the
Storage Area Network (SAN), it’s best to avoid unnecessary details that might confuse
some users.
• Estimated time to resolution—This seemingly little piece of information can be quite
tricky to ascertain. When systems administrators are unaware of the cause of a problem,
how can they be expected to provide a timeframe for resolution? However, for users, not
having any type of estimate can be frustrating. If IT departments have some idea of how
long it will take to repair a problem (perhaps based on past experience), they can provide
those details. It’s often better to “under-promise and over-deliver” when it comes to time
estimates. If it’s just not possible to provide any reliable estimate, the notification should
state just that and promise to provide more information when an update becomes
available.


• What to expect—The notification should include details about the current and expected
effects of the problem. In some cases, systems and network administrators might need to
reboot devices or cause additional downtime in unrelated systems. If time windows are
known, it’s a good idea to include those details as well.
• Any required actions—If users are expected to carry out any particular tasks or make
changes to their normal processes, this information should be spelled out in the
notification. If emergency processes are in place, users should be pointed to the
documentation. If not, a point-person (such as a department manager) should be specified
to make the determinations related to what users should do.
• Which users and systems are affected—Some recipients of notifications might be
unaware of the problem altogether. The fact that they’re receiving a notification might
indicate that they should be worried. If it’s likely that some recipients will be able to
safely ignore the message, this should also be stated clearly. The goal is to minimize any
unnecessary disruption to work.
• Reassurance—This might border on the “public relations” side of IT management, but
it’s important for users to believe that their IT departments are doing whatever is possible
to resolve the situation quickly. The notification might include contact information for
reporting further problems, and can refer users to any posted policies or processes that
might be relevant to the downtime.
Although this might seem like a lot of information to include, in many cases, it can be summed
up in just a few sentences. The important point is for the message to be concise and informative.

What to Avoid in a Notification


Notifications should, for the most part, be brief and to the point. There are a few types of
information that generally should not be included. First, speculation should be minimized. If a
systems administrator suspects the failure of a disk controller (which has likely resulted in some
data loss), it’s better to wait until the situation is understood before causing unnecessary panic.
Additional technical details can also cause confusion to novice users. Clearly, IT staff will be in
a position of considerable stress when sending out such notifications, so it’s important to stay
focused on the primary information that is needed by IT users.

Automating Notification Management


Many of the tasks related to creating and sending notifications can be done manually, but it can
be a tedious process. Commonly, systems administrators will send ad-hoc messages from their
personal accounts. They will often neglect important information, causing recipients to respond
requesting additional details. In the worst case, messages might never be sent, or users might be
ignored altogether.
Data center automation tools can fill in some of these gaps and can help ensure that notifications
work properly within and outside of the IT group. The first important benefit is the ability to
define the roles and responsibilities of members of the IT team within the application. Contact
information can also be centrally managed, and details such as on-call schedules, vacations, and
rotating responsibilities can be defined. The automated system can then quickly respond to issues
by contacting those that are involved.


The messages themselves can use a uniform format based on a predefined template. Fields for
common information such as “Affected Systems,” “Summary,” and “Details” can also be
defined. This can make it much easier for service desk staff to respond to common queries about
applications. Finally, the system can keep track of who was notified about particular issues and
what action was taken. Overall, automated notifications can go a long way toward keeping
IT staff and users informed of both expected and unexpected downtime and related issues. The
end result is a better “customer satisfaction” experience for the entire organization.
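A predefined template with fields such as "Affected Systems," "Summary," and "Details" can be as simple as the following Python sketch. The field names and wording are illustrative, not a prescribed format:

```python
from string import Template

OUTAGE_TEMPLATE = Template(
    "[$severity] $summary\n"
    "Affected systems: $affected\n"
    "Estimated time to resolution: $eta\n"
    "Details: $details\n"
)

def build_notification(severity, summary, affected, eta, details):
    """Fill in the predefined template so every notification has the same shape."""
    return OUTAGE_TEMPLATE.substitute(
        severity=severity, summary=summary,
        affected=", ".join(affected), eta=eta, details=details)

print(build_notification(
    "HIGH", "CRM application unavailable",
    ["WebServer007", "WebServer012"],
    "approximately 2 hours",
    "Please use the paper order process until service is restored."))
```

Because every message has the same fields in the same order, recipients can filter on severity and find the information they need quickly.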


Server Virtualization
Virtualization refers to the abstraction between the underlying physical components of an IT
architecture and how they appear to users and other devices. The term virtualization can be applied
to network devices, storage environments, databases, other portions of an IT infrastructure, and
servers. Simply put, server virtualization is the ability to run multiple independent operating
systems (OSs) concurrently on the same hardware.

Understanding Virtualization
The concept of running multiple “virtual machines” on a single computer can be traced back to
the days of mainframes. In that architecture, many individual computing environments or
sessions can be created on a single large computer. Although each session runs in what seems
like an isolated space, the underlying management software and hardware translates users’
requests and commands so that users can access the same physical hardware. The benefits
include scalability (many virtual machines can run simultaneously on the same hardware) and
manageability (most administration is handled centrally and client-side hardware requirements
are minimal).

Current Data Center Challenges


Before diving into the technical details of virtual machines and how they work, let’s set the
foundation by exploring the background for why virtualization has quickly become an important
option for data center administrators. The main issue is that of server utilization—or lack thereof.
The vast majority of computers in most data centers run at a fraction of their overall potential
(often as little as 10 to 15 percent). The obvious solution is server consolidation: Placing multiple
applications on the same hardware. However, due to the complexity of many environments,
the potential for conflicts can make server consolidation difficult, if not impossible. One of the
many benefits of virtualization is that it allows systems administrators to easily create multiple
virtual operating environments on a single server system, thereby simplifying server
consolidation.

Virtualization Architecture
For modern computing environments, virtualization solutions can be quickly and easily installed
on standard hardware. Figure 35 shows a generic example of one way in which virtualization can
be implemented.


Figure 35: A logical overview of virtualization.

At the bottom of the figure is the actual physical hardware—the CPU, memory, hard disks,
network adapters, and other components that make up the complete system. Running atop the
hardware is the OS, which includes device drivers that interact with physical system
components. Moving up the stack, within the OS is a virtualization management layer. This layer
allows for the creation of multiple independent virtual machine environments. The virtualization
layer may run as an application or as a service (depending on the product). Finally, at the top of
the “stack” are the virtual machines. It is at this level that multiple OSs can run simultaneously.
The job of the virtualization layer is to translate and coordinate calls from within each virtual
machine to and from the underlying hardware. For example, if the Linux-based OS within a
virtual machine requests access to a file, the virtualization management application translates the
request and redirects it to the actual file that represents a virtual hard drive on the host file
system. Figure 36 shows an example of how a Microsoft Virtual Server 2005-based
virtualization stack might look.


Figure 36: An example of a virtualization configuration using Microsoft Virtual Server 2005 R2.

Virtualization Terminology
Virtualization provides new ways in which to refer to standard computer resources, so it’s
important to keep in mind some basic terminology. The physical computer on which the
virtualization platform is running is known as the host computer and the primary OS is referred
to as the host OS. The OSs that run on top of the virtualization platform are known as guest OSs.
An additional concept to keep in mind is the virtual hard disk. From the perspective of the guest
OS, these files appear to be actual physical hard disks. However, physically, they’re stored as
files within the host OS file system.
Finally, another major advantage of virtual machines is that they can be “rolled back” to a
previous state. This is done by keeping track of all write operations and storing them in a file that
is separate from the primary virtual hard disk.
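Conceptually, this rollback mechanism behaves like a copy-on-write overlay: writes land in a separate change file while the base disk image stays untouched, so discarding the change file reverts the machine. The following toy Python model illustrates the idea (it is not a real virtual disk format):

```python
class RollbackDisk:
    """Toy model of a virtual disk whose writes can be discarded (not a real VHD format)."""
    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # the primary virtual hard disk (never modified here)
        self.changes = {}              # separate file holding every write since the snapshot

    def write(self, block, data):
        self.changes[block] = data     # the base image is left untouched

    def read(self, block):
        # Reads prefer the change file, falling back to the base image.
        return self.changes.get(block, self.base.get(block))

    def roll_back(self):
        self.changes.clear()           # discard the change file; the disk reverts to its snapshot

disk = RollbackDisk({0: b"boot", 1: b"data"})
disk.write(1, b"modified")
print(disk.read(1))  # b'modified'
disk.roll_back()
print(disk.read(1))  # b'data'
```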


Other Virtualization Approaches


It’s important to note that, in addition to the OS-based virtualization layer shown in Figure 36, there are
other virtualization approaches. In one such approach, the virtualization layer can run directly on the
hardware itself. This model (also referred to as a “Hypervisor”) offers the advantage of avoiding the
overhead related to running a primary host OS. The drawbacks, however, include more specific
requirements for device drivers and the potential lack of management software.
Another virtualization approach is “application-level virtualization.” In this configuration, application
environments are virtualized—in contrast with running entire OSs. The main benefit is that scalability can
be dramatically improved—often hundreds of applications can run simultaneously on a single physical
server. There are drawbacks, however; some complex applications might not be supported or might
require modifications. In addition, OS versions, device drivers, updates, and settings will affect all virtual
environments because they’re defined at the machine level.

The following sections focus on the type of virtualization described in Figure 36.

Benefits of Virtualization
The list of benefits related to working with virtual machines is a long one. Let’s take a brief look
at some of the most relevant advantages from the standpoint of data center management:
• Increased hardware utilization—By allowing multiple virtual machines to run
concurrently on a single server, overall resource utilization can be dramatically improved.
This benefit can lead to dramatic cost reductions in data center environments, without
significant costs for upgrading current hardware.
• Hardware independence—One of the major challenges related to managing data center
environments is dealing with heterogeneous hardware configurations. Although it’s easy
to physically relocate an array of hard disks to another machine, chances are good that
OS and device driver differences will prevent it from working smoothly (if at all). On a
given virtualization platform, however, virtual machines will use a standardized virtual
environment that will stay constant regardless of the physical hardware configuration.
• Load-balancing and portability—Guest OSs are designed for compatibility with the
virtualization platform (and not the underlying hardware), so they can easily be moved
between host computers. This process can allow users and systems administrators to
easily make copies of entire virtual machines or to rebalance them based on overall server
load. Figure 37 provides an illustration. This method allows systems administrators to
optimize performance as business and performance needs change over time. In addition,
it’s far easier than manually moving applications or reallocating physical servers.


Figure 37: Load-balancing of virtual machines based on utilization.

• Rapid provisioning—New virtual machines can be set up in a matter of minutes, and
hardware changes (such as the addition of a virtual hard disk or network interface) can be
performed in a matter of seconds. When compared with the process of procuring new
hardware, rack-mounting the devices, and performing the entire installation process,
provisioning and deploying virtual machines usually takes just a small fraction of the
time of deploying new hardware.
• Backup and disaster recovery—The process of creating a complete backup of a virtual
machine can be quicker and easier than backing up a physical machine. This process also
lends itself well to the creation and maintenance of a disaster recovery site.


Virtualization Scenarios
Earlier, we mentioned how virtualization can help data center administrators in the area of server
consolidation. This, however, is only one of the many ways in which this technology can be
used. Others include:
• Agile management—As virtual machines can be created, reconfigured, copied, and
moved far more easily than can physical servers, virtualization technology can help IT
departments remain flexible enough to accommodate rapid changes.
• Support for legacy applications—IT departments are commonly stuck with supporting
older servers because applications require OSs that can’t run on newer hardware. The
result is higher support costs and decreased reliability. By placing such an application
within a virtual machine, it can be moved to newer hardware while still running on an
older OS.
• Software development and testing—Developers and testers often require the ability to
test their software in many configurations. Virtual machines can easily be created for this
purpose. It’s easy to copy virtual machines to make, for example, changes to the service
pack level. Additionally, whenever a test is complete, the virtual machine can be reverted
to its original state to start the process again.
• Training—Dozens of virtual machines can be hosted on just a few physical servers, and
trainers can easily roll back changes before or after classes. Students can access their
virtual machines using low-end client terminals or even over the Internet. Usually, it’s far
easier to maintain a few host servers than it is to maintain dozens of client workstations.

Limitations of Virtualization
Despite the many benefits and applications of virtualization technology, there are scenarios in
which this approach might not be the perfect solution. The first and foremost concern for most
systems administrators is that of performance. All virtualization solutions will include some level
of overhead due to the translation of hardware calls between each virtual machine and physical
hardware device. Furthermore, virtual machines are unaware of each other, so competition for
resources such as CPU, memory, disk, and network devices can become quite high. Overall, for
many types of applications and services, organizations will likely find that the many benefits of
virtualization will outweigh the performance hit. The key point is that IT departments should do
as much performance testing as possible before rolling out virtualized applications.
There are additional considerations to keep in mind. For example, for physical servers that are
currently running at or near capacity, it might make more sense to leave those systems as they
are. The same goes for complex multi-tier applications that may be optimized for a very specific
hardware configuration. Additionally, for applications that require custom hardware that is not
supported by the virtualization platform (for example, 3-D video acceleration), running within a
virtual machine will not be an option. Over time, virtualization solutions will include increasing
levels of hardware support, but in the meantime, it’s important to test and verify your
requirements before going live with virtualization.


Automating Virtual Machine Management


In many ways, IT environments should treat virtual machines just like physical ones. Virtual
machines should be regularly patched, monitored, and backed up and should adhere to standard
IT best practices. This leads to the issue of automating the management of virtualization
solutions. IT departments should look for tools that are virtualization-aware. Specifically, these
solutions should be able to discern which virtual machines are running on which host systems.
Ideally, virtualization management tools should be integrated with other data center automation
features such as change and configuration management and performance monitoring and should
coordinate with IT policies and processes.
Developers can also automate virtual machine management. Most virtualization solutions
provide an Application Programming Interface (API) that allows for basic automation of virtual
machines. You can generally write simple scripts that enable tasks such as creating new virtual
machines, starting and stopping virtual machines, and moving virtual machines to other
computers. More complex programs can also be created.
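As an illustration of the kind of scripting such APIs enable, the following self-contained Python sketch models creating, starting, and moving a virtual machine. The classes and method names here are stand-ins for a real vendor SDK, whose actual interfaces will differ by platform.

```python
# Minimal, self-contained sketch of the operations a virtualization API
# typically exposes. These classes are illustrative stand-ins for a real
# vendor SDK; actual method names and parameters will differ.
class VirtualMachine:
    def __init__(self, name, memory_mb):
        self.name, self.memory_mb = name, memory_mb
        self.state = "stopped"
        self.host = None

    def start(self):
        self.state = "running"

    def stop(self):
        self.state = "stopped"

class Host:
    def __init__(self, name):
        self.name = name
        self.vms = []

    def place(self, vm):
        # Moving a VM between hosts is, conceptually, just re-homing it.
        if vm.host:
            vm.host.vms.remove(vm)
        self.vms.append(vm)
        vm.host = self

# Create, start, and rebalance a virtual machine entirely from script.
host_a, host_b = Host("hostA"), Host("hostB")
vm = VirtualMachine("build-server", memory_mb=1024)
host_a.place(vm)
vm.start()
host_b.place(vm)   # move onto a less heavily loaded host
```

In a real deployment, the same handful of calls would be wrapped in loops and policy checks to provision or rebalance whole groups of machines.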
Overall, through the use of virtualization technology, IT departments can realize numerous
benefits such as increased hardware utilization and improved management of computer
resources. And, through the use of automation, they can ensure that virtual machines are
managed as well as physical ones.

Remote/Branch Office Management


In an ideal world, all of an organization’s technical and human resources would be located within
a single building or location. Everything would be within arm’s reach, and systems
administrators would be able to easily access all their resources from a single data center. The
reality for all but the smallest of organizations, however, is that it’s vital to be able to support a
distributed environment. The specifics can range from regional offices to home offices to
traveling “road warriors.” In all cases, it’s important to ensure that users can get the information
they need and that all IT assets are properly managed.

Challenges of Remote Management


Before delving into the details of automating remote management, it will be helpful to discuss
the major challenges related to performing these tasks. The overall goal is for IT departments to
ensure consistency in managing resources that reside in the corporate data center as well as
resources that might be located in a small office on the other side of the planet. Let’s look at
some details.


Technical Issues
In some ways, technology has come to the rescue: network bandwidth is more readily available
(and at a lower cost) than it has been in the past, and establishing physical network connectivity
is usually fairly simple. In other ways, improvements in technology have come with a slew of
new problems. Storage requirements often grow at a pace that far exceeds the capacity of
devices. In addition, almost all employees of modern organizations have grown accustomed to
high-bandwidth, low-latency network connections regardless of their locations. IT departments
must meet these demands while working within budget and resource constraints.
Perhaps one of the most pertinent issues related to remote office management is that of network
bandwidth. Usually the total amount of bandwidth is constrained, and factors such as latency
must be taken into account. These constraints have often led to remote office systems being
updated less frequently. Servers sitting in a wiring closet of a branch office are often neglected and
don’t get the attention they deserve. The result is systems that are likely out of compliance with
IT policies and standards.

Personnel Issues
Ideally, organizations would be able to place senior-level systems and network administrators at
each remote office. Unfortunately, cost considerations almost always prohibit this. Instead,
certain tasks, such as installing security updates or managing backup media, must be performed
on-site, often by less-trained individuals. Because dedicated technical staff is not available, it’s
common for these important operations to be overlooked or performed improperly. Even when
using remote management tools, some tasks cannot
easily be accomplished from a remote location.

Business Issues
Functions served by remote offices can be mission critical for many of an organization’s
operations. From a business standpoint, new initiatives and changes in standard operating
procedures must apply through the entire organization. The concept of “out of sight, out of
mind” simply is not acceptable for remote locations. All of the hardware, software, and network
devices that are under IT’s supervision must be maintained to ensure overall reliability and
security.


Automating Remote Office Management


Clearly, the task of managing remote locations and resources can be a difficult one. There is
some good news, however: data center automation solutions can make the entire process
significantly easier and much more efficient. IT departments that need to support remote offices
should look for several features and capabilities in the solutions that they select:
• Change and configuration management—Keeping track of the purpose, function, and
configuration of remote resources is extremely important in distributed environments.
Often, physically walking up to a specific computer just isn’t an option, so the data must
be accurate and up to date. Whenever changes are required, an automated solution can
efficiently distribute them to all the IT department’s resources. In addition, they can keep
a record of which changes were made and who made them. Doing so helps ensure that no
devices are overlooked and can help avoid many common problems.
• Use of a configuration management database (CMDB)—Collecting and maintaining
information across WAN links in distributed environments can require a lot of
bandwidth. When IT managers need to generate reports, it’s often unacceptable to wait to
query all the devices individually. A CMDB can centrally store all the important
technical details of the entire distributed environment and can facilitate quick access to
the details.
• Notifications—In fully staffed data centers, trained support staff is usually available to
resolve issues around-the-clock. For remote offices, however, an automated solution must
be able to notify the appropriate personnel about any problems that might have occurred.
In addition to IT staff, those alerted might include the branch manager or other contacts at
the remote site.
• Monitoring—The server and network resources that reside in remote offices are often
critical to the users in those offices. If domain controllers, database servers, routers, or
firewalls become unavailable, dozens or hundreds of users might be unable to complete
their job functions. Furthermore, staff at these locations might be unqualified to
accurately diagnose a problem and determine its root cause. Therefore, it’s important for
computing devices and resources to be closely monitored at all times.
• Scheduling—When supporting remote sites that are located in distant locations, factors
such as time zones and normal work hours must be taken into account. When performing
tasks such as applying updates, it’s important to have the ability to specify when the
changes should be committed. The main benefit is the ability to minimize disruptions to
normal activity without placing an unnecessary burden on IT staff.
• Support for low-bandwidth and unreliable connections—Remote sites will have varying
levels of network capacity and reliability. The automation solution must be able to adapt
to and accommodate situations such as the failure of a connection during an important
update or the application of security changes as soon as network connections become
available again. Also, client agents should be able to automatically detect low-bandwidth
states and reduce the number and length of messages that are sent accordingly.
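As one example of the scheduling concern noted above, committing changes during each site's own off-hours can be handled by translating a single maintenance policy into per-site local times. The sketch below uses Python's standard zoneinfo module; the site list and the 2:00 AM window are assumptions for illustration, and daylight saving edge cases would need more care in practice.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Illustrative policy: every remote site applies updates at 2:00 AM local
# time. The site names and time zones below are invented examples.
SITES = {"new-york": "America/New_York", "tokyo": "Asia/Tokyo"}
MAINTENANCE_HOUR = 2

def next_window_utc(site, after_utc):
    """Next 2:00 AM local maintenance start for a site, expressed in UTC."""
    tz = ZoneInfo(SITES[site])
    local = after_utc.astimezone(tz)
    candidate = local.replace(hour=MAINTENANCE_HOUR, minute=0,
                              second=0, microsecond=0)
    if candidate <= local:
        candidate += timedelta(days=1)  # today's window has passed; use tomorrow's
    return candidate.astimezone(timezone.utc)
```

A central scheduler can then dispatch the same change to every site while each location sees it land in its own quiet hours.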
In addition, most of the best practices covered throughout this guide also apply to remote sites. By
incorporating all these features in an IT automation solution, organizations can be assured that
their remote resources will enjoy the same level of care and management as resources in
corporate data centers.


Patch Management
One of the least glamorous but still important tasks faced by systems and network administrators
is that of keeping their hardware and software up to date. The benefits of applying patches for all
devices within an environment can range from reducing security vulnerabilities to ensuring
reliability and uptime. More importantly, the cost of not diligently testing and applying updates
can be extremely high.

The Importance of Patch Management


Although many of the reasons to keep systems updated might be obvious, let’s take a quick look
at the importance of having a patch management process. First and foremost, security is an
important concern for all the components of an IT infrastructure. Ranging from physical
hardware to operating systems (OSs) to applications, it’s important for known vulnerabilities and
issues to be addressed as quickly as possible. Due to the nature of security-related updates, it’s
difficult to predict which systems will be affected and when updates will be made available.
Thus, organizations must be ready to deploy these updates as soon as possible to prevent exposure to
security attacks.
Other reasons for managing patches are just as relevant. By using the latest software, IT
departments can avoid problems that might lead to downtime or data corruption. Some patches
might increase performance or improve usability. In all cases, there are many advantages to
deploying patches using an organized process.

Challenges of Manual Patch Management


Although some environments might handle patches on an ad-hoc “as-needed” basis, this
approach clearly leaves a lot to be desired. Even in relatively small IT environments, there are
numerous problems related to performing patch management through manual processes. Due to
the many demands on IT staff’s time, it’s often easy to overlook a specific patch or a specific
target device when updates are handled manually. The time and costs related to deploying
updates can also present a barrier to reacting as quickly as possible.
In larger IT environments, coordinating downtime schedules and allocating resources for keeping
hundreds or thousands of devices up to date can be difficult (if not impossible). Often, entire
remote sites or branch offices might be out of compliance with standard IT best practices and
policies. These seemingly small challenges often result in problems that are very difficult to
troubleshoot or that can allow network-wide security breaches. With all these factors in mind,
it’s easy to see how manual patch management is not ideal.


Developing a Patch Management Process


An important step in improving patch management is to develop a well-defined process. Figure
38 provides an example of the high-level steps that should be included in the process.

Figure 38: Steps in a typical patch management process.

Obtaining Updates
It’s important for IT staff to be aware of new updates and patches as soon as possible after
they’re released. Although many vendors provide newsletters and bulletins related to updates,
most IT environments must continuously monitor many sources for this information. This
requirement makes it very likely that some updates will be overlooked.

Identifying Affected Systems


Once a potential patch has been made available, systems administrators must determine whether
the issue applies to their environment. In some cases, the details of the update might not
necessitate a deployment to the entire environment. In other cases, however, dozens or hundreds
of systems might be affected. If the patch is relevant, the process should continue.

Testing Updates
A sad-but-true fact about working in IT is that sometimes the “cure” can be worse than the
disease. Software and hardware vendors are usually under a tremendous amount of pressure to
react to vulnerabilities once they’re discovered, and it’s possible that these updates will introduce
new bugs or may be incompatible with certain system configurations. This reality highlights the
need for testing an update. Developers and systems administrators should establish test
environments that can be used to help ensure that a patch does not have any unintended effects.


Deploying Updates
Assuming that a patch has passed the testing process, it’s time to roll out the update to systems
throughout the environment. Ideally, it will be possible to deploy all the changes simultaneously.
More likely, however, the need for system reboots or downtime will force IT departments to
work within regularly scheduled downtime windows.

Auditing Changes
Once patches have been deployed, it’s important to verify that all systems have been updated.
Due to technical problems or human error, it’s possible that some systems were not correctly
patched. When done manually, this portion of the process often requires the tedious step of
logging into each server or manually running a network scanning tool.

Automating Patch Management


Clearly, the process of implementing patch management is not an easy one. Multiply the effort
required to perform the outlined steps by the frequency of updates from various vendors, and
performing the process manually might simply prove impossible. Fortunately, data center
automation tools can help to dramatically reduce the amount of time and error related to
distributing updates. Figure 39 provides an example of how an automated patch management
solution might work.

Figure 39: An overview of an automated patch management process.


The process begins with the detection of new patches. Ideally, the system will automatically
download the appropriate files. If systems administrators determine that the update is relevant
and that it should be tested, they can instruct the solution to deploy the update to a test set of
servers. They can then perform any required testing. If the update passes the tests, they can
instruct the automated patch management system to update the relevant devices. Patches are then
applied and verified based on the organization’s rules. The entire process is often reduced to a
small fraction of the total time of performing these steps manually.
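The flow just described can be modeled as a simple state machine in which each patch must pass through every stage in order. The states and transition rules below are illustrative rather than taken from any particular product.

```python
# Schematic state machine for the automated patch workflow described above.
# Stage names and rules are illustrative, not any product's actual model.
WORKFLOW = ["detected", "downloaded", "tested", "deployed", "verified"]

class Patch:
    def __init__(self, patch_id):
        self.patch_id = patch_id
        self.state = "detected"

    def advance(self, to_state):
        # Enforce that no stage of the process is ever skipped.
        current = WORKFLOW.index(self.state)
        if WORKFLOW.index(to_state) != current + 1:
            raise ValueError(f"cannot go from {self.state} to {to_state}")
        self.state = to_state

patch = Patch("example-update-001")   # hypothetical patch identifier
for stage in ["downloaded", "tested", "deployed", "verified"]:
    patch.advance(stage)
```

The value of encoding the workflow this way is that an update cannot reach "deployed" without having passed through "tested" first, which is exactly the discipline a manual process tends to lose under time pressure.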

Benefits of Automated Patch Management


The main purpose of an automated patch management solution is to help carry out all the steps
mentioned earlier. This includes obtaining updates, testing them, deploying the changes, and
auditing systems. In addition to automating these tasks, other benefits include:
• Obtaining updates—The process of discovering and downloading updates can be
automated through various tools. This is often done through a database that is managed
by the solution vendor. Broad support for many different device types, OSs, and
applications is a definite plus. IT staff can quickly view a “dashboard” that highlights
which new patches need to be deployed.
• Identifying patch targets—It’s often difficult to determine exactly which systems might
need to be patched. Automated tools can determine the configuration of IT components
and allow administrators to easily determine which systems might be affected.
• Auditing—Expected system configurations can be automatically compared with current
configuration details to help prove compliance with IT standards.
• Simplified deployment—Patches can be deployed automatically to hundreds or even
thousands of devices. When necessary, the deployment can be coordinated with
downtime windows.
With all these benefits in mind, let’s look at some additional features that can help IT
departments manage updates.

What to Look for in Patch Management Solutions


IT organizations should look for patch management solutions that integrate with other data
center automation tools. Through the use of a configuration management database (CMDB), all
details related to servers, network devices, workstations, and software can be collected centrally.
The CMDB facilitates on-demand reporting, which can help organizations demonstrate
compliance with regulatory requirements as well as internal patch policies. Other features
include automated notifications, support for remote offices, easy deployment, and support for as
many systems and devices as possible.
Overall, the important task of keeping servers and network devices up to date can be greatly
simplified through the use of data center automation tools. This approach provides the best of
both worlds: ensuring that systems are running in their ideal configuration while freeing up IT
time and resources for other tasks.


Network Provisioning
Perhaps the most critical portion of modern IT environments is the underlying network
infrastructure. Almost all applications, workstations, and servers depend on connectivity in order
to get their jobs done. In the “old days” of computing, networks were able to remain largely
static. Although new switches might have been added occasionally to support additional devices,
the scope of those changes was limited. In current environments, the need to react to rapidly changing
business and technical needs has made the process of network provisioning increasingly
important.

Defining Provisioning Needs


From a high-level view of the network, it’s important to keep in mind several main goals for
managing the configuration of the infrastructure. The main objective should be to allow systems
and network administrators to efficiently design, test, and deploy network changes. The quicker
the IT team can react to changing requirements, the better its coordination with the rest of the
organization will be. The list of types of devices that are supported by network teams is a long one,
and usually includes many of the items shown in Figure 40.

Figure 40: Examples of commonly supported network device types.

Common operations include the deployment of new devices and making network-wide changes.
Additional tasks include making sure that devices are configured as expected and that they meet
the organization’s business and technical requirements. Figure 41 provides an overview of the
types of tasks that are required to perform network provisioning. Let’s take a look at some of
these requirements in more detail, and how using an automated network provisioning solution
can help.


Figure 41: An overview of network provisioning goals.

Modeling and Testing Changes


Simple types of network changes might require only minor modifications to one or a few
devices. For example, if a new port or protocol should be allowed to cross a single firewall or
router, the change can be performed manually by a knowledgeable network administrator, and
the modification is likely to be fairly low risk.
Other types of network changes can require the coordination of changes between dozens or even
hundreds of network devices. Often, a relatively simple error such as a typo in a network
configuration file or overlooking a single device can lead to downtime for entire segments of the
network. Furthermore, applying changes to numerous devices at the same time can be a tedious
and error-prone process.
This additional complexity can best be managed through the use of an automated system. By
allowing network administrators to design their expected changes in an offline simulation or test
environment, they can predict the effects of their changes. This can help catch any configuration
problems before they are actually committed in a production environment.


Managing Device Configurations


Once an IT organization has decided which changes need to be made, an automated solution can
apply those changes. The process generally involves defining which modifications are to be
made to which devices. Data center automation tools can verify that the proper approvals have
been obtained and that standard change and configuration management processes have been
followed. The actual modifications can be deployed simultaneously to many different devices, or
they can be scheduled to occur in sequence. From a network standpoint, the coordination of
changes is extremely important in order to avoid configuration conflicts or unnecessary
downtime.
Automated network provisioning systems also provide additional useful features. Common
operations might include copying the relevant portions of the configuration of an existing device
(for de-provisioning or re-provisioning), or defining templates for how network devices should
be configured. For environments that often need to scale quickly, the ability to define standard
configuration templates for devices such as routers, switches, firewalls, load balancers, and
content caches can dramatically reduce deployment times and configuration errors.
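At its core, template-based provisioning means filling per-device values into a standard configuration skeleton. The following sketch uses Python's string.Template; the configuration commands and variable names are invented for illustration and do not correspond to any particular device's syntax.

```python
from string import Template

# A standard configuration skeleton for branch-office routers. The commands
# and placeholder names below are invented for illustration only.
ROUTER_TEMPLATE = Template(
    "hostname $hostname\n"
    "interface wan0\n"
    "  ip address $wan_ip\n"
    "snmp-server community $snmp_community ro\n"
)

def render_config(hostname, wan_ip, snmp_community):
    # Every device of this class gets the same skeleton; only the
    # per-device values change, which eliminates copy-and-paste errors.
    return ROUTER_TEMPLATE.substitute(hostname=hostname, wan_ip=wan_ip,
                                      snmp_community=snmp_community)

config = render_config("branch-42-rtr", "203.0.113.5", "monitoring")
```

Because the skeleton is defined once and reviewed once, scaling out to dozens of new sites becomes a matter of supplying a table of values rather than hand-editing dozens of configurations.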

Auditing Device Configurations


Even in well-managed IT environments, it’s possible for the configuration of a device to deviate
from its expected settings. This might happen due to simple human error or as a result of an
intrusion or unauthorized modification. Automated network provisioning solutions should be
able to regularly scan the configuration of all the devices on the network and report on any
unexpected values that are encountered. These reports can be used to demonstrate compliance
with regulatory requirements and IT policies and standards.
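Such an audit boils down to comparing each device's running settings against its expected baseline and reporting every difference. A simplified sketch, using plain dictionaries in place of real device configurations:

```python
# Simplified configuration audit: compare each device's actual settings
# against its expected baseline and report any drift found.
def audit(expected, actual):
    drift = {}
    for device, baseline in expected.items():
        running = actual.get(device, {})
        # Record (expected, actual) for every setting that deviates.
        diffs = {key: (value, running.get(key))
                 for key, value in baseline.items()
                 if running.get(key) != value}
        if diffs:
            drift[device] = diffs
    return drift

# Device names and settings below are invented examples.
expected = {"fw-01": {"telnet": "disabled", "ntp": "10.0.0.1"}}
actual = {"fw-01": {"telnet": "enabled", "ntp": "10.0.0.1"}}
report = audit(expected, actual)   # flags the unexpected telnet setting
```

An empty result demonstrates compliance; a non-empty one pinpoints exactly which device and which setting has drifted, which is the raw material for the compliance reports mentioned above.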

Using a Configuration Management Database


An excellent method for managing the complexity of network environments is through the use of
a centralized configuration management database (CMDB). This central repository can store
details related to all the devices in the environment, including networking hardware, servers,
workstations, and applications. The information can be combined to provide reports such as
insight into overall network utilization or finding the root causes of any performance problems or
failures that might have occurred.
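In essence, a CMDB lets reports run against centrally stored records instead of polling each device across the network. A toy illustration of that query model, with invented device records:

```python
# Toy illustration of CMDB-style reporting: queries run against centrally
# stored records rather than polling each device. All records are invented.
CMDB = [
    {"name": "rtr-nyc-01", "site": "new-york", "type": "router", "os": "12.4"},
    {"name": "srv-nyc-02", "site": "new-york", "type": "server", "os": "2003 SP1"},
    {"name": "rtr-tok-01", "site": "tokyo",    "type": "router", "os": "12.2"},
]

def report(site=None, device_type=None):
    # Filter the central inventory by any combination of attributes.
    return [r for r in CMDB
            if (site is None or r["site"] == site)
            and (device_type is None or r["type"] == device_type)]

all_routers = report(device_type="router")   # answered without touching the WAN
```

A production CMDB would of course be a real database with a richer schema, but the principle is the same: the expensive data collection happens once, and every subsequent question is a local query.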

Additional Benefits of Automation


By automating network provisioning, IT departments can also realize numerous additional
benefits. For example, automatic notifications can be sent whenever problems occur on the
network. Also, overall security is often greatly increased because network administrators will no
longer need to share passwords, and IT managers can ensure that only authorized personnel are
able to make changes. Overall, data center automation tools can greatly simplify the process of
network provisioning and can increase the responsiveness of an IT department.


Network Security and Authentication


It is commonly accepted that network security is one of the most important aspects of IT
management, but the methods by which users and computers are granted access to communicate
within an organization can vary greatly between environments. The goal of most security
measures is to ensure that only authorized users can access resources while still allowing all
users to do their jobs with a minimal amount of hassle.

Understanding Security Layers


If you were to imagine a house with a concrete front door that includes numerous locks and that
has flimsy single-pane windows, it’s unlikely that you would consider the house to be secure.
The same applies to networks—security must be implemented and managed throughout the
organization and at all entry points to the network. The best-implemented security plan will
include multiple layers of security. Figure 42 provides an overview of some of these layers.

Figure 42: An overview of various IT security layers.

All these layers work together to form the links in an organization’s security chain. For example,
before an employee or consultant can access a specific database application, he or she must first
have access to a physical network port. The user will then be verified at the network and server
levels, and finally at the application level. The user must meet all these challenges in order
to be able to access the application.


Choosing a Network Authentication Method


When working in all but the smallest of IT environments, it’s important to use a centralized
authentication mechanism. One of the most commonly used systems is Microsoft’s Active
Directory. AD domains provide an organization-wide security database that can be used to
control permissions for users, groups, and computers throughout the environment. All
administration is managed centrally without requiring security to be configured on individual
computers. As long as a user has the appropriate credentials, he or she will be able to access the
appropriate devices or services.

Security Protocols
For managing authentication in a distributed network environment, one of the most common
protocols is Kerberos. This protocol allows computer systems to be able to positively identify a
user in a secure way. Through the use of encryption, it can help prevent security problems such
as the interception of credentials. Generally, Kerberos is implemented at the server or
the application level. However, network devices and other components can also take advantage
of it.
There are also several other authentication methods that can be used. Older versions of the
Microsoft Windows platform use the NTLM authentication protocol and method. Although this
method is less secure than Kerberos, NTLM is a widely supported standard that might still be
required to support down-level clients and servers. Also, numerous Lightweight Directory
Access Protocol (LDAP)-compliant solutions can integrate with or replace AD. Remote
Authentication Dial-In User Service (RADIUS), which was originally developed for the purpose
of authenticating remote users, can help centralize security for mobile users and remote
locations.

Authentication Mechanisms
The goal of authentication is to ensure that a specific user is who he or she claims to be. By far,
the most common authentication mechanism is through the use of a login and password
combination. Although this method meets basic security requirements, it has numerous
drawbacks. First, users are forced to memorize these pieces of information, and handling lost
passwords is a tedious and time-consuming process. Additionally, passwords can be shared or
stolen, so possession of the correct credentials does not prove that the person presenting
them is who he or she claims to be. Because so much depends on the credentials alone, this
method leaves considerable room for improvement.
Newer authentication mechanisms include biometrics and the use of specialized security devices.
Biometric devices are most commonly based on the use of fingerprints or voice identification to
identify individuals. Other methods such as retinal scans are available (though they’re most
commonly seen in spy movies). Security devices such as an encryption card or “fob” can also be
used to verify individuals’ identities, especially for remote access. All of these methods involve a
certain level of management overhead, and IT departments must be able to keep track of security
principals, regardless of the method used.


Authorization
Figuring out how administrators can control access to a system is only part of the security
puzzle. Just as important is defining what exactly these users can do. Restrictions can range from
determining which files and folders can be accessed to limiting the time of day during which a
user can log on. Authorization is the process of granting permission to security principals (such
as users or computers) in order to granularly manage what tasks they can perform.
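To make the idea concrete, authorization is often modeled as a mapping from roles to permitted actions. The following Python sketch is purely illustrative; the role names, actions, and function are hypothetical and not drawn from any particular product:

```python
# Hypothetical role-based authorization sketch: each role maps to the
# set of actions its members may perform.
ROLE_PERMISSIONS = {
    "analyst": {"read_reports"},
    "dba": {"read_reports", "modify_schema", "backup_database"},
}

def is_authorized(user_roles, action):
    """Return True if any of the user's roles permits the requested action."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)
```

A real implementation would typically pull these mappings from a directory service or database rather than hard-coding them.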

Automating Security Management


With the many methods of managing and determining network permissions, IT departments are
faced with a difficult challenge. On one hand, administrators must make systems as usable and
accessible to authorized users as is practical. On the other hand, the IT team must ensure that all
the different levels and layers of security include consistent information to prevent unauthorized
access. Even a single device or database that is out of compliance with policies can create a
major security hole in the overall infrastructure.
So how can security be managed across all these disparate systems? A commonly used method is
through the use of a centralized security management solution. Figure 43 shows an example of
how this might work from a conceptual standpoint. The goal of the solution is to coordinate
details between multiple security providers. It can do so through the use of a centralized security
database that might contain either a master set of credentials or mappings between different types
of security systems. The actual implementation details will vary based on the overall needs of the
environment. From the user’s standpoint, this can help achieve the benefit of single sign on
(SSO).

Figure 43: Coordinating security between multiple systems.
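As a rough sketch of the credential-mapping approach, assume a central security database that maps each master identity to its per-system account names. All identifiers below are hypothetical:

```python
# Hypothetical central security database: one master identity maps to the
# account name used on each downstream system (ERP, CRM, UNIX, and so on).
CREDENTIAL_MAP = {
    "jsmith": {"erp": "SMITHJ", "crm": "john.smith", "unix": "jsmith"},
}

def resolve_account(master_id, system):
    """Return the account a master identity uses on a system (None if unmapped)."""
    accounts = CREDENTIAL_MAP.get(master_id)
    if accounts is None:
        raise KeyError("unknown identity: " + master_id)
    return accounts.get(system)
```

In a single sign-on deployment, a lookup like this would happen behind the scenes each time the user reaches a new system, so the user authenticates only once.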

Overall, by integrating security management across these disparate systems, IT departments and
organizations can be sure that all their systems remain coordinated and that only authorized
users can access the network.


Business Processes
An important characteristic of successful businesses is a strong alignment of the efforts between
multiple areas of the organization. This arrangement rarely occurs by itself—instead, it requires
significant time and effort from organizational leaders. The end result is often the creation of
processes that define how all areas of the enterprise should work together to reach common
goals.

The Benefits of Well-Defined Processes


Business processes are put in place to describe best practices and methods for consistently
performing certain tasks. Often, the tasks involved will include input and interaction of
individuals from throughout the organization. Before delving into details and examples of
processes, let’s first look at the value and benefits.
There are several valuable benefits of implementing processes. The first is consistency: by
documenting the way in which certain tasks should be completed, you can be assured that all
members of the organization will know their roles and how they may need to interact with
others. This alone can lead to many benefits. First, when tasks are performed in a consistent
manner, they become predictable. For example, if the process of qualifying sales leads is done
following the same steps, managers can get a better idea of how much effort will be required to
close a sale. If the business needs to react to any changes (for example a new competitive
product), the process can be updated and all employees can be instructed of the new steps that
need to be carried out.
Another major benefit of defining business processes is related to ensuring best practices. The
goal should not be to stifle creativity. Rather, it’s often useful to have business leaders from
throughout the organization decide upon the best way to accomplish a particular task. When
considering the alternative—having every employee accomplish the task a different way—
consistency can greatly help improve efficiency. Additionally, when processes are documented,
new employees or staff members that need to take on new roles will be able to quickly learn
what is required without making a lot of mistakes that others may have had to learn “the hard
way.”

Defining Business Processes


Once you’ve decided that your organization can benefit from the implementation of business
processes, it’s time to get down to the details. You must define business processes and determine
how they can best be implemented to meet the company’s needs.


Deciding Which Processes to Create


An obvious first step related to designing processes is to figure out which sets of tasks to work
on. At one extreme, organizations could develop detailed plans for performing just about every
business function. However, creating and enforcing business processes requires time and effort,
and the value of the process should be considered before getting started. Some characteristics of
tasks that might be good candidates for well-defined processes include:
• Tasks that are performed frequently—The more often a process is used, the more value it
will have for the organization. For tasks that are performed rarely (for example, a few
steps that are carried out once per year), the effort related to defining the process might
not be worthwhile.
• Tasks that involve multiple people—Processes are most useful when there is a sequence
of steps that must be carried out to reach a goal. When multiple people depend upon each
other to complete the task, a process can help define each person’s responsibilities and
can help ensure that things don’t “fall through the cracks.”
• Tasks that have consistent workflows—Since the goal of a process is to define the best
way in which to accomplish a task, processes are best suited for operations that should be
done similarly every time. Although it is possible to define processes when significant
variations are common, often these processes lead to many exceptions, which can lower
the overall value of the effort.
With these aspects in mind, let’s look at additional details related to defining business processes.

Identifying Process Goals


Just as it’s helpful for a project to have a plan or mission statement, it’s important to define the goals of a
process before beginning the work of documenting it. Examples of typical process goals include:
• To provide an efficient method for tracking customer issues immediately after a sale.
• To increase the quality of technical support provided by the customer service desk.
• To streamline the process of payroll processing.
Effective goals will usually be concise and will focus on the what and why, instead of how.
During the development of processes, organizations should regularly refer back to these goals to
ensure that all the steps are working towards the requirements.


Developing Processes
When it comes to deciding who should be involved in developing processes, the general rule of
thumb is the more, the better. Although it might be tempting for managers to take a top-down
approach to defining processes or for a single business manager to document the details, it’s
much better to solicit the input of all those that are involved. Many operations and tasks have
effects that are felt outside of the immediate realm of a single department. Therefore, it’s
important to ensure coordination with other portions of the business.
Specifically, there are several roles that should be represented during the creation of a process.
Business leaders from all areas of the organization should be welcome. Additionally,
stakeholders whose jobs will be directly affected by the process should drive the process. This
might include employees ranging from hands-on staff members to executive management
(depending on the scope of the process). An organized process for implementing ideas and
reviewing documentation drafts can go a long way toward keeping the development process
humming along. At the risk of sounding like a half-baked management fad, it’s often helpful to
have a process for creating processes.

Documenting Business Processes


Once the key components of a business process have been defined, it’s time to commit the
details to a document. A best practice is to use a consistent format that includes all the relevant
details that might be needed by individuals that are new to the job role. Figure 44 provides some
examples.

Figure 44: Components of a well-defined process.

Specific details include the owner of the document—the individual or group that is responsible
for defining and maintaining the process. Other details include who is affected by the process,
and the roles that might be required. The actual steps of the process can be defined in a variety of
ways. Although text might be useful as a basis, flowcharts and other visual aids can help
illustrate and summarize the main points very effectively.
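The components described above can be captured in a simple structured record. This Python sketch uses hypothetical field names that follow the description rather than any formal standard:

```python
from dataclasses import dataclass, field

# Sketch of a process-document record: owner, affected parties, required
# roles, and the steps themselves. Field names are illustrative only.
@dataclass
class ProcessDocument:
    name: str
    owner: str               # individual or group maintaining the process
    affected_parties: list   # who is affected by the process
    roles: list              # roles required to carry out the process
    steps: list = field(default_factory=list)  # text, flowchart references, etc.

    def add_step(self, description):
        self.steps.append(description)
```

Storing processes in a structured form like this also makes it easier to generate flowcharts or publish the documents to an intranet later.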


Creating “Living” Processes


It’s important to keep in mind that processes are rarely, if ever, perfect. There is almost always
room for improvement, and organizations often have to react to changing business or technical
requirements. Instead of looking at processes as fixed, rigid commandments, organizations
should see them as guidelines and best practices. Ideally, the group will be able to meet
periodically to review the processes and ensure that they are still meeting their goals.
Furthermore, all employees should be encouraged to make suggestions about changes. This open
communication can help add a sense of ownership to the process and can help enforce it. It
doesn’t take much imagination to picture workers grumbling about antiquated systems and steps
that make their jobs more difficult and less efficient. Rather than encouraging people to work
around the system, they should be encouraged to improve the portions that don’t work.

Automating Business Process Workflow


As mentioned earlier, it’s common for processes to include steps that require interactions among
different individuals and business units. Therefore, it should come as no surprise that
organizations can benefit significantly through the use of automated workflow software
solutions. These solutions allow managers to define steps that are required and to ensure that
they are properly followed.
Approvals processes and workflow often require multiple people to work on the same piece of
information. Tasks include reviewing the current state of the information and making comments
or modifications. The changes should be visible to everyone involved in the process, and people
should be sure to have the latest version of each document. The challenges lie in the ability to
coordinate who has access to which pieces of a document, and when.
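A minimal sketch of that coordination problem, assuming a simple check-out/check-in model with a version counter (the class and method names are illustrative only):

```python
# Sketch of document coordination in an approvals workflow: one editor at
# a time, with a version counter so reviewers can tell whether they have
# the latest copy.
class WorkflowDocument:
    def __init__(self, name):
        self.name = name
        self.version = 1
        self.checked_out_by = None

    def check_out(self, user):
        """Grant exclusive edit access, or fail if someone else holds it."""
        if self.checked_out_by is not None:
            raise RuntimeError("locked by " + self.checked_out_by)
        self.checked_out_by = user

    def check_in(self, user):
        """Release the lock and publish a new latest version."""
        if self.checked_out_by != user:
            raise RuntimeError("only the current editor can check in")
        self.checked_out_by = None
        self.version += 1
```

Commercial workflow products add much more (notifications, approval chains, audit trails), but the core concurrency rule is essentially this.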
Many popular software packages and suites offer workflow features. For example, Microsoft’s
Office system productivity suite and its SharePoint Portal Server product can help make
documents and other information available to teams and organizations online. Many enterprises
have also invested in the implementation of enterprise resource planning (ERP), customer
relationship management (CRM), or custom-built line-of-business applications. And, from an IT
standpoint, data center automation tools can be used to ensure that processes related to change
and configuration management, security management, deployment, and many other tasks are
handled according to the organization’s best practices. Regardless of the approach taken, the
creation and enforcement of business processes can significantly improve the maturity and
efficiency of organizations of any size.


Business Process Example: Service Desk Processes


Having already explored the benefit of business processes and characteristics that can make them
successful, let’s look at a specific example of a business process—the implementation of a
service desk workflow. The goal is to help illustrate how organizations can create and document
a common business practice to help streamline operations.

Characteristics of an Effective Process


Before diving into specific details of a service desk process, let’s enumerate a few ideas to keep
in mind. First and foremost, the process should be defined well enough so that all reasonable
procedures are covered. Examples might include what to do in the case of an emergency, or how
after-hours support calls should be handled.
Second, it’s important for IT departments to communicate their processes to their users. If the
turnaround time to resolve low-priority issues is 2 business days, users should be made aware of
this ahead of time. Third, it is very important that at any given point in the process, at least one
individual has ownership of an issue. This individual should have the authority to make decisions
whenever decisions are required. A common cause of poor customer service is when a call or
issue should be transferred but instead ends up in a “black hole” somewhere. (It’s tempting to
think that there’s a place in the Universe where these calls go to commiserate).
Some fundamental rules related to documentation should also apply. Consistent use of particular
terminology (along with definitions, wherever appropriate) can be greatly helpful. In the area of
service desk support, clear definitions of “Level 2 Emergency” or “minor issue” can help
everyone better understand their roles. Even terms such as “regular business hours” could use at
least a reference to the company’s standard work schedule.
Finally, wherever possible, service desk staff should be empowered to act as advocates for their
callers. Although their ultimate loyalty should be to the support organization, they should also
represent the needs of those that they support to the best of their abilities. Keeping these things in
mind, let’s move on to some examples.

Developing a Service Desk Operation Flow


Let’s start by taking a look at a typical service desk process. For the sake of this example, let’s
focus on a scenario in which an IT call center is designed to support end users from within the
organization. Let’s assume that the organization supports approximately 3,000 employees spread
across numerous sites and that the service desk includes 35 staff members, including management.

 Most of the information in this section is adaptable to organizations of just about any size.


Documenting Workflow Steps


The approach we’ll take to developing a service desk process is to start with the very basics. You
might imagine these first steps as something that might be scribbled in a notebook somewhere.
Typical steps in the service desk process can initially be defined by the following high-level
steps:
• A Service Desk Representative (SDR) receives a call and determines the nature of the
problem.
• If the problem can be resolved by the SDR, assistance should be provided and the call
should be completed.
• If the problem requires the caller to be transferred, the SDR should document details and
transfer the call to the appropriate specialist.
• If the issue is an emergency, it should be escalated to a supervisor via email (during
regular business hours) or via a phone call (outside of regular business hours).
• All other issues should be escalated to a Senior Support Representative (SSR).
Although text-based descriptions can be helpful, this example leaves much to be desired. First,
it’s difficult to read—it’s not clear whether these steps should be performed in sequence or some
decisions are exclusive of each other. Clearly, there is room for improvement. Let’s continue on
the path to an effective service desk business process by looking at more examples of what might
be included.
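The steps above can be sketched as a simple routing function, which also makes the ambiguity visible: code must commit to an order in which the conditions are tested. All labels here are hypothetical:

```python
def route_call(sdr_can_resolve, needs_specialist, is_emergency, during_business_hours):
    """Return who handles the call next, per the high-level steps above.
    Conditions are tested in a fixed order, which the prose left unstated."""
    if sdr_can_resolve:
        return "SDR resolves and closes the call"
    if needs_specialist:
        return "transfer to specialist with documented details"
    if is_emergency:
        return ("escalate to supervisor via email" if during_business_hours
                else "escalate to supervisor via phone")
    return "escalate to Senior Support Representative (SSR)"
```

Writing the steps this way forces the exclusive-choice questions to be answered explicitly, which is exactly what the flowchart in the next sections accomplishes for human readers.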

Tracking and Categorizing Issues


One important aspect of providing service desk support is the requirement of always tracking all
issues. Apart from ensuring that no request is ignored, this information can be vital in
identifying, comparing, and reporting on common problems. Service desk staff should be made
aware of common categories of problems. Table 10 provides basic examples.
• Minor—Desktop: A minor computer issue that is not preventing use of the system. Examples:
intermittent application problems; non-critical or “annoying” issues.
• Minor—Change Request: A change to an existing system that is not preventing an employee
from working. Examples: addition of a new computer; new hardware request; physical
relocation of a computer.
• Medium—Single System: A single computer is unavailable for use by an employee. Examples:
hard disk or other hardware failure; operating system (OS) issue.
• High—Multiple Systems: Multiple systems are unavailable for use. Examples: department-level
server failure; network failure.

Table 10: Examples of service desk issue categories.

In addition, this table could include details about any service level agreements (SLAs) that the IT
department has created as well as target issue resolution times. Of course, manual judgment will
always be required on the part of service desk staff. Still, the goal should be to capture and route
important information as accurately as possible.
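A categorization scheme like Table 10 can be captured directly in routing logic. The sketch below maps two basic facts about an issue to a severity category and attaches SLA target resolution times; the targets and thresholds are invented for illustration:

```python
# Hypothetical SLA targets (in business hours) attached to the Table 10
# severity categories; real values would come from the IT department's SLAs.
SLA_TARGETS = {
    "Minor": 16,   # roughly two business days
    "Medium": 8,
    "High": 2,
}

def categorize(systems_affected, prevents_work):
    """Map basic facts about an issue to a Table 10 severity category."""
    if systems_affected > 1:
        return "High"
    if prevents_work:
        return "Medium"
    return "Minor"

def sla_hours(category):
    """Return the target resolution time for a category."""
    return SLA_TARGETS[category]
```

Automating even this coarse triage helps ensure that issues are captured consistently, while leaving the final judgment call to the service desk representative.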


Escalation Processes and Workflow


In even small service desk environments, it’s likely that the organization has specialists to handle
certain types of issues. In some cases, there might be multiple levels of support staff; in other
cases, application experts might be located outside the IT organization. Once the nature and
severity of an issue has been determined, service desk representatives should know how they
should route and handle these issues. Perhaps the most important aspect is to ensure that the
issue always has an owner.
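The ownership rule can be enforced in an issue-tracking data structure: escalation is a single reassignment, so there is never a moment at which the issue has no owner. This sketch is illustrative only:

```python
# Sketch of the "an issue always has an owner" invariant: an issue cannot
# be created without an owner, and escalation cannot leave it ownerless.
class Issue:
    def __init__(self, summary, owner):
        if not owner:
            raise ValueError("an issue must be created with an owner")
        self.summary = summary
        self.owner = owner
        self.history = [owner]   # audit trail of every owner, in order

    def escalate(self, new_owner):
        if not new_owner:
            raise ValueError("cannot escalate to nobody")
        self.owner = new_owner
        self.history.append(new_owner)
```

The history list doubles as an audit trail, which is useful when reviewing why a call ended up in a "black hole."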

Creating a Service Desk Flowchart


Once you have settled on the features to include in your high-level service desk process, it’s time
to determine how best to communicate the information. A flowchart is often the best way for
people to visualize the steps that might be required to resolve an issue and how the steps are
related. Figure 45 provides an example.

Figure 45: An example of a help desk issue-resolution process flowchart.


Notice that in this document, there are many decision points and branching logic that will affect
the path to issue resolution. The major areas of ownership start at the left and begin with the
reporting of an issue (which can be from any area of the organization). The Level-1 staff is
responsible for categorizing the issues and determining the next steps. The issue may be resolved
at this level or it may be moved on to other members of the staff. At all points, the issue is owned
by an individual or a group. In this particular flowchart, it is ultimately the responsibility of the
Level-1 staff to ensure that an issue is closed.
Although this flowchart may not be perfect, it is easy to read and provides a simple overview of
many portions of the process. Most IT organizations will also want to accompany the flowchart
with additional details such as definitions of terms and steps involved in procedures.

Automating Service Desk Management


Service desk workflow is an excellent example of the type of business process that can be greatly
improved through the use of automation. It’s important to note that there are many approaches to
the task of defining service desk workflows. For example, the IT Infrastructure Library (ITIL)
defines a Service Desk function and provides best practices for how IT organizations can
implement policies and processes related to issue resolution.

For more information about ITIL, see the ITIL Web site at http://www.itil.co.uk.

Numerous third-party products and software solutions are also available. Some products are very
customizable, while others introduce their own suggested workflows, terminology, and best
practices.
When evaluating potential service desk solutions, IT organizations should start by looking at
their overall needs. For example, some solutions might better lend themselves to the support of
customers that are external to an organization (by allowing for fee-based support and related
features); others might be more appropriate for internal IT service desks. In some cases, an
enterprise might decide to build its own service desk solution. Although doing so can lead to a
system that is well-aligned with business goals, the time, cost, and maintenance effort required
might not lead to a strong enough business case for this approach.
Regardless of the approach and the technology selected, the implementation of an organized
service desk process is an excellent example of how IT organizations can benefit from the
implementation of business processes.


Executive Action Committee


A challenge that is common to most IT departments is the goal of meeting organizational
requirements while staying within established budgets. In addition to the ever-increasing reliance
most organizations put on their IT staff, new initiatives often take up important time and
resources. When reacting to demands, it can become difficult for IT management to stay on top
of the needs of the entire organization. Instead of working in isolation from the rest of the
business, a recommended best practice is to establish an Executive Action Committee.

Goals of the Executive Action Committee


An Executive Action Committee can help determine the course of the business and can help
define the role of the IT organization within it. The purpose of the committee is to evaluate
current and future IT initiatives and to make recommendations about which projects should be
undertaken. The process might start by evaluating active proposals and requests as collected by
the IT department. For example, the Sales and Marketing departments might have requested an
upgrade of their current customer relationship management (CRM) application, while the
Engineering department is looking for a managed virtualization solution to facilitate testing of a
new product.

Evaluating Potential Projects


Given time and budget constraints, it’s likely that some projects will either have to be cut from
the list or be postponed until resources are available. That raises the question of how to decide
which projects are most valuable to the organization. Standard business-related measurements
can be helpful. Quantitative estimates such as return on investment (ROI) and total cost of
ownership (TCO) are key indicators of the feasibility of a particular project. The quicker the ROI
and the lower the TCO, the better. Other factors that might be taken into account include risks
(factors that might lead to cost overruns or unsuccessful project completion) as well as available
resources (see Figure 46).


Figure 46: Factors related to prioritizing projects.
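One way to combine these factors into a rough ranking is a simple score in which faster ROI, lower TCO, and lower risk all raise a project's priority. The formula and weights below are hypothetical, for illustration only:

```python
def project_score(roi_months, tco, risk):
    """Hypothetical priority score: quicker ROI (fewer months), lower TCO
    (in dollars), and lower risk (0.0-1.0 estimate of overrun/failure)
    all increase the score."""
    return (1.0 / roi_months) * (1_000_000 / tco) * (1.0 - risk)

def prioritize(projects):
    """Rank (name, roi_months, tco, risk) tuples, best score first."""
    return sorted(projects, key=lambda p: project_score(*p[1:]), reverse=True)
```

A real committee would weight these inputs according to its own strategy, but even a crude score like this gives the discussion a common starting point.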

An adage from technical project management states that organizations can choose only two of
the following three factors: scope, timeliness, and quality. For example, if the project deadline is
most important, followed by quality, then it’s quite possible that the scope (the list of included
features and functionality) might need to be reduced (see Figure 47).

Figure 47: Prioritizing the goals of a particular project.


Defining Committee Roles and Members


When defining the membership of the Executive Action Committee, it’s important to ensure that
representation from various areas of the organization is included. Ideally, this will include senior
management and executives from various business units. Because investments in IT can affect
the organization as a whole, input and comments should be solicited before undertaking major
projects. This process can go a long way towards having IT organizations seen as strategic
business partners and good team players.

Implementing an Executive Action Process


A crucial first step in implementing an Executive Action Process is to gather buy-in from
throughout the organization. Often, the potential benefit—better prioritization of IT projects—is
enough to gain support for the process. In other cases, IT managers might have to start the
process by calling meetings to evaluate specific projects.
The roles of committee members may vary based on business needs and particular projects that
are underway. For example, if an organization is planning to invest significant resources in a new
Web-based service offering, leaders from the Engineering department might be most interested
in helping to prioritize projects. Figure 48 provides some steps that might be involved in regular
Executive Action Committee meetings.

Figure 48: Parts of the Executive Action Committee process.

Overall, the goal of the Executive Action Committee is to better align IT with the needs of the
organization. By ensuring that input is gained from throughout the organization and by
prioritizing the projects that can provide the most “bang for the buck,” enterprises can be sure to
maximize the value of their IT investments.


Centralized User Authentication


Taken literally, the concept of authentication refers to establishing that something is genuine or
valid. In the “real world,” this is often easy enough—unless you have reason to believe that
you’re involved in a complex international plot. Basic physical appearance can help you identify
individuals with little room for error. Add in an individual’s voice, and it’s pretty easy to
distinguish your manager from other coworkers (perhaps by identifying the tell-tale pointy hair
from the Dilbert comic strips). The process of authentication in the technical world is
significantly more complex.

Major Goals of Authentication


From the standpoint of an IT department, the primary goal of authentication is to positively
identify users or computing devices and to ensure that they are who they claim to be. Based on
their validated identities, systems can determine which permissions to grant (a process known as
authorization). Although the primary goal is easily stated, there is a lot more to it.
Other goals of the authentication process involve minimizing the hassle and intrusiveness of
security methods. If you required your users to provide authentication information every time
they tried to open a file, for example, it’s likely that the reduction in productivity (not to mention
the negative effects on your own life expectancy) might not make it a worthwhile
implementation. With strong but user-friendly and easy-to-maintain authentication mechanisms,
organizations can gain the advantages of increased security without the potential downsides.
With this goal in mind, let’s look at ways in which IT departments can implement authentication.

Authentication Mechanisms
By far, the most commonly used method of computer-based authentication is through the use of
a login and password combination. Although this method is relatively easy to implement, it
comes with significant burdens. Users are responsible for generating and remembering their own
passwords. They should choose strong passwords, but they’re often required to enter them
multiple times per day.
From an IT standpoint, devices such as routers and security accounts for use by applications and
services also often have passwords. Creating and maintaining these passwords can be a difficult
and time-consuming process. From a security standpoint, it can also be difficult to determine
whether a password has been shared, compromised, or used in an unauthorized way. All too often,
“secrets” are shared. Considering that organizations often have many thousands of passwords
and accounts, this can be a major security-related liability.


Strengthening Password-Based Authentication


An old adage states that a chain is only as strong as its weakest link—should even one
component fail, the strength and integrity of the entire chain is compromised. From an IT
standpoint, this means that security staff must ensure that authentication credentials are properly
maintained. Some general best practices related to managing password-based environments
include the following:
• Password length—IT departments should require a minimum number of characters for
each password that is used within the environment. Although the specifics vary between
IT environments, a minimum password length of six characters is a standard best practice.
• Password complexity—A common method for infiltrating computer systems is that of
dictionary-based or “brute force” attacks. This approach involves either randomly or
systematically trying to “guess” a password. If the potential attacker has additional
knowledge (such as names of the user’s children, pets, and so on), the chances of success
can be dramatically improved. To counter these methods, it’s important to ensure that
passwords are sufficiently complex. The general approach is to require at least two of the
following types of characters in every password:
• Lower-case letters
• Upper-case letters
• Numbers
• Special characters
• Password expiration—The longer a user account and password combination is active, the
more likely it is that the account is being used by an unauthorized individual. Because
there is little to prevent users from accidentally or purposely sharing passwords and it’s
difficult to detect whether a login is being used by an unauthorized individual, it’s
important to require passwords to be regularly modified. A typical practice might require
users to change their passwords every 3 months. The authentication system can also keep
a list of recently used passwords and prevent their reuse. Finally, some systems might be
able to look for similarities in passwords and disallow a change from a value such as
“P@ssw0rd01” to “P@ssw0rd02.”


• Account lockout policies—Unauthorized access attempts are generally characterized by
having many unsuccessful logon attempts. Password-based security solutions should
automatically lock an account so that it cannot be used if a certain number of incorrect
logon attempts are made. Additionally, the information could be logged so that IT staff
can examine the situation. To avoid administrative overhead, an automatic unlock
process is often used. For example, after five unsuccessful logon attempts, the user must
wait 10 minutes before again attempting to access the system. These methods can
dramatically decrease the viability of brute-force attacks.
• User education—A critical but often-overlooked area related to authentication is that of
end-user education. Staff members often see security as a hindrance to getting their jobs
done, and they can sometimes work to circumvent certain measures. This attitude can
lead to significant problems that can eventually increase the vulnerability of an entire
organization’s computing resources. By informing users of the value of and power of
their network accounts, IT departments can gain allies in the process of securing systems.
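Several of the practices above (minimum length, character-class complexity, and rejecting near-duplicates of recent passwords) can be expressed as a simple validation routine. The following is an illustrative sketch only; the thresholds and the crude similarity test are assumptions, not prescribed values:

```python
import string

def check_password(candidate: str, recent=(), min_length: int = 6) -> list:
    """Return a list of policy violations (an empty list means the password passes)."""
    problems = []
    if len(candidate) < min_length:
        problems.append("shorter than %d characters" % min_length)
    # Count how many of the four character classes appear.
    classes = [
        any(c.islower() for c in candidate),
        any(c.isupper() for c in candidate),
        any(c.isdigit() for c in candidate),
        any(c in string.punctuation for c in candidate),
    ]
    if sum(classes) < 2:
        problems.append("needs at least two character classes")
    # Crude similarity test: reject a candidate that matches a recent password
    # except for a short trailing suffix (catches P@ssw0rd01 -> P@ssw0rd02).
    for old in recent:
        if candidate == old or (len(old) > 4 and candidate[:-2] == old[:-2]):
            problems.append("too similar to a recently used password")
            break
    return problems
```

A real implementation would run server-side against the full password history; this sketch only shows how the individual rules compose.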
It’s also important to note that IT departments can easily go overboard in implementing security
measures. Such Draconian tactics as requiring extremely long passwords or forcing very frequent
password changes can often work against the goal of security. Users will often choose the path of
least resistance, and may feel the need to write down their passwords in multiple places or to use
easy-to-guess phrases. As mentioned earlier, all security implementations should also take into
account usability and productivity issues. Perhaps most importantly, all of an IT environment’s
authentication policies and procedures should be documented and should be made available to
members of the organization.
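The lockout-with-automatic-unlock behavior described earlier can be sketched in a few lines. The five-attempt threshold and 10-minute wait match the example in the text, but both values are illustrative:

```python
import time

class LockoutTracker:
    """Locks an account after repeated failures; unlocks automatically after a delay."""

    def __init__(self, max_attempts: int = 5, lockout_seconds: int = 600):
        self.max_attempts = max_attempts
        self.lockout_seconds = lockout_seconds
        self.failures = {}       # username -> consecutive failure count
        self.locked_until = {}   # username -> unlock timestamp

    def is_locked(self, user: str, now=None) -> bool:
        now = time.time() if now is None else now
        return self.locked_until.get(user, 0) > now

    def record_failure(self, user: str, now=None) -> None:
        now = time.time() if now is None else now
        if self.is_locked(user, now):
            return
        self.failures[user] = self.failures.get(user, 0) + 1
        if self.failures[user] >= self.max_attempts:
            self.locked_until[user] = now + self.lockout_seconds
            self.failures[user] = 0  # reset counter; the time-based lock takes over

    def record_success(self, user: str) -> None:
        self.failures.pop(user, None)
```

In practice the failure events would also be logged for later review, as the text notes.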

Other Authentication Mechanisms


Although password-based authentication is the most ubiquitous method, other methods are also
available. The field of biometrics focuses on the task of identifying individuals based on
biological mechanisms. Fingerprint-based identification is now available at a reasonable cost and
even consumer-focused devices are available. In order for this method to work in a corporate
environment, the fingerprint readers must be readily available wherever authentication takes
place. Often, users will have to fall back to “old-fashioned” username and password
combinations, at least occasionally. Other biometric methods range from the use of voice-print
analysis to retinal scans. The major barriers to the adoption of these methods include cost and
compatibility with existing systems.
Still other authentication mechanisms involve small devices, known as secure tokens, that
generate regularly changing cryptographic values. This mechanism adds another
layer of security by ensuring that a potential user of a system is in possession of the device.
Should it be misplaced or stolen, IT departments can find out quickly and cancel old credentials.
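Hardware tokens implement this idea in various proprietary and standard ways; one widely standardized scheme is the time-based one-time password (TOTP) of RFC 6238, in which the token and the server derive the same short-lived code from a shared secret and the current time. A minimal sketch:

```python
import hashlib
import hmac
import struct
import time

def totp(secret: bytes, for_time=None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238-style time-based one-time password using HMAC-SHA1."""
    now = time.time() if for_time is None else for_time
    counter = int(now // step)                        # 30-second time window
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the code changes every interval, intercepting one value is of little use to an attacker a minute later.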


Centralized Security
So far, we’ve looked at several authentication mechanisms (with a focus on password-based
authentication). Let’s explore the process of creating and managing security credentials in a
network environment. We’ll focus on the importance of implementing a centralized user
authentication system, but first let’s look at an alternative (and the many problems it can cause).

Problems with Decentralized Security


Most new computers, operating systems (OSs), applications, and network devices have
mechanisms for maintaining their own security. For example, most switches, routers, and
firewalls can be protected through the use of a password. Applications might use their own set of
logins and permissions, and even individual computers might have their own security settings.
Figure 49 provides an overview of this security approach.

Figure 49: A logical overview of decentralized security.

The most important aspect of decentralized security is that there are many security databases
within the organization. Each one is independent of the others and contains its own
authentication information. For example, every computer might have a separate account named
“SysAdmin.” Although it’s technically possible to manually synchronize the login information
(that is, to ensure that the same usernames and passwords are used on each machine), the process
is tedious and error-prone. Furthermore, maintaining even a few of these systems can quickly
become difficult and time consuming. The end result is often that security is not maintained:
Simple passwords are used, login information is changed infrequently, and passwords are often
written down or recorded in some other way.


Although simply setting up a decentralized security environment can be painful, the real risks are
in the areas of manageability. For example, what will happen if a password is compromised?
Even if IT staff can scramble to update the passwords on multiple devices, there is still a large
window of vulnerability. The new password also has to be communicated to the users that need
it—an inherently risky proposition. What if one or more devices are overlooked and continue to
run with the exposed authentication information? And this doesn’t even take into account the
effort that might be required to ensure that other computers and services that rely upon the login
are properly updated.
In case all of this isn’t incentive enough to see the drawbacks of decentralized security, let’s look
at one more motivator before moving on: Imagine the difficulty that end users will experience if
they must manually log on to each device or application on the network. The resulting decrease
in productivity and increase in frustration might be tantamount to not having a network at all.
By now, it’s
probably obvious that decentralized security is not a very effective approach—even for the
smallest of IT organizations.

Understanding Centralized Security


In a centralized security model, all security principals (such as users and computers) are stored in
a single repository. All the devices in the environment rely upon this security database to provide
authentication services. All accounts are created and maintained once (although many different
devices might be able to perform the function). Figure 50 provides a visualization of this
approach.

Figure 50: A centralized security implementation.

It’s easy to see how this method can alleviate much of the pain of maintaining many separate
security databases. IT administrators that are responsible for maintaining security can create
accounts in the security database. And, if a password or other user setting must be changed, it
can be done centrally.
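The contrast with the decentralized model can be sketched in a few lines: every service delegates authentication to one shared store, so a change made centrally is immediately effective everywhere. This is a toy illustration of the concept, not a directory protocol:

```python
import hashlib
import hmac
import os

class CentralDirectory:
    """The single security database consulted by every service."""

    def __init__(self):
        self._users = {}  # username -> (salt, password hash)

    def set_password(self, user: str, password: str) -> None:
        salt = os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        self._users[user] = (salt, digest)

    def authenticate(self, user: str, password: str) -> bool:
        if user not in self._users:
            return False
        salt, stored = self._users[user]
        attempt = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
        return hmac.compare_digest(attempt, stored)

class Service:
    """Any device or application; it holds no credentials of its own."""

    def __init__(self, name: str, directory: CentralDirectory):
        self.name, self.directory = name, directory

    def login(self, user: str, password: str) -> bool:
        return self.directory.authenticate(user, password)
```

Changing a password once in the directory takes effect for every service that consults it; there is nothing to synchronize.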


Understanding Directory Services Solutions


Although the benefits of centralized security management are compelling by themselves, so far
we’ve only scratched the surface. Several vendors offer unified directory services solutions that
provide numerous additional advantages. One of the most popular solutions is Microsoft’s
Active Directory (AD—see Figure 51).

Figure 51: A logical overview of a Microsoft AD domain.

AD is designed to be an enterprise-wide centralized security structure that is hosted by Windows
Server-based domain controllers. Although built-in authentication mechanisms can differ,
practically all enterprise-based hardware, software, and network solutions can leverage AD for
verifying user credentials and evaluating permissions. Microsoft’s directory services solution is
based on a variety of standards and technologies, including the Lightweight Directory Access
Protocol (LDAP), Kerberos (for managing authentication tokens), and Domain Name System
(DNS).
Setting up a complete directory services infrastructure involves many components and services,
so vendors have gone to great lengths to make these systems easy to configure, deploy, and
manage. In addition to AD, other vendors offer LDAP-compliant directory services solutions.
A related standard is Remote Authentication Dial-In User Service (RADIUS), an authentication
protocol originally intended for verifying the credentials of remote users. Most of these directory services
solutions can work in conjunction with AD or by themselves.
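LDAP-based directories such as AD address each entry by a distinguished name (DN) and locate entries with search filters. The helpers below compose a DN in the conventional AD layout and a simple RFC 4515-style user filter; all of the names used are hypothetical examples:

```python
def build_dn(cn: str, ous, domain: str) -> str:
    """Compose an LDAP distinguished name, e.g. CN=...,OU=...,DC=...,DC=..."""
    parts = ["CN=%s" % cn]
    parts += ["OU=%s" % ou for ou in ous]              # most-specific OU first
    parts += ["DC=%s" % label for label in domain.split(".")]
    return ",".join(parts)

def user_filter(sam_account_name: str) -> str:
    """A simple LDAP search filter for a user entry (RFC 4515 syntax)."""
    return "(&(objectClass=user)(sAMAccountName=%s))" % sam_account_name
```

A production implementation would also escape special characters in filter values; this sketch only shows the shape of the naming scheme.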


Features of Directory Services Solutions


In addition to the important feature of providing a single central security repository, centralized
authentication solutions include many features that help simplify the management of user
authentication. Some of the features include:
• Secure authentication mechanisms—A significant challenge related to working with
password-based security is the problem of transferring password information over the
network. Even if the data is encrypted, it’s possible that replay-based attacks or man-in-
the-middle intrusions can reduce security. Modern directory services solutions use strong
authentication and key management systems such as Kerberos. Although the underlying
concepts are complex, the main benefit is that actual passwords are never sent over the
wire, which greatly reduces the risk that they can be intercepted or reverse-engineered. Best
of all, when properly implemented, these features work behind the scenes without the
intervention of IT staff.
• Cross-machine authentication—Most IT environments support at least a few dozen
computers, and many support thousands. It doesn’t take much imagination to see the
problems with forcing users to authenticate at each resource. To solve this issue,
directory services solutions work in a way that allows computers that are members of a
domain to trust each other. As long as a user has authenticated with the security domain,
the user no longer must manually provide credentials for accessing other network
resources.
• Hierarchical management—Most businesses have established departments and an
organizational structure to best manage their personnel and resources. Directory services
solutions are able to mirror this hierarchy to provide for simplified management.
Administrative containers called organizational units (OUs) are created to allow for
easily managing thousands or even millions of “objects” such as users, computers,
applications, and groups.
• Management tools—Directory services solutions generally provide well-designed
graphical tools to manage security settings and accounts. Although IT staff will have no
problem using them, some operations can even be handed down to non-IT staff (such as
managers or Human Resources staff). By delegating the management of user accounts to
trusted individuals, IT departments can ensure that their security database is kept up to
date. And, through the use of scripting and programmatic automation, many of the most
common tasks can be greatly simplified.
• Application and device support—Third-party applications and hardware devices can take
advantage of directory services solutions to authenticate users. This setup relieves
developers of the difficult task of creating secure logon mechanisms and reduces the
potential liabilities of security issues for the IT department. Furthermore, as there is
generally only a single account per user, IT departments can centrally enable, disable, or
modify permissions from within a single security database.
Though this basic list of features of directory service solutions is a long one, it only scratches the
surface of the full potential.
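The cross-machine authentication feature described above can be illustrated, very loosely, with a shared-secret token: the user authenticates once, receives a signed and time-limited token, and each member service verifies the signature rather than re-prompting for credentials. This is a toy sketch of the trust idea, not Kerberos (which never shares one key among all services):

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared key; real systems use per-service keys and a key
# distribution center rather than a single domain-wide secret.
DOMAIN_KEY = b"shared-domain-secret"

def issue_token(username: str, lifetime: int = 3600, now=None) -> str:
    """Issued once at logon; services accept it instead of a password."""
    issued = time.time() if now is None else now
    payload = json.dumps({"user": username, "exp": issued + lifetime}).encode()
    sig = hmac.new(DOMAIN_KEY, payload, hashlib.sha256).hexdigest()
    return base64.b64encode(payload).decode() + "." + sig

def verify_token(token: str, now=None):
    """Return the username if the token is genuine and unexpired, else None."""
    body, sig = token.rsplit(".", 1)
    payload = base64.b64decode(body)
    expected = hmac.new(DOMAIN_KEY, payload, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    claims = json.loads(payload)
    if (time.time() if now is None else now) > claims["exp"]:
        return None
    return claims["user"]
```

Any service holding the domain key can validate the token locally, so the user never re-enters a password while the token remains valid.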


Directory Services Best Practices


Taking advantage of directory services solutions is usually a straightforward process. There are,
however, some important aspects to keep in mind. First and foremost, enterprise IT staff should
look for management solutions, software, and hardware that work with the directory solution that
they have implemented. By leveraging the advantages of the directory, IT organizations can
lower costs and improve security. The same applies for custom software development: Internal
developers should ensure that line-of-business applications adhere to corporate IT standards and
that they work with the directory services solution.
Finally, it’s important for IT departments to develop, document, and enforce policies related to
their security implementations. Processes for creating new user accounts, handling employees
that are leaving, and performing periodic security checks are vital to ensuring the overall health
and benefit of the directory service.
Overall, directory services solutions can dramatically improve security and reduce administration
related to a difficult technical and organizational challenge—managing user authentication. This
should make them a vital part of the core infrastructure of all IT departments of any size.

Download Additional eBooks from Realtime Nexus!


Realtime Nexus—The Digital Library provides world-class expert resources that IT
professionals depend on to learn about the newest technologies. If you found this eBook to be
informative, we encourage you to download more of our industry-leading technology eBooks
and video guides at Realtime Nexus. Please visit http://nexus.realtimepublishers.com.
