The Reference Guide To Data Center Automation
Don Jones and Anil Desai

Introduction to Realtimepublishers
by Don Jones, Series Editor

For several years now, Realtime has produced dozens and dozens of high-quality books that just happen to be delivered in electronic format, at no cost to you, the reader. We’ve made this unique publishing model work through the generous support and cooperation of our sponsors, who agree to bear each book’s production expenses for the benefit of our readers.

Although we’ve always offered our publications to you for free, don’t think for a moment that quality is anything less than our top priority. My job is to make sure that our books are as good as (and in most cases better than) any printed book that would cost you $40 or more. Our electronic publishing model offers several advantages over printed books: You receive chapters literally as fast as our authors produce them (hence the “realtime” aspect of our model), and we can update chapters to reflect the latest changes in technology.

I want to point out that our books are by no means paid advertisements or white papers. We’re an independent publishing company, and an important aspect of my job is to make sure that our authors are free to voice their expertise and opinions without reservation or restriction. We maintain complete editorial control of our publications, and I’m proud that we’ve produced so many quality books over the past years.

I want to extend an invitation to visit us at http://nexus.realtimepublishers.com, especially if you’ve received this publication from a friend or colleague. We have a wide variety of additional books on a range of topics, and you’re sure to find something that’s of interest to you, and it won’t cost you a thing. We hope you’ll continue to come to Realtime for your educational needs far into the future.

Until then, enjoy.

Don Jones

Table of Contents

Introduction to Realtimepublishers
An Introduction to Data Center Automation
Information Technology Infrastructure Library
    Benefits of ITIL
        Improving Levels of Service
        Reducing IT Costs
        Enforcing Well-Defined Processes
    ITIL Framework Content Organization
    ITIL Compliance
    ITIL Content and Resources
The Business Value of Data Center Automation
    Basic Benefits of IT
    Calculating the Value of IT
        Identifying Costs
        Discovering Business Benefits
        Communicating Strategic Business Value
    Improving the Business Value of IT
        The Value of Data Center Automation
        Implementing Charge-Backs
        Enabling Better Decisions
Service Provider
    Benefits of Operating IT as a Service Provider
    Implement the Service Provider Model
        Identifying Customers’ Needs
        Determining “Product Pricing”
    Identifying Service Delivery Details
        Measuring Service Levels
        Prioritizing Projects
Network Configuration Management
    NCM Tasks
    Configuration Management Challenges
    NCM Solutions
    Benefits of Automating NCM

    Choosing an NCM Solution
Server Provisioning
    Challenges Related to Provisioning
    Server-Provisioning Methods
        Scripting
        Imaging
    Evaluating Server-Provisioning Solutions
Return on Investment
    The Need for ROI Metrics
    Calculating ROI
        Calculating Costs
        Calculating Benefits
        Measuring Risk
    Using ROI Data
        Making Better Decisions
    ROI Example: Benefits of Automation
        ROI Analysis
Change Advisory Board
    The Purpose of a CAB
        Benefits of a CAB
        Roles on the CAB
    The Change-Management Process
        Planning for Changes
        Implementing Changes
        Reviewing Changes
        Planning for the Unplanned
Configuration Management Database
    The Need for a CMDB
    Benefits of Using a CMDB
    Implementing a CMDB Solution
    Information to Track
        Server Configuration
        Desktop Configuration

        Network Configuration
        Software Configuration
    Evaluating CMDB Features
Auditing
    The Benefits of Auditing
    Developing Auditing Criteria
    Preparing for Audits
    Performing Audits
    Automating Auditing
Customers
    Identifying Customers
    Understanding Customers’ Needs
    Defining Products and Service Offerings
    Communicating with Customers
    Managing Budgets and Profitability
Total Cost of Ownership
    Measuring Costs
        Identifying Initial Capital Costs
        Enumerating Infrastructure Costs
        Capturing Labor Costs
    Measuring TCO
    Reducing TCO Through Automation
Reporting Requirements
    Identifying Reporting Needs
        Configuration Reports
        Service Level Agreement Reporting
        Real-Time Activity Reporting
        Regulatory Compliance Reporting
    Generating Reports
        Using a Configuration Management Database
        Automating Report Generation
Network and Server Convergence
    Convergence Examples

    Determining Application Requirements
    The Roles of IT Staff
    Managing Convergence with Automation
Service Level Agreements
    Challenges Related to IT Services Delivery
    Defining Service Level Requirements
        Determining Organizational Needs
        Identify Service Level Details
    Developing SLAs
    Delivering Service Levels
    The Benefits of Well-Defined SLAs
    Enforcing SLAs
    Examples of SLAs
    Monitoring and Automating SLAs
Network Business Continuity
    The Benefits of Continuity Planning
    Developing a Network Business Continuity Plan
        Defining Business Requirements
        Identifying Technical Requirements
    Preparing for Network Failover
        Configuration Management
        Managing Network Redundancy
        Simulating Disaster Recovery Operations
    Automating Network Business Continuity
Remote Administration
    The Benefits of Remote Administration
    Remote Administration Scenarios
    Remote Management Features
    Securing Remote Management
    Choosing a Remote Management Solution
Server Configuration Management
    Server Configuration Management Challenges
        Technical Challenges

        Process-Related Challenges
    Automating Server Configuration Management
        Automated Server Discovery
        Applying Configuration Changes
        Configuration Management and Change Tracking
        Monitoring and Auditing Server Configurations
        Enforcing Policies and Processes
        Reporting
    Evaluating Automated Solutions
IT Processes
    The Benefits of Processes
    Challenges Related to Process
    Characteristics of Effective Processes
    Designing and Implementing Processes
    Managing Exceptions
    Delegation and Accountability
    Examples of IT Processes
    Automating Process Management
Application Infrastructure Management
    Understanding Application Infrastructure
        Challenges of Application Infrastructure Management
        Inventorying Application Requirements
        Identifying Interdependencies
    Automating Application Infrastructure Management
        Using Application Instrumentation
        Managing Applications Instead of Devices
Business Continuity for Servers
    The Value of Business Continuity
        Identifying Mission-Critical Applications and Servers
    Developing a Business Continuity Plan for Servers
        Defining Business and Technical Requirements
    Implementing and Maintaining a Backup Site
    Automating Business Continuity

        Using a Configuration Management Database
        Change and Configuration Management
Network and Server Maintenance
    Network and Server Maintenance Tasks
        Configuration Management
        Applying System and Security Updates
        Monitoring Performance
    Implementing Maintenance Processes
        Delegating Responsibility
        Developing Maintenance Schedules
        Verifying Maintenance Operations
        The Benefits of Automation
Asset Management
    Benefits of Asset Management
    Developing Asset Management Requirements
        Identifying Asset Types
    Developing Asset Tracking Processes
    Automating IT Asset Management
        Automated Discovery
        Using a Configuration Management Database
        Integration with Other Data Center Automation Tools
        Reporting
Flexible/Agile Management
    Challenges Related to IT Management
    The Agile Management Paradigm
    Key Features of an Agile IT Department
    Automating IT Management
Policy Enforcement
    The Benefits of Policies
        Types of Policies
    Defining Policies
        Involving the Entire Organization
        Identifying Policy Candidates

        Communicating Policies
        Policy Scope
    Checking for Policy Compliance
    Automating Policy Enforcement
    Evaluating Policy Enforcement Solutions
Server Monitoring
    Developing a Performance Optimization Approach
    Deciding What to Monitor
        Monitoring Availability
        Monitoring Performance
        Verifying Service Level Agreements
    Limitations of Manual Server Monitoring
    Automating Server Monitoring
Change Tracking
    Benefits of Tracking Changes
    Defining a Change-Tracking Process
        Establishing Accountability
        Tracking Change-Related Details
    Automating Change Tracking
Network Change Detection
    The Value of Change Detection
        Unauthorized Changes
        Manual Change Tracking
    Challenges Related to Network Change Detection
    Automating Change Detection
        Committing and Tracking Changes
        Verifying Network Configuration
Notification Management
    The Value of Notifications
        Managing Internal Notifications
        Managing External Notifications
    Creating Notifications
        What to Include in a Notification

        What to Avoid in a Notification
    Automating Notification Management
Server Virtualization
    Understanding Virtualization
        Current Data Center Challenges
        Virtualization Architecture
        Virtualization Terminology
    Benefits of Virtualization
    Virtualization Scenarios
    Limitations of Virtualization
    Automating Virtual Machine Management
Remote/Branch Office Management
    Challenges of Remote Management
        Technical Issues
        Personnel Issues
        Business Issues
    Automating Remote Office Management
Patch Management
    The Importance of Patch Management
    Challenges of Manual Patch Management
    Developing a Patch Management Process
        Obtaining Updates
        Identifying Affected Systems
        Testing Updates
        Deploying Updates
        Auditing Changes
    Automating Patch Management
        Benefits of Automated Patch Management
        What to Look for in Patch Management Solutions
Network Provisioning
    Defining Provisioning Needs
        Modeling and Testing Changes
        Managing Device Configurations


        Auditing Device Configurations ... 128
    Using a Configuration Management Database ... 128
    Additional Benefits of Automation ... 128
Network Security and Authentication ... 129
    Understanding Security Layers ... 129
    Choosing a Network Authentication Method ... 130
        Security Protocols ... 130
        Authentication Mechanisms ... 130
        Authorization ... 131
    Automating Security Management ... 131
Business Processes ... 132
    The Benefits of Well-Defined Processes ... 132
    Defining Business Processes ... 132
        Deciding Which Processes to Create ... 133
        Identifying Process Goals ... 133
        Developing Processes ... 134
        Documenting Business Processes ... 134
        Creating “Living” Processes ... 135
    Automating Business Process Workflow ... 135
Business Process Example: Service Desk Processes ... 136
    Characteristic of an Effective Process ... 136
    Developing a Service Desk Operation Flow ... 136
        Documenting Workflow Steps ... 137
        Tracking and Categorizing Issues ... 137
        Escalation Processes and Workflow ... 138
        Creating a Service Desk Flowchart ... 138
    Automating Service Desk Management ... 139
Executive Action Committee ... 140
    Goals of the Executive Action Committee ... 140
        Evaluating Potential Projects ... 140
    Defining Committee Roles and Members ... 142
    Implementing an Executive Action Process ... 142
Centralized User Authentication ... 143


    Major Goals of Authentication ... 143
    Authentication Mechanisms ... 143
        Strengthening Password-Based Authentication ... 144
        Other Authentication Mechanisms ... 145
    Centralized Security ... 146
        Problems with Decentralized Security ... 146
        Understanding Centralized Security ... 147
    Understanding Directory Services Solutions ... 148
    Features of Directory Services Solutions ... 149
    Directory Services Best Practices ... 150


Copyright Statement
© 2006 Realtimepublishers.com, Inc. All rights reserved. This site contains materials that have been created, developed, or commissioned by, and published with the permission of, Realtimepublishers.com, Inc. (the “Materials”) and this site and any such Materials are protected by international copyright and trademark laws. THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. The Materials are subject to change without notice and do not represent a commitment on the part of Realtimepublishers.com, Inc or its web site sponsors. In no event shall Realtimepublishers.com, Inc. or its web site sponsors be held liable for technical or editorial errors or omissions contained in the Materials, including without limitation, for any direct, indirect, incidental, special, exemplary or consequential damages whatsoever resulting from the use of any information contained in the Materials. The Materials (including but not limited to the text, images, audio, and/or video) may not be copied, reproduced, republished, uploaded, posted, transmitted, or distributed in any way, in whole or in part, except that one copy may be downloaded for your personal, noncommercial use on a single computer. In connection with such use, you may not modify or obscure any copyright or other proprietary notice. The Materials may contain trademarks, services marks and logos that are the property of third parties. You are not permitted to use these trademarks, services marks or logos without prior written consent of such third parties. Realtimepublishers.com and the Realtimepublishers logo are registered in the US Patent & Trademark Office. All other product or service names are the property of their respective owners. 
If you have any questions about these terms, or if you would like information about licensing materials from Realtimepublishers.com, please contact us via e-mail at info@realtimepublishers.com.


The Reference Guide to Data Center Automation [Editor's Note: This eBook was downloaded from Realtime Nexus—The Digital Library. All leading technology guides from Realtimepublishers can be found at http://nexus.realtimepublishers.com.]

An Introduction to Data Center Automation
Over time, organizations have placed increasingly heavy demands on their IT departments. Although budgets are limited, end users and other areas of the business rely increasingly on computing resources and services to get their jobs done. This situation raises the important issue of how IT staff can meet these demands in the best possible way. Despite the importance of IT in strategic and tactical operations, many technical departments are run in an ad-hoc and reactive way. Often, issues are addressed only after they have ballooned into major problems, and support-related costs can be tremendous. From the end-user standpoint, IT departments can never react quickly enough to the need for new applications or changing requirements. Clearly, there is room for improvement.

This guide explores data center automation: methods through which hardware, software, and processes can work together to streamline IT operations. Modern data centers face increasing demands from business units and have only limited resources with which to address those demands. This guide focuses on topics in the following major areas:

• Business processes and frameworks: The fundamental purpose of IT is to support business operations and to enable end users and other departments to perform their functions as efficiently as possible. IT departments face many common challenges, and various best practices have been developed to provide real-world recommendations for ways to manage IT infrastructures. From a business standpoint, the specifics include establishing policies and processes and implementing the tools and technology required to support them.
• IT as a service provider: The perceived role of IT can vary dramatically among organizations. One approach that helps IT managers better meet the needs of users is to view IT as a service provider. In this approach, the “customers” are end users who rely upon IT infrastructure to accomplish their tasks. This method can help in the development of Service Level Agreements (SLAs) and IT processes, and it can better communicate the business value that IT organizations provide.
• Agile management: Modern IT environments are forced to constantly change in reaction to new business requirements. In the early days of IT, it was quite common for network administrators, server administrators, and application administrators to work in isolated groups that had little interaction. These boundaries have largely blurred due to the increasing interdependencies of modern applications. With this convergence of servers and networks come new management challenges that require all areas of a technical environment to work in concert.
• Network and server automation: The building blocks of IT infrastructure are servers and network devices. In an ideal world, all these complex resources would manage themselves. In the real world, significant time and effort are spent in provisioning and deploying resources, managing configurations, monitoring performance, and reacting to changes. All these operations are excellent opportunities for labor-reducing automation.

Through each of the topics in this guide, we’ll cover important terms and concepts that will enable IT departments to perform more tasks with fewer resources. The importance and value of automating standard IT operations can be significant in data centers of any size. The goal is to significantly lower IT operational expenses while at the same time improving the end-user experience. Whether you’re a CIO or IT manager looking for ways to improve efficiency or a member of the down-in-the-trenches IT staff, you’ll find valuable concepts, methods, and techniques for better managing your computing infrastructure.

Information Technology Infrastructure Library
The Information Technology Infrastructure Library (ITIL, http://www.itil.co.uk/) is a collection of IT-related best practices. It is developed and maintained by the United Kingdom Office of Government Commerce (OGC). ITIL was created to address the lack of standard recommendations for managing IT resources. The goal of ITIL is to provide a framework and guidelines that allow IT organizations to deliver high-quality services in a manageable way. The original content was developed in the late 1980s and continues to be updated with improved recommendations to support modern IT environments. The material is copyrighted by the UK OGC, and the information is available in a variety of formats. ITIL has become one of the most popular standards for IT-related best practices worldwide and is currently being used by thousands of IT organizations.

Benefits of ITIL
Many IT organizations tend to operate in an ad-hoc and reactive fashion. They often respond to issues after they occur, leading to problems such as downtime and lower quality of service (QoS). In many cases, this scenario is understandable, as IT organizations are often faced with providing increased levels of service with insufficient staff and minimal budgets. Many organizations either cannot afford to provide additional resources to IT departments or cannot justify increased investment. On the surface, this problem might seem very difficult to solve. However, one approach, increasing overall efficiency, can improve IT service delivery without requiring significant expenditures. This is where the implementation of IT management best practices comes in.

The recommendations included in ITIL were developed based on studies of methods used by successful IT organizations worldwide. These approaches to solving common IT problems have been compiled and organized into a set of recommendations. Although implementing ITIL practices can take time and effort, most organizations will find that the potential benefits clearly justify the cost. The following sections look at some of the potential ways in which implementing ITIL practices can benefit IT operations.


Improving Levels of Service
The quality of an IT organization is often measured by its ability to respond to business-related change requests and to provide reliability, availability, and performance. Many IT organizations do not have an organized process for responding to new issues and requests, and often several of the requests “fall through the cracks.” ITIL prescribes ways in which organizations can improve the reporting and management of problems and incidents. It helps IT organizations define how particular problems should be addressed and how to communicate with end users. By better managing these aspects of service delivery, IT departments can often identify potential areas for improvement.

Reducing IT Costs
Many IT departments suffer from inefficiencies that lead to increased costs. Problems caused by lack of communication, poor issue tracking, and ad-hoc changes can add up quickly. Often, IT managers are unaware of the true costs of purchasing capital assets, configuring and deploying new equipment, and maintaining this equipment. ITIL best practices include methods for calculating true costs and for translating this information into business-related terms. This information can then be used to make a strong case for investments in automation and other labor-saving technologies.

Enforcing Well-Defined Processes
Policies and processes are crucial to a well-managed environment. When policies are implemented and enforced, IT management can ensure that issues are dealt with consistently and completely. ITIL recommendations provide suggestions for designing and implementing successful processes. Often, it seems that no matter how quickly responses are handled, users’ expectations are higher. Through the use of SLAs, IT departments can communicate to users the type of response they should expect for various problems. Developing an SLA is easier when service delivery is managed through clearly defined processes.

ITIL Framework Content Organization
The ITIL content spans hundreds of pages. To present the information in a more manageable way, the content is divided into eight sets, each of which focuses on a specific portion of the total framework. Figure 1 provides an overview of the different sets and how they’re related. The most important point is that the box on the left represents business requirements, and the box on the right represents the actual technology. The ITIL framework focuses on the content in between: the ways in which technology can be used to meet business goals.


Figure 1: An overview of the ITIL framework.

Each set covers specific topics:

• Service Support (ISBN 0113300158): An important aspect of IT operations is determining how services are provided and how changes are managed. The beginning of service operations is usually a request for a change from an end user, and the process involves communicating with a service desk. It is the service desk’s responsibility to ensure that the issue is documented and eventually resolved. Specifically, this area includes problem management, incident management, configuration management, and help desk operations.
• Service Delivery (ISBN 0113300174): Service Delivery focuses on defining and establishing the types of services and infrastructure an IT department must provide to its “customers.” Topics include creating SLAs, managing overall capacity, developing availability goals, and determining financial management methods. These topics are particularly useful for identifying the purpose and business value of IT. Both Service Support and Service Delivery are parts of the overall Service Management topic.
• Planning to Implement Service Management (ISBN 0113308779): Many organizations quickly realize the value of using the ITIL approach and want to know how best to move to this model. As few IT organizations have the luxury of starting completely from scratch, it’s important to understand how to migrate to ITIL recommendations. This set provides details about how an organization can develop a plan for implementing the best practices suggested within the ITIL framework. It includes information about justifying the use of ITIL (potential benefits). This area is an excellent starting point for IT managers who are prepared to “sell” their organizations on the value of the ITIL approach.
• Security Management (ISBN 011330014X): In recent years, computer security has moved to the forefront of issues for technical staff. As businesses store and provide larger amounts of information, protecting that data has become a critical part of operations. This set focuses on best practices for managing security throughout an IT organization.
• ICT Infrastructure Management (ISBN 0113308655): The term Information and Communications Technology (ICT) refers to traditional computer-based resources such as workstations and servers as well as the applications that they run (for example, office productivity suites, accounting packages, and so on). The acronym ICT (which is not widely used in the United States) generally refers to the end purpose of IT infrastructure. This volume focuses on managing computing resources, including network service management, operations management, and the installation and management of computing resources.
• The Business Perspective (ISBN 0113308949): It is important for both business leaders and technologists to understand the overall benefits that can be provided by IT. This set focuses on ways in which IT can meet requirements through managing changes, establishing business continuity practices, and working with outside help through outsourcing. These topics are all critical to the business value of an IT environment.
• Application Management (ISBN 0113308663): The primary purpose of IT infrastructure is to support the software that is required by users to perform their job functions. This set covers best practices related to managing the entire application life cycle, beginning with gathering and documenting the business requirements for the software. Although this topic is particularly helpful for organizations that develop custom applications, the practices are also useful for evaluating and implementing third-party products.
• Software Asset Management (ISBN 0113309430): Managing applications throughout an entire IT environment can be a daunting and time-consuming task. Furthermore, the process must be ongoing, as new programs are frequently added or updated. This set describes best practices for creating an inventory of software applications and managing the installed base. The topic enables IT to accurately track licensing compliance and to ensure that purchasing does not exceed requirements.

The content is applicable to many different levels within an IT organization, ranging from CIOs to systems administrators; it can also be helpful for business management professionals. From an implementation standpoint, the ITIL framework is intended to provide a set of flexible guidelines. There is definitely room for interpretation of the specific best practices, and it’s up to IT management to determine the best way to implement the recommendations. It is important to note that many of these areas are interrelated and the ideal infrastructure will take advantage of all the best practices presented in the framework.


ITIL Compliance
In some cases, organizations might find that they’re already following at least some of the ITIL practices (regardless of whether they have consciously done so). Using ITIL’s methodology and recommendations can give structure to these efforts. In other cases, IT departments may be able to benefit greatly from implementing the features of the framework. Unlike some other business-related standards, there is no official certification or testing process that can “approve” an organization’s use of ITIL. It is up to organizations to determine the best methods and approaches for implementing these practices in their environments. There are, however, voluntary compliance certificates. These are known as the Foundation Certificate, the Practitioner’s Certificate, and the Manager’s Certificate (see Table 1 for more information about the certifications).

ITIL Content and Resources
The ITIL content is copyrighted, and can be obtained through books, CD-ROMs, or licensed intranet content. Many online book resellers offer the books and related media (they’re easiest to find using the ISBNs listed earlier in this topic). Table 1 provides a list of good online starting points for additional information. Additionally, numerous independent books and papers have been written. Each of these focuses on one or more of the topics presented by the ITIL framework. A web search for “ITIL” or any of the specific content topics will also uncover numerous vendors and publishers that offer related content. In addition, professionals who are looking for more information can join one of many different online forums and professional organizations dedicated to the ITIL methodology.
| Web Site | Notes | URL |
|---|---|---|
| Office of Government Commerce ITIL Information site | Provides an overview of the purpose and function of ITIL. | http://www.ogc.gov.uk/index.asp?id=1000367 |
| ITIL “Open Guide” | An open source-based version of basic ITIL content. The site provides resources that help define and organize the various terms and concepts used by the ITIL framework. | http://itlibrary.org/ |
| IT Service Management Forum (itSMF) | An independent, not-for-profit organization that focuses on IT best practices. | http://www.itsmf.com/index.asp |
| ITIL Community Forum | A portal for ITIL-related information, including a discussion forum and links to various ITIL resources. | http://www.itilcommunity.com/ |
| ITIL Certification Register | A voluntary registration site for IT professionals who use the ITIL methodology. | http://www.itlibrary.org/index.php?page=ITIL_Certification_Register |

Table 1: ITIL-Related Web Sites.


The Business Value of Data Center Automation
Over time, modern businesses have grown increasingly reliant on their IT departments. Networked machines, multi-tier applications, and Internet access are all absolute requirements in order to complete mission-critical work. However, in many organizations, the clear business value of IT is difficult to estimate. Unlike departments such as sales and marketing, there are often few metrics available for quantifying how IT benefits the bottom line. Part of the reason for this disparity is that IT departments have evolved out of necessity and have a history of filling a utilitarian role. Instead of presenting clear business value propositions, they tend to grow as needed and react to changing business requirements as quickly as possible. In many cases, this situation has caused IT budgets to shrink even while organizations are placing a greater burden on IT staff. Furthermore, business units often see IT as out of touch with the rest of the business.

To ensure success for modern companies, it’s critical that all areas of the business recognize common goals and that all contribute toward achieving them. It’s difficult to deny the basic business value of IT departments, but the quandary that emerges revolves around how to measure, quantify, and communicate those benefits to business decision makers. This guide looks at the specific business benefits of IT, including details related to measuring benefits and costs. It then explores how data center automation can help increase the overall value that IT departments provide to their organizations.

Basic Benefits of IT
Practically everything that a business does relies upon the business’ underlying computing infrastructure. Accordingly, IT departments’ internal “customers” expect a certain level of service. They depend upon IT to perform various functions, including:

• Maintaining the infrastructure: If asked what their IT departments do, many end users would point to the computing infrastructure: setting up workstations and servers, keeping systems up-to-date, and installing and managing software. Reliable and high-performance Internet connectivity has become almost as vital as electricity; without the Internet, many business functions would cease. IT is responsible for implementing and maintaining an efficient infrastructure that supports these requirements.
• Reacting to business changes: New business initiatives often place new (or at least different) requirements on the computing infrastructure. For example, a new marketing campaign might require new applications to be deployed and additional capacity to be added. Alternatively, an engineering group might require a new test environment in order to support the development of a new product. Usually, there is an organized process to be followed whenever an employee starts or leaves the company. These changes often need to be made as quickly as possible and in a cost-efficient manner.
• Troubleshooting: From the help desk to critical network and server support, the service desk is often the first point of contact with IT for users who are not able to do their jobs. Users rely on these resources to quickly and efficiently resolve any issues that arise.

These benefits of IT generally point to tactical operations, that is, performing maintenance-related operations. When enumerating the benefits of IT, often the first metrics that come to mind are those involving reliability, availability, and performance. Although these are certainly important considerations, they do not necessarily demonstrate the strategic advantage of how IT initiatives and projects can contribute to the bottom line. Consequently, it’s easy to look at IT as just a cost center. Regardless of whether end users realize it, IT departments do much to help their organizations.

Calculating the Value of IT
As with any business department, it’s important for management at all levels to see the benefits that are provided to the business as a whole. In some cases, this can be quite simple. For example, there are many metrics that can be used to measure sales and marketing performance. Most organizations realize that in addition to these other business areas, IT is a vital portion of operations. To calculate the business value of IT, organizations should establish well-defined metrics that reflect the overall business benefit of the computing infrastructure. The information required to do so extends far beyond the boundaries of the IT organization. Instead, it must involve all areas of the business, ranging from end users to executive management. The goal is to demonstrate how IT affects the business.

Identifying Costs
An important consideration for IT management is to be able to calculate and clearly communicate the real costs and benefits of the services that they provide. This identification usually starts with determining the Total Cost of Ownership (TCO) of a specific portion of the infrastructure. Often, when business leaders think of the costs related to increasing capacity, they think only of capital expenditures (such as the purchase price of a workstation or a server). In most environments, however, this cost represents only a very small portion of the total cost. IT departments must add in network-related costs, labor costs (for installation, configuration, and management), software licensing costs, and depreciation. Often, just the act of collecting this information can provide visibility into an IT department’s purpose and structure. It can also be very useful for identifying areas of improvement. Most importantly, however, when true costs are communicated, other areas of the business can begin to understand how their operations affect the overall finances of the company.
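
To make the TCO idea concrete, the following minimal Python sketch adds up the cost categories named above for a single server. The `ServerCosts` fields, the straight-line treatment of the purchase price, and every dollar figure are illustrative assumptions rather than values from this guide; a real model would use an organization's own accounting rules.

```python
from dataclasses import dataclass


@dataclass
class ServerCosts:
    """Hypothetical yearly cost inputs for one server (all figures invented)."""
    purchase_price: float        # capital expenditure
    useful_life_years: int       # assumed straight-line depreciation period
    network_per_year: float      # network-related costs
    labor_hours_per_year: float  # installation, configuration, management
    labor_rate_per_hour: float
    licensing_per_year: float    # software licensing


def annual_tco(costs: ServerCosts) -> float:
    """Sum the yearly cost categories, spreading the purchase price over its life."""
    depreciation = costs.purchase_price / costs.useful_life_years
    labor = costs.labor_hours_per_year * costs.labor_rate_per_hour
    return depreciation + costs.network_per_year + labor + costs.licensing_per_year


def capital_share(costs: ServerCosts) -> float:
    """Fraction of the yearly total that comes from the purchase price alone."""
    return (costs.purchase_price / costs.useful_life_years) / annual_tco(costs)


example = ServerCosts(
    purchase_price=4000.0,
    useful_life_years=4,
    network_per_year=1000.0,
    labor_hours_per_year=100.0,
    labor_rate_per_hour=60.0,
    licensing_per_year=2000.0,
)

if __name__ == "__main__":
    print(f"Annual TCO: {annual_tco(example):.2f}")
    print(f"Capital share: {capital_share(example):.0%}")
```

With these made-up inputs, labor dominates the total and the hardware purchase is a small fraction of it, which is the pattern the paragraph above describes.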

Discovering Business Benefits
Members of IT organizations have a tendency to think of the value of their services from a technical standpoint. It’s easy to look at server racks or performance statistics as evidence of a job well done. However, the best measurements of the value of IT involve the real impacts these measures have had on the business. For example, suppose that a new test lab environment has helped the Quality Assurance department reduce testing time by 25 percent; this metric is an important one for business leaders to recognize. Similarly, if the implementation of new antispam measures has increased productivity (if through nothing else than decreasing the negative productivity impact of spam), it’s important to capture this information.

Enumerating business benefits requires strong communications with other areas of the organization. A good first step is to identify which areas of the business are directly benefiting from technology. IT leaders must understand how new infrastructure components such as servers and workstations are being used. They must be sure that the implemented solutions closely fit the problem. Based on this data, IT leaders can establish metrics related to employee productivity and business results (such as sales improvements) that tie directly back to IT initiatives and projects.

Communicating Strategic Business Value
Once the costs and benefits have been identified, business and technical leaders can realize how IT performs a strategic function, not just an operational one. In order to communicate strategic business value, IT departments should focus on overall business goals. For example, a key goal for a software company might be to reduce development time and shorten release cycles. The implementation of a new server or technology (such as server virtualization) can often provide dramatic benefits. If mobile sales personnel are having problems entering orders while on the road, improvements to the network infrastructure and better training might help alleviate the pain.

Improving the Business Value of IT
Once IT is identified as a critical part of business operations, the entire organization can work together to improve overall value. This step often starts with the planning phases for new projects and new initiatives. When business leaders can clearly see the benefits of investing in their IT departments, overall business performance improves. Decisions should be based on cost-benefit analysis calculations. Although this task might seem simple on the surface, it can actually require a significant amount of information. The costs related to processes should be as accurate as possible and should be based on capital asset costs (including servers, workstations, network devices, and software) as well as personnel costs. Additionally, there can be many “hidden fees,” including opportunity costs. Because IT resources are often stretched to the limits, a new project or initiative might result in labor and resource reductions for normal operations. All of these costs and potential tradeoffs should be clearly communicated to business decision makers so that they can make informed decisions about projects. When taking on new projects and initiatives, departments can work together with IT to determine the best approach.

The Value of Data Center Automation
So far, we’ve seen how a major component of overall IT costs and overall service levels relates to labor. It takes time and effort to maintain even small IT environments, and these factors can clearly affect the bottom line. One initiative that can provide clear benefits and a quick return on investment is data center automation. Data center automation solutions can dramatically reduce the cost of one of the most expensive resources: labor. Tools and features that allow for automated deployment, provisioning, change management, and configuration tracking provide an excellent payoff.


For example, a common challenge for most IT environments is that of keeping systems up to date. Managing security patches and other software changes can easily use up large amounts of time. Furthermore, the process tends to be error-prone: It’s easy for systems administrators to accidentally overlook one or a few systems. Through the use of data center automation, the same tasks can be performed in much less time with far less involvement from IT staff. This approach provides numerous benefits, including freeing systems administrators to work on other tasks. Often, automation increases the server-to-administrator ratio and reduces the amount of time required to perform operations. Other benefits include improved consistency, the enforcement of policies and processes, and improved security. Additionally, implementing best practices (such as those provided with ITIL) can improve efficiency and operational reliability.

The bottom line is that data center automation can significantly improve the business value of IT. By reducing costs and removing data center-related bottlenecks, data center automation enables IT and business leaders to focus on more important tasks. The entire organization will be able to react more quickly and surely to changes, providing both strategic and tactical advantages to the entire enterprise.

Implementing Charge-Backs

A major problem for some IT organizations is that various departments often compete for infrastructure resources such as new servers or workstations. IT managers are often in the difficult position of deciding which projects are approved based on their limited resources and budgets. This can lead to an adversarial relationship and to some less-than-ideal decisions.

One potential solution is to implement a system of charge-backs. In this system, the IT department would pass costs for various projects back to the departments that request them. The charges would affect these departments’ bottom lines. The idea is that business leaders will be much more judicious in their decisions when they directly experience the costs to the business. Although implementing and managing charge-backs can increase administration overhead, the overall cost savings can justify it. Of course, this system can succeed only with cooperation from the entire organization.

Enabling Better Decisions

IT can leverage business value data to help the entire organization make better decisions. For example, when considering ways in which to improve organizational efficiency, IT initiatives can play a pivotal role in controlling costs and adding capabilities. A well-managed IT department will have standards and processes in place to ensure that all aspects of the environment are properly managed. This can help answer important questions, such as “Are resources being allocated optimally?” and “Are the right projects being worked on?” With this new view, businesses can clearly see the IT department as a strategic partner instead of just a cost center.
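The charge-back approach described under Implementing Charge-Backs can be illustrated with a small calculation. The following is a minimal sketch, not a prescribed method: the project records, department names, and the hourly labor rate are all hypothetical values chosen for illustration, and the charge is assumed to be simply capital cost plus labor hours multiplied by a single blended rate.

```python
from collections import defaultdict

# Hypothetical project records: (requesting department, capital cost, labor hours).
PROJECTS = [
    ("Sales", 12000.0, 40),
    ("Engineering", 30000.0, 120),
    ("Sales", 3000.0, 10),
]

# Assumed blended IT labor rate in dollars per hour (illustrative only).
HOURLY_LABOR_RATE = 75.0


def charge_back(projects, hourly_rate):
    """Return the total charge per department: capital cost plus labor cost."""
    totals = defaultdict(float)
    for department, capital_cost, labor_hours in projects:
        totals[department] += capital_cost + labor_hours * hourly_rate
    return dict(totals)


charges = charge_back(PROJECTS, HOURLY_LABOR_RATE)
for department, amount in sorted(charges.items()):
    print(f"{department}: ${amount:,.2f}")
```

A real charge-back scheme would likely need more detail (for example, shared infrastructure costs and differing labor rates), but even a simple per-department total like this makes the cost of each request visible to the department that makes it.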


Service Provider
Modern organizations often rely upon many vendors and outside resources to meet business objectives. For example, a marketing group might recruit outside talent to develop a Web site or to work on creative aspects of a new campaign. Alternatively, engineering groups might rely on outsourcing to contractors or consultants to build a portion of a product. IT departments, by contrast, are often seen as cost centers that provide only basic infrastructure services. When an organization treats its IT department as a service provider, however, a strategic relationship can be established, and IT can be seen as a business partner.

Benefits of Operating IT as a Service Provider

The value of a service provider is often measured by its ability to help its customers reach their goals. In this arena, customer service is most important. When an IT department serves its customers under this arrangement, both sides can work together to ensure that the best projects and solutions (those that provide the most value to the individual business units) are delivered.

When IT works as a service provider, it should act like an independent business. Its “customers” are the end users and departments that it serves, and its “products” are the services and technology solutions that are provided for use by the customers. Although this concept might at first seem like a strange approach for an internal department, there are many potential benefits. First, IT services are better communicated so that end users know what to expect (and what to do if expected service levels are not met). Second, all areas of the business can see how IT operations are helping them achieve their objectives.

Implement the Service Provider Model

There are several aspects that must be taken into consideration before an internal IT department can be seen as a business partner. This section will look at parts of the overall approach of becoming a service provider to internal customers.


Identifying Customers’ Needs

A good salesperson will always work hard to determine a customer’s needs. Salespeople who truly believe in their products can quickly identify which are relevant and which will provide the greatest benefit. For IT as a service provider, this process can start with meetings with individual department leaders as well as end users who might have specific requirements. The overall goal for the service provider is to focus on the business goals of the customer, and not on technology itself.

The first step is to identify the primary purpose of the department. This includes details related to how success is measured and the approaches used to achieve it. The details will likely differ dramatically between, for example, sales and engineering organizations. Next, it is important to identify current “pain points”: problems or limitations that are reducing the level of success. Based on this input, IT service providers can develop proposed solutions that address those issues.

As with a pre-sales effort, it’s important for IT to gather as much information as possible early in the game, well before any implementation has been discussed. During this phase, it’s important to identify functionality that is absolutely required as well as items that are not required but would be nice to have. If there is any ambiguity at this point, details and risks should be identified. Important high-level questions to ask include whether the benefits justify the costs and whether business demands truly present a need for the solution.

Determining “Product Pricing”

IT organizations should come up with complete pricing for their products and solutions. This pricing scheme should include details related to capital asset charges (including hardware, software, and network costs) as well as labor-related costs. Higher costs might be incurred by using non-standard hardware or software. Presenting such costs will help the customer determine whether a particular solution is cost-effective for their department and whether it benefits the organization as a whole. Additionally, other factors (such as a lack of personnel or competing high-priority projects that are already underway) should also be communicated to the customer.

Identifying Service Delivery Details

Once a customer has agreed to purchase a specific product or service from the IT department, it’s time to look into the implementation details. It’s important to identify the key stakeholders and to establish points of contact on the IT side and on the customer side. The goal should be to identify who is responsible for which actions. Milestones should be designed and mutually agreed upon before moving forward. Also, processes for managing changing requirements will help eliminate any surprises during the implementation of the solution. For larger projects, a change management process should be created, complete with approval authority from the customer and the service provider.


Measuring Service Levels

An IT service provider can create products of various types. Some might be closed-ended initiatives, such as the installation of a Customer Relationship Management (CRM) solution, or the expansion of a development test lab. In those cases, service levels can be measured based on milestones and the quality of the implementation. Stakeholders can sign off on project completion just as they would with external vendors.

Other products might involve expected levels of service. For example, when new servers and workstations are added, customers should know what type of response to expect when problems occur. Service Level Agreements (SLAs) can be instrumental in developing mutually agreed-upon expectations. For less-critical systems, longer turnaround times might be acceptable. For mission-critical components, greater uptime and quicker response might be justified. Of course, those services will likely come at a higher cost because they will involve additional staff allocation, the purchase of high-availability solutions, and other features.

Prioritizing Projects

All businesses are constrained by limits on how much they can produce, and IT departments are no exception. Based on labor capacity and technical constraints, only some of the proposed projects might prove to be feasible. In the traditional IT arrangement, departments often have to compete for infrastructure resources. Often IT departments are faced with the difficult situation of deciding which projects should continue and which simply cannot be taken on. However, when IT works as a service provider, the vendor and customer can work together to determine what is best for the business overall. If a particular implementation is extremely costly, both can decide to hold off until more resources become available. However, if multiple projects are similar and efficiency can be gained by combining them, the business will see an overall benefit.
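To make the service-level idea from Measuring Service Levels concrete, the following minimal sketch checks recorded incident response times against per-tier targets. Everything here is an assumption made for illustration: the two tier names, the target values in minutes, the `Incident` record, and the sample data are hypothetical and would come from an organization's own SLAs and ticketing data.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    system: str
    tier: str                # hypothetical tiers: "mission-critical" or "standard"
    response_minutes: float  # time until IT first responded


# Assumed response-time targets per tier, in minutes (illustrative only).
SLA_TARGETS = {"mission-critical": 15, "standard": 240}

# Hypothetical incident history.
INCIDENTS = [
    Incident("order-db", "mission-critical", 10),
    Incident("order-db", "mission-critical", 20),
    Incident("intranet-wiki", "standard", 100),
    Incident("intranet-wiki", "standard", 200),
]


def sla_compliance(incidents, targets):
    """Return, per tier, the fraction of incidents answered within the target."""
    met, total = {}, {}
    for incident in incidents:
        total[incident.tier] = total.get(incident.tier, 0) + 1
        if incident.response_minutes <= targets[incident.tier]:
            met[incident.tier] = met.get(incident.tier, 0) + 1
    return {tier: met.get(tier, 0) / count for tier, count in total.items()}


report = sla_compliance(INCIDENTS, SLA_TARGETS)
for tier, fraction in report.items():
    print(f"{tier}: {fraction:.0%} of incidents met the response target")
```

Reporting compliance per tier reflects the point made above that mission-critical components may justify quicker response than less-critical systems.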

Network Configuration Management
When things are working properly, most users barely realize that the network is there. But when network problems cause downtime, the costs to business operations can be tremendous. Still, IT organizations are faced with the difficult task of managing increasingly complex and distributed networks with limited staff and resources. Although configuring and managing network devices is a task of critical importance, it can be very difficult to perform accurately and consistently. Network Configuration Management (NCM) refers to the use of an automated method to configure and manage network devices throughout an IT environment.


NCM Tasks

The act of managing the components of a network can place a significant burden on IT staff. The process starts with the deployment of new routers, switches, firewalls, and other devices. New hardware has to be purchased and configured before it’s brought online. The deployment must be tested, and network administrators must verify that it is working according to the network guidelines.

And that’s just the beginning. Maintenance operations include regularly updating to the latest available security patches. Other routine maintenance functions involve changing passwords and updating configurations. Business-related changes can often require significant upgrades or modifications to the network infrastructure, and adding capacity is a regular task in growing organizations. The goal of configuration management is to rapidly respond to change requests that range from opening a single firewall port to redesigning entire subnets, without introducing new problems to the environment.

Configuration Management Challenges

There are many challenges that are related to managing network configurations. Some of these challenges include:

• Making configuration changes: In all but the smallest of network environments, manually modifying configuration settings on dozens or hundreds of devices is a tedious, time-consuming, and error-prone task.
• Enforcing processes: In many IT environments, it’s very easy to perform technical operations in an ad-hoc manner. Due to the stress and pressure of reacting to business demands, network administrators often take shortcuts and directly make modifications. Although this can lead to what seems like a quick response, it can lead to serious problems in network configurations later. Clearly, processes must be enforced.
• Adhering to best practices: Network security best practices include frequently changing passwords, ensuring that patches are applied quickly, and monitoring devices for suspicious activity. Often, due to time and resource limitations, these tasks are lowered in priority. However, ensuring consistent configurations and adhering to change control processes are critical for reliability of the network.
• Communication and coordination: Network administrators might understandably make a change to resolve an urgent situation. Once the situation is resolved, however, they might fail to communicate this to their peers. Should a problem occur in the future, this can complicate tracking down the root cause of the issue. Distributed administration can also cause problems. Although it’s often necessary for multiple network administrators to have access to the same devices, when two or more administrators modify a device, they may inadvertently overwrite each other’s changes. Such “collisions” can lead to complex problems that are difficult to troubleshoot.

Regardless of the amount of work involved, IT departments are often limited in labor resources to perform these tasks. That is where automation steps in.


NCM Solutions

Automated NCM solutions can help address many of the challenges related to maintaining a network infrastructure. The key feature of an automated NCM solution is that all modifications are made through the system. Ideally, network administrators do not have direct access to the actual device configurations themselves. All modifications must occur based on a specific workflow, and changes are tracked for later review (see Figure 2).

Figure 2: Configuration management using an NCM solution.

Benefits of Automating NCM

The list of benefits related to using an automated NCM solution is a long one. Specifically, NCM solutions provide:

• Improved efficiency: Manually configuring routers, switches, and other devices can take a significant amount of effort. Some important changes might simply require too much time and effort and may never be performed at all. Automated solutions can handle changes for hundreds of devices without requiring manual intervention. Network administrators can use the time that they save to focus on other tasks that better use their skills. The end result is that the network infrastructure can be more reactive to business changes, and costs can be lowered.
• Policy enforcement: In a manually managed environment, it’s up to each individual to be responsible for adhering to processes. It’s often difficult to remember all the processes, and in some cases, network administrators might take shortcuts. Related problems can be difficult to isolate and resolve. Through the use of automated configuration management, IT managers can be assured that all changes are coordinated, tracked, and done in accordance with the defined policies.
• Automated network discovery: Modern networks tend to be very complex and have hundreds or even thousands of devices that must be accounted for and managed. It’s understandably easy to overlook important pieces of the infrastructure. Automated solutions aid in the process of collecting information and can store and display data about the environment without requiring a manual scavenger hunt. This setup helps prevent surprises when managing the entire environment.


• Improved security: Neglecting to keep network infrastructure devices up to date can lead to security violations or reliability issues. Automated NCM solutions can quickly identify and resolve any maintenance- or configuration-related problems according to company policies.
• Configuration consistency: When dealing with complex environments, consistency is an important factor. Without automation, it’s very easy for human error to creep into configuration files. Ad-hoc changes are difficult to detect, and a less-than-ideal configuration may persist for months or years. In the worst case, the configuration problem will be detected only after a security violation or downtime is experienced. Making the change process easier can also avoid putting off important modifications simply because of the amount of effort required. The improved responsiveness means that significant changes can be performed with minimal disruption to the business.
• Backup and recovery: Network device configurations can be complex and vital to the proper operations of a business. An automated configuration management tool can regularly collect configuration information for an entire network environment and store it securely. In the event of a device failure, the configuration can be quickly restored, reducing downtime and the loss of setup details.
• Monitoring: Network performance is a critical aspect of business operations in many environments. NCM tools can regularly measure performance statistics throughout an environment and can report on any potential problems, often before users even notice delays.
• Auditing and reporting: Various business processes can benefit from visibility into the entire infrastructure that is supported by an organization. Auditing allows network administrators to compare the intended configuration of a device with its actual configuration. For organizations that must adhere to regulations such as the Sarbanes-Oxley Act or the Health Insurance Portability and Accountability Act (HIPAA), auditing can significantly help in proving compliance. Any inconsistencies can be quickly identified and resolved. Additionally, reporting provides IT managers with the ability to gain insight into what has been deployed along with how it’s being used.
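The auditing benefit above, comparing a device's intended configuration with its actual configuration, can be sketched with Python's standard `difflib` module. This is only an illustration under stated assumptions: the configuration text is made up, the function name `audit_config` is hypothetical, and a real NCM product would retrieve the actual configuration from the device rather than from a string.

```python
import difflib

# Hypothetical intended (approved) configuration for one device.
INTENDED = """\
hostname edge-router-1
ntp server 10.0.0.1
logging host 10.0.0.5
"""

# Hypothetical configuration actually found on the device.
ACTUAL = """\
hostname edge-router-1
ntp server 10.0.0.1
logging host 10.0.0.99
"""


def audit_config(intended: str, actual: str) -> list[str]:
    """Return unified-diff lines; an empty list means the device matches."""
    return list(
        difflib.unified_diff(
            intended.splitlines(),
            actual.splitlines(),
            fromfile="intended",
            tofile="actual",
            lineterm="",
        )
    )


drift = audit_config(INTENDED, ACTUAL)
if drift:
    print("Configuration drift detected:")
    for line in drift:
        print(line)
else:
    print("Device matches its intended configuration.")
```

A line-based diff is the simplest possible comparison; it treats reordered lines as changes, so a production tool would probably need a comparison that understands the device's configuration syntax.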

Choosing an NCM Solution

The many benefits of using an automated NCM solution are difficult to overlook, but this leads to the question of how to choose the best product. First and foremost, an NCM solution should allow IT managers to define and enforce change control policies and processes. The solution should ensure that changes can be made only by authorized individuals and only after the appropriate steps (such as peer review) have been taken. All changes should be tracked, and the solution should provide ways for auditing settings to ensure that everything is working as desired. The solution should also provide for automatic backup and restore of configuration data: automatic backups so that the latest authorized configuration is always in safe storage, and automated restore to, for example, roll back after a failed or unauthorized change deployment. Organizations should look for solutions that use a centralized configuration management database and that allow for tracking details of other computing resources such as workstations and servers. With all these features, the otherwise difficult task of maintaining a network infrastructure can become simpler and much more efficient.

Selecting an NCM solution can be complicated. Your business’s own requirements, both technical and procedural, will define the exact feature set you need. However, there are some requirements for which most businesses and organizations can find common ground:

• Security: An NCM solution should provide granular, role-based security so that each individual using the system can have exactly the permissions they need and no more. This includes the ability for auditors (for example) to review configuration information but not to make changes. Authentication should be centralized and, when appropriate, integrated with any existing directory or authentication service (such as TACACS+) you have in place.
• Configuration repository: The solution should provide a version-controlled repository, enabling the retrieval of past versions of a device’s configuration. Capture of configuration data into the repository should be made as part of change deployments, on a regular basis, and on demand.
• Logging: All activity should be logged. This should include pass-through Telnet or SSH activity, where such logging usually takes the form of keystroke logging so that all administrative activity can be properly audited.
• Workflow enforcement: If your company has a process for managing change, the solution should help enforce that process. For example, solutions should be able to enforce a peer review or managerial approval requirement before allowing changes to be deployed.
• Notification: Full notification capabilities (covering unauthorized changes, successful deployments, and other events) should be built in to the solution. Such capabilities help alert your IT staff to problems that need their attention or to recent events that might require manual follow-up or verification.
• Configuration policies and remediation: When possible and desirable, you might want a solution that is capable of analyzing device configurations and comparing them with a standard configuration template or policy. By alerting you to nonstandard configurations, the solution can help you identify devices that do not meet, for example, security or compliance requirements. Automated remediation goes a step further by automatically reconfiguring non-compliant devices to meet your configuration standards.
• Configuration comparison: The solution should provide the ability to compare different versions of a device’s configuration, visually highlighting differences for quick identification and review.
• Automation: When possible, the solution should respond automatically to configuration events such as reconfigurations that occur outside the solution. This support might derive from syslog or Simple Network Management Protocol (SNMP) monitoring or through other means.
• Multiple vendor support: A solution should support, of course, every brand and model of device you have in operation. Further, the solution should be architected in a way that makes it easy to add support for additional devices, helping make the solution “future proof.”

By using these broad requirements as a starting point, you can begin to identify key features and capabilities that are important to your organization and conduct pilot programs and product evaluations to locate products and solutions that meet your specific needs.
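The configuration policy requirement in the list above can be sketched as a simple check of a device configuration against a set of required and forbidden lines. This is a hedged illustration, not how any particular NCM product works: the `POLICY` structure, the `check_policy` function, and the device configuration are hypothetical, and the configuration lines are illustrative strings chosen to resemble common network device syntax.

```python
# Hypothetical policy: lines that must, or must not, appear in a device configuration.
POLICY = {
    "required": ["service password-encryption", "logging host 10.0.0.5"],
    "forbidden": ["transport input telnet"],
}


def check_policy(config_text, policy):
    """Return (missing_required, forbidden_present) for one configuration."""
    # Normalize by stripping indentation and ignoring blank lines.
    lines = {line.strip() for line in config_text.splitlines() if line.strip()}
    missing = [rule for rule in policy["required"] if rule not in lines]
    forbidden = [rule for rule in policy["forbidden"] if rule in lines]
    return missing, forbidden


# Hypothetical configuration captured from a device.
device_config = """
hostname branch-switch-7
service password-encryption
line vty 0 4
 transport input telnet
"""

missing, forbidden = check_policy(device_config, POLICY)
print("Missing required lines:", missing)
print("Forbidden lines present:", forbidden)
```

The two returned lists are effectively a remediation to-do list: add what is missing and remove what is forbidden. Automated remediation, as described above, would apply those changes through the NCM workflow rather than merely reporting them.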


Server Provisioning
Most IT users recognize that one of the most important (and visible) functions of their IT departments is setting up new computers. Server provisioning is the process of readying a server for production use. It generally involves numerous tasks, beginning with the purchase of server hardware and the physical racking of the equipment. Next is the important (and tedious) task of installing and configuring the operating system (OS). This step is followed by applying security patches and OS updates, installing any required applications, and performing security configuration.

When done manually, the entire process can be time consuming and error prone. For example, if a single update is overlooked, the server may be vulnerable to security exploits. Furthermore, even in the smallest IT environments, the task of server provisioning is never really “done”: changes in business and technical requirements often force administrators to repurpose servers with new configuration settings and roles.

Challenges Related to Provisioning

Modern OSs are extremely flexible and complicated pieces of software. They have hundreds of configurable options to meet the needs of various roles they may take on. Therefore, the process of readying a new server for production use can involve many different challenges. Some of these include:

• Configuring OS options: New servers should meet corporate technical and business standards before they’re brought online. Ensuring that new machines meet security requirements might involve manual auditing of configurations, a process that is neither fun nor reliable. Other important settings include computer names, network addresses, and the overall software configuration. The goal should be to ensure consistency while minimizing the amount of effort required, two aspects that are not usually compatible.
• Labor-related costs: Manual systems administration tasks can result in large costs for performing routine operations. For example, manually installing an OS can take hours, and the potential for errors in the configuration is high.
• Support for new platforms: Provisioning methods must constantly evolve to support new hardware, OS versions, and service packs. New technologies, such as ultra-dense blade server configurations and virtual machines, often require new images to be created and maintained. And, there is always a learning curve and some “gotchas” associated with supporting new machines.
• Redeployment of servers: Changing business requirements often necessitate that servers be reconfigured, reallocated, and repurposed. Although it is difficult enough to prepare a server for use the first time, it can be even more challenging to try to adapt the configuration to changing requirements. Neither option (reconfiguration or reinstallation) is ideal.
• Keeping servers up to date: The installation and management of security updates and OS fixes can require a tremendous amount of time, even in smaller environments. Often, these processes are managed on an ad-hoc basis, leading to windows of vulnerability.


• Technology refreshes: Even the fastest and most modern servers will begin to show their age in a matter of just a few years. Organizations often have standards for technology refreshes that require them to replace a certain portion of the server pool on a scheduled basis. Migrating the old configuration to new hardware can be difficult and time consuming when done manually.
• Support for remote sites: It’s often necessary to support remote branch offices and other sites that might require new servers. Sometimes, the servers can be installed and configured by the corporate IT department and then be physically shipped. In other cases, IT staff might have to physically travel between sites. The costs and inefficiencies of this process can add up quickly.
• Business-related costs: As users and business units await new server deployments, there are often hidden costs associated with decreases in productivity, lost sales opportunities, and associated business inefficiencies. These factors underscore the importance of quick and efficient server provisioning.

Clearly, there is room for improvement in the manual server-provisioning process.

Server-Provisioning Methods

Many OS vendors are aware of the pain associated with deploying new servers. They have included numerous tools and technologies that can make the process easier and smoother, but these solutions also have their limitations. To address the challenges of server provisioning, there are two main approaches that are typically used.

Scripting

The first is scripting. This method involves creating a set of “answer files” or scripts that are used to provide configuration details to the OS installation process. Ideally, the entire process will be automated; that is, no manual intervention is required. However, there are some drawbacks to this approach. First, the process of installing an OS can take many hours because all the hardware has to be detected and configured, drivers must be loaded, hard disks must be formatted, and so on. The second problem is that the scripts must be maintained over time, and they tend to be “fragile.” When hardware and software vendors make even small specification changes, new drivers or versions might be required.

Imaging

The other method of automating server provisioning is known as imaging. As its name suggests, this approach involves performing a base installation of an OS (including all updates and configuration), then simply making identical copies of the hard disks. The disk duplication may be performed through dedicated hardware devices or through software. The major problems with this approach include the creation and maintenance of images. As the hardware detection portion of OS installation is bypassed, the images must be created for each hardware platform on which the OS will be deployed. Hardware configuration changes often require the creation of new images. Another problem is in managing settings that must be unique, including OS security identifiers (SIDs), network addresses, computer names, and other details.

Both approaches involve some important tradeoffs and neither is an ideal solution for IT departments.
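The scripting approach, and the need for per-server unique settings such as computer names and network addresses, can be illustrated by generating one answer file per server from a shared template. This is a minimal sketch under loud assumptions: the `key=value` answer-file format, the field names, and the server inventory are all hypothetical and do not correspond to any specific OS vendor's unattended-installation format.

```python
from string import Template

# Hypothetical answer-file layout; real installers define their own formats.
ANSWER_TEMPLATE = Template(
    "computer_name=$name\n"
    "ip_address=$ip\n"
    "role=$role\n"
)

# Hypothetical inventory of servers to provision.
SERVERS = [
    {"name": "web01", "ip": "10.0.1.11", "role": "web"},
    {"name": "db01", "ip": "10.0.2.21", "role": "database"},
]


def render_answer_files(template, servers):
    """Return a mapping of server name to its rendered answer-file text."""
    return {server["name"]: template.substitute(server) for server in servers}


files = render_answer_files(ANSWER_TEMPLATE, SERVERS)
for name, text in files.items():
    print(f"--- {name} ---")
    print(text)
```

Keeping the shared settings in one template and the unique settings in an inventory limits the maintenance burden to a single file when standards change, although it does not remove the fragility that comes from driver and version changes.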


Evaluating Server-Provisioning Solutions

Automated server-provisioning tools allow IT departments to quickly and easily define server configurations, install OSs, perform patches and updates, and get computers ready for use as quickly as possible. When looking for an automated server-provisioning system, there are many features that might help increase efficiency and better manage the deployment process. Features to look for in an automated provisioning solution include:

• Broad OS compatibility: Ideally, a server-provisioning solution will support all of the major OSs that your environment plans to deploy. Also, continuing updates for new OS versions and features will help “future-proof” the solution.
• Integration with other data center automation tools: Server provisioning is often the first step in many other related processes, such as configuration management and asset tracking. A deployment solution that can automatically integrate with other IT operations tools can help reduce chances for error and increase overall manageability.
• Hardware configuration: Modern server computer platforms often include advanced management features for configuring the BIOS, disk arrays, and other options. Server-provisioning tools can take advantage of these options to automate steps that might otherwise have to be done manually.
• License tracking: Keeping track of OS and software licensing can easily be a full-time job, even in smaller organizations. Server-provisioning tools that provide license-tracking functionality can make the job much easier by recording which licenses are used and on which machines.
• Support for network-based installation: A common deployment method involves using network-based Pre-Boot eXecution Environment (PXE) booting. This method allows computers that have no OS installed to connect to an installation server over a network and begin the process. When all the components are in place, this method of provisioning can be the most “hands-off” approach.
• Duplicating the configuration of a server: Upgrading servers to new hardware platforms is a normal part of data center operations. Server-provisioning tools that allow for backing up and restoring the configuration of an OS on new hardware can help make this process quicker, easier, and safer.
• Ability to define configuration “templates”: Most IT departments have standards for the configuration of their servers. These standards tend to specify network settings, security configuration, and other details. When deploying new servers, it’s useful to have a method for developing a template server configuration that can then be applied to other machines.
• Support for remote sites: Deploying new servers is rarely limited to a single site or data center, so the server-provisioning tool should provide methods for performing and managing remote deployments. Depending on the bandwidth available at the remote sites, multiple installation sources might be required.

Overall, a well-designed automated server-provisioning tool can dramatically decrease the amount of time it takes to get a new server ready for use and can help ensure that the configuration meets all of an organization’s business and technical requirements.
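
The configuration “template” idea described in the feature list above can be illustrated with a short sketch. It is not taken from any particular provisioning product: the template keys, the values, and the apply_template helper are all hypothetical, and a real tool would store and apply its templates in its own format.

```python
from copy import deepcopy

# Hypothetical standard configuration; keys and values are illustrative only.
BASE_TEMPLATE = {
    "os": "example-os-1.0",
    "network": {"dns": ["10.0.0.2"], "dhcp": False},
    "security": {"firewall_enabled": True, "remote_root_login": False},
}


def apply_template(template, overrides):
    """Return a new configuration: the template with per-server overrides merged in."""
    result = deepcopy(template)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            # Merge nested sections instead of replacing them wholesale.
            result[key] = apply_template(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


# A new server inherits every standard setting and adds only what is unique to it.
web01 = apply_template(BASE_TEMPLATE, {"network": {"ip": "10.0.1.15"}})
```

Because the template itself is never modified, the same standard can be applied to any number of machines, and a change to the standard is made in one place.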


Return on Investment
IT departments are often challenged to do more with less. They face the difficult task of increasing service levels with limited budgets. This reality makes the task of determining which investments to make far more important. The right decisions can dramatically decrease costs and improve service; the wrong decisions might actually increase overall costs.

In many ways, IT managers just know the benefits of particular technologies or implementations. We can easily see how automation can reduce the time and effort required to perform certain tasks. But the real challenge is related to how this information can be communicated to others within the organization.

The basic idea is that one must make an investment in order to gain a favorable return. And most investments involve at least some risk. Generally, there will be a significant time between when you choose to make an investment and when you see the benefits of that venture. In the best case, you’ll realize the benefits quickly and there will be a clear advantage. In the worst case, the investment may never pay off. The following sections explore how Return on Investment (ROI) can be calculated and how it can be used to make better IT decisions.

The Need for ROI Metrics

The concept of ROI focuses on comparing the potential benefits of a particular IT project with the associated costs. From the standpoint of technology, IT managers must have a way of communicating the potential benefits of investments in process improvements and other projects. These are the details that business leaders will need in order to determine whether to fund the project. Additionally, once projects are completed, IT managers should have a way of demonstrating the benefits of the investment. Finally, no one can do it all: there are often far more potential projects than staff and money to take them on.

ROI is a commonly used business metric that is familiar to CFOs and business leaders; it compares the cost of an investment against the potential benefits. When considering investments in ventures such as a new marketing campaign, it’s important to know how soon the investment will pay off and how much the benefit will be. Often, the costs are clear; it’s just a matter of combining that with risks and potential gain. By using ROI-based calculations, businesses can determine which projects can offer the most “bang-for-the-buck.” A high ROI is a strong factor in ensuring the idea is approved.

Calculating ROI

Although there are many ways in which ROI can be determined, the basic concepts remain the same: The main idea is to compare the anticipated benefit of an investment with its expected cost. Terms such as “benefit” and “cost” can be ambiguous, but this section will show the various types of information you’ll need in order to calculate those numbers.


Calculating Costs

IT-related costs can come from many areas. The first, and perhaps easiest to calculate, is related to capital equipment purchases. This area includes the “hard costs” spent on workstations, servers, network devices, and infrastructure equipment. The actual amounts spent can be divided into meaningful values through metrics such as “average IT equipment cost per user.” In addition to hardware, software might be required. Based on the licensing terms with the vendor, costs may be one-time, periodic, or usage-based.

For most environments, a large portion of IT spending is related to labor: the effort necessary to keep an environment running efficiently and in accordance with business requirements. These costs might be measured in terms of hours spent on specific tasks. For example, managing security updates might require, on average, 10 hours per server per year. Well-managed IT organizations can often take advantage of tracking tools and management reports to determine these costs. In some cases, independent analysis can help.

When considering an investment in an IT project, both capital and labor costs must be taken into account. IT managers should determine how much time and effort will be required to make the change, and what equipment will be required to support it. In addition, costs related to downtime or any related business disruptions must be factored in. This might include, for example, a temporary loss of productivity while a new accounting application is implemented. There will likely be some “opportunity costs” related to the change: Time spent on this proposed project might take attention away from other projects. All these numbers combined can help to identify the total cost of a proposal.

Calculating Benefits

So far, we’ve looked at the downside: the fact that there are costs related to making changes. Now, let’s look at factors to take into account when determining potential benefits. An easy place to start is by examining cost reductions related to hardware and software. Perhaps a new implementation can reduce the number of required servers, or it can help make more efficient use of network bandwidth. These benefits can be easy to enumerate and total because most IT organizations already have a good idea of what they are.

It can sometimes be difficult for IT managers to spot areas for improvement in their own organizations. A third party can often shed some light on the real costs and identify areas in which the IT teams stand to benefit most.

Other benefits are more difficult to quantify. Time savings and increases in productivity are important factors that can determine the value of a project. In some cases, metrics (such as sales projections or engineering quality reports) are readily available. If it is expected that the project will yield improvements in these areas, the financial benefits can be determined. Along with these “soft” benefits are aspects related to reduced downtime, reduced deployment times, and increased responsiveness from the IT department.


Measuring Risk

Investment-related risks are just part of the game: there is rarely a “sure thing” when it comes to making major changes. Common risks are related to labor and equipment cost overruns. Perhaps designers and project managers underestimated the amount of effort it would require to implement a new system. Or capacity estimates for new hardware were too optimistic. These factors can dramatically reduce the potential benefit of an investment.

Although it is not possible to identify everything that could possibly go wrong, it’s important to take into account the likelihood of cost overruns and the impacts of changing business requirements. Some of these factors might be outside the control of the project itself, but they can have an impact on the overall decision.

Using ROI Data

Once you’ve looked at the three major factors that can contribute to an ROI calculation (costs, benefits, and risk), you must bring it all together. ROI can be expressed in various ways. The first is as a percentage value. For example, consider that implementing a new software package for the sales department will cost approximately $100,000 (including labor, software, and capital equipment purchases). Business leaders have determined that, within a period of 2 years, the end result will be an increase in sales efficiency that equates to an additional $150,000 in revenue. The net return for this project is the benefit minus the cost, or $50,000. Expressed as a percentage of the $100,000 cost, this project will provide a 50 percent ROI within 2 years.

ROI can also be expressed as a measure of time. Specifically, it can indicate how long it might take to recover the value of an investment. For example, an organization might determine that it will take approximately 1.5 years to reach a “break-even” point on a project. This is where the benefits from the project have paid back the costs of the investment. This method is more useful for ongoing projects, where continual changes are expected.

As with all statistical data of this type, ROI calculations can be highly subjective. It’s important that your company develop its own standards for calculating ROI in order to provide consistent, reliable results. Risk should be carefully considered. For example, although a new solution might offer a department 20 percent better efficiency, what are the odds that new employees will be added who have inherently lower efficiency and productivity during their first days and weeks? Also, as you implement solutions, be sure to track the actual ROI, including out-of-plan events (such as new hires) that may impact the overall ROI and result in a different actual return.
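
The two ways of expressing ROI described above can be written as a pair of small functions. This is a minimal sketch rather than a standard financial library: the function names are made up for illustration, the percentage example uses the $100,000 and $150,000 figures from the text, and the break-even figures ($90,000 cost, $60,000 benefit per year) are hypothetical numbers chosen to give a 1.5-year result.

```python
def roi_percent(benefit, cost):
    """ROI as a percentage: net return (benefit minus cost) divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost * 100


def break_even_years(cost, annual_benefit):
    """Years needed for a steady annual benefit to pay back the initial cost."""
    if annual_benefit <= 0:
        raise ValueError("annual_benefit must be positive")
    return cost / annual_benefit


# Sales software example from the text: $100,000 cost, $150,000 benefit in 2 years.
sales_roi = roi_percent(benefit=150_000, cost=100_000)

# Hypothetical break-even example: $90,000 cost repaid at $60,000 per year.
payback = break_even_years(cost=90_000, annual_benefit=60_000)

print(f"ROI: {sales_roi:.0f}%  break-even: {payback:.1f} years")
```

Neither function accounts for risk. One simple way to reflect it, if your organization's standards allow, is to run the same calculation with pessimistic and optimistic estimates and report the range.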


Making Better Decisions

IT and business leaders can use ROI information to make better decisions about their investments. Once details related to the expected ROI for potential projects are determined, all areas of an organization can make educated decisions based on the anticipated risk and rewards. Factors to look for include rapid implementation times, clearly defined tangible benefits, and quick returns.

It’s important to tailor how these details are communicated to each audience. A CFO might not care that new servers are 30 percent more efficient than previous ones, but she’s likely to take notice if power, space, and cooling costs can be dramatically lowered. Similarly, when users understand that they’ll experience decreased downtime, they’ll be more likely to support a change.

Many different projects can be compared based on the needs of the business. If management is ready to make significant investments, the higher-benefit/higher-cost projects might be best. Otherwise, lower-cost projects may be chosen. In either case, the goal should be to invest in the projects with the highest ROI. Figure 3 provides an example of a chart that might be used to compare details of various investments.

Figure 3: A chart plotting potential return vs. investment.

ROI numbers can also be very helpful for communicating IT decisions throughout an organization. When non-technical management can see the benefits of changes such as implementing automated processes and tools, this insight can generate buy-in and support for IT initiatives. For example, setting up new network services might seem disruptive at first, but if business leaders understand the cost savings, they will be much more likely to support the effort.

Calculating ROI for some IT initiatives can be difficult. For example, security is one area in which costs are difficult to determine. Although it would be useful if the IT industry had actuarial statistics (similar to those used in, for example, the insurance industry), such data can be difficult to come by. In these situations, IT managers should consider using known numbers, such as the costs of downtime and damages caused by data loss, to help make their ROI-related case. And it’s important to keep in mind that in most ROI calculations, subjectivity is inevitable: you can’t always predict the future with total accuracy, and sometimes you must just take your best guess.

ROI Example: Benefits of Automation

One area in which most IT departments can gain dramatic benefit is through data center automation. By reducing the amount of manual time and effort required, substantial cost savings can be realized in relatively short periods of time. This section will bring together these details to help determine the potential ROI of an investment in automation.

In this hypothetical example, a company has decided that it is spending far too much money on routine server maintenance operations (including deployment, configuration, maintenance, and security). The environment supports 150 servers, and it estimates that it spends an average of $1,500 per year to maintain each server (including labor, software, and related expenses; this figure is purely for illustrative and discussion purposes and will probably not reflect real-world maintenance figures in your environment). The organization has also found that, through the use of automation tools, it can reduce these costs dramatically. By implementing automated server provisioning and patch management solutions, it can reduce the operating cost to ~$300 per year per server.

Using these numbers, the overall cost savings would be a total of $1,200 per server per year, or a grand total of $180,000 saved per year. The cost of purchasing and implementing the automation solution is expected to be approximately $120,000, providing a net potential benefit of $60,000 within one year (again, these numbers are purely for illustration and discussion and do not reflect an actual ROI analysis of a real-world environment).

ROI Analysis

Based on the numbers predicted, the implementation of automation tools seems to be a good investment. The return is a substantial cost savings, and the results will be realized in a brief period of time. There is an additional benefit to making improvements in automation: time that IT staff spend on various routine operations can be better spent on other tasks that make more efficient use of their time and skills. For example, time that is freed by automating security patch deployment can often increase resources for testing patches. That might result in patches being deployed more quickly, and fewer problems with the patch deployment process. The end result is a better experience for the entire organization. In short, data center automation provides an excellent potential ROI, and is likely to be a good investment for the organization as a whole.
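
The arithmetic in the automation example above can be checked with a few lines of code. The inputs are the illustrative figures from the text. The first-year ROI percentage and the payback period are not stated in the original example; they are derived here on the assumption that savings accrue evenly over the year.

```python
# Illustrative figures from the example above (not real-world data).
SERVERS = 150
COST_PER_SERVER_BEFORE = 1_500   # dollars per server per year, manual operations
COST_PER_SERVER_AFTER = 300      # dollars per server per year, with automation
SOLUTION_COST = 120_000          # purchase and implementation

savings_per_server = COST_PER_SERVER_BEFORE - COST_PER_SERVER_AFTER
annual_savings = savings_per_server * SERVERS
first_year_net = annual_savings - SOLUTION_COST

# Derived values, assuming savings accrue evenly through the year.
first_year_roi_percent = first_year_net / SOLUTION_COST * 100
payback_months = SOLUTION_COST * 12 / annual_savings

print(f"Savings per server: ${savings_per_server:,}")
print(f"Annual savings:     ${annual_savings:,}")
print(f"First-year net:     ${first_year_net:,}")
print(f"First-year ROI:     {first_year_roi_percent:.0f}%")
print(f"Payback period:     {payback_months:.0f} months")
```

Changing any single input, such as the server count or the post-automation cost, shows how sensitive the result is to each estimate.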


Change Advisory Board
Regardless of how well-aligned IT departments are with the rest of their organizations, an important factor in their overall success is how well IT can manage and implement change. Given that change is inevitable, the challenge becomes implementing policies and processes that are designed to ensure that only appropriate changes are made, and that the process involves input from the entire organization.

Best practices defined within the IT Infrastructure Library (ITIL) recommend the creation of a Change Advisory Board (CAB). The CAB is a group of individuals whose purpose is to provide advice related to change requests. Details related to the roles and responsibilities of the CAB are presented in the Service Support book. The CAB itself should include members from throughout an organization, and generally will include IT management and business leaders, as required.

The Purpose of a CAB

A characteristic of well-managed IT organizations is having well-defined policies and processes. It doesn’t take much imagination to see how having numerous systems and network administrators making ad-hoc changes can lead to significant problems and inefficiencies. To improve the implementation of change, a group of individuals from throughout the organization is required.

Members of the CAB are responsible for controlling which changes are made, how they’re made, and when. The CAB performs tasks related to monitoring, evaluating, and implementing all production-related IT changes. Their goal should be to minimize the risk and maximize the benefits of suggested changes and to handle all change requests in an organized way.

Benefits of a CAB

The main benefits of creating a CAB are related to managing a major source of potential IT problems: changes to the existing environment. IT changes can often affect the entire organization, so the purpose of the CAB is to determine which changes should occur and to specify how and when they should be performed. The CAB can define a repeatable process that ensures that requests have gone through an organized process and ad-hoc modifications are not allowed. Through the CAB review process, some types of problems such as “collisions” caused by multiple related changes being made by different people can be reduced.

Roles on the CAB

To be successful, the CAB must include representatives from various parts of the business. The list of roles will generally begin with a change requester: the member of the organization who suggests that a new implementation or modification is required. The actual people who take on this role will vary based on the needs of the organization, but often the requesters will be designated by the company’s management. Sometimes, when groups of users are affected, one or a few people may be appointed in this role.

The CAB roles that are most important from a process standpoint are the members who perform the review of the change request. In simple cases, there may only be a single approver. But, for larger changes, it’s important to have input from both the technical and business sides of the organization. The specific individuals might be business unit managers, IT managers, or people who have specific expertise in the type of change being requested.

The next set of roles involves those who actually plan for, test, and implement the change. These individuals may or may not be members of the CAB. In either case, however, it is the responsibility of those who perform the changes to communicate with CAB members to coordinate changes with all the people that are involved.

As with many other organizational groups, it’s acceptable for one person to fill multiple roles. However, as changes get more complex and have greater effects throughout the organization, it is important for IT groups to work with the business units they support.

The Change-Management Process

To ensure that all change requests are handled efficiently, it’s important for the CAB to establish a defined process. The process generally begins with the creation of a new request. Change requests can come from any area within an organization. For example, the marketing department might require additional capacity on public-facing servers to support a new campaign or the engineering group might require hardware upgrades to support the development of new products. Change requests can also come from within the IT department and might involve actions such as performing security updates or installing a new version of important software on all servers. Some change requests can be minor (such as increasing the amount of storage space available to a group of users), while others might require weeks or months of planning.

Figure 4 provides an overview of the steps required in a successful change-management process. Steps will need to be added to deal with issues such as changes that are rejected or implementations that fail.

Figure 4: A change-management process overview.

Ideally, the CAB will have established a uniform process for requesting changes. The request should include details related to why the change is being requested, who will be affected by the change, anticipated benefits, possible risks, and details related to what changes should occur. Changes should be categorized based on various criteria, such as the urgency of the change request. Organizations that must deal with large numbers of changes can also benefit from automated systems that create and store requests in a central database.
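
The contents of a uniform change request, as described above, map naturally onto a simple record type. The sketch below is one hypothetical layout: the field names, the urgency levels, and the review_queue helper are assumptions made for illustration and are not taken from ITIL or from any specific change-management product.

```python
from dataclasses import dataclass
from enum import IntEnum


class Urgency(IntEnum):
    """Hypothetical urgency categories; an organization would define its own."""
    LOW = 1
    NORMAL = 2
    HIGH = 3
    EMERGENCY = 4


@dataclass
class ChangeRequest:
    requester: str
    reason: str             # why the change is being requested
    affected: list          # who will be affected by the change
    benefits: str           # anticipated benefits
    risks: str              # possible risks
    details: str            # what changes should occur
    urgency: Urgency = Urgency.NORMAL


def review_queue(requests):
    """Order requests for a CAB meeting, most urgent first."""
    return sorted(requests, key=lambda r: r.urgency, reverse=True)


requests = [
    ChangeRequest("marketing", "New campaign", ["web team"],
                  "Handle peak traffic", "Cost overrun",
                  "Add capacity to public-facing servers"),
    ChangeRequest("IT security", "Critical patch", ["all users"],
                  "Close known vulnerability", "Brief downtime",
                  "Install security update on all servers", Urgency.HIGH),
]
queue = review_queue(requests)
```

Storing such records in a central database, rather than in email threads, is what makes it practical to categorize, search, and report on large numbers of requests.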

When the CAB receives a new request, it can start the review process. It’s a good practice for the CAB members to meet regularly to review new requests and discuss the status of those that are in progress. During the review process, the CAB determines which requests should be investigated further.

Planning for Changes

Once a request is initially approved, the CAB should solicit technical input from those that are responsible for planning and testing the changes. This process may involve IT systems and network administrators, software developers, and representatives from affected business units. The goal of this team is to collect information related to the impact of the change. The questions that should be asked include:

• Who will be affected? For most change requests, the effects will be seen outside of the IT department. If specific individuals or business units will be affected by downtime, changes in performance, or functional changes, the expected outcomes should be documented.
• What are the costs? Even the simplest of change requests will require labor costs related to implementing the changes. In many cases, IT organizations might need to purchase more equipment to add capacity, or specific technical expertise might be required from external vendors.
• What are the risks? Most changes have an inherent associated risk. Just the act of changing something suggests that new or unexpected problems may arise. All portions of the business should fully understand the risks before committing to making a change.
• What is the best way to make the change? Technical and business experts should research the best way to meet the requirements of the change request and make recommendations. This step usually involves several areas of the organization working together closely. The goal is to provide maximum potential benefits while minimizing risk and effort required.

Based on all these details, the CAB can determine whether they should proceed with the change. In some cases, reality might indicate that it’s not prudent to make the change.

Implementing Changes

If the potential benefits are difficult to overlook, and the risk is acceptable, the next step is to implement the changes. An organization should follow a standardized change process, and the CAB should be responsible for ensuring that the processes are followed. Often, at least the service desk should be aware of what changes are occurring and any potential impacts. This will allow them to respond to calls more efficiently and will help identify which issues are related to the change.

During the implementation portion of the process, good communication can help make for a smoother ride. For quick and easy changes, all that might be required is an email reminder of the change and its intended effects. For larger changes, regular status updates might be better. As with the rest of the process, it’s very important that technical staff work with the affected business units in a coordinated way.

Reviewing Changes

Although it might be tempting to “close out” a request as soon as a change is made, the responsibilities of the CAB should include reviewing changes after they’re complete. The goal is not only to determine whether the proper process was followed but also to look for areas of improvement within the procedures. The documentation generated by this review (even if it’s only a brief comment) can be helpful for future reference.

Planning for the Unplanned

Although the majority of changes should be performed through the CAB, some types of emergencies might warrant a simplified process. For example, if a Web server farm has slowed due to a Distributed Denial of Service (DDoS) attack, changes must be made immediately. If this happens during the night or over a weekend, authorized staff should have the authority to make the necessary decisions. The CAB might choose to create a “change request” after the fact, and follow the same rigorous review steps at a later time.

Overall, through the implementation of a CAB, IT organizations can help organize the change process. The end result is reduced risk and increased coordination throughout the organization.
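
The stages discussed in this section (request, review, planning, implementation, and post-change review) can be modeled as a small state machine that rejects out-of-order steps. This is only a sketch: the state names and the allowed transitions are assumptions based on the prose above, not a reproduction of Figure 4, and the paths for rejected requests and failed implementations are just one way to handle those cases.

```python
from enum import Enum


class State(Enum):
    REQUESTED = "requested"
    UNDER_REVIEW = "under review"
    PLANNING = "planning"
    IMPLEMENTING = "implementing"
    POST_REVIEW = "post-change review"
    CLOSED = "closed"
    REJECTED = "rejected"


# Assumed transitions; an organization would tailor these to its own process.
TRANSITIONS = {
    State.REQUESTED: {State.UNDER_REVIEW},
    State.UNDER_REVIEW: {State.PLANNING, State.REJECTED},
    State.PLANNING: {State.IMPLEMENTING, State.REJECTED},
    # A failed implementation returns to planning instead of moving forward.
    State.IMPLEMENTING: {State.POST_REVIEW, State.PLANNING},
    State.POST_REVIEW: {State.CLOSED},
    State.CLOSED: set(),
    State.REJECTED: set(),
}


def advance(current, target):
    """Return the new state, or raise if the process does not allow the step."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


# Walk a request through the normal path.
state = State.REQUESTED
for step in (State.UNDER_REVIEW, State.PLANNING, State.IMPLEMENTING,
             State.POST_REVIEW, State.CLOSED):
    state = advance(state, step)
```

An emergency change of the kind described above could be handled by recording the request after the fact and then walking it through the same states for review.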

Configuration Management Database
To make better business and technical decisions, all members of the IT staff need to have a way of getting a single, unified view of “everything” that is running in their environments. A Configuration Management Database (CMDB) is a central information repository that stores details related to an IT environment. It contains data about hardware and software deployments and allows users to collect and report on the details of their environments.

The CMDB contains information related to workstations, servers, network devices, and software. Various tools and data entry methods are available for populating the database, and most solutions provide numerous configurable reports that can be run on-demand. The database itself can be used to track and report on the relationships between various components of the IT infrastructure, and it can serve as a centralized record of current configurations.

Figure 5 shows an overview of how a CMDB works with other IT automation tools. Various data center automation tools can store information in the CMDB, and users can access the information using an intranet server. The goal of using a CMDB is to provide IT staff with a way to centrally collect, store, and manage network- and server-related configuration data.


Figure 5: Using a CMDB as part of data center automation.

The Need for a CMDB

Most IT organizations track information in a variety of different formats and locations. For example, network administrators might use spreadsheets to store IP address allocation details. Server administrators might store profiles in separate documents or perhaps in a simple custom-developed database solution. Other important details might be stored on paper documents. Each of these methods has weaknesses, including problems with collecting the information, keeping it up-to-date, and making it accessible to others throughout the organization.

The end result is that many IT environments do not do an adequate job of tracking configuration-related information. When asked about the network configuration of a particular device, for example, a network administrator might prefer to connect directly to that device over the network rather than refer to a spreadsheet that is usually out-of-date. Similarly, server administrators might choose to undergo the tedious process of logging into various computers over the network to determine the types and versions of applications that are installed instead of relying on older documentation. If the same staff has to perform this task a few months later, they will likely choose to do so manually again. It doesn’t take much imagination to recognize that there is room for improvement in this process.


Benefits of Using a CMDB

A CMDB brings all the information tracked by IT organizations into a single centralized database. The database stores details about various devices such as workstations, servers, and network devices. It also maintains details related to how these items are configured and how they participate in the infrastructure of the IT department. Although the specific details of what is stored might vary by device type, all the data is stored within the centralized database solution.

The implementation of a CMDB can help make IT-related information much easier to collect, track, and report on. Among the many benefits of using a CMDB are the following:

• Configuration auditing: IT environments tend to be complex, and there are often hundreds of different settings that can have an impact on overall operations. Through the use of a CMDB, IT staff can compare the expected settings of their computers with the actual ones. Additionally, the CMDB solution can create and maintain an audit trail of which users made which changes and when. These features can be instrumental in demonstrating compliance with regulatory standards such as the Health Insurance Portability and Accountability Act (HIPAA) or the Sarbanes-Oxley Act.
• Centralized reporting: As all configuration-related information is stored in a central place, through the use of a CMDB, various reporting tools can be used to retrieve information about the entire network environment. In addition to running pre-packaged tools, developers can generate database queries to obtain a wide variety of custom information. Many CMDB reporting solutions provide users with the ability to automatically schedule and generate reports. The reports can be stored for later analysis via a Web site or may be automatically sent to the relevant users via email.
• Change tracking: Often, seemingly complicated problems can be traced back to what might have seemed like a harmless change. A CMDB allows for a central place in which all change-related information is stored, and the CMDB system can track the history of configuration details. This functionality is particularly helpful in modern network environments where it’s not uncommon for servers to change roles, network addresses, and names in response to changing business requirements.
• Calculating costs: Calculating the bottom line in network environments requires the ability to access data for software licenses and hardware configurations. Without a centralized solution, the process of collecting this information can take many hours. In addition, it’s difficult to trust the information because it tends to become outdated very quickly. A CMDB can help obtain details related to licenses, support contracts, asset tags, and other details that can help quickly assess and control costs.

Overall, a CMDB solution can help address many of the inefficiencies of other methods of configuration data collection.
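
The configuration auditing benefit described above comes down to comparing the settings a CMDB expects with the settings actually found on a machine. The sketch below shows that comparison in its simplest form. The audit function, the setting names, and the values are hypothetical; a real CMDB product would provide its own auditing and reporting features.

```python
def audit(expected, actual):
    """Return {setting: (expected_value, actual_value)} for every mismatch.

    A setting missing from either side is reported with None in its place.
    """
    findings = {}
    for key in sorted(expected.keys() | actual.keys()):
        exp = expected.get(key)
        act = actual.get(key)
        if exp != act:
            findings[key] = (exp, act)
    return findings


# Hypothetical expected settings (from the CMDB) and discovered settings.
expected = {"os_version": "1.0", "firewall_enabled": True, "ntp_server": "10.0.0.5"}
actual = {"os_version": "1.0", "firewall_enabled": False, "telnet_enabled": True}

drift = audit(expected, actual)
for setting, (exp, act) in drift.items():
    print(f"{setting}: expected {exp!r}, found {act!r}")
```

Storing each set of findings with a timestamp and the name of the person who made the change would provide the kind of audit trail mentioned in the list above.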


Implementing a CMDB Solution

The goal of a CMDB is to help record and model the organization of an IT network environment within a central data storage point. Although the details of implementation can vary greatly between organizations, the same basic information is usually collected. Vendors that provide data center automation solutions often rely upon a CMDB to track what is currently running in the environment and how these devices are set up.

Implementing a new CMDB solution often begins with the selection of an acceptable platform. Although IT organizations might choose to develop in-house custom solutions, there are many benefits to using pre-packaged CMDB products. This section will look at the details related to what information should be tracked and which features can help IT departments get the most from their databases.

Information to Track

The IT industry includes dozens of standards related to hardware, software, and network configuration. A CMDB solution may provide support for many kinds of data, with the goal of being able to track the interaction between the devices in the environment. That raises the question of what information should be tracked.

Server Configuration

Server configurations can be complex and can vary significantly based on the specific OS platform and version. The CMDB should be able to track the hardware configuration of server computers, including such details as BIOS revisions, hard disk configurations, and any health-related monitoring features that might be available. In addition, the CMDB should contain details about the OS and which applications are installed on the computer. Finally, important information such as the network configuration of the server should be recorded.

Desktop Configuration

One of the most critical portions of an IT infrastructure generally exists outside the data center. End-user workstations, notebook computers, and portable devices all must be managed. Information about the network configuration, hardware platform, and applications can be stored within the CMDB. These details can be very useful for performing routine tasks, such as security updates, and for ensuring that the computers adhere to the corporate computing policies.

Network Configuration

From a network standpoint, routers, switches, firewalls, and other devices should be documented within the CMDB. Ideally, all important details from within the router configuration files will be included in the data. As network devices often have to interact, network topology details (including routing methods and inter-dependencies) should also be documented. Wherever possible, network administrators should note the purpose of various device configurations within the CMDB.
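
To make the idea of tracking devices and their inter-dependencies concrete, here is a minimal sketch of a CMDB-style data model built on Python's standard sqlite3 module. Everything in it is an assumption made for illustration: the two-table layout, the column names, the sample devices, and the helper functions do not represent the schema of any commercial CMDB.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE config_item (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    kind       TEXT NOT NULL,   -- e.g. 'server', 'desktop', 'network'
    attributes TEXT NOT NULL    -- device-specific details stored as JSON
);
CREATE TABLE relationship (
    source_id INTEGER NOT NULL REFERENCES config_item(id),
    target_id INTEGER NOT NULL REFERENCES config_item(id),
    relation  TEXT NOT NULL     -- e.g. 'connects_to'
);
""")


def add_item(name, kind, **attributes):
    """Insert a configuration item and return its row id."""
    cur = conn.execute(
        "INSERT INTO config_item (name, kind, attributes) VALUES (?, ?, ?)",
        (name, kind, json.dumps(attributes)),
    )
    return cur.lastrowid


def dependents_of(name):
    """Names of items that have a relationship pointing at the named item."""
    rows = conn.execute(
        """SELECT s.name FROM relationship r
           JOIN config_item s ON s.id = r.source_id
           JOIN config_item t ON t.id = r.target_id
           WHERE t.name = ? ORDER BY s.name""",
        (name,),
    )
    return [row[0] for row in rows]


# Hypothetical sample data.
switch = add_item("core-sw-01", "network", purpose="data center core switch")
server = add_item("web-01", "server", bios="1.2.3", os="example-os-1.0", ip="10.0.1.15")
conn.execute("INSERT INTO relationship VALUES (?, ?, ?)", (server, switch, "connects_to"))

print(dependents_of("core-sw-01"))
```

Keeping relationships in their own table is what allows questions such as "which servers are affected if this switch is taken offline?" to be answered with a single query.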


Software Configuration
Managing software can be a time-consuming and error-prone process in many environments. Fortunately, the use of a CMDB can help. By keeping track of which software is installed on which machines, and how many copies of the software are in use concurrently, systems administrators and support staff can easily report on important details such as OS versions, license counts, and security configurations. Often, organizations will find that they have purchased too many licenses or that many users are relying on outdated versions of software.

Evaluating CMDB Features
Although the basic functionality of a CMDB is easy to define, there are many features and options that can make the task of maintaining configuration information easier and more productive. When evaluating CMDB solutions, you should keep the following features in mind:
• Automatic discovery: One of the most painful and tedious aspects of deploying a new CMDB solution is performing the initial population of the database. Although some of the tasks must be performed manually, vendors offer tools that can be used to automatically discover and document information about devices on the network. This feature not only saves time but can greatly increase the accuracy of data collection. Plus, automatic discovery features can be used to automatically document new components as they’re added to the IT infrastructure.
• Integration with data center automation tools: A CMDB solution should work with other data center automation tools, including configuration management, Help desk, patch management, and related products. When the tools work together, this combination provides the best value to IT: the CMDB can continue to be kept up to date from other sources of information.
• Broad device support: Details about various hardware devices can vary significantly between vendors and models. Ideally, the CMDB solution will provide options for tracking products from a variety of different manufacturers, and the vendor will continue to make updates available as new devices are released.
• Usability features: To ensure that IT staff and other users learn to rely upon a solution, it must be easy to use. Many CMDB solutions offer a Web-based presentation of information that can be accessed via an organization’s intranet. If they’re well-designed, all employees in an organization will be able to quickly and easily get the data they need (assuming, of course, that they have the appropriate permissions). For some types of operations, “smart client” applications might provide a better experience.
• Performance and scalability: CMDB systems tend to track large quantities of information about all the devices in the environment. The solution should be able to scale to support an environment’s current and projected size while providing adequate performance in the areas of data storage and reporting.
• Distributed database: Many IT organizations support networks at multiple locations. The CMDB solution should provide a method for remote sites (such as branch offices) to communicate with the database. Based on their network capacity, organizations might choose to maintain a single central database. Alternatively, copies of the database might be made available at multiple sites for performance reasons.


• Security features: The CMDB will contain numerous details related to the design and implementation of the network environment. In the wrong hands, this information can be a security liability. To help protect sensitive data, the CMDB solution should provide a method for implementing role-based security access. This setup will allow administrators to control who has access to which information.
• Flexibility and extensibility: In an ideal world, you would set up your entire IT environment at once and never have to change it. In reality, IT organizations frequently need to adapt to changing business and technical requirements. New technologies, such as blade servers and virtual machines, can place new requirements on tracking solutions. A CMDB solution should be flexible enough to allow for documenting many different types of devices and should support expandability for new technologies and device types. The solution may even allow developers to create definitions of their own devices.
• Generation of reports: The main purpose of the CMDB is to provide information to IT staff, so the solution should have a strong and flexible reporting engine. Features to look for include the ability to create and save custom report definitions, and the ability to automatically publish and distribute reports via email or an intranet site.
• Customizability/Application Programming Interface (API): Although the pre-built reports and functionality included with a CMDB tool can meet many of users’ requirements, at some point, it might become necessary to create custom applications that leverage the data stored in the CMDB. That is where a well-documented and supported API can be valuable. Developers should be able to use the API to programmatically return and modify data. One potential application of this might be to integrate the CMDB with an organization’s other IT systems.

Overall, through the use of a CMDB, IT organizations can better track, manage, and report on all the important components of the IT infrastructure.
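As an illustration of the software reporting described earlier in this section, the sketch below counts installed copies per title, compares them with purchased licenses, and lists machines on an outdated version. The record layout, software titles, and license figures are hypothetical, and the example counts installed copies rather than concurrent use, which a real product might measure differently.

```python
from collections import Counter

# Hypothetical installation records as a CMDB might export them:
# (machine name, software title, version). All values are made up.
installations = [
    ("desk-001", "OfficeSuite", "11.0"),
    ("desk-002", "OfficeSuite", "11.0"),
    ("desk-003", "OfficeSuite", "10.0"),
    ("desk-001", "DiagramTool", "3.2"),
]

# Hypothetical number of licenses purchased per title.
licenses_purchased = {"OfficeSuite": 5, "DiagramTool": 1}


def license_report(installs, purchased):
    """Return {title: (installed, purchased, surplus)}.

    A negative surplus means more copies are installed than were purchased.
    """
    installed = Counter(title for _machine, title, _version in installs)
    titles = sorted(set(installed) | set(purchased))
    return {
        t: (installed[t], purchased.get(t, 0), purchased.get(t, 0) - installed[t])
        for t in titles
    }


def outdated_machines(installs, title, current_version):
    """List machines running a version of `title` other than the current one."""
    return sorted(m for m, t, v in installs if t == title and v != current_version)


report = license_report(installations, licenses_purchased)
print(report["OfficeSuite"])                                    # (3, 5, 2)
print(outdated_machines(installations, "OfficeSuite", "11.0"))  # ['desk-003']
```

In this made-up data set, two OfficeSuite licenses are unused and one workstation is still on the older version, which is the kind of finding the text suggests organizations often make.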


Auditing
The process of auditing involves systematic checks and examinations to ensure that a specific aspect of a business is functioning as expected. In the financial world, auditing requires a review of accounting records, and verification of the information that is recorded. The purpose is to ensure that the details are consistent and that rules are being followed. From an IT standpoint, auditing should be an important aspect of operations.

The Benefits of Auditing
Although some IT departments have established regular auditing processes, many tend to perform these steps in a reactive way. For example, whenever a new problem arises (such as a security violation or server downtime issue), systems administrators will manually examine the configuration of a system to ensure that it is working as expected. A far better scenario is one in which auditing is performed proactively and as part of a regular process. There are many benefits of performing regular audits, including:
• Adhering to regulatory compliance requirements: Many companies are required to adhere to government rules or industry-specific practices. These regulations might specify how certain types of data should be handled or they might define how certain processes should be performed. In fact, many regulatory requirements necessitate regular auditing reviews to be carried out either by an organization or through the use of a third party.
• Verifying security: In many IT environments, security is difficult to manage. Every time a new server or workstation is added to the network, systems administrators must be careful to ensure that the device meets requirements specified in the security policy. Overlooking even one system could lead to serious problems, including loss of data. Additionally, IT departments must keep track of hardware and software licenses and ensure that users don’t add new devices without authorization. By performing routine security audits, some of these potential oversights can be detected before they lead to unauthorized access or other problems.
• Enforcing processes: Auditing can help ensure that the proper IT processes are being followed. IT departments that have implemented best practices such as those specified within the IT Infrastructure Library (ITIL) can perform routine reviews to look for potential problems and identify areas for improvement.
• Change tracking and troubleshooting: Even in relatively simple IT environments, changes can have unintended consequences. In fact, many problems occur as a result of changes that were made intentionally. An auditing process can help identify which changes are being made and, if necessary, can help reduce troubleshooting time and effort.

These are just some of the important reasons for performing regular auditing of IT environments. The important point is that rather than being just a burden on an IT group, auditing can help ensure that the organization is working properly.


Developing Auditing Criteria
When working on developing auditing requirements and criteria, an organization should start by determining goals for the auditing process. The potential benefits already discussed are a good starting point. However, IT groups should add specifics. Examples might include “Sensitive customer data should always be stored securely” and “Change and configuration management processes should always be followed.”

Process-related criteria pertain to how and when changes are made and are designed to ensure that the intended IT service levels are being properly met. Organizations might develop auditing requirements to ensure that a process, such as manual server provisioning, is being performed correctly. These criteria often depend upon having well-documented processes that are enforced. For example, ITIL provides recommendations for steps that should be included in the change and configuration management process. Processes also pertain to standard operations in such areas as physical data center security, adherence to approvals hierarchies, and the definition of employee termination policies.

Configuration-related criteria focus on how workstations, servers, and network devices are set up. For example, security policies might require that all workstations and servers are no more than one version behind the latest set of security updates. Application-level configuration is also very important, as the strength of an IT organization’s security system relies upon having programs up to date with regard to patches and user authorization settings.

Inventory-related auditing criteria generally involve verification that equipment is being tracked properly and that hardware devices are physically located where expected. Asset tracking methods and manual inspection of data centers and remote locations can help ensure that these criteria are being met.
Performance-related auditing criteria are designed to ensure that an IT department is providing adequate levels of service based on business needs. Metrics might include reliability, uptime, and responsiveness numbers. Furthermore, if the IT department has committed to specific Service Level Agreements (SLAs), those can serve as a basis for the auditing criteria. Table 2 provides examples of auditing criteria and metrics that a typical IT department might develop.


Auditing Category | Auditing Purpose/Requirement | Metrics | Target
Process | Change management process must be enforced | Percentage of changes that obtained approval | 100% of changes should be approved prior to being performed
Process | All data center access is logged | Accuracy of data center entrance and exit logs | 100% of data center visits are logged
Configuration | Servers are up to date with the latest security patches | Percentage of servers that are up to date based on security policy | 100% of servers must be within one level of the latest security policy, and 50% must be running at the latest level
Configuration | CRM accounts are up to date | Only required accounts and permissions are in place, according to the IT CRM configuration policy | All permissions and user accounts should be consistent with IT and HR records
Performance | Ensuring adequate IT server and network uptime | Total unscheduled downtime (in minutes) | No more than 100% of the unscheduled downtime allowance, as specified in the SLA
Performance | Service desk resolution times are within stated levels | Percentage of issues that are within times stated within the SLA | 95% of tier-one issues should be resolved within 4 hours
Inventory | Asset tracking | Percentage of assets that are present according to previous audit results and change tracking details | 100% of devices should be physically verified

Table 2: Sample auditing criteria.
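A criterion such as the server patch target in Table 2 can be evaluated mechanically once patch levels are recorded. The following is a minimal sketch of that check. It assumes patch levels are expressed as integers and that server names and levels come from some inventory source; both the numbering scheme and the sample data are hypothetical.

```python
def patch_compliance(server_levels, latest_level):
    """Evaluate the sample patch targets from Table 2.

    server_levels maps a server name to its installed patch level (an integer;
    this numbering scheme is an assumption made for the example).
    """
    total = len(server_levels)
    if total == 0:
        raise ValueError("no servers to audit")
    at_latest = sum(1 for lvl in server_levels.values() if lvl == latest_level)
    within_one = sum(1 for lvl in server_levels.values() if latest_level - lvl <= 1)
    pct_latest = 100.0 * at_latest / total
    pct_within_one = 100.0 * within_one / total
    return {
        "pct_latest": pct_latest,
        "pct_within_one": pct_within_one,
        # Targets: 100% within one level of the latest, and 50% at the latest.
        "passes": pct_within_one == 100.0 and pct_latest >= 50.0,
        "lagging": sorted(n for n, lvl in server_levels.items()
                          if latest_level - lvl > 1),
    }


# Hypothetical audit input.
levels = {"web-01": 12, "web-02": 12, "db-01": 11, "file-01": 9}
result = patch_compliance(levels, latest_level=12)
print(result["pct_latest"], result["pct_within_one"], result["passes"])
print(result["lagging"])
```

With this sample data, half of the servers run the latest level, but one server is three levels behind, so the criterion is not met and the lagging server is named in the result.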


Preparing for Audits
In many IT environments, preparing for an audit is something that managers and staff dread. The audit itself generally involves scrutinizing operational details and uncovering deficiencies. It also requires the generation of a large amount of information. The following types of information can be helpful in order to prepare for an audit:
• Meeting minutes and details from regular meetings, such as change and configuration management reviews
• Asset inventory information, based on physical verification and any asset tracking databases
• Results from previous audits, in order to ensure that previous deficiencies have been addressed
• IT policy and process documents
• Information about processes and how they’re enforced

All this information can be difficult to obtain (especially if done manually), but is often required in order to carry out auditing procedures.

Performing Audits
The process of performing an audit involves comparing the actual configuration of devices and settings against their expected settings. For an IT department, a typical example might be a security audit. The expected values will include details related to server patch levels, firewall rules, and network configuration settings.

The employees that actually perform the audit can include members of the internal staff, including systems and network administrators and IT management. The goal for internal staff should be to remain completely objective, wherever possible. Alternatively, organizations can choose to employ outside professionals and consultants to provide the audit. This method often leads to better accuracy, especially if the consultants specialize in IT auditing.

Auditing can be performed manually by inspecting individual devices and settings, but there are several potential problems with this method. First and foremost, the process can be tedious and time consuming, even in small IT environments. Second, the process leaves much room for error, as it’s easy to overlook a device or setting. Finally, performing routine audits can be difficult, especially in large environments in which changes are frequent and thousands of devices must be examined. Figure 6 shows an example of a manually generated auditing report. Although this report is far from ideal, it does show the types of information that should be evaluated.


Figure 6: Manual spreadsheet-based auditing reports.

Automating Auditing
When performed manually, the processes related to designing, preparing for, and performing auditing functions can add a significant burden to IT staff. IT staff must be sure to define relevant auditing criteria, and they must work diligently to ensure that process and configuration requirements are always being met. Additionally, audits can be extremely time consuming to perform and are therefore generally carried out only when absolutely required.

Fortunately, there are several ways in which data center automation tools and technologies can help automate the auditing process. One of the most important is having a Configuration Management Database (CMDB). A CMDB can centrally store all the details related to the hardware, software, and network devices in an IT environment, so it serves as a ready source against which expected settings can be compared. Asset tracking functionality provides IT managers with the power of knowing where all of their hardware and software investments are (or should be).

Change and configuration management tools can also help by allowing IT staff to quickly and automatically make changes even in large environments. Whenever a change is made, it can be recorded to the audit log. Furthermore, by restricting who can make changes and organizing the change process, data center automation tools can greatly alleviate the burden of performing auditing manually.

Although auditing can take time and effort to implement, the investment can quickly pay off. And, through the use of data center automation tools, the entire process can be managed without additional burden to IT staff.
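The core of an automated audit is the comparison of expected settings against actual ones. The sketch below shows one simple way that comparison might be expressed. The device names, setting names, and the idea that both sides are available as plain dictionaries are assumptions for illustration; a real tool would obtain the expected values from a CMDB and the actual values from discovery or agents.

```python
def audit_settings(expected, actual):
    """Compare expected settings with actual ones for every device.

    Both arguments map device name -> {setting name: value}.
    Returns a list of (device, setting, expected value, actual value) findings.
    """
    findings = []
    for device, wanted in sorted(expected.items()):
        found = actual.get(device)
        if found is None:
            # The baseline lists a device that was not found on the network.
            findings.append((device, "<device>", "present", "missing"))
            continue
        for setting, value in sorted(wanted.items()):
            if found.get(setting) != value:
                findings.append((device, setting, value, found.get(setting)))
    # Devices on the network that the baseline does not know about.
    for device in sorted(set(actual) - set(expected)):
        findings.append((device, "<device>", "absent", "present"))
    return findings


# Hypothetical baseline (for example, from a CMDB) and discovered state.
expected = {
    "web-01": {"patch_level": 12, "ssh_root_login": "no"},
    "fw-01": {"default_policy": "deny"},
}
actual = {
    "web-01": {"patch_level": 11, "ssh_root_login": "no"},
    "lab-pc": {"patch_level": 7},
}

findings = audit_settings(expected, actual)
for finding in findings:
    print(finding)
```

In this example the audit reports a missing firewall, a server that is one patch level behind, and an unauthorized device, which correspond to the oversights that manual inspection tends to miss.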


Customers
One of the many critical success factors for service-related organizations is customer service. Businesses often go to great lengths to ensure that they understand their customers’ needs and invest considerable time and effort in researching how to better serve them. The customer experience can be a “make-or-break” factor in the overall success of the business. Although the term “customer” is usually used to refer to individuals that work outside of an organization, IT departments can gain insight into the users and business processes they support by viewing them as customers. This shift in service delivery perspective can help improve overall performance of IT departments and operations for an organization as a whole.

Identifying Customers
An important aspect of service delivery is to define who customers are. In the business world, marketing organizations often spend considerable time, effort, and money in order to make this determination. They understand the importance of defining their target markets. IT departments can take a similar approach.

Many IT departments tend to be reactive in that they respond to requests as they come in. These requests may range from individual user needs (such as password reset requests) to deployments of new enterprise applications (such as the deployment of a new CRM application). The first step in identifying customers is to attempt to group them together. End users might form one group and represent typical desktop and workstation users from any department. Another group might be mid-level management, who tend to frequently request new computer installations or changes to existing ones. Finally, upper-level management often focuses on strategic initiatives, many of which will require support from the IT department. Figure 7 provides an example of some of these groups.

Figure 7: Identifying IT departments’ customers.


Understanding Customers’ Needs
Once IT departments’ customers have been defined, it’s time to figure out what it is they really need. In pre-sales discussions, traditional Sales staff will meet with representatives and decision makers to identify what their customers are looking for. They’ll develop a set of requirements and then come back with a proposed solution. The key portion of this process is to have both business and IT representatives involved.

It’s important to be able to accurately identify “pain points” for customers, that is, the areas that are causing them the highest costs and most frustrations. Often, IT departments tend to spend a significant portion of their time “fighting fires” instead of addressing the root causes of reliability problems. If IT management consistently hears that response times and service delivery delays are primary concerns, an investment in automation tools might help address these issues.

Defining Products and Service Offerings
Once their customers’ needs have been identified, IT staff can start trying to find the best solutions to these problems. It is at this point that the real benefits of the “customer-focused” model start to appear. Based on gathered requirements, IT management can start developing a set of service offerings to meet these needs. For example, if the Engineering department wants to be able to set up and deploy new machines as quickly as possible, investments in server virtualization and automated deployment tools might make sense. If reliability and uptime are primary concerns for several departments, investments in automated monitoring tools might make sense.

The best businesses find a way to offer standardized products or services that apply to many of their customers. Although it’s often impractical to try to meet everyone’s needs, the majority can often benefit from these offerings.
Ideally, systems and network administrators will be able to find solutions that can benefit all areas of the organization with little or no customization. For example, developing standard workstation upgrade and deployment processes might be of interest to several different departments. True economies of scale can be realized when basic services become repeatable and consistent. IT organizations can use several different practices to ensure that customers are getting what they “paid for.” In some cases, Service Level Agreements (SLAs) can help define and communicate the responsibilities of the service provider. IT departments can commit to specific performance metrics and constantly track their success against these numbers. Data center automation tools can also be helpful in ensuring that SLAs are being met.
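Tracking success against an SLA metric can be as simple as computing the share of issues resolved within the committed time. The sketch below assumes a commitment of the kind shown in Table 2 (95% of tier-one issues resolved within 4 hours) and uses made-up resolution times; the function name and its parameters are hypothetical.

```python
def sla_compliance(resolution_hours, limit_hours=4.0, target_pct=95.0):
    """Return (percentage resolved within the limit, whether the target is met)."""
    if not resolution_hours:
        raise ValueError("no resolved issues to evaluate")
    within = sum(1 for hours in resolution_hours if hours <= limit_hours)
    pct = 100.0 * within / len(resolution_hours)
    return pct, pct >= target_pct


# Hypothetical resolution times, in hours, for ten tier-one issues.
times = [0.5, 1.25, 3.0, 4.0, 6.5, 2.0, 1.0, 0.75, 3.5, 2.25]
pct, met = sla_compliance(times)
print(pct, met)  # 90.0 False
```

Here nine of ten issues were resolved in time, which falls short of the assumed 95% target. Running such a check regularly is one way an IT department could track its commitments rather than discovering a shortfall at review time.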


Communicating with Customers
An important aspect of overall success is related to vendors’ ability to align their products and services with what their customers need. It’s also important to recognize that, in some cases, what customers ask for might not be what they really want. By taking time to work in a “pre-sales” role to better identify the source of problems faced by customers, better solutions can be developed.

It’s important to continue communications with customers, just as successful businesses work to earn repeated business from their existing client base. IT departments might find that their customers’ needs have changed in reaction to new business initiatives or changing focus. This should serve as a good indicator that it might be time to hold a review and potentially update the products and services that are being provided.

IT organizations should choose to restructure their offerings based on customers’ changing needs. For example, if too many resources are currently allocated toward products or services that do not address major company initiatives, it will become obvious that these efforts could be better spent in another way.

The benefits of continued communications with customers are numerous. In the traditional business world, it’s commonly understood that the value of keeping customers happy over time is high. Figure 8 shows an example of a continuing customer-focused process that involves “service after the sale.”

Figure 8: The customer-focused IT process cycle.


Managing Budgets and Profitability
Before a business can truly be considered successful, it must be able to show a profit; that is, its revenue must exceed its overall costs. In the case of internal IT departments, actual money might never change hands, and the organization is typically structured to be a cost center. Still, it’s important to ensure that IT is delivering its products at the best possible price to customers while staying within budget. In some organizations, business and IT management may decide to implement this goal through inter-department charge-backs (a system by which the IT department will “charge” its customers and all expenditures will affect the budget of the department requesting products or services).

The goal of the IT department should be to reduce costs while maintaining service levels and products to its customers. This goal can often be achieved by increasing efficiency through the development of standard practices and the use of data center automation tools. Table 3 provides sample numbers that show how various technology investments can improve profitability.
Product or Service | Proposed Investment | Current Cost or Service Level | New Cost or Service Level | Benefit
Workstation deployment | Investment in automated deployment tools | $350/workstation | $90/workstation | $260 savings per workstation deployed
Server deployment | Investment in automated server deployment and configuration tools | $450/server | $125/server | $325 savings per server deployed
Average time to resolve basic Help desk issues | Purchase of automated Help desk system | ~4 hours | ~1.25 hours | Reduction in average resolution time by ~2.75 hours per issue
Server patch management | Automated security management solution | ~12 hours per server per year | ~2 hours per server per year | Reduction in time and effort required to maintain servers by approximately 10 hours per server per year

Table 3: Improving IT product profitability.
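The benefit of each investment follows directly from the difference between the current and new cost columns of Table 3. The sketch below works through that arithmetic for the two deployment rows and adds a simple break-even estimate. The tool price of $20,000 is a hypothetical figure introduced only for the example, as are the function names.

```python
import math

# Cost figures taken from the "current" and "new" columns of Table 3.
investments = {
    "Workstation deployment": (350.00, 90.00),   # dollars per workstation
    "Server deployment": (450.00, 125.00),       # dollars per server
}


def savings_per_unit(current_cost, new_cost):
    """Savings for each unit deployed after the investment."""
    return current_cost - new_cost


def break_even_units(tool_price, current_cost, new_cost):
    """Units that must be deployed before the tool pays for itself (rounded up)."""
    per_unit = savings_per_unit(current_cost, new_cost)
    if per_unit <= 0:
        raise ValueError("the investment does not reduce the per-unit cost")
    return math.ceil(tool_price / per_unit)


TOOL_PRICE = 20_000.00  # hypothetical purchase price, not from the source

for name, (current, new) in investments.items():
    print(name, savings_per_unit(current, new),
          break_even_units(TOOL_PRICE, current, new))
```

Under the assumed tool price, the workstation tool would pay for itself after 77 deployments and the server tool after 62. The same calculation can feed an inter-department charge-back model, since it expresses the cost of a service per unit delivered.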

Overall, there are numerous benefits that stand to be gained by having IT departments treat users and other business units as customers. By identifying groups of users, determining their needs, and developing products and services, IT organizations can take advantage of the many best practices utilized by successful companies. Doing so will translate into a better alignment between IT departments and other areas of the organization and can help to reduce costs.


Total Cost of Ownership
Determining total cost of ownership (TCO) involves enumerating all the time and effort-related costs in relation to implementing and maintaining IT assets. The main concept of TCO is that the initial purchase price of a technology is often just a very small portion of the total cost. Organizations benefit from getting a better handle on complete expenses related to their technology purchases and considering the many different types of charges that should be taken into account. This information will illuminate ways in which data center automation can help reduce costs and increase efficiencies.

Measuring Costs
Costs related to the management of IT hardware, software, and network devices can come from many areas. Figure 9 illustrates the types of costs that might be associated with a typical IT purchase.
Note that the numbers are hypothetical approximations and that they will vary significantly based on the size and amount of automation in various data center environments.

TCO Breakdown by Cost Category
• Labor Costs: 47%
• Infrastructure Costs: 25%
• Initial Capital Costs: 22%
• Misc. / Other Costs: 6%

Figure 9: TCO breakdown by costs.


Identifying Initial Capital Costs
When asked about the cost of a particular product or service, IT staff will generally think first about the purchase or “sticker” price of the device. Servers, network infrastructure devices, and workstations all have very readily visible hard costs associated with them. The list of these basic costs should include at least the following:
• Capital equipment purchases: These costs are probably the ones that first come to mind for IT managers. For example, a new network-attached storage device might cost $6500 to purchase. The charge is usually paid one time (at the time of purchase) and is related to the physical device or technology itself.
• Financing charges: IT organizations that choose to lease or finance the initial purchase price of capital assets will need to factor in financing charges. Depending on the terms of the lease, amounts may be due monthly, quarterly, or annually.
• Handling and delivery charges: Procuring new IT hardware requires basic transportation and handling costs. These are often included in part of the purchase transaction, but in some cases, additional delivery fees might be required.

Generally, these costs are easy to see. They will show up on invoices and purchase orders, and can be tracked by IT managers and accounting staff without much further research. However, they account for only one portion of the total cost.

Enumerating Infrastructure Costs
IT organizations should keep a close watch on ongoing costs that are related to supporting and maintaining the devices. Regardless of whether an IT organization owns its data center facilities, it will need to factor in costs for several areas of operation:
• Power: Costs related to electricity can represent a significant portion of total IT expenditures. Although it can be challenging to isolate the exact amount of power used by each device, average costs per amp of power can be calculated and distributed based on devices’ requirements. Power is required for basic functioning of the device as well as for cooling management. Furthermore, as the price of electricity can vary significantly, many financial predictions can partially hinge upon numbers that are outside an organization’s control.
• Support contracts and maintenance fees: Many IT hardware and software vendors offer (or require) support and maintenance contracts for their devices. These costs may be one-time additions to the purchase price, but generally, they’re paid through a monthly or annual subscription.
• Infrastructure costs: Whenever new IT equipment is deployed, physical space must be made for the new devices. Supporting infrastructure resources must also be provided. This infrastructure might include additional network capacity (switch ports and bandwidth), rack space, and any required changes to cooling and power capabilities.
• Network bandwidth: The implementation of new devices on the network will often require some incremental increase in network connectivity. Although additional costs are usually not required for every new device, total network-related costs should be distributed over all the workstations, servers, and networks that are supported.

By factoring in the ongoing support and maintenance costs, one more piece of the TCO puzzle is added.
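The per-amp allocation of power costs mentioned above can be sketched in a few lines. The monthly bill, device names, and amperage figures below are hypothetical, and the example assumes that the bill already includes the power used for cooling.

```python
def allocate_power_cost(total_power_cost, amps_by_device):
    """Distribute a total power bill across devices in proportion to rated amps."""
    total_amps = sum(amps_by_device.values())
    if total_amps <= 0:
        raise ValueError("total amperage must be positive")
    cost_per_amp = total_power_cost / total_amps
    return {device: round(amps * cost_per_amp, 2)
            for device, amps in amps_by_device.items()}


# Hypothetical monthly bill (in dollars) and per-device power requirements.
monthly_bill = 12_000.00
amps = {"web-01": 2.0, "db-01": 4.0, "san-01": 6.0}

shares = allocate_power_cost(monthly_bill, amps)
print(shares)  # {'web-01': 2000.0, 'db-01': 4000.0, 'san-01': 6000.0}
```

Allocating by rated amperage is only an approximation of actual consumption, but it gives each device a share of the bill that can be added to its ongoing cost.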

Capturing Labor Costs
Once the initial purchase price of the physical device and related infrastructure is factored in, it’s time to look at human resources. Labor-related costs associated with the entire life cycle of an IT purchase often account for a large portion of the TCO for a particular device. Typical areas include:
• Selection: The time it takes to evaluate various solutions and determine configurations can affect the overall cost of an IT investment.
• Deployment: After equipment is delivered, labor is required in order to physically “rack” the new device and to configure it for use. Some cost reductions can be realized when installing numerous devices at the same time.
• Configuration and testing: New computers and network devices rarely come from the factory completely ready for use. Initial configuration often requires significant time, especially if it is a new device with which the IT department is unfamiliar. Testing is critical in order to ensure a smooth deployment experience.
• Systems administration: Application and operating system (OS) updates, performance monitoring, security management, and other routine tasks can add up to significant ongoing computing costs.
• Replacement: It’s no secret that all IT investments have limited useful life spans and must be replaced eventually. The cost of removing and replacing old hardware should be factored into the total cost.

To calculate labor-related costs, IT managers should group their employees into specific skill areas and determine average per-hour costs for those personnel. In many cases, it might make more sense to determine an average value for the number of hours spent on certain tasks. Some tasks, such as fixing hardware failures, may be performed infrequently, and only on a few machines in the environment. For these cases, IT management can calculate the total number of hours spent repairing hardware problems, then divide that by the total number of servers in the environment. The result is a useful “cost per server” amount that can then be factored into other technology decisions.

Measuring TCO
There are many challenges that IT organizations will face when trying to calculate TCO for the devices they support. The main problem is in determining cost-related numbers. Some of this information can come from reports by IT staff, but that data is often incomplete. Asset management tools can greatly help keep track of “hard costs,” especially those related to new purchases. These tools generally allow factoring in finance costs, operating costs, and depreciation, all of which can be important for determining TCO. A good source for labor-related costs can be an automated Help desk solution and change and configuration management tools. IT staff can easily report on the amount of time they’ve spent on specific issues by using these tools.
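The “cost per server” calculation described above can be written out as follows. The skill areas, hours, hourly rates, and server count are hypothetical values chosen to keep the arithmetic easy to follow.

```python
def labor_cost_per_server(hours_by_skill, hourly_rate_by_skill, server_count):
    """Average labor cost per server for a task performed on only some machines.

    hours_by_skill maps a skill area to total hours spent on the task;
    hourly_rate_by_skill maps the same skill areas to an average hourly cost.
    """
    if server_count <= 0:
        raise ValueError("server_count must be positive")
    total_cost = sum(hours * hourly_rate_by_skill[skill]
                     for skill, hours in hours_by_skill.items())
    return total_cost / server_count


# Hypothetical annual figures for hardware repairs.
repair_hours = {"hardware technician": 120, "systems administrator": 40}
hourly_rates = {"hardware technician": 45.0, "systems administrator": 60.0}

cost = labor_cost_per_server(repair_hours, hourly_rates, server_count=200)
print(cost)  # 39.0
```

In this example, 160 hours of repair work costing $7,800 is spread across 200 servers, giving $39 per server per year even though only a few machines actually failed.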


Reducing TCO Through Automation
Many of the real costs related to technology investments are related to deployment and ongoing management. Data center automation tools can greatly help in measuring and reducing TCO. Because these tools provide reports of the time and effort required to maintain servers, workstations, and network devices, IT managers can get a more accurate picture of total costs. Once an organization has a good idea of where its major operational costs are coming from, it can use this information to start reducing those costs. Most IT organizations will find that they spend significant amounts of money on basic labor-intensive operations that can be quickly and easily automated through the use of the right tools. For example, if a major cost component related to supporting servers is deployment, automated server provisioning tools can help lower those costs. Similarly, if a large portion of the expenses comes from ongoing maintenance, automated monitoring, change, and configuration management solutions can help dramatically. Overall, by keeping in mind the components of TCO, IT departments can make better decisions related to managing and lowering the costs associated with service delivery.

Reporting Requirements
An old management saying states that “If you can’t measure it, you can’t manage it.” The idea is that, without knowing what is occurring within the business, managers will be unable to make educated decisions. This idea clearly applies to IT environments, where major changes happen frequently and often at a pace that is much faster than that of other areas of the business. That is where reporting comes in: the goal is for IT management to gain the insight they need to make better decisions. It helps to know which types of reports are useful and how these reports can be generated.

Identifying Reporting Needs
The first step in determining reporting requirements is to determine what types of information will be useful. Although it’s tempting for technical staff to generate every possible report (just because it’s possible), the real initial challenge is in identifying which information will be most useful.

Configuration Reports
Configuration reports show IT managers the current status of the hardware, software, and network environments that they support. Details might include the configuration of specific network devices such as routers or firewalls, or the status of particular servers. Basic configuration information can be obtained manually through the use of tools such as the Windows System Information Application (as shown in Figure 10).


Figure 10: Viewing configuration details using the Windows System Information tool.

These reports help IT managers identify underutilized resources and spot potential capacity or performance problems. They are also instrumental in ensuring that all systems are kept up to date with security and application patches. Reporting solutions should be able to track assets that are located in multiple sites (including those that are hidden away in closets at small branch offices) to ensure that nothing is overlooked.

Service Level Agreement Reporting
An effective method for ensuring that IT departments are meeting their users’ needs is through the use of well-defined Service Level Agreements (SLAs). An SLA might specify, for example, how much downtime is acceptable for a specific application, or it might define an expected turnaround time for the deployment of a new server. These agreements can affect operations throughout the business, so IT managers generally want to keep a close watch on them. SLA reporting features will allow IT staff to specify thresholds for specific metrics (such as server deployment time), then provide for the creation of reports that show the actual values compared against the agreed-upon values. These reports can also be helpful in improving the perception of IT throughout an organization (assuming, of course, that service levels are met).
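The idea of comparing measured values against agreed-upon thresholds can be illustrated with a short Python sketch. The metric names and numbers below are invented for the example, and the sketch assumes every metric is a “lower is better” value with a maximum acceptable limit; real SLA reporting tools will differ.

```python
def sla_report(agreed, actual):
    """Compare measured values with agreed-upon limits.

    agreed -- dict: metric name -> maximum acceptable value
    actual -- dict: metric name -> measured value
    Returns a list of (metric, agreed, actual, status) rows, sorted by metric.
    """
    rows = []
    for metric in sorted(agreed):
        limit = agreed[metric]
        measured = actual.get(metric)
        if measured is None:
            status = "NO DATA"      # nothing was measured for this metric
        elif measured <= limit:
            status = "MET"
        else:
            status = "MISSED"
        rows.append((metric, limit, measured, status))
    return rows


# Hypothetical thresholds and measurements.
agreed = {"server deployment time (days)": 4, "monthly downtime (minutes)": 45}
actual = {"server deployment time (days)": 3, "monthly downtime (minutes)": 70}
for row in sla_report(agreed, actual):
    print(row)
```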


Real-Time Activity Reporting
Most IT departments are characterized by rapid changes in short amounts of time. Many of these changes occur in reaction to changing business requirements; others are performed in order to improve the IT infrastructure. It can be very difficult to keep track of all of the changes that are occurring on workstations, servers, network devices, and applications. In this arena, real-time reporting can help. The information in these reports is always kept up to date and can be referenced many times during the day. Ideally, the reports will include details about what changed, why the change was made, and who performed the change. This information can greatly assist business and technical staff in coordinating their activities and troubleshooting problems.

Regulatory Compliance Reporting
Many industries are required to comply with government and industry-specific regulations to ensure that their operations are within guidelines. Examples include the Sarbanes-Oxley Act (for public companies) and the Health Insurance Portability and Accountability Act (HIPAA, for the healthcare industry). Organizations in these industries must not only follow the rules but also be able to prove it. Regulatory compliance reports gather metrics about the current IT environment, then compare this data against specific regulatory requirements. With this information, IT management can quickly identify any deficiencies that must be resolved.

Generating Reports
Once an organization has determined the requirements for its reports, it can start looking at how the reports can be generated. There are many ways in which report creation and generation can be simplified.

Using a Configuration Management Database
Determining the source of reporting data can be difficult in many IT organizations. Data tends to be stored in a variety of different “systems,” including paper-based records, spreadsheets, custom database solutions, and enterprise systems. It can be very difficult to bring all this information together due to differences in the types of data and how information is structured. By using a centralized Configuration Management Database (CMDB), IT departments can store the information they need within a single solution. This data store greatly simplifies the creation and generation of reports, and can help ensure that no information is overlooked (see Figure 11).
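To make the benefit of a single data store concrete, the following Python sketch uses the standard `sqlite3` module as a stand-in for a CMDB. The table layout, device names, sites, and patch-level strings are all hypothetical; a real CMDB product will have its own schema and query interface. The point is only that, once configuration data lives in one place, a report becomes a single query.

```python
import sqlite3


def build_cmdb():
    """Create a tiny in-memory stand-in for a CMDB (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE config_items (
               name TEXT PRIMARY KEY,
               kind TEXT,
               site TEXT,
               patch_level TEXT   -- assumed format: YYYY-MM
           )"""
    )
    conn.executemany(
        "INSERT INTO config_items VALUES (?, ?, ?, ?)",
        [
            ("web01", "server", "HQ", "2006-09"),
            ("db01", "server", "HQ", "2006-06"),
            ("fw-branch", "firewall", "Branch office", "2006-04"),
            ("rtr01", "router", "HQ", "2006-09"),
        ],
    )
    return conn


def unpatched_items(conn, required_level):
    """Return (name, site) for every item below the required patch level."""
    cur = conn.execute(
        "SELECT name, site FROM config_items "
        "WHERE patch_level < ? ORDER BY name",
        (required_level,),
    )
    return cur.fetchall()


cmdb = build_cmdb()
for name, site in unpatched_items(cmdb, "2006-09"):
    print(f"{name} ({site}) needs patching")
```

Because servers, firewalls, and routers from every site share one table, the branch office firewall shows up in the same report as the data center servers.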


Figure 11: How a CMDB can help facilitate reporting.

Automating Report Generation
One of the challenges related to many types of reporting is that, as soon as the report is generated, it’s out of date. Fortunately, electronic report distribution methods can help alleviate this problem. Automated IT reporting solutions can provide numerous features:
• On-demand reporting: Whenever necessary, users should be able to generate up-to-the-second reports on demand. This type of reporting is particularly useful when managers want to closely track information that might change during the day. In larger IT environments, reports might take a significant amount of time to generate, so scheduling options can be helpful.
• Automatic report distribution: Many business processes revolve around regular meetings and review processes. Automated reporting solutions that have the ability to automatically send reports based on a predefined schedule can help ensure that everyone is kept up to date. Reports can be distributed via an intranet site or through email.
• Alerts: IT managers often expect their staff to notify them if some aspect of the organization needs special attention. The same requirement is true for reporting. Reporting solutions can provide the ability to set alerts and thresholds that can highlight particularly important or interesting aspects within reports. For example, if downtime has currently exceeded the limits specified by the service level for the Engineering department, IT managers could have the report highlight this in red.
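The alert feature in the list above amounts to comparing each report row against a threshold and marking the rows that exceed it. The following Python sketch shows one way this might look in a plain-text report; the department names, downtime figures, and limits are invented, and a text flag stands in for the red highlighting a real reporting tool would apply.

```python
def render_downtime_report(downtime_minutes, limits):
    """Return report lines, flagging departments over their downtime limit.

    downtime_minutes -- dict: department -> downtime so far this period
    limits           -- dict: department -> limit from the service level
    """
    lines = []
    for dept in sorted(downtime_minutes):
        used = downtime_minutes[dept]
        limit = limits[dept]
        # "ALERT" marks the rows a graphical report might show in red.
        flag = "ALERT" if used > limit else "ok"
        lines.append(f"{dept:<12} {used:>5} / {limit:<5} {flag}")
    return lines


# Hypothetical figures.
downtime = {"Engineering": 95, "Accounting": 10}
limits = {"Engineering": 60, "Accounting": 120}
print("\n".join(render_downtime_report(downtime, limits)))
```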

Overall, through the use of automated reporting, IT departments can gain the information they need to make better decisions about their business and technical operations. The result is reductions in cost and improvements in service levels.


Network and Server Convergence
Over time, IT applications have evolved to become increasingly reliant on many components of an IT infrastructure. In the past, it was common for even enterprise-level applications to be hosted on one or a few servers, and many organizations’ networks were centralized. The job of the network was to ensure that clients could connect to these servers. Modern enterprise applications are significantly more complex and often depend on dozens of different portions of an IT environment working properly. Server and network management have converged to a point at which they’re highly inter-dependent.

Convergence Examples
In typical IT environments, there are many examples of devices that blur the line between network and server operations. Dedicated network appliances, such as network-attached storage (NAS) devices, firewalls, proxy servers, caching devices, and embedded Web servers, all rely on an underlying OS. For example, although some NAS devices are based on a proprietary network operating system, many devices include optimized versions of Windows (such as the Windows Storage Server) or Linux platforms. Figure 12 shows an example of this configuration.

Figure 12: Components of a typical NAS device “stack.”

In many of these systems, there are clear advantages to this type of configuration. For example, several major firewall solutions run on either Windows or Linux platforms. The benefit is that systems administrators can gain the usability features of the underlying OS while retaining the desired functionality. From a management standpoint, however, this configuration might require a change to the standard paradigm—to ensure that the device is performing optimally, network and systems administrators must share these responsibilities.

Determining Application Requirements
One of the most important functions of IT departments is ensuring that users can access the applications they need to perform their jobs. The applications themselves rely on server resources as well as the network. An important first step in managing this convergence is to identify the requirements that applications may have. Table 4 provides an example of some typical types of applications and their dependencies.

Application | Servers Required | Networks Required
CRM | AppServer01, WebServer01, DatabaseServer01 | VPN access for all branch offices and traveling users
Engineering/QA Defect Tracker | Engineering01, Engineering03, EngineeringDB05 | Engineering network; Internet access and VPN access (home users)
Intranet | IntranetServer01 (cluster), KnowledgeMgmtDB | Corporate LAN, branch office WANs, and VPN

Table 4: Identifying application requirements and dependencies.
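Once dependencies like those in Table 4 are recorded, they can be queried in reverse: given a failed server or network, which applications are affected? The Python sketch below encodes the table as a dictionary. The server names come from Table 4, but the short network labels (such as "VPN") are simplifications introduced for this example.

```python
# Dependencies from Table 4; network names are shortened labels (assumption).
APP_DEPENDENCIES = {
    "CRM": {
        "servers": {"AppServer01", "WebServer01", "DatabaseServer01"},
        "networks": {"VPN"},
    },
    "Engineering/QA Defect Tracker": {
        "servers": {"Engineering01", "Engineering03", "EngineeringDB05"},
        "networks": {"Engineering network", "Internet", "VPN"},
    },
    "Intranet": {
        "servers": {"IntranetServer01", "KnowledgeMgmtDB"},
        "networks": {"Corporate LAN", "Branch office WAN", "VPN"},
    },
}


def affected_applications(component, dependencies=APP_DEPENDENCIES):
    """Return the sorted names of applications that rely on a component."""
    return sorted(
        app
        for app, deps in dependencies.items()
        if component in deps["servers"] | deps["networks"]
    )


print(affected_applications("DatabaseServer01"))
print(affected_applications("VPN"))
```

A lookup like this shows at a glance that a VPN problem touches every application in the table, while a single database server failure is limited to the CRM system.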

By highlighting these requirements, IT staff can better visualize all the network and server infrastructure components that are required to support a specific application.

The Roles of IT Staff
In the past, IT operations tended to be specialized in numerous isolated roles. A typical staff might include network specialists, database specialists, server administrators, and application managers. It was often acceptable for each of these administrators to focus on his or her area of expertise with limited knowledge of the other areas. For most modern IT organizations, this structure has changed. Systems administrators, for example, often need strong network skills in order to complete their job roles. And application developers must take into account the underlying network and server infrastructure on which their programs will run. Unfortunately, it’s impractical to expect all IT staff members to have strong skills in all of these areas. To address this issue, IT departments must rely on strong coordination between the many functional areas of operations to ensure that applications continue to function properly.

Managing Convergence with Automation
When handled manually, it can be challenging for IT staff to develop the levels of coordination that are required to ensure that converged applications are managed properly. However, many of the features of data center automation tools can help. First, by storing network- and server-related configuration details in a single Configuration Management Database (CMDB), IT staff can more easily see the inter-dependencies of the devices they support. Change and configuration management tools can help ensure consistency in how these devices are managed. Overall, the IT staff can better manage the complexity resulting from the convergence of network and server management through the use of data center automation tools.


Service Level Agreements
The primary focus of IT departments should be meeting the requirements of other members of their organizations. As businesses have become increasingly reliant on their technology investments, people ranging from desktop users to executive management have specific expectations related to the levels of service they should receive. Although these expectations sometimes coincide with understandings within an IT organization, in many cases, there is a large communications gap. Service Level Agreements (SLAs) are intended to establish, communicate, and measure the levels of service that will be provided by IT departments. They are mutually agreed-upon definitions of scope, expected turnaround times, quality, reliability, and other metrics that are important to the business as a whole.

Challenges Related to IT Services Delivery
In some areas of IT, the job can be rather thankless. In fact, it is sometimes said that no one even thinks about IT until something goes wrong. Although many organizations see investments in IT as a strategic business investment, others see it only as a cost center. The main challenge is to come to an understanding that includes the capabilities of the IT department and the expectations of the “customers” it serves. That is where the idea of service levels comes in. To reach that understanding, IT departments can think of themselves as outside vendors that are selling products and services to other areas of their organization. Let’s look at some details related to defining these agreements.

Defining Service Level Requirements
SLAs can be set up in a variety of ways, and there are several approaches that can be taken toward developing them. One common factor, however, is that all areas of the organization must be involved. SLAs are not something that can be developed by IT departments working in isolation. The process will require research and negotiations in order to determine an appropriate set of requirements. Figure 13 provides an overview of the considerations that should be taken into account. Let’s look at the process of defining SLAs.


Figure 13: The process of developing, implementing, and evaluating SLAs.

Determining Organizational Needs
IT departments can benefit from thinking of their services as “products” and the users and business processes they support as “customers.” In this model, the goal of the IT department is to first determine which services the customer needs. This is perhaps the single most important part of the process: IT managers must meet with users and other managers throughout the organization to determine what exactly they need in order to best accomplish their goals. This process can be extremely valuable and enlightening by itself. It’s very important to keep the main goal in mind: To determine what organizations truly need, rather than what would just be nice to have.


Identify Service Level Details
The next step is to start trying to define specific details related to what service levels should be accepted. This process will ideally work as a negotiation. A manager from the Engineering department might want all new server deployments to be completed within 2 days of the request. Based on IT staff and resources, however, this might not be possible. The IT manager might present a “counter-offer” of a turnaround time of 4 days. If this isn’t acceptable, the two can discuss alternatives that might make the goal more attainable. In this example, an investment in automated server deployment tools, virtualization, or additional dedicated staff might all be possible ways to meet the requirements.

When discussing goals, it’s important for business leaders to avoid diving too far into technical details. For example, rather than requesting a “clustered database solution for the CRM application,” it is better for a Marketing manager to state the high-level business requirement, “We need to ensure that, even in a worst-case scenario, our people can access the CRM application.” In this particular case, it might well be that the best technical solution doesn’t involve clustering at all. The bottom line is that it’s the job of IT to figure out how to meet the requirements.

A major benefit of this negotiation process is that it forces both sides to communicate details of their operations, and it allows each side to compromise to find a solution that works within given constraints. Occasionally, it might seem impossible for an IT department to meet the needs of a particular business area. In this case, either expectations have to be adjusted or additional budgetary and staffing resources might be required. In any case, communicating these issues makes the topics open and available for discussion. Once acceptable terms have been reached, it’s time to determine what to include in the SLA.

Developing SLAs
There are several important points to include in a complete SLA. Of course, it begins with a description of what level of service will be provided. At this point, the more detailed the information, the better it will be for both sides. Details should include processes that will be used to manage and maintain SLAs. For example, points of contact should be established on the IT and business sides in case a certain level is not being met. In many cases, IT departments might find that many different service level requirements overlap. For example, several departments might require high availability of Virtual Private Network (VPN) services in order to support traveling users and remote branch offices. Identifying this overlap can help IT managers prioritize initiatives to best meet their overall goals. In this example, by adding better monitoring and redundancy features into the VPN, all areas of the organization can benefit.


Delivering Service Levels
IT managers might have some level of fear when committing to specific service levels. Due to the nature of technology, it’s quite possible that situations could arise in which SLAs cannot be met (at least not for all areas of the organization). An extreme example might be the “perfect storm” of industry-wide hardware shortages combined with a lack of staff. In such a case, circumstances beyond the control of an organization can cause failures to meet the predefined goals. Overall, IT departments and business leaders should treat SLAs like they would any other target (such as sales-related goals or Engineering milestones). Ideally, the levels will always be met. But, when they’re not, everyone involved should look into the issues that caused the problem and look at how it can be resolved and avoided in the future. Even in the worst case, having some well-defined expectations can help avoid miscommunications between IT and its customers.

The Benefits of Well-Defined SLAs
When implemented properly, SLAs can help make the cost and challenges related to IT operations a part of the entire organization. By providing some level of visibility into IT operations and costs, SLAs give other departments an idea of the amount of work involved. This can help manage expectations. For example, once the Accounting department understands the true cost of ensuring automated failover, perhaps it might decide that some unplanned downtime is acceptable. IT management can benefit greatly from the use of SLAs. They can use these agreements to justify expenditures and additional staff if appropriate resources are not available to meet the required levels. When these issues are communicated up front, either the service levels must be lowered or the necessary resources must be made available. Either way, the decision is one that the organization can make as a whole. Another major benefit of using SLAs is that the value of investments in technologies such as data center automation products can become much more evident. When relatively small investments can quickly return increases in service levels, this is a clear win for both the IT department and the users it supports.

Enforcing SLAs
When dealing with outside parties, an agreement is often only as strong as the terms of any guarantee or related penalties. Because most IT departments tend to be located in-house, it’s generally not appropriate to add financial penalties. Thus, the enforceability of SLAs will be up to the professionalism of the management team. When goals are not being met, reasons should be sought out and the team should work together to find a solution. SLAs should be seen as flexible definitions, and business leaders should expect to adjust them regularly. As with other performance metrics, organizations might choose to tie salary and performance bonuses to SLAs. Perhaps the biggest challenge is that of prioritization. Given a lack of labor resources, what is more important: uptime for the CRM application or the deployment of new Engineering servers? To help in these areas, IT managers might want to schedule regular meetings, both inside and outside of the IT department, to be sure that everyone in the organization understands the challenges.

Examples of SLAs
The actual details of SLAs for organizations will differ based on specific business needs. However, there are some general categories that should be considered. One category is that of application, hardware, and service uptime. Based on the importance of particular portions of the IT infrastructure, availability and uptime goals can be developed. Other types of SLAs might focus on deployment times or issue resolution times. Table 5 provides some high-level examples of the types of SLAs that might be developed by an organization. The examples focus on numerical metrics, but it’s also important to keep in mind that “soft metrics” (such as overall satisfaction with the Service Desk) might also be included.
SLA Area | Metrics | Goal | Notes/Terms
CRM Application Uptime | Percent availability | 99.9% availability | Excludes planned downtime for maintenance operations and downtime due to unrelated network issues; major application updates might require additional planned downtime
Service Desk: Level 1 Issue Resolution | Issue Resolution Time | 4 business hours | Include definition of “Level 1 Issues”
Service Desk: Level 2 Issue Resolution | Issue Resolution Time | 8 business hours | Time is measured from original submission of issue to the Service Desk; include definition of “Level 2 Issues”
Engineering: New Server Deployments (Physical machine) | Time to deployment | 3 days | Time is measured from when formal change request has been approved; SLA applies only to servers that will be hosted within the data center
Engineering: New Server Deployments (Virtual machine) | Time to deployment | 2 hours | Virtual machines must use one of the three standard configuration profiles; time is measured from when formal change request has been approved.

Table 5: Examples of SLAs.
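It can help to translate an availability goal such as the 99.9% figure in Table 5 into minutes. Over a 30-day period (43,200 minutes), 99.9% availability leaves a budget of roughly 43 minutes of unplanned downtime. The Python sketch below shows the arithmetic; it assumes, as the table’s notes suggest, that planned maintenance is excluded from the measurement. The 30-day period and the sample downtime figures are illustrative.

```python
def allowed_downtime_minutes(goal_percent, period_minutes):
    """Downtime budget implied by an availability goal over a period."""
    return period_minutes * (100.0 - goal_percent) / 100.0


def availability_percent(period_minutes, planned_minutes, unplanned_minutes):
    """Availability with planned maintenance excluded from the period."""
    scheduled = period_minutes - planned_minutes
    if scheduled <= 0:
        raise ValueError("planned downtime covers the whole period")
    return 100.0 * (scheduled - unplanned_minutes) / scheduled


MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200

budget = allowed_downtime_minutes(99.9, MINUTES_PER_30_DAYS)
measured = availability_percent(
    MINUTES_PER_30_DAYS, planned_minutes=240, unplanned_minutes=30
)
print(f"Downtime budget: {budget:.1f} minutes")
print(f"Measured availability: {measured:.3f}% (goal met: {measured >= 99.9})")
```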

Now that we’ve looked at some examples, let’s see how IT organizations can keep track of SLAs.


Monitoring and Automating SLAs
Once SLAs have been put into place, it’s up to the IT department to meet the goals that have been agreed upon. Although some environments might attempt to handle issues only when they arise, the ideal situation is one in which IT managers regularly produce reports showing SLA-related performance. This can be done manually, but in many cases, the management and process overhead related to tracking issue resolution times and uptime can be significant. One important way in which SLAs can be better monitored and managed is through the use of data center automation tools. Integrated platforms include features for monitoring uptime, automating deployment, and tracking changes. They can also provide IT managers with the ability to define service levels and measure their actual performance against them. Reports can be generated comparing actual performance with expected performance. Without these reports, people might have to guess whether SLAs are being met. And the inevitable perception issues can negate many of the advantages of having created the SLAs in the first place. Overall, through the establishment of SLAs, IT departments can verify that they are meeting their customers’ requirements and ensure that the organization is receiving the expected value from its IT investments.
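As one illustration of comparing actual performance with expected performance, the Python sketch below computes the share of Service Desk issues resolved within the goals from Table 5 (4 business hours for Level 1, 8 for Level 2). The ticket data is invented, and the sketch assumes resolution times have already been converted to business hours by whatever tool recorded them.

```python
# Goals from Table 5, in business hours.
RESOLUTION_GOALS = {1: 4.0, 2: 8.0}


def compliance_by_level(tickets, goals=RESOLUTION_GOALS):
    """Return {level: percent of tickets resolved within the goal}.

    tickets -- iterable of (level, resolution_time_in_business_hours)
    """
    totals = {}
    within = {}
    for level, hours in tickets:
        totals[level] = totals.get(level, 0) + 1
        if hours <= goals[level]:
            within[level] = within.get(level, 0) + 1
    return {
        level: 100.0 * within.get(level, 0) / count
        for level, count in totals.items()
    }


# Hypothetical resolved tickets: (issue level, business hours to resolve).
tickets = [(1, 2.0), (1, 5.5), (1, 3.0), (1, 4.0), (2, 7.0), (2, 9.0)]
for level, percent in sorted(compliance_by_level(tickets).items()):
    print(f"Level {level}: {percent:.0f}% resolved within goal")
```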

Network Business Continuity
Network business continuity focuses on ensuring that network operations will resume as quickly as possible after a major outage or disaster. The goal is to limit the disruption to service caused by the failure of a device, a network, or even an entire data center. Most implementations will involve a backup site and a process for failing over to that site, when needed. There are many factors that IT managers should keep in mind when developing network business continuity plans.

The Benefits of Continuity Planning
Business continuity, in general, has become increasingly important for many types of organizations. Customers and business partners have become increasingly reliant on applications and services, and even minor downtime can cause significant financial losses. For example, the loss of connectivity lasting a few minutes for a financial institution can result in lost revenues and reduced customer confidence, both of which would be difficult to regain. The list of things that can go wrong is a long one, ranging from issues with electricity to widespread natural disasters. Business continuity planning attempts to mitigate these risks by planning for processes that will resume normal operations, even in a worst-case scenario.


Developing a Network Business Continuity Plan
The success of any continuity process hinges on its accuracy and alignment with business needs. This section will look at the many considerations that should be taken into account when developing a network business continuity plan. Figure 14 shows a high-level view of the processes that should be included.

Figure 14: Example steps of a network business continuity plan.

Defining Business Requirements
The first step in developing a network business continuity plan is to determine the organization’s requirements. Although all systems are important, certain areas of the network might be more important than others. The most important aspect of determining requirements is to involve the entire organization. The IT department shouldn’t rely on its own knowledge to make important decisions related to the most important areas of the computing infrastructure. Given infinite resources, multiple duplicate network environments might be possible. In the real world, it’s much more likely that budget and labor constraints will restrict the reasonable level of protection against failures and disasters. A realistic plan should include discussions of the costs of downtime, the effects of data loss, and the importance of various areas of the network. Ideally, a list of critical systems will be developed based on input from the organization’s entire management team.

Identifying Technical Requirements
Modern IT networks tend to be complicated. There are many interdependencies between devices such as switches, routers, firewalls, and network caching devices. And this list doesn’t even include details related to which devices are relying on that infrastructure. When planning for business continuity, IT staff should first develop a high-level overview of the network topology and should outline critical systems. The goal is to ensure that the base levels of the infrastructure (which will be required by all other systems) are identified. The next step is to enumerate which devices will be required in the event of a failover process. Core routers, switches, and firewalls will probably be the first items on the list. Next would be devices required to support the most important applications and services on the network. Considerations should include how the network can run with reduced capacity (particularly if the budget doesn’t allow for full redundancy).
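Enumerating the devices required for failover is essentially a dependency-graph problem: start from the critical applications, collect everything they depend on, and bring the base levels of the infrastructure up first. The Python sketch below illustrates this with the standard `graphlib` module (Python 3.9 or later). The topology is entirely hypothetical, and the device names are placeholders rather than a recommended design.

```python
from graphlib import TopologicalSorter

# Hypothetical topology: each item maps to the items it depends on.
DEPENDS_ON = {
    "core-router": set(),
    "core-switch": {"core-router"},
    "firewall": {"core-router"},
    "DatabaseServer01": {"core-switch"},
    "AppServer01": {"core-switch", "DatabaseServer01"},
    "CRM": {"AppServer01", "firewall"},
    "IntranetServer01": {"core-switch"},
    "Intranet": {"IntranetServer01"},
}


def required_for(targets, depends_on):
    """Return every item needed, directly or indirectly, by the targets."""
    needed = set()
    stack = list(targets)
    while stack:
        item = stack.pop()
        if item not in needed:
            needed.add(item)
            stack.extend(depends_on.get(item, ()))
    return needed


def bring_up_order(targets, depends_on):
    """Order the required items so each comes after its dependencies."""
    needed = required_for(targets, depends_on)
    graph = {item: depends_on.get(item, set()) & needed for item in needed}
    return list(TopologicalSorter(graph).static_order())


# If only the CRM application is deemed critical:
print(bring_up_order({"CRM"}, DEPENDS_ON))
```

In this made-up topology, the core router comes first and the CRM application last, while the intranet servers are left off the list because no critical application depends on them.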


Preparing for Network Failover
In the event of a network outage, failover processes must be performed. But before these steps can be taken, IT departments must ensure that they have the tools and information required. This section will take a look at some of the most important considerations.

Configuration Management
Keeping track of network configuration files is an important first step to enabling the failover process. In the event of a failover, restoring this information will help bring a network back to a usable state. Whenever configuration changes are made, network administrators must be sure that the change is recorded and replicated to any backup or standby devices.

Managing Network Redundancy
The implementation of redundancy is a major component of most business continuity plans. When planning for redundancy, it’s important to start with defining acceptable downtime limits and appropriate failover times. Most enterprise-level solutions offer options for enabling automatic failover of routers, switches, firewalls, content caches, and other network devices. It is important to keep in mind that, in the case of most failovers, the process might be noticeable to users (although the impact will hopefully be limited to a few connections that need to be reestablished).

Simulating Disaster Recovery Operations
An important (but often overlooked) aspect of any recovery process is to rehearse the failover and business continuity plan. There are many benefits to walking through this process. First, through a trial run, it’s likely that business and technical staff will find areas for improvement in the plan. Even the best planning can overlook some of the details that are revealed when performing the “real thing.” In the worst case, perhaps a critical system was completely overlooked. Or, there may be various time-saving changes that can be made to improve the process. Another major benefit of simulating disaster recovery is that practice builds expertise. IT staff should be well-versed in what is required to perform failover processes. There is one iron-clad rule related to testing recovery processes: Immediately after the failure of a critical system is not the time to start learning how to recover it.


Automating Network Business Continuity
There are many aspects of an organization’s network that must be considered when developing and preparing a business continuity plan. For most organizations, the tasks involved will require a lot of work. Fortunately, automated data center management tools can help make the process easier. For example, through the use of automated network discovery, network administrators can easily look at the overall network and discover interdependencies. And, through the use of configuration management (ideally with a configuration management database, or CMDB), accurate network device configuration details can be collected. The process of keeping routers, switches, and firewalls up to date at a backup site can also be performed automatically. Figure 15 provides an example of how this process might work.

Figure 15: Maintaining a failover configuration using data center automation tools.

Developing a network business continuity plan is no small task for most IT departments. Through the use of data center automation solutions, however, this critical task can be made much more manageable.
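To illustrate how discovered interdependencies can feed a continuity plan, the sketch below walks a dependency map to find every device affected by a single failure. It is a simplified example under stated assumptions: the map stands in for the output of a network discovery tool, and all device names are hypothetical.

```python
from collections import deque

# Hypothetical discovery output: each device maps to the devices it depends on.
DEPENDS_ON = {
    "web-server": ["access-switch"],
    "db-server": ["access-switch"],
    "access-switch": ["core-router"],
    "core-router": [],
    "backup-link": [],
}

def impacted_by(failed, depends_on):
    """Return the sorted devices that directly or indirectly depend on `failed`."""
    # Invert the map so we can walk from a device to its dependents.
    dependents = {}
    for device, upstreams in depends_on.items():
        for upstream in upstreams:
            dependents.setdefault(upstream, []).append(device)
    seen, queue = set(), deque([failed])
    while queue:
        for device in dependents.get(queue.popleft(), []):
            if device not in seen:
                seen.add(device)
                queue.append(device)
    return sorted(seen)

print(impacted_by("core-router", DEPENDS_ON))
```

A list like this helps decide which devices need standby counterparts first, because a failure high in the dependency chain affects the most systems.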

Remote Administration
In modern IT environments, systems and network administrators are often tasked with managing increasing numbers of devices without additional time and resources. In addition, the systems might be spread out over numerous sites. Centralized management can help meet these needs by increasing overall efficiency. IT staff should be able to manage devices that are located across the world just as easily as they can manage the computing devices on their desks. Remote administration can be used to improve systems and network administration in an IT environment.

The Benefits of Remote Administration
When thinking of desktop administration, the term “SneakerNet” (referring to the fact that systems administrators often spend much of their time and effort walking between systems) might come to mind. For this reason, remote administration is a concept that is usually an easy “sell” to IT departments. When you factor in the labor costs and time associated with physically traveling to remote offices and departments, it’s difficult to find a method that is less efficient than SneakerNet.

Before looking at specific requirements related to remote administration, let’s quickly cover some of the potential benefits. First and foremost, by centrally managing the configuration of hardware, software, and network devices, systems administrators can work from the comfort and convenience of their own workstations. Although having to deal with left-handed mice and custom keyboards might be a fun challenge, it’s clearly not efficient. The time saved by not walking between systems is another obvious benefit. For managing data center operations, you can increase security by limiting physical access to servers. From an end-user standpoint, having problems solved quickly and with minimal disruption to work is an important goal. By now, the benefits are probably pretty obvious. Let’s delve into what you should look for in a remote management solution.

Remote Administration Scenarios
From a technical standpoint, remote administration can take many forms. Perhaps the most familiar to systems administrators is that of managing servers located in the data center or troubleshooting end users’ desktop machines. Network administrators can also perform remote administration tasks to configure routers, switches, firewalls, and other devices. In distributed environments, the remotely managed device might be located a few feet away or half-way across the world. Some terms to be familiar with include the remote management host (the computer or device to which you are connecting), and the remote management client (which is usually implemented as software that is run on users’ workstations). Additionally, an organization might have specific tools for monitoring and managing machines remotely.

Remote Management Features
There are several important features to consider when evaluating and selecting a remote management solution:
• Broad support: The ideal remote management solution will be able to support a variety of device types, platforms, and versions. For example, in the area of desktop administration, the remote administration client should be able to connect to all of the operating systems (OSs) and versions that an organization regularly supports. Support for future OSs and products should also be taken into account. All of these platforms should be managed in a consistent manner.
• Reliability: As organizations depend on remote administration features for both routine and emergency operations, reliability is a major concern. The client- and server-sides of the remote management solution should be robust and dependable. Features that allow for remotely restarting a non-responsive host device can be helpful in a pinch. In addition, the ability to perform “out-of-band” management (that is, connections to a system by using non-standard connection methods) can help ensure that services are available when you need them most.
• Efficient bandwidth utilization: Remote management features should efficiently use network bandwidth. In some cases, remote administration connections may be made over high-bandwidth connections, so this won’t be an issue. However, when managing remote data centers, small branch offices, and international locations, using an efficient protocol can really help. Potential issues include low throughput rates and high latency on networks (both of which can make a remote connection practically unusable). Specific features to look for include the ability to provide for data compression, low average data rates, and ways to minimize latency given a variety of different network scenarios. In the area of desktop administration, for example, reducing the color depth, hiding desktop backgrounds, and changing screen resolution can help decrease requirements (see Figure 16).

Figure 16: Configuring video settings in the Windows XP Remote Desktop client.

• File transfers: In addition to controlling remote computers, help desk staff and systems administrators might need a quick and easy way to transfer files. In some cases, transfers can be handled outside of the remote administration solution by using standard network file transfer methods. In other cases, such as when a connection is made to a remote office or across multiple firewalls, a built-in solution that uses the same protocol and connection as the remote connection can be helpful.
• Shadowing support: For training and troubleshooting purposes, the ability to “shadow” a connection can be helpful. In this method, the remote user might have view-only privileges on the remote device. Or, a trainer might be able to demonstrate an operation on a remote computer without worrying about interruptions from a user.

In addition to these basic features, let’s look closer at details related to security.

Securing Remote Management
A critical concern related to remote management features is security. After all, if you’re adding a new way in which administrators can access your users’ computers (and the data they contain), what is to keep unauthorized users from doing the same? Fortunately, most modern remote management tools offer many capabilities to help address these concerns.

First, authentication security (controlling who can remotely access a machine) must be implemented. Through the simplest method, authentication security can take the form of a simple “shared secret” username and password combination. But this approach leaves much to be desired: by creating new login information, many potential security problems can be introduced. In addition, it’s difficult to manually manage these settings. For example, what happens when systems administrators enter and leave the company? For this purpose, reliance on directory services (such as Windows Active Directory, or AD) can help greatly. By centrally managing security settings and permissions, systems administrators can keep track of which users have access to remotely manage which resources.

The next type of security to consider is encryption. Most remote management tools will transfer sensitive information in some form. Even keystrokes and converted video displays can be misused if they’re intercepted. The security solution should provide for verification of the identity of local and remote computers (through the use of certificates or machine-level authentication) and should implement encryption of the packets that are being sent between the client and server.

Finally, a remote management solution should provide administrators with the ability to configure, review, and manage permissions related to remote management. In some cases, being able to remotely manage a computer or other device will be an all-or-nothing proposition: either the user will be able to fully control the device or they won’t. In other cases, such as in the case of remote desktop management, you might choose to restrict the operations that some users can perform. For example, a Level-1 Service Desk staff member might be allowed to only view a remote desktop machine while the user is accessing it. This can help in the area of troubleshooting, while maintaining adequate security and avoiding potential problems that might be caused by accidental changes.
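The permission model described above (directory groups that grant either view-only or full-control rights) can be sketched in a few lines. This is an illustration only: the group names and the mapping are hypothetical, and a real deployment would look up group membership in the directory service rather than in a hard-coded table.

```python
from enum import IntEnum

class Access(IntEnum):
    NONE = 0
    VIEW_ONLY = 1
    FULL_CONTROL = 2

# Hypothetical mapping from directory groups to remote-management rights.
GROUP_ACCESS = {
    "ServiceDesk-Level1": Access.VIEW_ONLY,
    "Server-Admins": Access.FULL_CONTROL,
}

def effective_access(user_groups):
    """Return the highest access level granted by any of the user's groups."""
    return max(
        (GROUP_ACCESS.get(group, Access.NONE) for group in user_groups),
        default=Access.NONE,
    )

def can_perform(user_groups, action):
    """Viewing needs VIEW_ONLY or better; controlling needs FULL_CONTROL."""
    required = {"view": Access.VIEW_ONLY, "control": Access.FULL_CONTROL}[action]
    return effective_access(user_groups) >= required

print(can_perform(["ServiceDesk-Level1"], "view"))
print(can_perform(["ServiceDesk-Level1"], "control"))
```

Because rights are attached to groups rather than to individuals, removing a departing administrator from a group revokes remote access everywhere at once.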

Choosing a Remote Management Solution
Most systems and network administrators already commonly use remote management features. For example, on Windows desktop and server computers, the Remote Desktop feature is easily accessible. And, for network devices, it’s a simple and straightforward process to connect over the network rather than to a physical serial port or dedicated management port on the device itself. Although these features might meet the basic needs of systems management, they do leave a lot to be desired. Managing permissions, keeping track of logins, and controlling connection details can make the process cumbersome and error-prone.

An ideal remote management solution will integrate with other IT data center operations tools, utilities, and processes. For example, in the area of security, existing directory services will be used for authentication and the management of permissions. Security can also be improved by maintaining an audit log of which staff members connected to which devices (and when). The remote management features may also integrate with change and configuration management tools to keep track of any modifications that have been made. This functionality can greatly help in isolating problems and ensuring compliance with IT standards. Additionally, processes should be put in place to ensure that remote management features are used only when necessary. For example, if automated tools can be used to change network address information on a server, systems administrators should only connect to those machines if they need to perform a more complicated task.

Overall, there are many potential benefits of working with remote management tools in environments of any size. When managed and implemented correctly, remote administration can save significant time and effort while improving IT operations and the end-user experience.
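The audit log mentioned above (which staff members connected to which devices, and when) needs very little structure. The sketch below is one possible shape, assuming an in-memory list; the class and field names are hypothetical, and a production tool would write to durable, tamper-resistant storage.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ConnectionRecord:
    user: str
    device: str
    connected_at: datetime

class AuditLog:
    """Append-only record of remote management connections."""

    def __init__(self):
        self._records = []

    def record(self, user, device, when=None):
        # Default to the current UTC time so entries from different sites sort consistently.
        entry = ConnectionRecord(user, device, when or datetime.now(timezone.utc))
        self._records.append(entry)
        return entry

    def connections_to(self, device):
        """Return every recorded connection to the given device, oldest first."""
        return [r for r in self._records if r.device == device]

log = AuditLog()
log.record("alice", "web01", datetime(2024, 1, 5, 9, 30, tzinfo=timezone.utc))
log.record("bob", "db01", datetime(2024, 1, 5, 10, 0, tzinfo=timezone.utc))
log.record("bob", "web01", datetime(2024, 1, 6, 14, 15, tzinfo=timezone.utc))
print([r.user for r in log.connections_to("web01")])
```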

Server Configuration Management
The servers that an IT department manages for other members of the organization are one of the most visible and critical portions of the infrastructure. Whether they host file shares, databases, or other critical application services, servers must be available and properly configured at all times. The challenge for IT staff is ensuring that these computers are properly configured and that problems don’t crop up over time. This section will talk about details related to server configuration management, including important things to keep in mind when documenting and configuring servers. Based on that, it will then look at details related to simplifying and improving the process through automation.

Server Configuration Management Challenges
When working in production data center environments, there are many challenges that can make managing server configurations more difficult. They can broadly be categorized as technical challenges and process-related challenges.

Technical Challenges
Regardless of the operating system (OS) platform or the applications that are supported, all servers must be kept up to date by systems administrators. Common tasks that must be performed include installing security patches, managing changes to system and network configurations, and taking an inventory of installed services. These operations are fairly simple to perform on one or a few servers, but in most data center environments, IT staff members must manage dozens or hundreds of machines.

Technical challenges include the actual deployment of updates and configuration changes. Performing this task manually is time-consuming and tedious, even when using remote administration features. Also, it’s far too easy for systems administrators to accidentally overlook one or a few machines. In the case of implementing security patches, the result could be serious security vulnerabilities.

Other challenges are related to actually performing configuration changes. IT departments should ensure that changes are made consistently, that they adhere to best practices, and that any modifications are tracked and documented. It’s also important to ensure that only authorized administrators are making changes and to track who made modifications. Although most systems administrators would agree to this process, in the real world, it can be difficult to spend the time and attention required to follow these steps every time.
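The risk of overlooking machines during patching is easy to reduce once an inventory exists. The sketch below compares each server's installed patches against a required set; the patch identifiers and server names are hypothetical, and the inventory is assumed to have been collected already.

```python
# Hypothetical patch identifiers and inventory data, for illustration only.
REQUIRED_PATCHES = {"KB-001", "KB-002"}

inventory = {
    "web01": {"KB-001", "KB-002"},
    "web02": {"KB-001"},
    "db01": set(),
}

def missing_patches(inventory, required):
    """Map each non-compliant server to the sorted list of patches it still needs."""
    report = {}
    for server, installed in inventory.items():
        missing = required - installed
        if missing:
            report[server] = sorted(missing)
    return report

print(missing_patches(inventory, REQUIRED_PATCHES))
```

Fully patched servers are left out of the report, so an empty result means every machine in the inventory is compliant.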

Process-Related Challenges
It’s important for IT departments to implement and enforce processes related to change and configuration management. The goal is to ensure that all changes are valid and authorized and to avoid problems that might appear due to inappropriate modifications to server configurations. Unfortunately, ensuring communications between IT staff, management, and the users they support can be difficult. The result is that some changes can cause unexpected problems due to a lack of coordination.

IT management should also consider “quality assurance” processes and auditing of server configurations. Ideally, management would be able to quickly and easily view up-to-date details related to the configuration of all servers in the environment, regardless of location. This can help identify machines whose configurations are outdated or not in compliance with IT policies.

Automating Server Configuration Management
Server configuration management is an excellent candidate for automation in most data center environments. Many of the tasks that must be routinely performed can occur within minutes rather than days, weeks, or months. Let’s take a look at the many features and benefits of automating server configuration management.

Automated Server Discovery
An important first step in managing an entire IT environment is to discover what is out there. Instead of manually connecting to individual servers and collecting configuration details, automated server discovery features can scan the network and discover all the servers that are present on the network. Often, this will include computers that systems administrators weren’t aware of, and machines whose purpose is unknown. The computers may be located in the organization’s data centers, or within remote branch offices.

Applying Configuration Changes
Once an IT department has decided to make a change on all of its servers, it must begin the tedious and time-consuming process of performing the changes. By using an automated solution, however, a single change can be propagated throughout an entire network environment in a matter of minutes or hours. The changes can be scheduled to occur during periods of low activity and results can be automatically collected. An automated process enforces consistency and helps ensure that some systems are not accidentally overlooked during the update process.
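Scheduling changes for periods of low activity and collecting the results automatically might look like the following sketch. It assumes a maintenance window of 1:00 to 4:00 local time, and `flaky_change` is a hypothetical stand-in for whatever call a real tool uses to apply a change.

```python
from datetime import time

def in_window(now, start, end):
    """True if `now` falls in [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

def roll_out(servers, apply_change, now, start=time(1, 0), end=time(4, 0)):
    """Apply a change to every server inside the window and collect per-server results."""
    if not in_window(now, start, end):
        return {}
    results = {}
    for server in servers:
        try:
            apply_change(server)
            results[server] = "ok"
        except Exception as exc:
            # Record the failure and keep going so no server is silently skipped.
            results[server] = f"failed: {exc}"
    return results

def flaky_change(server):
    # Stand-in for a real deployment call; fails for one host to show error capture.
    if server == "db01":
        raise ConnectionError("unreachable")

print(roll_out(["web01", "db01"], flaky_change, now=time(2, 0)))
```

Every server ends up with an explicit result, which is what makes it possible to confirm afterward that none were overlooked.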

Configuration Management and Change Tracking
A basic fact of working in an IT environment is that server configurations will change over time. In most cases, changes are based on authorized modifications due to business and technical initiatives. A server configuration management tool can collect network configuration information and OS details and conduct application inventories. All these details are obtained automatically, either through the use of agent software or standard OS methods. This reduces the chance for human error and allows for frequent validation of changes. Additionally, all of the configuration-related data can be stored centrally in a single configuration management database (CMDB). The data can then be correlated with other information about the environment to ensure that configurations are consistent.

Monitoring and Auditing Server Configurations
The process of auditing server configurations ensures that all servers are compliant with server configuration policies. By using these solutions, IT managers can confidently state that all their assets are being properly managed. When configuration details are properly tracked, systems administrators can easily identify which servers might need to be updated. The process of monitoring ensures that only authorized changes have been made and helps avoid unexpected problems. In addition, it applies to the entire environment, not just one or a few servers that an administrator might work with at a particular point in time.

Enforcing Policies and Processes
The importance of strong and consistent policies and processes cannot be overstated. IT departments should develop and enforce methods for making changes to servers. Although many IT managers might have developed approvals processes on paper, in reality, many ad-hoc changes often occur. An automated server configuration management solution can greatly help enforce processes by restricting changes to only authorized users and validating that the proper approvals have been obtained. From a technical standpoint, security permissions can be greatly restricted. For example, only the automation solution might have permissions to perform actual changes, and systems administrators must make all their modifications through the tool (see Figure 17). This serves the dual purpose of increasing accountability and ensuring that only authorized users are accessing server assets.

Figure 17: Making configuration changes using data center automation tools.
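The two checks described above (the requester is authorized, and the proper approvals have been obtained) can be expressed as a simple gate that runs before any change is applied. The sketch below is illustrative only; the user names, the approver role, and the `ChangeRequest` fields are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical policy data; a real tool would read these from a directory and the CMDB.
AUTHORIZED_USERS = {"alice", "bob"}
REQUIRED_APPROVERS = {"change-advisory-board"}

@dataclass
class ChangeRequest:
    summary: str
    requested_by: str
    approvals: set = field(default_factory=set)

def validate(change):
    """Return a list of policy problems; an empty list means the change may proceed."""
    problems = []
    if change.requested_by not in AUTHORIZED_USERS:
        problems.append("requester is not authorized")
    missing = REQUIRED_APPROVERS - change.approvals
    if missing:
        problems.append("missing approvals: " + ", ".join(sorted(missing)))
    return problems

request = ChangeRequest("Update DNS servers on web tier", "alice")
print(validate(request))
request.approvals.add("change-advisory-board")
print(validate(request))
```

Returning every problem at once, rather than stopping at the first, gives the requester a complete picture of what is still needed.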

Reporting
One of the most visible benefits of automating the server configuration management process is the ability to generate on-demand reports. The information provided can range from software installation details to security configurations to server uptime and availability reports. All configuration and change data is stored in a central CMDB, so systems administrators and IT managers can quickly obtain the information they need to make better decisions. Reporting might also be required in order to demonstrate compliance with various regulatory requirements. A process that was formerly time-consuming and inaccurate can be reduced to a few simple steps. Better yet, individuals from areas outside of the IT department can view details that are relevant to performing their jobs.

Evaluating Automated Solutions
In addition to looking for the already mentioned features, there are several factors IT decision makers should keep in mind when evaluating automated server configuration management solutions. They should be sure that most of the platforms they support are manageable using the product. Considerations include hardware platforms, OS versions, and various system updates. Ideally, the technology will be regularly updated to keep pace with new systems. Additionally, the tool should enforce policies and processes to ensure that all changes are authorized and coordinated. Finally, all details should be tracked centrally, and the ability to perform audits and regular reporting can greatly help IT better manage its server investments.

Overall, through the implementation of an automated server configuration management solution, IT departments can perform the vital task of keeping servers updated while avoiding much of the manual work involved. The benefits are that servers are configured consistently and accurately and IT staff is free to perform other important tasks.

IT Processes
Processes define a consistent set of steps that should be followed in order to complete a particular task. From an IT standpoint, processes can range from details about Service Desk escalations to communicating with end users. The goal of IT processes is to improve overall operations within the IT department and the organization as a whole. It’s often a fact that the implementation of processes requires additional effort and may add steps to some jobs. The steps can be time-consuming and may result in resistance or non-compliance. That raises the challenge: Processes must be worth far more than the “trouble” they cause in order to be considered worthwhile. This section will look at details related to what makes a good process, how you can enforce processes, and the benefits of automating process management.

The Benefits of Processes
Let’s first talk about the upside of designing and implementing processes. The major goals and benefits include:
• Consistency: Tasks should be performed in the same way, regardless of who is performing them. In fact, in many cases, it can be argued that having something done consistently in a sub-optimal way is far better than having tasks sometimes completed well and sometimes completed poorly. Ad-hoc changes are difficult to manage and can lead to complex problems.
• Repeatability: It’s often easy for IT staff to make the same mistakes over and over or to “reinvent the wheel.” The goal of defining processes is to ensure that the same task can be completed multiple times in the same way. Simply allowing everyone to complete goals in their own way might be good for tasks that involve creativity, but it often doesn’t work well for operations that require a lot of coordination and many steps.
• Effectiveness: The process should indicate the best way to do things with respect to the entire organization and all who are involved. The steps involved in the process should enforce best practices.

These benefits might make the decision to implement processes an easy one, but the real challenge is not related to “why” but rather “how” to implement processes.

Challenges Related to Process
For some IT staff members, the very thought of processes might conjure up images of the Pointy-Haired Boss from the Dilbert comic strips. And there are some very good reasons for this: Mainly, many processes are poorly implemented and can actually make work more difficult with little or no benefit to anyone. Some of the problems with poorly implemented processes are based on a lack of knowledge of the details of a particular task. When out-of-touch management tries to single-handedly implement steps in an operation that it does not understand, the result can be disastrous. Consequently, many IT staffers tend to resist processes. They tend to circumvent them and do the bare minimum in order to meet management’s requirements. Worse, they don’t see that there are benefits at all. OK, so that’s the bad news. Let’s look into what makes a good process (and one that people will like and follow).

Characteristics of Effective Processes
There are several aspects of processes that should be taken into account when implementing new methods of doing things. First, the purpose of the process should be clearly defined before going into the details themselves. Usually, the purpose is to define how a particular set of actions should be performed. Change management processes are a typical example. Organizations might implement formal change request documents and a Change Advisory Board (CAB) to keep track of modifications.

Effective processes should be well aligned with the business and technical goals they’re trying to accomplish. “Process for the sake of process” is often counter-productive. Some questions to ask might include “Is this the best and most efficient way to accomplish a particular goal?” and “Is the extra effort required by the process really worth it?” In some cases, if reporting and documentation of actions aren’t useful, perhaps they can be removed to make the process simpler.

The reasoning behind processes should be well-understood. IT staff will be much more likely to adhere to processes that they understand and agree with. Managers should avoid implementing unnecessarily rigid rules: Processes should not attempt to describe every possible action an employee must take. Instead, implementers should be given some leeway in determining the best method by which to complete smaller portions of the tasks. Presenting processes as flexible and evolving guidelines can go a long way toward ensuring compliance.

Designing and Implementing Processes
When you choose to design and implement a new process, it’s important to solicit input from all the individuals and business units that might be involved. Ideally, the process will have collective ownership. Although you might be able to coerce employees to follow specific sequences of steps, you might reduce overall productivity by hurting morale and overlooking better ways to do things. The best processes will solicit and incorporate input from all of those involved. Although it might be painful, sometimes one of the best things that IT managers can do is get out of the way.

Another important consideration to keep in mind is that processes should never be considered “final.” Instead, they should evolve when business and technical needs change. If you hear systems administrators explaining that processes reflect the way things “used to be done,” it’s probably time to update the process. In order to ensure that the proper steps are being followed, however, IT staff should be encouraged to propose changes to the process. In fact (and at the risk of sounding like a management fad), a process to control process changes might be in order.

Often, processes can require many steps, and it can be very difficult for all of those who are involved to understand them. One useful method for communicating processes is the flowchart. Figure 18 provides an example of a server deployment process. Note that decisions and responsibilities are clearly identified, and roles for each step have been defined.

Figure 18: An example of a server deployment workflow process.

Overall, the key goal is that those who follow a process should clearly understand its benefits. Without buy-in, the process will be seen as a chore that is forced by management.

Managing Exceptions
An unfortunate fact related to working in the real world is that most rules will have at least occasional exceptions. For example, in an emergency downtime situation, you might not have enough time to walk through all the steps in a change and configuration management process. Clearly, in that case, resolving the problem as quickly as possible is the most important factor. However, the goal should be for exceptions to be relatively rare. If exceptions do occur frequently, it’s probably worth considering adding them to the current process or developing a new process.

Delegation and Accountability
One crucial aspect related to developing and managing processes is the people involved. Although it might be easy to define a process and just expect everyone to follow it, there will be many cases in which this simply will not happen. Rapidly approaching deadlines, multiple responsibilities, and related concerns can often cause diligence related to processes to slip. One way to ensure that processes are consistently enforced is to ensure that specific individuals are tasked with reviewing steps and ensuring that they’re followed. Management can hold these individuals accountable through metrics based on how closely processes are followed and how many exceptions are made.

Examples of IT Processes
By now, it’s likely that you’re either considering updating existing procedures or putting new processes in place. That raises the question of which operations can benefit most from well-defined processes. In general, it’s best to focus on tasks that require multiple steps and multiple operations in order to be completed. The tasks should happen frequently enough so that the process will be used regularly. Other good candidates are tasks whose business goals are often not met due to miscommunications or inconsistent ways of handling the work involved. Some specific examples of IT processes that organizations might either have in place or might be considering are shown in Table 6.
Business Process: Change and Configuration Management
Possible Steps: Formal documentation of change requests and approval by a CAB
Notes: Standard forms for communicating changes can be helpful

Business Process: IT Purchasing
Possible Steps: Requests for multiple quotes (if possible), cost justification, ROI/TCO analysis, and approvals from senior management
Notes: Different processes or approval levels might apply based on the cost and business area related to the purchase

Business Process: Server Deployments
Possible Steps: Server configuration review, security configuration checklist, and management acceptance of new configuration
Notes: The server should be based on one of the predefined supported configurations

Business Process: Service Desk
Possible Steps: Documentation of new requests, prioritization based on relevant Service Level Agreements (SLAs), and escalation of process details
Notes: At any given point in time, the issue must be “owned” by a specific individual

Table 6: Examples of IT processes.

Automating Process Management
One important way in which IT managers can better implement, enforce, and manage processes is through the use of data center automation tools and utilities. Ideally, these tools will provide the ability to quickly and easily define processes and workflow. The steps might involve branching logic, approvals, and metrics that must be met along the way. By storing this information consistently and in an accessible way, all people involved should be able to quickly and easily view details about the steps required to complete a particular task. If the solution can lead the individual or user through the steps required to complete a process correctly, compliance will increase significantly. Additionally, automated process management tools should provide the ability to audit and report on whether particular processes were closely followed.

Overall, when implemented and managed properly, IT processes are a significant characteristic of a well-managed IT environment. Processes can help ensure that tasks are performed consistently, efficiently, and in accordance with business requirements.
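A workflow with branching logic and approvals can be modeled as a small table of steps, each with a check and a destination for pass and fail. The sketch below uses the server deployment steps listed in Table 6 as its example; the field names in the request and the "deployed" and "rejected" outcomes are assumptions made for illustration.

```python
# Each step names a check (a function of the request) and where to go on pass or fail.
# Step names follow the Server Deployments row of Table 6; request keys are hypothetical.
WORKFLOW = {
    "config_review": {
        "check": lambda r: r["based_on_supported_config"],
        "pass": "security_checklist",
        "fail": "rejected",
    },
    "security_checklist": {
        "check": lambda r: r["security_checklist_done"],
        "pass": "management_acceptance",
        "fail": "rejected",
    },
    "management_acceptance": {
        "check": lambda r: r["manager_approved"],
        "pass": "deployed",
        "fail": "rejected",
    },
}

def run_workflow(request, start="config_review"):
    """Walk the workflow and return the final outcome plus the trail of steps visited."""
    trail, step = [], start
    while step in WORKFLOW:
        trail.append(step)
        spec = WORKFLOW[step]
        step = spec["pass"] if spec["check"](request) else spec["fail"]
    return step, trail

request = {
    "based_on_supported_config": True,
    "security_checklist_done": False,
    "manager_approved": True,
}
print(run_workflow(request))
```

The returned trail doubles as a simple audit record, showing exactly which steps were reached before the request was accepted or rejected.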

Application Infrastructure Management
In the old days of information technology (IT), applications frequently fit on floppy disks or resided on a single mainframe computer. As long as the hardware platform met the minimum system requirements, data center administrators could be fairly sure that the application would run properly. And, ensuring uptime and reliability involved ensuring that the few computers that ran the software were running properly. Times have definitely changed. Modern applications are significantly more complicated, and can often rely on many different components of an overall IT architecture.

Understanding Application Infrastructure
When considering hardware, software, network, and operating system (OS) requirements, the entire infrastructure that is required to support an application can include dozens of computers and devices. The actual number of independent parts adds complexity, which in turn can make it much more difficult to manage overall systems. For example, if a user complains of slow reporting performance related to a Web application, it’s not always easy to pinpoint the problem. Perhaps the database server is bogged down fulfilling other requests. Or, perhaps the problem is on the WAN link that connects the user to the Web server. Or, a combination of factors might be leading to the problem. Figure 19 shows a simplified path of interactions for creating a report. Each component in the figure is a potential bottleneck.

Figure 19: Potential performance bottlenecks in a modern distributed environment.

This relatively simple situation highlights the importance of understanding the entire infrastructure that is required to manage an application. For IT departments that support multiple sites throughout the world and dozens of different line-of-business applications, the overall problems can be far more complex.


Challenges of Application Infrastructure Management
Most IT organizations attempt to manage important applications without fully understanding them. The theory is that as long as areas such as the network infrastructure are properly configured, all the applications that depend on them should also work optimally. Although that can certainly be the case in some situations, some types of issues can be far more complicated to manage. For example, in the area of change and configuration management, making a single change might have unexpected consequences. A seemingly unrelated modification to a firewall configuration might cause connectivity issues in another application. The main challenge for IT is to be able to identify all the inter-related components of an application and to have the ability to compare and verify suggested changes before they’re made.

Inventorying Application Requirements
To get a handle on the complete requirements for complex applications, IT departments should start by taking an inventory of important applications. For example, a Web-based CRM tool that is hosted by an external provider might have relatively simple requirements: As long as users’ workstations can access the Internet, they will be able to use the application (although even a Web application might impose other requirements, such as specific browser features). The infrastructure requirements might include network connectivity to the desktop and the firewall and access through edge routers. Data center applications that require multiple servers can be significantly more complex. Often, multi-tier applications consist of components that include routers, switches, firewalls, and multiple servers. From a logical organization standpoint, the requirements for the application should include all these devices.

Identifying Interdependencies
Infrastructure components that are shared by multiple applications can be identified after taking an inventory of the application requirements. Often, the results can help provide greater insight into operations. Figure 20 provides an example of a shared component that might be used by multiple applications.
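The inventory-then-compare idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the application names, component names, and the shared_components helper are all hypothetical, and a real inventory would come from a discovery tool or a CMDB rather than a hard-coded dictionary.

```python
from collections import defaultdict

# Hypothetical inventory: application name -> infrastructure components it requires.
APP_REQUIREMENTS = {
    "crm": {"edge-router-1", "firewall-1", "switch-a"},
    "reporting": {"switch-a", "web-01", "db-01"},
    "payroll": {"switch-a", "db-01", "app-02"},
}


def shared_components(requirements, min_apps=2):
    """Return {component: sorted application names} for every component
    that is required by at least min_apps applications."""
    users = defaultdict(set)
    for app, components in requirements.items():
        for component in components:
            users[component].add(app)
    return {c: sorted(apps) for c, apps in users.items() if len(apps) >= min_apps}


shared = shared_components(APP_REQUIREMENTS)
for component in sorted(shared):
    print(component, "->", ", ".join(shared[component]))
```

In this made-up inventory, switch-a is required by all three applications, which is exactly the kind of shared dependency that deserves a closer look when planning changes or redundancy.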


Figure 20: Shared application infrastructure requirements for a modern, distributed application.

As an example, a single low-end switch might be identified as a single point of failure for multiple important applications. In this case, an investment in upgrading the hardware or implementing redundancy might help protect overall resources. Also, whenever changes are being planned, test and verification processes should include examining all of the applications that use the same components.

Automating Application Infrastructure Management
It’s probably evident that even in relatively simple IT environments, identifying and managing application infrastructure components can be a complicated task. Fortunately, through the use of data center automation solutions, much of the work can be managed automatically. By storing application infrastructure information centrally in a Configuration Management Database (CMDB), IT staff can quickly find details about all the devices that are required to support an application or service. High-end solutions provide the ability to visualize the interdependencies between hardware, software, and network resources in ways that are easy to understand. Change and configuration management features can also help keep track of the effects of modifications and can help avoid potential problems before they occur.

Using Application Instrumentation
Many third-party applications provide built-in methods for monitoring performance and application activity. Collectively known as “instrumentation,” these features may take the form of a custom Application Programming Interface (API), OS performance monitor counters, or log files. IT departments should look for data center automation solutions that can collect and report on this data.
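As a rough illustration of collecting and reporting on log-based instrumentation, the following Python sketch averages response times per component to suggest where a bottleneck might be. The log format (timestamp, component, elapsed milliseconds), the sample values, and the function name are assumptions made for this example; real applications each define their own formats.

```python
from statistics import mean

# Hypothetical log lines: "<timestamp> <component> <elapsed milliseconds>".
SAMPLE_LOG = """\
2006-05-01T10:00:01 web 120
2006-05-01T10:00:01 database 950
2006-05-01T10:00:02 web 80
2006-05-01T10:00:03 database 1050
"""


def average_elapsed_ms(log_text):
    """Return {component: average elapsed time in ms}, skipping malformed lines."""
    samples = {}
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore lines that do not match the assumed format
        _, component, elapsed = parts
        samples.setdefault(component, []).append(float(elapsed))
    return {component: mean(values) for component, values in samples.items()}


averages = average_elapsed_ms(SAMPLE_LOG)
slowest = max(averages, key=averages.get)
print("Slowest component on average:", slowest)
```

A commercial automation product would typically gather this kind of data from many sources at once; the point of the sketch is only that instrumentation becomes useful once it is collected and summarized in one place.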


Managing Applications Instead of Devices
Although it’s easy to get bogged down in the technical details of maintaining an IT environment, the overall success of operations is not based on routers, servers, and workstations. Instead, the real goal is to manage the applications and services upon which users depend. Well-designed data center automation tools can help IT staff visualize complex inter-dependencies even in widely distributed environments. By focusing on the management of entire applications, IT departments can significantly improve performance, reliability, and availability.

Business Continuity for Servers
It’s no secret that the success of enterprise environments is at least partially based on reliable and available computing resources. In modern business environments, even minor disruptions in service can result in large financial losses. Outages can be caused by power failures, hardware failures, or even the unavailability of an entire data center. As most organizations have become increasingly reliant on IT, technical managers have been tasked with ensuring that services can continue, even in the case of major disasters.

The Value of Business Continuity
Generally, most IT and business leaders have a good idea about the value of business continuity planning. Simply put, the goal is to avoid downtime and to minimize potential data loss. Although it might be tempting to imagine a large meteor heading towards your data center (perhaps targeting your mission-critical systems), there are many other reasons to protect against disaster. Security breaches or malicious intruders could cause a tremendous amount of damage to systems. In addition, most organizations must rely on infrastructure that is out of their immediate control, such as electric grids and Internet infrastructure. Finally, good old-fashioned user or administrator error can lead to downtime. When all of this is taken into account, the reasons for implementing business continuity are compelling.

Unfortunately, maintaining complete redundancy can be an expensive proposition. Therefore, the organization as a whole should work together to determine the high-level reasons for undertaking a business continuity initiative. In some cases, the main drivers will be related to contractual obligations or complying with regulatory requirements. In other cases, the financial impact of downtime or data loss might create the business case. The important point is for the entire business to realize the value of disaster planning.
Inevitably, organizations will need to determine what needs to be protected and how much is appropriate to spend. The main point is that a successful business continuity approach will include far more than just the IT department—the organization’s entire management team must be involved in order for it to be successful.


Identifying Mission-Critical Applications and Servers
Given infinite resources, implementing business continuity would be simple: multiple redundant environments could be created, and the infrastructure to support real-time synchronization of data would be readily available. In the real world, financial and technical constraints make the process much more difficult. Therefore, before looking at the technical aspects of implementing disaster recovery measures, IT management should meet with business leaders to identify the critical portions of the infrastructure that must be protected. Assuming that not all resources can be completely protected, it’s important to determine the value of each important asset.

The first step in prioritization is to take an inventory of the most important high-level functions of the IT department. For example, an online financial services firm might rely heavily upon stock trading software. Next, the technical details of supporting the application should be identified. Modern applications will have many different requirements, including network connections and devices, authentication and security services, and many physical computer systems. In order to provide continuity for the entire end-user service, it’s important that none of these components is overlooked. Ideally, IT management will be able to provide an estimate of the cost required to protect each system. In most environments, this process can be challenging, but it’s absolutely critical to ensuring a reliable business continuity plan.

Developing a Business Continuity Plan for Servers
When developing a plan for managing servers during disaster situations, it’s important to keep in mind the overall goal: to allow business to continue. Often, systems and network administrators will focus on the lower-level technical details of high availability. For example, redundant power supplies and RAID disk configurations can help reduce the likelihood of downtime and data loss. However, the overall approach to high availability should include details related to all areas of operations. For example, even if data and hardware are protected, how will an actual failover occur? Will users be required to implement any changes? What is the process for the IT team? Immediately after a failure occurs is probably the worst time to “rehearse” this process.

Business continuity planning generally involves several major steps (see Figure 21). The process begins with identifying which systems must be protected. Then specific business and technical requirements should be defined. Finally, based on this information, the organization will be ready to look at implementing the business continuity plan.

Figure 21: Steps to include in a server continuity plan.


Defining Business and Technical Requirements
A general best practice for backups is to base the processes that are performed on recovery requirements. When developing business continuity implementations, there are several important factors to take into account:
• Acceptable data loss: Although most business managers would rather not think about it, the potential for data loss during a disaster is difficult to avoid. Businesses should come up with a realistic idea of how much data loss is acceptable. An important consideration will be approximate costs. Is it worth a $1.2M investment to ensure that no more than 2 minutes of transactions will ever be lost? Or is it acceptable to lose up to an hour’s worth of transactions to lower the implementation cost? Other considerations include the impact on actual production systems. For example, two-phase commit (2PC) replication for database servers can add single points of failure and can decrease overall production performance.
• Automated failover: A disaster or system failure can occur at any time. One requirement to ensure the highest levels of availability is that of automatic failover. Like other factors, however, this comes at a significant cost. For a seamless failover to occur, many aspects of the infrastructure must be prepared. Starting from the server side, machines must be able to coordinate the removal of one server from active service and the promotion of another one to take its place. This process usually requires a third “witness” server. Additionally, the network infrastructure and configuration must be adapted. Finally, changes might be required on the client side. Although Web applications can often fail over without users noticing, full client-side applications might require users to change connection settings or to log out and log back into the system. Clearly, there is a lot of work to be done to ensure automatic failover, but in some business cases, this work is unavoidable.
• Time for failover: When primary production servers become unavailable, it will generally take some period of time for the backup site to take their place. There are many challenges related to minimizing this time. For example, how long should systems wait before determining that a failover should take place? And how is a failure defined? The business should decide on acceptable failover times, taking into account the cost and feasibility of supporting those levels of availability. Furthermore, the entire process should be tested to ensure that there are no surprises. Even multi-million dollar disaster recovery plans can fail due to seemingly minor configuration discrepancies.
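The questions of how a failure is defined and how long to wait before failing over can be made concrete with a small Python sketch. It assumes one common convention, which the text above does not prescribe: a failure is declared after a fixed number of consecutive missed health checks. The function name and the default threshold of three are illustrative only.

```python
def should_fail_over(heartbeats, max_missed=3):
    """Decide whether a failover should have been triggered.

    heartbeats is an iterable of booleans, oldest first, where True means the
    primary server answered a health check. A failure is declared once
    max_missed consecutive checks have gone unanswered (an assumed policy).
    """
    missed = 0
    for answered in heartbeats:
        missed = 0 if answered else missed + 1
        if missed >= max_missed:
            return True
    return False


# Two misses followed by a recovery do not trigger a failover...
print(should_fail_over([True, False, False, True, False]))
# ...but three misses in a row do.
print(should_fail_over([True, False, False, False]))
```

Requiring several consecutive misses trades a slightly longer failover time for fewer false alarms caused by brief network glitches. With a check interval of, say, 10 seconds and a threshold of three, detection alone takes roughly 30 seconds, which is part of the failover time that the business has to accept.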

Now that we have a good idea of some of the business and technical considerations, let’s look at how you can use this information to create a plan.


Implementing and Maintaining a Backup Site
The most important aspect of implementing a business continuity plan involves the creation of a secondary site that can be used in the event of a failure. A backup site will generally contain enough hardware and infrastructure services to support critical backup operations from a remote location. Setting up this new site generally involves purchasing new hardware and duplicating the configuration of current production equipment. Although systems administrators are generally aware of the steps required to perform these processes, it can be difficult to replicate configurations exactly.

Once a backup site has been implemented, it’s time to look at details related to maintaining it. In some cases, business requirements might allow for periodic backups and restores to be performed. In those cases, some data loss is acceptable. In other situations, however, the backup site must be kept up to date in real-time and must be ready for a loss-less failover in a matter of seconds. For servers, various solutions such as clustering, replication, backup and recovery, and related methods can be used. Regardless of the technical approach, however, a lot of time and effort is usually required to implement and monitor synchronization for a disaster recovery site.

Automating Business Continuity
Implementing business continuity is generally no small undertaking. IT staff must have a complete understanding of the resources that are to be protected, and all technical information must be kept up to date. It’s simply unacceptable for changes to be made in the production environment without an accompanying change within the disaster recovery site. Fortunately, data center automation tools can greatly help reduce the amount of time and effort that is required to maintain a disaster recovery site.
Using a Configuration Management Database
The purpose of a Configuration Management Database (CMDB) is to centrally store information related to the entire infrastructure that is supported by an IT department. For servers specifically, the CMDB can store configuration details about the operating system (OS), security patches, installed applications, and network configuration. Using this information, systems administrators can quickly view and compare configuration details for the disaster recovery site. One of the potential issues with maintaining redundant sites is ensuring that a site that is effectively “offline” is ready for a failover. Therefore, reports can be run centrally in order to ensure that there are no undetected problems with the backup site.
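The comparison of production and disaster recovery configurations can be sketched as a simple dictionary diff in Python. The record fields and values below are hypothetical stand-ins for whatever a particular CMDB actually stores, and config_differences is an illustrative helper rather than part of any product.

```python
def config_differences(production, backup):
    """Return {setting: (production_value, backup_value)} for every setting
    that differs between the two records. A setting that is missing on one
    side is reported with None for that side."""
    differences = {}
    for key in sorted(set(production) | set(backup)):
        prod_value, backup_value = production.get(key), backup.get(key)
        if prod_value != backup_value:
            differences[key] = (prod_value, backup_value)
    return differences


# Hypothetical CMDB records for a production server and its standby.
production = {"os": "ServerOS 5.2", "patch_level": "SP2", "app_version": "3.1", "dns": "10.0.0.10"}
backup = {"os": "ServerOS 5.2", "patch_level": "SP1", "app_version": "3.1"}

for setting, (prod_value, backup_value) in config_differences(production, backup).items():
    print(f"{setting}: production={prod_value!r} backup={backup_value!r}")
```

A report like this, run on a schedule, is one way to catch an out-of-date standby before a failover depends on it.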


Change and Configuration Management
The operations related to keeping a backup site up to date leave a lot of room for error. If done manually, the process involves a doubling of effort whenever configuration changes are made. Data center automation tools that provide for server change and configuration management can automatically commit the same change to multiple servers (see Figure 22). This is ideal for situations in which a backup site must remain synchronized with the production site, and it dramatically reduces the chances of human error.

Figure 22: Automating configuration management using data center automation tools.

Overall, the process of developing and implementing a business continuity plan for servers will be a major undertaking for IT staff and management. However, through the use of data center automation tools, the process can be significantly simplified, and administration overhead can be minimized. The end result is increased protection of critical data and services at a reasonable overall cost.


Network and Server Maintenance
Although it might not be the most glamorous aspect of IT, maintaining network and server devices is a critical factor in managing overall IT services. Most systems administrators are well-versed in performing standard maintenance tasks manually, but there are many advantages to automating routine operations. Most IT organizations have established at least some basic procedures and processes related to the maintenance of devices in the data center. For example, patches might be installed on servers, as needed, and device configurations might be routinely backed up.

Network and Server Maintenance Tasks
Perhaps one quick way of building a list of maintenance tasks is to ask IT administrators what they least enjoy doing. Although a complete list of routine IT tasks could fill many books, this section will focus on common maintenance areas. Each section will explore how data center automation tools can provide significant benefits over manual processes.

Configuration Management
Over time, servers and network equipment will likely need to be updated to meet changing business needs. For example, when a router is reconfigured, network address information may need to change on multiple servers. Alternatively, the organization might implement stricter security policies that must then be applied to hundreds of devices. In very small and simple network environments, it might be reasonable to perform these changes manually. In most IT environments, however, making changes manually is tedious and leaves a lot of room for error. Data center automation solutions can ease the process of making changes on even hundreds of devices. The process generally involves a member of the IT staff specifying the change that should be made. Assuming that the staffer has the appropriate permissions, the actual modifications can be scheduled or applied immediately. Often, the task can be completed in a matter of minutes, and all that is left for the administrator to do is verify the change log.

Applying System and Security Updates
Computers and network devices often benefit from periodic updates. For example, operating system (OS) vendors often release updates that can fix potential functional problems or add functionality. Security updates are also critical to ensuring that important systems and data remain protected. An automated patch management solution can quickly deploy a single update to even thousands of devices with minimal effort. Figure 23 illustrates an automated patch deployment process. In this example, a systems administrator has tested an OS patch and has verified that it is ready to deploy to a set of production servers. Instead of connecting to the servers individually, the administrator sends the change request to a data center automation solution. The solution identifies which machines require the update and then automatically manages the patch deployment process. While the updates are being performed, the administrator can view the progress by using the central configuration console.
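The step in which the automation solution works out which machines require an update can be sketched in Python as follows. The server names, the patch identifiers, and the idea of rolling the update out in small batches are assumptions made for illustration; the text does not describe how any particular product selects or orders its targets.

```python
def needs_patch(inventory, patch_id):
    """Return the sorted names of servers that do not yet have patch_id installed.

    inventory maps a server name to the set of patch IDs installed on it.
    """
    return sorted(name for name, installed in inventory.items() if patch_id not in installed)


def batches(items, size):
    """Yield successive groups of at most size items, preserving order."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical inventory; the patch IDs are made up for the example.
inventory = {
    "web-01": {"KB-100"},
    "web-02": {"KB-100", "KB-200"},
    "db-01": set(),
    "app-01": {"KB-050"},
}

targets = needs_patch(inventory, "KB-200")
for number, group in enumerate(batches(targets, 2), start=1):
    print(f"Batch {number}: {', '.join(group)}")
```

Deploying in batches rather than all at once is a cautious design choice: if the first group shows problems, the rollout can be stopped before every production server is affected.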


Figure 23: Applying updates using an automated system.

Monitoring Performance
All modern OSs require some standard maintenance operations in order to perform at their best. Actions such as deleting unnecessary files and performing defragmentation can help keep systems performing optimally. For certain types of applications, such as databases, other tasks such as index defragmentation or consistency checks might be required. By implementing automated monitoring solutions, administrators can often be alerted to potential problems before users experience them. And, many types of problems can be fixed automatically, requiring no manual intervention at all.

Implementing Maintenance Processes
In addition to the various categories of tasks we’ve covered thus far, there are several considerations that IT departments should keep in mind when performing maintenance operations.


Delegating Responsibility
An important best practice to keep in mind is that of delegating responsibility. Without coordination between members of the IT team, statements like, “I thought you were going to take care of that last week,” can lead to significant problems. Data center automation solutions can allow IT managers to create and configure schedules for their staff members and to assign specific duties. This can make it much easier to handle vacation schedules and to ensure that no area of the environment is left uncovered at any time.

Developing Maintenance Schedules
Systems and network administrators are likely all too familiar with spending cold nights and evenings in the server room in order to work during “downtime windows.” Although the goal of minimizing disruption to production services is a good one, the practice is still dreaded by IT staff. Through the use of data center automation solutions, downtime windows can be scheduled for any time, and changes can be applied automatically. Administrators can review and verify the changes the next day to ensure that everything worked as planned.

Verifying Maintenance Operations
The very nature of performing maintenance often relegates important tasks to “back burner” status. When maintenance is performed manually, it’s far too easy for a network or systems administrator to forget to update one or a few devices, or to be called off to other tasks. In addition to automatically making changes, data center automation solutions can store expected configuration information within a Configuration Management Database (CMDB). IT managers and staff can then compare the actual configuration of devices to their expected configuration to find any discrepancies. This process can quickly and easily ensure that maintenance operations are not overlooked, and that all systems are up to specifications.
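For the downtime windows discussed under Developing Maintenance Schedules, an automated scheduler needs a reliable way to tell whether the current time falls inside a window, including windows that run past midnight. The Python sketch below shows one way to do that check; the function name and the example window of 22:00 to 04:00 are assumptions for illustration.

```python
from datetime import time


def in_window(now, start, end):
    """Return True if the time of day now falls within [start, end).

    Windows that wrap past midnight (for example 22:00 to 04:00) are supported.
    """
    if start <= end:
        return start <= now < end
    # The window crosses midnight: it covers late evening OR early morning.
    return now >= start or now < end


# Hypothetical overnight downtime window from 22:00 to 04:00.
window_start, window_end = time(22, 0), time(4, 0)
print(in_window(time(23, 30), window_start, window_end))  # inside the window
print(in_window(time(12, 0), window_start, window_end))   # outside the window
```

A scheduler could run a check like this before applying each queued change and defer anything that falls outside the agreed window.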
The Benefits of Automation
Overall, without data center automation solutions, the process of maintaining server and network equipment can take a significant amount of time and effort. And, it’s rarely any IT staffer’s favorite job. Through the use of automation, however, tasks that used to take days, weeks, or months can be implemented in a matter of minutes. And, the process can be significantly more reliable, leading to improved uptime, quicker changes, and a better experience for IT departments and end users.


Asset Management
The goal of asset management is to track the fixed assets that an organization owns and controls. From a general standpoint, asset management can include everything ranging from racks to servers and from buildings to storage devices. IT departments are often tasked with managing budgets and keeping track of inventory, even in highly distributed environments. Without proper technology and processes in place, it can quickly become difficult to find and manage all of these assets. The following sections focus on what to track and how to develop a process that will allow for easily maintaining information about hardware, software, and other important aspects of a distributed IT environment.

Benefits of Asset Management
Although many organizations perform some level of asset tracking using a variety of methods, usually these processes leave much to be desired. For example, IT managers might be tasked with performing a complete audit of software used by the PCs in a particular department, based on the request of the CFO. The process of collecting, analyzing, and verifying the data can be extremely painful. Often, systems administrators must manually log into many different devices to get the information they need. It can take weeks or even months to complete, and still the accuracy of the information might be difficult to verify. By implementing best practices related to asset management, IT departments and the organizations they support can quickly realize many important benefits:
• Lowering costs: Many IT departments are in a position to negotiate deals and discounts with hardware and software vendors. Often, however, IT departments can leave “money on the table” by not leveraging their bargaining power. In some cases, too much equipment may be purchased, leading to unused systems. In other cases, the IT departments are unaware of exactly how much they’re spending with a vendor, making it difficult to use this information during pricing negotiations. Asset management practices can shed light on the overall resource usage of the entire organization and can lead to better decision making.
• Security: “Out of sight, out of mind,” can apply to many of the assets that are managed by an IT department. During audits or when troubleshooting problems, IT staff might find network devices that have not been patched, or servers for which there is no known purpose. This ambiguity can lead to security problems. Through the use of asset management tools, IT departments can be sure that the purpose and function of each device is known, and they can help ensure that no system is overlooked when performing critical system maintenance.
• Improved service levels: IT departments that are unaware of the location and purpose of devices that they support are generally unable to provide high levels of service and responsiveness whenever problems arise. When asset management can be used to provide the entire IT staff with visibility of the entire environment, monitoring and troubleshooting systems can become significantly easier and more efficient. The end result is quicker and more thorough issue resolution.

• Regulatory compliance: The proper management of fixed computing assets is an important part of many regulatory requirements. It is also an important financial practice. IT managers must be able to identify and locate various capital purchases during an audit, and must be able to provide details related to the purpose and history of the device.
• Software licensing: In most IT environments, a significant portion of overall capital expenditures is related to software licensing. Operating systems (OSs), office productivity applications, and specialized tools all incur costs related to purchasing, installation, configuration, and maintenance. It’s not uncommon for an IT department to support dozens or even hundreds of different applications. Without an asset management solution, it can be difficult to produce an up-to-date picture of which software is installed (and what is actually being used). However, with this information, IT departments can quickly identify how many licenses are actually needed and whether licenses can be reallocated. The information might indicate that reduced investments in some software and upgrades of other products might be in order.
• Budgeting: Providing service while staying within budgetary constraints is one of the most challenging tasks for IT departments. Often, purchasing is handled in a case-by-case, ad-hoc manner. Whenever new assets are needed, IT managers must justify the related expenditures to upper management. And, there are often surprises related to unexpected expenses. By efficiently tracking current assets (and their levels of usage), IT management can provide significantly more accurate predictions about ongoing and future capital asset requirements to the rest of the business.
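The software licensing point in the list above comes down to simple arithmetic once installation data is available: licenses purchased minus copies installed, per product. The Python sketch below illustrates it with made-up device names, product names, and counts; license_position is a hypothetical helper, and counting actual usage rather than installations would need additional data.

```python
from collections import Counter


def license_position(installs, purchased):
    """Return {product: purchased licenses minus installed copies}.

    installs is a list of (device, product) pairs; purchased maps a product
    to the number of licenses owned. A negative result indicates a shortfall,
    and a positive result indicates licenses that could be reallocated.
    """
    installed = Counter(product for _, product in installs)
    products = set(installed) | set(purchased)
    return {p: purchased.get(p, 0) - installed.get(p, 0) for p in sorted(products)}


# Hypothetical discovery results and purchase records.
installs = [
    ("pc-1", "office-suite"),
    ("pc-2", "office-suite"),
    ("pc-3", "office-suite"),
    ("pc-1", "cad-tool"),
]
purchased = {"office-suite": 2, "cad-tool": 5, "diagram-tool": 1}

for product, spare in license_position(installs, purchased).items():
    print(product, spare)
```

In this invented data set, the office suite is one license short while four CAD licenses sit unused, which is the kind of finding that supports reallocating or renegotiating licenses.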

Once you’re sold on the benefits of asset management, it’s time to look at how you can implement this technology.

Developing Asset Management Requirements
Before implementing an asset management solution, organizations should look at what information they need to track. Although the basic needs of asset management are well-defined, additional data can often help improve operations. At a minimum, IT departments should be able to enumerate all the devices that they manage. They should be able to uniquely identify each asset and find the current physical location of the device. In the case of a data center, this might involve the row, rack, and position numbers. Figure 24 shows an example of a simple rack diagram.


Figure 24: Developing a rack diagram for asset management.
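The minimum requirements described above (a unique identifier for each asset plus a row, rack, and position for data center equipment) map naturally onto a small record type. The Python sketch below is one possible shape for such a record; the field names, the asset tag format, and both helper functions are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RackLocation:
    row: str
    rack: int
    position: int  # rack unit within the rack (numbering scheme is an assumption)


@dataclass
class Asset:
    asset_tag: str  # must be unique across the organization
    description: str
    location: RackLocation


def find_in_rack(assets, row, rack):
    """Return the sorted asset tags of everything recorded in the given rack."""
    return sorted(a.asset_tag for a in assets
                  if a.location.row == row and a.location.rack == rack)


def duplicate_tags(assets):
    """Return asset tags that appear more than once, which breaks unique identification."""
    counts = Counter(a.asset_tag for a in assets)
    return sorted(tag for tag, count in counts.items() if count > 1)


# Hypothetical records.
assets = [
    Asset("A-1001", "Database server", RackLocation("B", 4, 12)),
    Asset("A-1002", "Core switch", RackLocation("B", 4, 40)),
    Asset("A-1003", "Web server", RackLocation("C", 1, 8)),
]

print(find_in_rack(assets, "B", 4))
print(duplicate_tags(assets))
```

Cost, purpose, and configuration details could be added as further fields on the same record as tracking requirements grow.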

In addition to basic information, IT departments should consider capturing details related to the initial and ongoing costs for the device, details about its purpose, and any configuration information that can help in troubleshooting and management.

Identifying Asset Types
Loosely defined, IT assets can include many different items. The granularity of what is tracked could extend to physical buildings, office spaces, and even network cables. So that raises the consideration of what an IT department should practically track. The main rule is usually based on asset value. It might be determined that only devices that cost more than $250 should be tracked. Table 7 provides a list of the types of assets that should generally be included by an asset tracking solution.


Category: Software
Examples: Operating systems; office productivity applications; line-of-business applications; standard utilities (firewall software, anti-virus, anti-spyware)
Information to Collect: The purpose and cost of each supported application; where software applications are installed; actual application usage; unauthorized software installations

Category: Workstations
Examples: End-user desktop computers; training and test lab computers
Information to Collect: Computer name and model; hardware and network configuration details; asset cost and related information

Category: Servers
Examples: Intranet servers; application servers; database servers
Information to Collect: Purpose of the server; computer name and model; hardware and network configuration details; asset cost and related information; support contract details

Category: Mobile devices
Examples: Laptop computers; PDAs; other “smart” portable devices
Information to Collect: Current location of the asset; current “owner” of the device; purpose of the device; information about the capabilities of the device; security information

Category: Networking devices
Examples: Routers; switches; firewalls; content caches; intrusion detection/prevention systems
Information to Collect: Device manufacturer and model; purpose of each device; physical location

Table 7: An example list of asset types.

It’s likely that IT departments will need to take into account other types of devices, as well. For example, if a business uses specialized hardware in a testing lab, that hardware should be included. Additionally, IT departments should take into account assets that are committed to remote sites.


Developing Asset Tracking Processes
As with many other IT initiatives, developing solid asset-tracking processes is critical to ensuring that information remains consistent and relevant. Although software solutions can meet some of these needs, defined and enforced processes are crucial to overall success. To facilitate the accurate tracking of asset data, organizations should physically place asset tags on devices. Doing so helps uniquely identify a device and requires no technical expertise to match up an asset with its information. All IT staff must be responsible for keeping asset data up to date through the use of the asset management system. For example, if a router is removed from service, this information should be captured by the asset management tool. A best practice is to include asset-tracking details in any change and configuration management process. Figure 25 shows some possible steps in the process.

Figure 25: Steps in an asset management process.

For organizations that have implemented the IT Infrastructure Library (ITIL) best practices, the Software Asset Management topic can be particularly useful. For more information, see the ITIL Web site at http://www.itil.co.uk/.

Automating IT Asset Management
It’s likely obvious at this point that the process of asset management can be greatly simplified through the use of automation tools. The tasks of collecting, storing, and maintaining up-to-date data are often well-suited for computer systems.

When examining asset management solutions, IT departments should look for features that fit into their overall automation tools frameworks. For example, a Web-based user interface (UI) can make accessing asset-related data easy for non-IT users. In addition, support for regular updates can help maintain the accuracy of information. Many IT industry hardware and software vendors have included asset tracking features in their solutions. Asset management products that can utilize this type of information should be preferred. The following sections look at other useful features that should be considered when evaluating asset management solutions.

Automated Discovery
One of the largest barriers related to implementing asset management is the difficulty associated with collecting data about all the devices that must be supported in an IT environment. In some cases, this task might be done manually by physically or remotely connecting to each device and recording details. Of course, apart from the tedium of the process, it’s easy for certain devices to be overlooked altogether.

Many asset management solutions can leverage an automated discovery feature to programmatically scan the network and find devices and nodes that are currently active. The process can often be performed very quickly and can include details about devices located throughout a distributed environment. Furthermore, routine audits can be performed to ensure that devices are still available and to track any changes that might have occurred.

Using a Configuration Management Database
Asset-related data is most useful when it can be combined with other details from throughout an IT environment. For this reason, using a Configuration Management Database (CMDB) is beneficial. The CMDB can centrally store details related to assets and their configuration. The CMDB should also store change-related data in order to ensure that data is always up to date.

Integration with Other Data Center Automation Tools
Ideally, an asset management solution will integrate with other automation tools used by an IT department. For example, service desk application users should be able to quickly and easily access details about workstation, server, and network devices. This ability can help them more quickly isolate and troubleshoot problems. In addition, systems administrators should be able to update configuration details about a server and have the information automatically update asset-related details such as physical location, network details, and purpose of the computer. Many integrated data center automation solutions provide the ability to make assets easier to track and maintain with minimal effort from systems administrators and IT managers.

Reporting
The key goal of asset management is to facilitate reporting. IT managers should be able to generate on-demand information about hardware, software, and network devices, as needed. Many asset management solutions will provide the ability to create real-time reports. Products often allow for Web-based report design and customization. By making asset-related information available to managers throughout the organization, IT departments can better ensure that they are meeting overall business needs.

Overall, by developing an asset management approach and selecting an appropriate data center automation tool, IT organizations can realize the benefits of tracking the devices they support with minimal time and effort.
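
To make the ideas of automated discovery audits and on-demand reporting more concrete, the following is a minimal Python sketch rather than a description of any particular product. The Asset fields, the in-memory list standing in for a CMDB, and the scan_result list are all assumptions made for illustration; a real discovery tool would supply the list of active devices.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """One tracked asset (hypothetical fields based on the details listed above)."""
    asset_tag: str
    name: str
    asset_type: str   # for example "server", "workstation", or "network"
    location: str
    purpose: str


def audit_discovery(cmdb, discovered_names):
    """Compare recorded assets against the device names found by a discovery scan."""
    recorded = {asset.name for asset in cmdb}
    found = set(discovered_names)
    return {
        "untracked": sorted(found - recorded),  # active on the network, not recorded
        "missing": sorted(recorded - found),    # recorded, but not seen by the scan
    }


def count_by_type(cmdb):
    """A simple on-demand report: the number of assets per asset type."""
    counts = {}
    for asset in cmdb:
        counts[asset.asset_type] = counts.get(asset.asset_type, 0) + 1
    return counts


# An in-memory list stands in for the CMDB in this sketch.
cmdb = [
    Asset("A-0001", "web01", "server", "Rack 12", "Intranet server"),
    Asset("A-0002", "db01", "server", "Rack 12", "Database server"),
    Asset("A-0003", "rtr-edge", "network", "Closet 3", "Edge router"),
]
scan_result = ["web01", "rtr-edge", "lab-pc-07"]  # hypothetical scan output

print(audit_discovery(cmdb, scan_result))
print(count_by_type(cmdb))
```

In this sketch, a device that appears under "untracked" is a candidate for an asset tag and a new record, while one under "missing" may have been removed from service without the asset data being updated.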

Flexible/Agile Management
In just about any IT environment, changes are frequent and inevitable. Successful businesses must often make significant modifications to business and technical processes to keep pace with customer demands and increasing competition. In business and IT terms, agility refers to the ability to quickly and efficiently adapt to changes. The faster an IT organization can react to changes, the better aligned it will be with business units, and that will translate to overall success for the entire enterprise.

Challenges Related to IT Management
In some cases, the problems related to agile management might seem paradoxical. On one hand, IT managers work hard to define and enforce processes to ensure that changes are performed consistently and in a reproducible manner. This often requires additional steps to record and track changes, and processes to support them. On the other hand, IT departments must remain flexible enough to support changes that might come at a moment’s notice. This raises the question: How can an IT department plan for the future when anything could change at a moment’s notice?

It’s important not to confuse agility with a lack of processes. As is the case with all areas of the business, chaos resulting from ad-hoc changes is rarely productive and can lead to far more complicated problems in the future. The main point for IT managers to remember is that they must preserve standard operating best practices, even when making large changes in a small period of time.

The Agile Management Paradigm
The term agile management is often heard in reference to managing software development projects. The central theme is to ensure that designers and programmers are ready to accommodate change at a moment’s notice. For many environments, the standard year-long cycles of designing, prototyping, implementing, and testing are no longer adequate. Business leaders want to see changes occur with little delay and are unwilling to accept the time and cost related to entire application rewrites. Therefore, the teams must work in much smaller cycles, and each portion of the development process should result in usable code.

Many of the same goals and concepts also translate into the area of managing data center environments. Rather than setting up servers and network infrastructure and considering the job “done,” systems and network administrators must be ready to make major changes when they’re required.

Key Features of an Agile IT Department
Although there are many aspects of IT management that can affect the overall quality of operations, there are common areas that should be kept in mind. The key features of an agile IT department include the following:

• Coordination with business objectives: Agile IT departments recognize that their main function is to support business initiatives. IT managers and systems administrators must have a high level of awareness of the systems they support, and the reasons that they exist. This awareness can help IT immediately identify which areas might change due to shifts in business strategy rather than waiting until it becomes completely obvious that the systems no longer fit requirements. To keep on top of changes that might be coming, IT representatives should be included in business strategy meetings.
• Consistent and repeatable processes: A well-managed IT environment will adhere to best practices and processes such as those presented by the Information Technology Infrastructure Library (ITIL). Although it might seem that processes could get in the way of quick reactions, well-designed processes can usually be adapted to meet new requirements. Specifically, change and configuration management practices can help IT departments quickly react to new needs.
• Communications: Too often, IT departments tend to work in a way that is isolated from other areas of the organization. In some cases, IT doesn’t find out about the needs of its users until just before changes are required. This situation should be avoided by proactively talking with users and business managers to help prioritize any changes that might be required. In many cases, simple solutions can be developed that minimize disruptive impact while meeting business goals.
• Efficient administration: Most IT departments lack the resources to spend days, weeks, or months manually making system configuration changes. Rather, the changes must be made as quickly as possible, but still in a reliable way. Tasks such as the deployment of new software, upgrades to existing equipment, and the deployment of new computing hardware can take significant amounts of time and effort when performed manually. Through the use of dedicated tools for managing the IT infrastructure, even organization-wide changes can be implemented quickly and reliably.

Many other features can also help make IT departments more agile. However, the general rule is that greater agility comes from efficient and coordinated IT departments.

Automating IT Management
Obviously, meeting all these requirements can necessitate a significant amount of expertise, time, and effort. As with many other areas of improving IT efficiency, data center automation tools can significantly help IT departments increase their flexibility. Especially when budgets and personnel resources are limited, investments in automation can decrease the overhead related to changes.

Specific areas from which organizations can benefit include change and configuration management, server and network provisioning and deployment, automatic updates, asset management, and reporting. For example, there are significant benefits to storing all IT-related information in a centralized Configuration Management Database (CMDB). The combined data can help IT and business leaders quickly identify which systems might need to be updated to accommodate business changes.

Overall, the process of making an IT department more flexible and agile can provide tremendous advantages throughout an entire organization. By quickly adapting to changing needs, the role of IT can transform from a rate-of-change limitation to a strategic advantage. And, through the use of data center automation technology and best practices, IT organizations can quickly work towards the features that can help make them agile.

Policy Enforcement
Well-managed IT departments are characterized by having defined, repeatable processes that are communicated throughout the organization. However, sometimes that alone isn’t enough: it’s important for IT managers and systems administrators to be able to verify that their standards are being followed throughout the organization.

The Benefits of Policies
It usually takes time and effort to implement policies, so let’s start by looking at the various benefits of putting them in place. The major advantage to having defined ways of doing things in an IT environment is that of ensuring that processes are carried out in a consistent way. IT managers and staffers can develop, document, and communicate best practices related to how to best manage the environment.

Types of Policies
Policies can take many forms. For example, one common policy is related to password strength and complexity. These requirements usually apply to all users within the organization and are often enforced using technical features in operating systems (OSs) and directory services solutions. Other types of policies might define response times for certain types of issues or specify requirements such as approvals before important changes are made. Some policies are mandated by organizations outside of the enterprise’s direct control. The Health Insurance Portability and Accountability Act (HIPAA), the Sarbanes-Oxley Act, and related governmental regulations fall into this category.

Defining Policies
Simply defined, policies specify how areas within an organization are expected to perform their responsibilities. For an IT department, there are many ways in which policies can be used. On the technical side, IT staff might create a procedure for performing system updates. The procedure should include details of how downtime will be scheduled and any related technical procedures that should be followed. For example, the policy might require systems administrators to verify system backups before performing major or risky changes. On the business and operations side, the system update policy should include details about who should be notified of changes, steps in the approvals process, and the roles of various members of the team, such as the service desk and other stakeholders.

Figure 26: An overview of a sample system update policy.

Involving the Entire Organization
Some policies might apply only to the IT department within an organization. For example, if a team decides that it needs a patch or update management policy, it can determine the details without consulting other areas of the business. More often, however, input from throughout the organization will be important to ensuring the success of the policy initiatives.

A good way to gather information from organization members is to implement an IT Policy committee. This group should include individuals from throughout the organization. Figure 27 shows some of the areas of a typical organization that might be involved. In addition, representation from IT compliance staff members, HR personnel, and the legal department might be appropriate based on the types of policies. The group should meet regularly to review current policies and change requests.

Figure 27: The typical areas of an organization that should be involved in creating policies.

IT departments should ensure that policies such as those that apply to passwords, email usage, Internet usage, and other systems and services are congruent with the needs of the entire organization. In some cases, what works best for IT just doesn’t fit with the organization’s business model, so compromise is necessary. The greater the “buy-in” for a policy initiative, the more likely it is to be followed.

Identifying Policy Candidates
For some IT staffers, the mere mention of implementing new policies will conjure up images of the pointy-haired boss from the Dilbert comic strips. Cynics will argue that processes can slow operations and often provide little value. That raises the question of what characterizes a well-devised and effective policy. Sometimes, having too many policies (and steps within those policies) can actually prevent people from doing their jobs effectively. So, the first major question should center around whether a policy is needed and the potential benefits of establishing one.

Good candidates for policies include those areas of operations that are either well defined or need to be. Sometimes, the needs are obvious. Examples might include discovering several servers that haven’t been updated to the latest security patch level, or problems related to reported issues “falling through the cracks.” Also, IT risk assessments (which can be performed in-house or by outside consultants) can be helpful in identifying areas in which standardized processes can streamline operations. In all of these cases, setting up policies (and verifying that they are being followed) can be helpful.

Communicating Policies
Policies are most effective when all members of the organization understand them. In many cases, the most effective way to communicate a policy is to post it on an intranet or other shared information site. Doing so will allow all staff to view the same documentation, and it will help encourage updates when changes are needed.

Policy Scope
Another consideration related to defining policies is determining how detailed and specific policies should be. In many cases, if policies are too detailed, they may defeat their purpose: IT staffers will either ignore them or feel stifled by overly rigid requirements. In those cases, productivity will suffer. Put another way, policy for the sake of policy is generally a bad idea.

When writing policies, major steps and interactions should be documented. For example, if a policy requires a set of approvals to be obtained, details about who must approve the action should be spelled out. Additional information such as contact details might also be provided. Ultimately, however, it will be up to the personnel involved to ensure that everything is working according to the process.

Checking for Policy Compliance
Manually verifying policy compliance can be a difficult and tedious task. Generally, this task involves comparing the process that was performed to complete certain actions against the organization’s definitions. Even in situations that require IT staffers to thoroughly document their actions, the process can be difficult. The reason is the amount of overhead that is involved in manually auditing the actions. Realistically, most organizations will choose to perform auditing on a “spot-check” basis, where a small portion of the overall policies are verified.

Automating Policy Enforcement
For organizations that tend to perform most actions on an ad-hoc basis, defining policies and validating their enforcement might seem like it adds a significant amount of overhead to the normal operations. And, even for organizations that have defined policies, it’s difficult to verify that policies and processes are being followed. Often, it’s not until a problem occurs that IT managers look back at how changes have been made.

Fortunately, through the use of integrated data center automation tools, IT staff can have the benefits of policy enforcement while minimizing the amount of extra work that is required. This is possible because it’s the job of the automated system to ensure that the proper prerequisites are met before any change is carried out. Figure 28 provides an example.

Figure 28: Making changes through a data center automation tool.

Evaluating Policy Enforcement Solutions
When evaluating automation utilities, there are numerous factors to keep in mind. First, the better integrated the system is with other IT tools, the more useful it will be. As policies are often involved in many types of modifications to the environment, combining policy enforcement with change and configuration management makes a lot of sense.

Whenever changes are to be made, an automated data center suite can verify whether the proper steps have been carried out. For example, it can ensure that approvals have been obtained, and that the proper systems are being modified. It can record who made which changes, and when. Best of all, through the use of a few mouse clicks, a change (such as a security patch) can be deployed to dozens or hundreds of machines in a matter of minutes.

Any time a change is made, the modification can be compared against the defined policies. If the changes meet the requirements, they are committed. If not, they are either prevented or a warning is sent to the appropriate managers. Additionally, through the use of a centralized Configuration Management Database (CMDB), users of the system can quickly view details about devices throughout the environment. This information can be used to determine which systems might not meet the organization’s established standards, and which changes might be required.

Overall, through the use of automation, IT organizations can realize the benefits of enforcing policies while at the same time streamlining policy compliance.
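
The idea of comparing a proposed change against defined policies before committing it can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ChangeRequest and Policy classes, their fields, and the approver and server names are hypothetical and do not correspond to any specific automation product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ChangeRequest:
    """A proposed change (hypothetical fields for illustration)."""
    requested_by: str
    targets: list
    approvals: set = field(default_factory=set)
    backup_verified: bool = False


@dataclass
class Policy:
    """Prerequisites that must be met before a change is carried out."""
    required_approvers: set
    allowed_targets: set
    require_backup: bool = True


def evaluate_change(change, policy):
    """Return a list of policy violations; an empty list means the change may proceed."""
    violations = []
    missing = policy.required_approvers - change.approvals
    if missing:
        violations.append("missing approvals: " + ", ".join(sorted(missing)))
    out_of_scope = [t for t in change.targets if t not in policy.allowed_targets]
    if out_of_scope:
        violations.append("targets outside policy scope: " + ", ".join(out_of_scope))
    if policy.require_backup and not change.backup_verified:
        violations.append("backup not verified")
    return violations


def apply_change(change, policy, audit_log):
    """Commit the change only if it meets the policy; record who, what, and when."""
    violations = evaluate_change(change, policy)
    status = "committed" if not violations else "blocked"
    audit_log.append({
        "who": change.requested_by,
        "when": datetime.now(timezone.utc).isoformat(),
        "targets": list(change.targets),
        "status": status,
        "violations": violations,
    })
    return status


policy = Policy({"it-manager", "service-desk"}, {"web01", "web02"})
log = []
approved = ChangeRequest("alice", ["web01"], {"it-manager", "service-desk"}, True)
unapproved = ChangeRequest("bob", ["db01"], {"it-manager"})
print(apply_change(approved, policy, log))
print(apply_change(unapproved, policy, log))
```

In this sketch a blocked change is only logged. A fuller system might instead send a warning to the appropriate managers, which is the other outcome described above.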

Server Monitoring
In many IT departments, the process of performing monitoring is done on an ad-hoc basis. Often, it’s only after numerous users complain about slow response times or throughput when accessing a system that IT staff gets involved. The troubleshooting process generally requires multiple steps. Even in the best case, however, the situation is highly reactive: users have already run into problems that are affecting their work. Clearly, there is room for improvement in this process.

Developing a Performance Optimization Approach
It’s important for IT organizations to develop and adhere to an organized approach to performance monitoring and optimization. All too often, systems and network administrators will simply “fiddle with a few settings” and hope that it will improve performance. Figure 29 provides an example of a performance optimization process that follows a consistent set of steps. Note that the process can be repeated, based on the needs of the environment. The key point is that solid performance-related information is required in order to support the process.

Figure 29: A sample performance optimization process.

Deciding What to Monitor
Over time, desktop, server, and network hardware will require certain levels of maintenance or monitoring. These are generally complex devices that are actively used within the organization. There are two main aspects to consider when implementing monitoring. The first is related to uptime (which can report when servers become unavailable) and the other is performance (which indicates the level of end-user experience and helps in troubleshooting).

Monitoring Availability
If asked about the purpose of their IT departments, most managers and end users would specify that it is the task of the IT department to ensure that systems remain available for use. Ideally, IT staff would be alerted when a server or application becomes unavailable, and would be able to quickly take the appropriate actions to resolve the situation.

There are many levels at which availability can be monitored. Figure 30 provides an overview of these levels. At the most basic level, simple network tests (such as a PING request) can be used to ensure that a specific server or network device is responding to network requests. Of course, it’s completely possible that the device is responding, but that it is not functioning as requested. Therefore, a higher-level test can verify that specific services are running.

Figure 30: Monitoring availability at various levels.

Tests can also be used to verify that application infrastructure components are functioning properly. On the network side, route verifications and communications tests can ensure that the network is running properly. On the server side, isolated application components can be tested by using procedures such as test database transactions and HTTP requests to Web applications. The ultimate (and most relevant) test is to simulate the end-user experience. Although it can sometimes be challenging to implement, it’s best to simulate actual use cases (such as a user performing routine tasks in a Web application). These tests will take into account most aspects of even complex applications and networks and will help ensure that systems remain available for use.
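
The layered approach to availability testing can be sketched as follows, using only the Python standard library. This is a rough illustration, not a complete monitoring tool. A true PING uses ICMP, which usually needs elevated privileges, so a TCP connection attempt stands in for the lowest-level test here. The host name, port, and URL are placeholders chosen for illustration.

```python
import socket
import urllib.request


def tcp_port_open(host, port, timeout=2.0):
    """Service-level check: can a TCP connection be opened to host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_ok(url, timeout=5.0):
    """Application-level check: does the URL answer with a 2xx status?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def first_failure(checks):
    """Run checks from the lowest level upward.

    Returns the name of the first level that fails, or None if all pass.
    """
    for name, check in checks:
        if not check():
            return name
    return None


# Hypothetical targets; each check is a callable so nothing runs until requested.
checks = [
    ("service port", lambda: tcp_port_open("app01.example.com", 443)),
    ("web application", lambda: http_ok("https://app01.example.com/health")),
]
```

Calling first_failure(checks) reports the lowest level at which a problem appears, which can help narrow down whether the issue lies in the network, the service, or the application itself.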

Monitoring Performance
For most real-world applications, it’s not enough for an application or service to be available. These components must also respond within a reasonable amount of time in order to be useful. As with the monitoring of availability, the process of performance monitoring can be carried out at many levels. The more closely a test mirrors end-user activity, the more relevant the performance information will be.

For complex applications that involve multiple servers and network infrastructure components, it’s best to begin with a real-world case load that can be simulated. For example, in a typical Customer Relationship Management (CRM) application, developers and systems administrators can work together to identify common operations (such as creating new accounts, running reports, or updating customers’ contact details). Each set of actions can be accompanied by expected response times. All this information can help IT departments proactively respond to issues, ideally before users are even aware of them. As businesses increasingly rely on their computing resources, this data can help tremendously.

Verifying Service Level Agreements
One non-technical aspect of managing systems in an IT department relates to the perception and communication of requirements. For organizations that have defined and committed to Service Level Agreements (SLAs), monitoring can be used to compare actual performance statistics against the desired levels. For example, SLAs might specify how quickly specific types of reports can be run or outline the overall availability requirements for specific servers or applications. Reports can provide details related to how closely the goals were met, and can even provide insight into particular problems. When this information is readily available to managers throughout the organization, it can enable businesses to make better decisions about their IT investments.
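
As a small worked example of comparing measured statistics against an SLA, the Python sketch below computes availability as the percentage of successful monitoring checks and counts responses slower than the agreed limit. The SlaTarget fields and all of the numbers are invented for illustration and are not taken from any real agreement.

```python
from dataclasses import dataclass


@dataclass
class SlaTarget:
    """Agreed service levels for one operation (hypothetical fields)."""
    operation: str
    max_seconds: float       # agreed response time
    min_availability: float  # agreed availability, as a percentage


def availability_percent(successes, total):
    """Share of monitoring checks that succeeded, as a percentage."""
    if total == 0:
        raise ValueError("no checks recorded")
    return 100.0 * successes / total


def sla_report(target, response_times, successes, total):
    """Summarize how closely measured results met the agreed levels."""
    availability = availability_percent(successes, total)
    slow = [t for t in response_times if t > target.max_seconds]
    return {
        "operation": target.operation,
        "availability": round(availability, 2),
        "availability_met": availability >= target.min_availability,
        "slow_responses": len(slow),
        "response_time_met": not slow,
    }


# Assumed sample data: one check per minute for a day, with six failed checks.
target = SlaTarget("run monthly sales report", max_seconds=30.0, min_availability=99.5)
report = sla_report(target, [12.4, 28.9, 41.0, 19.7], successes=1434, total=1440)
print(report)
```

With these sample figures, availability works out to 1434 / 1440, or about 99.58 percent, which meets the 99.5 percent target, while one of the four measured report runs exceeds the 30-second limit.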
Limitations of Manual Server Monitoring
It’s possible to implement performance and availability monitoring in most environments using existing tools and methods. Many IT devices offer numerous ways in which performance and availability can be measured. For example, network devices usually support the Simple Network Management Protocol (SNMP) standard, which can be used to collect operational data. On the server side, operating systems (OSs) and applications include instrumentation that can be used to monitor performance and configure alert thresholds. For example, Figure 31 shows how a performance-based alert can be created within the built-in Windows performance tool.

Figure 31: Defining performance alerts using Windows System Monitor.

Although tools such as the Windows System Monitor utility can help monitor one or a few servers, it quickly becomes difficult to manage monitoring for an entire environment. Therefore, most systems administrators will use these tools only when they must troubleshoot a problem in a reactive way. Also, it’s very easy to overlook critical systems when implementing monitoring throughout a distributed environment. Overall, there are many limitations to the manual monitoring process. In the real world, this means that most IT departments work in a reactive way when dealing with their critical information systems.

Automating Server Monitoring
Although manual performance monitoring can be used in a reactive situation for one or a few devices, most IT organizations require visibility into their entire environments in order to provide the expected levels of service. Fortunately, data center automation tools can dramatically simplify the entire process. There are numerous benefits related to this approach, including:

• Establishment of performance thresholds: Systems administrators can quickly define levels of acceptable performance and have an automated solution routinely verify whether systems are performing optimally. At its most basic level, the system might perform PING requests to verify whether a specific server or network device is responding to network requests. A much better test would be to execute certain transactions and measure the total time for them to complete. For example, the host of an electronic commerce Web site could create a workflow that simulates the placing of an order and routinely measure the amount of time it takes to complete the entire process at various times during the day. The system can also take into account any SLAs that might be established and can provide regular reports related to the actual levels of service.
• Notifications: When systems deviate from their expected performance, systems administrators should be notified as quickly as possible. The notifications can be sent using a variety of methods, but email is common in most environments. The automated system should allow managers to develop and update schedules for their employees and should take into account “on-call” rotation schedules, vacations, and holidays.
• Automated responses: Although it might be great to know that a problem has occurred on a system, wouldn’t it be even better if the automated solution could start the troubleshooting process? Data center automation tools can be configured to automatically take corrective actions whenever a certain problem occurs. For example, if a particularly troublesome service routinely stops responding, the system can be configured to automatically restart the service. In some cases, this setup might resolve the situation without human intervention. In all cases, however, automated actions can at least start the troubleshooting process.
• Integration with other automation tools: By storing performance and availability information in a Configuration Management Database (CMDB), data center automation tools can help show IT administrators the “track record” for particular devices or applications. Additionally, integrated solutions can use change tracking and configuration management features to help isolate the potential cause of new problems with a server. The end result is that systems and network administrators can quickly get the information they need to resolve problems.
• Automated test creation: As mentioned earlier, the better a test can simulate what end users are doing, the more useful it will be. Some automation tools might allow systems administrators and developers to create actual user interface (UI) interaction tests. In the case of Web applications, tools can automatically record the sequence of clicks and responses that are sent to and from a server. These tests can then be repeated regularly to monitor realistic performance. Additionally, the data can be tracked over time to isolate any slow responses during periods of high activity.
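
The threshold, notification, and automated response ideas in the list above can be combined in a short Python sketch. It is an illustration only: the transaction, the corrective action, and the notification function are passed in as plain callables, and the names place_test_order and broken_order are hypothetical stand-ins for a real simulated workflow.

```python
import time


def timed(transaction):
    """Run a callable and return how long it took, in seconds."""
    start = time.perf_counter()
    transaction()
    return time.perf_counter() - start


def monitor_once(transaction, threshold_seconds, corrective_action, notify):
    """Time one simulated transaction and react if it fails or is too slow."""
    try:
        elapsed = timed(transaction)
    except Exception as error:
        corrective_action()  # for example, restart a troublesome service
        notify(f"transaction failed: {error}")
        return "failed"
    if elapsed > threshold_seconds:
        notify(f"transaction took {elapsed:.2f}s (threshold {threshold_seconds:.2f}s)")
        return "slow"
    return "ok"


messages = []   # stands in for email or another notification channel
restarts = []   # records each time the corrective action runs


def place_test_order():
    """Hypothetical simulated transaction that completes quickly."""
    time.sleep(0.01)


def broken_order():
    """Hypothetical simulated transaction that always fails."""
    raise RuntimeError("service not responding")


print(monitor_once(place_test_order, 1.0, lambda: restarts.append("restart"), messages.append))
print(monitor_once(broken_order, 1.0, lambda: restarts.append("restart"), messages.append))
print(messages)
```

A scheduler would normally call monitor_once at regular intervals, and the notify callable is where an on-call rotation schedule could be consulted to decide who receives the message.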

Overall, through the use of data center automation tools, IT departments can dramatically improve visibility into their environments. They can quickly and easily access information that will help them more efficiently troubleshoot problems, and they can report on the most critical aspects of their systems: availability and performance.

Change Tracking
An ancient adage states, “The only constant is change.” This certainly applies well to most modern IT environments and the businesses they support. Often, as soon as systems are deployed, it’s time to update them or make modifications to address business needs. And keeping up with security patches can take significant time and effort. Although the ability to quickly adapt can increase the agility of organizations as a whole, with change comes the potential for problems.

Benefits of Tracking Changes
In an ad-hoc IT environment, actions are generally performed whenever a systems or network administrator deems them to be necessary. Often, there’s a lack of coordination and communication. Responses such as, “I thought you did that last week,” are common and, frequently, some systems are overlooked.

There are numerous benefits related to performing change tracking. First, this information can be instrumental in the troubleshooting process or when identifying the root cause of a new problem. Second, tracking change information provides a level of accountability and can be used to proactively manage systems throughout an organization.

Defining a Change-Tracking Process
When implemented manually, the process of keeping track of changes takes a significant amount of commitment from users, systems administrators, and management. Figure 32 provides a high-level example of a general change-tracking process. As it relies on manual maintenance, the change log is only as useful as the data it contains. Missing information can greatly reduce the value of the log.

Figure 32: A sample of a manual change tracking process.

Establishing Accountability
It’s no secret that most IT staffers are extremely busy keeping up with their normal tasks. Therefore, it should not be surprising that network and systems administrators will forget to update change-tracking information. When performed manually, policy enforcement generally becomes a task for IT managers. In some cases, frequent reminders and reviews of policies and processes are the only way to ensure that best practices are being followed.

Tracking Change-Related Details
When implementing change tracking, it’s important to consider what information to track. The overall goal is to collect the most relevant information that can be used to examine changes without requiring a significant amount of overhead. The following types of information are generally necessary:
• The date and time of the change: It probably goes without saying that the time at which a change occurs is important. The time tracked should take into account differences in time zones and should allow for creating a serial log of all changes to a particular set of configuration settings.
• The change initiator: For accountability purposes, it’s important that the person who actually made the change be included in the auditing information. This requirement helps ensure that the change was authorized, and provides a contact person from whom more details can be obtained.
• The initial configuration: A simple fact of making changes is that sometimes they can result in unexpected problems. An auditing system should be able to track the state of a configuration setting before a change was made. In some cases, this can help others resolve the problem or undo the change, if necessary.
• The configuration after the change: This information will track the state of the audited configuration setting after the change has been made. In some cases, this information could be obtained by just viewing the current settings. However, it’s useful to be able to see a serial log of changes that were made.
• Categories: Types of changes can be grouped to help the appropriate staff find what they need to know. For example, a change in the “Backups” category might not be of much interest to an application developer, while systems administrators might need to know about the information contained in this category.
• Comments: This is one area in which many organizations fall short. Most IT staff (and most people, for that matter) don’t like having to document changes. An auditing system should require individuals to provide details related to why a change was made. IT processes should require that this information be included (even if it seems obvious to the person making the change).

In addition to these types of information, the general rule is that more detail is better. IT departments might include details that require individuals to specify whether change management procedures were followed and who authorized the change.
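The fields listed above map naturally onto a small record type. The following is a minimal sketch in Python, not a description of any particular product: the class name `ChangeRecord`, its field names, and the helper `serial_log` are all illustrative assumptions. It stores timestamps in UTC so that entries from different time zones sort into one serial log, and it rejects entries with empty comments.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeRecord:
    """One change-log entry; all field names here are illustrative."""
    initiator: str
    systems: tuple[str, ...]
    initial_config: str
    new_config: str
    categories: tuple[str, ...]
    comments: str
    # Defaults to "now" in UTC so entries from any time zone are comparable.
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self):
        # Enforce the "comments are required" policy at entry time.
        if not self.comments.strip():
            raise ValueError("comments are required: explain why the change was made")
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must be timezone-aware")


def serial_log(records):
    """Return the records ordered by the time of the change."""
    return sorted(records, key=lambda record: record.timestamp)
```

Making the record immutable (`frozen=True`) is one possible design choice: a correction is then logged as a new entry instead of silently overwriting history.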

Table 8 shows an example of a simple, spreadsheet-based audit log. Although this system is difficult and tedious to administer, it does show the types of information that should be collected. Unfortunately, it does not facilitate advanced reporting, and it can be difficult to track changes that affect complex applications that have many dependencies.

| Date/Time | Change Initiator | System(s) Affected | Initial Configuration | New Configuration | Categories |
|-----------|------------------|--------------------|-----------------------|-------------------|------------|
| 7/10/2006 | Jane Admin | DB009 and DB011 | Security patch level 7.3 | Security patch level 7.4 | Security patches; server updates |
| 7/12/2006 | Joe Admin | WebServer007 and WebServer012 | CRM application version 3.1 | CRM application version 3.5 | Vendor-based application update |
| 7/15/2006 | Dana DBA | DB003 (All databases) | N/A | Created archival backups of all databases for off-site storage | |

Table 8: A sample audit log for server management.

Automating Change Tracking
Despite the numerous benefits related to change tracking, IT staff members might be resistant to the idea. In many environments, the processes related to change tracking can cause significant overhead related to completing tasks. Unfortunately, this can lead to either non-compliance (for example, when systems administrators neglect documenting their changes) or reductions in response times (due to additional work required to keep track of changes).

Fortunately, through the use of data center automation tools, IT departments can gain the benefits of change tracking while minimizing the amount of effort that is required to track changes. These solutions often use a method by which changes are defined and requested using the automated system. The system, in turn, is actually responsible for committing the changes.

There are numerous benefits to this approach. First and foremost, only personnel who are authorized to make changes will be able to do so. In many environments, the process of directly logging into a network device or computer can be restricted to a small portion of the staff. This can greatly reduce the number of problems that occur due to inadvertent or unauthorized changes. Second, because the automated system is responsible for the tedious work on dozens or hundreds of devices, it can keep track of which changes were made and when they were committed. Other details such as the results of the change and the reason for the change (provided by IT staff) can also be recorded. Figure 33 shows an overview of the process.

Figure 33: Committing and tracking changes using an automated system.

By using a Configuration Management Database (CMDB), all change and configuration data can be stored in a single location. When performing troubleshooting, systems and network administrators can quickly run reports to help isolate any problems that might have occurred due to a configuration change. IT managers can also generate enterprise-wide reports to track which changes have occurred. Overall, automation can help IT departments implement reliable change tracking while minimizing the amount of overhead incurred.
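The request-then-commit flow described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ChangeSystem`, `request_change`, and `changes_for` are hypothetical names, the in-memory list stands in for a CMDB, and `apply_fn` is a placeholder for whatever mechanism actually pushes a setting to a device.

```python
from datetime import datetime, timezone


class ChangeSystem:
    """Hypothetical front end through which every change is requested."""

    def __init__(self, authorized_users, apply_fn):
        self.authorized_users = set(authorized_users)
        # apply_fn(device, setting, value) applies the change and
        # returns the previous value of the setting.
        self.apply_fn = apply_fn
        self.log = []  # stand-in for a CMDB table

    def request_change(self, user, devices, setting, value, reason):
        # Only authorized personnel may make changes.
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not authorized to make changes")
        if not reason.strip():
            raise ValueError("a reason for the change is required")
        for device in devices:
            previous = self.apply_fn(device, setting, value)
            # The system, not the administrator, records what happened.
            self.log.append({
                "when": datetime.now(timezone.utc),
                "initiator": user,
                "device": device,
                "setting": setting,
                "before": previous,
                "after": value,
                "reason": reason,
            })
        return len(devices)

    def changes_for(self, device):
        """Return the logged changes for one device, oldest first."""
        return [entry for entry in self.log if entry["device"] == device]
```

Because the log entry is written by the same code path that commits the change, an administrator cannot forget to record it, which is the main advantage over the manual process.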

Network Change Detection
Network-related configuration changes can occur based on many requirements. Perhaps the most common is the need to quickly adapt to changing business and technical requirements. The introduction of new applications often necessitates an upgrade of the underlying infrastructure, and growing organizations seem to constantly outgrow their capacity. Unfortunately, changes can lead to unforeseen problems that might result in a lack of availability, downtime, or performance issues. Therefore, IT organizations should strongly consider implementing methods for monitoring and tracking changes.

The Value of Change Detection
We already covered some of the important causes for change, and in most organizations, these are inevitable. Coordinating changes can become tricky in even small IT organizations. Often, numerous systems need to be modified at the same time, and human error can lead to some systems being completely overlooked. Additionally, when roles and responsibilities are distributed, it’s not uncommon for IT staff to “drop the ball” by forgetting to carry out certain operations. Figure 34 shows an example of some of the many people who might be involved in applying changes.

Figure 34: Multiple “actors” making changes on the same device.

Unauthorized Changes
In stark contrast to authorized changes that have the best of intentions, network-related changes might also be committed by unauthorized personnel. In some cases, a junior-level network administrator might open a port on a firewall at the request of a user without thoroughly considering the overall ramifications. In worse situations, a malicious attacker from outside the organization might purposely modify settings to weaken overall security.

Manual Change Tracking
All these potential problems point to the value of network change detection. Comparing the current configuration of a device against its expected configuration is a great first step. Doing so allows network administrators to find any systems that don’t comply with current requirements. Even better is the ability to view a serial log of changes, along with the reasons the changes were made. Table 9 provides a simple example of tracking information in a spreadsheet or on an intranet site.

| Date of Change | Devices / Systems Affected | Change | Purpose of Change | Comments |
|----------------|----------------------------|--------|-------------------|----------|
| 5/5/2006 | Firewall01 and Firewall02 | Opened TCP port 1178 (outbound) | User request for access to Web application | Port is only required for 3 days. |
| 5/7/2006 | Corp-Router07 | Upgraded firmware | Addresses a known security vulnerability | Update was tested on spare hardware |

Table 9: An example of a network change log.

Of course, there are obvious drawbacks to this manual process. The main issue is that the information is only useful when all members of the network administration team place useful information in the “system.” When data is stored in spreadsheets or other files, it’s also difficult to ensure that the information is always up to date.

Challenges Related to Network Change Detection
Network devices tend to store their configuration settings in text files (or can export to this format). Although it’s a convenient and portable option, these types of files don’t lend themselves to being easily compared, at least not without special tools that understand the meanings of the various options and settings. Add to this the lack of a standard configuration file type between vendors and models, and you have a large collection of disparate files that must be analyzed.

In many environments, it is a common practice to create backups of configuration files before a change is made. Ideally, multiple versions of the files would also be maintained so that network administrators could view a history of changes. This “system,” however, generally relies on network administrators diligently making backups. Even then, it can be difficult to determine who made a change, and (most importantly) why the change was made. Clearly, there’s room for improvement.

Automating Change Detection
Network change detection is an excellent candidate for automation: it involves relatively simple tasks that must be carried out consistently, and it can be tedious to manage these settings manually. Data center automation applications can alleviate much of this pain in several ways.

Committing and Tracking Changes
It’s a standard best practice in most IT environments to limit direct access to network devices such as routers, switches, and firewalls. Data center automation tools help implement these limitations while still allowing network administrators to accomplish their tasks. Instead of making changes directly to specific network hardware, the changes are first requested within the automation tool. The tool can perform various checks, such as ensuring that the requester is authorized to make the change and verifying that any required approvals have been obtained.

Once a change is ready to be deployed, the network automation utility can take care of committing the changes automatically. Hundreds of devices can be updated simultaneously or based on a schedule. Best of all, network administrators need not connect to any of the devices directly, thereby increasing security.

Verifying Network Configuration
Data center automation utilities also allow network administrators to define the expected settings for their network devices. If, for example, certain routing features are not supported by the IT group, the system can quickly check the configuration of all network devices to ensure that those features have not been enabled. Overall, automated network change detection can help IT departments ensure that critical parts of their infrastructure are configured as expected and that no unwanted or unauthorized changes have been committed.
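As a rough illustration of comparing a device's current configuration against its expected one, the sketch below uses Python's standard `difflib` module to produce a line-by-line diff, plus a simple scan for settings that should never appear. The function names and the configuration lines used in any example input are made up for illustration; they do not reflect any vendor's syntax. As noted earlier, a plain text diff does not understand what the settings mean, so a real tool would need vendor-aware parsing.

```python
import difflib


def config_drift(expected: str, actual: str, name: str = "device"):
    """Return unified-diff lines between expected and actual configuration text.

    An empty list means the two configurations are textually identical.
    """
    return list(difflib.unified_diff(
        expected.splitlines(),
        actual.splitlines(),
        fromfile=f"{name} (expected)",
        tofile=f"{name} (actual)",
        lineterm="",
    ))


def devices_with_forbidden(configs: dict, forbidden: list):
    """Map each device name to the forbidden lines found in its configuration."""
    findings = {}
    for device, text in configs.items():
        lines = {line.strip() for line in text.splitlines()}
        hits = [setting for setting in forbidden if setting in lines]
        if hits:
            findings[device] = hits
    return findings
```

Run on a schedule against every device, a check like this turns "someone noticed a problem" into "the system reported a deviation," although it only reports that a change happened, not who made it or why.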

Notification Management
It’s basic human nature to be curious about how IT systems and applications are performing, but it can become a mission-critical concern whenever events related to performance or availability occur. In those cases, it’s the responsibility of the IT department to ensure that problems are addressed quickly and that any affected members of the business are notified of the status.

The Value of Notifications
One of the worst parts of any outage is not being informed of the current status of the situation. Most people would feel much more at ease knowing that the electricity will come back on after a few hours instead of (quite literally) sitting in the dark trying to guess what’s going on. There are two broad categories related to communications within and between an IT organization: internal and external notifications.

Managing Internal Notifications
There are many types of events that are specific to the IT staff itself. For example, creating backups and updating server patch levels might require only a few members of the team to be notified. These notifications can be considered “internal” to the IT department. When sending notifications, an automated system should take into account the roles and responsibilities of staff members. In general, the rule should be to notify only the appropriate staff, and to provide detailed information. Sending a simple message stating “Server Alert” to the entire IT staff is usually not very useful. In most situations, it’s appropriate to include technical details, and the format of the message can be relatively informal. Also, escalation processes should be defined to make sure that no issue is completely ignored.

Managing External Notifications
When business systems and applications are affected, it’s just as important to keep staff outside of the IT department well informed. Users might assume that “IT is working on it,” but often they need more information. For example, how long are the systems expected to be unavailable? If the outage is only for a few minutes, users might choose to just wait. If it’s going to be longer, perhaps the organization should switch to “Plan B” (which might involve using an alternative system or resorting to pen-and-paper data collection).

Creating Notifications
In many environments, IT departments are notorious for delivering vague, ambiguous, and overly technical communications. The goal for the content of notifications is to make them concise and informative in a way that users and non-technical management can understand.

What to Include in a Notification
There are several important points that should be included in any IT communication. Although the exact details will vary based on the type of situation and the details of the audience, the following list highlights some aspects to keep in mind when creating notifications:
• Message details: The date and time of the notification, along with a descriptive subject line, are a good start. Some users might want to automatically filter messages based on their content. Keep in mind that, for some users, the disruptions caused by notifications that don’t affect them might actually reduce productivity. IT departments should develop consistent nomenclature for the severity of problems and for identifying who might be affected. A well-organized message can help users find the information they need quickly and easily.
• Acknowledgement of the problem: This portion of the notification can quickly assure users that the IT staff is aware of a particular problem such as the lack of availability of an application. It’s often best to avoid technical details. Users will be most concerned about the fact that they cannot complete their jobs. Although it might be interesting to know that a database server’s disk array is unavailable or there is a problem on the Storage Area Network (SAN), it’s best to avoid unnecessary details that might confuse some users.
• Estimated time to resolution: This seemingly little piece of information can be quite tricky to ascertain. When systems administrators are unaware of the cause of a problem, how can they be expected to provide a timeframe for resolution? However, for users, not having any type of estimate can be frustrating. If IT departments have some idea of how long it will take to repair a problem (perhaps based on past experience), they can provide those details. It’s often better to “under-promise and over-deliver” when it comes to time estimates. If it’s just not possible to provide any reliable estimate, the notification should state just that and promise to provide more information when an update becomes available.
• What to expect: The notification should include details about the current and expected effects of the problem. In some cases, systems and network administrators might need to reboot devices or cause additional downtime in unrelated systems. If time windows are known, it’s a good idea to include those details as well.
• Any required actions: If users are expected to carry out any particular tasks or make changes to their normal processes, this information should be spelled out in the notification. If emergency processes are in place, users should be pointed to the documentation. If not, a point-person (such as a department manager) should be specified to make the determinations related to what users should do.
• Which users and systems are affected: Some recipients of notifications might be unaware of the problem altogether. The fact that they’re receiving a notification might indicate that they should be worried. If it’s likely that some recipients will be able to safely ignore the message, this should also be stated clearly. The goal is to minimize any unnecessary disruption to work.
• Reassurance: This might border on the “public relations” side of IT management, but it’s important for users to believe that their IT departments are doing whatever is possible to resolve the situation quickly. The notification might include contact information for reporting further problems, and can refer users to any posted policies or processes that might be relevant to the downtime.

Although this might seem like a lot of information to include, in many cases, it can be summed up in just a few sentences. The important point is for the message to be concise and informative.

What to Avoid in a Notification
Notifications should, for the most part, be brief and to the point. There are a few types of information that generally should not be included. First, speculation should be minimized. If a systems administrator suspects the failure of a disk controller (which has likely resulted in some data loss), it’s better to wait until the situation is understood before causing unnecessary panic. Additional technical details can also cause confusion to novice users. Clearly, IT staff will be in a position of considerable stress when sending out such notifications, so it’s important to stay focused on the primary information that is needed by IT users.

Automating Notification Management
Many of the tasks related to creating and sending notifications can be done manually, but it can be a tedious process. Commonly, systems administrators will send ad-hoc messages from their personal accounts. They will often neglect important information, causing recipients to respond requesting additional details. In the worst case, messages might never be sent, or users might be ignored altogether.

Data center automation tools can fill in some of these gaps and can help ensure that notifications work properly within and outside of the IT group. The first important benefit is the ability to define the roles and responsibilities of members of the IT team within the application. Contact information can also be centrally managed, and details such as on-call schedules, vacations, and rotating responsibilities can be defined. The automated system can then quickly respond to issues by contacting those who are involved.

The messages themselves can use a uniform format based on a predefined template. Fields for common information such as “Affected Systems,” “Summary,” and “Details” can also be defined. This can make it much easier for service desk staff to respond to common queries about applications. Finally, the system can keep track of who was notified about particular issues, and when a response was made. Overall, automated notifications can go a long way toward keeping IT staff and users informed of both expected and unexpected downtime and related issues. The end result is a better “customer satisfaction” experience for the entire organization.
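A predefined template and role-based routing of this kind might look like the following Python sketch. The field names beyond “Affected Systems,” “Summary,” and “Details” (for example `severity` and `eta`), the wording of the default estimate, and the functions `build_notification` and `recipients_for` are assumptions made for illustration only.

```python
from string import Template

# A uniform message layout; every notification fills in the same fields.
NOTIFICATION = Template(
    "Subject: [$severity] $summary\n"
    "Sent: $sent\n"
    "Affected Systems: $affected_systems\n"
    "Summary: $summary\n"
    "Estimated Resolution: $eta\n"
    "Details: $details\n"
)


def build_notification(**fields):
    """Fill in the template; a missing required field raises KeyError."""
    # If no reliable estimate exists, say so instead of leaving a gap.
    fields.setdefault("eta", "No reliable estimate yet; an update will follow.")
    return NOTIFICATION.substitute(**fields)


def recipients_for(category, roles, on_call):
    """Select only staff whose role covers the category and who are on call.

    roles maps a person to the set of categories they handle;
    on_call is the set of people currently reachable.
    """
    return sorted(
        person for person, categories in roles.items()
        if category in categories and person in on_call
    )
```

Because `substitute` fails on a missing field, an incomplete message is caught before it is sent, which addresses the common problem of ad-hoc messages that omit important information.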

Server Virtualization
Virtualization refers to the abstraction between the underlying physical components of an IT architecture and how it appears to users and other devices. The term virtualization can be applied to network devices, storage environments, databases, other portions of an IT infrastructure, and servers. Simply put, server virtualization is the ability to run multiple independent operating systems (OSs) concurrently on the same hardware.

Understanding Virtualization
The concept of running multiple “virtual machines” on a single computer can be traced back to the days of mainframes. In that architecture, many individual computing environments or sessions can be created on a single large computer. Although each session runs in what seems like an isolated space, the underlying management software and hardware translates users’ requests and commands so that users can access the same physical hardware. The benefits include scalability (many virtual machines can run simultaneously on the same hardware) and manageability (most administration is handled centrally and client-side hardware requirements are minimal).

Current Data Center Challenges
Before diving into the technical details of virtual machines and how they work, let’s set the foundation by exploring the background for why virtualization has quickly become an important option for data center administrators. The main issue is that of server utilization, or lack thereof. The vast majority of computers in most data centers run at a fraction of their overall potential (often as little as 10 to 15 percent). The obvious solution is server consolidation: Placing multiple applications on the same hardware. However, due to the complexity of many environments, potentials for conflicts can make server consolidation difficult if not impossible. One of the many benefits of virtualization is that it allows systems administrators to easily create multiple virtual operating environments on a single server system, thereby simplifying server consolidation.

Virtualization Architecture
For modern computing environments, virtualization solutions can be quickly and easily installed on standard hardware. Figure 35 shows a generic example of one way in which virtualization can be implemented.

Figure 35: A logical overview of virtualization.

At the bottom of the figure is the actual physical hardware—the CPU, memory, hard disks, network adapters, and other components that make up the complete system. Running atop the hardware is the OS, which includes device drivers that interact with physical system components. Moving up the stack, within the OS is a virtualization management layer. This layer allows for the creation of multiple independent virtual machine environments. The virtualization layer may run as an application or as a service (depending on the product). Finally, at the top of the “stack” are the virtual machines. It is at this level that multiple OSs can run simultaneously. The job of the virtualization layer is to translate and coordinate calls from within each virtual machine to and from the underlying hardware. For example, if the Linux-based OS within a virtual machine requests access to a file, the virtualization management application translates the request and redirects it to the actual file that represents a virtual hard drive on the host file system. Figure 36 shows an example of how a Microsoft Virtual Server 2005-based virtualization stack might look.

Figure 36: An example of a virtualization configuration using Microsoft Virtual Server 2005 R2.

Virtualization Terminology
Virtualization provides new ways in which to refer to standard computer resources, so it’s important to keep in mind some basic terminology. The physical computer on which the virtualization platform is running is known as the host computer and the primary OS is referred to as the host OS. The OSs that run on top of the virtualization platform are known as guest OSs.

An additional concept to keep in mind is the virtual hard disk. From the perspective of the guest OS, virtual hard disks appear to be actual physical hard disks. However, physically, they’re stored as files within the host OS file system. Finally, another major advantage of virtual machines is that they can be “rolled back” to a previous state. This is done by keeping track of all write operations and storing them in a file that is separate from the primary virtual hard disk.
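The roll-back idea in the preceding paragraph, where writes are kept separate from the primary virtual hard disk, can be illustrated with a toy model. The sketch below is a conceptual simplification in Python, not how any specific virtualization product stores its disks; the class `VirtualDisk` and its block-level dictionary layout are assumptions for illustration.

```python
class VirtualDisk:
    """Toy block device: an untouched base image plus a separate overlay of writes."""

    def __init__(self, base_blocks):
        self._base = dict(base_blocks)  # block number -> bytes; the primary disk
        self._overlay = {}              # writes made since the last commit

    def read(self, block):
        # Reads prefer the most recent write, then fall back to the base image.
        if block in self._overlay:
            return self._overlay[block]
        return self._base.get(block, b"")

    def write(self, block, data):
        # The base image is never modified directly.
        self._overlay[block] = data

    def roll_back(self):
        """Discard all tracked writes, returning to the previous state."""
        self._overlay.clear()

    def commit(self):
        """Merge tracked writes into the base image, making them permanent."""
        self._base.update(self._overlay)
        self._overlay.clear()
```

Rolling back is cheap in this model because it only discards the overlay; the base image never has to be restored from a copy.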

Other Virtualization Approaches
It’s important to note that, in addition to the OS-based virtualization layer shown in Figure 36, there are other virtualization approaches. In one such approach, the virtualization layer can run directly on the hardware itself. This model (also referred to as a “Hypervisor”) offers the advantage of avoiding the overhead related to running a primary host OS. The drawbacks, however, include more specific requirements for device drivers and the potential lack of management software.

Another virtualization approach is “application-level virtualization.” In this configuration, application environments are virtualized, in contrast with running entire OSs. The main benefit is that scalability can be dramatically improved: often hundreds of applications can run simultaneously on a single physical server. There are drawbacks, however; some complex applications might not be supported or might require modifications. In addition, OS versions, device drivers, updates, and settings will affect all virtual environments because they’re defined at the machine level.

The following sections focus on the type of virtualization described in Figure 36.

Benefits of Virtualization
The list of benefits related to working with virtual machines is a long one. Let’s take a brief look at some of the most relevant advantages from the standpoint of data center management:
• Increased hardware utilization: By allowing multiple virtual machines to run concurrently on a single server, overall resource utilization can be dramatically improved. This benefit can lead to dramatic cost reductions in data center environments, without significant costs for upgrading current hardware.
• Hardware independence: One of the major challenges related to managing data center environments is dealing with heterogeneous hardware configurations. Although it’s easy to physically relocate an array of hard disks to another machine, chances are good that OS and device driver differences will prevent it from working smoothly (if at all). On a given virtualization platform, however, virtual machines will use a standardized virtual environment that will stay constant regardless of the physical hardware configuration.
• Load-balancing and portability: Guest OSs are designed for compatibility with the virtualization platform (and not the underlying hardware), so they can easily be moved between host computers. This process can allow users and systems administrators to easily make copies of entire virtual machines or to rebalance them based on overall server load. Figure 37 provides an illustration. This method allows systems administrators to optimize performance as business and performance needs change over time. In addition, it’s far easier than manually moving applications or reallocating physical servers.

Figure 37: Load-balancing of virtual machines based on utilization.

• Rapid provisioning: New virtual machines can be set up in a matter of minutes, and hardware changes (such as the addition of a virtual hard disk or network interface) can be performed in a matter of seconds. When compared with the process of procuring new hardware, rack-mounting the devices, and performing the entire installation process, provisioning and deploying virtual machines usually takes just a small fraction of the time of deploying new hardware.
• Backup and disaster recovery: The process of creating a complete backup of a virtual machine can be quicker and easier than backing up a physical machine. This process also lends itself well to the creation and maintenance of a disaster recovery site.
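To make the load-balancing benefit above (and Figure 37) more concrete, here is one simple way to decide where virtual machines could be placed: a greedy heuristic that assigns the largest virtual machines first, each to the host with the lowest load so far. This is a sketch under simplifying assumptions, not the method used by any particular product. The function name `rebalance` is hypothetical, load is reduced to a single number per virtual machine, and real tools would also weigh memory, disk, network, and the cost of moving a machine.

```python
def rebalance(vm_loads, hosts):
    """Greedy placement: largest VMs first, each onto the least-loaded host.

    vm_loads maps a VM name to a load figure (for example, average CPU percent).
    Returns (placement, totals): VM -> host, and host -> total assigned load.
    """
    totals = {host: 0.0 for host in hosts}
    placement = {}
    # Sort by descending load; break ties by name so results are repeatable.
    for vm, load in sorted(vm_loads.items(), key=lambda item: (-item[1], item[0])):
        target = min(hosts, key=lambda host: (totals[host], host))
        placement[vm] = target
        totals[target] += load
    return placement, totals
```

For example, four virtual machines with loads of 40, 30, 20, and 10 spread across two hosts end up with a total load of 50 on each host.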

Virtualization Scenarios
Earlier, we mentioned how virtualization can help data center administrators in the area of server consolidation. This, however, is only one of the many ways in which this technology can be used. Others include:
• Agile management: As virtual machines can be created, reconfigured, copied, and moved far more easily than can physical servers, virtualization technology can help IT departments remain flexible enough to accommodate rapid changes.
• Support for legacy applications: IT departments are commonly stuck with supporting older servers because applications require OSs that can’t run on newer hardware. The result is higher support costs and decreased reliability. By placing these applications within a virtual machine, the applications can be moved to newer hardware while still running on an older OS.
• Software development and testing: Developers and testers often require the ability to test their software in many configurations. Virtual machines can easily be created for this purpose. It’s easy to copy virtual machines to make, for example, changes to the service pack level. Additionally, whenever a test is complete, the virtual machine can be reverted to its original state to start the process again.
• Training: Dozens of virtual machines can be hosted on just a few physical servers, and trainers can easily roll back changes before or after classes. Students can access their virtual machines using low-end client terminals or even over the Internet. Usually, it’s far easier to maintain a few host servers than it is to maintain dozens of client workstations.

Limitations of Virtualization
Despite the many benefits and applications of virtualization technology, there are scenarios in which this approach might not be the perfect solution. The first and foremost concern for most systems administrators is that of performance. All virtualization solutions will include some level of overhead due to the translation of hardware calls between each virtual machine and physical hardware device. Furthermore, virtual machines are unaware of each other, so competition for resources such as CPU, memory, disk, and network devices can become quite high. Overall, for many types of applications and services, organizations will likely find that the many benefits of virtualization will outweigh the performance hit. The key point is that IT departments should do as much performance testing as possible before rolling out virtualized applications.

There are additional considerations to keep in mind. For example, for physical servers that are currently running at or near capacity, it might make more sense to leave those systems as they are. The same goes for complex multi-tier applications that may be optimized for a very specific hardware configuration. Additionally, for applications that require custom hardware that is not supported by the virtualization platform (for example, 3-D video acceleration), running within a virtual machine will not be an option. Over time, virtualization solutions will include increasing levels of hardware support, but in the meantime, it’s important to test and verify your requirements before going live with virtualization.


Automating Virtual Machine Management
In many ways, IT environments should treat virtual machines just like physical ones. Virtual machines should be regularly patched, monitored, and backed up and should adhere to standard IT best practices. This leads to the issue of automating the management of virtualization solutions. IT departments should look for tools that are virtualization-aware. Specifically, these solutions should be able to discern which virtual machines are running on which host systems. Ideally, virtualization management tools should be integrated with other data center automation features such as change and configuration management and performance monitoring and should coordinate with IT policies and processes.

Developers can also automate virtual machine management. Most virtualization solutions provide an Application Programming Interface (API) that allows for basic automation of virtual machines. You can generally write simple scripts that perform tasks such as creating new virtual machines, starting and stopping virtual machines, and moving virtual machines to other computers. More complex programs can also be created.

Overall, through the use of virtualization technology, IT departments can realize numerous benefits such as increased hardware utilization and improved management of computer resources. And, through the use of automation, they can ensure that virtual machines are managed as well as physical ones.
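To make the scripted tasks described above more concrete (creating, starting, stopping, and moving virtual machines, and discerning which virtual machines run on which hosts), here is a minimal Python sketch. It does not call any real virtualization product. The `VirtualMachine` and `HostInventory` classes and all of their method names are hypothetical stand-ins for whatever API a given platform provides.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    """Hypothetical record describing one virtual machine."""
    name: str
    host: str
    running: bool = False


@dataclass
class HostInventory:
    """Hypothetical in-memory stand-in for a virtualization platform's API."""
    hosts: set
    vms: dict = field(default_factory=dict)

    def create_vm(self, name, host):
        # Refuse to place a VM on a host the inventory does not know about.
        if host not in self.hosts:
            raise ValueError(f"unknown host: {host}")
        if name in self.vms:
            raise ValueError(f"VM already exists: {name}")
        self.vms[name] = VirtualMachine(name, host)
        return self.vms[name]

    def start_vm(self, name):
        self.vms[name].running = True

    def stop_vm(self, name):
        self.vms[name].running = False

    def move_vm(self, name, target_host):
        # Moving a VM only changes which host it is recorded against.
        if target_host not in self.hosts:
            raise ValueError(f"unknown host: {target_host}")
        self.vms[name].host = target_host

    def vms_on_host(self, host):
        # The "virtualization-aware" question: which VMs run on this host?
        return sorted(vm.name for vm in self.vms.values() if vm.host == host)
```

A script built on an interface like this could, for example, create a test machine, start it, and later move it to a less heavily loaded host, while `vms_on_host` gives management tools the host-to-guest mapping they need.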

Remote/Branch Office Management
In an ideal world, all of an organization’s technical and human resources would be located within a single building or location. Everything would be within arm’s reach, and systems administrators would be able to easily access all their resources from a single data center. The reality for all but the smallest of organizations, however, is that it’s vital to be able to support a distributed environment. The specifics can range from regional offices to home offices to traveling “road warriors.” In all cases, it’s important to ensure that users can get the information they need and that all IT assets are properly managed.

Challenges of Remote Management
Before delving into the details of automating remote management, it will be helpful to discuss the major challenges related to performing these tasks. The overall goal is for IT departments to ensure consistency in managing resources that reside in the corporate data center as well as resources that might be located in a small office on the other side of the planet. Let’s look at some details.


Technical Issues
In some ways, technology has come to the rescue: network bandwidth is more readily available (and at a lower cost) than it has been in the past, and establishing physical network connectivity is usually fairly simple. In other ways, improvements in technology have come with a slew of new problems. Storage requirements often grow at a pace that far exceeds the capacity of devices. In addition, almost all employees of modern organizations have grown accustomed to high-bandwidth, low-latency network connections regardless of their locations. IT departments must meet these demands while working within budget and resource constraints.

Perhaps one of the most pertinent issues related to remote office management is that of network bandwidth. Usually the total amount of bandwidth is constrained, and factors such as latency must be taken into account. These constraints have often led to remote office systems being updated less frequently. Servers sitting in a wiring closet of a branch office are often neglected and don’t get the attention they deserve. The result is systems that are likely out of compliance with IT policies and standards.

Personnel Issues
Ideally, organizations would be able to place senior-level systems and network administrators at each remote office. Unfortunately, cost considerations almost always prohibit this. Therefore, certain tasks must be performed manually (and often by less-trained individuals). Common tasks include the installation of security updates or the management of backup media. Because dedicated technical staff is not available, it’s common for these important operations to be overlooked or to be performed improperly. Even when using remote management tools, some tasks cannot easily be accomplished from a remote location.

Business Issues
Functions served by remote offices can be mission critical for many of an organization’s operations. From a business standpoint, new initiatives and changes in standard operating procedures must apply throughout the entire organization. The concept of “out of sight, out of mind” simply is not acceptable for remote locations. All of the hardware, software, and network devices that are under IT’s supervision must be maintained to ensure overall reliability and security.


Automating Remote Office Management
Clearly, the task of managing remote locations and resources can be a difficult one. There is some good news, however: data center automation solutions can make the entire process significantly easier and much more efficient. IT departments that need to support remote offices should look for several features and capabilities in the solutions that they select:
• Change and configuration management: Keeping track of the purpose, function, and configuration of remote resources is extremely important in distributed environments. Often, physically walking up to a specific computer just isn’t an option, so the data must be accurate and up to date. Whenever changes are required, an automated solution can efficiently distribute them to all the IT department’s resources. In addition, it can keep a record of which changes were made and who made them. Doing so helps ensure that no devices are overlooked and can help avoid many common problems.
• Use of a configuration management database (CMDB): Collecting and maintaining information across WAN links in distributed environments can require a lot of bandwidth. When IT managers need to generate reports, it’s often unacceptable to wait to query all the devices individually. A CMDB can centrally store all the important technical details of the entire distributed environment and can facilitate quick access to the details.
• Notifications: In fully staffed data centers, trained support staff is usually available to resolve issues around the clock. For remote offices, however, an automated solution must be able to notify the appropriate personnel about any problems that might have occurred. In addition to IT staff, those alerted might include the branch manager or other contacts at the remote site.
• Monitoring: The server and network resources that reside in remote offices are often critical to the users in those offices. If domain controllers, database servers, routers, or firewalls become unavailable, dozens or hundreds of users might be unable to complete their job functions. Furthermore, staff at these locations might be unqualified to accurately diagnose a problem and determine its root cause. Therefore, it’s important for computing devices and resources to be closely monitored at all times.
• Scheduling: When supporting remote sites that are located in distant locations, factors such as time zones and normal work hours must be taken into account. When performing tasks such as applying updates, it’s important to have the ability to specify when the changes should be committed. The main benefit is the ability to minimize disruptions to normal activity without placing an unnecessary burden on IT staff.
• Support for low-bandwidth and unreliable connections: Remote sites will have varying levels of network capacity and reliability. The automation solution must be able to accommodate situations such as the failure of a connection during an important update, and it should apply security changes as soon as network connections become available again. Also, client agents should be able to automatically detect low-bandwidth states and reduce the number and length of the messages they send accordingly.
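The scheduling requirement in the list above can be illustrated with a short Python sketch that works out when a remote site’s next local maintenance window begins, expressed in UTC so that a central system can queue the work. The function name and parameters are assumptions made for this illustration. The sketch also assumes each site is described by a fixed UTC offset, which ignores daylight saving time; a production tool would more likely use a full time zone database.

```python
from datetime import datetime, timedelta, timezone


def next_window_utc(now_utc, utc_offset_hours, window_start_hour):
    """Return the next start of a site's local maintenance window, in UTC.

    now_utc: timezone-aware datetime for the current moment.
    utc_offset_hours: the site's fixed offset from UTC (assumption: no DST).
    window_start_hour: local hour (0-23) at which the window opens.
    """
    site_tz = timezone(timedelta(hours=utc_offset_hours))
    local_now = now_utc.astimezone(site_tz)

    # Today's window start in the site's local time.
    candidate = local_now.replace(
        hour=window_start_hour, minute=0, second=0, microsecond=0
    )
    # If today's window has already opened, use tomorrow's.
    if candidate <= local_now:
        candidate += timedelta(days=1)

    return candidate.astimezone(timezone.utc)
```

For instance, a central server could call this function once per site and sort the results to decide the order in which branch offices receive an update, so that each office is touched only outside its own working hours.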

In addition, most of the best practices covered throughout this guide also apply to remote sites. By incorporating all these features in an IT automation solution, organizations can be assured that their remote resources will enjoy the same level of care and management as resources in corporate data centers.

Patch Management
One of the least glamorous but still important tasks faced by systems and network administrators is that of keeping their hardware and software up to date. The benefits of applying patches for all devices within an environment can range from reducing security vulnerabilities to ensuring reliability and uptime. More importantly, the cost of not diligently testing and applying updates can be extremely high.

The Importance of Patch Management
Although many of the reasons to keep systems updated might be obvious, let’s take a quick look at the importance of having a patch management process. First and foremost, security is an important concern for all the components of an IT infrastructure. From physical hardware to operating systems (OSs) to applications, it’s important for known vulnerabilities and issues to be addressed as quickly as possible. Due to the nature of security-related updates, it’s difficult to predict which systems will be affected and when updates will be made available. Thus, organizations must be ready to deploy these updates as soon as possible to prevent exposure to security attacks.

Other reasons for managing patches are just as relevant. By using the latest software, IT departments can avoid problems that might lead to downtime or data corruption. Some patches might increase performance or improve usability. In all cases, there are many advantages to deploying patches using an organized process.

Challenges of Manual Patch Management
Although some environments might handle patches on an ad-hoc “as-needed” basis, this approach clearly leaves a lot to be desired. Even in relatively small IT environments, there are numerous problems related to performing patch management through manual processes. Due to the many demands on IT staff’s time, it’s often easy to overlook a specific patch or a specific target device when updates are handled manually. The time and costs related to deploying updates can also present a barrier to reacting as quickly as possible.

In larger IT environments, coordinating downtime schedules and allocating resources for keeping hundreds or thousands of devices up to date can be difficult (if not impossible). Often, entire remote sites or branch offices might be out of compliance with standard IT best practices and policies. These seemingly small challenges often result in problems that are very difficult to troubleshoot or that can allow network-wide security breaches. With all these factors in mind, it’s easy to see how manual patch management is not ideal.


Developing a Patch Management Process
An important step in improving patch management is to develop a well-defined process. Figure 38 provides an example of the high-level steps that should be included in the process.

Figure 38: Steps in a typical patch management process.

Obtaining Updates
It’s important for IT staff to be aware of new updates and patches as soon as possible after they’re released. Although many vendors provide newsletters and bulletins related to updates, most IT environments must continuously monitor many sources for this information. This requirement makes it very likely that some updates will be overlooked.

Identifying Affected Systems
Once a potential patch has been made available, systems administrators must determine whether the issue applies to their environment. In some cases, the details of the update might not necessitate a deployment to the entire environment. In other cases, however, dozens or hundreds of systems might be affected. If the patch is relevant, the process should continue.

Testing Updates
A sad-but-true fact about working in IT is that sometimes the “cure” can be worse than the disease. Software and hardware vendors are usually under a tremendous amount of pressure to react to vulnerabilities once they’re discovered, and it’s possible that these updates will introduce new bugs or may be incompatible with certain system configurations. This reality highlights the need for testing an update. Developers and systems administrators should establish test environments that can be used to help ensure that a patch does not have any unintended effects.
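The “identifying affected systems” step described above comes down to matching a patch’s applicability against an inventory of what is installed where. The following Python sketch shows one simple way to do that. The inventory layout, the function names, and the product names in the usage note are hypothetical, and the sketch assumes plain dotted numeric version strings, which real products do not always use.

```python
def parse_version(text):
    """Turn '5.2.10' into (5, 2, 10) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def affected_systems(inventory, product, fixed_in):
    """List hosts running `product` at a version older than `fixed_in`.

    inventory: {host name: {product name: installed version string}}
    product: the product the patch applies to.
    fixed_in: the first version that already contains the fix.
    """
    threshold = parse_version(fixed_in)
    return sorted(
        host
        for host, installed in inventory.items()
        if product in installed
        and parse_version(installed[product]) < threshold
    )
```

Given an inventory in which one server runs a made-up product `exampledb` at version 5.2.9 and another runs 5.2.10, calling `affected_systems(inventory, "exampledb", "5.2.10")` would flag only the first server. Comparing parsed tuples rather than raw strings matters here, because the string "5.2.9" sorts after "5.2.10".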


Deploying Updates
Assuming that a patch has passed the testing process, it’s time to roll out the update to systems throughout the environment. Ideally, it will be possible to deploy all the changes simultaneously. More likely, however, the need for system reboots or downtime will force IT departments to work within regularly scheduled downtime windows.

Auditing Changes
Once patches have been deployed, it’s important to verify that all systems have been updated. Due to technical problems or human error, it’s possible that some systems were not correctly patched. When done manually, this portion of the process often requires the tedious step of logging into each server or manually running a network scanning tool.

Automating Patch Management
Clearly, the process of implementing patch management is not an easy one. After multiplying the effort required to perform the outlined steps by the frequency of updates from various vendors, performing the process manually might simply be impossible. Fortunately, data center automation tools can help to dramatically reduce the time required to distribute updates and the number of errors made along the way. Figure 39 provides an example of how an automated patch management solution might work.

Figure 39: An overview of an automated patch management process.


The process begins with the detection of new patches. Ideally, the system will automatically download the appropriate files. If systems administrators determine that the update is relevant and that it should be tested, they can instruct the solution to deploy the update to a test set of servers. They can then perform any required testing. If the update passes the tests, they can instruct the automated patch management system to update the relevant devices. Patches are then applied and verified based on the organization’s rules. The entire process often takes only a small fraction of the time required to perform these steps manually.

Benefits of Automated Patch Management
The main purpose of an automated patch management solution is to help carry out all the steps mentioned earlier. This includes obtaining updates, testing them, deploying the changes, and auditing systems. The benefits of automating these tasks include:
• Obtaining updates: The process of discovering and downloading updates can be automated through various tools. This is often done through a database that is managed by the solution vendor. Broad support for many different device types, OSs, and applications is a definite plus. IT staff can quickly view a “dashboard” that highlights which new patches need to be deployed.
• Identifying patch targets: It’s often difficult to determine exactly which systems might need to be patched. Automated tools can determine the configuration of IT components and allow administrators to easily determine which systems might be affected.
• Auditing: Expected system configurations can be automatically compared with current configuration details to help prove compliance with IT standards.
• Simplified deployment: Patches can be deployed automatically to hundreds or even thousands of devices. When necessary, the deployment can be coordinated with downtime windows.
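As a rough illustration of the auditing benefit listed above, the following Python sketch compares the set of patches each system is expected to have with the set it actually reports. The function name, the data layout, and the patch identifiers used in the usage note are invented for this example; a real tool would gather the installed list from the systems themselves or from a CMDB.

```python
def audit_patches(expected, installed_by_host):
    """Return {host: sorted missing patch IDs} for out-of-compliance hosts.

    expected: iterable of patch IDs every host should have.
    installed_by_host: {host name: iterable of patch IDs found on that host}
    Hosts that have every expected patch are left out of the report.
    """
    required = set(expected)
    report = {}
    for host, installed in installed_by_host.items():
        missing = required - set(installed)
        if missing:
            report[host] = sorted(missing)
    return report
```

If the expected list were `["P-100", "P-101"]` and a branch office server reported only `"P-100"`, the report would contain a single entry for that server listing `"P-101"`. An empty report means every audited system matches the expected patch level.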


With all these benefits in mind, let’s look at some additional features that can help IT departments manage updates.

What to Look for in Patch Management Solutions
IT organizations should look for patch management solutions that integrate with other data center automation tools. Through the use of a configuration management database (CMDB), all details related to servers, network devices, workstations, and software can be collected centrally. The CMDB facilitates on-demand reporting, which can help organizations demonstrate compliance with regulatory requirements as well as internal patch policies. Other features include automated notifications, support for remote offices, easy deployment, and support for as many systems and devices as possible.

Overall, the important task of keeping servers and network devices up to date can be greatly simplified through the use of data center automation tools. This approach provides the best of both worlds: ensuring that systems are running in their ideal configuration while freeing up IT time and resources for other tasks.


Network Provisioning
Perhaps the most critical portion of modern IT environments is the underlying network infrastructure. Almost all applications, workstations, and servers depend on connectivity in order to get their jobs done. In the “old days” of computing, networks were able to remain largely static. Although new switches might have been added occasionally to support additional devices, the scope of the changes was limited. In current environments, the need to react to rapidly changing business and technical needs has made the process of network provisioning increasingly important.

Defining Provisioning Needs
From a high-level view of the network, it’s important to keep in mind several main goals for managing the configuration of the infrastructure. The main objective should be to allow systems and network administrators to efficiently design, test, and deploy network changes. The quicker the IT team can react to changing requirements, the better its coordination with the rest of the organization will be. The list of types of devices that are supported by network teams is a long one, and usually includes many of the items shown in Figure 40.

Figure 40: Examples of commonly supported network device types.

Common operations include the deployment of new devices and making network-wide changes. Additional tasks include making sure that devices are configured as expected and that they meet the organization’s business and technical requirements. Figure 41 provides an overview of the types of tasks that are required to perform network provisioning. Let’s take a look at some of these requirements in more detail, and how using an automated network provisioning solution can help.


Figure 41: An overview of network provisioning goals.

Modeling and Testing Changes
Simple types of network changes might require only minor modifications to one or a few devices. For example, if a new port or protocol should be allowed to cross a single firewall or router, the change can be performed manually by a knowledgeable network administrator with little risk. Other types of network changes can require the coordination of changes between dozens or even hundreds of network devices. Often, a relatively simple error such as a typo in a network configuration file or overlooking a single device can lead to downtime for entire segments of the network. Furthermore, applying changes to numerous devices at the same time can be a tedious and error-prone process.

This additional complexity can best be managed through the use of an automated system. Such a system allows network administrators to design their expected changes in an offline simulation or test environment so that they can predict the effects of those changes. This can help catch any configuration problems before they are actually committed in a production environment.
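One small piece of the offline testing described above is checking a proposed change for simple mistakes before it reaches any device. The Python sketch below validates a list of proposed firewall rules using the standard library’s `ipaddress` module. The rule format (a dictionary with `source`, `port`, and `action` keys) and the function name are assumptions made for this illustration and do not correspond to any particular vendor’s configuration syntax.

```python
import ipaddress


def validate_rules(rules):
    """Check proposed rules offline and return a list of (index, problem).

    Each rule is a dict with hypothetical keys:
      source: network in CIDR notation, such as "10.0.0.0/24"
      port:   integer from 1 to 65535
      action: "allow" or "deny"
    An empty result means no problems were found by these checks.
    """
    problems = []
    for index, rule in enumerate(rules):
        # ip_network raises ValueError for malformed addresses or prefixes.
        try:
            ipaddress.ip_network(rule.get("source", ""))
        except ValueError as exc:
            problems.append((index, f"bad source network: {exc}"))

        port = rule.get("port")
        if not isinstance(port, int) or not 1 <= port <= 65535:
            problems.append((index, f"bad port: {port!r}"))

        if rule.get("action") not in ("allow", "deny"):
            problems.append((index, f"bad action: {rule.get('action')!r}"))
    return problems
```

Checks like these catch typos such as an out-of-range octet or a misspelled action keyword. They do not predict how a change will affect traffic, which is the job of the simulation or test environment itself.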


Managing Device Configurations
Once an IT organization has decided which changes need to be made, an automated solution can apply those changes. The process generally involves defining which modifications are to be made to which devices. Data center automation tools can verify that the proper approvals have been obtained and that standard change and configuration management processes have been followed. The actual modifications can be deployed simultaneously to many different devices, or they can be scheduled to occur in sequence. From a network standpoint, the coordination of changes is extremely important in order to avoid configuration conflicts or unnecessary downtime.

Automated network provisioning systems also provide additional useful features. Common operations might include copying the relevant portions of the configuration of an existing device (for de-provisioning or re-provisioning), or defining templates for how network devices should be configured. For environments that often need to scale quickly, the ability to define standard configuration templates for devices such as routers, switches, firewalls, load balancers, and content caches can dramatically reduce deployment times and configuration errors.

Auditing Device Configurations
Even in well-managed IT environments, it’s possible for the configuration of a device to deviate from its expected settings. This might happen due to simple human error or as a result of an intrusion or unauthorized modification. Automated network provisioning solutions should be able to regularly scan the configuration of all the devices on the network and report on any unexpected values that are encountered. These reports can be used to demonstrate compliance with regulatory requirements and IT policies and standards.

Using a Configuration Management Database
An excellent method for managing the complexity of network environments is through the use of a centralized configuration management database (CMDB). This central repository can store details related to all the devices in the environment, including networking hardware, servers, workstations, and applications. The information can be combined to produce reports that, for example, provide insight into overall network utilization or help find the root causes of any performance problems or failures that might have occurred.

Additional Benefits of Automation
By automating network provisioning, IT departments can also realize numerous additional benefits. For example, automatic notifications can be sent whenever problems occur on the network. Also, overall security is often greatly increased because network administrators will no longer need to share passwords, and IT managers can ensure that only authorized personnel are able to make changes. Overall, data center automation tools can greatly simplify the process of network provisioning and can increase the responsiveness of an IT department.
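The configuration templates and configuration audits described above can be sketched in a few lines of Python using the standard library’s `string.Template`. The configuration syntax shown here is made up for illustration and is not any vendor’s command language, and the function names and template fields are likewise assumptions.

```python
from string import Template

# Hypothetical standard template for a branch office switch.
SWITCH_TEMPLATE = Template(
    "hostname $hostname\n"
    "vlan $vlan\n"
    "ntp server $ntp_server\n"
)


def render_config(hostname, vlan, ntp_server):
    """Fill in the standard template for one device."""
    return SWITCH_TEMPLATE.substitute(
        hostname=hostname, vlan=vlan, ntp_server=ntp_server
    )


def config_drift(expected, actual):
    """Compare two configurations line by line, ignoring line order.

    Returns the lines that should be present but are not ("missing")
    and the lines that are present but should not be ("unexpected").
    """
    expected_lines = set(expected.splitlines())
    actual_lines = set(actual.splitlines())
    return {
        "missing": sorted(expected_lines - actual_lines),
        "unexpected": sorted(actual_lines - expected_lines),
    }
```

Rendering the template gives the expected configuration for a device, and `config_drift` compares it with what the device currently reports. Treating the configuration as an unordered set of lines is a simplification: it suits flat settings like these but would not suit configurations in which line order or nesting carries meaning.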


Network Security and Authentication
It is commonly accepted that network security is one of the most important aspects of IT management, but the methods by which users and computers are granted access to communicate within an organization can vary greatly between environments. The goal of most security measures is to ensure that only authorized users can access resources while still allowing all users to do their jobs with a minimal amount of hassle.

Understanding Security Layers
If you were to imagine a house with a concrete front door that includes numerous locks and that has flimsy single-pane windows, it’s unlikely that you would consider the house to be secure. The same applies to networks: security must be implemented and managed throughout the organization and at all entry points to the network. The best-implemented security plan will include multiple layers of security. Figure 42 provides an overview of some of these layers.

Figure 42: An overview of various IT security layers.

All these layers work together to form the links in an organization’s armor. For example, before an employee or consultant can access a specific database application, the employee will first have to have access to a physical network port. He or she will then be verified at the network and server levels, and finally at the application level. The user must meet all these challenges in order to be able to access the application.


Choosing a Network Authentication Method
When working in all but the smallest of IT environments, it’s important to use a centralized authentication mechanism. One of the most commonly used systems is Microsoft’s Active Directory (AD). AD domains provide an organization-wide security database that can be used to control permissions for users, groups, and computers throughout the environment. All administration is managed centrally without requiring security to be configured on individual computers. As long as a user has the appropriate credentials, he or she will be able to access the appropriate devices or services.

Security Protocols
For managing authentication in a distributed network environment, one of the most common protocols is Kerberos. This protocol allows computer systems to positively identify a user in a secure way. Through the use of encryption, it can help avoid security problems such as the interception of security credentials. Generally, Kerberos is implemented at the server or the application level. However, network devices and other components can also take advantage of it.

There are also several other authentication methods that can be used. Older versions of the Microsoft Windows platform use the NTLM authentication protocol and method. Although this method is less secure than Kerberos, NTLM is a widely supported standard that might still be required to support down-level clients and servers. Also, numerous Lightweight Directory Access Protocol (LDAP)-compliant solutions can integrate with or replace AD. Remote Authentication Dial-In User Service (RADIUS), which was originally developed for the purpose of authenticating remote users, can help centralize security for mobile users and remote locations.

Authentication Mechanisms
The goal of authentication is to ensure that a specific user is who he or she claims to be. By far, the most common authentication mechanism is the use of a login and password combination. Although this method meets basic security requirements, it has numerous drawbacks. First, users are forced to memorize these pieces of information, and handling lost passwords is a tedious and time-consuming process. Additionally, passwords can be shared or stolen, making it possible that a person is not actually being positively identified. So much is dependent on having the right credentials that this method leaves much room for improvement.

Newer authentication mechanisms include biometrics and the use of specialized security devices. Biometric devices are most commonly based on the use of fingerprints or voice identification to identify individuals. Other methods such as retinal scans are available (though they’re most commonly seen in spy movies). Security devices such as an encryption card or “fob” can also be used to verify individuals’ identities, especially for remote access. All of these methods involve a certain level of management overhead, and IT departments must be able to keep track of security principals, regardless of the method used.


Authorization
Figuring out how administrators can control access to a system is only part of the security puzzle. Just as important is defining what exactly these users can do. Restrictions can range from determining which files and folders can be accessed to limiting the time of day during which a user can log on. Authorization is the process of granting permissions to security principals (such as users or computers) in order to granularly manage what tasks they can perform.

Automating Security Management
With the many methods of managing and determining network permissions, IT departments are faced with a difficult challenge. On one hand, administrators must make systems as usable and accessible to authorized users as is practical. On the other hand, the IT team must ensure that all the different levels and layers of security include consistent information to prevent unauthorized access. Even a single device or database that is out of compliance with policies can create a major security hole in the overall infrastructure.

So how can security be managed across all these disparate systems? A commonly used method is through the use of a centralized security management solution. Figure 43 shows an example of how this might work from a conceptual standpoint. The goal of the solution is to coordinate details between multiple security providers. It can do so through the use of a centralized security database that might contain either a master set of credentials or mappings between different types of security systems. The actual implementation details will vary based on the overall needs of the environment. From the user’s standpoint, this can help achieve the benefit of single sign-on (SSO).
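To illustrate the idea of a central database that holds mappings between different security systems, the following Python sketch checks whether every account that exists on an individual system is backed by an active central identity. The data layout, the function name, and the account names in the usage note are all hypothetical. The sketch covers only the consistency check, not authentication itself.

```python
def orphaned_accounts(central_identities, system_accounts):
    """Find per-system accounts with no active central identity behind them.

    central_identities:
        {identity: {"active": bool, "accounts": {system: account name}}}
    system_accounts:
        {system: set of account names that currently exist on that system}
    Returns a sorted list of (system, account name) pairs to investigate.
    """
    # Collect every (system, account) pair that an active identity maps to.
    sanctioned = set()
    for record in central_identities.values():
        if record["active"]:
            sanctioned.update(record["accounts"].items())

    findings = []
    for system, accounts in system_accounts.items():
        for account in accounts:
            if (system, account) not in sanctioned:
                findings.append((system, account))
    return sorted(findings)
```

For example, if an employee’s central identity has been deactivated but a database login mapped to that identity still exists, the function reports that login. This is the kind of single out-of-compliance system that can otherwise go unnoticed.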

Figure 43: Coordinating security between multiple systems.

Overall, by integrating the management of overall security, IT departments and organizations can be sure that all their systems remain coordinated and that only authorized users can access the network.

Business Processes
An important characteristic of successful businesses is a strong alignment of the efforts between multiple areas of the organization. This arrangement rarely occurs by itself; instead, it requires significant time and effort from organizational leaders. The end result is often the creation of processes that define how all areas of the enterprise should work together to reach common goals.

The Benefits of Well-Defined Processes
Business processes are put in place to describe best practices and methods for consistently performing certain tasks. Often, the tasks involved will include input and interaction of individuals from throughout the organization. Before delving into details and examples of processes, let’s first look at the value and benefits.

There are several valuable benefits of implementing processes. The first is consistency: by documenting the way in which certain tasks should be completed, you can be assured that all members of the organization will know their roles and how they may need to interact with others. This alone can lead to many benefits. First, when tasks are performed in a consistent manner, they become predictable. For example, if sales leads are always qualified by following the same steps, managers can get a better idea of how much effort will be required to close a sale. If the business needs to react to any changes (for example, a new competitive product), the process can be updated and all employees can be informed of the new steps that need to be carried out.

Another major benefit of defining business processes is related to ensuring best practices. The goal should not be to stifle creativity. Rather, it’s often useful to have business leaders from throughout the organization decide upon the best way to accomplish a particular task. Compared with the alternative, in which every employee accomplishes the task a different way, consistency can greatly improve efficiency. Additionally, when processes are documented, new employees or staff members who need to take on new roles will be able to quickly learn what is required without making a lot of the mistakes that others may have had to learn from “the hard way.”

Defining Business Processes
Once you’ve decided that your organization can benefit from the implementation of business processes, it’s time to get down to the details. You must define business processes and determine how they can best be implemented to meet the company’s needs.


Deciding Which Processes to Create

An obvious first step related to designing processes is to figure out which sets of tasks to work on. At one extreme, organizations could develop detailed plans for performing just about every business function. However, creating and enforcing business processes requires time and effort, and the value of the process should be considered before getting started. Some characteristics of tasks that might be good candidates for well-defined processes include:

• Tasks that are performed frequently: The more often a process is used, the more value it will have for the organization. For tasks that are performed rarely (for example, a few steps that are carried out once per year), the effort related to defining the process might not be worthwhile.
• Tasks that involve multiple people: Processes are most useful when there is a sequence of steps that must be carried out to reach a goal. When multiple people depend upon each other to complete the task, a process can help define each person’s responsibilities and can help ensure that things don’t “fall through the cracks.”
• Tasks that have consistent workflows: Since the goal of a process is to define the best way in which to accomplish a task, processes are best suited for operations that should be done similarly every time. Although it is possible to define processes when significant variations are common, often these processes lead to many exceptions, which can lower the overall value of the effort.

With these aspects in mind, let’s look at additional details related to defining business processes.

Identifying Process Goals

Just as it’s helpful to have a project plan or mission statement, it’s important to define the goals of a process before beginning the work of defining it. Examples of typical process goals include:

• To provide an efficient method for tracking customer issues immediately after a sale.
• To increase the quality of technical support provided by the customer service desk.
• To streamline the process of payroll processing.

Effective goals will usually be concise and will focus on the what and why, instead of how. During the development of processes, organizations should regularly refer back to these goals to ensure that all the steps are working towards the requirements.


Developing Processes

When it comes to deciding who should be involved in developing processes, the general rule of thumb is the more, the better. Although it might be tempting for managers to take a top-down approach to defining processes or for a single business manager to document the details, it’s much better to solicit the input of all those who are involved. Many operations and tasks have effects that are felt outside of the immediate realm of a single department. Therefore, it’s important to ensure coordination with other portions of the business.

Specifically, there are several roles that should be represented during the creation of a process. Business leaders from all areas of the organization should be welcome. Additionally, stakeholders whose jobs will be directly affected by the process should drive the process. This might include employees ranging from hands-on staff members to executive management (depending on the scope of the process).

An organized process for implementing ideas and reviewing documentation drafts can go a long way toward keeping the development process humming along. At the risk of sounding like a half-baked management fad, it’s often helpful to have a process for creating processes.

Documenting Business Processes

Once the key components of a business process have been defined, it’s time to commit the details to a document. A best practice is to use a consistent format that includes all the relevant details that might be needed by individuals who are new to the job role. Figure 44 provides some examples.

Figure 44: Components of a well-defined process.

Specific details include the owner of the document—the individual or group that is responsible for defining and maintaining the process. Other details include who is affected by the process, and the roles that might be required. The actual steps of the process can be defined in a variety of ways. Although text might be useful as a basis, flowcharts and other visual aids can help illustrate and summarize the main points very effectively.

Creating “Living” Processes

It’s important to keep in mind that processes are rarely, if ever, perfect. There is almost always room for improvement, and organizations often have to react to changing business or technical requirements. Instead of looking at processes as fixed, rigid commandments, organizations should see them as guidelines and best practices. Ideally, the group will be able to meet periodically to review the processes and ensure that they are still meeting their goals.

Furthermore, all employees should be encouraged to make suggestions about changes. This open communication can help add a sense of ownership to the process and can help enforce it. It doesn’t take much imagination to picture workers grumbling about antiquated systems and steps that make their jobs more difficult and less efficient. Rather than being encouraged to work around the system, people should be encouraged to improve the portions that don’t work.

Automating Business Process Workflow

As mentioned earlier, it’s common for processes to include steps that require interactions among different individuals and business units. Therefore, it should come as no surprise that organizations can benefit significantly through the use of automated workflow software solutions. These solutions allow managers to define steps that are required and to ensure that they are properly followed.

Approval processes and workflows often require multiple people to work on the same piece of information. Tasks include reviewing the current state of the information and making comments or modifications. The changes should be visible to everyone involved in the process, and people should be sure to have the latest version of each document. The challenges lie in the ability to coordinate who has access to which pieces of a document, and when. Many popular software packages and suites offer workflow features.
For example, Microsoft’s Office system productivity suite and its SharePoint Portal Server product can help make documents and other information available to teams and organizations online. Many enterprises have also invested in the implementation of enterprise resource planning (ERP), customer relationship management (CRM), or custom-built line-of-business applications. And, from an IT standpoint, data center automation tools can be used to ensure that processes related to change and configuration management, security management, deployment, and many other tasks are handled according to the organization’s best practices. Regardless of the approach taken, the creation and enforcement of business processes can significantly improve the maturity and efficiency of organizations of any size.


Business Process Example: Service Desk Processes
Having already explored the benefits of business processes and the characteristics that can make them successful, let’s look at a specific example of a business process: the implementation of a service desk workflow. The goal is to help illustrate how organizations can create and document a common business practice to help streamline operations.

Characteristics of an Effective Process

Before diving into specific details of a service desk process, let’s enumerate a few ideas to keep in mind. First and foremost, the process should be defined well enough so that all reasonable procedures are covered. Examples might include what to do in the case of an emergency, or how after-hours support calls should be handled. Second, it’s important for IT departments to communicate their processes to their users. If the turnaround time to resolve low-priority issues is 2 business days, users should be made aware of this ahead of time.

Third, it is very important that at any given point in the process, at least one individual has ownership of an issue. This individual should have the authority to make decisions whenever decisions are required. A common cause of poor customer service is when a call or issue should be transferred but instead ends up in a “black hole” somewhere. (It’s tempting to think that there’s a place in the Universe where these calls go to commiserate.)

Some fundamental rules related to documentation should also apply. Consistent use of particular terminology (along with definitions, wherever appropriate) can be greatly helpful. In the area of service desk support, clear definitions of “Level 2 Emergency” or “minor issue” can help everyone better understand their roles. Even terms such as “regular business hours” could use at least a reference to the company’s standard work schedule.

Finally, wherever possible, service desk staff should be empowered to act as advocates for their callers. Although their ultimate loyalty should be to the support organization, they should also represent the needs of those they support to the best of their abilities. Keeping these things in mind, let’s move on to some examples.

Developing a Service Desk Operation Flow

Let’s start by taking a look at a typical service desk process. For the sake of this example, let’s focus on a scenario in which an IT call center is designed to support end users from within the organization. Let’s assume that the organization supports approximately 3000 employees spread through numerous sites, and the service desk includes 35 staff members, including management.
Most of the information in this section is adaptable to organizations of just about any size.


Documenting Workflow Steps

The approach we’ll take to developing a service desk process is to start with the very basics. You might imagine these first steps as something that might be scribbled in a notebook somewhere. The service desk process can initially be defined by the following high-level steps:

• A Service Desk Representative (SDR) receives a call and determines the nature of the problem.
• If the problem can be resolved by the SDR, assistance should be provided and the call should be completed.
• If the problem requires the caller to be transferred, the SDR should document details and transfer the call to the appropriate specialist.
• If the issue is an emergency, it should be escalated to a supervisor via email (during regular business hours) or via a phone call (outside of regular business hours).
• All other issues should be escalated to a Senior Support Representative (SSR).

Although text-based descriptions can be helpful, this example leaves much to be desired. It’s difficult to read, and it’s not clear whether these steps should be performed in sequence or whether some decisions are exclusive of each other. Clearly, there is room for improvement. Let’s continue on the path to an effective service desk business process by looking at more examples of what might be included.

Tracking and Categorizing Issues

One important aspect of providing service desk support is the requirement of always tracking all issues. Apart from ensuring that no request is ignored, this information can be vital in identifying, comparing, and reporting on common problems. Service desk staff should be made aware of common categories of problems. Table 10 provides basic examples.
Category | Description | Examples
-------- | ----------- | --------
Minor - Desktop | Minor computer issue that is not preventing use of the system | Intermittent application problems; non-critical or “annoying” issues
Minor - Change Request | Change to an existing system that is not preventing an employee from working | Addition of a new computer; new hardware request; physical relocation of a computer
Medium - Single System | A single computer is unavailable for use by an employee | Hard disk or other hardware failure; operating system (OS) issue
High - Multiple Systems | Multiple systems are unavailable for use | Department-level server failure; network failure

Table 10: Examples of service desk issue categories.

In addition, this table could include details about any service level agreements (SLAs) that the IT department has created as well as target issue resolution times. Of course, manual judgment will always be required on the part of service desk staff. Still, the goal should be to capture and route important information as accurately as possible.
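
As a rough illustration, the categories in Table 10 and the high-level workflow steps listed earlier in this section can be sketched in a few lines of Python. The class names, field names, and the order in which the checks are applied are assumptions made for this sketch (the text itself notes that the ordering of the steps is ambiguous); a real service desk product would define its own data model.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """Issue categories taken from Table 10."""
    MINOR_DESKTOP = "Minor - Desktop"
    MINOR_CHANGE_REQUEST = "Minor - Change Request"
    MEDIUM_SINGLE_SYSTEM = "Medium - Single System"
    HIGH_MULTIPLE_SYSTEMS = "High - Multiple Systems"


@dataclass
class Issue:
    """A tracked issue; the boolean fields are hypothetical triage answers."""
    summary: str
    category: Category
    sdr_can_resolve: bool = False
    needs_specialist: bool = False
    is_emergency: bool = False


def route(issue: Issue, business_hours: bool) -> str:
    """Return the next step for an issue.

    The order of these checks is an assumption; the written steps
    do not say which rule wins when more than one applies.
    """
    if issue.sdr_can_resolve:
        return "resolve and close"
    if issue.needs_specialist:
        return "document details and transfer to specialist"
    if issue.is_emergency:
        channel = "email" if business_hours else "phone"
        return f"escalate to supervisor via {channel}"
    return "escalate to Senior Support Representative"


outage = Issue("Department server down", Category.HIGH_MULTIPLE_SYSTEMS,
               is_emergency=True)
print(route(outage, business_hours=False))
```

Writing the rules down this way forces the ambiguities in the text-based description into the open, which is the same benefit the flowchart approach described later provides.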

Escalation Processes and Workflow

Even in small service desk environments, it’s likely that the organization has specialists to handle certain types of issues. In some cases, there might be multiple levels of support staff; in other cases, application experts might be located outside the IT organization. Once the nature and severity of an issue have been determined, service desk representatives should know how they should route and handle these issues. Perhaps the most important aspect is to ensure that the issue always has an owner.

Creating a Service Desk Flowchart

Once you have settled on the features to include in your high-level service desk process, it’s time to determine how best to communicate the information. A flowchart is often the best way for people to visualize the steps that might be required to resolve an issue and how the steps are related. Figure 45 provides an example.

Figure 45: An example of a Help desk issues resolution process flowchart.


Notice that in this document, there are many decision points and branching logic that will affect the path to issue resolution. The major areas of ownership start at the left and begin with the reporting of an issue (which can be from any area of the organization). The Level-1 staff is responsible for categorizing the issues and determining the next steps. The issue may be resolved at this level or it may be moved on to other members of the staff. At all points, the issue is owned by an individual or a group. In this particular flowchart, it is ultimately the responsibility of the Level-1 staff to ensure that an issue is closed.

Although this flowchart may not be perfect, it is easy to read and provides a simple overview of many portions of the process. Most IT organizations will also want to accompany the flowchart with additional details such as definitions of terms and steps involved in procedures.

Automating Service Desk Management

Service desk workflow is an excellent example of the type of business process that can be greatly improved through the use of automation. It’s important to note that there are many approaches to the task of defining service desk workflows. For example, the IT Infrastructure Library (ITIL) defines a Service Desk, and provides best practices for how IT organizations can best implement policies and processes related to issue resolution.
For more information about ITIL, see the ITIL Web site at http://www.itil.co.uk.

Numerous third-party products and software solutions are also available. Some products are very customizable, while others introduce their own suggested workflows, terminology, and best practices. When evaluating potential service desk solutions, IT organizations should start by looking at their overall needs. For example, some solutions might better lend themselves to the support of customers that are external to an organization (by allowing for fee-based support and related features); others might be more appropriate for internal IT service desks. In some cases, an enterprise might decide to build its own service desk solution. Although doing so can lead to a system that is well-aligned with business goals, the time, cost, and maintenance effort required might not lead to a strong enough business case for this approach. Regardless of the approach and the technology selected, the implementation of an organized service desk process is an excellent example of how IT organizations can benefit from the implementation of business processes.


Executive Action Committee
A challenge that is common to most IT departments is the goal of meeting organizational requirements while staying within established budgets. In addition to the ever-increasing reliance most organizations put on their IT staff, new initiatives often take up important time and resources. When reacting to demands, it can become difficult for IT management to stay on top of the needs of the entire organization. Instead of working in isolation from the rest of the business, a recommended best practice is to establish an Executive Action Committee.

Goals of the Executive Action Committee

An Executive Action Committee can help determine the course of the business and can help define the role of the IT organization within it. The purpose of the committee is to evaluate current and future IT initiatives and to make recommendations about which projects should be undertaken. The process might start by evaluating active proposals and requests as collected by the IT department. For example, the Sales and Marketing departments might have requested an upgrade of their current customer relationship management (CRM) application, while the Engineering department is looking for a managed virtualization solution to facilitate testing of a new product.

Evaluating Potential Projects

Given time and budget constraints, it’s likely that some projects will either have to be cut from the list or be postponed until resources are available. That raises the question of how to decide which projects are most valuable to the organization. Standard business-related measurements can be helpful. Quantitative estimates such as return on investment (ROI) and total cost of ownership (TCO) are key indicators of the feasibility of a particular project. The quicker the ROI and the lower the TCO, the better. Other factors that might be taken into account include risks (factors that might lead to cost overruns or unsuccessful project completion) as well as available resources (see Figure 46).
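
To make the ROI and TCO comparison concrete, here is a minimal Python sketch. The formulas are common simplified definitions (TCO as up-front cost plus running costs over a period, ROI as net gain divided by total cost, and payback time as up-front cost divided by net annual benefit), and they are assumptions of this sketch rather than definitions given in this guide. The project names echo the examples above, but every figure is invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    upfront_cost: float         # one-time purchase and deployment cost
    annual_running_cost: float  # licences, support, staff time per year
    annual_benefit: float       # estimated savings or revenue per year

    def tco(self, years: int) -> float:
        """Total cost of ownership over the given number of years."""
        return self.upfront_cost + self.annual_running_cost * years

    def roi(self, years: int) -> float:
        """Net gain divided by total cost over the same period."""
        cost = self.tco(years)
        return (self.annual_benefit * years - cost) / cost

    def payback_years(self) -> float:
        """Years until the up-front cost is recovered (inf if never)."""
        net = self.annual_benefit - self.annual_running_cost
        if net <= 0:
            return float("inf")
        return self.upfront_cost / net


# Hypothetical figures, not taken from any real project.
projects = [
    Project("CRM upgrade", 100_000, 20_000, 70_000),
    Project("Virtualization", 60_000, 10_000, 30_000),
]

# "The quicker the ROI ... the better": rank by payback time.
ranked = sorted(projects, key=lambda p: p.payback_years())
for p in ranked:
    print(p.name, p.payback_years(), p.tco(3), round(p.roi(3), 4))
```

Numbers like these are only a starting point for the committee; the risk and resource factors mentioned above still have to be weighed by people.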


Figure 46: Factors related to prioritizing projects.

An adage related to technical project management specifies that organizations can choose to define two of the following: scope, timeliness, and quality. For example, if the project deadline is most important, followed by quality, then it’s quite possible that the scope (the list of included features and functionality) might need to be reduced (see Figure 47).

Figure 47: Prioritizing the goals of a particular project.


Defining Committee Roles and Members

When defining the membership of the Executive Action Committee, it’s important to ensure that representation from various areas of the organization is included. Ideally, this will include senior management and executives from various business units. Because investments in IT can affect the organization as a whole, input and comments should be solicited before undertaking major projects. This process can go a long way towards having IT organizations seen as strategic business partners and good team players.

Implementing an Executive Action Process

A crucial first step in implementing an Executive Action Process is to gather buy-in from throughout the organization. Often, the potential benefit (better prioritization of IT projects) is enough to gain support for the process. In other cases, IT managers might have to start the process by calling meetings to evaluate specific projects. The roles of committee members may vary based on business needs and particular projects that are underway. For example, if an organization is planning to invest significant resources in a new Web-based service offering, leaders from the Engineering department might be most interested in helping to prioritize projects. Figure 48 provides some steps that might be involved in regular Executive Action Committee meetings.

Figure 48: Parts of the Executive Action Committee process.

Overall, the goal of the Executive Action Committee is to better align IT with the needs of the organization. By ensuring that input is gained from throughout the organization and by prioritizing the projects that can provide the most “bang for the buck,” enterprises can be sure to maximize the value of their IT investments.


Centralized User Authentication
Taken literally, the concept of authentication refers to establishing that something is genuine or valid. In the “real world,” this is often easy enough, unless you have reason to believe that you’re involved in a complex international plot. Basic physical appearance can help you identify individuals with little room for error. Add in an individual’s voice, and it’s pretty easy to distinguish your manager from other coworkers (perhaps by identifying the tell-tale pointy hair from the Dilbert comic strips). The process of authentication in the technical world is significantly more complex.

Major Goals of Authentication

From the standpoint of an IT department, the primary goal of authentication is to positively identify users or computing devices and to ensure that they are who they claim to be. Based on their validated identities, systems can determine which permissions to grant (a process known as authorization). Although the primary goal is easily stated, there is a lot more to it.

Other goals of the authentication process involve minimizing the hassle and intrusiveness of security methods. If you required your users to provide authentication information every time they tried to open a file, for example, it’s likely that the reduction in productivity (not to mention the negative effects on your own life expectancy) would make it a poor trade. With strong but user-friendly and easy-to-maintain authentication mechanisms, organizations can gain the advantages of increased security without the potential downsides. With this goal in mind, let’s look at ways in which IT departments can implement authentication.

Authentication Mechanisms

By far, the most commonly used method of computer-based authentication is through the use of a login and password combination. Although this method is relatively easy to implement, it comes with significant burdens. Users are responsible for generating and remembering their own passwords.
They should choose strong passwords, but they’re often required to enter them multiple times per day. From an IT standpoint, devices such as routers and security accounts for use by applications and services also often have passwords. Creating and maintaining these passwords can be a difficult and time-consuming process. From a security standpoint, it can also be difficult to determine whether a password has been shared, compromised, or used in an unauthorized way. All too often, “secrets” are shared. Considering that organizations often have many thousands of passwords and accounts, this can be a major security-related liability.


Strengthening Password-Based Authentication

An old adage states that a chain is only as strong as its weakest link: should even one component fail, the strength and integrity of the entire chain is compromised. From an IT standpoint, this means that security staff must ensure that authentication credentials are properly maintained. Some general best practices related to managing password-based environments include the following:

• Password length: IT departments should require a minimum number of characters for each password that is used within the environment. Although the specifics vary between IT environments, a minimum password length of six characters is a standard best practice.
• Password complexity: A common method for infiltrating computer systems is that of dictionary-based or “brute force” attacks. These approaches involve either randomly or systematically trying to “guess” a password. If the potential attacker has additional knowledge (such as names of the user’s children, pets, and so on), the chances of success can be dramatically improved. To counter these methods, it’s important to ensure that passwords are sufficiently complex. The general approach is to require at least two of the following types of characters in every password:
  • Lower-case letters
  • Upper-case letters
  • Numbers
  • Special characters
• Password expiration: The longer a user account and password combination is active, the more likely it is that the account is being used by an unauthorized individual. Because there is little to prevent users from accidentally or purposely sharing passwords and it’s difficult to detect whether a login is being used by an unauthorized individual, it’s important to require passwords to be regularly modified. A typical practice might require users to change their passwords every 3 months. The authentication system can also keep a list of recently used passwords and prevent their reuse. Finally, some systems might be able to look for similarities in passwords and disallow a change from a password such as “P@ssw0rd01” to “P@ssw0rd02.”


• Account lockout policies: Unauthorized access attempts are generally characterized by having many unsuccessful logon attempts. Password-based security solutions should automatically lock an account so that it cannot be used if a certain number of incorrect logon attempts are made. Additionally, the information could be logged so that IT staff can examine the situation. To avoid administrative overhead, an automatic unlock process is often used. For example, after five unsuccessful logon attempts, the user must wait 10 minutes before again attempting to access the system. These methods can dramatically decrease the viability of brute-force attacks.
• User education: A critical but often-overlooked area related to authentication is that of end-user education. Staff members often see security as a hindrance to getting their jobs done, and they can sometimes work to circumvent certain measures. This attitude can lead to significant problems that can eventually increase the vulnerability of an entire organization’s computing resources. By informing users of the value and power of their network accounts, IT departments can gain allies in the process of securing systems.
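
The length, complexity, reuse, and lockout rules in the list above are simple enough to express directly in code. The following Python sketch is only an illustration: the constants come from the examples in the text, while the function and class names are hypothetical. A production system would compare password hashes rather than keeping earlier passwords in plain text, and would normally rely on the directory service’s built-in policy engine instead of custom code.

```python
import string
import time

MIN_LENGTH = 6          # minimum length used as the example in the text
MIN_CLASSES = 2         # "at least two" of the four character types
MAX_FAILURES = 5        # failed logons before lockout (example in the text)
LOCKOUT_SECONDS = 600   # 10-minute wait (example in the text)


def character_classes(password: str) -> int:
    """Count how many of the four character types the password uses."""
    checks = [
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in string.punctuation for c in password),
    ]
    return sum(checks)


def is_acceptable(password: str, recent_passwords=()) -> bool:
    """Apply the length, complexity, and no-reuse rules.

    recent_passwords is plain text here only to keep the sketch short.
    """
    return (len(password) >= MIN_LENGTH
            and character_classes(password) >= MIN_CLASSES
            and password not in recent_passwords)


class LockoutTracker:
    """Lock an account after repeated failures, then unlock automatically."""

    def __init__(self):
        self._failures = {}      # user -> consecutive failed attempts
        self._locked_until = {}  # user -> time at which the lock expires

    def is_locked(self, user: str, now=None) -> bool:
        now = time.time() if now is None else now
        return now < self._locked_until.get(user, 0.0)

    def record_failure(self, user: str, now=None) -> None:
        now = time.time() if now is None else now
        count = self._failures.get(user, 0) + 1
        if count >= MAX_FAILURES:
            self._locked_until[user] = now + LOCKOUT_SECONDS
            count = 0  # start counting afresh once the lock is set
        self._failures[user] = count

    def record_success(self, user: str) -> None:
        self._failures[user] = 0
```

Passing `now` explicitly makes the lockout logic easy to test without waiting ten minutes; in normal use the argument is omitted and the current time is used.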

It’s also important to note that IT departments can easily go overboard in implementing security measures. Such Draconian tactics as requiring extremely long passwords or forcing very frequent password changes can often work against the goal of security. Users will often choose the path of least resistance, and may feel the need to write down their passwords in multiple places or to use easy-to-guess phrases. As mentioned earlier, all security implementations should also take into account usability and productivity issues. Perhaps most importantly, all of an IT environment’s authentication policies and procedures should be documented and should be made available to members of the organization.

Other Authentication Mechanisms

Although password-based authentication is the most ubiquitous method, other methods are also available. The field of biometrics focuses on the task of identifying individuals based on biological characteristics. Fingerprint-based identification is now available at a reasonable cost, and even consumer-focused devices are available. In order for this method to work in a corporate environment, the fingerprint readers must be readily available wherever authentication takes place. Often, users will have to fall back to “old-fashioned” username and password combinations, at least occasionally. Other biometric methods range from the use of voice-print analysis to retinal scans. The major barriers to the adoption of these methods include cost and compatibility with existing systems.

Still more authentication mechanisms involve the use of a small device, known as a secure token, that can generate regularly changing cryptographic values. This mechanism adds another layer of security by ensuring that a potential user of a system is in possession of the device. Should it be misplaced or stolen, IT departments can find out quickly and cancel old credentials.
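
The text does not say which algorithm such tokens use, and commercial devices vary. One openly specified scheme for generating regularly changing values is the time-based one-time password (TOTP, RFC 6238), which builds on the counter-based HOTP algorithm (RFC 4226). The Python sketch below uses that scheme as an assumption, purely to show how a device and a server that share a secret can independently compute the same short-lived code; the function names are chosen for this example.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password as described in RFC 4226."""
    # HMAC-SHA1 over the counter encoded as an 8-byte big-endian integer.
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte select an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(secret: bytes, now=None, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current time step."""
    if now is None:
        now = time.time()
    return hotp(secret, int(now) // step, digits)


# The server verifies a login by computing the same value from the shared
# secret and comparing it with what the user typed from the device.
shared_secret = b"12345678901234567890"  # test secret from the RFCs, not for real use
print(totp(shared_secret))
```

Because the value changes every `step` seconds, a code that is overheard or written down is useful to an attacker only briefly, which is the extra layer of protection the token provides.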


Centralized Security

So far, we’ve looked at several authentication mechanisms (with a focus on password-based authentication). Let’s explore the process of creating and managing security credentials in a network environment. We’ll focus on the importance of implementing a centralized user authentication system, but first let’s look at an alternative (and the many problems it can cause).

Problems with Decentralized Security

Most new computers, operating systems (OSs), applications, and network devices have mechanisms for maintaining their own security. For example, most switches, routers, and firewalls can be protected through the use of a password. Applications might use their own set of logins and permissions, and even individual computers might have their own security settings. Figure 49 provides an overview of this security approach.

Figure 49: A logical overview of decentralized security.

The most important aspect of decentralized security is that there are many security databases within the organization. Each one is independent of the others and contains its own authentication information. For example, every computer might have a separate account named “SysAdmin.” Although it’s technically possible to manually synchronize the login information (that is, to ensure that the same usernames and passwords are used on each machine), the process is tedious and error-prone. Furthermore, maintaining even a few of these systems can quickly become difficult and time consuming. The end result is often that security is not maintained: Simple passwords are used, login information is changed infrequently, and passwords are often written down or recorded in some other way.


Although simply setting up a decentralized security environment can be painful, the real risks are in the areas of manageability. For example, what will happen if a password is compromised? Even if IT staff can scramble to update the passwords on multiple devices, there is still a large window of vulnerability. The new password also has to be communicated to the users who need it, which is an inherently risky proposition. What if one or more devices are overlooked and continue to run with the exposed authentication information? And this doesn’t even take into account the effort that might be required to ensure that other computers and services that rely upon the login are properly updated.

In case all of this isn’t incentive enough to see the drawbacks of decentralized security, let’s look at one more motivator before moving on: Imagine the difficulty that end users will experience if they must manually log on to each device or application on the network. The resulting frustration and loss of productivity might be tantamount to not having a network at all. By now, it’s probably obvious that decentralized security is not a very effective approach, even for the smallest of IT organizations.

Understanding Centralized Security

In a centralized security model, all security principals (such as users and computers) are stored in a single repository. All the devices in the environment rely upon this security database to provide authentication services. All accounts are created and maintained once (although many different devices might be able to perform the function). Figure 50 provides a visualization of this approach.

Figure 50: A centralized security implementation.

It’s easy to see how this method can alleviate much of the pain of maintaining many separate security databases. IT administrators who are responsible for maintaining security create accounts once, in the central security database. And, if a password or other user setting must be changed, the change can be made centrally.
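The difference between the two models can be illustrated with a minimal Python sketch. Everything here is hypothetical: the class names, the device names, and the idea of modeling an unreachable device with a simple set are assumptions made for illustration, and passwords are kept in plain text only to keep the example short (a real system would never store them that way).

```python
# Illustrative sketch only: all names are hypothetical, and passwords are
# stored in plain text purely to keep the example short.

class Device:
    """A device that keeps its own local account database (decentralized)."""

    def __init__(self, name):
        self.name = name
        self.local_accounts = {}  # username -> password

    def set_password(self, user, password):
        self.local_accounts[user] = password


def rotate_decentralized(devices, user, new_password, reachable):
    """Update each device in turn; return the names of devices that were missed."""
    missed = []
    for device in devices:
        if device.name in reachable:
            device.set_password(user, new_password)
        else:
            missed.append(device.name)  # still running with the old password
    return missed


class CentralDirectory:
    """A single account repository that every device consults (centralized)."""

    def __init__(self):
        self.accounts = {}  # username -> password

    def set_password(self, user, password):
        self.accounts[user] = password

    def check(self, user, password):
        return self.accounts.get(user) == password
```

In the decentralized case, any device that is overlooked or offline during the rotation keeps accepting the compromised password. In the centralized case, a single call to `set_password` takes effect for every device that delegates its checks to the directory.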

Understanding Directory Services Solutions

Although the benefits of centralized security management are compelling by themselves, so far we’ve only scratched the surface. Several vendors offer unified directory services solutions that provide numerous additional advantages. One of the most popular solutions is Microsoft’s Active Directory (AD; see Figure 51).

Figure 51: A logical overview of a Microsoft AD domain.

AD is designed to be an enterprise-wide centralized security structure that is hosted by Windows Server-based domain controllers. Although built-in authentication mechanisms can differ, most enterprise hardware, software, and network solutions can leverage AD for verifying user credentials and evaluating permissions. Microsoft’s directory services solution is based on a variety of standards and technologies, including the Lightweight Directory Access Protocol (LDAP), Kerberos (for ticket-based authentication), and Domain Name System (DNS). Setting up a complete directory services infrastructure involves many components and services, so vendors have gone to great lengths to make these systems easy to configure, deploy, and manage.

In addition to AD, other vendors offer LDAP-compliant directory services solutions. Related authentication standards can also play a role. One example is the Remote Authentication Dial-In User Service (RADIUS) protocol, which was originally intended for verifying credentials for remote users and is typically used alongside a directory rather than as a directory itself. Most of these solutions can work in conjunction with AD or by themselves.
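To make the relationship between LDAP and DNS a little more concrete: LDAP identifies each entry by a distinguished name (DN), and in AD the DC (domain component) parts of a DN correspond to the labels of the domain’s DNS name. The Python sketch below shows that mapping. The sample DN and the function names are hypothetical, and the parser is deliberately simplified: it assumes that values contain no escaped commas and that there are no multi-valued components.

```python
def parse_dn(dn):
    """Split a distinguished name into (attribute, value) pairs.

    Simplification: assumes no escaped commas and no multi-valued components.
    """
    pairs = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        pairs.append((attr.upper(), value))
    return pairs


def dns_domain(dn):
    """Join the DC components of a DN into the matching DNS domain name."""
    return ".".join(value for attr, value in parse_dn(dn) if attr == "DC")


# Hypothetical entry used only for illustration.
sample_dn = "CN=Jane Doe,OU=Engineering,DC=example,DC=com"
print(dns_domain(sample_dn))
```

A production tool would rely on an LDAP library’s own DN parser rather than string splitting, because real DNs can contain escaped characters.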

Features of Directory Services Solutions

In addition to the important feature of providing a single central security repository, centralized authentication solutions include many features that help simplify the management of user authentication. Some of the features include:

• Secure authentication mechanisms: A significant challenge related to working with password-based security is the problem of transferring password information over the network. Even if the data is encrypted, replay-based attacks or man-in-the-middle intrusions can reduce security. Modern directory services solutions use strong authentication and key management systems such as Kerberos. Although the underlying concepts are complex, the main benefit is that actual passwords are never sent over the wire, which makes them far harder to intercept or reverse-engineer. Best of all, when properly implemented, these features work behind the scenes without the intervention of IT staff.

• Cross-machine authentication: Most IT environments support at least a few dozen computers, and many support thousands. It doesn’t take much imagination to see the problems with forcing users to authenticate at each resource. To solve this issue, directory services solutions work in a way that allows computers that are members of a domain to trust each other. Once a user has authenticated with the security domain, the user no longer needs to manually provide credentials to access other network resources.

• Hierarchical management: Most businesses have established departments and an organizational structure to best manage their personnel and resources. Directory services solutions are able to mirror this hierarchy to provide for simplified management. Administrative containers called organizational units (OUs) are created to allow for easily managing thousands or even millions of “objects” such as users, computers, applications, and groups.

• Management tools: Directory services solutions generally provide well-designed graphical tools to manage security settings and accounts. Although IT staff will have no problem using them, some operations can even be handed down to non-IT staff (such as managers or Human Resources staff). By delegating the management of user accounts to trusted individuals, IT departments can ensure that their security database is kept up to date. And, through the use of scripting and programmatic automation, many of the most common tasks can be greatly simplified.

• Application and device support: Third-party applications and hardware devices can take advantage of directory services solutions to authenticate users. This setup relieves developers of the difficult task of creating secure logon mechanisms and reduces the potential liabilities of security issues for the IT department. Furthermore, as there is generally only a single account per user, IT departments can centrally enable, disable, or modify permissions from within a single security database.
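The idea that a user can prove knowledge of a password without sending it over the network can be illustrated with a short Python sketch. This is not Kerberos, which relies on tickets issued by a trusted key distribution center. It is a much simpler challenge-response exchange, shown only to convey the principle. The function names, the salt handling, and the iteration count are assumptions chosen for the example.

```python
import hashlib
import hmac
import secrets


def derive_key(password, salt):
    """Derive a secret key from a password; only this key is ever stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def server_challenge():
    """Server side: issue a fresh random challenge for each logon attempt."""
    return secrets.token_bytes(16)


def client_response(key, challenge):
    """Client side: prove knowledge of the key without transmitting it."""
    return hmac.new(key, challenge, hashlib.sha256).digest()


def server_verify(stored_key, challenge, response):
    """Server side: recompute the expected response and compare safely."""
    expected = hmac.new(stored_key, challenge, hashlib.sha256).digest()
    return hmac.compare_digest(expected, response)
```

Only the random challenge and the keyed hash cross the network. Because each logon attempt uses a fresh challenge, a response captured by an eavesdropper is of no use against a later challenge, which is the property that defeats simple replay attacks.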

Though this basic list of features of directory service solutions is a long one, it only scratches the surface of the full potential.

Directory Services Best Practices

Taking advantage of directory services solutions is usually a straightforward process. There are, however, some important aspects to keep in mind. First and foremost, enterprise IT staff should look for management solutions, software, and hardware that work with the directory solution that they have implemented. By leveraging the advantages of the directory, IT organizations can lower costs and improve security. The same applies to custom software development: Internal developers should ensure that line-of-business applications adhere to corporate IT standards and that they work with the directory services solution.

Finally, it’s important for IT departments to develop, document, and enforce policies related to their security implementations. Processes for creating new user accounts, handling employees who are leaving, and performing periodic security checks are vital to ensuring the overall health and benefit of the directory service.

Overall, directory services solutions can dramatically improve security and reduce the administration involved in a difficult technical and organizational challenge: managing user authentication. This should make them a vital part of the core infrastructure of IT departments of any size.
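A periodic security check of the kind described above might, for example, flag enabled accounts that have not been used recently so that they can be reviewed and disabled. The Python sketch below is one possible shape for such a check. The record layout (`enabled`, `last_logon`), the account names, and the 90-day threshold are all assumptions, and a real implementation would read this data from the directory itself rather than from a dictionary.

```python
from datetime import date, timedelta


def find_stale_accounts(accounts, today, max_idle_days=90):
    """Return the names of enabled accounts not used within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, info in accounts.items()
        if info["enabled"] and info["last_logon"] < cutoff
    )


# Hypothetical account records used only for illustration.
accounts = {
    "adavis": {"enabled": True, "last_logon": date(2024, 1, 5)},
    "bchen": {"enabled": True, "last_logon": date(2024, 5, 20)},
    "cpatel": {"enabled": False, "last_logon": date(2023, 11, 1)},
}
stale = find_stale_accounts(accounts, today=date(2024, 6, 1))
```

With these sample records, only `adavis` is reported: `bchen` logged on recently, and `cpatel` is already disabled.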
