
Table of Contents

1. Introduction
   1.1. Purpose
   1.2. Context
   1.3. Audience
   1.4. Development Process
   1.5. Principles and Properties
2. Framework/Reference Model
3. Best Practice Components
   3.1. Standards
   3.2. Hardware Platforms
   3.3. Software
   3.4. Delivery Systems
   3.5. Disaster Recovery
   3.6. Total Enterprise Virtualization
   3.7. Management Disciplines

1. Introduction
1.1. Purpose

As society and institutions of higher education increasingly benefit from technology and collaboration, identifying mutually beneficial best practices and architecture makes this document vital to the behind-the-scenes infrastructure of the university. Key drivers behind the gathering and assimilation of this collection are:

• Many campuses want to know what the others are doing so they can draw from a knowledge base of successful initiatives and lessons learned. Having a head start in thinking through operational practices and effective architectures, as well as narrowing vendor selection for hardware, software and services, creates efficiencies in time and cost.
• Campuses are impacted financially, and data center capital and operating expenses need to be curbed. For many, current growth trends are unsustainable: square footage is too limited to address the demand for more servers and storage without implementing new technologies to virtualize and consolidate.
• Efficiencies in power and cooling need to be achieved in order to address green initiatives and reduction in carbon footprint. They are also expected to translate into real cost savings in an energy-conscious economy. Environmentally sound practices are increasingly the mandate and could result in measurable controls on higher energy consumers.
• Creating uniformity across the federation of campuses allows for consolidation of certain systems, reciprocal agreements between campuses to serve as tertiary backup locations, and opt-in subscription to services hosted at campuses with capacity to support other campuses, such as the C-cubed initiative.

1.2. Context

This document is a collection of Best Practices and Architecture for California State University Data Centers. It identifies practices and architecture associated with the provision and operation of mission-critical, production-quality servers in a multi-campus university environment. The scope focuses on the physical hardware of servers, their operating systems, essential related applications (such as virtualization, backup systems and log monitoring tools), the physical environment required to maintain these systems, and the operational practices required to meet the needs of the faculty, students, and staff. Data centers that adopt these practices and architecture should be able to house any end-user service, from Learning Management Systems, to calendaring tools, to file sharing.

This work represents the collective experience and knowledge of data center experts from the 23 campuses and the Chancellor's Office of the California State University system. It is coordinated by the Systems Technology Alliance, whose charge is to advise the Information Technology Advisory Committee (made up of campus Chief Information Officers and key Chancellor's Office personnel) on matters relating to servers (i.e., computers which provide a service for other computers connected via a network) and server applications.

This is a dynamic, living document that can be used to guide planning to enable collaborative systems, funding, procurement, and interoperability among the campuses and with vendors. This document does not prescribe services used by end-users, such as Learning Management Systems or Document Management Systems. As those services and applications are identified by end-users such as faculty and administrators, this document will describe the data center best practices and architecture needed to support them.

Campuses are not required to adopt the practices and architecture elucidated in this document; there may be extenuating circumstances that require alternative architectures and practices. However, it is hoped that these alternatives will be documented in this process. The goal is not to describe a single solution, but rather the range of best solutions that meet the diverse needs of diverse campuses.

1.3. Audience

This information is intended to be reviewed by key stakeholders who have material knowledge of data center facilities and service offerings from business, technical, operational, and financial perspectives.

1.4. Development Process

The process for creating and updating these Best Practices and Architecture (P&A) is to identify the most relevant P&A, inventory existing CSU P&A for key aspects of data center operations, identify current industry trends, and document those P&A which best meet the needs of the CSU. This will include information about related training and costs, so that campuses can adopt these P&A with a full understanding of the costs and required expertise. The work of creating this document will be conducted by members of the Systems Technology Alliance appointed by the campus Chief Information Officers, by members of the Chancellor's Office Technology Infrastructure Services group, and by contracted vendors.

1.5. Principles and Properties

In deciding which Practices and Architecture should be adopted, it is important to have a set of criteria that reflect the unique needs, values, and goals of the organization. These Principles and Properties include:

• Cost-effectiveness
• Long-term viability
• Flexibility to support a range of services
• Security of the systems and data
• Reliable and dependable uptime
• Environmental compatibility
• Redundancy
• High availability
• Performance
• Training
• Communication

Additionally, the architecture should emphasize criteria that are standards-based: the CSU will implement standards-based solutions in preference to proprietary solutions where this does not compromise the functional implementation. Systems and solutions described herein should relate to corresponding ITIL and service management principles; the CSU seeks to adhere to standard ITIL practices and workflows where practical.

2. Framework/Reference Model

The framework is used to describe the components and management processes that lead to a holistic data center design. Data centers are as much about the services offered as they are the equipment and space contained in them. Taken together, these elements should constitute a reference model for a specific CSU campus implementation. This same outline is developed in Section 3 (Components), where each topic is given a more thorough "deep dive" treatment.

2.1. Standards

2.1.1. ITIL

The Information Technology Infrastructure Library is a set of concepts around managing services and operations, including best practices. The model was developed by the UK Office of Government Commerce and has been refined and adopted internationally. The ITIL version 2 framework for Service Support breaks out several management disciplines that are incorporated in this CSU reference architecture (see Section 2.7). ITIL version 3 has reworked the framework into a collection of five volumes that describe:

• Service Strategy
• Service Design
• Service Transition
• Service Operation
• Continual Service Improvement

2.1.2. ASHRAE

The American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) releases updated standards and guidelines for industry consideration in building design. They include recommended and allowable environment envelopes, such as temperature, relative humidity, and altitude, for spaces housing datacomm equipment. The purpose of the recommended envelope is to give guidance to data center operators on maintaining high reliability while operating their data centers in the most energy-efficient manner.

2.1.3. Uptime Institute

The Uptime Institute addresses architectural, mechanical, electrical, and telecommunications design considerations. See Section 2.4.1.1 for specific information on tiering standards as applied to data centers.

2.1.4. ISO/IEC 20000

An effective resource to draw upon is the pair of ISO IT service management standards, ISO 20000-1 and ISO 20000-2. ISO 20000-1 promotes the adoption of an integrated process approach to effectively deliver managed services to meet business and customer requirements. It comprises ten sections: Scope; Terms & Definitions; Requirements for a Management System; Planning & Implementing Service Management; Planning & Implementing New or Changed Services; Service Delivery Processes; Relationship Processes; Resolution Processes; Control Processes; and Release Processes. ISO 20000-2 is a 'code of practice' that describes the best practices for service management within the scope of ISO 20000-1. It comprises nine sections: Scope; Terms & Definitions; The Management System; Planning and Implementing Service Management; Service Delivery Processes; Relationship Processes; Resolution Processes; Control Processes; and Release Management Processes. Together, this set of ISO standards is the first global standard for IT service management, and is fully compatible with and supportive of the ITIL framework.

2.2. Hardware Platforms

2.2.1. Servers

Server types:

• Rack-mounted servers provide the foundation for any data center's compute infrastructure. The most common are 1U and 2U; these form factors compose what is known as the volume market. The high-end market, geared towards high-performance computing (HPC) or applications that need more input/output (I/O) and/or storage, is composed of 4U to 6U rack-mounted servers. The primary distinction between volume-market and high-end servers is the I/O and storage capabilities.
• Blade servers are defined by the removal of many components (PSUs, network interface cards (NICs) and storage adapters) from the server itself. These components are grouped together as part of the blade chassis, the piece of equipment that all of the blade servers plug into, and are shared by all the blades. The blade servers themselves contain processors, memory and a hard drive or two. One of the primary caveats to selecting the blade server option is the potential for future blade/chassis incompatibility; most IHVs do not guarantee blade/chassis compatibility beyond two generations or five years. Another potential caveat is the high initial investment in blade technology because of additional costs associated with the chassis.
• Towers: There are two primary reasons for using tower servers: price and remote locations. Towers offer the least expensive entrance into the server platform market, and they have the ability to be placed outside the confines of a data center. This feature can be useful for locating an additional Domain Name System (DNS) server or backup server in a remote office for redundancy purposes.

Principles:

1. Application requirements: Applications such as databases and backup servers, and other high-I/O workloads, are better suited to HPC rack-mounted servers. Applications with high I/O requirements perform better with 1U or 2U rack-mounted servers than with blade servers, because stand-alone servers have a dedicated I/O interface rather than the common one found on the chassis of a blade server. Applications such as web servers and MTAs work well in a volume-market rack-mounted environment or even in a virtual server environment, especially if they are low-demand applications; these environments allow servers to be easily added and removed to meet spikes in capacity demand. The need to have servers that are physically located at different sites for redundancy or ease of administration can be met by tower servers.

2. Storage requirements: Storage requirements can vary from a few gigabytes, to accommodate the operating system, application and state data for application servers, to terabytes to support large database servers. Applications requiring large amounts of storage should be SAN-attached using Fibre Channel or iSCSI. Fibre Channel offers greater reliability and performance but demands a higher skill level from SAN administrators; support for faster speeds and improved reliability is making iSCSI more attractive. Direct Attached Storage (DAS) is still prevalent because it is less costly and easier to manage than SAN storage, and rack-mounted 4U to 6U servers have the space to house a large number of disk drives and make suitable DAS servers.

3. Consolidation: Consolidation projects can result in several applications being combined onto a single server, or in virtualization. The benefits of consolidation include reduced power and space requirements and fewer servers to manage. Care must be taken when combining applications to ensure they are compatible with each other and that vendor support can be maintained. Virtualization accomplishes consolidation by allowing each application to think it is running on its own server.

4. Software support: Software support can determine the platform an application lives on. Multiple instances of an application are not supported by some software, requiring the application to run on a large single server rather than multiple smaller servers. Some vendors refuse to support virtual servers, making VMs unsuitable if support is a key requirement.

5. Energy efficiency: Energy efficiency starts with proper cooling design, server utilization management and power management. Replacing old servers with newer, energy-efficient ones reduces energy use and cooling requirements and may be eligible for rebates which allow them to pay for themselves.

6. Improved management: Many data centers contain best-of-breed technology; they contain server platforms and other devices from many different vendors. Servers may be from vendor A, storage from vendor B and network from vendor C. This complicates troubleshooting and leads to finger pointing. Reducing the number of vendors promotes standardization and is more likely to allow a single management interface for all platforms.

7. Business growth/new services: As student enrollment grows and the number of services to support them increases, the data center's capacity to run its applications and store its data must increase. This is the most common reason for buying new server platforms. Care must be taken to ensure that hardware is operating within the limits of its capacity, and effective capacity planning becomes especially important; IT administrators must use a variety of gauges to anticipate this need and respond in time.

2.2.2. Server Virtualization

Principles:

1. Reuse: Server virtualization should allow better utilization of hardware and resources by provisioning multiple services and operating environments on the same hardware.

2. Agility: Server virtualization should allow us to improve organizational efficiency by provisioning servers and services faster, allowing for rapid deployment of instances using cloning and templates.

3. Reliability and availability: An implementation of server virtualization should provide increased reliability of servers and services by providing for server failover in the event of a hardware loss of service, as well as high availability by ensuring that access to shared services like network and disk is fault-tolerant and load-balanced.

4. Administration: Server virtualization will improve administration by providing a single, secure, easy-to-access interface to all virtual servers.

5. Consumability: Server virtualization should allow us to provide quickly available server instances, using technologies such as cloning and templating when appropriate.
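As one illustration of the cloning and templating principles above, a provisioning script can wrap a hypervisor's own tooling. The sketch below is a minimal example using the open-source virt-clone utility from the libvirt ecosystem; the template name, instance name, and use of a subprocess wrapper are illustrative assumptions, not a CSU-prescribed toolchain.

```python
import subprocess

def clone_vm(template: str, new_name: str) -> None:
    """Clone a powered-off template VM into a new instance.

    Assumes the libvirt `virt-clone` utility is installed and that the
    template VM exists; the names used here are hypothetical examples.
    """
    subprocess.run(
        [
            "virt-clone",
            "--original", template,   # existing template VM (must be shut off)
            "--name", new_name,       # name for the cloned instance
            "--auto-clone",           # let virt-clone pick new disk paths/MACs
        ],
        check=True,  # raise if the clone fails rather than continuing silently
    )

if __name__ == "__main__":
    clone_vm("rhel-template", "web-03")  # hypothetical template and target names
```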

2.2.3. Storage

2.2.3.1. SAN (Storage Area Network)

2.2.3.1.1. Fibre Channel

2.2.3.1.2. iSCSI

Benefits:

1. Reduced costs: By leveraging existing network components (network interface cards [NICs], switches, etc.) as a storage fabric, iSCSI increases the return on investment (ROI) made for data center network communications and potentially saves the capital investments required to create a separate storage network. iSCSI host bus adapters (HBAs) are 30-40% less expensive than Fibre Channel HBAs, and 1 Gigabit Ethernet (GbE) switches are 50% less than comparable Fibre Channel switches. Being a network protocol, iSCSI leverages existing network administration knowledge bases: organizations already employ qualified network administrators or trained personnel to manage network operations, obviating the need for additional staff and educational training to manage a different storage network.

2. Boot from SAN: As operating system (OS) images migrate to network storage, boot from SAN (BfS) becomes a reality, allowing chameleon-like servers to change application personalities based on business needs. iSCSI removes the tie to the Fibre Channel HBAs previously required for SAN connectivity (BfS would still require a hardware initiator).

3. Improved options for disaster recovery: One of iSCSI's greatest strengths is its ability to travel long distances using IP wide area networks (WANs). iSCSI can replicate data from a local target array to a remote iSCSI target array, eliminating the need for costly Fibre Channel SAN infrastructure at the remote site. Offsite data replication plays a key part in disaster recovery plans by preserving company data at a co-location that is protected by distance from a disaster affecting the original data center. Using iSCSI in conjunction with Serial Advanced Technology Attachment (SATA) disk farms, iSCSI-based tiered storage solutions such as backup-to-disk (B2D) and near-line storage have become popular disaster recovery options; B2D applications inexpensively back up, restore, and search data at rapid speeds.

Components:

1. Initiators

• Hardware initiators: iSCSI HBAs simplify boot-from-SAN (BfS). By discovering a bootable target LUN during system power-on self-test (POST), an iSCSI HBA can enable an OS to boot from an iSCSI target like any DAS or Fibre Channel SAN-connected system. Because an iSCSI HBA is a combination NIC and initiator, it does not require assistance to boot from the SAN. An iSCSI HBA also offloads both TCP and iSCSI protocol processing, saving host CPU cycles and memory; unlike its software initiator counterparts, an iSCSI HBA may be the only choice where CPU processing power is consequential.

• Software initiators: While software initiators offer cost-effective SAN connectivity, there are some issues to consider. The first is host resource consumption versus performance. An iSCSI initiator runs within the input/output (I/O) stack of the operating system, utilizing the host memory space and CPU for iSCSI protocol processing. By leveraging the host, an iSCSI initiator can outperform almost any hardware-based initiator, but resource consumption could be problematic in certain scenarios: as more iSCSI packets are sent or received by the initiator, more memory and CPU bandwidth is consumed, leaving less for applications. The amount of resource consumption is highly dependent on the host CPU, NIC, and initiator implementation. In certain scenarios, like server virtualization, software iSCSI initiators can consume additional resource bandwidth that could otherwise be partitioned for supplemental virtual machines.

2. Targets

• Hardware targets: Many of the iSCSI disk array platforms are built using the same storage platform as their Fibre Channel cousins. Other than the controller interface, the remaining product features are almost identical. Thus, many iSCSI storage arrays are similar, if not identical, to Fibre Channel arrays in terms of reliability, scalability, performance, and management.

• Software targets: Any standard server can be used as a software target storage array, but it should be deployed as a stand-alone application. A software target can monopolize platform resources, utilizing host memory and CPU and leaving little room for additional applications.

3. Gateways and routers

iSCSI-to-Fibre Channel gateways and routers play a vital role in two ways. First, these devices increase the return on invested capital made in Fibre Channel SANs by extending connectivity to Ethernet islands, where devices that were previously unable to reach the Fibre Channel SAN can tunnel through using a router or gateway. Secondly, iSCSI routers and gateways enable Fibre Channel-to-iSCSI migration. Replacing a large investment in Fibre Channel SANs at one time is not a cost reality; SAN migration is a gradual process, and as IT administrators carefully migrate from one interconnect to another, iSCSI gateways and routers afford them the luxury of time and money.

Any x86 server can act as an iSCSI-to-Fibre Channel gateway: using a Fibre Channel HBA and iSCSI target software, any x86 server can present LUNs from a Fibre Channel SAN as an iSCSI target. This configuration can be cost-effective for small environments and connectivity to a single Fibre Channel target or small SAN; however, it is not a turnkey solution, especially for large SANs, and caution should be exercised to prevent performance bottlenecks.

One note of caution: it is important to know the port speeds and amount of traffic passing through a gateway or router. These devices can become potential bottlenecks if too much traffic from one network is aggregated into another. For example, some router products offer eight 1 GbE ports and only two 4 Gb Fibre Channel ports. While total throughput is the same, careful attention must be paid to ensure traffic is evenly distributed across ports.

4. Tape libraries

Tape libraries should be capable of being iSCSI target devices; however, broad adoption and support in this category has not been seen, and it remains a territory served by native Fibre Channel connectivity.

5. Internet Storage Name Service (iSNS)

Voracious storage consumption, combined with lower-cost SAN devices, has stimulated SAN growth beyond what administrators can manage without help. iSCSI exacerbates this problem by proliferating iSCSI initiators and low-cost target devices throughout a boundless IP network. A discovery and configuration service like iSNS is a must for large SAN configurations. Although other discovery services exist for iSCSI SANs, such as Service Location Protocol (SLP), iSNS is emerging as the most widely accepted solution.
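On Linux hosts, the software-initiator workflow described above is typically driven by the open-iscsi `iscsiadm` utility. The wrapper below is a minimal sketch assuming open-iscsi is installed; the portal address and target IQN are hypothetical placeholders.

```python
import subprocess

PORTAL = "10.0.0.50:3260"  # hypothetical iSCSI target portal (IP:port)

def discover_targets(portal: str) -> str:
    """Ask the portal which target IQNs it offers (sendtargets discovery)."""
    result = subprocess.run(
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def login(target_iqn: str, portal: str) -> None:
    """Establish an iSCSI session; the LUN then appears as a block device."""
    subprocess.run(
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal, "--login"],
        check=True,
    )

if __name__ == "__main__":
    print(discover_targets(PORTAL))
    # Example IQN copied from the discovery output in a real deployment:
    login("iqn.2004-04.com.example:storage.lun1", PORTAL)
```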

2.2.3.2. NAS (Network Attached Storage)

2.2.3.3. DAS (Direct Attached Storage)

2.2.3.4. Storage Virtualization

2.2.3.5. Multi-path support

2.3. Software

2.3.1. Operating Systems

An operating system (commonly abbreviated OS or O/S) is an interface between hardware and user. The OS is responsible for the management and coordination of activities and the sharing of the resources of the computer. The operating system acts as a host for computing applications that are run on the machine, and one of its purposes is to handle the details of the operation of the hardware. This relieves application programs from having to manage those details and makes it easier to write applications.

2.3.2. Databases

A database is an integrated collection of logically related records or files which consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images. The structure is achieved by organizing the data according to a database model. The model most commonly used today is the relational model; other models, such as the hierarchical model and the network model, use a more explicit representation of relationships.

2.3.3. Middleware

Middleware is computer software that connects software components or applications. The software consists of a set of services that allows multiple processes running on one or more machines to interact across a network. This technology evolved to provide for interoperability in support of the move to coherent distributed architectures, which are used most often to support and simplify complex, distributed applications. It includes web servers, application servers, and similar tools that support application development and delivery. Middleware is especially integral to modern information technology based on XML, SOAP, Web services, and service-oriented architecture.

2.3.4. Core/Enabling Applications

2.3.4.1. Identity Management

Identity management (or ID management) is a broad administrative area that deals with identifying individuals in a system (such as a country, a network or an organization) and controlling access to the resources in that system by placing restrictions on the established identities.

2.3.4.2. Web Services

A Web service (also Webservice) is defined by the W3C as "a software system designed to support interoperable machine-to-machine interaction over a network." It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the Web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.

2.3.4.3. DNS

Domain Name Services enable the use of canonical names, which are designed primarily for human use (rather than IP addresses), in addressing network resources. DNS services must be highly available. To provide a highly available network, DNS servers should be placed in an Enabling Services Network Infrastructure Model (see section 12.7.5).

2.3.4.4. DHCP

Dynamic Host Configuration Protocol is used to manage the allocation of IP addresses. DHCP services must also be highly available. To provide a highly available network, DHCP servers should be placed in an Enabling Services Network Infrastructure Model (see section 12.7.5).

2.3.4.5. Syslog

Syslog is a standard for forwarding log messages in an IP network. The term "syslog" is often used for both the actual syslog protocol and the application or library sending syslog messages. Syslog is essential to capturing system messages generated from network devices. Devices provide a wide range of messages, including changes to device configurations, device errors, and hardware component failures.
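Centralized capture of these messages is straightforward from the application side. The sketch below is a minimal example using Python's standard-library SysLogHandler; the collector address is a hypothetical placeholder for a campus log host.

```python
import logging
from logging.handlers import SysLogHandler

# Hypothetical central collector; UDP 514 is the traditional syslog port.
handler = SysLogHandler(address=("loghost.example.edu", 514))
handler.setFormatter(logging.Formatter("app-server: %(levelname)s %(message)s"))

log = logging.getLogger("datacenter")
log.setLevel(logging.INFO)
log.addHandler(handler)

# Messages like these would arrive at the collector alongside device logs.
log.info("configuration change: NTP server list updated")
log.error("hardware: power supply 2 reporting failure")
```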
2.3.4.6. Email

Electronic mail, often abbreviated as email or e-mail, is a method of exchanging digital messages designed primarily for human use. E-mail systems are based on a store-and-forward model in which e-mail server systems accept, forward, deliver and store messages on behalf of users, who only need to connect to the e-mail infrastructure, typically an e-mail server, with a network-enabled device (e.g., a personal computer) for the duration of message submission or retrieval.

2.3.4.7. Spam Filtering

E-mail spam, also known as junk e-mail, is a subset of spam that involves nearly identical messages sent to numerous recipients by e-mail. Spam filtering comes with a large set of rules which are applied to determine whether an email is spam or not. Most rules are based on regular expressions that are matched against the body or header fields of the message, but spam-filtering vendors also employ a number of other spam-fighting techniques, including header and text analysis, DNS blocklists, Bayesian filtering, and collaborative filtering databases.

2.3.4.8. Calendaring

iCalendar is a computer file format which allows internet users to send meeting requests and tasks to other internet users. iCalendar data is usually sent with traditional email or shared as files with an .ics extension. Recipients of the iCalendar data file (with supporting software, such as an email client or calendar application) can respond to the sender easily or counter-propose another meeting date/time. iCalendar is used and supported by a large number of products.
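Because the iCalendar format is plain text, a minimal meeting request can be produced without special tooling. The sketch below writes a single-event .ics file; the organizer identifiers, times, and UID are hypothetical examples.

```python
from datetime import datetime, timedelta

def make_event_ics(summary: str, start: datetime, minutes: int) -> str:
    """Render a minimal VCALENDAR/VEVENT body per RFC 5545 conventions."""
    fmt = "%Y%m%dT%H%M%SZ"  # UTC timestamps, e.g. 20240115T170000Z
    end = start + timedelta(minutes=minutes)
    return "\r\n".join([
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//Example Campus//Scheduler//EN",  # hypothetical product ID
        "BEGIN:VEVENT",
        "UID:12345@example.edu",                    # hypothetical unique ID
        f"DTSTAMP:{datetime.utcnow().strftime(fmt)}",
        f"DTSTART:{start.strftime(fmt)}",
        f"DTEND:{end.strftime(fmt)}",
        f"SUMMARY:{summary}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]) + "\r\n"

with open("meeting.ics", "w", newline="") as f:
    f.write(make_event_ics("STA review", datetime(2024, 1, 15, 17, 0), 60))
```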

2.3.5. Desktop Virtualization

Desktop virtualization (or Virtual Desktop Infrastructure) is a server-centric computing model that borrows from the traditional thin-client model but is designed to give system administrators and end-users the best of both worlds: the ability to host and centrally manage desktop virtual machines in the data center while giving end users a full PC desktop experience. The user experience is intended to be identical to that of a standard PC, but delivered from a thin-client device or similar, from the same office or remotely.

2.3.6. Application Virtualization

Application virtualization is an umbrella term that describes software technologies that improve portability, manageability and compatibility of applications by encapsulating them from the underlying operating system on which they are executed. A fully virtualized application is not installed in the traditional sense, although it is still executed as if it were. The application is fooled at runtime into believing that it is directly interfacing with the original operating system and all the resources managed by it, when in reality it is not. Application virtualization differs from operating system virtualization in that, in the latter case, the whole operating system is virtualized rather than only specific applications.

2.3.7. Third-Party Applications

2.3.7.1. CMS

The mission of the Common Management Systems (CMS) is to provide efficient, effective and high-quality service to the students, faculty and staff of the 23-campus California State University system (CSU) and the Office of the Chancellor. Utilizing a best-practices approach, CMS supports human resources, financials and student services administration functions with a common suite of Oracle Enterprise applications in a shared data center, including a supported data warehouse infrastructure.

2.3.7.2. LMS

A learning management system (LMS) is software for delivering, tracking and managing training and education. LMSs range from systems for managing training and educational records to software for distributing courses over the Internet and offering features for online collaboration.

2.3.7.3. Help Desk/Ticketing

Help desks are now fundamental and key aspects of good business service and operation. Through the help desk, problems are reported, managed and then appropriately resolved in a timely manner. Help desks can provide users the ability to ask questions and receive effective answers. Moreover, help desks can help the organization run smoothly and improve the quality of the support it offers to users.

• Traditional: Help desks have traditionally been operated as call centers. Telephone support was the main medium used until the advent of the Internet.
• Internet: The advent of the Internet has provided the opportunity for potential and existing customers to communicate with suppliers directly and to review and buy their services online. Customers can email their problems without being put on hold over the phone. One of the largest advantages Internet help desks have over call centers is that they are available 24/7.

2.4. Delivery Systems

2.4.1. Facilities

2.4.1.1. Tiering Standards

The industry standard for measuring data center availability is the tiering metric developed by The Uptime Institute, which addresses architectural, mechanical, electrical, and telecommunications design considerations. The higher the tier, the higher the availability. Tier descriptions include information like raised floor heights, watts per square foot, and points of failure. "N" indicates the level of redundant components for each tier, with N representing only the necessary system need.

Construction cost per square foot is also provided and varies greatly from tier to tier, with Tier 3 costs double that of Tier 1.

Tier 1 (Basic): 99.671% availability

• Susceptible to disruptions from both planned and unplanned activity
• Single path for power and cooling distribution, no redundant components (N)
• May or may not have a raised floor, UPS, or generator
• Typically takes 3 months to implement
• Annual downtime of 28.8 hours
• Must be shut down completely to perform preventative maintenance

Tier 2 (Redundant Components): 99.741% availability

• Less susceptible to disruption from both planned and unplanned activity
• Single path for power and cooling distribution, includes redundant components (N+1)
• Includes raised floor, UPS, and generator
• Typically takes 3 to 6 months to implement
• Annual downtime of 22.0 hours
• Maintenance of the power path and other parts of the infrastructure requires a processing shutdown

Tier 3 (Concurrently Maintainable): 99.982% availability

• Enables planned activity without disrupting computer hardware operation, but unplanned events will still cause disruption
• Multiple power and cooling distribution paths, but with only one path active; includes redundant components (N+1)
• Includes raised floor and sufficient capacity and distribution to carry load on one path while performing maintenance on the other
• Typically takes 15 to 20 months to implement
• Annual downtime of 1.6 hours

Tier 4 (Fault Tolerant): 99.995% availability

• Planned activity does not disrupt critical load, and the data center can sustain at least one worst-case unplanned event with no critical load impact
• Multiple active power and cooling distribution paths, includes redundant components (2(N+1), i.e., 2 UPS each with N+1 redundancy)
• Typically takes 15 to 20 months to implement
• Annual downtime of 0.4 hours
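The downtime figures above follow directly from the availability percentages. A quick check, sketched below, converts each tier's availability into expected annual downtime hours (8,760 hours per year).

```python
HOURS_PER_YEAR = 365 * 24  # 8760

tiers = {
    "Tier 1": 0.99671,
    "Tier 2": 0.99741,
    "Tier 3": 0.99982,
    "Tier 4": 0.99995,
}

for name, availability in tiers.items():
    downtime = (1 - availability) * HOURS_PER_YEAR
    print(f"{name}: {downtime:.1f} hours/year of expected downtime")
# Prints 28.8, 22.7, 1.6 and 0.4 hours respectively, matching the published
# figures (Tier 2 is commonly rounded to 22.0 in Uptime Institute tables).
```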

Trying to achieve availability above Tier 4 presents a level of complexity that some believe offers diminishing returns. EYP, which manages HP's data center design practice, says their empirical data shows no additional uptime from the considerable cost of trying to reduce downtime below 0.4 hours, due to the human element that gets introduced in managing the complexities of the many redundant systems.

2.4.1.2. Spatial Guidelines and Capacities

1. Locale: A primary consideration in data center design is understanding the importance of location. In addition to the obvious criteria of adjacency to business operations and technical support resources, cost factors such as utilities, networking and real estate are prime. Power is generally the largest cost factor over time, which has prompted organizations to increasingly consider remote data centers in low utility cost areas; addressing remote control operations and network latency then become essential considerations. Exposure to natural disaster is also a key component.

2. Zoned space: Data centers should be block-designed with specific tiering levels in mind, so that sections of the space can be operated at high density with supporting infrastructure while other sections can be supported with minimal infrastructure. Each zone should have capacity for future growth within that tier.

3. Raised floor: A typical design approach for data centers is to use raised floor for air flow management and cable conveyance. Consideration must be given to air flow volume, which dictates the height of the floor, as well as weight loading, ease of distribution for power and network, and air flow management, either through perforated floor tiles or direct ducting. Raised floor structures must also be grounded.

4. Rack rows and density: Equipment racks and cabinets should be arranged in rows that provide for logical grouping of equipment types and provide for air flow management.

2.4.1.3. Electrical Systems

Each of the following is a function of resilience and availability:

1. Generators: Can be natural gas or petroleum/diesel fuel type. Must be configured for load and runtime considerations.

2. UPS: Can be rack-based or large room-based systems. For higher-tier designs, UPS systems are deployed in an N+1 configuration to account for load. Asset management systems should track the lifecycle of batteries for proactive service and replacement.

3. PDUs: Power distribution units provide receptacles from circuits on the data center power system, usually from the UPS. Intelligent PDUs are able to provide management systems with information about power consumption at the rack or even device level. Some PDUs can be remotely managed to allow for power cycling of equipment at the receptacle level, which aids in remote operation of servers where a power cycle is required to reboot a hung system.

4. Dual A-B cording: In-rack PDUs should make multiple circuits available so that redundant power supplies (designated A and B) for devices can be corded to separate circuits. Some A-B cording strategies call for both circuits to be on UPS, while others call for one power supply to be on house power while the other is on UPS.
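As a worked illustration of the N+1 sizing and the power/cooling interplay discussed above, the sketch below estimates UPS module count and the matching heat load for a hypothetical zone; the load figures and module capacity are invented for the example.

```python
import math

# Hypothetical zone: 20 racks averaging 6 kW each.
racks, kw_per_rack = 20, 6.0
total_load_kw = racks * kw_per_rack            # 120 kW of critical load

# N+1 sizing: enough modules to carry the load, plus one redundant module.
module_kw = 40.0                               # capacity of one UPS module
n = math.ceil(total_load_kw / module_kw)       # N = 3 modules for the load
modules_needed = n + 1                         # N+1 = 4 modules installed

# Nearly all electrical load becomes heat the HVAC plant must remove.
# 1 kW = 3,412 BTU/hr; CRAC capacity is often rated in tons (12,000 BTU/hr).
btu_per_hr = total_load_kw * 3412
cooling_tons = btu_per_hr / 12000

print(f"UPS modules (N+1): {modules_needed}")
print(f"Heat load: {btu_per_hr:,.0f} BTU/hr = {cooling_tons:.1f} tons of cooling")
```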

2.4.1.4. HVAC Systems

1. CRAC units: Computer Room Air Conditioners are specifically designed to provide cooling with humidification for data centers. They are typically tied to power systems that can maintain cooling independent of the power distribution to the rest of the building.

2. Economizers: Direct ambient outside air in cooler climates to supplement cooling to the data center.

3. Hot/cold aisle containment: Arranging equipment racks in rows allows for the supply of cold air to the front of racks and exhaust of hot air at the rear. Adjacent rows have opposite airflow, so that only one set of supply or exhaust ducts is needed. The key design principle is to not allow hot air exhaust to mix with cold air supply and diminish its overall effectiveness. Containment is achieved through enclosed cabinet panels, end-of-row wall or panel structures, or plastic sheet curtains. Some very dense rack configurations may require the use of chimney exhaust above the racks to channel hot air away from the cold air supply.

2.4.1.5. Fire Protection & Life Safety

Fire suppression systems are essential for providing life safety protection for occupants of a data center and for protecting the equipment. Design of systems should give priority to human life over equipment, which factors into the decision among certain gas suppression systems.

• Pre-action: Describes a water sprinkler design that allows the water pipes serving sprinkler heads within a data center to be free from water until a triggering mechanism allows water to enter the pipes. This is meant to mitigate damage from incidental leakage or spraying water from ruptured water lines normally under pressure.

• VESDA: Very Early Smoke Detection Apparatus allows pre-action or gas suppression systems to have a human interrupt and intervention at initial thresholds before ultimately triggering on higher thresholds. The system operates by using lasers to evaluate continuous air samples for very low levels of smoke.

• Halon: An oxygen-displacing gas suppression system that is generally no longer used in current data center design due to risk to personnel in the occupied space.

• FM-200: A gas suppression system that quickly rushes FM-200 gas into the confined data center space, which must be kept air-tight for effectiveness. It is a popular replacement for halon gas since it can be implemented without having to replace deployment infrastructure. A purge system is usually required to exhaust and contain the gas after deployment so it does not enter the atmosphere.

• Inergen: A gas suppression system that does not require a purge system or air-tight facility, since it is non-toxic and can enter the atmosphere without environmental concerns. It requires a larger footprint for more tanks and is a more expensive gas to use and replace.

• Novec 1230: A gas suppression system that is stored as a liquid at room temperature and allows for more efficient use of space than inert gas systems. Also a popular halon alternative.

2.4.1.6. Access Control

Part of a good physical security plan includes access controls, which allow you to determine who has access to your data center and when. These can take the form of simple, offline keypad locks up to highly complex systems that include access portals (aka man traps) and anti-tailgating systems. Metal keys can provide a high level of security, but they do not provide an audit trail and do not allow you to limit access based on times and/or days. Most new data centers constructed today include some sort of electronic locking system. Electronic lock systems allow the flexibility to issue and revoke access instantaneously. Intrusion systems (aka alarm systems) can sometimes allow for this kind of control in a facility where it is not possible to migrate to an electronic lock system.

Offline systems consist of locks that have a reader integrated into the lock, a battery, and all of the electronics needed to make access determinations. Updates to these sorts of locks are usually done through some sort of hand-held device that is plugged into the lock. Online systems (sometimes referred to as hardwired systems) consist of an access control panel that connects to a set of doors and readers of various types, using wiring run through the building.

There are two fairly common reader technologies in use today. One is magnetic stripe based; these systems usually read data encoded on tracks two or three. While the technology is mature and stable, it has a few weaknesses: the data on the cards can be easily duplicated with equipment readily purchased on the Internet, and the magnetic stripe can wear out or become erased if it gets close to a magnetic field.

One option for improving the security of magnetic-swipe installations is the use of a dual-validation reader, where, after swiping the card, the user must enter a PIN code before the lock will open. The other common access token in use today is the proximity card, also called an RFID card. These cards have an integrated circuit (IC), capacitor and wire coil inside them. When the coil is placed near a reader, the energy field emitted by the reader produces a charge in the capacitor, which powers the IC. Once powered, the IC transmits its information to the reader, and the reader or the control panel it communicates with determines whether access should be granted.

Beyond access control, the other big advantage of electronic locking systems is their ability to provide an audit trail. The system will keep track of all credentials presented to each reader and the resulting outcome of that presentation: access was either granted or denied. Complex access control systems will even allow you to do things such as implement a two-man rule, where two people must present authorized credentials before a lock will open, or anti-passback. Anti-passback systems require a user to present credentials both to enter and to exit a given space; because the system knows that an individual presented credentials to exit a space, anti-passback also allows you to track where individuals are at any given time. Obviously, locking someone into a room would be a life safety issue, so usually some sort of alarm is sounded on egress if proper credentials were not presented.

2.4.1.7. Commissioning

Commissioning is essential for validating the design. A commissioning agent can identify design flaws, single points of failure, and inconsistencies in the build-out from the original design. Normally a commissioning agent is independent from the design or build team. A commissioning agent will inspect such things as proper wiring, electrical distribution panels and switch gear, pipe sizes, weight loads, chiller and pump capacities, UPS and generator step loads, and air conditioning. They will test battery run times, verify load capacities, and test failover mechanisms. They will simulate load with resistive coils to generate heat and UPS draw, and go through a playbook of what-if scenarios to test all aspects of redundant systems.

2.4.2. Network

Network components in the data center, such as Layer 3 backbone switches, WAN edge routers, perimeter firewalls, and wireless access points, are described in the ITRP2 Network Baseline Standard Architecture and Design document, developed by the Network Technology Alliance (NTA), sister committee to the Systems Technology Alliance. Latest versions of the standard can be located at http://nta.calstate.edu/ITRP2.shtml.

2.4.2.1. Load Balancing/High Availability

2.4.2.2. Connectivity

Considerations beyond common services: The following components have elements of network enabling services but are also systems-oriented and may be managed by the systems or applications groups. Increasingly, boundaries are blurring between systems and networks; virtualization is causing an abstraction of traditional networking components, moving them into software and the hypervisor layer.

1. DNS

For privacy and security reasons, many large enterprises choose to make only a limited subset of their systems visible to external parties on the public Internet. This can be accomplished by creating a separate Domain Name System (DNS) server with entries for these systems and locating it where it can be readily accessible by any external user on the Internet (e.g., in a DMZ LAN behind external firewalls to the public Internet). Other DNS servers, containing records for internally accessible enterprise resources, may be provided as infrastructure servers hidden behind additional firewalls in trusted zones in the data center. This division of responsibility permits the DNS server with records for externally visible enterprise systems to be exposed to the public Internet, while reducing the security exposure of DNS servers containing the records of internal enterprise systems.

2. E-Mail (MTA only)

For security reasons, large enterprises may choose to distribute e-mail functionality across different types of e-mail servers. A message transfer agent (MTA) server that only forwards Simple Mail Transfer Protocol (SMTP) traffic (i.e., no mailboxes are contained within it) can be located where it is readily accessible to other enterprise e-mail servers on the Internet (e.g., in a DMZ LAN behind external firewalls to the public Internet). Other e-mail servers, containing user agent (UA) mailboxes for enterprise users, may be provided as infrastructure servers located behind additional firewalls in trusted zones in the data center. This division of responsibility permits the external MTA server to communicate with any other e-mail server on the public Internet, while reducing the security exposure of internal UA e-mail servers.
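The store-and-forward split described above means internal systems submit mail to the DMZ relay rather than to the Internet directly. The sketch below is a minimal, hypothetical example of such a submission using Python's standard library; the relay host and addresses are placeholders.

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "monitoring@example.edu"          # hypothetical internal sender
msg["To"] = "oncall@example.edu"
msg["Subject"] = "nightly backup report"
msg.set_content("All backup jobs completed successfully.")

# Internal hosts hand mail to the DMZ MTA; only that relay talks to the
# public Internet, keeping mailbox servers in the trusted zone.
with smtplib.SMTP("mta-relay.example.edu", 25) as relay:
    relay.send_message(msg)
```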

3. Voice Media Gateway

The VoIP media gateway converts voice calls between packetized IP voice traffic on a data center site network and local circuit-switched telephone service. The data center site media gateway will include analog or digital voice ports for access to the local PSTN, possibly including integrated services digital network (ISDN) ports. Generally, the VoIP gateway is used for data center site phone users to gain local dial access to the PSTN. With Ethernet IP phones, the VoIP media gateway operates under the control of a call control server, located either at the data center site or out in the ISP public network as part of an IP Centrex or virtual PBX service. However, network operators/carriers increasingly are providing a SIP trunking interface between their IP networks and the PSTN; optimally, this will permit enterprises to send VoIP calls across IP WANs to communicate with PSTN devices without the need for a voice media gateway or direct PSTN interface. With this configuration, data center site voice calls can be routed through the site's WAN edge IP routers and data network access links.

4. Top-of-Rack Fabric Switches

As a method of consolidating and aggregating connections from dense rack configurations in the data center, top-of-rack switching has emerged as a way to provide both Ethernet and Fibre Channel connectivity in one platform. Generally, these devices connect to end-of-row switches that, optimally, can manage all downstream devices as one switching fabric. The benefits are a modularized approach to server and storage networks, reduced cross-connects, and better cable management.

5. Network Virtualization: Ethernet L2 Virtual Switch

In a virtual server environment, the hypervisor manages L2 connections from virtual hosts to the NIC(s) of the physical server. A hypervisor plug-in module may be available to allow the switching characteristics to emulate a specific type of L2 switch, so that it can be managed apart from the hypervisor and incorporated into the enterprise NMS.

2.4.3. Structured Cabling

The CSU has developed a set of standards for infrastructure planning that should serve as a starting place for designing cabling systems and other utilities serving the data center. These Telecommunications Infrastructure Planning (TIP) standards can be referenced at the following link: http://www.calstate.edu/cpdc/ae/gsf/TIP_Guidelines/. There is also an NTA working group specific to cabling infrastructure, known as the Infrastructure Physical Plant Working Group (IPPWG); information about the working group can be found at the following link: http://nta.calstate.edu/NTA_working_groups/IPP/.

The approach to structured cabling in a data center differs from other aspects of building wiring due to the following issues:

• Managing higher densities, particularly fiber optics
• Cable management, especially with regard to moves, adds and changes
• Heat control, for which cable management plays a role

The following are components of structured cabling design in the data center:

1. Cable types: Cabling may be copper (shielded or unshielded) or fiber optic (single-mode or multi-mode).

2. Fiber connector types: Usually MT-RJ, LC, SC or ST.

3. Fiber ducts: Fiber optic cabling has specific stress and bend-radius requirements to protect the transmission of light, and duct systems designed for fiber take into account the proper routing and storage of strands, pigtails and patchcords among the distribution frames and splice cabinets. The use of modular fiber cassettes and trunk cables allows for higher densities and the benefit of factory terminations rather than terminations in the field, which can be time-consuming and subject to higher dB loss.

4. Cabling pathways: Usually a combination of raised-floor access and overhead cable tray. Cables under raised floor should be in channels that protect them from adjacent systems, such as power and fire suppression.

5. Cable management.

2.4.4. Operations

Information Technology (IT) operations refers to the day-to-day management of an IT infrastructure. An IT operation incorporates all the work required to keep a system running smoothly. This process typically includes the introduction and control of small changes to the system, such as mailbox moves and hardware upgrades, but it does not affect the overall system design. Operational support includes systems monitoring, network monitoring, problem determination, problem reporting, problem escalation, change control, version management, capacity planning, backup and recovery, operating system upgrades, performance tuning and system programming.

The mission of data center operations is to provide the highest possible quality of central computing support for the campus community and to maximize the availability of central computing systems. Data center operations services include:

• Help Desk Support
• Network Management
• Data Center Management
• Server Management
• Application Management
• Database Administration
• Web Infrastructure Management
• Systems Integration
• Business Continuity Planning
• Disaster Recovery Planning
• Email Administration

2.4.4.1. Staffing

Staffing is the process of acquiring, deploying, and retaining a workforce of sufficient quantity and quality to maximize the organizational effectiveness of the data center.

2.4.4.2. Training

Training is not simply a support function, but a strategic element in achieving an organization's objectives.

The following management processes and sample practices align IT training with organizational needs.

IT Training Management Processes:

• Align IT training with business goals.
• Identify and assess IT training needs.
• Perform a gap analysis to determine needed training.
• Allocate IT training resources.
• Use an investment process to select and manage training projects.
• Design and deliver IT training.
• Evaluate/demonstrate the value of IT training.
• Assess evaluation results in terms of business impact.

Sample Practices:

• Enlist executive-level champions.
• Involve critical stakeholders.
• Document competencies/skills required for each job description.
• Build courses using reusable components.
• Give trainees choice among different training delivery methods.
• Provide resources for management training, leadership and project management.
• Collect information on how job performance is affected by training.

2.4.4.3. Monitoring

Monitoring is a critical element of data center asset management and covers a wide spectrum of issues, such as system availability, system performance levels, component serviceability, and timely detection of operational or security problems such as disk capacity exceeding defined thresholds or system binary files being modified. Systems such as Nagios or HP OpenView can collect and graph this information, and the actions that follow from monitoring may include prescriptive escalations, automation, or distributed notifications.

2.4.4.4. Automation

Automation of routine data center tasks reduces staffing headcount by using tools such as automated tape backup systems that auto-load magnetic media from tape libraries, sending backup status and exception reports to data center staff. Other examples of automation include provisioning of VMs and storage. The potential for automating routine tasks is limitless. Automation increases reliability and frees staff from routine tasks so that continuous improvement of operations can occur.
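As a concrete illustration of threshold monitoring and the automated notification that follows, the sketch below checks disk capacity and flags modified system binaries using only Python's standard library; the paths, threshold, and known-good hash database are hypothetical.

```python
import hashlib
import shutil

DISK_THRESHOLD = 0.90  # alert when a filesystem is more than 90% full

# Hypothetical known-good SHA-256 digests recorded at install time;
# a real baseline would hold full digests for many system binaries.
BASELINE = {"/usr/sbin/sshd": "9f2b..."}

def check_disk(path: str = "/") -> list[str]:
    usage = shutil.disk_usage(path)
    used_fraction = usage.used / usage.total
    if used_fraction > DISK_THRESHOLD:
        return [f"ALERT: {path} at {used_fraction:.0%} capacity"]
    return []

def check_binaries() -> list[str]:
    alerts = []
    for path, expected in BASELINE.items():
        with open(path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        if digest != expected:
            alerts.append(f"ALERT: {path} has been modified")
    return alerts

if __name__ == "__main__":
    for alert in check_disk() + check_binaries():
        print(alert)  # in practice, escalate via syslog, email, or an NMS trap
```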

2.4.4.5. Console Management

To the extent possible, console management should integrate the management of heterogeneous systems, using orchestration or a common management console. Architecturally, this usually involves dedicated NICs and IP address assignments, which has some implications for system management.

2.4.4.6. Remote Operations

Lights-out operations are facilitated by effective remote operations tools. This leverages the economies of scale enjoyed by managing multiple remote production data centers from a single location, which may be dynamically assigned in a manner such as follow-the-sun.

2.4.4.7. Accounting

2.4.4.8. Auditing

The CSU publishes findings and campus responses to information security audits. Reports can be found at the following site: http://www.calstate.edu/audit/audit_reports/information_security/index.shtml

2.5. Disaster Recovery

2.5.1. Relationship to overall campus strategy for Business Continuity

Campuses should already have a business continuity plan, which typically includes a business impact analysis (BIA) to monetize the effects of interrupted processes and system outages, as well as to establish recovery time and point objectives. Deducing a maximum allowable downtime through this exercise will inform service and operational level agreements.

2.5.2. Relationship to CSU Remote Backup DR initiative

ITAC has sponsored an initiative to explore business continuity and disaster recovery partnerships between CSU campuses. Several campuses have teamed to develop documents and procedures, and their work product is posted at http://drp.sharepointsite.net/itacdrp/default.aspx.

2.5.3. Infrastructure considerations

Site availability: Disaster recovery planning should account for short-, medium-, and long-term disaster and disruption scenarios. Consideration should be given to location, size, capacity, and network connectivity, including impact on and accessibility to the data center. Attention should be given to structural, electrical, mechanical, plumbing and control systems, and planning should also include the workspace, workstations, telephones, and utilities necessary to recover the level of service required by the critical business functions.

Alternate sites could be geographically diverse locations on the same campus, locations on other campuses (perhaps as part of a reciprocal agreement between campuses to recover each other's basic operations), or commercially available co-location facilities (Section 2.5.4). When determining an alternate site, management should consider scalability, in the event a long-term disaster becomes a reality. The plan should include logistical procedures for accessing backup data as well as moving personnel to the recovery location. Examples of operational considerations, memorandums of understanding, and network diagrams are given in Section 3.
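To make the BIA's monetization step from Section 2.5.1 concrete, the sketch below compares expected annual downtime cost at two availability tiers against the added cost of the higher tier; all dollar figures and rates are invented for the example.

```python
HOURS_PER_YEAR = 8760

def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly outage hours times the hourly cost of interruption."""
    return (1 - availability) * HOURS_PER_YEAR * cost_per_hour

# Hypothetical BIA input: an outage of the student portal costs $5,000/hour.
cost_tier2 = annual_downtime_cost(0.99741, 5000)   # ~22.7 h -> ~$113,000
cost_tier3 = annual_downtime_cost(0.99982, 5000)   # ~1.6 h  -> ~$8,000

savings = cost_tier2 - cost_tier3
print(f"Expected annual savings from Tier 3 over Tier 2: ${savings:,.0f}")
# If the annualized cost of the Tier 3 build-out is below this figure,
# the BIA supports the higher tier for this business process.
```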

2.5.3.2. Co-location
One method of accomplishing business continuity objectives through redundancy with geographic diversity is to use a co-location scenario. Alternate sites could be geographically diverse locations on the same campus, locations on other campuses (perhaps as part of a reciprocal agreement between campuses to recover each other's basic operations), or commercially available co-location facilities as described below. When determining an alternate site, management should consider scalability, in the event a long-term disaster becomes a reality. The plan should include logistical procedures for accessing backup data as well as moving personnel to the recovery location.

The following are typical types of co-location arrangements:
• Real estate investment trusts (REITs): REITs offer leased shared data center facilities in a business model that leverages tax laws to offer savings to customers.
• Network-neutral co-location: Network-neutral co-location providers offer leased rack space, power, and cooling with the added service of peer-to-peer network cross-connection.
• Co-location within hosting center: Hosting centers may offer co-location as a basic service with the ability to upgrade to various levels of managed hosting.
• Unmanaged hosted services: Hosting centers may offer a form of semi-colocation wherein the hosting provider owns and maintains the server hardware for the customer, but doesn't manage the operating system or applications/services that run on that hardware.

Principles for co-location selection criteria:
1. Business process includes or provides an e-commerce solution
2. Business process does not contain applications and services that were developed and are maintained in-house
3. Business process does not predominantly include internal infrastructure or support services that are not web-based
4. Business process contains predominantly commodity and horizontal applications and services (such as email and database systems)
5. Business process requires geographically distant locations for disaster recovery or business continuity
6. Co-location facility meets the level of reliability objective (Tier I, II, III, or IV) at less cost than retrofitting or building new campus data centers
7. Access to particular IT staff skills and bandwidth of the current IT staffers
8. Level of SLA matches the campus requirements, including those for disaster recovery
9. Co-location provider can accommodate regulatory auditing and reporting for the business process
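One way to apply these criteria is as a simple screening score. The sketch below is purely illustrative; equal weighting of the nine criteria is an assumption, and a campus would set its own weights and cutoff:

```python
# Illustrative co-location fit score: each selection criterion above
# becomes a yes/no answer. Equal weights are an assumed policy choice.
CRITERIA = [
    "e-commerce solution",
    "no in-house developed applications",
    "not predominantly non-web internal infrastructure",
    "predominantly commodity/horizontal applications",
    "needs geographic distance for DR/BC",
    "tier objective met at lower cost than build/retrofit",
    "IT staff skills/bandwidth available",
    "SLA level matches requirements",
    "provider supports regulatory audit/reporting",
]

def colocation_fit(answers: dict[str, bool]) -> float:
    """Return the fraction of selection criteria satisfied (0.0 to 1.0)."""
    met = sum(1 for c in CRITERIA if answers.get(c, False))
    return met / len(CRITERIA)

# Example: a business process meeting 6 of 9 criteria scores ~0.67.
print(colocation_fit({c: True for c in CRITERIA[:6]}))
```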

2.5.4. Operational considerations
Operational drivers for co-location include the Recovery Time Objectives and Recovery Point Objectives discussed in 2.7.5.1 (Backup and Recovery Concepts), and current data center facilities that have run out of space, power, or cooling. [concepts from Burton Group article, Host, Co-Lo, or Do-It-Yourself?] Internal cloud computing would span aspects of almost all of this framework, and external cloud computing could have a similar placement, though security, compliance and other issues might define the scope of appropriate use for external cloud computing.

2.6. Total Enterprise Virtualization

2.7. Management Disciplines
2.7.1. Service Management
IT service management is the integrated set of activities required to ensure the cost and quality of IT services valued by the customer. It is the management of customer-valued IT capabilities through effective processes, organization, information and technology, including:
• Aligning IT with business objectives
• Managing IT services and solutions throughout their lifecycles
• Service management processes like those described in ITIL, ISO/IEC 20000, or IBM's Process Reference Model for IT

2.7.1.1. Service Catalog
An IT Service Catalog defines the services that an IT organization is delivering to the business users and serves to align the business requirements with IT capabilities, communicate IT services to the business community, plan demand for these services, and orchestrate the delivery of these services across the functionally distributed (and, oftentimes, multi-sourced) IT organization.

The most important requirement for any Service Catalog is that it should be business-oriented, with services articulated in business terms. An effective Service Catalog also segments the customers who may access the catalog - whether end users or business unit executives - and provides different content based on function, organization, role, location, needs, and entitlements. The ITIL framework distinguishes between these groups as "customers" (the business executives who fund the IT budget) and "users" (the consumers of day-to-day IT service deliverables). The satisfaction of both customers and users is equally important, yet it is important to recognize that these are two very distinct and different audiences; depending on the audience, they will require a very different view into the Service Catalog. In following this principle, the IT Service Catalog must be focused on addressing the unique requirements for each of these business segments.
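To illustrate the segmentation idea, the sketch below models a single catalog entry with distinct customer and user views. All field names and values are hypothetical, not drawn from any catalog product:

```python
# Hypothetical service catalog entry serving two audiences: business
# "customers" see cost/value fields, "users" see request-oriented fields.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    business_description: str      # customer-facing, business terms
    user_description: str          # user-facing, store-front style
    cost_per_month: float          # surfaced only in the portfolio view
    request_options: list[str] = field(default_factory=list)

    def view(self, audience: str) -> dict:
        """Return the audience-appropriate slice of the entry."""
        if audience == "customer":
            return {"service": self.name,
                    "description": self.business_description,
                    "monthly cost": self.cost_per_month}
        return {"service": self.name,
                "description": self.user_description,
                "options": self.request_options}

email = CatalogEntry(
    name="Email and Calendaring",
    business_description="Messaging for all staff; funded per mailbox.",
    user_description="Request a new mailbox or a larger quota.",
    cost_per_month=4.50,
    request_options=["new mailbox", "quota increase", "shared mailbox"],
)
print(email.view("customer"))
print(email.view("user"))
```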

To be successful, IT organizations should consider a two-pronged approach to creating an actionable Service Catalog:
• The executive-level, service portfolio view of the Service Catalog, used by business unit executives to understand how IT's portfolio of service offerings maps to business unit needs. This is referred to in this article as the "service portfolio." The Service Portfolio provides the basis for a balanced, business-level discussion on service quality and cost trade-offs with business decision-makers.
• The employee-centric, request-oriented view of the Service Catalog, used by end users (and even other IT staff members) to browse for the services required and submit requests for IT services. For the purposes of this article, this view is referred to as a "service request catalog." A Service Request Catalog should look like consumer catalogs, with easy-to-understand descriptions and an intuitive store-front interface for browsing available service offerings. This customer-focused approach helps ensure that the Service Request Catalog is adopted by end users.

As described above, the Service Catalog can provide a vehicle for communicating and marketing IT services to both business decision-makers and end users. To that end, service catalogs should extend beyond a mere list of services offered and can be used to facilitate:
• IT best practices, captured as Service Catalog templates
• Operational Level Agreements and Service Level Agreements (aligning internal and external customer expectations)
• Hierarchical and modular service models
• Catalogs of supporting and underlying infrastructures and dependencies (including direct links into the CMDB)
• Demand management and capacity planning
• Service request, validation, configuration, and approval processes
• Workflow-driven provisioning of services
• Key performance indicator (KPI)-based reporting and compliance auditing

2.7.2. Service Level Agreements
The existence of a quality service level agreement is of fundamental importance for any service or product delivery of any importance. This is an area which too often is not given sufficient attention, and it is NOT an area for shortcutting; shortcuts here can lead to serious problems with the relationship, serious issues with respect to the service itself, and potentially with the business itself. A service level agreement essentially defines the formal relationship between the supplier and the recipient. It will embrace all key issues, and typically will define and/or cover:
• The services to be delivered
• Performance, Tracking and Reporting Mechanisms
• Problem Management Procedures
• Dispute Resolution Procedures
• The Recipient's Duties and Responsibilities
• Security
• Legislative Compliance
• Intellectual Property and Confidential Information Issues
• Agreement Termination
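For the performance tracking and reporting component, an SLA report often reduces to a calculation like the following sketch. The 99.9% target and the outage figure are hypothetical:

```python
# Illustrative availability tracking against an SLA target.
SLA_TARGET = 99.9  # percent uptime per month (assumed target)

def monthly_availability(outage_minutes: float, days: int = 30) -> float:
    total_minutes = days * 24 * 60
    return (total_minutes - outage_minutes) / total_minutes * 100

achieved = monthly_availability(outage_minutes=55)
print(f"Achieved {achieved:.3f}% vs target {SLA_TARGET}% -> "
      f"{'met' if achieved >= SLA_TARGET else 'missed'}")
```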

2.7.3. Change Management
Change Management addresses routine maintenance and periodic modification of hardware, software and related documentation. It is a core component of a functional ITIL process as well. An oversight or change control committee can help clarify requirements and make departments or divisions aware of pending changes. Typical categories of modification are:
1. Routine modifications: changes to applications or operating systems to improve performance, correct problems or enhance security. These are usually not of the magnitude of major modifications and can be performed in the normal course of business.
2. Major modifications: significant functional changes to an existing system, or converting to or implementing a new system. Major modifications usually involve detailed file mapping, rigorous testing, and training, as well as a transition plan from the implementation team to the operational team.
3. Emergency modifications: periodically needed to correct software problems or restore operations quickly. Change procedures should be similar to those for routine modifications but include abbreviated change request, evaluation and approval procedures to allow for expedited action. Controls should be designed so that management completes detailed evaluation and documentation as soon as possible after implementation.

Functions associated with change management include:

1. Communication plan: change standards should include communication procedures that ensure management notifies affected parties of changes.
2. Documentation maintenance: identifies document authoring, approving and formatting requirements and establishes primary document custodians. Effective documentation allows administrators to maintain and update systems efficiently and to identify and correct programming defects.
3. Library controls: provide ways to manage the movement of programs and files between collections of information, typically segregated by the type of stored information, such as for development, test and production.
4. Utility controls: restrict the use of programs used for file maintenance, debugging, and management of storage and operating systems.
5. Patch management: similar to routine modifications, but relating to externally developed software.
[concepts from FFIEC Development and Acquisition handbook]

2.7.3.1. Project Management
An organization's ability to effectively manage projects allows it to adapt to changes and succeed in activities such as system conversions, infrastructure upgrades and system maintenance. Stakeholders and IT staff should collaborate on defining project requirements, budget, resources, critical success factors, and risk assessment. A project management system should employ well-defined and proven techniques for managing projects at all stages, including:
• Initiation
• Planning
• Execution
• Control
• Close-out

Project monitoring will include:
• Target completion dates realistically set for each task or phase to improve project control.
• Project status updates measured against original targets to assess time and cost overruns.

2.7.4. Configuration Management
Configuration Management is the process of creating and maintaining an up-to-date record of all components of the infrastructure, effectively a data map of the physical reality of the IT infrastructure.
• Configuration Item (CI): any component of an IT infrastructure which is (or is to be) under the control of Configuration Management. CIs may vary widely in complexity, size and type, from an entire service (including all its hardware, software, documentation, etc.) to a single program module or a minor hardware component. The lowest-level CI is normally the smallest unit that will be changed independently of other components.
• Configuration Management Database (CMDB): a database that contains details about the attributes and history of each Configuration Item and details of the important relationships between CIs. The information held may be in a variety of formats: textual, diagrammatic, photographic, etc.

Functions associated with Configuration Management are:
• Planning
• Identification
• Control
• Status Accounting
• Verification and Audit
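A minimal sketch of the CMDB idea follows; the CI types, field names, and relationship labels are illustrative assumptions, not drawn from any particular product:

```python
# Minimal sketch of a CMDB: Configuration Items (CIs) with typed
# relationships. Field names and relationship types are illustrative.
from dataclasses import dataclass, field

@dataclass
class ConfigurationItem:
    ci_id: str
    ci_type: str                      # e.g. "server", "application"
    attributes: dict = field(default_factory=dict)
    history: list[str] = field(default_factory=list)
    # (relationship type, target CI id), e.g. ("runs_on", "SRV-001")
    relationships: list[tuple[str, str]] = field(default_factory=list)

cmdb: dict[str, ConfigurationItem] = {}

def add_ci(ci: ConfigurationItem) -> None:
    cmdb[ci.ci_id] = ci

add_ci(ConfigurationItem("SRV-001", "server",
                         {"os": "Linux", "location": "rack 12"}))
add_ci(ConfigurationItem("APP-017", "application",
                         {"name": "Student Portal"},
                         history=["2009-01-15 deployed v2.1"],
                         relationships=[("runs_on", "SRV-001")]))

# The impact question a CMDB answers: what depends on SRV-001?
dependents = [ci.ci_id for ci in cmdb.values()
              if any(target == "SRV-001" for _, target in ci.relationships)]
print(dependents)  # ['APP-017']
```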

2.7.5. Data Management
2.7.5.1. Backup and Recovery Concepts
1. Recovery Point Objective: Recovery Point Objective, or RPO, is the acceptable amount of data loss a business can tolerate, measured in time. In other words, this is the point in time before a data loss event occurred at which data may be successfully recovered. For less critical systems it may be acceptable to recover to the most recent backup taken at the end of the business day, whereas highly critical systems may have an RPO of an hour or only a few minutes.
2. Recovery Time Objective: Recovery Time Objective, or RTO, is the duration of time in which a set of data, a server, a business process, etc. must be restored. For example, a highly visible server such as a campus' main web server may need to be up and running again in a matter of seconds, as the business impact if that service is down is high. Conversely, a server with low visibility, such as a server used in software QA, may have an RTO of a few hours. RPO and RTO go hand-in-hand in developing your data protection plan.
3. Backup types
a. Full backups: a full backup is a backup of a device that includes all data required to restore that device to the point in time at which the backup was performed.
b. Incremental backups: incremental backups back up the changed data set since the last full backup of the system was performed. Some vendors include multiple styles of incrementals that a backup administrator may choose from; there does not seem to be any industry standard when you compare one vendor's style of incremental to another.
i. A differential incremental backup is a style of incremental backup where the data set contains all data changed since the previous backup, whether it be a full or another differential incremental.
ii. A cumulative incremental backup is a style of incremental backup where the data set contains all data changed since the last full backup.
4. Deduplication
a. Source deduplication: the deduplication work is done up-front by the client being backed up.
b. Target deduplication: the deduplication processing is done by the backup appliance and/or server. There tend to be two forms of target deduplication: in-line and post-process.
i. In-line deduplication devices decide whether or not they have seen the data before writing it out to disk.
ii. Post-process deduplication devices write all of the data to disk, and then at some later point analyze that data to find duplicate blocks.
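To make the deduplication mechanics concrete, the following sketch implements hash-based block deduplication in a few lines. The block size and sample data are illustrative; production systems use far more sophisticated chunking and indexing:

```python
# Minimal sketch of block-level deduplication: store each unique block
# once, keyed by a content hash.
import hashlib

BLOCK_SIZE = 4096
store: dict[str, bytes] = {}   # hash -> block ("the disk")

def dedup_write(data: bytes) -> list[str]:
    """Write data, returning the recipe (list of block hashes)."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:        # only new blocks consume space
            store[digest] = block
        recipe.append(digest)
    return recipe

def restore(recipe: list[str]) -> bytes:
    return b"".join(store[d] for d in recipe)

backup1 = dedup_write(b"A" * 8192 + b"B" * 4096)
backup2 = dedup_write(b"A" * 8192 + b"C" * 4096)  # shares two blocks
print(len(store))                   # 3 unique blocks stored, not 6
assert restore(backup1) == b"A" * 8192 + b"B" * 4096
```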

5. Methods
1. Disk-to-Tape (D2T): Disk-to-tape is what most system administrators think of when they think of backups, as it has been the most common backup method in the past. The data typically moves from the client machine through some backup server to an attached tape drive. Writing data to tape is typically faster than reading the data from the tape. There are many tape formats to choose from when looking at tape backup purchases. They range from open standards (many vendors sell compatible drives) to single-vendor or legacy technologies:
a. DLT: Digital Linear Tape, or DLT, was originally developed by Digital Equipment Corporation in 1984. The technology was later purchased by Quantum in 1994; Quantum licenses the technology to other manufacturers as well as manufacturing their own drives.
b. LTO: Linear Tape Open, or LTO, is a tape technology developed by a consortium of companies in order to compete with proprietary tape formats in use at the time.
c. AIT: Advanced Intelligent Tape, or AIT, is a tape technology developed by Sony in the late 1990's.
d. DAT/DDS: Digital Data Store, or DDS, is a tape technology that evolved from Digital Audio Tape, or DAT, technology.
e. STK/IBM: StorageTek and IBM have created several proprietary tape formats that are usually found in large, mainframe environments.
Best practices for tape media: stick with the same vendor for all drive types; understand backward compatibility issues with drive types; avoid mixing drive types in a library chassis; and select a vendor solution that is open with respect to tape manufacturers, so that there are no warranty issues if going off-platform or mixing manufacturers.
2. Disk-to-Disk (D2D): With the dramatic drop in hard drive prices over recent years, disk-to-disk methods and technologies have become more popular. The big advantage they have over the traditional tape method is speed in both the writing and reading of data. Some options available in the disk-to-disk technology space:
a. VTL: Virtual Tape Libraries, or VTLs, are a class of disk-to-disk backup devices where a disk array and software appear as a tape library to your backup software.
b. Standard disk array: Many enterprise backup software packages available today support writing data to attached disk devices instead of a tape drive. One advantage to this method is that you don't have to purchase a special device in order to gain the speed benefits of disk-to-disk technology.

3. Disk-to-Disk-to-Tape (D2D2T): Disk-to-disk-to-tape is a combination of the previous two methods. This practice combines the best of both worlds: the speed benefits of using disk as your backup target, and tape's value in long-term and off-site storage practices. Many specialized D2D appliances have some support for pushing their images off to tape, and backup applications that support disk targets also tend to support migrating their images to tape at a later date.
4. Replication
a. On-site: On-site replication is useful if you are trying to protect against device failure. You would typically purchase identical storage arrays and then configure them to mirror the data between them. This does not, however, protect against some sort of disaster that takes out your entire data center.
b. Off-site: Off-site implies that you are replicating your data to a similar device located away from your campus. Technically, off-site could mean something as simple as a different building on your campus, but generally this term implies some geo-diversity to the configuration.
c. Synchronous vs. Asynchronous:
i. Synchronous replication guarantees zero data loss by performing atomic writes; the data is written to all the arrays that are part of the replication configuration, or to none of them. A write request is not considered complete until acknowledged by all storage arrays. Depending on your application and the distance between your local and remote arrays, synchronous replication can cause performance impacts, since the application may wait until it has been informed by the OS that the write is complete.
ii. Asynchronous replication gets around this by acknowledging the write as soon as the local storage array has written the data. Asynchronous replication may increase performance, but it can contribute to data loss if the local array fails before the remote array has received all data updates.
d. In-band vs. Out-of-band: In-band replication refers to replication capabilities built into the storage device. Out-of-band replication can be accomplished with an appliance, with software installed on a server, or "in the network", usually in the form of a module or licensed feature installed into a storage router or switch.
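The acknowledgment semantics that distinguish synchronous from asynchronous replication can be sketched as a toy model; real arrays handle ordering, batching, and failure recovery, all omitted here:

```python
# Toy model contrasting the acknowledgment semantics described above.
local: dict[int, bytes] = {}
remote: dict[int, bytes] = {}
pending: list[tuple[int, bytes]] = []   # updates not yet shipped remotely

def write_synchronous(block: int, data: bytes) -> None:
    # Atomic: both arrays see the write before it completes.
    local[block] = data
    remote[block] = data        # caller waits for this acknowledgment
    # only now is the write acknowledged to the application

def write_asynchronous(block: int, data: bytes) -> None:
    local[block] = data             # acknowledged after local write only
    pending.append((block, data))   # shipped later; lost if local array dies

def drain_pending() -> None:
    while pending:
        block, data = pending.pop(0)
        remote[block] = data

write_asynchronous(1, b"payroll")
# If the local array failed here, block 1 would exist locally only:
print(1 in remote)  # False until drain_pending() runs
drain_pending()
print(1 in remote)  # True
```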

5. Snapshots: A snapshot is a copy of a set of files and directories as they were at a particular moment in time. On a server operating system, the snapshot is usually taken by either the logical volume manager (LVM) or the file system driver. File system snapshots tend to be more space-efficient than their LVM counterparts. Most storage arrays come with some sort of snapshot capability, either as a base feature or as a licenseable add-on.
6. VM images: In a virtualized environment, backup agents may be installed on the virtual host and file-level backups invoked in a conventional method. Backing up each virtual instance as a file at the hypervisor level is another consideration. A prime consideration in architecting backup strategies in a virtual environment is the use of a proxy server or intermediate staging server to handle snapshots of active systems. Such proxies allow for the virtual host instance to be staged for backup without having to quiesce or reboot the VM. Depending on the platform and the OS, it may also be possible to achieve file-level restores within the VM while backing up the entire VM as a file.
7. Tape libraries: A tape library is a device which usually holds multiple tapes and multiple tape drives, and has a robot to move tapes between the various slots and drives. A library can help automate the process of switching tapes so that an administrator doesn't have to spend several hours every week changing out tapes in the backup system. A large tape library can also allow you to consolidate the various media formats in use in an environment into a single device (i.e., mixing DLT and LTO tapes and drives).
8. Tape Rotation and Aging Strategies
a. Grandfather-father-son: From Wikipedia: "Grandfather-father-son backup refers to the most common rotation scheme for rotating backup media. Originally designed for tape backup, it works well for any hierarchical backup strategy." The basic method is to define three sets of backups, such as daily, weekly and monthly. The daily, or son, backups are rotated on a daily basis with one graduating to father status each week. The weekly or father backups are rotated on a weekly basis with one graduating to grandfather status each month.
b. Offsite vaults: Vaulting, or moving media from on-site to an off-site storage facility, is usually done with some sort of full backup. The media sent off-site can be either the original copy or a duplicate, but it is common to have at least one copy of the media being sent rather than sending your only copy. The amount of time it takes to retrieve a given piece of media should be taken into consideration when calculating and planning for your RTO.
9. Retention policies: The CSU maintains a website with links and resources to help campuses comply with requirements contained in Executive Order 1031, the CSU Records/Information Retention and Disposition Schedules. The objective of the executive order is to ensure compliance with legal and regulatory requirements while implementing appropriate operational best practices. The site is located at http://www.calstate.edu/recordsretention.
10. Disk Backup appliances/arrays: Some vendor backup solutions may implement the use of a dedicated storage appliance or array that is optimized for their particular backup scheme. In the case of incorporating deduplication into the backup platform, a dedicated appliance may be involved for handling the indexing of the bit-level data.
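The grandfather-father-son scheme in item 8 can be expressed as a simple classification of calendar dates. The weekly-on-Sunday and monthly-on-the-first conventions below are illustrative policy choices, not requirements of the scheme:

```python
# Sketch of grandfather-father-son rotation: classify which set a given
# day's backup belongs to. Conventions are assumed policy choices.
import datetime

def gfs_label(day: datetime.date) -> str:
    if day.day == 1:           # monthly "grandfather" set
        return "grandfather"
    if day.weekday() == 6:     # Sunday: weekly "father" set
        return "father"
    return "son"               # all other days: daily set

for offset in range(7):
    d = datetime.date(2009, 2, 1) + datetime.timedelta(days=offset)
    print(d, gfs_label(d))
# 2009-02-01 is labeled grandfather; the remaining weekdays are sons,
# and the following Sunday (2009-02-08) would be a father.
```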

2.7.5.2. Archiving
2.7.5.3. Hierarchical Storage Management
2.7.5.4. Document Management
2.7.5.5. Media Lifecycle
2.7.5.6. Destruction of expired data

2.7.6. Asset Management
Effective data center asset management is necessary for both regulatory and contractual compliance. An effective management process requires combining current Information Technology Infrastructure Library (ITIL) and Information Technology Asset Management (ITAM) best practices with accurate asset information, ongoing governance and asset management tools. It can improve life cycle management and facilitate inventory reductions by identifying under-utilized hardware and software, potentially resulting in significant cost savings.

The best systems/tools should be capable of asset discovery and life cycle management, with Web interfaces for real-time access to the data, and should manage all aspects of the assets, including physical, financial and contractual. Recognizing that sophisticated systems may be prohibitively expensive, asset management for smaller environments may be able to be managed by spreadsheets or a simple database. Optimally, a system that could be shared among campuses while maintaining restricted permission levels, such as the Network Infrastructure Asset Management System (NIAMS), http://www.calstate.edu/tis/cass/niams.shtml, would allow for more comprehensive and uniform participation.

The following are asset categories to be considered in a management system:
• Physical Assets, to include the grid, floor space, tile space, racks and cables. The layout of space and the utilization of the attributes above are literally an asset that needs to be tracked both logically and physically.
• Network Assets, to include routers, switches, firewalls, load balancers, and other network-related appliances.
• Storage Assets, to include Storage Area Networks (SAN), Network Attached Storage (NAS), tape libraries and virtual tape libraries.

• Server Assets, to include individual servers, blade servers and enclosures.
• Logical Assets, to include T1s, PRIs and other communication lines, electrical power usage, and the circuit number and grid location of same. Most important in this logical realm is the management of the virtual environment. Following is a list of logical assets or associated attributes that would need to be tracked:
o A list of Virtual Machines
o Software licenses in use in the data center
o Virtual access to assets: VPN access accounts to the data center, and server/asset accounts local to the asset
Power consumption is another example of a logical asset that needs to be monitored by the data center manager in order to maximize server utilization and understand, if not reduce, associated costs. Rising energy costs and concerns about global warming require data center managers to track usage carefully.
• Information Assets, to include text, images, audio, video and other media. Information is probably the most important asset a data center manager is responsible for. The definition is: an information asset is a definable piece of information, stored in any manner, recognized as valuable to the organization. In order to achieve access, users must have accurate, timely, secure and personalized access to this information.
• Electrical Assets, to include Universal Power Supplies (UPS), Power Distribution Units (PDU), breakers, and outlets (NEMA noted).
• Air Conditioning Assets, to include air conditioning units, air handlers, chiller plants and other airflow-related equipment. Airflow in this instance may be considered a logical asset as well, but its usage plays an important role in a data center environment. Computational fluid dynamics (CFD) modeling can serve as a tool for maximizing airflow within the data center.
• Data Center Security and Safety Assets, to include media access controllers, cameras, environmental surveillance, and fire and life safety components such as fire suppression systems, fire alarms, access control systems and access cards/devices.

The following are asset groupings to be considered in a management system:
• By Security Level
o Confidentiality
o FERPA
o HIPAA
o PCI
• By Support Organization
o Departmental
o Computer Center Supported
o Project Team
• By Criticality
o Critical (e.g., 24x7 availability)
o Business Hours only (e.g., 8AM-7PM)
o Noncritical
• By Funding Source (useful for recurring costs)
o Departmental funded
o Project funded
o Division funded
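As a sketch of how such records might be kept in a simple shared database, the example below mirrors the categories and groupings above. The field names and sample entries are hypothetical:

```python
# Hypothetical asset record for a simple tracking database.
from dataclasses import dataclass
from collections import Counter

@dataclass
class Asset:
    asset_id: str
    category: str        # "physical", "network", "storage", "server", ...
    description: str
    location: str        # e.g. a grid/tile coordinate such as "AB-07"
    funding_source: str  # supports recurring-cost reporting
    support_org: str     # e.g. "Computer Center Supported"

inventory = [
    Asset("NET-0042", "network", "core switch", "AB-07",
          "Division funded", "Computer Center Supported"),
    Asset("STO-0007", "storage", "SAN array", "AC-02",
          "Project funded", "Project Team"),
]

# Simple report: count assets per category.
print(Counter(a.category for a in inventory))
```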

2.7.6.1. Licensing
2.7.6.2. Tagging/Tracking
2.7.6.3. Software Distribution

2.7.7. Problem Management
Problem Management investigates the underlying cause of incidents and aims to prevent incidents of a similar nature from recurring. By removing errors, which often requires a structural change to the IT infrastructure in an organization, the number of incidents can be reduced over time. Problem Management should not be confused with Incident Management: Problem Management is proactive while Incident Management is reactive. Problem Management seeks to remove the causes of incidents permanently from the IT infrastructure, whereas Incident Management deals with fighting the symptoms of incidents. Key activities include:
• Fault Detection: a condition often identified as a result of multiple incidents that exhibit common symptoms. Problems can also be identified from a single significant incident, indicative of a single error, for which the cause is unknown but for which the impact is significant.
• Correction: an iterative process to diagnose known errors until they are eliminated by the successful implementation of a change under the control of the Change Management process.
• Reporting: summarizes Problem Management activities, including the number of repeat incidents, open problems, repeat problems, etc.

2.7.8. Security
2.7.8.1. Data Security
Data security is the protection of data from accidental or malicious modification, destruction, or disclosure. Although the subject of data security is broad and multi-faceted, it should be an overriding concern in the design and operation of a data center. There are multiple laws, regulations and standards that are likely to be applicable, such as the Payment Card Industry Data Security Standard, ISO 17799 Information Security Standard, California SB 1386, California AB 211, and the

California State University Information Security Policy and Standards, to name a few. It is required to periodically prove compliance to these standards and laws.

2.7.8.2. Authentication
Authentication is the verification of the identity of a user. From a security perspective it is important that user identification be unique so that each person can be positively identified. Also, the process of issuing identifiers must be secure and documented. There are three types of authentication available:
• What a person knows (e.g., password or passphrase)
• What a person has (e.g., smart card or token)
• What a person is or does (e.g., biometrics or keystroke dynamics)
Single-factor authentication uses one of the above authentication types, two-factor authentication uses two of them, and three-factor authentication uses all of them. Single-factor password authentication remains the most common means of authentication ("what a person knows"). Strong passwords should be used, and a password should never be transmitted or stored without being encrypted. A reasonably strong password would be a minimum of eight characters and should contain three of the following four character types: lower case alpha, upper case alpha, number, and special character. However, due to the computing power of modern computers in the hands of attackers and technologies such as "rainbow tables", passwords used for single-factor authentication may soon outlive their usefulness.

2.7.8.3. Encryption
Encryption is the use of an algorithm to encode data in order to render a message or other file readable only for the intended recipient. Its primary functions are to ensure non-repudiation, integrity, and confidentiality in both data transmission and data storage. The use of encryption is especially important for Protected Data (data classified as Level 1 or 2). Common transmission encryption protocols and utilities include SSL/TLS, SecureShell, and IPSec. Encrypted data storage programs include PGP's encryption products (other security vendors such as McAfee have products in this space as well), encrypted USB keys, and TrueCrypt's free encryption software. Key management (exchange of keys, protection of keys, and key recovery) should be carefully considered.

2.7.8.4. Anti-malware Protection
Malware (malicious code written to circumvent the security policy of a computer), such as viruses, worms, and spyware, represents a threat to data center operations. Anti-malware solutions must be deployed on all operating system platforms to detect and reduce the risk to an acceptable level. Relying on anti-virus solutions alone will not fully protect a computer from malware. Solutions for malware infection attacks include firewalls (host and network), host/network intrusion protection systems, antivirus/anti-spyware, and OS/application hardening and patching. Determining the correct mix and configuration of the anti-malware solutions depends on the value and type of services provided by a server. Anti-virus, firewall, and intrusion protection systems need to be regularly updated in order to respond to current threats.
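The password guideline in the authentication section above translates directly into a checkable rule; a minimal sketch:

```python
# Direct encoding of the guideline above: at least eight characters
# and three of the four character classes.
import string

def is_reasonably_strong(password: str) -> bool:
    if len(password) < 8:
        return False
    classes = [
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    return sum(classes) >= 3

print(is_reasonably_strong("Tr0mbone!"))  # True: length 9, four classes
print(is_reasonably_strong("trombone"))   # False: one class only
```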

2.7.8.5. Vulnerability Management
2.7.8.5.1. Patching
The ongoing patching of operating systems and applications is an important activity in vulnerability management. Data Center Operations groups should implement a patching program designed to monitor available patches, and to categorize, test, implement, and monitor the deployment of OS and application patches. Patching includes file updates and configuration alterations. Patches should be applied via a change control process, and both timely patch deployment and patch testing are important and should be thoughtfully balanced. The ability to undo patches is highly desirable in case unexpected consequences are encountered; the capability to verify that patches were successfully applied is also important. In order to detect and address emerging vulnerabilities in a timely manner, campus staff members should frequently monitor announcements from sources such as BugTraq, US-CERT, REN-ISAC, and Microsoft, and then take appropriate action.

2.7.8.5.2. Vulnerability Scanning
The data center should implement a vulnerability scanning program using the toolsets typically employed for this purpose, such as port scanners and password-cracking tools. This would be part of a "Defense-in-Depth" security model.

2.7.8.5.3. Compliance Reporting
Compliance Reporting informs all parties with responsibility for the data and applications how well risks are reduced to an acceptable level as defined by policy, standards, and procedures. Compliance reporting is also valuable in proving compliance to applicable laws and contracts (HIPAA, PCI DSS, etc.). Compliance reporting should include measures on:
• How many systems are out of compliance
• Percentage of compliant/non-compliant systems
• Once detected out of compliance, how quickly a system comes into compliance
• Compliance trends over time

2.7.8.6. Physical Security
When planning for security around your Data Center and the equipment contained therein, physical security must be part of the equation. If physical security of critical IT equipment isn't addressed, not much else matters: once an attacker has gained physical access to your systems, it doesn't matter how long your passwords are or what method of encryption you are using on your network. See section 3.4.1.6 for a description of access control.
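The compliance measures listed above reduce to straightforward calculations over scan results. The data structure and figures in this sketch are hypothetical:

```python
# Illustrative compliance-report metrics from a scan result.
systems = {
    "web01": {"compliant": True,  "days_to_comply": 2},
    "db01":  {"compliant": False, "days_to_comply": None},
    "app01": {"compliant": True,  "days_to_comply": 12},
}

total = len(systems)
out_of_compliance = sum(1 for s in systems.values() if not s["compliant"])
percent_compliant = (total - out_of_compliance) / total * 100
times = [s["days_to_comply"] for s in systems.values()
         if s["days_to_comply"] is not None]
mean_days = sum(times) / len(times)

print(f"{out_of_compliance} of {total} systems out of compliance")
print(f"{percent_compliant:.0f}% compliant; "
      f"mean time to compliance {mean_days:.1f} days")
```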

<insert diagram of reference model with key components as building blocks> .

3. Best Practice Components
3.1. Standards
3.1.1. ITIL
The Information Technology Infrastructure Library (ITIL) Version 3 is a collection of good practices for the management of Information Technology organizations. It consists of five components whose central theme is the management of IT services. The five components are Service Strategy (SS), Service Design (SD), Service Transition (ST), Service Operations (SO), and Service Continuous Improvement (SCI). Together these five components define the ITIL life cycle, with the first four components (SS, SD, ST and SO) at the core and SCI overarching them: SCI wraps the first four components and depicts the necessary concern of each of the core components to continuously look for ways to improve the respective ITIL process. ITIL defines the five components in terms of functions/activities, key concepts, and processes, as illustrated below:

Service Strategy
• Main Activities: Define the Market; Develop Offerings; Develop Strategic Assets; Prepare Execution
• Key Concepts: Utility & Warranty; Value Creation; Service Provider; Service Model; Service Portfolio
• Processes: Service Portfolio Management; Demand Management; Financial Management

Service Design
• Five Aspects of SD: Service Solution; Service Management Systems and Solutions; Technology and Management Architectures & Tools; Processes; Measurement Systems, Methods & Metrics
• Key Concepts: Four P's (People, Processes, Products, & Partners); Service Design Package; Delivery Model Options; Service Level Agreement; Operational Level Agreement; Underpinning Contract
• Processes: Service Catalog Management; Service Level Management; Availability Management; Capacity Management; IT Service Continuity Management; Information Security Management; Supplier Management

Service Transition
• Processes: Change Management; Service Asset & Configuration Management; Release & Deployment Management; Knowledge Management; Transition Planning & Support; Service Validation & Testing; Evaluation
• Key Concepts: Service Changes; Request for Change; Seven R's of Change Management; Change Types; Release Unit; Configuration Management Database (CMDB); Configuration Management System; Definitive Media Library (DML)

Service Operation
• Achieving the Right Balance: Internal IT View versus External Business View; Stability versus Responsiveness; Reactive versus Proactive; Quality versus Cost
• Processes: Event Management; Incident Management; Problem Management; Access Management; Request Fulfillment
• Functions: Service Desk; Technical Management; IT Operations Management; Application Management

Service Continuous Improvement
The 7-Step Improvement Process, used to identify vision and strategy and tactical and operational goals:
1. Define what you should measure
2. Define what you can measure
3. Gather the data (Who? How? When? Integrity of the data?)
4. Process the data (Frequency? Format? System? Accuracy?)
5. Analyze the data (Relationships? Trends? According to plan? Targets met? Corrective actions?)
6. Present and use the information (assessment summary, action plans, etc.)
7. Implement corrective action

3.1.2. ASHRAE
ASHRAE modified their operational envelope for data centers with the goal of reducing energy consumption. In reviewing the available data from a number of IT manufacturers, the 2008 expanded recommended operating envelope is the agreed-upon envelope that is acceptable to all the IT manufacturers, and operation within this envelope will not compromise overall reliability of the IT equipment. The IT manufacturers recommend that, for extended periods of time, data center operators maintain their environment within the recommended envelope. Exceeding the recommended limits for short periods of time should not be a problem, but running near the allowable limits for months could result in increased reliability issues.
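As a sketch of how the 2008 recommended limits (tabulated below) might be encoded in environmental monitoring, the following check is illustrative only; the dew point uses the Magnus approximation rather than a psychrometric library:

```python
# Check a sensor reading against the 2008 recommended envelope.
import math

def dew_point_c(temp_c: float, rh_percent: float) -> float:
    # Magnus approximation; adequate for a monitoring sketch.
    a, b = 17.62, 243.12
    gamma = math.log(rh_percent / 100) + a * temp_c / (b + temp_c)
    return b * gamma / (a - gamma)

def within_2008_envelope(temp_c: float, rh_percent: float) -> bool:
    dp = dew_point_c(temp_c, rh_percent)
    return (18.0 <= temp_c <= 27.0      # dry-bulb limits
            and dp >= 5.5               # low-end moisture (dew point)
            and rh_percent <= 60.0      # high-end moisture (RH)
            and dp <= 15.0)             # high-end moisture (dew point)

print(within_2008_envelope(24.0, 45.0))  # True: comfortably inside
print(within_2008_envelope(29.0, 45.0))  # False: too warm
```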

Following are the previous and 2008 recommended envelope data:

                       2004 Version    2008 Version
Low End Temperature    20°C (68°F)     18°C (64.4°F)
High End Temperature   25°C (77°F)     27°C (80.6°F)
Low End Moisture       40% RH          5.5°C DP (41.9°F DP)
High End Moisture      55% RH          60% RH & 15°C DP (59°F DP)

<Additional comments on the relationship of electrostatic discharge (ESD) and relative humidity and the impact to printed circuit board (PCB) electronics and component lubricants in drive motors for disk and tape.>

3.1.3. Uptime Institute

3.2. Hardware Platforms
3.2.1. Servers
3.2.1.1. Server Virtualization
1. Practices
a. Virtual machines should be provisioned using a defined work order process that allows for an effective understanding of server requirements and billing/accounting expectations. This process should allow for interaction between requestor and provider to ensure appropriate configuration and acceptance of any fee-for-service arrangements.
b. Production hardware should run the latest stable release of the selected hypervisor, with patching and upgrade paths defined and pursued on a scheduled basis, and with each hardware element (e.g. blade) dual-attached to the data network and storage environment to provide for load balancing and fault tolerance.
c. Virtual machine templates should be developed, tested and maintained to allow for consistent OS, maintenance and middleware levels across production instances. These templates should be used to support cloning of new instances as required and systematic maintenance of production instances as needed.
d. Virtual machines should be monitored for CPU, memory, network and disk usage. Post-provisioning capacity analysis should be performed via a formal, documented process, with service owning unit participation, to ensure an optimum balance between required and committed capacity, and configurations should be modified accordingly. For example, a 4 VCPU virtual machine with 8 gigabytes of RAM that is using less than 10% of 1 VCPU and 500 megabytes of RAM should be adjusted to ensure that resources are not wasted.
e. Virtual machines should be administered using a central console/resource such as VMWare VirtualCenter.
f. A virtual development environment should be implemented, allowing for development and testing of new server instances/templates, changes to production instances/templates, hypervisor upgrades and testing of advanced high-availability features. This process should be formal, documented and performed on a frequent basis.
g. Remote KVM functionality should also be implemented to support remote hypervisor installation and patching and remote hardware maintenance.
h. Virtual machine boot/system disks should be provisioned into a LUN maintained in the storage environment to ensure portability of server instances across hardware elements. However, virtual machines with high performance or high capacity requirements should have their non-boot/system disks provisioned using dedicated LUNs mapped to logical disks in the storage environment.
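The post-provisioning capacity analysis in practice (d) can be sketched as a simple rightsizing check. The thresholds mirror the 4 VCPU/8 GB example above and are illustrative policy values:

```python
# Flag VMs whose observed peak usage is far below committed capacity.
def rightsizing_candidates(vms: list[dict]) -> list[str]:
    flagged = []
    for vm in vms:
        cpu_used = vm["peak_vcpu_used"] / vm["vcpus"]         # fraction
        ram_used = vm["peak_ram_mb"] / (vm["ram_gb"] * 1024)  # fraction
        if cpu_used < 0.10 and ram_used < 0.10:
            flagged.append(vm["name"])
    return flagged

fleet = [
    {"name": "web-dev",  "vcpus": 4, "ram_gb": 8,
     "peak_vcpu_used": 0.1, "peak_ram_mb": 500},
    {"name": "erp-prod", "vcpus": 4, "ram_gb": 8,
     "peak_vcpu_used": 3.2, "peak_ram_mb": 6800},
]
print(rightsizing_candidates(fleet))  # ['web-dev'] should be downsized
```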

3.2.2. Storage
1. Practices
a. Develop a meaningful naming convention when defining components of the SAN. This readily identifies components, quickly presents information about them and reduces the likelihood of misunderstandings.
b. Use different types of storage to produce tiers. High-speed fiber channel drives are not necessary for all applications; a mix of fiber with various SATA capacity and speed drives produces a SAN that balances performance with cost.
c. To reduce I/O contention, make sure the target for the backup is not the same storage device as the SAN if you are doing disk-to-disk backups.
d. Isolate iSCSI from regular network traffic. SAN iSCSI traffic should be on its own network using its own switches, for performance and security reasons.
e. Utilize NDMP when able for NFS. Whenever possible, use NDMP to back up NFS files on the SAN; backups are faster and network traffic is reduced.
f. Use thin provisioning where possible. A large amount of storage can be assigned to a server but doesn't get allocated until it is needed. Some SAN vendor tools allow you to dynamically grow and shrink storage allocations.
g. Use deduplication where possible. Deduplication on VTLs has been in use for quite a while, but deduplication of primary storage is fairly new; keep an eye on the technology and begin to apply it conservatively.
h. Ensure partition alignment between servers and SAN disks. Misalignment of partitions will cause one or two additional I/Os for every read or write, which can have a huge performance impact on large files or databases (see the sketch below).
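Practice (h) can be verified with simple arithmetic: a partition's starting offset should be a multiple of the array's stripe size. The stripe size below is an assumed value:

```python
# Sketch of a partition alignment check. Stripe size is illustrative.
STRIPE_BYTES = 64 * 1024          # assumed array stripe/chunk size

def is_aligned(partition_offset_bytes: int) -> bool:
    return partition_offset_bytes % STRIPE_BYTES == 0

# Classic misalignment: legacy tools started partitions at sector 63.
print(is_aligned(63 * 512))        # False -> extra I/O per read/write
print(is_aligned(2048 * 512))      # True  -> 1 MiB-aligned partition
```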

3.2.3. Storage Virtualization
1. To abstract storage hardware for device/array independence
2. To provide replication/mirroring for higher availability, DR/BC, etc.

3.3. Software
3.3.1. Operating Systems
3.3.1.1. Decision matrix for OS platforms (Windows, Linux, HP-UX, AIX)
3.3.1.2. Support for OS platforms (sys admin skill set maintenance, sourcing)
3.3.1.3. Licensing considerations
3.3.2. Databases
3.3.2.1. Decision matrix for DB platforms (Oracle, MSSQL, Informix, MySQL)
3.3.2.2. Support for DB platforms (training, outsourcing)
3.3.2.3. Licensing considerations
3.3.3. Core/Enabling Applications
3.3.3.1. Email
3.3.3.2. Calendaring
3.3.3.3. Spam Filtering
3.3.3.4. DNS
3.3.3.5. DHCP
3.3.3.6. Syslog
3.3.3.7. Help Desk/Ticketing
3.3.3.8. Identity Management
3.3.4. Third Party Applications
3.3.4.1. LMS
3.3.4.2. CMS
3.3.5. Middleware
3.3.5.1. Web Services
3.3.5.2. Preferred internal development platforms
3.3.6. Desktop Virtualization
3.3.7. Application Virtualization

3.4. Delivery Systems
3.4.1. Facilities
3.4.1.1. Tiering Standards

3.4.1.2. Spatial Guidelines and Capacities
1. The Data Center should be located in an area with no flooding sources directly above or adjacent to the room. Locate building water and sprinkler main pipes, and toilets and sinks, away from areas alongside or above the Data Center location.
2. The Data Center is to be constructed of slab-to-slab walls with a minimum one-hour fire rating. All walls of the Data Center shall be full height, from slab to the underside of the floor above.
3. No windows are permitted in Data Center exterior walls. Interior windows, if any, on fire-rated walls between the Data Center and adjacent office spaces shall be one-hour rated.
4. If any walls are common to the outside, increase the R factor to achieve the same effect based on the exterior wall construction. Provide R-21 minimum insulation.
5. Provide a 20 mil vapor barrier for the Data Center envelope to allow maintaining proper humidity.
6. All penetrations through walls, floors, and ceilings are to be sealed with an approved sealant with fire ratings equal to the penetrated construction.
7. All paint within the Data Center shall be moisture control/anti-flake type.
8. Column spacing shall be 30' x 30' or greater.
9. Provide at least one set of double doors or one 3'-6" wide door into the Data Center to facilitate the movement of large equipment.
10. The redundant UPS systems and related critical electrical equipment should be located in separate rooms external to the data center. Redundant systems, such as UPS systems (not parallel) not connected together, shall be located in separate rooms where not cost prohibitive.
11. There will be a Network Operation Center (NOC) outside the data center. KVMs will be located inside the Data Center as well as in the NOC.
12. A portion of future data center expansion space may be used for staging, storage and test/development.
13. FM-200 gaseous fire suppression will be used, possibly supplemented with VESDA smoke detection with 24/7 monitoring.
14. The concrete floor beneath the raised access floor in the Data Center is to be sealed with an appropriate water-based sealer prior to the installation of the raised access floor.

15. A 24" raised access floor is required (please refer to details in this document). The access floor will be used as the primary cool air supply plenum and should be kept free of cables, piping or other equipment that can cause obstruction of airflow.
16. Data cabling and fiber shall be routed overhead in cable tray or on ladder rack. Power cabling shall be routed in raceway overhead, mounted on equipment racks or suspended overhead from the ceiling.
17. Subfloor load capacity shall not be less than 150 lbs/SF for the Data Center and 250 lbs/SF for electrical and mechanical support areas. IT equipment cabinets can range to 2,000 lbs each.
18. Ceiling load capacity for suspension of data and power cable tray, ladder rack and HVAC ductwork shall not be less than 50 lbs/SF.
19. Roof load capacity shall not be less than 40 lbs/SF, with capability to add structural steel for support of HVAC heat rejection equipment. A steel roof shall be provided. Consider installing a redundant roof or fireproof gypsum board ceiling (especially if the original roof is wooden).
20. Provide a minimum ceiling height of 12.5' clear from the floor slab to the bottom of beams in the Data Center.
21. Utility power transformers, standby generators and switchgear will be installed on grade. HVAC heat rejection equipment may be installed on grade if space allows, otherwise on the roof.
22. Provide two drains with anti-backflow in the floor unless cost prohibitive.
23. All doors into the Data Center and Lab areas are to be secured with locks. All doors should have the appropriate fire rating per the NFPA and local codes. The doors shall have a full 180-degree swing, and all doors into the Data Center are to have self-closing devices.
24. Motion sensors shall be used for data center lighting control for energy efficiency. Orient light fixtures to run between and parallel with equipment rows in the Data Center.
25. All construction in the Data Center is to be complete a minimum of three weeks prior to occupancy. This includes walls, doors, windows, frames, hardware, painting, VCT floor, ceiling grid and tile (if specified), lights, sprinklers, fire suppression systems, electrical, HVAC and UPS. Temporary power and HVAC are not acceptable. Rooms must be wet-mop cleaned to remove all dust prior to installing equipment.

3.4.1.3. Electrical Systems <This section from Lawrence Berkeley National Laboratories. The referenced Design Guidelines Sourcebook is by PG&E. http://hightech.lbl.gov/DCTraining/best-practicestechnical.html >
1. Electrical Infrastructure


• Maximize UPS Unit Loading
o When using battery based UPSs, design the system to maximize the load factor on operating UPSs. Use of multiple smaller units can provide the same level of redundancy while still maintaining higher load factors, where UPS systems operate most efficiently. For more information, see Chapter 10 of the Design Guidelines Sourcebook.
• Specify Minimum UPS Unit Efficiency at Expected Load Points
o There are a wide variety of UPSs offered by a number of manufacturers at a wide range of efficiencies. Include minimum efficiencies at a number of typical load points when specifying UPSs. Compare offerings from a number of vendors to determine the best efficiency option for a given UPS topology and feature set. For more information, see Chapter 10 of the Design Guidelines Sourcebook.
• Evaluate UPS Technologies for the Most Efficient Option
o New UPS technologies that offer the potential for higher efficiencies and lower maintenance costs are in the process of being commercialized. Consider the use of systems such as flywheel or fuel cell UPSs when searching for efficient UPS options. For more information, see Chapter 10 of the Design Guidelines Sourcebook.
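The load-factor argument can be illustrated numerically. The 500 kVA load and unit sizes below are hypothetical:

```python
# Illustrative UPS load factor under N+1 redundancy: more, smaller
# modules run each unit closer to its efficient operating range.
LOAD_KVA = 500  # assumed IT load

def load_factor(unit_kva: float, units_needed: int) -> float:
    total_units = units_needed + 1           # N+1 redundancy
    return LOAD_KVA / (total_units * unit_kva)

# Two 500 kVA units (1+1): each runs at 50% load.
print(f"{load_factor(500, 1):.0%}")   # 50%
# Five 125 kVA units (4+1): each runs at 80% load.
print(f"{load_factor(125, 4):.0%}")   # 80%
```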




2. Lighting


• Use Occupancy Sensors
o Occupancy sensors can be a good option for datacenters that are infrequently occupied. Thorough area coverage with occupancy sensors, or an override, should be used to ensure the lights stay on during installation procedures, when a worker may be 'hidden' behind a rack for an extended period.
• Provide Bi-Level Lighting
o Provide two levels of clearly marked, easily actuated switching so the lighting level can be easily changed between normal, circulation space lighting and a higher power detail work lighting level. The higher power lighting can normally be left off but still be available for installation and other detail tasks.
• Provide Task Lighting
o Provide dedicated task lighting specifically for installation detail work to allow for the use of lower, circulation space and halls level lighting through the datacenter area.



3.4.1.4. HVAC Systems <This section from Lawrence Berkeley National Laboratories. The referenced Design Guidelines Sourcebook is by PG&E. http://hightech.lbl.gov/DCTraining/best-practicestechnical.html >
1. Mechanical Air Flow Management


• Hot Aisle/Cold Aisle Layout
• Blank Unused Rack Positions
o Standard IT equipment racks exhaust hot air out the back and draw cooling air in the front. Openings that form holes through the rack should be blocked in some manner to prevent hot air from being pulled forward and recirculated back into the IT equipment. For more information, see Chapter 1 of the Design Guidelines Sourcebook.


• Use Appropriate Air Diffusers
• Position Supply and Returns to Minimize Mixing
o Diffusers should be located to deliver air directly to the IT equipment. At a minimum, diffusers should not be placed such that they direct air at rack or equipment heat exhausts, but rather direct air only towards where IT equipment draws in cooling air. Supplies and floor tiles should be located only where there is load, to prevent short circuiting of cooling air directly to the returns; in particular, do not place perforated floor supply tiles near computer room air conditioning units, where the cool air can short-circuit directly back into the return air path. For more information, see Chapters 1 and 2 of the Design Guidelines Sourcebook.
• Minimize Air Leaks in Raised Floor


2. Mechanical Air Handler Systems


• Use Redundant Air Handler Capacity in Normal Operations
o With the use of Variable Speed Drives and chilled water based air handlers, it is most efficient to maximize the number of air handlers operating in parallel at any given time. Power usage drops approximately with the square of the velocity, so operating two units at 50% capacity uses less total energy than a single unit at full capacity. For more information, see Chapter 3 of the Design Guidelines Sourcebook.
• Configure Redundancy to Reduce Fan Power Use in Normal Operation
o When multiple small distributed units are used, redundancy must be equally distributed. Achieving N+1 redundancy can require the addition of a large number of extra units, or the oversizing of all units. A central air handler system can achieve N+1 redundancy with the addition of a single unit. The redundant capacity can be operated at all times to provide a lower air handler velocity and an overall fan power reduction, particularly at light loading, since fan power drops with the square of the velocity. For more information, see Chapter 3 of the Design Guidelines Sourcebook.
• Control Volume by Variable Speed Drive on Fans Based on Space Temperature
o The central air handlers should use variable fan speed control to minimize the volume of air supplied to the space. The fan speed should be varied in series with the supply air temperature, in a manner that reduces fan speed to the minimum speed possible before increasing supply air temperature above a reasonable set point. Typically, supply air of 60°F is appropriate to provide the sensible cooling required by datacenters. For more information, see Chapters 1 and 3 of the Design Guidelines Sourcebook.
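The energy claim can be checked numerically. The sketch below shows both the text's square-law approximation and the cube exponent of the ideal fan affinity laws; either way, two units at half speed beat one at full speed:

```python
# Numeric check of the claim above, for both power-law exponents.
for exponent in (2, 3):
    two_units = 2 * 0.5 ** exponent
    print(f"exponent {exponent}: two units at 50% draw "
          f"{two_units:.2f}x the power of one unit at 100%")
# exponent 2 -> 0.50x; exponent 3 -> 0.25x
```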




3. Mechanical Humidification


• Use Widest Suitable Humidity Control Band
• Centralize Humidity Control
• Use Lower Power Humidification Technology
o There are several options for lower power, non-isothermal humidification, including air or water pressure based 'fog' systems, air washers, and ultrasonic systems. For more information, see Chapter 7 of the Design Guidelines Sourcebook.


4. Mechanical Plant Operation


• Use Free Cooling / Waterside Economization
o Free cooling provides cooling using only the cooling tower and a heat exchanger. It is very attractive in dry climates and for facilities that have local concerns about outside air quality that may cause concern about the use of standard airside economizers. For more information, see Chapters 4 and 6 of the Design Guidelines Sourcebook.
• Monitor System Efficiency
o Install reliable, accurate monitoring of key plant metrics such as kW/ton. The first cost of monitoring can be quickly recovered by identifying common efficiency problems, such as low refrigerant charge, non-optimal compressor mapping, incorrect sensors, and incorrect pumping control. Efficiency monitoring provides the information needed for facilities personnel to optimize the system's energy performance during buildout, avoid efficiency decay, and troubleshoot developing equipment problems over the life of the system. For more information, see Chapter 4 of the Design Guidelines Sourcebook.
• Rightsize the Cooling Plant




o Due to the critical nature of the load and the unpredictability of future IT equipment loads, datacenter cooling plants are oversized. The design should recognize that the standard operating condition will be at part load, and optimize for efficiency accordingly. Consistent part-load operation dictates using well-known design approaches to part-load efficiency, such as utilizing redundant towers to improve approach, using multiple chillers with variable speed drives, variable speed pumping throughout, and chiller staging optimized for part-load operation. For more information, see Chapter 4 of the Design Guidelines Sourcebook.

3.4.1.5. Fire Protection & Life Safety
3.4.1.6. Access Control
3.4.1.7. Commissioning <This section from Lawrence Berkeley National Laboratories. The referenced Design Guidelines Sourcebook is by PG&E. http://hightech.lbl.gov/DCTraining/best-practicestechnical.html >
1. Commissioning and Retrocommissioning


• Perform a Peer Review
o A peer review offers the benefit of having the design evaluated by a professional without the preconceived assumptions that the main designer will inevitably develop over the course of the project. Often, efficiency, reliability and cost benefits can be achieved through the simple process of having a fresh set of eyes, unencumbered by the myriad small details of the project, review the design and offer suggestions for improvement.
• Engage a Commissioning Agent
o Commissioning is a major task that requires considerable management and coordination throughout the design and construction process. A dedicated commissioning agent can ensure that commissioning is done in a thorough manner, with a minimum of disruption and cost.
• Document Testing of All Equipment and Control Sequences
o Develop a detailed testing plan for all components. The plan should encompass all expected sequence-of-operation conditions and states. Perform testing with the support of all relevant trades; it is most efficient if small errors in the sequence or programming can be corrected on the spot rather than relegated to the back and forth of a traditional punchlist. Functional testing performed for commissioning does not take the place of equipment startup testing, control point-to-point testing or other standard installation tests.
• Measure Equipment Energy Onsite
o Measure and verify that major pieces of equipment meet the specified efficiency requirements. Chillers in particular can have seriously degraded cooling efficiency due to minor installation damage or errors, with no outward symptoms such as loss of capacity or unusual noise.
• Provide Appropriate Budget and Scheduling for Commissioning
o Commissioning is a separate, non-standard procedure that is necessary to ensure the facility is constructed to and operating at peak efficiency. Additional time commitment beyond a standard construction project will be required from the contractors. Coordination meetings dedicated to commissioning are often required at several points during construction to ensure a smooth and effective commissioning.
• Perform Full Operational Testing of All Equipment
o Commissioning testing of all equipment should be performed after the full installation of the systems is complete, immediately prior to occupancy. Normal operation and all failure modes should be tested. In many critical facility cases, the use of load banks to produce a realistic load on the system is justified to ensure system reliability under design conditions.
• Perform a Full Retrocommissioning
o Many older datacenters may have never been commissioned, and even if they have been, performance degrades over time. Perform a full commissioning and correct any problems found. Where control loops have been overridden due to immediate operational concerns, such as locking out condenser water reset due to chiller instability, diagnose and correct the underlying problem to maximize system efficiency, effectiveness, and reliability.
• Recalibrate All Control Sensors
• Where Appropriate, Install Efficiency Monitoring Equipment

y

y

y

y

y

y

y y
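The sketch below illustrates the continuous-monitoring idea for one such metric, cooling plant kW/ton. It is not a BMS product integration: the two read functions return demo values where real metering reads belong, and the 0.65 kW/ton alert threshold is an invented stand-in for a commissioning baseline.

```python
# Minimal sketch: trend cooling-plant efficiency (kW/ton) and flag degradation.
# The read_* functions and the alert threshold are illustrative assumptions.

import time

ALERT_KW_PER_TON = 0.65  # invented threshold; set from the commissioning baseline

def read_power_kw() -> float:
    """Total cooling-plant electrical input (kW); replace with a real metering read."""
    return 420.0  # demo value

def read_cooling_tons() -> float:
    """Delivered cooling load (tons) from flow and delta-T sensors; demo value."""
    return 700.0

def poll_once() -> None:
    kw, tons = read_power_kw(), read_cooling_tons()
    if tons <= 0:
        return  # plant idle; kW/ton is undefined
    kw_per_ton = kw / tons
    stamp = time.strftime("%Y-%m-%d %H:%M")
    print(f"{stamp} plant efficiency: {kw_per_ton:.2f} kW/ton")
    if kw_per_ton > ALERT_KW_PER_TON:
        print("ALERT: efficiency has drifted past the baseline; investigate")

if __name__ == "__main__":
    for _ in range(3):   # a real monitor would run continuously
        poll_once()
        time.sleep(1)    # in practice, sample every few minutes
```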

3.4.2. Network

3.4.2.1. Network Virtualization

1. Use of VLANs for LAN security and network management
   a. Controlling network access through VLAN assignment
   b. Using VLANs to present multiple virtual subnets within a given physical subnet
   c. Using VLANs to present one virtual subnet across portions of many physical subnets

2. Use of MPLS for WAN security and network management
   a. Multiprotocol Label Switching (MPLS) used to create Virtual Private Networks (VPNs) to provide traffic isolation and differentiation

3.4.2.2. Load Balancing/High Availability

3.4.2.3. Connectivity

3.4.3. Structured Cabling

3.4.4. Operations

3.4.4.1. Staffing
3.4.4.2. Training
3.4.4.3. Monitoring
3.4.4.4. Accounting
3.4.4.5. Console Management
3.4.4.6. Remote Operations

3.5. Disaster Recovery

3.5.1. Infrastructure considerations
3.5.2. Operational considerations
3.5.3. Relationship to overall campus strategy for Business Continuity
3.5.4. Recovery Time Objectives and Recovery Point Objectives (discussed in 3.7.1, Backup and Recovery)
3.5.5. Relationship to CSU Remote Backup DR initiative

3.5.6. Resource-sharing between campuses: Cal State Fullerton/San Francisco State example

<sourced from ITAC Disaster Recovery Plan Project, http://drp.sharepointsite.net/itacdrp/default.aspx >

CSU, Fullerton has established a Business Continuity computing site at San Francisco State University. The site allows a number of critical computing functions to remain in service in the event of severe infrastructure disruption, such as complete failure of both CENIC links to the Internet, failure of the network core on the Fullerton campus, or complete shutdown of the Fullerton Data Center. Fullerton is the only CSU campus to establish such extensive off-site capabilities.

The goal of the site is to provide continuity for the most critical computing services that can be supported in a cost-effective manner. Complete duplication of every central computing resource on the Fullerton campus is prohibitively expensive: Fullerton does not have the space, cooling, or electrical to duplicate all the equipment at SFSU, and San Francisco State's Data Center could not provide the space for that much equipment. (The same is true for SFSU.) In addition to complete operation, the remote site can substitute for specific resources, such as the campus website. This granular approach provides significant flexibility for responding to specific computing issues in the Data Center.

Major capabilities include continuity of access to:

- The main campus website: www.fullerton.edu
- The faculty/student Portal: my.fullerton.edu
- Faculty/staff email
- Student email (provided by Google, but accessed via the Fullerton Portal)
- Blackboard Learning Solutions (hosted by Blackboard ASP, but accessed via the Fullerton Portal)
- CMS HR, Finance, and Student applications (hosted by Unisys in Salt Lake City, but accessed via the Fullerton Portal)
- Brass Ring H.R. recruitment system (hosted by Brass Ring)
- OfficeMax ordering system
- GE Capital state procurement card system

With these capabilities, educational and business activities can continue while the infrastructure interruption is resolved.

A number of significant resources were found to be too costly to duplicate off site, including:

- CMS Data warehouse
- Filenet document repository
- Voice mail
- IBM mainframe (which is to be decommissioned in Dec. 2008)

The project does not provide continuity for resources provided outside of Fullerton IT.

The project began with initial discussions between the CIOs of Fullerton and San Francisco in 2006, where they agreed in concept to provide limited hosting for equipment from the other campus. Fullerton purchased an entire cabinet of equipment, including firewall, network switch, servers, and remote management of keyboards and power plugs. The equipment was tested locally and transported to San Francisco in January 2008. The site became operational in February 2008.

A major innovation was the use of the backup CENIC link at SFSU to provide Internet access for Fullerton's remote site. An arrangement was worked out with CENIC, the statewide data network provider, to use features of the Border Gateway Protocol to allow a campus to switch a portion of the campus Internet Protocol (IP) space from their main campus to the remote campus in a matter of seconds. This capability was perfected and tested in the summer of 2007. Because the remote site is connected to Fullerton through a VPN tunnel, all Internet traffic to our site goes through the backup CENIC link, not through SFSU's primary link. This completely avoids Fullerton having any impact on the SFSU network: we use no IP addresses at SF, we need no network ports, they make no changes to their firewall or routers, and staff at San Francisco provide no routine assistance. This has several important benefits: (1) it places little burden on the remote host; (2) it avoids the need to train staff at the remote campus and re-train when personnel turn over; and (3) it allows the entire remote site to be relocated to a different remote host with almost no change.

To avoid unexpected disruption, the Continuity site is activated manually by accessing the firewall through the Internet and changing a few rules. Experience with the unexpected consequences of setting up automated systems prompted this design.

All maintenance and monitoring of the remote equipment is done through the Internet from the Fullerton campus; the same Op Manager software that monitors equipment on the Fullerton campus also monitors the servers at SFSU. Any unexpected server conditions at SFSU automatically trigger alerts to Fullerton staff.

The SFSU site contains a substantial capability, including:

- Domain controllers for Fullerton's two Microsoft Active Directory domains (AD and ACAD). This provides secure authentication to the portal and email servers, and would allow Active Directory to be rebuilt if the entire Fullerton Data Center were destroyed.

- Web Server
- Portal Server
- Application Servers for running batch jobs and synchronizing the Portal databases
- Microsoft Exchange email servers
- Blackberry Enterprise Server (BES) to provide continuity for Blackberry users
- SQL database servers
- CMS Portal servers (this capability not totally implemented yet because CMS Student is just coming on line during 2008)
- Email gateway server

Because the SFSU site constantly replicates domain updates and refreshes the Portal database daily, the SFSU site is a better source of much information than the tape backups kept at Iron Mountain. (A minimal freshness-check sketch follows.)
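Since the value of the remote copy rests on replication actually staying current, a freshness check belongs in routine monitoring. The sketch below only illustrates that check; it is not Fullerton's Op Manager configuration, and the marker-file path and 24-hour limit are assumptions for illustration.

```python
# Minimal sketch: verify that a replicated dataset at the remote site is fresh.
# The marker path and daily threshold are illustrative; a real deployment would
# query the Portal database or AD replication status directly.

import os
import time

PORTAL_REFRESH_MARKER = "/var/run/portal_refresh.timestamp"  # hypothetical path
MAX_AGE_SECONDS = 24 * 3600  # the Portal database is said to refresh daily

def refresh_age_seconds(marker: str) -> float:
    return time.time() - os.path.getmtime(marker)

def check_freshness() -> None:
    try:
        age = refresh_age_seconds(PORTAL_REFRESH_MARKER)
    except FileNotFoundError:
        print("ALERT: refresh marker missing; replication may never have run")
        return
    if age > MAX_AGE_SECONDS:
        print(f"ALERT: last refresh {age / 3600:.1f} h ago exceeds the daily window")
    else:
        print(f"OK: last refresh {age / 3600:.1f} h ago")

if __name__ == "__main__":
    check_freshness()
```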

Network Diagrams

[Figure 1: Intercampus network diagram]
[Figure 2: Remote LAN at DR site]

3.5.7. Memorandum of Understanding

Campuses striking partnerships in order to share resources and create geographic diversity for data stores will want to document the terms of their agreement, representing elements such as physical space, services, access rights and effective dates. Following is a sample template (sourced from ITAC DR site):

MOU SSU-SJSU
Sonoma State University Rider A, Page 1 of 1

MEMORANDUM OF UNDERSTANDING

This MEMORANDUM OF UNDERSTANDING is entered into this 1st day of September, 2008, by and between Sonoma State University, Information Technology Department, and San Jose State University, University Computing and Telecomm, Computer Operations.

Each campus will provide, for use by the other campus: 2 each 19" racks (mounting equipment specification EIA-310-C) and requisite power and cooling for same. Power provided by Sonoma State University will include uninterrupted power supply (UPS) and diesel generator backup power.

PHYSICAL LOCATION OF HOSTED EQUIPMENT
Sonoma State University will station the two 19" racks for San Jose State University in the Sonoma State University Information Technology data center. San Jose State University will station the two racks for Sonoma State University ...

LOGICAL LOCATION OF HOSTED EQUIPMENT
Equipment stationed by SJSU in the two racks in the SSU data center will be provisioned in a segregated security zone behind a firewall interface whose effective security policy is specified by the hosted campus. Collocated network devices will reside in a physically segregated LAN. Security policy change requests for the firewall by SJSU will be accomplished within 7 days of receipt by SSU.

PHYSICAL ACCESS BY SISTER CAMPUS
Physical access to the two racks in the SSU data center by SJSU support staff will be granted according to the escorted visitor procedures in place at SSU. Physical access is available during normal business hours (8am-5pm) by appointment, or emergency access on a best-effort basis by contacting the SSU emergency contact listed in this document. Physical access to the two racks in the SJSU data center by SSU support staff will be granted according to ...

SERVICES PROVIDED BY HOST CAMPUS
SSU Computer Operations staff will provide limited services to SJSU as required during normal business hours (8am-5pm) or after hours on a best-effort basis by contacting the SSU emergency contact listed in this document. Limited services are defined to be such things as tasks not to exceed 1 hour of labor, such as visiting a system console for diagnostic purposes, power cycling a system, or other simple task that cannot be performed remotely by SJSU.

SJSU Computer Operations staff will provide limited services to SSU as required during normal business hours (8am-5pm) or after hours on a best-effort basis by contacting the SJSU emergency contact listed in this document. Limited services are defined to be such things as tasks not to exceed 1 hour of labor, such as visiting a system console for diagnostic purposes, power cycling a system, or other simple task that cannot be performed remotely by SSU.

SECURITY REQUIREMENTS
Level-one data in transit or at rest must be encrypted. Each campus will conform to CSU information security standards as they may apply to equipment stationed at the sister campus and cooperate with the sister campus Information Security Officer pertaining to audit findings on their collocated servers.

There will be no charges to either party, since equipment, services and support are mutual. The term of this MOU shall be September 1, 2008 through June 30, 2009.

The point of contact person for Sonoma State University will be Mr. Samuel Scalise (707 664-3065, scalise@sonoma.edu). The point of contact person for San Jose State University will be Mr. Don Baker (408 924-7820, don.baker@sjsu.edu). The emergency point of contact person for Sonoma State University will be Mr. Don Lopez (707 291-4970, don.lopez@sonoma.edu). The emergency point of contact for San Jose State University will be ...

3.6. Total Enterprise Virtualization

In order to allow IT organizations to remain nimble in the face of the increased complexity of application provisioning and delivery, and to maximize the unused capacity of compute and storage resources, virtualization is key. And while virtualizing servers and storage are obvious targets for optimization, total enterprise virtualization would encompass additional layers, such as virtualizing connectivity to storage systems and the network, extending to desktop virtualization and application virtualization. Data centers that are able to deliver services dynamically to meet demand will have other virtualization layers as well.

The following are characteristics of a dynamic data center:

- Enables workload mobility
- Automatically managed through orchestration
- Seamlessly leverages external services
- Service-oriented
- Highly available

- Energy and space efficient
- Utilizes a unified fabric
- Secure and regulatory compliant

Achieving these characteristics requires investments in some of the following key enabling technologies:

1. Server Virtualization

Server virtualization's ability to abstract the system hardware away from the workload (i.e., guest OS plus application) enables the workload to move from one system to another without hardware compatibility worries. In turn, this opens up a whole new world of IT agility possibilities that enable the administrator to dynamically shift workloads to different IT resources for any number of reasons, including better resource utilization, greater performance, high availability, disaster recovery, server maintenance, and even energy efficiency. Imagine a data center that can automatically optimize workloads based on spare CPU cycles from highly energy-efficient servers (a sketch of this idea follows this section). Issues to be aware of in server virtualization include licensing, application support, hardware capabilities, and administrator trust of virtual platforms with critical applications.

2. Storage Virtualization

Storage virtualization is an increasingly important technology for the dynamic data center because it brings many of the same benefits to the IT table as server virtualization. Storage virtualization is an abstraction layer that decouples the storage interface from the physical storage, obfuscating where and how data is stored. This virtualization layer not only creates workload agility (not tied to a single storage infrastructure), but it also improves storage capacity utilization, decreases space and power consumption, and increases data availability. In fact, storage and server virtualization fit hand in glove, facilitating a more dynamic data center together by enabling workloads to migrate to any physical machine connected to the storage virtualization layer that houses the workload's data.
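The "spare CPU cycles" scenario above can be made concrete with a small placement routine. Everything here is an illustrative assumption: the Host fields stand in for whatever inventory a hypervisor manager exposes, and migrate() is a placeholder for a real migration API (vMotion, libvirt, and the like), which server plus storage virtualization make safe to invoke.

```python
# Minimal sketch: choose the most energy-efficient host with enough spare CPU
# for a workload. Host fields and migrate() are illustrative, not a product API.

from __future__ import annotations

from dataclasses import dataclass

@dataclass
class Host:
    name: str
    spare_cpu_ghz: float   # unused CPU capacity
    perf_per_watt: float   # higher = more energy efficient

def pick_host(hosts: list[Host], needed_ghz: float) -> Host | None:
    candidates = [h for h in hosts if h.spare_cpu_ghz >= needed_ghz]
    # Prefer the most efficient host that still has headroom.
    return max(candidates, key=lambda h: h.perf_per_watt, default=None)

def migrate(workload: str, target: Host) -> None:
    print(f"migrating {workload} -> {target.name}")  # stand-in for a hypervisor call

if __name__ == "__main__":
    pool = [
        Host("old-rack-a", spare_cpu_ghz=8.0, perf_per_watt=1.0),
        Host("new-blade-b", spare_cpu_ghz=6.0, perf_per_watt=2.5),
    ]
    target = pick_host(pool, needed_ghz=4.0)
    if target:
        migrate("portal-vm", target)
```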

3. Automation and Orchestration

Workflow automation and orchestration tools use models and policies stored in configuration management databases (CMDBs) that describe the desired data center state and the actions that automation must take to keep the data center operating within administrator-defined parameters. These tools put the IT administrator in the role of the conductor, automating systems and workload management using policy-based administration. The best automation and orchestration software can reduce the management complexity created by workload mobility. (A minimal sketch of this desired-state loop follows.)
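The core of the desired-state idea can be shown in a few lines. This is a sketch under stated assumptions, not any vendor's orchestration engine: cmdb_desired_state() and observed_state() are hypothetical stand-ins for a CMDB query and a monitoring feed, and real tools add scheduling, approvals, and rollback.

```python
# Minimal sketch: reconcile observed data-center state against the desired
# state recorded in a CMDB, the loop at the heart of orchestration tools.

def cmdb_desired_state() -> dict[str, int]:
    """Desired replica count per service, as policy stored in the CMDB (demo)."""
    return {"web": 4, "portal": 2, "mail": 2}

def observed_state() -> dict[str, int]:
    """What monitoring currently reports (demo values)."""
    return {"web": 3, "portal": 2, "mail": 0}

def reconcile() -> None:
    desired, actual = cmdb_desired_state(), observed_state()
    for service, want in desired.items():
        have = actual.get(service, 0)
        if have < want:
            print(f"{service}: scale up {want - have} instance(s)")
        elif have > want:
            print(f"{service}: scale down {have - want} instance(s)")
        else:
            print(f"{service}: within administrator-defined parameters")

if __name__ == "__main__":
    reconcile()
```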

4. Unified Fabric with 10GbE

10 Gigabit Ethernet (10GbE) is an important dynamic data center-enabling technology because it raises Ethernet performance to a level that can compete with specialized I/O fabrics, thereby potentially unifying multiple I/O buses into a single fabric. Just like 1GbE and Fast Ethernet before it, 10GbE will become the standard network interface shipped with every x86/64 server. SANs today operate on FC-based fabrics running at 2, 4, and now 8 Gb speeds. Ethernet, at 10 Gb, has the performance potential (i.e., bandwidth and latency) to carry both SAN and network communication I/O on the same medium. iSCSI is already a suitable alternative for many FC applications, but Fibre Channel over Ethernet (FCoE) should close the gap for those systems that still cannot tolerate the relative inefficiencies of iSCSI. The development of Converged Enhanced Ethernet, or Cisco's version called Data Center Ethernet, allows for lossless networks that can do without the overhead of TCP/IP and therefore rival Fibre Channel (FC) for transactional throughput.

One of the key benefits of a unified fabric within the data center is putting the network team in the role of managing all connectivity, including the storage networks, which are often managed by the storage administrators, whose time could be better spent on managing the data and archival and recovery processes rather than the connectivity. For example, using 10GbE as a universal medium, administrators can move workloads between physical servers without worrying about network or SAN connectivity, increasing host connectivity to shared resources and increasing workload mobility.

5. Desktop and Application Virtualization

6. Cloud Computing

- Not for everything; dependent on service offerings
- APIs
- Abstraction layer
- Internal and external

[Key concepts extracted from The Dynamic Data Center by The Burton Group]

3.7. Management Disciplines

3.7.1. Backup and Recovery

One of the essential elements of an effective business continuity plan resides within the backup and disaster recovery infrastructure component. While the technical issues pertaining to hardware and software are critical to the implementation of an effective backup and disaster recovery plan, the development of backup and disaster recovery standards and policies is the responsibility of enterprise governance. In this regard, well-written standards and policies are the lynchpin of a successful backup and recovery program deployment.

In doing this, the Chancellor's Office must make decisions about the classification and retention of business information. This is not a trivial task, as evidenced by the complex legal compliance issues posed by the Family Educational Rights and Privacy Act (FERPA) and the Health Insurance Portability and Accountability Act (HIPAA); Sarbanes-Oxley (SOX) statutes present a moving target with severe penalties for non-compliance. The requirements defined by the governance process will be a critical factor in establishing recovery point objectives (RPO) and recovery time objectives (RTO) and must be congruent with budget. (A minimal RPO illustration follows.)
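To make the RPO concept concrete: if governance sets a 24-hour RPO, the newest recoverable backup must never be more than 24 hours old at the moment of failure. The sketch below tests exactly that; the timestamps and the 24-hour objective are illustrative, not CSU requirements.

```python
# Minimal sketch: does the most recent recoverable backup satisfy the RPO?
# Worst-case data loss equals the age of the newest recoverable copy.

from datetime import datetime, timedelta

def meets_rpo(last_backup: datetime, rpo: timedelta, now: datetime) -> bool:
    return (now - last_backup) <= rpo

if __name__ == "__main__":
    rpo = timedelta(hours=24)                 # governance-defined objective (demo)
    last_backup = datetime(2008, 9, 1, 2, 0)  # nightly backup at 02:00
    now = datetime(2008, 9, 1, 20, 0)         # failure at 20:00 the same day
    print("RPO met" if meets_rpo(last_backup, rpo, now) else "RPO violated")
```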

1. Assumptions

a. RPO/RTO requirements are driven by business and legal constraints (FERPA, HIPAA, SOX, etc.) and should be defined by enterprise governance.
b. Budget constraints drive the technological solution for backup/recovery.
c. RPO and RTO have a major influence on the backup/recovery technological solution: higher expectations for RPO and RTO will drive the costs of the requisite technological design.
d. The backup window, the time slot within which backups must take place (so as to not interfere with production), is a significant constraint on the backup/recovery design.
e. Sizing of the tiers is driven largely by the retention and recovery time objectives and corresponding budget constraints.
f. A tiered approach to the backup/recovery architecture mitigates budget and backup window constraints.

2. Best Practices

a. Work with CSU and SSU governance bodies to establish business continuity and disaster recovery requirements.
b. Work with CSU and SSU governance bodies to establish retention requirements for electronic media.
c. Review and insure that established retention requirements comply with state and federal legal requirements.
d. Subject to budget constraints, sizing of the tiers of backup/recovery storage should be as large as is practicable; as much storage as possible at each tier means lower RPO/RTO.
e. The backup/recovery architecture should be a tiered design consisting of:

- Tier 1: The first tier of storage is volume snapshots for immediate recovery under user control. This storage typically is referred to as snap reserve and is set aside when the volume is configured. User control at tier one facilitates restoration of backup files without support from operations staff.

- Tier 2: The second tier of online storage is a disk target for backups, such as a virtual tape library (VTL). This would be the target of system backups, which more effectively utilizes the available backup window. The sizing of tier 2 will drive the availability of the backup datasets on the VTL: the higher the capacity of the disk target, the longer these backup datasets can be retained. Recovery time from disk is much faster than from other archive media such as magnetic tape and hence can support more aggressive recovery time objectives; this is particularly true during a disaster recovery process, when magnetic media must be utilized to recover at a remote site. Deduplication should be employed to reduce the footprint of the backup dataset (a sketch of the idea follows this list).

- Tier 3: The third tier is deployed for longer-term archival of backup datasets. Tier 3 strategies can be disk-based, but budget constraints often rule out a disk-based solution; magnetic tape solutions are usually more economical and are the chosen media for archival systems.

f. Magnetic media stored off-site should be encrypted using LTO-4 tape drives and a secure key management system. A copy of encrypted full backup datasets should be stored at another CSU campus serviced by a different power grid and, if possible, in a different seismic zone, at a minimum in a different earthquake fault zone as established by the California State Geologist. Key management is vital to ensure timely decryption: without effective and timely key management, the encrypted backup tapes are useless (a sketch of the key-management principle also follows this list). The movement and tracking of the encrypted backup datasets will be defined in a memorandum of understanding between the sister campuses who engage in the reciprocal agreement. The schedule of remote backup datasets will be determined by the recovery point objective (RPO) established by the CSU governance body.

g. For more aggressive recovery time objective requirements, a WAN-based remote backup system should be employed. While the cost of a WAN-based solution is significantly higher than a magnetic media solution, disaster recovery would be much faster. In addition, a WAN-based system could also dovetail into a high availability architecture, where the backup datasets could feed redundant systems on the sister campus and provide for failover in the event of an outage in the primary datacenter.
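Two of the practices above lend themselves to small illustrations. First, deduplication: the sketch below shows only the underlying idea, fixed-size chunks stored once and keyed by a content hash; commercial VTL deduplication adds variable-size chunking, compression, and on-disk indexes.

```python
# Minimal sketch of deduplication: split a backup stream into chunks, store
# each unique chunk once (keyed by SHA-256), and keep a recipe of hashes from
# which the stream can be rebuilt.

import hashlib

CHUNK_SIZE = 4096

def dedup(stream: bytes, store: dict[str, bytes]) -> list[str]:
    recipe = []
    for i in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[i:i + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # unique chunks are stored only once
        recipe.append(digest)
    return recipe

def restore(recipe: list[str], store: dict[str, bytes]) -> bytes:
    return b"".join(store[d] for d in recipe)

if __name__ == "__main__":
    store: dict[str, bytes] = {}
    backup = b"A" * 8192 + b"B" * 4096 + b"A" * 8192  # repetitive, like full backups
    recipe = dedup(backup, store)
    print(f"logical size: {len(backup)}, unique chunks stored: {len(store)}")
    assert restore(recipe, store) == backup
```

Second, key management: LTO-4 drives encrypt with AES-256 in hardware, fed by a key manager. The sketch below uses the third-party Python cryptography package (an assumption for illustration, not something this document prescribes) purely to show the principle that the encrypted copy and its separately escrowed key are useless apart.

```python
# Minimal sketch of the key-management principle: ship the ciphertext to the
# sister campus, escrow the key with the key manager, and neither alone is
# recoverable data. Requires: pip install cryptography

from cryptography.fernet import Fernet

def encrypt_dataset(data: bytes) -> tuple[bytes, bytes]:
    key = Fernet.generate_key()          # in practice, created and escrowed by the KMS
    return Fernet(key).encrypt(data), key

def decrypt_dataset(token: bytes, key: bytes) -> bytes:
    return Fernet(key).decrypt(token)    # without the escrowed key, this is impossible

if __name__ == "__main__":
    ciphertext, key = encrypt_dataset(b"full backup dataset")
    assert decrypt_dataset(ciphertext, key) == b"full backup dataset"
    print("round trip OK; losing the key would make the off-site copy unrecoverable")
```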

3.7.2. Archiving

3.7.3. Hierarchical Storage Management

3.7.4. Document Management

3.7.5. Software Distribution

3.7.6. Asset Management
3.7.6.1. Licensing
3.7.6.2. Tagging/Tracking

3.7.7. Problem Management
3.7.7.1. Fault Detection
3.7.7.2. Reporting
3.7.7.3. Correction

3.7.8. Security
3.7.8.1. Authentication
3.7.8.2. Data Security
3.7.8.3. Encryption
3.7.8.4. Antivirus protection
3.7.8.5. OS update/patching
3.7.8.6. Physical Security

3.7.9. Service Management
3.7.10. Service Level Agreements
3.7.11. Project Management
3.7.12. Configuration Management
3.7.13. Data Management