E-Guide

Power and Cooling best practices for large data centers

Data center infrastructure is constantly evolving. Increases in computer room and data center density and diversity are driving change in the power and cooling systems that business critical servers and communications devices depend on for their performance and reliability. In this expert e-guide from SearchDataCenter.com, discover power and cooling tips in order to optimize energy efficiency in your data center. Also, learn best practices for backup power maintenance to ensure uninterrupted power.

Sponsored By:


SearchDataCenter.com E-Guide


Table of Contents

Data center infrastructure management: Power and cooling tips

Best practices for backup power maintenance

Resources from Schneider Electric


Data center infrastructure management: Power and cooling tips

By Omar McKee, Contributor

Data center infrastructure is constantly evolving. Increases in computer room and data center density and diversity are driving change in the power and cooling systems that business critical servers and communications devices depend on for their performance and reliability.

As equipment density rises, hardware becomes mission critical because each application deployed increases business dependence on data center IT systems. At the same time, entire facilities, as well as individual racks, are supporting an escalating number of devices as server form factors continue to shrink.

Density is an issue felt across all businesses, according to a 2006 Data Center Users Group study released in October. Heat density and power density represented two of the top three issues driving change in the data center, with more than 40% of the respondents noting these as top trends related to infrastructure.

For many organizations, the IT infrastructure has evolved into an interdependent business critical network with the data center as the hub. A power failure at any point along the network can impact the entire operation -- and have serious consequences for the business.

As a result, there exists a valuable opportunity for resellers to work closely with customers to proactively identify problems within their power systems that could adversely affect availability of their critical systems and operational performance of their facility.

Preventive maintenance usually requires a shut-down to ensure electrical connection integrity. Most preventive maintenance measures should only be attempted by qualified personnel.

The following are preventive maintenance tips for resellers to use when reviewing a customer's power systems:


Small UPS devices should be inspected annually.

Medium and large UPS systems should be inspected twice a year to ensure proper operation and confirm that the unit is operating within the manufacturer's specifications.

Data center semi-annual service

Perform temperature checks on all breakers, connections and associated controls.

Repair and/or report all high temperature areas.

Perform complete visual inspection of equipment including subassemblies, wiring harnesses, contacts, cables and major components.

Check air filters for cleanliness.

Check modules for the following:

o Rectifier and inverter snubber boards for discoloration.

o Power capacitors for swelling or leaking oil.

o DC capacitor vent caps that have extruded more than 1/8".

Record all voltage and current meter readings on module control cabinet or system control cabinet.

Measure and record harmonic trap filter currents.

Data center annual services

Check inverter and rectifier snubbers for burned or broken wires.

Check all nuts, bolts, screws, and connectors for tightness and heat discoloration.

Verify fuses on the DC capacitor deck for continuity (if applicable).

With customer approval, perform operational test of the system, including unit transfer and battery discharge.

Calibrate and record all electronics to system specifications.

Install or perform Engineering Field Change Notices (FCN), as necessary.

Measure and record all low-voltage power supply levels.

Measure and record phase to phase input voltage and currents.

Review system performance with customer to address any questions and to schedule any repairs.


Data center battery inspection service

This visual inspection should be performed during the UPS semi-annual and annual preventive maintenance services.

Check integrity of battery cabinet (if applicable).

Visual inspection of the battery cabinet or room to include:

o Check for NO-OX grease or oil on all connections.

o Check battery jars for proper liquid level (if flooded cells).

o Check for corrosion on all terminals and cables.

o Examine physical cleanliness of battery room and jars.

Measure and record DC bus ripple voltage.

Measure and record total battery float voltage.

Data center preventive maintenance service on power management systems

Perform complete visual inspection of internal sub-assemblies, wiring harnesses, contactors, cables, major components, and check for proper clearance around the unit.

Examine all transformer, terminal block and ground/neutral bus bar connections, as well as input and output breakers for tightness.

Inspect high and low voltage junction box terminals for tightness.

Inspect all option wiring (spike suppressor, ground fault, phase rotation/loss) for tightness.

Inspect all capacitor bank connections for a solid fit.

Verify that all cooling fans are functional and air ducts are open.

Confirm continuity of all fuses and that they are correctly rated.

Measure input and output phase to phase voltage.

Determine the output, neutral and ground current.

Verify kVA load and capacity per phase.

Validate grounding electrode conductor and any isolated grounds.

Measure all filter capacitor currents at no load for all three phases.

Measure primary, secondary, second harmonic and third harmonic (if applicable). All should be balanced within 2.5% deviation.


Verify EPO lamps are illuminated.

Check that local and remote EPOs are functioning properly if permitted.

Confirm that monitor is recording within +/- 2% of those values measured.

Activate transformer over-temp alarm and shutdown circuits to confirm proper operation if permitted.

Verify operation of any option for alarm or shutdown sequence if permitted and of any customer alarm circuits and specified messages.

Confirm the specified restart capability, whether manual or auto-restart.

Verify operation of the bypass switch and the bypass transformer over temp alarm.
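
The two numeric tolerances in the checklist above (readings balanced within 2.5% deviation, monitor within +/- 2% of measured values) can be sketched in a few lines of Python. This is only an illustration, not a vendor procedure: the function names and sample voltages are hypothetical, and the sketch assumes that deviation is measured against the average of the readings, which the checklist does not specify.

```python
from statistics import mean

def max_deviation_pct(readings):
    """Largest deviation of any reading from the mean, as a percent of the mean."""
    avg = mean(readings)
    return max(abs(r - avg) for r in readings) / avg * 100

def is_balanced(readings, limit_pct=2.5):
    """True when every reading is within limit_pct of the mean (assumed definition)."""
    return max_deviation_pct(readings) <= limit_pct

def monitor_within_tolerance(monitor_value, measured_value, tol_pct=2.0):
    """True when the monitor reading is within +/- tol_pct of the measured value."""
    return abs(monitor_value - measured_value) <= abs(measured_value) * tol_pct / 100

# Hypothetical phase-to-phase voltages (A-B, B-C, C-A), in volts.
phase_voltages = [480.0, 478.0, 483.0]
print("balanced:", is_balanced(phase_voltages))
print("monitor ok:", monitor_within_tolerance(monitor_value=471.0, measured_value=480.0))
```

Recording the computed deviation alongside the pass/fail result makes it easier to spot a slow drift between service visits.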


Best practices for backup power maintenance

By Julius Neudorfer, Contributor

Having uninterrupted clean power for the critical load is the objective of every data center. To achieve this goal, the systems in the power path must be properly maintained and tested. Ideally, this would be done without interrupting or potentially exposing the critical load to power loss.

However, maintenance is sometimes seen as a disruptive, (un)necessary evil and expense by some senior managers. This is especially true in today’s economic climate, where every expense is examined to see if it can be reduced or eliminated. Nonetheless, periodic maintenance is required to achieve the projected level of equipment reliability and critical load uptime. Of course, this requires that some level of redundancy be built into the power chain to allow for concurrent operation during maintenance (i.e., tier 2-4).

The higher the level of power system redundancy (N+1, 2N or S+S, corresponding to tier levels 2-4), the lower the probability that power to the critical load will need to be interrupted during scheduled maintenance procedures. However, redundant equipment is meaningless unless it is properly maintained and tested. Improper procedures and human error have caused outages, even in tier 3- and 4-level systems.

Assuming there is redundancy available to allow for maintenance, let’s examine these key components and best practices for backup power maintenance.

Main utility power panel

The main utility power panel is the first panel in the data center power path. At the utility service entrance, the utility hands off the power to the entire facility. Although this panel is normally untouched during normal operation, it should be visually and thermally inspected on a quarterly or semiannual basis, and no less than annually.


Generators

The need to constantly test and maintain backup generators is well recognized by data center facilities managers. In many cases, there is an automated weekly generator exercise routine initiated by the automatic transfer switch (ATS). It is also imperative that all staff be apprised and immediately available during any scheduled maintenance or test. Virtually any type of testing requires constant supervision. For example, starting a generator and an ATS load transfer test, and then moving on to other tasks (or going out to lunch) once the initial load transfer is successful, is poor practice and invites exposure to failure.

While it may be boring to stand around for 30 to 60 minutes just looking at a running generator, it is a good time to listen for unusual sounds and inspect the generator for fluid leaks. It is also good practice to take some voltage and current measurements, as well as rpm and frequency readings. Observe and record oil pressure and temperature gauges and also scan specific areas of the motor-generator with a hand-held IR thermometer or thermal scanner. By recording these readings, you will have a baseline and running record for reference that can be analyzed. You can also use the readings to help monitor for any problems and facilitate preventive service on the suspect areas. Maintenance schedules, such as oil and filter changes, are based on run-time hours, as well as periodic intervals, and are usually prescribed by the engine manufacturer. In addition, diesel fuel should be checked for quality semi-annually, or even more frequently, when warranted.
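
The idea of keeping a baseline and running record of generator readings can be sketched as follows. This is a minimal illustration under stated assumptions: the metric names, the sample values and the 5% deviation limit are all hypothetical, and real limits should come from the engine manufacturer.

```python
from statistics import mean

# Hypothetical log of readings taken during earlier generator test runs.
history = [
    {"oil_pressure_psi": 55.0, "coolant_temp_f": 180.0, "frequency_hz": 60.0},
    {"oil_pressure_psi": 54.0, "coolant_temp_f": 182.0, "frequency_hz": 60.1},
    {"oil_pressure_psi": 56.0, "coolant_temp_f": 181.0, "frequency_hz": 59.9},
]

def baseline(records):
    """Average each metric across past runs to form a baseline."""
    return {key: mean(r[key] for r in records) for key in records[0]}

def flag_deviations(reading, base, limit_pct=5.0):
    """Return metrics that differ from the baseline by more than limit_pct (assumed limit)."""
    flagged = {}
    for key, value in reading.items():
        deviation = abs(value - base[key]) / base[key] * 100
        if deviation > limit_pct:
            flagged[key] = round(deviation, 1)
    return flagged

# Hypothetical readings from today's test run.
today = {"oil_pressure_psi": 47.0, "coolant_temp_f": 183.0, "frequency_hz": 60.0}
print(flag_deviations(today, baseline(history)))
```

A flagged metric is a prompt to inspect the suspect area, not a diagnosis in itself.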

Generator paralleling switchgear

In larger sites with multiple generators, paralleling switchgear is required. This extra equipment increases the complexity of the data center’s backup power system, as the generator synchronization controls and paralleling switchgear require special attention. Ensuring that the sync controls are working correctly is critical, and regular testing and inspections should coincide with the generator’s physical maintenance. If all the generators aren’t synchronized -- rotating at the exact same rpm and in phase with each other -- the load can’t be transferred to the generator array. The data center may go down even if some (or even all) generators are running, if they are not in sync.


Of course, some components of the sync controls are also part of the systems mounted on the generators, and as such must be coordinated with the generator maintenance program. It is very common for the generator, ATS and paralleling gear to be maintained by the same vendor. It is recommended to focus on the specialized requirements of the sync controls and, as with other switchgear, to perform regular visual and thermal inspections.

Automatic transfer switch (ATS)

Note that unlike most types of switchgear, which typically remain in static positions and untouched during their service life, ATS equipment is far more frequently used to make, break and transfer power under load. Therefore, it must be closely watched to see if the contacts need to be serviced or replaced. Every time an ATS effects a power transfer, it essentially “uses up” the contacts through the arcing caused by the making and breaking of high-energy circuits. In most cases, the ATS gear must be disassembled to examine or replace the contacts. The electromechanical transfer mechanism must also be serviced to make sure it can move freely and is free from contaminants.

For complete maintenance, the ATS needs to be de-energized. The ATS also needs to have a functional isolation (bypass) path, either internally or externally, to allow for uninterrupted power to the load during maintenance. Not all ATS installations have this feature; those that don’t have it require that power be interrupted for ATS servicing. The ATS bypass must be part of the original design requirements to ensure that it can be serviced without interrupting power to the load. The ATS should be inspected quarterly or semiannually, and maintained annually.

Note that some data centers will operate on generator during UPS or battery bypass operations to avoid possible exposure to a utility outage during maintenance, as there would not be UPS power available to provide ride-through while the generator starts and is ready to accept the load.

In addition to the major equipment categories listed above, larger sites with A-B power systems (2N or S+S) may also have one or more “tie” circuit breakers. The circuit breakers allow power sources to transfer to the alternate A-B side and permit concurrent operation during maintenance. This is normally done “hot-to-hot” (both sides are energized and must be in phase) to keep the critical load energized during the power source transfer. There can be multiple ties located at different points in the electrical system, such as both before and after the ATS and even downstream of the UPS, depending on the level and type of design redundancy. This allows for different sections of the power path to be separately bypassed or shut down, while still permitting delivery of both sides of the A-B power to the racks. However, to prevent a system outage, it is extremely important to ensure that these tie circuit breakers are only operated in the proper sequence by authorized and fully trained personnel. Normally, tie breaker handles are kept locked to prevent this problem from occurring.

Main power distribution panel

After power has passed through the ATS, it goes into the main power distribution panel. Typically, this panel feeds the UPS and cooling equipment, as well as lighting and other data center systems. Like the main utility panel, it is normally untouched during typical operation, and it should be visually and thermally inspected annually (at a minimum).

Maintenance bypass panel (MBP) for the UPS

Power into and out of the UPS passes through the MBP and out to the critical load, so it is extremely important that it is visually and thermally inspected. Sometimes, in smaller data center sites, external MBPs are not installed, either to lower initial UPS purchase and installation costs or because someone assumed that since the UPS already had an internal bypass, they would not need to also purchase an external bypass panel.

Unfortunately, this assumption is a fairly common occurrence for smaller sites, and it has major consequences if the UPS needs to be de-energized or replaced. These same smaller sites also usually only have a single UPS, so they are forced to cut power to the critical load in the event the UPS needs to be de-energized.

In many cases, the MBP is matched to the UPS and manufactured and installed by the UPS vendor. These matching MBPs can also be equipped with Kirk Key Interlocks and can interactively communicate with the UPS controls to prevent mis-operation. They are usually also covered and maintained under the same UPS service contract. A written procedure and clear understanding of how to operate the MBP should be imparted to key site personnel to help avoid a problem should the need arise to safely bypass the UPS.

Uninterruptible Power Supply (UPS)

Internal systems are electrically checked and visually and thermally inspected. Factory trained service technicians may also run diagnostics. In some cases, the UPS can be placed in internal bypass, while other tests or maintenance procedures require that the UPS be de-energized and externally bypassed via the MBP. In either case, the critical load is then exposed to utility failure unless there is a redundant UPS. As noted above, some data centers will operate on a generator during UPS or battery bypass operations to avoid the possibility of a utility outage. Physical maintenance, such as cleaning the UPS fans and changing or cleaning the air filters, is also performed. This is typically done semi-annually, but should be done annually at a minimum.

Battery plant or other energy storage for the UPS

For the UPS to support the critical load from the moment the utility failure occurs until backup power arrives from the generator, stored energy must always be instantly available. This energy is most commonly provided by one or more strings of batteries.

Battery banks require regular maintenance and inspection for signs of corrosion, leakage and temperature variations from cell to cell. Each battery is connected to the next in series via a jumper cable, and each cable must be checked to ensure it’s tightly connected and free of corrosion. In a typical 480V battery cabinet, there are forty 12-volt batteries and therefore 80 terminals that need to be inspected. This is in addition to the electrical voltage and internal impedance testing, as well as periodic load testing.
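
The arithmetic behind the cabinet example above (a 480V string built from 12-volt batteries, two terminals each) can be written out as a short Python sketch. The function name is hypothetical, and the two-terminals-per-battery figure is taken from the 80-terminal count in the text.

```python
def battery_string(nominal_string_volts, battery_volts, terminals_per_battery=2):
    """Return (battery count, terminal count) for a series-connected string."""
    count, remainder = divmod(nominal_string_volts, battery_volts)
    if remainder:
        raise ValueError("string voltage is not a whole multiple of battery voltage")
    return count, count * terminals_per_battery

batteries, terminals = battery_string(480, 12)
print(batteries, "batteries,", terminals, "terminals to inspect")
```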

Note that some data centers will operate on a generator during UPS, battery bypass or load testing operations. Using a generator is necessary to avoid a utility outage while there is no UPS power available.


Many larger sites have dedicated battery-monitoring systems that can monitor each battery individually, not just the entire string. This is useful for detecting early signs that an individual battery is deteriorating and endangering the integrity of the entire string. While other forms of short-term energy storage are also used in the data center, such as the flywheel or the so-called “rotary UPS,” their maintenance is primarily mechanical in nature and varies according to each manufacturer’s recommendations.
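
Per-battery monitoring of this kind amounts to comparing each battery against the rest of its string. The Python sketch below shows one simple way to do that, flagging any battery whose float voltage strays from the string median. It is an assumption-laden illustration rather than how any particular monitoring product works: the function name, the 3% limit and the sample voltages are hypothetical, and commercial systems typically also track internal impedance and temperature.

```python
from statistics import median

def suspect_batteries(voltages, limit_pct=3.0):
    """Return indexes of batteries whose voltage deviates from the string median."""
    mid = median(voltages)
    return [i for i, v in enumerate(voltages)
            if abs(v - mid) / mid * 100 > limit_pct]

# Hypothetical per-battery float voltages for a short string, in volts.
readings = [13.5, 13.6, 13.5, 12.8, 13.5, 13.4]
print(suspect_batteries(readings))
```

The median is used instead of the mean so that one badly deteriorated battery does not pull the reference value toward itself.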

Batteries need maintenance, testing and replacement more than any other power-related component. Depending on the type of battery -- VRLA, wet cell or NiCad -- testing should be done quarterly or semi-annually, and annually at the very minimum. Unless there is an allotted budget for the procedure, it is often deferred or ignored. It is worth noting that, statistically speaking, battery failure is the most common cause of downtime other than human error.
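
The minimum inspection and testing intervals mentioned in the component sections above can be collected into a small schedule and checked against the date each task was last done. The following Python sketch is one hypothetical way to do so. The item names, the function name and the sample dates are illustrative, and treating "semiannual" as 182 days and "annual" as 365 days is a simplifying assumption.

```python
from datetime import date, timedelta

# Longest allowed gap per task, in days, based on the intervals stated in the text.
MAX_INTERVAL_DAYS = {
    "main utility power panel": 365,       # no less than annually
    "ATS inspection": 182,                 # quarterly or semiannually
    "main power distribution panel": 365,  # annually at a minimum
    "UPS physical maintenance": 365,       # annually at a minimum
    "battery testing": 365,                # annually at the very minimum
    "diesel fuel quality check": 182,      # semi-annually
}

def overdue(last_done, today):
    """Return tasks never recorded or last done longer ago than the allowed interval."""
    return sorted(
        item for item, limit in MAX_INTERVAL_DAYS.items()
        if item not in last_done or today - last_done[item] > timedelta(days=limit)
    )

# Hypothetical service history; the diesel fuel check has no record at all.
last_done = {
    "main utility power panel": date(2024, 1, 15),
    "ATS inspection": date(2023, 12, 1),
    "main power distribution panel": date(2023, 9, 1),
    "UPS physical maintenance": date(2023, 6, 1),
    "battery testing": date(2024, 3, 1),
}
print(overdue(last_done, today=date(2024, 7, 1)))
```

Tasks with no recorded date are reported as overdue, on the view that an undocumented inspection cannot be relied upon.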

Load Testing

Load testing is usually performed at the initial commissioning of the data center. Typically, it covers all the critical areas in the power path described above. However, once a site is operational, it is difficult to perform load testing without interrupting power, unless it is a tier 3- or 4-level facility. Opinions on the necessity for continued load testing are mixed. Purists will insist that it should be performed regularly. Some larger sites even have load banks onsite and they may be pre-wired into key points in the electrical system.

Other data center operators see load testing as unnecessary and, under normal conditions, as an additional exposure to failure that should be undertaken only if a piece of equipment is suspect or has been replaced. This is especially true for smaller tier 1- and 2-type sites, where the load banks need to be rented and temporarily wired into panels. Of course, in those cases, the critical load must have another source of power, and the switchgear must be already in place to bridge the power without dropping the load, or it must be shut down during the load test.

One of the more debated issues is runtime testing the battery banks, either directly or while powering the load bank from the UPS, because each full runtime discharge diminishes the working life and capacity of the cells. Even after a successful load test, a single cell can fail the next day, and if utility power is lost, the critical load will be dropped. The only way to mitigate this potential exposure is by having multiple battery strings.

Planning, documentation, training and supervision

Needless to say, this article provides only a top-level view of data center backup power maintenance issues. Actual maintenance procedures vary according to each manufacturer’s service recommendations and requirements and should only be performed by properly trained service personnel. Moreover, key data center staff, such as shift supervisors, should also observe normal maintenance that is performed by outside vendors and in-house technical resources to ensure procedures are followed. Staff should be familiar with, and even able to perform, some basic and emergency procedures, such as manual operation of equipment, starting the generator, ATS power transfers and operation of the UPS bypass gear.

These procedures should be well documented, reviewed and updated as needed. Equipment vendors or service personnel should conduct training, as well as semi-annual or annual refresher courses. In fact, the ability of in-house staff to properly manually operate critical bypass gear may help avert a downtime incident. Properly documented procedures and supervision by on-site personnel may also avert a total data center shutdown.

This circumstance can arise if it becomes necessary to stop improper maintenance by new service personnel who are not fully familiar with the site’s equipment and systems. Emergency procedure documents should be readily available and accessible to key personnel. Documents should contain clearly labeled photos of the equipment’s controls, along with instructions on the exact sequence of operation and emergency use. Also consider having one- to two-page emergency procedure cards that can be posted at or near the UPS-MBP and that also include information for manually operating the ATS.

The quality and frequency of maintenance is sometimes based on the size of the data center and facilities department. Facilities staff are often far more sophisticated if the organization is running a dedicated data center. Alternatively, a facilities department supporting a 2,000 square-foot data center in a large mixed-use building may not be as sensitive to some of these specialized data center requirements, because the emphasis and expectations are more often based on the building's systems. The overall culture and training level of the facilities staff makes a huge difference. Also, because many maintenance procedures are contracted out to either the equipment manufacturers or one or more service contractors or subcontractors, it is imperative that someone from the organization’s own management team be aware of the scheduling, what work is being performed by whom, and who is supervising it.

Each data center site may differ in the types of equipment and maintenance requirements, yet all sites need to have preventive services that don’t affect the operation of the IT equipment. Some managers try to avoid full failover testing and major maintenance of critical systems, as it could potentially go wrong. This simply moves the limited (and presumably avoidable) known risk on the planned maintenance day to the unknown exposure on the other 364 days of the year.

By avoiding or deferring maintenance, IT personnel could be exposing the data center to downtime from a variety of malfunctions that go unnoticed on normal power but cause a failure during a utility outage. Proper training, planning, supervision and documentation of maintenance procedures, as well as upper management support, are crucial to ensuring that a normal scheduled event doesn’t turn into a downtime debacle.


Resources from Schneider Electric

Power Monitoring for Modern Data Centers

Switchgear Design Impacts the Reliability of Backup Power Systems

Low Voltage Circuit Breaker Guidelines for Data Centers

About Schneider Electric

Schneider Electric delivers engineered solutions designed to increase safety, lower life cycle cost and maximize power system reliability. Whether you require a new data center installation, refurbishment, replacement, or recommendations for optimizing existing equipment, our nationwide network of qualified experts provides the expertise and accessibility necessary to deliver a complete solution specific to your needs.