White paper

Command & Control for Data Centers

Your business technologists. Powering progress

Command & Control for Data Centers


This whitepaper presents possible future developments of Data Centers and the improvements made possible by bringing in techniques from industrial process control. It elaborates on the Green for IT concept highlighted by the Atos Scientific Community in Journey 2014. After a short description of the equipment involved in the operation of a Data Center, the focus is put on growing electricity consumption, its rising cost, and the limited metering and monitoring capabilities deployed to date. Global solutions based on sensors and actuators are discussed, along with their integration into state-of-the-art Data Centers and their benefits. The paper then presents current solutions and future trends aimed at reducing power intake or reusing the heat produced, supported by the basic physics required to achieve that.

An industrial approach to Data Center management


Contents

Data Centers: A Thriving Species / A Voracious, Growing Beast
The history of Data Centers and how they have developed up to the modern day. The importance of keeping a Data Center safe and operational is also considered. A look at the energy efficiency of IT infrastructures, including details of recent studies.

Energy Monitoring
The need for finer-grain metering of power consumption in Data Centers, as well as monitoring of other physical values, such as humidity, smoke, and water presence.

Command & Control: An Industrial Approach to Thermal Regulation
How Data Centers can be made more efficient via global monitoring, and potential solutions to achieve this.

Steps Towards Implementation and New Horizons
An explanation of the benefits of Command & Control for Data Centers.

Reducing Energy Consumption: Recent and Upcoming Trends
Proven ways to minimize power consumption in Data Center infrastructures, in addition to a consideration of other areas, such as servers, IT services management, cooling and containers.

Reusing Heat to Reduce Waste
Putting produced heat to use in other ways, which requires the full spectrum of Command & Control capabilities.

Atos Position on Data Center Management
Atos activities in the field of Data Center management, including supporting a cloud strategy, creating greener Data Centers, and developing Command & Control services further.

Conclusion and References
A summary of issues and ideas covered in the paper and a list of sources for further reading.

Appendixes
Appendix 1: Physics Background on Energy Use and Transfer. Appendix 2: Future Trends in Energy Saving.

About the Authors
This whitepaper was authored for the Scientific Community by Ana Maria Juan Ferrer, Head of the Service Engineering and IT Platforms (SEITP) Lab at Atos Research and Innovation (ana.juanf@atos.net); Jérôme Brun (jerome.brun@atos.net), VP for Atos Cloud Services offerings; and Mathieu Peyral, former Head of the Control and Command track in the Scientific Community; with valuable contributions from Mick Symonds, Principal Solutions Architect responsible for the creation and documentation of the Atos vision and strategy on Data Centers and Cloud (Mick.symonds@atos.net), and Chee Tan, Solution Manager in Managed Services China (chee.tan@atos.net).


Data Centers: A Thriving Species


When the computer industry first appeared, the Data Center was the room which housed the mainframe, along with its supporting systems, and was populated by the specialists who operated the system. Consider for instance that ENIAC, the first general-purpose electronic computer, occupied more than 60 m² of floor space. Electronics miniaturization made the following generations of computers smaller and smaller, until both the processing power and storage fitted into a small case, leading to the personal computer boom of the 1980s. At that point, Data Centers shrank to occupy a room or two in an office building, accommodating the different computers needed to run the business, usually of different brands and configurations.

The complexity of managing those systems, worsened by their heterogeneity, was a driver for the consolidation seen in the 1990s. Fuelled by growing networking capabilities, servers were gathered in corporate Data Centers where they could be professionally managed. Standardization became the norm and economies of scale were encouraged. Look inside one of the thousands of modern Data Centers found across the world and you will see row after row of identical-looking racks, connected together by kilometers of structured wiring. That is assuming you were allowed inside; physical security has become an essential component of Data Center management.

The description above is actually of a data room that hosts IT equipment. A Data Center is usually a room on a raised floor that lets air flow underneath and through grated tiles. There is also computer room air conditioning (CRAC), as well as an external chiller, cooling towers, access control systems, and uninterruptible power supplies (UPS). Plugging such computers into a standard wall-mounted electrical socket, directly tied to the grid, would not do the job, as a power cut would turn business-critical servers off immediately, along with all the other equipment in the building.
This sort of downtime is simply not compatible with the Service Level Agreements (SLAs) reached with Data Center customers. Multiple steps ensure that the power supply will continue even in cases of external failure. An uninterruptible power supply can provide power for a limited period; generators, usually diesel-fuelled, will handle longer outages. Power distribution units (PDUs) dispatch the electricity to the individual racks, via fixed power rails. In higher SLAs, more redundant units are needed to guarantee uptime, up to the Tier-IV requirement of not having any single point of failure. Confusingly, this is known in electrical engineering as the N-1 constraint: the system should continue to operate under normal conditions even with the loss of any single element; but in Data Centers it is known as N+1: having one more element than is normally required.
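As a back-of-the-envelope illustration of the N+1 rule, the sketch below (with illustrative unit capacities and loads that are not from the paper) sizes a bank of UPS modules:

```python
import math

def units_required(load_kw, unit_capacity_kw, redundancy=1):
    """Units needed to carry the load (N), plus `redundancy` spares so that
    any single failure still leaves N units available (the N+1 rule)."""
    n = math.ceil(load_kw / unit_capacity_kw)
    return n + redundancy

# Example: 750 kW of IT load served by 200 kW UPS modules.
# N = ceil(750 / 200) = 4 modules to carry the load; N+1 = 5 installed.
print(units_required(750, 200))  # 5
```

Higher tiers extend the same idea to every element in the power and cooling chains, up to Tier-IV's requirement of no single point of failure.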

A Data Center is usually a room on a raised floor that lets air flow underneath and through grated tiles.


[Diagram: cross-section of a computer room, showing building supply and return air circulating above the floor slab and through the rows of racks.]

Computers not only consume power, they also turn it into heat, which has to be dissipated in the atmosphere in order to avoid turning the Data Center into a large furnace. The most common setup is to perform cooling in two stages: cold air is blown by CRACs into the computer room; it flows through the racks and exits the room at around 45 °C. It is then driven through a heat exchanger outside the room. The secondary circuit, usually using water as the exchange fluid, captures the heat, thus cooling the air from the primary circuit and making it ready for a new cycle. The water then releases the heat to the atmosphere through the cooling towers. The system is, in effect, like an enormous glorified domestic refrigerator. There are variations in the process, from the control of the airflow to how close to the servers the water is allowed to come. Inside the computer room, the most common arrangement is one of hot and cold aisles, with an air exhaust every even row and an intake every odd row of racks.
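The heat-removal cycle above obeys simple calorimetry: the airflow needed scales with the heat load and the temperature rise across the racks. A minimal sketch, using standard air properties rather than figures from the paper:

```python
# Rough airflow needed to remove a given heat load (a sketch; the constants
# are standard air properties, not figures from the paper).
AIR_DENSITY = 1.2   # kg/m^3, at roughly room temperature
AIR_CP = 1005.0     # J/(kg*K), specific heat capacity of air

def airflow_m3_per_s(heat_load_w, delta_t_k):
    """Volumetric airflow so that air entering cold and leaving delta_t_k
    warmer carries away heat_load_w of server heat:
    Q = rho * V * cp * dT  =>  V = Q / (rho * cp * dT)."""
    return heat_load_w / (AIR_DENSITY * AIR_CP * delta_t_k)

# A 20 kW row of racks with a 12 K air temperature rise needs ~1.4 m^3/s.
print(round(airflow_m3_per_s(20_000, 12), 2))
```

The same formula explains why a smaller temperature rise (tighter aisle containment lost to bypasses) demands proportionally more fan power.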

The more power is drawn by the servers, the more heat is produced, and the more cooling is required to move it outside and maintain an adequate temperature in the room. Cooling systems also use electricity, and need a more complex and intricate design as power density increases, making the total power requirement a nonlinear function of server consumption. A key measure is the amount of electrical power, expressed in Watts, needed per square meter of floor space¹. A free-flowing air setup could meet the requirements of a 100 W/m² room, whereas the current standard specification by The Uptime Institute of 1,000 to 1,500 W/m² will likely require careful ducting and removal of bypasses, depending on how systems are deployed. Hardware suppliers have recently developed systems up to and above 5 kW/m² (Symonds, 2009).
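Using the definition from the footnote (IT equipment power divided by rack plus clearance floor area), power density can be computed directly; the rack dimensions below are illustrative assumptions:

```python
def power_density_w_per_m2(it_power_w, rack_area_m2, clearance_area_m2):
    """Power density as used in this paper: IT equipment power divided by
    rack plus clearance ground surface (one of Rasmussen's definitions)."""
    return it_power_w / (rack_area_m2 + clearance_area_m2)

# A 6 kW rack on a 0.6 m x 1.2 m footprint with ~3.3 m^2 of clearance:
density = power_density_w_per_m2(6_000, 0.72, 3.28)
print(round(density))  # 1500 W/m^2, at the top of the Uptime Institute range
```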

The demand for remote computing power, and thus for floor space, has been growing steadily since 2003. The advent of cloud services, and of Software as a Service (SaaS) in particular, is likely to increase that need: software that once ran on the end-user's computer will now be using central processing unit (CPU) cycles somewhere on a server farm. Another trend is for companies to become risk-averse and therefore take the necessary steps to mitigate those risks. In the IT domain, risks related to data loss or unavailability are therefore transferred to the Data Center operator and insured through the SLA. Big players, such as Facebook and Google, are already building huge Data Centers to meet this demand.

¹ This encompasses multiple definitions depending on the devices in scope (racks only, or cooling equipment as well) and the floor space being considered (rack only, clearances, total Data Center surface), as defined by Rasmussen (2005). Figures given in this document are for IT equipment power, divided by rack and clearance ground surface space.


A Voracious, Growing Beast


Information and communication technology (ICT) is omnipresent in a diverse array of economic, industrial and social activities. It has been reported on many occasions that servers alone account for two percent of CO2 emissions in Europe, and the whole IT environment accounts for about eight percent of all electricity currently being used in the region. According to Forrest, Kaplan and Kindler (2008), the world's 44 million servers account for 0.5 percent of all power consumed, and Data Centers' CO2 emissions are approaching the levels of entire countries' emissions, such as those of the Netherlands, which had a population of around 16 million inhabitants in 2010². Figures are expected to continue rising and to quadruple in less than 10 years (by 2020).

Figure 1: Comparison of Data Centers carbon dioxide emissions, from Forrest, Kaplan and Kindler (2008).

[Charts: data centers' large carbon footprint. One panel compares carbon dioxide (CO2) emissions as a percentage of the world total, by industry (data centers, airlines, shipyards, steel plants); another shows emissions from data centers worldwide, in metric megatons of CO2, growing from 2007 to 2020 at a CAGR above 11%; a third compares them with annual CO2 emissions by country for Argentina, the Netherlands, and Malaysia. Data centers' emissions are now approaching those of Argentina or the Netherlands.]

(Note to Figure 1: including custom-designed servers (e.g., Google, Yahoo), consumed and embedded carbon; CAGR = compound annual growth rate. Source: Advanced Micro Devices; Financial Times; Gartner; Stanford University; Uptime Institute; McKinsey analysis.)

Enhancing the energy efficiency of Data Centers is therefore important, not only to reduce power consumption, but also to stimulate the development of a large leading-edge market for ICT-enabled energy-efficiency technologies that will foster the competitiveness of the industry and result in new business opportunities.

Greenpeace (2010) has published a study analyzing and comparing the effectiveness of the power usage of large cloud Data Centers which offer access to popular Internet services, such as Google or Facebook (Figure 2). Google, the most-used Internet service, provides a very good example for understanding energy usage and the carbon footprint of Internet services. It is estimated that Google manages over one million servers and processes one billion search requests daily. The operation, production, and distribution of these servers produce huge amounts of carbon dioxide, assessed by Gombiner (2011) as one gram of CO2 per search, amounting to a daily total of one thousand tons of CO2. This is just one example, but it gives an order of magnitude for people's day-to-day Internet activities and the use of ICT, as well as the services companies such as Atos provide to customers. Another example is the avatar of someone playing an online game, which produces as much CO2 as the average real-life Brazilian.

Figure 2: Comparison of significant cloud Data Centers, from Greenpeace (2010).

[Table: for each facility compared in the study (Lenoir NC, The Dalles OR, Apple's NC site, Chicago IL, San Antonio TX, Lockport NY, and La Vista NE), the columns give square footage (from 190,000 to 700,000 sq ft), estimated number of servers, estimated power usage effectiveness (1.16 to 1.22), the share of dirty energy generation (coal and nuclear) in the local grid, and the share of renewable electricity in the local grid (1.1% to 27.7%).]
² 16,590,965, according to http://www.indexmundi.com/netherlands/population.html


The Smart 2020 report shows that only about half of the energy consumed by Data Centers powers the servers and storage; the rest is needed to run backup and uninterruptible power supplies (5%) and cooling systems (45%). Another example, from data provided by APC, Intel and Forrester, assumes only 30 percent for IT equipment, and nine percent for CPUs.
Figure 3: Composition of the Data Center footprint. Source: Smart 2020 report.

[Charts: the 2002 footprint (100% = 76 MtCO2e) and the projected 2020 footprint (100% = 259 MtCO2e), each broken down into volume servers, mid-range servers, high-end servers, storage systems, cooling systems, and power systems.]

In many current Data Centers the actual IT equipment uses only half of the total energy consumed, with most of the remaining energy required for cooling and air movement. This often results in poor power usage effectiveness (PUE) values and significant CO2 emissions. PUE is the measure of how much overhead is required to power the ancillary equipment, such as UPSs and coolers, above the load of the IT equipment itself. For this reason, issues related to cooling, heat transfer, and IT infrastructure location are more and more carefully studied during the planning and operation of Data Centers.

The cooling and heat transfer processes are not the only important aspects influencing the energy efficiency of Data Centers. Actual power usage and the effectiveness of energy-saving methods depend heavily on the types of IT applications and workload properties. However, to take full advantage of these methods, (i) application power usage and performance must be monitored in a fine-grained manner, and (ii) parameters and metrics that characterize both applications and resources must be precisely defined. Consequently, there are a large number of parameters that may impact the energy efficiency of IT infrastructures, all of which should be taken into account during the design and configuration of Data Centers. Issues such as types and parameters of applications, workload and resource management policies, scheduling, hardware configuration, metrics defining the efficiency of building blocks, hot/cold aisle design, and energy re-used by facilities connected to IT infrastructures are all critical to understanding and improving the energy efficiency of Data Centers. To study these issues carefully, simulation, visualization, and decision-support tools are needed that will help in the optimization, design, and operation of new energy-efficient modular IT infrastructures and facilities.
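PUE as defined above is a simple ratio, sketched here with illustrative power figures:

```python
def pue(total_facility_power_kw, it_power_kw):
    """Power Usage Effectiveness: total facility power divided by the power
    delivered to the IT equipment. A PUE of 1.0 would mean zero overhead."""
    return total_facility_power_kw / it_power_kw

# If IT gear uses only about half the energy drawn, PUE is around 2.0;
# the best facilities in the Greenpeace comparison report roughly 1.2.
print(pue(2_000, 1_000))  # 2.0
print(pue(1_200, 1_000))  # 1.2
```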

Only about half the energy consumed by Data Centers powers the servers and storage.


Energy Monitoring
Power provisioning, in particular, requires careful consideration during the strategic planning phase. Rating it too high will cost extra, as all support equipment (UPSs, CRACs, etc.) and external chillers must be sized accordingly. Rating it too low means that the power supply will be exhausted with room space still remaining, resulting in wasted floor space. Factoring in the initial error margin, the latter choice is usually the more financially sound one, since the infrastructure to deliver power costs around three times as much as the space, as described by Symonds (2009).

While the total amount of power consumed by the Data Center is well known, the information at a finer granular level is rarely observed, although this is increasing. Invoicing by power used, instead of by space, is beginning to be seen above a certain power density, and is a driver for installing sub-metering capabilities, either at rack or cluster level. Meters are currently read manually; the upcoming generation of smart meters will report directly to the enterprise resource planning (ERP) system for billing purposes. Finer-grain metering, at the computer or virtual machine level, has not been achieved on a commercial scale, yet could become the next paradigm for diagnosing a Data Center and improving operational efficiency. With ever-increasing power density, the electricity bill has become an important operational expense for a Data Center (see Figure 4), triggering initiatives to identify and sort out power losses. The Green Grid has published recommendations in Green Grid #7, advocating instrumentation through smart power strips or new consumption-aware servers. The managers most effective at reducing energy consumption are those with a clear view of where the energy is going, and with clear objectives of what can be done. Moreover, they have the information to fine-tune energy consumption when problems arise or workloads change.

Power is not the only physical value in need of supervision. All-in-one solutions have been monitoring computer rooms for more than twenty years, helping avert the most frequent hazards. Overheating can initiate a safety shutdown in servers, or permanently damage them. High humidity can cause water to condense on circuits, while low humidity can result in electric arcing. Sensors continuously measure and report temperature, humidity, and smoke and water presence to the monitoring solution, which can trigger an alert if an emergency condition is met. Advanced products can also be accessed through the Internet, and communicate via SMS and email. However, only a handful of monitoring products feature power usage in their capabilities, and none so far goes beyond triggering an alert to commanding in-room devices to maintain optimal conditions.

Figure 4: Breakdown of Data Center operating expenses.
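The alerting behaviour described above can be sketched as a simple threshold check; the bands below are illustrative, not ASHRAE's exact figures:

```python
# Sketch of the alert logic a monitoring product might apply to sensor
# readings (the threshold bands are illustrative assumptions).
THRESHOLDS = {
    "temperature_c": (18.0, 27.0),          # allowable room temperature band
    "relative_humidity_pct": (20.0, 80.0),  # avoid condensation and arcing
}

def check_reading(metric, value):
    """Return an alert message if the reading leaves its allowed band."""
    low, high = THRESHOLDS[metric]
    if value < low:
        return f"ALERT: {metric}={value} below {low}"
    if value > high:
        return f"ALERT: {metric}={value} above {high}"
    return None  # within range: no alert

print(check_reading("temperature_c", 31.0))
print(check_reading("relative_humidity_pct", 45.0))
```

Note that this logic only raises alerts; commanding in-room devices to correct the condition is precisely the step current products do not take.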

With ever-increasing power density, the electricity bill has become an important operational expense for a Data Center.

[Chart: costs of a current-standard and a high-density Data Center, expressed as a percentage of the standard DC's total cost, split into building, installations, power, and running costs.]


Command & Control: An Industrial Approach to Thermal Regulation


Command & Control (C&C), as presented by the Atos Scientific Community (2011), was introduced to address a need to keep a physical process, like a chemical reaction or power generation, under control. It encompasses a set of devices manipulated automatically or by an operator to steer a system in order to produce a desired outcome. Systems range from the very simple (a thermostat for an oven or room heating) to the very complex (a nuclear power plant with sensors by the hundreds of thousands).

With this approach, Data Centers, from their physical standpoint, and computer boards and circuitry are seen as very expensive, electrically-powered heaters. The expected outcome is to maintain the physical parameters of the room, and psychrometrics in particular, within an allowable range, as recommended by the American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE).

Most Data Centers feature minimal C&C operations, with actuators coupled to their associated sensor. Thermostats can regulate CRACs depending on measured and desired temperature, especially the newer types with variable-speed fans. Hygrometers drive condensers or humidifiers, depending on the measured humidity. Such solutions perform local optimization (achieving an optimum environment locally around the unit) and can suit small rooms. For larger Data Centers, however, this becomes inappropriate: devices that do not know of each other's existence could end up pushing in opposite directions, especially as far as humidity is concerned. Such a condition can be avoided through global optimization, which can be achieved by coordinating the actuators with C&C. This is guaranteed to achieve a solution that is at least as good as that of local optimization, and usually better.
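The difference between local and global optimization can be illustrated with a toy humidity example (device names and readings are invented): two units trusting only their own sensors fight each other, while a coordinating controller acts on the aggregate view.

```python
# Two CRAC units, each with its own relative-humidity sensor (%).
readings = {"crac_A": 38.0, "crac_B": 62.0}
TARGET = 50.0

def local_actions(readings):
    """Local optimization: each unit reacts to its own sensor, so unit A
    humidifies while unit B dehumidifies, wasting energy in both."""
    return {name: ("humidify" if rh < TARGET else "dehumidify")
            for name, rh in readings.items()}

def global_action(readings):
    """Global optimization: a coordinating controller considers the room
    average before commanding any actuator."""
    avg = sum(readings.values()) / len(readings)
    if abs(avg - TARGET) < 2.0:
        return "hold"  # the room as a whole is fine: no opposing actions
    return "humidify" if avg < TARGET else "dehumidify"

print(local_actions(readings))  # the two units push in opposite directions
print(global_action(readings))  # 'hold': the average is already on target
```

A real controller would also act on air mixing to even out the local disparity, but the point stands: the global view never does worse than the local one.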

Devices that do not know of each other's existence could end up pushing in opposite directions.

Figure 5: Psychrometric chart, with ASHRAE recommendations highlighted. Original chart from Wikipedia.
[Psychrometric chart: dry bulb temperature (°C) plotted against relative humidity (10% to 100%) and specific volume (m³/kg dry air), with the ASHRAE-recommended operating envelope highlighted.]


The Atos Scientific Community (2011) describes how modern C&C architectures feature the notion of a model, which is an abstract and often simplified view of the status of the controlled system. Proper use and interpretation of that model will help perform the appropriate actions required to maintain the psychrometric balance of the room. Large Data Centers are proportionally fitted with more sensors than small ones: three temperature values instead of one are not of much use to a CRAC with a single setting, whereas they could make for a more detailed and more accurate model, leading to finer control. The system under supervision can also be extended to cover the whole Data Center, not just the computer room, so it would also regulate the heat exchangers. Global monitoring of the center makes its operation more efficient and safer, by detecting anomalous conditions in real time and by reacting appropriately to emergency situations, such as a fire breaking out or a ruptured coolant hose.

Computers kept in the room are not the dumb heaters that the physics rules would describe. They have their own operational rules and constraints, acting at a logical level rather than a physical one, and providing software solutions to manage that logical level. A communication channel between the computers or blades and the C&C could be mutually beneficial. A first level of interaction is to tap into the embedded sensors: CPU and board temperature and fan speed can be used as inputs for the model. An overheating CPU could be detected, and countermeasures initiated (adjusting the air-conditioning or modifying air flow) before built-in security is invoked, thus averting an emergency shutdown. Such monitoring need not be limited to the present status: a forecast of the server load could help the C&C system to anticipate a processor-induced heat wave and get through it at minimal operational cost.

Linking computer management and room control more closely, by giving the latter the capability to command the former (using the computers as actuators too), should be undertaken with great care, as it means the C&C has to be able to take the different SLAs into account and enforce them. It is exactly the opposite of the arrangement expected within IT service circles, yet it would give greater flexibility in dealing with uncommon situations, such as relocating virtual machines to a CPU away from a hot spot, or initiating selective shutdowns in case a fire breaks out, and then being able to slow down air circulation to avoid fanning the flames.

Beyond real-time day-to-day operations, model and sensor data history can be exploited for analytical purposes, such as reviewing the Data Center, deciding where to add racks, preparing a retrofit or maintenance operation, or making sure that cooling devices are correctly sized. Such plans can even be devised before the Data Center is actually built: computational fluid dynamics (CFD) software simulations are already used to assess airflows, optimize layouts, and determine cooling needs. Coupling these with the C&C model would allow the stability of the whole system to be tested, its response to events fine-tuned, and its resilience to extreme situations measured. C&C therefore has its uses as early as the strategic planning phase.
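The forecast-driven anticipation described above can be sketched as follows; the heat coefficient and safety margin are illustrative assumptions, not a validated model:

```python
# Sketch: using a server-load forecast to pre-cool before a heat wave
# arrives (the model coefficients below are illustrative assumptions).
HEAT_PER_CPU_PCT = 0.12   # assumed kW of heat per % of CPU load, per row
COOLING_MARGIN_KW = 2.0   # assumed safety margin

def cooling_setpoint_kw(current_load_pct, forecast_load_pct):
    """Size cooling for the worse of the current and forecast load, so the
    room is already cold when the processor-induced heat wave arrives."""
    worst = max(current_load_pct, forecast_load_pct)
    return worst * HEAT_PER_CPU_PCT + COOLING_MARGIN_KW

# Load is 40% now but forecast to hit 90% in the next interval:
print(round(cooling_setpoint_kw(40, 90), 1))  # cool for the peak, not the present
```

A reactive controller would wait for the temperature to rise before responding; the forecast lets the C&C system ride through the peak at minimal operational cost.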

[Diagram: the C&C loop for a Data Center. Sensors (temperature, humidity, power, wind/airflow, smoke, motion/cameras, oxygen levels, board sensors) report to a controller, supervised by an operator; the controller drives actuators (CRACs, humidifiers, chiller, air mixture control, fire control, alarm) acting on the Data Center.]


Steps Towards Implementation and New Horizons


In the industrial world, the provision of C&C capabilities to plants is usually planned from the beginning, or added during major maintenance operations. Deployment of C&C deeply impacts the system in operation, whether it is a factory or a Data Center, and requires asset integration at multiple levels:

- Sensors for each physical value that needs to be measured: temperature, humidity, airflow, etc.
- Actuators (CRACs, humidifiers, chillers) that can be remotely controlled.
- Links (the physical layer from the open systems interconnection (OSI) model), to convey information from the sensors to the controller, and from the controller to the actuators. Wireless (ZigBee, Bluetooth, Wi-Fi, etc.) is a possibility, but requires careful setup to avoid conflicts. Using existing network cables may be incompatible with the SLAs, and can also become a security issue. A dedicated network is often a more appropriate solution.
- Communication protocols: the controller has to understand all the protocols (SSI, SensorML, AS-i, etc.) used by the different devices. This can influence the choice of these devices.
- Dedicated hardware for the controller.
- Tailored software that can account for CFD, as mentioned above.
- Integration with asset management.

Because of the large impact on hardware, it is usually impractical and not financially sound to retrofit existing Data Centers. Considering the novelty of the approach, and the delay between the initial planning and the moment a Data Center becomes operational, only very recent Data Centers are likely to have embedded C&C.

The status of IT equipment is monitored at all times using asset management tools, which are deployed in existing Data Centers. They inform the service desk of incidents, such as an out-of-order machine that requires immediate replacement. Integrating this service with C&C can help anticipate such incidents minutes before they actually happen: the detection of anomalous patterns, such as levels of heat in a server not being correlated with its CPU load, or a malfunctioning fan, would trigger automatic actions: transfer the load to another machine, shut the faulty one down, and issue an alert to have it sent for repair. Atos has implemented predictive maintenance for power plants, using C&C detections as input, in order to reduce downtime from both expected and unexpected maintenance. The same techniques could be put to use for Data Centers.

Energy management is one domain where C&C can make a real difference. Load balancing (matching demand and generation) is one essential feature of the electricity grid. Consumption peaks were usually addressed by increasing production; however, over the past years a new tool called Demand Response has been made available to grid operators. Demand Response consists of asking selected customers to lower their demand for a short period of time, from a few minutes to one hour. For industries, this is an opt-in scheme with financial incentives, such as lower tariffs. C&C could help Data Centers enroll in such programs, using temperature as a lever to reduce consumption: during peak hours, it would be raised to the upper part of the ASHRAE acceptable range, thus requiring less cooling and therefore consuming less electricity. Once the peak is over, cooling would be increased to return to standard conditions. This technique can also offset fluctuations of electricity prices during the day, which many retailers envision for the near future.
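The Demand Response lever can be sketched as a simple setpoint switch; the setpoint values are illustrative, chosen within an ASHRAE-style band:

```python
# Sketch of the Demand Response lever described above: during a grid peak,
# raise the room setpoint toward the top of the allowed band to cut cooling
# load, then return to normal afterwards (band values are illustrative).
NORMAL_SETPOINT_C = 22.0
UPPER_LIMIT_C = 27.0  # assumed top of the acceptable operating range

def setpoint_for(grid_peak_event):
    """Room temperature setpoint: run warm during a declared peak."""
    return UPPER_LIMIT_C if grid_peak_event else NORMAL_SETPOINT_C

print(setpoint_for(True))   # 27.0 during the peak: less cooling needed
print(setpoint_for(False))  # 22.0 once the peak is over
```

The same switch, driven by a price signal instead of a grid event, would let the Data Center follow time-varying electricity tariffs.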

Energy management is one domain where C&C can make a real difference.


Reducing Energy Consumption: Recent and Upcoming Trends


Saving energy is not only a matter of being greener, be it out of conviction or social pressure; it also means saving money. The four most effective technologies and strategies currently being implemented to minimize power consumption of Data Center infrastructures with positive results are listed below:

- Energy-efficient (blade) servers: New multicore, energy-efficient servers can bring energy use down for three reasons: they replace and consolidate many older servers; they are far more energy efficient (transactions or MIPS, millions of instructions per second, per watt) than older models; and savings in IT equipment result in upstream or downstream savings in UPS or cooling power (assuming variable control equipment).
- Overhauled Data Center cooling: The organizations that reported significant improvements in Data Center energy efficiency usually undertook a major re-evaluation of Data Center cooling at some point in the process. This involves multiple strategies, almost all of which separate hot and cool air. Some companies adopted advanced technologies such as in-row cooling, which focuses cooling where it is most needed. Many are raising the temperature of chillers and increasing cold-aisle temperatures, sometimes into the mid-20s °C (above 78 °F), which was traditionally viewed as the upper limit (this practice has been supported by the European Code of Conduct on Data Centres, and by successful trials at Google, Intel, and elsewhere³). Another successful tactic is to upgrade to energy-efficient motors and pumps.
- Free air cooling: Many energy-efficient Data Centers make use of free air cooling: if cool outside temperatures can be utilized, then less energy will be used in cooling. Data Centers with the best energy-efficiency statistics are likely to be sited in temperate Northern zones, where the temperature is below 20 °C for much of the time, although very cold air can create static/humidity balance issues.
- Monitoring and analysis of power usage, as described in chapter 3. One achievement of this technique was to identify important losses caused by the power supply units (PSUs) in each server, which fostered the adoption of higher standards, such as the 80 PLUS initiative⁴.

Solutions are considered at every scale, from the integrated circuit to the building, aimed at increasing available CPU cycles at a given power rating. The following sections show solutions that are currently being studied or deployed. More futuristic options are listed in Appendix 2.

IT Services Management
Power Management takes advantage of the hardware's ability to adapt input power depending on computational needs, which can be performed at multiple levels:

- Individual processor clocks can be slowed down so that less power is used when the processor is utilized at a low level. Green Grid #33 details the topic and explains why these measures are not more widely implemented.
- Servers can be set to idle when not in use, as long as this does not conflict with the SLA.
- Workloads can be transferred from one server to another with the sole purpose of saving energy (setting unused servers to idle). This is possible in shared cloud environments, and this policy is in place in Atos cloud platforms.

Another important contribution made by service management is Charge Back Energy Transparency. Charging IT users for kilowatts spent requires adequate equipment, is quite complex to put in place, and needs a number of hypotheses to evaluate actual user consumption. Studies have shown that consumption-based invoicing is a major factor in decreasing consumption in various domains, such as energy, water, public services, telephone, etc. The level of the decrease depends on the levers that users have over their consumption: for instance, smart grids offer multiple tariffs depending on the time period. All of this is a rather unexplored field for IT: there are few rewards for writing energy-economical applications and few penalties for unnecessary usage of IT resources. A full Google search expends about the same energy as driving a car for 1 km, an inadequately-written request to a big database can occupy several servers for one night, and a PC may be desperately running high to recover a connection lost somewhere else; but who is really accountable for this spending?
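The workload-transfer policy in the last bullet is essentially a bin-packing problem; a minimal first-fit sketch (server capacities and VM loads are illustrative):

```python
# Sketch of the consolidation policy described above: pack workloads onto
# as few servers as possible so the rest can be set to idle. Uses simple
# first-fit-decreasing bin packing; loads and capacity are illustrative.

def consolidate(loads_pct, capacity_pct=80):
    """Return per-server load lists; servers not in the list can idle."""
    servers = []
    for load in sorted(loads_pct, reverse=True):
        for s in servers:
            if sum(s) + load <= capacity_pct:
                s.append(load)
                break
        else:  # no existing server has room: power up another one
            servers.append([load])
    return servers

# Eight VMs that previously ran on eight hosts fit on two:
plan = consolidate([30, 25, 20, 15, 10, 10, 5, 5])
print(plan, "->", len(plan), "active servers")
```

In production, SLA constraints and migration costs would gate each move; the sketch only shows why consolidation frees servers for idling.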

Servers
After transistors (see Appendix 2), servers are the next battlefield for energy optimization. Norms and standards such as Climate Savers and Energy Star are helping to accelerate and control progress. Servers have to take full advantage of energy improvements at the processor level, and also improve their internal power supply chain and their own cooling system (fans). Individual server fans, acting in an uncoordinated manner, could be replaced by C&C-driven global airflow control. Rack technologies have evolved from being purely a piece of furniture to a piece of IT equipment that takes on many functions normally executed by servers, such as load balancing, storage, networking, cooling, and monitoring of power supply and energy by moving workloads between the different servers in the rack.

3 The industry-standard limits for Data Center operation are determined by ASHRAE, which has recently revised the expected limits of both temperature and humidity.
4 See http://www.80plus.org

Command & Control for Data Centers

Cooling
Water or non-conductive liquids are far more effective than air at removing heat. Some Data Centers are using water, either at component level or in rack equipment. Although effective, take-up has been low because of the fear of water being in contact with electrical equipment, technical issues involved in implementation, and capital cost.

Location Chasing Cheaper Electrons


Finding a suitable location for a Data Center means taking multiple factors into account, such as political stability, environmental hazards, availability of power supply, network connection, reasonable access, and local regulations. Making this decision is part of the strategic planning phase, which can be helped by simulations through C&C, as explained above. Cold climates have recently drawn attention as possible homelands: Iceland, Norway, Scotland, and Canada have all tried to lure Data Center builders or operators to their cooler climates, where there may also be renewable power. Data Centers in these locations will inevitably operate very cleanly, but other issues, such as proximity to customers, finding suitable staff, and connectivity, have limited take-up. There are some famous examples though, such as Microsoft's first European Data Center (50,000 m²), located in Dublin and opened in 2010. Google and Microsoft have also built Data Centers along the Columbia River to use hydroelectric power and free water cooling. There are some limitations to the use of free cooling:
- Due to variations in outside temperature throughout the year, it is generally not possible to use free cooling all year round, so conventional systems must also be in place. Of course, it is possible to limit activity in the Data Center during the hot season, as Google claims to do.
- External air may not have the appropriate hygrometry or may contain dust, so additional systems are needed.
- Free cooling by water is more efficient, but it is only attractive when the water temperature is below 12°C, which again may not be the case all year round.
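A C&C system would apply these limitations as a simple mode-selection policy. The sketch below uses the 12°C water threshold mentioned above; the air-side margin and function names are illustrative assumptions, not figures from a real controller:

```python
# Illustrative free-cooling mode selection. The 12 °C water threshold
# comes from the text above; the air-side margin is an assumption.
def cooling_mode(outside_air_c, water_c, supply_setpoint_c=18.0):
    """Pick the cheapest cooling mode available for current conditions."""
    if water_c is not None and water_c < 12.0:
        return "free-cooling-water"          # most efficient option
    if outside_air_c < supply_setpoint_c - 3.0:  # margin for heat exchange
        return "free-cooling-air"
    return "mechanical-chiller"              # conventional fallback

print(cooling_mode(outside_air_c=9.0, water_c=14.0))   # free-cooling-air
print(cooling_mode(outside_air_c=25.0, water_c=10.0))  # free-cooling-water
```

The fallback branch reflects the first bullet above: conventional systems must remain installed, because neither free-cooling mode is available all year round.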

Containers
Containers are small Data Centers embedded in modular physical environments which are self-contained and portable (by truck). They are offered by large hardware vendors such as IBM, HP, Sun, Dell, and Rackspace. Companies building huge Data Centers, like Google and Microsoft, have also designed their own containers. Containers have long been used for particular purposes, such as disaster recovery or deploying a small Data Center in a remote place, but they have recently become more widely used as a building block for large Data Centers made up only of containers. The deployment of equipment inside each container is optimized for power and cooling, making these Data Centers much more efficient than classical ones, with PUE in the range of 1.2-1.3. While the strategy is still subject to debate, the next generation of Data Centers also uses containers for infrastructural components, such as power and cooling, thereby eliminating the need for a building at all and rendering the whole Data Center a secure compound of concrete slabs.

Direct Current
From the electricity grid to the motherboard, multiple conversions are performed between alternating current (AC) and direct current (DC): UPSs store direct current, while PSUs take alternating current, rectify it, and provide direct current to the motherboards. The transformation is performed by thousands of small and inefficient items of equipment in servers and routers, and at each step some energy is lost with no benefit to IT. Feeding DC from the UPS directly to the servers, as explained in Green Grid #31, eliminates a conversion step back and forth, with an expected saving of between 10 and 30 percent. The technique is widely used in the telecommunications sector, but so far take-up for general-purpose computing has been low, with concerns about the cost of conversion, non-standard equipment, manufacturer support for the proven AC model, thick cables, and high voltages all preventing adoption. A possible hybrid solution is to perform the AC/DC conversion at the rack, or even row-of-racks, level rather than in each item of equipment.
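The saving comes from multiplying stage efficiencies along the chain. A back-of-the-envelope comparison, with stage efficiencies that are illustrative assumptions rather than measured values:

```python
# Back-of-the-envelope comparison of AC and DC distribution chains.
# Stage efficiencies are illustrative assumptions, not measured values.
from functools import reduce

def chain_efficiency(stages):
    """End-to-end efficiency is the product of all stage efficiencies."""
    return reduce(lambda a, b: a * b, stages, 1.0)

ac_chain = [0.96, 0.95, 0.85]   # UPS rectifier, inverter, server PSU (AC->DC)
dc_chain = [0.96, 0.92]         # one rectification, then DC/DC at the server

ac_eff = chain_efficiency(ac_chain)
dc_eff = chain_efficiency(dc_chain)
print(f"AC chain: {ac_eff:.1%}, DC chain: {dc_eff:.1%}, "
      f"saving: {(dc_eff - ac_eff) / ac_eff:.1%}")
```

With these assumed figures the saving lands at around 14 percent, inside the 10-30 percent range cited above; the exact value depends on the equipment used at each stage.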


Reusing Heat to Reduce Waste


So far, the strategies described have focused on reducing power intake. Another option is to consider output heat as something other than a waste product and attempt to put it to use: heat is simply another form of energy, which can be exploited directly or transformed into something else. Managing such operations requires the full spectrum of C&C capabilities.

Heat Reuse for External Buildings


A minimal effort would involve a Data Center providing nearby office buildings with heat, thus avoiding the consumption of other energy for the same purpose. However, Data Centers tend to be located outside of city zones, with few offices around. Other anecdotal applications include heating swimming pools5, arboretums, and horticultural greenhouses. Data Center heat is neither constant nor at a temperature high enough to make it suitable for more critical applications, such as housing.

A possible, but currently remote, solution could be to twin the Data Center with an Ocean Thermal Energy Conversion (OTEC) plant. The latter is a heat engine designed to extract energy from the temperature difference between the warm water at the ocean surface and the cold water at the ocean bottom. Recent designs can run on a temperature difference of as little as 20°C. The Data Center could have an open secondary circuit, sharing the cold water intake and using the hot water exhaust as an addition to the surface water for the power plant. In return, the plant would provide the Data Center with electricity. The experience gained from a failed experiment with a biomass plant (described by Symonds, 2009) suggests loose coupling, allowing one of the components to run without the other, through valves for the water pipes and a connection to the grid for electricity. However, OTEC power plants are still experimental today, so while this solution seems promising on paper, the technology needs further development, and it would require a location close to the seashore, with potential environmental hazards to be factored in.

Digital Cities
In the future, one can imagine the emergence of a digital city, a kind of ecosystem with renewable power plants, supplying power to Data Centers which in turn would supply labs with computing power and heat. More advanced cities, such as Amsterdam in The Netherlands, are already looking at how they can build a sophisticated, energy-efficient digital infrastructure into their cityscape6.

Conversion to Electricity
The heat could be converted back to electricity, which in turn could be reused directly by the Data Center. Following the second principle of thermodynamics (as explained in Appendix 1), converting heat to other forms of energy will have a yield of around 10 percent, which makes the return on investment (ROI) hard to achieve. Waste heat reuse has nevertheless been demonstrated in lab conditions by Kongtragool and Wongwises (2008) using a Stirling cycle, by Declaye (2009) with a Rankine cycle, and by Harman (2008) with an ejector heat pump, paving the way for industrial applications.

5 See http://www.geek.com/articles/news/ibm-makes-a-splash-with-data-center-heat-2008047/
6 See http://www.datacenterknowledge.com/archives/2010/06/04/ozzo-energy-focused-data-center-design/

Atos Position on Data Center Management


Atos currently operates over 40 full-function Data Centers for itself worldwide, and over 100 customer-dedicated environments.
The advent of the Global Factory in 2008 facilitated a shift from a country-based approach to Data Centers to a global strategy, with a view across all Atos Data Centers, global capacity planning, and a policy for consolidating the use of Data Centers: keeping them, closing them, or expanding their capacity. As a result, more than 10 Data Centers were decommissioned between 2008 and 2010. If Data Center capacity is developed as a company-wide resource and properly configured, it can be used for workload from any country, subject to any legal constraints regarding data location and compliance.

The basic level of such services is Infrastructure as a Service (IaaS), consisting of computing and storage on demand, linked by a structured Data Center network. This basic level can be enhanced by facilities like Platform as a Service (PaaS), to deliver capabilities such as test and development environments, or back-end facilities for distributed desktop services. These in turn can be complemented by Software as a Service (SaaS) and other added-value services, such as federated identity management and cloud integration (Symonds, 2010).

These services need to be standardized and consolidated to maintain a critical mass, but may also need to be made available in specific locations, for reasons of compliance with legal constraints, latency of connections to other systems, and/or cultural or language issues. For that reason, they are best deployed using a pattern of hubs and satellites: regional hubs delivering a full set of services in EMEA, the Americas and Asia Pacific; satellites delivering a sub-set of those services in other countries where they are needed, for a general market or specific customers.

Data Centers supporting a Cloud Services Strategy


Atos is increasingly undertaking a gradual migration of most of its installed base of systems from the traditional built-to-order model towards common utility and cloud environments. This needs to be carried out on a consolidated basis to ensure maximum flexibility and economies of scale.

Diagram - the Atos Sphere cloud deployment pattern: regional hubs in EMEA, the Americas and APAC delivering the full service set, with possible satellites (peripheral and local-country centres) delivering sub-sets of services.
Greener Data Centers


Atos has a number of key performance indicators for its Data Center efficiency which are helping create a greener environment by using less energy and fewer resources. Sub-metering has been deployed in some centers for analytical purposes, as described in chapter 3. PUE is the main indicator in this area, and there are many things which can be done to improve its value (see Symonds, 2009b). Many older Data Centers still exceed a figure of 2.0, representing an overhead of 100 percent, yet the most recent Atos Data Centers can achieve as low as 1.4. However, one should not compare these values with the figures advertised by Google or Facebook: the diversity of Atos customers' environments and their constant change requests make it impossible to match the power and cooling savings that result from the massive infrastructure homogenization of such companies, which even design and build their own hardware.

As part of a company-wide Green initiative, Atos has measured the total carbon footprint of its Data Centers, which amounted to 117,000 tons of CO2 in 2009, representing 53 percent of the whole company's CO2 emissions. This was followed by an engagement with The Carbon Neutral Company that helped to compensate for these carbon emissions via investment in a wind turbine project in the Thar Desert, India. This makes Atos the first global IT provider to offer fully integrated carbon-neutral hosting services. The company is also involved in many environmental initiatives, including a contributing membership of the Green Grid initiative.
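PUE is simply the ratio of total facility power to the power delivered to IT equipment, so the overhead figures quoted above follow directly:

```python
# PUE = total facility power / IT equipment power; everything above
# 1.0 is power spent on cooling, conversion losses, lighting, etc.
def pue(total_kw, it_kw):
    return total_kw / it_kw

def overhead_pct(pue_value):
    """Percentage of IT power spent again on non-IT overhead."""
    return (pue_value - 1.0) * 100

for p in (2.0, 1.4, 1.2):
    print(f"PUE {p}: {overhead_pct(p):.0f}% power overhead")
```

A PUE of 2.0 means every kilowatt of IT load costs a second kilowatt of overhead, while 1.4 reduces that to 400 watts and the container figure of 1.2-1.3 to 200-300 watts.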


Future Growth Expectations


While being consolidated and virtualized, the Atos installed Data Center base is also growing. Current forecasts show a pattern where the present, relatively flat volume prognosis turns to growth again in a few years, as shown below.

Chart - forecast Data Center power demand (kW), 2008-2015: outsourcing growing at 11 percent, the virtualization rate rising from 30 to 65 percent, and additional growth from cloud services.

Command & Control: Still a Long Way to Go


Basically, two management systems co-exist in Atos Data Centers:
- Energy management systems, with sensors for temperature and humidity which react in real time, taking appropriate actions through CRACs, power distribution systems, security systems for fire prevention, and all related physical-world devices.
- IT management systems, where the activity of every single IT element is monitored and incidents are followed through a service desk, with alert prevention or reaction, automation of most IT processes, and control over the logical world.

However, these two systems are loosely coupled, with only a few synchronous communications between them, and a lack of optimization. The IT industry, with Data Centers at its heart, is still quite young and does not yet have the maturity required to deploy fully integrated Command & Control systems such as those described in chapter 4. Cloud computing, with its scalable approach, will hopefully accelerate the use of optimization techniques derived from the factory world.


Conclusion and References


Conclusion
This whitepaper describes the current Data Center norm, presents existing and upcoming monitoring tools, and shows how Command & Control solutions from the industrial world could be implemented in Data Centers to influence operations, reduce and optimize electricity consumption, and improve maintenance. While Data Centers could profit greatly from the C&C approach, deploying it is a complex task that has to be planned from the beginning, meaning that existing Data Centers cannot benefit from it. It also involves a cultural change in Data Center setup and operation, away from the "if it ain't broke, don't fix it" motto that prevails in the computer world today. The adoption of C&C will be progressive and will come from newly-built Data Centers. Atos, with expertise in both domains, is an ideal partner to accelerate the convergence of the two worlds.

References
Atos Scientific Community (2011). Control-Command for Complex Systems, part of Journey 2014.
Atos Scientific Community (2011). Green IT, part of Journey 2014.
Carnot, S. (1824). Réflexions sur la puissance motrice du feu et sur les machines propres à développer cette puissance. Bachelier, Paris.
Clausius, R. (1850). Über die bewegende Kraft der Wärme und die Gesetze, welche sich daraus für die Wärmelehre selbst ableiten lassen. Annalen der Physik 155, 368-397.
Declaye, S. (2009). Design, Optimization and Modeling of an Organic Rankine Cycle for Waste Heat Recovery.
Forrest, W., Kaplan, J.M. and Kindler, N. (2008). Data Centers: How to Cut Carbon Emissions and Costs. McKinsey & Company.
Gombiner, J. (2011). Carbon Footprinting the Internet. Consilience - The Journal of Sustainable Development [Online], Volume 5, Number 1 (6 February 2011).
Greenpeace International (2010). Make IT Green: Cloud Computing and its Contribution to Climate Change.
Harman, T.D. (2008). Waste Heat Recovery in Data Centers: Ejector Heat Pump Analysis.
Kongtragool, B. and Wongwises, S. (2008). A Four Power-piston Low-temperature Differential Stirling Engine Using Simulated Solar Energy as a Heat Source. Elsevier Ltd.
Landauer, R. (1961). Irreversibility and Heat Generation in the Computing Process. IBM Journal of Research and Development 5, 183-191.
Maxwell, J.C. (1891, republished 1971). Theory of Heat. Longmans, Green and Co, London.
Rasmussen, N. (2005). Guidelines for Specification of Data Center Power Density. American Power Conversion, White Paper #120.
Shannon, C.E. (1948). A Mathematical Theory of Communication. Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October 1948.
Symonds, M. (2009). Data Centres in the early 21st Century. Atos, 2nd edition.
Symonds, M. (2009b). Greener Data Centres Cookbook. Atos, 2nd edition.
Symonds, M. (2010). Cloud: Its Potential for Business. Atos.
The 451 Group (2007). Eco-efficient IT: The Eco-imperative and its Impact on Suppliers and Users (2007-2011), October 2007.
The 451 Group (2008). Eco-efficient IT: Policy, Legislation and Compliance, November 2008.
The 451 Group (2008). MIS 2009 Preview: Eco-efficient IT, Part 2, 17 December 2008.
The 451 Group (2008). MIS Spotlight, Buying Behavior, Part 1: What is Really Driving Eco-efficient IT?, 8 February 2008.
The 451 Group (2008). MIS Spotlight, Buying Behavior, Part 2: The Future of Eco-efficient IT Procurement (2007-2011), 14 February 2008.
The Climate Group (2008). Smart 2020: Enabling the Low-carbon Economy in the Information Age.
The Green Grid, WP#7. Five Ways to Reduce Data Center Server Power Consumption.
The Green Grid, WP#31. Issues Relating to the Adoption of Higher Voltage Direct Current Power in the Data Center.
The Green Grid, WP#33. A Roadmap for the Adoption of Power-related Features in Servers.


Appendixes

Appendix 1: Physics Background on Energy Use and Transfer

Second Principle of Thermodynamics
Sadi Carnot (1796-1832) established what is known today as the second principle of thermodynamics, which implies that the maximum efficiency of a heat engine is limited by the difference in temperature between a hot source and a cold source, divided by the temperature of the hot source. For instance, if a Data Center is considered to be a heat engine, the fluids usable for a thermodynamic cycle would be at about 45°C and 8°C respectively (318 K and 281 K), meaning a thermal efficiency of at best (318 - 281) / 318 = 11.6 percent. This limits ambitions for reusing heat from Data Centers.
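The Carnot limit for the Data Center temperatures quoted above can be checked directly (a minimal sketch; the function name is ours):

```python
# Carnot limit for a heat engine between Data Center exhaust (45 °C)
# and a cold sink (8 °C), reproducing the ~11.6% figure in the text.
def carnot_efficiency(t_hot_c, t_cold_c):
    """Maximum thermal efficiency between two reservoirs, Kelvin math."""
    t_hot, t_cold = t_hot_c + 273.15, t_cold_c + 273.15
    return (t_hot - t_cold) / t_hot

print(f"{carnot_efficiency(45, 8):.1%}")  # about 11.6%
```

The formula makes the design lever explicit: the only ways to improve the yield are a hotter exhaust or a colder sink, and neither is easy in a Data Center.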


Entropy and Maxwell's Demon

Several years after Carnot's result, Rudolf Clausius (1822-1888) introduced the notion of entropy, which can be defined as the quantity of energy which is lost during heat exchange. Entropy is more or less a measure of the disorder of a closed system, and as such is related to information. This strange property gives rise to a number of paradoxes, the most famous being Maxwell's demon, invented by James Maxwell (1831-1879). Maxwell's demon is an imaginary creature that controls a door between two closed compartments A and B that both contain gas. The gas in both A and B starts out at the same temperature and pressure. The demon could (how, is the question!) open and close the door very rapidly to let the quickest, hence hottest, molecules from A flow to B, and also open and close the door to let the slowest, hence coldest, molecules from B flow to A. After a while, compartment B will be hotter than compartment A: heat transformation would have been produced in a closed system without external power input, hence contradicting the second principle.

This paradox has been the subject of discussion for almost 150 years, and it is interesting to note that research labs are currently trying to create a Maxwell's demon out of nano-devices. An update on progress was recently published by Scientific American7. Among the arguments from the defenders of the second principle, the most interesting in the context of this paper is that Maxwell's demon would need information to recognize the speed of particles, and information requires energy, as detailed below.

Information Theory

Claude Shannon (1916-2001) founded Information Theory in the late 1940s, and it has since greatly influenced many aspects of IT. Shannon established a link between the exchange of information and the changes in entropy of a closed system. Using this, Rolf Landauer (1927-1999) offered the first evaluation of the lower physical limit of the energy spent by IT: the energy efficiency of computers is limited by the fundamental von Neumann-Landauer formula (1961):

E = kT ln 2

where E is the energy dissipated per irreversible bit operation, k is Boltzmann's constant of 1.38 × 10⁻²³ J/K, and T is the temperature of the environment into which the unwanted entropy is expelled.

"Irreversible" is an important word here, since some logically reversible operations, such as the Boolean NOT, could theoretically be performed without spending energy. However, information-destroying operations such as DELETE, AND, and OR are irreversible, so the formula applies to all common software operations. It also shows the impossibility of producing information (even the smallest amount) at no energy cost. The physical limit stated in the formula is some eight orders of magnitude below the energy actually spent in computing today. Assessing which limits we can reach with current technologies remains an open question.
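Evaluating the formula at room temperature makes the headroom concrete (the 10⁻¹³ J per bit operation used for comparison is a rough assumed figure for 2011-era hardware, not a measurement):

```python
# Evaluating the Landauer limit E = kT ln 2 at room temperature, and
# comparing it with an assumed ~1e-13 J per bit operation for 2011
# hardware to recover the "eight orders of magnitude" in the text.
import math

K_BOLTZMANN = 1.380649e-23  # Boltzmann's constant, J/K

def landauer_limit(temp_k):
    """Minimum energy dissipated per irreversible bit operation."""
    return K_BOLTZMANN * temp_k * math.log(2)

e_min = landauer_limit(300.0)
print(f"{e_min:.2e} J per irreversible bit operation")
print(f"~{math.log10(1e-13 / e_min):.0f} orders of magnitude headroom")
```

At 300 K the limit is about 2.87 × 10⁻²¹ J, so even a perfect erasure-only machine at room temperature sits eight orders of magnitude below today's hardware.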

7 Raizen, M. (2011). Demons, Entropy, and the Quest for Absolute Zero. Scientific American, March 2011 issue.


Appendix 2: Future Trends in Energy Saving

Transistors
Processors, which have been based on transistors for more than 60 years, are the starting point for all the energy spent along the IT chain. Transistors have proven their efficiency in managing information, through billions of bits with two values, and Moore's law remains valid, although it will soon hit the atomic border. However, from an energy standpoint, transistors are far from optimal: they need electrical input, and they produce heat due to their extremely quick on/off switching. The undesired field effect that results from miniaturization also generates large amounts of wasted heat. In addition, energy losses occur when information is sent from one transistor to another, when it is sent between parts of a chip with different functions (such as from computing to storage), and when the data is addressed and read out to other devices, such as displays. In total, twice as much heat is dissipated per square unit on a chip (100 W/cm²) as on a hob (50 W/cm²). Projects have been launched for cooling (by fluids) at chip level, but this is quite difficult to achieve due to miniaturization. Transistor consumption, measured per elementary operation, is nevertheless steadily decreasing, as illustrated in the chart below, based on Intel data.
Chart 1 - dramatic improvements in relative energy consumption per transistor, 1970-2010: 45nm technology is a million times more efficient than 30 years ago, at only 16W quad-core idle power.

Future of Processing: Some Candidates to Replace Transistors

Nano-technologies: the future of transistors may still be found in carbon nanotubes, which limit the undesired tunnel effect. However, there are still many questions to be answered, such as the behavior of thermodynamics at this scale, noise levels outside of the binary nature of transistors, and the techniques needed for mass manufacturing.

Quantum computing is based on the behavior of photons. Through the superposition of states, elementary operations can transmit much more information than the humble transistor. As photons have no mass, no energy is required. This does not actually contradict Landauer's formula because, according to quantum theory, one has to spend energy to observe information, that is, to retrieve its value. We can nevertheless expect that the energy footprint will be closer to Landauer's value. Quantum computing is still in its infancy and has so far shown very limited concrete application.

Memristors are components with a resistance whose value lies between 0 and 1 depending on the amount of charge that has passed through, and which keep this value when not in use. Their existence was predicted theoretically by Professor Leon Chua in 1971, yet only as recently as 2008 did HP Labs announce that they are able to deliver nano-elements with this property. If proven industrially, this invention will completely change the current model of memory, since, through the combination of multiple numbers between 0 and 1, complex sets of data could be stored in a single element. The association of transistors and memristors could produce the same computing power with a reduced number of components. Since memristors need less electrical power and do not produce as much heat as transistors, this new computing paradigm would dramatically change the IT/energy ratio.

Software

Writing low-consumption software is a rather new field, which should bring much progress:
- Manage and optimize the IT resources needed for execution from the code itself; ideally an energy tag should be attached to each subprogram.
- Prevent misuse of the application in situations with heavy-consumption workloads.
- Take advantage of parallel architectures, already in place in processors and servers, by writing parallel software to be executed within those architectures. This field was very active 20-30 years ago, but has been almost abandoned because Moore's law for processors made software optimization less critical. A good example is the brain, which is a realm of parallel computing; some experimental studies have tried to measure the brain's energy consumption when performing elementary operations, and although the results are difficult to validate, they show tremendous efficiency.

A promising trend for lower consumption is the development of appliances: a combination of hardware with software optimized to run on that hardware. Appliances were mainly used for embedded systems, but they are now proposed for traditional operations. Examples are Acadia (Cisco/VMware) and Exadata (Sun/Oracle). Microsoft is also proposing an appliance to run its Azure framework independently. These appliances provide better performance for processing and power consumption than traditional solutions.
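The "energy tag" idea above could be approximated today with simple instrumentation. A hypothetical sketch: estimate per-subprogram energy from CPU time and an assumed package power (the decorator name and the wattage are our assumptions, not an existing tool):

```python
# A hypothetical "energy tag" for subprograms: estimate per-call
# energy from CPU time and an assumed average package power.
import functools
import time

ASSUMED_CPU_WATTS = 65.0  # illustrative figure, not a measurement

def energy_tag(func):
    """Accumulate an energy estimate on the decorated function."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.process_time()
        result = func(*args, **kwargs)
        joules = (time.process_time() - start) * ASSUMED_CPU_WATTS
        wrapper.energy_joules += joules
        return result
    wrapper.energy_joules = 0.0
    return wrapper

@energy_tag
def busy(n):
    return sum(i * i for i in range(n))

busy(100_000)
print(f"estimated energy: {busy.energy_joules:.4f} J")
```

Such tags would give the charge-back schemes discussed earlier something concrete to bill against, at the granularity of individual subprograms.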


About Atos
Atos is an international information technology services company with annual 2010 pro forma revenues of EUR 8.6 billion and 74,000 employees in 42 countries at the end of September 2011. Serving a global client base, it delivers hi-tech transactional services, consulting and technology services, systems integration and managed services. With its deep technology expertise and industry knowledge, it works with clients across the following market sectors: Manufacturing, Retail, Services; Public, Health & Transport; Financial Services; Telecoms, Media & Technology; Energy & Utilities. Atos is focused on business technology that powers progress and helps organizations to create their firm of the future. It is the Worldwide Information Technology Partner for the Olympic Games and is quoted on the Paris Eurolist Market. Atos operates under the brands Atos, Atos Consulting and Technology Services, Atos Worldline and Atos Worldgrid.

atos.net

Atos, the Atos logo, Atos Consulting, Atos Worldline, Atos Sphere, Atos Cloud and Atos Worldgrid are registered trademarks of Atos SA. November 2011. © 2011 Atos.