
Taming the Energy Hog in Cloud Infrastructure

Jie Liu
Microsoft Research
liuj@microsoft.com

Sensing and Energy Efficient Computing

UCSB, Nov. 12, 2014


Research at MSR
Systems research at the physical layers of computing, with applications to sensing devices, mobile and wearable computing, and efficient cloud infrastructures.

Physical awareness. Physical data. Physical interface.

- Energy-efficient computing
- Context-aware services
- Mining from physical and mobile data
- Stream data management and processing
- Embedded devices and networking
- Novel sensors, circuits, and RF communication

A Few Research Artifacts

- RF proximity sensor
- LittleRock
- Cloud-offloaded GPS
- NFC Ring
- FM-based indoor location
- Mobile app power profiling
- Windows (Phone) geo-fencing
- VanarSena: automated app execution and testing
- Store app ranking and fraud detection
- Bing local search relevance
- Genomotes
- Data center asset tracking

Data Centers: Home of the Cloud

Energy Expenditure

The IT industry is on fire!

- Constitutes about 2% of total US energy consumption
- Consumed 61 billion kWh in 2006, enough to power 5.8 million average US households
- Is the fastest growing energy-consuming industrial sector
- Has doubled every 5 years since 2001

But wait, you can't really save the earth with 2%...

Why Cloud Providers Care So Much about Energy

Cloud infrastructure is about providing agreed availability (typically 99.9%) at the lowest cost.

Total Cost = Capital Investment + Operation Cost

Microsoft's Chicago Data Center (~200,000 servers)

- 700,000+ square feet
- 26,000 cubic yards of concrete
- 3,400 tons of steel
- 2,400 tons of copper
- 190 miles of conduit
- 7.5 miles of chilled water piping
- 100+ MW power capacity
- 60 MW total critical power

Cost Analysis

- Data centers can cost between $10M and $20M per megawatt
- >80% of costs scale with power; <20% of costs scale with space
- Server costs are trending down; power costs are trending up

[Pie chart: total cost of ownership of a 1U server, split among total equipment cost per server, energy usage cost per server, datacenter capital cost per server, and datacenter operating cost per server (segments of 46%, 23%, 19%, and 12%).]

Server lifetime: 5 years. Infrastructure lifetime: 15 years.

What Is in a (Mega) Data Center?

[Diagram: power path from the power grid, with gasoline-fed backup generators, through transformer, UPS, and PDUs to the racks; cooling path from water chillers through CRAC units moving air; network connectivity to the Internet.]

PUE = Total facility power / IT equipment power
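PUE (Power Usage Effectiveness) is total facility power divided by IT equipment power; a one-line helper makes the arithmetic concrete (the 30 kW and 16 kW figures echo the waterfall numbers later in the talk):

```python
# PUE = total facility power / IT equipment power.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    return total_facility_kw / it_equipment_kw

# A facility drawing 30 kW overall to deliver 16 kW to the servers:
print(round(pue(30.0, 16.0), 3))  # → 1.875
```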

Server Power Consumption

[Chart: power consumption (watts, 0-180) vs. CPU utilization (sleep, idle, 20%, 40%, 60%, 80%, 100%) for an Intel 2-CPU 2.4 GHz server and an Intel 2-CPU 3 GHz server.]

- Idle power is a large fraction of total power consumption.
- Once active, power is almost linear in utilization.
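These two observations suggest the common linear server power model P(u) = P_idle + (P_peak - P_idle) * u; a sketch, where the idle and peak wattages are assumptions read off the chart's rough scale:

```python
# Simple linear server power model: a large idle floor, then roughly
# linear in CPU utilization. P_IDLE and P_PEAK are illustrative.

P_IDLE = 100.0   # watts at 0% utilization (idle)
P_PEAK = 160.0   # watts at 100% utilization

def server_power(utilization: float) -> float:
    """Estimated power draw (watts) at a CPU utilization in [0, 1]."""
    return P_IDLE + (P_PEAK - P_IDLE) * utilization

# Even at 0% utilization the server draws 100/160 = 62.5% of peak power,
# which is why consolidating load and turning idle servers off pays off.
print(server_power(0.0), server_power(0.5), server_power(1.0))
```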

Data Center Dynamics Over Space and Time

[Charts: login rate (per second) and number of connections (millions) over roughly a week, showing strong temporal patterns, alongside spatial and physical variation across the data center.]

Christos Kozyrakis, Aman Kansal, Sriram Sankar, and Kushagra Vaid, Server Engineering Insights for Large-Scale Online Services, in IEEE Micro, IEEE, July 2010.
Chieh-Jan Mike Liang, Jie Liu, Liqian Luo, Andreas Terzis, and Feng Zhao, RACNet: A High-Fidelity Data Center Sensing Network, in Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems (SenSys 2009), November 2009.

Traditional Operations
Over-cooled. Under-utilized.

Losses along the delivery chain: -33% cooling, -4% lighting, -15% UPS loss, -10% air handling (fans), -35% power supply, -85% underutilization, -40% inefficient applications.

Of 30 kW delivered to the data center, 16 kW reaches the servers, 9.5 kW reaches the applications, and 0.9 kW reaches the customer; only about 3% of the energy does useful work.

Source: Reinventing Fire (Lovins), 2007

Strategies for Taming The Energy Hog

REDUCE

REUSE

RENEW

Energy Reduction Tactics

- More energy-efficient design
- Energy proportionality
- Relaxed operating conditions
- Consolidation
- Infrastructure over-subscription

Understanding the Genome of Data Centers

A Data-Driven Approach for Data Center Resource Optimization

Improve data center efficiency through a closed loop:
- Measure: power, performance, networking, environments
- Model: trends, correlations, dependencies, abnormalities, bottlenecks
- Plan: capacity, provisioning, allocation, consolidation
- Control: adaptive cooling, economizers, load distribution, VM migrations
- Design: facilities, server hardware, networking, applications

Sense Everything
Asset location, heat distribution, electrical wiring, network wiring, air flow, power consumption, service types, cooling systems.

Collect, archive, and understand operations data:
- Server utilization: processor, network, storage
- Power consumption
- Server performance
- Networking
- Weather
- Electricity availability & price

From Data to Decisions
The easy. The hard. The ugly.

The Easy
Visualization and eye-balling; static provisioning; capacity planning; change management. Tools: statistics and Monte Carlo simulation.
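Static provisioning with Monte Carlo simulation can be sketched as follows; this is an illustrative toy in which the server count, per-server power distribution, and circuit capacity are all assumptions, not numbers from the talk:

```python
# Monte Carlo capacity check: estimate how often aggregate power on a
# circuit would exceed its provisioned capacity.
import random

random.seed(42)

N_SERVERS = 40
CAPACITY_W = 6200.0  # provisioned circuit capacity (assumed)

def sample_aggregate_power():
    # Each server draws 100-200 W; draws are independent here, which
    # understates the correlated spikes seen in practice.
    return sum(random.uniform(100.0, 200.0) for _ in range(N_SERVERS))

trials = 10_000
overages = sum(sample_aggregate_power() > CAPACITY_W for _ in range(trials))
print(f"P(overload) ~ {overages / trials:.4f}")
```

The same simulation, run against measured rather than assumed distributions, is what turns oversubscription from a gamble into a calculated risk.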

A Vinyl Curtains Story: Relaxing Air Conditioning

[Thermal charts before (9/28/06) and after (12/31/06) installing vinyl curtains, showing top, middle, and bottom rack sensor temperatures.]

(Static) Power Provisioning

[Chart: data center power over time, showing four levels from top to bottom: rated peak (never reached); possible peak (sum of server peaks); allocated capacity; actual power consumption (the peak of the sum is usually lower than allocated, but can exceed it).]

Microsoft's Data Center Evolution

- Generation 1 (1989-2005), ~2 PUE: colocation; server capacity; 20-year technology.
- Generation 2 (2007), 1.4-1.6 PUE: density; rack density and deployment; minimized resource impact.
- Generation 3 (2008), 1.2-1.5 PUE: containment; containers, PODs; scalability & sustainability; air & water economization; differentiated SLAs.
- Generation 4 (2011+), 1.05-1.20 PUE: modular; ITPACs & colos; reduced carbon, rightsized; faster time to market; outside-air cooled.

The Hard
Single-domain modeling. Clear hierarchy. Dynamic control.
Topics: power capping & tracking; dynamic provisioning; load placement.

Power Capping
- Data centers oversubscribe their power capacity.
- Statistically speaking, the aggregated power will not exceed circuit capacities.
- In rare events when power exceeds capacity, server activities must be capped.

Power Capping & Tracking

[Charts: data center power over time against a cap; lead-acid battery charging curve (power in watts over ~15 hours).]

Power tracking scenarios:
- Utility power price changes
- Battery charging after an outage

Actuation:
- DVFS
- Shut down unimportant servers/tasks

Dynamic Provisioning
The number of active servers follows the workload.

[Architecture: clients send login requests to a dispatch server, which picks a connection server (CS) based on load reporting; connection servers talk to backend servers for authentication, address book, etc.]

Load Forecasting
Seasonal data regression: combine long-term dependency (e.g., the same Monday two weeks ago and last Monday) with local adjustments from the most recent observations (e.g., the 11:30 and 12:00 readings today).

[Charts: observed vs. forecasted number of connections and login rates (per second) over about a week.]

- 5 weeks of data for training, 1 week of data for validation
- Forecast every 30 minutes
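A minimal sketch of the seasonal-regression idea; the feature choice (same slot in prior weeks plus the latest deviation) and the toy workload are illustrative assumptions, not the model from the talk:

```python
# Seasonal forecast sketch: a long-term baseline (same half-hour slot in
# recent weeks) plus a local adjustment from the latest observation.
import math

SLOTS_PER_DAY = 48          # one forecast every 30 minutes
SLOTS_PER_WEEK = SLOTS_PER_DAY * 7

def forecast(history, t):
    """Forecast the load at slot t from past observations history[0:t]."""
    # Seasonal baseline: average of the same slot in the last two weeks.
    baseline = (history[t - SLOTS_PER_WEEK] + history[t - 2 * SLOTS_PER_WEEK]) / 2
    # Local adjustment: the last observation's deviation from its own
    # seasonal baseline, assumed to persist in the short term.
    prev_baseline = (history[t - 1 - SLOTS_PER_WEEK]
                     + history[t - 1 - 2 * SLOTS_PER_WEEK]) / 2
    return baseline + (history[t - 1] - prev_baseline)

# Toy workload with a clean daily cycle, so the forecast is near-exact.
load = [1000 + 300 * math.sin(2 * math.pi * t / SLOTS_PER_DAY)
        for t in range(SLOTS_PER_WEEK * 2 + 10)]
t = SLOTS_PER_WEEK * 2 + 5
print(round(forecast(load, t), 1), round(load[t], 1))
```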

Load Dispatching Strategies

Load Balancing
Dispatch each request to server i with probability

  p_i = 1/K + γ (1/K − N_i / N_tot)

where K is the number of active servers, N_i the load on server i, N_tot the total load, and γ controls the convergence rate.

Load Skewing
- Round-robin over the busiest servers as long as N_i ≤ N_tgt (e.g., N_tgt = 0.9 N_max).
- Starve a server before shutting it down.
- Declare servers with N_i ≤ N_tail as shut-down candidates.

[Diagrams: total request load L_tot(t) arrives at a load dispatcher, which assigns per-server load L_i(t); each server i holds N_i(t) connections with departures D_i(t).]
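The two policies can be sketched as follows; the balancing probability p_i = 1/K + γ(1/K − N_i/N_tot), the thresholds, and all names here are illustrative assumptions:

```python
# Load balancing vs. load skewing dispatch, sketched.

def balancing_probabilities(loads, gamma=0.5):
    """p_i = 1/K + gamma*(1/K - N_i/N_tot): lightly loaded servers get
    more than their 1/K share, so loads converge toward equal."""
    k, total = len(loads), sum(loads)
    return [1 / k + gamma * (1 / k - n / total) for n in loads]

def skewing_target(loads, n_max, tgt_frac=0.9, tail_frac=0.1):
    """Send the request to the busiest server still below N_tgt, so that
    lightly loaded servers drain and become shut-down candidates."""
    n_tgt = tgt_frac * n_max
    candidates = [i for i, n in enumerate(loads) if n <= n_tgt]
    target = max(candidates, key=lambda i: loads[i])
    drainable = [i for i, n in enumerate(loads) if n <= tail_frac * n_max]
    return target, drainable

loads = [900, 400, 50, 700]          # connections per server (illustrative)
print([round(p, 3) for p in balancing_probabilities(loads)])
print(skewing_target(loads, n_max=1000))
```

Note the trade-off the table below quantifies: balancing keeps every server warm (fewer denials), while skewing concentrates load so whole servers can be turned off (more savings).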

Algorithm Performances

Algorithm | Energy (kWh) | Savings | Denials
No dynamic provisioning + Balancing (NB) | 478 | --- |
Forecasting + Balancing (FB) | 331 | 30.8% | 3,711,680
Forecasting + Balancing + Starving (FBS 2) | 343 | 28.2% | 799,120
Forecasting + Skewing (FS) | 367 | 23.3% | 597,520
Reactive + Skewing (HS 5/10) | 375 | 21.5% | 48,160

[Charts: number of active servers over 48 hours under the forecasting and reactive variants.]

Gong Chen, Wenbo He, Jie Liu, Suman Nath, Leonidas Rigas, Lin Xiao, and Feng Zhao, Energy-aware server provisioning and load dispatching for connection-intensive internet services, in NSDI'08, Berkeley, CA, USA, 2008.

But, Why Keep 30% of Machines Off?

Virtualization
- Improves server utilization
- Amortizes idle power consumption

However, hardware-based resource control methods fall short:
- Servers are shared by VMs from different applications
- Not all apps/tiers are created equal
- Throttling a physical server affects the performance of all apps on it

[Diagram: a rack of servers, each hosting multiple VMs.]

The Ugly
Cross-domain modeling. Interference.
Topics: service virtualization; soft actuations; power capping VMs.

VM Interference

Co-located applications contend for shared resources: core-private caches, the shared last-level cache, memory bandwidth, and DRAM (static partitioning is one mitigation).

- Up to 125% degradation from memory subsystem interference was observed on Core 2 Duo, Intel Nehalem, and AMD Opteron quad-core processors.
- Up to 40% degradation was observed among Google applications.*

[Chart: normalized performance degradation (%) of lbm, mcf, bzip2, and povray when co-located with each other on an Intel Core 2 Duo.]

*The impact of memory subsystem resource sharing on datacenter applications, Tang et al., ISCA 2011

Interference Quantification and Prediction

Cache pressure modeling: a synthetic cache loader (scl) with tunable intensity sweeps cache sets and ways (e.g., 8192 sets, 16 ways) to reproduce the cache pressure of a real application such as lbm.

Use the equivalent cache load for prediction: co-run application X against scl(8192, 16) on the neighboring core instead of against lbm itself, and predict the degradation.

[Chart: measured (vs. lbm) and predicted (vs. scl(8192, 16)) performance degradation (%) for lbm, gcc, mcf, soplex, omnetpp, bzip2, gobmk, povray, perlbench, libquantum, hmmer, and sjeng. Performance metric: bytes accessed per second.]

Sriram Govindan, Jie Liu, Aman Kansal, and Anand Sivasubramaniam, Cuanta: Quantifying Effects of Shared On-chip Resource Interference for Consolidated Virtual Machines, in ACM Symposium on Cloud Computing (SOCC), October 2011.

PACMan: Performance-Aware Consolidation

Profile, Consolidate, Migrate

Alan Roytman, Aman Kansal, Sriram Govindan, Jie Liu, and Suman Nath, PACMan: Performance Aware Virtual Machine Consolidation, in
10th International Conference on Autonomic Computing (ICAC), June 2013

Interference-Aware VM Consolidation
- Given n jobs and m machines, each with k cores
- Job degradation is specified over all job sets
- The max degradation must be less than D
- Every set of jobs S has an (energy) cost w(S)

Find a partition of the jobs onto b < m machines that minimizes

  Σ_{i=1}^{b} w(S_i)

- Polynomial when k = 2
- NP-hard when k > 2
- Polynomial-time approximation (within a logarithmic, ln(·), factor)

Heuristic:
- List all the feasible consolidations
- Sort them from small interference to large interference
- One pass on placement, from small to large
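The heuristic can be sketched as follows; the degradation table, the per-job cost normalization in the sort key, and all inputs are illustrative assumptions:

```python
# Greedy consolidation sketch: enumerate feasible co-location sets,
# sort by (normalized) cost, then place in one pass.
from itertools import combinations

K = 2  # cores per machine

def consolidate(jobs, degradation, max_degradation, cost):
    """degradation[s]: worst per-job slowdown when set s runs together.
    cost[s]: energy cost of running set s on one machine."""
    # All sets of up to K jobs whose mutual degradation is acceptable.
    feasible = [frozenset(s)
                for r in range(1, K + 1)
                for s in combinations(jobs, r)
                if degradation[frozenset(s)] <= max_degradation]
    # Normalize by set size so packing compatible jobs is preferred.
    feasible.sort(key=lambda s: cost[s] / len(s))
    placed, machines = set(), []
    for s in feasible:                 # one pass, cheapest sets first
        if placed.isdisjoint(s):
            machines.append(s)
            placed |= s
    return machines

jobs = ["a", "b", "c", "d"]
deg = {frozenset(j): 0.0 for j in jobs}
deg.update({frozenset(p): d for p, d in
            {("a", "b"): 0.1, ("a", "c"): 0.5, ("a", "d"): 0.2,
             ("b", "c"): 0.15, ("b", "d"): 0.4, ("c", "d"): 0.1}.items()})
cost = {s: 1.0 + deg[s] for s in deg}
print(consolidate(jobs, deg, max_degradation=0.3, cost=cost))
```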

Interference-Aware VM Migration
Given an existing assignment and G allowable migrations, minimize the total cost of the new assignment after migration.
- Polynomial when k = 2
- NP-hard when k > 2
- NP-hard to approximate

Greedy heuristic:
- Select the worst degraded VM on a server.
- Migrate it to the server that causes the least interference.
- Repeat until G is exhausted.

Result: 1000 VMs, 4 cores, 22% TCO reduction.
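The greedy migration loop can be sketched as follows; the pairwise interference table is a simplification (the real model is set-based), and all names and numbers are illustrative:

```python
# Greedy interference-aware migration: move the worst-degraded VM to the
# server where it suffers least, up to a migration budget G.
K = 2  # VM slots per server (illustrative capacity)

def suffered(vm, residents, interference):
    """Worst pairwise degradation vm suffers next to its co-residents."""
    return max((interference[frozenset((vm, o))] for o in residents if o != vm),
               default=0.0)

def greedy_migrate(placement, interference, budget):
    for _ in range(budget):
        # Worst degraded VM across all servers.
        vm, src = max(((v, s) for s, vms in placement.items() for v in vms),
                      key=lambda p: suffered(p[0], placement[p[1]], interference))
        placement[src].remove(vm)
        # Destination with space that causes the least interference.
        dst = min((s for s in placement if len(placement[s]) < K),
                  key=lambda s: suffered(vm, placement[s], interference))
        placement[dst].append(vm)
    return placement

inter = {frozenset(p): d for p, d in
         {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 0.2}.items()}
servers = {"s1": ["a", "b"], "s2": ["c"], "s3": []}
print(greedy_migrate(servers, inter, budget=1))
```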

VM Power Capping

Software Energy Estimation
- Estimate VM power consumption from performance counters.
- Fit a linear regression against whole-machine (hardware) power meters.

[Charts: measured vs. estimated power consumption (watts) over time, with per-component (CPU, memory, disk) dynamic energy and the resulting model error.]
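The counter-based metering idea can be sketched as a least-squares fit; everything here (the coefficients, the counter values, the synthetic data) is illustrative, not the model from the paper:

```python
# Software power metering sketch: fit whole-machine power as a linear
# function of per-component activity, then attribute power to a VM from
# its own counter readings. Ground-truth coefficients are synthetic.
import random

random.seed(0)

def machine_power(cpu, mem, disk):
    return 55 + 40 * cpu + 12 * mem + 6 * disk  # synthetic "meter"

samples = [(random.random(), random.random(), random.random())
           for _ in range(200)]
powers = [machine_power(c, m, d) for c, m, d in samples]

def fit(samples, powers):
    """OLS for [intercept, cpu, mem, disk] via the normal equations."""
    X = [[1.0, c, m, d] for c, m, d in samples]
    d = 4
    A = [[sum(x[i] * x[j] for x in X) for j in range(d)] for i in range(d)]
    b = [sum(X[k][i] * powers[k] for k in range(len(X))) for i in range(d)]
    for col in range(d):                      # Gaussian elimination
        piv = A[col][col]
        A[col] = [v / piv for v in A[col]]
        b[col] /= piv
        for row in range(d):
            if row != col:
                f = A[row][col]
                A[row] = [a - f * p for a, p in zip(A[row], A[col])]
                b[row] -= f * b[col]
    return b

coef = fit(samples, powers)
# A VM's estimated dynamic power from its own counter readings:
vm_power = coef[1] * 0.5 + coef[2] * 0.2 + coef[3] * 0.1
print([round(c, 1) for c in coef], round(vm_power, 1))
```

The intercept absorbs the shared idle power, which is why per-VM attribution reports dynamic energy rather than a full split of the machine's draw.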

[Charts: application dynamic energy and cumulative component (CPU, memory, disk) dynamic energy over time for two applications.]

Aman Kansal, Feng Zhao, Jie Liu, Nupur Kothari, and Arka Bhattacharya, Virtual Machine Power Metering and Provisioning, in ACM Symposium on Cloud Computing (SOCC), June 2010.

VM Performance Accounting

Average errors: SPEC CPU 2006
- Platform: HP DL380, 8-core Xeon, 16 GB RAM.
- Benchmarks 2 to 19 are the SPEC CPU 2006 INT and FP benchmarks (those that compile without Fortran); benchmarks 1 and 20 are synthetic loads.
- 8 copies of each benchmark ran to keep every core used.
- Error is in line with the errors reported by hardware meters.

[Chart: per-benchmark error (%), roughly 0 to 3.]

Worst-Case Power Rise in Servers

- Intel Xeon L5640 server: fastest observed power rise in 200 ms.
- Intel Xeon L5520 server: fastest observed power rise in 100 ms.
- Power spikes across servers are correlated.

Time Line of Events (not to scale)
- Central controller gives the actuation command.
- Command reaches the destination server: ~20 ms.
- Command received by the agent: < 1 ms.
- Settings reflected by the OS: < 1 ms.
- OS changes the setting in hardware: 200-350 ms (< 40-60 ms in the current implementation, using user-level code).
- Power decreases.

End-to-end response latency: ~400 ms. Typical adjustable power per core: 10-20 W.

Arka Bhattacharya, Aman Kansal, David Culler, Sriram Sankar, and Sriram Govindan, The Need for Speed and Stability in Data Center Power Capping, in Third International Green Computing Conference (IGCC'12), 5 June 2012.

Circuit Breaker Architecture

[Diagram: UPS feeding an X-PDU, remote power panel, and rack PDUs down to the servers, with magnetic and thermal breakers along the path.]

Hierarchical Control Framework

- Data center controller: tracks the total budget P_T(t) using PI control plus weighted fair sharing, producing per-application budgets P_app-1(t) ... P_app-n(t).
- Application-level controllers: apply PID control to tier 1 only, producing per-tier budgets P_tier-1(t) ... P_tier-n(t).
- Tier-level controllers: use model-predictive control (MPC) to cap the VMs in each tier.

Harold Lim, Aman Kansal, and Jie Liu, Power Budgeting for Virtualized Data Centers, in 2011 USENIX Annual Technical Conference (USENIX ATC '11), June 2011.
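A minimal sketch of the PI tracking idea at the top of such a hierarchy; the gains, nominal cap, and first-order plant model are illustrative assumptions, not the controllers from the paper:

```python
# PI controller tracking a power budget by adjusting a cap (e.g., one
# that is ultimately enforced via DVFS on the servers below).

def make_pi(kp, ki, setpoint, nominal):
    integral = 0.0
    def control(measured):
        nonlocal integral
        error = setpoint - measured
        integral += error
        return nominal + kp * error + ki * integral  # new cap value
    return control

BUDGET = 1100.0  # watts
controller = make_pi(kp=2.0, ki=0.5, setpoint=BUDGET, nominal=1000.0)
power = 1150.0   # uncapped draw exceeds the budget
for _ in range(100):
    cap = controller(power)
    power = 0.9 * power + 0.1 * cap  # sluggish first-order plant response
print(round(power, 1))  # settles at the budget
```

The integral term is what removes the steady-state offset between the nominal cap and the budget; the lower layers of the hierarchy then split the resulting budget across applications and tiers.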

Experimental Results
- 40 VMs on 10 servers
- 3 priorities: stock trader (high), web service (mid), SPEC CPU (low)
- Battery charging used as a complementary power consumer
- Workload: MSN Messenger demand traces

[Chart: total power (watts, ~1000-1150) over 6000 s; both the MPC controller and the physical-hierarchy controller track the total power budget, compared against the uncapped case.]

The Complete Picture (What We Have Discussed So Far)

Adding the upstream losses, -67% at the power plant and -10% in transmission and distribution, means that >99% of the initial energy is lost in conversion:
- 100 kW of fossil fuel to the power plant
- 30 kW to the data center
- 16 kW to the server
- 9.5 kW to the application
- 0.9 kW to the customer

(Data center losses as before: -33% cooling, -4% lighting, -15% UPS loss, -10% air handling, -35% power supply, -85% underutilization, -40% inefficient applications.)

Fuel Cells Are Getting Ready

[Chart: fuel cell installation unit count, 2000-2013, growing toward 40,000; fuel cells surpassed conventional micro-CHP in 2013.]

- Fuel cell electric vehicles (Toyota)
- eBay's Utah data center (Bloom Energy)

Fuel Cells

It can be cost effective:
[Chart: electricity vs. natural gas prices, energy equivalent, USD ~0.00-0.14; natural gas is consistently cheaper.]

- The natural gas grid is 100 times more reliable than the electrical grid.
- Gas grid failures are more graceful.
- Natural gas is easier to store than electrical or kinetic energy.

And it is green!


Rack-Level Fuel Cells

[Diagrams: a conventional feed steps 150 kV down to 15 kV at a medium-voltage switchboard, then to 480 V at a low-voltage switchboard (with battery, generator, TVSS, and ATS), then to 400/230 V branch circuit distribution. With rack-level fuel cells, natural gas from the street header is manifold-distributed to each row with pressure regulation, and a fuel cell in each rack powers its servers directly.]

- Up to 40% cheaper to build
- Up to 40% lower energy cost
- No electrical distribution
- No backup generators
- No central UPS
- Measured 53% efficiency

Ana Carolina Riekstin, Sean James, Aman Kansal, Jie Liu, and Eric Peterson, No More Electrical Infrastructure: Towards Fuel Cell Powered Data Centers, in 2013 Workshop on Power-Aware Computing and Systems, ACM, November 2013.

Technical Challenge: Load Following

[Chart: fuel cell response to a hard power cycle. Power (watts, 0-600) over ~19 seconds: power cord off, then power cord on, then server power back on.]

[Chart: server crash and reboot. Power (watts, 0-500) over ~96 seconds through a server crash, blue screen, core dump, and restart.]

Ongoing Work

[Block diagram: workload power consumption I(k) and server-internal parasitic power PW(k) combine into server power consumption PS(k); the fuel cell supplies V(k), with U(k) closing the loop between the fuel cell and the server.]

- Provisioning: energy storage selection and sizing
- Power capping/tracking
- Dynamic coordination

Conclusions
- Data centers are large, complex cyber-physical systems and a very rich research space.
- Energy efficiency should not be considered in isolation: power management matters for both capital and operational reasons.
- Virtualization significantly improves utilization but complicates modeling and control.
- The industry has picked the low-hanging fruit and is hungry for more.

Acknowledgements

Sriram Govindan
Sean James
Mike Liang
Aman Kansal
Suman Nath
Eric Peterson
Bodhi Priyantha
Sriram Shankar
Lin Xiao
Feng Zhao

Arka Bhattacharya
Gong Chen
Christos Faloutsos
Wenbo He
Oliver Kennedy
Lei Li
Harold Lim
Xue Liu
Chenyang Lu
Ana Carolina Riekstin
Alan Roytman
Abu Sayeed Saifullah
Andreas Terzis
Qiang Wang

Thank you!