
Data Center Construction

and Management

Johan Tordsson and Luis Tomás


Department of Computing Science
Agenda

•  What is a Data Center?


•  Data Center Components
•  How they are built
•  How to manage/operate DCs
–  Power and energy efficiency
•  Reduce Total Facility Power Needs
•  Reduce IT equipment Power needs
•  Data center costs
•  Conclusions
What is a Data Center?

•  Wikipedia:
“A data center is a facility used to house
computer systems and associated components,
such as telecommunications and storage
systems. It generally includes redundant or
backup power supplies, redundant data
communications connections, environmental
controls (e.g., air conditioning, fire suppression)
and various security devices. Large data centers
are industrial scale operations using as much
electricity as a small town and sometimes are
a significant source of air pollution in the form of
diesel exhaust.”
What is a Data Center?

–  Servers
–  Network
–  Storage
–  Power
–  Cooling
–  Energy-efficiency
–  … and a building to keep everything in…

•  Data centers - How to build and operate


•  Conceptual overview only
–  Details about these only relevant for those
who actually build/operate data centers…
Data Center as a Computer
•  Majority of cloud computing infrastructure consists
of reliable services delivered through data centers

•  Traditional co-location data centers


–  Multiple servers and communications gear collocated
due to common environmental & security needs
–  Hosts a large number of relatively small or medium-
sized applications, each running on a dedicated
hardware infrastructure

•  Data centers for cloud computing platforms


–  Belongs to a single organization
–  Uses a relatively homogeneous hardware and system
software platform
–  Common system management layer
–  Runs a smaller number of very large applications
–  Cloud computing workloads must be designed to
gracefully tolerate large numbers of component
faults with little or no impact on service level
performance and availability
Warehouse Scale Computers
(WSC)
•  Not just a collection of servers
–  100s to 1000s coordinated servers
–  Typically runs on a virtualized platform
–  Fault behavior & energy considerations have
significant impact
–  Needs to be considered as a single unit
•  Must be highly manageable
–  Deployment of software updates
–  Monitoring & system management
•  Affordability
–  Currently power public clouds such as Google,
Amazon, Yahoo, Microsoft, etc…
–  Soon to be affordable by Enterprises
•  A rack of servers can easily have > 600 cores
What’s different about
WSCs?
“As computation continues to move into the cloud,
the computing platform of interest no longer
resembles a pizza box or a refrigerator, but a
warehouse full of computers. These new large
datacenters are quite different from traditional
hosting facilities of earlier times and cannot be
viewed simply as a collection of co-located
servers. Large portions of the hardware and
software resources in these facilities must work in
concert to efficiently deliver good levels of
Internet service performance, something that can
only be achieved by a holistic approach to their
design and deployment. In other words, we must
treat the datacenter itself as one massive
warehouse-scale computer (WSC).”

Google “Warehouse Style
Computer” Data Center
The New Data Center Industry
•  Data centers replace servers
•  Container Computer for high efficiency and
environmental conservation (Packaging, PUE, …)
•  Bundled software for integrated service, high
scalability, and availability
•  Large enterprises will bypass traditional server
channels (IBM, HP, Dell, …)
–  Purchase of entire data center directly from
manufacturers
•  Significant cost reductions
•  Horizontal scalability
•  High Availability
•  Google already purchases directly from Taiwan
–  Google 4th largest server manufacturer, does not sell…
–  Facebook opencompute.org project
•  Open specifications for data center design
Container Computers

Data Center Architecture
•  Treat the entire data center as a computer
- Air flow analysis
- Cooling architecture (thermal management)
- Power/energy management
- Focus on ease of system and network management
- What cannot be managed/monitored does not get
deployed
•  Modular and Scalable
-  Card to Rack
-  Rack to Container
-  Container to Warehouse
•  Explore low power, commodity CPU as a
building block
Data Center - Tiering
Data Center Components
Power
•  To run servers
•  To run data center
–  Cooling, power distribution, etc.
Power
•  Uninterruptible Power Supply (UPS)
–  Detects power failure
–  Batteries (for short-term outage + switch)
–  (Diesel) Generator (long-term outage)
•  Power Distribution Unit (PDU)
–  Fancy socket w. power distribution and/or
control
•  Power usage breakdown
Cooling
•  Keep heat-generating servers cool
•  Computer Room Air Conditioners (CRAC)
- Like room air conditioner, for server rooms

•  Very complex to model and design


- Airflow 3D and non-linear
Data center server hardware

•  Standard servers
•  Standard networks
•  Standard storage
•  But at a very large scale

•  Comparison: Parallel computer


–  Custom high-performance hardware (?)
–  Fast interconnection networks
•  Myrinet, Infiniband, …
Design Motivation

•  Multicore CPUs in mid-range servers typically


carry a price/performance benefit
–  2-5 times cheaper than top-of-the-line systems

•  Many services are memory-bound


–  Faster CPUs do not scale well for large services
–  Applications are larger-than-server anyway

•  Slower CPUs are more power efficient;


–  Typically, CPU power decreases by O(k^2) when CPU
frequency decreases by a factor of k
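The last bullet can be made concrete with a small sketch. It takes the slide's O(k^2) rule of thumb at face value and additionally assumes that throughput scales linearly with clock frequency; the function names are hypothetical and the numbers are illustrative, not measurements.

```python
def relative_power(k: float, exponent: float = 2.0) -> float:
    """Power relative to the baseline after dividing clock frequency by k.

    exponent=2.0 encodes the slide's O(k^2) rule of thumb (an assumption,
    not a measured value for any particular CPU).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return 1.0 / (k ** exponent)


def relative_perf_per_watt(k: float, exponent: float = 2.0) -> float:
    """Throughput per watt relative to the baseline.

    Assumes throughput scales linearly with frequency. Memory-bound
    services lose less throughput than this when the CPU is slowed.
    """
    relative_throughput = 1.0 / k
    return relative_throughput / relative_power(k, exponent)


for k in (1.0, 1.5, 2.0):
    print(f"k={k}: power x{relative_power(k):.2f}, "
          f"perf/W x{relative_perf_per_watt(k):.2f}")
```

Under these assumptions, halving the frequency cuts power to a quarter and doubles throughput per watt, which is the argument for building from slower commodity CPUs.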
Cost comparison example
Data Center preparation
Server and network overview
•  High-latency, low-price network
–  Gigabit ethernet
•  Hierarchy of commodity switches
Storage
•  Storage capacity and access latency both increase with distance from the server
Data center management tools

[Figure: data center management stack, with components labeled Cloud Application Management Tool, Virtual Cluster Provisioning, Network/System Management Tool, Physical Cluster Deployment, Security, Intra-Virtual-Cluster Load Balancing, Power Management, Virtual Machine Management, Physical Compute Servers, Distributed Main/Secondary Storage, Network]
Management
•  Virtualization Platform (virtualize everything)
–  CPUs
–  Storage (Filesystems)
–  Network
•  Resource Management
–  Provisioning of virtual clusters
–  Physical machine load balancing
–  Network traffic load balancing
•  Power Management
•  Security
–  Hypervisor protection
–  Isolation between clusters
•  System Management
•  High Availability
–  Physical component failure should not
interrupt availability of virtual resources
•  Cloud Applications management

•  Unless a resource can be remotely managed,


it should not be part of the data center…
Virtualization Platform
•  Leverage existing hypervisors
–  Allocation of virtual machine instances
–  Monitor VM performance
–  Virtual storage provisioning
–  Intra-virtual-cluster load balancing
–  Scalable data center network
–  Isolation between virtual clusters
–  Virtual machine migration

[Figure: virtual clusters (Mail, Bkup, HC, AppXYZ) on top of a Cloud OS layer (system service daemons, agents) spanning compute nodes, service nodes, and data nodes, which run on physical nodes and storage servers]
Virtual Machine Management
•  Objective
–  Power Management
–  Physical Machine Load Balancing
•  Monitor runtime VM statistics
–  Heuristic calculation to predict workloads
•  Determine power down/up of machines
–  Multi-dimensional bin packing (knapsack)
•  CPU, network, disk
–  VM migration algorithm
•  Physical machine load balancing
–  Migration of VM’s to other physical machine
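The power-down decision above is framed as multi-dimensional bin packing. One common heuristic for that problem is first-fit decreasing, sketched below; this is an illustrative choice, not necessarily the algorithm used by any real VM manager. The names `Host` and `consolidate`, the normalization of every resource to a fraction of one host, and the sample workloads are all assumptions.

```python
from dataclasses import dataclass, field

# (cpu, network, disk) demand, each as a fraction of one host's capacity
Demand = tuple[float, float, float]


@dataclass
class Host:
    used: list[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    vms: list[str] = field(default_factory=list)

    def fits(self, demand: Demand) -> bool:
        # A VM fits only if every dimension stays within capacity.
        return all(u + d <= 1.0 + 1e-9 for u, d in zip(self.used, demand))

    def place(self, name: str, demand: Demand) -> None:
        self.used = [u + d for u, d in zip(self.used, demand)]
        self.vms.append(name)


def consolidate(vms: dict[str, Demand]) -> list[Host]:
    """First-fit decreasing: pack VMs onto as few hosts as the heuristic finds."""
    hosts: list[Host] = []
    # Place the most demanding VMs first, ranked by their largest dimension.
    ordered = sorted(vms.items(), key=lambda kv: max(kv[1]), reverse=True)
    for name, demand in ordered:
        if any(d > 1.0 for d in demand):
            raise ValueError(f"{name} does not fit on any single host")
        target = next((h for h in hosts if h.fits(demand)), None)
        if target is None:
            target = Host()  # power up one more machine
            hosts.append(target)
        target.place(name, demand)
    return hosts


example = {
    "web": (0.5, 0.2, 0.1),
    "db": (0.4, 0.1, 0.6),
    "cache": (0.3, 0.3, 0.1),
    "batch": (0.6, 0.1, 0.3),
}
placement = consolidate(example)
for i, host in enumerate(placement):
    print(i, host.vms)
```

Hosts that end up empty after such a packing are candidates for power-down; moving VMs to match the new placement is where the migration algorithm comes in.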
Distributed System Group
research overview
Power
•  Data centers are major power users
–  Common claim: ~4% of world electricity use
•  Example: Facebook in Luleå (120MW)
–  ~1BSEK (1’000’000’000 SEK) / year (list prices)
•  Exponential growth of data center capacity &
cheaper server hardware
–  Power costs (will) dominate
•  Cost breakdown (examples)
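As a sanity check on the Luleå figure, the sketch below derives the electricity price implied by the two numbers on the slide. It assumes the facility draws its full 120 MW around the clock for a whole year, which is an upper bound rather than a reported load; the variable names are illustrative.

```python
FACILITY_MW = 120.0          # from the slide
ANNUAL_COST_SEK = 1e9        # ~1 BSEK per year, from the slide
HOURS_PER_YEAR = 24 * 365

# Assumption: constant full load all year.
annual_kwh = FACILITY_MW * 1000 * HOURS_PER_YEAR
implied_price_sek_per_kwh = ANNUAL_COST_SEK / annual_kwh

print(f"Energy per year: {annual_kwh / 1e9:.2f} TWh")
print(f"Implied price:   {implied_price_sek_per_kwh:.2f} SEK/kWh")
```

Under that assumption the two numbers are consistent with a price of just under 1 SEK per kWh.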
Power Consumption
Energy-efficiency
•  Not all power is used by servers…
•  Power Usage Effectiveness (PUE)
–  Total facility power / power used by IT equipment (computing):

•  Typical: 2.0
•  State-of-the-art: ~1.2

•  Quite a few variants of the definition


–  Many to make data centers look good
•  Others look at power source
–  Carbon vs. solar vs. ….
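The PUE ratio is simple enough to compute directly. The sketch below uses the common definition (total facility power divided by IT equipment power) and the two PUE values quoted above; the 1000 kW IT load and the function names are assumptions chosen for illustration.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness: total facility power over IT power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT power must be positive")
    return total_facility_kw / it_equipment_kw


def overhead_kw(it_equipment_kw: float, pue_value: float) -> float:
    """Power spent on cooling, power distribution, etc. for a given PUE."""
    return it_equipment_kw * (pue_value - 1.0)


IT_LOAD_KW = 1000.0  # hypothetical IT load
for label, value in (("typical", 2.0), ("state-of-the-art", 1.2)):
    print(f"{label}: PUE {value} -> "
          f"{overhead_kw(IT_LOAD_KW, value):.0f} kW overhead")
```

At a PUE of 2.0 every watt of computing costs a second watt of overhead; at 1.2 the overhead drops to a fifth of a watt.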
GOOGLE DATA CENTERS
Cooling and Air Flow
management
•  Cooling by water
–  Water cooling very close to servers
•  Cooling by sea water
–  Practically unlimited supply of cool water (Google, Finland)
•  Cooling by location
–  Cold climate reduces
need for cooling

Google has built a data center cooled entirely
with sea water in Hamina, on the Gulf of Finland

Facebook has built a data center in
Luleå, 60 miles from the Arctic Circle
Power Consumption

- “Free” electricity
- Less carbon emissions
Agenda

•  What is a Data Center?


•  Data Center Components
•  How they are built
•  How to manage/operate DCs
–  Power and energy efficiency
•  Reduce Total Facility Power Needs
•  Reduce IT equipment Power needs
–  Workload consolidation
–  Resource overbooking
•  Data center costs
•  Conclusions
IT Energy-efficiency
•  Non-linear server power usage
–  Performance/power ratio changes with load
•  High server utilization beneficial
–  But not common by default
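Why the performance/power ratio changes with load can be seen even in a very simple model. The sketch below assumes a server whose power grows in a straight line from a non-zero idle draw up to its peak; the wattages are hypothetical, and real power curves differ in shape.

```python
PEAK_WATTS = 200.0  # hypothetical peak draw
IDLE_WATTS = 100.0  # hypothetical idle draw (half of peak)


def power_watts(utilization: float) -> float:
    """Power draw at a given utilization in [0, 1] under the assumed model."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return IDLE_WATTS + (PEAK_WATTS - IDLE_WATTS) * utilization


def relative_efficiency(utilization: float) -> float:
    """Work done per watt, relative to the same server at full load."""
    return utilization * PEAK_WATTS / power_watts(utilization)


for u in (0.1, 0.3, 0.5, 1.0):
    print(f"load {u:.0%}: {power_watts(u):.0f} W, "
          f"efficiency {relative_efficiency(u):.2f} of peak")
```

In this model a server at 10% load delivers less than a fifth of its full-load work per watt, which is why consolidating work onto fewer, busier servers pays off.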
IT Energy-efficiency

•  5k Google
servers
(6 months)
IT Energy-efficiency
•  Overbooking – Admission Control

•  Consolidate workloads
–  Power servers off
–  Or slow servers down
•  Dynamic Voltage and Frequency Scaling (DVFS)
–  Very hard to assess impact for
bursty (rapidly changing) workloads
•  Oscillations and unwanted correlations
•  More next time…
•  Consolidation requires software support
–  Must be able to start/stop instances and
autoscale
–  Stateless services preferable
IT Energy-efficiency

Dynamic Voltage and Frequency Scaling (DVFS)


Costs for a Data Center
•  How much performance is required?
–  How many/fast servers, disks, networks etc.?
–  Data center size is measured in watts (power capacity)

•  How much power is needed?


–  PUE?
–  How much cooling?
–  Price of electricity?

•  What additional physical equipment is needed?


–  Redundancy of power and cooling

•  Where to place it, given the above?


–  Costs vs. where are users located?
–  Very attractive to host data centers…
Cloud computing = cost cuts?
•  Amazon EC2 examples
•  Small VM, 3 years full use (est. server lifetime)
–  Per h: $0.08*(24*365*3) -> ~$2100 (!)
–  Reserved: $300 + $0.013*(24*365*3) -> $640
•  Rough estimate of costs for Amazon
(according to ”data center as a computer”)
–  Assume server cost 25% of total cost (TCO)
–  Standard $2k (list price) 1U server today:
•  32 cores + memory, disk etc. Total cost $8k
•  Estimate: Can deliver 64 Small VMs
–  Revenue: $2100*64 = ~$134k -> ~17 times the $8k total cost
–  Amazon does not pay list prices
•  90% discount rumoured
•  With 24/7 use, hourly prices are very high…
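The arithmetic on this slide can be reproduced in a few lines. All prices and ratios below are the ones quoted above; the variable names are illustrative, and the rumoured discount is left out because the slide does not say how it would change the total cost.

```python
HOURS_3_YEARS = 24 * 365 * 3

# Customer side: one Small VM, running 24/7 for three years
on_demand_cost = 0.08 * HOURS_3_YEARS
reserved_cost = 300 + 0.013 * HOURS_3_YEARS

# Provider side: one 1U server at list price
server_list_price = 2000.0
server_share_of_tco = 0.25
total_cost = server_list_price / server_share_of_tco
vms_per_server = 64

revenue = on_demand_cost * vms_per_server
revenue_to_cost = revenue / total_cost

print(f"On-demand, 3 years: ${on_demand_cost:,.0f}")
print(f"Reserved, 3 years:  ${reserved_cost:,.0f}")
print(f"Revenue per server: ${revenue:,.0f}")
print(f"Revenue / total cost: {revenue_to_cost:.1f}x")
```

The ratio is computed against the $8k total cost, not the $2k server price alone; measured against the server price it would be four times higher.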
Cloud cost life cycle
1.  Develop service
–  Run in-house for testing and very early use

2. Move to cloud-hosting
–  To handle large scale-up of user base

3.  Build own data center to cut hosting costs


–  Once size of service is roughly known
–  Unless major price cuts by IaaS providers,
this will happen for more and more SaaS
providers as server and data center costs
drop…

•  Example: Zynga
Conclusions

•  Data centers at warehouse scale


–  More than just a group of servers
–  Holistic management perspective needed

•  Standard solutions superior


–  Off-the-shelf servers, networks, disks, etc.
–  Redundancy, scalability, etc. in software layer

•  Balanced design cuts costs and increases
efficiency
–  Efficient cooling and energy usage are
extremely important
Suggested reading

•  ”The Datacenter as a Computer”


–  Barroso & Hölzle (Google)

•  Read (somewhat) carefully:


–  Chapter 1, Chapter 3-5
•  Focus on principles, ignore numbers
(examples are a few years old...)
•  Skim:
–  Chapter 2
•  Overlaps texts from last lecture + Data
management lecture
