
Power Reduction Techniques for an 8-core Xeon® Processor
Tanmay Rao M
MS VLSI CAD
101002014

Under the Guidance of


Mr Shridhar Nayak

12/8/21

Content
 What is Nehalem Architecture?
 What is a Xeon Processor?
 Power Reduction Techniques
 Core and Cache Recovery
 Power Management Techniques
 Idle Power Reduction
 Conclusion

Nehalem Architecture
 Nehalem is a family of next-generation Intel x86-64
processors.
 New system architecture and a significantly enhanced
core architecture, with innovations in power
management and modular design.
 Intel QuickPath Interconnect, in high-end models,
replacing the legacy front side bus.

 Intel® Hyper-Threading Technology.


 Intel Turbo Boost Technology.
 The following caches:
 32 KB L1 instruction and 32 KB L1 data cache per
core
 256 KB L2 cache per core
 4–12 MB L3 cache shared by all cores
 30% lower power usage for the same performance

Xeon Processor
 Xeon is a brand of multiprocessing- or multi-socket-capable
x86 microprocessors targeted at the non-consumer server
and workstation markets.
 The Xeon brand has been maintained over several
generations of x86 and x86-64 processors.
 The Xeon CPUs generally have more cache than
their desktop counterparts in addition to
multiprocessing capabilities.

Power Dissipation Trends



Overview of Xeon Nehalem-Ex


 The shared L3 cache is split into eight slices. Even
though each cache slice is aligned with a processor
core, the entire L3 cache is seen as one large,
shared cache by all cores.
 The top side of the floorplan includes four
point-to-point QuickPath Interconnect (QPI) links
running at 6.4 GT/s.
 The bottom side houses the memory interface.

Processor Die Photo



Power Reduction Techniques


 The processor has four voltage domains.
 To reduce both switching and leakage power, we strive to
operate each domain at the lowest possible voltage level.
 To minimize leakage power, long channel devices are
used in non-critical circuit paths. These devices trade off
speed for lower leakage: about 10% lower speed provides
a 3x leakage reduction.
 The processor cores include 58% long channel devices,
while the un-core logic uses 85% low-leakage transistors.
 All circuits use static CMOS logic to minimize the active
switching power.
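The leakage figures above can be turned into a rough estimate. The sketch below is a simplification, not a figure from the source: it assumes every nominal device leaks the same amount, that each long channel device leaks one third of that, and that the 58% and 85% shares apply directly to leakage. The function name `relative_leakage` is introduced here for illustration.

```python
def relative_leakage(long_channel_fraction, reduction_factor=3.0):
    """Leakage relative to a design built only from nominal devices.

    Assumption: each long channel device leaks 1/reduction_factor of a
    nominal device, and all nominal devices leak equally.
    """
    if not 0.0 <= long_channel_fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    nominal_share = 1.0 - long_channel_fraction
    return nominal_share + long_channel_fraction / reduction_factor


core = relative_leakage(0.58)    # cores: 58% long channel devices
uncore = relative_leakage(0.85)  # uncore: 85% low-leakage devices

print(f"core leakage:   {core:.2f} of an all-nominal design")
print(f"uncore leakage: {uncore:.2f} of an all-nominal design")
```

Under these assumptions the cores would leak about 0.61 and the uncore about 0.43 of what an all-nominal design would.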

Processor Voltage Domains



L3 Cache
 The L3 cache implements both sleep and shut-off
modes, which reduce leakage by 35% and 83%
respectively.
 In active state, the large pull-up is turned on and
connects the virtual array supply to the real VCC.
 In sleep mode, the large pull-up is turned off and
the small multi-leg programmable pull-up acts as a
resistor that drops the supply voltage to the sleep
level.

 In shut-off mode all pull-up devices are turned off and the
virtual VCC drops to a low voltage determined by the residual
leakage of the array.
 In shut-off state the array is functionally disabled and the logic
state of the array bits is lost.
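The three cache modes can be summarized as a small lookup. This is a minimal sketch, assuming the 35% and 83% reductions are measured against the active state and that sleep mode retains data (the slides only say that shut-off loses it). The names `SliceMode` and `cache_leakage` are hypothetical.

```python
from enum import Enum


class SliceMode(Enum):
    ACTIVE = "active"      # large pull-up on, full supply
    SLEEP = "sleep"        # small programmable pull-up only
    SHUT_OFF = "shut-off"  # all pull-ups off


# Leakage relative to the active state (35% and 83% reductions).
RELATIVE_LEAKAGE = {
    SliceMode.ACTIVE: 1.00,
    SliceMode.SLEEP: 0.65,
    SliceMode.SHUT_OFF: 0.17,
}

# Assumption: only shut-off loses the stored bits.
RETAINS_DATA = {
    SliceMode.ACTIVE: True,
    SliceMode.SLEEP: True,
    SliceMode.SHUT_OFF: False,
}


def cache_leakage(modes):
    """Average leakage of the slices relative to an all-active cache."""
    return sum(RELATIVE_LEAKAGE[m] for m in modes) / len(modes)


# Example: six slices sleeping, two slices shut off.
modes = [SliceMode.SLEEP] * 6 + [SliceMode.SHUT_OFF] * 2
print(f"relative L3 leakage: {cache_leakage(modes):.2f}")
```

With six slices asleep and two shut off, the array would leak about 0.53 of its all-active value under these assumptions.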

Cache and Core Recovery


 If one of the cores has a manufacturing defect, we
disable the core and recover that part.
 The same applies for a massive defect in the cache
which cannot be repaired using the built-in cache
redundancy.
 On-die electrically programmable one-time fuses
are used for the core and cache recovery.
 The disabled cores are clock and power gated to
reduce power consumption.
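The recovery flow above can be illustrated with a fuse bitmask. The layout is purely hypothetical (bit i set means core i is fused off); the slides do not describe the real fuse encoding. The sketch only shows that a fused-off core ends up both clock gated and power gated.

```python
NUM_CORES = 8


def enabled_cores(disable_fuses):
    """Return the indices of cores left enabled.

    Hypothetical layout: bit i of disable_fuses set means core i is disabled.
    """
    if not 0 <= disable_fuses < (1 << NUM_CORES):
        raise ValueError("fuse mask out of range")
    return [i for i in range(NUM_CORES) if not (disable_fuses >> i) & 1]


def gating_plan(disable_fuses):
    """Disabled cores are both clock gated and power gated."""
    enabled = set(enabled_cores(disable_fuses))
    return {
        core: {"clock_gated": core not in enabled,
               "power_gated": core not in enabled}
        for core in range(NUM_CORES)
    }


fuses = 0b00100100  # example part with cores 2 and 5 disabled
active = enabled_cores(fuses)
plan = gating_plan(fuses)
print("enabled cores:", active)
```

This example models a part recovered as a 6-core product.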

Cache And Core Recovery



Infrared Image of the Die



 Brighter shades of gray indicate higher switching
and leakage power.
 The two cores that are disabled show dark in the
picture, since there is no infrared emission coming
from them.
 The one bright spot in each core is due to the
thermal sensor, which is powered by the clean PLL
voltage domain and is not affected by the core-
level power gates.

Power Management
 The power control unit (PCU) contains a microcontroller.
 The PCU receives the output of core-level voltage
and temperature sensors, as well as the desired
power state for each core.
 The microcontroller computes the voltage ID bits
that program the external voltage regulator and the
multiplier ratio for the core PLLs.
 The PCU controls the core voltage regulator output and
the core power gates, and manages the transitions
between the different power states.
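One PCU decision step might look like the sketch below. Everything numeric here is an assumption made for illustration: the operating point table, the temperature limit, the linear VID encoding, and the choice to run all cores from one shared operating point are not taken from the source.

```python
from dataclasses import dataclass

# Hypothetical operating points: state -> (PLL multiplier ratio, core voltage in mV).
OPERATING_POINTS = {"P0": (17, 1100), "P1": (15, 1000), "P2": (13, 900)}
ORDER = ["P0", "P1", "P2"]  # fastest first
TEMP_LIMIT_C = 90.0         # hypothetical throttle threshold
VID_BASE_MV = 500           # hypothetical linear VID encoding
VID_STEP_MV = 12.5


@dataclass
class CoreRequest:
    desired_state: str    # power state requested for this core
    temperature_c: float  # reading from the core temperature sensor


def pcu_step(requests):
    """Pick one shared operating point and return (state, ratio, vid)."""
    if not requests:
        raise ValueError("at least one core request is required")
    # The shared supply must satisfy the most demanding core.
    idx = min(ORDER.index(r.desired_state) for r in requests)
    # Step down one state if any core is above the temperature limit.
    if any(r.temperature_c > TEMP_LIMIT_C for r in requests):
        idx = min(idx + 1, len(ORDER) - 1)
    state = ORDER[idx]
    ratio, millivolts = OPERATING_POINTS[state]
    vid = round((millivolts - VID_BASE_MV) / VID_STEP_MV)
    return state, ratio, vid


requests = [CoreRequest("P1", 60.0), CoreRequest("P0", 95.0), CoreRequest("P2", 55.0)]
decision = pcu_step(requests)
print(decision)
```

In this example one core asks for P0 but reports 95 °C, so the sketch settles on P1 for the shared supply.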

Block Diagram of PCU



Idle Power Reduction


 The goal is to minimize power consumption in unused blocks.
 One target is the power dissipated by unused I/O links.
 Multiple platform configurations leave I/O links
unterminated at the other end.
 A link detect circuit is implemented for this purpose.
 The Link Detect signal shuts off the link PLL and the
analog bias circuit, so that no active power is
dissipated in the unterminated link.
 This saves about 2 W per disabled link.
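The saving scales with the number of unterminated links. A minimal sketch, assuming four links as on this die and treating the roughly 2 W figure as exact; the function names are hypothetical.

```python
PER_LINK_SAVING_W = 2.0  # approximate saving per disabled link


def link_enables(terminated):
    """Link detect: keep the PLL and bias circuit on only for terminated links."""
    return [{"pll_on": t, "bias_on": t} for t in terminated]


def idle_saving_w(terminated):
    """Estimated power saved by shutting off unterminated links."""
    return PER_LINK_SAVING_W * sum(1 for t in terminated if not t)


# Example platform: two of the four links have nothing connected.
links = [True, True, False, False]
enables = link_enables(links)
saving = idle_saving_w(links)
print(f"estimated saving: {saving:.1f} W")
```

A platform that leaves two of the four links unterminated would save roughly 4 W under this estimate.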

SUMMARY
 Multiple voltage and clock domains are used to
minimize the power consumption for each domain.
 Power gating is implemented at both the core and
cache level to control leakage.
 An on-die microcontroller manages voltage and
frequency operating points, as well as power and
thermal events.
 Idle power is reduced by shutting off the un-
terminated I/O links.

REFERENCES
 Stefan Rusu, et al., "Power Reduction Techniques for an
8-core Xeon® Processor", Intel Corporation, Santa
Clara, CA, USA.
 K. Mistry, et al., "A 45nm Logic Technology with
High-k+Metal Gate Transistors, Strained Silicon, 9 Cu
Interconnect Layers, 193nm Dry Patterning, and 100%
Pb-free Packaging", IEDM Tech. Digest, December
2007.
 S. Rusu, et al., "A 45nm 8-Core Enterprise Xeon®
Processor", ISSCC Tech. Digest, February 2009.

THANK YOU
