Professional Documents
Culture Documents
November, 2007
Acknowledgements
Columbia Prof. Luca Carloni Dr. Assaf Shacham, Michele Petracca, Ben Lee, Caroline Lai, Howard Wang, Sasha Biberman IBM
Jeff Kash
Yurii Vlasov Cornell Michal Lipson
November, 2007
November, 2007
IBM Cell:
Value 90nm SOI with low- dielectrics and 8 metal layers of copper interconnect Chip area 235mm^2 Number of transistors ~234M Operating clock frequency 4Ghz Power dissipation ~100W Percentage of power dissipation due to 30-50% global interconnect Intra-chip, inter-core communication 1.024 Tbps, 2Gb/sec/lane (four shared bandwidth buses, 128 bits data + 64 bits address each) I/O communication bandwidth 0.819 Tbps (includes external memory)
5
Frontiers in Nanophotonics and Plasmonics, Guaruja, SP Brazil
November, 2007
Some examples
Columbia University
ELECTRONICS:
Buffer, receive and re-transmit at every switch Off chip is pin-limited and really power hungry
RX
TX
RX TX
RX TX
RX TX
RX TX
RX
November, 2007
Luxtera, 2005
November, 2007
For 22nm scaling will enable 36 multithreaded cores similar to todays Cell
Estimated on-chip local memory per complex core ~0.5GB
November, 2007
P
P
P
P
P
P
P
G
P
G
P
G
Cell Core
(on processor plane)
P
G
P
G
P
G
Photonic NoC
P
G
P
G
P
G
Deflection Switch
12
November, 2007
HIGH-SPEED RECEIVER
Ti/Al
IBM
Wm
SiO2
n+ p+ n+ p+
tGe Si Wi Ge SiO2 Si S
IBM/Columbia
13
November, 2007
North
PSE
ER
PSE PSE
East
PSE
South
Electronic router
High speed simple logic Links optimized for high speed Nearly no power consumption in OFF state
CMOS Driver
CMOS Driver
CMOS Driver
CMOS Driver
14
November, 2007
N N
W S
S
November, 2007
15
1
2 3 4 5 6
Unidirectional links
1 x 2 Switches Simple routing algorithm
7
8
16
Frontiers in Nanophotonics and Plasmonics, Guaruja, SP Brazil
November, 2007
1 2 3 4 1 2 3 4 5 6 7 8 8 7 6 5
Bidirectionality provides for independent reception by two nodes from output (Y) dimension
17
November, 2007
18
November, 2007
2 6
7 3
19
November, 2007
2 6
7 3
20
November, 2007
2 5 4
GW
21
November, 2007
2 5 4
GW
22
November, 2007
1 2 6 7
5
23
Frontiers in Nanophotonics and Plasmonics, Guaruja, SP Brazil
4
November, 2007
21
36
36 250 1 Tb/s 28.8 W 360 mW 2.5 W 31.7 W
10 mW
10 mW 250
0.4 pJ/bit
Off-chip lasers
? demux
Broadband router . . .
modulator
...
Broadband router
Rx
. . .
Rx
Supercore
amp
1 Tb/s
24
Performance Analysis
Goal to evaluate performance-per-Watt advantage of CMP system with photonic NoC Developed network simulator using OMNeT++: modular, opensource, event-driven simulation environment Modules for photonic building blocks, assembled in network Multithreaded model for complex cores Evaluate NoC performance under uniform random distribution Performance-per-Watt gains of photonic NoC on FFT application
Optic al I/O
25
November, 2007
27
November, 2007
Multithreading enables better exploitation of photonic NoC high BW Gain of 26% over single-thread Non-blocking mesh, shorter average path, improved by 13% over crossbar
28
Frontiers in Nanophotonics and Plasmonics, Guaruja, SP Brazil
November, 2007
Hop between two switches is 2.78mm, with average path of 11 hops and 4 switch element turns
32 blocks of 256MB and line rate of 960Gbps, each connection is 105.6mW at interfaces and 2mW in switch turns
30
November, 2007
FFT computation: time ratio and power ratio as function of line rate
31
Frontiers in Nanophotonics and Plasmonics, Guaruja, SP Brazil
November, 2007
Performance-per-Watt
To achieve same execution time (time ratio = 1), electronic NoC must operate at the same line rate of 960Gbps, dissipating 7.6W/connection or ~70X over photonic Total dissipated power is ~244W To achieve same power (power ratio = 1), electronic NoC must operate at line rate of 13.5Gbps, a reduction of 98.6%. Execution time will take ~1sec or 15X longer than photonic
32
November, 2007
Summary
CMPs are clearly emerging for power efficient high performance computing capability Future on-chip interconnects must provide large bandwidth to many cores Electronic NoCs dissipate prohibitively high power