You are on page 1of 41

VLSI Scaling for Architects

Mark Horowitz Computer Systems Laboratory Stanford University horowitz@stanford.edu


MAH VLSI Scaling for Architects 1

The Buzz is VLSI Wires are Bad


A lot of talk about VLSI wires being a problem: Delay Noise coupling What are the characteristics of chip wires? How do they compare to scaled gates? And what does it really mean? To CAD developers To architects A very popular figure

MAH

VLSI Scaling for Architects

How Will Scaling Change Computer Design?


To answer this question: First look at what changes when technology scales Surprisingly less changes than you might think Components get faster (both wires and gates) Mostly it allows one to build more complex devices Then look at how computing devices use silicon technology How architects and circuit designers use the transistors What are the looming problems with scaling What can be done to help

Lets start by looking at scaling CMOS technology


MAH VLSI Scaling for Architects 3

Predicting the Future (without making a fool of yourself)


Is very difficult The only guarantee is:
The future will happen, and you will be wrong

Two approaches Think about limitations


SIA 1994 Roadmap
Limited oxide thickness, small clock frequency growth, etc. Industry hit points above the curve

Project from current trends


SIA 1997 Roadmap
Allow miracles to occur, continue trends Project clock rates higher than physically possible

So use a range of technology scalings Better chance of covering the correct answer
MAH VLSI Scaling for Architects 4

Technology Scaling
In scaling there are really two issues Devices Can we build smaller devices What will their performance be Wires Try to avoid the wet noodle effect There is concern about our ability to scale both of these components

MAH

VLSI Scaling for Architects

Device Scaling Limits


Limitations to device scaling has been around since I started I started working in 3 nMOS, 22 years ago (actually bipolar) Worries were Short channel effect Punchthrough
drain control of current rather than gate

Hot electrons Parasitic resistances Now worries are a little different Oxide tunnel currents Punchthrough Parameter control Parasitic resistances
MAH VLSI Scaling for Architects 6

Transistor Scaling
People are building very short channel devices Shown are I-V curves for 15nm L pMOS And a short channel nMOS The structure is strange FinFET But you can make them work

MAH

VLSI Scaling for Architects

Logic Gate Speed


How does the speed of a gate depend on technology? Use a Fanout of 4 inverter metric

16

FO4
Measure the delay of an inverter with Cout/Cin = 4 Divide speed of a circuit by speed of FO4 inverter Get delay of circuit in measured in FO4 inverters Metric pretty stable, over process, temp, and voltage
MAH VLSI Scaling for Architects 8

Delay Tracking

Inverter Delays (log)

12

12

8.4 2

8.4 2

1.4 TT FF SS FS SF

1.4 2 2.5 3 3.5

Process Corner

Power Supply (V)

MAH

VLSI Scaling for Architects

Using FO4 Metric


Is very useful way to estimate/compare circuit/logic Can compare two designs and normalize out technology Can set lower bound on speed of a circuit block
If you know fanout of gate, you can estimate delay

Complete theory is called logical effort


Total Fanout = Electrical fanout * Logical Effort of gate

Use an adder as an example Fastest 64 bit adders run in around 7 FO4 delays
Dynamic adders Hasnt changed very much recently (HP at ISSCC in 96)

Can find bound on the delay


Need to have fanout of 64 for Cin L.E. has a 2 input XOR and some carry logic
MAH VLSI Scaling for Architects 10

Memory Performance
Can use FO4 delays to look at memory access time too. Delay is log(Size) for an optimized design. Wire delay is important at larger memories, but not a dominant factor. Partition array, use fewer thicker wires. Notice access times 10-20 FO4
MAH

40.0 Total Delay (no wire res.) Decoder Delay (no wire res.) Output Path Delay (no wire res.) Total Delay (with wire res.)

30.0

Delay ( fo4)

20.0

10.0

0.0 64Kb

256Kb

1Mb Size

4Mb

16Mb

VLSI Scaling for Architects

11

FO4 Inverter Delay Under Scaling


Device performance will scale FO4 delay has been linear with tech
Approximately 0.36 nS/m*Ldrawn at TT (0.5nS/m under worst-case conditions)
7 FO4 delay (nS) 5 3 1 2 1.5 1 0.5 0 0.36 * Ldrawn

Easy to predict gate performance We can measure them


Labs have built 0.04m devices

Key issue is voltage scaling


Need to scale Vdd for power Hard since Vth does not scale

Feature size (m)

Gate speed improvement might slow down


Vth issues and gate oxide limitations
MAH VLSI Scaling for Architects 12

Circuit Power
Is very much tied to voltage scaling If the power supply scales with technology For a fixed complexity circuit Power scales down as ^3 if you run as same frequency Power scales down as ^2 if you run it 1/ times faster Power scaling is a problem because Freq has been scaling at faster than 1/ Complexity of machine has been growing This will continue to be an issue in future chips Remember scaling the technology makes a chip lower power!
MAH VLSI Scaling for Architects 13

Wire Scaling
More uncertainty than transistor scaling Many options with complex trade-offs For each metal layer Need to set H, TT, TB, 1, 2, conductivity of the metal
TT SL SR ILDTop

2
W TB

Metal, ILDmiddle

1
MAH

ILDBot

VLSI Scaling for Architects

14

Wire Capacitance
Capacitance per micron is roughly constant Simple approx. = fringe (0.07fF/m today) + 4 parallel plates Depends only on the ratio of the parameters Coupling becomes a large issue with increased H/S ratio

1W/TT 2H/SL 2H/SR 1W/TB


MAH VLSI Scaling for Architects

ILDTop

Metal, ILDmiddle

ILDBot

15

Wire Resistance
Resistance is simpler R/m = /wh Scales up as the technology shrinks Main reason that wire height has not scaled much

H1

H1

2R W1

H1

1.4R

W1

W1

Tradeoffs between height, width and wire pitch

MAH

VLSI Scaling for Architects

16

Wire Layers
Not all wiring layers have the same characteristics Today have three types of levels M1
Finest pitch, highest resistance, local interconnection in a cell

M2-4?
Normal routing level, most of the wires

M5+
Thick coarse metal, used for global wires When scaling forces thinner metal, create new top layer

MAH

VLSI Scaling for Architects

17

Noise Issues
Two main noise sources for wires Capacitance coupling Inductive coupling Capacitance coupling is mostly a nearest neighbor issue High aspect ratio wires make this worse Real push for low- dielectric between wires Inductive coupling Is much more complex to analyze
Depends on where the return currents flow

Reduce these problem by design constraints


Gnd returns in buses, power and gnd planes (e.g. 21264)

MAH

VLSI Scaling for Architects

18

Scaling Global Wires

R gets quite a bit worse with scaling; C basically constant


Semi-global wire resistance, 1mm long 0.4 Kohms 0.3 0.2 0.1 0 0.25 0.18 0.13 0.1 0.07 0.05 0.035 Technology Ldrawn (um)
MAH VLSI Scaling for Architects

Semi-global wire capacitance, 1mm long 0.6 0.4 0.2 0 0.25 0.18 0.13 0.1 0.07 0.05 0.035 Technology Ldrawn (um)
19

Aggressive scaling Conservative scaling pF

Aggressive scaling Conservative scaling

Scaling Module Wires

R is basically constant, and C falls linearly with scaling


Semi-global wire resistance, scaled length 0.4 Kohms 0.3 0.2 0.1 0 0.25 0.18 0.13 0.1 0.07 0.05 0.035 Technology Ldrawn (um)
MAH VLSI Scaling for Architects

Semi-global wire capacitance, scaled length 0.6 0.4 0.2 0 0.25 0.18 0.13 0.1 0.07 0.05 0.035 Technology Ldrawn (um)
20

Aggressive scaling Conservative scaling pF

Aggressive scaling Conservative scaling

Module Wires
These wires scale fairly well:
Semi-global wire, scaled length 0.5 Wire delay/Gate delay 0.4 0.3 0.2 0.1 0 0.25 Aggressive scaling Conservative scaling

0.18

0.13

0.1

0.07

0.05

0.035

Technology Ldrawn (um)

Scaled wire delays stay pretty constant relative to gates Not a very big change
MAH VLSI Scaling for Architects 21

Is There a Module-Level Wire Problem?


This first cut seems to imply that scaled wires arent a problem Delay of these wires are scaling (mostly) with gate speed Long wires get worse, but pretty slowly So the job a designer (or CAD tool) see stays the same, right?

So what are we missing? Why are people working so hard on developing new tools?

Die complexity: what if the number of modules doubles?

MAH

VLSI Scaling for Architects

22

Implications of Complexity Growth

9 modules, 22 exceptions

19 modules, 49 exceptions 19 modules, 24 exceptions

What can we do? Larger design teams? Longer design times? Better tools that have fewer exceptions per module
MAH VLSI Scaling for Architects 23

Keeping Design Time Reasonable


Suppose we want the total CAD tool exceptions to stay constant At a module level, we need 1/2 as many exceptions The required threshold for exceptions will grow The delay of those lines increases dramatically
Length in gate pitches 300 200 100 100K gate module 75K gate module 50K gate module Current trend

THIS is important!

Is this important?
0 0.25 0.18 0.13 0.10 0.07 0.05 0.035

Technology Ldrawn (um)

So CAD tools need to handle increasingly longer wires


MAH VLSI Scaling for Architects 24

Global Wire Scaling


Now we examine global wire delay relative to gate delay
Semi-global wire, 1mm long 7 Wire delay/Gate delay 6 5 4 3 2 1 0 0.25 0.18 0.13 0.1 0.07 0.05 0.035 Aggressive scaling Conservative scaling

Technology Ldrawn (um)

Fixed-length wires, relative to gates, worsen by 2x per generation This is a big problem
MAH VLSI Scaling for Architects 25

Designer Responses
Wire engineering -- use wider wires or thicker wires Making wires wider will improve performance
Resistance = k/W; Capacitance = C0 + b*W RC = kC0/W + kb; b << C0 so increasing W helps quite a bit
Delay (nS) for a 1cm wire 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 4
MAH

M3 M5

10 12 Wire width in lambdas


VLSI Scaling for Architects

14

16
26

Designer Responses
Circuit solution -- use repeaters Break the wire into segments Delay becomes linear with length Signal velocity = k ( FO4 Rw Cw )1/2
Rw/2 Cw 4 Cw 4 Cw 4 Rw/2 Rw/2 Cw Cw 4 4 Cwload Cload C 4

MAH

VLSI Scaling for Architects

27

When to Add Repeaters


Repeaters have overhead and can introduce logic complexity Add when they reduce the delay; wire RC > 8 FO4 Add sooner to reduce noise coupling issues Plot for 0.25m minimum width wires:
14 12 Delay (FO4) 10 8 6 4 2 0 0
MAH

M3

Repeaters Repeaters

M5

10 Distance (mm)
VLSI Scaling for Architects

15

20
28

Signal Velocity for Repeated wires


Under SIA scaling, pretty constant over many generations Under conservative scaling, slow change at sub-0.1m techs Makes wire delay increase slowly
18 0 16 0 Delay (ps/mm) 14 0 12 0 10 0 80 60 40 20 0.25
MAH

M1 C on serva tive S IA M3

M5

0.2

0.15 Fe ature s ize (m )


VLSI Scaling for Architects

0.1

0.05
29

A Different View
Wires are not as bad as they are painted in the media If you take a chip and scale it Keeping the same exact design The ratio of the wire delay to gate delay will change slowly
Currently they both scale by roughly the scaling factor Wire get a little slower each generation

Not a big issue Key is the logical span of the wire remained constant The problem is if you make the chip more complex Wires in general need to connect up more gates Logical span increases Get slower Circuit Tricks (repeaters) help some, but not enough
MAH VLSI Scaling for Architects 30

The World is Growing


The problem associated with wires is really due to complexity Diagram shows the logical span you reach in a cycle It also show the logical span of a chip

Old view: a chip looks small to a wire

Logical chip size Distance I can go in 1 cycle New view: a chip looks really big to a wire How is this going to affect chip design?

MAH

VLSI Scaling for Architects

31

Computer Performance
100

ISPEC Performance vs Year


100

10

SpecInt95

1 80386 80486 Pentium Pentium II

ISPEC

ISPEC 10

0.1

0.01 Jan-85 Jan-88 Jan-91 Jan-94 Jan-97

1 1993

1994

1995

1996

1997

1998

1999

Year

How is scaling going to effect computer design? Use Hennessy Patterson Formula Delay = Number of Instructions * Cycles/Inst * Clock Period
Performance = Clock Frequency/CPI

Look at how these factors have been improving, and why


MAH VLSI Scaling for Architects 32

Computer Architects Job


Convert transistors to performance Use transistors to Exploit parallelism Or create it (speculate) Processor generations Simple machine
Reuse hardware

Pipelined
Separate hardware for each stage

Super-scalar
Multiple port mems, function units

Out-of-order
Mega-ports, complex scheduling

Speculation Each design has more logic to accomplish same task (but faster)
MAH VLSI Scaling for Architects 33

Architecture Scaling
Plot of IPC Compiler + IPC 1.5x / generation Used all known tricks OOO is old idea Uses lots of wires What next? Wider machines Threads Speculation
Guess answers to create parallelism

0.05 80386 80486 Pentium Pentium II

0.04

SpecInt95 / MHz

0.03

0.02

0.01

0.00 Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

Have high wire costs


MAH VLSI Scaling for Architects 34

Architecture Scaling Issues


Wire scaling is an issue for architecture Need to support a old program model Modest number of global resources
Registers, memory port

To execute in parallel Need to find the parallelism in instruction stream


Many instruction decoders, needing to communicate

Need to predict instruction stream well


Large memory for prediction tables

Need multiple functional units Need large ported central registerfiles This will not scale: Machines are already starting to use internal clustering
MAH VLSI Scaling for Architects 35

Clock Frequency
1000

MHz

100

80386 80486 Pentium Pentium II 10 Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

Most of performance comes from clock scaling Clock frequency double each generation Two factors contribute: technology (1.4x/gen), circuit design
MAH VLSI Scaling for Architects 36

Gates Per Clock


Clock speed has been scaling faster than base technology Number of FO4 delays in a cycle has been falling
100.00

80386 80486 Pentium Pentium II

10.00 Dec-83 Dec-86 Dec-89 Dec-92 Dec-95 Dec-98

Number of gates decrease 1.4x each generation Caused by: Faster circuit families (dynamic logic) Better optimization Better micro-architecture Better adder/mem arch All this generally requires more transistors
37

FO4 inverter delays / cycle


MAH

VLSI Scaling for Architects

Gates Per Clock Limits


Current SOA machines are at 16 FO4 gates per cycle Historical low values (Cray) were at this level Overhead for short tick machines grows rapidly Power
Increases clock power per logic function

Latency
Flops are already 10-20% of cycle today

Logic reach grows smaller


What fits in a cycle (how many bits/gates) decreases

Difficult to generate a clock at less than 8 FO4 gates Continued scaling of gates/clock will be hard Guessing slope will change soon
MAH VLSI Scaling for Architects 38

Performance Scaling
Remember processor performance plots used to have two lines
Mainframe

uP

Microprocessors and mainframes Mainframes had maxed out and improved at technology rate What will happen with microprocessors?

MAH

VLSI Scaling for Architects

39

Will Processor Performance Slow Down?


Yes and No Uniprocessor performance growth will slow down Lastest jump is getting to the 16ish FO4 cycles People will change the benchmarks to fix this problem More data parallel application
Multi-media / streaming applications

More threaded applications

Explicitly parallel machine scale quite nicely


MAH VLSI Scaling for Architects 40

VLSI Scaling Summary


Scaling allows people to build more complex machines That run faster too It does not to first order change the difficulty of module design Module wires will get worse, but only slowly You dont think to rethink your wires in your adder, memory
Or even your super-scalar processor core

It does let you design more modules Continued scaling of uniprocessor performance is getting hard Machines using global resources run into wire limitations Faster clock ticks (in FO4) is getting very hard
Power and logical reach issues in going much under 16 FO4s

Machines will have to become more explicitly parallel


MAH VLSI Scaling for Architects 41

You might also like