
Inclusive Language Commitment

• Arm is committed to making the language we use inclusive, meaningful, and respectful. Our goal is to
remove and replace non-inclusive language from our vocabulary to reflect our values and represent our
global ecosystem.
• Arm is working actively with our partners, standards bodies, and the wider ecosystem to adopt a
consistent approach to the use of inclusive language and to eradicate and replace offensive terms. We
recognise that this will take time. This course contains references to non-inclusive language; it will be
updated with newer terms as those terms are agreed and ratified with the wider community. We
recognise that some of you will be accustomed to using the previous terms and may not immediately
recognise their replacements. Please refer to the following examples:
• When introducing edge-triggered Flip-Flops, we will use the term ‘JK Flip-flop’ instead of ‘master-slave flip-flop’.

• Contact us at education@arm.com with questions or comments about this course. You can also report
non-inclusive and offensive terminology usage in Arm content at terms@arm.com.

1 © 2020 Arm Limited


CMOS VLSI Design

Lecture 1:
Introduction
Course Learning Objectives

At the end of this course, you should be able to:


• Design a custom digital integrated circuit using commercial CAD tools
• Interpret, analyze, and design cells and components at the schematic and layout levels
• Analyze delay and power consumption analytically and through simulation
• Select appropriate structures for combinational and sequential circuits, and datapath
and memory components
• Describe considerations for reliability and testability



Sample Course Schedule



This Lecture’s Learning Objectives
At the end of this lecture, you should be able to:
• Describe the properties of nMOS and pMOS transistors that enable them to be
used as digital switches.
• Draw schematics of simple logic gates using a combination
of nMOS and pMOS transistors.
• Sketch the cross-sections of a CMOS inverter.
• List the fabrication steps of a CMOS inverter.



Introduction
• Integrated circuits: many transistors on one chip.
• Very Large-Scale Integration (VLSI): bucketloads!
• Complementary Metal Oxide Semiconductor
• Fast, cheap, low power transistors
• Today: How to build your own simple CMOS chip
• CMOS transistors
• Building logic gates from transistors
• Transistor layout and fabrication
• Rest of the course: How to build a good CMOS chip



Silicon Lattice
• Transistors are built on a silicon substrate
• Silicon is a Group IV material
• Forms crystal lattice with bonds to four neighbors

[Figure: 3 × 3 grid of Si atoms, each bonded to its four nearest neighbors]



Dopants
• Silicon is a semiconductor
• Pure silicon has very few free carriers and conducts poorly
• Adding dopants increases the conductivity
• Group V: extra electron (n-type)
• Group III: missing electron, called hole (p-type)

[Figure: silicon lattice doped with an As atom (Group V, contributes a free electron) and a B atom (Group III, contributes a hole)]



p-n Junctions
• A junction between p-type and n-type semiconductor forms a diode.
• Current flows only in one direction

[Figure: p-n junction and diode symbol; the p-type side is the anode and the n-type side is the cathode]



nMOS Transistor
• Four terminals: gate, source, drain, body
• Symbol shows gate, source, drain
• Usually omits body
• Source and drain are logically indistinguishable

[Figure: nMOS cross-section. Polysilicon gate over SiO2, n+ source and drain, p bulk Si body]



nMOS Transistor
• Gate – oxide – body stack looks like a capacitor
• Gate and body are conductors
• Body is lightly doped p-type
• Gate is a metal or polycrystalline silicon (polysilicon)
• SiO2 (oxide) is an excellent insulator
• Called metal – oxide – semiconductor (MOS) capacitor
• Source and drain are heavily doped (n+) silicon

[Figure: nMOS cross-section. Polysilicon gate over SiO2, n+ source and drain, p bulk Si body]



nMOS Operation
• Body is usually tied to ground (0 V)
• When the gate is at a low voltage:
• P-type body is at low voltage
• Source-body and drain-body diodes are OFF
• No current flows, transistor is OFF

[Figure: nMOS cross-section with the gate at 0. The n+ source (S) and drain (D) are separated by p bulk Si, and no channel is present]



nMOS Operation Cont.
• When the gate is at a high voltage:
• Positive charge on gate of MOS capacitor
• Negative charge attracted to body
• Inverts a channel under gate to n-type
• Now current can flow through n-type silicon from source through channel to drain, transistor is ON

[Figure: nMOS cross-section with the gate at 1. An n-type channel under the gate connects the n+ source (S) and drain (D)]



pMOS Transistor
• Similar, but doping and voltages reversed
• Body tied to high voltage (VDD)
• Gate low: transistor ON
• Gate high: transistor OFF
• Bubble indicates inverted behavior

[Figure: pMOS cross-section. Polysilicon gate over SiO2, p+ source and drain, n bulk Si body]



Power Supply Voltage
• GND = 0 V
• In the 1980s, VDD = 5 V
• VDD has decreased in modern processes
• High VDD would damage modern tiny transistors
• Lower VDD saves power
• VDD = 3.3, 2.5, 1.8, 1.5, 1.2, 1.0, 0.8, 0.7, …
• Gradually scaling down as transistors shrink



Transistors as Switches
• We can view MOS transistors as electrically controlled switches
• Voltage at gate controls path from source to drain

Transistor   g = 0   g = 1
nMOS         OFF     ON
pMOS         ON      OFF

[Figure: switch symbols for nMOS and pMOS with terminals g (gate), d (drain), and s (source)]



CMOS Inverter

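A  Y
0  1
1  0

[Figure: CMOS inverter schematic with a pMOS between VDD and Y and an nMOS between Y and GND. With A = 0 the pMOS is ON and the nMOS is OFF, so Y = 1; with A = 1 the pMOS is OFF and the nMOS is ON, so Y = 0]
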
CMOS NAND Gate

A  B  Y
0  0  1
0  1  1
1  0  1
1  1  0

[Figure: CMOS NAND gate schematic with inputs A and B and output Y, annotated with the ON/OFF state of each transistor for every input combination]



CMOS NOR Gate

A B Y
0  0  1
0  1  0
1  0  0
1  1  0

[Figure: CMOS NOR gate schematic with inputs A and B and output Y]



3-input NAND Gate
• Y pulls low if ALL inputs are 1
• Y pulls high if ANY input is 0

[Figure: 3-input NAND gate schematic with inputs A, B, C and output Y]



CMOS Fabrication
• CMOS transistors are fabricated on silicon wafer
• Lithography process similar to printing press
• At each step, different materials are deposited or etched
• Easiest to understand by viewing both top and cross-section of wafer in a simplified
manufacturing process



Inverter Cross-section
• Typically use p-type substrate for nMOS transistors
• Requires n-well for body of pMOS transistors
• So pMOS p-type source/drain doesn’t short to p-type substrate

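[Figure: inverter cross-section on a p substrate. The nMOS transistor has n+ source/drain in the substrate; the pMOS transistor has p+ source/drain in the n well. Terminals A, GND, Y, and VDD are labelled. Legend: SiO2, n+ diffusion, p+ diffusion, polysilicon, metal1]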



Well and Substrate Taps
• Substrate must be tied to GND and n-well to VDD
• A metal contact to lightly doped semiconductor forms a poor connection called a Schottky diode
• Use heavily doped well and substrate contacts/taps

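[Figure: inverter cross-section with a p+ substrate tap next to the nMOS transistor and an n+ well tap in the n well next to the pMOS transistor; terminals A, GND, Y, and VDD are labelled]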



Inverter Mask Set
• Transistors and wires are defined by masks
• Cross-section taken along dashed line

[Figure: inverter layout (top view) showing the GND and VDD rails, the nMOS and pMOS transistors, the substrate tap, and the well tap]



Detailed Mask Views
• Six masks
• n-well
• Polysilicon
• n+ diffusion
• p+ diffusion
• Contact
• Metal

[Figure: one panel per mask: n well, Polysilicon, n+ Diffusion, p+ Diffusion, Contact, Metal]



Fabrication
• Chips are built in huge factories called fabs
• Contain clean rooms as large as football fields, costing billions of dollars

Courtesy of International
Business Machines Corporation.
Unauthorized use not permitted.



Fabrication Steps
• Start with blank wafer
• Build inverter from the bottom up
• First step will be to form the n-well
• Cover wafer with protective layer of SiO2 (oxide)
• Remove layer where n-well should be built
• Implant or diffuse n dopants into exposed wafer
• Strip off SiO2

p substrate



Oxidation
• Grow SiO2 on top of Si wafer
• 900 – 1200 ℃ with H2O or O2 in oxidation furnace

SiO2

p substrate



Photoresist
• Spin on photoresist
• Photoresist is a light-sensitive organic polymer
• Softens where exposed to light

Photoresist
SiO2

p substrate



Lithography
• Expose photoresist through n-well mask
• Strip off exposed photoresist

Photoresist
SiO2

p substrate



Etch
• Etch oxide with hydrofluoric acid (HF)
• Seeps through skin and eats bone, nasty stuff!!!
• Only attacks oxide where resist has been exposed

Photoresist
SiO2

p substrate



Strip Photoresist
• Strip off remaining photoresist
• Use a mixture of acids called piranha etch
• Necessary so that the resist doesn't melt in the next step

SiO2

p substrate



n-well
• n-well is formed with diffusion or ion implantation
• Diffusion
• Place wafer in furnace with arsenic gas
• Heat until As atoms diffuse into exposed Si
• Ion Implantation
• Blast wafer with a beam of As ions
• Ions blocked by SiO2, only enter exposed Si

SiO2

n well



Strip Oxide
• Strip off the remaining oxide using HF
• Back to bare wafer with n-well
• Subsequent steps involve similar series of steps

n well
p substrate



Polysilicon
• Deposit very thin layer of gate oxide
• < 20 Å (6-7 atomic layers)
• Chemical Vapor Deposition (CVD) of silicon layer
• Place wafer in furnace with Silane gas (SiH4)
• Forms many small crystals called polysilicon
• Heavily doped to be good conductor

Polysilicon
Thin gate oxide

n well
p substrate



Polysilicon Patterning
• Use the same lithography process to pattern polysilicon

Polysilicon

Polysilicon
Thin gate oxide

n well
p substrate



Self-Aligned Process
• Use oxide and masking to expose where n+ dopants should be diffused or implanted
• N-diffusion forms nMOS source, drain, and n-well contact

n well
p substrate



N-diffusion
• Pattern oxide and form n+ regions
• Self-aligned process where gate blocks diffusion
• Polysilicon is better than metal for self-aligned gates because it doesn’t melt during later
processing

n+ Diffusion

n well
p substrate



N-diffusion cont.
• Historically dopants were diffused
• Usually ion implantation today
• But regions are still called diffusion

n+ n+ n+

n well
p substrate



N-diffusion cont.
• Strip off oxide to complete patterning step

n+ n+ n+

n well
p substrate



P-Diffusion
• A similar set of steps forms p+ diffusion regions for the pMOS source and drain and the
substrate contact

p+ Diffusion

p+ n+ n+ p+ p+ n+

n well
p substrate



Contacts
• Now we need to wire together the devices
• Cover chip with thick field oxide
• Etch oxide where contact cuts are needed

Contact

Thick field oxide


p+ n+ n+ p+ p+ n+

n well
p substrate



Metallization
• Sputter on aluminum over whole wafer
• Pattern to remove excess metal, leaving wires

Metal

Metal
Thick field oxide
p+ n+ n+ p+ p+ n+

n well
p substrate



Layout
• Chips are specified with a set of masks
• Minimum dimensions of masks determine transistor size (and hence speed, cost, and
power)
• Feature size f = distance between source and drain
• Set by minimum width of polysilicon
• Feature size improves 30% every 3 years or so
• Normalize for feature size when describing design rules
• Express rules in terms of λ = f/2
• E.g., λ = 0.3 µm in 0.6 µm process



Simplified Design Rules
• Conservative rules to get you started



Inverter Layout
• Transistor dimensions specified as Width/Length
• Minimum size is 4 λ / 2 λ, sometimes called 1 unit
• In f = 0.6 µm process, this is 1.2 µm wide, 0.6 µm long
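
The λ arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `lambda_to_microns` is made up for illustration, and it assumes nothing beyond λ = f/2 as stated on the Layout slide.

```python
def lambda_to_microns(feature_size_um, width_lambda, length_lambda):
    """Convert a W/L given in units of lambda to microns, using lambda = f / 2."""
    lam_um = feature_size_um / 2
    return width_lambda * lam_um, length_lambda * lam_um

# Minimum-size transistor (4 lambda / 2 lambda) in the 0.6 um process from the slide.
width_um, length_um = lambda_to_microns(0.6, 4, 2)
print(f"W = {width_um:.1f} um, L = {length_um:.1f} um")
```

The same function gives dimensions in any other process by changing only the feature size, which is the point of normalizing design rules to λ.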



Summary
• MOS transistors are stacks of gate, oxide, silicon
• Act as electrically controlled switches
• Build logic gates out of switches
• Draw masks to specify layout of transistors

• Now you know everything necessary to start designing schematics and layout for a
simple chip!



CMOS VLSI Design

Lecture 2:
Circuits and
Layout
Learning Objectives
At the end of this lecture, you should be able to:
• Draw transistor-level schematics and layouts for complementary CMOS standard cells.
• Use time diagrams to describe the operation of D latch and D Flip-flop.
• Use stick diagrams to sketch and plan cell layouts.



A Brief History
• 1958: First integrated circuit
• Flip-flop using two transistors
• Built by Jack Kilby at Texas Instruments
• 2019
• Apple A13 microprocessor
– 8.5 billion transistors
• Samsung 8 TB Flash memory
– 2 trillion stacked transistors

[Figures: courtesy Texas Instruments; [Trinh09] © 2009 IEEE.]



Growth Rate
• 53% compound annual growth rate over 50 years
• No other technology has grown so fast so long
• Driven by miniaturization of transistors
• Smaller is cheaper, faster, lower in power!
• Revolutionary effects on society



Annual Sales
• >$10^{20}$ transistors manufactured in 2017
• 56 billion for every human on the planet

[Figure: Global Semiconductor Billings in billions of US$ (0 to 500) vs. year (1985 to 2020)]



Invention of the Transistor
• Vacuum tubes ruled in first half of 20th century
• Large, expensive, power-hungry, unreliable
• 1947: first point contact transistor
• John Bardeen and Walter Brattain at Bell Labs
• See Crystal Fire by Riordan, Hoddeson

[Figure: AT&T Archives. Reprinted with permission.]



Transistor Types
• Bipolar transistors
• npn or pnp silicon structure
• Small current into very thin base layer controls large currents between emitter and collector
• Base currents limit integration density
• Metal Oxide Semiconductor Field Effect Transistors
• nMOS and pMOS MOSFETs
• Voltage applied to insulated gate controls current between source and drain
• Low power allows very high integration



MOS Integrated Circuits
• 1970’s processes usually had only nMOS transistors
• Inexpensive, but consumed power while idle

[Figures: Intel 1101 256-bit SRAM and Intel 4004 4-bit µProc. Credits: [Vadasz69] © 1969 IEEE; Intel Museum, reprinted with permission.]

• 1980s-present: CMOS processes for low idle power


Moore’s Law: Then
• 1965: Gordon Moore plotted transistor counts on each chip
• Fit a straight line on semilog scale
• Transistor counts have doubled every 26 months

Integration Levels
SSI: 10 gates
MSI: 1000 gates
LSI: 10,000 gates
VLSI: >10k gates



And Now…



Feature Size
• Minimum feature size shrinking 30% every 2-3 years



Corollaries
• Many other factors grow exponentially
• E.g., clock frequency, processor performance

Intel processors, year and clock speed

Year   Clock speed (Hz)   Processor name
1971   740K               4004
1972   500K               8008
1982   6M                 80186
1982   8M                 80188
1993   66M                Pentium
1999   600M               Pentium III
2000   2G                 Pentium 4
2006   2.33G              Core 2 Duo
2008   3.2G               Core i7
2011   2.67G              Xeon E7
2013   4.4G               Intel "Haswell"

Complementary CMOS
• Complementary CMOS logic gates
• nMOS pull-down network
• pMOS pull-up network
• a.k.a. static CMOS



Series and Parallel
• nMOS: 1 = ON
• pMOS: 0 = ON
• Series: both must be ON
• Parallel: either can be ON



Conduction Complement
• Complementary CMOS gates always produce 0 or 1
• E.g., NAND gate
• Series nMOS: Y=0 when both inputs are 1
• Thus, Y=1 when either input is 0
• Requires parallel pMOS

• Rule of Conduction Complements


• Pull-up network is a complement of pull-down
• Parallel -> series, series -> parallel
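
The rule of conduction complements can be illustrated with a small switch-level model of the NAND gate. This is a sketch under simple assumptions: transistors are ideal switches, and the helper names (`nand2_pull_down`, `nand2_pull_up`, `gate_output`) are invented for this example.

```python
from itertools import product

def nand2_pull_down(a, b):
    # Two nMOS in series: the path to GND conducts only when both gates are 1.
    return a == 1 and b == 1

def nand2_pull_up(a, b):
    # Two pMOS in parallel: the path to VDD conducts when either gate is 0.
    return a == 0 or b == 0

def gate_output(pull_up_on, pull_down_on):
    # Exactly one network conducting gives a clean 1 or 0.
    if pull_up_on and not pull_down_on:
        return 1
    if pull_down_on and not pull_up_on:
        return 0
    # Both ON is contention ("X"); both OFF leaves the output floating ("Z").
    return "X" if pull_up_on else "Z"

nand2_table = {
    (a, b): gate_output(nand2_pull_up(a, b), nand2_pull_down(a, b))
    for a, b in product((0, 1), repeat=2)
}
for inputs, y in nand2_table.items():
    print(inputs, "->", y)
```

Because the pull-up condition is the logical complement of the pull-down condition, the "X" and "Z" cases never occur for this gate, which is why a complementary CMOS gate always produces 0 or 1.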



Compound Gates
• Compound gates can do any inverting function
• E.g.,
$Y = \overline{A \cdot B + C \cdot D}$ (AND-AND-OR-INVERT, AOI22)



Example: O3AI

$Y = \overline{(A + B + C) \cdot D}$



Signal Strength
• Strength of signal
• How closely it approximates an ideal voltage source
• VDD and GND rails are strongest 1 and 0
• nMOS pass strong 0
• But degraded or weak 1
• pMOS pass strong 1
• But degraded or weak 0
• Thus, nMOS are best for pull-down network



Pass Transistors
• Transistors can be used as switches



Transmission Gates
• Pass transistors produce degraded outputs
• Transmission gates pass both 0 and 1 well



Tristates
• Tristate buffer produces Z when not enabled



Nonrestoring Tristate
• Transmission gate acts as a tristate buffer
• Only two transistors
• But nonrestoring
– Noise on A is passed on to Y



Tristate Inverter
• Tristate inverter produces restored output
• Violates conduction complement rule
• Because we want a Z output



Multiplexers
• 2:1 multiplexer chooses between two inputs



Gate-Level Mux Design
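• $Y = S \cdot D_1 + \overline{S} \cdot D_0$ (too many transistors)
• How many transistors are needed?

[Figure: gate-level 2:1 mux with inputs D1, S, D0 and output Y, drawn twice; the second copy is annotated with per-gate transistor counts (4, 2, 4, 2, 4, 2, 2), which sum to 20]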



Transmission Gate Mux
• Nonrestoring mux uses two transmission gates
• Only 4 transistors
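
As a quick check of the mux function and the transistor counts quoted on these two slides, here is a minimal Python sketch. The function name `mux2` and the constant names are chosen for illustration; the counts are simply the numbers read off the slides.

```python
from itertools import product

def mux2(s, d0, d1):
    """2:1 mux for 0/1 integer inputs: Y = S*D1 + (not S)*D0."""
    return (s & d1) | ((1 - s) & d0)

# Per-gate transistor counts annotated on the gate-level figure.
GATE_LEVEL_TRANSISTORS = 4 + 2 + 4 + 2 + 4 + 2 + 2
# Two transmission gates, two transistors each, as stated on this slide.
TRANSMISSION_GATE_TRANSISTORS = 2 * 2

mux_table = {
    (s, d0, d1): mux2(s, d0, d1) for s, d0, d1 in product((0, 1), repeat=3)
}
print(GATE_LEVEL_TRANSISTORS, "vs", TRANSMISSION_GATE_TRANSISTORS, "transistors")
```

The transmission-gate version is much smaller, at the cost of being nonrestoring.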



Inverting Mux
• Inverting multiplexer
• Use compound AOI22
• Or a pair of tristate inverters
• Essentially the same thing
• Noninverting multiplexer adds an inverter



4:1 Multiplexer
• 4:1 mux chooses one of 4 inputs using two selects
• Two levels of 2:1 muxes
• Or four tristates



D Latch
• When CLK = 1, latch is transparent
• D flows through to Q like a buffer
• When CLK = 0, the latch is opaque
• Q holds its old value independent of D
• Aka transparent latch or level-sensitive latch



D Latch Design
• Multiplexer chooses D or old Q



D Latch Operation



D Flip-flop
• When CLK rises, D is copied to Q
• At all other times, Q holds its value
• Also known as positive edge-triggered flip-flop



D Flip-flop Design
• Built from two D latches: a primary latch and a secondary latch
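
The latch and flip-flop behavior described on these slides can be sketched as a small behavioral model in Python. It is an illustration only: the class names are invented, clock and data are 0/1 integers, and it assumes the first latch is transparent while CLK = 0 and the second while CLK = 1, which yields positive edge-triggered behavior.

```python
class DLatch:
    """Level-sensitive latch: transparent when clk = 1, opaque when clk = 0."""
    def __init__(self):
        self.q = 0

    def update(self, clk, d):
        if clk == 1:
            self.q = d      # transparent: D flows through to Q
        return self.q       # opaque: Q holds its old value

class DFlipFlop:
    """Positive edge-triggered flop built from two latches on opposite clock phases."""
    def __init__(self):
        self.first = DLatch()    # assumed transparent while CLK = 0
        self.second = DLatch()   # assumed transparent while CLK = 1

    def update(self, clk, d):
        mid = self.first.update(1 - clk, d)
        return self.second.update(clk, mid)

flop = DFlipFlop()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]   # (CLK, D) pairs
trace = [flop.update(clk, d) for clk, d in stimulus]
print(trace)
```

In the trace, Q changes only on the steps where CLK goes from 0 to 1; the change of D while CLK is high (third step) is ignored.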



D Flip-flop Operation



Race Condition
• Back-to-back flops can malfunction from clock skew
• Second flip-flop fires late
• Sees first flip-flop change and captures its result
• Called hold-time failure or race condition

[Figure: two back-to-back flops clocked by CLK1 and CLK2 (D to Q1 to Q2), with waveforms for CLK1, CLK2, Q1, and Q2]



Nonoverlapping Clocks
• Nonoverlapping clocks can prevent races
• As long as nonoverlap exceeds clock skew
• We will use them in this class for safe design
• Industry manages skew more carefully instead



Gate Layout
• Layout can be very time consuming
• Design gates to fit together nicely
• Build a library of standard cells
• Standard cell design methodology
• VDD and GND should abut (standard height)
• Adjacent gates should satisfy design rules
• nMOS at bottom and pMOS at top
• All gates include well and substrate contacts



Example: Inverter



Example: NAND3
• Horizontal n-diffusion and p-diffusion strips
• Vertical polysilicon gates
• Metal 1 VDD rail at top
• Metal 1 GND rail at bottom
• 32 λ by 40 λ



Stick Diagrams
• Stick diagrams help plan layout quickly
• Need not be scaled
• Draw with color pencils or dry-erase markers

[Figure: stick diagrams of INV (input A, output Y) and NAND3 (inputs A, B, C, output Y) between VDD and GND rails. Legend: metal1, poly, ndiff, pdiff, contact]



Wiring Tracks
• A wiring track is the space required for a wire
• 4 λ width, 4 λ spacing from neighbor = 8 λ pitch
• Transistors also consume one wiring track



Well spacing
• Wells must surround transistors by 6 λ
• Implies 12 λ between opposite transistor flavors
• Leaves room for one wire track



Area Estimation
• Estimate area by counting wiring tracks
• Multiply by 8 to express in λ

[Figure: stick diagram annotated with dimensions 40 λ and 32 λ]
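
Track counting reduces to a one-line conversion. The sketch below assumes the 8 λ pitch from the Wiring Tracks slide (4 λ wire width plus 4 λ spacing); the function name and the choice of 4 and 5 tracks, which reproduce the 32 λ by 40 λ NAND3 quoted earlier, are illustrative.

```python
TRACK_PITCH_LAMBDA = 8   # 4 lambda wire width + 4 lambda spacing

def tracks_to_lambda(tracks):
    """Convert a number of wiring tracks to a length in lambda."""
    return tracks * TRACK_PITCH_LAMBDA

# 4 tracks in one direction and 5 in the other.
side_a = tracks_to_lambda(4)
side_b = tracks_to_lambda(5)
area_lambda2 = side_a * side_b
print(f"{side_a} lambda by {side_b} lambda = {area_lambda2} lambda^2")
```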



Example: O3AI
• Sketch a stick diagram for O3AI and estimate area

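$Y = \overline{(A + B + C) \cdot D}$

[Figure: O3AI stick diagram with inputs A, B, C, D and output Y between VDD and GND rails, annotated "6 tracks = 48 λ" and "5 tracks = 40 λ"]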



Modern Design Rules
• Rules are expressed in nanometers, not λ
• Lithography becomes difficult when feature size is less than the
wavelength of light
• At 16 nm and below, design rules are highly restrictive
• Layers have preferred directions, and no bends are allowed
• Only certain widths are allowed
• Minimum area of each rectangle
• Complex rules for power busses
• Thousands of design rules
• Layout becomes the domain of full-time experts
• But the principles of layout remain valid



CMOS VLSI Design

Lecture 3:
Simple Processor Example
Learning Objectives
At the end of this lecture, you should be able to
• Describe and apply techniques for managing the design of complex systems.
• Describe the implementation of a simple processor at abstraction levels including:
Architecture, Microarchitecture, Logic Design, Circuit Design, Physical Design,
Verification & Test.



Outline
• Design Partitioning
• Simplified Processor Example
• Architecture
• Microarchitecture
• Logic Design
• Circuit Design
• Physical Design
• Fabrication, Packaging, Testing



Coping with Complexity
• How to design System-on-Chip?
• Many millions (even billions!) of transistors
• Tens to hundreds of engineers
• Structured Design
• Design Partitioning



Structured Design
• Hierarchy: Divide and Conquer
• Recursively divide system into modules
• Regularity
• Reuse modules wherever possible
• E.g., standard cell library
• Modularity: well-formed interfaces
• Allows modules to be treated as black boxes
• Locality
• Physical and temporal



Design Partitioning
• Architecture: User’s perspective, what does it do?
• Instruction set, registers
• ARM, MIPS, x86, Alpha, PIC, RISC-V, …
• Microarchitecture
• Single cycle, multicycle, pipelined, superscalar?
• Logic: how are functional blocks constructed
• Ripple carry, carry lookahead, carry select adders
• Circuit: how are transistors used
• Complementary CMOS, pass transistors, domino
• Physical: chip layout
• Datapaths, memories, random logic



Simplified Architecture
• Example: simplified processor architecture
• Drawn from DDCA ARM Ed. (Harris & Harris)
• ARM is a 32-bit architecture with 16 registers
• Consider 8-bit subset using 8-bit datapath for simplified processor
• Only implement 8 registers (R0 - R7)
• 8-bit program counter in R7
• You’ll build this simplified processor in the labs
• Illustrate the key concepts in VLSI design



Instruction Set

• Data Processing
– ADD, SUB, AND, OR
– Each accepts register/immediate 2nd sources
• Memory
– LDR, STR
– Positive immediate offset
• Branch
– B



Instruction Encoding
• 32-bit instruction encoding
• Requires four cycles to fetch on 8-bit datapath
• Three instruction formats shown below
• Distinguished by their op fields
• Rd: destination register
• Rn: first source register
• Rm: second source register
• Cond: conditional execution
• Imm: immediate
cmd instr
0100 ADD
0010 SUB
0000 AND
1100 OR



Fibonacci (C)
$f_0 = 1$; $f_{-1} = -1$
$f_n = f_{n-1} + f_{n-2}$
$f = 1, 1, 2, 3, 5, 8, 13, \ldots$



Fibonacci (Assembly)
# fib.asm
# Register usage: R0 = 0, R3 = n, R4 = f1, R5 = f2
# Put result in address 255
fib add R3, R0, #8 # initialize n = 8
add R4, R0, #1 # initialize f1 = 1
add R5, R0, #-1 # initialize f2 = -1
loop subs R3, R3, R0 # n = 0? (R3 - 0 leaves R3 unchanged and sets the flags)
beq done # then done
add R4, R4, R5 # f1 = f1 + f2
sub R5, R4, R5 # f2 = f1 – f2
add R3, R3, #-1 # n = n-1
b loop # repeat until done
done str R4, [R0, #255] # store result in adr 255
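
To see what the loop computes, the assembly can be mirrored line by line in Python. This is only a sketch of the arithmetic: the function name `fib_like_assembly` is made up, Python integers stand in for the registers, and 8-bit wraparound is ignored because the values stay small.

```python
def fib_like_assembly(n=8):
    """Mirror of fib.asm: f1 and f2 play the roles of R4 and R5, n of R3."""
    f1, f2 = 1, -1          # initialize f1 = 1, f2 = -1
    while n != 0:           # n = 0? then done
        f1 = f1 + f2        # f1 = f1 + f2
        f2 = f1 - f2        # f2 = f1 - f2 (the previous f1)
        n = n - 1           # n = n - 1
    return f1               # the value the program stores at address 255

result = fib_like_assembly(8)
print(result)
```

With n = 8 the successive values of f1 are 0, 1, 1, 2, 3, 5, 8, 13, so the program stores 13.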



ARM Microarchitecture
• Multicycle microarchitecture (Harris & Harris, DDCA ARM Ed.)



Multicycle Controller



Multicycle FSM



Logic Design
• Start at top level
• Hierarchically decompose simplified processor into units



Hierarchical Design

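[Figure: design hierarchy tree for the simplified processor (top-level block labelled mips in the figure). Blocks shown: controller, alucontrol, datapath, standard cell library, bitslice, zipper, alu, inv4x, flop, ramslice, fulladder, or2, and2, mux4, nor2, inv, nand2, mux2, tri]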



HDLs
• Hardware Description Languages
• Widely used in logic design
• Verilog and VHDL
• Describe hardware using code
• Document logic functions
• Simulate logic before building
• Synthesize code into gates and layout
– Requires a library of standard cells



Verilog Example
module fulladder(input a, b, c,
output s, cout);

sum s1(a, b, c, s);
carry c1(a, b, c, cout);
endmodule

module carry(input a, b, c,
output cout);

assign cout = (a&b) | (a&c) | (b&c);
endmodule
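
The carry expression is the majority function of its three inputs, which a short Python check makes explicit. This is a sketch, not part of the course code: the `sum` module is not shown on the slide, so the model assumes the usual full-adder sum s = a XOR b XOR c.

```python
from itertools import product

def carry(a, b, c):
    # Same expression as the Verilog assign statement.
    return (a & b) | (a & c) | (b & c)

def full_adder(a, b, c):
    # Assumption: the sum module (not shown on the slide) computes a XOR b XOR c.
    s = a ^ b ^ c
    return s, carry(a, b, c)

for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    print(a, b, c, "->", "s =", s, "cout =", cout)
```

For every input combination, 2·cout + s equals a + b + c, so the pair (cout, s) is the 2-bit binary sum of the three inputs.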



Circuit Design
• How should logic be implemented?
• NANDs and NORs vs. ANDs and ORs?
• Fan-in and fan-out?
• How wide should transistors be?
• These choices affect speed, area, power
• Logic synthesis makes these choices for you
• Good enough for many applications
• Hand-crafted circuits are still better



Example: Carry Logic
• assign cout = (a&b) | (a&c) | (b&c);

Transistors? Gate Delays?



Gate-level Netlist

module carry(input a, b, c,
output cout);

wire x, y, z;

and g1(x, a, b);
and g2(y, a, c);
and g3(z, b, c);
or g4(cout, x, y, z);
endmodule



Transistor-Level Netlist

module carry(input a, b, c,
output cout);

wire i1, i2, i3, i4, cn;

tranif1 n1(i1, 0, a);
tranif1 n2(i1, 0, b);
tranif1 n3(cn, i1, c);
tranif1 n4(i2, 0, b);
tranif1 n5(cn, i2, a);
tranif0 p1(i3, 1, a);
tranif0 p2(i3, 1, b);
tranif0 p3(cn, i3, c);
tranif0 p4(i4, 1, b);
tranif0 p5(cn, i4, a);
tranif1 n6(cout, 0, cn);
tranif0 p6(cout, 1, cn);
endmodule



SPICE Netlist
.SUBCKT CARRY A B C COUT VDD GND
MN1 I1 A GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN2 I1 B GND GND NMOS W=1U L=0.18U AD=0.3P AS=0.5P
MN3 CN C I1 GND NMOS W=1U L=0.18U AD=0.5P AS=0.5P
MN4 I2 B GND GND NMOS W=1U L=0.18U AD=0.15P AS=0.5P
MN5 CN A I2 GND NMOS W=1U L=0.18U AD=0.5P AS=0.15P
MP1 I3 A VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP2 I3 B VDD VDD PMOS W=2U L=0.18U AD=0.6P AS=1P
MP3 CN C I3 VDD PMOS W=2U L=0.18U AD=1P AS=1P
MP4 I4 B VDD VDD PMOS W=2U L=0.18U AD=0.3P AS=1P
MP5 CN A I4 VDD PMOS W=2U L=0.18U AD=1P AS=0.3P
MN6 COUT CN GND GND NMOS W=2U L=0.18U AD=1P AS=1P
MP6 COUT CN VDD VDD PMOS W=4U L=0.18U AD=2P AS=2P
CI1 I1 GND 2FF
CI3 I3 GND 3FF
CA A GND 4FF
CB B GND 4FF
CC C GND 2FF
CCN CN GND 4FF
CCOUT COUT GND 2FF
.ENDS



Physical Design
• Floorplan
• Standard cells
• Place & route
• Datapaths
• Slice planning
• Area estimation



Simplified Processor Floorplan



8-bit Processor Layout



Standard Cells
• Uniform cell height
• Uniform well height
• M1 VDD and GND rails
• M2 Access to I/Os
• Well/substrate taps
• Exploits regularity



Synthesized Controller
• Synthesize HDL into gate-level netlist
• Place & Route using standard cell library



Pitch Matching
• Synthesized controller area is mostly wires
• Design is smaller if wires run through/over cells
• Smaller = faster, lower power as well!
• Design snap-together cells for datapaths and arrays
• Plan wires into cells
• Connect by abutment
– Exploits locality
– Takes lots of effort

[Figure: array of abutting cells: four rows of A A A A B above one row of C C D]



Simple 8-bit Processor Datapath

• 8-bit datapath built from wordslices
• Zipper at top drives control signals to datapath

[Figure: datapath layout with labelled regions: Register File, Decoder, Zipper, Bitslice 7, Bitslice 1, Flop Wordslice, Adder Wordslice]



Slice Plans
• Slice plan for bitslice
• Cell ordering, dimensions, wiring tracks
• Arrange cells for wiring locality



Area Estimation
• Need area estimates to make floorplan
• Compare to another block you already designed
• Or estimate from transistor counts
• Budget room for large wiring tracks
• Your mileage may vary; derate by 2x for class.



Design Verification
• Fabrication is slow & expensive
• MOSIS 0.6 µm: $1000, 3 months
• 65 nm: $3M, 1 month
• Debugging chips is very hard
• Limited visibility into operation
• Prove design is right before building!
• Logic simulation
• Circuit simulation/formal verification
• Layout vs. schematic comparison
• Design & electrical rule checks
• Verification is >50% of effort on most chips!



Fabrication & Packaging
• Tapeout final layout
• Fabrication
• 6, 8, 12” wafers
• Optimized for throughput, not latency (10 weeks!)
• Cut into individual dice
• Packaging
• Bond gold wires from die I/O pads to package



Testing
• Test that chip operates
• Design errors
• Manufacturing errors
• A single dust particle or wafer defect kills a die
• Yields from 90% to <10%
• Depends on die size, maturity of process
• Test each part before shipping to customer



ARM1 Processor
• 1st generation ARM chip (1985)
• 24,800 transistors
• 3-stage pipeline
• 8 MHz
• 3 µm process
• 49 mm²
• 82-pin PLCC package
• Average power = 0.1 W
• 2-phase nonoverlapping clocks
• Built for Acorn Computer

[Figure: ARM1 die photo. © 1985 ARM Ltd.]



CMOS VLSI Design

Lecture 4:
CMOS Transistor Theory
Learning Objectives
At the end of this lecture, you should be able to:
• Use cross section diagrams to describe the characteristics of MOS transistors when
operating in cut off, linear and saturation regions.
• Derive the relationship between current and voltage (I-V) of the MOS device at cut off,
linear and saturation modes.
• Mathematically estimate the MOS gate capacitance.
• Describe the effect of diffusion capacitance on the terminals



Introduction
• So far, we have treated transistors as ideal switches
• An ON transistor passes a finite amount of current
• Depends on terminal voltages
• Derive current-voltage (I-V) relationships
• Transistor gate, source, drain all have capacitance
• $I = C\,(\Delta V / \Delta t) \rightarrow \Delta t = (C / I)\,\Delta V$
• Capacitance and current determine speed
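
A quick numeric example shows how capacitance and current set the switching time. The values below (10 fF, 100 µA, 1 V) are hypothetical round numbers chosen for illustration and are not taken from any particular process.

```python
def switching_time_s(capacitance_f, current_a, delta_v):
    """Delta-t = (C / I) * Delta-V for a constant charging current."""
    return capacitance_f / current_a * delta_v

# Hypothetical values: 10 fF load, 100 uA drive current, 1 V swing.
dt = switching_time_s(10e-15, 100e-6, 1.0)
print(f"{dt * 1e12:.0f} ps")
```

Halving the capacitance or doubling the current halves the delay, which is why both quantities are derived in the rest of this lecture.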



MOS Capacitor
• Gate and body form MOS capacitor
• Operating modes
• Accumulation
• Depletion
• Inversion
• Threshold voltage is when inversion begins



Terminal Voltages
• Mode of operation depends on $V_g$, $V_d$, $V_s$
• $V_{gs} = V_g - V_s$
• $V_{gd} = V_g - V_d$
• $V_{ds} = V_d - V_s = V_{gs} - V_{gd}$
• Source and drain are symmetric diffusion terminals
• By convention, source is terminal at lower voltage
• Hence, $V_{ds} \ge 0$
• nMOS body is grounded. First, assume source is 0 too.
• Three regions of operation
• Cutoff
• Linear
• Saturation

[Figure: transistor symbol annotated with the terminal voltages $V_g$, $V_s$, $V_d$ and the differences $V_{gs}$, $V_{gd}$, $V_{ds}$]



nMOS Cutoff
• Vgs < Vt
• No inversion, no channel
• Ids ≈ 0



nMOS Linear
• Vgs > Vt
• But Vds small
• Channel forms
• Current flows from d to s
• e- from s to d
• Ids increases with Vds
• Similar to linear resistor



nMOS Saturation
• Vgs > Vt
• Vds > Vgs - Vt
• Channel pinches off
• Ids independent of Vds
• We say current saturates
• Similar to current source



I-V Characteristics
• In Linear region, Ids depends on
• How much charge is in the channel?
• How fast is the charge moving?



Channel Charge
• MOS structure looks like a parallel plate capacitor while operating in inversion
• Gate – oxide – channel
• $Q_{channel} = CV$
• $C = C_g = \varepsilon_{ox} W L / t_{ox} = C_{ox} W L$, where $C_{ox} = \varepsilon_{ox} / t_{ox}$
• $V = V_{gc} - V_t = (V_{gs} - V_{ds}/2) - V_t$



Carrier velocity
• Charge is carried by e-
• Electrons are propelled by the lateral electric field between source and drain
• E = Vds/L
• Carrier velocity v proportional to lateral E-field
• $v = \mu E$, where $\mu$ is called mobility
• Time for carrier to cross channel:
• t=L/v



nMOS Linear I-V
• Now we know
• How much charge Qchannel is in the channel
• How much time (t) each carrier takes to cross

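$$I_{ds} = \frac{Q_{channel}}{t} = \mu C_{ox} \frac{W}{L} \left( V_{gs} - V_t - \frac{V_{ds}}{2} \right) V_{ds} = \beta \left( V_{gs} - V_t - \frac{V_{ds}}{2} \right) V_{ds}, \qquad \beta = \mu C_{ox} \frac{W}{L}$$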



nMOS Saturation I-V
• If Vgd < Vt, channel pinches off near drain
• When Vds > Vdsat = Vgs – Vt
• Now drain voltage no longer increases current

$$I_{ds} = \beta\left(V_{gs} - V_t - \frac{V_{dsat}}{2}\right)V_{dsat} = \frac{\beta}{2}\left(V_{gs} - V_t\right)^2$$



nMOS I-V Summary
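• Shockley 1st order transistor models

$$
I_{ds} = \begin{cases}
0 & V_{gs} < V_t \quad \text{(cutoff)} \\
\beta\left(V_{gs} - V_t - \frac{V_{ds}}{2}\right)V_{ds} & V_{ds} < V_{dsat} \quad \text{(linear)} \\
\frac{\beta}{2}\left(V_{gs} - V_t\right)^2 & V_{ds} > V_{dsat} \quad \text{(saturation)}
\end{cases}
$$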
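The three regions of the Shockley model can be collected into one function. The following is a minimal Python sketch; the name `shockley_ids` and its argument names are illustrative choices, not part of the lecture.

```python
def shockley_ids(vgs, vds, vt, beta):
    """First-order (Shockley) nMOS drain current.

    Assumes vds >= 0, i.e. the source is the lower-voltage terminal.
    The result is in the units of beta times volts squared.
    """
    if vgs < vt:
        return 0.0                                  # cutoff: no channel
    vdsat = vgs - vt                                # saturation voltage
    if vds < vdsat:
        return beta * (vgs - vt - vds / 2) * vds    # linear region
    return 0.5 * beta * (vgs - vt) ** 2             # saturation region
```

The linear and saturation expressions agree at Vds = Vdsat, so the current is continuous across the region boundary.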



Example
• We will be using a 0.6 µm process for your project
• From AMI Semiconductor
• tox = 100 Å
• µ = 350 cm²/V·s
• Vt = 0.7 V
• Plot Ids vs. Vds
• Vgs = 0, 1, 2, 3, 4, 5
• Use W/L = 4/2 λ

$$\beta = \mu C_{ox}\frac{W}{L} = (350)\left(\frac{3.9 \times 8.85 \times 10^{-14}}{100 \times 10^{-8}}\right)\frac{W}{L} = 120\,\frac{W}{L}\ \mu\text{A/V}^2$$

[Figure: Ids (mA, 0 to 2.5) vs. Vds (0 to 5 V) for Vgs = 1, 2, 3, 4, 5]
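As a check on the arithmetic in this example, the sketch below recomputes β from the listed process values and evaluates the Shockley model at each Vgs. The variable and function names are illustrative, not from the lecture.

```python
EPS0 = 8.85e-14      # permittivity of free space, F/cm
K_OX = 3.9           # relative permittivity of SiO2
TOX_CM = 100e-8      # tox = 100 angstroms, expressed in cm
MU = 350.0           # electron mobility, cm^2/(V*s)
VT = 0.7             # threshold voltage, V
W_OVER_L = 4 / 2     # W/L = 4/2 lambda

cox = K_OX * EPS0 / TOX_CM          # gate capacitance per area, F/cm^2
beta_unit = MU * cox                # beta for W/L = 1, in A/V^2
beta = beta_unit * W_OVER_L         # beta for the 4/2 device, in A/V^2

def ids(vgs, vds):
    """Shockley drain current in amperes for the 4/2 lambda device."""
    if vgs < VT:
        return 0.0
    if vds < vgs - VT:
        return beta * (vgs - VT - vds / 2) * vds
    return 0.5 * beta * (vgs - VT) ** 2

for vgs in range(6):
    print(f"Vgs = {vgs} V: Ids(Vds = 5 V) = {ids(vgs, 5.0) * 1e3:.2f} mA")
```

The saturation current at Vgs = Vds = 5 V comes out a little above 2 mA, which matches the top curve of the plot.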



pMOS I-V
• All dopings and voltages are inverted for pMOS
• Source is the more positive terminal
• Mobility µp is determined by holes
• Typically 2-3x lower than that of electrons µn
• 120 cm²/V·s in AMI 0.6 µm process
• In advanced nodes, µp ≈ µn
• Thus, pMOS must be wider to provide same current
• In this class, assume µn / µp = 2
[Figure: pMOS Ids (mA, 0 to -0.8) vs. Vds (-5 to 0 V) for Vgs = -1, -2, -3, -4, -5]



Capacitance
• Any two conductors separated by an insulator have capacitance
• Gate to channel capacitor is very important
• Creates channel charge necessary for operation
• Source and drain have capacitance to body
• Across reverse-biased diodes
• Called diffusion capacitance because it is associated with source/drain diffusion



Gate Capacitance
• Approximate channel as connected to source
• Cgs = εoxWL/tox = CoxWL = CpermicronW
• Cpermicron is typically about 2 fF/µm
[Figure: nMOS cross-section with polysilicon gate of width W and length L over SiO2 gate oxide of thickness tox (good insulator, εox = 3.9ε0), n+ source and drain, p-type body]



Diffusion Capacitance
• Source/drain diffusion to body Csb, Cdb
• Undesirable, called parasitic capacitance
• Capacitance depends on area and perimeter
• Use small diffusion nodes
• Comparable to Cg for contacted diffusion
• ½ Cg for uncontacted diffusion
• Varies with process



CMOS VLSI Design

Lecture 5:
Nonideal Transistor Theory
Learning Objectives
At the end of this lecture, you should be able to:
• Use suitable equations to describe nonideal transistor characteristics due to high-field
effects, channel length modulation, threshold voltage effects, and leakage.
• Explain sources and impacts of process and environmental variations on the transistor
operation.



Outline
• Nonideal Transistor Behavior
• High Field Effects
– Mobility Degradation
– Velocity Saturation
• Channel Length Modulation
• Threshold Voltage Effects
– Body Effect
– Drain-Induced Barrier Lowering
– Short Channel Effect
• Leakage
– Subthreshold Leakage
– Gate Leakage
– Junction Leakage
• Process and Environmental Variations



Ideal Transistor I-V
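• Shockley long-channel transistor models

$$
I_{ds} = \begin{cases}
0 & V_{gs} < V_t \quad \text{(cutoff)} \\
\beta\left(V_{gs} - V_t - \frac{V_{ds}}{2}\right)V_{ds} & V_{ds} < V_{dsat} \quad \text{(linear)} \\
\frac{\beta}{2}\left(V_{gs} - V_t\right)^2 & V_{ds} > V_{dsat} \quad \text{(saturation)}
\end{cases}
$$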



Ideal vs. Simulated nMOS I-V Plot

• 65 nm IBM process, VDD = 1.0 V


[Figure: simulated vs. ideal Ids (µA, 0 to 1200) vs. Vds (0 to 1 V) for Vgs = 0.4, 0.6, 0.8, 1.0]
• Velocity saturation & mobility degradation: Ion lower than ideal model predicts
• Ion = 747 µA @ Vgs = Vds = VDD
• Channel length modulation: saturation current increases with Vds
• Velocity saturation & mobility degradation: saturation current increases less than quadratically with Vgs



ON and OFF Current

• Ion = Ids @ Vgs = Vds = VDD
• Saturation
• Ioff = Ids @ Vgs = 0, Vds = VDD
• Cutoff
[Figure: Ids (µA, 0 to 1000) vs. Vds (0 to 1 V) for Vgs = 0.4, 0.6, 0.8, 1.0, marking Ion = 747 µA @ Vgs = Vds = VDD]



Electric Field Effects
• Vertical electric field: Evert = Vgs / tox
• Attracts carriers into channel
• Long channel: Qchannel proportional to Evert
• Lateral electric field: Elat = Vds / L
• Accelerates carriers (electrons in an nMOS) from source to drain
• Long channel: v = µElat



Coffee Cart Analogy
• Tired student runs from VLSI lab to coffee cart
• Freshmen are pouring out of the physics lecture hall
• Vds is how long you have been up
• Your velocity = fatigue × mobility
• Vgs is a wind blowing you against the glass (SiO2) wall
• At high Vgs, you are buffeted against the wall
• Mobility degradation
• At high Vds, you scatter off freshmen, fall down, get up
• Velocity saturation
– Don’t confuse this with the saturation region



Mobility Degradation
• High Evert effectively reduces mobility
• Collisions with oxide interface



Velocity Saturation
• At high Elat, carrier velocity rolls off
• Carriers scatter off atoms in silicon lattice
• Velocity reaches vsat
– Electrons: 10⁷ cm/s
– Holes: 8 × 10⁶ cm/s
• Better model



Vel Sat I-V Effects
• Ideal transistor ON current increases with VDD²

$$I_{ds} = \mu C_{ox}\frac{W}{L}\frac{\left(V_{gs} - V_t\right)^2}{2} = \frac{\beta}{2}\left(V_{gs} - V_t\right)^2$$

• Velocity-saturated ON current increases with VDD

• Real transistors are partially velocity saturated


• Approximate with α-power law model
• Ids scales with VDD^α
• 1 < α < 2 determined empirically (≈ 1.3 for 65 nm)



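α-Power Model

$$
I_{ds} = \begin{cases}
0 & V_{gs} < V_t \quad \text{(cutoff)} \\
I_{dsat}\dfrac{V_{ds}}{V_{dsat}} & V_{ds} < V_{dsat} \quad \text{(linear)} \\
I_{dsat} & V_{ds} > V_{dsat} \quad \text{(saturation)}
\end{cases}
$$

$$I_{dsat} = P_c\,\frac{\beta}{2}\left(V_{gs} - V_t\right)^{\alpha}, \qquad V_{dsat} = P_v\left(V_{gs} - V_t\right)^{\alpha/2}$$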

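The α-power model translates directly into code. This is a minimal sketch; the function name and the parameter names `pc`, `pv`, and `alpha` are illustrative, and their values would have to be fit empirically for a given process.

```python
def alpha_power_ids(vgs, vds, vt, beta, alpha, pc, pv):
    """Drain current under the alpha-power law model.

    alpha, pc, and pv are empirical fitting parameters; with
    alpha = 2 and pc = pv = 1 the saturation current reduces to the
    Shockley value (beta / 2) * (vgs - vt)**2.
    """
    if vgs < vt:
        return 0.0                                   # cutoff
    idsat = pc * (beta / 2) * (vgs - vt) ** alpha    # saturation current
    vdsat = pv * (vgs - vt) ** (alpha / 2)           # saturation voltage
    if vds < vdsat:
        return idsat * vds / vdsat                   # linear: straight line to Idsat
    return idsat                                     # saturation
```

Note that the linear region here is a straight line from the origin to (Vdsat, Idsat), which is simpler than the quadratic Shockley expression.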


Channel Length Modulation
• Reverse-biased p-n junctions form a depletion region
• Region between n and p with no carriers
• Width of depletion region, Ld, grows with reverse bias
• Leff = L – Ld
• Shorter Leff gives more current
• Ids increases with Vds
• Even in saturation
[Figure: nMOS cross-section with source at GND and gate and drain at VDD; the drain depletion region of width Ld shortens the channel from L to Leff; p-type bulk Si at GND]



Chan Length Mod I-V

$$I_{ds} = \frac{\beta}{2}\left(V_{gs} - V_t\right)^2\left(1 + \lambda V_{ds}\right)$$

• λ = channel length modulation coefficient


• not feature size
• Empirically fit to I-V characteristics



Threshold Voltage Effects
• Vt is Vgs for which the channel starts to invert
• Ideal models assumed Vt is constant
• Really depends (weakly) on almost everything else:
• Body voltage: Body Effect
• Drain voltage: Drain-Induced Barrier Lowering
• Channel length: Short Channel Effect



Body Effect

• Body is a fourth transistor terminal


• Vsb affects the charge required to invert the channel
• Increasing Vs or decreasing Vb increases Vt
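$$V_t = V_{t0} + \gamma\left(\sqrt{\phi_s + V_{sb}} - \sqrt{\phi_s}\right)$$
• Vt0 = nominal threshold voltage
• φs = surface potential at threshold
$$\phi_s = 2 v_T \ln\frac{N_A}{n_i}$$
• Depends on doping level NA
• And intrinsic carrier concentration ni
• γ = body effect coefficient

$$\gamma = \frac{t_{ox}}{\varepsilon_{ox}}\sqrt{2 q \varepsilon_{si} N_A} = \frac{\sqrt{2 q \varepsilon_{si} N_A}}{C_{ox}}$$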

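To get a feel for the size of the body effect, the sketch below evaluates φs, γ, and Vt for a few source-to-body voltages. The doping level, oxide thickness, and nominal threshold are assumed, illustrative values and are not taken from the lecture; the physical constants are standard room-temperature values for silicon.

```python
import math

Q = 1.602e-19            # electron charge, C
EPS0 = 8.85e-14          # permittivity of free space, F/cm
EPS_SI = 11.7 * EPS0     # permittivity of silicon, F/cm
EPS_OX = 3.9 * EPS0      # permittivity of SiO2, F/cm
NI = 1.45e10             # intrinsic carrier concentration of Si near 300 K, cm^-3
V_THERMAL = 0.026        # thermal voltage kT/q near 300 K, V

# Assumed, illustrative process values (not from the lecture):
N_A = 8e17               # body doping level, cm^-3
T_OX = 10.5e-8           # gate oxide thickness, cm (10.5 angstroms)
VT0 = 0.3                # nominal threshold voltage, V

phi_s = 2 * V_THERMAL * math.log(N_A / NI)                 # surface potential, V
gamma = (T_OX / EPS_OX) * math.sqrt(2 * Q * EPS_SI * N_A)  # body effect coefficient, V^0.5

def threshold(vsb):
    """Threshold voltage for a given source-to-body voltage vsb >= 0."""
    return VT0 + gamma * (math.sqrt(phi_s + vsb) - math.sqrt(phi_s))

for vsb in (0.0, 0.2, 0.6):
    print(f"Vsb = {vsb:.1f} V: Vt = {threshold(vsb):.3f} V")
```

Raising the source above the body (larger Vsb) raises Vt, which is the behavior the bullets on this slide describe.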


Body Effect Cont.
• For small source-to-body voltage, treat as linear



DIBL
• Electric field from drain affects channel
• More pronounced in small transistors where the drain is closer to the channel
• Drain-Induced Barrier Lowering
• Drain voltage also affects Vt

$$V_t' = V_t - \eta V_{ds}$$

• High drain voltage causes current to increase.



Short Channel Effect
• In small transistors, source/drain depletion regions extend into the channel
• Impacts the amount of charge required to invert the channel
• And thus makes Vt a function of channel length
• Short channel effect: Vt increases with L
• Some processes exhibit a reverse short channel effect in which Vt decreases with L



Leakage
• What about current in cutoff?
• Simulated results
• What differs?
• Current doesn’t go to 0 in cutoff



Leakage Sources
• Subthreshold conduction
• Transistors can’t abruptly turn ON or OFF
• Dominant source in contemporary transistors
• Gate leakage
• Tunneling through ultrathin gate dielectric
• Junction leakage
• Reverse-biased PN junction diode current



Subthreshold Leakage
• Subthreshold leakage exponential with Vgs

$$I_{ds} = I_{ds0}\, e^{\frac{V_{gs} - V_{t0} + \eta V_{ds} - k_\gamma V_{sb}}{n v_T}}\left(1 - e^{-\frac{V_{ds}}{v_T}}\right)$$
• n is process dependent
• typically 1.3-1.7
• Rewrite relative to Ioff on log scale

• S ≈ 100 mV/decade @ room temperature
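Because the leakage is exponential in Vgs with exponent Vgs/(n·vT), the gate voltage change needed for a tenfold change in current is S = n·vT·ln 10. The sketch below evaluates that standard relation over the quoted range of n; the function names and the 26 mV thermal voltage (room temperature) are assumptions made for illustration.

```python
import math

def subthreshold_slope(n, v_thermal=0.026):
    """Gate voltage (in volts) per decade of subthreshold current."""
    return n * v_thermal * math.log(10)

def leakage_reduction(delta_vgs, n, v_thermal=0.026):
    """Factor by which leakage drops when Vgs is lowered by delta_vgs volts."""
    return 10 ** (delta_vgs / subthreshold_slope(n, v_thermal))

for n in (1.3, 1.5, 1.7):
    print(f"n = {n}: S = {subthreshold_slope(n) * 1e3:.0f} mV/decade")
```

For n near the top of the typical range, S comes out close to the 100 mV/decade figure quoted on the slide.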



Gate Leakage
• Carriers tunnel through very thin gate oxides
• Exponentially sensitive to tox and VDD

• A and B are tech constants


• Greater for electrons
– So nMOS gates leak more
• Negligible for older processes (tox > 20 Å)
• Critically important at 65 nm and below (tox ≈ 10.5 Å)
From [Song01]



Junction Leakage
• Reverse-biased p-n junctions have some leakage
• Ordinary diode leakage
• Band-to-band tunneling (BTBT)
• Gate-induced drain leakage (GIDL)

[Figure: cross-section showing p+ and n+ diffusions in the n well and p substrate]



Diode Leakage
• Reverse-biased p-n junctions have some leakage

$$I_D = I_S\left(e^{\frac{V_D}{v_T}} - 1\right)$$
• At any significant negative diode voltage, ID = -Is
• Is depends on doping levels
• And area and perimeter of diffusion regions
• Typically <1 fA/µm² (negligible)



Band-to-Band Tunneling
• Tunneling across heavily doped p-n junctions
• Especially sidewall between drain & channel when halo doping is used to increase Vt
• Increases junction leakage to significant levels

• Xj: sidewall junction depth


• Eg: bandgap voltage
• A, B: tech constants



Gate-Induced Drain Leakage
• Occurs at overlap between gate and drain
• Most pronounced when drain is at VDD, gate is at a negative voltage
• Thwarts efforts to reduce subthreshold leakage using a negative gate voltage



Temperature Sensitivity
• Increasing temperature
• Reduces mobility
• Reduces Vt
• ION decreases with temperature
• IOFF increases with temperature

[Figure: Ids vs. Vgs curves for increasing temperature]



So What?
• So what if transistors are not ideal?
• They still behave like switches.
• But these effects matter for…
• Supply voltage choice
• Logical effort
• Quiescent power consumption
• Pass transistors
• Temperature of operation



Parameter Variation
• Transistors have uncertainty in parameters
• Process: Leff, Vt, tox of nMOS and pMOS
• Vary around typical (T) values
• Fast (F)
• Leff: short
• Vt: low
• tox: thin
• Slow (S): opposite
• Not all parameters are independent for nMOS and pMOS
[Figure: process corner plot of pMOS speed (slow to fast) vs. nMOS speed (slow to fast), with corners FF, SF, TT, FS, SS]



Environmental Variation
• VDD and T also vary in time and space
• Fast:
• VDD: high
• T: low

Corner   Voltage (V)   Temperature (°C)
F        1.98          0
T        1.8           70
S        1.62          125



Process Corners
• Process corners describe worst case variations
• If a design works in all corners, it will probably work for any variation.
• Describe corner with four letters (T, F, S)
• nMOS speed
• pMOS speed
• Voltage
• Temperature
[Figure: process corner plot of pMOS speed (slow to fast) vs. nMOS speed (slow to fast), with corners FF, SF, TT, FS, SS]



Important Corners
• Some critical simulation corners include

Purpose                nMOS   pMOS   VDD   Temp
Cycle time             S      S      S     S
Power                  F      F      F     F
Subthreshold leakage   F      F      F     S



CMOS VLSI Design

Lecture 6:
DC & Transient Response
Learning Objectives
At the end of this lecture, you should be able to:
• Explain threshold drop in pass transistor circuits.
• Graphically derive the DC response of a CMOS logic gate.
• Estimate the delay of logic gates using RC delay models.



Outline
• Pass Transistors
• DC Response
• Logic Levels and Noise Margins
• Transient Response
• RC Delay Models
• Delay Estimation



Pass Transistors
• We have assumed source is grounded
• What if source is >0?
• e.g., pass transistor passing VDD
• Vg = VDD
• If Vs > VDD-Vt, then Vgs < Vt
• Hence, transistor would turn itself off
• nMOS pass transistors pull no higher than VDD-Vtn
• Called a degraded “1”
• Approach degraded value slowly (low Ids)
• pMOS pass transistors pull no lower than Vtp
• Transmission gates are needed to pass both 0 and 1



Pass Transistor Ckts

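[Figure: pass transistor circuits with node voltages annotated: an nMOS with gate at VDD passing VDD gives Vs = VDD-Vtn; a pMOS passing VSS gives Vs = |Vtp|; nMOS pass transistors in series with gates at VDD still give VDD-Vtn; an nMOS whose gate is driven by a degraded VDD-Vtn gives VDD-2Vtn]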



DC Response
• DC Response: Vout vs. Vin for a gate
• E.g., Inverter
• When Vin = 0 -> Vout = VDD
• When Vin = VDD -> Vout = 0
• In between, Vout depends on transistor size and current
• By KCL, must settle such that Idsn = |Idsp|
• We could solve equations
• But graphical solution gives more insight
[Figure: CMOS inverter with input Vin, output Vout, and transistor currents Idsp and Idsn]



Transistor Operation
• Current depends on region of transistor behavior
• For what Vin and Vout are nMOS and pMOS in
• Cutoff?
• Linear?
• Saturation?



nMOS Operation

Cutoff        Linear               Saturated
Vgsn < Vtn    Vgsn > Vtn           Vgsn > Vtn
Vin < Vtn     Vin > Vtn            Vin > Vtn
              Vdsn < Vgsn – Vtn    Vdsn > Vgsn – Vtn
              Vout < Vin – Vtn     Vout > Vin – Vtn

• Vgsn = Vin
• Vdsn = Vout
[Figure: CMOS inverter with input Vin, output Vout, and transistor currents Idsp and Idsn]



pMOS Operation

Cutoff              Linear               Saturated
Vgsp > Vtp          Vgsp < Vtp           Vgsp < Vtp
Vin > VDD + Vtp     Vin < VDD + Vtp      Vin < VDD + Vtp
                    Vdsp > Vgsp – Vtp    Vdsp < Vgsp – Vtp
                    Vout > Vin – Vtp     Vout < Vin – Vtp

• Vgsp = Vin – VDD
• Vdsp = Vout – VDD
• Vtp < 0
[Figure: CMOS inverter with input Vin, output Vout, and transistor currents Idsp and Idsn]



I-V Characteristics
• Make pMOS wider than nMOS such that βn = βp
[Figure: nMOS curves Idsn vs. Vdsn (0 to VDD) for Vgsn1 to Vgsn5, and pMOS curves -Idsp vs. -Vdsp (0 to -VDD) for Vgsp1 to Vgsp5]



Current vs. Vout, Vin

[Figure: Idsn and |Idsp| vs. Vout (0 to VDD) for inputs Vin0 to Vin5]



Load Line Analysis
• For a given Vin:
• Plot Idsn, Idsp vs. Vout
• Vout must be where the current magnitudes are equal
[Figure: Idsn and |Idsp| vs. Vout (0 to VDD) for inputs Vin0 to Vin5, beside the inverter schematic with Vin, Vout, Idsp, and Idsn]



Load Line Analysis
• Vin = 0, 0.2 VDD, 0.4 VDD, 0.6 VDD, 0.8 VDD, VDD
[Figure: Idsn and |Idsp| vs. Vout (0 to VDD), with the load-line intersection shown for each value of Vin]



DC Transfer Curve
• Transcribe points onto Vin vs. Vout plot

[Figure: load-line plot beside the resulting DC transfer curve, Vout vs. Vin, with points Vin0 to Vin5, regions A to E, and Vin axis marks 0, Vtn, VDD/2, VDD+Vtp, VDD]



Operating Regions
• Revisit transistor operating regions
Region   nMOS         pMOS
A        Cutoff       Linear
B        Saturation   Linear
C        Saturation   Saturation
D        Linear       Saturation
E        Linear       Cutoff

[Figure: inverter schematic and DC transfer curve, Vout vs. Vin, with regions A to E and Vin axis marks 0, Vtn, VDD/2, VDD+Vtp, VDD]



Beta Ratio
• If βp / βn ≠ 1, switching point will move from VDD/2
• Called skewed gate
• Other gates: collapse into equivalent inverter

[Figure: Vout vs. Vin transfer curves (0 to VDD) for βp/βn = 10, 2, 1, 0.5, 0.1]



Noise Margins
• How much noise can a gate input see before it does not recognize the input?

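[Figure: output characteristics (logical high output range from VOH to VDD, logical low output range from GND to VOL) beside input characteristics (logical high input range above VIH, indeterminate region between VIL and VIH, logical low input range below VIL); NMH spans VIH to VOH and NML spans VOL to VIL]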



Logic Levels
• To maximize noise margins, select logic levels at
• unity gain point of DC transfer characteristic

[Figure: inverter DC transfer curve for βp/βn > 1, with unity gain points (slope = -1) defining VIL, VIH, VOH, and VOL; Vin axis marks 0, Vtn, VIL, VIH, VDD-|Vtp|, VDD]



Transient Response
• DC analysis tells us Vout if Vin is constant
• Transient analysis tells us Vout(t) if Vin(t) changes
• Requires solving differential equations
• Input is usually considered to be a step or ramp
• From 0 to VDD or vice versa



Inverter Step Response
• E.g., find step response of inverter driving load cap

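$$V_{in}(t) = u(t - t_0)\,V_{DD}$$

$$V_{out}(t < t_0) = V_{DD}$$

$$\frac{dV_{out}(t)}{dt} = -\frac{I_{dsn}(t)}{C_{load}}$$

$$
I_{dsn}(t) = \begin{cases}
0 & t \le t_0 \\
\frac{\beta}{2}\left(V_{DD} - V_t\right)^2 & V_{out} > V_{DD} - V_t \\
\beta\left(V_{DD} - V_t - \frac{V_{out}(t)}{2}\right)V_{out}(t) & V_{out} < V_{DD} - V_t
\end{cases}
$$

[Figure: inverter driving Cload with current Idsn(t); Vin(t) steps at t0 and Vout(t) falls]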



Delay Definitions
• tpdr: rising propagation delay
• From input to rising output crossing VDD/2
• tpdf: falling propagation delay
• From input to falling output crossing VDD/2
• tpd: average propagation delay
• tpd = (tpdr + tpdf)/2
• tr: rise time
• From output crossing 0.2 VDD to 0.8 VDD
• tf: fall time
• From output crossing 0.8 VDD to 0.2 VDD



Delay Definitions
• tcdr: rising contamination delay
• Minimum time from input to rising output crossing VDD/2
• tcdf: falling contamination delay
• Minimum time from input to falling output crossing VDD/2
• tcd: average contamination delay
• tcd = (tcdr + tcdf)/2



Simulated Inverter Delay
• Solving differential equations by hand is too hard
• SPICE simulator solves the equations numerically
• Uses more accurate I-V models too!
• But simulations take time to write, may hide insight

[Figure: SPICE waveforms of Vin and Vout (0 to 2.0 V) vs. t (0 to 1 ns); tpdf = 66 ps, tpdr = 83 ps]



Delay Estimation
• We would like to be able to easily estimate delay
• Not as accurate as simulation
• But easier to ask “What if?”
• The step response usually looks like a 1st order RC response with a decaying
exponential.
• Use RC delay models to estimate delay
• C = total capacitance on output node
• Use effective resistance R
• So that tpd = RC
• Characterize transistors by finding their effective R
• Depends on average current as gate switches



Effective Resistance
• Shockley models have limited value
• Not accurate enough for modern transistors
• Too complicated for much hand analysis
• Simplification: treat transistor as resistor
• Replace Ids(Vds, Vgs) with effective resistance R
– Ids = Vds/R
• R averaged across switching of digital gate
• Too inaccurate to predict current at any given time
• But good enough to predict RC delay



RC Delay Model
• Use equivalent circuits for MOS transistors
• Ideal switch + capacitance and ON resistance
• Unit nMOS has resistance R, capacitance C
• Unit pMOS has resistance 2R, capacitance C
• Capacitance proportional to width
• Resistance inversely proportional to width



RC Values
• Capacitance
• C = Cg = Cs = Cd = 2 fF/µm of gate width in 0.6 µm
• Gradually decline to 1 fF/µm in nanometer techs.
• Resistance
• R ≈ 6 kΩ·µm in 0.6 µm process
• Reduces with shorter channel lengths
• Unit transistors
• May refer to minimum contacted device (4/2 λ)
• Or maybe 1 µm wide device
• Doesn’t matter as long as you are consistent



Inverter Delay Estimate
• Estimate the delay of a fanout-of-1 inverter



Delay Model Comparison



Example: 3-input NAND
• Sketch a 3-input NAND with transistor widths chosen to achieve
effective rise and fall resistances equal to a unit inverter (R).



3-input NAND Caps
• Annotate the 3-input NAND gate with gate and diffusion capacitance.



Elmore Delay
• ON transistors look like resistors
• Pullup or pulldown network modeled as RC ladder
• Elmore delay of RC ladder

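$$t_{pd} \approx \sum_{\text{nodes } i} R_{i\text{-to-source}}\, C_i = R_1 C_1 + \left(R_1 + R_2\right) C_2 + \dots + \left(R_1 + R_2 + \dots + R_N\right) C_N$$

[Figure: RC ladder with series resistors R1, R2, R3, ..., RN and a capacitor C1, C2, C3, ..., CN to ground at each node]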
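The Elmore sum for an RC ladder is a running total of the resistance back to the source, multiplied by each node capacitance in turn. A minimal Python sketch follows; the function name and list-based interface are illustrative.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder.

    resistances[i] is the series resistance between node i-1 and node i
    (the first one connects to the source); capacitances[i] is the
    capacitance to ground at node i. Units multiply through, so ohms
    and farads give seconds.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistor")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r            # total resistance from the source to this node
        delay += r_to_source * c    # this node's contribution
    return delay
```

For example, `elmore_delay([1, 1, 1], [1, 1, 1])` sums 1 + 2 + 3, since each successive capacitor sees one more resistor on its path to the source.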



Example: 3-input NAND
• Estimate worst-case rising and falling delay of 3-input NAND driving h
identical gates.

$$t_{pdf} = (3C)\left(\tfrac{R}{3}\right) + (3C)\left(\tfrac{R}{3} + \tfrac{R}{3}\right) + \left[(9 + 5h)C\right]\left(\tfrac{R}{3} + \tfrac{R}{3} + \tfrac{R}{3}\right) = (12 + 5h)RC$$

$$t_{pdr} = (9 + 5h)RC$$

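The falling-delay sum can be checked term by term with exact fractions. In this sketch the pulldown stack is three series nMOS transistors of resistance R/3 each, the two internal nodes carry 3C each, and the output carries (9 + 5h)C, as in the expression for this example; the function names are illustrative. Summing the three terms gives RC + 2RC + (9 + 5h)RC = (12 + 5h)RC.

```python
from fractions import Fraction

def nand3_falling_delay(h):
    """Worst-case falling delay of the 3-input NAND, in units of RC."""
    r_each = Fraction(1, 3)              # each series nMOS contributes R/3
    caps = [3, 3, 9 + 5 * h]             # internal, internal, output node (in C)
    delay = Fraction(0)
    r_to_source = Fraction(0)
    for c in caps:
        r_to_source += r_each
        delay += r_to_source * c
    return delay

def nand3_rising_delay(h):
    """Worst-case rising delay: one pMOS of resistance R charges the output."""
    return Fraction(9 + 5 * h)

for h in (0, 1, 4):
    print(f"h = {h}: tpdf = {nand3_falling_delay(h)} RC, tpdr = {nand3_rising_delay(h)} RC")
```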


Delay Components
• Delay has two parts
• Parasitic delay
– 9 or 12 RC
– Independent of load
• Effort delay
– 5h RC
– Proportional to load capacitance



Contamination Delay
• Best-case (contamination) delay can be substantially less
than propagation delay.
• E.g., if all three inputs fall simultaneously



Diffusion Capacitance
• We assumed contacted diffusion on every s/d.
• Good layout minimizes diffusion area
• E.g., NAND3 layout shares one diffusion contact
• Reduces output capacitance by 2C
• Merged uncontacted diffusion might help too



Layout Comparison
• Which layout is better?
• Left has less diffusion capacitance on Y

[Figure: two alternative layouts of the same gate, each with inputs A and B and output Y between VDD and GND rails]



CMOS VLSI Design

Lecture 7:
Logical Effort
Learning Objectives
At the end of this lecture, you should be able to:
• Use Logical Effort to estimate the delay of a logic gate and a combinational circuit path.
• Apply the method of Logical Effort to determine the best number of stages and best
topology to minimize delay of a path.



Introduction
• Chip designers face a bewildering array of choices
• What is the best circuit topology for a function?
• How many stages of logic give least delay?
• How wide should the transistors be?
• Logical effort is a method to make these decisions
• Uses a simple model of delay
• Allows back-of-the-envelope calculations
• Helps make rapid comparisons between alternatives
• Emphasizes remarkable symmetries



Example
• Help Ben Bitdiddle design the decoder for a register file.

• Decoder specifications:
• 16-word register file
• Each word is 32 bits wide
• Each bit presents load of 3 unit-sized transistors
• True and complementary address inputs A[3:0]
• Each input may drive 10 unit-sized transistors
[Figure: true and complementary A[3:0] drive a 4:16 decoder whose 16 outputs select words of a 16-word, 32-bit register file]
• Ben needs to decide:
• How many stages to use?
• How large should each gate be?
• How fast can decoder operate?



Delay in a Logic Gate
• Express delays in process-independent unit: $d = d_{abs} / \tau$
• $\tau = 3RC$ ≈ 3 ps in 65 nm process, 60 ps in 0.6 µm process
• Delay has two components: d = f + p
• f: effort delay = gh (aka stage effort)
• Again has two components
• g: logical effort
• Measures relative ability of gate to deliver current
• g = 1 for inverter
• h: electrical effort = Cout / Cin
• Ratio of output to input capacitance
• Sometimes called fanout
• p: parasitic delay
• Represents delay of gate driving no load
• Set by internal parasitic capacitance



Delay Plots
• d = f + p = gh + p
• Inverter: g = 1, p = 1, d = h + 1
• 2-input NAND: g = 4/3, p = 2, d = (4/3)h + 2
• What about NOR2?
[Figure: normalized delay d (0 to 6) vs. electrical effort h = Cout / Cin (0 to 5) for the inverter and 2-input NAND, marking effort delay f and parasitic delay p]



Computing Logical Effort
• DEF: Logical effort is the ratio of the input capacitance of a gate to the input
capacitance of an inverter delivering the same output current.
• Measure from delay vs. fanout plots
• Or estimate by counting transistor widths



Catalog of Gates
• Logical effort of common gates

Gate type       Number of inputs
                1     2      3          4              n
Inverter        1
NAND                  4/3    5/3        6/3            (n+2)/3
NOR                   5/3    7/3        9/3            (2n+1)/3
Tristate/mux    2     2      2          2              2
XOR, XNOR             4, 4   6, 12, 6   8, 16, 16, 8



Catalog of Gates
• Parasitic delay of common gates
• In multiples of pinv (≈1)

Gate type       Number of inputs
                1     2     3     4     n
Inverter        1
NAND                  2     3     4     n
NOR                   2     3     4     n
Tristate/mux    2     4     6     8     2n
XOR, XNOR             4     6     8



Example: Ring Oscillator
• Estimate the frequency of an N-stage ring oscillator

• Logical Effort: g = 1
• Electrical Effort: h = 1
• Parasitic Delay: p = 1
• Stage Delay: d = 2
• Frequency: fosc = 1/(2*N*d) = 1/4N
• 31 stage ring oscillator in 0.6 µm process has frequency of ~ 200 MHz



Example: FO4 Inverter
• Estimate the delay of a fanout-of-4 (FO4) inverter

• Logical Effort: g = 1
• Electrical Effort: h = 4
• Parasitic Delay: p = 1
• Stage Delay: d = 5
• The FO4 delay is about 300 ps in 0.6 µm process, 15 ps in a 65 nm process
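The stage delay model d = gh + p is simple enough to evaluate in one line. The sketch below reproduces the FO4 numbers using the τ values quoted earlier in this lecture (60 ps in 0.6 µm, 3 ps in 65 nm); the function and dictionary names are illustrative.

```python
def stage_delay(g, h, p):
    """Normalized stage delay d = gh + p, in units of tau."""
    return g * h + p

# tau = 3RC values quoted in this lecture, in picoseconds
TAU_PS = {"0.6 um": 60.0, "65 nm": 3.0}

fo4 = stage_delay(g=1, h=4, p=1)               # fanout-of-4 inverter
nand2_fo4 = stage_delay(g=4 / 3, h=4, p=2)     # 2-input NAND with h = 4, for comparison

for process, tau in TAU_PS.items():
    print(f"{process}: FO4 inverter delay = {fo4 * tau:.0f} ps")
```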





Multistage Logic Networks
• Logical effort generalizes to multistage networks
• Path Logical Effort: $G = \prod g_i$
• Path Electrical Effort: $H = \dfrac{C_{out\text{-}path}}{C_{in\text{-}path}}$
• Path Effort: $F = \prod f_i = \prod g_i h_i$
• Can we write F = GH?



Paths that Branch
• No! Consider paths that branch:

G =1
H = 90 / 5 = 18
GH = 18
h1 = (15 +15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH



Branching Effort
• Introduce branching effort
• Accounts for branching between stages in path

$$b = \frac{C_{\text{on path}} + C_{\text{off path}}}{C_{\text{on path}}}$$

• Now we compute the path effort
$$B = \prod b_i$$
• F = GBH
• Note: $\prod h_i = BH$

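The branching example can be replayed with exact fractions to confirm that F = GBH and that the product of stage electrical efforts equals BH. The capacitance values (5, 15 + 15, 90) are the ones used in the "Paths that Branch" slide; the helper name is illustrative.

```python
from fractions import Fraction

def branching_effort(c_on_path, c_off_path):
    """b = (C_on_path + C_off_path) / C_on_path for one stage."""
    return Fraction(c_on_path + c_off_path, c_on_path)

G = 1                                   # two inverters: g1 = g2 = 1
H = Fraction(90, 5)                     # path electrical effort
b1 = branching_effort(15, 15)           # first stage drives two 15-unit gates
b2 = branching_effort(90, 0)            # second stage has no side load
B = b1 * b2

h1 = Fraction(15 + 15, 5)               # stage electrical efforts
h2 = Fraction(90, 15)

F = G * B * H
print(f"B = {B}, GH = {G * H}, F = GBH = {F}, h1*h2 = {h1 * h2}")
```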


Multistage Delays
• Path Effort Delay: $D_F = \sum f_i$
• Path Parasitic Delay: $P = \sum p_i$
• Path Delay: $D = \sum d_i = D_F + P$



Designing Fast Circuits
$$D = \sum d_i = D_F + P$$
• Delay is smallest when each stage bears the same effort

$$\hat{f} = g_i h_i = F^{1/N}$$
• Thus, minimum delay of N stage path is
$$D = N F^{1/N} + P$$

• This is a key result of logical effort


• Find fastest possible delay
• Doesn’t require calculating gate sizes



Gate Sizes
• How wide should the gates be for the least delay?

$$\hat{f} = gh = g\,\frac{C_{out}}{C_{in}} \quad\Rightarrow\quad C_{in_i} = \frac{g_i\, C_{out_i}}{\hat{f}}$$

• Working backward, apply capacitance transformation to find input capacitance of each
gate given load it drives.
• Check work by verifying input cap spec is met.



Example: 3-stage path
• Select gate sizes x and y for least delay from A to B



Example: 3-stage path
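[Figure: three-stage path from A (input capacitance 8) through gates of size x and y to load B (45), with side branches of identical gates at each stage]
• Logical Effort: G = (4/3)*(5/3)*(5/3) = 100/27
• Electrical Effort: H = 45/8
• Branching Effort: B = 3 * 2 = 6
• Path Effort: F = GBH = 125
• Best Stage Effort: $\hat{f} = \sqrt[3]{F} = 5$
• Parasitic Delay: P = 2 + 3 + 2 = 7
• Delay: D = 3*5 + 7 = 22 = 4.4 FO4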
Example: 3-stage path
• Work backward for sizes
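y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10

[Figure: the sized path from A to B: first gate with P: 4, N: 4; gates of size x with P: 4, N: 6; gates of size y with P: 12, N: 3; each output drives a load of 45]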
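The whole 3-stage example can be reproduced in a few lines, including the final check that the computed input capacitance meets the specification of 8. The gate identities in the comments (NAND2, NAND3, NOR2) are inferred from the logical efforts and parasitic delays in the catalog tables, and the variable names are illustrative.

```python
import math
from fractions import Fraction

g = [Fraction(4, 3), Fraction(5, 3), Fraction(5, 3)]   # NAND2, NAND3, NOR2
p = [2, 3, 2]                                          # parasitic delays
b = [3, 2, 1]                                          # branching at each stage
c_in, c_load = 8, 45

G = math.prod(g)
B = math.prod(b)
H = Fraction(c_load, c_in)
F = G * B * H                                          # path effort
f_hat = float(F) ** (1 / len(g))                       # best stage effort
D = len(g) * f_hat + sum(p)                            # minimum path delay

# Work backward from the load: C_in = g * C_out / f_hat
y = c_load * float(g[2]) / f_hat
x = (b[1] * y) * float(g[1]) / f_hat
c_in_check = (b[0] * x) * float(g[0]) / f_hat          # should match the spec

print(f"F = {F}, f_hat = {f_hat:.2f}, D = {D:.1f}, x = {x:.1f}, y = {y:.1f}")
```

Dividing D by the FO4 delay of 5 gives the 4.4 FO4 figure quoted on the slide.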



Best Number of Stages
• How many stages should a path use?
• Minimizing number of stages is not always fastest
• Example: drive 64-bit datapath with unit inverter

$$D = N F^{1/N} + P = N(64)^{1/N} + N$$



Derivation
• Consider adding inverters to end of path
• How many give least delay?
$$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + \left(N - n_1\right) p_{inv}$$

$$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$$

Define best stage effort $\rho = F^{1/N}$:

$$p_{inv} + \rho\left(1 - \ln \rho\right) = 0$$



Best Stage Effort
• $p_{inv} + \rho\left(1 - \ln \rho\right) = 0$ has no closed-form solution
• Neglecting parasitics (pinv = 0), we find ρ = 2.718 (e)
• For pinv = 1, solve numerically for ρ = 3.59
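One way to solve the best-stage-effort equation numerically is bisection, since p_inv + ρ(1 − ln ρ) decreases monotonically for ρ > 1. The sketch below is one possible approach, not necessarily the method used to produce the slide; the function name and the search bracket [e, 10] are assumptions that hold for modest p_inv values.

```python
import math

def best_stage_effort(p_inv, lo=math.e, hi=10.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln(rho)) = 0 for rho by bisection.

    The left-hand side equals p_inv >= 0 at rho = e and is negative at
    rho = 10 for any p_inv below about 13, so the root lies in [e, 10].
    """
    def residual(rho):
        return p_inv + rho * (1 - math.log(rho))

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if residual(mid) > 0:
            lo = mid        # root is to the right
        else:
            hi = mid        # root is at or to the left of mid
    return (lo + hi) / 2

for p_inv in (0.0, 1.0):
    print(f"p_inv = {p_inv}: rho = {best_stage_effort(p_inv):.3f}")
```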



Sensitivity Analysis
• How sensitive is the delay to using exactly the best number of stages?
[Figure: delay relative to optimum, D(N)/D(N̂), vs. relative number of stages N/N̂ (0.5 to 2.0), with marked values 1.15, 1.26, and 1.51 and points labeled ρ = 6 and ρ = 2.4]

• 2.4 < ρ < 6 gives delay within 15% of optimal
• We can be sloppy!
• I like ρ = 4



Example, Revisited
• Help Ben Bitdiddle design the decoder for a register file.
• Decoder specifications:
• 16-word register file
• Each word is 32 bits wide
• Each bit presents a load of 3 unit-sized transistors
• True and complementary address inputs A[3:0]
• Each input may drive 10 unit-sized transistors
[Figure: true and complementary A[3:0] drive a 4:16 decoder whose 16 outputs select words of a 16-word, 32-bit register file]


• Ben needs to decide:
• How many stages to use?
• How large should each gate be?
• How fast can the decoder operate?



Number of Stages
• Decoder effort is mainly electrical and branching
• Electrical Effort: H = (32*3) / 10 = 9.6
• Branching Effort: B = 8
• If we neglect logical effort (assume G = 1)
• Path Effort: F = GBH = 76.8
• Number of Stages: N = log4 F = 3.1
• Try a 3-stage design



Gate Sizes & Delay
• Logical Effort: G = 1 * 6/3 * 1 = 2
• Path Effort: F = GBH = 154
• Stage Effort: $\hat{f} = F^{1/3} = 5.36$
• Path Delay: $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
• Gate sizes: z = 96*1/5.36 = 18; y = 18*2/5.36 = 6.7



Comparison
• Compare many alternatives with a spreadsheet
• D = N(76.8 G)^(1/N) + P

Design                        N   G      P   D
NOR4                          1   3      4   234
NAND4-INV                     2   2      5   29.8
NAND2-NOR2                    2   20/9   4   30.1
INV-NAND4-INV                 3   2      6   22.1
NAND4-INV-INV-INV             4   2      7   21.1
NAND2-NOR2-INV-INV            4   20/9   6   20.5
NAND2-INV-NAND2-INV           4   16/9   6   19.7
INV-NAND2-INV-NAND2-INV       5   16/9   7   20.4
NAND2-INV-NAND2-INV-INV-INV   6   16/9   8   21.6
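In place of a spreadsheet, the comparison can be scripted. The sketch below recomputes the D column from N, G, and P using D = N(76.8 G)^(1/N) + P, where 76.8 is the product BH from the "Number of Stages" slide; the list and function names are illustrative.

```python
BH = 76.8   # branching effort times electrical effort for the decoder

# (design, number of stages N, path logical effort G, path parasitic delay P)
designs = [
    ("NOR4", 1, 3, 4),
    ("NAND4-INV", 2, 2, 5),
    ("NAND2-NOR2", 2, 20 / 9, 4),
    ("INV-NAND4-INV", 3, 2, 6),
    ("NAND4-INV-INV-INV", 4, 2, 7),
    ("NAND2-NOR2-INV-INV", 4, 20 / 9, 6),
    ("NAND2-INV-NAND2-INV", 4, 16 / 9, 6),
    ("INV-NAND2-INV-NAND2-INV", 5, 16 / 9, 7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, 16 / 9, 8),
]

def path_delay(n, g, p, bh=BH):
    """Minimum path delay D = N * F**(1/N) + P with F = G * B * H."""
    return n * (bh * g) ** (1 / n) + p

results = {name: path_delay(n, g, p) for name, n, g, p in designs}
fastest = min(results, key=results.get)

for name, d in results.items():
    print(f"{name:30s} D = {d:6.1f}")
print("fastest:", fastest)
```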



Review of Definitions

Term               Stage                                     Path
number of stages   1                                         N
logical effort     g                                         G = ∏ gi
electrical effort  h = Cout / Cin                            H = Cout-path / Cin-path
branching effort   b = (Con-path + Coff-path) / Con-path     B = ∏ bi
effort             f = gh                                    F = GBH
effort delay       f                                         DF = Σ fi
parasitic delay    p                                         P = Σ pi
delay              d = f + p                                 D = Σ di = DF + P



Method of Logical Effort
1) Compute path effort: $F = GBH$
2) Estimate best number of stages: $N = \log_4 F$
3) Sketch path with N stages
4) Estimate least delay: $D = N F^{1/N} + P$
5) Determine best stage effort: $\hat{f} = F^{1/N}$
6) Find gate sizes: $C_{in_i} = \dfrac{g_i\, C_{out_i}}{\hat{f}}$



Limits of Logical Effort
• Chicken and egg problem
• Need path to compute G
• But don’t know the number of stages without G
• Simplistic delay model
• Neglects input rise time effects
• Interconnect
• Iteration required in designs with wire
• Maximum speed only
• Not minimum area/power for constrained delay



Summary
• Logical effort is useful for thinking of delay in circuits
• Numeric logical effort characterizes gates
• NANDs are faster than NORs in CMOS
• Paths are fastest when effort delays are ~4
• Path delay is weakly sensitive to stages, sizes
• But using fewer stages doesn’t mean faster paths
• Delay of path is about log4 F FO4 inverter delays
• Inverters and NAND2 best for driving large caps
• Provides language for discussing fast circuits
• But requires practice to master



CMOS VLSI Design

Lecture 8:
Power
Learning Objectives
At the end of this lecture, you should be able to:
• Use equations to explain the sources of power dissipation in a chip.
• Estimate dynamic and static power consumption.
• Describe methods to reduce dynamic and static power losses.



Power and Energy
• Power is drawn from a voltage source attached to the VDD pin(s) of a chip.

• Instantaneous Power: $P(t) = I(t)\,V(t)$
• Energy: $E = \int_0^T P(t)\,dt$
• Average Power: $P_{avg} = \dfrac{E}{T} = \dfrac{1}{T}\int_0^T P(t)\,dt$



Power in Circuit Elements
$$P_{VDD}(t) = I_{DD}(t)\,V_{DD}$$

$$P_R(t) = \frac{V_R^2(t)}{R} = I_R^2(t)\,R$$

$$E_C = \int_0^\infty I(t)\,V(t)\,dt = \int_0^\infty C\,\frac{dV}{dt}\,V(t)\,dt = C\int_0^{V_C} V\,dV = \tfrac{1}{2} C V_C^2$$



Charging a Capacitor
• When the gate output rises
• Energy stored in the capacitor is
$$E_C = \tfrac{1}{2} C_L V_{DD}^2$$
• But energy drawn from the supply is

$$E_{VDD} = \int_0^\infty I(t)\,V_{DD}\,dt = \int_0^\infty C_L\,\frac{dV}{dt}\,V_{DD}\,dt = C_L V_{DD}\int_0^{V_{DD}} dV = C_L V_{DD}^2$$

• Half the energy from VDD is dissipated in the pMOS transistor as heat
and other half stored in capacitor
• When the gate output falls
• Energy in the capacitor is dumped to GND
• Dissipated as heat in the nMOS transistor



Switching Waveforms
• Example: VDD = 1.0 V, CL = 150 fF, f = 1 GHz



Switching Power
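$$P_{switching} = \frac{1}{T}\int_0^T i_{DD}(t)\,V_{DD}\,dt = \frac{V_{DD}}{T}\int_0^T i_{DD}(t)\,dt = \frac{V_{DD}}{T}\left[T f_{sw}\, C V_{DD}\right] = C V_{DD}^2 f_{sw}$$

[Figure: supply VDD delivering current iDD(t) to a gate that switches load C at frequency fsw]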



Activity Factor
• Suppose the system clock frequency = f
• Let fsw = αf, where α = activity factor
• If the signal is a clock, α = 1
• If the signal switches once per cycle, α = ½

• Dynamic power:

$$P_{switching} = \alpha C V_{DD}^2 f$$



Short Circuit Current
• When transistors switch, both nMOS and pMOS networks may be momentarily ON at once
• Leads to a blip of “short circuit” current.
• <10% of dynamic power if rise/fall times are comparable for input and output
• We will generally ignore this component



Power Dissipation Sources
• Ptotal = Pdynamic + Pstatic
• Dynamic power: Pdynamic = Pswitching + Pshort circuit
• Switching load capacitances
• Short-circuit current
• Static power: Pstatic = (Isub + Igate + Ijunct + Icontention)VDD
• Subthreshold leakage
• Gate leakage
• Junction leakage
• Contention current



Dynamic Power Example
• 1 billion transistor chip
• 50 M logic transistors
– Average width: 12 λ
– Activity factor = 0.1
• 950 M memory transistors
– Average width: 4 λ
– Activity factor = 0.02
• 1.0 V 65 nm process
• C = 1 fF/μm (gate) + 0.8 fF/μm (diffusion)
• Estimate dynamic power consumption @ 1 GHz. Neglect wire capacitance and short-
circuit current.



Solution

$C_{logic} = (50 \times 10^6)(12\lambda)(0.025\ \mu m/\lambda)(1.8\ fF/\mu m) = 27\ nF$

$C_{mem} = (950 \times 10^6)(4\lambda)(0.025\ \mu m/\lambda)(1.8\ fF/\mu m) = 171\ nF$

$P_{dynamic} = \left[0.1\,C_{logic} + 0.02\,C_{mem}\right](1.0)^2(1.0\ GHz) = 6.1\ W$
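
The same estimate can be reproduced in a few lines. This is a sketch using only the numbers given in the example (transistor counts, widths, activity factors, 0.025 μm per λ, 1.8 fF/μm); the variable and function names are illustrative.

```python
LAMBDA_UM = 0.025                  # micrometres per lambda
C_PER_UM = 1.0e-15 + 0.8e-15       # gate + diffusion capacitance, F per um of width
VDD = 1.0                          # volts
FREQ = 1.0e9                       # hertz

def total_capacitance(n_transistors, width_lambda):
    """Total switched capacitance in farads for a group of transistors."""
    return n_transistors * width_lambda * LAMBDA_UM * C_PER_UM

c_logic = total_capacitance(50e6, 12)    # logic transistors, 12 lambda wide
c_mem = total_capacitance(950e6, 4)      # memory transistors, 4 lambda wide

# P = a * C * VDD^2 * f, summed over the two groups
p_dynamic = (0.1 * c_logic + 0.02 * c_mem) * VDD ** 2 * FREQ

print(f"C_logic = {c_logic * 1e9:.0f} nF, C_mem = {c_mem * 1e9:.0f} nF")
print(f"P_dynamic = {p_dynamic:.1f} W")
```

Although memory holds 95% of the transistors, its low activity factor means it contributes only slightly more dynamic power than the logic.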



Dynamic Power Reduction

• $P_{switching} = a\,C\,V_{DD}^2 f$
• Try to minimize:
• Activity factor
• Capacitance
• Supply voltage
• Frequency



Activity Factor Estimation
• Let Pi = Prob(node i = 1)
• $\bar{P}_i = 1 - P_i$
• $a_i = \bar{P}_i \cdot P_i$
• Completely random data have P = 0.5 and a = 0.25
• Data are often not completely random
• e.g., upper bits of 64-bit words representing bank account balances are usually 0
• Data propagating through ANDs and ORs have lower activity factor
• Depends on design, but typically a ≈ 0.1



Switching Probability



Example
• A 4-input AND is built out of two levels of gates
• Estimate the activity factor at each node if the inputs have P = 0.5
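
One way to work this example is sketched below. The schematic is not reproduced here, so the code assumes a common two-level implementation: two NAND2 gates feeding a NOR2. It also assumes the inputs are independent. The activity factor of a node is taken as $a = P(1 - P)$, where $P$ is the probability that the node is 1. All function and variable names are illustrative.

```python
def activity(p_one):
    """Activity factor a = P * (1 - P) for a node with Prob(node = 1) = p_one."""
    return p_one * (1.0 - p_one)

def p_nand2(pa, pb):
    """Probability that a NAND2 output is 1, for independent inputs."""
    return 1.0 - pa * pb

def p_nor2(pa, pb):
    """Probability that a NOR2 output is 1, for independent inputs."""
    return (1.0 - pa) * (1.0 - pb)

p_in = 0.5                        # each of the four inputs
p_n1 = p_nand2(p_in, p_in)        # first NAND2 output
p_n2 = p_nand2(p_in, p_in)        # second NAND2 output (disjoint inputs)
p_y = p_nor2(p_n1, p_n2)          # NOR2 output = AND of all four inputs

for name, p in [("inputs", p_in), ("n1", p_n1), ("n2", p_n2), ("y", p_y)]:
    print(f"{name}: P = {p:.4f}, a = {activity(p):.4f}")
```

Under these assumptions the activity factor falls from 0.25 at the inputs to about 0.19 at the NAND outputs and about 0.06 at the final output, which illustrates why data propagating through AND/OR logic tends to have lower activity.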



Clock Gating
• The best way to reduce the activity is to turn off the clock to registers in unused blocks
• Saves clock activity (a = 1)
• Eliminates all switching activity in the block
• Requires determining if block will be used



Capacitance
• Gate capacitance
• Fewer stages of logic
• Small gate sizes
• Wire capacitance
• Good floor planning to keep communicating blocks close to each other
• Drive long wires with inverters or buffers rather than complex gates



Voltage/Frequency
• Run each block at the lowest possible voltage and frequency that meets performance
requirements
• Voltage Domains
• Provide separate supplies to different blocks
• Level converters required when crossing from low to high VDD domains
• Dynamic Voltage Scaling
• Adjust VDD and f according to workload
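
Because switching power scales as $C V_{DD}^2 f$, lowering voltage and frequency together pays off more than lowering frequency alone. The sketch below compares two operating points. The "half speed" point (0.8 V at 500 MHz) is purely hypothetical; whether a given block can actually run at that voltage and frequency depends on the process and the design.

```python
def relative_dynamic_power(vdd, f, vdd_nom=1.0, f_nom=1.0e9):
    """Dynamic power relative to the nominal point, from P ~ VDD^2 * f."""
    return (vdd / vdd_nom) ** 2 * (f / f_nom)

# Hypothetical operating points: (supply voltage in V, clock frequency in Hz)
operating_points = {
    "full speed": (1.0, 1.0e9),
    "half speed": (0.8, 0.5e9),
}

for name, (vdd, f) in operating_points.items():
    rel_power = relative_dynamic_power(vdd, f)
    rel_energy_per_cycle = rel_power / (f / 1.0e9)   # energy per cycle ~ VDD^2
    print(f"{name}: power x{rel_power:.2f}, energy per cycle x{rel_energy_per_cycle:.2f}")
```

Halving the frequency alone would halve the power but leave the energy per operation unchanged; the voltage reduction is what lowers the energy needed to finish a fixed amount of work.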



Static Power
• Static power is consumed even when chip is quiescent.
• Leakage draws power from nominally OFF devices
• Ratioed circuits burn power in fight between ON transistors



Static Power Example
• Revisit power estimation for 1 billion transistor chip
• Estimate static power consumption
• Subthreshold leakage
– Normal Vt: 100 nA/μm
– High Vt: 10 nA/μm
– High Vt used in all memories and in 95% of logic gates
• Gate leakage 5 nA/μm
• Junction leakage negligible



Solution

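$W_{normal\text{-}V_t} = (50 \times 10^6)(12\lambda)(0.025\ \mu m/\lambda)(0.05) = 0.75 \times 10^6\ \mu m$

$W_{high\text{-}V_t} = \left[(50 \times 10^6)(12\lambda)(0.95) + (950 \times 10^6)(4\lambda)\right](0.025\ \mu m/\lambda) = 109.25 \times 10^6\ \mu m$

$I_{sub} = \left[W_{normal\text{-}V_t} \times 100\ nA/\mu m + W_{high\text{-}V_t} \times 10\ nA/\mu m\right]/2 = 584\ mA$

$I_{gate} = \left[(W_{normal\text{-}V_t} + W_{high\text{-}V_t}) \times 5\ nA/\mu m\right]/2 = 275\ mA$

$P_{static} = (584\ mA + 275\ mA)(1.0\ V) = 859\ mW$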
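
The static power estimate can be reproduced as follows. This is a sketch using the example's numbers (leakage in nA/μm, 0.025 μm per λ). The division by 2 is read here as an assumption that roughly half the transistors are OFF and leak subthreshold current, while the other half are ON and contribute gate leakage; the slides do not spell this out. Variable names are illustrative.

```python
LAMBDA_UM = 0.025      # micrometres per lambda
VDD = 1.0              # volts

# Total transistor width in micrometres, split by threshold voltage
w_normal_vt = 50e6 * 12 * LAMBDA_UM * 0.05                   # 5% of logic
w_high_vt = (50e6 * 12 * 0.95 + 950e6 * 4) * LAMBDA_UM       # 95% of logic + all memory

# Leakage per micrometre of width, in amperes
I_SUB_NORMAL = 100e-9
I_SUB_HIGH = 10e-9
I_GATE = 5e-9

# Assumption: half the transistors are OFF (subthreshold), half ON (gate leakage)
i_sub = (w_normal_vt * I_SUB_NORMAL + w_high_vt * I_SUB_HIGH) / 2
i_gate = (w_normal_vt + w_high_vt) * I_GATE / 2

p_static = (i_sub + i_gate) * VDD
print(f"I_sub = {i_sub * 1e3:.0f} mA, I_gate = {i_gate * 1e3:.0f} mA")
print(f"P_static = {p_static * 1e3:.0f} mW")
```
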



Subthreshold Leakage
• For Vds > 50 mV:
$I_{sub} = I_{off}\,10^{\frac{V_{gs} + \eta (V_{ds} - V_{DD}) - k_\gamma V_{sb}}{S}}$
• Ioff = leakage at Vgs = 0, Vds = VDD
• Typical values in 65 nm:
Ioff = 100 nA/μm @ Vt = 0.3 V
Ioff = 10 nA/μm @ Vt = 0.4 V
Ioff = 1 nA/μm @ Vt = 0.5 V
η = 0.1
kγ = 0.1
S = 100 mV/decade
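
The leakage model $I_{sub} = I_{off}\,10^{[V_{gs} + \eta(V_{ds} - V_{DD}) - k_\gamma V_{sb}]/S}$ is easy to evaluate directly. The sketch below uses the typical 65 nm values listed above as defaults (η = 0.1, kγ = 0.1, S = 100 mV/decade) and assumes VDD = 1.0 V; the function name and argument names are illustrative.

```python
def subthreshold_current(i_off, vgs, vds, vsb,
                         vdd=1.0, eta=0.1, k_gamma=0.1, s=0.100):
    """Subthreshold leakage in the same units as i_off.

    Voltages and s are in volts (s is volts per decade).
    The model is intended for vds greater than about 50 mV.
    """
    exponent = (vgs + eta * (vds - vdd) - k_gamma * vsb) / s
    return i_off * 10.0 ** exponent

i_off = 100.0   # nA/um, the listed value for Vt = 0.3 V

# Reference condition: Vgs = 0, Vds = VDD, Vsb = 0 gives Ioff itself
print(subthreshold_current(i_off, vgs=0.0, vds=1.0, vsb=0.0))

# Driving the gate 100 mV below the source cuts leakage by one decade
print(subthreshold_current(i_off, vgs=-0.1, vds=1.0, vsb=0.0))
```
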



Stack Effect
• Series OFF transistors have less leakage
• Vx > 0, so N2 has negative Vgs
$V_x = \frac{\eta V_{DD}}{1 + 2\eta + k_\gamma}$
$I_{sub} = I_{off}\,10^{\frac{-\eta V_{DD}\left(\frac{1 + \eta + k_\gamma}{1 + 2\eta + k_\gamma}\right)}{S}} \approx I_{off}\,10^{\frac{-\eta V_{DD}}{S}}$
• Leakage through 2-stack reduces ~10x
• Leakage through 3-stack reduces further

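The stack effect can be checked numerically with the subthreshold model $I_{sub} = I_{off}\,10^{[V_{gs} + \eta(V_{ds} - V_{DD}) - k_\gamma V_{sb}]/S}$. The sketch below assumes two series OFF nMOS transistors, with N1 at the bottom (source at GND) and N2 on top (source at the intermediate node Vx), and uses η = 0.1, kγ = 0.1, S = 100 mV/decade and VDD = 1.0 V. It evaluates the current through each transistor at $V_x = \eta V_{DD}/(1 + 2\eta + k_\gamma)$ and confirms that the two agree. Names are illustrative.

```python
def stack_leakage(i_off, vdd=1.0, eta=0.1, k_gamma=0.1, s=0.100):
    """Return (vx, current through N1, current through N2) for a 2-stack."""
    vx = eta * vdd / (1.0 + 2.0 * eta + k_gamma)

    # Bottom transistor N1: Vgs = 0, Vds = vx, Vsb = 0
    i_n1 = i_off * 10.0 ** ((0.0 + eta * (vx - vdd) - k_gamma * 0.0) / s)

    # Top transistor N2: Vgs = -vx, Vds = vdd - vx, Vsb = vx
    i_n2 = i_off * 10.0 ** ((-vx + eta * ((vdd - vx) - vdd) - k_gamma * vx) / s)

    return vx, i_n1, i_n2

vx, i_n1, i_n2 = stack_leakage(i_off=100.0)   # 100 nA/um single-transistor Ioff
print(f"Vx = {vx * 1e3:.0f} mV")
print(f"I(N1) = {i_n1:.2f} nA/um, I(N2) = {i_n2:.2f} nA/um")
print(f"reduction vs. single transistor: {100.0 / i_n1:.1f}x")
```

With these parameters the reduction works out to roughly 8x, in line with the ~10x rule of thumb given by the approximation $I_{off}\,10^{-\eta V_{DD}/S}$.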


Leakage Control
• Leakage and delay trade off
• Aim for low leakage in sleep and low delay in active mode
• To reduce leakage:
• Increase Vt: multiple Vt
– Use low Vt only in critical circuits
• Increase Vs: stack effect
– Input vector control in sleep
• Decrease Vb
– Reverse body bias in sleep
– Or forward body bias in active mode



Gate Leakage
• Extremely strong function of tox and Vgs
• Negligible for older processes
• Approaches subthreshold leakage at 65 nm and below in some processes
• An order of magnitude less for pMOS than nMOS
• Control leakage in the process using tox > 10.5 Å
• High-k gate dielectrics help
• Some processes provide multiple tox
– e.g., thicker oxide for 3.3 V I/O transistors
• Control leakage in circuits by limiting VDD



NAND3 Leakage Example
• 100 nm process
Ign = 6.3 nA Igp = 0
Ioffn = 5.63 nA Ioffp = 9.3 nA

Data from [Lee03]



Junction Leakage
• From reverse-biased p-n junctions
• Between diffusion and substrate or well
• Ordinary diode leakage is negligible
• Band-to-band tunneling (BTBT) can be significant
• Especially in high-Vt transistors where other leakage is small
• Worst at Vdb = VDD
• Gate-induced drain leakage (GIDL) exacerbates
• Worst for Vgd = -VDD (or more negative)



Power Gating
• Turn OFF power to blocks when they are idle to save leakage
• Use virtual VDD (VDDV)
• Gate outputs to prevent invalid logic levels to next block
• Voltage drop across sleep transistor degrades performance during normal operation
• Size the transistor wide enough to minimize impact
• Switching wide sleep transistor costs dynamic power
• Only justified when circuit sleeps long enough
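
"Long enough" can be made concrete with a break-even estimate: power gating saves net energy only when the leakage energy avoided during sleep exceeds the energy spent entering and leaving sleep. The sketch below shows the arithmetic; the 1 nJ transition energy and 1 mW of saved leakage are hypothetical numbers chosen purely for illustration.

```python
def break_even_sleep_time(e_transition, p_leak_saved):
    """Minimum sleep duration in seconds for power gating to save net energy.

    e_transition: energy in joules to switch the sleep transistor and
                  recharge the virtual supply (enter plus exit sleep).
    p_leak_saved: leakage power in watts eliminated while asleep.
    """
    return e_transition / p_leak_saved

def net_energy_saved(t_sleep, e_transition, p_leak_saved):
    """Net energy saved in joules for a sleep period of t_sleep seconds."""
    return p_leak_saved * t_sleep - e_transition

# Hypothetical block: 1 nJ per sleep/wake cycle, 1 mW of leakage saved
t_min = break_even_sleep_time(1e-9, 1e-3)
print(f"break-even sleep time: {t_min * 1e6:.1f} us")
print(f"net saving for a 10 us sleep: {net_energy_saved(10e-6, 1e-9, 1e-3) * 1e9:.1f} nJ")
```
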

