
VLSI System Design

Overview of VLSI Design Issues

Professor: Dr. Marcel Jacomet (based on transparencies designed by
Chris Terman at MIT, completely updated and adapted at MicroLab-I3S)

Overview
• Microelectronics history
• The complexity of microelectronics
• Design steps
Goal: You are familiar with the microelectronics history,
have an idea about the microelectronics complexity and
you have an overview of the VLSI design steps.
MicroLab, VLSI-1 (1/28)

JMM v1.4
What’s expected of you

Class/Homework (50% in class, 50% homework)
    Readings from a Starter Guide to VHDL and some articles. Some
    problems to be worked at home. Self-study of the VHDL language
    with help of the CBT CD from Doulouse.

Project (40% of final grade)
    Some design exercises to be done in the lab. Specify, design and
    simulate a small VHDL design project using a data-path / finite
    state machine. Place & route it on an FPGA target technology
    (due date: July 19th, 2002 at 13h00).

Test (60% of final grade)
    One 70 minute in-class test. Meant to be duck soup if you’ve been
    coming to lectures and doing the lab and homework (date: Friday
    July 12th, 2002).

MicroLab, VLSI-1 (2/28)

JMM v1.4
Timetable 4th Semester:
Introduction to VLSI System Design

Date             Topic                            Self-Study
11-15.3.         vlsi1: history & complexity      A VLSI tutorial
18-22.3.         vlsi8: micro technologies        How a silicon int.
25-29.3.         --
11-19.4.         vlsi8: micro technologies        article Hoff
22-26.4.         vlsi21: top-down design, VHDL    VHDL/CBT
29.4-3.5.        Ex400, 401                       VHDL/CBT
6-10.5.          --                               VHDL/CBT
13-17.5.         vlsi21 & Ex402                   VHDL
20-24.5.         vlsi21 & Ex404,405               VHDL
27-31.5.         vlsi21 & Ex406-408               VHDL
3-7.6.           vlsi21 & Ex409                   chapter 5
10-14.6.         vlsi21 & Ex410                   VHDL finish
17-21.6.         Ex450                            project
24-28.6.         Ex451                            project
1-5.7.           Ex452                            project
8-12.7.          Test                             project
15-19.7.         test discussion and outlook      project
19.7. at 13h00   project due
MicroLab, VLSI-1 (3/28)

JMM v1.4
So, what’s VLSI Systems Design all about?
You’ll get a bottom-up tour of how integrated
circuits are engineered. We’ll talk about
• field-effect transistors: how they work, how they’re
  built, effects of new technologies
• various design and layout techniques, from the
  ordinary to the bizarre, for creating combinational
  and sequential circuits, datapaths, memories,
  buffers, regular logic structures, …
• how you tackle the problem of designing circuits
  with 1,000,000 gates -- you’re not in Digital
  Technique anymore!

MicroLab, VLSI-1 (4/28)

JMM v1.4
Key Technology Microelectronics

• microelectronics is a key technology of the world economy
• technology development is extremely aggressive
• postgraduate engineering education is important
• influence of other technologies like software engineering
• key technologies may be used as weapons. In 1991
  Japan held an 80% share of the world production of
  4MB DRAMs. Artificial raw-material shortages are
  disastrous.
• very few Swiss chip fabs. Our raw material is the
  high education standard, that means YOU

MicroLab, VLSI-1 (5/28)

JMM v1.4
What is a VLSI Circuit?

VERY LARGE SCALE INTEGRATED CIRCUIT

Technique where many circuit components and
the wiring that connects them are manufactured
simultaneously into a compact, reliable and
inexpensive chip.

Early (circa 1977) characterization of circuit
“size” before people realized that the number of
components per chip was quadrupling every 24
months (Moore’s Law)! This growth rate has
slowed in recent years… can you guess why?

MicroLab, VLSI-1 (6/28)

JMM v1.4
Course Outline/Brief history

Bell Labs lays the groundwork:

1940: Ohl develops PN junction
1945: Shockley’s lab established
1947: Bardeen and Brattain create
      point-contact transistor with
      two PN junctions. Gain = 18.
1951: Shockley develops junction
      transistor which can be
      manufactured in quantity.
1952: Dummer forecasts “solid
      block [with] layers of
      insulating, conducting and
      amplifying materials”
1954: The first transistor radio!
      Also, TI makes first silicon
      transistor (price $2.50)

MicroLab, VLSI-1 (7/28)

JMM v1.4
Early integration

Jack Kilby, working at Texas Instruments, first dreamed up the idea
of a monolithic “integrated circuit” in July 1958. By the end of the
year, he had constructed several examples, including the flip-flop
shown in the patent drawing above. Components are connected by
hand-soldered wires and isolated by “shaping”; pn diodes are used as
resistors.

Robert Noyce experimented in the late 40’s with
transistors while a physics major at college. He went to MIT where,
“much to his surprise, few people had even heard about the
transistor.” After getting his PhD in 1953, he worked in industry,
finally arriving at Mountain View, CA and Shockley Semiconductor
Labs in 1955.

MicroLab, VLSI-1 (8/28)

JMM v1.4
“ “

In 1957, Noyce left Shockley’s
lab to form Fairchild Semiconductor
with Jean Hoerni.
Gordon Moore is another
founder.

In early 1958, Hoerni invents
technique for diffusing impurities into
the silicon to build planar transistors
and then using a SiO2 insulator.

In mid 1959, Noyce develops
first true IC using planar transistors,
back-to-back pn junctions for
isolation, diode-isolated silicon
resistors and SiO2 insulation with
evaporated metal wiring on top.

MicroLab, VLSI-1 (9/28)

JMM v1.4
Practice makes perfect...
1.5 mm
1961: TI and Fairchild introduced
the first logic IC’s (cost ~$50 in
quantity!). This is a dual flip-flop with 4
transistors.

1963: Densities and yields are improving.
This circuit has four flip flops.

0.97 mm

1967: Fairchild markets the semi-custom
chip shown below. Transistors (organized in
columns) could be easily rewired using a
two-layer interconnect to create different
circuits. This circuit contains ~150 logic
gates.

3.81 mm

1968: Noyce and Moore leave
Fairchild and found Intel. No
business plan, just a promise
to specialize in memory chips.
They raise $3M in two days
and move to Santa Clara. By
1971 Intel had 500 employees;
by 1983 it had 21,500
employees and $1100M in sales.

MicroLab, VLSI-1 (10/28)

JMM v1.4
The Big Bang
2.87 mm

In 1970, making good on
its promise to its investors, Intel
starts selling a 1K bit RAM, the
1103. It was a bear to interface to,
but its density and cost make it the
only game in town.

In 1971 Intel introduces the first
microprocessor, designed by Ted
Hoff. The 4004 had 4-bit buses and
a clock rate of 108 kHz. It had 2300
transistors and was built in a 10um
process. It never captured much
interest in the market and was soon
eclipsed by its more capable brothers.

MicroLab, VLSI-1 (11/28)

JMM v1.4
Exponential Growth

Introduced in 1972, the 8008 had 3,500
transistors supporting a byte-wide data path.
Despite its limitations, the 8008 was the first
microprocessor capable of playing the role of
computer CPU as demonstrated on the cover of
the July ‘74 issue of Radio-Electronics.

Last, but not least, on our tour is the
8080. Introduced in 1974, the 8080
had 6,000 transistors fab’ed in a 6um
process. The clock rate was 2 MHz, more
than enough to ignite the personal
computer industry. At least Paul Allen
and his partner thought so when they
wrote a BASIC interpreter for the 8080
in 1975. They would later collaborate in
another, more profitable, venture...

MicroLab, VLSI-1 (12/28)

JMM v1.4
Today

AVP-III Video Codec from Lucent Technologies

Many disciplines have contributed to the current state of the art
in VLSI design:
• solid-state physics       • circuit design & layout
• materials science         • architecture
• lithography and fab       • algorithms
• device modeling           • CAD tools
We’ll be concentrating on the right-hand column
MicroLab, VLSI-1 (13/28)

JMM v1.4
CAD Tools #1
“Computer-Aided Design”

• organize • generate • verify

Standard-cell place and route for “random” logic.

Symbolic layout tools to ease the task of physical
design; mask verification to ensure manufacturability.

Circuit analysis programs predict circuit behavior at
all the process corners. Gate-level and behavioral
simulators help you get it right the first time!
Tools to do the tedious, repetitive work such as
routing, “tiling” a mosaic of building-block cells, or
verifying that the layout and schematic match.
MicroLab, VLSI-1 (14/28)

JMM v1.4
CAD Tools #2

Problem:
• designing highly complex VLSI circuits
  (100K to xM fets)
• classical, iterative procedures are unsuitable
• precise transistor models are necessary for
  reliable predictions → data inflation

Solution:
• new design methodologies
• powerful design tools
• high level design languages
• silicon compiler would be useful

MicroLab, VLSI-1 (15/28)

JMM v1.4
VLSI Design Challenge

Goal:
designing circuits of increasing complexity in
ever shorter times

• computer has to take over routine work
• free the designer from unnecessary low-skill
  work
• shift of design activities to higher level abstract
  work
• computer has to support new design methods

MicroLab, VLSI-1 (16/28)

JMM v1.4
Chip Complexity #1

Chip classification according to number of active
elements and minimal feature size:

classification   #transistors   example
SSI              1 - 100        gates
MSI              100 - 1k       registers
LSI              1k - 100k      uP
VLSI             100K -         RAM, sig. proc.
ULSI             ?

year    minimal channel length
1970    10µm
1980    5µm
1985    2µm
1992    0.5µm
2002    0.13µm
2010    ?

MicroLab, VLSI-1 (17/28)

JMM v1.4
Chip Complexity #2

Can you really imagine the chip complexity of
today's VLSI chips and not just express it as a mere
number?

street map image

year    feature     block    chip    town
1970    10x10µm     200m     2mm     Biel
1980    10x5µm      200m     5mm     Paris
1992    10x0.7µm    200m     10mm    Switzerland

MicroLab, VLSI-1 (18/28)

JMM v1.4
Architecture

(Multiple choice)
This is a picture of

(A) a programmable general purpose ASIC with 1/4 million
transistors on a 40mm2 designed in a 0.7µm CMOS
full custom technology.

(B) a processor able to execute 64 knowledge based rules
in parallel due to a 3 stage pipelined architecture with
hard-coded adder, multiplier, divider architecture.

(C) the fastest fuzzy processor in the world, designed
by MicroLab-I3S and presented at the international
FUZZ‘98 conference in New Orleans

ANSWER: _________
MicroLab, VLSI-1 (19/28)

JMM v1.4
Circuit Design & Layout
Standard cell Full custom

RAM Generator

Q: Which engineer drew the most fets? ______
MicroLab, VLSI-1 (20/28)

JMM v1.4
VLSI: The Ideal Implementation
Medium?
VLSI
Š gives the designer control over almost everything:
architecture, logic design, speed, area, power, …
Š densities are increasing, costs decreasing with each
passing year
Š is used by almost everyone: “No one gets fired for
building an ASIC”
• was the enabling technology for much of the
  economic growth of the 80’s and 90’s. It will no
  doubt continue in its starring role for some time
  to come.
Is life really a bowl of cherries?

MicroLab, VLSI-1 (21/28)

JMM v1.4
VLSI Fact-of-Life #1:
“So much to do, so little time”
You need a design methodology:

• budget ($, speed, area, power, schedule, risk)

• low-level building blocks,
  high-level architecture

• behavioural design, verification

• logic design, verification

• layout, verification

MicroLab, VLSI-1 (22/28)

JMM v1.4
VLSI Fact-of-Life #2:
“You can’t reach in and fix it”
Notice that the word “verification” kept appearing in
the previous slide.
Mistakes can be costly:
  find bug(s)       ?           ?
  reverify          1 week      Ecu 10k
  new masks         3 days      Ecu 25k
  fab run           12 weeks    Ecu 1k/wafer
  slip ship date                Ecu Ecu Ecu

There’s a lot that needs checking:
• circuit must operate at all “corners”
  verified at building block level
• logic must be correct, operate reliably
  verified at RTL/gate level
• chip has to interoperate with system
  verified at behavioral level
• chip has to be manufacturable
  verified at mask level, at tester

MicroLab, VLSI-1 (23/28)

JMM v1.4
VLSI Fact-of-Life #3:
“Verification is a tedious task”

MicroLab, VLSI-1 (24/28)

JMM v1.4
VLSI Fact-of-Life #4:
“You can’t find all the bugs”
The key word here is “find”:
• one can’t explore the behaviour of the circuit under all
  possible conditions
• some of the bugs arise from unanticipated interactions
  which, by definition, one never thinks of testing
• it’s not clear when one is “done” looking for bugs!
  Time pressures mean that most searches stop too soon.

The trick is to choose some implementation rules that
result in a circuit that is correct by construction*. For
example:
• choose a simple clocking scheme
• module inputs must go only to fet gates
• disallow unclocked feedback
• make register t(clk-to-Q) > t(hold)+skew
• use poly only for local interconnect
• no diffusion wires
• etc., etc., etc.
* or at least avoid as many problems as possible!
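
The register rule t(clk-to-Q) > t(hold)+skew from the list above can be checked mechanically for every register-to-register path. The following is only a minimal sketch: the function names and the timing numbers are hypothetical and are not taken from any particular process or tool.

```python
def hold_margin(t_clk_to_q, t_hold, skew):
    """Slack of the rule t(clk-to-Q) > t(hold) + skew, in the caller's time unit."""
    return t_clk_to_q - (t_hold + skew)


def is_hold_safe(t_clk_to_q, t_hold, skew):
    """True when new data cannot race through before the hold window closes."""
    return hold_margin(t_clk_to_q, t_hold, skew) > 0


# Hypothetical register timings in nanoseconds.
print(is_hold_safe(t_clk_to_q=0.5, t_hold=0.2, skew=0.1))
print(is_hold_safe(t_clk_to_q=0.2, t_hold=0.2, skew=0.1))
```

The first call satisfies the rule, the second violates it because the clock skew eats the whole margin.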
MicroLab, VLSI-1 (25/28)

JMM v1.4
VLSI Fact-of-Life #5:
“Nobody’s perfect”

Plan for what happens after you turn it on and
nothing happens.
• provide lots of observability and controllability.
  You’ll need to localize and then find the bug.
• have a way to run the chip slowly and/or stop it
  without it burning up or losing bits.
• figure out how to track down performance
  problems without relying on fast I/O (tester pins
  are slow!)
• leave room in the budget
  (time, Ecu) for debugging.
• write and run your
  manufacturing tests
  before tape out.

MicroLab, VLSI-1 (26/28)

JMM v1.4
Microelectronics in 4th Semester

[Figure: course map for the 4th semester. Blocks: history & complexity;
microelectronic technologies; VHDL; exercises with CAD tools;
data path / fsm project; synthesis; design flow. All feed EXPERIENCE.]

Course material
• Textbook from Weste & Eshraghian for
  4th and 5th semester (voluntary)
• Copy of transparencies (placeholder for private notes)
• VHDL Starter (recommended)
• CAD Exercises on the MicroLab web pages
• CBT CD on VHDL for your PC (lending
  from MicroLab in 4th semester)
• different small articles
MicroLab, VLSI-1 (27/28)

JMM v1.4


Coming Up...

We’ll be traveling top-down in 4th semester and
bottom-up in 5th & 6th semester:

Next topic…
Microelectronic technologies like standard cell,
gate array, sea-of-gates, macro cell, FPGA, tiny
micro-controllers.

Readings for next time…
web CBT tutorials, see
http://www.microlab.ch/academics/courses
• How a silicon integrated circuit is made (web CBT)
• A VLSI Tutorial up to chapter with NAND/NOR
  (web CBT from Uni Manchester)
• T. Hoff: Article about the µP History (German)
• To learn more about Intel’s early days and to ogle
  some die photos of oldie-but-goodie chips browse
  at the Intel link of the MicroLab VLSI course web
  page.

MicroLab, VLSI-1 (28/28)

JMM v1.4
VLSI Design I
The MOSFET model

Wow! Are device models as nice as Cindy?

Overview
The large signal MOSFET model and second order
effects. MOSFET capacitances.
Introduction to FET process technology

Goal: You can use the large signal equivalent MOS
device equation. You are familiar with second order
effects like body effect, channel length modulation.
You know the MOS capacitances. You know the
basic steps in MOS fabrication.
MicroLab, VLSI-2 (1/24)

JMM v1.4
Let’s build a MOSFET
There are lots of different recipes to choose from.
Like most things in life, you get what you pay for:
the ability to have good bipolar devices, radiation
hardness, reduced latch-up and substrate noise, …
are all extra cost options. We’ll consider a general
process: bulk CMOS with a p-type substrate:

[Figure: p-type wafer]
• 500um slice of a silicon ingot that has been
  doped with an acceptor (typically boron) to
  increase the concentration of holes to 10^14/cm^3
  - 10^18/cm^3.
• Use <100> surface to minimize surface charge.
• Back is metallized to provide
  a good ground connection.

Good for n-channel fets, but p-channel
fets will need an n-type “well” (or tub) to
live in!
MicroLab, VLSI-2 (2/24)

JMM v1.4
Next, a “thick” (0.4um) layer of silicon dioxide, called
field oxide, is formed on the surface by oxidation in wet
oxygen. This is then etched to expose surface where we
want to make a mosfet:

Now grow a “thin” (0.01um = 100 Å) layer of silicon
dioxide, called gate oxide, on the surface by exposing the
wafer to dry oxygen.

The gate oxide needs to be of high quality: uniform
thickness, no defects! The thinner the gate oxide, the
more oomph the fet will have (we’ll see why soon) but the
harder it is to make it defect free.

MicroLab, VLSI-2 (3/24)

JMM v1.4
On top of the thin oxide a 0.7um thick layer of
polycrystalline silicon, called polysilicon or poly for
short, is deposited by CVD. The poly layer is patterned
and plasma etched (thin ox not covered by poly is etched
away too!) exposing the surface where the source and
drain junctions will be formed:

[Figure: cross-section of the p substrate showing poly wires, gate oxide
(only under poly), field oxide, and the exposed surface for source and
drain junctions]

Poly has a high sheet resistance (25 ohms/square) which
can be reduced by adding a layer of a silicided refractory
metal such as titanium (TiSi2), tantalum (TaSi2) or
molybdenum (MoSi2). These have sheet resistances of 1,
3 or 5 ohms per square, respectively. This is great for
memory structures that have lots of poly wiring.

MicroLab, VLSI-2 (4/24)

JMM v1.4
The entire surface is doped, either by diffusion or ion
implantation, with phosphorus (an electron donor) which
creates two n-type regions in the substrate. The
phosphorus also penetrates the poly reducing its resistance
and affecting the nfet’s threshold.

[Figure: cross-section with two n+ regions in the p substrate; the
diffusions are “self-aligned” with poly; n+ wires: 20-30 ohms/sq.]

Finally an intermediate oxide layer is grown and then
reflowed to flatten its surface. Holes are etched in the
oxide (where contacts to poly/diff are wanted) and
aluminum deposited, patterned and etched.

[Figure: finished device with metal wires (0.08 ohms/square) and
diff contact (0.25 - 10 ohms)]

??? n-channel MOS field effect transistor!
MicroLab, VLSI-2 (5/24)

JMM v1.4
NFET Operation
Picture shows configuration when Vgs < Vto: Ids = 0

[Figure: NFET cross-section with terminals S, G, D, B and two n+ regions
in the p substrate]

• p substrate: mobile holes, fixed negative ions
• n+ regions: mobile electrons, fixed positive ions
  (n+ means heavily doped with donors, doesn’t imply
  positive charge!)
• depletion layer: no mobile carriers, but fixed negative ions
  (slight intrusion into n+, but mostly in p area)
• Terminal with higher voltage is labelled D,
  the other is labelled S so Ids >= 0.
• B almost always ground

[Figure: other NFET symbols with terminals G, S, D, B]


MicroLab, VLSI-2 (6/24)

JMM v1.4
FET = field effect transistor
The four terminals of a fet (gate, source, drain and bulk)
connect to conducting surfaces that generate a complicated
set of electric fields in the channel region which depend on
the relative voltages of each terminal.

[Figure: configuration when Vgb > Vto. Gate on top, source and drain on
either side, bulk below; vertical field Ev and horizontal field Eh;
inversion happens at the surface under the gate]

INVERSION:
A sufficiently strong vertical field will attract enough
electrons to the surface to create a conducting n-type channel
between the source and drain.

CONDUCTION:
If a channel exists, a horizontal field will cause
a drift current from the drain to the source.
Expect Ids proportional to Vds*(W/L)?

MicroLab, VLSI-2 (7/24)

JMM v1.4
Threshold voltage
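The gate voltage required to form the channel is called the threshold
voltage. Many factors affect the gate-source voltage at which the
channel becomes conductive. Threshold voltage for source-bulk voltage
zero:

$$ V_{T0} = V_{t\text{-}ms} + V_{fb} $$

$$ V_{T0} = 2\phi_F + \frac{Q_b}{C_{ox}} + \phi_{ms} - \frac{Q_{fc}}{C_{ox}}, \qquad C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}} $$

Annotations on the individual terms:
• $V_{T0}$: 0.61V for n-channel, -0.61V for p-channel
• $2\phi_F = 2\,\frac{kT}{q}\,\ln\frac{N_A}{n_i}$
• $Q_b = \sqrt{2\,\varepsilon_{si}\,q\,N_A\,2\phi_F}$
• near the $\phi_{ms}$ term: $\frac{kT}{q}\,\ln\frac{N_D N_A}{n_i^2}$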
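
As a rough numerical illustration of the surface-potential and bulk-charge terms, the sketch below evaluates 2φF and Qb/Cox for the nmos values listed later in the deck (NA = 4E16 cm^-3, tox = 1E-8 m) using the constants from the “Useful Constants” slide. The function names are hypothetical; only these two terms are computed, since the slides give no numbers for φms or Qfc.

```python
import math

# Constants as listed on the "Useful Constants" slide.
VT_THERMAL = 25.8e-3       # kT/q at 300 K, in volts
N_I = 1.45e10              # intrinsic carrier concentration, cm^-3
Q = 1.6022e-19             # electron charge, C
EPS0 = 8.8542e-12          # permittivity of free space, F/m
EPS_SI = 11.7 * EPS0       # permittivity of silicon, F/m
EPS_OX = 3.9 * EPS0        # permittivity of SiO2, F/m


def surface_potential(n_a_cm3):
    """Return 2*phi_F in volts for a substrate doping N_A given in cm^-3."""
    return 2.0 * VT_THERMAL * math.log(n_a_cm3 / N_I)


def bulk_charge_term(n_a_cm3, tox_m):
    """Return Q_b / C_ox in volts; SI units are used internally."""
    n_a_m3 = n_a_cm3 * 1e6                                   # cm^-3 -> m^-3
    two_phi_f = surface_potential(n_a_cm3)
    q_b = math.sqrt(2.0 * EPS_SI * Q * n_a_m3 * two_phi_f)   # C/m^2
    c_ox = EPS_OX / tox_m                                    # F/m^2
    return q_b / c_ox


two_phi_f = surface_potential(4e16)
qb_over_cox = bulk_charge_term(4e16, 1e-8)
print(f"2*phi_F  = {two_phi_f:.3f} V")
print(f"Q_b/C_ox = {qb_over_cox:.3f} V")
```

The computed 2φF comes out close to the 0.77 V that the process table lists as PHI.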

MicroLab, VLSI-2 (8/24)

JMM v1.4
Body effect (second order)
As Vsb increases, the depth of the depletion region
increases, exposing more of the fixed acceptor (i.e.
negative) ions in the substrate.
Thus the second term in the threshold voltage equation on
the previous slide increases from

$$ \sqrt{2\,\varepsilon_{si}\,q\,N_A\,2\Phi_F} \quad\text{to}\quad \sqrt{2\,\varepsilon_{si}\,q\,N_A\,(V_{sb} + 2\Phi_F)} $$

The threshold voltage of the n-channel transistor is now:

$$ V_{tn} = V_{tn0} + \gamma\left(\sqrt{V_{sb} + 2\Phi_F} - \sqrt{2\Phi_F}\right), \qquad \gamma = \frac{\sqrt{2\,\varepsilon_{si}\,q\,N_A}}{C_{ox}} $$

As we’ll see, this effect comes into play in
series-connected fets where only one of the
fets will have Vsb = 0 and the other fets will
have Vsb > 0 and a higher threshold voltage.

[Figure: two series-connected fets. T1 at the bottom has Vsb = 0;
T2 above it has Vsb > 0, so Vt2 > Vt1]
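
The body-effect formula can be evaluated directly. The sketch below uses the nmos values from the Alcatel 0.5um parameter table (NA = 4E16 cm^-3, tox = 1E-8 m, 2φF = 0.77 V) and the Vsb = 2.2 V operating point of exercise vlsi2.2; the function names are illustrative only.

```python
import math

EPS0 = 8.8542e-12          # permittivity of free space, F/m
EPS_SI = 11.7 * EPS0       # permittivity of silicon, F/m
EPS_OX = 3.9 * EPS0        # permittivity of SiO2, F/m
Q = 1.6022e-19             # electron charge, C


def body_factor(n_a_cm3, tox_m):
    """gamma = sqrt(2 * eps_si * q * N_A) / C_ox, in V^0.5."""
    n_a_m3 = n_a_cm3 * 1e6             # cm^-3 -> m^-3
    c_ox = EPS_OX / tox_m              # F/m^2
    return math.sqrt(2.0 * EPS_SI * Q * n_a_m3) / c_ox


def threshold_shift(vsb, gamma, two_phi_f):
    """Vtn - Vtn0 = gamma * (sqrt(Vsb + 2*phi_F) - sqrt(2*phi_F)), in volts."""
    return gamma * (math.sqrt(vsb + two_phi_f) - math.sqrt(two_phi_f))


gamma = body_factor(4e16, 1e-8)
dvt = threshold_shift(2.2, gamma, 0.77)
print(f"gamma = {gamma:.3f} V^0.5")
print(f"dVtn  = {dvt:.3f} V")
```

With these inputs the results agree with the values quoted in the exercises (γ = 0.334 V^0.5, dVtn = 0.282 V).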
MicroLab, VLSI-2 (9/24)

JMM v1.4
Basic DC equations

MOS transistors have 3 regions of operation:
• cutoff region (subthreshold)
• linear region (triode region)
• saturated region (active region)

[Figure: MOS transistor structure: polysilicon gate, SiO2,
source diffusion, drain diffusion]

Cutoff or subthreshold region:
Vgs <= Vt
Ids = 0
There is still a small current described in the
second order effects (weak inversion). Important to
model for analog circuits: I ds ∝ Vds
MicroLab, VLSI-2 (10/24)

JMM v1.4
“Linear” operating region
[Figure: NFET cross-section with Vgs > Vt and 0 < Vds < Vdsat; current Ids]

• Larger Vgs creates deeper channel which increases Ids.
• Larger Vds increases drift current but also reduces vertical
  field component which in turn makes channel less deep.
  Channel will pinch-off when Vds = Vgs - Vt = Vdsat.

$$ I_{ds} = \frac{W}{L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\left[(V_{gs} - V_t)\,V_{ds} - \frac{V_{ds}^2}{2}\right] $$

• W/L: channel length is almost always min allowable
• µ: mobility (µn > µp)
• fet gain factor k = µ Cox
• max value at Vds = Vdsat, but then channel is
  pinched off (see next slide)
• only linear when Vds is small, otherwise parabolic
MicroLab, VLSI-2 (11/24)

JMM v1.4
Saturated operating region
[Figure: NFET cross-section with Vgs > Vt and Vdsat < Vds; current Ids]

• Voltage at channel end remains essentially constant at Vdsat so
  drift current also remains constant: device is in saturation.
• Electrons arriving from source are injected into drain depletion
  region and accelerated towards drain by field proportional to
  Vds - Vdsat, usually reaching the drift velocity limit.

$$ I_{ds(sat)} = \frac{W}{2L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,(V_{gs} - V_t)^2 $$

this is just Ids from previous slide
evaluated at Vds = Vdsat

MicroLab, VLSI-2 (12/24)

JMM v1.4
Channel-length modulation
(second order)

[Figure: NFET cross-section with Vgs > Vt and Vdsat < Vds; the channel is
shortened by dL at the drain end, L’ = L - dL]

• This looks just like a fet with a channel length
  of L’ < L. Shorter L’ implies greater Ids...
• As Vds increases, dL gets larger.

As Vds increases the effective channel length gets
shorter so Ids(sat) increases. dL is proportional to
sqrt(Vds − Vdsat) but most people approximate channel
length modulation as a linear effect:

$$ I_{ds(sat)} = \frac{W}{2L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,(V_{gs} - V_t)^2\,(1 + \lambda V_{ds}) $$
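
The three operating regions can be combined into one piecewise function. This is a minimal sketch of the simple model only: `beta` stands for (W/L)·µ·εox/tox, and the (1 + λVds) factor is also applied in the linear region, an assumption made here so that the curve stays continuous at Vds = Vdsat (the slides state the factor only for saturation). The example values for beta, Vt and λ are illustrative.

```python
def ids_nfet(vgs, vds, beta, vt, lam=0.0):
    """Drain current in amperes for an NFET with Vds >= 0.

    beta is (W/L) * mu * eps_ox / t_ox in A/V^2, vt the threshold voltage
    in volts and lam the channel-length modulation factor in 1/V.
    """
    if vds < 0:
        raise ValueError("label the higher-voltage terminal D so that Vds >= 0")
    vov = vgs - vt                      # overdrive, equal to Vdsat
    if vov <= 0:
        return 0.0                      # cutoff (subthreshold current ignored)
    clm = 1.0 + lam * vds               # channel-length modulation factor
    if vds < vov:
        return beta * (vov * vds - 0.5 * vds * vds) * clm   # linear region
    return 0.5 * beta * vov * vov * clm                      # saturation


# Illustrative sweep: beta = 200 uA/V^2, Vt = 0.61 V, lambda = 0.02 1/V.
for vgs in range(0, 6):
    row = [ids_nfet(vgs, vds, 200e-6, 0.61, 0.02) for vds in (0.0, 1.0, 2.5, 5.0)]
    print(vgs, ["%.3e" % i for i in row])
```

Sweeping Vds for several values of Vgs, as in the loop, yields the kind of curve family that a plot of Ids vs. Vds shows.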
MicroLab, VLSI-2 (13/24)

JMM v1.4
NFET Ids curves
“Put it together and what have you got?”

plot of Ids vs. Vds for Vgs = 0 ,1, 2, 3, 4 and 5V

Can you find the following in the plot?


Ids vs. Vds when Vgs = 0V
Ids vs. Vds when Vgs = 5V
value of Vt
value of Vdsat
evidence of body effect
evidence of channel length modulation

MicroLab, VLSI-2 (14/24)

JMM v1.4
SPICE Models
There are different models used in circuit simulators:
• level 1 is our simple model including the most
  important second order effects described
• level 2 model is based on device physics
• level 3 is a semi-empirical model that allows the
  equations to be matched to the real circuit; example:
  the BSIM model from Berkeley models subthreshold
  characteristics
• summary of the main SPICE DC parameters used in
  all three models at the end of this chapter

.
M1 4 3 5 0 nfet W=1u L=0.5u AS=1p AD=1p PS=3u PD=3u
.
.
.MODEL nfet NMOS
+TOX=1E-8
+CGB0=345p CGS0=138p CGD0=138p
+CJ=775u CJSW=344p MJ=0.35 MJSW=0.26 PB=0.75
+. . . .
.
.
MicroLab, VLSI-2 (15/24)

JMM v1.4
MOSFET Capacitance Estimation
The dynamic response of MOS systems strongly
depends on the parasitic capacitances associated with
the MOS device. The total load capacitance on the
output of a CMOS gate is the sum of:
• gate capacitance (of other inputs connected to out)
• diffusion capacitance (of drain/source regions)
• routing capacitances (output to other inputs)

[Figure: circuit model with Cgd and Cgs between gate and drain/source,
Cgb between gate and substrate, Cdb and Csb between drain/source and
substrate]

[Figure: device cross-section showing gate, oxide of thickness tox,
channel, source, drain, depletion layer and substrate, with Cgs, Cgb,
Cgd, Csb and Cdb marked]
MicroLab, VLSI-2 (16/24)

JMM v1.4
MOSFET gate capacitances
Cg = Cgd + Cgs + Cgb
Oxide-related capacitances come in two forms:

• overlap capacitance (extrinsic) since gate slightly
  overhangs diffusions and bulk:
    for both Cgs and Cgd:  C(overlap) = W LD Cox   (LD = amount of overlap)
    for Cgb:               C(overlap) = 2L CGB0
    for SPICE:             Cgs = W CGS0
                           Cgd = W CGD0
                           Cgb = 2L CGB0

• channel-charge related capacitances (intrinsic):
    cut-off:     Cgb = Cox W L
                 Cgs = Cgd = 0
    linear:      Cgb = 0                    (shielded by channel)
                 Cgs = Cgd = 0.5 Cox W L    (equally shared between S and D)
                 note capacitive coupling of gate and drain/source
    saturation:  Cgb = 0                    (channel pinched off)
                 Cgd = 0                    (channel shortened)
                 Cgs = 0.67 Cox W L
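
A total gate capacitance estimate adds the three overlap terms to the intrinsic Cox·W·L. The sketch below does this for the device of exercise vlsi2.4 (W = 1um, L = 0.5um) with the nmos overlap values from the process table; the function name is hypothetical, and taking the full Cox·W·L as the intrinsic part is an assumption that corresponds to the cut-off and linear cases.

```python
EPS_OX = 3.9 * 8.8542e-12   # permittivity of SiO2, F/m


def gate_capacitance(w, l, tox, cgs0, cgd0, cgb0):
    """Estimate Cg = Cgs + Cgd + Cgb in farads (all lengths in metres).

    Overlap part: W*CGS0 + W*CGD0 + 2L*CGB0 (overlap parameters in F/m).
    Intrinsic part: Cox*W*L, the total channel-related capacitance.
    """
    cox = EPS_OX / tox                      # F/m^2
    intrinsic = cox * w * l
    overlap = w * cgs0 + w * cgd0 + 2.0 * l * cgb0
    return intrinsic + overlap


cg = gate_capacitance(w=1e-6, l=0.5e-6, tox=1e-8,
                      cgs0=1.38e-10, cgd0=1.38e-10, cgb0=3.45e-10)
print(f"Cgate = {cg * 1e15:.2f} fF")
```

This lands at the Cgate = 2.35fF quoted as the result of the exercise.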

MicroLab, VLSI-2 (17/24)

JMM v1.4
MOSFET diffusion capacitances
Junction capacitances Cdb and Csb are a function of the
applied terminal voltages and diffusion dimensions:

[Figure: source/drain diffusion of depth xj next to the channel. One
sidewall faces the channel, the bottom junction faces the p-type
substrate, the other sidewalls face the p+ channel stop]

$$ C_{diff} = \frac{C_j\,A}{\left(1 - \frac{V_j}{V_b}\right)^{M_j}} + \frac{C_{jsw}\,P}{\left(1 - \frac{V_j}{V_b}\right)^{M_{jsw}}} $$

• Cj: zero-bias C/unit area of bottom junction
• A: area of diffusion
• Cjsw: zero-bias C/unit length of sidewall junction
• P: perimeter of diffusion
• Vj: junction voltage (negative for reverse biased)
• Vb: built-in junction potential
• Mj, Mjsw: grading coeff.
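
The diffusion capacitance formula translates directly into code. The sketch below evaluates it for a reverse bias of 3 V with the nmos junction parameters of the process table and the geometry of exercise vlsi2.4 (A = 1um2, P = 3um); the function name is illustrative.

```python
def diffusion_capacitance(area, perimeter, vj, cj, cjsw, mj, mjsw, vb):
    """Junction capacitance of a source/drain diffusion in farads.

    area in m^2, perimeter in m, cj in F/m^2, cjsw in F/m, vb (built-in
    potential) in volts; vj is negative for a reverse-biased junction.
    """
    bottom = cj * area / (1.0 - vj / vb) ** mj
    sidewall = cjsw * perimeter / (1.0 - vj / vb) ** mjsw
    return bottom + sidewall


# Reverse bias of 3 V, so the junction voltage is -3 V.
c_drain = diffusion_capacitance(area=1e-12, perimeter=3e-6, vj=-3.0,
                                cj=7.75e-4, cjsw=3.44e-10,
                                mj=0.35, mjsw=0.26, vb=0.7556)
print(f"Cdrain = {c_drain * 1e15:.2f} fF")
```

With exactly these inputs the estimate is about 1.1 fF, the same order as the 1.2fF quoted in the exercise; at zero bias the function returns Cj·A + Cjsw·P.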

MicroLab, VLSI-2 (18/24)

JMM v1.4
P-channel MOSFETs
[Figure: PFET cross-section with terminals S, G, D, B; two p+ regions
inside an n well in the p substrate]

• Threshold voltage is negative since we need to
  attract holes to form the inversion layer.
• PFET is built inside its own “substrate”: an n-type
  well or tub diffused into p-type bulk substrate.
  Don’t forget well contacts!
• Terminal with lower voltage is labelled D,
  the other is labelled S.
• B: n-well always connected to Vdd to keep pn
  junction back-biased.

[Figure: other PFET symbols with terminals G, S, D, B]

off: Vgs > Vt
lin: Vgs < Vt, Vds > Vgs-Vt
sat: Vgs < Vt, Vds < Vgs-Vt
MicroLab, VLSI-2 (19/24)

JMM v1.4
Depletion-mode MOSFETs

[Figure: NFET cross-section with terminals S, G, D, B and two n+ regions;
channel doped with donors to give negative threshold voltage, i.e.,
depletion fets are always on]

This mosfet is always conducting but, like ordinary
enhancement fets, it will conduct more current as Vgs
increases. One can build logic circuits with only
n-channel devices (NMOS): enhancement fets for pulldowns
and depletion fets as static pullups. Since NMOS logic
dissipates DC power it’s been largely replaced by CMOS.

MicroLab, VLSI-2 (20/24)

JMM v1.4
Coming Up...
Next topic…
Static characteristics of MOS inverters: input
and output voltages, noise margins, power
dissipation.

Readings for next time…


Weste:
• sections 2 thru 2.23 except 2.2.2.4 - 2.2.2.7 (fet models),
• 3 thru 3.2.2 (process technology) and
• 4.3 through 4.3.4 (capacitances)

CBT:
Study the chip fabrication text of the university of
Manchester at the MicroLab VLSI course web link.

MicroLab, VLSI-2 (21/24)

JMM v1.4
Useful Constants

sym    value         units    description
ε0     8.8542E-12    F/m      permittivity of free space
εox    3.9 ε0        F/m      permittivity of SiO2
εSi    11.7 ε0       F/m      permittivity of silicon
VT     25.8          mV       kT/q (@300 K)
q      1.6022E-19    C        charge of electron
k      1.381E-23     J/K      Boltzmann’s constant
ni     1.45E10       cm^-3    intrinsic carrier concentration

MicroLab, VLSI-2 (22/24)

JMM v1.4
Alcatel 0,5um Process Parameters
sym    param   nmos       pmos       units       description
Vt0    VTO     0.61       -0.61      V           threshold voltage
tox    TOX     1E-8       1E-8       m           thin oxide thickness
NA     NSUB    4E16       4E16       cm^-3       substrate doping density
µ      U0      290        72         cm^2/Vs     charge mobility
k      KP                            A/V^2       fet gain factor
γ      GAMMA                         V^0.5       bulk threshold param.
Cox    COX                           F/m^2       oxide capacitance
λ      α/L                           V^-1        channel length modulat.
α              1e-8       2e-8       V^-1m^-1    channel length mod fact.
φ0     PB      0.7556     0.78469    V           built in junction potent.
2φF    PHI     0.77       0.77       V           surface inversion pot.
Cgb0   CGB0    3.45E-10   ditto      F/m         overlapping cap per 2L
Cgs0   CGS0    1.38E-10   ditto      F/m         overlapping cap per W
Cgd0   CGD0    1.38E-10   ditto      F/m         overlapping cap per W
Cj     CJ      7.75E-4    8.15E-4    F/m^2       zero-bias cap / unit A
Cjsw   CJSW    3.44E-10   3.54E-10   F/m         zero-bias cap per unit P
Mj     MJ      0.35       0.36                   grading coeff for bottom
Mjsw   MJSW    0.26       0.27                   grading coeff sidewall
MicroLab, VLSI-2 (23/24)

JMM v1.4
Exercises: VLSI-2
Ex vlsi2.1 (difficulty: easy): Calculate the missing
parameters on the previous transparency like intrinsic
transconductance k, bulk threshold parameter γ and
oxide capacitance Cox of an nfet (Alcatel 0.5µm process)
Result: kn=100µA/V2, kp=24.9µA/V2, γ=0.334V0.5,
Cox=3.45E-7 F/cm2 (see Weste pp48ff)
Ex vlsi2.2 (difficulty: easy): Calculate the threshold
voltage shift due to the body effect of an nfet at Vsb =
2.2V (Alcatel 0.5µm process)
Result: dVtn = 0.282V (see Weste pp55)
Ex vlsi2.3 (difficulty: easy): Calculate the
transconductance βn of an nfet (Alcatel 0.5µm process),
W=1 µm, L= 0.5 µm
Result: βn=200 µA/V2 (see Weste pp53)
Ex vlsi2.4 (difficulty: easy): Calculate the capacitances of
an nfet with Vsb=Vdb=3V, W=1µm, L=0.5µm,
A=1µm2, P=3µm (Alcatel 0.5µm process)
Result: Cgate=2.35fF, Cdrain=Csource=1.2fF (see Weste
pp183-191)

Weste pp99: 2.10: Have a look at ex 8, 9
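
The gain-factor results of Ex vlsi2.1 and Ex vlsi2.3 follow from k = µ·Cox and β = k·W/L. The short sketch below reproduces them from the table values (mobility 290 and 72 cm2/Vs, tox = 1E-8 m); the helper name is hypothetical.

```python
EPS_OX = 3.9 * 8.8542e-12   # permittivity of SiO2, F/m


def gain_factor(mobility_cm2, tox_m):
    """k = mu * Cox in A/V^2, with the mobility given in cm^2/Vs."""
    cox = EPS_OX / tox_m                  # F/m^2
    return mobility_cm2 * 1e-4 * cox      # cm^2/Vs -> m^2/Vs


cox_per_cm2 = EPS_OX / 1e-8 * 1e-4        # F/m^2 -> F/cm^2
kn = gain_factor(290, 1e-8)
kp = gain_factor(72, 1e-8)
beta_n = kn * (1e-6 / 0.5e-6)             # W = 1 um, L = 0.5 um

print(f"Cox    = {cox_per_cm2:.3e} F/cm^2")
print(f"kn     = {kn * 1e6:.1f} uA/V^2")
print(f"kp     = {kp * 1e6:.1f} uA/V^2")
print(f"beta_n = {beta_n * 1e6:.1f} uA/V^2")
```

The unit conversions (cm2 to m2 for the mobility, F/m2 to F/cm2 for Cox) are where such hand calculations most often go wrong, which is why they are spelled out.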


MicroLab, VLSI-2 (24/24)

JMM v1.4
VLSI Design I
Static characteristics of MOS inverter

Static characteristics?
Does that mean it’s not
going to move?

Overview
Static transfer characteristic of CMOS gates

Goal: You know the transfer characteristic of CMOS


gates and know how to calculate noise margins

MicroLab, VLSI-3 (1/14)

JMM v1.4
NFET Review
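[Figure: NFET symbols with terminals D, G, S; Vgs is measured from gate
to source, Vds >= 0 from drain to source. Figure labels: 0.7V, Vgs - Vt]

Operating regions:
• cut-off: Vgs < Vt
• linear: Vgs >= Vt, Vds < Vdsat
• saturation: Vgs >= Vt, Vds >= Vdsat

[Figure: Ids vs. Vds curves for increasing Vgs]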
MicroLab, VLSI-3 (2/14)

JMM v1.4
PFET Review
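[Figure: PFET symbols with terminals D, G, S; Vgs is measured from gate
to source, Vds <= 0 from drain to source. Figure labels: -0.7V, Vgs - Vt]

Operating regions:
• cut-off: Vgs > Vt
• linear: Vgs <= Vt, Vds > Vdsat
• saturation: Vgs <= Vt, Vds <= Vdsat

[Figure: -Ids vs. -Vds curves for increasing -Vgs]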
MicroLab, VLSI-3 (3/14)

JMM v1.4
“Bipolar” Logic
Isn’t this a CMOS course?

Bipolar = two signal levels
‘0’ when V near 0
‘1’ when V near Vdd

Inverter recipe:
• pullup: make this connection
  when Vin near 0 so that Vout = Vdd
• pulldown: make this connection
  when Vin near Vdd so that Vout = 0

[Figure: inverter between Vin and Vout with a pullup connection to Vdd
and a pulldown connection to ground]

• one power supply => low impedance source for 2 levels
• receivers have a simple job => only make one decision
• no DC power if connections not “made” at same time
• Boolean logic has been around a long time

MicroLab, VLSI-3 (4/14)

JMM v1.4
Characterizing Inverters
What goals do we want to achieve with our inverter
implementation (and, more generally, other functions)?
• fast propagation delay (next lecture!)
• low power dissipation
• compact layout
• noise immunity

[Figure: empty Vout vs. Vin axes with VOH and VOL marked on the Vout axis
and VIL and VIH marked on the Vin axis]

• Draw voltage-transfer curve (VTC) for inverter.
• Shade-in areas that VTC can’t enter.
• What can we say about gain?
• What is “ideal” inv. VTC?

MicroLab, VLSI-3 (5/14)

JMM v1.4
Noise Margin

Are there other ways of signalling?

• noise immunity. Since we’re signalling values using
  voltages, we want good noise margins. This means
  that we need to make an allowance for noise when
  assigning voltage levels for valid inputs and outputs
• definition:

$$ NM_L = V_{ILmax} - V_{OLmax} $$
$$ NM_H = V_{OHmin} - V_{IHmin} $$

[Figure: output characteristics next to input characteristics between Vss
and Vdd. Logical high output range: VOHmin to Vdd. Logical high input
range: VIHmin to Vdd. Logical low input range: Vss to VILmax. Logical low
output range: Vss to VOLmax.]
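
The two noise-margin definitions are simple differences between the guaranteed output levels and the accepted input levels. A minimal sketch follows; the four voltage levels in the example are made up for illustration and do not describe any real logic family.

```python
def noise_margins(vol_max, voh_min, vil_max, vih_min):
    """Return (NM_L, NM_H) in volts from the four signalling levels."""
    nm_low = vil_max - vol_max     # noise a low output can pick up and still read as '0'
    nm_high = voh_min - vih_min    # noise a high output can lose and still read as '1'
    return nm_low, nm_high


# Hypothetical levels for a 3.3 V supply.
nm_l, nm_h = noise_margins(vol_max=0.3, voh_min=3.0, vil_max=1.2, vih_min=2.1)
print(f"NM_L = {nm_l:.2f} V, NM_H = {nm_h:.2f} V")
```

A negative result from either difference would mean that a valid output is not guaranteed to be recognised as a valid input.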
MicroLab, VLSI-3 (6/14)

JMM v1.4
Choosing signal voltages
This is a subject on which reasonable people
can disagree! One possible line of attack:

Step 1: pick VIL and VIH
don’t want to amplify noise
so find values of Vin where
VTC gain = 1 or -1. Choose
smallest VIL and largest VIH

[Figure: merged VTC for all process corners & devices, Vout vs. Vin,
with VIL and VIH marked]

Step 2: pick VOL and VOH
choose values so that
(1) VTC is in legal territory
(2) leave desired noise margins

[Figure: VTC with VOH, VOL, VIL, VIH and the noise margins NML and NMH
marked]

MicroLab, VLSI-3 (7/14)

JMM v1.4
Inverter pulldown devices
The NFET makes an ideal pulldown device:

Ipd

Š if pullup is off, VOL = ______


Š no DC connection when Vin < ______
Š increase width to increase Ipd
Š compact layout
[Figure: Vout vs. Vin plane showing the cut-off pulldown region, the saturated pulldown region and the linear pulldown region, with the lines Vin = Vout and Vin = Vout + Vt0 drawn and Vt0 marked on the Vin axis; VIL is always > Vt0]
MicroLab, VLSI-3 (8/14)

JMM v1.4
Inverter pullup devices
Resistor. No load on input, VOH=Vdd. Will dissipate static power; increasing R will reduce power and increase noise margin, but low-to-high transition gets slower. Only practical if process supports undoped poly which has sheet resistance of 10M Ohm/square.

Depletion-mode NFET. No load on input, VOH=Vdd. Connecting gate to source sets Vgs = 0 so Ipu is determined only by Vout. Layout can be compact since pullup is in same well as pulldown; buried contact can be used to connect gate to source. Only found in NMOS processes.

Enhancement-mode NFET. VOH = Vdd - Vt unless gate of pullup is driven above Vdd. If gate is not switched off, pullup needs to be weak to avoid excessive power dissipation, but this may entail larger layouts. Useful where PFETs not wanted (e.g., some I/O structures).

Pseudo-NMOS using saturated PFET as load device. VOH = Vdd. Useful for building large fan-in NOR gates found in static ROMs and PLAs where static power dissipation is okay.

MicroLab, VLSI-3 (9/14)

JMM v1.4
Inverter with PFET pullup
pullup (pfet): Vgs,pu = Vin-Vdd, Vds,pu = Vout-Vdd
pulldown (nfet): Vgs,pd = Vin, Vds,pd = Vout
Š negligible steady-state power dissipation
Š VOL = 0V, VOH = Vdd
Š VTC transition very sharp (non-vertical only because of channel-length modulation)
Š switching point can be adjusted by fet sizing
[Figure: inverter schematic (pfet pullup and nfet pulldown with common gate Vin and common drain Vout) and its VTC, Vout vs. Vin, with the line Vin = Vout, the fet operating regions (off, lin, sat) marked for n and p, curves for Wn/Wp>1 and Wn/Wp<1, and the Vin axis marked Vt,p, Vt,n, Vdd+Vt,p, Vdd]

MicroLab, VLSI-3 (10/14)

JMM v1.4
Build your own VTC
In the steady state:
Ids,pd(Vin,Vout) = -Ids,pu(Vin-Vdd,Vout-Vdd)
[Figure: plots of Ids,pd and -Ids,pu versus Vout for Vin = 0.5V, 1.5V, 2.5V, 3.5V and 4.5V, next to the resulting Vout vs. Vin curve]
When both fets are saturated, small changes in Vin produce large changes in Vout.
MicroLab, VLSI-3 (11/14)

JMM v1.4
Ben Bitdiddle’s Buffer!

Vin Vout

How many would you buy?

MicroLab, VLSI-3 (12/14)

JMM v1.4
Coming Up...
Next topic…
Dynamic characteristics of MOS inverters:
propagation delay, effects of rise and fall times.
Transistor sizing, interconnect issues, estimating
performance.
Readings for next time…
Weste:
‹ Sections 2.3 through 2.3.2

MicroLab, VLSI-3 (13/14)

JMM v1.4
Exercises: VLSI-3
Ex vlsi3.1 (difficulty: easy): Calculate the CMOS inverter threshold values for the following configurations (Alcatel 0.5µm process, VDD=3.3V)
a) Wn = Ln, Wp = Lp
b) Wn = 10 Ln, Wp = Lp
c) Wn = Ln, Wp = 10 Lp
Result: a) Vinv = 1.30V, b) Vinv = 0.893V, c) Vinv = 1.88V (see Weste pp66)
Ex vlsi3.2 (difficulty: medium, time consuming): Calculate the noise margin and VIL, VIH, VOL, VOH, for a CMOS inverter operating at 3.3V with βn = βp, Utn = -Utp = 0.61V.
Result: VIL = 1.39V, VIH = 1.91V, VOL = 0.26V, VOH = 3.04V, NML = NMH = 1.13V
Weste pp99: 2.10 ex 5 (difficulty: medium, time consuming): Design an input buffer that may be used to interface with a TTL driver (Vdd=3.3V, VOL=0.8V, VOH=2.0V). Show full derivations of DC conditions. Assume Wn = 1µm and Ln = Lp = 0.5µm
Result: Wp = 1.51µm
MicroLab, VLSI-3 (14/14)

JMM v1.4
VLSI Design I
Dynamic characteristics of MOS inverters

Wow! 0 to 3.3 volts in 300ps!

Overview
gate delay modeling
power dissipation

Goal: You are familiar with CMOS gate delay models like Penfield-Rubinstein and wire models. You know the influence of body effect and large loads on gate delay. You know why ground bounce occurs. You know the different factors in power dissipation.
MicroLab, VLSI-4 (1/29)

JMM v1.3
Static properties reviewed
sharp transition: inverter good receiver for voltage-based signalling
Define threshold voltage Vinv as voltage where Vin = Vout on VTC.
[Figure: VTC (Vout vs. Vin) with Ids,n overlaid; the transition shifts one way for increasing Wn / decreasing Wp and the other way for increasing Wp / decreasing Wn]
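For a rough feel of how sizing moves Vinv, one can equate the two saturation currents of the simple square-law model at Vin = Vout = Vinv, which gives Vinv = (Vdd + Vtp + Vtn·sqrt(βn/βp)) / (1 + sqrt(βn/βp)). The sketch below evaluates that expression; the beta values are hypothetical, so it is not expected to reproduce the process-specific results of Ex vlsi3.1.

```python
import math

def inverter_threshold(vdd, vtn, vtp, beta_n, beta_p):
    """Vinv from equal saturation currents (square-law model); vtp is negative."""
    r = math.sqrt(beta_n / beta_p)
    return (vdd + vtp + vtn * r) / (1.0 + r)

# Hypothetical betas; thresholds as in Ex vlsi3.2 (Utn = -Utp = 0.61V)
v_sym = inverter_threshold(3.3, 0.61, -0.61, beta_n=1.0, beta_p=1.0)
v_strong_n = inverter_threshold(3.3, 0.61, -0.61, beta_n=10.0, beta_p=1.0)
v_strong_p = inverter_threshold(3.3, 0.61, -0.61, beta_n=1.0, beta_p=10.0)
print(round(v_sym, 3), round(v_strong_n, 3), round(v_strong_p, 3))
```

A stronger nfet pulls Vinv down and a stronger pfet pulls it up, which is the direction of the shifts marked on the VTC.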
VOH=Vdd, VOL=0, sharp transition => good noise margins
VOH=Vdd => pfet off when Vin=VOH => no static power
VOL=0 => nfet off when Vin=VOL => no static power

VTC describes static behaviour. When Vin changes, Vout “lags behind” because it takes time for capacitors to charge/discharge. So, in real life, Vin reaches Vth before Vout does.

MicroLab, VLSI-4 (2/29)

JMM v1.3
Choosing what to measure
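[Figure: Vin and Vout waveforms (V vs. t) for an inverter, with the 10% and 90% levels, the rise time tr, the fall time tf and the delay td measured at a reference level ???]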
Rise time, tr = time for a waveform to rise from 10% to 90% of its steady-state value
Fall time, tf = time for a waveform to fall from 90% to 10% of its steady-state value
Delay time, td = time between input transition (when Vin = ???) and output transition (when Vout = ???).
Š If ??? = Vinv, can delay be negative?
Š does Vinv differ for each gate? so does td(seq. of gates) = sum(td)?
Š should we choose 50% instead of Vinv?

MicroLab, VLSI-4 (3/29)

JMM v1.3
Signal delay time
Signal delay time is composed as follows
‹ gate delay time
‹ interconnection delay time
‹ due to miniaturization the delay times decrease
‹ the output impedance of buffers increases, thus the importance of interconnection delays increases
=> due to continuing miniaturization, signal delay time becomes less dependent on gate delay but more dependent on interconnection delay time

[Figure: switch level model of fet (Ugs, Uds, R, C) and switch level model of inverter (UCC, Uin, Uout, Rp, Rn, Cin)]

MicroLab, VLSI-4 (4/29)

JMM v1.3
Fall time analysis #1
[Figure: VTC (Vout vs. Vin) showing the static transition and the dynamic transition at speed, with the line Vin = Vout, the fet operating regions (off, lin, sat) marked for n and p, and the Vin axis marked Vt,p, Vt,n, Vdd+Vt,p, Vdd]

‹ the switching speed is limited by the time taken to discharge the capacitance CL
‹ the static transition curve moves to the right if the input transition is fast
‹ p-fet gets cut off during the whole falling output time
‹ n-fet immediately gets saturated, later on linear

MicroLab, VLSI-4 (5/29)

JMM v1.3
Fall time analysis #2
Saturated: Vout >= Vdd - Vt,n
$$C_L \frac{dV_{out}}{dt} = -\frac{\beta_n}{2}\,(V_{dd} - V_{t,n})^2$$
[Figure: CL discharged by the constant current Idsat,n]
So, time to fall from 0.9Vdd to Vdd - Vt,n is given by
$$\frac{2 C_L}{\beta_n (V_{dd} - V_{t,n})^2} \int_{V_{dd} - V_{t,n}}^{0.9 V_{dd}} dV_{out}$$

Linear: Vout < Vdd - Vt,n
$$C_L \frac{dV_{out}}{dt} = -\frac{V_{out}}{R_n} = -I_{dn}$$
where Idn is a function of Vout.
[Figure: CL discharged through Rn]
So, time to fall from Vdd - Vt,n to 0.1Vdd is given by
$$C_L \int_{0.1 V_{dd}}^{V_{dd} - V_{t,n}} \frac{dV_{out}}{I_{dn}}$$
Adding to get total fall time (Weste, Eq 4.37), with n = Vt,n/Vdd:
$$t_f = \frac{2 C_L}{\beta_n V_{dd} (1 - n)} \left[ \frac{n - 0.1}{1 - n} + 0.5 \ln(19 - 20n) \right]$$
The factor multiplying CL/(βn Vdd) equals 3 to 4 for Vdd=3V-5V and Vt,n=.5V-1V, and equals 3.6 for C05M.
tr is similar.
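The dimensionless factor in the fall-time expression is easy to evaluate. The sketch below assumes that "C05M" refers to the 0.5µm process with Vt,n = 0.61V at Vdd = 3.3V used in the exercises; the function name `fall_time_factor` is illustrative.

```python
import math

def fall_time_factor(vtn, vdd):
    """k such that tf = k * CL / (beta_n * Vdd), with n = Vt,n / Vdd."""
    n = vtn / vdd
    return 2.0 / (1.0 - n) * ((n - 0.1) / (1.0 - n) + 0.5 * math.log(19.0 - 20.0 * n))

k = fall_time_factor(vtn=0.61, vdd=3.3)
print(round(k, 2))
```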
MicroLab, VLSI-4 (6/29)

JMM v1.3
Estimating delays
In most CMOS circuits, the delay of a single gate is dominated by the output rise and fall time. Thus:
$$t_{dr} = \frac{t_r}{2}, \qquad t_{df} = \frac{t_f}{2}$$

Having found a general form for approximate rise and fall times, one might estimate all delays using the same general form:
$$t_{delay} = A_{delay}\,\frac{L}{W}\,C_L$$
(W: width expressed as multiple of minimum width; the factor multiplying CL looks like a resistor!)

Where Adelay is a constant that depends on the power supply and transition voltages, the process and the minimum mosfet dimensions. This last dependency might strike one as odd, but usually all mosfets are built using the minimum allowable mosfet length for the process.
Rather than solve the equations analytically, one can use Spice to determine the value of various useful constants: Ar, Af, Adr, Adf. These can be used in quick & dirty calculations for sizing transistors during the design process.
MicroLab, VLSI-4 (7/29)

JMM v1.3
Input rise/fall & delay
How do input rise and fall times affect delay?
Š fast inputs will quickly turn off one mosfet and provide maximum Vgs to the driving mosfet for most of the output transition
Š slow inputs will leave both mosfets on longer, reducing effective current to/from load capacitance and Vgs will be lower than above.
So we might expect slower input transitions to lead to longer output delay times.
One rule of thumb (Weste, p. 216ff):
$$t_{dr} = t_{dr\text{-}step} + \frac{1 + 2n}{6}\,t_{f,in}$$
$$t_{df} = t_{df\text{-}step} + \frac{1 - 2p}{6}\,t_{r,in}$$
(the factor is ~0.2 for Vtn = 0.61V, Vdd = 3.3V)

valid for input transitions that aren’t “too” long

MicroLab, VLSI-4 (8/29)

JMM v1.3
Bootstrapping & delay

CGD

When the input starts to rise, the output, which was


high, starts to fall. Thus the voltage across CGD
changes requiring the input to supply more current to
charge CGD, slowing the input transition.
Since CGD is small, this is usually a small effect.
When inverter is biased into its linear region, CGD may
appear multiplied by the gain of the inverter (Miller
effect). This doesn’t usually matter in digital circuits
since the input passes rapidly through linear region.
Useful in analog circuits...

MicroLab, VLSI-4 (9/29)

JMM v1.3
Multiple inputs & delay

[Figure: series pulldown chain with inputs A, B, C, D; output capacitance Cout and intermediate node capacitances Cab, Cbc, Ccd]

How should we model delays when we have multiple inputs? When A, B, C and D are logic 1:
Š treat series mosfets as resistances in series. Lump intermediate node capacitance with load capacitance.
$$t_d = \sum_i R_i \cdot \sum_i C_i$$
Š use Penfield-Rubinstein model which predicts
$$t_d = \sum_i R_i C_i$$
where Ri is the summed resistance from point i to ground and Ci is the capacitance at point i.
Š Penfield-Rubinstein Slope Model uses effective resistance simulated by Spice:
$$R_n = \frac{t_{df}}{C}$$
MicroLab, VLSI-4 (10/29)
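To compare the lumped estimate with the Penfield-Rubinstein sum, here is a minimal sketch for a four-fet pulldown chain. The resistance and capacitance values are hypothetical (10 kΩ per fet, 100 fF at the output, 10 fF at each intermediate node); with kΩ and fF the product comes out in ps.

```python
def lumped_delay(r_fets, c_nodes):
    """All series resistance times all capacitance lumped at the output."""
    return sum(r_fets) * sum(c_nodes)

def penfield_rubinstein_delay(r_fets, c_nodes):
    """Sum of Ci times the resistance from node i down to ground.

    r_fets[i] is the fet below node i; both lists run from the output
    node (index 0) towards ground.
    """
    return sum(sum(r_fets[i:]) * c for i, c in enumerate(c_nodes))

r_fets = [10, 10, 10, 10]        # kOhm, hypothetical on-resistances for A..D
c_nodes = [100, 10, 10, 10]      # fF: Cout, Cab, Cbc, Ccd (hypothetical)
t_lumped = lumped_delay(r_fets, c_nodes)
t_pr = penfield_rubinstein_delay(r_fets, c_nodes)
print(t_lumped, t_pr)            # in ps
```

The lumped model is the more pessimistic of the two because it charges every capacitor through the full series resistance.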
JMM v1.3
Body effect & delay

If A goes from 0 to 1 while B, C and D are 1,


then all the intermediate nodes in the pulldown chain have
already been discharged and the top mosfet sees only a
small body effect.
If D goes from 0 to 1 while A, B and C are 1,
then the intermediate nodes are all one Vt below Vdd and
the upper mosfets see a larger body effect.
Thus A is the “faster” input!

MicroLab, VLSI-4 (11/29)

JMM v1.3
Driving large loads #1
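If large loads have to be driven, the delay may increase drastically. Large loads are output capacitances, clock trees, etc.
$$t_d = t_{inv}\,\frac{C_L}{C_G} = 1000 \cdot t_{inv}$$
[Figure: a size-1 inverter with input capacitance CG driving CL = 1000 CG]

A possibility to reduce the delay, but probably not the optimum:
[Figure: chain of inverters of size 1, 40 and 200 driving CL = 1000 CG, with stage delays 40·tinv, 5·tinv and 5·tinv]
$$t_d = \frac{40}{1}\,t_{inv} + \frac{200}{40}\,t_{inv} + \frac{1000}{200}\,t_{inv} = 50 \cdot t_{inv}$$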
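The stage-by-stage sum can be written as a short helper, assuming (as in the figures) that each stage's delay is tinv times the ratio of the capacitance it drives to its own size. The name `chain_delay` is illustrative.

```python
def chain_delay(sizes, load):
    """Delay of an inverter chain in units of tinv.

    sizes: inverter sizes relative to the minimum inverter, first to last.
    load:  final load capacitance in units of CG.
    """
    driven = list(sizes[1:]) + [load]      # what each stage has to drive
    return sum(c / s for s, c in zip(sizes, driven))

print(chain_delay([1], 1000))              # single minimum inverter
print(chain_delay([1, 40, 200], 1000))     # the three-stage chain above
```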

MicroLab, VLSI-4 (12/29)

JMM v1.3
Driving large loads #2
To drive a large load capacitance one might employ a sequence of n inverters, each a factor “a” larger than the previous one:
[Figure: n = 4 inverters of size 1, a, a^2, a^3 between CG and CL]

The delay through each stage is a·td where td is the average delay of a minimum-sized inverter driving another minimum-sized inverter. We want a^n = (CL/CG), so
$$\text{Total delay} = n\,(a\,t_d) = \ln\!\left(\frac{C_L}{C_G}\right)\frac{a\,t_d}{\ln(a)}$$
Thus, total delay is minimized when a = e = 2.7
[Figure: plot versus a for a = 0 to 8, annotated: in practice a = 3...5]
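A brute-force scan of the total-delay expression shows the minimum near a = e. This is a sketch in units of td, treating the number of stages as a continuous quantity exactly as the formula does; CL/CG = 1000 is taken from the previous slide.

```python
import math

def tapered_delay(a, load_ratio):
    """Total delay in units of td for stage ratio a and CL/CG = load_ratio."""
    return math.log(load_ratio) * a / math.log(a)

candidates = [2.0 + 0.01 * k for k in range(401)]      # a from 2.00 to 6.00
best = min(candidates, key=lambda a: tapered_delay(a, 1000))
print(round(best, 2), round(tapered_delay(best, 1000), 1))
print(round(tapered_delay(3.0, 1000), 1), round(tapered_delay(5.0, 1000), 1))
```

The curve is shallow around its minimum, which is why the practical choice a = 3...5 costs little extra delay while saving stages.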

MicroLab, VLSI-4 (13/29)

JMM v1.3
Power dissipation #1

‹ the power consumption is low compared to other


technologies
‹ scaling down increases the power dissipation
density with respect to chip area
‹ power dissipation produces heat on the chip, which
has to be carried off through the chip socket
‹ power dissipation is one of the limiting factors in today’s CMOS VLSI chips
‹ low power applications are a speciality of EM (Neuenburg; watches, battery applications, etc)

MicroLab, VLSI-4 (14/29)

JMM v1.3
Power dissipation #2

sources of power dissipation:


‹ static power dissipation (quiescent current)

‹ dynamic power dissipation

‹ dc power dissipation: short circuit current (power to ground) due to switching
‹ ac power dissipation: capacitor current (charging, re-charging) due to switching

static power dissipation

‹ there is always one fet off, so only leakage current is present
$$I_0 = I_S\left(e^{qV/kT} - 1\right)$$
$$P_S = \sum I_0 \cdot V_{DD}$$

MicroLab, VLSI-4 (15/29)

JMM v1.3
Dynamic power dissipation #1
Comparison of dynamic short circuit current vs. capacitive current.
‹ As expected, the short circuit current has a less important contribution when the load gets large.
‹ Slower input transition would increase short circuit current.
[Figure: three inverters (fets with W/L=4 and W/L=2) driven by Uin, with outputs Uout-A (no added load), Uout-B (50fF load) and Uout-C (200fF load); the currents Idsn and Idsp are shown for each]

MicroLab, VLSI-4 (16/29)

JMM v1.3
Dynamic power dissipation #2
Average dynamic power for switching a square-wave input with a repetition frequency of fp = 1/tp is (capacitor current)
$$P_d = \frac{1}{t_p}\int_0^{t_p/2} i_n(t)\,V_{out}\,dt + \frac{1}{t_p}\int_{t_p/2}^{t_p} i_p(t)\,(V_{DD} - V_{out})\,dt$$

Assuming a step input and taking in(t) = CL dVout/dt, i.e., the capacitive current, we get:
$$P_d = \frac{C_L}{t_p}\int_0^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{t_p}\int_{V_{DD}}^{0} (V_{DD} - V_{out})\,d(V_{DD} - V_{out})$$
$$P_d = \frac{C_L V_{DD}^2}{t_p} = C_L V_{DD}^2 f_p$$
proportional to switching frequency but independent of device parameters

Aha! Now one can see why everybody changes from 5V to 3.3V and to 2.5V!
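The quadratic dependence on the supply voltage is easy to tabulate. The load and frequency below (100pF at 50MHz) are borrowed from Ex vlsi4.2 purely as example numbers.

```python
def dynamic_power(c_load, vdd, f_switch):
    """Average capacitive switching power Pd = CL * VDD^2 * fp, in watts."""
    return c_load * vdd ** 2 * f_switch

p_5v = dynamic_power(100e-12, 5.0, 50e6)
p_3v3 = dynamic_power(100e-12, 3.3, 50e6)
p_2v5 = dynamic_power(100e-12, 2.5, 50e6)
print(p_5v, p_3v3, p_2v5)
```

Going from 5V to 3.3V cuts the switching power to about 44%, and going to 2.5V cuts it to 25%.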

MicroLab, VLSI-4 (17/29)

JMM v1.3
Dynamic power dissipation #3

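Short circuit power dissipation is given by
$$P_{sc} = I_{mean} \cdot V_{DD}$$
[Figure: input waveform with rise time tr, fall time tf and period tp, crossing the levels Vtn and VDD+Vtp at the times t1, t2, t3, and the resulting short circuit current pulses with peak Imax and mean value Imean]

The above waveform shows the short circuit current
$$P_{sc} = \frac{\beta}{12}\,(V_{DD} - 2V_t)^3\,\frac{t_{rf}}{t_p}$$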

MicroLab, VLSI-4 (18/29)

JMM v1.3
Total power dissipation

‹ Total power dissipation is:

Ptotal = Ps + Pd + Psc

‹ dynamic power dissipation is dominant


‹ use switching activity to estimate power dissipation:
$$P_d = n_{switching} \cdot C_{total} \cdot V_{DD}^2 \cdot f$$
‹ switching activity: nswitching = percentage of switching gates
‹ there exist simulators estimating power dissipation using the switching activity

MicroLab, VLSI-4 (19/29)

JMM v1.3
Build your own power meter
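[Figure: a source Vs = 0 in the supply line senses the supply current Is; a linear current-controlled current source g*Is drives RY in parallel with CY, giving the voltage Vy with Vy(0) = 0V. The device or circuit under test drives CL and has a periodic input Vin(t) = Vin(t+T).]

If one chooses
$$g = \frac{V_{dd}\,C_y}{T}$$
and RyCy >> T, then Vy(T) in volts will equal the average dynamic power in watts drawn from the power supply over one period.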

MicroLab, VLSI-4 (20/29)

JMM v1.3
Power and ground bounce

‹ Metal power-carrying conductors have to be sized for three reasons:
  ‹ metal migration
  ‹ power supply noise
  ‹ RC delay
‹ general rule:
  ‹ limit current density JAL ≈ 0.4 ... 1 mA/µm
  ‹ contact replication

MicroLab, VLSI-4 (21/29)

JMM v1.3
“It’s the wires, stupid”
As process dimensions shrink, wiring capacitances
start to dominate the mosfet capacitances.
To estimate wiring capacitances, consider the
following figure:

[Figure: cross-section of a wire at height h above the plane below, showing the parallel-plate capacitance Cpp and the fringing-field capacitance]

Parallel-plate capacitance given in process files. Fringing capacitance is significant when t is comparable to h.

MicroLab, VLSI-4 (22/29)

JMM v1.3
Fringing Capacitance
Figure 6.11 from CMOS Digital Integrated Circuits: Analysis and Design, by Kang and Leblebici:

For a long conductor where (t/h)=0.4, (w/h)=0.25, (w/l)=0, the total capacitance may be 10x the parallel plate capacitance.
MicroLab, VLSI-4 (23/29)

JMM v1.3
Wire model?
Today, the longest wire on a VLSI chip might be 2cm
which has “time of flight” of ~130ps assuming εSiO2
= 3.9 ε0
If the signal rise/fall time is longer than the time of
flight we can model wires as a distributed RC network.
Longer wires or shorter rise/fall times require the wire
to be modelled as a transmission line.
For short wires, a lumped RC model is sufficient. For
longer wires, we use the distributed RC model where
signal propagation can be shown to obey the diffusion
equation:

$$rc\,\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$
where r = R/unit length, c = C/unit length and x = distance from driver.

Which means the prop time tx = kx^2 with the signal “edge” becoming dispersed with increasing x.

MicroLab, VLSI-4 (24/29)

JMM v1.3
Diffusion Eq. in “real life”
$$t = 2.2\,\frac{r c l^2}{2}$$
(Weste, Eq. 4.28, but 10% to 90% rise/fall time)
Ex vlsi4.3: clock with 50pf load distributed by 1µ-wide metal wire running from clock buffer in corner of 10mm x 10mm chip.
r = 0.05 ohm/square
c = 50pf/20mm
l = 20mm

a) t = ? b) t = ?

Fix: drive clock from central location to decrease l and widen clock wire to 20µ:

r = 0.0025 ohm/square
c = 50pf/10mm
l = 10mm

c) t = ?

whew!
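The rise/fall-time estimate can be checked numerically. The sketch below makes two assumptions: the total wire resistance is the sheet resistance (0.05 ohm/square) times length over width, and the 50pF is treated as the total distributed capacitance, so that r·c·l² equals Rtotal·Ctotal. The function name is illustrative.

```python
def wire_rise_time(r_sheet, length_um, width_um, c_total):
    """10%-90% rise/fall time of a distributed RC wire: 2.2 * R*C / 2."""
    r_total = r_sheet * length_um / width_um     # number of squares times ohm/square
    return 2.2 * r_total * c_total / 2.0

t_a = wire_rise_time(0.05, 20_000, 1, 50e-12)    # 1µ wide, 20mm long
t_c = wire_rise_time(0.05, 10_000, 20, 50e-12)   # 20µ wide, 10mm long
print(t_a, t_c)
```

Under these assumptions the two cases come out at 55ns and about 1.38ns, matching results a) and c) quoted for Ex vlsi4.3.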
MicroLab, VLSI-4 (25/29)

JMM v1.3
Inductance
‹ Bond-wire inductance can cause deleterious effects in large, high speed I/O buffers
  ‹ package inductance: 3 .. 15 nH
‹ with process shrinking on-chip inductance has to be taken into account
  ‹ on-chip inductance: 10 .. 50pH/mm
$$dV = L\,\frac{dI}{dt}$$
[Figure: inductance L between Vdd and the circuit, carrying the current i(t)]
‹ design techniques:
  ‹ separate power pins for I/O pads and chip core
  ‹ multiple power and ground pins
  ‹ careful selection of the position of the power and ground pins on the package
  ‹ adding decoupling capacitances on the board
  ‹ increase the rise and fall times
  ‹ use advanced package technologies (SMD, etc)
MicroLab, VLSI-4 (26/29)

JMM v1.3
Coming Up...
Next topic…
Combinational logic: series/parallel switch networks, transmission gates. Performance optimisation.

Readings for next time…


Weste:
‹ 4.4 (inductance)
‹ 4.3.6, and 4.5 thru 4.5.1, and 4.5.4 thru 4.5.5 except
4.5.4.4, and 4.6.3 (delay modelling)
‹ 4.7 (power consumption)

‹ 4.8 (sizing routing conductors)

You should read the rest of chapter 4 when you get


the chance ...

MicroLab, VLSI-4 (27/29)

JMM v1.3
Exercises: VLSI-4
Ex vlsi4.1 (difficulty: easy): Calculate the inductive spike
at the power supply provoked by 8 output buffers, each
driving 50pF in 4ns, Vdd=3.3V, total bonding
inductance 15nH
Result: dVtot = 1.24V (see Weste pp 205)
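One way to arrive at a figure of this size is sketched below. It assumes each buffer draws an average current I = C·Vdd/t and that this current builds up over the same time t, so that dI/dt ≈ I/t; this estimation style is an assumption here, not necessarily the derivation in Weste.

```python
def inductive_spike(n_buffers, c_load, vdd, t_switch, inductance):
    """Estimate dV = L * dI/dt for n buffers switching simultaneously."""
    i_avg = c_load * vdd / t_switch          # average charging current per buffer
    di_dt = n_buffers * i_avg / t_switch     # assumed: current ramps up within t_switch
    return inductance * di_dt

dv = inductive_spike(n_buffers=8, c_load=50e-12, vdd=3.3,
                     t_switch=4e-9, inductance=15e-9)
print(round(dv, 2))
```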

Ex vlsi4.2 (difficulty: easy): a) Calculate the power supply width Wpower necessary for feeding a clock buffer running at 50MHz driving 100pF. b) What is the ground bounce with the chosen conductor? (JAL=0.5mA/µm, power supply distance l = 1mm, Vdd=3.3V, Rmetal1 = 72mΩ/sq, tr = tf = 1ns)
Result: a) Wpower = 33 µm,
b) dV = 0.72V (see Weste pp 239)
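A possible line of calculation, given as a sketch: the average supply current is taken as C·Vdd·f, the width follows from the current-density limit, and the bounce is the peak current (approximated here as C·Vdd/tr, an assumption) times the resistance of the supply wire.

```python
def supply_width_um(c_load, vdd, f_clk, j_max_ma_per_um):
    """Wire width in µm needed to carry the average current C*Vdd*f."""
    i_avg_ma = c_load * vdd * f_clk * 1e3
    return i_avg_ma / j_max_ma_per_um

def ground_bounce(c_load, vdd, t_rise, r_sheet, length_um, width_um):
    """Peak I*R drop over the supply wire, with I_peak ~ C*Vdd/tr (assumed)."""
    i_peak = c_load * vdd / t_rise
    r_wire = r_sheet * length_um / width_um
    return i_peak * r_wire

w_power = supply_width_um(100e-12, 3.3, 50e6, 0.5)
dv_bounce = ground_bounce(100e-12, 3.3, 1e-9, 0.072, 1000, w_power)
print(round(w_power, 1), round(dv_bounce, 2))
```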

Ex vlsi4.3 (difficulty: easy): Calculate the clock distribution delay for the example on transparency 25
Result: a) td=55 ns, b) td=27.5 ns,
c) td=1.38 ns (see Weste pp 200)
MicroLab, VLSI-4 (28/29)

JMM v1.3
Exercises: VLSI-4

Ex vlsi4.4 (difficulty: easy): Calculate Ar and Af for a CMOS inverter (Vdd=3.3V, Alcatel 0.5µm process)
Result: Ar = 43.9 kΩ, Af = 10.9 kΩ (see Weste pp208ff and transparency 7)

Weste pp370: 5.9 ex 14 (difficulty: easy): A low power 3.3V chip has a clock of 12MHz. In the power-down mode, the clock driver drives 5mm of a 2µm wide metal1 wire. If the area capacitance of metal is Ca=2.37pF/µm2 and the sidewall capacitance is Cf0=2.37pF/µm what is the power-down dissipation, assuming this is the dominant term? What is the dissipation if the wire is reduced to 50µm length?
Result: Pd = 85µW, 0.85µW (see Weste pp 235)

MicroLab, VLSI-4 (29/29)

JMM v1.3
VLSI Design I
CMOS Combinational Logic

Overview
Euler rules for complex CMOS gates
Layout and stick diagram

Goal: You know how to design compact layout of complex CMOS logic gates with the Euler rules. You are familiar with transmission gates and their limitations.
MicroLab, VLSI-5 (1/34)

JMM v1.4
How ‘bout more than 1 input?
Logic recipe:
pullup: make this connection when we want F(A1,…,An) = 1
pulldown: make this connection when we want F(A1,…,An) = 0
[Figure: pullup network between Vdd and the output F(A1,…,An), pulldown network between the output and ground, both controlled by the inputs A1 … An]

Finally! I was getting tired of inverters...

Š since we want VOH = Vdd, better use only pfets in the pullup path
Š similarly, since we want VOL = 0, better use only nfets in the pulldown path
Š looking at pulldown path: since nfets are on when VGS > VTH, output will be pulled low when right combination of inputs are high… CMOS gates are naturally inverting

MicroLab, VLSI-5 (2/34)

JMM v1.4
Complementary logic
Now you know what the “C”
in CMOS stands for!
We want complementary pullup and pulldown
logic, i.e., the pulldown should be “on” when
the pullup is “off” and vice versa.
pullup pulldown F(A1,…,An)
on off driven “1”
off on driven “0”
on on driven “X”
off off no connection

Since there’s plenty of capacitance on the output


node, when the output becomes disconnected it
“remembers” its previous voltage -- at least for a
while. The “memory” is the load capacitor’s charge.
Leakage currents will cause eventual decay of the
charge (that’s why DRAMs need to be refreshed!).

“No connection” is also useful for constructing tristate drivers! In this case, we call this state “Z” which is short for “high-Z” which is short for “high impedance” which is how engineers say “no connection”. Isn’t jargon wonderful?

MicroLab, VLSI-5 (3/34)

JMM v1.4
CMOS complements
What a nice VOH you have...
Thanks. It runs in the family...

pulldown (nfet block): conducts when VGS is high
pullup (pfet block): conducts when VGS is low

series nfets A, B: conducts when A is high and B is high: A.B
parallel pfets A, B: conducts when A is low or B is low: ¬A + ¬B = ¬(A.B)

parallel nfets A, B: conducts when A is high or B is high: A+B
series pfets A, B: conducts when A is low and B is low: ¬A.¬B = ¬(A+B)

MicroLab, VLSI-5 (4/34)

JMM v1.4
Development of CMOS gates /1

Example: CMOS NAND gate F = ¬(A*B)

Step 1: development of nfet block. Logic minimization of “0” in Karnaugh diagram

       A=0  A=1
B=0     1    1
B=1     1    0

F = ¬(A*B)

MicroLab, VLSI-5 (5/34)

JMM v1.4
Development of CMOS gates /2

Step 2: development of pfet block. Logic minimization of “1” in Karnaugh diagram

       A=0  A=1
B=0     1    1
B=1     1    0

F = ¬A + ¬B
[Figure: pfets A and B in parallel]

MicroLab, VLSI-5 (6/34)

JMM v1.4
Development of CMOS gates /3

Step 3: put nfet and pfet block together

       A=0  A=1
B=0     1    1
B=1     1    0

F = ¬(A*B)

MicroLab, VLSI-5 (7/34)

JMM v1.4
NAND & NOR
2-input NAND. When output is low, two nfets are in series. So to keep output fall time equivalent to that of an inverter, the nfets have to be twice as wide. Pfet widths can be same as those in the inverter (but remember there were already 2x nfet widths!). Can be extended to large fan-in but practical limit is 4 inputs. Why?

2-input NOR. When output is high, two pfets are in series. So to keep output rise time equivalent to that of an inverter, the pfets have to be twice as wide. Nfet widths can be same as those in the inverter. Can be extended to large fan-in but practical limit is 4 inputs. NOR gates are considered less good than NAND gates. Why?

Pseudo-NMOS NOR gates are used to build high fan-in NOR gates for PLA’s to save area (at some cost in static power).
[Figure: schematics of the NAND and NOR gates with inputs A, B and of the pseudo-NMOS NOR with inputs A1 … An]

MicroLab, VLSI-5 (8/34)

JMM v1.4
Layout of simple gates
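[Figure: inverter layout on a p-type substrate with an n-type well: VDD and GND rails, IN and OUT, pfet dimensions Wp, Lp and nfet dimensions Wn, Ln, a metal/pdiff contact (with detail removed) and a contact from metal to ndiff. Layer legend: metal2, metal, poly, n+ diff, p+ diff]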

MicroLab, VLSI-5 (9/34)

JMM v1.4
Layout Rules #1

Š layout rules are the common language between


design and process engineers
Š conservative rules absorb process disturbances and
variations
Š layout rules must be respected by the designer
Š layout rules reflect the limits of a process, they
describe:
  Š minimal distance, overlap
  Š minimal width (e.g. channel length, λ)
Š layout readability is improved using colours:
  metal: blue
  polysilicon: red
  n-diffusion: green
  p-diffusion: yellow
  n-well: brown
  contact, via: black

MicroLab, VLSI-5 (10/34)

JMM v1.4
Layout Rules #2

symbol and mask layout of a CMOS inverter
[Figure: inverter symbol and mask layout, labelled with the n-well contact (n-diff) and the bulk contact (p-diff)]

MicroLab, VLSI-5 (11/34)

JMM v1.4
Stick Diagram

Š stick diagrams are technology independent


Š no layout rules need to be known
Š mask layout may be generated automatically

MicroLab, VLSI-5 (12/34)

JMM v1.4
NAND & NOR (again)

MicroLab, VLSI-5 (13/34)

JMM v1.4
Large Fan-In CMOS Gates

CMOS gates with large fan-in suffer from:
Š body effect
Š unsymmetrical delay
Š large delay
⇒ never use more than 4 or 5 fets in series
⇒ increment logic depth
[Figure: gate-level example built from & and ≥1 gates]

MicroLab, VLSI-5 (14/34)

JMM v1.4
CMOS Gate Recipe

Step 1. Figure out pulldown network that does what you want, e.g., F = ¬(A*(B+C))
[Figure: nfet A in series with the parallel nfets B and C]

Step 2. Walk the hierarchy replacing nfets with pfets, series subnets with parallel subnets, and parallel subnets with series subnets
[Figure: pfet A in parallel with the series pfets B and C]

Step 3. Combine pfet pullup network from Step 2 with nfet pulldown network from Step 1 to form fully-complementary CMOS gate.
[Figure: complete gate with the pullup from Step 2 above the pulldown from Step 1]
But isn’t it hard to wire it all up?
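Step 2 of the recipe is mechanical enough to sketch in code. The representation below (nested tuples tagged "series" or "parallel") and the function names are made up for this illustration; the check at the end confirms that for every input combination exactly one of the two networks conducts.

```python
from itertools import product

# A network is an input name, or ("series" | "parallel", [sub-networks]).
def dual(net):
    """Swap series and parallel throughout the hierarchy (Step 2)."""
    if isinstance(net, str):
        return net
    kind, subnets = net
    flipped = "parallel" if kind == "series" else "series"
    return (flipped, [dual(s) for s in subnets])

def conducts(net, inputs, on_when_high):
    """nfets conduct on a high input (on_when_high=True), pfets on a low one."""
    if isinstance(net, str):
        return inputs[net] == on_when_high
    kind, subnets = net
    states = [conducts(s, inputs, on_when_high) for s in subnets]
    return all(states) if kind == "series" else any(states)

def is_complementary(pulldown, pullup, names):
    """True if exactly one network conducts for every input combination."""
    for values in product([False, True], repeat=len(names)):
        inputs = dict(zip(names, values))
        if conducts(pulldown, inputs, True) == conducts(pullup, inputs, False):
            return False
    return True

pulldown = ("series", ["A", ("parallel", ["B", "C"])])   # pulls low when A*(B+C)
pullup = dual(pulldown)
print(pullup, is_complementary(pulldown, pullup, ["A", "B", "C"]))
```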

MicroLab, VLSI-5 (15/34)

JMM v1.4
Complex CMOS Gates /1

Š classical CMOS logic gates are always inverting logic gates
Š complex CMOS logic gates are a mixture of AND and OR structures with a final inversion

Example: F = ¬(A * B + C * D)

Step 1: generation of nfet block (logic “0”)
¬F = A * B + C * D
[Figure: nfets A and B in series, in parallel with nfets C and D in series]

Step 2: generation of pfet block (logic “1”)
F = (¬A + ¬B) * (¬C + ¬D)
[Figure: pfets C and D in parallel, in series with pfets A and B in parallel]

MicroLab, VLSI-5 (16/34)

JMM v1.4
Complex CMOS Gates /2

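Step 3: put everything together. What about the layout?
[Figure: complete transistor schematic (pfets C, D, A, B above nfets A, C, B, D) next to the gate-level symbol, two & gates feeding a ≥1 gate, with the question: where is this signal in the transistor schema?]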

MicroLab, VLSI-5 (17/34)

JMM v1.4
Complex CMOS Gates Layout /1

Goal: compact layout. All complex gates may be designed using a single row of nfets and a single line of pfets, thus adjacent drain/source diffusions of fets are very close.
Euler rule:
Š generate an n-graph by replacing the nfet block with vertices for nodes and edges for fets
Š generate a dual p-graph
Š find a sequence containing all edges in the n-graph. This sequence is called Euler n-path.
Š generate an Euler p-path with the same labelling as the Euler n-path. If not possible start again.
Š the labelling sequence of the 2 Euler paths is the gate sequence of the single row nfet/pfet CMOS gate.
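For small gates the search for a common labelling can simply be brute-forced. The sketch below is not an efficient Euler-path algorithm: it tries every gate ordering and keeps the first one that is a connected walk in both graphs. The graphs describe the example gate F = ¬(A*B + C*D), with node names N1, N2, N3 as in the schematic on the next slide; the function names are illustrative.

```python
from itertools import permutations

def is_trail(order, edges):
    """True if the fets can be walked in this order, each touching the next."""
    reachable = set(edges[order[0]])       # after the first fet we stand on either end
    for label in order[1:]:
        u, v = edges[label]
        nxt = set()
        if u in reachable:
            nxt.add(v)
        if v in reachable:
            nxt.add(u)
        if not nxt:
            return False
        reachable = nxt
    return True

def common_euler_order(n_edges, p_edges):
    """First gate ordering (alphabetical search) valid in both graphs, or None."""
    for order in permutations(sorted(n_edges)):
        if is_trail(order, n_edges) and is_trail(order, p_edges):
            return order
    return None

n_graph = {"A": ("F", "N2"), "B": ("N2", "VSS"), "C": ("F", "N3"), "D": ("N3", "VSS")}
p_graph = {"A": ("N1", "F"), "B": ("N1", "F"), "C": ("VDD", "N1"), "D": ("VDD", "N1")}
order = common_euler_order(n_graph, p_graph)
print(order)
```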

MicroLab, VLSI-5 (18/34)

JMM v1.4
Complex CMOS Gates Layout /2
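[Figure: gate schematic with pfets C, D between VDD and N1 and pfets A, B between N1 and F; nfets A, B in series through N2 and nfets C, D in series through N3 between F and VSS; the corresponding n-graph and p-graph with their start points marked]
A -> B -> D -> C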
MicroLab, VLSI-5 (19/34)

JMM v1.4
Complex CMOS Gates /3
[Figure: gate schematic and the resulting single-row layout with gate order A, B, D, C]
A -> B -> D -> C
MicroLab, VLSI-5 (20/34)

JMM v1.4
Complex CMOS Gates /4

C
A
B

B C

MicroLab, VLSI-5 (21/34)

JMM v1.4
A Quiz! /1

MicroLab, VLSI-5 (22/34)

JMM v1.4
A Quiz! /2

Find the minimal transistor circuit (2 * 4 fets) and the most compact layout using Euler’s rule.

AB \ CD   00  01  11  10
00         1   1   1   1
01         0   0   0   0
11         0   0   0   0
10         1   0   0   0

MicroLab, VLSI-5 (23/34)

JMM v1.4
Quiz : Solution

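F = ¬A * ¬B + ¬B * ¬C * ¬D

F = ¬B * (¬A + ¬C * ¬D)   equation ready for p-block

[Figure: n-graph and p-graph of the solution (nodes VDD, VSS, F, N1, P1, P2; edges A, C, D shown) with their start points marked]

D -> C -> A -> B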


MicroLab, VLSI-5 (24/34)

JMM v1.4
Transmission Gates
S
CMOS nMOS

A B A B

S
S

If VA = VDD then current will flow from


A to B until VB = _____

If VA = 0 then current will flow from


B to A until VB = _____

Assuming S and -S are complementary signals, the CMOS transmission gate (TG) acts as a switch, controlled by S, that has no inherent voltage drop (unlike a switch constructed from a single nfet or pfet which exhibits a VT drop at one rail or the other).

MicroLab, VLSI-5 (25/34)

JMM v1.4
CMOS TG Electrical Model
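[Figure: TG between A and B with gate voltages VDD (top) and 0 (bottom): switch is off; with gate voltages 0 (top) and VDD (bottom): switch is “on”]

How on is “on”? Assume VA = VDD then

VB from 0V to |VT,p|:         nfet = sat, pfet = sat
VB from |VT,p| to VDD-VT,n:   nfet = sat, pfet = lin
VB from VDD-VT,n to VDD:      nfet = off, pfet = lin

[Figure: R versus VB (0V to VDD) showing Req,p, Req,n and Req,TG = Req,n || Req,p, with VDD-VT,n marked on the VB axis]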
MicroLab, VLSI-5 (26/34)

JMM v1.4
TG Circuits: MUX

A
Y=A*S+B*S

B
Is this node
always the “output”
S of this gate?

inverter
not drawn

MicroLab, VLSI-5 (27/34)

JMM v1.4
TG Circuits: 4 to 1 MUX

Š multiplexers can easily be done with TG
Š never forget that TG are bi-directional
Š compact layout by combining identical gates

B
F
C

S1
S2
MicroLab, VLSI-5 (28/34)

JMM v1.4
Best XOR in Town
A ≥1&
A =1 F B F
B ≥1

12 transistors
A

A*¬B + ¬A*B
B
Is this node
always the “output”
8 transistors of this gate?

A A*¬B + ¬A*B

B Is this node
always the “output”
of this gate?

6 transistors
MicroLab, VLSI-5 (29/34)

JMM v1.4
TG Quiz

Find the function of the following 4 transistor circuit:

MicroLab, VLSI-5 (30/34)

JMM v1.4
TG Circuits: Problems

Š difficult to get compact layout
Š outputs behave like bi-directional signals
Š many TG in series provoke large delays

Uin Uout

R R R R R
Uin Uout
C C C C C

τ = 2.2 ⋅ (RC )2

MicroLab, VLSI-5 (31/34)

JMM v1.4
Coming Up...
Next topic…
Dynamic (precharge/evaluate) logic circuits: CMOS domino logic, NP domino logic, CVSL logic. Charge sharing.

Readings for next time…


Weste:
‹ Sections 5.3 thru 5.3.4 and 5.4.6
‹ 5.3.9 thru 5.4.1

MicroLab, VLSI-5 (32/34)

JMM v1.4
Exercises: VLSI-5 #1

Ex vlsi5.1 (difficulty: easy): Design a CMOS gate that


implements the function
Out = (( A + B) ⋅C + D ⋅ E ) ⋅ F

Ex vlsi5.2 (difficulty: easy): What is the Boolean


equation of the following CMOS gate.
VDD

A
B
Z

GND

MicroLab, VLSI-5 (33/34)

JMM v1.4
Exercises: VLSI-5 #2

Weste pp371: 5.9ex7 (difficulty: easy): Design a pass


transistor network that implements the sum
function for an adder
S = ¬A⋅¬B⋅C + ¬A⋅B⋅¬C + A⋅¬B⋅¬C + A⋅B⋅C

MicroLab, VLSI-5 (34/34)

JMM v1.4
VLSI Design I
Dynamic Logic Gates

Overview
Dynamic logic gates, Domino, NORA, CVSL structure,

Goal: You are familiar with dynamic logic gates and their different families. You can handle the dynamic logic problems like charge sharing and timing.
MicroLab, VLSI-6 (1/28)

JMM v1.3
Tinkering with Logic Gates
Things to like about CMOS gates:
Š easy to translate logic to fets
Š rail-to-rail switching
Š good noise margins
Š no static power since fets are in cutoff
Š sizing not critical to correct operation

Things not to like about CMOS gates:
Š N inputs => 2N fets (i.e., one nfet and one pfet per input)
Š large circuit area, especially for pfets
Š “heavy” loading of inputs
Š pfets are either large or slow relative to nfets
Š series connections can get very slow

We can replace pfet pullup network with pseudo-NMOS load (pfet with grounded gate) but
Š dissipate static power when output is low
Š have to make load fet small to ensure that VOL is low enough to cut off nfets in next stage
  Š reduces static power consumption (good!)
  Š increases output rise time (bad!)

One alternative: dynamic CMOS gates

MicroLab, VLSI-6 (2/28)

JMM v1.3
Dynamic CMOS Gates
[Figure: dynamic gate with a “precharge” switch and an “evaluate” switch, both controlled by CLK, around an nfet pulldown network with inputs A, B]

Š inputs must be stable before CLK goes high because once output has been discharged it won’t go high again until next cycle
Š for same reason, noise/glitches on inputs cannot exceed nfet threshold, a much more stringent requirement than for static CMOS gates.

[Figure: timing diagram of clock and output over the precharge phase and the evaluate phase]

MicroLab, VLSI-6 (3/28)

JMM v1.3
There’s good news & bad news
The good news:
Dynamic gates are faster than static gates despite the extra “evaluate” fet in the pulldown path because of the reduction in self-loading and the elimination of the pullup short-circuit current during the first part of the output transition.
The bad news:
Dynamic gates cannot be cascaded.

Because of finite pulldown time for the first gate’s output node, the next gate’s dynamic node starts to discharge!
[Figure: two cascaded dynamic gates (precharge and evaluate fets, nfets blocks, CLK) with the two nodes marked]

Solution: develop techniques that avoid races
CMOS Domino logic
CMOS NORA (no race) logic
MicroLab, VLSI-6 (4/28)

JMM v1.3
CMOS Domino Logic
pree
preecharge: high
evaluate: falls (maybe)

nfets nfets
buffer might
be needed
in any case
CLK for high fan-
fan-out
circuits.
pree
preecharge:low
evaluate: rises (maybe)

When CLK is low, dynamic node is precharged high and buffer inverter
output is low. Nfets in the next logic block will be off.

When CLK goes high, dynamic node is conditionally discharged and the
buffer output will conditionally go high. Since discharge can only
happen once, buffer output can only make one low-to-high transition.

When domino gates are cascaded, as each gate “evaluates”, if its
output rises, it will trigger the evaluation of the next stage, and so
on… like a line of dominos falling. Like dominos, once the internal
node in a gate “falls”, it stays “fallen” until it is “picked up” by the
precharge phase of the next cycle.
Thus many gates may evaluate in one eval cycle.
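The domino behaviour described above can be sketched in a few lines of Python. This is only a behavioural model under simplifying assumptions: each stage is a single-input domino buffer (one nfet in the pulldown), delays are ignored, and the function name `evaluate_domino_chain` is hypothetical.

```python
def evaluate_domino_chain(first_input, num_stages):
    """Model one clock cycle of a chain of single-input domino buffers."""
    # Precharge phase (CLK low): every dynamic node is high, every output low.
    dynamic = [1] * num_stages
    outputs = [0] * num_stages
    history = [list(outputs)]
    # Evaluate phase (CLK high): stages fire one after the other.
    for stage in range(num_stages):
        stage_input = first_input if stage == 0 else outputs[stage - 1]
        if stage_input == 1:
            dynamic[stage] = 0  # the dynamic node can discharge only once
        outputs[stage] = 1 - dynamic[stage]  # output inverter
        history.append(list(outputs))
    return outputs, history


outputs, history = evaluate_domino_chain(1, 3)
for step in history:
    print(step)
```

Each output makes at most one low-to-high transition per cycle, which is why a whole chain can settle within a single evaluate phase.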

MicroLab, VLSI-6 (5/28)

JMM v1.3
More Domino-style Circuits
weak pfet “keeper” keeps dynamic node pulled high during
evaluate phase if it’s not being pulled down through nfets ⇒
gate is static in both clock phases.

CLK

nfets
“latching” pfet acts like keeper above unless dynamic node
gets pulled down during evaluate phase. When buffer output
goes high it switches keeper off saving static power. Good
for leakage current problems...

Note that you can put an even number of static gates after
the inverter and before the next domino gate.

Be careful of capacitive coupling to the dynamic
node (see later slide)!

Use NOR gate instead of inverter as the
buffer to make a faster high fan-in AND
gate. Same trick works for high fan-in OR
or MUX functions.

MicroLab, VLSI-6 (6/28)

JMM v1.3
Optimising Domino Logic (I)

nfets nfets

CLK evaluate nfet


not needed?
precharge: low

Since domino gate outputs are low during the precharge
phase, gates which have only domino output nodes as
inputs don’t need the “evaluate” nfet since all the nfets
in the pulldown will be off anyway.

But remember: if evaluate nfet is removed, precharge
will “ripple” through cascaded gates just like evaluates
do. Maybe only remove for gates where nfet stack is
tall (i.e. resistive) enough that pullup will start to
“win” anyway before ripple reaches gates and turns off
pulldowns.

MicroLab, VLSI-6 (7/28)

JMM v1.3
Optimising Domino Logic (II)
In domino logic circuits we want evaluate
to happen as quickly as possible. We can
size fets to optimise evaluate speed.

small large

large small
nfets

Some designers also “grade” the sizes of the nfets,
smallest at the top (increase in R offset by decrease
in C)

If we make the nfet in the output inverter
much smaller than the pfet then
Š the load on the internal node decreases, and
Š the switching threshold of the inverter increases

Both effects make the gate evaluate sooner. If large >>
small, the gate delay can be cut almost in half! However,
the other edge is very slow, so ripple precharge is a
problem.

MicroLab, VLSI-6 (8/28)

JMM v1.3
“All that glitters is not gold”

There are a few “little” difficulties:


Š “charge sharing” between nodes in the pulldown
network and the dynamic node can unintentionally
reduce the voltage of the dynamic node enough to
switch output buffer
Š the addition of the output inverter makes domino
gates non-inverting. One can often design around
this limitation, but some circuits cannot be
implemented solely using domino logic unless both
polarities (true and complement) of the inputs are
available. If both polarities of inputs are available
then we can generate both polarities of internal
signals with two domino gates so subsequent
stages will have both polarities of their inputs
available too.

MicroLab, VLSI-6 (9/28)

JMM v1.3
Charge Sharing (I)

[Figure: pulldown stack with inputs F = 0->1, E = D = C = B = 1, A = 1->0 and CLK at the bottom; dynamic node capacitance 3C; stack node capacitances C, 1.5C, 1.5C, C, C and C]

Suppose the dynamic node has been discharged during the previous
evaluate cycle. Then during precharge, all the intermediate nodes
in the pulldown chain will remain discharged while the dynamic node is
precharged. Calculate the voltage on the dynamic node when CLK goes
high. When CLK goes high, the voltage on the dynamic node goes to

V = 3C / (3C + 6C) · VDD = 1.1 V for VDD = 3.3 V

which is low enough to switch the output inverter.

Fortunately this situation is easily detected by CAD tools and can be
resolved by (1) adding additional precharge devices to intermediate
nodes or (2) increasing size of output buffer which will increase
capacitance of dynamic node (faster output buffer may compensate
for larger internal capacitance).
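The charge-sharing voltage follows from charge conservation, which the following Python sketch reproduces. The function name `shared_voltage` is hypothetical, and the split of the 6C of internal capacitance into five nodes (C, 1.5C, 1.5C, C, C) is an assumption read off the figure labels; only the 3C and 6C totals are stated in the formula on the slide.

```python
def shared_voltage(capacitances, voltages):
    """Final voltage after connecting capacitors together (charge conservation)."""
    total_charge = sum(c * v for c, v in zip(capacitances, voltages))
    return total_charge / sum(capacitances)


VDD = 3.3
# Capacitances in units of C: dynamic node first, then the internal stack nodes.
caps = [3.0, 1.0, 1.5, 1.5, 1.0, 1.0]

# Internal nodes discharged, dynamic node precharged to VDD.
v_bad = shared_voltage(caps, [VDD, 0.0, 0.0, 0.0, 0.0, 0.0])

# Remedy (1): internal nodes are precharged as well, so nothing is lost.
v_fixed = shared_voltage(caps, [VDD] * len(caps))

print(round(v_bad, 3), round(v_fixed, 3))
```

Precharging only some of the internal nodes gives an intermediate result, which is one way to trade extra precharge devices against noise margin.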

MicroLab, VLSI-6 (10/28)

JMM v1.3
Charge Sharing (II)

n-logic

n-logic n-logic

n-logic

CLK

additional precharge devices to


eliminate charge sharing problems

MicroLab, VLSI-6 (11/28)

JMM v1.3
Capacitive Coupling

OUT

CLK

OUT
t

Coupling can also occur between other signal wires and long dynamic
nodes (e.g., ones that span multiple bits in a datapath). Solutions:
on long routes add “twists” to avoid continuous routes or route
dynamic signals between mutually exclusive or complementary
signals.

MicroLab, VLSI-6 (12/28)

JMM v1.3
Domino Logic Design
To convert to Domino-style design we need to
create a schematic that uses non-inverting gates:
(1) look for CMOS gates followed by inverter
(2) use De Morgan’s Law to create non-inv gates

use Demorgan’s law


A
B X

C
D
E Y
F
G
H
Convert to Domino OR gate

Domino AND
A
B X

D
E Y
F
G
H Domino AND-OR
Domino OR

MicroLab, VLSI-6 (13/28)

JMM v1.3
Domino Logic Design (II)
X Y

8/2 8/2 8/2

A E G H

B D F

C nfet W/L = 4
pfet W/L = 8

CLK

s = static
d = domino (W/L = 4)
dd = domino (W/L = 8)

MicroLab, VLSI-6 (14/28)

JMM v1.3
Dual-rail Domino Logic
Domino circuits that generate both polarities of output

CLK CLK

A A
A B A B
B B

CLK CLK

CLK

A A
A

B B

CLK

MicroLab, VLSI-6 (15/28)

JMM v1.3
Multiple-output Domino
Why stop at complementary outputs? There are interesting
multiple-output functions where there is a lot of sharing of nfets in
the evaluate logic. For example, in a carry-lookahead adder

C1 = G1 + P1C0
C2 = G2 + P2G1 + P2P1C0
C3 = G3 + P3G2 + P3P2G1 + P3P2P1C0
C4 = G4 + P4G3 + P4P3G2 + P4P3P2G1 + P4P3P2P1C0
where Gi = AiBi and Pi = Ai + Bi

CLK
C4
P4 G4
C3
P3 G3
C2
P2 G2
C1
P1 G1

C0

Domino version of the


Manchester carry chain
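As a sanity check on the carry-lookahead equations, the sketch below evaluates them literally and compares the result with a plain ripple carry for every 4-bit input combination. It models only the logic, not the domino circuit; the function names are hypothetical.

```python
from itertools import product


def lookahead_carries(a, b, c0):
    """a, b: lists of 4 bits, index 0 holds bit 1 (the least significant bit)."""
    g1, g2, g3, g4 = (x & y for x, y in zip(a, b))  # generate: Gi = Ai Bi
    p1, p2, p3, p4 = (x | y for x, y in zip(a, b))  # propagate: Pi = Ai + Bi
    c1 = g1 | (p1 & c0)
    c2 = g2 | (p2 & g1) | (p2 & p1 & c0)
    c3 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & c0)
    c4 = (g4 | (p4 & g3) | (p4 & p3 & g2) | (p4 & p3 & p2 & g1)
          | (p4 & p3 & p2 & p1 & c0))
    return [c1, c2, c3, c4]


def ripple_carries(a, b, c0):
    """Reference: carry out of each bit position computed one bit at a time."""
    carries, carry = [], c0
    for x, y in zip(a, b):
        carry = (x & y) | (x & carry) | (y & carry)
        carries.append(carry)
    return carries


all_match = all(
    lookahead_carries(list(bits[:4]), list(bits[4:8]), bits[8])
    == ripple_carries(list(bits[:4]), list(bits[4:8]), bits[8])
    for bits in product((0, 1), repeat=9)
)
print(all_match)
```

The product terms of C1 to C3 reappear inside C4, which is the sharing that a multiple-output domino gate exploits.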

MicroLab, VLSI-6 (16/28)

JMM v1.3
Dual-rail “Keeper” Circuit
CLK

A
A B
B

CLK

The cross-coupled pfets serve as “keepers”
for the output which is high making the gate
static rather than dynamic! During precharge
both keepers are off; during the evaluate
phase, the output that goes low switches
on the keeper for the output that is staying
high. Really solves capacitive coupling
problems with dynamic logic in datapaths.

MicroLab, VLSI-6 (17/28)

JMM v1.3
Cascade voltage switch logic (CVSL)

Q Q clock Q
Q

nmos nmos
combinatorial combinatorial
network network

clock
The static version might be quite slow due to the
nfet/pfet “fight” during switching
dynamic CVSL
Q
Q
d e
d a
b a
e b c
c

MicroLab, VLSI-6 (18/28)

JMM v1.3
CMOS NORA Logic (NP Domino)
p blocks n blocks n blocks p blocks

pre eval pre

nfets pfets nfets

CLK eval CLK pre CLK eval

If we turn a dynamic gate “upside down” and use pfets to build the
logic block, we get a logic gate that “precharges” low and
“discharges” high. By using these gates in an alternating sequence
with regular nfet dynamic gates we can eliminate the race problem
we had with nfet-only dynamic gate sequences and hence we don’t
need the buffer inverter present in domino gates.

Removing the buffer is a mixed blessing since we may need it for
drive reasons and to keep compatibility with other domino gates. It
also makes NORA logic very susceptible to noise since during the
evaluate phase all information is stored dynamically.

MicroLab, VLSI-6 (19/28)

JMM v1.3
Domino Life Cycle
Actively precharging

Waiting for precharge Waiting for data


(holding output value) (holding precharge)

Actively evaluating

The “9 O’clock” state is very interesting: once a Domino gate has
finished evaluating, the gate’s immediate predecessors can start to
precharge (forcing the gate’s inputs low) without affecting the value of
the gate’s output. The gate is acting as a latch so long as its
predecessors don’t start another evaluate cycle.
Perhaps we can build a pipeline of domino stages where each stage
(might be several gates) serves as both logic and latch depending on
where it is in its cycle.
Need to have each stage supply its own precharge/evaluate timing
dependent on what its neighbours are doing...

MicroLab, VLSI-6 (20/28)

JMM v1.3
Self-timed Pipelines
0 = precharged
1 = evaluation done

P/E done?   P/E done?   P/E done?

F1 F2 F3

Simplest correctness rules:
Š a stage only precharges when both
(a) its successor has finished evaluating (Sdone = 1)
(it’s done with our values)
(b) its predecessor has finished precharging (Pdone = 0)
(old values are gone so we can’t use ‘em twice!)
Š a stage only evaluates when both
(a) its successor has finished precharging (Sdone = 0)
(our new output won’t affect its stored value)
(b) its predecessor has finished evaluating (Pdone = 1)
(there are new inputs for us to consider)

So, what logic goes in the clouds?


And how do we build the “done?” boxes?
MicroLab, VLSI-6 (21/28)

JMM v1.3
Muller C-Element
Add weak feedback
inverter if we’re worried
about dynamic storage
for precharge/eval signal

P/E

Pdone

Sdone

The Muller C-Element is the “AND” gate for self-timed
logic because it changes its output only after both inputs
have changed. As shown above, it’s an elegant
implementation for both sets of rules on the previous
slide.
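A behavioural sketch of the C-element, and of how it can realise the precharge/evaluate rules, is given below. The class `MullerC` and the helper `precharge_evaluate` are hypothetical names, and the polarity P/E = 1 for "evaluate" and 0 for "precharge" is an assumption; the done signals use the 0 = precharged, 1 = evaluation done convention from the pipeline slide.

```python
class MullerC:
    """Two-input Muller C-element: output changes only when both inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a  # both inputs agree: follow them
        return self.out   # otherwise hold the stored value


def precharge_evaluate(c_element, pdone, sdone):
    """P/E control for one stage: C-element of Pdone and inverted Sdone."""
    return c_element.update(pdone, 1 - sdone)


c = MullerC(initial=0)
print(precharge_evaluate(c, pdone=1, sdone=0))  # predecessor done, successor precharged
print(precharge_evaluate(c, pdone=1, sdone=1))  # successor now done: hold
print(precharge_evaluate(c, pdone=0, sdone=1))  # predecessor precharged as well
print(precharge_evaluate(c, pdone=0, sdone=0))  # successor precharged: hold
```

With this wiring the output goes to 1 only when Pdone = 1 and Sdone = 0, and returns to 0 only when Pdone = 0 and Sdone = 1, which matches both sets of rules.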

MicroLab, VLSI-6 (22/28)

JMM v1.3
Completion Detectors
Self-timed logic
use dual-rail signalling (i.e., two wires) to encode
reset (not yet evaluated) 00
ready with value 0 01
ready with value 1 10
and then build handshake logic that starts
next stage when current stage is done and next
stage has completed its previous computation
and delivered its values...
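A completion detector for this encoding can be sketched as follows. The assumption, not stated on the slide, is that the first wire of each pair is the "true" rail and the second the "false" rail, so that 10 means value 1 and 01 means value 0; the function names are hypothetical.

```python
def stage_done(rails):
    """rails: list of (true_rail, false_rail) pairs, one pair per bit.

    A bit is ready when exactly one of its two wires is high; the stage is
    done when every bit is ready (OR per bit, AND across bits).
    """
    return all((t | f) == 1 for t, f in rails)


def decode(rails):
    """Return the bit values of a completed stage."""
    if not stage_done(rails):
        raise ValueError("stage has not finished evaluating")
    return [t for t, f in rails]


print(stage_done([(0, 0), (1, 0)]))  # one bit still in reset
print(stage_done([(0, 1), (1, 0)]))  # both bits ready
print(decode([(0, 1), (1, 0)]))
```

Because reset is 00 on every pair, the same detector also signals that precharge has completed once its output has fallen for all bits.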

MicroLab, VLSI-6 (23/28)

JMM v1.3
Self-timed Pipeline Latency
1 = precharged
0 = evaluation done

C C C

P/E done?   P/E done?   P/E done?

F1 F2 F3

Propagation through self-timed pipelines
is constrained in both directions:
In the forward direction by how long it takes for the evaluate
edge in one stage to trigger the evaluate edge in the next stage:
LF = tF↑ + tD↓ + tC↑
In the reverse direction by how long it takes
for the precharge in one stage to trigger a new
evaluate in the stage after first evaluating the previous stage
(remember not to double count!):
LR = 0.5*(tC↓ + tF↓ + tD↑ + tC↑ + tF↑ + tD↓)

MicroLab, VLSI-6 (24/28)

JMM v1.3
Further Improvements
We don’t have to delay evaluation until successor has finished
its precharge (signalling that it’s finished with our values). We
can just check that successor has started precharging… Even
with this improvement, the correct sequencing will still happen
for any combination of precharge and evaluate times for all the
gates.
We can modify the control element like so:

S P/E

Eliminate the “extra” inverter for good measure
and use dynamic storage as control element memory
P/E
Pdone

Sdone

We’re going to stop here, but there are other
improvements that can be made. Hint: do we have
to wait until the predecessor is done computing
new values before starting our eval? etc., etc., etc.
MicroLab, VLSI-6 (25/28)

JMM v1.3
Dynamic Logic Summary
Advantages of dynamic logic:
Š smaller area than fully static gates
Š smaller parasitic capacitances hence higher
speed
Š reliable operation if correctly designed. Concerns:
capacitive coupling to dynamic nodes
charge sharing with dynamic nodes
subthreshold leakage currents in eval logic
minority carrier injection and latchup
alpha particle immunity
vdd/gnd noise and resistance

This makes dynamic logic a good choice for those parts of


a circuit where the extra engineering investment is
justified, e.g., along the critical timing paths.

Engineers who like this sort


of design will find this the sort
of design they like!

MicroLab, VLSI-6 (26/28)

JMM v1.3
Coming Up...
Next topic…
CMOS sequential logic.

Readings for next time ...


Weste:
‹ 5.4.4 (dynamic CMOS logic)
‹ 5.4.7 - 5.4.11 (CMOS domino logic, CVSL), except
5.4.10

MicroLab, VLSI-6 (27/28)

JMM v1.3
Exercises: VLSI-6

Weste pp371: 5.9ex8 (difficulty: easy): Design a
CVSL gate for the following function:
S = A ⋅B ⋅ C + A ⋅B ⋅C + A ⋅B ⋅ C + A ⋅B ⋅C

MicroLab, VLSI-6 (28/28)

JMM v1.3
VLSI Design I
Clocking Strategies

“I take care of it” ?

Generator
Clock

Today’s handouts:
(1) Lecture Slides

MicroLab, VLSI-7 (1/8)

JMM/ESA v1.0
VLSI Systems Design
Microelectronic Technologies

Overview
‹microelectronic technologies, ASIC, FPGA, µC

Goal: You are familiar with the microelectronic


technologies, and know their advantages and features.

MicroLab, VLSI-8 (1/20)

JMM v1.4
Microelectronic Technologies

Š What is microelectronic ?
Š Has a microelectronic design engineer only to have
good knowledge about silicon, layout, etc. ?

application specific
integrated circuit
macro cell full custom
standard cell
gate array microprocessors
PIC, COP
FPGA RISC
uController
PAL CPLD signal processor

field programmable logic

MicroLab, VLSI-8 (2/20)

JMM v1.4
Gate Array Technology #1
Š prefabricated wafers
Š I/O stages predefined
Š regular array of fets and interconnection channels
Š interconnection defines functionality
Š features
Š size: 100 - 1M gates
Š short turn around time
Š cheap at medium quantities
Š unsuitable for regular structures like RAM, PLA, ALU

MicroLab, VLSI-8 (3/20)

JMM v1.4
Gate Array Technology #2

Š 3 cells of a gate array are illustrated


Š 1 cell corresponds to a 2 input nand gate

MicroLab, VLSI-8 (4/20)

JMM v1.4
Sea-of-Gate Technology
Š prefabricated wafers
Š I/O stages predefined
Š regular array of fets, no reserved interconnection
channels
Š interconnection defines functionality
Š features
Š size: 100 - 1M gates
Š short turn around time
Š cheap at medium quantities
Š regular structures like RAM, PLA, ALU can be used

MicroLab, VLSI-8 (5/20)

JMM v1.4
SOG Example
nwell contacts
INV NOR2

GND

3 nfets
2 small, 1 large mosfets with common gate
horizontal wiring tracks in metal-1
3 pfets

gate isolation VDD


mosfets
unused horizontal
and vertical tracks
used for wiring
gates together.
Better granularity
if main routing
channels run
vertically.

GND
substrate
contacts
vertical wiring tracks
in metal-1 or metal-2
MicroLab, VLSI-8 (6/20)

JMM v1.4
Standard Cell Technology
Š complete fabrication process
Š predefined library of base functions
Š modular similar to TTL families
Š features
Š chip size limits complexity
Š long turn around time
Š cheap at high quantities
Š standardized cell height
Š unsuitable for regular structures
Š more flexible and compact (1:4) than gate array

MicroLab, VLSI-8 (7/20)

JMM v1.4
Standard Cell Example
Create a library of pre-laid-out cells, e.g., boolean gates,
registers, muxes, adders, I/O pads, … A data sheet for
each cell describes the cell’s function, area, power,
propagation delay, output rise/fall time as function of
load, etc.

Quiz: what’s the cell’s function?

It’s just like designing with board-level components.
CAD tools help with placing the cells to minimize area
and to meet timing constraints (perhaps directed by a
floorplan created by the user); routers make the
appropriate connections between the cells.
MicroLab, VLSI-8 (8/20)

JMM v1.4
Full Custom Technology

Š complete fabrication process


Š total flexibility, only limited by layout rules
Š manual design
Š features
Š chip size limits complexity
Š long design and fabrication time
Š efficient use of silicon area
Š cheap only at highest quantities (e.g. uP, memories, ...)

MicroLab, VLSI-8 (9/20)

JMM v1.4
Macrocell Technology #1

Š complete fabrication process


Š combines semi- and full custom technologies
Š predefined library of base functions
Š generators for regular structures
Š features
Š chip size limits complexity
Š short design, long fabrication time
Š cheap at high quantities
Š high flexibility,
compact layouts
PLA
macro cell
RAM

MicroLab, VLSI-8 (10/20)

JMM v1.4
Macrocell Technology #2
2-dim array of standard cell block
full custom block

full custom block

MicroLab, VLSI-8 (11/20)

JMM v1.4
FPGA Technology #1
Š field programmable device
Š no fabrication needed for customizing
Š predefined logic blocks
Š unsuitable for regular structures
Š features
Š size: up to 2'000'000 logic gates (see Virtex from Xilinx)
Š large silicon area necessary (72 million fets, 10x Pentium2)
Š short design and customize time
Š cheap for small quantities
Š compared to ASICs, FPGAs have a reduced clock speed
Š circuit configuration downloadable (RAM or PROM)

MicroLab, VLSI-8 (12/20)

JMM v1.4
FPGA Technology #2
configurable
logic block (CLB)
I/O buffers
switching
matrix
I/O buffers

I/O buffers

routing
channels
I/O buffers

Š configuration
- mask programmable
- one time programmable
- downloading of configuration from host into internal RAM
- downloading of configuration from on board serial ROM

MicroLab, VLSI-8 (13/20)

JMM v1.4
CLB from Xilinx series XC5200

C1...C4

H1 Din/H2 SR/H0 EC

G4
Din Bypass
G3 Logic F’ S/R
Function Control SD YQ
G’
of
G2 H’ D Q
G1...G4

G1

Logic
Function
G’ EC
of
F’,G’ H’ H’ RD
1
and H1 Y
F4
Din
Bypass
F3 Logic F’ S/R
Function G’ Control SD XQ
F2 of H’ D Q
G1...G4

F1

K (Clock)
EC
FPGA Technology #3

RD
1
H’ X

MicroLab, VLSI-8 (14/20)


F’
FPGA Technology #4

Switching matrix with CLBs

CLB CLB CLB

PSM PSM

CLB CLB CLB

PSM PSM

CLB CLB CLB

MicroLab, VLSI-8 (15/20)

JMM v1.4
uC Technology

Š field programmable device


Š no fabrication needed for customizing
Š simple C software compilers
Š software vs. hardware solutions
Š features
Š 4 or 8 bit CPU, size: 512 bytes or more
Š down to 8 pins
Š AD, usart, timer, etc. included
Š very slow compared to hardware solutions
Š cheap (<$2)

PIC

36 mm
MicroLab, VLSI-8 (16/20)

JMM v1.4
How to select a technology

Selection arguments
- cost
- speed
- size
- time to market

cost

units ASIC
FPGA
NRE
units
design
design

break even quantity

MicroLab, VLSI-8 (17/20)

JMM v1.4
Coming Up...
Next topic…
Hardware description language VHDL, top-down
design.

Readings for next time…


Xilinx article: The total cost of ownership

MicroLab, VLSI-8 (18/20)

JMM v1.4
Exercises: VLSI-8 #1

Ex vlsi08.1 (difficulty: easy): Calculate the breakeven
point between an FPGA and ASIC design. Assume a
design time of 6 months and an additional back-end
design time of 1 month for the ASIC. The NRE
costs of the ASIC are 75 kEuro, the cost per unit
is 150 Euro for the FPGA and 3 Euro for the ASIC.
The cost of 1 engineer per month is 10 kEuro.
Result: breakeven at 578 units
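The stated result can be reproduced with a short calculation. The sketch assumes one engineer works for the whole design time and that total cost is simply fixed cost plus unit cost times quantity; the function and variable names are hypothetical.

```python
def fixed_cost(design_months, engineer_cost_per_month, nre):
    """Cost that does not depend on the number of units built."""
    return design_months * engineer_cost_per_month + nre


def breakeven_units(fixed_a, unit_a, fixed_b, unit_b):
    """Quantity at which option A (low fixed, high unit cost) equals option B."""
    return (fixed_b - fixed_a) / (unit_a - unit_b)


ENGINEER = 10_000  # Euro per month
fpga_fixed = fixed_cost(6, ENGINEER, nre=0)           # 60 kEuro
asic_fixed = fixed_cost(6 + 1, ENGINEER, nre=75_000)  # 145 kEuro

units = breakeven_units(fpga_fixed, 150, asic_fixed, 3)
print(int(units))  # number of whole units at the breakeven point
```

Below that quantity the FPGA is cheaper overall; above it the ASIC's lower unit cost outweighs its NRE and extra design month.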

MicroLab, VLSI-8 (19/20)

JMM v1.4
Exercises: VLSI-8 #2

Ex vlsi08.2 (difficulty: medium): Calculate the
breakeven point between an FPGA and ASIC design.
Assume the design costs from exercise vlsi08.1
and a fabrication time of 3 months for the ASIC.
The revenue per sold system at a product lifetime
of 4 years is 600 Euro without taking into account
the FPGA/ASIC chip costs. Use the triangular
time-to-market model from Synopsys (see Xilinx
article “The total cost of ownership”).
Result: breakeven at 14068 FPGA solutions
maximum available
units/time revenue

time
delayed market d
introduction L
product life

MicroLab, VLSI-8 (20/20)

JMM v1.4
VLSI Design I
Regular Logic Structures

Today’s handouts:
(1) Lecture Slides

MicroLab, VLSI-9 (1/12)

JMM v1.2
Goals for Regular Logic Structures

Look for a systematic physical structure:


- get handle on layout for “random” logic
- automate layout task once schematic is done
- may have several structures to choose from, each optimized
along a different design dimension
standard cells, gate arrays

But we still have to draw the schematic! So look for


systematic logical structures:
- may lead to additional systematic physical layouts
- find canonical logic representations that can be automatically
turned into compact physical structures (automate, automate,
automate…)
- would like to be able to make changes in the logic without
having to redo entire layout -- look for “ECO-tolerant”
structures (engineering change orders)
muxes, ROMs, PLAs

MicroLab, VLSI-9 (2/12)

JMM v1.2
Useful Logic Forms
Truth tables
- direct implementation as muxes, ROMs
- good when you have many outputs and few inputs since cost of
“decoding” inputs is fixed
- ECO-tolerant but often not efficient use of logic
Minimum Sum-of-Products (SOP, AND-OR)
- minimize no. of literals (small fan-in ANDs) or no. of products
(small fan-in ORs)
- maximum sharing of product terms for multiple-output functions
- if fan-ins are small: direct implementation as complex gates or
as 2-levels of ANDs then ORs
- if fan-ins aren’t small: multiple levels of gates (e.g., parity,
“Achilles heel” = 2^(n-1) minterms)
- efficient use of logic, but not very ECO-tolerant
But how do we minimize the number of literals or
minterms? Yeah, we know about Karnaugh maps, but
they aren’t so good for more than 4 inputs or for
maximizing minterm sharing.

MicroLab, VLSI-9 (3/12)

JMM v1.2
Logic Manipulation
Start with two-level minimization
- by inspection searching for terms that are logically
adjacent (x' denotes the complement of x):
p⋅x + p⋅x' = p⋅(x + x') = p⋅1 = p
- Karnaugh maps for simple situations
- Quine-McCluskey otherwise
Then try to generate multiple levels:
- factoring. Choose literal that appears in most product
terms (>1) and factor it out.
F = a⋅c + a⋅d + b⋅c + b⋅d + a⋅e
= a⋅(c + d) + b⋅(c + d) + a⋅e
- factor again with or-terms that appear in multiple places
F = (a + b)⋅(c + d) + a⋅e
- find common subexpressions (multiple output
decomposition)
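A factoring step is only useful if it preserves the function, which is easy to confirm by exhaustive comparison. The sketch below checks the two-level and the factored form of F over all 32 input combinations; the function names are hypothetical.

```python
from itertools import product


def f_two_level(a, b, c, d, e):
    """F = a.c + a.d + b.c + b.d + a.e (10 literals)."""
    return (a & c) | (a & d) | (b & c) | (b & d) | (a & e)


def f_factored(a, b, c, d, e):
    """F = (a + b).(c + d) + a.e (6 literals)."""
    return ((a | b) & (c | d)) | (a & e)


equivalent = all(
    f_two_level(*bits) == f_factored(*bits)
    for bits in product((0, 1), repeat=5)
)
print(equivalent)
```

The literal count drops from 10 to 6, at the price of an extra logic level.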

MicroLab, VLSI-9 (4/12)

JMM v1.2
Muxes as “lookup tables”
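A B C | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
[Figure: the table implemented either as an 8-input mux selected by A,B,C, or as a 4-input mux selected by A,B with data inputs 0, C, C, 1]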

Easy to implement but not necessarily


compact even when implemented with TGs.
But you can make a nice Boolean Unit:

[Figure: 4-input mux with data inputs OP0 to OP3, select inputs A,B and output F; transistor-level detail with Vcc, gnd, A, B and out]

OP<3:0>   F
0 0 0 0   ZERO
1 0 0 0   AND
1 1 1 0   OR
0 1 1 0   XOR
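The Boolean unit is a 4-entry lookup table: A and B select which OP bit appears at F. A minimal Python sketch follows; the function name is hypothetical, and treating A as the more significant select bit is an assumption (it does not matter for the symmetric functions listed).

```python
def boolean_unit(op, a, b):
    """Return bit OP[2*a + b] of the 4-bit opcode op (OP3 is the MSB)."""
    return (op >> (2 * a + b)) & 1


OPCODES = {"ZERO": 0b0000, "AND": 0b1000, "OR": 0b1110, "XOR": 0b0110}

for name, op in OPCODES.items():
    row = [boolean_unit(op, a, b) for a in (0, 1) for b in (0, 1)]
    print(name, row)
```

Any of the 16 two-input Boolean functions is obtained by choosing the corresponding opcode, since the opcode is simply the truth table of the function.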
MicroLab, VLSI-9 (5/12)

JMM v1.2
Read-only Memories
[Figure: ROM with address decoder rows 7 to 0, inputs A B C and outputs F1 F0; a mark is drawn where a connection or mosfet is present, blank otherwise]
Address decoder implemented as AND (= NOR).
Note: all but one row pulled down for given input.
For each Fi, OR together all rows for which output
is 1 (actually use NOR then invert).

Like muxes, but share decoding logic among


all outputs. Potential optimizations:
- delete rows with no output pulldowns
- look for “adjacent” rows with identical
output pulldown configurations and
merge into single row.
Are these worth doing?
MicroLab, VLSI-9 (6/12)

JMM v1.2
PLAs
In fact, the optimizations from the previous
slide are so worthwhile that we have a
name for the resulting “optimized” ROM:
Programmed Logic Array, or PLA for short.
[Figure: PLA with “AND” plane and “OR” plane; merged rows 4,5,6,7 / 2,3 / 1; inputs A B C, outputs F1 F0]
Hint: for greater ECO-tolerance, add a few extra empty rows!
What are the logic equations for F1 and F0?

PLAs are usually constructed directly from minimized


SOP logic equations: the rows represent the minterms of
the equations, the “input” columns form the minterms
and the “output” columns form the sums.
Note that with multiple output columns,
minterm sharing between the outputs happens naturally...

MicroLab, VLSI-9 (7/12)

JMM v1.2
PLA Folding
PLAs can be sparse, i.e., only a few of the
possible connections in either plane may
be made. (AND plane can only have 50%!)
A A B B C C D D F1 F2
1
2
3
4
5
6

If we allow input and outputs to come from


both above and below then we may be able
to fold two columns into one if the rows
they use don’t overlap. This may require
rearranging the rows to minimize overlap
and hence maximize folding possibilities.
A A B B F1
Row folding 6
is another 1
possible 2
optimization 3
(but not in this 4
example). 5
D D C C F2

MicroLab, VLSI-9 (8/12)

JMM v1.2
Multiple-input encoding
On the previous slide, it was noted that
the AND plane can have at most 50% of
its connections programmed. Why?

To improve the utilization of the input


columns, consider encoding the 4 columns
used to transmit the two input literals and
their complements with some more useful
functions of the two literals. For example:

AB AB AB AB
A A B B

AIN BIN
AIN BIN
You get extra computing oomph: for example, it’s now
possible to compute (A xor B) using
a single row rather than the two rows it took
with the old encoding.
MicroLab, VLSI-9 (9/12)

JMM v1.2
Datapath Operators
Most digital functions can be divided into the
following categories:
- datapath operators
- memory elements
- control structures
- I/O cells
Datapath operators form an important subclass of
VLSI design that benefit from the structured design
principles of hierarchy, regularity, modularity and
locality.
- N-bit data is generally processed by the use of N
identical subcircuits.
- Data operations may be sequenced in time or space.

MicroLab, VLSI-9 (10/12)

JMM v1.2
Datapath Operator Example
Magnitude operator example:
- data may be arranged to flow in one direction
- control signals are introduced in an orthogonal direction
to the dataflow
less than or equal
B m

m Z
- =0
m
A

If (A<=B) then Z=A else Z=B

ctrl
Am
Bm - =0 if Zm
Am-1
B m-1 - =0 if Zm-1
m bits

A1
B1 - =0 if Z1
A0
B0 - =0 if Z0
subtractor equal-zero mux
metal1 control flow

metal2 data flow
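The bit-sliced structure of this operator (subtractor, equal-zero detect, mux) can be mimicked in Python, one loop iteration per bit slice. This is a behavioural sketch only: the function name is hypothetical, and using the borrow out of the top slice together with the zero flag to decide A<=B is an assumption about how the control signal is formed.

```python
def magnitude_select(a, b, m):
    """Return Z = A if A <= B else B for m-bit unsigned operands."""
    borrow = 0
    diff_bits = []
    for i in range(m):  # one iteration per bit slice, LSB first
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        diff_bits.append(ai ^ bi ^ borrow)  # subtractor slice: A - B
        borrow = ((1 - ai) & bi) | ((1 - ai) & borrow) | (bi & borrow)
    is_zero = not any(diff_bits)            # "=0" detector across all slices
    a_le_b = bool(borrow) or is_zero        # control for the output muxes
    return a if a_le_b else b


print(magnitude_select(5, 9, 4), magnitude_select(12, 3, 4), magnitude_select(7, 7, 4))
```

The control signal depends on all slices, which is why it runs orthogonally to the data flow in the layout.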

MicroLab, VLSI-9 (11/12)

JMM v1.2
Coming Up...
Next topic…
Sequential logic: state elements, latches and
registers. Static vs. dynamic storage. Single and
multiphase clocking strategies. Setup and hold
times; propagation delays.
Readings for next time…
Weste:
- Sections 8.1 thru 8.2 (data operators)
- 8.3.2, 8.4.2 (just read, don’t study)

MicroLab, VLSI-9 (12/12)

JMM v1.2
VLSI Design I
CMOS Sequential Logic
Clocking Strategies

Overview
single and double phase clock systems
Latch and FF timing

Goal: You are familiar with static and dynamic
latches/FFs as well as with single, double phase
clock, clock redistribution, clock skew and PLL
clocking techniques.
MicroLab, VLSI-10 (1/23)

JMM v1.4
Sequential Logic
Use #1: Get better utilization from
idle combinational logic blocks.
Pipeline the system so that new
computations start before the old ones
complete. Add registers to keep
computations separate.

Use #2: Convert parallel operations
to a sequence of (faster, smaller)
serial operations.
[Figure: parallel block “x” with 8-bit inputs A and B and output C, next to a serial block “+” with 1-bit inputs A and B and output C]

Use #3: Need to process a


sequence of inputs and want to
reuse the same hardware (finite
state machine).

MicroLab, VLSI-10 (2/23)

JMM v1.4
Latches and Flip-Flops
Q follows D

D Q D

G G
Q
level sensitive latch
Q stable

Q takes value from D

D Q D

clk clk
Q
edge sensitive flip-flop

Q stable

A static latch will hold data while G is inactive, however long


that may be. A dynamic latch will hold data while G is
inactive, but only “for a while”, after which the saved value
may decay.
Do static latches dissipate static power?
How long is “for a while”?
Which one should I use?
MicroLab, VLSI-10 (3/23)

JMM v1.4
Latch Timing Constraints #1
latch a latch b

D Q CLa D Q CLb D Q

G G G

CLK

t1a
t2b
H S
CLK H S

Do I have to check ALL these constraints?
t1a = tnqa + tnla > thb
t1b = tnqb + tnlb > tha
t2a = txqa + txla < tc0 - tsb
t2b = txqb + txlb < tc1 - tsa
th = hold time
ts = setup time
tn = min delay from invalid input to invalid output
tx = max delay from valid input to valid output
tl = delay for combinatorial logic from input to output
tq = delay for memory element from G to Q

tc0 = low period of clock cycle tc


MicroLab, VLSI-10 (4/23)

JMM v1.4
Latch Timing Constraints #2
t1a
t2b
H S
CLK H S

t1a = tnqa + tnla > thb
t1b = tnqb + tnlb > tha
t2a = txqa + txla < tc0 - tsb
t2b = txqb + txlb < tc1 - tsa
Questions for latch-based designs:
Š how much time for useful work (i.e. for combinational logic
delay)?
txla + txlb < tc - 2(ts + txq)
Š what is the maximal clock frequency?
1/f = tc > 2(txq + txl + ts)
Š does it help to guarantee a minimum tn, for example, by requiring
a minimum number of gates in each cloud?
Š Suppose the maximum clock skew is tSKEW. How does that affect
the equations above? Clock skew measures the difference in
arrival of CLK at two cascaded latches (not necessarily any two
latches!).
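The two latch-based formulas above are easy to evaluate numerically. The sketch below does so for made-up delay values (all numbers are illustrative assumptions, in nanoseconds); it assumes identical latches and identical logic delay in both clock phases and ignores clock skew. The function names are hypothetical.

```python
def latch_logic_budget(tc, ts, txq):
    """Time left per cycle for combinational logic: tc - 2(ts + txq)."""
    return tc - 2 * (ts + txq)


def latch_min_period(txq, txl, ts):
    """Minimum clock period: tc > 2(txq + txl + ts)."""
    return 2 * (txq + txl + ts)


# Illustrative values in ns (assumptions, not taken from the slides).
TXQ, TXL, TS = 0.5, 3.0, 0.25

period = latch_min_period(TXQ, TXL, TS)
print(period, 1000.0 / period)               # period in ns, frequency in MHz
print(latch_logic_budget(10.0, TS, TXQ))     # budget for a 10 ns clock
```

The overhead term 2(ts + txq) is paid once per latch, so it grows in relative terms as the clock period shrinks.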
MicroLab, VLSI-10 (5/23)

JMM v1.4
Static Latches
Basic idea:
[Figure: mux-based latch with D on input 1, Q fed back to input 0, select driven by CLK]
Need gain around this loop to make latch static.
Want storage node to be isolated from whatever user does to Q.
Would like fast CLK-to-Q, small setup and zero hold times.
Obvious implementation:
Oops… feedback not isolated from Q. Could
add additional output inverters...

Good! Input goes


only to fet gates
Q

D D

CLKN

CLK CLK
Should we buffer CLK
0, 1 or 2 times?

MicroLab, VLSI-10 (6/23)

JMM v1.4
Latch Timing
1 2

CLK

setup time = how long D input has to be stable


before CLK transition.
hold time = how long D input has to be stable
after CLK transition.
ts
th
CLK

So, what node should we use to measure


setup and hold times? And what should we measure?

Other time of interest: CLK-to-Q
MicroLab, VLSI-10 (7/23)

JMM v1.4
Dynamic Latches
Suppose in the interest of speed we were
willing to give up the “static guarantee”
and take our chances with dynamic latches,
i.e., remove feedback path...
Eliminate when
Q fanout is small (1)

D Q
Can combine
other logic
with inverter
CLK local or global
clock inverter?

Can we do without the CLK inverter too?


DEC did without on 21064 but put it back in for 21164

CLKN D Q
D Q
CLK
CLK

Delete the PFET driven by CLKN and then add


NFET driven by CLK in Q’s pulldown path to
handle what happens when D goes from 1 to 0.

MicroLab, VLSI-10 (8/23)

JMM v1.4
Flip-flops (registers)
Using alternating positive and negative dynamic latches with
a single clock gives great speed and small area, but…
Š lots of worries about clock skew
Š must balance logic delays to minimize wastage
Š need latch size checks (check optimisations!)

What about those of us who don’t have buildings full of


engineers to sweat the details? Use D-flip-flops and
address all the problems once!

D D Q D Q Q D D Q Q
master slave
G G CLK
CLK

D
CLK

Q
!
MicroLab, VLSI-10 (9/23)

JMM v1.4
Flip-flop Implementations
Obvious implementation:

Q
D

CLK

Use “jamb” latches to lighten CLK load:


“Weak” feedback inverters
(long n and p) get overridden

D Q

CLK

MicroLab, VLSI-10 (10/23)

JMM v1.4
Flip-Flop Timing
D Q CL D Q

clk clk

CLK

t1
t2
CLK

t1 = tnq + tnl > th


t2 = txq + txl < tc - ts

Questions for register-based designs:
Š how much time for useful work (i.e. for combinational logic
delay)?
Š does it help to guarantee a minimum tn? How about designing
registers so that
txq > th?
Š Suppose the maximum clock skew is tSKEW. How does that affect
the equations above?
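The two register constraints can be checked mechanically. In the sketch below the skew handling is an assumption, namely the common conservative treatment in which tSKEW is added to the hold requirement and subtracted from the setup budget; the slides leave this as a question. The function name and all delay values (in picoseconds) are hypothetical.

```python
def check_flip_flop_timing(tnq, tnl, txq, txl, ts, th, tc, tskew=0):
    """Return (hold_ok, setup_ok) for one register-to-register path.

    hold:  t1 = tnq + tnl > th          (plus tskew, assumed worst case)
    setup: t2 = txq + txl < tc - ts     (minus tskew, assumed worst case)
    """
    hold_ok = tnq + tnl > th + tskew
    setup_ok = txq + txl < tc - ts - tskew
    return hold_ok, setup_ok


# Illustrative delays in ps (assumptions, not taken from the slides).
path = dict(tnq=200, tnl=300, txq=500, txl=6000, ts=400, th=300, tc=8000)

print(check_flip_flop_timing(**path))             # no skew
print(check_flip_flop_timing(**path, tskew=300))  # skew eats the hold margin
```

In this example the skew breaks the hold constraint while setup still passes, and a hold violation cannot be fixed by slowing the clock.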

MicroLab, VLSI-10 (11/23)

JMM v1.4
Dynamic Flip-Flops
I’ll have the Christer Svensson
special please!
2

CLK QN

CLK is low:
Š node 1 follows not(D)
Š node 2 pulled up
Š QN is “floating” with its old value

CLK is high:
Š node 2 = “0” if node 1 = “1”,
otherwise it stays “1”
⇒ node 2 = not(node 1) shortly after CLK↑
Š QN = not(node 2) ⇒ stable soon after CLK↑
Š node 1 can be pulled down if D goes to “0” (capacitive
coupling), but node 2 won’t change!
MicroLab, VLSI-10 (12/23)

JMM v1.4
Single-Phase Clocked Systems
RTL #1:
D Q D Q D Q

clk clk clk

CLK

latch #2:
D Q D Q D Q

G G G

CLK

Simplest clocking methodology is to use a single clock in conjunction
with a register. Clocks are generated with global clock buffers.
CLK and its complement are generated locally.
buffers necessary
for large loads
clk-in
clk

clk
MicroLab, VLSI-10 (13/23)

JMM v1.4
Clock Skew
D Q D Q D Q

clk clk clk

CLK delay delay

Š if a clock net is heavily loaded, there might be a race
between clock and data -> clock skew
Š special attention has to be paid when designing the clock
tree. CAD tools are able to design balanced clock trees.
Š two methods to avoid clock skew:
latch
D Q D Q D Q

clk clk clk

CLK delay

D Q D Q

clk clk

delay CLK
MicroLab, VLSI-10 (14/23)

JMM v1.4
Two-Phase Clocked Systems (latch)
[Figure: chain of D/Q latches with gate inputs G driven by PHI1 and PHI2; waveforms of phi1 and phi2 labelled “non-overlapping two phase clocks”]

Š a problem in single phase clocked systems is the
generation and distribution of nearly perfect overlapping
clocks.
Š in two-phase clocked systems this is solved by non-overlapping
clocks
Š non-overlapping clocks can be generated with latch
structures
[Figure: clk feeding two ≥1 gates that produce phi1 and phi2]

MicroLab, VLSI-10 (15/23)

JMM v1.4
Two-Phase Clocked Systems (FF)
[Figure: chain of D/Q registers clocked from CLK; CLK waveform labelled “non-overlapping two edge clocks”]

‹ in properly designed two-edge clocked systems clock
skew problems are drastically reduced
‹ Disadvantage: 50% speed reduction
‹ typical application: FSM on rising edge, data-path on
falling edge
‹ designs with several FSMs and data-paths need thorough
design

MicroLab, VLSI-10 (16/23)

JMM v1.4
Clock Distribution
Two main techniques for clock distribution exist:
‹ a single large buffer (see Alpha processor)

‹ a distributed clock tree approach

[Figure: clk distributed to a column of n-bit datapath blocks; annotation: delays have to match between stages]

‹ there is no such thing as design-free clocking
strategy in today’s high-performance processes
‹ clock buffers should be surrounded by power pads
due to their large power consumption
[Figure: clk driver placed between vdd and gnd pads]

MicroLab, VLSI-10 (17/23)

JMM v1.4
Phase Locked Loop Clock Technique
Phase locked loops (PLL) are used to generate
internal clocks on chips for two main reasons:
‹ to synchronize the internal clock of a chip with an
external clock
‹ to operate the internal clock at a higher rate than
the external clock input
[Figure: two clock paths from the clock input to data out, one of them through a PLL; labels: clock route, dclk, dpad, dclk+dpad]


MicroLab, VLSI-10 (18/23)

JMM v1.4
PLL #2
[Figure: PLL block diagram: fosc → Phase Detector → up/down → Charge Pump → Filter → VCO (voltage controlled oscillator) → n x fosc, with a Divider by n feeding back to the Phase Detector; waveforms of fosc, ffeed, up, down and Ufilter]
‹ The phase detector produces a sequence of up/down
pulses, which are used to switch a charge pump.
‹ The charge pump charges/discharges a capacitor
with voltage or current pulses
‹ A filter is used to limit the rate of change of the
capacitor voltage. The result is a slowly changing
voltage that depends on the frequency difference
between the PLL and VCO.
‹ The VCO increases/decreases its frequency of
operation depending on its input voltage
MicroLab, VLSI-10 (19/23)

JMM v1.4
Static Timing Analysis
Do I have to check ALL the constraints?
Yup, for every pair of connected register/latches AND for all
possible data values!

We need a CAD tool: static timing analyser. Here’s how
it works:
Step 1: “Level-ize” all signal nodes.
Start by assigning all register outputs and top-level inputs a
level of 0. For all other gates: level_OUTPUT =
max(level_INPUT) + 1.

Step 2: Compute min/max signal delays.
For each successive node level, compute min and max time for
all nodes on that level (see next slide for details). This is a
“data independent” computation. Might need case analysis to
avoid false paths.

Step 3: Check setup and hold constraints
Use min times of register inputs to check hold time. Use max
times and tCLK to check setup time or use max time + tSETUP
to determine min tCLK.
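The three steps can be sketched in a few lines. This is only an illustration under simplifying assumptions: each gate has a single fixed (min, max) delay, the combinational logic is acyclic, and false paths are ignored. The netlist format and the names `levelize`, `arrival_times`, `check_register_input` and `GATES` are hypothetical.

```python
def levelize(gates, sources):
    """Step 1: register outputs and top-level inputs get level 0."""
    level = {node: 0 for node in sources}

    def level_of(node):
        if node not in level:
            inputs = gates[node][0]
            level[node] = max(level_of(i) for i in inputs) + 1
        return level[node]

    for node in gates:
        level_of(node)
    return level


def arrival_times(gates, sources):
    """Step 2: data-independent min/max arrival time of every node."""
    level = levelize(gates, sources)
    t_min = {node: 0 for node in sources}
    t_max = {node: 0 for node in sources}
    for node in sorted(gates, key=lambda n: level[n]):
        inputs, d_min, d_max = gates[node]
        t_min[node] = min(t_min[i] for i in inputs) + d_min
        t_max[node] = max(t_max[i] for i in inputs) + d_max
    return level, t_min, t_max


def check_register_input(t_min_d, t_max_d, t_clk, t_setup, t_hold):
    """Step 3: hold check uses the min time, setup check the max time."""
    return t_min_d > t_hold, t_max_d + t_setup < t_clk


# Hypothetical netlist: output node -> (input nodes, min delay, max delay)
GATES = {
    "n1": (["a", "b"], 1, 2),
    "n2": (["n1", "c"], 1, 3),
    "d": (["n2", "a"], 1, 2),
}
```

For this netlist the register input `d` ends up on level 3 with a max arrival time of 7, so the minimum clock period would be 7 plus the setup time.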

MicroLab, VLSI-10 (20/23)

JMM v1.4
Stage Delay Computation
Look at each gate and use knowledge of input timing and rise/fall
timing to compute earliest and latest time output could change for
both rising and falling output transitions.

[Figure: clocked gate stage between VDD and GND with inputs IN, CLK and CLKN, internal nodes 1 and 2 (capacitances C1, C2) and output OUT (capacitance COUT)]

IN↑ → OUT↓ (C1, COUT):
min → node 1 = 0V, fast
max → node 1 = VDD, slow
IN↓ → OUT↑ (C2, COUT):
min → node 2 = VDD, fast
max → node 2 = 0V, slow
Other transitions:
CLK↑, CLK↓, CLKN↑, CLKN↓

Use Penfield-Rubinstein model to compute
td,in-out = sum(Ri·Ci) over all nodes “i” in the stage, where Ri is
total “effective resistance” to power rail and Ci is non-zero if node
capacitor needs to be charged/discharged. Multiply by degrading
factor to account for rise/fall time of input.

MicroLab, VLSI-10 (21/23)

JMM v1.4
Coming Up...
Next topic…
Data operators

Readings for next time…


Weste:
‹ Sections 5.5 thru 5.5.6 (latch, FF)
‹ 5.5.8 thru 5.5.11 (clock strategy)

‹ 5.5.15 and 5.5.16 (clock strategy)

Selfstudy…
Weste:
‹ PLL section 9.3.5.3

MicroLab, VLSI-10 (22/23)

JMM v1.4
Exercises: VLSI-10

Ex vlsi10.1 (difficulty: easy): calculate peak current
and power consumption of a 100MHz clock driver
with rise and fall times of 1ns driving 30k register
bits at 100fF each with Vdd=3.3V
Result: Ipeak=9.9A, Pd=2.18 Watt
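A numeric sketch of this exercise, assuming the usual first-order relations Ipeak = C·Vdd/t_rise and Pd = C·Vdd²·f (the slide does not say which formulas are intended). The function names are hypothetical. With all 30k bits (3 nF) the first relation reproduces the 9.9 A; the second evaluates to about 3.27 W for 3 nF, while 2.18 W is what the same formula gives for a 2 nF load.

```python
def peak_current(c_load, vdd, t_edge):
    """I = C * dV/dt, with the full swing Vdd covered in one edge time."""
    return c_load * vdd / t_edge


def dynamic_power(c_load, vdd, freq):
    """P = C * Vdd^2 * f (one charge and one discharge per cycle)."""
    return c_load * vdd ** 2 * freq


C_LOAD = 30_000 * 100e-15  # 30k register bits at 100 fF each = 3 nF
I_PEAK = peak_current(C_LOAD, 3.3, 1e-9)
P_DYN = dynamic_power(C_LOAD, 3.3, 100e6)
```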

MicroLab, VLSI-10 (23/23)

JMM v1.4
Intro to VLSI Systems
Finite State Machines

Today’s handouts:
(1) Lecture Slides

MicroLab, VLSI-11 (1/9)

JMM/ESA v1.0
Excuse me… Is there such a thing
as unclocked
sequential logic?

Wave pipelining
just assert new inputs to logic after waiting “long enough” to
ensure that previous values won’t be corrupted. Requires very
careful design of each level of logic to ensure consistent
propagation delay along all paths with all possible data values.
Hard to do in the face of manufacturing variations (“fast N, slow
P” and vice versa)

Self-timed logic
use dual-rail signaling (i.e., two wires) to encode
reset (not yet evaluated) 00
ready with value 0 01
ready with value 1 10
and then build handshake logic that starts
next stage when current stage is done and next
stage has completed its previous computation
and delivered its values. Dual-rail logic works well
with precharge-evaluate gates… more on this
in a later lecture.

MicroLab, VLSI-11 (2/9)

JMM/ESA v1.0
Finite State Machines

Draw and check state transition diagram

merge equivalent states

perform state encoding

design logic implementation

MicroLab, VLSI-11 (3/9)

JMM/ESA v1.0
Correct State Diagrams
[Figure: state transition diagram with states S1 to S9; arcs labelled in/out with values such as 1/0, 0/0, 1/1, -/0 and -/1]

Is this a Mealy or Moore machine?

Arcs leaving a state must be:


(1) mutually exclusive
can’t have two choices for a given input value
(2) collectively exhaustive
every state must specify what happens for each possible
input combination. “Nothing happens” means arc back to
itself.

MicroLab, VLSI-11 (4/9)

JMM/ESA v1.0
Merge Equivalent States
Two states are equivalent if for each
possible combination of inputs
(1) they have identical outputs
(2) they transition to equivalent states
[Figure: state transition diagram with states S1 to S5; arcs labelled in/out]

Compatibility table:
start by putting “X” in square (Si,Sj) if Si
produces different output from Sj for some input

[Table: rows S2 to S5 (all but first state), columns S1 to S4 (all but last state); X entries in rows S4 and S5]



MicroLab, VLSI-11 (5/9)

JMM/ESA v1.0
[Figure: the same state transition diagram and compatibility table, now with the entry “S1,S5” written into a non-X square of row S5]

Next: for non-X square (Si,Sj) write in pairs of states that have to be
equivalent in order for Si and Sj to be equivalent.

Finally: Look at an entry in (Si,Sj). If entry is “Sm,Sn”, and if
(Sm,Sn) has an X, put an X in square (Si,Sj). Repeat until no more
squares can be X’ed out.

Remaining squares indicate equivalent states.
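The table procedure translates almost directly into code. The sketch below is an illustration only: the five-state machine `TABLE` is a made-up example (not the machine drawn on the slide), and the function name `equivalent_pairs` is hypothetical.

```python
from itertools import combinations


def equivalent_pairs(table):
    """table[state][input] = (next_state, output). Returns equivalent pairs."""
    states = sorted(table)
    inputs = sorted(next(iter(table.values())))
    marked = set()   # squares holding an "X"
    implied = {}     # non-X squares: pairs that must also be equivalent
    for si, sj in combinations(states, 2):
        if any(table[si][x][1] != table[sj][x][1] for x in inputs):
            marked.add((si, sj))  # different output for some input
        else:
            implied[(si, sj)] = {
                tuple(sorted((table[si][x][0], table[sj][x][0])))
                for x in inputs
                if table[si][x][0] != table[sj][x][0]
            }
    changed = True
    while changed:  # repeat until no more squares can be X'ed out
        changed = False
        for pair, needs in implied.items():
            if pair not in marked and needs & marked:
                marked.add(pair)
                changed = True
    return sorted(p for p in implied if p not in marked)


# Hypothetical machine: B and C behave identically; A and E only look alike.
TABLE = {
    "A": {0: ("B", 0), 1: ("C", 0)},
    "B": {0: ("A", 0), 1: ("D", 1)},
    "C": {0: ("A", 0), 1: ("D", 1)},
    "D": {0: ("D", 1), 1: ("A", 0)},
    "E": {0: ("E", 0), 1: ("A", 0)},
}
```

Here the pair (A,E) survives the first pass because the outputs match, but it is X'ed out in the second pass since it depends on (B,E), which already has an X.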

MicroLab, VLSI-11 (6/9)

JMM/ESA v1.0
Perform State Encoding
Given a minimized symbolic state diagram,
assign binary codes to the states. We need to predict the
effects of logic minimization and find the state encoding that
produces the smallest logic implementation.

This is hard when number of states is large!


input  current state  new state  output
0      S1             S1         1
1      S1             S2         0
0      S2             S1         1
1      S2             S3         1
-      S3             S4         0
-      S4             S1         1

Encoding S1=01, S2=00, S3=10, S4=11:
input  state  new  output      after “Q-M”:  input  state  new  output
0      01     01   1                         1      00     10   1
1      01     00   0                         0      0-     01   1
0      00     01   1                         -      10     11   0
1      00     10   1                         -      11     01   1
-      10     11   0
-      11     01   1

Encoding S1=00, S2=01, S3=10, S4=11:
input  state  new  output      after “Q-M”:  input  state  new  output
0      00     00   1                         0      0-     00   1
1      00     01   0                         1      -0     01   0
0      01     00   1                         1      01     10   1
1      01     10   1                         -      10     11   0
-      10     11   0                         -      -1     00   1
-      11     00   1

MicroLab, VLSI-11 (7/9)

JMM/ESA v1.0
FSM Logic Implementation
[Figure: FSM implementation options: multi-level logic, ROM, PLA or “one hot”, together with the state registers]

“One hot” encoding uses a separate register
for each possible state: register output is “1”
if FSM is in that state. Hence only one state
register is “hot” at a time. Makes for trivial
decoding of state, simple next state logic.
Good for simple FSMs and when no multi-level
synthesis is available. Often a good choice
for FPGAs.

MicroLab, VLSI-11 (8/9)

JMM/ESA v1.0
Coming Up...
Next topic…
Arithmetic circuits: adders and multipliers.

Readings for next time…


Weste: 8.4

MicroLab, VLSI-11 (9/9)

JMM/ESA v1.0
VLSI Design I
Datapath Operators: Addition and Multiplication

Didn’t I learn how
to do addition in
the first year?
First year courses
aren’t what they
used to be...

 01011
+00101
 10000

Overview
Carry propagate, carry lookahead, carry save, carry skip
and carry select adder

Goal: You know serial and parallel addition and
multiplication architectures
MicroLab, VLSI-12 (1/29)

JMM v1.4
Addition/Subtraction

Most digital functions can be divided into the
following categories:
‹ datapath operators
‹ memory elements
‹ control structures
‹ I/O cells

Why can’t we just add

Adder architectures:
‹ carry-propagate adder (CPA)
‹ ripple carry adder
‹ carry-lookahead adder (CLA)
‹ Manchester carry adder
‹ hierarchical carry-lookahead adder
‹ carry-save adder (CSA)
‹ carry-skip adder
‹ carry-select adder
‹ parallel adder
‹ serial adder ...

MicroLab, VLSI-12 (2/29)

JMM v1.4
Binary Addition
Here’s an example of binary addition as
one might do it by “hand”:
 1101     (carries from previous column)
 01101
+00101
 10010
If we use a two’s-complement representation
for signed integers, the same procedure will
work for adding both signed and unsigned
numbers.
Besides the sum, one often wants two other
bits of information from an adder:
carry-out: indicates that add in the most significant position
produced a carry; used when implementing multi-word arithmetic,
e.g., “1 + (-1)”
$C = a_{n-1} \cdot b_{n-1} + \overline{s_{n-1}} \cdot (a_{n-1} + b_{n-1})$
overflow: indicates that the answer has too many bits to be
represented correctly by the result width (2’s complement),
e.g., “$(2^{N-1} - 1) + (2^{N-1} - 1)$”
$V = a_{n-1} \cdot b_{n-1} \cdot \overline{s_{n-1}} + \overline{a_{n-1}} \cdot \overline{b_{n-1}} \cdot s_{n-1}$
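Both flags depend only on the most significant bits of the operands and of the sum. The sketch below assumes that the sum bit appears complemented in the carry-out term and in the first overflow term, which is the standard form of these equations; the function name `add_with_flags` is hypothetical.

```python
def add_with_flags(a, b, n):
    """Add two n-bit patterns; return (sum, carry_out, overflow)."""
    mask = (1 << n) - 1
    s = (a + b) & mask
    a_msb = (a >> (n - 1)) & 1
    b_msb = (b >> (n - 1)) & 1
    s_msb = (s >> (n - 1)) & 1
    # C = a*b + not(s)*(a + b), all taken at bit position n-1
    carry = (a_msb & b_msb) | ((1 - s_msb) & (a_msb | b_msb))
    # V = a*b*not(s) + not(a)*not(b)*s, all taken at bit position n-1
    overflow = (a_msb & b_msb & (1 - s_msb)) | ((1 - a_msb) & (1 - b_msb) & s_msb)
    return s, carry, overflow
```

For 4-bit words, `add_with_flags(0b0001, 0b1111, 4)` models “1 + (-1)”: the sum is 0 with a carry-out but no overflow, while `add_with_flags(7, 7, 4)` overflows without producing a carry.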
MicroLab, VLSI-12 (3/29)

JMM v1.4
Adder with “ripple” carry chain
To convert the simple addition procedure to hardware, we’ll
need “full adder” module:
[Figure: full adder symbol with inputs A, B, CIN and outputs COUT, S]

A B CIN | COUT S
0 0 0   | 0    0
0 0 1   | 0    1
0 1 0   | 0    1
0 1 1   | 1    0
1 0 0   | 0    1
1 0 1   | 1    0
1 1 0   | 1    0
1 1 1   | 1    1

One-bit adders are sometimes
called “counters” since they
count the number of 1’s on their
inputs and encode the answer
on their outputs. Thus a full
adder is a 3:2 counter.

$S = \overline{A} \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot \overline{C_{in}} + A \cdot B \cdot C_{in}$
$C_{out} = A \cdot B + A \cdot C_{in} + B \cdot C_{in}$

[Figure: chain of N full adders with inputs A0, B0 to AN-1, BN-1 and CIN (C0), outputs S0 to SN-1 and COUT. Carry “ripples” from one stage to the next]
propagation delay _______________
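A bit-level model of the ripple chain, as a minimal sketch; the function names are hypothetical and bit lists are assumed to be ordered LSB first.

```python
def full_adder(a, b, cin):
    """One-bit full adder (a 3:2 counter): returns (cout, s)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return cout, s


def ripple_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, cout)."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        # The carry of stage i is needed before stage i+1 can finish,
        # so the delay grows linearly with the word length.
        carry, s = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry
```

Feeding in 01101 and 00101 (LSB first: `[1, 0, 1, 1, 0]` and `[1, 0, 1, 0, 0]`) gives the sum bits of 10010 and no carry-out.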
MicroLab, VLSI-12 (4/29)

JMM v1.4
Faster carry logic (CLA)
Let’s see if we can improve the speed by
rewriting the equations for COUT:
COUT = AB + ACIN + BCIN
= AB + (A + B)CIN
= G + P CIN where G = AB (generate) and P = A + B (propagate)

For adding two N-bit numbers:
C_N = G_N + P_N C_{N-1}
= G_N + P_N G_{N-1} + P_N P_{N-1} C_{N-2}
= G_N + P_N G_{N-1} + P_N P_{N-1} G_{N-2} + … + P_N ... P_0 C_IN

So if we had (N+1)-input gates and didn’t mind a lot of
loading on the P signals, the propagation delay of adder
built using this equation for the carries would be (count
per fan-in 1 delay unit: ripple carry: 5N delays):

____________________________________
Of course, this is impractical but it does lead to some
interesting ideas:
Š faster ripple-carry implementations
Š hierarchical carry-lookahead adders
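The expanded carry equation can be evaluated directly from the G and P signals, without waiting for lower carries. A minimal sketch, with a hypothetical function name and bit lists ordered LSB first:

```python
def lookahead_carries(a_bits, b_bits, cin=0):
    """Carry out of every bit position, computed from G and P only."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate: G = AB
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate: P = A + B
    carries = []
    for n in range(len(g)):
        # C_n = G_n + P_n G_{n-1} + ... + P_n ... P_0 C_in
        carry = 0
        prefix = 1  # running product P_n P_{n-1} ... P_{k+1}
        for k in range(n, -1, -1):
            carry |= prefix & g[k]
            prefix &= p[k]
        carry |= prefix & cin
        carries.append(carry)
    return carries
```

In hardware each `carries[n]` would be one wide AND-OR gate; the nested loop here only mirrors the structure of the sum-of-products, not its delay.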

MicroLab, VLSI-12 (5/29)

JMM v1.4
Manchester carry chain (CLA)
The plan: first generate carry-in for each adder bit as fast
as we can then compute the sum. Delay still proportional
to size of adder, but “constant” is pretty small.

[Figure: static Manchester stages (P = A + B) with signals PN, GN, CN-1 and CN]

[Figure: dynamic Manchester stage with signals CLK, PN, GN, CN-1 and CN]

When CLK is low, all C nodes precharge.
When CLK is high, if GN is high, CN is asserted,
i.e., driven low.

To prevent GN from affecting CN-1, PN must be
computed as AN xor BN. But we needed the xor
anyway… now SN = PN xnor CN

MicroLab, VLSI-12 (6/29)

JMM v1.4
Manchester Adder Block (CLA)
[Figure: four-bit Manchester adder block: P/G cells for inputs AN, BN to AN+3, BN+3, a Manchester carry chain from Cin to CN+3 with one link per P signal (PN to PN+3), and xnor gates producing SN to SN+3]

The propagate logic in the Manchester carry chain
puts a lot of NFETs in series, so when CIN is high
the pulldown path can get long if a lot of the P
signals are true. For most technologies, the
performance of this long pulldown path limits
the maximum length of the carry chain to around
four stages before it needs to split into subchains.

Adding a bypass path that skips over the block
when all P signals are true can improve maximum propagation delay
when multiple Manchester carry chains are used in series.

MicroLab, VLSI-12 (7/29)

JMM v1.4
Hierarchical carry-lookahead adders
The linear growth of adder carry-delay with size of the input word may
be improved by calculating the carries to each stage in parallel:

C_J = G_IJ + P_IJ C_{I-1}
G_IK = G_{J+1,K} + P_{J+1,K} G_IJ
P_IK = P_IJ P_{J+1,K}
where I <= J and J+1 <= K

“generate a carry from bits I thru
K if it is generated in the high-order
(J+1,K) part of the block or if it is
generated in the low-order (I,J) part
of the block and then propagated
thru the high part”

[Figure: lookahead tree over bits 7 to 0: first level combines (6,7), (4,5), (2,3), (0,1); second level (4,7), (0,3); root (0,7); depth log2(n). Bit cell K has ports AK, BK, SK, PK, GK, CK-1; tree cell (I,K) has ports GIJ, PIJ, GJ+1,K, PJ+1,K, CJ, GIK, PIK, CI-1]
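The block equations amount to one combine step that is applied pairwise up a tree. The sketch below only models the upward (G, P) pass for a word length that is a power of two; `to_bits`, `combine`, `block_gp` and `carry_out` are hypothetical names.

```python
def to_bits(x, n):
    """Integer -> list of n bits, LSB first."""
    return [(x >> i) & 1 for i in range(n)]


def combine(low, high):
    """(G_IJ, P_IJ) and (G_J+1,K, P_J+1,K) -> (G_IK, P_IK)."""
    g_low, p_low = low
    g_high, p_high = high
    return g_high | (p_high & g_low), p_low & p_high


def block_gp(a_bits, b_bits):
    """Reduce per-bit (G, P) pairs in a binary tree; returns ((G, P), levels)."""
    nodes = [(a & b, a | b) for a, b in zip(a_bits, b_bits)]
    levels = 0
    while len(nodes) > 1:
        nodes = [combine(nodes[i], nodes[i + 1]) for i in range(0, len(nodes), 2)]
        levels += 1
    return nodes[0], levels


def carry_out(a, b, n, cin=0):
    """Carry out of the whole n-bit block: C = G_0,n-1 + P_0,n-1 * Cin."""
    (g, p), _ = block_gp(to_bits(a, n), to_bits(b, n))
    return g | (p & cin)
```

For an 8-bit word the reduction takes three levels, matching the log2(n) depth of the tree.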

MicroLab, VLSI-12 (8/29)

JMM v1.4
Carry-skip adders
Since computing PIK is simpler than computing GIK, let’s try just
computing PIK and apply the “skip” optimisation from Manchester
adders.

[Figure: carry chain from C0 to C12 with intermediate carries C4 and C8 and skip paths controlled by P4,7 and P8,11]

Suppose it takes 1 time unit for a signal to pass thru
two logic levels, then
time to ripple thru block of k bits = k time units
time to skip a block = 1 time unit

Consider a 24-bit carry-skip adder organized as
6 blocks of four bits each. So the worst case propagation time is

4 + 1 + 1 + 1 + 1 + 4 = 12 time units
(ripple, skip, ripple)
But now reorganize the adder with the least significant 3 bits in the
first block, the next 4 bits in the second block, followed by blocks of
5, 5, 4, and 3. Now the worst case propagation time is

3 + 1 + 1 + 1 + 1 + 3 = 10 time units
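The worst-case figures can be reproduced with a small search over all pairs of blocks. The model assumed here: a carry is generated at the bottom of some block, ripples through that block, skips every block in between at 1 time unit each, and ripples through the block where it ends. The function name is hypothetical.

```python
def worst_case_delay(blocks):
    """blocks: block widths in bits, least significant block first."""
    worst = 0
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            skipped = j - i - 1  # blocks bypassed between start and end block
            worst = max(worst, blocks[i] + skipped + blocks[j])
    return worst


print(worst_case_delay([4, 4, 4, 4, 4, 4]))  # prints 12
print(worst_case_delay([3, 4, 5, 5, 4, 3]))  # prints 10
```

The second organisation works because the wide blocks sit in the middle, where any path that ripples through them has fewer blocks left to skip.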

MicroLab, VLSI-12 (9/29)

JMM v1.4
Late-arriving inputs
Is there a general way to reorganize a
logic equation to accommodate a late-arriving
input?

Consider the following where X1 arrives late:

f = X1⋅X2 + X1⋅X3 + X4⋅X5

If we want only one gate delay from X1 to
the output f, how do we do it?

MicroLab, VLSI-12 (10/29)

JMM v1.4
Carry-select adders
Building on the idea from the previous slide: perform two
additions in parallel, one assuming the carry-in is zero and
the other assuming the carry-in is one. When the carry-in
is finally known, the correct result is selected from the two
precomputed results.

[Figure: adder blocks computed twice, once with carry-in 0 and once with carry-in 1; the block carry-outs are combined with & and >=1 gates, starting from CIN, to drive the 1/0 selectors]

Is this a “mux”?

If it takes k time units for a block to add k-bit numbers
and if it takes one time unit to compute mux select from
the two carry-out signals, then for optimal operation each
block should be one bit wider than the next block, just as
in the carry-skip adder.
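A behavioural sketch of the selection scheme. Each block is added twice and the incoming carry only picks one of the two precomputed results. The block widths in the usage note are illustrative values chosen to sum to 32, and the function names are hypothetical.

```python
def block_add(a, b, cin, width):
    """Plain adder for one block: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, blocks, cin=0):
    """blocks: block widths in bits, least significant block first."""
    result, shift, carry = 0, 0, cin
    for width in blocks:
        mask = (1 << width) - 1
        a_blk = (a >> shift) & mask
        b_blk = (b >> shift) & mask
        # Both candidates are independent of the incoming carry,
        # so in hardware they are computed in parallel for all blocks.
        sum0, c0 = block_add(a_blk, b_blk, 0, width)
        sum1, c1 = block_add(a_blk, b_blk, 1, width)
        # The mux: the real carry-in only selects a result.
        s, carry = (sum1, c1) if carry else (sum0, c0)
        result |= s << shift
        shift += width
    return result, carry
```

For example, `carry_select_add(0xFFFFFFFF, 1, [3, 4, 5, 6, 7, 7])` returns a zero sum with a carry-out of 1.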

MicroLab, VLSI-12 (11/29)

JMM v1.4
Adder layouts
[Figure: layout of a 32-bit carry-select adder]

[Figure: layout of a 32-bit carry-lookahead adder]

MicroLab, VLSI-12 (12/29)


Adding M N-bit numbers
“carry-propagate”
[Figure: adder array N bits wide and M-1 rows deep, carry inputs tied to 0]

prop delay _____________ area _____

“carry-save”
[Figure: adder array N bits wide with M-2 carry-save rows, carry inputs tied to 0]

prop delay _____________ area _____


MicroLab, VLSI-12 (13/29)

JMM v1.4
Even-Odd Arrays
Abstract carry-save picture from previous page:
[Figure: chain of M-2 CSAs followed by a CPA]
Rewire so that first two adders work in parallel.
Feed results into third and fourth adders which
also work in parallel, etc.
[Figure: M-4 CSAs arranged as two parallel (even and odd) chains, followed by 2 CSAs and a CPA]

prop delay _____________ area _____

Even and odd streams pass through half the
adders so even/odd design runs at almost twice
the speed of simple CSA implementation.

MicroLab, VLSI-12 (14/29)

JMM v1.4
Wallace Trees
[Figure: tree of CSAs feeding a final CPA; depth O(log_1.5 M)]

We have been using full-adders
or 3:2 counters in our array
adders. Higher fan-in counters
can be used to further reduce
delays for large M, e.g., Weste
shows a 5:3 counter in Fig. 8.41.

Wallace trees give asymptotically better behaviour than the earlier
O(M) schemes, but they do not have a regular layout. Other
O(log(M)) schemes, e.g., binary-tree multipliers using signed
digit representations, have better layout properties but at a cost of
more complicated adder cells.
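The logarithmic depth follows from the fact that every CSA level turns each group of three operands into two. A small sketch that counts the levels needed before only two operands remain for the final CPA; the function name is hypothetical and only 3:2 counters are assumed.

```python
def csa_levels(m):
    """Number of 3:2 CSA levels needed to reduce m operands to 2."""
    levels = 0
    while m > 2:
        # Each full group of three becomes two; leftovers pass through.
        m = 2 * (m // 3) + m % 3
        levels += 1
    return levels


for m in (4, 9, 16, 32):
    # A linear carry-save array needs m - 2 CSA rows in series.
    print(m, csa_levels(m), m - 2)
```

For 32 operands this gives 8 levels in the tree against 30 rows in the linear array.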

MicroLab, VLSI-12 (15/29)

JMM v1.4
Bit-Serial Adder

• bit-serial adders are very slow, have a high data
latency, but are extremely compact
• applications are signal processing

[Figure: bit-serial adder: n-bit registers for A, B and the result, clocked by clk, around a one-bit adder whose cout is stored in an FF (with clr) and fed back as cin]

MicroLab, VLSI-12 (16/29)

JMM v1.4
CSA Adder (pipelining)
• pipelined adders are extremely fast, but suffer from
high data latency (CSA structure of slide #13)
[Figure: pipelined 4-bit adder computing S = A + B + C + D: CSA adders for bits 0 to 3 followed by a CPA adder, separated by FF pipeline registers clocked by clk, producing S(0) to S(3) and Carry]
MicroLab, VLSI-12 (17/29)
JMM v1.4
CPA Adder (pipelining)
• the CPA structure on slide #13 can also be used in
a pipeline structure. Useful in signal processing
applications.

[Figure: pipelined 4-bit CPA: inputs A(0) to A(3), B(0) to B(3) and Cin pass through staggered FF chains, with an FF in the carry path between bits; outputs S(0) to S(3) and Carry]

MicroLab, VLSI-12 (18/29)

JMM v1.4
Binary Multiplication
Suppose we want to multiply two numbers:
A = {A_{N-1}, A_{N-2}, …, A_1, A_0} (multiplicand)
B = {B_{M-1}, B_{M-2}, …, B_1, B_0} (multiplier)
to produce an (N+M)-bit result. We can write the product
as
A*B = B_0*A*2^0 + B_1*A*2^1 + … + B_{M-1}*A*2^{M-1}
Note that B_K*A can be accomplished with N AND gates
since B_K = 0 or 1. The scaling by powers of two is a
simple shift.

Thus multiplication of an N-bit number by an M-bit
number boils down to the addition of M N-bit partial
products each of which is formed by a simple Boolean
operation. Any of the techniques from the previous slides
can be used to accomplish the required additions.
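The sum of shifted partial products can be written out directly. A minimal sketch for unsigned operands, with a hypothetical function name:

```python
def multiply(a, b, m):
    """Unsigned multiply: a is the multiplicand, b an m-bit multiplier."""
    product = 0
    for k in range(m):
        b_k = (b >> k) & 1
        partial = a if b_k else 0  # B_K * A: N AND gates in hardware
        product += partial << k    # scaling by 2^K is a simple shift
    return product
```

For two 4-bit operands the largest product is 15 * 15 = 225, which fits in 8 bits.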

MicroLab, VLSI-12 (19/29)

JMM v1.4
Array multipliers
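Example 3x3 array multiplier
using CSAs to sum partial
products:
[Figure: array of CSA cells fed by the partial-product bits A0B0, A1B0, A0B1, A2B0, A1B1, A0B2, A2B1, A1B2 and A2B2, with unused inputs tied to 0, producing P0 to P5; the top carry is marked nc]

Actual layout is usually squished flat: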

MicroLab, VLSI-12 (20/29)

JMM v1.4
Higher Radix Multiplication
Array multipliers are nice, but we get one column of adders (which are
big/slow) for each partial product, i.e., one column for each bit of the
multiplier. If we could use, say, 2 bits of the multiplier in generating
each partial product we would halve the number of columns and double
the speed of the multiplier!

Let’s rewrite our equation for A*B:

A*B = B_{1,0}*A*2^0 + B_{3,2}*A*2^2 + … + B_{M-1,M-2}*A*2^{M-2}

This looks the same as before except we have half as many partial
products to sum. Generating each partial product is now more
complicated since B_{K+1,K} can now be 0, 1, 2 or 3. The only
troublesome value here is 3 since that would seem to require more
adder inputs than we have (3*A = A + 2*A).

But…
we can also write 3*A = 4*A - A. We’ll do the -A in this partial
product stage and signal the next stage that it needs to add 4*A. To
keep the signalling simple we’ll also rewrite 2*A = 4*A - 2*A

Profs go crazy nowadays, why
can’t he just multiply as
everybody does it

MicroLab, VLSI-12 (21/29)

JMM v1.4
Booth Recoding (Radix-4)
A*B = B_{1,0}*A*2^0 + B_{3,2}*A*2^2 + … + B_{M-1,M-2}*A*2^{M-2}

  A_{N-1} A_{N-2} … A_4 A_3 A_2 A_1 A_0
x         B_{M-1} B_{M-2} … B_3 B_2 B_1 B_0

[Figure: array of M/2 partial-product rows, each shifted by 2 bits]

B_{K+1,K}*A = 0*A → 0
            = 1*A → A
            = 2*A → 4*A - 2*A
            = 3*A → 4*A - A

B_{K+1} B_K B_{K-1} | action  | N  x1 x2
0       0   0       | add 0   | -- 0  0
0       0   1       | add A   | 0  1  0
0       1   0       | add A   | 0  1  0
0       1   1       | add 2*A | 0  0  1
1       0   0       | sub 2*A | 1  0  1
1       0   1       | sub A   | 1  1  0
1       1   0       | sub A   | 1  1  0
1       1   1       | add 0   | -- 0  0

[Figure: partial-product bit logic: Ai & x1 and Ai<<1 & x2 are combined by a >=1 gate, then passed through an =1 gate with N to give PPi; N is also used as carry-in]

Not cheaper than an ADD but all recodes
can be done in parallel so we only pay
time penalty once (for first column)!
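The recoding table is equivalent to turning each bit triple into a digit -2·B_{K+1} + B_K + B_{K-1} in the range -2 to +2. The sketch below assumes an even multiplier width M, B_{-1} = 0 and a two's-complement multiplier (an unsigned multiplier would need a leading 0 bit); the function names are hypothetical.

```python
def booth_digits(b, m):
    """Radix-4 Booth digits of the m-bit pattern b, least significant first."""
    digits = []
    prev = 0  # B_{-1}
    for k in range(0, m, 2):
        b_k = (b >> k) & 1
        b_k1 = (b >> (k + 1)) & 1
        # 011 -> +2 (add 2*A), 100 -> -2 (sub 2*A), 000 and 111 -> 0, ...
        digits.append(-2 * b_k1 + b_k + prev)
        prev = b_k1
    return digits


def booth_multiply(a, b, m):
    """Multiply a by the two's-complement value of the m-bit pattern b."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_digits(b, m)))
```

For the 4-bit multiplier 0110 the digits are -2 and +2, i.e. 6 = -2 + 2·4, so only two partial products are summed instead of four.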
MicroLab, VLSI-12 (22/29)

JMM v1.4
16x32 Booth Multiplier

This multiplier only produces a 32-bit result so
top 16-bits of “rhombus” have been omitted:
[Figure: multiplier layout, 32 bits wide, top 16 bits omitted]

MicroLab, VLSI-12 (23/29)

JMM v1.4
Serial Multiplication

• bit-serial multipliers are very compact, but suffer from
high data latency and are very slow
• simplest form of serial multiplier: successive addition

[Figure: serial multiplier by successive addition: A and B gated by & cells into a one-bit adder with a carry FF (cout, reset/clr, clk) and an N-1 bit register holding the result]

M+N bit product -> td=MN time intervals

MicroLab, VLSI-12 (24/29)

JMM v1.4
Serial/parallel and Pipelined
Multiplication
• serial/parallel multiplier: very modular structure
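[Figure: serial/parallel multiplier: Y0 to Y3 applied in parallel to a row of & cells, X shifted in serially (X0, X1, then 0, 0), product P shifted out]

M+N bit product -> td=M+N time intervals, but time intervals are larger

• pipelined multiplication: 2 delay elements per cell

[Figure: pipelined multiplier cell with inputs Yn, Xj and PPin and outputs Xj+1 and PPout]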

MicroLab, VLSI-12 (25/29)

JMM v1.4
Shifters
• Shifters are very important for microprocessor
architectures:
– arithmetic shifting
– logical shifting
– rotation functions
• barrel shifter constructed by transmission gates
[Figure: 4-bit barrel shifter: transmission-gate array connecting input0 to input6 with result0 to result3 under control of shift0 to shift3]

Operation: input
logical right shift 0,0,0,A(3:0)
logical left shift A(3:0),0,0,0
right rotate A(2:0),A(3:0)
left rotate A(3:0),A(2:0)
arithmetic right shift A3,A3,A3,A(3:0)
arithmetic left shift A(3:0),A0,A0,A0
MicroLab, VLSI-12 (26/29)

JMM v1.4
Coming Up...
Next topic…
VLSI fabrication: processing steps, basic
structures, self-aligned processes, P and N devices.

Readings for next time…


Weste:
‹ Sections 8 thru 8.2.1.6 and 8.2.7.3
‹ 8.2.7 thru 8.2.8

Self study Weste:


‹ parity generators 8.2.2
‹ comparators 8.2.3

‹ zero/one detectors 8.2.4

‹ binary counters 8.2.5

‹ Boolean operations - ALUs 8.2.6

MicroLab, VLSI-12 (27/29)

JMM v1.4
Exercises: VLSI-12

Ex vlsi12.1 (difficulty: medium): Develop a 1 bit full
adder with not more than 3 fets in series for the
not(sum) and not more than 2 fets in series for the
not(carry) circuit. The not(carry) signal can be
used for the sum circuit.
Result: Notice that the n- and pfet blocks are
identical and not complementary.
[Figure: transistor-level full adder producing Carry and Sum from A, B and C, with identical n- and pfet networks]

MicroLab, VLSI-12 (28/29)

JMM v1.4
Exercises: VLSI-12 con’t
Ex vlsi12.2 (difficulty: easy): A 32-bit adder is built as a
carry-select adder. Each adder as well as the muxes have
one delay unit. Find the optimal structure with respect to
speed.
Result: The maximum speed is 9 time units for a structure
with stages 4-4-4-5-6-7-6 (see Weste pp532)
Ex vlsi12.3 (difficulty: easy): A hierarchical carry-lookahead
adder (see slide 8) is given. Show
algebraically that C_3 = G_03 + P_03 Cin corresponds to the
equation C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3
P_2 P_1 P_0 Cin (note that G_ii = G_i and P_ii = P_i)
Ex vlsi12.4 (difficulty: easy, time consuming): Write
VHDL code for a 32-bit hierarchical carry-lookahead
adder (see slide 8). If one block has a delay of 1 time
unit, what is the overall delay?
Result: The total delay is 9 time units
Ex vlsi12.5 (difficulty: medium): Consider X1 as a late
arriving input which needs to be sped up. Develop the
circuit for the function: f = X1⋅X2 + X1⋅X3 + X4⋅X5

MicroLab, VLSI-12 (29/29)

JMM v1.4
VLSI Systems Design
Design Project: Practical Aspects

I am a VHDL expert.
But how do I apply it
in real life, for my MP3 player?

Overview
applying the “description-synthesis” design
method in practice

Goal: You are able to master your own VHDL project. You
have basic notions about HW/SW co-design.

MicroLab, VLSI-13 (1/24)

JMM v1.4
Project Goal
Š Goal:
design of an electronic system from specification
down to ASIC/FPGA
Š Problem:
one of the most difficult tasks in a VLSI project
design is to find the starting design point
Š Basic Steps:
in order to proceed in a structured manner, you
should perform the following steps
Š block diagram
Š HW/SW co-design (hardware/software co-design)
Š IP cores (intellectual property cores)
Š hardware: FSMD architecture model; software: structured software design
Š hardware: VHDL coding & simulation; software: C coding, compiling
Š hardware/software system simulation
Š synthesis, place & route
Š back-annotation & simulation (formal design verification)
Š chip test
MicroLab, VLSI-13 (2/24)

JMM v1.4
Initial System Design Steps
Š System design steps
block diagram:
1. identify your chip in the overall system
2. define the chip IO and group them to blocks
3. identify functional units of your chip
4. identify the interconnection between your units
HW/SW co-design:
5. identify speed sensitive (HW) and control sensitive (SW)
tasks
6. define the “intelligence” of each functional unit
IP cores:
7. identify IP cores
8. organize as many IP cores as possible (tools, core
generators, old designs, internet)
9. update design if necessary according to available IP cores
10. define inter-process communication
11. define the interconnections between your units

Š In the classical HW/SW co-design approach, the
design process is continued as long as possible
independent of its implementation. HW/SW design
units are identified at the very end of the design
steps. In smaller designs, as it is in our case, the
HW/SW co-design step is done in an early phase.
MicroLab, VLSI-13 (3/24)

JMM v1.4
Project MP3 Player: step 1
(block diagram)
Š Step 1: identify your chip in the overall system

[Figure: system block diagram: MP3 Player ASIC/FPGA connected to USB, LCD, Keyboard, MP3 Decoder, Power, Flash Memory and DAC]

MicroLab, VLSI-13 (4/24)

JMM v1.4
Project MP3 Player: step 2-4
(block diagram)
Š Step 2: define the chip IO and group them to
blocks
Š Step 3: identify functional units of your chip
Š Step 4: find the interconnections between your
units
[Figure: MP3 Player ASIC/FPGA with the units power management, main control, LCD interface, I2C interface, keyboard interface, Decoder interface, I2S interface, USB interface, Flash interface and DAC interface]

MicroLab, VLSI-13 (5/24)

JMM v1.4
Project MP3 Player: step 5
(HW/SW Co-Design)
Š Step 5: identify speed and control sensitive tasks
Š Step 6: define the “intelligence” of each
functional unit
[Figure: the MP3 Player ASIC/FPGA block diagram from steps 2-4, with units annotated as “control sensitive”, “speed sensitive” or “add intelligence”]


MicroLab, VLSI-13 (6/24)

JMM v1.4
Project MP3 Player: step 7-8
(Hardware Design)
Š Step 7: identify IP cores
Š Step 8: organize as many IP cores as possible
(tools, core generator, old designs, internet)

[Figure: the MP3 Player ASIC/FPGA block diagram with a PIC core (main control) and a USB core added as IP cores next to the power management, LCD, I2C, Decoder, keyboard, I2S, USB, Flash and DAC interfaces]

MicroLab, VLSI-13 (7/24)

JMM v1.4
Project MP3 Player: step 9-11
(Hardware Design)
Š Step 9: update design if necessary according to
available IP cores
Š Step 10: define inter-process communication
Š Step 11: define the interconnection between units

[Figure: updated MP3 Player ASIC/FPGA block diagram: PIC core (main control) with Ports A to D, power management, LCD interface, Decoder interface, DAC interface, USB core, and “intelligent” keyboard, USB, flash, I2S and I2C interfaces]

MicroLab, VLSI-13 (8/24)

JMM v1.4
Hardware/Software Design Steps
Š Hardware design project steps:
FSMD architecture model:
I. imagine your chip working in the target system, identify
and describe its basic functional units in a data-flow view
II. find the RTL structure of each of the above data-flow
functions and update your block diagram by allocating your
RTL structure to one or more functional units
III. fix in detail the operation of your functional units (local
intelligence or data-path only) and add FSMs if required,
fix the detailed interconnections between your units
IV. design all FSMs, define clock strategy, use colored data-flow,
be careful with the inter-process communications
VHDL coding:
V. VHDL coding of your RTL design
VI. test bench design
VII. simulate your VHDL design with test bench

Š Software design project steps:
structured software design:
I. design the software structure as learned in SW
engineering courses
II. define the data structure
III. define the HW/SW communication
C coding:
IV. develop the C code
V. compile & verify your C code


MicroLab, VLSI-13 (9/24)

JMM v1.4
Project MP3 Player: step I
(Hardware design project steps)
Š Step I: imagine your chip working in the target
system, identify and describe its basic functional
units in a data-flow view
Š download MP3 song from host to flash
memory (flow 1):
✓ generate flash command, generate flash address
✓ load byte from USB into register
✓ use byte to execute ECC (Hamming code)
✓ update flash address
✓ store byte into flash
✓ write ECC code after 512 bytes
✓ generate write-to-flash after 512 bytes
✓ use pipeline structure to speed up data transfer
[Figure: the MP3 Player ASIC/FPGA block diagram from steps 9-11 (PIC core with Ports A to D, USB core, power management, LCD, Decoder and DAC interfaces, “intelligent” keyboard, USB, flash, I2S and I2C interfaces)]

MicroLab, VLSI-13 (10/24)

JMM v1.4
Project MP3 Player: step II
(hardware design project steps)
Š Step II: find the RTL structure of each of the
previous data-flow functions and update your
block diagram by allocating your RTL
structure to one or more functional units
Š download MP3 song from host to flash
memory (flow 1):
[Figure: RTL structure between the USB interface and the Flash interface: a counter (count, enable, in, out, clk), an ECC generator and a command register (each with enable, in, out, clk) and a mux (sel) driving the pads to flash mem]
MicroLab, VLSI-13 (11/24)

JMM v1.4
Project MP3 Player: step III
(hardware design project steps)
Š Step III: fix in detail the function of your
functional units (local intelligence or data-path
only) and add FSMs if required, fix the detailed
interconnections between your units

[Figure: MP3 Player ASIC/FPGA partitioning: PIC core with Ports A to D running the software (C code); USB core as hardware (IP core); power management; “intelligent” keyboard interface, Flash & I2S interface and I2C/LCD interface, each as FSMD architecture]

MicroLab, VLSI-13 (12/24)

JMM v1.4
Project MP3 Player: step IVa
Š Step IVa: design all FSMs, define clock strategy, use
colored data-flow, be careful with the inter-process
communications
Š Clock strategy: Rising edge for data-paths, falling edge for IP
cores and FSMs. All handshake signals between FSMDs and IP
cores on falling edge.
Š Colors: make a lot of copies of your RTL data path
Š Colors: for each data-flow step, color the old active data paths
leaving a register blue, the new active data-paths leaving a
register green, and data-paths treated with a combinatorial
function in the corresponding dark color. Active control signals
and their blocks are orange. All other data-signals are red. Red
signals are dominant. Be sure that no red signals enter an FSM,
and no darkened or red signals drive the asynchronous set/reset of
FFs.
[RTL diagram to be colored: counter (enable, clk, in, out, count), ECC generator (enable, clk, in, out), register (enable, clk, in, out), mux (sel), command signal, pads to flash mem]

Project MP3 Player: step IVb
Š Step IVb: design all FSMs, define clock strategy, use
colored data-flow, be careful with the inter-process
communications
Š we decide to use 3 different FSMs in addition to the ones
present in IP cores
Š the PIC processor core is the main unit, which
communicates with all other FSMD or core units, thus use
inter-process communication. There is no communication
in-between the other units.
[Block overview: Software: C code, “intelligent” keyboard (FSMD). Hardware: IP core, “intelligent” Flash & I2S interface (FSMD), “intelligent” LCD interface (FSMD)]
[Inter-process communication diagram: process 1 and process 2 exchange request, acknowledge, data and data valid signals]
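
The request/acknowledge exchange between two processes can be tried out in software before it is coded in hardware. The following is a minimal sketch, assuming a four-phase handshake in which the request line also marks the data as valid; the class names `Sender` and `Receiver` and the cycle-based simulation loop are hypothetical and not taken from the slides.

```python
class Sender:
    """Process 1: offers one item at a time and waits for the acknowledge."""
    def __init__(self, items):
        self.items = list(items)
        self.req = False
        self.data = None

    def step(self, ack):
        if not self.req and not ack and self.items:
            self.data = self.items.pop(0)  # put data on the bus
            self.req = True                # raise request (data valid)
        elif self.req and ack:
            self.req = False               # acknowledge seen: release request


class Receiver:
    """Process 2: latches the data on request and answers with acknowledge."""
    def __init__(self):
        self.ack = False
        self.received = []

    def step(self, req, data):
        if req and not self.ack:
            self.received.append(data)     # take the data
            self.ack = True                # raise acknowledge
        elif not req and self.ack:
            self.ack = False               # request released: release acknowledge


def simulate(items, max_cycles=1000):
    sender, receiver = Sender(items), Receiver()
    for _ in range(max_cycles):
        # Both processes sample the registered outputs of the previous cycle.
        req, data, ack = sender.req, sender.data, receiver.ack
        sender.step(ack)
        receiver.step(req, data)
        if not sender.items and not sender.req and not receiver.ack:
            break
    return receiver.received
```

Each transfer takes four cycles in this model, one per handshake phase, which shows why the handshake signals should all be registered on the same clock edge.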

Project MP3 Player: step V
Š Step V: VHDL coding of your RTL design
Š use one process for data-path manipulation and its
succeeding register
Š use 2 processes for a FSM:
Š one process for transition table (VHDL case)
Š one process for next state (state register)
Š continuous assignment for output function
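
The FSM coding structure listed above (transition table, state register, output function) can be prototyped in Python before writing the VHDL. This is a minimal sketch under stated assumptions: the states `IDLE`, `LOAD`, `WRITE` and the inputs `byte_ready` and `block_full` are hypothetical and only illustrate the split into three parts.

```python
IDLE, LOAD, WRITE = "IDLE", "LOAD", "WRITE"  # hypothetical state names


def next_state_logic(state, byte_ready, block_full):
    """Combinational transition table (the VHDL 'case' process)."""
    if state == IDLE:
        return LOAD if byte_ready else IDLE
    if state == LOAD:
        return WRITE if block_full else IDLE
    if state == WRITE:
        return IDLE
    raise ValueError(f"unknown state {state!r}")


def output_logic(state):
    """Output function (the continuous assignment): depends on the state only."""
    return {"enable": state == LOAD, "write_ecc": state == WRITE}


class StateRegister:
    """Clocked process: the state changes only on the active clock edge."""
    def __init__(self):
        self.state = IDLE

    def clock_edge(self, byte_ready, block_full):
        self.state = next_state_logic(self.state, byte_ready, block_full)
        return output_logic(self.state)


def run(stimulus):
    """Apply one (byte_ready, block_full) pair per clock and record the states."""
    reg = StateRegister()
    trace = []
    for byte_ready, block_full in stimulus:
        reg.clock_edge(byte_ready, block_full)
        trace.append(reg.state)
    return trace
```

Keeping the transition table free of any stored state makes it easy to test exhaustively on its own.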

[RTL diagram split into VHDL processes: Process 1 covers the ECC generator and its register (enable, clk, in, out), Process 2 covers the mux (sel); counter (enable, clk, count), command signal and pads to flash mem as before]

Project MP3 Player: step VI
Š Step VI: test bench design
the design of a test bench is one of the most time
consuming and important tasks. A test bench will be
re-used several times during the different design
steps as well as for chip test (have a look at vlsi21)

[Figure: Test Bench consisting of control and stimulus generation, response generation and verification, and the device under test (DUT)]
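
The three parts of a test bench (stimulus generation, an independent expected response, and verification against the DUT) can be sketched in Python. This is only an illustration under stated assumptions: the DUT is a hypothetical byte counter that flags every 512th enabled byte, and all names are invented for this example.

```python
import random


class ByteCounterDUT:
    """Hypothetical device under test: counts enabled bytes, flags every 512th."""
    BLOCK = 512

    def __init__(self):
        self.count = 0

    def clock(self, enable):
        done = False
        if enable:
            self.count += 1
            if self.count == self.BLOCK:
                self.count = 0
                done = True
        return done


def generate_stimulus(n_cycles, seed=0):
    """Stimulus generation: a reproducible pseudo-random enable pattern."""
    rng = random.Random(seed)
    return [rng.random() < 0.8 for _ in range(n_cycles)]


def expected_response(stimulus, block=512):
    """Response generation: a reference model written independently of the DUT."""
    seen = 0
    out = []
    for enable in stimulus:
        seen += enable
        out.append(bool(enable) and seen % block == 0)
    return out


def run_test_bench(dut, stimulus):
    """Verification: return (cycle, expected, got) for every mismatch."""
    errors = []
    for cycle, (enable, exp) in enumerate(zip(stimulus, expected_response(stimulus))):
        got = dut.clock(enable)
        if got != exp:
            errors.append((cycle, exp, got))
    return errors
```

Because the stimulus and the reference model do not depend on the DUT implementation, the same bench can be re-run against each refinement of the design.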

Final System Design Steps
Š Hardware design project steps:
12. system test bench design
13. hardware/software system simulation with test bench
14. synthesis of logic level design
15. simulation of logic level with test bench
16. place & route your design for target technology
17. back annotation and simulation with test bench
18. (formal design verification)
19. chip fabrication
20. chip test with test bench
21. in system test
[Side labels on the slide group these steps into phases: system simulation, synthesis, place and route, verify, test]

Block diagram of a general System

Š A general system is composed of three elements:
Š user
Š algorithm
Š plant
Š all three items interact with each other resulting in
2 closed loops
Š The closed loops may have real-time constraints

GECKO Design Environment
Š Design entry:
Š C-code software
Š manual RTL hardware
Š algorithms
Š All three design entry elements will be converted
to VHDL and thus can be implemented into a SoC

SoC Design Methodology

Š The specify-explore-refine design flow is extended
to a specify-explore-refine-prototype-analyse
design flow for SoC designs with real-time
constraints

SoC with GECKO Environment

Š An SoC design using the GECKO system supports
the two chip approach
Š GECKO main board for digital part
Š application specific GECKO expansion board for analog,
power, HF part

[Block diagram: Gecko main board with the SoC (real time software on a microprocessor IP core, signal processing hardware as hardware IP blocks); expansion board with power blocks, analog blocks and sensor]

The GECKO system

GECKO Interface Driver

GECKO main board

GECKO main board on top of an application specific
GECKO expansion board
(RFID reader application, 2 W 13.56MHz RF power)

Hardware-in-the-Loop

Š to iteratively improve a design, fast prototyping and
data analysis steps are necessary
Š plants that are difficult to model are preferably not
modeled but directly included in the simulation
loop
Š variable cut between simulation and hardware
Š respect real-time constraints

[Figure labels: hardware-in-the-loop; hardware-in-the-software-loop]

Homework: MyProject

Š define your own project


Š plan the development and use the presented design
methodology
Š prepare the presentation of your project, be sure
you do have all the necessary documentation for the
discussed design steps

Š MyProject 2002: speed controlled dc motor
Š Matlab/Simulink with speed controller
Š GECKO main board with dc-motor electronics
Š use hardware-in-the-simulation-loop
Š Implementation constraints:
Š microprocessor with C code for „administrative“ tasks
Š pulse width modulation for driving dc motor (hardware)
Š A/B signal encoder for speed sensing (hardware)
Š driving circuitry (expansion board) as simple as possible
Š Technical data:
Š dc motor has 6000 turns/minute at 5V
Š speed sensor has 12 pulses per turn
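
The technical data fix the pulse rate the speed-sensing hardware has to handle. The short sketch below works through that arithmetic; the gate time used for counting pulses is a hypothetical design parameter, not a value given in the project description.

```python
def encoder_pulse_rate_hz(rpm, pulses_per_turn):
    """Pulses per second delivered by the speed sensor at a given motor speed."""
    return rpm / 60.0 * pulses_per_turn


def speed_from_count_rpm(pulse_count, gate_time_s, pulses_per_turn):
    """Motor speed estimated from the pulses counted during one gate time."""
    return pulse_count / pulses_per_turn / gate_time_s * 60.0


def speed_resolution_rpm(gate_time_s, pulses_per_turn):
    """Speed change that corresponds to one additional counted pulse."""
    return speed_from_count_rpm(1, gate_time_s, pulses_per_turn)


# 6000 turns/minute at 5V with 12 pulses per turn gives 1200 pulses per second.
FULL_SPEED_RATE = encoder_pulse_rate_hz(6000, 12)
```

With an assumed gate time of 10 ms, only 12 pulses arrive at full speed, so one pulse corresponds to 500 turns/minute; a longer gate time or measuring the pulse period instead would give a finer speed reading.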

VLSI Design II
CMOS Processing

Overview
‹ Processing steps

‹ processing step sequence

Goal: You know the basics of integrated circuit
processing steps and you are familiar with the
processing sequence of a sample CMOS technology.

Introduction
Š Complementary MOS (CMOS) technology is
becoming the dominant candidate for VLSI
applications
Š CMOS provides both n-channel and p-channel MOS
transistors on one chip
Š cheap chips are produced in extremely expensive
fabs
Š each chip passes hundreds of different processing
steps
Š random process disturbances cause electrical
parameter variations of the chips
Š elements are never identical

Process technology pictures and text are copied from:
Atlas of IC Technologies, W. Maly, The Benjamin Cummings
Publishing Company, ISBN 0-8053-6850-7

VLSI Circuit Fabrication
Š oxidize silicon to form thin and thick layers of SiO2 to serve as
insulators.
Š deposit thin layers of material and etch into desired pattern
Š diffuse dopants into substrate to create P/N junctions
Š implant ions to set thresholds and achieve precise dopant profiles

Most fabrication steps require first creating a mask that determines
where the operation will occur. Masks can either be existing layers on
the IC (these masks are “self-aligned”) or created using a lithographic
process and photoresist.

Design rules ensure that the design is still functional in the face of
misalignments and various side-effects of the fabrication process.

Overview of Processing Step Sequence
[Figure: mask sequence n-well, active, poly, n-diffusion, p-diffusion, contacts, metal1, via1, metal2, passivation]

Overview of Processing Steps
‹ making the wafers
‹ photolithography
‹ oxidation
‹ layer deposition
‹ etching
‹ diffusion
‹ implantation

Processing Steps:
Making the wafers

Š the basic raw material used is a wafer or disk of
silicon which varies from 3” to 12” in diameter
Š wafers are thin slices (less than 1mm) cut from
cylindrical semiconductor ingots
Š first step in IC processing is the production of a
single-crystal ingot starting from a silicon melt
with a controlled amount of impurities

Processing Steps:
Photolithography #1

Š Photolithography is a technique
used in IC fabrication to transfer a desired pattern
onto the surface of a silicon wafer. As such,
photolithography is a key step in the entire circuit
integration process.

Š alternative method for lower quantities: direct write
procedure (E-beam)

Processing Step:
Photolithography #2

Processing Steps:
Oxidation #1
Thermal oxidation is a process in which silicon (Si)
reacts with oxygen to form a continuous layer of
high-quality silicon dioxide (SiO2)
Š oxidation of the silicon surface
Š oxidation through a window in the oxide
Š selective oxide growth
oxidation of the silicon surface

Processing Steps:
Oxidation #2

oxidation through
a window

selective
oxide growth

bird’s beak

Processing Steps:
Layer Deposition - General
Thin layers of both conducting substances and
insulating materials constitute an important part of
any semiconductor device.
Š epitaxy (single crystal deposition)
Š PVD and CVD process (polycrystalline deposition)

Processing Steps:
Vapour Deposition

PVD

CVD

Processing Steps: Etching

The process that immediately follows the
photolithography step is the removal of material
from areas of the wafer unprotected by photoresist.
Etching processes are characterized by their selectivity and anisotropy.

Š wet etching

Š dry etching

Processing Steps:
Diffusion
Solid state diffusion is a process which allows
atoms to move within a solid at elevated
temperatures.

Processing Steps:
Implantation
The alternative to the diffusion technique of dopant
introduction used in IC manufacturing is ion
implantation.

N-Well Implant & Drive-in

In p substrate only n-channel fets can be processed.
Therefore an n-well has to be implanted in order to hold
the p-channel fets.

Window in the mask and cross section illustrated.

Channel-stop Implant

A “thick” (0.4um) layer of silicon dioxide, called field
oxide, is formed on the surface by oxidation in wet
oxygen. This is then etched to expose surface where we
want to make fets.

Grow Field Oxide

Formation of active regions for n-channel and p-channel
fets of the CMOS process. The obtained bird’s beak
causes the active area of the device to be significantly
smaller.

Grow Thin Oxide
Now grow a “thin” (0.01um = 100 Angstroms) layer of
silicon dioxide, called gate oxide, on the surface by
exposing the wafer to dry oxygen.

The gate oxide needs to be of high quality: uniform
thickness, no defects! The thinner the gate oxide, the
more oomph the fet will have (we’ll see why soon) but the
harder it is to make it defect free.
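
One quantity that grows as the oxide gets thinner is the capacitance per unit area between gate and channel. The sketch below compares the 0.01um gate oxide with the 0.4um field oxide using the parallel-plate formula; the relative permittivity of 3.9 for SiO2 is a textbook value assumed here, not a number taken from these slides.

```python
EPS0_F_PER_M = 8.854e-12  # vacuum permittivity
EPS_R_SIO2 = 3.9          # assumed relative permittivity of silicon dioxide


def oxide_capacitance_fF_per_um2(t_ox_um):
    """Parallel-plate capacitance per area, C = eps0 * eps_r / t_ox."""
    t_ox_m = t_ox_um * 1e-6
    c_f_per_m2 = EPS0_F_PER_M * EPS_R_SIO2 / t_ox_m
    return c_f_per_m2 * 1e3  # 1 F/m^2 equals 1e3 fF/um^2


GATE_OXIDE = oxide_capacitance_fF_per_um2(0.01)   # thin oxide, 100 Angstroms
FIELD_OXIDE = oxide_capacitance_fF_per_um2(0.4)   # thick field oxide
```

Under these assumptions the gate oxide gives roughly 3.45 fF per square micron, forty times more than the field oxide, because the capacitance scales inversely with thickness.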

Deposit & Etch Polysilicon

On top of the thin oxide a 0.7um thick layer of
polycrystalline silicon, called polysilicon or poly for
short, is deposited by CVD. The poly layer is patterned
and plasma etched (thin ox not covered by poly is etched
away too!) exposing the surface where the source and
drain junctions will be formed:

Implant Nfet Drain & Source

The entire surface is doped, either by diffusion or ion
implantation, with phosphorus (an electron donor) which
creates two n-type regions in the substrate and an ohmic
contact in the n-well. The phosphorus also penetrates the
poly reducing its resistance and affecting the nfet’s
threshold.

Effective Nfet Dimensions

Parasitic Fets

Implant Pfet Drain & Source

Once again the entire surface is doped, either by diffusion
or ion implantation, with boron (an electron acceptor)
which creates two p-type regions in the n-well and an
ohmic contact in the substrate.

Deposit SiO2 insulator

Finally an intermediate oxide layer is grown for isolation and
then reflowed to flatten its surface.

Etch contact cuts

Holes are etched in the oxide where contacts to poly/diff
are wanted.

Deposit & Etch Metal1

For interconnections aluminium is deposited, patterned and
etched.

Voila: a CMOS Inverter!

Finally a passivation layer protects the wafer surface from
contamination and scratches. Pads are opened for bonding.

Planarize

Deposit & Etch Metal2

N-well, Double-level Metal CMOS
Process Steps
1. Grow barrier oxide
2. Mask/Etch n-well window
3. P n-well implant through SiO2
4. Thermal drive-in to deepen n-well
5. Remove barrier oxide
6. Grow “pad” oxide
7. Deposit Si3N4
8. Mask/Etch leaving active region
9. B channel-stop implant
10. Grow field oxide (more drive-in!)
11. Remove Si3N4
12. Remove pad oxide
13. B or P implant to adjust VTH
14. Grow thin (gate) oxide
15. Deposit P-doped polysilicon
16. Mask/Etch leaving poly wires
17. Etch exposed thin oxide
18. Mask off p-diffusion regions
19. Sb or As nfet source/drain implant, n-well contact too
20. Mask all but p-diffusion regions
21. B pfet source/drain implant
22. Thermal source/drain annealing
23. Deposit SiO2 using CVD
24. Mask/Etch contacts
25. Deposit first Al using PVD
26. Mask/Etch leaving metal1 wires
27. Grow thick layer of SiO2
28. Spin on thick, flat layer of photoresist
29. Etch SiO2 and photoresist at same rate until only flat SiO2 remains
30. Mask/Etch vias through SiO2
31. Deposit second Al using PVD
32. Mask/Etch leaving metal2 wires
33. Deposit overglass to passivate circuit
34. Mask/Etch pad windows

Coming Up...
Next time:
Mask layout: design rules, layout examples,
structured and symbolic layout techniques,
retargetable layouts. CAD tools for layout: design
capture, design rule checking, extraction, network
comparison.

Readings for next time…


Weste:
‹ Chapter 3 thru 3.2.3
Johns&Martin:
‹ 2 through 2.1 (CMOS processing)
Transparencies:
‹ transparency notes (process technology)

Study CBT course on the web or on I3S-CD:
How a silicon integrated circuit is made (Uni
Manchester)

Exercises: VLSI-14

‹ Weste pp168: 3.8 ex 5 (difficulty: easy): Explain
why substrate and well contacts are important in
CMOS.

VLSI Design II
CMOS Layout

Measure twice, fab once

Overview
‹ CMOS Layout and Design Rules

‹ Analog Layout Design Considerations

Goal: You are familiar with the basic layout design
rules of the Alcatel 0.5µm CMOS process. You
know how to layout integrated transistors,
capacitors and resistors, and what has to be
considered in order to realize quality analog
circuits, like matching and shielding.

Sources of Error
‹ Line registration errors
resist exposure and development
over/under etching, lateral diffusion
uneven topography
Ö systematic errors corrected by bloating/
shrinking mask
Ö random errors increase minimum widths
and spacing
‹ Mask misalignment
Ö random errors increase extensions and
surrounds
‹ Other fab difficulties
Ö contacts and vias only on “flat” surfaces
Ö no devices near boundaries of well
Ö no poly contacts over diffusion
Ö “gate” metal must connect to diffusion
Ö minimum metal coverage requirements
‹ Electrical properties
Ö current density limitations
Ö latch-up prevention
‹ Process instabilities
mobility variations (why?)
thin-oxide thickness variations
sheet resistances
Ö use of “process corners” in analysis

Design vs. Actual IC

Line Registration Errors

Mask Alignment Errors (I)

Mask Alignment Errors (II)
Maly, Figure 2-9

Design Rules
Exclusion rule extension rules
enclosure rules (overlapping)

width rules

spacing rules
‹ We can specify the design rules using some convenient
units, e.g., microns but what happens if we want to
manufacture the chip using different manufacturers?
‹ One suggestion: use an abstract unit, the lambda, and scale
the design to the appropriate actual dimensions when the
chip is to be manufactured.
‹ Usually all edges must be “on grid”, e.g., in the MOSIS
scalable rules, all edges must be on a half lambda grid, on
the 0.5µm Alcatel all edges must be on the 0.05µm grid.

Lambda-based Rules
One lambda (λ) = one half of the “minimum” mask
dimension, typically the length of a transistor channel.
Under the assumption that the worst case alignment is
better than 0.75λ, the maximum relative misalignment
between any two masks is better than 1.5λ. This can be
used to derive design rules and to estimate minimum
dimensions of a junction area and perimeter before a
transistor has to be laid out.

[Figure: example layout dimensioned in λ (labels 3λx3λ, 1λ, 2λ, 3λ, 4λ, 5λ, 6λ) with layers diffusion (active), poly, metal1, contact]

For 0.5µm Alcatel process:
λ = 0.25µm

Lambda vs. Micron Rules
Lambda-based design rules are based on the assumption
that one can scale a design to the appropriate size before
manufacture. The assumption is that all manufacturing
dimensions scale equally, an assumption that “works” only
over some modest span of time. For example: if a design
is completed with a poly width of 2λ and a metal width of
3λ then minimum width metal wires will always be 50%
wider than minimum width of poly wires.

Consider the following data from Alcatel 0.5µm process
(compare with Weste, Table 3.2 pp145):

contacted metal pitch     lambda rule   lambda = 0.25µ   micron rule
1/2 * contact size        1.5λ          0.375µ           0.3µ
contact surround          1λ            0.25µ            0.25µ
metal-to-metal spacing    4λ            1.0µ             0.8µ
contact surround          1λ            0.25µ            0.25µ
1/2 * contact size        1.5λ          0.375µ           0.3µ
total                     9λ            2.25µ            1.9µ

+40% in area
Scaled design is legal
but much larger than
it needs to be!
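
The +40% figure follows directly from the contacted metal pitch table. The sketch below redoes the sums in Python using the table values; the function names are made up for this illustration, and the area estimate assumes the same pitch ratio applies in both x and y.

```python
LAMBDA_UM = 0.25  # lambda for the 0.5um process, as stated in the notes

# (term, lambda rule in units of lambda, micron rule in um), copied from the table
PITCH_TERMS = [
    ("1/2 * contact size", 1.5, 0.3),
    ("contact surround", 1.0, 0.25),
    ("metal-to-metal spacing", 4.0, 0.8),
    ("contact surround", 1.0, 0.25),
    ("1/2 * contact size", 1.5, 0.3),
]


def contacted_metal_pitch(terms=PITCH_TERMS, lam=LAMBDA_UM):
    """Return (pitch under lambda rules, pitch under micron rules) in um."""
    lambda_pitch_um = sum(n_lambda for _, n_lambda, _ in terms) * lam
    micron_pitch_um = sum(um for _, _, um in terms)
    return lambda_pitch_um, micron_pitch_um


def area_penalty(lambda_pitch_um, micron_pitch_um):
    """Relative area increase when both dimensions scale with the pitch ratio."""
    return (lambda_pitch_um / micron_pitch_um) ** 2 - 1.0
```

The pitches come out as 2.25um and 1.9um, a linear ratio of about 1.18, which squared gives the roughly 40% area overhead quoted on the slide.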

Retargetable Layouts?

So, should one use lambda rules, or not?

‹ probably okay for retargeting between “similar”
processes, e.g., when later process is a simple
“shrink” of the earlier process. This often happens
between generations as a mid-life kicker for a
process. Some 0.35µm processes are shrinks of
an earlier 0.5µm process. Can be useful for
“fabless” semiconductor companies.
‹ most industrial designs use micron rules to get the
extra space efficiency. Cost of retargeting by hand
is acceptable for a successful product, but usually
it’s time for a redesign anyway.
‹ invent some way of entering a design symbolically
but use a more sophisticated technique for
producing the masks for a particular process.
Insight: relative sizes may change but topological
relationship between components does not. So,
instead of shrinking a design, compact it!

0.5µm CMOS Alcatel Mietec Process
Layers and mask definition: C05M-D

layer name    drawn   mask name
active        yes     active
nwell         yes     n-well
pwell         no      (p-well)
poly          yes     poly
nplus         no      (n+ implant)
pplus         yes     p+ implant
contact       yes     contact
metal_1       yes     metal 1
via_1         yes     via 1
metal_2       yes     metal 2
via_2         yes     via 2
metal_3       yes     metal 3
nitride       yes     nitride
dractext      yes     -
nldd          no      (no low doped drain, Zener)
nlddprot      yes     -
nplusprot     yes     -

C05M-D: some logical descriptions

logical name       used masks
nwell            = nwell
pwell            = not nwell
n+diffusion      = active and not pplus and not poly
p+diffusion      = active and pplus and not poly
n+source/drain   = active and not pplus and not poly and not nwell
p+source/drain   = active and pplus and not poly and nwell
gate             = active and poly

[Figure: logical layers (nwell, n+diffusion, p+diffusion, poly) versus drawn masks (nwell, active, pplus) for a pfet and an nfet]

Layout Rules (C05M-D) #1

‹ n-well, active

[Figure: n-well and active layout rules for n-well, n strap and p strap. Dimension labels: 0.5µm, 0.6µm, 0.7µm, 0.8µm, 1µm, 1.1µm, 1.7µm, 2.4µm, and 2µm on same potential (3µm on different potential)]

Layout Rules (C05M-D) #2

‹ poly, fets

[Figure: poly and fet layout rules. Dimension labels: 0.35µm, 0.5µm, 0.6µm, 0.7µm, 1.1µm]

Layout Rules (C05M-D) #3

‹ abutting straps

[Figure: layout rules for abutting straps. Dimension labels: 0.6µm, 0.8µm, 1µm, 1.1µm, 1.15µm, 1.6µm]

Layout Rules (C05M-D) #4

‹ metal, contacts, via1, via2

[Figure: layout rules for metal, contact, via1 and via2. Dimension labels: 0.2µm, 0.25µm, 0.35µm, 0.5µm, 0.6µm, 0.7µm, 0.8µm, 0.9µm, 1µm, 1.1µm]
via1 need to be covered by metal2
contacts need to be covered by metal1

Sticks and Compaction

[Figure panels: Stick diagram; Horizontal constraints for compaction in X; Compact X then Y; Compact Y then X; Compact X with jog insertion, then Y]

Digital Layout: Choosing a “style”

Vertical Gates
Good for circuits where fets sizes are similar and each gate has limited
fanout. Best choice for multiple input static gates and for datapaths.

Horizontal Gates
Good for circuits where long and short fets are needed or where nodes
must control many fets. Often used in multiple-output complex gates
(e.g., sum/carry circuits).

What about routing signals between gates? Note that both layouts block
metal/poly routing inside the cell. Choices: metal2 routing over the cell or
routing above/below the cell.
Š avoid long (> 50 squares) poly runs
Š don’t “capture” white space in a cell
Š don’t obsess over the layout, instead make a
second pass, optimizing where it counts
Digital Layout:
Optimising Connections

Which is the better gate layout?

Š considering node capacitances?

Š considering “composability” with
neighbouring gates?

Digital Layout: Big vs. Parallel
can’t make gates too long because of poly
resistance! Eventually really large transistors
have to be broken into smaller transistors
wired in parallel.

[Figure: three alternative layouts, area = 94µm2, area = 73µm2, area = 133µm2]

Which is the better gate layout?

Š considering node capacitances?
Š considering “composability” with
neighbouring gates?

Digital Layout: Eliminating Gaps
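[Figure: two layouts of the same gate with inputs A, B, C, D, E. Gate order A B C D E leaves gaps in the diffusion; reordering the gates to C B A E D eliminates them]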

Analog Layout: Large Transistors

‹ W/L can be very large in analog circuits


‹ due to asymmetric layout, node1 has a smaller
capacitor which should be used for the most critical
node (high impedance)

[Figure: large transistor built from parallel fingers Q1 to Q4 sharing the gates, with junctions J1 to J5 connected to node 1 and node 2; layout and equivalent schematic]

Analog Layout: Matching
‹ Using lithography techniques a variety of two-dimensional
effects can cause effective sizes of
components to differ from the sizes of the glass
layout masks.
‹ lateral diffusion
‹ overetching
‹ mask misalignment ...

Goal: Matching second-order size error effects is done
mainly by making larger objects out of several unit-sized
components connected together. For best
accuracy, the boundary conditions around all objects
should be matched, even when this means adding
extra unused components.

[Figure: lateral diffusion of a well under the SiO2 mask; overetching of a poly gate under the SiO2 protection]

Matching Transistor Layouts:
Common-Centroid Layout
‹ use interdigitated finger structures for keeping the
effect of temp and oxide thickness gradients low
‹ use one outside finger for M1, one for M2
‹ symmetry in x & y
‹ fets in analog circuitry are typically much
wider than in digital circuits

[Figure: interdigitated fingers of M1 and M2 with common source SM1,M2, gates GM1 and GM2, drains DM1 and DM2; layout and schematic]

Capacitor Matching #1

‹ material
‹ preferable poly1 - poly2 structures (only C05M-A)
‹ if not available: poly1 - diffusion (C05M-D), but
nonlinear due to voltage dependency
‹ sandwich structures with poly - metal1

‹ in analog design very often precise ratios of
capacitors are used
‹ major sources of errors in realized capacitors are
due to overetching and, less relevant,
an oxide thickness gradient across the surface.
Goal: Larger capacitors are realized by a parallel
combination of smaller unit-sized capacitors
(overetching). If unit-size capacitors are not
realizable, overetching can still be minimized by
realizing a nonunit-sized capacitor with a specific
perimeter-to-area ratio. For very accurate ratios
additionally common-centroid layout is used (oxide
thickness gradient).

Capacitor Matching #2
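[Figure: drawn plate of size x by y; realized plate of size x_a by y_a after overetching by ∆e on each edge]

$x_a = x - 2\Delta e, \quad y_a = y - 2\Delta e$
$C = \frac{\varepsilon_{ox}}{t_{ox}} A = C_{ox}\,x\,y$
$C_a = C_{ox}\,x_a\,y_a = C_{ox}(x - 2\Delta e)(y - 2\Delta e)$
$\Delta C_t = C_{ox}\,x_a\,y_a - C_{ox}\,x\,y$
$\Delta C_t \cong -2\Delta e\,(x + y)\,C_{ox}$
$\varepsilon = \frac{\Delta C_t}{C} = \frac{-2\Delta e\,(x + y)}{x\,y}$
$\frac{C_{2a}}{C_{1a}} = \frac{C_2(1 + \varepsilon_2)}{C_1(1 + \varepsilon_1)}$
ideally $\varepsilon_1 = \varepsilon_2$
$\frac{C_{2a}}{C_{1a}} = \frac{n\,C_{1a}}{C_{1a}} = \frac{n\,C_1(1 + \varepsilon)}{C_1(1 + \varepsilon)} = n$

[Figure: matched capacitors C1 and C2 in a C1 C2 / C2 C1 arrangement with poly top plate, poly bottom plate, poly etch matching, well region and well contacts]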
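
The effect of overetching on a capacitor ratio can be checked numerically. The sketch below, with hypothetical function names and an assumed overetch of 0.1µm, compares a 4:1 ratio built from a single large plate with the same ratio built from four unit-sized plates in parallel.

```python
def first_order_error(x, y, de):
    """Relative capacitance error, eps = -2*de*(x + y)/(x*y)."""
    return -2.0 * de * (x + y) / (x * y)


def realized_area(x, y, de):
    """Plate area after each edge has moved inwards by de."""
    return (x - 2.0 * de) * (y - 2.0 * de)


def realized_ratio(big, small, de):
    """Capacitance ratio of two capacitors, each a list of (x, y) plates in parallel."""
    area_big = sum(realized_area(x, y, de) for x, y in big)
    area_small = sum(realized_area(x, y, de) for x, y in small)
    return area_big / area_small


UNIT = (10.0, 10.0)   # unit-sized plate in um
OVERETCH = 0.1        # assumed overetch per edge in um

single_plate = realized_ratio([(20.0, 20.0)], [UNIT], OVERETCH)  # about 4.08
unit_plates = realized_ratio([UNIT] * 4, [UNIT], OVERETCH)       # 4, up to rounding
```

The single 20µm by 20µm plate misses the intended ratio by about 2% because its relative error differs from that of the unit plate, while the parallel combination of unit plates cancels the error as in the derivation above.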
Capacitor Matching #3

‹ unit sized capacitors C1 are square
‹ nonunit-sized capacitors C2 are rectangular and
usually between 1 and 2 times unit-sized capacitors
(K>1)
$K = \frac{C_2}{C_1} = \frac{A_2}{A_1} = \frac{x_2 y_2}{x_1^2}$
‹ perimeter-to-area ratio should be kept identical
$\frac{P_2}{A_2} = \frac{P_1}{A_1}$
$\frac{P_2}{P_1} = \frac{A_2}{A_1} = K$
$K = \frac{x_2 + y_2}{2 x_1}$
$y_2 = x_1\left(K \pm \sqrt{K^2 - K}\right), \quad K = 1 \ldots 2$

[Figure: layout example with a 4 units capacitor]
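
The closing formula gives the side lengths of a nonunit-sized capacitor directly. The sketch below evaluates it for K = 1.314 and a 10µm unit side; reading the 2.314-unit capacitor of the exercise in these notes as one unit capacitor plus one 1.314-unit capacitor is an assumption, chosen because it reproduces the quoted result y2 = 19.56µm, x2 = 6.717µm.

```python
import math


def nonunit_capacitor(x1, k):
    """Sides (x2, y2) of a rectangle with area k*x1^2 and the unit perimeter-to-area ratio.

    Uses y2 = x1*(k + sqrt(k^2 - k)) and x2 = k*x1^2/y2; requires k >= 1.
    """
    if k < 1.0:
        raise ValueError("k must be at least 1")
    y2 = x1 * (k + math.sqrt(k * k - k))
    x2 = k * x1 * x1 / y2
    return x2, y2


def perimeter_to_area(x, y):
    return 2.0 * (x + y) / (x * y)


x2, y2 = nonunit_capacitor(10.0, 1.314)
```

Choosing the minus sign in the formula simply swaps the roles of x2 and y2, so only one root needs to be computed.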

Analog Layout: Resistor #1

‹ resistor value: $R = \frac{L}{W} R_{sq}$, $R_{sq} = \frac{\rho}{t}$

‹ material: many different materials can be used.
They have different non-ideal effects. Absolute
accuracy is low (±20% or less), matching can be
made to be in the order of 1% at best.
‹ polysilicon (salicided and non salicided in C05M-A and
C05M-D process)
‹ diffusions or ion-implanted regions (n/p-diff, n-well)

material        typ Rsq   temp coeff    nonideality
metal1          72mΩ      0             not used
metal2          55mΩ      0             not used
metal3          34mΩ      0             not used
salicid poly    2.3Ω      4300ppm/C     parasitic cap
n+ diff sal     2.3Ω      4300ppm/C     v dep, non lin
p+ diff sal     2.1Ω      4300ppm/C     v dep, non lin
unsal n+poly    325Ω      −2000ppm/C    parasitic cap
n+ diff unsal   50Ω       1600ppm/C     v dep, non lin
p+ diff unsal   70Ω       1600ppm/C     v dep, non lin
n-well          1.3kΩ     4300ppm/C     v dependent
(one material is marked “most commonly used” on the slide)
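
The resistor formula and the table values are enough for a first sizing estimate. The sketch below is a simplified calculation with hypothetical function names; it ignores contact resistance and corner squares and assumes a linear temperature coefficient.

```python
def squares_needed(r_target_ohm, r_sq_ohm):
    """Number of squares L/W needed, from R = (L/W) * Rsq."""
    return r_target_ohm / r_sq_ohm


def resistor_length_um(r_target_ohm, r_sq_ohm, width_um):
    """Length of a straight resistor of the given width."""
    return squares_needed(r_target_ohm, r_sq_ohm) * width_um


def resistance_at(r0_ohm, temp_coeff_ppm_per_c, delta_t_c):
    """Resistance after a temperature change, assuming a linear coefficient."""
    return r0_ohm * (1.0 + temp_coeff_ppm_per_c * 1e-6 * delta_t_c)


# 10 kOhm in unsalicided n+ poly (325 Ohm per square, -2000 ppm/C), 2 um wide
N_SQUARES = squares_needed(10e3, 325.0)        # about 30.8 squares
LENGTH_UM = resistor_length_um(10e3, 325.0, 2.0)
R_HOT = resistance_at(10e3, -2000.0, 50.0)     # 50 C warmer
```

The same 10 kOhm in salicided poly at 2.3 Ohm per square would need more than 4000 squares, which illustrates why the high sheet resistance materials are attractive for resistors.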
Analog Layout: Resistor #2
Examples of possible resistor layout

0.14 Rsq

2.11 Rsq
matched resistors

Analog Layout:
Noise Considerations #1
Where does noise coupling occur
‹ every time a digital gate changes its state a glitch
is injected on the digital power supply and in the
surrounding substrate
‹ direct ohmic connections (power supply line)
‹ via electromagnetic fields (e.g. capacitive coupling
in and from substrate)

How can noise be reduced
‹ use of different power supply lines
‹ layout analog and digital circuitry in different
sections of the chip
‹ protect analog layout by guard rings
‹ use shields connected to power and ground

[Figure: three power supply arrangements for the analog part and the digital part, with shared or separate pads and pins]


Analog Layout:
Noise Considerations #2
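‹ Use of shields
[Figure: ground shield (n+ in an n-well on p- substrate) under the analog interconnect, next to the digital interconnect]

‹ Separate analog and digital parts with guard rings
[Figure: guard rings between the analog region and the digital region: p+ rings tied to VSS and an n+ ring in an n-well tied to VDD; the depletion region in the p- substrate acts as bypass capacitor]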

Summary of Analog Layout Rules

When drawing layout for analog circuits, one has to
consider many details
‹ layout design rules, in order to get correct circuits
without short circuits between layers, or open circuits
due to misaligned layers
‹ avoid parasitic components
9 resistors: take care of length of interconnect wires and
material used for interconnects. Add enough contacts.
9 Capacitors: There is a parasitic capacitor between any
two layers separated by an isolation layer. Minimize size of all areas that do
not need to have a specific size for their functionality.
‹ Increase matching accuracy by
9 using common centroid layout
9 using non minimum sized components
9 using capacitors with constant area to perimeter ratio
‹ reduce noise coupling by
9 separating analog and digital parts
9 using separate power supplies
9 using shielding techniques

Checking Layouts
Design Rule Checker (DRC). This is a program that checks each
piece of the layout against the process design rules. This is a
slow process:
‹ canonicalize layout into a set of leading and trailing
non-overlapping mask edges. Some Boolean mask
operations may be needed.
‹ determine electrical connectivity
and label each edge with the node it belongs to.
‹ test each edge end point against neighboring
edges to check for spacing (leading edges) and width
(trailing edges) violations.
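
What a width and spacing check does can be shown with a deliberately simple sketch. It is not the edge-based method described above: it compares whole rectangles pairwise, which is easier to read but slower. Coordinates are in grid units, and the rule values used in the example are hypothetical.

```python
from itertools import combinations
from math import hypot


def check_layer(rects, min_width, min_space):
    """Brute-force width and spacing check for rectangles on one layer.

    rects: list of (x0, y0, x1, y1) with x0 < x1 and y0 < y1, in grid units.
    Returns a list of ("width", i) and ("space", i, j) violations.
    """
    violations = []
    for i, (x0, y0, x1, y1) in enumerate(rects):
        if min(x1 - x0, y1 - y0) < min_width:
            violations.append(("width", i))
    for (i, a), (j, b) in combinations(enumerate(rects), 2):
        dx = max(b[0] - a[2], a[0] - b[2], 0)  # horizontal gap, 0 if overlapping in x
        dy = max(b[1] - a[3], a[1] - b[3], 0)  # vertical gap, 0 if overlapping in y
        if dx == 0 and dy == 0:
            continue  # touching or overlapping shapes are treated as connected
        if hypot(dx, dy) < min_space:
            violations.append(("space", i, j))
    return violations


# Example on a 0.05um grid: assumed minimum width 10 units, minimum spacing 12 units
EXAMPLE = [(0, 0, 10, 100), (20, 0, 30, 100), (42, 0, 50, 100)]
RESULT = check_layer(EXAMPLE, min_width=10, min_space=12)
```

The pairwise loop grows quadratically with the number of shapes, which is one reason production checkers work on sorted edges instead.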
Layout vs. Schematic (LVS). First a netlist is extracted from the
layout. Use the electrical info generated by the DRC and then
recognize transistors as juxtapositions of channel with
diffusion. Then see if the extracted netlist is isomorphic to the
schematic netlist. This is done by a coloring algorithm:
‹ initialize all nodes to the same color
‹ compute a new color for each node as some hashing
function involving the colors of connected (i.e., thru a fet)
nodes.
‹ nodes that have a unique color are isomorphic to the similarly
colored node in the other network
‹ worry about parallel fets, ambiguous nodes
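
The coloring idea can be sketched in a few lines. This is a simplified illustration, not a complete LVS: netlists are assumed to be lists of `(kind, gate, source, drain)` tuples, the "hash" is a signature table shared by both netlists, and equal color histograms are only a necessary condition for isomorphism. Parallel fets and ambiguous (symmetric) nodes are not handled.

```python
from collections import Counter


def color_netlists(netlists, rounds):
    """Refine node colors for several netlists at once.

    A signature table shared by all netlists keeps the color numbers comparable.
    """
    colors = [{node: 0 for fet in nl for node in fet[1:]} for nl in netlists]
    for _ in range(rounds):
        table = {}
        refined = []
        for nl, col in zip(netlists, colors):
            seen = {node: [] for node in col}
            for kind, g, s, d in nl:
                lo, hi = sorted((col[s], col[d]))  # source and drain are interchangeable
                seen[g].append((kind, "gate", lo, hi))
                seen[s].append((kind, "chan", col[g], col[d]))
                seen[d].append((kind, "chan", col[g], col[s]))
            new = {}
            for node, sigs in seen.items():
                signature = (col[node], tuple(sorted(sigs)))
                new[node] = table.setdefault(signature, len(table))
            refined.append(new)
        colors = refined
    return colors


def same_color_histogram(layout, schematic):
    """True if both netlists end up with the same multiset of node colors."""
    rounds = max(len(layout), len(schematic)) + 1
    a, b = color_netlists([layout, schematic], rounds)
    return Counter(a.values()) == Counter(b.values())
```

For an inverter, an extracted netlist with arbitrary node names matches the schematic, while a layout in which the nfet gate is wired to the output by mistake produces a different color histogram.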
Coming Up...
Next topic:
Small signal fet model

Readings for next time…


Weste:
‹ 3.4 through 3.4.7
Johns&Martin:
‹ 2.3 (CMOS layout design rules)
‹ 2.4 (analog layout design considerations)

Optional
‹ have a look at Alcatel CMOS C05M-D design rules
manual

Exercises: VLSI-15 #1

Ex vlsi15.1 (difficulty: easy): Assume the 0.5µm
Alcatel Mietec process. Use the λ rules to
calculate the minimal area and perimeter of the
following layout structure.
Result: a) AJ1=4.5µm2, AJ2=3.188µm2,
AJ3=2.25µm2, PJ1=6µm, PJ2=6µm,
PJ3=1.5µm (see Johns&Martin pp99)

[Figure: two transistors Q1 and Q2 with junctions J1, J2 and J3]

Exercises: VLSI-15 #2

Johns&Martin pp110: 2.3 (difficulty: easy): Show a
layout that might be used to match two capacitors
of size 4 and 2.314 units, where a unit-sized
capacitor is 10µm x 10µm.
Result: y2=19.56µm, x2=6.717µm

[Figure: layout with a 2.314 units capacitor and a 4 units capacitor]

John&Martin pp123ff: 2.14, 2.15, 2.16, 2.17

Intro to VLSI Systems
CMOS Layout (replicating)

Measure twice, fab once

Today’s handouts:
(1) Lecture Slides
(2) Problem Set #5
(3) Inverter Layout Tutorial

MicroLab, VLSI-16 (1/16)

JMM/ESA v1.0
Design for Re-use

‹ what’s the schematic for this cell?
‹ what are the “fat” fets?
‹ Cell was designed for placement “under” a
metal2/metal3 routing grid. How was the
layout affected by this design requirement?

MicroLab, VLSI-16 (2/16)

JMM/ESA v1.0
Replicating Cells

What does this cell do?

What if we want to replicate this cell vertically, i.e., make a stack of


the cells, to process many bits in parallel?
‹ what nodes are shared among the cells?
‹ what nodes aren’t shared?
‹ how should we arrange the cells vertically?

MicroLab, VLSI-16 (3/16)

JMM/ESA v1.0
Vertical Replication

Place shared geometry


symmetrically about
shared boundary.

Place items that aren’t


to be shared 1/2 min
spacing rule from shared
boundary.

Reflect cell about X axis


so that Pfets are next
to each other: this avoids
large ndiff/pdiff spacing.

Run shared control


signals vertically -- they’ll
wire themselves up
automatically?

MicroLab, VLSI-16 (4/16)

JMM/ESA v1.0
Vertical Intercell Routing
carry-out to
cell above
S’pose we have a signal
that will run vertically from
one cell to the next, e.g., the
carry-out from one cell becomes
the carry-in for the cell above.

Looks okay until we reflect the


cell when we do the vertical
replication!
carry-in from
cell below

Solution: we have to do the


routing for vertical intercell
signals for a pair of cells,
then replicate the pair
(complete with routing)
vertically.

MicroLab, VLSI-16 (5/16)

JMM/ESA v1.0
Building a Datapath
It’s often the case that we want to operate on many bits in parallel. A
sensible way to arrange the layout of this sort of logic is as a datapath
where data signals run horizontally between functional units and
control signals run vertically to all the bits of a particular functional
unit:

control

bit #3
bit #2
bit #1
bit #0 data

Logic that generates the control signals can be placed at the bottom of
the datapath. If control logic is complicated or irregular, it might be
placed in a separate standard cell block and only the control signal
buffers placed just below the datapath. Although it’s tempting
to run control signals in poly (so they can control fets) this is unwise
for tall datapaths because of poly resistance (e.g., 32 bits x 20u/bit
= 640u = ~1000 squares = ~20k ohms!)
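
The arithmetic behind the poly resistance estimate can be sketched as below. The wire width (0.6u) and sheet resistance (20 ohms/square) are assumptions chosen to be consistent with the "~1000 squares" and "~20k ohms" figures; they are not stated on the slide.

```python
def poly_wire_resistance(bits, pitch_um, width_um, ohms_per_square):
    """Return (number of squares, resistance in ohms) of a vertical control wire."""
    length_um = bits * pitch_um            # wire spans the full datapath height
    squares = length_um / width_um         # a square is one wire-width of length
    return squares, squares * ohms_per_square

squares, r_ohms = poly_wire_resistance(bits=32, pitch_um=20.0,
                                       width_um=0.6, ohms_per_square=20.0)
print(f"{squares:.0f} squares, {r_ohms / 1e3:.1f} kohm")
```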

MicroLab, VLSI-16 (6/16)

JMM/ESA v1.0
Datapath Bit Pitch
How tall should we make each bit of the datapath?
That depends on
‹ the width of the nfets and pfets
‹ how much in-cell routing there is
‹ how much over-the-cell global routing there is
Global routes can be determined from datapath schematic:

Three global routing Internal routing may


tracks required take additional tracks

RESULT
OP1
OP2
SHIFTER

ADDER
BOOLE

MULT

OP EN OP EN EN CIN EN

Cell routing plan: vdd (m2)

global route (m2)

in-cell route (m2) control (m1)

gnd (m2)

MicroLab, VLSI-16 (7/16)

JMM/ESA v1.0
Adder Datapath

power strapping (M1=GND, M3=VDD)

32-bit carry-lookahead adder


tristate output enable control logic
32-bit register w/ tristate driver
MicroLab, VLSI-16 (8/16)

JMM/ESA v1.0
Shifter Datapath

>>4 >>2 >>8

<<16 <<1 <<8 <<2 <<4


shift right MicroLab, VLSI-16 (9/16)

JMM/ESA v1.0
Design for Re-use

‹ what’s this cell do?
‹ what are the “fat” fets?
‹ Cell was designed for placement “under” a
metal2/metal3 routing grid. How was the
layout affected by this design requirement?

MicroLab, VLSI-16 (10/16)

JMM/ESA v1.0
Breaking the Rules

BIT BIT

word line

‹ How are neighboring cells placed?
‹ Isn’t the word line a long poly wire?
‹ Where’s the p-substrate contact?

MicroLab, VLSI-16 (11/16)

JMM/ESA v1.0
Coming Up...
Next time:
Scaling effects, fundamental limits. Submicron
design issues. Power dissipation and packaging.
Readings for next time…
Weste: 6.3.7 through 6.3.9

MicroLab, VLSI-16 (12/16)

JMM/ESA v1.0
Intro to VLSI Systems
Predicting the Future

…I see… I see… a supercomputer


the size of a sugar cube...!

Neat. Where do I invest?

Today’s handouts:
(1) Lecture Slides
(2) Mead and Conway, Chapter 9
(1981)

MicroLab, VLSI-17 (1/20)

JMM/ESA v1.0
Scaling

Over time, process improvements will allow MOSFETs


to scale down by some factor α

[Figure: MOSFET with scaled dimensions W/α, L/α, tox/α, t/α, xj/α and doping α NA]

What happens?

“Scaling Theory” is a model which provides first order


predictions.

MicroLab, VLSI-17 (2/20)

JMM/ESA v1.0
Often, different dimensions will scale at different
rates. But for an overall picture of what the future
portends, there are two major scaling models:

1. Constant Voltage Scaling

All spatial dimensions scale equally:
W → W/α
L → L/α
tox → tox/α
and some other dimensions do as well:
d → d/α (depletion thickness)
NA → α NA (doping)

2. Constant Field - scale VDD too:

V → V/α
so that electric fields remain the same

MicroLab, VLSI-17 (3/20)

JMM/ESA v1.0
First, let’s consider constant field scaling, and use
basic MOSFET models to predict the effect of
scaling by α

Parameters Effect
W/L
Cg = Cox W L
Id ∝ Cox (W/L) (Vgs-Vt)^2
device power = V I
Area = W L
device power / Area
Rdiff
Rmetal
Rpoly

MicroLab, VLSI-17 (4/20)

JMM/ESA v1.0
Speedup!
L

e-
τ = L/(µE)
Transit time τ scales as ___________

Can also compute as time to discharge gate


capacitance:

delay=Cg V/I

Gate discharge time scales as _________

MicroLab, VLSI-17 (5/20)

JMM/ESA v1.0
Interconnect

Local (metal) Interconnect Delay = RC


L
W
t
I

R = ___________
Scaled R = R ________
C = ____________
Scaled C = C ________

Scaled Delay = delay ___________

This turns out to be an overoptimistic prediction -


more later...

MicroLab, VLSI-17 (6/20)

JMM/ESA v1.0
Scaling Table

First Order Scaling (Weste Table 4.12)


In lateral scaling, we only
change the channel length L

Parameter                     Scaling Model
                              Constant Field   Constant Voltage   Lateral
Length (L)                    1/a              1/a                1/a
Width (W)                     1/a              1/a                1
Voltage (V)                   1/a              1                  1
Gate oxide thickness (tox)    1/a              1/a                1
Current                       1/a              a                  a
Transconductance              1                a                  a
Junction Depth                1/a              1/a                1
Substrate Doping (Na)         a                a                  1
Gate Field (E)                1                a                  1
Depletion layer thickness     1/a              1/a                1
Load Capacitance (WL/tox)     1/a              1/a                1/a
Gate Delay (VC/I)             1/a              1/a^2              1/a^2

Resulting Influence
DC Power dissipation          1/a^2            a                  a
Dynamic Power Dissipation     1/a^2            a                  a
Power-delay product           1/a^3            1/a                1/a
Gate Area                     1/a^2            1/a^2              1/a
Power-density (VI/A)          1                a^3                a^2
Current Density               a                a^3                a^2

Constant field: devices get faster, lower power,
though current density goes up.
Constant voltage: devices get even faster,
though overall power and
power density rise
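
Several derived rows of the scaling table follow directly from the four basic rows (L, W, tox, V). A minimal sketch, assuming the first-order relations C ~ WL/tox, I ~ (W/L)(V^2/tox), delay ~ VC/I and power ~ VI; each quantity is represented by its exponent of a, and the names `MODELS` and `derived` are illustrative:

```python
# An exponent e means the quantity scales as a**e.
MODELS = {
    "constant field":   {"L": -1, "W": -1, "tox": -1, "V": -1},
    "constant voltage": {"L": -1, "W": -1, "tox": -1, "V": 0},
    "lateral":          {"L": -1, "W": 0,  "tox": 0,  "V": 0},
}

def derived(m):
    """Exponents of a for the derived quantities of one scaling model."""
    cap = m["W"] + m["L"] - m["tox"]                   # C ~ W L / tox
    cur = m["W"] - m["L"] - m["tox"] + 2 * m["V"]      # I ~ (W/L) V^2 / tox
    power = m["V"] + cur                               # P ~ V I
    area = m["W"] + m["L"]                             # A ~ W L
    return {
        "load capacitance": cap,
        "current": cur,
        "gate delay": m["V"] + cap - cur,              # delay ~ V C / I
        "power": power,
        "gate area": area,
        "power density": power - area,
    }

for name, model in MODELS.items():
    print(name, derived(model))
```

The printed exponents can be compared row by row with the table; for example, the power density exponents come out as 0, 3 and 2.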

MicroLab, VLSI-17 (7/20)

JMM/ESA v1.0
Die Size
With basic scaling of the same system, we’d just end
up with smaller and smaller chips.

However, from year to year, the overall die size stays


about the same or grows as we add features to the
chip.

Fab improvements (mostly, bigger wafers) are what


allow for bigger die.

Because the die doesn’t shrink, global interconnect,


particularly clocks and on-chip buses, don’t shrink
either.

MicroLab, VLSI-17 (8/20)

JMM/ESA v1.0
Global Interconnect Scaling

Interconnect scaling for global signals:

[Figure: global wire of unscaled length L with cross-section scaled to W/α by t/α; dimension d also labelled]

scaled R = R · α²
scaled C = C
scaled delay = delay · α²

Even worse: wire starts looking like lossy distributed
rc wire - O(L²) delay!

In the submicron domain, this increased significance


of wire has led to major CAD industry turmoil.

MicroLab, VLSI-17 (9/20)

JMM/ESA v1.0
Power Scaling

Power per chip increases with constant voltage scaling


and when die size grows. How does this affect us?
Junction temperature is a function of power and
thermal resistance θja to environment.
Example: a 30W chip at 27 C ambient.
Junction temp. = 27C + 30W*θja

θja=2 C/W
Junction temp = ___________

Junction temp = _______


θja=0.1 C/W

heat sink

Heat through pins chip


to PC board
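
The junction temperature formula on this slide can be evaluated with a few lines of code. A minimal sketch; the function name is illustrative, and the two θja values are the ones shown in the figure:

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Junction temperature = ambient + power * thermal resistance to environment."""
    return ambient_c + power_w * theta_ja_c_per_w

for theta_ja in (2.0, 0.1):
    t_j = junction_temp_c(27.0, 30.0, theta_ja)
    print(f"theta_ja = {theta_ja} C/W -> junction temp = {t_j:.0f} C")
```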
MicroLab, VLSI-17 (10/20)

JMM/ESA v1.0
In the submicron domain, it’s difficult to scale VDD,
so power faces the “constant voltage” scaling of α²

This adds impetus to the already-important goal of


reducing power of VLSI systems. Some of the main
ways of doing this:

1. Reduce unnecessary on-chip transitions by careful


logic design, or by disabling the clock to idle
systems.

2. Reduce voltage, use more parallelism.

3. Adiabatic logic.

MicroLab, VLSI-17 (11/20)

JMM/ESA v1.0
Problems with scaling theory

Can one scale indefinitely? No.

What are the limits?

Are they fundamental limits?


Is there any difference?
Are they technical limitations?

Is the current technology close to those limits?

MicroLab, VLSI-17 (12/20)

JMM/ESA v1.0
Some limits

Current Density J increases with α

J = I/(W t)
scaled I = I/α
scaled J = J · α
[Figure: wire of length L, width W and thickness t carrying current I]

Metal migration imposes a limit on current density.


==> Thicker wires and more metal
layers needed.
==> Increased fringing capacitance with
thicker wires.

Punchthrough: source/drain depletion regions touch

VPT = (L² q NA)/(2ε)
[Figure: channel of length L with depletion regions of width Xd]

MicroLab, VLSI-17 (13/20)

JMM/ESA v1.0
Subthreshold leakage

Subthreshold conductance is proportional to

exp( (Vgs - Vt) / (kT/q) )

We can scale Vt by α via ion implantation.

kT/q = 0.025V does not scale.

Vt falls =====> Subthreshold current
______________ exponentially.

Example: Vt = 0.5V means that leakage current time constant is
10^7 τ
Vt = 0.1V means that leakage current time constant
is 10^1 τ

MicroLab, VLSI-17 (14/20)

JMM/ESA v1.0
Threshold Variations

Threshold varies from transistor to transistor.

VDD

Vout

If pullup has “big” threshold,


pulldown has “small” threshold,
and sum of variances > VDD
then inverter will not invert
(Vout = 0V always.)

How likely is this?

MicroLab, VLSI-17 (15/20)

JMM/ESA v1.0
Threshold Variations, cont.

Analyze using Gaussian distribution.


P(given inverter fails) = exp(-4 VDD / ∆Vth)

For given inverter...

∆Vth = 0.08V, (Mead & Conway, p. 343)
VDD = 5V ===> P = 10^-110
VDD = 0.5V ===> P = 10^-11

But with 10,000,000 transistors on the chip, a


broken chip is very likely.

Question: Should threshold variance


increase or decrease with scaling?
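
The per-inverter failure probabilities quoted above can be reproduced from the formula P = exp(-4 VDD / ∆Vth). A minimal sketch that works with log10(P) to avoid floating-point underflow; the function name is illustrative. The exponents agree with the rounded values on the slide.

```python
import math

def log10_inverter_failure(vdd, delta_vth):
    """log10 of P = exp(-4 * VDD / delta_Vth)."""
    return -4.0 * vdd / delta_vth / math.log(10.0)

for vdd in (5.0, 0.5):
    exponent = log10_inverter_failure(vdd, 0.08)
    print(f"VDD = {vdd} V -> P is about 10^{exponent:.1f}")
```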

MicroLab, VLSI-17 (16/20)

JMM/ESA v1.0
Lithographic Scaling Limits

Insert p. 1110 of Halliday and Resnick Here

Ultraviolet: λ = 0.3 µm
X-Ray Lithography, λ = __________
Synchrotron lithography?
Wavelength of an electron?
Cost of FABs.
Optical tricks.

MicroLab, VLSI-17 (17/20)

JMM/ESA v1.0
Fundamental Physical Limits

Thermodynamic
How much entropy change to set a bit?
Reversibility

Quantum Limits
Tunnelling: for Eb of 1eV, the gate oxides and
depletion layers must be thicker than 1 nm. In the
IBM 0.4um process, the gate thickness is 7
nm.

Thermal Limits

MicroLab, VLSI-17 (18/20)

JMM/ESA v1.0
Is this the beginning of the end?

Not really.

VLSI is not yet really up against any fundamental


physical constraint.

The constraints that we’re facing are technological


hurdles.

With sufficient economic incentive, technological


hurdles are cleared.

Wires are a lot more important than in the past.

MicroLab, VLSI-17 (19/20)

JMM/ESA v1.0
Coming Up...
Next topic…
MOS memories. Static and dynamic RAM cells.
Single and double-ended bit line sensing.
Multiport register files.
Readings for next time…
Weste: 4.13

MicroLab, VLSI-17 (20/20)

JMM/ESA v1.0
Intro to VLSI Systems
CMOS Memories

I wonder which part


does the remembering?

Today’s handouts:
(1) Lecture Slides

MicroLab, VLSI-18 (1/21)

JMM/ESA v1.0
Semiconductor Memories
Usually the majority of transistors found in a modern system are devoted
to data storage in the form of random-access memories. The need for
increased densities and lower prices has driven the development of
improved VLSI technology.

Uses:
“main” memory ⇒ high capacity, low cost
cache memories, TLB’s ⇒ fast access
programming info (eg, FPGA) ⇒ non-volatile

Read-only memories: ROM (non-volatile!)


Mask programmed
Programmable ROM (PROM)
Erasable PROM (EPROM)
Electrically Erasable PROM (EEPROM)
Read/Write or Random Access memories: RAM
Static RAM (SRAM)
Multiport SRAM (Register Files)
Content-Addressable Memories (CAM)
Non-volatile SRAM (NVRAM)
Dynamic RAM (DRAM)
Serial-access video memories (VRAM)
Synchronous DRAM (SDRAM)
RAMBUS
...

MicroLab, VLSI-18 (2/21)

JMM/ESA v1.0
Design Tradeoffs
density: bits/unit area. Usually higher density
also means lower cost per bit. Improvements due
to finer lithography, better capacitor structures,
new materials with higher dielectric constants.

Speed: access time (latency) and


bandwidth. Improvements due to Power consumption: want power to
better sensing (smaller voltage depend on access pattern not
swing), increased parallelism quantity of bits stored.
(overlapped accesses), faster I/O. Improvements due to lower supply
voltage.

Improvements in one dimension come


at an increased cost in the other
dimensions.
MicroLab, VLSI-18 (3/21)

JMM/ESA v1.0
Memory Architecture
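[Figure: memory array of 2^N rows by 2^M columns. N address bits feed the row address decoder, which drives the word lines (Row 1 ... Row 2^N); the bit lines (Col. 1 ... Col. 2^M) feed the column decoder, which uses M address bits and produces DATA. Each memory cell stores one bit; the full address has N+M bits.]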

‹ Most memory layouts are “folded”, i.e., D < M. Why?
‹ What are the practical upper bounds on M and N?
‹ What if you want even more memory?
‹ Why only one bit per cell? (Not a silly question!)
‹ Why are “page-mode” accesses a good idea?

MicroLab, VLSI-18 (4/21)

JMM/ESA v1.0
ROM Circuits
NOR-based
ROM array

shared
ground

R1

R2

R3

R4

shared
bit line C1 C2 C3 C4
contact

R1 R2 R3 R4 C1 C2 C3 C4
1 0 0 0 0 1 0 1
0 1 0 0 0 0 1 1
0 0 1 0 1 0 0 1
0 0 0 1 0 1 1 0

MicroLab, VLSI-18 (5/21)

JMM/ESA v1.0
ROM Layout

VDD

GND

shared shared
contact ground

no
pulldown

pulldown

ground and
word line
refresh

‹ Which are the word lines? the bit lines?
‹ Why are the word lines “strapped” with M2?
‹ What layers change when programming changes?
‹ How often should signals be refreshed?
MicroLab, VLSI-18 (6/21)

JMM/ESA v1.0
ROM Performance
tACCESS = tROW DECODE + tCOLUMN + tCOL DECODE
tROW DECODE :
If ROM is large, row decode logic is just a small percentage of
total area. So we can make the driver for the word line large and
thus fast. Note that we need to strap the poly word line to
eliminate slow down due to poly resistance.

tCOL DECODE:
As with the row decode logic, we can increase speed by
increasing size of transistors in this section.

tCOLUMN:
We want small program transistors to keep the total area of
ROM as small as possible. Also increasing size of pulldowns
increases load on both word and bit lines. This means we’re
limited in the speed we can achieve in pulling down the column.
If CPD,DRAIN = 10fF and we have 128 rows:
tCOLUMN = C ∆V / IAV
= (10fF)(128)(2.5V)/(30uA)
= 110ns

Too slow! Which of these can we fix?
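
The column delay estimate can be checked numerically, which also makes it easy to see how each term (capacitance per drain, number of rows, voltage swing, pulldown current) moves the result. A minimal sketch using the values from the slide; the function name is illustrative.

```python
def column_delay_s(c_drain_f, rows, delta_v, i_avg_a):
    """t = C * dV / I, with C the total drain capacitance hanging on the bit line."""
    return c_drain_f * rows * delta_v / i_avg_a

t_full = column_delay_s(10e-15, 128, 2.5, 30e-6)
t_small_swing = column_delay_s(10e-15, 128, 0.25, 30e-6)   # hypothetical 0.25 V swing
print(f"full swing: {t_full * 1e9:.0f} ns, reduced swing: {t_small_swing * 1e9:.1f} ns")
```

The second call is only an illustration of how a smaller required swing shortens the delay; the 0.25 V figure is an assumption, not a value from the slides.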


MicroLab, VLSI-18 (7/21)

JMM/ESA v1.0
Sense Amplifiers
Let’s speed things up by sensing small changes in the bit line voltage
using a sense amplifier:

R1

R2

C1
column C1
(tree)
decoder
C0
C0

tenths of a volt
amplified to full SENSE AMP
rail-to-rail swing

MicroLab, VLSI-18 (8/21)

JMM/ESA v1.0
Single-ended Sense Amp
Choose fet sizes so that
M2, MD >> MC >> M1
M3 >> M4
voltage M1
reference 1
(fets sized M3
to produce
VREF = 3V) M2

M4
2

series fets in
column decoder MD bit line
(pullup built into
sense amp)

MC

word line -- enables pulldown memory cell pulldowns


when row is selected (connected to bit line)

When bit line is not pulled down, V1 = VDD and V2 = VREF - Vth = 2V, so M3
is off and M4 is on and the output is pulled low.

When a bit line pulldown is turned on, V2 starts to drop and


M2 conducts well enough so that V1 drops to V2 since MC >> M1. When V1
and V2 drop 0.5V to 1.5V, M3 is strongly conducting and M4 is weakly
conducting, so output goes high. So small ∆V on bit line produces large output
swing.

MicroLab, VLSI-18 (9/21)

JMM/ESA v1.0
SRAM Circuits
precharge or VDD

static
bistable 6-T SRAM Cell
storage access fet
element

word line
Differential Sense Amp
rdata

bit bit

tie bulk to
source if
possible long-channel
precharge fet used as
or VDD
current source

clocked
cross-coupled Use CLK if
sense amp possible to
clk reduce power
and improve
write speed
wdata

MicroLab, VLSI-18 (10/21)

JMM/ESA v1.0
6-T SRAM Cell Layout
VDD

inverter
pullup

inverter
pulldown
GND

strapped access fet


word line

bit line bit line

Pulldowns do the work when access fet is turned


on, pullups can be small to save space and make
the cell easy to write.

MicroLab, VLSI-18 (11/21)

JMM/ESA v1.0
SRAM Read Cycle
VDD

VDD
6-T SRAM Cell bit
word
1
word data

bit Cell pullup has bit


volts no real effect
word
bit

make this big bit

keep away from inverter threshold


1
time

Choose WPU, WACCESS, W INV so that:


fast bit line recovery when WORD goes low
don’t want to “flip” selected cell on read (V1 < VTH,INV)
large ∆V on BIT lines to speed up sensing
minimize cell size

MicroLab, VLSI-18 (12/21)

JMM/ESA v1.0
Differential Sense Amp
rdata

4.8/0.6 4.8/0.6

bit 2 V2 1 V1 bit

4.8/0.6 4.8/0.6

3 long-channel
VDD 0.9/7.2
fet used as
current “source”
VCS

MicroLab, VLSI-18 (13/21)

JMM/ESA v1.0
Fast Address Decoding

Logically, row/column decoders can be


built from wide fan-in AND gates. But
these are slow, place heavy loading on
address wires and may be hard to fit into
the pitch of the memory cell.

A2 A1 A0

One can use predecode


logic to decode blocks of
addresses which are then
further decoded using
smaller AND gates. The
address lines going to the
predecode gates are less
loaded and all gates have
smaller fanin ⇒ decode
happens faster. Layout
works better too!
A2 A1 A0
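
The predecoding idea can be sketched in software as a two-stage decoder. This is an illustrative behavioral model, assuming a 4-bit address split into two 2-bit groups; the function names and the group size are not taken from the slides.

```python
def predecode(addr, bits=4, group=2):
    """Stage 1: decode each group of address bits into one-hot predecode lines."""
    mask = (1 << group) - 1
    return [[int(((addr >> shift) & mask) == value) for value in range(1 << group)]
            for shift in range(0, bits, group)]

def word_lines(addr, bits=4, group=2):
    """Stage 2: each word line is a small AND of one predecode line per group."""
    groups = predecode(addr, bits, group)
    mask = (1 << group) - 1
    lines = []
    for row in range(1 << bits):
        selected = 1
        for i, one_hot in enumerate(groups):
            selected &= one_hot[(row >> (i * group)) & mask]
        lines.append(selected)
    return lines

print(word_lines(5))
```

With this split, every final gate has a fan-in of 2 instead of 4, and each address bit drives only its own predecode gates rather than every row decoder.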

MicroLab, VLSI-18 (14/21)

JMM/ESA v1.0
Multiport SRAM (Reg File)
One can increase the number of SRAM ports by
adding access transistors. Writes are usually
double-ended; single-ended reads can be used
to save space.

write
read0
read1
rd0 wd wd rd1
An alternative design that can be easily expanded
without worrying about unintentionally flipping the
cell on reads is shown below.
wd rd0 rd1

PU = 2/1 2/1
PD = 4/1

4/1 2/1

5/1
PU = 2/2
PD = 2/3
write
read0
read1
MicroLab, VLSI-18 (15/21)

JMM/ESA v1.0
Content-addressable RAM
By adding two transistors to the 6-T SRAM cell one can form an
XOR gate to compare the cell contents to data on the bit lines.
The output of this logic can drive a pulldown in a distributed NOR
gate to form a word “match” signal for a content-addressable
memory (CAM).

word

xor gate

match

This node goes high This node will be


if data on bit lines pulled down if any bit
doesn’t match data of the word doesn’t
in the cell. match

Read and Write cycles: like before…


Match cycle: place data on bit lines but don’t
assert word line.
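
A behavioral sketch of the match cycle, assuming each stored word and the search key are tuples of bits; the per-bit XOR and the distributed NOR are modeled directly, and the function name is illustrative.

```python
def cam_match(stored_words, key):
    """Return one match bit per stored word (1 = all bits equal the key)."""
    match_lines = []
    for word in stored_words:
        # Per-cell XOR: 1 where the stored bit differs from the bit-line data.
        mismatch = [s ^ k for s, k in zip(word, key)]
        # Distributed NOR: the match line stays high only if no cell pulls it down.
        match_lines.append(int(not any(mismatch)))
    return match_lines

tags = [(1, 0, 1), (0, 0, 1), (1, 0, 1)]
print(cam_match(tags, (1, 0, 1)))
```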

MicroLab, VLSI-18 (16/21)

JMM/ESA v1.0
CAM Architecture

weste, figure 8.76(b)

The word match lines from the CAM array can be


used as WORD lines in a companion RAM to read
out other data associated with the tag stored in
the CAM. Uses: fully-associative caches,
translation lookaside buffers (TLBs), ...

MicroLab, VLSI-18 (17/21)

JMM/ESA v1.0
3-T Dynamic RAM
[Figure: 3-T DRAM cell with precharge, read, write, wdata and rdata signals; write bit line capacitance CW, read bit line capacitance CR and storage capacitance CC]

Precharge happens before each r/w cycle. READ/WRITE and
PRECHARGE don’t overlap.

Data is stored on CC. It’s not destroyed on read, but will leak
away through the write transistor. CW >> CC

WRITE:
After precharge, CW is charged high. When WRITE is asserted CW shares
charge with CC and dominates since CW >> CC. If WDATA is
asserted, both CW and CC will be discharged, writing a
“0” into the cell; otherwise a “1” will be written.

READ:
After precharge, CR is charged high. When READ is asserted CR is pulled
low if there’s a stored “1” or remains unchanged if there’s a stored “0”.
A sense amp is usually used to speed up the availability of read data.

Pros: little or no static power, smaller than SRAM


Cons: needs refresh, need time to precharge
MicroLab, VLSI-18 (18/21)

JMM/ESA v1.0
1-T Dynamic Ram
1-T DRAM Cell
[Figure: 1-T DRAM cell: access fet gated by the word line, connecting the bit line to a storage capacitor whose other plate is at VREF]

Explicit storage capacitor (fet gate, trench, stack) = 30fF
to 100fF. If we want higher C (C = εA/d):
better dielectric
more area
thinner film

[Figure: “Stack” DRAM cell: poly word line, access fet, W bottom electrode, Ta2O5 dielectric, TiN top electrode (VREF)]

MicroLab, VLSI-18 (19/21)

JMM/ESA v1.0
1-T DRAM Read Cycle
DSL PC DSR

lbit rbit
R2 R1 R 129 R 130

C C C/2 C/2 C C
VDD CS VDD

PC PC read out of dummy


cell half way between
“0” and “1” value

lbit, rbit

precharge (PC)

row sel (RN)

dummy sel (DSL,R)

column sel (CS)

precharge bit lines, discharge dummy cells
read out bit, opposite dummy cell
amplify difference, restore bit
MicroLab, VLSI-18 (20/21)

JMM/ESA v1.0
Coming Up...
Next time:
Driving large loads:
I/O circuits (edge rates, ESD protection, latch up)
Clock generation and distribution (skew)
Readings for next time…
Weste: 5.4.2, 5.5, 5.6

MicroLab, VLSI-18 (21/21)

JMM/ESA v1.0
VLSI Design I
Defect Mechanisms and Fault Models

He’s dead Jim...

Overview
Defects
Fault models
Goal: You know the difference between design and
fabrication defects. You know sources of defects
and you can estimate yield. You can handle fault
models at different abstraction levels.
MicroLab, VLSI-19 (1/32)

JMM v1.4
Design Defects

?
Design Specification

Š it helps to have a specification to compare against!

Š if specification is written in a hardware description
language from which the design is synthesized then
the design should be defect-free (modulo bugs in
the synthesis software!) Of course the specification
may be buggy...
Š everyone feels better if the design/specification are
“run” in the environment in which they will be used.
For example, in testing a processor chip, one might
boot the operating system and run some key
programs, all under simulation. This leads to the
need for lots of simulation cycles, e.g., as provided
by a hardware emulation system. Nowadays these
are built using a small army of FPGAs. Other
choices: in-circuit emulation, cycle-based simulators.

MicroLab, VLSI-19 (2/32)

JMM v1.4
Manufacturing Defects
Goal: verify every gate is operating as expected
Defects from misalignment, dust and other particles, “stacking”
faults, pinholes in dielectrics, mask scratches & dirt, thickness
variations ⇒ layer-to-layer shorts, discontinuous wires (“opens”),
circuit sensitivities (VTH, LCHANNEL).
Find during wafer probe.

Defects from scratching in handling, damage
during bonding to lead frame, manufacturing defects
undetected during wafer probe (particularly
speed-related problems).
Find during testing of packaged parts.

Defects from damage during board insertion (thermal, ESD),
infant mortality (manufacturing defects that show up after a few
hours of use). Also noise problems, susceptibility to latch-up...
Find during testing/burn-in of boards.

Defects that only appear after months or years of use (metal


migration, oxide damage during manufacture, impurities).
Found by customer (oops!).

Cost of replacing defective component increases


by an order of magnitude with each stage of
manufacture.
MicroLab, VLSI-19 (3/32)

JMM v1.4
Production defects in CMOS circuits

Š a lot of complex processing steps are used to


manufacture a chip -> defects
Š defects and their effect depend on circuit topology
and process
Š knowledge of the chemical and physical mechanisms
that lead to defects is essential
Š circuit complexity and area determine testability
and yield
Š testability and yield are key factors for future VLSI
technologies

MicroLab, VLSI-19 (4/32)

JMM v1.4
VLSI fabrication process

Š fabrication process consists of a sequence of well


defined process steps
Š 50 wafers form a batch
Š each wafer contains 100's or 1000's of chips
Š specific test chips are distributed on the wafers
Š test chips allow process parameters to be monitored
Š between a set of process steps the test structures
are measured

[Figure: process control loop. The layout (chip's geometrical structures) and the process control parameters feed the process steps, which are subject to disturbances from a changing environment. Monitor steps measure the conditions against tolerances: wafers within tolerances go on to further processing, the others are not further processed.]

MicroLab, VLSI-19 (5/32)

JMM v1.4
VLSI fabrication process (cont.)

chip fabrication tests:


Š process parameters
Š oxide thickness, distances of structures, etc
Š electrical parameters
Š currents, resistances, threshold voltages, ...
Š chip test on wafer
Š packaged chip test
[Figure: test flow. Layout → wafer fabrication (subject to disturbances; controlled by measuring process parameters and parameters of test-chips) → parameter and function test of chips on wafer → bonding and packaging → parameter and function test of packaged chips]

MicroLab, VLSI-19 (6/32)

JMM v1.4
VLSI fabrication process (cont.)

Š parameter test
Š test of electrical parameters: current consumption,
quiescent currents, voltage levels, delay times, etc.
Š function test
Š test for logical faults: binary test sequences are applied
to the device under test (DUT)

MicroLab, VLSI-19 (7/32)

JMM v1.4
Defect classification

defects occur at different fabrication steps:


Š defects at wafer fabrication
Š defects at chip packaging
Š defects during chip lifetime

MicroLab, VLSI-19 (8/32)

JMM v1.4
Defects at wafer fabrication
50% of all defects
Š reason:
Š changes in fabrication environment
Š substrate inhomogeneities, mask misalignment
Š dust particles, photolithography defects
Š local or global effects
Š electrical effects depend on layout topology
Š changes in delay, current consumption
Š shorts, opens

MicroLab, VLSI-19 (9/32)

JMM v1.4
Defects at chip packaging

reasons:
Š bonding problems
Š mechanical stress
Š effect:
Š normally occur at primary inputs or outputs
Š easy to detect

MicroLab, VLSI-19 (10/32)

JMM v1.4
Defects during lifetime

time-dependent mechanisms lead to defects

Š early defects: high defect rate (burn-in)
Š middle life phase: low defect rate
Š wear defects: defect rate climbs with time

[Figure: defect rate vs. time ("bathtub" curve): early defects, middle life phase, wear defects]

MicroLab, VLSI-19 (11/32)

JMM v1.4
Yield modeling

Š defects can produce faults
Š yield is percentage of fault free chips
Š yield influences chip cost
⇒ yield models are necessary to predict chip cost
Š local defects produce most faults

Š assumption: local defects are statistically
independent and occur with probability p
⇒ binomial distribution
Pr{K=k} = Pr{k from n areas are faulty}
⇒ due to Bernoulli
$$\Pr\{K = k\} = \binom{n}{k} p^{k} (1 - p)^{n-k}$$
with n to infinity and p to zero (np = λ) we find
$$\Pr\{K = k\} = \frac{\lambda^{k}}{k!} e^{-\lambda}$$
MicroLab, VLSI-19 (12/32)

JMM v1.4
Yield modeling (cont.)

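Š expectation value
$$E\{K\} = \sum_{k=0}^{\infty} k \frac{\lambda^{k}}{k!} e^{-\lambda} = \lambda$$

Š probability that a chip is fault free (λ = DA, with defect density D and chip area A)
$$\Pr\{K = 0\} = e^{-DA}$$

Š Murphy: normalized density function f(D)
$$Y = \int_{0}^{\infty} e^{-AD} f(D)\, dD$$

Š calculation of yield with Murphy's density function f(D): Y1, Y2, Y3 ?
[Figure: three candidate density functions f1, f2, f3 plotted as f(D) against D, with axis marks at 0, D0 and 2D0 and heights 1/D0 and 1/(2 D0)]

(for high yield)
$$Y_2 = \left( \frac{1 - e^{-AD_0}}{AD_0} \right)^{2}$$

Š Seed's yield model (for low yield)
$$Y = e^{-AD_0}$$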
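
The two closed-form yield expressions on this slide can be compared numerically. A minimal sketch; the function names and the sample values of A·D0 are illustrative assumptions, not data from the slides.

```python
import math

def exponential_yield(area, d0):
    """Y = exp(-A * D0): probability of zero defects for a fixed defect density D0."""
    return math.exp(-area * d0)

def murphy_y2(area, d0):
    """Y2 = ((1 - exp(-A * D0)) / (A * D0))**2, the second expression on the slide."""
    x = area * d0
    return ((1.0 - math.exp(-x)) / x) ** 2

for ad0 in (0.1, 1.0, 3.0):
    print(f"A*D0 = {ad0}: exp model {exponential_yield(ad0, 1.0):.3f}, "
          f"Y2 {murphy_y2(ad0, 1.0):.3f}")
```

For small A·D0 the two models nearly coincide; as A·D0 grows, Y2 predicts a noticeably higher yield than the pure exponential.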
MicroLab, VLSI-19 (13/32)

JMM v1.4
Yield modeling (cont.)

Š the bigger the circuit the higher the probability for
a faulty chip
Š example: 2 wafers with the same 17 defects
Š wafer with total 44 chips
⇒ yield 61%
Š wafer with total 316 chips
⇒ yield 95%
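
The two yield figures follow from simple counting, assuming that each of the 17 defects lands on a different chip and makes that chip faulty (an assumption; the slide does not say how the defects are distributed).

```python
def wafer_yield(total_chips, defects):
    """Fraction of good chips when every defect kills one distinct chip."""
    return (total_chips - defects) / total_chips

for chips in (44, 316):
    print(f"{chips} chips: yield {100 * wafer_yield(chips, 17):.0f}%")
```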

MicroLab, VLSI-19 (14/32)

JMM v1.4
VLSI fabrication process: conclusion

Š defects occur during wafer fabrication, chip


packaging and during chip lifetime
Š local and global defects
Š local defects dominate at mature process
Š local defects are hard to find and costly

MicroLab, VLSI-19 (15/32)

JMM v1.4
Fault models for integrated circuits

Š complex circuits need more test time


Š test time with expensive equipment leads to high
test cost per chip
Š to reduce test time fault models for structured test
approaches are required

Š if a system does not behave as expected, faults are
present
Š faults can be modeled at different electrical levels
Š faults can be caused by defects
Š they occur during fabrication or life time
Š design errors produce design-faults
Š for example faulty logic implementation of functions
⇒ design validation is necessary

MicroLab, VLSI-19 (16/32)

JMM v1.4
Fault models: Testing approaches
Plan: supply a set of test vectors that specify an input or output
value for every pin on every cycle. Tester will load the program
into the pin cards, run it and report any discrepancies between an
observed output value and the expected value.

cycle #   program for 11 pins
0000      1 10 0000 XXXX
0001      1 10 0000 LLLL
0002      1 01 1111 LLLL
0003      1 00 1011 HLHL

input to chip = {0, 1}
output from chip = {L, H}
tri-state/no compare = { X }

How many vectors do we need?

[Figure: combinational logic with n inputs and m outputs]
2^n inputs required to exhaustively test circuit

[Figure: combinational logic with n inputs and m additional (state) bits]
2^(n+m) inputs required to exhaustively test circuit

If n=50, m=25, 1ns/test then test time > 10^6 years

Exhaustive testing is not only impractical, it’s
not necessary! Instead we only need to verify that
no faults are present which may take many fewer
vectors.
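
The test-time estimate can be verified directly. A minimal sketch, assuming one vector per nanosecond and a 365.25-day year; the function name is illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_test_years(n_inputs, m_bits=0, seconds_per_test=1e-9):
    """Time to apply all 2**(n+m) vectors, in years."""
    vectors = 2 ** (n_inputs + m_bits)
    return vectors * seconds_per_test / SECONDS_PER_YEAR

years = exhaustive_test_years(50, 25)
print(f"{years:.2e} years")
```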
MicroLab, VLSI-19 (17/32)

JMM v1.4
Fault models: abstraction level

Š circuits are treated at different abstraction levels


Š analog or memory circuits are treated at transistor level
Š medium size digital circuits are treated at logic level
Š complex digital circuits or microprocessors are normally
treated at functional level

Š example of fault manifestation: missing polysilicon
material
Š layout level: ex. missing polysilicon
Š electrical level: ex. open interconnection
Š transistor level: ex. permanently short-circuited
transistor (if missing polysilicon gate)
Š logic level: ex. permanent logic level "1"
Š functional level: ex. register not resettable ...

MicroLab, VLSI-19 (18/32)

JMM v1.4
Fault models (cont.)

fault dependencies
Š faults are layout dependent
Š faults are technology dependent

goals of fault models


Š fault models should be realistic and thus depend on
physical defect mechanisms
Š fault models should be simple and treatable

MicroLab, VLSI-19 (19/32)

JMM v1.4
Hard to detect faults

transient (intermittent) faults
Š occur only from time to time
Š due to changes in the environment
Š no satisfactory strategy to search for them
Š repeating search
Š built-in test: self-checking circuits, error-correcting
circuits
Š redundant use of several identical circuit-blocks

benefits of redundant circuits


Š redundancy for higher functionality security
Š redundancy to eliminate hazards

disadvantages of redundant circuits


Š faults not detectable (masking effect)

MicroLab, VLSI-19 (20/32)

JMM v1.4
Logic level fault models

historical perspective
Š in 1959 Eldred proposed methods for testing
computers built from relays, diodes and tubes, which
behaved like switches
⇒ stimulation of development of fault models on logic
level

stuck-at fault model
Š signal can be stuck at "0" or "1"
Š independent of process technology
Š does not model technology-dependent
characteristics
Š mathematical calculus exists
Š very useful for TTL technology (or other old
"current" technologies, but not for "charge"
technologies like CMOS)

MicroLab, VLSI-19 (21/32)

JMM v1.4
Logic level fault models (cont.)

Š Traditional model, first developed for board-level
tests, assumes that a node gets “stuck” at a “0” or
“1”, presumably by shorting to GND or VDD.

stuck at “0” = S-A-0 = node@0
stuck at “1” = S-A-1 = node@1

[Figure: 4-input gate with inputs A, B, C, D and output Z; fault site X marked on input B]
Z = ABCD
Z(B@1) = ACD
Z(B@0) = 0

Š example of TTL NAND gate with many defects
describable with stuck-at fault model

[Figure: TTL NAND gate schematic with inputs I1, I2, output O, transistors T1 to T4 and resistors R1 to R4]
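
The stuck-at example Z = ABCD can be simulated by forcing input B to a constant and comparing against the fault-free output. A minimal sketch; the function names are illustrative.

```python
from itertools import product

def z(a, b, c, d, b_stuck=None):
    """4-input AND; if b_stuck is 0 or 1, input B is forced to that value."""
    if b_stuck is not None:
        b = b_stuck
    return a & b & c & d

def detecting_vectors(b_stuck):
    """All input vectors (A, B, C, D) whose output differs from the fault-free one."""
    return [v for v in product((0, 1), repeat=4) if z(*v) != z(*v, b_stuck=b_stuck)]

print("B@1 detected by:", detecting_vectors(1))
print("B@0 detected by:", detecting_vectors(0))
```

Out of 16 possible vectors, exactly one detects each of these two faults, which illustrates why targeted test generation needs far fewer vectors than exhaustive testing.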

MicroLab, VLSI-19 (22/32)

JMM v1.4
Fault reduction

fault collapsing
Š fault equivalence
Š fault dominance
Š single faults, multiple faults
Š fault detection
Š fault free function: f(x)
Š with fault α: fα(x)
Š test vector x detects the fault if the condition is fulfilled:
f(x) ⊕ fα(x) = 1
Š fault equivalence
fβ(x) = fα(x)
Š fault dominance
Tβ ⊂ Tγ
fault β dominates γ

[Figure: two-input gate with inputs A, B and output C, shown with its truth table over A B = 00, 01, 10, 11]

fault classes (<=> equivalence, => dominance; e.g. α/1: A stuck-at-1)
α/1 <=> β/1 <=> γ/1
β/0 => γ/0
α/0 => γ/0

MicroLab, VLSI-19 (23/32)

JMM v1.4
Logic level fault models

Š fault dominance
Š Tα represents test vector set to detect fault α
Š fault α dominates fault γ under condition
Tα ⊂ Tγ
Š for test generation only tests for fault α are
necessary
multiple faults: fault masking problems
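
The fault classes of a two-input gate can be reproduced by enumerating test sets. The Python sketch below assumes an OR gate with inputs A, B and output C (an inference from the equivalence of all three stuck-at-1 faults, not something the slides state), writes faults as (node, stuck value) pairs, and uses the dominance convention of these slides, in which the fault with the smaller test set dominates. All function names are illustrative.

```python
from itertools import product

def or_gate(a, b, fault=None):
    """Two-input OR gate C = A + B; fault is None or (node, stuck_value)."""
    if fault is not None and fault[0] == "A":
        a = fault[1]
    if fault is not None and fault[0] == "B":
        b = fault[1]
    c = a | b
    if fault is not None and fault[0] == "C":
        c = fault[1]
    return c

def detecting_vectors(fault):
    """All input vectors x with f(x) xor f_fault(x) = 1."""
    return frozenset(x for x in product((0, 1), repeat=2)
                     if or_gate(*x) ^ or_gate(*x, fault=fault))

faults = [(node, value) for node in "ABC" for value in (0, 1)]
T = {fault: detecting_vectors(fault) for fault in faults}

def equivalent(f1, f2):
    """Equal test sets; for one output this means equal faulty functions."""
    return T[f1] == T[f2]

def dominates(f1, f2):
    """Slide convention: f1 dominates f2 if T[f1] is a proper subset of T[f2]."""
    return T[f1] < T[f2]

for fault in faults:
    print(fault, sorted(T[fault]))
```

Under these assumptions the three stuck-at-1 faults share the single test (0,0), and each input stuck-at-0 fault dominates the output stuck-at-0 fault, so tests for A/0 and B/0 also cover C/0.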

MicroLab, VLSI-19 (24/32)

Transistor level fault models

Š introduced due to imperfection of logic level fault models, especially for CMOS
Š technology dependent and thus more realistic
Š more complex to handle and thus not useful for large circuits
Š transistor level fault models:
Š Wadsack's model
Š Hayes' switch level model
Š Reddy's restrictions due to static discharge
⇒ robust test sets

MicroLab, VLSI-19 (25/32)

Transistor level fault models (cont.)

Wadsack's fault models for CMOS:
Š defects can lead to memory effects
Š faulty combinational logic may behave like sequential logic
Š this effect was modeled by introducing flip-flops in order to use stuck-at models
⇒ stuck-at syndrome!

[Figure: CMOS gate with inputs A, B and output Y; stuck-open fault sites asop, bsop, vddsop]

Truth table (fault-free column; the columns for stuck-at α/0, β/0, γ/0 and stuck-open a, b, vdd are not reproduced):
A B | Y
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
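
The memory effect can be illustrated with a small behavioural model. The Python sketch below assumes a two-input CMOS NOR gate (matching the fault-free truth table), with a hypothetical stuck-open defect in the NMOS transistor driven by A; the class and attribute names are made up for this illustration. When neither the pull-up nor the pull-down network conducts, the output node keeps its previous value.

```python
class NorWithStuckOpen:
    """Two-input CMOS NOR whose output node keeps its charge when floating."""

    def __init__(self, open_nmos_a=False, initial_y=0):
        self.open_nmos_a = open_nmos_a
        self.y = initial_y

    def apply(self, a, b):
        pull_up = (a == 0 and b == 0)                      # series PMOS path to VDD
        nmos_a_conducts = (a == 1) and not self.open_nmos_a
        pull_down = nmos_a_conducts or (b == 1)            # parallel NMOS path to GND
        if pull_up:
            self.y = 1
        elif pull_down:
            self.y = 0
        # otherwise the output floats and keeps its previous value (memory effect)
        return self.y

good = NorWithStuckOpen()
faulty = NorWithStuckOpen(open_nmos_a=True)

# Two-pattern test: initialise Y to 1 with (0,0), then apply (1,0)
sequence = [(0, 0), (1, 0)]
good_out = [good.apply(a, b) for a, b in sequence]
faulty_out = [faulty.apply(a, b) for a, b in sequence]
print(good_out, faulty_out)

# The same second vector after (0,1) leaves the fault hidden
masked = NorWithStuckOpen(open_nmos_a=True)
masked_out = [masked.apply(a, b) for a, b in [(0, 1), (1, 0)]]
print(masked_out)
```

The faulty gate answers the vector (1,0) differently depending on the vector applied before it, which is exactly the sequential behaviour that a plain stuck-at model cannot express without extra state.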

MicroLab, VLSI-19 (26/32)

Functional level fault models

Š VLSI circuits need simple fault models
Š goal of test: it is sometimes sufficient to know if a sub-function works correctly
⇒ model of functional faults of sub-circuit
Š each sub-function has its own process dependent faults
Š advantage:
Š fast simulation
Š short test time
Š process dependent
Š good knowledge on important sub-functions (e.g. RAMs)
Š disadvantage:
Š less accurate
Š not useful for all sub-functions

MicroLab, VLSI-19 (27/32)

Functional level fault models:
example

Š example of CMOS multiplexer with n inputs:
behavior under faults:
Š another input is selected
Š one of the n inputs has a stuck-at fault
Š two inputs are selected (AND or OR result at output)
Š if the complementary value arrives at a selected input, an error occurs at the output
Š if the complementary value of the selected input arrives at a neighbor of the selected input, an error occurs at the output

[Figure: 8-to-1 MUX with data inputs A0 to A7, select inputs S0, S1, S2 and output Y]

MicroLab, VLSI-19 (28/32)

Fault models summary

Š fault models are used to model the effects of fabrication defects on abstract levels
Š fault models allow a direct search for circuit defects
Š fault models need to be simple and precise
Š CMOS defects are poorly modeled by the stuck-at fault model

MicroLab, VLSI-19 (29/32)

Coming Up...
Next topic…
Test pattern generation and fault simulation

Readings for next time…
Weste:
‹ read 7 through 7.2.1

MicroLab, VLSI-19 (30/32)

Exercises: VLSI-19 #1

Š Ex vlsi19.1 (difficulty: easy): Calculate the yield of a circuit of area 5 mm² and 1 cm² if the defect rate D is 2 defects per cm².
Result: Y5mm2=0.91 (high yield), Y1cm2=0.24 (low yield equation), see vlsi-19/13
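
The quoted results can be reproduced numerically. The yield equations themselves are on slide vlsi-19/13, which is not part of this section, so the sketch below assumes two commonly used models: Murphy's model $Y = ((1 - e^{-AD})/(AD))^2$ for the high-yield case and Seeds' model $Y = e^{-\sqrt{AD}}$ for the low-yield case. The function names are illustrative.

```python
import math

def yield_murphy(area_cm2, defects_per_cm2):
    """Murphy's model, assumed here for small chips / high yield."""
    ad = area_cm2 * defects_per_cm2
    return ((1.0 - math.exp(-ad)) / ad) ** 2

def yield_seeds(area_cm2, defects_per_cm2):
    """Seeds' model, assumed here for large chips / low yield."""
    return math.exp(-math.sqrt(area_cm2 * defects_per_cm2))

D = 2.0                                        # defects per cm^2
y_small = round(yield_murphy(0.05, D), 2)      # 5 mm^2 = 0.05 cm^2
y_large = round(yield_seeds(1.0, D), 2)        # 1 cm^2
print(y_small, y_large)
```

With these assumed models the rounded values agree with the stated results of 0.91 and 0.24.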

Š Ex vlsi19.2 (difficulty: easy): Discuss the circuit's function with the introduction of the stuck-open fault Fx=open. Can this fault be modeled by a stuck-at fault?

[Figure: transistor network with inputs A, B, C, D and fault site X]
F = (A+C)(B+D)
F(X=OPEN) = __________

MicroLab, VLSI-19 (31/32)

Exercises: VLSI-19 #2

Š Ex vlsi19.3 (difficulty: easy):
Result: Y5mm2=0.91 (high yield), Y1cm2=0.24 (low yield equation), see vlsi-19/13

Š Ex vlsi19.4 (difficulty: easy): Discuss faults due to defects at the TTL NAND gate on transparency 22. What kind of stuck-at fault do you have if a) R1 is an open circuit, b) open at I1, c) open in R2?
Result: a) O s-a-1, b) I1 s-a-1, c) O s-a-1

MicroLab, VLSI-19 (32/32)

VLSI Design I
Test Pattern Generation and Fault Simulation

Let‘s test a chip?

Overview
Test pattern generation
Fault simulation
Goal: Design for testability terms like
controllability and observability are known. You are
familiar with test pattern algorithms as well as
with testability measure metrics.
MicroLab, VLSI-20 (1/26)

Testers
The device under test (DUT) can be a site on a wafer or a packaged part.

[Figure: tester with 100’s of pin circuitry channels connected to the DUT]

Each pin on the chip is driven/observed by a separate set of circuitry which typically can drive the pin to one data value per cycle or observe (“strobe”) the value of the pin at a particular point in a clock cycle. Timing of input transitions and sampling of outputs is controlled by a small (<< # of pins) number of high-resolution timing generators. To increase the number of possible input patterns, different data “formats” are provided:

[Figure: waveforms over one tCYCLE for each data format]
non-return-to-zero (NRZ): data
return-to-zero (RTZ): data
return-to-one (RTO): data
surround-by-complement (SBC): ~data data ~data

MicroLab, VLSI-20 (2/26)

Test pattern generation

Š test generation is a time consuming task
Š computer-aided test programs (CAT) help designer but do not solve test problems
Š approaches to manage test problem with increasing circuit complexity (research fields)
Š design for testability
Š algorithms to generate good test vectors

design for testability: controllability, observability
Š system designer needs DFT knowledge
Š ad-hoc approaches to augment controllability: partitioning, more test-pads
Š structured methods, multiplexer approach, scan-path, built-in logic block observation (BILBO), boundary-scan, signature analysis, etc.

MicroLab, VLSI-20 (3/26)

Algorithms for test pattern
generation

basic concepts for test generation for stuck-at fault models in combinational circuits
Š algebraic test generation: boolean difference
Š D-algorithm
Š Podem and FAN algorithms
Š controllability and observability measuring

MicroLab, VLSI-20 (4/26)

Boolean difference
Š algebraic method: boolean difference
Š circuit function with input vector x:
$f(x) = f(x_1, \ldots, x_n)$
Š for the i-th component of vector x held at a fixed value we define
$f_i(1) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n)$
$f_i(0) = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$
Š definition of boolean difference
$\frac{\partial f(x)}{\partial x_i} = f(x_1, \ldots, x_i, \ldots, x_n) \oplus f(x_1, \ldots, \bar{x}_i, \ldots, x_n)$
$\frac{\partial f(x)}{\partial x_i} = f_i(0) \oplus f_i(1)$
Š circuit with fault α: stuck-at-1 at input $x_i$
$f_\alpha(x) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) = f_i(1)$
Š to detect s-a-1 faults the two functions $f(x)$ and $f_\alpha(x)$ must produce different results, so the test vector set is defined by T = 1:
$T = f(x) \oplus f_\alpha(x) = \bar{x}_i \, \frac{\partial f}{\partial x_i}$
Š for s-a-0 faults:
$T = f(x) \oplus f_\alpha(x) = x_i \, \frac{\partial f}{\partial x_i}$
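
The definitions above translate directly into a truth-table computation. The following Python sketch is a minimal illustration, not part of the original slides: the function `f` is a made-up example ($f = x_1 x_2 + x_3$), inputs are indexed from 0, and the helper names are hypothetical.

```python
from itertools import product

def boolean_difference(f, i, n):
    """Input vectors for which df/dx_i = f_i(0) xor f_i(1) equals 1."""
    vectors = []
    for x in product((0, 1), repeat=n):
        f_i0 = f(*x[:i], 0, *x[i + 1:])
        f_i1 = f(*x[:i], 1, *x[i + 1:])
        if f_i0 ^ f_i1:
            vectors.append(x)
    return vectors

def stuck_at_vectors(f, i, n, stuck):
    """T = not(x_i) * df/dx_i for s-a-1, T = x_i * df/dx_i for s-a-0."""
    return [x for x in boolean_difference(f, i, n) if x[i] != stuck]

def f(x1, x2, x3):
    """Hypothetical example function: f = x1 x2 + x3."""
    return (x1 & x2) | x3

sa1 = stuck_at_vectors(f, 0, 3, stuck=1)   # tests for x1 s-a-1
sa0 = stuck_at_vectors(f, 0, 3, stuck=0)   # tests for x1 s-a-0
print(sa1, sa0)
```

For this example the boolean difference with respect to $x_1$ is $x_2 \bar{x}_3$, so each fault on $x_1$ has a single test vector: $x_1$ carries the value opposite to the stuck value, $x_2 = 1$ and $x_3 = 0$.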
MicroLab, VLSI-20 (5/26)

Boolean difference: Rules
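$\frac{\partial \overline{f(x)}}{\partial x_i} = \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial f(x)}{\partial \bar{x}_i} = \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial}{\partial x_i}\,\frac{\partial f(x)}{\partial x_j} = \frac{\partial}{\partial x_j}\,\frac{\partial f(x)}{\partial x_i}$
$\frac{\partial [f(x) g(x)]}{\partial x_i} = f(x)\,\frac{\partial g(x)}{\partial x_i} \oplus g(x)\,\frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i}\,\frac{\partial g(x)}{\partial x_i}$
$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \overline{f(x)}\,\frac{\partial g(x)}{\partial x_i} \oplus \overline{g(x)}\,\frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i}\,\frac{\partial g(x)}{\partial x_i}$
$\frac{\partial [f(x) g(x)]}{\partial x_i} = g(x)\,\frac{\partial f(x)}{\partial x_i}$ (g(x) independent of $x_i$)
$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \overline{g(x)}\,\frac{\partial f(x)}{\partial x_i}$ (g(x) independent of $x_i$)
$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \frac{\partial [\overline{f(x)} \cdot \overline{g(x)}]}{\partial x_i}$
$\frac{\partial [f(x) \oplus g(x)]}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial g(x)}{\partial x_i}$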
MicroLab, VLSI-20 (6/26)

Boolean difference: example

Example Ex 20.1 (medium): circuit with stuck-at-1 fault at x3. Find all test patterns which detect the fault by means of the boolean difference.

[Figure: example circuit with inputs x1 to x4, gates G1 to G5, internal nodes s1 to s3 and output y]

MicroLab, VLSI-20 (7/26)

Test generation: D-algorithm

Basics:
Š fault sensitisation (provoke error)
Š fault propagation
Š line justification

Š D-notation
Š a signal with value D is fault free if D = 1
Š a signal with value D is faulty if D = 0
Š a signal with value D̄ is fault free if D̄ = 0
Š a signal with value D̄ is faulty if D̄ = 1

Š very formal table manipulation procedure


Š advantage:
Š if test vector exists it will be found
Š programmable for computers
Š disadvantage:
Š conflicts lead to time consuming dummy calculations
Š not usable for large circuits
MicroLab, VLSI-20 (8/26)

Test generation: Path sensitisation
Example Ex20.2 (easy): circuit with stuck-at-1 fault at x3. Find test vectors by means of the D-algorithm
Š sensitization:
Š fault propagation:
Š line justification
Step 1: Sensitize circuit. Find input values that produce a value on the faulty node that’s different from the value forced by the fault. For our S-A-1 fault above, want output of AND gate to be 0.
Š Is this always possible? What would it mean if no such input values exist?
Š Is the set of sensitizing input values unique? If not, which should one choose?
Š What’s left to do?

[Figure: example circuit with inputs x1 to x4, gates G1 to G5, internal nodes s1 to s3 and output y; the S-A-1 fault is marked X at s3]

MicroLab, VLSI-20 (9/26)

Test generation: Fault propagation
Š sensitization:
Š fault propagation:
Š line justification
Step 2: Fault propagation. Select a path that propagates the faulty value to an observed output (y in our example).

[Figure: example circuit with inputs x1 to x4, gates G1 to G5, internal nodes s1 to s3 and output y; the S-A-1 fault is marked X at s3]

MicroLab, VLSI-20 (10/26)

Test generation: Line justification
Š sensitization:
Š fault propagation:
Š line justification
Step 3: Line justification. Find a set of input values that enables the selected path (backtracking).
Š Is this always possible? What would it mean if no such input values exist?
Š Is the set of enabling input values unique? If not, which should one choose?

[Figure: example circuit with inputs x1 to x4, gates G1 to G5, internal nodes s1 to s3 and output y; the S-A-1 fault is marked X at s3]
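
The three steps can be illustrated with the D-notation by simulating the fault-free and the faulty circuit side by side. The Python sketch below is an exhaustive search, not the D-algorithm itself. It assumes the following reading of the example circuit, inferred from the figure labels and not confirmed by the slides: s1 = x1 + x2, s2 = s1·x3, s3 = x3·x4, y = s2 + s3, with the S-A-1 fault at s3. Each signal is a pair (fault-free value, faulty value), so D is (1, 0) and D̄ is (0, 1).

```python
from itertools import product

D, DBAR = (1, 0), (0, 1)   # (fault-free value, faulty value)

def AND(a, b):
    return (a[0] & b[0], a[1] & b[1])

def OR(a, b):
    return (a[0] | b[0], a[1] | b[1])

def simulate(x1, x2, x3, x4):
    """Return the composite values of the fault site s3 and of the output y."""
    x1, x2, x3, x4 = [(v, v) for v in (x1, x2, x3, x4)]
    s1 = OR(x1, x2)
    s2 = AND(s1, x3)
    s3 = AND(x3, x4)
    s3 = (s3[0], 1)            # inject s-a-1 into the faulty copy only
    y = OR(s2, s3)
    return s3, y

vectors = []
for v in product((0, 1), repeat=4):
    s3, y = simulate(*v)
    sensitized = s3 in (D, DBAR)      # step 1: fault site differs from stuck value
    propagated = y in (D, DBAR)       # step 2: difference reaches the output
    if sensitized and propagated:     # step 3: v justifies all required line values
        vectors.append(v)
print(len(vectors), vectors)
```

Under the assumed circuit, sensitization requires x3·x4 = 0 and propagation through the output gate requires s2 = 0, which leaves nine input vectors. A real D-algorithm reaches such vectors by table manipulation and backtracking instead of enumerating all inputs.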

MicroLab, VLSI-20 (11/26)

Test generation: PODEM, FAN

Š more recent algorithms like Podem, FAN or others basically intend to prevent conflict situations or to detect them as early as possible
Š concept:
Š take decisions as late as possible (prevent wrong decisions, perhaps there is later nothing to decide)
Š heuristics help to take decisions which succeed with higher probability
⇒ controllability and observability measuring necessary

MicroLab, VLSI-20 (12/26)

PODEM

Š Podem algorithm is simpler to understand than D-algorithm
Š backtrack (branch-and-bound) algorithm is used
Š small steps to reach objective
Š if objective leads to dead-end, go back
Š backtrack (branch-and-bound) in Podem:
Š all signals are initialised to "X"
Š fault sensitisation
Š during fault propagation D symbols are propagated only one step to primary outputs (branch)
Š immediate line justification of selected signal to primary inputs (each new input value corresponds to a branch of the decision tree)
Š succeeding fault simulation immediately detects conflict situations (bound)
Š new branch

MicroLab, VLSI-20 (13/26)

PODEM: Example

Š branch-and-bound tree
Š nodes represent decisions
Š branches represent PI's
9 represent 1st decision faulty
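Š example x1 stuck-at-1

[Figure: decision tree beginning at “start” with branches x1=0 and x2=1, next to the example circuit with inputs x1 to x4, gates G1 to G5, internal nodes s1 to s3 and output y]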
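
The branch-and-bound idea can be sketched in a few lines. The Python code below is a strong simplification in the spirit of Podem, not the published algorithm: it assigns primary inputs in a fixed order instead of deriving them from objectives, and it uses three-valued simulation (0, 1, X) of the fault-free and the faulty circuit to detect conflicts early. The circuit is the same assumed reading of the example as before (s1 = x1 + x2, s2 = s1·x3, s3 = x3·x4, y = s2 + s3), and all names are illustrative.

```python
X = None   # unassigned input / unknown signal value

def and3(a, b):
    if a == 0 or b == 0:
        return 0
    return X if a is X or b is X else 1

def or3(a, b):
    if a == 1 or b == 1:
        return 1
    return X if a is X or b is X else 0

def circuit(x1, x2, x3, x4):
    """Assumed reading of the example circuit (three-valued simulation)."""
    s1 = or3(x1, x2)
    s2 = and3(s1, x3)
    s3 = and3(x3, x4)
    return or3(s2, s3)

def search(assign, fault_input, stuck):
    """Branch-and-bound over primary inputs for a stuck-at fault on one input."""
    if assign[fault_input] == stuck:
        return None                      # bound: fault is not excited
    good = circuit(*assign)
    forced = list(assign)
    forced[fault_input] = stuck
    faulty = circuit(*forced)
    if good is not X and faulty is not X:
        return tuple(assign) if good != faulty else None   # test found / bound
    i = assign.index(X)                  # next undecided primary input
    for value in (0, 1):                 # branch
        found = search(assign[:i] + [value] + assign[i + 1:], fault_input, stuck)
        if found is not None:
            return found
    return None

vector = search([X, X, X, X], fault_input=0, stuck=1)   # x1 stuck-at-1
print(vector)
```

In this run the decision x3 = 0 is bounded as soon as both output values are known and equal, and the search backs up to x3 = 1 before completing the vector.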

MicroLab, VLSI-20 (14/26)

Test pattern generation: Heuristics

Heuristics in FAN algorithm
Š fault propagation
Š propagate to PO on path which is best observable
Š line justification
Š start with the most difficult path to control
Š heuristics help to find test vectors faster

MicroLab, VLSI-20 (15/26)

Controllability and observability
measure
Š heuristics are often used to solve NP-complete problems
Š heuristics do not guarantee to find a solution in a given time
Š testability measure methods:
Š temas, testscreen, victor, camelot, scoap

Š Sandia controllability/observability analysis program (scoap)
Š each node in a circuit gets values for its controllability, observability and testability
Š high values indicate nodes which are hard to control or to observe
Š distinguish between "1" and "0" controllability
Š distinguish between combinational and sequential values

MicroLab, VLSI-20 (16/26)

Observability & Controllability
When propagating faulty values to observed outputs we are often faced with several choices for which should be the next gate in our path.

[Figure: faulty node X fanning out to two gates, each marked “?”]

We’d like to have a way to measure the observability of a node, i.e., some indication of how hard it is to observe the node at the outputs of the chip. During fault propagation we could choose the gate whose output was easiest to observe.

Similarly, during backtracking we need a way to choose between alternative ways of forcing a particular value:

[Figure: gate whose output should be 0 (“want 0 here”); which input should we try to set to 0?]

In this case, we’d like to have a way to measure the controllability of a node, i.e., some indication of how easy it is to force the node to 0 or 1. During backtracking we could choose the input that was easiest to control.
MicroLab, VLSI-20 (17/26)

Testability measurement:
Scoap algorithm
Š combinational "1" and "0" controllability of a logic gate output y dependent on inputs x1..x3
OR gate:
CC0(y) = CC0(x1) + CC0(x2) + CC0(x3) + 1
CC1(y) = min{CC1(x1), CC1(x2), CC1(x3)} + 1
AND gate:
CC0(y) = min{CC0(x1), CC0(x2), CC0(x3)} + 1
CC1(y) = CC1(x1) + CC1(x2) + CC1(x3) + 1
Š combinational observability of a logic gate input x1 dependent on output y and inputs x2, x3
OR gate:
CO(x1) = CO(y) + CC0(x2) + CC0(x3) + 1
AND gate:
CO(x1) = CO(y) + CC1(x2) + CC1(x3) + 1
initialization (N are internal nodes, X, Y are PIs, POs)
CC0(X) = 1   CC0(N) = ∞   CO(Y) = 0
CC1(X) = 1   CC1(N) = ∞   CO(N) = ∞
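
The formulas lend themselves to a forward pass for controllability followed by a backward pass for observability. The Python sketch below applies them to a small hypothetical netlist (z = a · (b + c), with internal node s = b + c); the netlist format and all names are assumptions made for this illustration, and only AND and OR gates are handled.

```python
import math

# Hypothetical netlist: (output, gate type, inputs), listed from inputs to output
NETLIST = [
    ("s", "OR", ("b", "c")),
    ("z", "AND", ("a", "s")),
]
PRIMARY_INPUTS = ("a", "b", "c")
PRIMARY_OUTPUTS = ("z",)

def scoap(netlist, primary_inputs, primary_outputs):
    cc0 = {x: 1 for x in primary_inputs}
    cc1 = {x: 1 for x in primary_inputs}
    # Controllability: forward pass from inputs to outputs
    for out, kind, ins in netlist:
        if kind == "AND":
            cc0[out] = min(cc0[i] for i in ins) + 1
            cc1[out] = sum(cc1[i] for i in ins) + 1
        elif kind == "OR":
            cc0[out] = sum(cc0[i] for i in ins) + 1
            cc1[out] = min(cc1[i] for i in ins) + 1
        else:
            raise ValueError(f"unsupported gate type: {kind}")
    # Observability: backward pass from outputs to inputs
    co = {node: math.inf for node in cc0}
    for y in primary_outputs:
        co[y] = 0
    for out, kind, ins in reversed(netlist):
        side = cc1 if kind == "AND" else cc0   # value that enables the other inputs
        for x in ins:
            others = sum(side[i] for i in ins if i != x)
            co[x] = min(co[x], co[out] + others + 1)   # fan-out: keep the minimum
    return {n: (cc0[n], cc1[n], co[n]) for n in cc0}

values = scoap(NETLIST, PRIMARY_INPUTS, PRIMARY_OUTPUTS)
for node, triple in values.items():
    print(node, triple)
```

Each node is reported as a (CC0, CC1, CO) triple. In this netlist the inputs b and c behind the OR gate get the largest observability value, which marks them as the hardest nodes to observe.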
MicroLab, VLSI-20 (18/26)

Testability measurement:
Scoap algorithm (cont.)
hmmm. I guess smaller numbers are better...

“testability” measure assumes that the further a node is from an input/output the harder it is to set/observe

AND gate (inputs A, B, output Z):
CC0(Z) = min[CC0(A), CC0(B)] + 1
CC1(Z) = CC1(A) + CC1(B) + 1
CO(A) = CO(Z) + CC1(B) + 1
CO(B) = CO(Z) + CC1(A) + 1
if more than one, choose min

OR gate (inputs A, B, output Z):
CC0(Z) = CC0(A) + CC0(B) + 1
CC1(Z) = min[CC1(A), CC1(B)] + 1
CO(A) = CO(Z) + CC0(B) + 1
CO(B) = CO(Z) + CC0(A) + 1

[Figure: example circuit annotated with CC0,CC1,CO triples; primary inputs are initialised to 1,1,- and primary outputs to -,-,0]

MicroLab, VLSI-20 (19/26)

Fault simulation

goals of fault simulation:
Š analyze circuit under fault conditions
Š qualify test sequence, fault coverage
Š reduce fault set during test generation
⇒ good quality fault models necessary

fault simulation methods
Š parallel fault simulation
Š concurrent fault simulation
Š deductive fault simulation

Š alternative to fault simulation in test generation procedures:
Š tracing fault sensitive paths

MicroLab, VLSI-20 (20/26)

Fault simulation (cont.)

Š parallel fault simulation
Š principle: computing with 1-bit or n-bit wide variables needs similar computing time
Š test of n-1 faults at the same time

bit position   fault
1              fault free
2              A s-a-1
3              B s-a-1
4              C s-a-1
5              C s-a-0

signal   fault mask    fault values
A        MA=[01000]    [01000]
B        MB=[00100]    [00100]
C        MC=[00011]    [00010]

[Figure: OR gate (≥1) with inputs A, B and output C; a mask stage MA, MB, MC follows each signal]
A=[00000] → A'=[01000]
B=[00000] → B'=[00100]
C=[01100] → C'=[01110]
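
The mask scheme maps directly onto bitwise operations on machine words. The Python sketch below reproduces the OR-gate example with the masks and fault values listed above; bit position 1 is the leftmost bit, and the helper names (`word`, `broadcast`, `inject`) are made up for this illustration.

```python
WIDTH = 5                      # bit 1 (leftmost) = fault free, bits 2..5 = faults
FULL = (1 << WIDTH) - 1
FAULTS = ["fault free", "A s-a-1", "B s-a-1", "C s-a-1", "C s-a-0"]

def word(bits):
    """Parse a bit string such as '01000' (position 1 is leftmost)."""
    return int(bits, 2)

def broadcast(bit):
    """Copy one logic value into all bit positions."""
    return FULL if bit else 0

def inject(signal, mask, values):
    """Overwrite the masked bit positions with the stuck-at values."""
    return (signal & ~mask & FULL) | (values & mask)

def simulate(a, b):
    a_word = inject(broadcast(a), word("01000"), word("01000"))   # MA
    b_word = inject(broadcast(b), word("00100"), word("00100"))   # MB
    c_word = a_word | b_word                                      # OR gate, all copies at once
    return inject(c_word, word("00011"), word("00010"))           # MC

def detected_faults(c_word):
    """Faults whose bit differs from the fault-free bit in position 1."""
    bits = format(c_word, f"0{WIDTH}b")
    return [FAULTS[i] for i in range(1, WIDTH) if bits[i] != bits[0]]

result = simulate(0, 0)
print(format(result, "05b"))
print(detected_faults(result))
```

For the input A=0, B=0 the output word is 01110, as in the example: one pass through the gate shows that this vector detects the three stuck-at-1 faults but not C s-a-0.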

MicroLab, VLSI-20 (21/26)

Fault Grading
So, you’ve constructed a set of test vectors
using the techniques described here. Will
they detect all the faulty parts?

Š You could see how many different faults your vectors detect by inserting each possible fault one at a time, running the vectors, then check to see if some output was different from the “good” machine on some cycle. Need *lots* of simulation… probably impractical for large circuits even with hardware-accelerated simulator.

Š You can use the same sorts of statistical sampling techniques that other QA programs employ: randomly select a set of faults, fault grade your vectors on those faults and use standard statistical techniques to see if fault coverage exceeds a desired level. The level of confidence may be increased by increasing the number of samples.
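
Both approaches can be sketched on a toy circuit. The Python code below assumes a hypothetical circuit y = a·b + c with single stuck-at faults on every node; it grades a vector set first against all faults (serial fault simulation) and then against a random sample of faults. The circuit, the vector set and all names are made up for this illustration, and no statistical confidence calculation is included.

```python
import random

NODES = ("a", "b", "c", "s", "y")

def simulate(vector, fault=None):
    """Evaluate y = a*b + c; fault is None or (node, stuck_value)."""
    def force(name, value):
        return fault[1] if fault is not None and fault[0] == name else value
    a = force("a", vector[0])
    b = force("b", vector[1])
    c = force("c", vector[2])
    s = force("s", a & b)
    return force("y", s | c)

def detects(vectors, fault):
    """True if some vector gives an output different from the good machine."""
    return any(simulate(v) != simulate(v, fault) for v in vectors)

def coverage(vectors, faults):
    return sum(detects(vectors, f) for f in faults) / len(faults)

all_faults = [(node, value) for node in NODES for value in (0, 1)]
vectors = [(1, 1, 0), (0, 1, 0), (0, 0, 1)]

full_coverage = coverage(vectors, all_faults)          # one simulation run per fault
sample = random.Random(0).sample(all_faults, 5)        # grade a random subset only
estimated_coverage = coverage(vectors, sample)
print(full_coverage, estimated_coverage)
```

The full grading finds that only b s-a-1 escapes these three vectors. The sampled estimate is cheaper but depends on whether the undetected fault happens to be drawn, which is why the sample size governs the confidence in the estimate.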
MicroLab, VLSI-20 (22/26)

Conclusion

Š defects during chip fabrication are inevitable
Š faults model defects on higher abstraction levels
Š higher chip complexity, more gates and fewer pads reduce controllability, observability and thus testability
Š test pattern generation is going to be time consuming and thus costly
Š structured design for test during chip development is required

MicroLab, VLSI-20 (23/26)

Coming Up...
Š Next topic…
Design for Testability

Readings for next time…
Weste:
‹ Sections 7.2.2 through 7.2.5

MicroLab, VLSI-20 (24/26)

Exercises: VLSI-20 #1

Š Ex vlsi20.3 (difficulty: medium): The digital circuit suffers from error α s-a-0. Try to find test patterns by means of the D-algorithm. If you don’t succeed use the boolean difference to calculate the test patterns.
Result: T=x1 x2 x3 x4 found by boolean difference

[Figure: circuit of gates G1 to G6 (all drawn with the & symbol) with inputs x1 to x4, fault site α and output y]

MicroLab, VLSI-20 (25/26)

Exercises: VLSI-20 #2

Š Ex vlsi20.4 (difficulty: easy): Calculate the Scoap combinational controllability and observability values for the circuit below (CC0,CC1,CO)
Result: x1 (1,1,7), x2 (1,1,7), x3 (1,1,5), x4 (1,1,5), s1 (3,2,5), s2 (2,4,3), s3 (2,3,3), y (4,5,0)

[Figure: example circuit with inputs x1 to x4, gates G1, G3, G4, G5, internal nodes s1 to s3 and output y]

Š Ex vlsi20.5 (difficulty: easy): a) circuit with stuck-at-0 fault at s1. b) circuit with stuck-at-0 fault at x1. Find all test patterns which detect the fault by means of the boolean difference.
Result equations a) x=(x1+x2)x3, b) x=x1x2x3

MicroLab, VLSI-20 (26/26)

VLSI Systems Design
Top-down Design and HDLs

It seems I
have to
hurry up!

Overview
? Top down design-flow, VHDL hardware description
language, test-bench methodology

Goal: You are able to design circuits with the VHDL


language with behavioral, dataflow and structural
modeling. You are familiar with the top down design flow
and the test-bench methodology.
MicroLab, VLSI-21 (1/94)

chapter 1

The Need for HDLs


A specification is an engineering contract that lists all the goals for a project:

?goals include area, power, throughput, latency, functionality, test coverage, costs (NREs and piece costs). Helps you figure out when you’re done and how to make engineering tradeoffs. Later on goals help remind everyone (especially management) what was agreed to!

?partition the project into modules with well-defined interfaces so that each module can be worked on by a separate team. Gives the SW types a head start too! (Hardware/software codesign)

?A behavioral model serves as an executable specification that documents the exact behavior of all the individual modules and their interfaces. Since one can run tests, this model can be refined and finally verified through simulation.

We need a way to talk about what hardware should do without actually designing the hardware itself, i.e., need to separate functionality from implementation. We need a Hardware Description Language
MicroLab, VLSI-21 (2/94)

The Need for HDLs cont.

?easier to explore ideas in HDLs than in logic gates
?stepwise refinement: HDLs allow designs to be described at various levels of abstraction
?HDLs sustain description-synthesis method
?pitfalls: abstract models are not precise

?first HDLs were introduced in late 70s
?difficulties to develop general purpose HDL for signal-processing and real-time applications and ...
?portability needs lead to standardizations (Institute of Electrical and Electronics Engineers, IEEE)

MicroLab, VLSI-21 (3/94)

Hardware Description Languages

?textual HDLs
VHDL, Verilog-HDL, HardwareC, etc.
?graphic HDLs
SpecCharts, etc. (control & dataflow graphs)
?tabular HDLs
BIF, etc. (FSMD models in tabular forms)
?time-diagram HDLs
Waves, etc.

?Standardization
? VHDL: IEEE Std 1076-1987 & 1993
? std_logic package IEEE Std 1164-1993
? Verilog-HDL: IEEE Std 1364-1995

MicroLab, VLSI-21 (4/94)

A Tale of Two HDLs
VHDL
VHSIC HDL, Very High Speed Integrated Circuits. ADA-like verbose syntax, lots of redundancy
Extensible types and simulation engine. Logic representations are not built in and have evolved with time (IEEE-1164).
Design is composed of entities each of which can have multiple architectures. A configuration chooses what architecture is used for a given instance of an entity.
Behavioral, structural, logic-level modeling
Synthesizable subset...
Harder to learn and use, not technology-specific, DoD mandate.

Verilog-HDL
C-like concise syntax
Built-in types and logic representations. Oddly, this has led to slightly incompatible simulators from different vendors.
Design is composed of modules.
Behavioral, structural, logic-level modeling
Synthesizable subset...
Easy to learn and use, fast simulation, good for logic. Gateway Design Automation
MicroLab, VLSI-21 (5/94)

Introduction to VHDL & Verilog

VHDL
?rich & powerful language
?data type driven language
?goal: documentation of large complex systems
language structures
?entity (hierarchy interface)
?architecture (behavior of system)
?configuration (binding of entity and architecture)
?package (library of global types or blocks)

Verilog-HDL
?simple & efficient language
?hardware driven language
?goal: automatic synthesis
language structures
?module (blocks or sub-blocks)
?`include (file structuring)
MicroLab, VLSI-21 (6/94)

Introduction to VHDL & Verilog cont.

language features
?signal data types (in, out, bidir, signal-strength ...)
?hardware structures (memory, register-files, ...)
?logic operators (shift, rotation, masking, ...)
?asynchronous structures (set, reset of memories)
?parallel or synchronous structures
?constraints (pin, technology, area, delays, ...)
?inter-process communications (shared medium,
message passing, ...)

VHDL

MicroLab, VLSI-21 (7/94)

chapter 2

Signals, Delays, Events, Concurrency

?digital systems in contrast to software systems are fundamentally about signals
?signals in contrast to variables do have delays which leads to signal waveforms
?digital systems are comprised of components
?digital systems do have concurrency of operation
?events on signals lead to computations that may generate events on other signals

[Figure: waveforms of sum and carry with one event marked; time axis from 5 to 40 ns]

MicroLab, VLSI-21 (8/94)

Signal Values

?signal values are physically associated to wires
?VHDL language supports signal type:
?type: bit, values: ‘0’, ‘1’
?type: bit_vector, values: “0001”, etc
?VHDL package IEEE 1164 supports signal type:
?type: std_ulogic and vector std_ulogic_vector
?std_ulogic is a 9 value logic

value interpretation

U un-initialized
X forcing unknown
0 forcing 0
1 forcing 1
Z high impedance
W weak unknown
L weak 0
H weak 1
- don’t care

MicroLab, VLSI-21 (9/94)

Resolved Signals

?it is common for components in a digital system to have multiple sources for the value of an input signal
?many designs use buses: a group of signals that can be shared among multiple sources
?the values on shared signals are determined by the type of interconnection, like wired logic
?the signal values depend on its implementation
? the VHDL simulator has to resolve the signal's value
? The IEEE 1164 package offers the std_logic and std_logic_vector signal types as resolved versions of std_ulogic and std_ulogic_vector

[Figure: wired-or logic (resolved signal necessary) next to an unresolved signal]
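
What resolving a signal means can be shown with a small lookup table. The Python sketch below is intended to mirror the resolution table of the std_logic_1164 package for the nine values U, X, 0, 1, Z, W, L, H and -; the table is written down from memory here and should be checked against the package source before relying on it. The function names are illustrative, and the handling of a single '-' driver is simplified.

```python
from functools import reduce

STD_ULOGIC = "UX01ZWLH-"

# Rows and columns follow the order of STD_ULOGIC
RESOLUTION_TABLE = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]

def resolve_pair(a, b):
    """Value seen on a wire driven by the two values a and b."""
    return RESOLUTION_TABLE[STD_ULOGIC.index(a)][STD_ULOGIC.index(b)]

def resolved(drivers):
    """Combine the values of all drivers of one signal, starting from 'Z'."""
    return reduce(resolve_pair, drivers, "Z")

print(resolved("Z1"))   # one driver released, one driving '1'
print(resolved("01"))   # conflict between two forcing drivers
print(resolved("LZ"))   # weak pull-down only
print(resolved("L1"))   # forcing '1' overrides weak '0'
```

A bus with tri-state drivers is the typical use: all inactive drivers contribute 'Z', so the one active driver determines the value, while two active drivers that disagree yield 'X'.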

MicroLab, VLSI-21 (10/94)

chapter 3

Entity

?the design entity is a primary programming abstraction in VHDL
?entity defines the interface of a component, without giving any information about the component behavior

[Figure: half adder block with inputs a, b and outputs sum, carry]

entity HalfAdder is
port (a,b : in bit;
sum,carry : out bit);
end HalfAdder;

library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
port (a,b : in std_ulogic;
sum,carry : out std_ulogic);
end HalfAdder;

MicroLab, VLSI-21 (11/94)

Exercises Ex401: Entity

?Ex401 (difficulty: easy): Define the entities of the following digital components. Use the unresolved 9 value logic of the IEEE 1164 package. Each component has to be edited in a separate file with the component's name plus extension “.vhd”. The files have to be analyzed by the Synopsys command: gvan

[Figure: three components: Mux4to1 (8 bit data inputs i0 to i3, select sel, output z), D_ff (inputs d, clk, sNot, rNot, outputs q, qNot) and Alu32 (32 bit data inputs a, b, 6 bit op-code op, outputs c, n, z)]

Use a capital first letter for component names and a lower-case first letter for signal names.
MicroLab, VLSI-21 (12/94)

Architecture

?the design architecture is a primary programming abstraction in VHDL
?architecture describes the internal behavior of a
component, without giving any information about
the component IO’s
?The behavioral description can take many forms.
These forms differ in the levels of abstraction and
detail.

architecture behavior of HalfAdder is


-- comment: declaration of variables
begin
...
functional description
of the system

end behavior;

MicroLab, VLSI-21 (13/94)

Entity-Architecture: Hierarchy
(VHDL vs. Verilog)

VHDL:

library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
port (a,b,ci: in std_logic; co,s:out std_logic);
end FullAdder;

architecture behavior of FullAdder is
-- comment: declaration of variables
begin
...
functional description
of the system
end behavior;

Verilog-HDL:

module FullAdder (a,b,ci,co,s);
input a,b,ci;
output co,s;
/* comment: declarations of variables */
...
functional description
of the system
endmodule
MicroLab, VLSI-21 (14/94)

Concurrency
?The operation of digital systems is inherently
concurrent
?Within VHDL signals are assigned values using
signal assignment statements <=
?Multiple signal assignment statements are executed
concurrently
concurrent signal assignment:

architecture concurrent_behavior of HalfAdder is
begin
sum <= (a xor b) after 5 ns;
carry <= (a and b) after 5 ns;
end concurrent_behavior;

[Figure: waveforms of sum and carry; time axis from 5 to 40 ns]
MicroLab, VLSI-21 (15/94)

Dataflow Model #1

library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdder is
port (a,b: in std_logic;
carry,sum:out std_logic);
end HalfAdder;

architecture dataflow of HalfAdder is


begin
sum <= (a xor b) after 5 ns;
carry <= (a and b) after 5 ns;
end dataflow;

MicroLab, VLSI-21 (16/94)

Dataflow Model #2
[Figure: full adder schematic; statements L1 to L5 drive s1, s2, s3, sum and cOut]

library IEEE;
use IEEE.std_logic_1164.all;

entity FullAdder is
port (a,b,cIn: in std_logic;
cOut,sum: out std_logic);
end FullAdder;

architecture dataflow of FullAdder is
-- architecture declarative segment
signal s1,s2,s3 : std_logic;
constant gate_delay: Time:=5 ns;
begin
-- architecture body
L1: s1 <= (a xor b) after gate_delay;
L2: s2 <= (cIn and s1) after gate_delay;
L3: s3 <= (a and b) after gate_delay;
L4: sum <= (s1 xor cIn) after gate_delay;
L5: cOut <= (s2 or s3) after gate_delay;
end dataflow;

MicroLab, VLSI-21 (17/94)

Signal Assignments #1

?simple signal assignments

sum<=(a xor b) after 5 ns, (a or b) after 10 ns, (not a) after 15 ns;

sig <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;

[Figure: waveform of sig; time axis from 5 to 40 ns]

clock <= '0', not(clock) after 5 ns;

[Figure: waveform of clock; time axis from 5 to 40 ns]

a <= "00000000_00000000", to_stdlogicvector(x"abcd") after 5 ns;

Type conversion from hexadecimal to std_logic_vector is defined in package std_logic_1164
MicroLab, VLSI-21 (18/94)

Conditional Signal Assignment #2

?The right hand value is computed immediately and assigned at some point in the future using the after clause

library IEEE;
use IEEE.std_logic_1164.all;

entity Mux4to1 is
port (i0,i1,i2,i3: in std_logic_vector(7 downto 0);
sel : in std_logic_vector(1 downto 0);
z : out std_logic_vector(7 downto 0));
end Mux4to1;

architecture dataflow of Mux4to1 is
begin
-- one single signal assignment
z <= i0 after 5 ns when sel="00" else
i1 after 5 ns when sel="01" else
i2 after 5 ns when sel="10" else
i3 after 5 ns when sel="11" else
"00000000" after 5 ns;
end dataflow;

MicroLab, VLSI-21 (19/94)

Exercises Ex402: Conditional Signal
Assignment
?Ex402 (difficulty: easy): Define the VHDL code of a 1bit ALU with the operations: AND, OR, FullAdder. Use the resolved 9 value logic of the IEEE 1164 package. The Simple1bitALU.vhd file has to be analyzed and simulated by the Synopsys commands: gvan and vhdldbx

[Figure: ALU block with inputs a, b, carryIn, opcode and outputs carry, result]

MicroLab, VLSI-21 (20/94)

Delays: Delta Delay Model
?The VHDL language distinguishes between three delay models:
?Delta delay model
?Inertial delay model (default)
?Transport delay model
? Delta delay model
?If no delay is specified, a delta delay is assumed. A delta delay is an infinitesimally small delay: the simulator uses it to order events, and any number of delta delays sums to zero simulation time.

architecture delta_delay of Comb is
signal s1,s2,s3,s4: std_logic:='0';
begin
s1 <=not(in1);
s2 <=not(in2);
s3 <=not(s1 and in2);
s4 <=not(s2 and in1);
z <=not(s3 and s4);
end delta_delay;

[Figure: schematic of Comb with inputs in1, in2, internal nodes s1 to s4 and output z; waveforms of in2, s2, s3 and z from 0 to 70, with the event at time 10 expanded into delta steps Δ, 2Δ, 3Δ]
MicroLab, VLSI-21 (21/94)

Delays: Inertial Delay Model
?Digital circuits have a certain amount of inertia. For example it takes a finite amount of time and a certain amount of energy for the output of a gate to respond to a change on the input
? Inertial delay model (default)
?a pulse shorter than the propagation delay will not propagate to the output

out1 <= (a xor b) after 8 ns;
out2 <= (a xor b) after 2 ns;

[Figure: gate with input and out; waveforms of input, out1 (output for delay: 8 ns) and out2 (output for delay: 2 ns); time axis from 5 to 40 ns]

VHDL’93!
sum <=reject 2 ns inertial (a xor b) after 5 ns;
MicroLab, VLSI-21 (22/94)

Delays: Transport Delay Model

?Unlike switching devices, wires have comparatively little inertia. As a result, wires will propagate signals with very small pulse width.
?In modern technologies with increasingly small feature sizes the wire delays dominate.
? Transport delay model
?any pulse will propagate to the output, independent of the delay

out1 <= transport (a xor b) after 8 ns;

[Figure: wire with input and out; waveforms of input and out1 (output for delay: 8 ns); time axis from 5 to 40 ns]
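
The difference between the two delay models can be made concrete with a simplified waveform model. The Python sketch below is an approximation for a single signal assignment, not a description of how a VHDL simulator manages its transaction queue: a waveform is a time-ordered list of (time, value) changes of the driving expression, the inertial model rejects every value held for less than the delay, and the transport model shifts every change. The function names and the example waveform are assumptions of this illustration.

```python
def transport_delay(events, delay):
    """Every input event appears at the output, shifted by the delay."""
    return [(t + delay, v) for t, v in events]

def inertial_delay(events, delay, initial):
    """Drop input values that are held for less than the delay.

    events: time-ordered list of (time, value) changes of the driving expression.
    """
    output, current = [], initial
    for k, (t, v) in enumerate(events):
        last = k == len(events) - 1
        held_long_enough = last or events[k + 1][0] - t >= delay
        if held_long_enough and v != current:
            output.append((t + delay, v))
            current = v
    return output

# Input: a 3 ns pulse starting at 10 ns, then a 10 ns pulse starting at 20 ns
wave = [(10, 1), (13, 0), (20, 1), (30, 0)]
out_transport = transport_delay(wave, 8)
out_inertial_8 = inertial_delay(wave, 8, initial=0)
out_inertial_2 = inertial_delay(wave, 2, initial=0)
print(out_transport)
print(out_inertial_8)
print(out_inertial_2)
```

With an 8 ns inertial delay the 3 ns pulse never reaches the output, while the 2 ns inertial delay and the 8 ns transport delay both pass it on.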

MicroLab, VLSI-21 (23/94)

Delay Model in Practice
?Accurate delay modeling of wire delays is possible, although in practice it is difficult to obtain accurate estimates of the wire delay without proceeding through physical design and layout of the circuit.

library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdder is
port (a,b: in std_logic;
carry,sum:out std_logic);
end HalfAdder;

architecture transport_delay of HalfAdder is
signal s1,s2: std_logic:='0';
begin
s1 <= (a xor b) after 2 ns;
s2 <= (a and b) after 2 ns;
sum <= transport s1 after 4 ns;
carry <= transport s2 after 4 ns;
end transport_delay;

[Figure: half adder with internal nodes s1, s2 and outputs sum, carry; waveforms of a, b, sum, carry, s1 and s2 from 0 to 14 ns, with the inertial and transport delays marked]
MicroLab, VLSI-21 (24/94)

Exercises vlsi21: Conditional
Assignments
?Ex403 (difficulty: easy): Write and simulate a VHDL model of a 2-bit comparator (compare on equality, filename: Comp2.vhd).

[Figure: block Comp2 with inputs a, b and output c]

?Ex405 (difficulty: easy): Construct and test a VHDL module for generating the following waveforms.

[Figure: waveforms a, b, c; time axis from 0 to 60 ns]

?Ex vlsi21 (difficulty: easy): Have a look at the exercises at the end of chapter 3 of “VHDL: Starter’s Guide”

MicroLab, VLSI-21 (25/94)

chapter 4

The Process Construct #1

?The continuous assignment model is used when components correspond to gates.
?The process construct enables the use of
conventional programming language constructs.
?In contrast to concurrent signal assignment
statements a process is a sequentially executed
block of code.
?Control flow within a process is strictly sequential.
?With respect to simulation time a process executes
in zero time.

architecture behavior of MyProcess is


begin
process
process declarative part
begin
process body
end process;
end behavior;

MicroLab, VLSI-21 (26/94)

Example: Process Statement
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity Memory is
port (addr,wrData: in std_logic_vector(31 downto 0);
wr,rd: in std_logic;
rdData :out std_logic_vector(31 downto 0));
end Memory;

architecture behavioral of Memory is
  type memArray is array(0 to 1024) of std_logic_vector(31 downto 0);
begin
  MemProcess: process(addr,wr,rd)            -- sensitivity list (rd added so that a read request triggers the process)
    variable mem: memArray :=(
      (x"00000A06"),                         -- initializing memory data
      others => (x"00000000"));
    variable addrIndex: integer;
  begin
    addrIndex:=conv_integer(addr);           -- immediate variable assignment
    if (wr = '1') then
      mem(addrIndex):=wrData;                -- immediate variable assignment
    elsif (rd = '1') then
      rdData <=mem(addrIndex) after 10 ns;   -- signal assignment, scheduled 10 ns ahead
    end if;
  end process;
end behavioral;
The Process Construct #2

- The execution of a process is initiated whenever an event occurs on any signal in the sensitivity list.
- Once started, the process executes to completion in zero (simulation) time.
- Processes execute concurrently with other processes and concurrent signal assignments.
- Concurrent signal assignments are in fact only special cases of processes.

Identical behavior:

-- concurrent signal assignment
architecture behavior of MyBlock1 is
begin
  c <= a and b after 5 ns;
end behavior;

-- process
architecture behavior of MyBlock2 is
begin
  process(a,b)
  begin
    c <= a and b after 5 ns;
  end process;
end behavior;
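A sensitivity list is itself shorthand for a wait statement at the end of the process body. The sketch below shows this third, equivalent form; the entity MyBlock3 and its port list are assumptions added only to make the example self-contained.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity MyBlock3 is
  port (a,b: in std_logic;
        c: out std_logic);
end MyBlock3;

architecture behavior of MyBlock3 is
begin
  process                       -- no sensitivity list
  begin
    c <= a and b after 5 ns;
    wait on a,b;                -- suspend until an event occurs on a or b
  end process;
end behavior;
```

The process runs once at initialization and then again after every event on a or b, just like the two forms above.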
VHDL vs. Verilog: Events

Events are variable or signal changes.
Real circuits are event driven.

VHDL:
- process sensitivity list
  begin statements;
  end process;
- wait on/until/for event;

Verilog-HDL:
- always @(sensitivity list) statement
- initial (sensitivity list) statement

Wow! Everything is event driven, like in real life.

Conditional Programming Constructs

- if-then-else statement
    if condition then sequential statements
    [ elsif condition then sequential statements ]
    [ else sequential statements ]
    end if;
- case statement
    case expression is
      { when choices => sequential statements }
      [ when others => sequential statements ]
    end case;

Example: Condition Statements
library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdder is
port (a,b: in std_logic;
sum,carry: out std_logic);
end HalfAdder;

architecture behavioral of HalfAdder is
begin
  If_Process: process(a,b)
  begin
    if (a = b) then
      sum <= '0' after 5 ns;
    else
      sum <= (a or b) after 5 ns;
    end if;
  end process;

  Case_Process: process(a,b)
  begin
    case a is
      when '0' => carry <= a after 5 ns;
      when '1' => carry <= b after 5 ns;
      when others => carry <= 'X' after 5 ns;   -- std_logic unknown is the upper-case 'X'
    end case;
  end process;

end behavioral;

VHDL vs. Verilog:
Combinational Logic Example
VHDL: 4 to 1 multiplexer (no inferred memory)

entity Multiplexer4to1 is
  port (sel: in std_logic_vector (1 downto 0);
        a,b,c,d: in std_logic_vector (15 downto 0);
        z: out std_logic_vector (15 downto 0));
end Multiplexer4to1;

architecture DemoExample of Multiplexer4to1 is
begin
  process (a,b,c,d,sel)
  begin
    case sel is
      when "00" => z <= a;
      when "01" => z <= b;
      when "10" => z <= c;
      when "11" => z <= d;
      when others => z <= (others => '-');   -- don't care on all 16 bits
    end case;
  end process;
end DemoExample;

Verilog-HDL:

module Multiplexer4to1(sel,a,b,c,d,z);
  input [1:0] sel;
  input [15:0] a,b,c,d;
  output [15:0] z;

  assign z = (sel == 2'd0) ? a :
             (sel == 2'd1) ? b :
             (sel == 2'd2) ? c :
             (sel == 2'd3) ? d :
             16'bx;
endmodule
Loop Programming Constructs

- for loop statement (the loop index does not have to be declared, but it can only be used locally)
    for index in range loop
      sequential statements
    end loop;

- while loop statement
    while condition loop
      sequential statements
    end loop;
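Unlike the for loop, a while loop needs an explicitly declared and updated loop variable. As a minimal sketch, the hypothetical block below (entity name LeadingZeros and its ports are assumptions) counts the leading zeros of an 8 bit vector.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity LeadingZeros is
  port (d: in std_logic_vector(7 downto 0);
        n: out integer range 0 to 8);
end LeadingZeros;

architecture behavioral of LeadingZeros is
begin
  process(d)
    variable i: integer range -1 to 7;   -- loop variable, declared by hand
  begin
    i := 7;
    while i >= 0 loop
      exit when d(i) /= '0';             -- stop at the first bit that is not '0'
      i := i - 1;
    end loop;
    n <= 7 - i;                          -- number of leading '0' bits
  end process;
end behavioral;
```

The loop runs to completion in zero simulation time, so n changes one delta cycle after d.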

Example: Loop Statements
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

[Figure: block Multiplier (32 bit) with inputs a, b and output m]

entity Multiplier is
  port (a,b: in std_logic_vector(31 downto 0);
        m: out std_logic_vector(63 downto 0));
end Multiplier;

architecture behavioral of Multiplier is
  constant modulDelay: Time:=10 ns;
begin
  process(a,b)
    variable bReg: std_logic_vector(63 downto 0);
    variable aReg: std_logic_vector(31 downto 0);
    variable sum : std_logic_vector(32 downto 0);   -- partial sum including the carry bit
  begin
    aReg:=a;
    bReg:=(x"00000000") & b;
    for index in 1 to 32 loop
      sum:='0' & bReg(63 downto 32);
      if bReg(0)='1' then
        sum:=sum + ('0' & aReg);          -- 33 bit addition, so the carry is not lost
      end if;
      bReg:=sum & bReg(31 downto 1);      -- shift right, the carry enters at the top
    end loop;
    m<=bReg after modulDelay;
  end process;
end behavioral;
Exercises vlsi21: Loops

- Ex405 (difficulty: easy): Write VHDL code for a combinational shift logic block with 8 bit data buses and zero fill. Use the 2 bit signal shiftNum to indicate the number of bits to be shifted. If a std_logic_vector has to be converted to an integer type, the conv_integer() function from the std_logic_unsigned package can be used.

[Figure: block Shift with inputs dataIn, shiftNum, shiftLeft, shiftRight and output dataOut]

More on Processes

- Never assign a value to a signal in different processes (multiple drivers).

    process A:  y<='0';
    process B:  y<='1';
    Conflict: two drivers, not synthesisable!

- Upon initialization all processes are executed at once.
- Thereafter processes are executed in a data-driven manner:
  - activated by events on the signal list of the process, or
  - by waiting on occurrences of specific events using wait statements.
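What happens in simulation when two processes drive the same std_logic signal can be tried out directly. This is a sketch only: the entity name TwoDrivers and the checking process are assumptions. It relies on the resolution function of std_logic, which resolves '0' against '1' to 'X'.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity TwoDrivers is
end TwoDrivers;

architecture sim of TwoDrivers is
  signal y: std_logic;
begin
  A: process
  begin
    y <= '0';      -- first driver of y
    wait;
  end process;

  B: process
  begin
    y <= '1';      -- second driver of y
    wait;
  end process;

  check: process
  begin
    wait for 1 ns;
    -- the two drivers are resolved to 'X' (forcing unknown)
    assert y = 'X' report "expected resolved value X" severity error;
    wait;
  end process;
end sim;
```

The model simulates, but the unknown value on y is exactly the conflict the slide warns about.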

The Wait Statement

- A more general way to specify when a process executes is the wait statement.
- Wait statements explicitly specify the conditions under which a process may resume execution after being suspended.
- With wait statements a process can be suspended at multiple points.

wait for time expression;
  example: wait for 20 ns;

wait on signal;
  example: wait on clk,reset,status;

wait until condition;
  example: wait until (a = '1');

wait;
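The four forms can be combined in one small testbench. The following sketch uses assumed names (WaitDemo, clk, ready) and arbitrary times; it also shows the combined form with a timeout, wait until condition for time.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity WaitDemo is
end WaitDemo;

architecture sim of WaitDemo is
  signal clk  : std_logic := '0';
  signal ready: std_logic := '0';
begin
  clock: process
  begin
    wait for 10 ns;                      -- wait for: suspend for a fixed time
    clk <= not clk;
  end process;

  stim: process
  begin
    wait for 35 ns;
    ready <= '1';
    wait;                                -- plain wait: suspend forever
  end process;

  watch: process
  begin
    wait on ready;                       -- resume on any event on ready
    wait until clk = '1';                -- resume when clk changes and is then '1'
    wait until ready = '0' for 100 ns;   -- resume on the condition or after 100 ns
    wait;
  end process;
end sim;
```

Because the clock process never stops, the simulation has to be run for a limited time.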

Example: Wait Statements
library IEEE;
use IEEE.std_logic_1164.all;

entity Dff1 is
  port (d,clk: in std_logic;
        q,qBar: out std_logic);
end Dff1;

architecture behavioral of Dff1 is
begin
  process
  begin
    wait until (clk'event and clk='1');
    q <= d after 1 ns;
    qBar <= not d after 1 ns;
  end process;
end behavioral;

library IEEE;
use IEEE.std_logic_1164.all;

entity Dff2 is
  port (d,clk,rst: in std_logic;
        q,qBar: out std_logic);
end Dff2;

architecture behavioral of Dff2 is
begin
  process(clk,rst)
  begin
    if (rst='0') then
      q <= '0' after 1 ns;
      qBar <= '1' after 1 ns;
    elsif (clk'event and clk='1') then
      q <= d after 1 ns;
      qBar <= not d after 1 ns;
    end if;
  end process;
end behavioral;

If a process has no sensitivity list you MUST use wait statements, otherwise your process never suspends and blocks your simulation.
Latch vs. Flip-Flop
Latch (inputs d, clk, reset; output q):

process(clk,reset)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk='1') then
    q <= d;
  end if;
end process;

Flip-Flop (inputs d, clk, reset; output q):

process(clk,reset)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk'event and clk='1') then
    q <= d;
  end if;
end process;

Register with enable (a mux in front of the D input; inputs d, enable, clk, reset; output q):

process(clk,reset)
begin
  if (reset = '0') then
    q <= "00000000";
  elsif rising_edge(clk) then
    if (enable = '1') then
      q <= d;
    end if;
  end if;
end process;
Exercises vlsi21: Synchronous

- Ex406 (difficulty: easy): Write VHDL code for a 16 bit register with an enable and an asynchronous reset input.

[Figure: block Register16 with inputs d, enable, clk, reset and output q]

- Ex407 (difficulty: easy): Write VHDL code for a 16 bit counter with an enable, a load and an asynchronous reset input.

[Figure: block Counter16 with inputs data, enable, load, clk, reset and output count]
More on Wait: Inter-Process Comm.
[Figure: timing diagram of transmitData, request, acknowledge and receiveData over time]

entity Handshake is
  port(inputData: in std_logic_vector(31 downto 0));
end Handshake;

architecture behavioral of Handshake is
  signal transmitData: std_logic_vector(31 downto 0);
  signal request, acknowledge: std_logic;
begin

  producer: process
  begin
    wait until inputData'event;
    transmitData<=inputData;
    request<='1';
    wait until acknowledge='1';
    request<='0';
    wait until acknowledge='0';
  end process;

  consumer: process
    variable receiveData: std_logic_vector(31 downto 0);
  begin
    wait until request='1';
    receiveData:=transmitData;
    acknowledge<='1';
    wait until request='0';
    acknowledge<='0';
  end process;

end behavioral;

Exercises vlsi21: Handshake

- Ex vlsi21.8a (difficulty: easy): Write a VHDL model for communication between an input process and an output process using a handshaking protocol. The input process can only read a single word (32 bit) at a time. The output device requires a reversed byte order, which is performed by the input process. Assign a delay of 1 ns to each handshake signal.

[Figure: block AsyncComm containing the input process (inputData) and the output process (outputData)]

- Ex vlsi21.8b (difficulty: medium, optional): Rewrite the above handshake model using clk1 and clk2 signals for the two synchronous processes, as well as a rst signal for initialization and a start signal to initiate one data transfer. Do not use any wait constructions within the processes.
Attributes
attribute            function
signal'event         function returning a Boolean value signifying a change in value on this signal
signal'active        function returning a Boolean value signifying an assignment made to this signal (may not be a new value)
signal'last_event    function returning the time since the last event
signal'last_active   function returning the time since the signal was last active
signal'last_value    function returning the previous value of this signal
signal'left          returns the leftmost value of signal in its defined range
signal'right         returns the rightmost value of signal in its defined range
signal'high          returns the highest value of signal in its defined range
signal'low           returns the lowest value of signal in its defined range
signal'ascending     returns true if signal has an ascending range of values
signal'length        returns the number of elements in the array signal
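A few of these attributes in use, as a hedged sketch: the entity AttrDemo, its ports and the 2 ns setup time are assumptions made for illustration only.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity AttrDemo is
  port (clk,d: in std_logic;
        dataBus: in std_logic_vector(7 downto 0));
end AttrDemo;

architecture behavioral of AttrDemo is
  constant setupTime: time := 2 ns;   -- hypothetical timing requirement
begin
  check: process(clk)
  begin
    -- 'event and 'last_value together detect a clean 0 -> 1 transition
    if clk'event and clk = '1' and clk'last_value = '0' then
      -- 'last_event gives the time elapsed since d last changed
      assert d'last_event >= setupTime
        report "setup time violation on d"
        severity warning;
    end if;
  end process;

  -- array attributes describe the index range of dataBus
  assert dataBus'length = 8 and dataBus'left = 7 and dataBus'right = 0
    report "unexpected range of dataBus"
    severity failure;
end behavioral;
```

For an array object such as dataBus, 'left and 'right return the bounds of the index range.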
Generating Periodic Waveforms
library IEEE;
use IEEE.std_logic_1164.all;

entity Periodic is
  port(Z: out std_logic);
end Periodic;

architecture behavioral of Periodic is
begin
  process
  begin
    Z<='0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
    wait for 50 ns;
  end process;
end behavioral;

[Figure: waveform of Z over time (ns, 0 to 50)]

library IEEE;
use IEEE.std_logic_1164.all;

entity TwoPhase is
  port(phi1,phi2,reset: out std_logic);
end TwoPhase;

architecture behavioral of TwoPhase is
begin
  reset_process: reset<='1', '0' after 10 ns;
  clock_process: process
  begin
    phi1<='1', '0' after 10 ns;
    phi2<='0', '1' after 12 ns, '0' after 18 ns;
    wait for 20 ns;
  end process;
end behavioral;

[Figure: waveforms of reset, phi1 and phi2 over time (ns, 0 to 60)]

Modeling Finite State Machines
[Figure: Moore FSM block diagram: inputData feeds the state transition process, followed by the state register (clk, reset, enable) and the output process, which produces outputData and outSig0, outSig1, outSig2]

architecture behavioral of MooreFSM is
  type StateType is (MyState,YourState,InitState);
  signal state : StateType;
  signal outputData: std_logic_vector(5 downto 0);
begin

  transition_process: process(reset,clk)
  begin
    if (reset = '0') then
      state <= InitState;
    elsif rising_edge(clk) then
      case state is
        when MyState =>
          state<=YourState;
        when YourState =>
          if (inputDataSignal = '1') then
            state<=MyState;
          end if;
        when others => null;
      end case;
    end if;
  end process;

  output_process: process(state)
  begin
    case state is
      when MyState =>
        outputData<="01--00";
      when YourState =>
        outputData<="00100-";
      when InitState =>
        outputData<="100100";
      when others =>
        outputData<="000000";
    end case;
  end process;

  outSig0<=outputData(0);
  outSig1<=outputData(1);
  outSig2<=outputData(2);

end behavioral;
Exercises vlsi21: FSM

- Ex409 (difficulty: easy): Write a VHDL model for a traffic light controller. Use a Moore type FSM. The signal carPresent indicates cars running on the main street, which always have priority. If no cars are present on the main street, the secondary street gets green lights.

[Figure: state diagram with the states OrangeState, GreenState, RedState1 and RedState2; transitions are labeled with carPresent, and reset enters the diagram]

Outputs per state (lamps in the order red, orange, green):

state         main     second
OrangeState   0 1 0    1 0 0
GreenState    0 0 1    1 0 0
RedState1     1 0 0    0 0 1
RedState2     1 0 0    0 1 0

chapter 5

Modeling Structure

- A structural model of a system is described in terms of the interconnection of its components.
- A structural model consists of 3 features:
  - component declaration
  - signal declaration
  - component interconnection

[Figure: component declarations: HalfAdder3 with ports a, b, sum, carry and OR2 with ports a, b, z]

[Figure: component interconnection: H1 (HalfAdder3) takes in1 and in2; its sum drives s1 and its carry drives s3. H2 (HalfAdder3) takes s1 and cIn; its sum drives the output sum and its carry drives s2. O3 (OR2) takes s2 and s3 and drives cOut. H1, H2 and O3 are the component labels.]

Example: Structural Model
library IEEE;
use IEEE.std_logic_1164.all;

entity FullAdder3 is
port (in1,in2,cIn: in std_logic;
sum,cOut: out std_logic);
end FullAdder3;

architecture structural of FullAdder3 is
  -- component declaration (component behavior is described elsewhere)
  component HalfAdder3
    port(a,b: in std_logic;
         sum,carry: out std_logic);
  end component;

  component OR2
    port(a,b: in std_logic;
         z: out std_logic);
  end component;

  -- signal declaration
  signal s1,s2,s3: std_logic;
begin
  -- component interconnection (netlist)
  H1: HalfAdder3 port map(a=>in1,b=>in2,
                          sum=>s1,carry=>s3);
  H2: HalfAdder3 port map(a=>s1,b=>cIn,
                          sum=>sum,carry=>s2);
  O3: OR2 port map(a=>s2,b=>s3,
                   z=>cOut);
end structural;

Exercises vlsi21: Structural Model

- Ex410 (difficulty: medium): Write VHDL code for the structural model of the FullAdder3 described in the previous transparency. Assume a delay of 1 ns for all logic gates.
  a) Write the structural VHDL code for a HalfAdder.
  b) Write the VHDL code for the necessary logic gates like OR2 and others in one file (logicgates.vhd).
  c) Write the VHDL code for FullAdder3.
  d) Analyze and simulate the whole circuit. Be aware of the correct sequence of analyzing.

VHDL vs. Verilog: Structural Description

VHDL:

library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder4 is
  port (a,b,cIn: in std_logic;
        cOut,sum: out std_logic);
end FullAdder4;

architecture flatStructure of FullAdder4 is
  component XOR2   -- xor is a reserved word in VHDL, so the gate is named XOR2
    port(a,b: in std_logic; z: out std_logic);
  end component;
  component AND2
    port(a,b: in std_logic; z: out std_logic);
  end component;
  component OR3
    port(a,b,c: in std_logic; z: out std_logic);
  end component;
  signal net1,net2,net3,net4: std_logic;
begin
  u1: XOR2 port map (a,b,net1);
  u2: XOR2 port map (cIn,net1,sum);
  u3: AND2 port map (cIn,a,net2);
  u4: AND2 port map (cIn,b,net3);
  u5: AND2 port map (a,b,net4);
  u6: OR3 port map (net2,net3,net4,cOut);
end flatStructure;

Verilog-HDL:

module FullAdder4 (a,b,cIn,cOut,sum);
  input a,b,cIn;
  output cOut,sum;
  wire net1,net2,net3,net4;

  XOR2 u1(net1,a,b);
  XOR2 u2(sum,cIn,net1);
  AND2 u3(net2,cIn,a);
  AND2 u4(net3,cIn,b);
  AND2 u5(net4,a,b);
  OR3 u6(cOut,net2,net3,net4);
endmodule
VHDL vs. Verilog:
Data Flow Description

VHDL:

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;

entity FullAdder5 is
  port (a,b,cIn: in std_logic;
        sum,cOut: out std_logic);
end FullAdder5;

architecture dataFlow of FullAdder5 is
  signal tmp: std_logic_vector(1 downto 0);
begin
  tmp <= ('0' & a) + b + cIn;
  cOut <= tmp(1);
  sum <= tmp(0);
end dataFlow;

Verilog-HDL:

module FullAdder5 (a,b,cIn,sum,cOut);
  input a,b,cIn;
  output cOut,sum;

  assign {cOut,sum} = a + b + cIn;
endmodule

Hierarchy, Abstraction, and Accuracy

- Structural models simply describe interconnections.
- Structural models do not describe any form of behavior.
- Hierarchy expresses different levels of detail.
- Structural models are a way to manage large, complex designs.
- Modern designs have several tens of millions of gates.
- Simulation time: the more detailed the description of a design, the more events are generated and thus the longer the simulation takes.

[Figure: design hierarchy: FullAdder3 (top level) consists of OR2 and HalfAdder3; HalfAdder3 consists of AND2 and XOR2 (bottom level)]


Generics
- The VHDL language provides the ability to construct parameterized models using the concept of generics.

library IEEE;
use IEEE.std_logic_1164.all;

entity AND2 is
  generic(andDelay: Time);
  port(a,b : in std_logic; z: out std_logic);
end AND2;

architecture genericDelay of AND2 is
begin
  z<=a and b after andDelay;
end genericDelay;

library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdder4 is
  generic(adderDelay: Time:=3 ns);
  port(a,b : in std_logic;
       sum,carry: out std_logic);
end HalfAdder4;

architecture genericDelay of HalfAdder4 is
  component AND2 is
    generic(andDelay: Time);
    port(a,b : in std_logic; z: out std_logic);
  end component;

  component XOR2 is
    generic(xorDelay: Time);
    port(a,b : in std_logic; z: out std_logic);
  end component;
begin
  -- values to generics can be assigned at different locations
  -- (no semicolon needed between generic map and port map)
  C1: XOR2 generic map(12 ns) port map(a,b,sum);
  C2: AND2 generic map(adderDelay) port map(a,b,carry);
end genericDelay;
More on Generics
- Within a structural model there are two ways in which the values of generic constants of lower level components can be specified:
  - in the component declaration
  - in the component instantiation
- If both are specified, then the value provided by the generic map() takes precedence.
- If neither is specified, then the default value defined in the model is used.

library IEEE;
use IEEE.std_logic_1164.all;
entity GenericOR is
  generic(n: positive:=2);
  port(in1: in std_logic_vector((n-1) downto 0); z: out std_logic);
end GenericOR;

architecture behavioral of GenericOR is
begin
  process(in1)
    variable sum: std_logic:='0';
  begin
    sum:='0';
    for i in 0 to (n-1) loop
      sum:=sum or in1(i);
    end loop;
    z<=sum;
  end process;
end behavioral;
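A possible instantiation of GenericOR as a 4 input gate is sketched below. The entity Or4Demo and its ports are assumptions; the component declaration carries a default value, and the generic map in the instantiation overrides it.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity Or4Demo is
  port(x: in std_logic_vector(3 downto 0);
       y: out std_logic);
end Or4Demo;

architecture structural of Or4Demo is
  component GenericOR
    generic(n: positive:=2);        -- value given in the component declaration
    port(in1: in std_logic_vector((n-1) downto 0); z: out std_logic);
  end component;
begin
  -- the generic map of the instantiation takes precedence: n = 4
  U1: GenericOR generic map(n=>4) port map(in1=>x, z=>y);
end structural;
```

With default binding, U1 is bound to the entity GenericOR analyzed into the working library.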
Exercises vlsi21: Hierarchy, Generic

- Ex411 (difficulty: medium): Write VHDL code for an 8 bit ALU based on the definitions made in Ex402 with the Simple1BitALU.
  a) Write behavioral VHDL code in ALU8b.vhd.
  b) Write the structural VHDL code for ALU8 in one file, ALU8s.vhd. Assume a delay of 1 ns for all logic gates. What is the worst case delay of the ALU8?

- Ex412 (difficulty: easy): Write VHDL code for an n bit register with reset and enable inputs (NbitRegister.vhd).

Configuration

- Structural models may employ different levels of abstraction.
- Each component in a structural model may be described as a behavioral or a structural model.
- Configuration allows stepwise refinement in a design cycle.
- Configuration represents resource binding.
- Description-synthesis design method.

Configuration associates an architecture description with each component:
- behavioral or
- structural for FullAdder3

[Figure: design hierarchy: FullAdder3 consists of OR2 and HalfAdder3; HalfAdder3 consists of AND2 and XOR2]

Configuration: Component Binding

- Example of binding architectures: a bit-serial adder
  - one of the different architectures must be bound to the component C1 for simulation
  - the entity is not bound, as interfaces do not change

[Figure: bit-serial adder: component C1 (Comb, combinational logic) with inputs a, b, carryIn and outputs sum, carry; component C2 (Dff) with d, q, clk, rst, driven by clock and reset, feeds carry back to carryIn]

Candidate architectures for C1:

architecture gateLevel of Comb is
- - -
architecture lowPower of Comb is
- - -
architecture highSpeed of Comb is
- - -
architecture behavioral of Comb is
- - -

Configuration: Default Binding Rules

- To analyze different implementations, we simply change the configuration, compile and simulate.
- When newer component models become available, we bind the new architecture to the component.

Default binding rules:
- If the entity name is the same as the component name, then this entity is bound to the component.
- If there are different architectures in the working directory, the last compiled architecture is bound to the entity.

Example: Configuration
[Figure: bit-serial adder SerialAdder: component C1 (Comb, combinational logic, architecture highSpeed) with inputs in1, in2, carryIn and outputs sum, carry; component C2 (MyDff, architecture behavioral) with d, q, clk, rst, driven by clock and reset]

configuration CFG_HighSpeed of SerialAdder is        -- configuration name, entity name
  for structural                                     -- architecture name
    -- library name, entity name, architecture name (used for simulation)
    for C1: Comb use entity WORK.Comb(highSpeed);
    end for;

    -- if a different component than described in the entity is used,
    -- then the I/O mapping must be declared
    for C2: Dff use entity MyLibrary.MyDff(behavioral)
      generic map(gateDelay=>5 ns)
      port map(my_clk=>clk, my_d=>d,
               my_q=>q, my_rst=>rst);
    end for;
  end for;
end CFG_HighSpeed;

Exercises vlsi21: Configuration
- Ex413 (difficulty: easy): Write VHDL code for the bit-serial adder shown in the previous transparency (SerialAdder.vhd).
  a) Construct a model for the two components Comb and MyDff and place them both in your WORK library (don't use the library MyLibrary yet).
  b) Adapt the configuration, compile and simulate it.

- Ex414 (difficulty: medium): Consider the circuit shown below (ConfigExample). Construct a structural model comprised of three components. However, in the configuration use only two components by using an n-input AND gate.

[Figure: circuit ConfigExample with inputs i1, i2, i3: two AND gates (&) feed an OR gate (>=1) that drives output o1]

chapter 6

Subprograms, Packages and Libraries

- VHDL provides mechanisms for structuring programs, reusing software modules, and otherwise managing design complexity.
- Packages contain definitions of procedures and functions that can be shared across different VHDL models.
- Packages may contain user defined data types and constants and can be placed in libraries.
- Summary: procedures, functions, packages and libraries provide facilities for creating and maintaining modular and reusable VHDL programs.

Functions
- Functions are used to compute a value based on the values of the input parameters. Functions are placed in declarative parts. Example of a function definition:

function rising_edge (signal clock: in std_logic) return boolean;

- Functions cannot modify parameter values (procedures can). Example of a function call:

rising_edge(clk)

- Functions execute in zero simulation time, thus wait statements cannot exist in functions. Parameters are restricted to be of mode in.

-- the mode (in) is not necessary
function rising_edge (signal clock: std_logic) return boolean is
  variable edge: boolean:=false;
begin
  edge:=(clock='1' and clock'event);
  return(edge);
end rising_edge;
Example: Type Conversion with Functions
- As VHDL is a type sensitive language, type conversions are quite often necessary.

-- note: the size of the parameter is not declared
function to_bitvector(svalue: std_logic_vector) return bit_vector is
  variable outvalue: bit_vector(svalue'range);   -- same index range as the argument
begin
  for i in svalue'range loop
    case svalue(i) is
      when '0' => outvalue(i):='0';
      when '1' => outvalue(i):='1';
      when others => outvalue(i):='0';
    end case;
  end loop;
  return outvalue;
end to_bitvector;

- Many conversion functions as well as resolution functions can be found in the std_logic_1164 or std_logic_arith packages and others. Have a look at $SYNOPSYS/packages/IEEE/src/
Procedures
- Procedures are subprograms that can modify one or more of the input parameters. Example of a procedure declaration reading from a file f:

procedure read_v1d (variable f: in text; v: out std_logic_vector);

- If the class of the procedure parameters is not explicitly declared, then the following rules apply:
  - parameters of mode in are assumed to be of class constant
  - parameters of mode out or inout are assumed to be of class variable
- Variables declared within a procedure are initialized on each call to the procedure, and their values do not persist across invocations of the procedure.
- Signals cannot be declared within procedures.
- Poor programming: procedures declared within a process can make assignments to signals corresponding to the ports of the encompassing entity.
- Procedure call:

Dff(clk=>clk,reset=>reset,d=>s2,q=>s1,qbar=>open);
Example: Procedure
library IEEE;
use IEEE.std_logic_1164.all;
entity CPU is
  port(di: out std_logic_vector(31 downto 0);
       addr: out std_logic_vector(2 downto 0);
       r,w: out std_logic;
       do: in std_logic_vector(31 downto 0);
       s: in std_logic);
end CPU;

architecture behavioral of CPU is

  procedure Mread(address: in std_logic_vector(2 downto 0);
                  signal r: out std_logic;
                  signal s: in std_logic;
                  signal addr: out std_logic_vector(2 downto 0);
                  signal data: out std_logic_vector(31 downto 0)) is
  begin
    addr<=address;
    r<='1';
    wait until s='1';
    data <= do;
    r<='0';
  end Mread;

  procedure Mwrite(address: in std_logic_vector(2 downto 0);
                   signal data: in std_logic_vector(31 downto 0);
                   signal addr: out std_logic_vector(2 downto 0);
                   signal w: out std_logic;
                   signal di: out std_logic_vector(31 downto 0)) is
  begin
    addr<=address;
    w<='1';
    wait until s='1';
    di <= data;
    w<='0';
  end Mwrite;

begin
  -- CPU behavioral
  -- description
end behavioral;
Overloading

- A very useful feature of the VHDL language is the ability to overload a subprogram or an operator.
- Imagine writing different Flip-Flop models without and with asynchronous inputs and with different argument types. With the overloading feature one single Flip-Flop name can be used for all of them.
- Example for Dff calls:

Dff(clk,d,q,qbar);
Dff(clk,d,q,qbar,reset,clear);

- From the type and number of arguments we can tell which procedure we meant to use.
- Note that in std_logic_1164.vhd the boolean functions and, or, etc. have been defined for std_logic types, whereas the functions +, *, etc. are predefined only for certain types of the language such as integer. See also the std_logic_arith package.

function "*"(arg1,arg2: std_logic_vector) return std_logic_vector;
function "+"(arg1,arg2: signed) return signed;
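Overloading a user-defined subprogram works the same way. The package below is a minimal sketch with assumed names (OverloadDemo, maxOf): two functions share one name and differ only in their parameter types.

```vhdl
package OverloadDemo is
  function maxOf(a,b: integer) return integer;
  function maxOf(a,b: time) return time;
end OverloadDemo;

package body OverloadDemo is
  -- selected when both arguments are integers
  function maxOf(a,b: integer) return integer is
  begin
    if a > b then
      return a;
    else
      return b;
    end if;
  end maxOf;

  -- selected when both arguments are of type time
  function maxOf(a,b: time) return time is
  begin
    if a > b then
      return a;
    else
      return b;
    end if;
  end maxOf;
end OverloadDemo;
```

After a use clause for this package, a call such as maxOf(3 ns, 5 ns) resolves to the time version and maxOf(3, 5) to the integer version.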
Packages
- Logically related functions and procedures can be grouped into packages, and thus easily be shared among designs and people.

-- package declaration: similar to a VHDL entity, defines interfaces
package MyLibraryPackage is
  --
  -- type declarations
  -- function declarations
  -- procedure declarations
end MyLibraryPackage;

-- package body: similar to a VHDL architecture, defines behavior
package body MyLibraryPackage is
  -- package body
  -- functions
  -- procedures
end MyLibraryPackage;

- The package declaration needs to be analyzed first, and then the package body can be analyzed.
- Packages are used as libraries and referenced within VHDL design units via the use clause.

Example: Package Declaration

package std_logic_1164 is

  type std_ulogic is ('U',   -- uninitialized
                      'X',   -- forcing unknown
                      '0',   -- forcing 0
                      '1',   -- forcing 1
                      'Z',   -- high impedance
                      'W',   -- weak unknown
                      'L',   -- weak 0
                      'H',   -- weak 1
                      '-'    -- don't care
                     );
  type std_ulogic_vector is array (natural range <>) of std_ulogic;
  subtype std_logic is resolved std_ulogic;

  type std_logic_vector is array (natural range <>) of std_logic;

  function "and" (l,r: std_logic_vector) return std_logic_vector;
  function "and" (l,r: std_ulogic_vector) return std_ulogic_vector;

  -- rest of package declaration

end std_logic_1164;

Libraries

- Each design unit (entity, architecture, package) is analyzed (compiled) and placed in a design library.
- Libraries are generally implemented as directories and are referenced by a logical name.
- In VHDL the libraries STD and WORK are implicitly declared.
- WORK is the working design library, normally placed in a local directory.
- Once a library has been declared, all of the functions, procedures and type declarations of a package can be accessed.

-- all functions, procedures and types are visible
library IEEE;
use IEEE.std_logic_1164.all;

-- only the "xnor" function is visible
library IEEE;
use IEEE.std_logic_1164."xnor";

Visibility must be established for each design unit (entity) separately.
Synopsys tools on unix workstations

Example: Libraries and Packages

Design environment:

/home/MyHome/
  VHDLdesign/      all source VHDL design files
    WORK/          all compiled designs
    lib/           library MyLibrary (compiled package: MyPackage)
    src/           source file: MyPackage.vhd

Setup file .synopsys_vss.setup:

DEFAULT: ./WORK
MyLibrary : ./lib
use = . ./src
timebase = ns

MyPackage.vhd:

package MyPackage is
  --
end MyPackage;

package body MyPackage is
  --
end MyPackage;

MyVHDLdesign.vhd:

library MyLibrary;
use MyLibrary.MyPackage.all;
-- use MyLibrary.all;   (components can also be placed into libraries)

entity MyVHDLdesign is
...

In a unix shell, analyze the package MyPackage, then analyze the design MyVHDLdesign:

cd /home/MyHome/VHDLdesign
gvan -w MyLibrary src/MyPackage.vhd
gvan MyVHDLdesign.vhd

Library locations:
  library MyLibrary: /home/MyHome/VHDLdesign/lib
  library WORK:      /home/MyHome/VHDLdesign/WORK
Exercises vlsi21: Libraries & Packages
- Ex415 (difficulty: medium): The small circuit ConfigExample from exercise Ex414 shall be rewritten using the components OR2 and ANDn from the library MyLibrary.
  a) Write the VHDL file MyComponents.vhd holding the two components OR2 and ANDn and compile it into the library MyLibrary.
  b) Rewrite the ConfigExample circuit using only library elements and call it LibraryExample.vhd, compile and simulate it.

[Figure: circuit with inputs i1, i2, i3: two ANDn gates (&) feed an OR2 gate (>=1) that drives output o1]

- Ex416 (difficulty: medium): Write the VHDL package MyPackage with the functions OneCounter (counting '1's) and ParityGenerator (both should accept std_logic_vectors or bit_vectors of any size), and analyze it into the library MyLibrary. Use the defined functions in your design PackageExample to show their functionality.

Exercises vlsi21: Packages
- Ex417 (difficulty: medium): The bit-serial adder of exercise Ex413 shall be rewritten using a procedure call for the Dff instead of a component (SerialAdder2.vhd). Place the procedure into a package MyPackage and analyze it into the library MyLibrary. Verify the functionality.

[Figure: bit-serial adder: component C1 (Comb, combinational logic, architecture highSpeed) with inputs in1, in2, carryIn and outputs sum, carry; Dff (behavioral, from library MyLibrary) with d, q, clk, rst, driven by clock and reset]

VHDL vs. Verilog: Data Types
VHDL:
- type driven language
- predefined data types in packages: character, integer, real, bit, std_logic, textio, ...
- enumerate types
- arrays
- records
- pointers

Verilog-HDL:
- arrays
- run-time constants: parameter
- continuous driven nets: wire, tri, ...
- triggered assignments: reg, integer, real, ...

VHDL vs. Verilog: Operators
Operator type              function          VHDL        Verilog
arithmetic                 a+b               +           +
                           a-b               -           -
                           a*b               *           *
                           a/b               /           /
                           a-b*n  a div b    mod         %
                           a-(a/b)*b         rem
logical                    a and b           and         &
                           a or b            or          |
                           not(a and b)      nand        ~&
                           a exor b          xor         ^
shift logic                                  srl, sll    >>
shift arith.                                 sra, sla
rotate                                       ror, rol
reduction, concatenation                     &           {a,b}
replication                                              {4{a}}
relational                 >                 >           >
                           >=                >=          >=
                                             /=

VHDL vs. Verilog:
Sequential Structures
VHDL:

-- inside an architecture
...
signal inp: std_logic_vector (7 downto 0);
...
process (clk)
  -- variables are declared in the process, not in the architecture
  variable outp,cout: std_logic_vector (7 downto 0);
begin
  if (clk'event and clk = '1') then
    outp := outp + inp;    -- sequentially executed statements
    cout := outp + 1;
  end if;
end process;
...

Verilog-HDL:

/* inside a module */
...
wire [7:0] inp;
reg [7:0] outp, cout;
...
always @(posedge clk)
  begin
    outp = outp + inp;     // sequentially executed statements
    cout = outp + 1;
  end
...

VHDL vs. Verilog:
Parallel Structures
VHDL:

-- in an architecture
...
signal inp: std_logic_vector (7 downto 0);
signal outp,cout: std_logic_vector (7 downto 0);
...
p1: process (clk)
begin
  if (clk'event and clk = '1') then
    outp <= outp + inp;    -- parallel executed statements
    cout <= outp + 1;
  end if;
end process;

p2: process (reset)        -- parallel executed blocks: outp now has two drivers
begin
  if (reset = '0') then
    outp <= "00000000";
  end if;
end process;
...

Verilog-HDL:

/* in a module */
...
wire [7:0] inp;
reg [7:0] outp, cout;
...
always @(posedge clk)
  fork
    outp = outp + inp;     // parallel executed statements
    cout = outp + 1;
  join

always @(reset)            // parallel executed blocks: two drivers
  if (!reset)
    outp = 8'b0;
...

MicroLab, VLSI-21 (76/94)

JMM v1.4
VHDL vs. Verilog: Assignments
VHDL:
architecture ex1 of AssignExample is
  signal x1, y1, y2, z1, z2:
         std_logic_vector (7 downto 0);
  ...
begin
  p1: process (clk)
  begin
    if (clk'event and clk = '0') then
      x1 <= y1;                    -- signal assignment
      y1 <= x1;
      z1 <= y1 after 12 ns;
    end if;
  end process;

  p2: process (y2)
    variable x2: std_logic_vector (7 downto 0);
  begin
    x2 := y2;                      -- variable assignment
    y2 <= x2;
    z2 <= y2 after 12 ns;
  end process;
end ex1;

Verilog-HDL:
module AssignExample;
  wire [7:0] v,y2,z2;
  reg [7:0] x1,y1,z1,x2;
  ...
  always @(posedge clk)
  fork
    x1 = y1;
    y1 = x1;
    z1 = #12 y1;
  join

  assign x2 = y2;
  assign y2 = x2;
  assign #(12) z2 = y2;
endmodule

Before the falling edge of clk: x=1, y=2, z=3
12 ns after the falling edge of clk: x=? y=? z=?
MicroLab, VLSI-21 (77/94)

JMM v1.4
VHDL vs. Verilog:
Sequential Logic
Register with asynchronous reset

VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
package MyDefinition is
  type vector16 is array (15 downto 0) of
       std_logic;
end MyDefinition;

library IEEE;
use IEEE.std_logic_1164.all;
use work.MyDefinition.all;

entity AsynRegister is
  port (clk,rst: in std_logic;
        a: in vector16; z: out vector16);
end AsynRegister;

architecture DemoExample of AsynRegister is
begin
  process (clk, rst)
  begin
    if (rst = '0') then
      z <= vector16'(others => '0');
    elsif (clk'event and clk = '1') then
      z <= a;
    end if;
  end process;
end DemoExample;

Verilog-HDL:
module AsynRegister(clk,rst,a,z);
  input clk,rst;
  input [15:0] a;
  output [15:0] z;
  reg [15:0] z;

  // rst is in the event list, so the reset is asynchronous
  always @(posedge clk or negedge rst)
    if (rst == 1'b0)
      z = 16'b0;
    else
      z = a;
endmodule

MicroLab, VLSI-21 (78/94)

JMM v1.4
“Dataflow” Modeling
library IEEE;
use IEEE.std_logic_1164.all;

entity Demux2x4 is port(
  a,b,enable: in std_logic;
  z: out std_logic_vector(0 to 3));
end Demux2x4;

architecture dataflow of Demux2x4 is
  signal abar,bbar: std_logic;
begin
  z(3) <= not(a and b and enable);
  z(0) <= not(abar and bbar and enable);
  bbar <= not b;
  z(2) <= not(a and bbar and enable);
  abar <= not a;
  z(1) <= not(abar and b and enable);
end dataflow;

All the signal assignment statements ("<=") happen
concurrently after some specified delay, which defaults
to 1 "delta", an infinitesimally small delay. Note that
concurrent statements are always "running", so whenever
A, B or ENABLE changes, ABAR, BBAR and Z(0 to 3)
will also change after some delay.

The delay in assigning a signal its new value means that
the following statement is meaningful (it generates a
periodic waveform):
CLK <= not CLK after 10 ns;
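
The periodic waveform produced by this statement can be traced with a small Python model. This is only a sketch, not VHDL: the function name and the 40 ns horizon are assumptions made for illustration, and the signal is assumed to start at '0' (as a signal of type bit does).

```python
def clock_waveform(half_period_ns, end_ns, initial=0):
    """List the (time_ns, value) events of CLK <= not CLK after half_period_ns."""
    events = [(0, initial)]
    t, value = 0, initial
    while t + half_period_ns <= end_ns:
        t += half_period_ns      # the new value is scheduled after the delay
        value = 1 - value        # "not CLK"
        events.append((t, value))
    return events

wave = clock_waveform(10, 40)
print(wave)
```

Each event re-triggers the assignment, so the signal toggles every 10 ns and the period is 20 ns. With std_logic instead of bit the signal would need an explicit initial value, because not 'U' is still 'U'.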

MicroLab, VLSI-21 (79/94)

JMM v1.4
VHDL Example: Behavioral Modeling
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;

entity Demux2x4 is port(
  a,b,enable: in std_logic;
  z: out std_logic_vector(0 to 3));
end Demux2x4;

architecture behavioral of Demux2x4 is
begin
  process(a,b,enable)
    -- local variables (separate copies for each
    -- instance of Demux2x4)
    variable abar,bbar: std_logic;
  begin
    abar := not a;
    bbar := not b;
    if (enable = '1') then
      z(3) <= not(a and b);
      z(2) <= not(a and bbar);
      z(1) <= not(abar and b);
      z(0) <= not(abar and bbar);
    else
      z <= "1111";  -- vector constant
    end if;
  end process;
end behavioral;

Process statements can be compiled, so behavioral
simulations can be quite fast.

Statements within a process are executed sequentially,
like a program. The process is scheduled for execution
after any events are processed for signals on its
sensitivity list. Values of local variables are maintained
between executions.
MicroLab, VLSI-21 (80/94)

JMM v1.4
Synthesis

Idea: once a behavioral model has been finished, why not
use it to automatically synthesize a logic implementation in
much the same way as a compiler generates executable code
from a source program?
a.k.a. "silicon compilers"

Synthesis programs process the HDL, then
- infer logic and state elements
- perform technology-independent optimizations
  (e.g., logic simplification, state assignment)
- map elements to the target technology
- perform technology-dependent optimizations
  (e.g., multi-level logic optimization, choose
  gate strengths to achieve speed goals)

Synopsys, Inc. is the current leader in
providing synthesis tools and synthesizable
HDL modules.

MicroLab, VLSI-21 (81/94)

JMM v1.4
Logic Synthesis
Z <= (A and B) or C;
[Figure: two gate-level implementations with inputs A, B, C
and output Z.]

if (SEL = '1') then Z <= B;
else Z <= A;
end if;
[Figure: 2:1 multiplexer (B on input 1, A on input 0, select
SEL, output Z) and a gate-level implementation of it.]

signal x,y,sum: std_logic_vector(3 downto 0);
sum <= unsigned(x) + unsigned(y);
[Figure: chain of four full adders with inputs X(0) Y(0) ...
X(3) Y(3), outputs SUM(0) ... SUM(3), carry input 0 and the
last carry output not connected (NC).]

process(word)
  variable result: std_logic;
begin
  result := '0';
  for j in 0 to 3 loop
    result := result xor word(j);
  end loop;
  parity <= result;
end process;
[Figure: XOR gates combining WORD(3), WORD(2), WORD(1) and
WORD(0) into PARITY.]
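
The adder and parity examples can be checked with a bit-level Python model of the structures that synthesis infers. This is a sketch under assumptions: the function names are hypothetical, and the adder is modelled as a plain ripple-carry chain whose last carry is dropped, as the "NC" label in the figure suggests.

```python
def full_adder(x, y, cin):
    """One full adder cell: returns (sum bit, carry out)."""
    s = x ^ y ^ cin
    cout = (x & y) | (cin & (x ^ y))
    return s, cout

def ripple_add4(x, y):
    """4-bit sum built from four full adders; the final carry is not connected."""
    carry, total = 0, 0
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total

def parity4(word):
    """The for loop unrolls into a chain of XOR gates over word(0)..word(3)."""
    result = 0
    for j in range(4):
        result ^= (word >> j) & 1
    return result

print(ripple_add4(9, 8), parity4(0b1011))
```

The loop in parity4 has a fixed bound, which is what allows a synthesis tool to unroll it into gates.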

MicroLab, VLSI-21 (82/94)

JMM v1.4
FSM Example
architecture behavioral of Moore is
  type StateType is (S0,S1,S2,S3);
  -- "next" is a reserved word in VHDL, so the signal is named nxt
  signal current,nxt: StateType;
begin

  process(current,a) -- state transition
  begin
    nxt <= current;  -- default: stay in the current state
    case current is
      when S0 =>
        if (a='1') then nxt<=S2; end if;
      when S1 =>
        if (a='1') then nxt<=S0; else nxt<=S2; end if;
      when S2 =>
        if (a='1') then nxt<=S3; end if;
      when S3 =>
        if (a='1') then nxt<=S1; end if;
    end case;
  end process;

  process(current) -- output logic
  begin
    case current is
      when S0 => z <= '0';
      when S1 => z <= '0';
      when S2 => z <= '1';
      when S3 => z <= '1';
    end case;
  end process;

  process(clk,reset) -- state register
  begin
    if (reset='0') then current<=S0;
    elsif (clk'event and clk='1') then
      current <= nxt;
    end if;
  end process;
end behavioral;

[Figure: state diagram with states S0 (z=0), S1 (z=0),
S2 (z=1) and S3 (z=1); transitions are labeled with input a.]
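
The three processes (state transition, output logic, state register) map onto a table-driven Python model. It is a sketch under two assumptions: every state tests the same input a, and a state with no matching transition holds its value. The names NEXT_STATE, OUTPUT and run_moore are hypothetical.

```python
# State transition table: NEXT_STATE[state][a] gives the next state.
NEXT_STATE = {
    "S0": {1: "S2"},
    "S1": {1: "S0", 0: "S2"},
    "S2": {1: "S3"},
    "S3": {1: "S1"},
}
# Moore output logic: z depends on the current state only.
OUTPUT = {"S0": 0, "S1": 0, "S2": 1, "S3": 1}

def run_moore(inputs, state="S0"):
    """Clock the FSM once per input value; return the (state, z) trace and final state."""
    trace = []
    for a in inputs:
        trace.append((state, OUTPUT[state]))
        state = NEXT_STATE[state].get(a, state)  # hold state if no transition fires
    return trace, state

trace, final = run_moore([1, 1, 1, 1])
print(trace, final)
```

Starting from the reset state S0, four cycles with a=1 walk through S2, S3 and S1 and return to S0.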
MicroLab, VLSI-21 (83/94)

JMM v1.4
Further Reading

ISBN 0-13-181447-8 ISBN 0-7923-9472-0

Also:
- D. Perry, VHDL, Second Edition, McGraw Hill, 1993
- see VHDL tutorials at I3S-CD or on the web
  http://www.microlab.ch/academics/courses/vlsi/g.html
- don't forget to study the CBT tutorial on VHDL

MicroLab, VLSI-21 (84/94)

JMM v1.4
Test Bench /1

- avoid interactive simulation, because it can never
  be used again
- test benches reduce total simulation development
  time
- test benches are used to verify designs during
  stepwise refinement
- test bench methodology bridges simulation with
  automatic test equipment (ATE)

"I can relax, my test bench does
everything automatically."

MicroLab, VLSI-21 (85/94)

JMM v1.4
Test Bench /2
- compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is nice measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming into or going out of the
    lab

[Figure: test bench containing three blocks: control,
stimulus generation, and response generation and
verification, all connected to the device under test (DUT).]

"Why do we need MicroLab
if my test bench does the
job as well?"

MicroLab, VLSI-21 (86/94)

JMM v1.4
Test Bench in Design Flow
[Figure: design flow in which the same test bench (inp, out)
is reused at every step:
  design of VHDL model
  -> simulation of VHDL model (test bench around VHDL model)
  -> FPGA branch: FPGA synthesis, place & route
     -> FPGA test (debugger): test bench on the test machine
        around the FPGA chip
  -> ASIC branch: synthesis of logic model
     -> simulation of logic model (test bench)
     -> place & route, physical design
     -> simulation of extracted model (test bench)
     -> ASIC fabrication
     -> prototype test (ASIC): test bench on the test machine
        around the ASIC chip]
MicroLab, VLSI-21 (87/94)

JMM v1.4
VHDL Test Bench
library IEEE;
use IEEE.std_logic_1164.all;
entity TestBench is   -- test bench has no inputs, no outputs
end TestBench;
architecture sample of TestBench is
  signal clk, a: bit;
  signal b: bit;
  component MyCircuit
    port(clk,a:in bit; b: out bit);
  end component;
begin
  -- call of device under test (DUT)
  DUT: MyCircuit port map (clk,a,b);
  process   -- clk generation
  begin
    clk <= '0', '1' after 20 ns, '0' after 70 ns;
    wait for 100 ns;
  end process;
  TestPatternGenerator: block
  begin
    process   -- test pattern generation on a cycle by cycle basis
    begin
      a <= '0'; -- test cycle 1
      wait for 100 ns;
      a <= '1'; -- test cycle 2
      wait for 100 ns;
      ...
      -- response pattern verification not yet implemented!
    end process;
  end block;
end sample;
MicroLab, VLSI-21 (88/94)

JMM v1.4
Test Bench - Test Cycle

- design strictly synchronous circuits
- cycle based test benches

[Figure: timing diagram of one test cycle with the phases
"apply input patterns" and "capture output response";
traces: clock, input (valid), output (stable).]
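
The cycle-based idea (apply one input pattern per test cycle, then capture and compare the response) can be sketched in Python. Everything here is an assumption made for illustration: Accumulator4 is a hypothetical stand-in for the device under test, and run_test_bench is not part of any tool named in these slides.

```python
class Accumulator4:
    """Hypothetical DUT: a 4-bit accumulator updated once per clock."""
    def __init__(self):
        self.value = 0

    def clock(self, inp):
        self.value = (self.value + inp) & 0xF
        return self.value

def run_test_bench(dut, vectors):
    """Apply one stimulus per test cycle and compare the captured response."""
    failures = []
    for cycle, (stimulus, expected) in enumerate(vectors, start=1):
        response = dut.clock(stimulus)      # apply inputs, clock, capture
        if response != expected:            # response verification
            failures.append((cycle, stimulus, expected, response))
    return failures

vectors = [(3, 3), (5, 8), (9, 1)]          # (input pattern, expected output)
failures = run_test_bench(Accumulator4(), vectors)
print("PASS" if not failures else failures)
```

Because the vectors are plain data, the same list can be replayed against another model of the design, which is the reuse argument made for test benches.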

MicroLab, VLSI-21 (89/94)

JMM v1.4
ProTest Test Machine

- test bench controls CAD simulator and test machine
- low cost rapid prototyping and test system

MicroLab, VLSI-21 (90/94)

JMM v1.4
Conclusions

- HDLs are very useful for behavioral hardware
  system descriptions
- abstract models do not precisely reflect reality
- restriction to synthesizable coding is necessary
- technology independence opens the possibility of
  fast FPGA prototyping
- test benches increase chip quality and greatly
  decrease simulation time

MicroLab, VLSI-21 (91/94)

JMM v1.4
Exercises: VLSI-21: Test-Bench

- CAD Ex418: Test-Bench (difficulty: easy). Instead
  of interactive simulation or writing macros for
  interactive simulation, it is state-of-the-art to use
  test benches for simulation and chip test. Write a
  test bench file tb_SerialAdder2.vhd for the
  previous exercise Ex417. Generate the clock signal
  with a process and write sequential test cycles for
  the input signals. Be aware that the test bench has
  no input and output signals, but calls the unit
  under test (UUT) and generates all stimuli.

[Figure: test bench enclosing the VHDL model, with signals
inp and out.]

MicroLab, VLSI-21 (92/94)

JMM v1.4
Coming Up...

- Next topic...
  CAD exercises and mini FPGA projects: PWM, blackjack
  dealer, simple microprocessor, etc.

- Readings for next time...
  VHDL tutorials
  A Prototype Test System for ASICs and FPGAs with a Tight
  Link to VHDL and Verilog-HDL Based CAD Simulators,
  DATE'99, Design, Automation and Test in Europe
  Conference, Jacomet et al. (see on the MicroLab web)
  On a Development Environment for Real-Time Information
  Processing in System-on-Chip Solutions, SBCCI'01,
  IEEE 14th Symposium on Integrated Circuit and System
  Design, Brazil, Sept. 2001, Jacomet et al. (see on the
  MicroLab web)

MicroLab, VLSI-21 (93/94)

JMM v1.4
Exercises: VLSI-21 #1

- CAD Ex45x: PWM Project (difficulty: easy; time:
  medium): Design of a pulse width modulator
  (PWM) controlling a DC motor. The PWM shall
  have a microprocessor interface. The VHDL design
  is simulated, compiled and implemented in an
  FPGA and is supposed to drive a small DC motor.

- CAD Ex450 (difficulty: easy): Design the VHDL
  code of the PWM element. The btrdy and ack
  signals are handshake signals for communication
  with the microprocessor data bus. A value 0 on the
  8-bit data bus will switch off the DC motor
  (pOut='1'), a non-zero value will generate a PWM
  signal with an on-time of (data/256)*100% of a
  period. Analyze the VHDL syntax with gvan.

[Figure: PWM block with ports btrdy, ack, data (8 bit), clk,
rst and pOut; waveform of pOut with an on-time of
(data/256) * 100% within one PWM period.]
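
The on-time formula (data/256)*100% can be checked with a counter-compare model in Python. This is only a sketch of the arithmetic, not a solution in VHDL: the function names are hypothetical, the handshake signals are left out, and the electrical polarity of pOut is deliberately not modelled (the list holds "motor on" flags).

```python
def pwm_period(data, steps=256):
    """One PWM period as a list of 0/1 'motor on' flags, one per clock."""
    if not 0 <= data < steps:
        raise ValueError("data must fit the 8-bit bus")
    # The motor is on while a free-running counter is below the data value.
    return [1 if counter < data else 0 for counter in range(steps)]

def on_time_percent(data, steps=256):
    """On-time of one period in percent: (data / steps) * 100."""
    return 100.0 * sum(pwm_period(data, steps)) / steps

for data in (0, 64, 128, 255):
    print(data, on_time_percent(data))
```

A data value of 0 gives 0% on-time (motor off), and the largest 8-bit value 255 gives 255/256 of the period.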
MicroLab, VLSI-21 (94/94)

JMM v1.4
Exercises: VLSI-21 #2

- CAD Ex451 (difficulty: easy): Design a test bench
  for the PWM. Simulate your VHDL code with the
  Synopsys VSS simulator and use your test bench
  to verify its correct behavior.
  - Result: see exercise Ex451 on the MicroLab web

- CAD Ex452 (difficulty: easy): Synthesize the
  PWM VHDL code into a gate level schematic for a
  Xilinx FPGA target technology. Connect your VHDL
  signals to the correct FPGA pins. Perform the
  place & route of the logic elements.
  - Result: see exercise Ex452 on the MicroLab web

- CAD Ex453 (difficulty: easy): Download your PWM
  circuit into an FPGA and apply different
  PWM values to your circuit with the GECKO User
  Interface tool. Use an oscilloscope to verify its
  correct output behavior. This exercise has to be
  done in MicroLab, using the GECKO system.
  - Result: see exercise Ex453 on the MicroLab web

MicroLab, VLSI-21 (95/94)

JMM v1.4
Exercises: VLSI-21 #3

- CAD Ex400 (difficulty: easy): Design a 2:1
  multiplexer in VHDL. Use a dataflow model. Simulate your
  VHDL code with the Synopsys VSS simulator. Get
  familiar with interactive simulation.
  - Result: see exercise Ex400 on the MicroLab web

- CAD Ex401 (difficulty: medium): Design the
  SN74160 synchronous decimal counter in VHDL. Use a
  behavioral model. Simulate your VHDL code with
  the Synopsys VSS simulator and use macros for
  interactive simulation.
  - Result: see exercise Ex401 on the MicroLab web

- CAD Ex402 (difficulty: easy): Schematic entry of
  the blackjack dealer on block level using the Synopsys
  SGE tool.
  - Result: see exercise Ex402 on the MicroLab web

MicroLab, VLSI-21 (96/94)

JMM v1.4
Exercises: VLSI-21 #4

- CAD Ex403 (difficulty: easy): Skeleton VHDL code
  generation for the blackjack dealer from block-level
  schematics using the Synopsys SGE tool.
  - Result: see exercise Ex403 on the MicroLab web

- CAD Ex404 (difficulty: medium): Design the
  VHDL code of your blackjack dealer. Use the
  prepared templates as a guide.
  - Result: see exercise Ex404 on the MicroLab web

- CAD Ex405 (difficulty: medium): Write a VHDL
  test bench for your blackjack dealer. Generate the
  following sequence of card values: 3, 11, 11, 7, 2,
  11, 6.
  - Result: see exercise Ex404 on the MicroLab web
    (final result is: 21)

MicroLab, VLSI-21 (97/94)

JMM v1.4
Exercises: VLSI-21 #5

- CAD Ex406 (difficulty: easy): Synthesize the
  blackjack dealer VHDL code into a gate level
  schematic for a Xilinx FPGA target technology.
  Perform the place & route of the logic elements.
  - Result: see exercise Ex406 on the MicroLab web

- CAD Ex407 (difficulty: medium): Perform a
  back-annotation of your FPGA chip and simulate it again
  with the real timing information. Does your chip
  still work? Look for the errors.
  - Result: see exercise Ex407 on the MicroLab web

- CAD Ex408 (difficulty: medium): Download your
  blackjack dealer circuit into an FPGA and run your
  test bench again on the ProTest test system. This
  exercise has to be done in MicroLab.
  - Result: see exercise Ex408 on the MicroLab web

MicroLab, VLSI-21 (98/94)

JMM v1.4
VLSI Design I
Automatic Synthesis of Digital Circuits

"Why should I not enjoy
life instead of drawing
schematics if CAD tools
can do the job for me?"

- Overview
    design abstraction domains
    architectural models
- Goal: You are familiar with the design abstraction
  domains, the description-synthesis design method,
  the design strategies as well as the three synthesis
  steps. You know the FSMD architectural model as
  well as the interprocess communication models.
MicroLab, VLSI-22 (1/40)

JMM v1.4
Introduction

- system complexity is increasing
- product lifetime is decreasing

=> design efficiency is essential
=> new design methods are necessary
=> higher abstraction levels are introduced
=> CAD tools able to handle large amounts of data are
   needed

MicroLab, VLSI-22 (2/40)

JMM v1.4
Design Methodology
- budget ($, speed, area,
  power, schedule, risk)
- low-level building blocks,
  high-level architecture
- specification
- behavioural design, verification
- logic design, verification
- layout, verification

Tools named beside the steps: spice; paper & pencil;
schematics, simulation, timing analysis; layout, drc,
extraction, net compare, LVS (layout vs schematic)

"Gee, I skipped these steps
when doing the project!"
MicroLab, VLSI-22 (3/40)

JMM v1.4
Capture-Simulation Method

- bottom-up approach
- structure of a system is described
- knowledge of an experienced designer is difficult to
  automate

[Figure: schematic fragment with an AND gate (&) and a D
flip-flop (D, Q, CLK); signals data, clk, ena; label 3A.]

MicroLab, VLSI-22 (4/40)

JMM v1.4
Description-Synthesis Method

- top-down approach
- behaviour of a system is described
- technology independent
- CAD algorithms can search the solution space very
  quickly

if data-ready then
  bus := data;
else
  bus := high-Z;
end if;

[Figure: the description above next to a synthesized circuit
with an AND gate (&) and a D flip-flop (D, Q, clk).]

MicroLab, VLSI-22 (5/40)

JMM v1.4
Design methods for VLSI circuits

- use the advantages of both top-down and bottom-up design
  methods
- automatic optimisations are not always ideal,
  but
- optimising a 70'000 gate design on a
  100'000 gate gate-array makes no sense
=> need for abstract design languages
=> need to keep the design cycle short

"So what is it now:
top-down or bottom-up?"

MicroLab, VLSI-22 (6/40)

JMM v1.4
Abstraction Domains

- VLSI designs can be performed in 3 abstraction
  domains:
  - behavioural domain
  - structural domain
  - physical domain
- each domain gives different freedoms to the
  designer
  - parallel or serial algorithms
  - logic technology and bit-slice
  - full-custom and macro-cells ...

MicroLab, VLSI-22 (7/40)

JMM v1.4
Abstraction Domains: Y-Chart

[Figure: Y-chart with three axes; "synthesis" leads from the
behavioural domain to the structural domain.]

Behavioural Domain: applications, algorithms; programs;
  subroutines, b. equations; instructions
Structural Domain: processors; ALUs, registers; logic gates;
  transistors
Physical Domain: layout, transistors; cells; chips, modules;
  chips, MCMs, boards
Rings: system abstraction level, micro architecture
  abstraction level, logic abstraction level, circuit
  abstraction level

MicroLab, VLSI-22 (8/40)

JMM v1.4
Behavioural Domain

- description and verification of first ideas
- function and not implementation is asked
- modelling with general purpose languages
  - modula-2, pascal, c, c++, lisp, ...
  - matlab, mathematica, ...
  - vhdl, verilog-hdl, cathedral, ...
  - graphic languages such as vee, ...
- transformation to structural domain: synthesis

[Figure: behavioural domain axis: programs; subroutines,
b. equations; instructions.]

MicroLab, VLSI-22 (9/40)

JMM v1.4
Structural Domain

- description and verification of a solution
- implementation decisions taken
- restrictions like delay, signal strength, etc.
- modelling styles
  - vhdl, verilog-hdl
  - schematic
- transformation to physical domain:
  logic minimization, place and route tools

[Figure: structural domain axis: processors; ALUs, registers;
logic gates; transistors.]

MicroLab, VLSI-22 (10/40)

JMM v1.4
Physical Domain

- description and verification of physical
  implementation
- process technology specific implementation
  - floorplan, mask-layout, packaging
- description formats
  - cif, gds2
  - stick diagrams, symbolic layout

[Figure: physical domain axis: layout, transistors; cells;
chips, modules; chips, MCMs, boards.]

MicroLab, VLSI-22 (11/40)

JMM v1.4
Abstraction Levels

design domains are divided into several abstraction
levels:
- system level
- micro architecture level
- logic level
- circuit level

MicroLab, VLSI-22 (12/40)

JMM v1.4
Abstraction: System Level

- highest abstraction level
- description with HDLs or graphical block diagrams

[Figure: system block diagram with the blocks 24 bit graphic
accelerator, video interface, 64 bit RISC, 64 MByte memory,
8 GByte hard disk and ISDN interface.]

MicroLab, VLSI-22 (13/40)

JMM v1.4
Abstraction: Microarchitecture Level

- register transfer system is a pure sequential
  machine
- use of memory elements and combinational logic
- register transfer is a complete specification of
  what a chip will do on every cycle

[Figure: input and output connected through registers (reg)
separated by blocks of combinational logic.]

MicroLab, VLSI-22 (14/40)

JMM v1.4
Abstraction: Logic Level

- circuit description on a quite low abstraction level
- today only used to design optimised functional
  blocks

[Figure: logic-level schematic of an ALU slice with inputs
a, b, cin, sel, a mux, and outputs s and cout.]

MicroLab, VLSI-22 (15/40)

JMM v1.4
Abstraction: Circuit Level

- lowest abstraction level
- transistor schematic or mask-layout
- comparable to machine code in computer science

[Figure: transistor-level schematic with signals b, c and y.]

MicroLab, VLSI-22 (16/40)

JMM v1.4
Design Strategies

- the goal is to transfer an idea to a chip as fast as
  possible
- descriptions in the 3 abstraction domains
- strategies used:
  - hierarchy
  - regularity
  - modularity
  - locality

"A strategy?
Why not ad-hoc?"

MicroLab, VLSI-22 (17/40)

JMM v1.4
Design Strategies: Hierarchy

basic idea: divide and conquer
- dividing into modules and sub-modules until the complexity
  of the sub-modules is comprehensible
- comparison to software engineering: split programs
  into modules, procedures, subroutines.

[Figure: an adder block (inputs a, b, cin; outputs sum, cout)
and the gate-level schematic it contains.]

MicroLab, VLSI-22 (18/40)

JMM v1.4
Design Strategy: Regularity

=> goal is reduction of complexity
=> idea: divide into similar building blocks
- identical blocks, sub-blocks, cells, transistor sizes
- 1-dim. arrays: bit-slice technique
- 2-dim. arrays: systolic arrays

[Figure: chain of four identical full adders with inputs
a(i)..a(i+3), b(i)..b(i+3), carries c(i-1)..c(i+3) and sums
s(i)..s(i+3).]

MicroLab, VLSI-22 (19/40)

JMM v1.4
Design Strategies: Modularity

=> different modules should not influence each other
=> sub-modules with well formed interfaces:
- do not use transmission gates
- well defined signal types and strengths
- well defined interconnection widths, etc.

MicroLab, VLSI-22 (20/40)

JMM v1.4
Design Strategies: Locality

=> idea: reduction of complexity due to information
   hiding
- few global variables
- reduction of inter-module influences
- reduction of global wiring
- time locality leads to synchronous designs (compare
  local variables in software engineering)

"I can't see anything"

MicroLab, VLSI-22 (21/40)

JMM v1.4
Automatic Synthesis /1

- automatic synthesis: transformation of a design
  from behavioural to structural domain
- silicon compilation: transformation from
  behavioural to physical domain

[Figure: "synthesis" arrow from Behavioural Domain to
Structural Domain; "silicon compilation" arrow from
Behavioural Domain to Physical Domain.]

MicroLab, VLSI-22 (22/40)

JMM v1.4
Automatic Synthesis /2

- automatic synthesis tools on high abstraction levels
  do not exist yet
- not every description is synthesizable
- synthesis is a design process and not only
  coding as in software engineering
- synthesis steps:
  - allocation
  - scheduling
  - binding

MicroLab, VLSI-22 (23/40)

JMM v1.4
Automatic Synthesis: Allocation

- allocation defines the necessary resources
- clocking strategy, pipelining, memory structure etc.
  have to be defined
- manual allocation reduces the search space of
  design solutions
- trade-off between chip-area and performance
- parallel implementations of designs have high
  throughput, but consume large areas

[Figure: delay versus area plot of the design solutions
s1, s4, s6, s8, s10, s14, s18, s22.]

MicroLab, VLSI-22 (24/40)

JMM v1.4
Allocation: Example
- RTL example
    x = a + b;
    y = a * c;
    z = x + d;
    x = y - d;
    x = x + c;
- allocation: 1 adder, 1 multiplier, 1 subtractor

[Figure: data flow graph with inputs a, b, c, d; nodes
+ and * (result y), + (result z), - (result x2) and
+ (result x3).]
MicroLab, VLSI-22 (25/40)

JMM v1.4
Automatic Synthesis: Scheduling

- scheduling defines the operation sequencing
- operations are bound to clock cycles
- scheduling principles:
  - resource-limited: given a set of resources, a solution
    with minimal execution time has to be found
  - time-limited: given a total execution time, a minimal set
    of resources has to be found

MicroLab, VLSI-22 (26/40)

JMM v1.4
Scheduling: Example

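- resource limited scheduling
- each operation is bound to a clock cycle
- solutions for minimal execution time
- directed acyclic graphs can be used

[Figure: scheduled data flow graph with inputs a, b, c, d.
cycle 1: + and * (result y); cycle 2: - (result x2) and
+ (result z); cycle 3: + (result x3).]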
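
Resource-limited scheduling of the RTL example can be sketched as a simple list scheduler in Python. The assumptions are labeled here and in the code: the three writes to x are renamed x1, x2, x3 as in the figure, every operation takes one clock cycle, the allocation is 1 adder, 1 multiplier and 1 subtractor, and ties are broken by the order in which the operations are listed. The names OPS, RESOURCES and list_schedule are hypothetical.

```python
# result name -> (functional unit, operands); x is renamed x1, x2, x3
OPS = {
    "x1": ("add", ["a", "b"]),
    "y":  ("mul", ["a", "c"]),
    "z":  ("add", ["x1", "d"]),
    "x2": ("sub", ["y", "d"]),
    "x3": ("add", ["x2", "c"]),
}
RESOURCES = {"add": 1, "mul": 1, "sub": 1}   # the allocation
INPUTS = {"a", "b", "c", "d"}

def list_schedule(ops, resources, inputs):
    """Bind each operation to the earliest cycle with a free unit and ready operands."""
    ready_at = {name: 0 for name in inputs}  # cycle in which a value was produced
    schedule = {}
    cycle = 0
    while len(schedule) < len(ops):
        cycle += 1
        free = dict(resources)
        for name, (unit, operands) in ops.items():
            if name in schedule or free[unit] == 0:
                continue
            # Operands must come from an earlier cycle (one-cycle operations).
            if all(o in ready_at and ready_at[o] < cycle for o in operands):
                schedule[name] = cycle
                free[unit] -= 1
        for name, c in schedule.items():
            if c == cycle:
                ready_at[name] = cycle
    return schedule

schedule = list_schedule(OPS, RESOURCES, INPUTS)
print(schedule)
```

With a single adder, z and x3 cannot share a cycle with x1, which is why the example needs three cycles.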

MicroLab, VLSI-22 (27/40)

JMM v1.4
Automatic Synthesis: Binding

- binding phase: operations and memory accesses
  within the clock cycles are bound to the hardware
  resources
- resources can be reused in different clock cycles
- binding steps:
  - variables are bound to memory elements
  - operations are bound to functional blocks
  - interconnection elements are bound for data transfers
    (busses, multiplexers)

MicroLab, VLSI-22 (28/40)

JMM v1.4
Binding: Example

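- variables are bound to memories
- temporary variables x1 and x2 are not used
  simultaneously

[Figure: scheduled data flow graph with inputs a, b, c, d.
cycle 1: + (result x1) and * (result y); cycle 2: + (result z)
and - (result x2); cycle 3: + (result x3).]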

MicroLab, VLSI-22 (29/40)

JMM v1.4
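Binding: Example cont.

[Figure: resulting datapath built from registers (reg),
multiplexers (mux), one multiplier (mult), one adder (add)
and one subtractor (sub); labeled values a, b, d, y, x1, x2;
the adder produces z, x1 and x3, the subtractor produces x2.]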

MicroLab, VLSI-22 (30/40)

JMM v1.4
Architecture Models

- synthesis is based on the knowledge of a set of
  architecture models and design styles
- design styles:
  - parallel or serial datapath
  - interrupt or polling control
  - memory access types (cache ...)

MicroLab, VLSI-22 (31/40)

JMM v1.4
Architecture Models:
Microarchitecture

microarchitecture components
- functional units
  - adder, multiplier, comparator, ALU, etc.
- memory elements
  - latch, flip-flop, register, register-file, RAM, ROM ...
- interconnection units
  - bus, multiplexer

MicroLab, VLSI-22 (32/40)

JMM v1.4
Architectural Models:
Combinational Logic

combinational logic:
- non subdividable units
  - encoder, decoder, carry-lookahead adder ...
- subdividable units
  - ripple-carry adder, selector, ALUs, ...

implementation forms
- ROM (table lookup)
- PLA structures (2 stage logic)
- multistage logic
- bit-slice, systolic array, etc.

MicroLab, VLSI-22 (33/40)

JMM v1.4
Architectural Models:
Finite State Machines

finite state machines (FSM) are classical control
structures
- autonomous FSM
  - no inputs (image processing, ...)
- non-autonomous FSM with inputs (general purpose)
  - Mealy machine (general)
  - Moore machine (restricted)
  - Medvedev machine (hazard free)

MicroLab, VLSI-22 (34/40)

JMM v1.4
Architectural Models:
Control Unit / Data Path

- FSMs are used for control unit tasks
- datapaths are used as functional units
=> control unit - datapath model (FSMD model)

[Figure: FSMD block diagram. FSM (transfer logic, state
register) with control inputs and control outputs; datapath
(transfer logic, functional unit) with datapath inputs and
datapath outputs; signals "datapath control" and "status"
connect the FSM and the datapath.]

MicroLab, VLSI-22 (35/40)

JMM v1.4
Architectural Models:
System Architecture

- FSMD is used as process on system level
- system consists of a set of processes
- hierarchical FSMD model
- process synchronization is needed

[Figure: two FSMD processes (process 1 with clock1, process 2
with clock2), each with an FSM (transfer logic, state
register), a datapath (transfer logic, functional unit),
control inputs, control outputs, datapath control and status;
the processes are coupled through D flip-flops (D, Q) and a
shared databus.]
MicroLab, VLSI-22 (36/40)

JMM v1.4
Architectural Models:
Interprocess Communication

- synchronous or asynchronous communications
- no protocol, delay known
- handshake protocol

[Figure: process 1 and process 2 connected by the signals
request, acknowledge and data (data valid).]

MicroLab, VLSI-22 (37/40)

JMM v1.4
Architectural Models:
Implementation Constraints

- behavioural modelling uses abstract models, which do not
  model reality precisely
=> implementation constraints / pitfalls
- deactivation of set and reset of latches simultaneously
- clock skew in shift registers leads to races between clock
  and data (two phase clocking strategy)
- Moore and Mealy FSMs have hazards
- asynchronous inputs lead to undefined FSM states
=> never use:
- gated clocks
- combinatorial outputs for asynchronous inputs
- asynchronous inputs as FSM inputs

MicroLab, VLSI-22 (38/40)

JMM v1.4
Conclusions

- description-synthesis method
- system design with HDLs (parallel constructions,
  RTL level)
- top-down and bottom-up design
- abstract models are not precise
- races, hazards, delays, signal strength, ...
- silicon compiler does not exist

MicroLab, VLSI-22 (39/40)

JMM v1.4
Coming Up...

Next time...
Hardware description languages

Reading Weste:
- Sections 6 thru 6.2.7 (design strategy)
- 6.4 thru 6.4.5 (design methods)
- 6.5 thru 6.5.4 (design capture tools)

Self study Weste:
- 6.6 thru 6.6.8 (design verification)
- 6.8 thru 6.9 (data sheets)

MicroLab, VLSI-22 (40/40)

JMM v1.4
VLSI Design I
Design for Test

He’s dead Jim...

- Overview
    design for test architectures
    ad-hoc, scan based, built-in
- Goal: You are familiar with testability metrics and
  you know ad-hoc test structures as well as
  scan-based test structures. You can apply built-in test
  structures such as BILBO and boundary scan.
MicroLab, VLSI-23 (1/24)

JMM v1.3
Design For Test
What can we do to increase testability?

- increase observability
  => add more pins (?!)
  => add small "probe" bus, selectively
     enable different values onto bus
  => use a hash function to "compress" a
     sequence of values (e.g., the values of a
     bus over many clock cycles) into a
     small number of bits for later read-out
  => cheap read-out of all state information

- increase controllability
  => use muxes to isolate sub-modules and
     select sources of test data as inputs
  => provide easy setup of internal state

Design strategies for test (design for testability):
- ad-hoc testing
- scan-based approaches
- self-test and built-in testing

MicroLab, VLSI-23 (2/24)

JMM v1.3
Ad-hoc testing #1
- Ad-hoc test techniques are a collection of ideas
  aimed at reducing the test time. Common
  techniques are:
  - partitioning large sequential circuits
  - adding test points
  - adding multiplexers
  - providing for easy state access

[Figure: counter stages built from half-adders (outputs
Q0..Q3, carries co0..co3, vdd at the first stage), drawn
three times; two of the versions have additional load and
test multiplexers between the stages.]
MicroLab, VLSI-23 (3/24)

JMM v1.3
Ad-hoc testing #2

bus oriented test technique
[Figure: units 1, 2, 3 and 4 attached to a common bus.]

multiplexer based testing
[Figure: Module A and Module B with inputs A inp, B inp,
A control, B control and outputs A out, B out, interconnected
through multiplexers that are switched by test1 and test2.]

Module A test: {test1,test2}={0,1}


MicroLab, VLSI-23 (4/24)

JMM v1.3
Scan-based test techniques #1
Idea: have a mode in which all registers are chained
into one giant shift register which can be loaded/
read out bit serially. Test remaining (combinational)
logic by
(1) in "test" mode, shift in new values for all
    register bits thus setting up the inputs to the
    combinational logic
(2) clock the circuit once in "normal" mode, latching
    the outputs of the combinational logic back into
    the registers
(3) in "test" mode, shift out the values of all
    register bits and compare against expected
    results. One can shift in new test values at the
    same time (i.e., combine steps 1 and 3).

[Figure: registers around the combinational logic (CL); each
flip-flop (D, Q, clk) has a 2:1 multiplexer switched by
normal/test that selects either the normal data input or the
shift input, forming a chain from scan-in (shift in) to
scan-out (shift out).]
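
Steps (1) to (3) can be traced with a small Python model of a scan chain. It is a sketch under assumptions: ScanCircuit, scan_test and the three-bit example logic are hypothetical, the chain shifts from register 0 towards the last register, and scan-out is taken from the last register.

```python
class ScanCircuit:
    """Registers that can be clocked normally or shifted as one scan chain."""
    def __init__(self, n_bits, comb_logic):
        self.regs = [0] * n_bits
        self.comb_logic = comb_logic     # maps register values to next values

    def shift(self, scan_in_bits):
        """Test mode: shift bits in serially and collect the bits shifted out."""
        out = []
        for bit in scan_in_bits:
            out.append(self.regs[-1])    # scan-out is the last register
            self.regs = [bit] + self.regs[:-1]
        return out

    def capture(self):
        """Normal mode: one clock latches the combinational outputs."""
        self.regs = list(self.comb_logic(self.regs))

def scan_test(circuit, pattern, next_pattern):
    circuit.shift(reversed(pattern))                   # step 1: load the pattern
    circuit.capture()                                  # step 2: one normal clock
    observed = circuit.shift(reversed(next_pattern))   # step 3, combined with step 1
    return observed[::-1]                              # back in register order

# Hypothetical combinational logic under test.
logic = lambda r: [r[0] ^ r[1], r[1] & r[2], 1 - r[0]]
circuit = ScanCircuit(3, logic)
response = scan_test(circuit, [1, 0, 1], [0, 1, 1])
print(response, circuit.regs)
```

The shifted-out response is compared against the value expected from the logic, and the chain already holds the next pattern, which is the overlap of steps 1 and 3.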

MicroLab, VLSI-23 (5/24)

JMM v1.3
Scan-based test techniques #2
- serial scan

[Figure: serial scan chain: scan registers (D, Q, clk)
alternate with combinational logic blocks CL1 and CL2 and are
chained from scan-in to scan-out.]

- partial serial scan: sometimes it is not area and
  speed efficient to implement scan in every location
  where a register is used (signal processing)

[Figure: registers R1 to R6 connected through combinational
logic blocks (CL); only some of them are scan registers.]

MicroLab, VLSI-23 (6/24)

JMM v1.3
Level sensitive scan design
- A popular approach is the level sensitive scan
  design (LSSD) technique from T.W. Williams
  - the circuit is level sensitive (steady state response is
    independent of circuit and wire delays within a circuit):
    hazard free
  - each register may be converted to a serial shift register

[Figure: LSSD latch pairs L1/L2 (ports D, C, I, A, B, T1, T2)
forming reg A and reg B around a combinational logic block,
with serial data in, serial data out and the clocks c1, c2
and shift-clk. Timing diagram of c1, shift-clk and c2 with
the phases: shift data into reg A, normal operation, shift
reg B out.]
MicroLab, VLSI-23 (7/24)

JMM v1.3
Scan Elements
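- LSSD
[Figure: symbol of the LSSD latch pair L1/L2 (inputs D, C, I,
A, B; outputs T1, T2) and its gate-level implementation.]

- scan FF
[Figure: scan flip-flop: a 2:1 multiplexer switched by test
enable TE selects D or test input TI in front of a D
flip-flop (clk, Q), and a second implementation using TE,
clka and clkb.]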
MicroLab, VLSI-23 (8/24)

JMM v1.3
Self-Test Techniques: BILBO
Problem: Scan-based approach is great for testing combinational logic
but can be impractical when trying to test memory blocks, etc. because
of the number of separate test values required to get adequate fault
coverage.

Solution: use on-chip circuitry to generate test data
and check the results. Can be used at every power-on
to verify correct operation!

[Figure: a multiplexer switched by normal/test selects the
inputs of the circuit under test; FSM A supplies the test
inputs, FSM B observes the outputs and produces "okay".]

FSM A: Generate pseudo-random data for most circuits by
using, e.g., a linear feedback shift register (LFSR).
Memory tests use more systematic FSMs to create
ADDR and DATA patterns.

FSM B: For pseudo-random input data simply compute some
hash of output values and compare against expected
value ("signature") at end of test. Memory data can be
checked cycle-by-cycle.

MicroLab, VLSI-23 (9/24)

JMM v1.3
Linear Feedback Shift Register (LFSR)
If Ci’s are not programmable, can eliminate
AND gates and some XOR gates...

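[Figure: n-stage LFSR; D flip-flops on a common clk, each stage output gated by an AND with its coefficient c1 ... cn and combined by XOR (=1) gates into the feedback]

$1 + c_1 x + c_2 x^2 + c_3 x^3 + \dots + c_{n-1} x^{n-1} + c_n x^n$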

‹ with a small number of XOR gates the cycle time is very fast. Cycles through a fixed sequence of states (can be as long as 2^n - 1 for some n's). Handy for large modulo-n counters.
‹ different responses for different initial states
‹ different responses for different ci

→ pseudo-random sequence generator (PRSG)
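
The state sequence of such a register can be enumerated in a few lines. This is a minimal sketch, assuming one particular mapping that the slide does not spell out: the outputs of the stages with c_i = 1 are XORed and fed into stage 1, stage 1 is the least significant bit, and the state is read as an integer. The function name `lfsr_sequence` is made up for this illustration.

```python
def lfsr_sequence(taps, n, start=1):
    """List the states of an n-stage LFSR until it returns to `start`.

    taps: 1-based stage indices i with c_i = 1 (stage 1 is the LSB).
    """
    mask = (1 << n) - 1
    first = start & mask
    state = first
    states = []
    for _ in range(1 << n):              # an n-bit register has at most 2**n states
        states.append(state)
        feedback = 0
        for i in taps:                   # XOR of all tapped stage outputs
            feedback ^= (state >> (i - 1)) & 1
        state = ((state << 1) | feedback) & mask   # shift, feed back into stage 1
        if state == first:
            break
    return states


print(lfsr_sequence([1, 3], n=3))        # polynomial 1 + x + x^3 -> [1, 3, 7, 6, 5, 2, 4]
print(len(lfsr_sequence([1, 4], n=4)))   # polynomial 1 + x + x^4 -> 15 states
```

With this convention the polynomial 1 + x + x^3 and start value 1 visit all 2^3 - 1 = 7 non-zero states, and 1 + x + x^4 visits all 15.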

MicroLab, VLSI-23 (10/24)

JMM v1.3
Signature Analysis
‹ signature analysis is used to compact a data stream into a so-called signature
‹ different responses for different ci; many well-known CRC (cyclic redundancy check) polynomials correspond to a specific choice of ci's.
[Figure: serial-input signature register; the serial input is XORed with the LFSR feedback (taps c1 ... cn) ahead of the first D flip-flop]

[Figure: parallel-input signature register; the inputs z1 ... zn are XORed into the individual stages (taps c1 ... cn-1), outputs q1 ... qn]
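
A serial-input signature register can be sketched as follows. The assumptions are the same as for a plain LFSR model (tapped stages XORed into stage 1, stage 1 as least significant bit, all-zero start state), and the function name `signature` and the bit streams are invented for the illustration.

```python
def signature(bits, taps, n):
    """Compact a serial bit stream into an n-bit signature.

    taps: 1-based stage indices i with c_i = 1 (stage 1 is the LSB).
    """
    mask = (1 << n) - 1
    state = 0                                # register is reset before the test
    for b in bits:
        feedback = 0
        for i in taps:
            feedback ^= (state >> (i - 1)) & 1
        # the incoming bit is XORed with the feedback ahead of stage 1
        state = ((state << 1) | (feedback ^ (b & 1))) & mask
    return state


good = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0]   # response of a fault-free circuit (made up)
bad = list(good)
bad[4] ^= 1                                    # a single-bit error in the response

print(signature(good, [1, 4], 4) != signature(bad, [1, 4], 4))   # True
```

Because the register is linear and stage n is tapped, a single wrong bit always changes the signature. Longer error patterns can alias to the fault-free signature, with a probability of roughly 2^-n for an n-bit register.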

MicroLab, VLSI-23 (11/24)

JMM v1.3
LFSR Polynomials
‹ polynomials for maximal-length sequences for n from 1 up to 32

n                      f(x)
1,2,3,4,6,7,15,22      1+x+x^n
5,11,21,29             1+x^2+x^n
10,17,20,25,28,31      1+x^3+x^n
9                      1+x^4+x^n
23                     1+x^5+x^n
18                     1+x^7+x^n
8                      1+x^2+x^3+x^4+x^n
12                     1+x+x^4+x^6+x^n
13                     1+x+x^3+x^4+x^n
14,16                  1+x^3+x^4+x^5+x^n
19,27                  1+x+x^2+x^5+x^n
24                     1+x+x^2+x^7+x^n
26                     1+x+x^2+x^6+x^n
30                     1+x+x^2+x^23+x^n
32                     1+x+x^2+x^22+x^n

‹ examples of CRC's

n                      CRC
8                      1+x+x^4+x^5+x^7+x^8
16                     1+x^2+x^15+x^16

MicroLab, VLSI-23 (12/24)

JMM v1.3
BILBO #1
‹ A very popular built-in test structure is the built-in logic block observation (BILBO) from Koenemann
‹ BILBO operates in 4 different modes

[Figure: the circuit between two BILBO registers, shown once per mode]
parallel register mode: both BILBOs in register mode, normal operation of circuit
PRSG or signature analysis mode: one BILBO in PRSG mode, the other in signature analysis mode, normal operation of circuit
scan mode: both BILBOs in scan mode
reset mode: both BILBOs in reset mode

MicroLab, VLSI-23 (13/24)

JMM v1.3
BILBO #2

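‹ example of a BILBO element with polynomial 1+x+x^4

[Figure: four-stage BILBO register; parallel inputs D0 to D3, outputs Q1 to Q4, control inputs c1 and c0; each stage has AND and XOR (=1) gates ahead of a D flip-flop on clk; a multiplexer (0/1) selects between scan in and the XOR feedback; scan out at the last stage]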

mode c1 c0 function

A 0 0 scan mode
B 1 0 reset
C 0 1 PRSG or signature analyzer
D 1 1 parallel registers

MicroLab, VLSI-23 (14/24)

JMM v1.3
IDDQ Testing

[Figure: ammeter in the VDD supply line of the chip (measures IDD); GND]

Idea: CMOS logic should draw no current when it's not switching. So after initializing the circuit to eliminate tri-state fights, disabling pseudo-NMOS gates, etc., the power-supply current should be close to zero (leakage only) after all signals have settled.

Good for detecting bridging faults (shorts).

May want to try several different circuit states to ensure all parts of the chip have been observed.

MicroLab, VLSI-23 (15/24)

JMM v1.3
System-Level Test: Boundary Scan
‹ The IEEE 1149.1 boundary scan architecture (also called JTAG) provides a standardized serial scan path through the I/O pins of a chip
‹ at the board level, chips obeying the standard may be connected in a variety of series and parallel combinations for board testing (replacing the bed of nails)
‹ standardized tests:
  ‹ connectivity tests between components
  ‹ sampling and setting chip I/Os
  ‹ distribution and collection of self-test or built-in-test results

[Figure: chips on a PCB; each IO pad has a boundary cell, and the cells are chained by the serial test interconnect from serial data in to serial data out, alongside the PCB interconnect]

MicroLab, VLSI-23 (16/24)

JMM v1.3
Boundary Scan: Test Access Port

‹ The test access port (TAP) is a definition of the interface that needs to be included in an IC
  ‹ TCK: test clock input
  ‹ TMS: test mode select
  ‹ TDI: test data input
  ‹ TDO: test data output
  ‹ TRST: optional signal for asynchronously resetting the TAP
‹ the test architecture

[Figure: TDI feeds the test data registers and the instruction registers; a multiplexer (0/1) selects which of them drives TDO; the instruction decode and the TAP controller (inputs TCK, TMS and optionally TRST) generate the clocks/control signals]
MicroLab, VLSI-23 (17/24)

JMM v1.3
Boundary Scan: TAP controller

‹ State machine for the TAP controller. TMS is the control signal.

[Figure: TAP controller state diagram with 16 states; every transition is labelled with the TMS value (0 or 1). States: test-logic reset, run-test/idle; select-DR-scan, capture-DR, shift-DR, exit1-DR, pause-DR, exit2-DR, update-DR; select-IR-scan, capture-IR, shift-IR, exit1-IR, pause-IR, exit2-IR, update-IR]
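
The state machine is small enough to write down as a transition table. The sketch below follows the transitions of the IEEE 1149.1 TAP controller; the table layout and the helper names `tap_step` and `tap_run` are choices made for this illustration only.

```python
# next state for (TMS = 0, TMS = 1), evaluated on each rising edge of TCK
TAP_NEXT = {
    "test-logic-reset": ("run-test/idle", "test-logic-reset"),
    "run-test/idle":    ("run-test/idle", "select-dr-scan"),
    "select-dr-scan":   ("capture-dr", "select-ir-scan"),
    "capture-dr":       ("shift-dr", "exit1-dr"),
    "shift-dr":         ("shift-dr", "exit1-dr"),
    "exit1-dr":         ("pause-dr", "update-dr"),
    "pause-dr":         ("pause-dr", "exit2-dr"),
    "exit2-dr":         ("shift-dr", "update-dr"),
    "update-dr":        ("run-test/idle", "select-dr-scan"),
    "select-ir-scan":   ("capture-ir", "test-logic-reset"),
    "capture-ir":       ("shift-ir", "exit1-ir"),
    "shift-ir":         ("shift-ir", "exit1-ir"),
    "exit1-ir":         ("pause-ir", "update-ir"),
    "pause-ir":         ("pause-ir", "exit2-ir"),
    "exit2-ir":         ("shift-ir", "update-ir"),
    "update-ir":        ("run-test/idle", "select-dr-scan"),
}


def tap_step(state, tms):
    """Return the next TAP state for one TCK edge with the given TMS bit."""
    return TAP_NEXT[state][tms]


def tap_run(state, tms_bits):
    """Apply a sequence of TMS bits and return the final state."""
    for tms in tms_bits:
        state = tap_step(state, tms)
    return state


print(tap_run("test-logic-reset", [0, 1, 0, 0]))   # shift-dr
print(tap_run("pause-ir", [1, 1, 1, 1, 1]))        # test-logic-reset
```

Holding TMS at 1 for five clocks reaches test-logic reset from every state, which is why the TRST pin can be optional.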

MicroLab, VLSI-23 (18/24)

JMM v1.3
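Boundary-scan: IR

‹ Instruction register (IR): minimum 2 bits

[Figure: one IR bit cell; a multiplexer (0/1) controlled by shiftIR selects between data and the output of the last cell; a first flip-flop clocked by clockIR feeds the next IR bit; a second flip-flop clocked by updateIR holds the IR bit; TRST and reset act on the cell]

[Timing diagram: shiftIR, clockIR and updateIR over the FSM states capture-IR, shift-IR, exit1-IR, pause-IR, exit2-IR, update-IR]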

MicroLab, VLSI-23 (19/24)

JMM v1.3
Boundary-scan: DR
‹ TAP data register (DR)

[Figure: between TDI and TDO, in parallel: boundary scan registers, internal data register, bypass register (1 bit)]

‹ the boundary scan register is a special case of a data register. It allows circuit board interconnections to be tested, external components to be tested, and the state of the chip's digital I/Os to be sampled. The boundary scan register is mandatory.
‹ internal data registers are optional and add additional access to the circuit.
‹ the bypass register is a 1 bit register used to bypass a whole chip.

MicroLab, VLSI-23 (20/24)

JMM v1.3
Boundary-scan: DR
‹ boundary scan input and output cells

[Figure: input cell between PAD and "to chip", output cell between "from chip" and PAD; in each cell a multiplexer controlled by shiftDR selects between the parallel input and the last cell, a flip-flop clocked by clockDR feeds the next cell, a flip-flop clocked by updateDR feeds the output multiplexer controlled by mode]

‹ boundary scan bi-directional cell

[Figure: three such cells chained from last cell to next cell, for the enable signal, the data from chip and the data to chip, around the bidir PAD]
MicroLab, VLSI-23 (21/24)

JMM v1.3
Boundary scan: instructions

‹ Minimum 3 instructions
  ‹ Bypass (all 1): it is used to bypass any serial data registers in a chip with a 1 bit register. This allows specific chips to be tested in a serial-scan chain without having to shift through the accumulated SR stages in all the chips
  ‹ Extest (all 0): testing of off-chip circuitry
  ‹ sample/preload: places the boundary scan registers (at the chip's I/O pins) in the DR chain, and samples or preloads the chip's I/Os
‹ optional recommended instructions:
  ‹ Intest: single-step testing of internal circuitry via the boundary scan registers
  ‹ Runbist: run internal self-testing procedures within a chip

MicroLab, VLSI-23 (22/24)

JMM v1.3
Coming Up...
Next time:
Top down design. Hardware description languages,
logic synthesis.

Readings …
Weste:
‹ 7.3 through 7.3.3.3 (ad-hoc & scan-based testing)
‹ 7.3.4 through 7.3.4.1 (BILBO)
‹ 7.3.5 (Iddq testing)
‹ 7.5 (boundary scan)

MicroLab, VLSI-23 (23/24)

JMM v1.3
Exercises: VLSI-22

‹ Ex vlsi22.1 (difficulty: easy): calculate the pseudo-random sequence of an LFSR with the implemented polynomial 1+x+x^3; use the start value x=1
‹ Result: 1,3,7,6,5,2,4,1,...

MicroLab, VLSI-23 (24/24)

JMM v1.3
VLSI Design II
Small Signal FET Model
and Diode Models

Overview
‹ small signal equivalent circuit for fet and diodes
‹ advanced large-signal fet modeling and second-order effects

Goal: You can use the small signal equivalent circuit of a diode and a MOS transistor. You are able to determine the parameters of a fet and have a feeling for a MOS fet. You are familiar with advanced modeling like weak inversion, short-channel effects and leakage.
MicroLab, VLSI-24(1/22)

JMM v1.3
Summary: Large Signal Model
MOS fets have 3 regions of operation
‹ cutoff region (subthreshold): VGS <= Vth
‹ linear region (triode region): VGS > Vth ; 0 < VDS < VDSsat
‹ active region (saturated region): VGS > Vth ; VDSsat < VDS

cutoff (subthreshold):
$I_{DS} = 0$

linear region:
$I_{DS} = \mu C_{ox} \frac{W}{L} \left[ (V_{GS} - V_{th}) V_{DS} - \frac{V_{DS}^2}{2} \right]$

active region (with channel length modulation):
$I_{DS(sat)} = \frac{\mu C_{ox}}{2} \frac{W}{L} (V_{GS} - V_{th})^2 \left[ 1 + \lambda (V_{DS} - V_{eff}) \right]$
$\lambda = \frac{k_{rds}}{2 L \sqrt{V_{DS} - V_{eff} + \Phi_0}}$, $k_{rds} = \sqrt{\frac{2 \varepsilon_{Si}}{q N_A}}$

Body effect:
$V_{th} = V_{th0} + \gamma \left( \sqrt{V_{SB} + |2\phi_F|} - \sqrt{|2\phi_F|} \right)$
$\gamma = \frac{\sqrt{2 \varepsilon_{Si} q N_A}}{C_{ox}}$, $\phi_F = -V_T \ln\left( \frac{N_A}{n_i} \right)$

with $V_{eff} = V_{GS} - V_{th}$

MicroLab, VLSI-24(2/22)
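
The three regions of the large signal model can be collected into one function. This is a minimal sketch, assuming a lumped gain factor k = µCox·W/L, a constant λ and VDSsat = Veff; the numeric values for k, vth and lam are invented and do not belong to any process mentioned in these notes.

```python
def ids_large_signal(vgs, vds, k, vth, lam=0.0):
    """Drain current of an nfet from the simple large signal model.

    k   : mu * Cox * W / L in A/V^2 (lumped, hypothetical value)
    vth : threshold voltage in V
    lam : channel length modulation factor in 1/V
    """
    veff = vgs - vth
    if veff <= 0:
        return 0.0                                   # cutoff
    if vds < veff:
        return k * (veff * vds - vds ** 2 / 2)       # linear (triode) region
    return 0.5 * k * veff ** 2 * (1 + lam * (vds - veff))   # active region


K, VTH, LAM = 1e-3, 0.8, 0.05        # illustrative values only
for vds in (0.1, 0.4, 1.0, 2.0):
    print(vds, ids_large_signal(1.2, vds, K, VTH, LAM))
```

At VDS = Veff both expressions give k·Veff²/2, so the model is continuous at the boundary between the linear and the active region.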
JMM v1.3
Advanced Large Signal Modeling:
Cutoff or subthreshold region
Condition: VGS <= Vth
Channel is not inverted and therefore IDS = 0

A more precise definition, better suited for analog design, takes into account that the channel does not become inverted suddenly when the gate-source voltage is increased. Depending on the gate-source voltage, we define three regions of inversion:
‹ weak inversion: Veff < -100mV
‹ moderate inversion: -100mV < Veff < 100mV
‹ strong inversion: Veff > 100mV
(some designers use 200mV instead)

weak inversion:
$I_{DS} \cong I_{D0} \frac{W}{L} e^{q V_{GS} / (n k T)}$, $n \cong 1.5$

[Figure: IDS versus UGS, quadratic in strong inversion, with Ut marked on the UGS axis; log IDS versus UGS, exponential in weak inversion]
MicroLab, VLSI-24(3/22)

JMM v1.3
Advanced Large Signal Modeling:
Short Channel Effects
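As device dimensions are scaled down, short-channel effects degrade the operation of mos fets
‹ mobility degradation: short channels and large electric fields provoke more electron collisions. Carrier velocity saturates, as it is no longer proportional to the electric field:
$\nu_d \cong \frac{\mu_n E}{1 + E / E_c}$
$I_D = \frac{\mu_n C_{ox}}{2 (1 + \theta V_{eff})} \frac{W}{L} V_{eff}^2$ with $\theta = \frac{1}{L E_c}$ (for $\theta = 0$ this is the square law)
$R_{sx} \cong \frac{1}{\mu_n C_{ox} W E_c}$
[Figure: Id versus UGS, square law compared with the degraded curve; fet with a resistor Rsx at the source, gate voltages UGS and U'GS]

‹ hot carrier effects, especially in nfets due to their higher mobility: high-velocity electrons can generate electron-hole pairs in the drain-to-substrate region when $V_G \gg V_{th}$ and $V_D \gg 0$

→ reduced output impedance

[Figure: cross-section with n+ source and drain showing the punch-through current and the drain-to-source current]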
MicroLab, VLSI-24(4/22)

JMM v1.3
Advanced Large Signal Modeling:
Leakage Currents
An important second-order device limitation is the leakage current of the junctions (example: sample-and-hold time)
‹ the intrinsic concentration is a strong function of temperature, so the leakage current is also strongly dependent on temperature (it approximately doubles for every 11 °C)
‹ leakage current of a reverse-biased junction:

$I_{lk} \cong \frac{q A_j n_i}{2 \tau_0} x_d$, with the junction area $A_j$ and $\tau_0 \cong \frac{1}{2} (\tau_n + \tau_p)$ from the electron and hole lifetimes

$x_d = \sqrt{\frac{2 \varepsilon_{si}}{q N_A} (\Phi_0 + V_R)}$
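
The rule of thumb "doubles for every 11 °C" translates into a simple scaling factor. A minimal sketch, assuming the doubling step stays constant over the temperature range of interest; the function name is invented for the illustration.

```python
def leakage_scale(delta_t_celsius, doubling_step=11.0):
    """Factor by which the junction leakage grows for a temperature rise.

    Assumes the leakage doubles every `doubling_step` degrees Celsius.
    """
    return 2.0 ** (delta_t_celsius / doubling_step)


print(leakage_scale(11))    # 2.0
print(leakage_scale(55))    # 32.0, e.g. going from 25 C to 80 C
```

A rise of 55 °C therefore increases the leakage by a factor of about 32, which shortens the usable hold time of a sample-and-hold circuit by the same factor.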

MicroLab, VLSI-24(5/22)

JMM v1.3
Small Signal Equivalent Circuits
Why do we love them?
‹ Find Id of a transistor in active region when the gate is driven with a voltage source Vgs = V0 sin(ωt)
→ It is handy to use simple linear equations!

What are small signal parameters?
‹ Instead of using nonlinear transistor curves, we determine the operating point and use the derivative at this point
[Figure: curve f(x) with its tangent at the operating point (x0, f(x0))]
‹ Taylor: $f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n$
‹ approximation: $f(x) \approx f(x_0) + \frac{df(x_0)}{dx} (x - x_0)$, where $f(x_0)$ is the operating point and the second term is the small signal
– small signal parameters are denoted with small letters
– small signal parameters are very handy for building simple equivalent circuits
MicroLab, VLSI-24(6/22)

JMM v1.3
Transconductance #1

 The most important small signal component is the transconductance. A transconductance behaves like a voltage controlled current source: it describes the change of output current when the input voltage is varied.
‹ gm: main transconductance, describes the change of the drain current when a voltage is applied between gate and source.
‹ gds: output conductance, accounting for the finite output impedance of the transistor. Models the channel length modulation effect when the drain to source voltage varies.
‹ gs: body transconductance, describing how the output current depends on the source to substrate voltage (body effect).

MicroLab, VLSI-24(7/22)

JMM v1.3
Transconductance #2
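$I_{DS(sat)} = \frac{\mu C_{ox}}{2} \frac{W}{L} (V_{GS} - V_{th})^2 \left[ 1 + \lambda (V_{DS} - V_{eff}) \right]$

$g_m = \frac{\partial I_D}{\partial V_{GS}}$
$g_m = \mu_n C_{ox} \frac{W}{L} V_{eff} = \frac{2 I_D}{V_{eff}} = \sqrt{2 \mu_n C_{ox} \frac{W}{L} I_D}$

$g_s = \frac{\partial I_D}{\partial V_{SB}} = \frac{\partial I_D}{\partial V_{tn}} \cdot \frac{\partial V_{tn}}{\partial V_{SB}}$
$g_s = \frac{\gamma \, g_m}{2 \sqrt{V_{SB} + |2\phi_F|}}$ (the negative sign is eliminated by changing the current direction in the equivalent circuit)

$g_{ds} = \frac{\partial I_D}{\partial V_{DS}}$
$g_{ds} = \frac{1}{r_{ds}} = \lambda I_{Dsat} \approx \lambda I_D$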
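
The formulas can be evaluated at an operating point as follows. This is a sketch under stated assumptions: the square law without the λ term is used for ID, k stands for µnCox·W/L, and every numeric value is invented for the illustration rather than taken from a real process.

```python
import math


def small_signal_params(k, veff, lam, gamma, vsb, two_phi_f):
    """Low-frequency small signal parameters of a fet in the active region.

    k         : mu_n * Cox * W / L in A/V^2 (hypothetical value)
    veff      : VGS - Vth in V
    lam       : channel length modulation factor in 1/V
    gamma     : body effect constant in V^0.5
    two_phi_f : |2 * phi_F| in V
    """
    i_d = 0.5 * k * veff ** 2                          # square law
    gm = k * veff
    gs = gamma * gm / (2 * math.sqrt(vsb + two_phi_f))
    gds = lam * i_d
    return i_d, gm, gs, gds


i_d, gm, gs, gds = small_signal_params(
    k=1e-3, veff=0.3, lam=0.05, gamma=0.45, vsb=0.5, two_phi_f=0.7)
print(i_d, gm, gs, 1 / gds)
# the three expressions for gm agree
print(gm, 2 * i_d / veff if (veff := 0.3) else None, math.sqrt(2 * 1e-3 * i_d))
```

The last line checks that k·Veff, 2·ID/Veff and sqrt(2·k·ID) give the same gm, which only holds as long as ID follows the square law.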
MicroLab, VLSI-24(8/22)

JMM v1.3
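Small-Signal Modeling in the Active Region (Low Frequency)
the low-frequency model
[Figure: low-frequency small-signal model with terminals vg, vd and vs; current sources gm·vgs and gs·vs and the resistor rds between vd and vs; currents id and is]
Depending on the terminal voltages, and the relative size of the parameters, some of the components may be ignored. This helps to reduce the complexity of hand calculations.

the alternate low-frequency T model
[Figure: T model with terminals vg, vd and vs, source resistance rs = 1/gm, output resistance rds and current is]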

MicroLab, VLSI-24(9/22)

JMM v1.3
MOSFET Capacitance Estimation
in Active Region
 The dynamic response of MOS systems strongly depends on the parasitic capacitance associated with the MOS transistor.
$C_{gs} = W C_{ox} \left( \frac{2}{3} L + L_{ov} \right) = \frac{2}{3} W L C_{ox} + W C_{GS0}$
$C_{gd} = W L_{ov} C_{ox} = W C_{GD0}$
$C'_{sb} = (A_s + A_{ch}) C_{js}$, $C'_{db} = A_d C_{jd}$, with $C_{jx} = \frac{C_{j0}}{(1 + V_{XB} / \Phi_0)^{M_j}}$
$C_{sb} = C'_{sb} + C_{s-sw}$, $C_{s-sw} = P_s C_{j-sw,s}$
$C_{db} = C'_{db} + C_{d-sw}$, $C_{d-sw} = P_d C_{j-sw,d}$, with $C_{j-sw,x} = \frac{C_{j-sw0}}{(1 + V_{XB} / \Phi_0)^{M_{jsw}}}$
Conditions: VGS > Vth, VDG > -Vth, VSB = 0
[Figure: cross-section of the nfet (poly gate, Al contacts, SiO2, n+ source and drain, p+ field implant, p- substrate, VB = 0) showing Cgs, Cgd, C'sb, C'db, Cs-sw, Cd-sw and the overlap Lov]
MicroLab, VLSI-24(10/22)
JMM v1.3
Small-Signal Modeling
in the Active Region
the small signal model
[Figure: the low-frequency model (gm·vgs, gs·vs, rds) extended with the capacitances Cgs, Cgd, Csb and Cdb]

‹ Gate capacitance Cgs is normally the largest parasitic cap of the fet.
‹ The gate-drain overlap capacitance Cgd is normally small, but can play a role when the voltage gain is large (Miller effect).
‹ Source capacitance Csb is normally the second largest capacitance, since it includes the channel-bulk capacitance.
‹ Drain capacitance Cdb is normally the smallest capacitance.

MicroLab, VLSI-24(11/22)

JMM v1.3
Small-Signal Modeling
in the Triode region
a simplified triode-region model for small VDS

In the triode region a resistor modeling the conductance of the channel is normally sufficient.
$I_{DS} = \mu C_{ox} \frac{W}{L} \left[ (V_{GS} - V_{th}) V_{DS} - \frac{1}{2} V_{DS}^2 \right]$
$\frac{1}{r_{ds}} = g_{ds} \cong \mu C_{ox} \frac{W}{L} V_{eff}$

The accurate modeling of high frequency operation of a fet in triode region is nontrivial. We use a simplified model.
[Figure: rds between vs and vd; Cgs and Cgd from vg to vs and vd; Csb and Cdb at vs and vd]

$C_{gs} = C_{gd} = \frac{1}{2} A_{ch} C_{ox} + W L_{ov} C_{ox} = \frac{1}{2} W L C_{ox} + W C_{GX0}$
$C_{xb} = \left( A_x + \frac{1}{2} A_{ch} \right) C_{jx} + P_x C_{j-sw,x}$
MicroLab, VLSI-24(12/22)

JMM v1.3
Small-Signal Modeling
in cut-off region
a simplified cut-off region model
[Figure: Cgs, Cgb and Cgd connected to vg; Csb and Cdb at vs and vd]

As the channel has disappeared we have:
$C_{gs} = C_{gd} = W L_{ov} C_{ox} = W C_{GX0}$
but we also have a new capacitor:
$C_{gb} = A_{ch} C_{ox}$

The capacitors Csb and Cdb are smaller as the channel is not present:
$C_{xb} = A_x C_{jx} + P_x C_{j-sw,x}$

MicroLab, VLSI-24(13/22)

JMM v1.3
Diodes
‹ p+/nwell diode
Note that the metal contacts to the diode are connected to heavily doped regions.
[Figure: cross-section; anode on p+ and cathode on n+ inside an n well in the p- substrate; Al contacts, SiO2; pn junction]

‹ n+/pwell diode
[Figure: cross-section; cathode on n+ and anode on p+ inside a p well in the n- substrate; Al contacts, SiO2; pn junction]

‹ Schottky diode
A metal contact to a lightly doped semiconductor forms a Schottky diode.
[Figure: cross-section; anode metal (Al) directly on the n well, cathode on n+ in the n well, p- substrate, SiO2; Schottky diode depletion region]

MicroLab, VLSI-24(14/22)

JMM v1.3
Diode Modeling
‹ If a diode is reverse-biased, current flow is extremely small and primarily due to thermally or optically generated carriers.
depletion capacitance Cj:
$C_j = \frac{C_{j0}}{(1 + V_R / \Phi_0)^{M_j}}$
$\Phi_0 = V_T \ln\left( \frac{N_A N_D}{n_i^2} \right)$
$C_{j0} = \sqrt{\frac{q \varepsilon_{si} N_D N_A}{2 \Phi_0 (N_A + N_D)}}$
[Figure: p+ n junction with its depletion region and electric field]
‹ Large-signal model for forward biased junction
$I_D = I_S e^{V_D / V_T}$, $I_S \propto A_D \left( \frac{1}{N_A} + \frac{1}{N_D} \right)$
$C_T = C_d + A C_j$, with the diffusion capacitance $C_d = \tau_T \frac{I_D}{V_T}$
(Cd = 0 for forward biased Schottky diodes)
‹ Small-signal model for a forward-biased diode
$\frac{1}{r_d} = \frac{dI_D}{dV_D} = \frac{I_D}{V_T}$
[Figure: small-signal model; rd in parallel with Cj and Cd, where Cd is dominant for large currents]
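
The voltage dependence of the depletion capacitance is easy to tabulate. A minimal sketch of the formula for Cj; the values used for Cj0 and Φ0 are invented and the function name is only for this illustration.

```python
def junction_capacitance(cj0, v_r, phi0, mj=0.5):
    """Depletion capacitance of a reverse-biased junction.

    cj0  : zero-bias capacitance (any unit, returned in the same unit)
    v_r  : reverse bias voltage in V
    phi0 : built-in potential in V
    mj   : grading coefficient (0.5 for an abrupt junction)
    """
    return cj0 / (1 + v_r / phi0) ** mj


CJ0_FF, PHI0 = 10.0, 0.9      # illustrative values only
for v_r in (0.0, 0.9, 2.7):
    print(v_r, junction_capacitance(CJ0_FF, v_r, PHI0))
```

With Mj = 0.5 the capacitance drops to half of its zero-bias value at VR = 3·Φ0, since (1 + 3)^0.5 = 2.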

MicroLab, VLSI-24(15/22)

JMM v1.3
Coming Up...

Next time:
Basic current mirrors and single stage amplifiers.

Readings for next time…


Johns&Martin:
‹ 1 through 1.1 (pn junctions)
‹ 1.2 (mos transistor)
‹ 1.2 (advanced mos modeling)

CAD Exercises for next time…


‹ Ex600: simulation of static behavior of nfet
‹ Ex600a: output resistance and channel length
modulation
‹ Ex600b: weak vs strong inversion

MicroLab, VLSI-24(16/22)

JMM v1.3
Exercises: VLSI-24 #1
Johns&Martin 1.1 pp7: Ex1.4 (difficulty: easy): Assuming process C05M-D. a) Calculate the total zero-bias depletion capacitance CT-j0 of a p+nwell diode with an area of 5µm times 5µm. Do not use the Spice parameter CJ. b) At 3V reverse-bias the capacitance Cj has to be calculated again.
Result: a) CT-j0=16.3fF, b) CT-j=8.98fF
Johns&Martin 1.1 pp10: Ex1.6 (difficulty: medium): Assuming process C05M-D and Mj=0.5 (use Spice parameter CJ). A reverse-biased p+nwell diode is charged from 0V to 3.3V through a 10kΩ resistor. Calculate the time to charge the diode to 2/3 of its end value.
Result: t66%=130ps (Johns: see eq. 1.36 pp10)
Johns&Martin 1.2 pp31: 1.9 (difficulty: easy): Assuming process C05M-D. a) Derive the low-frequency parameters for an nfet with W=10µm and L=0.5µm at Vgs=1.1V, Vds=Veff, Vsb=0.55V. b) What is the new value of rds if the drain-source voltage is increased by 0.55V?
Result: a) gm=0.98mA/V, gs=0.143mA/V, rds=208kΩ, b) rds=12.8kΩ ????
MicroLab, VLSI-24(17/22)

JMM v1.3
Exercises: VLSI-24 #2

Johns&Martin 1.2 pp33: 1.10 (difficulty: easy): Assuming process C05M-D. Find the T-model parameter rs for the nfet of example 1.9a.
Result: rs=502Ω
Johns&Martin 1.2 pp36: 1.12 (difficulty: easy): Assuming process C05M-D. Find the gds for the nfet of example 1.9 working in triode region with Vds near zero.
Result: gds=1.99mA/V, rds=502Ω
Johns&Martin 1.9 pp79: 1.7 (difficulty: easy): Assuming process C05M-D. a) Find ID for an nfet with W=10µm, L=0.5µm and VGS=1.1V, VDS=Veff. b) Assuming λ remains constant, estimate the new value of ID if VDS is increased by 0.3V.
Result: a) ID=487µA, b) ID=513µA

MicroLab, VLSI-24(18/22)

JMM v1.3
Exercises: VLSI-24 #3

Ex600a: Johns&Martin 1.9 pp79: 1.8 (difficulty: easy): Assuming process C05M-D. Simulate a fet with W=10µm, L=2µm in its active region (VGS=2V) and measure the drain current at VDS1=2V and at VDS2=3V. Estimate the output impedance rds and the channel length modulation factor λ.
Result: rds=402kΩ, λ=0.006

MicroLab, VLSI-24(19/22)

JMM v1.3
Exercises: VLSI-24 #4

Ex vlsi24.1 (difficulty: easy): Assuming process C05M-D. Find the capacitances of an nfet as shown below in its active region for Vsb=1V, Vdb=2V.
Result: Cgs=3.86fF, Csb=3.09fF, Cdb=1.94fF, Cgd=0.41fF (see Johns&Martin pp35)

[Figure: nfet layout with the dimensions 0.5µm, 0.6µm, 3µm and 0.6µm]

MicroLab, VLSI-24(20/22)

JMM v1.3
Exercises: VLSI-24 #5

Ex vlsi24.2 (difficulty: easy): Assume the transistors are designed with minimal dimensions using the 0.5µm Alcatel Mietec process. Use the λ rules to calculate the Cgs, Csb and Cdb capacitances for the active region. Compare the values with a single device fet.
Result: a) Cdb=26.6fF, Csb=49.1fF, Cgs=34.8fF (see Johns&Martin pp103ff)

[Figure: layout of the transistors Q1 to Q4 sharing the junctions J1 to J5, width 27λ; terminals node 1, node 2 and gates]

MicroLab, VLSI-24(21/22)

JMM v1.3
Exercises: VLSI-24 #6

Ex600b: (difficulty: easy, medium time): Assume the transistors are designed with W=10µm and L=2µm using the 0.5µm Alcatel Mietec process. Simulate the fet with Spice in strong and weak inversion. Visualize VGS vs IDS, sqrt IDS and log IDS, identify the different regions and find ID0.
Result: compare with transparency #3

MicroLab, VLSI-24(22/22)

JMM v1.3
VLSI Design II
Basic Current Mirrors and
Single Stage Amplifiers

Goal: You know the properties of the different amplifier stages and are able to choose the one which is best suited for your application. You can determine the fet dimensions from a given circuit specification. You are familiar with current mirrors. You can apply two possible techniques for improving the output impedance. You know the resulting limitations on the output voltage swing.
MicroLab, vlsi-25 (1/26)

JMM v1.2
Outline

 Current mirrors
 Single stage amplifiers with active loads

 Johns&Martin
 nodal analysis method
 simple CMOS current mirror (chap 3.1)
 common-source amplifier (chap 3.2)
 source-follower or common drain amplifier (chap 3.3)
 common gate amplifier (chap 3.4)
 source degenerated current mirror (chap 3.5)
 high-output-impedance current mirrors (chap 3.6)
 cascode gain stage (chap 3.7)

 Exercises
 hand calculations
 spice simulations

MicroLab, vlsi-25 (2/26)

JMM v1.2
Simple CMOS Current Mirror
 Used as bias current source
 Used to multiply currents
 Used as high output impedance

 Q1 and Q2 have the same size
 both transistors are in active region
$I_{ds(sat)} = \mu_n C_{ox} \frac{W}{2L} (V_{gs} - V_t)^2$

[Figure: current mirror; Iin flows into Q1 (node V1), Q2 delivers Iout with output resistance rout. Plot of Id versus Vds with the linear and the active region]

$V_{gs1} = V_{gs2} \rightarrow I_{in} = I_{out}$

 consider minimal output voltage
 consider finite output impedance

MicroLab, vlsi-25 (3/26)

JMM v1.2
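Simple CMOS Current Mirror
(Q1 model)
 small signal model (low frequency)
[Figure: low-frequency small-signal model of a fet; gm·vgs, gs·vs and rds between vd and vs]
 small signal model for Q1
[Figure: Q1 of the current mirror (Iin, Iout, V1, rout) with its gate at node v1, driven by a test source vy with current iy; gm1·vgs1, gs1·vs1 and rds1]
The small signal model of the diode connected transistor Q1 reduces to a resistor rs1 = 1/gm1.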
MicroLab, vlsi-25 (4/26)

JMM v1.2
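Simple CMOS Current Mirror
(small signal analysis)
 Small signal model of overall CMOS current mirror

[Figure: Q1 replaced by 1/gm1 at the gate node; Q2 modeled by gm2·vgs2 and rds2; test source vx with current ix at the output]

as there is no current through 1/gm1 -> vgs2 = 0

[Figure: simplified model; only rds2 remains, driven by vx with current ix]

 rout of CMOS current mirror is:

$r_{out} = r_{ds2}$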

MicroLab, vlsi-25 (5/26)

JMM v1.2
Common Source Amplifier
 the common source topology is the most popular gain stage, especially when high input impedance is required
 a common use of simple current mirrors is in a single-stage amplifier with an active load
 active loads represent high-impedance output loads without using high impedance resistors or large power supply voltages.
 for a given supply voltage a larger gain can be achieved using active loads.
 for example, if a 1MΩ load were required with a 100µA bias current, a 100µA x 1MΩ = 100V power supply would be necessary

[Figure: common source amplifier stage; Q1 with input Vin and output Vout (output resistance rout); active load formed by the current mirror Q3/Q2 with Ibias]

MicroLab, vlsi-25 (6/26)

JMM v1.2
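Common Source Amplifier
(small signal analysis)
 it is assumed that the bias current is such that both transistors Q2 and Q3 are in active region.

[Figure: common source amplifier; Q1 (Vin, Vout, rout) with the current-mirror load Q3/Q2 and Ibias]

[Figure: small-signal model; vin drives the gate of Q1 through Rin; gm1·vgs1 and rds1 of Q1 in parallel with the active load rds2 at vout, together forming R2]

$v_{gs1} = v_{in}$
$A_v = \frac{v_{out}}{v_{in}} = -g_{m1} R_2 = -g_{m1} (r_{ds1} \| r_{ds2})$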
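
The gain formula can be checked with the numbers of exercise 3.2 of this lecture (Iin = 100µA, L = 2µm, rdsn = 88000·L/ID, rdsp = 50000·L/ID) together with gm1 = 0.45mA/V from the result of exercise 3.1. The helper names below are invented for this sketch.

```python
def parallel(r1, r2):
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def rds_rule(coeff, l_um, id_ma):
    """Empirical output resistance rule rds = coeff * L(um) / ID(mA), in ohms."""
    return coeff * l_um / id_ma


gm1 = 0.45e-3                          # A/V, result quoted in exercise 3.1
rds1 = rds_rule(88000, 2.0, 0.1)       # n-channel Q1, about 1.76 MOhm
rds2 = rds_rule(50000, 2.0, 0.1)       # p-channel load Q2, about 1 MOhm

av = -gm1 * parallel(rds1, rds2)
print(round(av))                       # -287
```

The p-channel load has the lower output resistance and therefore dominates the parallel combination, which limits the gain.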
MicroLab, vlsi-25 (7/26)

JMM v1.2
Source-Follower or
Common-Drain Amplifier
 the common-drain amplifier is commonly used as a voltage buffer and is therefore called source-follower
 ideally the small signal voltage gain is close to unity
 although the circuit has no voltage gain, it does have a current gain
 dc level of the output voltage is not the same as the dc level of the input voltage
 note that the body effect is the major limitation on the small-signal gain

[Figure: common-drain amplifier stage; Q1 with input Vin at the gate and output Vout at the source; active load formed by the current mirror Q3/Q2 with Ibias]

MicroLab, vlsi-25 (8/26)

JMM v1.2
Source-Follower
(small signal analysis)
 Note that the voltage controlled current source that models the body effect of the nfet has been included

[Figure: source follower Q1 (Vin, Vout) with the current-mirror load Q3/Q2 and Ibias]

[Figure: small-signal model; vin = vg1, gm1·vgs1, gs1·vs1 and rds1 between vd1 and vs1, vout = vs1, active load rds2 at vout]

MicroLab, vlsi-25 (9/26)

JMM v1.2
Nodal Equation Methodology
 In order to minimize circuit equation errors, a consistent methodology should be maintained when writing nodal equations:
 the first term is always the node at which the currents are being summed: $v_{out}$
 this node voltage is multiplied by the sum of all admittances connected to the node: $v_{out} (g_{ds1} + g_{ds2})$
 the next negative terms are the adjacent node voltages, and each is multiplied by all connecting admittances: $- v_d \, g_{ds1}$
 the last terms are any current sources, with a multiplying negative sign used if the current is shown to flow into the node: $+ g_{s1} v_{s1} - g_{m1} v_{gs1}$

[Figure: small-signal model of the source follower; vin = vg1, gm1·vgs1, gs1·vs1 and rds1 between vd1 and vs1, vout = vs1, rds2 at vout]

MicroLab, vlsi-25 (10/26)

JMM v1.2
Source-Follower
(small signal analysis, cont'd)
[Figure: small-signal model of the source follower; vin = vg1, gm1·vgs1, gs1·vs1 and rds1 between vd1 and vs1, vout = vs1, rds2 at vout]

$v_{out} (g_{ds1} + g_{ds2}) + g_{s1} v_{out} - g_{m1} (v_{in} - v_{out}) = 0$

$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1} + g_{s1} + g_{ds1} + g_{ds2}}$

 gs1 is 5 to 10% of the value of gm1; gds1 and gds2 are in the order of 1/10 of gs1
the body effect parameter gs1 is the major source of the error that makes the gain less than unity
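
The size of the gain error follows directly from these ratios. A minimal sketch, assuming gs1 = 10% of gm1 and gds1 = gds2 = 1/10 of gs1; the absolute value chosen for gm1 is arbitrary because only the ratios matter.

```python
def source_follower_gain(gm1, gs1, gds1, gds2):
    """Small signal voltage gain of the source follower with active load."""
    return gm1 / (gm1 + gs1 + gds1 + gds2)


gm1 = 1e-3                 # arbitrary, only the ratios matter
gs1 = 0.10 * gm1           # body effect, upper end of the 5 to 10% range
gds = 0.10 * gs1           # each output conductance about 1/10 of gs1

print(source_follower_gain(gm1, gs1, gds, gds))        # about 0.89
print(source_follower_gain(gm1, 0.0, gds, gds))        # about 0.98 without body effect
```

Removing the body effect term alone moves the gain from about 0.89 to about 0.98, which shows why gs1 dominates the error.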

MicroLab, vlsi-25 (11/26)

JMM v1.2
Common-Gate Amplifier

 A common-gate stage with active load is used when a relatively small input impedance is desired
 Application examples: input impedance of 50Ω to terminate a transmission line, or first stage of an amplifier that amplifies current instead of voltage

[Figure: common-gate amplifier stage; Q1 with gate at Vbias, input Vin at the source (input resistance rin), output Vout at the drain; active load formed by Q3/Q2 with Ibias]

[Figure: small-signal model; vin drives vs1 through RS (input resistance rin); gm1·vgs1, gs1·vs1 and rds1 between vd1 = vout and vs1; active load RL at vout]
MicroLab, vlsi-25 (12/26)

JMM v1.2
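Common-Gate Amplifier
(small signal analysis)
[Figure: small-signal model; vin drives vs1 through RS; gm1·vgs1, gs1·vs1 and rds1 between vd1 = vout and vs1; active load RL at vout]

$v_{s1} = -v_{gs1}$, thus

[Figure: simplified model; a single current source (gm1 + gs1)·vs1 in parallel with rds1; RL = rds2 (only the active load present); input current is through RS]

 nodal analysis for nodes vout and vs1, with $G_s = 1/R_S$ and $G_L = 1/R_L$:

$A_v = \frac{v_{out}}{v_{in}} = \left[ \frac{G_s}{G_s + \frac{g_{m1} + g_{s1} + g_{ds1}}{1 + g_{ds1} / G_L}} \right] \left[ \frac{g_{m1} + g_{s1} + g_{ds1}}{G_L + g_{ds1}} \right]$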
MicroLab, vlsi-25 (13/26)

JMM v1.2
Summary: Gain Stages
 common source amplifier: gain stage with high input impedance.
$A_v = \frac{v_{out}}{v_{in}} = -g_{m1} (r_{ds1} \| r_{ds2})$
 common drain amplifier (source follower): used as a voltage buffer with small signal voltage gain close to 1, but can produce current gain.
$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1} + g_{s1} + g_{ds1} + g_{ds2}}$
 common gate amplifier: used as gain stage when a small input impedance is desired and can be used as first stage of an amplifier designed to amplify current rather than voltage.
$A_v = \frac{v_{out}}{v_{in}} = \left[ \frac{G_s}{G_s + \frac{g_{m1} + g_{s1} + g_{ds1}}{1 + g_{ds1} / G_L}} \right] \left[ \frac{g_{m1} + g_{s1} + g_{ds1}}{G_L + g_{ds1}} \right]$
MicroLab, vlsi-25 (14/26)

JMM v1.2
Source-Degenerated Current Mirror
 General consequence of finite output resistance:
 deviation in large signal behavior
 difficulties as active load
 the output impedance of the basic 2 transistor current mirror can be increased by degeneration resistors Rs

[Figure: current mirror Q1/Q2 with a resistor Rs in series with each source; Iin, Iout, V1, rout]

[Figure: small-signal model; 1/gm1 and Rs on the Q1 side with the gate node at 0V; gm2·vgs, gs·vs and rds2 on the Q2 side, source node vs across Rs; test source vx with current ix]

impedance increase:
$r_{out} = \frac{v_x}{i_x} = r_{ds2} \left[ 1 + R_s (g_{m2} + g_{s2} + g_{ds2}) \right]$
MicroLab, vlsi-25 (15/26)

JMM v1.2
High-Output Impedance Current Mirrors
Cascode Current Mirror
 the output impedance of a cascode current mirror is increased by a factor of 10 to 100 compared to a basic current mirror
 a disadvantage is the reduced output voltage swing, because transistors may enter the triode region

[Figure: cascode current mirror; Q3 stacked on Q1 in the input branch (Iin), Q4 stacked on Q2 in the output branch (Iout, Vout, rout)]

$V_{out} > 2 V_{eff} + V_{tn}$

$r_{out} \cong r_{ds4} \, r_{ds2} \, g_{m4}$

MicroLab, vlsi-25 (16/26)

JMM v1.2
Cascode Current Mirror

 reduced output voltage swing

transistor in active region:
$V_{ds} > V_{eff} = V_{gs} - V_{tn}$

all transistors have the same size and current Id:
$V_{gs} = V_{eff} + V_{tn}$, $V_{eff} = \sqrt{\frac{2 I_d}{\mu_n C_{ox} (W/L)}}$

[Figure: cascode current mirror Q1 to Q4 with Iin, Iout, Vout, rout]

$V_{g3} = V_{gs1} + V_{gs3} = 2 V_{eff} + 2 V_{tn}$

$V_{ds2} = V_{g3} - V_{gs4} = V_{g3} - (V_{eff} + V_{tn}) = V_{eff} + V_{tn}$

$V_{out} > V_{ds2} + V_{eff} = 2 V_{eff} + V_{tn}$


MicroLab, vlsi-25 (17/26)

JMM v1.2
Cascode Current Mirror (cont'd)

 very high output impedance

[Figure: cascode current mirror Q1 to Q4 with Iin, Iout, Vout, rout]

[Figure: small-signal model of the four transistors; no current flows in the input branch, so vs3 = 0V; in the output branch vgs4 = -vs4, with Q4 (gm4·vgs4, gs4·vs4, rds4) on top of Q2 (gm2·vgs2, gs2·vs2, rds2); output vout, iout]

$r_{out} = r_{ds4} \left[ 1 + r_{ds2} (g_{m4} + g_{s4} + g_{ds4}) \right] \cong r_{ds2} \, r_{ds4} \, g_{m4}$


MicroLab, vlsi-25 (18/26)

JMM v1.2
High-Output-Impedance Current Mirrors
Wilson Current Mirror
 performance very similar to the cascode current mirror, but with 1/2 of its output impedance
 shunt-series feedback to increase output impedance

[Figure: Wilson current mirror with Q1 to Q4; Iin, Iout, rin, rout]

Q2 senses the output current and mirrors it to Id1.

Iin and Id1 must match precisely, otherwise Vg3 increases/decreases.

MicroLab, vlsi-25 (19/26)

JMM v1.2
Cascode Gain Stage
 cascode configuration for single stage amplifiers is commonly used in modern IC design
 quite large gain for a single stage due to the large impedance at the output; to enable the large gain, high quality cascode current mirrors at the output are necessary
 large gain normally without any speed degradation
 voltage across the input drive fet is limited, minimizing short channel effects in modern technologies
 configuration: common-source-connected transistor feeding into a common-gate-connected transistor

[Figure: telescopic cascode amplifier; Q1 (input Vin) feeding the n-channel common-gate Q2 (Vbias), load Ibias, output Vout with CL]

[Figure: folded-cascode amplifier; Q1 (input Vin) with Ibias feeding the p-channel common-gate Q2 (Vbias), Ibias2, output Vout with CL; identical in/out dc level possible]
MicroLab, vlsi-25 (20/26)

JMM v1.2
Cascode Gain Stage
telescopic cascode amplifier
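[Figure: telescopic cascode amplifier; Q1 (Vin), Q2 (Vbias), Ibias, Vout, CL]

output impedance of cascode stage:
$r_x \cong g_{m2} \, r_{ds1} \, r_{ds2}$

[Figure: small-signal model used to find rx = vx/ix; Q2 as the current source vs2·(gs2 + gm2) in parallel with rds2, on top of rds1 of Q1]

$r_{d2} \cong g_{m2} \, r_{ds1} \, r_{ds2}$

for a high impedance Ibias with $R_L \cong g_{m-p} \, r_{ds-p}^2$:
$A_v \cong -\frac{1}{2} \left( \frac{g_m}{g_{ds}} \right)^2$, $r_{out} \cong \frac{g_m r_{ds}^2}{2}$
for gdsn = gdsp and gmn = gmp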
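
Both expressions can be evaluated with the values of exercise 3.6 of this lecture (gm = 0.5mA/V and rds = 100kΩ for all transistors). The function name is invented for this sketch, and it assumes the matched case gdsn = gdsp and gmn = gmp stated above.

```python
def telescopic_cascode(gm, rds):
    """Output resistance and gain of a telescopic cascode with a cascode load.

    Assumes identical gm and rds for the n- and p-channel devices.
    """
    rout = gm * rds ** 2 / 2              # cascode stage in parallel with cascode load
    av = -0.5 * (gm * rds) ** 2           # gm / gds = gm * rds
    return rout, av


rout, av = telescopic_cascode(gm=0.5e-3, rds=100e3)
print(rout, av)                           # about 2.5 MOhm and -1250
```

The gain grows with the square of gm·rds, compared with a single factor gm·rds/2 for a common source stage with a simple active load.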
MicroLab, vlsi-25 (21/26)

JMM v1.2
Summary: Cascode and Source Deg.

 source degenerated current mirror: by adding a resistor RS at the source node of a current mirror fet, the output impedance can be increased:
$r_{out} = \frac{v_{out}}{i_{out}} \cong r_{ds2} \left[ 1 + R_s (g_{m2} + g_{s2}) \right]$
 cascode current mirror: the output impedance of a current mirror can be increased further by using cascode fets:
$r_{out} = \frac{v_{out}}{i_{out}} \cong r_{ds2} \, r_{ds4} \, g_{m4}$
$V_{out} \geq 2 V_{eff} + V_{tn}$
 cascode gain stage: due to the large impedance at the output, high gain can be realized with cascode gain stages:
$A_v = \frac{v_{out}}{v_{in}} \cong -\frac{1}{2} \left( \frac{g_m}{g_{ds}} \right)^2$

MicroLab, vlsi-25 (22/26)

JMM v1.2
Coming Up...
Next topic…
Frequency response of single stage amplifiers

Readings for next time…

Johns&Martin:
 nodal analysis method
 simple CMOS current mirror (chap 3.1)
 common-source amplifier (chap 3.2)
 source-follower or common drain amplifier (chap 3.3)
 common gate amplifier (chap 3.4)
 source degenerated current mirror (chap 3.5)
 high-output-impedance current mirrors (chap 3.6)
 cascode gain stage (chap 3.7)

Exercises:
 Have a look at the exercises in Johns&Martin.
 CAD exercise Ex601

MicroLab, vlsi-25 (23/26)

JMM v1.2
Exercises VLSI-25 #1
Johns&Martin chap 3.1 pp127: 3.1 (difficulty: easy): Consider the current mirror shown on transparency vlsi25/3 where Iin=100µA and each transistor has W=10µm and L=2µm. Given rds=88000 [L (µm)]/[ID (mA)], find rout for the current mirror and the value of gm1. Also estimate the change in Iout for a 0.5V change in the output voltage.
Result: rout=1.76MΩ, gm1=0.45mA/V, dIout=0.28µA

Johns&Martin chap 3.2 pp129: 3.2 (difficulty: easy): Consider the common source stage shown on transparency vlsi25/7 where Iin=100µA and all transistors have W=10µm and L=2µm. Given rdsn=88000 [L (µm)]/[ID (mA)], rdsp=50000 [L (µm)]/[ID (mA)]. What is the gain of the stage?
Result: Av=-287

MicroLab, vlsi-25 (24/26)

JMM v1.2
VLSI-25
Exercises VLSI #2
Johns&Martin chap 3.3 pp131: 3.3 (difficulty: easy):
Consider the source follower shown on transparency vlsi25/8
where Ibias=100µA and all transistors designed with Alcatel
0.5µm process have W=10µm and L=2µm. Given γn=0.45V^(1/2),
Vsb=2V, and rds-n=88000 [L (µm)]/[ID (mA)].
What is the gain of the stage?
Result: Av=0.88

Johns&Martin chap 3.5 pp136: 3.4 (difficulty: easy):
Consider the source degenerated current mirror shown on
transparency vlsi25/15 where Ibias=100µA and all transistors
designed with Alcatel 0.5µm process have W=100µm and L=2µm.
Given γn=0.45V^(1/2), Vsb=2V, Rs=5kΩ, and
rds-n=88000 [L (µm)]/[ID (mA)]. What is the increase in output
resistance compared to the simple current mirror?
Result: increase=9.1, rout=16MΩ

MicroLab, vlsi-25 (25/26)

JMM v1.2
VLSI-25
Exercises VLSI #3
Johns&Martin chap 3.6 pp138: 3.5 (difficulty: easy):
Consider the cascode current mirror shown on transparency
vlsi25/15 where Iin=100µA and all transistors have W=10µm and
L=2µm. Given VSB4=1V and rds-n=50000 [L (µm)]/[ID (mA)].
What is the output impedance and the minimal output voltage?
Result: rout=527kΩ, Vout(min)=1.5V

Johns&Martin chap 3.7 pp142: 3.6 (difficulty: easy):
Consider the telescopic cascode gain stage shown on
transparency vlsi25/20 assuming gm=0.5mA/V and rds=100kΩ.
What is the output impedance and gain?
Result: rout=2.5MΩ, Av=-1250
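
The result of exercise 3.6 can be cross-checked with the two summary formulas for the cascode gain stage (rout ≅ gm·rds²/2 and Av ≅ -(gm/gds)²/2). The following is a minimal sketch; the function name `cascode_gain_stage` is only chosen for illustration, and it assumes gmn=gmp and gdsn=gdsp as stated on the cascode gain stage slide.

```python
def cascode_gain_stage(gm, rds):
    """Return (rout, av) for a cascode gain stage.

    Uses the slide formulas rout ~= gm*rds^2/2 and Av ~= -(gm/gds)^2/2,
    assuming equal gm and gds for n- and p-channel devices.
    """
    gds = 1.0 / rds
    rout = gm * rds ** 2 / 2.0
    av = -0.5 * (gm / gds) ** 2
    return rout, av


# Values of exercise 3.6: gm = 0.5 mA/V, rds = 100 kOhm
rout, av = cascode_gain_stage(gm=0.5e-3, rds=100e3)
print(f"rout = {rout / 1e6:.2f} MOhm, Av = {av:.0f}")
```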

MicroLab, vlsi-25 (26/26)

JMM v1.2
VLSI Design II
Frequency Response of
Single Stage Amplifiers

[Figure: Bode magnitude plot; gain axis in dB (0 to 40), frequency axis in Hz ($10^3$ to $10^9$)]

Circuit Analysis
 the precise way: solving complex equations
 the approximate way: find the dominant pole
 the handy way: let Spice do it precisely

Goal: You are able to identify the dominant pole in a


transistor circuit. You can approximately determine
the contribution of each node in a circuit to the
total frequency response.
MicroLab, vlsi26 (1/29)

JMM v1.2
Outline

• Frequency response
  • common-source amplifier
  • source-follower amplifier
  • source-follower amplifier with compensation technique
  • cascode gain stage

• Johns&Martin
  • frequency response (chap 3.11)
• Gray&Meyer
  • estimation of dominant poles
  • zero-value time constant analysis (pp500 ff)
  (Analysis and Design of Analog Integrated Circuits, 3rd
  edition, Wiley and Sons, ISBN-0471-59984-0)

 Exercises
 hand calculations
 spice simulations

MicroLab, vlsi26 (2/29)

JMM v1.2
Frequency Response
Dominant Pole Approximation
• precise calculation of frequency response is a complex task and thus different approximation methods exist
• one method is the zero-value time constant analysis
• first some ideas about dominant-pole approximation are developed

transfer function by small-signal analysis
$A(s) = \frac{N(s)}{D(s)} = \frac{a_0 + a_1 s + a_2 s^2 + \dots + a_m s^m}{1 + b_1 s + b_2 s^2 + \dots + b_n s^n}$
very often the zeros are unimportant, thus
$A(s) = \frac{K}{\left(1 - \frac{s}{p_1}\right)\left(1 - \frac{s}{p_2}\right)\cdots\left(1 - \frac{s}{p_n}\right)}$
where K is a constant and p1, p2, ... are the poles of the transfer function, thus
$b_1 = \sum_{i=1}^{n}\left(-\frac{1}{p_i}\right)$
MicroLab, vlsi26 (3/29)

JMM v1.2
Dominant Pole Approximation
(con’t 2)
$b_1 = \sum_{i=1}^{n}\left(-\frac{1}{p_i}\right)$
an important practical case occurs when one pole is dominant

$|p_1| \ll |p_2|, |p_3|, \dots$ so that $\left|\frac{1}{p_1}\right| \gg \left|\sum_{i=2}^{n}\left(-\frac{1}{p_i}\right)\right|$
thus $b_1 \cong \left|\frac{1}{p_1}\right|$

the gain magnitude in the frequency domain is

$|A(j\omega)| = \frac{K}{\sqrt{\left[1 + (\omega/p_1)^2\right]\left[1 + (\omega/p_2)^2\right]\cdots\left[1 + (\omega/p_n)^2\right]}}$
with a dominant pole we simply get
$|A(j\omega)| \cong \frac{K}{\sqrt{1 + (\omega/p_1)^2}}$
MicroLab, vlsi26 (4/29)

JMM v1.2
Dominant Pole Approximation
(con’t 3)
this approximation is quite accurate at least up to $\omega \cong |p_1|$

thus for a dominant pole situation the -3dB frequency is

$\omega_{-3dB} \cong |p_1| \qquad \omega_{-3dB} \cong \frac{1}{b_1}$

pole plot for a circuit with a dominant pole
[Figure: s plane with the poles p3, p2 and p1 along the real axis σ]
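
To get a feeling for the quality of the estimate ω-3dB ≅ 1/b1, the sketch below compares it with the numerically determined -3dB frequency of an all-pole transfer function. The three pole values are hypothetical and only chosen so that p1 is clearly dominant; the helper names are illustrative.

```python
import math


def gain_magnitude(omega, poles, k=1.0):
    """|A(jw)| of an all-pole transfer function with real poles."""
    mag = k
    for p in poles:
        mag /= math.sqrt(1.0 + (omega / p) ** 2)
    return mag


def exact_3db(poles, lo=1.0, hi=1e12, steps=200):
    """Bisect (on a log frequency axis) for |A(jw)| = K/sqrt(2)."""
    target = 1.0 / math.sqrt(2.0)
    for _ in range(steps):
        mid = math.sqrt(lo * hi)
        if gain_magnitude(mid, poles) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


poles = [-1e6, -2e7, -1e8]           # rad/s, hypothetical example values
b1 = sum(-1.0 / p for p in poles)    # b1 = sum of (-1/p_i)
w_est = 1.0 / b1                     # dominant-pole estimate
w_exact = exact_3db(poles)
print(f"estimate 1/b1 = {w_est:.3e} rad/s, numerical = {w_exact:.3e} rad/s")
```

With widely separated poles the estimate lies slightly below the numerical value, i.e. 1/b1 is a mildly pessimistic bandwidth estimate.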

MicroLab, vlsi26 (5/29)

JMM v1.2
Zero-Value Time Constant

Method for finding the time constant associated with a capacitor in the small signal equivalent circuit
• replace the capacitor Cx by a voltage source Vx
• set all independent sources to zero
• set all other network capacitors to zero
• find the admittance Yx (=1/Rx) seen by the voltage source Vx
• the time constant τx is given by:

$\tau_x = R_x C_x$

MicroLab, vlsi26 (6/29)

JMM v1.2
Frequency Response
Zero-Value Time Constant
[Figure: single-transistor amplifier driven by vin through Rin with load RL, and its small-signal equivalent circuit with Rin, rb, rπ, Cπ, Cµ, Cx, gmv1 and RL; v1, i1 belong to Cπ, v2, i2 to Cµ and v3, i3 to Cx]

We can show that with this choice of variables the circuit equations are of the form:

$i_1 = (g_{11} + sC_\pi)v_1 + g_{12}v_2 + g_{13}v_3$
$i_2 = g_{21}v_1 + (g_{22} + sC_\mu)v_2 + g_{23}v_3$
$i_3 = g_{31}v_1 + g_{32}v_2 + (g_{33} + sC_x)v_3$

MicroLab, vlsi26 (7/29)

JMM v1.2
Zero-Value Time Constant
(con’t 1)
The poles of the transfer function are the zeros of the determinant ∆ of the
circuit equations, which can be written in the form:
$\Delta(s) = K_0 + K_1 s + K_2 s^2 + K_3 s^3$
$\Delta(s) = K_0 (1 + b_1 s + b_2 s^2 + b_3 s^3)$
If all capacitors are zero:

$K_0 = \Delta|_{C_\pi = C_\mu = C_x = 0} \equiv \Delta_0$
Consider now the term K1s. This is the sum of the terms involving s that are
obtained when the system determinant is evaluated. However, it is apparent
that s only occurs when associated with a capacitance:
$K_1 s = h_1 sC_\pi + h_2 sC_\mu + h_3 sC_x$
The terms h1, h2 and h3 are constants. h1 can be evaluated by expanding the
determinant about the first row:

$\Delta(s) = (g_{11} + sC_\pi)\Delta_{11} + g_{12}\Delta_{12} + g_{13}\Delta_{13}$
where ∆xx are the cofactors of the determinant. The coefficient of sCπ is found
by evaluating ∆11 with Cµ and Cx equal to zero:

$h_1 = \Delta_{11}|_{C_\mu = C_x = 0}$

MicroLab, vlsi26 (8/29)

JMM v1.2
Zero-Value Time Constant
(con’t 2)

Now consider expansion of the determinant about the second row:

$\Delta(s) = g_{21}\Delta_{21} + (g_{22} + sC_\mu)\Delta_{22} + g_{23}\Delta_{23}$
where ∆xx are the cofactors of the determinant. The coefficient of sCµ is found
by evaluating ∆22 with Cπ and Cx equal to zero:

$h_2 = \Delta_{22}|_{C_\pi = C_x = 0}$

similarly
$h_3 = \Delta_{33}|_{C_\mu = C_\pi = 0}$

Combining these equations gives:

$K_1 = \Delta_{11}|_{C_\mu = C_x = 0}\, C_\pi + \Delta_{22}|_{C_\pi = C_x = 0}\, C_\mu + \Delta_{33}|_{C_\mu = C_\pi = 0}\, C_x$

and:

$b_1 = \frac{K_1}{K_0} = \frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0} C_\pi + \frac{\Delta_{22}|_{C_\pi = C_x = 0}}{\Delta_0} C_\mu + \frac{\Delta_{33}|_{C_\mu = C_\pi = 0}}{\Delta_0} C_x$

MicroLab, vlsi26 (9/29)

JMM v1.2
Zero-Value Time Constant
(con’t 3)
Now consider putting i2=i3=0 and solving for v1:
$\frac{v_1}{i_1} = \frac{\Delta_{11}}{\Delta(s)}$
The driving-point resistance at the Cπ node pair with all capacitors
equal to zero is:
$\frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0} = \left.\frac{\Delta_{11}}{\Delta}\right|_{C_\mu = C_\pi = C_x = 0}$
We now define
$R_{\pi 0} = \frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0}$

We can now write:

$b_1 = R_{\pi 0} C_\pi + R_{\mu 0} C_\mu + R_{x0} C_x$
Thus:
$\omega_{-3dB} \cong \frac{1}{b_1} \cong \frac{1}{\sum T_0}$
Thus the sum of the zero-value time constants leads to the -3dB frequency.

MicroLab, vlsi26 (10/29)

JMM v1.2
Summary: Frequency Analysis Methods
The precise way:
 Add the parasitic capacitors to the equivalent circuit. Use
nodal analysis for evaluating the transfer function.
The approximate way:
• if there exists a pole |p1| << |p2|, |p3|, ..., and the transfer
function is already given in the form
A(s)=N(s)/D(s)
with $D(s) = 1 + b_1 s + b_2 s^2 + \dots + b_n s^n$

the pole p1 is given by: $|p_1| \cong 1/b_1$


 the dominant pole may be found directly in the circuit
diagram by looking for the node with the largest
impedance. Take care of the Miller Effect.
 The time constant (and its influence on the frequency
response) associated with a single parasitic capacitor can
be estimated with the zero value time constant method:
 set all independent sources to zero
 replace the interesting capacitor Cx by a voltage source Vx
 set all other capacitors to zero
 evaluate the impedance Rx seen by the voltage source Vx
 the time constant is equal to CxRx

The handy way:


 AC analysis with Spice
MicroLab, vlsi26 (11/29)

JMM v1.2
Frequency Response
Common-Source Amplifier
• precise calculation of frequency response is most often left to computer simulations
• much insight can be obtained by finding the dominant frequency effects (dominant poles, zeros)

[Figure: small-signal equivalent circuit of the common-source amplifier: vin drives node v1 through Rin; Cgs1 at node v1; Cgd1 between v1 and vout; current source gm1vgs1, R2 and C2 at the output node vout]
C2: Cdb of Q1 and Q2 and load CL
R2: rds of Q1 and Q2

nodal analysis ...

MicroLab, vlsi26 (12/29)

JMM v1.2
Frequency Analysis (con’t)
$\frac{v_{out}}{v_{in}} = \frac{-g_{m1}R_2\left(1 - s\,\frac{C_{gd1}}{g_{m1}}\right)}{1 + sa + s^2 b}$
$a = R_{in}[C_{gs1} + C_{gd1}(1 + g_{m1}R_2)] + R_2(C_{gd1} + C_2)$
$b = R_{in}R_2(C_{gd1}C_{gs1} + C_{gs1}C_2 + C_{gd1}C_2)$

at frequencies where the gain has just started to decrease:
$\omega_{-3dB} = \frac{1}{a} = \frac{1}{R_{in}[C_{gs1} + C_{gd1}(1 + g_{m1}R_2)] + R_2(C_{gd1} + C_2)}$
The term $C_{gd1}(1 + g_{m1}R_2)$ is the Miller capacitance; it dominates for Rin >> R2.

analysis for high frequencies for widely separated poles:
$D(s) = \left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right) \cong 1 + \frac{s}{\omega_{p1}} + \frac{s^2}{\omega_{p1}\omega_{p2}}$

$\omega_{p2} \cong \frac{g_{m1}C_{gd1}}{C_{gs1}C_{gd1} + C_{gs1}C_2 + C_{gd1}C_2}$
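
As a worked example, the expression ω-3dB = 1/a can be evaluated with the data of exercise 3.8 of this lecture (common-source amplifier, Iin=100µA, W=100µm, L=1.6µm). The sketch below makes three assumptions that are not spelled out on this slide: the drain current equals Iin, gm1 follows the square-law expression gm = sqrt(2·µnCox·(W/L)·ID), and Q1 is the n-channel device while Q2 is the p-channel load.

```python
import math


def parallel(r1, r2):
    """Parallel connection of two resistances."""
    return r1 * r2 / (r1 + r2)


# Data of exercise 3.8
i_d = 100e-6                      # A, assumed equal to Iin
w_over_l = 100.0 / 1.6
un_cox = 90e-6                    # A/V^2
r_in = 180e3                      # Ohm
c_gs1, c_gd1 = 0.2e-12, 15e-15    # F
c_l, c_db1, c_db2 = 0.3e-12, 20e-15, 36e-15

gm1 = math.sqrt(2.0 * un_cox * w_over_l * i_d)   # square-law assumption
rds1 = 8000.0 * 1.6 / 0.1         # Ohm, rds-n = 8000 L(um)/ID(mA)
rds2 = 12000.0 * 1.6 / 0.1        # Ohm, rds-p = 12000 L(um)/ID(mA)
r2 = parallel(rds1, rds2)         # R2: rds of Q1 and Q2
c2 = c_l + c_db1 + c_db2          # C2: Cdb of Q1 and Q2 and load CL

a = r_in * (c_gs1 + c_gd1 * (1.0 + gm1 * r2)) + r2 * (c_gd1 + c2)
f_3db = 1.0 / (2.0 * math.pi * a)
print(f"gm1 = {gm1 * 1e3:.2f} mA/V, R2 = {r2 / 1e3:.1f} kOhm, "
      f"f-3dB = {f_3db / 1e3:.0f} kHz")
```

Under these assumptions the Miller term Rin·Cgd1·(1+gm1R2) is by far the largest contribution to a.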

MicroLab, vlsi26 (13/29)

JMM v1.2
Frequency Response
Source-Follower Amplifier
 source followers can have complex poles and thus
exhibit overshoot
 a compensation technique resulting in only real axis
poles is shown, resulting in no overshooting

Q1

vout
Iin Rin Cin
Ibias CL

Cgd1 vd1

+ gm1vgs1 gs1vs1 rds1


iin Rin Cin Cgs1 -vgs1
vs1 vout
rds2 Cs Cs=CL+Csb1

MicroLab, vlsi26 (14/29)

JMM v1.2
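Source-Follower Amplifier
(con’t 1)
[Figure: simplified small-signal model of the source follower: iin drives node vg1, which is loaded by Rin and C’in; Yg is the admittance looking into the gate of Q1; Cgs1 and the current source gm1vgs1 connect to the source node vs1 = vout, which is loaded by Rs1 and Cs]

C’in = Cin + Cgd1
$R_{s1} = r_{ds1} \,\|\, r_{ds2} \,\|\, (1/g_{s1})$

1. gain from vg1 to vout is found
2. admittance Yg looking into gate of Q1 without considering Cgd1 is found
3. gain from iin to vg1 is found
4. overall gain from iin to vout is found and results interpreted

1. gain from vg1 to vout is found

$v_{out}(sC_s + sC_{gs1} + G_{s1}) - v_{g1}\,sC_{gs1} - g_{m1}(v_{g1} - v_{out}) = 0$

$\frac{v_{out}}{v_{g1}} = \frac{sC_{gs1} + g_{m1}}{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}$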

MicroLab, vlsi26 (15/29)

JMM v1.2
Source-Follower Amplifier
(con’t 2)
1. gain from vg1 to vout is found
2. admittance Yg looking into gate of Q1 without considering Cgd1 is found
3. gain from iin to vg1 is found
4. overall gain from iin to vout is found and results interpreted

2. admittance Yg looking into gate of Q1 without considering Cgd1 is found

$Y_g = \frac{i_{g1}}{v_{g1}} = \frac{sC_{gs1}(sC_s + G_{s1})}{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}$
3. gain from iin to vg1 is found

$\frac{v_{g1}}{i_{in}} = \frac{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}{a + sb + s^2 c}$
4. overall gain from iin to vout is found and results interpreted
$A(s) = \frac{v_{out}}{i_{in}} = \frac{sC_{gs1} + g_{m1}}{a + sb + s^2 c}$

MicroLab, vlsi26 (16/29)

JMM v1.2
Source-Follower Amplifier
(con’t 3)
$A(s) = A(0)\,\frac{N(s)}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$
ω0 is the pole frequency
Q is the Q factor
There is no peaking and the transfer function's maximum is at dc if:
$Q < 1/\sqrt{2} \cong 0.707$
ω0 is the -3dB frequency if: $Q = 1/\sqrt{2}$
Step input function:
no peaking for Q ≤ 0.5
peaking for Q > 0.5 (complex conjugate poles):
$\%\,\text{overshoot} = 100\, e^{-\pi/\sqrt{4Q^2 - 1}}$
For the source follower:
$\omega_0 = \sqrt{\frac{G_{in}(g_{m1} + G_{s1})}{C_{gs1}C_s + C'_{in}(C_{gs1} + C_s)}}$
$\omega_z = \frac{-g_{m1}}{C_{gs1}}$
$Q = \frac{\sqrt{G_{in}(g_{m1} + G_{s1})[C_{gs1}C_s + C'_{in}(C_{gs1} + C_s)]}}{G_{in}C_s + C'_{in}(g_{m1} + G_{s1}) + C_{gs1}G_{s1}}$
Source follower circuits can exhibit large amounts of overshoot under certain
conditions. In practical microelectronic circuits the parasitic capacitances and
the output capacitance result in only moderate overshoot for worst-case conditions.
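
The overshoot formula above is easy to evaluate numerically. The following minimal sketch (the function name `percent_overshoot` is only illustrative) returns zero for Q ≤ 0.5, where the poles are real and the step response shows no peaking, and otherwise evaluates 100·exp(-π/sqrt(4Q²-1)).

```python
import math


def percent_overshoot(q):
    """Step-response overshoot in percent of a second-order system with Q factor q."""
    if q <= 0.5:
        return 0.0          # real poles: no overshoot
    return 100.0 * math.exp(-math.pi / math.sqrt(4.0 * q * q - 1.0))


for q in (0.5, 1.0 / math.sqrt(2.0), 0.8, 1.0):
    print(f"Q = {q:.3f}: overshoot = {percent_overshoot(q):.1f} %")
```

For Q=0.8 this reproduces the 8.1% overshoot quoted in the result of exercise 3.9.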
MicroLab, vlsi26 (17/29)

JMM v1.2
Source-Follower Amplifier
Compensation Technique
• source followers can have complex poles and thus exhibit overshoot
• overshooting may be reduced by:
  • increasing Cin or Cs or both
  • adding a compensation network

[Figure: source follower Q1 with input Iin, Rin, Cin, bias current Ibias and load CL at vout, with the compensation network C1, R1 added]

$C_1 = \frac{C_{gs1}(C_s g_{m1} - C_{gs1}G_{s1})}{(g_{m1} + G_{s1})(C_{gs1} + C_s)} \cong \frac{g_{m1}C_{gs1}C_s}{(g_{m1} + G_{s1})(C_{gs1} + C_s)}$

$R_1 = \frac{(C_{gs1} + C_s)^2}{C_{gs1}(C_s g_{m1} - C_{gs1}G_{s1})} \cong \frac{(C_{gs1} + C_s)^2}{C_{gs1}C_s g_{m1}}$

$C_2 = \frac{C_{gs1}C_s}{C_{gs1} + C_s}$
(see Johns/Martin pp160-162)
MicroLab, vlsi26 (18/29)

JMM v1.2
Frequency Response
Common-Gate Amplifier
• The frequency response of the common-gate stage is usually superior to that of the common-source stage due to the low impedance, rin, at the source node, assuming GL=(sCL+gds2) is not considerably smaller than gds1.

[Figure: common-gate stage Q1 with gate bias Vbias, current source Ibias and load CL at the output vout]

(see Johns/Martin pp160-162)
MicroLab, vlsi26 (19/29)

JMM v1.2
Frequency Response
High-Output Impedance Mirrors
• Both the Wilson and the cascode current mirrors introduce high-frequency poles into the signal transfer function.
• The approximate time constant of these poles is Cgs/gm; the proof of this statement can be found by doing a high-frequency, small-signal analysis.

Iin Iout Vout Iin Iout


rout rin rout

Q3 Q4 Q3 Q4

Q1 Q2 Q1 Q2

(see Johns/Martin pp163)


MicroLab, vlsi26 (20/29)

JMM v1.2
Frequency Response
Cascode Gain Stage
• The exact high-frequency analysis of a cascode gain stage is usually left to simulation on a computer.
• at high frequencies, the time constant due to the output node almost always dominates since the impedance is so large at that node:
  • Cout=(Cgd2+Cdb2)+CL+Cbias
  • CL is normally the major contributor

[Figure: cascode gain stage with input transistor Q1 (gate Vin), cascode transistor Q2 (gate Vbias), current source Ibias and load CL at Vout]

$\omega_{-3dB} \cong \frac{1}{R_{out}C_L} \cong \frac{2 g_{ds}^2}{g_m C_L}$

MicroLab, vlsi26 (21/29)

JMM v1.2
Cascode Gain Stage
(con’t 1)
• Zero-value time constant analysis method used

$C_{d2} = C_{gd2} + C_{db2} + C_L + C_{bias}$
$C_{s2} = C_{db1} + C_{sb2} + C_{gs2}$
[Figure: cascode gain stage and its small-signal equivalent circuit with nodes vg1, vs2 and vout and the elements Cgs1, Cgd1, gm1vg1, rds1, Cs2, gm2vs2, rds2, Cd2 and GL]

• All independent sources have to be set to zero (vin=0)
node vg1: $\tau_{Cgs1} = C_{gs1}R_{in}$

MicroLab, vlsi26 (22/29)

JMM v1.2
Cascode Gain Stage
(con’t 2)
[Figure: small-signal equivalent circuit of the cascode gain stage with nodes vg1, vs2 and vout]

nodes vg1, vs2: the capacitor Cgd1 is replaced by a voltage source vx in order
to calculate the resistance seen from that node pair.

[Figure: test source vx with current ix between node vg1 (Rin to ground) and the drain node of Q1 (gm1vg1 and Rd1 to ground)]
$G_{d1} = g_{ds1} + Y_{s2}$, where Ys2 is the admittance looking into the source of the cascode transistor

$\tau_{Cgd1} = C_{gd1}R_{d1}\left(1 + R_{in}[G_{d1} + g_{m1}]\right)$

MicroLab, vlsi26 (23/29)

JMM v1.2
Cascode Gain Stage
(con’t 3)
[Figure: small-signal equivalent circuit of the cascode gain stage, and the sub-circuit used to determine Ys2=is/vs2 with gm2vs2, rds2, Cd2 and GL]

$G_{d1} = g_{ds1} + Y_{s2}$, where Ys2=is/vs2 is the admittance looking into the source of the cascode transistor

for $g_{ds} \ll g_m$ and $R_L \cong g_m r_{ds}^2$ (see cascode current mirror impedance, pp137, vlsi-25/17):
$Y_{s2} \cong g_{ds}$
$\tau_{Cgd1} \cong C_{gd1}\,\frac{r_{ds}}{2}\,(1 + g_m R_{in})$
$\tau_{Cgd1} \cong C_{gd1}\,\frac{g_m r_{ds}^2}{2}$ for Rin large and equal to rds
MicroLab, vlsi26 (24/29)

JMM v1.2
Cascode Gain Stage
(con’t 4)
[Figure: small-signal equivalent circuit of the cascode gain stage with nodes vg1, vs2 and vout]

node vs2: the resistance seen by the capacitor Cs2 is rds1 in parallel
with the impedance seen looking into the source of Q2, which
is approximately rds, thus:
$\tau_{Cs2} \cong C_{s2}\,\frac{r_{ds}}{2}$
node vout: the resistance seen by Cd2 is the output impedance of the
cascode amplifier, thus:
$\tau_{Cd2} \cong C_{d2}\,\frac{g_m r_{ds}^2}{2}$

$\tau_{total} \cong \tau_{Cgs1} + \tau_{Cgd1} + \tau_{Cs2} + \tau_{Cd2}$

$\tau_{total} \cong C_{gs1}R_{in} + C_{gd1}\,\frac{g_m r_{ds}^2}{2} + C_{s2}\,\frac{r_{ds}}{2} + C_{d2}\,\frac{g_m r_{ds}^2}{2}$
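
The sum of the four zero-value time constants can be evaluated directly. The sketch below uses the data of exercise 3.11 of this lecture (gm=1mA/V, rds=100kΩ, Rin=180kΩ, CL=5pF, Cgs=0.2pF, Cgd=15fF, Csb=40fF, Cdb=20fF, Cbias=20fF) and assumes, as the exercise does, identical parameters for the input and the cascode transistor. Variable names are illustrative.

```python
import math

# Data of exercise 3.11 (same values for Q1 and Q2)
gm, rds, r_in = 1e-3, 100e3, 180e3
c_l, c_gs, c_gd = 5e-12, 0.2e-12, 15e-15
c_sb, c_db, c_bias = 40e-15, 20e-15, 20e-15

c_s2 = c_db + c_sb + c_gs            # Cs2 = Cdb1 + Csb2 + Cgs2
c_d2 = c_gd + c_db + c_l + c_bias    # Cd2 = Cgd2 + Cdb2 + CL + Cbias
r_high = gm * rds ** 2 / 2.0         # gm*rds^2/2, seen by Cgd1 and Cd2

tau = {
    "Cgs1": c_gs * r_in,
    "Cgd1": c_gd * r_high,
    "Cs2": c_s2 * rds / 2.0,
    "Cd2": c_d2 * r_high,
}
tau_total = sum(tau.values())
f_3db = 1.0 / (2.0 * math.pi * tau_total)

for name, value in tau.items():
    print(f"tau_{name} = {value * 1e9:9.1f} ns")
print(f"f-3dB ~= {f_3db / 1e3:.1f} kHz")
```

The output-node time constant is more than two orders of magnitude larger than the others, which is why the output node is treated as the dominant pole on the preceding slides.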
MicroLab, vlsi26 (25/29)

JMM v1.2
Cascode Gain Stage
Comments
• High frequency considerations

[Figure: cascode gain stage with Q1 (gate Vin), Q2 (gate Vbias), current source Ibias and load CL at Vout]
one pole dominates, thus the gain is:
$A(s) = \frac{A_v}{1 + s/\omega_{-3dB}}$
at frequencies substantially larger than ω-3dB:
$A(s) \cong \frac{A_v}{s/\omega_{-3dB}} \cong -\frac{g_{m1}}{sC_L}$

• the upper limit of the unity-gain frequency of an amplifier that uses a cascode gain stage is set by the source node of Q2:
$\omega_{p2} = \frac{1}{\tau_{s2}} > \frac{3\mu_p V_{eff2}}{2 L_2^2}$

MicroLab, vlsi26 (26/29)

JMM v1.2
Coming Up...
 Next topic…
Basic OpAmp design and compensation

 Readings
for next time…
Johns&Martin: Sections 3.11

 Exercises:
Have a look at the exercises in Johns&Martin.

MicroLab, vlsi26 (27/29)

JMM v1.2
VLSI-26
Exercises VLSI #1
Johns&Martin chap 3.11 pp156: 3.8 (difficulty: easy):
Consider the common-source amplifier shown on transparency
vlsi-26/6 where Iin=100µA and all transistors have W=100µm and
L=1.6µm. Given Rin=180kΩ, CL=0.3pF, Cgs1=0.2pF, Cgd1=15fF,
Cdb1=20fF, Cdb2=36fF, µnCox=90µA/V², µpCox=30µA/V², and
rds-n=8000 [L (µm)]/[ID (mA)], rdsp=12000 [L (µm)]/[ID (mA)].
Estimate the -3dB frequency.
Result: f-3dB=554kHz

Johns&Martin chap 3.11 pp160: 3.9 (difficulty: easy):
Analyse the source follower and assume that Ibias=100µA and all
transistors have W=100µm and L=1.6µm. Given Rin=180kΩ, CL=10pF,
Cgs1=0.2pF, Cgd1=15fF, Csb1=40fF, Cin=30fF, µnCox=90µA/V²,
µpCox=30µA/V², and rds-n=8000 [L (µm)]/[ID (mA)]. Find ω0, Q, and
ωz of the source follower.
Result: ω0=52MHz, Q=0.8, % overshoot = 8.1%, ωz=5.3GHz
MicroLab, vlsi26 (28/29)

JMM v1.2
VLSI-26
Exercises VLSI #2
Johns&Martin chap 3.11 pp166: 3.11 (difficulty: easy):
Assume that for the input transistors and the cascode
transistors, gm=1mA/V, rds=100kΩ, Rin=180kΩ, CL=5pF, Cgs=0.2pF,
Cgd=15fF, Csb=40fF, Cdb=20fF, Cbias=20fF. Estimate the -3dB
frequency of the cascode amplifier (transparency 19).
Result: ω-3dB=2π 6.3kHz

Johns&Martin chap 3.11 pp168: 3.12 (difficulty: easy):
Estimate the lower bound on the frequency of the second pole of
a folded-cascode amplifier for a 0.8µm technology, where a
typical value of 0.25V is chosen for Veff2. L2=1.5Lmin,
µp=0.02m²/Vs.
Result: ωp2=2π 414MHz

MicroLab, vlsi26 (29/29)

JMM v1.2
Analog Microelectronics
Basic OpAmp Design
and Compensation

Today’s handouts:
(1) Lecture Slides
MicroLab, vlsi27 (1/34)

JMM v1.0
Outline

• Johns&Martin
  • MOS differential pair and gain stage (chap 3.8)
  • two-stage CMOS OpAmp (chap 5.1)
    • gain
    • frequency response
    • systematic offset voltage
    • n- or p-channel input stage
  • feedback and OpAmp compensation (chap 5.2)
    • first-order model of closed-loop amplifier
    • linear settling time
    • OpAmp compensation
    • compensation of two-stage OpAmp
    • lead compensation
    • making compensation independent of process and temp
    • biasing OpAmp to have stable transconductance

• Exercises (5.3-5.5)
  • hand calculations
  • spice simulations

MicroLab, vlsi27 (2/34)

JMM v1.0
MOS Differential Pair
and Gain Stage
• most integrated amplifiers have differential input, realized with a differential transistor pair

ID2 ID2

V+ V-
Q1 Q2

Ibias

• a low-frequency small-signal equivalent circuit is based on the T model for the MOS transistor

id1=is1 id2=is2

v+ v-
is1 is2
rs1 rs2
gate current is
zero in T model

MicroLab, vlsi27 (3/34)

JMM v1.0
MOS Differential Pair
(con’t 1)
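to simplify analysis the output impedance of the transistor is ignored

[Figure: T-model small-signal circuit of the differential pair with id1=is1, id2=is2 and the source resistances rs1, rs2]
Definition:
$v_{in} \equiv v^+ - v^-$

$i_{d1} = i_{s1} = \frac{v_{in}}{r_{s1} + r_{s2}} = \frac{v_{in}}{1/g_{m1} + 1/g_{m2}}$

since both Q1 and Q2 have the same bias currents, gm1=gm2

$i_{d1} = \frac{g_{m1}}{2}\,v_{in} \qquad i_{d2} = -\frac{g_{m1}}{2}\,v_{in}$
Definition: $i_{out} \equiv i_{d1} - i_{d2}$, thus: $i_{out} = g_{m1}v_{in}$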

MicroLab, vlsi27 (4/34)

JMM v1.0
MOS Differential Pair
(con’t 2)
If a differential pair has a current mirror as an active load, a complete
differential-input, single-ended-output gain stage can be realized.

[Figure: differential pair Q1, Q2 with current-mirror load Q3, Q4, tail current Ibias and output vout with output resistance rout]
to simplify analysis the output impedance of the transistor is ignored
$i_{d4} = i_{d3} = -i_{s1}$ and $i_{d2} = -i_{s1}$

$v_{out} = (-i_{d2} - i_{d4})\,r_{out} = 2 i_{s1} r_{out} = g_{m1} r_{out} v_{in}$

this result assumes that the output impedance is purely resistive; if there
is also a capacitive load CL we get:

$A_v = g_{m1} z_{out}$ where $z_{out} = r_{out} \,\|\, 1/(sC_L)$

Thus, for this differential stage, a very simple model is used. This model
implicitly assumes that the time constant at the output node is much larger
than the time constant due to the parasitic capacitances at Q1 and Q2.
[Figure: simple model: current source gm1vin driving zout (rout in parallel with CL) at the node vout]

MicroLab, vlsi27 (5/34)

JMM v1.0
MOS Differential Pair
(con’t 3)
The output resistance rout is determined by using the small-signal
equivalent circuit and applying a voltage at the output node.
Note that the T-model is used for Q1, Q2 and Q3, and the hybrid-π
model is used for Q4.

$r_{out} \equiv \frac{v_x}{i_x}$
[Figure: differential gain stage and its small-signal equivalent circuit with the test source vx at the output node and the branch currents ix1 to ix4, is1, is2 and is5]

$r_{out} = r_{ds2} \,\|\, r_{ds4}$

$A_v = g_{m1}(r_{ds2} \,\|\, r_{ds4})$

MicroLab, vlsi27 (6/34)

JMM v1.0
MOS Differential Pair
(con’t 4)
The large signal amplification is determined by using the
large-signal transistor model in the active region of the fets.
Note that the T-model is used for Q1, Q2 and Q3, and the hybrid-π
model is used for Q4.

$I_D = \frac{\mu_0 C_{ox}}{2}\,\frac{W}{L}\,(V_{GS} - V_{tn})^2 = \frac{\beta}{2}\,(V_{GS} - V_{tn})^2$

$I_{OUT} = I_{D1} - I_{D2} = I_{bias}\sqrt{\frac{\beta V_{IN}^2}{I_{bias}} - \frac{\beta^2 V_{IN}^4}{4 I_{bias}^2}}$

[Figure: differential gain stage with VIN, ID1, ID2, ID4, IOUT and Vout; plot of IOUT/Ibias (-1.5 to 1.5) versus $V_{ID}\sqrt{\beta/I_{bias}}$ (-3 to 3)]

typical value for Ibias=0.1mA: $\sqrt{\beta/I_{bias}} = 5.4$, VIN = 187mV
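
The large-signal characteristic can be tabulated in normalized form. With x = VIN·sqrt(β/Ibias) the equation above becomes IOUT/Ibias = sqrt(x² - x⁴/4). The sketch below additionally assumes the usual square-law behaviour outside the range |x| ≤ sqrt(2): one transistor is then switched off and the output current stays at ±Ibias. The function name is illustrative.

```python
import math


def normalized_iout(x):
    """IOUT/Ibias of the differential pair for x = VIN*sqrt(beta/Ibias)."""
    limit = math.sqrt(2.0)
    if abs(x) >= limit:
        # Assumption: one fet is off, all of Ibias flows through the other
        return math.copysign(1.0, x)
    return math.copysign(math.sqrt(x * x - x ** 4 / 4.0), x)


for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(f"x = {x:5.2f}: IOUT/Ibias = {normalized_iout(x):6.3f}")
```

For small x the characteristic is nearly linear with slope 1, which corresponds to the small-signal result iout = gm1·vin.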


MicroLab, vlsi27 (7/34)

JMM v1.0
Two-Stage CMOS OpAmp
• Basic OpAmp design is discussed
  • OpAmp gain
  • frequency response
  • slew rate
  • systematic offset voltage
  • n-channel or p-channel input stage

[Figure: block diagram of the two-stage OpAmp: differential input stage A1 with single-ended output, second gain stage -A2 with the capacitor CC connected across it, and output buffer with gain 1 driving Vout]
• capacitor CC ensures stability when the OpAmp is used in feedback; CC is often called Miller capacitance to illustrate its effect on the input
• second gain stage: e.g. common-source gain stage with active load
• output buffer: only present when resistive loads need to be driven

MicroLab, vlsi27 (8/34)

JMM v1.0
CMOS realization of a
two-stage OpAmp
p-well process
necessary
VDD
25 25 Q 11 Q5300 Q6 300
Q 10 500

Q8
300 300
25 25
Q1 Q2
Vin- Vin+ Vout
Q 14 Q 12
Q 16 CC
100 25
Q 15 Q 13 150 150 300 500
Q4
Q3
Q7 Q9
Rb VSS

bias circuit differential input common source output


first stage second stage buffer

u p-channel input stage


u all transistor lengths are 1.6µm (1µm process)
u reasonable sizes for lengths of the transistors might be somewhere
between 1.5 and 2 times the minimum transistor length
MicroLab, vlsi27 (9/34)

JMM v1.0
Two-Stage OpAmp
Gain
• overall gain for low frequency applications is the most critical parameter of an OpAmp

gain of the first stage (differential stage):
$A_{v1} = g_{m1}(r_{ds2} \,\|\, r_{ds4})$
$g_{m1} = \sqrt{2\mu_p C_{ox}\left(\frac{W}{L}\right)_1 I_{D1}} = \sqrt{2\mu_p C_{ox}\left(\frac{W}{L}\right)_1 \frac{I_{bias}}{2}}$
approximation to the finite output resistance, ignoring short channel effects:
$r_{dsi} \cong \alpha\,\frac{L_i}{I_{Di}}\sqrt{V_{DGi} + V_{ti}}$
where α is a technology dependent parameter: 5e6 V^(1/2)/m

gain of the second stage (common-source stage):
$A_{v2} = -g_{m7}(r_{ds6} \,\|\, r_{ds7})$

gain of the third stage (common-drain stage):
$A_{v3} = \frac{g_{m9}}{G_L + g_{m9} + g_{ds8} + g_{ds9}}$
gain of the third stage with body effect (bulk not connected to source):
$A_{v3} = \frac{g_{m9}}{G_L + g_{m9} + g_{s8} + g_{ds8} + g_{ds9}}$
$g_s = \frac{g_m \gamma}{2\sqrt{V_{SB} + 2\phi_F}}$ with body effect constant γ=0.5V^(1/2) and 2φF=0.7V
MicroLab, vlsi27 (10/34)

JMM v1.0
Two-Stage OpAmp
Frequency Response
• frequency response where capacitor CC causes the magnitude of the gain to decrease, but still well below the unity gain frequency (open-loop gain = 1)
  ⇒ midband frequency
• only the compensation capacitor CC is taken into account
• assume Q16 is not present (resistor for lead compensation, effect only at unity gain frequency)
• discuss simplified circuit:
midband gain: $A_v(s) \cong \frac{g_{m1}}{sC_C}$
unity gain frequency: $\omega_{ta} \cong \frac{g_{m1}}{C_C}$
[Figure: simplified circuit: differential input stage Q1 to Q5 (W: Q5=300, Q1=Q2=300, Q3=Q4=150) delivering i=gm1vin into node v1, second stage -A2 with CC between v1 and v2, buffer A3 driving vout]

MicroLab, vlsi27 (11/34)

JMM v1.0
Two-Stage OpAmp
Slew Rate
• slew rate SR is the maximum rate at which the output changes when input signals are large
• at slew rate limitation all current of Q5 goes either into Q1 or Q2
  ⇒ this current has to go through CC

$SR \equiv \left.\frac{dv_{out}}{dt}\right|_{max}$
$SR = \frac{2 I_{D1}}{C_C} = V_{eff1}\,\omega_{ta}$
[Figure: simplified circuit: differential input stage Q1 to Q5 delivering the current I into node v1, second stage -A2 with CC between v1 and v2, buffer A3 driving vout]

• increasing Veff1 and ωta increases SR
• p-channel fet inputs increase SR
• increasing Veff1 reduces the transconductance gm1
MicroLab, vlsi27 (12/34)

JMM v1.0
Two-Stage OpAmp
Systematic Offset Voltage Cancelation
• two-stage OpAmps may have a systematic input offset voltage if not properly designed
  • the differential input is zero: vin+ = vin-
  • ID6 = ID7, which requires a well defined VGS7 value

VDD
Q5300 Q6 300
Vbias

300 300
Q1 Q2 Vout
Vin- Vin+

150 150 300


Q3 Q4
Q7

$\frac{(W/L)_7}{(W/L)_4} = 2\,\frac{(W/L)_6}{(W/L)_5}$
MicroLab, vlsi27 (13/34)

JMM v1.0
Two-Stage OpAmp
n- or p- channel input stage
• comparison between n- and p-channel input stage OpAmps
  • overall dc gain is largely unaffected since both designs have one stage with n-channel and one stage with one or more p-channel driving fets.
  • for a given power dissipation, and therefore bias current, having a p-channel input-pair stage maximizes the slew rate.
  • having a p-channel input first stage implies that the second stage has an n-channel input drive fet. This arrangement maximizes the transconductance of the drive fet of the 2nd stage, which is critical when high frequency operation is important.
  • output stage: an n-channel source follower is preferable because this will have less of a voltage drop (if a separate p-well is used). Its higher transconductance reduces the effect of the load cap on the second pole. There is also less degradation of the gain when small load resistances are being driven.
  ⇒ p-channel input fets for the first stage are almost always the best choice

MicroLab, vlsi27 (14/34)

JMM v1.0
Feedback and OpAmp Compensation

• OpAmps in closed-loop configurations are discussed, and how to compensate an OpAmp to ensure that the closed-loop configuration is not only stable but has a good settling characteristic.
• Optimum compensation of OpAmps is typically considered to be one of the most difficult parts in the OpAmp design procedure.
  • first-order model of closed-loop amplifier
  • linear settling time
  • OpAmp compensation
  • compensating the two-stage OpAmp
  • lead compensation
  • making compensation independent of process and temperature
  • biasing an OpAmp to have stable transconductances

MicroLab, vlsi27 (15/34)

JMM v1.0
First Order Model of
Closed-Loop Amplifier
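• First order model of the transfer function of a dominant-pole compensated OpAmp (real axis dominant pole):
$A(s) = \frac{A_0}{1 + s/\omega_{p1}}$

unity gain frequency definition: $|A(j\omega_{ta})| \equiv 1 \cong \frac{A_0}{\omega_{ta}/\omega_{p1}}$
unity gain frequency of first order OpAmp model: $\omega_{ta} \cong A_0\,\omega_{p1}$

for midband frequencies $\omega_{p1} \ll \omega \ll \omega_{ta}$:
$A(s) \cong \frac{\omega_{ta}}{s}$
closed-loop gain:
$A_{CL}(s) = \frac{A(s)}{1 + \beta A(s)}$
$A_{CL}(s) = \frac{1}{\beta}\,\frac{1}{1 + s/(\beta\omega_{ta})}$
$\omega_{-3dB} \cong \beta\,\omega_{ta}$
[Figure: negative-feedback loop with forward amplifier A(s), feedback gain β, input Ain(s) and output Aout(s)]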

MicroLab, vlsi27 (16/34)

JMM v1.0
Linear Settling Time
• the settling time performance is an important design parameter of OpAmps
  • the charge transfer in SC circuits is closely related to the OpAmp's step response
  • settling time is defined as the time it takes for an OpAmp to reach a specified percentage of its final value when a step input is applied
  • the linear settling time portion is due to the finite unity gain frequency (independent of output step size)
  • the nonlinear settling time portion is due to the slew rate limit (dependent on output step size)
  ⇒ unity gain frequency estimation for linear settling time portion

the -3dB frequency determines the settling-time response for a step input:
$\tau = \frac{1}{\omega_{-3dB}} = \frac{1}{\beta\,\omega_{ta}}$

step response for a closed-loop OpAmp:
$v_{out}(t) = V_{step}\,(1 - e^{-t/\tau})$

if the slew rate is larger than the initial slope, no SR limit will occur:
$\left.\frac{d}{dt}\,v_{out}(t)\right|_{t=0} = \frac{V_{step}}{\tau}$
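
From the exponential step response it follows that the output is within a relative error ε of its final value after t = τ·ln(1/ε). The sketch below evaluates this linear settling time and the slew rate needed to avoid slew limiting. The numerical values (β=1, a unity gain frequency of 2π·10MHz, a 1V step and 0.1% accuracy) are hypothetical example values, not taken from the lecture.

```python
import math


def linear_settling_time(beta, w_ta, accuracy):
    """Time until vout is within `accuracy` (e.g. 0.001) of its final value."""
    tau = 1.0 / (beta * w_ta)
    return tau * math.log(1.0 / accuracy)


def required_slew_rate(beta, w_ta, v_step):
    """Initial slope Vstep/tau; the OpAmp slew rate must exceed this value."""
    return v_step * beta * w_ta


beta = 1.0                         # hypothetical: unity-gain feedback
w_ta = 2.0 * math.pi * 10e6        # hypothetical unity gain frequency, rad/s
t_s = linear_settling_time(beta, w_ta, accuracy=1e-3)
sr_min = required_slew_rate(beta, w_ta, v_step=1.0)
print(f"settling to 0.1 %: {t_s * 1e9:.0f} ns, required SR > {sr_min / 1e6:.1f} V/us")
```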
MicroLab, vlsi27 (17/34)

JMM v1.0
OpAmp Compensation
(second order model)
• for compensating OpAmps the first order model is insufficient, because it ignores poles and zeros at high frequencies which may cause instabilities.
• a more accurate open-loop transfer model adds one additional pole (real axis poles and zeros):
$A(s) = \frac{A_0}{(1 + s/\omega_{p1})(1 + s/\omega_{eq})}$
(ωp1: first dominant pole, ωeq: higher frequency poles)
• ωeq may be approximated with a set of real-axis poles and zeros:
$\frac{1}{\omega_{eq}} \cong \sum_{i=2}^{m}\frac{1}{\omega_{pi}} - \sum_{i=1}^{n}\frac{1}{\omega_{zi}}$
• phase margin PM is an often used measure of how far an OpAmp with feedback is from becoming unstable (ωt: unity gain frequency of the loop gain LG):
$\angle LG(j\omega) = -90^\circ - \tan^{-1}(\omega/\omega_{eq})$
$PM \equiv \angle LG(j\omega_t) - (-180^\circ) = 90^\circ - \tan^{-1}(\omega_t/\omega_{eq})$
independent of β: $\omega_t = \tan(90^\circ - PM)\,\omega_{eq}$
MicroLab, vlsi27 (18/34)

JMM v1.0
OpAmp Compensation
(second order model con’t)
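• Closed-loop gain if β is frequency independent (if ωt is far away from high frequency poles and zeros):
$A_{CL}(s) = \frac{A_{CL0}}{1 + \frac{s\,(1/\omega_{p1} + 1/\omega_{eq})}{1 + \beta A_0} + \frac{s^2}{(1 + \beta A_0)(\omega_{p1}\omega_{eq})}}$
$A_{CL0} = \frac{A_0}{1 + \beta A_0} \cong \frac{1}{\beta}$
• General equation for a second order transfer function:
$H_2(s) = \frac{K}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$
• comparing:
$\omega_0 = \sqrt{(1 + \beta A_0)(\omega_{p1}\omega_{eq})} \cong \sqrt{\beta\,\omega_{ta}\,\omega_{eq}}$

$Q = \frac{\sqrt{(1 + \beta A_0)/(\omega_{p1}\omega_{eq})}}{1/\omega_{p1} + 1/\omega_{eq}} \cong \sqrt{\frac{\beta\,\omega_{ta}}{\omega_{eq}}}$

$\%\,\text{overshoot} = 100\, e^{-\pi/\sqrt{4Q^2 - 1}}$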
MicroLab, vlsi27 (19/34)

JMM v1.0
OpAmp Compensation
(2nd order transfer function)
• Relationship between Q factor and phase margin
  • transfer function: Q=sqrt(1/2):
    • no peaking
    • widest passband
    • ω0 = ω-3dB
  • step response: Q<=0.5 (real poles and zeros)
    • no peaking
  • step response: Q > 0.5
    • percentage of overshoot to be calculated

PM [°]   ωt/ωeq   Q factor   % overshoot
55       0.700    0.925      13.3%
60       0.580    0.817      8.7%
65       0.470    0.717      4.7%
70       0.360    0.622      1.4%
75       0.270    0.527      0.008%

• Phase margin is much larger than supposed to be necessary (80° to 85°)
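
The table entries can be recomputed from the second order model. The sketch below assumes the midband loop gain LG(s) ≅ βωta/[s(1+s/ωeq)], so that |LG(jωt)|=1 gives βωta/ωeq = (ωt/ωeq)·sqrt(1+(ωt/ωeq)²); together with Q ≅ sqrt(βωta/ωeq) and the overshoot formula this yields one table row per phase margin. The function name is illustrative.

```python
import math


def pm_row(pm_deg):
    """Return (wt/weq, Q, % overshoot) for a given phase margin in degrees."""
    x = math.tan(math.radians(90.0 - pm_deg))      # wt/weq = tan(90 deg - PM)
    q = math.sqrt(x * math.sqrt(1.0 + x * x))      # Q = sqrt(beta*wta/weq)
    if q > 0.5:
        overshoot = 100.0 * math.exp(-math.pi / math.sqrt(4.0 * q * q - 1.0))
    else:
        overshoot = 0.0
    return x, q, overshoot


print("PM   wt/weq   Q      % overshoot")
for pm in (55, 60, 65, 70, 75):
    x, q, ov = pm_row(pm)
    print(f"{pm}   {x:.3f}    {q:.3f}  {ov:.3f}")
```

Within rounding, the computed values agree with the table above.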

MicroLab, vlsi27 (20/34)

JMM v1.0
Compensating the Two-Stage OpAmp

• Capacitor CC realizes dominant-pole compensation and thereby controls ωp1 and ωta:
$\omega_{ta} = A_0\,\omega_{p1}$
• fet Q16 is included to realize a left-half-plane zero at frequencies around or slightly above ωt (lead compensation). Q16 has Vds=0V and thus is in the triode region:
$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\left(\frac{W}{L}\right)_{16} V_{eff16}}$
VDD
Q5300 Q6 300
Vbias

Vin- 300 300 Vin+


Q1 Q2 Vout2

Vbias
Q 16 CC

150 150 300


Q3 Q4
Q7

MicroLab, vlsi27 (21/34)

JMM v1.0
Compensating the Two-Stage OpAmp
small-signal model
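• simplified small-signal model of two-stage OpAmp for compensation analysis
[Figure: small-signal model: current source gm1vin1 driving node v1 (R1, C1); CC in series with RC between v1 and vout2; current source gm7v1 at the output node vout2 (R2, C2)]

$R_1 = r_{ds4} \,\|\, r_{ds2} \qquad C_1 = C_{db2} + C_{db4} + C_{gs7}$
$R_2 = r_{ds6} \,\|\, r_{ds7} \qquad C_2 = C_{db7} + C_{db6} + C_{L2}$
analysis shown in Johns&Martin
dominant pole: $\omega_{p1} \cong \frac{1}{g_{m7}R_1R_2C_C}$
nondominant pole: $\omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2}$
for RC=0: $\omega_z = -\frac{g_{m7}}{C_C}$
lead compensation (RC not zero): $\omega_z = \frac{-1}{C_C(1/g_{m7} - R_C)}$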
MicroLab, vlsi27 (22/34)

JMM v1.0
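Compensating the Two-Stage OpAmp
(discussion)
$D(s) = \left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)$

[Figure: pole-zero plot (real axis R, imaginary axis I) showing ωp2, ωp1 and ωz, with arrows indicating how they move with CC and gm7]

$\omega_{p1} \cong \frac{1}{g_{m7}R_1R_2C_C} \qquad \omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2} \qquad \omega_z = -\frac{g_{m7}}{C_C}$

• increasing gm7 separates the poles (pole-splitting)
• however, the right-half-plane zero introduces negative phase shift into the transfer function
• increasing CC moves ωp1 and ωz to lower frequencies and thus does not help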
MicroLab, vlsi27 (23/34)

JMM v1.0
Compensating the Two-Stage OpAmp
(lead compensation)
• with a non-zero RC, a third pole is introduced, but it is at high frequency and has almost no effect
• However the zero opens a number of possibilities:
$\omega_z = \frac{-1}{C_C(1/g_{m7} - R_C)}$
  • one could eliminate the right-half-plane zero:
  $R_C = 1/g_{m7}$
  • one could choose RC to be even larger and thus move the right-half-plane zero into the left half plane to cancel the nondominant pole ωp2:
  $R_C = \frac{1}{g_{m7}}\left(1 + \frac{C_1 + C_2}{C_C}\right)$
  • one could choose RC even larger to move the now left-half-plane zero to a frequency slightly greater than the unity-gain frequency that would result without the resistor, say 20% larger (recommended): $\omega_z = 1.2\,\omega_t$
  $R_C \cong \frac{1}{1.2\,g_{m1}}$
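
The choices for RC listed above can be compared numerically. In the sketch below all component values (gm1, gm7, C1, C2, CC) are hypothetical example values; the function `zero_frequency` simply evaluates ωz = -1/[CC(1/gm7 - RC)], where a positive result stands for a left-half-plane zero in the sign convention of the slide.

```python
def zero_frequency(gm7, cc, rc):
    """wz = -1 / (CC * (1/gm7 - RC)); not defined for RC = 1/gm7 (zero at infinity)."""
    return -1.0 / (cc * (1.0 / gm7 - rc))


# Hypothetical example values
gm1, gm7 = 0.5e-3, 1e-3            # A/V
c1, c2, cc = 0.3e-12, 2e-12, 5e-12  # F

w_p2 = gm7 / (c1 + c2)              # nondominant pole
w_ta = gm1 / cc                     # unity gain frequency without RC

rc_cancel = (1.0 / gm7) * (1.0 + (c1 + c2) / cc)   # zero cancels wp2
rc_lead = 1.0 / (1.2 * gm1)                        # zero near 1.2 * unity gain frequency

print(f"RC = 0:         wz = {zero_frequency(gm7, cc, 0.0):.3e} rad/s (right half plane)")
print(f"RC = {rc_cancel:.0f} Ohm: wz = {zero_frequency(gm7, cc, rc_cancel):.3e} rad/s, "
      f"wp2 = {w_p2:.3e} rad/s")
print(f"RC = {rc_lead:.0f} Ohm: wz = {zero_frequency(gm7, cc, rc_lead):.3e} rad/s, "
      f"1.2*wta = {1.2 * w_ta:.3e} rad/s")
```

The approximation RC ≅ 1/(1.2 gm1) neglects 1/gm7 against RC, so the zero obtained with it lies somewhat above 1.2 ωta unless RC >> 1/gm7.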
MicroLab, vlsi27 (24/34)

JMM v1.0
Lead Compensation
Design Procedure
1. Start by choosing, somewhat arbitrarily, $C_C' \cong 5\,\mathrm{pF}$
2. Using Spice, find the frequency at which a -125°
   phase shift exists. Let this frequency be denoted ωt
   and the gain at this frequency A'.
3. Choose a new CC so that ωt becomes the unity-gain
   frequency of the loop gain, thus resulting in a 55°
   phase margin. This can be achieved by taking CC
   according to the equation (iterations possible):
   $C_C = C_C' A'$
4. Choose RC according to:
   $R_C = \dfrac{1}{1.2\,\omega_t C_C}$
   - The resulting phase margin is approximately 85°
     (leaving 5° for process variations). It may be necessary
     to iterate on RC to optimize the phase margin
5. If after step 4 the phase margin is not adequate, then
   increase CC while leaving RC constant
6. Replace RC by a fet with the following size:
   $R_C = r_{ds16} = \dfrac{1}{\mu_n C_{ox} \left(\frac{W}{L}\right)_{16} V_{eff16}}$
MicroLab, vlsi27 (25/34)

JMM v1.0
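Steps 3 and 4 of the lead-compensation design procedure are simple arithmetic once Spice has delivered A' and ωt. The sketch below assumes hypothetical simulation results (A' = 0.6 at 30 MHz); the function name is illustrative only.

```python
import math

def lead_compensation(cc_initial, gain_at_wt, wt):
    """Return (CC, RC) from the initial CC', the gain A' at wt, and wt in rad/s."""
    cc = cc_initial * gain_at_wt        # step 3: CC = CC' * A'
    rc = 1.0 / (1.2 * wt * cc)          # step 4: RC = 1 / (1.2 wt CC)
    return cc, rc

# Hypothetical Spice results (assumptions, not from the lecture).
cc_initial = 5e-12                      # step 1: CC' = 5 pF
gain_at_wt = 0.6                        # A' at the -125 degree frequency
wt = 2 * math.pi * 30e6                 # that frequency in rad/s

cc, rc = lead_compensation(cc_initial, gain_at_wt, wt)
print(f"CC = {cc * 1e12:.2f} pF, RC = {rc:.0f} ohm")
```

Since ωt is roughly gm1/CC after step 3, this RC agrees with the earlier rule of thumb RC ≅ 1/(1.2 gm1).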
 L 16
Compensation Independent of
Process and Temperature
- Making lead compensation process and temperature
  insensitive
- the ratios of all transconductances remain relatively
  constant over process and temperature variations as
  all fets depend on the same biasing network:
  $\omega_{p2} \cong \dfrac{g_{m7}}{C_1 + C_2}$, $\omega_{ta} \cong \dfrac{g_{m1}}{C_C}$
- when a resistor is used to realize lead
  compensation, RC can also be made to track the
  inverse of transconductance (1/gm7), and thus the
  lead compensation will be mostly independent of
  process and temperature variations:
  $\omega_z = \dfrac{-1}{C_C (1/g_{m7} - R_C)}$

MicroLab, vlsi27 (26/34)

JMM v1.0
Compensation Independent of
Process and Temperature (con’t 2)
Making RC proportional to 1/gm7
$R_C = r_{ds16} = \dfrac{1}{\mu_n C_{ox} (W/L)_{16} V_{eff16}}$
$g_{m7} = \mu_n C_{ox} (W/L)_7 V_{eff7}$

The product RC·gm7 needs to be constant:

$R_C\, g_{m7} = \dfrac{(W/L)_7\, V_{eff7}}{(W/L)_{16}\, V_{eff16}}$
Therefore, all that remains is to ensure that Veff16/Veff7 is independent of
process and temperature variations. The ratio can be made constant by
deriving Vgs16 from the same biasing circuit used to derive Vgs7.

- This approach makes it possible to realize
  on-chip “resistors” with triode-region
  fets that are accurately ratioed with respect to a
  single off-chip resistor -> modern µcircuit design
MicroLab, vlsi27 (27/34)

JMM v1.0
Compensation Independent of
Process and Temperature (con’t 3)
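[Figure: bias circuit for the compensation fet Q16 (in series with CC), built from Q11, Q12 and Q13, next to the output stage Q6, Q7; bias input Vbias; nodes Va and Vb]

if $V_{eff13} = V_{eff7}$ then $V_a = V_b$
then (gates connected) $V_{eff16} = V_{eff12}$
thus $\dfrac{V_{eff7}}{V_{eff16}} = \dfrac{V_{eff13}}{V_{eff12}}$

to make $V_{eff13} = V_{eff7}$ we need
$\sqrt{\dfrac{2 I_{D7}}{\mu_n C_{ox} (W/L)_7}} = \sqrt{\dfrac{2 I_{D13}}{\mu_n C_{ox} (W/L)_{13}}}$, i.e. $\dfrac{I_{D7}}{I_{D13}} = \dfrac{(W/L)_7}{(W/L)_{13}}$

however the current is set by Q6, Q11: $\dfrac{I_{D7}}{I_{D13}} = \dfrac{(W/L)_6}{(W/L)_{11}}$

condition to be satisfied: $\dfrac{(W/L)_6}{(W/L)_7} = \dfrac{(W/L)_{11}}{(W/L)_{13}}$

as ID12 and ID13 are equal: $R_C\, g_{m7} = \dfrac{(W/L)_7}{(W/L)_{16}} \sqrt{\dfrac{(W/L)_{12}}{(W/L)_{13}}}$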
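The claim that RC·gm7 depends only on geometry can be checked numerically. The sketch below assumes the square-law model Veff = sqrt(2·ID / (µnCox·W/L)), takes Veff16 = Veff12 as argued on the slide, and uses hypothetical aspect ratios and currents that satisfy the sizing condition.

```python
import math

def veff(i_d, k, wl):
    """Square-law effective gate voltage, k = un*Cox in A/V^2."""
    return math.sqrt(2.0 * i_d / (k * wl))

def rc_gm7(k, wl7, wl12, wl13, wl16, i_d7, i_d13):
    """Product RC * gm7 for the biasing scheme, valid when Veff13 = Veff7."""
    veff7 = veff(i_d7, k, wl7)
    veff13 = veff(i_d13, k, wl13)
    if not math.isclose(veff7, veff13):
        raise ValueError("sizing condition ID7/ID13 = (W/L)7/(W/L)13 violated")
    veff16 = veff(i_d13, k, wl12)          # Veff16 = Veff12 and ID12 = ID13
    gm7 = k * wl7 * veff7
    rc = 1.0 / (k * wl16 * veff16)         # triode-region fet used as resistor
    return rc * gm7

# Hypothetical aspect ratios (assumptions): ID7/ID13 = (W/L)7/(W/L)13 = 10.
sizes = dict(wl7=200.0, wl12=20.0, wl13=20.0, wl16=50.0)
products = []
for k in (96e-6, 60e-6):                   # two values of un*Cox ("process corners")
    for i_d13 in (10e-6, 20e-6):           # two bias currents
        products.append(rc_gm7(k, i_d7=10 * i_d13, i_d13=i_d13, **sizes))

expected = sizes["wl7"] / sizes["wl16"] * math.sqrt(sizes["wl12"] / sizes["wl13"])
print(products, "expected:", expected)
```

All four combinations of µnCox and bias current give the same product, which equals the geometric expression from the slide.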

MicroLab, vlsi27 (28/34)

JMM v1.0
Biasing an OpAmp
to Have Stable Transconductances
- Fet transconductances are probably the most
  important parameters in OpAmps to be stabilized
- the following approach matches transconductances
  to the conductance of a resistor
- as a result, the fet transconductances are
  independent of power-supply voltage as well as
  process and temperature variations

[Figure: bias loop with Q10, Q11 (widths 25, 25), Q14, Q12 (25, 25), Q15, Q13 (100, 25) and resistor Rb]

assuming $(W/L)_{10} = (W/L)_{11}$:
$g_{m13} = \dfrac{2\left[1 - \sqrt{\dfrac{(W/L)_{13}}{(W/L)_{15}}}\right]}{R_b}$

for $(W/L)_{15} = 4\,(W/L)_{13}$: $g_{m13} = \dfrac{1}{R_b}$

$g_{mi} = \sqrt{\dfrac{\mu_i (W/L)_i I_{Di}}{\mu_n (W/L)_{13} I_{D13}}} \times g_{m13}$
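A short numerical check of the bias-loop formulas follows. It is a sketch only: the resistor value and the second transistor are hypothetical, while the widths 25 and 100 are the ones printed in the figure (equal channel lengths are assumed so that widths can stand in for W/L).

```python
import math

def gm13(wl13, wl15, rb):
    """gm13 = 2 (1 - sqrt((W/L)13 / (W/L)15)) / Rb, assuming (W/L)10 = (W/L)11."""
    return 2.0 * (1.0 - math.sqrt(wl13 / wl15)) / rb

def gm_i(mu_ratio, wl_i, wl13, i_di, i_d13, gm13_value):
    """Transconductance of another fet biased from the same loop.

    mu_ratio is ui / un (1.0 for an n-channel fet).
    """
    return math.sqrt(mu_ratio * wl_i * i_di / (wl13 * i_d13)) * gm13_value

rb = 5e3                                   # hypothetical bias resistor (assumption)
g13 = gm13(wl13=25.0, wl15=100.0, rb=rb)   # (W/L)15 = 4 (W/L)13
g_other = gm_i(1.0, wl_i=100.0, wl13=25.0, i_di=40e-6, i_d13=10e-6, gm13_value=g13)

print("gm13 * Rb =", g13 * rb)
print("gm of a 4x wider fet at 4x the current:", g_other / g13, "x gm13")
```

With the 4:1 ratio the bracket evaluates to 1/2, so gm13 equals 1/Rb, and every other transconductance is a fixed multiple of it.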
MicroLab, vlsi27 (29/34)

JMM v1.0
Exercises VLSI-27
Ex ana3.9 (difficulty: easy): Consider the differential
pair amplifier shown on transparency vlsi-27/3,
where Ibias = 200 µA and all transistors have
W = 100 µm and L = 1.6 µm. Given
µnCox = 92 µA/V² and rds-n = 8000 [L (µm)]/[ID
(mA)], find the output impedance and the gain.
Result: Av = 68.6 V/V, rout = 64 kΩ (see
Johns/Martin p. 146)

Ex ana5.1 (difficulty: easy): Find the gain of the
OpAmp shown on transparency vlsi-27/9. Assume
ID5 = 100 µA, first stage VDG = 0.5 V, 2nd and 3rd
stage VDG = 1 V and bulk of Q8 connected to VSS.
Given µnCox = 3µpCox = 96 µA/V², VDD = -VSS = 2.5 V,
RL = 10 kΩ, γ = 0.5 V^(1/2), φF = 0.35 V,
α = 5e6 V^(1/2)/m, Vtn = -Vtp = 0.8 V.
Result: Av = -6092 V/V (see Johns/Martin p. 224)

MicroLab, vlsi27 (30/34)

JMM v1.0
Exercises VLSI-27 (con’t 2)
Ex ana5.2 (difficulty: easy): Find the unity-gain
frequency of the OpAmp shown on transparency
vlsi-27/9, with CC = 5 pF. Assume ID5 = 100 µA, first
stage VDG = 0.5 V, 2nd and 3rd stage VDG = 1 V and
bulk of Q8 connected to VSS. Given µnCox
= 3µpCox = 96 µA/V², VDD = -VSS = 2.5 V,
RL = 10 kΩ, γ = 0.5 V^(1/2), φF = 0.35 V,
α = 5e6 V^(1/2)/m, Vtn = -Vtp = 0.8 V.
Result: fta = 24.7 MHz (see Johns/Martin p. 227)

Ex ana5.3 (difficulty: easy): Find the slew rate of the
OpAmp on transparency vlsi-27/9, with CC = 5 pF.
Assume ID5 = 100 µA. What circuit change could be
made to double the slew rate while keeping ωta and the
bias currents unchanged?
Result: SR = 20 V/µs, to double SR: CC = 2.5 pF and
W1 = W2 = 75 µm (see Johns/Martin p. 229)
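The result of Ex ana5.3 can be reproduced with a few lines, assuming the usual two-stage relations SR = ID5/CC and ωta = gm1/CC and a square-law transconductance (gm proportional to sqrt(W/L · ID)). The original input width of 300 µm is an assumption inferred from the stated result.

```python
def slew_rate(i_d5, cc):
    """Slew rate in V/s, assuming SR = ID5 / CC."""
    return i_d5 / cc

def width_for_scaled_gm(width, gm_scale):
    """Width that scales gm by gm_scale at unchanged bias current and length."""
    return width * gm_scale ** 2

i_d5, cc = 100e-6, 5e-12
sr = slew_rate(i_d5, cc)

# Halving CC doubles SR; gm1 must then halve as well to keep wta = gm1 / CC.
cc_new = cc / 2
sr_new = slew_rate(i_d5, cc_new)
w_new = width_for_scaled_gm(300e-6, 0.5)   # assumed original W1 = W2 = 300 um

print(f"SR = {sr / 1e6:.0f} V/us -> {sr_new / 1e6:.0f} V/us")
print(f"CC = {cc_new * 1e12:.1f} pF, W1 = W2 = {w_new * 1e6:.0f} um")
```

Halving gm at constant current requires a quarter of the width, which is how 300 µm becomes 75 µm.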

MicroLab, vlsi27 (31/34)

JMM v1.0
Exercises VLSI-27 (con’t 3)
Ex ana5.4 (difficulty: easy): Consider the OpAmp
shown on transparency vlsi-27/9, where Q3 and Q4
are each changed to widths of 120 µm and we want
the output stage to have a bias current of 150 µA. Find
the new sizes of Q6 and Q7 such that there is no
systematic offset voltage.
Result: W6 = 450 µm, W7 = 360 µm (see
Johns/Martin p. 231)

Ex ana5.5 (difficulty: easy): One phase of an SC
circuit is shown, where the input can be modelled as
a voltage step. If 0.1% accuracy is needed in the
linear settling-time portion corresponding to 100 ns,
find the required unity-gain frequency in terms of
the capacitance values C1 and C2, and in absolute
values for C2 = 10 C1 and for C2 = 0.2 C1.
Result: fta = 12.1 MHz, fta = 66.0 MHz
(see Johns/Martin p. 235)
[Figure: SC circuit phase with capacitors C1 and C2 around the OpAmp A(s), output vout]
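The two numerical results of Ex ana5.5 can be reproduced under a first-order settling model. The assumptions in this sketch are not spelled out on the slide: the feedback factor is taken as β = C2/(C1 + C2), the closed-loop time constant as τ = 1/(β·ωta), and 0.1% accuracy as ln(1000) time constants.

```python
import math

def required_unity_gain_frequency(c1, c2, accuracy, t_settle):
    """Return the required wta in rad/s for first-order linear settling."""
    beta = c2 / (c1 + c2)                  # assumed feedback factor
    n_tau = math.log(1.0 / accuracy)       # number of time constants needed
    tau = t_settle / n_tau
    return 1.0 / (beta * tau)

c1 = 1e-12                                 # arbitrary; only the ratio C2/C1 matters
f_a = required_unity_gain_frequency(c1, 10 * c1, 1e-3, 100e-9) / (2 * math.pi)
f_b = required_unity_gain_frequency(c1, 0.2 * c1, 1e-3, 100e-9) / (2 * math.pi)

print(f"C2 = 10 C1:  fta = {f_a / 1e6:.1f} MHz")
print(f"C2 = 0.2 C1: fta = {f_b / 1e6:.1f} MHz")
```

In terms of the capacitances this model gives ωta = (1 + C1/C2)·ln(1000)/100 ns, so a small feedback capacitor C2 demands a much faster OpAmp.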

MicroLab, vlsi27 (32/34)

JMM v1.0
Exercises VLSI-27 (con’t 4)
Ex ana5.7 (difficulty: medium): An OpAmp has an
open-loop transfer function given by:
$A(s) = \dfrac{A_0 (1 + s/\omega_z)}{(1 + s/\omega_{p1})(1 + s/\omega_2)}$
Assume that ω2 = 2π·50 MHz and $A_0 = 10^4$.
a) Assuming ωz = inf, find ωp1 and the unity-gain
frequency ωt' so that the OpAmp has a unity-gain
phase margin of 55°.
b) Assuming ωz = 1.2 ωt' (use ωt' from a), what is
the unity-gain frequency ωt? Also find the new
phase margin.
Result: a) ωt' = 2π·35 MHz, ωp1 = 2π·4.27 kHz, b)
ωt = 2π·46.6 MHz, phase of A(jωt) = -85°, i.e. PM = 95°
(see Johns/Martin p. 245)
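Ex ana5.7 lends itself to a numerical check. The sketch below uses the approximation that the dominant pole contributes a full -90° at the unity-gain frequency for part a), and a bisection on the exact magnitude for part b); variable names are illustrative.

```python
import math

A0 = 1e4
W2 = 2 * math.pi * 50e6

# (a) No zero: -90 deg - atan(wt'/w2) = -125 deg for a 55 deg phase margin.
wt_a = W2 * math.tan(math.radians(35.0))
# |A(j wt')| = 1 with wt' >> wp1 gives wp1 = wt' * sqrt(1 + (wt'/w2)^2) / A0.
wp1 = wt_a * math.sqrt(1.0 + (wt_a / W2) ** 2) / A0

def magnitude(w, wz):
    """|A(jw)| with a left-half-plane zero at wz."""
    return A0 * math.hypot(1.0, w / wz) / (
        math.hypot(1.0, w / wp1) * math.hypot(1.0, w / W2))

def phase_deg(w, wz):
    """Phase of A(jw) in degrees."""
    return math.degrees(math.atan(w / wz) - math.atan(w / wp1) - math.atan(w / W2))

# (b) Zero at 1.2 wt': find the new unity-gain frequency by bisection
#     (the magnitude falls monotonically in this range).
wz = 1.2 * wt_a
lo, hi = wt_a, 10.0 * wt_a
for _ in range(60):
    mid = math.sqrt(lo * hi)
    if magnitude(mid, wz) > 1.0:
        lo = mid
    else:
        hi = mid
wt_b = math.sqrt(lo * hi)
pm_b = 180.0 + phase_deg(wt_b, wz)

print(f"a) ft' = {wt_a / (2 * math.pi) / 1e6:.1f} MHz, fp1 = {wp1 / (2 * math.pi) / 1e3:.2f} kHz")
print(f"b) ft = {wt_b / (2 * math.pi) / 1e6:.1f} MHz, phase = {pm_b - 180.0:.0f} deg, PM = {pm_b:.0f} deg")
```

The phase at the new unity-gain frequency comes out near -85°, which corresponds to a phase margin of about 95°.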

MicroLab, vlsi27 (33/34)

JMM v1.0
Coming Up...
- Next topic:
  Advanced Current Mirrors and OpAmps

- Readings for next time:
  Johns&Martin: Sections 3.8 and 5

- Exercises:
  Have a look at the exercises in Johns&Martin.

MicroLab, vlsi27 (34/34)

JMM v1.0
Analog Microelectronics
Advanced Current Mirrors and OpAmp Design

Today’s handouts:
(1) Lecture Slides
MicroLab, vlsi28 (1/12)

JMM v1.0
Outline

- Johns&Martin
  - advanced current mirrors (chap 6.1)
    - wide-swing current mirrors
    - wide-swing constant-transconductance bias circuit
    - enhanced output-impedance current mirrors (not yet)
    - wide-swing current mirror with enhanced output
      impedance (not yet)
  - folded-cascode OpAmp (chap 6.2)
    - small signal analysis
    - slew rate

- Exercises (6.8 & 6.10)
  - spice simulations
  - problems

MicroLab, vlsi28 (2/12)

JMM v1.0
Advanced current mirrors
wide-swing current mirrors
- The classical two-stage OpAmp was discussed in
  vlsi27.
- Recently a number of alternative OpAmp designs
  have been gaining in popularity. They make use of
  more advanced current mirrors.

- Wide-swing current mirror:
  - as shorter channel lengths are used, it becomes more
    difficult to achieve reasonable OpAmp gains due to
    transistor output-impedance degradation caused by
    short-channel effects.
  - Conventional cascode current mirrors limit the signal
    swings available.
  -> wide-swing current mirror

MicroLab, vlsi28 (3/12)

JMM v1.0
Wide-swing current mirrors

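[Figure: wide-swing cascode current mirror with input current Iin, bias current Ibias, gate bias Vbias and output Iout = Iin at Vout. Sizes: Q2, Q3: W/L; Q1, Q4: (W/L)/n²; Q5: (W/L)/(n+1)²]

$V_{out} > (n+1)\,V_{eff}$
for Q4: $V_{tn} > n\,V_{eff}$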

- The basic idea is to bias the drain-source voltages of
  transistors Q2 and Q3 to be close to the minimum
  possible without them going into the triode region.
- Choice of Ibias:
  - Ibias equal to maximum of Iin (all fets in saturation)
  - Ibias equal to nominal of Iin (for larger Iin, fets in triode, but
    probably only during slew-rate limiting)
- Design hints:
  - a common choice for n is unity
  - bias voltage from Q5 somewhat larger (0.1 V to 0.15 V) in order to offset the increased
    threshold voltages for Q1 and Q4 due to their body effects
  - L of Q1, Q4 and Q5 are twice the minimal channel length, L of Q2
    and Q3 are just slightly larger than the minimal channel length
    (high frequency poles)
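The two conditions and the Q5 sizing of the wide-swing current mirror are easy to evaluate for a given operating point. The sketch below uses hypothetical values Veff = 0.25 V, Vtn = 0.8 V and W/L = 100/1.6; the function names are illustrative.

```python
def wide_swing_limits(veff, vtn, n=1):
    """Return (minimum output voltage, whether Q4 stays in saturation)."""
    vout_min = (n + 1) * veff          # Vout > (n + 1) Veff
    q4_saturated = vtn > n * veff      # for Q4: Vtn > n Veff
    return vout_min, q4_saturated

def q5_aspect_ratio(wl, n=1):
    """Aspect ratio of the bias fet Q5: (W/L) / (n + 1)^2."""
    return wl / (n + 1) ** 2

# Hypothetical operating point (assumptions).
vout_min, q4_ok = wide_swing_limits(veff=0.25, vtn=0.8, n=1)
wl5 = q5_aspect_ratio(100 / 1.6, n=1)

print("minimum Vout:", vout_min, "V; Q4 saturated:", q4_ok)
print("(W/L)5 =", wl5)
```

With n = 1 the output may come within 2·Veff of the rail, and Q5 is a quarter of the size of the mirror fets before the design hints above are applied.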
MicroLab, vlsi28 (4/12)

JMM v1.0
Wide-swing constant-
transconductance bias circuit

[Figure: wide-swing constant-transconductance bias circuit (Q1 to Q18 with resistor RB, W/L values annotated in the original figure), producing Vbias-p, Vcasc-p, Vcasc-n and Vbias-n. It consists of three parts:
- bias loop (see vlsi-27 slide 29)
- cascode bias
- start-up circuitry (injects current as long as the IDs are zero)]

MicroLab, vlsi28 (5/12)

JMM v1.0
Enhanced output-impedance
current mirror
- Another variation of the cascode current mirror is
  the enhanced output-impedance current mirror,
  shown here in a simplified version
- basic idea: use of a feedback amplifier to keep the
  drain-source voltage across Q2 stable, irrespective
  of the output voltage
-> the additional amplifier increases the output
  impedance (see classical cascode current mirror,
  vlsi-25 slides 16, 17)
  $R_{out} \cong g_{m1}\, r_{ds1}\, r_{ds2}\, (1 + A)$

[Figure: current mirror Q3, Q2 with cascode fet Q1 whose gate is driven by an amplifier A (inputs Vbias and the drain of Q2); input Iin, output Iout with output resistance Rout]

MicroLab, vlsi28 (6/12)

JMM v1.0
Folded-cascode OpAmp

- many modern integrated CMOS OpAmps are
  designed to drive only capacitive loads
- capacitive-only loads do not need voltage buffers to
  obtain low output impedance of the OpAmp
- thus it is possible to realize OpAmps having higher
  speed and larger signal swings than those that must
  drive resistive loads
- these improvements are obtained by having only one
  single high-impedance node at the OpAmp output
  that drives only capacitive loads
- all internal nodes have relatively low impedance
  (on the order of 1/gm), thus the speed is optimized
- the compensation is usually achieved by the load
  capacitance
- the most important parameter is their
  transconductance:
  operational transconductance amplifier OTA

MicroLab, vlsi28 (7/12)

JMM v1.0
Folded-cascode OpAmp con’t
[Figure: folded-cascode OpAmp schematic (Q1 to Q13, bias currents Ibias1 and Ibias2, bias voltages VB1 and VB2, input Vin, output Vout, load CL). Annotations:
- differential-input, single-ended output
- Q1, Q2: input pair
- Q3, Q4 (with Q11): current mirror
- Q5, Q6: folded cascode fets (see vlsi-25 slide 19)
- Q7 to Q10: wide-swing cascode current mirror
- CL: compensation
- VB1, VB2 may be replaced by a wide-swing constant-transconductance bias network and thus VB1, VB2 would be Vcasc-n, Vcasc-p]

Purpose of Q12, Q13:
- increase slew-rate performance
- improve recovery from slew-rate limiting

Design hints:
- Ibias1 and Ibias2 should be derived from a single bias network
- any current mirrors should be designed by parallel combination of unit-size fets

MicroLab, vlsi28 (8/12)

JMM v1.0
Folded-cascode OpAmp
small-signal analysis
Assumption: gm5 and gm6 are much larger than gds3 and gds4
- differential output current from drains of differential pair Q1 and Q2 is
applied to the load capacitance
- the small-signal current from Q1 passes directly from source
to drain of Q6 and thus to CL (indirect for Q2 to Q5 and CL)

$A_v = \dfrac{V_{out}(s)}{V_{in}(s)} = g_{m1} Z_L(s)$ (for gm1 = gm2)

$A_v(s) = \dfrac{g_{m1} r_{out}}{1 + s\, r_{out} C_L}$, $r_{out} \cong \dfrac{g_m r_{ds}^2}{2}$ (see vlsi-25 slide 20)

for mid-band and high frequencies $A_v \cong \dfrac{g_{m1}}{s C_L}$, thus the unity-gain frequency is $\omega_t \cong \dfrac{g_{m1}}{C_L}$

Design hint:
- for large load capacitances a maximal transconductance of the input fets
  maximizes bandwidth, use n-channel fets
- input bias current 4 times larger than cascode current (maximizing dc gain)

Lead compensation (series resistance RC to CL)

$A_v(s) = \dfrac{g_{m1}}{\dfrac{1}{r_{out}} + \dfrac{1}{R_C + 1/(s C_L)}} \cong \dfrac{g_{m1} (1 + s R_C C_L)}{s C_L}$
RC can be chosen to place a zero at 1.2 times the unity-gain frequency
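The unity-gain frequency and the lead-compensation resistor of the folded-cascode OpAmp follow directly from gm1 and CL. In the sketch below gm1 = 2.4 mA/V is an assumed value (chosen so that the numbers land near the result quoted for Ex ana6.2), CL = 10 pF, and the zero of (1 + s·RC·CL) is placed at 1.2·ωt.

```python
import math

def unity_gain_frequency(gm1, cl):
    """wt = gm1 / CL in rad/s."""
    return gm1 / cl

def lead_resistor(gm1, zero_factor=1.2):
    """RC such that the zero 1 / (RC CL) sits at zero_factor * wt."""
    return 1.0 / (zero_factor * gm1)

gm1 = 2.4e-3          # assumed input transconductance in A/V
cl = 10e-12           # load capacitance in F

wt = unity_gain_frequency(gm1, cl)
rc = lead_resistor(gm1)
wz = 1.0 / (rc * cl)

print(f"ft = {wt / (2 * math.pi) / 1e6:.1f} MHz")
print(f"RC = {rc:.0f} ohm, zero at {wz / wt:.2f} x wt")
```

Note that RC does not depend on CL: the load capacitance cancels because both ωt and the zero frequency scale with 1/CL.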
MicroLab, vlsi28 (9/12)

JMM v1.0
Folded-cascode OpAmp
slew-rate
- The diode-connected fets Q12 and Q13 are turned off
  during normal operation and have almost no effect
- slew-rate limiting behavior:
  - assume there is a large differential input voltage that
    causes Q1 to be turned on hard and Q2 to be turned off
  - since Q2 is off, all of the bias current of Q4 will be
    directed through cascode fet Q5, through the n-channel
    current mirror, and out of the load capacitance
  - the output voltage will decrease linearly with a
    slew rate given by:
    $SR \cong \dfrac{I_{D4}}{C_L}$
  - Q1 and current source Ibias will go into the triode region,
    moving the drain voltage of Q1 to the negative power
    supply
  - Q12 and Q13 clamp the drain voltages so they don’t change
    as much during slew-rate limitation
  - in addition Q12 and Q13 increase the bias currents for Q3
    and Q4 and thus for CL

MicroLab, vlsi28 (10/12)

JMM v1.0
Exercises VLSI-28
Ex ana6.2 (difficulty: medium): find reasonable fet sizes
for the folded-cascode OpAmp. Assume a ±2.5 V
power supply, power dissipation maximal 2 mW,
current ratio 4:1 between input and cascode fets, bias
current of Q11 is 1/30 of Q3 (thus ignoring it for
power dissipation), maximum fet width is 300 µm,
L = 1.6 µm and Veff = 0.25 V for all except input fets,
W1 = W2 = 300 µm, rounding widths to 10 µm,
CL = 10 pF, µnCox = 3µpCox = 96 µA/V²
a) find all fet sizes and the unity-gain frequency,
b) slew rate with and without clamp fets
c) reasonable lead compensation RC
Result: a) Q1 to Q4 = 300 µm, Q5, Q6 = 60 µm, Q7 to
Q10 = 20 µm, Q11 to Q12 = 10 µm, ωt = 2π·38 MHz
b) SR = 32 V/µs,
c) RC = 347 Ω (see Johns/Martin pp. 271-273)

MicroLab, vlsi28 (11/12)

JMM v1.0
Coming Up...
- Next topic:
  Comparators

- Readings for next time:
  Johns&Martin: Sections 6.1 and 6.2

- Exercises:
  Have a look at the exercises in Johns&Martin.

MicroLab, vlsi28 (12/12)

JMM v1.0
VLSI Systems Design
FSM-D Architecture Model

[Figure: FSM-D block diagram. The data-path (RTL logic) has data inputs and data outputs; the control path (finite state machine) has control inputs (sensors) and control outputs (actuators); control signals connect the two blocks.]

Goal: You are able to use logic gates and flip-flops wisely
and not only in an ad-hoc manner. You master the finite
state machine data path model.

MicroLab, VLSI-30 (1/27)

JMM v1.4
Architecture Philosophy

- FSM-D architecture model is composed of 2 blocks:
  - finite state machine (FSM)
  - data-path (D)

- Goal of FSM-D architecture model
  - structured design approach
  - resource optimization
  - readability, documentation

- FSM Characteristics
  - manager
  - controlling, taking decisions, initiating sub-tasks

- Data-Path Characteristics
  - worker, specialist
  - executing, calculating, storing & moving data

MicroLab, VLSI-30 (2/27)

JMM v1.4
FSM-D Architecture Model

- The FSM-D architecture model
  - based on FSM model and data-path model
  - interface: inputs, outputs

[Figure: FSM-D block diagram. The data-path (RTL logic) has data inputs and data outputs; the control path (finite state machine) has control inputs (sensors) and control outputs (actuators); control signals connect the two blocks.]

MicroLab, VLSI-30 (3/27)

JMM v1.4
FSM Structures

- Mealy machine
  - outputs depend on inputs and state
  $s[k+1] = f(i[k], s[k])$
  $o[k] = g(i[k], s[k])$
  [Figure: i[k] -> transition logic -> s[k+1] -> state register -> s[k] -> output logic -> o[k], with s[k] fed back to the transition logic]

- Moore machine
  - outputs depend on states only
    (functionally restricted)
  $s[k+1] = f(i[k], s[k])$
  $o[k] = g(s[k])$
  [Figure: transition logic -> state register -> output logic, with s[k] fed back to the transition logic]

- Medwedjew machine
  - outputs depend on states only
  - outputs are hazard-free
  $s[k+1] = f(i[k], s[k])$
  $o[k] = s[k]$
  [Figure: transition logic -> state register, with o[k] taken directly from the register]
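The difference between the three FSM structures lies only in what the output function may look at. The following behavioral sketch makes this explicit; the toggle example and all function names are illustrative assumptions, not part of the lecture.

```python
def mealy_step(f, g, state, inp):
    """s[k+1] = f(i[k], s[k]); o[k] = g(i[k], s[k])."""
    return f(inp, state), g(inp, state)

def moore_step(f, g, state, inp):
    """s[k+1] = f(i[k], s[k]); o[k] = g(s[k])."""
    return f(inp, state), g(state)

def medvedev_step(f, state, inp):
    """s[k+1] = f(i[k], s[k]); o[k] = s[k] (the state register is the output)."""
    return f(inp, state), state

def run(step, inputs, state=0):
    """Apply one input per clock cycle and collect the outputs o[k]."""
    outputs = []
    for inp in inputs:
        state, out = step(state, inp)
        outputs.append(out)
    return outputs

# Illustrative one-bit machine: the state toggles whenever the input is 1.
toggle = lambda i, s: s ^ i
inputs = [1, 0, 1, 1]

mealy_out = run(lambda s, i: mealy_step(toggle, lambda i_, s_: i_ & s_, s, i), inputs)
moore_out = run(lambda s, i: moore_step(toggle, lambda s_: 1 - s_, s, i), inputs)
medvedev_out = run(lambda s, i: medvedev_step(toggle, s, i), inputs)

print("Mealy:   ", mealy_out)
print("Moore:   ", moore_out)
print("Medvedev:", medvedev_out)
```

All three machines share the same transition function; only the Mealy output reacts to the input within the same cycle, while the Medwedjew output is the state sequence itself.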

MicroLab, VLSI-30 (4/27)

JMM v1.4
Data-Path Elements

- A typical data-path consists of 3 types of basic
  elements
  - buses, multiplexers, de-multiplexers
  - functional units, like comparator, adder, barrel shifter,
    ALU, etc
  - memory elements, like flip-flop, register, register file,
    etc

[Figure: examples of data-path elements: 32-bit buses bus[31:0] with a 16-bit slice bus[31:16], a multiplexer with 32-bit inputs and a 2-bit select, an adder ADD (32-bit inputs a and b, cin, cout, 32-bit result) and a 32-bit register with enable]

MicroLab, VLSI-30 (5/27)

JMM v1.4
Data-Path Memory Element

- Memory elements store new values at every clock
  cycle
- To give the FSM full control over the data-path, the
  data-path memory elements need to be upgraded
  with an enable control input

[Figure: a 32-bit register with enable, built from a multiplexer in front of a register (enable selects between di and the fed-back do), next to its symbol (di, do, enable); timing diagram with clock, enable, di and do]
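The register with enable can be modelled behaviorally in a few lines. This is a sketch of the mux-plus-register idea only (class and method names are illustrative), with one call to clock() standing for one active clock edge.

```python
class EnableRegister:
    """Register that only takes a new value when enable is asserted."""

    def __init__(self, reset_value=0):
        self.do = reset_value

    def clock(self, di, enable):
        """One active clock edge: the mux selects di (enable=1) or the old do (enable=0)."""
        self.do = di if enable else self.do
        return self.do

reg = EnableRegister()
trace = [reg.clock(di, en) for di, en in [(5, 0), (7, 1), (9, 0), (3, 1)]]
print(trace)
```

Without the enable input the register would follow di on every clock edge, so the FSM could not decide when data is captured.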

MicroLab, VLSI-30 (6/27)

JMM v1.4
Design Steps

- A tutorial design shall serve as vehicle for a
  practical approach: Black Jack player
- Key elements in the FSM-D design procedure are
  the interface definitions

- design steps:
  - step 1: definition of the algorithm
  - step 2: FSM-D interface definition
  - step 3: data-path design
  - step 4: data-path interface definition
  - step 5: FSM interface definition
  - step 6: FSM state definition
  - step 7: FSM design
  - step 8: VHDL coding
  - step 9: test-bench design and simulation

MicroLab, VLSI-30 (7/27)

JMM v1.4
Design Step 1:
Algorithm Definition
- goal of the Black Jack game:
  - get as close as possible to 21 points
  - the game is lost if 21 points are exceeded
- game restrictions:
  - the cards have the following values:
    2, 3, 4, 5, 6, 7, 8, 9, 10 and 11
    as well as jack, queen and king, all
    three representing 10 points
- game rules:
  - ask for as many cards as needed
  - the Ace can be treated as 11 points or as 1 point
- our player's behavior:
  - ask for cards as long as the summed-up points are below
    16
  - always treat the Ace as 11 points
  - when 21 points are exceeded, treat a possible Ace as 1 point
    to get a second chance
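Before moving to hardware, the player's behavior can be captured as a small reference model. The sketch below is one possible reading of the rules above: it counts how many Aces are currently valued at 11 and demotes one of them when 21 is exceeded. The function name and the fixed card sequences are illustrative assumptions.

```python
def play(cards, draw_below=16, limit=21):
    """Play one game with a fixed card sequence; return (score, lost).

    The sequence must be long enough for the player to stop drawing.
    """
    score = 0
    aces_as_11 = 0
    deck = iter(cards)
    while score < draw_below:              # ask for cards while below 16
        card = next(deck)
        score += card                      # the Ace arrives as 11 points
        if card == 11:
            aces_as_11 += 1
        if score > limit and aces_as_11 > 0:
            score -= 10                    # second chance: one Ace now counts 1
            aces_as_11 -= 1
    return score, score > limit

print(play([10, 5, 9]))        # exceeds 21 without an Ace
print(play([11, 5]))           # stops at 16
print(play([11, 4, 10, 3]))    # Ace demoted to 1, then one more card
```

Such a model is useful later as the expected-response generator of the test bench, since the hardware score and lost outputs can be compared against it for the same card sequence.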

MicroLab, VLSI-30 (8/27)

JMM v1.4
Design Step 2
FSM-D Interface Definition
- defining the interface of the overall FSM-D
  architecture model
- defining edge sensitivity of clock and active level of
  control signals

[Figure: FSM-D block "BlackJack Player" with the ports cardReady, cardValue(3:0), clk and start on the input side and newCard, score(4:0), lost and finished on the output side]

MicroLab, VLSI-30 (9/27)

JMM v1.4
Design Step 3:
Data-Path Definition
- the data-path has to be able to execute all functional
  operations of the algorithm
- clearly separate control-path and data-path tasks as
  in the manager/worker analogy
- use memory elements, buses and multiplexers for
  storing and moving data
- use combinational logic for functional operations
  like adding, comparing, etc

MicroLab, VLSI-30 (10/27)

JMM v1.4
Design Step 3:
Data-Path Definition:
loading&comparing

- loading card value into register
- comparing to Ace

[Figure: cardValue(3:0) is loaded into register regLoad (enable enaLoad, clk, rst); a comparator A=B? compares regLoad with the constant 11 and drives cmp11]

MicroLab, VLSI-30 (11/27)

JMM v1.4
Design Step 3:
Data-Path Definition:
accumulating

- accumulating the card values

[Figure: as before, extended by an adder ADD (inputs a and b) and a register regAdd (enable enaAdd, clk, rst) that accumulates regLoad into regAdd]

MicroLab, VLSI-30 (12/27)

JMM v1.4
Design Step 3:
Data-Path Definition:
comparing sum

- comparing the accumulated values
- visualizing score

[Figure: as before, extended by two comparators A>B? that compare regAdd with the constants 16 and 21 (outputs cmp16, cmp21) and by a score register (enable enaScore, clk, rst) driving score]

MicroLab, VLSI-30 (13/27)

JMM v1.4
Design Step 3:
Data-Path Definition:
subtracting 10

- insert a second path to the load register and adder
  to subtract 10

[Figure: as before, extended by a multiplexer (inputs in0 and in1, select sel) in front of regLoad that chooses between cardValue and the constant -10]

MicroLab, VLSI-30 (14/27)

JMM v1.4
Design Step 4
Data-Path Interface Definition
- defining the interface of the data-path block
- defining edge sensitivity of clock and active level of
  control signals

[Figure: block DataPath with the ports cardValue(3:0), clk, rst, sel, enaLoad, enaAdd, enaScore, cmp11, cmp16, cmp21 and score(4:0)]

MicroLab, VLSI-30 (15/27)

JMM v1.4
Design Step 5
FSM Interface Definition
- defining the inputs and outputs of the FSM block

FSM input signals: cmp11, cmp16, cmp21, cardReady

FSM output signals: finished, lost, newCard, sel, enaLoad, enaAdd, enaScore

MicroLab, VLSI-30 (16/27)

JMM v1.4
Design Step 5:
Interface Definition
Completed FSM-D Hierarchy
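[Figure: complete hierarchy of the FSM-D block "BlackJack Player". The DataPath (cardValue(3:0), score(4:0), clk, rst) and the ControlPath (cardReady, start, newCard, lost, finished, clk, rst) are connected by the internal signals sel, enaLoad, enaAdd, enaScore, cmp11, cmp16 and cmp21]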

MicroLab, VLSI-30 (17/27)

JMM v1.4
Design Step 6
FSM State Definition
- draw a skeleton state with placeholders for the
  state name and the output signals.

[Figure: skeleton state with a field for the state name and one output-signal field each for finished, lost, newCard, sel, enaLoad, enaAdd and enaScore]

MicroLab, VLSI-30 (18/27)

JMM v1.4
Design Step 7
FSM Design – FSMD Timing
- single clock cycle scheme
- Moore type FSM
- FSM-D timing diagram
  - registered values are available in next state or
    when leaving next state
  - combinational values are available in current
    state or when leaving current state

[Figure: timing diagram over the states LoadReg, CheckVal, Idle1, OpenData, Idle2 showing clock, enable (FSM), registers (D) with the new value, inform (D), select (FSM) and data bus (D) with data]

MicroLab, VLSI-30 (19/27)

JMM v1.4
Design Step 7
FSM Design
- design the Moore type state diagram
  - conditions on arrows are FSM inputs
  - output values are defined in states
  - use a lightning-bolt arrow for the asynchronous reset

[Figure: Moore state diagram with asynchronous reset and the states CallCard, LoadCard, AddCard and Handshake. The transitions are labelled with combinations of cardReady, cmp11, cmp16 and cmp21 (the inversion bars are not recoverable from this copy). Each state lists the output signals hold, broke, newCard, sel, enaLoad, enaAdd and enaScore; output vectors as printed:
  CallCard:  0 0 1 - - 0 0
  LoadCard:  0 0 1 1 1 0 0
  AddCard:   0 0 0 - 0 1 0
  Handshake: 0 0 0 - 0 1 0]

MicroLab, VLSI-30 (20/27)

JMM v1.4
Design Step 8:
Coding – Data-Path
- all registers with associated logic are placed in one
  process (same clock and asynchronous reset)
- loosely coupled combinatorial logic can be coded
  with conditional signal assignments

[Figure: complete data-path from design step 3 (mux, regLoad, ADD, regAdd, score register, comparators cmp11, cmp16, cmp21)]

process:

    process(clk, rst)
    begin
      if (rst = '0') then                    -- asynchronous, active-low reset
        regLoad  <= "00000";
        regAdd   <= "00000";
        regScore <= "00000";
      elsif (clk'event and clk = '0') then   -- registers update on the falling clock edge
        if (enaAdd = '1') then
          regAdd <= regAdd + regLoad;        -- accumulate the loaded card value
        end if;
        ...
      end if;
    end process;

continuous conditional assignment:

    cmp11 <= '1' when (regLoad = "01011") else '0';   -- loaded card is an Ace (11)
    cmp16 <= '1' when (regAdd > "10000") else '0';    -- sum above 16
    cmp21 <= '1' when (regAdd > "10101") else '0';    -- sum above 21
MicroLab, VLSI-30 (21/27)

JMM v1.4
Design Step 8:
Coding – FSM
- one clocked process is used for the state transition
- one combinatorial process is used for the state
  dependent output assignment

[Figure: i[k] -> transition logic -> s[k+1] -> state register -> s[k] (state) -> output logic -> o[k]]

state transition (clocked process):

    process(clk, rst)
    begin
      if (rst = '0') then
        state <= StartState;
      elsif (clk'event and clk = '0') then
        case state is
          when StartState =>
            state <= CallCard;
          when CallCard =>
            if (cardReady = '1') then
              state <= LoadCard;
            end if;
          when others =>
            state <= IllegalState;
            -- used for VHDL analysis
            -- "null" for synthesis
        end case;
      end if;
    end process;

output assignment (combinatorial process):

    process(state)
    begin
      case state is
        when StartState =>
          outvec <= "000--00";
        when CallCard =>
          outvec <= "001--00";
        when others =>
          outvec <= "UUUUUUU";
          -- used for VHDL analysis
          -- "null" for synthesis
      end case;
    end process;

    finished <= outvec(6);
    lost     <= outvec(5);
    newCard  <= outvec(4);
    ...

MicroLab, VLSI-30 (22/27)

JMM v1.4
Design Step 9:
Test-Bench Design
- compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is nice measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming from or going to the outside of
    the lab

[Figure: test bench enclosing the device under test (DUT), a block for control and stimulus generation, and a block for response generation and verification]

MicroLab, VLSI-30 (23/27)

JMM v1.4
Design Step 9:
Test-Bench Design – Test Cycle
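- cycle-based test
  - apply input patterns at the beginning of the test cycle
  - capture response after the rising or falling clock edge

[Figure: timing of one test cycle showing clock, inputs and outputs (sync); stimuli are applied at the start of the cycle and the response is captured while the outputs are stable]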

MicroLab, VLSI-30 (24/27)

JMM v1.4
Design Step 9:
Test-Bench Design – Simulation
- cycle-based test
  - apply input patterns at the beginning of the test cycle
  - observe response after the rising or falling clock edge
  - visualize data-path registers and FSM state

MicroLab, VLSI-30 (25/27)

JMM v1.4
Errors and Pitfalls

- asynchronous external inputs to the FSM provoke state
  hazards
  - imagine a 0.1 ns hazard can be captured in the state register
  - imagine 100 states in the FSM
  - imagine 100 MHz clock frequency
  - 100 errors per second
-> input synchronization for all "external" (non-synchronous)
  FSM inputs

[Figure: FSM (transition logic, state register, output logic, s[k] fed back) with a synchronization register inserted between the non-synchronous inputs and i[k]; labels as printed: "new state always", "input with hazards"]

MicroLab, VLSI-30 (26/27)

JMM v1.4
Summary and Conclusion

- FSM-D architectural model supports structured
  design approach
- 9 design step approach for FSM-D design
  presented
- task re-distribution between FSM and data-path is
  crucial:
  - Ace counting (0, 1 or 2) in a Black Jack dealer. Who
    should do it? FSM or data-path?
- workers/manager analogy is used to assign sub-tasks
  to control-path (manager) and data-path
  (specialized workers)

MicroLab, VLSI-30 (27/27)

JMM v1.4