Vlsi

VLSI System Design
Overview of VLSI Design Issues
Professor: Dr. Marcel Jacomet (based on transparencies designed by
Chris Terman at MIT, completely updated and adapted at MicroLab-I3S)
Overview
Microelectronic history
the complexity of microelectronics
design steps
Goal: You are familiar with the microelectronics history,
have an idea about the microelectronics complexity and
you have an overview of the VLSI design steps.
JMM v1.4
What’s expected of you
Class/Homework (50% in class, 50% homework): Readings from a Starter Guide to VHDL and some articles. Some problems to be worked at home. Self-study of the VHDL language with help of the CBT CD from Doulos.
Project (40% of final grade): Some design exercises to be done in the lab. Specify, design and simulate a small VHDL design project using a data-path / finite state machine. Place & route it on an FPGA target technology (due date: July 19th, 2002, at 13h00).
Test (60% of final grade): One 70 minute in-class test. Meant to be duck soup if you’ve been coming to lectures and doing the lab and homework (date: Friday July 12th, 2002).
Timetable 4th Semester:
Introduction to VLSI System Design
Date         Topic                            Self-Study
11-15.3.     vlsi1: history & complexity      A VLSI tutorial
18-22.3.     vlsi8: micro technologies        How a silicon int.
25-29.3.     --
11-19.4.     vlsi8: micro technologies        article Hoff
22-26.4.     vlsi21: top-down design, VHDL    VHDL/CBT
29.4-3.5.    Ex400, 401                       VHDL/CBT
6-10.5.      --                               VHDL/CBT
13-17.5.     vlsi21 & Ex402                   VHDL
20-24.5.     vlsi21 & Ex404,405               VHDL
27-31.5.     vlsi21 & Ex406-408               VHDL
3-7.6.       vlsi21 & Ex409                   chapter 5
10-14.6.     vlsi21 & Ex410                   VHDL finish
17-21.6.     Ex450                            project
24-28.6.     Ex451                            project
1-5.7.       Ex452                            project
8-12.7.      Test                             project
15-19.7.     test discussion and outlook      project
19.7. at 13h00                                project due
So, what’s VLSI Systems Design
all about?
You’ll get a bottom-up tour of how integrated circuits are engineered. We’ll talk about
field-effect transistors: how they work, how they’re built, effects of new technologies
various design and layout techniques, from the ordinary to the bizarre, for creating combinational and sequential circuits, datapaths, memories, buffers, regular logic structures, …
how you tackle the problem of designing circuits with 1,000,000 gates -- you’re not in Digital Technique anymore!
Key Technology Microelectronics
microelectronics is a key technology of the world economy
technology development is extremely aggressive
postgraduate engineering education is important
influence of other technologies like software engineering
key technologies may be used as weapons: in 1991 Japan held an 80% share of the world production of 4 Mbit DRAMs. Artificial raw-material shortages are disastrous.
very few Swiss chip fabs. Our raw material is the high education standard, that means YOU
What is a VLSI Circuit?
VERY LARGE SCALE INTEGRATED CIRCUIT
Technique where many circuit components and the wiring that connects them are manufactured simultaneously into a compact, reliable and inexpensive chip.
Early (circa 1977) characterization of circuit “size” before people realized that the number of components per chip was quadrupling every 24 months (Moore’s Law)! This growth rate has slowed in recent years… can you guess why?
Course Outline/Brief history
Bell Labs lays the groundwork:
1940: Ohl develops PN junction
1945: Shockley’s lab established
1947: Bardeen and Brattain create point-contact transistor with two PN junctions. Gain = 18.
1951: Shockley develops junction transistor which can be manufactured in quantity.
1952: Dummer forecasts “solid block [with] layers of insulating, conducting and amplifying materials”
1954: The first transistor radio! Also, TI makes first silicon transistor (price $2.50)
Early integration
Jack Kilby, working at Texas Instruments, first dreamed up the idea of a monolithic “integrated circuit” in July 1958. By the end of the year, he had constructed several examples, including the flip-flop shown in the patent drawing above. Components are connected by hand-soldered wires and isolated by “shaping”, and pn diodes are used as resistors.
Robert Noyce experimented in the late 40’s with transistors while a physics major at college. He went to MIT where “much to his surprise, few people had even heard about the transistor.” After getting his PhD in 1953, he worked in industry, finally arriving at Mountain View, CA and Shockley Semiconductor Labs in 1955.
In 1957, Noyce left Shockley’s lab to form Fairchild Semiconductor with Jean Hoerni. Gordon Moore is another founder.
In early 1958, Hoerni invents technique for diffusing impurities into the silicon to build planar transistors and then using a SiO2 insulator.
In mid 1959, Noyce develops first true IC using planar transistors, back-to-back pn junctions for isolation, diode-isolated silicon resistors and SiO2 insulation with evaporated metal wiring on top.
Practice makes perfect...
[Figure: die photo, scale 1.5 mm]
1961: TI and Fairchild introduced the first logic IC’s (cost ~$50 in quantity!). This is a dual flip-flop with 4 transistors.
1963: Densities and yields are improving. This circuit has four flip flops.
[Figure: die photo, scale 0.97 mm]
1967: Fairchild markets the semi-custom chip shown below. Transistors (organized in columns) could be easily rewired using a two-layer interconnect to create different circuits. This circuit contains ~150 logic gates.
[Figure: die photo, scale 3.81 mm]
1968: Noyce and Moore leave Fairchild and found Intel. No business plan, just a promise to specialize in memory chips. They raise $3M in two days and move to Santa Clara. By 1971 Intel had 500 employees; by 1983 it had 21,500 employees and $1100M in sales.
The Big Bang
[Figure: die photo, scale 2.87 mm]
In 1970, making good on its promise to its investors, Intel starts selling a 1K bit RAM, the 1103. It was a bear to interface to, but its density and cost make it the only game in town.
In 1971 Intel introduces the first microprocessor, designed by Ted Hoff. The 4004 had 4-bit buses and a clock rate of 108 kHz. It had 2300 transistors and was built in a 10um process. It never captured much interest in the market and was soon eclipsed by its more capable brothers.
Exponential Growth
Introduced in 1972, the 8008 had 3,500 transistors supporting a byte-wide data path. Despite its limitations, the 8008 was the first microprocessor capable of playing the role of computer CPU as demonstrated on the cover of the July ‘74 issue of Radio-Electronics.
Last, but not least, on our tour is the 8080. Introduced in 1974, the 8080 had 6,000 transistors fab’ed in a 6um process. The clock rate was 2 MHz, more than enough to ignite the personal computer industry. At least Paul Allen and his partner thought so when they wrote a BASIC interpreter for the 8080 in 1975. They would later collaborate in another, more profitable, venture...
Today
AVP-III Video Codec from Lucent Technologies
Many disciplines have contributed to the current state of the art in VLSI design:
solid-state physics      circuit design & layout
materials science        architecture
lithography and fab      algorithms
device modeling          CAD tools
We’ll be concentrating on the right-hand column
CAD Tools #1
“Computer-Aided Design”: organize, generate, verify
Standard-cell place and route for “random” logic.
Symbolic layout tools to ease the task of physical design; mask verification to ensure manufacturability.
Circuit analysis programs predict circuit behavior at all the process corners. Gate-level and behavioral simulators help you get it right the first time!
Tools to do the tedious, repetitive work such as routing, “tiling” a mosaic of building-block cells, or verifying that the layout and schematic match.
CAD Tools #2
Problem:
designing highly complex VLSI circuits (100K to xM fets)
classical, iterative procedures are unsuitable
precise transistor models are necessary for reliable predictions → data inflation
Solution:
new design methodologies
powerful design tools
high level design languages
silicon compiler would be useful
VLSI Design Challenge
Goal:
designing circuits of increasing complexity in ever shorter times
computer has to take over routine work
free the designer from unnecessary low-skill work
shift of design activities to higher-level abstract work
computer has to support new design methods
Chip Complexity #1
Chip classification according to number of active elements and minimal feature size:
classification   #transistors   example
SSI              1 - 100        gates
MSI              100 - 1k       registers
LSI              1k - 100k      uP
VLSI             100k -         RAM, sig. proc.
ULSI             ?

year   minimal channel length
1970   10µm
1980   5µm
1985   2µm
1992   0.5µm
2002   0.13µm
2010   ?
Chip Complexity #2
can you really imagine the chip complexity of today's VLSI chips and not just express it as a mere number?
street map image
year   feature    block   chip   town
1970   10x10µm    200m    2mm    Biel
1980   10x5µm     200m    5mm    Paris
1992   10x0.7µm   200m    10mm   Switzerland
Architecture
(Multiple choice)
This is a picture of
(A) a programmable general purpose ASIC with 1/4 million transistors on 40mm2, designed in a 0.7µm CMOS full custom technology.
(B) a processor able to execute 64 knowledge based rules in parallel due to a 3 stage pipelined architecture with hard-coded adder, multiplier, divider architecture.
(C) the fastest fuzzy processor in the world, designed by MicroLab-I3S and presented at the international FUZZ‘98 conference in New Orleans
ANSWER: _________
Circuit Design & Layout
Standard cell Full custom
RAM Generator
Q: Which engineer drew the most fets? ______
VLSI: The Ideal Implementation Medium?
VLSI
gives the designer control over almost everything: architecture, logic design, speed, area, power, …
densities are increasing, costs decreasing with each passing year
is used by almost everyone: “No one gets fired for building an ASIC”
was the enabling technology for much of the economic growth of the 80’s and 90’s. It will no doubt continue in its starring role for some time to come.
Is life really a bowl of cherries?
VLSI Fact-of-Life #1:
“So much to do, so little time”
You need a design methodology:
budget ($, speed, area, power, schedule, risk)
low-level building blocks, high-level architecture
behavioural design, verification
logic design, verification
layout, verification
VLSI Fact-of-Life #2:
“You can’t reach in and fix it”
Notice that the word “verification” kept appearing in the previous slide.
Mistakes can be costly:
step             time       cost
find bug(s)      ?          ?
reverify         1 week     Ecu 10k
new masks        3 days     Ecu 25k
fab run          12 weeks   Ecu 1k/wafer
slip ship date              Ecu Ecu Ecu
There’s a lot that needs checking:
circuit must operate at all “corners”: verified at building block level
logic must be correct, operate reliably: verified at RTL/gate level
chip has to interoperate with system: verified at behavioral level
chip has to be manufacturable: verified at mask level, at tester
VLSI Fact-of-Life #3:
“Verification is a tedious task”
VLSI Fact-of-Life #4:
“You can’t find all the bugs”
The key word here is “find”:
one can’t explore the behaviour of the circuit under all possible conditions
some of the bugs arise from unanticipated interactions which, by definition, one never thinks of testing
it’s not clear when one is “done” looking for bugs! Time pressures mean that most searches stop too soon.
The trick is to choose some implementation rules that result in a circuit that is correct by construction*. For example:
choose a simple clocking scheme
module inputs must go only to fet gates
disallow unclocked feedback
make register t(clk-to-Q) > t(hold)+skew
use poly only for local interconnect
no diffusion wires
etc., etc., etc.
* or at least avoid as many problems as possible!
VLSI Fact-of-Life #5:
“Nobody’s perfect”
Plan for what happens after you turn it on and nothing happens.
provide lots of observability and controllability. You’ll need to localize and then find the bug.
have a way to run the chip slowly and/or stop it without it burning up or losing bits.
figure out how to track down performance problems without relying on fast I/O (tester pins are slow!)
leave room in the budget (time, Ecu) for debugging.
write and run your manufacturing tests before tape out.
Microelectronics in 4th Semester
[Figure: course map leading to EXPERIENCE: history & complexity, microelectronic technologies, VHDL, exercises with CAD tools, data path / fsm project, synthesis, design flow]
Course material
Textbook from Weste & Eshraghian for 4th and 5th semester (voluntary)
Copy of transparencies (placeholder for private notes)
VHDL Starter (recommended)
CAD Exercises on the MicroLab web pages
CBT CD on VHDL for your PC (on loan from MicroLab in 4th semester)
different small articles

Coming Up...
We’ll be traveling top-down in 4th semester and bottom-up in 5th & 6th semester:
Next topic…
Microelectronic technologies like standard cell, gate array, sea-of-gates, macro cell, FPGA, tiny micro-controllers.
Readings for next time…
web CBT tutorials, see http://www.microlab.ch/academics/courses
How a silicon integrated circuit is made (web CBT)
A VLSI Tutorial up to chapter with NAND/NOR (web CBT from Uni Manchester)
T. Hoff: Article about the µP History (German)
To learn more about Intel’s early days and to ogle some die photos of oldie-but-goodie chips browse at the Intel link of the MicroLab VLSI course web page.
VLSI Design I
The MOSFET model
“Wow! Are device models as nice as Cindy?”
Overview
The large signal MOSFET model and second order effects. MOSFET capacitances.
Introduction to fet process technology
Goal: You can use the large signal equivalent MOS device equation. You are familiar with second order effects like body effect, channel length modulation. You know the MOS capacitances. You know the basic steps in MOS fabrication.
Let’s build a MOSFET
There are lots of different recipes to choose from. Like most things in life, you get what you pay for: the ability to have good bipolar devices, radiation hardness, reduced latch-up and substrate noise, … are all extra cost options. We’ll consider a general process: bulk CMOS with a p-type substrate:
p-type substrate: 500um slice of a silicon ingot that has been doped with an acceptor (typically boron) to increase the concentration of holes to 10^14/cm3 - 10^18/cm3.
Use <100> surface to minimize surface charge.
Back is metallized to provide a good ground connection.
Good for n-channel fets, but p-channel fets will need an n-type “well” (or tub) to live in!
Next, a “thick” (0.4um) layer of silicon dioxide, called field oxide, is formed on the surface by oxidation in wet oxygen. This is then etched to expose surface where we want to make a mosfet:
Now grow a “thin” (0.01um = 100 Å) layer of silicon dioxide, called gate oxide, on the surface by exposing the wafer to dry oxygen.
The gate oxide needs to be of high quality: uniform thickness, no defects! The thinner the gate oxide, the more oomph the fet will have (we’ll see why soon) but the harder it is to make it defect free.
On top of the thin oxide a 0.7um thick layer of polycrystalline silicon, called polysilicon or poly for short, is deposited by CVD. The poly layer is patterned and plasma etched (thin ox not covered by poly is etched away too!) exposing the surface where the source and drain junctions will be formed:
[Figure: cross-section of the p substrate showing gate oxide (only under poly), poly wires, field oxide, and the exposed surface for source and drain junctions]
Poly has a high sheet resistance (25 ohms/square) which can be reduced by adding a layer of a silicided refractory metal such as titanium (TiSi2), tantalum (TaSi2) or molybdenum (MoSi2). These have sheet resistances of 1, 3 or 5 ohms per square, respectively. This is great for memory structures that have lots of poly wiring.
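To get a feel for these numbers: the resistance of a uniform wire is its sheet resistance times the number of squares (length/width). The sketch below is illustrative only; the function name and the 100um x 0.5um wire are assumptions made for the example, not part of the process description.

```python
def wire_resistance(sheet_ohms_per_square, length_um, width_um):
    """Resistance of a uniform wire: sheet resistance times number of squares (L/W)."""
    if width_um <= 0:
        raise ValueError("width must be positive")
    squares = length_um / width_um
    return sheet_ohms_per_square * squares


# Sheet resistances quoted on the slide, in ohms per square.
SHEET = {"poly": 25.0, "TiSi2": 1.0, "TaSi2": 3.0, "MoSi2": 5.0}

# Hypothetical wire: 100 um long, 0.5 um wide, i.e. 200 squares.
for name, rs in SHEET.items():
    print(f"{name:6s} {wire_resistance(rs, 100.0, 0.5):8.1f} ohms")
```

The same wire drops from several kilo-ohms in plain poly to a few hundred ohms once it is silicided, which is why silicides matter for long poly word lines.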
The entire surface is doped, either by diffusion or ion implantation, with phosphorus (an electron donor) which creates two n-type regions in the substrate. The phosphorus also penetrates the poly reducing its resistance and affecting the nfet’s threshold.
[Figure: n+ diffusions in the p substrate are “self-aligned” with poly; n+ wires: 20-30 ohms/sq.]
Finally an intermediate oxide layer is grown and then reflowed to flatten its surface. Holes are etched in the oxide (where contacts to poly/diff are wanted) and aluminum deposited, patterned and etched.
[Figure: finished device with metal wires (0.08 ohms/square) and diff contact (0.25 - 10 ohms): an n-channel MOS field effect transistor!]
NFET Operation
Picture shows configuration when Vgs < Vto: Ids = 0
[Figure: cross-section with terminals S, G, D, B and two n+ regions. p substrate: mobile holes, fixed negative ions. n+ regions: mobile electrons, fixed positive ions (n+ means heavily doped with donors, doesn’t imply positive charge!). Depletion layer: no mobile carriers, but fixed negative ions (slight intrusion into n+, but mostly in p area).]
Other symbols: [Figure: alternative schematic symbols with terminals G, S, D, B]
Terminal with higher voltage is labelled D, the other is labelled S so Ids >= 0.
B almost always ground
FET = field effect transistor
The four terminals of a fet (gate, source, drain and bulk) connect to conducting surfaces that generate a complicated set of electric fields in the channel region which depend on the relative voltages of each terminal.
[Figure: configuration when Vgb > Vto, with gate, source, drain and bulk, the vertical field Ev and horizontal field Eh; inversion happens under the gate]
INVERSION: A sufficiently strong vertical field will attract enough electrons to the surface to create a conducting n-type channel between the source and drain.
CONDUCTION: If a channel exists, a horizontal field will cause a drift current from the drain to the source. Expect Ids proportional to Vds*(W/L)?
Threshold voltage
The gate voltage required to form the channel is called the threshold voltage. Many factors affect the gate-source voltage at which the channel becomes conductive. Threshold voltage for source-bulk voltage zero:
$V_{TO} = V_{t\text{-}mos} + V_{fb}$
$V_{TO} = 2\phi_F + \frac{Q_b}{C_{ox}} + \phi_{ms} - \frac{Q_{fc}}{C_{ox}}, \quad C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$
(0.61V for n-channel, -0.61V for p-channel)
with $2\phi_F = 2\,\frac{kT}{q}\ln\frac{N_A}{n_i}$ and $Q_b = \sqrt{2\,\varepsilon_{si}\,q\,N_A\,2\phi_F}$; the $\phi_{ms}$ term is annotated with $\frac{kT}{q}\ln\frac{N_D N_A}{n_i^2}$.
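As a rough numerical check of the first two terms, the sketch below evaluates 2φF and Qb/Cox for a substrate doping of 4E16 cm-3 and a 1E-8 m gate oxide, using the constants listed at the end of this chapter. The function names are illustrative assumptions; φms and Qfc are not evaluated because the slides give no values for them.

```python
import math

KT_Q = 0.0258            # kT/q at 300 K, in volts
N_I = 1.45e10            # intrinsic carrier concentration, cm^-3
EPS0 = 8.8542e-12        # F/m
EPS_SI = 11.7 * EPS0     # permittivity of silicon
EPS_OX = 3.9 * EPS0      # permittivity of SiO2
Q = 1.6022e-19           # electron charge, C


def surface_potential(n_a_cm3):
    """2*phi_F = 2 (kT/q) ln(N_A / n_i), in volts."""
    return 2.0 * KT_Q * math.log(n_a_cm3 / N_I)


def bulk_charge_term(n_a_cm3, t_ox_m):
    """Q_b / C_ox in volts, with Q_b = sqrt(2 eps_si q N_A 2phi_F)."""
    n_a_m3 = n_a_cm3 * 1e6                       # cm^-3 -> m^-3
    q_b = math.sqrt(2.0 * EPS_SI * Q * n_a_m3 * surface_potential(n_a_cm3))
    c_ox = EPS_OX / t_ox_m                       # F/m^2
    return q_b / c_ox


print(f"2*phi_F  = {surface_potential(4e16):.3f} V")
print(f"Q_b/C_ox = {bulk_charge_term(4e16, 1e-8):.3f} V")
```

The computed 2φF comes out close to the 0.77 V listed as PHI in the process parameter table.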
Body effect (second order)
As Vsb increases, the depth of the depletion region increases, exposing more of the fixed acceptor (i.e. negative) ions in the substrate.
Thus the second term in the threshold voltage equation on the previous slide increases from
$\sqrt{2\,\varepsilon_{si}\,q\,N_A\,2\phi_F}$ to $\sqrt{2\,\varepsilon_{si}\,q\,N_A\,(V_{sb} + 2\phi_F)}$
The threshold voltage of the n-channel transistor is now:
$V_{tn} = V_{tn0} + \gamma\left(\sqrt{V_{sb} + 2\phi_F} - \sqrt{2\phi_F}\right), \quad \gamma = \frac{\sqrt{2\,\varepsilon_{si}\,q\,N_A}}{C_{ox}}$
[Figure: two series-connected fets; T1 has Vsb = 0, T2 has Vsb > 0, so Vt2 > Vt1]
As we’ll see, this effect comes into play in series-connected fets where only one of the fets will have Vsb = 0 and the other fets will have Vsb > 0 and a higher threshold voltage.
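The body-effect formula is easy to evaluate numerically. The sketch below computes γ and the threshold shift for the Alcatel 0.5um nfet values (NA = 4E16 cm-3, tox = 1E-8 m, 2φF = 0.77 V) at Vsb = 2.2 V; function names are illustrative assumptions.

```python
import math

EPS0 = 8.8542e-12        # F/m
EPS_SI = 11.7 * EPS0
EPS_OX = 3.9 * EPS0
Q = 1.6022e-19           # C


def body_factor(n_a_cm3, t_ox_m):
    """gamma = sqrt(2 eps_si q N_A) / C_ox, in V^0.5."""
    n_a_m3 = n_a_cm3 * 1e6                       # cm^-3 -> m^-3
    c_ox = EPS_OX / t_ox_m                       # F/m^2
    return math.sqrt(2.0 * EPS_SI * Q * n_a_m3) / c_ox


def threshold_shift(gamma, vsb, two_phi_f):
    """Vtn - Vtn0 = gamma * (sqrt(Vsb + 2phi_F) - sqrt(2phi_F))."""
    return gamma * (math.sqrt(vsb + two_phi_f) - math.sqrt(two_phi_f))


gamma = body_factor(4e16, 1e-8)
shift = threshold_shift(gamma, vsb=2.2, two_phi_f=0.77)
print(f"gamma = {gamma:.3f} V^0.5, dVtn = {shift:.3f} V")
```

With these inputs the numbers land on the results quoted for exercises vlsi2.1 and vlsi2.2 (γ of about 0.334 V^0.5 and a shift of about 0.282 V).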
Basic DC equations
MOS transistors have 3 regions of operation:
cutoff region (subthreshold)
linear region (triode region)
saturated region (active region)
[Figure: fet cross-section labelled polysilicon gate, SiO2, source diffusion, drain diffusion]
Cutoff or subthreshold region: Vgs <= Vt, Ids = 0
There is still a small current described in the second order effects (weak inversion). Important to model for analog circuits: I ds ∝ Vds
“Linear” operating region
Vgs > Vt, 0 < Vds < Vdsat
[Figure: fet cross-section with source at Vs, gate at Vgs, drain at Vds and current Ids]
Larger Vgs creates deeper channel which increases Ids.
Larger Vds increases drift current but also reduces vertical field component which in turn makes channel less deep. Channel will pinch off when Vds = Vgs - Vt = Vdsat.
$I_{ds} = \frac{W}{L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\left[(V_{gs} - V_t)\,V_{ds} - \frac{V_{ds}^2}{2}\right]$
Annotations:
channel length L is almost always the minimum allowable
mobility µ (µn > µp)
fet gain factor k = µ Cox
max value at Vds = Vdsat, but then channel is pinched off (see next slide)
only linear when Vds is small, otherwise parabolic
Saturated operating region
Vgs > Vt, Vdsat < Vds
[Figure: fet cross-section with source at Vs, gate at Vgs, drain at Vds and current Ids]
Voltage at channel end remains essentially constant at Vdsat so drift current also remains constant: device is in saturation.
Electrons arriving from source are injected into drain depletion region and accelerated towards drain by field proportional to Vds - Vdsat, usually reaching the drift velocity limit.
$I_{ds(sat)} = \frac{W}{2L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,(V_{gs} - V_t)^2$
this is just Ids from previous slide evaluated at Vds = Vdsat
Channel-length modulation (second order)
Vgs > Vt, Vdsat < Vds
[Figure: fet cross-section with the channel shortened by dL, so that L’ = L - dL; as Vds increases, dL gets larger]
This looks just like a fet with a channel length of L’ < L. Shorter L’ implies greater Ids...
As Vds increases the effective channel length gets shorter so Ids(sat) increases. dL is proportional to $\sqrt{V_{ds} - V_{dsat}}$ but most people approximate channel length modulation as a linear effect:
$I_{ds(sat)} = \frac{W}{2L}\,\frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,(V_{gs} - V_t)^2\,(1 + \lambda V_{ds})$
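The three operating regions and the linear channel-length-modulation term can be collected into one small function. This is a minimal sketch of the equations on these slides, not a SPICE model: the function name is hypothetical, the defaults (k = 100 µA/V2, W/L = 2, Vt = 0.61 V) are taken from the exercise values for the Alcatel nfet, and the λ term is applied only in saturation as written above, so the curve has a small step at Vds = Vdsat when λ > 0.

```python
def ids_nfet(vgs, vds, vt=0.61, k=100e-6, w_over_l=2.0, lam=0.0):
    """Drain current (A) of an n-channel fet for Vds >= 0.

    k is the gain factor mu*Cox; lam is the channel-length modulation factor.
    """
    if vds < 0:
        raise ValueError("label the terminals so that Vds >= 0")
    v_dsat = vgs - vt
    if v_dsat <= 0:
        return 0.0                                   # cutoff: Vgs <= Vt
    beta = k * w_over_l
    if vds < v_dsat:                                 # linear (triode) region
        return beta * (v_dsat * vds - 0.5 * vds * vds)
    return 0.5 * beta * v_dsat ** 2 * (1.0 + lam * vds)   # saturation


for vgs in (0.0, 1.0, 2.0, 3.3):
    row = ", ".join(f"{ids_nfet(vgs, vds) * 1e6:7.1f}" for vds in (0.1, 1.0, 3.3))
    print(f"Vgs = {vgs:3.1f} V: Ids [uA] at Vds = 0.1, 1.0, 3.3 V -> {row}")
```

Sweeping Vds for a few values of Vgs reproduces the familiar family of Ids curves.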
NFET Ids curves
“Put it together and what have you got?”
[Figure: plot of Ids vs. Vds for Vgs = 0, 1, 2, 3, 4 and 5V]
Can you find the following in the plot?

Ids vs. Vds when Vgs = 0V
Ids vs. Vds when Vgs = 5V
value of Vt
value of Vdsat
evidence of body effect
evidence of channel length modulation
SPICE Models
There are different models used in circuit simulators:
level 1 is our simple model including the most important second order effects described
level 2 model is based on device physics
level 3 is a semi-empirical model allowing to match equations to the real circuit: example BSIM model from Berkeley models subthreshold characteristics
summary of the main SPICE DC parameters used in all three models at the end of this chapter
.
M1 4 3 5 0 nfet W=1u L=0.5u AS=1p AD=1p PS=3u PD=3u
.
.
.MODEL nfet NMOS
+TOX=1E-8
+CGB0=345p CGS0=138p CGD0=138p
+CJ=775u CJSW=344p MJ=0.35 MJSW=0.26 PB=0.75
+. . . .
.
.
MOSFET Capacitance Estimation
The dynamic response of MOS systems strongly depends on the parasitic capacitances associated with the MOS device. The total load capacitance on the output of a CMOS gate is the sum of:
gate capacitance (of other inputs connected to out)
diffusion capacitance (of drain/source regions)
routing capacitances (output to other inputs)
[Figure: fet symbol with Cgd, Cgs and Cgb at the gate and Cdb, Csb to the substrate; cross-section showing the same capacitances around the gate (oxide thickness tox), channel, source, drain and depletion layer]
MOSFET gate capacitances
Cg = Cgd + Cgs + Cgb
Oxide-related capacitances come in two forms:
overlap capacitance (extrinsic) since gate slightly overhangs diffusions and bulk:
for both Cgs and Cgd: C(overlap) = W LD Cox (LD = amount of overlap)
for Cgb: C(overlap) = 2L CGB0
for SPICE: Cgs = W CGS0, Cgd = W CGD0, Cgb = 2L CGB0
channel-charge related capacitances (intrinsic):
cut-off: Cgb = Cox W L
Cgs = Cgd = 0
linear: Cgb = 0 (shielded by channel)
Cgs = Cgd = 0.5 Cox W L (equally shared between S and D; note capacitive coupling of gate and drain/source)
saturation: Cgb = 0 channel pinched off
Cgd = 0 channel shortened
Cgs = 0.67 Cox W L
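The overlap and intrinsic contributions can be added up per operating region. The sketch below does this for a W = 1um, L = 0.5um nfet with the overlap parameters from the process table; the function and table names are illustrative assumptions.

```python
EPS_OX = 3.9 * 8.8542e-12        # permittivity of SiO2, F/m

# Intrinsic share of Cox*W*L as (Cgs, Cgd, Cgb) for each operating region.
INTRINSIC = {
    "cutoff":     (0.0, 0.0, 1.0),
    "linear":     (0.5, 0.5, 0.0),
    "saturation": (0.67, 0.0, 0.0),
}


def gate_capacitances(w, l, region, t_ox=1e-8,
                      cgs0=1.38e-10, cgd0=1.38e-10, cgb0=3.45e-10):
    """Return (Cgs, Cgd, Cgb) in farads; w, l, t_ox in metres, overlaps in F/m."""
    cox_wl = EPS_OX / t_ox * w * l
    f_gs, f_gd, f_gb = INTRINSIC[region]
    cgs = f_gs * cox_wl + w * cgs0           # intrinsic + overlap per W
    cgd = f_gd * cox_wl + w * cgd0
    cgb = f_gb * cox_wl + 2.0 * l * cgb0     # overlap per 2L
    return cgs, cgd, cgb


for region in INTRINSIC:
    total = sum(gate_capacitances(1e-6, 0.5e-6, region))
    print(f"{region:10s} Cg = {total * 1e15:.2f} fF")
```

In cut-off and in the linear region the total comes to about 2.35 fF, matching the Cgate result of exercise vlsi2.4; in saturation it is lower because only two thirds of Cox W L is seen at the gate.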
MOSFET diffusion capacitances
Junction capacitances Cdb and Csb are a function of the applied terminal voltages and diffusion dimensions:
[Figure: source/drain diffusion of depth xj next to the channel; one sidewall faces the channel, the bottom junction faces the p-type substrate, the other sidewalls face the p+ channel stop]
$C_{diff} = \frac{C_j\,A}{\left(1 - \frac{V_j}{V_b}\right)^{M_j}} + \frac{C_{jsw}\,P}{\left(1 - \frac{V_j}{V_b}\right)^{M_{jsw}}}$
Cj: zero-bias C/unit area of bottom junction; A: area of diffusion
Cjsw: zero-bias C/unit length of sidewall junction; P: perimeter of diffusion
Vj: junction voltage (negative for reverse biased); Vb: built-in junction potential
Mj, Mjsw: grading coeff.
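A short numerical sketch of this formula, using the nfet values from the process parameter table (CJ, CJSW, MJ, MJSW, PB) and the geometry of exercise vlsi2.4 (A = 1um2, P = 3um, 3 V reverse bias). The function name is an illustrative assumption.

```python
def diffusion_capacitance(area, perimeter, vj,
                          cj=7.75e-4, cjsw=3.44e-10,
                          mj=0.35, mjsw=0.26, vb=0.7556):
    """Junction capacitance in farads.

    area in m^2, perimeter in m, vj = junction voltage (negative when reverse
    biased), cj in F/m^2, cjsw in F/m, vb = built-in junction potential in V.
    """
    if vj >= vb:
        raise ValueError("formula only holds for Vj below the built-in potential")
    factor = 1.0 - vj / vb
    bottom = cj * area / factor ** mj
    sidewall = cjsw * perimeter / factor ** mjsw
    return bottom + sidewall


c_zero = diffusion_capacitance(1e-12, 3e-6, 0.0)
c_rev = diffusion_capacitance(1e-12, 3e-6, -3.0)
print(f"zero bias: {c_zero * 1e15:.2f} fF, 3 V reverse bias: {c_rev * 1e15:.2f} fF")
```

Reverse bias widens the depletion layer, so the capacitance drops from about 1.8 fF at zero bias to roughly 1.1 fF, in the neighbourhood of the 1.2 fF quoted as the result of exercise vlsi2.4.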
P-channel MOSFETs
[Figure: cross-section with terminals S, G, D, B; p+ source and drain inside an n well in the p substrate]
threshold voltage is negative since we need to attract holes to form inversion layer
PFET is built inside its own “substrate”: an n-type well or tub diffused into p-type bulk substrate. Don’t forget well contacts!
Other symbols: [Figure: alternative schematic symbols with terminals G, S, D, B]
Terminal with lower voltage is labelled D, the other is labelled S
off: Vgs > Vt
lin: Vgs < Vt, Vds > Vgs-Vt
sat: Vgs < Vt, Vds < Vgs-Vt
n-well always connected to Vdd to keep pn junction back-biased
Depletion-mode MOSFETs
[Figure: cross-section with terminals S, G, D, B and n+ source and drain]
channel doped with donors to give negative threshold voltage, i.e., depletion fets are always on.
This mosfet is always conducting but, like ordinary enhancement fets, it will conduct more current as Vgs increases. One can build logic circuits with only n-channel devices (NMOS): enhancement fets for pulldowns and depletion fets as static pullups. Since NMOS logic dissipates DC power it’s been largely replaced by CMOS.
Coming Up...
Next topic…
Static characteristics of MOS inverters: input
and output voltages, noise margins, power
dissipation.

Weste:
sections 2 thru 2.23 except 2.2.2.4 - 2.2.2.7 (fet models),
3 thru 3.2.2 (process technology) and
4.3 through 4.3.4 (capacitances)
CBT:
Study the chip fabrication text of the University of Manchester at the MicroLab VLSI course web link.
Useful Constants
sym   value        units   description
ε0    8.8542E-12   F/m     permittivity
εox   3.9 ε0       F/m     permittivity of SiO2
εSi   11.7 ε0      F/m     permittivity of silicon
VT    25.8         mV      kT/q (@300 K)
q     1.6022E-19   C       charge of electron
k     1.381E-23    J/K     Boltzmann's constant
ni    1.45E10      cm-3    intrinsic carrier concentration
Alcatel 0.5um Process Parameters
sym    param   nmos       pmos       units     description
Vt0    VTO     0.61       -0.61      V         threshold voltage
tox    TOX     1E-8       1E-8       m         thin oxide thickness
NA     NSUB    4E16       4E16       cm-3      substrate doping density
µ      U0      290        72         cm2/Vs    charge mobility
k      KP                            A/V2      fet gain factor
γ      GAMMA                         V0.5      bulk threshold param.
Cox    COX                           F/m2      oxide capacitance
λ      α/L                           V-1       channel length modulat.
α              1e-8       2e-8       V-1m-1    channel length mod fact.
φ0     PB      0.7556     0.78469    V         built in junction potent.
2φF    PHI     0.77       0.77       V         surface inversion pot.
Cgb0   CGB0    3.45E-10   ditto      F/m       overlapping cap per 2L
Cgs0   CGS0    1.38E-10   ditto      F/m       overlapping cap per W
Cgd0   CGD0    1.38E-10   ditto      F/m       overlapping cap per W
Cj     CJ      7.75E-4    8.15E-4    F/m2      zero-bias cap / unit A
Cjsw   CJSW    3.44E-10   3.54E-10   F/m       zero-bias cap per unit P
Mj     MJ      0.35       0.36                 grading coeff for bottom
Mjsw   MJSW    0.26       0.27                 grading coeff sidewall
Exercises: VLSI-2
Ex vlsi2.1 (difficulty: easy): Calculate the missing parameters on the previous transparency like intrinsic transconductance k, bulk threshold parameter γ and oxide capacitance Cox of an nfet (Alcatel 0.5µm process)
Result: kn=100µA/V2, kp=24.9µA/V2, γ=0.334V0.5, Cox=3.45E-7 F/cm2 (see Weste pp48ff)
Ex vlsi2.2 (difficulty: easy): Calculate the threshold voltage shift due to the body effect of an nfet at Vsb = 2.2V (Alcatel 0.5µm process)
Result: dVtn = 0.282V (see Weste pp55)
Ex vlsi2.3 (difficulty: easy): Calculate the transconductance βn of an nfet (Alcatel 0.5µm process), W=1 µm, L= 0.5 µm
Result: βn=200 µA/V2 (see Weste pp53)
Ex vlsi2.4 (difficulty: easy): Calculate the capacitances of an nfet with Vsb=Vdb=3V, W=1µm, L=0.5µm, A=1µm2, P=3µm (Alcatel 0.5µm process)
Result: Cgate=2.35fF, Cdrain=Csource=1.2fF (see Weste pp183-191)
Weste pp99: 2.10: Have a look at ex 8, 9
VLSI Design I
Static characteristics of MOS inverter
“Static characteristics? Does that mean it’s not going to move?”
Overview
Static transfer characteristic of CMOS gates
Goal: You know the transfer characteristic of CMOS gates and know how to calculate noise margins
NFET Review
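[Figure: nfet symbols with terminals G, D, S; Vgs measured from gate to source, Vds >= 0 measured from drain to source; threshold of about 0.7V]
Operating regions (Vdsat = Vgs - Vt):
cut-off: Vgs < Vt
linear: Vgs >= Vt, Vds < Vdsat
saturation: Vgs >= Vt, Vds >= Vdsat
[Figure: Ids plotted against Vds for several values of Vgs]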
PFET Review
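[Figure: pfet symbols with terminals G, D, S; Vgs measured from gate to source, Vds <= 0 measured from drain to source; threshold of about -0.7V]
Operating regions (Vdsat = Vgs - Vt):
cut-off: Vgs > Vt
linear: Vgs <= Vt, Vds > Vdsat
saturation: Vgs <= Vt, Vds <= Vdsat
[Figure: -Ids plotted against -Vds for several values of -Vgs]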
“Bipolar” Logic
“Isn’t this a CMOS course?”
Bipolar = two signal levels
‘0’ when V near 0
‘1’ when V near Vdd
Inverter recipe:
pullup: make this connection when Vin near 0 so that Vout = Vdd
pulldown: make this connection when Vin near Vdd so that Vout = 0
[Figure: inverter between Vdd and ground with input Vin and output Vout]
one power supply => low impedance source for 2 levels
receivers have a simple job => only make one decision
no DC power if connections not “made” at same time
Boolean logic has been around a long time
Characterizing Inverters
What goals do we want to achieve with our inverter
implementation (and, more generally, other functions)?
fast propagation delay (next lecture!)
low power dissipation
compact layout
noise immunity
Draw voltage-transfer curve (VTC) for inverter. Shade-in areas that VTC can’t enter. What can we say about gain? What is “ideal” inv. VTC?
[Figure: empty VTC axes, Vout (marked VOL, VOH) against Vin (marked VIL, VIH)]
Noise Margin
“Are there other ways of signalling?”
noise immunity. Since we’re signalling values using voltages, we want good noise margins. This means that we need to make an allowance for noise when assigning voltage levels for valid inputs and outputs
definition: $NM_L = V_{IL,max} - V_{OL,max}$, $NM_H = V_{OH,min} - V_{IH,min}$
[Figure: output characteristics (logical high output range from VOHmin up to Vdd, logical low output range from Vss up to VOLmax) beside input characteristics (logical high input range from VIHmin up to Vdd, logical low input range from Vss up to VILmax)]
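The two definitions translate directly into code. The sketch below uses the levels quoted as the result of exercise vlsi3.2 for a 3.3 V inverter; the function name is an illustrative assumption.

```python
def noise_margins(vil_max, vih_min, vol_max, voh_min):
    """Return (NM_L, NM_H) from the worst-case input and output levels, in volts."""
    nm_low = vil_max - vol_max       # room for noise on a logical low
    nm_high = voh_min - vih_min      # room for noise on a logical high
    return nm_low, nm_high


nm_low, nm_high = noise_margins(vil_max=1.39, vih_min=1.91, vol_max=0.26, voh_min=3.04)
print(f"NM_L = {nm_low:.2f} V, NM_H = {nm_high:.2f} V")
```

A negative margin would mean that a legal output level is not guaranteed to be read as a legal input level, so both values should stay comfortably positive.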
Choosing signal voltages
This is a subject on which reasonable people can disagree! One possible line of attack:
Step 1: pick VIL and VIH. Don’t want to amplify noise so find values of Vin where VTC gain = 1 or -1. Choose smallest VIL and largest VIH.
[Figure: merged VTC for all process corners & devices, Vout against Vin, with VIL and VIH marked]
Step 2: pick VOL and VOH. Choose values so that (1) VTC is in legal territory (2) leave desired noise margins.
[Figure: the same VTC with VOL, VOH, VIL, VIH and the margins NML, NMH marked]
Inverter pulldown devices
The NFET makes an ideal pulldown device:
if pullup is off, VOL = ______
no DC connection when Vin < ______
increase width to increase Ipd
compact layout
[Figure: Vout against Vin showing the cut-off pulldown region (Vin < Vt0), the saturated pulldown region and the linear pulldown region, bounded by the lines Vin = Vout and Vin = Vout + Vt0; VIL is always > Vt0]
Inverter pullup devices
Resistor. No load on input, VOH=Vdd. Will dissipate static power; increasing R will reduce power and increase noise margin, but low-to-high transition gets slower. Only practical if process supports undoped poly which has sheet resistance of 10M Ohm/square.
Depletion-mode NFET. No load on input, VOH=Vdd. Connecting gate to source sets Vgs = 0 so Ipu is determined only by Vout. Layout can be compact since pullup is in same well as pulldown; buried contact can be used to connect gate to source. Only found in NMOS processes.
Enhancement-mode NFET. VOH = Vdd - Vt unless gate of pullup is driven above Vdd. If gate is not switched off, pullup needs to be weak to avoid excessive power dissipation, but this may entail larger layouts. Useful where PFETs not wanted (e.g., some I/O structures).
Pseudo-NMOS using saturated PFET as load device. VOH = Vdd. Useful for building large fan-in NOR gates found in static ROMs and PLAs where static power dissipation is okay.
Inverter with PFET pullup
[Figure: CMOS inverter; pullup PFET with Vgs,pu = Vin-Vdd and Vds,pu = Vout-Vdd, pulldown NFET with Vgs,pd = Vin and Vds,pd = Vout]
negligible steady-state power dissipation
VOL = 0V, VOH = Vdd
VTC transition very sharp
switching point can be adjusted by fet sizing
[Figure: VTC, Vout against Vin, with the line Vin = Vout and axis marks Vt,p, Vt,n, Vdd+Vt,p and Vdd; the operating regions of both fets (off, lin, sat) are marked along the curve; the transition is non-vertical only because of channel-length modulation; curves are shown for Wn/Wp>1 and Wn/Wp<1]
Build your own VTC
In the steady state:
Ids,pd(Vin,Vout) = -Ids,pu(Vin-Vdd,Vout-Vdd)
[Figure: load-line plots of Ids,pd and -Ids,pu versus Vout for Vin = 0.5V, 1.5V, 2.5V, 3.5V and 4.5V; the intersections give the points of the VTC (Vout vs. Vin)]
When both fets are saturated, small changes in Vin produce large changes in Vout.
JMM v1.4
Ben Bitdiddle’s Buffer!
Vin Vout
How many would you buy?
JMM v1.4
Coming Up...
Next topic…
Dynamic characteristics of MOS inverters:
propagation delay, effects of rise and fall times.
Transistor sizing, interconnect issues, estimating
performance.
Weste:
Sections 2.3 through 2.3.2
JMM v1.4
Exercises: VLSI-3
Ex vlsi3.1 (difficulty: easy): Calculate the CMOS inverter threshold values for the following configurations (Alcatel 0.5µm process, VDD=3.3V)
a) Wn = Ln, Wp = Lp
b) Wn = 10 Ln, Wp = Lp
c) Wn = Ln, Wp = 10 Lp
Result: a) Vinv = 1.30V, b) Vinv = 0.893V, c) Vinv = 1.88V (see Weste pp66)
Ex vlsi3.2 (difficulty: medium, time consuming): Calculate the noise margin and VIL, VIH, VOL, VOH, for a CMOS inverter operating at 3.3V with βn = βp, Utn = -Utp = 0.61V.
Result: VIL = 1.39V, VIH = 1.91V, VOL = 0.26V, VOH = 3.04V, NML = NMH = 1.13V
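As a quick consistency check of the result above, the noise margins follow from the four levels as NML = VIL - VOL and NMH = VOH - VIH. The sketch below only redoes this last arithmetic step with the quoted values; it does not derive VIL, VIH, VOL or VOH, and the function name is made up for illustration.

```python
# Levels quoted in the result of Ex vlsi3.2, in volts.
VIL, VIH, VOL, VOH = 1.39, 1.91, 0.26, 3.04


def noise_margins(vil, vih, vol, voh):
    """Return (NML, NMH), the low and high noise margins in volts."""
    nml = vil - vol  # how much noise a low output tolerates
    nmh = voh - vih  # how much noise a high output tolerates
    return nml, nmh


NML, NMH = noise_margins(VIL, VIH, VOL, VOH)
print(f"NML = {NML:.2f} V, NMH = {NMH:.2f} V")
```

Both margins come out equal here because the exercise assumes βn = βp and Utn = -Utp, i.e. a symmetric inverter.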
Weste pp99: 2.10 ex 5 (difficulty: medium, time consuming): Design an input buffer that may be used to interface with a TTL driver (Vdd=3.3V, VOL=0.8V, VOH=2.0V). Show full derivations of DC conditions. Assume Wn = 1µm and Ln = Lp = 0.5µm.
Result: Wp = 1.51µm
JMM v1.4
VLSI Design I
Dynamic characteristics of MOS inverters
Wow! 0 to 3.3 volts in 300ps!
Overview
gate delay modeling
power dissipation
Goal: You are familiar with CMOS gate delay models like Penfield-Rubinstein and wire models. You know the influence of body effect and large loads on gate delay. You know why ground bounce occurs. You know the different factors in power dissipation.
JMM v1.3
Static properties reviewed
sharp transition: inverter good receiver for voltage-based signalling
[Figure: VTC (Vout vs. Vin) and Ids,n; the transition shifts one way with increasing Wn / decreasing Wp and the other way with increasing Wp / decreasing Wn]
Define threshold voltage Vinv as voltage where Vin = Vout on VTC.
VOH=Vdd, VOL=0, sharp transition => good noise margins
VOH=Vdd => pfet off when Vin=VOH => no static power
VOL=0 => nfet off when Vin=VOL => no static power
VTC describes static behaviour. When Vin changes, Vout “lags behind” because it takes time for capacitors to charge/discharge. So, in real life, Vin reaches Vth before Vout does.
JMM v1.3
Choosing what to measure
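[Figure: Vin and Vout waveforms versus t with the 10% and 90% levels marked, showing fall time tf, rise time tr and delay time td; the level at which td is measured is labelled ???]
Rise time, tr = time for a waveform to rise from 10% to 90% of its steady-state value
Fall time, tf = time for a waveform to fall from 90% to 10% of its steady-state value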
Delay time, td = time between input transition (when Vin
= ???) and output transition (when Vout = ???).
If ??? = Vinv, can delay be negative?
does Vinv differ for each gate?
so does td(seq. of gates) = sum(td)?
should we choose 50% instead of Vinv?
JMM v1.3
Signal delay time
Signal delay time is composed as follows
gate delay time
interconnection delay time
due to miniaturization the gate delay times decrease
the output impedance of buffers increases, thus the importance of interconnection delays increases
⇒ due to continuing miniaturization, signal delay time becomes less dependent on gate delay but more dependent on interconnection delay time
[Figure: switch-level model of a fet (Ugs, Uds, R, C) and switch-level model of an inverter (UCC, Rp, Rn, Cin, Uin, Uout)]
JMM v1.3
Fall time analysis #1
[Figure: Vout vs. Vin plane showing the static transition curve and the dynamic transition at speed, with the line Vin = Vout, the fet operating regions (n = off/sat/lin, p = lin/sat/off) and the Vin axis marked Vt,p, Vt,n, Vdd+Vt,p, Vdd]
the switching speed is limited by the time taken to discharge the capacitance CL
the static transition curve moves to the right if the input transition is fast
p-fet is cut off during the whole falling output time
n-fet immediately gets saturated, later on linear
JMM v1.3
Fall time analysis #2
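Saturated: Vout >= Vdd - Vt,n (the nfet acts as a current source Idsat,n discharging CL)
$$C_L \frac{dV_{out}}{dt} = -\frac{\beta_n}{2}\,(V_{dd} - V_{t,n})^2$$
So, time to fall from 0.9Vdd to Vdd - Vt,n is given by
$$\frac{2 C_L}{\beta_n (V_{dd} - V_{t,n})^2} \int_{V_{dd} - V_{t,n}}^{0.9 V_{dd}} dV_{out}$$
Linear: Vout < Vdd - Vt,n (the nfet acts as a resistance Rn discharging CL; the current is a function of Vout)
$$C_L \frac{dV_{out}}{dt} = -\frac{V_{out}}{R_n} = -I_{dn}$$
So, time to fall from Vdd - Vt,n to 0.1Vdd is given by
$$C_L \int_{0.1 V_{dd}}^{V_{dd} - V_{t,n}} \frac{dV_{out}}{I_{dn}}$$
Adding to get total fall time (Weste, Eq 4.37), with n = Vt,n/Vdd:
$$t_f = \frac{2\,C_L}{\beta_n V_{dd} (1 - n)} \left[ \frac{n - 0.1}{1 - n} + 0.5 \ln(19 - 20n) \right]$$
The factor multiplying CL/(βn Vdd) equals 3 to 4 for Vdd=3V-5V and Vt,n=.5V-1V; it equals 3.6 for C05M.
tr is similar.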
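The "3 to 4" and "3.6" remarks can be checked numerically by evaluating the factor in front of CL/(βn Vdd) in the fall-time formula. The sketch below assumes n = Vt,n/Vdd as on the slide and uses Vdd = 3.3V, Vt,n = 0.61V (the values used elsewhere in these slides) for the C05M case; the function names are illustrative only.

```python
import math


def fall_time_factor(vdd, vtn):
    """Dimensionless factor k in t_f = k * CL / (beta_n * Vdd)."""
    n = vtn / vdd
    return 2.0 / (1.0 - n) * ((n - 0.1) / (1.0 - n) + 0.5 * math.log(19.0 - 20.0 * n))


def fall_time(c_load, beta_n, vdd, vtn):
    """10%-90% fall time in seconds (c_load in F, beta_n in A/V^2)."""
    return fall_time_factor(vdd, vtn) * c_load / (beta_n * vdd)


k_c05m = fall_time_factor(3.3, 0.61)
print(f"k = {k_c05m:.2f} for Vdd = 3.3 V, Vtn = 0.61 V")
for vdd, vtn in ((3.0, 0.5), (5.0, 1.0)):
    print(f"k = {fall_time_factor(vdd, vtn):.2f} for Vdd = {vdd} V, Vtn = {vtn} V")
```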
JMM v1.3
Estimating delays
In most CMOS circuits, the delay of a single gate is dominated by the output rise and fall time. Thus:
$$t_{dr} = \frac{t_r}{2}, \qquad t_{df} = \frac{t_f}{2}$$
Having found a general form for approximate rise and fall times, one might estimate all delays using the same general form:
$$t_{delay} = A_{delay}\,\frac{L}{W}\,C_L$$
(W is the width expressed as a multiple of the minimum width; the factor in front of CL looks like a resistor!)
Where Adelay is a constant that depends on the power supply and transition voltages, the process and the minimum mosfet dimensions. This last dependency might strike one as odd, but usually all mosfets are built using the minimum allowable mosfet length for the process.
Rather than solve the equations analytically, one can use Spice to determine the value of various useful constants: Ar, Af, Adr, Adf. These can be used in quick&dirty calculations for sizing transistors during the design process.
JMM v1.3
Input rise/fall & delay
How do input rise and fall times affect delay?
fast inputs will quickly turn off one mosfet and provide
maximum Vgs to the driving mosfet for most of the output
transition
slow inputs will leave both mosfets on longer, reducing
effective current to/from load capacitance and Vgs will be
lower than above.
So we might expect slower input transitions to lead to
longer output delay times.
One rule of thumb (Weste, p. 216ff):
$$t_{dr} = t_{dr-step} + \frac{t_{f,in}}{6}\,(1 + 2n)$$
$$t_{df} = t_{df-step} + \frac{t_{r,in}}{6}\,(1 - 2p)$$
(n is ~0.2 for Vtn = 0.61V, Vdd = 3.3V)
valid for input transitions that aren’t “too” long
JMM v1.3
Bootstrapping & delay
CGD
When the input starts to rise, the output, which was

high, starts to fall. Thus the voltage across CGD
changes requiring the input to supply more current to
charge CGD, slowing the input transition.
Since CGD is small, this is usually a small effect.
When inverter is biased into its linear region, CGD may
appear multiplied by the gain of the inverter (Miller
effect). This doesn’t usually matter in digital circuits
since the input passes rapidly through linear region.
Useful in analog circuits...
JMM v1.3
Multiple inputs & delay
[Figure: pulldown chain of four series nfets with inputs A (top), B, C and D (bottom); output capacitance Cout and intermediate node capacitances Cab, Cbc and Ccd]
How should we model delays when we have multiple inputs? When A, B, C and D are logic 1:
treat series mosfets as resistances in series. Lump intermediate node capacitance with load capacitance.
$$t_d = \sum_i R_i \cdot \sum_i C_i$$
use Penfield-Rubinstein model which predicts
$$t_d = \sum_i R_i C_i$$
where Ri is the summed resistance from point i to ground and Ci is the capacitance at point i.
Penfield-Rubinstein Slope Model uses effective resistance simulated by Spice:
$$R_n = \frac{t_{df}}{C}$$
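The two delay estimates for the series pulldown chain can be compared with a few lines of code. This is a minimal sketch: nodes are listed from ground upward (Ccd, Cbc, Cab, Cout), each fet is modelled as a fixed resistance, and the numeric values are hypothetical, chosen only to show that the lumped model is more pessimistic.

```python
def lumped_delay(resistances, capacitances):
    """Lumped model: (sum of all R) times (sum of all C)."""
    return sum(resistances) * sum(capacitances)


def penfield_rubinstein_delay(resistances, capacitances):
    """Sum over nodes of (resistance from node i to ground) * (capacitance at node i).

    resistances[i] is the fet between node i and the node below it
    (ground for i = 0); capacitances[i] is the capacitance at node i.
    """
    total = 0.0
    r_to_ground = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_ground += r          # accumulate series resistance down to ground
        total += r_to_ground * c
    return total


# Hypothetical values: four identical 10 kOhm fets, 5 fF internal nodes, 50 fF load.
R = [10e3, 10e3, 10e3, 10e3]
C = [5e-15, 5e-15, 5e-15, 50e-15]  # Ccd, Cbc, Cab, Cout
print(f"lumped:             {lumped_delay(R, C) * 1e9:.2f} ns")
print(f"Penfield-Rubinstein: {penfield_rubinstein_delay(R, C) * 1e9:.2f} ns")
```

The difference comes from the intermediate nodes: each only discharges through the fets below it, whereas the lumped model charges it with the full chain resistance.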
JMM v1.3
Body effect & delay
If A goes from 0 to 1 while B, C and D are 1,

then all the intermediate nodes in the pulldown chain have
already been discharged and the top mosfet sees only a
small body effect.
If D goes from 0 to 1 while A, B and C are 1,
then the intermediate nodes are all one Vt below Vdd and
the upper mosfets see a larger body effect.
Thus A is the “faster” input!
JMM v1.3
Driving large loads #1
If large loads have to be driven, the delay may increase
drastically. Large loads are output capacitances, clock trees,
etc.
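[Figure: minimum inverter (size 1, input capacitance CG) driving CL = 1000 CG]
$$t_d = t_{inv}\,\frac{C_L}{C_G} = 1000 \cdot t_{inv}$$
A possibility to reduce the delay, but probably not the optimum:
[Figure: chain of inverters of size 1, 40 and 200 driving CL = 1000 CG, with stage delays 40 t_inv, 5 t_inv and 5 t_inv]
$$t_d = \frac{40}{1}\,t_{inv} + \frac{200}{40}\,t_{inv} + \frac{1000}{200}\,t_{inv} = 50 \cdot t_{inv}$$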
JMM v1.3
Driving large loads #2
To drive a large load capacitance one might employ a sequence of n inverters, each a factor “a” larger than the previous one:
[Figure: n=4 inverters of size 1, a, a^2, a^3 between CG and CL]
The delay through each stage is a·td where td is the average delay of a minimum-sized inverter driving another minimum-sized inverter. We want a^n = (CL/CG), so
$$\text{Total delay} = n\,(a\,t_d) = \ln\!\left(\frac{C_L}{C_G}\right) \frac{a\,t_d}{\ln(a)}$$
Thus, total delay is minimized when a = e = 2.7; in practice a = 3...5.
[Figure: plot of total delay versus a, for a from 0 to 8]
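Both buffer-sizing arguments are easy to reproduce numerically. The sketch below measures delay in units of t_inv (or td) and capacitance in units of CG, as on the slides; the helper names and the search grid are illustrative assumptions.

```python
import math


def chain_delay(sizes, c_load):
    """Delay (in units of t_inv) of inverters with relative sizes `sizes` driving c_load (in units of CG)."""
    loads = list(sizes[1:]) + [c_load]  # each stage drives the next stage, the last one drives the load
    return sum(load / size for size, load in zip(sizes, loads))


def tapered_delay(a, c_ratio):
    """Total delay (in units of td) for stage ratio a: ln(CL/CG) * a / ln(a)."""
    return math.log(c_ratio) * a / math.log(a)


single = chain_delay([1], 1000)
three_stage = chain_delay([1, 40, 200], 1000)
print(f"single inverter: {single:.0f} t_inv, sizes 1/40/200: {three_stage:.0f} t_inv")

# Coarse search for the best stage ratio between 1.5 and 6.5.
candidates = [1.5 + 0.01 * i for i in range(500)]
best_a = min(candidates, key=lambda a: tapered_delay(a, 1000))
print(f"best stage ratio: {best_a:.2f}, delay {tapered_delay(best_a, 1000):.1f} td")
for a in (3, 4, 5):
    print(f"a = {a}: delay {tapered_delay(a, 1000):.1f} td")
```

The curve is flat around the minimum, which is why ratios of 3 to 5 cost little extra delay while saving stages.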
JMM v1.3
Power dissipation #1
the power consumption is low compared to other

technologies
scaling down increases the power dissipation
density with respect to chip area
power dissipation produces heat on the chip, which
has to be carried off through the chip socket
power dissipation is one of the limiting factors in today’s CMOS VLSI chips
low-power applications are a speciality of EM (Neuenburg; watches, battery applications, etc.)
JMM v1.3
Power dissipation #2
sources of power dissipation:
static power dissipation (quiescent current)
dynamic power dissipation
dc power dissipation: short circuit current (power to ground) due to switching
ac power dissipation: capacitor current (charging, re-charging) due to switching
static power dissipation
there is always one fet off, so only leakage current is present
$$I_0 = I_S\,(e^{qV/kT} - 1)$$
$$P_S = \sum I_0 \cdot V_{DD}$$
JMM v1.3
Dynamic power dissipation #1
Comparison of dynamic short circuit current vs. capacitive current.
As expected, the short circuit current makes a less important contribution when the load gets large.
Slower input transitions would increase the short circuit current.
[Figure: inverter with W/L=4 and W/L=2 fets; waveforms of Uin, Uout, Idsn and Idsp for three cases: Uout-A, Uout-B (50fF load) and Uout-C (200fF load)]
JMM v1.3
Average dynamic power for switching a square-wave input with a repetition frequency of fp = 1/tp is (capacitor current)
$$P_d = \frac{1}{t_p}\int_0^{t_p/2} i_n(t)\,V_{out}\,dt + \frac{1}{t_p}\int_{t_p/2}^{t_p} i_p(t)\,(V_{DD} - V_{out})\,dt$$
Assuming a step input and taking in(t) = CL dVout/dt, i.e., the capacitive current, we get:
$$P_d = \frac{C_L}{t_p}\int_0^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{t_p}\int_{V_{DD}}^{0} (V_{DD} - V_{out})\,d(V_{DD} - V_{out})$$
$$P_d = \frac{C_L V_{DD}^2}{t_p} = C_L V_{DD}^2 f_p$$
proportional to switching frequency but independent of device parameters
Aha! Now one can see why everybody changes from 5V to 3.3V and to 2.5V!
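The quadratic dependence on the supply voltage is the whole point of the move from 5V to 3.3V and 2.5V. A minimal sketch, assuming a load of 100pF switched at 50MHz (the clock buffer figures used in Ex vlsi4.2); the function name is illustrative.

```python
def dynamic_power(c_load, vdd, f_switch):
    """Average dynamic (capacitive) power in watts: CL * VDD^2 * fp."""
    return c_load * vdd ** 2 * f_switch


C_LOAD = 100e-12  # farads, assumed load
F_CLK = 50e6      # hertz, assumed switching frequency

p_5v = dynamic_power(C_LOAD, 5.0, F_CLK)
for vdd in (5.0, 3.3, 2.5):
    p = dynamic_power(C_LOAD, vdd, F_CLK)
    print(f"VDD = {vdd} V: Pd = {p * 1e3:.2f} mW ({100 * p / p_5v:.0f}% of the 5 V value)")

ratio_33 = dynamic_power(C_LOAD, 3.3, F_CLK) / p_5v
```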
JMM v1.3
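Short circuit power dissipation is given by
$$P_{sc} = I_{mean} \cdot V_{DD}$$
[Figure: input waveform with rise time tr, fall time tf and period tp crossing the levels Vtn and VDD+Vtp, and below it the short circuit current pulses with Imax and Imean between t1, t2 and t3]
The above waveform shows the short circuit current.
$$P_{sc} = \frac{\beta}{12}\,(V_{DD} - 2V_t)^3\,\frac{t_{rf}}{t_p}$$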
JMM v1.3
Total power dissipation
Total power dissipation is:
Ptotal = Ps + Pd + Psc
dynamic power dissipation is dominant

use switching activity to estimate power
dissipation:
Pd = n switching ⋅ C total ⋅ VDD2 ⋅ f
switching activity:
nswitching = percentage of switching gates
there exist simulators estimating power dissipation
using the switching activity
JMM v1.3
Build your own power meter
[Figure: a source Vs = 0 in the supply path measures the current Is drawn by the device or circuit (periodic input Vin(t) = Vin(t+T), load CL); a linear current-controlled current source g*Is drives RY in parallel with CY, across which Vy is measured, with Vy(0) = 0V]
If one chooses
$$g = \frac{V_{dd}\,C_y}{T}$$
and
$$R_y C_y \gg T$$
Then Vy(T) in volts will equal the average dynamic power in watts drawn from the power supply over one period.
JMM v1.3
Power and ground bounce
Metal power-carrying conductors have to be sized for three reasons:
metal migration
power supply noise
RC delay
general rule:
limit current density J_AL ≈ 0.4 ... 1 mA/µm
contact replication
JMM v1.3
“It’s the wires, stupid”
As process dimensions shrink, wiring capacitances
start to dominate the mosfet capacitances.
To estimate wiring capacitances, consider the
following figure:
[Figure: cross-section of a wire at height h above the substrate, showing the parallel-plate capacitance Cpp and the fringing-field capacitance]
Parallel-plate capacitance given in process files. Fringing capacitance is significant when t is comparable to h.
JMM v1.3
Fringing Capacitance
Figure 6.11 from CMOS Digital Integrated Circuits: Analysis and Design, by Kang and Leblebici:
For a long conductor where (t/h)=0.4, (w/h)=0.25, (w/l)=0, the total capacitance may be 10x the parallel plate capacitance.
JMM v1.3
Wire model?
Today, the longest wire on a VLSI chip might be 2cm
which has “time of flight” of ~130ps assuming εSiO2
= 3.9 ε0
If the signal rise/fall time is longer than the time of
flight we can model wires as a distributed RC network.
Longer wires or shorter rise/fall times require the wire
to be modelled as a transmission line.
For short wires, a lumped RC model is sufficient. For
longer wires, we use the distributed RC model where
signal propagation can be shown to obey the diffusion
equation:
$$r c\,\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$
where r = R/unit length, c = C/unit length and x = distance from driver.
Which means the prop time tx = kx^2 with the signal “edge” becoming dispersed with increasing x.
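The ~130ps figure for the time of flight can be reproduced from the wire length and the dielectric constant alone. A small sketch, assuming the signal travels at c0/sqrt(εr) in the oxide; the function name is illustrative.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def time_of_flight(length_m, eps_r):
    """Time for a signal to travel length_m in a dielectric with relative permittivity eps_r."""
    velocity = C0 / math.sqrt(eps_r)
    return length_m / velocity


tof = time_of_flight(0.02, 3.9)  # 2 cm wire in SiO2
print(f"time of flight: {tof * 1e12:.0f} ps")
```

Signals with rise/fall times well above this value see the wire as an RC network rather than as a transmission line.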
JMM v1.3
Diffusion Eq. in “real life”
$$t = 2.2 \cdot \frac{r c l^2}{2}$$
(Weste, Eq. 4.28, but 10% to 90% rise/fall time)
Ex vlsi4.3: clock with 50pF load distributed by 1µ-wide metal wire running from clock buffer in corner of 10mm x 10mm chip.
r = 0.05 ohm/square
c = 50pF/20mm
l = 20mm
a) t = ? b) t = ?
Fix: drive clock from central location to decrease l and widen clock wire to 20µ:
r = 0.0025 ohm/square
c = 50pF/10mm
l = 10mm
c) t = ?
whew!
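The effect of the fix can be seen by plugging the numbers into t = 2.2 · rcl²/2. The sketch below reads the slide's r values as ohms per µm of wire length for the given wire width (an assumption, but one that reproduces the 55 ns and 1.38 ns quoted in the exercise results), and uses r·l as total wire resistance and c·l as total capacitance. Case b) is not covered.

```python
def distributed_rc_time(r_per_um, c_total, length_um):
    """10%-90% time of a distributed RC wire: 2.2 * (r*l) * (c*l) / 2."""
    r_total = r_per_um * length_um  # total wire resistance in ohms
    return 2.2 * r_total * c_total / 2.0


# a) 1 um wide wire, 20 mm long, driven from the chip corner
t_a = distributed_rc_time(0.05, 50e-12, 20_000)
# c) 20 um wide wire, 10 mm long, driven from a central location
t_c = distributed_rc_time(0.0025, 50e-12, 10_000)

print(f"a) t = {t_a * 1e9:.1f} ns")
print(f"c) t = {t_c * 1e9:.2f} ns")
print(f"improvement: {t_a / t_c:.0f}x")
```

Halving the length and widening the wire by 20x cuts the wire resistance by 40x while the 50pF load stays the same.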
JMM v1.3
Inductance
Bond-wire inductance can cause deleterious effects in large, high speed I/O buffers
package inductance: 3 .. 15 nH
with process shrinking on-chip inductance has to be taken into account
on-chip inductance: 10 .. 50pH/mm
[Figure: inductance L in the Vdd supply path carrying the current i(t)]
$$dV = L\,\frac{dI}{dt}$$
design techniques:
separate power pins for I/O pads and chip core
multiple power and ground pins
careful selection of the position of the power and ground pins on the package
adding decoupling capacitances on the board
increase the rise and fall times
use advanced package technologies (SMD, etc)
JMM v1.3
Coming Up...
Next topic…
Combinational logic: series/parallel switch networks, transmission gates. Performance optimisation.

Weste:
4.4 (inductance)
4.3.6, and 4.5 thru 4.5.1, and 4.5.4 thru 4.5.5 except
4.5.4.4, and 4.6.3 (delay modelling)
4.7 (power consumption)
4.8 (sizing routing conductors)
You should read the rest of chapter 4 when you get

the chance ...
JMM v1.3
Exercises: VLSI-4
Ex vlsi4.1 (difficulty: easy): Calculate the inductive spike
at the power supply provoked by 8 output buffers, each
driving 50pF in 4ns, Vdd=3.3V, total bonding
inductance 15nH
Result: dVtot = 1.24V (see Weste pp 205)
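One way to arrive at the quoted 1.24V is sketched below. It assumes each buffer draws an average current I = C·Vdd/t and that the current slope is roughly dI/dt ≈ I/t; this is a crude estimate chosen because it reproduces the stated result, not necessarily the exact derivation in Weste. The function name is illustrative.

```python
def inductive_spike(n_buffers, c_load, vdd, t_switch, inductance):
    """Estimate dV = L * dI/dt for n_buffers switching simultaneously."""
    i_avg = c_load * vdd / t_switch       # average charging current per buffer, A
    di_dt = n_buffers * i_avg / t_switch  # crude slope estimate for all buffers, A/s
    return inductance * di_dt


dv_tot = inductive_spike(n_buffers=8, c_load=50e-12, vdd=3.3, t_switch=4e-9, inductance=15e-9)
print(f"dVtot = {dv_tot:.2f} V")
```

Since the spike scales with 1/t², doubling the switching time to 8 ns would cut it by a factor of four, which is why "increase the rise and fall times" appears among the design techniques.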
Ex vlsi4.2 (difficulty: easy): a) Calculate the power supply width Wpower necessary for feeding a clock buffer running at 50MHz driving 100pF. b) What is the ground bounce with the chosen conductor? (JAL=0.5mA/µm, power supply distance l = 1mm, Vdd=3.3V, Rmetal1 = 72mΩ/sq, tr = tf = 1ns)
Result: a) Wpower = 33 µm, b) dV = 0.72V (see Weste pp 239)
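The two results can be reproduced with the following sketch. It assumes the average supply current is I = C·Vdd·f (from P = C·Vdd²·f), the peak current during an edge is I = C·Vdd/tr, and the rail resistance is Rmetal1 times the number of squares l/W. These assumptions match the quoted numbers; the function names are illustrative.

```python
def supply_width_um(c_load, vdd, f_clk, j_max_ma_per_um):
    """Rail width in um so that the average current stays below the current density limit."""
    i_avg_ma = c_load * vdd * f_clk * 1e3  # average supply current in mA
    return i_avg_ma / j_max_ma_per_um


def ground_bounce(c_load, vdd, t_edge, r_sheet, length_um, width_um):
    """Voltage drop across the rail resistance at the peak switching current."""
    i_peak = c_load * vdd / t_edge             # A
    r_rail = r_sheet * length_um / width_um    # ohms = ohms/square * number of squares
    return i_peak * r_rail


w_power = supply_width_um(100e-12, 3.3, 50e6, 0.5)
dv = ground_bounce(100e-12, 3.3, 1e-9, 72e-3, 1000.0, w_power)
print(f"a) Wpower = {w_power:.0f} um")
print(f"b) dV = {dv:.2f} V")
```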
Ex vlsi4.3 (difficulty: easy): Calculate the clock distribution delay for the example on transparency 25
Result: a) td=55 ns, b) td=27.5 ns, c) td=1.38 ns (see Weste pp 200)
JMM v1.3
Exercises: VLSI-4
Ex vlsi4.4 (difficulty: easy): Calculate Ar and Af for a CMOS inverter (Vdd=3.3V, Alcatel 0.5µm process)
Result: Ar = 43.9 kΩ, Af = 10.9 kΩ (see Weste pp208ff and transparency 7)
Weste pp370: 5.9 ex 14 (difficulty: easy): A low power 3.3V chip has a clock of 12MHz. In the power down-mode, the clock driver drives 5mm of a 2µm wide metal1 wire. If the area capacitance of metal is Ca=2.37pF/µm2 and the sidewall capacitance is Cf0=2.37pF/µm, what is the power-down dissipation, assuming this is the dominant term? What is the dissipation if the wire is reduced to 50µm length?
Result: Pd = 85µW, 0.85µW (see Weste pp 235)
JMM v1.3
VLSI Design I
CMOS Combinational Logic
Overview
Euler rules for complex CMOS gates
Layout and stick diagram
Goal: You know how to design compact layouts of complex CMOS logic gates with the Euler rules. You are familiar with transmission gates and their limitations.
JMM v1.4
How ‘bout more than 1 input?
Logic recipe:
pullup: make this connection when we want F(A1,…,An) = 1
pulldown: make this connection when we want F(A1,…,An) = 0
[Figure: inputs A1 … An driving a pullup network connected to Vdd and a pulldown network, both connected to the output F(A1,…,An)]
Finally! I was getting tired of inverters...
since we want VOH = Vdd, better use only pfets in the pullup path
similarly, since we want VOL = 0, better use only nfets in the pulldown path
looking at pulldown path: since nfets are on when VGS > VTH, output will be pulled low when the right combination of inputs is high…
CMOS gates are naturally inverting
JMM v1.4
Complementary logic
Now you know what the “C”
in CMOS stands for!
We want complementary pullup and pulldown
logic, i.e., the pulldown should be “on” when
the pullup is “off” and vice versa.
pullup pulldown F(A1,…,An)
on off driven “1”
off on driven “0”
on on driven “X”
off off no connection
Since there’s plenty of capacitance on the output

node, when the output becomes disconnected it
“remembers” its previous voltage -- at least for a
while. The “memory” is the load capacitor’s charge.
Leakage currents will cause eventual decay of the
charge (that’s why DRAMs need to be refreshed!).
“No connection” is also useful for constructing tristate drivers! In this case, we call this state “Z” which is short for “high-Z” which is short for “high impedance” which is how engineers say “no connection”. Isn’t jargon wonderful?
JMM v1.4
CMOS complements
What a nice VOH you have... Thanks. It runs in the family...
pulldown: nfet block, conducts when VGS is high
pullup: pfet block, conducts when VGS is low
nfets A and B in series: conducts when A is high and B is high: A.B
pfets A and B in parallel: conducts when A is low or B is low: ¬A + ¬B = ¬(A.B)
nfets A and B in parallel: conducts when A is high or B is high: A+B
pfets A and B in series: conducts when A is low and B is low: ¬A.¬B = ¬(A+B)
JMM v1.4
Development of CMOS gates /1
Example: CMOS NAND gate F = ¬(A*B)
Karnaugh diagram (used in all three steps):
       A=0  A=1
B=0     1    1
B=1     1    0
Step 1: development of nfet block. Logic minimization of “0” in Karnaugh diagram: ¬F = A*B
Step 2: development of pfet block. Logic minimization of “1” in Karnaugh diagram: F = ¬A + ¬B
Step 3: put nfet and pfet block together: F = ¬(A*B)
JMM v1.4
NAND & NOR
2-input NAND. When output is low, two nfets are in series. So to keep output fall time equivalent to that of an inverter, the nfets have to be twice as wide. Pfet widths can be same as those in the inverter (but remember there were already 2x nfet widths!). Can be extended to large fan-in but practical limit is 4 inputs. Why?
2-input NOR. When output is high, two pfets are in series. So to keep output rise time equivalent to that of an inverter, the pfets have to be twice as wide. Nfet widths can be same as those in the inverter. Can be extended to large fan-in but practical limit is 4 inputs. NOR gates are considered less good than NAND gates. Why?
Pseudo-NMOS NOR gates are used to build high fan-in NOR gates for PLA’s to save area (at some cost in static power).
[Figure: schematics of the 2-input NAND (inputs A, B), the 2-input NOR (inputs A, B) and the pseudo-NMOS NOR (inputs A1 … An)]
JMM v1.4
Layout of simple gates
[Figure: layout of a CMOS inverter on a p-type substrate with n-type well: VDD and GND rails, IN and OUT, pfet (Wp, Lp) and nfet (Wn, Ln), metal/pdiff contact with detail removed, contact from metal to ndiff. Layer legend: metal2, metal, poly, n+ diff, p+ diff]
JMM v1.4
Layout Rules #1
layout rules are the common language between

design and process engineers
conservative rules absorb process disturbances and
variations
layout rules must be respected by the designer
layout rules reflect the limits of a process, they
describe:
minimal distance, overlap
minimal width (e.g. channel length, λ)
layout readability is improved using colours:
metal blue
polysilicon red
n-diffusion green
p-diffusion yellow
n-well brown
contact, via black
JMM v1.4
Layout Rules #2
symbol and mask layout of a CMOS inverter
n-well contact (n-diff)
bulk contact (p-diff)
JMM v1.4
Stick Diagram
stick diagrams are technology independent

no layout rules need to be known
mask layout may be generated automatically
JMM v1.4
NAND & NOR (again)
JMM v1.4
Large Fan-In CMOS Gates
CMOS gates with large fan-in suffer from:
body effect
unsymmetrical delay
large delay
⇒ never use more than 4 or 5 fets in series
⇒ increase logic depth
[Figure: gate symbols (&, ≥1) showing a large fan-in function split over several logic levels]
JMM v1.4
CMOS Gate Recipe
Step 1. Figure out pulldown network that does what you want, e.g., F = ¬(A*(B+C))
[Figure: pulldown with nfet A in series with the parallel pair B, C]
Step 2. Walk the hierarchy replacing nfets with pfets, series subnets with parallel subnets, and parallel subnets with series subnets
[Figure: pullup with pfet A in parallel with the series pair B, C]
Step 3. Combine pfet pullup network from Step 2 with nfet pulldown network from Step 1 to form fully-complementary CMOS gate.
[Figure: complete gate with pullup and pulldown networks]
But isn’t it hard to wire it all up?
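Step 2 of the recipe is a purely mechanical transformation, so it can be written down as a short recursive function. The sketch below represents a network as nested tuples (a hypothetical encoding chosen for this example), derives the pullup from the pulldown of the example gate, and checks by exhaustive simulation that exactly one of the two networks conducts for every input combination.

```python
from itertools import product


def series(*parts):
    return ("series",) + parts


def parallel(*parts):
    return ("parallel",) + parts


def dual(net):
    """Swap series and parallel at every level of the hierarchy; fet names stay."""
    if isinstance(net, str):
        return net
    kind = "parallel" if net[0] == "series" else "series"
    return (kind,) + tuple(dual(part) for part in net[1:])


def conducts(net, inputs, fet_type):
    """nfets ('n') conduct when their gate is high, pfets ('p') when it is low."""
    if isinstance(net, str):
        return inputs[net] if fet_type == "n" else not inputs[net]
    results = [conducts(part, inputs, fet_type) for part in net[1:]]
    return all(results) if net[0] == "series" else any(results)


def is_complementary(pulldown, pullup, names):
    """True if exactly one network conducts for every input combination."""
    for values in product([False, True], repeat=len(names)):
        inputs = dict(zip(names, values))
        if conducts(pulldown, inputs, "n") == conducts(pullup, inputs, "p"):
            return False
    return True


pulldown = series("A", parallel("B", "C"))  # conducts when A*(B+C)
pullup = dual(pulldown)
print("pullup network:", pullup)
print("fully complementary:", is_complementary(pulldown, pullup, "ABC"))
```

Because the check passes for all eight input combinations, the output is always driven and never fought over, which is the "C" in CMOS.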
JMM v1.4
Complex CMOS Gates /1
classical CMOS logic gates are always inverting

logic gates
complex CMOS logic gates are a mixture of AND
and OR structures with a final inversion
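Example: F = ¬(A * B + C * D)
Step 1: generation of nfet block (logic “0”): ¬F = A * B + C * D
[Figure: nfets A and B in series, in parallel with nfets C and D in series]
Step 2: generation of pfet block (logic “1”): F = (¬A + ¬B) * (¬C + ¬D)
[Figure: pfets C and D in parallel, in series with pfets A and B in parallel]
Step 3: put everything together. What about the layout?
[Figure: complete transistor schematic next to the equivalent gate symbol (& gates feeding a ≥1 gate), with the question: where is this signal in the transistor schema?]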
JMM v1.4
Complex CMOS Gates Layout /1
Goal: compact layout. All complex gates may be designed using a single row of nfets and a single row of pfets, thus adjacent drain/source diffusions of fets are very close.
Euler rule:
generate an n-graph by replacing the nfet block with vertices for nodes and edges for fets
generate a dual p-graph
find a sequence containing all edges in the n-graph. This sequence is called Euler n-path.
generate an Euler p-path with the same labelling as the Euler n-path. If not possible start again.
the labelling sequence of the 2 Euler paths is the gate sequence of the single row nfet/pfet CMOS gate.
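For small gates the Euler rule can simply be brute-forced: try every gate ordering and keep those that form a continuous path in both graphs. The sketch below does this for F = ¬(A*B + C*D); the node names (N1, N2, N3) and the fet-to-node assignment are one reading of the schematic on the following slide and should be treated as assumptions.

```python
from itertools import permutations

# Each fet is an edge between two nodes (source/drain).
n_graph = {"A": ("F", "N2"), "B": ("N2", "VSS"), "C": ("F", "N3"), "D": ("N3", "VSS")}
p_graph = {"A": ("N1", "F"), "B": ("N1", "F"), "C": ("VDD", "N1"), "D": ("VDD", "N1")}


def is_euler_path(graph, order):
    """True if the fets can be walked in this order, each one starting where the previous ended."""
    ends = set(graph[order[0]])  # the walk may currently end at either terminal of the first fet
    for label in order[1:]:
        a, b = graph[label]
        next_ends = set()
        if a in ends:
            next_ends.add(b)
        if b in ends:
            next_ends.add(a)
        if not next_ends:
            return False  # this fet does not touch the current end: diffusion break needed
        ends = next_ends
    return True


def common_euler_paths(n_edges, p_edges):
    """All gate orderings that are Euler paths in both the n-graph and the p-graph."""
    return [
        order
        for order in permutations(sorted(n_edges))
        if is_euler_path(n_edges, order) and is_euler_path(p_edges, order)
    ]


paths = common_euler_paths(n_graph, p_graph)
print(f"{len(paths)} common orderings, e.g.", " -> ".join(paths[0]))
```

Any ordering in the result can be used as the left-to-right sequence of poly gates in a single-row layout without diffusion breaks.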
JMM v1.4
Complex CMOS Gates Layout /2
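[Figure: transistor schematic of the example gate with pullup nodes VDD, N1, F (pfets C, D, A, B) and pulldown nodes F, N2, N3, VSS (nfets A, B, C, D), and the corresponding n-graph and p-graph with their start nodes marked]
Euler path: A -> B -> D -> C
[Figure: stick diagram of the gate with the gate sequence A B D C]
[Figure: schematic of a further example gate with inputs A, B and C]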
JMM v1.4
A Quiz! /1
JMM v1.4
A Quiz! /2
Find the minimal transistor circuit (2 * 4 fets) and the most compact layout using Euler’s rule.
AB \ CD   00  01  11  10
00         1   1   1   1
01         0   0   0   0
11         0   0   0   0
10         1   0   0   0
JMM v1.4
Quiz : Solution
F = ¬A*¬B + ¬B*¬C*¬D
F = ¬B * (¬A + ¬C*¬D)   equation ready for p-block
[Figure: p-graph (nodes VDD, P1, P2, F) and n-graph (nodes VSS, N1, F) with their start nodes marked]
Euler path: D -> C -> A -> B
JMM v1.4
Transmission Gates
S
CMOS nMOS
A B A B
S
S
If VA = VDD then current will flow from

A to B until VB = _____
If VA = 0 then current will flow from

B to A until VB = _____
Assuming S and -S are complementary signals, the CMOS transmission gate (TG) acts as a switch, controlled by S, that has no inherent voltage drop (unlike a switch constructed from a single nfet or pfet, which exhibits a VT drop at one rail or the other).
JMM v1.4
CMOS TG Electrical Model
S=VDD S=0
A B A B
S=0 S= VDD
switch is off switch is “on”
How on is “on”? Assume VA = VDD then
VB range                 nfet   pfet
0V ... |VT,p|            sat    sat
|VT,p| ... VDD-VT,n      sat    lin
VDD-VT,n ... VDD         off    lin
[Figure: Req,p, Req,n and Req,TG = Req,n || Req,p plotted versus VB from 0V to VDD, with VDD-VT,n marked]
JMM v1.4
TG Circuits: MUX
A
Y=A*S+B*S
B
Is this node
always the “output”
S of this gate?
inverter
not drawn
JMM v1.4
TG Circuits: 4 to 1 MUX
multiplexers can easily be done with TG
never forget that TG are bi-directional
compact layout by combining identical gates
B
F
C
S1
S2
JMM v1.4
Best XOR in Town
A ≥1&
A =1 F B F
B ≥1
12 transistors
A
A*B+A*B
B
Is this node
8 transistors of this gate?
A A*B+A*B
B Is this node
of this gate?
6 transistors
JMM v1.4
TG Quiz
Find the function of the following 4 transistor circuit:
JMM v1.4
TG Circuits: Problems
difficult to get compact layout
outputs behave like bi-directional signals
many TG in series provoke large delays
Uin Uout
R R R R R
Uin Uout
C C C C C
τ = 2.2 ⋅ (RC )2
JMM v1.4
Coming Up...
Next topic…
Dynamic (precharge/evaluate) logic circuits: CMOS domino logic, NP domino logic, CVSL logic. Charge sharing.

Weste:
Sections 5.3 thru 5.3.4 and 5.4.6
5.3.9 thru 5.4.1
JMM v1.4
Exercises: VLSI-5 #1
Ex vlsi5.1 (difficulty: easy): Design a CMOS gate that

implements the function
Out = (( A + B) ⋅C + D ⋅ E ) ⋅ F
Ex vlsi5.2 (difficulty: easy): What is the Boolean

equation of the following CMOS gate.
VDD
A
B
Z
GND
JMM v1.4
Exercises: VLSI-5 #2
Weste pp371: 5.9ex7 (difficulty: easy): Design a pass transistor network that implements the sum function for an adder
S = A⋅¬B⋅¬C + ¬A⋅B⋅¬C + ¬A⋅¬B⋅C + A⋅B⋅C
JMM v1.4
VLSI Design I
Dynamic Logic Gates
Overview
Dynamic logic gates, Domino, NORA, CVSL structure,
Goal: You are familiar with dynamic logic gates and their different families. You can handle dynamic logic problems like charge sharing and timing.
JMM v1.3
Tinkering with Logic Gates
Things to like about CMOS gates:
easy to translate logic to fets
rail-to-rail switching
good noise margins
no static power since fets are in cutoff
sizing not critical to correct operation
Things not to like about CMOS gates:
N inputs ⇒ 2N fets (i.e., one nfet and one pfet per input)
large circuit area, especially for pfets
“heavy” loading of inputs
pfets are either large or slow relative to nfets
series connections can get very slow
We can replace pfet pullup network with pseudo-NMOS load (pfet with grounded gate) but
dissipate static power when output is low
have to make load fet small to ensure that VOL is low enough to cut off nfets in next stage
reduces static power consumption (good!)
increases output rise time (bad!)
One alternative: dynamic CMOS gates
JMM v1.3
Dynamic CMOS Gates
[Figure: dynamic gate with “precharge” switch, nfet pulldown network with inputs A and B, and “evaluate” switch, both switches controlled by CLK]
inputs must be stable before CLK goes high because once output has been discharged it won’t go high again until next cycle
for same reason, noise/glitches on inputs cannot exceed nfet threshold, a much more stringent requirement than for static CMOS gates.
[Figure: clock and output waveforms over the precharge phase and the evaluate phase]
JMM v1.3
There’s good news & bad news
The good news:
Dynamic gates are faster than static gates despite the extra “evaluate” fet in the pulldown path because of the reduction in self-loading and the elimination of the pullup short-circuit current during the first part of the output transition.
The bad news:
Dynamic gates cannot be cascaded.
Because of the finite pulldown time of the first gate’s output node, the second gate’s output node starts to discharge!
[Figure: two cascaded dynamic gates (nfet blocks, precharge and evaluate fets on CLK) with the two output nodes marked]
Solution: develop techniques that avoid races
CMOS Domino logic
CMOS NORA (no race) logic
JMM v1.3
CMOS Domino Logic
[Figure: two cascaded domino gates (nfet blocks clocked by CLK), each followed by a buffer inverter. Dynamic node: precharge: high; evaluate: falls (maybe). Buffer output: precharge: low; evaluate: rises (maybe). Buffer might be needed in any case for high fan-out circuits.]
When CLK is low, dynamic node is precharged high and buffer inverter output is low. Nfets in the next logic block will be off.
When CLK goes high, dynamic node is conditionally discharged and the buffer output will conditionally go high. Since discharge can only happen once, buffer output can only make one low-to-high transition.
When domino gates are cascaded, as each gate “evaluates”, if its output rises, it will trigger the evaluation of the next stage, and so on… like a line of dominos falling. Like dominos, once the internal node in a gate “falls”, it stays “fallen” until it is “picked up” by the precharge phase of the next cycle.
Thus many gates may evaluate in one eval cycle.
JMM v1.3
More Domino-style Circuits
weak pfet “keeper” keeps dynamic node pulled high during evaluate phase if it’s not being pulled down through nfets ⇒ gate is static in both clock phases.
[Figure: domino gate (CLK, nfets) with weak pfet keeper]
“latching” pfet acts like keeper above unless dynamic node gets pulled down during evaluate phase. When buffer output goes high it switches keeper off saving static power. Good for leakage current problems...
[Figure: domino gate (CLK, nfets) with latching pfet]
Note that you can put an even number of static gates after the inverter and before the next domino gate.
Be careful of cap. coupling to dynamic node (see later slide).
Use NOR gate instead of inverter as the buffer to make a faster high fan-in AND gate. Same trick works for high fan-in OR or MUX functions.
JMM v1.3
Optimising Domino Logic (I)
[Figure: two cascaded domino gates (nfets, CLK); the input of the second gate is low during precharge, so is its evaluate nfet not needed?]
Since domino gate outputs are low during the precharge phase, gates which have only domino output nodes as inputs don’t need the “evaluate” nfet since all the nfets in the pulldown will be off anyway.
But remember: if evaluate nfet is removed, precharge will “ripple” through cascaded gates just like evaluates do. Maybe only remove for gates where nfet stack is tall (i.e. resistive) enough that pullup will start to “win” anyway before ripple reaches gates and turns off pulldowns.
JMM v1.3
Optimising Domino Logic (II)
In domino logic circuits we want evaluate to happen as quickly as possible. We can size fets to optimise evaluate speed.
[Figure: domino gate (nfets, CLK) with fets labelled small / large]
Some designers also “grade” the sizes of the nfets, smallest at the top (increase in R offset by decrease in C)
If we make the nfet in the output inverter much smaller than the pfet then
the load on the internal node decreases, and
the switching threshold of the inverter increases
Both effects make the gate evaluate sooner. If large >> small, the gate delay can be cut almost in half! However, the other edge is very slow, so ripple precharge is a problem.
JMM v1.3
“All that glitters is not gold”
There are a few “little” difficulties:
“charge sharing” between nodes in the pulldown network and the dynamic node can unintentionally reduce the voltage of the dynamic node enough to switch output buffer
the addition of the output inverter makes domino gates non-inverting. One can often design around this limitation, but some circuits cannot be implemented solely using domino logic unless both polarities (true and complement) of the inputs are available. If both polarities of inputs are available then we can generate both polarities of internal signals with two domino gates so subsequent stages will have both polarities of their inputs available too.
JMM v1.3
Charge Sharing (I)
[Figure: dynamic gate clocked by CLK with a chain of series nfets driven by F=0->1, E=1, D=1, C=1, B=1 and A=1->0; the dynamic node has capacitance 3C and the intermediate nodes have capacitances 1.5C, 1.5C, C, C and C]
Suppose the dynamic node has been discharged during the previous evaluate cycle. Then during precharge, all the intermediate nodes in the pulldown chain will remain discharged while the dynamic node is precharged. Calculate the voltage on the dynamic node when CLK goes high. When CLK goes high, the voltage on the dynamic node goes to
$$\frac{3C}{3C + 6C}\,V_{DD} = 1.1\,\text{V for } V_{DD} = 3.3\,\text{V}$$
which is low enough to switch the output inverter.
Fortunately this situation is easily detected by CAD tools and can be resolved by (1) adding additional precharge devices to intermediate nodes or (2) increasing size of output buffer which will increase capacitance of dynamic node (faster output buffer may compensate for larger internal capacitance).
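The 1.1V figure is plain charge conservation: the charge stored on the precharged dynamic node is redistributed over the dynamic node plus all initially discharged intermediate nodes. A minimal sketch with capacitances in units of C; the function name is illustrative, and the second call shows remedy (2) under the assumption that a larger buffer doubles the dynamic node capacitance.

```python
def voltage_after_sharing(c_dynamic, c_intermediate, vdd):
    """Dynamic node voltage after sharing its charge with discharged intermediate nodes."""
    return c_dynamic / (c_dynamic + sum(c_intermediate)) * vdd


INTERMEDIATE = [1.5, 1.5, 1.0, 1.0, 1.0]  # intermediate node capacitances, sum = 6C

v_shared = voltage_after_sharing(3.0, INTERMEDIATE, 3.3)
print(f"dynamic node drops to {v_shared:.2f} V")

# Assumed remedy: larger output buffer raises the dynamic node capacitance to 6C.
v_bigger = voltage_after_sharing(6.0, INTERMEDIATE, 3.3)
print(f"with 6C on the dynamic node: {v_bigger:.2f} V")
```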
JMM v1.3
Charge Sharing (II)
n-logic
n-logic n-logic
n-logic
CLK
additional precharge devices to

eliminate charge sharing problems
JMM v1.3
Capacitive Coupling
OUT
CLK
OUT
t
Coupling can also occur between other signal wires and long dynamic nodes (e.g., ones that span multiple bits in a datapath). Solutions: on long routes add “twists” to avoid continuous routes or route dynamic signals between mutually exclusive or complementary signals.
JMM v1.3
Domino Logic Design
To convert to Domino-style design we need to create schematic that uses non-inverting gates:
(1) look for CMOS gates followed by inverter
(2) use De Morgan’s Law to create non-inv gates
[Figure: static schematic with inputs A to H and outputs X and Y, marking where to use De Morgan’s law, and the schematic converted to Domino, with gates labelled Domino AND, Domino AND-OR and Domino OR]
JMM v1.3
Domino Logic Design (II)
X Y
8/2 8/2 8/2
A E G H
B D F
C nfet W/L = 4
pfet W/L = 8
CLK
s = static
d = domino (W/L = 4)
dd = domino (W/L = 8)
JMM v1.3
Dual-rail Domino Logic
Domino circuits that generate both polarities of output
CLK CLK
A A
A B A B
B B
CLK CLK
CLK
A A
A
B B
CLK
JMM v1.3
Multiple-output Domino
Why stop at complementary outputs? There are interesting
multiple-output functions where there is a lot of sharing of nfets in
the evaluate logic. For example, in a carry-lookahead adder
Gi = Ai·Bi
Pi = Ai + Bi
C1 = G1 + P1·C0
C2 = G2 + P2·G1 + P2·P1·C0
C3 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·C0
C4 = G4 + P4·G3 + P4·P3·G2 + P4·P3·P2·G1 + P4·P3·P2·P1·C0
CLK
C4
P4 G4
C3
P3 G3
C2
P2 G2
C1
P1 G1
C0
Domino version of the

Manchester carry chain
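The carry equations can be checked against ordinary addition. The sketch below is illustrative only: the function names are hypothetical, and the loop uses the unexpanded recurrence Ci = Gi + Pi·C(i-1), which expands to the sums of products listed on the slide.

```python
def lookahead_carries(a_bits, b_bits, c0):
    """Return [C1, C2, ...] for LSB-first bit lists, using Ci = Gi + Pi*C(i-1)."""
    carries = []
    carry = c0
    for a, b in zip(a_bits, b_bits):
        g = a & b                # generate:  Gi = Ai*Bi
        p = a | b                # propagate: Pi = Ai+Bi
        carry = g | (p & carry)  # same value as the expanded sum of products
        carries.append(carry)
    return carries


def c4_expanded(g, p, c0):
    """C4 written out as on the slide; g = [G1..G4], p = [P1..P4]."""
    g1, g2, g3, g4 = g
    p1, p2, p3, p4 = p
    return (g4 | p4 & g3 | p4 & p3 & g2 | p4 & p3 & p2 & g1
            | p4 & p3 & p2 & p1 & c0)


# 4-bit example: 1011 + 0110 with C0 = 0, bits listed LSB first.
print(lookahead_carries([1, 1, 0, 1], [0, 1, 1, 0], 0))
```

The shared prefixes of the product terms (P4·P3, P4·P3·P2, ...) are what the multiple-output domino gate shares as nfets in the chain.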
JMM v1.3
Dual-rail “Keeper” Circuit
CLK
A
A B
B
CLK
The cross-coupled pfets serve as “keepers”
for the output which is high, making the gate
static rather than dynamic! During precharge
both keepers are off; during the evaluate
phase, the output that goes low switches
on the keeper for the output that is staying
high. Really solves capacitive coupling
problems with dynamic logic in datapaths.
JMM v1.3
Cascade voltage switch logic (CVSL)
Q Q clock Q
Q
nmos nmos
combinatorial combinatorial
network network
clock
The static version might be quite slow due to the nfet/pfet “fight” during switching.
[Figure label: dynamic CVSL]
Q
Q
d e
d a
b a
e b c
c
JMM v1.3
CMOS NORA Logic (NP Domino)
p blocks n blocks n blocks p blocks
pre eval pre
nfets pfets nfets
CLK eval CLK pre CLK eval
If we turn a dynamic gate “upside down” and use pfets to build the
logic block, we get a logic gate that “precharges” low and
“discharges” high. By using these gates in an alternating sequence
with regular nfet dynamic gates we can eliminate the race problem
we had with nfet-only dynamic gate sequences and hence we don’t
need the buffer inverter present in domino gates.
Removing the buffer is a mixed blessing since we may need it for
drive reasons and to keep compatibility with other domino gates. It
also makes NORA logic very susceptible to noise since during the
evaluate phase all information is stored dynamically.
JMM v1.3
Domino Life Cycle
[Figure: clock-face diagram of the four states: actively precharging; waiting for data (holding precharge); actively evaluating; waiting for precharge (holding output value)]
The “9 O’clock” state is very interesting: once a Domino gate has
finished evaluating, the gate’s immediate predecessors (there might be
several gates) can start to
precharge (forcing the gate’s inputs low) without affecting the value of
the gate’s output. The gate is acting as a latch so long as its
predecessors don’t start another evaluate cycle.
Perhaps we can build a pipeline of domino stages where each stage
serves as both logic and latch depending on where it is in its cycle.
Need to have each stage supply its own precharge/evaluate timing
dependent on what its neighbours are doing...
JMM v1.3
Self-timed Pipelines
0 = precharged
1 = evaluation done
[Figure: pipeline stages F1, F2, F3, each with a P/E control input and a “done?” detector on its output]
Simplest correctness rules:
a stage only precharges when both
(a) its successor has finished evaluating
(it’s done with our values): Sdone = 1
(b) its predecessor has finished precharging
(old values are gone so we can’t use ‘em twice!): Pdone = 0
a stage only evaluates when both
(a) its successor has finished precharging
(our new output won’t affect its stored value): Sdone = 0
(b) its predecessor has finished evaluating
(there are new inputs for us to consider): Pdone = 1
So, what logic goes in the clouds?

And how do we build the “done?” boxes?
JMM v1.3
Muller C-Element
Add weak feedback
inverter if we’re worried
about dynamic storage
for precharge/eval signal
P/E
Pdone
Sdone
The Muller C-Element is the “AND” gate for self-timed
logic because it changes its output only after both inputs
have changed. As shown above, it’s an elegant
implementation for both sets of rules on the previous
slide.
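A behavioural model makes the C-element's role concrete. This is a sketch under stated assumptions, not the circuit from the slide: the class and function names are hypothetical, P/E = 1 is taken to mean "evaluate", and Sdone is inverted before it reaches the C-element so that the two rule sets (precharge when Pdone = 0 and Sdone = 1, evaluate when Pdone = 1 and Sdone = 0) map onto "both inputs low" and "both inputs high".

```python
class CElement:
    """Muller C-element: output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:          # both inputs have changed to the same value
            self.out = a
        return self.out     # otherwise hold the previous output


def stage_control(c, pdone, sdone):
    """P/E signal for one stage (assumed polarity: 1 = evaluate, 0 = precharge)."""
    return c.update(pdone, 1 - sdone)


c = CElement(initial=0)                 # start in precharge
for pdone, sdone in [(1, 1), (1, 0), (0, 0), (0, 1)]:
    print(pdone, sdone, "->", stage_control(c, pdone, sdone))
```

The sequence shows the stage holding its phase until both neighbours agree, then switching.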
JMM v1.3
Completion Detectors
Self-timed logic
use dual-rail signalling (i.e., two wires) to encode
reset (not yet evaluated) 00
ready with value 0 01
and then build handshake logic that starts
next stage when current stage is done and next
stage has completed its previous computation
and delivered its values...
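A completion detector for such a code can be sketched as follows. The slide lists 00 for reset and 01 for value 0; the code word 10 for value 1 and all names below are assumptions made for illustration.

```python
RESET = (0, 0)  # both wires low: not yet evaluated


def encode(bit):
    """Dual-rail code word (wire1, wire0): 01 = value 0 (from the slide),
    10 = value 1 (assumed here)."""
    return (1, 0) if bit else (0, 1)


def is_done(word):
    """Completion detector: true once every bit pair holds a valid value,
    i.e. exactly one wire of each pair is high."""
    return all(w1 ^ w0 for w1, w0 in word)


def is_precharged(word):
    """True when every bit pair is back in the reset state."""
    return all(pair == RESET for pair in word)


word = [RESET, encode(1)]
print(is_done(word))                    # one bit still evaluating
print(is_done([encode(0), encode(1)]))  # all bits valid
```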
JMM v1.3
Self-timed Pipeline Latency
1 = precharged
0 = evaluation done
[Figure: stages F1, F2, F3, each with a C-element (C) driving P/E and a “done?” detector]
Propagation through self-timed pipelines
is constrained in both directions:
In the forward direction by how long it takes for the evaluate
edge in one stage to trigger the evaluate edge in the next stage:
$L_F = t_{F\uparrow} + t_{D\downarrow} + t_{C\uparrow}$
In the reverse direction by how long it takes
for the precharge in one stage to trigger a new
evaluate in the stage after first evaluating the previous stage
(remember not to double count!):
$L_R = 0.5 \cdot (t_{C\downarrow} + t_{F\downarrow} + t_{D\uparrow} + t_{C\uparrow} + t_{F\uparrow} + t_{D\downarrow})$
JMM v1.3
Further Improvements
We don’t have to delay evaluation until successor has finished
its precharge (signalling that it’s finished with our values). We
can just check that successor has started precharging… Even
with this improvement, the correct sequencing will still happen
for any combination of precharge and evaluate times for all the
gates.
We can modify the control element like so:
[Figure: modified control element; signals S, Pdone, Sdone, P/E]
Eliminate the “extra” inverter for good measure
and use dynamic storage as control element memory.
We’re going to stop here, but there are other
improvements that can be made. Hint: do we have
to wait until the predecessor is done computing
new values before starting our eval? etc., etc., etc.
JMM v1.3
Dynamic Logic Summary
Advantages of dynamic logic:
smaller area than fully static gates
smaller parasitic capacitances hence higher
speed
reliable operation if correctly designed.
Concerns:
capacitive coupling to dynamic nodes
charge sharing with dynamic nodes
subthreshold leakage currents in eval logic
minority carrier injection and latchup
alpha particle immunity
vdd/gnd noise and resistance
This makes dynamic logic a good choice for those parts of

a circuit where the extra engineering investment is
justified, e.g., along the critical timing paths.
Engineers who like this sort

of design will find this the sort
of design they like!
JMM v1.3
Coming Up...
Next topic…
CMOS sequential logic.
Readings for next time ...

Weste:
5.4.4 (dynamic CMOS logic)
5.4.7 - 5.4.11 (CMOS domino logic, CVSL), except
5.4.10
JMM v1.3
Exercises: VLSI-6
Weste pp371: 5.9ex8 (difficulty: easy): Design a
CVSL gate for the following function:
S = A ⋅B ⋅ C + A ⋅B ⋅C + A ⋅B ⋅ C + A ⋅B ⋅C
JMM v1.3
VLSI Design I
Clocking Strategies
“I take care of it” ?
Generator
Clock
Today’s handouts:
(1) Lecture Slides
JMM/ESA v1.0
VLSI Systems Design
Microelectronic Technologies
Overview
microelectronic technologies, ASIC, FPGA, µC
Goal: You are familiar with the microelectronic

technologies, and know their advantages and features.
JMM v1.4
Microelectronic Technologies
What is microelectronics?
Does a microelectronics design engineer only need
good knowledge of silicon, layout, etc.?
application specific
integrated circuit
macro cell full custom
standard cell
gate array microprocessors
PIC, COP
FPGA RISC
uController
PAL CPLD signal processor
field programmable logic
JMM v1.4
Gate Array Technology #1
prefabricated wafers
I/O stages predefined
regular array of fets and interconnection channels
interconnection defines functionality
features
size: 100 - 1M gates
short turn around time
cheap at medium quantities
unsuitable for regular structures like RAM, PLA, ALU
JMM v1.4
Gate Array Technology #2
3 cells of a gate array are illustrated

1 cell corresponds to a 2 input nand gate
JMM v1.4
Sea-of-Gate Technology
prefabricated wafers
I/O stages predefined
regular array of fets, no reserved interconnection
channels
interconnection defines functionality
features
size: 100 - 1M gates
short turn around time
cheap at medium quantities
regular structures like RAM, PLA, ALU can be used
JMM v1.4
SOG Example
nwell contacts
INV NOR2
GND
3 nfets
2 small, 1 large horizontal
mosfets with wiring tracks
common gate in metal-1
3 pfets
gate isolation VDD

mosfets
unused horizontal
and vertical tracks
used for wiring
gates together.
Better granularity
if main routing
channels run
vertically.
GND
substrate
contacts
vertical wiring tracks
in metal-1 or metal-2
JMM v1.4
Standard Cell Technology
complete fabrication process
predefined library of base functions
modular similar to TTL families
features
chip size limits complexity
long turn around time
cheap at high quantities
standardized cell height
unsuitable for regular structures
more flexible and compact (1:4) than gate array
JMM v1.4
Standard Cell Example
Create a library of pre-laid-out cells, e.g., boolean gates,
registers, muxes, adders, I/O pads, … A data sheet for
each cell describes the cell’s function, area, power,
propagation delay, output rise/fall time as function of
load, etc.
Quiz: what’s the cell’s function?
It’s just like designing with board-level components.
CAD tools help with placing the cells to minimize area
and to meet timing constraints (perhaps directed by a
floorplan created by the user); routers make the
appropriate connections between the cells.
JMM v1.4
Full Custom Technology

total flexibility, only limited by layout rules
manual design
features
long design and fabrication time
efficient use of silicon area
cheap only at highest quantities (e.g. uP, memories, ...)
JMM v1.4
Macrocell Technology #1

combines semi- and full custom technologies
predefined library of base functions
generators for regular structures
features
short design, long fabrication time
cheap at high quantities
high flexibility,
compact layouts
PLA
macro cell
RAM
JMM v1.4
Macrocell Technology #2
2-dim array of standard cell block
full custom block
full custom block
JMM v1.4
FPGA Technology #1
field programmable device
no fabrication needed for customizing
predefined logic blocks
unsuitable for regular structures
features
size: up to 2’000’000 logic gates (see Virtex from Xilinx)
large silicon area necessary (72 million fets, 10x Pentium2)
short design and customize time
cheap for small quantities
compared to ASICs, FPGAs have a reduced clock speed
circuit configuration downloadable (RAM or PROM)
JMM v1.4
FPGA Technology #2
configurable
logic block (CLB)
I/O buffers
switching
matrix
I/O buffers
I/O buffers
routing
channels
I/O buffers
configuration
- mask programmable
- one time programmable
- downloading of configuration from host into internal RAM
- downloading of configuration from on board serial ROM
JMM v1.4
FPGA Technology #3
CLB from Xilinx series XC5200
[Figure: CLB block diagram. Inputs G1...G4, F1...F4, C1...C4 (H1, Din/H2, SR/H0, EC) and K (clock); logic function generators producing F’, G’ and H’; two D flip-flops with S/R control, SD, RD and EC inputs and Din bypass; outputs X, Y, XQ and YQ]
FPGA Technology #4
Switching matrix with CLBs
CLB CLB CLB
PSM PSM
CLB CLB CLB
PSM PSM
CLB CLB CLB
JMM v1.4
uC Technology
field programmable device

no fabrication needed for customizing
simple C software compilers
software vs. hardware solutions
features
4 or 8 bit CPU, size: 512 bytes or more
down to 8 pins
AD, usart, timer, etc. included
very slow compared to hardware solutions
cheap (<$2)
PIC
36 mm
JMM v1.4
How to select a technology
Selection arguments
- cost
- speed
- size
- time to market
cost
units ASIC
FPGA
NRE
units
design
design
break even quantity
JMM v1.4
Coming Up...
Next topic…
Hardware description language VHDL, top-down
design.

Xilinx article: The total cost of ownership
JMM v1.4
Exercises: VLSI-8 #1
Ex vlsi08.1 (difficulty: easy): Calculate the breakeven
point between an FPGA and ASIC design. Assume a
design time of 6 months and an additional back-end
design time of 1 month for the ASIC. The NRE
costs of the ASIC are 75kEuro, the cost per unit
is 150Euro for the FPGA and 3 Euro for the ASIC.
The cost of 1 engineer per month is 10kEuro.
Result: breakeven at 578
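One way to arrive at the stated result is sketched below. It assumes that the 6 months of design time are common to both solutions and cancel, so the ASIC's extra fixed cost is the NRE plus one engineer-month of back-end design; the function name is hypothetical.

```python
def breakeven_units(nre, extra_design_cost, unit_cost_fpga, unit_cost_asic):
    """Quantity at which the ASIC's extra fixed cost is paid back by its
    lower cost per unit."""
    return (nre + extra_design_cost) / (unit_cost_fpga - unit_cost_asic)


# Exercise values: 75 kEuro NRE, 1 extra engineer-month at 10 kEuro,
# 150 Euro per FPGA, 3 Euro per ASIC.
units = breakeven_units(75_000, 1 * 10_000, 150, 3)
print(f"breakeven at about {units:.0f} units")
```

Below this quantity the FPGA is the cheaper solution; above it the ASIC is.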
JMM v1.4
Exercises: VLSI-8 #2
Ex vlsi08.2 (difficulty: medium): Calculate the
breakeven point between an FPGA and ASIC design.
Assume the design costs from exercise vlsi08.1
and a fabrication time of 3 months for the ASIC.
The revenue per sold system at a product lifetime
of 4 years is 600Euro without taking into account
the FPGA/ASIC chip costs. Use the triangular
time-to-market model from Synopsys (see Xilinx
article “The total cost of ownership”).
Result: breakeven at 14068 FPGA solutions
maximum available
units/time revenue
time
delayed market d
introduction L
product life
JMM v1.4
VLSI Design I
Regular Logic Structures
Today’s handouts:
(1) Lecture Slides
JMM v1.2
Goals for Regular Logic Structures
Look for a systematic physical structure:
- get handle on layout for “random” logic
- automate layout task once schematic is done
- may have several structures to choose from, each optimized
along a different design dimension
standard cells, gate arrays
But we still have to draw the schematic! So look for
systematic logical structures:
- may lead to additional systematic physical layouts
- find canonical logic representations that can be automatically
turned into compact physical structures (automate, automate,
automate…)
- would like to be able to make changes in the logic without
having to redo entire layout -- look for “ECO-tolerant”
structures (engineering change orders)
muxes, ROMs, PLAs
JMM v1.2
Useful Logic Forms
Truth tables
- direct implementation as muxes, ROMs
- good when you have many outputs and few inputs since cost of
“decoding” inputs is fixed
- ECO-tolerant but often not efficient use of logic
Minimum Sum-of-Products (SOP, AND-OR)
- minimize no. of literals (small fan-in ANDs) or no. of products
(small fan-in ORs)
- maximum sharing of product terms for multiple-output functions
- if fan-ins are small: direct implementation as complex gates or
as 2-levels of ANDs then ORs
- if fan-ins aren’t small: multiple levels of gates (e.g., parity,
“Achilles heel” = $2^{n-1}$ minterms)
- efficient use of logic, but not very ECO-tolerant
But how do we minimize the number of literals or
minterms? Yeah, we know about Karnaugh maps, but
they aren’t so good for more than 4 inputs or for
maximizing minterm sharing.
JMM v1.2
Logic Manipulation
Start with two-level minimization
- by inspection searching for terms that are logically
adjacent:
$p \cdot x + p \cdot \bar{x} = p \cdot (x + \bar{x}) = p \cdot 1 = p$
- Karnaugh maps for simple situations
- Quine-McCluskey otherwise
Then try to generate multiple levels:
- factoring. Choose literal that appears in most product
terms (>1) and factor it out.
$F = a \cdot c + a \cdot d + b \cdot c + b \cdot d + a \cdot e$
$= a \cdot (c + d) + b \cdot (c + d) + a \cdot e$
- factor again with or-terms that appear in multiple places
$F = (a + b) \cdot (c + d) + a \cdot e$
- find common subexpressions (multiple output
decomposition)
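A factoring step is easy to get wrong by hand, so it is worth confirming that the factored form still computes the same function. The sketch below compares the two-level and factored forms of F over all input combinations; the function names are illustrative.

```python
from itertools import product


def f_two_level(a, b, c, d, e):
    """F = a*c + a*d + b*c + b*d + a*e (10 literals)."""
    return (a & c) | (a & d) | (b & c) | (b & d) | (a & e)


def f_factored(a, b, c, d, e):
    """F = (a + b)*(c + d) + a*e (6 literals)."""
    return ((a | b) & (c | d)) | (a & e)


# Exhaustive check over all 32 input combinations.
same = all(f_two_level(*bits) == f_factored(*bits)
           for bits in product((0, 1), repeat=5))
print("equivalent:", same)
```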
JMM v1.2
Muxes as “lookup tables”
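A B C | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
[Figure: 8-input mux selected by A,B,C with the F column as data inputs, and the equivalent 4-input mux selected by A,B with data inputs 0, C, C, 1]
Easy to implement but not necessarily
compact even when implemented with TGs.
But you can make a nice Boolean Unit:
[Figure: 4-input mux with data inputs OP0...OP3, select inputs A,B and output F; transmission-gate implementation with labels A, B, out, Vcc, gnd]
OP<3:0> | F
0 0 0 0 | ZERO
1 0 0 0 | AND
1 1 1 0 | OR
0 1 1 0 | XOR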
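The Boolean Unit is a 4-entry lookup table whose contents are the opcode. A minimal sketch, assuming that the select inputs A,B pick bit OP[2A+B] (this ordering is inferred from the AND row, where only OP3 is set); the names are hypothetical.

```python
# Opcodes written as OP3 OP2 OP1 OP0, as in the table above.
OPS = {"ZERO": 0b0000, "AND": 0b1000, "OR": 0b1110, "XOR": 0b0110}


def boolean_unit(op, a, b):
    """4-input mux: A,B select one of the four opcode bits."""
    return (op >> (2 * a + b)) & 1


for name, op in OPS.items():
    row = [boolean_unit(op, a, b) for a in (0, 1) for b in (0, 1)]
    print(f"{name:4s} AB=00,01,10,11 -> {row}")
```

Any of the 16 two-input functions can be selected the same way by choosing a different 4-bit opcode.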
JMM v1.2
Read-only Memories
[Figure: ROM array with address decoder rows 0 to 7, inputs A, B, C and outputs F1, F0; a mark is drawn if a connection or mosfet is present, blank otherwise]
Address decoder implemented as AND (= NOR).
Note: all but one row pulled down for given input.
For each Fi, OR together all rows for which output
is 1 (actually use NOR then invert).
Like muxes, but share decoding logic among
all outputs. Potential optimizations:
- delete rows with no output pulldowns
- look for “adjacent” rows with identical
output pulldown configurations and
merge into single row.
Are these worth doing?
JMM v1.2
PLAs
In fact, the optimizations from the previous
slide are so worthwhile that we have a
name for the resulting “optimized” ROM:
Programmable Logic Array, or PLA for short.
[Figure: PLA with “AND” plane (inputs A, B, C) and “OR” plane (outputs F1, F0); rows labelled 4,5,6,7 / 2,3 / 1]
Hint: for greater ECO-tolerance, add
a few extra empty rows!
What are the logic equations for F1
and F0?
PLAs are usually constructed directly from minimized
SOP logic equations: the rows represent the minterms of
the equations, the “input” columns form the minterms
and the “output” columns form the sums.
Note that with multiple output columns,
minterm sharing between the outputs happens naturally...
JMM v1.2
PLA Folding
PLAs can be sparse, i.e., only a few of the
possible connections in either plane may
be made. (AND plane can only have 50%!)
A A B B C C D D F1 F2
1
2
3
4
5
6
If we allow input and outputs to come from

both above and below then we may be able
to fold two columns into one if the rows
they use don’t overlap. This may require
rearranging the rows to minimize overlap
and hence maximize folding possibilities.
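[Figure: folded PLA; inputs A and B (true and complement) and output F1 enter from the top, inputs D and C (true and complement) and output F2 from the bottom; row order 6, 1, 2, 3, 4, 5]
Row folding is another possible optimization
(but not in this example).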
JMM v1.2
Multiple-input encoding
On the previous slide, it was noted that
the AND plane can have at most 50% of
its connections programmed. Why?
To improve the utilization of the input

columns, consider encoding the 4 columns
used to transmit the two input literals and
their complements with some more useful
functions of the two literals. For example:
AB AB AB AB
A A B B
AIN BIN
AIN BIN
You get extra computing oomph: for example, it’s now
possible to compute (A xor B) using
a single row rather than the two rows it took
with the old encoding.
JMM v1.2
Datapath Operators
Most digital functions can be divided into the
following categories:
- datapath operators
- memory elements
- control structures
- I/O cells
Datapath operators form an important subclass of
VLSI design that benefits from the structured design
principles of hierarchy, regularity, modularity and
locality.
- N-bit data is generally processed by the use of N
identical subcircuits.
- Data operations may be sequenced in time or space.
JMM v1.2
Datapath Operator Example
Magnitude operator example:
- data may be arranged to flow in one direction
- control signals are introduced in an orthogonal direction
to the dataflow
less than or equal
B m
m Z
- =0
m
A
If (A<=B) then Z=A else Z=B
ctrl
Am
Bm - =0 if Zm
Am-1
B m-1 - =0 if Zm-1
m bits
A1
B1 - =0 if Z1
A0
B0 - =0 if Z0
subtractor equal-zero mux
metal1 control flow
metal2 data flow
JMM v1.2
Coming Up...
Next topic…
Sequential logic: state elements, latches and
registers. Static vs. dynamic storage. Single and
multiphase clocking strategies. Setup and hold
times; propagation delays.
Weste:
- Sections 8.1 thru 8.2 (data operators)
- 8.3.2, 8.4.2 (just read, don’t study)
JMM v1.2
VLSI Design I
CMOS Sequential Logic
Clocking Strategies
Overview
single and double phase clock systems
Latch and FF timing
Goal: You are familiar with static and dynamic
latches/FFs as well as with single, double phase
clock, clock redistribution, clock skew and PLL
clocking techniques.
JMM v1.4
Sequential Logic
Use #1: Get better utilization from
idle combinational logic blocks.
Pipeline the system so that new
computations start before the old ones
complete. Add registers to keep
computations separate.
Use #2: Convert parallel operations
to a sequence of (faster, smaller)
serial operations.
[Figure: a parallel operator with 8-bit inputs A, B and output C, next to a serial version with 1-bit inputs A, B feeding an adder]
Use #3: Need to process a
sequence of inputs and want to
reuse the same hardware (finite
state machine).
JMM v1.4
Latches and Flip-Flops
Q follows D
D Q D
G G
Q
level sensitive latch
Q stable
Q takes value from D
D Q D
clk clk
Q
edge sensitive flip-flop
Q stable
A static latch will hold data while G is inactive, however long

that may be. A dynamic latch will hold data while G is
inactive, but only “for a while”, after which the saved value
may decay.
Do static latches dissipate static power?
How long is “for a while”?
Which one should I use?
JMM v1.4
Latch Timing Constraints #1
latch a latch b
D Q CLa D Q CLb D Q
G G G
CLK
t1a
t2b
H S
CLK H S
Do I have to check ALL these constraints?
$t_{1a} = t_{nqa} + t_{nla} > t_{hb}$
$t_{1b} = t_{nqb} + t_{nlb} > t_{ha}$
$t_{2a} = t_{xqa} + t_{xla} < t_{c0} - t_{sb}$
$t_{2b} = t_{xqb} + t_{xlb} < t_{c1} - t_{sa}$
$t_h$ = hold time
$t_s$ = setup time
$t_n$ = min delay from invalid input to invalid output
$t_x$ = max delay from valid input to valid output
$t_l$ = delay for combinatorial logic from input to output
$t_q$ = delay for memory element from G to Q
$t_{c0}$ = low period of clock cycle $t_c$
$t_{c1}$ = high period of clock cycle $t_c$

JMM v1.4
Latch Timing Constraints #2
[Figure: CLK waveform with hold (H) and setup (S) windows and the intervals t1a and t2b]
$t_{1a} = t_{nqa} + t_{nla} > t_{hb}$
$t_{1b} = t_{nqb} + t_{nlb} > t_{ha}$
$t_{2a} = t_{xqa} + t_{xla} < t_{c0} - t_{sb}$
$t_{2b} = t_{xqb} + t_{xlb} < t_{c1} - t_{sa}$
Questions for latch-based designs:
how much time for useful work (i.e. for combinational logic
delay)?
$t_{xla} + t_{xlb} < t_c - 2(t_s + t_{xq})$
what is the maximal clock frequency?
$1/f = t_c > 2(t_{xq} + t_{xl} + t_s)$
does it help to guarantee a minimum tn, for example, by requiring
a minimum number of gates in each cloud?
Suppose the maximum clock skew is tSKEW. How does that affect
the equations above? Clock skew measures the difference in
arrival of CLK at two cascaded latches (not necessarily any two
latches!).
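The clock-frequency bound for latch-based designs can be turned into a small calculator. This is a sketch that assumes identical worst-case delays in both latch stages, as the bound itself does, and ignores clock skew; the delay values in the example are made up for illustration.

```python
def latch_min_cycle(t_xq, t_xl, t_s):
    """Minimum clock period for two cascaded latch stages:
    tc > 2 * (txq + txl + ts). All times in the same unit."""
    return 2 * (t_xq + t_xl + t_s)


# Hypothetical delays in ns: latch G-to-Q 0.5, logic 3.0, setup 0.3.
tc = latch_min_cycle(t_xq=0.5, t_xl=3.0, t_s=0.3)
print(f"tc > {tc:.1f} ns, f < {1e3 / tc:.0f} MHz")
```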
JMM v1.4
Static Latches
Basic idea:
[Figure: mux with inputs 0 and 1, data input D, select CLK and output Q fed back to the mux]
Need gain around this loop to make latch static.
Want storage node to be isolated from whatever user does to Q.
Would like fast CLK-to-Q, small setup and zero hold
times.
Obvious implementation:
[Figure: latch with inputs D, CLK, CLKN and output Q]
Oops… feedback not isolated from Q. Could
add additional output inverters...
Good! Input goes only to fet gates
Should we buffer CLK 0, 1 or 2 times?
JMM v1.4
Latch Timing
1 2
CLK
setup time = how long D input has to be stable

before CLK transition.
hold time = how long D input has to be stable
after CLK transition.
ts
th
CLK
So, what node should we use to measure

setup and hold times? And what should we measure?
Other time of interest: CLK-to-Q
JMM v1.4
Dynamic Latches
Suppose in the interest of speed we were
willing to give up the “static guarantee”
and take our chances with dynamic latches,
i.e., remove feedback path...
[Figure: dynamic latch from D to Q clocked by CLK. Annotations: “Eliminate when Q fanout is small (1)”; “Can combine other logic with inverter”; “local or global clock inverter?”]
Can we do without the CLK inverter too?
DEC did without on 21064 but put it back in for 21164
[Figure: latch using CLK and CLKN, and a variant using CLK only]
Delete the PFET driven by CLKN and then add
NFET driven by CLK in Q’s pulldown path to
handle what happens when D goes from 1 to 0.
JMM v1.4
Flip-flops (registers)
Using alternating positive and negative dynamic latches with
a single clock gives great speed and small area, but…
lots of worries about clock skew
must balance logic delays to minimize wastage
need latch size checks (check optimisations!)
What about those of us who don’t have buildings full of
engineers to sweat the details? Use D-flip-flops and
address all the problems once!
D D Q D Q Q D D Q Q
master slave
G G CLK
CLK
D
CLK
Q
!
JMM v1.4
Flip-flop Implementations
Obvious implementation:
Q
D
CLK
Use “jamb” latches to lighten CLK load:

“Weak” feedback inverters
(long n and p) get overridden
D Q
CLK
JMM v1.4
Flip-Flop Timing
D Q CL D Q
clk clk
CLK
t1
t2
CLK
$t_1 = t_{nq} + t_{nl} > t_h$
$t_2 = t_{xq} + t_{xl} < t_c - t_s$
Questions for register-based designs:
how much time for useful work (i.e. for combinational logic
delay)?
does it help to guarantee a minimum $t_n$? How about designing
registers so that
$t_{xq} > t_h$?
Suppose the maximum clock skew is tSKEW. How does that affect
the equations above?
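The two register constraints can be checked mechanically for a single register-to-register path. The sketch below ignores clock skew (the question above is left open), and the function name and example delays are hypothetical.

```python
def check_ff_path(t_nq, t_nl, t_xq, t_xl, t_h, t_s, t_c):
    """Return (hold_ok, setup_ok) for one register-to-register path.

    hold:  t1 = tnq + tnl > th       (earliest change arrives after hold window)
    setup: t2 = txq + txl < tc - ts  (latest change arrives before setup window)
    """
    hold_ok = t_nq + t_nl > t_h
    setup_ok = t_xq + t_xl < t_c - t_s
    return hold_ok, setup_ok


# Hypothetical delays in ns with a 10 ns clock period.
print(check_ff_path(t_nq=0.2, t_nl=0.3, t_xq=0.6, t_xl=8.0,
                    t_h=0.1, t_s=0.4, t_c=10.0))
```

Note that the hold check does not involve the clock period, so a hold violation cannot be repaired by slowing the clock.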
JMM v1.4
Dynamic Flip-Flops
I’ll have the Christer Svensson
special please!
2
CLK QN
CLK is low:
node 1 follows not(D)
node 2 pulled up
QN is “floating” with its old value
CLK is high:
node 2 = “0” if node 1 = “1”,
otherwise it stays “1”
⇒ node 2 = not(node 1) shortly after CLK↑
QN = not(node 2) ⇒ stable soon after CLK↑
node 1 can be pulled down if D goes to “0” (capacitive
coupling), but node 2 won’t change!
JMM v1.4
Single-Phase Clocked Systems
RTL #1:
D Q D Q D Q
clk clk clk
CLK
latch #2:
D Q D Q D Q
G G G
CLK
Simplest clocking methodology is to use a single clock in conjunction
with a register. Clocks are generated with global clock buffers.
CLK and its complement are generated locally.
buffers necessary
for large loads
clk-
clk-in
clk
clk
JMM v1.4
Clock Skew
D Q D Q D Q
clk clk clk
CLK delay delay
if a clock net is heavily loaded, there might be a race
between clock and data -> clock skew
special attention has to be paid when designing the clock
tree. CAD tools are able to design balanced clock trees.
two methods to avoid clock skew:
latch
D Q D Q D Q
clk clk clk
CLK delay
D Q D Q
clk clk
delay CLK
JMM v1.4
Two-Phase Clocked Systems (latch)
D Q D Q D Q
G G G
PHI1
PHI2
[Figure: waveforms phi1 and phi2, “non-overlapping two phase clocks”]
a problem in single phase clocked systems is the
generation and distribution of nearly perfect overlapping
clocks.
in two-phase clocked systems this is solved by non-overlapping
clocks
non-overlapping clocks can be generated with latch
structures
clk
≥1 phi1
≥1 phi2
JMM v1.4
Two-Phase Clocked Systems (FF)
D Q D Q D Q
clk clk clk
CLK
CLK
“non-overlapping two edge clocks”
in properly designed two-edge clocked systems clock
skew problems are drastically reduced
Disadvantage: 50% speed reduction
typical application: FSM on rising edge, data-path on
falling edge
designs with several FSMs and data-paths need thorough
design
JMM v1.4
Clock Distribution
Two main techniques for clock distribution exist:
a single large buffer (see Alpha processor)
a distributed clock tree approach
[Figure: clock tree distributing clk to a column of n-bit datapaths; delays have to match between stages]
there is no such thing as a design-free clocking
strategy in today’s high-performance processes
clock buffers should be surrounded by power pads
due to their large power consumption
vdd clk gnd clk
clk clk clk clk driver
clk
JMM v1.4
Phase Locked Loop Clock Technique
Phase locked loops (PLL) are used to generate
internal clocks on chips for two main reasons:
to synchronize the internal clock of a chip with an
external clock
to operate the internal clock at a higher rate than
the external clock input
clock clock
PLL
clock clock
route route
dclk dclk
dclk+dpad dpad
clock clock
dclk dclk
data out data out

JMM v1.4
PLL #2
[Figure: PLL block diagram. fosc enters the phase detector, whose up/down outputs drive the charge pump, followed by the filter and the VCO (voltage controlled oscillator) producing n x fosc; a divider by n feeds the VCO output back to the phase detector as ffeed. Waveforms: fosc, ffeed, up, down, Ufilter]
The phase detector produces a sequence of up/down
pulses, which are used to switch a charge pump.
The charge pump charges/discharges a capacitor
with voltage or current pulses.
A filter is used to limit the rate of change of the
capacitor voltage. The result is a slowly changing
voltage that depends on the frequency difference
between the reference clock (fosc) and the divided VCO output (ffeed).
The VCO increases/decreases its frequency of
operation depending on its input voltage.
JMM v1.4
Static Timing Analysis
Do I have to check ALL the constraints?
Yup, for every pair of connected register/latches AND for all
possible data values!
We need a CAD tool: static timing analyser. Here’s how
it works:
Step 1: “Level-ize” all signal nodes.
Start by assigning all register outputs and top-level inputs a
level of 0. For all other gates: levelOUTPUT = max(levelINPUT)+1.
Step 2: Compute min/max signal delays.
For each successive node level, compute min and max time for
all nodes on that level (see next slide for details). This is a
“data independent” computation. Might need case analysis to
avoid false paths.
Step 3: Check setup and hold constraints
Use min times of register inputs to check hold time. Use max
times and tCLK to check setup time or use max time + tSETUP
to determine min tCLK.
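Step 1 can be sketched in a few lines. The netlist representation (a dictionary from each gate output to its input nodes) and the node names are assumptions made for illustration; the sketch also assumes the combinational logic between registers is acyclic.

```python
def levelize(gates, sources):
    """Assign a level to every node.

    gates:   dict mapping a gate output node to the list of its input nodes
    sources: register outputs and top-level inputs (level 0)
    """
    level = {node: 0 for node in sources}

    def visit(node):
        if node not in level:
            # levelOUTPUT = max(levelINPUT) + 1
            level[node] = max(visit(i) for i in gates[node]) + 1
        return level[node]

    for node in gates:
        visit(node)
    return level


# Hypothetical netlist: a, b are top-level inputs, q is a register output.
netlist = {"n1": ["a", "b"], "n2": ["n1", "q"], "out": ["n2", "a"]}
print(levelize(netlist, sources=["a", "b", "q"]))
```

Processing nodes in increasing level order guarantees that all input times of a gate are known before its output time is computed in Step 2.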
JMM v1.4
Stage Delay Computation
Look at each gate and use knowledge of input timing and rise/fall
timing to compute earliest and latest time output could change for
both rising and falling output transitions.
[Figure: clocked gate between VDD and GND with inputs IN, CLK and CLKN, output OUT, internal nodes 1 and 2, and capacitances C1, C2 and COUT]
IN↑ ⇒ OUT↓ (C1, COUT): min ⇒ node 1 = 0V, fast; max ⇒ node 1 = VDD, slow
IN↓ ⇒ OUT↑ (C2, COUT): min ⇒ node 2 = VDD, fast; max ⇒ node 2 = 0V, slow
Other transitions:
CLK↑, CLK↓, CLKN↑, CLKN↓
Use Penfield-Rubinstein model to compute
$t_{d,in-out} = \sum_i R_i C_i$ over all nodes “i” in the stage, where $R_i$ is
total “effective resistance” to power rail and $C_i$ is non-zero if node
capacitor needs to be charged/discharged. Multiply by degrading
factor to account for rise/fall time of input.
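The delay sum is simple enough to write down directly. The sketch below only evaluates the sum of Ri·Ci with an optional degrading factor; the resistance and capacitance values are invented for illustration, and a node whose capacitor does not need to be charged or discharged is given Ci = 0.

```python
def stage_delay(nodes, degrade=1.0):
    """Sum of Ri*Ci over all nodes in the stage, times a degrading factor.

    nodes: list of (R_i in ohms, C_i in farads); use C_i = 0 for a node
           that is already at its final voltage.
    """
    return degrade * sum(r * c for r, c in nodes)


# Hypothetical stage: internal node (1 kOhm, 10 fF) and output (2 kOhm, 20 fF).
slow = stage_delay([(1_000, 10e-15), (2_000, 20e-15)])   # internal node must move
fast = stage_delay([(1_000, 0.0), (2_000, 20e-15)])      # internal node already there
print(f"max {slow * 1e12:.0f} ps, min {fast * 1e12:.0f} ps")
```

The two calls correspond to the max and min cases listed for each transition.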
JMM v1.4
Coming Up...
Next topic…
Data operators

Weste:
Sections 5.5 thru 5.5.6 (latch, FF)
5.5.8 thru 5.5.11 (clock strategy)
5.5.15 and 5.5.16 (clock strategy)
Selfstudy…
Weste:
PLL section 9.3.5.3
JMM v1.4
Exercises: VLSI-10
Ex vlsi10.1 (difficulty: easy): calculate peak current
and power consumption of a 100MHz clock driver
with rise and fall times of 1ns driving 30k register
bits at 100fF each with Vdd=3.3V
Result: Ipeak=9.9A, Pd=2.18 Watt
JMM v1.4
Intro to VLSI Systems
Finite State Machines
Today’s handouts:
(1) Lecture Slides
JMM/ESA v1.0
Excuse me… Is there such a thing
as unclocked
sequential logic?
Wave pipelining
just assert new inputs to logic after waiting “long enough” to
ensure that previous values won’t be corrupted. Requires very
careful design of each level of logic to ensure consistent
propagation delay along all paths with all possible data values.
Hard to do in the face of manufacturing variations (“fast N, slow
P” and vice versa)
Self-timed logic
use dual-rail signaling (i.e., two wires) to encode
reset (not yet evaluated) 00
and then build handshake logic that starts
next stage when current stage is done and next
stage has completed its previous computation
and delivered its values. Dual-rail logic works well
with precharge-evaluate gates… more on this
in a later lecture.
JMM/ESA v1.0
Finite State Machines
Draw and check state transition diagram
merge equivalent states
perform state encoding
design logic implementation
JMM/ESA v1.0
Correct State Diagrams
in/out 1/0
S1 S2
1/0 1/0 1/0
0/0 0/0
S3 S8 S9 S4
1/1
-/0 1/0 -/0
0/0 0/0
S5 S6
1/1 -/1 1/0 0/0
S7
Is this a Mealy or Moore machine?
Arcs leaving a state must be:

(1) mutually exclusive
can’t have two choices for a given input value
(2) collectively exhaustive
every state must specify what happens for each possible
input combination. “Nothing happens” means arc back to
itself.
JMM/ESA v1.0
Merge Equivalent States
Two states are equivalent if for each
possible combination of inputs
(1) they have identical outputs
(2) they transition to equivalent states
0/0 0/1 1/0 1/1 0/0
S1 S2 S3 S4 S5
0/1 1/1 1/1 0/1
1/1
Compatibility table:
start by putting “X” in square (Si,Sj) if Si
produces different output from Sj for some input
[Figure: triangular table with rows S2 to S5 (all but first state) and columns S1 to S4 (all but last state), with X entries]

JMM/ESA v1.0
0/0 0/1 1/0 1/1 0/0
S1 S2 S3 S4 S5
0/1 1/1 1/1 0/1
1/1
S2
S3
S4
X
S5 X S1,S5
S1 S2 S3 S4
Next: for non-X square (Si,Sj) write in pairs of states that have to be
equivalent in order for Si and Sj to be equivalent.
Finally: Look at an entry in (Si,Sj). If entry is “Sm,Sn”, and if

(Sm,Sn) has an X, put an X in square (Si,Sj). Repeat until no more
squares can be X’ed out.
Remaining squares indicate equivalent states.
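The table procedure can be sketched as a fixed-point loop. The state machine in the example is hypothetical (it is not the one drawn on the slide), and the data layout `fsm[state][input] = (next_state, output)` is an assumption made for illustration.

```python
from itertools import combinations


def equivalent_pairs(fsm, inputs):
    """Return the set of state pairs left un-X'ed in the compatibility table."""
    states = sorted(fsm)
    pairs = set(combinations(states, 2))
    # Start: X out every pair that produces different outputs for some input.
    marked = {(s, t) for s, t in pairs
              if any(fsm[s][i][1] != fsm[t][i][1] for i in inputs)}
    changed = True
    while changed:                      # repeat until no more squares get an X
        changed = False
        for s, t in pairs - marked:
            for i in inputs:
                nxt = tuple(sorted((fsm[s][i][0], fsm[t][i][0])))
                if nxt in marked:       # implied pair is not equivalent
                    marked.add((s, t))
                    changed = True
                    break
    return pairs - marked


# Hypothetical Mealy machine: fsm[state][input] = (next_state, output).
fsm = {
    "S1": {0: ("S1", 0), 1: ("S2", 0)},
    "S2": {0: ("S1", 0), 1: ("S3", 1)},
    "S3": {0: ("S1", 0), 1: ("S3", 1)},
    "S4": {0: ("S4", 0), 1: ("S2", 0)},
}
print(sorted(equivalent_pairs(fsm, inputs=(0, 1))))
```

Each remaining pair can be merged into a single state before state encoding.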
JMM/ESA v1.0
Perform State Encoding
Given a minimized symbolic state diagram,
assign binary codes to the states. We need to predict the
effects of logic minimization and find the state encoding that
produces the smallest logic implementation.
This is hard when the number of states is large!

current new
input state state output
0 S1 S1 1
1 S1 S2 0
0 S2 S1 1
1 S2 S3 1
- S3 S4 0
- S4 S1 1
0 01 01 1 1 00 10 1
1 01 00 0 “Q-M” 0 0- 01 1
S1=01 S3=10
0 00 01 1 - 10 11 0
S2=00 S4=11
1 00 10 1 - 11 01 1
- 10 11 0
- 11 01 1
0 00 00 1 0 0- 00 1
1 00 01 0 “Q-M” 1 -0 01 0
S1=00 S3=10
0 01 00 1 1 01 10 1
S2=01 S4=11
1 01 10 1 - 10 11 0
- 10 11 0 - -1 00 1
- 11 00 1
JMM/ESA v1.0
FSM Logic Implementation
Implementation options: multi-level logic, ROM, PLA, “one hot” registers.
“One hot” encoding uses a separate register for each possible state: the register output is “1” if the FSM is in that state. Hence only one state register is “hot” at a time. This makes for trivial decoding of the state and simple next-state logic. Good for simple FSMs and when no multi-level synthesis is available. Often a good choice for FPGAs.
Coming Up...
Next topic…
Arithmetic circuits: adders and multipliers.

Weste: 8.4
VLSI Design I
Datapath Operators: Addition and Multiplication
Didn’t I learn how to do addition in the first year?
First year courses aren’t what they used to be...
  01011
+ 00101
  10000
Overview
Carry propagate, carry lookahead, carry save, carry skip and carry select adder
Goal: You know serial and parallel addition and multiplication architectures
Addition/Subtraction
Most digital functions can be divided into the following categories:
datapath operators
memory elements
control structures
I/O cells
Adder architectures:
carry-propagate adder (CPA)
ripple carry adder
carry-lookahead adder (CLA)
manchester carry adder
hierarchical carry-lookahead adder
carry-save adder (CSA)
carry-skip adder
carry-select adder
parallel adder
serial adder ...
Why can’t we just add?
Binary Addition
Here’s an example of binary addition as one might do it by “hand”:
  1101      carries from previous column
  01101
+ 00101
  10010
If we use a two’s-complement representation for signed integers, the same procedure will work for adding both signed and unsigned numbers.
Besides the sum, one often wants two other bits of information from an adder:
carry-out: indicates that the add in the most significant position produced a carry; used when implementing multi-word arithmetic, e.g., “1 + (-1)”
$C = a_{n-1} \cdot b_{n-1} + \overline{s_{n-1}} \cdot (a_{n-1} + b_{n-1})$
overflow: indicates that the answer has too many bits to be represented correctly by the result width (2’s complement), e.g., “$(2^{N-1} - 1) + (2^{N-1} - 1)$”
$V = a_{n-1} \cdot b_{n-1} \cdot \overline{s_{n-1}} + \overline{a_{n-1}} \cdot \overline{b_{n-1}} \cdot s_{n-1}$
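As a quick check of the two flags, here is a small sketch that computes them from the most significant bits only. The function name and the choice of passing n-bit words as non-negative integers are assumptions made for illustration; the s term is taken as complemented in C and in the first product of V.

```python
def add_with_flags(a, b, n):
    """Add two n-bit words (given as non-negative ints); return (sum, carry_out, overflow)."""
    mask = (1 << n) - 1
    s = (a + b) & mask
    # most significant bits of both operands and of the sum
    am, bm, sm = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1, (s >> (n - 1)) & 1
    carry = (am & bm) | ((1 - sm) & (am | bm))                     # C = a.b + not(s).(a + b)
    overflow = (am & bm & (1 - sm)) | ((1 - am) & (1 - bm) & sm)   # V
    return s, carry, overflow
```

With n = 4, “1 + (-1)” is 0001 + 1111 and sets the carry-out but not the overflow, while 0111 + 0111 sets the overflow but not the carry-out.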
Adder with “ripple” carry chain
To convert the simple addition procedure to hardware, we’ll need a “full adder” module with inputs A, B, CIN and outputs COUT, S:
A  B  CIN  COUT  S
0  0  0    0     0
0  0  1    0     1
0  1  0    0     1
0  1  1    1     0
1  0  0    0     1
1  0  1    1     0
1  1  0    1     0
1  1  1    1     1
One-bit adders are sometimes called “counters” since they count the number of 1’s on their inputs and encode the answer on their outputs. Thus a full adder is a 3:2 counter.
$S = \overline{A} \cdot \overline{B} \cdot C_{in} + \overline{A} \cdot B \cdot \overline{C_{in}} + A \cdot \overline{B} \cdot \overline{C_{in}} + A \cdot B \cdot C_{in}$
$C_{out} = A \cdot B + A \cdot C_{in} + B \cdot C_{in}$
[Figure: N-bit adder built from a chain of full adders with inputs A0 … AN-1, B0 … BN-1 and CIN, outputs S0 … SN-1 and COUT]
Carry “ripples” from one stage to the next.
propagation delay _______________
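The ripple chain can be modelled bit by bit. A minimal sketch, assuming bit lists with the least significant bit first (the function names are illustrative):

```python
def full_adder(a, b, cin):
    """One-bit full adder (3:2 counter): returns (S, COUT)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equally long bit lists (least significant bit first)."""
    carry, s_bits = cin, []
    for a, b in zip(a_bits, b_bits):
        # the carry of stage i is the carry-in of stage i+1
        s, carry = full_adder(a, b, carry)
        s_bits.append(s)
    return s_bits, carry
```

Each stage has to wait for the carry of the stage below it, which is why the delay grows with the word width.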
Faster carry logic (CLA)
Let’s see if we can improve the speed by rewriting the equations for COUT:
$C_{OUT} = AB + AC_{IN} + BC_{IN}$
$= AB + (A + B)C_{IN}$
$= G + P\,C_{IN}$ where $G = AB$ (generate) and $P = A + B$ (propagate)
For adding two N-bit numbers:
$C_N = G_N + P_N C_{N-1}$
$= G_N + P_N G_{N-1} + P_N P_{N-1} C_{N-2}$
$= G_N + P_N G_{N-1} + P_N P_{N-1} G_{N-2} + \dots + P_N \cdots P_0 C_{IN}$
So if we had (N+1)-input gates and didn’t mind a lot of loading on the P signals, the propagation delay of an adder built using this equation for the carries would be (counting 1 delay unit per fan-in; ripple carry: 5N delays):
____________________________________
Of course, this is impractical but it does lead to some interesting ideas:
faster ripple-carry implementations
hierarchical carry-lookahead adders
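The expanded carry equation can be evaluated directly from the G and P signals. The following sketch is a software illustration only (bit lists with the least significant bit first are an assumption); in hardware each carry would be one wide gate rather than a loop.

```python
def lookahead_carries(a_bits, b_bits, cin):
    """Return [C0, C1, ...] where Ci is the carry out of bit i (bits are LSB first)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = []
    for n in range(len(g)):
        # C_n = G_n + P_n G_(n-1) + ... + P_n...P_1 G_0 + P_n...P_0 C_IN
        c, chain = 0, 1
        for k in range(n, -1, -1):
            c |= chain & g[k]
            chain &= p[k]
        c |= chain & cin
        carries.append(c)
    return carries
```

Every carry depends only on the G and P signals and on CIN, not on another carry.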
Manchester carry chain (CLA)
The plan: first generate carry-in for each adder bit as fast as we can then compute the sum. Delay still proportional to size of adder, but “constant” is pretty small.
[Figure: static Manchester stages (P = A+B) and a dynamic Manchester stage, with signals PN, GN, CN-1, CN and CLK]
Dynamic Manchester stage: When CLK is low, all C nodes precharge. When CLK is high, if GN is high, CN is asserted, i.e., driven low.
To prevent GN from affecting CN-1, PN must be computed as AN xor BN. But we needed the xor anyway… now SN = PN xnor CN
Manchester Adder Block (CLA)
[Figure: 4-bit Manchester adder block for bits N to N+3: P/G logic for each bit pair AN, BN … AN+3, BN+3, the Manchester carry chain from Cin to CN+3 with a bypass link controlled by PN·PN+1·PN+2·PN+3, and xnor gates producing SN … SN+3]
The propagate logic in the Manchester carry chain puts a lot of NFETs in series, so when CIN is high the pulldown path can get long if a lot of the P signals are true. For most technologies, the performance of this long pulldown path limits the maximum length of the carry chain to around four stages before it needs to be split into subchains.
Adding a bypass path that skips over the block when all P signals are true can improve the maximum propagation delay when multiple Manchester carry chains are used in series.
Hierarchical carry-lookahead adders
The linear growth of adder carry-delay with size of the input word may be improved by calculating the carries to each stage in parallel:
$C_J = G_{IJ} + P_{IJ} C_{I-1}$
$G_{IK} = G_{J+1,K} + P_{J+1,K} G_{IJ}$
$P_{IK} = P_{IJ} P_{J+1,K}$ where $I \le J$ and $J+1 \le K$
“generate a carry from bits I thru K if it is generated in the high-order (J+1,K) part of the block or if it is generated in the low-order (I,J) part of the block and then propagated thru the high part”
[Figure: binary tree of G/P blocks over bits 7 … 0: first level 6,7 / 4,5 / 2,3 / 0,1, second level 4,7 / 0,3, root 0,7; tree depth log2(n). Bit cell K with AK, BK, CK-1, SK, GK, PK; combining cell I,K with GIJ, PIJ, GJ+1,K, PJ+1,K, CI-1, CJ, GIK, PIK]
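The block equations translate directly into a recursive sketch. The helper names are hypothetical and the split point J is simply chosen in the middle of the block, which is one possible choice.

```python
def combine(low, high):
    """Merge (G, P) of the low block (I,J) with the high block (J+1,K) into (G_IK, P_IK)."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return g_hi | (p_hi & g_lo), p_lo & p_hi

def block_gp(a_bits, b_bits, i, k):
    """(G, P) for bits i..k inclusive (bit lists are LSB first), built as a binary tree."""
    if i == k:
        return a_bits[i] & b_bits[i], a_bits[i] | b_bits[i]
    j = (i + k) // 2
    return combine(block_gp(a_bits, b_bits, i, j),
                   block_gp(a_bits, b_bits, j + 1, k))
```

For an 8-bit adder the carry out is then G07 + P07·CIN, computed in log2(8) = 3 combining levels.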
Carry-skip adders
Since computing PIK is simpler than computing GIK, let’s try just computing PIK and apply the “skip” optimisation from Manchester adders.
[Figure: carry-skip chain with carries C0, C4, C8, C12 and block propagate signals P4,7 and P8,11]
Suppose it takes 1 time unit for a signal to pass thru two logic levels, then
time to ripple thru block of k bits = k time units
time to skip a block = 1 time unit
Consider a 24-bit carry-skip adder organized as 6 blocks of four bits each. So the worst case propagation time is
4 (ripple) + 1 + 1 + 1 + 1 (skip) + 4 (ripple) = 12 time units
But now reorganize the adder with the least significant 3 bits in the first block, the next 4 bits in the second block, followed by blocks of 5, 5, 4, and 3. Now the worst case propagation time is
3 + 1 + 1 + 1 + 1 + 3 = 10 time units
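The two delay figures can be reproduced with a small model. This sketch assumes the timing model stated above (k units to ripple through a k-bit block, 1 unit per skipped block) and additionally assumes that the worst carry is generated at the bottom of one block and absorbed at the top of another.

```python
def worst_case_delay(blocks):
    """blocks: block widths, least significant block first. Returns the worst case in time units."""
    worst = max(blocks)                        # carry that stays inside one block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            # ripple through block i, skip blocks i+1..j-1, ripple through block j
            worst = max(worst, blocks[i] + (j - i - 1) + blocks[j])
    return worst
```

worst_case_delay([4, 4, 4, 4, 4, 4]) gives 12 and worst_case_delay([3, 4, 5, 5, 4, 3]) gives 10, matching the two organizations of the 24-bit adder.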
Late-arriving inputs
Is there a general way to reorganize a logic equation to accommodate a late-arriving input?
Consider the following where X1 arrives late:
$f = X_1 \cdot X_2 + X_1 \cdot X_3 + X_4 \cdot X_5$
If we want only one gate delay from X1 to the output f, how do we do it?
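One possible answer is to evaluate f twice ahead of time, once for X1 = 0 and once for X1 = 1, and let X1 only select between the two results. The sketch below checks this rewriting exhaustively; it assumes the function exactly as written above and counts the final 2:1 selection as the single gate delay.

```python
def f(x1, x2, x3, x4, x5):
    """The function as written: X1.X2 + X1.X3 + X4.X5."""
    return (x1 & x2) | (x1 & x3) | (x4 & x5)

def f_late_x1(x1, x2, x3, x4, x5):
    f1 = f(1, x2, x3, x4, x5)   # computed early, assuming X1 = 1
    f0 = f(0, x2, x3, x4, x5)   # computed early, assuming X1 = 0
    return f1 if x1 else f0     # X1 only drives the final 2:1 selection
```

Both functions agree for all 32 input combinations.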
Carry-select adders
Building on the idea from the previous slide: perform two additions in parallel, one assuming the carry-in is zero and the other assuming the carry-in is one. When the carry-in is finally known, the correct result is selected from the two precomputed results.
[Figure: carry-select adder; each block holds two adders, one with carry-in 0 and one with carry-in 1, followed by selection logic built from AND (&) and OR (>=1) gates and controlled by CIN]
Is this a “mux”?
If it takes k time units for a block to add k-bit numbers and if it takes one time unit to compute mux select from the two carry-out signals, then for optimal operation each block should be one bit wider than the next block, just as in the carry-skip adder.
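A behavioural sketch of the selection scheme, with hypothetical block widths passed in by the caller; it models which result is picked, not the timing.

```python
def carry_select_add(a, b, widths, cin=0):
    """Add ints a and b using blocks of the given widths (least significant block first)."""
    result, shift, carry = 0, 0, cin
    for w in widths:
        mask = (1 << w) - 1
        xa, xb = (a >> shift) & mask, (b >> shift) & mask
        sum0 = xa + xb          # precomputed assuming carry-in = 0
        sum1 = xa + xb + 1      # precomputed assuming carry-in = 1
        chosen = sum1 if carry else sum0   # the mux, driven by the real carry-in
        result |= (chosen & mask) << shift
        carry = chosen >> w     # carry-out selects in the next block
        shift += w
    return result, carry
```

In hardware sum0 and sum1 of all blocks are computed at the same time; only the selection ripples from block to block.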
Adder layouts
[Figure: layout of a 32-bit carry-select adder]
[Figure: layout of a 32-bit carry-… adder]
Adding M N-bit numbers
“carry-propagate”
[Figure: array of M-1 carry-propagate adders, each N bits wide]
prop delay _____________ area _____
“carry-save”
[Figure: array of M-2 carry-save adders, each N bits wide]
prop delay _____________ area _____
Even-Odd Arrays
Abstract carry-save picture from previous page:
[Figure: chain of M-2 CSAs followed by a CPA]
Rewire so that first two adders work in parallel. Feed results into third and fourth adders which also work in parallel, etc.
[Figure: even and odd chains of CSAs (labels M-4 and 2) followed by a CPA]
prop delay _____________ area _____
Even and odd streams pass through half the adders so even/odd design runs at almost twice the speed of simple CSA implementation.
Wallace Trees
$O(\log_{1.5} M)$
[Figure: Wallace tree of CSAs followed by a CPA]
We have been using full-adders or 3:2 counters in our array adders. Higher fan-in counters can be used to further reduce delays for large M, e.g., Weste shows a 5:3 counter in Fig. 8.41.
Wallace trees give asymptotically better behaviour than the earlier O(M) schemes, but they do not have a regular layout. Other O(log(M)) schemes, e.g., binary-tree multipliers using signed digit representations, have better layout properties but at the cost of more complicated adder cells.
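The logarithmic depth can be seen by reducing a list of operands with layers of 3:2 counters. The sketch below works on whole integers instead of bit columns, which is a simplification for illustration; the grouping of three operands per CSA in list order is an assumption.

```python
def csa(x, y, z):
    """3:2 counter applied bitwise: returns (sum word, carry word) with x + y + z == s + c."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def wallace_sum(operands):
    """Reduce with layers of CSAs, then one carry-propagate add. Returns (total, CSA levels)."""
    words, levels = list(operands), 0
    while len(words) > 2:
        nxt = []
        for i in range(0, len(words) - 2, 3):
            nxt.extend(csa(*words[i:i + 3]))              # 3 words in, 2 words out
        nxt.extend(words[len(words) - len(words) % 3:])   # leftovers pass through
        words, levels = nxt, levels + 1
    return sum(words), levels                             # final CPA
```

Each layer shrinks the operand count by roughly a factor of 1.5, so 16 operands need 6 CSA levels before the final CPA.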
Bit-Serial Adder
• bit-serial adders are very slow, have a high data latency, but are extremely compact
• applications are signal processing
[Figure: bit-serial adder: n-bit shift registers for A, B and the result, one full adder (cin, cout) and a carry flip-flop (FF) with clk and clr]
CSA Adder (pipelining)
• pipelined adders are extremely fast, but have a high data latency (CSA structure of slide #13)
[Figure: pipelined 4-bit adder computing S = A+B+C+D: CSA adders for bits 0 to 3 with flip-flops (FF) between the stages, followed by a CPA adder producing S(0) … S(3) and Carry]
CPA Adder (pipelining)
• the CPA structure on slide #13 can also be used in a pipeline structure. Useful in signal processing applications.
[Figure: pipelined 4-bit CPA with inputs A(0) … A(3), B(3), Cin, flip-flops (FF) staggering the inputs and outputs, and outputs S(3) and Carry]
Binary Multiplication
Suppose we want to multiply two numbers:
multiplicand A = {A_{N-1}, A_{N-2}, …, A_1, A_0}
multiplier B = {B_{M-1}, B_{M-2}, …, B_1, B_0}
to produce an (N+M)-bit result. We can write the product as
$A \cdot B = B_0 \cdot A \cdot 2^0 + B_1 \cdot A \cdot 2^1 + \dots + B_{M-1} \cdot A \cdot 2^{M-1}$
Note that $B_K \cdot A$ can be accomplished with N AND gates since $B_K$ = 0 or 1. The scaling by powers of two is a simple shift.
Thus multiplication of an N-bit number by an M-bit number boils down to the addition of M N-bit partial products each of which is formed by a simple Boolean operation. Any of the techniques from the previous slides can be used to accomplish the required additions.
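The sum of shifted partial products can be written out as a short sketch (unsigned operands and the function name are assumptions for illustration):

```python
def multiply(a, b, m):
    """Multiply a by the unsigned m-bit multiplier b by summing m partial products."""
    product = 0
    for k in range(m):
        bk = (b >> k) & 1
        partial = a if bk else 0      # B_K * A: N AND gates in hardware
        product += partial << k       # scaling by 2^K is a shift
    return product
```

For example, multiply(13, 11, 4) adds 13, 26 and 104 to give 143.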
Array multipliers
Example 3x3 array multiplier using CSAs to sum partial products:
[Figure: array of CSA cells fed by the partial product bits A0B0, A1B0, A2B0, A0B1, A1B1, A2B1, A0B2, A1B2, A2B2 and constant 0 inputs, producing the product bits P0 … P5]
Actual layout is usually squished flat:
Higher Radix Multiplication
Array multipliers are nice, but we get one column of adders (which are big/slow) for each partial product, i.e., one column for each bit of the multiplier. If we could use, say, 2 bits of the multiplier in generating each partial product we would halve the number of columns and double the speed of the multiplier!
Let’s rewrite our equation for A*B:
$A \cdot B = B_{1,0} \cdot A \cdot 2^0 + B_{3,2} \cdot A \cdot 2^2 + \dots + B_{M-1,M-2} \cdot A \cdot 2^{M-2}$
This looks the same as before except we have half as many partial products to sum. Generating each partial product is now more complicated since $B_{K+1,K}$ can now be 0, 1, 2 or 3. The only troublesome value here is 3 since that would seem to require more adder inputs than we have (3*A = A + 2*A).
But… we can also write 3*A = 4*A - A. We’ll do the -A in this partial product stage and signal the next stage that it needs to add 4*A. To keep the signalling simple we’ll also rewrite 2*A = 4*A - 2*A
Profs go crazy nowadays, why can’t he just multiply as everybody does it
Booth Recoding (Radix-4)
$A \cdot B = B_{1,0} \cdot A \cdot 2^0 + B_{3,2} \cdot A \cdot 2^2 + \dots + B_{M-1,M-2} \cdot A \cdot 2^{M-2}$
[Figure: multiplicand AN-1 … A0 times multiplier BM-1 … B0; partial product array with labels M/2 and 2]
BK+1,K*A = 0*A → 0
         = 1*A → A
         = 2*A → 4*A - 2*A
         = 3*A → 4*A - A
BK+1  BK  BK-1  action     N   x1  x2
0     0   0     add 0      --  0   0
0     0   1     add A      0   1   0
0     1   0     add A      0   1   0
0     1   1     add 2*A    0   0   1
1     0   0     sub 2*A    1   0   1
1     0   1     sub A      1   1   0
1     1   0     sub A      1   1   0
1     1   1     add 0      --  0   0
[Figure: logic for partial product bit PPi: Ai gated by x1 (&) and Ai shifted left by 1 gated by x2 (&) are ORed (>=1), then XORed (=1) with N; N is also used as the carry-in]
Not cheaper than an ADD but all recodes can be done in parallel so we only pay the time penalty once (for first column)!
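The recoding table can be tried out in software. This is a sketch under stated assumptions: the multiplier is treated as an unsigned m-bit number, B(-1) is taken as 0, and one extra digit is appended at the top so that a pending “add 4*A” is never lost; the names are illustrative.

```python
# (B[k+1], B[k], B[k-1]) -> multiple of A, as in the recoding table
BOOTH_DIGIT = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_digits(b, m):
    """Radix-4 Booth digits of the unsigned m-bit multiplier b, least significant first."""
    def bit(i):
        return (b >> i) & 1 if i >= 0 else 0
    count = m // 2 + 1   # extra top digit absorbs the last "add 4*A" signal
    return [BOOTH_DIGIT[(bit(2 * k + 1), bit(2 * k), bit(2 * k - 1))]
            for k in range(count)]

def booth_multiply(a, b, m):
    """Sum the recoded partial products d_k * A * 4^k."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_digits(b, m)))
```

For the multiplier 0110 the digits are -2 and +2, i.e. 6 = -2 + 2*4, so only two non-zero partial products are needed.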
16x32 Booth Multiplier
This multiplier only produces a 32-bit result so the top 16 bits of the “rhombus” have been omitted:
[Figure: layout of the 16x32 Booth multiplier, 32 bits wide, top 16 bits omitted]
Serial Multiplication
• bit-serial multipliers are very compact, but have a high data latency and are very slow
• simplest form of serial multiplier: successive addition
[Figure: serial multiplier built from AND gates (&), a full adder with carry flip-flop (FF, cout, clr, reset), inputs A and B, and an N-1 bit register clocked by clk producing the result]
M+N bit product -> td=MN time intervals
Serial/parallel and Pipelined
Multiplication
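• serial/parallel multiplier: very modular structure
[Figure: serial/parallel multiplier with parallel inputs Y0 … Y3, serial input X (X1, X0, 0, 0), one AND gate (&) per cell and serial product output P]
M+N bit product -> td=M+N time intervals, but time intervals are larger
• pipelined multiplication: 2 delay elements per cell
[Figure: pipelined multiplier cell with inputs Yn, Xj, PPin and outputs Xj+1, PPout]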
Shifters
• Shifters are very important for microprocessor
architectures:
– arithmetic shifting
– logical shifting
– rotation functions
• barrel shifter constructed by transmission gates
[Figure: barrel shifter with inputs input6 … input0, control lines shift3 … shift0 and outputs result3 … result0]
Operation                 input (input6 … input0)
logical right shift       0,0,0,A(3:0)
logical left shift        A(3:0),0,0,0
right rotate              A(2:0),A(3:0)
left rotate               A(3:0),A(3:1)
arithmetic right shift    A3,A3,A3,A(3:0)
arithmetic left shift     A(3:0),A0,A0,A0
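The idea behind the table is that the shifter only selects a 4-bit window out of the 7 input lines; the operation is decided by how the input lines are wired. The sketch below assumes (the schematic is not reproduced here) that an active shift line k connects result_i to input_(i+k); under that assumption a left shift by s positions uses k = 3 - s.

```python
def barrel_shift(input7, shift):
    """input7: [input6, ..., input0]; returns [result3, ..., result0] with result_i = input_(i+shift)."""
    assert len(input7) == 7 and 0 <= shift <= 3
    lsb_first = input7[::-1]
    window = lsb_first[shift:shift + 4]   # 4 adjacent input lines
    return window[::-1]
```

With A = [A3, A2, A1, A0], feeding [0, 0, 0] + A gives a logical right shift, A[1:] + A a right rotate and [A3, A3, A3] + A an arithmetic right shift.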
Coming Up...
Next topic…
VLSI fabrication: processing steps, basic structures, self-aligned processes, P and N devices.
Weste:
Sections 8 thru 8.2.1.6 and 8.2.7.3
8.2.7 thru 8.2.8
Self study Weste:

parity generators 8.2.2
comparators 8.2.3
zero/one detectors 8.2.4
binary counters 8.2.5
Boolean operations - ALUs 8.2.6
Exercises: VLSI-12
Ex vlsi12.1 (difficulty: medium): Develop a 1-bit full adder with not more than 3 fets in series for the not(sum) and not more than 2 fets in series for the not(carry) circuit. The not(carry) signal can be used for the sum circuit.
Result: Notice that the n- and pfet blocks are identical and not complementary.
[Figure: transistor schematic of the full adder with inputs A, B, C and outputs Carry and Sum]
Exercises: VLSI-12 con’t
Ex vlsi12.2 (difficulty: easy): A 32-bit adder is built as a carry-select adder. Each adder as well as the muxes have one delay unit. Find the optimal structure with respect to speed.
Result: The maximum speed is 9 time units for a structure with stages 4-4-4-5-6-7-6 (see Weste p. 532)
Ex vlsi12.3 (difficulty: easy): A hierarchical carry-lookahead adder (see slide 8) is given. Show algebraically that $C_3 = G_{03} + P_{03} C_{in}$ corresponds to the equation $C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{in}$ (note that $G_{ii} = G_i$ and $P_{ii} = P_i$)
Ex vlsi12.4 (difficulty: easy, time consuming): Write VHDL code for a 32-bit hierarchical carry-lookahead adder (see slide 8). If one block has a delay of 1 time unit, what is the overall delay?
Result: The total delay is 9 time units
Ex vlsi12.5 (difficulty: medium): Consider X1 as a late arriving input which needs to be sped up. Develop the circuit for the function: $f = X_1 \cdot X_2 + X_1 \cdot X_3 + X_4 \cdot X_5$
VLSI Systems Design
Design Project: Practical Aspects
I am a VHDL expert. But how do I apply it in real life – for my MP3 player?
Overview
applying the “description-synthesis” design method in practice
Goal: You are able to master your own VHDL project. You have basic notions about HW/SW co-design.
Project Goal
Goal:
design of an electronic system from specification down to ASIC/FPGA
Problem:
one of the most difficult tasks in a VLSI project design is to find the starting design point
Basic Steps:
in order to proceed in a structured manner, you should perform the following steps
block diagram
HW/SW co-design (hardware/software co-design)
IP cores (intellectual property cores)
hardware: FSMD architecture model, then VHDL coding & simulation
software: structured software design, then C coding, compiling
hardware/software system simulation
synthesis, place & route
back-annotation & simulation (formal design verification)
chip test
Initial System Design Steps
System design steps
Block diagram:
1. identify your chip in the overall system
2. define the chip IO and group them into blocks
3. identify functional units of your chip
4. identify the interconnection between your units
HW/SW co-design:
5. identify speed sensitive (HW) and control sensitive (SW) tasks
6. define the “intelligence” of each functional unit
IP cores:
7. identify IP cores
8. obtain as many IP cores as possible (tools, core generators, old designs, internet)
9. update design if necessary according to available IP cores
10. define inter-process communication
11. define the interconnections between your units
In the classical HW/SW co-design approach, the design process is continued as long as possible independent of its implementation. HW/SW design units are identified at the very end of the design steps. In smaller designs, as it is in our case, the HW/SW co-design step is done in an early phase.
Project MP3 Player: step 1
(block diagram)
Step 1: identify your chip in the overall system
[Block diagram: the MP3 Player ASIC/FPGA in the overall system, connected to USB, LCD, Keyboard, MP3 Decoder, Power, Flash Memory and DAC]
Project MP3 Player: steps 2-4
(block diagram)
Step 2: define the chip IO and group them into blocks
Step 3: identify functional units of your chip
Step 4: find the interconnections between your units
[Block diagram of the MP3 Player ASIC/FPGA with the units: power management, main control, LCD interface, I2C interface, keyboard interface, Decoder interface, I2S interface, USB interface, Flash interface, DAC interface]
Project MP3 Player: step 5
(HW/SW Co-Design)
Step 5: identify speed and control sensitive tasks
Step 6: define the “intelligence” of each functional unit
[Block diagram from steps 2-4, annotated with the control sensitive units, the speed sensitive units and the units to which “intelligence” is added]
(Hardware Design)
Step 7: identify IP cores
Step 8: obtain as many IP cores as possible (tools, core generator, old designs, internet)
[Block diagram: as before, with a PIC core as main control and a USB core as USB interface]
(Hardware Design)
Step 9: update design if necessary according to
available IP cores
Step 10: define inter-process communication
Step 11: define the interconnection between units
[Block diagram: PIC core (main control) with Port A to Port D, power management, “intelligent” keyboard interface, USB core, and the “intelligent” interfaces for flash, I2S (Decoder, DAC) and I2C (LCD)]
Hardware/Software Design Steps
Hardware design project steps:
FSMD architecture model:
I. imagine your chip working in the target system, identify and describe its basic functional units in a data-flow view
II. find the RTL structure of each of the above data-flow functions and update your block diagram by allocating your RTL structure to one or more functional units
III. fix in detail the operation of your functional units (local intelligence or data-path only) and add FSMs if required, fix the detailed interconnections between your units
IV. design all FSMs, define clock strategy, use colored data-flow, be careful with the inter-process communications
VHDL coding:
V. VHDL coding of your RTL design
VI. test bench design
VII. simulate your VHDL design with test bench
Software design project steps:
Structured software design:
I. design the software structure as learned in SW engineering courses
II. define the data structure
III. define the HW/SW communication
C coding:
IV. develop the C code
V. compile & verify your C code
Project MP3 Player: step I
(Hardware design project steps)
Step I: imagine your chip working in the target system, identify and describe its basic functional units in a data-flow view
download MP3 song from host to flash memory (flow 1):
✓ generate flash command, generate flash address
✓ load byte from USB into register
✓ use byte to execute ECC (Hamming code)
✓ update flash address
✓ store byte into flash
✓ write ECC code after 512 bytes
✓ generate write-to-flash after 512 bytes
✓ use pipeline structure to speed up data transfer
[Block diagram: PIC core (main control) with Port A to Port D, power management, “intelligent” keyboard interface, USB core, and the “intelligent” flash, I2S, I2C, LCD, Decoder and DAC interfaces]
Project MP3 Player: step II
(hardware design project steps)
Step II: find the RTL structure of each of the previous data-flow functions and update your block diagram by allocating your RTL structure to one or more functional units
download MP3 song from host to flash memory (flow 1):
[Figure: RTL structure of flow 1 between the USB interface and the Flash interface: counter (count, enable, clk), register, ECC generator, command block, mux (sel) and pads to flash mem]
Project MP3 Player: step III
(hardware design project steps)
Step III: fix in detail the function of your functional units (local intelligence or data-path only) and add FSMs if required, fix the detailed interconnections between your units
[Block diagram: PIC core with Port A to Port D running the software (C code), power management, “intelligent” keyboard interface (FSMD architecture), USB core (hardware IP core), “intelligent” Flash & I2S interface (FSMD architecture), “intelligent” I2C LCD interface (FSMD architecture)]
Project MP3 Player: step IVa
Step IVa: design all FSMs, define clock strategy, use colored data-flow, be careful with the inter-process communications
Clock strategy: Rising edge for data-paths, falling edge for IP cores and FSMs. All handshake signals between FSMDs and IP cores on falling edge.
Colors: make a lot of copies of your RTL data path
Colors: for each data-flow step, color the old active data paths leaving a register blue, the new active data-paths leaving a register green, and data-paths treated with a combinatorial function in the corresponding dark color. Active control signals and their blocks are orange. All other data-signals are red. Red signals are dominant. Be sure that no red signals enter an FSM, and no darkened or red signals reach the asynchronous set/reset of FFs.
[Figure: RTL data path of flow 1 (counter, registers, ECC generator, mux, pads to flash mem)]
Project MP3 Player: step IVb
Step IVb: design all FSMs, define clock strategy, use colored data-flow, be careful with the inter-process communications
we decide to use 3 different FSMs in addition to the ones present in IP cores
the PIC processor core is the main unit, which communicates with all other FSMD or core units, thus use inter-process communication. There is no communication between the other units.
[Figure: software (C code) on the PIC core communicating with the “intelligent” keyboard (FSMD), the hardware IP core, the “intelligent” Flash & I2S interface (FSMD) and the “intelligent” LCD interface (FSMD)]
[Figure: inter-process communication between process 1 and process 2 with the signals request, acknowledge, data and data valid]
Project MP3 Player: step V
Step V: VHDL coding of your RTL design
use one process for each data-path manipulation and its succeeding register
use 2 processes for an FSM:
one process for transition table (VHDL case)
one process for next state (state register)
continuous assignment for output function
[Figure: RTL data path of flow 1 (counter, registers, ECC generator, mux, pads to flash mem), partitioned into Process 1 and Process 2]
Project MP3 Player: step VI
Step VI: test bench design
the design of a test bench is one of the most time consuming and important tasks. A test bench will be re-used several times during the different design steps as well as for chip test (have a look at vlsi21)
[Figure: test bench consisting of control and stimulus generation, the device under test (DUT), and response generation and verification]
Final System Design Steps
Hardware design project steps:
System simulation:
12. system test bench design
13. hardware/software system simulation with test bench
Synthesis:
14. synthesis of logic level design
15. simulation of logic level with test bench
Place and route:
16. place & route your design for target technology
17. back annotation and simulation with test bench
Verify:
18. (formal design verification)
Test:
19. chip fabrication
20. chip test with test bench
21. in system test
Block diagram of a general system
A general system is composed of three elements:
user
algorithm
plant
all three items interact with each other resulting in 2 closed loops
The closed loops may have real-time constraints
GECKO Design Environment
Design entry:
C-code software
manual RTL hardware
algorithms
All three design entry elements will be converted
to VHDL and thus can be implemented into a SoC
SoC Design Methodology
The specify-explore-refine design flow is extended to a specify-explore-refine-prototype-analyse design flow for SoC designs with real-time constraints
SoC with GECKO Environment
An SoC design using the GECKO system supports the two chip approach
GECKO main board for digital part
application specific GECKO expansion board for analog, power, HF part
[Figure: SoC partitioning with the blocks: Gecko main board, real time software, signal processing hardware, microprocessor IP core, hardware IP blocks, power blocks, analog blocks, sensor]
The GECKO system
[Photo: GECKO Interface Driver]
[Photo: GECKO main board]
[Photo: GECKO main board on top of an application specific GECKO expansion board (RFID reader application, 2 W 13.56MHz RF power)]
Hardware-in-the-Loop
to iteratively improve a design, fast prototyping and data analysis steps are necessary
plants that are difficult to model are preferably not modeled but included directly in the simulation loop
variable cut between simulation and hardware
respect real-time constraints
[Figure labels: hardware-in-the-loop, hardware-in-the-software-loop]
Homework: MyProject
define your own project
plan the development and use the presented design methodology
prepare the presentation of your project, be sure you have all the necessary documentation for the discussed design steps
MyProject 2002: speed controlled dc motor
Matlab/Simulink with speed controller
GECKO main board with dc-motor electronics
use hardware-in-the-simulation-loop
Implementation constraints:
microprocessor with C code for “administrative” tasks
pulse width modulation for driving dc motor (hardware)
A/B signal encoder for speed sensing (hardware)
driving circuitry (expansion board) as simple as possible
Technical data:
dc motor has 6000 turns/minute at 5V
speed sensor has 12 pulses per turn
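The technical data fix the pulse rate the speed-sensing hardware has to handle. A small sketch of the arithmetic, assuming one count per sensor pulse; the function names and the measuring window are hypothetical.

```python
def encoder_pulse_rate(turns_per_minute, pulses_per_turn):
    """Pulses per second delivered by the speed sensor."""
    return turns_per_minute / 60 * pulses_per_turn

def measured_speed(pulse_count, window_s, pulses_per_turn=12):
    """Turns per minute estimated from the pulses counted during window_s seconds."""
    return pulse_count / pulses_per_turn / window_s * 60
```

At 6000 turns/minute and 12 pulses per turn the sensor delivers 6000 / 60 * 12 = 1200 pulses per second.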
VLSI Design II
CMOS Processing
Overview
Processing steps
processing step sequence
Goal: You know the basics of integrated circuit

processing steps and you are familiar with the
processing sequence of a sample CMOS technology.
Introduction
Complementary MOS (CMOS) technology is becoming the dominant candidate for VLSI applications
CMOS provides both n-channel and p-channel MOS transistors on one chip
cheap chips are produced in extremely expensive fabs
each chip passes hundreds of different processing steps
random process disturbances cause electrical parameter variations of the chips
elements are never identical
Process technology pictures and text are copied from: Atlas of IC Technologies, W. Maly, The Benjamin Cummings Publishing Company, ISBN 0-8053-6850-7
VLSI Circuit Fabrication
oxidize silicon to form thin and thick layers of SiO2 to serve as insulators.
deposit thin layers of material and etch into desired pattern
diffuse dopants into substrate to create P/N junctions
implant ions to set thresholds and achieve precise dopant profiles
Most fabrication steps require first creating a mask that determines where the operation will occur. Masks can either be existing layers on the IC (these masks are “self-aligned”) or created using a lithographic process and photoresist.
Design rules ensure that the design is still functional in the face of misalignments and various side-effects of the fabrication process.
Overview
Overview of Processing Steps
making the wafers
photolithography
oxidation
layer deposition
etching
diffusion
implantation
Overview of Processing Step Sequence
n-well
active
poly
n-diffusion
p-diffusion
contacts
metal1
via1
metal2
passivation
Processing Steps:
Making the wafers
the basic raw material used is a wafer or disk of silicon which varies from 3” to 12” in diameter
wafers are cut in thin slices (less than 1mm) from cylindrical semiconductor ingots
first step in IC processing is the production of a single-crystal ingot starting from a silicon melt with a controlled amount of impurities
Processing Steps:
Photolithography #1
Photolithography is a technique used in IC fabrication to transfer a desired pattern onto the surface of a silicon wafer. As such, photolithography is a key step in the entire circuit integration process.
alternative method for lower quantities: direct write procedure (E-beam)
Processing Step:
Photolithography #2
Processing Steps:
Oxidation #1
Thermal oxidation is a process in which silicon (Si) reacts with oxygen to form a continuous layer of high-quality silicon dioxide (SiO2)
oxidation of the silicon surface
oxidation through a window in the oxide
selective oxide growth
[Figure: oxidation of the silicon surface]
Processing Steps:
Oxidation #2
[Figure: oxidation through a window]
[Figure: selective oxide growth, showing the bird’s beak]
Processing Steps:
Layer Deposition - General
Thin layers of both conduction substances and
insulation materials constitute an important part of
any semiconductor device.
epitaxy (single crystal deposition)
PVD and CVD process (polycrystalline deposition)
Processing Steps:
Vapour Deposition
PVD
CVD
Processing Steps: Etching
The process that immediately follows the photolithography step is the removal of material from areas of the wafer unprotected by photoresist. Characterization by selectivity and anisotropy.
wet etching
dry etching
Processing Steps:
Diffusion
Solid state diffusion is a process which allows
atoms to move within a solid at elevated
temperatures.
Processing Steps:
Implantation
The alternative to the diffusion technique of dopant
introduction used in IC manufacturing is ion
implantation.
N-Well Implant & Drive-in
In p substrate only n-channel fets can be processed. Therefore an n-well has to be implanted in order to hold the p-channel fets.
Window in the mask and cross section illustrated.
Channel-stop Implant
A “thick” (0.4um) layer of silicon dioxide, called field oxide, is formed on the surface by oxidation in wet oxygen. This is then etched to expose the surface where we want to make fets.
Grow Field Oxide
Formation of active regions for n-channel and p-channel fets of the CMOS process. The obtained bird’s beak causes the active area of the device to be significantly smaller.
Grow Thin Oxide
Now grow a “thin” (0.01um = 100 Angstroms) layer of silicon dioxide, called gate oxide, on the surface by exposing the wafer to dry oxygen.
The gate oxide needs to be of high quality: uniform thickness, no defects! The thinner the gate oxide, the more oomph the fet will have (we’ll see why soon) but the harder it is to make it defect free.
Deposit & Etch Polysilicon
On top of the thin oxide a 0.7um thick layer of

polycrystalline silicon, called polysilicon or poly for
short, is deposited by CVD. The poly layer is patterned
and plasma etched (thin ox not covered by poly is etched
away too!) exposing the surface where the source and
drain junctions will be formed:
JMM v1.4
Implant Nfet Drain & Source
The entire surface is doped, either by diffusion or ion
implantation, with phosphorus (an electron donor) which
creates two n-type regions in the substrate and an ohmic
contact in the n-well. The phosphorus also penetrates the
poly, reducing its resistance and affecting the nfet’s
threshold.
JMM v1.4
Effective Nfet Dimensions
JMM v1.4
Parasitic Fets
JMM v1.4
Implant Pfet Drain & Source
Once again the entire surface is doped, either by diffusion
or ion implantation, with boron (an electron acceptor)
which creates two p-type regions in the n-well and an
ohmic contact in the substrate.
JMM v1.4
Deposit SiO2 insulator
Finally an intermediate oxide layer is grown for isolation and

then reflowed to flatten its surface.
JMM v1.4
Etch contact cuts
Holes are etched in the oxide where contacts to poly/diff

are wanted.
JMM v1.4
Deposit & Etch Metal1
For interconnections aluminium is deposited, patterned and

etched.
JMM v1.4
Voila: a CMOS Inverter!
Finally a passivation layer protects the wafer surface from

contamination and scratches. Pads are opened for bonding.
JMM v1.4
Planarize
JMM v1.4
Deposit & Etch Metal2
JMM v1.4
N-well, Double-level Metal CMOS Process Steps
1. Grow barrier oxide
2. Mask/Etch n-well window
3. P n-well implant
4. Thermal drive-in to deepen n-well
5. Remove barrier oxide
6. Grow “pad” oxide
7. Deposit Si3N4
8. Mask/Etch leaving active region
9. B channel-stop implant
10. Grow field oxide (more drive-in!)
11. Remove Si3N4
12. Remove pad oxide
13. B or P implant to adjust VTH
14. Grow thin (gate) oxide
15. Deposit P-doped polysilicon
16. Mask/Etch leaving poly wires
17. Etch exposed thin oxide
18. Mask off p-diffusion regions
19. Sb or As nfet source/drain implant, n-well contact too
20. Mask all but p-diffusion regions
21. B pfet source/drain implant
22. Thermal source/drain annealing
23. Deposit SiO2 using CVD
24. Mask/Etch contacts through SiO2
25. Deposit first Al using PVD
26. Mask/Etch leaving metal1 wires
27. Grow thick layer of SiO2
28. Spin on thick, flat layer of photoresist
29. Etch SiO2 and photoresist at same rate until only flat SiO2 remains
30. Mask/Etch vias through SiO2
31. Deposit second Al using PVD
32. Mask/Etch leaving metal2 wires
33. Deposit overglass to passivate circuit
34. Mask/Etch pad windows
JMM v1.4
Coming Up...
Next time:
Mask layout: design rules, layout examples,
structured and symbolic layout techniques,
retargetable layouts. CAD tools for layout: design
capture, design rule checking, extraction, network
comparison.

Weste:
Chapter 3 thru 3.2.3
Johns&Martin:
2 through 2.1 (CMOS processing)
Transparencies:
transparency notes (process technology)
Study CBT course on the web or on I3S-CD:
How a silicon integrated circuit is made (Uni
Manchester)
JMM v1.4
Exercises: VLSI-14
Weste pp168: 3.8 ex 5 (difficulty: easy): Explain
why substrate and well contacts are important in
CMOS.
JMM v1.4
VLSI Design II
CMOS Layout
Measure twice, fab once
Overview
CMOS Layout and Design Rules
Analog Layout Design Considerations
Goal: You are familiar with the basic layout design
rules of the Alcatel 0.5µm CMOS process. You
know how to lay out integrated transistors,
capacitors and resistors, and what has to be
considered in order to realize quality analog
circuits, like matching and shielding.
JMM v1.4
Sources of Error
Line registration errors
resist exposure and development
over/under etching, lateral diffusion
uneven topography
⇒ systematic errors corrected by bloating/
shrinking mask
⇒ random errors increase minimum widths
and spacing
Mask misalignment
⇒ random errors increase extensions and
surrounds
Other fab difficulties
⇒ contacts and vias only on “flat” surfaces
⇒ no devices near boundaries of well
⇒ no poly contacts over diffusion
⇒ “gate” metal must connect to diffusion
⇒ minimum metal coverage requirements
Electrical properties
⇒ current density limitations
⇒ latch-up prevention
Process instabilities
mobility variations (why?)
thin-oxide thickness variations
sheet resistances
⇒ use of “process corners” in analysis
JMM v1.4
Design vs. Actual IC
JMM v1.4
Line Registration Errors
JMM v1.4
Mask Alignment Errors (I)
JMM v1.4
Mask Alignment Errors (II)
Maly, Figure 2-9
JMM v1.4
Design Rules
Exclusion rule extension rules
enclosure rules (overlapping)
width rules
spacing rules
We can specify the design rules using some convenient
units, e.g., microns but what happens if we want to
manufacture the chip using different manufacturers?
One suggestion: use an abstract unit, the lambda, and scale
the design to the appropriate actual dimensions when the
chip is to be manufactured.
Usually all edges must be “on grid”, e.g., in the MOSIS
scalable rules, all edges must be on a half lambda grid; on
the 0.5µm Alcatel process all edges must be on a 0.05µm grid.
JMM v1.4
Lambda-based Rules
One lambda (λ) = one half of the “minimum” mask
dimension, typically the length of a transistor channel.
Under the assumption that the worst case alignment is
better than 0.75λ, the maximum relative misalignment
between any two masks is better than 1.5λ. This can be
used to derive design rules and to estimate minimum
dimensions of a junction area and perimeter before a
transistor has to be laid out.
[Figure: example lambda design rules for diffusion (active), poly, metal1 and contact; dimension labels range from 1λ to 6λ, plus one 3λ x 3λ label]
For 0.5µm Alcatel process: λ = 0.25 µm
JMM v1.4
Lambda vs. Micron Rules
Lambda-based design rules are based on the assumption
that one can scale a design to the appropriate size before
manufacture. The assumption is that all manufacturing
dimensions scale equally, an assumption that “works” only
over some modest span of time. For example: if a design
is completed with a poly width of 2λ and a metal width of
3λ then minimum width metal wires will always be 50%
wider than minimum width of poly wires.
Consider the following data from Alcatel 0.5µm process
(compare with Weste, Table 3.2 pp145):
contacted metal pitch     lambda rule   lambda = 0.25µ   micron rule
1/2 * contact size        1.5λ          0.375µ           0.3µ
contact surround          1λ            0.25µ            0.25µ
metal-to-metal spacing    4λ            1.0µ             0.8µ
contact surround          1λ            0.25µ            0.25µ
1/2 * contact size        1.5λ          0.375µ           0.3µ
total                     9λ            2.25µ            1.9µ
+40% in area
Scaled design is legal
but much larger than
it needs to be!
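The pitch comparison above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the rule values are copied from the table, while the function names and the idea of squaring the pitch ratio to estimate the area penalty are illustrative assumptions, not part of the Alcatel rule manual.

```python
# Contacted metal pitch: lambda rules vs. micron rules (values from the table above).
LAMBDA_UM = 0.25  # lambda for the 0.5 um process

# (component, lambda rule in units of lambda, micron rule in um)
PITCH_COMPONENTS = [
    ("1/2 * contact size", 1.5, 0.3),
    ("contact surround", 1.0, 0.25),
    ("metal-to-metal spacing", 4.0, 0.8),
    ("contact surround", 1.0, 0.25),
    ("1/2 * contact size", 1.5, 0.3),
]


def pitch_lambda_um(components, lam=LAMBDA_UM):
    """Pitch in um when every component follows the scaled lambda rule."""
    return sum(n_lambda for _, n_lambda, _ in components) * lam


def pitch_micron_um(components):
    """Pitch in um when every component follows the native micron rule."""
    return sum(um for _, _, um in components)


def area_overhead(components, lam=LAMBDA_UM):
    """Relative area penalty, assuming area grows with the square of the pitch."""
    ratio = pitch_lambda_um(components, lam) / pitch_micron_um(components)
    return ratio ** 2 - 1.0


if __name__ == "__main__":
    print(f"lambda pitch: {pitch_lambda_um(PITCH_COMPONENTS):.3f} um")
    print(f"micron pitch: {pitch_micron_um(PITCH_COMPONENTS):.3f} um")
    print(f"area overhead: {area_overhead(PITCH_COMPONENTS):.0%}")
```

Squaring the pitch ratio assumes the same penalty applies in both directions, which is how the roughly 40% figure on the slide comes about.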
JMM v1.4
Retargetable Layouts?
So, should one use lambda rules, or not?
probably okay for retargeting between “similar”
processes, e.g., when later process is a simple
“shrink” of the earlier process. This often happens
between generations as a mid-life kicker for a
process. Some 0.35µm processes are shrinks of
an earlier 0.5µm process. Can be useful for
“fabless” semiconductor companies.
most industrial designs use micron rules to get the
extra space efficiency. Cost of retargeting by hand
is acceptable for a successful product, but usually
it’s time for a redesign anyway.
invent some way of entering a design symbolically
but use a more sophisticated technique for
producing the masks for a particular process.
Insight: relative sizes may change but topological
relationship between components does not. So,
instead of shrinking a design, compact it!
JMM v1.4
0.5µm CMOS Alcatel Mietec Process
Layers and mask definition: C05M-D
layer name   drawn   mask name
active       yes     active
nwell        yes     n-well
pwell        no      (p-well)
poly         yes     poly
nplus        no      (n+ implant)
pplus        yes     p+ implant
contact      yes     contact
metal_1      yes     metal 1
via_1        yes     via 1
metal_2      yes     metal 2
via_2        yes     via 2
metal_3      yes     metal 3
nitride      yes     nitride
dractext     yes     -
nldd         no      (no low doped drain, Zener)
nlddprot     yes     -
nplusprot    yes     -
JMM v1.4
C05M-D: some logical descriptions
logical name used masks
nwell = nwell
pwell = not nwell
n+diffusion = active and not pplus and not poly
p+diffusion = active and pplus and not poly
n+source/drain = active and not pplus and not poly and not nwell
p+source/drain = active and pplus and not poly and nwell
gate = active and poly
[Figure: logical masks for a pfet and an nfet, showing nwell, n+diffusion, p+diffusion, active, pplus and poly]
JMM v1.4
Layout Rules (C05M-D) #1
n-well, active
[Figure: n-well and active rules with n strap and p strap; dimension labels 0.5, 0.6, 0.7, 0.8, 1, 1.1, 1.7, 2.4 µm, and 2µm (3µm) for n-wells on same (different) potential]
JMM v1.4
Layout Rules (C05M-D)
poly, fets
[Figure: poly and fet rules; dimension labels 0.35, 0.5, 0.6, 0.7 and 1.1 µm]
JMM v1.4
Layout Rules (C05M-D)
abutting straps
[Figure: rules for abutting straps; dimension labels 0.6, 0.8, 1, 1.1, 1.15 and 1.6 µm]
JMM v1.4
Layout Rules (C05M-D)
metal, contacts, via1, via2
[Figure: metal, contact, via1 and via2 rules; dimension labels 0.2, 0.25, 0.35, 0.5, 0.6, 0.7, 0.8, 0.9, 1 and 1.1 µm]
contacts need to be covered by metal1
via1 need to be covered by metal2
JMM v1.4
Sticks and Compaction
Stick diagram Horizontal constraints

for compaction in X
Compact X then Y Compact Y then X
Compact X with jog

insertion, then Y
JMM v1.4
Digital Layout: Choosing a “style”
Vertical Gates
Good for circuits where fet sizes are similar and each gate has limited
fanout. Best choice for multiple input static gates and for datapaths.
Horizontal Gates
Good for circuits where long and short fets are needed or where nodes
must control many fets. Often used in multiple-output complex gates
(e.g., sum/carry circuits).
What about routing signals between gates? Note that both layouts block
metal/poly routing inside the cell. Choices: metal2 routing over the cell or
routing above/below the cell.
avoid long (> 50 squares) poly runs
don’t “capture” white space in a cell
don’t obsess over the layout, instead make a
second pass, optimizing where it counts
JMM v1.4
Digital Layout:
Optimising Connections
Which is the better gate layout?
considering node capacitances?
considering “composability” with
neighbouring gates?
JMM v1.4
Digital Layout: Big vs. Parallel
can’t make gates too
long because of poly
resistance! Eventually
really large transistors
have to be broken into
smaller transistors
wired in parallel.
area = 94µm2
area = 73µm2
Which is the better gate layout?
considering node capacitances?
area = 133µm2
considering “composability” with
neighbouring gates?
JMM v1.4
Digital Layout: Eliminating Gaps
[Figure: two layouts of the same gate network on inputs A to E, with poly gate orders A B C D E and C B A E D]
JMM v1.4
Analog Layout: Large Transistors
W/L can be very large in analog circuits

due to asymmetric layout, node1 has a smaller
capacitor which should be used for the most critical
node (high impedance)
node 1
J1 Q1 J2 Q2 J3 Q3 J4 Q4 J5
node 2 gates
node 1
Q1 Q2 Q3 Q4
node 2
JMM v1.4
Analog Layout: Matching
Using lithography techniques a variety of two-dimensional
effects can cause effective sizes of
components to differ from the sizes of the glass
layout masks.
lateral diffusion
overetching
mask misalignment ...
Goal: Matching second-order size error effects is done
mainly by making larger objects out of several unit-sized
components connected together. For best
accuracy, the boundary conditions around all objects
should be matched, even when this means adding
extra unused components.
[Figure: lateral diffusion of a well under the SiO2 mask; overetching of a poly gate under SiO2 protection]
JMM v1.4
Matching Transistor Layouts:
Common-Centroid Layout
use interdigitated finger structures for keeping the
effect of temp and oxide thickness gradients low
use one outside finger for M1, one for M2
symmetry in x & y
fets in analog circuitry are typically much
wider than in digital circuits
[Figure: common-centroid layout of interdigitated M1 and M2 fingers, with gate G (GM1), common source SM1,M2 and drains DM1, DM2]
JMM v1.4
Capacitor Matching #1
material
preferable poly1 - poly2 structures (only C05M-A)
if not available: poly1 - diffusion (C05M-D), but
nonlinear due to voltage dependency
sandwich structures with poly - metal1
in analog design very often precise ratios of
capacitors are used
major sources of errors in realized capacitors are
due to overetching and, less relevant,
an oxide thickness gradient across the surface.
Goal: Larger capacitors are realized by a parallel
combination of smaller unit-sized capacitors
(overetching). If unit-size capacitors are not
realizable, overetching can still be minimized by
realizing a nonunit-sized capacitor with a specific
perimeter-to-area ratio. For very accurate ratios
additionally common-centroid layout is used (oxide
thickness gradient).
JMM v1.4
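[Figure: capacitor of drawn size x by y; overetching by ∆e on every edge leaves x − 2∆e by y − 2∆e]
$x_a = x - 2\Delta e$, $y_a = y - 2\Delta e$
$C = \frac{\varepsilon_{ox}}{t_{ox}} A = C_{ox}\, x y$
$C_a = C_{ox}\, x_a y_a = C_{ox} (x - 2\Delta e)(y - 2\Delta e)$
$\Delta C_t = C_{ox}\, x_a y_a - C_{ox}\, x y$
$\Delta C_t \cong -2\Delta e\,(x + y)\, C_{ox}$
$\varepsilon = \frac{\Delta C_t}{C} = \frac{-2\Delta e\,(x + y)}{x y}$
$\frac{C_{2a}}{C_{1a}} = \frac{C_2 (1 + \varepsilon_2)}{C_1 (1 + \varepsilon_1)}$
ideally $\varepsilon_1 = \varepsilon_2$
$\frac{C_{2a}}{C_{1a}} = \frac{n C_{1a}}{C_{1a}} = \frac{n C_1 (1 + \varepsilon)}{C_1 (1 + \varepsilon)} = n$
[Figure: matched capacitors C1 and C2 with poly top plate, poly bottom plate, poly etch matching, well region and well contacts]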
JMM v1.4
unit-sized capacitors C1 are square
nonunit-sized capacitors C2 are rectangular and
usually between 1 and 2 times unit-sized capacitors
(K>1)
$K = \frac{C_2}{C_1} = \frac{A_2}{A_1} = \frac{x_2 y_2}{x_1^2}$
perimeter-to-area ratio should be kept identical
$\frac{P_2}{A_2} = \frac{P_1}{A_1}$
$\frac{P_2}{P_1} = \frac{A_2}{A_1} = K$
$K = \frac{x_2 + y_2}{2 x_1}$
$y_2 = x_1 \left( K \pm \sqrt{K^2 - K} \right)$, $K = 1 \ldots 2$
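The sizing rule for a nonunit capacitor can be tried out numerically. The sketch below is an illustration only: the function name is hypothetical, and the sample values K = 1.314 and x1 = 10 µm are chosen because they reproduce the result quoted in the VLSI-15 capacitor exercise later in these notes (which suggests the 2.314-unit capacitor is built as one unit capacitor plus a nonunit one, an interpretation rather than something the slide states).

```python
import math


def nonunit_capacitor_sides(k, x1):
    """Return (x2, y2) of a rectangle with area K * x1**2 and the same
    perimeter-to-area ratio as the x1-by-x1 unit capacitor.

    Follows y2 = x1 * (K + sqrt(K**2 - K)); x2 is the other root, so that
    x2 * y2 = K * x1**2 and x2 + y2 = 2 * K * x1.
    """
    if k < 1.0:
        raise ValueError("the formula needs K >= 1")
    root = math.sqrt(k * k - k)
    y2 = x1 * (k + root)
    x2 = x1 * (k - root)
    return x2, y2


if __name__ == "__main__":
    x2, y2 = nonunit_capacitor_sides(1.314, 10.0)
    print(f"x2 = {x2:.3f} um, y2 = {y2:.2f} um")
    # Perimeter-to-area ratio of the rectangle vs. the unit square (4 / x1).
    print(2 * (x2 + y2) / (x2 * y2), 4 / 10.0)
```

Taking the minus sign instead simply swaps the roles of x2 and y2, so the two roots describe the same rectangle.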
JMM v1.4
Analog Layout: Resistor #1
resistor value: $R = \frac{L}{W} R_{sq}$, $R_{sq} = \frac{\rho}{t}$
material: many different materials can be used.
They have different non-ideal effects. Absolute
accuracy is low (±20% or less), matching can be
made to be in the order of 1% at most.
polysilicon (salicided and non salicided in C05M-A and
C05M-D process)
diffusions or ion-implanted regions (n/p-diff, n-well)
material        typ Rsq   temp coeff    nonideality
metal1          72mΩ      0             not used
metal2          55mΩ      0             not used
metal3          34mΩ      0             not used
salicid poly    2.3Ω      4300ppm/C     parasitic cap
n+ diff sal     2.3Ω      4300ppm/C     v dep, non lin
p+ diff sal     2.1Ω      4300ppm/C     v dep, non lin
unsal n+poly    325Ω      −2000ppm/C    parasitic cap
n+ diff unsal   50Ω       1600ppm/C     v dep, non lin
p+ diff unsal   70Ω       1600ppm/C     v dep, non lin
n-well          1.3kΩ     4300ppm/C     v dependent
(slide annotation next to the poly rows: most common used)
JMM v1.4
Analog Layout: Resistor #2
Examples of possible resistor layout
0.14 Rsq
2.11 Rsq
matched resistors
JMM v1.4
Analog Layout:
Noise Considerations #1
Where does noise coupling occur
every time a digital gate changes its state a glitch
is injected on the digital power supply and in the
surrounding substrate
direct ohmic connections (power supply line)
via electromagnetic fields (e.g. capacitive coupling
in and from substrate)
How can noise be reduced
use of different power supply lines
layout analog and digital circuitry in different
sections of the chip
protect analog layout by guard rings
use shields connected to power and ground
[Figure: three ways of connecting the analog part and the digital part to the power supply through pads and pins]

JMM v1.4
Analog Layout:
Noise Considerations #2
Use of shields
analog interconnect digital interconnect
ground shield n+ n+ n+
n-well
p- substrate
Separate analog and digital parts with guard rings
VSS VDD VSS
p+ n+ p+
n-well
analog region digital region
depletion region
p- substrate as bypass capacitor
JMM v1.4
Summary of Analog Layout Rules
When drawing layout for analog circuits, one has to
consider many details
layout design rules, in order to get correct circuits
without short circuits between layers, or open circuits
due to misaligned layers
avoid parasitic components
- resistors: take care of length of interconnect wires and
  material used for interconnects. Add enough contacts.
- capacitors: There is a parasitic capacitor between any
  two isolation layers. Minimize size of all areas that do
  not need to have a specific size for their functionality.
Increase matching accuracy by
- using common centroid layout
- using non minimum sized components
- using capacitors with constant area to perimeter ratio
reduce noise coupling by
- separating analog and digital parts
- using separate power supplies
- using shielding techniques
JMM v1.4
Checking Layouts
Design Rule Checker (DRC). This is a program that checks each
piece of the layout against the process design rules. This is a
slow process:
canonicalize layout into a set of leading and
trailing non-overlapping mask edges. Some Boolean mask
operations may be needed.
determine electrical connectivity
and label each edge with the node it belongs to.
test each edge end point against neighboring
edges to check for spacing (leading edges) and width
(trailing edges) violations.
Layout vs. Schematic (LVS). First a netlist is extracted from the
layout. Use the electrical info generated by the DRC and then
recognize transistors as juxtapositions of channel with
diffusion. Then see if extracted netlist is isomorphic to the
schematic netlist. This is done by a coloring algorithm:
initialize all nodes to the same color
compute a new color for each node as some hashing
function involving the colors of connected (i.e., thru a fet)
nodes.
nodes that have a unique color are isomorphic to similarly
colored node in other network
worry about parallel fets, ambiguous nodes
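The coloring steps above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm of any particular LVS tool: a netlist is taken to be a list of `(kind, gate, source, drain)` tuples, source and drain are treated as interchangeable, Python's built-in `hash` stands in for "some hashing function", and parallel fets and ambiguous (equally colored) nodes are not resolved.

```python
from collections import Counter


def refine_colors(netlist):
    """Iteratively recolor nodes from the colors of nodes reached through a fet."""
    nodes = sorted({n for _, g, s, d in netlist for n in (g, s, d)})
    color = {n: 0 for n in nodes}  # start with every node the same color
    for _ in range(len(nodes)):  # fixed number of rounds keeps two netlists comparable
        seen = {n: [] for n in nodes}
        for kind, g, s, d in netlist:
            # The gate sees the unordered pair of source/drain colors.
            seen[g].append((kind, "g", tuple(sorted((color[s], color[d])))))
            # Source and drain each see the gate and the opposite terminal.
            seen[s].append((kind, "sd", color[g], color[d]))
            seen[d].append((kind, "sd", color[g], color[s]))
        color = {n: hash((color[n], tuple(sorted(seen[n])))) for n in nodes}
    return color


def probably_isomorphic(a, b):
    """Necessary (not sufficient) check: both netlists show the same color counts."""
    return Counter(refine_colors(a).values()) == Counter(refine_colors(b).values())


def match_unique_nodes(a, b):
    """Map each uniquely colored node of a to the same-colored node of b."""
    ca, cb = refine_colors(a), refine_colors(b)
    count_a, count_b = Counter(ca.values()), Counter(cb.values())
    unique_b = {c: n for n, c in cb.items() if count_b[c] == 1}
    return {n: unique_b[c] for n, c in ca.items()
            if count_a[c] == 1 and c in unique_b}


if __name__ == "__main__":
    schematic = [("pfet", "in", "vdd", "out"), ("nfet", "in", "gnd", "out")]
    # Extracted layout: arbitrary node names, drain/source listed the other way round.
    layout = [("nfet", "n7", "n3", "n2"), ("pfet", "n7", "n2", "n9")]
    print(probably_isomorphic(schematic, layout))
    print(match_unique_nodes(schematic, layout))
```

Nodes that still share a color after refinement are the ambiguous ones the slide warns about; a real checker would need extra passes (or guesses with backtracking) to separate them.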
JMM v1.4
Coming Up...
Next topic:
Small signal fet model

Weste:
3.4 through 3.4.7
Johns&Martin:
2.3 (CMOS layout design rules)
2.4 (analog layout design considerations)
Optional
have a look at Alcatel CMOS C05M-
C05M-D design rules
manual
JMM v1.4
Exercises: VLSI-15 #1
Ex vlsi15.1 (difficulty: easy): Assume the 0.5µm
Alcatel Mietec process. Use the λ rules to
calculate the minimal area and perimeter of the
following layout structure.
Result: a) AJ1=4.5µm2, AJ2=3.188µm2,
AJ3=2.25µm2, PJ1=6µm, PJ2=6µm,
PJ3=1.5µm (see Johns&Martin pp99)
J2
J1 J3
Q1 Q2
JMM v1.4
Exercises: VLSI-15 #2
Johns&Martin pp110: 2.3 (difficulty: easy): Show a
layout that might be used to match two capacitors
of size 4 and 2.314 units, where a unit-sized
capacitor is 10µm x 10µm.
Result: y2=19.56µm, x2=6.717µm
[Figure: layout with a 4 units capacitor and a 2.314 units capacitor]
Johns&Martin pp123ff: 2.14, 2.15, 2.16, 2.17
JMM v1.4
CMOS Layout (replicating)
Measure twice, fab once
Today’s handouts:
(1) Lecture Slides
(2) Problem Set #5
(3) Inverter Layout Tutorial
JMM/ESA v1.0
Design for Re-use
- what’s the schematic for this cell?
- what are the “fat” fets?
- Cell was designed for placement “under” a
  metal2/metal3 routing grid. How was the
  layout affected by this design requirement?
JMM/ESA v1.0
Replicating Cells
What does this cell do?
What if we want to replicate this cell vertically, i.e., make a stack of
the cells, to process many bits in parallel?
- what nodes are shared among the cells?
- what nodes aren’t shared?
- how should we arrange the cells vertically?
JMM/ESA v1.0
Vertical Replication
Place shared geometry

symmetrically about
shared boundary.
Place items that aren’t

to be shared 1/2 min
spacing rule from shared
boundary.
Reflect cell about X axis

so that Pfets are next
to each other: this avoids
large ndiff/pdiff spacing.
Run shared control

signals vertically -- they’ll
wire themselves up
automatically?
JMM/ESA v1.0
Vertical Intercell Routing
carry-out to
cell above
S’pose we have a signal
that will run vertically from
one cell to the next, e.g., the
carry-out from one cell becomes
the carry-in for the cell above.
Looks okay until we reflect the

cell when we do the vertical
replication!
carry-in from
cell below
Solution: we have to do the

routing for vertical intercell
signals for a pair of cells,
then replicate the pair
(complete with routing)
vertically.
JMM/ESA v1.0
Building a Datapath
It’s often the case that we want to operate on many bits in parallel. A
sensible way to arrange the layout of this sort of logic is as a datapath
where data signals run horizontally between functional units and
control signals run vertically to all the bits of a particular functional
unit:
control
bit #3
bit #2
bit #1
bit #0 data
Logic that generates the control signals can be placed at the bottom of
the datapath. If control logic is complicated or irregular, it might be
placed in a separate standard cell block and only the control signal
buffers placed just below the datapath. Although it’s tempting
to run control signals in poly (so they can control fets) this is unwise
for tall datapaths because of poly resistance (e.g., 32 bits x 20u/bit
= 640u = ~1000 squares = ~20k ohms!)
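The closing estimate is a count of squares times a sheet resistance. The sketch below redoes that arithmetic; the 0.64 µm poly width and the 20 Ω/square sheet resistance are assumptions picked to reproduce the slide's round numbers (~1000 squares, ~20k ohms), not quoted process data, and the function name is hypothetical.

```python
def wire_resistance(length_um, width_um, rsq_ohm):
    """Return (number of squares, resistance in ohms) of a uniform wire."""
    squares = length_um / width_um
    return squares, squares * rsq_ohm


if __name__ == "__main__":
    bits, pitch_um = 32, 20.0          # from the slide: 32 bits x 20u/bit
    width_um, rsq_ohm = 0.64, 20.0     # assumed poly width and sheet resistance
    squares, r_ohm = wire_resistance(bits * pitch_um, width_um, rsq_ohm)
    print(f"{bits * pitch_um:.0f} um -> {squares:.0f} squares -> {r_ohm / 1e3:.0f} kohm")
```

The same helper shows why strapping the control line in metal helps: with a sheet resistance in the tens of milliohms per square the same run drops to a few tens of ohms.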
JMM/ESA v1.0
Datapath Bit Pitch
How tall should we make each bit of the datapath?
That depends on
- the width of the nfets and pfets
- how much in-cell routing there is
- how much over-the-cell global routing there is
Global routes can be determined from datapath schematic:
Three global routing Internal routing may

tracks required take additional tracks
RESULT
OP1
OP2
SHIFTER
ADDER
BOOLE
MULT
OP EN OP EN EN CIN EN
Cell routing plan: vdd (m2)
global route (m2)
in-cell route (m2) control (m1)
gnd (m2)
JMM/ESA v1.0
Adder Datapath
power strapping (M1=GND, M3=VDD)
32-bit carry-lookahead adder

tristate output enable control logic
32-bit register w/ tristate driver
JMM/ESA v1.0
Shifter Datapath
>>4 >>2 >>8
<<16 <<1 <<8 <<2 <<4

shift right
JMM/ESA v1.0
Design for Re-use
- what’s this cell do?
- what are the “fat” fets?
- Cell was designed for placement “under” a
  metal2/metal3 routing grid. How was the
  layout affected by this design requirement?
JMM/ESA v1.0
Breaking the Rules
BIT BIT
word line
- How are neighboring cells placed?
- Isn’t the word line a long poly wire?
- Where’s the p-substrate contact?
JMM/ESA v1.0
Coming Up...
Next time:
Scaling effects, fundamental limits. Submicron
design issues. Power dissipation and packaging.
Weste: 6.3.7 through 6.3.9
JMM/ESA v1.0
Predicting the Future
…I see… I see… a supercomputer

the size of a sugar cube...!
Neat. Where do I invest?
Today’s handouts:
(1) Lecture Slides
(2) Mead and Conway, Chapter 9
(1981)
JMM/ESA v1.0
Scaling
Over time, process improvements will allow MOSFETs

to scale down by some factor α
w/α tox/α
l/α
t/α
xj/α
α NA
What happens?
“Scaling Theory” is a model which provides first order

predictions.
JMM/ESA v1.0
Often, different dimensions will scale at different
rates. But for an overall picture of what the future
portends, there are two major scaling models:
1. Constant Voltage Scaling
All spatial dimensions scale equally:
W → W/α
L → L/α
tox → tox/α
and some other dimensions do as well:
d → d/α depletion thickness
NA → α NA doping
2. Constant Field - scale VDD too:
V → V/α
so that electric fields remain the same
JMM/ESA v1.0
First, let’s consider constant field scaling, and use
basic MOSFET models to predict the effect of
scaling by α
Parameters Effect
W/L
Cg = Cox W L
Id ∝ Cox (W/L) (Vgs-Vt)^2
device power = V I
Area = W L
device power / Area
Rdiff
Rmetal
Rpoly
JMM/ESA v1.0
Speedup!
L
e-
τ = L/(µE)
Transit time τ scales as ___________
Can also compute as time to discharge gate

capacitance:
delay=Cg V/I
Gate discharge time scales as _________
JMM/ESA v1.0
Interconnect
Local (metal) Interconnect Delay = RC

L
W
t
I
R = ___________
Scaled R = R ________
C = ____________
Scaled C = C ________
Scaled Delay = delay ___________
This turns out to be an overoptimistic prediction -

more later...
JMM/ESA v1.0
Scaling Table
First Order Scaling (Weste Table 4.12)
In lateral scaling, we only
change the channel length L
Parameter                     Constant Field   Constant Voltage   Lateral
Length (L)                    1/a              1/a                1/a
Width (W)                     1/a              1/a                1
Voltage (V)                   1/a              1                  1
Gate oxide thickness (tox)    1/a              1/a                1
Current                       1/a              a                  a
Transconductance              1                a                  a
Junction Depth                1/a              1/a                1
Substrate Doping (Na)         a                a                  1
Gate Field (E)                1                a                  1
Depletion layer thickness     1/a              1/a                1
Load Capacitance (WL/tox)     1/a              1/a                1/a
Gate Delay (VC/I)             1/a              1/a^2              1/a^2
Resulting Influence
DC Power dissipation          1/a^2            a                  a
Dynamic Power Dissipation     1/a^2            a                  a
Power-delay product           1/a^3            1/a                1/a
Gate Area                     1/a^2            1/a^2              1/a
Power-density (VI/A)          1                a^3                a^2
Current Density               a                a^3                a^2
Constant field: devices get faster, lower power,
though current density goes up.
Constant voltage: devices get even faster,
though overall power and
power density rise
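Several rows of the scaling table follow mechanically from the first-order relations Cg = Cox W L, I ∝ Cox (W/L) V^2 and delay = C V / I. As a hedged sketch, each quantity can be tracked by its exponent of a (so 1/a is -1, a^2 is 2); the dictionary layout and function name are illustrative choices, and only the rows derivable from these three relations are covered.

```python
# Exponent e means the quantity scales as a**e under the given model.
MODELS = {
    "constant field":   {"W": -1, "L": -1, "tox": -1, "V": -1},
    "constant voltage": {"W": -1, "L": -1, "tox": -1, "V": 0},
    "lateral":          {"W": 0,  "L": -1, "tox": 0,  "V": 0},
}


def derived_exponents(m):
    """First-order exponents derived from W, L, tox and V."""
    cox = -m["tox"]                                  # Cox ~ 1 / tox
    cg = cox + m["W"] + m["L"]                       # Cg = Cox * W * L
    current = cox + m["W"] - m["L"] + 2 * m["V"]     # I ~ Cox * (W/L) * V^2
    delay = cg + m["V"] - current                    # delay = C * V / I
    power = m["V"] + current                         # P = V * I
    area = m["W"] + m["L"]                           # gate area = W * L
    return {
        "load capacitance": cg,
        "current": current,
        "gate delay": delay,
        "power": power,
        "power-delay product": power + delay,
        "gate area": area,
        "power density": power - area,
    }


if __name__ == "__main__":
    for name, model in MODELS.items():
        print(name, derived_exponents(model))
```

Rows such as transconductance, gate field and current density depend on additional relations and are left out of this sketch.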
JMM/ESA v1.0
Die Size
With basic scaling of the same system, we’d just end
up with smaller and smaller chips.
However, from year to year, the overall die size stays

about the same or grows as we add features to the
chip.
Fab improvements (mostly, bigger wafers) are what

allow for bigger die.
Because the die doesn’t shrink, global interconnect,

particularly clocks and on-chip buses, don’t shrink
either.
JMM/ESA v1.0
Global Interconnect Scaling
Interconnect scaling for global signals:
[Figure: wire of length L, width W/α and thickness t/α at distance d]
scaled R = R * α^2
scaled C = C
scaled delay = delay * α^2
Even worse: wire starts looking like lossy distributed
rc wire - O(L^2) delay!
In the submicron domain, this increased significance
of wire has led to major CAD industry turmoil.
JMM/ESA v1.0
Power Scaling
Power per chip increases with constant voltage scaling
and when die size grows. How does this affect us?
Junction temperature is a function of power and
thermal resistance θja to environment.
Example: a 30W chip at 27 C ambient.
Junction temp. = 27C + 30W*θja
Heat through pins to PC board: θja=2 C/W
Junction temp = ___________
Heat sink: θja=0.1 C/W
Junction temp = _______
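The junction temperature relation on this slide is a one-line calculation. The sketch below evaluates it for the two θja values given; the function name is a hypothetical choice, and the model is just the linear relation stated on the slide (ambient plus power times thermal resistance).

```python
def junction_temp_c(ambient_c, power_w, theta_ja_c_per_w):
    """Junction temperature in C: ambient plus power times thermal resistance."""
    return ambient_c + power_w * theta_ja_c_per_w


if __name__ == "__main__":
    for theta_ja in (2.0, 0.1):
        t_j = junction_temp_c(27.0, 30.0, theta_ja)
        print(f"theta_ja = {theta_ja} C/W -> junction temp = {t_j:.0f} C")
```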
JMM/ESA v1.0
In the submicron domain, it’s difficult to scale VDD,
so power faces the “constant voltage” scaling of α^2
This adds impetus to the already-important goal of

reducing power of VLSI systems. Some of the main
ways of doing this:
1. Reduce unnecessary on-chip transitions by careful

logic design, or by disabling the clock to idle
systems.
2. Reduce voltage, use more parallelism.
3. Adiabatic logic.
JMM/ESA v1.0
Problems with scaling theory
Can one scale indefinitely? No.
What are the limits?
Are they fundamental limits?

Is there any difference?
Are they technical limitations?
Is the current technology close to those limits?
JMM/ESA v1.0
Some limits
Current Density J increases with α
J=I/(Wt)
scaled I = I /α
scaled J = J α
[Figure: wire of length L, width W and thickness t carrying current I]
Metal migration imposes a limit on current density.
==> Thicker wires and more metal
layers needed.
==> Increased fringing capacitance with
thicker wires.
Punchthrough: source/drain depletion regions touch
VPT=(L^2 q NA)/(2ε)
[Figure: channel of length L with depletion width Xd]
JMM/ESA v1.0
Subthreshold leakage
Subthreshold conductance is proportional to
exp( (Vgs-Vt) / (kT/q) )
We can scale Vt by α via ion implantation.
kT/q = 0.025V does not scale.
Vt falls =====> Subthreshold current
______________ exponentially.
Example: Vt = 0.5V means that leakage current time constant is
10^7 τ
Vt = 0.1V means that leakage current time constant
is 10^1 τ
JMM/ESA v1.0
Threshold Variations
Threshold varies from transistor to transistor.
VDD
Vout
If pullup has “big” threshold,

pulldown has “small” threshold,
and sum of variances > VDD
then inverter will not invert
(Vout = 0V always.)
How likely is this?
JMM/ESA v1.0
Threshold Variations, cont.
Analyze using Gaussian distribution.
P(given inverter fails) = exp(-4 VDD / ∆Vth)
For given inverter...
∆Vth = 0.08V, (Mead & Conway, p. 343)
VDD = 5V ===> P = 10^-110
VDD = 0.5V ===> P = 10^-11
But with 10,000,000 transistors on the chip, a
broken chip is very likely.
Question: Should threshold variance
increase or decrease with scaling?
JMM/ESA v1.0
Lithographic Scaling Limits
Insert p. 1110 of Halliday and Resnick Here
Ultraviolet = λ = 0.3 µ
X-Ray Lithography, λ = __________
Synchrotron lithography?
Wavelength of an electron?
Cost of FABs.
Optical tricks.
JMM/ESA v1.0
Fundamental Physical Limits
Thermodynamic
How much entropy change to set a bit?
Reversibility
Quantum Limits
Tunnelling: For Eb of 1eV, the gate oxides and
depletion layers must be thicker than 1 nm. In the
IBM 0.4um process, the gate thickness is 7 nm.
Thermal Limits
JMM/ESA v1.0
Is this the beginning of the end?
Not really.
VLSI is not yet really up against any fundamental

physical constraint.
The constraints that we’re facing are technological

hurdles.
With sufficient economic incentive, technological

hurdles are cleared.
Wires are a lot more important than in the past.
JMM/ESA v1.0
Coming Up...
Next topic…
MOS memories. Static and dynamic RAM cells.
Single and double-ended bit line sensing.
Multiport register files.
Weste: 4.13
JMM/ESA v1.0
CMOS Memories
I wonder which part

does the remembering?
Today’s handouts:
(1) Lecture Slides
JMM/ESA v1.0
Semiconductor Memories
Usually the majority of transistors found in a modern system are devoted
to data storage in the form of random-access memories. The need for
increased densities and lower prices has driven the development of
improved VLSI technology.
Uses:
“main” memory ⇒ high capacity, low cost
cache memories, TLB’s ⇒ fast access
programming info (eg, FPGA) ⇒ non-volatile
Read-only memories: ROM (non-volatile!)

Mask programmed
Programmable ROM (PROM)
Erasable PROM (EPROM)
Electrically Erasable PROM (EEPROM)
Read/Write or Random Access memories: RAM
Static RAM (SRAM)
Multiport SRAM (Register Files)
Content-Addressable Memories (CAM)
Non-volatile SRAM (NVRAM)
Dynamic RAM (DRAM)
Serial-access video memories (VRAM)
Synchronous DRAM (SDRAM)
RAMBUS
...
JMM/ESA v1.0
Design Tradeoffs
density: bits/unit area. Usually higher density
also means lower cost per bit. Improvements due
to finer lithography, better capacitor structures,
new materials with higher dielectric constants.
Speed: access time (latency) and
bandwidth. Improvements due to
better sensing (smaller voltage
swing), increased parallelism
(overlapped accesses), faster I/O.
Power consumption: want power to
depend on access pattern not
quantity of bits stored.
Improvements due to lower supply
voltage.
Improvements in one dimension come
at an increased cost in the other
dimensions.
JMM/ESA v1.0
Memory Architecture
[Figure: memory array of one-bit memory cells with rows 1 to 2^N (word lines) and columns 1 to 2^M (bit lines); N address bits feed the Row Address Decoder, M address bits feed the Column Decoder, which connects to DATA; N+M address bits in total]
- Most memory layouts are “folded”, i.e., D < M. Why?
- What are the practical upper bounds on M and N?
- What if you want even more memory?
- Why only one bit per cell? (Not a silly question!)
- Why are “page-mode” accesses a good idea?
JMM/ESA v1.0
ROM Circuits
NOR-based ROM array
[Figure: NOR-based ROM array with word lines R1 to R4, bit lines C1 to C4, shared ground and shared bit line contact]
R1 R2 R3 R4 | C1 C2 C3 C4
1  0  0  0  | 0  1  0  1
0  1  0  0  | 0  0  1  1
0  0  1  0  | 1  0  0  1
0  0  0  1  | 0  1  1  0
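The truth table can be turned into a small behavioural model of the array. This is a sketch under an assumed convention: each bit line is pulled high by default and a pulldown fet sits wherever the stored bit is 0, so the bit line goes low when its row is selected. The real array could equally place pulldowns at the 1 positions and invert at the output; the names below are hypothetical.

```python
# Stored words, one per word line; bits are columns C1..C4 (from the table above).
ROM_WORDS = {
    "R1": (0, 1, 0, 1),
    "R2": (0, 0, 1, 1),
    "R3": (1, 0, 0, 1),
    "R4": (0, 1, 1, 0),
}

# Assumed programming convention: a pulldown is present where the stored bit is 0.
PULLDOWNS = {row: tuple(bit == 0 for bit in word) for row, word in ROM_WORDS.items()}


def read_rom(active_rows):
    """Return the bit-line values when the given word lines are driven high."""
    n_cols = len(next(iter(PULLDOWNS.values())))
    outputs = []
    for col in range(n_cols):
        # A bit line is low if any selected row has a pulldown on it (wired NOR).
        pulled_low = any(PULLDOWNS[row][col] for row in active_rows)
        outputs.append(0 if pulled_low else 1)
    return tuple(outputs)


if __name__ == "__main__":
    for row in ROM_WORDS:
        print(row, read_rom([row]))
    print("none selected:", read_rom([]))
```

With no row selected every bit line stays high, and selecting two rows at once yields the bitwise AND of their words, which is the NOR behaviour of the shared bit lines.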
JMM/ESA v1.0
ROM Layout
[Figure: ROM layout showing VDD, GND, shared contact, shared ground, a cell with pulldown, a cell with no pulldown, and the ground and word line refresh]
- Which are the word lines? the bit lines?
- Why are the word lines “strapped” with M2?
- What layers change when programming changes?
- How often should signals be refreshed?
JMM/ESA v1.0
ROM Performance
$t_{ACCESS} = t_{ROW\ DECODE} + t_{COLUMN} + t_{COL\ DECODE}$
$t_{ROW\ DECODE}$:
If the ROM is large, row decode logic is just a small percentage of total area. So we can make the driver for the word line large and thus fast. Note that we need to strap the poly word line to eliminate slowdown due to poly resistance.
$t_{COL\ DECODE}$:
As with the row decode logic, we can increase speed by increasing the size of transistors in this section.
$t_{COLUMN}$:
We want small program transistors to keep the total area of the ROM as small as possible. Also, increasing the size of the pulldowns increases the load on both word and bit lines. This means we’re limited in the speed we can achieve in pulling down the column.
If $C_{PD,DRAIN}$ = 10fF and we have 128 rows:
$t_{COLUMN} = C\,\Delta V / I_{AV} = (10\,\mathrm{fF})(128)(2.5\,\mathrm{V})/(30\,\mu\mathrm{A}) \approx 110\,\mathrm{ns}$
Too slow! Which of these can we fix?
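The column delay estimate above can be checked numerically. The sketch below reuses the slide's values (10 fF per drain, 128 rows, 30 uA, 2.5 V); the 0.25 V swing is an assumed, illustrative value that shows how the delay scales when only a small bit line change has to be detected.

```python
# Bit line discharge estimate t = C * dV / I, using the slide's numbers.
C_PD_DRAIN = 10e-15   # F per pulldown drain (from the slide)
ROWS = 128            # pulldown drains loading one bit line (from the slide)
I_AV = 30e-6          # A, average pulldown current (from the slide)

def column_delay(delta_v, c_per_row=C_PD_DRAIN, rows=ROWS, i_av=I_AV):
    """Time for one pulldown to move the bit line by delta_v volts."""
    return c_per_row * rows * delta_v / i_av

t_full = column_delay(2.5)     # full 2.5 V swing, as on the slide
t_small = column_delay(0.25)   # assumed 0.25 V swing (illustrative only)

print(f"full swing:  {t_full * 1e9:.1f} ns")
print(f"small swing: {t_small * 1e9:.1f} ns")
```

The delay is linear in the voltage swing, so reducing the swing that must be detected is the cheapest of the three terms to attack.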

JMM/ESA v1.0
Sense Amplifiers
Let’s speed things up by sensing small changes in the bit line voltage
using a sense amplifier:
[Figure: word lines R1, R2 and bit lines C0, C1 feed a column (tree) decoder; a swing of tenths of a volt on the selected bit line is amplified by the SENSE AMP to a full rail-to-rail swing.]
JMM/ESA v1.0
Single-ended Sense Amp
Choose fet sizes so that
M2, MD >> MC >> M1
M3 >> M4
[Figure: single-ended sense amp schematic with fets M1 to M4, MD, MC and nodes 1 and 2. Labels: voltage reference (fets sized to produce VREF = 3V); series fets in column decoder (MD); bit line (pullup built into sense amp); memory cell pulldowns (MC, connected to bit line); word line enables pulldown when row is selected.]
When the bit line is not pulled down, V1 = VDD and V2 = VREF - Vth = 2V, so M3 is off and M4 is on and the output is pulled low.
When a bit line pulldown is turned on, V2 starts to drop and M2 conducts well enough that V1 drops to V2, since MC >> M1. When V1 and V2 drop 0.5V to 1.5V, M3 is strongly conducting and M4 is weakly conducting, so the output goes high. So a small ∆V on the bit line produces a large output swing.
JMM/ESA v1.0
SRAM Circuits
[Figure: 6-T SRAM cell and differential sense amp. Labels on the cell: precharge or VDD; static bistable storage element; access fet; word line; bit, bit. Labels on the sense amp: rdata; tie bulk to source if possible; long-channel fet used as current source; precharge or VDD; clocked cross-coupled sense amp; clk; use CLK if possible to reduce power and improve write speed; wdata.]
JMM/ESA v1.0
6-T SRAM Cell Layout
[Figure: 6-T SRAM cell layout. Labels: VDD, inverter pullup, inverter pulldown, GND, access fet, strapped word line, bit line, bit line.]
Pulldowns do the work when the access fet is turned on; pullups can be small to save space and make the cell easy to write.
JMM/ESA v1.0
SRAM Read Cycle
[Figure: 6-T SRAM cell during a read, with bit lines pulled up to VDD, and waveforms (volts versus time) of word, bit, bit and internal node 1. Annotations: cell pullup has no real effect; make this big; keep away from inverter threshold.]
Choose WPU, WACCESS, WINV so that:
fast bit line recovery when WORD goes low
don’t want to “flip” selected cell on read (V1 < VTH,INV)
large ∆V on BIT lines to speed up sensing
minimize cell size
JMM/ESA v1.0
Differential Sense Amp
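[Figure: differential sense amp schematic with output rdata, inputs bit and bit, nodes 1 (V1), 2 (V2) and 3. Four fets sized 4.8/0.6; a long-channel fet sized 0.9/7.2 is used as the current “source”. Labels: VDD, VCS.]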
JMM/ESA v1.0
Fast Address Decoding
Logically, row/column decoders can be built from wide fan-in AND gates. But these are slow, place heavy loading on address wires and may be hard to fit into the pitch of the memory cell.
One can use predecode logic to decode blocks of addresses which are then further decoded using smaller AND gates. The address lines going to the predecode gates are less loaded and all gates have smaller fan-in ⇒ decode happens faster. Layout works better too!
[Figure: two decoders for address bits A2 A1 A0, one built from wide fan-in AND gates and one using predecode logic.]
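The equivalence of the two decoder styles can be illustrated with a small behavioral model. This is a sketch only: the function names and the group size of 2 bits are assumptions made for illustration, not taken from the slides.

```python
from itertools import product

def decode_flat(addr_bits):
    """One wide AND per row: row i is high when every address literal matches."""
    n = len(addr_bits)
    rows = []
    for i in range(2 ** n):
        literals = [addr_bits[k] if (i >> k) & 1 else 1 - addr_bits[k]
                    for k in range(n)]
        rows.append(int(all(literals)))
    return rows

def decode_predecoded(addr_bits, group=2):
    """Predecode groups of bits to one-hot lines, then AND one line per group."""
    n = len(addr_bits)
    groups = [addr_bits[k:k + group] for k in range(0, n, group)]
    onehots = [decode_flat(g) for g in groups]      # small predecoders
    rows = []
    for i in range(2 ** n):
        sel, shift = 1, 0
        for g, onehot in zip(groups, onehots):
            idx = (i >> shift) & ((1 << len(g)) - 1)
            sel &= onehot[idx]                      # narrow final AND
            shift += len(g)
        rows.append(sel)
    return rows

# Both decoders select exactly the same single row for every 4-bit address.
for bits in product((0, 1), repeat=4):
    assert decode_flat(list(bits)) == decode_predecoded(list(bits))
print("predecoded and flat decoders agree")
```

With 4 address bits the flat decoder needs 4-input ANDs on every row, while the predecoded version needs only 2-input ANDs at both levels.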
JMM/ESA v1.0
Multiport SRAM (Reg File)
One can increase the number of SRAM ports by adding access transistors. Writes are usually double-ended; single-ended reads can be used to save space.
[Figure: multiport SRAM cell with word lines write, read0, read1 and bit lines rd0, wd, wd, rd1.]
An alternative design that can be easily expanded without worrying about unintentionally flipping the cell on reads is shown below.
[Figure: alternative multiport cell with bit lines wd, rd0, rd1 and word lines write, read0, read1. Size labels: PU = 2/1, 2/1, PD = 4/1, 4/1, 2/1, 5/1, PU = 2/2, PD = 2/3.]
JMM/ESA v1.0
Content-addressable RAM
By adding two transistors to the 6-T SRAM cell one can form an XOR gate to compare the cell contents to data on the bit lines. The output of this logic can drive a pulldown in a distributed NOR gate to form a word “match” signal for a content-addressable memory (CAM).
[Figure: CAM cell with word line, xor gate and match line. The xor output node goes high if data on the bit lines doesn’t match data in the cell. The match node will be pulled down if any bit of the word doesn’t match.]
Read and Write cycles: like before…
Match cycle: place data on bit lines but don’t assert word line.
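The match cycle can be modeled behaviorally as a per-bit XOR followed by a NOR across each word. The following is a minimal sketch; the function name and the stored tags are hypothetical and only serve as an illustration.

```python
def cam_match(stored_words, search_bits):
    """Return one match flag per stored word (1 = match line stays high)."""
    matches = []
    for word in stored_words:
        # Per-cell XOR: 1 where the bit line data differs from the stored bit.
        mismatch = [s ^ d for s, d in zip(word, search_bits)]
        # Distributed NOR: any mismatching cell pulls the match line down.
        matches.append(int(not any(mismatch)))
    return matches

tags = [[1, 0, 1, 1],
        [0, 0, 1, 0],
        [1, 1, 0, 0]]           # hypothetical CAM contents
print(cam_match(tags, [0, 0, 1, 0]))
print(cam_match(tags, [1, 1, 1, 1]))
```

A search word that matches no stored tag leaves every match line low, which is how a miss is signalled.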
JMM/ESA v1.0
CAM Architecture
[Figure: Weste, figure 8.76(b)]
The word match lines from the CAM array can be used as WORD lines in a companion RAM to read out other data associated with the tag stored in the CAM. Uses: fully-associative caches, translation lookaside buffers (TLBs), ...
JMM/ESA v1.0
3-T Dynamic RAM
[Figure: 3-T DRAM cell with precharge fets, write and read word lines, the wdata line (capacitance CW), the rdata line (capacitance CR) and the storage capacitance CC.]
Precharge happens before each r/w cycle. READ/WRITE and PRECHARGE don’t overlap.
Data is stored on CC. It’s not destroyed on read, but will leak away through the write transistor. CW >> CC.
WRITE: After precharge, CW is charged high. When WRITE is asserted, CW shares charge with CC and dominates since CW >> CC. If WDATA is asserted, both CW and CR will be discharged, writing a “0” into the cell; otherwise a “1” will be written.
READ: After precharge, CR is charged high. When READ is asserted, CR is pulled low if there’s a stored “1” or remains unchanged if there’s a stored “0”. A sense amp is usually used to speed up the availability of read data.
Pros: little or no static power, smaller than SRAM
Cons: needs refresh, needs time to precharge
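The statement that CW dominates during a write follows from charge conservation. The sketch below uses a normalized supply and an assumed capacitance ratio of 10; both values are illustrative assumptions, not numbers from the slides.

```python
def share_charge(c1, v1, c2, v2):
    """Voltage after two capacitors charged to v1 and v2 are connected."""
    return (c1 * v1 + c2 * v2) / (c1 + c2)

VDD = 1.0        # normalized supply (assumed)
CC = 1.0         # storage capacitance, arbitrary units (assumed)
CW = 10.0 * CC   # write bit line capacitance, CW >> CC (ratio assumed)

# Writing a "1": precharged bit line meets a cell that held 0 V.
v_cell = share_charge(CW, VDD, CC, 0.0)
print(f"cell voltage after write: {v_cell:.3f} * VDD")
```

The larger the ratio CW/CC, the closer the cell voltage ends up to the precharged bit line level.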
JMM/ESA v1.0
1-T Dynamic Ram
1-T DRAM Cell
Explicit storage capacitor (fet gate, trench, stack) = 30fF to 100fF. If we want higher C, with $C = \varepsilon A / d$:
better dielectric
more area
thinner film
[Figure: 1-T DRAM cell (word line, access fet, bit line, storage capacitor tied to VREF) and cross-section of a “Stack” DRAM cell (TiN top electrode (VREF), Ta2O5 dielectric, W bottom electrode, poly word line, access fet).]
JMM/ESA v1.0
1-T DRAM Read Cycle
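[Figure: 1-T DRAM read circuit and timing. Left and right bit lines (lbit, rbit) with rows R1, R2 and R129, R130, cell capacitors C, dummy cells of C/2 selected by DSL and DSR, precharge (PC) to VDD and column select (CS). The read out of the dummy cell is half way between the “0” and “1” value. Waveforms: lbit, rbit; precharge (PC); row sel (RN); dummy sel (DSL,R); column sel (CS). Phases: precharge bit lines and discharge dummy cells; read out bit and opposite dummy cell; amplify difference and restore bit.]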
JMM/ESA v1.0
Coming Up...
Next time:
Driving large loads:
I/O circuits (edge rates, ESD protection, latch up)
Clock generation and distribution (skew)
Weste: 5.4.2, 5.5, 5.6
JMM/ESA v1.0
VLSI Design I
Defect Mechanisms and Fault Models
He’s dead Jim...
Overview
Defects
Fault models
Goal: You know the difference between design and
fabrication defects. You know sources of defects
and you can estimate yield. You can handle fault
models at different abstraction levels.
JMM v1.4
Design Defects
?
Design Specification
it helps to have a specification to compare against!
if the specification is written in a hardware description language from which the design is synthesized, then the design should be defect-free (modulo bugs in the synthesis software!). Of course the specification may be buggy...
everyone feels better if the design/specification are “run” in the environment in which they will be used. For example, in testing a processor chip, one might boot the operating system and run some key programs, all under simulation. This leads to the need for lots of simulation cycles, e.g., as provided by a hardware emulation system. Nowadays these are built using a small army of FPGAs. Other choices: in-circuit emulation, cycle-based simulators.
JMM v1.4
Manufacturing Defects
Goal: verify every gate is operating as expected
Defects from misalignment, dust and other particles, “stacking” faults, pinholes in dielectrics, mask scratches & dirt, thickness variations ⇒ layer-to-layer shorts, discontinuous wires (“opens”), circuit sensitivities (VTH, LCHANNEL). Find during wafer probe.
Defects from scratching in handling, damage during bonding to lead frame, manufacturing defects undetected during wafer probe (particularly speed-related problems). Find during testing of packaged parts.
Defects from damage during board insertion (thermal, ESD), infant mortality (manufacturing defects that show up after a few hours of use). Also noise problems, susceptibility to latch-up... Find during testing/burn-in of boards.
Defects that only appear after months or years of use (metal migration, oxide damage during manufacture, impurities). Found by customer (oops!).
Cost of replacing a defective component increases by an order of magnitude with each stage of manufacture.
JMM v1.4
Production defects in CMOS circuits
a lot of complex processing steps are used to manufacture a chip -> defects
defects and their effect depend on circuit topology and process
knowledge of the chemical and physical mechanisms that lead to defects is essential
circuit complexity and surface determine testability and yield
testability and yield are key factors for future VLSI technologies
JMM v1.4
VLSI fabrication process
fabrication process consists of a sequence of well defined process steps
50 wafers form a batch
each wafer contains 100's or 1000's of chips
specific test chips are distributed on the wafers
test chips allow process parameters to be monitored
the test structures are measured between sets of process steps
[Figure: process flow. The layout and the process control parameters enter the process steps; monitor steps check the geometrical structures and tolerances; a wafer within the controlling tolerances goes on to further processing, otherwise the wafer is not further processed; disturbances from the changing environment act on the process.]
JMM v1.4
VLSI fabrication process (con’t)
chip fabrication tests:
process parameters: oxide thickness, distances of structures, etc.
electrical parameters: currents, resistances, threshold voltages, ...
chip test on wafer
packaged chip test
[Figure: test flow from layout through wafer fabrication (measuring parameters of test chips, measuring of process parameters, parameter and function test of chips on wafer) to bonding and packaging (parameter and function test of packaged chips), with disturbances and controlling acting on the process.]
JMM v1.4
VLSI fabrication process (con‘t)
parameter test
test of electrical parameters: current consumption,
quiescent currents, voltage levels, delay times, etc.
function test
test for logical faults: binary test sequences are applied
to the device under test (DUT)
JMM v1.4
Defect classification
defects occur at different fabrication steps:

defects at wafer fabrication
defects at chip packaging
defects during chip lifetime
JMM v1.4
Defects at wafer fabrication
50% of all defects
reason:
changes in fabrication environment
substrate inhomogeneities, mask misalignment
dust particles, photolithography defects
local or global effects
electrical effects depend on layout topology
changes in delay, current consumption
shorts, opens
JMM v1.4
Defect at chip packaging
reasons:
bonding problems
mechanical stress
effect:
normally occur at primary inputs or outputs
easy to detect
JMM v1.4
Defects during lifetime
time-dependent mechanisms lead to defects
early defects: high defect rate (burn-in)
middle life phase: low defect rate
wear defects: defect rate climbs with time
[Figure: defect rate versus time, showing early defects, the middle life phase and wear defects.]
JMM v1.4
Yield modeling
defects can produce faults
yield is the percentage of fault-free chips
yield influences chip cost
⇒ yield models are necessary to predict chip cost
local defects produce most faults
assumption: local defects are statistically independent and occur with probability p
⇒ binomial distribution
Pr{K = k} = Pr{k from n areas are faulty}
⇒ due to Bernoulli:
$\Pr\{K = k\} = \binom{n}{k} p^k (1 - p)^{n-k}$
with n to infinity and p to zero (np = λ) we find
$\Pr\{K = k\} = \frac{\lambda^k}{k!} e^{-\lambda}$
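The limit from the binomial to the Poisson distribution can be checked numerically. This sketch keeps λ = np fixed at an arbitrarily chosen value of 2 and increases n; the function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Pr{K = k} for n independent areas, each faulty with probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Limit of the binomial for n -> infinity, p -> 0, n * p = lam."""
    return lam ** k * exp(-lam) / factorial(k)

lam = 2.0   # expected number of defects per chip (arbitrary example value)
for n in (10, 100, 10000):
    p = lam / n
    err = abs(binomial_pmf(0, n, p) - poisson_pmf(0, lam))
    print(f"n = {n:5d}: |binomial - poisson| at k = 0 is {err:.6f}")
```

The difference shrinks as n grows, which is why the Poisson form is used for yield estimates with many small, independent defect sites.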
JMM v1.4
Yield modeling (con’t)
expectation value: $E\{K\} = \sum_{k=0}^{\infty} k \frac{\lambda^k}{k!} e^{-\lambda} = \lambda$
probability that a chip is fault free (with λ = DA): $\Pr\{K = 0\} = e^{-DA}$
Murphy: normalized density function f(D)
$Y = \int_0^{\infty} e^{-AD} f(D)\, dD$
calculation of yield with Murphy's density function f(D): Y1, Y2, Y3 ?
[Figure: density functions f1, f2, f3 plotted as f(D) versus D; axis marks 1/D0 and 1/(2 D0) on f(D), and 0, D0, 2 D0 on D.]
(for high yield) $Y_2 = \left( \frac{1 - e^{-A D_0}}{A D_0} \right)^2$
Seed's yield model (for low yield): $Y = e^{-\sqrt{A D_0}}$
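The yield formulas can be evaluated for the values of exercise vlsi19.1 (areas of 5 mm² and 1 cm², 2 defects per cm²). The sketch assumes that the low-yield (Seeds) model has the form Y = exp(-sqrt(A·D0)), which is the form that reproduces the 0.24 quoted in that exercise; the function names are illustrative.

```python
from math import exp, sqrt

def yield_poisson(area, d0):
    """Fault-free probability for a fixed defect density: exp(-A * D0)."""
    return exp(-area * d0)

def yield_murphy_y2(area, d0):
    """High-yield formula Y2 = ((1 - exp(-A*D0)) / (A*D0))**2."""
    x = area * d0
    return ((1.0 - exp(-x)) / x) ** 2

def yield_seeds(area, d0):
    """Low-yield model, assumed here as exp(-sqrt(A * D0))."""
    return exp(-sqrt(area * d0))

D0 = 2.0                      # defects per cm^2 (exercise vlsi19.1)
small, large = 0.05, 1.0      # areas in cm^2: 5 mm^2 and 1 cm^2

print(f"Y(5 mm^2), high-yield formula: {yield_murphy_y2(small, D0):.2f}")
print(f"Y(1 cm^2), low-yield formula:  {yield_seeds(large, D0):.2f}")
print(f"Y(1 cm^2), plain exp(-A*D0):   {yield_poisson(large, D0):.2f}")
```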
JMM v1.4
Yield modeling (con‘t)
the bigger the circuit, the higher the probability of a faulty chip
example: 2 wafers with the same 17 defects
wafer with a total of 44 chips ⇒ yield 61%
wafer with a total of 316 chips ⇒ yield 95%
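The two percentages follow if each of the 17 defects is assumed to kill a different chip. That assumption is not stated on the slide, but it reproduces both numbers:

```python
def wafer_yield(total_chips, defects):
    """Yield when every defect lands on a different chip (assumption)."""
    return (total_chips - defects) / total_chips

for chips in (44, 316):
    print(f"{chips:3d} chips, 17 defects: yield {wafer_yield(chips, 17):.0%}")
```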
JMM v1.4
VLSI fabrication process: conclusion
defects occur during wafer fabrication, chip packaging and during chip lifetime
local and global defects
local defects dominate in a mature process
local defects are hard to find and costly
JMM v1.4
Fault models for integrated circuits
complex circuits need more test time
test time with expensive equipment leads to high test cost per chip
to reduce test time, fault models for structured test approaches are required
if a system does not behave as expected, faults are present
faults can be modeled at different electrical levels
faults can be caused by defects
they occur during fabrication or lifetime
design errors produce design faults
for example, faulty logic implementation of functions
⇒ design validation is necessary
JMM v1.4
Fault models: Testing approaches
Plan: supply a set of test vectors that specify an input or output value for every pin on every cycle. The tester will load the program into the pin cards, run it and report any discrepancies between an observed output value and the expected value.
cycle #   program for 11 pins
0000      1 10 0000 XXXX
0001      1 10 0000 LLLL
0002      1 01 1111 LLLL
0003      1 00 1011 HLHL
input to chip = {0, 1}; output from chip = {L, H}; tri-state/no compare = {X}
How many vectors do we need?
[Figure: combinational logic with n inputs: 2^n inputs required to exhaustively test the circuit. Combinational logic with n inputs and m fed-back bits: 2^(n+m) inputs required to exhaustively test the circuit.]
If n=50, m=25, 1ns/test, then test time > 10^6 years.
Exhaustive testing is not only impractical, it’s not necessary! Instead we only need to verify that no faults are present, which may take many fewer vectors.
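The test time estimate is simple arithmetic and can be reproduced as follows. The function name is illustrative, and a year is taken as 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_test_years(n, m, seconds_per_test=1e-9):
    """Years needed to apply all 2**(n + m) input/state combinations."""
    return 2 ** (n + m) * seconds_per_test / SECONDS_PER_YEAR

years = exhaustive_test_years(50, 25)
print(f"n=50, m=25 at 1 ns/test: {years:.2e} years")
print(f"n=50, m=0  at 1 ns/test: {exhaustive_test_years(50, 0) * 365.25:.1f} days")
```

The purely combinational case with 50 inputs is already measured in days; adding 25 more bits multiplies the time by 2^25.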
JMM v1.4
Fault models: abstraction level
circuits are treated at different abstraction levels

analog or memory circuits are treated at transistor level
medium size digital circuits are treated at logic level
complex digital circuits or microprocessors are normally
treated at functional level
example of fault manifestation: missing polysilicon material
layout level: ex. missing polysilicon
electrical level: ex. open interconnection
transistor level: ex. permanently short-circuited transistor (if missing polysilicon gate)
logic level: ex. permanent logic level "1"
functional level: ex. register not resettable ...
JMM v1.4
Fault models (con‘t)
fault dependencies
faults are layout dependent
faults are technology dependent
goals of fault models

fault models should be realistic and thus depend on
physical defect mechanisms
fault models should be simple and treatable
JMM v1.4
Hard to detect faults
transient (intermittent) faults
occur only from time to time
due to a changing environment
no satisfactory strategy to search for them
repeated search
built-in test: self-checking circuits, error-correcting circuits
redundant use of several identical circuit blocks
benefits of redundant circuits

redundancy for higher functionality security
redundancy to eliminate hazards
disadvantages of redundant circuits

faults not detectable (masking effect)
JMM v1.4
Logic level fault models
historical perspective
In 1959 Eldred proposed methods to test computers built from relays, diodes and tubes, which behaved like switches
⇒ stimulation of development of fault models on logic level
stuck-at fault model
signal can be stuck at "0" or "1"
independent of process technology
does not model technology-dependent characteristics
mathematical calculus exists
very useful for TTL technology (or other old "current" technologies, but not for "charge" technologies like CMOS)
JMM v1.4
Logic level fault models (con‘t)
Traditional model, first developed for board-level tests, assumes that a node gets “stuck” at a “0” or “1”, presumably by shorting to GND or VDD.
stuck at “0” = S-A-0 = node@0
stuck at “1” = S-A-1 = node@1
[Figure: 4-input AND gate with inputs A, B, C, D and output Z; the fault site X is on input B.]
Z = ABCD
Z(B@1) = ACD
Z(B@0) = 0
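The effect of the two stuck-at faults on input B can be reproduced by exhaustive simulation of the 4-input AND gate. This is a minimal sketch; the function names are illustrative.

```python
from itertools import product

def z_good(a, b, c, d):
    """Fault-free gate: Z = ABCD."""
    return a & b & c & d

def z_b_stuck(a, b, c, d, stuck):
    """Same gate with input B forced to the value `stuck`."""
    return a & stuck & c & d

def detecting_vectors(stuck):
    """All input vectors (A, B, C, D) on which the fault changes Z."""
    return [v for v in product((0, 1), repeat=4)
            if z_good(*v) != z_b_stuck(*v, stuck)]

print("B@1 detected by:", detecting_vectors(1))
print("B@0 detected by:", detecting_vectors(0))
```

Each fault is detected by exactly one of the 16 vectors: the one that sets B opposite to the stuck value and all other inputs to 1.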
example of a TTL NAND gate with many defects describable with the stuck-at fault model
[Figure: TTL NAND gate schematic with inputs I1, I2, output O, resistors R1 to R4 and transistors T1 to T4.]
JMM v1.4
Fault reduction
fault collapsing
fault equivalence
fault dominance
single faults, multiple faults
fault detection
fault-free function: f(x)
function with fault α: fα(x)
test vector x detects the fault if the condition is fulfilled: $f(x) \oplus f_\alpha(x) = 1$
fault equivalence: $f_\beta(x) = f_\alpha(x)$
fault dominance: $T_\beta \subset T_\gamma$ (fault β dominates γ)
[Figure: two-input gate with inputs A, B and output C, with its truth table.]
fault classes (α/1 means A stuck-at-1; <=> equivalence, => dominance):
α/1 <=> β/1 <=> γ/1
β/0 => γ/0
α/0 => γ/0
JMM v1.4
Logic level fault models
fault dominance
Tα represents test vector set to detect fault α
fault α dominates fault γ under condition
Tα ⊂ Tγ
for test generation only tests for fault α are
necessary
multiple faults: fault masking problems
JMM v1.4
Transistor level fault models
introduced due to imperfection of logic level fault models, especially for CMOS
technology dependent and thus more realistic
more complex to handle and thus not useful for large circuits
transistor level fault models:
Wadsack's model
Hayes' switch level model
Reddy's restrictions due to static discharge
⇒ robust test sets
JMM v1.4
Transistor level fault models (con‘t)
Wadsack's fault models for CMOS:
defects can lead to memory effects
faulty combinational logic may behave like sequential logic
this effect was modeled by introducing flip-flops in order to use stuck-at models
⇒ stuck-at syndrome!
[Figure: CMOS gate with inputs A, B and output Y; stuck-open fault sites asop, bsop and vddsop.]
      fault free | stuck-at      | stuck-open
A B | Y          | α/0 β/0 γ/0   | a b vdd
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
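The memory effect of a stuck-open fault can be shown with a switch-level model of a CMOS NOR gate (the fault-free column of the table is a NOR). This is a sketch under assumptions: the fault modeled is an open in the pulldown nFET driven by A, and a floating output is assumed to keep its previous value.

```python
def cmos_nor(a, b, prev_y, n_a_open=False):
    """CMOS NOR; with the A pulldown stuck open the output can float."""
    pull_up = (a == 0 and b == 0)                      # series pFETs conduct
    pull_down = (a == 1 and not n_a_open) or b == 1    # parallel nFETs
    if pull_up:
        return 1
    if pull_down:
        return 0
    return prev_y    # neither network conducts: node keeps its old charge

def run(sequence, n_a_open):
    """Apply (A, B) vectors in order and collect the output values."""
    y, outputs = 0, []
    for a, b in sequence:
        y = cmos_nor(a, b, y, n_a_open)
        outputs.append(y)
    return outputs

# The fault shows only if the previous vector left the output at 1.
print(run([(0, 0), (1, 0)], n_a_open=False), run([(0, 0), (1, 0)], n_a_open=True))
print(run([(0, 1), (1, 0)], n_a_open=False), run([(0, 1), (1, 0)], n_a_open=True))
```

The same vector (1, 0) detects the fault after (0, 0) but not after (0, 1), so a stuck-open fault needs an ordered pair of vectors rather than a single one.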
JMM v1.4
Functional level fault models
VLSI circuits need simple fault models
goal of test: it is sometimes sufficient to know if a sub-function works correctly
⇒ model of functional faults of sub-circuit
each sub-function has its own process-dependent faults
advantage:
fast simulation
short test time
process dependent
good knowledge on important sub-functions (ex. RAMs)
disadvantage:
less accurate
not useful for all sub-functions
JMM v1.4
Functional level fault models:
example
example of CMOS multiplexer with n inputs:
behavior under faults:
another input is selected
one of the n inputs has a stuck-at fault
two inputs are selected (AND or OR result at output)
if the complementary value arrives at a selected input, an error occurs at the output
if the complementary value of the selected input arrives at a neighbor of the selected input, an error occurs at the output
[Figure: 8-to-1 MUX with data inputs A0 to A7, select inputs S0, S1, S2 and output Y.]
JMM v1.4
Fault models summary
fault models are used to model the effects of fabrication defects on abstract levels
fault models allow a direct search for circuit defects
fault models need to be simple and precise
CMOS defects are badly modeled by the stuck-at fault model
JMM v1.4
Coming Up...
Next topic…
Test pattern generation and fault simulation
Weste: reading 7 through 7.2.1
JMM v1.4
Exercises: VLSI-19 #1
Ex vlsi19.1 (difficulty: easy): Calculate the yield of a circuit of area 5 mm² and 1 cm² if the defect rate D is 2 defects per cm².
Result: Y5mm2=0.91 (high yield), Y1cm2=0.24 (low yield equation), see vlsi-19/13
Ex vlsi19.2 (difficulty: easy): Discuss the circuit's function with the introduction of the stuck-open fault Fx=open. Can this fault be modeled by a stuck-at fault?
[Figure: switch network with elements A, B, C, D; the fault X is marked on one branch.]
F = (A+C)(B+D)
F(X=OPEN) = __________
JMM v1.4
Exercises: VLSI-19 #2
Ex vlsi19.3 (difficulty: easy):
Result: Y5mm2=0.91 (high yield), Y1cm2=0.24 (low yield equation), see vlsi-19/13
Ex vlsi19.4 (difficulty: easy): Discuss faults due to defects at the TTL NAND gate on transparency 22. What kind of stuck-at fault do you have if a) R1 is an open circuit, b) open at I1, c) open in R2
Result: a) O s-a-1, b) I1 s-a-1, c) O s-a-1
JMM v1.4
VLSI Design I
Test Pattern Generation and Fault Simulation
Let‘s test a chip?
Overview
Test pattern generation
Fault simulation
Goal: Design for testability terms like
controllability and observability are known. You are
familiar with test pattern algorithms as well as
with testability measure metrics.
JMM v1.4
Testers
The device under test (DUT) can be a site on a wafer or a packaged part.
[Figure: tester with 100’s of channels of pin circuitry.]
Each pin on the chip is driven/observed by a separate set of circuitry which typically can drive the pin to one data value per cycle or observe (“strobe”) the value of the pin at a particular point in a clock cycle. Timing of input transitions and sampling of outputs is controlled by a small (<< # of pins) number of high-resolution timing generators. To increase the number of possible input patterns, different data “formats” are provided:
[Figure: waveform of each format over one tCYCLE: non-return-to-zero (NRZ) data; return-to-zero (RTZ) data; return-to-one (RTO) data; surround-by-complement (SBC) ~data, data, ~data.]
JMM v1.4
Test pattern generation
test generation is a time-consuming task
computer-aided test programs (CAT) help the designer but do not solve test problems
approaches to manage the test problem with increasing circuit complexity (research fields)
design for testability
algorithms to generate good test vectors
design for testability: controllability, observability
system designer needs DFT knowledge
ad-hoc approaches to augment controllability: partitioning, more test pads
structured methods: multiplexer approach, scan-path, built-in logic block observation (BILBO), boundary-scan, signature analysis, etc...
JMM v1.4
Algorithms for test pattern
generation
basic concepts for test generation for stuck-at fault models in combinational circuits
algebraic test generation: boolean difference
D-algorithm
Podem and FAN algorithms
controllability and observability measuring
JMM v1.4
Boolean difference
algebraic method: boolean difference
circuit function with input vector x: $f(x) = f(x_1, \ldots, x_n)$
for the ith component of vector x with fixed value we define
$f_i(1) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n)$
$f_i(0) = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$
definition of boolean difference
$\frac{\partial f(x)}{\partial x_i} = f(x_1, \ldots, x_i, \ldots, x_n) \oplus f(x_1, \ldots, \bar{x}_i, \ldots, x_n)$
$\frac{\partial f(x)}{\partial x_i} = f_i(0) \oplus f_i(1)$
circuit with fault α: stuck-at-1 at input xi
$f_\alpha(x) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) = f_\alpha(1)$
to detect s-a-1 faults the two functions f(x) and fα(1) must produce different results, so the test vector set is defined by T=1:
$T = f(x) \oplus f_\alpha(x) = \bar{x}_i \frac{\partial f}{\partial x_i}$
for s-a-0 faults:
$T = f(x) \oplus f_\alpha(x) = x_i \frac{\partial f}{\partial x_i}$
JMM v1.4
Boolean difference: Rules
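$\frac{\partial \bar{f}(x)}{\partial x_i} = \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial f(x)}{\partial \bar{x}_i} = \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial}{\partial x_i} \cdot \frac{\partial f(x)}{\partial x_j} = \frac{\partial}{\partial x_j} \cdot \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial [f(x) g(x)]}{\partial x_i} = f(x) \frac{\partial g(x)}{\partial x_i} \oplus g(x) \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_i}$
$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \bar{f}(x) \frac{\partial g(x)}{\partial x_i} \oplus \bar{g}(x) \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_i}$
with g(x) independent of xi:
$\frac{\partial [f(x) g(x)]}{\partial x_i} = g(x) \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \bar{g}(x) \frac{\partial f(x)}{\partial x_i}$
$\frac{\partial [f(x) + g(x)]}{\partial x_i} = \frac{\partial [\bar{f}(x) \cdot \bar{g}(x)]}{\partial x_i}$
$\frac{\partial [f(x) \oplus g(x)]}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial g(x)}{\partial x_i}$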
JMM v1.4
Boolean difference: example
Example Ex 20.1 (medium): circuit with stuck-at-1 fault at x3. Find all test patterns which detect the fault by means of the boolean difference.
[Figure: circuit with inputs x1 to x4, gates G1 to G5 (≥1 and & gates), internal signals s1, s2, s3 and output y.]
JMM v1.4
Test generation: D-algorithm
Basics:
fault sensitisation (provoke error)
fault propagation
line justification
D-notation
a signal with value $D$ is fault free if $D = 1$
a signal with value $D$ is faulty if $D = 0$
a signal with value $\bar{D}$ is fault free if $\bar{D} = 0$
a signal with value $\bar{D}$ is faulty if $\bar{D} = 1$
very formal table manipulation procedure
advantage:
if a test vector exists it will be found
programmable for computers
disadvantage:
conflicts lead to time-consuming dummy calculations
not usable for large circuits
JMM v1.4
Test generation: Path sensitisation
Example Ex20.2 (easy): circuit with stuck-at-1 fault at x3. Find test vectors by means of the D-algorithm
sensitization:
fault propagation:
line justification
Step 1: Sensitize circuit. Find input values that produce a value on the faulty node that’s different from the value forced by the fault. For our S-A-1 fault above, want output of AND gate to be 0.
Is this always possible? What would it mean if no such input values exist?
Is the set of sensitizing input values unique? If not, which should one choose?
What’s left to do?
[Figure: example circuit with inputs x1 to x4, gates G1 to G5, signals s1 to s3 and output y; the S-A-1 fault is marked X at s3.]
JMM v1.4
Test generation: Fault propagation
sensitization:
fault propagation:
line justification
Step 2: Fault propagation. Select a path that propagates the faulty value to an observed output (y in our example).
[Figure: example circuit with inputs x1 to x4, gates G1 to G5, signals s1 to s3 and output y; the S-A-1 fault is marked X at s3.]
JMM v1.4
Test generation: Line justification
sensitization:
fault propagation:
line justification
Step 3: Line justification. Find a set of input values that enables the selected path (backtracking).
Is this always possible? What would it mean if no such input values exist?
Is the set of enabling input values unique? If not, which should one choose?
[Figure: example circuit with inputs x1 to x4, gates G1 to G5, signals s1 to s3 and output y; the S-A-1 fault is marked X at s3.]
JMM v1.4
Test generation: PODEM, FAN
more recent algorithms like Podem, FAN or others basically intend to prevent conflict situations or to detect them as early as possible
concept:
take decisions as late as possible (prevent wrong decisions, perhaps there is later nothing to decide)
heuristics help to take decisions which succeed with higher probability
⇒ controllability and observability measuring necessary
JMM v1.4
PODEM
Podem algorithm is simpler to understand than D-algorithm
backtrack (branch-and-bound) algorithm is used
small steps to reach objective
if objective leads to dead-end, go back
backtrack (branch-and-bound) in Podem:
all signals are initialised to "X"
fault sensitisation
during fault propagation D symbols are propagated only one step to primary outputs (branch)
immediate line justification of selected signal to primary inputs (each new input value corresponds to a branch of the decision tree)
succeeding fault simulation immediately detects conflict situations (bound)
new branch
JMM v1.4
PODEM: Example
branch-and-bound tree
nodes represent decisions
branches represent PI's
a marked branch represents a faulty 1st decision
example: x1 stuck-at-1
[Figure: decision tree from start with branches x1=0 and x2=1, and the example circuit with inputs x1 to x4, gates G1 to G5, signals s1 to s3 and output y.]
JMM v1.4
Test pattern generation: Heuristics
Heuristics in FAN algorithm
fault propagation
propagate to PO on path which is best observable
line justification
start with the most difficult path to control
heuristics help to find test vectors faster
JMM v1.4
Controllability and observability
measure
often used to solve NP-complete problems
heuristics do not guarantee to find a solution in a given time
testability measure methods:
temas, testscreen, victor, camelot, scoap
Sandia controllability/observability analysis program (scoap)
each node in a circuit gets values for its controllability, observability and testability
high values indicate nodes which are hard to control or to observe
distinguish between "1" and "0" controllability
distinguish between combinational and sequential values
JMM v1.4
Observability & Controllability
When propagating faulty values to observed outputs we are often faced with several choices for which should be the next gate in our path.
[Figure: fault site X with two candidate gates marked “?”.]
We’d like to have a way to measure the observability of a node, i.e., some indication of how hard it is to observe the node at the outputs of the chip. During fault propagation we could choose the gate whose output was easiest to observe.
Similarly, during backtracking we need a way to choose between alternative ways of forcing a particular value:
[Figure: gate whose output should be 0 (“want 0 here”): which input should we try to set to 0?]
In this case, we’d like to have a way to measure the controllability of a node, i.e., some indication of how easy it is to force the node to 0 or 1. During backtracking we could choose the input that was easiest to control.
JMM v1.4
Testability measurement:
Scoap algorithm
combinational "1" and "0" controllability of a logic gate output y dependent on inputs x1..x3
OR gate:
CC0(y) = CC0(x1) + CC0(x2) + CC0(x3) + 1
CC1(y) = min{CC1(x1), CC1(x2), CC1(x3)} + 1
AND gate:
CC0(y) = min{CC0(x1), CC0(x2), CC0(x3)} + 1
CC1(y) = CC1(x1) + CC1(x2) + CC1(x3) + 1
combinational observability of a logic gate input x1 dependent on output y and inputs x2, x3
OR gate:
CO(x1) = CO(y) + CC0(x2) + CC0(x3) + 1
AND gate:
CO(x1) = CO(y) + CC1(x2) + CC1(x3) + 1
initialization (N are internal nodes, X, Y are PI's, PO's)
CC0(X) = 1, CC0(N) = ∞, CO(Y) = 0
CC1(X) = 1, CC1(N) = ∞, CO(N) = ∞
JMM v1.4
Testability measurement:
Scoap algorithm (con‘t)
hmmm. I guess
smaller numbers
are better...
“testability” measure assumes that the further a node is from an input/output the harder it is to set/observe
AND gate (inputs A, B, output Z):
CC0(Z) = min[CC0(A), CC0(B)] + 1
CC1(Z) = CC1(A) + CC1(B) + 1
CO(A) = CO(Z) + CC1(B) + 1
CO(B) = CO(Z) + CC1(A) + 1
if more than one, choose min
OR gate (inputs A, B, output Z):
CC0(Z) = CC0(A) + CC0(B) + 1
CC1(Z) = min[CC1(A), CC1(B)] + 1
CO(A) = CO(Z) + CC0(B) + 1
CO(B) = CO(Z) + CC0(A) + 1
[Figure: example circuit annotated with CC0,CC1,CO triples; primary inputs start at 1,1,- and primary outputs at -,-,0.]
JMM v1.4
Fault simulation
goals of fault simulation:
analyze circuit under fault conditions
qualify test sequence, fault coverage
reduce fault set during test generation
⇒ good quality fault models necessary
fault simulation methods
parallel fault simulation
concurrent fault simulation
deductive fault simulation
alternative to fault simulation in test generation procedures:
tracing fault sensitive paths
JMM v1.4
Fault simulation (con‘t)
parallel fault simulation
principle: computing with 1-bit or n-bit wide variables needs similar computing time
test of n-1 faults at the same time
bit position   fault
1              fault free
2              A s-a-1
3              B s-a-1
4              C s-a-1
5              C s-a-0
signal   fault mask    fault values
A        MA=[01000]    [01000]
B        MB=[00100]    [00100]
C        MC=[00011]    [00010]
[Figure: OR gate (≥1) with fault masks MA, MB on the inputs and MC on the output: A=[00000] becomes A'=[01000], B=[00000] becomes B'=[00100], C=[01100] becomes C'=[01110].]
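The bit-parallel scheme can be sketched with ordinary integers, one bit position per machine. The masks and fault values are those of the slide (bit 1 is the leftmost position); the helper names are illustrative.

```python
WIDTH = 5   # bit 1 (leftmost) = fault-free machine, bits 2..5 = one fault each
FULL = (1 << WIDTH) - 1
FAULT_NAMES = ["A s-a-1", "B s-a-1", "C s-a-1", "C s-a-0"]   # bits 2..5

def inject(value, mask, fault_values):
    """Force the masked bit positions to their stuck values."""
    return (value & ~mask) | (fault_values & mask)

MA, FA = 0b01000, 0b01000
MB, FB = 0b00100, 0b00100
MC, FC = 0b00011, 0b00010

def simulate(a, b):
    """Evaluate C = A OR B for the good machine and all faulty ones at once."""
    a_word = inject(FULL if a else 0, MA, FA)
    b_word = inject(FULL if b else 0, MB, FB)
    return inject(a_word | b_word, MC, FC)

def detected(c_word):
    """Faults whose output bit differs from the fault-free bit."""
    good = (c_word >> (WIDTH - 1)) & 1
    return [name for pos, name in enumerate(FAULT_NAMES, start=2)
            if (c_word >> (WIDTH - pos)) & 1 != good]

for a, b in ((0, 0), (1, 1)):
    c = simulate(a, b)
    print(f"A={a} B={b}: C'=[{c:05b}] detects {detected(c)}")
```

One pass of word-wide logic operations evaluates the fault-free gate and four faulty copies together, which is the speedup the principle above refers to.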
JMM v1.4
Fault Grading
So, you’ve constructed a set of test vectors using the techniques described here. Will they detect all the faulty parts?
You could see how many different faults your vectors detect by inserting each possible fault one at a time, running the vectors, then checking to see if some output was different from the “good” machine on some cycle. Need *lots* of simulation… probably impractical for large circuits even with a hardware-accelerated simulator.
You can use the same sorts of statistical sampling techniques that other QA programs employ: randomly select a set of faults, fault grade your vectors on those faults and use standard statistical techniques to see if fault coverage exceeds a desired level. The level of confidence may be increased by increasing the number of samples.
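The sampling idea can be sketched as follows. Everything specific here is an assumption for illustration: the fault list is a set of integer fault IDs, `detects` stands in for a real fault simulation run, and the interval is the usual normal approximation for a proportion.

```python
import math
import random

def sampled_coverage(faults, detects, sample_size, seed=0, z=1.96):
    """Estimate fault coverage from a random sample of faults.

    Returns (estimate, lower, upper) with a normal-approximation interval;
    z = 1.96 corresponds to roughly 95 % confidence.
    """
    rng = random.Random(seed)
    sample = rng.sample(faults, sample_size)
    hits = sum(1 for fault in sample if detects(fault))
    p = hits / sample_size
    half_width = z * math.sqrt(p * (1.0 - p) / sample_size)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical stand-in for fault simulation: 90 % of the faults are detected.
all_faults = list(range(10000))
def detects(fault_id):
    return fault_id % 10 != 0

for n in (100, 1000):
    p, lo, hi = sampled_coverage(all_faults, detects, n)
    print(f"{n:5d} samples: coverage about {p:.3f}, interval [{lo:.3f}, {hi:.3f}]")
```

The interval width shrinks roughly with the square root of the number of sampled faults, so quadrupling the sample halves the uncertainty.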
JMM v1.4
Conclusion
defects during chip fabrication are inevitable

faults model defects on higher abstraction levels
higher chip complexity, more gates and less pads
reduce controllability, observability and thus
testability
test pattern generation is going to be time
consuming and thus costly
structured design for test during chip development
is required
JMM v1.4
Coming Up...
Next topic…
Design for Testability
Weste: Sections 7.2.2 thru 7.2.5
JMM v1.4
VLSI--20
Exercises: VLSI #1
Ex vlsi20.3 (difficulty: medium): The digital circuit
suffers from error α s-a-0. Try to find test patterns
by means of the D-algorithm. If you don't succeed, use
the boolean difference to calculate the test patterns.
Result: T=x1 x2 x3 x4 found by boolean difference
[Figure: circuit with inputs x1 to x4, gates G1 to G6 (drawn with the & symbol), output y and fault site α]
Exercises: VLSI-20 #2
Ex vlsi20.4 (difficulty: easy): Calculate the SCOAP
combinational controllability and observability values
(CC0, CC1, CO) for the circuit below.
Result: x1 (1,1,7), x2 (1,1,7), x3 (1,1,5), x4 (1,1,5),
s1 (3,2,5), s2 (2,4,3), s3 (2,3,3), y (4,5,0)
[Figure: circuit with inputs x1 to x4, internal nodes s1, s2, s3, gates G1, G3, G4, G5 (≥1 and & symbols) and output y]
Ex vlsi20.5 (difficulty: easy): a) circuit with stuck-at-0
fault at s1. b) circuit with stuck-at-0 fault at
x1. Find all test patterns which detect the fault
by means of the boolean difference.
Result equations: a) x=(x1+x2)x3, b) x=x1x2x3
VLSI Systems Design
Top-down Design and HDLs
Overview
- Top-down design flow, VHDL hardware description
language, test-bench methodology
Goal: You are able to design circuits with the VHDL
language with behavioral, dataflow and structural
modeling. You are familiar with the top-down design flow
and the test-bench methodology.
chapter 1
The Need for HDLs

A specification is an engineering contract that lists all the
goals for a project:
- goals include area, power, throughput, latency,
functionality, test coverage, costs (NREs and piece costs).
Helps you figure out when you’re done and how to make
engineering tradeoffs. Later on goals help remind everyone
(especially management) what was agreed to!
- partition the project into modules with well-defined
interfaces so that each module can be worked on by a
separate team. Gives the SW types a head start too!
(Hardware/software codesign)
- A behavioral model serves as an executable specification
that documents the exact behavior of all the individual
modules and their interfaces. Since one can run tests, this
model can be refined and finally verified through
simulation.
We need a way to talk about what hardware should do
without actually designing the hardware itself, i.e., we need to
separate functionality from implementation. We need a
Hardware Description Language.
The Need for HDLs cont.
- easier to explore ideas in HDLs than in logic gates
- stepwise refinement: HDLs allow designs to be described
at various levels of abstraction
- HDLs support the description-synthesis method
- pitfalls: abstract models are not precise
- the first HDLs were introduced in the late 1970s
- it is difficult to develop a general-purpose HDL for
signal-processing and real-time applications and ...
- portability needs led to standardization (Institute
of Electrical and Electronics Engineers, IEEE)
Hardware Description Languages
- textual HDLs
VHDL, Verilog-HDL, HardwareC, etc.
- graphic HDLs
SpecCharts, etc. (control & dataflow graphs)
- tabular HDLs
BIF, etc. (FSMD models in tabular forms)
- time-diagram HDLs
Waves, etc.
- Standardization
- VHDL: IEEE Std 1076-1987 & 1993
- std_logic package: IEEE Std 1164-1993
- Verilog-HDL: IEEE Std 1364-1995
A Tale of Two HDLs
VHDL:
- VHSIC HDL, Very High Speed Integrated Circuits. ADA-like
verbose syntax, lots of redundancy
- Extensible types and simulation engine. Logic
representations are not built in and have evolved
with time (IEEE-1164).
- Design is composed of entities each of which can have
multiple architectures. A configuration chooses what
architecture is used for a given instance of an entity.
- Behavioral, structural, logic-level modeling
- Synthesizable subset...
- Harder to learn and use, not technology-specific,
DoD mandate.
Verilog-HDL:
- C-like concise syntax
- Built-in types and logic representations. Oddly,
this has led to slightly incompatible simulators
from different vendors.
- Design is composed of modules.
- Behavioral, structural, logic-level modeling
- Synthesizable subset...
- Easy to learn and use, fast simulation, good for
logic. Gateway Design Automation
Introduction to VHDL & Verilog
VHDL:
- rich & powerful language
- data type driven language
- goal: documentation of large complex systems
language structures:
- entity (hierarchy interface)
- architecture (behavior of system)
- configuration (binding of entity and architecture)
- package (library of global types or blocks)
Verilog-HDL:
- simple & efficient language
- hardware driven language
- goal: automatic synthesis
language structures:
- module (blocks or sub-blocks)
- `include (file structuring)
Introduction to VHDL & Verilog cont.
language features
- signal data types (in, out, bidir, signal-strength ...)
- hardware structures (memory, register-files, ...)
- logic operators (shift, rotation, masking, ...)
- asynchronous structures (set, reset of memories)
- parallel or synchronous structures
- constraints (pin, technology, area, delays, ...)
- inter-process communications (shared medium,
message passing, ...)
chapter 2
Signals, Delays, Events, Concurrency
- digital systems, in contrast to software systems, are
fundamentally about signals
- signals, in contrast to variables, have delays, which
lead to signal waveforms
- digital systems are composed of components
- digital systems have concurrency of operation
- events on signals lead to computations that may
generate events on other signals
[Figure: waveforms of sum and carry over time (ns, 5 to 40) with an event marked]
Signal Values
- signal values are physically associated with wires
- the VHDL language supports the signal types:
- type: bit, values: '0', '1'
- type: bit_vector, values: "0001", etc.
- the VHDL package IEEE 1164 supports the signal types:
- type: std_ulogic and vector std_ulogic_vector
- std_ulogic is a 9-value logic
value interpretation
U un-initialized
X forcing unknown
0 forcing 0
1 forcing 1
Z high impedance
W weak unknown
L weak 0
H weak 1
- don’t care
Resolved Signals
- it is common for components in a digital system to
have multiple sources for the value of an input
signal
- many designs use buses: a group of signals that can
be shared among multiple sources
- the values on shared signals are determined
by the type of interconnection, like wired logic
- the signal values depend on the implementation
- the VHDL simulator has to resolve the signal's value
- the IEEE 1164 package offers the std_logic and
std_logic_vector signal types as resolved versions
of std_ulogic and std_ulogic_vector
[Figure: wired-or logic; labels: resolved signal necessary, unresolved signal]
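How the simulator resolves several drivers can be tried out with a small self-checking model. This is a minimal sketch, assuming two tri-state style drivers on one std_logic signal; the names ResolvedBusDemo, busLine, enA and enB are illustrative and not taken from the slides.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity ResolvedBusDemo is
end ResolvedBusDemo;

architecture sketch of ResolvedBusDemo is
  signal busLine: std_logic;           -- resolved type: several drivers allowed
  signal enA, enB: std_logic := '0';   -- driver enables
begin
  -- Two concurrent assignments create two drivers for busLine
  busLine <= '1' when enA = '1' else 'Z';
  busLine <= '0' when enB = '1' else 'Z';

  stimulus: process
  begin
    wait for 1 ns;
    assert busLine = 'Z' report "both drivers off: expected Z" severity failure;
    enA <= '1';
    wait for 1 ns;
    assert busLine = '1' report "'1' and 'Z' should resolve to '1'" severity failure;
    enB <= '1';
    wait for 1 ns;
    assert busLine = 'X' report "'1' and '0' should resolve to 'X'" severity failure;
    wait;
  end process;
end sketch;
```

Declaring busLine as std_ulogic instead should make the analyzer or elaborator reject the model, because an unresolved signal may have only one driver.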
chapter 3
Entity
- the design entity is a primary programming
abstraction in VHDL
- an entity defines the interface of a component, without
giving any information about the component
behavior
[Figure: half adder block "+" with inputs a, b and outputs sum, carry]
entity HalfAdder is
  port (a,b : in bit;
        sum,carry : out bit);
end HalfAdder;

library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a,b : in std_ulogic;
        sum,carry : out std_ulogic);
end HalfAdder;
Exercises Ex401: Entity
- Ex401 (difficulty: easy): Define the entities of the
following digital components. Use the unresolved 9-value
logic of the IEEE 1164 package. Each
component has to be edited in a separate file named after
the component, with the extension ".vhd". The
files have to be analyzed by the Synopsys command:
gvan
[Figure: Mux4to1 (ports i0, i1, i2, i3, sel, z; 8 bit data); D_ff (ports d, clk, sNot, rNot, q, qNot); Alu32 (ports a, b, op, c, n, z; 32 bit data, 6 bit op-code)]
Note: start component names with a capital letter
and signal names with a lowercase letter.
Architecture
- the design architecture is a primary programming
abstraction in VHDL
- an architecture describes the internal behavior of a
component, without giving any information about
the component's I/Os
- The behavioral description can take many forms.
These forms differ in the levels of abstraction and
detail.
architecture behavior of HalfAdder is
  -- comment: declaration of variables
begin
  ...
  functional description
  of the system
end behavior;
Entity-Architecture: Hierarchy
(VHDL vs. Verilog)
VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
  port (a,b,ci: in std_logic; co,s: out std_logic);
end FullAdder;
architecture behavior of FullAdder is
  -- comment: declaration of variables
begin
  ...
  functional description
  of the system
end behavior;
Verilog-HDL:
module FullAdder (a,b,ci,co,s);
  input a,b,ci;
  output co,s;
  /* comment: declarations of variables */
  ...
  functional description
  of the system
endmodule
Concurrency
- The operation of digital systems is inherently
concurrent
- Within VHDL, signals are assigned values using
signal assignment statements (<=)
- Multiple signal assignment statements are executed
concurrently
architecture concurrent_behavior of HalfAdder is
begin
  -- concurrent signal assignments
  sum <= (a xor b) after 5 ns;
  carry <= (a and b) after 5 ns;
end concurrent_behavior;
[Figure: waveforms of sum and carry over time (ns, 5 to 40)]
Dataflow Model #1
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a,b: in std_logic;
        carry,sum: out std_logic);
end HalfAdder;
architecture dataflow of HalfAdder is
begin
  sum <= (a xor b) after 5 ns;
  carry <= (a and b) after 5 ns;
end dataflow;
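The test-bench methodology named in the overview can be illustrated on this half adder. This is a minimal sketch, assuming a VHDL-93 simulator (direct entity instantiation). The test-bench name HalfAdderTest and the stimulus timing are illustrative choices, not part of the course material.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdder is
  port (a, b: in std_logic;
        carry, sum: out std_logic);
end HalfAdder;

architecture dataflow of HalfAdder is
begin
  sum   <= (a xor b) after 5 ns;
  carry <= (a and b) after 5 ns;
end dataflow;

library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdderTest is        -- a test bench has no ports
end HalfAdderTest;

architecture bench of HalfAdderTest is
  signal a, b: std_logic := '0';
  signal carry, sum: std_logic;
begin
  dut: entity work.HalfAdder(dataflow)
    port map (a => a, b => b, carry => carry, sum => sum);

  stimulus: process
  begin
    -- each vector is held for 10 ns, longer than the 5 ns gate delay
    a <= '0'; b <= '0'; wait for 10 ns;
    assert sum = '0' and carry = '0' report "0+0 failed" severity error;
    a <= '1'; b <= '0'; wait for 10 ns;
    assert sum = '1' and carry = '0' report "1+0 failed" severity error;
    a <= '0'; b <= '1'; wait for 10 ns;
    assert sum = '1' and carry = '0' report "0+1 failed" severity error;
    a <= '1'; b <= '1'; wait for 10 ns;
    assert sum = '0' and carry = '1' report "1+1 failed" severity error;
    wait;  -- stop the stimulus process
  end process;
end bench;
```

Checking the outputs with assert statements avoids having to inspect waveforms by hand. A run that prints no assertion message means all four input combinations behaved as expected.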
Dataflow Model #2
[Figure: full adder schematic with gates L1 to L5 and internal signals s1, s2, s3]
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
  port (a,b,cIn: in std_logic;
        cOut,sum: out std_logic);
end FullAdder;
architecture dataflow of FullAdder is
  -- architecture declarative segment
  signal s1,s2,s3 : std_logic;
  constant gate_delay: Time:=5 ns;
begin
  -- architecture body
  L1: s1 <= (a xor b) after gate_delay;
  L2: s2 <= (cIn and s1) after gate_delay;
  L3: s3 <= (a and b) after gate_delay;
  L4: sum <= (s1 xor cIn) after gate_delay;
  L5: cOut <= (s2 or s3) after gate_delay;
end dataflow;
Signal Assignments #1
- simple signal assignments
sum <= (a xor b) after 5 ns, (a or b) after 10 ns, (not a) after 15 ns;
sig <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
[Figure: waveform of sig over time (ns, 5 to 40)]
clock <= not(clock) after 5 ns;  -- clock needs an initial value such as '0'
[Figure: waveform of clock over time (ns, 5 to 40)]
a <= "0000000000000000", to_stdlogicvector(x"abcd") after 5 ns;
The conversion from a bit_vector (here a hexadecimal
bit string literal) to std_logic_vector is defined in
package std_logic_1164
Conditional Signal Assignment #2
- The right-hand value is computed immediately and
assigned at some point in the future using the after
clause
library IEEE;
use IEEE.std_logic_1164.all;
entity Mux4to1 is
  port (i0,i1,i2,i3: in std_logic_vector(7 downto 0);
        sel : in std_logic_vector(1 downto 0);
        z : out std_logic_vector(7 downto 0));
end Mux4to1;
architecture dataflow of Mux4to1 is
begin
  -- one single signal assignment
  z <= i0 after 5 ns when sel="00" else
       i1 after 5 ns when sel="01" else
       i2 after 5 ns when sel="10" else
       i3 after 5 ns when sel="11" else
       "00000000" after 5 ns;
end dataflow;
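The same multiplexer can also be written with a selected signal assignment (with ... select), which tests one expression against a list of choices instead of evaluating a chain of conditions. This is a minimal sketch, assuming a VHDL-93 simulator. The architecture name selected, the test bench Mux4to1Test and its data values are illustrative and not part of the slides.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity Mux4to1 is
  port (i0, i1, i2, i3: in std_logic_vector(7 downto 0);
        sel: in std_logic_vector(1 downto 0);
        z: out std_logic_vector(7 downto 0));
end Mux4to1;

architecture selected of Mux4to1 is
begin
  with sel select
    z <= i0 after 5 ns when "00",
         i1 after 5 ns when "01",
         i2 after 5 ns when "10",
         i3 after 5 ns when "11",
         "00000000" after 5 ns when others;  -- covers 'U', 'X', 'Z', ...
end selected;

library IEEE;
use IEEE.std_logic_1164.all;

entity Mux4to1Test is
end Mux4to1Test;

architecture bench of Mux4to1Test is
  signal d0: std_logic_vector(7 downto 0) := x"A0";
  signal d1: std_logic_vector(7 downto 0) := x"A1";
  signal d2: std_logic_vector(7 downto 0) := x"A2";
  signal d3: std_logic_vector(7 downto 0) := x"A3";
  signal sel: std_logic_vector(1 downto 0) := "00";
  signal z: std_logic_vector(7 downto 0);
begin
  dut: entity work.Mux4to1(selected)
    port map (i0 => d0, i1 => d1, i2 => d2, i3 => d3, sel => sel, z => z);

  stimulus: process
  begin
    wait for 10 ns;
    assert z = x"A0" report "sel=00 failed" severity error;
    sel <= "10";
    wait for 10 ns;
    assert z = x"A2" report "sel=10 failed" severity error;
    sel <= "ZZ";
    wait for 10 ns;
    assert z = "00000000" report "others branch failed" severity error;
    wait;
  end process;
end bench;
```

Both forms are a single concurrent signal assignment. The selected form requires the choices to cover every possible value of sel, which is why the others choice is needed for the nine-valued std_logic type.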
Exercises Ex402: Conditional Signal
Assignment
- Ex402 (difficulty: easy): Define the VHDL code of
a 1-bit ALU with the operations: AND, OR,
FullAdder. Use the resolved 9-value logic of the
IEEE 1164 package. The Simple1bitALU.vhd file has
to be analyzed and simulated by the Synopsys
commands: gvan and vhdldbx
[Figure: ALU block (labeled Alu32) with inputs a, b, carryIn, opcode and outputs result, carry]
Delays: Delta Delay Model
- The VHDL language distinguishes between three
delay models:
- Delta delay model
- Inertial delay model (default)
- Transport delay model
- Delta delay model
- If no delay is specified, a delta delay is assumed. A delta
delay is an infinitesimally small delay that the simulator
uses to order events; any number of delta delays sums to
zero simulation time.
architecture delta_delay of Comb is
  signal s1,s2,s3,s4: std_logic:='0';
begin
  s1 <= not(in1);
  s2 <= not(in2);
  s3 <= not(s1 and in2);
  s4 <= not(s2 and in1);
  z <= not(s3 and s4);
end delta_delay;
[Figure: schematic of Comb (in1, in2, s1 to s4, z) and waveforms of in2, s2, s3, z over time (ns, 0 to 70); an event at 10 ns propagates in steps of Δ, 2Δ, 3Δ]
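Delta cycles can be observed directly by letting a process advance one delta at a time with wait for 0 ns. This is a minimal sketch using a chain of two inverters; the names DeltaDemo, a, b and c are illustrative and not taken from the Comb example.

```vhdl
entity DeltaDemo is
end DeltaDemo;

architecture sketch of DeltaDemo is
  -- consistent start values: b = not a, c = not b
  signal a: bit := '0';
  signal b: bit := '1';
  signal c: bit := '0';
begin
  b <= not a;  -- follows a one delta later
  c <= not b;  -- follows a two deltas later

  stimulus: process
  begin
    wait for 10 ns;
    a <= '1';
    wait for 0 ns;  -- first delta: a has its new value, b and c not yet
    assert a = '1' and b = '1' and c = '0'
      report "unexpected values after delta 1" severity failure;
    wait for 0 ns;  -- second delta: b has followed
    assert b = '0' and c = '0'
      report "unexpected values after delta 2" severity failure;
    wait for 0 ns;  -- third delta: c has followed
    assert c = '1'
      report "unexpected value after delta 3" severity failure;
    assert now = 10 ns
      report "delta cycles must not advance simulation time" severity failure;
    wait;
  end process;
end sketch;
```

The last assertion makes the point of the delta model: three ordered update steps have taken place, yet the simulation time is still 10 ns.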
Delays: Inertial Delay Model
- Digital circuits have a certain amount of inertia.
For example, it takes a finite amount of time and a
certain amount of energy for the output of a gate to
respond to a change on the input.
- Inertial delay model (default)
- a pulse shorter than the propagation delay will not
propagate to the output
out1 <= (a xor b) after 8 ns;
out2 <= (a xor b) after 2 ns;
[Figure: input waveform with the resulting outputs for a delay of 8 ns (out1) and of 2 ns, over time (ns, 5 to 40)]
VHDL'93: the pulse rejection limit can be given explicitly
sum <= reject 2 ns inertial (a xor b) after 5 ns;
Delays: Transport Delay Model
- Unlike switching devices, wires have comparatively
little inertia. As a result, wires will
propagate signals with very small pulse widths.
- In modern technologies with increasingly small
feature sizes, the wire delays dominate.
- Transport delay model
- any pulse will propagate to the output, independent of
the delay
out1 <= transport (a xor b) after 8 ns;
[Figure: input waveform and output out1 delayed by 8 ns, over time (ns, 5 to 40)]
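The difference between the two models shows up when a pulse is shorter than the delay. This is a minimal sketch, assuming a 3 ns input pulse and an 8 ns delay; the names DelayDemo, input, outInertial and outTransport are illustrative.

```vhdl
entity DelayDemo is
end DelayDemo;

architecture sketch of DelayDemo is
  signal input: bit := '0';
  signal outInertial, outTransport: bit := '0';
begin
  outInertial  <= input after 8 ns;            -- inertial is the default
  outTransport <= transport input after 8 ns;

  stimulus: process
  begin
    wait for 10 ns;
    input <= '1';        -- 3 ns pulse from 10 ns to 13 ns
    wait for 3 ns;
    input <= '0';
    wait for 6 ns;       -- now at 19 ns, inside the delayed pulse (18 to 21 ns)
    assert outTransport = '1'
      report "transport delay should pass the short pulse" severity failure;
    assert outInertial = '0'
      report "inertial delay should swallow the short pulse" severity failure;
    wait for 4 ns;       -- now at 23 ns, after the delayed pulse
    assert outTransport = '0' and outInertial = '0'
      report "both outputs should be low again" severity failure;
    wait;
  end process;
end sketch;
```

With a pulse longer than 8 ns both outputs would show it, so the models only differ for pulses shorter than the specified delay (or shorter than the reject limit in VHDL'93).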
Delay Model in Practice
- Accurate modeling of wire delays is possible,
although in practice it is difficult to obtain
accurate estimates of the wire delay without
proceeding through physical design and
layout of the circuit.
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a,b: in std_logic;
        carry,sum: out std_logic);
end HalfAdder;
architecture transport_delay of HalfAdder is
  signal s1,s2: std_logic:='0';
begin
  s1 <= (a xor b) after 2 ns;
  s2 <= (a and b) after 2 ns;
  sum <= transport s1 after 4 ns;
  carry <= transport s2 after 4 ns;
end transport_delay;
[Figure: half adder schematic (a, b, s1, s2, sum, carry; labels: inertial, transport) and waveforms of a, b, sum, carry, s1, s2 over time (ns, 0 to 14)]
Exercises vlsi21: Conditional
Assignments
- Ex403 (difficulty: easy): Write and simulate a
VHDL model of a 2-bit comparator (compare on
equality, filename: Comp2.vhd).
[Figure: block Comp2 with inputs a, b and output c]
- Ex405 (difficulty: easy): Construct and test a
VHDL module for generating the following
waveforms.
[Figure: waveforms of a, b, c over time (ns, 0 to 60)]
- Ex vlsi21 (difficulty: easy): Have a look at the
exercises at the end of chapter 3 of “VHDL:
Starter’s Guide”
chapter 4
The Process Construct #1
- The continuous assignment model is used when
components correspond to gates.
- The process construct enables the use of
conventional programming language constructs.
- In contrast to concurrent signal assignment
statements, a process is a sequentially executed
block of code.
- Control flow within a process is strictly sequential.
- With respect to simulation time, a process executes
in zero time.
architecture behavior of MyProcess is
begin
  process
    process declarative part
  begin
    process body
  end process;
end behavior;
Example: Process Statement
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity Memory is
  port (addr,wrData: in std_logic_vector(31 downto 0);
        wr,rd: in std_logic;
        rdData: out std_logic_vector(31 downto 0));
end Memory;
architecture behavioral of Memory is
  type memArray is array(0 to 1024) of std_logic_vector(31 downto 0);
begin
  MemProcess: process(addr,wr,rd)  -- sensitivity list
    variable mem: memArray :=(
      (x"00000A06"), -- initializing memory data
      others => (x"00000000"));
    variable addrIndex: integer;
  begin
    addrIndex:=conv_integer(addr);  -- immediate variable assignment
    if (wr = '1') then
      mem(addrIndex):=wrData;
    elsif (rd = '1') then
      rdData <=mem(addrIndex) after 10 ns;  -- signal assignment, takes effect after 10 ns
    end if;
  end process;
end behavioral;
The Process Construct #2
- The execution of a process is initiated whenever an
event occurs on any signal in the sensitivity list.
- Once started, the process executes to completion in
zero (simulation) time.
- Processes execute concurrently with other
processes and concurrent signal assignments.
- Concurrent signal assignments are in fact only
special cases of processes.
identical behavior:
-- concurrent signal assignment
architecture behavior of MyBlock1 is
begin
  c <= a and b after 5 ns;
end behavior;
-- process
architecture behavior of MyBlock2 is
begin
  process(a,b)
  begin
    c <= a and b after 5 ns;
  end process;
end behavior;
VHDL vs. Verilog: Events
Events are variable or signal changes.
Real circuits are event driven.
VHDL:
- process (sensitivity list)
begin statements;
end process;
- wait on/until/for event;
Verilog-HDL:
- always @(sensitivity list) statement
- initial (sensitivity list) statement
Wow! Everything is event driven, like in real life.
Conditional Programming Constructs
- if-then-else statement
if condition then sequential statements
[ elsif condition then sequential statements ]
[ else sequential statements ] end if;
- case statement
case expression is
{ when choices => sequential statements }
[ when others => sequential statements ]
end case;
Example: Condition Statements
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a,b: in std_logic;
        sum,carry: out std_logic);
end HalfAdder;
architecture behavioral of HalfAdder is
begin
  If_Process: process(a,b)
  begin
    if (a = b) then
      sum<= '0' after 5 ns;
    else
      sum<= (a or b) after 5 ns;
    end if;
  end process;
  Case_Process: process(a,b)
  begin
    case a is
      when '0' => carry <= a after 5 ns;
      when '1' => carry <= b after 5 ns;
      when others => carry <= 'X' after 5 ns;
    end case;
  end process;
end behavioral;
VHDL vs. Verilog:
Combinational Logic Example
VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
entity Multiplexer4to1 is
  port (sel: in std_logic_vector (1 downto 0);
        a,b,c,d: in std_logic_vector (15 downto 0);
        z: out std_logic_vector (15 downto 0));
end Multiplexer4to1;
architecture DemoExample of Multiplexer4to1 is
begin
  -- 4 to 1 multiplexer (no inferred memory)
  process (a,b,c,d,sel)
  begin
    case sel is
      when "00" => z <= a;
      when "01" => z <= b;
      when "10" => z <= c;
      when "11" => z <= d;
      when others => z <= (others => '-');
    end case;
  end process;
end DemoExample;
Verilog-HDL:
module Multiplexer4to1(sel,a,b,c,d,z);
  input [1:0] sel;
  input [15:0] a,b,c,d;
  output [15:0] z;
  assign z = (sel == 2'd0) ? a:
             (sel == 2'd1) ? b:
             (sel == 2'd2) ? c:
             (sel == 2'd3) ? d:
             16'bx;
endmodule
Loop Programming Constructs
- for loop statement
for index in range loop
  sequential statements
end loop;
(the loop index does not have to be declared, but it can
only be used locally)
- while loop statement
while condition loop
  sequential statements
end loop;
Example: Loop Statements
[Figure: block Multiplier (32 bit) with inputs a, b and output m]
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity Multiplier is
  port (a,b: in std_logic_vector(31 downto 0);
        m: out std_logic_vector(63 downto 0));
end Multiplier;
architecture behavioral of Multiplier is
  constant modulDelay: Time:=10 ns;
begin
  process(a,b)
    variable bReg: std_logic_vector(63 downto 0);
    variable aReg: std_logic_vector(31 downto 0);
    variable acc: std_logic_vector(32 downto 0);  -- partial sum including the carry bit
  begin
    aReg:=a;
    bReg:=(x"00000000") & b;
    for index in 1 to 32 loop
      acc:='0' & bReg(63 downto 32);
      if bReg(0)='1' then
        acc:=acc+aReg;  -- 33-bit result keeps the carry of the addition
      end if;
      bReg:=acc & bReg(31 downto 1);  -- shift right, the carry moves into bit 63
    end loop;
    m<=bReg after modulDelay;
  end process;
end behavioral;
Exercises vlsi21: Loops
- Ex405 (difficulty: easy): Write VHDL code for a
combinational shift logic block with 8 bit data
buses with zero fill. Use the 2 bit signal shiftNum
to indicate the number of bits to be shifted. If a
std_logic_vector has to be converted to an integer
type, the conv_integer() function from the
std_logic_unsigned package can be used.
[Figure: block Shift with inputs dataIn, shiftNum, shiftLeft, shiftRight and output dataOut]
More on Processes
- Never assign a value to a signal in different
processes (multiple drivers).
process A
  y<='0';
process B
  y<='1';
conflict:
- two drivers!
- not synthesizable!
- Upon initialization all processes are executed
once.
- Thereafter processes are executed in a data-driven
manner:
- activated by events on the sensitivity list of the process or
- by waiting on occurrences of specific events using wait
statements
The Wait Statement
- A more general way to specify when a process
executes is the wait statement.
- Wait statements explicitly specify the conditions
under which a process may resume execution after
being suspended.
- With wait statements a process can be suspended at
multiple points.
wait for time expression;
example: wait for 20 ns;
wait on signal;
example: wait on clk,reset,status;
wait until condition;
example: wait until (a = '1');
wait;
Example: Wait Statements
library IEEE;
use IEEE.std_logic_1164.all;
entity Dff1 is
  port (d,clk: in std_logic;
        q,qBar: out std_logic);
end Dff1;
architecture behavioral of Dff1 is
begin
  process
  begin
    wait until (clk'event and clk='1');
    q <=d after 1 ns;
    qBar<=not d after 1 ns;
  end process;
end behavioral;

library IEEE;
use IEEE.std_logic_1164.all;
entity Dff2 is
  port (d,clk,rst: in std_logic;
        q,qBar: out std_logic);
end Dff2;
architecture behavioral of Dff2 is
begin
  process(clk,rst)
  begin
    if (rst='0') then
      q <= '0' after 1 ns;
      qBar<= '1' after 1 ns;
    elsif (clk'event and clk='1') then
      q <=d after 1 ns;
      qBar<=not d after 1 ns;
    end if;
  end process;
end behavioral;
If a process has no sensitivity list you MUST
use wait statements, otherwise your process
never suspends and blocks your simulation.
Latch vs. Flip-Flop
-- Latch (inputs d, clk, reset; output q)
process(clk,reset,d)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk='1') then
    q <= d;
  end if;
end process;
-- Flip-Flop (inputs d, clk, reset; output q)
process(clk,reset)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk'event and clk='1') then
    q <= d;
  end if;
end process;
-- Register with a Mux at the D input (inputs d, enable, clk, reset; output q)
process(clk,reset)
begin
  if (reset = '0') then
    q <= "00000000";
  elsif rising_edge(clk) then
    if (enable = '1') then
      q <= d;
    end if;
  end if;
end process;
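The behavioral difference between the two descriptions can be checked with a small self-checking model. This is a minimal sketch that places a latch process and a flip-flop process side by side. The names LatchVsFlipFlop, qLatch and qFf and the stimulus timing are illustrative; d is included in the latch sensitivity list so that the simulated latch is transparent while clk is high.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity LatchVsFlipFlop is
end LatchVsFlipFlop;

architecture sketch of LatchVsFlipFlop is
  signal clk, d: std_logic := '0';
  signal reset: std_logic := '0';   -- active low
  signal qLatch, qFf: std_logic;
begin
  latch: process(clk, reset, d)
  begin
    if reset = '0' then
      qLatch <= '0';
    elsif clk = '1' then            -- level sensitive
      qLatch <= d;
    end if;
  end process;

  ff: process(clk, reset)
  begin
    if reset = '0' then
      qFf <= '0';
    elsif clk'event and clk = '1' then  -- edge sensitive
      qFf <= d;
    end if;
  end process;

  stimulus: process
  begin
    wait for 5 ns;  reset <= '1';   -- release reset
    wait for 5 ns;  clk <= '1';     -- 10 ns: rising edge while d = '0'
    wait for 2 ns;  d <= '1';       -- 12 ns: d changes while clk is high
    wait for 2 ns;                  -- 14 ns
    assert qLatch = '1' report "latch should be transparent" severity failure;
    assert qFf = '0' report "flip-flop should ignore d after the edge" severity failure;
    clk <= '0';
    wait for 2 ns;  d <= '0';       -- 16 ns: d changes while clk is low
    wait for 2 ns;                  -- 18 ns
    assert qLatch = '1' report "latch should hold while clk is low" severity failure;
    assert qFf = '0' report "flip-flop should hold between edges" severity failure;
    d <= '1';
    wait for 1 ns;  clk <= '1';     -- 19 ns: rising edge while d = '1'
    wait for 1 ns;                  -- 20 ns
    assert qLatch = '1' and qFf = '1' report "both should store '1'" severity failure;
    wait;
  end process;
end sketch;
```

The first pair of assertions is the essential one: a change of d during the high phase of clk reaches the latch output but not the flip-flop output.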
Exercises vlsi21: Synchronous

16 bit register with an enable and an asynchronous
reset input.
[Figure: block Register16 with ports d, enable, clk, reset, q]
16 bit counter with an enable, a load and an
asynchronous reset input.
[Figure: block Counter16 with ports data, enable, load, clk, reset, count]
More on Wait: Inter-Process Comm.
[Figure: timing diagram of transmitData, request, acknowledge, receiveData over time]
library IEEE;
use IEEE.std_logic_1164.all;
entity Handshake is
  port(inputData: in std_logic_vector(31 downto 0));
end Handshake;
architecture behavioral of Handshake is
  signal transmitData: std_logic_vector(31 downto 0);
  signal request, acknowledge: std_logic;
begin
  producer: process
  begin
    wait until inputData'event;
    transmitData<=inputData;
    request<='1';
    wait until acknowledge='1';
    request<='0';
    wait until acknowledge='0';
  end process;
  consumer: process
    variable receiveData: std_logic_vector(31 downto 0);
  begin
    wait until request='1';
    receiveData:=transmitData;
    acknowledge<='1';
    wait until request='0';
    acknowledge<='0';
  end process;
end behavioral;
Exercises vlsi21: Handshake
- Ex vlsi21.8a (difficulty: easy): Write a VHDL
model for communication between an input process
and an output process using a handshaking protocol.
The input process can only read a single word (32
bit) at a time. The output device requires a
reversed byte order, which is produced by the
input process. Assign a delay of 1 ns to each
handshake signal.
[Figure: block AsyncComm with input inputData and output outputData, containing an input process and an output process]
- Ex vlsi21.8b (difficulty: medium, optional):
Rewrite the above handshake model by using a clk1,
clk2 signal for the two synchronous processes as
well as a rst for initialization, and a start signal to
initiate one data transfer. Do not use any wait
constructions within the processes.
Attributes
attribute: function
signal'event: function returning a Boolean value
signifying a change in value on this signal
signal'active: function returning a Boolean value
signifying an assignment made to this
signal (may not be a new value)
signal'last_event: function returning the time since the
last event
signal'last_active: function returning the time since the
signal was last active
signal'last_value: function returning the previous value
of this signal
signal'left: returns the leftmost value of signal in
its defined range
signal'right: returns the rightmost value of signal
in its defined range
signal'high: returns the highest value of signal
signal'low: returns the lowest value of signal
signal'ascending: returns true if signal has an ascending
range of values
signal'length: returns the number of elements in the
array signal
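A few of these attributes can be exercised in a small self-checking model. This is a minimal sketch, assuming a VHDL-93 simulator ('ascending is not available in VHDL-87). The names AttributeDemo, data and clk are illustrative. For an array signal such as data, 'left, 'right, 'high and 'low refer to its index range.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity AttributeDemo is
end AttributeDemo;

architecture sketch of AttributeDemo is
  signal data: std_logic_vector(7 downto 0) := (others => '0');
  signal clk: std_logic := '0';
begin
  process
  begin
    -- array attributes describe the index range 7 downto 0
    assert data'left = 7 and data'right = 0 report "left/right" severity failure;
    assert data'high = 7 and data'low = 0 report "high/low" severity failure;
    assert data'length = 8 report "length" severity failure;
    assert not data'ascending report "ascending" severity failure;

    wait for 10 ns;
    clk <= '1';
    wait on clk;      -- resumes in the cycle in which clk changes
    assert clk'event and clk'last_value = '0'
      report "event/last_value" severity failure;

    wait for 5 ns;
    assert not clk'event and clk'last_event = 5 ns
      report "last_event" severity failure;

    clk <= '1';       -- same value again: an assignment but no event
    wait for 0 ns;
    assert clk'active and not clk'event
      report "active without event" severity failure;
    wait;
  end process;
end sketch;
```

The last check illustrates the remark in the list above: 'active reports that an assignment took place even though the value did not change, whereas 'event requires a change in value.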
Generating Periodic Waveforms
library IEEE;
use IEEE.std_logic_1164.all;
entity Periodic is
  port(Z: out std_logic);
end Periodic;
architecture behavioral of Periodic is
begin
  process
  begin
    Z<='0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
    wait for 50 ns;
  end process;
end behavioral;
[Figure: waveform of Z over time (ns, 0 to 50)]

library IEEE;
use IEEE.std_logic_1164.all;
entity TwoPhase is
  port(phi1,phi2,reset: out std_logic);
end TwoPhase;
architecture behavioral of TwoPhase is
begin
  reset_process: reset<='1', '0' after 10 ns;
  clock_process: process
  begin
    phi1<='1', '0' after 10 ns;
    phi2<='0', '1' after 12 ns, '0' after 18 ns;
    wait for 20 ns;
  end process;
end behavioral;
[Figure: waveforms of reset, phi1, phi2 over time (ns, 0 to 60)]
Modeling Finite State Machines
[Figure: block diagram with the state transition process (inputData, reset, clk), the state register (enable) and the output process producing outputData and outSig0, outSig1, outSig2]
architecture behavioral of MooreFSM is
  type StateType is (MyState,YourState,InitState);
  signal state : StateType;
  signal outputData: std_logic_vector(5 downto 0);
begin
  transition_process: process(reset,clk)
  begin
    if (reset = '0') then
      state <= InitState;
    elsif rising_edge(clk) then
      case state is
        when MyState =>
          state<=YourState;
        when YourState =>
          if (inputDataSignal = '1') then
            state<=MyState;
          end if;
        when others => null;
      end case;
    end if;
  end process;

  output_process: process(state)
  begin
    case state is
      when MyState =>
        outputData<="01--00";
      when YourState =>
        outputData<="00100-";
      when InitState =>
        outputData<="100100";
      when others =>
        outputData<="000000";
    end case;
  end process;

  outSig0<=outputData(0);
  outSig1<=outputData(1);
  outSig2<=outputData(2);
end behavioral;
Exercises vlsi21: FSM
- Ex409 (difficulty: easy): Write a VHDL model for
a traffic light controller. Use a Moore type FSM.
The signal carPresent indicates cars running on the
main street, which always have priority. If no cars
are present on the main street, the secondary street
gets green lights.
[Figure: state diagram with the states GreenState, OrangeState, RedState1, RedState2; the transitions depend on carPresent, and a reset input is shown]
Outputs per state (red orange green):
state        main    second
GreenState   0 0 1   1 0 0
OrangeState  0 1 0   1 0 0
RedState1    1 0 0   0 0 1
RedState2    1 0 0   0 1 0
chapter 5
Modeling Structure
- a structural model of a system is described in terms
of the interconnection of its components
- a structural model consists of 3 features:
- component declaration
- signal declaration
- component interconnection
[Figure: component declarations of HalfAdder3 (ports a, b, sum, carry) and OR2 (ports a, b, z); component interconnection of the full adder with the labeled components H1 and H2 (HalfAdder3) and O3 (OR2), the signals s1, s2, s3, the inputs in1, in2, cIn and the outputs sum, cOut]
Example: Structural Model
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder3 is
  port (in1,in2,cIn: in std_logic;
        sum,cOut: out std_logic);
end FullAdder3;
architecture structural of FullAdder3 is
  -- component declaration (component behavior described elsewhere)
  component HalfAdder3
    port(a,b: in std_logic;
         sum,carry: out std_logic);
  end component;
  component OR2
    port(a,b: in std_logic;
         z: out std_logic);
  end component;
  -- signal declaration
  signal s1,s2,s3: std_logic;
begin
  -- component interconnection (netlist)
  H1: HalfAdder3 port map(a=>in1,b=>in2,
                          sum=>s1,carry=>s3);
  H2: HalfAdder3 port map(a=>s1,b=>cIn,
                          sum=>sum,carry=>s2);
  O3: OR2 port map(a=>s2,b=>s3,
                   z=>cOut);
end structural;
Exercises vlsi21: Structural Model
- Ex410 (difficulty: medium): Write VHDL code
for the structural model of the FullAdder3
described in the previous transparency. Assume a
delay of 1 ns for all logic gates.
a) Write the structural VHDL code for a HalfAdder.
b) Write the VHDL code for the necessary logic
gates like OR2 and others in one file
(logicgates.vhd).
c) Write the VHDL code for FullAdder3.
d) Analyze and simulate the whole circuit. Be aware
of the correct sequence of analyzing.
VHDL vs. Verilog:
Structural Description
VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder4 is
  port (a,b,cIn: in std_logic;
        cOut,sum: out std_logic);
end FullAdder4;
architecture flatStructure of FullAdder4 is
  -- named XOR2 because xor is a reserved word in VHDL
  component XOR2
    port(a,b: in std_logic; z: out std_logic);
  end component;
  component AND2
    port(a,b: in std_logic; z: out std_logic);
  end component;
  component OR3
    port(a,b,c: in std_logic; z: out std_logic);
  end component;
  signal net1,net2,net3,net4: std_logic;
begin
  u1: XOR2 port map (a,b,net1);
  u2: XOR2 port map (cIn,net1,sum);
  u3: AND2 port map (cIn,a,net2);
  u4: AND2 port map (cIn,b,net3);
  u5: AND2 port map (a,b,net4);
  u6: OR3 port map (net2,net3,net4,cOut);
end flatStructure;
Verilog-HDL:
module FullAdder4 (a,b,cIn,cOut,sum);
  input a,b,cIn;
  output cOut,sum;
  wire net1,net2,net3,net4;
  XOR2 u1(net1,a,b);
  XOR2 u2(sum,cIn,net1);
  AND2 u3(net2,cIn,a);
  AND2 u4(net3,cIn,b);
  AND2 u5(net4,a,b);
  OR3 u6(cOut,net2,net3,net4);
endmodule
VHDL vs. Verilog:
Data Flow Description
VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;
entity FullAdder5 is
  port (a,b,cIn: in std_logic;
        sum,cOut: out std_logic);
end FullAdder5;
architecture dataFlow of FullAdder5 is
  signal tmp: std_logic_vector(1 downto 0);
begin
  tmp <= ('0' & a) + b + cIn;
  cOut <= tmp(1);
  sum <= tmp(0);
end dataFlow;
Verilog-HDL:
module FullAdder5 (a,b,cIn,sum,cOut);
  input a,b,cIn;
  output cOut,sum;
  assign {cOut,sum} = a + b + cIn;
endmodule
Hierarchy, Abstraction, and Accuracy
- Structural models simply describe interconnections
- Structural models do not describe any form of
behavior
- Hierarchy expresses different levels of detail
- Structural models are a way to manage large,
complex designs
- Modern designs have several tens of millions of gates
- Simulation time: the more detailed the description of
a design, the more events are generated and thus
the longer the simulation takes.
[Figure: hierarchy tree with FullAdder3 at the top level, OR2 and HalfAdder3 below it, and AND2 and XOR2 at the bottom level]
Generics
- The VHDL language provides the ability to construct
parameterized models using the concept of generics
library IEEE;
use IEEE.std_logic_1164.all;
entity AND2 is
  generic(andDelay: Time);
  port(a,b : in std_logic; z: out std_logic);
end AND2;
architecture genericDelay of AND2 is
begin
  z<=a and b after andDelay;
end genericDelay;

library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder4 is
  generic(adderDelay: Time:=3 ns);
  port(a,b : in std_logic;
       sum,carry: out std_logic);
end HalfAdder4;
architecture genericDelay of HalfAdder4 is
  component AND2 is
    generic(andDelay: Time);
    port(a,b : in std_logic; z: out std_logic);
  end component;
  component XOR2 is
    generic(xorDelay: Time);
    port(a,b : in std_logic; z: out std_logic);
  end component;
begin
  -- values can be assigned to generics at different locations;
  -- no semicolon is needed between generic map and port map
  C1: XOR2 generic map(12 ns) port map(a,b,sum);
  C2: AND2 generic map(adderDelay) port map(a,b,carry);
end genericDelay;
More on Generics
- Within a structural model there are two ways in
which the values of generic constants of lower level
components can be specified:
- in the component declaration
- in the component instantiation
- If both are specified, then the value provided by the
generic map() takes precedence.
- If neither is specified, then the default value
defined in the model is used.
library IEEE;
use IEEE.std_logic_1164.all;
entity GenericOR is
  generic(n: positive:=2);
  port(in1: in std_logic_vector((n-1) downto 0); z: out std_logic);
end GenericOR;
architecture behavioral of GenericOR is
begin
  process(in1)
    variable sum: std_logic:='0';
  begin
    sum:='0';
    for i in 0 to (n-1) loop
      sum:=sum or in1(i);
    end loop;
    z<=sum;
  end process;
end behavioral;
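The two ways of setting a generic can be tried out on GenericOR. This is a minimal sketch, assuming default binding of the component to the entity of the same name. The test bench GenericORTest, the instance labels U3 and U4 and the chosen widths are illustrative and not part of the slides.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity GenericOR is
  generic(n: positive := 2);
  port(in1: in std_logic_vector((n-1) downto 0); z: out std_logic);
end GenericOR;

architecture behavioral of GenericOR is
begin
  process(in1)
    variable sum: std_logic;
  begin
    sum := '0';
    for i in 0 to (n-1) loop
      sum := sum or in1(i);
    end loop;
    z <= sum;
  end process;
end behavioral;

library IEEE;
use IEEE.std_logic_1164.all;

entity GenericORTest is
end GenericORTest;

architecture sketch of GenericORTest is
  component GenericOR
    generic(n: positive := 3);   -- value given in the component declaration
    port(in1: in std_logic_vector((n-1) downto 0); z: out std_logic);
  end component;
  signal x3: std_logic_vector(2 downto 0) := "000";
  signal x4: std_logic_vector(3 downto 0) := "0000";
  signal z3, z4: std_logic;
begin
  -- no generic map: n = 3 from the component declaration is used
  U3: GenericOR port map(in1 => x3, z => z3);
  -- generic map in the instantiation takes precedence: n = 4
  U4: GenericOR generic map(n => 4) port map(in1 => x4, z => z4);

  stimulus: process
  begin
    wait for 1 ns;
    assert z3 = '0' and z4 = '0' report "all-zero inputs failed" severity failure;
    x4 <= "0100";
    wait for 1 ns;
    assert z3 = '0' and z4 = '1' report "4-input OR failed" severity failure;
    wait;
  end process;
end sketch;
```

The port widths of the two instances follow from their generic values, so connecting x4 to U3 would be reported as a length mismatch.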
Exercises vlsi21: Hierarchy, Generic
- Ex411 (difficulty: medium): Write VHDL code
for an 8 bit ALU based on the definitions made in
Ex402 with the Simple1bitALU.
a) Write behavioral VHDL code in ALU8b.vhd.
b) Write the structural VHDL code for ALU8 in one
file ALU8s.vhd. Assume a delay of 1 ns for all
logic gates. What is the worst-case delay of the
ALU8?
- Ex412 (difficulty: easy): Write VHDL code for
an n bit register with reset and enable inputs
(NbitRegister.vhd).
Configuration
- Structural models may employ different levels of
abstraction
- Each component in a structural model may be
described as a behavioral or a structural model
- Configuration allows stepwise refinement in a
design cycle
- Configuration represents resource binding
- Description-synthesis design method
Configuration associates an architecture
description (behavioral or structural) with
each component.
[Figure: hierarchy tree with FullAdder3 at the top, OR2 and HalfAdder3 below it, and AND2 and XOR2 at the bottom]
Configuration: Component Binding
- Example of binding architectures: a bit-serial adder
- one of the different architectures must be
bound to the component C1 for simulation
- the entity is not bound, as interfaces do not change
[Figure: bit-serial adder with C1 Comb (combinational logic; ports a, b, carryIn, sum, carry) and C2 Dff (ports d, q, clk, rst), with the inputs clock and reset]
architecture gateLevel of Comb is
...
architecture lowPower of Comb is
...
architecture highSpeed of Comb is
...
architecture behavioral of Comb is
...
Configuration: Default Binding Rules
- To analyze different implementations, we simply
  change the configuration, compile and simulate.
- When newer component models become available we
  bind the new architecture to the component
Default binding rules:
- if the entity name is the same as the component
  name, then this entity is bound to the component
- if there are different architectures in the working
  directory, the last compiled architecture is bound to
  the entity
JMM v1.4
Example: Configuration
[Figure: SerialAdder. Component C1 (Comb, combinational logic, architecture highSpeed) with inputs in1, in2, carryIn and outputs sum, carry; component C2 (MyDff, architecture behavioral) with ports q, d, clk, rst, driven by clock and reset.]
-- CFG_HighSpeed: configuration name (used for simulation)
-- SerialAdder: entity name; structural: architecture name
configuration CFG_HighSpeed of SerialAdder is
for structural
-- WORK: library name; Comb: entity name; highSpeed: architecture name
for C1: Comb use entity WORK.Comb(highSpeed);
end for;
-- if a different component than described in the entity is used,
-- then the I/O mapping must be declared
for C2: Dff use entity MyLibrary.MyDff(behavioral)
generic map(gateDelay=>5 ns)
port map(my_clk=>clk, my_d=>d,
my_q=>q, my_rst=>rst);
end for;
end for;
end CFG_HighSpeed;
JMM v1.4
Exercises vlsi21: Configuration
- Ex413 (difficulty: easy): Write the VHDL code of
  the bit-serial adder shown in the previous
  transparency (SerialAdder.vhd).
  a) Construct a model for the two components Comb
  and MyDff and place them both in your WORK
  library (don't use the library MyLibrary yet).
  b) Adapt the configuration, compile and simulate it.
- Ex414 (difficulty: medium): Consider the circuit
  shown below (ConfigExample). Construct a
  structural model comprised of three components.
  However, in the configuration use only two
  components by using an n-input AND gate.
[Figure: ConfigExample circuit with inputs i1, i2, i3, two AND gates (&), one OR gate (≥1) and output o1]
JMM v1.4
chapter 6
Subprograms, Packages and Libraries
- VHDL provides mechanisms for structuring
  programs, reusing software modules, and otherwise
  managing design complexity.
- Packages contain definitions of procedures and
  functions that can be shared across different VHDL
  models.
- Packages may contain user defined data types and
  constants and can be placed in libraries.
- Summary: procedures, functions, packages and
  libraries provide facilities for creating and
  maintaining modular and reusable VHDL programs.
JMM v1.4
Functions
- Functions are used to compute a value based on the
  values of the input parameters. Functions are
  placed in declarative parts. Example of a function
  declaration:
function rising_edge (signal clock: in std_logic) return boolean;
- Functions cannot modify parameter values
  (procedures can). Example of a function call:
rising_edge(clk)
- Functions execute in zero simulation time, thus
  wait statements cannot exist in functions.
  Parameters are restricted to be of mode in.
-- the mode "in" does not have to be written explicitly
function rising_edge (signal clock: std_logic) return boolean is
variable edge: boolean:=false;
begin
edge:=(clock='1' and clock'event);
return(edge);
end rising_edge;
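A function call such as rising_edge(clk) typically appears as the condition of a clocked process. The following is a minimal sketch of that usage; the entity name SimpleDff and its ports are assumptions for illustration, and the function used is the rising_edge provided by the IEEE std_logic_1164 package.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical entity, used only to show a function call in context
entity SimpleDff is
  port(clk, d: in std_logic; q: out std_logic);
end SimpleDff;

architecture behavioral of SimpleDff is
begin
  process(clk)
  begin
    -- rising_edge returns a boolean and consumes no simulation time
    if rising_edge(clk) then
      q <= d;
    end if;
  end process;
end behavioral;
```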
JMM v1.4
Example: Type Conversion with Functions
- As VHDL is a type sensitive language, type
  conversions are quite often necessary.
-- note: the size of svalue is not declared (unconstrained parameter)
-- this version assumes that svalue is indexed from 0
function to_bitvector(svalue: std_logic_vector) return bit_vector is
variable outvalue: bit_vector(svalue'length-1 downto 0);
begin
for i in svalue'range loop
case svalue(i) is
when '0' => outvalue(i):='0';
when '1' => outvalue(i):='1';
when others => outvalue(i):='0';
end case;
end loop;
return outvalue;
end to_bitvector;
- Many conversion functions as well as resolution
  functions can be found in the std_logic_1164 and
  std_logic_arith packages and others. Have a look at
  $SYNOPSYS/packages/IEEE/src/
JMM v1.4
Procedures
- Procedures are subprograms that can modify one or
  more of their parameters. Example of a procedure
  declaration reading from a file f:
procedure read_v1d (variable f: in text; v: out std_logic_vector);
- If the class of the procedure parameters is not
  explicitly declared, then the following rules apply:
  - parameters of mode in are assumed to be of class constant
  - parameters of mode out or inout are assumed to be of class
    variable
- Variables declared within a procedure are initialized
  on each call to the procedure and their values do not
  persist across invocations of the procedure.
- Signals cannot be declared within procedures
- Poor programming: procedures declared within a
  process can make assignments to signals
  corresponding to the ports of the encompassing
  entity.
- Procedure call:
Dff(clk=>clk,reset=>reset,d=>s2,q=>s1,qbar=>open);
JMM v1.4
Example: Procedure
library IEEE;
use IEEE.std_logic_1164.all;
entity CPU is
port(di: out std_logic_vector(31 downto 0);
addr: out std_logic_vector(2 downto 0);
r,w: out std_logic;
do: in std_logic_vector(31 downto 0);
s: in std_logic);
end CPU;
architecture behavioral of CPU is
procedure Mread(address: in std_logic_vector(2 downto 0);
signal r: out std_logic;
signal s: in std_logic;
signal addr: out std_logic_vector(2 downto 0);
signal data: out std_logic_vector(31 downto 0)) is
begin
addr<=address;
r<='1';
wait until s='1';
data <= do;
r<='0';
end Mread;
procedure Mwrite(address: in std_logic_vector(2 downto 0);
signal data: in std_logic_vector(31 downto 0);
signal addr: out std_logic_vector(2 downto 0);
signal w: out std_logic;
signal di: out std_logic_vector(31 downto 0)) is
begin
addr<=address;
w<='1';
wait until s='1';
di <= data;
w<='0';
end Mwrite;
begin
-- CPU behavioral
-- description
end behavioral;
JMM v1.4
Overloading
- A very useful feature of the VHDL language is the
  ability to overload a subprogram or an operator.
- Imagine writing different flip-flop models without
  and with asynchronous inputs and with different
  argument types. With the overloading feature one
  single flip-flop name can be used for all of them.
- Example of Dff calls:
Dff(clk,d,q,qbar);
Dff(clk,d,q,qbar,reset,clear);
- From the type and number of arguments the compiler
  can tell which procedure is meant.
- Note that in std_logic_1164.vhd the boolean
  functions and, or, etc. have been defined for
  std_logic types; the functions +, *, etc. have been
  defined for certain predefined types of the language
  such as integer. See also the std_logic_arith package.
function "*"(arg1,arg2: std_logic_vector) return std_logic_vector;
function "+"(arg1,arg2: signed) return signed;
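Overloading can be sketched with two functions that share one name but differ in their parameter type. The package name OverloadDemo and the function name to_std_logic are assumptions made for this illustration; they are not taken from std_logic_1164 or from the course packages.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical package: two functions with the same name, different argument types
package OverloadDemo is
  function to_std_logic(b: bit) return std_logic;
  function to_std_logic(b: boolean) return std_logic;
end OverloadDemo;

package body OverloadDemo is
  -- Version selected when the argument is of type bit
  function to_std_logic(b: bit) return std_logic is
  begin
    if b = '1' then
      return '1';
    else
      return '0';
    end if;
  end to_std_logic;

  -- Version selected when the argument is of type boolean
  function to_std_logic(b: boolean) return std_logic is
  begin
    if b then
      return '1';
    else
      return '0';
    end if;
  end to_std_logic;
end OverloadDemo;
```

A call such as to_std_logic(enable_bit) or to_std_logic(count = 0) is resolved by the analyzer from the argument type alone, so the caller never has to pick a differently named function.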
JMM v1.4
Packages
- Logically related functions and procedures can be
  grouped into packages, and thus easily be shared
  among designs and people.
-- package declaration: similar to a VHDL entity, defines interfaces
package MyLibraryPackage is
-- type declarations
-- function declarations
-- procedure declarations
end MyLibraryPackage;
-- package body: similar to a VHDL architecture, defines behavior
package body MyLibraryPackage is
-- functions
-- procedures
end MyLibraryPackage;
- The package declaration needs to be analyzed first;
  then the package body can be analyzed.
- Packages are used as libraries and referenced within
  VHDL design units via the use clause.
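As a sketch of how a package declaration, its body and a use clause fit together, the example below defines a small package and references it from a design unit. All names (BusDefs, BUS_WIDTH, bus_t, all_zero, ZeroDetect) are hypothetical and chosen only for this illustration; the package is assumed to be analyzed into the WORK library.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Package declaration: the interface (constant, subtype, function header)
package BusDefs is
  constant BUS_WIDTH: positive := 8;
  subtype bus_t is std_logic_vector(BUS_WIDTH-1 downto 0);
  function all_zero(v: std_logic_vector) return boolean;
end BusDefs;

-- Package body: the implementation of the declared function
package body BusDefs is
  function all_zero(v: std_logic_vector) return boolean is
  begin
    for i in v'range loop
      if v(i) /= '0' then
        return false;
      end if;
    end loop;
    return true;
  end all_zero;
end BusDefs;

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.BusDefs.all;  -- makes bus_t and all_zero visible in this design unit

entity ZeroDetect is
  port(d: in bus_t; isZero: out std_logic);
end ZeroDetect;

architecture behavioral of ZeroDetect is
begin
  isZero <= '1' when all_zero(d) else '0';
end behavioral;
```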
JMM v1.4
Example: Package Declaration
package std_logic_1164 is
type std_ulogic is ('U', -- uninitialized
'X', -- forcing unknown
'0', -- forcing 0
'1', -- forcing 1
'Z', -- high impedance
'W', -- weak unknown
'L', -- weak 0
'H', -- weak 1
'-'); -- don't care
type std_ulogic_vector is array (natural range <>) of std_ulogic;
subtype std_logic is resolved std_ulogic;
type std_logic_vector is array (natural range <>) of std_logic;
function "and" (l,r: std_logic_vector) return std_logic_vector;
function "and" (l,r: std_ulogic_vector) return std_ulogic_vector;
-- rest of package declaration
end std_logic_1164;
JMM v1.4
Libraries
- Each design unit (entity, architecture, package) is
  analyzed (compiled) and placed in a design library.
- Libraries are generally implemented as directories
  and are referenced by a logical name.
- In VHDL the libraries STD and WORK are
  implicitly declared.
- WORK is the working design library, normally
  placed in a local directory.
- Once a library has been declared, the functions,
  procedures and type declarations of a package can be
  made visible with a use clause.
-- all functions, procedures and types are visible:
library IEEE;
use IEEE.std_logic_1164.all;
-- only the "xnor" function is visible:
library IEEE;
use IEEE.std_logic_1164."xnor";
Visibility must be established separately for each
design unit (e.g. for each entity).
JMM v1.4
Synopsys tools on unix workstations
Example: Libraries and Packages

Directory structure:
/home/MyHome/
- VHDLdesign/   all source VHDL design files
  - WORK/       all compiled designs
  - lib/        library MyLibrary (compiled package: MyPackage)
  - src/        source file: MyPackage.vhd

Design environment file .synopsys_vss.setup:
DEFAULT: ./WORK
MyLibrary : ./lib
use = . ./src
timebase = ns

MyPackage.vhd:
package MyPackage is
--
end MyPackage;
package body MyPackage is
--
end MyPackage;

MyVHDLdesign.vhd:
library MyLibrary;
use MyLibrary.MyPackage.all;
-- use MyLibrary.all; (components can also be placed into libraries)
entity MyVHDLdesign is
...

In a unix shell:
cd /home/MyHome/VHDLdesign
gvan -w MyLibrary src/MyPackage.vhd    (analyze the package MyPackage)
gvan MyVHDLdesign.vhd                  (analyze the design MyVHDLdesign)

Libraries:
library MyLibrary: /home/MyHome/VHDLdesign/lib
library WORK: /home/MyHome/VHDLdesign/WORK
JMM v1.4
Exercises vlsi21: Libraries & Packages
- Ex415 (difficulty: medium): The small circuit
  ConfigExample from exercise Ex414 shall be
  rewritten by using the components OR2 and ANDn
  from the library MyLibrary.
  a) Write the VHDL file MyComponents.vhd holding
  the two components OR2 and ANDn and compile it
  into the library MyLibrary.
  b) Rewrite the ConfigExample circuit using only
  library elements and call it LibraryExample.vhd,
  compile and simulate it.
[Figure: circuit with inputs i1, i2, i3, two ANDn gates (&), one OR2 gate (≥1) and output o1]
- Ex416 (difficulty: medium): Write the VHDL
  package MyPackage with the functions OneCounter
  (counting '1') and ParityGenerator (both should accept
  std_logic_vectors or bit_vectors of any size), and
  analyze it into the library MyLibrary. Use the
  defined functions in your design PackageExample to
  show its functionality.
JMM v1.4
Exercises vlsi21: Packages
- Ex417 (difficulty: medium): The bit-serial adder of
  exercise Ex413 shall be rewritten using a
  procedure call for the Dff instead of a component
  (SerialAdder2.vhd). Place the procedure into a
  package MyPackage and analyze it into the library
  MyLibrary. Verify the functionality.
[Figure: bit-serial adder. Component C1 (Comb, combinational logic, architecture highSpeed) with inputs in1, in2, carryIn and outputs sum, carry; Dff (behavioral, from library MyLibrary) with ports q, d, clk, rst, driven by clock and reset.]
JMM v1.4
VHDL vs. Verilog: Data Types
VHDL
- type driven language
- predefined data types in packages:
  character, integer, real, bit, std_logic, textio, ...
- enumerate types
- arrays
- records
- pointers
Verilog-HDL
- arrays
- run-time constants: parameter
- continuous driven nets: wire, tri, ...
- triggered assignments: reg, integer, real, ...
JMM v1.4
VHDL vs. Verilog: Operators
Operator type   function         VHDL       Verilog
arithmetic      a+b              +          +
                a-b              -          -
                a*b              *          *
                a/b              /          /
                a-b*n, a div b   mod        %
                a-(a/b)*b        rem
logical         a and b          and        &
                a or b           or         |
                not(a and b)     nand       ~&
                a exor b         xor        ^
shift logic                      srl, sll   >>
shift arith.                     sra, sla
rotate                           ror, rol
reduction, concatenation         &          {a,b}
replication                                 {4{a}}
relational      >                >          >
                >=               >=         >=
                                 /=
JMM v1.4
VHDL vs. Verilog:
Sequential Structures
VHDL:
-- inside an architecture
...
process (clk)
-- variables are declared in the process, not in the architecture
variable inp: std_logic_vector (7 downto 0);
variable outp,cout: std_logic_vector (7 downto 0);
begin
if (clk'event and clk = '1') then
outp := outp + inp; -- sequentially executed statements
cout := outp + 1;
end if;
end process;
...
Verilog-HDL:
/* inside a module */
...
wire [7:0] inp;
reg [7:0] outp, cout;
...
always @(posedge clk)
begin
outp = outp + inp; // sequentially executed statements
cout = outp + 1;
end
...
JMM v1.4
VHDL vs. Verilog:
Parallel Structures
VHDL:
-- in an architecture
...
signal inp: std_logic_vector (7 downto 0);
signal outp,cout: std_logic_vector (7 downto 0);
...
p1: process (clk) -- p1 and p2 are parallel executed blocks
begin
if (clk'event and clk = '1') then
outp <= outp + inp; -- parallel executed statements
cout <= outp + 1;
end if;
end process;
p2: process (reset)
begin
if (reset = '0') then
outp <= "00000000"; -- outp now has two drivers (p1 and p2)
end if;
end process;
...
Verilog-HDL:
/* in a module */
...
wire [7:0] inp;
reg [7:0] outp, cout;
...
always @(posedge clk)
fork
outp = outp + inp;
cout = outp + 1;
join
always @(reset)
if (!reset)
outp = 8'b0;
...
JMM v1.4
VHDL vs. Verilog: Assignments
VHDL:
architecture ex1 of AssignExample is
signal x1, y1, y2, z1, z2:
std_logic_vector (7 downto 0);
...
begin
p1: process (clk)
begin
if (clk'event and clk = '0') then
x1 <= y1; -- signal assignment
y1 <= x1;
z1 <= y1 after 12 ns;
end if;
end process;
p2: process (y2)
variable x2: std_logic_vector (7 downto 0);
begin
x2 := y2; -- variable assignment
y2 <= x2;
z2 <= y2 after 12 ns;
end process;
end ex1;
Verilog-HDL:
module AssignExample;
wire [7:0] v,y2,z2;
reg [7:0] x1,y1,z1,x2;
...
always @(posedge clk)
fork
x1 = y1;
y1 = x1;
z1 = #12 y1;
join
assign x2 = y2;
assign y2 = x2;
assign #(12) z2 = y2;
endmodule
Question:
before the falling edge of clk: x=1, y=2, z=3
12 ns after the falling edge of clk: x= y= z= ?
JMM v1.4
VHDL vs. Verilog:
Sequential Logic
VHDL:
-- register with asynchronous reset
library IEEE;
use IEEE.std_logic_1164.all;
package MyDefinition is
type vector16 is array (15 downto 0) of
std_logic;
end MyDefinition;
library IEEE;
use IEEE.std_logic_1164.all;
use work.MyDefinition.all;
entity AsynRegister is
port (clk,rst: in std_logic;
a: in vector16; z: out vector16);
end AsynRegister;
architecture DemoExample of AsynRegister is
begin
process (clk, rst)
begin
if (rst = '0') then
z <= vector16'(others => '0');
elsif (clk'event and clk = '1') then
z <= a;
end if;
end process;
end DemoExample;
Verilog-HDL:
// register with asynchronous reset
module AsynRegister(clk,rst,a,z);
input clk,rst;
input [15:0] a;
output [15:0] z;
reg [15:0] z;
// the reset edge is part of the event list, so the reset acts asynchronously
always @(posedge clk or negedge rst)
if (rst == 1'b0)
z = 16'b0;
else
z = a;
endmodule
JMM v1.4
“Dataflow” Modeling
library IEEE;
use IEEE.std_logic_1164.all;
entity Demux2x4 is port(
a,b,enable: in std_logic;
z: out std_logic_vector(0 to 3));
end Demux2x4;
architecture dataflow of Demux2x4 is
signal abar,bbar: std_logic;
begin
z(3) <= not(a and b and enable);
z(0) <= not(abar and bbar and enable);
bbar <= not b;
z(2) <= not(a and bbar and enable);
abar <= not a;
z(1) <= not(abar and b and enable);
end dataflow;
All the signal assignment statements (“<=“) happen

concurrently after some specified delay which defaults
to 1 “delta”, an infinitesimally small delay. Note that
concurrent statements are always “running” so whenever
A, B or ENABLE change then ABAR, BBAR, and Z(0 to 3)
will also change after some delay.
The delay in assigning a signal its new value means that

the following statement is meaningful (it generates a
periodic waveform):
CLK <= not CLK after 10 ns;
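The periodic waveform statement can be tried out in a small, self-contained model. The sketch below is an illustration only: the entity name ClockGen and the port clk_out are assumptions. It also shows one practical detail: a std_logic signal starts at 'U', and not 'U' is again 'U', so the signal needs an explicit initial value for the clock to start toggling.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical clock generator for simulation (not synthesizable)
entity ClockGen is
  port(clk_out: out std_logic);
end ClockGen;

architecture dataflow of ClockGen is
  -- explicit initial value: without it the signal would stay at 'U'
  signal clk: std_logic := '0';
begin
  clk <= not clk after 10 ns;  -- toggles every 10 ns, i.e. a 20 ns period
  clk_out <= clk;
end dataflow;
```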
JMM v1.4
VHDL Example: Behavioral Modeling
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
entity Demux2x4 is port(
a,b,enable: in std_logic;
z: out std_logic_vector(0 to 3));
end Demux2x4;
architecture behavioral of Demux2x4 is
begin
process(a,b,enable)
-- local variables (separate copies for each instance of Demux2x4)
variable abar,bbar: std_logic;
begin
abar := not a;
bbar := not b;
if (enable = '1') then
z(3) <= not(a and b);
z(2) <= not(a and bbar);
z(1) <= not(abar and b);
z(0) <= not(abar and bbar);
else
z <= "1111"; -- vector constant
end if;
end process;
end behavioral;
Process statements can be compiled, so behavioral
simulations can be quite fast.
Statements within a process are executed sequentially,
like a program. The process is scheduled for execution
after any events are processed for signals on its
sensitivity list. Values of local variables are maintained
between executions.
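That process variables keep their values between executions can be seen in a simple event counter. This is a sketch under assumptions: the entity name EdgeCounter and its ports are made up for illustration, and the model is meant for simulation.

```vhdl
-- Hypothetical example: counts rising clock edges
entity EdgeCounter is
  port(clk: in bit; count: out natural);
end EdgeCounter;

architecture behavioral of EdgeCounter is
begin
  process(clk)
    -- initialized once at the start of simulation, then retained
    variable n: natural := 0;
  begin
    if clk'event and clk = '1' then
      n := n + 1;    -- the variable is updated immediately
      count <= n;    -- the signal receives the new value one delta later
    end if;
  end process;
end behavioral;
```

This differs from variables declared inside a procedure, which are re-initialized on every call.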
JMM v1.4
Synthesis
Idea: once a behavioral model has been finished, why not
use it to automatically synthesize a logic implementation in
much the same way as a compiler generates executable code
from a source program?
a.k.a. "silicon compilers"
Synthesis programs process the HDL, then
- infer logic and state elements
- perform technology-independent optimizations
  (e.g., logic simplification, state assignment)
- map elements to the target technology
- perform technology-dependent optimizations
  (e.g., multi-level logic optimization, choose
  gate strengths to achieve speed goals)
Synopsys, Inc. is the current leader in
providing synthesis tools and synthesizable
HDL modules.
JMM v1.4
Logic Synthesis
Z <= (A and B) or C;
[Figure: gate-level implementations with inputs A, B, C and output Z]
if (SEL = '1') then Z <= B;
else Z <= A;
end if;
[Figure: 2:1 multiplexer with data inputs B (1) and A (0), select input SEL and output Z, and its gate-level implementation]
signal x,y,sum: std_logic_vector(3 downto 0);
sum <= unsigned(x) + unsigned(y);
[Figure: chain of four full adders with inputs X(0..3) and Y(0..3), carry input 0, outputs SUM(0..3); the final carry output is not connected (NC)]
process(word)
variable result: std_logic;
begin
result := '0';
for j in 0 to 3 loop
result := result xor word(j);
end loop;
parity <= result;
end process;
[Figure: XOR gates combining WORD(3), WORD(2), WORD(1) and WORD(0) into PARITY]
JMM v1.4
FSM Example
architecture behavioral of Moore is
type StateType is (S0,S1,S2,S3);
-- "next" is a reserved word in VHDL, so the signal is named nextState
signal current,nextState: StateType;
begin
process(current,a) -- state transition
begin
nextState <= current; -- default: stay in the current state
case current is
when S0 =>
if (a='1') then nextState<=S2; end if;
when S1 =>
if (a='1') then nextState<=S0; else nextState<=S2; end if;
when S2 =>
if (a='1') then nextState<=S3; end if;
when S3 =>
if (a='1') then nextState<=S1; end if;
end case;
end process;
process(current) -- output logic
begin
case current is
when S0 => z <= '0';
when S1 => z <= '0';
when S2 => z <= '1';
when S3 => z <= '1';
end case;
end process;
process(clk,reset) -- state register
begin
if (reset='0') then current<=S0;
elsif (clk'event and clk='1') then
current <= nextState;
end if;
end process;
end behavioral;
[Figure: state diagram of the Moore machine with states S0 (z=0), S1 (z=0), S2 (z=1) and S3 (z=1); the transitions are labeled with the input a]
JMM v1.4
Further Reading
ISBN 0-13-181447-8 ISBN 0-7923-9472-0
Also:
- D. Perry, VHDL, Second Edition, McGraw Hill, 1993
- see VHDL tutorials at I3S-CD or on the web
  http://www.microlab.ch/academics/courses/vlsi/g.html
- don't forget to study the CBT tutorial on VHDL
JMM v1.4
Test Bench /1
- avoid interactive simulation, because it can never
  be used again
- test benches reduce total simulation development
  time
- test benches are used to verify designs during
  stepwise refinement
- test bench methodology bridges simulation with
  automatic test equipment (ATE)
"I can relax, my test bench does everything automatically"
JMM v1.4
Test Bench /2
- compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is nice measurement equipment
  - there are skilled and hard working people
  - there are no signals coming from or going to the outside of the
    lab
[Figure: test bench with control, stimulus generation and response blocks]
"Why do we need MicroLab if my test bench does the job as well?"
JMM v1.4
Test Bench in Design Flow
[Figure: test bench in the design flow. The same test bench (inp/out) is applied at each step.]
design of VHDL model
simulation of VHDL model (test bench around the VHDL model)
FPGA path:
  FPGA synthesis, place & route
  FPGA test (debugger): test bench on the test machine with the FPGA chip
ASIC path:
  synthesis of logic model
  simulation of logic model (test bench)
  place & route, physical design
  simulation of extracted model (test bench)
  ASIC fabrication
  prototype test (ASIC): test bench on the test machine with the ASIC chip
JMM v1.4
VHDL Test Bench
library IEEE;
use IEEE.std_logic_1164.all;
entity TestBench is -- test bench has no inputs, no outputs
end TestBench;
architecture sample of TestBench is
signal clk, a: bit;
signal b: bit;
component MyCircuit
port(clk,a:in bit; b: out bit);
end component;
begin
-- call of device under test (DUT)
DUT: MyCircuit port map (clk,a,b);
process
begin -- clk generation
clk <= '0', '1' after 20 ns, '0' after 70 ns;
wait for 100 ns;
end process;
TestPatternGenerator: block
begin -- test pattern generation on a cycle by cycle basis
process
begin
a <= '0'; -- test cycle 1
wait for 100 ns;
a <= '1'; -- test cycle 2
wait for 100 ns;
...
end process;
end block;
-- response pattern verification not yet implemented!
end sample;
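One possible way to add the missing response verification is to check the DUT output with assert statements near the end of each test cycle. The sketch below is self-contained and rests on assumptions: the behavior of MyCircuit (b follows a on each rising clock edge) is invented so that the example can be simulated, and the entity name TestBenchChecked and the 90 ns sample point are illustrative choices.

```vhdl
-- Hypothetical device under test: b follows a on each rising clock edge
entity MyCircuit is
  port(clk, a: in bit; b: out bit);
end MyCircuit;

architecture behavioral of MyCircuit is
begin
  process(clk)
  begin
    if clk'event and clk = '1' then
      b <= a;
    end if;
  end process;
end behavioral;

entity TestBenchChecked is
end TestBenchChecked;

architecture sample of TestBenchChecked is
  signal clk, a, b: bit;
  component MyCircuit
    port(clk, a: in bit; b: out bit);
  end component;
begin
  DUT: MyCircuit port map(clk, a, b);

  -- clock: rising edge at 20 ns, falling edge at 70 ns of each 100 ns test cycle
  process
  begin
    clk <= '0', '1' after 20 ns, '0' after 70 ns;
    wait for 100 ns;
  end process;

  -- stimulus and response check, one test cycle per pattern
  process
  begin
    a <= '0';                 -- test cycle 1: apply input pattern
    wait for 90 ns;           -- capture the response late in the cycle
    assert b = '0' report "cycle 1: expected b = '0'" severity error;
    wait for 10 ns;
    a <= '1';                 -- test cycle 2
    wait for 90 ns;
    assert b = '1' report "cycle 2: expected b = '1'" severity error;
    wait for 10 ns;
    wait;                     -- stop after the last pattern
  end process;
end sample;
```

The assertions stay silent as long as the response matches, so a clean simulation run means that every checked cycle passed.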
JMM v1.4
Test Bench - Test Cycle
- design strictly synchronous circuits
- cycle based test benches
[Figure: timing of one test cycle. Input patterns are applied at the start of the cycle and the output response is captured at its end; waveforms shown for clock, input (valid) and output (stable).]
JMM v1.4
ProTest Test Machine
- test bench controls CAD simulator and test machine
- low cost rapid prototyping and test system
JMM v1.4
Conclusions
- HDLs are very useful for behavioral hardware
  system descriptions
- abstract models do not precisely reflect reality
- restriction to synthesizable coding is necessary
- technology independence opens the possibility of
  fast FPGA prototyping
- test benches increase chip quality and greatly
  decrease simulation time
JMM v1.4
Exercises: VLSI-21: Test-Bench
- CAD Ex418: Test-Bench (difficulty: easy): Instead
  of interactive simulation or writing macros for
  interactive simulation, it is state-of-the-art to use
  test-benches for simulation and chip test. Write a
  test bench file tb_SerialAdder2.vhd for the
  previous exercise Ex417. Generate the clock signal
  with a process and write sequential test cycles for
  the input signals. Be aware that the test-bench has
  no input and output signals, but calls the unit-
  under-test (UUT) and generates all stimuli.
[Figure: test bench enclosing the VHDL model, driving its inputs (inp) and observing its outputs (out)]
JMM v1.4
Coming Up...
- Next topic…
  CAD exercises and mini FPGA projects: PWM, blackjack
  dealer, simple microprocessor, etc.
- Readings for next time...
  VHDL tutorials
  A Prototype Test System for ASICs and FPGAs with a Tight
  Link to VHDL and Verilog-HDL Based CAD Simulators,
  DATE'99, Design Automation and Test Engineering in
  Europe Conference, Jacomet et al. (see the MicroLab
  web)
  On a Development Environment for Real-Time information
  Processing in System-on-Chip Solutions, SBCCI'01,
  IEEE 14th Symposium on Integrated Circuit and System
  Design, Brazil, Sept. 2001, Jacomet et al. (see the
  MicroLab web)
JMM v1.4
Exercises: VLSI-21 #1
- CAD Ex45x: PWM Project (difficulty: easy; time:
  medium): Design of a pulse width modulator
  (PWM) controlling a DC motor. The PWM shall
  have a microprocessor interface. The VHDL design
  is simulated, compiled and implemented in an
  FPGA and is supposed to drive a small DC motor.
- CAD Ex450 (difficulty: easy): Design the VHDL
  code of the PWM element. The btrdy and ack
  signals are handshake signals for communication
  with the microprocessor data bus. A value 0 on the
  8-bit data bus will switch off the DC motor
  (pOut='1'), a non-zero value will generate a PWM
  signal with an on-time of (data/256)*100% of a
  period. Analyze the VHDL syntax with gvan.
[Figure: PWM block with the ports btrdy, ack, data (8 bit), clk, rst and pOut; waveform of pOut showing the on-time (data/256)*100% within one PWM period]
JMM v1.4
- CAD Ex451 (difficulty: easy): Design a test-bench
  for the PWM. Simulate your VHDL code with the
  Synopsys VSS simulator and use your test-bench
  to verify its correct behavior.
  - Result: see exercise Ex451 on the MicroLab web
- CAD Ex452 (difficulty: easy): Synthesize the
  PWM VHDL code into a gate level schematic for a
  Xilinx FPGA target technology. Connect your VHDL
  signals to the correct FPGA pins. Perform the
  place & route of the logic elements.
- CAD Ex453 (difficulty: easy): Download your PWM
  circuit into an FPGA and apply different
  PWM values to your circuit with the GECKO User
  Interface tool. Use an oscilloscope to verify its
  correct output behavior. This exercise has to be
  done in MicroLab, using the GECKO system.
JMM v1.4
- CAD Ex400 (difficulty: easy): Design a 2:1
  multiplexer in VHDL. Use a dataflow model. Simulate your
  VHDL code with the Synopsys VSS simulator. Get
  familiar with interactive simulation.
  - Result: see exercise Ex400 on the MicroLab web
- CAD Ex401 (difficulty: medium): Design the
  SN74160 synchronous decimal counter in VHDL. Use a
  behavioral model. Simulate your VHDL code with
  the Synopsys VSS simulator and use macros for
  interactive simulation.
- CAD Ex402 (difficulty: easy): Schematic entry of
  the blackjack dealer on block level using the Synopsys
  SGE tool.
JMM v1.4
- CAD Ex403 (difficulty: easy): Skeleton VHDL code
  generation for the blackjack dealer from block-level
  schematics using the Synopsys SGE tool.
- CAD Ex404 (difficulty: medium): Design the
  VHDL code of your blackjack dealer. Use the
  prepared templates as a guide.
- CAD Ex405 (difficulty: medium): Write a VHDL
  test bench for your blackjack dealer. Generate the
  following sequence of card values: 3, 11, 11, 7, 2,
  11, 6.
  (final result is: 21)
JMM v1.4
- CAD Ex406 (difficulty: easy): Synthesize the
  blackjack dealer VHDL code into a gate level
  schematic for a Xilinx FPGA target technology.
  Perform the place & route of the logic elements.
- CAD Ex407 (difficulty: medium): Perform a back-
  annotation of your FPGA chip and simulate it again
  with the real timing information. Does your chip
  still work? Look for the errors.
- CAD Ex408 (difficulty: medium): Download your
  blackjack dealer circuit into an FPGA and run your
  test bench again on the ProTest test system. This
  exercise has to be done in MicroLab.
JMM v1.4
VLSI Design I
Automatic Synthesis of Digital Circuits
Why should I not enjoy

life instead of drawing
schematics if CAD tools
can do the job for me?
Overview
design abstraction domains
architectural models
Goal: You are familiar with the design abstraction
domains, the description-synthesis design method,
the design strategies as well as the three synthesis
steps. You know the FSMD architectural model as
well as the interprocess communication models.
JMM v1.4
Introduction
system complexity is increasing
product lifetime is decreasing
=> design efficiency is essential
=> new design methods are necessary
=> higher abstraction levels are introduced
=> CAD tools able to handle large amounts of data are
needed
JMM v1.4
Design Methodology
[Figure: design flow with the tools used at each step]
budget ($, speed, area, power, schedule, risk)
specification
high-level architecture (paper & pencil; low-level building blocks, spice)
behavioural design, verification
logic design, verification (schematics, simulation, timing analysis)
layout, verification (layout, drc, extraction, net compare, LVS (layout vs schematic))
"Gee, I skipped these steps when doing the project!"
JMM v1.4
Capture-Simulation Method
bottom-up approach
structure of a system is described
knowledge of an experienced designer is difficult to
automate
[Figure: schematic with an AND gate (&) and a D flip-flop (D, Q, CLK), signals data, clk, ena]
JMM v1.4
Description-Synthesis Method
top-down approach
behaviour of a system is described
technology independent
CAD algorithms can search the solution space very
quickly
if data-ready then
bus := data;
else
bus := high-Z;
end if;
[Figure: synthesized schematic with an AND gate (&), a D flip-flop (D, Q) and clk]
JMM v1.4
Design methods for VLSI circuits
use advantages of top-down and bottom-up design
methods
automatic optimisations are not always ideal,
but
an optimisation of a 70'000 gate design on a
100'000 gate gate-array makes no sense
=> need of abstract design languages
=> need to keep the design cycle short
"What is it now, top-down or bottom-up?"
JMM v1.4
Abstraction Domains
VLSI designs can be performed in 3 abstraction
domains:
  behavioural domain
  structural domain
  physical domain
each domain gives different freedoms to the
designer
  parallel or serial algorithms
  logic technology and bit-slice
  full-custom and macro-cells ...
JMM v1.4
Abstraction Domains: Y-Chart
[Figure: Y-chart with three axes (behavioural, structural and physical domain) and four abstraction levels (system, micro architecture, logic, circuit); synthesis leads from the behavioural to the structural domain]
Behavioural Domain: applications, algorithms; programs; subroutines, b. equations; instructions
Structural Domain: processors; ALUs, registers; logic gates; transistors
Physical Domain: chips, MCMs, boards; chips, modules; cells; layout, transistors
JMM v1.4
Behavioural Domain
description and verification of first ideas
function and not implementation is asked
modelling with general purpose languages
  modula-2, pascal, c, c++, lisp, ...
  matlab, mathematica, ...
  vhdl, verilog-hdl, cathedral, ...
  graphic languages as vee, ...
transformation to structural domain: synthesis
[Figure: behavioural domain axis: programs; subroutines, b. equations; instructions]
JMM v1.4
Structural Domain
description and verification of a solution
implementation decisions taken
restrictions like delay, signal strength, etc.
modelling styles
  vhdl, verilog-hdl,
  schematic
transformation to physical domain:
  logic minimization, place and route tools
[Figure: structural domain axis: processors; ALUs, registers; logic gates; transistors]
JMM v1.4
Physical Domain
description and verification of physical
implementation
process technology specific implementation
  floorplan, mask-layout, packaging
description formats
  cif, gds2
  stick diagrams, symbolic layout
[Figure: physical domain axis: layout, transistors; cells; chips, modules; chips, MCMs, boards]
JMM v1.4
Abstraction Levels
design domains are divided in several abstraction
levels:
  system level
  micro architecture level
  logic level
  circuit level
JMM v1.4
Abstraction: System Level
highest abstraction level
description with HDLs or graphical block diagrams
[Figure: system level block diagram with a 64 bit RISC, a 24 bit graphic accelerator, a video interface, a 64 MByte memory, an 8 GByte hard disk and an ISDN interface]
JMM v1.4
Abstraction: Microarchitecture Level
register transfer system is a pure sequential
machine
use of memory elements and combinational logic
register transfer is a complete specification of
what a chip will do on every cycle
[Figure: registers (reg) separated by combinational logic blocks between input and output]
JMM v1.4
Abstraction: Logic Level
circuit description on a quite low abstraction level
today only used to design optimised functional
blocks
[Figure: logic level schematic of an ALU with a multiplexer (mux), inputs a, b, cin, sel and outputs s, cout]
JMM v1.4
Abstraction: Circuit Level
lowest abstraction level
transistor schematic or mask-layout
comparable to machine code in computer science
[Figure: transistor level schematic with signals b, c and y]
JMM v1.4
Design Strategies
the goal is to transfer an idea to a chip as fast
as possible
descriptions in the 3 abstraction domains
strategies used:
  hierarchy
  regularity
  modularity
  locality
"A strategy? Why not ad-hoc?"
JMM v1.4
Design Strategies: Hierarchy
basic idea: divide and conquer
dividing in modules, sub-modules until complexity
of sub-modules is comprehensible
comparison to software engineering: split programs
in modules, procedures, subroutines.
[Figure: adder block with inputs a, b, cin and outputs sum, cout, decomposed hierarchically into sub-modules]
JMM v1.4
Design Strategy: Regularity
=> goal is reduction of complexity
=> idea: divide in similar building blocks
identical blocks, sub-blocks, cells, transistor sizes
1-dim. arrays: bit-slice technique
2-dim. arrays: systolic arrays
[Figure: chain of four full adders (bit-slice) with inputs ai..ai+3 and bi..bi+3, sum outputs si..si+3 and carries ci-1..ci+3]
JMM v1.4
Design Strategies: Modularity
=> different modules should not influence each other
=> sub-modules with well formed interfaces:
  do not use transmission gates
  well defined signal types and strengths
  well defined interconnection widths, etc.
JMM v1.4
Design Strategies: Locality
=> idea: reduction of complexity due to information
hiding
  few global variables
  reduction of inter-module influences
  reduction of global wiring
  time locality leads to synchronous designs (compare
  local variables in software engineering)
"I can't see anything"
JMM v1.4
Automatic Synthesis /1
automatic synthesis: transformation of a design
from behavioural to structural domain
silicon compilation: transformation from
behavioural to physical domain
[Figure: synthesis leads from the behavioural domain to the structural domain; silicon compilation leads from the behavioural domain to the physical domain]
JMM v1.4
Automatic Synthesis /2
automatic synthesis tools on high abstraction levels
do not exist yet
not every description is synthesizable
synthesis is a design process and not only
coding as in software engineering
synthesis steps:
  allocation
  scheduling
  binding
JMM v1.4
Automatic Synthesis: Allocation
allocation defines the necessary resources
clocking strategy, pipelining, memory structure etc.
have to be defined
manual allocation reduces the search space of
design solutions
trade-off between chip-area and performance
parallel implementations of designs have high
throughput, but consume large areas
[Figure: design space plot of delay versus area with the solutions s1, s4, s6, s8, s10, s14, s18 and s22]
JMM v1.4
Allocation: Example
RTL example
x = a + b;
y = a * c;
z = x + d;
x = y - d;
x = x + c;
allocation: 1 adder, 1 multiplier, 1 subtractor
[Figure: data flow graph with inputs a, b, c, d, operations +, *, +, -, + and results y, z, x2, x3]
JMM v1.4
Automatic Synthesis: Scheduling
scheduling defines the operation sequencing
operations are bound to clock cycles
scheduling principles:
resource-limited: given a set of resources, a solution with minimal execution time has to be found
time-limited: given a total execution time, a minimal set of resources has to be found
JMM v1.4
Scheduling: Example
resource limited scheduling

each operation is bound to a clock cycle
solutions for minimal execution time
directed acyclic graphs can be used
a b c
+ *
cycle 1 y
d
- +
cycle 2
x2 z
+
cycle 3
x3
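Resource-limited scheduling of the RTL example can be sketched with a simple greedy list scheduler. This is a minimal illustration, not a production algorithm: the successive values of x are renamed x1, x2, x3 as in the diagram, and the data structures and function name are assumptions.

```python
# Operation -> (resource type, operands); x1, x2, x3 are the successive values of x.
OPS = {
    "x1": ("add", ["a", "b"]),
    "y": ("mul", ["a", "c"]),
    "z": ("add", ["x1", "d"]),
    "x2": ("sub", ["y", "d"]),
    "x3": ("add", ["x2", "c"]),
}
RESOURCES = {"add": 1, "mul": 1, "sub": 1}  # the allocation from the example
INPUTS = {"a", "b", "c", "d"}


def list_schedule(ops, resources, inputs):
    """Greedy resource-limited scheduling: one list of operations per clock cycle."""
    available = set(inputs)  # values computed in earlier cycles
    remaining = dict(ops)
    schedule = []
    while remaining:
        free = dict(resources)
        this_cycle = []
        for name, (kind, operands) in remaining.items():
            if free[kind] > 0 and all(v in available for v in operands):
                free[kind] -= 1
                this_cycle.append(name)
        if not this_cycle:
            raise ValueError("no operation can be scheduled")
        for name in this_cycle:
            del remaining[name]
        available.update(this_cycle)  # results become usable in the next cycle
        schedule.append(this_cycle)
    return schedule


for cycle, names in enumerate(list_schedule(OPS, RESOURCES, INPUTS), start=1):
    print("cycle", cycle, names)
```

With one adder, one multiplier and one subtractor the sketch needs three cycles, matching the diagram.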
JMM v1.4
Automatic Synthesis: Binding
binding phase: operations and memory accesses within the clock cycles are bound to the hardware resources
resources can be reused in different clock cycles
binding steps:
variables are bound to memory elements
operations are bound to functional blocks
interconnection elements are bound for data transfers (busses, multiplexers)
JMM v1.4
Binding: Example
variables are bound to memories
temporary variables x1 and x2 are not used simultaneously
b a c
d
cycle 1 + *
x1 y
cycle 2 + -
z x2
cycle 3 +
x3
JMM v1.4
Binding: Example cont.
reg
x1 x2 a
mux mult
b d
y
reg reg reg reg
mux mux
add sub
z, x1, x3 x2
JMM v1.4
Architecture Models
synthesis is based on the knowledge of a set of architecture models and design styles
design styles:
parallel or serial datapath
interrupt or polling control
memory access types (cache ...)
JMM v1.4
Architecture Models: Microarchitecture
microarchitecture components
functional units
adder, multiplier, comparator, ALU, etc.
memory elements
latch, flip-flop, register, register-file, RAM, ROM ...
interconnection units
bus, multiplexer
JMM v1.4
Architectural Models: Combinational Logic
combinational logic:
non-subdividable units
encoder, decoder, carry-lookahead adder ...
subdividable units
ripple-carry adder, selector, ALUs, ...
implementation forms
ROM (table lookup)
PLA structures (2-stage logic)
multistage logic
bit-slice, systolic array, etc.
JMM v1.4
Finite State Machines
finite state machines (FSM) are classical control structures
autonomous FSM
no inputs (image processing, ...)
non-autonomous FSM with inputs (general purpose)
Mealy machine (general)
Moore machine (restricted)
Medvedev machine (hazard free)
JMM v1.4
Control Unit / Data Path
FSMs are used for control unit tasks
datapaths are used as functional units
-> control unit - datapath model (FSMD model)
control inputs datapath inputs
FSM datapath
transfer transfer
logic logic status
functional
unit
state datapath
control
register
control outputs datapath outputs
JMM v1.4
System Architecture
FSMD is used as process on system level

system consists of a set of processes
hierarchical FSMD model
process synchronization is needed
D
Q process 1
control inputs databus
FSM datapath
transfer transfer
logic logic status
functional
unit
state datapath
control
clock1 register
control outputs
D
Q
control inputs
FSM datapath
transfer
transfer transfer
logic
logic logic status
functional
unit
state datapath
control
clock2 register
control outputs
process 2
JMM v1.4
Interprocess Communication
synchronous or asynchronous communications

no protocol, delay known
handshake protocol
process 1
request
acknowledge
data data valid
process 2
JMM v1.4
Implementation Constraints
behavioural modelling uses abstract models, which do not model reality precisely
-> implementation constraints / pitfalls
simultaneous deactivation of set and reset of latches
clock skew in shift registers leads to races between clock and data (two-phase clocking strategy)
Moore and Mealy FSMs have hazards
asynchronous inputs lead to undefined FSM states
-> never use:
gated clocks
combinatorial outputs for asynchronous inputs
asynchronous inputs as FSM inputs
JMM v1.4
Conclusions
description-synthesis method
system design with HDLs (parallel constructions, RTL level)
top-down and bottom-up design
abstract models are not precise
races, hazards, delays, signal strength, ...
silicon compiler does not exist
JMM v1.4
Coming Up...
Next time...
Hardware description languages
Reading
Weste:
Sections 6 thru 6.2.7 (design strategy)
6.4 thru 6.4.5 (design methods)
6.5 thru 6.5.4 (design capture tools)
Self study Weste:
6.6 thru 6.6.8 (design verification)
6.8 thru 6.9 (data sheets)
JMM v1.4
VLSI Design I
Design for Test
He’s dead Jim...
Overview
design for test architectures
ad-hoc, scan-based, built-in
Goal: You are familiar with testability metrics and you know ad-hoc test structures as well as scan-based test structures. Built-in test structures such as BILBO and boundary scan can be applied.
JMM v1.3
Design For Test
What can we do to increase testability?
increase observability
-> add more pins (?!)
-> add small “probe” bus, selectively enable different values onto bus
-> use a hash function to “compress” a sequence of values (e.g., the values of a bus over many clock cycles) into a small number of bits for later read-out
-> cheap read-out of all state information
increase controllability
-> use muxes to isolate sub-modules and select sources of test data as inputs
-> provide easy setup of internal state
Design strategies for test (design for testability):
ad-hoc testing
scan-based approaches
self-test and built-in testing
JMM v1.3
Ad-hoc testing #1
Ad-hoc test techniques are a collection of ideas aimed at reducing the test time. Common techniques are:
partitioning large sequential circuits
adding test points
adding multiplexers
providing for easy state access

[Figure: half-adder based counter stages (Q0..Q3, co0..co3) in three variants, with added load inputs and test multiplexers]
JMM v1.3
Ad-hoc testing #2
bus oriented test technique

bus
unit unit unit unit

1 2 3 4
multiplexer based testing

A inp B inp
Module Module
1 A B
A control 0
Module A Module B
1
0 B control
Module Module
0 1 1 0 A B
A out test1
test1 test2
test2 B out
Module A test: {test1,test2}={0,1}

JMM v1.3
Scan-based test techniques #1
Idea: have a mode in which all registers are chained into one giant shift register which can be loaded/read-out bit serially. Test remaining (combinational) logic by
(1) in “test” mode, shift in new values for all register bits thus setting up the inputs to the combinational logic
(2) clock the circuit once in “normal” mode, latching the outputs of the combinational logic back into the registers
(3) in “test” mode, shift out the values of all register bits and compare against expected results. One can shift in new test values at the same time (i.e., combine steps 1 and 3).
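The three steps can be mimicked with a small behavioural model. This is a sketch under simplifying assumptions: the scan chain is a Python list, the combinational logic is an ordinary function (here a hypothetical 3-bit incrementer), and one fault is injected by hand to show that the comparison catches it.

```python
def scan_test(comb_logic, pattern):
    """Apply one scan test: shift in, clock once in normal mode, shift out."""
    n = len(pattern)
    chain = [0] * n
    # (1) test mode: shift the pattern in, one bit per clock, last bit first
    for bit in reversed(pattern):
        chain = [bit] + chain[:-1]
    # (2) normal mode: one clock latches the combinational outputs
    chain = list(comb_logic(chain))
    # (3) test mode: shift the captured response out at the end of the chain
    shifted_out = []
    for _ in range(n):
        shifted_out.append(chain[-1])
        chain = [0] + chain[:-1]
    return shifted_out[::-1]  # reorder so index i is register i


def increment(bits):
    """Example logic under test: add 1 to the register value (bit 0 is the LSB)."""
    value = (sum(b << i for i, b in enumerate(bits)) + 1) % (1 << len(bits))
    return [(value >> i) & 1 for i in range(len(bits))]


def increment_stuck_at_0(bits):
    """Same logic with output bit 1 stuck at 0 (injected fault)."""
    out = increment(bits)
    out[1] = 0
    return out


pattern = [1, 0, 0]            # register value 1
expected = increment(pattern)  # fault-free response
print(scan_test(increment, pattern) == expected)             # good circuit passes
print(scan_test(increment_stuck_at_0, pattern) == expected)  # faulty circuit fails
```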
[Figure: scan register cells, each a D flip-flop with a normal/test multiplexer, chained from scan-in to scan-out around the combinational logic CL]
JMM v1.3
Scan-based test techniques #2
serial scan
[Figure: serial scan chain from scan-in to scan-out through the registers around CL1 and CL2]
Scan registers
partial serial scan: sometimes it is not area and speed efficient to implement scan in every location where a register is used (signal processing)
R1 CL R4
CL CL
R2 CL R5
R6
R3
JMM v1.3
Level sensitive scan design
A popular approach is the level sensitive scan
design technique from T.W. Williams (LSSD)
the circuit is level sensitive (steady state response is
independent of circuit and wire delays within a circuit):
hazard free
each register may be converted to a serial shift register
[Figure: LSSD registers reg A and reg B built from L1/L2 latch pairs (inputs D, C, I, A, B) around the combinational logic, with serial data in and serial data out; timing of c1, c2 and shift-clk for normal operation, for shifting data into reg A and for shifting reg B out]
JMM v1.3
Scan Elements
[Figure: gate-level LSSD latch pair L1/L2 (inputs D, C, I, A, B; outputs T1, T2) and a multiplexed scan flip-flop (inputs D, TI, TE, clk; output Q) with its clka/clkb implementation]
JMM v1.3
Self-Test Techniques: BILBO
Problem: Scan-based approach is great for testing combinational logic but can be impractical when trying to test memory blocks, etc. because of the number of separate test values required to get adequate fault coverage.
Solution: use on-chip circuitry to generate test data and check the results. Can be used at every power-on to verify correct operation!
[Figure: FSM A drives the circuit under test through a normal/test multiplexer; FSM B observes the outputs and signals "okay"]
FSM A: Generate pseudo-random data for most circuits by using, e.g., a linear feedback shift register (LFSR). Memory tests use more systematic FSMs to create ADDR and DATA patterns.
FSM B: For pseudo-random input data simply compute some hash of output values and compare against expected value (“signature”) at end of test. Memory data can be checked cycle-by-cycle.
JMM v1.3
Linear Feedback Shift Register (LFSR)
If Ci’s are not programmable, can eliminate
AND gates and some XOR gates...
[Figure: n D flip-flops in a shift chain; each output is ANDed with its coefficient c1 ... cn and the results are XORed into the feedback]
$$1 + c_1 x + c_2 x^2 + c_3 x^3 + \dots + c_{n-1} x^{n-1} + c_n x^n$$
with a small number of XOR gates the cycle time is very fast. Cycle through fixed sequence of states (can be as long as $2^n - 1$ for some n’s). Handy for large modulo-n counters.
different responses for different initial states
different responses for different ci
-> pseudo-random sequence generator (PRSG)
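A software model makes the state sequence easy to inspect. The sketch below assumes a particular wiring convention that the extracted figure does not fully pin down: stage 1 is the least significant bit, the register shifts towards stage n, and the XOR of all stages with c_i = 1 is fed into stage 1. The function name is hypothetical.

```python
def lfsr_states(taps, n, seed, steps):
    """Return `steps` successive states of an n-bit LFSR.

    taps: stage numbers i (1..n) whose coefficient c_i is 1.
    Assumed wiring: stage 1 is the LSB, the register shifts towards stage n,
    and the XOR of the tapped stages enters stage 1.
    """
    mask = (1 << n) - 1
    state = seed & mask
    states = []
    for _ in range(steps):
        states.append(state)
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
    return states


# Polynomial 1 + x + x^3 (c1 = c3 = 1), three stages, start value 1
print(lfsr_states(taps=(1, 3), n=3, seed=1, steps=8))
# -> [1, 3, 7, 6, 5, 2, 4, 1]
```

With this convention the 3-bit register visits all 2^3 - 1 = 7 non-zero states before repeating, and the printed sequence agrees with the result quoted for exercise vlsi22.1.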
JMM v1.3
Signature Analysis
signature analysis is used to compact a data stream into a so-called signature
different responses for different ci, many well-known CRC (cyclic redundancy check) polynomials correspond to a specific choice of ci’s.
serial in
=1 =1 =1 =1
&
&
&
&
&
....
c1 c2 c3 cn-1 cn
=1 D Q D Q D Q D Q D Q
clk clk clk clk clk
parallel in
=1 =1 =1
&
&
&
c1 c2 Cn-1
=1 D Q =1 D Q . . . . =1 D Q =1 D Q
clk clk clk clk
z1 q1 z2 q2 zn-1 qn-1 zn qn
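The serial-input variant can be sketched the same way: the incoming bit is XORed into the feedback, and the final register content is the signature. As before, the stage numbering and shift direction are assumptions, and the function name and the example bit stream are made up for illustration.

```python
def serial_signature(bits, taps, n):
    """Compact a serial bit stream into an n-bit signature.

    taps: stage numbers i (1..n) with c_i = 1; stage 1 is the LSB (assumed).
    The register starts from the all-zero state.
    """
    mask = (1 << n) - 1
    state = 0
    for bit in bits:
        feedback = bit
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        state = ((state << 1) | feedback) & mask
    return state


good = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical fault-free response stream
bad = list(good)
bad[3] ^= 1                      # one flipped bit models a detected fault

print(serial_signature(good, taps=(1, 3), n=3))
print(serial_signature(bad, taps=(1, 3), n=3))
```

Because the register is linear and its last coefficient is 1, a single flipped bit always changes the signature; longer error patterns can alias to the fault-free signature, which is why the polynomial and register length matter.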
JMM v1.3
LFSR Polynomials
polynomials for maximal-length sequences for n from 1 up to 32
n                      f(x)
1,2,3,4,6,7,15,22      1 + x + x^n
5,11,21,29             1 + x^2 + x^n
10,17,20,25,28,31      1 + x^3 + x^n
9                      1 + x^4 + x^n
23                     1 + x^5 + x^n
18                     1 + x^7 + x^n
8                      1 + x^2 + x^3 + x^4 + x^n
12                     1 + x + x^4 + x^6 + x^n
13                     1 + x + x^3 + x^4 + x^n
14,16                  1 + x^3 + x^4 + x^5 + x^n
19,27                  1 + x + x^2 + x^5 + x^n
24                     1 + x + x^2 + x^7 + x^n
26                     1 + x + x^2 + x^6 + x^n
30                     1 + x + x^2 + x^23 + x^n
32                     1 + x + x^2 + x^22 + x^n
examples of CRCs
n                      CRC
8                      1 + x + x^4 + x^5 + x^7 + x^8
16                     1 + x^2 + x^15 + x^16
JMM v1.3
BILBO #1
A very popular built-in test structure is the built-in logic block observation (BILBO) from Koenemann
BILBO operates in 4 different modes:
parallel register mode: both BILBOs in register mode
PRSG or signature analysis mode: one BILBO in PRSG mode, the other in signature analysis mode
scan mode: both BILBOs in scan mode
reset mode: both BILBOs in reset mode
[Figure: for each mode, the circuit (normal operation) sits between two BILBO registers operating as listed]
JMM v1.3
BILBO #2
example of a BILBO element with polynomial 1 + x + x^4
D0 D1 D2 D3
c1
c0 scan
out
&
&
&
&
scan =1 D =1 D =1 D =1 D
in 0 & Q & & &
Q Q Q
1 clk clk clk clk
=1
Q1 Q2 Q3 Q4
mode c1 c0 function
A 0 0 scan mode
B 1 0 reset
C 0 1 PRSG or signature analyzer
D 1 1 parallel registers
JMM v1.3
IDDQ Testing
[Figure: chip between VDD and GND with an A-meter measuring IDD]
Idea: CMOS logic should draw no current when it’s not switching. So after initializing circuit to eliminate tri-state fights, disable pseudo-NMOS gates, etc., the power-supply current should be zero after all signals have settled.
Good for detecting bridging faults (shorts).
May want to try several different circuit states to ensure all parts of the chip have been observed.
JMM v1.3
System-Level Test: Boundary Scan
The IEEE 1149.1 boundary scan architecture provides a standardized serial scan path through the I/O pins of a chip (also called JTAG)
at the board level, chips obeying the standard may be connected in a variety of series and parallel combinations for board testing (replacing bed of nails)
standardized tests:
connectivity tests between components
sampling and setting chip I/Os
distribution and collection of self-test or built-in-test results
[Figure: chips on a PCB with IO pads and boundary cells, linked by the PCB interconnect and by a serial test interconnect from serial data in to serial data out]
JMM v1.3
Boundary Scan: Test Access Port
The test access port (TAP) is a definition of the interface that needs to be included in an IC
TCK: test clock input
TMS: test mode select
TDI: test data input
TDO: test data output
TRST: optional signal to asynchronously reset the TAP
the test architecture

test data registers
0
TDI instruction decode TDO
1
instruction registers
clocks/control
TCK TAP
TMS controller
(TRST)
JMM v1.3
Boundary Scan: TAP controller
State machine for the TAP controller. TMS is the control signal.
[Figure: TAP controller state diagram with the states test-logic reset, run-test/idle, select-DR-scan, capture-DR, shift-DR, exit1-DR, pause-DR, exit2-DR, update-DR and the matching IR states select-IR-scan to update-IR; every edge is labelled with the TMS value 0 or 1]
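The TAP controller can be written down as a transition table indexed by the TMS value. The table below follows the 16-state diagram of IEEE 1149.1 as commonly published; the Python names are illustrative only.

```python
def _column(reg, after_select):
    """States of one scan column (DR or IR); each entry is (next if TMS=0, next if TMS=1)."""
    return {
        f"select-{reg}-scan": (f"capture-{reg}", after_select),
        f"capture-{reg}": (f"shift-{reg}", f"exit1-{reg}"),
        f"shift-{reg}": (f"shift-{reg}", f"exit1-{reg}"),
        f"exit1-{reg}": (f"pause-{reg}", f"update-{reg}"),
        f"pause-{reg}": (f"pause-{reg}", f"exit2-{reg}"),
        f"exit2-{reg}": (f"shift-{reg}", f"update-{reg}"),
        f"update-{reg}": ("run-test/idle", "select-DR-scan"),
    }


TAP_NEXT = {
    "test-logic-reset": ("run-test/idle", "test-logic-reset"),
    "run-test/idle": ("run-test/idle", "select-DR-scan"),
    **_column("DR", "select-IR-scan"),
    **_column("IR", "test-logic-reset"),
}


def tap_walk(state, tms_bits):
    """Follow a sequence of TMS values, one per rising TCK edge."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


print(tap_walk("test-logic-reset", [0, 1, 0, 0]))     # ends in shift-DR
print(tap_walk("test-logic-reset", [0, 1, 1, 0, 0]))  # ends in shift-IR
```

A useful property of this table is that holding TMS at 1 for five clocks reaches test-logic-reset from any state, so the controller can be synchronized even without TRST.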
JMM v1.3
Boundary-scan: IR
Instruction register (IR): minimum 2 bits
[Figure: IR bit cell with a shift flip-flop (multiplexer between data and the output of the last cell, selected by shiftIR, clocked by clockIR) feeding an update flip-flop (clocked by updateIR, reset via TRST); timing of shiftIR, clockIR and updateIR across the FSM states capture-IR, shift-IR, exit1-IR, pause-IR, exit2-IR, update-IR]
JMM v1.3
Boundary-scan: DR
TAP data register (DR)
[Figure: boundary scan registers, internal data register and bypass register (1 bit) in parallel between TDI and TDO]
boundary scan register is a special case of a data register. It allows circuit board interconnections to be tested, external components tested, and the state of the chip digital I/Os to be sampled. The boundary scan register is mandatory.
internal data registers are optional and add additional access to the circuit.
the bypass register is a 1 bit register used to bypass a whole chip.
JMM v1.3
Boundary-scan: DR
boundary scan input and output cells
mode
next cell
out 0
PAD 0 1 to chip
1 D Q D Q
shiftDR clk clk
last cell clockDR updateDR
mode
next cell
0 out
from chip 0 1 PAD
1 D Q D Q
shiftDR clk clk
last cell clockDR updateDR
boundary scan bi-

bi-directional cell
next cell
0
enable 0 1
1 D Q D Q
shiftDR clk clk
clockDR updateDR
0 bidir
from chip 0 1 PAD
1 D Q D Q
shiftDR clk clk
clockDR updateDR
0
0 1
1 D Q D Q
last cell shiftDR clk clk
clockDR updateDR
to chip
JMM v1.3
Boundary scan: instructions
Minimum 3 instructions
Bypass (all 1): it is used to bypass any serial data registers in a chip with a 1 bit register. This allows specific chips to be tested in a serial-scan chain without having to shift through the accumulated SR stages in all the chips
Extest (all 0): testing of off-chip circuitry
sample/preload: places the boundary scan registers (at the chip's I/O pins) in the DR chain, and samples or preloads the chip's I/Os
optional recommended instructions:
Intest: single-step testing of internal circuitry via the boundary scan registers
Runbist: run internal self-testing procedures within a chip
JMM v1.3
Coming Up...
Next time:
Top down design. Hardware description languages,
logic synthesis.
Readings …
Weste:
7.3 through 7.3.3.3 (ad-hoc & scan-based testing)
7.3.4 through 7.3.4.1 (BILBO)
7.3.5 (Iddq testing)
7.5 (boundary scan)
JMM v1.3
Exercises: VLSI-22
Ex vlsi22.1 (difficulty: easy): calculate the pseudo-random sequence of an LFSR with the implemented polynomial 1 + x + x^3. Use the start value x=1.
Result: 1,3,7,6,5,2,4,1,...
JMM v1.3
VLSI Design II
Small Signal FET Model
and Diode Models
Overview
small signal equivalent circuit for fet and diodes
advanced large signal fet modeling and second-order effects
Goal: You can use the small signal equivalent circuit of a diode and a MOS transistor. You are able to determine the parameters of a fet and have a feeling for a MOS fet. You are familiar with advanced modeling like weak inversion, short-channel effects and leakage.
JMM v1.3
Summary: Large Signal Model
MOS fets have 3 regions of operation
cutoff region (subthreshold): VGS <= Vth
linear region (triode region): VGS > Vth ; 0 < VDS < VDSsat
active region (saturated region): VGS > Vth ; VDSsat < VDS
cutoff (subthreshold):
$$I_{DS} = 0$$
linear region:
$$I_{DS} = \mu C_{ox} \frac{W}{L} \left[ (V_{GS} - V_{th}) V_{DS} - \frac{V_{DS}^2}{2} \right]$$
active region, including channel length modulation:
$$I_{DS(sat)} = \frac{\mu C_{ox}}{2} \frac{W}{L} (V_{GS} - V_{th})^2 \left[ 1 + \lambda (V_{DS} - V_{eff}) \right]$$
$$\lambda = \frac{k_{rds}}{2 L \sqrt{V_{DS} - V_{eff} + \Phi_0}}, \qquad k_{rds} = \sqrt{\frac{2 \varepsilon_{Si}}{q N_A}}$$
Body effect:
$$V_{th} = V_{th0} + \gamma \left( \sqrt{V_{SB} + 2\phi_F} - \sqrt{2\phi_F} \right)$$
$$V_{eff} = V_{GS} - V_{th}, \qquad \gamma = \frac{\sqrt{2 \varepsilon_{Si} q N_A}}{C_{ox}}, \qquad \phi_F = -V_T \ln\left( \frac{N_A}{n_i} \right)$$
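The three regions can be collected into one function for quick hand checks. This is a sketch only: the default parameter values are invented for illustration (they are not C05M-D process data), and VDSsat is taken equal to Veff, which is the usual square-law assumption.

```python
def drain_current(vgs, vds, vth=0.8, mu_cox=100e-6, w_over_l=10.0, lam=0.0):
    """Large-signal square-law model of an nfet (all default values are assumed).

    mu_cox is the product mu*Cox in A/V^2, lam the channel length modulation factor.
    """
    veff = vgs - vth
    if veff <= 0:
        return 0.0  # cutoff
    if vds < veff:  # linear (triode) region, assuming VDSsat = Veff
        return mu_cox * w_over_l * (veff * vds - vds ** 2 / 2)
    # active (saturated) region with channel length modulation
    return 0.5 * mu_cox * w_over_l * veff ** 2 * (1 + lam * (vds - veff))


for vds in (0.1, 0.5, 1.0, 2.0):
    print(vds, drain_current(vgs=1.3, vds=vds, lam=0.05))
```

At VDS = Veff both expressions give the same current, so the model is continuous at the boundary between the linear and the active region.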
JMM v1.3
Advanced Large Signal Modeling:
Cutoff or subthreshold region
Condition: VGS <= Vth
Channel is not inverted and therefore IDS = 0
A more precise definition, which is better suited for analog design, takes into account that the channel does not become inverted suddenly when the gate-source voltage is increased. Depending on the gate-source voltage, we define three regions of inversion:
weak inversion: Veff < -100mV
moderate inversion: -100mV < Veff < 100mV
strong inversion: Veff > 100mV
(some designers use 200mV instead)
weak inversion:
$$I_{DS} \cong I_{D0} \frac{W}{L} e^{q V_{GS} / (n k T)}, \qquad n \cong 1.5$$
[Figure: IDS versus UGS, quadratic in strong inversion; log IDS versus UGS, exponential in weak inversion]
JMM v1.3
Short Channel Effects
As device dimensions are scaled down, short-channel effects degrade the operation of MOS fets
mobility degradation: short channels and large electric fields provoke more electron collisions. Carrier velocity saturates as it is no longer proportional to the electric field:
$$\nu_d \cong \frac{\mu_n E}{1 + E / E_c}$$
$$I_D = \frac{\mu_n C_{ox}}{2 (1 + \theta V_{eff})} \frac{W}{L} V_{eff}^2, \qquad \theta = \frac{1}{L E_c}$$
$$R_{sx} \cong \frac{1}{\mu_n C_{ox} W E_c}$$
[Figure: Id versus UGS with U'GS and Rsx marked; annotation: where is the square law]
hot carrier effects, especially in nfets due to higher mobility: high velocity electrons can generate electron-hole pairs in drain to substrate: VG >> Vth, VD >> 0
-> reduced output impedance
[Figure: cross section with n+ regions illustrating punch-through current between drain and source]
JMM v1.3
Leakage Currents
An important second-order device limitation is the leakage current of the junctions (e.g. sample-and-hold time)
the intrinsic concentration is a strong function of temperature, so the leakage current is also strongly dependent on temperature (approx. doubles for every 11 °C)
leakage current of a reverse-biased junction:
$$I_{lk} \cong \frac{q A_j n_i}{2 \tau_0} x_d, \qquad \tau_0 \cong \frac{1}{2} (\tau_n + \tau_p), \qquad x_d = \sqrt{\frac{2 \varepsilon_{si}}{q N_A} (\Phi_0 + V_R)}$$
with junction area A_j and electron and hole lifetimes τ_n and τ_p
JMM v1.3
Small Signal Equivalent Circuits
Why do we love them?
Find Id of a transistor in active region when the gate is driven with a voltage source Vgs = V0 sin(ωt)
-> It is handy to use simple linear equations!
What are small signal parameters?
Instead of using nonlinear transistor curves, we determine the operating point and use the derivative at this point
[Figure: curve f(x) with its tangent at the operating point (x0, f(x0))]
Taylor:
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!} (x - x_0)^n$$
approximation (operating point term plus small signal term):
$$f(x) \approx f(x_0) + \frac{df(x_0)}{dx} (x - x_0)$$
– small signal parameters are denoted with small letters
– small signal parameters are very handy for building simple equivalent circuits
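A quick numerical check shows how good the linear approximation is for a small sine at the gate. The device values below are assumptions chosen for illustration, and a plain square-law current without channel length modulation is used.

```python
import math

K = 1e-3     # mu*Cox*W/L in A/V^2 (assumed value)
VTH = 0.8    # threshold voltage in V (assumed value)
VGS0 = 1.3   # operating point in V
V0 = 0.01    # small-signal amplitude in V


def drain_current(vgs):
    """Square-law current in the active region."""
    return 0.5 * K * (vgs - VTH) ** 2


ID0 = drain_current(VGS0)  # operating point current
GM = K * (VGS0 - VTH)      # derivative dId/dVgs at the operating point


def worst_case_error(n_points=100):
    """Largest deviation between exact and linearized current over one sine period."""
    worst = 0.0
    for i in range(n_points):
        vgs_ac = V0 * math.sin(2 * math.pi * i / n_points)
        exact = drain_current(VGS0 + vgs_ac)
        linear = ID0 + GM * vgs_ac
        worst = max(worst, abs(exact - linear))
    return worst


print("Id0 =", ID0, "A, gm =", GM, "A/V")
print("relative error of the linear model:", worst_case_error() / ID0)
```

For the square law the only neglected Taylor term is the quadratic one, so the error is bounded by 0.5*K*V0^2 and shrinks quadratically with the signal amplitude.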
JMM v1.3
Transconductance #1
The most important small signal component is the transconductance. A transconductance behaves like a voltage controlled current source. It describes the change of output current when the input voltage is varied.
gm main transconductance, describes the amplification of the drain current when a voltage is applied between gate and source.
gds transconductance, accounting for finite output impedance of transistor. Models channel length modulation effect, when drain to source voltage varies.
gs transconductance, describing how the output current depends on the source to substrate voltage (body effect).
JMM v1.3
Transconductance #2
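$$I_{DS(sat)} = \frac{\mu C_{ox}}{2} \frac{W}{L} (V_{GS} - V_{th})^2 \left[ 1 + \lambda (V_{DS} - V_{eff}) \right]$$
$$g_m = \frac{\partial I_D}{\partial V_{GS}} = \mu_n C_{ox} \frac{W}{L} V_{eff} = \frac{2 I_D}{V_{eff}} = \sqrt{2 \mu_n C_{ox} \frac{W}{L} I_D}$$
$$g_s = \frac{\partial I_D}{\partial V_{SB}} = \frac{\partial I_D}{\partial V_{tn}} \cdot \frac{\partial V_{tn}}{\partial V_{SB}} = \frac{\gamma \cdot g_m}{2 \sqrt{V_{SB} + 2\phi_F}}$$
(the negative sign is eliminated by changing the current direction in the equivalent circuit)
$$g_{ds} = \frac{\partial I_D}{\partial V_{DS}} = \frac{1}{r_{ds}} = \lambda I_{Dsat} \approx \lambda I_D$$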
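The three expressions for gm are algebraically identical, which a short calculation confirms. All numbers below are invented for illustration (not C05M-D data), channel length modulation is neglected in the operating point current, and phi_f is entered as a positive magnitude, which is an assumption about the sign convention.

```python
import math


def small_signal_params(mu_cox, w_over_l, veff, lam, gamma, vsb, phi_f):
    """Low-frequency small-signal parameters of an nfet in the active region.

    phi_f is taken as a positive magnitude here (assumption about the sign convention).
    """
    i_d = 0.5 * mu_cox * w_over_l * veff ** 2
    gm = mu_cox * w_over_l * veff
    return {
        "id": i_d,
        "gm": gm,
        "gm_from_id": 2 * i_d / veff,
        "gm_from_sqrt": math.sqrt(2 * mu_cox * w_over_l * i_d),
        "gs": gamma * gm / (2 * math.sqrt(vsb + 2 * phi_f)),
        "gds": lam * i_d,
        "rds": 1 / (lam * i_d),
    }


params = small_signal_params(mu_cox=100e-6, w_over_l=20, veff=0.25,
                             lam=0.05, gamma=0.5, vsb=0.5, phi_f=0.35)
for name, value in params.items():
    print(name, value)
```

Which form of gm is most convenient depends on what is known: Veff when sizing from a voltage budget, ID when sizing from a bias current.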
JMM v1.3
Small-Signal Modeling in the Active Region (Low Frequency)
the low-frequency model
id
vg vd
+ gmvgs gsvs rds
vgs
-
is
vs
Depending on the terminal voltages, and the relative size of
the parameters, some of the components may be ignored.
This helps to reduce the complexity of hand calculations.
the alternate low-frequency T model
[Figure: T model with rds between vd and vs, controlled current source is, and rs=1/gm in the source branch]
JMM v1.3
MOSFET Capacitance Estimation
in Active Region
The dynamic response of MOS systems strongly depends on
the parasitic capacitance associated with the MOS transistor.
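$$C_{gs} = W C_{ox} \left( \frac{2}{3} L + L_{ov} \right) = \frac{2}{3} W L C_{ox} + W C_{GS0}$$
$$C_{gd} = W L_{ov} C_{ox} = W C_{GD0}$$
$$C_{sb} = C'_{sb} + C_{s-sw}, \qquad C'_{sb} = (A_s + A_{ch}) C_{js}, \qquad C_{s-sw} = P_s C_{j-sw,s}$$
$$C_{db} = C'_{db} + C_{d-sw}, \qquad C'_{db} = A_d C_{jd}, \qquad C_{d-sw} = P_d C_{j-sw,d}$$
$$C_{jx} = \frac{C_{j0}}{\left( 1 + V_{XB} / \Phi_0 \right)^{M_j}}, \qquad C_{j-sw,x} = \frac{C_{j-sw0}}{\left( 1 + V_{XB} / \Phi_0 \right)^{M_{jsw}}}$$
[Figure: nfet cross section (poly gate, Al contacts, SiO2, n+ source and drain, p+ field implant, p- substrate) showing Cgs, Cgd, Cs-sw, Cd-sw, C'sb, C'db and Lov, for VGS > Vth, VDG > -Vth, VSB = 0, VB = 0]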
JMM v1.3
Small-Signal Modeling in the Active Region
the small signal model
Cgd id
vg vd
+ gmvgs gsvs rds
Cgs vgs
- Cdb
is
Csb vs
Gate capacitance Cgs is normally the largest parasitic cap of the fet.
The gate-drain overlap capacitance Cgd is normally small, but can play a role when the voltage gain is large (Miller effect).
Source capacitance Csb is normally the second largest capacitance, since it includes the channel bulk capacitance.
Drain capacitance Cdb is normally the smallest capacitance.
JMM v1.3
Small-Signal Modeling in the Triode region
a simplified triode-region model for small VDS
In the triode region a resistor modeling the conductance of the channel is normally sufficient.
$$I_{DS} = \mu C_{ox} \frac{W}{L} \left[ (V_{GS} - V_{th}) V_{DS} - \frac{1}{2} V_{DS}^2 \right]$$
$$\frac{1}{r_{ds}} = g_{ds} \cong \mu C_{ox} \frac{W}{L} V_{eff}$$
The accurate modeling of high frequency operation of a fet in triode region is nontrivial. We use a simplified model.
[Figure: simplified triode-region model with rds between vs and vd, Cgs and Cgd from vg, and Csb, Cdb at vs and vd]
$$C_{gs} = C_{gd} = \frac{1}{2} A_{ch} C_{ox} + W L_{ov} C_{ox} = \frac{1}{2} W L C_{ox} + W C_{GX0}$$
$$C_{xb} = \left( A_x + \frac{1}{2} A_{ch} \right) C_{jx} + P_x C_{j-sw,x}$$
JMM v1.3
Small-Signal Modeling in cut-off region
a simplified cut-off region model
[Figure: cut-off model with Cgs, Cgb and Cgd from vg, and Csb, Cdb at vs and vd]
As the channel has disappeared we have:
$$C_{gs} = C_{gd} = W L_{ov} C_{ox} = W C_{GX0}$$
but we also have a new capacitor:
$$C_{gb} = A_{ch} C_{ox}$$
The capacitors Csb and Cdb are smaller as the channel is not present:
$$C_{xb} = A_x C_{jx} + P_x C_{j-sw,x}$$
JMM v1.3
Diodes
anode cathode
p+/nwell diode anode
Note that the metal Al SiO2
contacts to the p+ n+
diode are connected n well cathode
to heavily doped p- substrate
region pn junction
cathode anode
anode
n+/pwell diode
Al SiO2
n+ p+
p well cathode
n- substrate
pn junction
Schottky diode anode cathode

anode
metal contacts to Al SiO2
lightly doped
n+
semiconductor n well cathode
forms a Schottky
p- substrate
diode
Schottky diode depletion region
JMM v1.3
Diode Modeling
If a diode is reverse-biased, current flow is extremely small and primarily due to thermally or optically generated carriers.
$$C_j = \frac{C_{j0}}{\left( 1 + V_R / \Phi_0 \right)^{M_j}}, \qquad \Phi_0 = V_T \ln\left( \frac{N_A N_D}{n_i^2} \right), \qquad C_{j0} = \sqrt{\frac{q \varepsilon_{si} N_D N_A}{2 \Phi_0 (N_A + N_D)}}$$
[Figure: p+/n junction with depletion region, electric field and depletion capacitance Cj]
Large-signal model for forward biased junction
$$I_D = I_S e^{V_D / V_T}, \qquad I_S \propto A_D \left( \frac{1}{N_A} + \frac{1}{N_D} \right)$$
$$C_T = C_d + A C_j, \qquad C_d = \tau_T \frac{I_D}{V_T} \quad \text{(diffusion capacitance)}$$
(Cd=0 for forward biased Schottky diodes)
Small-signal model for a forward-biased diode
$$\frac{1}{r_d} = \frac{dI_D}{dV_D} = \frac{I_D}{V_T}$$
[Figure: rd in parallel with Cj and Cd; Cd is dominant for large currents]
JMM v1.3
Coming Up...
Next time:
Basic current mirrors and single stage amplifiers.

Johns&Martin:
1 through 1.1 (pn junctions)
1.2 (mos transistor)
1.2 (advanced mos modeling)
CAD Exercises for next time…
Ex600: simulation of static behavior of nfet
Ex600a: output resistance and channel length
modulation
Ex600b: weak vs strong inversion
JMM v1.3
Exercises: VLSI-24 #1
Johns&Martin 1.1 pp7: Ex1.4 (difficulty: easy): Assuming process C05M-D. a) Calculate the total zero-bias depletion capacitance CT-j0 of a p+nwell diode with an area of 5µm times 5µm. Do not use the Spice parameter CJ. b) At 3V reverse-bias the capacitance Cj has to be calculated again.
Result: a) CT-j0=16.3fF, b) CT-j=8.98fF
Johns&Martin 1.1 pp10: Ex1.6 (difficulty: medium): Assuming process C05M-D and Mj=0.5 (use Spice parameter CJ). A reverse-biased p+nwell diode is charged from 0V to 3.3V through a 10kΩ resistor. Calculate the time to charge the diode to 2/3 of its end value.
Result: t66%=130ps (Johns: see eq. 1.36 pp10)
Johns&Martin 1.2 pp31: 1.9 (difficulty: easy): Assuming process C05M-D. a) Derive the low-frequency parameters for an nfet with W=10µm and L=0.5µm at Vgs=1.1V, Vds=Veff, Vsb=0.55V. b) What is the new value of rds if the drain-source voltage is increased by 0.55V?
Result: a) gm=0.98mA/V, gs=0.143mA/V, rds=208kΩ, b) rds=12.8kΩ ????
JMM v1.3
Exercises: VLSI-24 #2
C05M-D. Find the T-model parameter rs for the nfet for example 1.9a.
Result: rs=502Ω
C05M-D. Find the gds for the nfet for example 1.9 working in triode region with Vds near zero.
Result: gm=1.99mA/V, rds=502Ω
C05M-D. a) Find ID for an nfet with W=10µm, L=0.5µm and VGS=1.1V, VDS=Veff. b) Assuming λ remains constant, estimate the new value of ID if VDS is increased by 0.3V.
Result: a) ID=487µA, b) ID=513µA
JMM v1.3
Exercises: VLSI-24 #3
Ex600a: Johns&Martin 1.9 pp79: 1.8 (difficulty: easy): Assuming process C05M-D. Simulate a fet with W=10µm, L=2µm in its active region (VGS=2V) and measure the drain current at VDS1=2V and at VDS2=3V. Estimate the output impedance rds and the channel length modulation factor λ.
Result: rds=402kΩ, λ=0.006
Result:
JMM v1.3
Exercises: VLSI-24 #4
Ex vlsi24.1 (difficulty: easy): Assuming process C05M-D. Find the capacitances of an nfet as shown below in its active region for Vsb=1V, Vdb=2V.
Result: Cgs=3.86fF, Csb=3.09fF, Cdb=1.94fF, Cgd=0.41fF (see Johns&Martin pp35)
[Figure: nfet layout with dimensions 0.5µm, 0.6µm, 3µm and 0.6µm]
JMM v1.3
Exercises: VLSI-24 #5
Ex vlsi24.2 (difficulty: easy): Assume the transistors are designed with minimal dimensions using the 0.5µm Alcatel Mietec process. Use the λ rules to calculate the Cgs, Csb and Cdb capacitances for its active region. Compare the values with a single device fet.
Result: a) Cdb=26.6fF, Csb=49.1fF, Cgs=34.8fF (see Johns&Martin pp103ff)
[Figure: layout with junctions J1..J5 and gates Q1..Q4 between node 1 and node 2, dimension 27λ]
VLSI--24
Exercises: VLSI #6
Ex600b: (difficulty: easy, medium time): Assume the

W=10µm and
transistors are designed with W=10µ
L=2µm using the 0.5µ
L=2µ 0.5µm Alcatel Mietec process.
Simulate the fet with Spice in strong and weak
inversion. Visualize VGS vs IDS, sqrt IDS and log IDS
and identify the different regions and find ID0.
Result: compare with transparency #3
JMM v1.3
VLSI Design II
Basic Current Mirrors and
Single Stage Amplifiers
Goal: You know the properties of the different amplifier stages and are able to choose the one which is best suited for your application. You can determine the fet dimensions from a given circuit specification. You are familiar with current mirrors. You can apply two possible techniques for improving the output impedance. You know the resulting limitations on the output voltage swing.
JMM v1.2
Outline
Current mirrors
Single stage amplifiers with active loads
Johns&Martin
nodal analysis method
simple CMOS current mirror (chap 3.1)
common-source amplifier (chap 3.2)
source-follower or common drain amplifier (chap 3.3)
common gate amplifier (chap 3.4)
source degenerated current mirror (chap 3.5)
high-output-impedance current mirrors (chap 3.6)
cascode gain stage (chap 3.7)
Exercises
hand calculations
spice simulations
JMM v1.2
Simple CMOS Current Mirror
Used as bias current source
Used to multiply currents
Used as high output impedance
Q1 and Q2 have the same size
both transistors are in active region
$$I_{ds(sat)} = \mu_n C_{ox} \frac{W}{2L} (V_{gs} - V_t)^2$$
[Figure: current mirror Q1, Q2 with Iin, Iout, V1 and rout; Id versus Vds curve with linear and active regions]
$$V_{gs1} = V_{gs2} \rightarrow I_{in} = I_{out}$$
consider minimal output voltage
consider finite output impedance
JMM v1.2
Simple CMOS Current Mirror (Q1 model)
small signal model (low frequency)
id
vg vd
+ gmvgs gsvs rds
vgs
-
is
vs Iin Iout
small signal model for Q1 V1 rout
Q1 Q2
V1
iy
vg1
+ gm1vgs1 gs1vs1 rds1 +
vgs1 ~ vy
- -
v1
Q1 1/gm1=rs1
small signal model of
diode connected transistor
JMM v1.2
Simple CMOS Current Mirror (small signal analysis)
Small signal model of overall CMOS current mirror
Q1 Q2
ix
+ gm2vgs2 rds2 +
1/gm1 vgs2 ~ vx
- -
as there is no current through gm1 -> vgs2=0
ix
rds2 +
~ vx
-
rout of CMOS current mirror is:
rout = rds 2
JMM v1.2
Common Source Amplifier
the common source topology is the most popular gain stage, especially when high input impedance is required
a common use of simple current mirrors is in a single-stage amplifier with an active load
active loads represent high-impedance output loads without using high impedance resistors or large power supply voltages.
for a given supply voltage a larger gain can be achieved using active loads.
for example, if a 1MΩ load were required with a 100µA bias current, a 100µA x 1MΩ = 100V power supply would be necessary
active load
Q3 Q2
Vout
rout
Ibias Q1
Vin
common source
amplifier stage
JMM v1.2
Common Source Amplifier
it is assumed, that the bias current is such that
both transistors Q2 and Q3 are in active region.
Q3 Q2
Vout
rout
Ibias Q1
Vin
Q1 R2
Rin vout active load
vin ~
+ + gm1vgs1
vgs1
- - rds1 rds2
$$v_{gs1} = v_{in}$$
$$A_v = \frac{v_{out}}{v_{in}} = -g_{m1} R_2 = -g_{m1} (r_{ds1} \parallel r_{ds2})$$
JMM v1.2
Source-Follower or Common-Drain Amplifier
the common-drain amplifier is commonly used as a voltage buffer and is thus called source-follower
ideally the small signal voltage gain is close to unity
although the circuit has no voltage gain it does have a current gain
dc level of the output voltage is not the same as the dc level of the input voltage
note that the body effect is the major limitation on the small-signal gain
common-drain amplifier stage
Vin
Ibias Q1
Vout
Q3 Q2
active load
JMM v1.2
Source-Follower
Note that the voltage controlled current source that
models the body effect of the nfet has been
included
Vin
Ibias Q1
Vout
Q3 Q2
Q1
vd1
vin =vg1
+ gm1vgs1 gs1vs1 rds1
vgs1
-
vs1 vout=vs1
rds2
active load
JMM v1.2
Nodal Equation Methodology
In order to minimize circuit equation errors, a consistent methodology should be maintained when writing nodal equations:
the first term is always the node at which the currents are being summed: v_out
this node voltage is multiplied by the sum of all admittances connected to the node: v_out (g_ds1 + g_ds2)
the next negative terms are the adjacent node voltages, and each is multiplied by all connecting admittances: - v_d g_ds1
the last terms are any current sources, with a multiplying negative sign used if the current is shown to flow into the node: + g_s1 v_s1 - g_m1 v_gs1
Q1
vd1
vin =vg1
vgs1
-
vs1 vout=vs1
rds2
JMM v1.2
Source-Follower
(small signal analysis, cont.)
Q1
vd1
vin =vg1
vgs1
-
vs1 vout=vs1
rds2
v s1
v out ( gds1 + gds 2 ) + g s1v out − g m1 (v in − v out ) = 0
vout g m1
Av = =
vin g m1 + g s1 + g ds1 + g ds 2
gs1 is 5 to 10% of the value of gm1, gds1 and gds2 are

in the order of 1/10 of gs1
the body effect parameter gs1 is the major source of
the error causing the gain less than unity
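To see how strongly the body effect limits the gain, the following sketch evaluates the source-follower gain with and without gs1. The absolute value gm1 = 1mA/V is an assumption for illustration; the ratios gs1 = 0.1·gm1 and gds = 0.1·gs1 follow the rule of thumb stated above.

```python
def source_follower_gain(gm1, gs1, gds1, gds2):
    """Small-signal gain Av = gm1 / (gm1 + gs1 + gds1 + gds2)."""
    return gm1 / (gm1 + gs1 + gds1 + gds2)


gm1 = 1.0e-3          # A/V, assumed for illustration
gs1 = 0.10 * gm1      # body-effect conductance, upper end of the 5..10 % range
gds1 = gds2 = 0.1 * gs1

av = source_follower_gain(gm1, gs1, gds1, gds2)
av_no_body = source_follower_gain(gm1, 0.0, gds1, gds2)
print(f"with body effect:    Av = {av:.3f}")
print(f"without body effect: Av = {av_no_body:.3f}")
```

With these ratios the output conductances cost only about 2% of the gain, while gs1 costs roughly another 9%.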
Common-Gate Amplifier
Common-gate stage with active load is used when
relatively small input impedance is desired
Application examples: input impedance of 50Ω to
terminate a transmission line, or first stage of
amplifier to amplify current instead of voltage
[Figure: common-gate amplifier stage Q1 (gate at Vbias, input Vin at the source, input resistance rin) with active load Q2/Q3, and its small-signal model: source vin with RS, node vs1, controlled sources gm1·vgs1 and gs1·vs1, rds1, load RL (active load) at node vd1 = vout]
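Common-Gate Amplifier
[Figure: small-signal model of the common-gate stage: source vin with RS, node vs1 (input resistance rin), controlled sources gm1·vgs1 and gs1·vs1, rds1, load RL at node vd1 = vout]
$v_{s1} = -v_{gs1}$, thus the two controlled sources combine into $(g_{m1} + g_{s1})\,v_{s1}$
$R_L = r_{ds2}$ if only the active load is present
nodal analysis for nodes vout and vs1:
$A_v = \frac{v_{out}}{v_{in}} = \left[\frac{G_s}{G_s + \dfrac{g_{m1} + g_{s1} + g_{ds1}}{1 + g_{ds1}/G_L}}\right] \frac{g_{m1} + g_{s1} + g_{ds1}}{G_L + g_{ds1}}$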
Summary: Gain Stages
common source amplifier: gain stage with high
input impedance.
$A_v = \frac{v_{out}}{v_{in}} = -g_{m1}\,(r_{ds1} \parallel r_{ds2})$
common drain amplifier (source follower): used as
voltage buffer with small-signal voltage gain close
to 1, but can produce current gain.
$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1} + g_{s1} + g_{ds1} + g_{ds2}}$
common gate amplifier: used as gain stage when a
small input impedance is desired and can be used as
first stage of an amplifier designed to amplify
current rather than voltage.
$A_v = \frac{v_{out}}{v_{in}} = \left[\frac{G_s}{G_s + \dfrac{g_{m1} + g_{s1} + g_{ds1}}{1 + g_{ds1}/G_L}}\right] \frac{g_{m1} + g_{s1} + g_{ds1}}{G_L + g_{ds1}}$
Source-Degenerated Current Mirror
General consequences of finite output resistance:
deviation in large signal behavior
difficulties as active load
the output impedance of the basic 2 transistor current mirror can be increased by
degeneration resistors Rs
[Figure: current mirror Q1/Q2 with source resistors Rs (Iin, Iout, V1, rout) and its small-signal model: 1/gm1, gm2·vgs, gs·vs, rds2, Rs, test source vx with current ix]
impedance increase:
$r_{out} = \frac{v_x}{i_x} = r_{ds2}\,[1 + R_s\,(g_{m2} + g_{s2} + g_{ds2})]$
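The impedance increase can be tried out numerically. This sketch uses the data of the source-degenerated mirror exercise of this lecture (ID = 100µA, L = 2µm, W = 100µm, Rs = 5kΩ, γn = 0.45V^1/2, Vsb = 2V). Three things are assumptions made here, not statements of the slides: gm is scaled with sqrt(W) from the 0.45mA/V quoted for W = 10µm, gs is computed with the body-effect expression gs = gm·γ / (2·sqrt(Vsb + 2φF)), and 2φF = 0.7V.

```python
import math


def body_effect_gs(gm, gamma, v_sb, two_phi_f=0.7):
    """Body-effect conductance gs = gm * gamma / (2 * sqrt(Vsb + 2*phiF))."""
    return gm * gamma / (2.0 * math.sqrt(v_sb + two_phi_f))


def degenerated_rout(rds2, rs, gm2, gs2, gds2):
    """rout = rds2 * [1 + Rs * (gm2 + gs2 + gds2)]."""
    return rds2 * (1.0 + rs * (gm2 + gs2 + gds2))


rds2 = 88000 * 2.0 / 0.1                 # ohms, rule rds = 88000 * L[um] / ID[mA]
gm2 = 0.45e-3 * math.sqrt(100.0 / 10.0)  # assumed scaling of gm with sqrt(W)
gs2 = body_effect_gs(gm2, gamma=0.45, v_sb=2.0)
rout = degenerated_rout(rds2, rs=5e3, gm2=gm2, gs2=gs2, gds2=1.0 / rds2)
increase = rout / rds2
print(f"increase = {increase:.1f}, rout = {rout / 1e6:.1f} MOhm")
```

Under these assumptions the result lands close to the values quoted in the exercise (increase of about 9.1, rout of about 16MΩ).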
High-Output Impedance Current Mirrors
Cascode Current Mirror
the output impedance of a cascode current mirror is
increased by a factor 10 to 100 compared to a basic
current mirror
a disadvantage is the reduced output voltage swing
because transistors may enter triode region
[Figure: cascode current mirror Q1 to Q4 with Iin, Iout, Vout, rout]
$V_{out} > 2 V_{eff} + V_{tn}$
$r_{out} \cong r_{ds4}\, r_{ds2}\, g_{m4}$
Cascode Current Mirror
reduced output voltage swing
transistor in active region: $V_{ds} > V_{eff} = V_{gs} - V_{tn}$
all transistors have the same size and current Id:
$V_{gs} = V_{eff} + V_{tn}$, with $V_{eff} = \sqrt{\frac{2 I_d}{\mu_n C_{ox} (W/L)}}$
[Figure: cascode current mirror Q1 to Q4 with Iin, Iout, Vout, rout]
$V_{g3} = V_{gs1} + V_{gs3} = 2 V_{eff} + 2 V_{tn}$
$V_{ds2} = V_{g3} - V_{gs4} = V_{g3} - (V_{eff} + V_{tn}) = V_{eff} + V_{tn}$
$V_{out} > V_{ds2} + V_{eff} = 2 V_{eff} + V_{tn}$
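The chain of voltages above can be followed step by step in a few lines. All numbers in this sketch are assumptions for illustration (µnCox = 90µA/V², W/L = 5, Id = 100µA, Vtn = 0.8V); only the formulas come from the slide.

```python
import math


def effective_voltage(i_d, mu_cox, w_over_l):
    """Veff = sqrt(2 * Id / (mu * Cox * W/L)) for a transistor in the active region."""
    return math.sqrt(2.0 * i_d / (mu_cox * w_over_l))


def cascode_mirror_min_vout(v_eff, v_tn):
    """Minimum output voltage of the cascode mirror, following the slide step by step."""
    v_gs = v_eff + v_tn        # all four transistors have the same size and current
    v_g3 = 2.0 * v_gs          # Vg3 = Vgs1 + Vgs3
    v_ds2 = v_g3 - v_gs        # Vds2 = Vg3 - Vgs4
    return v_ds2 + v_eff       # Q4 needs at least Veff across it


v_tn = 0.8                     # V, assumed threshold voltage
v_eff = effective_voltage(i_d=100e-6, mu_cox=90e-6, w_over_l=5.0)
v_min = cascode_mirror_min_vout(v_eff, v_tn)
print(f"Veff = {v_eff:.3f} V, Vout(min) = {v_min:.3f} V")
```

The threshold voltage appears once in the result, which is the headroom the cascode costs compared with a simple mirror.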

Cascode Current Mirror (con't)
very high output impedance
[Figure: cascode current mirror Q1 to Q4 (Iin, Iout, Vout, rout) and its small-signal model with gm·vgs and gs·vs sources and rds for each transistor; annotations: vgs4 = -vs4, vs3 = 0V, no current]
$r_{out} = r_{ds4}\,[1 + r_{ds2}\,(g_{m4} + g_{s4} + g_{ds4})] \cong r_{ds2}\, r_{ds4}\, g_{m4}$

High-Output-Impedance Current Mirrors
Wilson Current Mirror
performance very similar to the cascode current
mirror, but with 1/2 of its output impedance
shunt-series feedback to increase output impedance
[Figure: Wilson current mirror Q1 to Q4 with Iin, Iout, rin, rout]
Q2 senses the output current and mirrors it to Id1.
Iin and Id1 must match precisely, otherwise Vg3 increases/decreases.
Cascode Gain Stage
cascode configuration for single-stage amplifiers is
commonly used in modern IC design
quite large gain for a single stage due to the large impedance
at the output
to enable the large gain, high-quality cascode current
mirrors at the output are necessary
large gain normally without any speed degradation
voltage across the input drive fet is limited,
minimizing short-channel effects in modern technologies
configuration: common-source-connected transistor
feeding into a common-gate-connected transistor
[Figure: telescopic cascode amplifier and folded-cascode amplifier; labels: n-channel, p-channel, common-gate, bias, Vin, Vbias, Q1, Q2, Ibias, Ibias2, CL, Vout; note: identical in/out dc level possible]
Cascode Gain Stage
telescopic cascode amplifier
[Figure: telescopic cascode stage (Q1, Q2, Ibias, Vbias, Vin, Vout, CL) and its small-signal model with gm·vgs and gs·vs sources, rds1, rds2, test source vx with current ix]
output impedance of the cascode stage:
$r_{d2} \cong g_{m2}\, r_{ds1}\, r_{ds2}$
for a high-impedance Ibias with $R_L \cong g_{m\text{-}p}\, r_{ds\text{-}p}^2$, and for gdsn = gdsp and gmn = gmp:
$r_{out} \cong \frac{g_m r_{ds}^2}{2}$
$A_v \cong -\frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$
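A short sketch of these two approximations, evaluated with the values of the telescopic cascode exercise of this lecture (gm = 0.5mA/V, rds = 100kΩ). It assumes, as on the slide, equal gm and gds for the n- and p-channel devices; the function names are illustrative.

```python
def cascode_rout(gm, rds):
    """Output resistance rout ~= gm * rds^2 / 2 (cascode loaded by a cascode mirror)."""
    return gm * rds ** 2 / 2.0


def cascode_gain(gm, rds):
    """Low-frequency gain Av ~= -(1/2) * (gm / gds)^2 with gds = 1 / rds."""
    return -0.5 * (gm * rds) ** 2


gm = 0.5e-3     # A/V
rds = 100e3     # ohms
rout = cascode_rout(gm, rds)
av = cascode_gain(gm, rds)
print(f"rout = {rout / 1e6:.1f} MOhm, Av = {av:.0f}")
```

The gain is simply -gm·rout, so both numbers move together when rds changes.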
Summary: Cascode and Source Deg.
source degenerated current mirror: by adding a
resistor RS at the source node of a current mirror
fet, the output impedance can be increased (neglecting gds2):
$r_{out} = \frac{v_{out}}{i_{out}} \cong r_{ds2}\,[1 + R_s\,(g_{m2} + g_{s2})]$
cascode current mirror: the output impedance of a
current mirror can further be increased by using
cascode fets:
$r_{out} = \frac{v_{out}}{i_{out}} \cong r_{ds2}\, r_{ds4}\, g_{m4}$
$V_{out} \geq 2 V_{eff} + V_{tn}$
cascode gain stage: due to the large impedance at
the output, high gain can be realized with cascode
gain stages:
$A_v = \frac{v_{out}}{v_{in}} \cong -\frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$
Coming Up...
Next topic…
Frequency response of single stage amplifiers

Johns&Martin:
nodal analysis method
simple CMOS current mirror (chap 3.1)
common-source amplifier (chap 3.2)
source-follower or common-drain amplifier (chap 3.3)
common-gate amplifier (chap 3.4)
source-degenerated current mirror (chap 3.5)
high-output-impedance current mirrors (chap 3.6)
cascode gain stage (chap 3.7)
Exercises:
Have a look at the exercises in Johns&Martin.
CAD exercise Ex601
VLSI-25
Exercises VLSI #1
Johns&Martin chap 3.1 pp127: 3.1 (difficulty: easy):
Consider the current mirror shown on transparency
vlsi25/3 where Iin=100µA and each transistor has
W=10µm and L=2µm. Given rds=88000 [L (µm)]/[ID (mA)],
find rout for the current mirror
and the value of gm1. Also estimate the change in Iout
for a 0.5V change in the output voltage.
Result: rout=1.76MΩ, gm1=0.45mA/V, dIout=0.28µA

Consider the common source stage shown on
transparency vlsi25/7 where Iin=100µA and all
transistors have W=10µm and L=2µm. Given
rdsn=88000 [L (µm)]/[ID (mA)], rdsp=50000 [L (µm)]/[ID (mA)].
What is the gain of the stage?
Result: Av=-287
VLSI-25
Exercises VLSI #2
Consider the source follower shown on transparency
vlsi25/8 where Ibias=100µA and all transistors
designed with Alcatel 0.5µm process have
W=10µm and L=2µm. Given γn=0.45V^1/2,
Vsb=2V, and rds-n=88000 [L (µm)]/[ID (mA)].
What is the gain of the stage?
Result: Av=0.88

Consider the source degenerated current mirror
shown on transparency vlsi25/15 where
Ibias=100µA and all transistors designed with
Alcatel 0.5µm process have W=100µm and
L=2µm. Given γn=0.45V^1/2, Vsb=2V, Rs=5kΩ,
and rds-n=88000 [L (µm)]/[ID (mA)]. What is
the increase in output resistance compared to a simple
current mirror?
Result: increase=9.1, rout=16MΩ
VLSI-25
Exercises VLSI #3
Consider the cascode current mirror shown on
transparency vlsi25/15 where Iin=100µA and all
transistors have W=10µm and L=2µm. Given
VSB4=1V and rds-n=50000 [L (µm)]/[ID (mA)].
What is the output impedance and the minimal
output voltage?
Result: rout=527MΩ, Vout(min)=1.5V

Consider the telescopic cascode gain stage shown on
transparency vlsi25/20 assuming gm=0.5mA/V
and rds=100kΩ. What is the output impedance and
gain?
Result: rout=2.5MΩ, Av=-1250
VLSI Design II
Frequency Response of
Single Stage Amplifiers
[Figure: gain magnitude in dB (0 to 40) versus frequency, 10^3 to 10^9 Hz]
Circuit Analysis
the precise way: solving complex equations
the approximate way: find the dominant pole
the handy way: let Spice do it precisely
Goal: You are able to identify the dominant pole in a

transistor circuit. You can approximately determine
the contribution of each node in a circuit to the
total frequency response.
MicroLab, vlsi26 (1/29)
Outline
Frequency response
common-source amplifier
source-follower amplifier
source-follower amplifier with compensation technique
cascode gain stage
Johns&Martin
frequency response (chap 3.11)
Gray&Meyer
estimation of dominant poles
zero-value time constant analysis (pp500 ff)
(Analysis and Design of Analog Integrated Circuits, 3rd
edition, Wiley and Sons, ISBN-0471-59984-0)
Exercises
hand calculations
spice simulations
Frequency Response
Dominant Pole Approximation
precise calculation of frequency response is a
complex task and thus different approximation
methods exist
one method is the zero-value time constant analysis
first some ideas about dominant-pole approximation
are developed
transfer function by small-signal analysis:
$A(s) = \frac{N(s)}{D(s)} = \frac{a_0 + a_1 s + a_2 s^2 + \cdots + a_m s^m}{1 + b_1 s + b_2 s^2 + \cdots + b_n s^n}$
very often the zeros are unimportant, thus
$A(s) = \frac{K}{\left(1 - \frac{s}{p_1}\right)\left(1 - \frac{s}{p_2}\right) \cdots \left(1 - \frac{s}{p_n}\right)}$
where K is a constant and p1, p2, ... are the poles of the transfer function,
thus $b_1 = \sum_{i=1}^{n} \left(-\frac{1}{p_i}\right)$
(con't 2)
$b_1 = \sum_{i=1}^{n} \left(-\frac{1}{p_i}\right)$
an important practical case occurs when one pole is dominant:
$|p_1| \ll |p_2|, |p_3|, \ldots$ so that $\left|\frac{1}{p_1}\right| \gg \left|\sum_{i=2}^{n} \left(-\frac{1}{p_i}\right)\right|$
thus $b_1 \cong \left|\frac{1}{p_1}\right|$
the gain magnitude in the frequency domain is
$|A(j\omega)| = \frac{K}{\sqrt{\left[1 + \left(\frac{\omega}{p_1}\right)^2\right]\left[1 + \left(\frac{\omega}{p_2}\right)^2\right] \cdots \left[1 + \left(\frac{\omega}{p_n}\right)^2\right]}}$
with a dominant pole we simply get
$|A(j\omega)| \cong \frac{K}{\sqrt{1 + \left(\frac{\omega}{p_1}\right)^2}}$
(con't 3)
this approximation is quite accurate at least up to ω ≅ |p1|
thus for a dominant pole situation the -3dB frequency is
$\omega_{-3dB} \cong |p_1|$, i.e. $\omega_{-3dB} \cong \frac{1}{b_1}$
[Figure: pole plot in the s plane (axes σ, jω) for a circuit with a dominant pole; poles p3, p2, p1 along the σ axis]
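How good the estimate ω-3dB ≅ 1/b1 is can be checked against the exact magnitude expression. The sketch below uses three real poles chosen purely for illustration (one dominant pole at 10^4 rad/s, two far away) and finds the exact -3dB frequency by bisection.

```python
import math


def gain_magnitude(omega, k, poles):
    """|A(jw)| = K / sqrt(prod(1 + (w / p_i)^2)) for real poles p_i."""
    mag = k
    for p in poles:
        mag /= math.sqrt(1.0 + (omega / p) ** 2)
    return mag


def exact_3db(k, poles, steps=200):
    """Bisection for the frequency where |A| has dropped to K / sqrt(2)."""
    target = k / math.sqrt(2.0)
    lo, hi = 0.0, min(abs(p) for p in poles)   # the -3dB point cannot exceed |p1|
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if gain_magnitude(mid, k, poles) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


poles = [-1.0e4, -1.0e6, -5.0e6]        # rad/s, assumed example values
b1 = sum(-1.0 / p for p in poles)       # first-order denominator coefficient
w_approx = 1.0 / b1
w_exact = exact_3db(1.0, poles)
print(f"1/b1 = {w_approx:.1f} rad/s, exact = {w_exact:.1f} rad/s")
```

For widely separated poles the two values differ by about one percent; the estimate degrades as the second pole moves closer to the first.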
Zero-Value Time Constant
Method for finding the time constant associated
with a capacitor in the small signal equivalent
circuit
replace the capacitor Cx by a voltage source Vx
set all independent sources to zero
set all other network capacitors to zero
find the admittance Yx (=1/Rx) which is driven by the
voltage source Vx
the time constant τx is given by:
$\tau_x = R_x C_x$
Frequency Response
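Zero-Value Time Constant
[Figure: single-transistor amplifier with source resistance Rin, load RL and capacitor Cx, and its small-signal equivalent circuit (rb, rπ, Cπ, Cµ, gm·v1, RL); v1, v2, v3 and i1, i2, i3 are the voltages and currents at the Cπ, Cµ and Cx node pairs]
We can show that with this choice of variables the circuit equations are of the form:
$i_1 = (g_{11} + sC_\pi)\,v_1 + g_{12} v_2 + g_{13} v_3$
$i_2 = g_{21} v_1 + (g_{22} + sC_\mu)\,v_2 + g_{23} v_3$
$i_3 = g_{31} v_1 + g_{32} v_2 + (g_{33} + sC_x)\,v_3$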
Zero-Value Time Constant
(con't 1)
The poles of the transfer function are the zeros of the determinant ∆ of the
circuit equations, which can be written in the form:
$\Delta(s) = K_0 + K_1 s + K_2 s^2 + K_3 s^3$
$\Delta(s) = K_0\,(1 + b_1 s + b_2 s^2 + b_3 s^3)$
If all capacitors are zero:
$K_0 = \Delta|_{C_\pi = C_\mu = C_x = 0} \equiv \Delta_0$
Consider now the term K1s. This is the sum of the terms involving s that are
obtained when the system determinant is evaluated. However, it is apparent
that s only occurs when associated with a capacitance:
$K_1 s = h_1 sC_\pi + h_2 sC_\mu + h_3 sC_x$
The h terms are constants. h1 can be evaluated by expanding the determinant
about the first row:
$\Delta(s) = (g_{11} + sC_\pi)\,\Delta_{11} + g_{12} \Delta_{12} + g_{13} \Delta_{13}$
with cofactors ∆xx of the determinant. The coefficient of sCπ is found by evaluating
∆11 with Cµ and Cx equal to zero:
$h_1 = \Delta_{11}|_{C_\mu = C_x = 0}$
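Zero-Value Time Constant
(con't 2)
Now consider expansion of the determinant about the second row:
$\Delta(s) = g_{21} \Delta_{21} + (g_{22} + sC_\mu)\,\Delta_{22} + g_{23} \Delta_{23}$
with cofactors ∆xx of the determinant. The coefficient of sCµ is found by evaluating
∆22 with Cπ and Cx equal to zero:
$h_2 = \Delta_{22}|_{C_\pi = C_x = 0}$
similarly
$h_3 = \Delta_{33}|_{C_\mu = C_\pi = 0}$
Combining these equations gives:
$K_1 = \Delta_{11}|_{C_\mu = C_x = 0}\, C_\pi + \Delta_{22}|_{C_\pi = C_x = 0}\, C_\mu + \Delta_{33}|_{C_\mu = C_\pi = 0}\, C_x$
and:
$b_1 = \frac{K_1}{K_0} = \frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0} C_\pi + \frac{\Delta_{22}|_{C_\pi = C_x = 0}}{\Delta_0} C_\mu + \frac{\Delta_{33}|_{C_\mu = C_\pi = 0}}{\Delta_0} C_x$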
Zero-Value Time Constant
(con't 3)
Now consider putting i2=i3=0 and solving for v1:
$\frac{v_1}{i_1} = \frac{\Delta_{11}}{\Delta(s)}$
The driving-point resistance at the Cπ node pair with all capacitors
equal to zero is:
$\frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0} = \left.\frac{\Delta_{11}}{\Delta}\right|_{C_\mu = C_\pi = C_x = 0}$
We now define
$R_{\pi 0} = \frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0}$
We can now write:
$b_1 = R_{\pi 0} C_\pi + R_{\mu 0} C_\mu + R_{x0} C_x$
Thus:
$\omega_{-3dB} \cong \frac{1}{b_1} \cong \frac{1}{\sum T_0}$
Thus the sum of the zero-value time constants leads to the -3dB frequency
Summary: Frequency Analysis Methods
The precise way:
Add the parasitic capacitors to the equivalent circuit. Use
nodal analysis for evaluating the transfer function.
The approximate way:
if there exists a pole |p1| << |p2|, |p3|, ..., and the transfer
function is already given in the form
A(s)=N(s)/D(s)
with $D(s) = 1 + b_1 s + b_2 s^2 + \cdots + b_n s^n$
the pole p1 is given by: $|p_1| \cong 1/b_1$
the dominant pole may be found directly in the circuit
diagram by looking for the node with the largest
impedance. Take care of the Miller effect.
The time constant (and its influence on the frequency
response) associated with a single parasitic capacitor can
be estimated with the zero-value time constant method:
set all independent sources to zero
replace the capacitor of interest Cx by a voltage source Vx
set all other capacitors to zero
evaluate the resistance Rx seen by the voltage source Vx
the time constant is equal to CxRx
The handy way:
AC analysis with Spice
Frequency Response
Common-Source Amplifier
precise calculation of frequency response is most
often left to computer simulations
much insight can be obtained by finding the
dominant frequency effects (dominant poles, zeros)
[Figure: small-signal model of the common-source amplifier: vin, Rin, node v1 with Cgs1 (vgs1), Cgd1 between v1 and vout, gm1·vgs1, R2 (rds of Q1 and Q2), C2 (Cdb of Q1 and Q2 and load CL)]
nodal analysis ...
Frequency Analysis (con't)
$\frac{v_{out}}{v_{in}} = \frac{-g_{m1} R_2 \left(1 - s\,\frac{C_{gd1}}{g_{m1}}\right)}{1 + s a + s^2 b}$
$a = R_{in}\,[C_{gs1} + C_{gd1}(1 + g_{m1} R_2)] + R_2\,(C_{gd1} + C_2)$
$b = R_{in} R_2\,(C_{gd1} C_{gs1} + C_{gs1} C_2 + C_{gd1} C_2)$
at frequencies where the gain has just started to decrease:
$\omega_{-3dB} = \frac{1}{a} = \frac{1}{R_{in}\,[C_{gs1} + C_{gd1}(1 + g_{m1} R_2)] + R_2\,(C_{gd1} + C_2)}$
the term $C_{gd1}(1 + g_{m1} R_2)$ is the Miller capacitance, which dominates for Rin >> R2
analysis for high frequencies, for widely separated poles:
$D(s) = \left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right) \cong 1 + \frac{s}{\omega_{p1}} + \frac{s^2}{\omega_{p1}\,\omega_{p2}}$
$\omega_{p2} \cong \frac{g_{m1} C_{gd1}}{C_{gs1} C_{gd1} + C_{gs1} C_2 + C_{gd1} C_2}$
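The -3dB estimate ω-3dB = 1/a can be evaluated with the data of the common-source exercise of this lecture (ID = 100µA, W/L = 100/1.6, Rin = 180kΩ, CL = 0.3pF, Cgs1 = 0.2pF, Cgd1 = 15fF, Cdb1 = 20fF, Cdb2 = 36fF). The sketch assumes the square-law expression gm1 = sqrt(2·µnCox·(W/L)·ID) and that the rds rule returns ohms; variable names are illustrative.

```python
import math


def parallel(r_a, r_b):
    """Resistance of two resistors in parallel."""
    return r_a * r_b / (r_a + r_b)


i_d = 100e-6                 # A
w_over_l = 100.0 / 1.6
mu_n_cox = 90e-6             # A/V^2
r_in = 180e3                 # ohms
c_gs1, c_gd1 = 0.2e-12, 15e-15
c_2 = 0.3e-12 + 20e-15 + 36e-15          # CL + Cdb1 + Cdb2

gm1 = math.sqrt(2.0 * mu_n_cox * w_over_l * i_d)
rds_n = 8000 * 1.6 / 0.1                 # rule rds = 8000 * L[um] / ID[mA]
rds_p = 12000 * 1.6 / 0.1
r_2 = parallel(rds_n, rds_p)

miller = c_gd1 * (1.0 + gm1 * r_2)       # Cgd1 multiplied by (1 + gain)
a = r_in * (c_gs1 + miller) + r_2 * (c_gd1 + c_2)
f_3db = 1.0 / (2.0 * math.pi * a)
print(f"gm1 = {gm1 * 1e3:.2f} mA/V, R2 = {r_2 / 1e3:.1f} kOhm, f-3dB = {f_3db / 1e3:.0f} kHz")
```

The Miller term is about 1.2pF here, six times Cgs1, which is why the input node sets the bandwidth.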
Frequency Response
Source-Follower Amplifier
source followers can have complex poles and thus
exhibit overshoot
a compensation technique resulting in only real axis
poles is shown, resulting in no overshooting
[Figure: source follower Q1 driven by a current source Iin with Rin and Cin, biased by Ibias and loaded by CL, and its small-signal model: iin, Rin, Cin, Cgd1, Cgs1, vgs1, vd1, vs1 = vout, rds2, Cs with Cs = CL + Csb1]
Source-Follower Amplifier
(con't 1)
[Figure: simplified small-signal model: iin, Rin, C'in, node vg1 with admittance Yg looking into the gate, Cgs1, gm1·vgs1, node vs1 = vout with Rs1 and Cs]
$C'_{in} = C_{in} + C_{gd1}$
$R_{s1} = r_{ds1} \parallel r_{ds2} \parallel (1/g_{s1})$
1. gain from vg1 to vout is found
2. admittance Yg looking into the gate of Q1 without considering Cgd1 is found
3. gain from iin to vg1 is found
4. overall gain from iin to vout is found and the results are interpreted
$v_{out}\,(sC_s + sC_{gs1} + G_{s1}) - v_{g1}\, sC_{gs1} - g_{m1}\,(v_{g1} - v_{out}) = 0$
$\frac{v_{out}}{v_{g1}} = \frac{sC_{gs1} + g_{m1}}{s\,(C_{gs1} + C_s) + g_{m1} + G_{s1}}$
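Source-Follower Amplifier
(con't 2)
$Y_g = \frac{i_{g1}}{v_{g1}} = \frac{sC_{gs1}\,(sC_s + G_{s1})}{s\,(C_{gs1} + C_s) + g_{m1} + G_{s1}}$
$\frac{v_{g1}}{i_{in}} = \frac{s\,(C_{gs1} + C_s) + g_{m1} + G_{s1}}{a + sb + s^2 c}$
$A(s) = \frac{v_{out}}{i_{in}} = \frac{sC_{gs1} + g_{m1}}{a + sb + s^2 c}$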
Source-Follower Amplifier
(con't 3)
$A(s) = A(0)\,\frac{N(s)}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$
ω0 is the pole frequency
Q is the Q factor
There is no peaking and the transfer function's maximum is at dc if:
$Q < 1/\sqrt{2} \cong 0.707$
ω0 is the -3dB frequency if: $Q = 1/\sqrt{2}$
Step input function:
no peaking for Q ≤ 0.5
peaking for Q > 0.5 (complex conjugate poles):
$\%\,\text{overshoot} = 100\, e^{-\pi / \sqrt{4Q^2 - 1}}$
For the source follower:
$\omega_z = \frac{-g_{m1}}{C_{gs1}}$
$\omega_0 = \sqrt{\frac{G_{in}\,(g_{m1} + G_{s1})}{C_{gs1} C_s + C'_{in}\,(C_{gs1} + C_s)}}$
$Q = \frac{\sqrt{G_{in}\,(g_{m1} + G_{s1})\,[C_{gs1} C_s + C'_{in}\,(C_{gs1} + C_s)]}}{G_{in} C_s + C'_{in}\,(g_{m1} + G_{s1}) + C_{gs1} G_{s1}}$
Source follower circuits can exhibit large amounts of overshoot under certain
conditions. In practical microelectronic circuits the parasitic capacitances and the output
capacitance result in only moderate overshoot for worst-case conditions.
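The overshoot expression is easy to tabulate. This is a minimal sketch of the step-response rule stated above (no overshoot for Q ≤ 0.5, otherwise 100·exp(-π / sqrt(4Q² - 1))); the function name and the list of Q values are illustrative.

```python
import math


def percent_overshoot(q):
    """Step-response overshoot of a second-order system with quality factor q."""
    if q <= 0.5:
        return 0.0          # real poles: no overshoot
    return 100.0 * math.exp(-math.pi / math.sqrt(4.0 * q * q - 1.0))


for q in (0.5, 1.0 / math.sqrt(2.0), 0.8, 1.0):
    print(f"Q = {q:.3f}: overshoot = {percent_overshoot(q):.2f} %")
```

Q = 0.8 gives roughly 8.1%, the value quoted in the source-follower exercise of this lecture.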
Source-Follower Amplifier
Compensation Technique
source followers can have complex poles and thus
exhibit overshoot
overshooting may be reduced by:
increasing Cin
or Cs or both
adding a compensation network
[Figure: source follower Q1 with input Iin, Rin, Cin, compensation network C1, R1, bias Ibias and load CL]
$C_1 = \frac{C_{gs1}\,(C_s g_{m1} - C_{gs1} G_{s1})}{(g_{m1} + G_{s1})(C_{gs1} + C_s)} \cong \frac{g_{m1} C_{gs1} C_s}{(g_{m1} + G_{s1})(C_{gs1} + C_s)}$
$R_1 = \frac{(C_{gs1} + C_s)^2}{C_{gs1}\,(C_s g_{m1} - C_{gs1} G_{s1})} \cong \frac{(C_{gs1} + C_s)^2}{C_{gs1} C_s g_{m1}}$
$C_2 = \frac{C_{gs1} C_s}{C_{gs1} + C_s}$
(see Johns/Martin pp160-162)
Frequency Response
Common-Gate Amplifier
The frequency response of the common-gate stage
is usually superior to that of the common-source
stage due to the low impedance, rin, at the source
node, assuming GL=(sCL+gds2) is not considerably
smaller than gds1.
[Figure: common-gate stage Q1 with Ibias, Vbias, load CL and output vout]
(see Johns/Martin pp160-162)
Frequency Response
High-Output Impedance Mirrors
Both the Wilson and the cascode current mirrors
introduce high-frequency poles into the signal
transfer function.
The approximate time constant of these poles is
Cgs/gm; the proof of this statement can be found by
doing a high-frequency, small-signal analysis.
[Figure: cascode current mirror (Q1 to Q4, Iin, Iout, Vout, rout) and Wilson current mirror (Q1 to Q4, Iin, Iout, rin, rout)]
(see Johns/Martin pp163)

Frequency Response
Cascode Gain Stage
The exact high-frequency analysis of a cascode gain
stage is usually left to simulation on a computer.
at high frequencies, the time constant due to the
output node almost always dominates since the
impedance is so large at that node:
Cout=(Cgd2+Cdb2)+CL+Cbias
CL is normally the major contributor
[Figure: telescopic cascode stage Q1, Q2 with Ibias, Vbias, Vin, Vout, CL]
$\omega_{-3dB} \cong \frac{1}{R_{out} C_L} \cong \frac{2 g_{ds}^2}{g_m C_L}$
Cascode Gain Stage
(con't 1)
Zero-value time constant analysis method used
$C_{d2} = C_{gd2} + C_{db2} + C_L + C_{bias}$
$C_{s2} = C_{db1} + C_{sb2} + C_{gs2}$
[Figure: telescopic cascode stage (Q1, Q2, Ibias, Vbias, CL) and its small-signal model: vin, node vg1 with Cgs1 and Cgd1, gm1·vg1, rds1, node vs2 with Cs2, gm2·vs2, rds2, node vout with Cd2 and GL]
All independent sources have to be set to zero
(vin=0)
node vg1: $\tau_{Cgs1} = C_{gs1} R_{in}$
Cascode Gain Stage
(con't 2)
nodes vg1, vs2: the capacitor Cgd1 is replaced by a voltage source vx in order
to calculate the resistance seen from that node.
[Figure: equivalent circuit with test source vx (current ix) between vg1 and vs2, Rin, gm1·vg1 and Rd1]
$G_{d1} = g_{ds1} + Y_{s2}$
where Ys2 is the admittance looking into the source of the cascode transistor
$\tau_{Cgd1} = C_{gd1} R_{d1}\,(1 + R_{in}\,[G_{d1} + g_{m1}])$
Cascode Gain Stage
(con't 3)
$G_{d1} = g_{ds1} + Y_{s2}$
where Ys2 = is/vs2 is the admittance looking into the source of the cascode transistor
[Figure: small-signal model of the cascode transistor: vs2, gm2·vs2, rds2, Cd2, GL, vout]
for $g_{ds} \ll g_m$ and $R_L \cong g_m r_{ds}^2$
(see cascode current mirror impedance, pp137, vlsi-25/17):
$Y_{s2} \cong g_{ds}$
$\tau_{Cgd1} \cong C_{gd1}\,\frac{r_{ds}}{2}\,(1 + g_m R_{in})$
$\tau_{Cgd1} \cong C_{gd1}\,\frac{g_m r_{ds}^2}{2}$ if Rin is large and equal to rds
Cascode Gain Stage
(con't 4)
node vs2: the resistance seen by the capacitor Cs2 is rds1 in parallel
with the impedance seen looking into the source of Q2, which
is approximately rds, thus:
$\tau_{Cs2} \cong C_{s2}\,\frac{r_{ds}}{2}$
node vout: the resistance seen by Cd2 is the output impedance of the
cascode amplifier, thus:
$\tau_{Cd2} \cong C_{d2}\,\frac{g_m r_{ds}^2}{2}$
$\tau_{total} \cong \tau_{Cgs1} + \tau_{Cgd1} + \tau_{Cs2} + \tau_{Cd2}$
$\tau_{total} \cong C_{gs1} R_{in} + C_{gd1}\,\frac{g_m r_{ds}^2}{2} + C_{s2}\,\frac{r_{ds}}{2} + C_{d2}\,\frac{g_m r_{ds}^2}{2}$
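The four zero-value time constants can be added up numerically. The sketch below uses the values of the cascode exercise of this lecture (gm = 1mA/V, rds = 100kΩ, Rin = 180kΩ, CL = 5pF, Cgs = 0.2pF, Cgd = 15fF, Csb = 40fF, Cdb = 20fF, Cbias = 20fF). For Cgd1 it uses the form Cgd1·(rds/2)·(1 + gm·Rin), since Rin differs from rds in this data set; the function and key names are illustrative.

```python
import math


def cascode_time_constants(gm, rds, r_in, c_gs, c_gd, c_sb, c_db, c_l, c_bias):
    """Zero-value time constants of the telescopic cascode stage, in seconds."""
    c_s2 = c_db + c_sb + c_gs            # Cdb1 + Csb2 + Cgs2
    c_d2 = c_gd + c_db + c_l + c_bias    # Cgd2 + Cdb2 + CL + Cbias
    r_out = gm * rds ** 2 / 2.0          # output resistance of the cascode stage
    return {
        "Cgs1": c_gs * r_in,
        "Cgd1": c_gd * (rds / 2.0) * (1.0 + gm * r_in),
        "Cs2": c_s2 * rds / 2.0,
        "Cd2": c_d2 * r_out,
    }


taus = cascode_time_constants(
    gm=1e-3, rds=100e3, r_in=180e3,
    c_gs=0.2e-12, c_gd=15e-15, c_sb=40e-15, c_db=20e-15,
    c_l=5e-12, c_bias=20e-15,
)
tau_total = sum(taus.values())
f_3db = 1.0 / (2.0 * math.pi * tau_total)
for name, tau in taus.items():
    print(f"tau_{name} = {tau * 1e9:.1f} ns")
print(f"f-3dB = {f_3db / 1e3:.2f} kHz")
```

The output node contributes about 25µs out of a total of roughly 25.5µs, which confirms that it dominates; the estimate comes out in the kilohertz range.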
Cascode Gain Stage
Comments
High-frequency considerations
[Figure: telescopic cascode stage Q1, Q2 with Ibias, Vbias, Vin, Vout, CL]
one pole dominates, thus the gain is:
$A(s) = \frac{A_v}{1 + s/\omega_{-3dB}}$
at frequencies substantially larger than ω-3dB:
$A(s) \cong \frac{A_v}{s/\omega_{-3dB}} \cong -\frac{g_{m1}}{sC_L}$
the upper limit of the unity-gain frequency of an
amplifier that uses a cascode gain stage is set
by the source node of Q2:
$\omega_{p2} = \frac{1}{\tau_{s2}} > \frac{3 \mu_p V_{eff2}}{2 L_2^2}$
Coming Up...
Next topic…
Basic OpAmp design and compensation
Readings
for next time…
Johns&Martin: Sections 3.11
Exercises:
Have a look at the exercises in Johns&Martin.
VLSI-26
Exercises VLSI #1
Consider the common-source amplifier shown on
transparency vlsi-26/6 where Iin=100µA and all
transistors have W=100µm and L=1.6µm. Given
Rin=180kΩ, CL=0.3pF, Cgs1=0.2pF, Cgd1=15fF,
Cdb1=20fF, Cdb2=36fF, µnCox=90µA/V2,
µpCox=30µA/V2, rds-n=8000 [L (µm)]/[ID (mA)],
and rds-p=12000 [L (µm)]/[ID (mA)].
Estimate the -3dB frequency.
Result: f-3dB=554kHz

Analyse the source follower and assume that
Ibias=100µA and all transistors have W=100µm
and L=1.6µm. Given Rin=180kΩ, CL=10pF,
Cgs1=0.2pF, Cgd1=15fF, Csb1=40fF, Cin=30fF,
µnCox=90µA/V2, µpCox=30µA/V2, and
rds-n=8000 [L (µm)]/[ID (mA)]. Find ω0, Q, and
ωz of the source follower.
Result: ω0=52Mrad/s, Q=0.8, % overshoot=8.1%,
ωz=5.3Grad/s
VLSI-26
Exercises VLSI #2
Johns&Martin chap 3.11 pp166: 3.11 (difficulty: easy):
Assume that for the input transistors and the
cascode transistors, gm=1mA/V, rds=100kΩ,
Rin=180kΩ, CL=5pF, Cgs=0.2pF, Cgd=15fF,
Csb=40fF, Cdb=20fF, Cbias=20fF. Estimate the -3dB
frequency of the cascode amplifier (transparency
19).
Result: ω-3dB=2π 6.3kHz

Estimate the lower bound on the frequency of the
second pole of a folded-cascode amplifier for a
0.8µm technology, where a typical value of 0.25V
is chosen for Veff2. L2=1.5Lmin, µp=0.02m2/Vs.
Result: ωp2=2π 414MHz
Analog Microelectronics
Basic OpAmp Design
and Compensation
Today’s handouts:
(1) Lecture Slides
JMM v1.0
Outline
• Johns&Martin
• MOS differential pair and gain stage (chap 3.8)
• two-stage CMOS OpAmp (chap 5.1)
• gain
• frequency response
• systematic offset voltage
• n- or p-channel input stage
• feedback and OpAmp compensation (chap 5.2)
• first-order model of closed-loop amplifier
• linear settling time
• OpAmp compensation
• compensation of two-stage OpAmp
• lead compensation
• making compensation independent of process and temp
• biasing OpAmp to have stable transconductance
• Exercises (5.3-5.5)
• hand calculations
• spice simulations
MOS Differential Pair
and Gain Stage
• most integrated amplifiers have a differential input,
realized with a differential transistor pair
[Figure: differential pair Q1, Q2 with inputs V+ and V-, drain currents and tail current Ibias]
• a low-frequency small-signal equivalent circuit is
based on the T model for the MOS transistor
[Figure: T-model equivalent circuit: v+, v-, id1 = is1, id2 = is2, rs1, rs2; the gate current is zero in the T model]
(con't 1)
to simplify analysis the output impedance of the transistor is ignored
[Figure: T-model equivalent circuit: v+, v-, id1 = is1, id2 = is2, rs1, rs2]
Definition: $v_{in} \equiv v^+ - v^-$
$i_{d1} = i_{s1} = \frac{v_{in}}{r_{s1} + r_{s2}} = \frac{v_{in}}{1/g_{m1} + 1/g_{m2}}$
since both Q1 and Q2 have the same bias currents, gm1=gm2:
$i_{d1} = \frac{g_{m1}}{2}\, v_{in}$, $i_{d2} = -\frac{g_{m1}}{2}\, v_{in}$
Definition: $i_{out} \equiv i_{d1} - i_{d2}$, thus: $i_{out} = g_{m1} v_{in}$
(con't 2)
If a differential pair has a current mirror as an active load, a complete
differential-input, single-ended-output gain stage can be realized.
to simplify analysis the output impedance of the transistor is ignored
[Figure: differential pair Q1, Q2 with current-mirror load Q3, Q4, tail current Ibias, input vin, currents is1 and id4, output vout with output resistance rout]
$i_{d4} = i_{d3} = -i_{s1}$ and $i_{d2} = -i_{s1}$
$v_{out} = (-i_{d2} - i_{d4})\, r_{out} = 2\, i_{s1}\, r_{out} = g_{m1}\, r_{out}\, v_{in}$
this result assumes that the output impedance is purely resistive; if there
is also a capacitive load CL we get:
$A_v = g_{m1} z_{out}$ where $z_{out} = r_{out} \parallel 1/(sC_L)$
Thus, for this differential stage, a very
simple model is used. This model implicitly
assumes that the time constant at the output
node is much larger than the time constant
due to the parasitic capacitances at Q1 and Q2
[Figure: simple model: vin, controlled source gm1·vin, rout and CL in parallel (zout), output vout]
(con't 3)
The output resistance rout is determined by using the
small-signal equivalent circuit and applying a voltage at the output node.
Note that the T-model is used for Q1, Q2 and Q3, and the hybrid-π
model is used for Q4.
$r_{out} \equiv \frac{v_x}{i_x}$
[Figure: differential stage Q1 to Q4 with Ibias and its small-signal equivalent circuit with test source vx (currents ix, ix1 to ix4), rs1, rs2, rds1, rds2, rds3 // rs3, rds4 and gm4·va]
$r_{out} = r_{ds2} \parallel r_{ds4}$
$A_v = g_{m1}\,(r_{ds2} \parallel r_{ds4})$
(con't 4)
The large-signal transfer characteristic is determined by using the
large-signal transistor model in the active region of the fets.
Note that the T-model is used for Q1, Q2 and Q3, and the hybrid-π
model is used for Q4.
$I_D = \frac{\mu_0 C_{ox}}{2} \frac{W}{L}\,(V_{GS} - V_{tn})^2$
$I_D = \frac{\beta}{2}\,(V_{GS} - V_{tn})^2$
[Figure: differential stage Q1 to Q4 with VIN, ID1, ID2, ID4, IS1, Ibias, Iout, Vout]
$I_{OUT} = I_{D1} - I_{D2} = I_{bias} \sqrt{\frac{\beta V_{IN}^2}{I_{bias}} - \frac{\beta^2 V_{IN}^4}{4 I_{bias}^2}}$
[Figure: plot of IOUT/Ibias (-1.5 to 1.5) versus VIN·sqrt(β/Ibias) (-3 to 3)]
typical value for Ibias=0.1mA: sqrt(β/Ibias) = 5.4, VIN = 187mV
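The large-signal characteristic can be evaluated directly. In this sketch β is derived from the quoted typical value sqrt(β/Ibias) = 5.4 (taken to be in 1/V) with Ibias = 0.1mA. Two things are assumptions added here: the sign of IOUT follows the sign of VIN, and the output is clamped to ±Ibias once the whole tail current flows through one transistor, that is for |VIN| ≥ sqrt(2·Ibias/β).

```python
import math


def diff_pair_iout(v_in, beta, i_bias):
    """Large-signal output current ID1 - ID2 of a MOS differential pair."""
    v_max = math.sqrt(2.0 * i_bias / beta)      # full current steering beyond this input
    if abs(v_in) >= v_max:
        return math.copysign(i_bias, v_in)
    x = beta * v_in ** 2 / i_bias
    return math.copysign(i_bias * math.sqrt(x - x * x / 4.0), v_in)


i_bias = 0.1e-3                 # A
beta = 5.4 ** 2 * i_bias        # A/V^2, from sqrt(beta / Ibias) = 5.4 per volt
v_max = math.sqrt(2.0 * i_bias / beta)

for v_in in (-0.3, -0.1, 0.0, 0.1, 0.3):
    print(f"VIN = {v_in:+.2f} V -> IOUT / Ibias = {diff_pair_iout(v_in, beta, i_bias) / i_bias:+.3f}")
```

For small inputs the curve reduces to IOUT ≅ sqrt(β·Ibias)·VIN, which is the small-signal result iout = gm1·vin.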

Two-Stage CMOS OpAmp
• Basic OpAmp design is discussed
• OpAmp gain
• frequency response
• slew rate
• systematic offset voltage
• n-channel or p-channel input stage
[Figure: block diagram of the two-stage OpAmp: Vin, differential input stage A1 (single-ended output), second gain stage -A2 (e.g. common-source gain stage with active load) with capacitor CC around it, output buffer with gain 1 (only present when resistive loads need to be driven), Vout]
the capacitor CC ensures stability when the OpAmp is used in feedback
CC is often called Miller capacitance to illustrate its effect on the input
CMOS realization of a
two-stage OpAmp
p-well process necessary
[Figure: CMOS realization of a two-stage OpAmp between VDD and VSS: transistors Q1 to Q16 with annotated widths (25 to 500), compensation capacitor CC and bias resistor Rb; blocks: bias circuit, differential-input first stage, common-source second stage, output buffer; inputs Vin-, Vin+, output Vout]
• p-channel input stage
• all transistor lengths are 1.6µm (1µm process)
• reasonable sizes for lengths of the transistors might be somewhere
between 1.5 and 2 times the minimum transistor length
Two-Stage OpAmp
Gain
• overall gain for low-frequency applications is the
most critical parameter of an OpAmp
gain of the first stage (differential stage):
$A_{v1} = g_{m1}\,(r_{ds2} \parallel r_{ds4})$
$g_{m1} = \sqrt{2 \mu_p C_{ox} \left(\frac{W}{L}\right)_1 I_{D1}} = \sqrt{2 \mu_p C_{ox} \left(\frac{W}{L}\right)_1 \frac{I_{bias}}{2}}$
approximation to the finite output resistance, ignoring short-channel effects, where α is a
technology-dependent parameter of about 5e6 V^1/2/m:
$r_{dsi} \cong \alpha\, \frac{L_i}{I_{Di}} \sqrt{V_{DGi} + V_{ti}}$
gain of the second stage (common-source stage):
$A_{v2} = -g_{m7}\,(r_{ds6} \parallel r_{ds7})$
gain of the third stage (common-drain stage):
$A_{v3} = \frac{g_{m8}}{G_L + g_{m8} + g_{ds8} + g_{ds9}}$
gain of the third stage with body effect (bulk not connected to source):
$A_{v3} = \frac{g_{m8}}{G_L + g_{m8} + g_{s8} + g_{ds8} + g_{ds9}}$
with body effect constant γ=0.5V^1/2 and 2φF=0.7V:
$g_s = \frac{g_m \gamma}{2 \sqrt{V_{SB} + 2\phi_F}}$
Two-Stage OpAmp
Frequency Response
• consider the frequency range where the capacitor CC causes the
magnitude of the gain to decrease, but still well
below the unity-gain frequency (open-loop gain = 1)
⇒ midband frequencies
• only the compensation capacitor CC is taken into account
• assume Q16 is not present (resistor for lead
compensation, effect only at unity gain frequency)
• discuss simplified circuit:
[Figure: simplified circuit: input stage Q1 to Q5 (Vbias, vin-, vin+) delivering i = gm1·vin into node v1, second stage -A2 with CC between v1 and v2, buffer A3, output vout]
midband gain: $A_v(s) \cong \frac{g_{m1}}{sC_C}$
unity gain frequency: $\omega_{ta} \cong \frac{g_{m1}}{C_C}$
Two-Stage OpAmp
Slew Rate
• slew rate SR is the maximum rate at which the output
changes when input signals are large
• at slew rate limitation all current of Q5 goes either
into Q1 or Q2
⇒ this current has to go through CC
$SR \equiv \left.\frac{dv_{out}}{dt}\right|_{max}$
$SR = \frac{2 I_{D1}}{C_C} = V_{eff1}\, \omega_{ta}$
[Figure: simplified circuit: input stage Q1 to Q5 (Vbias, vin-, vin+) delivering current I into node v1, second stage -A2 with CC between v1 and v2, buffer A3, output vout]
increasing Veff1 and ωta increases SR
p-channel fet inputs increase SR
increasing Veff1 reduces transconductance gm1
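The two expressions for the slew rate can be cross-checked numerically. All values in this sketch are assumptions for illustration (ID1 = 50µA, Veff1 = 0.2V, CC = 5pF); it also assumes the square-law relation gm1 = 2·ID1/Veff1, which is what links SR = 2·ID1/CC to SR = Veff1·ωta.

```python
def unity_gain_frequency(gm1, c_c):
    """wta ~= gm1 / CC, in rad/s."""
    return gm1 / c_c


def slew_rate(i_d1, c_c):
    """SR = 2 * ID1 / CC, in V/s: the full tail current charges CC."""
    return 2.0 * i_d1 / c_c


i_d1 = 50e-6      # A, half of an assumed 100 uA tail current
v_eff1 = 0.2      # V, assumed effective gate voltage of the input pair
c_c = 5e-12       # F, assumed compensation capacitor

gm1 = 2.0 * i_d1 / v_eff1               # square-law transconductance
w_ta = unity_gain_frequency(gm1, c_c)
sr = slew_rate(i_d1, c_c)
print(f"gm1 = {gm1 * 1e3:.2f} mA/V, wta = {w_ta:.3g} rad/s, SR = {sr / 1e6:.0f} V/us")
```

At a fixed bias current, raising Veff1 lowers gm1, so a higher slew rate at the same ωta requires a smaller CC.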
Two-Stage OpAmp
Systematic Offset Voltage Cancelation
• two-stage OpAmps may have a systematic input
offset voltage if not properly designed
• the differential input is zero: vin+ = vin-
• ID6 = ID7, which requires a well defined VGS7 value
[Figure: two-stage OpAmp between VDD and VSS: Q5 and Q6 biased by Vbias, input pair Q1, Q2 (Vin-, Vin+), mirror load Q3, Q4, second-stage transistor Q7, output Vout; annotated widths 150 and 300]
$\frac{(W/L)_7}{(W/L)_4} = 2\, \frac{(W/L)_6}{(W/L)_5}$
Two-Stage OpAmp
n- or p-channel input stage
• comparison between n- and p-channel input stage
OpAmps
• overall dc gain is largely unaffected since both designs
have one stage with n-channel and one stage with one or
more p-channel driving fets.
• for a given power dissipation, and therefore bias current,
having a p-channel input-pair stage maximizes the slew
rate.
• having a p-channel input first stage implies that the
second stage has an n-channel input drive fet. This
arrangement maximizes the transconductance of the drive
fet of the 2nd stage, which is critical when high
frequency operation is important.
• output stage: an n-channel source follower is preferable
because it has less of a voltage drop (if a separate
p-well is used). Its higher transconductance reduces the
effect of the load capacitance on the second pole. There is also
less degradation of the gain when small load resistances
are being driven.
⇒ p-channel input fets for the first stage are almost
always the best choice
Feedback and OpAmp Compensation
• OpAmps in closed-loop configurations are discussed,
and how to compensate an OpAmp to ensure that
the closed-loop configuration is not only stable but
has a good settling characteristic.
• Optimum compensation of OpAmps is typically
considered to be one of the most difficult parts in
the OpAmp design procedure.
• first-order model of closed-loop amplifier
• linear settling time
• OpAmp compensation
• compensating the two-stage OpAmp
• lead compensation
• making compensation independent of process and
temperature
• biasing an OpAmp to have stable transconductances
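First Order Model of
Closed-Loop Amplifier
• First order model of the transfer function of a
dominant-pole compensated OpAmp (real-axis dominant pole):
$A(s) = \frac{A_0}{1 + s/\omega_{p1}}$
unity gain frequency definition: $|A(j\omega_{ta})| \equiv 1 \cong \frac{A_0}{\omega_{ta}/\omega_{p1}}$
unity gain frequency of the first order OpAmp model: $\omega_{ta} \cong A_0\, \omega_{p1}$
for midband frequencies $\omega_{p1} \ll \omega \ll \omega_{ta}$: $A(s) \cong \frac{\omega_{ta}}{s}$
closed-loop gain:
$A_{CL}(s) = \frac{A(s)}{1 + \beta A(s)} = \frac{1}{\beta}\, \frac{1}{1 + s/(\beta \omega_{ta})}$
$\omega_{-3dB} \cong \beta\, \omega_{ta}$
[Figure: feedback loop with forward gain A(s), feedback gain β, input Ain(s) and output Aout(s)]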
Linear Settling Time
• the settling time performance is an important
design parameter of OpAmps
• the charge transfer in SC circuits is closely related to
the OpAmp's step response
• settling time is defined as the time it takes for an
OpAmp to reach a specified percentage of its final value
when a step input is applied
• the linear settling time portion is due to the finite unity gain
frequency (independent of the output step size)
• the nonlinear settling time portion is due to the slew rate
limit (dependent on the output step size)
⇒ unity gain frequency estimation for linear settling time
portion
the -3dB frequency determines the settling-time response for a step input:
$\tau = \frac{1}{\omega_{-3dB}} = \frac{1}{\beta\, \omega_{ta}}$
step response for a closed-loop OpAmp:
$v_{out}(t) = V_{step}\,(1 - e^{-t/\tau})$
$\left.\frac{d}{dt} v_{out}(t)\right|_{t=0} = \frac{V_{step}}{\tau}$
if the slew rate is larger than this initial slope, no SR limit will occur
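A small sketch of how these relations are used in practice: given ωta and β, it returns the time constant, the time needed to settle to a given accuracy, and the initial slope that the slew rate must exceed. The unity-gain frequency of 2π·10MHz, β = 0.5, the 0.1% accuracy and the 1V step are assumed example values.

```python
import math


def closed_loop_tau(beta, w_ta):
    """Time constant tau = 1 / (beta * wta) of the first-order closed-loop response."""
    return 1.0 / (beta * w_ta)


def linear_settling_time(tau, tolerance):
    """Time until the error exp(-t / tau) has dropped to the given relative tolerance."""
    return tau * math.log(1.0 / tolerance)


def initial_slope(v_step, tau):
    """Maximum slope Vstep / tau of the exponential response, reached at t = 0."""
    return v_step / tau


w_ta = 2.0 * math.pi * 10e6     # rad/s, assumed unity-gain frequency
beta = 0.5                      # assumed feedback factor
tau = closed_loop_tau(beta, w_ta)
t_settle = linear_settling_time(tau, tolerance=1e-3)
slope = initial_slope(1.0, tau)
print(f"tau = {tau * 1e9:.1f} ns, 0.1 % settling in {t_settle * 1e9:.0f} ns")
print(f"slew rate needed for a 1 V step: > {slope / 1e6:.1f} V/us")
```

Settling to 0.1% takes about 6.9 time constants, independent of the step size, as long as the slew rate exceeds the initial slope.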
OpAmp Compensation
(second order model)
• for compensating OpAmps the first order model is
insufficient, because it ignores poles and zeros at
high frequencies which may cause instabilities.
• a more accurate open-loop transfer model adds one
additional pole (real axis poles and zeros):
$A(s) = \frac{A_0}{(1 + s/\omega_{p1})(1 + s/\omega_{eq})}$
ωp1: first dominant pole; ωeq: higher frequency poles
• ωeq may be approximated with a set of real-axis
poles and zeros:
$\frac{1}{\omega_{eq}} \cong \sum_{i=2}^{m} \frac{1}{\omega_{pi}} - \sum_{i=1}^{n} \frac{1}{\omega_{zi}}$
• phase margin PM is an often used measure of how far
an OpAmp with feedback is from becoming unstable
$\angle LG(j\omega) = -90^\circ - \tan^{-1}(\omega/\omega_{eq})$
$PM \equiv \angle LG(j\omega_t) - (-180^\circ) = 90^\circ - \tan^{-1}(\omega_t/\omega_{eq})$, where ωt is the unity-gain frequency of LG
$\omega_t = \tan(90^\circ - PM)\, \omega_{eq}$ (independent of β)
OpAmp Compensation
(second order model con’t)
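- Closed-loop gain if β is frequency independent (if ωt is far away from high-frequency poles and zeros):
$$A_{CL}(s) = \frac{A_{CL0}}{1 + \dfrac{s\,(1/\omega_{p1} + 1/\omega_{eq})}{1 + \beta A_0} + \dfrac{s^2}{(1 + \beta A_0)\,\omega_{p1}\omega_{eq}}}$$
$$A_{CL0} = \frac{A_0}{1 + \beta A_0} \cong \frac{1}{\beta}$$
- General equation for a second-order transfer function:
$$H_2(s) = \frac{K}{1 + \dfrac{s}{\omega_0 Q} + \dfrac{s^2}{\omega_0^2}}$$
- Comparing:
$$\omega_0 = \sqrt{(1 + \beta A_0)\,\omega_{p1}\omega_{eq}} \cong \sqrt{\beta\,\omega_{ta}\,\omega_{eq}}$$
$$Q = \frac{\sqrt{(1 + \beta A_0)/(\omega_{p1}\omega_{eq})}}{1/\omega_{p1} + 1/\omega_{eq}} \cong \sqrt{\frac{\beta\,\omega_{ta}}{\omega_{eq}}}$$
$$\%\ \text{overshoot} = 100\,e^{-\pi/\sqrt{4Q^2 - 1}}$$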
OpAmp Compensation
(2nd order transfer function)
- Relationship between Q factor and phase margin
- Transfer function, Q = sqrt(1/2):
  - no peaking
  - widest passband
  - ω0 = ω-3dB
- Step response, Q <= 0.5 (real poles):
  - no peaking
- Step response, Q > 0.5:
  - percentage of overshoot to be calculated

PM     ωt/ωeq   Q factor   % overshoot
55°    0.700    0.925      13.3%
60°    0.580    0.817      8.7%
65°    0.470    0.717      4.7%
70°    0.360    0.622      1.4%
75°    0.270    0.527      0.008%

- The required phase margin is much larger than is often supposed to be necessary (80° to 85°).
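The table entries can be cross-checked numerically. The sketch below uses $\omega_t/\omega_{eq} = \tan(90^\circ - PM)$ and the overshoot formula from the second-order model. It additionally assumes that the condition $|LG(j\omega_t)| = 1$ on the second-order model gives $\beta\omega_{ta}/\omega_{eq} = (\omega_t/\omega_{eq})\sqrt{1 + (\omega_t/\omega_{eq})^2}$; that intermediate step is not written on the slides, and the function names are hypothetical.

```python
import math

def q_from_phase_margin(pm_deg):
    """Return (omega_t/omega_eq, Q) for a given phase margin in degrees."""
    x = math.tan(math.radians(90.0 - pm_deg))          # omega_t / omega_eq
    # Assumption: |LG(j*omega_t)| = 1 gives beta*omega_ta/omega_eq = x*sqrt(1 + x^2)
    ratio = x * math.sqrt(1.0 + x * x)
    return x, math.sqrt(ratio)                         # Q ~ sqrt(beta*omega_ta/omega_eq)

def percent_overshoot(q):
    """Step-response overshoot in percent; zero for real poles (Q <= 0.5)."""
    if q <= 0.5:
        return 0.0
    return 100.0 * math.exp(-math.pi / math.sqrt(4.0 * q * q - 1.0))

for pm in (55, 60, 65, 70, 75):
    x, q = q_from_phase_margin(pm)
    print(f"PM = {pm} deg: wt/weq = {x:.3f}, Q = {q:.3f}, "
          f"overshoot = {percent_overshoot(q):.3f} %")
```

Because the overshoot drops steeply as Q approaches 0.5, a few degrees of additional phase margin near 70° to 75° make a large difference.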
Compensating the Two-Stage OpAmp
- Capacitor CC realizes dominant-pole compensation and thereby controls ωp1 and ωta:
$$\omega_{ta} = A_0\,\omega_{p1}$$
- Fet Q16 is included to realize a left-half-plane zero at frequencies around or slightly above ωt (lead compensation). Q16 has Vds = 0 V and thus is in the triode region:
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\,(W/L)_{16}\,V_{eff16}}$$
[Figure: two-stage OpAmp schematic with fets Q1 to Q7 (widths 300 and 150), supply VDD, bias voltage Vbias, inputs Vin- and Vin+, output Vout2, and the compensation branch Q16 in series with CC]
small-signal model
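- Simplified small-signal model of the two-stage OpAmp for compensation analysis
[Figure: small-signal model with current source gm1·vin1 driving node v1 (R1, C1), current source gm7·v1 driving node vout2 (R2, C2), and CC in series with RC between v1 and vout2]
$$R_1 = r_{ds4} \parallel r_{ds2}, \qquad C_1 = C_{db2} + C_{db4} + C_{gs7}$$
$$R_2 = r_{ds6} \parallel r_{ds7}, \qquad C_2 = C_{db7} + C_{db6} + C_{L2}$$
Analysis shown in Johns&Martin:
Dominant pole and nondominant pole:
$$\omega_{p1} \cong \frac{1}{g_{m7} R_1 R_2 C_C}, \qquad \omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2}$$
For RC = 0:
$$\omega_z = -\frac{g_{m7}}{C_C}$$
Lead compensation (RC not zero):
$$\omega_z = \frac{-1}{C_C\,(1/g_{m7} - R_C)}$$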
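(discussion)
$$D(s) = \left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)$$
[Figure: pole-zero plot (axes R and I) with ωp2, ωp1 and ωz, showing how they move with CC and gm7]
$$\omega_{p1} \cong \frac{1}{g_{m7} R_1 R_2 C_C}, \qquad \omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2}, \qquad \omega_z = -\frac{g_{m7}}{C_C}$$
- Increasing gm7 separates the poles (pole splitting).
- However, the right-half-plane zero introduces negative phase shift into the transfer function.
- Increasing CC moves ωp1 and ωz to lower frequencies and thus does not help.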
(lead compensation)
- With a non-zero RC, a third pole is introduced, but it is at high frequency and has almost no effect.
- However, the zero opens a number of possibilities:
$$\omega_z = \frac{-1}{C_C\,(1/g_{m7} - R_C)}$$
  - One could eliminate the right-half-plane zero:
$$R_C = 1/g_{m7}$$
  - One could choose RC to be even larger and thus move the right-half-plane zero into the left half plane to cancel the nondominant pole ωp2:
$$R_C = \frac{1}{g_{m7}}\left(1 + \frac{C_1 + C_2}{C_C}\right)$$
  - One could choose RC even larger to move the now left-half-plane zero to a frequency slightly greater than the unity-gain frequency that would result without the resistor, say 20% larger (recommended), $\omega_z = 1.2\,\omega_t$:
$$R_C \cong \frac{1}{1.2\,g_{m1}}$$
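The three choices of RC can be compared numerically. This is a minimal sketch based only on the formulas above; the component values are hypothetical and chosen purely for illustration. In the sign convention of the zero formula, a negative ωz is a right-half-plane zero and a positive ωz a left-half-plane zero.

```python
import math

def zero_frequency(gm7, cc, rc):
    """omega_z = -1 / (CC * (1/gm7 - RC)); negative: RHP zero, positive: LHP zero."""
    diff = 1.0 / gm7 - rc
    if diff == 0.0:
        return math.inf                      # zero moved to infinity (eliminated)
    return -1.0 / (cc * diff)

def rc_eliminate_zero(gm7):
    return 1.0 / gm7

def rc_cancel_p2(gm7, c1, c2, cc):
    return (1.0 / gm7) * (1.0 + (c1 + c2) / cc)

def rc_lead(gm1):
    # Valid as an approximation when RC >> 1/gm7, so that omega_z ~ 1/(RC*CC)
    return 1.0 / (1.2 * gm1)

# Hypothetical values for illustration only
gm1, gm7 = 0.5e-3, 2.0e-3                    # A/V
c1, c2, cc = 0.2e-12, 2.0e-12, 5.0e-12       # F

print("RC = 0:           omega_z =", zero_frequency(gm7, cc, 0.0))
print("RC = 1/gm7:       omega_z =", zero_frequency(gm7, cc, rc_eliminate_zero(gm7)))
print("cancel omega_p2:  omega_z =", zero_frequency(gm7, cc, rc_cancel_p2(gm7, c1, c2, cc)))
print("lead (1.2*wt):    RC      =", rc_lead(gm1), "ohm")
```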
Lead Compensation
Design Procedure
1. Start by choosing, somewhat arbitrarily, $C_C' \cong 5\,\text{pF}$.
2. Using Spice, find the frequency at which a -125° phase shift exists. Let this frequency be denoted ωt and the gain at this frequency A'.
3. Choose a new CC so that ωt becomes the unity-gain frequency of the loop gain, thus resulting in a 55° phase margin. This can be achieved by taking CC according to the equation (iterations possible):
$$C_C = C_C'\,A'$$
4. Choose RC according to:
$$R_C = \frac{1}{1.2\,\omega_t\,C_C}$$
5. The resulting phase margin is approximately 85° (leaving 5° for process variations). It may be necessary to iterate on RC to optimize the phase margin.
6. If after step 4 the phase margin is not adequate, then increase CC while leaving RC constant.
7. Replace RC by a fet with the following size:
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\,(W/L)_{16}\,V_{eff16}}$$
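Steps 3 and 4 of the procedure are plain arithmetic once the simulation values from step 2 are known. The following Python sketch assumes A' is a linear gain (not in dB); the numbers stand in for Spice results and are hypothetical.

```python
import math

def lead_compensation(cc_initial, gain_at_wt, omega_t):
    """Return (CC, RC) following steps 3 and 4 of the design procedure."""
    cc = cc_initial * gain_at_wt             # step 3: CC = CC' * A'
    rc = 1.0 / (1.2 * omega_t * cc)          # step 4: RC = 1 / (1.2 * omega_t * CC)
    return cc, rc

# Hypothetical simulation results for illustration only
cc_initial = 5e-12                           # step 1: CC' = 5 pF
gain_at_wt = 0.8                             # A' at the -125 degree frequency (assumed)
omega_t = 2.0 * math.pi * 30e6               # frequency of -125 degree phase (assumed)

cc, rc = lead_compensation(cc_initial, gain_at_wt, omega_t)
print(f"CC = {cc * 1e12:.2f} pF, RC = {rc:.0f} ohm")
```

If the phase margin found afterwards is inadequate, the procedure asks for a larger CC with RC held constant, so only the first return value would be revised.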
Compensation Independent of
Process and Temperature
- Making lead compensation insensitive to process and temperature
- The ratios of all transconductances remain relatively constant over process and temperature variations, as all fets depend on the same biasing network:
$$\omega_{p2} \cong \frac{g_{m7}}{C_1 + C_2}, \qquad \omega_{ta} \cong \frac{g_{m1}}{C_C}$$
- When a resistor is used to realize lead compensation, RC can also be made to track the inverse of the transconductance (1/gm7), and thus the lead compensation will be mostly independent of process and temperature variations:
$$\omega_z = \frac{-1}{C_C\,(1/g_{m7} - R_C)}$$
Process and Temperature (con’t 2)
Making RC proportional to 1/gm7:
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\,(W/L)_{16}\,V_{eff16}}$$
$$g_{m7} = \mu_n C_{ox}\,(W/L)_7\,V_{eff7}$$
The product RC·gm7 needs to be constant:
$$R_C\,g_{m7} = \frac{(W/L)_7\,V_{eff7}}{(W/L)_{16}\,V_{eff16}}$$
Therefore, all that remains is to ensure that Veff16/Veff7 is independent of process and temperature variations. The ratio can be made constant by deriving Vgs16 from the same biasing circuit used to derive Vgs7.
- The following approach makes it possible to realize on-chip "resistors" using triode-region fets that are accurately ratioed with respect to a single off-chip resistor -> modern µcircuit design
Process and Temperature (con’t 3)
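[Figure: bias circuit with fets Q6, Q7, Q11, Q12, Q13 and Q16 (widths 25 and 300), compensation capacitor CC, bias voltage Vbias and nodes Va and Vb]
If $V_{eff13} = V_{eff7}$, then $V_a = V_b$, and then (gates connected) $V_{eff16} = V_{eff12}$, thus:
$$\frac{V_{eff7}}{V_{eff16}} = \frac{V_{eff13}}{V_{eff12}}$$
To make $V_{eff13} = V_{eff7}$ we need:
$$\sqrt{\frac{2 I_{D7}}{\mu_n C_{ox}\,(W/L)_7}} = \sqrt{\frac{2 I_{D13}}{\mu_n C_{ox}\,(W/L)_{13}}} \quad\Rightarrow\quad \frac{I_{D7}}{I_{D13}} = \frac{(W/L)_7}{(W/L)_{13}}$$
However, the current ratio is set by Q6 and Q11:
$$\frac{I_{D7}}{I_{D13}} = \frac{(W/L)_6}{(W/L)_{11}}$$
Condition to be satisfied:
$$\frac{(W/L)_6}{(W/L)_7} = \frac{(W/L)_{11}}{(W/L)_{13}}$$
As ID12 and ID13 are equal:
$$R_C\,g_{m7} = \frac{(W/L)_7}{(W/L)_{16}}\sqrt{\frac{(W/L)_{12}}{(W/L)_{13}}}$$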
Biasing an OpAmp
to Have Stable Transconductances
- Fet transconductances are probably the most important parameters in OpAmps to be stabilized.
- The following approach matches the transconductances to the conductance of a resistor.
- As a result, the fet transconductances are independent of the power-supply voltage as well as of process and temperature variations.
[Figure: bias loop with fets Q10 to Q15 (widths 25, Q15 width 100) and resistor Rb]
Assuming $(W/L)_{10} = (W/L)_{11}$:
$$g_{m13} = \frac{2\left[1 - \sqrt{\dfrac{(W/L)_{13}}{(W/L)_{15}}}\right]}{R_b}$$
For $(W/L)_{15} = 4\,(W/L)_{13}$:
$$g_{m13} = \frac{1}{R_b}$$
For any other fet i:
$$g_{mi} = \sqrt{\frac{\mu_i\,(W/L)_i\,I_{Di}}{\mu_n\,(W/L)_{13}\,I_{D13}}} \times g_{m13}$$
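The bias-loop relations above translate directly into a short calculation. This Python sketch uses the fet widths 25 and 100 shown on the slide for Q13 and Q15; the resistor value and the scaled fet are hypothetical, and the function names are illustrative.

```python
import math

def gm13_from_resistor(rb, wl13, wl15):
    """g_m13 = 2 * (1 - sqrt((W/L)13 / (W/L)15)) / Rb."""
    return 2.0 * (1.0 - math.sqrt(wl13 / wl15)) / rb

def gm_scaled(gm13, mu_i, wl_i, id_i, mu_n, wl13, id13):
    """Transconductance of fet i relative to g_m13 (same biasing network)."""
    return math.sqrt((mu_i * wl_i * id_i) / (mu_n * wl13 * id13)) * gm13

rb = 5e3                                     # hypothetical bias resistor (ohm)
gm13 = gm13_from_resistor(rb, wl13=25.0, wl15=100.0)
print(f"g_m13 = {gm13 * 1e6:.0f} uA/V (1/Rb = {1e6 / rb:.0f} uA/V)")

# Hypothetical n-channel fet with 4x the W/L of Q13 at the same drain current
print(f"g_mi  = {gm_scaled(gm13, 1.0, 100.0, 1.0, 1.0, 25.0, 1.0) * 1e6:.0f} uA/V")
```

With (W/L)15 = 4 (W/L)13 the result depends on Rb only, which is the point of this bias scheme.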
Exercises VLSI-27
Ex ana3.9 (difficulty: easy): Consider the differential-pair amplifier shown on transparency vlsi-27/3, where Ibias = 200 µA and all transistors have W = 100 µm and L = 1.6 µm. Given µnCox = 92 µA/V² and rds-n = 8000 [L (µm)]/[ID (mA)], find the output impedance and the gain.
Result: Av = 68.6 V/V, rout = 64 kΩ (see Johns/Martin p. 146)
Ex ana5.1 (difficulty: easy): Find the gain of the OpAmp shown on transparency vlsi-27/9. Assume ID5 = 100 µA, first stage VDG = 0.5 V, 2nd and 3rd stage VDG = 1 V and the bulk of Q8 connected to VSS. Given µnCox = 3µpCox = 96 µA/V², VDD = -VSS = 2.5 V, RL = 10 kΩ, γ = 0.5 V^(1/2), φF = 0.35 V, α = 5e6 V^(1/2)/m, Vtn = -Vtp = 0.8 V.
Result: Av = -6092 V/V (see Johns/Martin p. 224)
Exercises VLSI-27 (con’t 2)
Ex ana5.2 (difficulty: easy): Find the unity-gain frequency of the OpAmp shown on transparency vlsi-27/9, with CC = 5 pF. Assume ID5 = 100 µA, first stage VDG = 0.5 V, 2nd and 3rd stage VDG = 1 V and the bulk of Q8 connected to VSS. Given µnCox = 3µpCox = 96 µA/V², VDD = -VSS = 2.5 V, RL = 10 kΩ, γ = 0.5 V^(1/2), φF = 0.35 V, α = 5e6 V^(1/2)/m, Vtn = -Vtp = 0.8 V.
Result: fta = 24.7 MHz (see Johns/Martin p. 227)
Ex ana5.3 (difficulty: easy): Find the slew rate of the OpAmp on transparency vlsi-27/9, with CC = 5 pF. Assume ID5 = 100 µA. What circuit change could be made to double the slew rate while keeping ωta and the bias currents unchanged?
Result: SR = 20 V/µs; to double SR: CC = 2.5 pF and W1 = W2 = 75 µm (see Johns/Martin p. 229)
Ex ana5.4 (difficulty: easy): Consider the OpAmp shown on transparency vlsi-27/9, where Q3 and Q4 are each changed to widths of 120 µm and we want the output stage to have a bias current of 150 µA. Find the new sizes of Q6 and Q7 such that there is no systematic offset voltage.
Result: W6 = 450 µm, W7 = 360 µm (see Johns/Martin p. 231)
Ex ana5.5 (difficulty: easy): One phase of an SC circuit is shown, where the input can be modelled as a voltage step. If 0.1% accuracy is needed in the linear settling-time portion corresponding to 100 ns, find the required unity-gain frequency in terms of the capacitance values C1 and C2, and in absolute values for C2 = 10 C1 and for C2 = 0.2 C1.
Result: fta = 12.1 MHz, fta = 66.0 MHz (see Johns/Martin p. 235)
[Figure: one phase of the SC circuit with capacitors C1 and C2, OpAmp A(s) and output vout]
Ex ana5.7 (difficulty: medium): An OpAmp has an open-loop transfer function given by:
$$A(s) = \frac{A_0\,(1 + s/\omega_z)}{(1 + s/\omega_{p1})(1 + s/\omega_2)}$$
Assume that ω2 = 2π·50 MHz and A0 = 10^4.
a) Assuming ωz = ∞, find ωp1 and the unity-gain frequency ωt' so that the OpAmp has a unity-gain phase margin of 55°.
b) Assuming ωz = 1.2 ωt' (use ωt' from a), what is the unity-gain frequency ωt? Also find the new phase margin.
Result: a) ωt' = 2π·35 MHz, ωp1 = 2π·4.27 kHz, b) ωt = 2π·46.6 MHz, PM = -85° (see Johns/Martin p. 245)
Coming Up...
- Next topic: Advanced Current Mirrors and OpAmps
- Readings for next time: Johns&Martin, Sections 3.8 and 5
- Exercises:
Analog Microelectronics
Advanced Current Mirrors and OpAmp Design
Today’s handouts:
(1) Lecture Slides
Outline
- Johns&Martin
  - advanced current mirrors (chap 6.1)
    - wide-swing current mirrors
    - wide-swing constant-transconductance bias circuit
    - enhanced output-impedance current mirrors (not yet)
    - wide-swing current mirror with enhanced output impedance (not yet)
  - folded-cascode OpAmp (chap 6.2)
    - small-signal analysis
    - slew rate
- Exercises (6.8 & 6.10)
  - Spice simulations
  - problems
Advanced current mirrors
wide-swing current mirrors
- The classical two-stage OpAmp was discussed in vlsi27.
- Recently a number of alternative OpAmp designs have been gaining in popularity. They make use of more advanced current mirrors.
- Wide-swing current mirror:
  - As shorter channel lengths are used, it becomes more difficult to achieve reasonable OpAmp gains due to transistor output-impedance degradation caused by short-channel effects.
  - Conventional cascode current mirrors limit the available signal swings.
  => wide-swing current mirror
Wide-swing current mirrors
[Figure: wide-swing current mirror with Iin, Ibias and Iout = Iin. Diode-connected Q5 with size (W/L)/(n+1)² generates Vbias for the cascode fets Q1 and Q4 with size (W/L)/n²; Q2 and Q3 have size W/L]
Conditions: Vout > (n+1)·Veff; for Q4: Vtn > n·Veff
- The basic idea is to bias the drain-source voltages of transistors Q2 and Q3 to be close to the minimum possible without them going into the triode region.
- Choice of Ibias:
  - Ibias equal to the maximum of Iin (all fets in saturation)
  - Ibias equal to the nominal value of Iin (for larger Iin, fets in triode, but probably only during slew-rate limiting)
- Design hints:
  - a common choice for n is unity
  - make the bias voltage generated by Q5 slightly larger (0.1 V to 0.15 V) in order to offset the increased threshold voltages of Q1 and Q4 due to their body effects
  - L of Q1, Q4 and Q5 is twice the minimal channel length; L of Q2 and Q3 is just slightly larger than the minimal channel length (high-frequency poles)
Wide-swing constant-transconductance bias circuit
[Figure: bias circuit with fets Q1 to Q18 (W/L values such as 20/1, 20/1.6, 10/1, 10/1.6, 40/1, 5/1.6, 2.5/1.6 and 2/20), resistor RB and the outputs Vbias-p, Vcasc-p, Vcasc-n and Vbias-n. The circuit consists of three parts:
 - bias loop (see vlsi-27 slide 29)
 - cascode bias
 - start-up circuitry with a small W/L fet, which injects current as long as the IDs are zero]
Enhanced output-impedance
current mirror
- Another variation of the cascode current mirror is the enhanced output-impedance current mirror, shown as a simplified version.
- Basic idea: use a feedback amplifier to keep the drain-source voltage across Q2 stable, irrespective of the output voltage.
=> The additional amplifier increases the output impedance (see classical cascode current mirror, vlsi-25 slides 16, 17):
$$R_{out} \cong g_{m1}\,r_{ds1}\,r_{ds2}\,(1 + A)$$
[Figure: enhanced output-impedance current mirror with Iin, Iout, Vbias, amplifier A driving the gate of Q1, and mirror fets Q2 and Q3]
Folded-cascode OpAmp
- Many modern integrated CMOS OpAmps are designed to drive only capacitive loads.
- Capacitive-only loads do not need voltage buffers to obtain a low output impedance of the OpAmp.
- Thus it is possible to realize OpAmps having higher speed and larger signal swings than those that must drive resistive loads.
- These improvements are obtained by having only one single high-impedance node, at the OpAmp output, which drives only capacitive loads.
- All internal nodes have relatively low impedance (around 1/gm), thus the speed is optimized.
- The compensation is usually achieved by the load capacitance.
- The most important parameter is their transconductance: operational transconductance amplifier (OTA).
Folded-cascode OpAmp con’t
[Figure: folded-cascode OpAmp, differential input and single-ended output. Differential pair Q1, Q2 with Ibias2 and inputs Vin-, Vin+; current sources Q3, Q4 biased via Q11 and Ibias1; folded cascode fets Q5, Q6 (bias VB1, see vlsi-25 slide 19); wide-swing cascode current mirror Q7 to Q10 (bias VB2); clamp fets Q12, Q13; output Vout with load CL, which also provides the compensation]
The bias sources may be replaced by a wide-swing constant-transconductance bias network; VB1 and VB2 would then be Vcasc-n and Vcasc-p.
Purpose of Q12, Q13:
- increase slew-rate performance
- improve recovery from slew-rate limiting
Design hints:
- Ibias1 and Ibias2 should be derived from a single bias network
- any current mirrors should be designed as parallel combinations of unit-size fets
small-signal analysis
Assumption: gm5 and gm6 are much larger than gds3 and gds4.
- The differential output current from the drains of the differential pair Q1 and Q2 is applied to the load capacitance.
- The small-signal current from Q1 passes directly from source to drain of Q6 and thus to CL (indirectly for Q2, via Q5, to CL).
$$A_v = \frac{V_{out}(s)}{V_{in}(s)} = g_{m1}\,Z_L(s) \quad (\text{for } g_{m1} = g_{m2})$$
$$A_v(s) = \frac{g_{m1}\,r_{out}}{1 + s\,r_{out}\,C_L}, \qquad r_{out} \cong \frac{g_m\,r_{ds}^2}{2} \quad \text{(see vlsi-25 slide 20)}$$
For mid-band and high frequencies $A_v \cong g_{m1}/(sC_L)$, thus the unity-gain frequency is $\omega_t \cong g_{m1}/C_L$.
Design hints:
- for large load capacitances a maximal transconductance of the input fets maximizes the bandwidth; use n-channel fets
- input bias current 4 times larger than the cascode current (maximizing dc gain)
Lead compensation (series resistance RC to CL):
$$A_v(s) = \frac{g_{m1}}{\dfrac{1}{r_{out}} + \dfrac{1}{R_C + 1/(sC_L)}} \cong \frac{g_{m1}\,(1 + sR_C C_L)}{sC_L}$$
RC can be chosen to place a zero at 1.2 times the unity-gain frequency.
slew-rate
- The diode-connected fets Q12 and Q13 are turned off during normal operation and have almost no effect.
- Slew-rate limiting behavior:
  - Assume there is a large differential input voltage that causes Q1 to be turned on hard and Q2 to be turned off.
  - Since Q2 is off, all of the bias current of Q4 will be directed through the cascode fet Q5, through the n-channel current mirror, and out of the load capacitance.
  - The output voltage will decrease linearly with a slew rate given by:
$$SR \cong \frac{I_{D4}}{C_L}$$
  - Q1 and the current source Ibias will go into the triode region, moving the drain voltage of Q1 towards the negative power supply.
  - Q12 and Q13 clamp the drain voltages so they don't change as much during slew-rate limitation.
  - In addition, Q12 and Q13 increase the bias currents for Q3 and Q4 and thus for CL.
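The folded-cascode relations $\omega_t \cong g_{m1}/C_L$, $SR \cong I_{D4}/C_L$ and a lead resistor placing the zero at 1.2 times the unity-gain frequency can be evaluated as follows. The sketch assumes the square-law relation $g_m = \sqrt{2\mu C_{ox}(W/L)I_D}$, which is not stated on these slides, and uses illustrative values: W = 300 µm, L = 1.6 µm, µnCox = 96 µA/V² and CL = 10 pF as in exercise ana6.2, with the drain currents ID1 = 160 µA and ID4 = 200 µA being assumptions.

```python
import math

def gm_square_law(mu_cox, w_over_l, i_d):
    """Assumed square-law transconductance: sqrt(2 * mu*Cox * (W/L) * ID)."""
    return math.sqrt(2.0 * mu_cox * w_over_l * i_d)

def unity_gain_frequency_hz(gm1, c_load):
    """f_t = omega_t / (2*pi) with omega_t ~ gm1 / CL."""
    return gm1 / c_load / (2.0 * math.pi)

def slew_rate(i_d4, c_load):
    """SR ~ ID4 / CL in V/s (clamp fets Q12, Q13 not modelled)."""
    return i_d4 / c_load

def lead_resistor(gm1):
    """RC placing the zero 1/(RC*CL) at 1.2 * omega_t."""
    return 1.0 / (1.2 * gm1)

mu_n_cox = 96e-6                  # A/V^2
w_over_l = 300.0 / 1.6            # input fets
c_load = 10e-12                   # F
i_d1, i_d4 = 160e-6, 200e-6       # A, assumed bias currents

gm1 = gm_square_law(mu_n_cox, w_over_l, i_d1)
f_t = unity_gain_frequency_hz(gm1, c_load)
sr = slew_rate(i_d4, c_load)
r_c = lead_resistor(gm1)
print(f"gm1 = {gm1 * 1e3:.2f} mA/V, f_t = {f_t / 1e6:.1f} MHz, "
      f"SR = {sr / 1e6:.0f} V/us, RC = {r_c:.0f} ohm")
```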
Exercises VLSI-28
Ex ana6.2 (difficulty: medium): Find reasonable fet sizes for the folded-cascode OpAmp. Assume a positive/negative 2.5 V power supply, a maximal power dissipation of 2 mW, a current ratio of 4:1 between input and cascode fets, a bias current of Q11 that is 1/30 of that of Q3 (thus ignoring it for the power dissipation), a maximum fet width of 300 µm, L = 1.6 µm and Veff = 0.25 V for all except the input fets, W1 = W2 = 300 µm, rounding widths to 10 µm, CL = 10 pF, µnCox = 3µpCox = 96 µA/V².
a) Find all fet sizes and the unity-gain frequency.
b) Find the slew rate with and without clamp fets.
c) Find a reasonable lead-compensation resistor RC.
Result: a) Q1 to Q4 = 300 µm, Q5, Q6 = 60 µm, Q7 to Q10 = 20 µm, Q11 to Q12 = 10 µm, ωt = 2π·38 MHz
b) SR = 32 V/µs
c) RC = 347 Ω (see Johns/Martin pp. 271-273)
Coming Up...
- Next topic: Comparators
- Readings for next time: Johns&Martin, Sections 6.1 and 6.2
- Exercises:
VLSI Systems Design
FSM-D Architecture Model
[Figure: FSM-D block consisting of the data-path (RTL logic) with data inputs (sensors) and data outputs (actuators), and the control path (finite state machine) with control inputs and control outputs; control signals connect the two blocks]
Goal: You are able to use logic gates and flip-flops wisely and not only in an ad-hoc manner. You master the finite state machine data-path model.
Architecture Philosophy
- The FSM-D architecture model is composed of 2 blocks:
  - finite state machine (FSM)
  - data-path (D)
- Goal of the FSM-D architecture model:
  - structured design approach
  - resource optimization
  - readability, documentation
- FSM characteristics:
  - manager
  - controlling, taking decisions, initiating sub-tasks
- Data-path characteristics:
  - worker, specialist
  - executing, calculating, storing & moving data
FSM-D Architecture Model
- The FSM-D architecture model:
  - based on the FSM model and the data-path model
  - interface: inputs, outputs
[Figure: FSM-D block with data-path (RTL logic), control path (finite state machine), data and control inputs (sensors) and outputs (actuators)]
FSM Structures
- Mealy machine: outputs depend on inputs and state
$$s[k+1] = f(i[k], s[k]), \qquad o[k] = g(i[k], s[k])$$
- Moore machine: outputs depend on states only (functionally restricted)
$$s[k+1] = f(i[k], s[k]), \qquad o[k] = g(s[k])$$
- Medvedev (Medwedjew) machine: outputs depend on states only and are hazard-free
$$s[k+1] = f(i[k], s[k]), \qquad o[k] = s[k]$$
[Figure: block diagrams of the three machines. Mealy and Moore: transition logic, state register and output logic with i[k], s[k], s[k+1] and o[k]; Medvedev: transition logic and state register only, the register output is o[k]]
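The difference between the three machine types shows up directly in a cycle-by-cycle simulation of the equations above. The following Python sketch is a behavioral model only; the modulo-4 counter used as an example and all function names are assumptions for illustration.

```python
def run_mealy(inputs, f, g, s0):
    """o[k] = g(i[k], s[k]); s[k+1] = f(i[k], s[k])."""
    s, outputs = s0, []
    for i in inputs:
        outputs.append(g(i, s))
        s = f(i, s)
    return outputs

def run_moore(inputs, f, g, s0):
    """o[k] = g(s[k]); s[k+1] = f(i[k], s[k])."""
    s, outputs = s0, []
    for i in inputs:
        outputs.append(g(s))
        s = f(i, s)
    return outputs

def run_medvedev(inputs, f, s0):
    """o[k] = s[k]: a Moore machine whose output logic is the identity."""
    return run_moore(inputs, f, lambda s: s, s0)

# Example (assumed): modulo-4 counter that counts when the input is 1
count = lambda i, s: (s + i) % 4
stimulus = [1, 1, 0, 1]

print(run_mealy(stimulus, count, lambda i, s: (s + i) % 4, 0))  # reacts in the same cycle
print(run_moore(stimulus, count, lambda s: s, 0))               # reacts one cycle later
print(run_medvedev(stimulus, count, 0))                         # state register is the output
```

The Mealy output already reflects the current input, while the Moore and Medvedev outputs change only after the state register has been updated.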
Data-Path Elements
- A typical data-path consists of 3 types of basic elements:
  - buses, multiplexers, de-multiplexers
  - functional units, like comparator, adder, barrel shifter, ALU, etc.
  - memory elements, like flip-flop, register, register file, etc.
[Figure: data-path example with 32-bit buses bus[31:0], a 16-bit slice bus[31:16], a multiplexer with a 2-bit select, an adder ADD (inputs a, b and cin; outputs result and cout) and a result register with enable]
Data-Path Memory Element
- Memory elements store new values at every clock cycle.
- To give the FSM full control over the data-path, the data-path memory elements need to be upgraded with an enable control input.
[Figure: 32-bit register with enable, built from a multiplexer in front of a register (signals di, do, enable, clock), and a timing diagram of clock, enable, di and do]
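The behavior of a register with enable can be sketched in a few lines of Python. This is a behavioral model under the assumption that the multiplexer feeds back the current output when enable is inactive; the class name is hypothetical.

```python
class EnableRegister:
    """Register that takes over di on a clock edge only while enable is active."""

    def __init__(self, reset_value=0):
        self.do = reset_value

    def clock(self, di, enable):
        # Multiplexer in front of the register: select di or keep the old value
        self.do = di if enable else self.do
        return self.do

reg = EnableRegister()
print(reg.clock(di=5, enable=False))   # value is held
print(reg.clock(di=5, enable=True))    # new value is stored
print(reg.clock(di=9, enable=False))   # value is held again
```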
Design Steps
- A tutorial design serves as the vehicle for a practical approach: a Black Jack player.
- A key element in the FSM-D design procedure is the interface definition.
- Design steps:
  - step 1: definition of the algorithm
  - step 2: FSM-D interface definition
  - step 3: data-path design
  - step 4: data-path interface definition
  - step 5: FSM interface definition
  - step 6: FSM state definition
  - step 7: FSM design
  - step 8: VHDL coding
  - step 9: test-bench design and simulation
Design Step 1:
Algorithm Definition
- Goal of the Black Jack game:
  - get as close as possible to 21 points
  - the game is lost if 21 points are exceeded
- Game restrictions:
  - the cards have the following values: 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11, as well as jack, queen and king, all three representing 10 points
- Game rules:
  - ask for as many cards as needed
  - the Ace can be treated as 11 points or as 1 point
- Our player's behavior:
  - ask for cards as long as the summed-up points are below 16
  - always treat an Ace as 11 points
  - when 21 points are exceeded, treat a possible Ace as 1 point to get a second chance
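Before moving to hardware, the player's behavior can be written down as an executable reference model. The Python sketch below is one interpretation of the rules above; in particular, it assumes that every Ace counted as 11 may later be re-valued to 1 once, which the slide does not spell out. The function name and the card sequences are illustrative.

```python
def play(cards):
    """Return (score, lost) for a sequence of card values (2..11, Ace = 11)."""
    score = 0
    soft_aces = 0                      # Aces currently counted as 11
    for value in cards:
        if score >= 16:                # ask for cards only while below 16
            break
        score += value
        if value == 11:
            soft_aces += 1
        while score > 21 and soft_aces > 0:
            score -= 10                # treat one Ace as 1 point instead of 11
            soft_aces -= 1
    return score, score > 21

print(play([10, 11]))       # Ace counted as 11
print(play([11, 11, 9]))    # second Ace forces a re-valuation
print(play([10, 5, 10]))    # no Ace available, the player loses
```

Such a model is useful later as a golden reference when checking the simulated score and lost outputs of the VHDL design.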
Design Step 2
FSM-D Interface Definition
- Defining the interface of the overall FSM-D architecture model
- Defining the edge sensitivity of the clock and the active level of the control signals
[Figure: BlackJack Player FSM-D block with the inputs cardReady, cardValue(3:0), clk and start, and the outputs newCard, score(4:0), lost and finished]
Design Step 3:
Data-Path Definition
- The data-path has to be able to execute all functional operations of the algorithm.
- Clearly separate control-path and data-path tasks, as in the manager/worker analogy.
- Use memory elements, buses and multiplexers for storing and moving data.
- Use combinational logic for functional operations like adding, comparing, etc.
Design Step 3:
Data-Path Definition:
loading & comparing
- Loading the card value into a register
- Comparing it to an Ace
[Figure: register regLoad (input cardValue(3:0), enable enaLoad, clk, rst) whose output feeds the comparator cmp11 (A = B? with B = 11)]
Design Step 3:
accumulating
- Accumulating the card values
[Figure: the output of regLoad is additionally fed to input a of the adder ADD; the adder result is stored in register regAdd (enable enaAdd, clk, rst), whose output is fed back to adder input b]
Design Step 3:
comparing sum
- Comparing the accumulated values
- Visualizing the score
[Figure: the output of regAdd feeds the comparators cmp16 (A > B? with B = 16) and cmp21 (A > B? with B = 21) and a score register (enable enaScore, clk, rst) that drives the output score]
Design Step 3:
subtracting 10
- Insert a second path to the load register and adder to subtract 10
[Figure: a multiplexer (inputs in0 = -10 and in1 = cardValue, select sel) is placed in front of regLoad, so that either the card value or -10 can be loaded and added to regAdd; the comparators cmp11, cmp16, cmp21 and the score register remain unchanged]
Design Step 4
Data-Path Interface Definition
- Defining the interface of the data-path block
- Defining the edge sensitivity of the clock and the active level of the control signals
DataPath signals:
  - data: cardValue(3:0), score(4:0)
  - clock and reset: clk, rst
  - control: sel, enaLoad, enaAdd, enaScore
  - status: cmp11, cmp16, cmp21
Design Step 5
FSM Interface Definition
- Defining the inputs and outputs of the FSM block
  - FSM input signals: cmp11, cmp16, cmp21, cardReady
  - FSM output signals: finished, lost, newCard, sel, enaLoad, enaAdd, enaScore
Design Step 5:
Interface Definition
Completed FSM-D Hierarchy
[Figure: BlackJack Player FSM-D hierarchy. The DataPath (cardValue(3:0), score(4:0), clk, rst) and the ControlPath (cardReady, start, clk, rst, newCard, lost, finished) are connected by the signals sel, enaLoad, enaAdd, enaScore, cmp11, cmp16 and cmp21]
Design Step 6
FSM State Definition
- Draw a skeleton state with placeholders for the state name and the output signals.
[Figure: skeleton state with a field for the state name and a vector of output signals: finished, lost, newCard, sel, enaLoad, enaAdd, enaScore]
Design Step 7
FSM Design – FSM-D Timing
- Single-clock-cycle scheme
- Moore-type FSM
- FSM-D timing diagram:
  - registered values are available in the next state or when leaving the next state
  - combinational values are available in the current state or when leaving the current state
[Figure: timing diagram over the states LoadReg, CheckVal, Idle1, OpenData and Idle2, showing clock, enable (FSM), registers (D) with the new value, inform (D), select (FSM) and data bus (D) with the data]
Design Step 7
FSM Design
- Design the Moore-type state diagram:
  - conditions on arrows are FSM inputs
  - output values are defined in the states
  - use a lightning-bolt arrow for the asynchronous reset
[Figure: Moore state diagram entered via reset, with the states CallCard, LoadCard, AddCard and Handshake; the transitions are labelled with combinations of cardReady, cmp11, cmp16 and cmp21]
Output-signal vectors per state (signals enaScore, newCard, enaLoad, enaAdd, broke, hold, sel):
  CallCard    0 0 1 - - 0 0
  LoadCard    0 0 1 1 1 0 0
  AddCard     0 0 0 - 0 1 0
  Handshake   0 0 0 - 0 1 0
Design Step 8:
Coding – Data-Path
- All registers with their associated logic are placed in one process (same clock and asynchronous reset).
- Loosely coupled combinatorial logic can be coded with conditional signal assignments.
[Figure: complete data-path with multiplexer, regLoad, adder, regAdd, score register and the comparators cmp11, cmp16 and cmp21]

Process:

  process(clk, rst)
  begin
    if (rst = '0') then                    -- asynchronous reset, active low
      regLoad  <= "00000";
      regAdd   <= "00000";
      regScore <= "00000";
    elsif (clk'event and clk = '0') then   -- falling clock edge
      if (enaAdd = '1') then
        regAdd <= regAdd + regLoad;
      end if;
      ...
    end if;
  end process;

Continuous conditional assignment:

  cmp11 <= '1' when (regLoad = "01011") else '0';
  cmp16 <= '1' when (regAdd > "10000") else '0';
  cmp21 <= '1' when (regAdd > "10101") else '0';
Design Step 8:
Coding – FSM
- One clocked process is used for the state transitions.
- One combinatorial process is used for the state-dependent output assignment.
[Figure: FSM structure with transition logic, state register and output logic (i[k], s[k], s[k+1], o[k])]

Clocked process (state transitions):

  process(clk, rst)
  begin
    if (rst = '0') then
      state <= StartState;
    elsif (clk'event and clk = '0') then
      case state is
        when StartState =>
          state <= CallCard;
        when CallCard =>
          if (cardReady = '1') then
            state <= LoadCard;
          end if;
        when others =>
          state <= IllegalState;
          -- used for VHDL analysis
          -- "null" for synthesis
      end case;
    end if;
  end process;

Combinatorial process (outputs):

  process(state)
  begin
    case state is
      when StartState =>
        outvec <= "000--00";
      when CallCard =>
        outvec <= "001--00";
      when others =>
        outvec <= "UUUUUUU";
        -- used for VHDL analysis
        -- "null" for synthesis
    end case;
  end process;

  finished <= outvec(6);
  lost     <= outvec(5);
  newCard  <= outvec(4);
  ...
Design Step 9:
Test-Bench Design
- Compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is good measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming from or going to the outside of the lab
[Figure: test bench containing stimulus generation, control and response blocks]
Design Step 9:
Test-Bench Design – Test Cycle
- Cycle-based test:
  - apply input patterns at the beginning of the test cycle
  - capture the response after the rising or falling clock edge
[Figure: timing of one test cycle with clock, inputs (apply stimuli) and synchronous outputs (stable, capture response)]
Design Step 9:
Test-Bench Design – Simulation
- Cycle-based test:
  - apply input patterns at the beginning of the test cycle
  - observe the response after the rising or falling clock edge
  - visualize the data-path registers and the FSM state
Errors and Pitfalls
- Asynchronous external inputs to the FSM provoke state hazards:
  - imagine a 0.1 ns hazard can be captured in the state register
  - imagine 100 states in the FSM
  - imagine a 100 MHz clock frequency
  - => 100 errors per second
- => Input synchronization for all "external" (non-synchronous) FSM inputs
[Figure: FSM (transition logic, state register, output logic with i[k], s[k], s[k+1], o[k]) with a synchronization register inserted between the non-synchronous inputs and the transition logic; labels "input with hazards" and "new state always"]
Summary and Conclusion
- The FSM-D architectural model supports a structured design approach.
- A 9-step approach for FSM-D design was presented.
- The task distribution between FSM and data-path is crucial:
  - Ace counting (0, 1 or 2) in the Black Jack dealer: who should do it, the FSM or the data-path?
- The workers/manager analogy is used to assign sub-tasks to the control-path (manager) and the data-path (specialized workers).