Keka Vlsi

VLSI System Design
Overview of VLSI Design Issues
Professor: Dr. Marcel Jacomet (based on transparencies designed by Chris Terman at MIT, completely updated and adapted at MicroLab-I3S)
Overview: microelectronic history, the complexity of microelectronics, design steps. Goal: You are familiar with the history of microelectronics, have an idea of microelectronic complexity and have an overview of the VLSI design steps.
MicroLab, VLSI-1 (1/28)
JMM v1.4
What's expected of you

Class/Homework: 50% in class, 50% homework
Readings from a Starter Guide to VHDL and some articles. Some problems to be worked at home. Self-study of the VHDL language with the help of the CBT CD from Doulouse. Some design exercises to be done in the lab. Specify, design and simulate a small VHDL design project using a data path / finite state machine, then place & route it on an FPGA target technology (due date: July 19th, 2002, at 13h00). One 70-minute in-class test, meant to be duck soup if you've been coming to lectures and doing the lab and homework (date: Friday, July 12th, 2002).
Project 40% of final grade
Test 60% of final grade
Timetable 4th Semester: Introduction to VLSI System Design

Date: 11-15.3. | 18-22.3. | 25-29.3. | 11-19.4. | 22-26.4. | 29.4.-3.5. | 6-10.5. | 13-17.5. | 20-24.5. | 27-31.5. | 3-7.6. | 10-14.6. | 17-21.6. | 24-28.6. | 1-5.7. | 8-12.7. | 15-19.7. | 19.7.
Topic: vlsi1: history & complexity | vlsi8: micro technologies | - | vlsi8: micro technologies | vlsi21: top-down design, VHDL, Ex400, 401 | - | vlsi21 & Ex402 | vlsi21 & Ex404, 405 | vlsi21 & Ex406-408 | vlsi21 & Ex409 | vlsi21 & Ex410 | Ex450 | Ex451 | Ex452 | Test | test discussion and outlook | at 13h00 project due
Self-study: A VLSI tutorial | How a silicon integrated circuit is made | article Hoff | VHDL/CBT | VHDL/CBT | VHDL/CBT | VHDL | VHDL | VHDL chapter 5 | VHDL | finish project | project | project | project | project
So, what's VLSI Systems Design all about?

You'll get a bottom-up tour of how integrated circuits are engineered. We'll talk about: field-effect transistors (how they work, how they're built, effects of new technologies); various design and layout techniques, from the ordinary to the bizarre, for creating combinational and sequential circuits, datapaths, memories, buffers and regular logic structures; how you tackle the problem of designing circuits with 1,000,000 gates. You're not in Digital Technique anymore!

Key Technology Microelectronics

Microelectronics is a key technology of the world economy. Technology development is extremely aggressive, and post-graduate engineering education is important. Other technologies, like software engineering, have an influence. Key technologies may be used as weapons: in 1991, Japan held an 80% share of the world production of 4 Mbit DRAMs. Artificial raw-material shortages are disastrous, and there are very few Swiss chip fabs. Our raw material is a high standard of education, and that means YOU.

What is a VLSI Circuit?

VERY LARGE SCALE INTEGRATED CIRCUIT
Technique where many circuit components and the wiring that connects them are manufactured simultaneously into a compact, reliable and inexpensive chip.
Early (circa 1977) characterization of circuit size, before people realized that the number of components per chip was quadrupling every 24 months (Moore's Law)! This growth rate has slowed in recent years: can you guess why?

Course Outline/Brief history
Bell Labs lays the groundwork:
1940: Ohl develops the PN junction.
1945: Shockley's lab established.
1947: Bardeen and Brattain create the point-contact transistor with two PN junctions. Gain = 18.
1951: Shockley develops the junction transistor, which can be manufactured in quantity.
1952: Dummer forecasts a "solid block [with] layers of insulating, conducting and amplifying materials".
1954: The first transistor radio! Also, TI makes the first silicon transistor (price $2.50).

Early integration
Jack Kilby, working at Texas Instruments, first dreamed up the idea of a monolithic integrated circuit in July 1958. By the end of the year, he had constructed several examples, including the flip-flop shown in the patent drawing above. Components are connected by hand-soldered wires and isolated by shaping, with pn diodes used as resistors. Robert Noyce experimented with transistors in the late '40s while a physics major at college. He went to MIT where, much to his surprise, few people had even heard about the transistor. After getting his PhD in 1953, he worked in industry, finally arriving in Mountain View, CA, at Shockley Semiconductor Labs in 1955.

In 1957, Noyce left Shockley's lab to form Fairchild Semiconductor with Jean Hoerni. Gordon Moore was another founder.
In early 1958, Hoerni invents a technique for diffusing impurities into the silicon to build planar transistors and then using an SiO2 insulator.
In mid 1959, Noyce develops the first true IC using planar transistors, back-to-back pn junctions for isolation, diode-isolated silicon resistors and SiO2 insulation with evaporated metal wiring on top.

Practice makes perfect...

[Die photo: 1.5 mm]
1961: TI and Fairchild introduced the first logic ICs (cost ~$50 in quantity!). This is a dual flip-flop with 4 transistors. 1963: Densities and yields are improving. This circuit has four flip-flops.
[Die photo: 0.97 mm]
1967: Fairchild markets the semi-custom chip shown below. Transistors (organized in columns) could be easily rewired using a two-layer interconnect to create different circuits. This circuit contains ~150 logic gates.
[Die photo: 3.81 mm]
1968: Noyce and Moore leave Fairchild and found Intel. No business plan, just a promise to specialize in memory chips. They raise $3M in two days and move to Santa Clara. By 1971 Intel had 500 employees; by 1983 it had 21,500 employees and $1100M in sales.
The Big Bang

[Die photo: 2.87 mm]
In 1970, making good on its promise to its investors, Intel starts selling a 1 Kbit RAM, the 1103. It was a bear to interface to, but its density and cost made it the only game in town.
In 1971 Intel introduces the first microprocessor, designed by Ted Hoff. The 4004 had 4-bit buses and a clock rate of 108 kHz. It had 2300 transistors and was built in a 10 µm process. It never captured much interest in the market and was soon eclipsed by its more capable brothers.

Exponential Growth
Introduced in 1972, the 8008 had 3,500 transistors supporting a byte-wide data path. Despite its limitations, the 8008 was the first microprocessor capable of playing the role of computer CPU, as demonstrated on the cover of the July '74 issue of Radio-Electronics.
Last, but not least, on our tour is the 8080. Introduced in 1974, the 8080 had 6,000 transistors fabbed in a 6 µm process. The clock rate was 2 MHz, more than enough to ignite the personal computer industry. At least Paul Allen and his partner thought so when they wrote a BASIC interpreter for the 8080 in 1975. They would later collaborate in another, more profitable, venture...

Today
AVP-III Video Codec from Lucent Technologies
Many disciplines have contributed to the current state of the art in VLSI design: solid-state physics, materials science, lithography and fab, device modeling, circuit design & layout, architecture, algorithms, CAD tools.
We'll be concentrating on the right-hand column.
Computer-Aided Design
CAD Tools #1
[Diagram: generate, verify, organize]
Symbolic layout tools to ease the task of physical design; mask verification to ensure manufacturability.
Standard-cell place and route for random logic.
Circuit analysis programs predict circuit behavior at all the process corners. Gate-level and behavioral simulators help you get it right the first time! Tools to do the tedious, repetitive work such as routing, tiling a mosaic of building-block cells, or verifying that the layout and schematic match.
CAD Tools #2
Problem: designing highly complex VLSI circuits (100K to xM fets): classical, iterative procedures are unsuitable; precise transistor models are necessary for reliable predictions; data inflation. Solution: new design methodologies; powerful design tools; high-level design languages; a silicon compiler would be useful.

VLSI Design Challenge

Goal: designing circuits of increasing complexity in ever shorter times. The computer has to take over routine work, liberating the designer from unnecessary low-qualification work; design activities shift to higher-level, abstract work; the computer has to support new design methods.

Chip Complexity #1
Chip classification according to number of active elements and minimal feature size:
classification: SSI | MSI | LSI | VLSI | ULSI | ?
year: 1970 | 1980 | 1985 | 1992 | 2002 | 2010
#transistors: 1 - 100 | 100 - 1k | 1k - 100k | 100k - ?
example: gates | registers | µP | RAM, sig. proc.
minimal channel length: 10 µm | 5 µm | 2 µm | 0.5 µm | 0.13 µm | ?

Chip Complexity #2
Can you really imagine the chip complexity of today's VLSI chips, and not just express it as a mere number? Street-map image (a minimum feature drawn as a 10 m street):
year: 1970 | 1980 | 1992
feature: 10 m ≙ 10 µm | 10 m ≙ 5 µm | 10 m ≙ 0.7 µm
block: 200 m | 200 m | 200 m
chip: 2 mm | 5 mm | 10 mm
town: Biel | Paris | Switzerland
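A quick check of the street-map image, assuming (our reading of the slide) that one minimum feature is drawn as a 10 m wide street: the map scale is 10 m divided by the feature size, and the chip edge scales by the same factor. The function name `map_edge_km` is hypothetical.

```python
def map_edge_km(feature_um: float, chip_mm: float, street_m: float = 10.0) -> float:
    """Chip edge length on the 'street map', in km.

    Assumes one minimum feature of feature_um is drawn as a street of width street_m.
    """
    scale = street_m / (feature_um * 1e-6)   # map metres per metre of silicon
    return chip_mm * 1e-3 * scale / 1000.0   # chip edge on the map, in km

for year, feature, chip in [(1970, 10.0, 2.0), (1980, 5.0, 5.0), (1992, 0.7, 10.0)]:
    print(year, f"{map_edge_km(feature, chip):.0f} km")
```

Under this assumption the 1970 chip maps to about 2 km (a town like Biel), the 1980 chip to about 10 km (Paris) and the 1992 chip to about 143 km, the order of magnitude of Switzerland.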

Architecture
(Multiple choice) This is a picture of
(A) a programmable general-purpose ASIC with 1/4 million transistors on a 40 mm² die, designed in a 0.7 µm CMOS full-custom technology.
(B) a processor able to execute 64 knowledge-based rules in parallel thanks to a 3-stage pipelined architecture with hard-coded adder, multiplier and divider.
(C) the fastest fuzzy processor in the world, designed by MicroLab-I3S and presented at the international FUZZ98 conference in New Orleans.
ANSWER: _________
Circuit Design & Layout

[Figure: layout examples: standard cell, full custom, RAM generator]
Q: Which engineer drew the most fets? ______

VLSI: The Ideal Implementation Medium?

VLSI gives the designer control over almost everything: architecture, logic design, speed, area, power. Densities are increasing and costs decreasing with each passing year. It is used by almost everyone: "No one gets fired for building an ASIC". It was the enabling technology for much of the economic growth of the '80s and '90s, and it will no doubt continue in its starring role for some time to come. Is life really a bowl of cherries?

VLSI Fact-of-Life #1: So much to do, so little time

You need a design methodology: budget ($, speed, area, power, schedule, risk); low-level building blocks, high-level architecture; behavioural design, verification; logic design, verification; layout, verification.

VLSI Fact-of-Life #2: You can't reach in and fix it

Notice that the word "verification" kept appearing on the previous slide. Mistakes can be costly:
find bug(s): ? (time), ? (cost)
reverify: 1 week, Ecu 10k
new masks: 3 days, Ecu 25k
fab run: 12 weeks, Ecu 1k/wafer
slip ship date: Ecu Ecu Ecu
There's a lot that needs checking:
circuit must operate at all corners: verified at building-block level
logic must be correct, operate reliably: verified at RTL/gate level
chip has to interoperate with system: verified at behavioral level
chip has to be manufacturable: verified at mask level, at tester
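The known part of a respin can be added up in a few lines. A minimal sketch, assuming the three steps run strictly one after another and taking a hypothetical lot of 20 wafers; the time to find the bug and the cost of slipping the ship date are left out because the slide gives no numbers for them.

```python
from dataclasses import dataclass

@dataclass
class Step:
    name: str
    days: int        # duration in calendar days
    cost_ecu: int    # cost in Ecu

def respin(wafers: int) -> tuple[int, int]:
    """Total (days, Ecu) for reverify + new masks + fab run, steps assumed sequential."""
    steps = [
        Step("reverify", 7, 10_000),              # 1 week, Ecu 10k
        Step("new masks", 3, 25_000),             # 3 days, Ecu 25k
        Step("fab run", 12 * 7, 1_000 * wafers),  # 12 weeks, Ecu 1k per wafer
    ]
    return sum(s.days for s in steps), sum(s.cost_ecu for s in steps)

days, cost = respin(wafers=20)
print(f"{days} days, Ecu {cost:,}")
```

For 20 wafers this gives 94 days and Ecu 55,000, before counting the bug hunt and the lost market window.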
VLSI Fact-of-Life #3: Verification is a tedious task

VLSI Fact-of-Life #4: You can't find all the bugs
The key word here is "find": one can't explore the behaviour of the circuit under all possible conditions; some of the bugs arise from unanticipated interactions which, by definition, one never thinks of testing; it's not clear when one is done looking for bugs! Time pressures mean that most searches stop too soon.
The trick is to choose some implementation rules that result in a circuit that is correct by construction*. For example: choose a simple clocking scheme; module inputs must go only to fet gates; disallow unclocked feedback; make register t(clk-to-Q) > t(hold) + skew; use poly only for local interconnect; no diffusion wires; etc., etc., etc.
* or at least avoid as many problems as possible!
VLSI Fact-of-Life #5: Nobody's perfect

Plan for what happens after you turn it on and nothing happens. Provide lots of observability and controllability: you'll need to localize and then find the bug. Have a way to run the chip slowly and/or stop it without it burning up or losing bits. Figure out how to track down performance problems without relying on fast I/O (tester pins are slow!). Leave room in the budget (time, Ecu) for debugging. Write and run your manufacturing tests before tape-out.

Microelectronics in 4th Semester

history & complexity microelectronic technologies
exercises with CAD tools synthesis
EXPERIENCE data path / fsm project
VHDL
design flow Course material Textbook from Weste & Eshraghian for 4th and 5th semester (voluntary) Copy of transparencies (placeholder for private notes) VHDL Starter (recommended) CAD Exercises on the MicroLab web pages CBT CD on VHDL for your PC (lending from MicroLab in 4th semester) MicroLab, VLSI-1 (27/28) different small articles
Coming Up...
We'll be traveling top-down in the 4th semester and bottom-up in the 5th & 6th semesters. Next topic: microelectronic technologies like standard cell, gate array, sea-of-gates, macro cell, FPGA, tiny micro-controllers. Readings for next time (web CBT tutorials, see http://www.microlab.ch/academics/courses): How a silicon integrated circuit is made (web CBT); A VLSI Tutorial, up to the chapter with NAND/NOR (web CBT from Uni Manchester) (German); T. Hoff: article about the µP history (German). To learn more about Intel's early days and to ogle some die photos of oldie-but-goodie chips, browse the Intel link on the MicroLab VLSI course web page.

VLSI Design I
The MOSFET model
Wow! Are device models as nice as Cindy?
Overview: the large-signal MOSFET model and second-order effects; MOSFET capacitances; introduction to fet process technology. Goal: You can use the large-signal equivalent MOS device equations. You are familiar with second-order effects like the body effect and channel-length modulation. You know the MOS capacitances. You know the basic steps in MOS fabrication.
Let's build a MOSFET

There are lots of different recipes to choose from. Like most things in life, you get what you pay for: the ability to have good bipolar devices, radiation hardness, reduced latch-up and substrate noise are all extra-cost options. We'll consider a general process: bulk CMOS with a p-type substrate:
Use <100> surface to minimize surface charge.
500 µm slice of a silicon ingot that has been doped with an acceptor (typically boron) to increase the concentration of holes to 10^14/cm³ - 10^18/cm³.
p-type substrate: the back is metallized to provide a good ground connection.
Good for n-channel fets, but p-channel fets will need an n-type well (or tub) to live in!
Next, a thick (0.4 µm) layer of silicon dioxide, called field oxide, is formed on the surface by oxidation in wet oxygen. This is then etched to expose the surface where we want to make a mosfet:
Now grow a thin (0.01 µm = 100 Å) layer of silicon dioxide, called gate oxide, on the surface by exposing the wafer to dry oxygen.
The gate oxide needs to be of high quality: uniform thickness, no defects! The thinner the gate oxide, the more oomph the fet will have (we'll see why soon), but the harder it is to make it defect-free.

On top of the thin oxide, a 0.7 µm thick layer of polycrystalline silicon, called polysilicon or poly for short, is deposited by CVD. The poly layer is patterned and plasma etched (thin ox not covered by poly is etched away too!), exposing the surface where the source and drain junctions will be formed:
[Figure labels: poly wires; gate oxide (only under poly); field oxide; exposed surface for source and drain junctions]
Poly has a high sheet resistance (25 ohms/square), which can be reduced by adding a layer of a silicided refractory metal such as titanium (TiSi2), tantalum (TaSi2) or molybdenum (MoSi2). These have sheet resistances of 1, 3 or 5 ohms per square, respectively. This is great for memory structures that have lots of poly wiring.
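A wire's resistance follows from its sheet resistance as R = R_s · (L/W), i.e. ohms per square times the number of squares. A small sketch using the sheet resistances quoted above, assuming for illustration that a silicided poly line behaves like a single layer with the silicide's sheet resistance; the 100 µm × 1 µm wire is a hypothetical example.

```python
SHEET_OHMS_PER_SQ = {
    "poly": 25.0,   # plain polysilicon
    "TiSi2": 1.0,   # titanium silicide
    "TaSi2": 3.0,   # tantalum silicide
    "MoSi2": 5.0,   # molybdenum silicide
}

def wire_resistance(layer: str, length_um: float, width_um: float) -> float:
    """Resistance in ohms of a straight wire: sheet resistance x number of squares."""
    squares = length_um / width_um
    return SHEET_OHMS_PER_SQ[layer] * squares

for layer in SHEET_OHMS_PER_SQ:
    print(layer, wire_resistance(layer, 100.0, 1.0), "ohms")
```

A 100-square poly line comes out at 2500 ohms, against 100 ohms with TiSi2, which is why silicide pays off in structures with lots of poly wiring.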

The entire surface is doped, either by diffusion or ion implantation, with phosphorus (an electron donor), which creates two n-type regions in the substrate. The phosphorus also penetrates the poly, reducing its resistance and affecting the nfet's threshold.
Diffusions are self-aligned with poly. n+ wires: 20-30 ohms/sq. [Figure labels: n+, n+, p]
Finally, an intermediate oxide layer is grown and then reflowed to flatten its surface. Holes are etched in the oxide (where contacts to poly/diff are wanted) and aluminum is deposited, patterned and etched.
[Figure labels: metal wires (0.08 ohms/square); ??? diff contact (0.25 - 10 ohms)]
n-channel MOS field-effect transistor!

NFET Operation
Picture shows the configuration when Vgs < Vto: Ids = 0.
[Figure labels: terminals S, G, D and bulk B (almost always ground); n+ source and drain: mobile electrons, fixed positive ions (n+ means heavily doped with donors, doesn't imply positive charge!); p substrate: mobile holes, fixed negative ions; depletion layer: no mobile carriers, but fixed negative ions (slight intrusion into n+, but mostly in the p area)]
The terminal with the higher voltage is labelled D, the other is labelled S, so Ids >= 0.
Other symbols: [nfet symbols with terminals D, G, S, B]

FET = field effect transistor

The four terminals of a fet (gate, source, drain and bulk) connect to conducting surfaces that generate a complicated set of electric fields in the channel region which depend on the relative voltages of each terminal.
Picture shows the configuration when Vgb > Vto. [Figure labels: gate, source, drain, bulk; vertical field Ev; horizontal field Eh; "inversion happens here" at the surface under the gate]
INVERSION: A sufficiently strong vertical field will attract enough electrons to the surface to create a conducting n-type channel between the source and drain.
CONDUCTION: If a channel exists, a horizontal field will cause a drift current from the drain to the source. Expect Ids proportional to Vds*(W/L)?
Threshold voltage
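The gate voltage required to form the channel is called the threshold voltage. Many factors affect the gate-source voltage at which the channel becomes conductive. Threshold voltage for source-bulk voltage zero:
$$V_{T0} = V_{t\text{-}mos} + V_{fb}, \qquad V_{t\text{-}mos} = 2\phi_F + \frac{Q_b}{C_{ox}}, \qquad V_{fb} = \phi_{ms} - \frac{Q_{fc}}{C_{ox}}, \qquad C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}$$
$$2\phi_F = \frac{2kT}{q}\ln\frac{N_A}{n_i}, \qquad \phi_{ms} = -\frac{kT}{q}\ln\frac{N_D N_A}{n_i^2}, \qquad Q_b = \sqrt{2\varepsilon_{si}\, q N_A \, 2\phi_F}$$
Typical values: V_T0 = 0.61 V for n-channel, -0.61 V for p-channel fets.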
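The process-dependent terms can be evaluated numerically. A minimal sketch using the "Useful Constants" table and the Alcatel 0.5 µm values N_A = 4E16 cm^-3 and t_ox = 1E-8 m; it computes only C_ox, 2φ_F and the bulk-charge term Q_b/C_ox, since Q_fc and N_D of the gate are not tabulated here. The function name `threshold_terms` is hypothetical.

```python
import math

EPS0 = 8.8542e-12   # F/m, permittivity of vacuum
Q = 1.6022e-19      # C, charge of electron
K_B = 1.381e-23     # J/K, Boltzmann's constant
NI_CM3 = 1.45e10    # cm^-3, intrinsic carrier concentration

def threshold_terms(na_cm3=4e16, tox=1e-8, temp_k=300.0):
    """Return (C_ox [F/m^2], 2*phi_F [V], Q_b/C_ox [V]) for a p-type substrate."""
    cox = 3.9 * EPS0 / tox                                    # C_ox = eps_ox / t_ox
    two_phi_f = 2 * K_B * temp_k / Q * math.log(na_cm3 / NI_CM3)
    na_m3 = na_cm3 * 1e6                                      # cm^-3 -> m^-3
    qb = math.sqrt(2 * 11.7 * EPS0 * Q * na_m3 * two_phi_f)   # bulk charge, C/m^2
    return cox, two_phi_f, qb / cox

cox, two_phi_f, qb_term = threshold_terms()
print(f"Cox = {cox:.3e} F/m^2, 2phiF = {two_phi_f:.3f} V, Qb/Cox = {qb_term:.3f} V")
```

The computed 2φ_F of about 0.77 V agrees with the PHI entry of the Alcatel parameter table, and Q_b/C_ox comes out near 0.29 V.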

Body effect (second order)

As Vsb increases, the depth of the depletion region increases, exposing more of the fixed acceptor (i.e. negative) ions in the substrate. Thus the second term in the threshold voltage equation on the previous slide increases from
$$\sqrt{2\varepsilon_{si}\, q N_A \, 2\phi_F} \quad \text{to} \quad \sqrt{2\varepsilon_{si}\, q N_A (V_{sb} + 2\phi_F)}$$
and the threshold voltage of the n-channel transistor is now:
$$V_{tn} = V_{tn0} + \gamma\left(\sqrt{V_{sb} + 2\phi_F} - \sqrt{2\phi_F}\right), \qquad \gamma = \frac{\sqrt{2\varepsilon_{si}\, q N_A}}{C_{ox}}$$
[Figure: series-connected fets T1 (Vsb = 0) and T2 (Vsb > 0), so Vt2 > Vt1]
As we'll see, this effect comes into play in series-connected fets, where only one of the fets will have Vsb = 0 and the other fets will have Vsb > 0 and a higher threshold voltage.
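The body-effect formula can be evaluated directly for Ex vlsi2.2 (nfet, Alcatel 0.5 µm process, V_sb = 2.2 V). A sketch, taking 2φ_F = 0.77 V (PHI from the parameter table) and computing γ from N_A = 4E16 cm^-3 and t_ox = 1E-8 m; the function names are hypothetical.

```python
import math

EPS0 = 8.8542e-12   # F/m
Q = 1.6022e-19      # C

def body_gamma(na_cm3=4e16, tox=1e-8):
    """gamma = sqrt(2*eps_si*q*N_A) / C_ox, in V^0.5."""
    cox = 3.9 * EPS0 / tox
    return math.sqrt(2 * 11.7 * EPS0 * Q * na_cm3 * 1e6) / cox

def vt_shift(vsb, two_phi_f=0.77, gamma=None):
    """Threshold increase Vtn - Vtn0 (V) for a source-bulk voltage vsb (V)."""
    if gamma is None:
        gamma = body_gamma()
    return gamma * (math.sqrt(vsb + two_phi_f) - math.sqrt(two_phi_f))

print(f"gamma = {body_gamma():.3f} V^0.5, dVt(2.2 V) = {vt_shift(2.2):.3f} V")
```

This reproduces γ ≈ 0.334 V^0.5 and ΔV_tn ≈ 0.282 V, the results given for Ex vlsi2.1 and Ex vlsi2.2.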
Basic DC equations
MOS transistors have 3 regions of operation: cutoff region (subthreshold); linear region (triode region); saturated region (active region).
[Figure: fet layout with polysilicon gate, SiO2, source diffusion and drain diffusion; channel width W and length L]
Cutoff or subthreshold region: Vgs <= Vt, Ids = 0. There is still a small current, described in the second-order effects (weak inversion), which is important to model for analog circuits. [Plot: Ids vs. Vds]
Linear operating region

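Conditions: Vgs > Vt, 0 < Vds < Vdsat. [Figure: biased nfet with source at Vs and drain current Ids]
Larger Vgs creates a deeper channel, which increases Ids. The channel length is almost always the minimum allowable; the mobility is higher for electrons than for holes (µn > µp).
Larger Vds increases the drift current but also reduces the vertical field component, which in turn makes the channel less deep. The channel will pinch off when
Vds = Vgs - Vt = Vdsat
$$I_{ds} = \frac{\mu\,\varepsilon_{ox}}{t_{ox}}\frac{W}{L}\left[(V_{gs} - V_t)V_{ds} - \frac{V_{ds}^2}{2}\right], \qquad k = \mu C_{ox} = \frac{\mu\,\varepsilon_{ox}}{t_{ox}} \ \text{(fet gain factor)}, \qquad \beta = k\frac{W}{L}$$
Ids is only linear in Vds when Vds is small, otherwise parabolic. It reaches its maximum value at Vds = Vdsat, but then the channel is pinched off (see next slide).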
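The fet gain factor k and β for the Alcatel process follow from the table values (t_ox = 1E-8 m, µ0 = 290 cm²/Vs for n and 72 cm²/Vs for p). A minimal sketch; the W = 1 µm, L = 0.5 µm geometry is the one used in Ex vlsi2.3, and `gain_factors` is a hypothetical helper name.

```python
EPS0 = 8.8542e-12        # F/m
EPS_OX = 3.9 * EPS0      # F/m, permittivity of SiO2

def gain_factors(mu_cm2_per_vs, tox=1e-8, w=1e-6, l=0.5e-6):
    """Return (k = mu*Cox, beta = k*W/L), both in A/V^2."""
    cox = EPS_OX / tox                   # F/m^2
    k = mu_cm2_per_vs * 1e-4 * cox       # mobility converted cm^2/Vs -> m^2/Vs
    return k, k * w / l

kn, beta_n = gain_factors(290.0)
kp, beta_p = gain_factors(72.0)
print(f"kn = {kn*1e6:.1f} uA/V^2, kp = {kp*1e6:.1f} uA/V^2, beta_n = {beta_n*1e6:.0f} uA/V^2")
```

The output (kn ≈ 100 µA/V², kp ≈ 24.9 µA/V², β_n ≈ 200 µA/V²) is consistent with the results quoted for Ex vlsi2.1 and Ex vlsi2.3.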

Saturated operating region

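Conditions: Vgs > Vt, Vdsat < Vds. [Figure: biased nfet with source at Vs and drain current Ids]
The voltage at the channel end remains essentially constant at Vdsat, so the drift current also remains constant: the device is in saturation.
Electrons arriving from the source are injected into the drain depletion region and accelerated towards the drain by a field proportional to Vds - Vdsat, usually reaching the drift velocity limit.
$$I_{ds(sat)} = \frac{\mu\,\varepsilon_{ox}}{t_{ox}}\frac{W}{L}\frac{(V_{gs} - V_t)^2}{2} = \frac{\beta}{2}(V_{gs} - V_t)^2, \qquad \beta = \frac{\mu\,\varepsilon_{ox}}{t_{ox}}\frac{W}{L}$$
This is just Ids from the previous slide evaluated at Vds = Vdsat.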

Channel-length modulation (second order)

Conditions: Vgs > Vt, Vdsat < Vds. [Figure: biased nfet with source at Vs and drain current Ids; pinched-off channel of effective length L' = L - dL]
This looks just like a fet with a channel length of L' < L. Shorter L' implies greater Ids...
As Vds increases, dL gets larger, so the effective channel length gets shorter and Ids(sat) increases. dL is proportional to sqrt(Vds - Vdsat), but most people approximate channel-length modulation as a linear effect:
$$I_{ds(sat)} = \frac{\mu\,\varepsilon_{ox}}{t_{ox}}\frac{W}{L}\frac{(V_{gs} - V_t)^2}{2}\,(1 + \lambda V_{ds})$$

NFET Ids curves

Put it together and what have you got?
[Plot: Ids vs. Vds for Vgs = 0, 1, 2, 3, 4 and 5 V]
Can you find the following in the plot? Ids vs. Vds when Vgs = 0 V; Ids vs. Vds when Vgs = 5 V; the value of Vt; the value of Vdsat; evidence of body effect; evidence of channel-length modulation.
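The three regions and the channel-length modulation factor can be combined into one function that produces a family of curves like the plot. A sketch, assuming β = 200 µA/V² (Ex vlsi2.3), V_t = 0.61 V and a hypothetical λ = 0.05 V⁻¹; like SPICE level 1, it applies the (1 + λV_ds) factor in the linear region too so that I_ds stays continuous at V_dsat. Subthreshold current and the body effect are ignored.

```python
def nfet_ids(vgs, vds, beta=200e-6, vt=0.61, lam=0.05):
    """Square-law nfet drain current in A (Vds >= 0 assumed)."""
    vgst = vgs - vt
    if vgst <= 0:
        return 0.0                                  # cut-off
    if vds < vgst:                                  # linear: Vds < Vdsat = Vgs - Vt
        ids = beta * (vgst * vds - vds * vds / 2)
    else:                                           # saturation
        ids = beta / 2 * vgst ** 2
    return ids * (1 + lam * vds)                    # channel-length modulation

# Ids in uA: rows Vgs = 0..5 V, columns Vds = 0..5 V
for vgs in range(6):
    row = [nfet_ids(vgs, vds) * 1e6 for vds in range(6)]
    print(f"Vgs={vgs} V:", " ".join(f"{i:7.1f}" for i in row))
```

In the printed table the Vgs = 0 row is all zeros (cut-off), each row flattens once Vds passes Vgs - Vt, and the slight rise beyond that point is the λ term.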
SPICE Models
Different models are used in circuit simulators: level 1 is our simple model, including the most important second-order effects described; the level 2 model is based on device physics; level 3 is a semi-empirical model that allows the equations to be matched to the real circuit. Example: the BSIM model from Berkeley models subthreshold characteristics. A summary of the main SPICE DC parameters used in all three models is given at the end of this chapter.
* ...
M1 4 3 5 0 nfet W=1u L=0.5u AS=1p AD=1p PS=3u PD=3u
* ...
.MODEL nfet NMOS
+ TOX=1E-8
+ CGBO=345p CGSO=138p CGDO=138p
+ CJ=775u CJSW=344p MJ=0.35 MJSW=0.26 PB=0.75
* ... (further model parameters)
MOSFET Capacitance Estimation

The dynamic response of MOS systems strongly depends on the parasitic capacitances associated with the MOS device. The total load capacitance on the output of a CMOS gate is the sum of: gate capacitance (of other inputs connected to out); diffusion capacitance (of drain/source regions); routing capacitance (output to other inputs).
[Figure: MOSFET cross-section and equivalent circuit with gate capacitances Cgs, Cgd, Cgb (through the gate oxide of thickness tox) and junction capacitances Csb, Cdb to the substrate; channel and depletion layer shown]
MOSFET gate capacitances

Cg = Cgd + Cgs + Cgb. Oxide-related capacitances come in two forms:
Overlap capacitance (extrinsic), since the gate slightly overhangs the diffusions and the bulk:
for both Cgs and Cgd: C(overlap) = W · LD · Cox, where LD is the amount of overlap;
for Cgb: C(overlap) = 2L · CGB0;
for SPICE: Cgs = W · CGS0, Cgd = W · CGD0, Cgb = 2L · CGB0.
Channel-charge related capacitances (intrinsic):
cut-off: Cgb = Cox W L, Cgs = Cgd = 0;
linear: Cgb = 0 (shielded by the channel), Cgs = Cgd = 0.5 Cox W L (equally shared between S and D; note the capacitive coupling of gate and drain/source);
saturation: Cgb = 0 (channel pinched off), Cgd = 0 (channel shortened), Cgs = 0.67 Cox W L.
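Adding the region-dependent intrinsic part to the SPICE-style overlap part gives the gate capacitance in each region. A sketch with the Alcatel 0.5 µm values (t_ox = 1E-8 m, CGS0 = CGD0 = 1.38E-10 F/m, CGB0 = 3.45E-10 F/m) and the W = 1 µm, L = 0.5 µm fet of Ex vlsi2.4; `gate_caps` and the region labels are our own names.

```python
EPS0 = 8.8542e-12   # F/m

def gate_caps(region, w, l, tox=1e-8, cgs0=1.38e-10, cgd0=1.38e-10, cgb0=3.45e-10):
    """Return (Cgs, Cgd, Cgb) in F: intrinsic part for the region plus SPICE-style overlap."""
    c_ox_wl = 3.9 * EPS0 / tox * w * l          # Cox * W * L
    intrinsic = {
        "cutoff": (0.0, 0.0, c_ox_wl),
        "linear": (0.5 * c_ox_wl, 0.5 * c_ox_wl, 0.0),
        "saturation": (0.67 * c_ox_wl, 0.0, 0.0),
    }[region]
    cgs = intrinsic[0] + w * cgs0               # overlap per unit width
    cgd = intrinsic[1] + w * cgd0
    cgb = intrinsic[2] + 2 * l * cgb0           # overlap per 2L
    return cgs, cgd, cgb

for region in ("cutoff", "linear", "saturation"):
    total = sum(gate_caps(region, 1e-6, 0.5e-6))
    print(f"{region:10s} Cg = {total * 1e15:.2f} fF")
```

In cut-off and in the linear region the total is about 2.35 fF, matching the Cgate result of Ex vlsi2.4; in saturation it drops to about 1.78 fF because only 2/3 of Cox W L remains.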

MOSFET diffusion capacitances

Junction capacitances Cdb and Csb are a function of the applied terminal voltages and diffusion dimensions:
[Figure: source/drain diffusion of junction depth xj; the bottom junction faces the p-type substrate, the sidewalls face the p+ channel stop, one sidewall faces the channel]
$$C_{diff} = \frac{C_j A}{\left(1 - \frac{V_j}{V_b}\right)^{M_j}} + \frac{C_{jsw} P}{\left(1 - \frac{V_j}{V_b}\right)^{M_{jsw}}}$$
where C_j is the zero-bias capacitance per unit area of the bottom junction, A the area of the diffusion, C_jsw the zero-bias capacitance per unit length of sidewall junction, P the perimeter of the diffusion, V_j the junction voltage (negative for reverse bias), V_b the built-in junction potential, and M_j, M_jsw the grading coefficients of the bottom and sidewall junctions.
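The formula can be evaluated for the drain of the Ex vlsi2.4 fet (A = 1 µm², P = 3 µm, reverse-biased by 3 V so V_j = -3 V) with the nmos values CJ = 7.75E-4 F/m², CJSW = 3.44E-10 F/m, PB = 0.7556 V, MJ = 0.35 and MJSW = 0.26. A minimal sketch; `junction_cap` is a hypothetical helper name.

```python
def junction_cap(vj, area, perim, cj=7.75e-4, cjsw=3.44e-10, vb=0.7556, mj=0.35, mjsw=0.26):
    """Diffusion capacitance in F; vj is the junction voltage (negative when reverse biased)."""
    bottom = cj * area / (1 - vj / vb) ** mj          # bottom junction
    sidewall = cjsw * perim / (1 - vj / vb) ** mjsw   # sidewall junction
    return bottom + sidewall

c_rev = junction_cap(-3.0, 1e-12, 3e-6)   # reverse-biased by 3 V
c_zero = junction_cap(0.0, 1e-12, 3e-6)   # zero bias
print(f"Cdiff(-3 V) = {c_rev * 1e15:.2f} fF, Cdiff(0 V) = {c_zero * 1e15:.2f} fF")
```

The reverse bias widens the depletion layer, so the capacitance drops from about 1.81 fF at zero bias to about 1.12 fF at V_j = -3 V.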

P-channel MOSFETs
[Figure: pfet cross-section with terminals S, G, D, B; p+ source and drain in an n-well, in the p substrate]
The threshold voltage is negative since we need to attract holes to form the inversion layer.
A PFET is built inside its own substrate: an n-type well or tub diffused into the p-type bulk substrate. Don't forget well contacts! The n-well is always connected to Vdd to keep the pn junction back-biased.
The terminal with the lower voltage is labelled D, the other is labelled S.
Other symbols: [pfet symbols with terminals G, S, D, B]
off: Vgs > Vt; lin: Vgs < Vt, Vds > Vgs - Vt; sat: Vgs < Vt, Vds < Vgs - Vt
Depletion-mode MOSFETs

[Figure: nfet cross-section with terminals S, G, D, B; n+ source and drain in the p substrate]
The channel is doped with donors to give a negative threshold voltage, i.e., depletion fets conduct even at Vgs = 0.
This mosfet conducts at Vgs = 0 but, like ordinary enhancement fets, it will conduct more current as Vgs increases. One can build logic circuits with only n-channel devices (NMOS): enhancement fets for pulldowns and depletion fets as static pullups. Since NMOS logic dissipates DC power, it's been largely replaced by CMOS.

Coming Up...
Next topic: static characteristics of MOS inverters: input and output voltages, noise margins, power dissipation.
Readings for next time: Weste sections 2 thru 2.2.3 except 2.2.2.4 - 2.2.2.7 (fet models), 3 thru 3.2.2 (process technology) and 4.3 through 4.3.4 (capacitances).
CBT: Study the chip fabrication text of the University of Manchester via the MicroLab VLSI course web link.

Useful Constants
sym | value | units | description
ε0 | 8.8542E-12 | F/m | permittivity of vacuum
εox | 3.9 ε0 | F/m | permittivity of SiO2
εSi | 11.7 ε0 | F/m | permittivity of silicon
VT | 25.8 | mV | kT/q (@300K)
q | 1.6022E-19 | C | charge of electron
k | 1.381E-23 | J/K | Boltzmann's constant
ni | 1.45E10 | cm-3 | intrinsic carrier concentration

Alcatel 0.5 µm Process Parameters

sym | param | nmos | pmos | units | description
Vt0 | VTO | 0.61 | -0.61 | V | threshold voltage
tox | TOX | 1E-8 | 1E-8 | m | thin oxide thickness
NA | NSUB | 4E16 | 4E16 | cm-3 | substrate doping density
µ0 | U0 | 290 | 72 | cm2/Vs | charge mobility
k | KP | | | A/V2 | fet gain factor
γ | GAMMA | | | V^0.5 | bulk threshold param.
Cox | COX | | | F/m2 | oxide capacitance
λ | | | | V-1 | channel length modulation
λ/L | | 1e-8 | 2e-8 | V-1m-1 | channel length mod. fact.
| PB | 0.7556 | 0.78469 | V | built-in junction potential
2φF | PHI | 0.77 | 0.77 | V | surface inversion potential
Cgb0 | CGB0 | 3.45E-10 | dito | F/m | overlapping cap per 2L
Cgs0 | CGS0 | 1.38E-10 | dito | F/m | overlapping cap per W
Cgd0 | CGD0 | 1.38E-10 | dito | F/m | overlapping cap per W
Cj | CJ | 7.75E-4 | 8.15E-4 | F/m2 | zero-bias cap / unit A
Cjsw | CJSW | 3.44E-10 | 3.54E-10 | F/m | zero-bias cap per unit P
Mj | MJ | 0.35 | 0.36 | | grading coeff for bottom
Mjsw | MJSW | 0.26 | 0.27 | | grading coeff sidewall
Exercises: VLSI-2
Ex vlsi2.1 (difficulty: easy): Calculate the missing parameters on the previous transparency, like the intrinsic transconductance k, the bulk threshold parameter γ and the oxide capacitance Cox of an nfet (Alcatel 0.5 µm process). Result: kn = 100 µA/V², kp = 24.9 µA/V², γ = 0.334 V^0.5, Cox = 3.45E-7 F/cm² (see Weste pp48ff)
Ex vlsi2.2 (difficulty: easy): Calculate the threshold voltage shift due to the body effect of an nfet at Vsb = 2.2 V (Alcatel 0.5 µm process). Result: dVtn = 0.282 V (see Weste pp55)
Ex vlsi2.3 (difficulty: easy): Calculate the transconductance βn of an nfet (Alcatel 0.5 µm process), W = 1 µm, L = 0.5 µm. Result: βn = 200 µA/V² (see Weste pp53)
Ex vlsi2.4 (difficulty: easy): Calculate the capacitances of an nfet with Vsb = Vdb = 3 V, W = 1 µm, L = 0.5 µm, A = 1 µm², P = 3 µm (Alcatel 0.5 µm process). Result: Cgate = 2.35 fF, Cdrain = Csource = 1.2 fF (see Weste pp183-191)
Weste pp99: 2.10: Have a look at ex 8, 9
VLSI Design I
Static characteristics of MOS inverter
Static characteristics? Does that mean it's not going to move?
Overview: static transfer characteristic of CMOS gates. Goal: You know the transfer characteristic of CMOS gates and know how to calculate noise margins.

NFET Review
[Figure: nfet symbol with terminals D, G, S; Vgs; Vds >= 0; 0.7 V; plot of Ids vs. Vds with the boundary Vds = Vgs - Vt]
Operating regions:
cut-off: Vgs < Vt
linear: Vgs >= Vt, Vds < Vdsat
saturation: Vgs >= Vt, Vds >= Vdsat
PFET Review
[Figure: pfet symbol with terminals D, G, S; Vgs; Vds <= 0; -0.7 V; plot of -Ids vs. -Vds with the boundary Vds = Vgs - Vt]
Operating regions:
cut-off: Vgs > Vt
linear: Vgs <= Vt, Vds > Vdsat
saturation: Vgs <= Vt, Vds <= Vdsat
Bipolar Logic
Isn't this a CMOS course?
Bipolar = two signal levels: 0 when V is near 0, 1 when V is near Vdd.

Inverter recipe [figure: Vout connected to Vdd through a pullup and to ground through a pulldown, both controlled by Vin]:
pullup: make this connection when Vin is near 0, so that Vout = Vdd;
pulldown: make this connection when Vin is near Vdd, so that Vout = 0.
One power supply => low-impedance source for 2 levels; receivers have a simple job => only make one decision; no DC power if connections are not made at the same time; Boolean logic has been around a long time.

Characterizing Inverters
What goals do we want to achieve with our inverter implementation (and, more generally, other functions)? Fast propagation delay (next lecture!); low power dissipation; compact layout; noise immunity.
Draw the voltage-transfer curve (VTC) for the inverter. Shade in the areas that the VTC can't enter. What can we say about gain? What is the ideal inverter VTC?
[Plot: Vout vs. Vin with levels VOH, VOL, VIL, VIH marked]

Noise Margin
Are there other ways of signalling?
Noise immunity: since we're signalling values using voltages, we want good noise margins. This means that we need to make an allowance for noise when assigning voltage levels for valid inputs and outputs. Definition:
$$NM_L = V_{IL,\max} - V_{OL,\max}, \qquad NM_H = V_{OH,\min} - V_{IH,\min}$$
[Figure: output characteristics (logical high output range from VOHmin to Vdd, logical low output range from Vss to VOLmax) beside input characteristics (logical high input range above VIHmin, logical low input range below VILmax)]
Choosing signal voltages

This is a subject on which reasonable people can disagree! One possible line of attack:
[Plot: merged VTC for all process corners & devices, Vout vs. Vin]
Step 1: pick VIL and VIH. We don't want to amplify noise, so find the values of Vin where the VTC gain = 1 or -1. Choose the smallest VIL and the largest VIH.
Step 2: pick VOL and VOH. Choose values so that (1) the VTC is in legal territory and (2) the desired noise margins are left.
[Plot: Vout vs. Vin with VOH, VOL, VIL, VIH and the noise margins NML, NMH marked]
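For the special case of a matched CMOS inverter (β_n = β_p, V_tn = -V_tp = V_t), applying Step 1 (unity-gain points) to the square-law model gives standard textbook closed forms: V_IL = (3V_dd + 2V_t)/8 and V_IH = (5V_dd - 2V_t)/8, with VTC outputs V_IL + V_dd/2 and V_IH - V_dd/2 at those two points. A sketch evaluating them for the case of Ex vlsi3.2 (V_dd = 3.3 V, V_t = 0.61 V); as in that exercise, V_OL and V_OH here are the VTC outputs at the unity-gain points, not the rail values. The function name is hypothetical.

```python
def matched_inverter_levels(vdd, vt):
    """Unity-gain levels for a CMOS inverter with beta_n = beta_p and Vtn = -Vtp = vt."""
    vil = (3 * vdd + 2 * vt) / 8
    vih = (5 * vdd - 2 * vt) / 8
    voh = vil + vdd / 2          # VTC output at Vin = VIL
    vol = vih - vdd / 2          # VTC output at Vin = VIH
    return {"VIL": vil, "VIH": vih, "VOL": vol, "VOH": voh,
            "NML": vil - vol, "NMH": voh - vih}

for name, value in matched_inverter_levels(3.3, 0.61).items():
    print(f"{name} = {value:.2f} V")
```

This yields VIL = 1.39 V, VIH = 1.91 V, VOL = 0.26 V, VOH = 3.04 V and NML = NMH = 1.13 V, the numbers quoted in Ex vlsi3.2.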

Inverter pulldown devices

The NFET makes an ideal pulldown device:
If the pullup is off, VOL = ______; no DC connection when Vin < ______; increase the width to increase Ipd; compact layout.
[Plot: Vout vs. Vin for the pulldown current Ipd, showing the cut-off pulldown region (Vin < Vt0), the saturated pulldown region and the linear pulldown region, separated by the line Vin = Vout + Vt0; reference line Vin = Vout; VIL is always > Vt0]
Inverter pullup devices

Resistor. No load on input, VOH = Vdd. Will dissipate static power; increasing R will reduce power and increase the noise margin, but the low-to-high transition gets slower. Only practical if the process supports undoped poly, which has a sheet resistance of 10M Ohm/square.
Depletion-mode NFET. No load on input, VOH = Vdd. Connecting the gate to the source sets Vgs = 0, so Ipu is determined only by Vout. The layout can be compact since the pullup is in the same well as the pulldown; a buried contact can be used to connect gate to source. Only found in NMOS processes.
Enhancement-mode NFET. VOH = Vdd - Vt unless the gate of the pullup is driven above Vdd. If the gate is not switched off, the pullup needs to be weak to avoid excessive power dissipation, but this may entail larger layouts. Useful where PFETs are not wanted (e.g., some I/O structures).
Pseudo-NMOS using a saturated PFET as load device. VOH = Vdd. Useful for building large fan-in NOR gates found in static ROMs and PLAs, where static power dissipation is okay.
Inverter with PFET pullup
[Figure: CMOS inverter; pullup pfet with Vgs,pu = Vin - Vdd and Vds,pu = Vout - Vdd; pulldown nfet with Vgs,pd = Vin and Vds,pd = Vout]
Negligible steady-state power dissipation.
VOL = 0 V, VOH = Vdd.
VTC transition very sharp (non-vertical only because of channel-length modulation).
Switching point can be adjusted by fet sizing.
[Plot: VTC, Vout vs. Vin from 0 to Vdd, with reference line Vin = Vout and breakpoints Vt,n and Vdd + Vt,p; regions from left to right: n = off / p = lin; n = sat / p = lin; n = sat / p = sat; n = lin / p = sat; n = lin / p = off; curves for Wn/Wp > 1 and Wn/Wp < 1]
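At the switching point both fets are saturated, so equating (β_n/2)(V_inv - V_tn)² with (β_p/2)(V_inv - V_dd - V_tp)² gives V_inv = (V_dd + V_tp + √(β_n/β_p)·V_tn) / (1 + √(β_n/β_p)). A sketch applying it to the three sizings of Ex vlsi3.1 (Alcatel 0.5 µm, V_dd = 3.3 V), assuming β_n/β_p = (µ_n/µ_p)·(W_n/L_n)/(W_p/L_p) with µ_n = 290 and µ_p = 72 cm²/Vs, so that C_ox cancels:

```python
import math

def v_inv(vdd, vtn, vtp, beta_ratio):
    """Inverter switching threshold (Vin = Vout) with both fets saturated."""
    r = math.sqrt(beta_ratio)            # sqrt(beta_n / beta_p)
    return (vdd + vtp + r * vtn) / (1 + r)

MU_RATIO = 290.0 / 72.0                  # mu_n / mu_p for the Alcatel process
sizings = {"a) Wn=Ln, Wp=Lp": 1.0, "b) Wn=10Ln, Wp=Lp": 10.0, "c) Wn=Ln, Wp=10Lp": 0.1}
for label, wl_ratio in sizings.items():
    vinv = v_inv(3.3, 0.61, -0.61, MU_RATIO * wl_ratio)
    print(f"{label}: Vinv = {vinv:.3f} V")
```

The results (about 1.302 V, 0.893 V and 1.882 V) match the exercise answers and show the sizing effect: a stronger pulldown moves the switching point down, a stronger pullup moves it up.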
Build your own VTC

In the steady state: Ids,pd(Vin, Vout) = -Ids,pu(Vin - Vdd, Vout - Vdd)
[Plots: Ids,pd and -Ids,pu vs. Vout for Vin = 0.5 V, 1.5 V, 2.5 V, 3.5 V and 4.5 V; each intersection gives one point of the VTC, Vout vs. Vin]
When both fets are saturated, small changes in Vin produce large changes in Vout.
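The graphical construction can also be done numerically: for each Vin, search for the Vout where the pulldown and pullup currents are equal. A sketch under explicit assumptions: Vdd = 5 V (the slide's Vin values run to 4.5 V), |Vt| = 0.7 V as on the fet review slides, a hypothetical matched pair β_n = β_p = 100 µA/V², and a hypothetical λ = 0.05 V⁻¹ (without channel-length modulation, the matched inverter's output at Vin = Vdd/2 is undefined over the whole both-saturated range). Because Ids,pd rises and -Ids,pu falls as Vout increases, bisection on Vout is sufficient.

```python
VDD = 5.0
VTN, VTP = 0.7, -0.7
BETA_N = BETA_P = 100e-6   # hypothetical matched sizing, A/V^2
LAM = 0.05                 # hypothetical channel-length modulation, 1/V

def ids(vgs, vds, beta, vt):
    """Level-1 style current of an nfet-like device (vds >= 0), in A."""
    vgst = vgs - vt
    if vgst <= 0:
        return 0.0
    if vds < vgst:
        i = beta * (vgst * vds - vds * vds / 2)
    else:
        i = beta / 2 * vgst ** 2
    return i * (1 + LAM * vds)

def vout(vin, iters=60):
    """Solve Ids,pd(Vin, Vout) = -Ids,pu(Vin - Vdd, Vout - Vdd) for Vout by bisection."""
    lo, hi = 0.0, VDD
    for _ in range(iters):
        mid = (lo + hi) / 2
        i_pd = ids(vin, mid, BETA_N, VTN)
        # the pfet is treated as a mirrored nfet: negate Vgs, Vds and Vt
        i_pu = ids(VDD - vin, VDD - mid, BETA_P, -VTP)
        if i_pu > i_pd:
            lo = mid      # pullup stronger: Vout must be higher
        else:
            hi = mid      # pulldown stronger (or equal): Vout must be lower
    return (lo + hi) / 2

for vin in (0.5, 1.5, 2.5, 3.5, 4.5):
    print(f"Vin = {vin} V -> Vout = {vout(vin):.3f} V")
```

With matched devices the curve is symmetric about Vdd/2 and crosses Vin = Vout at 2.5 V, while Vin = 0.5 V and 4.5 V give outputs at the rails because one of the fets is cut off.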
Ben Bitdiddle's Buffer!
[Figure: buffer VTC, Vout vs. Vin]
How many would you buy?

Coming Up...
Next topic: Dynamic characteristics of MOS inverters: propagation delay, effects of rise and fall times. Transistor sizing, interconnect issues, estimating performance.
Readings for next time: Weste, Sections 2.3 through 2.3.2
Ex vlsi3.1 (difficulty: easy): Calculate the CMOS inverter threshold values for the following configurations (Alcatel 0.5µm process, VDD = 3.3V): a) Wn = Ln, Wp = Lp; b) Wn = 10 Ln, Wp = Lp; c) Wn = Ln, Wp = 10 Lp. Result: a) Vinv = 1.30V, b) Vinv = 0.893V, c) Vinv = 1.88V (see Weste pp66)
Ex vlsi3.2 (difficulty: medium, time consuming): Calculate the noise margins and VIL, VIH, VOL, VOH for a CMOS inverter operating at 3.3V with βn = βp, Utn = -Utp = 0.61V. Result: VIL = 1.39V, VIH = 1.91V, VOL = 0.26V, VOH = 3.04V, NML = NMH = 1.13V
Weste pp99: 2.10 ex 5 (difficulty: medium, time consuming): Design an input buffer that may be used to interface with a TTL driver (Vdd = 3.3V, VOL = 0.8V, VOH = 2.0V). Show full derivations of DC conditions. Assume Wn = 1µm and Ln = Lp = 0.5µm. Result: Wp = 1.51µm
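The Ex vlsi3.2 results can be cross-checked with the closed-form square-law expressions for a symmetric inverter. This sketch assumes βn = βp and Vt,n = -Vt,p = Vt. Under those assumptions VIL = (3Vdd + 2Vt)/8 and VIH = (5Vdd - 2Vt)/8. VOL is the output at Vin = VIH (nfet linear, pfet saturated), and VOH follows by symmetry. The function name is illustrative.

```python
import math


def symmetric_inverter_levels(vdd, vt):
    """Square-law VIL, VIH, VOL, VOH and noise margins for beta_n = beta_p, Vtn = -Vtp = vt."""
    vil = (3 * vdd + 2 * vt) / 8
    vih = (5 * vdd - 2 * vt) / 8
    # At Vin = VIH the nfet is linear and the pfet saturated (equal betas):
    #   (VIH - Vt) * Vout - Vout^2 / 2 = (Vdd - VIH - Vt)^2 / 2
    a = vih - vt
    rhs = (vdd - vih - vt) ** 2 / 2
    vol = a - math.sqrt(a * a - 2 * rhs)   # smaller root lies in the nfet linear region
    voh = vdd - vol                        # mirror image of VOL for a symmetric inverter
    return {"VIL": vil, "VIH": vih, "VOL": vol, "VOH": voh,
            "NML": vil - vol, "NMH": voh - vih}


levels = symmetric_inverter_levels(vdd=3.3, vt=0.61)
print({name: round(value, 2) for name, value in levels.items()})
```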
VLSI Design I
Dynamic characteristics of MOS inverters
Wow! 0 to 3.3 volts in 300ps!
Overview: gate delay modeling, power dissipation. Goal: You are familiar with CMOS gate delay models like Penfield-Rubenstein and with wire models. You know the influence of the body effect and of large loads on gate delay. You know why ground bounce occurs. You know the different factors in power dissipation.
JMM v1.3
Static properties reviewed

Figure: VTC (Vout versus Vin); the curve shifts one way for increasing Wn / decreasing Wp and the other way for increasing Wp / decreasing Wn.
sharp transition: the inverter is a good receiver for voltage-based signalling
Define the threshold voltage Vinv as the voltage where Vin = Vout on the VTC.
VOH = Vdd, VOL = 0, sharp transition => good noise margins
VOH = Vdd => pfet off when Vin = VOH => no static power
VOL = 0 => nfet off when Vin = VOL => no static power
The VTC describes static behaviour. When Vin changes, Vout lags behind because it takes time for the capacitors to charge/discharge. So, in real life, Vin reaches Vth before Vout does.
Choosing what to measure

Figure: Vin and Vout waveforms with the 10% and 90% levels, tr, tf and td marked; the level at which td is measured is marked ???.
Rise time, tr = time for a waveform to rise from 10% to 90% of its steady-state value
Fall time, tf = time for a waveform to fall from 90% to 10% of its steady-state value
Delay time, td = time between the input transition (when Vin = ???) and the output transition (when Vout = ???).
If ??? = Vinv, can the delay be negative? Does Vinv differ for each gate? So does td(sequence of gates) = sum(td)? Should we choose 50% instead of Vinv?
Signal delay time

Signal delay time is composed of the gate delay time and the interconnection delay time.

due to miniaturization, gate delay times decrease while the output impedance of buffers increases; thus the importance of interconnection delays increases
due to continuing miniaturization, signal delay time becomes less dependent on gate delay and more dependent on interconnection delay time
Figure: switch-level model of a fet (Ugs controls a switch with resistance R and capacitance C, voltage Uds) and switch-level model of an inverter (input Uin with input capacitance Cin; pullup resistance Rp to UCC and pulldown resistance Rn driving Uout).
Fall time analysis #1

Figure: VTC (static transition) and the dynamic transition for a fast input, with the operating regions of the n- and p-fet marked between Vt,n and Vdd + Vt,p.
the switching speed is limited by the time taken to discharge the capacitance CL
if the input transition is fast, the dynamic transition lies to the right of the static transition curve
the p-fet is cut off during the whole output fall time
the n-fet immediately saturates, and later enters the linear region
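Fall time analysis #2

Saturated region, Vout >= Vdd - Vt,n: the nfet sinks Idsat,n from CL:
$$-C_L\frac{dV_{out}}{dt} = \frac{\beta_n}{2}\left(V_{dd} - V_{t,n}\right)^2$$
So the time to fall from 0.9Vdd to Vdd - Vt,n is given by
$$t_{f1} = \frac{2C_L}{\beta_n\left(V_{dd} - V_{t,n}\right)^2}\int_{V_{dd}-V_{t,n}}^{0.9V_{dd}} dV_{out}$$
Linear region, Vout < Vdd - Vt,n: the nfet acts like a resistor Rn discharging CL:
$$-C_L\frac{dV_{out}}{dt} = \frac{V_{out}}{R_n} = I_{dn}$$
So the time to fall from Vdd - Vt,n to 0.1Vdd is given by
$$t_{f2} = C_L\int_{0.1V_{dd}}^{V_{dd}-V_{t,n}}\frac{dV_{out}}{I_{dn}(V_{out})}$$
where Idn is a function of Vout.
Adding to get the total fall time (Weste, Eq 4.37), with n = Vt,n/Vdd:
$$t_f = \frac{2C_L}{\beta_n V_{dd}(1-n)}\left[\frac{n-0.1}{1-n} + \frac{1}{2}\ln(19-20n)\right] \approx k\,\frac{C_L}{\beta_n V_{dd}}$$
tr is similar. k equals 3 to 4 for Vdd = 3V-5V and Vt,n = 0.5V-1V; k equals 3.6 for C05M.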
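The closed-form fall time can be evaluated directly. This sketch uses the notation above (n = Vt,n/Vdd, βn in A/V²). The CL and βn values in the second print are hypothetical. For Vt,n = 0.61V and Vdd = 3.3V, the factor k comes out close to the quoted 3.6.

```python
import math


def fall_time_factor(vtn, vdd):
    """k in tf = k * CL / (beta_n * Vdd), from Weste Eq. 4.37 with n = Vtn / Vdd."""
    n = vtn / vdd
    return 2.0 / (1.0 - n) * ((n - 0.1) / (1.0 - n) + 0.5 * math.log(19.0 - 20.0 * n))


def fall_time(c_load, beta_n, vtn, vdd):
    """90%-to-10% output fall time of an inverter driven by a step input."""
    return fall_time_factor(vtn, vdd) * c_load / (beta_n * vdd)


print(round(fall_time_factor(0.61, 3.3), 2))                      # Vtn = 0.61 V, Vdd = 3.3 V
print(fall_time(c_load=100e-15, beta_n=200e-6, vtn=0.61, vdd=3.3))  # hypothetical CL and beta_n
```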
Estimating delays
In most CMOS circuits, the delay of a single gate is dominated by the output rise and fall times. Thus:
$$t_{dr} \approx \frac{t_r}{2}, \qquad t_{df} \approx \frac{t_f}{2}$$
Having found a general form for approximate rise and fall times, one might estimate all delays using the same general form:
$$t_{delay} = A_{delay}\,\frac{L}{W}\,C_L$$
where W is expressed as a multiple of the minimum width; the factor A_delay·L/W looks like a resistor!
Here A_delay is a constant that depends on the power supply and transition voltages, the process and the minimum mosfet dimensions. This last dependency might strike one as odd, but usually all mosfets are built using the minimum allowable mosfet length for the process. Rather than solving the equations analytically, one can use Spice to determine the values of various useful constants: Ar, Af, Adr, Adf. These can be used in quick & dirty calculations for sizing transistors during the design process.
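Once Ar and Af have been extracted, the quick-and-dirty estimate is a single multiplication. Ex vlsi4.4 quotes Ar = 43.9kΩ and Af = 10.9kΩ for the Alcatel 0.5µm process at 3.3V. This is a sketch only: the W/L ratios and the 100fF load are hypothetical. It also takes the delays as half the transition times, following tdr ≈ tr/2 and tdf ≈ tf/2, instead of using separately extracted Adr and Adf.

```python
def transition_time(a_const, w_over_l, c_load):
    """t = A * (L / W) * CL: the fet behaves like a resistor A * L / W charging CL."""
    return a_const * c_load / w_over_l


A_R = 43.9e3   # ohm, rise constant quoted in Ex vlsi4.4
A_F = 10.9e3   # ohm, fall constant quoted in Ex vlsi4.4

c_load = 100e-15                                          # hypothetical 100 fF load
t_r = transition_time(A_R, w_over_l=4, c_load=c_load)     # hypothetical pfet W/L = 4
t_f = transition_time(A_F, w_over_l=2, c_load=c_load)     # hypothetical nfet W/L = 2
t_dr, t_df = t_r / 2, t_f / 2                             # delay approx. half the transition time
print(f"tr = {t_r * 1e12:.0f} ps, tf = {t_f * 1e12:.0f} ps, "
      f"tdr = {t_dr * 1e12:.0f} ps, tdf = {t_df * 1e12:.0f} ps")
```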
Input rise/fall & delay

How do input rise and fall times affect delay?
fast inputs quickly turn off one mosfet and provide maximum Vgs to the driving mosfet for most of the output transition
slow inputs leave both mosfets on longer, reducing the effective current to/from the load capacitance, and Vgs will be lower than above
So we might expect slower input transitions to lead to longer output delay times. One rule of thumb (Weste, p. 216ff), valid for input transitions that aren't too long, with n = Vt,n/Vdd (~0.2 for Vtn = 0.61V, Vdd = 3.3V) and p = Vt,p/Vdd:
$$t_{dr} = t_{dr,step} + \frac{t_{f,in}}{6}\left(1 + 2n\right)$$
$$t_{df} = t_{df,step} + \frac{t_{r,in}}{6}\left(1 - 2p\right)$$
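The rule of thumb above can be written as a small correction to the step-input delays. In this sketch the step delays and the 300ps input edges are hypothetical example values. Because Vt,p is negative, the two correction factors are equal for a symmetric threshold.

```python
def slope_corrected_delays(t_dr_step, t_df_step, t_f_in, t_r_in, vtn, vtp, vdd):
    """Add a share of the input transition time to the step-input delays (rule of thumb)."""
    n = vtn / vdd          # ~0.2 for Vtn = 0.61 V, Vdd = 3.3 V
    p = vtp / vdd          # negative for a pfet threshold
    t_dr = t_dr_step + t_f_in / 6 * (1 + 2 * n)
    t_df = t_df_step + t_r_in / 6 * (1 - 2 * p)
    return t_dr, t_df


# hypothetical step delays of 200 ps / 150 ps and 300 ps input edges
print(slope_corrected_delays(200e-12, 150e-12, 300e-12, 300e-12, 0.61, -0.61, 3.3))
```

The rule is only meant for moderate input edges. Very slow inputs leave both fets conducting for a long time, and the linear correction no longer holds.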
Bootstrapping & delay

Figure: inverter with the gate-drain capacitance CGD between input and output.
When the input starts to rise, the output, which was high, starts to fall. Thus the voltage across CGD changes, requiring the input to supply more current to charge CGD and slowing the input transition. Since CGD is small, this is usually a small effect. When the inverter is biased into its linear region, CGD may appear multiplied by the gain of the inverter (Miller effect). This doesn't usually matter in digital circuits, since the input passes rapidly through the linear region. Useful in analog circuits...
Multiple inputs & delay

Figure: nfet pulldown stack with inputs A (top, at the output) to D (bottom), output capacitance Cout and intermediate node capacitances Cab, Cbc, Ccd.
How should we model delays when we have multiple inputs? When A, B, C and D are logic 1:
treat series mosfets as resistances in series and lump the intermediate node capacitances with the load capacitance:
$$t_d = \left(\sum_i R_i\right)\left(\sum_i C_i\right)$$
or use the Penfield-Rubenstein model, which predicts
$$t_d = \sum_i R_i C_i$$
where Ri is the summed resistance from point i to ground and Ci is the capacitance at point i.
The Penfield-Rubenstein Slope Model uses an effective resistance Rn simulated by Spice.
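The two models can be compared on the four-high stack above. In this sketch the per-fet resistance and all capacitances are hypothetical numbers. The nodes are listed from ground upwards (Ccd, Cbc, Cab, Cout). Because the lumped model charges every capacitor through the full stack resistance, it is always the more pessimistic of the two.

```python
def lumped_delay(resistances, capacitances):
    """Series fets as one resistor, all node capacitances lumped: (sum R) * (sum C)."""
    return sum(resistances) * sum(capacitances)


def penfield_rubenstein_delay(resistances, capacitances):
    """td = sum_i R_i * C_i, where R_i is the resistance from node i down to ground.

    resistances[0] is the bottom fet (next to ground); capacitances[i] sits on the
    node just above resistances[i], so the last entry is the output node.
    """
    delay, r_to_ground = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_to_ground += r              # resistance accumulated from ground up to this node
        delay += r_to_ground * c
    return delay


R = 2e3                                     # hypothetical on-resistance per nfet (ohm)
fets = [R, R, R, R]                         # D (bottom), C, B, A (top)
caps = [10e-15, 10e-15, 10e-15, 50e-15]     # hypothetical Ccd, Cbc, Cab, Cout
print(lumped_delay(fets, caps), penfield_rubenstein_delay(fets, caps))
```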
Body effect & delay
Figure: nfet pulldown stack with inputs A (top) to D (bottom).
If A goes from 0 to 1 while B, C and D are 1, then all the intermediate nodes in the pulldown chain have already been discharged and the top mosfet sees only a small body effect. If D goes from 0 to 1 while A, B and C are 1, then the intermediate nodes are all one Vt below Vdd and the upper mosfets see a larger body effect. Thus A is the faster input!
Driving large loads #1

If large loads have to be driven, the delay may increase drastically. Large loads are output capacitances, clock trees, etc. A single minimum inverter (input capacitance CG) driving CL = 1000 CG gives
$$t_d = t_{inv}\,\frac{C_L}{C_G} = 1000\,t_{inv}$$
A possibility to reduce the delay, but probably not the optimum: a chain of inverters of relative size 1, 40 and 200 (stage delays 40 tinv, 5 tinv and 5 tinv):
$$t_d = \frac{40}{1}t_{inv} + \frac{200}{40}t_{inv} + \frac{1000}{200}t_{inv} = 50\,t_{inv}$$
Driving large loads #2

To drive a large load capacitance one might employ a sequence of n inverters, each a factor a larger than the previous one (figure: n = 4 inverters of size 1, a, a², a³ driving CL).
The delay through each stage is a·td, where td is the average delay of a minimum-sized inverter driving another minimum-sized inverter. We want a^n = CL/CG, so
$$\text{Total delay} = n\,(a\,t_d) = \ln\!\left(\frac{C_L}{C_G}\right)\frac{a}{\ln a}\,t_d$$
Thus the total delay is minimized when a = e ≈ 2.7.
Figure: total delay versus stage ratio a.
in practice a = 3...5
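The two slides can be checked with a few lines of arithmetic. This sketch uses tinv (= td) as the time unit and a hypothetical helper name. It reproduces the 50 tinv of the 1/40/200 chain and picks the integer number of stages that minimizes n·a with a = (CL/CG)^(1/n). Since n·a is convex in n, checking the two integers around ln(CL/CG) is enough.

```python
import math


def chain_delay(sizes, c_load_over_cg, t_inv=1.0):
    """Inverter chain delay: each stage costs t_inv * (next load / own size)."""
    loads = list(sizes[1:]) + [c_load_over_cg]
    return sum(t_inv * load / size for size, load in zip(sizes, loads))


def best_taper(c_load_over_cg):
    """Integer number of stages n minimising n * a, with a = (CL/CG)^(1/n)."""
    n_ideal = math.log(c_load_over_cg)            # a = e gives n = ln(CL/CG)
    candidates = {max(1, math.floor(n_ideal)), math.ceil(n_ideal)}
    n = min(candidates, key=lambda k: k * c_load_over_cg ** (1 / k))
    return n, c_load_over_cg ** (1 / n)


print(chain_delay([1], 1000))                 # single minimum inverter
print(chain_delay([1, 40, 200], 1000))        # the 1 / 40 / 200 chain
n, a = best_taper(1000)
print(n, round(a, 2), round(chain_delay([a ** i for i in range(n)], 1000), 1))
```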
Power dissipation #1
the power consumption is low compared to other technologies
scaling down increases the power dissipation density with respect to chip area
power dissipation produces heat on the chip, which has to be carried off through the chip socket
power dissipation is one of the limiting factors in today's CMOS VLSI chips
low-power applications are a speciality of EM (Neuenburg): watches, battery applications, etc.
Power dissipation #2
sources of power dissipation:
static power dissipation (quiescent current)
dynamic power dissipation:
dc power dissipation: short-circuit current (power to ground) due to switching
ac power dissipation: capacitor current (charging, recharging) due to switching
static power dissipation: there is always one fet off, so only leakage current is present:
$$I_0 = I_S\left(e^{qV/kT} - 1\right), \qquad P_S = I_0\,V_{DD}$$
Dynamic power dissipation #1
Comparison of dynamic short-circuit current vs. capacitive current. As expected, the short-circuit current makes a less important contribution when the load gets large. A slower input transition would increase the short-circuit current.
Figure: input Uin, outputs and drain currents Idsn, Idsp of an inverter (transistors with W/L = 4 and W/L = 2) for three cases: Uout-A, Uout-B (50fF load) and Uout-C (200fF load).

Average dynamic power for switching a square-wave input with a repetition frequency of fp = 1/tp is (capacitor current):
$$P_d = \frac{1}{t_p}\int_0^{t_p/2} i_n(t)\,V_{out}\,dt + \frac{1}{t_p}\int_{t_p/2}^{t_p} i_p(t)\left(V_{DD} - V_{out}\right)dt$$
Assuming a step input and taking in(t) = CL dVout/dt, i.e., the capacitive current, we get:
$$P_d = \frac{C_L}{t_p}\int_0^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{t_p}\int_0^{V_{DD}}\left(V_{DD} - V_{out}\right)d\left(V_{DD} - V_{out}\right) = \frac{C_L V_{DD}^2}{t_p} = C_L V_{DD}^2 f_p$$
proportional to the switching frequency but independent of device parameters
Aha! Now one can see why everybody changes from 5V to 3.3V and to 2.5V!
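The "Aha!" can be made concrete. Because Pd grows with VDD², going from 5V to 3.3V cuts the dynamic power of a node to (3.3/5)² ≈ 0.44 of its value. This sketch evaluates Pd = CL·VDD²·fp for a hypothetical 1pF node switched at 100MHz.

```python
def dynamic_power(c_load, vdd, f):
    """Pd = CL * Vdd^2 * fp for one node charged and discharged once per period."""
    return c_load * vdd ** 2 * f


C_L, F_P = 1e-12, 100e6          # hypothetical 1 pF node switched at 100 MHz
for vdd in (5.0, 3.3, 2.5):
    print(f"Vdd = {vdd} V: Pd = {dynamic_power(C_L, vdd, F_P) * 1e3:.2f} mW")
```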

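Short-circuit power dissipation is given by Psc = Imean VDD.
Figure: input with rise time tr and fall time tf and period tp; a short-circuit current pulse (peak Imax, mean Imean) flows while Vtn < Vin < VDD + Vtp (times t1, t2, t3).
The above waveform shows the short-circuit current, which gives
$$P_{sc} = \frac{\beta}{12}\left(V_{DD} - 2V_t\right)^3\frac{t_{rf}}{t_p}$$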
Total power dissipation
Total power dissipation is:
$$P_{total} = P_s + P_d + P_{sc}$$
dynamic power dissipation is dominant; use the switching activity to estimate power dissipation:
$$P_d = n_{switching}\,C_{total}\,V_{DD}^2\,f$$
switching activity: n_switching = percentage of switching gates
there exist simulators estimating power dissipation using the switching activity
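The power budget above can be tallied with a few helper functions. This sketch is only an illustration of the bookkeeping. All chip numbers are hypothetical: 1nF of switched capacitance, 10% activity, 50MHz, 3.3V, 1µW leakage, and one symmetric gate's short-circuit term with β = 1mA/V² and 1ns edges.

```python
def short_circuit_power(beta, vdd, vt, t_rf, t_p):
    """Psc = beta / 12 * (Vdd - 2 Vt)^3 * t_rf / t_p (symmetric inverter, Vtn = -Vtp = Vt)."""
    return beta / 12 * (vdd - 2 * vt) ** 3 * t_rf / t_p


def total_power(p_static, n_switching, c_total, vdd, f, p_sc=0.0):
    """Ptotal = Ps + Pd + Psc with Pd = n_switching * C_total * Vdd^2 * f."""
    p_dynamic = n_switching * c_total * vdd ** 2 * f
    return p_static + p_dynamic + p_sc


p_sc = short_circuit_power(beta=1e-3, vdd=3.3, vt=0.61, t_rf=1e-9, t_p=20e-9)
p_tot = total_power(p_static=1e-6, n_switching=0.1, c_total=1e-9, vdd=3.3, f=50e6, p_sc=p_sc)
print(f"Psc = {p_sc * 1e6:.1f} uW, Ptotal = {p_tot * 1e3:.2f} mW")
```

With these numbers the dynamic term dominates by roughly three orders of magnitude. That is consistent with the slide's statement that dynamic power dissipation is dominant.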
Build your own power meter

Figure: the device or circuit under test (load CL, periodic input Vin(t) = Vin(t+T)) draws its supply current Is through a 0V source Vs; a linear current-controlled current source g·Is charges CY in parallel with RY, starting from Vy(0) = 0V.
If one chooses
$$g = \frac{V_{dd}\,C_Y}{T} \quad\text{and}\quad R_Y C_Y \gg T$$
then Vy(T) in volts will equal the average dynamic power in watts drawn from the power supply over one period.
Power and ground bounce
Metal power-carrying conductors have to be sized for three reasons: metal migration (limit the current density, replicate contacts), power supply noise and RC delay.
General rule:
$$J_{AL} \approx 0.4 \ldots 1\ \text{mA}/\mu\text{m}$$
Figure: supply current I flowing through the power and ground rails.
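Ex vlsi4.2 can be reproduced with the current-density rule. This sketch rests on two assumptions of ours, not stated on the slide, and both reproduce the quoted 33µm and 0.72V. First, the rail width is set by the average supply current C·V·f. Second, the bounce is the peak charging current C·V/tr flowing through the rail resistance. The function name is hypothetical.

```python
def size_power_bus(c_load, vdd, f, j_max, r_sheet, length, t_edge):
    """Rail width from the average current, bounce from peak current * rail resistance.

    Units: j_max in A/um, length in um, r_sheet in ohm/square; returns (width_um, bounce_V).
    """
    i_avg = c_load * vdd * f              # average supply current of the driver
    width = i_avg / j_max                 # current-density (metal migration) limit
    r_bus = r_sheet * length / width      # number of squares * sheet resistance
    i_peak = c_load * vdd / t_edge        # current while charging CL during one edge
    return width, i_peak * r_bus


# Ex vlsi4.2: 50 MHz clock buffer, 100 pF, J_AL = 0.5 mA/um, l = 1 mm, 72 mOhm/sq, tr = 1 ns
w, dv = size_power_bus(100e-12, 3.3, 50e6, 0.5e-3, 0.072, 1000.0, 1e-9)
print(f"W = {w:.0f} um, dV = {dv:.2f} V")
```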
It's the wires, stupid

As process dimensions shrink, wiring capacitances start to dominate the mosfet capacitances. To estimate wiring capacitances, consider the following figure:
Figure: wire of length l, width w and thickness t at height h above the substrate, with the parallel-plate capacitance Cpp and the fringing-field capacitance.
Parallel-plate capacitance is given in the process files. Fringing capacitance is significant when t is comparable to h.
Fringing Capacitance
Figure 6.11 from CMOS Digital Integrated Circuits: Analysis and Design, by Kang and Leblebici.
For a long conductor where (t/h) = 0.4, (w/h) = 0.25, (w/l) = 0, the total capacitance may be 10x the parallel-plate capacitance.
Wire model?
Today, the longest wire on a VLSI chip might be 2cm, which has a time of flight of ~130ps assuming ε_SiO2 = 3.9 ε0. If the signal rise/fall time is longer than the time of flight, we can model wires as a distributed RC network. Longer wires or shorter rise/fall times require the wire to be modelled as a transmission line. For short wires, a lumped RC model is sufficient. For longer wires, we use the distributed RC model, where signal propagation can be shown to obey the diffusion equation:
$$r c\,\frac{\partial V}{\partial t} = \frac{\partial^2 V}{\partial x^2}$$
with r = R/unit length, c = C/unit length and x = distance from the driver.
This means the propagation time grows as tx = kx², with the signal edge becoming dispersed with increasing x.
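The ~130ps figure and the RC-versus-transmission-line criterion can be checked quickly. This sketch compares the time of flight l·√εr/c0 with the signal edge. It only implements the RC/transmission-line split from the slide and does not separate out the lumped-RC case for short wires. The 300ps and 50ps edges are hypothetical.

```python
import math

C0 = 2.998e8          # speed of light in vacuum, m/s


def time_of_flight(length_m, eps_r=3.9):
    """Propagation time along a wire embedded in a dielectric of relative permittivity eps_r."""
    return length_m * math.sqrt(eps_r) / C0


def wire_model(length_m, t_edge, eps_r=3.9):
    """Distributed RC if the edge is slower than the time of flight, else transmission line."""
    return "distributed RC" if t_edge > time_of_flight(length_m, eps_r) else "transmission line"


print(f"{time_of_flight(0.02) * 1e12:.0f} ps")                  # 2 cm wire in SiO2
print(wire_model(0.02, t_edge=300e-12), wire_model(0.02, t_edge=50e-12))
```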
Diffusion Eq. in real life

Weste, Eq. 4.28: t = rcl²/2; but for the 10% to 90% rise/fall time use t = 2.2·rcl²/2.
Ex vlsi4.3: a clock with 50pF load is distributed by a 1µm-wide metal wire running from the clock buffer in the corner of a 10mm x 10mm chip: r = 0.05 ohm/square (i.e. 0.05 ohm per µm of length), c = 50pF/20mm, l = 20mm.
a) t = ?
Fix: drive the clock from a central location to decrease l: c = 50pF/10mm, l = 10mm.
b) t = ?
Then also widen the clock wire to 20µm: r = 0.0025 ohm per µm of length, c = 50pF/10mm, l = 10mm.
c) t = ? whew!
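The three cases of Ex vlsi4.3 can be evaluated with t = 2.2·rcl²/2, using r and c per µm of wire length. A 1µm-wide wire at 0.05Ω/square gives 0.05Ω/µm, and a 20µm-wide wire gives 0.0025Ω/µm. This sketch reproduces the quoted 55ns, 27.5ns and 1.38ns.

```python
def rc_wire_rise_time(r_per_um, c_per_um, length_um):
    """10%-90% rise time of a distributed RC wire, t = 2.2 * r * c * l^2 / 2."""
    return 2.2 * r_per_um * c_per_um * length_um ** 2 / 2


cases = {
    "a) corner driver, 1 um wire": (0.05, 50e-12 / 20000, 20000),
    "b) central driver, 1 um wire": (0.05, 50e-12 / 10000, 10000),
    "c) central driver, 20 um wire": (0.0025, 50e-12 / 10000, 10000),
}
for name, (r, c, l) in cases.items():
    print(f"{name}: {rc_wire_rise_time(r, c, l) * 1e9:.2f} ns")
```

Halving l halves the delay here because the total C stays at 50pF. Widening the wire attacks r directly, which is why case c) gains a further factor of 20.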
Inductance
Bond-wire inductance can cause deleterious effects in large, high-speed I/O buffers:
package inductance: 3 .. 15 nH
with shrinking processes, on-chip inductance has to be taken into account:
on-chip inductance: 10 .. 50 pH/mm
Figure: supply Vdd delivering the current i(t) through the inductance L, which causes
$$dV = L\,\frac{dI}{dt}$$
design techniques:
separate power pins for I/O pads and chip core
multiple power and ground pins
careful selection of the position of the power and ground pins on the package
adding decoupling capacitances on the board
increase the rise and fall times
use advanced package technologies (SMD, etc.)
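Ex vlsi4.1 is a direct use of dV = L·dI/dt. This sketch assumes that each buffer's charging current CL·Vdd/t ramps up within the same edge time t, so dI/dt ≈ I/t. This assumption reproduces the quoted 1.24V.

```python
def inductive_spike(n_buffers, c_load, vdd, t_edge, inductance):
    """dV = L * dI/dt, assuming the summed buffer current ramps up linearly within t_edge."""
    i_total = n_buffers * c_load * vdd / t_edge   # summed charging current of all buffers
    di_dt = i_total / t_edge                       # linear ramp assumption
    return inductance * di_dt


# Ex vlsi4.1: 8 buffers, 50 pF each, 4 ns edges, Vdd = 3.3 V, 15 nH total bond-wire inductance
print(f"{inductive_spike(8, 50e-12, 3.3, 4e-9, 15e-9):.2f} V")
```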
Coming Up...
Next topic: Combinational logic: series/parallel switch networks, transmission gates. Performance optimisation.
Readings for next time: Weste, 4.4 (inductance); 4.3.6, 4.5 through 4.5.1, 4.5.4 through 4.5.5 except 4.5.4.4, and 4.6.3 (delay modelling); 4.7 (power consumption); 4.8 (sizing routing conductors)
You should read the rest of chapter 4 when you get the chance ...
Ex vlsi4.1 (difficulty: easy): Calculate the inductive spike at the power supply provoked by 8 output buffers, each driving 50pF in 4ns, Vdd = 3.3V, total bonding inductance 15nH. Result: dVtot = 1.24V (see Weste pp 205)
Ex vlsi4.2 (difficulty: easy): a) Calculate the power supply width Wpower necessary for feeding a clock buffer running at 50MHz driving 100pF. b) What is the ground bounce with the chosen conductor? (JAL = 0.5mA/µm, power supply distance l = 1mm, Vdd = 3.3V, Rmetal1 = 72mOhm/sq, tr = tf = 1ns) Result: a) Wpower = 33µm, b) dV = 0.72V (see Weste pp 239)
Ex vlsi4.3 (difficulty: easy): Calculate the clock distribution delay for the example on transparency 25. Result: a) td = 55ns, b) td = 27.5ns, c) td = 1.38ns (see Weste pp 200)
Ex vlsi4.4 (difficulty: easy): Calculate Ar and Af for a CMOS inverter (Vdd = 3.3V, Alcatel 0.5µm process). Result: Ar = 43.9kOhm, Af = 10.9kOhm (see Weste pp208ff and transparency 7)
Weste pp370: 5.9 ex 14 (difficulty: easy): A low-power 3.3V chip has a clock of 12MHz. In the power-down mode, the clock driver drives 5mm of a 2µm wide metal1 wire. If the area capacitance of metal is Ca = 2.37pF/m2 and the sidewall capacitance is Cf0 = 2.37pF/m, what is the power-down dissipation, assuming this is the dominant term? What is the dissipation if the wire is reduced to 50µm length? Result: Pd = 85µW, 0.85µW (see Weste pp 235)
VLSI Design I
CMOS Combinational Logic
Overview: Euler rules for complex CMOS gates, layout and stick diagrams. Goal: You know how to design a compact layout of complex CMOS logic gates with the Euler rules. You are familiar with transmission gates and their limitations.
How 'bout more than 1 input?

Figure: generic CMOS gate: a pullup network between Vdd and the output F(A1,...,An) and a pulldown network between the output and ground, both controlled by the inputs A1 ... An.
Logic recipe:
pullup: make this connection when we want F(A1,...,An) = 1
pulldown: make this connection when we want F(A1,...,An) = 0
Finally! I was getting tired of inverters...
we want VOH = Vdd, so better use only pfets in the pullup path
similarly, since we want VOL = 0, better use only nfets in the pulldown path
looking at the pulldown path: since nfets are on when VGS > VTH, the output will be pulled low when the right combination of inputs is high, so CMOS gates are naturally inverting
Complementary logic
Now you know what the C in CMOS stands for!
We want complementary pullup and pulldown logic, i.e., the pulldown should be on when the pullup is off and vice versa.
pullup   pulldown   F(A1,...,An)
on       off        driven 1
off      on         driven 0
on       on         driven X
off      off        no connection
Since there's plenty of capacitance on the output node, when the output becomes disconnected it remembers its previous voltage, at least for a while. The memory is the load capacitor's charge. Leakage currents will cause eventual decay of the charge (that's why DRAMs need to be refreshed!). No connection is also useful for constructing tristate drivers! In this case, we call this state Z, which is short for high-Z, which is short for high impedance, which is how engineers say no connection. Isn't jargon wonderful?
CMOS complements
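Figure: complementary nfet and pfet blocks ("What a nice VOH you have..." / "Thanks. It runs in the family...").
pulldown nfet block (an nfet conducts when VGS is high):
A and B in series: conducts when A is high and B is high: $A\cdot B$
A and B in parallel: conducts when A is high or B is high: $A + B$
pullup pfet block (a pfet conducts when VGS is low):
A and B in series: conducts when A is low and B is low: $\overline{A}\cdot\overline{B} = \overline{A + B}$
A and B in parallel: conducts when A is low or B is low: $\overline{A} + \overline{B} = \overline{A\cdot B}$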
Development of CMOS gates /1

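Example: CMOS NAND gate, $F = \overline{A\cdot B}$
Karnaugh diagram:
        B=0  B=1
A=0      1    1
A=1      1    0
Step 1: development of the nfet block. Logic minimization of the 0 in the Karnaugh diagram: $\overline{F} = A\cdot B$ (nfets A and B in series).
Step 2: development of the pfet block. Logic minimization of the 1s in the Karnaugh diagram: $F = \overline{A} + \overline{B}$ (pfets A and B in parallel).
Step 3: put the nfet and pfet blocks together: $F = \overline{A\cdot B}$.
Figure: resulting 2-input NAND gate with inputs A, B and output F.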
NAND & NOR

2-input NAND. When the output is low, two nfets are in series. So to keep the output fall time equivalent to that of an inverter, the nfets have to be twice as wide. Pfet widths can be the same as those in the inverter (but remember that in the inverter the pfets were already 2x the nfet width!). Can be extended to large fan-in, but the practical limit is 4 inputs. Why?
2-input NOR. When the output is high, two pfets are in series. So to keep the output rise time equivalent to that of an inverter, the pfets have to be twice as wide. Nfet widths can be the same as those in the inverter. Can be extended to large fan-in, but the practical limit is 4 inputs. NOR gates are considered less good than NAND gates. Why?
Figure: 2-input NAND and NOR gates (inputs A, B) and a pseudo-NMOS NOR gate with inputs A1 ... An.
Pseudo-NMOS NOR gates are used to build high fan-in NOR gates for PLAs to save area (at some cost in static power).
Layout of simple gates

VDD p-type substrate n-type well metal/pdiff metal/pdiff contact with detail removed
Wp
Lp IN OUT
Wn Ln GND
metal2 metal poly n+ diff
contact from metal to ndiff
p+ diff

JMM v1.4
Layout Rules #1
layout rules are the common language between design and process engineers
conservative rules absorb process disturbances and variations
layout rules must be respected by the designer
layout rules reflect the limits of a process; they describe:
minimal distance, overlap
minimal width (e.g. channel length, ...)
layout readability is improved using colours:
metal: blue
polysilicon: red
n-diffusion: green
p-diffusion: yellow
n-well: brown
contact, via: black
Layout Rules #2
symbol and mask layout of a CMOS inverter
Figure: inverter symbol and mask layout, with an n-well contact (n-diff) and a bulk contact (p-diff).
Stick Diagram
stick diagrams are technology independent
no layout rules need to be known
mask layout may be generated automatically
NAND & NOR (again)
Figure: NAND and NOR gates with inputs A and B.
Large Fan-In CMOS Gates

CMOS gates with large fan-in suffer from:
body effect
unsymmetrical delay
large delay
never use more than 4 or 5 fets in series; increase the logic depth instead
Figure: a wide gate decomposed into a tree of smaller & gates.
CMOS Gate Recipe

Step 1. Figure out the pulldown network that does what you want, e.g., F = A*(B+C).
Step 2. Walk the hierarchy, replacing nfets with pfets, series subnets with parallel subnets, and parallel subnets with series subnets.
Step 3. Combine the pfet pullup network from Step 2 with the nfet pulldown network from Step 1 to form a fully-complementary CMOS gate.
Figure: nfet pulldown (A in series with B parallel C), pfet pullup (A in parallel with the series pair B, C) and the combined gate.
But isn't it hard to wire it all up?
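The three-step recipe can be checked mechanically for small gates. This sketch uses a hypothetical nested-tuple encoding: a network is an input name (one fet) or ("series" | "parallel", sub, sub, ...). It builds the pfet network as the series/parallel dual of the nfet network and verifies that exactly one of the two conducts for every input combination. The gate output is the complement of the pulldown's conduction function.

```python
from itertools import product


def conducts(net, inputs, pfet=False):
    """Does a series/parallel switch network conduct for the given input levels?

    nfets conduct when their input is 1, pfets when it is 0.
    """
    if isinstance(net, str):
        return inputs[net] == 0 if pfet else inputs[net] == 1
    kind, *parts = net
    results = [conducts(part, inputs, pfet) for part in parts]
    return all(results) if kind == "series" else any(results)


def dual(net):
    """Step 2 of the recipe: swap series and parallel subnets (fets keep their inputs)."""
    if isinstance(net, str):
        return net
    kind, *parts = net
    swapped = "parallel" if kind == "series" else "series"
    return (swapped, *(dual(part) for part in parts))


def cmos_gate_output(pulldown, inputs):
    """Step 3: combine the nfet pulldown with its dual pfet pullup and evaluate the output."""
    pd = conducts(pulldown, inputs)
    pu = conducts(dual(pulldown), inputs, pfet=True)
    assert pd != pu, "pullup and pulldown must never be on (or off) together"
    return 0 if pd else 1


pulldown = ("series", "A", ("parallel", "B", "C"))   # Step 1 for the example A*(B+C)
for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, cmos_gate_output(pulldown, {"A": a, "B": b, "C": c}))
```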
Complex CMOS Gates /1

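classical CMOS logic gates are always inverting logic gates
complex CMOS logic gates are a mixture of AND and OR structures with a final inversion
Example: $F = \overline{A\cdot B + C\cdot D}$
Step 1: generation of the nfet block (logic 0): $\overline{F} = A\cdot B + C\cdot D$, i.e. A in series with B, in parallel with C in series with D
Step 2: generation of the pfet block (logic 1): $F = (\overline{A} + \overline{B})\cdot(\overline{C} + \overline{D})$, i.e. A parallel B, in series with C parallel D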
Step 3: put everything together. What about the layout?
Figure: complete transistor schematic with inputs A, B, C, D and the equivalent logic symbol built from two & gates; where is this signal in the transistor schema?
Complex CMOS Gates Layout /1

Goal: compact layout. All complex gates may be designed using a single row of nfets and a single row of pfets, so that adjacent drain/source diffusions of the fets are very close.
Euler rule:
generate an n-graph by replacing the nfet block with vertices for nodes and edges for fets
generate the dual p-graph
find a sequence containing all edges in the n-graph; this sequence is called the Euler n-path
generate an Euler p-path with the same labelling as the Euler n-path; if not possible, start again
the labelling sequence of the 2 Euler paths is the gate sequence of the single-row nfet/pfet CMOS gate
Complex CMOS Gates Layout /2

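Figure: transistor schematic of $F = \overline{A\cdot B + C\cdot D}$ with internal nodes N1, N2, N3, and the derived n-graph and p-graph (vertices VDD, VSS, F, N1, N2, N3; edges labelled A, B, C, D), with the start vertices marked.
Euler path: A -> B -> D -> C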
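For small gates, the Euler-rule search can be done by brute force. The sketch tries every gate order and keeps those that form an uninterrupted walk in both the n-graph and the p-graph. The internal node names n1, n2 and p1 are hypothetical labels for this example and need not match the figure. Both graphs describe F = not(A·B + C·D). The slide's order A -> B -> D -> C is among the results.

```python
from itertools import permutations


def is_euler_trail(order, edges):
    """True if the fets in `order` form one connected walk through the graph.

    edges maps a gate label to the (node, node) pair its fet connects.
    """
    for start in edges[order[0]]:          # try both orientations of the first edge
        node = start
        for label in order:
            u, v = edges[label]
            if node == u:
                node = v
            elif node == v:
                node = u
            else:
                break                      # next fet does not share a diffusion node
        else:
            return True
    return False


def common_euler_paths(n_graph, p_graph):
    """Gate orders that are Euler paths in both graphs (one unbroken diffusion row each)."""
    return [order for order in permutations(sorted(n_graph))
            if is_euler_trail(order, n_graph) and is_euler_trail(order, p_graph)]


n_graph = {"A": ("F", "n1"), "B": ("n1", "VSS"), "C": ("F", "n2"), "D": ("n2", "VSS")}
p_graph = {"A": ("VDD", "p1"), "B": ("VDD", "p1"), "C": ("p1", "F"), "D": ("p1", "F")}
paths = common_euler_paths(n_graph, p_graph)
print(len(paths), ("A", "B", "D", "C") in paths)
```

Brute force grows factorially with the number of inputs. That is acceptable here because the slides already limit gates to 4 or 5 fets in series.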

Figure: single-row stick diagram and layout of the gate, with the gate sequence A -> B -> D -> C.
A Quiz!
/1

A Quiz! /2
Find the minimal transistor circuit (2 * 4 fets) and the most compact layout using Euler's rule.
CD \ AB   00  01  11  10
00         1   0   0   1
01         1   0   0   0
11         1   0   0   0
10         1   0   0   0
Quiz: Solution
F = A*B + B*C*D = B * (A + C*D)
equation ready for the p-block
Figure: n-graph and p-graph of the solution (vertices VDD, VSS, F, N1, P1, P2), with the start vertices marked.
Euler path: D -> C -> A -> B
Transmission Gates
Figure: CMOS transmission gate (nfet and pfet in parallel between A and B, gates driven by S and -S) and an nMOS pass transistor between A and B, gate driven by S.
If VA = VDD then current will flow from A to B until VB = _____
If VA = 0 then current will flow from B to A until VB = _____
Assuming S and -S are complementary signals, the CMOS transmission gate (TG) acts as a switch, controlled by S, that has no inherent voltage drop (unlike a switch constructed from a single nfet or pfet, which exhibits a VT drop at one rail or the other).
CMOS TG Electrical Model

Figure: TG switch model: S = 0 (-S = VDD): switch is off; S = VDD (-S = 0): switch is on.
How on is on? Assume VA = VDD; then, as VB rises from 0V to VDD:
0V < VB < |VT,p|: nfet = sat, pfet = sat
|VT,p| < VB < VDD - VT,n: nfet = sat, pfet = lin
VDD - VT,n < VB < VDD: nfet = off, pfet = lin
Figure: Req,n, Req,p and Req,TG plotted against VB from 0V to VDD, with
$$R_{eq,TG} = R_{eq,n} \parallel R_{eq,p}$$
TG Circuits: MUX
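Figure: 2-to-1 multiplexer built from two transmission gates with inputs A, B and select S: $Y = A\cdot S + B\cdot\overline{S}$ (the inverter generating $\overline{S}$ is not drawn).
Is this node always the output of this gate?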
TG Circuits: 4 to 1 MUX
multiplexers can easily be built with TGs
never forget that TGs are bi-directional
compact layout by combining identical gates
Figure: 4-to-1 multiplexer with inputs A, B, C, D, select signals S1, S2 and output F.
Best XOR in Town

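Figure: three implementations of the XOR gate $F = A\cdot\overline{B} + \overline{A}\cdot B$ (symbol =1):
12 transistors: built from standard static gates
8 transistors: transmission-gate version. Is this node always the output of this gate?
6 transistors: reduced version. Is this node always the output of this gate?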
TG Quiz
Find the function of the following 4-transistor circuit:
Figure: 4-transistor circuit with input B and output F.
TG Circuits: Problems
difficult to get a compact layout
outputs behave like bi-directional signals
many TGs in series provoke large delays
Figure: chain of TGs between Uin and Uout and its equivalent ladder of five R-C sections.
= 2.2 (RC )2
Coming Up...
Next topic: Dynamic (precharge/evaluate) logic circuits: CMOS domino logic, NP domino logic, CVSL logic. Charge sharing.
Readings for next time: Weste, Sections 5.3 through 5.3.4 and 5.4.6, 5.3.9 through 5.4.1
#1
Ex vlsi5.1 (difficulty: easy): Design a CMOS gate that implements the function
Out = (( A + B) C + D E ) F
Ex vlsi5.2 (difficulty: easy): What is the Boolean equation of the following CMOS gate?
Figure: CMOS gate between VDD and GND with inputs A and B.
#2
Weste pp371: 5.9 ex 7 (difficulty: easy): Design a pass transistor network that implements the sum function for an adder:
$$S = \overline{A}\,\overline{B}\,C + \overline{A}\,B\,\overline{C} + A\,\overline{B}\,\overline{C} + A\,B\,C$$
VLSI Design I
Dynamic Logic Gates
Overview: dynamic logic gates, Domino, NORA and CVSL structures. Goal: You are familiar with dynamic logic gates and their different families. You can handle dynamic logic problems like charge sharing and timing.
Tinkering with Logic Gates

Things to like about CMOS gates:
easy to translate logic to fets
rail-to-rail switching, good noise margins
no static power, since fets are in cutoff
sizing not critical to correct operation
Things not to like about CMOS gates:
N inputs => 2N fets (i.e., one nfet and one pfet per input)
large circuit area, especially for pfets
heavy loading of inputs
pfets are either large or slow relative to nfets
series connections can get very slow
We can replace the pfet pullup network with a pseudo-NMOS load (pfet with grounded gate), but:
it dissipates static power when the output is low
we have to make the load fet small to ensure that VOL is low enough to cut off the nfets in the next stage; this reduces static power consumption (good!) but increases the output rise time (bad!)
One alternative: dynamic CMOS gates
Dynamic CMOS Gates

Figure: dynamic gate with a precharge switch (pfet) and an evaluate switch (nfet), both driven by CLK, around the nfet logic block with inputs A and B; timing diagram of clock and output showing the precharge phase and the evaluate phase.
inputs must be stable before CLK goes high, because once the output has been discharged it won't go high again until the next cycle
for the same reason, noise/glitches on inputs cannot exceed the nfet threshold, a much more stringent requirement than for static CMOS gates
There's good news & bad news

The good news:
Dynamic gates are faster than static gates despite the extra evaluate fet in the pulldown path, because of the reduction in self-loading and the elimination of the pullup short-circuit current during the first part of the output transition.
The bad news:

Dynamic gates cannot be cascaded: because of the finite pulldown time of the first gate's dynamic node, which is still high at the start of evaluation, the dynamic node of the following gate starts to discharge!
Figure: two cascaded dynamic gates (nfet blocks with precharge and evaluate devices on CLK) and their waveforms.
Solution: develop techniques that avoid races:
CMOS Domino logic
CMOS NORA (no race) logic
CMOS Domino Logic

Figure: domino gate: the dynamic node (precharge: high; evaluate: falls (maybe)) drives a static buffer inverter whose output (precharge: low; evaluate: rises (maybe)) feeds the nfets of the next stage. A buffer might be needed in any case for high fan-out circuits.
When CLK is low, the dynamic node is precharged high and the buffer inverter output is low. Nfets in the next logic block will be off. When CLK goes high, the dynamic node is conditionally discharged and the buffer output will conditionally go high. Since discharge can only happen once, the buffer output can only make one low-to-high transition. When domino gates are cascaded, as each gate evaluates, if its output rises, it will trigger the evaluation of the next stage, and so on, like a line of dominos falling. Like dominos, once the internal node in a gate falls, it stays fallen until it is picked up by the precharge phase of the next cycle. Thus many gates may evaluate in one eval cycle.
More Domino-style Circuits

Weak pfet keeper: keeps the dynamic node pulled high during the evaluate phase if it's not being pulled down through the nfets => the gate is static in both clock phases.
Latching pfet: acts like the keeper above unless the dynamic node gets pulled down during the evaluate phase. When the buffer output goes high, it switches the keeper off, saving static power. Good for leakage current problems... Note that you can put an even number of static gates after the inverter and before the next domino gate. Be careful of capacitive coupling to the dynamic node (see later slide).
Use a NOR gate instead of an inverter as the buffer to make a faster high fan-in AND gate. The same trick works for high fan-in OR or MUX functions.
Figure: the three circuits (CLK, nfet pulldown blocks, keeper or latching pfets, NOR buffer).
Optimising Domino Logic (I)
Figure: cascaded domino gates; the second gate, whose inputs are low during precharge, is drawn without its evaluate nfet.
Evaluate nfet not needed?
Since domino gate outputs are low during the precharge phase, gates which have only domino output nodes as inputs don't need the evaluate nfet, since all the nfets in the pulldown will be off anyway. But remember: if the evaluate nfet is removed, precharge will ripple through cascaded gates just like evaluates do. Maybe only remove it for gates where the nfet stack is tall (i.e. resistive) enough that the pullup will start to win anyway before the ripple reaches the gates and turns off the pulldowns.
Optimising Domino Logic (II)

In domino logic circuits we want evaluate to happen as quickly as possible. We can size fets to optimise evaluate speed.
Figure: domino gate and output inverter annotated with small and large device sizes.
Some designers also grade the sizes of the nfets, smallest at the top (increase in R offset by decrease in C).
If we make the nfet in the output inverter much smaller than the pfet, then
the load on the internal node decreases, and
the switching threshold of the inverter increases.
Both effects make the gate evaluate sooner. If large >> small, the gate delay can be cut almost in half! However, the other edge is very slow, so ripple precharge is a problem.
All that glitters is not gold

There are a few little difficulties:
charge sharing between nodes in the pulldown network and the dynamic node can unintentionally reduce the voltage of the dynamic node enough to switch the output buffer
the addition of the output inverter makes domino gates non-inverting. One can often design around this limitation, but some circuits cannot be implemented solely using domino logic unless both polarities (true and complement) of the inputs are available. If both polarities of the inputs are available, then we can generate both polarities of internal signals with two domino gates, so subsequent stages will have both polarities of their inputs available too.
Charge Sharing (I)
Figure: domino gate whose pulldown chain has inputs A ... F (A = 1 -> 0, B ... E = 1, F = 0 -> 1), dynamic node capacitance 3C and intermediate node capacitances 1.5C, 1.5C, C, C, C.
Suppose the dynamic node has been discharged during the previous evaluate cycle. Then during precharge, all the intermediate nodes in the pulldown chain will remain discharged while the dynamic node is precharged. Calculate the voltage on the dynamic node when CLK goes high. When CLK goes high, the voltage on the dynamic node goes to
$$V = \frac{3C}{3C + 6C}\,V_{DD} = 1.1\,\text{V} \quad \text{for } V_{DD} = 3.3\,\text{V}$$
which is low enough to switch the output inverter.
Fortunately, this situation is easily detected by CAD tools and can be resolved by (1) adding additional precharge devices to intermediate nodes or (2) increasing the size of the output buffer, which will increase the capacitance of the dynamic node (a faster output buffer may compensate for the larger internal capacitance).
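The charge-sharing result follows from charge conservation between the precharged dynamic node and the discharged internal nodes. This sketch reproduces the 1.1V figure and shows the effect of remedy (2): doubling the dynamic-node capacitance with a larger output buffer raises the shared voltage. The 6C case is a hypothetical illustration.

```python
def charge_shared_voltage(c_dynamic, c_internal, vdd):
    """Voltage after a precharged node shares its charge with fully discharged internal nodes."""
    return vdd * c_dynamic / (c_dynamic + sum(c_internal))


C = 1.0                                            # capacitances in units of C
internal = [1.5 * C, 1.5 * C, C, C, C]             # intermediate nodes of the pulldown chain
v = charge_shared_voltage(3 * C, internal, vdd=3.3)
v_bigger = charge_shared_voltage(6 * C, internal, vdd=3.3)   # hypothetical: larger output buffer
print(f"{v:.2f} V, with 6C dynamic node: {v_bigger:.2f} V")
```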
Charge Sharing (II)
Figure: cascaded domino gates (n-logic blocks clocked by CLK) with additional precharge devices to eliminate charge-sharing problems.
Capacitive Coupling
Figure: dynamic node OUT capacitively coupled to CLK; the waveform of OUT over time t shows the coupled disturbance.
Coupling can also occur between other signal wires and long dynamic nodes (e.g., ones that span multiple bits in a datapath). Solutions: on long routes, add twists to avoid continuous parallel routes, or route dynamic signals between mutually exclusive or complementary signals.
Domino Logic Design

To convert to a Domino-style design we need to create a schematic that uses non-inverting gates: (1) look for CMOS gates followed by an inverter; (2) use De Morgan's Law to create non-inverting gates.
Figure: example logic with inputs A to H and outputs X, Y, converted into a Domino OR gate, Domino AND gates and a Domino AND-OR gate using De Morgan's law.
Domino Logic Design (II)

Figure: domino implementation of X and Y from inputs A to F, with device sizes 8/2; nfet W/L = 4, pfet W/L = 8; legend: s = static, d = domino (W/L = 4), dd = domino (W/L = 8).
Dual-rail Domino Logic

Domino circuits that generate both polarities of the output.
Figure: pair of domino gates clocked by CLK, one evaluating the function from A, B and the other its complement from the complemented inputs.
Multiple-output Domino

Why stop at complementary outputs? There are interesting multiple-output functions where there is a lot of sharing of nfets in the evaluate logic. For example, in a carry-lookahead adder, with Gi = Ai·Bi and Pi = Ai + Bi:
C1 = G1 + P1C0
C2 = G2 + P2G1 + P2P1C0
C3 = G3 + P3G2 + P3P2G1 + P3P2P1C0
C4 = G4 + P4G3 + P4P3G2 + P4P3P2G1 + P4P3P2P1C0
Figure: domino version of the Manchester carry chain: a single clocked nfet chain with propagate devices P4 ... P1 and input C0, generate devices G4 ... G1, and outputs C4, C3, C2, C1 taken from the intermediate nodes.
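The expanded carry equations can be checked against ordinary binary addition. This sketch evaluates Ck as the sum of products Gk + PkGk-1 + ... + Pk...P1C0, with Gi = Ai·Bi and Pi = Ai + Bi as above. It verifies only the logic and says nothing about how the Manchester chain shares transistors. The helper names are illustrative.

```python
def lookahead_carry(k, g, p, c0):
    """C_k = G_k + P_k G_(k-1) + ... + P_k ... P_1 C0, with g[i], p[i] for bits i = 1..k."""
    carry = 0
    prefix = 1                         # running product P_k * P_(k-1) * ... of the terms so far
    for j in range(k, 0, -1):
        carry |= prefix & g[j]
        prefix &= p[j]
    return carry | (prefix & c0)


def carries_4bit(a, b, c0):
    """C1..C4 for 4-bit operands a, b using Gi = Ai*Bi and Pi = Ai+Bi (index 0 unused)."""
    bits_a = [None] + [(a >> (i - 1)) & 1 for i in range(1, 5)]
    bits_b = [None] + [(b >> (i - 1)) & 1 for i in range(1, 5)]
    g = [None] + [bits_a[i] & bits_b[i] for i in range(1, 5)]
    p = [None] + [bits_a[i] | bits_b[i] for i in range(1, 5)]
    return [lookahead_carry(k, g, p, c0) for k in range(1, 5)]


print(carries_4bit(0b1011, 0b0110, 0))
```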
DualDual -rail Keeper Circuit

CLK
A B A B
CLK
The crosscross-coupled pfets serve as keepers for the output which is high making the gate static rather than dynamic! During precharge both keepers are off; during the evaluate phase, the output that goes low switches on the keeper for the output that is staying high. Really solves capacitive coupling problems with dynamic logic in datapaths. datapaths.
Cascade voltage switch logic (CVSL)
[Figure: static CVSL and dynamic (clocked) CVSL: each produces Q and its complement from two complementary nmos combinatorial networks; example inputs a to e]
The static version might be quite slow due to the nfet/pfet fight during switching.

CMOS NORA Logic (NP Domino)

[Figure: NORA logic: alternating n blocks (nfet logic) and p blocks (pfet logic), with precharge (pre) and evaluate (eval) phases controlled by CLK and its complement]
If we turn a dynamic gate upside down and use pfets to build the logic block, we get a logic gate that precharges low and discharges high. By using these gates in an alternating sequence with regular nfet dynamic gates, we can eliminate the race problem we had with nfet-only dynamic gate sequences, and hence we don't need the buffer inverter present in domino gates. Removing the buffer is a mixed blessing, since we may need it for drive reasons and to keep compatibility with other domino gates. It also makes NORA logic very susceptible to noise, since during the evaluate phase all information is stored dynamically.
Domino Life Cycle

[Figure: Domino life cycle (the "gate" might be several gates), cycling through four states:]
- Actively precharging
- Waiting for precharge (holding output value)
- Waiting for data (holding precharge)
- Actively evaluating
The 9 o'clock state is very interesting: once a Domino gate has finished evaluating, the gate's immediate predecessors can start to precharge (forcing the gate's inputs low) without affecting the value of the gate's output. The gate is acting as a latch so long as its predecessors don't start another evaluate cycle. Perhaps we can build a pipeline of domino stages where each stage serves as both logic and latch, depending on where it is in its cycle. We need each stage to supply its own precharge/evaluate timing, dependent on what its neighbours are doing...

Self-timed Pipelines

[Figure: pipeline of stages F1, F2, F3; each stage has a P/E (precharge/evaluate) control and a "done?" detector on its output. Done encoding: 0 = precharged, 1 = evaluation done]
Simplest correctness rules:
- a stage only precharges when both (a) its successor has finished evaluating (it's done with our values): Sdone = 1, and (b) its predecessor has finished precharging (old values are gone so we can't use 'em twice!): Pdone = 0
- a stage only evaluates when both (a) its successor has finished precharging (our new output won't affect its stored value): Sdone = 0, and (b) its predecessor has finished evaluating (there are new inputs for us to consider): Pdone = 1
So, what logic goes in the clouds? And how do we build the "done?" boxes?
Muller C-Element
Add a weak feedback inverter if we're worried about dynamic storage for the precharge/eval signal.
[Figure: C-element with inputs Pdone and Sdone driving the stage's P/E signal]
The Muller C-element is the "AND gate" of self-timed logic because it changes its output only after both inputs have changed. As shown above, it's an elegant implementation of both sets of rules on the previous slide.
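Below is a behavioral Python sketch of the C-element and of how it implements the pipeline rules. It assumes P/E = 1 means "evaluate". A stage then evaluates when Pdone = 1 and Sdone = 0 and precharges when Pdone = 0 and Sdone = 1. In every other case it holds its current command. That is a C-element fed with Pdone and the complement of Sdone. The encoding and function names are illustrative assumptions.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: the output follows the inputs when they agree,
    otherwise it holds its previous value."""
    return a if a == b else prev


def stage_control(pdone: int, sdone: int, pe_prev: int) -> int:
    """Next P/E command of a self-timed stage (1 = evaluate, 0 = precharge).
    Evaluate:  predecessor done evaluating (Pdone=1), successor precharged (Sdone=0).
    Precharge: predecessor precharged (Pdone=0), successor done evaluating (Sdone=1)."""
    return c_element(pdone, 1 - sdone, pe_prev)


if __name__ == "__main__":
    for pdone in (0, 1):
        for sdone in (0, 1):
            for prev in (0, 1):
                print(f"Pdone={pdone} Sdone={sdone} P/E was {prev} -> {stage_control(pdone, sdone, prev)}")
```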

Completion Detectors
Use dual-rail signalling (i.e., two wires) to encode:
- reset (not yet evaluated): 00
- ready with value 0: 01
- ready with value 1: 10
and then build handshake logic that starts the next stage when the current stage is done and the next stage has completed its previous computation and delivered its values...
[Figure: self-timed logic]
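The dual-rail code makes completion detection purely combinational. A bit is done when either of its two wires is high, and a word is done when every bit is done. Below is a minimal Python sketch of this. It writes each wire pair in the same order as the codes 01/10 above and treats 11 as illegal. The function names are hypothetical.

```python
from typing import List, Tuple

Pair = Tuple[int, int]   # the two wires, in the same order as the codes 01 / 10
RESET: Pair = (0, 0)     # not yet evaluated (precharged)


def encode(bit: int) -> Pair:
    """Value 0 -> 01, value 1 -> 10."""
    return (1, 0) if bit else (0, 1)


def decode(pair: Pair) -> int:
    if pair == (1, 0):
        return 1
    if pair == (0, 1):
        return 0
    raise ValueError(f"no valid value on wires {pair}")


def word_done(word: List[Pair]) -> bool:
    """Completion detector: OR the two wires of each bit, AND across all bits."""
    return all(w1 | w0 for w1, w0 in word)


def word_reset(word: List[Pair]) -> bool:
    """All bits back in the reset (00) state, i.e. precharge has completed."""
    return all(p == RESET for p in word)


if __name__ == "__main__":
    word = [encode(b) for b in (1, 0, 1)]
    print(word_done(word), [decode(p) for p in word])
    print(word_done([encode(1), RESET, encode(0)]))   # one bit still evaluating
```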

Self-timed Pipeline Latency

[Figure: three-stage self-timed pipeline F1, F2, F3; each stage has a C-element (C) generating P/E and a "done?" detector. Encoding: 1 = precharged, 0 = evaluation done]
Propagation through self-timed pipelines is constrained in both directions:
- in the forward direction, by how long it takes for the evaluate edge in one stage to trigger the evaluate edge in the next stage:
$$L_F = t_F + t_D + t_C$$
- in the reverse direction, by how long it takes for the precharge in one stage to trigger a new evaluate in the stage after first evaluating the previous stage (remember not to double count!):
$$L_R = 0.5\,(t_C + t_F + t_D + t_C + t_F + t_D)$$

Further Improvements
We don't have to delay evaluation until the successor has finished its precharge (signalling that it's finished with our values). We can just check that the successor has started precharging. Even with this improvement, the correct sequencing will still happen for any combination of precharge and evaluate times for all the gates. We can modify the control element like so:
[Figure: modified control element with inputs Pdone and Sdone (S) driving P/E]
Eliminate the extra inverter for good measure and use dynamic storage as the control element's memory.
We're going to stop here, but there are other improvements that can be made. Hint: do we have to wait until the predecessor is done computing new values before starting our eval? Etc., etc., etc.
Dynamic Logic Summary

Advantages of dynamic logic:
- smaller area than fully static gates
- smaller parasitic capacitances, hence higher speed
- reliable operation if correctly designed
Concerns:
- capacitive coupling to dynamic nodes
- charge sharing with dynamic nodes
- subthreshold leakage currents in eval logic
- minority carrier injection and latchup
- alpha particle immunity
- vdd/gnd noise and resistance
This makes dynamic logic a good choice for those parts of a circuit where the extra engineering investment is justified, e.g., along the critical timing paths.
Engineers who like this sort of design will find this the sort of design they like!

Coming Up...
Next topic: CMOS sequential logic. Readings for next time ... Weste:
- 5.4.4 (dynamic CMOS logic)
- 5.4.7 to 5.4.11 (CMOS domino logic, CVSL), except 5.4.10

Weste pp. 371: 5.9ex8 (difficulty: easy): Design a CVSL gate for the following function:
S = A B C + A B C + A B C + A B C

VLSI Design I
Clocking Strategies
I take care of it ?
Today's handouts: (1) Lecture Slides

JMM/ESA v1.0
Clock Generator
VLSI Systems Design

Microelectronic Technologies
Overview: microelectronic technologies, ASIC, FPGA, µC. Goal: You are familiar with the microelectronic technologies and know their advantages and features.
Microelectronic Technologies
What is microelectronics? Does a microelectronic design engineer only need good knowledge of silicon, layout, etc.?
[Figure: overview of technologies: application specific integrated circuits (full custom, macro cell, standard cell, gate array); microprocessors (PIC, COP, RISC, µController, signal processor); field programmable logic (FPGA, PAL, CPLD)]

Gate Array Technology #1

Prefabricated wafers:
- I/O stages predefined
- regular array of fets and interconnection channels
- interconnection defines functionality
Features:
- size: 100 to 1M gates
- short turnaround time
- cheap at medium quantities
- unsuitable for regular structures like RAM, PLA, ALU

Gate Array Technology #2

[Figure: three cells of a gate array; one cell corresponds to a 2-input NAND gate]

Sea-of-Gate Technology
Prefabricated wafers:
- I/O stages predefined
- regular array of fets, no reserved interconnection channels
- interconnection defines functionality
Features:
- size: 100 to 1M gates
- short turnaround time
- cheap at medium quantities
- regular structures like RAM, PLA, ALU can be used

SOG Example
[Figure: sea-of-gates layout example implementing an INV and a NOR2. Annotations:]
- nwell contacts and substrate contacts; VDD and GND rails
- 3 nfets and 3 pfets (2 small, 1 large mosfets with common gate)
- gate isolation mosfets
- horizontal wiring tracks in metal-1; vertical wiring tracks in metal-1 or metal-2
- unused horizontal and vertical tracks are used for wiring gates together; better granularity if the main routing channels run vertically

Standard Cell Technology

Complete fabrication process:
- predefined library of base functions
- modular, similar to TTL families
Features:
- chip size limits complexity
- long turnaround time
- cheap at high quantities
- standardized cell height
- unsuitable for regular structures
- more flexible and compact (1:4) than gate array

Standard Cell Example

Create a library of pre-laid-out cells, e.g., boolean gates, registers, muxes, adders, I/O pads. A data sheet for each cell describes the cell's function, area, power, propagation delay, output rise/fall time as a function of load, etc. Quiz: what's the cell's function?
It's just like designing with board-level components. CAD tools help with placing the cells to minimize area and meet timing constraints (perhaps directed by a floorplan created by the user); routers make the appropriate connections between the cells.
Full Custom Technology

- total flexibility, only limited by layout rules
- manual design
Features:
- chip size limits complexity
- long design and fabrication time
- efficient use of silicon area
- cheap only at highest quantities (e.g., µP, memories, ...)

Macrocell Technology #1
- combines semi- and full custom technologies
- predefined library of base functions
- generators for regular structures
Features:
- chip size limits complexity
- short design, long fabrication time
- cheap at high quantities
- high flexibility, compact layouts
[Figure: macro cell chip with PLA and RAM blocks]

Macrocell Technology #2
[Figure: 2-D array of full custom blocks and a standard cell block]

FPGA Technology #1
Field programmable device:
- no fabrication needed for customizing
- predefined logic blocks
- unsuitable for regular structures
Features:
- size: up to 2,000,000 logic gates (see Virtex from Xilinx)
- large silicon area necessary (72 million fets, 10x Pentium II)
- short design and customization time
- cheap for small quantities
- compared to ASICs, FPGAs have a reduced clock speed
- circuit configuration downloadable (RAM or PROM)

FPGA Technology #2
[Figure: FPGA architecture: configurable logic blocks (CLB) and switching matrices connected by routing channels, surrounded by I/O buffers]
Configuration:
- mask programmable
- one time programmable
- downloading of configuration from host into internal RAM
- downloading of configuration from on-board serial ROM
FPGA Technology #3
[Figure: CLB from the Xilinx XC5200 series. Labels: inputs F1 to F4, G1 to G4, C1 to C4 (H1, Din/H2, SR/H0, EC) and clock K; logic functions of F1 to F4, of G1 to G4, and of F, G and H1; S/R control; two D flip-flops with bypass; outputs X, Y, XQ, YQ]
FPGA Technology #4
[Figure: switching matrix with CLBs: a grid of CLBs with programmable switch matrices (PSM) in the routing channels between them]

µC Technology
Field programmable device:
- no fabrication needed for customizing
- simple C software compilers
- software vs. hardware solutions
Features:
- 4 or 8 bit CPU, size: 512 bytes or more
- down to 8 pins
- A/D converter, USART, timer, etc. included
- very slow compared to hardware solutions
- cheap (<$2)
[Figure: PIC, 36 mm]
How to select a technology

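Selection arguments:
- cost
- speed
- size
- time to market
[Figure: cost vs. quantity for FPGA and ASIC; the ASIC starts higher (NRE and design cost) but rises more slowly, and the two lines cross at the break-even number of units]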

Coming Up...
Next topic: hardware description language VHDL, top-down design. Readings for next time: Xilinx article "The total cost of ownership"

#1
Ex vlsi08.1 (difficulty: easy): Calculate the break-even point between an FPGA and an ASIC design. Assume a design time of 6 months and an additional back-end design time of 1 month for the ASIC. The NRE costs of the ASIC are 75 kEuro; the cost per unit is 150 Euro for the FPGA and 3 Euro for the ASIC. The cost of one engineer per month is 10 kEuro. Result: break-even at 578 units.
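The stated result can be reproduced with a linear cost model: total cost = fixed cost (engineering time plus NRE) + unit cost x quantity. The break-even quantity is where the two cost lines cross. This Python sketch assumes one engineer. The FPGA then costs 6 engineer-months, and the ASIC costs 6 + 1 engineer-months plus NRE, as in the exercise text.

```python
import math


def break_even(fixed_a: float, unit_a: float, fixed_b: float, unit_b: float) -> float:
    """Quantity n where fixed_a + unit_a*n == fixed_b + unit_b*n.
    Assumes option a has the lower fixed cost and the higher unit cost."""
    return (fixed_b - fixed_a) / (unit_a - unit_b)


if __name__ == "__main__":
    engineer_month = 10_000                             # Euro per engineer-month
    fpga_fixed = 6 * engineer_month                     # design only
    asic_fixed = (6 + 1) * engineer_month + 75_000      # design + back-end + NRE
    n = break_even(fpga_fixed, 150, asic_fixed, 3)
    print(f"break-even at {n:.1f} units; ASIC cheaper from {math.ceil(n)} units on")
```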

#2
Ex vlsi08.2 (difficulty: medium): Calculate the break-even point between an FPGA and an ASIC design. Assume the design costs from exercise vlsi08.1 and a fabrication time of 3 months for the ASIC. The revenue per sold system over a product lifetime of 4 years is 600 Euro, not taking into account the FPGA/ASIC chip costs. Use the triangular time-to-market model from Synopsys (see Xilinx article "The total cost of ownership"). Result: break-even at 14068 FPGA solutions.
[Figure: triangular time-to-market model: units/time vs. time over the product life L; delayed market introduction reduces the maximum available revenue]

VLSI Design I
Regular Logic Structures

JMM v1.2
Goals for Regular Logic Structures

Look for a systematic physical structure:
- get a handle on layout for random logic
- automate the layout task once the schematic is done
- may have several structures to choose from, each optimized along a different design dimension
(e.g., standard cells, gate arrays)
But we still have to draw the schematic! So look for systematic logical structures:
- may lead to additional systematic physical layouts
- find canonical logic representations that can be automatically turned into compact physical structures (automate, automate, automate)
- would like to be able to make changes in the logic without having to redo the entire layout: look for ECO-tolerant structures (engineering change orders)
(e.g., muxes, ROMs, PLAs)

Useful Logic Forms

Truth tables
- direct implementation as muxes, ROMs
- good when you have many outputs and few inputs, since the cost of decoding inputs is fixed
- ECO-tolerant but often not an efficient use of logic
Minimum Sum-of-Products (SOP, AND-OR):
- minimize the no. of literals (small fan-in ANDs) or the no. of products (small fan-in ORs)
- maximum sharing of product terms for multiple-output functions
- if fan-ins are small: direct implementation as complex gates or as 2 levels of ANDs then ORs
- if fan-ins aren't small: multiple levels of gates (e.g., parity, whose Achilles heel is $2^{n-1}$ minterms)
- efficient use of logic, but not very ECO-tolerant
But how do we minimize the number of literals or minterms? Yeah, we know about Karnaugh maps, but they aren't so good for more than 4 inputs or for maximizing minterm sharing.

Logic Manipulation
Start with two-level minimization
- by inspection, searching for terms that are logically adjacent: $p\,x + p\,\bar{x} = p\,(x + \bar{x}) = p \cdot 1 = p$
- Karnaugh maps for simple situations
- Quine-McCluskey otherwise
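Quine-McCluskey rests on the adjacency rule p x + p x' = p (where ' denotes complement), applied systematically. Below is a compact Python sketch of its first phase, which finds all prime implicants. It writes '-' for an eliminated variable. The second phase, which selects a minimum cover of the minterms from the primes, is not shown. The example function is the 3-input majority function (minterms 3, 5, 6, 7).

```python
from itertools import combinations
from typing import Iterable, Optional, Set


def merge(a: str, b: str) -> Optional[str]:
    """Combine two implicants (strings over '0', '1', '-') that differ in exactly one
    position where both have a fixed value: p x + p x' = p."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "-" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + "-" + a[i + 1:]
    return None


def prime_implicants(minterms: Iterable[int], n: int) -> Set[str]:
    """Phase 1: merge adjacent terms pass by pass; terms that merge with
    nothing in a pass are prime implicants."""
    terms = {format(m, f"0{n}b") for m in minterms}
    primes: Set[str] = set()
    while terms:
        merged, used = set(), set()
        for a, b in combinations(sorted(terms), 2):
            c = merge(a, b)
            if c is not None:
                merged.add(c)
                used.update((a, b))
        primes |= terms - used
        terms = merged
    return primes


if __name__ == "__main__":
    # Variables ordered A, B, C; majority has minterms 3, 5, 6, 7.
    print(sorted(prime_implicants([3, 5, 6, 7], 3)))   # ['-11', '1-1', '11-'] = BC, AC, AB
```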
Then try to generate multiple levels:

- factoring: choose the literal that appears in most product terms (>1) and factor it out:
$$F = ac + ad + bc + bd + ae = a(c + d) + b(c + d) + ae$$
- factor again with or-terms that appear in multiple places:
$$F = (a + b)(c + d) + ae$$
- find common subexpressions (multiple-output decomposition)
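Factoring is only legal if the function stays the same. The Python sketch below brute-forces all 2^5 input combinations and confirms that the factored form has the same truth table as the original sum of products. The factored form uses 6 literals instead of 10.

```python
from itertools import product


def f_sop(a, b, c, d, e):
    return (a & c) | (a & d) | (b & c) | (b & d) | (a & e)


def f_factored(a, b, c, d, e):
    return ((a | b) & (c | d)) | (a & e)


def same_function() -> bool:
    return all(f_sop(*v) == f_factored(*v) for v in product((0, 1), repeat=5))


if __name__ == "__main__":
    print(same_function())
```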

Muxes as lookup tables

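A B C | F
0 0 0 | 0
0 0 1 | 0
0 1 0 | 0
0 1 1 | 1
1 0 0 | 0
1 0 1 | 1
1 1 0 | 1
1 1 1 | 1
[Figure: F as an 8:1 mux lookup table selected by A,B,C, or as a 4:1 mux selected by A,B with data inputs 0, C, C, 1]
Easy to implement but not necessarily compact, even when implemented with TGs. But you can make a nice Boolean Unit: a 4:1 mux selected by A,B passes one of OP0 to OP3 (tied to Vcc or gnd) to the output F. The function is chosen by OP<3:0>:
       ZERO  AND  OR  XOR
OP0     0     0   0    0
OP1     0     0   1    1
OP2     0     0   1    1
OP3     0     1   1    0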
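Below is a small behavioral Python model of both mux uses. It assumes OP0 is selected when A = B = 0 and OP3 when A = B = 1; this is consistent with the table, since all four functions are symmetric in A and B. `mux4` and the `OPS` dictionary are hypothetical names.

```python
def mux4(sel_hi: int, sel_lo: int, inputs) -> int:
    """4:1 multiplexer: inputs[0..3] selected by the 2-bit code (sel_hi, sel_lo)."""
    return inputs[2 * sel_hi + sel_lo]


def f_lookup(a: int, b: int, c: int) -> int:
    """F from the truth table: A,B select among the data inputs 0, C, C, 1."""
    return mux4(a, b, (0, c, c, 1))


# Boolean unit: the OP0..OP3 pattern on the mux data inputs picks the function.
OPS = {
    "ZERO": (0, 0, 0, 0),
    "AND":  (0, 0, 0, 1),
    "OR":   (0, 1, 1, 1),
    "XOR":  (0, 1, 1, 0),
}


def boolean_unit(op: str, a: int, b: int) -> int:
    return mux4(a, b, OPS[op])


if __name__ == "__main__":
    for op in OPS:
        print(op, [boolean_unit(op, a, b) for a in (0, 1) for b in (0, 1)])
```

Changing the function only means changing four constants. This is why muxes are ECO-tolerant.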

Read-only Memories
[Figure: 8-word ROM with address inputs A, B, C, rows 0 to 7 and outputs F1, F0; a mark indicates that a connection or mosfet is present, blank otherwise]
Address decoder implemented as AND (= NOR). Note: all but one row pulled down for a given input.
For each Fi, OR together all rows for which the output is 1 (actually use NOR, then invert).
Like muxes, but share decoding logic among all outputs. Potential optimizations:
- delete rows with no output pulldowns
- look for adjacent rows with identical output pulldown configurations and merge them into a single row
Are these worth doing?
PLAs
In fact, the optimizations from the previous slide are so worthwhile that we have a name for the resulting optimized ROM: Programmed Logic Array, or PLA for short.
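[Figure: the optimized ROM as a PLA: inputs A, B, C; an AND plane whose rows correspond to the merged ROM rows 4,5,6,7 / 2,3 / 1; an OR plane producing F1 and F0]
Hint: for greater ECO-tolerance, add a few extra empty rows!
What are the logic equations for F1 and F0?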
PLAs are usually constructed directly from minimized SOP logic equations: the rows represent the product terms of the equations, the input (AND-plane) columns form the product terms and the output (OR-plane) columns form the sums. Note that with multiple output columns, sharing of product terms between the outputs happens naturally...
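A PLA can be modelled directly from its SOP equations. Each AND-plane row is a product term, written as a cube over the inputs ('1' = true literal, '0' = complemented literal, '-' = not connected). Each OR-plane column lists the rows it sums. The two output equations in this Python sketch are hypothetical and not the slide's F1/F0. They were chosen so that the product term B C is shared by both outputs.

```python
# Hypothetical example: F1 = A B + B C,  F0 = B C + A' B' C'   (inputs ordered A, B, C)
AND_PLANE = ["11-", "-11", "000"]          # rows: A B, B C, A' B' C'
OR_PLANE = {"F1": {0, 1}, "F0": {1, 2}}    # row 1 (B C) is shared by both outputs


def pla_eval(and_plane, or_plane, inputs):
    """AND plane forms the product terms, OR plane sums the selected rows."""
    rows = [all(lit == "-" or int(lit) == x for lit, x in zip(cube, inputs))
            for cube in and_plane]
    return {out: int(any(rows[r] for r in members)) for out, members in or_plane.items()}


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                print(a, b, c, pla_eval(AND_PLANE, OR_PLANE, (a, b, c)))
```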

PLA Folding
PLAs can be sparse, i.e., only a few of the possible connections in either plane may be made. (AND plane can only have 50%!)
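[Figure: sparse PLA with input columns for A, B, C, D and their complements, product rows 1 to 6, and outputs F1, F2]
If we allow inputs and outputs to come from both above and below, then we may be able to fold two columns into one if the rows they use don't overlap. This may require rearranging the rows to minimize overlap and hence maximize folding possibilities.
Row folding is another possible optimization (but not in this example).
[Figure: the folded PLA: rows reordered to 6, 1, 2, 3, 4, 5; A, B (true and complement) and F1 enter from one side, D, C (true and complement) and F2 from the other]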
Multiple-input encoding
On the previous slide, it was noted that the AND plane can have at most 50% of its connections programmed. Why? To improve the utilization of the input columns, consider encoding the 4 columns used to transmit the two input literals and their complements with some more useful functions of the two literals. For example:
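[Figure: the four input columns A, B and their complements are replaced by four columns each carrying one product of A or its complement with B or its complement, decoded from AIN and BIN]
You get extra computing oomph: for example, it's now possible to compute (A xor B) using a single row rather than the two rows it took with the old encoding.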
Datapath Operators
Most digital functions can be divided into the following categories:
- datapath operators
- memory elements
- control structures
- I/O cells
Datapath operators form an important subclass of VLSI design that benefits from the structured design principles of hierarchy, regularity, modularity and locality.
- n-bit data is generally processed by n identical subcircuits. Data operations may be sequenced in time or space.

Datapath Operator Example

Magnitude operator example:
- data may be arranged to flow in one direction
- control signals are introduced in a direction orthogonal to the dataflow
If (A <= B) then Z = A else Z = B
[Figure: m-bit bit-sliced datapath: operand bits Am ... A0 and Bm ... B0 feed a subtractor; an equal-zero detector and the subtractor generate the "less than or equal" control (ctrl), which runs across all bit slices in metal1 and drives a mux in each slice producing Zm ... Z0; data flows in metal2]
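Below is a bit-level Python sketch of the magnitude operator, under stated assumptions. The subtractor computes A - B with a ripple borrow, and A <= B is taken as (borrow out) OR (difference equal to zero). That single control bit is shared by every slice's 2:1 mux. The operands are assumed to be unsigned, and the helper names are hypothetical.

```python
def subtract(a_bits, b_bits):
    """Ripple-borrow subtractor on LSB-first bit lists: returns (A - B bits, borrow out).
    For unsigned operands, borrow out = 1 exactly when A < B."""
    diff, borrow = [], 0
    for a, b in zip(a_bits, b_bits):
        diff.append(a ^ b ^ borrow)
        borrow = int((not a and (b or borrow)) or (b and borrow))
    return diff, borrow


def to_bits(x, m):
    return [(x >> i) & 1 for i in range(m)]


def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))


def magnitude_select(a, b, m):
    """If (A <= B) then Z = A else Z = B, computed slice by slice."""
    a_bits, b_bits = to_bits(a, m), to_bits(b, m)
    diff, borrow = subtract(a_bits, b_bits)      # subtractor
    equal_zero = int(not any(diff))              # equal-zero detector
    le = borrow | equal_zero                     # control bit, runs across all slices
    return from_bits([ai if le else bi for ai, bi in zip(a_bits, b_bits)])  # per-slice mux


if __name__ == "__main__":
    print(magnitude_select(9, 4, 4), magnitude_select(3, 12, 4))
```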

Coming Up...
Next topic: Sequential logic: state elements, latches and registers. Static vs. dynamic storage. Single and multiphase clocking strategies. Setup and hold times; propagation delays. Readings for next time: Weste:
- Sections 8.1 thru 8.2 (data operators)
- 8.3.2, 8.4.2 (just read, don't study)

VLSI Design I
CMOS Sequential Logic Clocking Strategies
Overview: single and double phase clock systems; latch and FF timing. Goal: You are familiar with static and dynamic latches/FFs as well as with single and double phase clocks, clock distribution, clock skew and PLL clocking techniques.
Sequential Logic
Use #1: Get better utilization from idle combinational logic blocks. Pipeline the system so that new computations start before the old ones complete. Add registers to keep computations separate.
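[Figure: 8-bit inputs A and B processed by a multiplier (x), with registers separating successive computations]
Use #2: Convert parallel operations to a sequence of (faster, smaller) serial operations.
[Figure: 8-bit operands added serially with a 1-bit adder (+)]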
Use #3: Need to process a sequence of inputs and want to reuse the same hardware (finite state machine).

Latches and Flip-Flops

[Figure: level-sensitive latch (D, G, Q): Q follows D while G is active and is stable otherwise; edge-sensitive flip-flop (D, clk, Q): Q takes its value from D at the clock edge and is stable otherwise]
A static latch will hold data while G is inactive, however long that may be. A dynamic latch will hold data while G is inactive, but only for a while, after which the saved value may decay.
Do static latches dissipate static power? How long is for a while? Which one should I use?
Latch Timing Constraints #1

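[Figure: latch a and latch b (D, Q, G) connected through the logic clouds CLa and CLb and clocked by opposite phases of CLK; the timing diagram marks the hold (H) and setup (S) windows and the intervals t1a and t2b]
Do I have to check ALL these constraints?
$$t_{1a} = t_{nqa} + t_{nla} > t_{hb} \qquad t_{1b} = t_{nqb} + t_{nlb} > t_{ha}$$
$$t_{2a} = t_{xqa} + t_{xla} < t_{c0} - t_{sb} \qquad t_{2b} = t_{xqb} + t_{xlb} < t_{c1} - t_{sa}$$
where
- $t_h$ = hold time
- $t_s$ = setup time
- $t_n$ = min delay from invalid input to invalid output
- $t_x$ = max delay from valid input to valid output
- $t_l$ = delay of the combinational logic from input to output
- $t_q$ = delay of the memory element from G to Q
- $t_{c0}$ = low period of clock cycle $t_c$ (and $t_{c1}$ the high period)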

Latch Timing Constraints #2

[Figure: CLK timing diagram with hold (H) and setup (S) windows and the intervals t1a and t2b]
$$t_{1a} = t_{nqa} + t_{nla} > t_{hb} \qquad t_{1b} = t_{nqb} + t_{nlb} > t_{ha}$$
$$t_{2a} = t_{xqa} + t_{xla} < t_{c0} - t_{sb} \qquad t_{2b} = t_{xqb} + t_{xlb} < t_{c1} - t_{sa}$$
Questions for latch-based designs:
- how much time for useful work (i.e., for combinational logic delay)? Adding the two setup constraints (with equal $t_s$ and $t_{xq}$ for both latches) gives $t_{xla} + t_{xlb} < t_c - 2(t_s + t_{xq})$
- what is the maximal clock frequency? $1/f = t_c > 2(t_{xq} + t_{xl} + t_s)$
- does it help to guarantee a minimum $t_n$, for example by requiring a minimum number of gates in each cloud?
- suppose the maximum clock skew is $t_{SKEW}$. How does that affect the equations above? Clock skew measures the difference in arrival of CLK at two cascaded latches (not necessarily any two latches!).
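The four latch constraints translate directly into a small checker. This Python sketch assumes all delays are in ns, one parameter set per latch plus its logic cloud, and no clock skew. The parameter names follow the t_n/t_x/t_q/t_l/t_s/t_h definitions, and the `LatchStage` class and example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LatchStage:
    """A latch followed by the logic cloud it drives (all delays in ns)."""
    tnq: float  # min G-to-Q delay of the latch
    txq: float  # max G-to-Q delay of the latch
    tnl: float  # min delay through the logic cloud
    txl: float  # max delay through the logic cloud
    ts: float   # setup time of this latch
    th: float   # hold time of this latch


def check_latch_pair(a: LatchStage, b: LatchStage, tc0: float, tc1: float) -> dict:
    """Loop latch a -> cloud a -> latch b -> cloud b -> latch a, no clock skew."""
    return {
        "t1a > thb": a.tnq + a.tnl > b.th,
        "t1b > tha": b.tnq + b.tnl > a.th,
        "t2a < tc0 - tsb": a.txq + a.txl < tc0 - b.ts,
        "t2b < tc1 - tsa": b.txq + b.txl < tc1 - a.ts,
    }


def min_period(txq: float, txl: float, ts: float) -> float:
    """Identical stages: t_c > 2 (t_xq + t_xl + t_s)."""
    return 2 * (txq + txl + ts)


if __name__ == "__main__":
    s = LatchStage(tnq=0.1, txq=0.3, tnl=0.2, txl=1.5, ts=0.2, th=0.1)  # hypothetical
    print(check_latch_pair(s, s, tc0=2.5, tc1=2.5))
    print("t_c must exceed", min_period(s.txq, s.txl, s.ts), "ns")
```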
Static Latches
Basic idea:
[Figure: storage loop of two inverters; a 2:1 mux (inputs 0/1) controlled by CLK selects between D and the fed-back value]
- Need gain around this loop to make the latch static.
- Want the storage node to be isolated from whatever the user does to Q.
- Would like fast CLK-to-Q, small setup and zero hold times.
Obvious implementation:
[Figure: transmission-gate latch with inputs D, CLK and CLKN and output Q]
- Oops: feedback not isolated from Q. Could add additional output inverters...
- Good! Input goes only to fet gates.
Should we buffer CLK 0, 1 or 2 times?
Latch Timing
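[Figure: static latch with internal nodes 1 and 2, input D, clock CLK and output Q]
Setup time = how long the D input has to be stable before the CLK transition. Hold time = how long the D input has to be stable after the CLK transition.
[Figure: timing diagram: D must be stable for t_s before and t_h after the CLK transition; nodes 1 and 2 follow]
So, what node should we use to measure setup and hold times? And what should we measure? Other time of interest: CLK-to-Q.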
Dynamic Latches
Suppose in the interest of speed we were willing to give up the static guarantee and take our chances with dynamic latches, i.e., remove feedback path...
[Figure: dynamic latch with input D, clocks CLK and CLKN, and output Q]
- Eliminate when Q fanout is small (1).
- Can combine other logic with the inverter.
- Local or global clock inverter?
Can we do without the CLK inverter too? DEC did without it on the 21064 but put it back in for the 21164.
[Figure: dynamic latch without CLKN]
Delete the PFET driven by CLKN and then add an NFET driven by CLK in Q's pulldown path to handle what happens when D goes from 1 to 0.
Flip-flops (registers)

Using alternating positive and negative dynamic latches with a single clock gives great speed and small area, but:
- lots of worries about clock skew
- must balance logic delays to minimize wastage
- need latch size checks (check optimisations!)
What about those of us who don't have buildings full of engineers to sweat the details? Use D flip-flops and address all the problems once!
[Figure: D flip-flop built from a master latch followed by a slave latch (D, G, Q), clocked by CLK and its complement]
Flip-flop Implementations

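Obvious implementation:
[Figure: master-slave flip-flop with input D, clock CLK and output Q]
Use jam latches to lighten the CLK load:
[Figure: flip-flop built from jam latches, clocked by CLK] Weak feedback inverters (long n and p) get overridden.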

Flip-Flop Timing

[Figure: two flip-flops (D, Q, clk) clocked by CLK with combinational logic CL between them; timing intervals t1 and t2]
$$t_1 = t_{nq} + t_{nl} > t_h \qquad t_2 = t_{xq} + t_{xl} < t_c - t_s$$
Questions for register-based designs:
- how much time for useful work (i.e., for combinational logic delay)?
- does it help to guarantee a minimum $t_n$? How about designing registers so that $t_{xq} > t_h$?
- suppose the maximum clock skew is $t_{SKEW}$. How does that affect the equations above?
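For a register-to-register path, the two inequalities reduce to one hold check and one setup check. Solving the setup check for the clock gives the maximum frequency. This Python sketch ignores clock skew and takes all delays in ns, so frequencies come out in GHz. The numbers in the usage lines are hypothetical.

```python
def ff_constraints(tnq, tnl, txq, txl, ts, th, tc):
    """Register-to-register path without skew: returns (hold_ok, setup_ok)."""
    t1 = tnq + tnl          # earliest time the next register's D can change
    t2 = txq + txl          # latest time the next register's D settles
    return t1 > th, t2 < tc - ts


def max_frequency(txq, txl, ts):
    """Setup constraint solved for the clock: f < 1 / (t_xq + t_xl + t_s)."""
    return 1.0 / (txq + txl + ts)


if __name__ == "__main__":
    print(ff_constraints(tnq=0.1, tnl=0.1, txq=0.3, txl=1.5, ts=0.2, th=0.1, tc=2.5))
    print(f"f_max = {max_frequency(0.3, 1.5, 0.2) * 1e3:.0f} MHz")
```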

Dynamic Flip-Flops

I'll have the Christer Svensson special, please!
[Figure: dynamic flip-flop with input D, clock CLK, internal nodes 1 and 2, and output QN]
CLK is low:
- node 1 follows not(D)
- node 2 is pulled up
- QN is floating with its old value
CLK is high:
- node 2 = 0 if node 1 = 1, otherwise it stays 1
- node 2 = not(node 1) shortly after the CLK edge
- QN = not(node 2), stable soon after the CLK edge
- node 1 can be pulled down if D goes to 0 (capacitive coupling), but node 2 won't change!
Single-Phase Clocked Systems

RTL #1: [Figure: registers (D, Q, clk) in series, all clocked by CLK]
Latch #2: [Figure: latches (D, Q, G) in series, all clocked by CLK]
The simplest clocking methodology is to use a single clock in conjunction with a register. Clocks are generated with global clock buffers; CLK and its complement are generated locally. Buffers are necessary for large loads.
[Figure: clk-in buffered into clk and its complement]
Clock Skew
- if a clock net is heavily loaded, there might be a race between clock and data: clock skew
- special attention has to be paid when designing the clock tree; CAD tools are able to design balanced clock trees
- two methods to avoid clock skew:
[Figure: cascaded registers (D, Q, clk) with delay elements, and a latch-based alternative]
Two-Phase Clocked Systems (latch)

[Figure: latches (D, Q, G) alternately clocked by the non-overlapping two-phase clocks PHI1 and PHI2]
- a problem in single-phase clocked systems is the generation and distribution of nearly perfectly overlapping clocks
- in two-phase clocked systems this is solved by non-overlapping clocks
- non-overlapping clocks can be generated with latch structures
[Figure: non-overlapping clock generator producing phi1 and phi2 from clk]
Two-Phase Clocked Systems (FF)

[Figure: flip-flops (D, Q, clk) clocked alternately by CLK and its complement: non-overlapping two-edge clocks]
- in properly designed two-edge clocked systems, clock skew problems are drastically reduced
- disadvantage: 50% speed reduction
- typical application: FSM on the rising edge, datapath on the falling edge
- designs with several FSMs and datapaths need thorough design

Clock Distribution
Two main techniques for clock distribution exist:
- a single large buffer (see Alpha processor)
- a distributed clock tree approach
[Figure: a clk driver feeding a clock tree that drives twelve n-bit datapaths; delays have to match between stages; the clk driver sits between vdd and gnd pads]
There is no such thing as a design-free clocking strategy in today's high-performance processes. Clock buffers should be surrounded by power pads due to their large power consumption.

Phase Locked Loop Clock Technique

Phase locked loops (PLL) are used to generate internal clocks on chips for two main reasons:
- to synchronize the internal clock of a chip with an external clock
- to operate the internal clock at a higher rate than the external clock input
[Figure: clock and data-out timing of a chip with and without a PLL; labels: clock route delay dclk, pad delay dpad, total dclk+dpad]

PLL #2
[Figure: PLL block diagram: a phase detector compares fosc with the fed-back frequency ffeed and issues up/down pulses to a charge pump; the filter voltage Ufilter drives a voltage controlled oscillator (VCO) running at n x fosc; a divider by n produces ffeed]
The phase detector produces a sequence of up/down pulses, which are used to switch a charge pump. The charge pump charges/discharges a capacitor with voltage or current pulses. A filter is used to limit the rate of change of the capacitor voltage. The result is a slowly changing voltage that depends on the frequency difference between the reference clock and the divided VCO output. The VCO increases/decreases its frequency of operation depending on its input voltage.
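The Python sketch below is a crude behavioral model of this loop, and is not a real PLL design. It compares frequencies only and ignores phase, the actual filter transfer function and VCO dynamics. Each up/down pulse nudges the capacitor voltage by a fixed step `dv`, which stands in for the filter's limited rate of change. All parameter values (free-running frequency, VCO gain, step size) are hypothetical.

```python
def pll_settle(f_ref, n, f_free=50e6, k_vco=100e6, dv=1e-3, steps=20_000):
    """Bang-bang, frequency-only PLL sketch.
    f_free: VCO frequency at 0 V; k_vco: VCO gain in Hz/V; dv: voltage step per pulse."""
    v = 0.0                                   # capacitor (filter) voltage
    f_vco = f_free
    for _ in range(steps):
        f_feed = f_vco / n                    # divider by n
        if f_feed < f_ref:
            v += dv                           # "up" pulse: charge pump adds charge
        elif f_feed > f_ref:
            v -= dv                           # "down" pulse: charge pump removes charge
        f_vco = f_free + k_vco * v            # VCO follows the slowly changing voltage
    return f_vco


if __name__ == "__main__":
    print(f"{pll_settle(25e6, 4) / 1e6:.1f} MHz")   # settles near 4 x 25 MHz
```

When the loop settles, the VCO dithers within one step (k_vco * dv) around n x fosc.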
Static Timing Analysis

Do I have to check ALL the constraints? Yup, for every pair of connected register/latches AND for all possible data values!
We need a CAD tool: a static timing analyser. Here's how it works:
Step 1: Level-ize all signal nodes. Start by assigning all register outputs and top-level inputs a level of 0. For all other gates: $\text{level}_{OUTPUT} = \max(\text{level}_{INPUT}) + 1$.
Step 2: Compute min/max signal delays. For each successive node level, compute the min and max time for all nodes on that level (see next slide for details). This is a data-independent computation. Might need case analysis to avoid false paths.
Step 3: Check setup and hold constraints. Use the min times of register inputs to check hold time. Use the max times and $t_{CLK}$ to check setup time, or use max time + $t_{SETUP}$ to determine the min $t_{CLK}$.
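Below is a minimal Python sketch of the three steps on a tiny hypothetical netlist, with delays in ns. Each node has one min and one max delay. This sketch does not separate rising and falling edges and does not handle false paths. Nodes without fan-in are register outputs or top-level inputs, and their delays stand for clock-to-Q. Node g3 is assumed to drive a register input.

```python
from typing import Dict, List, Tuple

Netlist = Dict[str, Tuple[List[str], float, float]]   # node -> (fan-in, min delay, max delay)

NETLIST: Netlist = {                                   # hypothetical example
    "r1": ([], 0.1, 0.2),
    "r2": ([], 0.1, 0.2),
    "g1": (["r1", "r2"], 0.1, 0.3),
    "g2": (["g1", "r2"], 0.2, 0.5),
    "g3": (["g1", "g2"], 0.1, 0.4),                    # drives a register input
}


def levelize(net: Netlist) -> Dict[str, int]:
    """Step 1: level 0 for sources, otherwise max(level of inputs) + 1."""
    level: Dict[str, int] = {}

    def lv(node: str) -> int:
        if node not in level:
            level[node] = max((lv(f) for f in net[node][0]), default=-1) + 1
        return level[node]

    for node in net:
        lv(node)
    return level


def min_max_times(net: Netlist):
    """Step 2: earliest and latest change time of each node, level by level."""
    level = levelize(net)
    tmin: Dict[str, float] = {}
    tmax: Dict[str, float] = {}
    for node in sorted(net, key=lambda k: level[k]):
        fanin, dmin, dmax = net[node]
        tmin[node] = min((tmin[f] for f in fanin), default=0.0) + dmin
        tmax[node] = max((tmax[f] for f in fanin), default=0.0) + dmax
    return tmin, tmax


def check_register_input(node, tmin, tmax, t_hold, t_setup, t_clk):
    """Step 3: min time against hold, max time + setup against the clock period."""
    return tmin[node] > t_hold, tmax[node] + t_setup < t_clk


if __name__ == "__main__":
    tmin, tmax = min_max_times(NETLIST)
    print(levelize(NETLIST))
    print(f"g3: earliest {tmin['g3']:.2f} ns, latest {tmax['g3']:.2f} ns")
    print("min t_CLK with t_SETUP = 0.2 ns:", round(tmax["g3"] + 0.2, 2))
```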
Stage Delay Computation

Look at each gate and use knowledge of input timing and rise/fall timing to compute the earliest and latest time the output could change, for both rising and falling output transitions.
[Figure: clocked CMOS stage (IN, CLK, CLKN, OUT, VDD, GND) with internal nodes 1 and 2 and capacitances C1, C2 and COUT. Initial conditions for the IN transitions: min case node 1 = 0V (fast), max case node 1 = VDD (slow); min case node 2 = VDD (fast), max case node 2 = 0V (slow). Other transitions: CLK and CLKN, rising and falling]
Use the Penfield-Rubenstein model to compute $t_{d,\text{in-out}} = \sum_i R_i C_i$ over all nodes $i$ in the stage, where $R_i$ is the total effective resistance to the power rail and $C_i$ is non-zero if the node capacitor needs to be charged/discharged. Multiply by a degrading factor to account for the rise/fall time of the input.
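The sum above is simple to evaluate once each node's R_i and C_i are known. The Python sketch below uses a hypothetical two-nfet pulldown stack. The min/max cases differ only in whether the internal node must be discharged (C_i = 0 if it already sits at the rail). The `degrade` factor stands in for the input rise/fall correction, and its value is an assumption.

```python
def stage_delay(nodes, degrade: float = 1.0) -> float:
    """Penfield-Rubenstein style estimate: t_d = degrade * sum(R_i * C_i).
    nodes: (R_i, C_i) pairs; R_i = effective resistance from node i to the rail,
    C_i = 0 when node i does not need to be charged/discharged."""
    return degrade * sum(r * c for r, c in nodes)


if __name__ == "__main__":
    # Hypothetical 2-nfet pulldown stack, 10 kOhm per transistor, discharging OUT.
    out = (20e3, 50e-15)            # OUT sees both transistors on its path to GND
    mid_charged = (10e3, 10e-15)    # max case: internal node starts high, must discharge
    mid_empty = (10e3, 0.0)         # min case: internal node already low
    print(f"max: {stage_delay([out, mid_charged]) * 1e9:.2f} ns")
    print(f"min: {stage_delay([out, mid_empty]) * 1e9:.2f} ns")
```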

Coming Up...
Next topic: Data operators. Readings for next time: Weste:
- Sections 5.5 thru 5.5.6 (latch, FF)
- 5.5.8 thru 5.5.11 (clock strategy)
- 5.5.15 and 5.5.16 (clock strategy)
Self-study: Weste: PLL, section 9.3.5.3


Ex vlsi10.1 (difficulty: easy): Calculate the peak current and power consumption of a 100 MHz clock driver with rise and fall times of 1 ns driving 30k register bits at 100 fF each with Vdd = 3.3 V. Result: Ipeak = 9.9 A, Pd = 2.18 W

Intro to VLSI Systems

Finite State Machines

Excuse me, is there such a thing as unclocked sequential logic?
Wave pipelining: just assert new inputs to the logic after waiting long enough to ensure that previous values won't be corrupted. Requires very careful design of each level of logic to ensure consistent propagation delay along all paths with all possible data values. Hard to do in the face of manufacturing variations (fast N, slow P and vice versa).
Self-timed logic: use dual-rail signaling (i.e., two wires) to encode reset (not yet evaluated) = 00, ready with value 0 = 01, ready with value 1 = 10, and then build handshake logic that starts the next stage when the current stage is done and the next stage has completed its previous computation and delivered its values. Dual-rail logic works well with precharge-evaluate gates; more on this in a later lecture.

Finite State Machines
- draw and check the state transition diagram
- merge equivalent states
- perform state encoding
- design the logic implementation

Correct State Diagrams

[Figure: state transition diagram with states S1 to S9; arcs labelled in/out (e.g., 0/0, 1/0, 1/1, -/0, -/1)]
Is this a Mealy or Moore machine?
Arcs leaving a state must be:
(1) mutually exclusive: can't have two choices for a given input value
(2) collectively exhaustive: every state must specify what happens for each possible input combination. "Nothing happens" means an arc back to itself.

Merge Equivalent States

Two states are equivalent if for each possible combination of inputs (1) they have identical outputs (2) they transition to equivalent states
[Figure: state diagram with states S1 to S5; arcs labelled in/out]
Compatibility table: a triangular table with rows S2 to S5 (all but the first state) and columns S1 to S4 (all but the last state). Start by putting an X in square (Si,Sj) if Si produces a different output from Sj for some input.
[Figure: the same state diagram with the compatibility table filled in: X entries and implied pairs such as S1,S5]
Next: for each non-X square (Si,Sj), write in the pairs of states that have to be equivalent in order for Si and Sj to be equivalent. Finally: look at an entry in (Si,Sj). If the entry is Sm,Sn and (Sm,Sn) has an X, put an X in square (Si,Sj). Repeat until no more squares can be X'ed out.
Remaining squares indicate equivalent states.
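The compatibility-table procedure can be sketched in a few lines of Python. It first X-es out pairs with different outputs. It then records the implied next-state pairs and keeps X-ing out pairs until nothing changes. The FSM below is a hypothetical example (state -> {input: (next state, output)}), not the one in the figure. In it, S4 and S5 are equivalent, which in turn makes S2 and S3 equivalent.

```python
from itertools import combinations

# Hypothetical FSM: state -> {input: (next state, output)}
FSM = {
    "S1": {0: ("S2", 0), 1: ("S3", 0)},
    "S2": {0: ("S4", 0), 1: ("S1", 1)},
    "S3": {0: ("S5", 0), 1: ("S1", 1)},
    "S4": {0: ("S1", 0), 1: ("S1", 0)},
    "S5": {0: ("S1", 0), 1: ("S1", 0)},
}


def equivalent_pairs(fsm):
    """Compatibility (implication) table; returns the pairs never X'ed out."""
    states = sorted(fsm)
    inputs = sorted(fsm[states[0]])
    table = {}                                         # None marks an X
    for si, sj in combinations(states, 2):
        if any(fsm[si][x][1] != fsm[sj][x][1] for x in inputs):
            table[(si, sj)] = None                     # outputs differ
            continue
        implied = set()                                # next-state pairs that must match
        for x in inputs:
            pair = tuple(sorted((fsm[si][x][0], fsm[sj][x][0])))
            if pair[0] != pair[1] and pair != (si, sj):
                implied.add(pair)
        table[(si, sj)] = implied
    changed = True
    while changed:                                     # repeat until no more X's
        changed = False
        for key, implied in table.items():
            if implied is not None and any(table[p] is None for p in implied):
                table[key] = None
                changed = True
    return [key for key, implied in table.items() if implied is not None]


if __name__ == "__main__":
    print(equivalent_pairs(FSM))   # [('S2', 'S3'), ('S4', 'S5')]
```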

Perform State Encoding

Given a minimized symbolic state diagram, assign binary codes to the states. We need to predict the effects of logic minimization and find the state encoding that produces the smallest logic implementation.
This is hard when the number of states is large!
[Figure: state transition table (input, current state, new state, output) for states S1 to S4, with the next-state and output logic minimized by Quine-McCluskey (Q-M) for two encodings: S1=01, S2=00, S3=10, S4=11 and S1=00, S2=01, S3=10, S4=11]

FSM Logic Implementation

Implementation options: multi-level logic, ROM, PLA, one-hot registers.
One-hot encoding uses a separate register for each possible state: the register output is 1 if the FSM is in that state. Hence only one state register is hot at a time. This makes for trivial decoding of the state and simple next-state logic. Good for simple FSMs and when no multi-level synthesis is available. Often a good choice for FPGAs.

Coming Up...
Next topic: Arithmetic circuits: adders and multipliers. Readings for next time: Weste 8.4

VLSI Design I
Datapath Operators: Addition and Multiplication
Didn't I learn how to do addition in the first year? First-year courses aren't what they used to be...
01011 + 00101 = 10000
Overview: carry-propagate, carry-lookahead, carry-save, carry-skip and carry-select adders. Goal: You know serial and parallel addition and multiplication architectures.
Addition/Subtraction
Most digital functions can be divided into the following categories:
- datapath operators
- memory elements
- control structures
- I/O cells
Adder architectures:
- carry-propagate adder (CPA)
- ripple carry adder
- Manchester carry adder
- hierarchical carry-lookahead adder
- carry-lookahead adder (CLA)
- carry-save adder (CSA)
- carry-skip adder
- carry-select adder
- parallel adder
- serial adder
- ...
Why can't we just add?
JMM v1.4
Binary Addition
Here's an example of binary addition as one might do it by hand:

    1 1 0 1      carries from previous column
  0 1 1 0 1
+ 0 0 1 0 1
  ---------
  1 0 0 1 0

If we use a two's-complement representation for signed integers, the same procedure will work for adding both signed and unsigned numbers. Besides the sum, one often wants two other bits of information from an adder:
- carry-out: indicates that the add in the most significant position produced a carry; used when implementing multi-word arithmetic, e.g., 1 + (-1):
  $C = a_{n-1} b_{n-1} + \overline{s_{n-1}}\,(a_{n-1} + b_{n-1})$
- overflow: indicates that the answer has too many bits to be represented correctly by the result width (two's complement), e.g., $(2^{N-1} - 1) + (2^{N-1} - 1)$:
  $V = a_{n-1} b_{n-1} \overline{s_{n-1}} + \overline{a_{n-1}}\,\overline{b_{n-1}}\, s_{n-1}$
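The two flag equations can be checked with a short Python sketch, assuming the operands are passed as n-bit patterns (0 to 2^n - 1); the function name is hypothetical. The carry and overflow are computed from the most significant bits only, exactly as in the equations for C and V.

```python
def add_with_flags(a, b, n):
    """Add two n-bit patterns; return (sum, carry_out, overflow)."""
    mask = (1 << n) - 1
    s = ((a & mask) + (b & mask)) & mask  # n-bit sum
    an = (a >> (n - 1)) & 1  # a_{n-1}
    bn = (b >> (n - 1)) & 1  # b_{n-1}
    sn = (s >> (n - 1)) & 1  # s_{n-1}
    # C = a b + not(s) (a + b)
    carry = (an & bn) | ((1 - sn) & (an | bn))
    # V = a b not(s) + not(a) not(b) s  (two's-complement overflow)
    overflow = (an & bn & (1 - sn)) | ((1 - an) & (1 - bn) & sn)
    return s, carry, overflow
```

With n = 4, `add_with_flags(0b0001, 0b1111, 4)` (that is, 1 + (-1)) returns `(0, 1, 0)`: a carry-out but no overflow, while `add_with_flags(7, 7, 4)` returns `(14, 0, 1)`: the signed result 7 + 7 does not fit into 4 bits.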
Adder with ripple carry chain

To convert the simple addition procedure to hardware, we'll need a full adder module with inputs A, B, CIN and outputs COUT, S.
One-bit adders are sometimes called counters since they count the number of 1s on their inputs and encode the answer on their outputs. Thus a full adder is a 3:2 counter.

  A B CIN | COUT S
  0 0 0   |  0   0
  0 0 1   |  0   1
  0 1 0   |  0   1
  0 1 1   |  1   0
  1 0 0   |  0   1
  1 0 1   |  1   0
  1 1 0   |  1   0
  1 1 1   |  1   1

$S = \overline{A}\,\overline{B}\,C_{in} + \overline{A}\,B\,\overline{C_{in}} + A\,\overline{B}\,\overline{C_{in}} + A\,B\,C_{in} = A \oplus B \oplus C_{in}$
$C_{out} = A B + A C_{in} + B C_{in}$

[Figure: N full adders in a chain, from bit 0 (inputs A0, B0, CIN; output S0) up to bit N-1 (inputs AN-1, BN-1; outputs SN-1, COUT). The carry ripples from one stage to the next.]
propagation delay _______________
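A minimal Python sketch of the full adder and the ripple-carry chain (the helper names are hypothetical; bit lists are LSB first). The loop makes the serial dependency explicit: stage i cannot finish before stage i-1 has delivered its carry, which is why the delay grows linearly with N.

```python
def full_adder(a, b, cin):
    """3:2 counter: S = A xor B xor Cin, Cout = AB + ACin + BCin."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, cin=0):
    """Chain of full adders; returns (sum bits, carry out)."""
    s_bits = []
    c = cin
    for a, b in zip(a_bits, b_bits):
        s, c = full_adder(a, b, c)  # the carry ripples to the next stage
        s_bits.append(s)
    return s_bits, c

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(b << i for i, b in enumerate(bits))
```

For the hand-worked example, `from_bits(ripple_carry_add(to_bits(13, 5), to_bits(5, 5))[0])` gives 18, i.e. 01101 + 00101 = 10010.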

Faster carry logic (CLA)

Let's see if we can improve the speed by rewriting the equations for COUT:
$C_{OUT} = AB + A C_{IN} + B C_{IN} = AB + (A + B) C_{IN} = G + P C_{IN}$, where $G = AB$ (generate) and $P = A + B$ (propagate).
For adding two N-bit numbers:
$C_N = G_N + P_N C_{N-1} = G_N + P_N G_{N-1} + P_N P_{N-1} C_{N-2} = G_N + P_N G_{N-1} + P_N P_{N-1} G_{N-2} + \dots + P_N \cdots P_0 C_{IN}$
So if we had (N+1)-input gates and didn't mind a lot of loading on the P signals, the propagation delay of an adder built using this equation for the carries would be (counting 1 delay unit per fan-in; ripple carry: 5N delays): ____________________________________
Of course, this is impractical, but it does lead to some interesting ideas:
- faster ripple-carry implementations
- hierarchical carry-lookahead adders
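A minimal Python sketch that evaluates every carry directly from the flattened sum-of-products form above, i.e. from the G and P signals and C_IN only, never from a previously computed carry (the function name is hypothetical; bit lists are LSB first). In hardware each carry would be one wide AND-OR; the nested loops just enumerate its product terms.

```python
def lookahead_carries(a_bits, b_bits, cin=0):
    """Return [C_0, ..., C_{n-1}], the carry out of every bit position,
    using C_n = G_n + P_n G_{n-1} + ... + P_n ... P_0 C_IN."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate G = AB
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate P = A + B
    carries = []
    for n in range(len(g)):
        c = cin
        for k in range(n + 1):  # product term P_n ... P_0 C_IN
            c &= p[k]
        for j in range(n + 1):  # product terms G_j P_{j+1} ... P_n
            term = g[j]
            for k in range(j + 1, n + 1):
                term &= p[k]
            c |= term
        carries.append(c)
    return carries
```

The result agrees with a ripple-carry adder for every input, which is the point: the same function, computed in (roughly) two logic levels instead of N.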

Manchester carry chain (CLA)

The plan: first generate the carry-in for each adder bit as fast as we can, then compute the sum. The delay is still proportional to the size of the adder, but the constant is pretty small.
[Figure: static Manchester stages (P = A + B) and a dynamic Manchester stage clocked by CLK, each passing CN-1 to CN under control of PN and GN]
- When CLK is low, all C nodes precharge.
- When CLK is high, if GN is high, CN is asserted, i.e., driven low.
To prevent GN from affecting CN-1, PN must be computed as AN xor BN. But we needed the xor anyway: now $S_N = P_N \text{ xnor } C_N$.
Manchester Adder Block (CLA)

[Figure: 4-bit Manchester adder block for bits N to N+3: each bit slice has a P/G cell fed by AN, BN; the signals PN to PN+3 control the links in the Manchester carry chain from Cin to CN+3, and one xnor per bit forms SN to SN+3]
The propagate logic in the Manchester carry chain puts a lot of NFETs in series, so when CIN is high the pulldown path can get long if many of the P signals are true. For most technologies, the performance of this long pulldown path limits the maximum length of the carry chain to around four stages before it needs to be split into subchains. Adding a bypass path that skips over the block when all P signals are true can improve the maximum propagation delay when multiple Manchester carry chains are used in series.

Hierarchical carry-lookahead adders

The linear growth of adder carry-delay with the size of the input word may be improved by calculating the carries to each stage in parallel: a block of bits I thru K generates a carry if the carry is generated in the high-order (J+1,K) part of the block, or if it is generated in the low-order (I,J) part of the block and then propagated thru the high part:
$C_J = G_{IJ} + P_{IJ} C_{I-1}$
$G_{IK} = G_{J+1,K} + P_{J+1,K} G_{IJ}$
$P_{IK} = P_{IJ} P_{J+1,K}$
where $I \le J$ and $J+1 \le K$.
[Figure: binary lookahead tree for 8 bits combining (G,P) pairs over the bit ranges 0,1 / 2,3 / 4,5 / 6,7, then 0,3 / 4,7, then 0,7; the tree depth is log2(n). A leaf cell takes AK, BK, CK-1 and produces GK, PK, SK; an I,K node combines GJ+1,K, PJ+1,K with GIJ, PIJ into GIK, PIK and passes CI-1 down to produce CJ.]
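The combining rule can be sketched in Python, assuming a balanced split of each range I..K at its midpoint J (the function names are hypothetical). The recursion mirrors the lookahead tree: for n a power of two it is log2(n) levels deep, and in hardware all nodes of one level work in parallel.

```python
def combine(low, high):
    """Merge (G, P) of the low part I..J with the high part J+1..K:
    G_IK = G_{J+1,K} + P_{J+1,K} G_IJ  and  P_IK = P_IJ P_{J+1,K}."""
    g_lo, p_lo = low
    g_hi, p_hi = high
    return g_hi | (p_hi & g_lo), p_lo & p_hi

def block_gp(a_bits, b_bits, i, k):
    """(G_IK, P_IK) for bits i..k (LSB-first lists), as a tree of combine()s."""
    if i == k:  # leaf cell: G_K = A_K B_K, P_K = A_K + B_K
        return a_bits[i] & b_bits[i], a_bits[i] | b_bits[i]
    j = (i + k) // 2
    return combine(block_gp(a_bits, b_bits, i, j),
                   block_gp(a_bits, b_bits, j + 1, k))

def carry_out(a_bits, b_bits, i, k, c_in):
    """C_K = G_IK + P_IK C_{I-1}; c_in plays the role of C_{I-1}."""
    g, p = block_gp(a_bits, b_bits, i, k)
    return g | (p & c_in)
```

Because `combine` is associative, any way of splitting the range gives the same carries; the balanced split is simply the one with the smallest depth.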

Carry-skip adders

Since computing $P_{IK}$ is simpler than computing $G_{IK}$, let's try just computing $P_{IK}$ and apply the skip optimisation from Manchester adders.
[Figure: carry-skip chain with carries C0, C4, C8, C12 and block propagate signals P4,7 and P8,11; the critical path is ripple, skip, ripple.]
Suppose it takes 1 time unit for a signal to pass thru two logic levels; then the time to ripple thru a block of k bits = k time units, and the time to skip a block = 1 time unit. Consider a 24-bit carry-skip adder organized as 6 blocks of four bits each. The worst-case propagation time is 4 + 1 + 1 + 1 + 1 + 4 = 12 time units. Now reorganize the adder with the least significant 3 bits in the first block, the next 4 bits in the second block, followed by blocks of 5, 5, 4, and 3. Now the worst-case propagation time is 3 + 1 + 1 + 1 + 1 + 3 = 10 time units.
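Both worst-case figures can be reproduced with a small Python function under exactly the slide's timing model (ripple through a k-bit block = k units, skipping a block = 1 unit). It assumes the critical path starts with a carry generated at the bottom of some block i, skips every block in between, and ripples into a later block j; the function name is hypothetical.

```python
def skip_worst_case(blocks):
    """Worst-case carry-skip delay for the given block widths (LSB first)."""
    worst = max(blocks)  # a carry generated and absorbed inside one block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            # ripple out of block i, skip blocks i+1..j-1, ripple into block j
            worst = max(worst, blocks[i] + (j - i - 1) + blocks[j])
    return worst
```

`skip_worst_case([4] * 6)` gives 12 and `skip_worst_case([3, 4, 5, 5, 4, 3])` gives 10, matching the two organizations above: the longer middle blocks are free because their ripple time hides behind the skip path.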

Late-arriving inputs

Is there a general way to reorganize a logic equation to accommodate a late-arriving input? Consider the following, where X1 arrives late:
$f = X_1 X_2 + X_1 X_3 + X_4 X_5$
If we want only one gate delay from X1 to the output f, how do we do it?
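One possible answer, shown as a Python truth-table check (the function names are illustrative only): expand f around X1 (Shannon expansion), $f = X_1 f_1 + \overline{X_1} f_0$. Both cofactors depend only on the early inputs and can be computed in advance, so X1 only drives a final 2:1 selection. This is the same trick the carry-select adder applies to its carry-in.

```python
from itertools import product

def f(x1, x2, x3, x4, x5):
    """Original form: X1 passes through an AND and then an OR."""
    return (x1 & x2) | (x1 & x3) | (x4 & x5)

def f_late(x1, x2, x3, x4, x5):
    """Shannon expansion around the late input X1."""
    f1 = x2 | x3 | (x4 & x5)  # cofactor for X1 = 1, from early inputs
    f0 = x4 & x5              # cofactor for X1 = 0, from early inputs
    return f1 if x1 else f0   # X1 only controls the final 2:1 select

# The rewrite is exact: both forms agree on all 32 input combinations.
assert all(f(*v) == f_late(*v) for v in product((0, 1), repeat=5))
```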

Carry-select adders

Building on the idea from the previous slide: perform two additions in parallel, one assuming the carry-in is zero and the other assuming the carry-in is one. When the carry-in is finally known, the correct result is selected from the two precomputed results.
[Figure: carry-select adder starting from CIN: each block after the first contains two adders (carry-in 0 and carry-in 1) followed by 1/0 multiplexers; an & and a >=1 gate form the block carry-out from the two precomputed carry-outs and the incoming carry.]
Is this a mux?
If it takes k time units for a block to add k-bit numbers and one time unit to compute the mux select from the two carry-out signals, then for optimal operation each block should be one bit wider than the block before it (the next less significant block), just as in the carry-skip adder.
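A minimal Python sketch of the carry-select scheme, with a functional model and a delay model in the slide's units (k units for a k-bit block, 1 unit per mux). The function names are hypothetical, and the delay model assumes a mux fires as soon as both its data inputs and its select are ready.

```python
def block_add(a, b, cin, k):
    """Plain k-bit addition; returns (k-bit sum, carry out)."""
    total = a + b + cin
    return total & ((1 << k) - 1), total >> k

def carry_select_add(a, b, blocks, cin=0):
    """Carry-select adder with the given block widths (LSB block first)."""
    result, shift, carry = 0, 0, cin
    for i, k in enumerate(blocks):
        ak = (a >> shift) & ((1 << k) - 1)
        bk = (b >> shift) & ((1 << k) - 1)
        if i == 0:
            s, carry = block_add(ak, bk, carry, k)  # real carry-in known
        else:
            pre0 = block_add(ak, bk, 0, k)  # precomputed for carry-in 0
            pre1 = block_add(ak, bk, 1, k)  # precomputed for carry-in 1
            s, carry = pre1 if carry else pre0  # the mux
        result |= s << shift
        shift += k
    return result, carry

def select_delay(blocks):
    """Worst-case delay: the first block ripples, later blocks wait for
    max(own add time, incoming select) and then pay one mux delay."""
    t = blocks[0]
    for k in blocks[1:]:
        t = max(t, k) + 1
    return t
```

`select_delay([4, 4, 5, 6, 7, 6])` gives 9 time units for 32 bits, the result quoted for Ex vlsi12.2, while eight equal 4-bit blocks give `select_delay([4] * 8) == 11`; the growing block widths let each block finish just as its select arrives.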

Adder layouts
[Figure: layouts of a 32-bit carry-select adder and a 32-bit carry-lookahead adder]

Adding M N-bit numbers

[Figure: carry-propagate array of M-1 N-bit adders]
prop delay _____________ area _____

[Figure: carry-save array of M-2 N-bit adders]
prop delay _____________ area _____
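A minimal Python sketch of the carry-save idea, using unbounded Python integers as the N-bit words (an assumption for brevity; the function names are hypothetical). Each `csa` call stands for one row of full adders that turns three words into a sum word and a carry word without propagating any carry; one carry-propagate addition at the end finishes the job. Adding M words uses M-2 CSA rows.

```python
def csa(x, y, z):
    """One row of 3:2 counters applied bitwise: x + y + z == s + c."""
    s = x ^ y ^ z                                 # sum bit of every column
    c = ((x & y) | (x & z) | (y & z)) << 1        # carries move one column up
    return s, c

def add_many(words):
    """Add M numbers with M-2 carry-save rows and one final CPA."""
    if len(words) == 1:
        return words[0]
    s, c = words[0], words[1]
    for w in words[2:]:
        s, c = csa(s, c, w)  # no carry propagation inside a CSA row
    return s + c             # the final carry-propagate adder
```

`add_many([3, 5, 7, 11, 13])` uses three CSA rows and returns 39. Only the last addition pays the carry-propagation delay, which is why the carry-save array is much faster than a chain of carry-propagate adders.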

Even-Odd Arrays

Abstract carry-save picture from the previous page:
[Figure: chain of M-2 CSAs followed by a CPA]
Rewire so that the first two adders work in parallel. Feed their results into the third and fourth adders, which also work in parallel, etc.
[Figure: rewired even/odd CSA array (labels M-4 and 2) followed by a CPA]
prop delay _____________ area _____

Even and odd streams pass through half the adders, so the even/odd design runs at almost twice the speed of the simple CSA implementation.
Wallace Trees
[Figure: Wallace tree of CSAs followed by a CPA; the depth grows as $O(\log_{1.5} M)$]
We have been using full adders, or 3:2 counters, in our array adders. Higher fan-in counters can be used to further reduce delays for large M, e.g., Weste shows a 5:3 counter in Fig. 8.41.
Wallace trees give asymptotically better behaviour than the earlier O(M) schemes, but they do not have a regular layout. Other O(log(M)) schemes, e.g., binary-tree multipliers using signed-digit representations, have better layout properties, but at the cost of more complicated adder cells.
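The logarithmic depth can be checked with a short Python count (the function name is hypothetical): at each Wallace level, every complete group of three rows is compressed to two rows by a row of 3:2 counters, and leftover rows pass straight through. The number of levels grows roughly as $\log_{1.5}$ of M, compared with M-2 CSA rows in the linear array.

```python
def wallace_levels(m):
    """Number of 3:2 counter levels needed to reduce m rows to two."""
    levels = 0
    while m > 2:
        m = 2 * (m // 3) + m % 3  # groups of three become two, rest pass
        levels += 1
    return levels
```

For M = 28 operands the tree needs 7 levels, whereas a linear carry-save array would need 26 rows.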
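Bit-Serial Adder

Bit-serial adders are very slow and have a high data latency, but they are extremely compact; applications are in signal processing.
[Figure: n-bit registers for A and B feed one full adder bit by bit; the carry-out is stored in a flip-flop (FF, with clr) and fed back as cin; the sum bits are shifted into an n-bit result register; all registers clocked by clk]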
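A minimal Python simulation of the bit-serial adder, assuming the operands are shifted out LSB first, one bit per clock cycle (the function name is hypothetical). The hardware is a single full adder plus one carry flip-flop, so an n-bit addition takes n clock cycles.

```python
def bit_serial_add(a, b, n):
    """Simulate n clock cycles of a bit-serial adder; return (sum, carry)."""
    carry_ff = 0  # carry flip-flop, cleared before the first cycle (clr)
    result = 0
    for cycle in range(n):
        ai = (a >> cycle) & 1  # bit shifted out of the A register
        bi = (b >> cycle) & 1  # bit shifted out of the B register
        s = ai ^ bi ^ carry_ff
        carry_ff = (ai & bi) | (ai & carry_ff) | (bi & carry_ff)
        result |= s << cycle   # sum bit shifted into the result register
    return result, carry_ff
```

`bit_serial_add(13, 5, 5)` returns `(18, 0)` after five cycles: the area cost is one full adder, the time cost is one cycle per bit.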

CSA Adder (pipelining)

Pipelined adders are extremely fast but suffer from a high data latency (CSA structure of slide #13).
[Figure: pipelined adder computing S = A + B + C + D for 4-bit operands: CSA adders on the bit slices A(i), B(i), C(i), D(i) with flip-flops (FF) between the stages, followed by a CPA adder producing S(3) to S(0) and Carry; all registers clocked by clk; nc = not connected]
CPA Adder (pipelining)

The CPA structure on slide #13 can also be used in a pipeline structure. This is useful in signal processing applications.
[Figure: pipelined 4-bit carry-propagate adder: A(3:0), B(3:0) and Cin enter through chains of flip-flops (FF) of different lengths, and the outputs S(3) to S(0) and Carry leave through further FF chains]
Binary Multiplication
Suppose we want to multiply two numbers:
$A = \{A_{N-1}, A_{N-2}, \dots, A_1, A_0\}$ (multiplicand)
$B = \{B_{M-1}, B_{M-2}, \dots, B_1, B_0\}$ (multiplier)
to produce an (N+M)-bit result. We can write the product as

$A \cdot B = B_0 A\, 2^0 + B_1 A\, 2^1 + \dots + B_{M-1} A\, 2^{M-1}$
Note that $B_K \cdot A$ can be accomplished with N AND gates since $B_K$ = 0 or 1. The scaling by powers of two is a simple shift. Thus multiplication of an N-bit number by an M-bit number boils down to the addition of M N-bit partial products, each of which is formed by a simple Boolean operation. Any of the techniques from the previous slides can be used to accomplish the required additions.
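The sum of partial products translates directly into a shift-and-add loop. A minimal Python sketch (the function name is hypothetical): each partial product is either A or 0 (the N AND gates), and the weight $2^K$ is a left shift.

```python
def shift_add_multiply(a, b, m):
    """A * B = sum over K of B_K * A * 2^K for an m-bit multiplier B."""
    product = 0
    for k in range(m):
        bk = (b >> k) & 1
        partial = a if bk else 0  # B_K AND every bit of A
        product += partial << k   # scaling by 2^K is a shift
    return product
```

`shift_add_multiply(13, 11, 4)` returns 143; for 4-bit operands the largest product, 15 * 15 = 225, fits in N + M = 8 bits.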

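Array multipliers
Example: 3x3 array multiplier using CSAs to sum the partial products:
[Figure: the partial-product bits A0B0 to A2B2 feed rows of carry-save adders (unused inputs tied to 0, nc = not connected), producing the product bits P5 to P0]
Actual layout is usually squished flat:
[Figure: flattened array multiplier layout with inputs A and B]
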
Higher Radix Multiplication

Array multipliers are nice, but we get one column of adders (which are big/slow) for each partial product, i.e., one column for each bit of the multiplier. If we could use, say, 2 bits of the multiplier in generating each partial product, we would halve the number of columns and double the speed of the multiplier! Let's rewrite our equation for A*B:
$A \cdot B = B_{1,0} A\, 2^0 + B_{3,2} A\, 2^2 + \dots + B_{M-1,M-2} A\, 2^{M-2}$
This looks the same as before, except we have half as many partial products to sum. Generating each partial product is now more complicated since $B_{K+1,K}$ can now be 0, 1, 2 or 3. The only troublesome value here is 3, since that would seem to require more adder inputs than we have (3*A = A + 2*A). But we can also write 3*A = 4*A - A. We'll do the -A in this partial product stage and signal the next stage that it needs to add 4*A. To keep the signalling simple, we'll also rewrite 2*A = 4*A - 2*A.
Profs go crazy nowadays, why can't he just multiply like everybody else does?

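Booth Recoding (Radix-4)

$A \cdot B = B_{1,0} A\, 2^0 + B_{3,2} A\, 2^2 + \dots + B_{M-1,M-2} A\, 2^{M-2}$
[Figure: the N-bit multiplicand A(N-1) to A(0) times the M-bit multiplier B(M-1) to B(0) gives M/2 partial products]
Each partial product is selected by the multiplier bits B(K+1), B(K) and the top bit B(K-1) of the previous pair:

  B(K+1) B(K) B(K-1) | action  | x1 x2 sub
  0      0    0      | add 0   | 0  0  -
  0      0    1      | add A   | 1  0  0
  0      1    0      | add A   | 1  0  0
  0      1    1      | add 2*A | 0  1  0
  1      0    0      | sub 2*A | 0  1  1
  1      0    1      | sub A   | 1  0  1
  1      1    0      | sub A   | 1  0  1
  1      1    1      | add 0   | 0  0  -

This implements $B_{K+1,K} \cdot A$: 0*A = 0, 1*A = A, 2*A = 4*A - 2*A, 3*A = 4*A - A. The pending 4*A reaches the next partial product through B(K+1), which is the B(K-1) of the next bit triple.
[Figure: recoding cell: A(i) gated with x1 and A(i)<<1 gated with x2 (& gates), combined by a >=1 gate and XORed (=1) with the sub signal to form PP(i); the sub signal also serves as carry-in]
Not cheaper than an ADD, but all recodes can be done in parallel, so we only pay the time penalty once (for the first column)!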
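A minimal Python sketch of radix-4 Booth recoding, assuming the multiplier B is an m-bit two's-complement number with m even (for an unsigned B one would first extend it with leading zero bits so that its top bit is 0 and m stays even; this is an assumption, not from the slides). The function names are hypothetical. Each digit is $-2B_{K+1} + B_K + B_{K-1}$, which reproduces the action column of the table.

```python
def booth_radix4_digits(b, m):
    """Recode an m-bit two's-complement multiplier (m even) into m/2
    digits in {-2, -1, 0, 1, 2}, from the triples B(K+1) B(K) B(K-1)."""
    bits = [(b >> i) & 1 for i in range(m)]  # two's-complement bit pattern
    digits = []
    prev = 0  # B(-1) = 0
    for k in range(0, m, 2):
        digits.append(-2 * bits[k + 1] + bits[k] + prev)
        prev = bits[k + 1]  # becomes B(K-1) of the next triple
    return digits

def booth_multiply(a, b, m):
    """Signed A*B with one partial product per digit, weighted by 4**k.
    A digit of +-2 means 'A shifted left once'; a negative digit means 'sub'."""
    return sum(d * a * 4 ** k for k, d in enumerate(booth_radix4_digits(b, m)))
```

For example, B = 7 (0111) recodes to the digits [-1, 2], i.e. 7 = -1 + 2 * 4: the "sub A" in the first partial product is compensated by the next stage, just as described for 3*A = 4*A - A.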
16x32 Booth Multiplier
This multiplier only produces a 32-bit result, so the top 16 bits of the rhombus have been omitted.
[Figure: 16x32 Booth multiplier array]
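Serial Multiplication
Bit-serial multipliers are very compact, but they suffer from a high data latency and are very slow. The simplest form of serial multiplier uses successive addition.
[Figure: serial multiplier: A and B gated by & gates into a bit-serial adder with a carry flip-flop (FF, clr, reset), an N-1 bit register and the result register, all clocked by clk]
M+N bit product -> $t_d = MN$ time intervals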
Serial/parallel and Pipelined Multiplication

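Serial/parallel multiplier: very modular structure.
[Figure: serial/parallel multiplier with inputs X0, X1 and Y0 to Y3]
M+N bit product -> $t_d = M+N$ time intervals, but the time intervals are larger.
Pipelined multiplication: 2 delay elements per cell.
[Figure: pipelined multiplier cell with inputs Yn, Xj, PPin, AND (&) gates, and outputs Xj+1, PPout, P]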
Shifters
Shifters are very important for microprocessor architectures:
- arithmetic shifting
- logical shifting
- rotation functions
Barrel shifter constructed from transmission gates:
[Figure: 4-bit barrel shifter: a transmission-gate array selects result3 to result0 from input6 to input0 under control of shift3 to shift0]

  Operation                input (input6 ... input0)
  logical right shift      0,0,0,A(3:0)
  logical left shift       A(3:0),0,0,0
  right rotate             A(2:0),A(3:0)
  left rotate              A(3:0),A(2:0)
  arithmetic right shift   A3,A3,A3,A(3:0)
  arithmetic left shift    A(3:0),A0,A0,A0
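A minimal Python model of the barrel shifter, assuming that for a shift amount s (0 to 3) the transmission-gate array passes the 4-bit window input(s+3 .. s) of the 7-bit input word to result3 .. result0, and that the logical left shift uses the window moved up by 3 - s. Only logical right shift, arithmetic right shift, right rotate and logical left shift from the table are modelled; the function names are hypothetical.

```python
def barrel_window(inp7, s):
    """Return input(s+3 .. s) of a 7-bit word as the 4-bit result."""
    return (inp7 >> s) & 0xF

def logical_right(a, s):
    return barrel_window(a, s)  # input = 0,0,0,A(3:0)

def arithmetic_right(a, s):
    a3 = (a >> 3) & 1
    return barrel_window(((a3 * 0b111) << 4) | a, s)  # input = A3,A3,A3,A(3:0)

def rotate_right(a, s):
    return barrel_window(((a & 0b111) << 4) | a, s)  # input = A(2:0),A(3:0)

def logical_left(a, s):
    return barrel_window(a << 3, 3 - s)  # input = A(3:0),0,0,0
```

For A = 1011, shifting by one gives 0101 (logical right), 1101 (arithmetic right), 1101 (right rotate) and 0110 (logical left). All operations reuse the same selection network; only the 7-bit input pattern changes.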
Coming Up...
Next topic: VLSI fabrication: processing steps, basic structures, self-aligned processes, P and N devices.
Readings for next time: Weste, Sections 8 thru 8.2.1.6 and 8.2.7.3; 8.2.7 thru 8.2.8
Self study (Weste):
- parity generators 8.2.2
- comparators 8.2.3
- zero/one detectors 8.2.4
- binary counters 8.2.5
- Boolean operations - ALUs 8.2.6


Ex vlsi12.1 (difficulty: medium): Develop a 1-bit full adder with not more than 3 fets in series for the not(sum) and not more than 2 fets in series for the not(carry) circuit. The not(carry) signal can be used for the sum circuit. Result: Notice that the n- and p-fet blocks are identical and not complementary.
[Figure: solution schematics of the Carry and Sum circuits with inputs A, B, C]

Exercises: VLSI-12 cont.

Ex vlsi12.2 (difficulty: easy): A 32-bit adder is built as a carry-select adder. Each adder as well as the muxes have one delay unit. Find the optimal structure with respect to speed. Result: The maximum speed is 9 time units for a structure with stages 4-4-5-6-7-6 (see Weste pp532).
Ex vlsi12.3 (difficulty: easy): A hierarchical carry-lookahead adder (see slide 8) is given. Show algebraically that $C_3 = G_{03} + P_{03} C_{in}$ corresponds to the equation $C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0 C_{in}$ (note that $G_{ii} = G_i$ and $P_{ii} = P_i$).
Ex vlsi12.4 (difficulty: easy, time consuming): Write VHDL code for a 32-bit hierarchical carry-lookahead adder (see slide 8). If one block has a delay of 1 time unit, what is the overall delay? Result: The total delay is 9 time units.
Ex vlsi12.5 (difficulty: medium): Consider X1 as a late-arriving input which needs to be sped up. Develop the circuit for the function $f = X_1 X_2 + X_1 X_3 + X_4 X_5$.
VLSI Systems Design

Design Project: Practical Aspects
I am a VHDL expert. But how do I apply it in real life for my MP3 player?
Overview: applying the description-synthesis design method in practice
Goal: You are able to master your own VHDL project. You have basic notions about HW/SW co-design.
Project Goal
Goal: design of an electronic system from specification down to ASIC/FPGA.
Problem: one of the most difficult tasks in a VLSI design project is to find the starting design point.
Basic Steps: in order to proceed in a structured manner, you should perform the following steps:
[Figure: design flow: block diagram, HW/SW co-design (hardware/software co-design) and IP cores (intellectual property cores); then a hardware branch (FSMD architecture model, VHDL coding & simulation) and a software branch (structured software design, C coding, compiling); then co-design: hardware/software system simulation, synthesis, place & route, back-annotation & simulation (formal design verification), chip test]
Initial System Design Steps

System design steps (block diagram, HW/SW co-design, IP cores):
1. identify your chip in the overall system
2. define the chip IO and group them to blocks
3. identify functional units of your chip
4. identify the interconnection between your units
5. identify speed sensitive (HW) and control sensitive (SW) tasks
6. define the intelligence of each functional unit
7. identify IP cores
8. organize as many IP cores as possible (tools, core generators, old designs, internet)
9. update the design if necessary according to the available IP cores
10. define inter-process communication
11. define the interconnections between your units
In the classical HW/SW co-design approach, the design process is continued as long as possible independent of its implementation, and the HW/SW design units are identified at the very end of the design steps. In smaller designs, as in our case, the HW/SW co-design step is done in an early phase.
Project MP3 Player: step 1 (block diagram)

Step 1: identify your chip in the overall system
[Figure: the MP3 Player ASIC/FPGA connected to USB, keyboard, LCD, MP3 decoder, power, flash memory and DAC]

Project MP3 Player: step 2-4 (block diagram)

Step 2: define the chip IO and group them to blocks
Step 3: identify functional units of your chip
Step 4: find the interconnections between your units
[Figure: MP3 Player ASIC/FPGA with power management, main control, LCD interface, I2C interface, I2S interface, keyboard interface, decoder interface, USB interface, flash interface and DAC interface]

Project MP3 Player: step 5 (HW/SW Co-Design)

Step 5: identify speed and control sensitive tasks
Step 6: define the intelligence of each functional unit
[Figure: the step 2-4 block diagram with the units marked control sensitive or speed sensitive (the USB, flash and DAC interfaces under speed sensitive), and "add intelligence" annotations on several interfaces]
Project MP3 Player: step 7-8 (Hardware Design)

Step 7: identify IP cores
Step 8: organize as many IP cores as possible (tools, core generator, old designs, internet)
[Figure: MP3 Player ASIC/FPGA with power management, main control implemented as a PIC core, LCD, decoder, I2C, I2S and keyboard interfaces, and a USB core for the USB interface next to the flash and DAC interfaces]

Project MP3 Player: step 9-11 (Hardware Design)

Step 9: update the design if necessary according to the available IP cores
Step 10: define inter-process communication
Step 11: define the interconnection between units
[Figure: main control as a PIC core whose ports A to D connect to the LCD interface, the intelligent keyboard interface, the decoder interface, the intelligent flash interface, the DAC interface and the intelligent I2S and I2C interfaces; the USB core drives the USB interface]

Hardware/Software Design Steps
[Figure: hardware branch (FSMD architecture model, VHDL coding) and software branch (structured software design, C coding)]

Hardware design project steps:
I. imagine your chip working in the target system, identify and describe its basic functional units in a data-flow view
II. find the RTL structure of each of the above data-flow functions and update your block diagram by allocating your RTL structure to one or more functional units
III. fix in detail the operation of your functional units (local intelligence or data-path only) and add FSMs if required, fix the detailed interconnections between your units
IV. design all FSMs, define the clock strategy, use colored data-flow, be careful with the inter-process communications
V. VHDL coding of your RTL design
VI. test bench design
VII. simulate your VHDL design with the test bench

Software design project steps:
I. design the software structure as learned in SW engineering courses
II. define the data structure
III. define the HW/SW communication
IV. develop the C code
V. compile & verify your C code
Project MP3 Player: step I (Hardware design project steps)

Step I: imagine your chip working in the target system, identify and describe its basic functional units in a data-flow view.
Download an MP3 song from the host to the flash memory (flow 1):
- generate flash command, generate flash address
- load byte from USB into register
- use byte to execute ECC (Hamming code)
- update flash address
- store byte into flash
- write ECC code after 512 bytes
- generate write-to-flash after 512 bytes
- use pipeline structure to speed up data transfer
[Figure: MP3 Player ASIC/FPGA block diagram: power management, intelligent keyboard interface, PIC core main control, LCD, DAC interface, decoder interface, intelligent flash interface, USB core with USB interface, intelligent I2S and I2C interfaces, ports A to D]

Project MP3 Player: step II (hardware design project steps)

Step II: find the RTL structure of each of the previous data-flow functions and update your block diagram by allocating your RTL structure to one or more functional units.
Download an MP3 song from the host to the flash memory (flow 1):
[Figure: RTL structure of flow 1: USB interface, counter, registers and ECC generator (each with enable, in, out, clk), command register and mux (sel) in the flash interface, driving the pads to the flash memory]
Project MP3 Player: step III (hardware design project steps)

Step III: fix in detail the function of your functional units (local intelligence or data-path only) and add FSMs if required, fix the detailed interconnections between your units.
[Figure: PIC core (software, C code) connected through ports A to D to the intelligent keyboard interface (FSMD architecture), the intelligent flash & I2S interface (FSMD architecture), the intelligent I2C and LCD interface (FSMD architecture), and the USB core (hardware, IP core)]

Project MP3 Player: step IVa

Step IVa: design all FSMs, define the clock strategy, use colored data-flow, be careful with the inter-process communications.
Clock strategy: rising edge for data-paths, falling edge for IP cores and FSMs. All handshake signals between FSMDs and IP cores on the falling edge.
Colors: make a lot of copies of your RTL data path. For each data-flow step, color the old active data-paths leaving a register blue, the new active data-paths leaving a register green, and data-paths treated with a combinatorial function in the corresponding dark color. Active control signals and their blocks are orange. All other data signals are red. Red signals are dominant. Be sure that no red signals enter an FSM, and that no darkened or red signals drive the asynchronous set/reset of FFs.
[Figure: colored RTL structure of flow 1: counter, registers, ECC generator, mux, command register, pads to flash memory]
Project MP3 Player: step IVb

Step IVb: design all FSMs, define the clock strategy, use colored data-flow, be careful with the inter-process communications.
We decide to use 3 different FSMs in addition to the ones present in the IP cores. The PIC processor core is the main unit, which communicates with all other FSMD or core units and thus uses inter-process communication. There is no communication in-between the other units.
[Figure: PIC core (software, C code) connected to the intelligent keyboard (FSMD), the intelligent flash & I2S interface (FSMD), the intelligent LCD interface (FSMD) and the USB core (hardware, IP core); handshake between process 1 and process 2 via request, acknowledge, data and data valid]

Project MP3 Player: step V

Step V: VHDL coding of your RTL design
- use a process for each data-path manipulation and its succeeding register
- use 2 processes for an FSM: one process for the transition table (VHDL case), one process for the next state (state register)
- use a continuous assignment for the output function
[Figure: flow 1 RTL structure (counter, registers, ECC generator, mux, command register, pads to flash memory) partitioned into Process 1 and Process 2]

Project MP3 Player: step VI

Step VI: test bench design. The design of a test bench is one of the most time consuming and important tasks. A test bench will be re-used several times during the different design steps as well as for chip test (have a look at vlsi21).
[Figure: test bench with control and stimulus generation, device under test (DUT), and response generation and verification]

Final System Design Steps
Hardware design project steps:
System simulation:
12. system test bench design
13. hardware/software system simulation with test bench
Synthesis, place and route:
14. synthesis of logic level design
15. simulation of logic level with test bench
16. place & route your design for target technology
17.-18. back annotation and simulation with test bench (formal design verification)
Verify, test:
19. chip fabrication
20. chip test with test bench
21. in system test

Block diagram of a general System

A general system is composed of three elements: user, algorithm, plant.
All three items interact with each other, resulting in 2 closed loops. The closed loops may have real-time constraints.

GECKO Design Environment

Design entry: C-code software, manual RTL hardware, algorithms.
All three design entry elements will be converted to VHDL and thus can be implemented into a SoC.

SoC Design Methodology

The specify-explore-refine design flow is extended to a specify-explore-refine-prototype-analyse design flow for SoC designs with real-time constraints.

SoC with GECKO Environment

An SoC design using the GECKO system supports the two-chip approach:
- GECKO main board for the digital part
- application specific GECKO expansion board for the analog, power, HF part
[Figure: GECKO main board with an SoC (microprocessor IP core running software, real-time signal processing hardware, hardware IP blocks) next to the power blocks, analog blocks and sensor]

The GECKO system

[Figure: GECKO interface driver and GECKO main board]
GECKO main board on top of an application specific GECKO expansion board (RFID reader application, 2W 13.56MHz RF power)
Hardware-in-the-Loop

- to iteratively improve a design, fast prototyping and data analysis steps are necessary
- difficult-to-model plants are preferably not modeled but directly included in the simulation loop
- variable cut between simulation and hardware
- respect real-time constraints
[Figure: hardware-in-the-loop and hardware-in-the-software-loop configurations]

Homework: MyProject
- define your own project
- plan the development and use the presented design methodology
- prepare the presentation of your project; be sure you have all the necessary documentation for the discussed design steps
MyProject 2002: speed controlled dc motor
- use hardware-in-the-simulation-loop: Matlab/Simulink with the speed controller, GECKO main board with the dc-motor electronics
Implementation constraints:
- microprocessor with C code for administrative tasks
- pulse width modulation for driving the dc motor (hardware)
- A/B signal encoder for speed sensing (hardware)
- driving circuitry (expansion board) as simple as possible
Technical data:
- dc motor has 6000 turns/minute at 5V
- speed sensor has 12 pulses per turn
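A few lines of Python turn the technical data into the numbers the speed-sensing hardware has to handle, assuming the pulses of one encoder channel are counted during a fixed gate time. The gate time and the function names are illustrative assumptions, not part of the assignment.

```python
PULSES_PER_TURN = 12  # speed sensor: 12 pulses per turn (technical data)
NOMINAL_RPM = 6000    # dc motor: 6000 turns/minute at 5 V (technical data)

def pulse_rate_hz(rpm, ppr=PULSES_PER_TURN):
    """Pulse frequency of one encoder channel at a given speed."""
    return rpm / 60 * ppr

def rpm_from_count(pulses, gate_time_s, ppr=PULSES_PER_TURN):
    """Speed estimate from the pulses counted during a fixed gate time."""
    return pulses / ppr / gate_time_s * 60

print(pulse_rate_hz(NOMINAL_RPM))  # 1200.0 pulses per second at full speed
print(rpm_from_count(1, 0.1))      # speed resolution of a 0.1 s gate time
```

At full speed one channel delivers 1200 pulses per second; with a 0.1 s gate time each counted pulse is worth about 50 turns/minute, so a longer gate (or measuring the pulse period instead) trades control-loop latency for resolution.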
VLSI Design II
CMOS Processing
Overview: processing steps, processing step sequence
Goal: You know the basics of integrated circuit processing steps and you are familiar with the processing sequence of a sample CMOS technology.
Introduction
- Complementary MOS (CMOS) technology is becoming the dominant candidate for VLSI applications
- CMOS provides both n-channel and p-channel MOS transistors on one chip
- cheap chips are produced in extremely expensive fabs
- each chip passes hundreds of different processing steps
- random process disturbances cause electrical parameter variations of the chips; elements are never identical
Process technology pictures and text are copied from: Atlas of IC Technologies, W. Maly, The Benjamin Cummings Publishing Company, ISBN 0-8053-6850-7
VLSI Circuit Fabrication

- oxidize silicon to form thin and thick layers of SiO2 to serve as insulators
- deposit thin layers of material and etch into the desired pattern
- diffuse dopants into the substrate to create P/N junctions
- implant ions to set thresholds and achieve precise dopant profiles
[Figure: cross section with n+ regions in a p substrate]
Most fabrication steps require first creating a mask that determines where the operation will occur. Masks can either be existing layers on the IC (these masks are self-aligned) or created using a lithographic process and photoresist. Design rules ensure that the design is still functional in the face of misalignments and various side-effects of the fabrication process.

Overview
Overview of processing step sequence: n-well, active, poly, n-diffusion, p-diffusion, contacts, metal1, via1, metal2, passivation
Overview of processing steps: making the wafers, photolithography, oxidation, layer deposition, etching, diffusion, implantation
Processing Steps: Making the wafers

- the basic raw material used is a wafer or disk of silicon which varies from 3 to 12 inches in diameter
- wafers are cut in thin slices (less than 1 mm) from cylindrical semiconductor ingots
- the first step in IC processing is the production of a single-crystal ingot starting from a silicon melt with a controlled amount of impurities

Processing Steps: Photolithography #1

Photolithography is a technique used in IC fabrication to transfer a desired pattern onto the surface of a silicon wafer. As such, photolithography is a key step in the entire circuit integration process.
Alternative method for lower quantities: direct write procedure (E-beam)

Processing Step: Photolithography #2

Processing Steps: Oxidation #1

Thermal oxidation is a process in which silicon (Si) reacts with oxygen to form a continuous layer of high-quality silicon dioxide (SiO2).
[Figure panels: oxidation of the silicon surface; oxidation through a window in the oxide; selective oxide growth]

Processing Steps: Oxidation #2
[Figure panels: oxidation through a window; selective oxide growth, showing the bird's beak]
Processing Steps: Layer Deposition - General

Thin layers of both conducting substances and insulating materials constitute an important part of any semiconductor device:
- epitaxy (single crystal deposition)
- PVD and CVD process (polycrystalline deposition)

Processing Steps: Vapour Deposition

[Figure panels: PVD; CVD]

Processing Steps: Etching

The process that immediately follows the photolithography step is the removal of material from areas of the wafer unprotected by photoresist. It is characterized by selectivity and anisotropy.
[Figure panels: wet etching; dry etching]
Processing Steps: Diffusion

Solid state diffusion is a process which allows atoms to move within a solid at elevated temperatures.

Processing Steps: Implantation

The alternative to the diffusion technique of dopant introduction used in IC manufacturing is ion implantation.

N-Well Implant & Drive-in

In a p substrate only n-channel fets can be processed. Therefore an n-well has to be implanted in order to hold the p-channel fets.
The window in the mask and the cross section are illustrated.

Channel-stop Implant

A thick (0.4um) layer of silicon dioxide, called field oxide, is formed on the surface by oxidation in wet oxygen. This is then etched to expose the surface where we want to make fets.

Grow Field Oxide

Formation of active regions for n-channel and p-channel fets of the CMOS process. The resulting bird's beak causes the active area of the device to be significantly smaller.

Grow Thin Oxide

Now grow a thin (0.01um = 100 Angstroms) layer of silicon dioxide, called gate oxide, on the surface by exposing the wafer to dry oxygen.
The gate oxide needs to be of high quality: uniform thickness, no defects! The thinner the gate oxide, the more oomph the fet will have (we'll see why soon), but the harder it is to make it defect free.
Deposit & Etch Polysilicon

On top of the thin oxide a 0.7um thick layer of polycrystalline silicon, called polysilicon or poly for short, is deposited by CVD. The poly layer is patterned and plasma etched (thin ox not covered by poly is etched away too!) exposing the surface where the source and drain junctions will be formed:

Implant Nfet Drain & Source

The entire surface is doped, either by diffusion or ion implantation, with phosphorus (an electron donor), which creates two n-type regions in the substrate and an ohmic contact in the n-well. The phosphorus also penetrates the poly, reducing its resistance and affecting the nfet's threshold.

Effective Nfet Dimensions

Parasitic Fets

Implant Pfet Drain & Source

Once again the entire surface is doped, either by diffusion or ion implantation, with boron (an electron acceptor), which creates two p-type regions in the n-well and an ohmic contact in the substrate.

Deposit SiO2 insulator

Next, an intermediate oxide layer is deposited for isolation and then reflowed to flatten its surface.

Etch contact cuts

Holes are etched in the oxide where contacts to poly/diff are wanted.

Deposit & Etch Metal1

For interconnections aluminium is deposited, patterned and etched.

Voila: a CMOS Inverter!

Finally a passivation layer protects the wafer surface from contamination and scratches. Pads are opened for bonding.

Planarize

Deposit & Etch Metal2

DoubleN-well, Double -level Metal CMOS Process Steps

1. Grow barrier oxide 2. Mask/Etch Mask/Etch nn-well window 3. P n-well implant 4. Thermal drivedrive-in to deepen nn-well 5. Remove barrier oxide 6. Grow pad oxide 7. Deposit Si3N4 8. Mask/Etch Mask/Etch leaving active region 9. B channelchannel-stop implant 10. Grow field oxide (more drivedrive-in!) 11. Remove Si3N4 12. Remove pad oxide 13. B or P implant to adjust VTH 14. Grow thin (gate) oxide 15. Deposit P-doped polysilicon 16. Mask/Etch Mask/Etch leaving poly wires 17. Etch exposed thin oxide 18. Mask off pp-diffusion regions 19. Sb or As nfet source/drain implant, nn-well contact too 20. Mask all but pp-diffusion regions 21. B pfet source/drain implant 22. Thermal source/drain annealing 23. Deposit SiO2 using CVD 24. Mask/Etch Mask/Etch contacts through SiO2 25. Deposit first Al using PVD 26. Mask/Etch Mask/Etch leaving metal1 wires 27. Grow thick layer of SiO2 28. Spin on thick, flat layer of photoresist 29. Etch SiO2 and photoresist at same rate until only flat SiO2 remains 30. Mask/Etch Mask/Etch vias through SiO2 31. Deposit second using PVD 32. Mask/Etch Mask/Etch leaving metal2 wires 33. Deposit overglass to passivate circuit 34. Mask/Etch Mask/Etch pad windows

Coming Up...
Next time: Mask layout: design rules, layout examples, structured and symbolic layout techniques, retargetable layouts. CAD tools for layout: design capture, design rule checking, extraction, network comparison.
Readings for next time:
Weste: Chapter 3 through 3.2.3
Johns&Martin: 2 through 2.1 (CMOS processing)
Transparencies: transparency notes (process technology)
Study the CBT course on the web or on the I3S-CD: How a silicon integrated circuit is made (Uni Manchester)
Weste pp168: 3.8 ex 5 (difficulty: easy): Explain why substrate and well contacts are important in CMOS.

VLSI Design II
CMOS Layout
Measure twice, fab once
Overview: CMOS layout and design rules; analog layout design considerations.
Goal: You are familiar with the basic layout design rules of the Alcatel 0.5 µm CMOS process. You know how to lay out integrated transistors, capacitors and resistors, and what has to be considered in order to realize high-quality analog circuits, e.g., matching and shielding.
Sources of Error
Line registration errors:
- resist exposure and development
- over/under etching, lateral diffusion
- uneven topography
- systematic errors are corrected by bloating/shrinking the mask
- random errors: increase minimum widths and spacing
Mask misalignment:
- random errors: increase extensions and surrounds
Other fab difficulties:
- contacts and vias only on flat surfaces
- no devices near boundaries of the well
- no poly contacts over diffusion
- gate metal must connect to diffusion
- minimum metal coverage requirements
Electrical properties:
- current density limitations
- latch-up prevention
Process instabilities:
- mobility variations (why?)
- thin-oxide thickness variations
- sheet resistances
- use of process corners in analysis
Design vs. Actual IC

Line Registration Errors

Mask Alignment Errors (I)

Mask Alignment Errors (II)

Maly, Figure 2-9

Design Rules
Rule types (figure): enclosure rules, exclusion rules, extension (overlap) rules, width rules, spacing rules.
We can specify the design rules in convenient units, e.g., microns, but what happens if we want to manufacture the chip with different manufacturers? One suggestion: use an abstract unit, lambda (λ), and scale the design to the appropriate actual dimensions when the chip is to be manufactured. Usually all edges must lie on a grid; e.g., in the MOSIS scalable rules all edges must be on a half-lambda grid, and in the 0.5 µm Alcatel process all edges must be on a 0.05 µm grid.

Lambda-based Rules

One lambda (λ) = one half of the minimum mask dimension, typically the length of a transistor channel. Under the assumption that the worst-case alignment is better than 0.75λ, the maximum relative misalignment between any two masks is better than 1.5λ. This can be used to derive design rules and to estimate the minimum area and perimeter of a junction before the transistor is laid out.
[Figure: λ-based rules for diffusion (active), poly, metal1 and a 3λ × 3λ contact]
For the 0.5 µm Alcatel process: λ = 0.25 µm
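Scaling λ rules onto a fixed manufacturing grid can be checked mechanically. A minimal sketch, assuming rule values are stored in half-lambda steps and dimensions in integer nanometres (both representation choices and all helper names are illustrative, not part of any design kit). It shows that the 1.5λ worst-case misalignment becomes 0.375 µm, which is not on the 0.05 µm grid and has to be snapped up.

```python
# Integer nanometres avoid floating-point rounding when testing grid alignment.
LAMBDA_NM = 250   # λ = 0.25 µm for the 0.5 µm Alcatel process
GRID_NM = 50      # all edges must lie on a 0.05 µm grid

def half_lambdas_to_nm(half_lambdas: int, lambda_nm: int = LAMBDA_NM) -> int:
    """Convert a rule given in half-lambda steps to nanometres."""
    assert lambda_nm % 2 == 0, "lambda must be an even number of nm"
    return half_lambdas * lambda_nm // 2

def on_grid(nm: int, grid_nm: int = GRID_NM) -> bool:
    return nm % grid_nm == 0

def snap_up(nm: int, grid_nm: int = GRID_NM) -> int:
    """Round up to the next grid point, so a rule is never shrunk."""
    return -(-nm // grid_nm) * grid_nm

misalign = half_lambdas_to_nm(3)   # 2 x 0.75λ = 1.5λ = 3 half-lambdas
print(misalign, on_grid(misalign), snap_up(misalign))   # 375 False 400
```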

Lambda vs. Micron Rules

Lambda-based design rules are based on the assumption that one can scale a design to the appropriate size before manufacture. The assumption is that all manufacturing dimensions scale equally, an assumption that works only over some modest span of time. For example: if a design is completed with a poly width of 2λ and a metal width of 3λ, then minimum-width metal wires will always be 50% wider than minimum-width poly wires. Consider the following data from the Alcatel 0.5 µm process (compare with Weste, Table 3.2, pp145):
| Contacted metal pitch | λ rule | λ = 0.25 µm | micron rule |
|---|---|---|---|
| 1/2 × contact size | 1.5 | 0.375 µm | 0.3 µm |
| contact surround | 1 | 0.25 µm | 0.25 µm |
| metal-to-metal spacing | 4 | 1.0 µm | 0.8 µm |
| contact surround | 1 | 0.25 µm | 0.25 µm |
| 1/2 × contact size | 1.5 | 0.375 µm | 0.3 µm |
| total | 9 | 2.25 µm | 1.9 µm |

+40% in area: the scaled design is legal but much larger than it needs to be!
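The +40% figure follows from the pitch totals in the table, assuming the layout area is set by the contacted metal pitch in both directions (an assumption made only for this estimate):

```python
LAMBDA_UM = 0.25
# Components of the contacted metal pitch:
# 1/2 contact, surround, metal spacing, surround, 1/2 contact
lambda_rule = [1.5, 1, 4, 1, 1.5]            # in λ
micron_rule = [0.3, 0.25, 0.8, 0.25, 0.3]    # in µm

pitch_scaled = sum(lambda_rule) * LAMBDA_UM  # 9λ scaled to microns
pitch_micron = sum(micron_rule)
area_penalty = (pitch_scaled / pitch_micron) ** 2 - 1
print(f"{pitch_scaled:.2f} um vs {pitch_micron:.2f} um -> +{area_penalty:.0%} area")
```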
Retargetable Layouts?
So, should one use lambda rules, or not?
- Probably okay for retargeting between similar processes, e.g., when the later process is a simple shrink of the earlier process. This often happens between generations as a mid-life kicker for a process; some 0.35 µm processes are shrinks of an earlier 0.5 µm process. Can be useful for fabless semiconductor companies.
- Most industrial designs use micron rules to get the extra space efficiency. The cost of retargeting by hand is acceptable for a successful product, but usually it's time for a redesign anyway.
- Invent some way of entering a design symbolically, but use a more sophisticated technique for producing the masks for a particular process. Insight: relative sizes may change, but the topological relationship between components does not. So, instead of shrinking a design, compact it!

0.5 µm CMOS Alcatel Mietec Process

C05M-D layers and mask definition:

| layer name | drawn | mask name |
|---|---|---|
| active | yes | active |
| nwell | yes | n-well |
| pwell | no | (p-well) |
| poly | yes | poly |
| nplus | no | (n+ implant) |
| pplus | yes | p+ implant |
| contact | yes | contact |
| metal_1 | yes | metal 1 |
| via_1 | yes | via 1 |
| metal_2 | yes | metal 2 |
| via_2 | yes | via 2 |
| metal_3 | yes | metal 3 |
| nitride | yes | nitride |
| dractext | yes | |
| nldd | no | (no low-doped drain, Zener) |
| nlddprot | yes | |
| nplusprot | yes | - |
C05M-D: some logical descriptions

| logical name | used masks |
|---|---|
| nwell | nwell |
| pwell | NOT nwell |
| n+diffusion | active AND NOT pplus AND NOT poly |
| p+diffusion | active AND pplus AND NOT poly |
| n+source/drain | active AND NOT pplus AND NOT poly AND NOT nwell |
| p+source/drain | active AND pplus AND NOT poly AND nwell |
| gate | active AND poly |

[Figure: pfet and nfet drawn with the masks nwell, active, pplus and poly, showing the resulting n+diffusion and p+diffusion]

Layout Rules (C05M-D)
#1
n-well, active
[Figure: n-well and active rules (widths, spacings, enclosures, n strap and p strap); the n-well spacing depends on whether the wells are on the same (different) potential]

#2
poly, fets
[Figure: poly and fet dimension rules]

#3
abutting straps
[Figure: abutting strap dimension rules]

#4
metal, contacts, via1, via2
[Figure: metal, contact, via1 and via2 dimension rules]
- contacts need to be covered by metal1
- via1 needs to be covered by metal2

Sticks and Compaction
[Figure panels: stick diagram; horizontal constraints for compaction in X; compact X then Y; compact Y then X; compact X with jog insertion, then Y]
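Compaction in one direction can be posed as a longest-path problem: each spacing rule becomes a constraint x[b] ≥ x[a] + d, and the leftmost legal position of every element is the longest constraint path reaching it. A minimal sketch, assuming the constraints form a DAG (no jog insertion, no maximum-distance constraints); the element names and spacings are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def compact_x(elements, constraints):
    """Leftmost legal x positions under constraints (a, b, d): x[b] >= x[a] + d.

    Longest-path relaxation, visiting elements in topological order.
    """
    preds = {e: set() for e in elements}
    for a, b, _ in constraints:
        preds[b].add(a)
    x = {e: 0 for e in elements}
    for e in TopologicalSorter(preds).static_order():
        for a, b, d in constraints:
            if b == e:
                x[e] = max(x[e], x[a] + d)
    return x

# Hypothetical stick elements (columns) with minimum spacings in λ
elements = ["A", "B", "C", "D"]
constraints = [("A", "B", 3), ("A", "C", 2), ("B", "D", 2), ("C", "D", 4)]
print(compact_x(elements, constraints))
```

Compacting Y afterwards uses the same routine with vertical constraints; the order (X then Y, or Y then X) changes the result, as the figure panels suggest.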

Digital Layout: Choosing a style
Vertical gates: good for circuits where fet sizes are similar and each gate has limited fanout. Best choice for multiple-input static gates and for datapaths.
Horizontal gates: good for circuits where long and short fets are needed or where nodes must control many fets. Often used in multiple-output complex gates (e.g., sum/carry circuits).
What about routing signals between gates? Note that both layouts block metal/poly routing inside the cell. Choices: metal2 routing over the cell, or routing above/below the cell.
- avoid long (> 50 squares) poly runs
- don't capture white space in a cell
- don't obsess over the layout; instead make a second pass, optimizing where it counts
Digital Layout: Optimising Connections
Which is the better gate layout, considering node capacitances? Considering composability with neighbouring gates?

Digital Layout: Big vs. Parallel

Can't make gates too long because of poly resistance! Eventually, really large transistors have to be broken into smaller transistors wired in parallel.
[Figure: alternative layouts with areas of 94 µm², 73 µm² and 133 µm²]
Which is the better gate layout, considering node capacitances? Considering composability with neighbouring gates?

Digital Layout: Eliminating Gaps

[Figure: reordering the gate inputs A, B, C, D, E to eliminate diffusion gaps]
Analog Layout: Large Transistors

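W/L can be very large in analog circuits, so such transistors are split into parallel fingers. Due to the asymmetric layout, node 1 has a smaller capacitance, so it should be used for the most critical (high-impedance) node.
[Figure: transistor split into four parallel fingers Q1 to Q4 with junctions J1 to J5 alternately connected to node 1 and node 2; the gates are connected together]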

Analog Layout: Matching
With lithographic processing, a variety of two-dimensional effects can cause the effective sizes of components to differ from the sizes drawn on the glass layout masks:
- lateral diffusion
- overetching
- mask misalignment
- ...
Goal: second-order size-error effects are matched mainly by making larger objects out of several unit-sized components connected together. For best accuracy, the boundary conditions around all objects should be matched, even when this means adding extra unused components.
[Figure: lateral diffusion under the SiO2 protection of a well, and overetching of a poly gate under its mask]

Matching Transistor Layouts: Common-Centroid Layout

- use interdigitated finger structures to keep the effect of temperature and oxide-thickness gradients low
- use one outside finger for M1, one for M2
- keep symmetry in x and y
- fets in analog circuitry are typically much wider than in digital circuits
[Figure: interdigitated pair M1/M2 with finger order M2 M1 M1 M2 M2 M1 M1 M2 M2 M1; shared source S(M1,M2), gates G(M1), G(M2), drains D(M1), D(M2)]
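Why interdigitated, common-centroid patterns help can be seen with a one-dimensional model: under a linear gradient, each device's total depends only on the centroid of its fingers. A sketch with hypothetical finger sequences for two devices A and B with equal finger counts (not the exact pattern from the figure); the gradient slope is an arbitrary illustrative value.

```python
def centroids(pattern):
    """Mean finger position of each device in a 1-D finger sequence."""
    positions = {}
    for i, dev in enumerate(pattern):
        positions.setdefault(dev, []).append(i)
    return {dev: sum(p) / len(p) for dev, p in positions.items()}

def gradient_mismatch(pattern, slope):
    """A minus B of summed finger strengths under a linear gradient 1 + slope*i."""
    total = {}
    for i, dev in enumerate(pattern):
        total[dev] = total.get(dev, 0.0) + 1 + slope * i
    return total["A"] - total["B"]

for p in ("AABB", "ABBA"):
    print(p, centroids(p), gradient_mismatch(p, 0.01))
```

With equal centroids ("ABBA") a linear gradient cancels; side-by-side placement ("AABB") does not.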
Capacitor Matching
#1
Material:
- preferable: poly1 - poly2 structures (only C05M-A)
- if not available: poly1 - diffusion (C05M-D), but nonlinear due to voltage dependency
- sandwich structures with poly - metal1
In analog design, precise ratios of capacitors are very often used. The major source of error in realized capacitors is overetching; a less relevant one is an oxide-thickness gradient across the surface. Goal: larger capacitors are realized by a parallel combination of smaller unit-sized capacitors (overetching). If unit-sized capacitors are not realizable, overetching can still be minimized by realizing a non-unit-sized capacitor with a specific perimeter-to-area ratio. For very accurate ratios, common-centroid layout is additionally used (oxide-thickness gradient).

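Capacitor Matching
#2
A capacitor drawn as $x \times y$ is shrunk by the overetch $e$ on every edge, so its actual dimensions are
$$x_a = x - 2e, \qquad y_a = y - 2e$$
The drawn capacitance is
$$C = \frac{\varepsilon_{ox}}{t_{ox}} A = C_{ox}\, x y$$
and the realized capacitance is
$$C_a = C_{ox}\, x_a y_a = C_{ox}(x - 2e)(y - 2e)$$
Neglecting the $4e^2$ term, the error is
$$\Delta C_t = C_{ox}\, x_a y_a - C_{ox}\, x y \approx -2e(x + y)\, C_{ox}, \qquad \varepsilon_t = \frac{\Delta C_t}{C} \approx -\frac{2e(x + y)}{x y}$$
For two capacitors,
$$\frac{C_{2a}}{C_{1a}} = \frac{C_2(1 + \varepsilon_2)}{C_1(1 + \varepsilon_1)}$$
which equals $C_2/C_1$ ideally, i.e., when $\varepsilon_1 = \varepsilon_2$. If $C_2$ is built from $n$ unit capacitors $C_1$, then $C_{2a} = nC_{1a}$ and
$$\frac{C_{2a}}{C_{1a}} = \frac{nC_1(1 + \varepsilon)}{C_1(1 + \varepsilon)} = n$$
[Figure: poly top plate over poly bottom plate with overetch e; unit capacitors arranged C1 C2 C2 C1 in a well region with well contacts, for poly etch matching]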
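The overetch effect can be checked numerically. A sketch with an assumed overetch of e = 0.1 µm per edge (an illustrative value, not a process figure), comparing a 4-unit capacitor drawn as one large square with four 10 µm × 10 µm unit capacitors in parallel:

```python
def etched_area(x, y, e):
    """Plate area after every edge is overetched by e (same units as x, y)."""
    return (x - 2 * e) * (y - 2 * e)

E = 0.1                                  # assumed overetch per edge, µm
unit = etched_area(10, 10, E)            # one 10 µm x 10 µm unit capacitor
one_big_plate = etched_area(20, 20, E)   # 4 units drawn as a single square
four_units = 4 * unit                    # 4 unit capacitors in parallel

print(one_big_plate / unit)   # ratio error from the perimeter-to-area mismatch
print(four_units / unit)      # unit capacitors keep the ratio exact
```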
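Capacitor Matching
#3
Unit-sized capacitors $C_1$ are square ($x_1 \times x_1$); non-unit-sized capacitors $C_2$ are rectangular and usually between 1 and 2 times a unit-sized capacitor ($1 < K \le 2$):
$$K = \frac{C_2}{C_1} = \frac{A_2}{A_1} = \frac{x_2 y_2}{x_1^2}$$
The perimeter-to-area ratio should be kept identical:
$$\frac{P_2}{A_2} = \frac{P_1}{A_1} \;\Rightarrow\; \frac{P_2}{P_1} = \frac{A_2}{A_1} = K \;\Rightarrow\; K = \frac{2(x_2 + y_2)}{4x_1} = \frac{x_2 + y_2}{2x_1}$$
Solving both conditions for the sides of $C_2$:
$$y_2 = x_1\left(K + \sqrt{K^2 - K}\right), \qquad x_2 = x_1\left(K - \sqrt{K^2 - K}\right)$$
[Figure: 4 unit capacitors next to one non-unit capacitor, K = 1 ... 2]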
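The two conditions can be solved directly for $x_2$ and $y_2$. A sketch (function name illustrative) checked against the Johns&Martin exercise at the end of this section, reading 2.314 units as one unit capacitor plus a non-unit part with K = 1.314; that decomposition is an interpretation consistent with the stated result (y2 = 19.56 µm, x2 = 6.717 µm).

```python
import math

def nonunit_capacitor(k, x1):
    """Sides (x2, y2) of a rectangle with K times the area of an x1-by-x1 unit
    square and the same perimeter-to-area ratio (1 <= K <= 2)."""
    if not 1 <= k <= 2:
        raise ValueError("K should lie between 1 and 2")
    root = math.sqrt(k * k - k)
    return x1 * (k - root), x1 * (k + root)

x2, y2 = nonunit_capacitor(1.314, 10.0)      # unit capacitor: 10 µm x 10 µm
print(f"x2 = {x2:.3f} um, y2 = {y2:.2f} um")
# Check: area ratio K and perimeter/area equal to the unit square's 0.4 / µm
print(x2 * y2 / 100.0, 2 * (x2 + y2) / (x2 * y2))
```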
Analog Layout: Resistor #1
Resistor value:
$$R = R_{sq}\,\frac{L}{W}, \qquad R_{sq} = \frac{\rho}{t}$$
Material: many different materials can be used, and they have different non-ideal effects. Absolute accuracy is low (errors of up to ±20%); matching on the order of 1% can be achieved at best.
- polysilicon (salicided and non-salicided in the C05M-A and C05M-D processes)
- diffusions or ion-implanted regions (n/p-diff, n-well)

| material | typ. R_sq | temp. coeff. | non-ideality |
|---|---|---|---|
| metal1 | 72 mΩ/sq | 0 | not used |
| metal2 | 55 mΩ/sq | 0 | not used |
| metal3 | 34 mΩ/sq | 0 | not used |
| salicided poly | 2.3 Ω/sq | 4300 ppm/°C | parasitic cap |
| n+ diff, salicided | 2.3 Ω/sq | 4300 ppm/°C | V dependent, nonlinear |
| p+ diff, salicided | 2.1 Ω/sq | 4300 ppm/°C | V dependent, nonlinear |
| unsalicided n+ poly | 325 Ω/sq | 2000 ppm/°C | parasitic cap |
| n+ diff, unsalicided | 50 Ω/sq | 1600 ppm/°C | V dependent, nonlinear |
| p+ diff, unsalicided | 70 Ω/sq | 1600 ppm/°C | V dependent, nonlinear |
| n-well | 1.3 kΩ/sq | 4300 ppm/°C | V dependent |

most commonly used
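A quick way to size a resistor from the table above is $R = R_{sq} L/W$. A sketch that ignores contact-end and corner contributions (real layouts must add these); the example lengths, widths and the 100 kΩ target are illustrative.

```python
def resistance(r_sq, length, width):
    """R = R_sq * L / W, with L and W in the same units and R_sq in ohm/square."""
    return r_sq * length / width

def length_for(r_target, r_sq, width):
    """Drawn length needed to reach a target resistance at a given width."""
    return r_target * width / r_sq

# Sheet resistances from the table above (ohm/square)
R_SQ = {"unsalicided n+ poly": 325.0, "n-well": 1300.0, "salicided poly": 2.3}

print(resistance(R_SQ["unsalicided n+ poly"], length=50, width=2))  # 25 squares
print(length_for(100e3, R_SQ["n-well"], width=2))  # µm of n-well for 100 kOhm
```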

Analog Layout: Resistor #2

[Figure: examples of possible resistor layouts (annotated 0.14 Rsq and 2.11 Rsq) and matched resistors]

Analog Layout: Noise Considerations #1
Where does noise coupling occur? Every time a digital gate changes its state, a glitch is injected onto the digital power supply and into the surrounding substrate:
- through direct ohmic connections (power supply lines)
- via electromagnetic fields (e.g., capacitive coupling into and from the substrate)
How can noise be reduced?
- use different power supply lines
- lay out analog and digital circuitry in different sections of the chip
- protect the analog layout with guard rings
- use shields connected to power and ground
[Figure: power supply wiring options for the analog and digital parts, with shared or separate pads and pins]
Analog Layout: Noise Considerations #2
Use of shields:
[Figure: ground shield (n+ in an n-well on the p- substrate) between analog and digital interconnect]
Separate analog and digital parts with guard rings:
[Figure: analog region and digital region, each surrounded by a VSS p+ guard ring, separated by a VDD n+ n-well ring; the n-well depletion region acts as a bypass capacitor]
Summary of Analog Layout Rules

When drawing the layout for analog circuits, one has to consider many details:
- layout design rules, in order to get correct circuits without shorts between layers or open circuits due to misaligned layers
- avoid parasitic components:
  - resistors: take care of the length of interconnect wires and the material used for interconnects; add enough contacts.
  - capacitors: there is a parasitic capacitor between any two isolation layers. Minimize the size of all areas that do not need a specific size for their functionality.
Increase matching accuracy by
- using common-centroid layout
- using non-minimum-sized components
- using capacitors with a constant area-to-perimeter ratio
Reduce noise coupling by
- separating analog and digital parts
- using separate power supplies
- using shielding techniques
Checking Layouts
Design Rule Checker (DRC). This is a program that checks each piece of the layout against the process design rules. This is a slow process:
- canonicalize the layout into a set of leading and trailing non-overlapping mask edges; some Boolean mask operations may be needed.
- determine electrical connectivity and label each edge with the node it belongs to.
- test each edge end point against neighboring edges to check for spacing (leading edges) and width (trailing edges) violations.
Layout vs. Schematic (LVS). First a netlist is extracted from the layout: use the electrical information generated by the DRC and then recognize transistors as juxtapositions of channel with diffusion. Then check whether the extracted netlist is isomorphic to the schematic netlist. This is done by a coloring algorithm:
- initialize all nodes to the same color
- compute a new color for each node as some hashing function involving the colors of connected (i.e., through a fet) nodes
- nodes that have a unique color are isomorphic to the similarly colored node in the other network
- worry about parallel fets and ambiguous nodes
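The coloring step of LVS can be sketched as iterative color refinement. This simplified version treats each fet as an edge between the nets it connects, ignores device types and terminal roles, and runs a fixed number of rounds; one shared signature table (instead of a hash function) keeps colors comparable between the two networks. Real LVS tools do considerably more, and the net names below are hypothetical.

```python
from collections import Counter

def build_adj(edges):
    """Undirected adjacency: nets connected through a fet are neighbours."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def refine_jointly(graphs, rounds=3):
    colors = [{n: 0 for n in g} for g in graphs]   # all nodes start with one color
    for _ in range(rounds):
        table = {}                                 # signature -> new color, shared
        new_colors = []
        for g, col in zip(graphs, colors):
            nxt = {}
            for n in sorted(g):
                # signature: own color plus the multiset of neighbour colors
                sig = (col[n], tuple(sorted(col[m] for m in g[n])))
                nxt[n] = table.setdefault(sig, len(table))
            new_colors.append(nxt)
        colors = new_colors
    return colors

def compare_netlists(schematic_edges, layout_edges, rounds=3):
    g1, g2 = build_adj(schematic_edges), build_adj(layout_edges)
    c1, c2 = refine_jointly([g1, g2], rounds)
    hist1, hist2 = Counter(c1.values()), Counter(c2.values())
    matches = {}
    for color, count in hist1.items():
        if count == 1 and hist2.get(color) == 1:   # unique color on both sides
            n1 = next(n for n, c in c1.items() if c == color)
            n2 = next(n for n, c in c2.items() if c == color)
            matches[n1] = n2
    return hist1 == hist2, matches

schematic = [("a", "b"), ("b", "c"), ("c", "d"), ("b", "e")]
layout = [("q", "t"), ("r", "s"), ("p", "q"), ("q", "r")]
print(compare_netlists(schematic, layout))
```

Nodes a and e (and p and t) keep the same color: they are the "ambiguous" nodes that need extra treatment.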
Coming Up...
Next topic: Small-signal fet model
Readings for next time:
Weste: 3.4 through 3.4.7
Johns&Martin: 2.3 (CMOS layout design rules), 2.4 (analog layout design considerations)
Optional: have a look at the Alcatel CMOS C05M-D design rules manual

#1
Ex vlsi15.1 (difficulty: easy): Assume the 0.5 µm Alcatel Mietec process. Use the rules to calculate the minimal area and perimeter of the following layout structure. Result: a) AJ1 = 4.5 µm², AJ2 = 3.188 µm², AJ3 = 2.25 µm², PJ1 = 6 µm, PJ2 = 6 µm, PJ3 = 1.5 µm (see Johns&Martin pp99)
[Figure: two transistors Q1 and Q2 with junctions J1, J2 and J3]

#2
Johns&Martin pp110: 2.3 (difficulty: easy): Show a layout that might be used to match two capacitors of size 4 and 2.314 units, where a unit-sized capacitor is 10 µm × 10 µm. Result: y2 = 19.56 µm, x2 = 6.717 µm
[Figure: a 4-unit capacitor next to a 2.314-unit capacitor]
Johns&Martin pp123ff: 2.14, 2.15, 2.16, 2.17


CMOS Layout (replicating)
Measure twice, fab once
Today's handouts: (1) Lecture Slides (2) Problem Set #5 (3) Inverter Layout Tutorial

JMM/ESA v1.0
Design for Re-use
- What's the schematic for this cell?
- What are the fat fets?
- The cell was designed for placement under a metal2/metal3 routing grid. How was the layout affected by this design requirement?

Replicating Cells
What does this cell do? What if we want to replicate this cell vertically, i.e., make a stack of cells, to process many bits in parallel?
- What nodes are shared among the cells?
- What nodes aren't shared?
- How should we arrange the cells vertically?

Vertical Replication
Place shared geometry symmetrically about the shared boundary. Place items that aren't to be shared at 1/2 the minimum spacing rule from the shared boundary.
Reflect the cell about the X axis so that the pfets are next to each other: this avoids a large ndiff/pdiff spacing. Run shared control signals vertically; they'll wire themselves up automatically?

Vertical Intercell Routing

Suppose we have a signal that will run vertically from one cell to the next, e.g., the carry-out from one cell becomes the carry-in for the cell above. This looks okay until we reflect the cell when we do the vertical replication!
[Figure: carry-in from the cell below, carry-out to the cell above]
Solution: we have to do the routing for vertical intercell signals for a pair of cells, then replicate the pair (complete with routing) vertically.

Building a Datapath
It's often the case that we want to operate on many bits in parallel. A sensible way to arrange the layout of this sort of logic is as a datapath, where data signals run horizontally between functional units and control signals run vertically to all the bits of a particular functional unit:
[Figure: bits #3 to #0 stacked vertically; data runs horizontally, control runs vertically]
Logic that generates the control signals can be placed at the bottom of the datapath. If the control logic is complicated or irregular, it might be placed in a separate standard-cell block, with only the control signal buffers placed just below the datapath. Although it's tempting to run control signals in poly (so they can control fets directly), this is unwise for tall datapaths because of poly resistance (e.g., 32 bits × 20 µm/bit = 640 µm ≈ 1000 squares ≈ 20 kΩ!).
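The poly-resistance estimate can be reproduced by counting squares. A sketch in which the poly width (0.6 µm) and poly sheet resistance (20 Ω/square) are assumptions chosen to match the round numbers above; the metal1 value (72 mΩ/square) comes from the Alcatel resistor table earlier in this section, even though this datapath example may refer to a different process.

```python
def wire_resistance(length_um, width_um, r_sq_ohm):
    """Return (number of squares, resistance in ohms) of a straight wire."""
    squares = length_um / width_um
    return squares, squares * r_sq_ohm

length = 32 * 20                                        # 32 bits x 20 µm/bit
poly_sq, poly_r = wire_resistance(length, 0.6, 20.0)    # assumed poly line
m1_sq, m1_r = wire_resistance(length, 0.6, 0.072)       # same width in metal1
print(f"{poly_sq:.0f} squares: poly {poly_r/1e3:.1f} kohm vs metal1 {m1_r:.0f} ohm")
```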

Datapath Bit Pitch

How tall should we make each bit of the datapath? That depends on
- the width of the nfets and pfets
- how much in-cell routing there is
- how much over-the-cell global routing there is
Global routes can be determined from the datapath schematic:
[Figure: datapath schematic with SHIFTER and ADDER on the signals RESULT, OP1, OP2, CIN and EN; three global routing tracks required]
Internal routing may take additional tracks.
[Figure: cell routing plan for BOOLE and MULT cells with OP and EN signals: vdd (m2), global route (m2), in-cell route (m2), control (m1), gnd (m2)]
Adder Datapath
[Figure: adder datapath layout: power strapping (M1 = GND, M3 = VDD), 32-bit carry-lookahead adder, tristate output enable, control logic, 32-bit register with tristate driver]
Shifter Datapath
[Figure: shifter datapath with shift stages <<1, <<2, <<4, <<8, <<16 and >>2, >>4, >>8; shift right]

Design for Re-use
- What does this cell do?
- What are the fat fets?
- The cell was designed for placement under a metal2/metal3 routing grid. How was the layout affected by this design requirement?

Breaking the Rules
[Figure: memory cell layout with bit lines BIT and BIT-bar and a word line]
- How are neighboring cells placed?
- Isn't the word line a long poly wire?
- Where's the p-substrate contact?

Coming Up...
Next time: Scaling effects, fundamental limits. Submicron design issues. Power dissipation and packaging. Readings for next time: Weste 6.3.7 through 6.3.9


Predicting the Future
I see I see a supercomputer the size of a sugar cube...! Neat. Where do I invest?
Today's handouts: (1) Lecture Slides (2) Mead and Conway, Chapter 9 (1981)

Scaling
Over time, process improvements will allow MOSFETs to scale down by some factor α:
$$W/\alpha, \quad L/\alpha, \quad t/\alpha, \quad x_j/\alpha, \quad \alpha N_A, \quad t_{ox}/\alpha$$
What happens?
Scaling Theory is a model which provides first order predictions.

Often, different dimensions will scale at different rates. But for an overall picture of what the future portends, there are two major scaling models:
1. Constant voltage scaling: all spatial dimensions scale equally, $W \to W/\alpha$, $L \to L/\alpha$, $t_{ox} \to t_{ox}/\alpha$, and some other quantities do as well: depletion thickness $d \to d/\alpha$, doping $N_A \to \alpha N_A$.
2. Constant field scaling: scale VDD too, $V \to V/\alpha$, so that electric fields remain the same.

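First, let's consider constant field scaling, and use basic MOSFET models to predict the effect of scaling by α:

| Parameter | Effect |
|---|---|
| $W/L$ | |
| $C_g = C_{ox} W L$ | |
| $I_d \propto C_{ox}(W/L)(V_{gs} - V_t)^2$ | |
| device power $= V I$ | |
| area $= W L$ | |
| device power / area | |
| $R_{diff}$ | |
| $R_{metal}$ | |
| $R_{poly}$ | |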

Speedup!
Transit time across a channel of length L:
$$\tau = \frac{L}{\mu E}$$
Transit time scales as ___________
Can also compute as the time to discharge the gate capacitance:
$$\text{delay} = \frac{C_g V}{I}$$
Gate discharge time scales as _________

Interconnect
Local (metal) interconnect delay = RC
[Figure: wire with length L, width W and spacing d, carrying current I]
R = ___________ Scaled R = R ________
C = ____________ Scaled C = C ________
Scaled delay = delay ___________
This turns out to be an overoptimistic prediction; more later...

Scaling Table
First Order Scaling (Weste Table 4.12)
In lateral scaling, we only change the channel length L.

| Parameter | Constant field | Constant voltage | Lateral |
|---|---|---|---|
| Length (L) | 1/α | 1/α | 1/α |
| Width (W) | 1/α | 1/α | 1 |
| Voltage (V) | 1/α | 1 | 1 |
| Gate oxide thickness (tox) | 1/α | 1/α | 1 |
| Current | 1/α | α | α |
| Transconductance | 1 | α | α |
| Junction depth | 1/α | 1/α | 1 |
| Substrate doping (Na) | α | α | 1 |
| Gate field (E) | 1 | α | 1 |
| Depletion layer thickness | 1/α | 1/α | 1 |
| Load capacitance (WL/tox) | 1/α | 1/α | 1/α |
| Gate delay (VC/I) | 1/α | 1/α² | 1/α² |

Resulting influence:

| Quantity | Constant field | Constant voltage | Lateral |
|---|---|---|---|
| DC power dissipation | 1/α² | α | α |
| Dynamic power dissipation | 1/α² | α | α |
| Power-delay product | 1/α³ | 1/α | 1/α |
| Gate area | 1/α² | 1/α² | 1/α |
| Power density (VI/A) | 1 | α³ | α² |
| Current density | α | α³ | α² |

Constant field: devices get faster and lower power, though current density goes up.
Constant voltage: devices get even faster, though overall power and power density rise.
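Every "resulting influence" entry follows from the four primary scaling factors. A sketch that tracks only the exponent k of each quantity (quantity ∝ α^k), assuming the square-law model $I_d \propto C_{ox}(W/L)(V_{gs} - V_t)^2$ with $V_t$ scaling like $V$, and a clock frequency inversely proportional to gate delay; it reproduces the table above.

```python
def derived_exponents(L, W, V, tox):
    """Exponent k (quantity ~ alpha**k) of each derived quantity, square-law model."""
    C = -tox + W + L              # C_g = C_ox W L, with C_ox ~ 1/t_ox
    I = -tox + W - L + 2 * V      # I_d ~ C_ox (W/L) (V_gs - V_t)^2
    delay = V + C - I             # gate delay = V C / I
    p_dc = V + I                  # DC power = V I
    p_dyn = C + 2 * V - delay     # dynamic power = C V^2 f, with f ~ 1/delay
    area = W + L
    return {
        "current": I,
        "load capacitance": C,
        "gate delay": delay,
        "DC power": p_dc,
        "dynamic power": p_dyn,
        "power-delay product": p_dyn + delay,
        "gate area": area,
        "power density": p_dc - area,
        "current density": I - area,
    }

MODELS = {
    "constant field":   dict(L=-1, W=-1, V=-1, tox=-1),
    "constant voltage": dict(L=-1, W=-1, V=0, tox=-1),
    "lateral":          dict(L=-1, W=0, V=0, tox=0),
}

for name, dims in MODELS.items():
    print(f"{name:17s}", derived_exponents(**dims))
```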

Die Size
With basic scaling of the same system, we'd just end up with smaller and smaller chips.
However, from year to year, the overall die size stays about the same or grows as we add features to the chip.
Fab improvements (mostly, bigger wafers) are what allow for bigger dies. Because the die doesn't shrink, global interconnect, particularly clocks and on-chip buses, doesn't shrink either.

Global Interconnect Scaling

Interconnect scaling for global signals: the length L stays the same, while $W \to W/\alpha$ and $t \to t/\alpha$:
$$R' = R\,\alpha^2, \qquad C' = C, \qquad \text{delay}' = \text{delay}\cdot\alpha^2$$
Even worse: the wire starts looking like a lossy distributed RC line, with $O(L^2)$ delay!
In the submicron domain, this increased significance of wires has led to major turmoil in the CAD industry.
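The α² result can be reproduced with the same exponent bookkeeping used for devices. A sketch under stated assumptions: $R = \rho L/(Wt)$, a parallel-plate $C = \varepsilon L W/d$ whose dielectric thickness d scales like W (so that C stays constant, as stated above), and no fringing; the local case uses the same model with L also scaled.

```python
def wire_exponents(L, W, t, d):
    """Exponent k (quantity ~ alpha**k) for R = rho*L/(W*t) and a parallel-plate
    C = eps*L*W/d; rho and eps are held constant and fringing is ignored."""
    r = L - W - t
    c = L + W - d
    return {"R": r, "C": c, "RC delay": r + c}

local = wire_exponents(L=-1, W=-1, t=-1, d=-1)    # every dimension shrinks by alpha
global_ = wire_exponents(L=0, W=-1, t=-1, d=-1)   # length set by the constant die size
print("local:", local)
print("global:", global_)
```

In this first-order model, local wire delay stays constant while gates get faster, and global wire delay grows by α².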

Power Scaling
Power per chip increases with constant voltage scaling and when die size grows. How does this affect us? Junction temperature is a function of power and of the thermal resistance $\theta_{ja}$ to the environment. Example: a 30 W chip at 27 °C ambient:
$$T_j = 27\,^\circ\text{C} + 30\,\text{W}\cdot\theta_{ja}$$
- $\theta_{ja} = 2$ °C/W: junction temp = ___________
- $\theta_{ja} = 0.1$ °C/W: junction temp = _______
[Figure: chip with heat sink; heat also flows through the pins to the PC board]
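A quick way to check the two blanks: the lumped model $T_j = T_{ambient} + P\,\theta_{ja}$, with a single thermal resistance per case (a simplification that ignores separate paths through the heat sink and the pins):

```python
def junction_temp(ambient_c, power_w, theta_ja_c_per_w):
    """Steady-state junction temperature: T_j = T_ambient + P * theta_ja."""
    return ambient_c + power_w * theta_ja_c_per_w

for theta in (2.0, 0.1):
    print(f"theta_ja = {theta} C/W -> T_j = {junction_temp(27, 30, theta):.1f} C")
```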
In the submicron domain, it's difficult to scale VDD, so power follows constant voltage scaling. This adds impetus to the already important goal of reducing the power of VLSI systems. Some of the main ways of doing this:
1. Reduce unnecessary on-chip transitions by careful logic design, or by disabling the clock to idle systems.
2. Reduce voltage, use more parallelism.
3. Adiabatic logic.

Problems with scaling theory

Can one scale indefinitely? No. What are the limits?
Are they fundamental limits?
Is there any difference?
Are they technical limitations?
Is the current technology close to those limits?

Some limits
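Current density J increases with scaling:
$$J = \frac{I}{W t}, \qquad I' = I/\alpha, \qquad J' = J\alpha$$
[Figure: wire of length L and width W carrying current I]
Metal migration imposes a limit on current density.
==> Thicker wires and more metal layers are needed.
==> Increased fringing capacitance with thicker wires.
Punchthrough: the source/drain depletion regions (width $X_d$) touch across the channel length L:
$$V_{PT} = \frac{q N_A L^2}{2\varepsilon}$$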

Subthreshold leakage
Subthreshold conductance is proportional to $\exp\left(\frac{V_{gs} - V_t}{kT/q}\right)$.
We can scale $V_t$ by α via ion implantation, but $kT/q = 0.025$ V does not scale. As $V_t$ falls, subthreshold current ______________ exponentially. Example: $V_t = 0.5$ V means that the leakage current time constant is $10^7$; $V_t = 0.1$ V means that the leakage current time constant is $10^1$.
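The exponential sensitivity is easy to quantify. A sketch comparing $V_t = 0.5$ V with $V_t = 0.1$ V at the same $V_{gs}$; the ideality factor n is an added assumption (n = 1 reproduces the formula as written, real devices have n > 1, which softens the effect). The result is of the same order as the several decades implied by the example above.

```python
import math

def leakage_ratio(vt_high, vt_low, n=1.0, kt_q=0.025):
    """Growth of subthreshold current when V_t drops from vt_high to vt_low,
    using I ~ exp((V_gs - V_t) / (n kT/q)) at fixed V_gs."""
    return math.exp((vt_high - vt_low) / (n * kt_q))

r = leakage_ratio(0.5, 0.1)
print(f"{r:.2e} ({math.log10(r):.1f} decades)")
```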

Threshold Variations
Threshold varies from transistor to transistor.
[Figure: inverter between VDD and ground with output Vout]
If the pullup has a big threshold and the pulldown has a small threshold, and the sum of the threshold variations exceeds VDD, then the inverter will not invert (Vout = 0 V always). How likely is this?

Threshold Variations, cont.

Analyze using a Gaussian distribution. For a given inverter:
$$P(\text{inverter fails}) = \exp\left(-\frac{4 V_{DD}}{\sigma_{V_{th}}}\right), \qquad \sigma_{V_{th}} = 0.08\ \text{V (Mead \& Conway, p. 343)}$$
- VDD = 5 V ===> $P \approx 10^{-110}$
- VDD = 0.5 V ===> $P \approx 10^{-11}$
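The two probabilities follow directly from the formula. A sketch that returns log10 of the failure probability to keep the numbers readable (the function name is illustrative); it gives about $10^{-108.6}$ and $10^{-10.9}$, consistent with the rounded values above.

```python
import math

def log10_p_fail(vdd, sigma_vth):
    """log10 of P(inverter fails) = exp(-4 * VDD / sigma_Vth)."""
    return -4 * vdd / sigma_vth / math.log(10)

for vdd in (5.0, 0.5):
    print(vdd, round(log10_p_fail(vdd, 0.08), 1))
```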
But with 10,000,000 transistors on the chip, a broken chip is very likely.
Question: Should threshold variance increase or decrease with scaling?

Lithographic Scaling Limits
[Figure: p. 1110 of Halliday and Resnick]
- Ultraviolet: λ = 0.3 µm
- X-ray lithography: λ = __________
- Synchrotron lithography?
- Wavelength of an electron?
- Cost of fabs.
- Optical tricks.
Fundamental Physical Limits

- Thermodynamic: How much entropy change to set a bit? Reversibility.
- Quantum limits: Tunnelling. For an $E_b$ of 1 eV, the gate oxides and depletion layers must be thicker than 1 nm. In the IBM 0.4 µm process, the gate thickness is 7 nm.
- Thermal limits.

Is this the beginning of the end?
Not really. VLSI is not yet really up against any fundamental physical constraint. The constraints that we're facing are technological hurdles. With sufficient economic incentive, technological hurdles are cleared. Wires are a lot more important than in the past.

Coming Up...
Next topic: MOS memories. Static and dynamic RAM cells. Single- and double-ended bit line sensing. Multiport register files. Readings for next time: Weste 4.13


CMOS Memories
I wonder which part does the remembering?

Semiconductor Memories
Usually the majority of transistors found in a modern system are devoted to data storage in the form of random-access memories. The need for increased densities and lower prices has driven the development of improved VLSI technology. Uses:
- main memory: high capacity, low cost
- cache memories, TLBs: fast access
- programming info (e.g., FPGA): non-volatile
Read-only memories: ROM (non-volatile!)
- Mask programmed
- Programmable ROM (PROM)
- Erasable PROM (EPROM)
- Electrically Erasable PROM (EEPROM)
Read/Write or Random Access memories: RAM
- Static RAM (SRAM)
- Multiport SRAM (Register Files)
- Content-Addressable Memories (CAM)
- Non-volatile SRAM (NVRAM)
- Dynamic RAM (DRAM)
- Serial-access video memories (VRAM)
- Synchronous DRAM (SDRAM)
- RAMBUS
- ...
Design Tradeoffs
density: bits/unit area. Usually higher density also means lower cost per bit. Improvements due to finer lithography, better capacitor structures, new materials with higher dielectric constants.
Speed: access time (latency) and bandwidth. Improvements due to better sensing (smaller voltage swing), increased parallelism (overlapped accesses), faster I/O.
Power consumption: want power to depend on access pattern not quantity of bits stored. Improvements due to lower supply voltage.
Improvements in one dimension come at an increased cost in the other dimensions.

Memory Architecture
[Figure: array of 2^N rows by 2^M columns of one-bit memory cells; an N-bit row address decoder drives the word lines, the bit lines feed a column decoder that delivers D data bits; N+M address bits in total]
- Most memory layouts are folded, i.e., D < M. Why?
- What are the practical upper bounds on M and N?
- What if you want even more memory?
- Why only one bit per cell? (Not a silly question!)
- Why are page-mode accesses a good idea?
ROM Circuits
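NOR-based ROM array
[Figure: NOR ROM array with word lines R1 to R4 and bit lines C1 to C4; shared ground and shared bit line contacts]

| R1 | R2 | R3 | R4 | C1 | C2 | C3 | C4 |
|---|---|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 |
| 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 |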
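Each bit line of a NOR ROM computes a NOR over the rows that have a pulldown on that column. A sketch assuming the usual convention (bit lines pulled high; a pulldown fet at a row/column crossing pulls the column low when that row's word line is active; no output inverter), so the table above corresponds to a pulldown wherever a 0 is read:

```python
# 1 = pulldown fet present at (row, column); rows R1..R4, columns C1..C4
PULLDOWN = [
    [1, 0, 1, 0],   # R1
    [1, 1, 0, 0],   # R2
    [0, 1, 1, 0],   # R3
    [1, 0, 0, 1],   # R4
]

def read_columns(word_lines, pulldown=PULLDOWN):
    """Each bit line is a NOR of (word line AND pulldown present) over all rows."""
    n_cols = len(pulldown[0])
    return [
        int(not any(w and pulldown[r][c] for r, w in enumerate(word_lines)))
        for c in range(n_cols)
    ]

print(read_columns([1, 0, 0, 0]))
```

Programming the ROM changes only which crossings have a pulldown, i.e., one layer of the layout.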

ROM Layout
[Figure: ROM layout with VDD and GND, shared contacts, shared ground, positions with and without a pulldown, and periodic ground and word line refresh]
- Which are the word lines? The bit lines?
- Why are the word lines strapped with M2?
- What layers change when programming changes?
- How often should signals be refreshed?
ROM Performance
$$t_{ACCESS} = t_{ROW\,DECODE} + t_{COLUMN} + t_{COL\,DECODE}$$
- $t_{ROW\,DECODE}$: If the ROM is large, the row decode logic is just a small percentage of the total area, so we can make the word line driver large and thus fast. Note that we need to strap the poly word line to eliminate the slowdown due to poly resistance.
- $t_{COL\,DECODE}$: As with the row decode logic, we can increase speed by increasing the size of the transistors in this section.
- $t_{COLUMN}$: We want small program transistors to keep the total area of the ROM as small as possible. Also, increasing the size of the pulldowns increases the load on both word and bit lines, which means we're limited in the speed we can achieve in pulling down the column. If $C_{PD,DRAIN} = 10$ fF and we have 128 rows:
$$t_{COLUMN} = \frac{C\,\Delta V}{I_{AV}} = \frac{(10\,\text{fF})(128)(2.5\,\text{V})}{30\,\mu\text{A}} \approx 107\,\text{ns}$$
Too slow! Which of these can we fix?
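The column time scales linearly with the voltage swing the bit line must make. A sketch of $t = C\,\Delta V / I_{AV}$ for the numbers above, plus the same column with a 0.25 V swing, an assumed value standing in for the "tenths of a volt" a sense amplifier needs:

```python
def t_column(c_per_cell, rows, delta_v, i_avg):
    """t = C * dV / I for a bit line loaded by one pulldown drain per row."""
    return c_per_cell * rows * delta_v / i_avg

full_swing = t_column(10e-15, 128, 2.5, 30e-6)
sensed = t_column(10e-15, 128, 0.25, 30e-6)   # assumed 0.25 V sensing swing
print(f"{full_swing*1e9:.0f} ns full swing, {sensed*1e9:.1f} ns with sensing")
```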

Sense Amplifiers
Let's speed things up by sensing small changes in the bit line voltage using a sense amplifier:
[Figure: columns under word lines R1 and R2 selected by a column (tree) decoder with select lines C0 and C1, feeding a SENSE AMP; tenths of a volt are amplified to a full rail-to-rail swing]

Single-ended Sense Amp

[Figure: single-ended sense amp: voltage reference (fets sized to produce VREF = 3 V) with M1 and M2; MD = 2 series fets in the column decoder; MC; output stage M3 and M4; the bit line pullup is built into the sense amp; the word line enables a pulldown when its row is selected; memory cell pulldowns are connected to the bit line]
Choose fet sizes so that M2, MD >> MC >> M1 and M3 >> M4.
When the bit line is not pulled down, V1 = VDD and V2 = VREF - Vth = 2 V, so M3 is off and M4 is on, and the output is pulled low. When a bit line pulldown is turned on, V2 starts to drop and M2 conducts well enough that V1 drops to V2, since MC >> M1. When V1 and V2 drop by 0.5 V to 1.5 V, M3 is strongly conducting and M4 is weakly conducting, so the output goes high. So a small ΔV on the bit line produces a large output swing.

SRAM Circuits
[Figure: 6-T SRAM cell (static bistable storage element with access fets on the word line) whose bit lines are pulled up by precharge or VDD; read path to rdata through a clocked cross-coupled differential sense amp (clk); write path from wdata]
- Tie bulk to source if possible.
- A long-channel fet is used as a current source.
- Use CLK if possible to reduce power and improve speed.

6-T SRAM Cell Layout

[Figure: 6-T SRAM cell layout: VDD and GND rails, inverter pullups and pulldowns, access fets, strapped word line, bit line and bit-bar line]
Pulldowns do the work when the access fet is turned on; pullups can be small to save space and to make the cell easy to write.
SRAM Read Cycle

[Figure: 6-T SRAM cell with bit-line pullups, and read waveforms (volts vs. time) for word, bit, bit-bar and data; annotations: "cell pullup has no real effect", "make this big", "keep away from inverter threshold"]
Choose W_PU, W_ACCESS and W_INV so that:
- there is fast bit line recovery when WORD goes low
- the selected cell doesn't flip on a read (V1 < V_TH,INV)
- there is a large ΔV on the BIT lines to speed up sensing
- the cell size is minimized
Differential Sense Amp

[Figure: differential sense amp with 4.8/0.6 input and load fets, inputs bit and bit-bar, internal nodes V1 and V2, output rdata, and a 0.9/7.2 long-channel fet biased by VCS used as a current source]

Fast Address Decoding

Logically, row/column decoders can be built from wide fan-in AND gates. But these are slow, place heavy loading on address wires and may be hard to fit into the pitch of the memory cell.
[Figure: row decoder built from wide fan-in AND gates on address bits A2, A1, A0]
One can use predecode logic to decode blocks of addresses, which are then further decoded using smaller AND gates. The address lines going to the predecode gates are less loaded and all gates have smaller fan-in, so decoding happens faster. The layout works better too!
[Figure: the same decoder using predecode logic on A2, A1, A0]
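Logically, predecoding replaces each 3-input AND with a 2-input AND of one line from each predecoded group. A sketch of a 3-to-8 decoder with A2 predecoded alone and A1A0 predecoded together (this grouping is an illustrative choice, not necessarily the one in the figure):

```python
def one_hot(bits):
    """Decode a tuple of address bits (MSB first) into 2**len(bits) one-hot lines."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return [int(i == value) for i in range(1 << len(bits))]

def decode_3bit(a2, a1, a0):
    """3-to-8 decoder: 1-bit and 2-bit predecoders followed by 2-input ANDs."""
    hi = one_hot((a2,))        # 2 predecoded lines
    lo = one_hot((a1, a0))     # 4 predecoded lines
    return [hi[i >> 2] & lo[i & 3] for i in range(8)]

print(decode_3bit(1, 0, 1))
```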

Multiport SRAM (Reg File)

One can increase the number of SRAM ports by adding access transistors. Writes are usually double-ended; single-ended reads can be used to save space.
[Figure: multiport SRAM cell with write, read0 and read1 word lines and data lines wd, rd0 and rd1]
An alternative design that can be easily expanded without worrying about unintentionally flipping the cell on reads is shown below.
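[Figure: expandable register-file cell with write port wd and separate read ports rd0 and rd1 driven by the word lines write, read0 and read1; device sizes (W/L) annotated as PU = 2/1, PD = 4/1, 4/1, 5/1, 2/1, 2/1 and PU = 2/2, PD = 2/3]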
Content-addressable RAM
By adding two transistors to the 6-T SRAM cell one can form an XOR gate to compare the cell contents to data on the bit lines. The output of this logic can drive a pulldown in a distributed NOR gate to form a word match signal for a content-addressable memory (CAM).
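[Figure: CAM cell: 6-T SRAM cell with word line, plus an XOR gate driving a pulldown on the match line]
- The XOR output node goes high if the data on the bit lines doesn't match the data in the cell.
- The match line is pulled down if any bit of the word doesn't match.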
Read and Write cycles: like before Match cycle: place data on bit lines but dont assert word line.
JMM/ESA v1.0
CAM Architecture
[Figure: CAM architecture, from Weste, Figure 8.76(b).]
The word match lines from the CAM array can be used as WORD lines in a companion RAM to read out other data associated with the tag stored in the CAM. Uses: fully-associative caches, translation lookaside buffers (TLBs), ...
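The match cycle and the companion-RAM lookup can be sketched in Python as follows. The tags and data are hypothetical; a precharged match line is modeled as 1, and any mismatching cell (XOR output 1) pulls it to 0. Returning the first hit when several words match is an assumption of this sketch; real designs keep the tags unique or add a priority encoder.

```python
def match_lines(cam_words, key):
    """Match cycle: the key is on the bit lines, the word lines are not asserted.

    Each match line starts precharged (1); any cell whose XOR output is 1
    (stored bit != key bit) turns on its pulldown and discharges the line.
    """
    lines = []
    for word in cam_words:
        pulled_down = any(stored ^ k for stored, k in zip(word, key))
        lines.append(0 if pulled_down else 1)
    return lines


def cam_lookup(cam_words, ram_words, key):
    """Use the match lines as WORD lines of a companion RAM (e.g. in a TLB)."""
    hits = [data for data, m in zip(ram_words, match_lines(cam_words, key)) if m]
    return hits[0] if hits else None  # first hit only: an assumption of this sketch


tags = [[1, 0, 1, 1], [0, 0, 1, 0], [1, 1, 1, 1]]  # hypothetical CAM contents
data = ["page 7", "page 3", "page 9"]              # hypothetical RAM contents
print(match_lines(tags, [0, 0, 1, 0]), cam_lookup(tags, data, [0, 0, 1, 0]))
```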

3-T Dynamic RAM

[Figure: 3-T DRAM cell with precharge, read and write lines, data lines wdata and rdata, and node capacitances CW, CC and CR.]
Precharge happens before each r/w cycle. READ/WRITE and PRECHARGE don't overlap.
Data is stored on CC. It's not destroyed on a read, but it will leak away through the write transistor. CW >> CC.
WRITE: After precharge, CW is charged high. When WRITE is asserted CW shares charge with CC and dominates since CW >> CC. If WDATA is asserted, both CW and CR will be discharged, writing a 0 into the cell; otherwise a 1 will be written.
READ: After precharge, CR is charged high. When READ is asserted, CR is pulled low if there's a stored 1 or remains unchanged if there's a stored 0. A sense amp is usually used to speed up the availability of read data.
Pros: little or no static power; smaller than SRAM. Cons: needs refresh; needs time to precharge.
1-T Dynamic RAM

Explicit storage capacitor (fet gate, trench, stack): C = 30 fF to 100 fF. Since
$$C = \frac{\varepsilon A}{d},$$
a higher C requires a better dielectric (larger ε), more area (larger A) or a thinner film (smaller d).
[Figure: 1-T DRAM cell (word line, access fet, bit line, storage capacitor to VREF) and stacked DRAM cell cross-section: W bottom electrode, Ta2O5 dielectric, TiN top electrode (VREF), poly word line, access fet.]
1-T DRAM Read Cycle

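[Figure: DRAM array with bit lines lbit and rbit (capacitance C, VDD), row selects R1, R2, ..., R129, R130, dummy cells of capacitance C/2 selected by DSL and DSR, precharge PC and column select CS.]
The read-out of a dummy cell lies halfway between the 0 and 1 values.
Timing (signals: lbit/rbit precharge PC, row select RN, dummy select DSL/DSR, column select CS):
1. precharge the bit lines, discharge the dummy cells;
2. read out the bit cell and, on the opposite bit line, a dummy cell;
3. amplify the difference and restore the bit cell.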
Coming Up...
Next time:
- Driving large loads: I/O circuits (edge rates, ESD protection, latch-up)
- Clock generation and distribution (skew)
Readings for next time: Weste 5.4.2, 5.5, 5.6
VLSI Design I
Defect Mechanisms and Fault Models
He's dead, Jim...
Overview: defects; fault models. Goal: You know the difference between design and fabrication defects. You know the sources of defects and you can estimate yield. You can handle fault models at different abstraction levels.
Design Defects
[Figure: design compared against its specification.]
It helps to have a specification to compare against! If the specification is written in a hardware description language from which the design is synthesized, then the design should be defect-free (modulo bugs in the synthesis software!). Of course the specification may be buggy...
Everyone feels better if the design/specification is run in the environment in which it will be used. For example, in testing a processor chip, one might boot the operating system and run some key programs, all under simulation. This leads to the need for lots of simulation cycles, e.g., as provided by a hardware emulation system. Nowadays these are built using a small army of FPGAs. Other choices: in-circuit emulation, cycle-based simulators.
Manufacturing Defects
Goal: verify every gate is operating as expected
- Defects from misalignment, dust and other particles, stacking faults, pinholes in dielectrics, mask scratches and dirt, thickness variations, layer-to-layer shorts, discontinuous wires (opens), circuit sensitivities (V_TH, L_CHANNEL). Find during wafer probe.
- Defects from scratching in handling, damage during bonding to the lead frame, manufacturing defects undetected during wafer probe (particularly speed-related problems). Find during testing of packaged parts.
- Defects from damage during board insertion (thermal, ESD), infant mortality (manufacturing defects that show up after a few hours of use). Also noise problems, susceptibility to latch-up... Find during testing/burn-in of boards.
- Defects that only appear after months or years of use (metal migration, oxide damage during manufacture, impurities). Found by the customer (oops!).
The cost of replacing a defective component increases by an order of magnitude with each stage of manufacture.
Production defects in CMOS circuits

- many complex processing steps are used to manufacture a chip -> defects
- defects and their effects depend on the circuit topology and the process
- knowledge of the chemical and physical mechanisms that lead to defects is essential
- circuit complexity and surface area determine testability and yield
- testability and yield are key factors for future VLSI technologies
VLSI fabrication process

- the fabrication process consists of a sequence of well-defined process steps
- 50 wafers form a batch
- each wafer contains hundreds or thousands of chips
- specific test chips are distributed on the wafers; they allow the process parameters to be monitored
- between a set of process steps, the test structures are measured
[Figure: process flow. Process steps, controlled by process control parameters and the layout (the chip's geometrical structures) and subject to disturbances from a changing environment, alternate with monitor steps that measure the test structures under defined conditions against tolerances; wafers within tolerance go on to further processing, the others are not processed further.]
VLSI fabrication process (cont)

Chip fabrication tests:
- process parameters: oxide thickness, distances between structures, etc.
- electrical parameters: currents, resistances, threshold voltages, ...
- chip test on the wafer; packaged chip test
[Figure: test flow. Wafer fabrication (controlling layout, disturbances) → measuring of process parameters on test chips → parameter and function test of chips on the wafer → bonding, packaging → parameter and function test of packaged chips.]
VLSI fabrication process (cont)

parameter test
test of electrical parameters: current consumption, quiescent currents, voltage levels, delay times, etc.
function test
test for logical faults: binary test sequences are applied to the device under test (DUT)

Defect classification
Defects occur at different stages:
- defects at wafer fabrication
- defects at chip packaging
- defects during chip lifetime
Defects at wafer fabrication

About 50% of all defects occur at wafer fabrication.
Causes:
- changes in the fabrication environment
- substrate inhomogeneities, mask misalignment
- dust particles, photolithography defects
Effects:
- local or global; the electrical effects depend on the layout topology
- changes in delay and current consumption, shorts, opens
Defects at chip packaging

reasons:
bonding problems mechanical stress
effect:
normally occur at primary inputs or outputs
easy to detect

Defects during lifetime

Time-dependent mechanisms lead to defects:
- early defects: high defect rate (burn-in)
- middle life phase: low defect rate
- wear-out defects: the defect rate climbs with time
[Figure: bathtub curve of defect rate vs. time with the early-defect, middle-life and wear-out phases.]
Yield modeling
- defects can produce faults
- yield is the percentage of fault-free chips; yield influences chip cost, so yield models are necessary to predict chip cost
- local defects produce most faults
Assumption: local defects are statistically independent, and each of n areas of the chip is faulty with probability p. According to Bernoulli, the number K of faulty areas is binomially distributed:
$$\Pr\{K=k\} = \Pr\{k \text{ of } n \text{ areas are faulty}\} = \binom{n}{k}\, p^{k}\, (1-p)^{n-k}$$
With n → ∞ and p → 0 (np = λ) we find the Poisson distribution
$$\Pr\{K=k\} = \frac{\lambda^{k}}{k!}\, e^{-\lambda}$$
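A quick numerical check of this limit, as a Python sketch (Python 3.8+ for `math.comb`). The mean λ = 0.5 defects per chip is a hypothetical value; the binomial probability of a fault-free chip approaches the Poisson value e^{-λ} as n grows with np = λ fixed.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Pr{K = k}: k of n independent areas are faulty, each with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Limit of binomial_pmf for n -> infinity, p -> 0 with n * p = lam."""
    return lam**k / factorial(k) * exp(-lam)


lam = 0.5  # hypothetical mean number of defects per chip
for n in (10, 100, 10_000):
    p = lam / n
    print(n, binomial_pmf(0, n, p), poisson_pmf(0, lam))
```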

Yield modeling (cont)

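Expectation value:
$$E\{K\} = \sum_{k=0}^{\infty} k\,\frac{\lambda^{k}}{k!}\,e^{-\lambda} = \lambda$$
With λ = DA (defect density D times chip area A), the probability that a chip is fault-free is
$$\Pr\{K=0\} = e^{-DA}$$
Murphy: with a normalized density function f(D) of the defect density,
$$Y = \int_{0}^{\infty} e^{-AD} f(D)\,dD$$
Calculation of the yield with Murphy's density functions f1, f2, f3: Y1, Y2, Y3?
[Figure: f(D) vs. D. f1 is concentrated at D0; f2 is a triangle on [0, 2D0] with peak 1/D0 at D0; f3 is a rectangle of height 1/(2D0) on [0, 2D0].]
For the triangular density f2 (for high yield):
$$Y_2 = \left(\frac{1 - e^{-AD_0}}{AD_0}\right)^{2}$$
Seeds' yield model (for low yield):
$$Y = e^{-\sqrt{AD_0}}$$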
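The yield formulas can be evaluated directly; this Python sketch assumes the area is given in cm² and D0 in defects per cm². With the values of Ex vlsi19.1, Murphy's (high-yield) formula for the 5 mm² chip and Seeds' (low-yield) formula for the 1 cm² chip give 0.91 and 0.24. Note that `murphy_yield` divides by A·D0 and therefore needs a non-zero product.

```python
import math


def poisson_yield(area_cm2, d0_per_cm2):
    """Y = exp(-A*D0): every chip sees the same defect density D0."""
    return math.exp(-area_cm2 * d0_per_cm2)


def murphy_yield(area_cm2, d0_per_cm2):
    """Murphy's triangular density f2: Y2 = ((1 - exp(-A*D0)) / (A*D0))**2."""
    ad = area_cm2 * d0_per_cm2
    return ((1 - math.exp(-ad)) / ad) ** 2


def seeds_yield(area_cm2, d0_per_cm2):
    """Seeds' model for low yield: Y = exp(-sqrt(A*D0))."""
    return math.exp(-math.sqrt(area_cm2 * d0_per_cm2))


D0 = 2.0  # defects per cm^2, as in Ex vlsi19.1
for area in (0.05, 1.0):  # 5 mm^2 = 0.05 cm^2, and 1 cm^2
    print(area, round(poisson_yield(area, D0), 2),
          round(murphy_yield(area, D0), 2), round(seeds_yield(area, D0), 2))
```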
Yield modeling (cont)

The bigger the circuit, the higher the probability of a faulty chip. Example: two wafers with the same 17 defects:
- wafer with 44 chips in total: yield 61%
- wafer with 316 chips in total: yield 95%
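These numbers follow if each of the 17 defects hits a different chip. This Python sketch computes that case and, for comparison, the expected yield when every defect independently lands on a uniformly random chip (an assumption not made on the slide), which is slightly higher because some chips collect several defects.

```python
def yield_one_defect_per_chip(chips, defects):
    """Yield if every defect kills a different chip (worst case)."""
    return (chips - defects) / chips


def expected_yield_random(chips, defects):
    """Expected yield if each defect independently hits a uniformly random chip."""
    return (1 - 1 / chips) ** defects


for chips in (44, 316):
    print(chips, round(yield_one_defect_per_chip(chips, 17), 2),
          round(expected_yield_random(chips, 17), 2))
```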

VLSI fabrication process: conclusion

- defects occur during wafer fabrication, chip packaging and the chip lifetime
- there are local and global defects
- local defects dominate in a mature process
- local defects are hard to find and costly
Fault models for integrated circuits

- complex circuits need more test time
- test time on expensive equipment leads to a high test cost per chip
- to reduce test time, fault models for structured test approaches are required
- if a system does not behave as expected, faults are present
- faults can be modeled at different electrical levels
- faults can be caused by defects; they occur during fabrication or lifetime
- design errors produce design faults, for example a faulty logic implementation of functions; design validation is necessary
Fault models: Testing approaches

Plan: supply a set of test vectors that specify an input or output value for every pin on every cycle. The tester will load the program into the pin cards, run it and report any discrepancies between an observed output value and the expected value.
Example program for 11 pins:

cycle #   pins
0000      1 10 0000 XXXX
0001      1 10 0000 LLLL
0002      1 01 1111 LLLL
0003      1 00 1011 HLHL

Input to chip = {0, 1}; output from chip = {L, H}; tri-state/no compare = {X}.
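How a tester might interpret such a program, as a hypothetical Python helper (not any real tester API): input pins ('0'/'1') are driven and never compared, 'L'/'H' are compared against the strobed value, and 'X' is ignored. The pin order follows the example program, concatenated into one string per cycle.

```python
def failing_pins(vector, observed):
    """Compare one tester cycle.

    vector:   one character per pin: '0'/'1' = drive input, 'L'/'H' = expected
              output, 'X' = tri-state / no compare.
    observed: one character per pin, '0' or '1', as sampled at the strobe.
    Returns the indices of the pins whose value differs from the expectation.
    """
    expected = {"L": "0", "H": "1"}
    return [i for i, (v, o) in enumerate(zip(vector, observed))
            if v in expected and expected[v] != o]


program = ["1100000XXXX", "1100000LLLL", "1011111LLLL", "1001011HLHL"]
print(failing_pins(program[3], "10010111010"))  # outputs HLHL observed as 1010
print(failing_pins(program[3], "10010111110"))  # second-to-last output stuck high
```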
How many vectors do we need?

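[Figure: left, combinational logic with n inputs; right, combinational logic with n inputs and m state bits fed back through registers.]
- Combinational logic with n inputs: 2^n input vectors are required to exhaustively test the circuit.
- With m state bits: 2^(n+m) input vectors are required to exhaustively test the circuit. If n = 50, m = 25 and each test takes 1 ns, the test time exceeds 10^6 years.
Exhaustive testing is not only impractical, it's not necessary! Instead we only need to verify that no faults are present, which may take many fewer vectors.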
Fault models: abstraction level

Circuits are treated at different abstraction levels:
- analog or memory circuits are treated at the transistor level
- medium-size digital circuits are treated at the logic level
- complex digital circuits or microprocessors are normally treated at the functional level
Example of fault manifestation, missing polysilicon material:
- layout level: e.g. missing polysilicon
- electrical level: e.g. open interconnection
- transistor level: e.g. permanently short-circuited transistor (if the polysilicon gate is missing)
- logic level: e.g. permanent logic level "1"
- functional level: e.g. register not resettable
- ...
Fault models (cont)

Fault dependencies:
- faults are layout dependent
- faults are technology dependent
Goals of fault models:
- fault models should be realistic and thus depend on physical defect mechanisms
- fault models should be simple and tractable
Hard to detect faults

Transient (intermittent) faults:
- occur only from time to time due to changes in the environment
- there is no satisfactory strategy to search for them
Approaches:
- repeated search
- built-in test: self-checking circuits, error-correcting circuits
- redundant use of several identical circuit blocks
Benefits of redundant circuits:
- redundancy for higher functional safety
- redundancy to eliminate hazards
Disadvantages of redundant circuits:
- faults are not detectable (masking effect)
Logic level fault models

Historical perspective:
- in 1959 Eldred proposed methods to test computers built from relays, diodes and tubes, which behaved like switches
- this stimulated the development of fault models at the logic level
Stuck-at fault model:
- a signal can be stuck at "0" or "1"
- independent of process technology; does not model technology-dependent characteristics
- a mathematical calculus exists
- very useful for TTL technology (or other old "current" technologies), but not for "charge" technologies like CMOS
Logic level fault models (cont)

The traditional model, first developed for board-level tests, assumes that a node gets stuck at 0 or 1, presumably by shorting to GND or VDD.
Stuck at 0 = S-A-0 = node@0; stuck at 1 = S-A-1 = node@1.
[Figure: 4-input AND gate with inputs A, B, C, D and output Z; the fault site X is on input B.]
Z = ABCD, Z_B@1 = ACD, Z_B@0 = 0
Example of a TTL NAND gate with many defects that can be described with the stuck-at fault model:
[Figure: TTL NAND gate schematic with resistors R1 to R4, transistors T1 to T4, inputs I1, I2 and output O.]
Fault reduction
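Fault collapsing:
- fault equivalence, fault dominance
- single faults, multiple faults
Fault detection: fault-free function $f(\mathbf{x})$, function with fault α: $f^{\alpha}(\mathbf{x})$. A test vector x detects fault α if the condition
$$f(\mathbf{x}) \oplus f^{\alpha}(\mathbf{x}) = 1$$
is fulfilled.
Fault equivalence: faults α and β are equivalent if $f^{\alpha}(\mathbf{x}) = f^{\beta}(\mathbf{x})$ for all x.
Fault dominance: fault α dominates fault β if $T_{\beta} \subseteq T_{\alpha}$.
Example, 2-input OR gate C = A + B:

A B | C
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 1

Fault classes: A/1 ⇔ B/1 ⇔ C/1; C/0 dominates A/0 and B/0.
Notation: A/1 = A stuck-at-1, ⇔ = equivalence.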

Logic level fault models

Fault dominance: $T_{\alpha}$ represents the set of test vectors that detect fault α.
- fault α dominates fault β under the condition $T_{\beta} \subseteq T_{\alpha}$
- for test generation, only tests for fault β are necessary
- multiple faults: fault masking problems
Transistor level fault models

- introduced because of the imperfections of logic-level fault models, especially for CMOS
- technology dependent and thus more realistic
- more complex to handle and thus not useful for large circuits
Transistor-level fault models:
- Wadsack's model
- Hayes' switch-level model
- Reddy's robust test sets (restrictions due to static discharge)
Transistor level fault models (cont)

Wadsack's fault models for CMOS:
- defects can lead to memory effects: faulty combinational logic may behave like sequential logic
- this effect was modeled by introducing flip-flops in order to use stuck-at models: "stuck-at syndrome"!
[Figure: CMOS NOR gate with inputs A, B and output Y, comparing the fault-free gate with stuck-at faults (A/0, B/0, Y/0) and stuck-open faults (asop, bsop, vddsop).]
Fault-free truth table:

A B | Y
0 0 | 1
0 1 | 0
1 0 | 0
1 1 | 0
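The memory effect can be reproduced with a switch-level Python sketch of a 2-input CMOS NOR gate. The mapping of asop, bsop and vddsop onto transistors (the nMOS pulldowns driven by A and B, and the pull-up path to VDD) is an assumption, since the figure did not survive. With asop, the vector A B = 1 0 leaves the output floating, so it keeps whatever the previous vector left there; a two-pattern test (0 0 followed by 1 0) exposes the fault, while no single vector can.

```python
def nor_step(a, b, prev, fault=None):
    """One evaluation of a switch-level CMOS NOR gate Y = NOT (A OR B).

    Assumed fault names (hypothetical mapping of the slide's labels):
      'asop'   - nMOS pulldown driven by A is stuck open
      'bsop'   - nMOS pulldown driven by B is stuck open
      'vddsop' - the series pMOS pull-up path to VDD is open
    If neither network conducts, the output node floats and keeps its
    previous charge `prev` (None = unknown): the memory effect.
    """
    pull_up = a == 0 and b == 0 and fault != "vddsop"
    pull_down = (a == 1 and fault != "asop") or (b == 1 and fault != "bsop")
    if pull_up:
        return 1
    if pull_down:
        return 0
    return prev


def run(vectors, fault=None):
    """Apply a sequence of (A, B) vectors and return the output after each one."""
    y, trace = None, []
    for a, b in vectors:
        y = nor_step(a, b, y, fault)
        trace.append(y)
    return trace


print(run([(0, 0), (1, 0)]))          # fault-free
print(run([(0, 0), (1, 0)], "asop"))  # output stays at the old value
```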

Functional level fault models

VLSI circuits need simple fault models.
Goal of the test: it is sometimes sufficient to know whether a sub-function works correctly.
Model of functional faults of a sub-circuit: each sub-function has its own process-dependent faults.
Advantages:
- fast simulation, short test time
- process dependent
- good knowledge about important sub-functions (e.g. RAMs)
Disadvantages:
- less accurate
- not useful for all sub-functions
Functional level fault models: example

Example of a CMOS multiplexer with n inputs; behavior under faults:
- another input is selected
- one of the n inputs has a stuck-at fault
- two inputs are selected (AND or OR result at the output)
- if the complementary value arrives at a selected input, an error occurs at the output
- if the complementary value of the selected input arrives at a neighbor of the selected input, an error occurs at the output
[Figure: 8-to-1 MUX with select inputs S0, S1, S2 and data inputs A0 to A7.]
Fault models summary

- fault models are used to model the effects of fabrication defects at abstract levels
- fault models allow a direct search for circuit defects
- fault models need to be simple and precise
- CMOS defects are poorly modeled by the stuck-at fault model
Coming Up...
Next topic: Test pattern generation and fault simulation
Readings for next time: Weste, 7 through 7.2.1
VLSI Exercises: vlsi-19 #1

Ex vlsi19.1 (difficulty: easy): Calculate the yield of a circuit of area 5 mm² and 1 cm² if the defect density D is 2 defects per cm². Result: Y(5 mm²) = 0.91 (high-yield equation), Y(1 cm²) = 0.24 (low-yield equation), see vlsi-19/13.
Ex vlsi19.2 (difficulty: easy): Discuss the circuit's function with the introduction of the stuck-open fault F_X = open. Can this fault be modeled by a stuck-at fault?
[Figure: CMOS circuit with inputs A, B, C, D implementing F = (A+C)(B+D), fault location X.]
F(X = open) = __________
VLSI Exercises: vlsi-19 #2
Ex vlsi19.3 (difficulty: easy): Result: Y(5 mm²) = 0.91 (high-yield equation), Y(1 cm²) = 0.24 (low-yield equation), see vlsi-19/13.
Ex vlsi19.4 (difficulty: easy): Discuss faults due to defects in the TTL NAND gate on transparency 22. What kind of stuck-at fault do you have if a) R1 is an open circuit, b) there is an open at I1, c) there is an open in R2? Result: a) O s-a-1, b) I1 s-a-1, c) O s-
VLSI Design I
Test Pattern Generation and Fault Simulation
Let's test a chip?
Overview: test pattern generation; fault simulation. Goal: You know design-for-testability terms like controllability and observability. You are familiar with test pattern generation algorithms as well as with testability measures.
Testers
The device under test (DUT) can be a site on a wafer or a packaged part.
[Figure: tester with 100s of channels of pin circuitry connected to the DUT.]
Each pin on the chip is driven/observed by a separate set of circuitry, which typically can drive the pin to one data value per cycle or observe (strobe) the value of the pin at a particular point in a clock cycle. The timing of input transitions and the sampling of outputs are controlled by a small (<< number of pins) number of high-resolution timing generators. To increase the number of possible input patterns, different data formats are provided (waveforms within one tester cycle tCYCLE):
- non-return-to-zero (NRZ): data for the whole cycle
- return-to-zero (RTZ): a data pulse, 0 otherwise
- return-to-one (RTO): a data pulse, 1 otherwise
- surround-by-complement (SBC): ~data, data, ~data
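The four data formats can be written as sampled waveforms. In this Python sketch each tester cycle is sampled three times (before, during and after the active window set by a timing generator); the three-sample resolution and the function name are illustrative assumptions.

```python
def pin_waveform(fmt, data_bits):
    """Drive level per sample, three samples per tester cycle: (before, during, after)
    the active window of the timing generator."""
    shapes = {
        "NRZ": lambda d: (d, d, d),          # hold the data for the whole cycle
        "RTZ": lambda d: (0, d, 0),          # data pulse, otherwise 0
        "RTO": lambda d: (1, d, 1),          # data pulse, otherwise 1
        "SBC": lambda d: (1 - d, d, 1 - d),  # data surrounded by its complement
    }
    shape = shapes[fmt]
    return [level for d in data_bits for level in shape(d)]


for fmt in ("NRZ", "RTZ", "RTO", "SBC"):
    print(fmt, pin_waveform(fmt, [1, 1, 0]))
```

RTZ, RTO and SBC guarantee edges in every cycle even when the data does not change, which NRZ does not.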
Test pattern generation

- test generation is a time-consuming task
- computer-aided test programs (CAT) help the designer but do not solve test problems
- approaches to manage the test problem with increasing circuit complexity (research fields): design for testability; algorithms to generate good test vectors
- design for testability: controllability, observability; the system designer needs DFT knowledge
- ad-hoc approaches to augment controllability: partitioning, more test pads
- structured methods: scan path, multiplexer approach, built-in logic block observation (BILBO), boundary scan, signature analysis, etc.
Algorithms for test pattern generation

Basic concepts for test generation for stuck-at fault models in combinational circuits:
- algebraic test generation: Boolean difference
- D-algorithm
- PODEM and FAN algorithms
- controllability and observability measuring
Boolean difference
Algebraic method: Boolean difference. Circuit function with input vector x:
$$f(\mathbf{x}) = f(x_1, \ldots, x_n)$$
For the i-th component of x held at a fixed value we define
$$f_i(1) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n), \qquad f_i(0) = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n)$$
Definition of the Boolean difference:
$$\frac{\partial f(\mathbf{x})}{\partial x_i} = f(x_1, \ldots, x_i, \ldots, x_n) \oplus f(x_1, \ldots, \overline{x}_i, \ldots, x_n) = f_i(0) \oplus f_i(1)$$
Circuit with fault α, stuck-at-1 at input $x_i$:
$$f^{\alpha}(\mathbf{x}) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) = f_i(1)$$
To detect s-a-1 faults, the two functions $f(\mathbf{x})$ and $f_i(1)$ must produce different results, so the test vector set is defined by T = 1 with
$$T = f(\mathbf{x}) \oplus f^{\alpha}(\mathbf{x}) = \overline{x}_i \cdot \frac{\partial f(\mathbf{x})}{\partial x_i}$$
For s-a-0 faults:
$$T = f(\mathbf{x}) \oplus f^{\alpha}(\mathbf{x}) = x_i \cdot \frac{\partial f(\mathbf{x})}{\partial x_i}$$
Boolean difference: Rules

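$$\frac{\partial \overline{f}(\mathbf{x})}{\partial x_i} = \frac{\partial f(\mathbf{x})}{\partial x_i} \qquad \frac{\partial}{\partial x_i}\left(\frac{\partial f(\mathbf{x})}{\partial x_j}\right) = \frac{\partial}{\partial x_j}\left(\frac{\partial f(\mathbf{x})}{\partial x_i}\right) \qquad \frac{\partial f(\mathbf{x})}{\partial \overline{x}_i} = \frac{\partial f(\mathbf{x})}{\partial x_i}$$
$$\frac{\partial [f(\mathbf{x})\,g(\mathbf{x})]}{\partial x_i} = f(\mathbf{x})\,\frac{\partial g(\mathbf{x})}{\partial x_i} \oplus g(\mathbf{x})\,\frac{\partial f(\mathbf{x})}{\partial x_i} \oplus \frac{\partial f(\mathbf{x})}{\partial x_i}\,\frac{\partial g(\mathbf{x})}{\partial x_i}$$
$$\frac{\partial [f(\mathbf{x}) + g(\mathbf{x})]}{\partial x_i} = \overline{f}(\mathbf{x})\,\frac{\partial g(\mathbf{x})}{\partial x_i} \oplus \overline{g}(\mathbf{x})\,\frac{\partial f(\mathbf{x})}{\partial x_i} \oplus \frac{\partial f(\mathbf{x})}{\partial x_i}\,\frac{\partial g(\mathbf{x})}{\partial x_i}$$
If g(x) is independent of $x_i$:
$$\frac{\partial [f(\mathbf{x})\,g(\mathbf{x})]}{\partial x_i} = g(\mathbf{x})\,\frac{\partial f(\mathbf{x})}{\partial x_i} \qquad \frac{\partial [f(\mathbf{x}) + g(\mathbf{x})]}{\partial x_i} = \overline{g}(\mathbf{x})\,\frac{\partial f(\mathbf{x})}{\partial x_i}$$
$$\frac{\partial [f(\mathbf{x}) + g(\mathbf{x})]}{\partial x_i} = \frac{\partial [\overline{f}(\mathbf{x})\,\overline{g}(\mathbf{x})]}{\partial x_i} \qquad \frac{\partial [f(\mathbf{x}) \oplus g(\mathbf{x})]}{\partial x_i} = \frac{\partial f(\mathbf{x})}{\partial x_i} \oplus \frac{\partial g(\mathbf{x})}{\partial x_i}$$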
Boolean difference: example

Example Ex 20.1 (medium): circuit with a stuck-at-1 fault at x3. Find all test patterns that detect the fault by means of the Boolean difference.
[Figure: gate network with inputs x1 to x4, internal signals s1, s2, s3 and output y; gates G1 to G5, with G1 an OR gate and G3, G4 AND gates.]
Test generation: D-algorithm

Basics:
- fault sensitization (provoke the error)
- fault propagation
- line justification
D-notation:
- D: the signal is 1 in the fault-free circuit and 0 in the faulty circuit
- $\overline{D}$: the signal is 0 in the fault-free circuit and 1 in the faulty circuit
Very formal table-manipulation procedure.
Advantages:
- if a test vector exists, it will be found
- programmable for computers
Disadvantages:
- conflicts lead to time-consuming dummy calculations
- not usable for large circuits
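One common way to implement the D-notation is to carry the fault-free and the faulty value of every signal as a pair and evaluate each half with three-valued logic (0, 1, X). This Python sketch covers only that signal algebra, not the D-cube tables or the backtracking of the full D-algorithm; f = x1·x2 + x3 is a hypothetical circuit. A D at x1 (x1 s-a-0 sensitized) reaches the output when x2 = 1 and x3 = 0, is blocked by x2 = 0 and is masked by x3 = 1.

```python
# Signals are (good, faulty) pairs; None stands for the unknown value X.
ZERO, ONE, D, DBAR, X = (0, 0), (1, 1), (1, 0), (0, 1), (None, None)
_NAMES = {ZERO: "0", ONE: "1", D: "D", DBAR: "D'"}


def and3(a, b):
    """Three-valued AND on 0, 1 and None (= X)."""
    if a == 0 or b == 0:
        return 0
    if a is None or b is None:
        return None
    return 1


def or3(a, b):
    """Three-valued OR on 0, 1 and None (= X)."""
    if a == 1 or b == 1:
        return 1
    if a is None or b is None:
        return None
    return 0


def d_and(p, q):
    return (and3(p[0], q[0]), and3(p[1], q[1]))


def d_or(p, q):
    return (or3(p[0], q[0]), or3(p[1], q[1]))


def d_not(p):
    return tuple(None if v is None else 1 - v for v in p)


def name(v):
    """Show a (good, faulty) pair in D-notation; anything partly unknown is X."""
    return _NAMES.get(v, "X")


def f(x1, x2, x3):
    """Hypothetical circuit f = x1*x2 + x3."""
    return d_or(d_and(x1, x2), x3)


# x1 s-a-0 sensitized: good value 1, faulty value 0, i.e. x1 = D.
print(name(f(D, ONE, ZERO)))   # propagated to the output
print(name(f(D, ZERO, ZERO)))  # blocked by x2 = 0
print(name(f(D, ONE, ONE)))    # masked by x3 = 1
```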
Test generation: Path sensitisation

Example Ex20.2 (easy): circuit with a stuck-at-1 fault at x3. Find test vectors by means of the D-algorithm.
Sensitization, fault propagation, line justification.
Step 1: Sensitize the circuit. Find input values that produce a value on the faulty node that's different from the value forced by the fault. For our S-A-1 fault above, we want the output of the AND gate to be 0.
Is this always possible? What would it mean if no such input values exist? Is the set of sensitizing input values unique? If not, which should one choose? What's left to do?
[Figure: the Ex 20.1 gate network with the S-A-1 fault site marked.]
Test generation: Fault propagation

Step 2: Fault propagation. Select a path that propagates the faulty value to an observed output (y in our example).
[Figure: the Ex 20.1 gate network with the S-A-1 fault site marked.]
Test generation: Line justification

Step 3: Line justification. Find a set of input values that enables the selected path (backtracking).
Is this always possible? What would it mean if no such input values exist? Is the set of enabling input values unique? If not, which should one choose?
[Figure: the Ex 20.1 gate network with the S-A-1 fault site marked.]
Test generation: PODEM, FAN

More recent algorithms like PODEM, FAN or others basically intend to prevent conflict situations or to detect them as early as possible.
Concept:
- take decisions as late as possible (this prevents wrong decisions; perhaps later there is nothing left to decide)
- heuristics help to take decisions that succeed with a higher probability
- controllability and observability measuring is necessary
PODEM
The PODEM algorithm is simpler to understand than the D-algorithm. A backtrack (branch-and-bound) algorithm is used:
- small steps to reach the objective
- go back if the objective leads to a dead end
Backtrack (branch-and-bound) in PODEM:
- all signals are initialized to "X"
- fault sensitization
- during fault propagation, D symbols are propagated only one step towards the primary outputs
- (branch) immediate line justification of the selected signal back to the primary inputs (a new input with a value corresponds to a branch of the decision tree)
- (bound) the succeeding fault simulation immediately detects conflict situations → new branch
PODEM: Example
Branch-and-bound tree: nodes represent decisions, branches represent primary inputs (PIs); a check mark represents the 1st decision. Example: x1 stuck-at-1.
[Figure: decision tree from start with the decisions x1 = 0 and x2 = 1, next to the Ex 20.1 gate network with the faulty input x1.]
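A stripped-down branch-and-bound in the spirit of PODEM, as a Python sketch: decisions are made only at primary inputs, each decision is followed by three-valued simulation of the good and the faulty circuit, and a branch is abandoned as soon as the fault can no longer be activated or the output is known in both circuits without differing. The objective/backtrace heuristics of real PODEM are omitted (inputs are simply tried in order, 0 before 1). The netlist is an assumed reading of the Ex vlsi20.4 figure (s1 = x1 + x2, s2 = s1·x3, s3 = x3·x4, y = s2 + s3), not necessarily the circuit drawn above.

```python
# Assumed reading of the Ex vlsi20.4 figure (hypothetical netlist, topological order).
NETLIST = [
    ("s1", "OR", ["x1", "x2"]),
    ("s2", "AND", ["s1", "x3"]),
    ("s3", "AND", ["x3", "x4"]),
    ("y", "OR", ["s2", "s3"]),
]
INPUTS = ["x1", "x2", "x3", "x4"]


def gate(op, vals):
    """Three-valued gate evaluation; None stands for the unknown value X."""
    if op == "AND":
        if 0 in vals:
            return 0
        return None if None in vals else 1
    if op == "OR":
        if 1 in vals:
            return 1
        return None if None in vals else 0
    raise ValueError(op)


def simulate(netlist, assignment, fault=None):
    """Evaluate all nets; `fault` = (net, stuck value) is injected if given."""
    vals = dict(assignment)
    if fault and fault[0] in vals:
        vals[fault[0]] = fault[1]
    for out, op, ins in netlist:
        vals[out] = gate(op, [vals[i] for i in ins])
        if fault and out == fault[0]:
            vals[out] = fault[1]
    return vals


def podem(netlist, inputs, output, fault, assignment=None):
    """Branch on primary inputs only; bound as soon as simulation shows a conflict."""
    if assignment is None:
        assignment = dict.fromkeys(inputs)  # all primary inputs start at X
    good = simulate(netlist, assignment)
    bad = simulate(netlist, assignment, fault)
    if good[output] is not None and bad[output] is not None:
        return dict(assignment) if good[output] != bad[output] else None
    if good[fault[0]] == fault[1]:
        return None  # fault site forced to the stuck value: cannot be activated
    pi = next(p for p in inputs if assignment[p] is None)
    for value in (0, 1):
        assignment[pi] = value
        found = podem(netlist, inputs, output, fault, assignment)
        if found:
            return found
    assignment[pi] = None  # undo the decision before backtracking further
    return None


print(podem(NETLIST, INPUTS, "y", ("x3", 1)))
print(podem(NETLIST, INPUTS, "y", ("x1", 1)))
```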

Test pattern generation: Heuristics

Heuristics in the FAN algorithm:
- fault propagation: propagate to a primary output on the path that is best observable
- line justification: start with the path that is most difficult to control
Heuristics help to find test vectors faster.
Controllability and observability measure

- heuristics are often used to solve NP-complete problems; they do not guarantee to find a solution in a given time
- testability measure methods: temas, testscreen, victor, camelot, scoap
- Sandia Controllability/Observability Analysis Program (SCOAP): each node in a circuit gets values for its controllability, observability and testability; high values indicate nodes that are hard to control or to observe; it distinguishes between "1" and "0" controllability, and between combinational and sequential values
Observability & Controllability

When propagating faulty values to observed outputs, we are often faced with several choices for which gate should be next in our path.
[Figure: fault propagation with two candidate gates marked "?".]
We'd like to have a way to measure the observability of a node, i.e., some indication of how hard it is to observe the node at the outputs of the chip. During fault propagation we could choose the gate whose output was easiest to observe. Similarly, during backtracking we need a way to choose between alternative ways of forcing a particular value:
[Figure: a gate whose output should be 0 ("want 0 here"): which input should we try to set to 0?]
In this case, we'd like to have a way to measure the controllability of a node, i.e., some indication of how easy it is to force the node to 0 or 1. During backtracking we could choose the input that was easiest to control.
Testability measurement: Scoap algorithm

Combinational "1" and "0" controllability of a logic gate output y depending on its inputs x1..x3:
OR gate:
$$CC^0(y) = CC^0(x_1) + CC^0(x_2) + CC^0(x_3) + 1, \qquad CC^1(y) = \min\{CC^1(x_1), CC^1(x_2), CC^1(x_3)\} + 1$$
AND gate:
$$CC^0(y) = \min\{CC^0(x_1), CC^0(x_2), CC^0(x_3)\} + 1, \qquad CC^1(y) = CC^1(x_1) + CC^1(x_2) + CC^1(x_3) + 1$$
Combinational observability of gate input x1, depending on the output y and the other inputs x2, x3:
OR gate:
$$CO(x_1) = CO(y) + CC^0(x_2) + CC^0(x_3) + 1$$
AND gate:
$$CO(x_1) = CO(y) + CC^1(x_2) + CC^1(x_3) + 1$$
Initialization (N are internal nodes, X are primary inputs, Y are primary outputs):
$$CC^0(X) = CC^1(X) = 1, \quad CC^0(N) = CC^1(N) = \infty, \quad CO(Y) = 0, \quad CO(N) = \infty$$
Testability measurement: Scoap algorithm (cont)

hmmm. I guess smaller numbers are better...
The testability measure assumes that the further a node is from an input/output, the harder it is to set/observe.
AND gate Z = A·B:
CC0(Z) = min[CC0(A), CC0(B)] + 1
CC1(Z) = CC1(A) + CC1(B) + 1
CO(A) = CO(Z) + CC1(B) + 1
CO(B) = CO(Z) + CC1(A) + 1
OR gate Z = A + B:
CC0(Z) = CC0(A) + CC0(B) + 1
CC1(Z) = min[CC1(A), CC1(B)] + 1
CO(A) = CO(Z) + CC0(B) + 1
CO(B) = CO(Z) + CC0(A) + 1
If a node has more than one CO value (fanout), choose the minimum.
[Figure: example circuit annotated with (CC0, CC1, CO) triples; primary inputs start at (1, 1, -), the output at (-, -, 0).]
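The SCOAP rules can be applied mechanically: one forward pass in topological order for CC0/CC1, and one backward pass for CO that takes the minimum over fanout branches. This Python sketch handles only AND/OR gates and uses an assumed reading of the Ex vlsi20.4 figure (s1 = x1 + x2, s2 = s1·x3, s3 = x3·x4, y = s2 + s3). It reproduces the listed results for x1 to x4 and s1 to s3; for y it gives (5, 4, 0) instead of the listed (4, 5, 0), so either the figure differs from this reading or the two controllability values of y are swapped in the result.

```python
import math

# Assumed reading of the Ex vlsi20.4 figure (hypothetical netlist, topological order).
NETLIST = [
    ("s1", "OR", ["x1", "x2"]),
    ("s2", "AND", ["s1", "x3"]),
    ("s3", "AND", ["x3", "x4"]),
    ("y", "OR", ["s2", "s3"]),
]
INPUTS = ["x1", "x2", "x3", "x4"]
OUTPUTS = ["y"]


def scoap(netlist, inputs, outputs):
    """Return {node: (CC0, CC1, CO)} for a netlist of 2+-input AND/OR gates."""
    cc0 = {x: 1 for x in inputs}
    cc1 = {x: 1 for x in inputs}
    # Controllability: forward pass in topological order.
    for out, op, ins in netlist:
        if op == "AND":
            cc0[out] = min(cc0[i] for i in ins) + 1
            cc1[out] = sum(cc1[i] for i in ins) + 1
        elif op == "OR":
            cc0[out] = sum(cc0[i] for i in ins) + 1
            cc1[out] = min(cc1[i] for i in ins) + 1
        else:
            raise ValueError(op)
    # Observability: backward pass; a fanout stem keeps the minimum over its branches.
    co = {n: math.inf for n in cc0}
    for y in outputs:
        co[y] = 0
    for out, op, ins in reversed(netlist):
        side = cc1 if op == "AND" else cc0  # the other inputs must be non-controlling
        for i in ins:
            others = sum(side[j] for j in ins if j != i)
            co[i] = min(co[i], co[out] + others + 1)
    return {n: (cc0[n], cc1[n], co[n]) for n in cc0}


for node, values in scoap(NETLIST, INPUTS, OUTPUTS).items():
    print(node, values)
```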
Fault simulation
Goals of fault simulation:
- analyze the circuit under fault conditions
- qualify a test sequence (fault coverage)
- reduce the fault set during test generation
Good-quality fault models are necessary.
Fault simulation methods:
- parallel fault simulation
- concurrent fault simulation
- deductive fault simulation
Alternative to fault simulation in test generation procedures: tracing fault-sensitive paths.
Fault simulation (cont)

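Principle of parallel fault simulation: computing with 1-bit or n-bit wide variables needs similar computing time, so n−1 faults can be simulated at the same time (one bit position holds the fault-free circuit).
Example: OR gate C = A + B with inputs A = B = 0.
Bit positions: 1 = fault-free, 2 = A s-a-1, 3 = B s-a-1, 4 = C s-a-1, 5 = C s-a-0
A = [00000], B = [00000]
Masks: MA = [01000], MB = [00100], MC = [00011]
Fault values: [01000], [00100], [00010]
A' = [01000] (A with mask MA), B' = [00100] (B with mask MB)
C = A' + B' = [01100], C' = [01110] (C with mask MC)
Bits 2, 3 and 4 of C' differ from bit 1, so A = B = 0 detects A s-a-1, B s-a-1 and C s-a-1, but not C s-a-0 (bit 5).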
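The bit vectors of this example can be reproduced with Python integers used as 5-bit words, one gate evaluation covering all five machines at once. The injection rule X' = (X AND NOT M_X) OR (M_X AND F_X) is an assumption about how masks and fault values are combined, chosen because it reproduces A', B' and C' exactly.

```python
WIDTH = 5
LABELS = ["fault-free", "A s-a-1", "B s-a-1", "C s-a-1", "C s-a-0"]  # bit positions 1..5


def word(bits):
    """'01000' -> int; the leftmost character is bit position 1."""
    return int(bits, 2)


def inject(value, mask, fault_values):
    """X' = (X AND NOT M_X) OR (M_X AND F_X): force masked positions to their stuck values."""
    return (value & ~mask) | (mask & fault_values)


a = inject(word("00000"), word("01000"), word("01000"))  # A = 0 in every machine
b = inject(word("00000"), word("00100"), word("00100"))  # B = 0 in every machine
c = inject(a | b, word("00011"), word("00010"))          # one OR evaluates all 5 machines

c_bits = format(c, f"0{WIDTH}b")
detected = [LABELS[i] for i in range(1, WIDTH) if c_bits[i] != c_bits[0]]
print(c_bits, detected)
```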

Fault Grading
So, you've constructed a set of test vectors using the techniques described here. Will they detect all the faulty parts? You could see how many different faults your vectors detect by inserting each possible fault one at a time, running the vectors, and then checking whether some output differed from the good machine on some cycle. This needs *lots* of simulation and is probably impractical for large circuits, even with a hardware-accelerated simulator. Instead, you can use the same sorts of statistical sampling techniques that other QA programs employ: randomly select a set of faults, fault-grade your vectors on those faults and use standard statistical techniques to see whether the fault coverage exceeds a desired level. The level of confidence may be increased by increasing the number of samples.
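The sampling approach can be sketched in Python. `detects` is a hypothetical callback standing for one fault-simulation run with the fault inserted; the interval uses the normal approximation to the binomial distribution (z = 1.96 for roughly 95 % confidence), whose half-width shrinks with the square root of the number of sampled faults. The fault list and the 90 % coverage in the usage lines are made up for illustration.

```python
import math
import random


def estimate_coverage(faults, detects, n_samples, z=1.96, seed=0):
    """Fault-grade a random sample of faults and return (estimate, low, high).

    detects(fault) -> bool stands for one fault-simulation run of the test
    vectors with that fault inserted (hypothetical callback).
    """
    rng = random.Random(seed)
    sample = rng.sample(list(faults), n_samples)
    detected = sum(1 for f in sample if detects(f))
    p = detected / n_samples
    half_width = z * math.sqrt(p * (1 - p) / n_samples)  # normal approximation
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Made-up example: 100 000 faults, of which every tenth escapes the vectors.
p, low, high = estimate_coverage(range(100_000), lambda f: f % 10 != 0, 2000)
print(f"coverage ~ {p:.3f} (95 % interval {low:.3f} to {high:.3f})")
```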
Conclusion
- defects during chip fabrication are inevitable
- faults model defects at higher abstraction levels
- higher chip complexity (more gates, relatively fewer pads) reduces controllability, observability and thus testability
- test pattern generation is becoming time-consuming and thus costly
- structured design for test during chip development is required
Coming Up...
Next topic: Design for Testability
Readings for next time: Weste, Sections 7.2.2 through 7.2.5
VLSI Exercises: vlsi-20 #1
Ex vlsi20.3 (difficulty: medium): The digital circuit suffers from a s-a-0 fault. Try to find test patterns by means of the D-algorithm. If you don't succeed, use the Boolean difference to calculate the test patterns. Result: T = x1 x2 x3 x4, found by the Boolean difference.
[Figure: network of AND gates G1 to G6 with inputs x1 to x4 and output y.]
VLSI Exercises: vlsi-20 #2

Ex vlsi20.4 (difficulty: easy): Calculate the SCOAP combinational controllability and observability values (CC0, CC1, CO) for the circuit below. Result: x1 (1,1,7), x2 (1,1,7), x3 (1,1,5), x4 (1,1,5), s1 (3,2,5), s2 (2,4,3), s3 (2,3,3), y (4,5,0)
[Figure: inputs x1, x2 into OR gate G1 (output s1); AND gate G3 (output s2); AND gate G4 with input x4 (output s3); OR gate G5 (output y).]
Ex vlsi20.5 (difficulty: easy): a) circuit with a stuck-at-0 fault at s1; b) circuit with a stuck-at-0 fault at x1. Find all test patterns that detect the fault by means of the Boolean difference. Result equations: a) x = (x1+x2)x3, b) x = x1x2x3
VLSI Systems Design

Top-down Design and HDLs
It seems I have to hurry up!
Overview: top-down design flow, VHDL hardware description language, test-bench methodology. Goal: You are able to design circuits in VHDL with behavioral, dataflow and structural modeling. You are familiar with the top-down design flow and the test-bench methodology.
chapter 1
The Need for HDLs

- A specification is an engineering contract that lists all the goals for a project: goals include area, power, throughput, latency, functionality, test coverage, costs (NREs and piece costs). It helps you figure out when you're done and how to make engineering tradeoffs. Later on, the goals help remind everyone (especially management) what was agreed to!
- Partition the project into modules with well-defined interfaces so that each module can be worked on by a separate team. Gives the SW types a head start too! (Hardware/software codesign)
- A behavioral model serves as an executable specification that documents the exact behavior of all the individual modules and their interfaces. Since one can run tests, this model can be refined and finally verified through simulation.
We need a way to talk about what hardware should do without actually designing the hardware itself, i.e., we need to separate functionality from implementation. We need a Hardware Description Language.
The Need for HDLs cont.

- it is easier to explore ideas in HDLs than in logic gates
- stepwise refinement: HDLs allow designs to be described at various levels of abstraction
- HDLs support the description-synthesis method
- pitfall: abstract models are not precise
- the first HDLs were introduced in the late '70s
- it is difficult to develop a general-purpose HDL for signal-processing and real-time applications and ...
- portability needs led to standardization (Institute of Electrical and Electronics Engineers, IEEE)
Hardware Description Languages

- textual HDLs: VHDL, Verilog-HDL, HardwareC, etc.
- graphic HDLs: SpecCharts, etc. (control and dataflow graphs)
- tabular HDLs: BIF, etc. (FSMD models in tabular form)
- time-diagram HDLs: Waves, etc.
- standardization:
  - VHDL: IEEE Std 1076-1987 and 1993
  - std_logic package: IEEE Std 1164-1993
  - Verilog-HDL: IEEE Std 1364-1995
A Tale of Two HDLs

VHDL:
- VHSIC HDL (VHSIC = Very High Speed Integrated Circuits)
- Ada-like verbose syntax, lots of redundancy
- extensible types and simulation engine; logic representations are not built in and have evolved with time (IEEE 1164)
- a design is composed of entities, each of which can have multiple architectures; a configuration chooses which architecture is used for a given instance of an entity
- behavioral, structural and logic-level modeling; synthesizable subset...
- harder to learn and use, not technology-specific, DoD mandate
Verilog-HDL:
- C-like concise syntax
- built-in types and logic representations; oddly, this has led to slightly incompatible simulators from different vendors
- a design is composed of modules
- behavioral, structural and logic-level modeling; synthesizable subset...
- easy to learn and use, fast simulation, good for logic
- Gateway Design Automation
Introduction to VHDL & Verilog

VHDL:
- rich and powerful language
- data-type-driven language
- goal: documentation of large complex systems
- language structures: entity (hierarchy interface), architecture (behavior of the system), configuration (binding of entity and architecture), package (library of global types or blocks)
Verilog-HDL:
- simple and efficient language
- hardware-driven language
- goal: automatic synthesis
- language structures: module (blocks or sub-blocks), compiler directive `include (file structuring)
Introduction to VHDL & Verilog cont.

language features ?signal data types (in, out, bidir, signal-strength ...) ?hardware structures (memory, register-files, ...) ?logic operators (shift, rotation, masking, ...) ?asynchronous structures (set, reset of memories) ?parallel or synchronous structures ?constraints (pin, technology, area, delays, ...) ?inter-process communications (shared medium, message passing, ...)
VHDL

chapter 2
Signals, Delays, Events, Concurrency

- Digital systems, in contrast to software systems, are fundamentally about signals.
- Signals, in contrast to variables, have delays, which leads to signal waveforms.
- Digital systems are composed of components.
- Digital systems have concurrency of operation.
- Events on signals lead to computations that may generate events on other signals.
[Figure: waveforms of a, b, sum and carry over 0 to 40 ns, with one event marked]
Signal Values
- Signal values are physically associated with wires.
- The VHDL language supports the signal types:
  - type bit, values: '0', '1'
  - type bit_vector, values: "0001", etc.
- The VHDL package IEEE 1164 supports the signal types:
  - type std_ulogic and its vector type std_ulogic_vector
  - std_ulogic is a 9-valued logic:

  value  interpretation
  'U'    uninitialized
  'X'    forcing unknown
  '0'    forcing 0
  '1'    forcing 1
  'Z'    high impedance
  'W'    weak unknown
  'L'    weak 0
  'H'    weak 1
  '-'    don't care
Resolved Signals
- It is common for components in a digital system to have multiple sources for the value of an input signal.
- Many designs use buses: a group of signals that can be shared among multiple sources.
- The value of a shared signal is determined by the type of interconnection, e.g. wired logic, so the signal value depends on the implementation.
- The VHDL simulator has to resolve the signal's value.
- The IEEE 1164 package offers the signal types std_logic and std_logic_vector as resolved versions of std_ulogic and std_ulogic_vector.
[Figure: a signal with several drivers (e.g. wired-OR logic) needs a resolved type; a signal with a single driver can use an unresolved type]
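The resolution step can be made concrete with a small Python model (not VHDL) of the resolution function that IEEE 1164 attaches to std_logic. The table is transcribed from the package body. The function resolved mirrors the package function of the same name; resolve_pair is a hypothetical helper.

Note that this is strength-based resolution rather than literal wired-OR. A strong driver beats a weak one, and two conflicting strong drivers give 'X'.

```python
# std_ulogic values in the order used by IEEE 1164.
VALUES = "UX01ZWLH-"

# Resolution table from the IEEE 1164 package body.
# Row: value of one driver, column: value of the other driver.
RESOLUTION_TABLE = [
    "UUUUUUUUU",  # U
    "UXXXXXXXX",  # X
    "UX0X0000X",  # 0
    "UXX11111X",  # 1
    "UX01ZWLHX",  # Z
    "UX01WWWWX",  # W
    "UX01LWLWX",  # L
    "UX01HWWHX",  # H
    "UXXXXXXXX",  # -
]


def resolve_pair(a: str, b: str) -> str:
    """Combine the values of two drivers."""
    return RESOLUTION_TABLE[VALUES.index(a)][VALUES.index(b)]


def resolved(drivers: list) -> str:
    """Resolved value of a std_logic signal with the given driver values."""
    if len(drivers) == 1:
        return drivers[0]          # a single driver passes through unchanged
    result = "Z"                   # start from the weakest value
    for value in drivers:
        result = resolve_pair(result, value)
    return result


if __name__ == "__main__":
    print(resolved(["0", "Z"]))    # tri-state bus, one driver released: 0
    print(resolved(["0", "1"]))    # conflicting strong drivers: X
    print(resolved(["H", "0"]))    # weak pull-up overridden by a driver: 0
    print(resolved(["L", "H"]))    # two weak drivers disagree: W
```

The table is symmetric, so the order in which the simulator visits the drivers does not matter.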
chapter 3
Entity
- The design entity is a primary programming abstraction in VHDL.
- An entity defines the interface of a component, without giving any information about the component's behavior.
[Figure: HalfAdder symbol with inputs a, b and outputs sum, carry]
entity HalfAdder is
  port (a, b       : in  bit;
        sum, carry : out bit);
end HalfAdder;

library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a, b       : in  std_ulogic;
        sum, carry : out std_ulogic);
end HalfAdder;
Exercises Ex401: Entity

- Ex401 (difficulty: easy): Define the entities of the following digital components. Use the unresolved 9-valued logic of the IEEE 1164 package. Each component has to be written in a separate file, named after the component, with the extension .vhd. The files have to be analyzed with the Synopsys command gvan.
Naming convention: start component names with a capital letter and signal names with a lowercase letter.
[Figure: components to define: Mux4to1 (8-bit data inputs i0 to i3, select sel, output z); D_ff (signals d, clk, sNot, rNot, q, qNot); Alu32 (32-bit data a, b, 6-bit op-code op, signals n, z, c)]
Architecture
- The design architecture is a primary programming abstraction in VHDL.
- An architecture describes the internal behavior of a component, without giving any information about the component's I/Os.
- The behavioral description can take many forms, which differ in their level of abstraction and detail.

architecture behavior of HalfAdder is
  -- declarative part (signals, constants, ...)
begin
  -- functional description of the system
end behavior;
Entity-Architecture: Hierarchy (VHDL vs. Verilog)

VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
  port (a, b, ci : in  std_logic;
        co, s    : out std_logic);
end FullAdder;
architecture behavior of FullAdder is
  -- declarations (signals, constants, ...)
begin
  ...
end behavior;

Verilog-HDL:
module FullAdder (a, b, ci, co, s);
  input  a, b, ci;
  output co, s;
  /* declarations of variables */
  ...
endmodule
Concurrency
- The operation of digital systems is inherently concurrent.
- Within VHDL, signals are assigned values using signal assignment statements (<=).
- Multiple signal assignment statements are executed concurrently.

architecture concurrent_behavior of HalfAdder is
begin
  sum   <= (a xor b) after 5 ns;   -- concurrent signal assignment
  carry <= (a and b) after 5 ns;   -- concurrent signal assignment
end concurrent_behavior;
[Figure: waveforms of a, b, sum and carry over 0 to 40 ns]
Dataflow Model #1
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a, b       : in  std_logic;
        carry, sum : out std_logic);
end HalfAdder;
architecture dataflow of HalfAdder is
begin
  sum   <= (a xor b) after 5 ns;
  carry <= (a and b) after 5 ns;
end dataflow;
Dataflow Model #2
[Figure: full adder built from five gates labeled L1 to L5 with internal signals s1, s2, s3]
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder is
  port (a, b, cIn : in  std_logic;
        cOut, sum : out std_logic);
end FullAdder;

architecture dataflow of FullAdder is
  -- architecture declarative segment
  signal s1, s2, s3 : std_logic;
  constant gate_delay : Time := 5 ns;
begin
  -- architecture body
  L1: s1   <= (a xor b)    after gate_delay;
  L2: s2   <= (cIn and s1) after gate_delay;
  L3: s3   <= (a and b)    after gate_delay;
  L4: sum  <= (s1 xor cIn) after gate_delay;
  L5: cOut <= (s2 or s3)   after gate_delay;
end dataflow;
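Every assignment in the dataflow FullAdder architecture uses the same gate_delay of 5 ns. So the latest time at which an output can still change after an input event is the longest chain of assignments from an input to that output, times 5 ns.

A short Python sketch makes this explicit. It is not part of the VHDL model, and the helper names are hypothetical.

```python
GATE_DELAY_NS = 5  # constant gate_delay of the dataflow architecture

# For each internal signal or output: the signals its assignment reads.
ASSIGNMENTS = {
    "s1": ("a", "b"),      # L1: s1 <= a xor b
    "s2": ("cIn", "s1"),   # L2: s2 <= cIn and s1
    "s3": ("a", "b"),      # L3: s3 <= a and b
    "sum": ("s1", "cIn"),  # L4: sum <= s1 xor cIn
    "cOut": ("s2", "s3"),  # L5: cOut <= s2 or s3
}


def settle_time(signal: str) -> int:
    """Latest time (ns) at which `signal` can still change when the
    primary inputs a, b and cIn change at t = 0 (longest path)."""
    if signal not in ASSIGNMENTS:
        return 0  # primary input
    return max(settle_time(s) for s in ASSIGNMENTS[signal]) + GATE_DELAY_NS


if __name__ == "__main__":
    for out in ("sum", "cOut"):
        print(out, settle_time(out), "ns")
```

So sum is stable at most 10 ns (path L1, L4) and cOut at most 15 ns (path L1, L2, L5) after an input change. This is a structural upper bound: for particular input values an output may settle earlier or not change at all.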
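Signal Assignments #1
- Simple signal assignments; a waveform may consist of several elements:
sum <= (a xor b) after 5 ns, (a or b) after 10 ns, (not a) after 15 ns;
sig <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
[Figure: waveform of sig over 0 to 40 ns]
signal clock : std_logic := '0';      -- initial value needed for toggling
clock <= not(clock) after 5 ns;       -- clock with a period of 10 ns
[Figure: clock waveform over 0 to 40 ns]
a <= "0000000000000000", to_stdlogicvector(x"abcd") after 5 ns;
The conversion function to_stdlogicvector, which turns a bit_vector such as the hexadecimal literal x"abcd" into a std_logic_vector, is defined in package std_logic_1164.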
Conditional Signal Assignment #2

- The right-hand value is computed immediately and assigned at a later point in time, given by the after clause.

library IEEE;
use IEEE.std_logic_1164.all;
entity Mux4to1 is
  port (i0, i1, i2, i3 : in  std_logic_vector(7 downto 0);
        sel            : in  std_logic_vector(1 downto 0);
        z              : out std_logic_vector(7 downto 0));
end Mux4to1;
architecture dataflow of Mux4to1 is
begin
  -- one single (conditional) signal assignment
  z <= i0 after 5 ns when sel = "00" else
       i1 after 5 ns when sel = "01" else
       i2 after 5 ns when sel = "10" else
       i3 after 5 ns when sel = "11" else
       "00000000" after 5 ns;
end dataflow;
Exercises Ex402: Conditional Signal Assignment

- Ex402 (difficulty: easy): Write the VHDL code of a 1-bit ALU with the operations AND, OR and full addition. Use the resolved 9-valued logic of the IEEE 1164 package. The file Simple1bitALU.vhd has to be analyzed and simulated with the Synopsys commands gvan and vhdldbx.
[Figure: ALU block with inputs a, b, carryIn, opcode and outputs result, carry]
Delays: Delta Delay Model

- The VHDL language distinguishes between three delay models:
  - delta delay model
  - inertial delay model (default)
  - transport delay model
Delta delay model:
- If no delay is specified, a delta delay is assumed. A delta delay is infinitesimally small: it advances no simulation time, but it orders the events that occur at the same simulation time. Any number of delta delays adds up to zero simulation time.
architecture delta_delay of Comb is
  signal s1, s2, s3, s4 : std_logic := '0';
begin
  s1 <= not(in1);
  s2 <= not(in2);
  s3 <= not(s1 and in2);
  s4 <= not(s2 and in1);
  z  <= not(s3 and s4);
end delta_delay;
[Figure: waveforms of in1, in2, s1 to s4 and z over 0 to 70 ns; zoom on an event on in2 at 10 ns, showing in2, s2, s3 and z separated by delta steps δ, 2δ, 3δ]
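The following Python sketch (not VHDL; names are hypothetical) imitates how a simulator processes the delta_delay architecture. All assignments sensitive to a changed signal are evaluated with the values from the start of a delta cycle. Their new values become visible one delta later, while simulation time stays at 10 ns.

The sketch assumes in1 = '1' and an event 0 -> 1 on in2, and models the signals as 0/1 integers.

```python
# Concurrent assignments of architecture delta_delay: (target, inputs, function).
ASSIGNMENTS = [
    ("s1", ("in1",), lambda v: 1 - v["in1"]),
    ("s2", ("in2",), lambda v: 1 - v["in2"]),
    ("s3", ("s1", "in2"), lambda v: 1 - (v["s1"] & v["in2"])),
    ("s4", ("s2", "in1"), lambda v: 1 - (v["s2"] & v["in1"])),
    ("z", ("s3", "s4"), lambda v: 1 - (v["s3"] & v["s4"])),
]


def run_delta_cycles(values, changed):
    """Propagate events until the circuit is stable, without advancing time.

    values:  dict of current signal values (a copy is updated and returned)
    changed: set of signals that just had an event
    Returns (final values, list of (delta cycle, signal, new value)).
    """
    values = dict(values)
    log = []
    delta = 0
    while changed:
        delta += 1
        # Every assignment sensitive to a changed signal reads the values
        # from the start of this delta cycle ...
        updates = {}
        for target, inputs, fn in ASSIGNMENTS:
            if changed.intersection(inputs):
                new = fn(values)
                if new != values[target]:
                    updates[target] = new
        # ... and all results become visible together, one delta later.
        values.update(updates)
        log.extend((delta, sig, val) for sig, val in updates.items())
        changed = set(updates)
    return values, log


if __name__ == "__main__":
    # Stable state for in1 = 1, in2 = 0, then an event: in2 becomes 1.
    state = {"in1": 1, "in2": 0, "s1": 0, "s2": 1, "s3": 1, "s4": 0, "z": 1}
    state["in2"] = 1
    final, log = run_delta_cycles(state, {"in2"})
    for delta, sig, val in log:
        print(f"10 ns + {delta} delta: {sig} = {val}")
    print("z =", final["z"])  # z = in1 xor in2
```

For this input state the event ripples through s2 (1 delta), s4 (2 deltas) and z (3 deltas). With in1 = '0' the chain is shorter, so the exact delta count depends on the signal values.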
Delays: Inertial Delay Model

- Digital circuits have a certain amount of inertia: it takes a finite amount of time and a certain amount of energy for the output of a gate to respond to a change on its input.
- Inertial delay model (default):
  - a pulse shorter than the propagation delay does not propagate to the output
out1 <= (a xor b) after 8 ns;
out2 <= (a xor b) after 2 ns;
[Figure: an input pulse and the resulting outputs out1 (delay 8 ns) and out2 (delay 2 ns), time axis 10 to 40 ns]
VHDL-93 allows a rejection limit that differs from the delay; here only pulses shorter than 2 ns are rejected:
sum <= reject 2 ns inertial (a xor b) after 5 ns;
Delays: Transport Delay Model

- Unlike switching devices, wires have comparatively little inertia. As a result, wires propagate even pulses of very small width.
- In modern technologies with increasingly small feature sizes, wire delays dominate.
- Transport delay model (selected explicitly with the keyword transport):
  - any pulse propagates to the output, independent of the delay
out1 <= transport (a xor b) after 8 ns;
[Figure: an input pulse and the output out1 for a transport delay of 8 ns, time axis 5 to 40 ns]
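A simplified Python model of the two delay mechanisms for a single-input buffer (out <= x after delay) can help here. It assumes the inertial rejection limit equals the delay, as in the default inertial assignment, and does not model the VHDL-93 reject clause. The function name delay_line is hypothetical.

```python
def delay_line(changes, delay, inertial, initial=0):
    """Output events of `out <= x after delay` (inertial=True) or
    `out <= transport x after delay` (inertial=False) for one input x.

    changes: sorted list of (time_ns, new_value); each entry is a real change of x.
    Returns the list of (time_ns, value) events on the output.
    """
    pending = []   # projected output transactions not yet reached
    matured = []   # transactions whose time has been reached
    for t, v in changes:
        # Transactions due at or before the current time have already taken effect.
        matured += [tr for tr in pending if tr[0] <= t]
        pending = [tr for tr in pending if tr[0] > t]
        if inertial:
            # Inertial delay: the new input value cancels pending output changes
            # to a different value, so pulses shorter than `delay` are swallowed.
            pending = [tr for tr in pending if tr[1] == v]
        pending.append((t + delay, v))
    matured += pending
    # Report only real value changes (events) on the output.
    events, current = [], initial
    for t, v in matured:
        if v != current:
            events.append((t, v))
            current = v
    return events


if __name__ == "__main__":
    pulse = [(10, 1), (15, 0)]   # a 5 ns wide input pulse
    print("inertial 8 ns: ", delay_line(pulse, 8, inertial=True))
    print("inertial 2 ns: ", delay_line(pulse, 2, inertial=True))
    print("transport 8 ns:", delay_line(pulse, 8, inertial=False))
```

With a 5 ns wide pulse:
- the 8 ns inertial delay swallows the pulse;
- the 2 ns inertial delay passes it;
- the 8 ns transport delay passes it unchanged, only shifted in time.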
Delay Model in Practice

- Accurate modeling of wire delays is possible, although in practice it is difficult to obtain accurate estimates of the wire delay without proceeding through physical design and layout of the circuit.
[Figure: half adder whose gate outputs s1 (XOR) and s2 (AND) drive sum and carry through wires]
library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a, b       : in  std_logic;
        carry, sum : out std_logic);
end HalfAdder;
architecture transport_delay of HalfAdder is
  signal s1, s2 : std_logic := '0';
begin
  s1    <= (a xor b) after 2 ns;      -- gate: inertial delay
  s2    <= (a and b) after 2 ns;      -- gate: inertial delay
  sum   <= transport s1 after 4 ns;   -- wire: transport delay
  carry <= transport s2 after 4 ns;   -- wire: transport delay
end transport_delay;
[Figure: waveforms of a, b, s1, s2, sum and carry over 0 to 14 ns, showing the inertial (gate) and transport (wire) delays]
Exercises vlsi21: Conditional Assignments

- Ex403 (difficulty: easy): Write and simulate a VHDL model of a 2-bit comparator (compare on equality, filename: Comp2.vhd).
[Figure: Comp2 with inputs a, b and output c]
- Ex405 (difficulty: easy): Construct and test a VHDL module generating the following waveforms.
[Figure: waveforms of a, b and c over 0 to 60 ns]
- Ex vlsi21 (difficulty: easy): Have a look at the exercises at the end of chapter 3 of "VHDL: Starter's Guide".
chapter 4
The Process Construct #1

- The concurrent (continuous) assignment model is used when components correspond to gates.
- The process construct enables the use of conventional programming-language constructs.
- In contrast to concurrent signal assignment statements, a process is a sequentially executed block of code.
- Control flow within a process is strictly sequential.
- With respect to simulation time, a process executes in zero time.

architecture behavior of MyProcess is
begin
  process
    -- process declarative part
  begin
    -- process body
  end process;
end behavior;
Example: Process Statement

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;                  -- for conv_integer
entity Memory is
  port (addr, wrData : in  std_logic_vector(31 downto 0);
        wr, rd       : in  std_logic;
        rdData       : out std_logic_vector(31 downto 0));
end Memory;
architecture behavioral of Memory is
  type memArray is array (0 to 1024) of std_logic_vector(31 downto 0);
begin
  MemProcess: process (addr, wr, rd)              -- sensitivity list
    variable mem : memArray := (x"00000A06",      -- initializing memory data
                                others => x"00000000");
    variable addrIndex : integer;
  begin
    addrIndex := conv_integer(addr);              -- immediate variable assignment
    if (wr = '1') then
      mem(addrIndex) := wrData;
    elsif (rd = '1') then
      rdData <= mem(addrIndex) after 10 ns;       -- signal assignment, effective 10 ns later
    end if;
  end process;
end behavioral;
The Process Construct #2

- The execution of a process is initiated whenever an event occurs on any signal in its sensitivity list.
- Once started, the process executes to completion in zero (simulation) time.
- Processes execute concurrently with other processes and with concurrent signal assignments.
- Concurrent signal assignments are in fact only special cases of processes. The following two architectures have identical behavior:

-- concurrent signal assignment
architecture behavior of MyBlock1 is
begin
  c <= a and b after 5 ns;
end behavior;

-- equivalent process
architecture behavior of MyBlock2 is
begin
  process (a, b)
  begin
    c <= a and b after 5 ns;
  end process;
end behavior;
VHDL vs. Verilog: Events

Events are changes of variables or signals. Real circuits are event driven.
VHDL:
- process (sensitivity list) begin statements; end process;
- wait on / until / for event;
Verilog-HDL:
- always @(sensitivity list) statement
- initial statement (executed once, at the start of simulation)
Wow! Everything is event driven, as in real life.
Conditional Programming Constructs

- if-then-else statement:
  if condition then sequential statements
  { elsif condition then sequential statements }
  [ else sequential statements ]
  end if;
- case statement:
  case expression is
    { when choices => sequential statements }
    [ when others => sequential statements ]
  end case;
Example: Condition Statements

library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder is
  port (a, b       : in  std_logic;
        sum, carry : out std_logic);
end HalfAdder;
architecture behavioral of HalfAdder is
begin
  If_Process: process (a, b)
  begin
    if (a = b) then
      sum <= '0' after 5 ns;
    else
      sum <= (a or b) after 5 ns;
    end if;
  end process;

  Case_Process: process (a, b)
  begin
    case a is
      when '0'    => carry <= a after 5 ns;
      when '1'    => carry <= b after 5 ns;
      when others => carry <= 'X' after 5 ns;
    end case;
  end process;
end behavioral;
VHDL vs. Verilog: Combinational Logic Example

VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
entity Multiplexer4to1 is
  port (sel        : in  std_logic_vector(1 downto 0);
        a, b, c, d : in  std_logic_vector(15 downto 0);
        z          : out std_logic_vector(15 downto 0));
end Multiplexer4to1;
architecture DemoExample of Multiplexer4to1 is
begin
  -- 4-to-1 multiplexer (no inferred memory)
  process (a, b, c, d, sel)
  begin
    case sel is
      when "00"   => z <= a;
      when "01"   => z <= b;
      when "10"   => z <= c;
      when "11"   => z <= d;
      when others => z <= (others => '-');
    end case;
  end process;
end DemoExample;

Verilog-HDL:
module Multiplexer4to1 (sel, a, b, c, d, z);
  input  [1:0]  sel;
  input  [15:0] a, b, c, d;
  output [15:0] z;
  assign z = (sel == 2'd0) ? a :
             (sel == 2'd1) ? b :
             (sel == 2'd2) ? c :
             (sel == 2'd3) ? d : 16'bx;
endmodule
Loop Programming Constructs

- for loop statement:
  for index in range loop
    sequential statements
  end loop;
- while loop statement:
  while condition loop
    sequential statements
  end loop;
The loop index does not have to be declared, but it can only be used inside the loop.
Example: Loop Statements

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;         -- "+" on std_logic_vector
entity Multiplier is                     -- 32-bit unsigned multiplier
  port (a, b : in  std_logic_vector(31 downto 0);
        m    : out std_logic_vector(63 downto 0));
end Multiplier;
architecture behavioral of Multiplier is
  constant modulDelay : Time := 10 ns;
begin
  process (a, b)
    -- bit 64 holds the carry of the 32-bit addition
    variable bReg : std_logic_vector(64 downto 0);
    variable aReg : std_logic_vector(31 downto 0);
  begin
    aReg := a;
    bReg := (others => '0');
    bReg(31 downto 0) := b;              -- lower half: multiplier, upper half: partial product
    for index in 1 to 32 loop
      if bReg(0) = '1' then
        -- 33-bit result keeps the carry of the addition
        bReg(64 downto 32) := bReg(64 downto 32) + aReg;
      end if;
      bReg := '0' & bReg(64 downto 1);   -- shift right by one bit
    end loop;
    m <= bReg(63 downto 0) after modulDelay;
  end process;
end behavioral;
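The loop implements the classic shift-and-add multiplication. The Python sketch below is a model, not VHDL, and its names are hypothetical. It follows the same steps on an integer standing in for bReg and checks the result against Python's own multiplication.

Its keep_carry flag shows why the carry of each 32-bit addition must be kept: if the sum is truncated to 32 bits, large products come out wrong.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def shift_add_multiply(a: int, b: int, keep_carry: bool = True) -> int:
    """Model of the 32-iteration shift-and-add loop of the Multiplier.

    breg plays the role of bReg: the upper half accumulates the partial
    product, the lower half initially holds b and is shifted out bit by bit.
    """
    assert 0 <= a <= MASK32 and 0 <= b <= MASK32
    breg = b                              # upper half starts at zero
    for _ in range(32):
        if breg & 1:                      # if bReg(0) = '1'
            upper = (breg >> 32) + a      # may need 33 bits
            if not keep_carry:
                upper &= MASK32           # a 32-bit sum silently drops the carry
            breg = (upper << 32) | (breg & MASK32)
        breg >>= 1                        # shift right, '0' enters at the top
    return breg & MASK64


if __name__ == "__main__":
    print(shift_add_multiply(6, 7))
    m = MASK32
    print(shift_add_multiply(m, m) == m * m)
    print(shift_add_multiply(m, m, keep_carry=False) == m * m)
```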
Exercises vlsi21: Loops

- Ex405 (difficulty: easy): Write VHDL code for a combinational shift block with 8-bit data buses and zero fill. Use the 2-bit signal shiftNum to indicate the number of bits to shift. If a std_logic_vector has to be converted to an integer, the conv_integer() function from the std_logic_unsigned package can be used.
[Figure: Shift block with inputs dataIn, shiftNum, shiftLeft, shiftRight and output dataOut]
More on Processes
- Never assign a value to a signal in different processes (multiple drivers).
  Example: process A assigns y <= '0' and process B assigns y <= '1'. Conflict: two drivers, not synthesizable!
- Upon initialization, all processes are executed once.
- Thereafter, processes are executed in a data-driven manner:
  - activated by events on the signals in the sensitivity list of the process, or
  - by waiting on occurrences of specific events using wait statements
The Wait Statement

- A more general way to specify when a process executes is the wait statement.
- Wait statements explicitly specify the conditions under which a process may resume execution after being suspended.
- With wait statements, a process can be suspended at multiple points.
wait for time expression;    example: wait for 20 ns;
wait on signal;              example: wait on clk, reset, status;
wait until condition;        example: wait until (a = '1');
wait;                        (suspends the process forever)
Example: Wait Statements

library IEEE;
use IEEE.std_logic_1164.all;
entity Dff1 is
  port (d, clk  : in  std_logic;
        q, qBar : out std_logic);
end Dff1;
architecture behavioral of Dff1 is
begin
  process                      -- no sensitivity list: wait statement required
  begin
    wait until (clk'event and clk = '1');
    q    <= d after 1 ns;
    qBar <= not d after 1 ns;
  end process;
end behavioral;

library IEEE;
use IEEE.std_logic_1164.all;
entity Dff2 is
  port (d, clk, rst : in  std_logic;
        q, qBar     : out std_logic);
end Dff2;
architecture behavioral of Dff2 is
begin
  process (clk, rst)           -- sensitivity list: no wait statement allowed
  begin
    if (rst = '0') then
      q    <= '0' after 1 ns;
      qBar <= '1' after 1 ns;
    elsif (clk'event and clk = '1') then
      q    <= d after 1 ns;
      qBar <= not d after 1 ns;
    end if;
  end process;
end behavioral;
If a process has no sensitivity list, you MUST use wait statements; otherwise the process never suspends and blocks your simulation.
Latch vs. Flip-Flop

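[Figure: latch and flip-flop, each with inputs d, clk, reset and output q]
-- Latch (level-sensitive): q follows d while clk = '1'
process (clk, reset, d)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk = '1') then
    q <= d;
  end if;
end process;

-- Flip-flop (edge-triggered): q takes d at the rising edge of clk
process (clk, reset)
begin
  if (reset = '0') then
    q <= '0';
  elsif (clk'event and clk = '1') then
    q <= d;
  end if;
end process;

-- 8-bit register with enable
process (clk, reset)
begin
  if (reset = '0') then
    q <= "00000000";
  elsif rising_edge(clk) then
    if (enable = '1') then
      q <= d;
    end if;
  end if;
end process;
[Figure: register: a multiplexer controlled by enable selects d or the current q as input of the flip-flops, clocked by clk, with asynchronous reset]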
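The difference between a level-sensitive latch and an edge-triggered flip-flop shows up when d changes while clk is high. The following Python sketch samples clk and d once per step. It is a discrete-step model, not VHDL; the function name clocked_q is hypothetical, reset is assumed inactive, and q starts at 0.

```python
def clocked_q(samples, edge_triggered):
    """Return q after each step for a list of (clk, d) samples (0/1 ints).

    edge_triggered=False: latch, q follows d while clk = 1.
    edge_triggered=True:  flip-flop, q takes d only on a 0 -> 1 step of clk.
    """
    q, prev_clk, trace = 0, 0, []
    for clk, d in samples:
        if edge_triggered:
            if prev_clk == 0 and clk == 1:   # clk'event and clk = '1'
                q = d
        elif clk == 1:                       # transparent while clk is high
            q = d
        prev_clk = clk
        trace.append(q)
    return trace


if __name__ == "__main__":
    steps = [(0, 0), (1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (1, 0)]
    print("latch:     ", clocked_q(steps, edge_triggered=False))
    print("flip-flop: ", clocked_q(steps, edge_triggered=True))
```

At the third step d falls while clk is still high. The latch output follows it, while the flip-flop keeps the 1 it captured at the rising edge.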
Exercises vlsi21: Synchronous

- Ex406 (difficulty: easy): Write VHDL code for a 16-bit register with an enable input and an asynchronous reset input.
[Figure: Register16 with inputs d, enable, clk, reset and output q]
- Ex407 (difficulty: easy): Write VHDL code for a 16-bit counter with enable, load and asynchronous reset inputs.
[Figure: Counter16 with inputs data, enable, load, clk, reset and output count]
More on Wait: Inter-Process Comm.

[Figure: producer and consumer processes exchanging transmitData/receiveData, controlled by request and acknowledge]
library IEEE;
use IEEE.std_logic_1164.all;
entity Handshake is
  port (inputData : in std_logic_vector(31 downto 0));
end Handshake;
architecture behavioral of Handshake is
  signal transmitData         : std_logic_vector(31 downto 0);
  signal request, acknowledge : std_logic;
begin
  producer: process
  begin
    wait until inputData'event;
    transmitData <= inputData;
    request <= '1';
    wait until acknowledge = '1';
    request <= '0';
    wait until acknowledge = '0';
  end process;

  consumer: process
    variable receiveData : std_logic_vector(31 downto 0);
  begin
    wait until request = '1';
    receiveData := transmitData;
    acknowledge <= '1';
    wait until request = '0';
    acknowledge <= '0';
  end process;
end behavioral;
[Figure: timing diagram of request and acknowledge]
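The producer/consumer pair implements a four-phase (return-to-zero) handshake. The Python model below uses generators as processes, and each yield plays the role of a wait until statement. All names are hypothetical.

Note that, unlike VHDL signals, the Python dictionary entries are updated immediately rather than after a delta cycle, and the producer takes its words from a list instead of waiting for events on inputData.

```python
def producer(sig, words):
    """Producer side of the four-phase handshake."""
    for word in words:
        sig["transmitData"] = word
        sig["request"] = 1
        yield lambda: sig["acknowledge"] == 1   # wait until acknowledge = '1'
        sig["request"] = 0
        yield lambda: sig["acknowledge"] == 0   # wait until acknowledge = '0'


def consumer(sig, received):
    """Consumer side of the four-phase handshake."""
    while True:
        yield lambda: sig["request"] == 1       # wait until request = '1'
        received.append(sig["transmitData"])
        sig["acknowledge"] = 1
        yield lambda: sig["request"] == 0       # wait until request = '0'
        sig["acknowledge"] = 0


def resume(proc):
    """Run a process until its next wait; return its wait condition."""
    try:
        return next(proc)
    except StopIteration:
        return lambda: False                    # finished: never resumes


def run(procs):
    """Resume every process whose wait condition holds, until none can proceed."""
    waiting = [(p, resume(p)) for p in procs]
    progress = True
    while progress:
        progress = False
        for i, (p, cond) in enumerate(waiting):
            if cond():
                waiting[i] = (p, resume(p))
                progress = True


if __name__ == "__main__":
    sig = {"transmitData": None, "request": 0, "acknowledge": 0}
    received = []
    run([producer(sig, [0x11, 0x22, 0x33]), consumer(sig, received)])
    print([hex(w) for w in received])
    print(sig["request"], sig["acknowledge"])
```

Each word costs four handshake transitions: request rises, acknowledge rises, request falls, acknowledge falls.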
Exercises vlsi21: Handshake

- Ex vlsi21.8a (difficulty: easy): Write a VHDL model for communication between an input process and an output process using a handshaking protocol. The input process can only read a single word (32 bits) at a time. The output device requires the byte order to be reversed, which is done by the input process. Assign a delay of 1 ns to each handshake signal.
[Figure: AsyncComm: inputData enters the input process, which communicates with the output process delivering outputData]
- Ex vlsi21.8b (difficulty: medium, optional): Rewrite the above handshake model using clock signals clk1 and clk2 for the two (now synchronous) processes, a rst signal for initialization, and a start signal to initiate one data transfer. Do not use any wait statements within the processes.
Attributes
attribute            meaning
signal'event         function returning a Boolean value signifying a change in value on this signal
signal'active        function returning a Boolean value signifying an assignment made to this signal (may not be a new value)
signal'last_event    function returning the time since the last event
signal'last_active   function returning the time since the signal was last active
signal'last_value    function returning the previous value of this signal
signal'left          returns the leftmost value of signal in its defined range
signal'right         returns the rightmost value of signal in its defined range
signal'high          returns the highest value of signal in its defined range
signal'low           returns the lowest value of signal in its defined range
signal'ascending     returns true if signal has an ascending range of values
signal'length        returns the number of elements in the array signal
Generating Periodic Waveforms

library IEEE;
use IEEE.std_logic_1164.all;
entity Periodic is
  port (Z : out std_logic);
end Periodic;
architecture behavioral of Periodic is
begin
  process
  begin
    Z <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
    wait for 50 ns;                -- the waveform repeats every 50 ns
  end process;
end behavioral;
[Figure: waveform of Z over 0 to 50 ns]

library IEEE;
use IEEE.std_logic_1164.all;
entity TwoPhase is
  port (phi1, phi2, reset : out std_logic);
end TwoPhase;
architecture behavioral of TwoPhase is
begin
  reset_process: reset <= '1', '0' after 10 ns;
  clock_process: process
  begin
    phi1 <= '1', '0' after 10 ns;
    phi2 <= '0', '1' after 12 ns, '0' after 18 ns;
    wait for 20 ns;                -- both phases repeat every 20 ns
  end process;
end behavioral;
[Figure: waveforms of reset, phi1 and phi2 over 0 to 60 ns]
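Each process restarts after its final wait for, so the value of such a signal at time t only depends on t modulo the period. The Python sketch below evaluates the waveforms of Periodic and TwoPhase this way and checks that phi1 and phi2 are never high at the same time. The helper value_at is hypothetical; times are in ns and values are strings.

```python
def value_at(t, elements, period):
    """Value at time t of a signal driven by a repeating process
         sig <= v0, v1 after d1, v2 after d2, ...; wait for period;
    elements: list of (value, delay_ns) sorted by delay, first delay 0."""
    local = t % period             # time since the process last restarted
    current = elements[0][0]
    for value, delay in elements:
        if delay <= local:
            current = value
    return current


Z = [("0", 0), ("1", 10), ("0", 20), ("1", 40)]   # period 50 ns
PHI1 = [("1", 0), ("0", 10)]                       # period 20 ns
PHI2 = [("0", 0), ("1", 12), ("0", 18)]            # period 20 ns

if __name__ == "__main__":
    print([value_at(t, Z, 50) for t in range(0, 100, 5)])
    # Two-phase non-overlapping clock: phi1 and phi2 are never '1' together.
    overlap = [t for t in range(200)
               if value_at(t, PHI1, 20) == "1" and value_at(t, PHI2, 20) == "1"]
    print("overlap:", overlap)   # overlap: []
```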
Modeling Finite State Machines

[Figure: Moore FSM: a transition process (inputs reset, inputData, clk) updates the state register; an output process derives outputData and the outputs outSig0 to outSig2 from the state]
architecture behavioral of MooreFSM is
  type StateType is (MyState, YourState, InitState);
  signal state      : StateType;
  signal outputData : std_logic_vector(5 downto 0);
begin
  transition_process: process (reset, clk)
  begin
    if (reset = '0') then
      state <= InitState;
    elsif rising_edge(clk) then
      case state is
        when MyState =>
          state <= YourState;
        when YourState =>
          if (inputDataSignal = '1') then
            state <= MyState;
          end if;
        when others =>
          null;
      end case;
    end if;
  end process;

  output_process: process (state)
  begin
    case state is
      when MyState   => outputData <= "0100";
      when YourState => outputData <= "00100-";
      when InitState => outputData <= "100100";
      when others    => outputData <= "000000";
    end case;
  end process;

  outSig0 <= outputData(0);
  outSig1 <= outputData(1);
  outSig2 <= outputData(2);
end behavioral;
Exercises vlsi21: FSM

- Ex409 (difficulty: easy): Write a VHDL model for a traffic light controller. Use a Moore-type FSM. The signal carPresent indicates cars on the main street, which always has priority. If no cars are present on the main street, the secondary street gets green lights.
[Figure: Moore state diagram with states GreenState, OrangeState, RedState1 and RedState2 and a reset transition; transitions depend on carPresent; each state is annotated with the red/orange/green lamp values for the main and the secondary street]
chapter 5
Modeling Structure
- A structural model of a system is described in terms of the interconnection of its components.
- A structural model consists of three features:
  - component declaration
  - signal declaration
  - component interconnection
[Figure: FullAdder3 built from two HalfAdder3 components (labels H1, H2) and one OR2 (label O3); inputs in1, in2, cIn; internal signals s1, s2, s3; outputs sum, cOut]
Example: Structural Model

library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder3 is
  port (in1, in2, cIn : in  std_logic;
        sum, cOut     : out std_logic);
end FullAdder3;
architecture structural of FullAdder3 is
  -- component declarations (component behavior is described elsewhere)
  component HalfAdder3
    port (a, b : in std_logic; sum, carry : out std_logic);
  end component;
  component OR2
    port (a, b : in std_logic; z : out std_logic);
  end component;
  -- signal declaration
  signal s1, s2, s3 : std_logic;
begin
  -- component interconnection (netlist)
  H1: HalfAdder3 port map (a => in1, b => in2, sum => s1,  carry => s3);
  H2: HalfAdder3 port map (a => s1,  b => cIn, sum => sum, carry => s2);
  O3: OR2        port map (a => s2,  b => s3,  z => cOut);
end structural;
Exercises vlsi21: Structural Model

- Ex410 (difficulty: medium): Write VHDL code for the structural model of FullAdder3 described in the previous transparency. Assume a delay of 1 ns for all logic gates.
  a) Write the structural VHDL code for a HalfAdder.
  b) Write the VHDL code for the necessary logic gates, like OR2 and others, in one file (logicgates.vhd).
  c) Write the VHDL code for FullAdder3.
  d) Analyze and simulate the whole circuit. Be aware of the correct sequence of analyzing.
VHDL vs. Verilog: Structural Description

VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
entity FullAdder4 is
  port (a, b, cIn : in  std_logic;
        cOut, sum : out std_logic);
end FullAdder4;
architecture flatStructure of FullAdder4 is
  -- "xor" is a reserved word in VHDL, hence the component name XOR2
  component XOR2 port (a, b : in std_logic; z : out std_logic); end component;
  component AND2 port (a, b : in std_logic; z : out std_logic); end component;
  component OR3  port (a, b, c : in std_logic; z : out std_logic); end component;
  signal net1, net2, net3, net4 : std_logic;
begin
  -- positional port maps: (inputs..., output)
  u1: XOR2 port map (a, b, net1);
  u2: XOR2 port map (cIn, net1, sum);
  u3: AND2 port map (cIn, a, net2);
  u4: AND2 port map (cIn, b, net3);
  u5: AND2 port map (a, b, net4);
  u6: OR3  port map (net2, net3, net4, cOut);
end flatStructure;

Verilog-HDL:
module FullAdder4 (a, b, cIn, cOut, sum);
  input  a, b, cIn;
  output cOut, sum;
  wire   net1, net2, net3, net4;
  // positional port lists: (output, inputs...)
  XOR2 u1 (net1, a, b);
  XOR2 u2 (sum, cIn, net1);
  AND2 u3 (net2, cIn, a);
  AND2 u4 (net3, cIn, b);
  AND2 u5 (net4, a, b);
  OR3  u6 (cOut, net2, net3, net4);
endmodule
VHDL vs. Verilog: Data Flow Description

VHDL:
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_unsigned.all;   -- "+" on std_logic_vector
entity FullAdder5 is
  port (a, b, cIn : in  std_logic;
        sum, cOut : out std_logic);
end FullAdder5;
architecture dataFlow of FullAdder5 is
  signal tmp : std_logic_vector(1 downto 0);
begin
  tmp  <= ('0' & a) + b + cIn;     -- 2-bit result: carry and sum
  cOut <= tmp(1);
  sum  <= tmp(0);
end dataFlow;

Verilog-HDL:
module FullAdder5 (a, b, cIn, sum, cOut);
  input  a, b, cIn;
  output cOut, sum;
  assign {cOut, sum} = a + b + cIn;
endmodule
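The full adder now appears in three forms:
- the dataflow equations of FullAdder (assignments L1 to L5);
- the gate netlist of FullAdder4;
- the arithmetic description of FullAdder5.

An exhaustive check over all eight input combinations confirms that they describe the same function. It is sketched here in Python, with bitwise operators standing in for the gates; the function names are hypothetical. Such a check also catches slips in hand-written netlists. For example, feeding a instead of b into the second AND gate gives a wrong carry for a = 0, b = 1, cIn = 1.

```python
from itertools import product


def dataflow(a, b, cin):
    """FullAdder, architecture dataflow (assignments L1 to L5)."""
    s1 = a ^ b
    s2 = cin & s1
    s3 = a & b
    return s1 ^ cin, s2 | s3              # (sum, cOut)


def flat_structure(a, b, cin):
    """FullAdder4, architecture flatStructure (gates u1 to u6)."""
    net1 = a ^ b
    total = cin ^ net1
    net2, net3, net4 = cin & a, cin & b, a & b
    return total, net2 | net3 | net4      # (sum, cOut)


def arithmetic(a, b, cin):
    """FullAdder5: {cOut, sum} = a + b + cIn."""
    tmp = a + b + cin
    return tmp & 1, tmp >> 1              # (sum, cOut)


if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        results = {dataflow(a, b, cin), flat_structure(a, b, cin), arithmetic(a, b, cin)}
        assert len(results) == 1, (a, b, cin, results)
    print("all three descriptions agree on all 8 input combinations")
```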
Hierarchy, Abstraction, and Accuracy

- Structural models simply describe interconnections.
- Structural models do not describe any form of behavior.
- Hierarchy expresses different levels of detail.
- Structural models are a way to manage large, complex designs; modern designs have several tens of millions of gates.
- Simulation time: the more detailed the description of a design, the more events are generated, and thus the longer the simulation takes.
[Figure: design hierarchy: FullAdder3 at the top level, built from OR2 and HalfAdder3; HalfAdder3 built from AND2 and XOR2 at the bottom level]
Generics
- The VHDL language provides the ability to construct parameterized models using the concept of generics.

library IEEE;
use IEEE.std_logic_1164.all;
entity AND2 is
  generic (andDelay : Time);
  port (a, b : in std_logic; z : out std_logic);
end AND2;
architecture genericDelay of AND2 is
begin
  z <= a and b after andDelay;
end genericDelay;

library IEEE;
use IEEE.std_logic_1164.all;
entity HalfAdder4 is
  generic (adderDelay : Time := 3 ns);
  port (a, b : in std_logic; sum, carry : out std_logic);
end HalfAdder4;
architecture genericDelay of HalfAdder4 is
  component AND2 is
    generic (andDelay : Time);
    port (a, b : in std_logic; z : out std_logic);
  end component;
  component XOR2 is
    generic (xorDelay : Time);
    port (a, b : in std_logic; z : out std_logic);
  end component;
begin
  -- values for generics can be assigned at different locations;
  -- no semicolon is needed between generic map and port map
  C1: XOR2 generic map (12 ns)      port map (a, b, sum);
  C2: AND2 generic map (adderDelay) port map (a, b, carry);
end genericDelay;
More on Generics
- Within a structural model there are two ways in which the values of generic constants of lower-level components can be specified:
  - in the component declaration
  - in the component instantiation
- If both are specified, the value provided by the generic map() takes precedence.
- If neither is specified, the default value defined in the model is used.

library IEEE;
use IEEE.std_logic_1164.all;
entity GenericOR is
  generic (n : positive := 2);
  port (in1 : in  std_logic_vector((n-1) downto 0);
        z   : out std_logic);
end GenericOR;
architecture behavioral of GenericOR is
begin
  process (in1)
    variable sum : std_logic := '0';
  begin
    sum := '0';
    for i in 0 to (n-1) loop
      sum := sum or in1(i);        -- OR of all n inputs
    end loop;
    z <= sum;
  end process;
end behavioral;
Exercises vlsi21: Hierarchy, Generic

- Ex411 (difficulty: medium): Write VHDL code for an 8-bit ALU based on the definitions made in Ex402 for the Simple1BitALU.
  a) Write behavioral VHDL code in ALU8b.vhd.
  b) Write structural VHDL code for ALU8 in one file, ALU8s.vhd. Assume a delay of 1 ns for all logic gates. What is the worst-case delay of the ALU8?
- Ex412 (difficulty: easy): Write VHDL code for an n-bit register with reset and enable inputs (NbitRegister.vhd).
Configuration
- Structural models may employ different levels of abstraction.
- Each component in a structural model may be described as a behavioral or a structural model.
- Configuration allows stepwise refinement in a design cycle.
- Configuration represents resource binding.
- Description-synthesis design method.
A configuration associates an architecture description (behavioral or structural) with each component, here for FullAdder3:
[Figure: hierarchy of FullAdder3 with the components OR2, HalfAdder3, AND2 and XOR2]
Configuration: Component Binding

- Example of binding architectures: a bit-serial adder.
- One of the different architectures of Comb must be bound to the component C1 for simulation.
- Only the architecture is chosen; the entity stays the same because its interface does not change.
[Figure: bit-serial adder with combinational block C1 (Comb; signals a, b, carryIn, sum, carry) and flip-flop C2 (Dff; signals d, q, clock, reset; external clk, rst); candidate architectures of Comb: gateLevel, highSpeed, lowPower, behavioral]
Configuration: Default Binding Rules

- To analyze different implementations, we simply change the configuration, compile and simulate.
- When newer component models become available, we bind the new architecture to the component.
Default binding rules:
- if the entity name is the same as the component name, this entity is bound to the component
- if there are several architectures of that entity in the working library, the most recently compiled architecture is bound to the entity
Example: Configuration
[Figure: SerialAdder with C1 (Comb, architecture highSpeed; signals in1, in2, carryIn, sum, carry) and C2 (MyDff, architecture behavioral; signals d, q, clk, rst, clock, reset)]
configuration CFG_HighSpeed of SerialAdder is     -- configuration name (used for simulation) of entity name
  for structural
    for C1 : Comb
      use entity WORK.Comb(highSpeed);            -- library name.entity name(architecture name)
    end for;
    for C2 : Dff
      -- if an entity different from the one described by the component is used,
      -- the I/O mapping must be declared
      use entity MyLibrary.MyDff(behavioral)
        generic map (gateDelay => 5 ns)
        port map (my_clk => clk, my_d => d, my_q => q, my_rst => rst);
    end for;
  end for;
end CFG_HighSpeed;
Exercises vlsi21: Configuration

- Ex413 (difficulty: easy): Write VHDL code for the bit-serial adder shown in the previous transparency (SerialAdder.vhd).
  a) Construct a model for the two components Comb and MyDff and place both in your WORK library (don't use the library MyLibrary yet).
  b) Adapt the configuration, compile and simulate it.
- Ex414 (difficulty: medium): Consider the circuit shown below (ConfigExample). Construct a structural model comprised of three components. In the configuration, however, use only two components by using an n-input AND gate.
[Figure: ConfigExample with inputs i1, i2, i3, two AND gates (&), one OR gate (≥1) and output o1]
chapter 6
Subprograms, Packages and Libraries

- VHDL provides mechanisms for structuring programs, reusing software modules, and otherwise managing design complexity.
- Packages contain definitions of procedures and functions that can be shared across different VHDL models.
- Packages may contain user-defined data types and constants and can be placed in libraries.
- Summary: procedures, functions, packages and libraries provide facilities for creating and maintaining modular and reusable VHDL programs.
Functions
- Functions are used to compute a value based on the values of the input parameters. Functions are placed in declarative parts. Example of a function declaration:
  function rising_edge (signal clock : in std_logic) return boolean;
- Functions cannot modify parameter values (procedures can). Example of a function call:
  rising_edge(clk)
- Functions execute in zero simulation time, so wait statements cannot appear in functions. Parameters are restricted to mode in.

-- the mode "in" need not be written
function rising_edge (signal clock : std_logic) return boolean is
  variable edge : boolean := false;
begin
  edge := (clock = '1' and clock'event);
  return edge;
end rising_edge;
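The simplified rising_edge function reports an edge for every event that ends at '1', including 'X' -> '1' or 'H' -> '1'. The rising_edge function of std_logic_1164 is stricter: it requires a transition from a logic 0 ('0' or 'L') to a logic 1 ('1' or 'H'). The Python sketch below compares both on a few transitions. It is not VHDL; values are one-character strings and the helper names are hypothetical.

```python
def to_x01(v: str) -> str:
    """Strength stripping as done by To_X01 in std_logic_1164."""
    if v in ("0", "L"):
        return "0"
    if v in ("1", "H"):
        return "1"
    return "X"


def simple_edge(prev: str, cur: str) -> bool:
    """clock'event and clock = '1' (the simplified function)."""
    return cur != prev and cur == "1"


def ieee_rising_edge(prev: str, cur: str) -> bool:
    """rising_edge of std_logic_1164: event from logic 0 to logic 1."""
    return cur != prev and to_x01(cur) == "1" and to_x01(prev) == "0"


if __name__ == "__main__":
    for prev, cur in [("0", "1"), ("X", "1"), ("U", "1"), ("H", "1"), ("L", "H"), ("1", "0")]:
        print(f"{prev}->{cur}: simple={simple_edge(prev, cur)}, "
              f"ieee={ieee_rising_edge(prev, cur)}")
```

The simplified function fires on 'X' -> '1', 'U' -> '1' and 'H' -> '1', but misses 'L' -> 'H'. This is why the library version is the safer choice for modeling flip-flops.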
Example: Type Conversion with Functions

- As VHDL is a strongly typed language, type conversions are quite often necessary.

-- note: the size of the parameter is not declared; it is taken from the actual argument
function to_bitvector (svalue : std_logic_vector) return bit_vector is
  variable outvalue : bit_vector(svalue'length-1 downto 0);
begin
  for i in svalue'range loop
    case svalue(i) is
      when '0'    => outvalue(i) := '0';
      when '1'    => outvalue(i) := '1';
      when others => outvalue(i) := '0';
    end case;
  end loop;
  return outvalue;
end to_bitvector;

- Many conversion functions, as well as resolution functions, can be found in the std_logic_1164, std_logic_arith and other packages. Have a look at $SYNOPSYS/packages/IEEE/src/
Procedures
?Procedures are subprograms that can modify one or more of the input parameters. Example of procedure declaration reading from a file f: procedure read_v1d (variable f: in text; v: out std_logic_vector); ?if the class of the procedure parameters is not explicitly declared, then the following rules apply:
?parameters of mode in are assumed to be of class constant ?parameters of mode out or inout are assumed to be of class variable
?Variables declared within a procedure are initialized on each call to the procedure and their values do not persists across invocations of the procedure. ?Signals cannot be declared within procedures ?Poor programming: Procedures declared within process can make assignments to signals corresponding to the ports of the encompassing entity. ?Procedure call: Dff(clk=>clk,reset=>reset,d=>s2,q=>s1,qbar=>open);
Example: Procedure
library IEEE;
use IEEE.std_logic_1164.all;

entity CPU is
  port (di:   out std_logic_vector(31 downto 0);
        addr: out std_logic_vector(2 downto 0);
        r, w: out std_logic;
        do:   in  std_logic_vector(31 downto 0);
        s:    in  std_logic);
end CPU;

architecture behavioral of CPU is

  procedure Mread (address: in std_logic_vector(2 downto 0);
                   signal r:    out std_logic;
                   signal s:    in  std_logic;
                   signal addr: out std_logic_vector(2 downto 0);
                   signal data: out std_logic_vector(31 downto 0)) is
  begin
    addr <= address;
    r <= '1';
    wait until s = '1';
    data <= do;
    r <= '0';
  end Mread;

  procedure Mwrite (address: in std_logic_vector(2 downto 0);
                    signal data: in  std_logic_vector(31 downto 0);
                    signal addr: out std_logic_vector(2 downto 0);
                    signal w:    out std_logic;
                    signal di:   out std_logic_vector(31 downto 0)) is
  begin
    addr <= address;
    w <= '1';
    wait until s = '1';
    di <= data;
    w <= '0';
  end Mwrite;

begin
  -- CPU behavioral description (processes calling Mread and Mwrite)
end behavioral;
Overloading
- A very useful feature of the VHDL language is the ability to overload a subprogram or an operator.
- Imagine writing different flip-flop models, without and with asynchronous inputs and with different argument types. With overloading, one single flip-flop name can be used for all of them.
- Example Dff calls: Dff(clk,d,q,qbar); Dff(clk,d,q,qbar,reset,clear);
- From the type and number of the arguments, the analyzer can tell which procedure is meant.
- Note that in std_logic_1164 the logical operators and, or, etc. are overloaded for the std_logic types, while the arithmetic operators +, *, etc. are predefined only for certain types of the language, such as integer. Packages such as std_logic_arith (and std_logic_unsigned) overload them further, e.g.:
function "*" (arg1, arg2: std_logic_vector) return std_logic_vector;
function "+" (arg1, arg2: signed) return signed;
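A minimal sketch of the two overloaded Dff calls above, assuming a hypothetical package dff_pkg. The reset/clear semantics of the six-parameter version are an assumption (asynchronous active-low reset, synchronous active-high clear); the slides do not define them.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package (illustration only): two Dff procedures sharing
-- one name; the number of parameters selects the version.
package dff_pkg is
  procedure Dff (signal clk, d  : in  std_logic;
                 signal q, qbar : out std_logic);
  procedure Dff (signal clk, d       : in  std_logic;
                 signal q, qbar      : out std_logic;
                 signal reset, clear : in  std_logic);
end package dff_pkg;

package body dff_pkg is
  -- plain D flip-flop, triggered on the rising clock edge
  procedure Dff (signal clk, d  : in  std_logic;
                 signal q, qbar : out std_logic) is
  begin
    if rising_edge(clk) then
      q    <= d;
      qbar <= not d;
    end if;
  end procedure Dff;

  -- assumed: reset asynchronous and active low, clear synchronous and active high
  procedure Dff (signal clk, d       : in  std_logic;
                 signal q, qbar      : out std_logic;
                 signal reset, clear : in  std_logic) is
  begin
    if reset = '0' then
      q    <= '0';
      qbar <= '1';
    elsif rising_edge(clk) then
      if clear = '1' then
        q    <= '0';
        qbar <= '1';
      else
        q    <= d;
        qbar <= not d;
      end if;
    end if;
  end procedure Dff;
end package body dff_pkg;
```

Written as concurrent procedure calls in an architecture, Dff(clk, d, q1, q1bar); and Dff(clk, d, q2, q2bar, rst, clr); each behave like a process sensitive to the signal parameters of mode in; the analyzer picks the version from the number of arguments.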
Packages
- Logically related functions and procedures can be grouped into packages and thus easily be shared among designs and people.
package MyLibraryPackage is        -- package declaration: similar to a VHDL entity, defines the interfaces
  -- type declarations
  -- function declarations
  -- procedure declarations
end MyLibraryPackage;

package body MyLibraryPackage is   -- package body: similar to a VHDL architecture, defines the behavior
  -- functions
  -- procedures
end MyLibraryPackage;
- The package declaration must be analyzed first; then the package body can be analyzed.
- Packages are placed in libraries and referenced within VHDL design units via the use clause.
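A small sketch of a package and its use, with hypothetical names demo_pkg, majority and voter: the declaration is analyzed first, then the body, and the entity makes the package visible with a use clause. The library/use clauses for ieee are repeated because each primary design unit needs its own context clauses.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the interface (similar to an entity).
package demo_pkg is
  function majority (a, b, c : std_logic) return std_logic;
end package demo_pkg;

-- Package body: the behavior (similar to an architecture).
package body demo_pkg is
  function majority (a, b, c : std_logic) return std_logic is
  begin
    return (a and b) or (a and c) or (b and c);
  end function majority;
end package body demo_pkg;

-- A design unit referencing the package via the use clause.
library ieee;
use ieee.std_logic_1164.all;
use work.demo_pkg.all;

entity voter is
  port (a, b, c : in  std_logic;
        y       : out std_logic);
end entity voter;

architecture dataflow of voter is
begin
  y <= majority(a, b, c);
end architecture dataflow;
```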
Example: Package Declaration

package std_logic_1164 is
  type std_ulogic is ('U',  -- uninitialized
                      'X',  -- forcing unknown
                      '0',  -- forcing 0
                      '1',  -- forcing 1
                      'Z',  -- high impedance
                      'W',  -- weak unknown
                      'L',  -- weak 0
                      'H',  -- weak 1
                      '-'   -- don't care
                     );
  type std_ulogic_vector is array (natural range <>) of std_ulogic;
  function resolved (s: std_ulogic_vector) return std_ulogic;  -- resolution function
  subtype std_logic is resolved std_ulogic;
  type std_logic_vector is array (natural range <>) of std_logic;
  function "and" (l, r: std_logic_vector) return std_logic_vector;
  function "and" (l, r: std_ulogic_vector) return std_ulogic_vector;
  -- rest of package declaration
end std_logic_1164;

Libraries
- Each design unit (entity, architecture, package) is analyzed (compiled) and placed in a design library.
- Libraries are generally implemented as directories and are referenced by a logical name.
- In VHDL the libraries STD and WORK are implicitly declared.
- WORK is the working design library, normally placed in a local directory.
- Once a library has been declared, the functions, procedures and type declarations of its packages can be made visible with a use clause:
library IEEE; use IEEE.std_logic_1164.all;      -- all functions, procedures and types are visible
library IEEE; use IEEE.std_logic_1164."xnor";   -- only the xnor function is visible
- Visibility must be established separately for each design unit (an architecture inherits the library and use clauses of its entity).
Example: Libraries and Packages (Synopsys tools on UNIX workstations)

Directory structure:
/home/MyHome/VHDLdesign/     design environment with the setup file .synopsys_vss.setup
  WORK/                      library WORK: all compiled designs
  lib/                       library MyLibrary: compiled package MyPackage
  src/                       all VHDL source files, e.g. the package source MyPackage.vhd

.synopsys_vss.setup:
DEFAULT   : ./WORK
MyLibrary : ./lib
use       = . ./src
timebase  = ns

src/MyPackage.vhd:
package MyPackage is
  ...
end MyPackage;
package body MyPackage is
  ...
end MyPackage;

MyVHDLdesign.vhd:
library MyLibrary;
use MyLibrary.MyPackage.all;   -- (use MyLibrary.all; would only make the design units of the library visible, not the package contents)
entity MyVHDLdesign is
  ...

In a UNIX shell:
cd /home/MyHome/VHDLdesign
gvan -w MyLibrary src/MyPackage.vhd    # analyze the package MyPackage into MyLibrary
gvan MyVHDLdesign.vhd                  # analyze the design MyVHDLdesign into WORK

Components can also be placed into libraries.
Exercises vlsi21: Libraries & Packages

- Ex415 (difficulty: medium): The small circuit ConfigExample from exercise Ex414 shall be rewritten using the components OR2 and ANDn from the library MyLibrary. a) Write the VHDL file MyComponents.vhd holding the two components OR2 and ANDn and compile it into the library MyLibrary. b) Rewrite the ConfigExample circuit using only library elements, call it LibraryExample.vhd, then compile and simulate it.
[Figure: circuit with inputs i1, i2, i3, two ANDn gates (&) and one OR2 gate (≥1) driving the output o1]
- Ex416 (difficulty: medium): Write the VHDL package MyPackage with the functions OneCounter (counting the '1' bits) and ParityGenerator (both should accept std_logic_vectors or bit_vectors of any size), and analyze it into the library MyLibrary. Use the defined functions in your design PackageExample to show their functionality.
Exercises vlsi21: Packages

- Ex417 (difficulty: medium): The bit-serial adder of exercise Ex413 shall be rewritten using a procedure call for the Dff instead of a component (SerialAdder2.vhd). Place the procedure into a package MyPackage and analyze it into the library MyLibrary. Verify the functionality.
[Figure: SerialAdder2: combinational block Comb (behavioral) with inputs in1, in2, carryIn and outputs sum, carry; Dff from library MyLibrary (d, q, clock, reset); external signals clk, rst]

VHDL vs. Verilog: Data Types

VHDL
- type-driven (strongly typed) language
- predefined data types in packages: character, integer, real, bit, std_logic, textio, ...
- enumerated types
- arrays
- records
- pointers (access types)

Verilog-HDL
- arrays
- elaboration-time constants: parameter
- continuously driven nets: wire, tri, ...
- triggered assignments: reg, integer, real, ...

VHDL vs. Verilog: Operators

Operator type     Function                  VHDL                    Verilog-HDL
arithmetic        a+b, a-b, a*b, a/b        +  -  *  /              +  -  *  /
                  a-b*n (modulus)           mod
                  a-(a/b)*b (remainder)     rem                     %
logical           a and b                   and                     &
                  a or b                    or                      |
                  not(a and b)              nand                    ~(a & b)
                  a xor b                   xor                     ^
shift logic                                 srl, sll                >>, <<
shift arith.                                sra, sla
rotate                                      ror, rol
reduction                                                           &a, |a, ^a, ...
concatenation                               &                       {a,b}
replication                                                         {4{a}}
relational                                  =  /=  <  <=  >  >=     ==  !=  <  <=  >  >=

VHDL vs. Verilog: Sequential Structures

VHDL (sequentially executed statements):
-- inside an architecture ("+" on std_logic_vector requires, e.g., IEEE.std_logic_unsigned)
...
signal inp: std_logic_vector(7 downto 0);
...
process (clk)
  variable outp, cout: std_logic_vector(7 downto 0);
begin
  if (clk'event and clk = '1') then
    outp := outp + inp;
    cout := outp + 1;
  end if;
end process;
...

Verilog-HDL:
/* inside a module */
...
wire [7:0] inp;
reg  [7:0] outp, cout;
...
always @(posedge clk)
begin
  outp = outp + inp;
  cout = outp + 1;
end
...
VHDL vs. Verilog: Parallel Structures

VHDL (parallel executed statements):
-- in an architecture ...
signal inp: std_logic_vector(7 downto 0);
signal outp, cout: std_logic_vector(7 downto 0);
...
p1: process (clk)
begin
  if (clk'event and clk = '1') then
    outp <= outp + inp;
    cout <= outp + 1;
  end if;
end process;
p2: process (reset)
begin
  if (reset = '0') then
    outp <= "00000000";
  end if;
end process;
-- note: outp has two drivers (p1 and p2)
...

Verilog-HDL (parallel executed blocks):
/* in a module */
...
wire [7:0] inp;
reg  [7:0] outp, cout;
...
always @(posedge clk)
  fork
    outp = outp + inp;
    cout = outp + 1;
  join
always @(reset)
  if (!reset) outp = 8'b0;
...
VHDL vs. Verilog: Assignments

VHDL (signal and variable assignments):
architecture ex1 of AssignExample is
  signal x1, y1, y2, z1, z2: std_logic_vector(7 downto 0);
  ...
begin
  p1: process (clk)
  begin
    if (clk'event and clk = '0') then
      x1 <= y1;              -- signal assignments
      y1 <= x1;
      z1 <= y1 after 12 ns;
    end if;
  end process;
  p2: process (y2)
    variable x2: std_logic_vector(7 downto 0);
  begin
    x2 := y2;                -- variable assignment
    y2 <= x2;
    z2 <= y2 after 12 ns;
  end process;
end ex1;

Verilog-HDL:
module AssignExample;
  wire [7:0] x2, y2, z2;
  reg  [7:0] x1, y1, z1;
  ...
  always @(negedge clk)
    fork
      x1 = y1;
      y1 = x1;
      z1 = #12 y1;
    join
  assign x2 = y2;
  assign y2 = x2;
  assign #12 z2 = y2;
endmodule

Before the falling edge of clk: x = 1, y = 2, z = 3. 12 ns after the falling edge of clk: x = ?, y = ?, z = ?
VHDL vs. Verilog: Sequential Logic

VHDL (register with asynchronous reset):
library IEEE;
use IEEE.std_logic_1164.all;
package MyDefinition is
  type vector16 is array (15 downto 0) of std_logic;
end MyDefinition;

library IEEE;
use IEEE.std_logic_1164.all;
use work.MyDefinition.all;
entity AsynRegister is
  port (clk, rst: in std_logic;
        a: in vector16;
        z: out vector16);
end AsynRegister;

architecture DemoExample of AsynRegister is
begin
  process (clk, rst)
  begin
    if (rst = '0') then
      z <= vector16'(others => '0');
    elsif (clk'event and clk = '1') then
      z <= a;
    end if;
  end process;
end DemoExample;

Verilog-HDL:
module AsynRegister(clk, rst, a, z);
  input clk, rst;
  input [15:0] a;
  output [15:0] z;
  reg [15:0] z;
  always @(posedge clk or negedge rst)
    if (rst == 1'b0)
      z <= 16'b0;
    else
      z <= a;
endmodule
Dataflow Modeling
library IEEE;
use IEEE.std_logic_1164.all;
entity Demux2x4 is
  port (a, b, enable: in std_logic;
        z: out std_logic_vector(0 to 3));
end Demux2x4;

architecture dataflow of Demux2x4 is
  signal abar, bbar: std_logic;
begin
  z(3) <= not(a and b and enable);
  z(0) <= not(abar and bbar and enable);
  abar <= not a;
  z(2) <= not(a and bbar and enable);
  bbar <= not b;
  z(1) <= not(abar and b and enable);
end dataflow;

All concurrent signal assignments (<=) execute concurrently; each new value takes effect after a specified delay, which defaults to one delta, an infinitesimally small delay. The order of the statements therefore does not matter. Concurrent statements are always active: whenever a, b or enable changes, abar, bbar and z(0 to 3) also change after some delay. Because a signal receives its new value only after a delay, the following statement is meaningful (it generates a periodic waveform, provided clk has been initialized, e.g. to '0'): clk <= not clk after 10 ns;
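The clock statement only works if the signal starts from a defined value. A minimal sketch of a reusable simulation clock, with hypothetical names clock_gen, HALF_PERIOD and stop; the stop input is an addition that lets the simulation run out of events instead of toggling forever.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Sketch of a simulation clock based on  clk <= not clk after ...;
entity clock_gen is
  generic (HALF_PERIOD : time := 10 ns);
  port (stop : in  boolean := false;  -- true: the clock stops at '0'
        clk  : out std_logic);
end entity clock_gen;

architecture sim of clock_gen is
  -- The initial value matters: with the default 'U', "not clk_i" is
  -- again 'U' and the clock would never toggle.
  signal clk_i : std_logic := '0';
begin
  clk_i <= not clk_i after HALF_PERIOD when not stop else '0';
  clk   <= clk_i;
end architecture sim;
```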
VHDL Example: Behavioral Modeling

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.std_logic_arith.all;
entity Demux2x4 is
  port (a, b, enable: in std_logic;
        z: out std_logic_vector(0 to 3));
end Demux2x4;

architecture behavioral of Demux2x4 is
begin
  process (a, b, enable)
    -- local variables (separate copies for each instance of Demux2x4)
    variable abar, bbar: std_logic;
  begin
    abar := not a;
    bbar := not b;
    if (enable = '1') then
      z(3) <= not(a and b);
      z(2) <= not(a and bbar);
      z(1) <= not(abar and b);
      z(0) <= not(abar and bbar);
    else
      z <= "1111";  -- vector constant
    end if;
  end process;
end behavioral;

Statements within a process are executed sequentially, like a program. The process is scheduled for execution after an event occurs on any signal in its sensitivity list. Values of local variables are maintained between executions. Process statements can be compiled, so behavioral simulations can be quite fast.
Synthesis
Idea: once a behavioral model has been finished, why not use it to automatically synthesize a logic implementation, in much the same way as a compiler generates executable code from a source program?
a.k.a. silicon compilers
Synthesis programs process the HDL and then:
- infer logic and state elements
- perform technology-independent optimizations (e.g., logic simplification, state assignment)
- map elements to the target technology
- perform technology-dependent optimizations (e.g., multi-level logic optimization, choosing gate strengths to achieve speed goals)
Synopsys, Inc. is the current leader in providing synthesis tools and synthesizable HDL modules.

Logic Synthesis
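Z <= (A and B) or C;
[Figure: synthesized as an AND gate (A, B) followed by an OR gate with input C, output Z]

if (SEL = '1') then Z <= B; else Z <= A; end if;
[Figure: synthesized as a 2:1 multiplexer with data inputs A (0) and B (1), select SEL, output Z]

signal x, y, sum: std_logic_vector(3 downto 0);
sum <= unsigned(x) + unsigned(y);  -- "+" from IEEE.std_logic_arith
[Figure: synthesized as a ripple chain of four full adders with inputs X(i), Y(i) and outputs SUM(i); the final carry-out is not connected (NC)]

process (word)
  variable result: std_logic;
begin
  result := '0';
  for j in 0 to 3 loop
    result := result xor word(j);
  end loop;
  parity <= result;
end process;
[Figure: synthesized as a chain of XOR gates over WORD(3) ... WORD(0) producing PARITY]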

FSM Example
architecture behavioral of Moore is
  type StateType is (S0, S1, S2, S3);
  signal current, next_state: StateType;  -- "next" is a reserved word
begin
  process (current, a, x)  -- state transition
  begin
    next_state <= current;  -- default: stay in the current state (avoids latches)
    case current is
      when S0 => if (a = '1') then next_state <= S2; end if;
      when S1 => if (a = '1') then next_state <= S0; else next_state <= S2; end if;
      when S2 => if (x = '1') then next_state <= S3; end if;
      when S3 => if (x = '1') then next_state <= S1; end if;
    end case;
  end process;

  process (current)  -- output logic
  begin
    case current is
      when S0 => z <= '0';
      when S1 => z <= '0';
      when S2 => z <= '1';
      when S3 => z <= '1';
    end case;
  end process;

  process (clk, reset)  -- state register
  begin
    if (reset = '0') then
      current <= S0;
    elsif (clk'event and clk = '1') then
      current <= next_state;
    end if;
  end process;
end behavioral;
[Figure: state diagram of the Moore machine with states S0 (z=0), S1 (z=0), S2 (z=1), S3 (z=1); transitions labeled with the input a]
Further Reading
- ISBN 0-13-181447-8
- ISBN 0-7923-9472-0
- D. Perry, VHDL, Second Edition, McGraw-Hill, 1993
- VHDL tutorials on the I3S CD or on the web: http://www.microlab.ch/academics/courses/vlsi/g.html
- Don't forget to study the CBT tutorial on VHDL.

Test Bench /1
- avoid interactive simulation, because it can never be reused
- test benches reduce the total simulation development time
- test benches are used to verify designs during stepwise refinement
- the test bench methodology bridges simulation with automatic test equipment (ATE)
"I can relax, my test bench does everything automatically."

Test Bench /2
- Compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is nice measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming into or going out of the lab

"Why do we need MicroLab if my test bench does the job as well?"

Test Bench in Design Flow

[Figure: the same test bench (stimuli inp, responses out) is reused at every step of the design flow]
- design of VHDL model -> simulation of VHDL model (test bench + VHDL model)
- FPGA: FPGA synthesis, place & route -> FPGA test (debugger): test bench + test machine + FPGA chip
- ASIC: synthesis of logic model -> simulation of logic model (test bench); place & route (physical design) -> simulation of extracted model (test bench); ASIC fabrication -> prototype test (ASIC): test bench + test machine + ASIC chip
VHDL Test Bench

library IEEE;
use IEEE.std_logic_1164.all;

entity TestBench is  -- a test bench has no inputs and no outputs
end TestBench;

architecture sample of TestBench is
  signal clk, a: bit;
  signal b: bit;
  component MyCircuit
    port (clk, a: in bit; b: out bit);
  end component;
begin
  DUT: MyCircuit port map (clk, a, b);  -- call of the device under test (DUT)

  process  -- clk generation
  begin
    clk <= '0', '1' after 20 ns, '0' after 70 ns;
    wait for 100 ns;
  end process;

  TestPatternGenerator: block  -- test pattern generation on a cycle-by-cycle basis
  begin
    process
    begin
      a <= '0';  -- test cycle 1
      wait for 100 ns;
      a <= '1';  -- test cycle 2
      wait for 100 ns;
      ...
    end process;
  end block;
  -- response pattern verification is not yet implemented!
end sample;
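The response verification that the test bench leaves open can be added with assert statements. A minimal sketch, assuming a hypothetical MyCircuit that registers a onto b on the rising clock edge; the clock waveform per 100 ns test cycle is taken from the test bench above, and the captured response is compared with an expected value once per cycle, after the active edge.

```vhdl
-- Hypothetical DUT (illustration only): b follows a on the rising clk edge.
entity MyCircuit is
  port (clk, a : in bit; b : out bit);
end entity MyCircuit;

architecture rtl of MyCircuit is
begin
  process (clk)
  begin
    if clk'event and clk = '1' then
      b <= a;
    end if;
  end process;
end architecture rtl;

-- Self-checking test bench: apply pattern, clock, capture, compare.
entity TestBenchCheck is
end entity TestBenchCheck;

architecture sample of TestBenchCheck is
  signal clk, a, b : bit;
  constant stimuli  : bit_vector(0 to 3) := "0110";  -- a per test cycle
  constant expected : bit_vector(0 to 3) := "0110";  -- b per test cycle
begin
  DUT : entity work.MyCircuit port map (clk => clk, a => a, b => b);

  process
  begin
    for i in stimuli'range loop
      a   <= stimuli(i);                       -- apply input pattern
      clk <= '0', '1' after 20 ns, '0' after 70 ns;
      wait for 90 ns;                          -- capture after the edge
      assert b = expected(i)
        report "test cycle " & integer'image(i + 1) & ": wrong response"
        severity error;
      wait for 10 ns;
    end loop;
    report "all test cycles done";
    wait;
  end process;
end architecture sample;
```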
Test Bench - Test Cycle

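- design strictly synchronous circuits
- use cycle-based test benches: each test cycle applies the input patterns and captures the output response
[Figure: test cycle timing: the inputs are valid and stable around the active clock edge; the output is captured while it is stable]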

ProTest Test Machine

- the test bench controls both the CAD simulator and the test machine
- low-cost rapid prototyping and test system

Conclusions
- HDLs are very useful for behavioral hardware system descriptions
- abstract models do not precisely reflect reality
- restriction to synthesizable coding is necessary
- technology independence enables fast FPGA prototyping
- test benches increase chip quality and greatly reduce simulation time

Exercises: VLSI-21: Test-Bench

- CAD Ex418: Test-Bench (difficulty: easy): Instead of interactive simulation or writing macros for interactive simulation, it is state of the art to use test benches for simulation and chip test. Write a test bench file tb_SerialAdder2.vhd for the previous exercise Ex417. Generate the clock signal with a process and write sequential test cycles for the input signals. Be aware that the test bench has no input and output signals, but instantiates the unit under test (UUT) and generates all stimuli.
[Figure: test bench driving the inputs (inp) of the VHDL model and reading its outputs (out)]

Coming Up...
- Next topic:
  CAD exercises and mini FPGA projects: PWM, blackjack dealer, simple microprocessor, etc.
- Readings for next time:
  - VHDL tutorials
  - Jacomet et al., "A Prototype Test System for ASICs and FPGAs with a Tight Link to VHDL and Verilog-HDL Based CAD Simulators", DATE'99, Design, Automation and Test in Europe Conference (see the MicroLab web)
  - Jacomet et al., "On a Development Environment for Real-Time Information Processing in System-on-Chip Solutions", SBCCI'01, IEEE 14th Symposium on Integrated Circuit and System Design, Brazil, Sept. 2001 (see the MicroLab web)

Exercises: VLSI-21
#1
- CAD Ex45x: PWM Project (difficulty: easy; time: medium): Design of a pulse width modulator (PWM) controlling a DC motor. The PWM shall have a microprocessor interface. The VHDL design is simulated, compiled and implemented in an FPGA, and is supposed to drive a small DC motor.
- CAD Ex450 (difficulty: easy): Design the VHDL code of the PWM element. The btrdy and ack signals are handshake signals for communication with the microprocessor data bus. A value of 0 on the 8-bit data bus switches off the DC motor (pOut = '1'); a non-zero value generates a PWM signal with an on-time of (data/256)*100% of a period. Analyze the VHDL syntax with gvan.
[Figure: PWM block with signals btrdy, ack, data (8 bit), clk, rst and output pOut; pOut waveform with an on-time of (data/256)*100% of the PWM period]
Exercises: VLSI-21
#2
- CAD Ex451 (difficulty: easy): Design a test bench for the PWM. Simulate your VHDL code with the Synopsys VSS simulator and use your test bench to verify its correct behavior.
  Result: see exercise Ex451 on the MicroLab web
- CAD Ex452 (difficulty: easy): Synthesize the PWM VHDL code into a gate-level schematic for a Xilinx FPGA target technology. Connect your VHDL signals to the correct FPGA pins. Perform the place & route of the logic elements.
  Result: see exercise Ex452 on the MicroLab web
- CAD Ex453 (difficulty: easy): Download your PWM circuit into an FPGA and apply different PWM values to it using the GECKO User Interface tool. Use an oscilloscope to verify its correct output behavior. This exercise has to be done in MicroLab, using the GECKO system.
  Result: see exercise Ex453 on the MicroLab web
Exercises: VLSI-21
#3
- CAD Ex400 (difficulty: easy): Design a 2:1 multiplexer in VHDL. Use a dataflow model. Simulate your VHDL code with the Synopsys VSS simulator. Get familiar with interactive simulation.
  Result: see exercise Ex400 on the MicroLab web
- CAD Ex401 (difficulty: medium): Design the SN74160 synchronous decimal counter in VHDL. Use a behavioral model. Simulate your VHDL code with the Synopsys VSS simulator and use macros for interactive simulation.
  Result: see exercise Ex401 on the MicroLab web
- CAD Ex402 (difficulty: easy): Schematic entry of the blackjack dealer at block level using the Synopsys SGE tool.
  Result: see exercise Ex402 on the MicroLab web

Exercises: VLSI-21
#4
- CAD Ex403 (difficulty: easy): Skeleton VHDL code generation for the blackjack dealer from block-level schematics using the Synopsys SGE tool.
  Result: see exercise Ex403 on the MicroLab web
- CAD Ex404 (difficulty: medium): Design the VHDL code of your blackjack dealer. Use the prepared templates as a guide.
  Result: see exercise Ex404 on the MicroLab web
- CAD Ex405 (difficulty: medium): Write a VHDL test bench for your blackjack dealer. Generate the following sequence of card values: 3, 11, 11, 7, 2, 11, 6.
  Result: see exercise Ex404 on the MicroLab web (final result is: 21)

Exercises: VLSI-21
#5
- CAD Ex406 (difficulty: easy): Synthesize the blackjack dealer VHDL code into a gate-level schematic for a Xilinx FPGA target technology. Perform the place & route of the logic elements.
  Result: see exercise Ex406 on the MicroLab web
- CAD Ex407 (difficulty: medium): Perform a back-annotation of your FPGA chip and simulate it again with the real timing information. Does your chip still work? Look for the errors.
  Result: see exercise Ex407 on the MicroLab web
- CAD Ex408 (difficulty: medium): Download your blackjack dealer circuit into an FPGA and run your test bench again on the ProTest test system. This exercise has to be done in MicroLab.
  Result: see exercise Ex408 on the MicroLab web

VLSI Design I
Automatic Synthesis of Digital Circuits
Why should I not enjoy life instead of drawing schematics if CAD tools can do the job for me?
Overview: design abstraction domains, architectural models. Goal: You are familiar with the design abstraction domains, the description-synthesis design method, the design strategies and the three synthesis steps. You know the FSMD architectural model as well as the interprocess communication models.
Introduction
- system complexity is increasing
- product lifetime is decreasing
- design efficiency is essential
- new design methods are necessary
- higher abstraction levels are introduced
- CAD tools able to handle large amounts of data are needed

Design Methodology
[Figure: design methodology]
- specification: budget ($, speed, area, power, schedule, risk), high-level architecture, low-level building blocks (paper & pencil, SPICE)
- behavioural design, verification
- logic design, verification: schematics, simulation, timing analysis
- layout, verification: layout, DRC, extraction, net compare, LVS (layout vs. schematic)
"Gee, I skipped these steps when doing the project!"
Capture-Simulation Method

- bottom-up approach
- the structure of a system is described
- the knowledge of an experienced designer is difficult to automate
[Figure: schematic of gates (&) and a clocked register with signals clk, ena, data]

Description-Synthesis Method

- top-down approach
- the behaviour of a system is described
- technology independent
- CAD algorithms can search the solution space very quickly
[Figure: a behavioural description such as "if data-ready then bus := data; else bus := high-Z; end if;" is synthesized into gates and clocked (clk) storage elements]

Design methods for VLSI circuits

- use the advantages of both top-down and bottom-up design methods
- automatic optimisations are not always ideal, but further optimising a 70000-gate design that already fits on a 100000-gate gate array makes no sense
- abstract design languages are needed
- the design cycle must be kept short
"So what is it now, top-down or bottom-up?"

Abstraction Domains
VLSI designs can be performed in 3 abstraction domains:
- behavioural domain
- structural domain
- physical domain
Each domain gives the designer different freedoms, e.g.:
- parallel or serial algorithms
- logic technology and bit-slice
- full-custom and macro-cells
- ...

Abstraction Domains: Y-Chart

[Figure: Y-chart with three axes. Behavioural domain: applications, algorithms; programs; subroutines, Boolean equations; instructions. Structural domain: processors; ALUs, registers; logic gates; transistors. Physical domain: chips, MCMs, boards; chips, modules; cells; layout, transistors. Concentric circles mark the system, microarchitecture, logic and circuit abstraction levels; synthesis maps the behavioural domain onto the structural domain.]

Behavioural Domain

- description and verification of first ideas
- the function, not the implementation, is asked for
- modelling with:
  - general purpose languages: Modula-2, Pascal, C, C++, Lisp, ...
  - MATLAB, Mathematica, ...
  - VHDL, Verilog-HDL, Cathedral, ...
  - graphic languages such as VEE, ...
- transformation to the structural domain: synthesis
[Figure: behavioural axis of the Y-chart: programs, subroutines, Boolean equations, instructions]

Structural Domain
- description and verification of a solution
- implementation decisions are taken
- restrictions like delay, signal strength, etc.
- modelling styles: VHDL, Verilog-HDL, schematics
- transformation to the physical domain: logic minimization, place and route tools
[Figure: structural axis of the Y-chart: processors, ALUs, registers, logic gates, transistors]

Physical Domain
- description and verification of the physical implementation
- process-technology-specific implementation: floorplan, mask layout, packaging
- description formats: CIF, GDS2, stick diagrams, symbolic layout
[Figure: physical axis of the Y-chart: layout, transistors; cells; chips, modules; chips, MCMs, boards]

Abstraction Levels
Design domains are divided into several abstraction levels:
- system level
- microarchitecture level
- logic level
- circuit level

Abstraction: System Level

- highest abstraction level
- description with HDLs or graphical block diagrams
[Figure: system-level block diagram: 64 bit RISC, 24 bit graphic accelerator, video interface, 64 MByte memory, 8 GByte hard disk, ISDN interface]

Abstraction: Microarchitecture Level

- register transfer: the system is a pure sequential machine
- use of memory elements and combinational logic
- register transfer is a complete specification of what a chip will do on every cycle
[Figure: registers (reg) alternating with combinational logic blocks between input and output]

Abstraction: Logic Level

- circuit description on a quite low abstraction level
- today only used to design optimised functional blocks
[Figure: ALU at logic level with inputs a, b, cin, select sel (multiplexer), outputs s, cout]

Abstraction: Circuit Level

- lowest abstraction level
- transistor schematic or mask layout
- comparable to machine code in computer science
[Figure: transistor-level schematic with signals a, b, c, y]

Design Strategies
- the goal is the fastest possible transfer of an idea to a chip
- descriptions in the 3 abstraction domains
- strategies used:
  - hierarchy
  - regularity
  - modularity
  - locality
"A strategy? Why not ad hoc?"

Design Strategies: Hierarchy
- basic idea: divide and conquer
- divide into modules and sub-modules until the complexity of the sub-modules is comprehensible
- comparison to software engineering: split programs into modules, procedures, subroutines
[Figure: an adder (inputs cin, a, b; outputs sum, cout) and its decomposition into sub-modules]

Design Strategies: Regularity
- goal: reduction of complexity
- idea: divide into similar building blocks: identical blocks, sub-blocks, cells, transistor sizes
- 1-dim. arrays: bit-slice technique
- 2-dim. arrays: systolic arrays
[Figure: 4-bit ripple-carry adder built from four identical full-adder slices; slice i has inputs ai, bi and carry ci-1, outputs si and ci]
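The bit-slice idea maps directly onto a VHDL for-generate statement. A minimal sketch, assuming hypothetical entity names full_adder and ripple_adder: one full-adder cell is written once and replicated N times, with the carry chain c(i) to c(i+1) connecting neighbouring slices.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One bit slice: a full-adder cell.
entity full_adder is
  port (a, b, cin : in  std_logic;
        s, cout   : out std_logic);
end entity full_adder;

architecture dataflow of full_adder is
begin
  s    <= a xor b xor cin;
  cout <= (a and b) or (a and cin) or (b and cin);
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- N identical slices replicated with for-generate (bit-slice technique).
entity ripple_adder is
  generic (N : positive := 4);
  port (a, b : in  std_logic_vector(N-1 downto 0);
        cin  : in  std_logic;
        s    : out std_logic_vector(N-1 downto 0);
        cout : out std_logic);
end entity ripple_adder;

architecture structural of ripple_adder is
  signal c : std_logic_vector(N downto 0);  -- carry chain between the slices
begin
  c(0) <= cin;
  slices : for i in 0 to N-1 generate
    fa : entity work.full_adder
      port map (a => a(i), b => b(i), cin => c(i), s => s(i), cout => c(i+1));
  end generate slices;
  cout <= c(N);
end architecture structural;
```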

Design Strategies: Modularity
- different modules should not influence each other
- sub-modules with well-formed interfaces:
  - do not use transmission gates
  - well-defined signal types and strengths
  - well-defined interconnection widths, etc.

Design Strategies: Locality

- idea: reduction of complexity through information hiding
- few global variables
- reduction of inter-module influences
- reduction of global wiring
- time locality leads to synchronous designs (compare local variables in software engineering)
"I can't see anything"

Automatic Synthesis /1
- automatic synthesis: transformation of a design from the behavioural to the structural domain
- silicon compilation: transformation from the behavioural to the physical domain
[Figure: Y-chart with synthesis (behavioural to structural domain) and silicon compilation (behavioural to physical domain)]

Automatic Synthesis /2
- automatic synthesis tools for high abstraction levels do not exist yet
- not every description is synthesizable
- synthesis is a design process and not only coding as in software engineering
- synthesis steps:
  - allocation
  - scheduling
  - binding

Automatic Synthesis: Allocation

- allocation defines the necessary resources
- clocking strategy, pipelining, memory structure etc. have to be defined
- manual allocation reduces the search space of design solutions
- trade-off between chip area and performance
- parallel implementations of designs have high throughput, but consume large areas
[Figure: delay vs. area trade-off of the design solutions s1, s4, s6, s8, s10, s14, s18, s22]
Allocation: Example
RTL example:
x = a + b;
y = a * c;
z = x + d;
x = y - d;
x = x + c;
allocation: 1 adder, 1 multiplier, 1 subtractor
[Figure: data-flow graph: a + b (x), a * c (y), x + d (z), y - d (x2), x2 + c (x3)]
Automatic Synthesis: Scheduling

- scheduling defines the sequencing of the operations
- operations are bound to clock cycles
- scheduling principles:
  - resource-limited: given a set of resources, a solution with minimal execution time has to be found
  - time-limited: given a total execution time, a minimal set of resources has to be found

Scheduling: Example
resource-limited scheduling:
- each operation is bound to a clock cycle
- solutions for minimal execution time
- directed acyclic graphs can be used
[Figure: scheduled data-flow graph: cycle 1: a + b, a * c (y); cycle 2: x + d (z), y - d (x2); cycle 3: x2 + c (x3)]

Automatic Synthesis: Binding

- binding phase: operations and memory accesses within the clock cycles are bound to hardware resources
- resources can be reused in different clock cycles
- binding steps:
  - variables are bound to memory elements
  - operations are bound to functional blocks
  - interconnection elements are bound for data transfers (busses, multiplexers)

Binding: Example
- variables are bound to memories
- the temporary variables x1 and x2 are not used simultaneously, so they can share one register
[Figure: cycle 1: x1 = a + b, y = a * c; cycle 2: z = x1 + d, x2 = y - d; cycle 3: x3 = x2 + c]

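Binding: Example (cont.)
[Figure: bound datapath: registers for a, b, c, d and y, one register shared by x1 and x2; multiplexers select the operands; the adder computes z, x1 and x3, the subtractor computes x2, the multiplier computes y]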
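The allocation, schedule and binding above can be turned into a small controller plus datapath. A minimal sketch, assuming 8-bit unsigned operands (the product is truncated to 8 bits) and hypothetical names sched_example, x12, add_l and add_r: one adder with input multiplexers is reused in all three cycles, x1 and x2 share the register x12, and a controller with one state per scheduled cycle sequences the operations.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical FSMD for: x1 = a + b; y = a * c; z = x1 + d; x2 = y - d; x3 = x2 + c;
entity sched_example is
  port (clk, rst, start : in  std_logic;              -- rst: asynchronous, active high
        a, b, c, d      : in  unsigned(7 downto 0);   -- held stable during a run
        z, x3           : out unsigned(7 downto 0);
        done            : out std_logic);
end entity sched_example;

architecture fsmd of sched_example is
  type state_t is (IDLE, C1, C2, C3);
  signal state : state_t;
  -- registers after binding: x1 and x2 share the register x12
  signal x12, y, z_r, x3_r : unsigned(7 downto 0);
  -- allocated units: one adder (with input multiplexers), one multiplier, one subtractor
  signal add_l, add_r, add_o, sub_o, mul_o : unsigned(7 downto 0);
begin
  -- datapath: the state selects the adder operands
  add_l <= a when state = C1 else x12;
  add_r <= b when state = C1 else
           d when state = C2 else
           c;
  add_o <= add_l + add_r;          -- result modulo 2**8
  sub_o <= y - d;
  mul_o <= resize(a * c, 8);       -- product truncated to 8 bits

  -- controller and datapath registers: one state per scheduled cycle
  process (clk, rst)
  begin
    if rst = '1' then
      state <= IDLE;
      done  <= '0';
    elsif rising_edge(clk) then
      done <= '0';
      case state is
        when IDLE =>
          if start = '1' then
            state <= C1;
          end if;
        when C1 =>                 -- cycle 1: x1 = a + b, y = a * c
          x12   <= add_o;
          y     <= mul_o;
          state <= C2;
        when C2 =>                 -- cycle 2: z = x1 + d, x2 = y - d
          z_r   <= add_o;
          x12   <= sub_o;
          state <= C3;
        when C3 =>                 -- cycle 3: x3 = x2 + c
          x3_r  <= add_o;
          done  <= '1';
          state <= IDLE;
      end case;
    end if;
  end process;

  z  <= z_r;
  x3 <= x3_r;
end architecture fsmd;
```

With a = 3, b = 4, c = 5 and d = 2 the run gives x1 = 7, y = 15, z = 9, x2 = 13 and x3 = 18 after three clock cycles.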

Architecture Models
- synthesis is based on the knowledge of a set of architecture models and design styles
- design styles:
  - parallel or serial datapath
  - interrupt or polling control
  - memory access types (cache, ...)

Architecture Models: Microarchitecture

Microarchitecture components:
- functional units: adder, multiplier, comparator, ALU, etc.
- memory elements: latch, flip-flop, register, register file, RAM, ROM, ...
- interconnection units: bus, multiplexer

Architectural Models: Combinational Logic

Combinational logic:
- non-subdividable units: encoder, decoder, carry-lookahead adder, ...
- subdividable units: ripple-carry adder, selector, ALUs, ...
- implementation forms: ROM (table lookup), PLA structures (2-stage logic), multistage logic, bit-slice, systolic array, etc.

Architectural Models: Finite State Machines

- finite state machines (FSM) are classical control structures
- autonomous FSM: no inputs (image processing, ...)
- non-autonomous FSM: with inputs (general purpose)
  - Mealy machine (general)
  - Moore machine (restricted)
  - Medwedjew machine (hazard free)

Architectural Models: Control Unit / Data Path

- FSMs are used for control unit tasks
- datapaths are used as functional units
- control unit / datapath model (FSMD model)
[Figure: FSMD model: the FSM (state register and transfer logic) reads the control inputs and the datapath status and generates the control outputs and the datapath control; the datapath (functional units and transfer logic) processes the datapath inputs into the datapath outputs]

Architectural Models: System Architecture

- the FSMD is used as a process at system level
- a system consists of a set of processes
- hierarchical FSMD model
- process synchronization is needed
[Figure: two FSMD processes (process 1 clocked by clock1, process 2 clocked by clock2), each with FSM and datapath, connected via a databus; control signals between the processes pass through D flip-flops (D Q)]
Architectural Models: Interprocess Communication

- synchronous or asynchronous communication:
  - no protocol, delay known
  - handshake protocol
[Figure: process 1 sends request and data to process 2, which answers with acknowledge; timing of request, data valid and acknowledge]
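A minimal behavioural sketch of the handshake protocol, with hypothetical names: process 1 puts data on the bus and raises request once the data is valid, process 2 takes over the data and acknowledges, and both signals then return to '0' (four-phase handshake). No timing relation between the processes is assumed beyond the request/acknowledge order.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: one data transfer from process 1 to process 2
-- using a four-phase request/acknowledge handshake.
entity handshake_demo is
  port (received : out std_logic_vector(7 downto 0));
end entity handshake_demo;

architecture behavioral of handshake_demo is
  signal req, ack : std_logic := '0';
  signal data     : std_logic_vector(7 downto 0) := (others => '0');
begin
  process1 : process               -- sender
  begin
    data <= x"A5";                 -- put the data on the bus
    wait for 5 ns;                 -- let the data settle
    req <= '1';                    -- data valid: raise request
    wait until ack = '1';          -- receiver has taken the data
    req <= '0';                    -- release request
    wait until ack = '0';          -- handshake complete
    wait;                          -- one transfer only
  end process process1;

  process2 : process               -- receiver
  begin
    wait until req = '1';          -- wait for valid data
    received <= data;              -- take over the data
    ack <= '1';                    -- acknowledge
    wait until req = '0';
    ack <= '0';                    -- ready for the next transfer
    wait;
  end process process2;
end architecture behavioral;
```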

Architectural Models: Implementation Constraints

- behavioural modelling uses abstract models, which do not model reality precisely
- implementation constraints / pitfalls:
  - simultaneous deactivation of set and reset of latches
  - clock skew in shift registers leads to races between clock and data (two-phase clocking strategy)
  - Moore and Mealy FSMs have hazards
  - asynchronous inputs lead to undefined FSM states
- never use:
  - gated clocks
  - combinational outputs driving asynchronous inputs
  - asynchronous inputs directly as FSM inputs
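The usual remedy for asynchronous inputs is to synchronize them before they reach an FSM. A minimal sketch of a two-flip-flop synchronizer (hypothetical entity sync2; the asynchronous, active-high reset is an assumption). Note that an RTL simulation does not model metastability; the second flip-flop only gives a possibly metastable first stage one clock period to settle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flip-flop synchronizer for one asynchronous input.
entity sync2 is
  port (clk, rst : in  std_logic;   -- rst: asynchronous, active high (assumption)
        async_in : in  std_logic;   -- may change at any time
        sync_out : out std_logic);  -- safe to use as an FSM input
end entity sync2;

architecture rtl of sync2 is
  signal meta, stable : std_logic;
begin
  process (clk, rst)
  begin
    if rst = '1' then
      meta   <= '0';
      stable <= '0';
    elsif rising_edge(clk) then
      meta   <= async_in;  -- first stage: may go metastable in hardware
      stable <= meta;      -- second stage: sampled one clock period later
    end if;
  end process;
  sync_out <= stable;
end architecture rtl;
```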

Conclusions
- description-synthesis method
- system description with HDLs (parallel constructs, RTL level)
- top-down and bottom-up design
- abstract models are not precise: races, hazards, delays, signal strength, ...
- the silicon compiler does not exist

Coming Up...
Next time...
- Hardware description languages
Reading (Weste), sections:
- 6 thru 6.2.7 (design strategy)
- 6.4 thru 6.4.5 (design methods)
- 6.5 thru 6.5.4 (design capture tools)
Self study (Weste):
- 6.6 thru 6.6.8 (design verification)
- 6.8 thru 6.9 (data sheets)

JMM v1.4
VLSI Design I
Design for Test
He's dead, Jim...
Overview: design-for-test architectures (ad-hoc, scan-based, built-in). Goal: You are familiar with testability metrics and you know ad-hoc as well as scan-based test structures. You can apply built-in test structures such as BILBO and boundary scan.

JMM v1.3
Design For Test

What can we do to increase testability?
- increase observability:
  - add more pins (?!)
  - add a small probe bus and selectively enable different values onto the bus
  - use a hash function to compress a sequence of values (e.g., the values of a bus over many clock cycles) into a small number of bits for later read-out
  - cheap read-out of all state information
- increase controllability:
  - use muxes to isolate sub-modules and select sources of test data as inputs
  - provide easy setup of internal state
Design strategies for test (design for testability):
- ad-hoc testing
- scan-based approaches
- self-test and built-in testing
Ad-hoc testing #1
Ad-hoc test techniques are a collection of ideas aimed at reducing the test time. Common techniques are:
- partitioning large sequential circuits
- adding test points
- adding multiplexers
- providing for easy state access
[Figure: 4-bit counter built from half-adder stages (XOR and AND gates) with outputs Q0 to Q3 and carries co0 to co3, extended with load/test multiplexers so that the counter can be partitioned and its state set directly in test mode.]
Ad-hoc testing #2
bus-oriented test technique
[Figure: units 1 to 4 connected to a common bus.]
multiplexer-based testing
[Figure: modules A and B with input multiplexers (A inp, B inp) and output multiplexers (A out, B out) controlled by test1 and test2, so that each module can be driven and observed directly. Module A test: {test1, test2} = {0, 1}.]
Scan-based test techniques #1

Idea: have a mode in which all registers are chained into one giant shift register, which can be loaded and read out bit-serially. Test the remaining (combinational) logic as follows:
1. in test mode, shift in new values for all register bits, thus setting up the inputs to the combinational logic
2. clock the circuit once in normal mode, latching the outputs of the combinational logic back into the registers
3. in test mode, shift out the values of all register bits and compare them against the expected results
New test values can be shifted in at the same time (i.e., steps 1 and 3 can be combined).
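[Figure: scan chain. Each D flip-flop is preceded by a normal/test multiplexer that selects either the output of the combinational logic (CL) or the previous flip-flop; the chain runs from scan-in (shift in) to scan-out (shift out), with all flip-flops on clk.]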
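The three-step scan procedure can be sketched as a small Python model. This is an illustration under stated assumptions, not a gate-level description: the class `ScanChain` and the function `scan_test` are hypothetical names, the combinational logic is any Python function from register contents to next register contents, and `pattern[i]` ends up in the i-th flip-flop counted from scan-in.

```python
from typing import Callable, List

Bits = List[int]


class ScanChain:
    """Toy full-scan design: n flip-flops whose next state comes from `logic`."""

    def __init__(self, n: int, logic: Callable[[Bits], Bits]):
        self.regs: Bits = [0] * n   # regs[0] is the flip-flop next to scan-in
        self.logic = logic          # combinational logic (CL): regs -> next regs

    def clock(self, test_mode: bool, scan_in: int = 0) -> int:
        """Apply one clock edge; return the scan-out bit seen before the edge."""
        scan_out = self.regs[-1]
        if test_mode:
            # normal/test mux selects the previous flip-flop: shift register
            self.regs = [scan_in] + self.regs[:-1]
        else:
            # mux selects the CL output: capture the logic response
            self.regs = self.logic(self.regs)
        return scan_out


def scan_test(chain: ScanChain, pattern: Bits) -> Bits:
    """Steps 1 to 3 of the scan procedure; returns the captured response."""
    n = len(chain.regs)
    for bit in reversed(pattern):   # step 1: shift the pattern in
        chain.clock(True, bit)
    chain.clock(False)              # step 2: one normal-mode clock
    # step 3: shift out (zeros are shifted in here; the next pattern could be)
    out = [chain.clock(True, 0) for _ in range(n)]
    return list(reversed(out))      # out[0] came from regs[n-1]
```

With an inverting logic block, `scan_test(ScanChain(4, lambda r: [1 - b for b in r]), [1, 0, 1, 1])` yields the response `[0, 1, 0, 0]`, which would then be compared against the expected values.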

Scan-based test techniques #2

[Figure: serial scan. The scan registers between the combinational blocks CL1 and CL2 are connected into one serial scan chain from scan-in to scan-out.]
Partial serial scan: sometimes it is not area- and speed-efficient to implement scan in every location where a register is used (e.g., in signal processing).
[Figure: partial scan. Registers R1 to R6 separated by combinational logic (CL); only some of them are included in the scan chain.]
Level-sensitive scan design
A popular approach is the level-sensitive scan design (LSSD) technique from T. W. Williams:
- the circuit is level-sensitive (its steady-state response is independent of the circuit and wire delays within the circuit): hazard-free
- each register may be converted to a serial shift register
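[Figure: LSSD shift-register latch (latch L1 with data input D, system clock C, scan input I and shift clock A; latch L2 clocked by B) and an LSSD system in which registers A and B, built from such latches, surround the combinational logic. Clock sequence: shift data into reg A (shift clocks c1/c2), normal operation, shift reg B out; serial data in and serial data out.]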
Scan Elements
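[Figure: scan elements. LSSD: latch pair L1/L2 built from NAND gates with clocks T1 and T2. Scan FF: a multiplexer controlled by the test enable TE selects between the data input D and the test input TI in front of a D flip-flop, shown both with a single clock clk and with two-phase clocks clka/clkb.]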
Self-Test Techniques: BILBO

Problem: the scan-based approach is great for testing combinational logic, but it can be impractical when testing memory blocks, etc., because of the number of separate test values required to get adequate fault coverage.
Solution: use on-chip circuitry to generate test data and check the results. This can be used at every power-on to verify correct operation!
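[Figure: a normal/test multiplexer feeds the circuit under test either with its normal inputs or with patterns from FSM A; FSM B checks the outputs and signals okay.]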
Generate pseudo-random data for most circuits by using, e.g., a linear feedback shift register (LFSR). Memory tests use more systematic FSMs to create ADDR and DATA patterns.
For pseudo-random input data, simply compute some hash of the output values and compare it against the expected value (signature) at the end of the test. Memory data can be checked cycle-by-cycle.
Linear Feedback Shift Register (LFSR)

If the ci are not programmable, the AND gates and some XOR gates can be eliminated.
[Figure: n-stage LFSR; the feedback to each stage is gated by the coefficient ci (AND) and combined with XOR gates.]
Characteristic polynomial: $1 + c_1x + c_2x^2 + c_3x^3 + \dots + c_{n-1}x^{n-1} + c_nx^n$
- with a small number of XOR gates, the cycle time is very fast
- cycles through a fixed sequence of states (which can be as long as $2^n - 1$ states for a suitable choice of the ci); handy for large modulo-n counters
- different responses for different initial states
- different responses for different ci
- pseudo-random sequence generator (PRSG)
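A minimal Python sketch of an LFSR and its sequence length follows. It assumes a Fibonacci arrangement (the XOR of all stages with c_i = 1 is fed back into stage 1); the exact gate arrangement of the slide's figure is not recoverable, so treat this as one possible realization. This arrangement implements the reciprocal of $1 + c_1x + \dots + c_nx^n$, and since the reciprocal of a primitive polynomial is also primitive, the maximal length $2^n - 1$ is preserved. The names `lfsr_step` and `lfsr_period` are hypothetical.

```python
def lfsr_step(state, taps, n):
    """One clock of an n-stage Fibonacci LFSR.

    Stage i (1..n) is bit i-1 of `state`. The XOR of all stages i with
    c_i = 1 (listed in `taps`) is fed into stage 1; every other stage
    takes the value of its predecessor.
    """
    fb = 0
    for i in taps:
        fb ^= (state >> (i - 1)) & 1
    return ((state << 1) & ((1 << n) - 1)) | fb


def lfsr_period(taps, n, seed=1):
    """Clocks until the LFSR returns to `seed` (requires c_n = 1, i.e. n in taps)."""
    state, period = lfsr_step(seed, taps, n), 1
    while state != seed:
        state, period = lfsr_step(state, taps, n), period + 1
    return period
```

For example, the polynomial $1 + x + x^4$ (`taps=(1, 4)`) gives a period of 15 = $2^4 - 1$, whereas the non-primitive $1 + x^2 + x^4$ does not reach the maximal length.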

Signature Analysis
Signature analysis is used to compact a data stream into a so-called signature. Different ci give different responses; many well-known CRC (cyclic redundancy check) polynomials correspond to a specific choice of the ci.
[Figure: serial-input signature register: an LFSR whose feedback is XORed with the serial input. Parallel-input signature register: the inputs z1 to zn are XORed into the stages q1 to qn of the LFSR.]
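The serial-input signature register can be sketched in Python as the same LFSR with the incoming bit XORed into the feedback. This is a sketch assuming the Fibonacci arrangement (feedback into stage 1); `serial_signature` is a hypothetical name. Because the register is linear and invertible, a single flipped bit in the stream always changes the final signature.

```python
def serial_signature(bits, taps, n, seed=0):
    """Compact a serial bit stream into an n-bit signature.

    Same LFSR as a PRSG, but each incoming bit is XORed into the feedback.
    `taps` lists the stages i with c_i = 1 (must include n).
    """
    mask = (1 << n) - 1
    state = seed
    for b in bits:
        fb = b                              # serial input enters the feedback
        for i in taps:
            fb ^= (state >> (i - 1)) & 1    # XOR of the tapped stages
        state = ((state << 1) & mask) | fb
    return state
```

At the end of the test, the computed signature is compared against the signature of a known-good circuit.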
LFSR Polynomials
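Polynomials for maximal-length sequences, for n from 1 up to 32:

| n | f(x) |
|---|---|
| 1, 2, 3, 4, 6, 7, 15, 22 | 1 + x + x^n |
| 5, 11, 21, 29 | 1 + x^2 + x^n |
| 10, 17, 20, 25, 28, 31 | 1 + x^3 + x^n |
| 9 | 1 + x^4 + x^n |
| 23 | 1 + x^5 + x^n |
| 18 | 1 + x^7 + x^n |
| 8 | 1 + x^2 + x^3 + x^4 + x^n |
| 12 | 1 + x + x^4 + x^6 + x^n |
| 13 | 1 + x + x^3 + x^4 + x^n |
| 14, 16 | 1 + x^3 + x^4 + x^5 + x^n |
| 19, 27 | 1 + x + x^2 + x^5 + x^n |
| 24 | 1 + x + x^2 + x^7 + x^n |
| 26 | 1 + x + x^2 + x^6 + x^n |
| 30 | 1 + x + x^2 + x^23 + x^n |
| 32 | 1 + x + x^2 + x^22 + x^n |

Examples of CRCs:

| n | CRC polynomial |
|---|---|
| 8 | 1 + x + x^4 + x^5 + x^7 + x^8 |
| 16 | 1 + x^2 + x^15 + x^16 |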
BILBO #1
A very popular built-in test structure is the built-in logic block observation (BILBO) register from Koenemann. A BILBO operates in 4 different modes:
- parallel register mode (normal operation of the circuit)
- PRSG or signature-analysis mode
- scan mode
- reset mode
[Figure: circuit configurations in BILBO register mode, PRSG mode, signature-analysis mode, scan mode and reset mode.]
BILBO #2
Example of a BILBO element with the polynomial 1 + x + x^4

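[Figure: 4-bit BILBO element: parallel inputs D0 to D3 gated (AND) by the mode signals c1/c0, XOR gates in front of the flip-flops Q1 to Q4, LFSR feedback, scan in selected by a multiplexer, scan out.]

| mode | c1 c0 | function |
|---|---|---|
| A | 0 0 | scan mode |
| B | 0 1 | reset |
| C | 1 0 | PRSG or signature analyzer |
| D | 1 1 | parallel registers |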
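The four BILBO modes can be summarized in a behavioral Python sketch. It is a sketch, not the slide's gate-level circuit: the mode is passed by name rather than by c1/c0 bits, the function name `bilbo_clock` is hypothetical, and the PRSG/signature mode assumes a Fibonacci LFSR for 1 + x + x^4 with the parallel inputs XORed into the stages (with all parallel inputs at 0 it acts as a PRSG, otherwise as a parallel signature analyzer).

```python
def bilbo_clock(q, d, mode, scan_in=0, taps=(1, 4)):
    """One clock of a behavioral 4-bit BILBO model.

    q: register contents [Q1, Q2, Q3, Q4]; d: parallel inputs [D0, D1, D2, D3].
    mode: "parallel", "prsg" (PRSG / signature analysis), "scan" or "reset".
    """
    if mode == "parallel":            # normal operation: load parallel inputs
        return list(d)
    if mode == "reset":
        return [0] * len(q)
    if mode == "scan":                # plain shift register fed from scan_in
        return [scan_in] + q[:-1]
    if mode == "prsg":                # LFSR feedback with parallel inputs XORed in
        fb = 0
        for i in taps:
            fb ^= q[i - 1]
        shifted = [fb] + q[:-1]
        return [s ^ x for s, x in zip(shifted, d)]
    raise ValueError(f"unknown mode {mode!r}")
```

Clocking the register 15 times in `"prsg"` mode with all inputs at 0 visits all 15 nonzero states before returning to the start value.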
IDDQ Testing
[Figure: an ammeter (A) between VDD and the chip measures IDD; GND.]
Idea: CMOS logic should draw no current when it is not switching. So, after initializing the circuit to eliminate tri-state fights, disabling pseudo-NMOS gates, etc., the power-supply current should be (apart from leakage) zero once all signals have settled. This is good for detecting bridging faults (shorts). You may want to try several different circuit states to ensure that all parts of the chip have been observed.
System-Level Test: Boundary Scan

The IEEE 1149.1 boundary-scan architecture (also called JTAG) provides a standardized serial scan path through the I/O pins of a chip. At the board level, chips obeying the standard may be connected in a variety of series and parallel combinations for board testing (replacing bed-of-nails testing). Standardized tests:
- connectivity tests between components
- sampling and setting chip I/Os
- distribution and collection of self-test or built-in-test results
[Figure: chips on a PCB with an IO pad and boundary cell at each pin; the PCB interconnect between the chips and the serial test interconnect from serial data in to serial data out.]
Boundary Scan: Test Access Port
The test access port (TAP) defines the interface that needs to be included in an IC:
- TCK: test clock input
- TMS: test mode select
- TDI: test data input
- TDO: test data output
- TRST: optional signal for asynchronous reset of the TAP
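[Figure: the test architecture: TDI feeds the test data registers and the instruction register; the instruction decode and the TAP controller (inputs TCK, TMS, optional TRST) generate clocks/control; a multiplexer selects which register drives TDO.]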
Boundary Scan: TAP controller
State machine for the TAP controller. TMS is the control signal.
[Figure: TAP controller state diagram with 16 states, every transition labeled with the TMS value: Test-Logic-Reset, Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR, and the corresponding IR states Select-IR-Scan, Capture-IR, Shift-IR, Exit1-IR, Pause-IR, Exit2-IR, Update-IR.]
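The state diagram can be written down as a next-state table. The Python sketch below encodes the TAP controller transitions as defined in IEEE 1149.1; the dictionary `TAP_NEXT` and the function `tap_walk` are illustrative names. It also lets us check the well-known property that five TCK cycles with TMS = 1 bring the controller to Test-Logic-Reset from any state.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_walk(state, tms_bits):
    """Apply one TMS bit per TCK cycle; return the list of visited states."""
    visited = []
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
        visited.append(state)
    return visited
```

For example, starting in Run-Test/Idle, the TMS sequence 1, 0, 0 reaches Shift-DR, where data is shifted from TDI to TDO.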

Boundary-scan: IR
Instruction register (IR): minimum 2 bits

[Figure: one IR bit: a shift flip-flop clocked by clockIR, with shiftIR selecting between data from the last cell and the parallel input, feeding the next IR bit; an update flip-flop clocked by updateIR (reset by TRST) holds the IR bit. Timing of shiftIR, clockIR and updateIR over the FSM states Capture-IR, Shift-IR, Exit1-IR, Pause-IR, Exit2-IR and Update-IR.]
Boundary-scan: DR
TAP data register (DR)

[Figure: data registers between TDI and TDO: boundary-scan registers, internal data registers, and the 1-bit bypass register.]
The boundary-scan register is a special case of a data register. It allows circuit-board interconnections to be tested, external components to be tested, and the state of the chip's digital I/Os to be sampled. The boundary-scan register is mandatory. Internal data registers are optional and provide additional access to the circuit. The bypass register is a 1-bit register used to bypass a whole chip.
Boundary-scan: DR
boundary-scan input and output cells

[Figure: boundary-scan input cell, output cell and bi-directional cell. Each cell has a shift flip-flop (clockDR), with shiftDR selecting between the last cell and the pad or chip signal and feeding the next cell, and an update flip-flop (updateDR); a mode multiplexer selects whether the chip core (input cell) or the pad (output cell) sees the normal signal or the update flip-flop output. The bi-directional cell combines an input cell, an output cell and a cell that controls the pad enable.]
Boundary scan: instructions
Minimum 3 instructions:
- Bypass (all 1s): bypasses any serial data registers in a chip with a 1-bit register. This allows specific chips to be tested in a serial-scan chain without having to shift through the accumulated shift-register stages in all the chips.
- Extest (all 0s in IEEE 1149.1-1990): testing of off-chip circuitry
- Sample/Preload: places the boundary-scan registers (at the chip's I/O pins) in the DR chain, and samples or preloads the chip's I/Os
Optional, recommended instructions:

- Intest: single-step testing of internal circuitry via the boundary-scan registers
- Runbist: run internal self-testing procedures within a chip
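The benefit of the Bypass instruction is easy to quantify: every chip in Bypass contributes only one bit to the board-level TDI to TDO path. The Python sketch below assumes a hypothetical board with three chips whose boundary-scan registers have 100, 200 and 50 cells; the function names are also hypothetical.

```python
def chain_length(bsr_lengths, target):
    """Bits in the TDI-to-TDO path when only chip `target` selects its
    boundary-scan register and all other chips are in Bypass (1 bit each)."""
    return sum(n if i == target else 1 for i, n in enumerate(bsr_lengths))


def shift_through(chain_len, bits):
    """Model the serial path as one shift register of chain_len bits (reset to 0).
    Returns the bits seen at TDO, one per TCK cycle."""
    reg = [0] * chain_len
    out = []
    for b in bits:
        out.append(reg[-1])        # TDO shows the last stage before the clock edge
        reg = [b] + reg[:-1]
    return out
```

With `bsr_lengths = [100, 200, 50]` and the middle chip selected, each scan operation needs 202 shift clocks instead of 350 if all boundary-scan registers were in the chain.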

Coming Up...
Next time: top-down design, hardware description languages, logic synthesis. Readings Weste:
7.3 through 7.3.3.3 (ad-hoc & scan-based testing), 7.3.4 through 7.3.4.1 (BILBO), 7.3.5 (IDDQ testing), 7.5 (boundary scan)


Ex vlsi22.1 (difficulty: easy): calculate the pseudo-random sequence of an LFSR implementing the polynomial 1 + x + x^3, using the start value x = 1. Result: 1, 3, 7, 6, 5, 2, 4, 1, ...
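The stated result can be checked with a short Python sketch. It assumes one particular shift convention that reproduces the result: stage i is bit i-1 of the state value, the state shifts toward higher stages on every clock, and the XOR of stages 1 and 3 (c1 = c3 = 1, c2 = 0) is fed into stage 1. The function name `lfsr_sequence` is hypothetical.

```python
def lfsr_sequence(taps, n, seed):
    """States of an n-stage Fibonacci LFSR from `seed` until it returns to `seed`.

    Stage i is bit i-1 of the state; the XOR of the tapped stages enters stage 1.
    Terminates because the state map is a permutation when n is in taps.
    """
    mask = (1 << n) - 1
    state, seq = seed, [seed]
    while True:
        fb = 0
        for i in taps:
            fb ^= (state >> (i - 1)) & 1
        state = ((state << 1) & mask) | fb
        seq.append(state)
        if state == seed:
            return seq
```

`lfsr_sequence((1, 3), 3, 1)` returns `[1, 3, 7, 6, 5, 2, 4, 1]`, matching the result above: all 7 nonzero states are visited.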

VLSI Design II
Small Signal FET Model and Diode Models
Overview: small-signal equivalent circuits for fets and diodes; advanced large-signal fet modeling and second-order effects. Goal: You can use the small-signal equivalent circuit of a diode and a MOS transistor. You are able to determine the parameters of a fet and have a feeling for a MOS fet. You are familiar with advanced modeling such as weak inversion, short-channel effects and leakage.
MicroLab, VLSI-24(1/22)
Summary: Large Signal Model

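MOS fets have 3 regions of operation:
- cutoff region (subthreshold): $V_{GS} \le V_{th}$
- linear region (triode region): $V_{GS} > V_{th}$, $0 < V_{DS} < V_{DS,sat}$
- active region (saturated region): $V_{GS} > V_{th}$, $V_{DS,sat} < V_{DS}$

cutoff (subthreshold): $I_{DS} = 0$
linear region: $I_{DS} = \mu_n C_{ox}\frac{W}{L}\left[(V_{GS}-V_{th})V_{DS} - \frac{V_{DS}^2}{2}\right]$
active region: $I_{DS(sat)} = \frac{\mu_n C_{ox}}{2}\frac{W}{L}(V_{GS}-V_{th})^2\left[1+\lambda(V_{DS}-V_{eff})\right]$, with $V_{eff} = V_{GS} - V_{th}$ (so $V_{DS,sat} = V_{eff}$)

channel-length modulation: $\lambda = \frac{k_{rds}}{2L\sqrt{V_{DS}-V_{eff}+\Phi_0}}$, $k_{rds} = \sqrt{\frac{2\varepsilon_{Si}}{qN_A}}$

body effect: $V_{th} = V_{th0} + \gamma\left(\sqrt{V_{SB}+2\phi_F} - \sqrt{2\phi_F}\right)$, $\gamma = \frac{\sqrt{2q\varepsilon_{Si}N_A}}{C_{ox}}$, $\phi_F = V_T\ln\frac{N_A}{n_i}$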
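The large-signal equations above can be collected into a small Python function, for example to sanity-check hand calculations. This is a sketch with simplifications: λ is treated as a constant parameter (instead of the bias-dependent expression above), the subthreshold current is ignored, and all parameter values in the usage example are hypothetical, not those of a particular process.

```python
import math


def vth(vsb, vth0, gamma, phi_f):
    """Body effect: Vth = Vth0 + gamma * (sqrt(VSB + 2 phiF) - sqrt(2 phiF))."""
    return vth0 + gamma * (math.sqrt(vsb + 2 * phi_f) - math.sqrt(2 * phi_f))


def ids(vgs, vds, vsb, *, k_prime, w, l, vth0, gamma, phi_f, lam):
    """Square-law drain current of an nfet (k_prime = mu_n * Cox, constant lam)."""
    veff = vgs - vth(vsb, vth0, gamma, phi_f)
    if veff <= 0:
        return 0.0                                  # cutoff
    beta = k_prime * w / l
    if vds < veff:                                  # triode region, VDS,sat = Veff
        return beta * (veff * vds - vds ** 2 / 2)
    return beta / 2 * veff ** 2 * (1 + lam * (vds - veff))  # active region
```

A quick consistency check: at $V_{DS} = V_{eff}$ both branches give $\frac{\beta}{2}V_{eff}^2$, so the model is continuous at the triode/active boundary.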
Advanced Large Signal Modeling: Cutoff or subthreshold region

Condition: $V_{GS} \le V_{th}$. The channel is not inverted and therefore $I_{DS} = 0$.
A more precise definition, better suited to analog design, takes into account that the channel does not become inverted suddenly when the gate-source voltage is increased. Depending on the gate-source voltage, we define three regions of inversion:
- weak inversion: $V_{eff} < -100$ mV
- moderate inversion: $-100$ mV $< V_{eff} < 100$ mV
- strong inversion: $V_{eff} > 100$ mV (some designers use 200 mV instead)
In weak inversion: $I_{DS} \approx \frac{W}{L}I_{D0}\,e^{qV_{GS}/(nkT)}$, with $n \approx 1.5$
[Figure: log IDS versus UGS: exponential in weak inversion (below Ut), quadratic in strong inversion.]
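A useful consequence of the weak-inversion law is the gate-voltage change needed for a tenfold change in current, $n\frac{kT}{q}\ln 10$. The Python sketch below evaluates it for the slide's $n \approx 1.5$ at an assumed temperature of 300 K; the function names are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def subthreshold_slope(n=1.5, temp_k=300.0):
    """Change of VGS (in V) per decade of IDS in weak inversion."""
    return n * K_B * temp_k / Q_E * math.log(10)


def weak_inversion_ratio(delta_vgs, n=1.5, temp_k=300.0):
    """IDS(VGS + delta_vgs) / IDS(VGS) from IDS ~ exp(q VGS / (n k T))."""
    return math.exp(Q_E * delta_vgs / (n * K_B * temp_k))
```

With n = 1.5 at 300 K this gives about 89 mV per decade of drain current.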
Advanced Large Signal Modeling: Short Channel Effects

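As device dimensions are scaled down, short-channel effects degrade the operation of MOS fets:
- mobility degradation: short channels and large electric fields provoke more carrier collisions. The carrier velocity saturates, as it is no longer proportional to the electric field: $v_d \approx \frac{\mu_n E}{1 + E/E_c}$. The square law becomes $I_D = \frac{\mu_n C_{ox}}{2(1+\theta V_{eff})}\frac{W}{L}V_{eff}^2$, where $\theta \approx \frac{1}{L E_c}$. This is equivalent to a series source resistance $R_{sx} \approx \frac{1}{\mu_n C_{ox} W E_c}$.
[Figure: ID versus UGS with and without the series source resistance Rsx.]
- hot-carrier effects, especially in nfets due to their higher mobility: high-velocity electrons can generate electron-hole pairs, producing a drain-to-substrate current (for VG >> Vth) and a reduced output impedance.
[Figure: nfet cross-section with VD >> 0 showing the drain-to-substrate current and the punch-through (drain-to-source) current.]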
Advanced Large Signal Modeling: Leakage Currents

An important second-order device limitation is the leakage current of the junctions (e.g., it limits the hold time of sample-and-hold circuits). Since the intrinsic carrier concentration is a strong function of temperature, the leakage current is also strongly temperature dependent (it approximately doubles for every 11 °C increase). Leakage current of a reverse-biased junction:
$I_{lk} \approx \frac{q A_j n_i}{2\tau_0}x_d$, with $\tau_0 \approx \frac{\tau_n + \tau_p}{2}$ and $x_d = \sqrt{\frac{2\varepsilon_{si}(\Phi_0 + V_R)}{qN_A}}$
where $A_j$ is the junction area and $\tau_n$, $\tau_p$ are the electron and hole lifetimes.
Small Signal Equivalent Circuits

Why do we love them? Example: find Id of a transistor in the active region when the gate is driven by a voltage source Vgs = V0 sin(ωt). It is handy to use simple linear equations!
What are small-signal parameters? Instead of using the nonlinear transistor curves, we determine the operating point and use the derivative at this point:
Taylor: $f(x) = \sum_{n=0}^{\infty}\frac{f^{(n)}(x_0)}{n!}(x-x_0)^n$, approximation: $f(x) \approx f(x_0) + \frac{df(x_0)}{dx}(x-x_0)$
[Figure: nonlinear curve f(x) with the operating point (x0, f(x0)) and the small-signal tangent.]
- small-signal parameters are denoted with lowercase letters
- small-signal parameters are very handy for building simple equivalent circuits
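How good is the linear approximation? For the square law $I_D = \frac{\beta}{2}V_{eff}^2$, the exact current change for a step $\Delta$ is $\beta(V_{eff}\Delta + \Delta^2/2)$, while the tangent gives $g_m\Delta = \beta V_{eff}\Delta$, so the relative error is $\Delta/(2V_{eff})$. The Python sketch below checks this numerically; β and the bias values are hypothetical.

```python
def square_law(veff, beta):
    """Drain current of the ideal square law, ID = beta/2 * Veff^2."""
    return 0.5 * beta * veff ** 2


def small_signal_error(veff0, dv, beta=1e-3):
    """Relative error of the tangent (small-signal) approximation."""
    exact = square_law(veff0 + dv, beta) - square_law(veff0, beta)
    gm = beta * veff0                 # derivative at the operating point
    linear = gm * dv
    return (exact - linear) / linear
```

At $V_{eff} = 0.2$ V, a 10 mV signal gives a 2.5 % error, which is why small-signal parameters are only valid for small excursions around the operating point.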
Transconductance
#1
The most important small-signal component is the transconductance. A transconductance behaves like a voltage-controlled current source: it describes the change of output current when the input voltage is varied.
- gm, the main transconductance, describes the amplification of the drain current when a voltage is applied between gate and source.
- gds, a transconductance accounting for the finite output impedance of the transistor, models the channel-length-modulation effect when the drain-to-source voltage varies.
- gs, a transconductance describing how the output current depends on the source-to-substrate voltage (body effect).
Transconductance
#2
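$I_{DS(sat)} = \frac{\mu_n C_{ox}}{2}\frac{W}{L}(V_{GS}-V_{th})^2\left[1+\lambda(V_{DS}-V_{eff})\right]$

$g_m = \frac{\partial I_D}{\partial V_{GS}} = \mu_n C_{ox}\frac{W}{L}V_{eff} = \frac{2I_D}{V_{eff}} = \sqrt{2\mu_n C_{ox}\frac{W}{L}I_D}$

$g_s = \frac{\partial I_D}{\partial V_{SB}} = \frac{\partial I_D}{\partial V_{tn}}\frac{\partial V_{tn}}{\partial V_{SB}}$, which gives $g_s = \frac{\gamma g_m}{2\sqrt{V_{SB}+2\phi_F}}$

$g_{ds} = \frac{\partial I_D}{\partial V_{DS}} = \frac{1}{r_{ds}} \approx \lambda I_{D(sat)}$

The negative sign is eliminated by changing the current direction in the equivalent circuit.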
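The three transconductances can be computed from the bias current as follows. This Python sketch ignores the λ term when solving for $V_{eff}$ (a common hand-calculation simplification); the function name and all numeric values in the usage note are hypothetical.

```python
import math


def small_signal_params(i_d, k_prime, w_over_l, vsb, gamma, phi_f, lam):
    """Return (Veff, gm, gs, gds) of an nfet in the active region.

    k_prime = mu_n * Cox; Veff is obtained from ID = k'/2 * (W/L) * Veff^2.
    """
    veff = math.sqrt(2 * i_d / (k_prime * w_over_l))
    gm = 2 * i_d / veff                                  # = k' (W/L) Veff
    gs = gamma * gm / (2 * math.sqrt(vsb + 2 * phi_f))   # body-effect transconductance
    gds = lam * i_d                                      # = 1 / rds
    return veff, gm, gs, gds
```

For example, with $I_D = 100\,\mu$A, $k' = 190\,\mu$A/V², W/L = 20, the three expressions for gm agree at about 0.87 mA/V.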
Small-Signal Modeling in the Active Region (Low Frequency)

the low-frequency model
[Figure: low-frequency small-signal model: gate vg; controlled sources gm·vgs and gs·vs and the resistor rds between drain vd and source vs.]
Depending on the terminal voltages and the relative sizes of the parameters, some of the components may be ignored. This helps to reduce the complexity of hand calculations.
the alternate low-frequency T model
[Figure: T model: resistor rs = 1/gm in the source branch, with rds between drain and source.]
MOSFET Capacitance Estimation in Active Region

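The dynamic response of MOS systems strongly depends on the parasitic capacitances associated with the MOS transistor:
$C_{gs} = \frac{2}{3}WLC_{ox} + WL_{ov}C_{ox} = \frac{2}{3}WLC_{ox} + WC_{GS0}$, $C_{gd} = WL_{ov}C_{ox} = WC_{GD0}$
$C_{jx} = \frac{C_{j0}}{\left(1+\frac{V_{XB}}{\Phi_0}\right)^{M_j}}$, $C_{j\text{-}sw,x} = \frac{C_{j\text{-}sw0}}{\left(1+\frac{V_{XB}}{\Phi_0}\right)^{M_{jsw}}}$
$C'_{sb} = (A_s + A_{ch})C_{js}$, $C_{s\text{-}sw} = P_sC_{j\text{-}sw,s}$, $C_{sb} = C'_{sb} + C_{s\text{-}sw}$
$C'_{db} = A_dC_{jd}$, $C_{d\text{-}sw} = P_dC_{j\text{-}sw,d}$, $C_{db} = C'_{db} + C_{d\text{-}sw}$
[Figure: cross-section of an nfet in the active region (VGS > Vth, VSB = 0, VDG > -Vth, VB = 0) showing Cgs, Cgd, Csb, Cdb, the sidewall capacitances Cs-sw and Cd-sw, the overlap length Lov, the p+ field implant, SiO2, poly and Al.]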
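The capacitance formulas translate directly into a small estimator. This is a sketch under stated assumptions: SI units throughout (Cox and Cj per area, Cj-sw per length), one built-in potential Φ0 shared by the bottom and sidewall junctions as in the formulas above, and hypothetical default grading coefficients; none of the numbers belong to a particular process.

```python
def cj_bias(cj0, v_xb, phi0=0.9, mj=0.5):
    """Junction capacitance (per area or per length) at reverse bias v_xb."""
    return cj0 / (1 + v_xb / phi0) ** mj


def active_region_caps(w, l, l_ov, cox, a_s, a_d, p_s, p_d,
                       cj, cj_sw, v_sb, v_db, phi0=0.9, mj=0.5, mjsw=0.33):
    """Return (Cgs, Cgd, Csb, Cdb) of a fet in the active region."""
    a_ch = w * l                                    # channel area
    cgs = 2 / 3 * w * l * cox + w * l_ov * cox
    cgd = w * l_ov * cox                            # overlap only
    csb = ((a_s + a_ch) * cj_bias(cj, v_sb, phi0, mj)
           + p_s * cj_bias(cj_sw, v_sb, phi0, mjsw))
    cdb = a_d * cj_bias(cj, v_db, phi0, mj) + p_d * cj_bias(cj_sw, v_db, phi0, mjsw)
    return cgs, cgd, csb, cdb
```

For equal source and drain geometry, Csb comes out larger than Cdb because it includes the channel area and, typically, a smaller reverse bias.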
Small-Signal Modeling in the Active Region

the small-signal model
[Figure: small-signal model of the fet in the active region: the low-frequency model (gm·vgs, gs·vs, rds) extended with Cgs, Cgd, Csb and Cdb.]
The gate capacitance Cgs is normally the largest parasitic capacitance of the fet. The gate-drain overlap capacitance Cgd is normally small, but it can play a role when the voltage gain is large (Miller effect). The source capacitance Csb is normally the second-largest capacitance, since it includes the channel-to-bulk capacitance. The drain capacitance Cdb is normally the smallest.
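The Miller effect mentioned above can be estimated with the standard Miller approximation: a capacitance Cgd across a stage with voltage gain Av appears at the input as $C_{gd}(1 - A_v)$. The Python sketch below uses the capacitances of exercise vlsi24.1 (Cgs = 3.86 fF, Cgd = 0.41 fF) together with a hypothetical inverting gain of -10.

```python
def miller_input_capacitance(cgs, cgd, av):
    """Gate input capacitance of a stage with voltage gain av (Miller approximation)."""
    return cgs + cgd * (1 - av)     # for inverting gain: Cgs + Cgd * (1 + |av|)
```

With Av = -10, the small Cgd of 0.41 fF adds 4.51 fF, so the input capacitance more than doubles to 8.37 fF.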
Small-Signal Modeling in the Triode Region

A simplified triode-region model for small VDS: in the triode region, a resistor modeling the conductance of the channel is normally sufficient.
$I_{DS} = \mu_n C_{ox}\frac{W}{L}\left[(V_{GS}-V_{th})V_{DS} - \frac{1}{2}V_{DS}^2\right]$, $g_{ds} = \frac{1}{r_{ds}} \approx \mu_n C_{ox}\frac{W}{L}V_{eff}$ (for small VDS)
The accurate modeling of the high-frequency operation of a fet in the triode region is nontrivial; we use a simplified model.
[Figure: triode model: rds between vd and vs, with Cgs, Cgd, Csb and Cdb.]
$C_{gs} = C_{gd} \approx \frac{1}{2}A_{ch}C_{ox} + WL_{ov}C_{ox} = \frac{1}{2}WLC_{ox} + WL_{ov}C_{ox}$
$C_{xb} = \left(A_x + \frac{1}{2}A_{ch}\right)C_{jx} + P_xC_{j\text{-}sw,x}$
Small-Signal Modeling in the Cut-off Region

A simplified cut-off region model:
[Figure: cut-off model with Cgs, Cgd, Cgb, Csb and Cdb between vg, vs and vd.]
As the channel has disappeared, we have $C_{gs} = C_{gd} = WL_{ov}C_{ox}$, but there is also a new capacitor, $C_{gb} = A_{ch}C_{ox}$. The capacitances Csb and Cdb are smaller, as the channel is not present: $C_{xb} = A_xC_{jx} + P_xC_{j\text{-}sw,x}$
Diodes
[Figure: cross-sections of a p+/n-well diode and an n+/p-well diode (anode, cathode, Al contacts, SiO2, pn junction, substrate) and of a Schottky diode (Al anode directly on the n well, depletion region, n+ cathode contact).]
Note that in the pn diodes the metal contacts connect to heavily doped regions; a metal contact to a lightly doped semiconductor forms a Schottky diode.
Diode Modeling
If a diode is reverse-biased, the current flow is extremely small and is primarily due to thermally or optically generated carriers.
Depletion-region capacitance: $C_j = \frac{C_{j0}}{\left(1+\frac{V_R}{\Phi_0}\right)^{M_j}}$, with $C_{j0} = \sqrt{\frac{q\varepsilon_{si}}{2\Phi_0}\frac{N_AN_D}{N_A+N_D}}$ and $\Phi_0 = V_T\ln\frac{N_AN_D}{n_i^2}$
Large-signal model for a forward-biased junction: $I_D = I_Se^{V_D/V_T}$, with $I_S \propto A_D\left(\frac{1}{N_A} + \frac{1}{N_D}\right)$
Diffusion capacitance (dominant for large currents): $C_d = \tau_T\frac{I_D}{V_T}$ ($C_d = 0$ for forward-biased Schottky diodes); total capacitance $C_T = C_d + A_DC_j$
Small-signal model for a forward-biased diode: $\frac{1}{r_d} = \frac{dI_D}{dV_D} = \frac{I_D}{V_T}$
[Figure: small-signal diode model: rd in parallel with Cj and Cd.]
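The diode model parameters can be evaluated with a few lines of Python. This sketch assumes $V_T = 25.9$ mV (room temperature) and a hypothetical transit time; `diode_small_signal` and `depletion_cap` are illustrative names.

```python
def diode_small_signal(i_d, tau_t, v_t=0.0259):
    """Return (rd, Cd) of a forward-biased pn diode at bias current i_d."""
    rd = v_t / i_d              # 1/rd = dID/dVD = ID/VT
    cd = tau_t * i_d / v_t      # diffusion capacitance, grows with current
    return rd, cd


def depletion_cap(cj0, v_r, phi0, mj=0.5):
    """Depletion capacitance at reverse bias v_r."""
    return cj0 / (1 + v_r / phi0) ** mj
```

At $I_D = 1$ mA, $r_d \approx 25.9\,\Omega$; with an assumed $\tau_T = 1$ ns, $C_d \approx 39$ pF, which shows why the diffusion capacitance dominates at large currents.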
Coming Up...
Next time: Basic current mirrors and single stage amplifiers. Readings for next time Johns&Martin:
1 through 1.1 (pn junctions), 1.2 (MOS transistor), 1.2 (advanced MOS modeling)
CAD Exercises for next time

Ex600: simulation of the static behavior of an nfet
Ex600a: output resistance and channel-length modulation
Ex600b: weak vs. strong inversion
#1
Johns&Martin 1.1 pp7: Ex1.4 (difficulty: easy): Assuming process C05M-D. a) Calculate the total zero-bias depletion capacitance CT-j0 of a p+/n-well diode with an area of 5 µm × 5 µm. Do not use the Spice parameter CJ. b) Calculate the capacitance Cj again at 3 V reverse bias. Result: a) CT-j0 = 16.3 fF, b) CT-j = 8.98 fF

Johns&Martin 1.1 pp10: Ex1.6 (difficulty: medium): Assuming process C05M-D and Mj = 0.5 (use Spice parameter CJ). A reverse-biased p+/n-well diode is charged from 0 V to 3.3 V through a 10 kΩ resistor. Calculate the time to charge the diode to 2/3 of its end value. Result: t66% = 130 ps (Johns: see eq. 1.36, pp10)

Johns&Martin 1.2 pp31: 1.9 (difficulty: easy): Assuming process C05M-D. a) Derive the low-frequency parameters for an nfet with W = 10 µm and L = 0.5 µm at Vgs = 1.1 V, Vds = Veff, Vsb = 0.55 V. b) What is the new value of rds if the drain-source voltage is increased by 0.55 V? Result: a) gm = 0.98 mA/V, gs = 0.143 mA/V, rds = 208 kΩ, b) rds = 12.8 kΩ ????
#2
Johns&Martin 1.2 pp33: 1.10 (difficulty: easy): Assuming process C05M-D. Find the T-model parameter rs for the nfet of example 1.9a. Result: rs = 502 Ω

Johns&Martin 1.2 pp36: 1.12 (difficulty: easy): Assuming process C05M-D. Find gds for the nfet of example 1.9 working in the triode region with Vds near zero. Result: gm = 1.99 mA/V, rds = 502 Ω

Johns&Martin 1.9 pp79: 1.7 (difficulty: easy): Assuming process C05M-D. a) Find ID for an nfet with W = 10 µm, L = 0.5 µm and VGS = 1.1 V, VDS = Veff. b) Assuming λ remains constant, estimate the new value of ID if VDS is increased by 0.3 V. Result: a) ID = 487 µA, b) ID = 513 µA
#3
Ex600a: Johns&Martin 1.9 pp79: 1.8 (difficulty: easy): Assuming process C05M-D. Simulate a fet with W = 10 µm, L = 2 µm in its active region (VGS = 2 V) and measure the drain current at VDS1 = 2 V and at VDS2 = 3 V. Estimate the output impedance rds and the channel-length-modulation factor λ. Result: rds = 402 kΩ, λ = 0.006
#4
Ex vlsi24.1 (difficulty: easy): Assuming process C05M-D. Find the capacitances of the nfet shown below in its active region for Vsb = 1 V, Vdb = 2 V. Result: Cgs = 3.86 fF, Csb = 3.09 fF, Cdb = 1.94 fF, Cgd = 0.41 fF (see Johns&Martin pp35)
[Figure: nfet layout with dimensions 0.5 µm, 0.6 µm, 3 µm and 0.6 µm.]
#5
Ex vlsi24.2 (difficulty: easy): Assume the transistors are designed with minimal dimensions using the 0.5 µm Alcatel Mietec process. Use the design rules to calculate the Cgs, Csb and Cdb capacitances in the active region. Compare the values with a single-device fet. Result: a) Cdb = 26.6 fF, Csb = 49.1 fF, Cgs = 34.8 fF (see Johns&Martin pp103ff)
[Figure: transistors Q1 to Q4 between node 1 and node 2, sharing the junctions J1 to J5, with connected gates.]
#6
Ex600b (difficulty: easy, medium time): Assume the transistors are designed with W = 10 µm and L = 2 µm using the 0.5 µm Alcatel Mietec process. Simulate the fet with Spice in strong and weak inversion. Plot IDS, sqrt(IDS) and log(IDS) versus VGS, identify the different regions and find ID0. Result: compare with transparency #3
VLSI Design II
Basic Current Mirrors and Single Stage Amplifiers
Hey! That's me!
Goal: You know the properties of the different amplifier stages and are able to choose the one which is best suited for your application. You can determine the fet dimensions from a given circuit specification. You are familiar with current mirrors. You can apply two possible techniques for improving the output impedance. You know the resulting limitations on the output voltage swing.
MicroLab, vlsi-25 (1/26)
JMM v1.2
Outline
Current mirrors and single-stage amplifiers with active loads (Johns&Martin):
- nodal analysis method
- simple CMOS current mirror (chap 3.1)
- common-source amplifier (chap 3.2)
- source-follower or common-drain amplifier (chap 3.3)
- common-gate amplifier (chap 3.4)
- source-degenerated current mirror (chap 3.5)
- high-output-impedance current mirrors (chap 3.6)
- cascode gain stage (chap 3.7)
Exercises:
- hand calculations
- spice simulations
Simple CMOS Current Mirror

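- used as a bias current source
- used to multiply currents
- used as a high output impedance
Q1 and Q2 have the same size, and both transistors are in the active region:
$I_{ds(sat)} = \frac{\mu_nC_{ox}}{2}\frac{W}{L}(V_{gs}-V_t)^2$
[Figure: simple CMOS current mirror: Iin into diode-connected Q1 (gate voltage V1), Q2 delivering Iout with output resistance rout; Id versus Vds showing the linear and active regions.]
Since $V_{gs1} = V_{gs2}$, it follows that $I_{in} = I_{out}$.
- consider the minimal output voltage
- consider the finite output impedance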
Simple CMOS Current Mirror (Q1 model)

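small-signal model (low frequency)
[Figure: low-frequency small-signal model of a fet (gm·vgs, gs·vs, rds).]
small-signal model for Q1:
[Figure: diode-connected Q1 driven by a test source vy; gm1·vgs1, gs1·vs1 and rds1 between node V1 and ground.]
small-signal model of the diode-connected transistor:
[Figure: Q1 reduces to a resistor 1/gm1 = rs1 at node v1.]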
Simple CMOS Current Mirror (small signal analysis)

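Small-signal model of the overall CMOS current mirror:
[Figure: Q1 modeled as 1/gm1; Q2 modeled as gm2·vgs2 and rds2, driven at the output by a test source vx with current ix.]
As no small-signal current flows through 1/gm1, vgs2 = 0:
[Figure: only rds2 remains between the test source vx and ground.]
The output resistance of the CMOS current mirror is therefore $r_{out} = r_{ds2}$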
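Since $r_{out} = r_{ds2}$, the output-current error caused by a change in output voltage follows directly. The Python sketch below uses the empirical rule $r_{ds} = 88000 \cdot L(\mu m)/I_D(mA)$ Ω that exercise 3.1 of this lecture provides; the function names are hypothetical.

```python
def rds_from_rule(l_um, id_ma, k=88000.0):
    """Output resistance (ohms) from the rule rds = k * L(um) / ID(mA)."""
    return k * l_um / id_ma


def mirror_current_change(delta_vout, rout):
    """Change of Iout (A) for a change of the output voltage (V)."""
    return delta_vout / rout
```

With L = 2 µm and ID = 0.1 mA this gives rout = 1.76 MΩ, and a 0.5 V output change shifts Iout by about 0.28 µA, matching the exercise result.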
Common Source Amplifier

- the common-source topology is the most popular gain stage, especially when a high input impedance is required
- a common use of simple current mirrors is in a single-stage amplifier with an active load
- active loads represent high-impedance output loads without using high-impedance resistors or large power-supply voltages
- for a given supply voltage, a larger gain can be achieved using active loads; for example, if a 1 MΩ load resistor were required with a 100 µA bias current, a 100 µA × 1 MΩ = 100 V power supply would be necessary
[Figure: common-source amplifier stage: input transistor Q1 driven by Vin; active load Q2, mirrored from the diode-connected Q3 biased by Ibias; output Vout with output resistance rout.]
Common Source Amplifier (small signal analysis)

It is assumed that the bias current is such that both transistors Q1 and Q2 are in the active region.
[Figure: common-source stage and its small-signal model: vin drives vgs1; gm1·vgs1 in parallel with rds1 and rds2 (combined as R2) at the output vout.]
$v_{gs1} = v_{in}$, $A_v = \frac{v_{out}}{v_{in}} = -g_{m1}R_2 = -g_{m1}(r_{ds1}\,\|\,r_{ds2})$
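The gain formula can be evaluated with the values of exercise 3.2 of this lecture: $r_{ds\text{-}n} = 88000 \cdot 2/0.1$ Ω, $r_{ds\text{-}p} = 50000 \cdot 2/0.1$ Ω and $g_{m1} = 0.45$ mA/V (the gm1 value is taken from the result of exercise 3.1 for the same device). The function names are hypothetical.

```python
def parallel(*rs):
    """Resistance of resistors in parallel."""
    return 1.0 / sum(1.0 / r for r in rs)


def cs_gain(gm1, rds1, rds2):
    """Small-signal gain of the common-source stage with active load."""
    return -gm1 * parallel(rds1, rds2)
```

`cs_gain(0.45e-3, 88000 * 2 / 0.1, 50000 * 2 / 0.1)` evaluates to about -287, the result quoted in exercise 3.2.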
Source-Follower or Common-Drain Amplifier

- the common-drain amplifier is commonly used as a voltage buffer and is thus called a source follower
- ideally, the small-signal voltage gain is close to unity
- although the circuit has no voltage gain, it does have current gain
- the dc level of the output voltage is not the same as the dc level of the input voltage
- note that the body effect is the major limitation on the small-signal gain
[Figure: common-drain amplifier stage: Q1 with gate at Vin and source at Vout; active load Q2, mirrored from Q3 biased by Ibias.]
Source-Follower (small-signal analysis)

Note that the voltage-controlled current source that models the body effect of the nfet has been included.
[Figure: source follower and its small-signal model: vin = vg1; gm1·vgs1, gs1·vs1 and rds1 between the drain vd1 (ac ground) and the source; load rds2 from the source to ground; vout = vs1.]
Nodal Equation Methodology

In order to minimize circuit-equation errors, a consistent methodology should be maintained when writing nodal equations:
- the first term is always the node at which the currents are being summed; this node voltage is multiplied by the sum of all admittances connected to the node, e.g. $v_{out}(g_{ds1} + g_{ds2})$
- the next, negative terms are the adjacent node voltages, each multiplied by all connecting admittances, e.g. $-v_d\,g_{ds1}$
- the last terms are any current sources, with a negative sign used if the current is shown to flow into the node, e.g. $+g_{s1}v_{s1} - g_{m1}v_{gs1}$
[Figure: source-follower small-signal model (vin = vg1, vout = vs1).]
Source-Follower (small-signal analysis, cont.)

[Figure: source-follower small-signal model (vin = vg1, vout = vs1).]
Nodal equation at $v_{s1} = v_{out}$ (with $v_{d1}$ at ac ground and $v_{gs1} = v_{in} - v_{out}$):
$v_{out}(g_{ds1} + g_{ds2}) + g_{s1}v_{out} - g_{m1}(v_{in} - v_{out}) = 0$
$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1} + g_{s1} + g_{ds1} + g_{ds2}}$
- gs1 is 5 to 10% of gm1; gds1 and gds2 are on the order of 1/10 of gs1
- the body-effect parameter gs1 is the major source of error causing the gain to be less than unity
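Plugging the typical ratios from this slide into the gain formula shows how much the body effect costs. The Python sketch below assumes gs1 = 0.1·gm1 (the upper end of the 5 to 10% range) and gds1 = gds2 = 0.1·gs1; the absolute gm1 value is hypothetical and cancels out.

```python
def sf_gain(gm1, gs1, gds1, gds2):
    """Small-signal gain of the source follower."""
    return gm1 / (gm1 + gs1 + gds1 + gds2)
```

`sf_gain(1e-3, 0.1e-3, 0.01e-3, 0.01e-3)` gives about 0.89; without the body-effect term the gain would be about 0.98.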
Common-Gate Amplifier

The common-gate stage with an active load is used when a relatively small input impedance is desired. Application examples: an input impedance of 50 Ω to terminate a transmission line, or the first stage of an amplifier designed to amplify current instead of voltage.
[Figure: common-gate amplifier stage: Q1 with gate at Vbias and source driven by Vin through RS; active load Q2, mirrored from Q3 biased by Ibias; small-signal model with gm1·vgs1, gs1·vs1, rds1, the input resistance rin and the load RL at vout.]
Common-Gate Amplifier (small-signal analysis)

[Figure: common-gate small-signal model with source resistance RS driven by vin, Q1 (gm1·vgs1, gs1·vs1, rds1) and the load RL at vout.]
Since $v_{gs1} = -v_{s1}$, the sources gm1·vgs1 and gs1·vs1 combine into a single source $(g_{m1}+g_{s1})v_{s1}$; the load is $R_L = r_{ds2}$ (only the active load is present).
Nodal analysis for the nodes vout and vs1 gives:
$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}+g_{s1}+g_{ds1}}{G_L+g_{ds1}}\cdot\frac{G_s}{G_s + \dfrac{g_{m1}+g_{s1}+g_{ds1}}{1+g_{ds1}/G_L}}$, with $G_s = 1/R_S$ and $G_L = 1/R_L$
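The closed-form common-gate gain can be cross-checked by solving the two nodal equations directly. In this Python sketch the nodal equations are written from the combined source $(g_{m1}+g_{s1})v_{s1}$ flowing from drain to source; the function names and numeric values are hypothetical. Both routes should agree to rounding error.

```python
def cg_gain_closed_form(gm1, gs1, gds1, Gs, GL):
    """Common-gate gain from the closed-form expression."""
    g = gm1 + gs1 + gds1
    return (g / (GL + gds1)) * Gs / (Gs + g / (1 + gds1 / GL))


def cg_gain_nodal(gm1, gs1, gds1, Gs, GL, vin=1.0):
    """Solve the nodal equations for [vout, vs1] with Cramer's rule.

    node vout: (GL + gds1) * vout - (gm1 + gs1 + gds1) * vs1 = 0
    node vs1: -gds1 * vout + (Gs + gm1 + gs1 + gds1) * vs1 = Gs * vin
    """
    a, b = GL + gds1, -(gm1 + gs1 + gds1)
    c, d = -gds1, Gs + gm1 + gs1 + gds1
    e, f = 0.0, Gs * vin
    det = a * d - b * c
    vout = (e * d - b * f) / det
    return vout / vin
```

For a 50 Ω source (Gs = 0.02 S), the gain is positive (non-inverting), as expected for a common-gate stage.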
Summary: Gain Stages

- common-source amplifier: gain stage with high input impedance:
$A_v = \frac{v_{out}}{v_{in}} = -g_{m1}(r_{ds1}\,\|\,r_{ds2})$
- common-drain amplifier (source follower): used as a voltage buffer with a small-signal voltage gain close to 1, but it can provide current gain:
$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1}+g_{s1}+g_{ds1}+g_{ds2}}$
- common-gate amplifier: used as a gain stage when a small input impedance is desired, and as the first stage of an amplifier designed to amplify current rather than voltage:
$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}+g_{s1}+g_{ds1}}{G_L+g_{ds1}}\cdot\frac{G_s}{G_s + \dfrac{g_{m1}+g_{s1}+g_{ds1}}{1+g_{ds1}/G_L}}$
Source-Degenerated Current Mirror

General consequences of a finite output resistance:
- deviations in the large-signal behavior
- difficulties when used as an active load
The output impedance of the basic 2-transistor current mirror can be increased by degeneration resistors Rs.
[Figure: source-degenerated current mirror (Q1 and Q2 with source resistors Rs) and its small-signal model: the gate of Q2 is at 0 V; Q2 is modeled by gm2·vgs, gs·vs and rds2 with Rs in its source and is driven by a test source vx with current ix.]
Impedance increase:
$r_{out} = \frac{v_x}{i_x} = r_{ds2}\left[1 + R_s(g_{m2} + g_{s2} + g_{ds2})\right]$
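The improvement factor $1 + R_s(g_{m2}+g_{s2}+g_{ds2})$ can be evaluated quickly. This Python sketch uses Rs = 5 kΩ and rds2 from the rule of exercise 3.4 (L = 2 µm, ID = 0.1 mA), but gm2 = 1.4 mA/V and gs2 = 0.2 mA/V are hypothetical values, so the resulting factor (about 9.0) only illustrates the order of magnitude of the exercise's answer.

```python
def degenerated_rout(rds2, rs, gm2, gs2, gds2):
    """Output resistance of the source-degenerated current mirror."""
    return rds2 * (1 + rs * (gm2 + gs2 + gds2))
```

Even a few kΩ of degeneration raises the output resistance by almost an order of magnitude, at the cost of an extra voltage drop Rs·Iout.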
High-Output-Impedance Current Mirrors: Cascode Current Mirror

- the output impedance of a cascode current mirror is increased by a factor of 10 to 100 compared to a basic current mirror
- a disadvantage is the reduced output voltage swing, because transistors may enter the triode region
[Figure: cascode current mirror: Q1/Q2 at the bottom, Q3/Q4 stacked on top; Iin into Q3, Iout from Q4 with output resistance rout at Vout.]
$V_{out} > 2V_{eff} + V_{tn}$, $r_{out} \approx r_{ds4}r_{ds2}g_{m4}$
Cascode Current Mirror

- reduced output voltage swing: a transistor is in the active region if $V_{ds} > V_{eff} = V_{gs} - V_{tn}$
- all transistors have the same size and current Id, so $V_{gs} = V_{eff} + V_{tn}$ with $V_{eff} = \sqrt{\frac{2I_d}{\mu_nC_{ox}(W/L)}}$
[Figure: cascode current mirror (Q1 to Q4, Iin, Iout, rout at Vout).]
$V_{g3} = V_{gs1} + V_{gs3} = 2V_{eff} + 2V_{tn}$
$V_{ds2} = V_{g3} - V_{gs4} = V_{g3} - (V_{eff} + V_{tn}) = V_{eff} + V_{tn}$
$V_{out} > V_{ds2} + V_{eff} = 2V_{eff} + V_{tn}$
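The swing limit can be made concrete with numbers. This Python sketch assumes hypothetical values (Id = 100 µA, µnCox = 200 µA/V², W/L = 5, Vtn = 0.7 V) and compares the cascode mirror's minimum output voltage with the simple mirror, which only needs $V_{out} > V_{eff}$.

```python
import math


def cascode_min_vout(i_d, k_prime, w_over_l, vtn):
    """Return (minimum Vout of the cascode mirror, Veff)."""
    veff = math.sqrt(2 * i_d / (k_prime * w_over_l))
    return 2 * veff + vtn, veff
```

With these values Veff ≈ 0.45 V, so the cascode mirror needs about 1.59 V at its output, compared with about 0.45 V for the simple mirror: the price of the higher output impedance.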
Cascode Current Mirror (cont)

very high output impedance
[Figure: small-signal model of the cascode current mirror: no current flows in the input branch, so vs3 = 0 V and vgs2 = 0, and Q2 reduces to rds2; Q4 (gm4·vgs4, gs4·vs4, rds4) with vgs4 = -vs4 sits on top of rds2 and is driven at the output (iout, vout).]
$r_{out} = r_{ds4}\left[1 + r_{ds2}(g_{m4} + g_{s4} + g_{ds4})\right] \approx r_{ds2}r_{ds4}g_{m4}$
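The exact and approximate cascode output resistances can be compared numerically. The device values in this Python sketch are hypothetical (rds = 100 kΩ, gm4 = 1 mA/V, gs4 = 0.1 mA/V, gds4 = 10 µA/V); with these, the cascode raises the output resistance from rds2 = 100 kΩ to about 11 MΩ, a factor of roughly gm4·rds4 = 100, at the top of the 10 to 100 range quoted above.

```python
def cascode_rout(rds2, rds4, gm4, gs4, gds4):
    """Return (exact, approximate) output resistance of the cascode mirror."""
    exact = rds4 * (1 + rds2 * (gm4 + gs4 + gds4))
    approx = rds2 * rds4 * gm4
    return exact, approx
```

The approximation neglects gs4 and gds4 against gm4 and the leading 1, so it slightly underestimates the exact value.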
High-Output-Impedance Current Mirrors: Wilson Current Mirror

- performance very similar to the cascode current mirror, but with 1/2 of its output impedance
- uses shunt-series feedback to increase the output impedance
[Figure: Wilson current mirror with transistors Q1 to Q4, input current Iin (input resistance rin) and output current Iout (output resistance rout).]
Q2 senses the output current and mirrors it to Id1. Iin and Id1 must match precisely, otherwise Vg3 increases/decreases.
Cascode Gain Stage

- the cascode configuration for single-stage amplifiers is commonly used in modern IC design
- quite large gain for a single stage, due to the large impedance at the output
- to enable the large gain, high-quality cascode current mirrors are necessary at the output
- large gain, normally without any speed degradation
- the voltage across the input drive fet is limited, minimizing short-channel effects in modern technologies
- configuration: a common-source-connected transistor feeding into a common-gate-connected transistor
[Figure: telescopic cascode amplifier (input Q1 with the n-channel common-gate transistor Q2 biased at Vbias, current source I, load CL) and folded-cascode amplifier (input Q1 with the p-channel common-gate transistor Q2, bias currents Ibias and Ibias2, load CL); the folded cascode allows identical input and output dc levels.]
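Cascode Gain Stage: Telescopic Cascode Amplifier

[Figure: telescopic cascode amplifier (Q1, Q2, Vbias, Ibias, CL) and the small-signal model used to find its output impedance: a test source vx drives ix into Q2 (gm2·vgs2, gs2·vs2, rds2), whose source sits on rds1 of Q1.]
Output impedance of the cascode stage: $r_x \approx g_{m2}r_{ds1}r_{ds2}$
If the high-impedance Ibias is realized with a cascode current source, its impedance is $R_L \approx g_{mp}r_{dsp}^2$. For $g_{dsn} = g_{dsp}$ and $g_{mn} = g_{mp}$:
$r_{out} = r_x\,\|\,R_L \approx \frac{g_mr_{ds}^2}{2}$, $|A_v| \approx g_mr_{out} \approx \frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$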
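The gain advantage of the telescopic cascode over a simple common-source stage is roughly a factor $g_mr_{ds}$. The Python sketch below compares both for hypothetical, equal n- and p-channel parameters (gm = 1 mA/V, rds = 100 kΩ), using the simplified cascode impedances above.

```python
def parallel(a, b):
    """Two resistances in parallel."""
    return a * b / (a + b)


def telescopic_gain(gm, rds):
    """|Av| of the telescopic cascode with a cascode current-source load."""
    r_x = gm * rds * rds          # output impedance of the cascode stage
    r_l = gm * rds * rds          # cascode current-source load (gmp = gm, rdsp = rds)
    return gm * parallel(r_x, r_l)


def cs_gain_mag(gm, rds):
    """|Av| of a simple common-source stage with an active load of equal rds."""
    return gm * parallel(rds, rds)
```

With gm·rds = 100, the common-source stage reaches a gain of 50 while the telescopic cascode reaches $\frac{1}{2}(g_m/g_{ds})^2 = 5000$.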
Summary: Cascode and Source Degeneration

- source-degenerated current mirror: by adding a resistor Rs at the source node of a current-mirror FET, the output impedance can be increased:
  $r_{out} = \frac{v_{out}}{i_{out}} \approx r_{ds2}\,[1 + R_s (g_{m2} + g_{s2})]$
- cascode current mirror: the output impedance of a current mirror can be increased further by using cascode FETs:
  $r_{out} = \frac{v_{out}}{i_{out}} \approx r_{ds2}\, r_{ds4}\, g_{m4}$, which requires $V_{out} \ge 2V_{eff} + V_{tn}$
- cascode gain stage: due to the large impedance at the output, high gain can be realized with cascode gain stages:
  $A_v = \frac{v_{out}}{v_{in}} \approx -\frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$
Coming Up...
Next topic: frequency response of single-stage amplifiers
Readings for next time (Johns&Martin):
- nodal analysis method
- simple CMOS current mirror (chap 3.1)
- common-source amplifier (chap 3.2)
- source follower or common-drain amplifier (chap 3.3)
- common-gate amplifier (chap 3.4)
- source-degenerated current mirror (chap 3.5)
- high-output-impedance current mirrors (chap 3.6)
- cascode gain stage (chap 3.7)
Exercises:
Have a look at the exercises in Johns&Martin. CAD exercise Ex601

Exercises VLSI-25
#1
Johns&Martin chap 3.1 pp127, problem 3.1 (difficulty: easy): Consider the current mirror shown on transparency vlsi25/3, where Iin = 100µA and each transistor has W = 10µm and L = 2µm. Given rds = 88000·L(µm)/ID(mA) [Ω], find rout for the current mirror and the value of gm1. Also estimate the change in Iout for a 0.5V change in the output voltage. Result: rout = 1.76MΩ, gm1 = 0.45mA/V, ΔIout = 0.28µA
Johns&Martin chap 3.2 pp129, problem 3.2 (difficulty: easy): Consider the common-source stage shown on transparency vlsi25/7, where Iin = 100µA and all transistors have W = 10µm and L = 2µm. Given rds-n = 88000·L(µm)/ID(mA) and rds-p = 50000·L(µm)/ID(mA), what is the gain of the stage? Result: Av = -287
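The arithmetic behind problems 3.1 and 3.2 can be checked with a few lines. This is a sketch: `rds_empirical` is a hypothetical helper for the empirical rule rds = k·L(µm)/ID(mA) stated in the problems, and gm1 = 0.45mA/V is taken from the result of problem 3.1 rather than recomputed.

```python
def rds_empirical(k: float, L_um: float, ID_mA: float) -> float:
    """Empirical output resistance rds = k * L(um) / ID(mA), in ohms."""
    return k * L_um / ID_mA

# Problem 3.1: simple current mirror, Iin = 100 uA, L = 2 um -> rout = rds2
rds_n = rds_empirical(88000, 2.0, 0.1)
delta_iout = 0.5 / rds_n                  # change in Iout for a 0.5 V output change

# Problem 3.2: common-source stage with active load, Av = -gm1 * (rds_n || rds_p)
rds_p = rds_empirical(50000, 2.0, 0.1)
gm1 = 0.45e-3                             # gm1 from the result of problem 3.1
r_par = rds_n * rds_p / (rds_n + rds_p)
av = -gm1 * r_par

print(f"rout = {rds_n/1e6:.2f} MOhm, dIout = {delta_iout*1e6:.2f} uA, Av = {av:.0f}")
```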

#2
Johns&Martin chap 3.3 pp131, problem 3.3 (difficulty: easy): Consider the source follower shown on transparency vlsi25/8, where Ibias = 100µA and all transistors, designed in the Alcatel 0.5µm process, have W = 10µm and L = 2µm. Given γn = 0.45V^1/2, Vsb = 2V and rds-n = 88000·L(µm)/ID(mA), what is the gain of the stage? Result: Av = 0.88
Johns&Martin chap 3.5 pp136, problem 3.4 (difficulty: easy): Consider the source-degenerated current mirror shown on transparency vlsi25/15, where Ibias = 100µA and all transistors, designed in the Alcatel 0.5µm process, have W = 100µm and L = 2µm. Given γn = 0.45V^1/2, Vsb = 2V, Rs = 5kΩ and rds-n = 88000·L(µm)/ID(mA), what is the increase in output resistance compared to the simple current mirror? Result: increase = 9.1, rout = 16MΩ
#3
Johns&Martin chap 3.6 pp138, problem 3.5 (difficulty: easy): Consider the cascode current mirror shown on transparency vlsi25/15, where Iin = 100µA and all transistors have W = 10µm and L = 2µm. Given VSB4 = 1V and rds-n = 50000·L(µm)/ID(mA), what are the output impedance and the minimum output voltage? Result: rout = 527MΩ, Vout(min) = 1.5V
Johns&Martin chap 3.7 pp142, problem 3.6 (difficulty: easy): Consider the telescopic cascode gain stage shown on transparency vlsi25/20, assuming gm = 0.5mA/V and rds = 100kΩ. What are the output impedance and the gain? Result: rout = 2.5MΩ, Av = -1250

VLSI Design II
Frequency Response of Single Stage Amplifiers
[Figure: amplifier magnitude response in dB versus frequency, 10^3 to 10^9 Hz]
Circuit analysis:
- the precise way: solving complex equations
- the approximate way: finding the dominant pole
- the handy way: letting Spice do it precisely
Goal: You are able to identify the dominant pole in a transistor circuit. You can approximately determine the contribution of each node in a circuit to the total frequency response.
MicroLab, vlsi26 (1/29)
JMM v1.2
Outline
Frequency response:
- common-source amplifier
- source-follower amplifier
- source-follower amplifier with compensation technique
- cascode gain stage
Johns&Martin:
- frequency response (chap 3.11)
Gray&Meyer:
- estimation of dominant poles: zero-value time constant analysis (pp500 ff.) (Analysis and Design of Analog Integrated Circuits, 3rd edition, Wiley and Sons, ISBN 0-471-59984-0)
Exercises:
- hand calculations
- spice simulations

Frequency Response: Dominant-Pole Approximation

Precise calculation of the frequency response is a complex task, and different approximation methods therefore exist. One method is the zero-value time constant analysis. First, some ideas about the dominant-pole approximation are developed.
Transfer function obtained by small-signal analysis:
$A(s) = \frac{N(s)}{D(s)} = \frac{a_0 + a_1 s + a_2 s^2 + \dots + a_m s^m}{1 + b_1 s + b_2 s^2 + \dots + b_n s^n}$
Very often the zeros are unimportant, thus
$A(s) \approx \frac{K}{(1 + s/p_1)(1 + s/p_2)\cdots(1 + s/p_n)}$
where K is a constant and p1, p2, ..., pn are the poles of the transfer function (written as positive pole frequencies). Expanding the denominator and comparing the coefficients of s gives
$b_1 = \sum_{i=1}^{n} \frac{1}{p_i}$
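Dominant-Pole Approximation (cont 2)

$b_1 = \sum_{i=1}^{n} \frac{1}{p_i}$
An important practical case occurs when one pole is dominant:
$p_1 \ll p_2, p_3, \dots$
thus
$b_1 = \frac{1}{p_1} + \sum_{i=2}^{n} \frac{1}{p_i} \approx \frac{1}{p_1}$, since $\frac{1}{p_1} \gg \sum_{i=2}^{n} \frac{1}{p_i}$
The gain magnitude in the frequency domain is
$|A(j\omega)| = \frac{K}{\sqrt{\left[1 + (\omega/p_1)^2\right]\left[1 + (\omega/p_2)^2\right]\cdots\left[1 + (\omega/p_n)^2\right]}}$
With a dominant pole we simply get
$|A(j\omega)| \approx \frac{K}{\sqrt{1 + (\omega/p_1)^2}}$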
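Dominant-Pole Approximation (cont 3)

This approximation is quite accurate as long as $\omega \lesssim p_1$.
Thus, for a dominant-pole situation, the -3dB frequency is
$\omega_{-3dB} \approx p_1 \approx \frac{1}{b_1}$
[Figure: pole plot in the s-plane (σ and jω axes) for a circuit with a dominant pole: p1 lies close to the origin, p2 and p3 much further out on the negative real axis]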
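To see how good the estimate $\omega_{-3dB} \approx 1/b_1$ is, the sketch below compares it with the exact -3dB frequency of an all-pole transfer function, found by bisection. The pole values (10^4, 10^6 and 10^7 rad/s) are hypothetical and chosen only to give a clearly dominant first pole.

```python
import math

def gain_mag(w, poles, K=1.0):
    """|A(jw)| = K / sqrt(prod(1 + (w/p)^2)) for real pole frequencies p (rad/s)."""
    return K / math.sqrt(math.prod(1.0 + (w / p) ** 2 for p in poles))

def exact_w3db(poles, K=1.0):
    """-3 dB frequency by bisection on |A(jw)| = K/sqrt(2)."""
    lo, hi = 0.0, min(poles)           # |A| at the lowest pole is already below K/sqrt(2)
    target = K / math.sqrt(2.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gain_mag(mid, poles, K) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

poles = [1e4, 1e6, 1e7]                # hypothetical pole frequencies (rad/s)
b1 = sum(1.0 / p for p in poles)       # b1 = sum of 1/p_i
approx = 1.0 / b1                      # dominant-pole estimate
exact = exact_w3db(poles)
print(f"1/b1 = {approx:.0f} rad/s, exact = {exact:.0f} rad/s")
```

With poles two decades apart, the estimate is about 1% below the exact value. It is slightly pessimistic because the higher poles add to $b_1$ but barely affect the magnitude near $p_1$.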

Zero-Value Time Constant

Method for finding the time constant associated with a capacitor in the small-signal equivalent circuit:
- replace the capacitor Cx by a voltage source Vx
- set all independent sources to zero (ground)
- set all other network capacitors to zero (open circuits)
- find the admittance Yx (= 1/Rx) driven by the voltage source Vx
- the time constant τx is given by:
$\tau_x = R_x C_x$

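Frequency Response: Zero-Value Time Constant

[Figure: example circuit (source vin with Rin, base resistance rb, controlled source gm·v1, load RL, capacitor Cx) and its small-signal model; the capacitors Cπ, Cμ and Cx carry the currents i1, i2, i3 and have the voltages v1, v2, v3 across them]
We can show that with this choice of variables the circuit equations are of the form:
$i_1 = (g_{11} + sC_\pi)\,v_1 + g_{12}\,v_2 + g_{13}\,v_3$
$i_2 = g_{21}\,v_1 + (g_{22} + sC_\mu)\,v_2 + g_{23}\,v_3$
$i_3 = g_{31}\,v_1 + g_{32}\,v_2 + (g_{33} + sC_x)\,v_3$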
Zero-Value Time Constant (cont 1)

The poles of the transfer function are the zeros of the determinant Δ(s) of the circuit equations, which can be written in the form:
$\Delta(s) = K_0 + K_1 s + K_2 s^2 + K_3 s^3 = K_0\,(1 + b_1 s + b_2 s^2 + b_3 s^3)$
If all capacitors are zero:
$K_0 = \Delta\big|_{C_\pi = C_\mu = C_x = 0} = \Delta_0$
Consider now the term K1·s. This is the sum of the terms involving s that are obtained when the system determinant is evaluated. However, s clearly only occurs when associated with a capacitance:
$K_1 s = h_1 s C_\pi + h_2 s C_\mu + h_3 s C_x$
The h terms are constants. h1 can be evaluated by expanding the determinant about the first row:
$\Delta(s) = (g_{11} + sC_\pi)\,\Delta_{11} + g_{12}\,\Delta_{12} + g_{13}\,\Delta_{13}$
with Δij the cofactors of the determinant. The term in sCπ is found by evaluating Δ11 with Cμ and Cx equal to zero:
$h_1 = \Delta_{11}\big|_{C_\mu = C_x = 0}$
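Now consider the expansion of the determinant about the second row:
$\Delta(s) = g_{21}\,\Delta_{21} + (g_{22} + sC_\mu)\,\Delta_{22} + g_{23}\,\Delta_{23}$
The term in sCμ is found by evaluating Δ22 with Cπ and Cx equal to zero:
$h_2 = \Delta_{22}\big|_{C_\pi = C_x = 0}$
and similarly
$h_3 = \Delta_{33}\big|_{C_\pi = C_\mu = 0}$
Combining these equations gives:
$K_1 = \Delta_{11}\big|_{C_\mu = C_x = 0}\,C_\pi + \Delta_{22}\big|_{C_\pi = C_x = 0}\,C_\mu + \Delta_{33}\big|_{C_\pi = C_\mu = 0}\,C_x$
and:
$b_1 = \frac{K_1}{K_0} = \frac{\Delta_{11}|_{C_\mu = C_x = 0}}{\Delta_0}\,C_\pi + \frac{\Delta_{22}|_{C_\pi = C_x = 0}}{\Delta_0}\,C_\mu + \frac{\Delta_{33}|_{C_\pi = C_\mu = 0}}{\Delta_0}\,C_x$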
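Now consider putting i2 = i3 = 0 and solving for v1:
$v_1 = \frac{\Delta_{11}}{\Delta(s)}\, i_1$
The driving-point resistance at the Cπ node pair with all capacitors equal to zero is therefore:
$\frac{\Delta_{11}|_{C_\pi = C_\mu = C_x = 0}}{\Delta_0}$
We now define
$R_{\pi 0} = \frac{\Delta_{11}|_{C_\pi = C_\mu = C_x = 0}}{\Delta_0}$
Since Cπ appears only in the first row and first column of the determinant, the cofactor Δ11 does not depend on Cπ, and we can write:
$\Delta_{11}\big|_{C_\mu = C_x = 0} = \Delta_{11}\big|_{C_\pi = C_\mu = C_x = 0} = R_{\pi 0}\,\Delta_0$
Applying the same argument to Cμ and Cx gives
$b_1 = R_{\pi 0} C_\pi + R_{\mu 0} C_\mu + R_{x 0} C_x$
Thus:
$\omega_{-3dB} \approx \frac{1}{b_1} = \frac{1}{R_{\pi 0} C_\pi + R_{\mu 0} C_\mu + R_{x 0} C_x}$
The sum of the zero-value time constants therefore leads to the -3dB frequency.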
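A small circuit makes the result concrete. For a hypothetical two-section RC ladder (vin, R1, node 1 with C1 to ground, R2, node 2 with C2 to ground = vout), nodal analysis gives $H(s) = 1/[1 + s(R_1C_1 + R_1C_2 + R_2C_2) + s^2 R_1R_2C_1C_2]$. The zero-value resistances are R1 for C1 (C2 open) and R1 + R2 for C2 (C1 open), so their sum equals $b_1$ exactly. The sketch below checks this and compares $1/b_1$ with the exact -3dB frequency. The element values are assumptions for illustration.

```python
import math

def zvtc_b1(pairs):
    """b1 as the sum of zero-value time constants R_x0 * C_x."""
    return sum(r * c for r, c in pairs)

def ladder_mag(w, R1, C1, R2, C2):
    """Exact |H(jw)| of the two-section RC ladder (from nodal analysis)."""
    s = 1j * w
    den = 1 + s * (R1 * C1 + R1 * C2 + R2 * C2) + s**2 * R1 * R2 * C1 * C2
    return abs(1 / den)

R1, C1, R2, C2 = 10e3, 1e-12, 10e3, 5e-12     # hypothetical element values
# Zero-value resistances: C1 sees R1 (C2 open), C2 sees R1 + R2 (C1 open)
b1 = zvtc_b1([(R1, C1), (R1 + R2, C2)])
w_est = 1 / b1

# Exact -3 dB frequency by bisection (|H| decreases monotonically)
lo, hi = 0.0, 10 * w_est
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if ladder_mag(mid, R1, C1, R2, C2) > 1 / math.sqrt(2):
        lo = mid
    else:
        hi = mid
w_exact = 0.5 * (lo + hi)
print(f"ZVTC estimate: {w_est:.3e} rad/s, exact: {w_exact:.3e} rad/s")
```

The estimate lands within a few percent of the exact value and errs on the conservative (lower) side.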
Summary: Frequency Analysis Methods

The precise way: add the parasitic capacitors to the equivalent circuit and use nodal analysis to evaluate the transfer function.
The approximate way: if there exists a pole p1 << p2, p3, ..., and the transfer function A(s) = N(s)/D(s) is already given with
$D(s) = 1 + b_1 s + b_2 s^2 + \dots + b_n s^n$
then the dominant pole is given by $p_1 \approx 1/b_1$.
- The dominant pole may be found directly in the circuit diagram by looking for the node with the largest impedance. Take care of the Miller effect.
- The time constant (and its influence on the frequency response) associated with a single parasitic capacitor can be estimated with the zero-value time constant method:
  - set all independent sources to zero
  - replace the capacitor of interest, Cx, by a voltage source Vx
  - set all other capacitors to zero
  - evaluate the resistance Rx seen by the voltage source Vx
  - the time constant is equal to Cx·Rx
The handy way: AC analysis with Spice.

Frequency Response: Common-Source Amplifier

Precise calculation of the frequency response is most often left to computer simulation. Much insight can be obtained by finding the dominant frequency effects (dominant poles and zeros).
[Figure: small-signal model of the common-source amplifier: source vin with Rin driving node v1; Cgs1, Cgd1 and controlled source gm1·vgs1; R2 = rds of Q1 in parallel with rds of Q2; C2 = Cdb of Q1 and Q2 plus the load CL]
Nodal analysis ...

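Frequency Analysis (cont)

$\frac{v_{out}}{v_{in}} = \frac{-g_{m1} R_2 \left(1 - s\,C_{gd1}/g_{m1}\right)}{1 + s\,a + s^2\,b}$
with
$a = R_{in}\left[C_{gs1} + C_{gd1}(1 + g_{m1} R_2)\right] + R_2\,(C_{gd1} + C_2)$
$b = R_{in} R_2\,(C_{gd1} C_{gs1} + C_{gs1} C_2 + C_{gd1} C_2)$
At frequencies where the gain has just started to decrease:
$\omega_{-3dB} \approx \frac{1}{a} = \frac{1}{R_{in}\left[C_{gs1} + C_{gd1}(1 + g_{m1} R_2)\right] + R_2\,(C_{gd1} + C_2)}$
The term Cgd1(1 + gm1R2) is the Miller capacitance. For Rin >> R2 the first term of a dominates.
Analysis for high frequencies, assuming widely separated poles:
$D(s) = \left(1 + \frac{s}{p_1}\right)\left(1 + \frac{s}{p_2}\right) \approx 1 + \frac{s}{p_1} + \frac{s^2}{p_1 p_2}$
so that $p_2 \approx a/b$, which with the Miller term dominating a gives
$p_2 \approx \frac{g_{m1} C_{gd1}}{C_{gd1} C_{gs1} + C_{gs1} C_2 + C_{gd1} C_2}$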
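The sketch below applies these expressions to the data of problem 3.8 (vlsi-26/6). It assumes the square-law transconductance $g_m = \sqrt{2\mu_nC_{ox}(W/L)I_D}$ and the empirical rds rules given in the problem. The result reproduces the stated -3dB frequency of about 554kHz.

```python
import math

# Data of problem 3.8 (common-source stage)
ID, W, L = 100e-6, 100.0, 1.6                 # bias current (A), width and length (um)
un_cox = 90e-6                                 # A/V^2
Rin, CL = 180e3, 0.3e-12
Cgs1, Cgd1, Cdb1, Cdb2 = 0.2e-12, 15e-15, 20e-15, 36e-15

gm1 = math.sqrt(2 * un_cox * (W / L) * ID)     # square-law transconductance
rds_n = 8000 * L / (ID * 1e3)                  # empirical rds = k * L(um) / ID(mA)
rds_p = 12000 * L / (ID * 1e3)
R2 = rds_n * rds_p / (rds_n + rds_p)           # R2 = rds1 || rds2
C2 = CL + Cdb1 + Cdb2                          # drain capacitances plus load

a = Rin * (Cgs1 + Cgd1 * (1 + gm1 * R2)) + R2 * (Cgd1 + C2)
f_3db = 1 / (2 * math.pi * a)                  # w_-3dB ~ 1/a
w_p2 = gm1 * Cgd1 / (Cgd1 * Cgs1 + Cgs1 * C2 + Cgd1 * C2)
print(f"f_-3dB ~ {f_3db/1e3:.0f} kHz, second pole ~ {w_p2/(2*math.pi)/1e6:.0f} MHz")
```

Most of $a$ comes from the Miller term $R_{in}C_{gd1}(1 + g_{m1}R_2)$, even though Cgd1 is only 15fF.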
Frequency Response: Source-Follower Amplifier

Source followers can have complex poles and thus exhibit overshoot. A compensation technique that results in only real-axis poles, and therefore no overshoot, is shown.
[Figure: source follower Q1 with current source Ibias and load CL, driven by Iin through Rin || Cin; small-signal model with Cgd1, Cgs1, gm1·vgs1, gs1·vs1, rds1, rds2 and Cs at the output vout]
Cs = CL + Csb1

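Source-Follower Amplifier (cont 1)

[Figure: simplified small-signal model: node vg1 driven by iin through Rin || C'in (C'in = Cin + Cgd1); Cgs1 and gm1·vgs1 between vg1 and vs1 = vout; Rs1 || Cs at the output; Yg is the admittance looking into the gate]
$R_{s1} = r_{ds1} \parallel r_{ds2} \parallel (1/g_{s1})$, $G_{s1} = 1/R_{s1}$
Procedure:
1. the gain from vg1 to vout is found
2. the admittance Yg looking into the gate of Q1 (without considering Cgd1) is found
3. the gain from iin to vg1 is found
4. the overall gain from iin to vout is found and the results are interpreted
Step 1, gain from vg1 to vout (nodal equation at the source node):
$v_{out}\,(sC_s + sC_{gs1} + G_{s1}) - v_{g1}\,sC_{gs1} - g_{m1}\,(v_{g1} - v_{out}) = 0$
$\frac{v_{out}}{v_{g1}} = \frac{sC_{gs1} + g_{m1}}{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}$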

Step 2, admittance Yg looking into the gate of Q1 (without considering Cgd1):
$Y_g = \frac{i_{g1}}{v_{g1}} = \frac{sC_{gs1}\,(sC_s + G_{s1})}{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}$
Step 3, gain from iin to vg1:
$\frac{v_{g1}}{i_{in}} = \frac{s(C_{gs1} + C_s) + g_{m1} + G_{s1}}{a + s\,b + s^2 c}$
where, with $G_{in} = 1/R_{in}$:
$a = G_{in}(g_{m1} + G_{s1})$
$b = G_{in} C_s + C'_{in}(g_{m1} + G_{s1}) + C_{gs1}(G_{s1} + G_{in})$
$c = C_{gs1} C_s + C'_{in}(C_{gs1} + C_s)$
Step 4, overall gain from iin to vout:
$A(s) = \frac{v_{out}}{i_{in}} = \frac{sC_{gs1} + g_{m1}}{a + s\,b + s^2 c}$

The transfer function can be written in the standard second-order form
$A(s) = A(0)\,\frac{N(s)}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$
where ω0 is the pole frequency and Q is the Q factor.
- frequency response: there is no peaking, and the maximum of the transfer function is at dc, if $Q \le 1/\sqrt{2} \approx 0.707$; ω0 is the -3dB frequency if $Q = 1/\sqrt{2}$
- step input: no overshoot for $Q \le 0.5$; overshoot for $Q > 0.5$ (complex-conjugate poles):
$\%\,\text{overshoot} = 100\, e^{-\pi / \sqrt{4Q^2 - 1}}$
For the source follower:
$\omega_z = \frac{g_{m1}}{C_{gs1}}$, $\omega_0 = \sqrt{\frac{G_{in}(g_{m1} + G_{s1})}{C_{gs1} C_s + C'_{in}(C_{gs1} + C_s)}}$
$Q = \frac{\sqrt{G_{in}(g_{m1} + G_{s1})\left[C_{gs1} C_s + C'_{in}(C_{gs1} + C_s)\right]}}{G_{in} C_s + C'_{in}(g_{m1} + G_{s1}) + C_{gs1}(G_{s1} + G_{in})}$
Source-follower circuits can exhibit large amounts of overshoot under certain conditions. In practical circuits, the parasitic capacitances and the output capacitance result in only moderate overshoot under worst-case conditions.
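The second-order quantities follow mechanically from the denominator $a + sb + s^2c$: $\omega_0 = \sqrt{a/c}$ and $Q = \sqrt{ac}/b$. The sketch below codes these relations together with the overshoot formula. It also evaluates $\omega_z = g_{m1}/C_{gs1}$ for problem 3.9, assuming the square-law $g_{m1}$ with $\mu_nC_{ox}$ = 90µA/V², W/L = 100/1.6 and ID = 100µA. The helper names are illustrative.

```python
import math

def overshoot_percent(Q: float) -> float:
    """% overshoot of a 2nd-order step response; zero for Q <= 0.5 (real poles)."""
    if Q <= 0.5:
        return 0.0
    return 100.0 * math.exp(-math.pi / math.sqrt(4 * Q * Q - 1))

def second_order_params(a: float, b: float, c: float):
    """For a denominator a + b*s + c*s^2: w0 = sqrt(a/c), Q = sqrt(a*c)/b."""
    return math.sqrt(a / c), math.sqrt(a * c) / b

# Zero of the source follower in problem 3.9: wz = gm1 / Cgs1
gm1 = math.sqrt(2 * 90e-6 * (100 / 1.6) * 100e-6)   # about 1.06 mA/V
wz = gm1 / 0.2e-12
print(f"wz = {wz:.2e} rad/s, overshoot(Q = 0.8) = {overshoot_percent(0.8):.1f} %")
```

For Q = 0.8 the formula gives about 8.1%, matching the overshoot stated for problem 3.9.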
Source-Follower Amplifier: Compensation Technique

Source followers can have complex poles and thus exhibit overshoot. Overshoot may be reduced by:
- increasing Cin or Cs, or both
- adding a compensation network
[Figure: source follower Q1 (Ibias, load CL, input Iin with Rin || Cin) with a compensation network consisting of R1 and C1]
The values of C1 and R1 follow from the small-signal analysis and are functions of gm1, Gs1, Cgs1 and Cs. The analysis also uses
$C_2 = \frac{C_{gs1} C_s}{C_{gs1} + C_s}$
(see Johns/Martin pp160-162 for the complete expressions)
Frequency Response: Common-Gate Amplifier

The frequency response of the common-gate stage is usually superior to that of the common-source stage, due to the low impedance rin at the source node, assuming GL = (sCL + gds2) is not considerably smaller than gds1.
[Figure: common-gate amplifier: Q1 with gate at Vbias, current source Ibias, output vout loaded by CL]
(see Johns/Martin pp160-162)

Frequency Response: High-Output-Impedance Mirrors

Both the Wilson and the cascode current mirrors introduce high-frequency poles into the signal transfer function. The approximate time constant of these poles is Cgs/gm; the proof of this statement can be obtained from a high-frequency, small-signal analysis.
[Figure: Wilson current mirror (left) and cascode current mirror (right), each with transistors Q1 to Q4, input current Iin and output current Iout (output resistance rout, output voltage Vout)]
(see Johns/Martin pp163)

Frequency Response: Cascode Gain Stage

The exact high-frequency analysis of a cascode gain stage is usually left to computer simulation. At high frequencies, the time constant of the output node almost always dominates, since the impedance at that node is so large:
$C_{out} = (C_{gd2} + C_{db2}) + C_L + C_{bias}$, where CL is normally the major contributor
[Figure: telescopic cascode gain stage: Q1 (input Vin), Q2 (gate at Vbias), Ibias, output Vout with load CL]
$\omega_{-3dB} \approx \frac{1}{R_{out} C_L} \approx \frac{2\, g_{ds}^2}{g_m C_L}$
Cascode Gain Stage (cont 1)

The zero-value time constant analysis method is used.
[Figure: telescopic cascode stage and its small-signal model: vin with Rin drives vg1; Cgs1, Cgd1, gm1·vg1 and rds1 for Q1; gm2·vs2, rds2 and Cs2 at node vs2; Cd2 and GL at the output vout]
$C_{d2} = C_{gd2} + C_{db2} + C_L + C_{bias}$, $C_{s2} = C_{db1} + C_{sb2} + C_{gs2}$
All independent sources have to be set to zero (vin = 0).
Node vg1:
$\tau_{C_{gs1}} = C_{gs1} R_{in}$

Nodes vg1 and vs2 (capacitor Cgd1): the capacitor Cgd1 is replaced by a voltage source vx in order to calculate the resistance seen by this capacitor.
[Figure: test circuit: vx (current ix) between vg1 (resistance Rin to ground) and the drain of Q1 (resistance Rd1 to ground), with controlled source gm1·vg1]
$G_{d1} = 1/R_{d1} = g_{ds1} + Y_{s2}$, where Ys2 is the admittance looking into the source of the cascode transistor
$\tau_{C_{gd1}} = C_{gd1} R_{d1} \left(1 + R_{in}\,[G_{d1} + g_{m1}]\right)$

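$G_{d1} = g_{ds1} + Y_{s2}$
Admittance looking into the source of a cascode transistor, Ys2 = is/vs2:
[Figure: Q2 modeled by gm2·vs2 and rds2, loaded at vout by Cd2 and GL, driven at its source vs2 by the current is]
With the high-impedance load $R_L \approx g_m r_{ds}^2$ and $g_{ds} \ll g_m$:
$Y_{s2} \approx g_{ds}$, thus $G_{d1} \approx 2 g_{ds}$ and $R_{d1} \approx r_{ds}/2$
(see cascode current mirror impedance, pp137, vlsi-25/17)
$\tau_{C_{gd1}} \approx C_{gd1}\,\frac{r_{ds}}{2}\,(1 + g_m R_{in})$, and for large Rin equal to rds: $\tau_{C_{gd1}} \approx C_{gd1}\,\frac{g_m r_{ds}^2}{2}$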


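Node vs2:
The resistance seen by the capacitor Cs2 is rds1 in parallel with the impedance seen looking into the source of Q2, which is approximately rds. Thus:
$\tau_{C_{s2}} \approx C_{s2}\,\frac{r_{ds}}{2}$
Node vout:
The resistance seen by Cd2 is the output impedance of the cascode amplifier. Thus:
$\tau_{C_{d2}} \approx C_{d2}\,\frac{g_m r_{ds}^2}{2}$
Total (with Rin ≈ rds for the Cgd1 term):
$\tau_{total} \approx \tau_{C_{gs1}} + \tau_{C_{gd1}} + \tau_{C_{s2}} + \tau_{C_{d2}} \approx C_{gs1} R_{in} + C_{gd1}\,\frac{g_m r_{ds}^2}{2} + C_{s2}\,\frac{r_{ds}}{2} + C_{d2}\,\frac{g_m r_{ds}^2}{2}$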
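The sketch below sums the four zero-value time constants for the data of problem 3.11. It assumes that input and cascode transistors share the same gm, rds and capacitances, and it uses the general Cgd1 term $C_{gd1}(r_{ds}/2)(1 + g_mR_{in})$ because Rin (180kΩ) differs from rds here. The output node dominates, and the estimate comes out near 6.3kHz.

```python
import math

# Data of problem 3.11 (same gm and rds for input and cascode transistors)
gm, rds, Rin, CL = 1e-3, 100e3, 180e3, 5e-12
Cgs, Cgd, Csb, Cdb, Cbias = 0.2e-12, 15e-15, 40e-15, 20e-15, 20e-15

Cd2 = Cgd + Cdb + CL + Cbias            # Cd2 = Cgd2 + Cdb2 + CL + Cbias
Cs2 = Cdb + Csb + Cgs                   # Cs2 = Cdb1 + Csb2 + Cgs2

tau_gs1 = Cgs * Rin
tau_gd1 = Cgd * (rds / 2) * (1 + gm * Rin)   # Rd1 ~ rds/2
tau_s2 = Cs2 * rds / 2
tau_d2 = Cd2 * gm * rds**2 / 2               # output node

tau_total = tau_gs1 + tau_gd1 + tau_s2 + tau_d2
f_3db = 1 / (2 * math.pi * tau_total)
print(f"output-node share = {tau_d2/tau_total:.1%}, f_-3dB ~ {f_3db/1e3:.2f} kHz")
```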

Cascode Gain Stage: Comments

High-frequency considerations:
[Figure: telescopic cascode gain stage Q1, Q2 with Ibias, Vbias, Vin, Vout and CL]
One pole dominates, thus the gain is:
$A(s) \approx \frac{A_v}{1 + s/\omega_{-3dB}}$
At frequencies substantially larger than ω-3dB:
$A(s) \approx \frac{A_v\,\omega_{-3dB}}{s} \approx -\frac{g_{m1}}{s C_L}$
The unity-gain frequency of an amplifier that uses a cascode gain stage is limited from above by the pole at the source node of Q2:
$\omega_{p2} = \frac{1}{\tau_{s2}} > \frac{3 \mu_p V_{eff2}}{2 L_2^2}$

Coming Up...
Next topic: basic OpAmp design and compensation
Readings for next time: Johns&Martin, section 3.11
Exercises: Have a look at the exercises in Johns&Martin.

#1
Johns&Martin chap 3.11 pp156, problem 3.8 (difficulty: easy): Consider the common-source amplifier shown on transparency vlsi-26/6, where Iin = 100µA and all transistors have W = 100µm and L = 1.6µm. Given Rin = 180kΩ, CL = 0.3pF, Cgs1 = 0.2pF, Cgd1 = 15fF, Cdb1 = 20fF, Cdb2 = 36fF, µnCox = 90µA/V², µpCox = 30µA/V², rds-n = 8000·L(µm)/ID(mA) and rds-p = 12000·L(µm)/ID(mA), estimate the -3dB frequency. Result: f-3dB = 554kHz
Johns&Martin chap 3.11 pp160, problem 3.9 (difficulty: easy): Analyse the source follower, assuming that Ibias = 100µA and all transistors have W = 100µm and L = 1.6µm. Given Rin = 180kΩ, CL = 10pF, Cgs1 = 0.2pF, Cgd1 = 15fF, Csb1 = 40fF, Cin = 30fF, µnCox = 90µA/V², µpCox = 30µA/V² and rds-n = 8000·L(µm)/ID(mA), find ω0, Q and ωz of the source follower. Result: ω0 = 52·10^6 rad/s, Q = 0.8, % overshoot = 8.1%, ωz = 5.3·10^9 rad/s
#2
Johns&Martin chap 3.11 pp166, problem 3.11 (difficulty: easy): Assume that, for the input transistors and the cascode transistors, gm = 1mA/V, rds = 100kΩ, Rin = 180kΩ, CL = 5pF, Cgs = 0.2pF, Cgd = 15fF, Csb = 40fF, Cdb = 20fF and Cbias = 20fF. Estimate the -3dB frequency of the cascode amplifier (transparency 19). Result: ω-3dB ≈ 2π × 6.3kHz
Johns&Martin chap 3.11 pp168, problem 3.12 (difficulty: easy): Estimate the lower bound on the frequency of the second pole of a folded-cascode amplifier in a 0.8µm technology, where a typical value of 0.25V is chosen for Veff2, L2 = 1.5·Lmin and µp = 0.02m²/Vs. Result: ωp2 ≈ 2π × 414MHz

Analog Microelectronics
Basic OpAmp Design and Compensation

JMM v1.0
Outline
- Johns&Martin
  - MOS differential pair and gain stage (chap 3.8)
  - two-stage CMOS OpAmp (chap 5.1): gain, frequency response, systematic offset voltage, n- or p-channel input stage
  - feedback and OpAmp compensation (chap 5.2): first-order model of closed-loop amplifier, linear settling time, OpAmp compensation, compensation of two-stage OpAmp, lead compensation, making compensation independent of process and temperature, biasing OpAmp to have stable transconductance (5.3-5.5)
- Exercises
  - hand calculations
  - spice simulations

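MOS Differential Pair and Gain Stage

- most integrated amplifiers have a differential input, realized with a differential transistor pair
[Figure: differential pair Q1, Q2 with tail current source Ibias; gate inputs V+ and V-, drain currents ID1 and ID2]
- a low-frequency small-signal equivalent circuit is based on the T model of the MOS transistor
[Figure: T-model equivalent circuit with source resistances rs1 and rs2; the gate current is zero in the T model, so id1 = is1 and id2 = is2]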

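MOS Differential Pair (cont 1)

To simplify the analysis, the output impedance of the transistors is ignored.
[Figure: T-model circuit of the differential pair: is1 flows from v+ through rs1 and rs2 to v-; id1 = is1, id2 = is2]
Definition:
$v_{in} = v^+ - v^-$
$i_{d1} = i_{s1} = \frac{v_{in}}{r_{s1} + r_{s2}} = \frac{v_{in}}{1/g_{m1} + 1/g_{m2}}$
Since Q1 and Q2 have the same bias current, gm1 = gm2, and
$i_{d1} = \frac{g_{m1}}{2}\, v_{in}$, $i_{d2} = -\frac{g_{m1}}{2}\, v_{in}$
Definition:
$i_{out} = i_{d1} - i_{d2}$
thus:
$i_{out} = g_{m1} v_{in}$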

If a differential pair has a current mirror as an active load, a complete differential-input, single-ended-output gain stage can be realized. To simplify the analysis, the output impedance of the transistors is again ignored.
[Figure: differential pair Q1, Q2 (tail current Ibias, input vin) with current-mirror load Q3, Q4; output vout with output resistance rout]
$i_{d4} = i_{d3} = i_{s1}$ and $i_{d2} = -i_{s1}$
$v_{out} = (i_{d4} - i_{d2})\, r_{out} = 2\, i_{s1}\, r_{out} = g_{m1} r_{out} v_{in}$
This result assumes that the output impedance is purely resistive. If there is also a capacitive load CL, we get:
$A_v = g_{m1} z_{out}$, where $z_{out} = r_{out} \parallel \frac{1}{sC_L}$
[Figure: simplified model: current source gm1·vin driving zout = rout || CL at the output vout]
Thus, a very simple model is used for this differential stage. The model implicitly assumes that the time constant at the output node is much larger than the time constants due to the parasitic capacitances at Q1 and Q2.
The output resistance rout is evaluated by using the small-signal equivalent circuit and applying a test voltage vx at the output node. Note that the T model is used for Q1, Q2 and Q3, and the hybrid-π model is used for Q4.
[Figure: small-signal circuit for the output resistance: test source vx with current ix at the output; rds1, rds2, rds3 || rs3, rds4, rs1, rs2 and controlled source gm4·va]
$r_{out} = r_{ds2} \parallel r_{ds4}$
$A_v = g_{m1}\,(r_{ds2} \parallel r_{ds4})$
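As a check of $A_v = g_{m1}(r_{ds2} \parallel r_{ds4})$, the sketch below evaluates exercise ana3.9 (Ibias = 200µA, W/L = 100/1.6, µnCox = 92µA/V²). It assumes, as the exercise does, that the empirical rule rds = 8000·L(µm)/ID(mA) applies to both Q2 and Q4, and that each side of the pair carries Ibias/2.

```python
import math

Ibias, W, L = 200e-6, 100.0, 1.6         # exercise ana3.9
un_cox = 92e-6                           # A/V^2
ID = Ibias / 2                           # each side carries half the tail current

gm1 = math.sqrt(2 * un_cox * (W / L) * ID)
rds = 8000 * L / (ID * 1e3)              # empirical rule, assumed for Q2 and Q4
rout = rds * rds / (rds + rds)           # rout = rds2 || rds4
av = gm1 * rout
print(f"rout = {rout/1e3:.0f} kOhm, Av = {av:.1f} V/V")
```

This reproduces the stated results, rout = 64kΩ and Av = 68.6V/V.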
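The large-signal transfer characteristic is determined by using the large-signal (square-law) transistor model, with the FETs in the active region:
$I_D = \frac{\mu_0 C_{ox}}{2} \frac{W}{L} (V_{GS} - V_{tn})^2 = \frac{\beta}{2} (V_{GS} - V_{tn})^2$
[Figure: differential pair Q1, Q2 with current-mirror load Q3, Q4 and tail current Ibias; input VIN, drain currents ID1, ID2, ID4, output current Iout at Vout]
$I_{OUT} = I_{D1} - I_{D2} = \frac{\beta V_{IN}}{2} \sqrt{\frac{4 I_{bias}}{\beta} - V_{IN}^2}$, valid for $|V_{IN}| \le \sqrt{2 I_{bias}/\beta}$. For larger inputs, the whole bias current flows through one transistor and IOUT = ±Ibias.
[Figure: normalized transfer characteristic IOUT/Ibias (between -1 and +1) versus the normalized differential input VID/√(Ibias/β)]
Typical value for Ibias = 0.1mA: √(β/Ibias) = 5.4 V⁻¹, so a normalized input of 1 corresponds to VIN = 187mV.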
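The sketch below codes the large-signal characteristic, including the clipping at ±Ibias. It assumes $\sqrt{\beta/I_{bias}}$ = 5.4 V⁻¹ and Ibias = 0.1mA, following the typical values above. The function name is illustrative. Near VIN = 0, the slope equals $\sqrt{\beta I_{bias}} = g_{m1}$, which is consistent with the small-signal result $i_{out} = g_{m1}v_{in}$.

```python
import math

def diff_pair_iout(vin: float, ibias: float, beta: float) -> float:
    """Square-law differential pair, IOUT = ID1 - ID2 (A).

    Valid for |vin| <= sqrt(2*ibias/beta); beyond that the whole tail
    current flows through one transistor and IOUT = +/- ibias.
    """
    v_sat = math.sqrt(2 * ibias / beta)
    if abs(vin) >= v_sat:
        return math.copysign(ibias, vin)
    return 0.5 * beta * vin * math.sqrt(4 * ibias / beta - vin * vin)

ibias = 0.1e-3
beta = ibias * 5.4**2                  # assumption: sqrt(beta/Ibias) = 5.4 1/V
for vin in (0.0, 0.05, 0.1, 0.187, 0.3):
    ratio = diff_pair_iout(vin, ibias, beta) / ibias
    print(f"VIN = {vin*1e3:5.0f} mV -> IOUT/Ibias = {ratio:+.3f}")
```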
Two-Stage CMOS OpAmp

- basic OpAmp design aspects are discussed:
  - OpAmp gain
  - frequency response
  - slew rate
  - systematic offset voltage
  - n-channel or p-channel input stage
[Figure: block diagram: differential input stage (gain A1), second gain stage (gain -A2), output buffer (gain 1), output Vout; compensation capacitor CC across the second gain stage]
- the compensation capacitor CC ensures stability when the OpAmp is used with feedback; CC is often called the Miller capacitance, to illustrate its effect on the input of the second stage
- the second gain stage has a single-ended output, e.g. a common-source gain stage with active load
- the output buffer stage is only present when resistive loads need to be driven
CMOS Realization of a Two-Stage OpAmp

[Figure: CMOS two-stage OpAmp (a p-well process is necessary): bias circuit (Q10 to Q15, Rb); differential input first stage (p-channel Q1, Q2 with tail current source Q5 and current-mirror load Q3, Q4); common-source second stage (Q7 with current-source load Q6, compensation capacitor CC and lead-compensation transistor Q16); output buffer (Q8, Q9). Transistor widths are given in µm, e.g. Q1, Q2, Q5, Q6, Q7: 300; Q3, Q4: 150]
- p-channel input stage
- all transistor lengths are 1.6µm (1µm process)
- reasonable transistor lengths might be somewhere between 1.5 and 2 times the minimum transistor length
Two-Stage OpAmp Gain

- the overall gain for low-frequency applications is the most critical parameter of an OpAmp
Gain of the first stage (differential stage):
$A_{v1} = g_{m1}\,(r_{ds2} \parallel r_{ds4})$
$g_{m1} = \sqrt{2 \mu_p C_{ox} \left(\frac{W}{L}\right)_1 I_{D1}} = \sqrt{2 \mu_p C_{ox} \left(\frac{W}{L}\right)_1 \frac{I_{bias}}{2}}$
Approximation of the finite output resistance, ignoring short-channel effects, where α is a technology-dependent parameter (α ≈ 5·10^6 V^1/2/m):
$r_{ds,i} \approx \frac{\alpha\, L_i \sqrt{V_{DG,i} + V_{t,i}}}{I_{D,i}}$
Gain of the second stage (common-source stage):
$A_{v2} = g_{m7}\,(r_{ds6} \parallel r_{ds7})$
Gain of the third stage (common-drain stage, source follower Q8):
$A_{v3} = \frac{g_{m8}}{G_L + g_{m8} + g_{ds8} + g_{ds9}}$
Gain of the third stage with body effect (bulk not connected to the source):
$A_{v3} = \frac{g_{m8}}{G_L + g_{m8} + g_{s8} + g_{ds8} + g_{ds9}}$, with $g_s = \frac{\gamma\, g_m}{2\sqrt{V_{SB} + 2\phi_F}}$
Body-effect constant γ = 0.5V^1/2, 2φF = 0.7V

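Two-Stage OpAmp Frequency Response

- the frequency response is considered in the midband region, where the capacitor CC causes the magnitude of the gain to decrease, but still well below the unity-gain frequency (open-loop gain = 1)
- only the compensation capacitor CC is considered
- Q16 is assumed not to be present (it is the resistor for lead compensation, which only has an effect near the unity-gain frequency)
- the simplified circuit is discussed:
[Figure: simplified two-stage OpAmp: p-channel input pair Q1, Q2 (W = 300µm) with tail current source Q5 (gate at Vbias, W = 300µm) and current-mirror load Q3, Q4 (W = 150µm); the small-signal current i = gm1·vin flows into node v1, which drives the second stage -A2; CC connects v1 and v2; the buffer A3 drives vout]
Midband gain:
$A_v(s) \approx \frac{g_{m1}}{s\,C_C}$
Unity-gain frequency:
$\omega_{ta} \approx \frac{g_{m1}}{C_C}$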
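The unity-gain frequency of exercise ana5.2 follows directly from $\omega_{ta} \approx g_{m1}/C_C$. This sketch assumes the Q1 size read from the schematic (W = 300µm, L = 1.6µm), ID1 = ID5/2 = 50µA and µpCox = 96/3 µA/V², and it reproduces the stated 24.7MHz.

```python
import math

# Exercise ana5.2: p-channel input pair, ID5 = 100 uA shared by Q1 and Q2
up_cox = 96e-6 / 3                  # un*Cox = 3*up*Cox = 96 uA/V^2
W1, L1 = 300.0, 1.6                 # Q1 size (um), read from the schematic
ID1 = 100e-6 / 2
CC = 5e-12

gm1 = math.sqrt(2 * up_cox * (W1 / L1) * ID1)
w_ta = gm1 / CC                     # w_ta ~ gm1 / CC
f_ta = w_ta / (2 * math.pi)
print(f"gm1 = {gm1*1e3:.3f} mA/V, f_ta = {f_ta/1e6:.1f} MHz")
```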

Two-Stage OpAmp Slew Rate

- the slew rate SR is the maximum rate at which the output changes when the input signals are large:
$SR = \left.\frac{dv_{out}}{dt}\right|_{max}$
- at slew-rate limitation, all the current of Q5 flows either through Q1 or through Q2, and this current has to flow through CC
[Figure: simplified two-stage OpAmp (Q1 to Q5, CC between v1 and v2, stages -A2 and A3, output vout)]
$SR = \frac{2 I_{D1}}{C_C} = V_{eff1}\,\omega_{ta}$
- increasing Veff1 and ωta increases SR
- p-channel FET inputs increase SR
- increasing Veff1 reduces the transconductance gm1 (for a given bias current)
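Exercise ana5.3 asks for a change that doubles SR without changing ωta or the bias currents. Halving CC doubles $SR = I_{D5}/C_C$. Keeping $\omega_{ta} = g_{m1}/C_C$ then requires gm1 to halve, which means quartering W1 = W2 because gm grows with √W. The sketch below checks this. The device data (µpCox = 32µA/V², L = 1.6µm) are taken from the exercise and the schematic, and the helper name is illustrative.

```python
import math

up_cox = 32e-6          # A/V^2 (un*Cox = 3*up*Cox = 96 uA/V^2)
ID5, L = 100e-6, 1.6     # tail current (A), channel length (um)

def wta_and_sr(W1_um: float, CC: float):
    """Return (w_ta, SR): w_ta = gm1/CC and SR = 2*ID1/CC = ID5/CC."""
    gm1 = math.sqrt(2 * up_cox * (W1_um / L) * (ID5 / 2))
    return gm1 / CC, ID5 / CC

w_ta, sr = wta_and_sr(300.0, 5e-12)
w_ta2, sr2 = wta_and_sr(75.0, 2.5e-12)   # CC halved, W1 = W2 quartered
print(f"SR = {sr/1e6:.0f} V/us -> {sr2/1e6:.0f} V/us, w_ta ratio = {w_ta2/w_ta:.3f}")
```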
Two-Stage OpAmp: Systematic Offset Voltage Cancellation

- two-stage OpAmps may have a systematic input offset voltage if not properly designed
- when the differential input is zero (vin+ = vin-), ID6 = ID7 must hold, which requires a well-defined VGS7 value
[Figure: two-stage OpAmp without output buffer: Q5 (300), Q1, Q2 (300), Q3, Q4 (150), Q6 (300), Q7 (300); inputs Vin- and Vin+, output Vout]
Condition for no systematic offset voltage:
$\frac{(W/L)_7}{(W/L)_4} = 2\,\frac{(W/L)_6}{(W/L)_5}$
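The sizing rule can be applied as in exercise ana5.4. This sketch assumes equal channel lengths, so that width ratios equal W/L ratios, and a Q5/Q6 current mirror with ID6/ID5 = W6/W5. `size_second_stage` is a hypothetical helper.

```python
def size_second_stage(W4: float, W5: float, ID5: float, ID6: float):
    """Return (W6, W7) for the desired ID6 and no systematic offset (widths in um)."""
    W6 = W5 * ID6 / ID5          # current mirror Q5 -> Q6
    W7 = 2 * W6 * W4 / W5        # (W/L)7 = 2 (W/L)6 (W/L)4 / (W/L)5
    return W6, W7

W6, W7 = size_second_stage(W4=120.0, W5=300.0, ID5=100e-6, ID6=150e-6)
print(f"W6 = {W6:.0f} um, W7 = {W7:.0f} um")
```

With the original sizes (W4 = 150µm and ID6 = ID5), the helper returns W6 = W7 = 300µm, which matches the schematic.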

Two-Stage OpAmp: n- or p-Channel Input Stage

- comparison between OpAmps with n- and p-channel input stages
- the overall dc gain is largely unaffected, since both designs have one stage with an n-channel driving FET and one stage with one or more p-channel driving FETs
- for a given power dissipation, and therefore bias current, a p-channel input-pair stage maximizes the slew rate
- a p-channel first input stage implies that the second stage has an n-channel input drive FET. This arrangement maximizes the transconductance of the drive FET of the second stage, which is critical when high-frequency operation is important
- output stage: an n-channel source follower is preferable, because it has less of a voltage drop (if a separate p-well is used). Its higher transconductance reduces the effect of the load capacitance on the second pole. There is also less degradation of the gain when small load resistances are driven.
p-channel input FETs for the first stage are almost always the best choice.
Feedback and OpAmp Compensation

- OpAmps in closed-loop configurations are discussed, together with how to compensate an OpAmp so that the closed-loop configuration is not only stable but also settles well
- optimum compensation of OpAmps is typically considered one of the most difficult parts of the OpAmp design procedure
- topics:
  - first-order model of a closed-loop amplifier
  - linear settling time
  - OpAmp compensation
  - compensating the two-stage OpAmp
  - lead compensation
  - making compensation independent of process and temperature
  - biasing an OpAmp to have stable transconductances

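First-Order Model of Closed-Loop Amplifier

- first-order model of the transfer function of a dominant-pole-compensated OpAmp (dominant real-axis pole ωp1):
$A(s) = \frac{A_0}{1 + s/\omega_{p1}}$
- unity-gain frequency definition: $|A(j\omega_{ta})| \equiv 1$, which gives
$\omega_{ta} \approx A_0\,\omega_{p1}$
- for midband frequencies, $\omega_{p1} \ll \omega \ll \omega_{ta}$:
$A(s) \approx \frac{\omega_{ta}}{s}$
- closed-loop gain with feedback factor β:
$A_{CL}(s) = \frac{A(s)}{1 + \beta A(s)} \approx \frac{1}{\beta}\,\frac{1}{1 + s/(\beta\,\omega_{ta})}$, so that $\omega_{-3dB} = \beta\,\omega_{ta}$
[Figure: feedback loop: input Ain(s) into a summing node, open-loop amplifier A(s), output Aout(s), feedback through β]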
Linear Settling Time

- the settling-time performance is an important design parameter of OpAmps
- the charge transfer in SC circuits is closely related to the OpAmp step response
- the settling time is defined as the time it takes an OpAmp to reach a specified percentage of its final value when a step input is applied
- the linear settling-time portion is due to the finite unity-gain frequency (independent of the output step size)
- the nonlinear settling-time portion is due to the slew-rate limit (dependent on the output step size)
Unity-gain frequency estimation for the linear settling-time portion: the -3dB frequency determines the settling-time response for a step input:
$\tau = \frac{1}{\omega_{-3dB}} = \frac{1}{\beta\,\omega_{ta}}$
Step response of a closed-loop OpAmp (if the slew rate is larger than Vstep/τ, no slew-rate limiting occurs):
$v_{out}(t) = V_{step}\,(1 - e^{-t/\tau})$
$\left.\frac{d v_{out}(t)}{dt}\right|_{t=0} = \frac{V_{step}}{\tau}$
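Solving $e^{-t/\tau} \le \epsilon$ gives $\tau = t_{settle}/\ln(1/\epsilon)$ and hence the required unity-gain frequency. The sketch below applies this to exercise ana5.5 (0.1% in 100ns). It assumes that the feedback factor of the SC phase is β = C2/(C1 + C2). This is an assumption because the circuit is not shown, but it reproduces both stated results (12.1MHz and 66.0MHz).

```python
import math

def required_f_ta(t_settle: float, accuracy: float, beta: float) -> float:
    """Unity-gain frequency (Hz) for linear settling to 'accuracy' within t_settle.

    Single-pole closed loop: tau = 1/(beta*w_ta); error e^(-t/tau) <= accuracy.
    """
    tau = t_settle / math.log(1 / accuracy)
    return 1 / (2 * math.pi * beta * tau)

# Assumption: feedback factor of the SC phase beta = C2/(C1 + C2)
for c2_over_c1 in (10.0, 0.2):
    beta = c2_over_c1 / (1 + c2_over_c1)
    f_ta = required_f_ta(100e-9, 1e-3, beta)
    print(f"C2 = {c2_over_c1} C1: f_ta = {f_ta/1e6:.1f} MHz")
```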
OpAmp Compensation (Second-Order Model)

- for compensating OpAmps, the first-order model is insufficient, because it ignores poles and zeros at high frequencies, which may cause instabilities
- a more accurate open-loop transfer model adds one additional pole (real-axis poles and zeros):
$A(s) = \frac{A_0}{(1 + s/\omega_{p1})(1 + s/\omega_{eq})}$
- ωp1 is the first (dominant) pole; the higher-frequency poles and zeros may be approximated by a single equivalent real-axis pole ωeq:
$\frac{1}{\omega_{eq}} \approx \sum_{i=2}^{n} \frac{1}{\omega_{pi}} - \sum_{i=1}^{m} \frac{1}{\omega_{zi}}$
- the phase margin PM is an often-used measure of how far an OpAmp with feedback is from becoming unstable:
$\angle LG(j\omega) \approx -90^\circ - \tan^{-1}(\omega/\omega_{eq})$
$PM = \angle LG(j\omega_t) - (-180^\circ) = 90^\circ - \tan^{-1}(\omega_t/\omega_{eq})$
$\omega_t = \omega_{eq}\,\tan(90^\circ - PM)$, where ωt is the unity-gain frequency of the loop gain LG
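OpAmp Compensation (Second-Order Model, cont)

- closed-loop gain, if β is frequency independent (i.e., if ωt is far away from the high-frequency poles and zeros):
$A_{CL}(s) = \frac{A_{CL0}}{1 + \frac{s\,(1/\omega_{p1} + 1/\omega_{eq})}{1 + \beta A_0} + \frac{s^2}{(1 + \beta A_0)\,\omega_{p1}\,\omega_{eq}}}$, with $A_{CL0} = \frac{A_0}{1 + \beta A_0}$
- general equation for a second-order transfer function:
$H_2(s) = \frac{K}{1 + \frac{s}{\omega_0 Q} + \frac{s^2}{\omega_0^2}}$
- comparing the two:
$\omega_0 = \sqrt{(1 + \beta A_0)\,\omega_{p1}\,\omega_{eq}} \approx \sqrt{\beta\,\omega_{ta}\,\omega_{eq}}$
$Q = \frac{\sqrt{(1 + \beta A_0)/(\omega_{p1}\,\omega_{eq})}}{1/\omega_{p1} + 1/\omega_{eq}} \approx \sqrt{\frac{\beta\,\omega_{ta}}{\omega_{eq}}}$
$\%\,\text{overshoot} = 100\, e^{-\pi/\sqrt{4Q^2 - 1}}$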
OpAmp Compensation (2nd-Order Transfer Function)

Relationship between the Q factor and the phase margin:
- transfer function with Q = √(1/2): no peaking, widest passband, ω0 = ω-3dB
- step response with Q ≤ 0.5 (real poles): no overshoot
- step response with Q > 0.5: the percentage of overshoot has to be calculated:

PM             55°      60°      65°      70°      75°
ωt/ωeq         0.700    0.580    0.470    0.360    0.270
Q factor       0.925    0.817    0.717    0.622    0.527
% overshoot    13.3%    8.7%     4.7%     1.4%     0.008%

- phase margin: much larger than supposed to be necessary (80° to 85°)
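The table can be regenerated from the second-order model. The sketch below assumes the loop gain $LG = \beta\omega_{ta}/[s(1 + s/\omega_{eq})]$, which ignores ωp1 because it lies far below ωt. Requiring $|LG(j\omega_t)| = 1$ gives $\beta\omega_{ta} = \omega_t\sqrt{1 + x^2}$ with $x = \omega_t/\omega_{eq}$. Then $Q = \sqrt{\beta\omega_{ta}/\omega_{eq}} = \sqrt{x\sqrt{1 + x^2}}$, and the overshoot follows from Q.

```python
import math

def pm_to_q(pm_deg: float):
    """Return (wt/w_eq, Q) for the two-pole loop gain with phase margin pm_deg."""
    x = math.tan(math.radians(90.0 - pm_deg))      # wt / w_eq
    return x, math.sqrt(x * math.sqrt(1 + x * x))  # Q = sqrt(beta*w_ta / w_eq)

def overshoot_percent(Q: float) -> float:
    """% overshoot of the closed-loop step response (0 for real poles)."""
    if Q <= 0.5:
        return 0.0
    return 100.0 * math.exp(-math.pi / math.sqrt(4 * Q * Q - 1))

for pm in (55, 60, 65, 70, 75):
    x, Q = pm_to_q(pm)
    print(f"PM {pm} deg: wt/weq = {x:.3f}, Q = {Q:.3f}, overshoot = {overshoot_percent(Q):.3f} %")
```

The computed values agree with the table to within rounding.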

Compensating the Two-Stage OpAmp

- the capacitor CC realizes dominant-pole compensation and thereby controls ωp1 and ωta: ωta = A0·ωp1
- FET Q16 is included to realize a left-half-plane zero at frequencies around or slightly above ωt (lead compensation). Q16 has VDS = 0V and therefore operates in the triode region:
$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox} (W/L)_{16} V_{eff16}}$
[Figure: two-stage OpAmp without output buffer: Q5 (gate at Vbias), Q1, Q2, Q3, Q4, Q6, Q7, with CC in series with Q16 (gate at Vbias) between the first-stage output and Vout2]
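Compensating the Two-Stage OpAmp: Small-Signal Model

- simplified small-signal model of the two-stage OpAmp for compensation analysis:
[Figure: current source gm1·vin1 driving node v1, loaded by R1 || C1; CC in series with RC between v1 and vout2; current source gm7·v1 driving vout2, loaded by R2 || C2]
$R_1 = r_{ds4} \parallel r_{ds2}$, $R_2 = r_{ds6} \parallel r_{ds7}$
$C_1 = C_{db2} + C_{db4} + C_{gs7}$, $C_2 = C_{db7} + C_{db6} + C_{L2}$
Dominant pole:
$\omega_{p1} \approx \frac{1}{g_{m7} R_1 R_2 C_C}$
Nondominant pole (analysis shown in Johns&Martin):
$\omega_{p2} \approx \frac{g_{m7}}{C_1 + C_2}$
Zero for RC = 0 (right-half-plane zero):
$\omega_z = -\frac{g_{m7}}{C_C}$
Lead compensation (RC not zero):
$\omega_z = \frac{-1}{C_C\,(1/g_{m7} - R_C)}$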
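Compensating the Two-Stage OpAmp (Discussion)

$D(s) = \left(1 + \frac{s}{\omega_{p1}}\right)\left(1 + \frac{s}{\omega_{p2}}\right)$
[Figure: positions of the poles ωp1, ωp2 and the zero as CC and gm7 are varied]
$\omega_{p1} \approx \frac{1}{g_{m7} R_1 R_2 C_C}$, $\omega_{p2} \approx \frac{g_{m7}}{C_1 + C_2}$, $\omega_z = -\frac{g_{m7}}{C_C}$
- increasing gm7 separates the poles (pole splitting)
- however, the right-half-plane zero introduces negative phase shift into the transfer function
- increasing CC moves ωp1 and ωz to lower frequencies and thus does not help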
Compensating the Two-Stage OpAmp (Lead Compensation)

- with a nonzero RC, a third pole is introduced, but it lies at high frequency and has almost no effect
- however, the zero opens a number of possibilities:
$\omega_z = \frac{-1}{C_C\,(1/g_{m7} - R_C)}$
- one could eliminate the right-half-plane zero:
$R_C = 1/g_{m7}$
- one could choose RC even larger and thus move the right-half-plane zero into the left half plane, where it cancels the nondominant pole ωp2:
$R_C = \frac{1}{g_{m7}}\left(1 + \frac{C_1 + C_2}{C_C}\right)$
- one could choose RC even larger to move the now left-half-plane zero to a frequency slightly greater than the unity-gain frequency that would result without the resistor, say 20% larger (recommended): ωz = 1.2·ωt. For RC >> 1/gm7 and ωt ≈ gm1/CC this gives
$R_C \approx \frac{1}{1.2\,\omega_t C_C} \approx \frac{1}{1.2\,g_{m1}}$
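The three choices of RC can be compared numerically. The sketch below uses hypothetical small-signal values (not taken from the slides) and the zero expression $\omega_z = -1/[C_C(1/g_{m7} - R_C)]$ in the $(1 + s/\omega_z)$ convention, where a negative ωz is a right-half-plane zero. For the third option it uses the exact RC that places the zero at 1.2ωt, $R_C = 1/g_{m7} + 1/(1.2\,\omega_t C_C)$, rather than the slide's approximation, which assumes RC >> 1/gm7.

```python
# Hypothetical small-signal values (assumptions for illustration only)
gm1, gm7 = 0.5e-3, 2e-3                  # transconductances (A/V)
C1, C2, CC = 0.1e-12, 2e-12, 1e-12       # node and compensation capacitances (F)

def w_zero(RC: float) -> float:
    """wz = -1/(CC (1/gm7 - RC)) with factor (1 + s/wz): wz < 0 is a RHP zero,
    wz > 0 a LHP zero; the zero moves to infinity for RC = 1/gm7."""
    d = CC * (1.0 / gm7 - RC)
    return float("inf") if d == 0.0 else -1.0 / d

w_p2 = gm7 / (C1 + C2)                           # nondominant pole
w_t = gm1 / CC                                   # unity-gain frequency without RC

RC_no_zero = 1.0 / gm7                           # option 1: eliminate the zero
RC_cancel = (1.0 / gm7) * (1.0 + (C1 + C2) / CC) # option 2: zero on top of w_p2
RC_lead = 1.0 / gm7 + 1.0 / (1.2 * w_t * CC)     # option 3: zero exactly at 1.2*w_t

for name, rc in (("RC = 0", 0.0), ("cancel p2", RC_cancel), ("lead", RC_lead)):
    print(f"{name:10s}: RC = {rc:7.1f} ohm, wz = {w_zero(rc):+.3e} rad/s")
print(f"w_p2 = {w_p2:.3e} rad/s, 1.2*w_t = {1.2*w_t:.3e} rad/s")
```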
Lead Compensation Design Procedure

1. Start by choosing, somewhat arbitrarily, C'C ≈ 5pF.
2. Using Spice, find the frequency at which a -125° phase shift exists. Denote the gain at this frequency by A' and the frequency by ωt.
3. Choose a new CC so that ωt becomes the unity-gain frequency of the loop gain, which results in a 55° phase margin. This can be achieved by taking CC according to the following equation (iterations possible):
$C_C = C'_C\,A'$
4. Choose RC according to:
$R_C = \frac{1}{1.2\,\omega_t C_C}$
5. The resulting phase margin is approximately 85° (leaving 5° for process variations). It may be necessary to iterate on RC to optimize the phase margin. If the phase margin is not adequate after step 4, increase CC while leaving RC constant.
6. Replace RC by a FET with the following size:
$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox} (W/L)_{16} V_{eff16}}$
Compensation Independent of Process and Temperature

- making lead compensation insensitive to process and temperature
- the ratios of all transconductances remain relatively constant over process and temperature variations, since all FETs depend on the same biasing network:
$\omega_{p2} \approx \frac{g_{m7}}{C_1 + C_2}$, $\omega_{ta} \approx \frac{g_{m1}}{C_C}$
- when a resistor is used to realize lead compensation, RC can also be made to track the inverse of the transconductance (1/gm7). The lead compensation is then mostly independent of process and temperature variations:
$\omega_z = \frac{-1}{C_C\,(1/g_{m7} - R_C)}$

Compensation Independent of Process and Temperature (cont 2)

Making RC proportional to 1/gm7:
$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox} (W/L)_{16} V_{eff16}}$
$g_{m7} = \mu_n C_{ox} (W/L)_7 V_{eff7}$
The product RC·gm7 needs to be constant:
$R_C\,g_{m7} = \frac{(W/L)_7\,V_{eff7}}{(W/L)_{16}\,V_{eff16}}$
Therefore, all that remains is to ensure that Veff16/Veff7 is independent of process and temperature variations. The ratio can be made constant by deriving Vgs16 from the same biasing circuit that is used to derive Vgs7.
- The following approach makes on-chip resistors possible. They are realized by triode-region FETs that are accurately ratioed with respect to a single off-chip resistor (modern circuit design).
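Compensation Independent of Process and Temperature (cont 3)

[Figure: bias branch Q11, Q12, Q13 next to the second stage Q6, Q7 (300) with CC and Q16; Va is the node at Q13, Vb the node at the gate of Q7; the gates of Q12 and Q16 are connected]
If Veff13 = Veff7, then Va = Vb, and (because the gates are connected) Veff16 = Veff12. Thus:
$R_C\,g_{m7} = \frac{(W/L)_7\,V_{eff13}}{(W/L)_{16}\,V_{eff12}}$
To make Veff13 = Veff7, we need
$\frac{2 I_{D7}}{\mu_n C_{ox} (W/L)_7} = \frac{2 I_{D13}}{\mu_n C_{ox} (W/L)_{13}}$
so the condition to be satisfied is
$\frac{I_{D7}}{(W/L)_7} = \frac{I_{D13}}{(W/L)_{13}}$
However, the currents are set by Q6 and Q11:
$\frac{I_{D7}}{I_{D13}} = \frac{(W/L)_6}{(W/L)_{11}}$, hence $\frac{(W/L)_6}{(W/L)_{11}} = \frac{(W/L)_7}{(W/L)_{13}}$
As ID12 and ID13 are equal, Veff13/Veff12 = √((W/L)12/(W/L)13), and:
$R_C\,g_{m7} = \frac{(W/L)_7}{(W/L)_{16}} \sqrt{\frac{(W/L)_{12}}{(W/L)_{13}}}$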
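The point of this scheme is that the product RC·gm7 depends only on aspect ratios. The sketch below computes the product from first principles for two different process corners: a different µnCox (written k) and a different bias current. It shows that the product stays equal to the closed-form ratio. The aspect ratios are hypothetical, and the square law with equal threshold voltages is assumed.

```python
import math

def rc_times_gm7(k: float, ibias: float, wl7: float, wl12: float, wl13: float, wl16: float) -> float:
    """RC*gm7 with Veff7 = Veff13, Veff16 = Veff12 and ID12 = ID13 = ibias (square law)."""
    veff13 = math.sqrt(2 * ibias / (k * wl13))
    veff12 = math.sqrt(2 * ibias / (k * wl12))
    rc = 1 / (k * wl16 * veff12)      # triode FET Q16 with Veff16 = Veff12
    gm7 = k * wl7 * veff13            # Q7 with Veff7 = Veff13
    return rc * gm7

# Hypothetical aspect ratios; k = un*Cox and the bias current vary with process/temperature
ratios = dict(wl7=187.5, wl12=62.5, wl13=15.6, wl16=100.0)
expected = ratios["wl7"] / ratios["wl16"] * math.sqrt(ratios["wl12"] / ratios["wl13"])
for k, ib in ((90e-6, 20e-6), (60e-6, 35e-6)):
    print(f"k = {k*1e6:.0f} uA/V^2, Ib = {ib*1e6:.0f} uA: RC*gm7 = {rc_times_gm7(k, ib, **ratios):.4f}")
print(f"closed form: {expected:.4f}")
```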

Biasing an OpAmp to Have Stable Transconductances

u Fet
transconductances are the probably the most important parameters in OpAmps to be stabilized u the following approach matches transconductances to conductance of a resistor u as a result, the fet transconductances are independent of power-supply voltage as well as process and temperature variations
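[Figure: constant-transconductance bias circuit with Q10 to Q15 and the resistor R_B]

Assuming $(W/L)_{10} = (W/L)_{11}$:
$$g_{m13} = \frac{2}{R_B}\left(1 - \sqrt{\frac{(W/L)_{13}}{(W/L)_{15}}}\right)$$
For $(W/L)_{15} = 4\,(W/L)_{13}$:
$$g_{m13} = \frac{1}{R_B}$$
and for any other transistor $i$ biased from the same network:
$$g_{mi} = g_{m13}\sqrt{\frac{\mu_i\,(W/L)_i\,I_{Di}}{\mu_n\,(W/L)_{13}\,I_{D13}}}$$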
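A minimal Python sketch of the two bias equations above, assuming square-law FETs; R_B = 10 kΩ and the W/L values 25 and 100 are hypothetical, and the function names are illustrative.

```python
import math

def g_m13(r_b, wl13, wl15):
    """Constant-g_m bias: g_m13 = 2/R_B * (1 - sqrt((W/L)13/(W/L)15)), assuming (W/L)10 = (W/L)11."""
    return 2.0 / r_b * (1.0 - math.sqrt(wl13 / wl15))

def g_m_other(gm13, mu_ratio, wl_i, i_di, wl13, i_d13):
    """Transconductance of another FET i biased from the same network.

    mu_ratio is mu_i / mu_n (1.0 for n-channel, about 1/3 for p-channel with un_Cox = 3 up_Cox).
    """
    return gm13 * math.sqrt(mu_ratio * wl_i * i_di / (wl13 * i_d13))

R_B = 10e3                                 # hypothetical bias resistor, ohms
gm = g_m13(R_B, wl13=25.0, wl15=100.0)     # (W/L)15 = 4 (W/L)13
print(gm, 1.0 / R_B)                       # equal: g_m13 = 1/R_B
# an n-channel FET with twice the W/L and twice the current has twice the g_m
print(g_m_other(gm, 1.0, wl_i=50.0, i_di=50e-6, wl13=25.0, i_d13=25e-6))
```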
Exercises VLSI-27
Ex ana3.9 (difficulty: easy): Consider the differential-pair amplifier shown on transparency vlsi-27/3, where I_bias = 200 µA and all transistors have W = 100 µm and L = 1.6 µm. Given µnCox = 92 µA/V² and r_ds-n = 8000 [L (µm)]/[I_D (mA)], find the output impedance and the gain. Result: A_v = 68.6 V/V, r_out = 64 kΩ (see Johns/Martin p. 146)

Ex ana5.1 (difficulty: easy): Find the gain of the OpAmp shown on transparency vlsi-27/9. Assume I_D5 = 100 µA, first-stage V_DG = 0.5 V, second- and third-stage V_DG = 1 V, and the bulk of Q8 connected to V_SS. Given µnCox = 3 µpCox = 96 µA/V², V_DD = -V_SS = 2.5 V, R_L = 10 kΩ, γ = 0.5 V^1/2, φ_F = 0.35 V, = 5e6 V^1/2/m, V_tn = -V_tp = 0.8 V. Result: A_v = -6092 V/V (see Johns/Martin p. 224)
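A quick Python check of the Ex ana3.9 result, assuming square-law input FETs, a current-mirror load, and the same r_ds for the load transistor as for the input transistor (an assumption that reproduces the quoted 64 kΩ): A_v = g_m1 r_out with r_out = r_ds2 || r_ds4.

```python
import math

un_cox = 92e-6            # A/V^2
w, l = 100e-6, 1.6e-6     # m
i_bias = 200e-6           # A, tail current
i_d = i_bias / 2          # each input FET carries half of I_bias

g_m1 = math.sqrt(2 * un_cox * (w / l) * i_d)
r_ds = 8000 * (l * 1e6) / (i_d * 1e3)   # r_ds = 8000 * L[um] / I_D[mA], ohms
r_out = r_ds / 2                        # r_ds2 || r_ds4 (load assumed to have the same r_ds)
a_v = g_m1 * r_out

print(f"g_m1 = {g_m1 * 1e3:.3f} mA/V, r_out = {r_out / 1e3:.1f} kOhm, A_v = {a_v:.1f} V/V")
```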

Exercises VLSI-27 (cont 2)

Ex ana5.2 (difficulty: easy): Find the unity-gain frequency of the OpAmp shown on transparency vlsi-27/9, with C_C = 5 pF. Assume I_D5 = 100 µA, first-stage V_DG = 0.5 V, second- and third-stage V_DG = 1 V, and the bulk of Q8 connected to V_SS. Given µnCox = 3 µpCox = 96 µA/V², V_DD = -V_SS = 2.5 V, R_L = 10 kΩ, γ = 0.5 V^1/2, φ_F = 0.35 V, = 5e6 V^1/2/m, V_tn = -V_tp = 0.8 V. Result: f_ta = 24.7 MHz (see Johns/Martin p. 227)

Ex ana5.3 (difficulty: easy): Find the slew rate of the OpAmp on transparency vlsi-27/9, with C_C = 5 pF. Assume I_D5 = 100 µA. What circuit change could be made to double the slew rate while keeping ω_ta and the bias currents unchanged? Result: SR = 20 V/µs; to double SR: C_C = 2.5 pF and W1 = W2 = 75 µm (see Johns/Martin p. 229)
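A Python sketch for Ex ana5.3, assuming SR = I_D5/C_C and ω_ta = g_m1/C_C with g_m1 ∝ √W at fixed current and length. The original input-pair width of 300 µm is an assumption inferred from the quoted 75 µm answer, not taken from the transparency.

```python
def slew_rate(i_d5, c_c):
    """SR = I_D5 / C_C (V/s) for the two-stage OpAmp."""
    return i_d5 / c_c

def width_for_same_wta(w_old, c_old, c_new):
    """w_ta = g_m1 / C_C; at fixed I_D and L, g_m1 ~ sqrt(W), so W scales with (C_new/C_old)^2."""
    return w_old * (c_new / c_old) ** 2

sr = slew_rate(100e-6, 5e-12)
sr_doubled = slew_rate(100e-6, 2.5e-12)
w1_new = width_for_same_wta(300e-6, 5e-12, 2.5e-12)   # 300 um is an assumed starting width
print(sr / 1e6, sr_doubled / 1e6, w1_new * 1e6)        # V/us, V/us, um
```

Halving C_C doubles SR; to keep ω_ta, g_m1 must also halve, which at constant current means a quarter of the width.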


Ex ana5.4 (difficulty: easy): Consider the OpAmp shown on transparency vlsi-27/9, where Q3 and Q4 are each changed to widths of 120 µm and we want the output stage to have a bias current of 150 µA. Find the new sizes of Q6 and Q7 such that there is no systematic offset voltage. Result: W6 = 450 µm, W7 = 360 µm (see Johns/Martin p. 231)

Ex ana5.5 (difficulty: easy): One phase of an SC circuit is shown, where the input can be modelled as a voltage step. If 0.1% accuracy is needed in the linear settling-time portion, corresponding to 100 ns, find the required unity-gain frequency in terms of the capacitance values C1 and C2, and in absolute values for C2 = 10 C1 and for C2 = 0.2 C1. Result: f_ta = 12.1 MHz, f_ta = 66.0 MHz (see Johns/Martin p. 235)

[Figure: one phase of the SC circuit with capacitors C1 and C2, amplifier A(s) and output v_out]
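A Python sketch of Ex ana5.5, assuming single-pole settling with closed-loop time constant τ = 1/(β ω_ta) and feedback factor β = C2/(C1 + C2). Under these assumptions f_ta = ln(1000)(1 + C1/C2)/(2π · 100 ns), which reproduces both quoted results.

```python
import math

def required_fta(t_settle, accuracy, c1, c2):
    """Unity-gain frequency (Hz) for single-pole linear settling to `accuracy` within t_settle.

    Assumes tau = 1 / (beta * w_ta) with feedback factor beta = C2 / (C1 + C2).
    """
    n_tau = math.log(1.0 / accuracy)   # exp(-t/tau) = accuracy -> t = n_tau * tau
    tau = t_settle / n_tau
    beta = c2 / (c1 + c2)
    w_ta = 1.0 / (beta * tau)
    return w_ta / (2.0 * math.pi)

for ratio in (10.0, 0.2):              # C2 = 10 C1 and C2 = 0.2 C1
    f = required_fta(100e-9, 1e-3, c1=1.0, c2=ratio)
    print(f"C2 = {ratio} C1: f_ta = {f / 1e6:.1f} MHz")
```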


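Ex ana5.7 (difficulty: medium): An OpAmp has an open-loop transfer function given by
$$A(s) = \frac{A_0\,(1 + s/\omega_z)}{(1 + s/\omega_{p1})(1 + s/\omega_2)}$$
Assume that $\omega_2 = 2\pi \cdot 50$ MHz and $A_0 = 10^4$.
a) Assuming $\omega_z = \infty$, find $\omega_{p1}$ and the unity-gain frequency $\omega_t$ so that the OpAmp has a unity-gain phase margin of 55°.
b) Assuming $\omega_z = 1.2\,\omega_t$ (use $\omega_t$ from a), what is the unity-gain frequency $\omega_t$? Also find the new phase margin.
Result: a) $\omega_t = 2\pi \cdot 35$ MHz, $\omega_{p1} = 2\pi \cdot 4.27$ kHz; b) $\omega_t = 2\pi \cdot 46.6$ MHz, phase at $\omega_t$ about -85°, i.e. PM about 95° (see Johns/Martin p. 245)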
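A Python sketch for Ex ana5.7, working in Hz (all ω values divided by 2π) and treating the zero as a left-half-plane zero. Part a) uses the usual approximation that the dominant pole contributes 90° at ω_t; part b) finds the new ω_t by bisection. Under these assumptions it gives about 46.7 MHz for b), close to the quoted 46.6 MHz, and a phase of about -85° at ω_t.

```python
import math

A0 = 1e4
F2 = 50e6   # second pole: omega_2 = 2*pi*50 MHz, so work in Hz throughout

def gain(f, fp1, fz=math.inf):
    """|A(j 2 pi f)| for A(s) = A0 (1 + s/wz) / ((1 + s/wp1)(1 + s/w2))."""
    return A0 * math.hypot(1.0, f / fz) / (math.hypot(1.0, f / fp1) * math.hypot(1.0, f / F2))

def phase_deg(f, fp1, fz=math.inf):
    """Phase of A(j 2 pi f) in degrees (left-half-plane zero)."""
    return math.degrees(math.atan(f / fz) - math.atan(f / fp1) - math.atan(f / F2))

# a) no zero: with w_t >> w_p1, PM = 55 deg means the second pole contributes 35 deg at f_t
ft_a = F2 * math.tan(math.radians(35.0))
fp1 = ft_a / math.sqrt((A0 / math.hypot(1.0, ft_a / F2)) ** 2 - 1.0)   # from |A(j f_t)| = 1
pm_a = 180.0 + phase_deg(ft_a, fp1)

# b) zero at 1.2 f_t (from a): |A| falls monotonically here, so bisect for |A| = 1
fz = 1.2 * ft_a
lo, hi = 1e6, 1e9
for _ in range(100):
    mid = math.sqrt(lo * hi)           # geometric midpoint of the frequency interval
    if gain(mid, fp1, fz) > 1.0:
        lo = mid
    else:
        hi = mid
ft_b = math.sqrt(lo * hi)
pm_b = 180.0 + phase_deg(ft_b, fp1, fz)

print(f"a) f_t = {ft_a / 1e6:.1f} MHz, f_p1 = {fp1 / 1e3:.2f} kHz, PM = {pm_a:.1f} deg")
print(f"b) f_t = {ft_b / 1e6:.1f} MHz, PM = {pm_b:.1f} deg")
```

The left-half-plane zero adds phase lead near ω_t, which is why the phase margin rises even though ω_t moves up.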

Coming Up...
- Next topic: Advanced Current Mirrors and OpAmps
- Readings for next time: Johns & Martin, Sections 3.8 and 5
- Exercises: have a look at the exercises in Johns & Martin.

Analog Microelectronics
Advanced Current Mirrors and OpAmp Design

Outline
- Johns & Martin
  - advanced current mirrors (chap. 6.1)
    - wide-swing current mirrors
    - wide-swing constant-transconductance bias circuit
    - enhanced output-impedance current mirrors (not yet)
    - wide-swing current mirror with enhanced output impedance (not yet)
  - folded-cascode OpAmp (chap. 6.2)
    - small-signal analysis
    - slew rate
- Exercises (6.8 & 6.10)
  - SPICE simulations
  - problems

Advanced current mirrors: wide-swing current mirrors

- The classical two-stage OpAmp was discussed in vlsi-27.
- Recently, a number of alternative OpAmp designs have been gaining in popularity. They make use of more advanced current mirrors.
- Wide-swing current mirror:
  - as shorter channel lengths are used, it becomes more difficult to achieve reasonable OpAmp gains, because short-channel effects degrade the transistor output impedance
  - conventional cascode current mirrors limit the available signal swings -> wide-swing current mirror

Wide-swing current mirrors

[Figure: wide-swing cascode current mirror with Q1 to Q5, I_bias, V_bias, I_in and I_out = I_in; Q5 is sized (W/L)/(n+1)^2]
Conditions: V_out > (n+1) V_eff; for Q4: V_tn > n V_eff
- The basic idea is to bias the drain-source voltages of transistors Q2 and Q3 close to the minimum possible without them going into the triode region.
- Choice of I_bias:
  - I_bias equal to the maximum of I_in (all FETs in saturation)
  - I_bias equal to the nominal I_in (for larger I_in, FETs enter the triode region, but probably only during slewing)
- Design hints:
  - a common choice for n is unity
  - Q5: make its bias slightly larger (by 0.1 V to 0.15 V) in order to offset the increased threshold voltages of Q1 and Q4 due to their body effect
  - L of Q1, Q4 and Q5 is twice the minimal channel length; L of Q2 and Q3 is just slightly larger than the minimal channel length (high-frequency poles)
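The sizing rules can be checked with a short Python sketch using square-law equations and ignoring the body effect. It assumes I_bias equals the mirrored current, and the numbers (W/L = 10, n = 1, V_eff = 0.2 V, V_tn = 0.8 V) are hypothetical. The conventional cascode limit V_tn + 2V_eff is included only for comparison.

```python
import math

def wide_swing_mirror(w_over_l, n, v_eff, v_tn):
    """Square-law design check of the wide-swing cascode mirror (body effect ignored).

    Assumes I_bias equals the mirrored current, so Q5 carries the same current as Q1 to Q4.
    """
    wl_q5 = w_over_l / (n + 1) ** 2                 # Q5 sized (W/L)/(n+1)^2
    v_eff5 = v_eff * math.sqrt(w_over_l / wl_q5)    # -> (n+1) * V_eff
    v_g1 = v_tn + v_eff5                            # cascode gate voltage set by Q5
    v_ds2 = v_g1 - (v_tn + v_eff)                   # V_G1 - V_GS1 = n * V_eff
    return {
        "wl_q5": wl_q5,
        "v_ds2": v_ds2,
        "v_out_min": (n + 1) * v_eff,               # wide-swing mirror
        "v_out_min_cascode": v_tn + 2 * v_eff,      # conventional cascode, for comparison
        "q4_saturated": v_tn > n * v_eff,
    }

print(wide_swing_mirror(w_over_l=10.0, n=1, v_eff=0.2, v_tn=0.8))
```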
Wide-swing constant-transconductance bias circuit

[Figure: wide-swing constant-transconductance bias circuit (Q1 to Q18, resistor R_B) providing V_bias-p, V_casc-p, V_casc-n and V_bias-n. Blocks: bias loop (see vlsi-27 slide 29), cascode bias, and start-up circuitry, which injects current as long as the drain currents are zero]

Enhanced output-impedance current mirror

- Another variation of the cascode current mirror is the enhanced output-impedance current mirror, shown here in simplified form.
- Basic idea: a feedback amplifier keeps the drain-source voltage across Q2 stable, irrespective of the output voltage.
- The additional amplifier increases the output impedance (compare the classical cascode current mirror, vlsi-25 slides 16, 17):
$$R_{out} \approx g_{m1}\,r_{ds1}\,r_{ds2}\,(1 + A)$$

[Figure: simplified enhanced output-impedance current mirror with Q1, Q2, Q3, the feedback amplifier (input at V_bias), I_in, I_out and R_out]
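An order-of-magnitude comparison in Python, using only the first-order expressions; g_m1 = 1 mA/V, r_ds1 = r_ds2 = 100 kΩ and A = 100 are hypothetical values.

```python
def r_out_cascode(g_m1, r_ds1, r_ds2):
    """Classical cascode mirror, first-order: R_out ~ g_m1 r_ds1 r_ds2."""
    return g_m1 * r_ds1 * r_ds2

def r_out_enhanced(g_m1, r_ds1, r_ds2, a):
    """Enhanced output-impedance mirror: the feedback amplifier (gain A) multiplies R_out by (1 + A)."""
    return r_out_cascode(g_m1, r_ds1, r_ds2) * (1.0 + a)

print(r_out_cascode(1e-3, 100e3, 100e3))        # about 10 MOhm
print(r_out_enhanced(1e-3, 100e3, 100e3, 100))  # about 1 GOhm
```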

Folded-cascode OpAmp
- Many modern integrated CMOS OpAmps are designed to drive only capacitive loads.
- Capacitive-only loads do not need a voltage buffer to obtain a low OpAmp output impedance.
- Thus it is possible to realize OpAmps with higher speed and larger signal swings than those that must drive resistive loads.
- These improvements are obtained by having only a single high-impedance node, at the OpAmp output, which drives only capacitive loads.
- All internal nodes have relatively low impedance (around 1/g_m), so the speed is optimized.
- Compensation is usually achieved by the load capacitance.
- The most important parameter is the transconductance: operational transconductance amplifier (OTA).

Folded-cascode OpAmp (cont)

[Figure: folded-cascode OpAmp with differential input V_in (Q1, Q2) and single-ended output V_out driving C_L, which also provides compensation; bias current mirror (Q11) with I_bias1 and I_bias2; Q3, Q4; clamp FETs Q12, Q13; folded-cascode FETs Q5, Q6 (see vlsi-25 slide 19); wide-swing cascode current mirror Q7 to Q10; bias voltages V_B1, V_B2]

- The bias current mirror may be replaced by a wide-swing constant-transconductance bias network; V_B1 and V_B2 would then be V_casc-n and V_casc-p.

Purpose of Q12, Q13:
- increase slew-rate performance
- speed up recovery from slew-rate limiting

Design hints:
- I_bias1 and I_bias2 should be derived from a single bias network
- all current mirrors should be built from parallel combinations of unit-size FETs
Folded-cascode OpAmp small-signal analysis

Assumption: g_m5 and g_m6 are much larger than g_ds3 and g_ds4.
- The differential output current from the drains of the differential pair Q1, Q2 is applied to the load capacitance.
- The small-signal current from Q1 passes directly from source to drain of Q6 and thus into C_L (for Q2 it passes indirectly, via Q5 and the current mirror, into C_L).
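$$A_v = \frac{V_{out}(s)}{V_{in}(s)} = g_{m1} Z_L(s), \qquad A_v(s) = \frac{g_{m1} r_{out}}{1 + s\,r_{out} C_L} \quad (\text{for } g_{m1} = g_{m2})$$
with (see vlsi-25 slide 20)
$$r_{out} \approx \frac{g_m r_{ds}^2}{2}$$
For mid-band and high frequencies
$$A_v \approx \frac{g_{m1}}{s C_L}$$
thus the unity-gain frequency is
$$\omega_t = \frac{g_{m1}}{C_L}$$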
Design hints:
- for large load capacitances, the bandwidth is maximized by maximizing the transconductance of the input FETs: use n-channel FETs
- make the input bias current 4 times larger than the cascode current (maximizing dc gain)

Lead compensation (series resistance R_C added to C_L):
$$A_v(s) = \frac{g_{m1}}{\dfrac{1}{r_{out}} + \dfrac{1}{R_C + 1/(sC_L)}} \approx \frac{g_{m1}\,(1 + sR_C C_L)}{sC_L}$$
R_C can be chosen to place a zero at 1.2 times the unity-gain frequency.
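A Python sketch of the high-frequency relations above: ω_t = g_m1/C_L, and R_C chosen so that the zero 1/(R_C C_L) sits at 1.2 ω_t, i.e. R_C = 1/(1.2 g_m1). The value g_m1 = 2.4 mA/V is hypothetical; with C_L = 10 pF it lands close to the ω_t = 2π·38 MHz and R_C = 347 Ω quoted in Ex ana6.2.

```python
import math

def folded_cascode_ac(g_m1, c_l, zero_factor=1.2):
    """Unity-gain frequency (Hz) and lead-compensation resistor for A_v(s) ~ g_m1 (1 + s R_C C_L) / (s C_L)."""
    w_t = g_m1 / c_l                      # omega_t = g_m1 / C_L
    r_c = 1.0 / (zero_factor * g_m1)      # 1 / (R_C C_L) = zero_factor * omega_t
    return w_t / (2.0 * math.pi), r_c

f_t, r_c = folded_cascode_ac(g_m1=2.4e-3, c_l=10e-12)   # hypothetical g_m1
print(f"f_t = {f_t / 1e6:.1f} MHz, R_C = {r_c:.0f} Ohm")
```

Note that R_C depends only on g_m1, not on C_L, because both the zero and ω_t scale with 1/C_L.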
Folded-cascode OpAmp slew-rate

- The diode-connected FETs Q12 and Q13 are turned off during normal operation and have almost no effect.
- Slew-rate limiting behavior:
  - assume a large differential input voltage turns Q1 on hard and Q2 off
  - since Q2 is off, all of the bias current of Q4 is directed through cascode FET Q5, through the n-channel current mirror and out of the load capacitance
  - the output voltage decreases linearly with a slew rate given by
$$SR = \frac{I_{D4}}{C_L}$$
  - Q1 and the current source I_bias go into the triode region, moving the drain voltage of Q1 to the negative power supply
- Q12 and Q13 clamp the drain voltages so they do not change as much during slew-rate limiting.
- In addition, Q12 and Q13 increase the bias currents of Q3 and Q4 and thus the current available for C_L.
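A small Python sketch of SR = I_D4/C_L without clamp FETs, plus the duration of the resulting linear ramp; I_D4 = 200 µA and C_L = 10 pF are hypothetical values.

```python
def slew_rate(i_d4, c_l):
    """Folded cascode without clamps: SR = I_D4 / C_L (V/s)."""
    return i_d4 / c_l

def slewing_time(v_step, i_d4, c_l):
    """Duration of the linear output ramp for a step of v_step volts."""
    return v_step / slew_rate(i_d4, c_l)

# hypothetical bias: I_D4 = 200 uA into C_L = 10 pF
print(slew_rate(200e-6, 10e-12) / 1e6, "V/us")
print(slewing_time(1.0, 200e-6, 10e-12) * 1e9, "ns for a 1 V step")
```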

Exercises VLSI-28
Ex ana6.2 (difficulty: medium): Find reasonable FET sizes for the folded-cascode OpAmp. Assume a ±2.5 V power supply, a maximum power dissipation of 2 mW, a current ratio of 4:1 between input and cascode FETs, a bias current of Q11 equal to 1/30 of that of Q3 (so it can be ignored for power dissipation), a maximum FET width of 300 µm, L = 1.6 µm and V_eff = 0.25 V for all FETs except the input FETs, W1 = W2 = 300 µm, widths rounded to 10 µm, C_L = 10 pF, µnCox = 3 µpCox = 96 µA/V². a) Find all FET sizes and the unity-gain frequency; b) find the slew rate with and without clamp FETs; c) find a reasonable lead-compensation resistor R_C. Result: a) Q1 to Q4 = 300 µm, Q5, Q6 = 60 µm, Q7 to Q10 = 20 µm, Q11 to Q12 = 10 µm, ω_t = 2π·38 MHz; b) SR = 32 V/µs; c) R_C = 347 Ω (see Johns/Martin pp. 271-273)

Coming Up...
- Next topic: Comparators
- Readings for next time: Johns & Martin, Sections 6.1 and 6.2
- Exercises: have a look at the exercises in Johns & Martin.

VLSI Systems Design

FSM-D Architecture Model
[Figure: FSM-D block diagram: data-path (RTL logic) with data inputs and data outputs; control path (finite state machine) with control inputs (sensors) and control outputs (actuators)]
Goal: You are able to use logic gates and flip-flops wisely and not only in an ad-hoc manner. You master the finite state machine data path model.

Architecture Philosophy
- The FSM-D architecture model is composed of 2 blocks:
  - finite state machine (FSM)
  - data-path (D)
- Goals of the FSM-D architecture model:
  - structured design approach
  - resource optimization
  - readability, documentation
- FSM characteristics:
  - manager
  - controlling, taking decisions, initiating sub-tasks
- Data-path characteristics:
  - worker, specialist
  - executing, calculating, storing & moving data

FSM-D Architecture Model

- The FSM-D architecture model
  - is based on the FSM model and the data-path model
  - interface: inputs, outputs

[Figure: FSM-D block diagram: data-path (RTL logic) with data inputs and data outputs; control path (finite state machine) with control inputs (sensors) and control outputs (actuators)]
FSM Structures
- Mealy machine
  - outputs depend on inputs and state
  [Figure: i[k] -> transition logic -> s[k+1] -> state register -> s[k] -> output logic -> o[k]]
$$s[k+1] = f(i[k], s[k]), \qquad o[k] = g(i[k], s[k])$$
- Moore machine
  - outputs depend on the state only (functionally restricted)
$$s[k+1] = f(i[k], s[k]), \qquad o[k] = g(s[k])$$
- Medvedev (Medwedjew) machine
  - outputs depend on the state only; outputs are hazard-free
  [Figure: i[k] -> transition logic -> s[k+1] -> state register -> s[k] = o[k]]
$$s[k+1] = f(i[k], s[k]), \qquad o[k] = s[k]$$
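The three machine types differ only in what the output function sees. The Python sketch below runs the same hypothetical task, detecting two consecutive 1s, as a Mealy, a Moore and a Medvedev machine; the example and all function names are illustrative, not from the slides.

```python
def run_mealy(f, g, s0, inputs):
    s, out = s0, []
    for i in inputs:
        out.append(g(i, s))   # o[k] = g(i[k], s[k])
        s = f(i, s)           # s[k+1] = f(i[k], s[k])
    return out

def run_moore(f, g, s0, inputs):
    s, out = s0, []
    for i in inputs:
        out.append(g(s))      # o[k] = g(s[k])
        s = f(i, s)
    return out

def run_medvedev(f, s0, inputs):
    s, out = s0, []
    for i in inputs:
        out.append(s)         # o[k] = s[k]: outputs come straight from the state register
        s = f(i, s)
    return out

bits = [1, 1, 0, 1, 1, 1]

# Mealy: state = previous input bit, output = two ones in a row (reacts in the same cycle)
mealy = run_mealy(lambda i, s: i, lambda i, s: i & s, 0, bits)

# Moore: state = number of consecutive ones seen, saturating at 2
moore = run_moore(lambda i, s: min(s + 1, 2) if i else 0, lambda s: int(s == 2), 0, bits)

# Medvedev: state = (previous bit, detect flag); the detect flag is itself a register bit
medvedev = [det for _, det in run_medvedev(lambda i, s: (i, i & s[0]), (0, 0), bits)]

print(mealy, moore, medvedev)
```

The Mealy output reacts in the same cycle as the second 1; the Moore and Medvedev outputs appear one cycle later, because they are derived from the state only.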

Data-Path Elements
- A typical data-path consists of 3 types of basic elements:
  - buses, multiplexers, de-multiplexers
  - functional units, like comparator, adder, barrel shifter, ALU, etc.
  - memory elements, like flip-flop, register, register file, etc.

[Figure: examples of data-path elements: 32-bit buses bus[31:0] and bus[31:16] (16 bits), a 32-bit multiplexer, a 32-bit adder ADD (inputs a, b, cin; outputs result, cout) and a 32-bit register with enable]
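A behavioral Python sketch of the element types on 32-bit words: a 2:1 multiplexer, a bus slice such as bus[31:16], and an adder with cin/cout. The function names are hypothetical and the sketch models only the logic, not timing.

```python
MASK32 = (1 << 32) - 1

def add32(a, b, cin):
    """32-bit adder: returns (result, cout)."""
    total = (a & MASK32) + (b & MASK32) + (cin & 1)
    return total & MASK32, total >> 32

def mux2(sel, in0, in1):
    """2:1 multiplexer: in0 when sel = 0, in1 when sel = 1."""
    return in1 if sel else in0

def bus_slice(bus, hi, lo):
    """bus[hi:lo], e.g. bus[31:16] is the upper 16 bits of a 32-bit bus."""
    return (bus >> lo) & ((1 << (hi - lo + 1)) - 1)

print(add32(0xFFFFFFFF, 1, 0))                 # wraps around with carry out
print(hex(bus_slice(0x12345678, 31, 16)))
print(mux2(1, 0xAAAA, 0x5555))
```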

Data-Path Memory Element

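- Memory elements store new values at every clock cycle.
- To give the FSM full control over the data-path, the data-path memory elements need to be upgraded with an enable control input.

[Figure: register with enable, built from a plain 32-bit register (d, do) and a multiplexer that selects between the new data di and the register output do under control of enable; timing diagram of clock, enable, di and do]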
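A behavioral Python sketch of the enable-register idea: a plain register plus a feedback multiplexer, with one active clock edge per `clock()` call. The class and method names are hypothetical.

```python
class EnableRegister:
    """Register with enable: a plain D register whose input comes from a 2:1 mux.

    enable = 0 feeds the register output back (hold), enable = 1 selects the new data di.
    """
    def __init__(self, width=32, reset_value=0):
        self.mask = (1 << width) - 1
        self.do = reset_value & self.mask

    def clock(self, di, enable):
        d = di if enable else self.do    # multiplexer in front of the register
        self.do = d & self.mask          # value stored at the active clock edge
        return self.do

reg = EnableRegister(width=8)
print([reg.clock(di, en) for di, en in [(5, 1), (9, 0), (9, 0), (300, 1)]])
```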

Design Steps
- A tutorial design serves as the vehicle for a practical approach: a Black Jack player.
- The interface definitions are a key element in the FSM-D design procedure.
- Design steps:
  - step 1: definition of the algorithm
  - step 2: FSM-D interface definition
  - step 3: data-path design
  - step 4: data-path interface definition
  - step 5: FSM interface definition
  - step 6: FSM state definition
  - step 7: FSM design
  - step 8: VHDL coding
  - step 9: test-bench design and simulation

Design Step 1: Algorithm Definition

- Goal of the Black Jack game:
  - get as close as possible to 21 points
  - the game is lost when the total exceeds 21 points
- Game restrictions:
  - the cards have the following values: 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11 (ace), as well as jack, queen and king, all three representing 10 points
- Game rules:
  - ask for as many cards as needed
  - the ace can be counted as 11 points or as 1 point
- Our player's behavior:
  - ask for cards as long as the summed-up points are below 16
  - always count an ace as 11 points
  - when the total exceeds 21 points, count a possible ace as 1 point to get a second chance
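Before designing hardware, a software reference model of the algorithm is useful, for example to check the simulation later. This Python sketch assumes the stop rule follows the cmp16 comparator of the data path (keep drawing while the total is not above 16); the prose rule "below 16" differs only when the total is exactly 16. It also allows for more than one ace. The function name and parameters are hypothetical.

```python
def play(deck, stop_above=16, limit=21):
    """Reference model of the player's strategy; returns (final_total, lost).

    deck must contain enough cards; values 2..10, 11 = ace.
    """
    total = 0
    aces_as_11 = 0                 # aces currently counted as 11 points
    cards = iter(deck)
    while total <= stop_above:     # keep asking for cards
        card = next(cards)
        total += card
        if card == 11:
            aces_as_11 += 1
        if total > limit and aces_as_11 > 0:
            total -= 10            # second chance: count one ace as 1 point
            aces_as_11 -= 1
    return total, total > limit

print(play([10, 5, 3]))       # stops at 18
print(play([11, 5, 10, 4]))   # 26 -> 16 (ace as 1) -> 20
print(play([10, 6, 9]))       # 25, lost
```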

Design Step 2: FSM-D Interface Definition

- define the interface of the overall FSM-D architecture model
- define the clock edge sensitivity and the active level of the control signals

[Figure: BlackJack Player FSM-D with inputs clk, start, cardReady and cardValue(3:0), and outputs newCard, score(4:0), lost and finished]

Design Step 3: Data-Path Definition

- the data-path has to be able to execute all functional operations of the algorithm
- clearly separate control-path and data-path tasks, as in the manager/worker analogy
- use memory elements, buses and multiplexers for storing and moving data
- use combinational logic for functional operations like adding, comparing, etc.

Design Step 3: Data-Path Definition: loading&comparing

- loading the card value into a register
- comparing it to the ace (11)

[Figure: cardValue(3:0) is loaded into the register regLoad (enable enaLoad, clk, rst); comparator cmp11 (A = B?) compares regLoad with 11]

Design Step 3: Data-Path Definition: accumulating

- accumulating the card values

[Figure: adder ADD with inputs a = regLoad and b = regAdd; its result is stored in the register regAdd (enable enaAdd, clk, rst); loading path and cmp11 as before]

Design Step 3: Data-Path Definition: comparing sum

- comparing the accumulated value
- displaying the score

[Figure: comparators cmp16 (A > B?, B = 16) and cmp21 (A > B?, B = 21) on regAdd; a score register (enable enaScore) drives the score output; loading path, cmp11 and adder as before]

Design Step 3: Data-Path Definition: subtracting 10

- insert a second path to the load register and the adder to subtract 10

[Figure: a multiplexer (select sel) in front of regLoad chooses between cardValue and the constant -10; the adder, regAdd, cmp11, cmp16, cmp21 and the score register are unchanged]
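A behavioral Python sketch of the complete data path, one call per active clock edge. The register width follows score(4:0); the encoding sel = 1 for the -10 input is an assumption, and the function names are hypothetical. With 5-bit registers, -10 is stored as 10110 (22), and adding it modulo 32 subtracts 10, so the adder alone can realize the second path.

```python
MASK5 = 0x1F  # 5-bit registers, matching score(4:0)

def datapath_step(regs, card_value, sel, ena_load, ena_add, ena_score):
    """One active clock edge of the data path; regs = (regLoad, regAdd, score)."""
    load, add, score = regs
    mux_out = -10 if sel else card_value           # assumed encoding: sel = 1 selects -10
    new_load = (mux_out & MASK5) if ena_load else load
    new_add = ((add + load) & MASK5) if ena_add else add
    new_score = add if ena_score else score
    return new_load, new_add, new_score

def comparators(regs):
    """Combinational outputs cmp11, cmp16, cmp21 on the current register contents."""
    load, add, _ = regs
    return load == 11, add > 16, add > 21

regs = (0, 26, 0)
regs = datapath_step(regs, 0, 1, 1, 0, 0)   # regLoad <= -10 (stored as 22)
regs = datapath_step(regs, 0, 0, 0, 1, 0)   # regAdd <= 26 + 22 mod 32 = 16
print(regs, comparators(regs))
```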

Design Step 4: Data-Path Interface Definition

- define the interface of the data-path block
- define the clock edge sensitivity and the active level of the control signals

[Figure: DataPath block with inputs cardValue(3:0), clk, rst, sel, enaLoad, enaAdd and enaScore, and outputs score(4:0), cmp11, cmp16 and cmp21]

Design Step 5: FSM Interface Definition

- define the inputs and outputs of the FSM block
  - FSM input signals: cmp11, cmp16, cmp21, cardReady
  - FSM output signals: finished, lost, newCard, sel, enaLoad, enaAdd, enaScore

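Design Step 5: Interface Definition, Completed FSM-D Hierarchy

[Figure: BlackJack Player FSM-D containing the DataPath and the ControlPath. External signals: clk, rst, start, cardReady, cardValue(3:0), score(4:0), newCard, lost, finished. Internal signals: sel, enaLoad, enaAdd, enaScore (ControlPath to DataPath) and cmp11, cmp16, cmp21 (DataPath to ControlPath)]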

Design Step 6: FSM State Definition

- draw a skeleton state with placeholders for the state name and the output signals

[Figure: skeleton state with the state name and the output signals finished, lost, newCard, sel, enaLoad, enaAdd, enaScore]
Design Step 7: FSM Design, FSM-D Timing

- single clock cycle schema
- Moore-type FSM
- FSM-D timing diagram:
  - registered values are available in the next state (or when leaving the next state)
  - combinational values are available in the current state (or when leaving the current state)

[Figure: FSM-D timing diagram over the states LoadReg, CheckVal, Idle1, OpenData, Idle2 with the signals clock, enable (FSM), registers (D), inform (D), select (FSM) and data bus (D) carrying the new data value]
Design Step 7: FSM Design

- design the Moore-type state diagram
- conditions on the arrows are FSM inputs
- output values are defined in the states
- use a lightning-bolt arrow for the asynchronous reset

[Figure: Moore state diagram with asynchronous reset, showing states such as CallCard, LoadCard, Handshake and AddCard; each state lists the outputs hold, broke, newCard, sel, enaLoad, enaAdd, enaScore (e.g. CallCard: 0 0 1 - - 0 0); transitions are conditioned on cardReady and on combinations of cmp11, cmp16 and cmp21]
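To see how FSM and data path interact, here is a hypothetical cycle-based Python model of the whole FSM-D. The states below (CallCard, LoadCard, WaitRelease, AddCard, Check, LoadMinus10, Done, Lost) are an illustrative Moore design, not the diagram on the slide; the count of aces still worth 11 is kept as part of the FSM state, and the dealer is a simple model of the newCard/cardReady handshake. As the timing slide describes, regAdd is written in AddCard and the comparators see the new sum one state later, in Check.

```python
# Output columns, in the order of the state skeleton
NAMES = ("finished", "lost", "newCard", "sel", "enaLoad", "enaAdd", "enaScore")

# Hypothetical Moore output table; sel = 1 selects the constant -10 (assumed encoding)
OUTPUTS = {
    "CallCard":    (0, 0, 1, 0, 0, 0, 0),   # request a card
    "LoadCard":    (0, 0, 1, 0, 1, 0, 0),   # regLoad <= cardValue
    "WaitRelease": (0, 0, 0, 0, 0, 0, 0),   # handshake: wait until cardReady drops
    "AddCard":     (0, 0, 0, 0, 0, 1, 0),   # regAdd <= regAdd + regLoad
    "Check":       (0, 0, 0, 0, 0, 0, 0),   # comparators see the new sum here
    "LoadMinus10": (0, 0, 0, 1, 1, 0, 0),   # regLoad <= -10 (an ace now counts as 1)
    "Done":        (1, 0, 0, 0, 0, 0, 1),   # score <= regAdd
    "Lost":        (1, 1, 0, 0, 0, 0, 1),
}

def next_state(state, aces, card_ready, cmp11, cmp16, cmp21):
    """Transition logic; `aces` (aces still counted as 11) is part of the FSM state."""
    if state == "CallCard":
        return ("LoadCard" if card_ready else "CallCard"), aces
    if state == "LoadCard":
        return "WaitRelease", aces
    if state == "WaitRelease":
        return ("WaitRelease" if card_ready else "AddCard"), aces
    if state == "AddCard":
        return "Check", aces + (1 if cmp11 else 0)
    if state == "LoadMinus10":
        return "AddCard", aces
    if state == "Check":
        if cmp21 and aces > 0:
            return "LoadMinus10", aces - 1
        if cmp21:
            return "Lost", aces
        if cmp16:
            return "Done", aces
        return "CallCard", aces
    return state, aces                       # Done and Lost are final

def simulate(deck, max_cycles=500):
    """Cycle-based simulation of FSM + data path + a simple dealer; returns (score, lost)."""
    cards = iter(deck)
    reg_load = reg_add = reg_score = 0       # 5-bit data-path registers after reset
    state, aces = "CallCard", 0
    card_ready, card_value = False, 0
    for _ in range(max_cycles):
        o = dict(zip(NAMES, OUTPUTS[state]))                            # o[k] = g(s[k])
        cmp11, cmp16, cmp21 = reg_load == 11, reg_add > 16, reg_add > 21  # combinational
        nxt = next_state(state, aces, card_ready, cmp11, cmp16, cmp21)
        # active clock edge: data-path registers with enable inputs
        new_load = ((-10 if o["sel"] else card_value) & 0x1F) if o["enaLoad"] else reg_load
        new_add = ((reg_add + reg_load) & 0x1F) if o["enaAdd"] else reg_add
        new_score = reg_add if o["enaScore"] else reg_score
        reg_load, reg_add, reg_score = new_load, new_add, new_score
        if o["finished"]:
            return reg_score, bool(o["lost"])
        # dealer side of the newCard/cardReady handshake, updated once per cycle
        if o["newCard"] and not card_ready:
            card_value, card_ready = next(cards), True
        elif not o["newCard"]:
            card_ready = False
        state, aces = nxt
    raise RuntimeError("no result within max_cycles")

print(simulate([11, 5, 10, 4]))   # the ace is re-counted as 1 after the total passes 21
```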

Design Step 8: Coding Data-Path

- all registers with their associated logic are placed in one process (same clock and asynchronous reset)
- loosely coupled combinational logic can be coded with conditional signal assignments

[Figure: complete data path: regLoad with input multiplexer (cardValue / -10), adder, regAdd, score register and comparators cmp11, cmp16, cmp21]

    -- assumes 5-bit std_logic_vector registers and ieee.std_logic_unsigned for "+" and ">"
    -- process: all registers, one clock, asynchronous active-low reset
    process(clk, rst)
    begin
      if (rst = '0') then
        regLoad  <= "00000";
        regAdd   <= "00000";
        regScore <= "00000";
      elsif (clk'event and clk = '0') then   -- falling clock edge
        if (enaAdd = '1') then
          regAdd <= regAdd + regLoad;
        end if;
        ...
      end if;
    end process;

    -- continuous conditional signal assignments (combinational comparators)
    cmp11 <= '1' when (regLoad = "01011") else '0';   -- regLoad = 11
    cmp16 <= '1' when (regAdd > "10000") else '0';    -- regAdd > 16
    cmp21 <= '1' when (regAdd > "10101") else '0';    -- regAdd > 21
Design Step 8: Coding FSM

- one clocked process is used for the state transition
- one combinational process is used for the state-dependent output assignment

    -- clocked process: state register and transition logic, s[k+1] = f(i[k], s[k])
    process(clk, rst)
    begin
      if (rst = '0') then
        state <= StartState;
      elsif (clk'event and clk = '0') then
        case state is
          when StartState =>
            state <= CallCard;
          when CallCard =>
            if (cardReady = '1') then
              state <= LoadCard;
            end if;
          -- ... remaining states
          when others =>
            state <= IllegalState;   -- used for VHDL analysis; null for synthesis
        end case;
      end if;
    end process;

    -- combinational process: Moore outputs, o[k] = g(s[k])
    process(state)
    begin
      case state is
        when StartState => outvec <= "000--00";
        when CallCard   => outvec <= "001--00";
        -- ... remaining states
        when others     => outvec <= "UUUUUUU";   -- used for VHDL analysis; null for synthesis
      end case;
    end process;

    finished <= outvec(6);
    lost     <= outvec(5);
    newCard  <= outvec(4);
    ...
Design Step 9: Test-Bench Design

- Compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is good measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming from or going to the outside of the lab

[Figure: test bench]

Design Step 9: Test-Bench Design, Test Cycle

- cycle-based test
- apply input patterns at the beginning of the test cycle
- capture the response after the rising or falling clock edge

[Figure: test cycle: stimuli are applied at the start of the cycle and the inputs are kept stable; the (synchronous) outputs are captured after the clock edge, once they are stable]
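A cycle-based test loop in Python, as a sketch: each vector holds the stimuli for one cycle and the expected registered output. The device under test is a hypothetical 5-bit accumulator with enable (similar to regAdd), and all names are illustrative.

```python
class Accumulator:
    """Hypothetical DUT: 5-bit accumulator with enable (like regAdd)."""
    def __init__(self):
        self.q = 0

    def clock(self, value, enable):
        if enable:
            self.q = (self.q + value) & 0x1F
        return self.q

def run_cycle_test(dut_clock, vectors):
    """Apply the stimuli of each cycle, clock once, capture the output and compare it."""
    failures = []
    for cycle, (stimuli, expected) in enumerate(vectors):
        captured = dut_clock(**stimuli)      # stimuli stable before the edge, response after it
        if captured != expected:
            failures.append((cycle, expected, captured))
    return failures

vectors = [
    ({"value": 10, "enable": 1}, 10),
    ({"value": 7, "enable": 0}, 10),   # enable low: register holds
    ({"value": 7, "enable": 1}, 17),
    ({"value": 20, "enable": 1}, 5),   # 37 wraps to 5 in 5 bits
]
print(run_cycle_test(Accumulator().clock, vectors))   # empty list: all cycles passed
```

Recording (cycle, expected, captured) instead of stopping at the first mismatch makes it easy to locate the failing cycle in the waveform.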

Design Step 9: Test-Bench Design, Simulation

- cycle-based test
- apply input patterns at the beginning of the test cycle
- observe the response after the rising or falling clock edge
- visualize the data-path registers and the FSM state

Errors and Pitfalls

- Asynchronous external inputs to the FSM provoke state hazards:
  - imagine that a 0.1 ns hazard can be captured in the state register
  - imagine 100 states in the FSM
  - imagine a 100 MHz clock frequency
  - -> 100 errors per second
- -> input synchronization for all external (non-synchronous) FSM inputs

[Figure: FSM with an input synchronization register between the non-synchronous inputs and the transition logic (i[k] -> transition logic -> s[k+1] -> state register -> s[k] -> output logic -> o[k]); without it, the new state is always subject to hazards]

Summary and Conclusion

- The FSM-D architectural model supports a structured design approach.
- A 9-step approach for FSM-D design was presented.
- The distribution of tasks between FSM and data-path is crucial:
  - ace counting (0, 1 or 2) in the Black Jack dealer: who should do it, the FSM or the data-path?
- The workers/manager analogy is used to assign subtasks to the control path (manager) and the data path (specialized workers).
