Professor: Dr. Marcel Jacomet (based on transparencies designed by Chris Terman at MIT, completely updated and adapted at MicroLab-I3S)
Overview
Microelectronic history
the complexity of microelectronics
design steps
Goal: You are familiar with the history of microelectronics, have an idea of its complexity and have an overview of the VLSI design steps.
MicroLab, VLSI-1 (1/28)
JMM v1.4
Readings from a Starter Guide to VHDL and some articles.
Some problems to be worked at home.
Self-study of the VHDL language with help of the CBT CD from Doulouse.
Some design exercises to be done in the lab.
Specify, design and simulate a small VHDL design project using a data-path / finite state machine. Place & route it on an FPGA target technology (due date: July 19th, 2002 at 13h00).
One 70-minute in-class test. Meant to be duck soup if you've been coming to lectures and doing the lab and homework (date: Friday July 12th, 2002).
Topic vlsi1: history & complexity vlsi8: micro technologies vlsi8: micro technologies vlsi21: top-down design, VHDL Ex400, 401 vlsi21 & Ex402 vlsi21 & Ex404,405 vlsi21 & Ex406-408 vlsi21 & Ex409 vlsi21 & Ex410 Ex450 Ex451 Ex452 Test test discussion and outlook at 13h00 project due
Self-Study A VLSI tutorial How a silicon int. article Hoff VHDL/CBT VHDL/CBT VHDL/CBT VHDL VHDL VHDL chapter 5 VHDL finish project project project project project
Technique where many circuit components and the wiring that connects them are manufactured simultaneously into a compact, reliable and inexpensive chip.
Early (circa 1977) characterization of circuit size before people realized that the number of components per chip was quadrupling every 24 months (Moore's Law)! This growth rate has slowed in recent years; can you guess why?
Bell Labs lays the groundwork:
1940: Ohl develops PN junction
1945: Shockley's lab established
1947: Bardeen and Brattain create point-contact transistor with two PN junctions. Gain = 18.
1951: Shockley develops junction transistor which can be manufactured in quantity.
1952: Dummer forecasts "solid block [with] layers of insulating, conducting and amplifying materials"
1954: The first transistor radio! Also, TI makes first silicon transistor (price $2.50)
Early integration
Jack Kilby, working at Texas Instruments, first dreamed up the idea of a monolithic integrated circuit in July 1958. By the end of the year, he had constructed several examples, including the flip-flop shown in the patent drawing above. Components are connected by hand-soldered wires and isolated by shaping, with pn diodes used as resistors.
Robert Noyce experimented in the late 40's with transistors while a physics major at college. He went to MIT where, much to his surprise, few people had even heard about the transistor. After getting his PhD in 1953, he worked in industry, finally arriving at Mountain View, CA and Shockley Semiconductor Labs in 1955.
In 1957, Noyce left Shockley's lab to form Fairchild Semiconductor with Jean Hoerni. Gordon Moore is another founder.
In early 1958, Hoerni invents technique for diffusing impurities into the silicon to build planar transistors and then using a SiO2 insulator.
In mid 1959, Noyce develops first true IC using planar transistors, back-to-back pn junctions for isolation, diode-isolated silicon resistors and SiO2 insulation with evaporated metal wiring on top.
1961: TI and Fairchild introduced the first logic ICs (cost ~$50 in quantity!). This is a dual flip-flop with 4 transistors.
1963: Densities and yields are improving. This circuit has four flip-flops.
0.97 mm
1967: Fairchild markets the semi-custom chip shown below. Transistors (organized in columns) could be easily rewired using a two-layer interconnect to create different circuits. This circuit contains ~150 logic gates.
3.81 mm
1968: Noyce and Moore leave Fairchild and found Intel. No business plan, just a promise to specialize in memory chips. They raise $3M in two days and move to Santa Clara. By 1971 Intel had 500 employees; by 1983 it had 21,500 employees and $1100M in sales.
In 1970, making good on its promise to its investors, Intel starts selling a 1K bit RAM, the 1103. It was a bear to interface to, but its density and cost made it the only game in town.
In 1971 Intel introduces the first microprocessor, designed by Ted Hoff. The 4004 had 4-bit buses and a clock rate of 108kHz. It had 2300 transistors and was built in a 10um process. It never captured much interest in the market and was soon eclipsed by its more capable brothers.
Exponential Growth
Introduced in 1972, the 8008 had 3,500 transistors supporting a byte-wide data path. Despite its limitations, the 8008 was the first microprocessor capable of playing the role of computer CPU, as demonstrated on the cover of the July '74 issue of Radio-Electronics.
Last, but not least, on our tour is the 8080. Introduced in 1974, the 8080 had 6,000 transistors fabbed in a 6um process. The clock rate was 2MHz, more than enough to ignite the personal computer industry. At least Paul Allen and his partner thought so when they wrote a BASIC interpreter for the 8080 in 1975. They would later collaborate in another, more profitable, venture...
Today
Many disciplines have contributed to the current state of the art in VLSI design:
solid-state physics
materials science
lithography and fab
device modeling
circuit design & layout
architecture
algorithms
CAD tools
Computer-Aided Design
CAD Tools #1
generate
verify
Symbolic layout tools to ease the task of physical design; mask verification to ensure manufacturability.
organize
Standard-cell place and route for random logic.
Circuit analysis programs predict circuit behavior at all the process corners. Gate-level and behavioral simulators help you get it right the first time! Tools to do the tedious, repetitive work such as routing, tiling a mosaic of building-block cells, or verifying that the layout and schematic match.
CAD Tools #2
Problem: designing highly complex VLSI circuits (100K to xM fets)
classical, iterative procedures are unsuitable
precise transistor models are necessary for reliable predictions
data inflation
Solution:
new design methodologies
powerful design tools
high level design languages
silicon compiler would be useful
Chip Complexity #1
Chip classification according to number of active elements and minimal feature size: classification SSI MSI LSI VLSI ULSI year 1970 1980 1985 1992 2002 2002 2010 #transistors 1 - 100 100 - 1k 1k - 100k 100K ? example gates registers uP RAM, sig. proc.
Chip Complexity #2
Can you really imagine the chip complexity of today's VLSI chips and not just express it as a mere number?
street map image:
year   feature     block
1970   10x10 µm    200 µm
1980   10x5 µm     200 µm
1992   10x0.7 µm   200 µm
Architecture
(Multiple choice) This is a picture of
(A) a programmable general purpose ASIC with 1/4 million transistors on 40mm2, designed in a 0.7µm CMOS full custom technology.
(B) a processor able to execute 64 knowledge based rules in parallel due to a 3 stage pipelined architecture with hard-coded adder, multiplier, divider architecture.
(C) the fastest fuzzy processor in the world, designed by MicroLab-I3S and presented at the international FUZZ98 conference in New Orleans
ANSWER: _________
RAM Generator
VLSI Fact-of-Life #4: You can't find all the bugs
The key word here is find:
one can't explore the behaviour of the circuit under all possible conditions
some of the bugs arise from unanticipated interactions which, by definition, one never thinks of testing
it's not clear when one is done looking for bugs! Time pressures mean that most searches stop too soon.
The trick is to choose some implementation rules that result in a circuit that is correct by construction*. For example:
choose a simple clocking scheme
module inputs must go only to fet gates
disallow unclocked feedback
make register t(clk-to-Q) > t(hold)+skew
use poly only for local interconnect
no diffusion wires
etc., etc., etc.
* or at least avoid as many problems as possible!
VHDL
design flow
Course material
Textbook from Weste & Eshraghian for 4th and 5th semester (voluntary)
Copy of transparencies (placeholder for private notes)
VHDL Starter (recommended)
CAD Exercises on the MicroLab web pages
CBT CD on VHDL for your PC (lending from MicroLab in 4th semester)
different small articles
Coming Up...
We'll be traveling top-down in 4th semester and bottom-up in 5th & 6th semester:
Next topic: Microelectronic technologies like standard cell, gate array, sea-of-gates, macro cell, FPGA, tiny micro-controllers.
Readings for next time (web CBT tutorials, see http://www.microlab.ch/academics/courses):
How a silicon integrated circuit is made (web CBT)
A VLSI Tutorial up to chapter with NAND/NOR (web CBT from Uni Manchester)
T. Hoff: Article about the µP History (German)
To learn more about Intel's early days and to ogle some die photos of oldie-but-goodie chips, browse at the Intel link of the MicroLab VLSI course web page.
VLSI Design I
The MOSFET model
Wow ! Are device models as nice as Cindy ?
Overview
The large signal MOSFET model and second order effects.
MOSFET capacitances.
Introduction to fet process technology
Goal: You can use the large signal equivalent MOS device equation. You are familiar with second order effects like body effect and channel length modulation. You know the MOS capacitances. You know the basic steps in MOS fabrication.
500um slice of a silicon ingot that has been doped with an acceptor (typically boron) to increase the concentration of holes to 10^14/cm3 - 10^18/cm3.
Good for n-channel fets, but p-channel fets will need an n-type well (or tub) to live in!
Next, a thick (0.4um) layer of silicon dioxide, called field oxide, is formed on the surface by oxidation in wet oxygen. This is then etched to expose surface where we want to make a mosfet:
Now grow a thin (0.01um = 100 Å) layer of silicon dioxide, called gate oxide, on the surface by exposing the wafer to dry oxygen.
The gate oxide needs to be of high quality: uniform thickness, no defects! The thinner the gate oxide, the more oomph the fet will have (we'll see why soon) but the harder it is to make it defect free.
On top of the thin oxide a 0.7um thick layer of polycrystalline silicon, called polysilicon or poly for short, is deposited by CVD. The poly layer is patterned and plasma etched (thin ox not covered by poly is etched away too!) exposing the surface where the source and drain junctions will be formed:
gate oxide (only under poly)
poly wires
field oxide
Poly has a high sheet resistance (25 ohms/square) which can be reduced by adding a layer of a silicided refractory metal such as titanium (TiSi2), tantalum (TaSi2) or molybdenum (MoSi2). These have sheet resistances of 1, 3 or 5 ohms per square, respectively. This is great for memory structures that have lots of poly wiring.
The entire surface is doped, either by diffusion or ion implantation, with phosphorus (an electron donor) which creates two n-type regions in the substrate. The phosphorus also penetrates the poly, reducing its resistance and affecting the nfet's threshold.
diffusions are self-aligned with poly
n+ wires: 20-30 ohms/sq.
Finally an intermediate oxide layer is grown and then reflowed to flatten its surface. Holes are etched in the oxide (where contacts to poly/diff are wanted) and aluminum is deposited, patterned and etched.
metal wires (0.08 ohms/square)
NFET Operation
Picture shows configuration when Vgs < Vto: Ids = 0
[Figure: nfet cross-section with terminals S, G, D and B, n+ source/drain regions in the p substrate, and the depletion layer]
depletion layer: no mobile carriers, but fixed negative ions (slight intrusion into n+, but mostly in p area)
n+ regions: mobile electrons, fixed positive ions (n+ means heavily doped with donors, doesn't imply positive charge!)
Terminal with higher voltage is labelled D, the other is labelled S, so Ids >= 0.
[Figure: other nfet symbols]
INVERSION: A sufficiently strong vertical field will attract enough electrons to the surface to create a conducting n-type channel between the source and drain.
CONDUCTION: If a channel exists, a horizontal field will cause a drift current from the drain to the source. Expect Ids proportional to Vds*(W/L)?
Threshold voltage
The gate voltage required to form the channel is called the threshold voltage. Many factors affect the gate-source voltage at which the channel becomes conductive. Threshold voltage for source-bulk voltage zero:
[Equation terms: $C_{ox} = \varepsilon_{ox} / t_{ox}$, $\frac{kT}{q}\ln\frac{N_D N_A}{n_i^2}$, $\sqrt{2\,\varepsilon_{si}\, q\, N_A\, 2\phi_F}$]
As we'll see, this effect comes into play in series-connected fets where only one of the fets will have Vsb = 0 and the other fets will have Vsb > 0 and a higher threshold voltage.
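A minimal sketch of that body effect, assuming the usual level-1 expression Vt = Vt0 + γ(√(PHI + Vsb) − √PHI). The equation itself is an assumption here (it is the standard SPICE level-1 form); VTO = 0.61 V and PHI = 0.77 V are taken from the parameter table at the end of this chapter, and γ = 0.334 V^0.5 from the result of Ex vlsi2.1.

```python
import math

# Values from the SPICE parameter table / Ex vlsi2.1 result of this chapter.
VT0 = 0.61      # V, zero-bias threshold voltage (VTO)
GAMMA = 0.334   # V^0.5, bulk threshold parameter
PHI = 0.77      # V, surface inversion potential


def threshold_with_body_effect(vt0, gamma, phi, vsb):
    """Assumed level-1 body effect: Vt = Vt0 + gamma*(sqrt(phi+Vsb) - sqrt(phi))."""
    return vt0 + gamma * (math.sqrt(phi + vsb) - math.sqrt(phi))


if __name__ == "__main__":
    for vsb in (0.0, 1.0, 2.2):
        vt = threshold_with_body_effect(VT0, GAMMA, PHI, vsb)
        print(f"Vsb = {vsb:.1f} V -> Vt = {vt:.3f} V (shift {vt - VT0:.3f} V)")
```

With Vsb = 2.2 V the shift comes out close to the 0.282 V quoted as the result of Ex vlsi2.2.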
Basic DC equations
MOS transistors have 3 regions of operation:
cutoff region (subthreshold)
linear region (triode region)
saturated region (active region)
polysilicon gate SiO2 source diffusion W L
drain diffusion
Cutoff or subthreshold region: Vgs <= Vt, Ids = 0
There is still a small current, described under the second order effects (weak inversion). It is important to model for analog circuits.
[Figure: Ids vs. Vds]
Larger Vgs creates deeper channel which increases Ids.
Channel length is almost always min allowable.
Mobility: un > up.
Larger Vds increases drift current but also reduces vertical field component, which in turn makes channel less deep. Channel will pinch off when Vds = Vdsat.
Ids has its max value at Vds = Vdsat, but then channel is pinched off (see next slide).
Voltage at channel end remains essentially constant at Vdsat so drift current also remains constant: device is in saturation.
Electrons arriving from source are injected into drain depletion region and accelerated towards drain by field proportional to Vds - Vdsat usually reaching the drift velocity limit.
$I_{ds(sat)} = \frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,\frac{W}{2L}\,(V_{gs} - V_t)^2$
L' = L - dL
This looks just like a fet with a channel length of L' < L. Shorter L' implies greater Ids...
As Vds increases the effective channel length gets shorter so Ids(sat) increases. dL is proportional to sqrt(Vds - Vdsat) but most people approximate channel length modulation as a linear effect:
$I_{ds(sat)} = \frac{\mu\,\varepsilon_{ox}}{t_{ox}}\,\frac{W}{2L}\,(V_{gs} - V_t)^2\,(1 + \lambda V_{ds})$
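The three operating regions can be sketched as one function. This is an illustrative sketch, not the course's reference model: the linear-region expression is the standard level-1 form (assumed, since only the saturation equation is written out here), β = µ·(ε_ox/t_ox)·(W/L) = 200 µA/V² is the result of Ex vlsi2.3, and λ = 0.02 /V is a purely hypothetical value.

```python
def nfet_ids(vgs, vds, vt=0.61, beta=200e-6, lam=0.02):
    """Simple large-signal nfet model (level-1 style).

    beta = mu * (eps_ox / t_ox) * (W / L); lam is the channel length
    modulation factor (hypothetical value, applied in saturation only,
    so there is a small step at Vds = Vdsat).
    """
    if vgs <= vt:
        return 0.0                      # cutoff: no channel
    vdsat = vgs - vt
    if vds < vdsat:
        # linear (triode) region, standard form assumed
        return beta * (vdsat * vds - 0.5 * vds ** 2)
    # saturation with linearised channel length modulation
    return 0.5 * beta * vdsat ** 2 * (1.0 + lam * vds)


if __name__ == "__main__":
    for vgs in (0.0, 1.5, 3.3):
        row = [nfet_ids(vgs, vds) * 1e6 for vds in (0.1, 1.0, 3.3)]
        print(f"Vgs = {vgs:.1f} V: Ids [uA] at Vds = 0.1/1.0/3.3 V ->",
              ", ".join(f"{i:.1f}" for i in row))
```

Sweeping Vds for a few values of Vgs reproduces the shape of the Ids vs. Vds family of curves.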
Can you find the following in the plot?
Ids vs. Vds when Vgs = 0V
Ids vs. Vds when Vgs = 5V
value of Vt
value of Vdsat
evidence of body effect
evidence of channel length modulation
SPICE Models
There are different models used in circuit simulators:
level 1 is our simple model including the most important second order effects described
level 2 model is based on device physics
level 3 is a semi-empirical model allowing to match equations to the real circuit; example: BSIM model from Berkeley
models subthreshold characteristics
summary of the main SPICE DC parameters used in all three models at the end of this chapter
...
M1 4 3 5 0 nfet W=1u L=0.5u AS=1p AD=1p PS=3u PD=3u
...
.MODEL nfet NMOS
+TOX=1E-8
+CGB0=345p CGS0=138p CGD0=138p
+CJ=775u CJSW=344p MJ=0.35 MJSW=0.26 PB=0.75
+...
Cgd
channel-charge related capacitances (intrinsic):
cut-off: Cgb = Cox W L, Cgs = Cgd = 0
linear: Cgb = 0 (shielded by channel), Cgs = Cgd = 0.5 Cox W L (equally shared between S and D; note capacitive coupling of gate and drain/source)
saturation: Cgb = 0, Cgd = 0 (channel pinched off, channel shortened)
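[Figure labels: channel sidewall faces channel; bottom junction faces p-type substrate; sidewalls face p+ channel stop]
$C_{diff} = C_j A \left(1 - \frac{V_j}{V_b}\right)^{-M_j} + C_{jsw} P \left(1 - \frac{V_j}{V_b}\right)^{-M_{jsw}}$
where C_jsw is the zero-bias C/unit length of sidewall junction and P is the perimeter of diffusion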
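As a rough numerical illustration of the diffusion capacitance formula, the sketch below plugs in the nmos values from the SPICE parameter table at the end of this chapter (CJ, CJSW, MJ, MJSW, PB). Two assumptions are made: the junction voltage Vj is taken as negative for a reverse-biased junction, and the area and perimeter are the A = 1 µm², P = 3 µm used in the exercises.

```python
def diffusion_capacitance(area, perimeter, vj, cj, cjsw, mj, mjsw, pb):
    """C_diff = Cj*A*(1 - Vj/Vb)^-Mj + Cjsw*P*(1 - Vj/Vb)^-Mjsw.

    area in m^2, perimeter in m, vj in V (negative when reverse biased),
    cj in F/m^2, cjsw in F/m, pb is the built-in junction potential Vb.
    """
    bottom = cj * area * (1.0 - vj / pb) ** (-mj)
    sidewall = cjsw * perimeter * (1.0 - vj / pb) ** (-mjsw)
    return bottom + sidewall


if __name__ == "__main__":
    nmos = dict(cj=7.75e-4, cjsw=3.44e-10, mj=0.35, mjsw=0.26, pb=0.7556)
    for vj in (0.0, -1.0, -3.0):
        c = diffusion_capacitance(1e-12, 3e-6, vj, **nmos)
        print(f"Vj = {vj:+.1f} V -> Cdiff = {c * 1e15:.2f} fF")
```

The capacitance drops as the reverse bias grows, because the depletion region widens.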
P-channel MOSFETs
S G D
p+
p+ n p
threshold voltage is negative since we need to attract holes to form inversion layer
PFET is built inside its own "substrate": an n-type well or tub diffused into p-type bulk substrate. Don't forget well contacts!
Terminal with lower voltage is labelled D, the other is labelled S.
n-well always connected to Vdd to keep pn junction back-biased
[Figure: other pfet symbols with terminals G, S, D, B]
off: Vgs > Vt
lin: Vgs < Vt, Vds > Vgs-Vt
sat: Vgs < Vt, Vds < Vgs-Vt
n+ p B
n+
channel doped with donors to give negative threshold voltage, i.e., depletion fets are always on.
This mosfet is always conducting but, like ordinary enhancement fets, it will conduct more current as Vgs increases. One can build logic circuits with only n-channel devices (NMOS): enhancement fets for pulldowns and depletion fets as static pullups. Since NMOS logic dissipates DC power, it's been largely replaced by CMOS.
Coming Up...
Next topic Static characteristics of MOS inverters: input and output voltages, noise margins, power dissipation. Readings for next time Weste:
sections
models), 3 thru 3.2.2 (process technology) and 4.3 through 4.3.4 (capacitances)
CBT: Study the chip fabrication text of the university of Manchester at the MicroLab VLSI course web link.
Useful Constants
sym    value        units   description
ε0     8.8542E-12   F/m     permittivity of free space
εox    3.9 ε0       F/m     permittivity of SiO2
εSi    11.7 ε0      F/m     permittivity of silicon
VT     25.8         mV      kT/q (@300K)
q      1.6022E-19   C       charge of electron
k      1.381E-23    J/K     Boltzmann's constant
ni     1.45E10      cm-3    intrinsic carrier concentration
param   nmos       pmos       units     description
VTO     0.61       -0.61      V         threshold voltage
TOX     1E-8       1E-8       m         thin oxide thickness
NSUB    4E16       4E16       cm-3      substrate doping density
U0      290        72         cm2/Vs    charge mobility
KP      ____       ____       A/V2      fet gain factor
GAMMA   ____       ____       V0.5      bulk threshold param.
COX     ____       ____       F/m2      oxide capacitance
λ       ____       ____       V-1       channel length modulat.
λ/L     1e-8       2e-8       V-1m-1    channel length mod fact.
PB      0.7556     0.78469    V         built in junction potent.
PHI     0.77       0.77       V         surface inversion pot.
CGB0    3.45E-10   dito       F/m       overlapping cap per 2L
CGS0    1.38E-10   dito       F/m       overlapping cap per W
CGD0    1.38E-10   dito       F/m       overlapping cap per W
CJ      7.75E-4    8.15E-4    F/m2      zero-bias cap / unit A
CJSW    3.44E-10   3.54E-10   F/m       zero-bias cap per unit P
MJ      0.35       0.36                 grading coeff for bottom
MJSW    0.26       0.27                 grading coeff sidewall
Exercises: VLSI-2
Ex vlsi2.1 (difficulty: easy): Calculate the missing parameters on the previous transparency like intrinsic transconductance k, bulk threshold parameter γ and oxide capacitance Cox of an nfet (Alcatel 0.5µm process)
Result: kn=100µA/V2, kp=24.9µA/V2, γ=0.334V0.5, Cox=3.45E-7 F/cm2 (see Weste pp48ff)
Ex vlsi2.2 (difficulty: easy): Calculate the threshold voltage shift due to the body effect of an nfet at Vsb = 2.2V (Alcatel 0.5µm process)
Result: dVtn = 0.282V (see Weste pp55)
Ex vlsi2.3 (difficulty: easy): Calculate the transconductance βn of an nfet (Alcatel 0.5µm process), W=1µm, L=0.5µm
Result: βn=200µA/V2 (see Weste pp53)
Ex vlsi2.4 (difficulty: easy): Calculate the capacitances of an nfet with Vsb=Vdb=3V, W=1µm, L=0.5µm, A=1µm2, P=3µm (Alcatel 0.5µm process)
Result: Cgate=2.35fF, Cdrain=Csource=1.2fF (see Weste pp183-191)
Weste pp99: 2.10: Have a look at ex 8, 9
VLSI Design I
Static characteristics of MOS inverter
Static characteristics? Does that mean it's not going to move?
Overview Static transfer characteristic of CMOS gates Goal: You know the transfer characteristic of CMOS gates and know how to calculate noise margins
NFET Review
[Figure: nfet symbol (D, G, S) and Ids vs. Vds characteristics for increasing Vgs; labels: 0.7V, Vds >= 0, Vgs < Vt, Vgs - Vt]
PFET Review
[Figure: pfet symbol (D, G, S) and -Ids characteristics; labels: -0.7V, Vds <= 0, Vgs > Vt, Vgs - Vt]
Bipolar Logic
Isn't this a CMOS course?
Vin
one power supply => low impedance source for 2 levels
receivers have a simple job => only make one decision
no DC power if connections not made at same time
Boolean logic has been around a long time
Characterizing Inverters
What goals do we want to achieve with our inverter implementation (and, more generally, other functions)?
fast propagation delay (next lecture!)
low power dissipation
compact layout
noise immunity
Vout
Draw voltage-transfer curve (VTC) for inverter. Shade in areas that VTC can't enter. What can we say about gain? What is ideal inv. VTC?
Vin
Noise Margin
noise immunity. Since we're signalling values using voltages, we want good noise margins. This means that we need to make an allowance for noise when assigning voltage levels for valid inputs and outputs.
definition:
$NM_L = V_{IL\,max} - V_{OL\,max}$
$NM_H = V_{OH\,min} - V_{IH\,min}$
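A small sketch of the noise margin definition, NM_L = V_ILmax − V_OLmax and NM_H = V_OHmin − V_IHmin. The numbers used in the example are the results quoted for Ex vlsi3.2 (3.3 V CMOS inverter); the function name is only illustrative.

```python
def noise_margins(vil, vih, vol, voh):
    """Return (NM_L, NM_H) from the four characteristic voltages."""
    nm_low = vil - vol     # allowance for noise on a logic 0
    nm_high = voh - vih    # allowance for noise on a logic 1
    return nm_low, nm_high


if __name__ == "__main__":
    nml, nmh = noise_margins(vil=1.39, vih=1.91, vol=0.26, voh=3.04)
    print(f"NML = {nml:.2f} V, NMH = {nmh:.2f} V")
```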
[Figure: VTC with VOL, VIL, VIH and VOH marked on the Vin/Vout axes, showing the noise margins NML and NMH]
if pullup is off, VOL = ______
no DC connection when Vin < ______
increase width to increase Ipd
compact layout
[Figure: NMOS inverter VTC, Vout vs. Vin, showing the cut-off pulldown region, the saturated pulldown region, the lines Vin = Vout and Vin = Vout + Vt0, and Vt0 marked on the Vin axis]
Pseudo-NMOS using saturated PFET as load device. VOH = Vdd. Useful for building large fan-in NOR gates found in static ROMs and PLAs where static power dissipation is okay.
[Figure: CMOS inverter with input Vin on both gates (G) and output Vout]
negligible steady-state power dissipation
VOL = 0V, VOH = Vdd
VTC transition very sharp
switching point can be adjusted by fet sizing
Vds,pd = Vout
VTC is non-vertical only because of channel-length modulation
[Figure: CMOS inverter VTC, Vout vs. Vin, with the line Vin = Vout; region labels n=off, n=sat, n=lin, p=lin, p=sat, p=off; Vin axis marks Vt,n, Vdd+Vt,p and Vdd]
[Figure: load-line plots of Ids,pd and -Ids,pu vs. Vout for Vin = 0.5V, 1.5V and 2.5V, with the resulting Vout vs. Vin]
When both fets are saturated, small changes in Vin produce large changes in Vout
[Figure: load-line plots for Vin = 3.5V and 4.5V]
Coming Up...
Next topic Dynamic characteristics of MOS inverters: propagation delay, effects of rise and fall times. Transistor sizing, interconnect issues, estimating performance. Readings for next time Weste:
Sections
Exercises: VLSI-3
Ex vlsi3.1 (difficulty: easy): Calculate the CMOS inverter threshold values for the following configurations (Alcatel 0.5µm process, VDD=3.3V)
a) Wn = Ln, Wp = Lp
b) Wn = 10 Ln, Wp = Lp
c) Wn = Ln, Wp = 10 Lp
Result: a) Vinv = 1.30V, b) Vinv = 0.893V, c) Vinv = 1.88V (see Weste pp66)
Ex vlsi3.2 (difficulty: medium, time consuming): Calculate the noise margin and VIL, VIH, VOL, VOH for a CMOS inverter operating at 3.3V with βn = βp, Utn = -Utp = 0.61V.
Result: VIL = 1.39V, VIH = 1.91V, VOL = 0.26V, VOH = 3.04V, NML = 1.13V
Weste pp99: 2.10 ex 5 (difficulty: medium, time consuming): Design an input buffer that may be used to interface with a TTL driver (Vdd=3.3V, VOL=0.8V, VOH=2.0V). Show full derivations of DC conditions. Assume Wn = 1µm and Ln = Lp = 0.5µm
Result: Wp = 1.51µm
VLSI Design I
Dynamic characteristics of MOS inverters
Wow! 0 to 3.3 volts in 300ps!
Overview
gate delay modeling
power dissipation
Goal: You are familiar with CMOS gate delay models like Penfield-Rubenstein and wire models. You know the influence of body effect and large loads on gate delay. You know why ground bounce occurs. You know the different factors in power dissipation.
MicroLab, VLSI-4 (1/29)
JMM v1.3
VOH=Vdd, VOL=0, sharp transition => good noise margins
VOH=Vdd => pfet off when Vin=VOH => no static power
VOL=0 => nfet off when Vin=VOL => no static power
VTC describes static behaviour. When Vin changes, Vout lags behind because it takes time for capacitors to charge/discharge. So, in real life, Vin reaches Vth before Vout does.
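[Figure: Vin and Vout waveforms with fall time tf, rise time tr and delay td marked; the 10% level is shown and the reference level for td is labelled ???]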
Rise time, tr = time for a waveform to rise from 10% to 90% of its steady-state value
Fall time, tf = time for a waveform to fall from 90% to 10% of its steady-state value
Delay time, td = time between input transition (when Vin = ???) and output transition (when Vout = ???).
If ??? = Vinv, can delay be negative? Does Vinv differ for each gate? So does td(seq. of gates) = sum(td)? Should we choose 50% instead of Vinv?
Due to continuing miniaturization, gate delay times decrease while the output impedance of buffers increases, so signal delay becomes less dependent on gate delay and more dependent on interconnection delay.
[Figure: switch-level model of fet (Ugs, Uds, R, C) and switch-level model of inverter (Uin, Cin, Rp, Rn, Uout, UCC)]
[Figure: inverter VTC vs. Vin with region labels n=off, n=sat, n=lin, p=lin, p=sat, p=off, axis marks Vt,n, Vdd+Vt,p and Vdd, and an arrow labelled speed]
the switching speed is limited by the time taken to discharge the capacitance CL
the static transition curve moves to the right if the input transition is fast
p-fet gets cut-off during the whole falling output time
n-fet immediately gets saturated, later on linear
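[Figure: nfet pulldown Rn discharging CL; Idn is a function of Vout; Vout falls from 0.9Vdd through Vdd - Vt,n to 0.1Vdd]
$t_f = C_L \int_{0.1V_{dd}}^{0.9V_{dd}} \frac{dV_{out}}{I_{dn}}$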
Estimating delays
In most CMOS circuits, the delay of a single gate is dominated by the output rise and fall time. Thus:
$t_{dr} = \frac{t_r}{2}$, $t_{df} = \frac{t_f}{2}$
Having found a general form for approximate rise and fall times, one might estimate all delays using the same general form:
$t_{delay} = A_{delay}\,\frac{L}{W}\,C_L$
(width expressed as multiple of minimum width)
Where Adelay is a constant that depends on the power supply and transition voltages, the process and the minimum mosfet dimensions. This last dependency might strike one as odd, but usually all mosfets are built using the minimum allowable mosfet length for the process. Rather than solve the equations analytically, one can use Spice to determine the value of various useful constants: Ar, Af, Adr, Adf. These can be used in quick & dirty calculations for sizing transistors during the design process.
So we might expect slower input transitions to lead to longer output delay times. One rule of thumb (Weste, p. 216ff):
$t_{dr} = t_{dr,step} + \frac{t_{f,in}}{6}\,(1 + 2n)$
$t_{df} = t_{df,step} + \frac{t_{r,in}}{6}\,(1 - 2p)$
n is ~0.2 for Vtn = 0.61V, Vdd = 3.3V
When the input starts to rise, the output, which was high, starts to fall. Thus the voltage across CGD changes, requiring the input to supply more current to charge CGD and slowing the input transition. Since CGD is small, this is usually a small effect. When the inverter is biased into its linear region, CGD may appear multiplied by the gain of the inverter (Miller effect). This doesn't usually matter in digital circuits since the input passes rapidly through the linear region. Useful in analog circuits...
How should we model delays when we have multiple inputs? When A, B, C and D are logic 1:
treat series mosfets as resistances in series. Lump intermediate node capacitance with load capacitance:
$t_d = \sum_i R_i \cdot \sum_i C_i$
use Penfield-Rubenstein model which predicts
$t_d = \sum_i R_i C_i$
where Ri is the summed resistance from point i to ground and Ci is the capacitance at point i.
Penfield-Rubenstein Slope Model uses effective resistance simulated by Spice (from t_df):
Rn =
A B C D
If A goes from 0 to 1 while B, C and D are 1, then all the intermediate nodes in the pulldown chain have already been discharged and the top mosfet sees only a small body effect. If D goes from 0 to 1 while A, B and C are 1, then the intermediate nodes are all one Vt below Vdd and the upper mosfets see a larger body effect. Thus A is the faster input!
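To make the two delay estimates for a series pulldown chain concrete, here is a small sketch comparing the lumped estimate (ΣR)·(ΣC) with the Penfield-Rubenstein sum Σ R_i·C_i, where R_i is the summed resistance from node i to ground. The resistance and capacitance values are hypothetical, chosen only for illustration (1 kΩ per fet, 1 fF per intermediate node, 10 fF load).

```python
def lumped_delay(resistances, capacitances):
    """Pessimistic estimate: all capacitance discharged through all fets."""
    return sum(resistances) * sum(capacitances)


def penfield_rubenstein_delay(resistances, capacitances):
    """Sum over nodes of (resistance from node i to ground) * C_i.

    resistances[k] is the fet between node k-1 (ground for k = 0) and node k;
    capacitances[k] is the capacitance at node k (the last node is the output).
    """
    delay = 0.0
    r_to_ground = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_ground += r
        delay += r_to_ground * c
    return delay


if __name__ == "__main__":
    r_chain = [1e3, 1e3, 1e3, 1e3]          # ohms, four series nfets (hypothetical)
    c_nodes = [1e-15, 1e-15, 1e-15, 10e-15]  # farads, last entry is the load
    print(f"lumped:             {lumped_delay(r_chain, c_nodes) * 1e12:.1f} ps")
    print(f"Penfield-Rubenstein: {penfield_rubenstein_delay(r_chain, c_nodes) * 1e12:.1f} ps")
```

The intermediate nodes only see part of the chain resistance, so the Penfield-Rubenstein figure is never larger than the lumped one.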
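[Figure: chain of inverters of increasing size, from an input load of 1 CG up to the load CL = 1000 CG; stage labels 1, 40, 200]
The delay through each stage is a·td, where td is the average delay of a minimum-sized inverter driving another minimum-sized inverter. We want a^n = (CL/CG), so
$\text{Total delay} = n\,(a\,t_d) = \ln\!\left(\frac{C_L}{C_G}\right)\frac{a\,t_d}{\ln(a)}$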
in practice a=3...5
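A quick numerical look at the tapered buffer result, total delay = ln(CL/CG)·a·td/ln(a), for the CL = 1000 CG case. This sketch treats the number of stages n as a real number, which is an idealisation (in a real design n is an integer); td is normalised to 1.

```python
import math


def stages(load_ratio, a):
    """Number of stages n such that a**n = CL/CG (idealised, non-integer)."""
    return math.log(load_ratio) / math.log(a)


def total_delay(load_ratio, a, td=1.0):
    """Total delay n * (a * td) of a buffer chain with stage ratio a."""
    return stages(load_ratio, a) * a * td


if __name__ == "__main__":
    for a in (2.0, math.e, 3.0, 4.0, 5.0, 10.0):
        print(f"a = {a:5.2f}: n = {stages(1000, a):4.1f}, "
              f"delay = {total_delay(1000, a):5.1f} td")
```

The minimum sits at a = e, but the curve is flat enough that a = 3...5 costs little extra delay while saving stages.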
Power dissipation #1
the power consumption is low compared to other technologies
scaling down increases the power dissipation density with respect to chip area
power dissipation produces heat on the chip, which has to be carried off through the chip socket
power dissipation is one of the limiting factors in today's CMOS VLSI chips
low power applications are a speciality of EM (Neuenburg; watches, battery applications, etc)
Power dissipation #2
sources of power dissipation:
static power dissipation (quiescent current)
dynamic power dissipation
dc power dissipation: short circuit current (power to ground) due to switching
ac power dissipation: capacitor current (charging, recharging) due to switching
$I_0 = I_S\,(e^{qV/kT} - 1)$
$P_S = I_0\,V_{DD}$
Comparison of dynamic short circuit current vs. capacitive current. As expected, the short circuit current makes a less important contribution when the load gets large. Slower input transitions would increase short circuit current.
[Figure: three inverters (fets with W/L=4 and W/L=2) driven by Uin, with outputs Uout-A, Uout-B (50fF load) and Uout-C (200fF load); waveforms of Uin, Uout, Idsn and Idsp]
$P_d = \frac{C_L}{t_p}\int_0^{V_{DD}} V_{out}\,dV_{out} + \frac{C_L}{t_p}\int_0^{V_{DD}} (V_{DD} - V_{out})\,dV_{out}$
Aha! Now one can see why everybody changes from 5V to 3.3V and to 2.5V!
$P_d = \frac{C_L V_{DD}^2}{t_p} = C_L V_{DD}^2 f_p$
The above waveform shows the short circuit current:
$P_{sc} = \frac{\beta}{12}\,(V_{DD} - 2V_t)^3\,\frac{t_{rf}}{t_p}$
Total power dissipation is: Ptotal = Ps + Pd + Psc
dynamic power dissipation is dominant
use switching activity to estimate power dissipation:
$P_d = n_{switching}\,C_{total}\,V_{DD}^2\,f$
switching activity: nswitching = percentage of switching gates
there exist simulators estimating power dissipation using the switching activity
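The dynamic power estimate Pd = n_switching·C_total·VDD²·f is easy to try out. The sketch below uses the 100 pF / 50 MHz clock buffer of Ex vlsi4.2 as an example load and compares supply voltages; the function name and the 5 V and 2.5 V comparison points are illustrative choices.

```python
def dynamic_power(c_total, vdd, freq, activity=1.0):
    """Pd = n_switching * C_total * Vdd^2 * f (activity = fraction of switching gates)."""
    return activity * c_total * vdd ** 2 * freq


if __name__ == "__main__":
    for vdd in (5.0, 3.3, 2.5):
        p = dynamic_power(100e-12, vdd, 50e6)
        print(f"Vdd = {vdd:.1f} V -> Pd = {p * 1e3:.1f} mW")
```

Because of the quadratic dependence on VDD, going from 5 V to 3.3 V cuts the dynamic power to less than half.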
[Figure: measurement circuit with load CL and network Ry, Cy]
$g = \frac{V_{dd}\,C_y}{T}$
If RyCy >> T, then Vy(T) in volts will equal the average dynamic power in watts drawn from the power supply over one period.
metal migration
power supply noise
RC delay
limit current density
contact replication
general rule: J_AL of 0.4 ... 1 mA/µm
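The current-density rule can be turned into a quick sizing sketch. It uses the numbers of Ex vlsi4.2 (50 MHz clock buffer driving 100 pF, J_AL = 0.5 mA/µm, 1 mm supply wire, 72 mΩ/sq, 1 ns edges). Two assumptions are made: the average current is taken as C·Vdd·f for sizing, and the peak current as C·Vdd/t_edge for the IR drop.

```python
def supply_wire_width_um(c_load, vdd, freq, j_max_ma_per_um):
    """Minimum wire width (um) so the average current stays below J_AL."""
    i_avg_ma = c_load * vdd * freq * 1e3       # average current in mA (assumed C*V*f)
    return i_avg_ma / j_max_ma_per_um


def ir_drop(c_load, vdd, t_edge, r_sheet, length_um, width_um):
    """Voltage drop (ground bounce) along the supply wire at peak current."""
    i_peak = c_load * vdd / t_edge             # peak current in A (assumed C*V/t)
    resistance = r_sheet * length_um / width_um  # number of squares times ohm/sq
    return i_peak * resistance


if __name__ == "__main__":
    width = supply_wire_width_um(100e-12, 3.3, 50e6, 0.5)
    drop = ir_drop(100e-12, 3.3, 1e-9, 0.072, 1000.0, width)
    print(f"Wpower = {width:.0f} um, dV = {drop:.2f} V")
```

With these inputs the sketch reproduces the quoted results of 33 µm and 0.72 V.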
parallel-plate capacitance
Parallel-plate capacitance given in process files. Fringing capacitance is significant when t is comparable to h.
Fringing Capacitance
Figure 6.11 from CMOS Digital Integrated Circuits: Analysis and Design, by Kang and Leblebici:
For a long conductor where (t/h)=0.4, (w/h)=0.25, (w/l)=0, the total capacitance may be 10x the parallel plate capacitance.
Wire model?
Today, the longest wire on a VLSI chip might be 2cm, which has a time of flight of ~130ps assuming ε_SiO2 = 3.9 ε0. If the signal rise/fall time is longer than the time of flight we can model wires as a distributed RC network. Longer wires or shorter rise/fall times require the wire to be modelled as a transmission line. For short wires, a lumped RC model is sufficient. For longer wires, we use the distributed RC model where signal propagation can be shown to obey the diffusion equation:
$rc\,\frac{dV}{dt} = \frac{d^2 V}{dx^2}$
where r = R/unit length, c = C/unit length and x = distance from driver
Which means the prop time tx = k·x^2, with the signal edge becoming dispersed with increasing x.
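A short sketch of the two numbers behind this slide: the time of flight of a 2 cm wire in SiO2 (signal speed c0/√ε_r, with ε_r = 3.9) and the quadratic growth of the distributed RC propagation time with distance. The constant k is left normalised to 1 because its value depends on r and c, which are not given here.

```python
import math

C0 = 299_792_458.0   # m/s, speed of light in vacuum


def time_of_flight(length_m, eps_r):
    """Time for a wave to travel length_m in a dielectric with relative permittivity eps_r."""
    return length_m * math.sqrt(eps_r) / C0


def rc_prop_time(x, k=1.0):
    """Distributed RC propagation time t_x = k * x^2 (k normalised, hypothetical)."""
    return k * x ** 2


if __name__ == "__main__":
    tof = time_of_flight(0.02, 3.9)
    print(f"time of flight for 2 cm: {tof * 1e12:.0f} ps")
    print(f"doubling the wire length multiplies RC delay by "
          f"{rc_prop_time(2.0) / rc_prop_time(1.0):.0f}")
```

The first result lands at roughly 130 ps; the second shows why halving a long RC wire (for example by driving it from the middle) pays off so well.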
a) t = ? Fix: drive clock from central location to decrease l and widen clock wire to 20: r = 0.0025 ohm/square c = 50pf/10mm l = 10mm c) t = ?
whew!
b) t = ?
Inductance
Bond-wire inductance can cause deleterious effects in large, high speed I/O buffers
package inductance: 3 .. 15 nH
$dV = L\,\frac{dI}{dt}$
[Figure: inductance L in the Vdd supply path carrying the current i(t)]
design techniques:
separate power pins for I/O pads and chip core
multiple power and ground pins
careful selection of the position of the power and ground pins on the package
adding decoupling capacitances on the board
increase the rise and fall times
use advanced package technologies (SMD, etc)
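A back-of-the-envelope sketch of the inductive supply spike dV = L·dI/dt, using the numbers of Ex vlsi4.1 (8 output buffers, 50 pF each, switched in 4 ns at 3.3 V, 15 nH total bonding inductance). The current model is a simplifying assumption: each buffer draws I = C·Vdd/t, and that current is assumed to ramp up within the same time t.

```python
def inductive_spike(n_buffers, c_load, vdd, t_switch, inductance):
    """Estimate dV = L * dI/dt for n buffers switching simultaneously."""
    i_total = n_buffers * c_load * vdd / t_switch   # total charging current (A)
    di_dt = i_total / t_switch                      # assumed linear current ramp (A/s)
    return inductance * di_dt


if __name__ == "__main__":
    dv = inductive_spike(8, 50e-12, 3.3, 4e-9, 15e-9)
    print(f"supply spike: {dv:.2f} V")
    # Slower edges help quadratically in this simple model:
    print(f"with 8 ns edges: {inductive_spike(8, 50e-12, 3.3, 8e-9, 15e-9):.2f} V")
```

The first value matches the 1.24 V quoted for the exercise, and the second illustrates why "increase the rise and fall times" is on the list of design techniques.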
Coming Up...
Next topic Combinational logic: series/parallel switch networks, transmission gates. Performance optimisation. Readings for next time Weste:
4.4
(inductance) 4.3.6, and 4.5 thru 4.5.1, and 4.5.4 thru 4.5.5 except 4.5.4.4, and 4.6.3 (delay modelling) 4.7 (power consumption) 4.8 (sizing routing conductors)
You should read the rest of chapter 4 when you get the chance ...
Exercises: VLSI-4
Ex vlsi4.1 (difficulty: easy): Calculate the inductive spike at the power supply provoked by 8 output buffers, each driving 50 pF in 4 ns, Vdd = 3.3 V, total bonding inductance 15 nH. Result: dVtot = 1.24 V (see Weste pp 205)
Ex vlsi4.2 (difficulty: easy): a) Calculate the power supply width Wpower necessary for feeding a clock buffer running at 50 MHz driving 100 pF. b) What is the ground bounce with the chosen conductor? (JAL = 0.5 mA/µm, power supply distance l = 1 mm, Vdd = 3.3 V, Rmetal1 = 72 mΩ/sq, tr = tf = 1 ns) Result: a) Wpower = 33 µm, b) dV = 0.72 V (see Weste pp 239)
Ex vlsi4.3 (difficulty: easy): Calculate the clock distribution delay for the example on transparency 25. Result: a) td = 55 ns, b) td = 27.5 ns, c) td = 1.38 ns (see Weste pp 200)
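The stated results of Ex vlsi4.1 and Ex vlsi4.2 can be reproduced with the following sketch. The current model is an assumption chosen because it matches the quoted numbers: each buffer draws I = C·Vdd/t and the current slope is taken as I/t. The unit choices (mA/µm for the current density, mΩ per square for the sheet resistance) are likewise inferred from the results, and the function names are hypothetical.

```python
def inductive_spike(n_buffers, c_load, vdd, t_switch, l_bond):
    """Supply spike L*dI/dt with I = C*Vdd/t per buffer and dI/dt = I/t (assumed model)."""
    i_per_buffer = c_load * vdd / t_switch          # A
    di_dt = n_buffers * i_per_buffer / t_switch     # A/s
    return l_bond * di_dt                           # V


def power_wire(c_load, vdd, f_clk, j_max_ma_per_um, length_um, r_sheet, t_edge):
    """Return (wire width in um, ground bounce in V) for a clock buffer supply wire."""
    i_avg = c_load * vdd * f_clk                    # average supply current, A
    width_um = (i_avg * 1e3) / j_max_ma_per_um      # width set by the current density limit
    r_wire = r_sheet * length_um / width_um         # number of squares times ohm/square
    i_peak = c_load * vdd / t_edge                  # current during one clock edge, A
    return width_um, i_peak * r_wire


spike = inductive_spike(8, 50e-12, 3.3, 4e-9, 15e-9)
width_um, bounce = power_wire(100e-12, 3.3, 50e6, 0.5, 1000.0, 0.072, 1e-9)
print(f"dVtot = {spike:.2f} V, Wpower = {width_um:.0f} um, bounce = {bounce:.2f} V")
```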
Exercises: VLSI-4
Ex vlsi4.4 (difficulty: easy): Calculate Ar and Af for a CMOS inverter (Vdd=3.3V, Alcatel 0.5 µm process) Result: Ar =43.9 k, Af =10.9 k (see Weste pp208ff and transparency 7)
Weste pp370: 5.9 ex 14 (difficulty: easy): A low power 3.3V chip has a clock of 12MHz. In the power-down mode, the clock driver drives 5mm of a 2 µm wide metal1 wire. If the area capacitance of metal is Ca=2.37pF/m2 and the sidewall capacitance is Cf0= 2.37pF/m, what is the power-down dissipation, assuming this is the dominant term? What is the dissipation if the wire is reduced to 50 µm length? Result: Pd = 85 µW, 0.85 µW (see Weste pp 235)
VLSI Design I
CMOS Combinational Logic
Overview: Euler rules for complex CMOS gates; layout and stick diagram.
Goal: You know how to design compact layouts of complex CMOS logic gates with the Euler rules. You are familiar with transmission gates and their limitations.
MicroLab, VLSI-5 (1/34)
JMM v1.4
...
A1 An ...
...
- since we want VOH = Vdd, better use only pfets in the pullup path
- similarly, since we want VOL = 0, better use only nfets in the pulldown path
- looking at the pulldown path: since nfets are on when VGS > VTH, the output will be pulled low when the right combination of inputs is high
- CMOS gates are naturally inverting
Complementary logic
Now you know what the C in CMOS stands for!
We want complementary pullup and pulldown logic, i.e., the pulldown should be on when the pullup is off and vice versa.

pullup   pulldown   F(A1,...,An)
on       off        driven 1
off      on         driven 0
on       on         driven X
off      off        no connection
Since there's plenty of capacitance on the output node, when the output becomes disconnected it remembers its previous voltage -- at least for a while. The memory is the load capacitor's charge. Leakage currents will cause eventual decay of the charge (that's why DRAMs need to be refreshed!). "No connection" is also useful for constructing tristate drivers! In this case, we call this state Z, which is short for high-Z, which is short for high impedance, which is how engineers say "no connection". Isn't jargon wonderful?
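The four pullup/pulldown combinations can be written down as a small switch-level sketch. The function names are hypothetical; the NAND example assumes the usual complementary arrangement (series nfets in the pulldown, parallel pfets in the pullup).

```python
def cmos_output(pullup_on, pulldown_on):
    """Output state of a gate given which networks conduct."""
    if pullup_on and not pulldown_on:
        return "1"      # driven high
    if pulldown_on and not pullup_on:
        return "0"      # driven low
    if pullup_on and pulldown_on:
        return "X"      # both networks fight
    return "Z"          # no connection (high impedance)


def nand2(a, b):
    pulldown = bool(a and b)             # series nfets conduct when A and B are high
    pullup = bool((not a) or (not b))    # parallel pfets conduct when A or B is low
    return cmos_output(pullup, pulldown)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, nand2(a, b))
```

A complementary gate never reaches the X or Z rows; they only appear when the two networks are not exact complements, as in tristate drivers.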
CMOS complements
What a nice VOH you have...
pulldown nfet block
conducts when VGS is high A B conducts when A is high and B is high: A.B
F = A*B
A 0 1 1 1 1 0
F=A*B
A B
F=A+B
F=A*B
F A B
A B
A1
An
Pseudo-NMOS NOR gates are used to build high fan-in NOR gates for PLAs to save area (at some cost in static power).
Wp
Lp IN OUT
Wn Ln GND
metal2 metal poly n+ diff
p+ diff
Layout Rules #1
- layout rules are the common language between design and process engineers
- conservative rules absorb process disturbances and variations
- layout rules must be respected by the designer
- layout rules reflect the limits of a process; they describe:
  - minimal distance, overlap
  - minimal width (e.g. channel length, ...)
Layout Rules #2
symbol and mask layout of a CMOS inverter
n-well contact (n-diff)
Stick Diagram
- stick diagrams are technology independent
- no layout rules need to be known
- mask layout may be generated automatically
A B
&
&
&
B C
B C A
C A
D B
C A A
D B C D
A B C D
&
&
start F A start
VDD
N3
N1
N2
D VSS
A
A Quiz!
/1
A Quiz!
/2
Find the minimal transistor circuit (2 * 4 fets) and the most compact layout using Euler's rule.
        AB
CD    00  01  11  10
00     1   0   0   1
01     1   0   0   0
11     1   0   0   0
10     1   0   0   0
Quiz : Solution
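$$ F = \overline{A}\,\overline{B} + \overline{B}\,\overline{C}\,\overline{D} = \overline{B}\,(\overline{A} + \overline{C}\,\overline{D}) $$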
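The quiz solution can be cross-checked against the Karnaugh map by brute force. This sketch assumes the map has CD on the rows and AB on the columns (both in Gray order) and that every literal in the solution is complemented, which is the only reading under which map and solution agree.

```python
from itertools import product

GRAY = ["00", "01", "11", "10"]

# Karnaugh map from the quiz: key = CD (row), list index = AB column in Gray order.
KMAP = {
    "00": [1, 0, 0, 1],
    "01": [1, 0, 0, 0],
    "11": [1, 0, 0, 0],
    "10": [1, 0, 0, 0],
}


def f_kmap(a, b, c, d):
    return KMAP[f"{c}{d}"][GRAY.index(f"{a}{b}")]


def f_solution(a, b, c, d):
    # F = not B and (not A or (not C and not D))
    return int((not b) and ((not a) or ((not c) and (not d))))


mismatches = [v for v in product((0, 1), repeat=4) if f_kmap(*v) != f_solution(*v)]
print("mismatches:", mismatches)
```

Written as F = not(B + A·(C + D)), the function needs four nfets in the pulldown and four pfets in the pullup, which matches the 2 * 4 fets asked for.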
C
VSS
P1 D
N1 A P2 start
B F
Transmission Gates
CMOS
A S B
nMOS
A S B
If VA = VDD then current will flow from A to B until VB = _____
If VA = 0 then current will flow from B to A until VB = _____
Assuming S and -S are complementary signals, the CMOS transmission gate (TG) acts as a switch, controlled by S, that has no inherent voltage drop (unlike a switch constructed from a single nfet or pfet, which exhibits a VT drop at one rail or the other).
S=0
S= VDD
switch is off
switch is on
|VT,p|
Req,p Req,n
VDD-VT,n
VDD
Req,TG
TG Circuits: MUX
A Y=A*S+B*S B S Is this node always the output of this gate?
TG Circuits: 4 to 1 MUX
- multiplexers can easily be done with TGs
- never forget that TGs are bi-directional
- compact layout by combining identical gates
A B F C D S1 S2
12 transistors
A
8 transistors
A B
6 transistors
TG Quiz
Find the function of the following 4 transistor circuit:
F B
TG Circuits: Problems
- difficult to get compact layout
- outputs behave like bi-directional signals
- many TGs in series provoke large delays
Uin
Uout
Uin
R C
R C
R C
R C
R C
Uout
= 2.2 (RC )2
Coming Up...
Next topic: Dynamic (precharge/evaluate) logic circuits: CMOS domino logic, NP domino logic, CVSL logic. Charge sharing.
Readings for next time, Weste:
Sections
Exercises: VLSI-5 #1
Ex vlsi5.1 (difficulty: easy): Design a CMOS gate that implements the function
Out = (( A + B) C + D E ) F
Ex vlsi5.2 (difficulty: easy): What is the Boolean equation of the following CMOS gate.
VDD
A B
GND
Exercises: VLSI-5 #2
Weste pp371: 5.9ex7 (difficulty: easy): Design a pass transistor network that implements the sum function for an adder
$$ S = \overline{A}\,\overline{B}\,C + \overline{A}\,B\,\overline{C} + A\,\overline{B}\,\overline{C} + A\,B\,C $$
VLSI Design I
Dynamic Logic Gates
Overview: Dynamic logic gates, Domino, NORA, CVSL structure.
Goal: You are familiar with dynamic logic gates and their different families. You can handle dynamic logic problems like charge sharing and timing.
MicroLab, VLSI-6 (1/28)
JMM v1.3
We can replace the pfet pullup network with a pseudo-NMOS load (pfet with grounded gate) but
- dissipate static power when output is low
- have to make load fet small to ensure that VOL is low enough to cut off nfets in next stage
  - reduces static power consumption (good!)
  - increases output rise time (bad!)
B evaluate switch
- inputs must be stable before CLK goes high because once the output has been discharged it won't go high again until the next cycle
- for the same reason, noise/glitches on inputs cannot exceed the nfet threshold, a much more stringent requirement than for static CMOS gates.
Precharge phase clock output Evaluate phase
CLK
precharge
evaluate
CLK
Solution: develop techniques that avoid races
- CMOS Domino logic
- CMOS NORA (no race) logic
nfets
nfets
CLK
precharge: low; evaluate: rises (maybe)
When CLK is low, the dynamic node is precharged high and the buffer inverter output is low. Nfets in the next logic block will be off. When CLK goes high, the dynamic node is conditionally discharged and the buffer output will conditionally go high. Since discharge can only happen once, the buffer output can only make one low-to-high transition. When domino gates are cascaded, as each gate evaluates, if its output rises, it will trigger the evaluation of the next stage, and so on, like a line of dominos falling. Like dominos, once the internal node in a gate falls, it stays fallen until it is picked up by the precharge phase of the next cycle. Thus many gates may evaluate in one eval cycle.
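The domino behaviour described above (outputs start low, each output may rise at most once per evaluate phase, rising outputs trigger later stages) can be illustrated with a small behavioural sketch. The gate names, the three-gate example chain and the function `domino_evaluate` are hypothetical; electrical effects such as charge sharing are not modelled.

```python
def domino_evaluate(gates, inputs):
    """Simulate one evaluate phase.

    gates maps an output name to (pulldown_function, list_of_input_names).
    Returns the final node values and the order in which gates fired.
    """
    values = dict(inputs)
    for name in gates:
        values[name] = 0                  # precharge: all buffer outputs are low
    fired = []
    changed = True
    while changed:                        # keep sweeping until nothing else falls
        changed = False
        for name, (pulldown, ins) in gates.items():
            if values[name] == 0 and pulldown(*(values[i] for i in ins)):
                values[name] = 1          # dynamic node discharged, output rises once
                fired.append(name)
                changed = True
    return values, fired


gates = {
    "g1": (lambda a, b: a and b, ["a", "b"]),
    "g2": (lambda x, c: x and c, ["g1", "c"]),
    "g3": (lambda x, d: x or d, ["g2", "d"]),
}

values, fired = domino_evaluate(gates, {"a": 1, "b": 1, "c": 1, "d": 0})
print(fired)
```

Since only non-inverting functions are possible and outputs never fall during evaluation, the loop always terminates after at most one firing per gate.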
latching pfet acts like keeper above unless dynamic node gets pulled down during evaluate phase. When buffer output goes high it switches keeper off saving static power. Good for leakage current problems... Note that you can put an even number of static gates after the inverter and before the next domino gate.
CLK nfets
CLK
Use a NOR gate instead of an inverter as the buffer to make a faster high fan-in AND gate. The same trick works for high fan-in OR or MUX functions.
nfets
nfets
CLK
precharge: low
Since domino gate outputs are low during the precharge phase, gates which have only domino output nodes as inputs don't need the evaluate nfet, since all the nfets in the pulldown will be off anyway. But remember: if the evaluate nfet is removed, precharge will ripple through cascaded gates just like evaluates do. Maybe only remove it for gates where the nfet stack is tall (i.e. resistive) enough that the pullup will start to win anyway before the ripple reaches the gates and turns off the pulldowns.
large nfets
small
CLK
Some designers also grade the sizes of the nfets, smallest at the top (increase in R offset by decrease in C)
If we make the nfet in the output inverter much smaller than the pfet then
the load on the internal node decreases, and the switching threshold of the inverter increases
Both effects make the gate evaluate sooner. If large >> small, the gate delay can be cut almost in half! However, the other edge is very slow, so ripple precharge is a problem.
3C 1.5C 1.5C
C C C
Suppose the dynamic node has been discharged during the previous evaluate cycle. Then during precharge, all the intermediate nodes in the pulldown chain will remain discharged while the dynamic node is precharged. Calculate the voltage on the dynamic node when CLK goes high. When CLK goes high, the voltage on the dynamic node goes to
$$ V = \frac{3C}{3C + 6C}\,V_{DD} = 1.1\,\text{V} \quad \text{for } V_{DD} = 3.3\,\text{V} $$
which is low enough to switch the output inverter.
Fortunately this situation is easily detected by CAD tools and can be resolved by (1) adding additional precharge devices to intermediate nodes or (2) increasing the size of the output buffer, which will increase the capacitance of the dynamic node (a faster output buffer may compensate for the larger internal capacitance).
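The charge-sharing voltage follows from charge conservation between the precharged dynamic node and the discharged intermediate nodes. A minimal sketch, assuming the node capacitances from the figure labels (3C on the dynamic node; 1.5C, 1.5C, C, C, C on the intermediate nodes) and a hypothetical function name:

```python
def charge_share_voltage(c_dynamic, c_internal, vdd):
    """Voltage after the dynamic node shares its charge with discharged internal nodes."""
    return vdd * c_dynamic / (c_dynamic + sum(c_internal))


# Capacitances in units of C; only the ratios matter.
v = charge_share_voltage(3.0, [1.5, 1.5, 1.0, 1.0, 1.0], 3.3)
print(f"dynamic node settles at {v:.2f} V")

# Fix (2) from the slide: a larger output buffer raises the dynamic node capacitance.
v_bigger = charge_share_voltage(12.0, [1.5, 1.5, 1.0, 1.0, 1.0], 3.3)
print(f"with 4x dynamic node capacitance: {v_bigger:.2f} V")
```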
n-logic
n-logic
n-logic
n-logic
CLK
Capacitive Coupling
OUT
CLK
OUT t
Coupling can also occur between other signal wires and long dynamic nodes (e.g., ones that span multiple bits in a datapath). Solutions: on long routes add twists to avoid continuous routes, or route dynamic signals between mutually exclusive or complementary signals.
Domino AND-OR
Y
8/2
A B C CLK D
E F
CLK
A A B B
CLK
CLK P4 P3 P2 P1 C0
Domino version of the Manchester carry chain
G4
C4 C3 C2 C1
G3
G2
G1
A B A B
CLK
The cross-coupled pfets serve as keepers for the output which is high, making the gate static rather than dynamic! During precharge both keepers are off; during the evaluate phase, the output that goes low switches on the keeper for the output that is staying high. Really solves capacitive coupling problems with dynamic logic in datapaths.
clock
clock The static version might be quite slow due to the nfet pfet fight during switching Q Q d e a b c d b c e a dynamic CVSL
pre
eval
pre
nfets
pfets
nfets
CLK
eval
CLK
pre
CLK
eval
If we turn a dynamic gate upside down and use pfets to build the logic block, we get a logic gate that precharges low and discharges high. By using these gates in an alternating sequence with regular nfet dynamic gates we can eliminate the race problem we had with nfet-only dynamic gate sequences, and hence we don't need the buffer inverter present in domino gates. Removing the buffer is a mixed blessing since we may need it for drive reasons and to keep compatibility with other domino gates. It also makes NORA logic very susceptible to noise since during the evaluate phase all information is stored dynamically.
Actively evaluating
The 9 o'clock state is very interesting: once a domino gate has finished evaluating, the gate's immediate predecessors can start to precharge (forcing the gate's inputs low) without affecting the value of the gate's output. The gate is acting as a latch so long as its predecessors don't start another evaluate cycle. Perhaps we can build a pipeline of domino stages where each stage serves as both logic and latch depending on where it is in its cycle. Need to have each stage supply its own precharge/evaluate timing dependent on what its neighbours are doing...
[Figure: three pipeline stages F1, F2, F3, each with a P/E (precharge/evaluate) control input and a "done?" detector]
a stage only precharges when both
(a) its successor has finished evaluating (it's done with our values)
(b) its predecessor has finished precharging (old values are gone so we can't use 'em twice!)
a stage only evaluates when both
(a) its successor has finished precharging (our new output won't affect its stored value)
(b) its predecessor has finished evaluating
So, what logic goes in the clouds? And how do we build the "done?" boxes?
Muller C-Element
Add a weak feedback inverter if we're worried about dynamic storage for the precharge/eval signal
P/E
Pdone Sdone
The Muller C-Element is the AND gate for self-timed logic because it changes its output only after both inputs have changed. As shown above, it's an elegant implementation for both sets of rules on the previous slide.
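A behavioural sketch of the C-element may help: the output follows the inputs when they agree and holds its previous value when they differ. The class name, the convention that 1 means evaluate, and the wiring P/E = C(Pdone, not Sdone) are assumptions made for illustration; the slide only names the signals P/E, Pdone and Sdone.

```python
class CElement:
    """Muller C-element: output changes only when both inputs agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:
            self.out = a      # both inputs agree: follow them
        return self.out       # inputs differ: hold the stored value


def stage_control(c, pdone, sdone):
    """Assumed wiring: evaluate (1) when predecessor is done and successor has precharged."""
    return c.update(pdone, 1 - sdone)


c = CElement(initial=0)       # start in precharge
trace = [
    stage_control(c, pdone=1, sdone=0),   # predecessor done, successor precharged: evaluate
    stage_control(c, pdone=1, sdone=1),   # successor finished evaluating: hold
    stage_control(c, pdone=0, sdone=1),   # predecessor precharged as well: precharge
    stage_control(c, pdone=0, sdone=0),   # successor precharged: hold
]
print(trace)
```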
Completion Detectors
use dual-rail signalling (i.e., two wires) to encode

reset (not yet evaluated)   00
ready with value 0          01
ready with value 1          10

and then build handshake logic that starts the next stage when the current stage is done and the next stage has completed its previous computation and delivered its values...
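The dual-rail code above makes completion detection simple: a signal is ready exactly when one of its two rails is high. The helper names below are hypothetical, and treating a stage as done when all of its output pairs are ready is an assumed, common convention.

```python
RESET = (0, 0)   # not yet evaluated


def encode(bit):
    """Dual-rail pair (rail1, rail0): value 0 is 01, value 1 is 10."""
    return (1, 0) if bit else (0, 1)


def is_ready(pair):
    """Ready when exactly one rail is high; 00 is reset and 11 never occurs in a valid design."""
    return pair[0] != pair[1]


def decode(pair):
    if not is_ready(pair):
        raise ValueError("signal has not evaluated yet")
    return pair[0]


def stage_done(outputs):
    """Completion detector: every dual-rail output of the stage is ready."""
    return all(is_ready(p) for p in outputs)


print(stage_done([encode(1), RESET]))        # one output still precharged
print(stage_done([encode(1), encode(0)]))    # all outputs valid
```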
Self-timed logic
[Figure: the three-stage pipeline F1, F2, F3 with a C-element driving the P/E input of each stage from the neighbouring "done?" signals]
In the forward direction by how long it takes for the evaluate edge in one stage to trigger the evaluate edge in the next stage: LF = tF + tD + tC
In the reverse direction by how long it takes for the precharge in one stage to trigger a new evaluate in the stage after first evaluating the previous stage (remember not to double count!): LR = 0.5*(tC + tF + tD + tC + tF + tD)
Further Improvements
We don't have to delay evaluation until the successor has finished its precharge (signalling that it's finished with our values). We can just check that the successor has started precharging. Even with this improvement, the correct sequencing will still happen for any combination of precharge and evaluate times for all the gates. We can modify the control element like so:
S P/E
Eliminate the extra inverter for good measure and use dynamic storage as control element memory
P/E
Pdone Sdone
We're going to stop here, but there are other improvements that can be made. Hint: do we have to wait until the predecessor is done computing new values before starting our eval? etc., etc., etc.
This makes dynamic logic a good choice for those parts of a circuit where the extra engineering investment is justified, e.g., along the critical timing paths.
Engineers who like this sort of design will find this the sort of design they like!
Coming Up...
Next topic: CMOS sequential logic.
Readings for next time, Weste:
- 5.4.4 (dynamic CMOS logic)
- 5.4.7 - 5.4.11 (CMOS domino logic, CVSL), except 5.4.10
Exercises: VLSI-6
Weste pp371: 5.9ex8 (difficulty: easy): Design a CVSL gate for the following function:
$$ S = \overline{A}\,\overline{B}\,C + \overline{A}\,B\,\overline{C} + A\,\overline{B}\,\overline{C} + A\,B\,C $$
VLSI Design I
Clocking Strategies
I take care of it ?
Clock Generator
Overview: microelectronic technologies, ASIC, FPGA, uC.
Goal: You are familiar with the microelectronic technologies, and know their advantages and features.
MicroLab, VLSI-8 (1/20)
JMM v1.4
Microelectronic Technologies
What is microelectronics? Does a microelectronics design engineer only need good knowledge of silicon, layout, etc.?
application specific integrated circuit full custom macro cell standard cell gate array microprocessors PIC, COP FPGA RISC uController signal processor PAL CPLD field programmable logic
features
- size: 100 - 1M gates
- short turn around time
- cheap at medium quantities
- unsuitable for regular structures like RAM, PLA, ALU
prefabricated wafers
- I/O stages predefined
- regular array of fets, no reserved interconnection channels
- interconnection defines functionality
features
- size: 100 - 1M gates
- short turn around time
- cheap at medium quantities
- regular structures like RAM, PLA, ALU can be used
SOG Example
INV NOR2
nwell contacts GND 3 nfets 2 small, 1 large mosfets with common gate 3 pfets
VDD unused horizontal and vertical tracks used for wiring gates together. Better granularity if main routing channels run vertically. GND
features
- chip size limits complexity
- long turn around time
- cheap at high quantities
- standardized cell height
- unsuitable for regular structures
- more flexible and compact (1:4) than gate array
It's just like designing with board-level components. CAD tools help with placing the cells to minimize area and to meet timing constraints (perhaps directed by a floorplan created by the user); routers make the appropriate connections between the cells.
features
- chip size limits complexity
- long design and fabrication time
- efficient use of silicon area
- cheap only at highest quantities (e.g. uP, memories, ...)
Macrocell Technology #1
complete fabrication process
- combines semi- and full custom technologies
- predefined library of base functions
- generators for regular structures
features
- chip size limits complexity
- short design, long fabrication time
- cheap at high quantities
- high flexibility, compact layouts
macro cell
PLA RAM
Macrocell Technology #2
2-dim array of full custom block standard cell block
FPGA Technology #1
field programmable device
- no fabrication needed for customizing
- predefined logic blocks
- unsuitable for regular structures
features
- size: up to 2000000 logic gates (see Virtex from Xilinx)
- large silicon area necessary (72 million fets, 10x Pentium2)
- short design and customize time
- cheap for small quantities
- compared to ASICs, FPGAs have a reduced clock speed
- circuit configuration downloadable (RAM or PROM)
FPGA Technology #2
I/O buffers configurable logic block (CLB) switching matrix I/O buffers I/O buffers routing channels I/O buffers
configuration
- mask programmable
- one time programmable
- downloading of configuration from host into internal RAM
- downloading of configuration from on board serial ROM
C1. . .C4
G3 F G H
G2
G1
G H 1
EC RD Y
F4 Bypass SD D Q XQ
F3
F2
F1
FPGA Technology #3
K (C lock) 1
EC RD X H F
FPGA Technology #4
CLB
CLB
CLB
PSM
PSM
CLB
CLB
CLB
PSM
PSM
CLB
CLB
CLB
uC Technology
field programmable device
- no fabrication needed for customizing
- simple C software compilers
- software vs. hardware solutions
features
- 4 or 8 bit CPU, size: 512 bytes or more
- down to 8 pins
- AD, usart, timer, etc. included
- very slow compared to hardware solutions
- cheap (<$2)
PIC 36 mm
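[Figure: cost versus quantity (units) for FPGA and ASIC. The FPGA curve starts at the design cost, the ASIC curve at design cost plus NRE; the curves cross at the break-even number of units.]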
Coming Up...
Next topic: Hardware description language VHDL, top-down design.
Readings for next time: Xilinx article: The total cost of ownership
Exercises: VLSI-8 #1
Ex vlsi08.1 (difficulty: easy): Calculate the breakeven point between an FPGA and an ASIC design. Assume a design time of 6 months and an additional back-end design time of 1 month for the ASIC. The NRE costs of the ASIC are 75 kEuro; the cost per unit is 150 Euro for the FPGA and 3 Euro for the ASIC. The cost of 1 engineer per month is 10 kEuro. Result: breakeven at 578
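The break-even quantity is where the two linear cost curves cross. The sketch below assumes a single engineer for the whole design time, which reproduces the stated result; the function and variable names are illustrative.

```python
def break_even_units(fixed_a, unit_a, fixed_b, unit_b):
    """Quantity at which fixed_a + n*unit_a equals fixed_b + n*unit_b."""
    return (fixed_b - fixed_a) / (unit_a - unit_b)


ENGINEER_MONTH = 10_000                           # Euro per engineer-month

fpga_fixed = 6 * ENGINEER_MONTH                   # 6 months of design
asic_fixed = (6 + 1) * ENGINEER_MONTH + 75_000    # design + back-end + NRE

n = break_even_units(fpga_fixed, 150, asic_fixed, 3)
print(f"break-even at about {n:.1f} units")
```

Below this quantity the FPGA is cheaper; above it the low ASIC unit price pays back the NRE and the extra back-end month.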
Exercises: VLSI-8 #2
Ex vlsi08.2 (difficulty: medium): Calculate the breakeven point between an FPGA and an ASIC design. Assume the design costs from exercise vlsi08.1 and a fabrication time of 3 months for the ASIC. The revenue per sold system at a product lifetime of 4 years is 600 Euro without taking into account the FPGA/ASIC chip costs. Use the triangular time-to-market model from Synopsys (see Xilinx article The total cost of ownership). Result: breakeven at 14068 FPGA solutions
units/time maximum available revenue
VLSI Design I
Regular Logic Structures
But we still have to draw the schematic! So look for systematic logical structures:
- may lead to additional systematic physical layouts
- find canonical logic representations that can be automatically turned into compact physical structures (automate, automate, automate)
- would like to be able to make changes in the logic without having to redo entire layout -- look for ECO-tolerant structures (engineering change orders)
muxes, ROMs, PLAs
But how do we minimize the number of literals or minterms? Yeah, we know about Karnaugh maps, but they aren't so good for more than 4 inputs or for maximizing minterm sharing.
Logic Manipulation
Start with two-level minimization
- by inspection, searching for terms that are logically adjacent: $p\,x + p\,\overline{x} = p\,(x + \overline{x}) = p \cdot 1 = p$
- Karnaugh maps for simple situations
- Quine-McCluskey otherwise
F = ac + ad + bc + bd + ae = a(c + d) + b(c + d) + ae
- factor again with or-terms that appear in multiple places: F = (a + b)(c + d) + ae
- find common subexpressions (multiple output decomposition)
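A factoring step is only safe if it preserves the function, which is easy to confirm exhaustively for five inputs. The sketch assumes all literals in the example are uncomplemented, as they appear in the extracted text; the literal counts in the comments are simple counts of variable occurrences.

```python
from itertools import product


def sum_of_products(a, b, c, d, e):
    # F = ac + ad + bc + bd + ae  (10 literals)
    return bool((a and c) or (a and d) or (b and c) or (b and d) or (a and e))


def factored(a, b, c, d, e):
    # F = (a + b)(c + d) + ae     (6 literals)
    return bool(((a or b) and (c or d)) or (a and e))


def adjacency(p, x):
    # p*x + p*not(x) should collapse to p
    return bool((p and x) or (p and not x))


same = all(sum_of_products(*v) == factored(*v) for v in product((0, 1), repeat=5))
collapses = all(adjacency(p, x) == bool(p) for p, x in product((0, 1), repeat=2))
print(same, collapses)
```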
0 C C 1 A,B
Easy to implement but not necessarily compact even when implemented with TGs. But you can make a nice Boolean Unit:
OP0 OP1 OP2 OP3 A,B OP<3:0>
0 1 1 0 0 0 1 1 0 0 1 1 0 0 0 0
Vcc
B out
F
ZERO AND OR XOR
A gnd
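The Boolean unit can be modelled as a 4:1 mux whose data inputs are the four OP bits and whose select inputs are A and B. The sketch assumes that (A, B) selects bit 2A + B of OP<3:0>; under that assumption the four functions named on the slide correspond to the OP codes shown, which agree with the 0/1 table when its rows are read as OP3 down to OP0 and its columns as ZERO, AND, OR, XOR.

```python
# OP<3:0> written as a 4-bit number; bit i is the output for 2*A + B == i (assumed ordering).
OPCODES = {
    "ZERO": 0b0000,
    "AND":  0b1000,
    "OR":   0b1110,
    "XOR":  0b0110,
}


def boolean_unit(op, a, b):
    """4:1 mux: A and B select one of the four OP bits."""
    return (op >> (2 * a + b)) & 1


for name, op in OPCODES.items():
    table = [boolean_unit(op, a, b) for a in (0, 1) for b in (0, 1)]
    print(f"{name:4s} {table}")
```

Any of the 16 two-input functions can be selected this way, since OP is simply the truth table of the function.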
Read-only Memories
if connection or mosfet is present, blank otherwise
7 6 5 4
Address decoder implemented as AND (= NOR). Note: all but one row pulled down for given input.
3 2 1 0
For each Fi, OR together all rows for which output is 1 (actually use NOR then invert).
A B C
F1
F0
Like muxes, but share decoding logic among all outputs. Potential optimizations:
- delete rows with no output pulldowns
- look for adjacent rows with identical output pulldown configurations and merge into a single row.
Are these worth doing?
MicroLab, VLSI-9 (6/12)
JMM v1.2
PLAs
In fact, the optimizations from the previous slide are so worthwhile that we have a name for the resulting optimized ROM: Programmed Logic Array, or PLA for short.
AND plane OR plane 4,5,6,7 2,3 1 Hint: for greater ECO-tolerance, add a few extra empty rows!
A B C
F1
F0
PLAs are usually constructed directly from minimized SOP logic equations: the rows represent the minterms of the equations, the input columns form the minterms and the output columns form the sums. Note that with multiple output columns, minterm sharing between the outputs happens naturally...
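A PLA can be modelled as two tables: the AND plane lists the literals each row requires, and the OR plane lists which rows feed each output. The three product terms below loosely mirror the row grouping 4,5,6,7 / 2,3 / 1 from the figure; the assignment of rows to F1 and F0 is a hypothetical choice made only to show term sharing.

```python
# AND plane: one dict per row; inputs not mentioned are don't-cares.
AND_PLANE = [
    {"A": 1},                     # covers minterms 4,5,6,7
    {"A": 0, "B": 1},             # covers minterms 2,3
    {"A": 0, "B": 0, "C": 1},     # covers minterm 1
]

# OR plane: which rows are connected to each output column (hypothetical wiring).
OR_PLANE = {"F1": [0, 1], "F0": [1, 2]}


def pla(inputs):
    """Evaluate all outputs for a dict of input values."""
    rows = [all(inputs[name] == val for name, val in term.items()) for term in AND_PLANE]
    return {out: int(any(rows[i] for i in used)) for out, used in OR_PLANE.items()}


print(pla({"A": 0, "B": 1, "C": 0}))   # row 1 is shared by both outputs
```

Adding an ECO-style spare row amounts to appending an entry to `AND_PLANE` and listing its index in `OR_PLANE`, without touching the other rows.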
PLA Folding
PLAs can be sparse, i.e., only a few of the possible connections in either plane may be made. (AND plane can only have 50%!)
A A B B C C D D 1 2 3 4 5 6 F1 F2
If we allow inputs and outputs to come from both above and below, then we may be able to fold two columns into one if the rows they use don't overlap. This may require rearranging the rows to minimize overlap and hence maximize folding possibilities.
Row folding is another possible optimization (but not in this example).
A A B B F1 6 1 2 3 4 5 D D C C F2
Multiple-input encoding
On the previous slide, it was noted that the AND plane can have at most 50% of its connections programmed. Why? To improve the utilization of the input columns, consider encoding the 4 columns used to transmit the two input literals and their complements with some more useful functions of the two literals. For example:
AB A A B B AB AB AB
AIN
You get extra computing oomph: for example, it's now possible to compute (A xor B) using a single row rather than the two rows it took with the old encoding.
Datapath Operators
Most digital functions can be divided into the following categories:
u u u u
Datapath operators form an important subclass of VLSI design that benefit from the structured design principles of hierarchy, regularity, modularity and locality.
u
N-bit Data is generally processed by the use of n identical subcircuits. Data operations may be sequenced in time or space.
data may be arranged to flow in one direction control signals are introduced in an orthogonal direction to the dataflow
less than or equal m =0 m Z
m bits
subtractor
metal1 control flow
Zm Zm-1 Z1 Z0
Coming Up...
Next topic: Sequential logic: state elements, latches and registers. Static vs. dynamic storage. Single and multiphase clocking strategies. Setup and hold times; propagation delays.
Readings for next time, Weste, sections:
- 8.1 thru 8.2 (data operators)
- 8.3.2, 8.4.2 (just read, don't study)
VLSI Design I
CMOS Sequential Logic Clocking Strategies
Overview: single and double phase clock systems; latch and FF timing.
Goal: You are familiar with static and dynamic latches/FFs as well as with single, double phase clock, clock redistribution, clock skew and PLL clocking techniques.
MicroLab, VLSI-10 (1/23)
JMM v1.4
Sequential Logic
Use #1: Get better utilization from idle combinational logic blocks. Pipeline the system so that new computations start before the old ones complete. Add registers to keep computations separate.
8
A B A B
x
8
Use #2: Convert parallel operations to a sequence of (faster, smaller) serial operations.
1
+
8 8
Use #3: Need to process a sequence of inputs and want to reuse the same hardware (finite state machine).
D G
D G Q
Q stable Q takes value from D
D clk
D clk Q
Q stable
A static latch will hold data while G is inactive, however long that may be. A dynamic latch will hold data while G is inactive, but only for a while, after which the saved value may decay.
Do static latches dissipate static power? How long is for a while? Which one should I use?
latch b
CLa
D Q G
CLb
D Q G
t2b
t1a = tnqa + tnla > thb
t1b = tnqb + tnlb > tha
t2a = txqa + txla < tc0 - tsb
t2b = txqb + txlb < tc1 - tsa

th = hold time
ts = setup time
tn = min delay from invalid input to invalid output
tx = max delay from valid input to valid output
tl = delay for combinatorial logic from input to output
tq = delay for memory element from G to Q
Questions for latch-based designs:
- how much time for useful work (i.e. for combinational logic delay)? txla + txlb < tc - 2(ts + txq)
- what is the maximal clock frequency? 1/f = tc > 2(txq + txl + ts)
- does it help to guarantee a minimum tn, for example, by requiring a minimum number of gates in each cloud?
- Suppose the maximum clock skew is tSKEW. How does that affect the equations above? Clock skew measures the difference in arrival of CLK at two cascaded latches (not necessarily any two latches!).
Static Latches
Basic idea:
Need gain around this loop to make latch static. D
0 1
Want storage node to be isolated from whatever user does to Q. Would like fast CLK-to-Q, small setup and zero hold times.
Oops: feedback not isolated from Q. Could add additional output inverters...
CLK
Obvious implementation:
Q
D CLKN CLK
D CLK
Should we buffer CLK 0, 1 or 2 times?
Latch Timing
1 2 Q D CLK
setup time = how long D input has to be stable before CLK transition. hold time = how long D input has to be stable after CLK transition.
ts CLK th
D
1 2
So, what node should we use to measure setup and hold times? And what should we measure? Other time of interest: CLK-to-Q
Dynamic Latches
Suppose in the interest of speed we were willing to give up the static guarantee and take our chances with dynamic latches, i.e., remove feedback path...
Eliminate when Q fanout is small (1)
D CLK
Delete the PFET driven by CLKN and then add an NFET driven by CLK in Q's pulldown path to handle what happens when D goes from 1 to 0.
What about those of us who don't have buildings full of engineers to sweat the details? Use D-flip-flops and address all the problems once!
D G
D G
Q
slave
D CLK
master
CLK D CLK
!
Q D CLK
CLK
CL
D Q clk
t1 = tnq + tnl > th
t2 = txq + txl < tc - ts
Questions for register-based designs:
- how much time for useful work (i.e. for combinational logic delay)?
- does it help to guarantee a minimum tn? How about designing registers so that txq > th?
- Suppose the maximum clock skew is tSKEW. How does that affect the equations above?
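The two register constraints translate directly into a hold check, a setup check and a minimum clock period. The sketch below uses hypothetical delay values in nanoseconds and ignores clock skew, which the questions above leave open.

```python
def register_timing(t_nq, t_nl, t_xq, t_xl, t_h, t_s, t_c):
    """Check t1 = tnq + tnl > th and t2 = txq + txl < tc - ts for one register-to-register path."""
    hold_ok = t_nq + t_nl > t_h          # fastest path must not violate hold
    setup_ok = t_xq + t_xl < t_c - t_s   # slowest path must arrive before setup
    min_period = t_xq + t_xl + t_s       # smallest tc that still meets setup
    return hold_ok, setup_ok, min_period


# Hypothetical numbers, all in ns.
hold_ok, setup_ok, min_period = register_timing(
    t_nq=0.2, t_nl=0.3, t_xq=0.5, t_xl=3.0, t_h=0.3, t_s=0.4, t_c=5.0
)
print(hold_ok, setup_ok, f"min period = {min_period:.1f} ns")
print(f"time left for logic at tc = 5 ns: {5.0 - 0.5 - 0.4:.1f} ns")
```

Note that the hold check does not depend on the clock period, so a hold violation cannot be fixed by slowing the clock down.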
CLK D
QN
CLK is low:
- node 1 follows not(D)
- node 2 pulled up
- QN is floating with its old value
CLK is high:
- node 2 = 0 if node 1 = 1, otherwise it stays 1
- node 2 = not(node 1) shortly after CLK rises
- QN = not(node 2), stable soon after CLK rises
- node 1 can be pulled down if D goes to 0 (capacitive coupling), but node 2 won't change!
latch #2:
D Q G CLK D Q G D Q G
Simplest clocking methodology is to use a single clock in conjunction with a register. Clocks are generated with global clock buffers. CLK and its complement are generated locally. Buffers are necessary for large loads.
clk-in clk clk
Clock Skew
D Q clk CLK D Q clk D Q clk
delay
delay
- if a clock net is heavily loaded, there might be a race between clock and data -> clock skew
- special attention has to be paid to the design of the clock tree. CAD tools are able to design balanced clock trees.
- two methods to avoid clock skew:
latch
D Q clk CLK D Q clk D Q clk
delay
D Q clk
D Q clk
delay
CLK
phi1 phi2
- a problem in single phase clocked systems is the generation and distribution of nearly perfect overlapping clocks.
- in two-phase clocked systems this is solved by non-overlapping clocks
- non-overlapping clocks can be generated with latch structures
clk
1 1
phi1
phi2
D Q clk
D Q clk
- in properly designed two-edge clocked systems clock skew problems are drastically reduced
- disadvantage: 50% speed reduction
- typical application: FSM on rising edge, data-path on falling edge
- designs with several FSMs and data-paths need thorough design
Clock Distribution
Two main techniques for clock distribution exist:
- a single large buffer (see Alpha processor)
- a distributed clock tree approach
n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath n-bit datapath
clk
- there is no such thing as a design-free clocking strategy in today's high-performance processes
- clock buffers should be surrounded by power pads due to their large power consumption
clk
clk
clk
clk driver
PLL
up
Divider by n
#2
Filter VCO voltage controlled n x fosc oscillator
- The phase detector produces a sequence of up/down pulses, which are used to switch a charge pump.
- The charge pump charges/discharges a capacitor with voltage or current pulses.
- A filter is used to limit the rate of change of the capacitor voltage. The result is a slowly changing voltage that depends on the frequency difference between the reference clock and the divided VCO output.
- The VCO increases/decreases its frequency of operation depending on its input voltage.
We need a CAD tool: static timing analyser. Here's how it works:
Step 1: Level-ize all signal nodes.
Start by assigning all register outputs and top-level inputs a level of 0. For all other gates: levelOUTPUT = max(levelINPUT) + 1.
For each successive node level, compute min and max time for all nodes on that level (see next slide for details). This is a data-independent computation. Might need case analysis to avoid false paths.
Use min times of register inputs to check hold time. Use max times and tCLK to check setup time, or use max time + tSETUP to determine min tCLK.
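The levelization and min/max propagation steps can be sketched on a tiny netlist. The netlist, the per-gate (min, max) delays and the function names are hypothetical; the sketch assumes an acyclic combinational network and does no false-path analysis.

```python
def levelize(gates, sources):
    """Level 0 for sources; every other node gets max(level of inputs) + 1."""
    level = {node: 0 for node in sources}

    def visit(node):
        if node not in level:
            level[node] = max(visit(i) for i in gates[node]) + 1
        return level[node]

    for node in gates:
        visit(node)
    return level


def arrival_times(gates, delays, sources):
    """Earliest and latest arrival time of every node, processed level by level."""
    level = levelize(gates, sources)
    tmin = {node: 0.0 for node in sources}
    tmax = {node: 0.0 for node in sources}
    for node in sorted(gates, key=level.get):
        dmin, dmax = delays[node]
        tmin[node] = min(tmin[i] for i in gates[node]) + dmin
        tmax[node] = max(tmax[i] for i in gates[node]) + dmax
    return level, tmin, tmax


# Hypothetical netlist: output node -> input nodes, and (min, max) gate delays in ns.
gates = {"n1": ["a", "b"], "n2": ["n1", "c"], "d_in": ["n2", "a"]}
delays = {"n1": (1.0, 2.0), "n2": (1.0, 3.0), "d_in": (0.5, 1.0)}
level, tmin, tmax = arrival_times(gates, delays, sources=["a", "b", "c"])

T_HOLD, T_SETUP = 0.3, 0.4        # hypothetical register parameters, ns
print("hold ok:", tmin["d_in"] > T_HOLD)
print("min tCLK:", tmax["d_in"] + T_SETUP)
```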
VDD
IN OUT
CLKN IN CLK
2
OUT
C1
COUT
1 IN OUT
C2
COUT
Use the Penfield-Rubinstein model to compute $t_{d,in-out} = \sum_i R_i C_i$ over all nodes i in the stage, where $R_i$ is the total effective resistance to the power rail and $C_i$ is non-zero if the node capacitor needs to be charged/discharged. Multiply by a degrading factor to account for the rise/fall time of the input.
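As a small numeric illustration of the sum over R_i C_i products, the following sketch evaluates the delay of one stage. The resistance and capacitance values and the default degrading factor are made-up assumptions, chosen only to show the arithmetic.

```python
def stage_delay(nodes, degrading_factor=1.0):
    """Sum R_i * C_i over all nodes of a stage, scaled by an input-slope factor.

    nodes: list of (R_i in ohms, C_i in farads); C_i is 0 for nodes that
    do not need to be charged or discharged.
    """
    return degrading_factor * sum(r * c for r, c in nodes)


# Hypothetical stage: internal node (10 kOhm, 5 fF) and output node (15 kOhm, 20 fF).
nodes = [(10e3, 5e-15), (15e3, 20e-15)]
td = stage_delay(nodes)                      # 50 ps + 300 ps = 350 ps
td_slow_input = stage_delay(nodes, 1.2)      # same stage with a 1.2x degrading factor
```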
Coming Up...
Next topic Data operators Readings for next time Weste:
Sections
5.5 thru 5.5.6 (latch, FF) 5.5.8 thru 5.5.11 (clock strategy) 5.5.15 and 5.5.16 (clock strategy)
section 9.3.5.3
Wave pipelining
Just assert new inputs to logic after waiting long enough to ensure that previous values won't be corrupted. Requires very careful design of each level of logic to ensure consistent propagation delay along all paths with all possible data values. Hard to do in the face of manufacturing variations (fast N, slow P and vice versa).
Self-timed logic
Use dual-rail signaling (i.e., two wires) to encode
reset (not yet evaluated): 00
ready with value 0: 01
ready with value 1: 10
and then build handshake logic that starts the next stage when the current stage is done and the next stage has completed its previous computation and delivered its values. Dual-rail logic works well with precharge-evaluate gates; more on this in a later lecture.
[Figure: state transition diagram with states S1 to S9; edges labelled input/output, e.g. 0/0, 1/0, 1/1, -/0, -/1]
Is this a Mealy or Moore machine?
Compatibility table: start by putting an X in square (Si,Sj) if Si produces a different output from Sj for some input.
[Figure: compatibility table with rows S2 to S5 and columns S1 to S4 (all but last state)]
JMM/ESA v1.0
0/0 S1 0/1 S2
1/0 S4 1/1
1/1 S5 0/1
0/0
S2 S3
S4
S5
X X
S1
S1,S5
S2
S3
S4
Next: for non-X square (Si,Sj) write in pairs of states that have to be equivalent in order for Si and Sj to be equivalent. Finally: Look at an entry in (Si,Sj). If entry is Sm,Sn, and if (Sm,Sn) has an X, put an X in square (Si,Sj). Repeat until no more squares can be Xed out.
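The X-ing procedure can be sketched as follows for a completely specified Mealy machine. The state table below is a hypothetical example, not the machine from the slides; the dictionary layout `fsm[state][input] = (next_state, output)` is likewise just one possible representation.

```python
from itertools import combinations


def equivalent_pairs(fsm):
    """Return the state pairs that are still un-X'ed when the table is stable."""
    states = sorted(fsm)
    inputs = sorted(next(iter(fsm.values())))
    pairs = list(combinations(states, 2))

    # Start: X every pair whose outputs differ for some input.
    marked = {p for p in pairs
              if any(fsm[p[0]][x][1] != fsm[p[1]][x][1] for x in inputs)}

    # Repeat: X a pair if one of its implied next-state pairs is already X'ed.
    changed = True
    while changed:
        changed = False
        for p in pairs:
            if p in marked:
                continue
            for x in inputs:
                implied = tuple(sorted((fsm[p[0]][x][0], fsm[p[1]][x][0])))
                if implied[0] != implied[1] and implied in marked:
                    marked.add(p)
                    changed = True
                    break
    return [p for p in pairs if p not in marked]


# Hypothetical machine: fsm[state][input] = (next_state, output)
fsm = {
    "S1": {0: ("S2", 0), 1: ("S3", 0)},
    "S2": {0: ("S1", 0), 1: ("S4", 1)},
    "S3": {0: ("S1", 0), 1: ("S4", 1)},
    "S4": {0: ("S4", 0), 1: ("S1", 0)},
}
merged = equivalent_pairs(fsm)   # S2 and S3 can be merged
```

In this example (S1,S4) survives the output check but is X'ed in the second phase, because it requires (S2,S4) to be equivalent and that square already holds an X.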
0 1 1 -
0-0 01 10 -1
00 01 10 11 00
1 0 1 0 1
One-hot encoding uses a separate register for each possible state: the register output is 1 if the FSM is in that state. Hence only one state register is "hot" at a time. This makes for trivial decoding of the state and simple next-state logic. Good for simple FSMs and when no multi-level synthesis is available. Often a good choice for FPGAs.
Coming Up...
Next topic Arithmetic circuits: adders and multipliers. Readings for next time Weste: 8.4
VLSI Design I
Datapath Operators: Addition and Multiplication
Didn't I learn how to do addition in the first year? First year courses aren't what they used to be...
Overview: Carry propagate, carry lookahead, carry save, carry skip and carry select adder
Goal: You know serial and parallel addition and multiplication architectures
Addition/Subtraction
Most digital functions can be divided into the following categories:
Adder architectures:
carry-save adder (CSA)
carry-skip adder
carry-select adder
parallel adder
serial adder
...
Binary Addition
Here's an example of binary addition as one might do it by hand:
1 1 0 1
If we use a two's-complement representation for signed integers, the same procedure will work for adding both signed and unsigned numbers. Besides the sum, one often wants two other bits of information from an adder:
carry-out: indicates that the add in the most significant position produced a carry; used when implementing multi-word arithmetic, e.g., 1 + (-1)
$C = a_{n-1} b_{n-1} + \overline{s_{n-1}}\,(a_{n-1} + b_{n-1})$
overflow: indicates that the answer has too many bits to be represented correctly by the result width (2's complement), e.g., $(2^{N-1} - 1) + (2^{N-1} - 1)$
$V = a_{n-1} b_{n-1} \overline{s_{n-1}} + \overline{a_{n-1}}\,\overline{b_{n-1}}\, s_{n-1}$
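The two flag equations can be checked exhaustively for a small word width. The sketch below assumes the usual reading of the formulas, with the sum bit inverted in the carry term and the inverted operand bits in the second overflow term, and a carry-in of zero; the function name and the 4-bit width are arbitrary choices.

```python
def add_flags(a, b, n):
    """n-bit add; carry and overflow computed only from the top bits of a, b and s."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    s = (a + b) & mask
    an, bn, sn = (a >> (n - 1)) & 1, (b >> (n - 1)) & 1, (s >> (n - 1)) & 1
    carry = (an & bn) | ((sn ^ 1) & (an | bn))                   # C = ab + s'(a + b)
    overflow = (an & bn & (sn ^ 1)) | ((an ^ 1) & (bn ^ 1) & sn)  # V = abs' + a'b's
    return s, carry, overflow


# The two examples from the text, with N = 4:
print(add_flags(0b0001, 0b1111, 4))   # 1 + (-1): sum 0, carry 1, no overflow
print(add_flags(0b0111, 0b0111, 4))   # 7 + 7: sum 0b1110, no carry, overflow
```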
COUT S
...
S2 S1
C0 S0
So if we had (N+1)-input gates and didn't mind a lot of loading on the P signals, the propagation delay of an adder built using this equation for the carries would be (count 1 delay unit per fan-in; ripple carry: 5N delays): ____________________________________
Of course, this is impractical, but it does lead to some interesting ideas:
faster ripple-carry implementations
hierarchical carry-lookahead adders
PN GN GN
CN CN-1
CN
PN
To prevent GN from affecting CN-1, PN must be computed as AN xor BN. But we needed the xor anyway now SN = PN xnor CN
[Figure: Manchester carry chain from Cin to CN+3 built from four P/G cells with inputs A and B (AN, BN, ...); xnor gates produce SN, SN+1, SN+2]
The propagate logic in the Manchester carry chain puts a lot of NFETs in series, so when CIN is high the pulldown path can get long if a lot of the P signals are true. For most technologies, the performance of this long pulldown path limits the maximum length of the carry chain to around four stages before it needs to split into subchains. Adding a bypass path that skips over the block when all P signals are true can improve the maximum propagation delay when multiple Manchester carry chains are used in series.
3 2,3 0,3
1 0,1
log2(n)
AK SK BK
GJ+1,K CJ
PJ+1,K GIJ
I,K
C I- 1 PIJ
PK CK-1 GK
Suppose it takes 1 time unit for a signal to pass thru two logic levels, then
time to ripple thru a block of k bits = k time units
time to skip a block = 1 time unit
Consider a 24-bit carry-skip adder organized as 6 blocks of four bits each. So the worst case propagation time is 4 + 1 + 1 + 1 + 1 + 4 = 12 time units.
But now reorganize the adder with the least significant 3 bits in the first block, the next 4 bits in the second block, followed by blocks of 5, 5, 4, and 3. Now the worst case propagation time is 3 + 1 + 1 + 1 + 1 + 3 = 10 time units.
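The comparison of the two block organizations can be reproduced with a short script. It is a sketch under the timing model stated above (k units to ripple through a k-bit block, 1 unit per skipped block); the function name is hypothetical. Besides the end-to-end path it also checks every path that starts in one block and ends in a later one, since with unequal blocks an inner path could be the slowest.

```python
def worst_case_delay(blocks):
    """Worst-case carry propagation time of a carry-skip adder.

    A carry generated in block i ripples through it (blocks[i] units),
    skips every block in between (1 unit each) and ripples through the
    block j where it is finally absorbed (blocks[j] units).
    """
    worst = max(blocks)                       # path confined to a single block
    for i in range(len(blocks)):
        for j in range(i + 1, len(blocks)):
            skipped = j - i - 1
            worst = max(worst, blocks[i] + skipped + blocks[j])
    return worst


uniform = [4, 4, 4, 4, 4, 4]      # 6 blocks of four bits
tapered = [3, 4, 5, 5, 4, 3]      # same 24 bits, wider blocks in the middle
print(worst_case_delay(uniform))  # 12
print(worst_case_delay(tapered))  # 10
```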
ripple skip ripple
If we want only one gate delay from X1 to the output f, how do we do it?
...
>=1 & 1 0
1
>=1 &
...
1 0
CIN
...
1 0
...
1 0
...
If it takes k time units for a block to add k-bit numbers and if it takes one time unit to compute the mux select from the two carry-out signals, then for optimal operation each block should be one bit wider than the next block, just as in the carry-skip adder.
Adder layouts
M-1
...
N
...
...
... ...
...
...
M-2
...
N
...
...
CSA
CSA
CSA
CSA
CSA
Rewire so that first two adders work in parallel. Feed results into third and fourth adders which also work in parallel, etc.
M-4 2
CSA
CSA
CSA
CSA
CSA
CSA
CPA
...
CPA
...
Wallace Trees
$O(\log_{1.5} M)$
CSA
CSA
CSA
CSA
CSA
CSA
...
We have been using full-adders or 3:2 counters in our array adders. Higher fan-in counters can be used to further reduce delays for large M, e.g., Weste shows a 5:3 counter in Fig. 8.41.
Wallace trees give asymptotically better behaviour than the earlier O(M) schemes, but they do not have a regular layout. Other O(log(M)) schemes, e.g., binary-tree multipliers using signed digit representations, have better layout properties but at a cost of more complicated adder cells.
[Figure: CSA adders followed by a CPA, with FF registers clocked by clk on the Carry and S(3) ... S(0) paths]
Binary Multiplication
Suppose we want to multiply two numbers:
A = {A_{N-1}, A_{N-2}, ..., A_1, A_0} (multiplicand)
B = {B_{M-1}, B_{M-2}, ..., B_1, B_0} (multiplier)
Note that B_K*A can be accomplished with N AND gates since B_K = 0 or 1. The scaling by powers of two is a simple shift. Thus multiplication of an N-bit number by an M-bit number boils down to the addition of M N-bit partial products, each of which is formed by a simple Boolean operation. Any of the techniques from the previous slides can be used to accomplish the required additions.
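The reduction of multiplication to AND, shift and add can be written out directly. The sketch below handles unsigned operands only; the function name and the 4-bit example values are arbitrary.

```python
def multiply(a, b, m):
    """Multiply unsigned a by the m-bit unsigned multiplier b.

    Each partial product is A AND-ed with one multiplier bit B_K
    (N AND gates in hardware) and shifted left by K positions.
    """
    product = 0
    for k in range(m):
        bk = (b >> k) & 1
        partial = a if bk else 0      # B_K * A
        product += partial << k       # scaling by 2^K is a simple shift
    return product


print(multiply(13, 11, 4))   # 143
```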
Array multipliers
Example 3x3 array multiplier using CSAs to sum partial A2B2 products: 0
A2B1 0 0 A2B0 0 0 A1B0 0 0 A0B0 0 A0B1 A1B1 A0B2 A1B2
nc
P5 P4
P3
P2
P1
This looks the same as before except we have half as many partial products to sum. Generating each partial product is now more complicated since B_{K+1,K} can now be 0, 1, 2 or 3. The only troublesome value here is 3, since that would seem to require more adder inputs than we have (3*A = A + 2*A). But we can also write 3*A = 4*A - A. We'll do the -A in this partial product stage and signal the next stage that it needs to add 4*A. To keep the signalling simple we'll also rewrite 2*A = 4*A - 2*A.
Profs go crazy nowadays, why can't he just multiply as everybody does it
M/2
...
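BK+1 | BK | BK-1 | action  |    | x1 | x2
0    | 0  | 0    | add 0   | -- | 0  | 0
0    | 0  | 1    | add A   | 0  | 1  | 0
0    | 1  | 0    | add A   | 0  | 1  | 0
0    | 1  | 1    | add 2*A | 0  | 0  | 1
1    | 0  | 0    | sub 2*A | 1  | 0  | 1
1    | 0  | 1    | sub A   | 1  | 1  | 0
1    | 1  | 0    | sub A   | 1  | 1  | 0
1    | 1  | 1    | add 0   | -- | 0  | 0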
>=1 =1
PPi carrycarry-in
Not cheaper than an ADD but all recodes can be done in parallel so we only pay time penalty once (for first column)!
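The recoding can be tried out in software. The sketch below is an illustration, not the slide's circuit: it assumes a two's-complement multiplier with an even number of bits m and B_{-1} = 0, and it expresses the action table as the digit B_{K-1} + B_K - 2*B_{K+1}, which takes the values -2 to +2 (for example 011 gives +2, 100 gives -2, 111 gives 0). The function names are hypothetical.

```python
def booth_digits(b, m):
    """Recode an m-bit two's-complement multiplier (m even) into radix-4 digits."""
    digits = []
    prev = 0                                  # B_{-1} = 0
    for k in range(0, m, 2):
        b_k = (b >> k) & 1
        b_k1 = (b >> (k + 1)) & 1
        digits.append(prev + b_k - 2 * b_k1)  # one of -2, -1, 0, +1, +2
        prev = b_k1
    return digits


def booth_multiply(a, b, m):
    """Sum the M/2 partial products digit * A * 4^i."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_digits(b, m)))


print(booth_digits(7, 4))        # [-1, 2], i.e. 7 = -1 + 2*4
print(booth_multiply(5, -3, 4))  # -15
```

All digits depend only on three neighbouring multiplier bits, which is why the recodes can be computed in parallel.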
This multiplier only produces a 32-bit result, so the top 16 bits of the rhombus have been omitted:
[Figure: partial-product rhombus, 32 bits wide, top 16 bits omitted]
Serial Multiplication
Bit-serial multipliers are very compact, but suffer from high data latency and are very slow.
Simplest form of serial multiplier: successive addition
cout clk
reset A B
&
FF clr result
& clk
M+N bit product -> td=M+N time intervals, but time intervals are larger
& Xj PPin
& &
&
Xj+1 PPout
& P
Shifters
Shifters are very important for microprocessor architectures:
arithmetic shifting
logical shifting
rotation functions
Operation: input
logical right shift: 0,0,0,A(3:0)
logical left shift: A(3:0),0,0,0
right rotate: A(2:0),A(3:0)
left rotate: A(3:0),A(2:0)
arithmetic right shift: A3,A3,A3,A(3:0)
arithmetic left shift: A(3:0),A0,A0,A0
Coming Up...
Next topic: VLSI fabrication: processing steps, basic structures, self-aligned processes, P and N devices.
Readings for next time: Weste:
Sections
generators 8.2.2 comparators 8.2.3 zero/one detectors 8.2.4 binary counters 8.2.5 Boolean operations - ALUs 8.2.6
Overview
Goal: You are able to master your own VHDL project. You have basic notions about HW/SW co-design.
Project Goal
Goal: design of an electronic system from specification down to ASIC/FPGA
Problem: one of the most difficult tasks in a VLSI project design is to find the starting design point
Basic Steps: in order to proceed in a structured manner, you should perform the following steps
block diagram
HW/SW co-design (hardware/software co-design)
IP cores (intellectual property cores)
[Figure: design flow. HW/SW co-design splits the design into hardware (FSMD architecture model) and software, followed by hardware/software system simulation, synthesis, place & route, back-annotation & simulation (formal design verification) and chip test]
5. identify speed sensitive (HW) and control sensitive (SW) tasks
6. define the intelligence of each functional unit
7. identify IP cores
8. organize as many IP cores as possible (tools, core generators, old designs, internet)
9. update design if necessary according to available IP cores
10. define inter-process communication
11. define the interconnections between your units
In the classical HW/SW co-design approach, the design process is continued as long as possible independent of its implementation. HW/SW design units are identified at the very end of the design steps. In smaller designs, as in our case, the HW/SW co-design step is done in an early phase.
Power Power
DAC DAC
keyboard interface
Decoder interface
USB interface
Flash interface
DAC interface
control sensitive
MP3 Player ASIC/FPGA power management add intelligence keyboard interface Decoder interface main control LCD interface I2C interface I2S interface
USB interface
Flash interface
DAC interface
speed sensitive
add intelligence
main control
PIC core
keyboard interface
USB core
USB interface Flash interface DAC interface
main control
interface
Decoder interface Port A Port B intelligent flash interface DAC interface Port C Port D
USB core
USB interface
I.
FSMD architecture model
VHDL coding
C coding
Decoder interface
USB core
USB interface
Port D
command register
mux
PIC core
Software C Code
Port A, Port B, Port C: intelligent Flash & I2S interface (FSMD architecture)
Port D: intelligent I2C/LCD interface (FSMD architecture)
USB core
Hardware (IP core)
Software C Code
intelligent intelligent Flash & I2S interface LCD interface (FSMD) (FSMD)
process 1
request
process 2
acknowledge data
data valid
Process 2
Test Bench
response generation and verification
system simulation
14. synthesis of logic level design
15. simulation of logic level with test bench
16. place & route your design for target technology
17. back annotation and simulation with test bench
18. (formal design verification)
verify test
19. chip fabrication
20. chip test with test bench
21. in-system test
All three items interact with each other, resulting in 2 closed loops.
The closed loops may have real-time constraints.
All three design entry elements will be converted to VHDL and thus can be implemented into a SoC
Gecko main board Software Real Time Signal Processing Hardware Hardware IP blocks
Microprocessor IP Core
SoC
Power blocks
Analog blocks
Sensor
GECKO main board on top of an application-specific GECKO expansion board (RFID reader application, 2 W 13.56 MHz RF power)
hardware-in-the-loop
hardware-in-the-software-loop
Homework: MyProject
define your own project
plan the development and use the presented design methodology
prepare the presentation of your project; be sure you have all the necessary documentation for the discussed design steps
MyProject 2002: speed controlled dc motor
Matlab/Simulink with speed controller
GECKO main board with dc-motor electronics
use hardware-in-the-simulation-loop
Implementation constraints:
microprocessor with C code for administrative tasks
pulse width modulation for driving the dc motor (hardware)
A/B signal encoder for speed sensing (hardware)
driving circuitry (expansion board) as simple as possible
Technical data:
dc motor has 6000 turns/minute at 5 V
speed sensor has 12 pulses per turn
VLSI Design II
CMOS Processing
Overview
Processing steps
Processing step sequence
Goal: You know the basics of integrated circuit processing steps and you are familiar with the processing sequence of a sample CMOS technology.
Introduction
Complementary MOS (CMOS) technology is becoming the dominant candidate for VLSI applications
CMOS provides both n-channel and p-channel MOS transistors on one chip
cheap chips are produced in extremely expensive fabs
each chip passes hundreds of different processing steps
random process disturbances cause electrical parameter variations of the chips
elements are never identical
Process technology pictures and text are copied from: Atlas of IC Technologies, W. Maly, The Benjamin Cummings Publishing Company, ISBN 0-8053-6850-7
n+
n+ p
Most fabrication steps require first creating a mask that determines where the operation will occur. Masks can either be existing layers on the IC (these masks are self-aligned) or created using a lithographic process and photoresist. Design rules ensure that the design is still functional in the face of misalignments and various side-effects of the fabrication process.
Overview
Overview of Processing Steps: making the wafers, photolithography, oxidation, layer deposition, etching, diffusion, implantation
Overview of Processing Step Sequence: n-well, active, poly, n-diffusion, p-diffusion, contacts, metal1, via1, metal2, passivation
bird's beak
CVD
dry etching
Solid state diffusion is a process which allows atoms to move within a solid at elevated temperatures.
The gate oxide needs to be of high quality: uniform thickness, no defects! The thinner the gate oxide, the more oomph the fet will have (we'll see why soon), but the harder it is to make it defect free.
Parasitic Fets
Planarize
Coming Up...
Next time: Mask layout: design rules, layout examples, structured and symbolic layout techniques, retargetable layouts. CAD tools for layout: design capture, design rule checking, extraction, network comparison.
Readings for next time:
Weste: Chapter 3 thru 3.2.3
Johns&Martin: 2 through 2.1 (CMOS processing)
Transparencies: transparency notes (process technology)
Study CBT course on the web or on I3S-CD: How a silicon integrated circuit is made (Uni Manchester)
Weste pp168: 3.8 ex 5 (difficulty: easy): Explain why substrate and well contacts are important in CMOS.
VLSI Design II
CMOS Layout
Measure twice, fab once
Overview
CMOS Layout and Design Rules
Analog Layout Design Considerations
Goal: You are familiar with the basic layout design rules of the Alcatel 0.5 µm CMOS process. You know how to lay out integrated transistors, capacitors and resistors, and what has to be considered in order to realize quality analog circuits, like matching and shielding.
Sources of Error
resist exposure and development
over/under etching, lateral diffusion
uneven topography
systematic errors corrected by bloating/shrinking mask
random errors increase minimum widths and spacing
Mask misalignment
contacts and vias only on flat surfaces
no devices near boundaries of well
no poly contacts over diffusion
gate metal must connect to diffusion
minimum metal coverage requirements
Electrical properties
current density limitations
latch-up prevention
mobility variations (why?)
thin-oxide thickness variations
sheet resistances
use of process corners in analysis
Process instabilities
Design Rules
enclosure rules
Exclusion rule
width rules
spacing rules
We can specify the design rules using some convenient units, e.g., microns, but what happens if we want to manufacture the chip using different manufacturers? One suggestion: use an abstract unit, the lambda, and scale the design to the appropriate actual dimensions when the chip is to be manufactured. Usually all edges must be "on grid", e.g., in the MOSIS scalable rules all edges must be on a half-lambda grid; in the 0.5 µm Alcatel process all edges must be on a 0.05 µm grid.
3x3 3 2 2 3 2
1 3 1 1
2 6
Retargetable Layouts?
So, should one use lambda rules, or not?
Probably okay for retargeting between similar processes, e.g., when the later process is a simple shrink of the earlier process. This often happens between generations as a mid-life kicker for a process. Some 0.35 µm processes are shrinks of an earlier 0.5 µm process. Can be useful for fabless semiconductor companies.
Most industrial designs use micron rules to get the extra space efficiency. Cost of retargeting by hand is acceptable for a successful product, but usually it's time for a redesign anyway.
Invent some way of entering a design symbolically, but use a more sophisticated technique for producing the masks for a particular process. Insight: relative sizes may change but the topological relationship between components does not. So, instead of shrinking a design, compact it!
used masks nwell nwell active and pplus and poly active and pplus and poly active and pplus and poly and nwell active and pplus and poly and nwell active and poly masks
pfet
nwell n+diffusion p+diffusion nwell active pplus poly
nfet
#1
n-well, active
1.7m 0.8m 0.8m 0.5m n strap 0.7m p strap 1m 1m 0.5m 0.6m 2m (3m) 1.1m 1.1m 0.6m 2.4m n strap n-well on same (different) potential
1m
#2
poly, fets
0.6m
#3
abutting straps
1.6m
#4
contact via1 via2 1.1m
0.7m
1.1m
0.2m
0.7m 0.2m 1m
0.9m
via2
0.8m 0.5m 0.25m
via1
via1 need to be covered by metal2
contact
0.35m 0.6m 0.8m 0.25m contacts need to be covered by metal1
Stick diagram
Compact X then Y
Compact Y then X
Vertical Gates: Good for circuits where fet sizes are similar and each gate has limited fanout. Best choice for multiple-input static gates and for datapaths.
Horizontal Gates: Good for circuits where long and short fets are needed or where nodes must control many fets. Often used in multiple-output complex gates (e.g., sum/carry circuits).
What about routing signals between gates? Note that both layouts block metal/poly routing inside the cell. Choices: metal2 routing over the cell or routing above/below the cell.
avoid long (> 50 squares) poly runs
don't capture white space in a cell
don't obsess over the layout; instead make a second pass, optimizing where it counts
Which is the better gate layout?
considering node capacitances?
considering composability with neighbouring gates?
area = 94 µm²
area = 73 µm²
A B D B C D E C E
B A D B C
E D E
node 1
J1 Q1 J2 Q2 J3 Q3 J4 Q4 J5
node 2
node 1 Q1 Q2 node 2
gates
Q3
Q4
Using lithography techniques, a variety of two-dimensional effects can cause the effective sizes of components to differ from the sizes of the glass layout masks:
lateral diffusion
overetching
mask misalignment
...
Goal: Matching second-order size error effects is done mainly by making larger objects out of several unit-sized components connected together. For best accuracy, the boundary conditions around all objects should be matched, even when this means adding extra unused components.
SiO2 protection well lateral diffusion under SiO2 mask overetching
M2 M1 M1 M2 M2 M1 M1 M2 M2 M1
DM1
GM1 DM2
Capacitor Matching
#1
material
preferable: poly1 - poly2 structures (only C05M-A)
if not available: poly1 - diffusion (C05M-D), but nonlinear due to voltage dependency
sandwich structures with poly - metal1
In analog design, precise ratios of capacitors are used very often. The major source of error in realized capacitors is overetching; a less relevant one is an oxide thickness gradient across the surface.
Goal: Larger capacitors are realized by a parallel combination of smaller unit-sized capacitors (overetching). If unit-sized capacitors are not realizable, overetching can still be minimized by realizing a non-unit-sized capacitor with a specific perimeter-to-area ratio. For very accurate ratios, a common-centroid layout is used in addition (oxide thickness gradient).
Capacitor Matching
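#2
[Figure: capacitor of drawn size x by y; overetching by e on every edge leaves the actual capacitor C_a of size x_a by y_a]
$x_a = x - 2e, \qquad y_a = y - 2e$
$C = \frac{\varepsilon_{ox}}{t_{ox}} A = C_{ox}\, x y$
$C_a = C_{ox}\, x_a y_a = C_{ox}\,(x - 2e)(y - 2e)$
$\Delta C_t = C_{ox}\, x_a y_a - C_{ox}\, x y \approx -2e\,(x + y)\, C_{ox}$
$\varepsilon = \frac{\Delta C_t}{C} = \frac{-2e\,(x + y)}{x y}$
$\frac{C_{2a}}{C_{1a}} = \frac{C_2 (1 + \varepsilon_2)}{C_1 (1 + \varepsilon_1)}$
ideally $\varepsilon_1 = \varepsilon_2$:
$\frac{C_{2a}}{C_{1a}} = \frac{n C_{1a}}{C_{1a}} = \frac{n C_1 (1 + \varepsilon)}{C_1 (1 + \varepsilon)} = n$
[Figure: common-centroid arrangement C1 C2 / C2 C1 with poly top plate, poly bottom plate, poly etch matching, well region and well contacts]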
Capacitor Matching
#3
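unit-sized capacitors C1 are square
non-unit-sized capacitors C2 are rectangular and usually between 1 and 2 times the unit-sized capacitor (K > 1)
$K = \frac{C_2}{C_1} = \frac{A_2}{A_1} = \frac{x_2 y_2}{x_1^2}$
$\frac{P_2}{A_2} = \frac{P_1}{A_1} \;\Rightarrow\; \frac{P_2}{P_1} = \frac{A_2}{A_1} = K \;\Rightarrow\; K = \frac{x_2 + y_2}{2 x_1}$
$y_2 = x_1 \left( K \pm \sqrt{K^2 - K} \right)$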
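The sizing rule can be evaluated numerically. The sketch below takes the larger root, y2 = x1 (K + sqrt(K^2 - K)), and derives x2 from the area condition x2 y2 = K x1^2; the function name is hypothetical. The example uses K = 1.314 with a 10 µm unit capacitor, which appears to correspond to the exercise result quoted later in these notes (y2 = 19.56 µm, x2 = 6.717 µm) if the 2.314-unit capacitor is split into one unit capacitor plus a 1.314-unit rectangle.

```python
import math


def nonunit_capacitor(k, x1):
    """Side lengths (x2, y2) of a rectangle with K times the area of a square
    unit capacitor of side x1 and the same perimeter-to-area ratio."""
    y2 = x1 * (k + math.sqrt(k * k - k))
    x2 = k * x1 * x1 / y2            # from x2 * y2 = K * x1^2
    return x2, y2


x2, y2 = nonunit_capacitor(1.314, 10.0)            # lengths in micrometres
ratio_unit = 4 * 10.0 / (10.0 * 10.0)              # P1 / A1 of the unit square
ratio_rect = 2 * (x2 + y2) / (x2 * y2)             # P2 / A2 of the rectangle
print(round(x2, 3), round(y2, 2), ratio_unit, round(ratio_rect, 6))
```

Because both shapes have the same perimeter-to-area ratio, a uniform overetch changes both capacitances by the same relative amount and the ratio K is preserved to first order.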
4 units
K=1 ... 2
resistor value:
$R = R_{sq} \frac{L}{W}, \qquad R_{sq} = \frac{\rho}{t}$
material: many different materials can be used. They have different non-ideal effects. Absolute accuracy is low (±20% or less), matching can be made to be in the order of 1% at most.
polysilicon (salicided and non-salicided in C05M-A and C05M-D process)
diffusions or ion-implanted regions (n/p-diff, n-well)

material      | typ Rsq | nonideality
metal1        | 72 mΩ   | not used
metal2        | 55 mΩ   | not used
metal3        | 34 mΩ   | not used
salicid poly  | 2.3 Ω   | parasitic cap
n+ diff sal   | 2.3 Ω   | v dep, nonlin
p+ diff sal   | 2.1 Ω   | v dep, nonlin
unsal n+ poly | 325 Ω   | parasitic cap
n+ diff unsal | 50 Ω    | v dep, nonlin
p+ diff unsal | 70 Ω    | v dep, nonlin
n-well        | 1.3 kΩ  | v dependent
2.11 Rsq
matched resistors
Where does noise coupling occur?
every time a digital gate changes its state, a glitch is injected on the digital power supply and in the surrounding substrate
direct ohmic connections (power supply line)
via electromagnetic fields (e.g. capacitive coupling in and from substrate)
How can noise be reduced?
use of different power supply lines
lay out analog and digital circuitry in different sections of the chip
protect analog layout by guard rings
use shields connected to power and ground
analog part pad pin power supply
digital part
power supply
power supply
Use of shields
analog interconnect digital interconnect
ground shield
n+
n+ n-well
n+ p- substrate
VDD n+ n-well
Checking Layouts
Design Rule Checker (DRC). This is a program that checks each piece of the layout against the process design rules. This is a slow process:
canonicalize the layout into a set of leading and trailing non-overlapping mask edges. Some Boolean mask operations may be needed.
determine electrical connectivity and label each edge with the node it belongs to.
test each edge end point against neighboring edges to check for spacing (leading edges) and width (trailing edges) violations.
Layout vs. Schematic (LVS). First a netlist is extracted from the layout: use the electrical info generated by the DRC and then recognize transistors as juxtapositions of channel with diffusion. Then see if the extracted netlist is isomorphic to the schematic netlist. This is done by a coloring algorithm:
initialize all nodes to the same color
compute a new color for each node as some hashing function involving the colors of connected (i.e., thru a fet) nodes.
nodes that have a unique color are isomorphic to the similarly colored node in the other network
worry about parallel fets, ambiguous nodes
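The coloring idea can be sketched as iterative color refinement on a graph. This is a simplified illustration, not a production LVS algorithm: nets and devices are plain graph nodes without terminal or device-type labels, the "hash" is a rank over (own color, sorted neighbour colors) signatures, and an equal color profile is only a necessary condition for isomorphism. The netlists and all names are hypothetical.

```python
from collections import Counter


def refine_colors(adj):
    """Recolor nodes from their own color and their neighbours' colors until stable."""
    colors = {v: 0 for v in adj}                  # all nodes start with the same color
    for _ in range(len(adj)):
        sigs = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj}
        palette = {s: i for i, s in enumerate(sorted(set(sigs.values())))}
        new_colors = {v: palette[sigs[v]] for v in adj}
        if new_colors == colors:                  # no class was split: stable
            break
        colors = new_colors
    return colors


def same_color_profile(net_a, net_b):
    """Color both netlists in one run and compare how often each color occurs."""
    union = {("a", v): [("a", u) for u in nbrs] for v, nbrs in net_a.items()}
    union.update({("b", v): [("b", u) for u in nbrs] for v, nbrs in net_b.items()})
    colors = refine_colors(union)
    count_a = Counter(c for (side, _), c in colors.items() if side == "a")
    count_b = Counter(c for (side, _), c in colors.items() if side == "b")
    return count_a == count_b


# Inverter: two fets (mp, mn) and four nets; edges connect a fet to its nets.
schematic = {"in": ["mp", "mn"], "out": ["mp", "mn"], "vdd": ["mp"], "gnd": ["mn"],
             "mp": ["in", "out", "vdd"], "mn": ["in", "out", "gnd"]}
layout = {"n1": ["m1", "m2"], "n2": ["m1", "m2"], "n3": ["m1"], "n4": ["m2"],
          "m1": ["n1", "n2", "n3"], "m2": ["n1", "n2", "n4"]}
broken = {"in": ["mp"], "out": ["mp", "mn"], "vdd": ["mp"], "gnd": ["mn"],
          "mp": ["in", "out", "vdd"], "mn": ["out", "gnd"]}   # mn gate left floating

print(same_color_profile(schematic, layout))   # True
print(same_color_profile(schematic, broken))   # False
```

In the inverter example vdd and gnd end up with the same color, as do the two fets: these are the ambiguous nodes mentioned above, which a real tool resolves with device types and terminal labels.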
Coming Up...
Next topic: Small signal fet model Readings for next time Weste:
Johns&Martin:
2.3 (CMOS layout design rules) 2.4 (analog layout design considerations)
Optional
#1
Ex vlsi15.1 (difficulty: easy): Assume the 0.5 µm Alcatel Mietec process. Use the rules to calculate the minimal area and perimeter of the following layout structure. Result: a) AJ1 = 4.5 µm², AJ2 = 3.188 µm², AJ3 = 2.25 µm², PJ1 = 6 µm, PJ2 = 6 µm, PJ3 = 1.5 µm (see Johns&Martin pp99)
J1 Q1
J3 Q2
J2
#2
Johns&Martin pp110: 2.3 (difficulty: easy): Show a layout that might be used to match two capacitors of size 4 and 2.314 units, where a unit-sized capacitor is 10 µm x 10 µm. Result: y2 = 19.56 µm, x2 = 6.717 µm
Today's handouts: (1) Lecture Slides (2) Problem Set #5 (3) Inverter Layout Tutorial
What's the schematic for this cell?
What are the fat fets?
Cell was designed for placement under a metal2/metal3 routing grid. How was the layout affected by this design requirement?
Replicating Cells
What does this cell do? What if we want to replicate this cell vertically, i.e., make a stack of the cells, to process many bits in parallel?
What nodes are shared among the cells?
What nodes aren't shared?
How should we arrange the cells vertically?
Vertical Replication
Place shared geometry symmetrically about the shared boundary. Place items that aren't to be shared 1/2 min spacing rule from the shared boundary.
Reflect the cell about the X axis so that Pfets are next to each other: this avoids large ndiff/pdiff spacing. Run shared control signals vertically: they'll wire themselves up automatically?
Solution: we have to do the routing for vertical intercell signals for a pair of cells, then replicate the pair (complete with routing) vertically.
Building a Datapath
It's often the case that we want to operate on many bits in parallel. A sensible way to arrange the layout of this sort of logic is as a datapath where data signals run horizontally between functional units and control signals run vertically to all the bits of a particular functional unit:
control bit #3 bit #2 bit #1 bit #0 data
Logic that generates the control signals can be placed at the bottom of the datapath. If the control logic is complicated or irregular, it might be placed in a separate standard cell block and only the control signal buffers placed just below the datapath. Although it's tempting to run control signals in poly (so they can control fets), this is unwise for tall datapaths because of poly resistance (e.g., 32 bits x 20u/bit = 640u = ~1000 squares = ~20k ohms!)
BOOLE
OP EN
OP EN
EN
MULT
Adder Datapath
power strapping (M1=GND, M3=VDD)
32-bit carry-lookahead adder
tristate output enable control logic
32-bit register w/ tristate driver
Shifter Datapath
>>4 >>2
>>8
What's this cell do?
What are the fat fets?
Cell was designed for placement under a metal2/metal3 routing grid. How was the layout affected by this design requirement?
BIT
BIT
word line
How are neighboring cells placed?
Isn't the word line a long poly wire?
Where's the p-substrate contact?
Coming Up...
Next time: Scaling effects, fundamental limits. Submicron design issues. Power dissipation and packaging. Readings for next time Weste: 6.3.7 through 6.3.9
I see I see a supercomputer the size of a sugar cube...! Neat. Where do I invest?
Todays handouts: (1) Lecture Slides (2) Mead and Conway, Chapter 9 (1981)
Scaling
Over time, process improvements will allow MOSFETs to scale down by some factor α:
W/α, L/α, t/α, xj/α, α·NA, tox/α
What happens?
Often, different dimensions will scale at different rates. But for an overall picture of what the future portends, there are two major scaling models:
1. Constant Voltage Scaling
All spatial dimensions scale equally: W → W/α, L → L/α, tox → tox/α
and some other dimensions do as well: d → d/α (depletion thickness), NA → α·NA (doping)
2. Constant Field: scale VDD too: V → V/α, so that electric fields remain the same
First, let's consider constant field scaling, and use basic MOSFET models to predict the effect of scaling by α:
Parameters | Effect
W/L |
Cg = Cox W L |
Id ∝ Cox (W/L) (Vgs-Vt)^2 |
device power = V I |
Area = W L |
device power / Area |
Rdiff |
Rmetal |
Rpoly |
Speedup!
L
e = L/(E) Transit time scales as ___________ Can also compute as time to discharge gate capacitance:
Interconnect
Local (metal) Interconnect Delay = RC
L W
I
d
R = ___________ Scaled R = R ________ C = ____________ Scaled C = C ________ Scaled Delay = delay ___________ This turns out to be an overoptimistic prediction more later...
Scaling Table
First Order Scaling (Weste Table 4.12)
In lateral scaling, we only change the channel length L
Scaling Model
Parameter | Constant Field | Constant Voltage
Length (L) | 1/a | 1/a
Width (W) | 1/a | 1/a
Voltage (V) | 1/a | 1
Gate oxide thickness (tox) | 1/a | 1/a
Current | 1/a | a
Transconductance | 1 | a
Junction Depth | 1/a | 1/a
Substrate Doping (Na) | a | a
Gate Field (E) | 1 | a
Depletion layer thickness | 1/a | 1/a
Load Capacitance (WL/tox) | 1/a | 1/a
Gate Delay (VC/I) | 1/a | 1/a^2
Resulting Influence
DC Power dissipation | | a
Dynamic Power Dissipation | | a
Power-delay product | | 1/a
Gate Area | | 1/a^2
Power-density (VI/A) | | a^3
Current Density | | a^3
Devices get faster, lower power, though current density goes up.
Devices get even faster, though overall power and power density rise
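The "resulting influence" rows follow from the primary rows by simple products and quotients, which is easy to check by adding exponents of the scaling factor. The sketch below is an illustration under the first-order relations named in the table (power = V*I, area = W*L, gate delay = V*C/I); the dictionary keys and function name are hypothetical. An exponent of -1 means the quantity scales as 1/a, +1 means it scales as a.

```python
def derived_exponents(p):
    """Exponents of the scaling factor for quantities derived from V, I, C, W, L."""
    power = p["V"] + p["I"]                  # P = V * I
    area = p["W"] + p["L"]                   # A = W * L
    delay = p["V"] + p["C"] - p["I"]         # t = V * C / I
    return {
        "power": power,
        "gate_area": area,
        "gate_delay": delay,
        "power_delay": power + delay,
        "power_density": power - area,       # V * I / A
    }


# Primary rows of the table, written as exponents of a.
constant_field = {"W": -1, "L": -1, "V": -1, "I": -1, "C": -1}
constant_voltage = {"W": -1, "L": -1, "V": 0, "I": 1, "C": -1}

cf = derived_exponents(constant_field)
cv = derived_exponents(constant_voltage)
print(cf)   # power 1/a^2, delay 1/a, power density unchanged
print(cv)   # power a, delay 1/a^2, power density a^3
```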
Die Size
With basic scaling of the same system, we'd just end up with smaller and smaller chips.
However, from year to year, the overall die size stays about the same or grows as we add features to the chip.
Fab improvements (mostly, bigger wafers) are what allow for bigger die. Because the die doesn't shrink, global interconnect, particularly clocks and on-chip buses, doesn't shrink either.
t/
2
Even worse: wire starts looking like lossy distributed RC wire: O(L^2) delay!
In the submicron domain, this increased significance of wire has led to major CAD industry turmoil.
Power Scaling
Power per chip increases with constant voltage scaling and when die size grows. How does this affect us?
Junction temperature is a function of power and thermal resistance θja to the environment.
Example: a 30 W chip at 27 °C ambient. Junction temp. = 27 °C + 30 W × θja
θja = 2 °C/W
Junction temp = ___________
θja = 0.1 °C/W
heat sink
chip
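The junction temperature example can be evaluated directly for the two thermal resistances shown. This is a small sketch of the formula Tj = Ta + P × θja; the function name is illustrative.

```python
def junction_temperature(ambient_c, power_w, theta_ja):
    """Tj = Ta + P * theta_ja, with theta_ja in degrees C per watt."""
    return ambient_c + power_w * theta_ja

for theta_ja in (2.0, 0.1):
    tj = junction_temperature(27.0, 30.0, theta_ja)
    print(f"theta_ja = {theta_ja} C/W -> Tj = {tj:.1f} C")
```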
In the submicron domain, it's difficult to scale VDD, so power faces the constant voltage scaling of 2. This adds impetus to the already-important goal of reducing power of VLSI systems. Some of the main ways of doing this:
1. Reduce unnecessary on-chip transitions by careful logic design, or by disabling the clock to idle systems.
2. Reduce voltage, use more parallelism.
3. Adiabatic logic.
Some limits
Current Density
J increases with α.
J = I/(W t)        scaled I = I/α        scaled J = J × α
[Figure: wire segment with labels L, W and I]
Metal migration imposes a limit on current density.
==> Thicker wires and more metal layers needed.
==> Increased fringing capacitance with thicker wires.
Subthreshold leakage
Subthreshold conductance is proportional to exp((Vgs - Vt)/(kT/q)).
We can scale Vt via ion implantation. kT/q = 0.025 V does not scale.
Vt falls =====> Subthreshold current ______________ exponentially.
Example:
Vt = 0.5 V means that leakage current time constant is 10^7
Vt = 0.1 V means that leakage current time constant is 10^1
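To get a feeling for the exponential dependence, the relative subthreshold current can be compared for the two threshold voltages of the example. This is only a sketch: it assumes an ideal exponent of (Vgs - Vt)/(kT/q) with no slope factor, and the function name is hypothetical.

```python
import math

THERMAL_VOLTAGE = 0.025  # kT/q in volts, as given on the slide

def relative_subthreshold_current(vgs, vt):
    """Current relative to its value at Vgs = Vt (ideal exponential model)."""
    return math.exp((vgs - vt) / THERMAL_VOLTAGE)

# Leakage of a device that is turned off (Vgs = 0) for both thresholds.
ratio = (relative_subthreshold_current(0.0, 0.1)
         / relative_subthreshold_current(0.0, 0.5))
print(f"Leakage at Vt = 0.1 V is about {ratio:.1e} times the leakage at Vt = 0.5 V")
```

In this idealized model, lowering Vt by 0.4 V raises the off-state leakage by several orders of magnitude.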
Threshold Variations
Threshold varies from transistor to transistor.
[Figure: inverter between VDD and ground with output Vout]
If the pullup has a big threshold and the pulldown has a small threshold, and the sum of variances > VDD, then the inverter will not invert (Vout = 0 V always). How likely is this?
But with 10,000,000 transistors on the chip, a broken chip is very likely.
Ultraviolet: λ = 0.3 µm
X-Ray Lithography: λ = __________
Synchrotron lithography?
Wavelength of an electron?
Cost of FABs.
Optical tricks.
Not really. VLSI is not yet really up against any fundamental physical constraint. The constraints that we're facing are technological hurdles. With sufficient economic incentive, technological hurdles are cleared. Wires are a lot more important than in the past.
Coming Up...
Next topic: MOS memories. Static and dynamic RAM cells. Single and double-ended bit line sensing. Multiport register files.
Readings for next time: Weste: 4.13
Semiconductor Memories
Usually the majority of transistors found in a modern system are devoted to data storage in the form of random-access memories. The need for increased densities and lower prices has driven the development of improved VLSI technology. Uses:
- main memory: high capacity, low cost
- cache memories, TLBs: fast access
- programming info (e.g., FPGA): non-volatile

Read-only memories: ROM (non-volatile!)
- Mask programmed
- Programmable ROM (PROM)
- Erasable PROM (EPROM)
- Electrically Erasable PROM (EEPROM)
Read/Write or Random Access memories: RAM
- Static RAM (SRAM)
  - Multiport SRAM (Register Files)
  - Content-Addressable Memories (CAM)
  - Non-volatile SRAM (NVRAM)
- Dynamic RAM (DRAM)
  - Serial-access video memories (VRAM)
  - Synchronous DRAM (SDRAM)
  - RAMBUS
  - ...
Design Tradeoffs
density: bits/unit area. Usually higher density also means lower cost per bit. Improvements due to finer lithography, better capacitor structures, new materials with higher dielectric constants.
Speed: access time (latency) and bandwidth. Improvements due to better sensing (smaller voltage swing), increased parallelism (overlapped accesses), faster I/O.
Power consumption: want power to depend on access pattern not quantity of bits stored. Improvements due to lower supply voltage.
Memory Architecture
[Figure: memory array with 2^N rows (word lines, driven by the row address decoder from N address bits) and 2^M columns (bit lines); the address has N+M bits]
- Most memory layouts are folded, i.e., D < M. Why?
- What are the practical upper bounds on M and N?
- What if you want even more memory?
- Why only one bit per cell? (Not a silly question!)
- Why are page-mode accesses a good idea?
ROM Circuits
NOR-based ROM array
[Figure: NOR ROM array with word lines R1 to R4, bit lines C1 to C4 and a shared ground]

R1 R2 R3 R4 | C1 C2 C3 C4
 1  0  0  0 |  0  1  0  1
 0  1  0  0 |  0  0  1  1
 0  0  1  0 |  1  0  0  1
 0  0  0  1 |  0  1  1  0
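The behavior of the NOR-based array can be mimicked in a few lines. This is a sketch only: the contents are taken from the table above, the word lines are assumed to be one-hot, and the function names are hypothetical. In a NOR ROM a pulldown transistor at a (row, column) crossing stores a 0.

```python
# Column outputs (C1..C4) for each selected row (R1..R4), from the table above.
ROM_CONTENTS = [
    (0, 1, 0, 1),  # R1
    (0, 0, 1, 1),  # R2
    (1, 0, 0, 1),  # R3
    (0, 1, 1, 0),  # R4
]

def pulldown_present(row, col):
    """A NOR ROM stores a 0 by placing a pulldown transistor at (row, col)."""
    return ROM_CONTENTS[row][col] == 0

def read_row(word_lines):
    """Return the bit-line values for a one-hot tuple of word lines."""
    assert sum(word_lines) == 1, "exactly one word line may be asserted"
    row = word_lines.index(1)
    # A bit line stays high unless the selected row has a pulldown on it.
    return tuple(0 if pulldown_present(row, col) else 1 for col in range(4))

print(read_row((0, 0, 1, 0)))
```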
ROM Layout
[Figure: ROM layout with VDD, GND and shared ground rails]
- Which are the word lines? The bit lines?
- Why are the word lines strapped with M2?
- What layers change when programming changes?
- How often should signals be refreshed?
ROM Performance
tACCESS = tROW DECODE + tCOLUMN + tCOL DECODE

tROW DECODE: If the ROM is large, row decode logic is just a small percentage of total area, so we can make the driver for the word line large and thus fast. Note that we need to strap the poly word line to eliminate slowdown due to poly resistance.

tCOL DECODE: As with the row decode logic, we can increase speed by increasing the size of the transistors in this section.

tCOLUMN: We want small program transistors to keep the total area of the ROM as small as possible. Also, increasing the size of the pulldowns increases the load on both word and bit lines. This means we're limited in the speed we can achieve in pulling down the column. If CPD,DRAIN = 10 fF and we have 128 rows:

tCOLUMN = C ΔV / IAV = (10 fF)(128)(2.5 V)/(30 uA) = 110 ns
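The column discharge estimate is a plain C·ΔV/I calculation and is easy to re-evaluate for other array sizes. A minimal sketch with the slide's numbers (the function and parameter names are illustrative):

```python
def column_discharge_time(c_drain, rows, delta_v, i_avg):
    """t = C * dV / I, where C is the total drain capacitance on the bit line."""
    return c_drain * rows * delta_v / i_avg

t_column = column_discharge_time(10e-15, 128, 2.5, 30e-6)
print(f"tCOLUMN = {t_column * 1e9:.0f} ns")
```

The exact value is a little under 107 ns, which the slide rounds to 110 ns.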
Sense Amplifiers
Let's speed things up by sensing small changes in the bit line voltage using a sense amplifier:
[Figure: ROM columns C0, C1 with word lines R1, R2, column select transistor MC and the sense amp]
When the bit line is not pulled down, V1 = VDD and V2 = VREF - Vth = 2 V, so M3 is off and M4 is on and the output is pulled low. When a bit line pulldown is turned on, V2 starts to drop and M2 conducts well enough so that V1 drops to V2, since MC >> M1. When V1 and V2 drop 0.5 V to 1.5 V, M3 is strongly conducting and M4 is weakly conducting, so the output goes high. So a small ΔV on the bit line produces a large output swing.
SRAM Circuits
[Figure: SRAM cell and column circuitry. Labels: precharge or VDD on both bit lines, access FETs, inverter pullup, inverter pulldown, GND, bit, clk, write, wdata. Annotation: a long-channel FET is used as a current source; use CLK if possible to reduce power and improve speed.]
Pulldowns do the work when access fet is turned on, pullups can be small to save space and make the cell easy to write.
[Figure: SRAM cell read waveforms (volts versus time) for word, data and the two bit lines]
Choose WPU, WACCESS, WINV so that:
- fast bit line recovery when WORD goes low
- don't want to flip the selected cell on read (V1 < VTH,INV)
- large ΔV on BIT lines to speed up sensing
- minimize cell size
[Figure: SRAM cell sizing example with W/L values 4.8/0.6, 4.8/0.6 and 0.9/7.2 (gate at VCS); nodes bit, V1, VDD]
[Figure: address decoder with inputs A2, A1, A0]
One can use predecode logic to decode blocks of addresses, which are then further decoded using smaller AND gates. The address lines going to the predecode gates are less loaded and all gates have smaller fan-in, so decode happens faster. Layout works better too!
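The predecoding idea can be illustrated with a small functional model. This is a sketch under assumptions: the split of a 3-bit address into a 2-bit and a 1-bit group, and all names, are chosen for illustration only.

```python
def one_hot(value, bits):
    """Fully decode a `bits`-wide value into 2**bits one-hot lines."""
    return [1 if value == i else 0 for i in range(2 ** bits)]

def predecoded_word_lines(address, low_bits=2, high_bits=1):
    """Decode an address in two stages: predecode each group, then AND."""
    low = one_hot(address & ((1 << low_bits) - 1), low_bits)
    high = one_hot(address >> low_bits, high_bits)
    # Final stage: one 2-input AND gate per word line.
    return [h & l for h in high for l in low]

for address in range(8):
    print(address, predecoded_word_lines(address))
```

Each final gate has only two inputs regardless of the address width, which is where the fan-in advantage comes from.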
An alternative design that can be easily expanded without worrying about unintentionally flipping the cell on reads is shown below.
[Figure: multiport cell with write line wd and read lines rd0, rd1; transistor sizes PU = 2/1, PD = 4/1, 4/1, 5/1, 2/1, 2/1 and PU = 2/2, PD = 2/3]
Content-addressable RAM
By adding two transistors to the 6-T SRAM cell one can form an XOR gate to compare the cell contents to data on the bit lines. The output of this logic can drive a pulldown in a distributed NOR gate to form a word match signal for a content-addressable memory (CAM).
[Figure: CAM cell with word line, XOR gate and match line]
XOR output node: this node goes high if the data on the bit lines doesn't match the data in the cell.
Match line: this node will be pulled down if any bit of the word doesn't match.
Read and Write cycles: like before.
Match cycle: place data on the bit lines but don't assert the word line.
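The match operation can be modeled functionally as a per-bit XOR followed by a NOR over the whole word. The sketch below assumes words stored as tuples of bits; the names are hypothetical.

```python
def cam_match(stored_words, search_word):
    """Return one match bit per stored word (1 = all bits equal)."""
    matches = []
    for word in stored_words:
        # Per-bit XOR is 1 where the search data differs from the cell.
        mismatches = [cell ^ bit for cell, bit in zip(word, search_word)]
        # Distributed NOR: any mismatch pulls the match line low.
        matches.append(0 if any(mismatches) else 1)
    return matches

tags = [(1, 0, 1, 1), (0, 0, 1, 0), (1, 1, 1, 1)]
print(cam_match(tags, (0, 0, 1, 0)))
```

The resulting match vector is what would drive the word lines of a companion RAM.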
CAM Architecture
The word match lines from the CAM array can be used as WORD lines in a companion RAM to read out other data associated with the tag stored in the CAM. Uses: fully-associative caches, translation lookaside buffers (TLBs), ...
[Figure: dynamic memory cell with capacitances CW (write line), CC (cell) and CR (read line); signals write, wdata, rdata]
Data is stored on CC. It's not destroyed on read, but will leak away through the write transistor. CW >> CC

WRITE: After precharge, CW is charged high. When WRITE is asserted, CW shares charge with CC and dominates since CW >> CC. If WDATA is asserted, both CW and CR will be discharged, writing a 0 into the cell; otherwise a 1 will be written.

READ: After precharge, CR is charged high. When READ is asserted, CR is pulled low if there's a stored 1 or remains unchanged if there's a stored 0. A sense amp is usually used to speed up the availability of read data.

Pros: little or no static power, smaller than SRAM
Cons: needs refresh, needs time to precharge
[Figure: DRAM storage capacitor, C = εA/d; labels: VREF, bit, thinner film, Ta2O5 dielectric]
[Figure: DRAM sensing circuit with bit-line capacitances C, dummy cells C/2, precharge (PC) and column select (CS) devices, VDD]
Signals: lbit, rbit; precharge (PC); row sel (RN); dummy sel (DSL,R); column sel (CS)
Sequence:
1. precharge bit lines, discharge dummy cells
2. read out bit, opposite dummy
3. amplify difference, restore bit cell
Coming Up...
Next time:
- Driving large loads: I/O circuits (edge rates, ESD protection, latch-up)
- Clock generation and distribution (skew)
Readings for next time: Weste: 5.4.2, 5.5, 5.6
VLSI Design I
Defect Mechanisms and Fault Models
Overview: Defects, Fault models
Goal: You know the difference between design and fabrication defects. You know sources of defects and you can estimate yield. You can handle fault models at different abstraction levels.
MicroLab, VLSI-19 (1/32)
Design Defects
[Figure: Design compared against Specification]
- It helps to have a specification to compare against!
- If the specification is written in a hardware description language from which the design is synthesized, then the design should be defect-free (modulo bugs in the synthesis software!). Of course the specification may be buggy...
- Everyone feels better if the design/specification are run in the environment in which they will be used. For example, in testing a processor chip, one might boot the operating system and run some key programs, all under simulation. This leads to the need for lots of simulation cycles, e.g., as provided by a hardware emulation system. Nowadays these are built using a small army of FPGAs. Other choices: in-circuit emulation, cycle-based simulators.
Manufacturing Defects
Goal: verify every gate is operating as expected
- Defects from misalignment, dust and other particles, stacking faults, pinholes in dielectrics, mask scratches & dirt, thickness variations, layer-to-layer shorts, discontinuous wires (opens), circuit sensitivities (VTH, LCHANNEL). Find during wafer probe.
- Defects from scratching in handling, damage during bonding to lead frame, manufacturing defects undetected during wafer probe (particularly speed-related problems). Find during testing of packaged parts.
- Defects from damage during board insertion (thermal, ESD), infant mortality (manufacturing defects that show up after a few hours of use). Also noise problems, susceptibility to latch-up... Find during testing/burn-in of boards.
- Defects that only appear after months or years of use (metal migration, oxide damage during manufacture, impurities). Found by customer (oops!).
Cost of replacing defective component increases by an order of magnitude with each stage of manufacture.
[Figure: test flow]
- process steps: wafer fabrication, bonding, packaging
- electrical parameters: currents, resistances, threshold voltages, ...
- function test: test for logical faults; binary test sequences are applied to the device under test (DUT)
Defect classification
defects occur at different fabrication steps:
- defects at wafer fabrication
- defects at chip packaging
- defects during chip lifetime
effect:
- normally occur at primary inputs or outputs
- easy to detect
Yield modeling
- defects can produce faults
- yield is the percentage of fault-free chips
- yield influences chip cost
- yield models are necessary to predict chip cost
- local defects produce most faults
- assumption: local defects are statistically independent and occur with probability p

Binomial distribution (Bernoulli trials), Pr{K=k} = Pr{k of n areas are faulty}:
\[ \Pr\{K = k\} = \binom{n}{k} p^{k} (1 - p)^{n-k} \]
Probability that a chip is fault free:
\[ \Pr\{K = 0\} = e^{-DA} \]
Murphy, with normalized density function f(D):
\[ Y = \int_0^{\infty} e^{-AD} f(D)\, dD \]
Calculation of yield with Murphy's density function f(D) (for high yield): Y1, Y2, Y3?
[Figure: density functions f(D): f1 concentrated at D0; f2 triangular with peak 1/D0 at D0, reaching zero at 0 and 2D0; f3 uniform with height 1/(2 D0) between 0 and 2D0]
\[ Y_2 = \left( \frac{1 - e^{-AD_0}}{AD_0} \right)^{2} \]
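The closed-form yield for the triangular density can be checked against a numerical evaluation of Murphy's integral. The sketch below assumes the triangular f2(D) rises linearly to 1/D0 at D0 and falls back to zero at 2D0; function names and the sample values of A and D0 are illustrative.

```python
import math

def yield_poisson(area, d0):
    """Fault-free probability for a fixed defect density: exp(-A * D0)."""
    return math.exp(-area * d0)

def yield_murphy_triangular(area, d0):
    """Closed form Y2 = ((1 - exp(-A*D0)) / (A*D0))**2."""
    x = area * d0
    return ((1.0 - math.exp(-x)) / x) ** 2

def yield_murphy_numeric(area, d0, steps=20000):
    """Midpoint-rule integral of exp(-A*D) * f2(D) for the triangular f2."""
    total = 0.0
    width = 2.0 * d0 / steps
    for i in range(steps):
        d = (i + 0.5) * width
        f2 = d / d0 ** 2 if d <= d0 else (2.0 * d0 - d) / d0 ** 2
        total += math.exp(-area * d) * f2 * width
    return total

area, d0 = 1.0, 1.5   # assumed example values (area in cm^2, defects per cm^2)
print(yield_poisson(area, d0),
      yield_murphy_triangular(area, d0),
      yield_murphy_numeric(area, d0))
```

For the same mean defect density, the triangular model predicts a higher yield than the simple exponential, because part of the dies see a lower-than-average density.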
Exhaustive testing is not only impractical, it's not necessary! Instead we only need to verify that no faults are present, which may take many fewer vectors.
benefits of redundant circuits:
- redundancy for higher functionality security
- redundancy to eliminate hazards
disadvantages of redundant circuits:
- faults not detectable (masking effect)

Example of a TTL NAND gate with many defects describable with the stuck-at fault model.
[Figure: TTL NAND gate with resistors R1 to R4, transistors T1 to T4, inputs I1, I2 and output O]
Fault reduction
fault collapsing:
- fault equivalence
- fault dominance
- single faults, multiple faults

fault detection:
- fault-free function: f(x)
- with fault α: f_α(x)
- test vectors x detect the fault if the following condition is fulfilled: f(x) ⊕ f_α(x) = 1
- fault equivalence: f_α(x) = f_β(x)
- fault dominance: T_α ⊆ T_β, fault β dominates fault α

A: 0 0 1 1
B: 0 1 0 1
[Figure: gate with inputs A, B and output C]
fault classes: /1 <=> /1 <=> /1, /0 => /0, /0 => /0
fault free
B
stuck-at
stuck-open
asop
bsop
vddsop A B Y /0 /0 /0 a b vdd Y 0 0 1 0 1 0 1 0 0 1 1 0
disadvantage
- less accurate
- not useful for all sub-functions
Coming Up...
Next topic: Test pattern generation and fault simulation
Readings for next time: Weste:

Ex vlsi19.3 (difficulty: easy): Result: Y5mm2 = 0.91 (high yield), Y1cm2 = 0.24 (low yield equation), see vlsi-19/13
Ex vlsi19.4 (difficulty: easy): Discuss faults due to defects at the TTL NAND gate on transparency 22. What kind of stuck-at fault do you have if a) R1 is an open circuit, b) open at I1, c) open in R2? Result: a) O s-a-1, b) I1 s-a-1, c) O s-
VLSI Design I
Test Pattern Generation and Fault Simulation
Overview: Test pattern generation, Fault simulation
Goal: Design-for-testability terms like controllability and observability are known. You are familiar with test pattern algorithms as well as with testability measure metrics.
MicroLab, VLSI-20 (1/26)
Testers
The device under test (DUT) can be a site on a wafer or a packaged part.
Each pin on the chip is driven/observed by a separate set of circuitry which typically can drive the pin to one data value per cycle or observe (strobe) the value of the pin at a particular point in a clock cycle. Timing of input transitions and sampling of outputs is controlled by a small (<< # of pins) number of high-resolution timing generators. To increase the number of possible input patterns, different data formats are provided:
- non-return-to-zero (NRZ)
- return-to-zero (RTZ)
- return-to-one (RTO)
- surround-by-complement (SBC)
[Figure: waveforms of the data formats over one cycle tCYCLE]
- design for testability: controllability, observability
- system designer needs DFT knowledge
- ad-hoc approaches to augment controllability: partitioning, more test pads, multiplexer approach
- structured methods: scan-path, built-in logic block observation (BILBO), boundary-scan, signature analysis, etc.
Boolean difference
Algebraic method: Boolean difference

Circuit function with input vector x:
\[ f(x) = f(x_1, \ldots, x_n) \]
For the i-th component of vector x held at a fixed value we define
\[ f_i(1) = f(x_1, \ldots, x_{i-1}, 1, x_{i+1}, \ldots, x_n) \]
\[ f_i(0) = f(x_1, \ldots, x_{i-1}, 0, x_{i+1}, \ldots, x_n) \]
Definition of the Boolean difference:
\[ \frac{\partial f(x)}{\partial x_i} = f(x_1, \ldots, x_i, \ldots, x_n) \oplus f(x_1, \ldots, \bar{x}_i, \ldots, x_n) = f_i(0) \oplus f_i(1) \]
Circuit with fault α: stuck-at-1 at input x_i.
To detect s-a-1 faults, the two functions f(x) and f_α(x) = f_i(1) must produce different results, so the test vector set is defined by T = 1 with
\[ T = f(x) \oplus f_\alpha(x) = \bar{x}_i \, \frac{\partial f(x)}{\partial x_i} \]
For s-a-0 faults:
\[ T = f(x) \oplus f_\alpha(x) = x_i \, \frac{\partial f(x)}{\partial x_i} \]
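For small circuits the Boolean difference can simply be evaluated by enumeration. The sketch below is an illustration under assumptions: the example function f = (x1 + x2)·x3 and all helper names are hypothetical, and inputs are indexed from 0.

```python
from itertools import product

def boolean_difference(f, i, n):
    """Return the input vectors x for which df/dx_i = f_i(0) XOR f_i(1) is 1."""
    vectors = []
    for x in product((0, 1), repeat=n):
        x0 = x[:i] + (0,) + x[i + 1:]
        x1 = x[:i] + (1,) + x[i + 1:]
        if f(*x0) ^ f(*x1):
            vectors.append(x)
    return vectors

def stuck_at_tests(f, i, n, stuck_value):
    """Test set T: x_i must be the opposite of the stuck value and df/dx_i = 1."""
    return [x for x in boolean_difference(f, i, n) if x[i] != stuck_value]

def f(x1, x2, x3):
    return (x1 | x2) & x3

print("x1 s-a-0:", stuck_at_tests(f, 0, 3, 0))
print("x1 s-a-1:", stuck_at_tests(f, 0, 3, 1))
```

Enumeration grows as 2^n, so this only serves to check hand calculations on small examples.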
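Properties of the Boolean difference:
\[ \frac{\partial \bar{f}(x)}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} \]
\[ \frac{\partial [f(x) g(x)]}{\partial x_i} = f(x) \frac{\partial g(x)}{\partial x_i} \oplus g(x) \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_i} \]
\[ \frac{\partial [f(x) + g(x)]}{\partial x_i} = \bar{f}(x) \frac{\partial g(x)}{\partial x_i} \oplus \bar{g}(x) \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial f(x)}{\partial x_i} \frac{\partial g(x)}{\partial x_i} \]
For g(x) independent of x_i:
\[ \frac{\partial [f(x) g(x)]}{\partial x_i} = g(x) \frac{\partial f(x)}{\partial x_i} \]
\[ \frac{\partial [f(x) + g(x)]}{\partial x_i} = \bar{g}(x) \frac{\partial f(x)}{\partial x_i} \]
Further:
\[ \frac{\partial [\bar{f}(x) + \bar{g}(x)]}{\partial x_i} = \frac{\partial [f(x) g(x)]}{\partial x_i} \]
\[ \frac{\partial [f(x) \oplus g(x)]}{\partial x_i} = \frac{\partial f(x)}{\partial x_i} \oplus \frac{\partial g(x)}{\partial x_i} \]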
[Figure: example circuit with inputs x1 to x4, gates G1 to G5, internal nodes s1, s2, s3 and output y]
disadvantage:
- conflicts lead to time-consuming dummy calculations
- not usable for large circuits
Step 1: Sensitize circuit. Find input values that produce a value on the faulty node that's different from the value forced by the fault. For our S-A-1 fault above, we want the output of the AND gate to be 0.
- Is this always possible? What would it mean if no such input values exist?
- Is the set of sensitizing input values unique? If not, which should one choose?
- What's left to do?
[Figure: example circuit (inputs x1 to x4, gates G1 to G5, output y) with an S-A-1 fault at node Xs]
Step 2: Fault propagation. Select a path that propagates the faulty value to an observed output (y in our example).
[Figure: example circuit (inputs x1 to x4, gates G1 to G5, output y) with an S-A-1 fault at node Xs]
Step 3: Line justification. Find a set of input values that enables the selected path (backtracking).
- Is this always possible? What would it mean if no such input values exist?
- Is the set of enabling input values unique? If not, which should one choose?
[Figure: example circuit (inputs x1 to x4, gates G1 to G5, output y) with an S-A-1 fault at node Xs]
PODEM
- The PODEM algorithm is simpler to understand than the D-algorithm
- a backtrack (branch-and-bound) algorithm is used
- small steps to reach the objective
- go back if the objective leads to a dead-end
PODEM: Example
branch-and-bound tree:
- nodes represent decisions
- branches represent PI's
- example: x1 stuck-at-1
[Figure: branch-and-bound tree beginning at "start" with the decisions x1=0 and x2=1, next to the example circuit (inputs x1 to x4, gates G1 to G5, nodes s1 to s3, output y)]
line justification
start with the most difficult path to control
Sandia controllability/observability analysis program (SCOAP)
- each node in a circuit gets values for its controllability, observability and testability
- high values indicate nodes which are hard to control or to observe
- distinguish between "1" and "0" controllability
- distinguish between combinational and sequential values
We'd like to have a way to measure the observability of a node, i.e., some indication of how hard it is to observe the node at the outputs of the chip. During fault propagation we could choose the gate whose output was easiest to observe. Similarly, during backtracking we need a way to choose between alternative ways of forcing a particular value:
[Figure: gate with the annotation "want 0 here"]
In this case, we'd like to have a way to measure the controllability of a node, i.e., some indication of how easy it is to force the node to 0 or 1. During backtracking we could choose the input that was easiest to control.
OR gate:
  CC1(y) = min{CC1(x1), CC1(x2), CC1(x3)} + 1
AND gate:
  CC0(y) = min{CC0(x1), CC0(x2), CC0(x3)} + 1
  CC1(y) = CC1(x1) + CC1(x2) + CC1(x3) + 1

combinational "1" and "0" observability of a logic gate, dependent on output y and inputs x2, x3:
OR gate:  CO(x1) = CO(y) + CC0(x2) + CC0(x3) + 1
AND gate: CO(x1) = CO(y) + CC1(x2) + CC1(x3) + 1

initialization (N are internal nodes, X, Y are PI, PO's):
  CC0(X) = 1    CC0(N) = ∞    CO(Y) = 0
  CC1(X) = 1    CC1(N) = ∞    CO(N) = ∞
The testability measure assumes that the further a node is from an input/output, the harder it is to set/observe.

AND gate (inputs A, B, output Z):
  CC0(Z) = min[CC0(A), CC0(B)] + 1
  CC1(Z) = CC1(A) + CC1(B) + 1
  CO(A) = CO(Z) + CC1(B) + 1
  CO(B) = CO(Z) + CC1(A) + 1
OR gate (inputs A, B, output Z):
  CC0(Z) = CC0(A) + CC0(B) + 1
  CC1(Z) = min[CC1(A), CC1(B)] + 1
  CO(A) = CO(Z) + CC0(B) + 1
  CO(B) = CO(Z) + CC0(A) + 1
If more than one, choose min.
[Figure: example circuit annotated with CC0, CC1, CO triples]
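The SCOAP rules for AND and OR gates translate into a forward pass for controllability and a backward pass for observability. The following is a minimal sketch, assuming a netlist given in topological order; the three-gate example circuit z = (a AND b) OR c and all names are hypothetical.

```python
def scoap(primary_inputs, gates, primary_outputs):
    """gates: list of (output, kind, inputs) in topological order.

    kind is 'AND' or 'OR'. Returns dictionaries CC0, CC1 and CO.
    """
    cc0 = {x: 1 for x in primary_inputs}
    cc1 = {x: 1 for x in primary_inputs}
    for out, kind, ins in gates:
        if kind == "AND":
            cc0[out] = min(cc0[i] for i in ins) + 1
            cc1[out] = sum(cc1[i] for i in ins) + 1
        else:  # OR
            cc0[out] = sum(cc0[i] for i in ins) + 1
            cc1[out] = min(cc1[i] for i in ins) + 1
    co = {y: 0 for y in primary_outputs}
    for out, kind, ins in reversed(gates):
        # Value the other inputs must hold to let input i through.
        side = cc1 if kind == "AND" else cc0
        for i in ins:
            cost = co[out] + sum(side[j] for j in ins if j != i) + 1
            co[i] = min(co.get(i, cost), cost)  # fanout: keep the easiest path
    return cc0, cc1, co

netlist = [("n1", "AND", ["a", "b"]), ("z", "OR", ["n1", "c"])]
cc0, cc1, co = scoap(["a", "b", "c"], netlist, ["z"])
for node in ("a", "b", "c", "n1", "z"):
    print(node, cc0[node], cc1[node], co[node])
```

A test generator could use these numbers to pick the cheapest input during backtracking and the most observable gate during fault propagation.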
Fault simulation
goals of fault simulation:
- analyze circuit under fault conditions
- qualify test sequence, fault coverage
- reduce fault set during test generation
- good quality fault models necessary
fault simulation methods:
- parallel fault simulation
- concurrent fault simulation
- deductive fault simulation
alternative to fault simulation in test generation procedures:
tracing fault sensitive paths
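[Figure: parallel fault simulation of an OR gate. Each word holds bit positions 1 to 5. Inputs A = [00000] and B = [00000]; after fault injection through the masks MA and MB, A' = [01000] and B' = [00100]; the gate output is C = [01100]; after the mask MC, C' = [01110]]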
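The bit-vector example can be reproduced with ordinary integer operations, which is the essence of parallel fault simulation: one machine word carries the fault-free circuit and several faulty copies at once. The sketch assumes that bit position 1 is the fault-free copy and that each mask injects a stuck-at-1 fault; both are interpretations of the figure, not stated in it.

```python
WIDTH = 5  # bit position 1 (leftmost) is assumed to be the fault-free circuit

def word(bits):
    return int(bits, 2)

def show(value):
    return format(value, f"0{WIDTH}b")

a = word("00000")
b = word("00000")
mask_a = word("01000")   # position 2: A stuck-at-1
mask_b = word("00100")   # position 3: B stuck-at-1
mask_c = word("00010")   # position 4: C stuck-at-1

a_faulty = a | mask_a
b_faulty = b | mask_b
c = a_faulty | b_faulty          # one bitwise OR evaluates all five copies
c_faulty = c | mask_c

good = (c_faulty >> (WIDTH - 1)) & 1
detected = [pos for pos in range(2, WIDTH + 1)
            if ((c_faulty >> (WIDTH - pos)) & 1) != good]
print(show(c), show(c_faulty), detected)
```

Every bit position whose output differs from the fault-free bit marks a fault detected by this input vector.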
Fault Grading
So, you've constructed a set of test vectors using the techniques described here. Will they detect all the faulty parts? You could see how many different faults your vectors detect by inserting each possible fault one at a time, running the vectors, then checking to see if some output was different from the good machine on some cycle. This needs *lots* of simulation and is probably impractical for large circuits, even with a hardware-accelerated simulator. You can use the same sorts of statistical sampling techniques that other QA programs employ: randomly select a set of faults, fault grade your vectors on those faults and use standard statistical techniques to see if fault coverage exceeds a desired level. The level of confidence may be increased by increasing the number of samples.
Conclusion
- defects during chip fabrication are inevitable
- faults model defects on higher abstraction levels
- higher chip complexity, more gates and fewer pads reduce controllability, observability and thus testability
- test pattern generation is going to be time-consuming and thus costly
- structured design for test during chip development is required
Coming Up...
Next topic: Design for Testability
Readings for next time: Weste: Sections
Ex vlsi20.3 (difficulty: medium): The digital circuit suffers from an s-a-0 error. Try to find test patterns by means of the D-algorithm. If you don't succeed, use the Boolean difference to calculate the test patterns. Result: T=x1 x2 x3 x4 found by Boolean difference
[Figure: exercise circuit with input x4, gates G3 to G6, internal nodes s1, s2, s3 and output y]
Ex vlsi20.5 (difficulty: easy): a) circuit with stuck-at-0 fault at s1. b) circuit with stuck-at-0 fault at x1. Find all test patterns which detect the fault by means of the Boolean difference. Result equations: a) x=(x1+x2)x3, b) x=x1x2x3
Overview: Top-down design flow, VHDL hardware description language, test-bench methodology
Goal: You are able to design circuits with the VHDL language with behavioral, dataflow and structural modeling. You are familiar with the top-down design flow and the test-bench methodology.
MicroLab, VLSI-21 (1/94)
chapter 1
- graphic HDLs: SpecdChart, etc. (control & dataflow graphs)
- tabular HDLs: BIF, etc. (FSMD models in tabular forms)
- time-diagram HDLs: Waves, etc.
- Standardization
  - VHDL: IEEE Std 1076-1987 & 1076-1993
  - std_logic package: IEEE Std 1164-1993
  - Verilog-HDL: IEEE Std 1364-1995
Verilog-HDL
C-like concise syntax
Built-in types and logic representations. Oddly, this has led to slightly incompatible simulators from different vendors. Design is composed of modules.
Behavioral, structural, logic-level modeling
Synthesizable subset...
Easy to learn and use, fast simulation, good for logic.
Gateway Design Automation

Verilog-HDL
- simple & efficient language
- hardware driven language
- goal: automatic synthesis
language structures
- module (blocks or subblocks)
- #include (file structuring)
chapter 2
[Figure: waveforms of a, b, sum and carry over time, 5 to 40 ns]
Signal Values
- signal values are physically associated to wires
- the VHDL language supports signal types:
  - type: bit, values: '0', '1'
  - type: bit_vector, values: "0001", etc.

value   interpretation
U       un-initialized
X       forcing unknown
0       forcing 0
1       forcing 1
Z       high impedance
W       weak unknown
L       weak 0
H       weak 1
-       don't care
Resolved Signals
- it is common for components in a digital system to have multiple sources for the value of an input signal
- many designs use buses: a group of signals that can be shared among multiple sources
- the values on shared signals depend on the type of interconnection, like wired logic
- the signal values depend on the implementation
- the VHDL simulator has to resolve the signal's value
- the IEEE 1164 package offers the std_logic and std_logic_vector signal types as resolved versions of std_ulogic and std_ulogic_vector
[Figure: wired-or logic, where a resolved signal is necessary, compared with an unresolved signal]
chapter 3
Entity
- the design entity is a primary programming abstraction in VHDL
- the entity defines the interface of a component, without giving any information about the component behavior

[Figure: half adder block with inputs a, b and outputs sum, carry]

    entity HalfAdder is
      port (a,b : in bit;
            sum,carry : out bit);
    end HalfAdder;

    library IEEE;
    use IEEE.std_logic_1164.all;
    entity HalfAdder is
      port (a,b : in std_ulogic;
            sum,carry : out std_ulogic);
    end HalfAdder;

[Figure: further entity symbols: Mux4to1 (output z) and D_ff (ports d, clk, q, qNot, rNot)]
Architecture
- the design architecture is a primary programming abstraction in VHDL
- the architecture describes the internal behavior of a component, without giving any information about the component IOs
- the behavioral description can take many forms. These forms differ in the levels of abstraction and detail.

    architecture behavior of HalfAdder is
      -- comment: declaration of variables
    begin
      ...
    end behavior;

VHDL:

    library IEEE;
    use IEEE.std_logic_1164.all;
    entity FullAdder is
      port (a,b,ci: in std_logic;
            co,s: out std_logic);
    end FullAdder;
    architecture behavior of FullAdder is
      -- comment: declaration of variables
      ...
    end behavior;

Verilog-HDL:

    module FullAdder (a,b,ci,co,s);
      input a,b,ci;
      output co,s;
      /* comment: declarations of variables */
      ...
Concurrency
- The operation of digital systems is inherently concurrent
- Within VHDL, signals are assigned values using signal assignment statements (<=)
- Multiple signal assignment statements are executed concurrently

Concurrent signal assignment:

    architecture concurrent_behavior of HalfAdder is
    begin
      sum <= (a xor b) after 5 ns;
      carry <= (a and b) after 5 ns;
    end concurrent_behavior;

[Figure: waveforms of a, b, sum and carry over time, 5 to 40 ns]
Dataflow Model #1

    library IEEE;
    use IEEE.std_logic_1164.all;
    entity HalfAdder is
      port (a,b: in std_logic;
            carry,sum: out std_logic);
    end HalfAdder;

    architecture dataflow of HalfAdder is
    begin
      sum <= (a xor b) after 5 ns;
      carry <= (a and b) after 5 ns;
    end dataflow;
Dataflow Model #2
[Figure: full adder built from gates labeled L1 to L5 with internal signals s1, s2, s3]

    library IEEE;
    use IEEE.std_logic_1164.all;
    entity FullAdder is
      port (a,b,cIn: in std_logic;
            cOut,sum: out std_logic);
    end FullAdder;

    architecture dataflow of FullAdder is
      signal s1,s2,s3 : std_logic;
      constant gate_delay: Time := 5 ns;
    begin
      L1: s1 <= (a xor b) after gate_delay;
      L2: s2 <= (cIn and s1) after gate_delay;
      L3: s3 <= (a and b) after gate_delay;
      L4: sum <= (s1 xor cIn) after gate_delay;
      L5: cOut <= (s2 or s3) after gate_delay;
    end dataflow;
Signal Assignments
- simple signal assignments

    sum <= (a xor b) after 5 ns,
           (a or b) after 10 ns,
           (not a) after 15 ns;
    sig <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;

[Figure: resulting waveform over time, 5 to 40 ns]
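[Figure: circuit Comb with inputs in1, in2, internal signals s1 to s4 and output z; waveforms from 0 to 70 ns, with the events on in2, s2, s3 and z at 10 ns separated by delta delays (Δ, 2Δ, 3Δ)]

    architecture delta_delay of Comb is
      signal s1,s2,s3,s4: std_logic := '0';
    begin
      s1 <= not(in1);
      s2 <= not(in2);
      s3 <= not(s1 and in2);
      s4 <= not(s2 and in1);
      z  <= not(s3 and s4);
    end delta_delay;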
    out1 <= (a xor b) after 8 ns;
    out2 <= (a xor b) after 2 ns;

[Figure: waveforms of input, out1 (8 ns delay) and out2 (2 ns delay), and a comparison of the inertial and transport delay models; half adder signals a, b, sum, carry, s1, s2]
- Ex405 (difficulty: easy): Construct and test a VHDL module for generating the following waveforms.
  [Figure: waveforms a, b, c over time, 0 to 60 ns]
- Ex vlsi21 (difficulty: easy): Have a look at the exercises at the end of chapter 3 of "VHDL: Starter's Guide"
chapter 4
A concurrent signal assignment is a process:

    architecture behavior of MyBlock1 is
    begin
      c <= a and b after 5 ns;
    end behavior;

VHDL, 4 to 1 multiplexer (no inferred memory):

    architecture DemoExample of Multiplexer4to1 is
    begin
      process (a,b,c,d,sel)
      begin
        case sel is
          when "00" => z <= a;
          when "01" => z <= b;
          when "10" => z <= c;
          when "11" => z <= d;
          when others => z <= (others => '-');  -- don't care
        end case;
      end process;
    end DemoExample;

Verilog-HDL:

    module Multiplexer4to1(sel,a,b,c,d,z);
      input [1:0] sel;
      input [15:0] a,b,c,d;
      output [15:0] z;
      assign z = (sel == 2'd0) ? a :
                 (sel == 2'd1) ? b :
                 (sel == 2'd2) ? c :
                 (sel == 2'd3) ? d : 16'bx;
    endmodule
    library IEEE;
    use IEEE.std_logic_1164.all;
    use IEEE.numeric_std.all;   -- needed for the unsigned addition below

    entity Multiplier is
      port (a,b: in std_logic_vector(31 downto 0);
            m: out std_logic_vector(63 downto 0));
    end Multiplier;

    architecture behavioral of Multiplier is
      constant modulDelay: Time := 10 ns;
    begin
      process(a,b)
        variable bReg: std_logic_vector(63 downto 0);
        variable sum: unsigned(32 downto 0);   -- upper half plus carry bit
      begin
        bReg := x"00000000" & b;
        for index in 1 to 32 loop
          sum := '0' & unsigned(bReg(63 downto 32));
          if bReg(0) = '1' then
            sum := sum + unsigned(a);
          end if;
          -- shift right by one; the carry of the addition enters bit 63
          bReg := std_logic_vector(sum) & bReg(31 downto 1);
        end loop;
        m <= bReg after modulDelay;
      end process;
    end behavioral;
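The add-and-shift scheme of the multiplier process can be checked quickly with a software model. This is a sketch only, assuming unsigned 32-bit operands and a 64-bit working register whose addition carry is kept when shifting; the function name is hypothetical.

```python
MASK32 = (1 << 32) - 1

def shift_add_multiply(a, b):
    """Multiply two unsigned 32-bit integers with 32 add-and-shift steps."""
    reg = b & MASK32                 # upper half 0, multiplier in the lower half
    for _ in range(32):
        upper = reg >> 32
        if reg & 1:                  # current multiplier bit
            upper += a & MASK32      # may carry into a 33rd bit
        # Shift right by one; the carry bit lands in bit 63.
        reg = (upper << 31) | ((reg & MASK32) >> 1)
    return reg

print(hex(shift_add_multiply(0xFFFFFFFF, 0xFFFFFFFF)))
```

Operands with many high bits set, such as 0xFFFFFFFF, are useful test cases because they exercise the carry out of the 32-bit addition.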
More on Processes
- Never assign a value to a signal in different processes (multiple drivers).

    process A: y <= '0';
    process B: y <= '1';
    -- conflict: two drivers, not synthesizable!

- Upon initialization all processes are executed at once.
- Thereafter processes are executed in a data-driven manner:
  - activated by events on the sensitivity list of the process, or
  - by waiting on occurrences of specific events using wait statements
- If a process has no sensitivity list you MUST use wait statements, otherwise your process never suspends and blocks your simulation.
Latch
[Figure: latch with inputs d, clk and reset]

Flip-Flop
[Figure: register with an enable multiplexer; inputs d, enable, clk and reset]

    process(clk,reset)
    begin
      if (reset = '0') then
        q <= "00000000";
      elsif rising_edge(clk) then
        if (enable = '1') then
          q <= d;
        end if;
      end if;
    end process;
- Ex407 (difficulty: easy): Write VHDL code for a 16-bit counter with an enable, a load and an asynchronous reset input.
  [Figure: block Counter16 with inputs enable, load, clk, reset, data and output count]
- Ex vlsi21.8b (difficulty: medium, optional): Rewrite the above handshake model by using clk1 and clk2 signals for the two synchronous processes as well as a rst for initialization, and a start signal to initiate one data transfer. Do not use any wait constructions within the processes.
Attributes
attribute           function
signal'event        function returning a Boolean value signifying a change in value on this signal
signal'active       function returning a Boolean value signifying an assignment made to this signal (may not be a new value)
signal'last_event   function returning the time since the last event
signal'last_active  function returning the time since the signal was last active
signal'last_value   function returning the previous value of this signal
signal'left         returns the leftmost value of signal in its defined range
signal'right        returns the rightmost value of signal in its defined range
signal'high         returns the highest value of signal in its defined range
signal'low          returns the lowest value of signal in its defined range
signal'ascending    returns true if signal has an ascending range of values
signal'length       returns the number of elements in the array signal
[Figure: waveform of Z over time, 0 to 50 ns]

    library IEEE;
    use IEEE.std_logic_1164.all;

    architecture behavioral of Periodic is
    begin
      process
      begin
        Z <= '0', '1' after 10 ns, '0' after 20 ns, '1' after 40 ns;
        wait for 50 ns;
      end process;
    end behavioral;

[Figure: two-phase clock waveforms over time, 10 to 60 ns]

    architecture behavioral of TwoPhase is
    begin
      reset_process: reset <= '1', '0' after 10 ns;
      clock_process: process
      begin
        phi1 <= '1', '0' after 10 ns;
        phi2 <= '0', '1' after 12 ns, '0' after 18 ns;
        wait for 20 ns;
      end process;
    end behavioral;
architecture behavioral of MooreFSM is type StateType is (MyState,YourState,InitState); signal state : StateType; signal outputData: std_logic_vector(5 downto 0); begin output_process: process(state) begin transition_process: process(reset,clk) case state is begin when MyState => if (reset = 0) then outputData<=0100; state <= InitState; when YourState => elsif rising_edge(clk) then outputData<=00100-; case state is when InitState => when MyState => outputData<= 100100; state<=YourState; when others => when YourState => outputData<=000000; if (inputDataSignal = 1) then end case; state<=MyState; end process; end if; when others => null; outSig0<=outputData(0); end case; outSig1 <=outputData(1); end if; outSig2<=outputData(2); end process; end behavioral;
[Figure: traffic-light FSM state diagram with states RedState1 and RedState2, inputs reset and carPresent, and red/orange/green outputs for the main and second road]
chapter 5
Modeling Structure
• A structural model of a system is described in terms of the interconnection of its components.
• A structural model consists of 3 features:
  - component declaration
  - signal declaration
  - component interconnection

[Figure: full adder built from two HalfAdder3 components (H1, H2) and one OR2 (O3), with inputs in1, in2, cIn, internal signals s1, s2, s3 and outputs sum, cOut; annotations mark the component declaration, the ports, the component labels and the component interconnection]
VHDL vs. Verilog: Structural Description

VHDL

library IEEE;
use IEEE.std_logic_1164.all;

entity FullAdder4 is
  -- ports as in the Verilog module below
  port(a, b, cIn : in  std_logic;
       sum, cOut : out std_logic);
end FullAdder4;

architecture flatStructure of FullAdder4 is
  -- the component is named XOR2 because xor is a reserved word in VHDL
  component XOR2 port(a, b: in std_logic; z: out std_logic); end component;
  component AND2 port(a, b: in std_logic; z: out std_logic); end component;
  component OR3  port(a, b, c: in std_logic; z: out std_logic); end component;
  signal net1, net2, net3, net4: std_logic;
begin
  u1: XOR2 port map (a, b, net1);
  u2: XOR2 port map (cIn, net1, sum);
  u3: AND2 port map (cIn, a, net2);
  u4: AND2 port map (cIn, b, net3);
  u5: AND2 port map (a, b, net4);
  u6: OR3  port map (net2, net3, net4, cOut);
end flatStructure;

Verilog-HDL

module FullAdder4 (a, b, cIn, cOut, sum);
  input  a, b, cIn;
  output cOut, sum;
  wire   net1, net2, net3, net4;
  XOR2 u1(net1, a, b);
  XOR2 u2(sum, cIn, net1);
  AND2 u3(net2, cIn, a);
  AND2 u4(net3, cIn, b);
  AND2 u5(net4, a, b);
  OR3  u6(cOut, net2, net3, net4);
endmodule
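VHDL

architecture dataFlow of FullAdder5 is
  signal tmp: std_logic_vector(1 downto 0);
begin
  -- "+" on std_logic_vector needs an arithmetic package such as std_logic_unsigned
  tmp  <= '0' & a + b + cIn;
  cOut <= tmp(1);
  sum  <= tmp(0);
end dataFlow;

Verilog-HDL

module FullAdder5 (a, b, cIn, sum, cOut);
  input  a, b, cIn;
  output cOut, sum;
  // (the remainder of the module is missing in the source)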
[Figure: design hierarchy from top level to bottom level with the components OR2, HalfAdder3, AND2 and XOR2]
Generics
• The VHDL language provides the ability to construct parameterized models using the concept of generics.

entity AND2 is
  generic(andDelay: Time);
  port(a, b: in std_logic; z: out std_logic);
end AND2;

architecture genericDelay of AND2 is
begin
  z <= a and b after andDelay;
end genericDelay;

library IEEE;
use IEEE.std_logic_1164.all;

entity HalfAdder4 is
  generic(adderDelay: Time := 3 ns);
  port(a, b: in std_logic; sum, carry: out std_logic);
end HalfAdder4;

architecture genericDelay of HalfAdder4 is
  component AND2 is
    generic(andDelay: Time);
    port(a, b: in std_logic; z: out std_logic);
  end component;
  component XOR2 is
    generic(xorDelay: Time);
    port(a, b: in std_logic; z: out std_logic);
  end component;
begin
  C1: XOR2 generic map(12 ns)      port map(a, b, sum);
  C2: AND2 generic map(adderDelay) port map(a, b, carry);
end genericDelay;
More on Generics
• Within a structural model there are two ways in which the values of generic constants of lower-level components can be specified:
  - in the component declaration
  - in the component instantiation
• If both are specified, then the value provided by the generic map() takes precedence.
• If neither is specified, then the default value defined in the model is used.
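The precedence rule can be illustrated with a minimal sketch. It reuses the AND2 model with its generic andDelay from the Generics slide; the entity name GenericDemo, its ports and the delay values 4 ns and 7 ns are hypothetical and chosen only for illustration.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

entity AND2 is
  generic(andDelay: Time);
  port(a, b: in std_logic; z: out std_logic);
end AND2;

architecture genericDelay of AND2 is
begin
  z <= a and b after andDelay;
end genericDelay;

library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical demo entity, not part of the course material
entity GenericDemo is
  port(a, b: in std_logic; z1, z2: out std_logic);
end GenericDemo;

architecture structural of GenericDemo is
  component AND2
    generic(andDelay: Time := 4 ns);  -- value given in the component declaration
    port(a, b: in std_logic; z: out std_logic);
  end component;
begin
  -- no generic map: the 4 ns from the component declaration is used
  G1: AND2 port map(a, b, z1);
  -- generic map given: 7 ns takes precedence over the 4 ns above
  G2: AND2 generic map(andDelay => 7 ns) port map(a, b, z2);
end structural;
```

In a simulation, z1 should follow a and b after 4 ns and z2 after 7 ns.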
library IEEE;
use IEEE.std_logic_1164.all;

entity GenericOR is
  generic(n: positive := 2);
  port(in1: in  std_logic_vector((n-1) downto 0);
       z  : out std_logic);
end GenericOR;

architecture behavioral of GenericOR is
begin
  process(in1)
    variable sum: std_logic := '0';
  begin
    sum := '0';
    for i in 0 to (n-1) loop
      sum := sum or in1(i);   -- OR over all n input bits
    end loop;
    z <= sum;
  end process;
end behavioral;
Configuration
• Structural models may employ different levels of abstraction.
• Each component in a structural model may be described as a behavioral or a structural model.
• Configuration allows stepwise refinement in a design cycle.
• Configuration represents resource binding.
• Description-synthesis design method.

Configuration associates an architecture description (behavioral or structural) with each component.

[Figure: hierarchy for FullAdder3 with the components OR2, HalfAdder3, AND2 and XOR2]
Example: Configuration

architecture gateLevel of Comb is
  -- ...
architecture highSpeed of Comb is
  -- ...

[Figure: SerialAdder with C1: Comb (combinational logic; in1, in2, carryIn, sum, carry) bound to architecture highSpeed, and C2: Dff bound to MyDff(behavioral) with ports q, d, clk, rst; external inputs reset and clock]

configuration CFG_HighSpeed of SerialAdder is   -- configuration name (used for simulation) and entity name
  for structural
    for C1: Comb use entity WORK.Comb(highSpeed);
    end for;
    -- if a different component than described in the entity is used,
    -- then the I/O mapping must be declared
    for C2: Dff use entity MyLibrary.MyDff(behavioral)
      generic map(gateDelay => 5 ns)
      port map(my_clk => clk, my_d => d, my_q => q, my_rst => rst);
    end for;
  end for;
end CFG_HighSpeed;
[Figure: gate-level schematic with inputs i1, i2, i3, AND gates (&), an OR gate (≥1) and output o1]
chapter 6
Functions
• Functions are used to compute a value based on the values of the input parameters. Functions are placed in declarative parts.
  Example of a function declaration: function rising_edge (signal clock: in std_logic) return boolean;
• Functions cannot modify parameter values (procedures can).
  Example of a function call: rising_edge(clk)
• Functions execute in zero simulation time, thus wait statements cannot exist in functions. Parameters are restricted to be of mode in.

function rising_edge (signal clock: std_logic) return boolean is   -- mode not necessary
  variable edge: boolean := false;
begin
  edge := (clock = '1' and clock'event);
  return (edge);
end rising_edge;
function to_bitvector(svalue: std_logic_vector) return bit_vector is
  -- assumes svalue is indexed (svalue'length-1 downto 0)
  variable outvalue: bit_vector(svalue'length-1 downto 0);
begin
  for i in svalue'range loop
    case svalue(i) is
      when '0'    => outvalue(i) := '0';
      when '1'    => outvalue(i) := '1';
      when others => outvalue(i) := '0';
    end case;
  end loop;
  return outvalue;   -- a function must return a value
end to_bitvector;

• Many conversion procedures as well as resolution functions can be found in the std_logic_1164 or std_logic_arith packages and others. Have a look at $SYNOPSYS/packages/IEEE/src/
Procedures
• Procedures are subprograms that can modify one or more of the input parameters.
  Example of a procedure declaration reading from a file f: procedure read_v1d (variable f: in text; v: out std_logic_vector);
• If the class of the procedure parameters is not explicitly declared, then the following rules apply:
  - parameters of mode in are assumed to be of class constant
  - parameters of mode out or inout are assumed to be of class variable
• Variables declared within a procedure are initialized on each call to the procedure and their values do not persist across invocations of the procedure.
• Signals cannot be declared within procedures.
• Poor programming: procedures declared within a process can make assignments to signals corresponding to the ports of the encompassing entity.
• Procedure call: Dff(clk=>clk, reset=>reset, d=>s2, q=>s1, qbar=>open);
Example: Procedure
library IEEE;
use IEEE.std_logic_1164.all;

entity CPU is
  port(di  : out std_logic_vector(31 downto 0);
       addr: out std_logic_vector(2 downto 0);
       r, w: out std_logic;
       do  : in  std_logic_vector(31 downto 0);
       s   : in  std_logic);
end CPU;

architecture behavioral of CPU is

  procedure Mread(address    : in  std_logic_vector(2 downto 0);
                  signal r   : out std_logic;
                  signal s   : in  std_logic;
                  signal addr: out std_logic_vector(2 downto 0);
                  signal data: out std_logic_vector(31 downto 0)) is
  begin
    addr <= address;
    r <= '1';
    wait until s = '1';
    data <= do;
    r <= '0';
  end Mread;

  procedure Mwrite(address    : in  std_logic_vector(2 downto 0);
                   signal data: in  std_logic_vector(31 downto 0);
                   signal addr: out std_logic_vector(2 downto 0);
                   signal w   : out std_logic;
                   signal di  : out std_logic_vector(31 downto 0)) is
  begin
    addr <= address;
    w <= '1';
    wait until s = '1';
    di <= data;
    w <= '0';
  end Mwrite;

begin
  -- CPU behavioral description
end behavioral;
Overloading
• A very useful feature of the VHDL language is the ability to overload a subprogram or an operator.
• Imagine writing different flip-flop models without and with asynchronous inputs and with different argument types. With the overloading feature a single flip-flop name can be used for all of them.
• Example of Dff calls:
    Dff(clk, d, q, qbar);
    Dff(clk, d, q, qbar, reset, clear);
• From the type and number of arguments we can tell which procedure we meant to use.
• Note that in std_logic_1164.vhd the Boolean functions and, or, etc. have been defined for std_logic types; the functions +, *, etc. have been defined for certain predefined types of the language such as integer. See also the std_logic_arith package.
    function "*" (arg1, arg2: std_logic_vector) return std_logic_vector;
    function "+" (arg1, arg2: signed) return signed;
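As a minimal sketch of subprogram overloading, the package below declares two functions with the same name that differ only in their argument type. The names OverloadDemo and ToInt are hypothetical and not part of the course material.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

package OverloadDemo is
  -- same name, different argument types
  function ToInt(b: bit) return integer;
  function ToInt(b: std_logic) return integer;
end OverloadDemo;

package body OverloadDemo is

  function ToInt(b: bit) return integer is
  begin
    if b = '1' then
      return 1;
    else
      return 0;
    end if;
  end ToInt;

  function ToInt(b: std_logic) return integer is
  begin
    -- a weak high ('H') is also counted as 1
    if b = '1' or b = 'H' then
      return 1;
    else
      return 0;
    end if;
  end ToInt;

end OverloadDemo;
```

The analyzer selects the version from the type of the actual argument. A bare literal such as ToInt('1') is ambiguous because '1' belongs to both types; a qualified expression such as ToInt(bit'('1')) resolves it.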
Packages
• Logically related functions and procedures can be grouped into packages, and thus easily be shared among designs and people.

package MyLibraryPackage is          -- package declaration: similar to a VHDL entity, defines interfaces
  -- type declarations
  -- function declarations
  -- procedure declarations
end MyLibraryPackage;

package body MyLibraryPackage is     -- package body: similar to a VHDL architecture, defines behavior
  -- functions
  -- procedures
end MyLibraryPackage;

• The package declaration needs to be analyzed first, and then the package body can be analyzed.
• Packages are used as libraries and referenced within VHDL design units via the use clause.
Libraries
• Each design unit (entity, architecture, package) is analyzed (compiled) and placed in a design library.
• Libraries are generally implemented as directories and are referenced by a logical name.
• In VHDL the libraries STD and WORK are implicitly declared.
• WORK is the working design library, normally placed in a local directory.
• Once a library has been declared, all of the functions, procedures and type declarations of a package can be accessed.

library IEEE;
use IEEE.std_logic_1164.all;       -- all functions, procedures and types are visible

library IEEE;
use IEEE.std_logic_1164."xnor";    -- only the xnor function is visible

Visibility must be established for each design unit (entity) separately.

[Figure: design units stored in library WORK and library MyLibrary]
• Ex416 (difficulty: medium): Write the VHDL package MyPackage with the functions OneCounter (counting 1s) and ParityGenerator (both should accept std_logic_vectors or bit_vectors of any size), and analyze it into the library MyLibrary. Use the defined functions in your design PackageExample to show its functionality.
Verilog-HDL
• arrays
• run-time constants: parameter
• continuously driven nets: wire, tri, ...
• triggered assignments: reg, integer, real, ...

[Table: operator comparison between VHDL and Verilog-HDL, e.g. >, >=, /=]
Verilog-HDL

/* inside a module */
...
wire [7:0] inp;
reg  [7:0] outp, cou;
...
always @(posedge clk)
begin
  outp = outp + inp;
  cou  = outp + 1;
end
...
[Figure: VHDL example of a signal with two drivers]

Before the falling edge of clk: x=1, y=2, z=3. What are x, y and z 12 ns after the falling edge of clk?

Verilog-HDL
module AsynRegister(clk, rst, a, z);
  input         clk, rst;
  input  [15:0] a;
  output [15:0] z;
  reg    [15:0] z;   // z is assigned in an always block, so it must be a reg

  always @(posedge clk)
    if (rst == 1'b0) z = 16'b0;
    else             z = a;
endmodule
Dataflow Modeling
library IEEE;
use IEEE.std_logic_1164.all;

entity Demux2x4 is
  port(a, b, enable: in  std_logic;
       z           : out std_logic_vector(0 to 3));
end Demux2x4;

architecture dataflow of Demux2x4 is
  signal abar, bbar: std_logic;
begin
  z(3) <= not(a and b and enable);
  z(0) <= not(abar and bbar and enable);
  bbar <= not b;
  z(2) <= not(a and bbar and enable);
  abar <= not a;
  z(1) <= not(abar and b and enable);
end dataflow;
All the signal assignment statements (<=) happen concurrently after some specified delay which defaults to 1 delta, an infinitesimally small delay. Note that concurrent statements are always running so whenever A, B or ENABLE change then ABAR, BBAR, and Z(0 to 3) will also change after some delay. The delay in assigning a signal its new value means that the following statement is meaningful (it generates a periodic waveform): CLK <= not CLK after 10 ns;
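The periodic-waveform statement can be wrapped into a complete design unit as sketched below. The entity name ClockGen and the internal signal clkInt are hypothetical. The internal signal is used because an out port cannot be read in VHDL-93, and it needs an explicit initial value because "not" applied to the default 'U' stays 'U'.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical clock generator for simulation only
entity ClockGen is
  port(clk: out std_logic);
end ClockGen;

architecture dataflow of ClockGen is
  signal clkInt: std_logic := '0';   -- must start at '0' or '1', not 'U'
begin
  -- re-evaluated on every event on clkInt: toggles every 10 ns
  clkInt <= not clkInt after 10 ns;
  clk    <= clkInt;
end dataflow;
```

With a half period of 10 ns this sketch produces a 20 ns clock period. The after clause is a simulation construct; it is not meant for synthesis.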
Statements within a process are executed sequentially, like a program. The process is scheduled for execution after any events are processed for signals on its sensitivity list. Values of local variables are maintained between executions.
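A minimal sketch of these rules, assuming a hypothetical entity EventCounter: the variable n is initialized once and keeps its value between executions of the process, while the signal count only receives its new value after the process suspends.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical example entity
entity EventCounter is
  port(clk  : in  std_logic;
       count: out integer);
end EventCounter;

architecture behavioral of EventCounter is
begin
  process(clk)                 -- executed on every event on clk
    variable n: integer := 0;  -- initialized once, value kept between executions
  begin
    if rising_edge(clk) then
      n := n + 1;              -- variable assignment takes effect immediately
      count <= n;              -- signal assignment takes effect one delta later
    end if;
  end process;
end behavioral;
```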
Synthesis
Idea: once a behavioral model has been finished, why not use it to automatically synthesize a logic implementation in much the same way as a compiler generates executable code from a source program?
a.k.a. silicon compilers
Synopsys, Inc. is the current leader in providing synthesis tools and synthesizable HDL modules.
Logic Synthesis
Z <= (A and B) or C;
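[Figure: gate-level implementations of Z with inputs A, B, C]
[Figure: synthesized circuit with inputs X(0) to X(3) and Y(0) to Y(3)]

process(word)
  variable result: std_logic;
begin
  result := '0';
  for j in 0 to 3 loop
    result := result xor word(j);
  end loop;
  parity <= result;
end process;

[Figure: synthesized parity circuit with input WORD and output PARITY]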
FSM Example
architecture behavioral of Moore is
  type StateType is (S0, S1, S2, S3);
  signal current, nextState: StateType;   -- "next" is a reserved word in VHDL
begin

  process(current, a, x)   -- state transition; all signals read here are in the sensitivity list
  begin
    nextState <= current;  -- default: hold the state, so no latches are inferred
    case current is
      when S0 => if (a = '1') then nextState <= S2; end if;
      when S1 => if (a = '1') then nextState <= S0; else nextState <= S2; end if;
      when S2 => if (x = '1') then nextState <= S3; end if;
      when S3 => if (x = '1') then nextState <= S1; end if;
    end case;
  end process;

  process(current)   -- output logic
  begin
    case current is
      when S0 => z <= '0';
      when S1 => z <= '0';
      when S2 => z <= '1';
      when S3 => z <= '1';
    end case;
  end process;

  process(clk, reset)   -- state register
  begin
    if (reset = '0') then
      current <= S0;
    elsif (clk'event and clk = '1') then
      current <= nextState;
    end if;
  end process;

end behavioral;

[Figure: Moore FSM state diagram with states S0 (z=0), S1 (z=0), S2 (z=1), S3 (z=1); transitions labelled with a]
Further Reading
ISBN 0-13-181447-8
ISBN 0-7923-9472-0
Also:
• D. Perry, VHDL, Second Edition, McGraw-Hill, 1993
• see the VHDL tutorials on the I3S-CD or on the web: http://www.microlab.ch/academics/courses/vlsi/g.html
• don't forget to study the CBT tutorial on VHDL
Test Bench /1
• avoid interactive simulation, because it can never be used again
• test benches reduce total simulation development time
• test benches are used to verify designs during stepwise refinement
• test bench methodology bridges simulation with automatic test equipment (ATE)
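A reusable, self-checking test bench can be sketched as follows. The device under test And2Gate, the test bench name And2Gate_tb and the 10 ns settling time are hypothetical choices for illustration; the test bench has no ports, applies a table of stimuli and verifies the responses with assert statements.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical device under test
entity And2Gate is
  port(a, b: in std_logic; z: out std_logic);
end And2Gate;

architecture dataflow of And2Gate is
begin
  z <= a and b;
end dataflow;

library IEEE;
use IEEE.std_logic_1164.all;

entity And2Gate_tb is   -- a test bench has no ports
end And2Gate_tb;

architecture test of And2Gate_tb is
  signal a, b, z: std_logic;
begin
  dut: entity work.And2Gate port map(a => a, b => b, z => z);

  stimulus: process
    type Vec is record
      a, b, z: std_logic;   -- two stimuli and the expected response
    end record;
    type VecArray is array (natural range <>) of Vec;
    constant vectors: VecArray :=
      (('0','0','0'), ('0','1','0'), ('1','0','0'), ('1','1','1'));
  begin
    for i in vectors'range loop
      a <= vectors(i).a;
      b <= vectors(i).b;
      wait for 10 ns;   -- let the response settle
      assert z = vectors(i).z
        report "And2Gate: unexpected output" severity error;
    end loop;
    report "And2Gate_tb finished" severity note;
    wait;               -- suspend forever: end of test
  end process;
end test;
```

Because stimuli and expected responses live in the source file, the same run can be repeated unchanged after every refinement step.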
Test Bench /2
• compare a test bench with MicroLab-I3S:
  - there are chips and PCBs that need to be tested
  - there is nice measurement equipment
  - there are skilled and hard-working people
  - there are no signals coming from or going to the outside of the lab
Test Bench
[Figure: test bench with stimulus generation and with response generation and verification around the design; after ASIC fabrication the prototype test is applied to the ASIC chip]
[Figure: timing diagram with clock, input (valid) and output (stable)]
Conclusions
• HDLs are very useful for behavioral hardware system descriptions
• abstract models do not precisely reflect reality
• restriction to synthesizable coding is necessary
• technology independence opens the possibility of fast FPGA prototyping
• test benches increase chip quality and greatly decrease simulation time
Coming Up...
• Next topic: CAD exercises and mini FPGA projects (PWM, blackjack dealer, simple microprocessor, etc.)
Exercises: VLSI-21 #1
• CAD Ex45x: PWM Project (difficulty: easy; time: medium): Design of a pulse width modulator (PWM) controlling a DC motor. The PWM shall have a microprocessor interface. The VHDL design is simulated, compiled and implemented in an FPGA and is supposed to drive a small DC motor.
• CAD Ex450 (difficulty: easy): Design the VHDL code of the PWM element. The btrdy and ack signals are handshake signals for communication with the microprocessor data bus. A value of 0 on the 8-bit data bus will switch off the DC motor (pOut=1); a non-zero value will generate a PWM signal with an on-time of (data/256)*100% of a period. Analyze the VHDL syntax with gvan.

[Figure: PWM block with the signals btrdy, ack, data, clk, rst]
Exercises: VLSI-21 #2
• CAD Ex451 (difficulty: easy): Design a test bench for the PWM. Simulate your VHDL code with the Synopsys VSS simulator and use your test bench to verify its correct behavior.
  Result: see exercise Ex451 on the MicroLab web
• CAD Ex452 (difficulty: easy): Synthesize the PWM VHDL code into a gate-level schematic for a Xilinx FPGA target technology. Connect your VHDL signals to the correct FPGA pins. Perform the place & route of the logic elements.
  Result: see exercise Ex452 on the MicroLab web
• CAD Ex453 (difficulty: easy): Download your PWM circuit into an FPGA and apply different PWM values to your circuit with the GECKO User Interface tool. Use an oscilloscope to verify its correct output behavior. This exercise has to be done in MicroLab, using the GECKO system.
  Result: see exercise Ex453 on the MicroLab web
Exercises: VLSI-21 #3
• CAD Ex400 (difficulty: easy): Design a 2:1 multiplexer in VHDL. Use a dataflow model. Simulate your VHDL code with the Synopsys VSS simulator. Get familiar with interactive simulation.
  Result: see exercise Ex400 on the MicroLab web
• CAD Ex401 (difficulty: medium): Design the SN74160 synchronous decimal counter in VHDL. Use a behavioral model. Simulate your VHDL code with the Synopsys VSS simulator and use macros for interactive simulation.
  Result: see exercise Ex401 on the MicroLab web
• CAD Ex402 (difficulty: easy): Schematic entry of the blackjack dealer on block level using the Synopsys SGE tool.
  Result: see exercise Ex402 on the MicroLab web
Exercises: VLSI-21 #4
• CAD Ex403 (difficulty: easy): Skeleton VHDL code generation for the blackjack dealer from block-level schematics using the Synopsys SGE tool.
  Result: see exercise Ex403 on the MicroLab web
• CAD Ex404 (difficulty: medium): Design the VHDL code of your blackjack dealer. Use the prepared templates as a guide.
  Result: see exercise Ex404 on the MicroLab web
• CAD Ex405 (difficulty: medium): Write a VHDL test bench for your blackjack dealer. Generate the following sequence of card values: 3, 11, 11, 7, 2, 11, 6.
  Result: see exercise Ex404 on the MicroLab web (final result is: 21)
Exercises: VLSI-21 #5
• CAD Ex406 (difficulty: easy): Synthesize the blackjack dealer VHDL code into a gate-level schematic for a Xilinx FPGA target technology. Perform the place & route of the logic elements.
  Result: see exercise Ex406 on the MicroLab web
• CAD Ex407 (difficulty: medium): Perform a back-annotation of your FPGA chip and simulate it again with the real timing information. Does your chip still work? Look for the errors.
  Result: see exercise Ex407 on the MicroLab web
• CAD Ex408 (difficulty: medium): Download your blackjack dealer circuit into an FPGA and run your test bench again on the ProTest test system. This exercise has to be done in MicroLab.
  Result: see exercise Ex408 on the MicroLab web
VLSI Design I
Automatic Synthesis of Digital Circuits
Why should I not enjoy life instead of drawing schematics if CAD tools can do the job for me?
Overview
• design abstraction domains
• architectural models
Goal: You are familiar with the design abstraction domains, the description-synthesis design method, the design strategies as well as the three synthesis steps. You know the FSMD architectural model as well as the interprocess communication models.
Introduction
• system complexity is increasing
• product lifetime is decreasing
• design efficiency is essential
• new design methods are necessary
• higher abstraction levels are introduced
• CAD tools able to handle large amounts of data are needed
Design Methodology
Design steps:
• specification: budget ($, speed, area, power, schedule, risk), low-level building blocks, high-level architecture
• behavioural design, verification
• logic design, verification
• layout, verification
Tools: spice, paper & pencil; schematics, simulation, timing analysis; layout, DRC, extraction, net compare, LVS (layout vs. schematic)

[Figure: a clocked bus driver (signals CLK, data, ena, clk) shown at several abstraction levels, down to the behavioural description: if data-ready then bus := data; else bus := high-Z; end if;]
Abstraction Domains
VLSI designs can be performed in 3 abstraction domains:
• behavioural domain
• structural domain
• physical domain
Each domain gives different freedoms to the designer:
• parallel or serial algorithms
• logic technology and bit-slice
• full-custom and macro-cells
• ...

[Figure: abstraction levels of the behavioural domain (applications, algorithms, programs, subroutines, Boolean equations, instructions) and of the structural domain (system, processors, ALUs, registers, logic gates, transistors)]
Behavioural Domain
• description and verification of first ideas
• function and not implementation is asked for
• modelling with
  - general-purpose languages: modula-2, pascal, c, c++, lisp, ...
  - matlab, mathematica, ...
  - vhdl, verilog-hdl, cathedral, ...
  - graphic languages such as vee, ...
• transformation to structural domain: synthesis

[Figure: behavioural domain levels: programs, subroutines, Boolean equations, instructions]
Structural Domain
• description and verification of a solution
• implementation decisions taken
• restrictions like delay, signal strength, etc.
• modelling styles: vhdl, verilog-hdl, schematic

[Figure: structural domain levels: processors, ALUs, registers, logic gates, transistors]
Physical Domain
• description and verification of the physical implementation
• process technology specific implementation
• floorplan, mask layout, packaging
• description formats: cif, gds2, stick diagrams, symbolic layout
Abstraction Levels
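Design domains are divided into several abstraction levels:
• system level
• micro architecture level
• logic level
• circuit level

[Figure: examples per abstraction level: system level (64 bit RISC, video interface, 64 MByte memory, ISDN interface), micro architecture level (ALU with mux, signals a, b, cin, sel, s, cout), logic level and circuit level]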
Design Strategies
• the goal is to transfer an idea to a chip as fast as possible
• descriptions in the 3 abstraction domains
• strategies used:
  - hierarchy
  - regularity
  - modularity
  - locality

Hierarchy:
• basic idea: divide and conquer
• divide into modules and sub-modules until the complexity of the sub-modules is comprehensible
• comparison to software engineering: split programs into modules, procedures, subroutines

[Figure: adder (inputs cin, a, b; outputs sum, cout) decomposed into sub-modules for sum and cout]
Regularity:
• the goal is reduction of complexity
• idea: divide into similar building blocks
• identical blocks, sub-blocks, cells, transistor sizes
• 1-dim. arrays: bit-slice technique
• 2-dim. arrays: systolic arrays

[Figure: bit-slice chain of four full adders with inputs a_i ... a_i+3, b_i ... b_i+3, carry in c_i-1, sums s_i ... s_i+3 and carries c_i ... c_i+3]
Modularity:
• different modules should not influence each other
• sub-modules with well-formed interfaces: do not use transmission gates
• well-defined signal types and strengths
• well-defined interconnection widths, etc.

Locality:
• time locality leads to synchronous designs (compare local variables in software engineering)
Automatic Synthesis /1
• automatic synthesis: transformation of a design from the behavioural to the structural domain
• silicon compilation: transformation from the behavioural to the physical domain
Automatic Synthesis /2
• automatic synthesis tools on high abstraction levels do not exist yet
• not every description is synthesizable
• synthesis is a design process and not only coding as in software engineering
• synthesis steps:
  - allocation
  - scheduling
  - binding

[Figure: delay versus area of the design solutions s1 ... s22]
Allocation: Example
RTL example:
  x = a + b;
  y = a * c;
  z = x + d;
  x = y - d;
  x = x + c;
Allocation: 1 adder, 1 multiplier, 1 subtractor

[Figure: data-flow graph of the RTL example with inputs a, b, c, d, intermediate values y, x2 and results z, x3]
Scheduling: Example
• resource-limited scheduling
• each operation is bound to a clock cycle
• solutions for minimal execution time

[Figure: scheduled data-flow graph of the RTL example (operations +, *, - with values y, x2, x3, z)]
Binding: Example
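• variables are bound to memories
• temporary variables x1 and x2 are not used simultaneously

[Figure: bound schedule: cycle 1 computes x1 and y, cycle 2 computes z and x2, cycle 3 computes x3; datapath with multiplexers in front of the functional units; the adder produces z, x1, x3 and the subtractor produces x2]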
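The three synthesis steps for the RTL example can be sketched as a small clocked datapath. This is an assumed, hand-written result, not the output of a synthesis tool: the entity name ScheduleDemo, the integer ports, the register names r1 and ry and the active-low reset are hypothetical. One adder, one multiplier and one subtractor are used (allocation), each operation is placed in one of three clock cycles (scheduling), and x1 and x2 share the register r1 (binding).

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical entity; inputs must be held stable during the three cycles
entity ScheduleDemo is
  port(clk, rst  : in  std_logic;
       a, b, c, d: in  integer;
       z, x      : out integer);
end ScheduleDemo;

architecture rtl of ScheduleDemo is
  signal step: integer range 1 to 3;
  signal r1  : integer := 0;   -- holds x1, later reused for x2
  signal ry  : integer := 0;   -- holds y
begin
  process(clk, rst)
  begin
    if rst = '0' then
      step <= 1;
    elsif rising_edge(clk) then
      case step is
        when 1 =>               -- cycle 1: adder and multiplier
          r1   <= a + b;        -- x1 = a + b
          ry   <= a * c;        -- y  = a * c
          step <= 2;
        when 2 =>               -- cycle 2: adder and subtractor
          z    <= r1 + d;       -- z  = x1 + d
          r1   <= ry - d;       -- x2 = y - d, overwrites x1
          step <= 3;
        when 3 =>               -- cycle 3: adder
          x    <= r1 + c;       -- x3 = x2 + c
          step <= 1;
      end case;
    end if;
  end process;
end rtl;
```

After three cycles z holds a + b + d and x holds a * c - d + c. Whether a tool really shares the single adder across the three cycles depends on the tool and its constraints.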
Architecture Models
• synthesis is based on the knowledge of a set of architecture models and design styles
• design styles:
  - parallel or serial datapath
  - interrupt or polling control
  - memory access types (cache ...)
• memory elements: latch, flip-flop, register, register-file, RAM, ROM, ...
• interconnection units: bus, multiplexer
• subdividable units: ripple-carry adder, selector, ALUs, ...
• implementation forms:
  - ROM (table lookup)
  - PLA structures (2-stage logic)
  - multistage logic
  - bit-slice, systolic array, etc.
[Figure: FSMD architecture model: an FSM (transfer logic, state register, control inputs, control outputs) controlling a datapath (functional unit, datapath inputs and outputs)]
[Figure: two communicating processes (process 1, process 2), each an FSMD with its own clock (clock1, clock2), connected by a data bus]
[Figure: handshake between process 1 and process 2 with the signals request, acknowledge, data and data valid]
Never use:
• gated clocks
• combinatorial outputs for asynchronous inputs
• asynchronous inputs as FSM inputs
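One common way to respect these rules is sketched below, under stated assumptions: a clock enable replaces the gated clock, and an asynchronous input passes through two flip-flops before it is used. The two-stage synchronizer is a widely used technique that the slides do not name; the entity SafeInput and all signal names are hypothetical.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical example entity
entity SafeInput is
  port(clk, rst: in  std_logic;
       asyncIn : in  std_logic;   -- not synchronous to clk
       enable  : in  std_logic;
       q       : out std_logic);
end SafeInput;

architecture rtl of SafeInput is
  signal sync1, sync2: std_logic;
begin
  process(clk, rst)
  begin
    if rst = '0' then
      sync1 <= '0';
      sync2 <= '0';
      q     <= '0';
    elsif rising_edge(clk) then
      -- two-stage synchronizer: only sync2 is used by the logic
      sync1 <= asyncIn;
      sync2 <= sync1;
      -- clock enable: the clock itself is never gated
      if enable = '1' then
        q <= sync2;
      end if;
    end if;
  end process;
end rtl;
```

An FSM would then use sync2, not asyncIn, as its input.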
Conclusions
• description-synthesis method
• description: system design with HDLs (parallel constructions, RTL level)
• top-down and bottom-up design
• abstract models are not precise: races, hazards, delays, signal strength, ...
Coming Up...
Next time: Hardware description languages
Reading, Weste, sections:
• 6 through 6.2.7 (design strategy)
• 6.4 through 6.4.5 (design methods)
• 6.5 through 6.5.4 (design capture tools)
VLSI Design I
Design for Test
Overview
• design for test architectures
• ad-hoc, scan based, built-in
Goal: You are familiar with testability metrics and you know ad-hoc test structures as well as scan-based test structures. Built-in test structures such as BILBO and boundary scan can be applied.
Ad-hoc test techniques are a collection of ideas aimed at reducing the test time. Common techniques are:
• partitioning large sequential circuits
• adding test points
• adding multiplexers
• providing for easy state access
[Figure: counter built from half-adder stages (outputs Q0 to Q3, carries co0 to co3) with load/test multiplexers inserted between the stages]
[Figure: Module A with control input, multiplexer and combinational logic (CL), shift out]
[Figure: scan path through the registers around the combinational logic blocks CL1 and CL2, ending at scan-out]
Scan registers
partial serial scan: sometimes it is not area and speed efficient to implement scan in every location where a register is used (signal processing)
[Figure: partial scan example with registers R1 to R6 and combinational logic blocks CL]
A popular approach is the level sensitive scan design (LSSD) technique from T.W. Williams:
• the circuit is level sensitive (the steady-state response is independent of circuit and wire delays within the circuit): hazard free
• each register may be converted to a serial shift register

[Figure: LSSD shift register latch (L1, L2) with data D, clock C, scan input I and shift clocks A, B; registers reg A and reg B around the combinational logic, clocked by c1, c2 and shift-clk; normal operation uses c1 and c2]

Scan Elements
[Figure: scan elements: LSSD latch pair (L1/L2) and a scan flip-flop with data D, test input TI, test enable TE and clocks clka, clkb]
Generate pseudo-random data for most circuits by using, e.g., a linear feedback shift register (LFSR). Memory tests use more systematic FSMs to create ADDR and DATA patterns.
For pseudo-random input data simply compute some hash of the output values and compare it against the expected value (signature) at the end of the test. Memory data can be checked cycle-by-cycle.
$1 + c_1 x + c_2 x^2 + c_3 x^3 + \dots + c_{n-1} x^{n-1} + c_n x^n$
With a small number of XOR gates the cycle time is very fast. The register cycles through a fixed sequence of states (can be as long as $2^n - 1$ for some n's). Handy for large modulo-n counters.
• different responses for different initial states
• different responses for different $c_i$
• pseudo-random sequence generator (PRSG)
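A 4-bit pseudo-random sequence generator can be sketched as below, using the feedback polynomial $1 + x + x^4$. The mapping of polynomial terms to register stages is an assumption, since the slide figure is not available; the entity name Lfsr4, the seed "0001" and the active-low reset are hypothetical. With the taps chosen here the register steps through all 15 non-zero states before repeating, i.e. $2^4 - 1$.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- hypothetical 4-bit PRSG
entity Lfsr4 is
  port(clk, rst: in  std_logic;
       q       : out std_logic_vector(3 downto 0));
end Lfsr4;

architecture behavioral of Lfsr4 is
  signal state: std_logic_vector(3 downto 0);
begin
  process(clk, rst)
  begin
    if rst = '0' then
      state <= "0001";   -- any non-zero seed; all zeros would lock up
    elsif rising_edge(clk) then
      -- shift left, feed back the XOR of the last and the first stage
      state <= state(2 downto 0) & (state(3) xor state(0));
    end if;
  end process;

  q <= state;
end behavioral;
```

Starting from "0001" the first states are 0011, 0111, 1111, 1110, 1101, and the seed reappears after 15 clock cycles. A different seed gives the same cycle with a different starting point.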
Signature Analysis
• signature analysis is used to compact a data stream into a so-called signature
• different responses for different $c_i$; many well-known CRC (cyclic redundancy check) polynomials correspond to a specific choice of the $c_i$'s

[Figure: serial-input signature analyzer: LFSR with XOR gates, feedback coefficients c1 ... cn and D flip-flops]
[Figure: parallel-input signature analyzer: LFSR with inputs z1 ... zn and flip-flop outputs q1 ... qn]
LFSR Polynomials
Polynomials for maximal-length sequences for n equal 1 up to 32:

n                        f(x)
1,2,3,4,6,7,15,22        $1 + x + x^n$
5,11,21,29               $1 + x^2 + x^n$
10,17,20,25,28,31        $1 + x^3 + x^n$
9                        $1 + x^4 + x^n$
23                       $1 + x^5 + x^n$
18                       $1 + x^7 + x^n$
8                        $1 + x^2 + x^3 + x^4 + x^n$
12                       $1 + x + x^4 + x^6 + x^n$
13                       $1 + x + x^3 + x^4 + x^n$
14,16                    $1 + x^3 + x^4 + x^5 + x^n$
19,27                    $1 + x + x^2 + x^5 + x^n$
24                       $1 + x + x^2 + x^7 + x^n$
26                       $1 + x + x^2 + x^6 + x^n$
30                       $1 + x + x^2 + x^{23} + x^n$
32                       $1 + x + x^2 + x^{22} + x^n$

Examples of CRCs:

n                        CRC
8                        $1 + x + x^4 + x^5 + x^7 + x^8$
16                       $1 + x^2 + x^{15} + x^{16}$
BILBO #1
• A very popular built-in test structure is the built-in logic block observation (BILBO) from Koenemann.
• BILBO operates in 4 different modes:
  - parallel register mode (normal operation of the circuit)
  - PRSG or signature analysis mode
  - scan mode
  - reset mode

[Figure: circuit with BILBO registers shown in register mode, PRSG mode, signature analysis mode, scan mode and reset mode]
BILBO #2
[Figure: BILBO register with control inputs c0 and c1, scan in, AND and XOR gates in front of the D flip-flops and outputs Q1, Q2, ...; a table selects the modes A to D with c1 and c0]
IDDQ Testing
[Figure: ammeter (A-meter) in the VDD supply line of the circuit, measuring IDD; GND]

Idea: CMOS logic should draw no current when it's not switching. So after initializing the circuit to eliminate tri-state fights, disable pseudo-NMOS gates, etc., the power-supply current should be zero after all signals have settled. Good for detecting bridging faults (shorts). One may want to try several different circuit states to ensure all parts of the chip have been observed.
• connectivity tests between components
• sampling and setting chip I/Os
• distribution and collection of self-test or built-in-test results

[Figure: chips on a PCB connected by the PCB interconnect, with a serial data path from serial data in]
The test access port (TAP) is a definition of the interface that needs to be included in an IC:
• TCK: test clock input
• TMS: test mode select
• TDI: test data input
• TDO: test data output
• TRST: optional signal for asynchronous reset of the TAP

[Figure: TAP controller between TDI and TDO]
State machine for the TAP controller. TMS is the control signal.
[Figure: TAP controller state diagram, transitions labelled with TMS = 0 or 1]

Boundary-scan: IR
[Figure: instruction register with capture-IR and shift-IR]

Boundary-scan: DR
• The boundary scan register is a special case of a data register. It allows circuit board interconnections to be tested, external components to be tested, and the state of the chip's digital I/Os to be sampled. The boundary scan register is mandatory.
• Internal data registers are optional and add additional access to the circuit.
• The bypass register is a 1-bit register used to bypass a whole chip.

[Figure: boundary-scan cells for an input pad (to chip), an output pad with enable (from chip) and a bidirectional pad; each cell has a capture flip-flop (clockDR), an update flip-flop (updateDR) and multiplexers controlled by shiftDR and mode, chained from one cell to the next cell]
Minimum 3 instructions:
• Bypass (all 1): used to bypass any serial data registers in a chip with a 1-bit register. This allows specific chips to be tested in a serial-scan chain without having to shift through the accumulated SR stages in all the chips.
• Extest (all 0): testing of off-chip circuitry.
• Sample/preload: places the boundary scan registers (at the chip's I/O pins) in the DR chain, and samples or preloads the chip's I/Os.
Coming Up...
Next time: Top-down design. Hardware description languages, logic synthesis.
Readings, Weste:
• 7.3 through 7.3.3.3 (ad-hoc & scan-based testing)
• 7.3.4 through 7.3.4.1 (BILBO)
• 7.3.5 (Iddq testing)
• 7.5 (boundary scan)
VLSI Design II
Small Signal FET Model and Diode Models
Overview
• small signal equivalent circuit for FETs and diodes
• advanced large-signal FET modeling and second-order effects
Goal: You can use the small signal equivalent circuit of a diode and a MOS transistor. You are able to determine the parameters of a FET and have a feeling for a MOS FET. You are familiar with advanced modeling like weak inversion, short-channel effects and leakage.
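$$I_{DS(sat)} = \frac{\mu C_{ox}}{2}\,\frac{W}{L}\,(V_{GS}-V_{th})^2\,[1+\lambda(V_{DS}-V_{eff})]$$
$$\lambda = \frac{k_{rds}}{2L\sqrt{V_{DS}-V_{eff}+\Phi_0}}, \qquad k_{rds} = \sqrt{\frac{2\varepsilon_{Si}}{qN_A}}$$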
Body effect
log IDS
Ut
UGS
= 1 LE c
1 Rsx nCoxWEc
Id UGS UGS
Rsx
Hot-carrier effects, especially in nfets due to their higher mobility: for VG >> Vth and VD >> 0, high-velocity electrons can generate electron-hole pairs, which causes a drain-to-substrate current and a reduced output impedance.
$$I_{lk} \approx \frac{qA_j n_i}{2\tau_0}\,x_d, \qquad \tau_0 \approx \frac{1}{2}(\tau_n+\tau_p)$$
$$x_d = \sqrt{\frac{2\varepsilon_{si}}{qN_A}\,(\Phi_0+V_R)}$$
Small-signal parameters are denoted with small letters.
Small-signal parameters are very handy for building simple equivalent circuits.
Transconductance #1
The most important small-signal component is the transconductance. A transconductance behaves like a voltage-controlled current source: it describes the change of output current when the input voltage is varied.
gm: main transconductance; describes the change of the drain current when a voltage is applied between gate and source.
gds: output conductance; accounts for the finite output impedance of the transistor. It models the channel-length modulation effect when the drain-to-source voltage varies.
gs: body transconductance; describes how the output current depends on the source-to-substrate voltage (body effect).
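Transconductance #2
$$g_s = \frac{\gamma\,g_m}{2\sqrt{V_{SB}+2|\Phi_F|}}$$
$$g_{ds} = \frac{\partial I_D}{\partial V_{DS}}, \qquad \frac{1}{r_{ds}} = g_{ds} = \lambda I_{Dsat} \approx \lambda I_D$$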
the negative sign is eliminated by changing the current direction in the equivalent circuit
vs
Depending on the terminal voltages and the relative size of the parameters, some of the components may be ignored. This helps to reduce the complexity of hand calculations.
The alternate low-frequency T model:
vg is rs=1/gm vs
vd is rds
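The dynamic response of MOS systems strongly depends on the parasitic capacitances associated with the MOS transistor.
$$C_{gs} = WC_{ox}\left(\tfrac{2}{3}L + L_{ov}\right) = \tfrac{2}{3}WLC_{ox} + WC_{GS0}$$
$$C_{gd} = WL_{ov}C_{ox} = WC_{GD0}$$
$$C_{jx} = \frac{C_{j0}}{(1+V_{XB}/\Phi_0)^{M_j}}, \qquad C_{j\text{-}sw,x} = \frac{C_{j\text{-}sw0}}{(1+V_{XB}/\Phi_0)^{M_{jsw}}}$$
$$C'_{sb} = (A_s+A_{ch})\,C_{js}, \qquad C_{s\text{-}sw} = P_s\,C_{j\text{-}sw,s}, \qquad C_{sb} = C'_{sb}+C_{s\text{-}sw}$$
$$C'_{db} = A_d\,C_{jd}, \qquad C_{d\text{-}sw} = P_d\,C_{j\text{-}sw,d}, \qquad C_{db} = C'_{db}+C_{d\text{-}sw}$$
[Figure: nfet cross section in the active region (VGS > Vth, VSB = 0, VDG > -Vth) on a p- substrate with n+ source/drain, p+ field implant, poly gate, Al contacts and SiO2, showing Cgs, Cgd, Csb, Cdb, Cs-sw, Cd-sw and the overlap length Lov, together with the small-signal model nodes vg, vd, vs.]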
The gate capacitance Cgs is normally the largest parasitic capacitance of a fet.
The gate-drain overlap capacitance Cgd is normally small, but can play a role when the voltage gain is large (Miller effect).
The source capacitance Csb is normally the second-largest capacitance, since it includes the channel-bulk capacitance.
The drain capacitance Cdb is normally the smallest capacitance.
vs
vd
As the channel has disappeared we have:
$$C_{gs} = C_{gd} = WL_{ov}C_{ox} = WC_{GX0}$$
but we also have a new capacitor:
$$C_{gb} = A_{ch}C_{ox}$$
The capacitors Csb and Cdb are smaller as the channel is not present:
$$C_{xb} = A_X C_{jx} + P_x C_{j\text{-}sw,x}$$
Diodes
[Figure: cross sections of a p+/n-well diode and an n+/p-well diode, with anode and cathode contacts (Al) and SiO2.]
Note that the metal contacts to the diode are connected to heavily doped regions of the pn junction.
Schottky diode: a metal contact to a lightly doped semiconductor forms a Schottky diode.
[Figure: Schottky diode cross section with an Al anode on the n-well and an n+ cathode contact.]
Diode Modeling
If a diode is reverse-biased, current flow is extremely small and primarily due to thermally, electric-field or optically generated carriers.
Depletion region:
$$\Phi_0 = V_T \ln\frac{N_A N_D}{n_i^2}$$
Depletion capacitance Cj:
$$C_j = \frac{C_{j0}}{(1+V_R/\Phi_0)^{M_j}}, \qquad C_{j0} = \sqrt{\frac{q\,\varepsilon_{si}}{2\Phi_0}\,\frac{N_D N_A}{N_A+N_D}}$$
Large-signal model for a forward-biased junction:
$$I_D = I_S\,e^{V_D/V_T}, \qquad I_S \propto A_D\left(\frac{1}{N_A}+\frac{1}{N_D}\right)$$
Diffusion capacitance Cd:
$$C_T = C_d + A\,C_j, \qquad C_d = \tau_T\,\frac{I_D}{V_T}$$
(Cd = 0 for forward-biased Schottky diodes)
Small-signal model for a forward-biased diode:
$$\frac{1}{r_d} = \frac{dI_D}{dV_D} = \frac{I_D}{V_T}$$
Coming Up...
Next time: Basic current mirrors and single stage amplifiers.
Readings for next time Johns&Martin:
1 through 1.1 (pn junctions)
1.2 (mos transistor)
1.2 (advanced mos modeling)
#1
Johns&Martin 1.1 pp7: Ex1.4 (difficulty: easy): Assuming process C05M-D. a) Calculate the total zero-bias depletion capacitance CT-j0 of a p+/n-well diode with an area of 5 µm times 5 µm. Do not use the Spice parameter CJ. b) Calculate the capacitance Cj again at 3 V reverse bias. Result: a) CT-j0 = 16.3 fF, b) CT-j = 8.98 fF
Johns&Martin 1.1 pp10: Ex1.6 (difficulty: medium): Assuming process C05M-D and Mj = 0.5 (use Spice parameter CJ). A reverse-biased p+/n-well diode is charged from 0 V to 3.3 V through a 10 kΩ resistor. Calculate the time to charge the diode to 2/3 of its end value. Result: t66% = 130 ps (Johns: see eq. 1.36 pp10)
Johns&Martin 1.2 pp31: 1.9 (difficulty: easy): Assuming process C05M-D. a) Derive the low-frequency parameters for an nfet with W = 10 µm and L = 0.5 µm at Vgs = 1.1 V, Vds = Veff, Vsb = 0.55 V. b) What is the new value of rds if the drain-source voltage is increased by 0.55 V? Result: a) gm = 0.98 mA/V, gs = 0.143 mA/V, rds = 208 kΩ, b) rds = 12.8 kΩ ????
#2
Johns&Martin 1.2 pp33: 1.10 (difficulty: easy): Assuming process C05M-D. Find the T-model parameter rs for the nfet of example 1.9a. Result: rs = 502 Ω
Johns&Martin 1.2 pp36: 1.12 (difficulty: easy): Assuming process C05M-D. Find the gds for the nfet of example 1.9 working in the triode region with Vds near zero. Result: gm = 1.99 mA/V, rds = 502 Ω
Johns&Martin 1.9 pp79: 1.7 (difficulty: easy): Assuming process C05M-D. a) Find ID for an nfet with W = 10 µm, L = 0.5 µm and VGS = 1.1 V, VDS = Veff. b) Assuming λ remains constant, estimate the new value of ID if VDS is increased by 0.3 V. Result: a) ID = 487 µA, b) ID = 513 µA
#3
Ex600a: Johns&Martin 1.9 pp79: 1.8 (difficulty: easy): Assuming process C05M-D. Simulate a fet with W = 10 µm, L = 2 µm in its active region (VGS = 2 V) and measure the drain current at VDS1 = 2 V and at VDS2 = 3 V. Estimate the output impedance rds and the channel-length modulation factor λ. Result: rds = 402 kΩ, λ = 0.006
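The estimate asked for in this exercise can be sketched in a few lines of Python. The two drain currents below are hypothetical placeholders, not simulation results for the C05M-D process. The sketch takes rds as the slope between the two measured points and approximates λ as 1/(rds·ID1), which assumes λ(VDS1 - Veff) << 1.

```python
def output_resistance(vds1, id1, vds2, id2):
    """Small-signal output resistance from two points in the active region."""
    return (vds2 - vds1) / (id2 - id1)


def channel_length_modulation(rds, id_sat):
    """lambda ~ 1 / (rds * ID); valid while lambda * (VDS - Veff) << 1."""
    return 1.0 / (rds * id_sat)


# Hypothetical simulated currents (placeholders, not real process data)
vds1, id1 = 2.0, 400.0e-6
vds2, id2 = 3.0, 402.5e-6

rds = output_resistance(vds1, id1, vds2, id2)
lam = channel_length_modulation(rds, id1)
print(f"rds = {rds / 1e3:.0f} kOhm, lambda = {lam:.5f} 1/V")
```

With real Spice data the two currents would simply be replaced by the measured values at VDS1 and VDS2.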
#4
Ex vlsi24.1 (difficulty: easy): Assuming process C05M-D. Find the capacitances of the nfet shown below in its active region for Vsb = 1 V, Vdb = 2 V. Result: Cgs = 3.86 fF, Csb = 3.09 fF, Cdb = 1.94 fF, Cgd = 0.41 fF (see Johns&Martin pp35)
[Figure: nfet layout with dimensions 0.5 µm, 0.6 µm, 3 µm and 0.6 µm.]
#5
Ex vlsi24.2 (difficulty: easy): Assume the transistors are designed with minimal dimensions using the 0.5 µm Alcatel Mietec process. Use the rules to calculate the Cgs, Csb and Cdb capacitances for the active region. Compare the values with a single-device fet. Result: a) Cdb = 26.6 fF, Csb = 49.1 fF, Cgs = 34.8 fF (see Johns&Martin pp103ff)
[Figure: transistors Q1 to Q4 sharing junctions J1 to J5 between node 1 and node 2, with common gates; dimension label 27.]
#6
Ex600b: (difficulty: easy, medium time): Assume the transistors are designed with W = 10 µm and L = 2 µm using the 0.5 µm Alcatel Mietec process. Simulate the fet with Spice in strong and weak inversion. Plot IDS, sqrt(IDS) and log(IDS) versus VGS, identify the different regions and find ID0. Result: compare with transparency #3
VLSI Design II
Basic Current Mirrors and Single Stage Amplifiers
Hey! That's me!
Goal: You know the properties of the different amplifier stages and are able to choose the one which is best suited for your application. You can determine the fet dimensions from a given circuit specification. You are familiar with current mirrors. You can apply two possible techniques for improving the output impedance. You know the resulting limitations on the output voltage swing.
MicroLab, vlsi-25 (1/26)
JMM v1.2
Outline
Current mirrors
Single stage amplifiers with active loads
Johns&Martin
nodal analysis method
simple CMOS current mirror (chap 3.1)
common-source amplifier (chap 3.2)
source-follower or common-drain amplifier (chap 3.3)
common-gate amplifier (chap 3.4)
source-degenerated current mirror (chap 3.5)
high-output-impedance current mirrors (chap 3.6)
cascode gain stage (chap 3.7)
Exercises
hand calculations spice simulations
Iin
linear
Vds
vs
Iin V1 Q1
Iout rout Q2
Q1
v1 1/gm1=rs1
MicroLab, vlsi-25 (4/26)
1/gm1
active load
Q3 Q2 rout Vout common source amplifier stage
Ibias
Vin
Q1
Ibias
Vin
Q1
Rin + vin ~ -
Q1 + gm1vgs1 vgs1 -
R2
vout
active load
rds1
rds2
Ibias
Vin
active load
Q1 vin =vg1 vd1 + gm1vgs1 vgs1 gs1vs1 vs1 rds2 rds1 vout=vs1
active load
Nodal analysis at the source node vs1 of the source follower:
$$v_{s1}(g_{ds1}+g_{ds2}) - v_d\,g_{ds1} + g_{s1}v_{s1} - g_{m1}v_{gs1} = 0$$
the first term is the node voltage multiplied by the sum of all admittances connected to the node
the next negative terms are the adjacent node voltages, each multiplied by the connecting admittance
the last terms are any current sources, with a multiplying negative sign used if the current is shown to flow into the node
With vd = 0 (ac ground), vs1 = vout and vgs1 = vin - vout:
$$v_{out}(g_{ds1}+g_{ds2}) + g_{s1}v_{out} - g_{m1}(v_{in}-v_{out}) = 0$$
$$A_v = \frac{v_{out}}{v_{in}} = \frac{g_{m1}}{g_{m1}+g_{s1}+g_{ds1}+g_{ds2}}$$
gs1 is 5 to 10% of the value of gm1; gds1 and gds2 are on the order of 1/10 of gs1.
The body-effect parameter gs1 is the major source of the error causing the gain to be less than unity.
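The source-follower gain expression above is easy to evaluate numerically. The following sketch uses hypothetical conductance values chosen only to match the stated orders of magnitude (gs1 about 10% of gm1, gds about 10% of gs1); they are not taken from any process.

```python
def source_follower_gain(gm1, gs1, gds1, gds2):
    """Av = gm1 / (gm1 + gs1 + gds1 + gds2) from the nodal equation."""
    return gm1 / (gm1 + gs1 + gds1 + gds2)


# Hypothetical values following the rule-of-thumb ratios quoted above
gm1 = 1.0e-3          # A/V
gs1 = 0.1 * gm1       # body-effect transconductance
gds1 = gds2 = 0.1 * gs1

av = source_follower_gain(gm1, gs1, gds1, gds2)
av_no_body = source_follower_gain(gm1, 0.0, gds1, gds2)
print(f"Av = {av:.3f}, without body effect Av = {av_no_body:.3f}")
```

Comparing the two results shows how much of the gain loss is caused by the body effect alone.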
active load
Q3 Q2 Vout Ibias Vbias Vin Q1 vd1 + gm1vgs1 vgs1 vs1 RS vin
Q1 rin
vout RL
gs1vs1
rds1 rin
active load
active load
v s1 = v gs1
thus
RS vin
Common-gate amplifier: used as a gain stage when a small input impedance is desired; it can be used as the first stage of an amplifier designed to amplify current rather than voltage.
$$A_v = \frac{v_{out}}{v_{in}} = \frac{G_s}{G_s + \dfrac{g_{m1}+g_{s1}+g_{ds1}}{1+g_{ds1}/G_L}}\cdot\frac{g_{m1}+g_{s1}+g_{ds1}}{G_L+g_{ds1}}$$
The output impedance of the basic two-transistor current mirror can be increased by degeneration resistors Rs.
[Figure: source-degenerated current mirror (Iin, Iout, V1, Q1, Q2, two resistors Rs) and the small-signal model used to find rout with a test source vx, ix.]
Impedance increase:
$$r_{out} = \frac{v_x}{i_x} = r_{ds2}\,[1 + R_s(g_{m2}+g_{s2}+g_{ds2})]$$
Iin
Q3 Q1
Q4 Q2
$$V_{eff} = \sqrt{\frac{2I_D}{\mu_n C_{ox}(W/L)}}$$
$$V_{g3} = V_{gs1}+V_{gs3} = 2V_{eff}+2V_{tn}$$
$$V_{ds2} = V_{g3}-V_{gs4} = V_{g3}-(V_{eff}+V_{tn}) = V_{eff}+V_{tn}$$
impedance impedance + gm3vgs3 vgs3 gs3vs3 rds3 vg4 vgs4=-vs4 g v + m4 gs4 vgs4 gs4vs4 vs4 gs2vs2 rds2 rds4 iout vout
rds1
Iin rin Q3 Q1
Iout rout Q4 Q2
Q2 senses the output current and mirrors it to Id1. Iin and Id1 must match precisely; otherwise Vg3 increases or decreases.
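folded-cascode amplifier
[Figure: folded-cascode amplifier with input transistor Q1, p-channel common-gate transistor Q2, current sources Ibias and Ibias2, Vbias, Vout and CL, and the small-signal model (gm1vgs1, gs1vs1, rds1, gm2vgs2, gs2vs2, rds2) with test source vx, ix.]
$$r_x \approx g_{m2}\,r_{ds1}\,r_{ds2}, \qquad r_{d2} \approx g_{m2}\,r_{ds1}\,r_{ds2}$$
For a high-impedance Ibias with
$$R_L \approx g_{m\text{-}p}\,r_{ds\text{-}p}^2$$
the output impedance is
$$r_{out} \approx \frac{g_m r_{ds}^2}{2}$$
Cascode gain stage: due to the large impedance at the output, high gain can be realized with cascode gain stages:
$$A_v = \frac{v_{out}}{v_{in}} \approx -\frac{1}{2}\left(\frac{g_m}{g_{ds}}\right)^2$$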
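As a quick numerical check of the cascode relations rout ≈ gm·rds²/2 and Av ≈ -(1/2)(gm/gds)², the sketch below plugs in gm = 0.5 mA/V and rds = 100 kΩ, the values used in exercise 3.6 of this deck. It assumes all transistors share the same gm and rds, as the simplified formulas do.

```python
def cascode_stage(gm, rds):
    """Output resistance and gain of a cascode stage with a cascode load."""
    rout = gm * rds ** 2 / 2.0        # gm*rds^2 in parallel with an equal load
    av = -0.5 * (gm * rds) ** 2       # -(1/2) * (gm / gds)^2 with gds = 1/rds
    return rout, av


rout, av = cascode_stage(gm=0.5e-3, rds=100e3)
print(f"rout = {rout / 1e6:.2f} MOhm, Av = {av:.0f}")
```

The result reproduces the 2.5 MΩ and -1250 quoted for that exercise.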
Coming Up...
Next topic: Frequency response of single stage amplifiers
Readings for next time Johns&Martin:
nodal analysis method
simple CMOS current mirror (chap 3.1)
common-source amplifier (chap 3.2)
source-follower or common-drain amplifier (chap 3.3)
common-gate amplifier (chap 3.4)
source-degenerated current mirror (chap 3.5)
high-output-impedance current mirrors (chap 3.6)
cascode gain stage (chap 3.7)
Exercises:
Have a look at the exercises in Johns&Martin.
CAD exercise Ex601
#1
Johns&Martin chap 3.1 pp127: 3.1 (difficulty: easy): Consider the current mirror shown on transparency vlsi25/3 where Iin = 100 µA and each transistor has W = 10 µm and L = 2 µm. Given rds = 88000 [L (µm)]/[ID (mA)], find rout for the current mirror and the value of gm1. Also estimate the change in Iout for a 0.5 V change in the output voltage. Result: rout = 1.76 MΩ, gm1 = 0.45 mA/V, dIout = 0.28 µA
Johns&Martin chap 3.2 pp129: 3.2 (difficulty: easy): Consider the common-source stage shown on transparency vlsi25/7 where Iin = 100 µA and all transistors have W = 10 µm and L = 2 µm. Given rds-n = 88000 [L (µm)]/[ID (mA)], rds-p = 50000 [L (µm)]/[ID (mA)]. What is the gain of the stage? Result: Av = -287
#2
Johns&Martin chap 3.3 pp131: 3.3 (difficulty: easy): Consider the source follower shown on transparency vlsi25/8 where Ibias = 100 µA and all transistors, designed with the Alcatel 0.5 µm process, have W = 10 µm and L = 2 µm. Given γn = 0.45 V^1/2, Vsb = 2 V, and rds-n = 88000 [L (µm)]/[ID (mA)]. What is the gain of the stage? Result: Av = 0.88
Johns&Martin chap 3.5 pp136: 3.4 (difficulty: easy): Consider the source-degenerated current mirror shown on transparency vlsi25/15 where Ibias = 100 µA and all transistors, designed with the Alcatel 0.5 µm process, have W = 100 µm and L = 2 µm. Given γn = 0.45 V^1/2, Vsb = 2 V, Rs = 5 kΩ, and rds-n = 88000 [L (µm)]/[ID (mA)]. What is the increase in output resistance compared to the simple current mirror? Result: increase = 9.1, rout = 16 MΩ
#3
Johns&Martin chap 3.6 pp138: 3.5 (difficulty: easy): Consider the cascode current mirror shown on transparency vlsi25/15 where Iin = 100 µA and all transistors have W = 10 µm and L = 2 µm. Given VSB4 = 1 V and rds-n = 50000 [L (µm)]/[ID (mA)]. What are the output impedance and the minimal output voltage? Result: rout = 527k, Vout(min) = 1.5 V
Johns&Martin chap 3.7 pp142: 3.6 (difficulty: easy): Consider the telescopic cascode gain stage shown on transparency vlsi25/20 assuming gm = 0.5 mA/V and rds = 100 kΩ. What are the output impedance and the gain? Result: rout = 2.5 MΩ, Av = -1250
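The arithmetic behind exercises 3.1 and 3.2 above can be checked with a short script. It uses only numbers stated in the exercises; the one assumption is that the common-source transistor in 3.2 has the same gm = 0.45 mA/V as in 3.1, since it has the same size and bias current.

```python
def rds_ohm(k, length_um, id_ma):
    """Empirical output resistance: rds = k * L(um) / ID(mA), in ohms."""
    return k * length_um / id_ma


def parallel(a, b):
    return a * b / (a + b)


# Exercise 3.1: simple current mirror, L = 2 um, ID = 0.1 mA
rout = rds_ohm(88000, 2.0, 0.1)
delta_iout = 0.5 / rout                 # current change for 0.5 V at the output

# Exercise 3.2: common-source stage with p-channel active load
gm1 = 0.45e-3                           # A/V, assumed equal to the 3.1 result
r2 = parallel(rout, rds_ohm(50000, 2.0, 0.1))
av = -gm1 * r2

print(f"rout = {rout / 1e6:.2f} MOhm, dIout = {delta_iout * 1e6:.2f} uA")
print(f"Av = {av:.0f}")
```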
VLSI Design II
Frequency Response of Single Stage Amplifiers
[Figure: Bode magnitude plot, gain in dB (ticks at 20 and 40 dB) versus frequency from 10^3 to 10^9 Hz.]
Circuit Analysis
the precise way: solving complex equations
the approximate way: find the dominant pole
the handy way: let Spice do it precisely
Goal: You are able to identify the dominant pole in a transistor circuit. You can approximately determine the contribution of each node in a circuit to the total frequency response.
MicroLab, vlsi26 (1/29)
JMM v1.2
Outline
Frequency response
common-source amplifier
source-follower amplifier
source-follower amplifier with compensation technique
cascode gain stage
Johns&Martin
frequency response (chap 3.11)
Gray&Meyer
estimation of dominant poles
zero-value time constant analysis (pp500 ff) (Analysis and Design of Analog Integrated Circuits, 3rd edition, Wiley and Sons, ISBN 0471-59984-0)
Exercises
hand calculations spice simulations
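$$A(s) = \frac{N(s)}{D(s)} = \frac{a_0 + a_1 s + a_2 s^2 + \dots + a_m s^m}{1 + b_1 s + b_2 s^2 + \dots + b_n s^n}$$
very often the zeros are unimportant, thus
$$A(s) = \frac{K}{\left(1-\dfrac{s}{p_1}\right)\left(1-\dfrac{s}{p_2}\right)\cdots\left(1-\dfrac{s}{p_n}\right)}$$
where K is a constant and p1, p2, ... are the poles of the transfer function, thus
$$b_1 = \sum_{i=1}^{n}\left(-\frac{1}{p_i}\right)$$
If there is a dominant pole,
$$|p_1| \ll |p_2|, |p_3|, \dots$$
thus
$$b_1 \approx \left|\frac{1}{p_1}\right| \quad\text{since}\quad \left|\frac{1}{p_1}\right| \gg \left|\sum_{i=2}^{n}\left(-\frac{1}{p_i}\right)\right|$$
$$|A(j\omega)| = \frac{K}{\sqrt{\left[1+\left(\frac{\omega}{p_1}\right)^2\right]\left[1+\left(\frac{\omega}{p_2}\right)^2\right]\cdots\left[1+\left(\frac{\omega}{p_n}\right)^2\right]}} \approx \frac{K}{\sqrt{1+\left(\frac{\omega}{p_1}\right)^2}}$$
$$\omega_{-3dB} \approx |p_1| \approx \frac{1}{b_1}$$
[Figure: s plane with the poles p3, p2, p1 on the negative real axis and the jω axis.]
$$\tau_x = R_x C_x$$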
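The quality of the dominant-pole estimate ω-3dB ≈ 1/b1 can be checked numerically. The sketch below uses three hypothetical real-axis poles (not taken from any circuit in these slides), computes b1 as the sum of -1/p_i, and compares 1/b1 with the exact -3 dB frequency found by bisection on |A(jω)|.

```python
import math


def b1_from_poles(poles):
    """b1 = sum(-1/p_i) for an all-pole transfer function."""
    return sum(-1.0 / p for p in poles)


def gain_magnitude(omega, poles, k=1.0):
    """|A(jw)| for real poles: K / prod(sqrt(1 + (w/p_i)^2))."""
    mag = k
    for p in poles:
        mag /= math.sqrt(1.0 + (omega / p) ** 2)
    return mag


def exact_3db(poles, lo=1.0, hi=1e12):
    """Bisection (on a log scale) for the frequency where |A| = K/sqrt(2)."""
    target = 1.0 / math.sqrt(2.0)
    for _ in range(200):
        mid = math.sqrt(lo * hi)
        if gain_magnitude(mid, poles) > target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


poles = [-1e6, -1e8, -5e8]          # rad/s, hypothetical example values
estimate = 1.0 / b1_from_poles(poles)
exact = exact_3db(poles)
print(f"1/b1 = {estimate:.4g} rad/s, exact = {exact:.4g} rad/s")
```

The estimate is slightly pessimistic because the non-dominant poles also contribute to b1; the closer they are to p1, the larger the error becomes.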
vout
Rin + vin ~ -
rb
v3
i2 C
+ +
i1 C
v - 1
v2
gmv1
RL
vout
We can show that with this choice of variables the circuit equations are of the form:
$$i_1 = (g_{11}+sC_\pi)v_1 + g_{12}v_2 + g_{13}v_3$$
$$i_2 = g_{21}v_1 + (g_{22}+sC_\mu)v_2 + g_{23}v_3$$
$$i_3 = g_{31}v_1 + g_{32}v_2 + (g_{33}+sC_x)v_3$$
The determinant of the circuit equations can be written as:
$$\Delta(s) = K_0 + K_1 s + K_2 s^2 + K_3 s^3 = K_0(1 + b_1 s + b_2 s^2 + b_3 s^3)$$
If all capacitors are zero:
$$K_0 = \Delta\big|_{C_\pi=C_\mu=C_x=0} \equiv \Delta_0$$
Consider now the term K1 s: this is the sum of the terms involving s that are obtained when the system determinant is evaluated. However, s only occurs when associated with a capacitance:
$$K_1 s = h_1 sC_\pi + h_2 sC_\mu + h_3 sC_x$$
The h terms are constants. h1 can be evaluated by expanding the determinant about the first row:
$$\Delta(s) = (g_{11}+sC_\pi)\Delta_{11} + g_{12}\Delta_{12} + g_{13}\Delta_{13}$$
with cofactors Δxx of the determinant. The coefficient of sCπ is found by evaluating Δ11 with Cμ and Cx equal to zero:
$$h_1 = \Delta_{11}\big|_{C_\mu=C_x=0}$$
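Expanding about the second row:
$$\Delta(s) = g_{21}\Delta_{21} + (g_{22}+sC_\mu)\Delta_{22} + g_{23}\Delta_{23}$$
with cofactors Δxx of the determinant. The coefficient of sCμ is found by evaluating Δ22 with Cπ and Cx equal to zero:
$$h_2 = \Delta_{22}\big|_{C_\pi=C_x=0}$$
similarly
$$h_3 = \Delta_{33}\big|_{C_\pi=C_\mu=0}$$
and:
$$K_1 = \Delta_{11}\big|_{C_\mu=C_x=0}\,C_\pi + \Delta_{22}\big|_{C_\pi=C_x=0}\,C_\mu + \Delta_{33}\big|_{C_\pi=C_\mu=0}\,C_x$$
$$b_1 = \frac{K_1}{K_0} = \frac{\Delta_{11}\big|_{C_\mu=C_x=0}}{\Delta_0}\,C_\pi + \frac{\Delta_{22}\big|_{C_\pi=C_x=0}}{\Delta_0}\,C_\mu + \frac{\Delta_{33}\big|_{C_\pi=C_\mu=0}}{\Delta_0}\,C_x$$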
$$v_1 = \frac{\Delta_{11}}{\Delta(s)}\,i_1$$
The driving-point resistance at the Cπ node pair with all capacitors equal to zero is therefore Δ11/Δ evaluated at Cπ = Cμ = Cx = 0. We now define
$$R_{\pi 0} \equiv \frac{\Delta_{11}\big|_{C_\mu=C_x=0}}{\Delta_0}$$
We can now write:
$$b_1 = R_{\pi 0}C_\pi + R_{\mu 0}C_\mu + R_{x0}C_x$$
Thus:
$$\omega_{-3dB} \approx \frac{1}{b_1} = \frac{1}{R_{\pi 0}C_\pi + R_{\mu 0}C_\mu + R_{x0}C_x}$$
Thus the sum of the zero-value time constants leads to the -3 dB frequency.
The precise way: Add the parasitic capacitors to the equivalent circuit. Use nodal analysis for evaluating the transfer function.
The approximate way:
if there exists a pole p1 << p2, p3, ..., and the transfer function is already given in the form A(s) = N(s)/D(s) with $D(s) = 1 + b_1 s + b_2 s^2 + \dots + b_n s^n$, the pole p1 is given by $|p_1| \approx 1/b_1$
the dominant pole may be found directly in the circuit diagram by looking for the node with the largest impedance. Take care of the Miller effect.
The time constant (and its influence on the frequency response) associated with a single parasitic capacitor can be estimated with the zero-value time constant method:
set all independent sources to zero
replace the capacitor of interest Cx by a voltage source Vx
set all other capacitors to zero
evaluate the impedance Rx seen by the voltage source Vx
the time constant is equal to Cx Rx
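[Figure: small-signal model of the common-source amplifier with source resistance Rin, node v1 with Cgs1, feedback capacitance Cgd1, and the output node with R2 (rds of Q1 and Q2 in parallel) and C2.]
$$a = R_{in}[C_{gs1}+C_{gd1}(1+g_{m1}R_2)] + R_2(C_{gd1}+C_2)$$
$$\omega_{-3dB} \approx \frac{1}{a} = \frac{1}{R_{in}[C_{gs1}+C_{gd1}(1+g_{m1}R_2)] + R_2(C_{gd1}+C_2)}$$
The term Cgd1(1 + gm1 R2) is the Miller capacitance (dominant for Rin >> R2).
$$b = R_{in}R_2(C_{gd1}C_{gs1}+C_{gs1}C_2+C_{gd1}C_2)$$
$$\omega_{p2} \approx \frac{g_{m1}C_{gd1}}{C_{gs1}C_{gd1}+C_{gs1}C_2+C_{gd1}C_2}$$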
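The -3 dB estimate for the common-source amplifier can be evaluated with the values of exercise 3.8 of this deck (Rin = 180 kΩ, CL = 0.3 pF, Cgs1 = 0.2 pF, Cgd1 = 15 fF, Cdb1 = 20 fF, Cdb2 = 36 fF, µnCox = 90 µA/V², W/L = 100/1.6, ID = 100 µA). Two points are assumptions of this sketch: the output capacitance is taken as C2 = CL + Cdb1 + Cdb2, and gm1 is computed from the square-law relation gm = sqrt(2 µnCox (W/L) ID).

```python
import math


def parallel(a, b):
    return a * b / (a + b)


# Bias and device parameters from exercise 3.8
i_d = 100e-6                             # A
gm1 = math.sqrt(2 * 90e-6 * (100.0 / 1.6) * i_d)
rds_n = 8000 * 1.6 / 0.1                 # ohms, rds = 8000 * L(um) / ID(mA)
rds_p = 12000 * 1.6 / 0.1
r2 = parallel(rds_n, rds_p)

rin = 180e3
cgs1, cgd1 = 0.2e-12, 15e-15
c2 = 0.3e-12 + 20e-15 + 36e-15           # assumed: CL + Cdb1 + Cdb2

# Sum of zero-value time constants, including the Miller-multiplied Cgd1
a = rin * (cgs1 + cgd1 * (1 + gm1 * r2)) + r2 * (cgd1 + c2)
f_3db = 1.0 / (2 * math.pi * a)
print(f"gm1 = {gm1 * 1e3:.2f} mA/V, R2 = {r2 / 1e3:.1f} kOhm")
print(f"f_-3dB = {f_3db / 1e3:.0f} kHz")
```

Under these assumptions the script arrives at about 554 kHz, in line with the result quoted for the exercise.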
Iin
Rin
Cin
Cgd1 iin Rin Cin Cgs1 + gm1vgs1 -vgs1 rds2 gs1vs1 vs1 Cs
iin
Rin
$$R_{s1} = r_{ds1}\,\|\,r_{ds2}\,\|\,(1/g_{s1})$$
1. gain from vg1 to vout is found
2. admittance Yg looking into the gate of Q1, without considering Cgd1, is found
3. gain from iin to vg1 is found
4. overall gain from vin to vout is found and the results are interpreted
$$Y_g = \frac{i_{g1}}{v_{g1}} = \frac{sC_{gs1}(sC_s+G_{s1})}{s(C_{gs1}+C_s)+g_{m1}+G_{s1}}$$
$$\frac{v_{g1}}{i_{in}} = \frac{s(C_{gs1}+C_s)+g_{m1}+G_{s1}}{a+sb+s^2c}$$
$$A(s) = A(0)\,\frac{N(s)}{1+\dfrac{s}{\omega_0 Q}+\dfrac{s^2}{\omega_0^2}}$$
There is no peaking and the transfer function's maximum is at dc if Q < 1/√2 ≈ 0.707.
ω0 is the -3 dB frequency if Q = 1/√2.
Step input: no peaking for Q ≤ 0.5, peaking for Q > 0.5 (complex conjugate poles).
For the source follower:
$$\omega_z = \frac{g_{m1}}{C_{gs1}}, \qquad \omega_0 = \sqrt{\frac{G_{in}(g_{m1}+G_{s1})}{C_{gs1}C_s + C'_{in}(C_{gs1}+C_s)}}$$
(for the expression for Q see Johns/Martin pp160-162)
Source follower circuits can exhibit large amounts of overshoot under certain conditions. In practical microelectronic circuits the parasitic capacitances and the output capacitance result in only moderate overshoot for worst-case conditions.
Iin
Rin
Cin
C1 R1
C1 =
(g m 1 + G s1 )(C gs 1 + C s ) (g m1 + G s1 )(C gs 1 + C s )
C gs 1
C gs 1 (C s g m 1 C gs 1 G s 1 )
2
g m 1 C gs 1 C s
R1 =
(C + G ) (C (C g C G ) C
gs 1 s s m1 gs 1 s1
2 ) + G gs 1 s gs 1 s m 1
$$C_2 = \frac{C_{gs1}C_s}{C_{gs1}+C_s}$$
(see Johns/Martin pp160-162)
vout
Iin
Iin
Iout rout
Q3 Q1
Q4 Q2
Q3 Q1
Q4 Q2
$$\omega_{-3dB} \approx \frac{1}{R_{out}C_L} \approx \frac{2g_{ds}^2}{g_m C_L}$$
$$C_{d2} = C_{gd2}+C_{db2}+C_L+C_{bias}, \qquad C_{s2} = C_{db1}+C_{sb2}+C_{gs2}$$
node vg1:
$$\tau_{Cgs1} = C_{gs1}R_{in}$$
nodes vg1, vs2: the capacitor Cgd1 is replaced by a voltage source vx in order to calculate the resistance seen from that node.
[Figure: test source vx, ix between vg1 (with Rin) and the drain node of Q1 (with gm1vg1 and Rd1).]
$$G_{d1} = g_{ds1} + Y_{s2}$$
where Ys2 = is/vs2 is the admittance looking into the source of the cascode transistor.
$$\tau_{Cgd1} = C_{gd1}R_{d1}(1+R_{in}[G_{d1}+g_{m1}])$$
For $R_L \approx g_m r_{ds}^2$ and $g_{ds} \ll g_m$: $Y_{s2} \approx g_{ds}$
Cgd 1 Cgd 1
node vs2
The resistance seen by the capacitor Cs2 is rds1 in parallel with the impedance seen looking into the source of Q2, which is approximately rds, thus:
$$\tau_{Cs2} \approx \frac{r_{ds}}{2}\,C_{s2}$$
node vout
The resistance seen by Cd2 is the output impedance of the cascode amplifier, thus:
$$\tau_{Cd2} \approx C_{d2}\,\frac{g_m r_{ds}^2}{2}$$
$$A(s) \approx \frac{A_v}{1+s/\omega_{-3dB}}$$
at frequencies substantially larger than ω-3dB:
$$A(s) \approx \frac{A_v}{s/\omega_{-3dB}} \approx -\frac{g_{m1}}{sC_L}$$
The upper limit of the unity-gain frequency of an amplifier that uses a cascode gain stage is set by the source node of Q2:
$$\omega_{p2} = \frac{1}{\tau_{s2}} > \frac{3\mu_p V_{eff2}}{2L_2^2}$$
Coming Up...
Next topic: Basic OpAmp design and compensation
Readings for next time Johns&Martin: Sections 3.11
Exercises: Have a look at the exercises in Johns&Martin.
#1
Johns&Martin chap 3.11 pp156: 3.8 (difficulty: easy): Consider the common-source amplifier shown on transparency vlsi-26/6 where Iin = 100 µA and all transistors have W = 100 µm and L = 1.6 µm. Given Rin = 180 kΩ, CL = 0.3 pF, Cgs1 = 0.2 pF, Cgd1 = 15 fF, Cdb1 = 20 fF, Cdb2 = 36 fF, µnCox = 90 µA/V², µpCox = 30 µA/V², rds-n = 8000 [L (µm)]/[ID (mA)] and rds-p = 12000 [L (µm)]/[ID (mA)]. Estimate the -3 dB frequency. Result: f-3dB = 554 kHz
Johns&Martin chap 3.11 pp160: 3.9 (difficulty: easy): Analyse the source follower and assume that Ibias = 100 µA and all transistors have W = 100 µm and L = 1.6 µm. Given Rin = 180 kΩ, CL = 10 pF, Cgs1 = 0.2 pF, Cgd1 = 15 fF, Csb1 = 40 fF, Cin = 30 fF, µnCox = 90 µA/V², µpCox = 30 µA/V², and rds-n = 8000 [L (µm)]/[ID (mA)]. Find ω0, Q, and ωz of the source follower. Result: ω0 = 52 MHz, Q = 0.8, % overshoot = 8.1%, ωz = 5.3 GHz
#2
Johns&Martin chap 3.11 pp166: 3.11 (difficulty: easy): Assume that for the input transistors and the cascode transistors, gm = 1 mA/V, rds = 100 kΩ, Rin = 180 kΩ, CL = 5 pF, Cgs = 0.2 pF, Cgd = 15 fF, Csb = 40 fF, Cdb = 20 fF, Cbias = 20 fF. Estimate the -3 dB frequency of the cascode amplifier (transparency 19). Result: ω-3dB = 2π · 6.3 kHz
Johns&Martin chap 3.11 pp168: 3.12 (difficulty: easy): Estimate the lower bound on the frequency of the second pole of a folded-cascode amplifier for a 0.8 µm technology, where a typical value of 0.25 V is chosen for Veff2, L2 = 1.5 Lmin, µp = 0.02 m²/Vs. Result: ωp2 = 2π · 414 MHz
Analog Microelectronics
Basic OpAmp Design and Compensation
Outline
Johns&Martin
- MOS differential pair and gain stage (chap 3.8)
- two-stage CMOS OpAmp (chap 5.1): gain, frequency response, systematic offset voltage
- (chap 5.3-5.5): first-order model of closed-loop amplifier, linear settling time, OpAmp compensation, compensation of two-stage OpAmp, lead compensation, making compensation independent of process and temperature, biasing OpAmp to have stable transconductance
Exercises
integrated amplifiers have differential input, realized with a differential transistor pair
ID2 V+ Q1 Ibias Q2 ID2 V-
ua
low-frequency small-signal equivalent circuit is based on the T model for the MOS transistor
id1=is1 v+ rs1 gate current is zero in T model is1 is2 id2=is2 vrs2
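Definition:
$$v_{in} \equiv v_+ - v_-$$
$$i_{d1} = i_{s1} = \frac{v_{in}}{r_{s1}+r_{s2}} = \frac{v_{in}}{1/g_{m1}+1/g_{m2}}$$
With gm1 = gm2:
$$i_{d1} = \frac{g_{m1}}{2}\,v_{in}, \qquad i_{d2} = -\frac{g_{m1}}{2}\,v_{in}$$
Definition:
$$i_{out} \equiv i_{d1} - i_{d2}$$
thus:
$$i_{out} = g_{m1}v_{in}$$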
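[Figure: differential pair Q1, Q2 with current-mirror load and output node with rout, vout and CL.]
$$i_{d4} = i_{d3} = -i_{s1}$$
and
$$i_{d2} = -i_{s1}$$
$$A_v = g_{m1}z_{out}$$
where zout is the impedance at the output node (rout in parallel with CL).
Thus, for this differential stage, a very simple model is used. This model implicitly assumes that the time constant at the output node is much larger than the time constant due to the parasitic capacitances at Q1 and Q2.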
JMM v1.0
[Figure: small-signal model for finding rout with a test source vx, ix (Q1, Q2, Q4, rs1, rds1, rds2).]
$$A_v = g_{m1}(r_{ds2}\,\|\,r_{ds4})$$
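The low-frequency gain Av = gm1 (rds2 || rds4) can be evaluated with the values of exercise ana3.9 of this deck (Ibias = 200 µA, W = 100 µm, L = 1.6 µm, µnCox = 92 µA/V², rds = 8000·L(µm)/ID(mA)). The sketch assumes that the p-channel load transistor has the same rds as the n-channel device, since the exercise only states the n-channel value, and that gm follows the square-law relation.

```python
import math


def parallel(a, b):
    return a * b / (a + b)


i_bias = 200e-6
i_d = i_bias / 2                       # each side of the pair carries Ibias/2
gm1 = math.sqrt(2 * 92e-6 * (100.0 / 1.6) * i_d)

rds2 = 8000 * 1.6 / (i_d * 1e3)        # ohms, ID expressed in mA
rds4 = rds2                            # assumption: load has the same rds
rout = parallel(rds2, rds4)
av = gm1 * rout

print(f"gm1 = {gm1 * 1e3:.3f} mA/V, rout = {rout / 1e3:.0f} kOhm, Av = {av:.1f} V/V")
```

With these inputs the script gives rout = 64 kΩ and a gain of about 68.6 V/V.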
MicroLab, vlsi27 (6/34)
JMM v1.0
IS1 + VIN Q1
Q4
Iout
Vout
Q2
IOUT Ibias
-3 -2 -1
1.5 1 0.5 0 0 1
-0.5 -1 -1.5
VI D / Ib i a s
/ I b i a s = 5. 4
VIN = 187mV
MicroLab, vlsi27 (7/34)
A1
1 output buffer
Vout
single-ended output, e.g. common-source gain stage with active load
output gain stage only present when resistive loads need to be driven
Q 10
Q 11
Q5300
VDD
300 25 25
Q 14 Q 15
100 25
Q 12 Q 13
Vin-
Q1
Q2
300
Q8 Vin+ Q 16 Vout
CC
300 500
150
150
Q3 Rb VSS
Q4
Q7
Q9
bias circuit
output buffer
- p-channel input stage
- all transistor lengths are 1.6 µm (1 µm process)
- reasonable sizes for the lengths of the transistors might be somewhere between 1.5 and 2 times the minimum transistor length
Gain for low-frequency applications is the most critical parameter of an OpAmp.
$$g_{m1} = \sqrt{2\mu_p C_{ox}\left(\frac{W}{L}\right)_1 I_{D1}} = \sqrt{2\mu_p C_{ox}\left(\frac{W}{L}\right)_1 \frac{I_{bias}}{2}}$$
Approximation to the finite output resistance, where α is a technology-dependent parameter: α ≈ 5e6 V^1/2/m, ignoring short-channel effects.
$$A_{v2} = -g_{m7}(r_{ds6}\,\|\,r_{ds7})$$
Gain of the third stage:
$$A_{v3} \approx \frac{g_{m9}}{G_L+g_{m9}+g_{ds8}+g_{ds9}}$$
Gain of the third stage with body effect (bulk not connected to source):
$$A_{v3} \approx \frac{g_{m9}}{G_L+g_{m9}+g_{s8}+g_{ds8}+g_{ds9}}, \qquad g_s = \frac{\gamma\,g_m}{2\sqrt{V_{SB}+2|\Phi_F|}}$$
Frequency response at midband frequencies, where capacitor CC causes the magnitude of the gain to decrease, but still well below the unity-gain frequency (open-loop gain = 1):
- only the compensation capacitor CC is considered
- assume Q16 is not present (resistor for lead compensation, effect only near the unity-gain frequency)
- discuss the simplified circuit:
Vbias
300
Q 300
5
vinvin+
Q1
Q2
300
$$A_v(s) \approx \frac{g_{m1}}{sC_C}, \qquad \omega_{ta} \approx \frac{g_{m1}}{C_C}$$
v1
CC
v2
150
150
Q3
Q4
i=gm1 vin
-A2
A3
vout
Slew rate SR is the maximum rate at which the output changes when the input signals are large.
At slew-rate limitation all the current of Q5 goes into either Q1 or Q2; this current has to go through CC:
$$SR \equiv \left.\frac{dv_{out}}{dt}\right|_{max}$$
Vbias
300
Q5300
vinvin+
Q1
Q2
300
$$SR = \frac{2I_{D1}}{C_C} = V_{eff1}\,\omega_{ta}$$
v1 CC v2
150
150
Q3
I Q4
-A2
A3
vout
- increasing Veff1 and ωta increases SR
- p-channel fet inputs increase SR
- increasing Veff1 reduces the transconductance gm1
OpAmps may have a systematic input offset voltage if not properly designed
u the differential input is zero: v in+= vinu ID6
Vbias
300
Q5300
Q6 300
Vin-
Q1
Q2
300
Vin+
Vout
150
150
300
Q3
Q4
Q7
OpAmps
- the overall dc gain is largely unaffected since both designs have one stage with n-channel and one stage with one or more p-channel driving fets.
- for a given power dissipation, and therefore bias current, having a p-channel input-pair stage maximizes the slew rate.
- having a p-channel input first stage implies that the second stage has an n-channel input drive fet. This arrangement maximizes the transconductance of the drive fet of the 2nd stage, which is critical when high-frequency operation is important.
- output stage: an n-channel source follower is preferable because it has less of a voltage drop (if a separate p-well is used). Its higher transconductance reduces the effect of the load capacitance on the second pole. There is also less degradation of the gain when small load resistances are being driven.
p-channel input fets for the first stage are almost always the best choice
Closed-loop configurations are discussed, and how to compensate an OpAmp to ensure that the closed-loop configuration is not only stable but also has a good settling characteristic.
Optimum compensation of OpAmps is typically considered to be one of the most difficult parts of the OpAmp design procedure.
- first-order model of closed-loop amplifier
- linear settling time
- OpAmp compensation
- compensating the two-stage OpAmp
- lead compensation
- making compensation independent of process and temperature
- biasing an OpAmp to have stable transconductances
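First-order model of the transfer function of a dominant-pole compensated OpAmp (dominant pole ωp1 on the real axis):
$$A(s) = \frac{A_0}{1+s/\omega_{p1}}$$
Unity-gain frequency definition:
$$|A(j\omega_{ta})| \equiv 1 \approx \frac{A_0}{\omega_{ta}/\omega_{p1}} \quad\Rightarrow\quad \omega_{ta} \approx A_0\,\omega_{p1}$$
First-order OpAmp model for midband frequencies, in terms of the unity-gain frequency:
$$A(s) \approx \frac{\omega_{ta}}{s}$$
Closed-loop gain with feedback factor β:
$$A_{CL}(s) = \frac{A(s)}{1+\beta A(s)}, \qquad \omega_{-3dB} \approx \beta\,\omega_{ta}$$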
OpAmp step response
- settling time is defined as the time it takes for an OpAmp to reach a specified percentage of its final value when a step input is applied
- the linear settling time portion is due to the finite unity-gain frequency (independent of the output step size)
- the nonlinear settling time portion is due to the slew-rate limit (dependent on the output step size)
Unity-gain frequency estimation for the linear settling time portion: the -3 dB frequency determines the settling-time response for a step input
$$\tau = \frac{1}{\omega_{-3dB}} = \frac{1}{\beta\,\omega_{ta}}$$
Step response for a closed-loop OpAmp:
$$v_{out}(t) = V_{step}(1-e^{-t/\tau})$$
If the slew rate is larger than the initial slope, no SR limiting will occur:
$$\left.\frac{d}{dt}v_{out}(t)\right|_{t=0} = \frac{V_{step}}{\tau}$$
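The linear settling relations τ = 1/(β ωta) and vout(t) = Vstep(1 - e^(-t/τ)) lead directly to a settling-time estimate: the output is within a relative error ε of its final value after t = -τ ln(ε). The sketch below uses hypothetical values for β, ωta, the step size and the slew rate; none of them come from the slides.

```python
import math


def time_constant(beta, omega_ta):
    """Closed-loop time constant tau = 1 / (beta * omega_ta)."""
    return 1.0 / (beta * omega_ta)


def settling_time(tau, tolerance):
    """Time until the first-order step response is within 'tolerance' of final value."""
    return -tau * math.log(tolerance)


def initial_slope(v_step, tau):
    """Maximum slope of the linear step response, at t = 0."""
    return v_step / tau


# Hypothetical example values
beta = 0.5              # feedback factor
omega_ta = 1.0e8        # unity-gain frequency in rad/s
v_step = 1.0            # output step in volts
slew_rate = 100e6       # 100 V/us expressed in V/s

tau = time_constant(beta, omega_ta)
t_1pct = settling_time(tau, 0.01)
slew_limited = initial_slope(v_step, tau) > slew_rate
print(f"tau = {tau * 1e9:.1f} ns, 1% settling = {t_1pct * 1e9:.1f} ns, slew limited: {slew_limited}")
```

If the initial slope exceeded the slew rate, the nonlinear (slew-limited) portion would have to be added to the linear settling time.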
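When compensating OpAmps the first-order model is insufficient, because it ignores poles and zeros at high frequencies which may cause instabilities.
A more accurate open-loop transfer model adds one additional pole (real-axis poles and zeros):
$$A(s) = \frac{A_0}{(1+s/\omega_{p1})(1+s/\omega_{eq})}$$
ωp1: first dominant pole
ωeq: higher-frequency poles; may be approximated with a set of real-axis poles and zeros:
$$\frac{1}{\omega_{eq}} \approx \sum_{i=2}^{n}\frac{1}{\omega_{pi}} - \sum_{i=1}^{m}\frac{1}{\omega_{zi}}$$
Phase margin PM is an often-used measure of how far an OpAmp with feedback is from becoming unstable.
Closed-loop gain if β is frequency independent (if ωt is far away from high-frequency poles and zeros):
$$A_{CL}(s) = \frac{A_{CL0}}{1+\dfrac{s(1/\omega_{p1}+1/\omega_{eq})}{1+\beta A_0}+\dfrac{s^2}{(1+\beta A_0)\,\omega_{p1}\omega_{eq}}}, \qquad A_{CL0} = \frac{A_0}{1+\beta A_0} \approx \frac{1}{\beta}$$
General second-order form:
$$\omega_0 = \sqrt{(1+\beta A_0)\,\omega_{p1}\omega_{eq}} \approx \sqrt{\beta\,\omega_{ta}\,\omega_{eq}}$$
$$Q = \frac{\sqrt{(1+\beta A_0)/(\omega_{p1}\omega_{eq})}}{1/\omega_{p1}+1/\omega_{eq}} \approx \sqrt{\frac{\beta\,\omega_{ta}}{\omega_{eq}}}$$
$$\%\,\text{overshoot} = 100\,e^{-\pi/\sqrt{4Q^2-1}}$$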
Overshoot to be calculated as a function of phase margin:
PM     Q factor   % overshoot
55°    0.925      13.3%
60°    0.817      8.7%
65°    0.717      4.7%
70°    0.622      1.4%
75°    0.527      0.008%
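The overshoot column of the table can be reproduced from the Q factor with the standard second-order step-response relation % overshoot = 100·exp(-π/√(4Q² - 1)), which applies for Q > 0.5; for Q ≤ 0.5 the poles are real and there is no overshoot. A minimal sketch:

```python
import math


def percent_overshoot(q):
    """Step-response overshoot of a second-order all-pole system with quality factor q."""
    if q <= 0.5:
        return 0.0      # real poles: no overshoot
    return 100.0 * math.exp(-math.pi / math.sqrt(4.0 * q * q - 1.0))


# Q factors listed in the table for PM = 55, 60, 65, 70, 75 degrees
for pm, q in [(55, 0.925), (60, 0.817), (65, 0.717), (70, 0.622), (75, 0.527)]:
    print(f"PM = {pm} deg, Q = {q:.3f}, overshoot = {percent_overshoot(q):.3f} %")
```

Small deviations in the last digit relative to the table are expected, because the tabulated Q values are rounded.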
The capacitor CC realizes dominant-pole compensation and thereby controls ωp1 and ωta:
$$\omega_{ta} = A_0\,\omega_{p1}$$
The fet Q16 is included to realize a left-half-plane zero at frequencies around or slightly above ωt (lead compensation). Q16 has Vds = 0 V and thus is in the triode region:
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\left(\frac{W}{L}\right)_{16}V_{eff16}}$$
Vbias Vin300
u fet
Q5300
VDD
Q6 300
Q1
Q2
300
Vin+ Q 16 Vbias CC
300
Vout2
150
150
Q3
Q4
Q7
$$\omega_{p1} \approx \frac{1}{g_{m7}R_1R_2C_C}$$
for RC = 0:
$$\omega_{p2} \approx \frac{g_{m7}}{C_1+C_2}, \qquad \omega_z = -\frac{g_{m7}}{C_C}$$
with RC:
$$\omega_z = \frac{-1}{C_C(1/g_{m7}-R_C)}$$
[Figure: pole-zero plot showing p1 and p2.]
- increasing gm7 separates the poles (pole splitting)
- however, the right-half-plane zero introduces negative phase shift into the transfer function
- increasing CC moves p1 and z to lower frequencies and thus does not help
With a non-zero RC, a third pole is introduced, but it is at high frequency and has almost no effect.
However, the zero opens a number of possibilities:
$$\omega_z = \frac{-1}{C_C(1/g_{m7}-R_C)}$$
- one could eliminate the right-half-plane zero:
$$R_C = 1/g_{m7}$$
- one could choose RC to be even larger and thus move the right-half-plane zero into the left half plane to cancel the nondominant pole p2:
$$R_C = \frac{1}{g_{m7}}\left(1+\frac{C_1+C_2}{C_C}\right)$$
- one could choose RC even larger to move the now left-half-plane zero to a frequency slightly greater than the unity-gain frequency that would result without the resistor, say 20% larger (recommended):
$$\omega_z = 1.2\,\omega_t \quad\Rightarrow\quad R_C \approx \frac{1}{1.2\,g_{m1}}$$
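The three choices for the lead-compensation resistor RC can be compared numerically. The transconductances and capacitances below are hypothetical values picked for illustration only; the formulas are the ones listed above.

```python
def rc_eliminate_zero(gm7):
    """RC = 1/gm7 pushes the right-half-plane zero to infinity."""
    return 1.0 / gm7


def rc_cancel_p2(gm7, c1, c2, cc):
    """RC that places the left-half-plane zero on the nondominant pole p2."""
    return (1.0 / gm7) * (1.0 + (c1 + c2) / cc)


def rc_lead(gm1):
    """RC that places the zero about 20% above the unity-gain frequency."""
    return 1.0 / (1.2 * gm1)


# Hypothetical design values
gm1, gm7 = 1.0e-3, 4.0e-3                  # A/V
c1, c2, cc = 0.5e-12, 1.5e-12, 2.0e-12     # F

r_zero = rc_eliminate_zero(gm7)
r_cancel = rc_cancel_p2(gm7, c1, c2, cc)
r_lead = rc_lead(gm1)
print(f"RC (no RHP zero) = {r_zero:.0f} Ohm")
print(f"RC (cancel p2)   = {r_cancel:.0f} Ohm")
print(f"RC (lead, 1.2wt) = {r_lead:.0f} Ohm")
```

For these example values the recommended lead-compensation resistor is the largest of the three, consistent with the ordering described in the list.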
Making the lead compensation process- and temperature-insensitive
- the ratios of all transconductances remain relatively constant over process and temperature variations, as all fets depend on the same biasing network:
$$\frac{\omega_{p2}}{\omega_{ta}} \approx \frac{g_{m7}/(C_1+C_2)}{g_{m1}/C_C}$$
- when a resistor is used to realize lead compensation, RC can also be made to track the inverse of the transconductance (1/gm7), and thus the lead compensation will be mostly independent of process and temperature variations:
$$\omega_z = \frac{-1}{C_C(1/g_{m7}-R_C)}$$
$$R_C = r_{ds16} = \frac{1}{\mu_n C_{ox}\left(\frac{W}{L}\right)_{16}V_{eff16}}, \qquad g_{m7} = \mu_n C_{ox}\left(\frac{W}{L}\right)_7 V_{eff7}$$
The ratio of RC to 1/gm7 (the product RC gm7) needs to be constant.
The following approach results in the possibility of on-chip resistors, realized by using triode-region fets that are accurately ratioed with respect to a single off-chip resistor (modern circuit design).
Veff 13 = Veff 7 Va = Vb
25
Vbias
Q 11
Q6
Veff 16 = Veff 12
thus
25
Q 12
25
Veff 7
Veff 13
we need
Va Q 13
Q 16
CC
300
Vb
Q7
- transconductances are probably the most important parameters in OpAmps to be stabilized
- the following approach matches transconductances to the conductance of a resistor
- as a result, the fet transconductances are independent of power-supply voltage as well as process and temperature variations
assuming
g m13 =
for
25
25
Q 10
Q 11
25
25
Q 14 Q 15 Rb
100 25
Q 12 Q 13
g m13
g mi =
JMM v1.0
Exercises VLSI-27
Ex ana3.9 (difficulty: easy): Consider the differential pair amplifier shown on transparency vlsi-27/3 where Ibias = 200 µA and all transistors have W = 100 µm and L = 1.6 µm. Given µnCox = 92 µA/V² and rds-n = 8000 [L (µm)]/[ID (mA)]. Find the output impedance and the gain. Result: Av = 68.6 V/V, rout = 64 kΩ (see Johns/Martin pp146)
Ex ana5.1 (difficulty: easy): Find the gain of the OpAmp shown on transparency vlsi-27/9. Assume ID5 = 100 µA, first stage VDG = 0.5 V, 2nd and 3rd stage VDG = 1 V and bulk of Q8 connected to VSS. Given µnCox = 3µpCox = 96 µA/V², VDD = -VSS = 2.5 V, RL = 10 kΩ, γ = 0.5 V^1/2, ΦF = 0.35 V, α = 5e6 V^1/2/m, Vtn = -Vtp = 0.8 V. Result: Av = -6092 V/V (see Johns/Martin pp224)
Coming Up...
- Next topic: Advanced Current Mirrors and OpAmps
- Readings for next time: Johns & Martin, Sections 3.8 and 5
- Exercises:
Analog Microelectronics
Advanced Current Mirrors and OpAmp Design
Outline
- Johns & Martin
- advanced current mirrors (chap. 6.1)
  - wide-swing current mirrors
  - wide-swing constant-transconductance bias circuit
  - enhanced output-impedance current mirrors (not yet)
  - wide-swing current mirror with enhanced output impedance (not yet)
- folded-cascode OpAmp (chap. 6.2)
  - small
- The classical two-stage OpAmp was discussed in vlsi27.
- Recently, a number of alternative OpAmp designs have been gaining in popularity. They make use of more advanced current mirrors.
Wide-swing current mirror:
- It is difficult to achieve reasonable OpAmp gains due to transistor output-impedance degradation caused by short-channel effects.
- Conventional cascode current mirrors limit the available signal swing.
[Figure: wide-swing current mirror. Input current Iin, output current Iout = Iin at node Vout, fets Q1 to Q5; two devices are sized W/L and two are sized (W/L)/n².]
- The basic idea is to bias the drain-source voltages of transistors Q2 and Q3 to be close to the minimum possible without them going into the triode region.
- Choice of Ibias:
  - Ibias equal to the maximum of Iin (all fets in saturation)
  - Ibias equal to the nominal value of Iin (for larger Iin, fets in triode, but probably only during slew-rate limiting)
- Design hints:
  - Q5: size it so that the drain-source voltages of Q2 and Q3 are slightly larger than the minimum (by 0.1 V to 0.15 V), in order to offset the increased threshold voltages of Q1 and Q4 due to their body effects
  - L of Q1, Q4 and Q5 is twice the minimal channel length; L of Q2 and Q3 is just slightly larger than the minimal channel length (high-frequency poles)
[Figure: wide-swing constant-transconductance bias circuit with fets Q1 to Q18 and resistor RB, generating Vcasc-n and Vbias-n. It consists of a bias loop (see vlsi-27 slide 29), a cascode bias, and start-up circuitry that injects current as long as the drain currents are zero. Device sizes (W/L) shown: 40/1, 20/1.6, 10/1.6, 10/1 and 2.5/1.6.]
- A variation of the cascode current mirror is the enhanced output-impedance current mirror, shown here in a simplified version.
- Basic idea: use a feedback amplifier to keep the drain-source voltage across Q2 stable, irrespective of the output voltage.
- The additional amplifier increases the output impedance (compare the classical cascode current mirror, vlsi-25 slides 16, 17):
  $R_{out} \approx g_{m1} r_{ds1} r_{ds2} (1 + A)$
[Figure: simplified enhanced output-impedance current mirror with Iin, Iout, Vbias, the feedback amplifier, output resistance Rout and fets Q1, Q2, Q3.]
Folded-cascode OpAmp
- Many modern integrated CMOS OpAmps are designed to drive only capacitive loads.
- Capacitive-only loads do not need voltage buffers to obtain a low OpAmp output impedance.
- Thus it is possible to realize OpAmps with higher speed and larger signal swings than those that must drive resistive loads.
- These improvements are obtained by having only one single high-impedance node, at the OpAmp output, which drives only capacitive loads.
- All internal nodes have relatively low impedance (around 1/gm), thus the speed is optimized.
- The compensation is usually achieved by the load capacitance.
- The most important parameter is their transconductance: operational transconductance amplifier (OTA).
Folded-cascode OpAmp (cont.)
[Figure: folded-cascode OpAmp with input Vin, fets Q1 to Q13, bias currents Ibias1 and Ibias2, bias voltages VB1 and VB2, output Vout and load capacitance CL. Annotations: "current mirror", "folded cascode fets (see vlsi-25 slide 19)" and "compensation" (at CL).]
- The bias sources may be replaced by a wide-swing constant-transconductance bias network; VB1 and VB2 would then be Vcasc-n and Vcasc-p.
Purpose of Q12, Q13:
- increase slew-rate performance
- improve recovery from slew-rate limiting
Design hints:
- Ibias1 and Ibias2 should be derived from a single bias network
- any current mirror should be built as a parallel combination of unit-size fets
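Small-signal gain with output impedance $r_{out}$ and load capacitance $C_L$:
$A_V \approx \dfrac{g_{m1}}{s C_L}$, $\omega_t \approx \dfrac{g_{m1}}{C_L}$
Design hints:
- for large load capacitances, a maximal transconductance of the input fets maximizes the bandwidth; use n-channel fets
- input bias current 4 times larger than the cascode current (maximizing dc gain)
Lead compensation (series resistance $R_C$ to $C_L$):
$A_V(s) = \dfrac{g_{m1}}{\dfrac{1}{r_{out}} + \dfrac{1}{R_C + 1/(sC_L)}} \approx \dfrac{g_{m1}\,(1 + sR_C C_L)}{sC_L}$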
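The effect of the series resistor can be read directly off the approximate transfer function above: it adds a left-half-plane zero. The sketch below assumes the zero is placed at 1.2 times the unity-gain frequency, which is a rule of thumb taken from Johns/Martin and not stated on the slide.

```latex
\begin{align*}
A_V(s) \approx \frac{g_{m1}\,(1 + sR_C C_L)}{sC_L}
  \quad &\Rightarrow \quad \omega_z = \frac{1}{R_C C_L} \\
\omega_z = 1.2\,\omega_t = 1.2\,\frac{g_{m1}}{C_L}
  \quad &\Rightarrow \quad R_C = \frac{1}{1.2\, g_{m1}}
\end{align*}
```

With this choice $R_C$ no longer depends on $C_L$; for $g_{m1} \approx 2.4$ mA/V (which corresponds to $\omega_t = 2\pi \cdot 38$ MHz at $C_L = 10$ pF) it gives about 347 Ω.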
- Diode-connected fets Q12 and Q13 are turned off during normal operation and have almost no effect.
- Slew-rate limiting behavior:
  - Assume there is a large differential input voltage that causes Q1 to be turned on hard and Q2 to be turned off.
  - Since Q2 is off, all of the bias current of Q4 will be directed through cascode fet Q5, through the n-channel current mirror, and out of the load capacitance.
  - The output voltage will decrease linearly with a slew rate given by:
    $SR = \dfrac{I_{D4}}{C_L}$
  - Q1 and current source Ibias will go into the triode region, moving the drain voltage of Q1 towards the negative power supply.
  - Q12 and Q13 clamp the drain voltages so they don't change as much during slew-rate limiting.
  - In addition, Q12 and Q13 increase the bias currents of Q3 and Q4, and thus the current available for CL.
Exercises VLSI-28
Ex ana6.2 (difficulty: medium): Find reasonable fet sizes for the folded-cascode OpAmp. Assume a ±2.5 V power supply, a maximum power dissipation of 2 mW, a current ratio of 4:1 between input and cascode fets, a bias current of Q11 that is 1/30 of that of Q3 (and is thus ignored for the power dissipation), a maximum fet width of 300 µm, L = 1.6 µm and Veff = 0.25 V for all except the input fets, W1 = W2 = 300 µm, widths rounded to 10 µm, CL = 10 pF, µnCox = 3 µpCox = 96 µA/V².
a) Find all fet sizes and the unity-gain frequency.
b) Find the slew rate with and without clamp fets.
c) Find a reasonable lead compensation RC.
Result: a) Q1 to Q4 = 300 µm, Q5, Q6 = 60 µm, Q7 to Q10 = 20 µm, Q11 to Q12 = 10 µm, ωt = 2π · 38 MHz; b) SR = 32 V/µs; c) RC = 347 Ω (see Johns/Martin pp. 271-273)
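The quoted results for parts a) and b) can be cross-checked with a few lines of arithmetic. This is a sketch, not the textbook solution. It assumes that the 2 mW budget over the 5 V supply is split equally between Q3 and Q4, that the input fets are n-channel, and, for the clamped case, that the clamp current of Q12 adds to the current in Q11 (a circuit detail taken from Johns/Martin, since the schematic is not reproduced here).

```latex
\begin{align*}
I_{D3} = I_{D4} &= \tfrac{1}{2} \cdot \frac{2\,\mathrm{mW}}{5\,\mathrm{V}} = 200\,\mu\mathrm{A},
  \qquad I_{D1} = I_{D2} = 160\,\mu\mathrm{A}, \quad I_{D5} = I_{D6} = 40\,\mu\mathrm{A} \\
g_{m1} &= \sqrt{2 \cdot 96\,\mu\mathrm{A/V^2} \cdot \tfrac{300}{1.6} \cdot 160\,\mu\mathrm{A}} = 2.4\,\mathrm{mA/V} \\
\omega_t &= \frac{g_{m1}}{C_L} = 2.4 \times 10^{8}\,\mathrm{rad/s} \approx 2\pi \cdot 38\,\mathrm{MHz} \\
SR_{\text{no clamp}} &= \frac{I_{D4}}{C_L} = \frac{200\,\mu\mathrm{A}}{10\,\mathrm{pF}} = 20\,\mathrm{V}/\mu\mathrm{s} \\
\text{with clamp:}\quad
I_{D3} + I_{D12} &= I_{bias2} = 320\,\mu\mathrm{A}, \qquad
I_{D3} = 30\,\bigl(\tfrac{200}{30}\,\mu\mathrm{A} + I_{D12}\bigr) \\
\Rightarrow\; I_{D12} &\approx 3.9\,\mu\mathrm{A}, \quad I_{D4} = I_{D3} \approx 0.32\,\mathrm{mA},
  \quad SR \approx 32\,\mathrm{V}/\mu\mathrm{s}
\end{align*}
```

Under these assumptions the 32 V/µs in the result line corresponds to the case with clamp fets, and about 20 V/µs is obtained without them.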
Coming Up...
- Next topic: Comparators
- Readings for next time: Johns & Martin, Sections 6.1 and 6.2
- Exercises:
FSM-D
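[Figure: FSM-D block diagram. The data-path has data inputs and data outputs; the control path (finite state machine) has control inputs (sensors) and control outputs (actuators); control signals connect the two blocks.]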
Goal: You are able to use logic gates and flip-flops wisely and not only in an ad-hoc manner. You master the finite state machine data path model.
Architecture Philosophy
- The FSM-D architecture model is composed of 2 blocks:
  - finite state machine (FSM)
  - data-path (D)
- FSM characteristics
  - manager
  - controlling, taking decisions, initiating sub-tasks
- Data-path characteristics
  - worker, specialist
  - executing, calculating, storing & moving data
FSM Structures
- Mealy machine
  - outputs depend on inputs and state
- Moore machine
  - outputs depend on the state only (functionally restricted)
- Medwedjew (Medvedev) machine
  - outputs depend on the state only
  - outputs are hazard-free
[Figure: FSM block diagram with input i[k], transition logic, state register (s[k+1] in, s[k] out), output logic and output o[k].]
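The difference between the Mealy and Moore output styles listed above can be sketched in VHDL as follows. The entity `fsm_styles`, its ports and the state names are hypothetical and chosen for illustration only (a rising-edge detector on input `i`); they do not come from the slides. The active-low reset and falling clock edge follow the convention of the code fragments later in this lecture.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example (names are not from the slides): rising-edge
-- detector on input i with one Moore-style and one Mealy-style output.
entity fsm_styles is
  port (
    clk, rst : in  std_logic;
    i        : in  std_logic;
    o_moore  : out std_logic;
    o_mealy  : out std_logic
  );
end entity fsm_styles;

architecture rtl of fsm_styles is
  type state_t is (WaitHigh, Pulse, WaitLow);
  signal s, s_next : state_t;
begin
  -- transition logic: s[k+1] = f(s[k], i[k])
  transition : process(s, i)
  begin
    s_next <= WaitHigh;                -- default: input is low
    if i = '1' then
      case s is
        when WaitHigh => s_next <= Pulse;
        when others   => s_next <= WaitLow;
      end case;
    end if;
  end process;

  -- state register: asynchronous active-low reset, falling clock edge
  state_reg : process(clk, rst)
  begin
    if rst = '0' then
      s <= WaitHigh;
    elsif falling_edge(clk) then
      s <= s_next;
    end if;
  end process;

  -- Moore output: a function of the state only
  o_moore <= '1' when s = Pulse else '0';

  -- Mealy output: a function of state and input
  o_mealy <= '1' when (s = WaitHigh and i = '1') else '0';
end architecture rtl;
```

The Mealy output reacts in the same clock period in which `i` rises, while the Moore output appears one state later. In a Medwedjew-style machine the outputs would be taken directly from the state register bits, which is what makes them hazard-free.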
Data-Path Elements
- A typical data-path consists of 3 types of basic elements:
  - buses, multiplexers, de-multiplexers
  - functional units, like comparator, adder, barrel shifter, ALU, etc.
  - memory elements, like flip-flop, register, register file, etc.
[Figure: examples of data-path elements: 32-bit buses bus[31:0] with a 16-bit slice bus[31:16], multiplexers, a 32-bit adder ADD (inputs a, b, cin; outputs result, cout), and registers with enable, data input di and data output do.]
Design Steps
- A tutorial design shall serve as a vehicle for a practical approach: a Black Jack player
- A key element in the FSM-D design procedure is the interface definition
- Design steps:
  - step 1: definition of the algorithm
  - step 2: FSM-D interface definition
  - step 3: data-path design
  - step 4: data-path interface definition
  - step 5: FSM interface definition
  - step 6: FSM state definition
  - step 7: FSM design
  - step 8: VHDL coding
  - step 9: test-bench design and simulation
- Game restrictions:
  - the cards have the following values: 2, 3, 4, 5, 6, 7, 8, 9, 10 and 11, as well as jack, queen and king, all three worth 10 points
- Game rules:
  - ask for as many cards as needed
  - the ace can be treated as 11 points or as 1 point
[Figure: FSM-D interface of the BlackJack Player block, with inputs cardValue(3:0), cardReady and start, and outputs newCard and score(4:0).]
[Figure: data-path, built up step by step. Register regLoad (enable enaLoad) stores cardValue(3:0); an adder ADD accumulates into register regAdd (enable enaAdd); comparator cmp11 (A = B) compares with 11, comparators cmp16 and cmp21 (A > B) compare with 16 and 21; the score register (enable enaScore) drives score; a multiplexer controlled by sel selects between cardValue and the constant -10; all registers are connected to clk and rst.]
FSM input signals: cmp11, cmp16, cmp21, cardReady
FSM output signals: finished, lost, newCard, sel, enaLoad, enaAdd, enaScore
[Figure: ControlPath block with inputs clk, rst, start and cardReady and control outputs such as enaScore.]
[Figure: FSM state diagram. Each state is annotated with its state name and the output signals hold, broke, newCard, sel, enaLoad, enaAdd, enaScore. States shown include LoadCard, Handshake and AddCard; output vectors shown are 0 0 1 - - 0 0, 0 0 1 1 1 0 0 and 0 0 0 - 0 1 0. Transitions are taken on reset, cardReady and combinations of cmp11, cmp16 and cmp21.]
-- process (registers); the "+" assumes an arithmetic package for the vector type
process(clk, rst)
begin
  if (rst = '0') then
    regLoad  <= "00000";
    regAdd   <= "00000";
    regScore <= "00000";
  elsif (clk'event and clk = '0') then
    if (enaAdd = '1') then
      regAdd <= regAdd + regLoad;
    end if;
    ...
  end if;
end process;

-- continuous conditional assignments (comparators)
cmp11 <= '1' when (regLoad = "01011") else '0';
cmp16 <= '1' when (regAdd > "10000") else '0';
cmp21 <= '1' when (regAdd > "10101") else '0';
-- state register and transition logic
process(clk, rst)
begin
  if (rst = '0') then
    state <= StartState;
  elsif (clk'event and clk = '0') then
    case state is
      when StartState =>
        state <= CallCard;
      when CallCard =>
        if (cardReady = '1') then
          state <= LoadCard;
        end if;
      when others =>
        state <= IllegalState;   -- used for VHDL analysis; null for synthesis
    end case;
  end if;
end process;

-- output logic
process(state)
begin
  case state is
    when StartState => outvec <= "000--00";
    when CallCard   => outvec <= "001--00";
    when others     => outvec <= "UUUUUUU";   -- used for VHDL analysis; null for synthesis
  end case;
end process;

finished <= outvec(6);
lost     <= outvec(5);
newCard  <= outvec(4);
...
Test Bench
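[Figure: test bench structure. The test bench applies stimuli to the design under test (FSM and data-path), captures the response, and performs response generation and verification.]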
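The three test-bench tasks named above (apply stimuli, capture the response, verify it) can be sketched with a small self-checking VHDL test bench. The design under test here, `acc5`, is a hypothetical 5-bit accumulator invented for this illustration; it is not the Black Jack design from the slides, and the timing values are arbitrary. The active-low reset and falling clock edge follow the convention of the lecture's code fragments.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical design under test (not from the slides):
-- 5-bit accumulator with enable.
entity acc5 is
  port (
    clk, rst : in  std_logic;
    ena      : in  std_logic;
    di       : in  unsigned(4 downto 0);
    do       : out unsigned(4 downto 0)
  );
end entity acc5;

architecture rtl of acc5 is
  signal reg : unsigned(4 downto 0);
begin
  process(clk, rst)
  begin
    if rst = '0' then
      reg <= (others => '0');
    elsif falling_edge(clk) then
      if ena = '1' then
        reg <= reg + di;
      end if;
    end if;
  end process;
  do <= reg;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity acc5_tb is
end entity acc5_tb;

architecture sim of acc5_tb is
  signal clk  : std_logic := '0';
  signal rst  : std_logic := '0';
  signal ena  : std_logic := '0';
  signal di   : unsigned(4 downto 0) := (others => '0');
  signal do   : unsigned(4 downto 0);
  signal done : boolean := false;
begin
  dut : entity work.acc5
    port map (clk => clk, rst => rst, ena => ena, di => di, do => do);

  -- clock generator (10 ns period), stops once the stimuli are done
  clock : process
  begin
    while not done loop
      clk <= '1'; wait for 5 ns;
      clk <= '0'; wait for 5 ns;
    end loop;
    wait;
  end process;

  stimulus : process
  begin
    wait for 12 ns;                    -- keep reset active over the first edge
    rst <= '1';
    ena <= '1';
    di  <= to_unsigned(10, 5);         -- apply stimuli
    wait until falling_edge(clk);
    wait for 1 ns;                     -- capture response after the edge
    assert do = to_unsigned(10, 5) report "expected 10" severity error;
    di <= to_unsigned(7, 5);
    wait until falling_edge(clk);
    wait for 1 ns;
    assert do = to_unsigned(17, 5) report "expected 17" severity error;
    ena <= '0';                        -- disabled register must hold its value
    wait until falling_edge(clk);
    wait for 1 ns;
    assert do = to_unsigned(17, 5) report "expected hold at 17" severity error;
    report "test bench finished";
    done <= true;
    wait;
  end process;
end architecture sim;
```

The expected values are written into the assertions by hand; for a larger design such as the Black Jack player, the response generation would more likely come from a reference model or a file of expected vectors.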
- The workers/manager analogy is used to assign subtasks to the control-path (manager) and the data-path (specialized workers).