VLSI Interview Questions

3/2/2016
Introduction to Digital Circuits
4. Basic Digital Circuits
Digital circuits are compositions of logic gates. A small number of digital circuits occur frequently in larger digital designs, such as multiplexers, encoders, decoders, and memory elements. In this chapter, we use the method of logical effort to study those basic digital circuits that serve as building blocks for the construction of larger digital systems.
4.1. Logic Gates
In Section Logic Gates, we introduce logic gates with one and two inputs. Often, we need gates with more than 2 inputs, or wish to design new logic gates for specific logic functions or timing behavior. In the following, we study the design and characterization of logic gates as elementary building blocks for digital circuits.
4.1.1. Logic Gates with Multiple Inputs
Assume we design a digital circuit and need a NAND gate with 3 inputs. We may assemble the 3-input NAND gate using 2-input NAND gates and an inverter as building blocks, see Figure 4.1. Using Boolean algebra, it is straightforward to show that this circuit implements the logic function
$$Y = \overline{A\,B\,C}.$$
There are several problems with this implementation, though. First, the delay from inputs $A$ and $B$ to $Y$ is larger than from input $C$. Such asymmetries can be confusing when designing larger circuits with delay constraints. Second, the delay of the longest path is larger than necessary, and, third, the CMOS implementation needs 10 transistors, which is also more than necessary.
Figure 4.1: 3-input NAND gate built from 2-input NAND gates.
An alternative design for the 3-input NAND gate uses CMOS transistors as building blocks, as shown in Figure 4.2. This circuit needs only 6 transistors, and is symmetric w.r.t. its inputs. Output $Y$ is 0 only if all inputs are 1.
Figure 4.2: CMOS circuit of 3-input NAND gate and interactive switch model.
The circuit topology in Figure 4.2 extends to $n$-input NAND gates for $n \ge 2$: compose $n$ pMOS transistors in parallel and $n$ nMOS transistors in series. [1] The series composition of the nMOS transistors determines the on-resistance of the pull-down path. The larger the number of inputs, the smaller the pull-down current and, hence, the larger the delay of the gate. The model of logical effort reflects this dependence of delay on the number of inputs in the logical effort of the gate. The matched $n$-input NAND gate increases the width of the nMOS transistors to match the drive current of the reference inverter. Increasing the transistor widths reduces the on-resistance at the expense of increasing the input capacitance of the gate. This transistor sizing does not solve the problem that the delay increases with increasing numbers of inputs. It merely shifts the burden from the gate itself to its driver.
To determine the logical effort and parasitic delay of the 3-input NAND gate, we size the transistors to match the output current of the reference inverter. Figure 4.3 shows the matched 3-input NAND gate with transistor widths. When all inputs are 1, all three nMOS transistors are switched on. The series of on-resistances sums to a total of three times the on-resistance of a single nMOS transistor. To be equal to the on-resistance of the reference inverter, we need to triple the widths of the nMOS transistors of the NAND gate, $W_n = 3$.
If one of the NAND inputs is 0, then one pMOS transistor is switched on. To match the on-resistance of the reference inverter, we assign the same width $W_p = 2$ to the pMOS transistors of the NAND gate. If more than one NAND input is 0, then the parallel composition of pMOS transistors has a lower on-resistance than for a single input. The corresponding pull-up delay is smaller than if only one input is 0. Therefore, using normalized pMOS widths of 2 units matches the reference inverter in the worst case when the pull-up delay of the NAND gate is largest.
https://bibl.ica.jku.at/app/read/dc/build/html/basiccircuits/basiccircuits.html
Figure 4.3: Matched 3-input NAND gate.
The matched 3-input NAND gate enables us to determine the logical effort of a 3-input NAND gate by means of the normalized input capacitances. Each input, e.g. input $A$, drives two parallel gate capacitances of one pMOS and one nMOS transistor:
$$C_{in}(A) = W_p + W_n = 2 + 3 = 5.$$
Therefore, the logical effort of the 3-input NAND gate is $g = 5/3$ per input. Note that the logical effort of the 3-input NAND gate is by $1/3$ larger than the logical effort of the 2-input NAND gate. Furthermore, we determine the parasitic delay of the 3-input NAND gate by calculating the normalized output capacitance:
$$C_{out} = 3\,W_p + W_n = 6 + 3 = 9.$$
Thus, the parasitic delay of a 3-input NAND gate is $p = 9/3 = 3$. We find that the parasitic delay of the 3-input NAND gate is by one delay unit larger than that of the 2-input NAND gate.
For NAND gates with $n$ inputs, the matched gate has nMOS transistors of width $n$ and pMOS transistors of width 2. Therefore, the logical effort of the $n$-input NAND gate is
$$g_{nand}(n) = \frac{n+2}{3} \qquad (1)$$
per input, and has parasitic delay
$$p_{nand}(n) = n. \qquad (2)$$
The larger the number of inputs $n$, the larger the logical effort and the parasitic delay of the NAND gate. As a consequence, the more inputs the gate has, the slower it is, or, from the circuit designer's perspective, the more effort we need to invest in the driver circuit to keep the path delay low.
The design of an $n$-input NOR gate proceeds analogously to the $n$-input NAND gate. The $n$-input NOR gate has $n$ parallel nMOS transistors and $n$ pMOS transistors in series. The matched $n$-input NOR gate has nMOS transistors of width $W_n = 1$ and pMOS transistors of width $W_p = 2n$. As a result, each input of the $n$-input NOR gate has logical effort $g_{nor}(n) = (2n+1)/3$, and the parasitic delay of the gate is $p_{nor}(n) = n$. For example, a 3-input NOR gate has logical effort $g = 7/3$ per input and parasitic delay $p = 3$. The logical effort of an $n$-input NOR gate is $(n-1)/3$ larger than the logical effort of an $n$-input NAND gate. Thus, if we have the choice to implement the logic of a circuit path with NAND gates or NOR gates, we commonly prefer NAND gates, because they tend to yield lower path delays.
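The closed forms for matched NAND and NOR gates can be tabulated with a few lines of Python. This is a minimal sketch, assuming the reference inverter has nMOS width 1 and pMOS width 2 (normalized input capacitance 3); the function names `nand_effort` and `nor_effort` are illustrative only.

```python
from fractions import Fraction

C_REF = 3  # assumed input capacitance of the reference inverter (pMOS 2 + nMOS 1)

def nand_effort(n):
    """Logical effort per input and parasitic delay of a matched n-input NAND gate."""
    w_n, w_p = n, 2                       # n nMOS in series, pMOS in parallel
    g = Fraction(w_p + w_n, C_REF)        # one pMOS and one nMOS per input
    p = Fraction(n * w_p + w_n, C_REF)    # n pMOS and one nMOS at the output
    return g, p

def nor_effort(n):
    """Logical effort per input and parasitic delay of a matched n-input NOR gate."""
    w_n, w_p = 1, 2 * n                   # nMOS in parallel, n pMOS in series
    g = Fraction(w_p + w_n, C_REF)
    p = Fraction(w_p + n * w_n, C_REF)    # one pMOS and n nMOS at the output
    return g, p

for n in (2, 3, 4):
    print(n, nand_effort(n), nor_effort(n))
```

For every number of inputs the NOR gate has the same parasitic delay as the NAND gate but a larger logical effort, which is the reason for preferring NAND gates on a path.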
4.1.2. Tree-Structured Logic Gates
When the number of inputs of logic gates is large, tree-structured gates offer superior performance compared to CMOS gates. As a concrete example, consider a 16-input NAND gate. The logical effort of a CMOS NAND gate with a pull-down network of 16 nMOS transistors in series is $g = 18/3 = 6$ according to Equation (1), and the parasitic delay is $p = 16$ according to Equation (2). Figure 4.4 shows an alternative, tree-structured implementation that uses 2-input NAND gates and inverters as building blocks. The tree has a path logical effort of $G = (4/3)^4 \approx 3.16$ and a path parasitic delay of $P = 4 \cdot 2 + 3 \cdot 1 = 11$. The tree structure improves both quantities.
Figure 4.4: A tree-structured 16-input NAND gate using 2-input NAND gates and inverters as building blocks.
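The comparison between the single 16-input CMOS gate and the tree can be checked numerically. The sketch below assumes a balanced tree whose levels of 2-input NAND gates are separated by inverters, with the root inverter omitted; `nand_tree` and `flat_nand` are hypothetical helper names.

```python
from fractions import Fraction

G_NAND2, P_NAND2 = Fraction(4, 3), 2   # 2-input NAND gate
G_INV, P_INV = Fraction(1), 1          # reference inverter

def flat_nand(n):
    """Single CMOS NAND gate with n inputs, Equations (1) and (2)."""
    return Fraction(n + 2, 3), n

def nand_tree(levels):
    """Path logical effort and path parasitic delay of a balanced NAND tree.

    The tree has `levels` NAND2 stages and `levels - 1` inverters in between,
    and therefore 2**levels inputs.
    """
    inverters = levels - 1
    path_effort = G_NAND2 ** levels * G_INV ** inverters
    path_parasitic = levels * P_NAND2 + inverters * P_INV
    return path_effort, path_parasitic

print(flat_nand(16))   # single gate
print(nand_tree(4))    # tree with 16 inputs
```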
The tree structure of the NAND circuit is inspired by Boolean algebra, more precisely the associativity of the AND operation:
$$(A \cdot B) \cdot C = A \cdot (B \cdot C).$$
Associativity implies that we can parenthesize an expression any way we want to. We may even omit the parentheses altogether because, mathematically, the parenthesization does not affect the value of the expression. From the perspective of the circuit designer, however, parentheses express an evaluation order. For example, given a binary AND operation and 8 operands $A_0, A_1, \ldots, A_7$, there are many different evaluation orders. One of them corresponds to the left-skewed parenthesization
$$((((((A_0 \cdot A_1) \cdot A_2) \cdot A_3) \cdot A_4) \cdot A_5) \cdot A_6) \cdot A_7,$$
which groups the operations such that all 7 AND operations must be executed one after the other. The corresponding circuit has a depth of 7 operations. In contrast, the tree-structured parenthesization yields the minimum depth of $\log_2 8 = 3$ operations:
$$((A_0 \cdot A_1) \cdot (A_2 \cdot A_3)) \cdot ((A_4 \cdot A_5) \cdot (A_6 \cdot A_7)).$$
Figure 4.5 shows the corresponding circuits. We note that the tree structure is asymptotically optimal, i.e. for a large enough number of inputs $n$ no other circuit topology has a depth smaller than $\log n$ up to constant factors. The constants depend on the number of inputs of the building blocks. For example, if we replace subtrees of three 2-input AND gates with one 4-input AND gate, the depth of the tree-structured AND gate in Figure 4.5 decreases from 3 to 2 levels.
Figure 4.5: Left-skewed (a) and tree-structured (b) conjunction of 8 inputs with 7 binary AND operators.
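The two evaluation orders can be modeled as nested pairs to confirm that they compute the same value at different depths. This is a small illustrative sketch; the tuple representation and the helper names are assumptions made for the example.

```python
from itertools import product

def left_skewed(names):
    """Build ((((A0, A1), A2), ...), An-1) as nested pairs."""
    tree = names[0]
    for name in names[1:]:
        tree = (tree, name)
    return tree

def balanced(names):
    """Build a balanced binary tree over the operand names."""
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return (balanced(names[:mid]), balanced(names[mid:]))

def depth(tree):
    """Number of binary operations on the longest path to the root."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth(tree[0]), depth(tree[1]))

def evaluate(tree, values):
    """Evaluate the AND expression for a dict of operand values."""
    if not isinstance(tree, tuple):
        return values[tree]
    return evaluate(tree[0], values) & evaluate(tree[1], values)

names = [f"A{i}" for i in range(8)]
skewed, tree = left_skewed(names), balanced(names)
print(depth(skewed), depth(tree))
```

Both structures use 7 AND operations; only the length of the longest path differs.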
Given a tree-structured $n$-input AND gate, we obtain an $n$-input NAND gate by complementing the output. The 16-input NAND gate in Figure 4.4 implements each 2-input AND gate of the tree with a 2-input CMOS NAND circuit followed by an inverter. To complement the output, we omit the inverter at the root of the tree. The 3-input NAND gate in Figure 4.1 is an example of an unbalanced NAND tree, where the number of inputs is not a power of 2. In contrast, the trees in Figure 4.4 and Figure 4.5 are balanced, because each path from input to output has the same depth, or number of gates.
4.1.3. Asymmetric Gates
An asymmetric gate has inputs with different logical efforts. Asymmetric gates require additional attention from the circuit designer, but facilitate faster circuits if used appropriately. The asymmetry may be caused by the circuit topology or by deliberate transistor sizing to reduce the delay of the critical path. We design a majority gate to motivate asymmetric circuit topologies, and then discuss how to trade logical effort between inputs.
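Majority Gate
The majority operation is a ternary Boolean operation, cf. majority function:
$$M(A, B, C) = A\,B + A\,C + B\,C. \qquad (3)$$
In terms of a truth table the majority operation is specified as:
A B C M
0 0 0 0
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 1
As the name suggests, the output of the majority operation is the value that occurs more often in inputs $A$, $B$, and $C$. The inverting 3-input majority gate in Figure 4.6 computes the complement $Y = \overline{M(A, B, C)}$.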
Figure 4.6: Symmetric CMOS circuit of inverting 3-input majority gate and interactive switch model.
We characterize the delay of the majority gate by deriving its logical effort and parasitic delay. The majority gate has six arms, each of which is a series composition of two transistors. To match the on-resistances of the reference inverter, we make the transistors in each arm of the majority gate twice as wide as those of the reference inverter. Thus, as shown in Figure 4.7, all pMOS transistors have width $W_p = 4$ and all nMOS transistors have width $W_n = 2$. These transistor widths match the reference inverter for all combinations of input values, except when all input values are equal. In this case all three arms of either pull-up or pull-down network are switched on, such that the parallel composition yields a three times smaller equivalent on-resistance. We classify these input combinations as exceptionally fast, analogous to the exceptional input combinations of the NAND or NOR gates.
Figure 4.7: Matched inverting majority gate.
Each input of the majority gate drives two pMOS and two nMOS transistors. Therefore, all three inputs $A$, $B$, and $C$ have logical effort
$$g = \frac{2 \cdot 4 + 2 \cdot 2}{3} = 4.$$
The normalized output capacitance of the majority gate consists of three pMOS and three nMOS transistors. Therefore, the parasitic delay of the majority gate is
$$p = \frac{3 \cdot 4 + 3 \cdot 2}{3} = 6.$$
Compared to a NAND or NOR gate, the majority gate is relatively slow. One option for reducing the logical effort and parasitic delay of the majority gate is transistor sharing. Observe in Figure 4.7 that the two pMOS transistors that one input drives in the pull-up network are in distinct arms. The pull-down network exhibits the same structure only with nMOS transistors. If two arms share one transistor, we can reduce the logical effort by reducing the number of transistors from two to one. This does not work for all inputs simultaneously. As shown in Figure 4.8, if we share the transistors of one input, we cannot also share the transistors of the other two inputs across arms without altering the logic function. However, sharing the transistors of that one input reduces its logical effort, and also the parasitic delay of the entire gate.
Figure 4.8: Asymmetric inverting 3-input majority gate with transistor sharing and interactive switch model.
Sharing the transistors of one input, as shown in Figure 4.8, does not affect the transistor widths of the matched majority gate. A path from supply rail to output consists of a series composition of two transistors, either in the pull-up or pull-down network, just like in Figure 4.7. The exceptional case, when all three input values are equal, has a larger equivalent on-resistance than the gate in Figure 4.7, but is still faster than the regular case. Thus, the matched version of the majority gate in Figure 4.8 has transistor widths $W_p = 4$ and $W_n = 2$. While the logical efforts of the other two inputs remain unchanged, transistor sharing halves the logical effort of the shared input:
$$g = \frac{4 + 2}{3} = 2.$$
Since the logical effort of the shared input differs from the logical effort of the other two inputs, the majority gate in Figure 4.8 is an asymmetric gate. In contrast, the majority gate in Figure 4.6 is a symmetric gate, because all inputs have equal logical effort. Rather than characterizing an asymmetric gate by listing the logical efforts for all inputs, it is sometimes sufficient to use a single number instead. The total logical effort is the sum of the logical efforts of all inputs. For example, the total logical effort permits a quantitative comparison of the symmetric and asymmetric majority gates. The symmetric majority gate has total logical effort $4 + 4 + 4 = 12$. Transistor sharing reduces the total logical effort of the asymmetric gate to $2 + 4 + 4 = 10$. Sharing the transistors of this input reduces not only the logical effort but also the output capacitance and, thus, the parasitic delay of the asymmetric majority gate:
$$p = \frac{2 \cdot 4 + 2 \cdot 2}{3} = 4.$$
Note that we can construct alternative gate designs by sharing the transistors of one of the other two inputs instead. The resulting asymmetric gates do not reduce the output capacitance, however. Only sharing the transistors of the input chosen in Figure 4.8 reduces the parasitic delay. Thus, if we have the choice, we prefer sharing those transistors connected to the output of the gate in order to reduce the parasitic delay as an added benefit.
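The bookkeeping for the symmetric and asymmetric majority gates can be reproduced by counting the transistors each input drives. This is a sketch under the assumption of matched widths 4 (pMOS) and 2 (nMOS) and a reference inverter with input capacitance 3; the shared input is placed at list index 0 purely for illustration.

```python
from fractions import Fraction
from itertools import product

C_REF = 3          # assumed input capacitance of the reference inverter
W_P, W_N = 4, 2    # matched transistor widths of the majority gate

def majority(a, b, c):
    """Majority of three bits: 1 if at least two inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def logical_effort(num_pmos, num_nmos):
    """Logical effort of an input driving the given numbers of transistors."""
    return Fraction(num_pmos * W_P + num_nmos * W_N, C_REF)

# Symmetric gate: every input drives two pMOS and two nMOS transistors.
symmetric = [logical_effort(2, 2)] * 3
# Asymmetric gate: the shared input (index 0, hypothetical) drives one of each.
asymmetric = [logical_effort(1, 1), logical_effort(2, 2), logical_effort(2, 2)]

print(sum(symmetric), sum(asymmetric))
```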
Asymmetric Transistor Sizing
The asymmetric majority gate sacrifices the symmetry of the circuit topology for a reduced logical effort of one input without affecting the logical effort of the other inputs. The benefit of the reduced logical effort is a reduced gate delay for signal transitions on the corresponding input. Even with a symmetric gate topology, inherent to NAND and NOR gates for instance, we can trade logical effort between inputs by transistor sizing. The resulting asymmetry may be desirable to speed up the critical path of a circuit.
As an example, consider the 2-input NAND gate in Figure 4.9. Assume that input $A$ is critical, and we wish to reduce its delay. Thus, our goal is to minimize the logical effort of input $A$, which means to reduce $g_A$ from 4/3 to that of a reference inverter, $g_{inv} = 1$.
Figure 4.9: Matched asymmetric NAND gate with nMOS scaling fraction $\alpha$.
Since the matched NAND gate has normalized nMOS transistor widths $W_A = W_B = 2$, reducing width $W_A$ to 1 reduces the input capacitance of input $A$ as desired. However, if all we do is reduce the input capacitance of input $A$, the gate becomes mismatched w.r.t. the reference inverter. In fact, halving the width of the nMOS transistor doubles its on-resistance, which results in a smaller output current. Consequently, the gate would be slower rather than faster as planned. To obtain a delay reduction, we need to reduce the logical effort, i.e. we wish to reduce the input capacitance of input $A$ without changing the equivalent on-resistance of the pull-down network. Transistor sharing in the majority gate has exactly this effect.
In case of the 2-input NAND gate, we can balance a change of width $W_A$ by a corresponding change of $W_B$, so that the pull-down network of the asymmetric NAND gate remains matched to the reference inverter. More specifically, matching an asymmetric 2-input NAND gate requires that the equivalent on-resistance of the series composition of nMOS transistors must be equal to the on-resistance of the nMOS transistor of the reference inverter. Since the on-resistance is inversely proportional to the transistor width, we obtain this matching constraint for the asymmetric NAND gate:
$$\frac{1}{W_A} + \frac{1}{W_B} = 1,$$
because the reference inverter has an nMOS transistor of width 1. We can fulfill this constraint with a scaling fraction $\alpha$ between 0 and 1 if we choose
$$W_A = \frac{1}{1-\alpha} \quad\text{and}\quad W_B = \frac{1}{\alpha},$$
as shown in Figure 4.9. For $\alpha = 1/2$ we obtain the symmetric matched NAND gate with $W_A = W_B = 2$. A scaling fraction below 1/2 reduces the logical effort of input $A$. As $\alpha$ approaches 0, width $W_A$ and logical effort $g_A$ approach value 1. The table below lists the relevant quantities for several values of $\alpha$.
alpha   W_A      W_B   g_A       g_B   g_A + g_B
1/2     2        2     4/3       4/3   2.67
1/4     4/3      4     10/9      2     3.11
1/10    10/9     10    28/27     4     5.04
1/100   100/99   100   298/297   34    35.00
The price of reducing the logical effort of input $A$ is a rapidly growing logical effort of input $B$. Choosing $\alpha = 1/4$ appears to be a good compromise, where $g_A = 10/9$ is within about 11% of the minimum logical effort, yet the total logical effort of the gate increases by about 17% only.
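The trade-off can be tabulated directly from the matching constraint. The following sketch assumes the critical input is called A, a scaling fraction named `alpha`, nMOS widths 1/(1 - alpha) and 1/alpha, pMOS width 2 for both inputs, and a reference inverter with input capacitance 3; all of these names are illustrative.

```python
from fractions import Fraction

def asymmetric_nand2(alpha):
    """Widths and logical efforts of a matched asymmetric 2-input NAND gate."""
    w_a = 1 / (1 - alpha)            # nMOS width of the critical input
    w_b = 1 / alpha                  # nMOS width of the non-critical input
    assert 1 / w_a + 1 / w_b == 1    # series on-resistance matches the inverter
    g_a = (w_a + 2) / 3              # pMOS width 2 plus nMOS width, over 3
    g_b = (w_b + 2) / 3
    return w_a, w_b, g_a, g_b

for alpha in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 10), Fraction(1, 100)):
    w_a, w_b, g_a, g_b = asymmetric_nand2(alpha)
    print(alpha, w_a, w_b, g_a, g_b, round(float(g_a + g_b), 2))
```

Using exact fractions avoids rounding in the matching check; only the total is converted to a float for display.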
4.1.4. Compound Gates
There exist complex Boolean expressions that we can implement in a single CMOS gate. In particular, all Boolean expressions formed with AND and OR operations can be cast into a CMOS gate if all variables appear in uncomplemented form only and the entire AND-OR expression is complemented. Two widely used families of compound gates are the inverting AND-OR (AOI) gates and the inverting OR-AND (OAI) gates. As an example, consider the Boolean expression
$$Y = \overline{A\,B + C}.$$
If our CMOS gate supply consists of NAND, NOR, and NOT gates, we might be tempted to use the involution theorem, and transform the expression such that we can implement $Y$ with these gates:
$$Y = \overline{\overline{\overline{A\,B}} + C}.$$
The circuit for $Y$ is shown in Figure 4.10. The logical effort of the longest paths $A \rightarrow Y$ and $B \rightarrow Y$ is $G = g_{nand}\,g_{inv}\,g_{nor} = 4/3 \cdot 1 \cdot 5/3 = 20/9$, and the corresponding path parasitic delay is $P = 2 + 1 + 2 = 5$.
Figure 4.10: Implementation of $Y = \overline{A\,B + C}$ with NAND, NOR, and NOT gates.
Now, recall the completeness theorem, which tells us that any Boolean expression of AND and OR operations can be realized with a series-parallel switch network. In our example, $Y = 0$ if and only if ($A = 1$ AND $B = 1$) OR if $C = 1$. This logical equivalence translates directly into the CMOS pull-down network in Figure 4.11(a). We implement the conjunction of $A$ and $B$ with a series composition of nMOS transistors, and the disjunction of $A\,B$ and $C$ with a parallel composition. With the pull-down network in place, we derive the pull-up network in Figure 4.11(b) systematically by applying the principle of duality. The pull-up network uses pMOS instead of nMOS transistors, and we replace the parallel composition with a series composition and vice versa.
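The duality construction can be illustrated with a small switch-level model. This is a sketch only: the nested-tuple encoding of series-parallel networks and the function names are assumptions made for the example, and the function modeled is the complemented expression A B + C.

```python
from itertools import product

def conducts(net, values, on_level):
    """True if the series-parallel network conducts for the given input values.

    A transistor ("t", name) is switched on when its gate input equals
    on_level: 1 for nMOS, 0 for pMOS.
    """
    kind = net[0]
    if kind == "t":
        return values[net[1]] == on_level
    parts = [conducts(sub, values, on_level) for sub in net[1:]]
    return all(parts) if kind == "series" else any(parts)

def dual(net):
    """Swap series and parallel compositions; transistors keep their inputs."""
    if net[0] == "t":
        return net
    swapped = "parallel" if net[0] == "series" else "series"
    return (swapped,) + tuple(dual(sub) for sub in net[1:])

# Pull-down network of the AOI example: (A in series with B) parallel to C.
pull_down = ("parallel", ("series", ("t", "A"), ("t", "B")), ("t", "C"))
pull_up = dual(pull_down)

for a, b, c in product((0, 1), repeat=3):
    values = {"A": a, "B": b, "C": c}
    down = conducts(pull_down, values, on_level=1)
    up = conducts(pull_up, values, on_level=0)
    y = 0 if down else 1
    print(a, b, c, y, "ok" if down != up else "conflict")
```

For every input combination exactly one of the two networks conducts, so the output is always driven and never shorted.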
Figure 4.11: CMOS gate for $Y = \overline{A\,B + C}$: (a) pull-down network, (b) pull-up network is dual of pull-down network, (c) matched CMOS gate.
Figure 4.11(c) redraws the pull-up network slightly for the sake of clarity, and annotates the transistor sizes needed to match the reference inverter. The gate is asymmetric, because the logical efforts of the inputs differ: with nMOS widths 2 for $A$ and $B$, nMOS width 1 for $C$, and pMOS widths 4, we obtain $g_A = g_B = 6/3 = 2$ and $g_C = 5/3$. The parasitic delay of the gate follows from the normalized output capacitance, as before. Compared to the logically equivalent circuit in Figure 4.10, the CMOS gate offers improvements in terms of speed and number of transistors. To emphasize the fact that our CMOS gate is a compound gate, we introduce the symbol in Figure 4.12 by fusing an AND gate with a NOR gate. Since the output of the AND gate drives the input of the OR gate whose output is complemented by an inverter, the compound gate is an inverting AND-OR gate or AOI gate for short.
Figure 4.12: The AOI gate symbol reflects the logic function.
AOI gates exist for the complement of all Boolean expressions in SOP normal form.
Analogous to AOI gates, the family of OAI gates, or inverting OR-AND gates, consists of compound gates for Boolean expressions in complemented POS normal form. As an example, consider Boolean expression
$$Y = \overline{(A + B)\,(C + D)}.$$
The OAI gate for $Y$ and the corresponding OAI symbol are shown in Figure 4.13. The matched gate is symmetric, with logical effort $g = 2$ per input and parasitic delay $p = 4$.
Figure 4.13: Matched OAI gate for $Y = \overline{(A + B)\,(C + D)}$ and corresponding OAI gate symbol.
Besides the AOI and OAI gates for complemented two-level normal forms, we can design compound gates for multilevel expressions as well. For example, Boolean expression
$$Y = \overline{A\,(B + C\,D)}$$
has a pull-down network consisting of a series composition of $A$ and a parallel composition of $B$ and a series composition of $C$ and $D$.
Figure 4.14: Matched multilevel compound gate for $Y = \overline{A\,(B + C\,D)}$ and corresponding compound gate symbol.
The compound gate in Figure 4.14 is asymmetric, because the logical efforts of the inputs differ. The parasitic delay of the compound gate follows from the normalized output capacitance, as before.
CMOS designers often use compound gates as an efficient implementation target. If one of the inputs must be complemented, then it is easy to introduce a separate inverter stage to drive the input of the compound gate. If we want an AO or OA function without inverted output, we may use an additional inverter stage to complement the output of the corresponding AOI or OAI gate.
DesignaCMOScompoundgatetocompute
a. Derive either the pull-down or the pull-up network.
b. Deduce the complementary network using the principle of duality.
c. Perform transistor sizing to match the compound gate.
d. Determine the logical effort(s) and parasitic delay.
4.1.5. XOR and XNOR Gates
Our CMOS implementation of the XOR gate in Figure 1.42 deserves a closer look. Unlike NAND or NOR gates, the XOR gate assumes that the inputs are available in complemented and uncomplemented form, and the pull-up and pull-down networks are not duals of each other. First, however, we characterize the delay of the 2-input XOR gate by deriving its logical effort and parasitic delay. The matched 2-input XOR gate is shown in Figure 4.15. Each arm is a series composition of two transistors. To match the on-resistances of the reference inverter, we make the transistors in each arm of the XOR gate twice as wide.
Figure 4.15: Matched 2-input XOR gate.
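Each of the four inputs of the 2-input XOR gate, $A$, $\overline{A}$, $B$, and $\overline{B}$, drives one pMOS and one nMOS transistor. Therefore, each input has logical effort
$$g = \frac{4 + 2}{3} = 2.$$
The normalized output capacitance of the XOR gate consists of two pMOS and two nMOS transistors, so that the parasitic delay is
$$p = \frac{2 \cdot 4 + 2 \cdot 2}{3} = 4.$$
Compared to the 2-input NAND and NOR gates, the XOR gate uses twice as many transistors, whose capacitances make it relatively slow.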
Symmetric XOR Gates
XOR gates with $n$ inputs implement the odd parity operation, which outputs a 1 if the number of inputs with value 1 is odd, cf. parity function. For example, for $n = 3$, function $Y = A \oplus B \oplus C$ equals 1 if the number of 1 inputs is 1 or 3. The truth table shows the 3-input XOR function or odd parity function.
A B C Y
0 0 0 0
0 0 1 1
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 1
The odd parity property suggests a design for an XOR CMOS circuit. To build an $n$-input XOR gate, we need $2^n$ arms with $n$ transistors each. Half of the arms are in the pull-up network, one per minterm, and each has an odd number of complemented inputs. The other half of the arms are in the pull-down network. For each combination of input values, exactly one arm is switched on. Figure 4.16 shows the CMOS circuit for a 3-input XOR gate. Whenever you toggle one of the inputs, output $Y$ toggles, because changing one input changes the parity from odd to even or vice versa.
Figure 4.16: CMOS circuit of 3-input XOR gate and interactive switch model.
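The matched 3-input XOR gate has pMOS transistors of width $W_p = 6$ and nMOS transistors of width $W_n = 3$. Each of the six inputs $A$, $\overline{A}$, $B$, $\overline{B}$, $C$, and $\overline{C}$ drives two pMOS and two nMOS transistors, resulting in a logical effort of $g = (2 \cdot 6 + 2 \cdot 3)/3 = 6$ per input. The output is connected to four pMOS and four nMOS transistors that contribute to the output capacitance and parasitic delay $p = (4 \cdot 6 + 4 \cdot 3)/3 = 12$. Since the gate topology generalizes to $n$ inputs, we find that the $n$-input XOR gate has logical effort $g = n\,2^{n-2}$ per input. The parasitic delay of the $n$-input XOR gate amounts to $p = n\,2^{n-1}$.
The exponential growth of both logical effort and parasitic delay in the number of inputs $n$ limits the applicability of this CMOS circuit to small $n$. For larger numbers of inputs, tree-structured topologies based on 2-input XOR gates as building blocks are faster.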
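The growth of the symmetric XOR gate can be derived from transistor counts. This minimal sketch assumes matched widths 2n (pMOS) and n (nMOS), a reference inverter with input capacitance 3, and at least two inputs; `xor_effort` and `odd_parity` are illustrative names.

```python
from fractions import Fraction

C_REF = 3  # assumed input capacitance of the reference inverter

def odd_parity(*bits):
    """n-input XOR: 1 if the number of 1 inputs is odd."""
    return sum(bits) % 2

def xor_effort(n):
    """Logical effort per input and parasitic delay of the symmetric n-input XOR gate."""
    w_p, w_n = 2 * n, n          # n transistors in series per arm
    per_input = 2 ** (n - 2)     # pMOS (and nMOS) transistors driven by one input
    at_output = 2 ** (n - 1)     # pMOS (and nMOS) transistors connected to the output
    g = Fraction(per_input * (w_p + w_n), C_REF)
    p = Fraction(at_output * (w_p + w_n), C_REF)
    return g, p

for n in range(2, 7):
    print(n, xor_effort(n))
```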
Asymmetric XOR Gates
We can reduce the logical effort and parasitic delay of the XOR gate by transistor sharing. Notice in Figure 4.16 that each input drives two pMOS and two nMOS transistors. If two arms share one transistor, we can reduce the logical effort of the corresponding input. Sharing does not work for all inputs simultaneously. As shown in Figure 4.17, we share the transistors of one input and its complement to reduce their logical efforts and the parasitic delay of the entire gate. Then, we can also share the transistors of a second input and its complement, but not those of the third input and its complement, without altering the logic function.
Figure 4.17: Asymmetric CMOS circuit of 3-input XOR gate and interactive switch model.
To match the XOR gate, observe that any path from supply to output consists of a series composition of 3 transistors, just like an arm in Figure 4.16. Therefore, sharing transistors does not affect the transistor widths of the matched gate. As in the symmetric XOR gate, all pMOS transistors of the matched gate have width $W_p = 6$ and all nMOS transistors width $W_n = 3$. As a result, the XOR gate in Figure 4.17 is asymmetric. The logical efforts of the inputs with shared transistors are $g = (6 + 3)/3 = 3$, while the logical efforts of the remaining inputs remain unchanged compared to the symmetric design, $g = 6$. Sharing the transistors connected to the output of the gate halves the output capacitance, and therefore, the parasitic delay of the asymmetric XOR gate is $p = 6$. The asymmetric XOR gate has smaller total logical effort and smaller parasitic delay than the symmetric gate.
XOR Gates and Duality
The symmetric 2-input XOR gate in Figure 4.15 violates the principle of duality, because the pull-up and pull-down networks are not duals of each other. In this section, we use the 2-input XOR gate to demonstrate that duality is a sufficient but not a necessary property of CMOS gates.
Recall the truth table of the 2-input XOR gate,
A B Y
0 0 0
0 1 1
1 0 1
1 1 0
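The SOP normal form for the XOR function is
$$Y = \overline{A}\,B + A\,\overline{B}.$$
Interpret this equality as $Y = 1$ if $A = 0$ AND $B = 1$ OR if $A = 1$ AND $B = 0$. Negate predicates $B = 1$ and $A = 1$ to $\overline{B} = 0$ and $\overline{A} = 0$ to simplify the argument for the switch model of pMOS transistors. Then, we obtain the equivalent XOR logic: $Y = 1$ if ($A = 0$ AND $\overline{B} = 0$) OR if ($\overline{A} = 0$ AND $B = 0$).
The Boolean expression consists of AND and OR operations, which translates directly into the series-parallel pull-up network in Figure 4.18(a). Given the pull-up network, we can derive the pull-down network by forming the dual, as shown in Figure 4.18(b). We transform the parallel arms of the pull-up network into a series composition of parallel nMOS transistors. The interactive switch model enables you to verify the truth table of the XOR gate.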
Figure 4.18: 2-input XOR gate: (a) pull-up network derived from logic function, (b) the pull-down network is the dual of the pull-up network, and (c) interactive switch model.
The XOR gate in Figure 4.18 has the same pull-up network as our original XOR gate in Figure 4.15, up to the order of the pMOS gate inputs. However, the pull-down networks are different. To arrive at the XOR gate in Figure 4.15, consider the truth table of the XOR gate again, now with the goal to derive the pull-down network from the logic function. We see that $Y = 0$ if $A = 0$ AND $B = 0$ OR if $A = 1$ AND $B = 1$. To fit the switch model of nMOS transistors, negate the predicates $A = 0$ and $B = 0$ to $\overline{A} = 1$ and $\overline{B} = 1$, and we find the XOR logic: $Y = 0$ if ($\overline{A} = 1$ AND $\overline{B} = 1$) OR if ($A = 1$ AND $B = 1$).
This Boolean expression translates directly into the series-parallel pull-down network in Figure 4.19(a). Deriving the dual of the pull-down network yields the pull-up network in Figure 4.19(b).
Figure 4.19: 2-input XOR gate: (a) pull-down network derived from logic function, (b) the pull-up network is the dual of the pull-down network, and (c) interactive switch model.
The XOR gate in Figure 4.19 has the same pull-down network as our original XOR gate in Figure 4.15, but a different pull-up network. However, we arrive at the XOR gate design in Figure 4.15 by combining the pull-up network of Figure 4.18 with the pull-down network of Figure 4.19. This design saves a wire and is simpler than the designs with dual networks, because the four arms are independent. Furthermore, the XOR gate demonstrates that CMOS gates do not necessarily have pull-up and pull-down networks that are duals of each other. The 3-input XOR gates and the majority gate are examples of CMOS gates whose pull-up and pull-down networks are not duals either.
XOR CMOS Circuits
The 2-input XOR gate in Figure 4.15 requires complemented and uncomplemented inputs. In fact, we may specify this CMOS gate as a logic function of four inputs, XOR$(A, \overline{A}, B, \overline{B})$, with the additional constraints that $\overline{A}$ is the complement of $A$ and $\overline{B}$ is the complement of $B$, which the driving circuit has to obey. The input specification distinguishes the XOR gate from other CMOS gates, the NAND and NOR gates for example. The 2-input NAND and NOR gates have two inputs not only because they realize binary logic functions but also in terms of the number of input wires that their CMOS implementations have. In contrast, we define the XOR gate as a binary logic function, draw the XOR gate symbol with two inputs, but our CMOS gate in Figure 4.15 requires four inputs. The reason is that there exists no 2-input CMOS gate for the 2-input XOR logic function, because CMOS gates can implement monotonically decreasing functions only. Hence, implementing a 2-input XOR logic gate requires a CMOS circuit rather than a CMOS gate. In this section, we discuss several CMOS implementations of the 2-input XOR logic gate.
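The monotonicity argument can be checked by brute force. The sketch below tests whether raising any single input from 0 to 1 can ever raise the output; the helper name `is_monotonically_decreasing` is an assumption for this example.

```python
from itertools import product

def is_monotonically_decreasing(f, n):
    """True if switching any one input from 0 to 1 never raises the output of f."""
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(*y) > f(*x):
                    return False
    return True

def nand2(a, b):
    return 1 - (a & b)

def nor2(a, b):
    return 1 - (a | b)

def xor2(a, b):
    return a ^ b

for name, f in (("NAND", nand2), ("NOR", nor2), ("XOR", xor2)):
    print(name, is_monotonically_decreasing(f, 2))
```

NAND and NOR pass the check, whereas XOR fails because raising B from 0 to 1 while A is 0 raises the output.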
We begin with a straightforward extension of the CMOS gate in Figure 4.15. To obtain a 2-input XOR gate, we generate the complements of inputs $A$ and $B$ by means of inverters. The resulting implementation of the XOR logic gate is shown in Figure 4.20.
Figure 4.20: 2-input XOR gate with input inverters.
If we wish to use the 2-input XOR gate in Figure 4.20 as a building block for the construction of larger circuits, we need to characterize its delay as a function of the electrical effort. Since the XOR logic gate consists of inverters and an XOR CMOS gate, we can scale the inverters and the XOR CMOS gate independently to minimize its delay. Since the XOR circuit is a reconverging branch, we could apply the 2d-analysis method to determine the scale factors for a given electrical effort. However, a close look reveals why a 2d-analysis may not yield the fastest design. First, notice that the circuit is symmetric in inputs $A$ and $B$ if we use a symmetric XOR CMOS gate. Thus, the circuit has one path of interest, say from input $A$ to output $Y$. Assume the inverter has input capacitance $C_{inv}$. The XOR CMOS gate imposes a load of one pMOS and one nMOS gate capacitance on each leg of a fork driven by input $A$. Let $C_{xor}$ be the input capacitance of inputs $A$ and $\overline{A}$ of the XOR CMOS gate, then our XOR circuit includes a 1-fork with equal loads, as shown in Figure 4.21.
Figure 4.21: The path of interest of 2-input XOR gate in Figure 4.20 includes a 1-fork with equal loads.
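Recall that we should avoid 1-forks, but let us ignore this lesson for a moment. Instead, we insist on minimizing the delay of the path from input to output in Figure 4.21 given electrical effort $H$. Notice that the circuit does not permit equalizing the delays of the legs of the fork. Instead, the path through the inverter leg adds the inverter delay to the delay through the XOR CMOS gate. Thus, all we can do by gate sizing is to minimize the delay of the slower leg with the inverter. This leaves us with the optimization problem of how much input capacitance to assign to the on-path inverter versus the off-path XOR CMOS gate. As discussed in fork design, we introduce factor $\alpha$ in the range between 0 and 1 to split the input capacitance between the inverter and the XOR CMOS gate:
$$C_{inv} = \alpha\,C_{in}, \qquad C_{xor} = (1 - \alpha)\,C_{in},$$
such that $C_{inv} + C_{xor} = C_{in}$. Then, the stage efforts of the inverter and the XOR CMOS gate are:
$$f_{inv} = g_{inv}\,\frac{C_{xor}}{C_{inv}} = \frac{1 - \alpha}{\alpha}, \qquad f_{xor} = g_{xor}\,\frac{H\,C_{in}}{C_{xor}} = \frac{2H}{1 - \alpha}.$$
Now, the lower leg of the fork has delay $D_{lower} = f_{xor} + p_{xor}$ and the upper leg has delay $D_{upper} = f_{inv} + p_{inv} + f_{xor} + p_{xor}$, where $p_{inv} = 1$ and $p_{xor} = 4$. We can minimize $D_{upper}$ by setting the derivative $dD_{upper}/d\alpha$ to zero, and find
$$\alpha = \frac{1}{1 + \sqrt{2H}}.$$
Thus when using the XOR logic gate as part of a larger circuit, we can choose $\alpha$ depending on electrical effort $H$ to minimize its delay.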
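The 1-fork optimization can be explored numerically. The sketch below is one consistent model of the setup described above, not necessarily the notation of the original derivation: it assumes a split factor `alpha` of the input capacitance assigned to the inverter, an inverter with logical effort 1 and parasitic delay 1, and an XOR CMOS gate with logical effort 2 and parasitic delay 4.

```python
import math

G_INV, P_INV = 1, 1    # reference inverter
G_XOR, P_XOR = 2, 4    # symmetric 2-input XOR CMOS gate

def inverter_leg_delay(alpha, h):
    """Delay of the slower leg: inverter followed by the XOR CMOS gate.

    alpha is the assumed fraction of the input capacitance given to the
    inverter; the XOR CMOS gate receives the remaining 1 - alpha.
    """
    f_inv = G_INV * (1 - alpha) / alpha   # inverter drives one XOR input
    f_xor = G_XOR * h / (1 - alpha)       # XOR gate drives the load h * C_in
    return f_inv + P_INV + f_xor + P_XOR

def best_alpha(h):
    """Split factor that zeroes the derivative of the inverter leg delay."""
    return 1 / (1 + math.sqrt(G_XOR * h / G_INV))

for h in (1, 2, 8, 32):
    a = best_alpha(h)
    print(h, round(a, 3), round(inverter_leg_delay(a, h), 2))
```

Under these assumptions the optimal share of the inverter shrinks as the electrical effort grows, because more of the input capacitance must go to the XOR CMOS gate that drives the load.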

If we wish to minimize the delay of the XOR logic gate further, we could make the XOR CMOS gate asymmetric such that the logical effort of the complemented input is smaller than that of the uncomplemented input, so as to speed up the path through the inverter leg. Alternatively, we may recall our lesson that a 2-fork is preferable to a 1-fork. Hence, our second attempt for an XOR logic gate uses 2-forks to generate the complemented and uncomplemented inputs, as shown in Figure 4.22.
Figure 4.22: 2-input XOR gate with 2-fork inputs.
We can analyze this XOR circuit with our 2d-analysis method. The path of interest is shown in Figure 4.23, with effort delay $d_1$ assigned to the inverters of the 2-fork, and effort delay $d_2$ for the XOR CMOS gate.
Figure 4.23: The path of interest of the 2-input XOR gate in Figure 4.22 includes a 2-fork with equal loads.
The 2d-analysis enables us to express effort delay $d_2$ as a function of $d_1$, such that the path effort delay becomes a function of $d_1$ alone. Minimizing the path effort delay by setting the derivative to zero yields a polynomial in $d_1$. For a given electrical effort $H$, we determine delay $d_1$ by finding the positive real root of the polynomial. Then, the minimum delay of the XOR logic gate is the sum of the minimum path effort delay and the path parasitic delay. Figure 4.28 below compares the delays of the two XOR gates with 1-fork and 2-fork inputs as a function of electrical effort $H$. Not unexpectedly, for all $H$ the 2-fork design is faster than the 1-fork design. Furthermore, the 1-fork design slows down much more rapidly than the 2-fork design for increasing $H$.
The speed difference between the 1-fork and 2-fork XOR gate designs motivates the search for even faster XOR gate circuits. In the following, we present three alternative designs. We begin with the reconverging NAND circuit in Figure 4.24 on the left. Using Boolean algebra, it is straightforward to verify that this circuit implements the 2-input XOR function. During our study of reconverging branches, we have identified the 2-fork style modification in Figure 4.24 on the right as superior because it is faster. The comparison of the delays in Figure 4.28 shows that the reconverging NAND circuit is slower than the 2-fork XOR gate circuit for small electrical efforts. For larger electrical efforts, the reconverging NAND circuit is faster.
Figure 4.24: The reconverging branch circuit implements a 2-input XOR gate (left). The 2-fork style (right) yields a faster circuit.
Since NAND gates have not only a relatively small logical effort but also small parasitic delay compared to other 2-input gates, a 2-level NAND circuit is another top contender for an XOR gate. Again, we may use a 1-fork or the 2-fork shown in Figure 4.25 to generate the complemented and uncomplemented inputs. The 2-fork is faster than the 1-fork, as usual. The resulting XOR circuit has four stages on the 2-inverter leg, so that we expect this circuit to be better suited for larger electrical efforts.
Figure 4.25: A 2-level NAND circuit with 2-fork inputs implements a 2-input XOR gate.
To determine the minimum delay as a function of electrical effort $H$ we apply the 1d-analysis method with effort delay $d$ assigned to the gates as shown in Figure 4.26. The 1d-analysis of the circuit yields a polynomial in $d$. Given electrical effort $H$, the positive real root of the polynomial equals effort delay $d$ for which the circuit has minimum path delay. Figure 4.28 plots the minimum path delay over electrical effort $H$. We see that the 2-fork XOR gate is faster for small electrical efforts. For larger electrical efforts, the 4-stage NAND circuit is faster.
Figure 4.26: Path of interest of 2-level NAND circuit with 2-fork inputs.
Compound gates enable us to implement two-level logic in a compact fashion. Thus, we may implement an XOR gate using OAI or AOI compound gates. Figure 4.27 shows on the left an XOR circuit design based on an OAI gate. This circuit uses a NAND gate that behaves like the inverter of a 1-fork driving the OAI gate. This circuit is indeed slower than the more complex version on the right of Figure 4.27 that uses a 2-fork structure at the inputs and an AOI gate.
Figure 4.27: 2-input XOR gate based on an OAI compound gate (left), and with 2-fork inputs and an AOI compound gate (right).
We analyze the AOI circuit using a 2d-analysis, by assigning effort delays to the NAND gate and the inverter in the upper leg, to the lower leg of the fork, and to the AOI gate. Then, the 2d-analysis permits expressing one effort delay as a function of the other. Minimizing the path effort delay by setting its derivative to zero yields a polynomial. Given electrical effort $H$, the positive real roots of the polynomial yield the minimum path delay, which is plotted in Figure 4.28 as a function of $H$. We find that the AOI circuit has competitive delays across the range of $H$, but for no $H$ is the AOI circuit the fastest XOR gate.
Figure 4.28: Minimum delays of XOR circuits over electrical effort. (Curves: 1-fork XOR, 2-fork XOR, reconverging NAND, 2-level NAND, AOI; axes: minimum delay versus electrical effort.)
The comparison in Figure 4.28 suggests that the fastest XOR implementation depends on electrical effort $H$. In particular, among the studied alternatives, the 2-fork XOR circuit of Figure 4.22 is fastest for small electrical efforts, whereas for larger electrical efforts the two-level NAND circuit of Figure 4.25 is fastest.
XNOR Gates
The XNOR gate is very similar to the XOR gate. Logically, the XNOR gate produces the complement of the XOR gate. Thus, an $n$-input XNOR gate implements the even parity operation, which outputs 1 if the number of inputs with value 1 is even. The truth table below shows the XNOR function for $n = 3$, which equals 1 if the number of 1-inputs is 0 or 2.
A B C Y
0 0 0 1
0 0 1 0
0 1 0 0
0 1 1 1
1 0 0 0
1 0 1 1
1 1 0 1
1 1 1 0
The CMOS implementation of a 2-input XNOR gate is shown in Figure 4.29. Like the 2-input XOR gate, it consists of four arms. The pull-up and pull-down networks are not duals of each other. Noteworthy is the fact that XOR and XNOR CMOS gates have the same topology. In particular, we do not need an output inverter to implement the complement, as required for the AND and OR gates.
Figure 4.29: CMOS circuit for XNOR gate and interactive switch model.
Since the topology of the XNOR gate equals that of the XOR gate, both logical effort and parasitic delay of the XNOR gate are equal to those of the XOR gate.
XNOR gates with more than two inputs can be constructed analogous to the XOR gate. Also analogous to the XOR logic gate are the design issues of CMOS circuits for a 2-input XNOR logic gate.
4.1.6. Skewed Gates
When optimizing a circuit for speed, we may want the falling transition of a signal from high to low voltage to be faster than the rising transition from low to high voltage or vice versa. In this section, we discuss skewed gates whose critical transition is faster than the noncritical transition. In contrast, CMOS gates with equal rising and falling delays are unskewed or normal-skew gates. We distinguish between HI-skew gates and LO-skew gates. In a HI-skew gate the rising output transition is the faster, critical transition, and in LO-skew gates the falling output transition is critical.
Design and Analysis
We can skew a gate by transistor sizing. For example, consider the matched 2-input NAND gate in Figure 4.30(a). By definition, the transistors of the matched NAND gate are sized to provide equal drive currents for the rising and falling transitions. Furthermore, the drive currents are equal to those of the reference inverter. Since the delay of a gate depends on its drive current, the rising and falling delays are equal if the magnitudes of the drive currents of the corresponding transitions are equal. The key insight for designing a skewed gate is to shrink the transistors in the CMOS network that drives the noncritical transition. For example, if we wish to speed up the rising transition of the NAND gate, we shrink the nMOS transistors of the pull-down network. The effect is that the pMOS transistors deliver the same drive current as the matched NAND gate on the rising transition, but smaller nMOS transistors reduce the logical effort of the gate, resulting in a faster transition. For example, the HI-skew NAND gate in Figure 4.30(b) halves the widths of the nMOS transistors compared to the matched NAND gate to speed up the rising transition.
Figure 4.30: Skewing a 2-input NAND gate: (a) matched gate, (b) HI-skew gate, (c) downscaled gate.
To characterize the delay of the HI-skew NAND gate, we determine its logical effort and parasitic delay. However, due to the unmatched transistor sizes, logical effort and parasitic delay differ for the rising and falling transitions. We use $g_u$ and $p_u$ to denote the logical effort and parasitic delay of the rising output transition driven by the pull-up network, and $g_d$ and $p_d$ for the falling output transition driven by the pull-down network. Then the delays of the rising and falling transitions of a skewed gate are:
$$d_u = g_u h + p_u, \qquad d_d = g_d h + p_d,$$
where $h$ is the electrical effort of the gate.
We consider the rising output transition first. The output of a 2-input NAND gate transitions from 0 to 1 if initially both inputs $A$ and $B$ are 1, and one of the inputs switches from 1 to 0. The corresponding pMOS transistor in the pull-up network drives the output. Which one of the two pMOS transistors switches does not matter in this example, because the NAND gate is symmetric. Since the width of the pMOS transistors of the HI-skew gate equals the width in the matched NAND gate, both gates produce the same drive current. Furthermore, because the matched NAND gate has the same pull-up drive current as the reference inverter, we conclude that the HI-skew gate has the same pull-up drive current as the reference inverter. Now, recall our definition of the logical effort of a CMOS gate as the ratio of its input capacitance to that of the reference inverter, assuming that the CMOS gate is sized to deliver the same drive current as the reference inverter. Since the drive currents are equal, the logical effort $g_u$ of the rising transition of the HI-skew gate is the ratio of the input capacitance of the HI-skew gate and the input capacitance of the reference inverter. According to Figure 4.30(b), each input has input capacitance $C_{in} = W_p + W_n = 3$, because $W_p = 2$ and $W_n = 1$, so that
$$g_u = \frac{3}{3} = 1.$$
Shrinking the pull-down transistors of the HI-skew gate retains the drive current through the pull-up network while reducing the input capacitance. As a result, logical effort $g_u = 1$ is less than the logical effort of the matched gate, $g = 4/3$. Similarly, the parasitic delay of the rising transition of the HI-skew gate is
$$p_u = \frac{2 + 2 + 1}{3} = \frac{5}{3}.$$
We find that parasitic delay $p_u = 5/3$ is less than that of the matched gate, $p = 2$.
Next, we determine $g_d$ and $p_d$ of the falling transition of the HI-skew gate. To apply the definition of logical effort, we construct the downscaled NAND gate shown in Figure 4.30(c). The downscaled gate has the same nMOS transistor widths as the HI-skew gate, so that their drive currents are equal. We halve the width of the pull-up transistors, so that the downscaled gate is a scaled version of the matched NAND gate with scale factor $s = 1/2$. Now, we can argue that the HI-skew NAND gate has the same pull-down drive current as the matched NAND gate scaled by $s$, which in turn has the same pull-down drive current as a reference inverter scaled by $s$. Therefore, the logical effort of the falling transition of the HI-skew gate is the ratio of the input capacitance of the HI-skew gate and the input capacitance of the scaled reference inverter:
$$g_d = \frac{3}{3s} = \frac{3}{3/2} = 2$$
for each input. The cost of the reduced logical effort $g_u = 1$ of the rising transition is an increased logical effort $g_d = 2$ of the falling transition, compared to $g = 4/3$ of the normal-skew gate. Analogously, the parasitic delay of the falling transition of the HI-skew gate is
$$p_d = \frac{5}{3s} = \frac{10}{3}.$$
The rising and falling logical efforts and parasitic delays are related through scale factor $s$ of the downscaled matched gate:
$$g_d = \frac{g_u}{s} \quad \text{and} \quad p_d = \frac{p_u}{s}.$$
Notice that scale factor $s$ is also the scale factor by which we shrank the pull-down nMOS transistors of the HI-skew gate to begin with. This observation simplifies the design and analysis of a skewed gate based on the model of logical effort. Below is the procedure to design and analyze a HI-skew gate; the procedure for a LO-skew gate can be viewed as dual:
1. Choose a scale factor $s$ for the nMOS transistors of the pull-down network of the matched gate.
2. The logical effort $g_u$ of the rising transition is the ratio of the input capacitance of the HI-skew gate and the reference inverter. The falling transition has logical effort $g_d = g_u / s$.
3. The parasitic delay $p_u$ of the rising transition is the ratio of the output capacitance of the HI-skew gate and the reference inverter. The falling transition has parasitic delay $p_d = p_u / s$.
To facilitate a comparison between HI-skew and normal-skew gates, we use the average of the rising and falling quantities:
$$g = \frac{g_u + g_d}{2}, \qquad p = \frac{p_u + p_d}{2}.$$
The average provides a measure of the transition delay of a gate over time. When the gate is in operation as part of a larger circuit, half of all output transitions are rising and the other half are falling transitions. For the HI-skew NAND gate in Figure 4.30, the average logical effort is $g = (1 + 2)/2 = 3/2$ and the average parasitic delay is $p = (5/3 + 10/3)/2 = 5/2$. Thus, although our HI-skew NAND gate has a faster rising transition than the unskewed gate, both average logical effort and parasitic delay are larger than $g = 4/3$ and $p = 2$ of the unskewed gate.
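The three-step procedure above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the matched 2-input NAND has pMOS and nMOS widths of 2, the reference inverter has an input capacitance of 3 (mobility ratio 2), and the function name `hi_skew_nand2` is hypothetical.

```python
from fractions import Fraction as F

REF_INV_CIN = F(3)  # assumed reference inverter: pMOS width 2 + nMOS width 1


def hi_skew_nand2(s):
    """Efforts of a 2-input NAND whose nMOS widths are scaled by factor s."""
    wp = F(2)                 # pMOS width, unchanged from the matched gate
    wn = F(2) * s             # nMOS width, shrunk by s
    c_in = wp + wn            # each input drives one pMOS and one nMOS gate
    c_out = 2 * wp + wn       # two pMOS drains and one nMOS drain at the output
    g_u = c_in / REF_INV_CIN  # step 2: rising logical effort
    p_u = c_out / REF_INV_CIN # step 3: rising parasitic delay
    g_d = g_u / s             # falling quantities follow from scale factor s
    p_d = p_u / s
    return {"g_u": g_u, "g_d": g_d, "g": (g_u + g_d) / 2,
            "p_u": p_u, "p_d": p_d, "p": (p_u + p_d) / 2}


skewed = hi_skew_nand2(F(1, 2))   # HI-skew gate with halved nMOS widths
matched = hi_skew_nand2(F(1))     # s = 1 reproduces the matched gate
print(skewed)
print(matched)
```

With `s = 1` the function returns the matched values, which is a quick sanity check on the capacitance bookkeeping.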
Fast Skewed Gates
Our observation that the HI-skew NAND gate has a larger average delay than the unskewed NAND raises the question whether a skewed gate exists that is faster than its unskewed version. This is the case indeed. The LO-skew 2-input NOR gate in Figure 4.31(b) has a smaller average logical effort and smaller average parasitic delay than the normal-skew NOR gate.
Figure 4.31: Skewing a 2-input NOR gate: (a) matched gate, (b) LO-skew gate, (c) upscaled gate.
The LO-skew NOR speeds up the falling transition by shrinking the pull-up pMOS transistors. In Figure 4.31, we scale the pMOS transistors of the matched NOR gate with factor $s = 3/4$ to obtain transistor width $W_p = 3$. Since the nMOS transistors are unchanged compared to the matched NOR gate, the drive current of the falling transition through the pull-down network of the LO-skew gate is the same as the pull-down drive current of the matched NOR gate and the reference inverter. Therefore, the logical effort of the falling transition is $g_d = (3 + 1)/3 = 4/3$. The logical effort of the rising transition is then $g_u = g_d / s = 16/9$. Note that $g_u$ is the ratio of the input capacitances of the LO-skew gate and the scaled reference inverter, and $s$ is equal to the scale factor of the upscaled matched NOR gate in Figure 4.31(c). The average logical effort of the LO-skew NOR gate is $g = (g_u + g_d)/2 = 14/9$, which is less than the logical effort $g = 5/3$ of the normal-skew gate. Similarly, we find the parasitic delays $p_d = 5/3$ and $p_u = p_d / s = 20/9$. The average parasitic delay of $p = 35/18$ units is less than the parasitic delay $p = 2$ of the normal-skew NOR gate.
We conclude that for all electrical efforts $h$ the average delay of the LO-skew NOR gate, $d = (14/9)\,h + 35/18$, is slightly less than the delay of the normal-skew NOR gate, $d = (5/3)\,h + 2$. We can formulate a minimization problem to determine scale factor $s$ for the LO-skew NOR gate such that the average delay assumes its minimum. Using calculus, we find that scale factor $s$ minimizes the average delay if it is chosen as a function of electrical effort $h$. For $h \ge 0$ the range of $s$ is very small, $1/2 < s \le 1/\sqrt{2}$. Therefore, choosing $s = 1/2$ will approximate the minimum average delay reasonably well for all $h$. The resulting LO-skew NOR gate has pMOS widths $W_p = 2$. This gate is slightly faster than the matched NOR gate, because its average logical effort $g = 3/2$ is less than $5/3$ and the average parasitic delay $p = 2$ is the same. Before you decide to use LO-skew NOR gates in your circuits, however, notice that the speed benefit compared to the normal-skew NOR is relatively small. Furthermore, in many circuits, it is not the average delay but the worst-case delay that limits the overall performance. The worst-case delay of the LO-skew NOR is the delay of the noncritical rising transition, which is larger than the delay of the matched NOR gate.
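The minimization over the scale factor can be explored numerically. The sketch below assumes a matched 2-input NOR with pMOS width 4 and nMOS width 1, a reference inverter with input capacitance 3, and an output node loaded by one pMOS drain and two nMOS drains. The closed form in `best_s` follows from setting the derivative of the average delay to zero under exactly these assumptions; all function names are hypothetical.

```python
import math


def lo_skew_nor2(s):
    """Return (g_u, g_d, p_u, p_d) of a 2-input NOR with pMOS widths scaled by s."""
    wp, wn = 4.0 * s, 1.0
    g_d = (wp + wn) / 3.0        # falling transition: nMOS network is unchanged
    p_d = (wp + 2.0 * wn) / 3.0  # one pMOS drain and two nMOS drains at the output
    return g_d / s, g_d, p_d / s, p_d


def avg_delay(s, h):
    """Average of rising and falling gate delay for electrical effort h."""
    g_u, g_d, p_u, p_d = lo_skew_nor2(s)
    return 0.5 * (g_u + g_d) * h + 0.5 * (p_u + p_d)


def best_s(h):
    """Scale factor that minimizes avg_delay(s, h) in this model."""
    return 0.5 * math.sqrt((h + 2.0) / (h + 1.0))


for h in (0.0, 1.0, 4.0, 100.0):
    s = best_s(h)
    print(f"h={h:6.1f}  best s={s:.3f}  avg delay={avg_delay(s, h):.3f}  "
          f"s=1/2 delay={avg_delay(0.5, h):.3f}  matched={5.0 / 3.0 * h + 2.0:.3f}")
```

The printed rows let a reader compare the optimum against the fixed choice of one half and against the matched gate.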
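Multistage Paths
The delay of an $n$-stage path with skewed gates depends on its output transition. Let index $i$ denote the $i$-th gate from the last stage, and let $g_{u,i}, p_{u,i}$ and $g_{d,i}, p_{d,i}$ be its rising and falling logical efforts and parasitic delays. Then the path delays of the rising and falling output transition are:
$$D_u = \sum_{i\ \mathrm{odd}} \left(g_{u,i} h_i + p_{u,i}\right) + \sum_{i\ \mathrm{even}} \left(g_{d,i} h_i + p_{d,i}\right)$$
$$D_d = \sum_{i\ \mathrm{odd}} \left(g_{d,i} h_i + p_{d,i}\right) + \sum_{i\ \mathrm{even}} \left(g_{u,i} h_i + p_{u,i}\right)$$
Since CMOS gates implement monotonically decreasing functions, every second stage of the path transitions in one direction, and all other stages transition in the opposite direction. Therefore, if the output of gate 1 in the last stage rises, the outputs of all gates with odd index $i$ rise, and the outputs of the gates with even $i$ fall. Path delay $D_u$ is the sum of the corresponding skewed gate delays. Analogously, if the output of the last gate of the path falls, the path delay is $D_d$.
The average path delay is a function of the average logical efforts and parasitic delays:
$$D = \frac{D_u + D_d}{2} = \sum_{i=1}^{n} \left(g_i h_i + p_i\right), \qquad g_i = \frac{g_{u,i} + g_{d,i}}{2}, \quad p_i = \frac{p_{u,i} + p_{d,i}}{2}.$$
If we wish to minimize the average path delay, we apply the method of logical effort for multistage paths as we know it already, except that we use the average logical efforts and parasitic delays of the skewed gates for path and gate sizing. The design procedure is the same as for unskewed gates. With skewed gates we have an additional design goal at our disposal, which is to minimize the delay of the critical transition.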
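The alternation of rising and falling stages can be written down directly. The following sketch is illustrative only: the function `path_delays` is hypothetical, the gate efforts are those of a HI-skew NAND (nMOS halved), a LO-skew NOR (pMOS halved) and an inverter, and the electrical effort of 1 per stage is an assumption chosen for the demonstration, not a value taken from the text.

```python
from fractions import Fraction as F


def path_delays(stages):
    """Rising, falling and average path delay.

    stages is ordered from the LAST stage (index 1) back to the first stage.
    Each entry holds g_u, g_d, p_u, p_d and the stage's electrical effort h.
    """
    d_rise = F(0)
    d_fall = F(0)
    for i, st in enumerate(stages, start=1):
        up = st["g_u"] * st["h"] + st["p_u"]
        down = st["g_d"] * st["h"] + st["p_d"]
        if i % 2 == 1:          # odd index: same direction as the path output
            d_rise += up
            d_fall += down
        else:                   # even index: opposite direction
            d_rise += down
            d_fall += up
    return d_rise, d_fall, (d_rise + d_fall) / 2


h = F(1)  # assumed electrical effort of every stage
inverter = {"g_u": F(1), "g_d": F(1), "p_u": F(1), "p_d": F(1), "h": h}
lo_nor = {"g_u": F(2), "g_d": F(1), "p_u": F(8, 3), "p_d": F(4, 3), "h": h}
hi_nand = {"g_u": F(1), "g_d": F(2), "p_u": F(5, 3), "p_d": F(10, 3), "h": h}

d_rise, d_fall, d_avg = path_delays([inverter, lo_nor, hi_nand])
print(d_rise, d_fall, d_avg)
```

Because the average is the mean of the two transition delays, the critical transition gains exactly what the noncritical transition loses.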
Example 4.1: Path with Skewed Gates
Consider the 3-stage path in Figure 4.32 with a HI-skew NAND gate in stage 1, a LO-skew NOR gate in stage 2 and a normal-skew inverter in stage 3. Given the transistor widths in Figure 4.32 and the load capacitance, determine the path delay of the critical rising transition, the path delay of the noncritical falling transition, and the average path delay.
Figure 4.32: A 3-stage path with a HI-skew NAND, a LO-skew NOR, and a normal-skew inverter.
Analysis of the gate logical efforts and parasitic delays yields the tabulated values.
gate          g_u  g_d  g    p_u  p_d   p
HI-skew NAND  1    2    3/2  5/3  10/3  5/2
LO-skew NOR   2    1    3/2  8/3  4/3   2
NOT           1    1    1    1    1     1
The average path delay from the inputs to output $Y$ is the sum of the average gate delays.
The minimum delay is the delay of the critical transition. The critical transition of the path is the rising output transition, which the path speeds up with the HI-skew NAND gate followed by the LO-skew NOR gate. The rising delay is the sum of the rising delay of the NAND gate, the falling delay of the NOR gate, and the rising delay of the inverter.
The maximum delay is the delay of the falling output transition, which is the sum of the falling delay of the NAND gate, the rising delay of the NOR gate, and the falling delay of the inverter.
We find that the critical transition of the path is 2.5 delay units faster than the average delay, at the expense of the noncritical transition which is 2.5 delay units slower.
We close this section with a brief note on another practically relevant application of skewed gates, i.e. the design of gates with equal rising and falling delays if the ratio of the carrier mobilities of the CMOS fabrication process is not 2. Throughout this text, we assume that the mobility ratio is 2. Value 2 is convenient for back-of-the-envelope estimates, but is merely an approximation to reality, where typical values vary with the process. In CMOS processes where the mobility ratio differs from 2, we may skew the gates to equalize the rise and fall times. The model of logical effort enables us to determine the desired transistor widths, see Chapter 7 in [SSH99].
4.2. Comparators
Comparator circuits compare two $n$-bit signals $A$ and $B$. The magnitude comparisons, such as less-than or greater-than, are commonly implemented with an arithmetic circuit. However, for two special cases, equality comparison and equality to zero, simpler circuits exist.
4.2.1. Equality to Zero
Given an $n$-bit signal $A$, we wish to determine whether $A = 0$ is true or false. Equality $A = 0$ is true, if for each signal $A_i$ we have $A_i = 0$. Thus, we may use an $n$-input NOR gate to compute the equality to zero:
$$Y = \overline{A_0 + A_1 + \cdots + A_{n-1}}$$
Note that value 0 in equality $A = 0$ refers to an $n$-bit signal of 0s, whereas output $Y$ of the NOR gate is a single bit.
For larger values of $n$ we may use a tree-structured NOR gate to minimize the delay. Figure 4.33 shows one possible tree structure for $n = 4$, which is due to the Boolean identity:
$$\overline{A_0 + A_1 + A_2 + A_3} = \overline{A_0 + A_1} \cdot \overline{A_2 + A_3}$$
As for any tree structure, the fastest design for an $n$-bit equality-to-zero comparator depends on the number of inputs $n$.
Figure 4.33: Equality-to-zero comparator for a 4-bit signal: (left) 4-input NOR gate, (right) one possible tree-structured NOR gate.
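The equivalence of the flat NOR gate and a tree structure can be checked exhaustively for 4 bits. This is a minimal behavioral sketch; the helper names are hypothetical, and the tree shown (two 2-input NORs combined by an AND) is one possible structure obtained from De Morgan's theorem.

```python
from itertools import product


def nor(*bits):
    """n-input NOR on bits given as 0/1 integers."""
    return int(not any(bits))


def is_zero_flat(a):
    """Equality to zero with a single n-input NOR gate."""
    return nor(*a)


def is_zero_tree4(a):
    """Equality to zero for 4 bits: two 2-input NORs followed by an AND."""
    return nor(a[0], a[1]) & nor(a[2], a[3])


mismatches = [a for a in product((0, 1), repeat=4)
              if is_zero_flat(a) != is_zero_tree4(a)]
print("mismatches:", mismatches)
```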
4.2.2. Equality Comparator
Given two $n$-bit signals $A$ and $B$, we wish to determine whether $A = B$ is true or false. The equality-to-zero comparison may be viewed as a special case of the equality comparison where $B$ is identical to zero. Equality $A = B$ is true if for each bit position $i$ we have $A_i = B_i$. More precisely:
$$A = B \;\Leftrightarrow\; \forall\, i \in [0, n-1] : A_i = B_i$$
Recall that the 2-input XNOR gate realizes the equality relation: $\overline{A_i \oplus B_i} = 1$ if and only if $A_i = B_i$. Thus, we may implement an equality comparator using one 2-input XNOR gate for each bit position, and an $n$-input AND gate to compute the all-quantification by means of a conjunction:
$$Y = \overline{A_0 \oplus B_0} \cdot \overline{A_1 \oplus B_1} \cdots \overline{A_{n-1} \oplus B_{n-1}}$$
Output $Y = 1$ if $A = B$. For larger values of $n$ we implement the AND gate as a tree structure. For example, Figure 4.34 shows a 4-bit equality comparator with a NAND-NOR tree implementation of the 4-input AND gate on the right.
Figure 4.34: Equality comparator for 4-bit signals: (left) conjunction with 4-input AND gate, (right) one possible tree-structured AND gate.
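A behavioral model makes the XNOR-plus-AND structure concrete. The sketch below is an illustration with hypothetical function names; the tree variant assumes the 4-input AND is built from two 2-input NANDs followed by a 2-input NOR.

```python
from itertools import product


def xnor(a, b):
    """2-input XNOR: 1 if both bits are equal."""
    return int(a == b)


def equal_flat(a_bits, b_bits):
    """One XNOR per bit position, followed by an n-input AND."""
    return int(all(xnor(a, b) for a, b in zip(a_bits, b_bits)))


def equal_tree4(a_bits, b_bits):
    """4-bit comparator with a NAND-NOR tree replacing the 4-input AND."""
    e = [xnor(a, b) for a, b in zip(a_bits, b_bits)]
    nand_lo = 1 - (e[0] & e[1])
    nand_hi = 1 - (e[2] & e[3])
    return 1 - (nand_lo | nand_hi)   # NOR of the two NAND outputs


words = list(product((0, 1), repeat=4))
agree = all(equal_flat(a, b) == equal_tree4(a, b) == int(a == b)
            for a in words for b in words)
print("tree matches flat comparator:", agree)
```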
4.3. Multiplexer
A multiplexer, or mux for short, is a circuit with $n$ data inputs and a number of select inputs that choose one of the $n$ data inputs and steer it to output $Y$. For example, a 2-input multiplexer, also called 2:1 mux to emphasize 2 inputs and 1 output, has $n = 2$ data inputs, $D_0$ and $D_1$, and select input $S$. Output $Y$ is defined such that $Y = D_0$ if $S = 0$, and $Y = D_1$ if $S = 1$.
The circuit symbol of the 2:1 mux signifies that select signal $S$ steers input $D_0$ to output $Y$ if $S = 0$, and otherwise input $D_1$. We can model the functional behavior of a 2:1 multiplexer using a switch with two closed positions. Note that this switch model differs from our standard model, where a switch is either open or closed.
To design a circuit for a 2:1 mux, we first construct a truth table. The 2:1 mux has three inputs, $S$, $D_1$, and $D_0$, and one output $Y$. We transliterate the specification logic straight into the truth table: if $S = 0$ then set output $Y = D_0$ (top four rows), else if $S = 1$ set $Y = D_1$ (bottom four rows):
S D1 D0 Y
0 0  0  0
0 0  1  1
0 1  0  0
0 1  1  1
1 0  0  0
1 0  1  0
1 1  0  1
1 1  1  1
Output $Y$ of this truth table is represented by the Boolean expression:
$$Y = \overline{S}\, D_0 + S\, D_1$$
This expression suggests an implementation of the 2:1 mux using logic gates. The critical path of this circuit is a 3-stage path from input $S$ through the inverter, one AND gate and the OR gate to output $Y$.
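The sum-of-products form of the 2:1 mux can be checked against its specification by enumerating all eight input combinations. This is a small sketch; the function name `mux2` and the convention that the select value 0 picks the data input with index 0 are assumptions of the example.

```python
from itertools import product


def mux2(s, d1, d0):
    """2:1 mux as a sum of products: (NOT s AND d0) OR (s AND d1)."""
    return ((1 - s) & d0) | (s & d1)


# Rows are ordered like a truth table with columns S, D1, D0, Y.
truth_table = [(s, d1, d0, mux2(s, d1, d0))
               for s, d1, d0 in product((0, 1), repeat=3)]
for row in truth_table:
    print(*row)
```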
A faster implementation of the 2:1 mux can be based on a compound gate, or with the CMOS circuit shown in Figure 4.35. Since the circuit produces the complemented output $\overline{Y}$, it implements an inverting 2:1 mux. Each data input drives one vertical select arm consisting of two nMOS and two pMOS transistors. The complemented and uncomplemented select inputs enable one of the select arms to drive the output. Depending on the value of select input $S$, either the left arm or the right arm is enabled. The output value is the complement of the data input that drives the enabled arm.
Figure 4.35: CMOS circuit for inverting 2:1 mux and interactive switch model.
The inverting 2:1 multiplexer in Figure 4.35 has a remarkably flexible structure that extends to $n$ data inputs, even if $n$ is not a power of 2. If we are willing to generate one select signal per data input, e.g. by means of a decoder, we can replicate the select arm $n$ times to construct the $n$-way multiplexer in Figure 4.36. Select arm $i$ drives the complement of $D_i$ onto output $Y$ if select input $S_i = 1$. All other select signals must be 0. Otherwise, two enabled select arms might drive different values on output $Y$, effectively short-circuiting $V_{DD}$ and GND.
Figure 4.36: An $n$-way multiplexer with one select arm per data input.
Since each select arm consists of two series pMOS transistors in the pull-up network and two series nMOS transistors in the pull-down network, the matched $n$-way multiplexer has pMOS width $W_p = 4$ and nMOS width $W_n = 2$. Therefore, the logical effort of each data input is $g = 6/3 = 2$, independent of the number of inputs. This property is a unique feature of the multiplexer circuit. Unfortunately, the parasitic delay $p = 2n$ grows proportional to the number of inputs $n$. Nevertheless, we note that the effort delay of the multiplexer circuit is independent of its fan-in.
For larger numbers of inputs the parasitic delay of the $n$-way circuit in Figure 4.36 may dominate the total delay. In this case, tree-structured multiplexers provide a fast alternative. As an example, consider a 4:1 multiplexer with four data inputs and two select inputs, as shown on the left in Figure 4.37. The 4:1 multiplexer steers data input $D_i$ to output $Y$ if the select signal equals $i$ in binary format. Note that $k$ select inputs enable us to select one of $2^k$ data inputs, because a binary number with $k$ bits can represent unsigned numbers in range $[0, 2^k - 1]$. In particular, each of the $k$ select signals may be used as select input for one of $k$ levels in a binary tree of 2:1 muxes with $2^k$ data inputs. The multiplexer tree on the right in Figure 4.37 has 4 inputs and 2 levels.
Figure 4.37: A 4:1 multiplexer (left) built as a tree of 2:1 multiplexers (right).
The functionality of the 4:1 multiplexer is easy to verify by perfect induction. For example, if $S = 00$, i.e. $S_1 = 0$ and $S_0 = 0$, then the level-1 output mux steers its 0-input to $Y$, because $S_1 = 0$. The 0-input of the level-1 mux is driven by the top level-0 mux, which steers $D_0$ to its output, because $S_0 = 0$. Therefore output $Y = D_0$ for $S = 00$. We argue analogously about the other cases of the perfect induction: $S = 01$ yields $Y = D_1$, $S = 10$ yields $Y = D_2$, and $S = 11$ yields $Y = D_3$.
A delay analysis reveals that an $n$-way mux with independent select arms has a delay of $2H + 2n$ per data input, whereas the 4:1 tree mux with two 2-way muxes on its critical path has a minimum delay of $4\sqrt{H} + 8$ delay units per data input. Thus, the tree mux is faster than the 4-way circuit for electrical effort $H > 4$. This delay analysis neglects the fact that the 4-way mux circuit is inverting whereas the tree mux is not. We may also design tree-structured multiplexers with 4-way or 8-way muxes as building blocks. Due to the flexibility of the $n$-way mux circuit, the design space for tree-structured muxes is quite large. It has been shown, however, that 4-way muxes as building blocks generally produce the fastest tree structures for $n$ data inputs. Chapter 11 of [SSH99] studies tree structures in more detail.
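The comparison between the 4-way mux and the tree of 2-way muxes can be tabulated with a few lines of code. The sketch assumes a logical effort of 2 per data input and a parasitic delay of 2n for an n-way mux, and it sizes the two-stage tree for equal stage efforts; the function names are hypothetical.

```python
import math


def nway_mux_delay(n, h):
    """Delay of a single n-way mux: effort delay g*h with g = 2, parasitic 2n."""
    return 2.0 * h + 2.0 * n


def tree4_mux_delay(h):
    """Minimum delay of two cascaded 2-way muxes driving electrical effort h.

    Path logical effort is 2 * 2 = 4, so each of the two stages bears
    sqrt(4 * h); each 2-way mux adds a parasitic delay of 4.
    """
    return 2.0 * math.sqrt(4.0 * h) + 2.0 * 4.0


for h in (1.0, 4.0, 9.0, 16.0):
    print(f"h={h:5.1f}  4-way={nway_mux_delay(4, h):6.2f}  "
          f"tree={tree4_mux_delay(h):6.2f}")
```

Both parasitic delays equal 8 in this model, so the crossover is decided by the effort delays alone.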
4.4. Tristate Inverter and Transmission Gates
A single select arm of the multiplexer circuit in Figure 4.36 is a tristate inverter. The name tristate refers to the fact that the output can assume a third state in addition to the usual binary states 0 and 1. In the third state, output $Y$ of the tristate inverter is connected neither to $V_{DD}$ nor GND. Therefore, the output has an undetermined voltage. We say the output floats, and denote this third state with letter Z. Figure 4.38 shows that the output of the select arm floats if select signal $S = 0$. Otherwise, if $S = 1$, the arm functions like an inverter, i.e. $Y = \overline{D}$.
Figure 4.38: Interactive switch model of one multiplexer select arm.
In a multiplexer circuit, all but one select arm float. The selected arm does not float, and drives output $Y$. In a figurative sense, the floating arms shut up, leaving the word to the only selected arm. In an $n$-way multiplexer with $n$ arms, there is always one selected arm, and the output never floats. In general, this is the expected behavior from a multiplexer, that an implementation with logic gates produces as well. Tristate inverters expand the applicability of select arms, for example as drivers for shared buses where all arms may float.
To arrive at the traditional circuit realization of a tristate inverter, we play the following circuit trick. Note that we can introduce a new wire in the select arm, shown in Figure 4.39(b), without affecting its functionality. Now, imagine we could pull output $Y$ to the right like a rubber band. Then, we obtain the topologically equivalent circuit in Figure 4.39(c).
Figure 4.39: Three equivalent circuits: (a) select arm, (b) select arm with additional wire $X$, (c) inverter driving a transmission gate.
The parallel composition of an nMOS and pMOS transistor in Figure 4.39(c) is a transmission gate. The transmission gate disconnects its two terminals if $S = 0$, and connects its two terminals if $S = 1$. In Figure 4.39(c), terminal $X$ is driven by an inverter, such that $X$ carries the complement of the data input. If $S = 0$, output $Y$ is disconnected from $X$ and floats. Otherwise, if $S = 1$, output $Y$ is connected to terminal $X$ and the inverter drives output $Y$ through the transmission gate.
This circuit is called tristate inverter, and is commonly defined as a 2-input gate, with a distinguished enable input. The symbol of the tristate inverter assumes that the enable signal is complemented internally. The function of the tristate inverter with data input $A$ and enable input $EN$ is defined as
$$Y = \begin{cases} \overline{A}, & \text{if } EN = 1, \\ Z, & \text{if } EN = 0. \end{cases}$$
To analyze the logical effort of the tristate inverter in Figure 4.39(c), we need to understand the pass characteristics of the transmission gate. To that end, we introduce a refinement of our simple transistor switch model. The key to understanding the transmission gate is the threshold voltage $V_t$, which is a characteristic process parameter of a transistor. For nMOS transistors, $V_t$ is positive, and for pMOS transistors $V_t$ has opposite polarity.
Figure 4.40: Refined switch model of MOS transistors for ON position. Due to a threshold voltage ($V_t$) drop, nMOS transistors pass a weak 1 and pMOS transistors a weak 0. However, since nMOS transistors pass a strong 0 they are suited for pull-down networks, and since pMOS transistors pass a strong 1 they are suited for pull-up networks of CMOS circuits.
An nMOS transistor is switched off if the voltage between gate and source is smaller than the threshold voltage, $V_{gs} < V_t$. This cut-off effect influences the current between source and drain, $I_{ds}$, as well as the terminal voltages in a more differentiated fashion than our simple switch model suggests. Figure 4.40 shows two cases where the source terminal of the nMOS transistor is connected to GND (left), and where the drain terminal is connected to $V_{DD}$ (right). We assume the gate voltage of the nMOS transistor equals $V_{DD}$. In our simple switch model $V_g = V_{DD}$, i.e. g = 1 in the digital abstraction, closes the nMOS transistor. Now, consider the case where the source is tied to GND, $V_s = 0$, see Figure 4.40 on the left. The voltage between gate and source is $V_{gs} = V_{DD}$, which is larger than threshold voltage $V_t$. The current $I_{ds}$ flowing from drain to source depends on drain voltage $V_d$ and on-resistance $R_n$ of the transistor, such that $I_{ds} = V_d / R_n$ according to our transistor RC model. In case where no other device drives the drain, the transistor pulls the drain voltage to ground, i.e. $V_d = 0$. This is the behavior of an nMOS transistor that we know from the pull-down network of the CMOS inverter, for example.
In Figure 4.40 on the right, both gate and drain terminals of the nMOS transistor are tied to $V_{DD}$, so that $V_g = V_d = V_{DD}$. In this circuit, current $I_{ds}$ depends on source voltage $V_s$. Since the transistor is switched on only if $V_{gs} \ge V_t$, the source voltage of the closed transistor cannot rise to $V_{DD}$ without external force. Instead, $V_{gs} \ge V_t$ implies $V_{ds} \ge V_t$ by KVL, so that $V_s \le V_{DD} - V_t$. The source-drain voltage suffers the so-called threshold drop, which effectively reduces on-current $I_{ds}$. In the digital abstraction we say that the nMOS transistor pulls the source terminal to a weak 1 at the drain, because the maximum source voltage is by $V_t$ smaller than drain voltage $V_{DD}$. In contrast, shown in Figure 4.40 on the left, the nMOS transistor pulls the drain terminal to a strong 0 at the source, because the minimum drain voltage equals source voltage 0. This behavior is the reason why we use nMOS transistors in pull-down but not in pull-up networks of CMOS circuits.
The analogous effect occurs in pMOS transistors, where all polarities are reversed. The threshold voltage of a pMOS transistor is negative, so that a pMOS transistor is closed if $V_{gs} \le V_t$ and is open if $V_{gs} > V_t$. The pMOS transistor pulls the source terminal to a weak 0 at the drain, because the minimum source voltage is by $|V_t|$ larger than GND, see Figure 4.40 on the left. However, a pMOS transistor pulls the drain terminal to a strong 1 at the source, because the maximum drain voltage is $V_{DD}$ if $V_{gs} = -V_{DD}$, which is smaller than $V_t$, see Figure 4.40 on the right. Thus, pMOS transistors are suited for pull-up but not for pull-down networks of CMOS circuits.
Figure 4.41: The pass transistor model abstracts the refined switch model in Figure 4.40.
Figure 4.41 shows the pass transistor model as a convenient abstraction of the refined switch model. We say that an nMOS transistor passes a strong 0 from input (source) to output (drain), but a weak 1. Analogously, a pMOS transistor passes a strong 1 from input (drain) to output (source), but a weak 0. Since transistors are symmetric, source and drain can be used interchangeably as inputs or outputs. According to the pass transistor model we can use either an nMOS or a pMOS transistor as a switch to pass an input signal to the output. However, neither passes both inputs 0 and 1 strongly to the output. The transmission gate uses a parallel composition of nMOS and pMOS transistors, so that one of the transistors passes the input strongly, see Figure 4.42.
Figure 4.42: The transmission gate passes both a strong 0 via the nMOS and a strong 1 via the pMOS transistor.
When a transistor passes a strong 0 or 1, it acts like a closed switch in the RC model. Current $I_{ds}$ is determined by the on-resistance. However, when passing a weak 0 or 1, current $I_{ds}$ is effectively smaller than when passing a strong 0 or 1. We model this behavior by means of an increased on-resistance. As a reasonable estimate, we assume that the on-resistance is twice as large when passing a weak versus a strong input. Then, given on-resistances $R_n$ and $R_p$ of the simple switch model, we define the strong on-resistances as $R_n$ and $R_p$, and the weak on-resistances as $2R_n$ and $2R_p$.
The on-resistances enable us to model the transmission gate as a resistive switch circuit, as shown in Figure 4.43.
Figure 4.43: Resistive switch model of transmission gate passing a 0 (left) and a 1 (right), assuming normalized widths.
Given unit-sized transistors, we have $R_p = 2R_n$ because of the mobility ratio of 2. Then the effective on-resistance for passing 0 is:
$$R_n \parallel 2R_p = \frac{R_n \cdot 4R_n}{R_n + 4R_n} = \frac{4}{5} R_n$$
and for passing 1:
$$2R_n \parallel R_p = \frac{2R_n \cdot 2R_n}{2R_n + 2R_n} = R_n$$
Since the difference is small, we approximate the on-resistance of the transmission gate with unit-sized transistors to be $R_n$ in both cases when passing 0 or 1. With this approximation, we arrive at the matched tristate inverter. The equivalent tristate inverter circuit in Figure 4.39(c) consists of a series composition of an inverter and a transmission gate. By doubling the size of both, the tristate inverter has the same pull-up and pull-down drive currents as the reference inverter. Thus, the logical effort of the tristate inverter is $g = 6/3 = 2$ for data input $A$. The logical effort of the complemented and uncomplemented enable inputs and the parasitic delay follow analogously from the transistor widths.
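The two effective on-resistances of the transmission gate are simple parallel compositions. The sketch below assumes unit-sized transistors with the pMOS on-resistance twice that of the nMOS, and the doubling of the on-resistance for a weakly passed value described above; the function names are hypothetical.

```python
def parallel(r1, r2):
    """Resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def tgate_on_resistance(rn=1.0, rp=2.0):
    """Effective on-resistance of a transmission gate when passing 0 and 1.

    Passing 0: the nMOS is strong (rn), the pMOS is weak (2 * rp).
    Passing 1: the nMOS is weak (2 * rn), the pMOS is strong (rp).
    """
    pass_0 = parallel(rn, 2.0 * rp)
    pass_1 = parallel(2.0 * rn, rp)
    return pass_0, pass_1


pass_0, pass_1 = tgate_on_resistance()
print(f"passing 0: {pass_0:.2f} Rn, passing 1: {pass_1:.2f} Rn")
```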
4.5. Encoder
In the broad sense, an encoder is a circuit that transforms its inputs into a codeword of a given code. In the narrow sense of digital circuits, an encoder commonly denotes a circuit that transforms a one-hot coded word into a binary coded word. In this section, we discuss the one-hot-to-binary encoder and another useful circuit, the priority encoder.
4.5.1. One-Hot-to-Binary Encoder
The one-hot code with $n$ bits $A_{n-1} \ldots A_0$ sets bit $A_i = 1$ in codeword $i$ and all other bits $A_j = 0$, where $j \ne i$, for $0 \le i < n$. For example, the one-hot code with $n = 4$ bits defines 4 codewords:
i  A3 A2 A1 A0
0  0  0  0  1
1  0  0  1  0
2  0  1  0  0
3  1  0  0  0
The one-hot-to-binary encoder is a circuit with $n$ inputs $A_0, \ldots, A_{n-1}$ and $m$ outputs $Y_0, \ldots, Y_{m-1}$, where $n = 2^m$, such that output $Y = i$ in binary format if input $A_i = 1$. For example, this truth table defines a 4:2 encoder for $n = 4$:
A3 A2 A1 A0  Y1 Y0
0  0  0  1   0  0
0  0  1  0   0  1
0  1  0  0   1  0
1  0  0  0   1  1
Figure 4.44 shows two alternative circuits for the 4:2 encoder.
Figure 4.44: Two implementations of a 4:2 encoder based on OR gates (left) and NOR gates (right).
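When building larger encoders, each of the $m$ outputs requires an OR or NOR gate with $n/2$ inputs. The load presented to the inputs of the encoder differs substantially, between driving just one and driving $m$ output gates. For example, a 16:4 encoder based on OR gates has the output equations:
$$Y_0 = A_1 + A_3 + A_5 + A_7 + A_9 + A_{11} + A_{13} + A_{15}$$
$$Y_1 = A_2 + A_3 + A_6 + A_7 + A_{10} + A_{11} + A_{14} + A_{15}$$
$$Y_2 = A_4 + A_5 + A_6 + A_7 + A_{12} + A_{13} + A_{14} + A_{15}$$
$$Y_3 = A_8 + A_9 + A_{10} + A_{11} + A_{12} + A_{13} + A_{14} + A_{15}$$
Input $A_1$ drives only one OR gate, that of output $Y_0$. In contrast, input $A_{15}$ is connected to all four OR gates. Therefore, the electrical effort of $A_{15}$ is 4 times larger, and signal changes on input $A_{15}$ suffer a larger delay than on input $A_1$. The method of logical effort enables us to assess whether we can speed up the slow inputs, for example by inserting buffers. However, rather than equalizing the delays of the encoder inputs, we prefer holding the driving circuit responsible for coping with the various load capacitances presented by the encoder inputs.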
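The OR-gate equations of an encoder and the resulting fan-out of each input can be generated programmatically. This is a minimal sketch with hypothetical function names; it assumes that input index i contributes to output bit j exactly when bit j of i is set, and that exactly one input is 1.

```python
def encoder_equations(m):
    """Map each output index j to the list of input indices OR-ed into Y_j."""
    n = 2 ** m
    return {j: [i for i in range(n) if (i // 2 ** j) % 2 == 1]
            for j in range(m)}


def encode(a_bits):
    """One-hot-to-binary encoder; a_bits[i] is input A_i, result is [Y_0, Y_1, ...]."""
    m = len(a_bits).bit_length() - 1
    eq = encoder_equations(m)
    return [int(any(a_bits[i] for i in eq[j])) for j in range(m)]


eq16 = encoder_equations(4)
fanout = [sum(i in inputs for inputs in eq16.values()) for i in range(16)]
print(eq16)
print("OR gates driven by each input:", fanout)
print(encode([0, 0, 1, 0]))   # A_2 = 1
```

The fan-out list shows the load imbalance directly: it equals the number of 1 bits in the input index.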
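4.5.2. Priority Encoder
An $n$-bit priority encoder is a circuit with $n$ inputs $A_0, \ldots, A_{n-1}$ and $n$ outputs $Y_0, \ldots, Y_{n-1}$, such that for $0 \le i < n$:
$$Y_i = \overline{A_0} \cdot \overline{A_1} \cdots \overline{A_{i-1}} \cdot A_i$$
Informally, a priority encoder identifies the first in the sequence of input bits with value 1. For example, here is the truth table of a 3-bit priority encoder, where X denotes an input whose value does not matter:
A0 A1 A2  Y0 Y1 Y2
0  0  0   0  0  0
1  X  X   1  0  0
0  1  X   0  1  0
0  0  1   0  0  1
The inputs are ordered according to their index. Output $Y_i = 1$ if $A_0 = 0$ and $A_1 = 0$ up to $A_{i-1} = 0$ and input $A_i = 1$. The values of the remaining inputs $A_{i+1}, \ldots, A_{n-1}$ do not matter for output $Y_i$. The circuit on the left of Figure 4.45 implements this logic for output $Y_i$ with an $(i+1)$-input AND gate, except for $Y_0$.
Figure 4.45: Two implementations of a 5-bit priority encoder with one AND gate per output (left) and a daisy chain (right).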
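The two structures of the priority encoder compute the same function, which a behavioral model can confirm exhaustively. The sketch below uses hypothetical function names and models only the logic, not the gate-level NAND and inverter implementation.

```python
from itertools import product


def priority_and(a):
    """One wide AND per output: Y_i = A_i AND NOT(A_0) AND ... AND NOT(A_{i-1})."""
    return [int(a[i] == 1 and not any(a[:i])) for i in range(len(a))]


def priority_chain(a):
    """Daisy chain: each stage passes on whether all earlier inputs were 0."""
    outputs = []
    none_before = 1
    for bit in a:
        outputs.append(none_before & bit)
        none_before &= 1 - bit
    return outputs


same = all(priority_and(list(a)) == priority_chain(list(a))
           for a in product((0, 1), repeat=5))
print("chain matches AND version:", same)
print(priority_chain([0, 1, 1, 0, 1]))
```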
The daisy chain circuit on the right of Figure 4.45 uses less area than the AND gate circuit on the left, at the expense of a larger propagation delay. As a rough estimate, let us count inverters and 2-input gates to assess the area requirements of the two circuits. Given 2-input gates only, we implement all larger AND gates of the circuit on the left as tree gates. Furthermore, assume that we make the complemented and uncomplemented inputs available with 2-forks, resulting in three inverters per input, for a total of $3n$ inverters for $n$ 2-forks. A tree-structured AND gate with $k$ inputs requires approximately $k - 1$ 2-input NAND gates and the same number of inverters. Thus, all of the AND gates of an $n$-bit priority encoder together require roughly $n^2/2$ 2-input NAND gates and inverters, respectively. Therefore, the total number of gates of the priority encoder on the left in Figure 4.45 is approximately $n^2/2$ 2-input NAND gates and $n^2/2 + 3n$ inverters. Asymptotically, for large $n$ the number of gates grows quadratically, a scary fact. But, the circuit can be quite fast, because the critical path is proportional to the height of the largest AND gate of output $Y_{n-1}$, which consists of about $\log_2 n$ 2-input AND gates only.
The daisy chain circuit on the right of Figure 4.45 uses only a linear number of gates. Each inner stage of the chain consists of two NAND gates and three inverters. The first stage needs only one inverter and the last stage one NAND gate and one inverter. Therefore, the total number of gates of an $n$-bit priority encoder with daisy chain structure is $2(n-2) + 1 = 2n - 3$ 2-input NAND gates and $3(n-2) + 2 = 3n - 4$ inverters. For large $n$ the daisy chain occupies significantly less area than the AND gate design. The price we pay for the area efficiency is the propagation delay of the critical path, which stretches from input $A_0$ to output $Y_{n-1}$. The number of NAND gates and inverters on this path grows linearly with $n$. A path with that many stages requires a path effort that grows exponentially in the number of stages to be optimally sized. For large $n$ this path effort would be humongous. In all likelihood, practical designs will have to settle on a suboptimal delay that is dominated by the gates on the critical path.
If we wish to design a wide priority encoder with a large number of inputs and outputs, then the AND gate design is not an option because of its quadratic area requirements, and the daisy chain would be slow. Thus, we may want to reduce the number of gates on the chain. We can halve the number of gates by using the inverting chain in Figure 4.46. The NAND gate in an odd stage of the chain computes the complement of the conjunction of the complemented input signals, and the NOR gate in an even stage of the chain computes the uncomplemented conjunction of the complemented input signals.
Figure 4.46: A 7-bit priority encoder with inverting daisy chain.
Example 4.2: Priority Encoder
We wish to size the gates of the 7-bit priority encoder in Figure 4.46 in order to minimize its delay. We proceed in two steps. First, we obtain a rough estimate for the delay of the chain assuming fixed gate sizes, and, second, we size the gates on the daisy chain according to the method of logical effort.
To get a feel for the delay of the daisy chain, assume the load capacitance of each output is 36. Furthermore, all gates shall be scaled by factor 2. Thus, each inverter has input capacitance 6, each NAND gate input presents capacitance 8, and each NOR gate input 10. Our path of interest is the critical path from input $A_0$ to output $Y_6$. The logical effort of the path with 1 inverter, 3 NAND gates, and 3 NOR gates is $G = 1 \cdot (4/3)^3 \cdot (5/3)^3 \approx 10.97$. The branching effort of all stages except the last is 2, for a path branching effort of $B = 2^6 = 64$. The electrical effort is $H = 36/6 = 6$. Thus, the path effort of the critical path is $F = GBH \approx 4214$. According to the path sizing calculator, the best number of stages of an inverter chain for this path effort is not far from the number of stages on the critical path of our daisy chain, $n = 7$, which is therefore by no means unreasonable. The delay computation of our priority encoder design is detailed in Table 4.1.
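stage  gate  C_in  C_load  h    g    f = g h  p
1      NOT   6     16      8/3  1    2.67     1
2      NAND  8     20      5/2  4/3  3.33     2
3      NOR   10    16      8/5  5/3  2.67     2
4      NAND  8     20      5/2  4/3  3.33     2
5      NOR   10    16      8/5  5/3  2.67     2
6      NAND  8     10      5/4  4/3  1.67     2
7      NOR   10    36      3.6  5/3  6        2
total                                22.33    13
Table 4.1: Delay estimates for priority encoder.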
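The totals of Table 4.1 can be recomputed from the stage capacitances. The sketch below is an illustration under stated assumptions: input capacitances of 6, 8 and 10 for the inverter, NAND and NOR gates (gates scaled by factor 2), the load capacitances 16, 20, 16, 20, 16, 10 and 36 per stage, and parasitic delays of 1 for the inverter and 2 for the 2-input gates.

```python
from fractions import Fraction as F

LOGICAL_EFFORT = {"NOT": F(1), "NAND": F(4, 3), "NOR": F(5, 3)}
PARASITIC = {"NOT": 1, "NAND": 2, "NOR": 2}

# (gate, input capacitance, load capacitance) from input A0 to output Y6
stages = [("NOT", 6, 16), ("NAND", 8, 20), ("NOR", 10, 16), ("NAND", 8, 20),
          ("NOR", 10, 16), ("NAND", 8, 10), ("NOR", 10, 36)]

rows = []
for gate, c_in, c_load in stages:
    h = F(c_load, c_in)              # electrical effort of the stage
    f = LOGICAL_EFFORT[gate] * h     # effort delay of the stage
    rows.append((gate, h, f, PARASITIC[gate]))

total_effort = sum(f for _, _, f, _ in rows)
total_parasitic = sum(p for _, _, _, p in rows)
for gate, h, f, p in rows:
    print(f"{gate:5s} h={str(h):4s} f={float(f):5.2f} p={p}")
print(f"total effort delay {float(total_effort):.2f}, parasitic {total_parasitic}")
```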
Table 4.1 suggests obvious improvements to our circuit. To obtain minimum delay, all stages should bear the same effort delay. Stages 6 and 7 present outliers with $f_6 = 1.67$ and $f_7 = 6$, deviating the most from the average of $22.33/7 \approx 3.19$. We can fix the imbalances between the stage efforts by gate sizing. To minimize the delay of the chain, we should spread path effort $F$ evenly across the 7 stages of the chain, so that each stage bears effort delay $F^{1/7}$. The resulting improvement compared to our initial design is small. However, we may even do better if we also resize the output gates.
The lesson we have learned from branching circuits is that the delay on the path of interest can be reduced either by decreasing the off-path capacitance or by increasing the on-path capacitance or both. The method of logical effort also teaches us that each stage of a fast design should bear the best stage effort. If the NOR gates that drive the off-path outputs bear the best stage effort $f$, they should have input capacitance $C_{in} = g C_L / f$ with $g = 5/3$ and $C_L = 36$. This is almost twice our initial assumption, and increases the branching effort compared to our initial design. We lower the branching effort of the path of interest by reducing the sizes of the off-path NOR gates from the original input capacitance of 10 slightly to 8. The off-path AND gates, i.e. the NAND-INV pairs, have two stages each. Thus, if both NAND gate and inverter bear the best stage effort, the NAND gates can have a small input capacitance. If we use minimum-sized NAND gates with input capacitance 4, we can reduce the branching effort of the path. As a result, we obtain the critical path of the priority encoder from input $A_0$ (input capacitance 6) to output $Y_6$ (load capacitance 36), where the input capacitances $C_i$ of the NAND and NOR gates are yet to be determined.
In general, to obtain the minimum delay for such a problem, we need to iterate the gate sizing. For the first iteration, we proceed as follows. We assume a fixed branching effort in each stage. Then, we know path effort $F$ already, and the stage effort for minimum delay should be $\hat{f} = F^{1/7} \approx 2.98$, which is reasonably close to the best stage effort because the number of stages is close to the optimum. Working from the output towards the input, we can compute the input capacitance of each gate based on the relation $C_{in} = g \, C_{out} / \hat{f}$. The resulting input capacitances $C_{in}$ and branching efforts $b$ are:

Stage  1     2     3     4     5     6     7
C_in   3.45  6.29  6.06  6.83  7.27  9.0   20.1
b      1.64  2.32  1.58  2.1   1.44  1

Given the gate input capacitances, we can compute the branching efforts, and find that our assumption about the branching effort is almost fulfilled, but only on average. The branching effort of our sized path is $1.64 \cdot 2.32 \cdot 1.58 \cdot 2.1 \cdot 1.44 \cdot 1 \approx 18.2$ rather than 64. Thus, the path effort is smaller than the assumed value, and the stage effort for minimum path delay should be smaller as well. Nevertheless, our design is pretty close to the minimum delay without further iterations already. Table 4.2 summarizes the resulting delays.
Stage  Gate  C_in   C_out  h     g     f = gh  p
1      INV   6      10.29  1.71  1     1.71    1
2      NAND  6.29   14.06  2.23  4/3   2.98    2
3      NOR   6.06   10.83  1.79  5/3   2.98    2
4      NAND  6.83   15.28  2.23  4/3   2.98    2
5      NOR   7.27   13.01  1.79  5/3   2.98    2
6      NAND  9.01   20.13  2.23  4/3   2.98    2
7      NOR   20.13  36     1.79  5/3   2.98    2
total                                  19.6    13
Table 4.2: Delay estimates for optimized priority encoder.
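The backward sizing pass, working from the output towards the input, can be sketched as follows. The function name `size_backward` is hypothetical. The off-path capacitances (4 for a minimum-sized NAND gate, 8 for a reduced NOR gate) and the stage effort 2.98 are assumptions read off Table 4.2, not values stated explicitly in the text.

```python
def size_backward(g, off_path, c_load, f):
    """Compute gate input capacitances from output to input.

    g[i]        logical effort of stage i
    off_path[i] off-path capacitance at the output of stage i
    c_load      load capacitance at the path output
    f           target stage effort
    Uses C_in = g * C_out / f with C_out = next on-path capacitance + off-path capacitance.
    """
    n = len(g)
    c_in = [0.0] * n
    on_path = c_load
    for i in range(n - 1, -1, -1):
        c_out = on_path + off_path[i]
        c_in[i] = g[i] * c_out / f
        on_path = c_in[i]
    return c_in

G = [1, 4 / 3, 5 / 3, 4 / 3, 5 / 3, 4 / 3, 5 / 3]   # INV, NAND, NOR, NAND, NOR, NAND, NOR
OFF = [4, 8, 4, 8, 4, 0, 0]                          # assumed off-path loads per stage
c_in = size_backward(G, OFF, c_load=36, f=2.98)

# Branching effort of stage i: (on-path + off-path) / on-path capacitance.
branching = [(c_in[i + 1] + OFF[i]) / c_in[i + 1] for i in range(len(G) - 1)]
print([round(c, 2) for c in c_in])
print([round(b, 2) for b in branching])
```

The computed input capacitance of the first stage differs from the fixed inverter input capacitance of 6, which is why stage 1 ends up with a smaller stage effort than the other stages.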
We find that gate sizing reduces the delay of the original priority encoder from $22.33 + 13 = 35.33$ to $19.6 + 13 = 32.6$ time units. This result confirms that our initial attempt was pretty fast already.
4.6. Decoder
A decoder computes the inverse transform of an encoder. In the realm of digital circuits, the term decoder commonly refers to a circuit that transforms a binary code into a one-hot code.
A binary to one-hot decoder is a circuit with $n$ inputs $A_i$, where $0 \le i < n$, and $2^n$ outputs $Y_j$, where $0 \le j < 2^n$, that asserts output $Y_j$ if the binary coded value of the input equals $j$. For example, the truth table below specifies a 2:4 decoder with $n = 2$ inputs and $2^n = 4$ outputs, such that output $Y_j = 1$ if input $A = j$ for $0 \le j < 4$.

A1  A0  |  Y3  Y2  Y1  Y0
0   0   |  0   0   0   1
0   1   |  0   0   1   0
1   0   |  0   1   0   0
1   1   |  1   0   0   0

Figure 4.47 shows the implementation of the 2:4 decoder. Since each output represents one minterm, we use one 2-input AND gate per output. In general, for an $n{:}2^n$ decoder, the circuit requires $2^n$ AND gates with $n$ inputs each. Each complemented and uncomplemented input drives $2^{n-1}$ AND gates.
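The decoder function can be expressed compactly in Python. The sketch below gives a behavioral version and a gate-level version that forms one minterm per output; the function names `decode` and `decode_gates` are hypothetical.

```python
def decode(value, n):
    """Behavioral model: return [Y0, ..., Y(2^n - 1)] with exactly one output asserted."""
    if not 0 <= value < 2 ** n:
        raise ValueError("value does not fit into n bits")
    return [1 if j == value else 0 for j in range(2 ** n)]

def decode_gates(bits):
    """Gate-level model: bits = [A0, A1, ...]; output Yj is the AND of n literals.

    For each output j, input Ai is used uncomplemented if bit i of j is 1,
    and complemented otherwise.
    """
    n = len(bits)
    outputs = []
    for j in range(2 ** n):
        term = 1
        for i, a in enumerate(bits):
            literal = a if (j >> i) & 1 else 1 - a
            term &= literal
        outputs.append(term)
    return outputs

# 2:4 decoder with A1 = 1, A0 = 0 (binary value 2) asserts Y2.
print(decode(2, 2))
print(decode_gates([0, 1]))
```

In the gate-level model each literal is used by half of the $2^n$ minterms, which matches the fan-out of $2^{n-1}$ AND gates per complemented or uncomplemented input.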
The primary use of decoders is as address decoders for memories. For example, a register file with 8 registers employs a 3:8 decoder to select 1 out of 8 registers. The address input is 3 bits wide, with a binary code to cover address range 0 to 7. The 3:8 decoder transforms the binary address into a one-hot code that asserts exactly 1 of its 8 outputs associated with the selected register. A gate-level implementation of a 3:8 decoder is shown in Figure 4.48 on the left. We use 2-forks to provide each input in complemented and uncomplemented polarity. Since each input drives 4 NAND gates, the loads of the legs are equal. Assuming also that the loads of the outputs $Y_j$ are equal, the decoder circuit is symmetric. Thus, we may minimize the delay of the decoder by analyzing the path of interest, actually the fork of interest, shown on the right-hand side of Figure 4.48. Each input has capacitance $C_{in}$. The load capacitance at each output shall be $H C_{in}$. Each leg of the fork drives 4 NAND gates, one of which is on the path of interest, whereas the others branch off the path of interest.
Figure 4.47: Implementation of a 2:4 decoder (inputs A1, A0; outputs Y3 to Y0).
Figure 4.48: Gate-level implementation of 3:8 decoder (left) and path of interest for delay analysis (right).
We use the 1d analysis method to minimize the delay of the circuit. The delays allocated to the individual gates are included in Figure 4.48. For the upper leg of the fork, we multiply the effort delays of its gates to obtain one relation between the gate capacitances and $d$. For the lower leg, the partial products of the effort delays yield a second relation. Given electrical effort $H$, we obtain a polynomial in $d$, and we can determine effort delay $d$ as the larger of the two positive real roots of the polynomial. The minimum delay of the 3:8 decoder then follows from $d$. Figure 4.49 plots the minimum delay $D$ as a function of electrical effort $H$.
Figure 4.49: Minimum delay $D$ of 3:8 decoder as a function of electrical effort $H$.
Our 3:8 decoder is essentially a 3-stage design, which is best suited for moderate path efforts. For larger decoders or circuits with larger electrical effort $H$, we may increase the number of stages either by appending inverters to the outputs or by prepending inverters to the inputs. Furthermore, for large $n$, we may implement the $n$-input NAND gates using a tree structure.
4.7. Memory Elements
All of the logical circuits we have discussed so far are acyclic, that is they do not form cycles or loops except implicitly through the power supply. Acyclic circuits have inputs and outputs, and implement logic functions of the inputs. When the inputs change, the outputs assume the values defined by these logic functions after a circuit-specific propagation delay. Cyclic circuits exhibit a more complex, time-dependent behavior than their acyclic cousins, because their outputs depend on the input sequence including current and past input values. Cyclic circuits are of particular interest for the implementation of memory elements. In this section, we discuss two of the most commonly used types of memory elements, the D-latch and the D-flipflop.
4.7.1. Cyclic Circuits
A cyclic circuit has one or more loops, not counting implicit power supply loops. For example, Figure 4.50 shows two cyclic circuits with one loop each.
Figure 4.50: Cyclic circuits with one loop of two (left) and three (right) inverters.
Cyclic circuits are more difficult to analyze than acyclic circuits. In particular, the circuits shown in Figure 4.50 do not even have distinguished input and output terminals. The inverter pair on the left of Figure 4.50 has two wires: $Q_0$ connects the output of inverter 0 to the input of inverter 1, and $Q_1$ connects the output of inverter 1 to the input of inverter 0. Analogously, the three-inverter cycle on the right has three wires, $Q_0$, $Q_1$, and $Q_2$.
Let us analyze the functionality of the two-inverter loop. The output of each inverter can be 0 or 1. First, assume that $Q_0 = 0$. Thus, the input of inverter 1 is 0, hence $Q_1 = 1$. Since $Q_1$ is the input of inverter 0, the output of inverter 0 must be $Q_0 = 0$. This is the same value that we started with. We conclude that the circuit reinforces $Q_0 = 0$ and, by symmetry, $Q_1 = 1$. Second, assume that $Q_0 = 1$. Then the output of inverter 1 is $Q_1 = 0$, which causes the output of inverter 0 to be $Q_0 = 1$. This is also the same value we started out with. We observe that the two-inverter loop reinforces its state in either case, $Q_0 = 0$ or $Q_0 = 1$.
The three-inverter loop behaves very differently than the two-inverter loop. Assume that $Q_0 = 0$. Since $Q_0$ is the input of inverter 1, the output of inverter 1 must be $Q_1 = 1$. Since $Q_1$ is the input of inverter 2, its output must be $Q_2 = 0$. Now, $Q_2$ is the input of inverter 0. Therefore, the output of inverter 0 must be $Q_0 = 1$, which is the complement of the assumption we started with. If we traverse the loop for a second time, we find that $Q_0 = 0$, that is the wire is complemented again. We observe that the circuit does not reinforce its state. Instead, the values of the wires toggle at a speed determined by the propagation delay of the inverters.
We conclude that the two-inverter loop is stable whereas the three-inverter loop is not. In fact, these behaviors can be observed in larger loops as well. Every inverter loop with an even number of inverters is stable. In contrast, inverter loops with odd numbers of inverters are unstable. Odd-numbered inverter loops are called ring oscillators, and are useful to measure the average propagation delay of an inverter in a given manufacturing process.
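The stability argument can be checked by brute force. The following sketch enumerates all wire assignments of an inverter loop and keeps those in which every inverter output is the complement of its input. The unit-delay `step` function is a simplifying assumption for illustration, not a circuit-level simulation, and all function names are hypothetical.

```python
from itertools import product

def is_stable(state):
    """state[i] is wire Q_i, the output of inverter i, whose input is wire Q_(i-1)."""
    return all(state[i] == 1 - state[i - 1] for i in range(len(state)))

def stable_states(n):
    """All stable wire assignments of a loop with n inverters."""
    return [s for s in product((0, 1), repeat=n) if is_stable(s)]

def step(state):
    """Advance every inverter by one gate delay (assumed unit-delay model)."""
    return tuple(1 - state[i - 1] for i in range(len(state)))

def period(state):
    """Number of unit delays until the loop returns to the given state."""
    count, current = 1, step(state)
    while current != state:
        current = step(current)
        count += 1
    return count

print(stable_states(2))   # two-inverter loop: bistable
print(stable_states(3))   # three-inverter loop: no stable state
print(period((0, 1, 0)))  # ring oscillator period in gate delays
```

Under the unit-delay assumption, the three-inverter loop started in state (0, 1, 0) cycles with a period of six gate delays, which illustrates why ring oscillators are used to measure the average inverter delay.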
Ring oscillators are not used as logic elements in digital circuits. In contrast, loops with even numbers of inverters are useful building blocks, because they are bistable. A bistable circuit is a circuit with two stable states. A state is stable if the circuit does not transition into another state without external stimulus. The two stable states of the bistable two-inverter loop are $Q_0 = 0$ and $Q_0 = 1$. When the circuit assumes a stable state, the inverters enforce that $Q_0$ and $Q_1$ are complements of each other, $Q_1 = \overline{Q_0}$. Thus, in Figure 4.51, we call the wires simply $Q$ and $\overline{Q}$.
Figure 4.51: The two states of a bistable inverter loop.
The problem with the bistable inverter loop in Figure 4.51 is that we cannot control its state. Since the circuit is stable and has no inputs, it is not obvious how to transition the circuit from one stable state into the other. The D-latch, which we discuss below, fixes this deficiency.
4.7.2. D-Latch
A D-latch is a bistable memory element with data input $D$, a clock input, and output $Q$. The D-latch symbol is shown on the right.
Figure 4.52 shows a D-latch implementation with a bistable inverter loop and a 2:1 multiplexer to steer the data input signal into the loop.
Figure 4.52: A D-latch implemented as a bistable inverter loop with an input multiplexer.
The clock input of the D-latch serves as select input of the multiplexer, and determines whether data input $D$ asserts the state of the bistable inverter loop. Hence, we distinguish two modes of operation:
Clock is 1: D-latch is transparent. The inverter loop is open, and input $D$ drives the inverter pair. Since output $Q$ follows input $D$, we say the D-latch is transparent.
Clock is 0: D-latch is opaque. The inverter loop is closed, and stores the current state. Since output $Q$ retains its value, independent of input $D$, we say the D-latch is opaque.
Figure 4.53 shows the D-latch circuit with the multiplexer replaced by its switch model. The mode of operation depends on the position of the multiplexer switch. If the clock is 1, the loop is open, and input $D$ drives both output $Q$ and the inverter pair. Otherwise, if the clock is 0, the closed loop is disconnected from input $D$ and retains value $Q$ because the loop is a bistable cyclic circuit.
Figure 4.53: Modes of operation of the D-latch. The D-latch is either (left) transparent: output $Q$ follows input $D$, or (right) opaque: the inverter loop stores value $Q$.
The waveform diagram in Figure 4.54 illustrates the operation of a D-latch over time. The diagram shows the voltage levels of the clock and data inputs and the D-latch output. Initially, the D-latch is opaque and stores its output value $Q$. Input $D$ transitions to 1 before the clock signal. When the D-latch becomes transparent, the output follows input $D$ after a propagation delay, indicated by the curved arrow from the clock transition to $Q$. The D-latch remains transparent as long as the clock signal is 1. Output $Q$ follows input $D$ after a propagation delay, indicated by the curved arrows from the transitions of $D$ to $Q$. When the clock transitions to 0, the D-latch stores the last value of input $D$ before the clock transition, and holds this value for as long as the D-latch remains opaque. The D-latch is called level-sensitive because its two modes of operation depend on the level of the clock signal.
Figure 4.54: Waveform diagram of D-latch. When the D-latch is transparent, output $Q$ follows input $D$. Otherwise, during the gray shaded time intervals, the D-latch is opaque, i.e. holds output $Q$ unchanged and blocks $D$ from propagating to the output.
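The level-sensitive behavior of the D-latch can be captured in a small behavioral model. This is a sketch that ignores all propagation delays. The class name `DLatch` and its `update` method are hypothetical names chosen for illustration.

```python
class DLatch:
    """Behavioral D-latch: transparent while the clock is 1, opaque while it is 0."""

    def __init__(self, q=0):
        self.q = q  # stored state of the bistable loop

    def update(self, clk, d):
        """Apply clock level clk and data input d, and return output Q."""
        if clk == 1:
            self.q = d   # transparent: output follows input D
        return self.q    # opaque: output retains the stored value

latch = DLatch()
# Sequence of (clock, D) pairs; D changes while the latch is opaque are blocked.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 1), (0, 0), (1, 1)]
outputs = [latch.update(clk, d) for clk, d in stimulus]
print(outputs)
```

The model makes the two modes explicit: only the clock level decides whether a change of D reaches the output.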
In the following, we derive a CMOS circuit for the D-latch in Figure 4.52. We begin with the implementation of the 2:1 multiplexer with two select arms, integrating the feedback arm into the inverter loop as shown in Figure 4.55. Since this multiplexer circuit inverts the output, we add an inverter to generate uncomplemented output $Q$. Figure 4.55 shows the multiplexer select arms as tristate inverters in form of an inverter that drives a transmission gate.
Figure 4.55: D-latch implementation with multiplexer select arms shown as tristate inverters.
The functionality of the D-latch circuit depends essentially on the state of internal node $X$. When the data arm is closed, it steers input $D$ to $X$. Since the inverting mux arm changes the polarity, we have $X = \overline{D}$. The output inverter drives $\overline{X}$ to output $Q$, so that $Q = D$ when the clock input is 1 and the D-latch is transparent. Otherwise, when the clock input is 0, the data arm is open and the feedback arm is closed. Then node $X$ is disconnected from input $D$, and the inverter loop reinforces $X$. The D-latch is opaque and output $Q$ retains its value.
A compact CMOS implementation of the mux arms is shown in Figure 4.56. We exploit the circuit equivalence in Figure 4.39 to implement each mux arm with four transistors.
Figure 4.56: D-latch implementation with exploded multiplexer select arms.
Figure 4.57 shows the complete 12-transistor CMOS circuit for the D-latch, including an interactive switch model. Note that the feedback inverter and the output inverter are both driven by internal node $X$, so that output $Y$ of the feedback inverter carries the same logic value as output $Q$.
Figure 4.57: CMOS circuit for D-latch and interactive switch model.
The timing behavior of the D-latch depends on the mode of operation. When the D-latch is transparent, the critical path stretches from data input $D$ to output $Q$. The signal propagates through internal node $X$ and bypasses the feedback loop entirely. The delay is independent of the clock signal. When the D-latch is opaque, the output remains unchanged. Hence, it does not make sense to speak of a propagation delay. Therefore, we discuss the propagation delay of the transparent D-latch. In particular, we analyze the critical path of the D-latch with the matched transistor sizes shown in Figure 4.58.
Figure 4.58: D-latch circuit with matched transistor sizes.
The propagation delay of the transparent D-latch is the delay of the 2:1 multiplexer plus the delay of the output inverter, $d_{D \to Q} = d_{mux} + d_{inv}$. Assuming that output $Q$ drives capacitive load $C_L$, the output inverter has electrical effort $h_{inv} = C_L / C_{inv}$, where $C_{inv}$ is its input capacitance, and parasitic delay $p_{inv}$, such that $d_{inv} = h_{inv} + p_{inv}$ time units. Next, we analyze the multiplexer. The logical effort is the input capacitance of input $D$ divided by the input capacitance of a reference inverter. Input $D$ drives the pMOS transistor of width 4 and the nMOS transistor of width 2 of the data arm. Thus, the input capacitance of input $D$ is $4 + 2 = 6$, and, with a reference inverter input capacitance of 3, the logical effort of input $D$ of the mux is $g_{mux} = 6/3 = 2$. To determine the electrical effort of the mux, notice that output $X$ of the mux drives two inverters, the feedback inverter and the output inverter. Thus, the load capacitance of the mux is the sum of the input capacitances of these two inverters. Therefore, the electrical effort $h_{mux}$ of input $D$ of the mux is this load capacitance divided by 6. With the parasitic delay $p_{mux}$ of the 2:1 mux (see the multiplexer section), we find that the delay of the multiplexer is $d_{mux} = g_{mux} h_{mux} + p_{mux}$, and the propagation delay of the D-latch amounts to $d_{D \to Q} = g_{mux} h_{mux} + p_{mux} + h_{inv} + p_{inv}$.
This delay may serve as a point of reference for circuit optimizations. For example, we observe that the feedback inverter diverts current from the multiplexer output. If we shrink the pMOS transistor of the feedback inverter from 2 to 1 units of normalized width, we reduce the capacitance of the off-path branch and the delay of the path of interest.
Propagation delay $d_{D \to Q}$ is only one of several characteristic quantities of the timing behavior of the D-latch. Arguably even more important is the timing behavior of the D-latch when the clock input changes. More precisely, if both the clock and data inputs change at about the same time, the D-latch may become unstable. Digital circuits with D-latches should avoid this scenario by all means because it can impact the functionality adversely. Functional bugs caused by careless timing behavior are particularly difficult to uncover. We discuss the timing behavior of the D-latch in Figure 4.59, where data input $D$ and the clock input transition at different times.
Figure 4.59: D-latch circuit and timing analysis. The clock input transitions from 1 to 0. The D-latch is transparent before this negative clock edge and opaque after it. The opaque D-latch stores the input value.
Given the D-latch transistor sizes in Figure 4.58 and the load capacitance, each circuit element of the D-latch contributes a delay: the delay of the data arm, the delay of the feedback inverter, the delay of the feedback arm, and the delay of the output inverter.
With these element delays, we can express the propagation delay of the transparent D-latch as the sum of the data arm delay and the output inverter delay. The waveform diagram shows the corresponding transitions. Initially, the D-latch is transparent because the clock is 1, and the data arm is closed whereas the feedback arm is open. Then input $D$ changes from 0 to 1. Output $Q$ will follow $D$ after the input has propagated through the data arm to internal node $X$ and then through the output inverter. The change of internal node $X$ also affects node $Y$, which changes from 0 to 1 after the propagation delay of the feedback inverter.
When the D-latch becomes opaque at the negative clock edge, it stores the value of $D$. The clock signal closes, or turns on, the feedback arm and opens, or turns off, the data arm. Thus, after the clock transition, signal $Y$ takes the propagation delay of the feedback arm through the inverting arm to reinforce internal node $X$. It is this switching delay of the multiplexer that can cause trouble. In particular, the transition of input $D$ must occur a sufficiently long time period before the negative transition of the clock to stabilize internal node $X$ through the feedback inverter, because the feedback loop is bistable only if nodes $X$ and $Y$ are complements of each other. If we force both nodes to the same value, the feedback loop will assume an unpredictable state. This can happen if the time interval between the transitions of input $D$ and the clock is too small. Figure 4.60 illustrates the timing problems of a D-latch.
Figure 4.60: D-latch timing problems: (left) in the corner case, the D-latch stores the input just in time, (middle) a smaller interval causes output $Q$ to follow $D$ after a glitch that increases the propagation delay, (right) an even smaller interval misses the input transition of $D$ because $D$ changes too close to the negative edge of the clock.
In the waveform diagram on the left of Figure 4.60, the interval between the transition of input $D$ and the negative clock edge is just large enough. The negative clock edge begins to close the feedback arm and to open the data arm. Internal node $X$ follows $D$ just in time, so that the feedback arm can reinforce internal node $X$ after the multiplexer delay. This time interval is the smallest interval for the D-latch to capture input $D$ safely.
The waveform diagram in the middle of Figure 4.60 assumes that the interval is smaller. After the negative clock edge, the feedback loop enters an unstable state. After the multiplexer delay, the inverting feedback arm drives the old value onto inner node $X$. However, since the feedback inverter produces its new output value just 1 time unit after the negative clock edge, the feedback arm pulls inner node $X$ back to 0. The resulting glitch on node $X$ propagates through the feedback and output inverters. Since inverters attenuate such glitches, the feedback loop stabilizes and recovers. However, the glitch appears at output $Q$ after a propagation delay which is larger than the propagation delay of the transparent D-latch.
The waveform diagram on the right in Figure 4.60 assumes an even smaller interval. The negative clock edge opens the data arm too early to pull inner node $X$ to 0. The data arm fights the closing feedback arm, which succeeds in pulling $X$ to 1. As a consequence, the D-latch fails to capture input $D$ and continues to store the old value. The output inverter propagates the glitch of node $X$ to output $Q$ before restoring the old output value.
Figure 4.61 summarizes the timing behavior of the D-latch in a graph that plots the interval between the transition of input $D$ and the negative clock edge on the horizontal axis and the propagation delay from $D$ to $Q$ on the vertical axis. Case (1) corresponds to the scenario in Figure 4.59, where the transition of input $D$ occurs sufficiently early before the negative clock edge for the D-latch to propagate the change safely to output $Q$. The propagation delay is the sum of the delays of the data arm and the output inverter. Scenarios (2), (3), and (4) correspond to the three cases illustrated in Figure 4.60. The closer the transition of input $D$ gets to the negative clock edge, the larger the propagation delay becomes. When the interval becomes too small, the D-latch fails to capture the new input signal altogether, and retains the old input value. We observe that for a safe operation of the D-latch, we need to guarantee that input $D$ changes sufficiently early before the negative clock edge, i.e. the interval must be large enough.
The setup time $t_{setup}$ is the minimum time input $D$ must be stable before the negative clock edge to capture the input value within a reasonable delay $d_{max}$. For example, reasonable specifications define $d_{max}$ to be a specified margin larger than the propagation delay of the transparent D-latch.
Figure 4.61: D-latch timing: the smaller the interval between the input transition of $D$ and the negative clock edge becomes, the larger the propagation delay from input $D$ to output $Q$ becomes. When the interval becomes too small, the D-latch fails to capture the input transition entirely, see case (4).
Figure 4.62: D-latch setup time and hold time characterize the time interval around the negative clock edge where input $D$ may change safely.
The timing problems of the D-latch occur even when the transition of input $D$ occurs after the negative clock edge. If input $D$ transitions before the data arm is completely open, glitches can propagate to output $Q$. Figure 4.62 illustrates the delay as a function of the interval between the negative clock edge and the transition of input $D$. Analogous to the setup time, for a glitch-free operation of the D-latch, we must ensure that input $D$ does not transition until a time period after the negative clock edge has passed.
The hold time $t_{hold}$ is the minimum time input $D$ must be stable after the negative clock edge to capture the input value within a reasonable delay.
Setup time and hold time characterize the timing behavior of the D-latch around the negative clock edge. Most manufacturers offer D-latches as basic circuit elements, and supply their process-specific setup and hold times in their datasheets. For the circuit designer, it is important to ensure that the data input does not change within the interval around the negative clock edge of the D-latch. This is the reason why the clock input of the D-latch is commonly connected to the regular beat of a clock signal with a well-defined clock period. Such a clock signal restricts the design choices but gives the designer a clean time reference for the permitted delay range of the circuit that drives the data input of a D-latch.
4.7.3. D-Flipflop
A D-flipflop is a bistable memory element with data input $D$, a clock input, and output $Q$. A D-flipflop is an edge-triggered memory element that is activated by a clock edge. At the triggering clock edge, the D-flipflop stores input $D$ until the next triggering clock edge occurs. This behavior is different from a D-latch, which is level-sensitive. Whereas a D-latch is opaque when the clock input level is low, the D-flipflop is always opaque except for the short time period around the triggering clock edge. Nevertheless, the D-flipflop can be built by a series composition of two D-latches with complemented clock inputs, as shown in Figure 4.63. The D-flipflop symbol on the right has a triangle at the clock input to indicate that the flipflop is edge-triggered.
Figure 4.63: A positive edge-triggered D-flipflop implemented with two back-to-back D-latches and complemented clocks.
The first D-latch with input $D$ and output $Q_1$ is called the master latch, and is negative-sensitive, because it is transparent when the clock input is 0. The second D-latch with input $Q_1$ and output $Q$ is the slave latch, and is positive-sensitive, because it is transparent when the clock input is 1. The D-flipflop is positive edge-triggered because it stores input $D$ at the rising edge of the clock. If we invert the clock input, the flipflop would be negative edge-triggered, and store input $D$ at the falling edge of the clock. Figure 4.64 shows a CMOS circuit for the positive edge-triggered D-flipflop. The circuit saves four transistors by removing the output inverter of the master latch and the input inverter of the slave latch.
Figure 4.64: A 20-transistor CMOS circuit for the positive edge-triggered D-flipflop saves the output inverter of the master latch and the input inverter of the slave latch.
To understand the functionality of the D-flipflop, consider the switch model and the associated waveform diagrams in Figure 4.65 and Figure 4.66. Figure 4.65 illustrates the operation during the negative half-cycle of the clock. The master latch is transparent, and input $D$ can propagate to inner node $Q_1$. Since the slave latch is opaque, input $D$ cannot propagate beyond $Q_1$ to output $Q$. Instead, the slave latch reinforces output value $Q$.
Figure 4.65: Switch model of D-flipflop and operation during negative half-cycle of the clock. The master latch is transparent and the slave latch is opaque.
During the positive half-cycle of the clock, illustrated in Figure 4.66, the master latch is opaque, and disconnects input $D$ from inner node $Q_1$. Thus, the master latch stores the last value of input $D$ before the positive clock edge. The slave latch is transparent, and propagates the value of $Q_1$ to output $Q$, which is the last value of input $D$ before the positive clock edge.
Figure 4.66: Switch model of D-flipflop and operation during positive half-cycle of the clock. The master latch is opaque and the slave latch is transparent.
Note that output $Q$ remains unchanged after the negative clock edge, because the opaque slave latch stores during the negative half-cycle the value that the master latch stored during the preceding positive half-cycle of the clock.
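The series composition of two latches with complemented clocks can be sketched behaviorally as follows. The classes `DLatch` and `DFlipFlop` are hypothetical illustration names, and the model ignores propagation delays as well as setup and hold times.

```python
class DLatch:
    """Behavioral D-latch: transparent while its enable input is 1."""

    def __init__(self, q=0):
        self.q = q

    def update(self, enable, d):
        if enable == 1:
            self.q = d
        return self.q

class DFlipFlop:
    """Positive edge-triggered D-flipflop built from a master and a slave latch."""

    def __init__(self):
        self.master = DLatch()  # transparent while the clock is 0
        self.slave = DLatch()   # transparent while the clock is 1

    def update(self, clk, d):
        q1 = self.master.update(1 - clk, d)  # complemented clock for the master
        return self.slave.update(clk, q1)    # Q follows Q1 while the clock is 1

ff = DFlipFlop()
# (clock, D) pairs: Q changes only when the clock rises from 0 to 1.
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
outputs = [ff.update(clk, d) for clk, d in stimulus]
print(outputs)
```

In the third step, D changes while the clock is 1, but the opaque master latch blocks the change, so Q keeps the value captured at the rising edge.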
The triggering clock edge is the timing-critical transition of the D-flipflop. In case of the positive edge-triggered D-flipflop, the rising clock edge turns into the falling clock edge of the master latch. This is the critical clock edge, where the setup time and hold time of the master D-latch must be observed. Thus, the D-flipflop is subject to the same timing constraints as a D-latch. Input signal $D$ must be stable during the setup time before the positive clock edge and during the hold time after the positive clock edge.
The D-flipflop is also called register, and is one of the most widely used memory elements in digital circuit design. A single D-flipflop implements a 1-bit register. An $n$-bit register consists of $n$ D-flipflops triggered by the same clock signal. Each of the D-flipflops has an independent D input and Q output. The register symbol is shown on the right.
A D-latch is transparent when the clock is 1 and opaque when the clock is 0. A positive edge-triggered D-flipflop stores input $D$ at the rising edge of the clock signal. Complete the waveform diagram below with D-latch output $Q_{latch}$ and D-flipflop output $Q_{ff}$, assuming both memory elements are connected to the same clock and input signal $D$.
Waveform diagram with signals clock, D, Qlatch, and Qff.
4.8. Wires
In circuit diagrams we draw wires to signify the connectivity between terminals of transistors, gates, and larger circuits. Real wires, however, exhibit a more complex timing behavior than meets the eye. We distinguish three wire models:
Ideal wire: The wire has negligible resistance and capacitance that we can ignore for the purpose of delay analysis. In practice, this simple model is relevant for very short wires only.
Capacitive wire: The wire has negligible resistance but significant capacitance. This model applies to medium length wires. For delay analysis, we can model the wire capacitance as a lumped capacitance in parallel to the load capacitance of the gate driving the wire. The method of logical effort captures stray capacitances due to wires without changes.
RC wire: The wire has significant resistance and capacitance. Long wires have significant delay which can be modeled as resistances and capacitances distributed along the extent of the wire. In state-of-the-art VLSI chips, long wires have delays spanning tens of clock cycles, and have even become performance limiting. Modeling long wires is beyond the capabilities of the method of logical effort.
In this section, we introduce the Elmore delay as an alternative model for the timing analysis of RC circuits, and show that we can reduce the signal propagation delay of long wires from a quadratic function in the wire length to a linear function by using repeaters.
4.8.1. Elmore Delay
When a wire is so long that it incurs a delay comparable or even larger than the delay of its driver, we need to account for the wire delay if we wish to minimize the delay of the whole circuit. Figure 4.67 illustrates the situation with a long wire driven by an inverter and load capacitance $C_L$. We can model the resistance and capacitance of the wire by considering an infinitesimally short segment. Then, the methods of calculus apply, and the model for the wire delay results in the famous diffusion equation, which does not have analytic solutions for the problem at hand. We could solve the diffusion equation numerically, but a more insightful model can be derived from an approximation of the wire by means of finite resistances and capacitances, lumped into discrete resistors and capacitors, as shown in Figure 4.67.
Figure 4.67: RC model of a long wire driven by an inverter and load capacitance $C_L$.
The Elmore delay approximates the delay of an RC wire with $n$ segments very accurately as the sum of time constants:
$$t_{Elmore} = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{i} R_j \Bigr) C_i$$
Figure 4.68 illustrates the indexing used in this sum for $n = 4$. Each index corresponds to a branching node with an off-path capacitance. If we index the resistors and capacitors as shown in Figure 4.68, we can interpret the Elmore delay as the sum of the RC constants induced by the currents shown in the RC model. The total drive current flowing from the inverter into the wire is $i_1$. By KCL, $i_1 - i_2$ charges capacitor $C_1$, whereas current $i_2$ flows into the remaining wire segments. The RC constant of the first segment is $R_1 C_1$. Current $i_2 - i_3$ charges capacitor $C_2$. Since $i_2$ flows through $R_1$ and $R_2$, the time constant for the second segment is $(R_1 + R_2) C_2$. For the whole wire model in Figure 4.68, the Elmore delay amounts to:
$$t_{Elmore} = R_1 C_1 + (R_1 + R_2) C_2 + (R_1 + R_2 + R_3) C_3 + (R_1 + R_2 + R_3 + R_4) C_4$$
Figure 4.68: Wire RC model with currents for Elmore delay approximation.
The Elmore delay enables us to derive the delay of a wire as a function of its length. Assume that a wire segment of length $\Delta L$ has lumped resistance $R$ and capacitance $C$. In Figure 4.68, we assume that $R_i = R$ and $C_i = C$ for all nodes $i$. Then, the Elmore delay of a wire with $n$ segments and length $L = n \, \Delta L$ is:
$$t_{Elmore} = RC \sum_{i=1}^{n} i = RC \, \frac{n (n+1)}{2}$$
We conclude that the Elmore delay is proportional to the square of the number of wire segments $n$ and is, hence, proportional to the square of the wire length, $t_{Elmore} \propto L^2$. Intuitively, this result is obvious considering (1) that the total resistance of a wire, $R_{wire} = nR$, grows linearly in $L$ and (2) that the total capacitance $C_{wire} = nC$ grows linearly in $L$, viewing the wire as one plate of a capacitor and the rest of the chip as the other. Therefore, product $R_{wire} C_{wire}$ grows quadratically in $L$.
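The Elmore delay of an RC ladder is easy to compute: each capacitor contributes its capacitance times the total resistance between the driver and that capacitor. The following sketch uses the hypothetical function name `elmore_delay` and dimensionless example values.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder: sum over nodes of (upstream resistance) * C_i."""
    upstream = 0.0
    delay = 0.0
    for r, c in zip(resistances, capacitances):
        upstream += r          # resistance between the driver and this node
        delay += upstream * c  # time constant contributed by this node
    return delay

def uniform_wire_delay(n, r=1.0, c=1.0):
    """Wire of n identical segments, each with resistance r and capacitance c."""
    return elmore_delay([r] * n, [c] * n)

for n in (4, 8, 16):
    # Compare the computed delay against the closed form r*c*n*(n+1)/2.
    print(n, uniform_wire_delay(n), n * (n + 1) / 2)
```

Doubling the number of segments roughly quadruples the delay, which is the quadratic growth in wire length discussed above.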
4.8.2. Repeater Insertion
If we want to reduce the delay of a long wire, increasing the size of the driver alone is not an effective method. However, if we use repeaters such as inverters or buffers at carefully chosen locations distributed across the wire, we can reduce the wire delay significantly. More precisely, we divide a wire of length $L$ into $n$ segments of length $L/n$ and insert a repeater at the beginning of each segment, as illustrated in Figure 4.69. Using inverters as repeaters minimizes the additional delay introduced by the repeaters themselves. Nevertheless, to reduce the overall delay, we need to strike the proper balance in the number of repeaters. Too few repeaters, and the wire delay remains quadratic in wire length. Too many repeaters, and the wire delay is dominated by the delay of the inverter chain.
Figure 4.69: Wire model with inverters as repeaters.
To find the optimal number of repeaters, consider the RC model of a single wire segment in Figure 4.70. We assume that the wire of length $L$ with total resistance $R_{wire}$ and capacitance $C_{wire}$ has $n$ segments of length $L/n$, such that each wire segment has resistance $R_{wire}/n$ and capacitance $C_{wire}/n$. The RC model of the repeating inverter shall have on-resistance $R_{inv}$ and output capacitance $C_{inv}$. Assuming that all inverters have the same size, the load capacitance of the wire segment is the input capacitance of the inverter, $C_{inv}$.
Figure 4.70: RC model of one repeated wire segment.
The Elmore delay of one wire segment of length $L/n$ including the repeating inverter is the sum of the time constants of the RC model in Figure 4.70. Therefore, the Elmore delay of the whole repeated wire of length $L$ is $n$ times the delay of one segment. Since the optimal number of repeaters $n$ minimizes the wire delay, we set the derivative of the wire delay with respect to $n$ to zero, and find the optimal number of repeaters. With this value of $n$ and utilizing the observation that the total resistance and capacitance of the wire are proportional to its length $L$, i.e. $R_{wire} \propto L$ and $C_{wire} \propto L$, we find that the minimum delay of the repeated wire is proportional to $L$. Here, we assume that the inverter size and, hence, $R_{inv}$ and $C_{inv}$ are constants, independent of wire length $L$. We conclude that the delay of long wires does not need to be proportional to the square of their length but, at the expense of inserting repeaters, can be reduced to be directly proportional to their length.
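The optimization can be illustrated numerically. The sketch below assumes one particular lumped model of a repeated segment, which may differ in detail from the model in Figure 4.70: the inverter on-resistance $R_{inv}$ charges its own output capacitance $C_{inv}$, the segment capacitance $C_{wire}/n$, and the input capacitance $C_{inv}$ of the next inverter, while the segment resistance $R_{wire}/n$ charges $C_{wire}/n$ and $C_{inv}$. Under this assumption the total delay is $2 n R_{inv} C_{inv} + R_{inv} C_{wire} + R_{wire} C_{inv} + R_{wire} C_{wire} / n$, which is minimized by $n = \sqrt{R_{wire} C_{wire} / (2 R_{inv} C_{inv})}$. The function names and numeric values are hypothetical.

```python
import math

def repeated_wire_delay(n, r_wire, c_wire, r_inv, c_inv):
    """Elmore delay of a wire split into n repeated segments (assumed lumped model)."""
    segment = (r_inv * (c_inv + c_wire / n + c_inv)
               + (r_wire / n) * (c_wire / n + c_inv))
    return n * segment

def optimal_repeaters(r_wire, c_wire, r_inv, c_inv):
    """Real-valued n at which the derivative of the delay with respect to n is zero."""
    return math.sqrt(r_wire * c_wire / (2 * r_inv * c_inv))

def min_delay(length, r_per_len=1.0, c_per_len=1.0, r_inv=1.0, c_inv=1.0):
    """Minimum delay of a wire whose resistance and capacitance grow linearly in length."""
    r_wire, c_wire = r_per_len * length, c_per_len * length
    n = optimal_repeaters(r_wire, c_wire, r_inv, c_inv)
    return repeated_wire_delay(n, r_wire, c_wire, r_inv, c_inv)

for length in (100, 200, 400):
    # With optimally spaced repeaters, doubling the length doubles the delay.
    print(length, round(min_delay(length), 2))
```

In practice the number of repeaters must be an integer, so one would evaluate the delay at the two integers next to the real-valued optimum and pick the smaller one.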
Footnotes
[1] The number of inputs of a gate is also called fan-in. For instance, a NAND gate with $n$ inputs has a fan-in of $n$.
