
Sub-system Design

Designing of various Arithmetic Building Blocks

By Vishakha Bhale, PhD


DESIGN OF ARITHMETIC BUILDING BLOCKS:
Data Path Operations: Adders, Shifters,
Multipliers, Power and Speed Trade-off in
Datapath Structures.

Reference Book: Digital Integrated Circuits


By:-
Jan M. Rabaey,
Anantha Chandrakasan &
Borivoje Nikolic
NPTEL Video Lectures:-

Adder circuits
 Ripple-carry
 Mirror
 Carry-bypass
 Linear carry-select
 Carry look-ahead
 Transmission-gate based
 Manchester carry chain

Data paths in Digital Processor Architectures
 Components of a digital processor: data path, memory, control and input/output blocks.
 The data path is the core of the processor: it is where all computations are performed.
 Other blocks are support units.
 A typical data path consists of an interconnection of combinational functions, such as arithmetic operators (addition, multiplication, comparison and shift) or logic operators (XOR, OR, AND, etc.).

 The design of the arithmetic operators is the topic
of our discussion.

 Data paths are often arranged in a bit-sliced organization, as shown in the Figure.
 Data in a processor are arranged in a word-based fashion.
E.g.: A microprocessor data path is typically 32 or 64 bits wide.
Signal processor data paths are typically 5 to 24 bits wide.
 Because the same operation is performed on each bit of the data word, a 32-bit data path consists of 32 bit slices, each operating on a single bit; hence the name bit-sliced.
The Adders
 Addition is the most commonly used arithmetic operation and is often the speed-limiting element.
 Careful optimization of the adder is therefore of utmost importance.
----- Optimization can be done at the logic level or at the circuit level.
The Binary Adder: Definitions
 A & B are adder inputs
 Ci carry input
 S sum
 C0 carry output

[Equations (1)-(4)]
 G and P are functions of A and B only and are independent of Ci.
 An N-bit adder can be constructed by cascading N full adders (FAs) in series, connecting C0,k-1 to Ci,k for k = 1 to N-1, with the first carry-in Ci,0 = 0.
 This configuration is called a ripple-carry adder, since the carry bit "ripples" from one stage to the next.
 The delay through the circuit depends on the number of logic stages that must be traversed and is a function of the applied input signals.

 For some input signals no rippling occurs at all, while for others the carry ripples all the way from the LSB to the MSB.
 The propagation delay of such a structure (the critical path) is defined as the worst-case delay over all possible input patterns.
 In a ripple-carry adder, the worst-case delay occurs when a carry generated at the LSB position propagates all the way to the MSB position. This carry is finally consumed in the last stage to produce the sum.
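The ripple behaviour described above can be sketched as a small behavioural model in Python. This is only an illustration, not a circuit description: it assumes the standard full-adder relations G = A AND B, P = A XOR B, S = P XOR Ci and C0 = G OR (P AND Ci), and the function names are made up for this example.

```python
def full_adder(a, b, ci):
    """One-bit full adder expressed with generate (G) and propagate (P)."""
    g = a & b          # generate: this stage produces a carry on its own
    p = a ^ b          # propagate: this stage passes the incoming carry on
    s = p ^ ci         # sum bit
    co = g | (p & ci)  # carry out
    return s, co


def ripple_carry_add(a_bits, b_bits, ci=0):
    """Add two equal-length bit lists (LSB first); return (sum_bits, carry_out)."""
    sum_bits = []
    carry = ci
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # the carry ripples to the next stage
        sum_bits.append(s)
    return sum_bits, carry


# Worst case for 4 bits: a carry generated at the LSB propagates to the MSB.
# A = 0001 and B = 1111, written LSB first.
print(ripple_carry_add([1, 0, 0, 0], [1, 1, 1, 1]))
```

In the example call, stage 0 generates a carry and stages 1 to 3 only propagate it, which is exactly the worst-case input pattern for the critical path.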

 The delay is then proportional to the number of bits N in the input words and is approximately given by:

$$ t_{adder} \approx (N-1)\,t_{carry} + t_{sum} \qquad (4) $$

tcarry and tsum equal the propagation delays from Ci to C0 and from Ci to S, respectively.

Two important conclusions can be drawn
from Eq. (4):
1. The propagation delay of the ripple-carry adder is linearly proportional to N.
2. When designing the FA cell for a fast ripple-carry adder, it is more important to optimize tcarry than tsum, since the latter has only a minor influence on the total value of tadder.

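Inverting property of Full Adder
Statement: "Inverting all inputs to a full adder results in inverted values for all outputs."
It is expressed in equations as follows:

$$ \overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i}) $$
$$ \overline{C_0}(A, B, C_i) = C_0(\overline{A}, \overline{B}, \overline{C_i}) \qquad (5) $$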
 This property is useful when optimizing the speed of the ripple-carry adder.
 The reorganized equation set is:

(6)
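Both relations can be checked exhaustively over the eight input combinations. The sketch below assumes that the reorganized equation set has the form used in the cited textbook, C0 = AB + BCi + ACi and S = ABCi + C0'(A + B + Ci); that form is an assumption here, since only the equation number survives on the slide. The function names are illustrative.

```python
from itertools import product


def carry(a, b, ci):
    """Carry out of a full adder: majority of the three inputs."""
    return (a & b) | (b & ci) | (a & ci)


def sum_direct(a, b, ci):
    """Sum as the XOR of the three inputs."""
    return a ^ b ^ ci


def sum_reorganized(a, b, ci):
    """Sum built from the inverted carry (assumed form of Eq. (6))."""
    co_bar = 1 - carry(a, b, ci)
    return (a & b & ci) | (co_bar & (a | b | ci))


for a, b, ci in product((0, 1), repeat=3):
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    # Inverting property: inverted inputs give inverted outputs.
    assert sum_direct(na, nb, nci) == 1 - sum_direct(a, b, ci)
    assert carry(na, nb, nci) == 1 - carry(a, b, ci)
    # The reorganized sum agrees with the direct XOR form.
    assert sum_reorganized(a, b, ci) == sum_direct(a, b, ci)
```

Reusing the inverted carry in the sum expression is what lets the sum and carry circuits share logic.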

 The corresponding adder in complementary static CMOS requires 28 transistors; it consumes a large area and is also slow.
 Tall PMOS transistor stacks are present in both the carry- and sum-generation circuits.
 The intrinsic load capacitance of the C0 signal is large (two diffusion capacitances, six gate capacitances and the wiring capacitance).
 The signal propagates through two inverting stages in the carry-generation circuit.
 The sum-generation circuit needs one extra logic stage.

Mirror Adder Design
 An improved adder circuit.
 Operation is based on Eq. (6).
 The carry-generation circuit is worth analyzing:
1. The carry-inverting gate is eliminated.
2. The PUN and PDN are not duals; instead, they form a clever implementation of the propagate/generate/delete (P/G/D) functions.
 This results in a considerable reduction in both area and delay.

[Figure: mirror adder circuit, annotated with transistor sizes]

Observations from mirror adder design
1. This FA cell requires only 24 transistors.
2. The NMOS and PMOS chains are completely symmetrical.
3. There is a maximum of two series transistors in the carry-generation circuitry (the tall transistor stack in the PUN is reduced).
4. The transistors connected to Ci are placed closest to the output of the gate.
5. Only the transistors in the carry stage have to be optimized for speed; the transistors in the sum stage can be kept at minimum size (W/L).

6. Transistor sizing for minimum delay:
----- Keep the size of the carry stage 3 to 4 times that of the sum stage.
This maintains the optimal fan-out of 2.
The resulting transistor sizes are annotated on the Figure, where a PMOS/NMOS ratio of 2 is assumed.

Carry Look-ahead Adders
The algorithm is based on predicting the carry.
Benefits of the CLA circuit:
1. Speed improvement, because the carry into every stage is pre-computed from the inputs instead of waiting for it to ripple through the preceding stages of the adder.
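The pre-computation idea can be sketched in Python. The sketch assumes the usual look-ahead expansion, in which each carry C(k+1) = Gk + Pk·G(k-1) + ... + Pk·...·P0·C0 depends only on the generate/propagate signals and the initial carry; the function names are illustrative and the loops model logic that hardware would evaluate in parallel.

```python
def lookahead_carries(a_bits, b_bits, c0=0):
    """Return [C0, C1, ..., CN] computed directly from G, P and C0 (LSB first)."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    carries = [c0]
    for k in range(len(g)):
        # Term for the initial carry: P0 * P1 * ... * Pk * C0
        ck1 = c0
        for j in range(k + 1):
            ck1 &= p[j]
        # Terms for each generate: Gj propagated through stages j+1..k
        for j in range(k + 1):
            prod = g[j]
            for m in range(j + 1, k + 1):
                prod &= p[m]
            ck1 |= prod
        carries.append(ck1)
    return carries


def cla_add(a_bits, b_bits, c0=0):
    """Sum bits use the pre-computed carries; no carry is taken from a neighbour."""
    carries = lookahead_carries(a_bits, b_bits, c0)
    sums = [a ^ b ^ c for a, b, c in zip(a_bits, b_bits, carries)]
    return sums, carries[-1]
```

Note how the number of product terms grows with k; this is the fan-in and extra-hardware cost that limits practical look-ahead width.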

CLA LOGIC

Carry Look-ahead Adders
Implementation issues in the CLA circuit:
1. Irregular gate structures.
2. Extra logic (i.e., extra hardware) is needed for the speed improvement.
3. Fan-in restrictions: the designer cannot keep adding inputs to a particular gate.
4. Non-modularity of the CLA structure.
5. Hence, the layout will not be symmetric.

Comparison of CLA with conventional FA representation
With conventional FAs we get a regular structure, i.e., the design follows the concept of modularity, and the layout will be symmetric.

Transmission Gate XOR
[Figure: transmission-gate XOR with inputs A and B, output F, and transistors M1, M2 and M3/M4]
 Another example of the effective use of transmission gates is the popular XOR circuit.
 The complete implementation requires only 6 transistors (including the inverter used to generate B'), compared with the 12 transistors required for a complementary CMOS implementation.

To understand the operation of the circuit, we only need to analyze the B=0 and B=1 cases separately.
For B=1, M1 and M2 act as an inverter, while M3 and M4 are off
---- output function F1 = A'B
For B=0, M1 and M2 are disabled, while M3 and M4 are operational
---- output function F2 = AB'
Overall function F = F1 + F2
F = A'B + AB'
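The two-case analysis above amounts to a multiplexer controlled by B. A minimal behavioural sketch in Python (the function name is illustrative, and transistor-level effects are not modelled):

```python
def tg_xor(a, b):
    """B selects between the inverter path and the pass path."""
    if b == 1:
        return 1 - a   # M1/M2 act as an inverter, so F = A'
    return a           # transmission gate M3/M4 passes A, so F = A


# The behaviour matches XOR on all four input combinations.
for a in (0, 1):
    for b in (0, 1):
        assert tg_xor(a, b) == (a ^ b)
```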
Transmission-Gate-Based Adder
 A FA implementation using transmission gates is shown in the Fig. below; it uses 24 transistors.
 It is based on the propagate-generate model.
 The propagate signal, which is the XOR of inputs A and B, is used to select the true or complementary value of the input carry as the new sum output.
 Based on the propagate signal, the output carry is set either to the input carry or to one of the inputs A or B.
 The circuit has similar delays for both the sum and carry outputs.
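The selection scheme described in these points can be sketched as two multiplexers steered by P. This is a behavioural illustration only; the function name is made up for the example.

```python
def tg_full_adder(a, b, ci):
    """Mux-style full adder: P = A xor B steers both outputs."""
    p = a ^ b
    s = (1 - ci) if p else ci   # P selects the complementary or true carry-in
    co = ci if p else a         # when P = 0, A equals B, so either input gives the carry
    return s, co
```

Because both outputs are one multiplexer away from Ci, the sum and carry paths have comparable depth, which is consistent with the similar delays noted above.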

Benefits of the transmission-gate adder circuit
Compact area
Similar delays in both the sum- and carry-generation circuits

Fig: Manchester carry gates: (a) static, using P, G and K signals; (b) dynamic, using only P and G signals
 A Manchester carry-chain adder uses a cascade of pass transistors to implement the carry chain.
 Here the carry-propagation circuit of the transmission-gate-based adder is simplified by adding generate (G) and delete (D, also called kill, K) signals.
 The propagate path is unchanged: it passes Ci to the C0 output if the propagate signal (A XOR B) is true. Otherwise, the output is either pulled low by the Di signal or pulled up by the Gi signal.

 The dynamic implementation (Fig. (b) above) makes this circuit even simpler.
 Here the transmission gates are replaced by NMOS-only pass transistors.
 Precharging the output eliminates the need for the kill signal (for the case in which the carry chain propagates the complementary values of the carry signals).

 During precharge (phi = 0), all intermediate nodes of the pass-transistor carry chain are precharged to VDD.
 During the evaluation phase, node Ak is discharged when there is an incoming carry and the propagate signal Pk is high, or when the generate signal for stage k (Gk) is high.
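The precharge/evaluate behaviour can be sketched as follows. This is a purely logical model of one evaluation phase under the assumption that a discharged node (0) represents a carry, so the chain effectively carries inverted carry values; timing and charge sharing are ignored, and the function name is illustrative.

```python
def manchester_chain(a_bits, b_bits, carry_in=0):
    """Return the node values after evaluation (LSB first).

    Every node starts precharged to 1 (VDD). A node ends at 0 when a
    carry is present at that stage.
    """
    nodes = []
    prev = 0 if carry_in else 1          # incoming carry as an inverted level
    for a, b in zip(a_bits, b_bits):
        p, g = a ^ b, a & b
        node = 1                         # precharged value
        if g or (p and prev == 0):       # discharge via Gk, or via Pk from the previous node
            node = 0
        nodes.append(node)
        prev = node
    return nodes


# A = 0011, B = 0001 (LSB first): stage 0 generates, stage 1 propagates.
print(manchester_chain([1, 1, 0, 0], [1, 0, 0, 0]))
```

Nodes that are neither generated nor reached by a propagated carry simply keep their precharged value, which is why no kill signal is needed.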

Carry-bypass adders OR Carry-skip adders

 The ripple-carry adder is only practical for implementing additions with a relatively small word length.
 Desktop computers use word lengths of 32 bits, servers require 64 bits, and very fast computers, such as mainframes, supercomputers or multimedia processors, require word lengths of up to 128 bits.
 The linear dependence of the adder speed on the number of bits makes the ripple adder impractical for such word lengths.

The Carry-Bypass Adder
 Consider the 4-bit adder block of Fig. (a).
 Let the values of Ak and Bk (k = 0, 1, 2, 3) be such that all propagate signals Pk (k = 0...3) are high.
 Under those conditions, an incoming carry Ci,0 = 1 propagates through the complete adder chain and causes an outgoing carry C0,3 = 1.

 In other words,
If (P0P1P2P3 = 1) then C0,3 = Ci,0
Else either DELETE or GENERATE occurred
 This information can be used to speed up the adder operation.
 When BP = P0P1P2P3 = 1, the incoming carry is forwarded immediately to the next block through the bypass transistor Mb; hence the name carry-bypass adder or carry-skip adder.
 If this is not the case, the carry is obtained by way of the normal route.
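The bypass decision can be sketched behaviourally in Python. The model below only shows which path supplies the block's carry-out; it does not model the speed gain itself, and the function name and return layout are choices made for this example.

```python
def bypass_block(a_bits, b_bits, ci):
    """One bypass block (any width, bits LSB first).

    Returns (sum_bits, carry_out, bypass_taken).
    """
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    bp = all(p)                                  # BP = P0 * P1 * P2 * P3
    sums, carry = [], ci
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)      # normal ripple route
    co = ci if bp else carry                     # bypass forwards Ci straight to the output
    return sums, co, bp


# All propagate signals high: the incoming carry skips the block.
print(bypass_block([1, 0, 1, 0], [0, 1, 0, 1], 1))
```

When BP is true the rippled carry would equal Ci anyway; the bypass path just delivers it without waiting for the chain.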

Carry Bypass in Manchester Carry-Chain Adder
 The Fig. shows the possible carry-propagation paths when the FA circuit is implemented in Manchester-carry style.
 It shows how the bypass speeds up the addition: the carry either propagates through the bypass path or is generated somewhere in the chain.
 In both cases the delay is smaller than in the normal ripple configuration.


 The area overhead incurred by adding the bypass path is small, typically between 10% and 20%.
 However, adding the bypass path breaks the regular bit-slice structure.

Worst case delay in carry-skip adders
 We now compute the delay of an N-bit adder.
 Assume that the total adder is divided into (N/M) equal-length bypass stages, each of which contains M bits.

 The approximate total propagation delay is given by:

$$ t_p = t_{setup} + M\,t_{carry} + \left(\frac{N}{M} - 1\right) t_{bypass} + (M-1)\,t_{carry} + t_{sum} $$

 tsetup: fixed overhead time to create the generate and propagate signals
 tcarry: propagation delay through a single bit. The worst-case delay through a single stage of M bits is approximately M times larger (M tcarry).
 tbypass: propagation delay through the bypass multiplexer of a single stage
 tsum: time to generate the sum of the final stage
 The critical path is shaded in grey on the block diagram.
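The delay terms defined above can be combined in a short Python sketch that compares the bypass adder with a plain ripple adder. It assumes the textbook expression tp = tsetup + M·tcarry + (N/M - 1)·tbypass + (M - 1)·tcarry + tsum for the bypass adder and (N - 1)·tcarry + tsum for the ripple adder. The unit delay values are hypothetical and chosen only for illustration, so the cross-over they produce says nothing about a real technology.

```python
def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay of an N-bit ripple-carry adder."""
    return (n - 1) * t_carry + t_sum


def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of an N-bit adder built from N/M bypass stages of M bits."""
    stages = n // m
    return (t_setup
            + m * t_carry               # ripple through the first stage
            + (stages - 1) * t_bypass   # skip over the following stages
            + (m - 1) * t_carry         # ripple through the last stage
            + t_sum)


# Hypothetical unit delays, for illustration only.
for n in (4, 8, 16, 32):
    print(n, ripple_delay(n, 1, 1), bypass_delay(n, 4, 1, 1, 1, 1))
```

With these made-up values the ripple adder wins at small N and the bypass adder wins at large N; both delays still grow linearly with N, only with different slopes.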
tp is still proportional to N.
For small values of N the ripple adder is actually faster, because the overhead of the extra bypass multiplexer makes the bypass structure unattractive.
The cross-over point depends on technology considerations and normally lies between four and eight bits.

Carry Select Adder
 In a ripple adder, every FA cell has to wait for the incoming carry before an outgoing carry can be generated.
 The approach here is to anticipate both possible values of the carry input and evaluate the result (i.e., the sum) for both possibilities in advance.
 Once the real value of the incoming carry is known, the correct result is easily selected with a simple MUX stage.
 This idea is called the carry-select adder and is shown in the Fig. below.

Fig: Four-bit carry-select module (topology)
 Instead of waiting for the output carry of bit k-1 to arrive, both the 0 and 1 possibilities are analyzed.
 From a circuit point of view, this means that two carry paths have to be implemented.
 When C0,k-1 finally settles, either the result of the 0 path or that of the 1 path is selected by the MUX, which can be done with minimal delay.
 The hardware overhead of the adder is restricted to one MUX and an extra carry path, and amounts to about 30% with respect to a ripple adder circuit.
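The select scheme can be sketched as a behavioural model in Python. Both carry paths are computed for every block and a multiplexer picks one once the real carry is known. The block size of 4 and the function names are assumptions made for this example, and the parallelism of real hardware is not modelled.

```python
def ripple(a_bits, b_bits, ci):
    """Plain ripple addition of one block (bits LSB first)."""
    sums, carry = [], ci
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry


def carry_select_add(a_bits, b_bits, block=4, ci=0):
    """Carry-select addition on LSB-first bit lists, split into equal blocks."""
    result, carry = [], ci
    for i in range(0, len(a_bits), block):
        a_blk, b_blk = a_bits[i:i + block], b_bits[i:i + block]
        sums0, co0 = ripple(a_blk, b_blk, 0)   # path assuming carry-in = 0
        sums1, co1 = ripple(a_blk, b_blk, 1)   # path assuming carry-in = 1
        # Multiplexer: pick the precomputed result once the real carry is known.
        sums, carry = (sums1, co1) if carry else (sums0, co0)
        result.extend(sums)
    return result, carry
```

In hardware the two paths of every block evaluate at the same time, so only the multiplexer delay is added per block after the first.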
A full carry-select adder (16 bit) is now constructed by chaining a number of equal-length adder stages. The critical path is shaded in gray.

 Class Assignment :-
1. Comparison of CMOS, NMOS & Transmission gate logic
circuits.
2. Comparative analysis of adder circuits wrt Speed, power
and area parameters (prepare the tabular comparative
analysis )

SHIFTERS
 Combinational vs sequential circuits
 A flip-flop (FF) is a single-bit data storage element in digital designs and is also called a 1-bit memory cell.
 To increase the storage capacity, a group of FFs has to be used. Such a group of FFs is known as a REGISTER.
 An n-bit register consists of n FFs and is capable of storing an n-bit word.

SHIFT REGISTERS

One bit shifter
[Figure: bit-slice i of the shifter, with control signals Right, nop and Left, inputs Ai and Ai-1, and outputs Bi and Bi-1]
Fig: One-bit (left-right) programmable shifter
 Shifting is another essential arithmetic operation that requires adequate hardware support.
 It is used mainly in floating-point units, scalers, multiplications, etc.
 To shift the data right or left, appropriate signal wiring is needed; these wires are known as control wires.
 A programmable shifter is more complex and requires more complex circuitry.

 The Fig. shows a basic 1-bit shifter.
 Depending on the control signals, the input word is shifted left or right, or else it remains unchanged.
 Multi-bit shifters can be designed by cascading a number of these units.
 Disadvantage: the structure becomes complex and too slow for larger shift values.

 Thus, a more structured approach is needed for shifter designs:
1. Barrel shifter
2. Logarithmic shifter

Barrel shifter
 Definition
 Structure
 Operation

 It is a type of shifter that can shift or rotate data by several positions in a single clock cycle.
 It has applications in many areas, e.g. ALUs and microprocessor units.
 A large number of shifter types are presently in use.
 The simplest shifter is the shift register, which shifts its data by one bit position per clock cycle.

 For some applications, however, several bits need to be shifted in one clock cycle, and the shift amount needs to be variable.
 BARREL SHIFTERS are useful in such cases.

Structure of Barrel shifter
 It consists of an array of transistors.
 The number of rows (horizontal portion) equals the word length of the data.
 The number of columns (vertical portion) equals the maximum shift width (1 bit, 2 bits, 3 bits, etc.).
 The control wires are routed diagonally upwards through the array.
 A barrel shifter needs one control wire for every possible shift value.

4×4 barrel shifter

Barrel Shifter: Area dominated by wiring
[Figure: 4×4 barrel shifter with data wires (inputs A3..A0, outputs B3..B0) and control wires Sh0..Sh3]
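4×4 Barrel Shifter Layout
[Figure: layout of the 4×4 barrel shifter, showing inputs A3..A0, control lines Sh0..Sh3 and the output buffer]
$W_{barrel} \approx 2\,p_m M$ (p_m: metal pitch, M: maximum shift width)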
 Advantage:
 The signal has to pass through at most one transmission gate. Hence the propagation delay is ideally constant, independent of the shift value.
 In practice this does not hold exactly, since the capacitance at the input of the buffers rises linearly with the maximum shift width.

 An important property of this circuit is that the layout is dominated by the number of wires running through the cell, not by the transistors.

Operation
 A barrel shifter can be designed to perform the following six types of operations:
1. Shift right logical
2. Shift left logical
3. Shift right arithmetic
4. Shift left arithmetic
5. Rotate right
6. Rotate left
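The operations listed above can be sketched on unsigned integers in Python. This is a functional model only: the operation mnemonics, the default 8-bit width and the treatment of an arithmetic left shift as identical to a logical left shift are assumptions made for this example.

```python
def barrel_shift(value, amount, op, width=8):
    """Shift or rotate a width-bit unsigned value by `amount` positions in one step."""
    if not 0 <= amount < width:
        raise ValueError("amount must satisfy 0 <= amount < width")
    mask = (1 << width) - 1
    value &= mask
    if op == "srl":                      # shift right logical: zero-fill from the top
        return value >> amount
    if op in ("sll", "sla"):             # shift left (logical and arithmetic treated alike)
        return (value << amount) & mask
    if op == "sra":                      # shift right arithmetic: replicate the sign bit
        sign = value >> (width - 1)
        fill = (mask << (width - amount)) & mask if sign else 0
        return (value >> amount) | fill
    if op == "ror":                      # rotate right
        return ((value >> amount) | (value << (width - amount))) & mask
    if op == "rol":                      # rotate left
        return ((value << amount) | (value >> (width - amount))) & mask
    raise ValueError(f"unknown operation: {op}")


print(hex(barrel_shift(0x90, 2, "sra")))
print(hex(barrel_shift(0x81, 1, "ror")))
```

Each call produces its result in a single step regardless of the shift amount, which mirrors the single-cycle behaviour of the hardware array.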

 The design of a barrel shifter is almost symmetrical and can be built from repeated combinational logic blocks.
 It can be designed using multiplexers (MUX).
 Trees of 2:1, 4:1 and 8:1 MUXes are used for this purpose.

Logarithmic Shifter
While the barrel shifter implements the whole shifter as a single array of pass transistors, the logarithmic shifter uses a staged approach.
The total shift value is decomposed into shifts over powers of two.
A shifter with a maximum shift width of M consists of $\log_2 M$ stages, where the i-th stage either shifts over $2^i$ bits or passes the data unchanged.

 An example of a shifter with a maximum shift value of seven bits is shown in the Fig. below.
 For example, to shift over five bits, the first stage is set to shift mode, the second to pass mode and the third again to shift mode.
 Here the control word is already encoded, so no separate decoder is required.
 The speed of the logarithmic shifter depends on the shift width in a logarithmic way, since an M-bit shifter requires $\log_2 M$ stages.
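The staged decomposition can be sketched in Python. The model below performs a right shift on a list of bits and records the mode of each stage; the function name, the LSB-first list representation and the choice of a zero-filling right shift are assumptions made for this example.

```python
def log_shift_right(bits, shift):
    """Right-shift a list of bits (index 0 = LSB) through log2 stages.

    Stage i shifts by 2**i when bit i of the shift amount is 1,
    otherwise it passes the data unchanged.
    """
    width = len(bits)
    stages = max(1, (width - 1).bit_length())
    modes = []
    for i in range(stages):
        step = 1 << i
        if (shift >> i) & 1:
            bits = bits[step:] + [0] * step   # shift mode: drop low bits, zero-fill the top
            modes.append("shift")
        else:
            modes.append("pass")              # pass mode: data unchanged
    return bits, modes


# 8-bit word, shift over five bits: stages of 1, 2 and 4 bits.
word = [(0xB4 >> i) & 1 for i in range(8)]
print(log_shift_right(word, 5))
```

The binary digits of the shift amount drive the stages directly, which is why no decoder is needed between the control word and the shifter.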

Furthermore, the series connection of pass transistors slows the shifter down for larger shift values.
We conclude that a barrel shifter is appropriate for smaller shifters.
For larger shift values, the logarithmic shifter becomes more effective in terms of both area and speed.
Further, the logarithmic shifter is easily parameterized, allowing for automatic generation.
