Sub-System Design: Designing of Various Arithmetic Building Blocks

Sub-system Design
Designing of various Arithmetic Building Blocks
By Vishakha Bhale, PhD

DESIGN OF ARITHMETIC BUILDING BLOCKS:
Data Path Operations: Adders, Shifters, Multipliers; Power and Speed Trade-offs in Datapath Structures.
Reference Book: Digital Integrated Circuits

By: Jan M. Rabaey, Anantha Chandrakasan & Borivoje Nikolic
NPTEL Video Lectures:-
Adder circuits:
• Ripple-carry
• Mirror
• Carry-bypass
• Linear carry-select
• Carry look-ahead
• Transmission-gate-based
• Manchester carry chain
Data Paths in Digital Processor Architectures
• Components of a digital processor: data path, memory, control, and input/output blocks.
• The data path is the core of the processor; it is where all computations are performed.
• The other blocks are support units.
• A typical data path consists of an interconnection of combinational functions, such as arithmetic operators (addition, multiplication, comparison and shift) or logic operators (XOR, OR, AND, etc.).
• The design of the arithmetic operators is the topic of our discussion.
• Data paths are often arranged in a bit-sliced organization, as shown in the figure.
• Data in a processor are arranged in a word-based fashion.
Eg: A microprocessor data path is typically 32 or 64 bits wide. Signal processor data paths are 5 to 24 bits wide.
• Because the same operation is performed on each bit of the data word, a 32-bit data path consists of 32 bit slices, each operating on a single bit; hence the name bit-sliced.
The Adders
• Addition is the most commonly used arithmetic operation and is often the speed-limiting element.
• Careful optimization of the adder is therefore of utmost importance.
• Optimization can be done at the logic level or at the circuit level.
The Binary Adder: Definitions
• A and B are the adder inputs
• Ci is the carry input
• S is the sum output
• Co is the carry output
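The definitions above can be exercised with a small behavioural model. This is a minimal sketch, assuming the standard full-adder relations from the reference book (S = A XOR B XOR Ci, Co = AB + BCi + ACi) and the usual generate (G = AB), propagate (P = A XOR B) and delete (D = A'B') signals; the function names are illustrative and not taken from the slides.

```python
from itertools import product


def full_adder(a, b, c_in):
    """Direct form: return (sum, carry_out) for one-bit inputs."""
    s = a ^ b ^ c_in
    c_out = (a & b) | (b & c_in) | (a & c_in)
    return s, c_out


def generate_propagate_delete(a, b):
    """Intermediate signals that depend only on A and B, not on Ci."""
    g = a & b              # generate: a carry is produced regardless of Ci
    p = a ^ b              # propagate: the incoming carry is passed on
    d = (1 - a) & (1 - b)  # delete: the carry out is 0 regardless of Ci
    return g, p, d


def full_adder_gp(a, b, c_in):
    """Same adder expressed through G and P: Co = G + P*Ci, S = P xor Ci."""
    g, p, _ = generate_propagate_delete(a, b)
    return p ^ c_in, g | (p & c_in)


# Both formulations agree on the whole truth table.
for a, b, c_in in product((0, 1), repeat=3):
    assert full_adder(a, b, c_in) == full_adder_gp(a, b, c_in)
```

Expressing the outputs through G and P is what the later adder styles (carry-bypass, Manchester chain, look-ahead) build on.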
[Equations (1) to (4)]
• G and P are functions of A and B only and are independent of Ci.
• An N-bit adder can be constructed by cascading N full adders (FAs) in series, connecting Co,k-1 to Ci,k for k = 1 to N-1, with the first carry-in Ci,0 = 0.
• This configuration is called a ripple-carry adder, since the carry bit “ripples” from one stage to the next.
• The delay through the circuit depends on the number of logic stages that must be traversed and is a function of the applied input signals.
• For some input patterns no rippling occurs at all, while for others the carry ripples all the way from the LSB to the MSB.
• The propagation delay of such a structure (its critical path) is defined as the worst-case delay over all possible input patterns.
• In a ripple-carry adder, the worst-case delay occurs when a carry generated at the LSB position propagates all the way to the MSB position. This carry is finally consumed in the last stage to produce the sum.
• The delay is then proportional to the number of bits N in the input words and is approximately given by:
$$t_{adder} \approx (N-1)\,t_{carry} + t_{sum} \qquad (4)$$
where tcarry and tsum are the propagation delays from Ci to Co and from Ci to S, respectively.
Two important conclusions can be drawn from Eq. (4):
1. The propagation delay of the ripple-carry adder is linearly proportional to N.
2. When designing the FA cell for a fast ripple-carry adder, it is far more important to optimize tcarry than tsum, since the latter has only a minor influence on the total value of tadder.
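The ripple behaviour and the linear delay model can be sketched as follows. This is an illustrative behavioural model, not a circuit description: bit lists are LSB-first, the helper names are hypothetical, and the delay function simply evaluates (N-1)·tcarry + tsum with whatever unit the caller chooses.

```python
def to_bits(value, n):
    """Return the n least significant bits of value, LSB first."""
    return [(value >> k) & 1 for k in range(n)]


def from_bits(bits):
    """Inverse of to_bits for an LSB-first bit list."""
    return sum(bit << k for k, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Cascade of full adders: the carry out of bit k-1 feeds bit k."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sum_bits, carry


def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay model: (N - 1) carry stages plus one sum stage."""
    return (n - 1) * t_carry + t_sum


# Worst case: a carry generated at the LSB ripples all the way to the MSB.
sum_bits, carry_out = ripple_carry_add(to_bits(0b11111111, 8), to_bits(1, 8))
print(from_bits(sum_bits), carry_out)

# Doubling N roughly doubles the delay; t_sum is only paid once.
print(ripple_delay(16, t_carry=1.0, t_sum=1.5), ripple_delay(32, t_carry=1.0, t_sum=1.5))
```

Because tcarry is multiplied by N-1 while tsum appears only once, shaving the carry path pays off far more than shaving the sum path.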
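Inverting Property of the Full Adder
Statement: “Inverting all inputs to a full adder results in inverted values for all outputs.”
It is expressed in equations as follows:
$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$$
$$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i) \qquad (5)$$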
• This property is useful in optimizing the speed of the ripple-carry adder.
• The reorganized equation set here is:
(6)
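The inverting property can be checked exhaustively over the eight input combinations. The sketch below also checks one commonly used reorganized form in which the sum reuses the carry, S = A·B·Ci + Co'·(A + B + Ci). That form is taken from the reference book and is only assumed to be what Eq. (6) refers to, since the equation body is not reproduced here. All function names are illustrative.

```python
from itertools import product


def carry(a, b, c_in):
    """Carry out of a full adder (majority of the three inputs)."""
    return (a & b) | (b & c_in) | (a & c_in)


def sum_direct(a, b, c_in):
    """Sum as a three-input XOR."""
    return a ^ b ^ c_in


def sum_reorganized(a, b, c_in):
    """Assumed reorganized form: the sum shares logic with the carry."""
    not_c_out = 1 - carry(a, b, c_in)
    return (a & b & c_in) | (not_c_out & (a | b | c_in))


def inv(x):
    """Logical inversion of a single bit."""
    return 1 - x


inputs = list(product((0, 1), repeat=3))

# Inverting all inputs inverts both outputs.
inverting_property_holds = all(
    inv(sum_direct(a, b, c)) == sum_direct(inv(a), inv(b), inv(c))
    and inv(carry(a, b, c)) == carry(inv(a), inv(b), inv(c))
    for a, b, c in inputs
)

# The shared-logic sum matches the XOR form on every input.
reorganized_matches = all(
    sum_direct(a, b, c) == sum_reorganized(a, b, c) for a, b, c in inputs
)

print(inverting_property_holds, reorganized_matches)
```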
• The corresponding adder in complementary static CMOS requires 28 transistors; it consumes a large area and is also slow.
• Tall PMOS transistor stacks are present in both the carry and sum generation circuits.
• The intrinsic load capacitance of the Co signal is large (2 diffusion capacitances, 6 gate capacitances, plus wiring capacitance).
• The signal propagates through two inverting stages in the carry generation circuit.
• The sum generation circuit needs one extra logic stage.
Mirror Adder Design
• An improved adder circuit.
• Operation based on Eq. (6).
• The carry generation circuit is worth analyzing:
1. The carry-inverting gate is eliminated.
2. The PUN and PDN are not duals; instead they form a clever implementation of the propagate/generate/delete (P/G/D) functions.
• This results in a considerable reduction in both area and delay.
[Figure: mirror adder schematic with annotated transistor sizes]
Observations from the mirror adder design
1. This FA cell requires only 24 transistors.
2. The NMOS and PMOS chains are completely symmetrical.
3. There is a maximum of two series transistors in the carry-generation circuitry (the tall transistor stack in the PUN is reduced).
4. The transistors connected to Ci are placed closest to the output of the gate.
5. Only the transistors in the carry stage have to be optimized for speed; the transistors in the sum stage can be kept at minimum size (W/L).
6. Transistor sizing for minimum delay:
Keep the size of the carry stage about 3 to 4 times that of the sum stage. This maintains the optimal fan-out of 2.
The resulting transistor sizes are annotated on the figure, where a PMOS/NMOS ratio of 2 is assumed.
Carry Look-Ahead Adders
The algorithm is based on predicting the carry.
Benefits of the CLA circuit:
1. Speed improvement, because the carry into every stage is pre-computed from the inputs instead of waiting for it to ripple through the preceding stages.
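A 4-bit look-ahead block can be sketched by expanding the recurrence Co,k = Gk + Pk·Co,k-1 so that every carry depends only on the G and P signals and the block carry-in. This is a behavioural illustration under that standard recurrence; the function name and the LSB-first bit-list convention are assumptions, not taken from the slides.

```python
def cla_4bit(a_bits, b_bits, c0=0):
    """4-bit carry look-ahead addition on LSB-first bit lists.

    Every carry is written as a flat sum of products of G, P and c0,
    so no carry waits for the previous one to be computed.
    """
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate signals
    p = [a ^ b for a, b in zip(a_bits, b_bits)]  # propagate signals

    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))

    carries_in = [c0, c1, c2, c3]
    sum_bits = [p[k] ^ carries_in[k] for k in range(4)]
    return sum_bits, c4


# Example: 0b0111 + 0b0001, bits listed LSB first.
print(cla_4bit([1, 1, 1, 0], [1, 0, 0, 0]))
```

The expression for c4 already needs a five-input OR of up to five-input ANDs, which shows how quickly the gate fan-in grows with the number of bits looked ahead.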
CLA LOGIC
Carry Look-Ahead Adders
Implementation issues in the CLA circuit:
1. Irregular gate structures.
2. Extra logic (i.e., extra hardware) is needed for the speed improvement.
3. Fan-in restrictions: the designer cannot keep adding inputs to a single gate.
4. The CLA structure is not modular.
5. Hence, the layout will not be symmetric.
Comparison of CLA with the conventional FA representation
With conventional FAs we get a regular structure, i.e., the design follows the concept of modularity, and the layout will be symmetric.
Transmission Gate XOR
[Figure: transmission-gate XOR with inputs A and B, output F, and transistors M1, M2 and M3/M4]
• Another example of the effective use of transmission gates is the popular XOR circuit.
• The complete implementation requires only 6 transistors (including the inverter that generates B’), compared with the 12 transistors required for a complementary CMOS implementation.
To understand the operation of the circuit, we only need to analyze the B=0 and B=1 cases separately.
For B=1, M1 and M2 act as an inverter while the transmission gate M3/M4 is OFF, so the output is F = A’ (the term F1 = A’B).
For B=0, M1 and M2 are disabled and the transmission gate M3/M4 is conducting, so the output is F = A (the term F2 = AB’).
Overall function: F = F1 + F2 = A’B + AB’
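The two-case analysis can be written down directly as a behavioural model. This is only a logic-level sketch of the case split described above (it does not model transistor behaviour), and the function name is illustrative.

```python
def transmission_gate_xor(a, b):
    """Logic-level view of the 6-transistor XOR, split on the value of B."""
    if b == 1:
        # M1/M2 form an inverter driven by A; the transmission gate is off.
        return 1 - a
    # B == 0: the inverter is disabled and the transmission gate passes A.
    return a


# The case split reproduces F = A'B + AB' for every input combination.
for a in (0, 1):
    for b in (0, 1):
        assert transmission_gate_xor(a, b) == ((1 - a) & b) | (a & (1 - b))
```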
Transmission-Gate-Based Adder
• An FA implementation using transmission gates is shown in the figure below; it uses 24 transistors.
• It is based on the propagate-generate model.
• The propagate signal, which is the XOR of inputs A and B, is used to select the true or complementary value of the input carry as the new sum output.
• Based on the propagate signal, the output carry is set either to the input carry or to one of the inputs A or B.
• The sum and carry outputs have similar delays.
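The selection behaviour described above amounts to two multiplexers controlled by the propagate signal. A minimal logic-level sketch (function name illustrative, no electrical effects modelled):

```python
def tg_full_adder(a, b, c_in):
    """Full adder as two P-controlled selections, as in the TG-based cell."""
    p = a ^ b  # propagate signal
    # Sum: the complemented carry-in when P = 1, the true carry-in when P = 0.
    s = (1 - c_in) if p else c_in
    # Carry: the carry-in when P = 1; otherwise A (which equals B when P = 0).
    c_out = c_in if p else a
    return s, c_out


# Check against ordinary binary addition of three bits.
for a in (0, 1):
    for b in (0, 1):
        for c_in in (0, 1):
            s, c_out = tg_full_adder(a, b, c_in)
            assert s + 2 * c_out == a + b + c_in
```

Both outputs are one selection away from P, which is consistent with the sum and carry delays being similar.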
Benefits of the transmission-gate adder circuit
• Compact area
• Similar delays in the sum and carry generation circuits
Fig: Manchester carry gates: (a) static, using P, G and K signals; (b) dynamic, using only P and G signals
• A Manchester carry-chain adder uses a cascade of pass transistors to implement the carry chain.
• Here the carry propagation circuit of the transmission-gate-based adder is simplified by adding generate and propagate signals.
• The propagate path is unchanged: it passes Ci to the Co output if the propagate signal (A XOR B) is true.
• Otherwise, the output is either pulled low by the delete (kill) signal Di or pulled up by the generate signal Gi.
• The dynamic implementation (Fig. (b) above) makes this circuit even simpler.
• Here the transmission gates are replaced by NMOS-only pass transistors.
• Precharging the output eliminates the need for the kill signal (for the case in which the carry chain propagates the complementary values of the carry signals).
• During precharge (phi=0), all intermediate nodes of the pass-transistor carry chain are precharged to VDD.
• During the evaluation phase, node Ak is discharged when there is an incoming carry and the propagate signal Pk is high, or when the generate signal for stage k (Gk) is high.
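The precharge/evaluate behaviour can be mimicked with a simple two-phase model. The sketch below assumes that each chain node holds the complement of the carry (a discharged node means carry = 1), which matches the remark about complementary carry values. Names and the LSB-first convention are illustrative, and charge sharing and timing are not modelled.

```python
def manchester_dynamic_carries(a_bits, b_bits, c_in=0):
    """Return the true carry out of each stage of a dynamic Manchester chain.

    a_bits and b_bits are LSB-first lists of 0/1 values.
    """
    n = len(a_bits)

    # Precharge phase (phi = 0): every chain node is pulled to VDD.
    nodes = [1] * n

    # Evaluation phase (phi = 1): nodes can only be discharged.
    incoming_discharged = (c_in == 1)
    for k in range(n):
        g = a_bits[k] & b_bits[k]
        p = a_bits[k] ^ b_bits[k]
        if g or (p and incoming_discharged):
            nodes[k] = 0
        incoming_discharged = (nodes[k] == 0)

    # A node left at VDD needs no explicit kill: it already encodes carry = 0.
    return [1 - level for level in nodes]


# 0b0011 + 0b0001: a carry generated at bit 0 propagates through bit 1.
print(manchester_dynamic_carries([1, 1, 0, 0], [1, 0, 0, 0]))
```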
Carry-Bypass Adders (Carry-Skip Adders)
• The ripple-carry adder is only practical for additions with a relatively small word length.
• Desktop computers use word lengths of 32 bits and servers require 64 bits; very fast computers, such as mainframes, supercomputers, or multimedia processors, require word lengths of up to 128 bits.
• The linear dependence of the adder speed on the number of bits makes the ripple adder impractical for such word lengths.
The Carry-Bypass Adder
• Consider the 4-bit adder block of Fig. (a).
• Suppose the values of Ak and Bk (k = 0, 1, 2, 3) are such that all propagate signals Pk (k = 0 ... 3) are high.
• Under those conditions, an incoming carry Ci,0 = 1 propagates through the complete adder chain and causes an outgoing carry Co,3 = 1.
• In other words:
If (P0P1P2P3 = 1) then Co,3 = Ci,0
Else either DELETE or GENERATE occurred
• This information can be used to speed up the adder.
• When BP = P0P1P2P3 = 1, the incoming carry is forwarded immediately to the next block through the bypass transistor Mb; hence the name carry-bypass adder or carry-skip adder.
• If this is not the case, the carry is obtained by way of the normal route.
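The bypass decision for one 4-bit block can be sketched at the logic level as follows. The function name and return convention are illustrative assumptions; the model shows only which route supplies the block carry-out, not the resulting timing.

```python
def bypass_block(a_bits, b_bits, c_in):
    """One carry-bypass block on LSB-first bit lists.

    Returns (sum_bits, carry_out, bypass_taken).
    """
    propagate = [a ^ b for a, b in zip(a_bits, b_bits)]
    bypass_taken = all(propagate)  # BP = P0 P1 P2 P3

    # Normal route: the carry ripples through the block.
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)

    # Bypass route: when every bit propagates, the block carry-out
    # is simply the block carry-in, available without rippling.
    carry_out = c_in if bypass_taken else carry
    return sum_bits, carry_out, bypass_taken


# All four bits propagate (A = 0b1010, B = 0b0101), so the carry-in skips the block.
print(bypass_block([0, 1, 0, 1], [1, 0, 1, 0], 1))
```

The two routes always agree on the value of the carry; the bypass only changes how early that value becomes available to the next block.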
Carry Bypass in a Manchester Carry-Chain Adder
• The figure shows the possible carry-propagation paths when the full-adder circuit is implemented in Manchester-carry style.
• It shows how the bypass speeds up the addition: the carry either propagates through the bypass path, or a carry is generated somewhere in the chain.
• In both cases the delay is smaller than in the normal ripple configuration.
• The area overhead incurred by adding the bypass path is small, typically between 10% and 20%.
• Adding the bypass path breaks the regular bit-slice structure.
Worst-case delay in carry-skip adders
• To compute the delay of an N-bit adder, assume that the total adder is divided into (N/M) equal-length bypass stages, each of which contains M bits.
• The approximate total propagation delay is given by:
$$t_p = t_{setup} + M\,t_{carry} + \left(\frac{N}{M} - 1\right) t_{bypass} + (M-1)\,t_{carry} + t_{sum}$$
• tsetup: fixed overhead time to create the generate and propagate signals.
• tcarry: propagation delay through a single bit. The worst-case delay through a single stage of M bits is approximately M times larger (M tcarry).
• tbypass: propagation delay through the bypass multiplexer of a single stage.
• tsum: time to generate the sum of the final stage.
• The critical path is shaded in grey on the block diagram.
• tp is still linearly proportional to N.
• For small values of N the ripple adder is faster, because the overhead of the extra bypass multiplexers makes the bypass structure unattractive.
• The cross-over point depends on technology considerations and is normally situated between four and eight bits.
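The trade-off can be explored numerically. The sketch below assumes the usual delay expressions from the reference book: (N-1)·tcarry + tsum for the ripple adder and tsetup + M·tcarry + (N/M - 1)·tbypass + (M-1)·tcarry + tsum for the bypass adder. The unit delays are arbitrary placeholders, not technology data, so the cross-over they produce is only illustrative.

```python
def ripple_delay(n, t_carry, t_sum):
    """Worst-case ripple-carry delay: (N - 1) carry stages plus one sum."""
    return (n - 1) * t_carry + t_sum


def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case carry-bypass delay for N bits split into N/M stages of M bits."""
    stages = n // m
    return (t_setup
            + m * t_carry              # ripple through the first stage
            + (stages - 1) * t_bypass  # skip the intermediate stages
            + (m - 1) * t_carry        # ripple inside the last stage
            + t_sum)


# Arbitrary unit delays, chosen only to show the trend.
for n in (4, 8, 16, 32, 64):
    print(n, ripple_delay(n, 1, 1), bypass_delay(n, 4, 1, 1, 1, 1))
```

Both delays grow linearly with N, but the bypass adder's slope is tbypass/M instead of tcarry, while its fixed overhead makes it slower for short words.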
Carry-Select Adder
• In a ripple adder, every FA cell has to wait for the incoming carry before an outgoing carry can be generated.
• The approach here is to anticipate both possible values of the carry input and evaluate the result (i.e., the sum) for both possibilities in advance.
• Once the real value of the incoming carry is known, the correct result is easily selected with a simple multiplexer (MUX) stage.
• This idea is called the carry-select adder and is shown in the figure below.
Fig: Four-bit carry-select module (topology)
• Instead of waiting for the arrival of the output carry of bit k-1, both the 0 and 1 possibilities are analyzed.
• From a circuit point of view, two carry paths have to be implemented.
• When Co,k-1 finally settles, either the result of the 0 path or of the 1 path is selected by the MUX, which can be performed with minimal delay.
• The hardware overhead of the adder is restricted to one MUX and the extra carry path and equals about 30% with respect to a ripple adder circuit.
A full carry-select adder (16-bit) is now constructed by chaining a number of equal-length adder stages. The critical path is shaded in gray.
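A linear carry-select adder built from equal-length blocks can be sketched as below. This is a behavioural illustration with hypothetical helper names. Each block computes its result for both carry-in values and a selection step picks one once the real carry is known.

```python
def ripple_block(a_bits, b_bits, c_in):
    """Plain ripple-carry addition of one block (LSB-first bit lists)."""
    sum_bits = []
    carry = c_in
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sum_bits, carry


def carry_select_add(a_bits, b_bits, block=4, c_in=0):
    """Chain of equal-length carry-select stages."""
    sum_bits = []
    carry = c_in
    for start in range(0, len(a_bits), block):
        a_blk = a_bits[start:start + block]
        b_blk = b_bits[start:start + block]
        # Both possibilities are evaluated without waiting for the carry.
        sum0, carry0 = ripple_block(a_blk, b_blk, 0)
        sum1, carry1 = ripple_block(a_blk, b_blk, 1)
        # MUX stage: the real incoming carry selects one precomputed result.
        if carry:
            sum_bits.extend(sum1)
            carry = carry1
        else:
            sum_bits.extend(sum0)
            carry = carry0
    return sum_bits, carry


def to_bits(value, n):
    """Return the n least significant bits of value, LSB first."""
    return [(value >> k) & 1 for k in range(n)]


# 16-bit example built from four 4-bit stages.
bits, carry_out = carry_select_add(to_bits(0x0FFF, 16), to_bits(0x0001, 16))
print(sum(b << k for k, b in enumerate(bits)), carry_out)
```

In hardware the two ripple paths of all blocks run in parallel, so only the chain of selections lies on the critical path after the first block; the sequential loop here reproduces the result, not the timing.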
Class Assignment:
1. Comparison of CMOS, NMOS and transmission-gate logic circuits.
2. Comparative analysis of adder circuits with respect to speed, power and area (prepare a tabular comparison).
SHIFTERS
• Combinational vs. sequential circuits
• A flip-flop (FF) is a single-bit data storage element in digital designs and is called a 1-bit memory cell.
• To increase the storage capacity we have to use a group of FFs. Such a group of FFs is known as a register.
• An n-bit register consists of n FFs and is capable of storing an n-bit word.
SHIFT REGISTERS
One-bit shifter
[Figure: bit-slice i of the shifter, with inputs Ai and Ai-1, outputs Bi and Bi-1, and control signals Right, nop and Left]
Fig: One-bit (left-right) programmable shifter
• Shifting is another essential arithmetic operation that requires adequate hardware support.
• It is used mainly in floating-point units, scalers, multiplications, etc.
• To shift the data right or left, appropriate signal wiring is needed; the wires that select the shift are known as control wires.
• A programmable shifter is more complex and requires more circuitry.
• The figure shows a basic 1-bit shifter.
• Depending on the control signals, the input word is shifted left, shifted right, or left unchanged.
• Multi-bit shifters can be designed by cascading a number of these units.
• Disadvantage: the structure becomes complex and too slow for larger shift values.
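The behaviour of the one-bit programmable shifter, and of a cascade of such units, can be sketched as follows. The control names follow the figure labels (Right, nop, Left); zero-fill at the vacated position and the LSB-first bit order are assumptions made for this illustration.

```python
def one_bit_shift(bits, control):
    """Shift an LSB-first bit list by one position, or pass it unchanged.

    'left' moves every bit one position towards the MSB (B[i] = A[i-1]),
    'right' moves every bit one position towards the LSB (B[i] = A[i+1]),
    'nop' passes the word through. Vacated positions are filled with 0.
    """
    bits = list(bits)
    if control == "nop":
        return bits
    if control == "left":
        return [0] + bits[:-1]
    if control == "right":
        return bits[1:] + [0]
    raise ValueError("control must be 'left', 'right' or 'nop'")


def cascaded_shift(bits, control, amount):
    """Multi-bit shift obtained by cascading one-bit shifter units."""
    for _ in range(amount):
        bits = one_bit_shift(bits, control)
    return list(bits)


# 0b0101 (LSB first: 1, 0, 1, 0) shifted left once and right once.
print(one_bit_shift([1, 0, 1, 0], "left"), one_bit_shift([1, 0, 1, 0], "right"))
```

A shift by k positions passes through k units in series, which is why the cascade becomes slow for larger shift values.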
• Thus, a more structured approach is needed for shifter designs:
1. Barrel shifter
2. Logarithmic shifter
Barrel shifter
• Definition
• Structure
• Operation
• A barrel shifter is a type of shifter that can shift or rotate data by any number of positions in a single clock cycle.
• It has applications in many areas, e.g., ALUs and microprocessor units.
• A large number of shifter types are in use at present.
• The simplest shifter is the shift register, which shifts its data by one bit position per clock cycle.
• Some applications, however, need to shift several bits in one clock cycle and also to vary the shift length.
• BARREL SHIFTERS are useful there.
Structure of the Barrel Shifter
• It consists of an array of transistors.
• The number of rows (horizontal) equals the word length of the data.
• The number of columns (vertical) equals the maximum shift width (1 bit, 2 bits, 3 bits, etc.).
• The control wires are routed diagonally upwards through the array.
• The barrel shifter needs a separate control wire for every shift value.
4×4 barrel shifter
Barrel Shifter: Area dominated by wiring
[Figure: 4×4 barrel shifter array with data wires (inputs A3 to A0, outputs B3 to B0) and control wires Sh0, Sh1, Sh2, Sh3]
4×4 Barrel Shifter Layout
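[Figure: layout with inputs A3 to A0, control wires Sh0 to Sh3, and output buffers]
Width_barrel ≈ 2 p_m M, where p_m is the metal pitch and M is the maximum shift width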
• Advantage: the signal has to pass through at most one transmission gate, so the propagation delay is ideally constant, independent of the shift value.
• In practice the delay is not constant, since the capacitance at the input of the buffers rises linearly with the maximum shift width.
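The array structure can be modelled as a matrix of pass transistors in which the transistor at row i, column j connects output Bi to input Ai+j when control wire Shj is high. The sketch below assumes a right shift, a one-hot control word, and zero-fill for positions beyond the MSB; these are illustrative choices, not a description of the exact circuit in the figure.

```python
def barrel_shift_right(a_bits, sh):
    """Right shift using a transistor-array view of the barrel shifter.

    a_bits: LSB-first data bits (one array row per bit).
    sh: one-hot control word; sh[j] = 1 selects a shift over j positions.
    """
    if sum(sh) != 1:
        raise ValueError("exactly one control wire must be active")
    n = len(a_bits)
    out = [0] * n
    for i in range(n):                      # one row per output bit
        for j, enabled in enumerate(sh):    # one column per shift value
            if enabled and i + j < n:
                # Only this pass transistor conducts for output i, so the
                # signal crosses a single switch whatever the shift value.
                out[i] = a_bits[i + j]
    return out


# 0b1100 shifted right by 2 with Sh2 active (bits are LSB first).
print(barrel_shift_right([0, 0, 1, 1], [0, 0, 1, 0]))
```

The control word is one-hot rather than binary encoded, so a decoder is needed in front of the array when the shift amount is supplied as a binary number.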
• An important property of this circuit is that the layout is dominated by the number of wires running through the cell rather than by the transistors.
Operation
• A barrel shifter can be designed to perform the following six types of operations:
1. Shift right logical
2. Shift left logical
3. Shift right arithmetic
4. Shift left arithmetic
5. Rotate right
6. Rotate left
• The design of a barrel shifter is almost symmetrical and can be built from repetitive combinational logic blocks.
• It can be designed using multiplexers.
• Trees of 2:1, 4:1 or 8:1 MUXes are used for this purpose.
Logarithmic Shifter
• While the barrel shifter implements the whole shifter as a single array of pass transistors, the logarithmic shifter uses a staged approach.
• The total shift value is decomposed into shifts over powers of two.
• A shifter with a maximum shift width of M consists of log2(M) stages, where the i-th stage either shifts over 2^i positions or passes the data unchanged.
• An example of a shifter with a maximum shift value of seven bits is shown in the figure below.
• For example, to shift over five bits, the first stage is set to shift mode, the second to pass mode, and the third again to shift mode.
• Here the control word is already encoded, and no separate decoder is required.
• The speed of the logarithmic shifter depends on the shift width in a logarithmic way, since an M-bit shifter requires log2(M) stages.
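The staged decomposition can be sketched as below. The model assumes a right shift with zero-fill and three stages (shifts over 1, 2 and 4 positions, giving a maximum shift of seven); function names are illustrative.

```python
def stage_modes(shift, num_stages=3):
    """Mode of each stage, first stage first: bit i of the shift value
    directly controls stage i, so no decoder is needed."""
    return ["shift" if (shift >> i) & 1 else "pass" for i in range(num_stages)]


def log_shift_right(a_bits, shift, num_stages=3):
    """Right shift of an LSB-first bit list through log2(M) stages."""
    n = len(a_bits)
    data = list(a_bits)
    for i, mode in enumerate(stage_modes(shift, num_stages)):
        if mode == "shift":
            step = 1 << i  # stage i shifts over 2**i positions
            data = (data[step:] + [0] * step)[:n]
        # in 'pass' mode the stage forwards the data unchanged
    return data


# Shifting over five positions uses stages 0 and 2 (1 + 4 = 5).
print(stage_modes(5))
value = 0b10110000
bits = [(value >> k) & 1 for k in range(8)]
print(log_shift_right(bits, 5))
```

Every stage lies in the signal path regardless of the shift value, so the signal always crosses log2(M) switches in series.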
• Furthermore, the series connection of pass transistors slows the shifter down for larger shift values.
• We conclude that a barrel shifter is appropriate for smaller shifters.
• For larger shift values, the logarithmic shifter becomes more effective in terms of both area and speed.
• Further, the logarithmic shifter is easily parameterized, allowing for automatic generation.