
02/09/17

Lecture 33: DESIGN OF ADDERS (PART 1)

PROF. INDRANIL SENGUPTA
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, IIT KHARAGPUR

DR. KAMALIKA DATTA
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING, NIT MEGHALAYA

Introduction
• Computers are built using tiny electronic switches.
  – Typically made up of MOS transistors.
  – The state of a switch is typically expressed in binary (ON/OFF).
• To design arithmetic circuits for use in computers, we need to work with binary numbers.
  – How to represent numbers in binary?
  – How to carry out various arithmetic operations in binary?
  – How to implement them efficiently in hardware?

Representation of Integers
• Unsigned integer number representation
  – For n-bit binary, the range is $0$ to $2^n - 1$.
• Signed integer number representation
  – For n-bit 1's complement, the range is $-(2^{n-1} - 1)$ to $+(2^{n-1} - 1)$.
  – For n-bit 2's complement, the range is $-2^{n-1}$ to $+(2^{n-1} - 1)$.
  – For both representations, subtraction can be done using addition.
  – The 2's complement representation is the most widely used.

Subtraction Using Addition :: 1's Complement
• How to compute A – B?
  – Compute the 1's complement of B (say, B1).
  – Compute R = A + B1.
  – If the addition produces a carry-out of 1:
      • Add the carry back to R (called the end-around carry).
      • That is, R = R + 1.
      • The result is a positive number.
    Else:
      • The result is negative, and is in 1's complement form in R.

Example 1 :: 6 – 2
Assume 4-bit representations. The 1's complement of 2 is 1101.

     6 ::    0110
    -2 ::    1101
           ------
           1 0011
                1     (end-around carry)
           ------
             0100 = +4

Since there is a carry, it is added back to the result. The result is positive.

Example 2 :: 3 – 5
Assume 4-bit representations. The 1's complement of 5 is 1010.

     3 ::    0011
    -5 ::    1010
           ------
             1101 = -2

Since there is no carry, the result is negative. 1101 is the 1's complement of 0010, that is, it represents –2.
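The 1's complement procedure and the two examples above can be sketched in a few lines of Python. This is only an illustration: the function names and the default word length `n=4` are assumptions made for this sketch, not part of the lecture.

```python
def subtract_ones_complement(a, b, n=4):
    """Compute A - B with n-bit 1's complement addition.

    Returns (r, carry): the n-bit result pattern and the carry-out of A + B1.
    """
    mask = (1 << n) - 1
    b1 = ~b & mask             # 1's complement of B
    total = a + b1             # R = A + B1
    carry = total >> n         # carry-out of the n-bit addition
    r = total & mask
    if carry:
        r = (r + 1) & mask     # end-around carry: R = R + 1
    return r, carry


def ones_complement_value(r, n=4):
    """Interpret an n-bit 1's complement pattern as a signed integer."""
    mask = (1 << n) - 1
    return -(~r & mask) if r >> (n - 1) else r


r1, c1 = subtract_ones_complement(6, 2)   # carry present, result positive
r2, c2 = subtract_ones_complement(3, 5)   # no carry, result negative
print(format(r1, "04b"), c1, ones_complement_value(r1))
print(format(r2, "04b"), c2, ones_complement_value(r2))
```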

Subtraction Using Addition :: 2's Complement
• How to compute A – B?
  – Compute the 2's complement of B (say, B2).
  – Compute R = A + B2.
  – If the addition produces a carry-out of 1:
      • Ignore the carry.
      • The result is a positive number.
    Else:
      • The result is negative, and is in 2's complement form in R.

Example 1 :: 6 – 2
Assume 4-bit representations. The 2's complement of 2 is 1101 + 1 = 1110.

     6 ::    0110
    -2 ::    1110
           ------
           1 0100 = +4     (ignore the carry)

The presence of a carry indicates that the result is positive. There is no need to add the end-around carry as in 1's complement.

Example 2 :: 3 – 5
Assume 4-bit representations. The 2's complement of 5 is 1010 + 1 = 1011.

     3 ::    0011
    -5 ::    1011
           ------
             1110 = -2

Since there is no carry, the result is negative. 1110 is the 2's complement of 0010, that is, it represents –2.
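The 2's complement procedure can be sketched the same way. The function names and the default word length `n=4` are assumptions made for this illustration.

```python
def subtract_twos_complement(a, b, n=4):
    """Compute A - B with n-bit 2's complement addition.

    Returns (r, carry); the carry is simply ignored when forming r.
    """
    mask = (1 << n) - 1
    b2 = ((~b & mask) + 1) & mask   # 2's complement = 1's complement + 1
    total = a + b2                  # R = A + B2
    return total & mask, total >> n


def twos_complement_value(r, n=4):
    """Interpret an n-bit 2's complement pattern as a signed integer."""
    return r - (1 << n) if r >> (n - 1) else r


for a, b in [(6, 2), (3, 5)]:
    r, carry = subtract_twos_complement(a, b)
    print(a, "-", b, "->", format(r, "04b"), "carry", carry,
          "value", twos_complement_value(r))
```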

Addition of Two Binary Digits (Bits)
• When two bits A and B are added, a sum (S) and a carry (C) are generated as per the following truth table:

    Inputs    Outputs
    A  B      S  C
    0  0      0  0         0 + 0 = 00
    0  1      1  0         0 + 1 = 01
    1  0      1  0         1 + 0 = 01
    1  1      0  1         1 + 1 = 10

    S = A'.B + A.B' = A ⊕ B
    C = A.B

[Figure: HALF ADDER (HA) block with inputs A, B and outputs S, C]

Addition of Multi-bit Binary Numbers

      0010110   Carry             1111110   Carry
      0101011   Number A          0111111   Number A
    + 0001001   Number B        + 0000001   Number B
      -------                     -------
      0110100   Sum S             1000000   Sum S

• At every bit position (stage), we need to add 3 bits:
  – 1 bit for number A
  – 1 bit for number B
  – 1 carry bit coming from the previous stage
• WE NEED A FULL ADDER.

Full Adder

    Inputs        Outputs
    A  B  Cin     S  Cout
    0  0  0       0  0
    0  0  1       1  0
    0  1  0       1  0
    0  1  1       0  1
    1  0  0       1  0
    1  0  1       0  1
    1  1  0       0  1
    1  1  1       1  1

    S    = A'.B'.Cin + A'.B.Cin' + A.B'.Cin' + A.B.Cin
         = A ⊕ B ⊕ Cin
    Cout = A'.B.Cin + A.B'.Cin + A.B.Cin' + A.B.Cin
         = A.B + B.Cin + A.Cin

[Figure: FULL ADDER (FA) block with inputs A, B, Cin and outputs S, Cout]
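The half adder and full adder equations can be checked against ordinary integer addition with a short sketch. Building the full adder from two half adders and an OR gate is one common realisation; the function names are chosen for this illustration.

```python
def half_adder(a, b):
    """S = A xor B, C = A.B"""
    return a ^ b, a & b


def full_adder(a, b, cin):
    """Full adder built from two half adders and an OR gate."""
    s1, c1 = half_adder(a, b)       # first half adder: A + B
    s, c2 = half_adder(s1, cin)     # second half adder: (A xor B) + Cin
    return s, c1 | c2               # Cout = c1 OR c2


# Print the truth table and compare with the sum-of-products carry form.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert cout == (a & b) | (b & cin) | (a & cin)
            print(a, b, cin, "->", s, cout)
```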

Various Implementations of Full Adder

[Figure: full adder built from two half adders and an OR gate. The first half adder adds A and B; the second adds the resulting sum to Cin and produces S; the carry outputs of the two half adders are ORed to give Cout.]

• Delay of a full adder:
  – Assume that the delay of all basic gates (AND, OR, NAND, NOR, NOT) is δ.
  – Delay for Carry = 2δ
  – Delay for Sum = 3δ (AND-OR delay plus one inverter delay)

Parallel Adder Design
• We shall look at various designs of the n-bit parallel adder:
  a) Ripple carry adder
  b) Carry look-ahead adder
  c) Carry save adder
  d) Carry select adder

Ripple Carry Adder
• Cascade n full adders to create an n-bit parallel adder.
• The carry output from stage i propagates as the carry input to stage (i+1).
• In the worst case, the carry ripples through all the stages:

      1111110   Carry
      0111111   Number A
    + 0000001   Number B
      -------
      1000000   Sum S

[Figure: n-bit ripple carry adder. Full adders FA0, FA1, FA2, …, FAn-1 are cascaded; stage i takes Ai, Bi and Ci and produces Si and Ci+1.]

    Two numbers:   $A_{n-1} \ldots A_2 A_1 A_0$ and $B_{n-1} \ldots B_2 B_1 B_0$
    Input carry:   $C_0$
    Sum:           $S_{n-1} \ldots S_2 S_1 S_0$
    Output carry:  $C_n$

    Delay for $C_1$ = 2δ
    Delay for $C_2$ = 4δ
    Delay for $C_{n-1}$ = 2(n-1)δ
    Delay for $C_n$ = 2nδ

    Delay for $S_0$ = 3δ
    Delay for $S_1$ = 2δ + 3δ = 5δ
    Delay for $S_2$ = 4δ + 3δ = 7δ
    Delay for $S_{n-1}$ = 2(n-1)δ + 3δ = (2n+1)δ

• Delay is proportional to n.
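As an illustration of the cascade, the following sketch chains n full adders in software and reproduces the delay expressions listed above. The helper names (`ripple_carry_add`, `rca_delays`) are assumptions for this sketch; delays are returned in units of δ.

```python
def full_adder(a, b, cin):
    """One stage: sum bit and carry-out."""
    return a ^ b ^ cin, (a & b) | (b & cin) | (a & cin)


def ripple_carry_add(a, b, n, c0=0):
    """Add two n-bit numbers stage by stage; returns (sum, carry_out)."""
    carry, total = c0, 0
    for i in range(n):                      # stage i feeds its carry to stage i+1
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def rca_delays(n):
    """(delay of C_n, delay of S_{n-1}) in units of delta."""
    return 2 * n, 2 * (n - 1) + 3


# Worst case from the slide: the carry ripples through all seven stages.
print(ripple_carry_add(0b0111111, 0b0000001, 7))
print(rca_delays(7))
```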

How to Design a Parallel Subtractor?
• Observation:
  – Computing A – B is the same as adding the 2's complement of B to A.
  – The 2's complement is equal to the 1's complement plus 1.
  – Let $X_i = B_i'$.

[Figure: n-bit ripple carry adder with inputs $A_i$ and $X_i$ at stage i and with the input carry $C_0 = 1$; outputs are $S_{n-1} \ldots S_2 S_1 S_0$ and $C_n$.]

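A Parallel Adder/Subtractor

[Figure: n-bit parallel adder. The inputs $A_{n-1} \ldots A_0$ are applied directly; each $B_i$ passes through an XOR gate whose other input is the ADD'/SUB control line. The adder has input carry $C_0$ and outputs $S_{n-1} \ldots S_0$ and $C_n$.]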
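A minimal behavioural sketch of the adder/subtractor follows. It assumes the usual arrangement in which the control line is XORed with every bit of B and also drives the input carry C0 (so that C0 = 1 when subtracting, as on the previous slide); the function name and parameters are hypothetical.

```python
def add_sub(a, b, sub, n=4):
    """n-bit adder/subtractor.

    sub = 0 computes A + B, sub = 1 computes A - B (2's complement).
    Returns (sum, carry_out).
    """
    mask = (1 << n) - 1
    x = b ^ (mask if sub else 0)    # each B_i XORed with the control line
    total = a + x + sub             # control line assumed to feed C0 as well
    return total & mask, total >> n


print(add_sub(6, 2, 0))   # addition
print(add_sub(6, 2, 1))   # subtraction
```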

END OF LECTURE 33

Lecture 34: DESIGN OF ADDERS (PART 2)

Carry Look-ahead Adder
• The propagation delay of an n-bit ripple carry adder has been seen to be proportional to n.
  – This is due to the carry rippling sequentially from one stage to the next.
• One possible way to speed up the addition:
  – Generate the carry signals for the various stages in parallel.
  – Time complexity reduces from O(n) to O(1).
  – Hardware complexity increases rapidly with n.

• Consider the i-th stage in the addition process.
• We define the carry generate and carry propagate functions as:
      $G_i = A_i \cdot B_i$
      $P_i = A_i \oplus B_i$
• $G_i = 1$ represents the condition when a carry is generated in stage i, independent of the other stages.
• $P_i = 1$ represents the condition when an input carry $C_i$ will be propagated to the output carry $C_{i+1}$.
• Therefore:
      $C_{i+1} = G_i + P_i C_i$

[Figure: full adder for stage i with inputs $A_i$, $B_i$, $C_i$ and outputs $S_i$, $C_{i+1}$]

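Unrolling the Recurrence

$$
\begin{aligned}
C_{i+1} &= G_i + P_i C_i \\
        &= G_i + P_i (G_{i-1} + P_{i-1} C_{i-1}) = G_i + P_i G_{i-1} + P_i P_{i-1} C_{i-1} \\
        &= G_i + P_i G_{i-1} + P_i P_{i-1} (G_{i-2} + P_{i-2} C_{i-2}) \\
        &= G_i + P_i G_{i-1} + P_i P_{i-1} G_{i-2} + P_i P_{i-1} P_{i-2} C_{i-2} = \ldots
\end{aligned}
$$

$$
C_{i+1} = G_i + \sum_{k=0}^{i-1} G_k \prod_{j=k+1}^{i} P_j + C_0 \prod_{j=0}^{i} P_j
$$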
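To make the closed form concrete, the sketch below evaluates every carry directly from the G and P signals and C0, without using any previously computed carry. The function name and the bit-list convention (index 0 is the least significant bit) are assumptions for this illustration.

```python
def cla_carries(a_bits, b_bits, c0):
    """Return [C0, C1, ..., Cn] using the unrolled look-ahead expression.

    a_bits[i], b_bits[i] are the bits of stage i (index 0 = LSB).
    """
    n = len(a_bits)
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate
    carries = [c0]
    for i in range(n):
        c = g[i]
        for k in range(i):                 # terms G_k * P_{k+1} ... P_i
            term = g[k]
            for j in range(k + 1, i + 1):
                term &= p[j]
            c |= term
        term = c0                          # term C0 * P_0 ... P_i
        for j in range(i + 1):
            term &= p[j]
        carries.append(c | term)
    return carries


# 0111 + 0001: the carry generated in stage 0 propagates through stages 1 and 2.
print(cla_carries([1, 1, 1, 0], [1, 0, 0, 0], 0))
```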

Design of 4-bit CLA Adder

    C4 = G3 + G2P3 + G1P2P3 + G0P1P2P3 + C0P0P1P2P3
    C3 = G2 + G1P2 + G0P1P2 + C0P0P1P2
    C2 = G1 + G0P1 + C0P0P1
    C1 = G0 + C0P0

    S0 = A0 ⊕ B0 ⊕ C0 = P0 ⊕ C0
    S1 = P1 ⊕ C1
    S2 = P2 ⊕ C2
    S3 = P3 ⊕ C3

• Gates for the carries: 4 AND2 gates, 3 AND3 gates, 2 AND4 gates and 1 AND5 gate; 1 OR2, 1 OR3, 1 OR4 and 1 OR5 gate.
• Gates for the sums: 4 XOR2 gates.


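Design of 4-bit CLA Adder

    C4 = G3 + C3P3
    C3 = G2 + G1P2 + G0P1P2 + C0P0P1P2
    C2 = G1 + G0P1 + C0P0P1
    C1 = G0 + C0P0

    S0 = A0 ⊕ B0 ⊕ C0 = P0 ⊕ C0
    S1 = P1 ⊕ C1
    S2 = P2 ⊕ C2
    S3 = P3 ⊕ C3

• Here C4 is computed from C3 instead of being fully expanded, which removes the widest gates.
• Gates for the carries: 4 AND2 gates, 2 AND3 gates and 1 AND4 gate (no AND5 gate is needed); 1 OR2, 1 OR3, 1 OR4 and one more OR2 gate.
• Gates for the sums: 4 XOR2 gates.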

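The 4-bit CLA Circuit

[Figure: gate-level carry look-ahead circuit that generates C1, C2, C3 and C4 from P0–P3, G0–G3 and C0]

[Figure: block diagram of the 4-bit CLA adder. A3–A0 and B3–B0 feed the Gi and Pi generator (delay 3δ). G3–G0, P3–P0 and C0 feed the 4-bit carry look-ahead circuit (delay 2δ), which produces C4 and C3–C1. Four XOR gates (delay 3δ) produce S3–S0.]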

16-bit Adder Using 4-bit CLA Modules

[Figure: four 4-bit CLA adders in cascade, handling A3–A0/B3–B0, A7–A4/B7–B4, A11–A8/B11–B8 and A15–A12/B15–B12. The input carry is C0; the carries C4, C8 and C12 pass between modules; the outputs are S15–S0 and C16.]

• Problem: carry propagation between modules still slows down the adder.

• Solution:
  – Use a second level of carry look-ahead mechanism to generate the input carries to the CLA blocks in parallel.
  – The second level of CLA generates C4, C8, C12 and C16 in parallel with two gate delays (2δ).
  – For larger values of n, more CLA levels can be added.
• Delay calculation of a 16-bit adder:
  a) For original single-level CLA: 14δ
  b) For modified two-level CLA: 10δ

Delay of an n-bit Adder

    n       T_CLA    T_RCA
    4       8δ       9δ
    16      10δ      33δ
    32      12δ      65δ
    64      12δ      129δ
    128     14δ      257δ
    256     14δ      513δ

    $T_{CLA} = (6 + 2 \lceil \log_4 n \rceil)\,\delta$
    $T_{RCA} = (2n + 1)\,\delta$
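The table can be regenerated from the two delay expressions. In this sketch the logarithm is rounded up to the next integer, which is the reading that matches the tabulated values for n = 32 and n = 128; the function names are chosen for illustration and delays are in units of δ.

```python
def t_cla(n):
    """CLA delay in units of delta: 6 + 2 * ceil(log4(n))."""
    levels = 0
    while 4 ** levels < n:      # integer ceil(log4 n), avoids float rounding
        levels += 1
    return 6 + 2 * levels


def t_rca(n):
    """Ripple carry adder delay in units of delta: 2n + 1."""
    return 2 * n + 1


for n in (4, 16, 32, 64, 128, 256):
    print(n, t_cla(n), t_rca(n))
```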

Carry Select Adder
• Basically consists of two parallel adders (say, ripple carry adders) and a multiplexer.
• For two given numbers A and B, we carry out the addition twice:
  – With carry-in as 0
  – With carry-in as 1
• Once the correct carry-in is known, the correct sum is selected by a multiplexer.

[Figure: basic building block of a carry-select adder, with block size of 4]

• For a multi-bit adder, the number of bits in each carry select block can be either uniform or variable.

Uniform-sized adder
[Figure: 16-bit carry select adder with a uniform block size of 4]
• A 16-bit carry select adder with a uniform block size of 4 is shown.
• The least significant block needs a single adder (since the carry-in is known).
• Total delay is 4 full adder delays, plus 3 MUX delays.

Variable-sized adder
[Figure: 16-bit carry select adder with variable block sizes of 2-2-3-4-5]
• A 16-bit carry select adder with variable block sizes of 2-2-3-4-5 is shown.
• Total delay is 2 full adder delays, plus 4 MUX delays.
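A behavioural sketch of the carry select idea is given below: each block computes both candidate sums, and the incoming carry selects one of them. The function name and the `blocks` parameter (block widths listed from the least significant block upward) are assumptions for this sketch. For simplicity the least significant block also computes both sums, although hardware needs only one adder there.

```python
def carry_select_add(a, b, blocks=(4, 4, 4, 4)):
    """Add a and b block by block; returns (sum, carry_out)."""
    result, carry, lo = 0, 0, 0
    for width in blocks:
        mask = (1 << width) - 1
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        sum0 = a_blk + b_blk          # adder with carry-in 0
        sum1 = a_blk + b_blk + 1      # adder with carry-in 1
        chosen = sum1 if carry else sum0      # the multiplexer
        result |= (chosen & mask) << lo
        carry = chosen >> width       # selects the next block's result
        lo += width
    return result, carry


print(carry_select_add(0x0FFF, 0x0001))                    # uniform blocks
print(carry_select_add(12345, 54321, (2, 2, 3, 4, 5)))     # variable blocks
```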

Carry Save Adder
• Here we add three operands (say, X, Y and Z) together.
• For adding multiple numbers, we have to construct a tree of carry save adders.
  – Used in combinational multiplier design.
• Each carry save adder is simply a set of independent full adders without carry propagation.
• A parallel adder is required only at the last stage.

• An illustrative example:

    X:        1 0 0 1 1
    Y:      + 1 1 0 0 1
    Z:      + 0 1 0 1 1
            -----------
    S:        0 0 0 0 1      (sum bits)
    C:      1 1 0 1 1        (carry bits, shifted left by one position)
            -----------
    Sum:    1 1 0 1 1 1

• A set of full adders generates the carry and sum bits in parallel.
• The sum and carry vectors are added later (with proper shifting).
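The example can be reproduced with bitwise operations, since each bit position is an independent full adder. The function name is chosen for this sketch.

```python
def carry_save_add(x, y, z):
    """Return the sum vector S and carry vector C for three operands."""
    s = x ^ y ^ z                        # sum bit of each full adder
    c = (x & y) | (y & z) | (x & z)      # carry bit of each full adder
    return s, c


s, c = carry_save_add(0b10011, 0b11001, 0b01011)
print(format(s, "05b"), format(c, "05b"))
# The final parallel adder adds S to C shifted left by one position.
print(format(s + (c << 1), "b"))
```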

An n-bit Carry Save Adder

[Figure: n independent full adders. The i-th full adder takes $X_i$, $Y_i$, $Z_i$ and produces $C_i$ and $S_i$, for i = 0 to n-1.]

• The carry input of each full adder is used as the third input.

Adding m Numbers: Some Examples

[Figure: carry save adder trees for m = 3, m = 4 and m = 6 (using one, two and four CSAs respectively), each followed by a parallel adder]

END OF LECTURE 34

Lecture 35: DESIGN OF MULTIPLIERS (PART 1)

Generating the Status Flags
• Many contemporary processors have a flag register that contains the status of the last arithmetic / logic operation.
  – Zero (Z): tells whether the result is zero.
      • Can be used for both arithmetic and logic operations.
  – Sign (S): tells whether the result is positive (=0) or negative (=1).
      • Can be used for both arithmetic and logic operations.
  – Carry (C): tells whether there has been a carry out of the most significant stage.
      • Used only for arithmetic operations.
  – Overflow (V): tells whether the result is too large to fit in the target register.
      • Used only for arithmetic operations (addition and subtraction).

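[Figure: ALU with inputs A and B and result F; A, B and F are assumed to be n-bit registers. The Z flag is the NOR of the result bits $F_{n-1} \ldots F_1 F_0$, the S flag is $F_{n-1}$, and the C flag is the carry out of the ALU.]

• Overflow can occur during addition only when the signs of the two operands are the same.
  – The sign of the result becomes different from the sign of the operands.
      $V = A_{n-1} \cdot B_{n-1} \cdot \overline{F_{n-1}} + \overline{A_{n-1}} \cdot \overline{B_{n-1}} \cdot F_{n-1}$
  – When the two operands have the same sign, this is equivalent to:
      $V = F_{n-1} \oplus \text{Carry\_out}$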
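The flag definitions can be illustrated for addition with a small sketch. The function name and the dictionary layout are assumptions for this example; V uses the first (sign-based) expression above.

```python
def add_with_flags(a, b, n=8):
    """Add two n-bit patterns and return (F, flags) with Z, S, C and V."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    total = a + b
    f = total & mask
    a_s, b_s, f_s = a >> (n - 1), b >> (n - 1), f >> (n - 1)   # sign bits
    flags = {
        "Z": int(f == 0),       # NOR of all result bits
        "S": f_s,               # most significant bit of the result
        "C": total >> n,        # carry out of the most significant stage
        "V": (a_s & b_s & (f_s ^ 1)) | ((a_s ^ 1) & (b_s ^ 1) & f_s),
    }
    return f, flags


print(add_with_flags(0b0111, 0b0001, n=4))   # 7 + 1 overflows in 4 bits
print(add_with_flags(0b1111, 0b0001, n=4))   # -1 + 1 = 0, carry but no overflow
```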

• The MIPS architecture does not have any status flags.
• Why?
  – The MIPS ISA is designed for efficient pipeline implementation.
  – Several instructions can be in various stages of execution in the pipeline.
  – Flag registers result in side effects among instructions.
• MIPS stores information about the flags temporarily in a GPR.
      slt  $t0, $s1, $s2        # $t0 = 1 if $s1 < $s2, else 0
      beq  $t0, $zero, Label    # branch to Label if $t0 = 0

Multiplication of Unsigned Numbers
• Multiplication requires substantially more hardware than addition.
• Multiplication of two n-bit numbers generates a 2n-bit product.
• We can use the shift-and-add method.
  – Repeated additions of shifted versions of the multiplicand.

              1 0 1 0      Multiplicand M (10)
            x 1 1 0 1      Multiplier Q (13)
            ---------
              1 0 1 0
            0 0 0 0
          1 0 1 0
        1 0 1 0
      ---------------
      1 0 0 0 0 0 1 0      Product P (130)

A General Case

                              A3    A2    A1    A0
                        x     B3    B2    B1    B0
      --------------------------------------------
                            A3B0  A2B0  A1B0  A0B0
                      A3B1  A2B1  A1B1  A0B1
                A3B2  A2B2  A1B2  A0B2
          A3B3  A2B3  A1B3  A0B3
      --------------------------------------------

• Each AiBj is called a partial product.
• Generating the partial products is easy.
  – Requires just an AND gate for each partial product.
• Adding all the n-bit partial products in hardware is more difficult.

Design of a Combinational Array Multiplier
• We can directly map the multiplication process as discussed to hardware.
  – We use an array of cells to generate the partial products.
  – Instead of adding the partial products at the end, we add the partial products at every stage of the multiplication.
• The required multiplication cell is as shown.
  – Combines the capabilities of partial product generation and addition of partial products.

[Figure: multiplication cell built around a full adder]

[Figure: 4 x 4 combinational array multiplier. Rows of multiplication cells, controlled by the multiplier bits q0 to q3, accumulate the partial products PP0 to PP3 from the multiplicand bits m3 m2 m1 m0 and produce the product p7 p6 p5 p4 p3 p2 p1 p0.]

• Extremely inefficient, and requires a very large amount of hardware.
  – $n^2$ multiplication cells for an n x n multiplier.
• The advantage is that it is very fast.

Unsigned Sequential Multiplication
• Requires much less hardware, but requires several clock cycles to perform multiplication of two n-bit numbers.
  – Typical hardware complexity: O(n).
  – Typical time complexity: O(n).
• In the "hand multiplication" that we have seen:
  – If the i-th bit of the multiplier is 1, the multiplicand is shifted left by i bit positions and added to the partial product.
  – The relative position of the partial product does not change; it is the multiplicand that gets shifted left.

• In the "shift-and-add" multiplication that we discuss now, we make the following modifications.
  – We do not shift the multiplicand (i.e., we keep its position fixed).
  – We right shift a 2n-bit partial product at every step.

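[Flowchart: shift-and-add multiplication]
  START
  1. A = 0; C = 0; COUNT = n; M = multiplicand; Q = multiplier.
  2. If Q0 = 1, A = A + M; if Q0 = 0, A = A + 0.
  3. Shift right (C, A, Q) by one bit: C moves into the MSB of A, and the LSB of A moves into the MSB of Q.
  4. COUNT = COUNT – 1.
  5. If COUNT = 0, STOP; otherwise go to step 2.

  M: n-bit multiplicand
  Q: n-bit multiplier
  A: n-bit temporary register
  C: 1-bit carry out from adder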

Example 1: (10) x (13)
Assume 5-bit numbers.
M: $(0\,1\,0\,1\,0)_2$
Q: $(0\,1\,1\,0\,1)_2$
Product = 130 = $(0\,0\,1\,0\,0\,0\,0\,0\,1\,0)_2$

    C   A            Q
    0   0 0 0 0 0    0 1 1 0 1    Initialization
    0   0 1 0 1 0    0 1 1 0 1    A = A + M      Step 1
    0   0 0 1 0 1    0 0 1 1 0    Shift
    0   0 0 1 0 1    0 0 1 1 0    A = A + 0      Step 2
    0   0 0 0 1 0    1 0 0 1 1    Shift
    0   0 1 1 0 0    1 0 0 1 1    A = A + M      Step 3
    0   0 0 1 1 0    0 1 0 0 1    Shift
    0   1 0 0 0 0    0 1 0 0 1    A = A + M      Step 4
    0   0 1 0 0 0    0 0 1 0 0    Shift
    0   0 1 0 0 0    0 0 1 0 0    A = A + 0      Step 5
    0   0 0 1 0 0    0 0 0 1 0    Shift

Example 2: (29) x (21)
Assume 5-bit numbers.
M: $(1\,1\,1\,0\,1)_2$
Q: $(1\,0\,1\,0\,1)_2$
Product = 609 = $(1\,0\,0\,1\,1\,0\,0\,0\,0\,1)_2$

    C   A            Q
    0   0 0 0 0 0    1 0 1 0 1    Initialization
    0   1 1 1 0 1    1 0 1 0 1    A = A + M      Step 1
    0   0 1 1 1 0    1 1 0 1 0    Shift
    0   0 1 1 1 0    1 1 0 1 0    A = A + 0      Step 2
    0   0 0 1 1 1    0 1 1 0 1    Shift
    1   0 0 1 0 0    0 1 1 0 1    A = A + M      Step 3
    0   1 0 0 1 0    0 0 1 1 0    Shift
    0   1 0 0 1 0    0 0 1 1 0    A = A + 0      Step 4
    0   0 1 0 0 1    0 0 0 1 1    Shift
    1   0 0 1 1 0    0 0 0 1 1    A = A + M      Step 5
    0   1 0 0 1 1    0 0 0 0 1    Shift
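The register-level behaviour traced in the two examples can be sketched as follows. The variable names mirror the registers C, A, Q and M; the function name is an assumption for this illustration.

```python
def shift_add_multiply(m, q, n):
    """Unsigned n-bit shift-and-add multiplication; returns the 2n-bit product."""
    mask = (1 << n) - 1
    a, c = 0, 0
    for _ in range(n):
        if q & 1:                              # Q0 = 1: A = A + M
            total = a + m
            a, c = total & mask, total >> n    # C holds the adder's carry out
        # Shift right (C, A, Q): C -> MSB of A, LSB of A -> MSB of Q.
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (c << (n - 1))
        c = 0
    return (a << n) | q                        # product is held in (A, Q)


print(shift_add_multiply(10, 13, 5))
print(shift_add_multiply(29, 21, 5))
```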

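Data Path for Shift-and-Add Multiplier

[Figure: data path with the registers C, A and Q (Q0 goes to the control unit), the register M, a MUX, an ADDER whose carry out is loaded into C, and a control unit. A, Q and M are n-bit registers.]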

END OF LECTURE 35

Lecture 36: DESIGN OF MULTIPLIERS (PART 2)

Signed Multiplication
• We can extend the basic shift-and-add multiplication method to handle signed numbers.
• One important difference:
  – We are required to sign-extend all the partial products before they are added.
  – Recall that for 2's complement representation, sign extension can be done by replicating the sign bit any number of times.

    0101 = 0000 0101 = 0000 0000 0000 0101 = 0000 0000 0000 0000 0000 0000 0000 0101

    1011 = 1111 1011 = 1111 1111 1111 1011 = 1111 1111 1111 1111 1111 1111 1111 1011

An Example: 6-bit 2's complement multiplication

                  1 1 0 1 0 1     (-11)
              x   0 1 1 0 1 0     (+26)
      -------------------------
      0 0 0 0 0 0 0 0 0 0 0 0
      1 1 1 1 1 1 1 0 1 0 1
      0 0 0 0 0 0 0 0 0 0
      1 1 1 1 1 0 1 0 1
      1 1 1 1 0 1 0 1
      0 0 0 0 0 0 0
      -------------------------
      1 1 1 0 1 1 1 0 0 0 1 0     (-286)

Note: For n-bit multiplication, since we are generating a 2n-bit product, overflow can never occur.

Booth's Algorithm for Signed Multiplication
• In the conventional shift-and-add multiplication as discussed, for n-bit multiplication, we iterate n times.
  – Add either 0 or the multiplicand to the 2n-bit partial product (depending on the next bit of the multiplier).
  – Shift the 2n-bit partial product to the right.
• Essentially we need n additions and n shift operations.
• Booth's algorithm is an improvement whereby we can avoid the additions whenever consecutive 0's or 1's are detected in the multiplier.
  – Makes the process faster.

Basic Idea Behind Booth's Algorithm
• We inspect two bits of the multiplier ($Q_i$, $Q_{i-1}$) at a time.
  – If the bits are the same (00 or 11), we only shift the partial product.
  – If the bits are 01, we do an addition and then shift.
  – If the bits are 10, we do a subtraction and then shift.
• Significantly reduces the number of additions / subtractions.
• Inspecting bit pairs as mentioned can also be expressed in terms of Booth's Encoding.
  – Use the symbols +1, -1 and 0 to indicate changes w.r.t. $Q_i$ and $Q_{i-1}$.
  – 01 → +1, 10 → -1, 00 or 11 → 0.
  – For encoding the least significant bit $Q_0$, we assume $Q_{-1} = 0$.

• Examples of Booth encoding:
  a) 01110000 :: +1  0  0 -1  0  0  0  0
  b) 01110110 :: +1  0  0 -1 +1  0 -1  0
  c) 00000111 ::  0  0  0  0 +1  0  0 -1
  d) 01010101 :: +1 -1 +1 -1 +1 -1 +1 -1
• The last example illustrates the worst case for Booth's multiplication (alternating 0's and 1's in the multiplier).
  – In the illustrations, we shall show the two multiplier bits explicitly instead of showing the encoded digits.
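The encoding rule can be written as a one-line difference between neighbouring bits, which reproduces the four examples above. The function name and the string-based input (most significant bit first) are choices made for this sketch.

```python
def booth_encode(bits):
    """Booth-encode a bit string given MSB first.

    Each digit is Q(i-1) - Q(i): 01 -> +1, 10 -> -1, 00 or 11 -> 0,
    with Q(-1) = 0 appended on the right.
    """
    padded = bits + "0"
    return [int(padded[i + 1]) - int(padded[i]) for i in range(len(bits))]


for word in ("01110000", "01110110", "00000111", "01010101"):
    print(word, "::", booth_encode(word))
```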

[Flowchart: Booth's algorithm]
  START
  1. A = 0; $Q_{-1}$ = 0; COUNT = n; M = multiplicand; Q = multiplier.
  2. Examine $Q_0 Q_{-1}$:
       01:        A = A + M
       10:        A = A – M
       00 or 11:  no addition or subtraction
  3. Arithmetic shift right (A, Q, $Q_{-1}$).
  4. COUNT = COUNT – 1.
  5. If COUNT = 0, STOP; otherwise go to step 2.

  M: n-bit multiplicand
  Q: n-bit multiplier
  A: n-bit temporary register
  $Q_{-1}$: 1-bit flip-flop

• Skips over consecutive 0's and 1's of the multiplier Q.

Example 1: (-10) x (13)
Assume 5-bit numbers.
M:  $(1\,0\,1\,1\,0)_2$
-M: $(0\,1\,0\,1\,0)_2$
Q:  $(0\,1\,1\,0\,1)_2$
Product = -130 = $(1\,1\,0\,1\,1\,1\,1\,1\,1\,0)_2$

    A            Q            Q-1
    0 0 0 0 0    0 1 1 0 1    0     Initialization
    0 1 0 1 0    0 1 1 0 1    0     A = A - M      Step 1
    0 0 1 0 1    0 0 1 1 0    1     Shift
    1 1 0 1 1    0 0 1 1 0    1     A = A + M      Step 2
    1 1 1 0 1    1 0 0 1 1    0     Shift
    0 0 1 1 1    1 0 0 1 1    0     A = A - M      Step 3
    0 0 0 1 1    1 1 0 0 1    1     Shift
    0 0 0 0 1    1 1 1 0 0    1     Shift          Step 4
    1 0 1 1 1    1 1 1 0 0    1     A = A + M      Step 5
    1 1 0 1 1    1 1 1 1 0    0     Shift

Example 2: (-31) x (28)
Assume 6-bit numbers.
M:  $(1\,0\,0\,0\,0\,1)_2$
-M: $(0\,1\,1\,1\,1\,1)_2$
Q:  $(0\,1\,1\,1\,0\,0)_2$
Product = -868 = $(1\,1\,0\,0\,1\,0\,0\,1\,1\,1\,0\,0)_2$

    A              Q              Q-1
    0 0 0 0 0 0    0 1 1 1 0 0    0     Initialization
    0 0 0 0 0 0    0 0 1 1 1 0    0     Shift          Step 1
    0 0 0 0 0 0    0 0 0 1 1 1    0     Shift          Step 2
    0 1 1 1 1 1    0 0 0 1 1 1    0     A = A - M      Step 3
    0 0 1 1 1 1    1 0 0 0 1 1    1     Shift
    0 0 0 1 1 1    1 1 0 0 0 1    1     Shift          Step 4
    0 0 0 0 1 1    1 1 1 0 0 0    1     Shift          Step 5
    1 0 0 1 0 0    1 1 1 0 0 0    1     A = A + M      Step 6
    1 1 0 0 1 0    0 1 1 1 0 0    0     Shift
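A register-level sketch of Booth's algorithm, following the flowchart and reproducing both worked examples, is given below. It assumes A is exactly n bits wide, so the multiplicand must lie in the range $-(2^{n-1}-1)$ to $2^{n-1}-1$ (for the most negative value, –M is not representable in n bits). The function name is chosen for this illustration.

```python
def booth_multiply(m, q, n):
    """Signed n-bit Booth multiplication; returns the signed 2n-bit product."""
    mask = (1 << n) - 1
    m &= mask                      # 2's complement patterns of M and Q
    q &= mask
    a, q_1 = 0, 0
    for _ in range(n):
        pair = (q & 1, q_1)        # (Q0, Q-1)
        if pair == (1, 0):
            a = (a - m) & mask     # A = A - M
        elif pair == (0, 1):
            a = (a + m) & mask     # A = A + M
        # Arithmetic shift right of (A, Q, Q-1): the sign bit of A is kept.
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a = (a >> 1) | (a & (1 << (n - 1)))
    product = (a << n) | q
    if product >> (2 * n - 1):     # convert the 2n-bit pattern to a signed value
        product -= 1 << (2 * n)
    return product


print(booth_multiply(-10, 13, 5))
print(booth_multiply(-31, 28, 6))
```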

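Data Path for Booth's Algorithm

[Figure: data path with the registers A, Q and the flip-flop $Q_{-1}$ connected for arithmetic shift right ($A_{n-1}$ is fed back into A), the register M, an ADD / SUBTRACT unit, and a control unit that examines $Q_0$ and $Q_{-1}$ and generates the Add / Subtract signal. A, Q and M are n-bit registers.]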

Design of Fast Multiplier
a) Bit-Pair Recoding of Booth's Multiplication
  – A technique that halves the maximum number of summands; derived directly from Booth's algorithm.
  – If we group the Booth-coded multiplier digits in pairs, we observe:
      • (+1, -1): (+1, -1) * M = 2 * M – M = M
      • (0, +1): (0, +1) * M = M
  – We need a single addition instead of a pair of addition and subtraction.
      • Other similar rules can be framed.
      • Shown on next slide.

    Original Booth-coded Pair    Equivalent Recoded Pair
    (+1, 0)                      (0, +2)
    (-1, +1)                     (0, -1)
    (0, 0)                       (0, 0)
    (0, +1)                      (0, +1)
    (+1, +1)                     --
    (+1, -1)                     (0, +1)
    (-1, 0)                      (0, -2)

• Every equivalent recoded pair has at least one 0.
• The worst-case number of additions or subtractions is 50% of the number of multiplier bits.
• Reduces the worst-case time required for multiplication.

Example: (+13) X (-22) in 6 bits.

    Original multiplier:    1   0   1   0   1   0
    Booth multiplier:      -1  +1  -1  +1  -1   0
    Recoded multiplier:     0  -1   0  -1   0  -2

• M = 0 0 1 1 0 1 (+13)
• -1 * M = 1 1 0 0 1 1
• -2 * M = 1 0 0 1 1 0

                  0 0 1 1 0 1
                  . -1 . -1 . -2
      -------------------------
      1 1 1 1 1 1 1 0 0 1 1 0
      1 1 1 1 1 1 0 0 1 1
      1 1 1 1 0 0 1 1
      -------------------------
      1 1 1 0 1 1 1 0 0 0 1 0     (-286)
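The recoded digits can also be obtained directly from the multiplier bits: for each pair ($Q_{i+1}$, $Q_i$) with right neighbour $Q_{i-1}$, the digit is $Q_{i-1} + Q_i - 2Q_{i+1}$. The sketch below uses this standard formulation, which is not stated on the slides; the function name is hypothetical and n is assumed to be even.

```python
def bit_pair_recode(q, n):
    """Return the radix-4 recoded digits of an n-bit 2's complement
    multiplier, least significant digit first (n must be even)."""
    q &= (1 << n) - 1
    bits = [0] + [(q >> i) & 1 for i in range(n)]   # bits[0] is Q(-1) = 0
    digits = []
    for i in range(0, n, 2):
        q_prev, q_lo, q_hi = bits[i], bits[i + 1], bits[i + 2]
        digits.append(q_prev + q_lo - 2 * q_hi)     # one of -2, -1, 0, +1, +2
    return digits


digits = bit_pair_recode(-22, 6)
print(digits)                                        # least significant first
# Each digit selects one summand: digit * M, weighted by 4**k.
print(sum(d * 13 * 4 ** k for k, d in enumerate(digits)))
```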

b) Carry Save Multiplier
  – We have seen earlier how carry save adders (CSA) can be used to add several numbers with carry propagation only in the last stage.
  – The partial products can be generated in parallel using $n^2$ AND gates.
  – The n partial products can then be added using a CSA tree.
  – Instead of letting the carries ripple through during addition, we save them and feed them to the next row, at the correct weight positions.

[Figure: 4 x 4 Carry Save Multiplier]

• Wallace Tree Multiplier
  – A Wallace tree is a circuit that reduces the problem of summing n n-bit numbers to the problem of summing two Θ(n)-bit numbers.
  – It uses $\lfloor n/3 \rfloor$ carry-save adders in parallel to convert the sum of n numbers to the sum of $\lceil 2n/3 \rceil$ numbers.
  – It then recursively constructs a Wallace tree on the $\lceil 2n/3 \rceil$ resulting numbers.
  – The set of numbers is progressively reduced until there are only two numbers left.
  – By performing many carry-save additions in parallel, Wallace trees allow two n-bit numbers to be multiplied in $\Theta(\log_2 n)$ time using a circuit of size $\Theta(n^2)$.
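The reduction rule can be followed numerically. The sketch below counts how many CSA levels and how many CSAs are needed before only two numbers remain, assuming each level uses as many CSAs as possible; the function name is chosen for this illustration.

```python
def wallace_reduction(n):
    """Return (levels, total_csas) needed to reduce n numbers to two."""
    levels, total = 0, 0
    while n > 2:
        csas = n // 3          # floor(n/3) carry-save adders work in parallel
        n -= csas              # each CSA turns 3 numbers into 2
        total += csas
        levels += 1
    return levels, total


for n in (3, 4, 6, 8, 16, 32):
    print(n, wallace_reduction(n))
```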

[Figure: Wallace tree for n = 8. The partial products m(0), m(1), …, m(7), of 8 to 15 bits, are reduced by four levels of CSAs (2, 2, 1 and 1 CSAs) to a 15-bit and a 16-bit number, which a parallel adder sums to give the 16-bit product.]

• The figure shows a Wallace tree that adds 8 partial products m(0), m(1), …, m(7).
• The partial product m(i) consists of (n+i) bits.
• Each line represents an entire number; the label of an edge indicates the number of bits.
• The carry-lookahead adder at the bottom adds a (2n-1)-bit number to a 2n-bit number to give the 2n-bit product.

END OF LECTURE 36

Lecture 37: DESIGN OF DIVIDERS

Introduction
• Division is more complex than multiplication.
• Example: typical values in the Pentium-3 processor.
  – Not easy to construct high-speed dividers.
• The ratios have not changed much in later processors.

    Instruction                Latency    Cycles / Issue
    Load / Store               3          1
    Integer Multiply           4          1
    Integer Divide             36         36
    Floating-point Add         3          1
    Floating-point Multiply    5          2
    Floating-point Divide      38         38

• Latency:
  – Minimum delay after which the first result is obtained, starting from the time when the first set of inputs is applied.
• Cycles/Issue:
  – Whenever a new set of inputs is applied to a functional unit (e.g. adder), it is called an issue.
  – Pipelined implementation of an arithmetic unit reduces the number of clock cycles between successive issues.
  – For non-pipelined arithmetic units (e.g. divider), the number of clock cycles between successive issues is much higher.
      • The next input can be applied only after the previous operation is complete.

The Process of Integer Division
• In integer division, a divisor M and a dividend D are given.
• The objective is to find a third number Q, called the quotient, such that
      D = Q x M + R
  where R is the remainder such that 0 ≤ R < M.
• The relationship D = Q x M suggests that there is a close correspondence between division and multiplication.
  – Dividend, quotient and divisor correspond to product, multiplicand and multiplier, respectively.
  – Similar algorithms and circuits can be used for multiplication and division.

• One of the simplest division methods is the sequential digit-by-digit algorithm similar to that used in pencil-and-paper methods.

    D = 37 = $(1\,0\,0\,1\,0\,1)_2$
    M = 6 = $(1\,1\,0)_2$
    Quotient Q = 6
    Remainder R = 1

                          0 1 1 0     Quotient Q = Q0 Q1 Q2 Q3
    Divisor M  1 1 0 )  1 0 0 1 0 1     Dividend D = R0
                        1 1 0           Q0.M (does not go; Q0 = 0)
                        -----------
                        1 0 0 1 0 1     R1
                      -   1 1 0         Q1.2^-1.M (does go; Q1 = 1)
                        -----------
                          0 1 1 0 1     R2
                      -     1 1 0       Q2.2^-2.M (does go; Q2 = 1)
                        -----------
                            0 0 0 1     R3
                              1 1 0     Q3.2^-3.M (does not go; Q3 = 0)
                        -----------
                              0 0 1     R4 = Remainder R

• In the example, the quotient $Q = Q_0 Q_1 Q_2 \ldots$ is computed one bit at a time.
  – At each step i, the divisor shifted i bits to the right (i.e. $2^{-i} M$) is compared with the current partial remainder $R_i$.
  – The quotient bit $Q_i$ is set to 0 if $2^{-i} M$ is greater than $R_i$, and to 1 otherwise.
  – The new partial remainder $R_{i+1}$ is computed as:
        $R_{i+1} = R_i - Q_i \cdot 2^{-i} \cdot M$

• Machine implementation:
  – For hardware implementation, it is more convenient to shift the partial remainder to the left relative to a fixed divisor; thus
        $R_{i+1} = 2R_i - Q_i \cdot M$   (instead of $R_{i+1} = R_i - Q_i \cdot 2^{-i} \cdot M$)
  – The final partial remainder is the required remainder shifted to the left, so that $R = 2^{-3} \cdot R_4$ (see next slide).

    D = 37 = $(1\,0\,0\,1\,0\,1)_2$
    M = 6 = $(1\,1\,0)_2$
    Quotient Q = 6
    Remainder R = 1

    Divisor M = 1 1 0 (fixed position)                  Quotient Q
      1 0 0 1 0 1      Dividend = 2R0
      1 1 0            Q0.M (do not subtract)           0
      -----------
      1 0 0 1 0 1      R1
    1 0 0 1 0 1 0      2R1
      1 1 0            Q1.M                             0 1
    -------------
      0 1 1 0 1 0      R2
    0 1 1 0 1 0 0      2R2
      1 1 0            Q2.M                             0 1 1
    -------------
      0 0 0 1 0 0      R3
    0 0 0 1 0 0 0      2R3
      1 1 0            Q3.M (do not subtract)           0 1 1 0
    -------------
      0 0 1 0 0 0      R4 = $2^3 \cdot R$

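Restoring Division: The Data Path

[Figure: data path with the (n+1)-bit register A (sign bit $A_n$) and the n-bit register Q, which initially holds the dividend and receives the next quotient bit in Q0; the n-bit register M holds the divisor; an ADD / SUBTRACT unit operates on A and M; a control unit generates the Add / Subtract signal.]

• Q and M are n-bit registers.
• A is an (n+1)-bit register.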

Basic Steps
Repeat the following steps n times:
  a) Shift the dividend one bit at a time, starting from its most significant bit, into register A.
  b) Subtract the divisor M from register A (trial subtraction).
  c) If the result is negative (i.e. not going):
      • Add the divisor M back into register A (i.e. restoring back).
      • Record 0 as the next quotient bit.
  d) If the result is positive:
      • Do not restore the intermediate result.
      • Record 1 as the next quotient bit.

Restoring Division

[Flowchart: restoring division]
  START
  1. A = 0; M = divisor; Q = dividend; COUNT = n.
  2. Shift left (A, Q).
  3. A = A – M (trial subtraction).
  4. If A is negative: set Q0 = 0 and A = A + M (restoration). Otherwise: set Q0 = 1.
  5. COUNT = COUNT – 1.
  6. If COUNT = 0, STOP; otherwise go to step 2.

• Quotient in Q
• Remainder in A

• Analysis:
  – For an n-bit divisor and n-bit dividend, we iterate n times.
  – Number of trial subtractions: n
  – Number of restoring additions: n/2 on the average
      • Best case: 0
      • Worst case: n

A Simple Example: 8/3 for 4-bit representation (n=4)

                  A            Q
    Initially:    0 0 0 0 0    1 0 0 0

    Shift:        0 0 0 0 1    0 0 0 –
    Subtract:     0 0 0 1 1
    Set Q0:       1 1 1 1 0
    Restore:      0 0 0 1 1
                  0 0 0 0 1    0 0 0 0

    Shift:        0 0 0 1 0    0 0 0 –
    Subtract:     0 0 0 1 1
    Set Q0:       1 1 1 1 1
    Restore:      0 0 0 1 1
                  0 0 0 1 0    0 0 0 0

    Shift:        0 0 1 0 0    0 0 0 –
    Subtract:     0 0 0 1 1
    Set Q0:       0 0 0 0 1
                  0 0 0 0 1    0 0 0 1

    Shift:        0 0 0 1 0    0 0 1 –
    Subtract:     0 0 0 1 1
    Set Q0:       1 1 1 1 1
    Restore:      0 0 0 1 1
                  0 0 0 1 0    0 0 1 0

    Remainder = 00010 = 2      Quotient = 0010 = 2
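The restoring division loop can be sketched as follows. For readability the register A is held as an ordinary signed Python integer rather than an (n+1)-bit pattern; the function name is an assumption for this illustration.

```python
def restoring_divide(dividend, divisor, n):
    """Unsigned n-bit restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        # Shift left (A, Q): the MSB of Q moves into the LSB of A.
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & mask
        a -= m                 # trial subtraction
        if a < 0:
            a += m             # restore; quotient bit Q0 stays 0
        else:
            q |= 1             # quotient bit Q0 = 1
    return q, a


print(restoring_divide(8, 3, 4))
```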

Non-Restoring Division
• The performance of the restoring division algorithm can be improved by exploiting the following observation.
• In restoring division, what we actually do is (note that a shift left means multiplying by 2):
  – If A is positive, we shift it left and subtract M.
      • That is, we compute 2A – M.
  – If A is negative, we restore it by doing A + M, shift it left, and then subtract M.
      • That is, we compute 2(A + M) – M = 2A + M.
• We can accordingly modify the basic division algorithm by eliminating the restoring step → NON-RESTORING DIVISION.

• Basic steps in non-restoring division:
  a) Start by initializing register A to 0, and repeat steps (b)-(d) n times.
  b) If the value in register A is positive,
      • Shift A and Q left by one bit position.
      • Subtract M from A.
  c) If the value in register A is negative,
      • Shift A and Q left by one bit position.
      • Add M to A.
  d) If A is positive, set Q0 = 1; else, set Q0 = 0.
  e) If A is negative, add M to A as a final corrective step.

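Non-Restoring Division

[Flowchart: non-restoring division]
  START
  1. A = 0; M = divisor; Q = dividend; COUNT = n.
  2. If A is negative: shift left (A, Q), then A = A + M. Otherwise: shift left (A, Q), then A = A – M.
  3. If A is negative, set Q0 = 0; otherwise set Q0 = 1.
  4. COUNT = COUNT – 1.
  5. If COUNT is not 0, go to step 2.
  6. Correction: if A < 0, A = A + M.
  STOP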

A Simple Example: 8/3 for n=4

                  A            Q
    Initially:    0 0 0 0 0    1 0 0 0

    Shift:        0 0 0 0 1    0 0 0 –
    Subtract:     0 0 0 1 1
    Set Q0:       1 1 1 1 0    0 0 0 0

    Shift:        1 1 1 0 0    0 0 0 –
    Add:          0 0 0 1 1
    Set Q0:       1 1 1 1 1    0 0 0 0

    Shift:        1 1 1 1 0    0 0 0 –
    Add:          0 0 0 1 1
    Set Q0:       0 0 0 0 1    0 0 0 1

    Shift:        0 0 0 1 0    0 0 1 –
    Subtract:     0 0 0 1 1
    Set Q0:       1 1 1 1 1    0 0 1 0

    Correction:   1 1 1 1 1
    Add:          0 0 0 1 1
                  0 0 0 1 0

    Remainder = 00010 = 2      Quotient = 0010 = 2
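The non-restoring steps can be sketched in the same style as the restoring version. Register A is again modelled as a signed Python integer, so a left shift is written as multiplication by 2; the function name is chosen for this illustration.

```python
def nonrestoring_divide(dividend, divisor, n):
    """Unsigned n-bit non-restoring division; returns (quotient, remainder)."""
    mask = (1 << n) - 1
    a, q, m = 0, dividend & mask, divisor
    for _ in range(n):
        msb = (q >> (n - 1)) & 1
        q = (q << 1) & mask
        if a >= 0:
            a = 2 * a + msb - m    # shift left (A, Q), then subtract M
        else:
            a = 2 * a + msb + m    # shift left (A, Q), then add M
        if a >= 0:
            q |= 1                 # Q0 = 1 when A is not negative
    if a < 0:
        a += m                     # final corrective step
    return q, a


print(nonrestoring_divide(8, 3, 4))
```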

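Data Path for Non-Restoring Division

[Figure: data path with the (n+1)-bit register A (sign bit $A_n$) and the n-bit register Q, which receives the next quotient bit in Q0; the n-bit register M; an ADD / SUBTRACT unit operating on A and M; a control unit that generates the Add / Subtract signal.]

• Q and M are n-bit registers.
• A is an (n+1)-bit register.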

High Speed Dividers
• Some of the methods used to increase the speed of multiplication can also be modified to speed up division.
  – High-speed addition and subtraction.
  – High-speed shifting.
  – Combinational array divider (implementing restoring division).
• The main difficulty is that it is very difficult to implement division in a pipeline to improve the performance.
  – Unlike multiplication, where carry-save Wallace tree multipliers can be used for pipeline implementation.

END OF LECTURE 37