You are on page 1of 32

Chapter 4

Arithmetic for Computers

Arithmetic

• Where we've been:


– Performance (seconds, cycles, instructions)
• What's up ahead:
– Implementing the Architecture
operation

a
32 ALU
result
32
b
32

2
Constructing an ALU (Arithmetic Logic Unit)

• ALU is a device that performs the arithmetic


operations like addition and subtraction, or logical
operations like AND and OR.
• An ALU can be constructed from four hardware
building blocks: AND gate, OR gate, Inverter, and
Multiplexor. (What is the truth table of each?)
• The Multiplexor: If S == 0 then C = A else C = B
Selects one of the inputs to be the output, based on
a control input. S

A 0
C
B 1

ALU (Cont.)

• Let's build an ALU to support the andi and


ori instructions
– We'll just build a 1 bit ALU, and use 32 of
them because MIPS word is 32 bits wide

op a b res
operation

a
result
b

4
The 1-bit Logical Unit for AND and OR

Operation

Result

1
b

Different Implementations
• Not easy to decide the “best” way to build something
– Don't want too many inputs to a single gate
– Don’t want to have to go through too many gates
– for our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
C a rr y In
• A full adder or a (3,2) adder
has three inputs (a, b, & Cin)
a
and two outputs (Sum & Cout).
Sum
• A half adder or a (2,2) adder
b
has only 2 inputs (a & b) and 2
outputs (Sum & Cout).
C a rry O u t

• How could we build a 1-bit ALU for Add, AND, and OR?
• How could we build a 32-bit ALU?

6
Input and Output Specification for 1-bit Adder

Inputs Outputs
a b CarryIn CarryOut Sum
0 0 0 0 0
0 0 1 0 1
0 1 0 0 1
0 1 1 1 0
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1

Values of the Inputs when CarryOut is a 1

Inputs
a b CarryIn
0 1 1
1 0 1
1 1 0
1 1 1

• CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b) + (a . b . CarryIn)


• If a . b . CarryIn is true, then all of the other terms must be true,
so we leave out the last term.
• cout = a b + a cin + b cin

8
Adder Hardware for the Carry Out Signal

CarryIn

CarryOut

• The above Hardware was constructed from the following


equation: CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b)

The Sum bit

• The Sum bit is set to 1 when exactly one input is 1


or when all three inputs are 1.
(check Truth Table on slide 7).
• Sum = (a . b’ . CarryIn’) + (a’ . b . CarryIn’) +
(a’ . b’ . CarryIn) + (a . b . CarryIn)
• Sum = a XOR b XOR cin
• Draw the logic circuit? (Left as exercise).

10
A 1-bit ALU that Performs AND, OR, & Addition

O p e r a t io n
C a rry In

a
0

1
R e s u lt

2
b

C a rry O u t

11

A 32 bit ALU Constructed from 32 1-bit ALUs.


C a rr y In O p e r a t io n

Ripple carry: CarryOut


of the less significant bit a0 C a rry In
R e s u lt 0
A LU 0
is connected to the b0
C a rry O u t

CarryIn of the more


significant bit. a1 C a rry In
R e s u lt 1
A LU 1
b1
C a rry O u t

a2 C a rry In
R e s u lt 2
A LU 2
b2
C a rry O u t

a31 C a rry In
R e s u lt 3 1
A LU 31
b31

12
What about subtraction (a – b) ?

• Two's complement approach: just negate b and add.


• How do we negate?
Binvert Operation
CarryIn
• A very clever solution:
a
0

1
Result

b 0 2

CarryOut

13

Tailoring the ALU to the MIPS

• Need to support the set-on-less-than instruction (slt)


– remember: slt is an arithmetic instruction
– produces a 1 if rs < rt and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, $t7)
– use subtraction: (a-b) = 0 implies a = b

14
Detecting Overflow

• No overflow when adding a positive and a negative


number
• No overflow when signs are the same for
subtraction
• Overflow occurs when the value affects the sign:
– overflow when adding two positives yields a
negative
– or, adding two negatives gives a positive
– or, subtract a negative from a positive and get a
negative
– or, subtract a positive from a negative and get a
positive
15

Supporting slt (A 1-bit ALU that performs AND, OR, …)

• In slt operation, the least B in v e r t O p e ra tio n

significant bit is set either to C a rr y In

0 or 1, and the rest of the bits


(31 bits) are set to 0. a
0
• 1 means negative and 0
means positive.
1

• So, we need only to connect


the sign bit from the adder R e s u lt

output to the least significant b 0 2

bit to get set on less than.


1

• The result output from the


most significant ALU bit for L e s s 3

the slt operation is not the


output of the adder. So, the
ALU output for slt is input a . C a rry O u t

value Less
Supporting slt (A 1-bit ALU for the most significant bit)

• We need a new 1-bit ALU for B in v e r t O p e r a tio n

C a r ry In
the most significant bit that has
an extra output bit: the adder
a
output. 0

• This figure shows the design, 1

with this new adder output line


called Set, and used only for slt. R e s u lt

b 0 2
• As long as we need a special
1
ALU for the most significant
bit, we added the overflow Less 3

detection logic since it is also Set

associated with that bit.


O v e r flo w
O v e rflo w
d e t ec t i o n

b.

17

A 32-bit ALU constructed from 31 copies of 1-bit ALU)


B in v e r t C a rry In O p e r a t io n

a0 C a rry In
b0 ALU0 R e s u lt0
Le ss
C a rry O u t

a1 C a rry In
b1 ALU1 R e s u lt1
0 Le ss
C a rry O u t

a2 C a rry In
b2 ALU2 R e s u lt2
0 Le ss
C a rry O u t

C a r ry In

a31 C a rry In R e s u lt3 1


b31 ALU 31 S et
0 Le ss O v e r f lo w

18
Test for equality [ (a – b = 0) Æ a = b ]

Bnegate Operation
• The simplest way is to OR
all the outputs together and
a0 CarryIn
then send that signal b0 ALU0
Result0
Less
through an inverter as CarryOut

shown in the figure.


a1 CarryIn Result1
b1 ALU1
0 Less
CarryOut Zero

•Note: zero is a 1 when the result is zero! a2 CarryIn Result2


b2 ALU2
• When ALU do subtraction, set 0 Less
CarryOut
both CarryIn and Binvert to 1. For
Adds or logical operations, we
want both control lines to be 0.
Result31
So, for simplicity combining the a31
b31
CarryIn
ALU31 Set
CarryIn and Binvert to a single 0 Less Overflow

control line called Bnegate

19

Control Lines

• We can think of the combination of the 1-bit Bnegate line and


the 2-bit Operation lines as 3-bit control lines for the ALU,
telling it to perform add, subtract, AND, OR, or set on less
than.

• Notice control lines:

000 = and
001 = or
010 = add
110 = subtract
111 = slt

20
The Symbol Commonly Used to Represent an ALU

ALU operation

a
ALU Zero
result

b Overflow

CarryOut

21

Conclusion

• We can build an ALU to support the MIPS instruction set


– key idea: use multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– all of the gates are always working
– the speed of a gate is affected by the number of inputs to the gate
– the speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)
• Our primary focus:
– Clever changes to organization can improve performance
(similar to using better algorithms in software)
– we’ll look at two examples for addition and multiplication

22
Problem: Ripple Carry Adder is Slow

• Is a 32-bit ALU as fast as a 1-bit ALU?


• Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products

Can you see the ripple? How could you get rid of it?

c1 = b0c0 + a0c0 + a0b0


c2 = b1c1 + a1c1 + a1b1 c2 =
c3 = b2c2 + a2c2 + a2b2 c3 =
c4 = b3c3 + a3c3 + a3b3 c4 =

Not feasible! Why?

23

Carry Lookahead Adder (First Level of Abstraction)

• An approach in-between our two extremes


• Ci+1 = (bi . ci) + (ai . ci) + (ai .bi)
= (ai . bi) + (ai + bi) . ci
• Example: c2=(a1.b1)+(a1+b1).((a0.b0)+(a0+b0).c0)
• Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? gi = ai bi
– When would we propagate the carry? pi = ai + bi
• Using generate and propagate to define ci+1 we get:
ci+1 = gi + pi . ci
• Suppose gi equals 1. Then
ci+1 = gi + pi . ci = 1 + pi . ci = 1
That is, the adder generates a CarryOut (ci+1) independent of
the value of CarryIn (ci)
24
Carry Lookahead Adder (Continue)

• Suppose that gi is 0 and pi is 1. Then


ci+1 = gi + pi . ci = 0 + 1 . ci = ci
That is, the adder propagates CarryIn to a CarryOut.
• So, CarryIni+1 = 1 if either gi is 1 or both pi= 1 and CarryIni= 1.
• Example: We express the CarryIn signals more economically.
Let’s show it for 4 bits:
c1= (g0+(p0.c0)
c2= g1+(p1.g0)+(p1.p0.c0)
c3= g2+(p2.g1)+(p2.p1.g0)+(p2.p1.p0.c0)
c4= g3+(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)+(p3.p2.p1.p0.c0)

• Even this simplified form leads to large equations. For example,


consider a 16-bit adder.

25

Carry Lookahead Adder (Second Level of Abstraction)

• To perform carry lookahead adder for 4-bit adders, we


need propagate and generate signals at higher level.
• Example: For four 4-bit adder blocks:
P0= p3.p2.p1.p0
P1= p7.p6.p5.p4
P2= p11.p10.p9.p8
P3= p15.p14.p13.p12

That is, “super” propagate signal for the 4-bit


abstraction (Pi) is true only if each of the bits in the
group will propagate a carry.

26
Carry Lookahead Adder (Continue)

• For the “super” generate signal (Gi), we care only if


there is a carry out of the most significant bit of the
4-bit group. It also occurs if an earlier generate is
true and all the intermediate propagates, including
that of the most significant bit, are also true:

G0= g3+(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)
G1= g7+(p7.g6)+(p7.p6.g5)+(p7.p6.p5.g4)
G2= g11+(p11.g10)+(p11.p10.g9)+(p11.p10.p9.g8)
G3= g15+(p15.g14)+(p15.p14.g13)+(p15.p14.p13.g12)

27

Carry Lookahead Adder (Continue)

• The carry in for each 4-bit group of the 16-bit adder


are:

C1= G0 + (P0.c0)
C2= G1 + (P1.G0) + (P1.P0.c0)
C3= G2 + (P2.G1) + (P2.P1.G0) + (P2.P1.P0.c0)
C4= G3 + (P3.G2) + (P3.P2.G1) + (P3.P2.P1.G0) + (P3.P2.P1.P0.c0)

28
Four 4-bit ALUs using Carry Lookahead to form
a 16-bit adder
C a r r y I n

a 0 C a r r y I n
b 0 R e s u lt 0 - - 3
a 1
b 1 A L U 0
a 2 p i
b 2 P 0
G 0 g i
a 3
b 3 C a r r y - l o o k a h e a d u n it
C 1
c i + 1

a 4 C a r r y I n
b 4 R e s u lt 4 - - 7
a 5
b 5 A L U 1
a 6 P 1 p i + 1
b 6 G 1 g i + 1
a 7
b 7
C 2
c i + 2

a 8 C a r r y I n
b 8 R e s u lt 8 - - 1 1
a 9
b 9 A L U 2
a 1 0 P 2 p i + 2
b 1 0 G 2 g i + 2
a 1 1
b 1 1
C 3
c i + 3

a 1 2 C a r r y I n
b 1 2 R e s u lt 1 2 - - 1 5
a 1 3
b 1 3 A L U 3
a 1 4 P 3 p i + 3
b 1 4 G 3 g i + 3
a 1 5 C 4
b 1 5 c i + 4

C a r r y O u t

29

Example: Both Levels of the Propagate and Generate

• Determine the gi, pi, Pi, and Gi values of these two 16-
bit numbers:
a: 0001 1010 0011 0011two
b: 1110 0101 1110 1011two
• Also, what is CarryOut 15 (C4)?

• Answer: Aligning the bits makes it easy to see the


values of generate gi (ai . bi) and propagate pi (ai + bi):
a: 0001 1010 0011 0011
b: 1110 0101 1110 1011
gi: 0000 0000 0010 0011
pi: 1111 1111 1111 1011
where the bits are numbered 15 to 0 from left to right.

30
Answer (Continue)
• Next, the “super” propagates (P3, P2, P1, P0) are simply the AND
of the lower-level propagates:
P3 = 1 . 1 . 1. 1 = 1
P2 = 1 . 1 . 1. 1 = 1
P1 = 1 . 1 . 1. 1 = 1
P0 = 1 . 0 . 1. 1 = 0
• The “super” generates are more complex, so use the following
equations:
G0 = g3 +(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)
= 0 +(1.0)+(1.0.1)+(1.0.1.1)= 0
G1 = g7 +(p7.g6)+(p7.p6.g5)+(p7.p6.p5.g4)
= 0 +(1.0)+(1.1.1)+(1.1.1.0)= 1
G2 = g11 +(p11.g10)+(p11.p10.g9)+(p11.p10.p9.g8)
= 0 +(1.0)+(1.1.0)+(1.1.1.0)= 0
G3 = g15 +(p15.g14)+(p15.p14.g13)+(p15.p14.p13.g12)
= 0 +(1.0)+(1.1.0)+(1.1.1.0)= 0

31

Answer (Continue)

• Finally, CarryOut 15 is
C4 = G3 +(P3.G2)+(P3.P2.G1)+(P3.P2.P1.G0)
+(P3.P2.P1.P0.c0)
= 0+(1.0)+(1.1.1)+(1.1.1.0)+(1.1.1.0.0)=1

Hence there is a carry out when adding these two 16-bit


numbers.

32
Example: Speed of Ripple Carry versus Carry Lookahead

• Assume each AND or OR gate takes the same time


for a signal to pass through it.
• Time is estimated by simply counting the number
of gates along the longest path through a piece of
logic.
• Compare the number of gate delays for the critical
paths of two 16-bit adders, one using ripple carry
and one using two-level carry lookahead.

33

Answer
CarryIn

CarryOut

• The above figure shows that the carry out signal takes
two gates delays per bit.
• Then the number of gate delays between a carry in to
the least significant bit and the carry out to the most
significant is 16 x 2 = 32.

34
Answer (Continue)

• For carry lookahead, the carry out of the most significant


bit is just C4, defined in the previous example (Slide 32).
• It takes two level of logic to specify C4 in terms of Pi and
Gi (the OR of several AND terms).
• Pi is specified in one level of logic (AND) using pi.
• Gi is specified in two levels using pi and gi.
• pi and gi are each one level of logic, defined in terms of ai
and bi.
• If we assume one gate delay for each level of logic in these
equations, the worst case is 2 + 2 + 1 = 5 gate delays.
• Hence for 16-bit addition a carry-lookahead adder is six
times faster, using this simple estimate of hardware speed.

35

Multiplication

• More complicated than addition


– accomplished via shifting and addition
• More time and more area
• Let's look at 3 versions based on grade school algorithm

1000 (multiplicand)
__x_1001 (multiplier)
1000
0000
0000
1000
1001000 (Product)
• If we ignore the sign bits, the length of the multiplication of
an n-bit multiplicand and an m-bit multiplier is a product that
is n + m bits long.
• That is, n + m bits are required to represent all possible
products.
36
Multiplication

• Each step of multiplication is simple:


1. Just place a copy of the multiplicand
(1 x multiplicand) in the proper place if the
multiplier digit is 1, or
2. Place 0 (0 x multiplicand) in the proper place if the
digit is 0.
• Negative numbers: convert and multiply
– there are better techniques, we won’t look at
them.

37

First Version of the Multiplication Hardware

Multiplicand
Shift left
64 bits

Multiplier
64-bit ALU Shift right
32 bits

Product
Control test
Write
64 bits

38
First Version of the Multiplication Hardware

• The Multiplicand register, ALU, and Product register are all 64


bits wide, with only the Multiplier register containing 32 bits.
• We will need to move the multiplicand left one digit each step
as it may be added to the intermediate products. So, over 32
steps a 32-bit multiplicand would move 32 bits to the left.
Hence we need a 64-bit Multiplicand register, initialized with
the 32-bit multiplication in the right half and 0 in the left half.
• The multiplier is shifted in the opposite direction at each step.
• The Product register initialized to 0.
• Control decides when to shift the Multiplicand and Multiplier
registers and when to write new values into the Product
register.

39

The First Multiplication Algorithm Using Hardware


in Slide 38
S ta rt

M u lt ip li e r 0 = 1 1 . T e s t M u l t ip li e r 0 = 0
M u l t ip li e r 0

1 a . A d d m u lt ip li c a n d t o p r o d u c t a n d
p la c e th e r e s u lt in P r o d u c t r e g is t e r

2 . S h ift th e M u l tip lic a n d r e g i s te r le ft 1 b it

3 . S h ift th e M u l t ip li e r r e g is t e r r i g h t 1 b it

N o : < 3 2 r e p e t i t io n s
3 2 n d r e p e t iti o n ?

Y e s : 3 2 r e p e titio n s

D o n e
40
The First Multiplication Algorithm

• If the least significant bit of the multiplier is 1, add


the multiplicand to the product.
• If not, go to the next step.
• Shift the multiplicand left and the multiplier right in
the next two steps.
• These three steps are repeated 32 times.

41

Example: First Multiply Algorithm

• Using 4-bit numbers to save space, multiply 0010 x 0011.


Iteration Step Multiplier Multiplicand Product
0 Initial Values 0011 0000 0010 0000 0000
1a: 1 Æ Prod =Prod + Mcand 0011 0000 0010 0000 0010
1 2: Shift left Multiplicand 0011 0000 0100 0000 0010
3: Shift right Multiplier 0001 0000 0100 0000 0010
1a: 1 Æ Prod =Prod + Mcand 0001 0000 0100 0000 0110
2 2: Shift left Multiplicand 0001 0000 1000 0000 0110
3: Shift right Multiplier 0000 0000 1000 0000 0110
1: 0 Æ no operation 0000 0000 1000 0000 0110
3 2: Shift left Multiplicand 0000 0001 0000 0000 0110
3: Shift right Multiplier 0000 0001 0000 0000 0110
1: 0 Æ no operation 0000 0001 0000 0000 0110
4 2: Shift left Multiplicand 0000 0010 0000 0000 0110
3: Shift right Multiplier 0000 0010 0000 0000 0110
42
Example: First Multiply Algorithm

• The table (slide 42) shows the value of each


register for each of the steps labeled according to
the algorithm (slide 40), with the final value of
0000 0110 (6 in decimal).
• Color is used to indicate the register values that
change on that step.
• The underline bit is the one examined to determine
the operation of the next step.
• If each step took a clock cycle, this algorithm
would require 32 x 3 steps = 96 clock cycles to
multiply two 32-bit numbers.

43

Second Version of the Multiplication Algorithm and Hardware

• Half of the bits of the multiplicand in the first algorithm


were always 0, so only half could contain useful bit values.
• A full 64-bit ALU seemed wasteful and slow since half of
the adder bits were adding 0 to the intermediate sum.
• The first algorithm shifts the multiplicand left with 0s
inserted in the new positions, so the multiplicand cannot
affect the least significant bits of the product.
• Instead of shifting the multiplicand left, what if we shift the
product right?
• Now the multiplicand would be fixed relative to the product,
and since we are adding only 32 bits, the adder need be
only 32 bits wide.

44
Second Version of the Multiplication Hardware

Multiplicand

32 bits

Multiplier
32-bit ALU Shift right
32 bits

Shift right
Product Control test
Write
64 bits

45

Second Version of the Multiplication Hardware

• Compare with the first version in slide 38.

• The Multiplicand register, ALU, and


Multiplier register are all 32 bits wide with
only the Product register left at 64 bits.

• Now the product is shifted right.

46
Second Multiplication Algorithm Using the Hardware in
Slide 45
S ta rt

M u l t ip li e r 0 = 1 1 . T e s t M u l t ip li e r 0 = 0
M u lt ip li e r 0

1 a . A d d m u lt ip l ic a n d t o t h e le ft h a lf o f
t h e p r o d u c t a n d p la c e th e r e s u lt in
th e le f t h a lf o f t h e P r o d u c t r e g i s t e r

2 . S h if t t h e P r o d u c t r e g is t e r r ig h t 1 b it

3 . S h if t t h e M u lt i p l ie r r e g is t e r r i g h t 1 b it

N o : < 3 2 r e p e t i t io n s
3 2 n d r e p e titio n ?

Y e s : 3 2 r e p e t i tio n s

D o n e
47

Second Multiplication Algorithm

• This algorithm starts with the 32-bit


Multiplicand and 32-bit Multiplier registers
set to their named values and the 64-bit
Product register set to 0.

• The Product register is shifted right instead


of shifting the multiplicand.

• This algorithm only forms a 32-bit sum, so


only the left half of the 64-bit Product
register is changed by the addition.
48
Example: Second Multiply Algorithm

• Multiply 0010 x 0011 using the algorithm in slide 47.


Iteration Step Multiplier Multiplicand Product
0 Initial Values 0011 0010 0000 0000
1a: 1 Æ Prod =Prod + Mcand 0011 0010 0010 0000
1 2: Shift right Product 0011 0010 0001 0000
3: Shift right Multiplier 0001 0010 0001 0000
1a: 1 Æ Prod =Prod + Mcand 0001 0010 0011 0000
2 2: Shift right Product 0001 0010 0001 1000
3: Shift right Multiplier 0000 0010 0001 1000
1: 0 Æ no operation 0000 0010 0001 1000
3 2: Shift right Product 0000 0010 0000 1100
3: Shift right Multiplier 0000 0010 0000 1100
1: 0 Æ no operation 0000 0010 0000 1100
4 2: Shift right Product 0000 0010 0000 0110
3: Shift right Multiplier 0000 0010 0000 0110
49

Final Version of the Multiplication Algorithm and Hardware

• The final observation was that the Product


register had wasted space that matched
exactly the size of the multiplier.

• The third version of the multiplication


algorithm combines the right-most half of the
product with the multiplier.

• The least significant bit of the 64-bit Product


register (Product0) now is the bit to be tested.

50
Third Version of the Multiplication Hardware

Multiplicand

32 bits

32-bit ALU

Shift right Control


Product
Write test
64 bits

51

Third Version of the Multiplication Hardware

• Comparing with the second version in slide 45.

• The separate Multiplier register has


disappeared.

• The multiplier is placed instead in the right half


of the Product register.

52
The Third Multiplication Algorithm
S ta rt

P ro d u c t0 = 1 1 . T e s t P ro d u c t0 = 0
P ro d u c t0

1 a . A d d m u lt i p l ic a n d t o t h e le f t h a lf o f
th e p r o d u c t a n d p la c e th e r e s u lt in
t h e le f t h a lf o f t h e P r o d u c t r e g i s t e r

2 . S h if t t h e P r o d u c t r e g is t e r r ig h t 1 b it

N o : < 3 2 r e p e ti t io n s
3 2 n d r e p e t it io n ?

Y e s : 3 2 r e p e t it io n s

D o n e

53

The Third Multiplication Algorithm

• The algorithm starts by assigning the


multiplier to the right half of the Product
register, placing 0 in the upper half.

• It needs only two steps because the


Product and Multiplier registers have been
combined.

54
Example: Third Multiply Algorithm

• Multiply 0010 x 0011 using the algorithm in slide 53.

Iteration Step Multiplicand Product


0 Initial Values 0010 0000 0011
1a: 1 Æ Prod =Prod + Mcand 0010 0010 0011
1 2: Shift right Product 0010 0001 0001
1a: 1 Æ Prod =Prod + Mcand 0010 0011 0001
2 2: Shift right Product 0010 0001 1000
1: 0 Æ no operation 0010 0001 1000
3 2: Shift right Product 0010 0000 1100
1: 0 Æ no operation 0010 0000 1100
4 2: Shift right Product 0010 0000 0110

55

Signed Multiplication

• So far we have dealt with positive numbers.


• Signed numbers: First convert the Multiplier and
Multiplicand to positive numbers and then
remember the original signs.
• The algorithm runs for 31 iterations, leaving the
signs out of the calculations.
• We need to negate the product only if the original
signs disagree.

56
Booth’s Algorithm

• Booth’s algorithm changes the first step of the third


multiplication algorithm in slide 53 – looking at 1 bit
of the multiplier and then deciding whether to add
the multiplicand – to looking at 2 bits of the
multiplier.
• The new first step has four cases, depending on the
values of the 2 bits (check next slide).
• Assume that the pair of bits examined consists of
the current bit and the bit to the right – which has the
current bit in the previous step.
• The second step is to shift the product right.

57

Booth’s Algorithm

• The value of these 2 bits:

Current Bit to the Explanation Example


bit right

1 0 Beginning of a run of 1s 00001111000

1 1 Middle of a run of 1s 00001111000

0 1 End of a run of 1s 00001111000

0 0 Middle of a run of 0s 00001111000

58
Booth’s Algorithm

• Booth’s Algorithm steps:


1. Depending on the current and previous bits, do one of
the following:

00: Middle of a string of 0s, so no arithmetic operation.


01: End of a string of 1s, so add the multiplicand to the
left half of the product.
10: Beginning of a string of 1s, so subtract the
multiplicand from the left half of the product.
11: Middle of a string of 1s, so no arithmetic operation.

2. As in the previous algorithm (version three), shift the


Product register right 1 bit.

59

Comparing Third Algorithm and Booth’s Algorithm


for Positive Numbers
Iter- Multi- Original Algorithm Booth’s Algorithm
ation licand
Step Product Step Product
0 0010 Initial values 0000 0110 Initial values 0000 0110 0
0010 1: 0 Æ no operation 0000 0110 1a: 00Æno operation 0000 0110 0
1
0010 2: Shift right Product 0000 0011 2: Shift right Product 0000 0011 0
0010 1a: 1Æ 0010 0011 1c: 10Æ 1110 0011 0
2 Prod = Prod + Mcand Prod = Prod – Mcand
0010 2: Shift right Product 0001 0001 2: Shift right Product 1111 0001 1
0010 1a: 1Æ 0011 0001 1d: 11Æno operation 1111 0001 1
3 Prod = Prod + Mcand
0010 2: Shift right Product 0001 1000 2: Shift right Product 1111 1000 1
0010 1: 0 Æ no operation 0001 1000 1b: 01Æ 0001 1000 1
4 Prod = Prod + Mcand
0010 2: Shift right Product 0000 1100 2: Shift right Product 0000 1100 0

60
Comparing Example (Continue)

• Booth’s algorithm starts with a 0 to the right of the rightmost


bit for the first stage.
• Booth’s operation is identified according to the values in the
2 bits.
• By the fourth step, the two algorithms have the same values
in the Product register.
• The one other requirement is that shifting the product right
must preserve the sign of the intermediate result, since we
are dealing with signed numbers.
• The solution is to extend the sign when the product is shifted
to the right.
• Example: Step 2 of the second iteration turns 1110 0011 0
into 1111 0001 1 instead of 0111 0001 1.
• This shift is called an arithmetic right shift to differentiate it
from a logical right shift.

61

Example: Booth’s Algorithm

• Let’s try Booth’s algorithm with negative numbers: 2 x –3 = -6


or 0010 x 1101 = 1111 1010 (in binary).

Iteration Step Multiplicand Product


0 Initial values 0010 0000 1101 0
1c: 10 Æ Prod = Prod – Mcand 0010 1110 1101 0
1 2: Shift right Product 0010 1111 0110 1
1b: 01 Æ Prod = Prod + Mcand 0010 0001 0110 1
2 2: Shift right Product 0010 0000 1011 0
1c: 10 Æ Prod = Prod – Mcand 0010 1110 1011 0
3 2: Shift right Product 0010 1111 0101 1
1d: 11 Æ no operation 0010 1111 0101 1
4 2: Shift right Product 0010 1111 1010 1

62
Multiply in MIPS

• MIPS provides a separate pair of 32-bit


registers to contain the 64-bit product,
called Hi and Lo.
• To produce a properly signed or unsigned
product, MIPS has two instructions:
multiply (mult) and multiply unsigned
(multu).

63

Chapter Four Summary

• Constructing an ALU to do the following


arithmetic and logical operations:

AND, OR, addition, subtraction,


less than, and equality.

• Ripple Adder versus Carry-Lookahead.


• Three versions of Multiplications.
• Signed Multiplication – Booth’s Algorithm.
• We are ready to move on (and implement
the processor).

64

You might also like