Chapter 3 OnlyFor Q39 and ProblemNo 9

Chapter 4
Arithmetic for Computers
Arithmetic
• Where we've been:

– Performance (seconds, cycles, instructions)
• What's up ahead:
– Implementing the Architecture
[Figure: a 32-bit ALU with 32-bit inputs a and b, an operation control input, and a 32-bit result output]
2
Constructing an ALU (Arithmetic Logic Unit)
• An ALU is a device that performs arithmetic operations such as addition and subtraction, and logical operations such as AND and OR.
• An ALU can be constructed from four hardware
building blocks: AND gate, OR gate, Inverter, and
Multiplexor. (What is the truth table of each?)
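• The Multiplexor: If S == 0 then C = A else C = B
Selects one of the inputs to be the output, based on a control input.
[Figure: multiplexor with data inputs A (selected when S = 0) and B (selected when S = 1), control input S, and output C]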
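The multiplexor's behavior can be sketched in a few lines of Python. This is a minimal illustration that assumes single-bit integer inputs; the function name `mux2` is hypothetical and does not come from the slides. It uses the gate-level form C = (S' . A) + (S . B).

```python
def mux2(s: int, a: int, b: int) -> int:
    """2-to-1 multiplexor: return a when s == 0, otherwise b (all inputs are 0 or 1)."""
    not_s = 1 - s                    # inverter on the select line
    return (not_s & a) | (s & b)     # two AND gates feeding one OR gate


# Exhaustive check against the rule "if S == 0 then C = A else C = B".
for s in (0, 1):
    for a in (0, 1):
        for b in (0, 1):
            assert mux2(s, a, b) == (a if s == 0 else b)
```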
ALU (Cont.)
• Let's build an ALU to support the andi and ori instructions
– We'll just build a 1-bit ALU, and use 32 of them because a MIPS word is 32 bits wide
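[Figure: 1-bit ALU block with inputs a and b, an operation select, and a result output, next to an empty table with columns op, a, b, res]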
4
The 1-bit Logical Unit for AND and OR
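[Figure: inputs a and b feed an AND gate and an OR gate; a multiplexor controlled by Operation selects one of them as Result]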
Different Implementations
• Not easy to decide the “best” way to build something
– Don't want too many inputs to a single gate
– Don’t want to have to go through too many gates
– for our purposes, ease of comprehension is important
• Let's look at a 1-bit ALU for addition:
[Figure: 1-bit adder with inputs a, b, and CarryIn, and outputs Sum and CarryOut]
• A full adder, or (3,2) adder, has three inputs (a, b, and Cin) and two outputs (Sum and Cout).
• A half adder, or (2,2) adder, has only two inputs (a and b) and two outputs (Sum and Cout).
• How could we build a 1-bit ALU for Add, AND, and OR?
• How could we build a 32-bit ALU?
6
Input and Output Specification for 1-bit Adder
Inputs              Outputs
a  b  CarryIn       CarryOut  Sum
0  0  0             0         0
0  0  1             0         1
0  1  0             0         1
0  1  1             1         0
1  0  0             0         1
1  0  1             1         0
1  1  0             1         0
1  1  1             1         1
Values of the Inputs when CarryOut is a 1
Inputs
a b CarryIn
0 1 1
1 0 1
1 1 0
1 1 1
• CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b) + (a . b . CarryIn)
• If a . b . CarryIn is true, then each of the other three terms is also true, so the last term is redundant and we leave it out.
• CarryOut = (a . b) + (a . CarryIn) + (b . CarryIn)
8
Adder Hardware for the Carry Out Signal
[Figure: three AND gates (one per pair of a, b, and CarryIn) feeding an OR gate that produces CarryOut]
• The hardware in the figure was constructed from the following equation: CarryOut = (b . CarryIn) + (a . CarryIn) + (a . b)
The Sum bit
• The Sum bit is set to 1 when exactly one input is 1 or when all three inputs are 1 (check the truth table on slide 7).
• Sum = (a . b’ . CarryIn’) + (a’ . b . CarryIn’) + (a’ . b’ . CarryIn) + (a . b . CarryIn)
• Sum = a XOR b XOR CarryIn
• Drawing the logic circuit is left as an exercise.
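The CarryOut and Sum equations can be checked against the truth table with a short Python sketch. The function names `full_adder` and `sum_of_products` are illustrative, not from the slides, and the inputs are assumed to be single bits (0 or 1).

```python
def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """Return (carry_out, sum_bit) for three single-bit inputs."""
    carry_out = (b & carry_in) | (a & carry_in) | (a & b)
    sum_bit = a ^ b ^ carry_in
    return carry_out, sum_bit


def sum_of_products(a: int, b: int, carry_in: int) -> int:
    """Sum bit written as the four-term sum of products from the slide."""
    na, nb, nc = 1 - a, 1 - b, 1 - carry_in   # inverted inputs
    return (a & nb & nc) | (na & b & nc) | (na & nb & carry_in) | (a & b & carry_in)


# Check all eight rows: the pair (carry_out, sum_bit) is a + b + carry_in in binary.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            cout, s = full_adder(a, b, cin)
            assert (cout, s) == divmod(a + b + cin, 2)
            assert s == sum_of_products(a, b, cin)
```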
10
A 1-bit ALU that Performs AND, OR, & Addition
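[Figure: 1-bit ALU. Inputs a and b feed an AND gate (multiplexor input 0), an OR gate (input 1), and an adder with CarryIn and CarryOut (input 2); Operation selects which one drives Result]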
11
A 32 bit ALU Constructed from 32 1-bit ALUs.

• Ripple carry: the CarryOut of the less significant bit is connected to the CarryIn of the more significant bit.
[Figure: ALU0 through ALU31 in a chain. Each ALUi takes ai, bi, and a CarryIn, and produces Resulti and a CarryOut; the Operation lines go to every ALU, and CarryIn enters at ALU0]
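A minimal Python sketch of ripple carry, assuming unsigned inputs that fit in `width` bits; the function name `ripple_add` and its parameters are illustrative. Each loop iteration plays the role of one 1-bit adder, and the carry variable is the wire between neighboring bits.

```python
def ripple_add(a: int, b: int, carry_in: int = 0, width: int = 32) -> tuple[int, int]:
    """Add two width-bit values one bit at a time; return (result, carry_out)."""
    result = 0
    carry = carry_in
    for i in range(width):                 # bit 0 is the least significant bit
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        sum_bit = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)   # feeds the next bit
        result |= sum_bit << i
    return result, carry


print(ripple_add(5, 7))                    # (12, 0)
print(ripple_add(0xFFFFFFFF, 1))           # (0, 1): the carry ripples through all 32 bits
```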
12
What about subtraction (a – b) ?
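• Two's complement approach: just negate b and add.
• How do we negate?
• A very clever solution: invert every bit of b (selected by the Binvert control) and set CarryIn to 1, since a – b = a + b’ + 1.
[Figure: the 1-bit ALU with an added Binvert control that selects between b and its inverse before the AND gate, OR gate, and adder; Operation selects Result from inputs 0, 1, and 2]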
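A short Python sketch of subtraction through the adder, assuming 32-bit values held as unsigned bit patterns. The name `add_or_subtract` is illustrative; `binvert` also supplies the CarryIn of the least significant bit, which is the arrangement the slides later call Bnegate.

```python
MASK = 0xFFFFFFFF  # keep values to 32 bits


def add_or_subtract(a: int, b: int, binvert: int) -> int:
    """Return a + b when binvert is 0, or a - b when binvert is 1 (32-bit wraparound)."""
    b_in = (~b & MASK) if binvert else (b & MASK)   # optionally invert every bit of b
    total = (a & MASK) + b_in + binvert             # CarryIn = binvert supplies the "+ 1"
    return total & MASK


print(add_or_subtract(7, 5, 1))        # 2
print(hex(add_or_subtract(5, 7, 1)))   # 0xfffffffe, the two's complement pattern for -2
```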
13
Tailoring the ALU to the MIPS
• Need to support the set-on-less-than instruction (slt)

– remember: slt is an arithmetic instruction
– produces a 1 if rs < rt and 0 otherwise
– use subtraction: (a-b) < 0 implies a < b
• Need to support test for equality (beq $t5, $t6, Label)
– use subtraction: (a-b) = 0 implies a = b
14
Detecting Overflow
• No overflow when adding a positive and a negative number
• No overflow when signs are the same for subtraction
• Overflow occurs when the value affects the sign:
– overflow when adding two positives yields a negative
– or, adding two negatives gives a positive
– or, subtract a negative from a positive and get a negative
– or, subtract a positive from a negative and get a positive
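These sign rules can be sketched in Python. The sketch assumes operands are given as unsigned bit patterns of `width` bits; the function names are illustrative. Addition overflows when the operands share a sign that the result does not; subtraction overflows when the operands differ in sign and the result's sign differs from the first operand's.

```python
def add_overflows(a: int, b: int, width: int = 32) -> bool:
    """True if a + b overflows as a signed width-bit addition."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    s = (a + b) & mask
    # Same sign in a and b, different sign in the sum.
    return bool(~(a ^ b) & (a ^ s) & sign)


def sub_overflows(a: int, b: int, width: int = 32) -> bool:
    """True if a - b overflows as a signed width-bit subtraction."""
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    d = (a - b) & mask
    # Different signs in a and b, and the difference has a different sign than a.
    return bool((a ^ b) & (a ^ d) & sign)


# 4-bit examples: 7 + 1 overflows, 7 + (-1) does not, 7 - (-1) overflows.
print(add_overflows(0b0111, 0b0001, 4))   # True
print(add_overflows(0b0111, 0b1111, 4))   # False
print(sub_overflows(0b0111, 0b1111, 4))   # True
```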
15
Supporting slt (A 1-bit ALU that performs AND, OR, …)
• In the slt operation, the least significant bit is set to either 0 or 1, and the remaining 31 bits are set to 0.
• In the sign bit of the adder output, 1 means negative and 0 means positive.
• So, we need only connect the sign bit from the adder output to the least significant bit to get set on less than.
• For the slt operation, the Result output of the most significant ALU bit is not the output of the adder. So, the ALU output for slt is the input value Less.
[Figure a: 1-bit ALU with Binvert, Operation, and CarryIn controls; the multiplexor selects among the AND gate (0), the OR gate (1), the adder (2), and the Less input (3); outputs are Result and CarryOut]
Supporting slt (A 1-bit ALU for the most significant bit)
• We need a new 1-bit ALU for the most significant bit that has an extra output bit: the adder output.
• The figure shows the design, with this new adder output line called Set, which is used only for slt.
• Since we need a special ALU for the most significant bit anyway, we add the overflow detection logic to it, because overflow is also associated with that bit.
[Figure b: the 1-bit ALU for the most significant bit, with the same inputs as before (a, b, Less, Binvert, CarryIn, Operation) plus a Set output taken from the adder and an overflow detection block that produces Overflow]
17
A 32-bit ALU constructed from 31 copies of the 1-bit ALU and one special ALU for the most significant bit
[Figure: ALU0 through ALU31 connected by ripple carry, with Binvert, CarryIn, and Operation as control inputs. The Less input of ALU1 through ALU31 is 0; the Set output of ALU31 feeds the Less input of ALU0; ALU31 also produces Overflow]
18
Test for equality [ (a – b = 0) → a = b ]
• The simplest way is to OR all the outputs together and then send that signal through an inverter, as shown in the figure.
• Note: Zero is a 1 when the result is zero!
• When the ALU does subtraction, both CarryIn and Binvert are set to 1. For addition or logical operations, we want both control lines to be 0. So, for simplicity, CarryIn and Binvert are combined into a single control line called Bnegate.
[Figure: the 32-bit ALU with Bnegate and Operation as control inputs; Result0 through Result31 feed an OR gate followed by an inverter that produces Zero; ALU31 also provides Set and Overflow]
19
Control Lines
• We can think of the combination of the 1-bit Bnegate line and the 2-bit Operation lines as 3-bit control lines for the ALU, telling it to perform add, subtract, AND, OR, or set on less than.
• Note the control line values (Bnegate followed by Operation):
000 = and
001 = or
010 = add
110 = subtract
111 = slt
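The control encoding can be sketched as a behavioral model in Python. This is only an illustration under stated assumptions: the function name `alu` is hypothetical, operands are 32-bit unsigned bit patterns, and slt takes the sign bit of the adder output directly, so it ignores the overflow case just as the slides do.

```python
MASK = 0xFFFFFFFF  # 32-bit word


def alu(control: int, a: int, b: int) -> tuple[int, int]:
    """Behavioral 32-bit ALU. control = Bnegate (1 bit) followed by Operation (2 bits).

    Returns (result, zero), where zero is 1 when the result is all 0s.
    """
    bnegate = (control >> 2) & 1
    operation = control & 0b11
    a &= MASK
    b_in = (~b & MASK) if bnegate else (b & MASK)   # Binvert part of Bnegate
    adder = (a + b_in + bnegate) & MASK             # CarryIn part of Bnegate

    if operation == 0b00:
        result = a & b_in
    elif operation == 0b01:
        result = a | b_in
    elif operation == 0b10:
        result = adder
    else:
        result = (adder >> 31) & 1                  # Set: sign bit of a - b goes to bit 0

    zero = 1 if result == 0 else 0
    return result, zero


print(alu(0b010, 2, 3))   # (5, 0)  add
print(alu(0b110, 5, 5))   # (0, 1)  subtract; Zero = 1 signals equality for beq
print(alu(0b111, 3, 7))   # (1, 0)  slt: 3 < 7
```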
20
The Symbol Commonly Used to Represent an ALU
[Figure: ALU symbol with inputs a and b, the ALU operation control input, and outputs Zero, Result, Overflow, and CarryOut]
21
Conclusion
• We can build an ALU to support the MIPS instruction set

– key idea: use multiplexor to select the output we want
– we can efficiently perform subtraction using two’s complement
– we can replicate a 1-bit ALU to produce a 32-bit ALU
• Important points about hardware
– all of the gates are always working
– the speed of a gate is affected by the number of inputs to the gate
– the speed of a circuit is affected by the number of gates in series
(on the “critical path” or the “deepest level of logic”)
• Our primary focus:
– Clever changes to organization can improve performance
(similar to using better algorithms in software)
– we’ll look at two examples for addition and multiplication
22
Problem: Ripple Carry Adder is Slow
• Is a 32-bit ALU as fast as a 1-bit ALU?

• Is there more than one way to do addition?
– two extremes: ripple carry and sum-of-products
Can you see the ripple? How could you get rid of it?
c1 = b0c0 + a0c0 + a0b0

c2 = b1c1 + a1c1 + a1b1 c2 =
c3 = b2c2 + a2c2 + a2b2 c3 =
c4 = b3c3 + a3c3 + a3b3 c4 =
Not feasible! Why?
23
Carry Lookahead Adder (First Level of Abstraction)
• An approach in-between our two extremes

• ci+1 = (bi . ci) + (ai . ci) + (ai . bi)
       = (ai . bi) + (ai + bi) . ci
• Example: c2 = (a1 . b1) + (a1 + b1) . ((a0 . b0) + (a0 + b0) . c0)
• Motivation:
– If we didn't know the value of carry-in, what could we do?
– When would we always generate a carry? gi = ai bi
– When would we propagate the carry? pi = ai + bi
• Using generate and propagate to define ci+1 we get:
ci+1 = gi + pi . ci
• Suppose gi equals 1. Then
ci+1 = gi + pi . ci = 1 + pi . ci = 1
That is, the adder generates a CarryOut (ci+1) independent of
the value of CarryIn (ci)
24
Carry Lookahead Adder (Continue)
• Suppose that gi is 0 and pi is 1. Then

ci+1 = gi + pi . ci = 0 + 1 . ci = ci
That is, the adder propagates CarryIn to a CarryOut.
• So, CarryIn(i+1) is 1 if either gi is 1, or both pi is 1 and CarryIn(i) is 1.
• Example: We express the CarryIn signals more economically. Let’s show it for 4 bits:
c1 = g0+(p0.c0)
c2 = g1+(p1.g0)+(p1.p0.c0)
c3 = g2+(p2.g1)+(p2.p1.g0)+(p2.p1.p0.c0)
c4 = g3+(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)+(p3.p2.p1.p0.c0)
• Even this simplified form leads to large equations. For example, consider a 16-bit adder.
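The expanded carry equations can be sketched in Python. The function name `lookahead_carries` is illustrative; the inner loop builds each carry as the flat sum of products shown above (from the g and p signals and c0 only), rather than waiting for the previous carry.

```python
def lookahead_carries(a: int, b: int, c0: int, width: int = 4) -> list[int]:
    """Return [c0, c1, ..., c_width], each carry computed from g, p, and c0 only."""
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]      # generate:  gi = ai . bi
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]    # propagate: pi = ai + bi
    carries = [c0]
    for i in range(width):
        carry = g[i]                 # first term: gi
        chain = p[i]                 # running product pi . p(i-1) . ...
        for j in range(i - 1, -1, -1):
            carry |= chain & g[j]    # term pi ... p(j+1) . gj
            chain &= p[j]
        carry |= chain & c0          # last term: pi ... p0 . c0
        carries.append(carry)
    return carries


print(lookahead_carries(0b0011, 0b1011, 0))   # [0, 1, 1, 0, 0]

# The final carry agrees with ordinary addition for every 4-bit input pair.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            assert lookahead_carries(a, b, c0)[4] == (a + b + c0) >> 4
```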
25
Carry Lookahead Adder (Second Level of Abstraction)
• To apply carry lookahead across 4-bit adders, we need propagate and generate signals at a higher level.
• Example: For four 4-bit adder blocks:
P0 = p3.p2.p1.p0
P1 = p7.p6.p5.p4
P2 = p11.p10.p9.p8
P3 = p15.p14.p13.p12
That is, the “super” propagate signal for the 4-bit abstraction (Pi) is true only if each of the bits in the group will propagate a carry.
26
• For the “super” generate signal (Gi), we care only if there is a carry out of the most significant bit of the 4-bit group. This occurs if generate is true for that most significant bit. It also occurs if an earlier generate is true and all the intermediate propagates, including that of the most significant bit, are also true:
G0= g3+(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)
G1= g7+(p7.g6)+(p7.p6.g5)+(p7.p6.p5.g4)
G2= g11+(p11.g10)+(p11.p10.g9)+(p11.p10.p9.g8)
G3= g15+(p15.g14)+(p15.p14.g13)+(p15.p14.p13.g12)
27
• The carry-in signals for the 4-bit groups of the 16-bit adder are:
C1= G0 + (P0.c0)
C2= G1 + (P1.G0) + (P1.P0.c0)
C3= G2 + (P2.G1) + (P2.P1.G0) + (P2.P1.P0.c0)
C4= G3 + (P3.G2) + (P3.P2.G1) + (P3.P2.P1.G0) + (P3.P2.P1.P0.c0)
28
Four 4-bit ALUs using Carry Lookahead to form
a 16-bit adder
[Figure: four 4-bit ALUs. ALU0 handles bits 0 to 3, ALU1 bits 4 to 7, ALU2 bits 8 to 11, and ALU3 bits 12 to 15. Each ALU sends its Pi and Gi signals to a carry-lookahead unit, which supplies C1, C2, and C3 as the CarryIn of the next ALU and C4 as the CarryOut]
29
Example: Both Levels of the Propagate and Generate
• Determine the gi, pi, Pi, and Gi values of these two 16-bit numbers:
a: 0001 1010 0011 0011 (base two)
b: 1110 0101 1110 1011 (base two)
• Also, what is CarryOut15 (C4)?
• Answer: Aligning the bits makes it easy to see the values of generate gi (ai . bi) and propagate pi (ai + bi):
a:  0001 1010 0011 0011
b:  1110 0101 1110 1011
gi: 0000 0000 0010 0011
pi: 1111 1111 1111 1011
where the bits are numbered 15 to 0 from left to right.
30
Answer (Continue)
• Next, the “super” propagates (P3, P2, P1, P0) are simply the AND
of the lower-level propagates:
P3 = 1 . 1 . 1. 1 = 1
P2 = 1 . 1 . 1. 1 = 1
P1 = 1 . 1 . 1. 1 = 1
P0 = 1 . 0 . 1. 1 = 0
• The “super” generates are more complex, so use the following
equations:
G0 = g3 +(p3.g2)+(p3.p2.g1)+(p3.p2.p1.g0)
= 0 +(1.0)+(1.0.1)+(1.0.1.1)= 0
G1 = g7 +(p7.g6)+(p7.p6.g5)+(p7.p6.p5.g4)
= 0 +(1.0)+(1.1.1)+(1.1.1.0)= 1
G2 = g11 +(p11.g10)+(p11.p10.g9)+(p11.p10.p9.g8)
= 0 +(1.0)+(1.1.0)+(1.1.1.0)= 0
G3 = g15 +(p15.g14)+(p15.p14.g13)+(p15.p14.p13.g12)
= 0 +(1.0)+(1.1.0)+(1.1.1.0)= 0
31
Answer (Continue)
• Finally, CarryOut 15 is
C4 = G3 +(P3.G2)+(P3.P2.G1)+(P3.P2.P1.G0)
+(P3.P2.P1.P0.c0)
= 0+(1.0)+(1.1.1)+(1.1.1.0)+(1.1.1.0.0)=1
Hence there is a carry out when adding these two 16-bit

numbers.
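The worked example can be reproduced with a short Python sketch. The function names are illustrative. Note one deliberate difference from the hardware: the loops below use the nested form G = g + p . (previous G), which is logically equal to the flattened sum-of-products equations on the slides but does not model their two-level gate structure.

```python
A = 0b0001_1010_0011_0011
B = 0b1110_0101_1110_1011


def group_signals(a: int, b: int, groups: int = 4, size: int = 4):
    """Return (P, G) lists of the super propagate and super generate per group."""
    bits = groups * size
    g = [(a >> i) & (b >> i) & 1 for i in range(bits)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(bits)]
    P, G = [], []
    for k in range(groups):
        group_p, group_g = 1, 0
        for i in range(k * size, (k + 1) * size):   # least significant bit first
            group_p &= p[i]
            group_g = g[i] | (p[i] & group_g)       # nested form of the Gi equation
        P.append(group_p)
        G.append(group_g)
    return P, G


def carry_out(P, G, c0: int = 0) -> int:
    """C4 from the group signals, again in nested form."""
    c = c0
    for big_p, big_g in zip(P, G):
        c = big_g | (big_p & c)
    return c


P, G = group_signals(A, B)
print(P)                   # [0, 1, 1, 1]  i.e. P0 = 0, P1 = P2 = P3 = 1
print(G)                   # [0, 1, 0, 0]  i.e. only G1 = 1
print(carry_out(P, G))     # 1
assert carry_out(P, G) == (A + B) >> 16
```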
32
Example: Speed of Ripple Carry versus Carry Lookahead
• Assume each AND or OR gate takes the same time

for a signal to pass through it.
• Time is estimated by simply counting the number
of gates along the longest path through a piece of
logic.
• Compare the number of gate delays for the critical
paths of two 16-bit adders, one using ripple carry
and one using two-level carry lookahead.
33
Answer
[Figure: the carry-out hardware for one bit, from CarryIn to CarryOut: three AND gates feeding an OR gate]
• The figure shows that the carry out signal takes two gate delays per bit.
• So the number of gate delays between a carry in to the least significant bit and the carry out of the most significant bit is 16 x 2 = 32.
34
Answer (Continue)
• For carry lookahead, the carry out of the most significant

bit is just C4, defined in the previous example (Slide 32).
• It takes two levels of logic to specify C4 in terms of Pi and Gi (the OR of several AND terms).
• Pi is specified in one level of logic (AND) using pi.
• Gi is specified in two levels using pi and gi.
• pi and gi are each one level of logic, defined in terms of ai
and bi.
• If we assume one gate delay for each level of logic in these equations, the worst case is 2 + 2 + 1 = 5 gate delays.
• Hence for 16-bit addition a carry-lookahead adder is about six times faster (32 / 5 = 6.4), using this simple estimate of hardware speed.
35
Multiplication
• More complicated than addition

– accomplished via shifting and addition
• More time and more area
• Let's look at 3 versions based on grade school algorithm
      1000   (multiplicand)
   x  1001   (multiplier)
   -------
      1000
     0000
    0000
   1000
   -------
   1001000   (product)
• If we ignore the sign bits, multiplying an n-bit multiplicand by an m-bit multiplier gives a product that is n + m bits long.
• That is, n + m bits are required to represent all possible products.
36
Multiplication
• Each step of multiplication is simple:

1. Just place a copy of the multiplicand
(1 x multiplicand) in the proper place if the
multiplier digit is 1, or
2. Place 0 (0 x multiplicand) in the proper place if the
digit is 0.
• Negative numbers: convert and multiply
– there are better techniques, we won’t look at
them.
37
First Version of the Multiplication Hardware
[Figure: 64-bit Multiplicand register (shift left), 64-bit ALU, 64-bit Product register (write), 32-bit Multiplier register (shift right), and a Control test block]
38
First Version of the Multiplication Hardware
• The Multiplicand register, ALU, and Product register are all 64 bits wide, with only the Multiplier register containing 32 bits.
• We will need to move the multiplicand left one digit each step, as it may be added to the intermediate products. So, over 32 steps a 32-bit multiplicand would move 32 bits to the left. Hence we need a 64-bit Multiplicand register, initialized with the 32-bit multiplicand in the right half and 0 in the left half.
• The multiplier is shifted in the opposite direction at each step.
• The Product register is initialized to 0.
• Control decides when to shift the Multiplicand and Multiplier registers and when to write new values into the Product register.
39
The First Multiplication Algorithm Using Hardware in Slide 38
Start
1. Test Multiplier0.
   If Multiplier0 = 1, go to step 1a. If Multiplier0 = 0, go to step 2.
1a. Add multiplicand to product and place the result in Product register.
2. Shift the Multiplicand register left 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   No (< 32 repetitions): go back to step 1.
   Yes (32 repetitions): Done.
40
The First Multiplication Algorithm
• If the least significant bit of the multiplier is 1, add

the multiplicand to the product.
• If not, go to the next step.
• Shift the multiplicand left and the multiplier right in
the next two steps.
• These three steps are repeated 32 times.
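The three steps can be sketched in Python as a register-level model. The function name `multiply_v1` and the `width` parameter are illustrative; the inputs are assumed to be unsigned values that fit in `width` bits, and the Multiplicand and Product registers are modeled as 2 x width bits wide.

```python
def multiply_v1(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """First multiplication algorithm: shift the multiplicand left, the multiplier right."""
    reg_mask = (1 << (2 * width)) - 1   # 64-bit registers when width is 32
    mcand = multiplicand                # starts in the right half, left half is 0
    mult = multiplier
    product = 0
    for _ in range(width):              # 32 repetitions
        if mult & 1:                                  # step 1: test Multiplier0
            product = (product + mcand) & reg_mask    # step 1a: add to product
        mcand = (mcand << 1) & reg_mask               # step 2: shift multiplicand left
        mult >>= 1                                    # step 3: shift multiplier right
    return product


print(bin(multiply_v1(0b0010, 0b0011, width=4)))   # 0b110, the 4-bit example (2 x 3 = 6)
```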
41
Example: First Multiply Algorithm
• Using 4-bit numbers to save space, multiply 0010 x 0011.

Iteration  Step                           Multiplier  Multiplicand  Product
0          Initial values                 0011        0000 0010     0000 0000
1          1a: 1 → Prod = Prod + Mcand    0011        0000 0010     0000 0010
           2: Shift left Multiplicand     0011        0000 0100     0000 0010
           3: Shift right Multiplier      0001        0000 0100     0000 0010
2          1a: 1 → Prod = Prod + Mcand    0001        0000 0100     0000 0110
3          1: 0 → no operation            0000        0000 1000     0000 0110
4          1: 0 → no operation            0000        0001 0000     0000 0110
42
Example: First Multiply Algorithm
• The table (slide 42) shows the value of each

register for each of the steps labeled according to
the algorithm (slide 40), with the final value of
0000 0110 (6 in decimal).
• Color is used to indicate the register values that
change on that step.
• The underlined bit is the one examined to determine the operation of the next step.
• If each step took a clock cycle, this algorithm
would require 32 x 3 steps = 96 clock cycles to
multiply two 32-bit numbers.
43
Second Version of the Multiplication Algorithm and Hardware
• Half of the bits of the multiplicand in the first algorithm

were always 0, so only half could contain useful bit values.
• A full 64-bit ALU seemed wasteful and slow since half of
the adder bits were adding 0 to the intermediate sum.
• The first algorithm shifts the multiplicand left with 0s
inserted in the new positions, so the multiplicand cannot
affect the least significant bits of the product.
• Instead of shifting the multiplicand left, what if we shift the
product right?
• Now the multiplicand would be fixed relative to the product,
and since we are adding only 32 bits, the adder need be
only 32 bits wide.
44
Second Version of the Multiplication Hardware
[Figure: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), 32-bit Multiplier register (shift right), and a Control test block]
45
Second Version of the Multiplication Hardware
• Compare with the first version in slide 38.
• The Multiplicand register, ALU, and

Multiplier register are all 32 bits wide with
only the Product register left at 64 bits.
• Now the product is shifted right.
46
Second Multiplication Algorithm Using the Hardware in Slide 45
Start
1. Test Multiplier0.
   If Multiplier0 = 1, go to step 1a. If Multiplier0 = 0, go to step 2.
1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register.
2. Shift the Product register right 1 bit.
3. Shift the Multiplier register right 1 bit.
32nd repetition?
   No (< 32 repetitions): go back to step 1.
   Yes (32 repetitions): Done.
47
Second Multiplication Algorithm
• This algorithm starts with the 32-bit

Multiplicand and 32-bit Multiplier registers
set to their named values and the 64-bit
Product register set to 0.
• The Product register is shifted right instead

of shifting the multiplicand.
• This algorithm only forms a 32-bit sum, so

only the left half of the 64-bit Product
register is changed by the addition.
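A Python sketch of the second algorithm, with the illustrative name `multiply_v2` and unsigned inputs assumed to fit in `width` bits. One assumption to note: Python integers are unbounded, so the carry out of the 32-bit addition is kept and shifted back in by the right shift; the hardware must likewise keep that carry for the shift.

```python
def multiply_v2(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Second multiplication algorithm: fixed multiplicand, product shifts right."""
    product = 0                          # 2 x width bit Product register
    mult = multiplier
    for _ in range(width):               # 32 repetitions
        if mult & 1:                             # step 1: test Multiplier0
            product += multiplicand << width     # step 1a: add to the left half only
        product >>= 1                            # step 2: shift Product right
        mult >>= 1                               # step 3: shift Multiplier right
    return product


print(bin(multiply_v2(0b0010, 0b0011, width=4)))   # 0b110
```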
48
Example: Second Multiply Algorithm
• Multiply 0010 x 0011 using the algorithm in slide 47.

Iteration  Step                           Multiplier  Multiplicand  Product
0          Initial values                 0011        0010          0000 0000
1          1a: 1 → Prod = Prod + Mcand    0011        0010          0010 0000
           2: Shift right Product         0011        0010          0001 0000
           3: Shift right Multiplier      0001        0010          0001 0000
2          1a: 1 → Prod = Prod + Mcand    0001        0010          0011 0000
3          1: 0 → no operation            0000        0010          0001 1000
4          1: 0 → no operation            0000        0010          0000 1100
49
Final Version of the Multiplication Algorithm and Hardware
• The final observation was that the Product

register had wasted space that matched
exactly the size of the multiplier.
• The third version of the multiplication

algorithm combines the right-most half of the
product with the multiplier.
• The least significant bit of the 64-bit Product

register (Product0) now is the bit to be tested.
50
Third Version of the Multiplication Hardware
[Figure: 32-bit Multiplicand register, 32-bit ALU, 64-bit Product register (shift right, write), and a Control test block; there is no separate Multiplier register]
51
Third Version of the Multiplication Hardware
• Compare with the second version in slide 45.
• The separate Multiplier register has

disappeared.
• The multiplier is placed instead in the right half

of the Product register.
52
The Third Multiplication Algorithm
Start
1. Test Product0.
   If Product0 = 1, go to step 1a. If Product0 = 0, go to step 2.
1a. Add multiplicand to the left half of the product and place the result in the left half of the Product register.
2. Shift the Product register right 1 bit.
32nd repetition?
   No (< 32 repetitions): go back to step 1.
   Yes (32 repetitions): Done.
53
The Third Multiplication Algorithm
• The algorithm starts by assigning the

multiplier to the right half of the Product
register, placing 0 in the upper half.
• It needs only two steps because the

Product and Multiplier registers have been
combined.
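A Python sketch of the third algorithm, with the illustrative name `multiply_v3` and unsigned inputs assumed to fit in `width` bits. As in any software model of this datapath, the carry out of the addition is kept (Python integers are unbounded) so that the right shift can bring it back into the left half.

```python
def multiply_v3(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Third multiplication algorithm: the multiplier lives in the right half of Product."""
    product = multiplier                 # right half = multiplier, left half = 0
    for _ in range(width):               # 32 repetitions
        if product & 1:                          # step 1: test Product0
            product += multiplicand << width     # step 1a: add to the left half
        product >>= 1                            # step 2: shift Product right
    return product


print(bin(multiply_v3(0b0010, 0b0011, width=4)))   # 0b110
```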
54
Example: Third Multiply Algorithm
• Multiply 0010 x 0011 using the algorithm in slide 53.
Iteration  Step                           Multiplicand  Product
0          Initial values                 0010          0000 0011
1          1a: 1 → Prod = Prod + Mcand    0010          0010 0011
           2: Shift right Product         0010          0001 0001
2          1a: 1 → Prod = Prod + Mcand    0010          0011 0001
3          1: 0 → no operation            0010          0001 1000
4          1: 0 → no operation            0010          0000 1100
55
Signed Multiplication
• So far we have dealt with positive numbers.

• Signed numbers: First convert the Multiplier and
Multiplicand to positive numbers and then
remember the original signs.
• The algorithm runs for 31 iterations, leaving the
signs out of the calculations.
• We need to negate the product only if the original
signs disagree.
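The sign-handling steps can be sketched in Python. The name `signed_multiply` is illustrative, and the sketch assumes both magnitudes fit in width - 1 bits, so the most negative value (for example -2^31 when width is 32) is outside its scope.

```python
def signed_multiply(a: int, b: int, width: int = 32) -> int:
    """Multiply signed values by working on magnitudes and fixing the sign at the end."""
    negate = (a < 0) != (b < 0)          # remember whether the original signs disagree
    mag_a, mag_b = abs(a), abs(b)        # convert both operands to positive numbers
    product = 0
    for i in range(width - 1):           # 31 iterations: the sign bit is left out
        if (mag_b >> i) & 1:
            product += mag_a << i
    return -product if negate else product


print(signed_multiply(2, -3))    # -6
print(signed_multiply(-4, -5))   # 20
```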
56
Booth’s Algorithm
• Booth’s algorithm changes the first step of the third

multiplication algorithm in slide 53 – looking at 1 bit
of the multiplier and then deciding whether to add
the multiplicand – to looking at 2 bits of the
multiplier.
• The new first step has four cases, depending on the
values of the 2 bits (check next slide).
• Assume that the pair of bits examined consists of the current bit and the bit to its right, which was the current bit in the previous step.
• The second step is to shift the product right.
57
Booth’s Algorithm
• The value of these 2 bits (the examined pair is shown in brackets in the example):
Current bit  Bit to the right  Explanation              Example
1            0                 Beginning of a run of 1s  0000111[10]00
1            1                 Middle of a run of 1s     00001[11]1000
0            1                 End of a run of 1s        000[01]111000
0            0                 Middle of a run of 0s     0[00]01111000
58
Booth’s Algorithm
• Booth’s Algorithm steps:

1. Depending on the current and previous bits, do one of
the following:
00: Middle of a string of 0s, so no arithmetic operation.

01: End of a string of 1s, so add the multiplicand to the
left half of the product.
10: Beginning of a string of 1s, so subtract the
multiplicand from the left half of the product.
11: Middle of a string of 1s, so no arithmetic operation.
2. As in the previous algorithm (version three), shift the

Product register right 1 bit.
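Booth's two steps can be sketched in Python as a register-level model. The name `booth_multiply` is illustrative. The sketch assumes signed inputs that fit in `width` bits and a multiplicand that is not the most negative value, since subtracting that value overflows the left half of the register. The Product register carries one extra bit to the right of bit 0, initially 0, and the shift is a sign-extending (arithmetic) right shift.

```python
def booth_multiply(multiplicand: int, multiplier: int, width: int = 32) -> int:
    """Booth's algorithm on a (2 * width + 1)-bit product register; returns a signed int."""
    half_mask = (1 << width) - 1
    low_mask = (1 << (width + 1)) - 1        # multiplier half plus the extra bit
    top_bit = 2 * width                      # index of the register's sign bit
    mcand = multiplicand & half_mask
    product = (multiplier & half_mask) << 1  # left half 0, extra bit 0

    for _ in range(width):
        pair = product & 0b11                # current bit and the bit to its right
        left = product >> (width + 1)
        if pair == 0b01:                     # end of a run of 1s: add
            left = (left + mcand) & half_mask
        elif pair == 0b10:                   # beginning of a run of 1s: subtract
            left = (left - mcand) & half_mask
        product = (left << (width + 1)) | (product & low_mask)
        sign = (product >> top_bit) & 1      # arithmetic shift right by 1 bit
        product = (product >> 1) | (sign << top_bit)

    result = product >> 1                    # drop the extra bit
    if result >> (2 * width - 1):            # reinterpret as a signed 2 * width bit value
        result -= 1 << (2 * width)
    return result


print(booth_multiply(2, 6, width=4))    # 12
print(booth_multiply(2, -3, width=4))   # -6
```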
59
Comparing Third Algorithm and Booth’s Algorithm for Positive Numbers
Iteration  Multiplicand  Original algorithm                        Booth’s algorithm
                         Step                          Product     Step                          Product
0          0010          Initial values                0000 0110   Initial values                0000 0110 0
1          0010          1: 0 → no operation           0000 0110   1a: 00 → no operation         0000 0110 0
           0010          2: Shift right Product        0000 0011   2: Shift right Product        0000 0011 0
2          0010          1a: 1 → Prod = Prod + Mcand   0010 0011   1c: 10 → Prod = Prod – Mcand  1110 0011 0
3          0010          1a: 1 → Prod = Prod + Mcand   0011 0001   1d: 11 → no operation         1111 0001 1
4          0010          1: 0 → no operation           0001 1000   1b: 01 → Prod = Prod + Mcand  0001 1000 1
60
Comparing Example (Continue)
• Booth’s algorithm starts with a 0 to the right of the rightmost

bit for the first stage.
• Booth’s operation is identified according to the values in the
2 bits.
• By the fourth step, the two algorithms have the same values
in the Product register.
• The one other requirement is that shifting the product right
must preserve the sign of the intermediate result, since we
are dealing with signed numbers.
• The solution is to extend the sign when the product is shifted
to the right.
• Example: Step 2 of the second iteration turns 1110 0011 0
into 1111 0001 1 instead of 0111 0001 1.
• This shift is called an arithmetic right shift to differentiate it
from a logical right shift.
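The difference between the two shifts can be sketched in Python on the 9-bit register value from the example. The function names are illustrative, and values are treated as unsigned bit patterns of `width` bits.

```python
def logical_shift_right(x: int, width: int) -> int:
    """Shift right by 1 bit, filling the vacated top bit with 0."""
    mask = (1 << width) - 1
    return (x & mask) >> 1


def arithmetic_shift_right(x: int, width: int) -> int:
    """Shift right by 1 bit, copying the old sign bit into the vacated top bit."""
    mask = (1 << width) - 1
    sign = x & (1 << (width - 1))
    return ((x & mask) >> 1) | sign


value = 0b1110_0011_0   # 1110 0011 0 from step 2 of the second iteration
print(format(arithmetic_shift_right(value, 9), "09b"))   # 111100011
print(format(logical_shift_right(value, 9), "09b"))      # 011100011
```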
61
Example: Booth’s Algorithm
• Let’s try Booth’s algorithm with negative numbers: 2 x –3 = –6, or 0010 x 1101 = 1111 1010 (in binary).
Iteration  Step                            Multiplicand  Product
0          Initial values                  0010          0000 1101 0
1          1c: 10 → Prod = Prod – Mcand    0010          1110 1101 0
2          1b: 01 → Prod = Prod + Mcand    0010          0001 0110 1
3          1c: 10 → Prod = Prod – Mcand    0010          1110 1011 0
4          1d: 11 → no operation           0010          1111 0101 1
62
Multiply in MIPS
• MIPS provides a separate pair of 32-bit

registers to contain the 64-bit product,
called Hi and Lo.
• To produce a properly signed or unsigned
product, MIPS has two instructions:
multiply (mult) and multiply unsigned
(multu).
63
Chapter Four Summary
• Constructing an ALU to do the following

arithmetic and logical operations:
AND, OR, addition, subtraction,

less than, and equality.
• Ripple Adder versus Carry-Lookahead.

• Three versions of Multiplications.
• Signed Multiplication – Booth’s Algorithm.
• We are ready to move on (and implement
the processor).
64
