Computer Arithmetic: Electrical and Computer Engineering Department

Electrical and Computer Engineering Department
Computer Arithmetic
Number Representation
 Binary numbers (base 2) - integers
0000  0001  0010  0011  0100  0101  0110  0111
 1000  1001  . . .
in decimal from 0 to 2n-1 for n bits
 MIPS represents numbers as 32-bit constants.
0 ten is in MIPS
0000 0000 0000 0000 0000 0000 0000 0000two
  
Bit 31 (most significant bit) Bit 3 Bit 0
Thus the largest (unsigned) number is
1111 1111 1111 1111 1111 1111 1111 1111two or
4,294,967,295 ten = 232 -1
 Bits are just bits (have no inherent meaning)
conventions define the relationships between bits and numbers
 But numbers can also be signed: if the sign bit is 0 the
number is positive, if it is 1 the number is negative.
 Thus the signed number
1111 1111 1111 1111 1111 1111 1111 1111two is
-1ten
 With 32 bits the range of integers becomes –231 (-2,147,483,648)
is 1000 0000 0000 0000 0000 0000 0000 0000two
to 231-1 (2,147,483,647)
Computers use the two’s complement to represent the X signed
number as (x31  -231)+(x30  230)+…+(x1  21)+(x0  20)
MIPS Representations
 32-bit signed numbers (2’s complement):
0000 0000 0000 0000 0000 0000 0000 0011two = 3ten

1111 1111 1111 1111 1111 1111 1111 1101two = -3ten
0000 0000 0000 0000 0000 0000 0000 0000two = 3ten -3ten = 0
 To get the two’s complement all 0s become 1s, and all 1s
become 0s, then a 1 is added to the least significant bit
 Converting n-bit numbers into numbers with more than n bits:
 MIPS 16-bit immediate gets converted to 32 bits for
arithmetic
 copy the most significant bit (the sign bit) into the other bits
0010 -> 0000 0010
1010 -> 1111 1010
MIPS Representations
 Exercise:
what is the decimal’s equivalent of the two’s complement of
1111 1111 1111 1111 1111 1110 0000 1100two
 We negate first
0000 0000 0000 0000 0000 0001 1111 0011two

1
0000 0000 0000 0000 0000 0001 1111 0100two
 Then we apply the formula

(x31  -231)+(x30  230)+…+(x1  21)+(x0  20)
128 + 127 + 126 + 125 + 124 + 122 =
=256+128+64+32+16+4=500
 Of course, it gets more complicated
 storage locations (e.g., register file words) are finite,
so have to worry about overflow (i.e., when the
number is too big to fit into 32 bits)
 have to be able to represent negative numbers, e.g.,
how do we specify -8 in
 addi $sp, $sp, -8 #$sp = $sp - 8
 real systems have to provide for more than just
integers, e.g., fractions and real numbers (and
floating point)
More instructions
 Since MIPS uses signed arithmetic, we need
“unsigned” versions of operations such as
 slt
0 rs rt rd 0 42
Sign bit 31 magnitude bits

 slti 10 rs rt constant
Sign bit
 sltiu 11 constant
 The result of slt and sltu when comparing registers

will be different when the most significant bit is 1
MIPS Arithmetic and Logic Instructions
31 25 20 15 5 0
R-type: op Rs Rt Rd funct
I-Type: op Rs Rt Immed 16
INST op funct INST op funct INST op funct

ADDI 001000 xx ADD 000000 100000 000000 101000
addiu 001001 xx addu 000000 100001 000000 101001
SLTI 001010 xx SUB 000000 100010 SLT 000000 101010
sltiu 001011 xx subu 000000 100011 sltu 000000 101011
ANDI 001100 xx AND 000000 100100 000000 101100
ORI 001101 xx OR 000000 100101
XORI 001110 xx XOR 000000 100110
LUI 001111 xx NOR 000000 100111
Addition and Subtraction
 Addition is done by carrying 1s to the left
00… 00100 (410)
+ 00… 00100 (410)
00… 01000 (810)
 Subtraction is like addition with Two's complement of the second
number
00… 00100 (410)
- 00… 00011 (310)
 00… 00100 (410)
+ 11…
11101 (-310) 00…
Overflow
 Adding two numbers of different sign does not yield an
overflow
 Subtracting operands of same sign does not yield an
overflow
 Overflow (when number of available bits is not
sufficient – addition causes carry out of MSB
and subtraction causes a borrow into MSB)
Examples of overflow:
 A+B<0 (when A,B0);
 A+B0 (when A,B<0)
 A-B<0 when A0, B<0
 A-B0 when A<0, B0
MIPS Instructions
Category Instr Op Example Meaning
Code
Arithmetic add unsigned 0 and 33 addu $s1, $s2, $s3 $s1 = $s2 + $s3
(R & I subtr unsigned 0 and 35 subu $s1, $s2, $s3 $s1 = $s2 - $s3
format)
add imm. 9 addiu $s1, $s2, 6 $s1 = $s2 + 6
unsigned
Data load byte 36 lbu $s1, 25($s2) $s1 = Memory($s2+25)
Transfer unsigned
Cond. set on less than 0 and 43 sltu $s1, $s2, $s3 if ($s2<$s3) $s1=1 else
Branch (I unsigned
$s1=0
& R format)
set on less than 11 sltiu $s1, $s2, 6 if ($s2<6) $s1=1 else
$s1=0
imm. unsigned
 Sign extend - addiu, sltiu
 Zero extend - lbu
 No overflow detected - addu, subu, addiu, sltu,
sltiu
because unsigned numbers are used for addresses
Overflow Detection and Effects
 Some instructions (add, addi, sub) detect overflow and
cause an interrupt (exception)
 When an overflow interrupt occurs control jumps to predefined
memory address for exceptions
 Address of instruction causing the overflow is saved for
possible resumption. The offending address is stored in the
exception program counter $epc (not part of the Register
File)
 To debug the program, the contents of this register can be moved to
a general purpose register “move from system control” (R-Type
operation) mfc0 $k1, $epc #$k1=$epc  $k0, $k1
OS-reserved registers in Register File
 After debugging, and restoring registers, control can return with a
jump to the OS-reserved register
 jr $k1
Example
 Remember that xor sets a bit to 1 if the two corresponding bits
differ. Code can detect an overflow
addu $t0, $t1, $t2
xor $t3, $t1, $t2 #check if sign of $t1 and$t2 differs
slt $t3, $t3, $zero #if sign differed, $t1=1
bne $t3, $zero, No_overflow #signs differed so no
overflow
xor $t3, $t0, $t1 #$t3 is negative is sign of sum is
different that that of $t1
slt $t3, $t3, $zero #$t3=1 if sum sign is different
than that of $t1
bne $t3, $zero, Overflow # all 3 signs are different
Arithmetic Logic Unit (ALU)
 The device that performs the computer arithmetic and logic
operations
 It has four
building blocks
 For logic
operations only
operation
a•b
a+b
1-bit Binary Adder
carry_in
a b carry_in carry_out Sum
0 0 0 0 0
a
+ Sum 0 0 1 0 1
b 0 1 0 0 1
0 1 1 1 0
carry_out
1 0 0 0 1
1 0 1 1 0
1 1 0 1 0
1 1 1 1 1
Sum = Σminterms = a’b’c + a’bc’ + ab’c’ +abc

carry_out = ab+ac+bc
1-bit Binary Adder
 carry_out = a • b+a • carryIn+b • carryIn CarryIn
 is implemented by the gates
 The 1-bit ALU with 0 is implemented with

the gates
operation
a•b
a &b, a|b, a+b, 0
3
Modifying the ALU Cell for slt
Subtraction is same as
adding negative b thus
Binvert goes Hi,
and CarryIn is 1
Less
Set
Overflow
detection Overflow
This is the MSB ALU Cell

Modifying the ALU for slt
 First perform a
subtraction
 Make the result 1 if the
subtraction yields a
negative result
 Make the result 0 if the
subtraction yields a
positive result
Modifying the ALU for beq
Bnegate Operation
Output Zero
goes Hi for
equality
MIPS Multiply Instruction
mult $s2, $s3 # hi||lo = $s2 * $s3
op rs rt rd shamt funct
000000 $s2 $s3 00000 00000 24

 Low-order word of the product is placed in processor
dedicated register lo and the high-order word is placed
in processor register hi
 mult uses signed integers and result is a signed 64-bit
number. Overflow is checked in software
 MIPS uses multu for unsigned products
000000 $s2 $s3 00000 00000 25
Multiply Instruction
 The product needs to be moved to general purpose registers to
become available for other operations. Instructions mfhi $s0
and mflo $s5 are provided for this.
 mfhi $s0 000000 00000 00000 $s0 00000 16
 mflo $s5 000000 00000 00000 $s5 00000 18
 Multiplication is more complicated than addition - via shifting and
addition
0010ten (multiplicand)
x 1011ten (multiplier)
0010
0010 (partial product
0000 array)
0010
00010110ten (product)
 m bits  n bits = m+n bit product 32+32=64 bits double precision
product produced – more time to compute
Multiply Instruction
 Binary numbers make it easy:
0 => place 0s (0x multiplicand) in the proper place
1 => place a copy of multiplicand in the proper place
32 0s 32-bit multiplicand
0000000………… 00000 101100011…………… … 1100
 at each stage shift the multiplicand left ( x 2)

 use next LSB of b to determine whether to add in shifted
multiplicand
 accumulate 2n bit partial product at each stage
 The process is repeated 32 times in MIPS
Unsigned shift-add multiplier (version
 64-bit Multiplicand reg., 64-bit ALU, 64-bit Product reg.
1)
(initialized to 0s), 32-bit multiplier register
32 0s 32-bit multiplicand
0000000………… 00000 101100011…………… … 1100
If Multiplier0 = 1 add multiplicand

to product register
Multiply Algorithm Version 1
1001two x 1001two
 Multiplier Multiplicand Product
0 1001 00001001 00000000
1 1001 00001001 00000000
1001 00010010 00001001
0100 00010010 00001001
2 0100 00010010 00001001
0010 00100100 00001001
3 0010 00100100 00001001
0001 01001000 00001001
4 0001 01001000 00001001
0000 10010000 01010001
Observations on Multiply Version 1
 1 clock cycle per step => 100 clocks per multiply
Ratio of multiply to add 1:5 to 1:100
 1/2 bits in multiplicand always 0
=> 64-bit adder is wasted
 0’s inserted in right of multiplicand as it is shifted left
=> least significant bits of product never changed once
formed
 Instead of shifting multiplicand to left, shift product to
right?
Multiply Hardware Version 2
 32-bit Multiplicand register, 32 -bit ALU, 64-bit
Product register, 32-bit Multiplier register
If Multiplier0 = 1
 Multiplicand stay’s still

and product moves right
 Product register wastes
space that exactly matches
size of multiplier
 So we can combine
Multiplier register and
Product register
Multiply Hardware Version 3
 32-bit Multiplicand register, 32 -bit ALU, 64-bit Product
register, (0-bit Multiplier register)
2 steps per bit
because
Multiplier & If Product0=1
Product
combined
MIPS registers
Hi and Lo are
left and right
half of Product Initially 32 0s 32-bit multiplier
Gives us MIPS 0000000………… 00000 101100011…………… … 1100
instruction
MultU
1001two x 1001two
 Iter. Multiplicand Product
0 1001 0000 1001
1 1001 0000 1001
add 1001 1001 1001
shift 1001 0100 1100
2 1001 0100 1100
shift 1001 0010 0110
3 1001 0010 0110
shift 1001 0001 0011
4 1001 0001 0011
add 1001 1010 0011
shift 1001 0101 0001
Multiplication of signed integers
 What about signed multiplication?
 Easiest solution is to make both positive & remember
whether to complement product when done (leave out the
sign bit, run for 31 steps)
 Apply definition of 2’s complement. Need to sign-extend
partial products and subtract at the end
 Booth’s Algorithm is elegant way to multiply signed
numbers using same hardware as before and save cycles
 It can handle multiple bits at a time, thus it is faster
Motivation for Booth’s Algorithm
 Example 2 x 6 = 0010 x 0110two:
0010
x 0110
+ 0000 shift (0 in multiplier)
+ 0010 add (1 in multiplier)
+ 0010 add (1 in multiplier)
+ 00001100 shift (0 in multiplier)

0000
 ALU with add or subtract gets same result in more than one way:
6 = – 2 + 8
0110 = – 00010 + 01000 = 11110 +
01000
Motivation for Booth’s Algorithm
 For example
0010
x 0110
0000 shift (0 in
multiplier)
– 0010 sub (first 1 in multiplier) .
0000 shift (mid string of 1s) .
+ 0010 add (prior step had last 1)
00001100
 Booth’s algorithm handles signed products by looking at the
strings of 1s
Booth’s Algorithm
Now the test in the algorithm depends on two bits. Results are placed
in the left half of the product register.
Current Bit Bit to the Right Explanation Example Op
1 0 Begins run of 1s 0001111000 subtract
1 1 Middle of run of 1s 0001111000 no op
0 1 End of run of 1s 0001111000 add
0 0 Middle of run of 0s 0001111000 no op
Originally for Speed (when shift was faster than add)
 Replace a string of 1s in multiplier with an initial subtract when we
first see a 10 and then later add for the first 01
Booths Example (2 x 7) mythical bit
Operation Multiplicand Product register next operation?
0. initial value 0010 0000 0111 0 10 -> subtract
 1a. P = P - m 1110 + 1110 1110
0111 0 shift P (sign extend)
 1b. 0010 1111 0011 1 11 -> nop, shift
 2. 0010 1111 1001 1 11 -> nop, shift
 3. 0010 1111 1100 1 01 -> add
 4a. 0010 +0010
 0001 1100 1 shift

4b. 0010 0000 1110 0 done
Booths Example (2 x -3) (1111 1010two)
Operation Multiplicand Product next?
0. initial value 0010 0000 1101 0 10 -> subtract

1a. P = P - m 1110 + 1110 1110
1101 0 shift P (sign ext)
1b. 0010 1111 0110 1 01 -> add multiplicand
+ 0010
2a. 0001 0110 1 shift P and sign ext.
2b. 0010 0000 1011 0 10 -> sub multiplicand

+ 1110
3a. 0010 1110 1011 0 shift and sign ext
3b. 0010 1111 0101 1 11 -> no op

4a 1111 0101 1 shift
4b. 0010 1111 1010 1 done
Faster Multiplication
One 32-bit adder for each bit of the multiplier.

One input is the multiplicand ANDed with a multiplier bit, the other
input is the output of the prior adder.
Resulting time is 5 32-bit add times
Division: Paper & Pencil
1001ten Quotient
Divisor1000ten 1001110ten Dividend
–1000
11
111
1110
–1000
110 Remainder (or Modulo result)
A number can be subtracted, creating quotient bit on each step – how many
times does the divisor go in a portion of the dividend.
Binary => 1 * divisor or 0 * divisor
Dividend = Quotient x Divisor + Remainder
=> | Dividend | = | Quotient | + | Divisor |
We assume for now unsigned 32-bit integers.
3 versions of divide, successive refinement
MIPS Divide Instruction
 div $s2, $s3
op rs rt rd shamt funct
000000 $s2 $s3 00000 00000 26

 The division quotient is placed in processor dedicated
register lo and the remainder is placed in processor
register hi
 div uses signed integers and result is a signed 64-bit
number. Overflow and division by 0 are checked in
software
 MIPS uses divu for unsigned divisions
000000 $s2 $s3 00000 00000 27
Division Hardware (Version 1)
 64-bit Divisor register, 64-bit ALU, 64-bit Remainder
register, 32-bit Quotient register
Initially holds divisor Initially holds 0s
If Reminder63 = 1, Quotient0=0
Initially holds dividend Reminder63 = 0, Quotient0=1
Divide Algorithm Version 1
Observations on Divide Version 1
 1/2 bits in divisor always 0
=> 1/2 of 64-bit adder is wasted
=> 1/2 of divisor is wasted
 Instead of shifting divisor to right,
shift the remainder to left?
 1st step cannot produce a 1 in quotient bit
(otherwise too big for the register)
=> switch order to shift first and then subtract,
can save 1 iteration
 32-bit Divisor register, 32-bit ALU, 64-bit
Remainder register, 32-bit Quotient register
If Reminder63 = 1, Quotient0=0
Reminder63 = 0, Quotient0=1
 We can eliminate Quotient register by combining with
Remainder as it is shifted left
 Start by shifting the Remainder left as before.
 Thereafter loop contains only two steps because the
shifting of the Remainder register shifts both the
remainder in the left half and the quotient in the right half
 The consequence of combining the two registers together
and the new order of the operations in the loop is that the
remainder will be shifted left one time too many.
 Thus the final correction step must shift back only the
remainder in the left half of the register
 32-bit Divisor reg, 32 -bit ALU, 64-bit Remainder reg,
(0-bit Quotient reg)
At the end right half

holds quotient
At the
start
dividend
is here
Divide Algorithm Version 3
Divide 0000 0111 by 0010 (2s 1110)
Iteration Divisor Remainder reg oper?
Remainder register initially holds dividend
0. 0010 0000 0111 in. val
0a 0010 0000 1110 shft lf 1
1a 0010 1110 1110 Rem=Rem-Div
1b 0010 0000 1110 Rem=Rem+Div
0001 1100 sll Rem, R0=0
2a 0010 1111 1100 Rem=Rem-Div
2b 0010 0001 1100 Rem=Rem+Div
0011 1000 sll Rem, R0=0
3a 0010 0001 1000 Rem=Rem-Div
3b 0010 0011 0001 sll Rem, R0=1
4a 0010 0001 0001 Rem=Rem-Div

4b 0010 0010 0011 sll Rem, R0=1
Shift left half
of Rem right 1
0001 0011
remainder quotient
 Same Hardware as Multiply: just need ALU to add or subtract,
and 64-bit register to shift left or shift right
 Hi and Lo registers in MIPS combine to act as 64-bit register
for multiply and divide
 Signed Divides: Simplest is to remember signs, make positive,
and complement quotient and remainder if necessary
 Note: Dividend and Remainder must have same sign
 Note: Quotient negated if Divisor sign & Dividend sign
disagree
e.g., –7 ÷ 2 = –3, remainder = –1
 And 7 ÷ (– 2)= – 3, remainder = 1
 Possible for quotient to be too large: if divide 64-bit integer by
1, quotient is 64 bits (“called saturation”)
Review: MIPS Instructions, so far
Category Instruction Op Code Example Meaning
add 0 and 32 add $s1, $s2, $s3 $s1 = $s2 + $s3
add unsigned 0 and 33 addu $s1, $s2, $s3 $s1 = $s2 + $s3
subtract 0 and 34 sub $s1, $s2, $s3 $s1 = $s2 - $s3
subt unsigned 0 and 35 subu $s1, $s2, $s3 $s1 = $s2 - $s3
Arithmetic add immediate 8 addi $s1, $s2, 6 $s1 = $s2 + 6
(R & I add immediate 9 addiu $s1, $s2, 6 $s1 = $s2 + 6
format) unsigned
multiply 0 and 24 mult $s1, $s2 hi || lo = $s1 * $s2
multiply 0 and 25 multu $s1, $s2 hi || lo = $s1 * $s2
unsigned
divide 0 and 26 div $s1, $s2 lo = $s1/$s2,
remainder in hi
divide 0 and 27 divu $s1, $s2 lo = $s1/$s2,
unsigned remainder in hi
Review: MIPS ISA, continued
Category Instr Op Code Example Meaning
Shift sll 0 and 0 sll $s1, $s2, 4 $s1 = $s2 << 4
(R srl 0 and 2 srl $s1, $s2, 4 $s1 = $s2 >> 4
format)
sra 0 and 3 sra $s1, $s2, 4 $s1 = $s2 >> 4
Data load word 35 lw $s1, 24($s2) $s1 = Memory($s2+24)
Transfer store word 43 sw $s1, 24($s2) Memory($s2+24) = $s1
(I format)
load byte 32 lb $s1, 25($s2) $s1 = Memory($s2+25)
load byte 36 lbu $s1, 25($s2) $s1 = Memory($s2+25)
unsigned
store byte 40 sb $s1, 25($s2) Memory($s2+25) = $s1
load upper imm 15 lui $s1, 6 $s1 = 6 * 216
move from hi 0 and 16 mfhi $s1 $s1 = hi
move to hi 0 and 17 mthi $s1 hi = $s1
move from lo 0 and 18 mflo $s1 $s1 = lo
move to lo 0 and 19 mtlo $s1 lo = $s1
Review: MIPS ISA- continued
Category Instr Op Code Example Meaning
Cond. br on equal 4 beq $s1, $s2, L if ($s1==$s2) go to L
Branch br on not equal 5 bne $s1, $s2, L if ($s1 !=$s2) go to L
(I & R
set on less than 0 and 42 slt $s1, $s2, $s3 if ($s2<$s3) $s1=1 else
format)
$s1=0
set on less than 0 and 43 sltu $s1, $s2, $s3 if ($s2<$s3) $s1=1 else
unsigned $s1=0
set on less than 10 slti $s1, $s2, 6 if ($s2<6) $s1=1 else
immediate $s1=0
set on less than imm. 11 sltiu $s1, $s2, 6 if ($s2<6) $s1=1 else
unsigned $s1=0
Uncond. jump 2 j 2500 go to 10000
Jump (J jump and link 3 jal 2500 go to 10000; $ra=PC+4
& R format)
jump register 0 and 8 jr $s1 go to $s1
jump and link reg 0 and 9 jalr $s1, $s2 go to $s1, $s2=PC+4
Floating-Point
 What can be represented in N bits?
 Unsigned 0 to 2N
 2s Complement - 2N-1 to 2N-1 - 1
 With MIPS singed integers the 32-bit architecture allows
us to represent numbers in the range ±2.15Χ109
 But, what about very large numbers?
9,349,398,989,787,762,244,859,087,678
 What about very small numbers?
0.0000000000000000000000045691
 Floating point representation allows much larger range
at the expense of accuracy
Scientific Notation
Normalized notation – single number to the Decimal notation
left of decimal point (no leading 0s) Exponent - how many digits the decimal point is moved
to left to get to 1
Sign, magnitude
23 -24
1.02 x 10 -1.673 x 10
significand radix (base)
Sign, magnitude Binary notation (binary floating point)

Number of ys determines range
yyyyyyy
±1.xxxxxtwo × 2
radix (base 2)
Number of xs determines accuracy
MIPS Register bit allocation
 Representation of floating point means that the binary point
“floats” - to get a non-0 bit before it. The binary point is not fixed.
 Since number of bits in register is fixed - we need to compromise
1 8 bits 23 bits
sign S E F
fraction:
Signed exponent sign + magnitude, normalized
binary significand
When exponent is too large – or too small – an exception
Overflow, or underflow
 Representation of 2.0tenx10-38 to 2.0tenx1038. In double-
precision two registers are used. This increases the range to
2.0ten x10-308 to 2.0tenx10308. Significand now has 52 bits.
1 11 bits 20 bits
sign S E F
Signed exponent F
32 bits
IEEE 754 Floating Point Standard
Representation of floating point numbers in IEEE 754 standard -
assures uniformity across computer implementations.
 The 1 in significand is implicit - 0ten is given as a reserved

representation 00…..000two
 The rest of the numbers areS E-127

N = (-1) (1+F) 2
 Placing the sign bit and exponent first makes easier integer
comparisons for sorting
 Using an exponent bias allows exponent to be unsigned, smallest

being 00…000two largest being 11…111two. Makes comparisons
easier
 Double precision bias is 1023ten. Underflow and overflow can still

occur
IEEE 754 - continued
 If the 23 significand bits are numbered from left-to-right then the
floating point number represented by these bits is
S
N = (-1)(1+s1•2-1+s2•2-2+ .. +s23•2-23)2(E-bias)
 So the register containing the bits
1 1000 0011 111000….. 0
represents
N = (-1) (1+2 +2 +2 )2
-1 (27
-2 +2 1
-3+20
-127)
N = -1.875 2 (131-127)
N = -1.87524 = -1.875  16= -30
Exercis
 Show the IEEE 754 representation
e of 10ten in single and double
precision
10ten = 1010two = 1.01 23 in normalized notation

 The sign bit is 0, the exponent is 3+127 = 130= 1000 0010two
23 bits
0 1000 0010 01000….. 0 sp
 In double precision the exponent is 3+1023= 1026

= 100 0000 0010two 20 bits
0 100 0000 0010 01000….. 0 dp
0 0000 00 00000….. 0 32 bits

MIPS Floating Point Instructions
 Use special registers $f0… $f31. For double precision
$f0, $f2,.. $f30 (which are in fact pairs of registers).
 The advantage of having separate registers for floating
point operations – we have twice as many registers
available and
 We maintain the same number of bits in the instruction
format.
 Disadvantage – need more instructions – need special
instructions to load words into the floating point registers.
R2000 CPU
Basic Addition Algorithm
For addition (or subtraction) there are specific steps that are taken to
make sure the proper digits are added:
(1) the decimal (or binary) point has to be aligned
(2) this means that the significand of the smaller number is shifted to
the right until the decimal points are aligned.
(3) then the addition of the significand takes place
(4) the result needs to be normalized, which means the decimal point
is shifted left and exponent increases.
(5) the result needs to be truncated to available number of digits and

round off (add 1 to the last available digit if number to the right is 5 or
larger)
Basic Addition Algorithm
Addition example
 add 9.999•101 and 0.1610•100
 first shift to align decimal point

0.01610•101
 then truncate 0.016•101
 then add
9.999•101
+ 0.016•101
10.015•101
 after normalization 1.0015•102
 after rounding 1.002•102

Arithmetic Unit for FP addition
Subtract to
determine
which is Significand
smaller of the larger
number
Significand of the smaller
Output the number
larger
exponent
Add
significands
Round off
significand
Final result
Floating Point Multiplication
 Biased exponents are added then one
bias is subtracted
 1.110 •1010 x 9.200 •10-5
 New exponent is 10-5=5
 when using bias 137 (which is
10+127) + 122 (is -5+127)-127=132
 Significands are multiplied
10.212000ten
 Unormalized product is 10.212ten •105
Normalized is 1.0212ten •106. Rounded
to four digits 1.021ten •106
 Sign is determined by the sign of both
operands + 1.021ten •106
 Floating point registers have special load and store instructions,
but still use integer registers to store the base address for the loaded
word
load word in coprocessor 1

lwc1 $f1, 100($s2) # $f1=memory[$s2+100]
store word from coprocessor 1
swc1 $f1, 100($s2) # memory[$s2+100]=$f2
 there are move instructions to move data between integer and
floating point registers
 mtc1 $t1, $f5 #moves $t1 into $f5
 mfc1.d $t2, $f2 #moves $f2, $f3 into $t2, $t3
 The suffix .s or .d in an instruction specifies it is floating
point. .s if it is single and .d if it is double precision.
add.s $f1, $f4, $f5 # $f1=$f4+$f5

sub.s $f2, $f4, $f6 # $f2=$f4-$f6
mul.s $f2, $f4, $f6 # $f2=$f4x$f6
div.s $f2, $f4, $f6 # $f2=$f4/$f6
 To load two floating point numbers, subtract them and place the
result into memory the code is
lwc1 $f4, 100($s2) #loads a 32-bit floating
point number into $f4
lwc1 $f6, 200($s2) #loads another 32-bit
floating point number into $f6
sub.s $f10, $f4, $f6 #$f10=$f4-$f6
swc1 $f10, 240($s2) #store 32-bit floating
point into memory
 Flow operations also have floating point variants
 Compare less than single and Compare less than double
c.lt.s $f2, $f4 #if $f2<$f4 condition=true,

otherwise condition bit is
false
c.lt.d $f2, $f4 #compare in double precision
 R-type instruction
 Other conditions can also be tested (eq, ne, gt, etc.)
 Floating point has branch operations based on the state of the
“condition” bit
Branch if FP comparison is true (true==1)
 bc1t 25 #jump to address PC+4+100 if true
Branch if FP comparison is false (true==0)
bc1f 100 #jump to address PC+4+400 if false
I-type instructions
Extra Bits for rounding
IEEE 754 allows the use of guard and round bits added to the
allowed number of bits to reduce round-off errors
How many extra bits?
IEEE: As if computed the result exactly and rounded.
 Guard Digits: digits to the right of the first P digits of significand

to guard against loss of digits – can later be shifted left into first P
places during normalization.
 Addition: carry-out shifted in
 Subtraction: borrow digit and guard
 Multiplication: carry and guard, Division requires guard
 Sticky bit is set to 1 if there are nonzero bits to the right of the
round bit –helps deal with rounding numbers like 2.345
Extra Bits for rounding
Exercise add numbers 8.76 •101 and 1.47 •102 with only three
allowed significand digits: a) use guard and round digits;
b) do not use these two digits
a) We extend the numbers with the two digits
8.7600 •101 and align 0.87600 •102 and 1.4700 •102
Then we add the significands 0.8760
+ 1.4700
2.3460
After rounding off to three significand digits 2.35 •102
b) Without guard and round digits 0.87
+ 1.47
2.34 or 2.34 •102
Results differ
Using the sticky bit in rounding
Exercise add numbers 5.01 •10-1 and 1.34 •102 with only three
allowed significand digits: a) use guard, round bits and sticky bits;
b) use only guard, round bits
a) We extend the numbers with the two digits
5.0100 •10-1 and align 0.0050100 •102 and 1.3400 •102
Then we add the significands 0.0050
+ 1.3400
1.3450
After rounding off to three significand digits 1.35 •102 because sticky
bit was 1
b) Without sticky bit result is 1.34 •102
Results differ
Exercis
A single-precision IEEE number is stored at memory address X.
Write a sequence of MIPSeinstructions to multiply the number at X
by 2 and store the results back at location X. Accomplish this
without using any floating point instructions (Don’t worry about
overflow).
a) Multiplying by 2 is same as adding 1 to the exponent field
23 bits
0 1000 0100 1000….. 0 $t0
23 bits
0 0000 0000 0000….. 1 $s0
afte
lw $t0, X($0) r
addi $s0, $0, 1 addi
sll $s0, $s0, 23 tion
addu $t0, $t0, $s0
sw $t0, X($0)
Exercis
X=0100 0110 1101 1000 0000 0000 0000twoand
e
Y=1011 1110 1110 0000 0000 0000 0000two represent
single-precision IEEE 754 floating point numbers.
a) What is x + y? 1 8 bits 23 bits
sign S E M
b) What is x * y?
a) Remember that mantissa:
Signed exponentsign + magnitude, normalized
binary significand
Convert +1.1011*214 + –1.11*2–2
1.1011 0000 0000 0000 0000 000
–0.0000 0000 0000 0001 1100 000
–1.1010 1111 1111 1110 0100 000

0100 0110 1101 0111 1111 1111 0010 0000
b) What is x * y ?
First we calculate a new exponent Exercise -
111 11 1
100 01101
continued
+011 11101
1000 01010
–011 11111 minus bias
1111 1111
100 01011 new exponent
Then we multiply ×1.101 1000 0000 0000 0000 0000
the significands 1.110 0000 0000 0000 0000 0000
1 11 11
1 1011 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
11 0110 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
+1.10 1100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
10.11 1101 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
Exercise -
continued
Normalize and round:

exponent 100 0110 0
significand
1.011 1010 0000 0000 0000 0000
Signs differ, so result is negative:
1100 0110 0011 1010 0000 0000 0000 0000
Exercis
e
IA-32 uses 82-bit registers, allowing 64-bit significands and 16-bit
exponents:
- what is the bias in the exponent?
- Range of numbers?
- How much greater accuracy compared to double precision?
a) There are 15 bits available for the exponent size,

thus bias is 215-1 = 32767;
a) range of numbers is 2.0 x 10 –9864 to 2.0 x 109864
b) accuracy is 20% better
double precision range was 2.0ten x10-308 to 2.0tenx10308 so
range is 32 times larger (9834/308)

Computer Arithmetic: Electrical and Computer Engineering Department

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Computer Arithmetic: Electrical and Computer Engineering Department

Uploaded by

Copyright:

Available Formats

Electrical and Computer Engineering Department

0000 0000 0000 0000 0000 0000 0000 0011two = 3ten

0000 0000 0000 0000 0000 0001 1111 0011two

0000 0000 0000 0000 0000 0001 1111 0100two

 Then we apply the formula

Sign bit 31 magnitude bits

 The result of slt and sltu when comparing registers

INST op funct INST op funct INST op funct

Sum = Σminterms = a’b’c + a’bc’ + ab’c’ +abc

 The 1-bit ALU with 0 is implemented with

a &b, a|b, a+b, 0

This is the MSB ALU Cell

000000 $s2 $s3 00000 00000 24

 at each stage shift the multiplicand left ( x 2)

If Multiplier0 = 1 add multiplicand

 Multiplicand stay’s still

+ 00001100 shift (0 in multiplier)

0. initial value 0010 0000 1101 0 10 -> subtract

1b. 0010 1111 0110 1 01 -> add multiplicand

2a. 0001 0110 1 shift P and sign ext.

2b. 0010 0000 1011 0 10 -> sub multiplicand

3a. 0010 1110 1011 0 shift and sign ext

3b. 0010 1111 0101 1 11 -> no op

One 32-bit adder for each bit of the multiplier.

000000 $s2 $s3 00000 00000 26

Initially holds divisor Initially holds 0s

At the end right half

4a 0010 0001 0001 Rem=Rem-Div

significand radix (base)

Sign, magnitude Binary notation (binary floating point)

 The 1 in significand is implicit - 0ten is given as a reserved

 The rest of the numbers areS E-127

 Using an exponent bias allows exponent to be unsigned, smallest

 Double precision bias is 1023ten. Underflow and overflow can still

1 1000 0011 111000….. 0

10ten = 1010two = 1.01 23 in normalized notation

 In double precision the exponent is 3+1023= 1026

0 0000 00 00000….. 0 32 bits

(1) the decimal (or binary) point has to be aligned

(3) then the addition of the significand takes place

(5) the result needs to be truncated to available number of digits and

 add 9.999•101 and 0.1610•100

 first shift to align decimal point

 then truncate 0.016•101

 after normalization 1.0015•102

 after rounding 1.002•102

load word in coprocessor 1

add.s $f1, $f4, $f5 # $f1=$f4+$f5

c.lt.s $f2, $f4 #if $f2<$f4 condition=true,

 Guard Digits: digits to the right of the first P digits of significand

–1.1010 1111 1111 1110 0100 000

Normalize and round:

a) There are 15 bits available for the exponent size,

You might also like