
UNIT II

ARITHMETIC OPERATIONS
NUMBER FORMATS:
The two principal number formats are fixed point and floating point.
Fixed point - Allows a limited range of values and has relatively simple hardware requirements.
Floating point - Allows a much larger range of values, but requires costlier hardware.
Fixed-Point Numbers:
i) Binary Numbers:
The fixed-point format is derived directly from the ordinary decimal representation of
a number as a sequence of digits separated by a decimal point. The digits to the left of the decimal
point represent an integer; the digits to the right represent a fraction. This is positional notation, in
which each digit has a weight according to its position relative to the decimal point. If i >= 1, the ith
digit to the left (right) of the decimal point has weight 10^(i-1) (10^-i). Thus the five-digit decimal
number 192.73 is equivalent to
1 x 10^2 + 9 x 10^1 + 2 x 10^0 + 7 x 10^-1 + 3 x 10^-2
More generally, we can assign weights of the form r^i, where r is the base or radix of the
number system, to each digit. The most fundamental number representation used in computers
employs a base-two positional notation. A binary word of the form

bn-1 ... b3 b2 b1 b0 . b-1 b-2 b-3 ... b-m

represents the number

sum (i = -m to n-1) of bi 2^i
Suppose that an n-bit word is to contain a signed binary number; one bit is reserved to represent
the sign of the number, while the remaining bits indicate its magnitude.

ii) Signed Numbers:


a) Signed Magnitude Representation:

Suppose that both positive and negative binary numbers are to be represented by an
n-bit word X = xn-1 xn-2 xn-3 ... x2 x1 x0. The standard format for positive numbers is as
follows, with the sign bit of 0 on the left and the magnitude to the right in the usual positional
notation:

xn-1 | xn-2 xn-3 ... x2 x1 x0
Sign |      Magnitude

A natural way to indicate negative numbers is to employ the same positional notation for
the magnitude and change the sign bit x n-1 to 1 to indicate minus. Thus with n= 8, +75 =
01001011, while -75 = 11001011. This number code is called sign magnitude.
b) Ones Complement Representation:
In the ones-complement code, -X is denoted by X', the bitwise logical
complement of X. In this code we again have +75 = 01001011, but now -75 = 10110100.
c) Twos Complement Representation:
In the twos-complement code, -X is formed by adding 1 to the least significant bit
of X' (the bitwise complement of X) and ignoring any carry bit generated from the most
significant (sign) position.
In twos-complement code, +75 = 01001011 and -75 = 10110101. Note that in both
complement codes, xn-1 retains its role as the sign bit, but the remaining bits no longer
form a simple positional code when the number is negative.
The primary advantage of the complement codes is that subtraction can be
performed by logical complementation and addition only. Consequently, twos
complement addition and subtraction can be implemented by a simple adder
designed for unsigned numbers. Multiplication and division are more difficult to
implement if twos-complement code is used instead of sign magnitude.
Exceptional conditions:
If the result of an arithmetic operation is too large or too small to be
represented by n bits, overflow or underflow is said to occur. It is generally necessary to
detect overflow and underflow, since they may indicate bad data or a programming error.
Overflow can occur only when adding two numbers of the same sign.
A related issue in computer arithmetic is round-off error, which results from the fact
that every number must be represented by a limited number of bits. An operation
involving n-bit numbers frequently produces a result of more than n bits. Retaining the n

most significant bits of the result without modification is called truncation. Clearly the
resulting number is in error by the amount of the discarded digits. This error can be
reduced by a process called rounding. One way of rounding is to add r^j/2 to the number
before truncation, where r^j is the weight of the least significant retained digit. For
instance, to round 0.346712 to three decimal places, add 0.0005 to obtain 0.347212 and
then take the three most significant digits 0.347.
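The round-then-truncate rule above can be sketched in a few lines of Python (an illustrative sketch of ours, not part of the original text; the function name is our own):

```python
def round_truncate(x: float, places: int, base: int = 10) -> float:
    """Round x to `places` digits by adding half the weight of the
    least significant retained digit, then truncating."""
    weight = base ** (-places)     # r^j: weight of the last retained digit
    x = x + weight / 2             # add r^j / 2 before truncation
    return (x // weight) * weight  # keep only the retained digits

print(round_truncate(0.346712, 3))  # approximately 0.347
```

Note that binary floating point cannot represent most decimal fractions exactly, so the result is only approximately 0.347; for exact decimal rounding Python's decimal module would be used instead.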
iii) Decimal numbers:
Humans use decimal arithmetic, so decimal numbers have to be converted into binary
representation. Converting an unsigned binary number xn-1 xn-2 ... x0 to decimal
requires evaluating a polynomial of the form

sum (i = 0 to n-1) of xi 2^i

Number codes in which each decimal digit is encoded separately by a
sequence of bits are called decimal codes.
The most widely used decimal code is Binary Coded Decimal (BCD).
In BCD format each digit di of a decimal number is denoted by its 4-bit equivalent
bi,3 bi,2 bi,1 bi,0 in binary form.
Excess-3 code is formed by adding 0011 to the corresponding BCD
digit.
Two-out-of-five code is a decimal code in which each decimal digit is represented by a 5-bit
sequence containing two 1s and three 0s. There are exactly ten distinct sequences of this type.
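The BCD and excess-3 encodings can be illustrated with a short Python sketch (ours; function names are our own), producing one 4-bit group per decimal digit:

```python
def to_bcd(n: int) -> str:
    """Encode each decimal digit of n as its 4-bit binary equivalent."""
    return " ".join(format(int(d), "04b") for d in str(n))

def to_excess3(n: int) -> str:
    """Excess-3: add 0011 (i.e. 3) to each digit's BCD code."""
    return " ".join(format(int(d) + 3, "04b") for d in str(n))

print(to_bcd(192))      # 0001 1001 0010
print(to_excess3(192))  # 0100 1100 0101
```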

iv) Hexadecimal Codes:


The hexadecimal number format uses the base or radix r = 16 and has 16
digits, consisting of the decimal digits 0-9 plus the numerical values 10, 11, 12, 13, 14, and 15,
which are represented as A, B, C, D, E, and F.
For example, the unsigned hexadecimal number 2FA0C has the interpretation
2FA0C = 2 x 16^4 + F x 16^3 + A x 16^2 + 0 x 16^1 + C x 16^0
      = (195,084)10
2FA0C = 0010 1111 1010 0000 1100 in binary
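The worked example can be checked quickly with Python's built-in base conversion (a verification sketch of ours):

```python
# Convert the hexadecimal string to an integer, then show its 20-bit binary form.
value = int("2FA0C", 16)
print(value)                   # 195084
print(format(value, "020b"))   # 00101111101000001100
```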

ADDITION AND SUBTRACTION:


ADDER/SUBTRACTER:
An adder suffices for both addition and subtraction when twos-complement number code
is used. The best way to add negative numbers, which have 1 as the sign bit, depends on the
number code used. Adding -X to Y is equivalent to subtracting X from Y, so the ability to add
negative numbers implies the ability to do subtraction. Subtraction is relatively simple with twos-complement code because negation (changing X to -X) is very easy to implement.
If X = xn-1 xn-2 ... x0 is a twos-complement integer, then negation is realized by

-X = x'n-1 x'n-2 ... x'0 + 1        (1)

where + denotes addition modulo 2^n. An efficient way to obtain the ones-complement portion
x'n-1 x'n-2 ... x'0 of -X in (1) uses the word-based EXCLUSIVE-OR function X XOR s with a
control variable s. When s = 0, X XOR s = X, but when s = 1, X XOR s = X'. Suppose that Y and
X XOR s are now applied to the inputs of an n-bit adder. The addition of 1 required by (1) to
change X' to -X can be realized by applying s to the carry input line of the adder. The control line
s selects the addition operation Y + X when s = 0 and the subtraction operation Y - X = Y + X' + 1
when s = 1. Thus extending a parallel adder to perform twos-complement subtraction as well as
addition merely requires connecting n two-input EXCLUSIVE-OR gates to the adder's inputs;
these gates are represented by a single n-bit word gate in the following figure.
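The adder/subtracter described above can be simulated on 8-bit words with a short Python sketch (ours; the function name is our own). The control line s drives both the XOR word gate and the adder's carry input:

```python
N = 8
MASK = (1 << N) - 1  # 8-bit word mask

def add_sub(y: int, x: int, s: int) -> int:
    """Compute Y + X when s = 0, Y - X (twos complement) when s = 1."""
    x_gated = x ^ (MASK * s)         # n two-input XOR gates: X when s=0, X' when s=1
    return (y + x_gated + s) & MASK  # s on the carry-in supplies the +1 of equation (1)

print(add_sub(100, 36, 0))  # Y + X = 136
print(add_sub(100, 36, 1))  # Y - X = 64
```

Discarding the carry out of the sign position (the & MASK) is exactly the modulo-2^n addition the text refers to.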

Carry look-ahead Adders (or) High Speed Adders:


The general strategy for designing fast adders is to reduce the time required to form carry
signals. One approach is to compute the input carry needed by stage i directly from carry-like
signals obtained from all the preceding stages i-1, i-2, ..., 0, rather than waiting for normal
carries to ripple slowly from stage to stage. Adders that use this principle are called carry-lookahead
adders. An n-bit carry-lookahead adder is formed from n stages, each of which is
basically a full adder modified by replacing its carry output line ci by two auxiliary signals called
gi and pi, or generate and propagate, respectively, which are defined by the following logic
equations:

gi = xi yi        pi = xi + yi        (1)

The name generate comes from the fact that stage i generates a carry of 1 (ci = 1)
independent of the value of ci-1 if both xi and yi are 1; that is, if xi yi = 1. Stage i propagates ci-1;
that is, it makes ci = 1 in response to ci-1 = 1 if xi or yi is 1, in other words, if xi + yi = 1.
Now the usual equation ci = xi yi + xi ci-1 + yi ci-1, denoting the carry signal ci to be sent
to stage i+1, can be rewritten in terms of gi and pi as follows:

ci = gi + pi ci-1        (2)

Similarly, ci-1 can be expressed in terms of gi-1, pi-1, and ci-2:

ci-1 = gi-1 + pi-1 ci-2        (3)

On substituting (3) into (2) we obtain

ci = gi + pi gi-1 + pi pi-1 ci-2

Continuing in this way, ci can be expressed as a sum-of-products function of the p and g
outputs of all the preceding stages. For example, the carries in a four-stage carry-lookahead
adder are defined as follows:

c0 = g0 + p0 cin
c1 = g1 + p1 g0 + p1 p0 cin
c2 = g2 + p2 g1 + p2 p1 g0 + p2 p1 p0 cin
c3 = g3 + p3 g2 + p3 p2 g1 + p3 p2 p1 g0 + p3 p2 p1 p0 cin        (4)

We can further simplify the design by noting that the sum equation for stage i is
zi = xi XOR yi XOR ci-1, which is equivalent to

zi = pi XOR gi XOR ci-1        (5)

The following figure shows the general form of a carry-lookahead adder circuit designed
in this way.

Overall structure of carry-lookahead adder
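The carry equations above can be checked with a small Python sketch (ours, for illustration); it forms the gi and pi signals and develops each carry from the recurrence ci = gi + pi ci-1, whose repeated expansion yields the sum-of-products forms of (4):

```python
def cla_carries(x: int, y: int, cin: int, n: int = 4):
    """Carries c0..c(n-1) of an n-bit carry-lookahead adder."""
    g = [(x >> i) & (y >> i) & 1 for i in range(n)]    # gi = xi yi
    p = [((x >> i) | (y >> i)) & 1 for i in range(n)]  # pi = xi + yi
    carries, c = [], cin
    for i in range(n):
        c = g[i] | (p[i] & c)  # ci = gi + pi c(i-1); in hardware all ci are two-level SOP
        carries.append(c)
    return carries

print(cla_carries(0b1011, 0b0110, 0))  # [0, 1, 1, 1]
```

Here 1011 + 0110 = 10001, and the final carry c3 = 1 is the carry out of the 4-bit adder, consistent with the trace.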


MULTIPLICATION:
SIGNED MULTIPLICATION:
BOOTH'S MULTIPLICATION ALGORITHM:
(BOOTH RECODING TECHNIQUE)
Booth's algorithm is an interesting and widely used scheme for twos-complement
multiplication; it was proposed by Andrew D. Booth in the 1950s.
In Booth's algorithm, two adjacent bits xi xi+1 are examined in each step. If xi xi+1 = 01, then Y
is added to the partial product Pi, while if xi xi+1 = 10, Y is subtracted from Pi.
If xi xi+1 = 00 or 11, neither addition nor subtraction is performed. Thus, Booth's
algorithm effectively skips over sequences of 1s and sequences of 0s in X. As a result, the total number of
addition/subtraction steps required to multiply two numbers decreases (however, at the cost of
extra hardware).

Booth's Algorithm Example:

Multiplier   = 11001110 (Q) = -50
Multiplicand = 10101010 (M) = -86

Step  Action                  Accumulator A   Register Q (with Q[-1])
0     Initialize registers    00000000        11001110
      Set Q[-1] = 0           00000000        11001110 0
1     Right shift A.Q         00000000        01100111 0
2     Subtract M from A       01010110        01100111 0
      Right shift A.Q         00101011        00110011 1
3     Right shift A.Q         00010101        10011001 1
4     Right shift A.Q         00001010        11001100 1
5     Add M to A              10110100        11001100 1
      Right shift A.Q         11011010        01100110 0
6     Right shift A.Q         11101101        00110011 0
7     Subtract M from A       01000011        00110011 0
      Right shift A.Q         00100001        10011001 1
8     Right shift A.Q         00010000        11001100 1

The final product is A.Q = 00010000 11001100 = +4300 = (-50) x (-86).
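The trace above can be reproduced by a Python sketch of the Booth recoding loop (ours; names are our own). Each step examines the pair Q0, Q[-1], optionally adds or subtracts M, then arithmetically right-shifts A.Q:

```python
def booth_multiply(x: int, y: int, n: int = 8) -> int:
    """Booth's algorithm on n-bit twos-complement bit patterns.
    x is the multiplier (loaded into Q), y the multiplicand (M).
    Returns the 2n-bit product A.Q as an unsigned bit pattern."""
    mask = (1 << n) - 1
    a, q, q_1 = 0, x & mask, 0
    m, neg_m = y & mask, (-y) & mask
    for _ in range(n):
        pair = ((q & 1) << 1) | q_1          # Q0 Q[-1]
        if pair == 0b01:
            a = (a + m) & mask               # 01 -> add M
        elif pair == 0b10:
            a = (a + neg_m) & mask           # 10 -> subtract M
        # arithmetic right shift of A.Q (sign bit of A replicated)
        q_1 = q & 1
        q = ((q >> 1) | ((a & 1) << (n - 1))) & mask
        a = (a >> 1) | (a & (1 << (n - 1)))
    return (a << n) | q

# (-50) x (-86) with the 8-bit patterns from the example above:
print(booth_multiply(0b11001110, 0b10101010))  # 4300
```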

FAST BOOTH MULTIPLICATION ALGORITHM:


(BOOTH BIT PAIR RECODING TECHNIQUE)

(Modified Booth's Algorithm)


Algorithm (for unsigned numbers):
1. Pad the LSB with one zero.
2. Pad the MSB with 2 zeros if n is even and 1 zero if n is odd.
3. Divide the multiplier into overlapping groups of 3 bits.
4. Determine the partial-product scale factor from the modified Booth encoding table.
5. Compute the multiplicand multiples.
6. Sum the partial products.
Algorithm (for signed numbers):
1. Pad the LSB with one zero.
2. If n is even, don't pad the MSB (n/2 partial products); if n is odd, sign-extend the MSB by
1 bit ((n+1)/2 partial products).
3. Divide the multiplier into overlapping groups of 3 bits.
4. Determine the partial-product scale factor from the modified Booth encoding table.
5. Compute the multiplicand multiples.
6. Sum the partial products.
BOOTH BIT PAIR RECODING TABLE:

Xi+1  Xi  Xi-1   Action
 0    0    0      0 x Y
 0    0    1     +1 x Y
 0    1    0     +1 x Y
 0    1    1     +2 x Y
 1    0    0     -2 x Y
 1    0    1     -1 x Y
 1    1    0     -1 x Y
 1    1    1      0 x Y

Example:
Multiply X = 011010001 and Y = 101011110 using the modified Booth's algorithm
(for signed numbers). Since n = 9 is odd, sign-extend the MSB of the multiplier by
1 bit and pad the LSB with one zero, then recode the overlapping 3-bit groups.

With X = 011010001 as the multiplier, padding the LSB with one zero and sign-extending
the MSB by 1 bit gives the bit string 0 0 1 1 0 1 0 0 0 1 0. The overlapping 3-bit groups,
taken from the LSB upward, recode as follows:

(x1 x0 x-1) = 010 -> +1 x Y
(x3 x2 x1) = 000 ->  0 x Y
(x5 x4 x3) = 010 -> +1 x Y
(x7 x6 x5) = 110 -> -1 x Y
(x9 x8 x7) = 001 -> +1 x Y

Each partial product is the selected multiple of Y = 101011110, shifted left two bit positions
per group; -Y is obtained as the twos complement of Y. Summing the shifted partial products
gives the product

X x Y = 209 x (-162) = -33858
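The recoding step can be sketched in Python (ours; the dictionary mirrors the bit-pair recoding table above, and the function name is our own):

```python
RECODE = {  # (x_{i+1}, x_i, x_{i-1}) -> multiple of Y
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def recode(multiplier_bits: str):
    """Bit-pair (radix-4) recoding of a signed multiplier bit string:
    sign-extend the MSB by one bit if the length is odd, pad the LSB
    with one zero, then read overlapping 3-bit groups from the LSB."""
    b = [int(c) for c in reversed(multiplier_bits)]  # LSB first
    if len(b) % 2 == 1:
        b.append(b[-1])          # sign-extend the MSB by one bit
    b = [0] + b                  # pad the LSB with one zero (x_{-1})
    return [RECODE[(b[i + 2], b[i + 1], b[i])] for i in range(0, len(b) - 2, 2)]

print(recode("011010001"))  # [1, 0, 1, -1, 1]
```

The list reads from the least significant group upward and matches the worked example; weighting the entries by powers of 4 recovers the multiplier: 1 + 0*4 + 1*16 - 1*64 + 1*256 = 209.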

DIVISION:
RESTORING AND NON-RESTORING DIVISION:
The circuit used for division is shown in the following figure. The 2n-bit shift register
A.Q stores the partial remainders. Initially the dividend (which can contain up to 2n bits) is
placed in A.Q. The divisor V is placed in the M register, where it remains throughout the division
process. In each step A.Q is shifted to the left. The positions vacated at the right-most end of the
Q register can be used to store the quotient bits as they are generated. When the division process
terminates, Q contains the quotient, while A contains the (shifted) remainder. The required
subtractions are facilitated using twos-complement arithmetic.

The datapath of a sequential n-bit binary divider


The division algorithms are of two types: restoring division and non-restoring division.
The algorithms for both methods are illustrated in the following examples.

RESTORING DIVISION:

A restoring division example (quotient Q, dividend D, divisor M)

NON-RESTORING DIVISION:

A non-restoring division example (quotient Q, dividend D, divisor M)
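The restoring method can be sketched in Python on unsigned operands (a sketch of ours, not the text's own figure; the function name is our own). A.Q is shifted left, a trial subtraction is made, and the remainder is restored whenever it goes negative:

```python
def restoring_divide(dividend: int, divisor: int, n: int = 8):
    """Restoring division of an unsigned dividend by an unsigned
    n-bit divisor; returns (quotient, remainder)."""
    a, q = 0, dividend
    for _ in range(n):
        # shift A.Q left by one position
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a -= divisor        # trial subtraction
        if a < 0:
            a += divisor    # restore the partial remainder
        else:
            q |= 1          # quotient bit 1 enters Q's vacated LSB
    return q, a

print(restoring_divide(100, 7))  # (14, 2)
```

The non-restoring variant instead keeps the negative partial remainder and adds the divisor back on the next iteration, saving the restore step.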

FLOATING POINT NUMBERS:


a) Basic Format:
Three components are associated with a floating-point number:
Mantissa M (significand or fraction), Exponent E, and Base B.
The three components together represent the real number M x B^E.
Example:
1.0 x 10^18
1.0 - mantissa
18 - exponent
10 - base
A floating-point number is stored as a word (M, E) consisting of a pair of signed
fixed-point numbers: a mantissa M, which is usually a fraction or an integer, and an
exponent E, which is an integer.
Extra mantissa digits, called guard bits, are included in floating-point processing
circuits to reduce approximation error. Guard bits are removed
automatically from the end result.
b) Normalization and biasing:
Normalized - The mantissa is said to be normalized if the digit to the right
of the radix point is not zero. Normalization restricts the magnitude |M| of a
fractional binary mantissa to the range
1/2 <= |M| < 1
Bias - Floating-point exponents are usually coded in excess code, where
the exponent field E contains the desired exponent value plus an integer K. The
quantity K is called the bias, and an exponent encoded in this way is called a biased
exponent or characteristic.

c) Standards:
The Institute of Electrical and Electronics Engineers (IEEE) sponsored a
standard format for 32-bit and larger floating-point numbers, known as the
IEEE 754 standard.

S (1-bit sign) | 8-bit exponent E (excess-127 binary integer) | 23-bit mantissa M
(fraction part of a sign-magnitude binary significand with a hidden bit)

A number N represented by a 32-bit IEEE standard floating-point number
has the following set of interpretations:
If E = 255 and M != 0, then N = NaN (not a number)
If E = 255 and M = 0, then N = (-1)^S infinity
If 0 < E < 255, then N = (-1)^S 2^(E-127) (1.M)
If E = 0 and M != 0, then N = (-1)^S 2^(-126) (0.M)
If E = 0 and M = 0, then N = (-1)^S 0
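These interpretations can be applied directly to a 32-bit pattern with a short Python sketch (ours; the function name is our own), cross-checked against the machine's own single-precision encoding via the struct module:

```python
import struct

def decode_ieee754(bits: int) -> float:
    """Split a 32-bit pattern into sign S, biased exponent E, and
    fraction M, then apply the interpretations listed above."""
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    m = bits & 0x7FFFFF
    if e == 255:
        return float("nan") if m else (-1.0) ** s * float("inf")
    if e == 0:                                        # denormal: (0.M), no hidden bit
        return (-1) ** s * 2.0 ** (-126) * (m / 2**23)
    return (-1) ** s * 2.0 ** (e - 127) * (1 + m / 2**23)

# cross-check with the bit pattern of 0.15625 (= 2^-3 + 2^-5, exactly representable)
pattern = struct.unpack(">I", struct.pack(">f", 0.15625))[0]
print(decode_ieee754(pattern))  # 0.15625
```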
