Fixed Point Numbers

Fixed point numbers
Fast and inexpensive implementation

Limited in the range of numbers
Susceptible to problems of overflow
In a fixed-point processor, numbers are represented in
integer format.
Fixed-point numbers and their data types are
characterized by:
word size in bits
position of the binary point
whether they are signed or unsigned
The dynamic range of an N-bit number in 2's-complement
representation is between -2^(N-1) and 2^(N-1) - 1,
or between -32,768 and 32,767 for a 16-bit system.
By normalizing the dynamic range to between -1 and 1, the
range is divided into 2^N sections, each of size 2^-(N-1),
starting at -1 and going up to 1 - 2^-(N-1).
For a 4-bit system, there would be 16 sections, each of
size 1/8, from -1 to 7/8.
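The normalized range above can be checked with a short sketch. The function name normalized_levels is made up for this illustration, and exact fractions are used only so that the printed values are not rounded.

```python
from fractions import Fraction

def normalized_levels(n_bits):
    """Values of an n-bit 2's-complement word after scaling to [-1, 1)."""
    half = 2 ** (n_bits - 1)
    step = Fraction(1, half)              # size of each section: 2^-(N-1)
    return [k * step for k in range(-half, half)]

levels = normalized_levels(4)
# number of sections, first value, last value, section size
print(len(levels), levels[0], levels[-1], levels[1] - levels[0])
```

For N = 4 this lists 16 values from -1 to 7/8 in steps of 1/8, matching the 4-bit case on the slide.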
A 16-bit word can be interpreted in four ways:
Unsigned integer
the stored number can take on any integer value
from 0 to 65,535
Signed integer
uses two's complement
allows negative numbers
ranges from -32,768 to 32,767
Unsigned fraction
65,536 levels spread uniformly between 0 and 1
Signed fraction
allows negative numbers, equally spaced between
-1 and 1
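As a rough illustration, the same 16-bit pattern can be read in each of these formats. The function name interpretations is hypothetical, and the fraction scalings (divide by 65,536 and by 32,768) are an assumption about how the levels are spread; the slides do not give the exact scale factor.

```python
def interpretations(word):
    """Read a 16-bit pattern (0..0xFFFF) in four formats."""
    assert 0 <= word <= 0xFFFF
    unsigned_int = word
    # Two's complement: patterns with the top bit set are negative.
    signed_int = word - 0x10000 if word & 0x8000 else word
    unsigned_frac = unsigned_int / 65536   # assumed scaling: levels in [0, 1)
    signed_frac = signed_int / 32768       # assumed scaling: levels in [-1, 1)
    return unsigned_int, signed_int, unsigned_frac, signed_frac

print(interpretations(0xC000))   # (49152, -16384, 0.75, -0.5)
```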
Examples: 15 + 1 = 0 and 6 + (-2) = 4
The 4-bit unsigned numbers represent a modulo (mod)
16 system.
If 1 is added to the largest number (15), the operation
wraps around to give 0 as the answer.
A number wheel graphically demonstrates the addition
properties of a finite bit system.
Addition procedure
1. Find the first number x on the wheel.
2. Step off y units in the clockwise direction, which brings you to
the answer.
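The wheel procedure amounts to addition modulo 2^N. A minimal sketch follows, with hypothetical helper names wheel_add and as_signed; the second call reproduces 6 + (-2) = 4 by stepping the 4-bit pattern for -2 (1110, i.e. 14 positions).

```python
def wheel_add(x, y, n_bits=4):
    """Start at x on an n-bit wheel and step y positions clockwise."""
    return (x + y) % (1 << n_bits)

def as_signed(value, n_bits=4):
    """Read a wheel position as a two's-complement number."""
    if value >= (1 << (n_bits - 1)):
        return value - (1 << n_bits)
    return value

print(wheel_add(15, 1))                  # 0
print(as_signed(wheel_add(6, 0b1110)))   # 4
```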
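Carry and Overflow
Carry applies to unsigned numbers: when an addition or
subtraction produces a carry (or borrow) out of the most
significant bit, the result is incorrect.
Overflow applies to signed numbers: when an addition or
subtraction produces a result outside the representable
range, the sign bit is wrong and the result is incorrect.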
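Examples:
Overflow (5-bit signed)
   01111
 + 00111
 -------
   10110    sign bit is set, so the positive sum reads as negative
Carry (3-bit unsigned)
    100
 +  111
 ------
  1 011     the leading 1 is the carry out of the 3-bit word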
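Both conditions can be detected in software. The sketch below uses a hypothetical function add_with_flags and the common rule that signed overflow occurs when both operands have the same sign and the result has the opposite sign.

```python
def add_with_flags(a, b, n_bits):
    """Add two n-bit patterns; return (result, carry, overflow)."""
    mask = (1 << n_bits) - 1
    sign = 1 << (n_bits - 1)
    a &= mask
    b &= mask
    total = a + b
    result = total & mask
    carry = total > mask          # a bit was carried out of the word
    # Signed overflow: operands share a sign, result has the other sign.
    overflow = bool(~(a ^ b) & (a ^ result) & sign)
    return result, carry, overflow

r, c, v = add_with_flags(0b01111, 0b00111, 5)
print(format(r, "05b"), c, v)     # 10110 False True

r, c, _ = add_with_flags(0b100, 0b111, 3)
print(format(r, "03b"), c)        # 011 True
```

Whether the carry flag or the overflow flag matters depends on whether the program treats the operands as unsigned or signed.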
Fractional Fixed-Point Representation
Rather than using the integer values just
discussed, a fractional fixed-point number
with values between -1 and +0.99... can be used.
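For a 16-bit word this fractional format is commonly called Q15 (a name not used in the slides). A minimal sketch of conversion and multiplication, with hypothetical helper names and saturation at the ends of the range:

```python
def float_to_q15(x):
    """Quantize x to a 16-bit signed fraction, saturating outside [-1, 1)."""
    scaled = int(round(x * 32768))
    return max(-32768, min(32767, scaled))

def q15_to_float(q):
    """Convert a 16-bit signed fraction back to a real value."""
    return q / 32768

def q15_mul(a, b):
    """Multiply two Q15 values and rescale the product to Q15."""
    product = (a * b) >> 15           # the raw product carries 30 fraction bits
    return max(-32768, min(32767, product))

half = float_to_q15(0.5)
print(half, q15_to_float(q15_mul(half, half)))   # 16384 0.25
```

Because every value has magnitude at most 1, a product of two fractions also stays within the range, which is one reason this format is convenient on fixed-point processors. The single exception, (-1) x (-1), is handled here by saturation.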
Data types
1. short:
it is of size 16 bits, represented as 2's complement, with
a range from -2^15 to (2^15 - 1)
2. int or signed int:
it is of size 32 bits, represented as 2's complement, with
a range from -2^31 to (2^31 - 1)
3. float:
it is of size 32 bits, represented as IEEE 32-bit, with a
range from 2^-126 (1.175494 x 10^-38) to 2^+128
(3.40282346 x 10^38)
4. double:
it is of size 64 bits, represented as IEEE 64-bit, with a
range from 2^-1022 (2.22507385 x 10^-308) to 2^+1024
(1.79769313 x 10^308)
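These limits can be reproduced as a quick check. The sketch assumes a Python build on an IEEE 754 platform; the variable names are illustrative only. The upper limits 2^+128 and 2^+1024 are bounds that are not themselves reached: the largest finite values lie just below them.

```python
import struct
import sys

# Integer limits follow directly from the word size.
short_min, short_max = -2**15, 2**15 - 1
int_min, int_max = -2**31, 2**31 - 1

# Smallest normalized and largest finite IEEE values.
float_min = 2.0**-126
float_max = (2 - 2.0**-23) * 2.0**127     # just below 2^128
double_min = sys.float_info.min           # 2^-1022
double_max = sys.float_info.max           # just below 2^1024

# Storage sizes in bytes: short, int, float, double.
print([struct.calcsize(code) for code in ("<h", "<i", "<f", "<d")])
print(float_min, float_max, double_min, double_max)
```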
Floating-point representation
The advantage over fixed-point representation is that
it can support a much wider range of values.
The floating-point format needs slightly more storage.
The speed of floating-point operations is measured in
FLOPS.
General format of a floating-point number:
X = M * b^e
where M is the value of the significand (mantissa),
b is the base, and
e is the exponent.
Mantissa determines the accuracy of the number
Exponent determines the range of numbers that can be
represented
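The split into mantissa and exponent can be observed with the standard library, taking b = 2. Note that math.frexp normalizes M to 0.5 <= |M| < 1, whereas the IEEE formats store a significand in the range 1 <= |M| < 2; the wrapper name decompose is made up for this sketch.

```python
import math

def decompose(x):
    """Return (M, e) with x == M * 2**e and 0.5 <= |M| < 1 (M == 0 for x == 0)."""
    return math.frexp(x)

m, e = decompose(6.5)
print(m, e)               # 0.8125 3
print(math.ldexp(m, e))   # 6.5
```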
Floating point numbers can be represented as:
Single precision :
called "float" in the C language family
it is a binary format that occupies 32 bits
its significand has a precision of 24 bits
Double precision :
called "double" in the C language family
it is a binary format that occupies 64 bits
its significand has a precision of 53 bits
Single precision (SP):
 31 | 30 ... 23 | 22 ... 0
 s  |     e     |    f
Bit 31 is the sign bit
Bits 23 to 30 are the exponent bits
Bits 0 to 22 are the fractional bits
Numbers as small as 10^-38 and as large as 10^38 can be
represented
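A short sketch of pulling the three fields out of a single-precision value. The function names are hypothetical. The exponent bias of 127 and the implied leading 1 come from the IEEE 754 format rather than from the slides, and the reconstruction only covers normalized numbers.

```python
import struct

def sp_fields(x):
    """Return (s, e, f) of x stored as IEEE single precision."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    s = bits >> 31              # bit 31
    e = (bits >> 23) & 0xFF     # bits 30..23
    f = bits & 0x7FFFFF         # bits 22..0
    return s, e, f

def sp_value(s, e, f):
    """Rebuild a normalized value: (-1)^s * 1.f * 2^(e - 127)."""
    return (-1) ** s * (1 + f / 2**23) * 2.0 ** (e - 127)

print(sp_fields(-6.5))              # (1, 129, 5242880)
print(sp_value(*sp_fields(-6.5)))   # -6.5
```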
Double precision (DP):
with 64 bits, more exponent and fractional bits are available
a pair of registers is used
 second register              first register
 31 | 30 ... 20 | 19 ... 0    31 ... 0
 s  |     e     |    f           f
Bits 0 to 31 of the first register are fractional bits
Bits 0 to 19 of the second register are also fractional bits
Bits 20 to 30 of the second register are the exponent bits
Bit 31 of the second register is the sign bit
Numbers as small as 10^-308 and as large as 10^+308 can be
represented
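The two-register layout can be imitated by splitting the 64-bit pattern into two 32-bit words. This is only a sketch: the function names are made up, and "first" and "second" follow the slide's wording, with the first word holding the low 32 fraction bits.

```python
import struct

def dp_register_pair(x):
    """Split an IEEE double into (first, second) 32-bit words."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    first = bits & 0xFFFFFFFF    # low 32 fraction bits
    second = bits >> 32          # sign, exponent, top 20 fraction bits
    return first, second

def dp_fields(first, second):
    """Recover (s, e, f) from the two words."""
    s = second >> 31                        # bit 31 of the second word
    e = (second >> 20) & 0x7FF              # bits 30..20
    f = ((second & 0xFFFFF) << 32) | first  # 20 + 32 = 52 fraction bits
    return s, e, f

first, second = dp_register_pair(-6.5)
print(hex(first), hex(second))    # 0x0 0xc01a0000
print(dp_fields(first, second))
```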
Instructions ending in SP or DP operate on single-precision and
double-precision values, respectively
Some floating-point instructions have longer latencies than
fixed-point instructions
E.g.: MPY requires one delay slot
MPYSP requires three delay slots
MPYDP requires nine delay slots
A single-precision floating-point value can be loaded into a single
register, whereas double-precision values need a pair of
registers
A1:A0, A3:A2, ..., B1:B0, B3:B2, ...
The C6711 processor has a single-precision reciprocal instruction,
RCPSP, for performing division
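Division by way of a reciprocal can be sketched as a / b = a * (1 / b). The code below does not model RCPSP itself: reciprocal_estimate is a hypothetical stand-in that keeps only about 8 bits of 1/b (an assumed figure chosen for illustration), and the Newton-Raphson refinement is a common way of sharpening such an estimate, not something stated in the slides.

```python
import math

def reciprocal_estimate(b, bits=8):
    """Stand-in for a coarse hardware estimate of 1/b (about `bits` bits)."""
    m, e = math.frexp(1.0 / b)
    scale = 1 << bits
    return math.ldexp(round(m * scale) / scale, e)

def divide(a, b, iterations=2):
    """Compute a / b as a * (1/b), refining the reciprocal estimate."""
    x = reciprocal_estimate(b)
    for _ in range(iterations):
        x = x * (2.0 - b * x)    # Newton-Raphson step for 1/b
    return a * x

print(divide(1.0, 3.0))
```

Each refinement step roughly doubles the number of correct bits, so a coarse estimate reaches single-precision accuracy after a couple of iterations.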
