You are on page 1of 11

Fixed point tutorial

Gianluca Biccari
(Email: gb@gianlucabiccari.com)

Binary Format






N binary digits (bit) has 2N states.


There is no an unique meaning inherent in a
binary word, it depends on the representation
and mapping choose.

Most of the time people are used to think of


binary word as a positive integers
but in digital signal processing real numbers
are required!

Real number in DSP




Fixed point representation: every word has the same


number of digits and the binary point is always fixed at
the same position.
N bit
whole part

fractional part

binary point

Qm.n format: m bit for whole part, n bit for fractional part.


Floating point representation: the position of the point


must be determined at processing time. These
calculations are much slower than fixed point. The
reason for using floating point representation is that the
range of possible values is much greater.

Qm.n format: unsigned fixed point


N = m + n bits
X[B2] =

bn+m-1 . . . bn+1 bn

bn-1

1
x [B10] = n
2
fractional part adjustment

x min =0
2 N 1 m n
x max = n = 2 2
2
resolution: 2-n

...

b2 b1 b0

N 1

2 b
i

i =0

x like an integer number

i.e. : Q16.16
N = 32 bit
range: 0 (216 2-16 ) = 65535,99998
resolution: 3,05 10-5

Qm.n format: unsigned fixed point


examples


Q6.2 : N = 8, range = 0 63,75, resolution: 0,25

Example1: 0x8A  (138)/22 = 34,5


Example2: 53,68  53.68 22 = 214.72 = (round) 215
 (11010111)[B2] = 0xD7
Example3: 0xD7  (215)/22 = 53,75

Twos complement representation




For non-negative x: use ordinary base-two


representation with leading (sign) bit

For negative x (x):


1 - Find 2s complement of x: C2(x) = 2N x
2 - Use base two representation to write this number

Using N bit is possible to write all


the number x that:
- 2 N-1 x 2 N 1 1
Example: N = 4, x = -6
1 - C2(4) = 24 - 6 = 10
2 - 10 = (1010)[B2]

10000

MSB = 1

MSB = 0

1111
1110
1101

1000
0000
0001
0010

0111

24 = 16
-1 (16-1=15)
-2 (16-2=14)
-3 (16-3=13)
-8 (16-8=8)
0
+1
+2
+7

Twos complement representation




Every arithmetic operations is like a hardware addition

There is only one representation for the zero value

From 2s compl. to dec base

b3 b2 b1 b0

N 2

x [B10] = 2

N 1

bN 1 + 2 bi
i

i =0
bN-1 = 1

Example: N = 4, x = 1101
x[B2] = 1101  x[B10] = -8+5 = -3
x[B2] = 0101  x[B10] = 0+3 = 3

bN-1 = 0

1111
1110
1101

1000
0000
0001
0010

0111

-1 (=-8+7)
-2 (=-8+6)
-3 (=-8+5)
-8 (=-8+0)
0 (=0+0)
+1(=0+1)
+2(=0+2)
+7(=0+7)

Qm.n format: signed twos


complement fixed point
sign bit

N = m + n + 1 bit

X[B2] = bn+m bn+m-1 . . . bn+1 bn

fractional part
adjustment

x [B10]

...

N 2
1
= n 2 N 1 bN 1 + 2i bi
2
i =0

2 N 1
x min = n = 2 m
2

2 N 1 1 m n
x max =
=2 2
n
2
resolution: 2-n

bn-1

b2 b1 b0

x like twos
complement
number

i.e. : Q2.13
N = 16 bit
range: -22 (22 2-13 ) = - 4 3,99987
resolution: 1,22 10-4

Float value


Qm.n value

From float to Qm.m:

N 2

x [B10] 2 n = 2 N 1 bN 1 + 2i bi

i =0

Qm.n representations bit

Qm.n is the 2s complement representation of the


float value multiplied by 2n.


Examples:
Q4.4 N=9, resolution= 0.0625, range: -16 15,9375
9.6  9.6 24 =153.6 = (round) 154 = (0 1001 1010)[B2]
-12,4  -12.4 24 = -198.4 = (round) -198
C2(198) = 29 -198 = 314 = (1 0011 1010)[B2]

2s c. Qm.n format examples


Q15.16 N=32, range = -32768 32767.99998
resolution: 1.525 10-5
Example1:
-1265,337
-1265,337 216 = -82925125.632 = (round) - 82925126
C2(82925126) = 232 82925126 = 0xFB0EA9BA


Example2:
0xFB0EA9BA=(1111 1011 0000 1110 1010 1001 1011 1010)[B2]
= ( -231 + (2064558522) ) / 216 = -1265.337

Arithmetic rules for 2s c. fixed point


Addition: two numbers has to be represented in the same
Qm.n format to be added, it means that the binary point must
be aligned.
 Multiplication:
Qm1.n1 * Qm2.n2 = Q (m1+m2+1).(n1+n2)
(Most of DSP achieve an implicit shifting on the result)


Resolution:
is the smallest non-zero magnitude representable, = 2-n
Accuracy:
is the magnitude of the maximum difference between a real
value and its representation, = 2-n/2 = 2-n-1

You might also like