You are on page 1of 11

Fixed point tutorial

Gianluca Biccari
(Email: gb@gianlucabiccari.com)

Binary Format
N binary digits (bit) has 2N states. There is no an unique meaning inherent in a binary word, it depends on the representation and mapping choose.

Most of the time people are used to think of binary word as a positive integers but in digital signal processing real numbers are required!

Real number in DSP


Fixed point representation: every word has the same number of digits and the binary point is always fixed at the same position.
N bit whole part fractional part

binary point

Qm.n format: m bit for whole part, n bit for fractional part. Floating point representation: the position of the point must be determined at processing time. These calculations are much slower than fixed point. The reason for using floating point representation is that the range of possible values is much greater.

Qm.n format: unsigned fixed point


N = m + n bits
X[B2] = bn+m-1 . . . bn+1 bn bn-1
N 1

...

b2 b1 b0

1 x [B10] = n 2
fractional part adjustment

2 b
i i =0

x like an integer number

x min =0 2 N 1 m n x max = n = 2 2 2
resolution: 2-n

i.e. : Q16.16 N = 32 bit range: 0 (216 2-16 ) = 65535,99998 resolution: 3,05 10-5

Qm.n format: unsigned fixed point examples


Q6.2 : N = 8, range = 0 63,75, resolution: 0,25
Example1: 0x8A Example2: 53,68 (138)/22 = 34,5 53.68 22 = 214.72 = (round) 215 (11010111)[B2] = 0xD7 (215)/22 = 53,75

Example3: 0xD7

Twos complement representation


For non-negative x: use ordinary base-two representation with leading (sign) bit For negative x (x): 1 - Find 2s complement of x: C2(x) = 2N x 2 - Use base two representation to write this number Using N bit is possible to write all the number x that: - 2 N-1 x 2 N 1 1 Example: N = 4, x = -6 1 - C2(4) = 24 - 6 = 10 2 - 10 = (1010)[B2]
10000 1111 1110 1101 1000 0000 0001 0010 0111 24 = 16 -1 (16-1=15) -2 (16-2=14) -3 (16-3=13) -8 (16-8=8) 0 +1 +2 +7

MSB = 1

MSB = 0

Twos complement representation


Every arithmetic operations is like a hardware addition There is only one representation for the zero value From 2s compl. to dec base
N 2
b3 b2 b1 b0

x [B10] = 2

N 1

bN 1 + 2 bi
i i =0
bN-1 = 1

Example: N = 4, x = 1101 x[B2] = 1101 x[B10] = -8+5 = -3 x[B2] = 0101 x[B10] = 0+3 = 3

bN-1 = 0

1111 1110 1101 1000 0000 0001 0010 0111

-1 (=-8+7) -2 (=-8+6) -3 (=-8+5) -8 (=-8+0) 0 (=0+0) +1(=0+1) +2(=0+2) +7(=0+7)

Qm.n format: signed twos complement fixed point


sign bit

N = m + n + 1 bit
bn-1 ... b2 b1 b0

X[B2] = bn+m bn+m-1 . . . bn+1 bn

fractional part adjustment

x [B10]

N 2 1 = n 2 N 1 bN 1 + 2i bi 2 i =0

x like twos complement number

2 N 1 x min = n = 2 m 2

2 N 1 1 m n x max = =2 2 n 2
resolution: 2-n

i.e. : Q2.13 N = 16 bit range: -22 (22 2-13 ) = - 4 3,99987 resolution: 1,22 10-4

Float value
From float to Qm.m:

Qm.n value
[
N 2

x [B10] 2 n = 2 N 1 bN 1 + 2i bi
i =0

Qm.n representations bit

Qm.n is the 2s complement representation of the float value multiplied by 2n. Examples: Q4.4 N=9, resolution= 0.0625, range: -16 15,9375 9.6 9.6 24 =153.6 = (round) 154 = (0 1001 1010)[B2] -12,4 -12.4 24 = -198.4 = (round) -198 C2(198) = 29 -198 = 314 = (1 0011 1010)[B2]

2s c. Qm.n format examples


Q15.16 N=32, range = -32768 32767.99998 resolution: 1.525 10-5 Example1: -1265,337 -1265,337 216 = -82925125.632 = (round) - 82925126 C2(82925126) = 232 82925126 = 0xFB0EA9BA Example2: 0xFB0EA9BA=(1111 1011 0000 1110 1010 1001 1011 1010)[B2] = ( -231 + (2064558522) ) / 216 = -1265.337

Arithmetic rules for 2s c. fixed point


Addition: two numbers has to be represented in the same Qm.n format to be added, it means that the binary point must be aligned. Multiplication: Qm1.n1 * Qm2.n2 = Q (m1+m2+1).(n1+n2) (Most of DSP achieve an implicit shifting on the result) Resolution: is the smallest non-zero magnitude representable, = 2-n Accuracy: is the magnitude of the maximum difference between a real value and its representation, = 2-n/2 = 2-n-1