You are on page 1of 11

# Fixed point tutorial

Gianluca Biccari
(Email: gb@gianlucabiccari.com)

Binary Format
N binary digits (bit) has 2N states. There is no an unique meaning inherent in a binary word, it depends on the representation and mapping choose.

Most of the time people are used to think of binary word as a positive integers but in digital signal processing real numbers are required!

## Real number in DSP

Fixed point representation: every word has the same number of digits and the binary point is always fixed at the same position.
N bit whole part fractional part

binary point

Qm.n format: m bit for whole part, n bit for fractional part. Floating point representation: the position of the point must be determined at processing time. These calculations are much slower than fixed point. The reason for using floating point representation is that the range of possible values is much greater.

## Qm.n format: unsigned fixed point

N = m + n bits
X[B2] = bn+m-1 . . . bn+1 bn bn-1
N 1

...

b2 b1 b0

1 x [B10] = n 2

2 b
i i =0

## x like an integer number

x min =0 2 N 1 m n x max = n = 2 2 2
resolution: 2-n

i.e. : Q16.16 N = 32 bit range: 0 (216 2-16 ) = 65535,99998 resolution: 3,05 10-5

## Qm.n format: unsigned fixed point examples

Q6.2 : N = 8, range = 0 63,75, resolution: 0,25
Example1: 0x8A Example2: 53,68 (138)/22 = 34,5 53.68 22 = 214.72 = (round) 215 (11010111)[B2] = 0xD7 (215)/22 = 53,75

Example3: 0xD7

## Twos complement representation

For non-negative x: use ordinary base-two representation with leading (sign) bit For negative x (x): 1 - Find 2s complement of x: C2(x) = 2N x 2 - Use base two representation to write this number Using N bit is possible to write all the number x that: - 2 N-1 x 2 N 1 1 Example: N = 4, x = -6 1 - C2(4) = 24 - 6 = 10 2 - 10 = (1010)[B2]
10000 1111 1110 1101 1000 0000 0001 0010 0111 24 = 16 -1 (16-1=15) -2 (16-2=14) -3 (16-3=13) -8 (16-8=8) 0 +1 +2 +7

MSB = 1

MSB = 0

## Twos complement representation

Every arithmetic operations is like a hardware addition There is only one representation for the zero value From 2s compl. to dec base
N 2
b3 b2 b1 b0

x [B10] = 2

N 1

bN 1 + 2 bi
i i =0
bN-1 = 1

Example: N = 4, x = 1101 x[B2] = 1101 x[B10] = -8+5 = -3 x[B2] = 0101 x[B10] = 0+3 = 3

bN-1 = 0

## Qm.n format: signed twos complement fixed point

sign bit

N = m + n + 1 bit
bn-1 ... b2 b1 b0

## fractional part adjustment

x [B10]

N 2 1 = n 2 N 1 bN 1 + 2i bi 2 i =0

## x like twos complement number

2 N 1 x min = n = 2 m 2

2 N 1 1 m n x max = =2 2 n 2
resolution: 2-n

i.e. : Q2.13 N = 16 bit range: -22 (22 2-13 ) = - 4 3,99987 resolution: 1,22 10-4

Float value
From float to Qm.m:

Qm.n value
[
N 2

x [B10] 2 n = 2 N 1 bN 1 + 2i bi
i =0

## Qm.n representations bit

Qm.n is the 2s complement representation of the float value multiplied by 2n. Examples: Q4.4 N=9, resolution= 0.0625, range: -16 15,9375 9.6 9.6 24 =153.6 = (round) 154 = (0 1001 1010)[B2] -12,4 -12.4 24 = -198.4 = (round) -198 C2(198) = 29 -198 = 314 = (1 0011 1010)[B2]

## 2s c. Qm.n format examples

Q15.16 N=32, range = -32768 32767.99998 resolution: 1.525 10-5 Example1: -1265,337 -1265,337 216 = -82925125.632 = (round) - 82925126 C2(82925126) = 232 82925126 = 0xFB0EA9BA Example2: 0xFB0EA9BA=(1111 1011 0000 1110 1010 1001 1011 1010)[B2] = ( -231 + (2064558522) ) / 216 = -1265.337

## Arithmetic rules for 2s c. fixed point

Addition: two numbers has to be represented in the same Qm.n format to be added, it means that the binary point must be aligned. Multiplication: Qm1.n1 * Qm2.n2 = Q (m1+m2+1).(n1+n2) (Most of DSP achieve an implicit shifting on the result) Resolution: is the smallest non-zero magnitude representable, = 2-n Accuracy: is the magnitude of the maximum difference between a real value and its representation, = 2-n/2 = 2-n-1