Developing Fixed-Point Algorithms for DSP

Fixed-Point Algorithm Development
Azadeh Haghparast, Henri Penttinen, and Antti Huovilainen

Laboratory of Acoustics and Audio Signal Processing,
Helsinki University of Technology
21.04.2006
1 Introduction
This document presents the basic rules and guidelines for the manipulation of fixed-point binary
numbers, which are found in DSPs and hardware components. Implementing an algorithm in fixed-point
arithmetic considerably improves execution speed and decreases power consumption. However,
these improvements come at the cost of reduced dynamic range and accuracy of the variables, and of extra
programming effort. The texts of Randy Yates [1, 2] served as inspirational and supportive references
for this article; the rest of the content is based on practical experience.
2 Fixed-Point Representation of Numbers
Fixed-point numbers are represented either as integers or as fractions between -1.0 and +1.0. The
development of fixed-point programs requires scaling of variables to prevent overflows while maintaining
accuracy. A fixed-point number is characterized by the following:
1. The word length in bits
2. The position of the binary point
3. Whether it is signed or unsigned
The position of the binary point is the means by which fixed-point values are scaled. Signed
numbers represent both negative and positive values, whereas unsigned numbers represent
non-negative values only.
The position of the binary point is given by two numbers a and b, where a is the number of bits to
the left of the binary point, called the integer word length, and b is the number of bits to the right
of the binary point, called the fraction word length. For unsigned numbers, the total number of bits is
N = a + b. For signed fixed-point numbers, the first (leftmost) bit represents the sign and is counted
separately, so that N = a + b + 1 (Section 2.2); unsigned fixed-point numbers have no sign bit.
Note: Randy Yates' DSP page: http://home.earthlink.net/~yatescr/dsp.htm
2.1 Unsigned Fixed-Point Numbers
Unsigned fixed-point numbers are used to represent numbers greater than or equal to zero. The value of
an unsigned N-bit binary number, x, with an integer word length of a and a fraction word length of b is
given by
$$x = \frac{1}{2^b} \sum_{n=0}^{N-1} 2^n x_n, \qquad (1)$$
where $x_n$ is bit n of x. Therefore, the range of an unsigned number is between 0 and
$$\frac{2^N - 1}{2^b} = 2^a - 2^{-b}.$$
Note that a and b can have negative values to represent very small or very large values.
Example 1 Unsigned fixed-point number with
N = 8
a = 4
b = 4
$$x = (10001010\mathrm{b}) = \frac{1}{2^4} \sum_{n=0}^{7} 2^n x_n$$
$$x = \frac{1}{2^4} (2^1 + 2^3 + 2^7) = 8.625$$
Example 2 Unsigned fixed-point number with
N = 16
a = -2
b = 18
x = 04BC(hex) = (0000010010111100b)
$$x = \frac{1}{2^{18}} (2^2 + 2^3 + 2^4 + 2^5 + 2^7 + 2^{10}) \approx 0.004623413085938$$
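Equation (1) and the range expression can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names unsigned_value and unsigned_range are hypothetical and chosen only for illustration.

```python
def unsigned_value(word, n_bits, b):
    """Value of an unsigned n_bits-wide word with b fraction bits (Eq. 1)."""
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in n_bits")
    # Sum 2**n over all bits x_n that are set, then scale by 2**-b.
    total = sum(1 << n for n in range(n_bits) if (word >> n) & 1)
    return total * 2.0 ** -b


def unsigned_range(n_bits, b):
    """Smallest and largest representable values: 0 and (2**N - 1) / 2**b."""
    return 0.0, ((1 << n_bits) - 1) * 2.0 ** -b


print(unsigned_value(0b10001010, 8, 4))   # Example 1
print(unsigned_value(0x04BC, 16, 18))     # Example 2 (a = -2, b = 18)
print(unsigned_range(8, 4))
```

Because b is passed independently of the word length, the same function covers formats with negative a, as in Example 2.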
2.2 Signed Fixed-Point Numbers
Signed fixed-point numbers are used to represent both positive and negative numbers. The first
(leftmost) bit is used for the sign of the number. Therefore, for an N-bit number with a
fraction word length of b, the integer word length is equal to a = N - b - 1.
The value of a given N-bit signed number, x, is given by
$$x = \frac{1}{2^b} \left[ -2^{N-1} x_{N-1} + \sum_{n=0}^{N-2} 2^n x_n \right], \qquad (2)$$
where $x_n$ is bit n of x. The range of an N-bit signed number is between $-2^{N-1-b}$ and
$2^{N-1-b} - 2^{-b}$.
Example 3 The range of a signed fixed-point number with
N = 16
a = 13
b = 2
The range is from $-2^{13} = -8192$ to $2^{13} - 2^{-2} = 8191.75$.
Example 4 The range and value of a signed fixed-point number with
N = 16
a = 6
b = 9
x = (1000010111000001b)
The range is from $-2^{6} = -64$ to $2^{6} - 2^{-9} = 63.998046875$.
$$x = \frac{1}{2^9} (-2^{15} + 2^0 + 2^6 + 2^7 + 2^8 + 2^{10}) = -61.123046875$$
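The signed case can be checked in the same way. The sketch below evaluates Eq. (2) for a two's complement word and computes the range; the names signed_value and signed_range are hypothetical helpers introduced here, not taken from the original text.

```python
def signed_value(word, n_bits, b):
    """Value of an n_bits two's complement word with b fraction bits (Eq. 2)."""
    if not 0 <= word < (1 << n_bits):
        raise ValueError("word does not fit in n_bits")
    sign_bit = (word >> (n_bits - 1)) & 1
    lower_bits = word & ((1 << (n_bits - 1)) - 1)
    # The sign bit carries the weight -2**(N-1); all other bits are positive.
    return (lower_bits - (sign_bit << (n_bits - 1))) / 2.0 ** b


def signed_range(n_bits, b):
    """Range -2**(N-1-b) ... 2**(N-1-b) - 2**-b of a signed format."""
    lowest = -(1 << (n_bits - 1)) / 2.0 ** b
    highest = ((1 << (n_bits - 1)) - 1) / 2.0 ** b
    return lowest, highest


print(signed_range(16, 2))                          # Example 3
print(signed_value(0b1000010111000001, 16, 9))      # Example 4
print(signed_range(16, 9))
```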
2.3 Precision of Fixed-Point Number
Precision of a fixed-point number is equal to the word length. For example, the precision of a 16-bit
signed number is 16 bits.
2.4 Resolution of a Fixed-Point Number
Resolution is the smallest non-zero magnitude that can be represented. For example, for a 16-bit
signed number with a = 11 and b = 4, the resolution is equal to $\frac{1}{2^b} = \frac{1}{2^4} = 0.0625$.
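To make the role of the resolution concrete, the following sketch rounds a real number to the nearest representable value and compares the error with half the resolution. It assumes round-to-nearest quantization, which the text does not prescribe; resolution and quantize are hypothetical names.

```python
def resolution(b):
    """Smallest non-zero magnitude for a fraction word length of b."""
    return 2.0 ** -b


def quantize(value, b):
    """Integer word closest to value * 2**b (round to nearest, an assumption)."""
    return round(value * 2.0 ** b)


b = 4
word = quantize(0.1, b)                 # nearest multiple of 2**-4
approx = word * resolution(b)
print(resolution(b), word, approx, abs(approx - 0.1))
```

With rounding to nearest, the quantization error is at most half the resolution as long as the value lies inside the representable range.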
3 Fixed-point Arithmetic
The arithmetic operations are addition, subtraction, multiplication, and division, among which we focus
on addition and multiplication, since subtraction is the inverse of addition and division is equivalent
to multiplication by the multiplicative inverse. Shifting is used within addition and multiplication
to align and reposition the binary point, and is therefore explained first. For simplicity, a fixed-point
number with an integer word length of a and a fraction word length of b is written as X(a, b) in this
section.
3.1 Shifting
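A shift operation is equivalent to multiplication or division by a power of two. Shifting is also used to
displace the position of the binary point. Displacement of the binary point is performed in addition and
multiplication operations, as explained in the following sections. A shift can be performed either to the
right or to the left. The number of shifts must be a positive integer. Shifts to the right and to the
left are indicated by >> and <<, respectively. If n is the number of shifts, we have
$$X(a, b) >> n = X(a - n, b + n) \qquad (3)$$
$$X(a, b) << n = X(a + n, b - n). \qquad (4)$$
Equations (3) and (4) describe the scaling view, in which the stored bit pattern is left untouched and
only the assumed position of the binary point changes: the represented value is divided by $2^n$ in (3)
and multiplied by $2^n$ in (4). When the bits are physically shifted and the value is to be preserved
instead, the format changes in the opposite direction, as in Example 6.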
Example 5 Consider an unsigned 8-bit number, x
N = 8
a = 4
b = 4
$$x = (01010100) = \frac{1}{2^4} (2^6 + 2^4 + 2^2) = 5.25$$
$$x_1 = (01010100) >> 2 = (00010101)$$
$x_1$ is the number obtained by shifting x two bits to the right. This shift can be interpreted in two
different ways. If we assume that the position of the binary point is fixed, shifting scales the number.
Therefore, $x_1$ is equal to $\frac{1}{2^4} (2^4 + 2^2 + 1) = 1.3125$, which means division by $2^2$.
Example 6 However, shifting can also be interpreted as displacing the binary point. In this view, a
2-bit shift to the right is equivalent to moving the binary point 2 bits to the right. Therefore, we
obtain the same value in a different format (no accuracy is lost here because the two bits shifted out
are zero).
N = 8
$a_1 = 6$
$b_1 = 2$
$$x_1 = (00010101) = \frac{1}{2^2} (2^4 + 2^2 + 1) = 5.25$$
This interpretation is useful when performing addition and multiplication operations.
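The two interpretations of Examples 5 and 6 can be reproduced with a short Python sketch. The helper name value is hypothetical; it simply divides an integer word by 2**b.

```python
def value(word, b):
    """Value of an unsigned word whose fraction word length is b."""
    return word / 2.0 ** b


x = 0b01010100          # X(4, 4), value 5.25
x1 = x >> 2             # bit pattern 00010101

# Interpretation 1 (Example 5): binary point stays put, value is divided by 2**2.
scaled = value(x1, 4)

# Interpretation 2 (Example 6): binary point moves with the bits, format X(6, 2).
same_value = value(x1, 2)

print(format(x1, "08b"), scaled, same_value)
```

The shifted word is identical in both cases; only the fraction word length assumed when reading it differs.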
3.2 Addition
In order to add two fixed-point numbers, they must be scaled to the same format; that is, X(a, b) + Y(c, d)
gives a correct result only if a = c and b = d. Therefore, the numbers must first be shifted to have
the same binary point position and then added. In general, the result of the addition needs N + 1 bits,
where N is the number of bits in each operand: X(a, b) + Y(a, b) = Z(a + 1, b).
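The alignment step can be sketched as follows. This is one possible approach under stated assumptions: the operand with fewer fraction bits is shifted left, which loses no bits in Python's unbounded integers but could overflow on fixed-width hardware. The function name add_aligned is hypothetical.

```python
def add_aligned(x, bx, y, by):
    """Add integer words x (bx fraction bits) and y (by fraction bits).

    Both operands are brought to the larger fraction word length before
    the addition. Returns the sum word and its fraction word length.
    """
    b = max(bx, by)
    return (x << (b - bx)) + (y << (b - by)), b


x = 0b01010100      # 5.25 in X(4, 4)
y = 0b00001010      # 2.5  in X(6, 2)
word, b = add_aligned(x, 4, y, 2)
print(word, b, word / 2.0 ** b)
```

Aligning to the smaller fraction word length by shifting right is the alternative when the word length is fixed; it keeps the integer range but discards low-order bits.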
3.3 Multiplication
For multiplication, both numbers must be either signed or unsigned. In unsigned multiplication, the
number of bits in the result is equal to the sum of the numbers of bits of the operands.
$$X(a_1, b_1) \times Y(a_2, b_2) = Z(a_1 + a_2, b_1 + b_2) \qquad (5)$$
For signed multiplication, the result is as follows:
$$X(a_1, b_1) \times Y(a_2, b_2) = Z(a_1 + a_2 + 1, b_1 + b_2) \qquad (6)$$
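As a small numerical check of the multiplication rule, the sketch below multiplies two signed 8-bit words and reads the product with the summed fraction word length. Python integers are used directly as signed values, which is an assumption of this sketch; multiply and fits_signed are hypothetical names.

```python
def multiply(x, bx, y, by):
    """Multiply two fixed-point words; the fraction word lengths add up."""
    return x * y, bx + by


def fits_signed(word, n_bits):
    """True if word is representable as an n_bits two's complement number."""
    return -(1 << (n_bits - 1)) <= word < (1 << (n_bits - 1))


x, bx = 24, 4       # 1.5 as a signed 8-bit word, X(3, 4)
y, by = -36, 4      # -2.25 as a signed 8-bit word, X(3, 4)
p, bp = multiply(x, bx, y, by)

# The product is a signed 16-bit word with 3 + 3 + 1 = 7 integer bits
# and 4 + 4 = 8 fraction bits.
print(p, bp, p / 2.0 ** bp, fits_signed(p, 16))
```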
Table 1: The formats of signed 16-bit fixed-point numbers and the minimum and maximum values they
can represent.

Format   Maximum positive value (0x7FFF), decimal   Most negative value (0x8000), decimal
1.15     0.999969482421875                          -1.0
2.14     1.99993896484375                           -2.0
3.13     3.9998779296875                            -4.0
4.12     7.999755859375                             -8.0
5.11     15.99951171875                             -16.0
6.10     31.9990234375                              -32.0
7.9      63.998046875                               -64.0
8.8      127.99609375                               -128.0
9.7      255.9921875                                -256.0
10.6     511.984375                                 -512.0
11.5     1023.96875                                 -1024.0
12.4     2047.9375                                  -2048.0
13.3     4095.875                                   -4096.0
14.2     8191.75                                    -8192.0
15.1     16383.5                                    -16384.0
16.0     32767                                      -32768.0
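The entries of Table 1 follow directly from dividing the extreme 16-bit words by 2 to the power of the fraction word length. The sketch below regenerates them; the function name q16_limits is hypothetical, and m denotes the first number of the format, which counts the sign bit.

```python
def q16_limits(m):
    """Largest and most negative value of the signed 16-bit format m.n, n = 16 - m."""
    n = 16 - m
    return 0x7FFF / 2.0 ** n, -0x8000 / 2.0 ** n


for m in range(1, 17):
    highest, lowest = q16_limits(m)
    print(f"{m}.{16 - m}", highest, lowest)
```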
4 Fixed-Point Implementation of FIR Filter
In this section, we clarify fixed-point algorithm development with a simple example. The input
signal is received from the audio input, passed through an FIR high-pass filter, and played back via the
audio output.
Signed 16-bit fixed-point numbers have 16 different formats. Table 1 shows these formats and the
minimum and maximum values which they can represent. The format notation expresses the division
between the number of integer bits and fractional bits; in this notation the sign bit is counted among
the integer bits. For example, the format 1.15 has one integer bit (the sign bit) and 15 bits for the
fractional part of the number. Format 3.13 means that 3 is the number of integer bits and 13 is the
number of fractional bits.
Consider an FIR filter with four coefficients (order 3)
$c_0 = -1.4356078$
$c_1 = +3.0203450$
$c_2 = +0.4567123$
$c_3 = +0.0005642.$
According to the structure of an FIR filter, for each output sample we need to do four multiplications
and three additions.
$$y(n) = c_0 x(n) + c_1 x(n-1) + c_2 x(n-2) + c_3 x(n-3). \qquad (7)$$
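Before moving to fixed-point arithmetic, it can help to have a floating-point reference of Eq. (7) to compare against. The following is a minimal sketch; the function name fir and the assumption that samples before the start of the signal are zero are not taken from the original text.

```python
COEFFS = [-1.4356078, 3.0203450, 0.4567123, 0.0005642]


def fir(coeffs, x):
    """Direct-form FIR filter: y(n) = sum over k of c_k * x(n - k)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:              # samples before the start count as zero
                acc += c * x[n - k]
        y.append(acc)
    return y


# A unit impulse returns the coefficients themselves.
print(fir(COEFFS, [1.0, 0.0, 0.0, 0.0, 0.0]))
```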
In this example, all the input samples have the format 1.15, and the output samples should have
the same format. We now need to convert the coefficients to integer values with the right format.
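In order to convert a number to its fixed-point format, we first determine the format with the finest
resolution from the table by finding the range in which the number lies. The fixed-point version of
the number, stored as a 16-bit two's complement word, is then obtained as follows:
$$x = \begin{cases} \text{number} \cdot 2^{b} & \text{if the number is positive,} \\ 2^{16} + \text{number} \cdot 2^{b} & \text{if the number is negative,} \end{cases} \qquad (8)$$
where b is the fraction word length of the chosen format (b = 15 for the format 1.15) and the result is
rounded to the nearest integer.
The coefficient with the largest magnitude, 3.0203450, lies between 2 and 4. Therefore, the most
suitable format for the filter coefficients in this example is 3.13. The fraction word length can also
be obtained by
$$b = \left\lfloor \log_2 \left( \frac{2^{M-1}}{\max(|c_i|)} \right) \right\rfloor, \qquad (9)$$
where M is the word length, $c_i$ are the filter coefficients, and $\lfloor x \rfloor$ denotes the
greatest integer less than or equal to x.
$$b = \left\lfloor \log_2 \left( 2^{16-1} / 3.0203450 \right) \right\rfloor = \lfloor 13.4 \rfloor = 13. \qquad (10)$$
According to the format 3.13, that is, with a scale factor of $2^{13} = 8192$, the filter coefficients are
$c_0 = -11760$ (stored as $2^{16} - 11760 = 53776$)
$c_1 = 24743$
$c_2 = 3741$
$c_3 = 5.$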
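The coefficient conversion can be sketched in Python as follows. The sketch assumes signed 16-bit storage, a scale factor of 2**b with b taken from Eq. (9) using 2**(M-1) in the numerator, and rounding to nearest; fraction_bits, quantize and twos_complement are hypothetical names.

```python
import math

COEFFS = [-1.4356078, 3.0203450, 0.4567123, 0.0005642]


def fraction_bits(coeffs, word_length=16):
    """Largest fraction word length that keeps every coefficient in range."""
    largest = max(abs(c) for c in coeffs)
    return math.floor(math.log2(2 ** (word_length - 1) / largest))


def quantize(c, b, word_length=16):
    """Signed integer word for coefficient c with b fraction bits."""
    q = round(c * 2 ** b)
    if not -(1 << (word_length - 1)) <= q < (1 << (word_length - 1)):
        raise OverflowError("coefficient does not fit in the chosen format")
    return q


def twos_complement(q, word_length=16):
    """Unsigned bit pattern that stores the signed word q."""
    return q & ((1 << word_length) - 1)


b = fraction_bits(COEFFS)
words = [quantize(c, b) for c in COEFFS]
print(b, words, [hex(twos_complement(q)) for q in words])
```

The range check in quantize guards against a format that is too narrow for the coefficient, in which case the word would wrap around.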
Next, these coefficients are multiplied by the input, which has the format 1.15. The results of the
multiplications have the following format:
$N_{result} = 16 + 16 = 32$
$b_{result} = 13 + 15 = 28$
$a_{result} = N_{result} - b_{result} = 32 - 28 = 4$
Here $a_{result}$ counts the sign bit, as in Table 1, so the products have the format 4.28. Equation (6)
gives the same result: without the sign bits the operands have 2 and 0 integer bits, so the product has
2 + 0 + 1 = 3 integer bits plus one sign bit. Before adding up the products, we can change the format
from 4.28 to 4.12. This is done by shifting 16 bits to the right and taking the lowest 16 bits.
The third step is to add the multiplication results. The result of the addition has the format 4.12. To
convert it to the format 1.15, it is shifted 3 bits to the left, which increases the fraction word length
from 12 to 15. The three integer bits shifted out must not carry information, that is, the sum must lie
within the range of the format 1.15; otherwise the output overflows.
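The three steps can be put together in a short integer-only sketch. It is an illustration under stated assumptions, not the authors' implementation: the coefficients are quantized with a scale factor of 2**13 (format 3.13) and rounding to nearest, Python integers stand in for the 16-bit and 32-bit registers, and the final result is saturated to the 16-bit range. The names fir_q15 and saturate16 are hypothetical.

```python
COEFFS = [-1.4356078, 3.0203450, 0.4567123, 0.0005642]
COEFF_FRAC_BITS = 13                              # format 3.13 (assumed scale 2**13)
COEFFS_Q = [round(c * 2 ** COEFF_FRAC_BITS) for c in COEFFS]


def saturate16(v):
    """Clamp v to the signed 16-bit range instead of letting it wrap."""
    return max(-32768, min(32767, v))


def fir_q15(x_q15):
    """Filter a list of 1.15 samples (signed 16-bit integers); returns 1.15 samples."""
    y = []
    for n in range(len(x_q15)):
        acc = 0                                   # running sum, format 4.12
        for k, c in enumerate(COEFFS_Q):
            if n - k >= 0:
                product = c * x_q15[n - k]        # 32-bit product, format 4.28
                acc += product >> 16              # drop 16 fraction bits: format 4.12
        y.append(saturate16(acc << 3))            # format 1.15
    return y


x = [round(0.1 * 2 ** 15), 0, 0, 0]               # impulse of height 0.1
y = fir_q15(x)
print(y)
print([v / 2 ** 15 for v in y])                   # roughly 0.1 times the coefficients
```

Truncating each product to 4.12 before the summation follows the order of the steps in the text; accumulating the full 4.28 products and shifting once at the end would lose less accuracy at the cost of a wider accumulator.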
References
[1] Randy Yates, Fixed-point arithmetic: An introduction, URL:
http://home.earthlink.net/~yatescr/fp.pdf, 2001.
[2] Randy Yates, Practical considerations in fixed-point FIR filter implementation, URL:
http://home.earthlink.net/~yatescr/fir.pdf, 2003.