Converting floating-point applications to fixed-point

By Randy Allen
EEdesign.com
(09/24/04, 06:23:00 PM EDT)

Digital signal processors (DSPs) represent one of the fastest growing segments of the embedded world. Yet
despite their ubiquity, DSPs present difficult challenges for programmers. In particular, because
computation speed is critical to DSP applications, DSPs as a rule focus on supporting fixed-point
operations.

This focus means that programmers must not only deal with mathematically sophisticated applications, but
they must also deal with the errors introduced by performing these applications using reduced-precision
arithmetic. This type of error analysis is yet another subject to be mastered by DSP programmers.

While it would be ideal if programmers could avoid using fixed-point arithmetic, as a practical
consideration, they cannot. Fixed-point DSPs execute at gigahertz rates; floating-point DSPs peak out in
the 300-400 megahertz range.

Because fixed-point DSPs are consumed in large volumes, their price per chip is a fraction of the price of a
floating-point DSP. As a result, the only developers who can reasonably justify using a floating-point DSP
are those developing low-volume applications requiring high precision arithmetic.

While converting floating-point applications to fixed-point appears daunting, the task often suffers from
"fear of the unknown" syndrome. With knowledge of the issues, the right tools, and a well-thought out
development methodology, the conversion process is very manageable.

Without knowledge of the issues, the right tools, and a well-thought out development methodology,
floating-point applications can in fact suffer the same problems as fixed-point equivalents. The reduced
precision and range of fixed-point numbers means that fixed-point applications encounter stability
problems more easily than floating-point applications, but both types of applications encounter them.

Figure 1 — Ill-conditioned problem

To illustrate more concretely, consider the set of simultaneous linear equations in Figure 1. High school
algebra teaches one way of solving simultaneous equations: add and subtract multiples of the rows to
eliminate variables, one position at a time, until you reach an easily solved system (an upper triangular
matrix).

Solving the system this way yields a result of x = [-14 138 134]. Alternatively, you can compute the
left-inverse of the matrix and multiply that by the vector on the right. The analog to this solution in the
scalar world (that is, where the left operand is a scalar rather than a matrix) is computing 1/A (where A is
the left operand) and multiplying both sides of the equation by that quotient.

The left side is reduced to just x; the right side will be the solution. That method yields a solution of x =
[-41.75 27.0 -78.5] — very different from the first. This difference is not just a function of these two
methods; other methods will yield different answers as well. The difficulty is the problem itself.

Figure 1 illustrates an "ill-conditioned problem." Ill-conditioned problems are problems that are "close" or
"nearby" other problems whose solutions are not nearby the true mathematical solution of the original
problem. So, a slight change in the problem (as caused, for instance, by round-off error while solving) can
produce an answer that's nowhere close to the solution of the original problem.
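
The system in Figure 1 is not reproduced here, but a small C sketch of a deliberately ill-conditioned 2x2 system (an illustrative example, not the article's matrix) shows the effect: perturbing one right-hand side by one part in ten thousand moves the solution from (1, 1) to (0, 2).

/* A nearly singular 2x2 system solved by Cramer's rule. The tiny determinant
   magnifies any change in the inputs, including round-off error. */
#include <stdio.h>

static void solve2x2(double a, double b, double c, double d,
                     double e, double f, double *x, double *y)
{
    double det = a * d - b * c;   /* close to zero => ill-conditioned */
    *x = (e * d - b * f) / det;
    *y = (a * f - e * c) / det;
}

int main(void)
{
    double x, y;

    /* x + y = 2,  x + 1.0001*y = 2.0001  ->  x = 1, y = 1 */
    solve2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0001, &x, &y);
    printf("original:  x = %g, y = %g\n", x, y);

    /* Change the second right-hand side by 0.0001 -> x = 0, y = 2 */
    solve2x2(1.0, 1.0, 1.0, 1.0001, 2.0, 2.0002, &x, &y);
    printf("perturbed: x = %g, y = %g\n", x, y);
    return 0;
}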

Ill-conditioned problems require care to solve even when in double-precision arithmetic. The message of
ill-conditioned problems is that properties of a numerical algorithm require close study, regardless of
whether the algorithm is implemented in floating-point or in fixed-point. Fixed-point arithmetic by
definition has less precision and thereby more error than floating-point arithmetic. Less precision triggers
instability more easily, but instability can arise using floating-point or fixed-point.

Understanding how fixed-point arithmetic triggers stability issues requires a quick review of fixed-point
arithmetic. "Fixed-point" arithmetic is a phrase that encompasses three different forms of arithmetic
formats: integer arithmetic, fractional arithmetic, and mixed arithmetic. Integer arithmetic is the
fundamental building block for all other formats (including floating-point arithmetic). In integer arithmetic,
bits are interpreted as the 2's-complement representation of an integer. Figure 2 illustrates a 4-bit integer
number.

Figure 2 — Interpretation of bits in a 4-bit integer

Integers may be either signed or unsigned. In the case of an unsigned integer, the leading bit is not treated
as a sign bit; instead it carries the next higher power of 2 (in the 4-bit example of Figure 2, the leading
bit contributes +8 rather than -8).

The range of numbers which an integer can represent is dependent on the number of bits N in the integer.
For unsigned numbers, the smallest possible value is 0; the largest possible value is 2^N - 1. For signed
numbers, the smallest possible value is -2^(N-1); the largest is 2^(N-1) - 1. The smallest resolution for an
integer is one.
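
These ranges are easy to check with a few lines of C. The snippet below is only an illustration (the 4-bit width is taken from Figure 2); it prints the unsigned and signed ranges and shows how the same bit pattern reads differently under the two interpretations.

/* Integer ranges: unsigned 0 .. 2^N - 1, signed -2^(N-1) .. 2^(N-1) - 1 */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int bits = 4;                        /* the 4-bit example from Figure 2 */
    long umax = (1L << bits) - 1;        /* 2^N - 1   = 15 */
    long smin = -(1L << (bits - 1));     /* -2^(N-1)  = -8 */
    long smax = (1L << (bits - 1)) - 1;  /* 2^(N-1)-1 =  7 */

    printf("%d-bit unsigned: 0 .. %ld\n", bits, umax);
    printf("%d-bit signed:   %ld .. %ld\n", bits, smin, smax);

    /* The bit pattern 1101 is 13 when unsigned, but -8 + 4 + 1 = -3
       when read as a 2's-complement value. */
    uint8_t raw = 0xD;
    int s = (raw & 0x8) ? (int)raw - 16 : (int)raw;  /* sign-extend 4 bits */
    printf("bits 1101: unsigned = %u, signed = %d\n", (unsigned)raw, s);
    return 0;
}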

Integers are perfect for representing integral values, but most real world quantities require finer resolution.
That requirement drives the second format: fractional arithmetic. Fractional arithmetic uses the same
integer quantities, but interprets them as a fraction by assuming that the decimal point (technically, a
"binal point") is inside the quantity rather than at the right.

In the case of unsigned fractional numbers, the binal point is implicitly at the left (since there is no need
for a sign bit). Signed fractional numbers must reserve one bit to represent the sign, placing the binal point
one bit from the left. Figure 3 provides more details on the fractional representation.

Figure 3 — Interpretation of fractional numbers

Fractional numbers obviously provide a more nearly continuous set of values than integers, and thereby
provide more resolution. The resolution is not free; fractional numbers have a more limited dynamic range,
since fractions are limited to having an absolute value that is less than or equal to 1.
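
The following C sketch illustrates fractional arithmetic using the common Q15 convention (16-bit signed values with 15 fractional bits); the choice of Q15 and the helper names are illustrative assumptions, not something the article prescribes. It also shows the extra shift that fractional multiplication requires.

/* Q15 fractional arithmetic: a 16-bit signed integer interpreted as a
   value in [-1, 1) with the binal point just after the sign bit. */
#include <stdio.h>
#include <stdint.h>

#define Q15_ONE 32768.0                 /* 2^15 */

static int16_t to_q15(double x)         /* assumes -1.0 <= x < 1.0 */
{
    return (int16_t)(x * Q15_ONE);
}

static double from_q15(int16_t q)
{
    return (double)q / Q15_ONE;
}

/* The 32-bit product of two Q15 values has 30 fractional bits, so it must
   be shifted right by 15 to return to Q15 -- the "extra maneuvering" that
   fractional multiplication requires. */
static int16_t q15_mul(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a * (int32_t)b) >> 15);
}

int main(void)
{
    int16_t a = to_q15(0.5), b = to_q15(0.25);
    printf("0.5 * 0.25 = %f\n", from_q15(q15_mul(a, b)));   /* prints 0.125 */
    return 0;
}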

The obvious compromise is to combine the two into a mixed format. In such a format, the binal point is not
set at any specific location within the number; it is instead allowed to be anywhere within (or, for that
matter, outside) the number proper. This flexibility allows the programmer to trade resolution versus range
by choosing the binal point placement. Again, however, nothing is free. Generally, the programmer must
manually track the binal point location. While not difficult (particularly to those who grew up in the era of
slide rules), it is tedious.

Mixed format provides a balance between integers and fractionals; a programmer can trade bits to obtain
either more range or more resolution. The dynamic range depends on the number of integer bits as well as
the total number of bits in the quantity. For I integer bits, a signed mixed-mode number will range between
-2^(I-1) and 2^(I-1) - 2^-(N-I-1); an unsigned number will range between 0 and 2^I - 2^-(N-I).

Mixed mode incurs another cost. With the exception of fractional multiplication, which requires a little bit of
extra maneuvering, both integer and fractional arithmetic can be effected using standard integer
operations.

Mixed mode is not so straightforward. Adding a mixed mode number with 5 integer bits and 3 fractional
bits to a mixed mode number with 4 integer bits and 4 fractional bits requires shifting to align the binal
point prior to doing the addition proper. Forgetting to align (easy to do given that you have to manually
track the binal point) creates frustrating errors.
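
The sketch below works through exactly that alignment step, using the 5.3 and 4.4 formats just described; the 8-bit storage width and the decision to align to the 5.3 format are illustrative assumptions.

/* Adding a 5.3 number (5 integer bits, 3 fractional bits) to a 4.4 number.
   The raw integers are scaled differently (2^-3 vs 2^-4), so one operand
   must be shifted to align the binal points before the add. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    int8_t a_53 = (int8_t)(6.25   * 8);   /* 6.25   in 5.3 format: 50 */
    int8_t b_44 = (int8_t)(3.1875 * 16);  /* 3.1875 in 4.4 format: 51 */

    int wrong = a_53 + b_44;              /* scales mixed: meaningless 101 */

    /* Shift the 4.4 value right by 1 to align it with 5.3, then add.
       The result is a 5.3 value (the shifted-out bit is lost). */
    int right_53 = a_53 + (b_44 >> 1);

    printf("unaligned add = %d (wrong)\n", wrong);
    printf("aligned add   = %d -> %.3f (5.3 format)\n",
           right_53, right_53 / 8.0);     /* 75 -> 9.375 */
    return 0;
}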

Fixed-point processors gain speed and power efficiency over floating-point processors at the cost of
reduced precision, reduced range, and increased headaches resulting from manual binal point tracking.
Fixed-point requires that programmers find a fixed-point representation small enough to execute efficiently
(the fewer the bits, the more speed and power efficient the application) but that maintains enough range
and precision to execute successfully.

Note that "execute successfully" means more than just avoiding instabilities in the algorithm. Most signal
processing involves image files or sound files that are evaluated by human senses. A precision loss that is
not detectable by human senses is certainly acceptable; greater losses in precision may not be so.

Additionally, signal processing algorithms are mathematically complex, and need exploration and
understanding at the floating-point level. Once the floating-point properties are understood and a stable
algorithm has been created, an understanding of the data is required.

Balancing range and resolution in the fixed-point format requires a thorough knowledge of the range of the
data. Once those characteristics are understood, a prototype implementation is next. Issues such as
getting the proper shifts for addition of fixed-point numbers and tracking the binal point of mixed-mode
numbers are not difficult, but they are tedious.
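
One common way to gather that range knowledge (a sketch of general practice, not a method the article specifies) is to instrument the floating-point C model so that each signal of interest records its observed minimum and maximum during simulation; the number of integer bits then follows from the largest observed magnitude.

/* Range instrumentation for a floating-point C model: wrap interesting
   signals in observe() during simulation, then size the fixed-point
   integer field from the recorded min/max. */
#include <math.h>
#include <stdio.h>

typedef struct {
    const char *name;
    double min, max;
} range_t;

static void range_init(range_t *r, const char *name)
{
    r->name = name;
    r->min = INFINITY;
    r->max = -INFINITY;
}

static double observe(range_t *r, double x)
{
    if (x < r->min) r->min = x;
    if (x > r->max) r->max = x;
    return x;
}

static void report(const range_t *r)
{
    /* Integer bits (excluding sign) needed to cover the observed magnitude. */
    double mag = fmax(fabs(r->min), fabs(r->max));
    int ibits = (mag > 0.0) ? (int)ceil(log2(mag + 1.0)) : 0;
    printf("%s: [%g, %g] -> about %d integer bits\n", r->name, r->min, r->max, ibits);
}

int main(void)
{
    range_t acc;
    range_init(&acc, "filter_acc");
    for (int i = 0; i < 1000; i++)
        observe(&acc, 20.0 * sin(i * 0.01));  /* stand-in for the real model */
    report(&acc);                             /* roughly [-20, 20] -> 5 bits */
    return 0;
}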

Tracking these issues through a detailed implementation is difficult; it is much better to start at a higher
level that has less detail. In other words, time spent speeding up an incorrect algorithm is not time
productively spent.

These requirements have led to a fairly uniform methodology for DSP applications, illustrated in Figure 4.

Figure 4 — DSP application development flow

While not universal, this flow is the choice of a large number of DSP development teams. Initial prototyping
is done in floating point in M, the language of the Matlab interpreter. M is an excellent language for high
level prototyping and exploration.

After initial exploration and prototyping, the algorithm is manually converted into a floating-point C
application. The conversion is necessary because the next step involves understanding the data, which in
turn requires a large amount of simulation.

The Matlab interpreter, like any interpreter, is slow compared to compiled code. It cannot provide the simulation
speed necessary to simulate large computations. Since C is compiled, it provides the speed required to do
enough simulation to understand the fixed-point precision requirements.

Once these characteristics are understood, a third manual conversion — this time to fixed-point C — is
undertaken. C provides some support in terms of tracking binal points and getting fixed-point results, and
all DSPs provide support for C compilers. C is not an ideal language in that it does not directly support
some features, such as saturation arithmetic (C integer arithmetic is modulo by default), that are required on
DSPs, but it is a much easier development language than assembly. This fixed-point model then serves as
the golden reference for the final implementation.
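
As a sketch of the saturation behavior that plain C lacks, the routine below clamps a 16-bit addition to the representable range instead of letting it wrap; many DSP compilers expose intrinsics for this, so the portable version here is only illustrative.

/* Saturating 16-bit add: widen to 32 bits so the true sum is exact,
   then clamp to the int16_t range rather than wrapping. */
#include <stdio.h>
#include <stdint.h>

static int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;
    if (sum > INT16_MAX) return INT16_MAX;
    if (sum < INT16_MIN) return INT16_MIN;
    return (int16_t)sum;
}

int main(void)
{
    /* 30000 + 10000 overflows 16 bits: the wrapping cast gives a large
       negative number, the saturating add pins the result at 32767. */
    printf("wrapped:   %d\n", (int)(int16_t)(30000 + 10000));
    printf("saturated: %d\n", sat_add16(30000, 10000));
    return 0;
}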

This development methodology, while common, is far from ideal. The same application is written by hand
four times. Because the initial development in M is in floating point, getting reference results for the final
implementation is difficult. Changes in specification required from fixed-point problems are not uncovered
until the third model is developed; feeding those changes back into the beginning of the development cycle
is time-consuming, painful, and error-prone.

Because there are 4 different models written in 4 different languages, communication is difficult. Basic C
provides little support in terms of tracking binal points; C++ can help this, but at a price — the resulting
implementation code is less efficient. Even with these flaws, this methodology is a proven way of
developing DSP applications.

Technology advances can eliminate many of these flaws. At least two companies now provide fixed-point
support for M. This support allows the exploration and graphical facilities of M to be applied to fixed-point
programs in addition to floating-point programs, making it easier to diagnose reduced-precision problems.

M also enables automatic tracking of binal points, eliminating a major headache. The result of such
exploration can be a golden M model that produces bitwise identical results to the application running on a
DSP.

Unfortunately, fixed-point support in M is not enough to solve all the fixed-point exploration problems. As
mentioned previously, simulation is important for verifying that a quantization meets the "human
perception" test. Interpreted execution cannot provide enough simulation cycles to perform that test.

Compiled simulation is a fundamental technology requirement. Similarly, you cannot explore fixed-point
characteristics without knowing the types of the various operands and operations (that is, whether they are
fixed-point or floating-point), again a capability not currently supported in M.

With these technical advances, however, the design flow illustrated in Figure 5 becomes feasible. In such a
flow, all exploration — floating-point and fixed-point — is performed in M, resulting in one golden M model
that provides results that are bitwise identical to those arising from the implementation. Compilation
technology can then automatically convert that fixed-point model into an implementation-level model
tuned to a particular processor.

Figure 5 — Improved design flow

Using a fixed-point DSP for a signal processing application is more difficult than using a floating-point DSP,
but not that much more difficult. Running an application in fixed-point is not such a fearful event,
particularly if the proper analysis has been performed to verify the floating-point algorithm up front. When
this analysis is extended to fixed-point, fears of overflows and underflows prove largely unfounded.

Given the cost and performance advantages of fixed-point DSPs, there are strong, rational reasons for
converting to fixed-point processors that outweigh those unfounded numerical fears. And, as chips
decrease in size and correspondingly increase in speed, the "application" portion of Application Specific
Integrated Circuits (ASICs) is becoming far more sophisticated mathematically. Increased mathematical
sophistication means increased exploration and verification at both floating-point and fixed-point
representations. This will expand the need for improved methodologies in ASIC and FPGA development.

Randy Allen, Ph.D., is founder and CEO of startup Catalytic Inc., which provides a tool that helps with
floating-point to fixed-point conversion. Allen has more than 20 years of software and compiler
development experience in startup companies, advanced technology divisions, and research organizations.
His previous management experience includes serving as vice president of engineering at CynApps, vice
president of performance engineering at Chronologic Simulation, executive director of software
development at Kubota/Ardent Computer, and director in the Advanced Technology Group at Synopsys.
