
Some advantages of digital signal processors (DSP) are given below:

• In DSP, digital systems can be cascaded without any loading problems.
• Digital circuits can be reproduced easily in large quantities at comparatively low cost.
• Digital circuits are less sensitive to tolerances of component values.
• Signals are easily transported, because digital signals can be processed off-line.
• Digital signal processing operations can be changed simply by changing the program in a digital programmable system.
• Digital systems offer better control of accuracy compared to analog systems.
• Sophisticated signal processing algorithms can be implemented by DSP methods.
• Digital signals are easily stored on magnetic media such as magnetic tape without loss of quality in reproduction of the signal.

Some disadvantages of digital signal processors (DSP) are given below:

• Digital communications require a greater bandwidth than analogue to transmit the same information.
• The detection of digital signals requires the communications system to be synchronized, whereas, generally speaking, this is not the case with analogue systems.

What is DSP?

Introduction:
The DSP is a microprocessor chip with an architecture optimized to process complex algorithms at high speed. A DSP operates on digital signals; hence all real-world signals to be processed are first converted to digital form with the help of an ADC (Analog to Digital Converter). Once the processing is completed, the digital signal is converted back to analog form using a DAC (Digital to Analog Converter) as per requirements.
DSP chips are made by many manufacturers; popular among them are TI, CEVA, Analog Devices, ARM, Intel, Freescale, Xilinx etc. Each manufacturer has its own hardware architecture and software instructions to support complex algorithms. Figure 1 depicts the typical components in a DSP chip.

Benefits or advantages of DSP

Following are the benefits or advantages of DSP:

➨DSP offers very high accuracy. Hence filters designed in DSP have tighter control on output accuracy compared to analog filters.

➨The digital implementation is cheaper compared to its analog counterpart.


➨Reconfiguration is difficult in an analog system, as the entire hardware and its components need to be changed. In DSP, by contrast, reconfiguration is very easy: only the code or DSP program needs to be re-flashed after the changes are incorporated as per requirements.

➨The DSP offers various interface types such as UART, I2C etc., which help in interfacing other ICs with the DSP.
➨A DSP can be interfaced with an FPGA. This combination helps in the design of the protocol stack of an entire wireless system such as WiMAX, LTE etc. In this type of architecture, some modules are ported on the FPGA and others on the DSP, as per latency requirements.

Drawbacks or disadvantages of DSP

Following are the disadvantages of DSP:

➨The use of DSP requires an anti-aliasing filter before the ADC and a reconstruction filter after the DAC. Moreover, it requires ADC and DAC modules. These extra components increase the complexity of DSP-based hardware.
➨A DSP processes signals at high speed and contains more internal hardware resources. Due to this, a DSP dissipates more power compared to analog signal processing. Analog processing uses passive components (R, L and C), which dissipate lower power.

➨The hardware architectures and software instructions of each DSP are different. This requires training on the specific DSP in order to program it for various applications. Hence only highly skilled engineers can program the device.
➨Most DSP chips are very costly, and hence one needs to choose the appropriate IC as per the requirements (hardware, software).

IMPORTANT FEATURES OF DSP PROCESSORS

DSP processors are designed to support repetitive, numerically intensive tasks [3]. To this end, most DSP
processors not only have a powerful data path, but also have the ability to move large amounts of data to and
from memory quickly. Moreover, DSP processors provide special instruction sets to exploit hardware
efficiency.

The two most important features of DSP processor architectures, the data path containing fast multiply-accumulate unit(s) and the multiple-access memory architectures [4], are addressed in this section. The N-tap finite-impulse-response (FIR) filter

    y(n) = Σ_{k=0}^{N-1} h(k) · x(n - k)

is used as a typical example in this section.

Pipelining (see Chapter 3) is often used to increase the performance of a processor. Almost all the processors on the market today use pipelining to some extent. However, in the process of improving performance, pipelining also makes programming more difficult. In this section, two common programming techniques used for programming pipelined PDSPs, referred to as time-stationary and data-stationary coding [5], are discussed.

18.3.1    Data Path

Only fixed-point DSP processor data paths are addressed in this section. They typically incorporate a multiplier,
accumulators, an ALU (arithmetic logic unit), one or more shifters, operand registers, and other specialized
units such as a saturation arithmetic unit.

3   Introduction to Digital Signal Processors

3.1    How DSPs are Different from Other Microprocessors

The last forty years have shown that computers are extremely capable in two broad areas: data manipulation, such as word processing and database management, and mathematical calculation, used in science, engineering, and Digital Signal Processing. All microprocessors can perform both tasks; however, it is difficult (expensive)
to make a device that is optimized for both. There are technical tradeoffs in the hardware design, such as the
size of the instruction set and how interrupts are handled. Even more important, there are marketing issues
involved: development and manufacturing cost, competitive position, product lifetime, and so on. As a broad
generalization, these factors have made traditional microprocessors, such as the Pentium, primarily directed at
data manipulation. Similarly, DSPs are designed to perform the mathematical calculations needed in Digital
Signal Processing.

Figure 3-1: Data manipulation versus mathematical calculation.

Figure 3-1 lists the most important differences between these two categories. Data manipulation involves
storing and sorting information. For instance, consider a word processing program. The basic task is to store the
information (typed in by the operator), organize the information (cut and paste, spell checking, page layout,
etc.), and then retrieve the information (such as saving the document on a floppy disk or printing it with a laser
printer). These tasks are accomplished by moving data from one location to another, and testing for inequalities
(A=B, A<B, etc.). As an example, imagine sorting a list of words into alphabetical order. Each word is
represented by an 8 bit number, the ASCII value of the first letter in the word. Alphabetizing involves rearranging the order of the words until the ASCII values continually increase from the beginning to the end of the list. This can be accomplished by repeating two steps over-and-over until the alphabetization is complete. First, test two adjacent entries for being in alphabetical order (IF A>B THEN ...). Second, if the two entries are not in alphabetical order, switch them so that they are (A↔B). When this two-step process is repeated many times on all adjacent pairs, the list will eventually become alphabetized.
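This compare-and-swap procedure can be written out concretely. Below is a minimal sketch in C (our own illustration; the function and variable names are hypothetical, not from the text): it alphabetizes a list of first-letter ASCII codes exactly as described, repeatedly testing adjacent entries and swapping any pair that is out of order until a full pass makes no swaps.

#include <stdio.h>

/* Sort first-letter ASCII codes into ascending order by repeatedly testing
 * adjacent pairs (IF A > B THEN ...) and swapping them (A <-> B). */
void alphabetize(unsigned char codes[], int n)
{
    int swapped = 1;
    while (swapped) {                        /* repeat until nothing moves */
        swapped = 0;
        for (int i = 0; i < n - 1; i++) {
            if (codes[i] > codes[i + 1]) {   /* adjacent pair out of order? */
                unsigned char tmp = codes[i];
                codes[i] = codes[i + 1];     /* swap them (A <-> B) */
                codes[i + 1] = tmp;
                swapped = 1;
            }
        }
    }
}

int main(void)
{
    unsigned char first_letters[] = { 'd', 'a', 'c', 'b' };
    alphabetize(first_letters, 4);
    for (int i = 0; i < 4; i++)
        printf("%c ", first_letters[i]);     /* prints: a b c d */
    printf("\n");
    return 0;
}

Note that the work here is all data movement and inequality testing; no multiplication or addition of signal values appears anywhere.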

As another example, consider how a document is printed from a word processor. The computer continually tests
the input device (mouse or keyboard) for the binary code that indicates, "print the document." When this code is
detected, the program moves the data from the computer's memory to the printer. Here we have the same two
basic operations: moving data and inequality testing. While mathematics is occasionally used in this type of
application, it is infrequent and does not significantly affect the overall execution speed.

Figure 3-2: FIR digital filter. Each sample in the output signal is found by multiplying
samples from the input signal by the kernel coefficients and summing the products.

  In comparison, the execution speed of most DSP algorithms is limited almost completely by the number of
multiplications and additions required. For example, Figure 3-2 shows the implementation of an FIR digital
filter, the most common DSP technique. Using the standard notation, the input signal is referred to by x[ ], while
the output signal is denoted by y[ ]. Our task is to calculate the sample at location n in the output signal. An FIR
filter performs this calculation by multiplying appropriate samples from the input signal by a group of
coefficients, denoted by a0, a1, a2, a3, …, and then adding the products. This is simply saying that the input signal has been convolved with a filter kernel consisting of a0, a1, a2, a3, …. Depending on the application, there may be only a few coefficients in the filter kernel, or many thousands. While there is some
data transfer and inequality evaluation in this algorithm, such as to keep track of the intermediate results and
control the loops, the math operations dominate the execution time.
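To make the arithmetic concrete, here is a minimal FIR sketch in C (our own illustration, not from the text; the names x, y and a follow the notation above). It computes one output sample y[n] as a sum of coefficient-times-sample products:

/* N-tap FIR filter: y[n] = a[0]*x[n] + a[1]*x[n-1] + ... + a[N-1]*x[n-N+1].
 * Assumes n >= N-1 so that all of the needed input samples exist. */
double fir_output(const double x[], const double a[], int N, int n)
{
    double acc = 0.0;                 /* running sum of products */
    for (int k = 0; k < N; k++)
        acc += a[k] * x[n - k];       /* one multiply-accumulate per tap */
    return acc;                       /* this is y[n] */
}

The execution time of this routine is dominated by the N multiply-add pairs in the loop, which is exactly the operation DSP hardware is built to accelerate.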

In addition to performing mathematical calculations very rapidly, DSPs must also have a predictable execution
time. Suppose you launch your desktop computer on some task, say, converting a word-processing document
from one form to another. It does not matter if the processing takes ten milliseconds or ten seconds; you simply
wait for the action to be completed before you give the computer its next assignment.
In comparison, most DSPs are used in applications where the processing is continuous, not having a defined
start or end. For instance, consider an engineer designing a DSP system for an audio signal, such as a hearing
aid. If the digital signal is being received at 20,000 samples per second, the DSP must be able to maintain a
sustained throughput of 20,000 samples per second. However, there are important reasons not to make it any
faster than necessary. As the speed increases, so does the cost, the power consumption, the design difficulty,
and so on. This makes an accurate knowledge of the execution time critical for selecting the proper device, as
well as the algorithms that can be applied.
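As a rough worked example of this budgeting (our own numbers, not from the text): a hypothetical DSP executing 20 million instructions per second, fed samples at 20,000 per second, has

    20,000,000 instructions/s ÷ 20,000 samples/s = 1,000 instruction cycles per sample

available, so an algorithm whose per-sample inner loop needs much more than 1,000 cycles simply cannot meet real time on that device.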

  3.2    Characteristics of DSP processors

Although there are many DSP processors, they are mostly designed with the same few basic operations in mind,
so that they share the same set of basic characteristics. These characteristics fall into three categories:

• Specialized high speed arithmetic
• Data transfer to and from the real world
• Multiple access memory architectures

Figure 3-3: Typical DSP operations require specific functions.

  Typical DSP operations require a few specific operations. Figure 3-3 shows an FIR filter and illustrates the
basic DSP operations: additions and multiplications, delays, and array handling. Each of these operations has its
own special set of requirements. Additions and multiplications require us to fetch two operands, perform the
addition or multiplication (usually both), store the result, or hold it for a repetition. Delays require us to hold a
value for later use. Array handling requires us to fetch values from consecutive memory locations, and copy
data from memory to memory. To suit these fundamental operations DSP processors often have parallel
multiply and add, multiple memory accesses (to fetch two operands and store the result), lots of registers to hold
data temporarily, efficient address generation for array handling, and special features such as delays or circular
addressing.

3.2.1  Circular Buffering

Digital Signal Processors are designed to quickly carry out FIR filters and similar techniques. To understand the
hardware, we must first understand the algorithms. To start, we need to distinguish between off-line processing
and real-time processing. In off-line processing, the entire input signal resides in the computer at the same time.
For example, a geophysicist might use a seismometer to record the ground movement during an earthquake.
After the shaking is over, the information may be read into a computer and analyzed in some way. Another
example of off-line processing is medical imaging, such as CT and MRI. The data set is acquired while the
patient is inside the machine, but the image reconstruction may be delayed until a later time. The key point is
that all of the information is simultaneously available to the processing program. This is common in scientific
research and engineering, but not in consumer products. Off-line processing is the realm of personal computers
and mainframes.
Figure 3-4: Circular buffer operation. (a) Circular buffer at some instant, and (b) circular buffer after the next sample.

 In real-time processing, the output signal is produced at the same time that the input signal is being acquired.
For example, this is needed in telephone communication, hearing aids, and radar. These applications must have
the information immediately available, although it can be delayed by a short amount. For instance, a 10-
millisecond delay in a telephone call cannot be detected by the speaker or listener. Likewise, it makes no
difference if a radar signal is delayed by a few seconds before being displayed to the operator. Real-time
applications input a sample, perform the algorithm, and output a sample, over-and-over. Alternatively, they may
input a group of samples, perform the algorithm, and output a group of samples. This is the world of Digital
Signal Processors.

Now imagine the FIR filter in Figure 3-2 being implemented in real time. To calculate the output sample,
we must have access to a certain number of the most recent samples from the input. For example, suppose we
use eight coefficients in this filter. This means we must know the value of the eight most recent samples from
the input signal. These eight samples must be stored in memory and continually updated as new samples are
acquired. What is the best way to manage these stored samples? The answer is circular buffering. Figure 3-4
illustrates an eight sample circular buffer. We have placed this circular buffer in eight consecutive memory
locations, 20041 to 20048. Figure 3-4 (a) shows how the eight samples from the input might be stored at one
particular instant in time, while (b) shows the changes after the next sample is acquired. The idea of circular
buffering is that the end of this linear array is connected to its beginning; memory location 20041 is viewed as
being next to 20048, just as 20044 is next to 20045. You keep track of the array by a pointer that indicates
where the most recent sample resides. For instance, in (a) the pointer contains the address 20044, while in (b) it
contains 20045. When a new sample is acquired, it replaces the oldest sample in the array, and the pointer is
moved one address ahead. Circular buffers are efficient because only one value needs to be changed when a
new sample is acquired.
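A minimal sketch of this scheme in C (our own illustration; the eight-entry buffer mirrors Figure 3-4, but the names are hypothetical): the pointer is an index that wraps from the end of the array back to the beginning, so storing a new sample costs one write and one pointer update.

#define TAPS 8

static double buf[TAPS];    /* eight consecutive "memory locations" */
static int newest = 0;      /* index of the most recent sample (the pointer) */

/* Store an incoming sample, overwriting the oldest entry. */
void circ_put(double sample)
{
    newest = (newest + 1) % TAPS;   /* advance the pointer, wrapping end to start */
    buf[newest] = sample;           /* only one value changes per new sample */
}

/* Fetch the sample acquired k samples ago (k = 0 is the newest). */
double circ_get(int k)
{
    return buf[(newest - k + TAPS) % TAPS];
}

An FIR output is then the sum over k of a[k] * circ_get(k). On a real DSP the wraparound is done by the address generation hardware (modulo addressing) at no cycle cost; the % operator here is the software equivalent.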

3.2.2  Mathematics

To perform the simple arithmetic required, DSP processors need special high-speed arithmetic units. Most DSP
operations require additions and multiplications together. So DSP processors usually have hardware adders and
multipliers which can be used in parallel within a single instruction. Figure 3-5 shows the data path for the
Lucent DSP32C processor. The hardware multiply and add work in parallel so that in the space of a single
instruction, both an add and a multiply can be completed.

  
Figure 3-5: The data path for the Lucent DSP32C processor.

  Delays require that intermediate values be held for later use. This may also be a requirement, for example,
when keeping a running total - the total can be kept within the processor to avoid wasting repeated reads from
and writes to memory. For this reason DSP processors have lots of registers which can be used to hold
intermediate values. Registers may be fixed point or floating point format.

Array handling requires that data can be fetched efficiently from consecutive memory locations. This involves
generating the next required memory address. For this reason DSP processors have address registers which are
used to hold addresses and can be used to generate the next needed address efficiently:

The ability to generate new addresses efficiently is a characteristic feature of DSP processors. Usually, the next
needed address can be generated during the data fetch or store operation, and with no overhead. DSP processors
have rich sets of address generation operations.

  

*rP        register indirect         Read the data pointed to by the address in register rP.
*rP++      post increment            Having read the data, post increment the address pointer to point to the next value in the array.
*rP--      post decrement            Having read the data, post decrement the address pointer to point to the previous value in the array.
*rP++rI    register post increment   Having read the data, post increment the address pointer by the amount held in register rI, to point rI values further down the array.
*rP++rIr   bit reversed              Having read the data, post increment the address pointer to point to the next value in the array, as if the address bits were in bit reversed order.
Figure 3-6: Addressing modes for the Lucent DSP32C processor.

Figure 3-6 shows some addressing modes for the Lucent DSP32C processor. The assembler syntax is very
similar to C language. Whenever an operand is fetched from memory using register indirect addressing, the
address register can be incremented to point to the next needed value in the array. This address increment is free
- there is no overhead involved in the address calculation - and in the case of the Lucent DSP32C processor, up to three such addresses may be generated in each single instruction. Address generation is an important factor in
the speed of DSP processors at their specialized operations.
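Since the assembler syntax mirrors C, register indirect addressing with post-increment corresponds directly to the C pointer idiom below (a sketch of our own; on the DSP the pointer updates are folded into the same instruction as the multiply-add instead of costing extra cycles):

/* FIR inner loop written with post-updated pointers, mirroring *rP++ and *rP--.
 * On entry, a points at the first coefficient and x points at the newest sample. */
double fir_ptr(const double *a, const double *x, int N)
{
    double acc = 0.0;
    while (N--)
        acc += *a++ * *x--;   /* fetch coefficient and sample, then step one
                                 address register forward and one backward */
    return acc;
}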

The last addressing mode - bit reversed - shows how specialized DSP processors can be. Bit reversed addressing
arises when a table of values has to be reordered by reversing the order of the address bits:

• reverse the order of the bits in each address
• shuffle the data so that the new, bit reversed, addresses are in ascending order

This operation is required in the Fast Fourier Transform - and just about nowhere else. So one can see that DSP
processors are designed specifically to calculate the Fast Fourier Transform efficiently.
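For illustration, the reordering can be written in C (our own sketch): for an N-point FFT with N = 2^m, the low m bits of each index are reversed, and the reversed value gives the position the data should occupy.

/* Reverse the low 'bits' bits of index i. For an 8-point FFT (bits = 3),
 * index 1 (binary 001) maps to 4 (binary 100). Bit-reversed addressing
 * hardware performs this mapping with no instruction overhead. */
unsigned bit_reverse(unsigned i, int bits)
{
    unsigned r = 0;
    for (int b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1);   /* shift the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}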

3.2.3  Input and Output Interfaces

In addition to the mathematics, in practice DSP is mostly dealing with the real world. Although this aspect is
often forgotten, it is of great importance and marks some of the greatest distinctions between DSP processors
and general-purpose microprocessors.

In a typical DSP application, the processor will have to deal with multiple sources of data from the real world
(Figure 3-7). In each case, the processor may have to be able to receive and transmit data in real time, without
interrupting its internal mathematical operations. There are three sources of data from the real world:

• Signals coming in and going out
• Communication with an overall system controller of a different type
• Communication with other DSP processors of the same type

These multiple communications routes mark the most important distinctions between DSP processors and
general-purpose processors. The need to deal with these different sources of data efficiently leads to special
communication features on DSP processors:

  

Figure 3-7: Communication of DSP with overall system controllers.


When DSP processors first came out, they were rather fast processors: for example the first floating point DSP -
the AT&T DSP32 - ran at 16 MHz at a time when PC computer clocks were 5 MHz. This meant that we had
very fast floating point processors: a fashionable demonstration at the time was to plug a DSP board into a PC
and run a fractal (Mandelbrot) calculation on the DSP and on a PC side by side. The DSP fractal was of course
faster. Today, however, the fastest DSP processor is the Texas TMS320C6201, which runs at 200 MHz. This is
no longer very fast compared with an entry level PC. In addition, the same fractal today will actually run faster
on the PC than on the DSP. However, DSP processors are still used - why? The answer lies only partly in that
the DSP can run several operations in parallel: a far more basic answer is that the DSP can handle signals very
much better than a Pentium. Try feeding eight channels of high quality audio data in and out of a Pentium
simultaneously in real time, without affecting the processor performance, if you want to see a real difference.

Figure 3-8: The four activities on a synchronous serial port.

Signals tend to be fairly continuous, but at audio rates or not much higher. They are usually handled by high-
speed synchronous serial ports. Serial ports are inexpensive - having only two or three wires - and are well
suited to audio or telecommunications data rates up to 10 Mbit/s. Most modern speech and audio analogue to
digital converters interface to DSP serial ports with no intervening logic. A synchronous serial port requires
only three wires: clock, data, and word sync (Figure 3-8). The addition of a fourth wire (frame sync) and a high
impedance state when not transmitting makes the port capable of Time Division Multiplex (TDM) data
handling, which is ideal for telecommunications:

DSP processors usually have synchronous serial ports - transmitting clock and data separately - although some,
such as the Motorola DSP56000 family, have asynchronous serial ports as well (where the clock is recovered
from the data). Timing is versatile, with options to generate the serial clock from the DSP chip clock or from an
external source. The serial ports may also be able to support separate clocks for receive and transmit - a useful
feature, for example, in satellite modems where the clocks are affected by Doppler shifts. Most DSP processors
also support companding to A-law or mu-law in serial port hardware with no overhead - the Analog Devices
ADSP2181 and the Motorola DSP56000 family do this in the serial port, whereas the Lucent DSP32C has a
hardware compander in its data path instead.

The serial port will usually operate under DMA - data presented at the port is automatically written into DSP
memory without stopping the DSP - with or without interrupts. It is usually possible to receive and transmit data
simultaneously.

The serial port has dedicated instructions which make it simple to handle. Because it is standard to the chip, this
means that many types of actual I/O hardware can be supported with little or no change to code - the DSP
program simply deals with the serial port, no matter what I/O hardware it is attached to.

Host communications is an element of many, though not all, DSP systems. Many systems will have another,
general purpose, processor to supervise the DSP: for example, the DSP might be on a PC plug-in card or a VME
card - simpler systems might have a microcontroller to perform a 'watchdog' function or to initialize the DSP on
power up. Whereas signals tend to be continuous, host communication tends to require data transfer in batches -
for instance to download a new program or to update filter coefficients. Some DSP processors have dedicated
host ports which are designed to communicate with another processor of a different type, or with a standard bus.
For instance the Lucent DSP32C has a host port which is effectively an 8 bit or 16 bit ISA bus: the Motorola
DSP56301 and the Analog Devices ADSP21060 have host ports which implement the PCI bus.

The host port will usually operate under DMA - data presented at the port is automatically written into DSP
memory without stopping the DSP - with or without interrupts. It is usually possible to receive and transmit data
simultaneously.

The host port has dedicated instructions, which make it simple to handle. The host port imposes a welcome
element of standardization to plug-in DSP boards - because it is standard to the chip, it is relatively difficult for
individual designers to make the bus interface different. For example, of the 22 main different manufacturers of
PC plug-in cards using the Lucent DSP32C, 21 are supported by the same PC interface code: this means it is
possible to swap between different cards for different purposes, or to change to a cheaper manufacturer, without
changing the PC side of the code. Of course this is not foolproof - some engineers will always 'improve upon' a
standard by making something incompatible if they can - but at least it limits unwanted creativity.

Interprocessor communications is needed when a DSP application is too much for a single processor - or where
many processors are needed to handle multiple but connected data streams. Link ports provide a simple means
to connect several DSP processors of the same type. The Texas TMS320C40 and the Analog Devices
ADSP21060 both have six link ports (called 'comm. ports' for the 'C40). These would ideally be parallel ports at
the word length of the processor, but this would use up too many pins (six ports each 32 bits wide = 192 pins, which is a lot even if we neglect grounds). So a hybrid called serial/parallel is used: in the 'C40, comm. ports
are 8 bits wide and it takes four transfers to move one 32 bit word - in the 21060, link ports are 4 bits wide and
it takes 8 transfers to move one 32 bit word.

The link port will usually operate under DMA - data presented at the port is automatically written into DSP
memory without stopping the DSP - with or without interrupts. It is usually possible to receive and transmit data
simultaneously. This is a lot of data movement - for example the Texas TMS320C40 could in principle use all
its six comm. ports at their full rate of 20 MByte/s to achieve data transfer rates of 120 MByte/s. In practice, of
course, such rates exist only in the dreams of marketing men since other factors such as internal bus bandwidth
come into play.

The link ports have dedicated instructions which make them simple to handle. Although they are sometimes
used for signal I/O, this is not always a good idea since it involves very high speed signals over many pins and
it can be hard for external hardware to exactly meet the timing requirements.

3.2.4  Architecture of the DSP

One of the biggest bottlenecks in executing DSP algorithms is transferring information to and from memory.
This includes data, such as samples from the input signal and the filter coefficients, as well as program
instructions, the binary codes that go into the program sequencer. For example, suppose we need to multiply
two numbers that reside somewhere in memory. To do this, we must fetch three binary values from memory,
the numbers to be multiplied, plus the program instruction describing what to do. To fetch the two operands in a
single instruction cycle, we need to be able to make two memory accesses simultaneously. Actually, since we
also need to store the result - and to read the instruction itself - we really need more than two memory accesses
per instruction cycle.
Figure 3-9: The Von Neumann architecture uses a single memory to hold both data and instructions.

Figure 3-9 shows how this seemingly simple task is done in a traditional microprocessor. This is often called
Von Neumann architecture. Von Neumann architecture contains a single memory and a single bus for
transferring data into and out of the central processing unit (CPU). Multiplying two numbers requires at least
three clock cycles, one to transfer each of the three numbers over the bus from the memory to the CPU. We
don't count the time to transfer the result back to memory, because we assume that it remains in the CPU for
additional manipulation (such as the sum of products in an FIR filter). Since the von Neumann architecture uses
only a single memory bus, it is cheap, requiring fewer pins, and simple to use because the programmer can place
instructions or data anywhere throughout the available memory. But it does not permit multiple memory
accesses.

The modified von Neumann architecture allows multiple memory accesses per instruction cycle by the simple
trick of running the memory clock faster than the instruction cycle. For example the Lucent DSP32C runs with
an 80 MHz clock: this is divided by four to give 20 million instructions per second (MIPS), but the memory
clock runs at the full 80 MHz - each instruction cycle is divided into four 'machine states' and a memory access
can be made in each machine state, permitting a total of four memory accesses per instruction cycle (Figure
3-10). In this case the modified von Neumann architecture permits all the memory accesses needed to support
addition or multiplication: fetch of the instruction; fetch of the two operands; and storage of the result.

Figure 3-10: Multiple memory accesses per instruction cycle in modified von Neumann
architecture. Four memory accesses per instruction cycle.

The Von Neumann design is quite satisfactory when you are content to execute all of the required tasks in
serial. In fact, most computers today are of the Von Neumann design. We only need other architectures when
very fast processing is required, and we are willing to pay the price of increased complexity. This leads us to the
Harvard architecture, shown in Figure 3-11. This is named for the work done at Harvard University in the 1940s
under the leadership of Howard Aiken (1900-1973). As shown in this illustration, Aiken insisted on separate
memories for data and program instructions, with separate buses for each. Since the buses operate
independently, program instructions and data can be fetched at the same time, improving the speed over the
single bus design. Most present day DSPs use this dual bus architecture.

Figure 3-11: The Harvard architecture uses separate memories for data and instructions, providing higher speed.
The Harvard architecture has two separate physical memory buses. This allows two simultaneous memory
accesses. The true Harvard architecture dedicates one bus for fetching instructions, with the other available to
fetch operands. This is inadequate for DSP operations, which usually involve at least two operands. So DSP
Harvard architectures usually permit the program bus to be used also for access of operands. Note that it is often
necessary to fetch three things - the instruction plus two operands - and the Harvard architecture is inadequate to
support this.

The Harvard architecture requires two memory buses. This makes it expensive to bring off the chip - for
example a DSP using 32 bit words and with a 32 bit address space requires at least 64 pins for each memory bus
- a total of 128 pins if the Harvard architecture is brought off the chip. This results in very large chips, which are
difficult to design into a circuit.

Figure 3-12: The Super Harvard Architecture improves upon the Harvard design by adding an
instruction cache and a dedicated I/O controller.

Figure 3-12 illustrates the next level of sophistication, the Super Harvard Architecture. This term was coined
by Analog Devices to describe the internal operation of their ADSP-2106x and new ADSP-211xx families of Digital Signal Processors. These are called SHARC® DSPs, a contraction of the longer term, Super Harvard ARChitecture. The idea is to build upon the
Harvard architecture by adding features to improve the throughput. While the SHARC DSPs are optimized in
dozens of ways, two areas are important enough to be included in Figure 3-12: an instruction cache, and an I/O
controller. First, let's look at how the instruction cache improves the performance of the Harvard architecture. A
handicap of the basic Harvard design is that the data memory bus is busier than the program memory bus. When
two numbers are multiplied, two binary values (the numbers) must be passed over the data memory bus, while
only one binary value (the program instruction) is passed over the program memory bus. To improve upon this
situation, we start by relocating part of the "data" to program memory. For instance, we might place the filter
coefficients in program memory, while keeping the input signal in data memory. (This relocated data is called
"secondary data" in the illustration).

At first glance, this doesn't seem to help the situation; now we must transfer one value over the data memory
bus (the input signal sample), but two values over the program memory bus (the program instruction and the
coefficient). In fact, if we were executing random instructions, this situation would be no better at all.

However, DSP algorithms generally spend most of their execution time in loops. This means that the same set
of program instructions will continually pass from program memory to the CPU. The Super Harvard
architecture takes advantage of this situation by including an instruction cache in the CPU. This is a small
memory that contains about 32 of the most recent program instructions. The first time through a loop, the
program instructions must be passed over the program memory bus. This results in slower operation because of
the conflict with the coefficients that must also be fetched along this path. However, on additional executions of
the loop, the program instructions can be pulled from the instruction cache. This means that all of the memory
to CPU information transfers can be accomplished in a single cycle: the sample from the input signal comes
over the data memory bus, the coefficient comes over the program memory bus, and the program instruction
comes from the instruction cache. In the jargon of the field, this efficient transfer of data is called a high
memory access bandwidth.

Figure 3-13: Typical DSP architecture. Digital Signal Processors are designed to implement
tasks in parallel. This simplified diagram is the Analog Devices SHARC DSP.

Figure 3-13 presents a more detailed view of the SHARC architecture, showing the I/O controller connected to
data memory. This is how the signals enter and exit the system. For instance, the SHARC DSPs provide both
serial and parallel communications ports. These are extremely high speed connections. For example, at a 40
MHz clock speed, there are two serial ports that operate at 40 MBits/second each, while six parallel ports each
provide a 40 Mbytes/second data transfer. When all six parallel ports are used together, the data transfer rate is
an incredible 240 Mbytes/second.

Both Harvard and von Neumann architectures require the programmer to be careful of where in memory data is
placed: for example with the Harvard architecture, if both needed operands are in the same memory bank then
they cannot be accessed simultaneously.

3.2.5  Data formats in DSP processors

DSP processors store data in fixed or floating point formats. It is worth noting that fixed point format is not
quite the same as integer (Figure 3-14). The integer format is straightforward: representing whole numbers from
0 up to the largest whole number that can be represented with the available number of bits. Fixed point format is
used to represent numbers that lie between 0 and 1, with a binary point assumed to lie just after the most
significant bit. The most significant bit in both cases carries the sign of the number. The size of the fraction
represented by the smallest bit is the precision of the fixed point format. The size of the largest number that can
be represented in the available word length is the dynamic range of the fixed point format.

  

Figure 3-14: Comparison of the integer and fixed point data formats.

To make the best use of the full available word length in the fixed point format, the programmer has to make
some decisions. If a fixed point number becomes too large for the available word length, the programmer has to
scale the number down, by shifting it to the right: in the process lower bits may drop off the end and be lost. If a
fixed point number is small, the number of bits actually used to represent it is small. The programmer may
decide to scale the number up, in order to use more of the available word length. In both cases the programmer
has to keep track of how much the binary point has been shifted, in order to restore all numbers to the same
scale at some later stage.
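As a small illustration of this bookkeeping (our own sketch, using the common Q15 convention as an assumption; nothing in the text specifies it): a signed 16-bit word holds a fraction with the binary point just after the sign bit, and every multiply must be rescaled by the programmer.

#include <stdint.h>

typedef int16_t q15_t;  /* signed 16-bit fraction in [-1, 1), binary point after the sign bit */

/* Multiply two Q15 fractions. The raw 32-bit product is in Q30 format, so it
 * must be shifted right by 15 to restore the Q15 scale. (The -1 * -1 corner
 * case, which overflows, is ignored here for simplicity.) */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b;   /* Q15 x Q15 = Q30 */
    return (q15_t)(p >> 15);               /* rescale to Q15; low bits drop off */
}

The bits discarded by the shift are exactly the "lower bits may drop off the end and be lost" mentioned above.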

Floating point format has the remarkable property of automatically scaling all numbers by moving, and keeping
track of, the binary point so that all numbers use the full word length available but never overflow.

  

Figure 3-15: The floating point format.

Floating point numbers have two parts: the mantissa, which is similar to the fixed point part of the number, and
an exponent which is used to keep track of how the binary point is shifted (Figure 3-15). Every number is scaled
by the floating point hardware. If a number becomes too large for the available word length, the hardware
automatically scales it down, by shifting it to the right. If a number is small, the hardware automatically scales it
up, in order to use the full available word length of the mantissa. In both cases the exponent is used to count
how many times the number has been shifted. In floating point numbers the binary point comes after the second
most significant bit in the mantissa.

Figure 3-16: The block floating point format.

The block floating point format in Figure 3-16 provides some of the benefits of floating point, but by scaling
blocks of numbers rather than each individual number. Block floating point numbers are actually represented by
the full word length of a fixed point format. If any one of a block of numbers becomes too large for the
available word length, the programmer scales down all the numbers in the block, by shifting them to the right. If
the largest of a block of numbers is small, the programmer scales up all numbers in the block, in order to use the
full available word length of the mantissa. In both cases the exponent is used to count how many times the
numbers in the block have been shifted.
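A minimal sketch of block scaling in C (our own illustration, not any specific processor's feature): the whole block shares one exponent, and the programmer shifts every element by the same amount.

#include <stdint.h>

/* Normalize a block of fixed point numbers so the largest magnitude uses the
 * full word length; return the shared block exponent (number of left shifts). */
int block_normalize(int16_t x[], int n)
{
    int peak = 0;
    for (int i = 0; i < n; i++) {                 /* find the largest magnitude */
        int m = (x[i] < 0) ? -(int)x[i] : (int)x[i];
        if (m > peak) peak = m;
    }
    int shift = 0;
    while (peak != 0 && (peak << 1) <= 0x7FFF) {  /* headroom left: scale block up */
        peak <<= 1;
        shift++;
    }
    for (int i = 0; i < n; i++)
        x[i] = (int16_t)(x[i] * (1 << shift));    /* same shift for every element */
    return shift;                                  /* the block's shared exponent */
}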

Some specialized processors, such as those from Zilog, have special features to support the use of block floating
point format. More usually, it is up to the programmer to test each block of numbers and carry out the necessary
scaling.

Figure 3-17: The advantage of floating point format over fixed point. Floating point
hardware automatically scales every number to avoid overflow.

The floating point format has one further advantage over fixed point: it is faster. Because of quantization error,
a basic direct form 1 IIR filter second order section requires an extra multiplier, to scale numbers and avoid
overflow. But the floating point hardware automatically scales every number to avoid overflow, so this extra
multiplier is not required (Figure 3-17).

3.2.6  Precision and dynamic range

The precision with which numbers can be represented is determined by the word length in the fixed point
format, and by the number of bits in the mantissa in the floating point format. In a 32 bit DSP processor the
mantissa is usually 24 bits, so the precision of a floating point DSP is the same as that of a 24 bit fixed point
processor. But floating point has one further advantage over fixed point: because the hardware automatically
scales each number to use the full word length of the mantissa, the full precision is maintained even for small
numbers (Figure 3-18).

There is a potential disadvantage to the way floating point works. Because the hardware automatically scales
and normalizes every number, the errors due to truncation and rounding depend on the size of the number. If we
regard these errors as a source of quantization noise, then the noise floor is modulated by the size of the signal.
Although the modulation can be shown to be always downwards (that is, a 32 bit floating point format always
has noise which is less than that of a 24 bit fixed point format), the signal dependent modulation of the noise
may be undesirable: notably, the audio industry prefers to use 24 bit fixed point DSP processors over floating point, because it is thought by some that the floating point noise floor modulation is audible. The precision
directly affects quantization error.

Figure 3-18: The floating point hardware automatically scales each number and the
full precision is maintained even for small numbers

The largest number which can be represented determines the dynamic range of the data format. In fixed point
format this is straightforward: the dynamic range is the range of numbers that can be represented in the
available word length. For floating point format, though, the binary point is moved automatically to
accommodate larger numbers, so the dynamic range is determined by the size of the exponent. Therefore the
dynamic range of a floating point format is enormously larger than for a fixed point format (Figure 3-19). 
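As a rough worked comparison (our own arithmetic, using the usual rule of thumb of about 6 dB of dynamic range per bit): a 16 bit fixed point word spans 20·log10(2^16) ≈ 96 dB, while a 32 bit floating point format with an 8 bit exponent spans roughly 1500 dB, since the exponent alone provides about 2^8 doublings of scale.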

Figure 3-19: The dynamic range comparison of floating point format over fixed point.

While the dynamic range of a 32 bit floating point format is large, it is not infinite, thus it is possible to suffer
overflow and underflow even with a 32 bit floating point format. A classic example of this can be seen by
running fractal (Mandelbrot) calculations on a 32 bit DSP processor. After quite a long time, the fractal pattern
ceases to change because the increment size has become too small for a 32 bit floating point format to represent.
  

Figure 3-20: The data path of the Lucent DSP32C

Most DSP processors have extended precision registers within the processor. Figure 3-20 shows the data path of
the Lucent DSP32C processor. Although this is a 32 bit floating point processor, it uses 40 and 45 bit registers
internally; thus, results can be held to a wider dynamic range internally than when written to memory.

DSP Processors, Embodiments and Alternatives: Part I

Until now, we have described digital signal processing in general terms, focusing on DSP fundamentals, systems and application areas. Now we narrow our focus to DSP processors. We begin with a high-level description of the features common to virtually all DSP processors. We then describe typical embodiments of DSP processors, and briefly discuss alternatives to DSP processors such as general purpose microprocessors, microcontrollers (for comparison purposes) (TMS320C20) and FPGAs. The next several blogs provide a detailed treatment of DSP processor architectures and features.

So, what are the “special” things that a DSP processor can perform? Well, like the name says, DSP processors
do signal processing very well. What does “signal processing” mean? Really, it’s a set of algorithms for
processing signals in the digital domain. There are analog equivalents to these algorithms, but processing them
digitally has been proven to be more efficient. This has been the trend for many, many years. Signal processing
algorithms are the basic building blocks for many applications in the world from cell phones to MP3 players,
digital still cameras, and so on. A summary of these algorithms is shown below:

• FIR filter:  y(n) = Σ_{k=0}^{N-1} a_k · x(n - k)
• IIR filter:  y(n) = Σ_{k=0}^{N-1} a_k · x(n - k) + Σ_{k=1}^{M} b_k · y(n - k)
• Convolution:  y(n) = Σ_k x(k) · h(n - k)
• Discrete Fourier Transform:  X(k) = Σ_{n=0}^{N-1} x(n) · exp(-j2πnk/N)
• Discrete Cosine Transform:  F(u) = c(u) · Σ_{x=0}^{N-1} f(x) · cos(π(2x + 1)u / (2N))

One or more of these algorithms are used in almost every signal processing application. FIR filters and IIR
filters are almost fundamental to any DSP application. They remove unwanted noise from signals being
processed; convolution algorithms are used to look for similarities in signals; discrete Fourier transforms are used to represent signals in formats that are easier to process; and discrete cosine transforms are used in image
processing applications. We will discuss the details of some of these algorithms later, but there are things to
notice about this entire list of algorithms. First, they all have a summing operation, the Σ (summation) function. In the computer world, this is equivalent to an accumulation of a large number of elements, which is implemented
using a “for” loop. DSP processors are designed to have large accumulators because of this characteristic. They
are specialized in this way. DSPs also have special hardware to perform the “for” loop operation so that the
programmer does not have to implement this in software, which would be much slower.

The algorithms above also have multiplication of two different operands. Logically, if we were to speed up this
operation, we would design a processor to accommodate the multiplication and accumulation of two operands
like this very quickly. In fact, this is what has been done with DSPs. They are designed to support the
multiplication and accumulation of data sets like this very quickly; for most processors, in just one cycle. Since
these algorithms are very common in most DSP applications, tremendous execution savings can be obtained by
exploiting these processor optimizations.

There are also inherent structures in DSP algorithms that allow them to be separated and operated on in parallel.
Just as in real life, if I can do more things in parallel, I can get more done in the same amount of time. As it
turns out, signal processing algorithms have this characteristic as well. So, we can take advantage of this by
putting multiple orthogonal (nondependent) execution units in our DSP processors and exploit this parallelism
when implementing these algorithms.

DSP processors must also add some reality in the mix of these algorithms shown above. Take the IIR filter
described above. You may be able to tell just by looking at this algorithm that there is a feedback component
that essentially feeds back previous outputs into the calculation of the current output. Whenever you deal with
feedback, there is always an inherent stability issue. IIR filters can become unstable just like other feedback
systems. Careless implementation of feedback systems like the IIR filter can cause the output to oscillate
instead of asymptotically decaying to zero (the preferred approach). This problem is compounded in the digital
world where we must deal with finite word lengths, a key limitation in all digital systems. We can alleviate this
using saturation checks in software or use a specialized instruction to do this for us. DSP processors, because of
the nature of signal processing algorithms, use specialized saturation underflow/overflow instructions to deal
with these conditions efficiently.
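For illustration, here is what a saturating addition does, written in C (our own sketch; a DSP provides this as a hardware instruction or accumulator mode rather than as code): on overflow the result is clamped to the end of the representable range instead of wrapping around, which is what lets the output degrade gracefully rather than oscillate.

#include <stdint.h>

/* Saturating 16-bit addition: clamp on overflow instead of wrapping,
 * mimicking DSP saturation hardware. */
int16_t sat_add16(int16_t a, int16_t b)
{
    int32_t s = (int32_t)a + b;              /* exact sum in a wider type */
    if (s > INT16_MAX) return INT16_MAX;     /* clamp positive overflow */
    if (s < INT16_MIN) return INT16_MIN;     /* clamp negative overflow */
    return (int16_t)s;
}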

There is more I can say about this, but you get the point. Specialization is really all it is about with DSP
processors; these processors are specifically designed to do signal processing really well. DSP processors may not
be as good as other processors when dealing with nonsignal processing centric algorithms (that’s fine; I am not
any good at medicine either). So, it’s important to understand your application and choose the right processor.
(A previous blog about DAC and ADC did mention this issue).

(We describe below common features of DSP processors.)

Dozens of families of DSP processors are available on the market today. The salient features of some of the commonly used families of DSP processors are summarized in Table 1. Throughout this series, we will use
these processors as examples to illustrate the architectures and features that can be found in commercial DSP
processors.

Most DSP processors share some common features designed to support repetitive, numerically intensive tasks.
The most important of these features are introduced briefly here. Each of these features and many others will be
examined in greater detail in this blog article series.
Table 1.

Fast Multiply Accumulate

The most often cited feature of DSP processors is the ability to perform a multiply-accumulate
operation (often called a MAC) in a single instruction cycle. The multiply-accumulate operation is useful in
algorithms that involve computing a vector product, or matrix product, such as digital filters, correlation,
convolution and Fourier transforms. To achieve this functionality, DSP processors include a multiplier and
accumulator integrated into  the main arithmetic processing unit (called the data path) of the processor. In
addition, to allow a series of multiply-accumulate operations to proceed without the possibility of arithmetic
overflow, DSP processors generally provide extra bits in their accumulator registers to accommodate growth of
the accumulated result. DSP processor data paths will be discussed in detail in some later blog here.
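A sketch of why the extra accumulator bits matter (our own illustration in C; the particular widths are an assumption, not any specific device's): summing many full-width products can outgrow the product width, so the accumulator is made wider than the product.

#include <stdint.h>

/* Multiply-accumulate over N terms. Each 16 x 16 product needs 32 bits, and
 * summing N of them can grow by up to log2(N) further bits, so a wider
 * accumulator (here 64 bits, standing in for hardware guard bits) is used. */
int64_t mac(const int16_t x[], const int16_t c[], int N)
{
    int64_t acc = 0;                       /* wide accumulator: no overflow */
    for (int i = 0; i < N; i++)
        acc += (int32_t)x[i] * c[i];       /* a single-cycle MAC on a DSP */
    return acc;
}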

Multiple Access Memory Architecture.

A second feature shared by most DSP processors is the ability to complete several accesses to memory in a
single instruction cycle. This allows the processor to fetch an instruction while simultaneously fetching
operands for the instruction or storing the result of the previous instruction to memory. High bandwidth
between the processor and memory is essential for good performance if repetitive data intensive operations are
required in an algorithm, as is common in many DSP applications.

In many processors, single cycle multiple memory accesses are subject to restrictions. Typically, all but one of
the memory locations must reside on-chip, and multiple memory accesses can take place only with certain
instructions. To support simultaneous access of multiple memory locations, DSP processors provide multiple
on-chip buses, multiported on-chip memories, and in some cases, multiple independent memory banks. DSP
memory structures are quite distinct from those of general purpose processors and microcontrollers. DSP
processor memory architectures will be investigated in detail later.

Specialized Addressing Modes.

To allow arithmetic processing to proceed at maximum speed and to allow specification of multiple operands in
a small instruction word, DSP processors incorporate dedicated address generation units. Once the appropriate
addressing registers have been configured, the address generation units operate in the background, forming the
addresses required for operand accesses in parallel with the execution of arithmetic instructions. Address
generation units typically support a selection of addressing modes tailored to DSP applications. The most
common of these is register-indirect addressing with post-increment, which is used in situations where a
repetitive computation is performed on a series of data stored sequentially in memory. Special addressing
modes (called circular or modulo addressing) are often supported to simplify the use of data buffers. Some
processors support bit-reversed addressing, which eases the task of interpreting the results of the FFT algorithm.
Addressing modes will be described in detail later.

Specialized Execution Control.

Because many DSP algorithms involve performing repetitive computations, most DSP processors provide
special support for efficient looping. Often, a special loop or repeat instruction is provided that allows the
programmer to implement a for-next loop without expending any instruction cycles for updating and testing the
loop counter or for jumping back to the top of the loop.
Some DSP processors provide other execution control features to improve performance, such as context
switching and low-latency, low overhead interrupts for fast input/output.

Hardware looping and interrupts will also be discussed later in this blog series.

Peripheral and Input/Output Interfaces

To allow low-cost, high performance input and output (I/O), most DSP processors incorporate one or more
serial or parallel I/O interfaces, and specialized I/O handling mechanisms such as Direct Memory Access
(DMA). DSP processor peripheral interfaces are often designed to interface directly with common peripheral
devices like analog-to-digital and digital-to-analog converters.

As integrated circuit manufacturing techniques have improved in terms of density and flexibility, DSP
processor vendors have included not just peripheral interfaces, but complete peripheral devices on-chip.
Examples of this are chips designed for cellular telephone applications, several of which incorporate a DAC and
ADC on chip.

DSP Processor Embodiments:

The most familiar form of DSP processor is the single chip processor, which is incorporated into a printed
circuit board design by the system designer. However, with the widespread proliferation of DSP processors into
many kinds of applications, the increasing level of integration in all kinds of DSP products, and the
development of new packaging techniques, DSP processors can now be found in many different forms,
sometimes masquerading as something else entirely. In this blog, we briefly discuss some of the forms that DSP
processors take.

Multichip Modules.

A multi-chip module (MCM) is generically an electronic assembly (such as a package with a number of
conductor terminals or “pins”) where multiple integrated circuits (ICs), semiconductor dies and/or other discrete
components are integrated, usually onto a unifying substrate, so that in use it is treated as if it were a single
component (as though it were a larger IC) [1]. Other terms, such as “hybrid” or “hybrid integrated circuit”, also refer to
MCMs.

One advantage of this approach is achieving higher packaging density — more circuits per square inch of
printed circuit board. This in turn results in increased operating speed and reduced power dissipation. (As
multichip module packaging technology advanced, vendors began to offer multichip modules containing DSP
processors.)

For example, Texas Instruments sells the 42 dual C40 multichip module (MCM) containing two SMJ320C40
digital signal processors (DSPs) with 128K words × 32 bits (42D) or 256K words × 32 bits (42C) of zero-wait-
state SRAMs mapped to each local bus. Global address and data buses with two sets of control signals are
routed externally for each processor, allowing external memory to be accessed. The external global bus
provides a continuous address reach of 2G words.

It delivers a performance of 80 million floating-point operations per second (MFLOPS) with a 496-MBps burst I/O rate for 40-MHz modules.
Multiple Processors on a Chip.

As IC manufacturing technology has become more sophisticated, DSP vendors now squeeze more features and performance onto a single-chip processor, and they even combine multiple processors on a single IC. As with
multichip modules, multiprocessor chips provide increased performance and reduced power compared with
design using multiple, separately packaged processors. However, the selection of multiprocessor chip offerings
is limited to only a few devices.

Chip Sets

In a computer system, a chipset is a set of electronic components in an integrated circuit that manages the data
flow between the processor, memory and peripherals. It is usually found on the motherboard. Chipsets are
usually designed to work with a specific family of microprocessors. Because it controls communications
between the processor and external devices, the chipset plays a crucial role in determining system performance.

DSP chipsets seemed to follow the move towards processor integration in PCs. While some manufacturers
combine multiple processors on a single chip and others use multichip modules to combine multiple chips into
one package, another variation on DSP processor packaging is to divide the DSP into two or more separate
packages. This was the approach that Butterfly DSP had taken with their DSP chip set, which consisted of the
LH9320 address generator and the LH9124 processor. Dividing the processor into two or more packages may
make sense if the processor is very complex and if the number of input/output pins is very large. Splitting
functionality into multiple integrated circuits may allow the use of much less expensive IC packages, and
thereby provide cost savings. This approach also provides added flexibility allowing the system designer to
combine the individual ICs in the configuration best suited for the application. For example, with the Butterfly
chip set, multiple address generator chips could be used in conjunction with one processor chip. Finally, chip
sets have the potential of providing more I/O pins than individual chips. In the case of the Butterfly chip set, the
use of separate address generator and processor chips allowed the processor to have eight 24-bit external data
buses, many more than are provided by common single-chip processors.

Fixed point vs Floating point

Various types of processors (DSPs, MCUs, etc.) have the ability to do math using floating point numbers, but what exactly does this mean? In general, floating point math offers a wider range of numbers and more precision than fixed point math. Knowing the difference, and when to use which type of math, can mean the difference between a faster calculation and a more precise one. Mostly, the objective is to use only as much calculating power as you need to get the job done.

Figure 1: The number 0.15625 represented as a single-precision floating-point number per the IEEE 754-1985
standard. (Credit: Codekaizen, wikipedia.org)

A fundamental difference between the two is the location of the decimal point: fixed point numbers have a decimal point in a fixed position, whereas floating-point numbers have a decimal point that can move, or "float." Both types of numbers are set up in sections, and there's a placeholder for every portion of a number. Referring to Figure 1, fixed point numbers have a certain number of reserved digits to the left of the decimal point for the integer portion of the number. The digits to the right of the decimal point are reserved for the fractional part of the number. If your MCU only uses fixed-point numbers, the decimal point stays in the same place: if two digits are set aside for the fractional portion, then that is the level of precision you will have going forward.

Figure 2: The decimal point is called a "radix point" in computer science terminology. The radix point is set so there's a fixed number of bits to its left and right. The number n is the number of bits the processor can handle, so n could be 4, 8, 16, 32, or higher, depending on the bit width of the processor's data path.

Very large numbers and very small numbers have to fit in the same number of placeholders (which are actually bits), with the decimal point in the same place, regardless of the number. For instance, if a fixed-point format is to represent money, the level of precision might be just two places after the decimal point. The programmer, knowing the register need hold only two digits after the decimal point, can put in 9999 and know that the fixed-point unit will interpret that number as 99.99, which is $99.99. (Here, base-10 numbers are used as an example, but recall that processors use base-2, or binary, numbers.)

Similarly, the number 001 would be interpreted by the code as 0.01; decimal points are left out of the code itself. Using the money example again, the number 100 would be seen by fixed-point math as 1.00. The code for a fixed-point processor is written with respect to the decimal point, which sits in a fixed position. Fixed point math, independent of processor speed, is easier to code with and faster than floating point math. Fixed point is adequate unless you know that you will be dealing with larger numbers than the fixed-point unit can handle. Fixed-point numbers are often set up to use the most significant bit to represent a positive or negative sign. This means that a 4-bit unsigned integer has a range of 0 to 15 (because 2^4 = 16), while a 4-bit signed integer has a range of -8 to 7 (-8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7). Again, this is because a number with only 4 bits to represent it can take on only 16 distinct values (2^4 = 16, where 4 is the width in bits the processor handles in this example). Some recommend never storing money as a floating-point value.
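To make the money example concrete, here is a minimal C sketch (my illustration, not from the original article) in which dollar amounts are stored as integer cents, so the decimal point is implied rather than stored:

```c
#include <stdio.h>
#include <stdint.h>

/* Money stored as integer cents: the decimal point is implied,
 * two places from the right, and never stored in the register. */
typedef int32_t cents_t;

static void print_money(cents_t m)
{
    /* Reinsert the implied decimal point only when displaying. */
    printf("$%ld.%02ld\n", (long)(m / 100), (long)(m % 100));
}

int main(void)
{
    cents_t price = 9999;   /* interpreted as $99.99 */
    cents_t penny = 1;      /* interpreted as $0.01  */

    print_money(price);            /* $99.99 */
    print_money(penny);            /* $0.01  */
    print_money(price + penny);    /* $100.00: an exact integer add */
    return 0;
}
```

Because the values are plain integers, addition is exact and fast; the decimal point only reappears at display time.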

Floating point numbers also fit into a specific pattern. In fact, the Institute of Electrical and Electronics
Engineers (IEEE) has a standard for representing floating-point numbers (IEEE 754). A floating-point number
doesn’t have a fixed number of bits before and after a decimal. Rather, a floating-point number is defined by the
total number of bits reserved for expressing a number. Like fixed-point numbers, floating point numbers have a
pre-determined number of bits to hold the floating-point number, which has a sign (positive or negative
number) as well as a number (i.e., mantissa) with an exponent. All of this has to fit in the data path allotted for
the processor, which could be 16-bit, 32-bit, or 64-bit, etc. (See Figure 2 for how a 32-bit wide floating point
number might be expressed.) Floating point numbers store as many precision bits as will fit in the data path, and
the exponent determines the location of the decimal point in relation to the precision bits. The length of the
exponent and mantissa would reflect the largest and smallest numbers anticipated by the application.

Figure 3: IEEE 754 32-bit (a.k.a. single precision) floating-point numbers have three parts: the sign, the exponent, and the fraction. The fraction is also known as the significand or mantissa. In this standard, the sign bit is 0 for a positive number and 1 for a negative number, and the exponent is 8 bits wide. "Double precision" is 64 bits wide.
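To see those three fields concretely, the following small C sketch (my own illustration, using only standard-library calls) copies the bits of the single-precision value 0.15625 from Figure 1 into an integer and masks out the sign, exponent, and fraction:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    float f = 0.15625f;               /* the value shown in Figure 1 */
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret the 32 bits safely */

    uint32_t sign     = bits >> 31;           /* 1 bit                 */
    uint32_t exponent = (bits >> 23) & 0xFF;  /* 8 bits, biased by 127 */
    uint32_t fraction = bits & 0x7FFFFF;      /* 23 bits of mantissa   */

    printf("sign=%u exponent=%u (unbiased %d) fraction=0x%06X\n",
           (unsigned)sign, (unsigned)exponent,
           (int)exponent - 127, (unsigned)fraction);
    /* Prints: sign=0 exponent=124 (unbiased -3) fraction=0x200000,
     * i.e. 0.15625 = +1.25 x 2^-3. */
    return 0;
}
```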

Floating-point numbers lose precision in that they only have a fixed number of bits with which to express a real number (e.g., 16, 32, or 64 bits). Real numbers can go on to positive or negative infinity, and there is an infinite number of real numbers between 0 and 1 as well. A processor, by contrast, can represent only a finite set of values. For example, a 4-bit register can represent only 2^4 = 16 different numbers (much as four decimal digits are capped at 9999 if no digit is used for a sign), and a 16-bit processor can represent only 2^16 = 65,536 different numbers.

Floating point numbers can seem confusing and complicated, and they are also time-consuming for a processor: doing math using floating point numbers can involve several steps to account for differences in exponents. The IEEE 754 standard, first published in 1985, resolved problems with writing portable code with respect to floating point conventions. Prior to the standard, companies handled floating point math as they saw fit, making code difficult to port from one processor architecture to another. The latest update to the standard was made in 2008. Several JavaScript-based online tools are available to help in understanding IEEE 754 floating point numbers in base 2. (Search for "IEEE-754 converter".) Many articles and white papers have been written about how best to use floating point numbers, since processors can be quite literal in comparing numbers, and overflowing the highest possible number can roll the number over to zero. Simply put, floating point numbers can be much more complicated than fixed-point numbers in how processors handle them.
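As a small illustration of how literally a processor compares floating-point numbers (my example, not the article's), the sum 0.1 + 0.2 does not compare equal to 0.3, because neither operand has an exact binary representation; a tolerance-based comparison is used instead:

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 0.1 + 0.2;

    /* Exact comparison fails: neither 0.1 nor 0.2 is exactly
     * representable in binary, so a is 0.30000000000000004... */
    printf("a == 0.3        -> %s\n", (a == 0.3) ? "true" : "false");

    /* Compare within a small tolerance (epsilon) instead. */
    printf("|a - 0.3| < eps -> %s\n",
           (fabs(a - 0.3) < 1e-9) ? "true" : "false");
    return 0;
}
```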

Digital signal processing (DSP) can be separated into two categories: fixed point and floating point. There is a lot of discussion and information available on the two counterparts, yet the concepts still confuse many people trying to understand their differences and how those differences affect real-world products. The main purpose of this article is not to go too deep technically, but instead to explore the difference between the numeric representations in a simple yet intuitive way. Since it is only fair to compare apples to apples, we're going to use the same 32-bit length for both notations in this article, as shown in Fig. 1 (below).
For 32-bit Fixed Point Notation:
Integer Value = (-1)^Sign × Bits
Sign = 0 (positive value) or 1 (negative value)
Bits = 2^31 possible values
Positive Min: +1
Positive Max: +2147483647 (2^31 - 1)

With simple scaling, fractional numbers can be represented as well. The represented values are equally spaced
across the whole range. The gaps between adjacent values are always the same.
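That equal spacing can be demonstrated with a short C sketch (using a hypothetical Q16.16 format of my own choosing: 16 integer bits, 16 fractional bits), where every representable value is an integer multiple of the same step, 2^-16:

```c
#include <stdio.h>
#include <stdint.h>

/* Hypothetical Q16.16 fixed-point: value = raw / 65536.0 */
#define Q16_ONE (1 << 16)

int main(void)
{
    /* The gap between adjacent representable values is always
     * 2^-16, whether we are near 0.0 or near 30000.0. */
    printf("step near 0     = %.10f\n", 1.0 / Q16_ONE);

    int32_t big = 30000 * Q16_ONE;
    printf("step near 30000 = %.10f\n",
           (big + 1) / 65536.0 - big / 65536.0);
    return 0;
}
```

Both lines print the same step, 0.0000152588, which is exactly the "gaps are always the same" property described above.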

For 32-bit Floating Point (IEEE 754):

Value = (-1)^Sign × (1 + Mantissa) × 2^(Exponent - 127)
Sign = 0 (positive value) or 1 (negative value)
Exponent = 1 to 254 (0 and 255 are reserved for special cases); subtracting the bias of 127 gives -126 to 127
Mantissa = 0 to 0.999999881 for all '0's to all '1's; adding 1 gives 1.000000000 to 1.999999881
Positive Min: 1 × 2^-126 ≈ 1.2 × 10^-38
Positive Max: 1.999999881 × 2^127 ≈ 3.4 × 10^38
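This formula can be checked directly in C; the sketch below (mine, using only standard-library functions) extracts the three fields from an arbitrary float, rebuilds the value with ldexpf, and compares it with the original:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

int main(void)
{
    float f = 3.14159f;
    uint32_t b;
    memcpy(&b, &f, sizeof b);

    int sign = (int)(b >> 31);
    int exponent = (int)((b >> 23) & 0xFF);        /* biased; 1..254 for
                                                      normal numbers */
    float mantissa = (b & 0x7FFFFF) / 8388608.0f;  /* fraction / 2^23 */

    /* Value = (-1)^Sign x (1 + Mantissa) x 2^(Exponent - 127) */
    float rebuilt = ldexpf(1.0f + mantissa, exponent - 127);
    if (sign) rebuilt = -rebuilt;

    printf("original %.7g, rebuilt %.7g\n", f, rebuilt);  /* identical */
    return 0;
}
```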

The represented values are unequally spaced between these two extremes, such that the gap between adjacent
numbers is much smaller for small values and much larger for large numbers. This notation “steals” eight bits to
become the exponent, which gives the extensive dynamic range, but to gain this benefit, it loses the “stolen”
eight bits’ resolution for all values.
Fig. 2 (above) illustrates how the two notations differ in their numeric representations. In this illustration, we focus only on positive numbers (Sign Bit = 0) to avoid complications. The eight most significant bits in the standard fixed point format are then used as a scaling index to match the number of bits of the exponent in the floating point format. This leaves the remaining 23 bits to be matched for both formats. Once the bit-matching is accomplished, we can generate meaningful results.

Exponent e can be any arbitrary number as long as all blocks are within the min/max range (-126 to 127). Index i = 1 is mapped to e once the desired data range is chosen for the fixed point computation. The graph shows that both notations have the same 23-bit resolution at i = 1. For i = 0, the resolution of floating point is better; in fact, tremendously better: there are e + 126 floating point blocks, while there is only one fixed point block (i = 0), both covering the same numerical range.

For 1 < i < 256, the resolution of fixed point is better. Although it seems floating point loses ground at higher values, keep in mind that the fixed point counterpart can only have a total of 256 (2^8) blocks with 23-bit resolution per block. Once the value goes beyond the upper limit, the developer is forced to scale up further and live with a lower resolution in order to carry out the assigned mathematical task. This is why floating point is dominant when the computation values are small, and why it can handle large numbers when fixed point cannot. To make things more complicated for fixed point DSP developers, they also have to handle overflow and truncation errors.
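Overflow is typically handled with saturating arithmetic. Here is a hedged C sketch of a saturating add for 16-bit Q15 values, the kind of helper a fixed-point DSP developer might write (the function name is mine, not from the article):

```c
#include <stdio.h>
#include <stdint.h>

/* Saturating add for Q15 fixed point: clamp instead of wrapping. */
static int16_t q15_add_sat(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b;  /* widen so the add can't wrap */
    if (sum > INT16_MAX) return INT16_MAX;  /* clamp near +1.0 */
    if (sum < INT16_MIN) return INT16_MIN;  /* clamp at  -1.0  */
    return (int16_t)sum;
}

int main(void)
{
    int16_t a = 30000, b = 10000;  /* ~0.916 + ~0.305 in Q15: overflows */
    printf("wrapped:   %d\n", (int16_t)(a + b)); /* typically wraps negative */
    printf("saturated: %d\n", q15_add_sat(a, b)); /* clamps to 32767 */
    return 0;
}
```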
In Summary:

Advantages for Floating Point:

 Extremely large dynamic range
 Well suited to applications with intensive computations
 The smaller the number, the higher the precision and the lower the quantization noise
 Possible computation for large numbers, beyond fixed point's capability
 Simpler and faster development, without worrying about overflow and truncation errors

Advantages for Fixed Point:

 Better resolution within the narrow range 1 < i < 256 (7 out of 256 floating point Exponent blocks)
 Simpler DSP silicon
