31946-J0 DigitalSignalProcessor SW ED2 PR2 Web

Digital Signal Processor
Student Workbook
31946-J0
Edition 2
SECOND EDITION
Second Printing, March 2005
Copyright September, 2003 Lab-Volt Systems, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system,
or transmitted in any form by any means, electronic, mechanical, photocopied, recorded, or
otherwise, without prior written permission from Lab-Volt Systems, Inc.
Information in this document is subject to change without notice and does not represent a
commitment on the part of Lab-Volt Systems, Inc. The Lab-Volt F.A.C.E.T.® software and
other materials described in this document are furnished under a license agreement or a
nondisclosure agreement. The software may be used or copied only in accordance with the terms
of the agreement.
ISBN 0-86657-281-3
Lab-Volt and F.A.C.E.T.® logos are trademarks of Lab-Volt Systems, Inc.
All other trademarks are the property of their respective owners. Other trademarks and trade
names may be used in this document to refer to either the entity claiming the marks and names or
their products. Lab-Volt Systems, Inc. disclaims any proprietary interest in trademarks and trade
names other than its own.
Lab-Volt License Agreement
By using the software in this package, you are agreeing to become bound by the terms of this
License Agreement, Limited Warranty, and Disclaimer.
This License Agreement constitutes the complete agreement between you and Lab-Volt. If you do
not agree to the terms of this agreement, do not use the software. Promptly return the F.A.C.E.T.
Resources on Multimedia (CD-ROM) compact discs and all other materials that are part of
Lab-Volt's F.A.C.E.T. product within ten days to Lab-Volt for a full refund or credit.
1. License Grant. In consideration of payment of the license fee, which is part of the price you
paid for this Lab-Volt product, Lab-Volt, as Licensor, grants to you, the Licensee, a nonexclusive,
nontransferable license to use this copy of the CD-ROM software with the corresponding
F.A.C.E.T. Lab-Volt reserves all rights not expressly granted to the Licensee.
2. Ownership. As the Licensee, you own the physical media on which the CD-ROM is originally
or subsequently recorded or fixed, but Lab-Volt retains title to and ownership of the software
programs recorded on the original compact disc and any subsequent copies of the CD-ROM,
regardless of the form or media in or on which the original and other copies may exist. This
license is not a sale of the original software program of Lab-Volt's CD-ROM or any portion or
copy of it.
3. Copy Restrictions. The CD-ROM software and the accompanying materials are copyrighted
and contain proprietary information and trade secrets of Lab-Volt. Unauthorized copying of the
CD-ROM even if modified, merged, or included with other software or with written materials is
expressly forbidden. You may be held legally responsible for any infringement of Lab-Volt's
intellectual property rights that is caused or encouraged by your failure to abide by the terms of
this agreement. You may make copies of the CD-ROM solely for backup purposes provided the
copyright notice is reproduced in its entirety on the backup copy.
4. Permitted Uses. This CD-ROM, Instructor's Guide, and all accompanying documentation is
licensed to you, the Licensee, and may not be transferred to any third party for any length of time
without the prior written consent of Lab-Volt. You may not modify, adapt, translate, reverse
engineer, decompile, disassemble, or create derivative works based on the Lab-Volt product
without the prior written permission of Lab-Volt. Written materials provided to you may not be
modified, adapted, translated, or used to create derivative works without the prior written consent
of Lab-Volt.
5. Termination. This agreement is effective until terminated. It will terminate automatically
without notice from Lab-Volt if you fail to comply with any provisions contained herein. Upon
termination you shall destroy the written materials, Lab-Volt's CD-ROM software, and all copies
of them, in part or in whole, including modified copies, if any.
6. Registration. Lab-Volt may from time to time update the CD-ROM. Updates can be made
available to you only if a properly signed registration card is filed with Lab-Volt or an authorized
registration card recipient.
7. Miscellaneous. This agreement is governed by the laws of the State of New Jersey.
Limited Warranty and Disclaimer
This CD-ROM software has been designed to assure correct operation when used in the manner
and within the limits described in this Instructor's Guide. As a highly advanced software product,
it is quite complex; thus, it is possible that if it is used in hardware configurations with
characteristics other than those specified in this Instructor's Guide or in environments with
nonspecified, unusual, or extensive other software products, problems may be encountered by a
user. In such cases, Lab-Volt will make reasonable efforts to assist the user to properly operate
the CD-ROM but without guaranteeing its proper performance in any hardware or software
environment other than as described in this Instructor's Guide.
This CD-ROM software is warranted to conform to the descriptions of its functions and
performance as outlined in this Instructor's Guide. Upon proper notification and within a period
of one year from the date of installation and/or customer acceptance, Lab-Volt, at its sole and
exclusive option, will remedy any nonconformity or replace any defective compact disc free of
charge. Any substantial revisions of this product, made for purposes of correcting software
deficiencies within the warranty period, will be made available, also on a licensed basis, to
registered owners free of charge. Warranty support for this product is limited, in all cases, to
software errors. Errors caused by hardware malfunctions or the use of nonspecified hardware or
other software are not covered.
LICENSOR MAKES NO OTHER WARRANTIES OF ANY KIND CONCERNING THIS
PRODUCT, INCLUDING WARRANTIES OR MERCHANTABILITY OR OF FITNESS FOR A
PARTICULAR PURPOSE. LICENSOR DISCLAIMS ALL OBLIGATIONS AND LIABILITIES
ON THE PART OF LICENSOR FOR DAMAGES, INCLUDING BUT NOT LIMITED TO
SPECIAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR IN CONNECTION
WITH THE USE OF THE SOFTWARE PRODUCT LICENSED UNDER THIS AGREEMENT.
Questions concerning this agreement and warranty and all requests for product repairs should be
directed to the Lab-Volt field representative in your area.
LAB-VOLT SYSTEMS, INC.
P.O. Box 686
Farmingdale, NJ 07727
Attention: Program Development
Phone: (732) 938-2000 or (800) LAB-VOLT
Fax: (732) 774-8573
Technical Support: (800) 522-4436
Technical Support E-Mail: techsupport@labvolt.com
Table of Contents
Unit 1 – DSP Trainer Familiarization
Exercise 1 – Introduction to the DSP Circuit Board
Exercise 2 – The Assembler and Debugger
Exercise 3 – Processor Arithmetic
Unit 2 – CPU Architecture
Exercise 1 – The Central Arithmetic Logic Unit
Exercise 2 – Memory Space
Exercise 3 – Addressing
Unit 3 – Program Execution
Exercise 1 – The Program Controller
Exercise 2 – The Pipeline
Unit 4 – Basic I/O
Exercise 1 – DSP Peripherals
Exercise 2 – Digital Signal Processing: The FIR Filter
Appendix A – Safety
Introduction
This Student Workbook provides a unit-by-unit outline of the Fault Assisted Circuits for
Electronics Training (F.A.C.E.T.) curriculum.
The following information is included together with space to take notes as you move through the
curriculum.
♦ The unit objective
♦ Unit fundamentals
♦ A list of new terms and words for the unit
♦ Equipment required for the unit
♦ The exercise objectives
♦ Exercise discussion
♦ Exercise notes
The Appendix includes safety information.
UNIT 1 – DSP TRAINER FAMILIARIZATION
UNIT OBJECTIVE
Upon completion of this unit, you will be able to explain the difference between a digital signal
processor (DSP) and a general-purpose processor. You will be familiar with the design process
for DSP programs.
UNIT FUNDAMENTALS
A Digital Signal Processor (DSP) is an incredibly fast and powerful microprocessor that, like our
brain, can handle the analysis of signals in real-time.
The internal design of DSPs, whose key element is the multiply and add architecture, often
makes them much faster at performing mathematical operations than other microprocessors.
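The multiply and add operation mentioned above can be sketched in a few lines of Python. This is only an illustration of the arithmetic pattern (a running sum of products); the function name and the sample values are hypothetical, and the sketch says nothing about how the DSP hardware implements the operation.

```python
def multiply_accumulate(samples, coefficients):
    """Return the sum of products of two equal-length sequences."""
    accumulator = 0
    for sample, coefficient in zip(samples, coefficients):
        # One multiply and one add per step: the pattern a DSP is built around.
        accumulator += sample * coefficient
    return accumulator


samples = [1, 2, 3, 4]
coefficients = [4, 3, 2, 1]
print(multiply_accumulate(samples, coefficients))  # 1*4 + 2*3 + 3*2 + 4*1 = 20
```

A general-purpose processor typically needs several instructions for each pass through this loop, which is why a dedicated multiply and add structure matters.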
Digital Signal Processors are characterized by:
• specialized structures that make them execute commands rapidly and efficiently.
• fast multiply instructions.
• a reduced number of commands, which makes the DSP programming process simpler.
DSPs have revolutionized telecommunications. They can be found inside cellular phones,
modems, speech recognition/synthesis devices, Digital Versatile Disk (DVD) players, and
high-level security devices, to name a few.
DSPs are also commonly found in devices that people do not immediately associate with them,
such as hard disk drive controllers, vehicle suspension systems, and the signal processing
circuits of medical imagers and radar systems.
DSPs began to appear at the end of the 1970s and the beginning of the 1980s with Bell Labs'
DSP1, Intel's 2920, and NEC's µPD7720.
In 1982, Texas Instruments introduced the TMS32010, the first member of what was to become
a popular 16-bit fixed-point DSP family. This DSP had an average calculation rate of 8 MIPS.
In 1998, DSPs, using parallelism, reached calculation speeds of up to 1600 MIPS.
The DSP used with the Lab-Volt DIGITAL SIGNAL PROCESSOR circuit board is a Texas
Instruments TMS320C50. The TMS320C50 is a third-generation DSP with an internal design
based on the first-generation TMS32010.
Also in 1982, the first floating-point DSPs were produced by Hitachi. This numeric format
greatly increased the dynamic calculation range of DSPs.
NEC introduced, two years later, the first 32-bit floating-point DSPs that had a calculation speed
of 6.6 MIPS.
Generally, real world signals (e.g., radar and sonar) are better processed by floating-point DSPs.
Constructed signals (e.g., telecom, imaging and control) are generally better processed using
fixed-point DSPs.
The range of uses for DSPs has grown because:
• They allow for more complex processing than is possible with analog circuitry;
• They provide repeatable signal processing performance;
• Digital processing code can be easily modified, which makes design updates or changes
more flexible;
• They usually result in a lower development cost than analog designs with equivalent
performance levels.
A DSP cannot operate without the intelligence of a program giving it its commands. The
program tells the DSP which instructions it must execute to perform certain functions. This
program is stored as machine code inside of the DSP.
Writing a DSP program directly in machine code would be very difficult. For this reason, an
assembler language is used to program the DSP.
This is a programming language whose instructions, called mnemonics, are symbolic and
usually in one-to-one correspondence with the machine instructions.
An assembler and a linker are used to translate the program written in assembler language into
DSP machine code.
The assembler translates the program file into object files, which are then linked together to create
the executable file.
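The two-step translation described above can be sketched with a toy assembler and linker. Everything specific here is an assumption made for illustration: the three mnemonics, their 8-bit opcodes, and the 16-bit word layout are invented and are not the TMS320C5x encodings, and a real linker also resolves addresses rather than simply joining files.

```python
# Hypothetical opcodes, invented for this sketch (NOT TMS320C5x encodings).
OPCODES = {"LOAD": 0x10, "ADD": 0x20, "STORE": 0x30}


def assemble(source_lines):
    """Translate 'MNEMONIC operand' lines into 16-bit machine words."""
    words = []
    for line in source_lines:
        mnemonic, operand = line.split()
        # Opcode in the upper byte, operand in the lower byte.
        words.append((OPCODES[mnemonic] << 8) | (int(operand) & 0xFF))
    return words


def link(object_files):
    """Join several object files into one executable image."""
    executable = []
    for object_file in object_files:
        executable.extend(object_file)
    return executable


main_obj = assemble(["LOAD 5", "ADD 3"])
store_obj = assemble(["STORE 64"])
image = link([main_obj, store_obj])
print([f"{word:04X}" for word in image])  # ['1005', '2003', '3040']
```

Each mnemonic maps to exactly one machine word, which is the one-to-one correspondence that distinguishes an assembler language from a high-level language.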
The C language is a high-level language which is used more and more to program complex
DSPs or highly complex algorithms.
Programming in C simplifies the design of DSP applications because the programmer is no
longer limited by the small instruction set of low-level languages (like the assembler language).
A C compiler is used to translate the C source code into the appropriate DSP assembler code.
The last part of programming involves checking your program for mistakes and making changes
until it correctly performs the desired function.
This final process is commonly known as debugging.
A program that aids software debugging is called a debugger.
A debugger gives the programmer an ability to diagnose the problems associated with their DSP
programs. This is done before committing the program to the DSP's memory.
The C5x Visual Development Environment, C5x VDE, is the debugger used with the Digital
Signal Processor.
DSP system developers rarely debug a DSP program without the aid of a debugger. They also
often use EVMs, emulators, and simulators.
The DSP used with the circuit board is part of the TMS320C5x DSK (DSP Starter Kit)
evaluation module.
When using EVMs, emulators, and simulators the developers can change, during the
development process, the model of the DSP being tested.
Once the program is functional, the final tests are carried out on a DSP system.
The programs included and used with the Digital Signal Processor are written in the assembler
language. The assembler language used is specific to the TMS320C5x EVMs; it has added
instructions called DSK directives.
To run, or examine the function of, a Digital Signal Processor program, the executable file
(*.dsk) must be downloaded into the DSP through the C5x VDE, the Trainer's debugger.
NEW TERMS AND WORDS

digital - pertaining to data represented by numbers. A digital signal is not continuous: it does not
have a numerical value associated with every point in time, and it has discrete amplitudes.
signal - a time-dependent physical quantity (like a current level) by which, for example,
information is transmitted in an electronic system or circuit.
processor - a device that performs operations on data according to specific rules given to it by a
list of instructions.
real-time - a processor operating mode under which a data sample is received, processed, and
returned before the processor's next data sample is received. This is done so quickly as to allow
the user to respond instantaneously, affect the functioning of the environment, or guide the
physical processes which the processor controls. Most interactive systems operate in a real-time
mode.
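The real-time constraint can be put into numbers with a short calculation. The sketch below assumes the 20 MHz master clock described for the circuit board in Exercise 1 and a hypothetical 8 kHz sample rate; the sample rate is an assumption chosen for illustration, not a value taken from this workbook.

```python
def cycles_per_sample(clock_hz, sample_rate_hz):
    """Number of clock cycles available between two consecutive samples."""
    return clock_hz // sample_rate_hz


MASTER_CLOCK_HZ = 20_000_000  # master clock frequency given in Exercise 1
SAMPLE_RATE_HZ = 8_000        # assumed sample rate, for illustration only

budget = cycles_per_sample(MASTER_CLOCK_HZ, SAMPLE_RATE_HZ)
print(budget)  # 2500
```

Under these assumptions, all processing for one sample must finish within 2500 clock cycles for the system to stay real-time.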
fixed-point - a system of arithmetic in which all numerical quantities are expressed by a number
of bits. In this system the decimal point is implicitly located at some predetermined position.
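A minimal sketch of fixed-point representation follows. It assumes a 16-bit word with 15 fractional bits (a layout often called Q15); that choice is an assumption made for the example, and the helper names are hypothetical.

```python
Q15_SCALE = 1 << 15  # 15 fractional bits: the binary point sits after the sign bit


def to_q15(value):
    """Convert a real number in [-1, 1) to a 16-bit fixed-point integer."""
    raw = int(round(value * Q15_SCALE))
    # Saturate so the result always fits in a signed 16-bit word.
    return max(-Q15_SCALE, min(Q15_SCALE - 1, raw))


def from_q15(raw):
    """Convert a 16-bit fixed-point integer back to a real number."""
    return raw / Q15_SCALE


print(to_q15(0.5))      # 16384
print(to_q15(-0.25))    # -8192
print(from_q15(16384))  # 0.5
```

The processor only ever stores the integer; the position of the binary point is a convention that the programmer keeps in mind.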
MIPS - a unit of measure proportional to the performance level of a processor. One MIPS
corresponds to the execution of a Million Instructions Per Second (it is sometimes abbreviated
MIP). Often the multiply/accumulate instruction, common to nearly all DSPs, is used to calculate
the MIPS rate.
parallelism - parallelism is a type of computing in which several independent operations are
carried out at the same time instead of one after the other.
floating-point - a system of arithmetic characterized by a notation where real numbers are
represented by a fixed-point value known as the mantissa, and by an integer known as the
exponent. The real number is equal to the mantissa multiplied by two to the power of the
exponent.
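The mantissa and exponent relationship can be checked with Python's standard `math.frexp` and `math.ldexp` functions. Note the assumption: Python works with its own double-precision format, so this only illustrates the relationship "number = mantissa × 2^exponent", not the word format of any particular floating-point DSP.

```python
import math

# Split 6.0 into a mantissa and a power-of-two exponent.
mantissa, exponent = math.frexp(6.0)
print(mantissa, exponent)  # 0.75 3

# Rebuild the number: mantissa * 2**exponent.
print(math.ldexp(mantissa, exponent))  # 6.0
```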
code - a piece of programming text found in a programming language.
machine code - instruction code recognized and executed by a microprocessor. The code is
expressed in a binary numerical representation.
mnemonics - a symbolic representation made of alphabetic letters and designed to aid human
memory. It commonly represents the operation code of an assembly language instruction.
The assembler translates the mnemonic into machine code.
assembler - a program that converts, for execution, symbolic instructions (mnemonics) into
machine code.
linker - a program that creates one executable file from one or many object files.
object files - files that consist of machine code directives that usually represent a portion of a
program.
C language - a general purpose programming language that produces code independent of the
type of microprocessors it is developed for.
high-level language - a programming language closer to human language; each program
instruction or statement corresponds to one or more machine-executable instructions.
low-level languages - programming languages close to machine language, in which each
mnemonic has a one-to-one equivalence with machine code.
compiler - a program that converts a high-level language into a low-level machine language.
EVMs - Evaluation Modules are low cost development boards that include a target processor,
a limited number of peripherals, and a limited amount of external memory. EVMs are used to
test code in real-time.
emulators - a combination of hardware microprograms and software that enables one computer
system to execute programs written for another type of microprocessor.
simulators - a program that permits a computer system to imitate the logical operation of another
type of microprocessor.
surface mount - a type of technology that allows for a fully automated manufacturing process
for printed circuits. It consists of soldering the pieces directly on the surface of a printed circuit
board (PCB).
CODEC - the abbreviation for CODer-DECoder. It is an electronic circuit that converts analog
signals into digital representations, and decodes digital signals into analog form.
anti-aliasing filter - low pass filter designed to remove, from the input signal, the high frequency
components that would degrade the analog-to-digital conversion.
post-filter - low pass filter designed to remove, from the output signal, high frequency
components that are created by the digital-to-analog conversion.
interrupt - the suspension of a computer process caused by an external event. Once the external
event handling procedure is completed, the computer process is resumed.
hand-shaking - the dialogue that takes place between two devices before a transfer of
information begins. Hand-shaking is the exchange of predetermined signals for purposes of
control when a connection is established.
label - a symbol that begins in column 1 of an assembler source statement. A label is the only
assembler statement that can begin in column 1.
mnemonic - a symbolic representation made of alphabetic letters and designed to aid human
memory. It is commonly an abbreviation, or shortened form, of the description of the machine
code operation that it performs. The assembler translates the mnemonic into machine code.
operands - the part of an instruction that designates where the central processing unit (CPU) will
fetch or store data during instruction execution.
comment - the portion of a source statement that documents or improves the readability of a
source file. Comments are not compiled, assembled, or linked; they have no effect on the object
file.
registers - a storage device having a specified capacity such as a bit, a byte, or a computer word
and usually intended for a special purpose.
conditional blocks - a block of code that is only assembled if a certain conditional statement is
true.
CPU - the CPU, Central Processing Unit, is that portion of the processor involved in arithmetic,
shifting, and Boolean logic operations, as well as the generation of data- and program-memory
addresses.
peripheral - in a data processing system, any equipment, distinct from the central processing
unit, which may provide the system with outside communication or additional facilities.
breakpoints - breakpoints are used to correct or debug programs. A breakpoint is a place in a
computer program, usually an instruction, where the execution of the program is interrupted.
subroutine - a sequence of computer instructions that perform a specific task and that are usually
used repeatedly by the main program (routine).
wavetable - a list of values that define one period of a signal. The wavetable is stored in memory
and is used to generate a waveform.
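A wavetable can be sketched as a list holding one period of a sine wave, read repeatedly to produce a waveform. The table length, the `step` parameter, and the function names below are hypothetical choices for illustration; they are not taken from the exercise programs.

```python
import math


def make_wavetable(length, amplitude=1.0):
    """Return one period of a sine wave as a list of `length` values."""
    return [amplitude * math.sin(2 * math.pi * n / length) for n in range(length)]


def generate(wavetable, step, count):
    """Read `count` samples, advancing `step` table entries per sample."""
    return [wavetable[(n * step) % len(wavetable)] for n in range(count)]


table = make_wavetable(8)
output = generate(table, step=2, count=8)  # step=2 plays the table twice as fast
print(output[:4])
```

Reading the table with a larger step produces a higher output frequency from the same stored period.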
dma - an abbreviation for Data Memory Address.
pma - an abbreviation for Program Memory Address.
clock cycle rates - synonymous with processor cycle rate, it usually refers to the rate at which the
DSP system performs its most basic unit of work.
dynamic range - the dynamic range is the ratio between the largest and smallest value a quantity
or parameter can take.
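As a worked example of this ratio, the sketch below uses the largest and smallest non-zero magnitudes of a 16-bit signed integer. Expressing the ratio in decibels is an added convention common in signal processing, not something stated in this definition, and the function name is hypothetical.

```python
import math


def dynamic_range_db(largest, smallest):
    """Express the ratio largest/smallest in decibels (20 * log10 of the ratio)."""
    return 20 * math.log10(largest / smallest)


# Largest positive value and smallest non-zero magnitude of a 16-bit signed integer.
print(round(dynamic_range_db(32767, 1), 1))  # 90.3
```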
internal arithmetic unit - that part of a computer which performs arithmetic operations. E.g.,
taking two numbers stored in specific places in memory, adding them together, and storing the
result.
numerical formats - a programmer's convention where each bit in a word of information is
implied to be weighted by a certain value.
two's complement - a numerical convention for the representation of values in fixed-point
processors. The left-most bit represents a negative decimal value and the remaining bits each
represent a different positive decimal value.
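The weighting described in this definition can be verified with a small sketch. The function name is hypothetical; the arithmetic follows the definition directly, with a negative weight on the left-most bit and positive weights on the others.

```python
def twos_complement_value(bits):
    """Interpret a string of '0'/'1' characters as a two's complement integer."""
    width = len(bits)
    # The left-most bit carries a negative weight of 2**(width - 1).
    value = -int(bits[0]) * (1 << (width - 1))
    # Every remaining bit carries a positive power-of-two weight.
    for position, bit in enumerate(bits[1:], start=1):
        value += int(bit) * (1 << (width - 1 - position))
    return value


print(twos_complement_value("0101"))              # 4 + 1 = 5
print(twos_complement_value("1011"))              # -8 + 2 + 1 = -5
print(twos_complement_value("1111111111111111"))  # -32768 + 32767 = -1
```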
weights - the factor by which a digit in a binary number is multiplied to obtain its additive
contribution in the representation of a real number.
binary point - the character, in binary notation, that separates the integral part of a numerical
expression from its fractional part.
EQUIPMENT REQUIRED
F.A.C.E.T. base unit
DIGITAL SIGNAL PROCESSOR circuit board
C5x VDE program
Ex1_1, ex1_2 assembler (asm) and program (dsk) files
Oscilloscope
Multimeter
NOTES
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Exercise 1 – Introduction to the DSP Circuit Board
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with the location and the function of each of
the various components of the DIGITAL SIGNAL PROCESSOR training system.
DISCUSSION
The circuit board has two functional sections: the section containing the circuit board
accessories, and the section containing the Digital Signal Processor and its peripherals.
The circuit board accessories are the:
• POWER SUPPLY with AUXILIARY POWER INPUT
• DC SOURCE
• MICROPHONE PRE-AMPLIFIER
• AUDIO AMPLIFIER
The POWER SUPPLY circuit block delivers a filtered and regulated DC supply to the entire
circuit board.
The circuit board can be operated in two different ways. Either the input voltage for the Power
Supply can be received from a Lab-Volt FACET Base Unit or it can be received through external
±15 V connections found on the AUXILIARY POWER INPUT block.
The DC SOURCE block delivers a DC voltage that varies between -3.5 Vdc and +3.5 Vdc,
depending on the position of the potentiometer. The DC SOURCE can be used as the source of
an input reference signal for programs run on the DSP.
The MICROPHONE PRE-AMPLIFIER is used to adjust a microphone's signal to a level suitable
for input into the DSP.
The GAIN potentiometer varies the output-level between a low and a high value.
The AUDIO AMPLIFIER is used to hear the signal from the ANALOG OUTPUT, located on the
CODEC block. Either the speaker or the headphones can be used to listen to the signal.
The second functional section of the circuit board, the DSP and its peripherals, contains the:
• DSP
• CODEC
• I/O INTERFACE
• INTERRUPTS
• AUXILIARY I/O
• SERIAL PORT
The Digital Signal Processor is found at the heart of a digital signal processing system.
The DSP block contains a TMS320C50 DSP integrated circuit (IC) in a 132-pin surface mount
package. It may reach execution speeds of up to 50 MIPS. There are many kinds of DSPs, and they
may vary in cycle speed. The calculation speed is set by a DSP's clock. However, the speed is
limited by the IC's internal system design constraints. Some DSPs use an internal oscillator to set
the clock and others use an external oscillator.
The DSP used on the circuit board is configured to use an external oscillator. The Oscillator
located on the circuit board provides it with a 40 MHz reference signal. The DSP divides this
signal to make a 20 MHz internal one (the master clock frequency) that it uses to time its
instruction cycles.
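The clock arithmetic in this paragraph can be worked through as follows. The divide-by-two factor is inferred from the 40 MHz and 20 MHz figures given above; the function and constant names are hypothetical.

```python
def master_clock_hz(oscillator_hz, divider=2):
    """Internal master clock obtained by dividing the external oscillator."""
    return oscillator_hz // divider


def cycle_time_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


OSCILLATOR_HZ = 40_000_000  # external oscillator on the circuit board

clock = master_clock_hz(OSCILLATOR_HZ)
print(clock)                 # 20000000
print(cycle_time_ns(clock))  # 50.0
```

With a 20 MHz master clock, each cycle therefore lasts 50 ns.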
Some DSPs have their programs written to internal ROM during the manufacturing process;
most, however, use external ROM to store their program. Both types of DSPs access their ROM
at boot-up and store the program to RAM for execution.
A DSP uses digital signals. To be able to interact with the outside world, it must have a
translator to convert the analog signals to digital ones and then back again.
A CODEC is the translator that is used for this purpose.
A CODEC is usually made up of the following components:
• a programmable input GAIN
• an ANTI-ALIASING FILTER
• an Analog-to-Digital converter
• a Digital-to-Analog converter
• a POST-FILTER
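The reason a CODEC places an ANTI-ALIASING FILTER ahead of the Analog-to-Digital converter can be illustrated with a short calculation. The sketch computes the frequency at which a sampled tone appears after conversion; the 8 kHz sample rate is an assumption chosen for the example, not the rate used by the circuit board's CODEC.

```python
def alias_frequency(signal_hz, sample_rate_hz):
    """Apparent frequency of a tone after sampling at `sample_rate_hz`."""
    # Fold the tone back into the range 0 .. sample_rate_hz / 2.
    return abs(signal_hz - sample_rate_hz * round(signal_hz / sample_rate_hz))


SAMPLE_RATE_HZ = 8_000  # assumed sample rate, for illustration only

print(alias_frequency(3_000, SAMPLE_RATE_HZ))  # 3000: below half the sample rate, unchanged
print(alias_frequency(7_000, SAMPLE_RATE_HZ))  # 1000: above half the sample rate, aliased
```

Under this assumption, a 7 kHz input would be indistinguishable from a 1 kHz tone once sampled, which is why such components are removed before conversion.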
The I/O INTERFACE is a means to display and to input program information. The 8-position
DIP switch enters an 8-bit number into the DSP. Depending on the program being used the
information will be processed in different ways. The 7-segment displays are used to show
program information to the DSP user.
Like most microprocessors, DSPs have interrupt control capabilities. Two push-buttons can be
used as user input devices for a program. When one of the push-buttons is pressed, an interrupt is
signaled within the DSP and the program code associated with it is executed.
The AUXILIARY I/O section was added for signal monitoring purposes and for prototyping
additional DSP exercises with the circuit board. Its headers can be used to interface the DSP with
an external circuit.
The AUXILIARY I/O section has three headers. ±5 Vdc and ±15 Vdc connection points are
available on the 10-pin right header; these can be used to power an external circuit. The circuit
board supplies have a common ground. The left header outputs the 8 LSB pins (labeled D0 to D7)
of the external DSP data bus and includes 4 pre-decoded addresses (labeled PA0# to PA3#),
which can be used for prototype development.
The middle header has the following input/output (I/O) pins:
• Data, Program, and I/O space select (DS#, PS#, IS#)
• Timer output (TOUT)
• Read select and Write enable for external devices (RD#, WE#)
• Read/Write select for external accesses (R/W#)
• Interrupt acknowledge signal (IACK#)
• External interrupt input (INT4#)
• Directional and Chip select to control external data transfer (DIR, CS#)
The DSP on the circuit board is programmed to be the slave of a host computer. To use the DSP
Trainer, the circuit board SERIAL PORT must be connected to one of your computer's
serial ports.
NOTE: If the host computer does not have a second serial port connection available, then at the
appropriate times during the exercise procedures, you can disconnect the Base Unit serial link
and use it to connect the circuit board SERIAL PORT to the computer.
The C5x Visual Development Environment (VDE) manages hand-shaking between the circuit
board and your computer. It controls all input and output from the DSP's memory via the serial
link. Once the communication link between your computer and the DSP board is established, the
C5x VDE can be used to download a program into the DSP.
NOTES
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
______________________________________________________________________________
Exercise 2 – The Assembler and Debugger
EXERCISE OBJECTIVE
Upon completion of this exercise, you will understand basic DSP source file syntax. You will be
able to operate the debugger that accompanies the DIGITAL SIGNAL PROCESSOR.
DISCUSSION
The source file for a DSP program can be written in a text editor; virtually any ASCII
editor can be used.
The instruction lines found in the source file and used in the assembler programming language
are called source statements. A DSP program is a list of these assembled source statements. The
source statements used in the assembler language have a very precise syntax.
There are four fields that make up a statement:
• the label (optional)
• the instruction mnemonic
• the instruction mnemonic operands (the number of operands depends on the instruction
used)
• the comment (optional)
Each source statement field must be separated by one or more blanks. The source statements
themselves must begin with either a label or a blank. The beginning of a comment line must be
indicated by a semicolon or an asterisk. A source file may also contain assembler directives.
Directives supply the program with data and control the assembly process.
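The syntax rules above can be sketched as a small parser that splits a source statement into its four fields. This is a simplified illustration in Python, not the logic of dsk5a.exe: it assumes that operands are separated by commas and that a semicolon anywhere on a line starts the comment. The sample statements are illustrative and are not taken from the exercise files.

```python
def parse_statement(line):
    """Split one source statement into (label, mnemonic, operands, comment)."""
    if line.startswith("*"):
        # An asterisk at the start of the line marks a comment line.
        return None, None, [], line[1:].strip()
    code, _, comment = line.partition(";")  # a semicolon starts the comment
    # A statement that does not start with a blank starts with a label.
    has_label = bool(code) and not code[0].isspace()
    fields = code.split()
    label = fields.pop(0) if has_label and fields else None
    mnemonic = fields.pop(0) if fields else None
    operands = [op for field in fields for op in field.split(",") if op]
    return label, mnemonic, operands, comment.strip() or None


print(parse_statement("LOOP    ADD   #1      ; add one"))
# ('LOOP', 'ADD', ['#1'], 'add one')
print(parse_statement("        SACL  TEMP,0"))
# (None, 'SACL', ['TEMP', '0'], None)
```

Because a statement must begin with either a label or a blank, checking the first character is enough to decide whether a label is present.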
Assembler directives permit the following to be done:
• initialize program instructions and data values into memory.
• define symbolic names for certain DSP registers (using the .mmregs directive).
• reserve space in memory for variables that have not been initialized.
• assemble conditional blocks.
The executable file dsk5a.exe is the assembler program used with the DIGITAL SIGNAL
PROCESSOR. When a source file (*.asm) is assembled, a dsk file (*.dsk) and a listing file (*.lst)
are created.
The dsk file, also known as the program file, contains a list of machine code corresponding to
the assembled source statements. To run a program, the dsk file must be loaded into the DSP.
The listing file lists all source statements, line numbers, and any errors that occurred during
assembly. When the program is viewed inside of the debugger, the listing and the dsk files are
used to create a display of the source file statements.
The C5x VDE is the debugger used with the Digital Signal Processor. It has the following
functions:
• load dsk programs into memory and view the program code,
• run and halt the program and execute single step commands (execution of single instructions),
• display in a viewing window the CPU registers and peripheral registers,
• display in a viewing window the DSP memory areas,
• graph DSP memory values while the DSP program is running,
• edit CPU registers, DSP program instructions and memory,
• place breakpoints at specific DSP source statements.
The C5x VDE uses the listing file to disassemble the machine code contained within the dsk file
(disassembly is the reverse of assembly). The disassembled code is then displayed. When a dsk
file is loaded into DSP memory, the Dis-Assembly window opens automatically.
The Dis-Assembly window displays four columns of information:
1. the address in memory where the instruction is found,
2. the instruction in machine code,
3. the instruction mnemonic,
4. the instruction operands.
The source statement highlighted with a yellow line represents the next instruction that the DSP
will execute. A source statement highlighted with a purple line corresponds to an instruction
where a breakpoint has been set.
A toolbar located at the top of the debugger screen has commands that aid in the control of
program execution:
• Run and Halt begin and stop program execution.
• StepInto single-steps through the code. Each click of the StepInto button on the toolbar
executes one program instruction.
• StepOver executes an entire subroutine in one step. If you do not wish to single-step through
a subroutine, execute the StepOver command once you reach a CALL instruction. The entire
subroutine is then executed, after which single stepping can resume.
• StepOut executes the remaining instructions of the current subroutine. Execution halts once a
RET (return from subroutine) assembler instruction is encountered.
The values of all CPU registers are shown in the C5X Registers window. You will become
familiar with many of the CPU registers as you advance through the course. For the moment, it is
sufficient to know that these registers contain DSP system information. The registers displayed
in the window contain values, DSP status and control bits and instruction pointers. Memory is
viewed inside of the debugger by opening a Memory display window. The memory addresses to
be monitored are user selected. As many memory windows as needed may be launched inside of
the debugger.
When a dsk file is loaded inside of the C5x VDE, the following is true for the Dis-Assembly and
Memory display windows:
• All source statement labels, used to declare a variable within the source code, appear in blue.
• All comments of labeled source statements appear in green.
The Memory display window can be used as a Watch Window. Variables stored in memory may
be watched and edited if necessary. Within all viewing windows, the following is true:
• Memory addresses and registers appear in red when the values stored within them are
modified during the execution of the previous instruction.
• Memory addresses and registers (except the RAM, XF and INTM registers) can be edited by
simply double-clicking on the desired register or memory address.
The Graph command in the View menu can be used for graphical displays of data values. Signals
can be viewed in either the time or frequency domain, at any point in your program. Breakpoints
halt a program so that the debugger user can verify the status of the loaded program after a
certain instruction. Double-clicking an instruction in the Dis-Assembly window sets a breakpoint
on that instruction.
The Associate Breakpoints window can be launched by executing the Associate Breakpoints
command in the Options menu. A window can be continuously refreshed by using the associate
breakpoint feature. A selected display window (Graph display, Memory display, CPU Register
display, ...) can be associated with any breakpoint. When a breakpoint is reached, any display
windows that are associated with it are updated. This effectively connects a probe to a specific
point in the program.
Exercise 3 – Processor Arithmetic
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with the numerical formats and
representations used within DSPs.
DISCUSSION
Digital Signal Processors are categorized by the way that their arithmetic is performed. A DSP
is either a fixed-point DSP or a floating-point DSP. The type of DSP chosen for a specific
application depends on the suitability of its arithmetic for the task. The TMS320C50 is a
fixed-point DSP.
Fixed-point DSPs are usually cheaper than their floating-point counterparts because they contain
less silicon and have fewer external pins. Fixed-point devices generally have faster clock
rates. In 1998, clock cycles were as short as 10 ns, corresponding to a processor cycle rate of
100 MHz.
Floating-point devices are usually more flexible because their arithmetic system has access to a
wider dynamic range and in many cases these systems are more precise. A typical 16-bit fixed-
point processor stores coefficients and data values with 16-bit precision. However, within the
internal arithmetic unit of the DSP, intermediate values are kept at 32 bits of precision. By so
doing, the cumulative rounding error made during calculations is minimized.
When you use your computer or your calculator you can calculate such values as: (-1*23) or
(3.453). A DSP can also provide answers to the same types of questions. A programmer must use
certain numerical formats so that every value desired to be used in the DSP has a binary
representation associated with it. This binary value will need at times to represent either a
positive or negative, fractional or integer number. Since a DSP is a processor that specializes in
doing rapid calculations, it is essential to understand how the diverse range of numeric values
can be expressed.
Integers, both negative and positive, are represented by the two's complement integer format
(2s-format). Fractional numbers, both negative and positive, are represented by the two's
complement fractional format (Q-format). These formats differ only by the associated weights
that are given to each bit of information. In two's complement integer notation (2s-format) a
negative sign is associated with the most significant bit. The 2s-format provides a numeric range
covering $-2^{N-1}$ to $+(2^{N-1} - 1)$, where N represents the number of bits in the binary number.
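The bit weighting described above can be checked with a short Python sketch (Python is used for illustration only). The function names are hypothetical. The only difference from an unsigned binary number is that the most significant bit carries the weight $-2^{N-1}$ instead of $+2^{N-1}$.

```python
def twos_complement_value(bits):
    """Value of a bit string in 2s-format (first character is the MSB)."""
    n = len(bits)
    # The MSB carries a negative weight of -2^(N-1).
    value = -int(bits[0]) * 2 ** (n - 1)
    # Every other bit carries its usual positive weight.
    for position, bit in enumerate(bits[1:], start=1):
        value += int(bit) * 2 ** (n - 1 - position)
    return value


def twos_complement_range(n):
    """Smallest and largest values representable with n bits."""
    return (-2 ** (n - 1), 2 ** (n - 1) - 1)
```

With N = 16, twos_complement_range(16) gives the range -32768 to +32767.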
The two's complement fractional format (or Q-format) associates different weights with each bit
as well. The existence of the binary point separating the fractional weighted values from the
integral weighted values is implied. In Q15-format the most significant bit is the sign bit and it is
given a weight of $-2^0$. This implies that the binary point is located between the MSB and the
14th bit. By changing the position of the binary point the weight given to each bit is also
changed. Consequently, the dynamic range and the precision of the two's complement fractional
format may vary with the type of format being used.
Note that by continuing to move the binary point further and further to the right a handy
relationship is uncovered. The 2s-format and the Q15-format decimal representations are
proportional by a scaling factor of $2^{15}$.
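The scaling relationship can be illustrated with a minimal Python sketch (illustration only; the names Q15_SCALE, word_to_int, word_to_q15 and q15_to_word are hypothetical). The same 16-bit word is read once as a 2s-format integer and once as a Q15 fraction; the two readings differ only by the factor $2^{15}$.

```python
Q15_SCALE = 2 ** 15


def word_to_int(word):
    """Read a 16-bit word (0..0xFFFF) as a 2s-format integer."""
    return word - 0x10000 if word & 0x8000 else word


def word_to_q15(word):
    """Read the same 16-bit word as a Q15 fraction in [-1, +1)."""
    return word_to_int(word) / Q15_SCALE


def q15_to_word(x):
    """Nearest Q15 word for x; values outside [-1, +1) are clamped."""
    n = round(x * Q15_SCALE)
    n = max(-0x8000, min(0x7FFF, n))
    return n & 0xFFFF
```

For example, the word 4000h reads as 16384 in 2s-format and as 0.5 in Q15-format, and 16384 = 0.5 x $2^{15}$.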
The 2s- and Q-formats can be used by the fixed-point internal arithmetic units of any DSP. These
formats are numerical conventions used by programmers. The binary arithmetic done inside of a
fixed-point DSP is not affected by the format of the binary number used.
Floating-point DSPs generally use a 32-bit format where the 24 left-most bits represent the
mantissa and the 8 remaining bits represent the exponent. So that a continuous range of values is
covered by a 32-bit floating-point number, the mantissa must vary over -1 to 0 and +1 to +2.
This means that the bit weighted by $2^0$ will always be equal to 1. Therefore, it becomes
unnecessary to store it in memory and during calculations it becomes an implied bit.
Floating-point processors are usually more precise and have a larger dynamic range. While in
theory the choice between fixed- and floating-point arithmetic is independent of the choice of
precision, in practice floating-point processors usually provide higher precision. This arises
because more bits are provided to define the mantissa (24 bits + 1 implied bit) compared to
fixed-point DSPs that usually have 16 bits, although 20- and 24-bit fixed-point DSPs exist.
Digital Signal Processor Unit 2 – CPU Architecture
UNIT 2 – CPU ARCHITECTURE
UNIT OBJECTIVE
Upon completion of this unit, you will understand the basic difference between the architecture
of a digital signal processor and that of a general-purpose processor. You will be familiar with
the layout of the internal elements of a DSP CPU.
UNIT FUNDAMENTALS
In the 1950s, analog signal-processing circuit designers began to look to computers to simulate
their designs. They were able to simulate the circuits, but not in real time. It was not until the
mid-1970s that computers became powerful enough to do real-time signal processing of the
analog circuits that they had been simulating. Today's DSPs are the result of years of research
in a field that is still very active. Their specialized architecture allows them to implement
signal processing algorithms more effectively than general-purpose processors.
The basic processor architecture that is most often implemented in general-purpose processors is
known as the Von Neumann architecture. The Von Neumann architecture has a single memory
space that is used for both data and instructions (instructions belonging to the program). Digital
Signal Processors have historically used a slightly different internal structure known as the
Harvard architecture. The Harvard architecture, as opposed to the Von Neumann, has separate
memory spaces for data and program instructions. The Harvard architecture differentiates
between the types of information it stores in memory. The information is either a data word (an
operand for an instruction) or a program word (the instruction).
Data words are kept in data memory space and are read from and written to different locations
within the processor via the data bus (the DB). Program words are kept in program memory
space and are read from and written to different locations within the processor via the program
bus (the PB). The Von Neumann architecture only uses one bus. This bus accesses both data and
program instructions.
A typical DSP contains:
• Memory
• a Central Processing Unit (CPU)
• Peripherals
• a Bus structure
23
Memory consists of all of the addressable storage space inside of a processing unit:
• Program Read-Only Memory (ROM)
• Data/program Single-Access RAM (SARAM)
• Data/program Dual-Access RAM (DARAM)
The Central Processing Unit (CPU) is the part of a processor that contains the circuits
controlling the interpretation and execution of instructions.
The peripherals are elements that support the CPU: the timer, for example, is used to time the
execution of instructions, and the serial ports are used to communicate with devices external to
the processor. The bus structures of processors are differentiated by the way that the individual
processor buses are interconnected with the other elements of the processor (CPU, memory and
peripherals). It is essentially the bus structure that differentiates a Harvard architecture from a
Von Neumann architecture.
The CPU of the TMS320C50 (C50) contains:
• Program control elements
• Memory-mapped registers
• an Auxiliary Register Arithmetic Unit (ARAU)
• a Central Arithmetic Logic Unit (CALU)
• a Parallel Logic Unit (PLU)
The CPU elements are found in practically all DSP models, but they might go under different
names. For example, the CALU of the DSP32xx family, designed by Lucent Technologies, is named
a Data Arithmetic Unit (DAU). The Program Controller is the unit that controls processor
instruction execution. The PC (Program Counter register) and status and control registers are at
the heart of Program Controller unit operation. Memory-mapped registers are on-chip registers
mapped to (associated with) a data memory address.
There are 28 core CPU registers, 17 peripheral registers, 16 I/O port registers, and 35 reserved
registers in the C50. In total, 96 registers are mapped into data memory. Since memory-mapped
registers are addressed in data memory space, they can be written to, and read from, in the same
way as any other data memory location.
The Auxiliary Register Arithmetic Unit (ARAU) is used to calculate, compare, and keep track of
the addresses of information held within DSP memory. The C50 has eight Auxiliary Registers (ARs)
which are used by the ARAU to store important memory addresses.
The Central Arithmetic Logic Unit (CALU) is responsible for executing logic and all arithmetic
operations within a DSP. For example, on the TMS320C50 DSP, the CALU executes these operations
with a 16-bit x 16-bit multiplier, an accumulator, operand registers, binary shifters, and a
32-bit 2s-complement Arithmetic Logic Unit (ALU).
The Parallel Logic Unit (PLU) is a 16-bit logic unit that executes logic operations without
interrupting the CALU (the main CPU arithmetic and logic unit).
NEW TERMS AND WORDS

architecture - architecture is a term applied to the overall structure and the logical
interrelationships of the components of a processor (or of a computer, a network) and its
software. Processor architecture can be divided into five fundamental components: input/output,
storage, communication, control, and processing.
general-purpose processors - a processor designed to operate on a wide variety of computational
and logical problems. E.g., the Intel Pentium line of processors.
memory space - memory space is a property of the DSP. Memory space represents the range of
addresses allocated to either internal or external memory devices by the DSP bus structure.
On-chip memory (ROM and RAM) for a specific processor is said to reside in the processor memory
space, as do the processor's peripherals and memory-mapped registers.
bus - a bus is a transmission path for the signals sent between processor devices.
ROM - ROM, Read Only Memory, is a type of memory in which program code is stored during the
manufacturing process. ROM is a non-volatile memory because it retains its data after the
processor has shut down.
RAM - RAM, Random Access Memory, is usually used to store temporary program
information. RAM is a volatile memory because when power is removed the stored information
is lost.
registers - a group of bits used for temporarily holding data or for controlling or specifying the
status of a device.
status and control registers - the operation of the TMS320C50 DSP CPU is determined by the
information found inside of four 16-bit Status and Control Registers. The four status and control
registers are: the Circular Buffer Control Register (CBCR), Processor Mode STatus register
(PMST), STatus Register 0 (ST0), STatus Register 1 (ST1).
Arithmetic Logic Unit (ALU) - that part of a processor that performs arithmetic (addition,
subtraction) and logic (AND, OR, ...) operations.
MAC - An abbreviation (mnemonic) for Multiply and ACcumulate, an operation often executed
in DSPs.
sign-extension - the process of filling the high-order bits of a number with the sign bit. For
example, when loading a 16-bit number into a 32-bit field, the sign bit of the 16-bit number is
extended into bit positions 17 to 32.
overflow - in an arithmetic operation, a result whose absolute value is too large to be represented
within the range of the numeration system in use.
underflow - in an arithmetic operation, a result whose absolute value is too small to be
represented within the range of the numeration system in use.
OVerflow saturation Mode (OVM) - when enabled, any overflow value produced by the ALU
appears as the maximum possible value. For the TMS320C50, the value appears as 7FFF FFFFh.
When enabled, any underflow value produced by the ALU will appear as 8000 0000h, the
minimum possible value.
dma - an abbreviation for data memory address.
memory - a device in which information can be inserted and stored and from which it may be
extracted when wanted.
access - to access memory is the action of reading the value held within a certain memory
location or of storing a value to a certain memory location.
non-volatile - a characteristic of a memory device not subject to the loss of stored information
when power is removed.
volatile - a characteristic of a memory device subject to the loss of stored information when
power is removed.
allocation - to allocate memory is to associate a specific address of the data or program bus with
a memory storage space.
memory block - a portion or section of memory storage. A storage block is considered a single
element for holding a specific or fixed number of words.
Harvard Architecture - the internal organization of a microprocessor which is characterized by
separate memory spaces for program instructions and data. The program and data memory
spaces are each accessed by one of two parallel buses. The Harvard architecture allows each
memory space to be accessed simultaneously.
modified Harvard architecture - a modified Harvard architecture is a variation on the basic
structure of the Harvard architecture. The variations are used to increase the simultaneous
memory accesses of the DSP. A modified Harvard architecture is also known as an extended
Harvard architecture or as a Super Harvard ARChitecture (SHARC).
memory bandwidth - the memory bandwidth of a processor is proportional to the number of
memory cycles per instruction cycle. A high memory bandwidth occurs in processors with many
data and address buses.
program/data memory - program/data memory is a memory that can be accessed by either one
of the two parallel buses inside of a processor with a Harvard architecture.
instruction cache - an instruction cache saves an instruction and is usually used to repeat that
instruction in a program. An instruction cache is also known as a program cache.
memory configuration bits - these bits are status and control bits that select the memory
configuration that is used by the DSP. Within the TMS320C50 DSP the MP/MC#, RAM, OVLY
bits are found within the Processor Mode Status Register (PMST) and the CNF bit, another
memory configuration bit, is found in Status Register 1 (ST1).
kernel - the programs that form the core or the most essential parts of an operating system for a
computer. Nucleus is a near-synonym for kernel and tends to be used where the effects are
achieved by a mixture of normal programming and micro coding (such as is done with the
assembler language).
addressing modes - an addressing mode is one of a set of methods used for specifying the
operand(s) of a machine code instruction. An addressing mode describes to the processor the
method that it will use for storing and retrieving data from memory.
implied addressing - implied addressing means that the instruction operand addresses are
implied by the instruction. An example of a 'C50 instruction that uses implied addressing is
ADDB (addition of ACC and ACCB registers).
direct addressing - a type of addressing that encodes the operand address within the instruction
word or within a word following the instruction word. This addressing mode is also known as
register-direct addressing or paged memory-direct addressing.
immediate (short and long) addressing - immediate addressing encodes the operand in the
instruction word or in a separate word that follows the instruction word.
indirect addressing - in this type of addressing the operand being addressed resides in memory
and the address of the memory location containing the operand is stored within a register. It is
this register that is specified during indirect addressing.
circular addressing - an addressing mode in which the contents of a register is used to cycle
through a range of addresses, creating a circular memory buffer. Circular addressing is also
known as modulo addressing.
paged memory-direct addressing - a type of addressing that encodes the operand address within
the instruction word or within a word following the instruction word. This addressing mode is a
type of direct addressing.
Auxiliary Register Arithmetic Unit - an Auxiliary Register Arithmetic Unit (or ARAU) is the
name given by Texas Instruments, the developer of the TMS320C50 DSP, to the address generation
unit (AGU) of the DSP. This is common practice: many DSP developers give different names to the
units (like the AGU) inside of their DSPs.
data buffers - a data buffer is a section of memory that is used to store data. The data arrives
from an off-chip source (such as a CODEC) or from a previous computation. It is held in the
buffer until the processor is ready to process the data.
circular buffers - a section of memory used as a buffer and that appears to wrap around on itself.
Circular buffers are typically implemented in software on conventional processors and via
modulo (circular) addressing on DSPs.
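A software circular buffer of the kind mentioned above can be sketched in Python (illustration only; the class name CircularBuffer and its members are hypothetical). The index plays the role that an address register plays on a DSP, and the modulo operation produces the wrap-around.

```python
class CircularBuffer:
    """Fixed-size buffer whose write position wraps around on itself."""

    def __init__(self, size):
        self.data = [0] * size
        self.index = 0  # next location to be written

    def store(self, sample):
        self.data[self.index] = sample
        # Modulo step: after the last location, wrap back to the first.
        self.index = (self.index + 1) % len(self.data)
```

Storing the six samples 0 to 5 in a four-location buffer leaves the contents 4, 5, 2, 3: the two newest samples have overwritten the two oldest.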
FIFO - a First-In, First-Out queue in which the most recent arrival is placed at the end of the
waiting list and the item waiting the longest receives service first. A FIFO is used as a buffer to
connect two devices operating asynchronously at different speeds. Each device is connected to
one end of the FIFO.
ARAU - ARAU stands for Auxiliary Register Arithmetic Unit. It is the unit within a DSP that is
responsible for addressing.
SACL *+, 0, AR0 - the 'C50 SACL instruction stores the ACCL (ACCumulator Low) bits in
memory. At this point in the program the accumulator contains the most recent sample received
by the DSP from the CODEC. The indirect addressing operands tell the CPU to store the sample
in one of the dma labeled XN0 to XN15. The dma is pointed to by auxiliary register 0 (AR0).
ADD *+, 0, AR0 - the 'C50 ADD instruction is used to add the operand to the accumulator. In
this case, the ADD instruction is being repeated, and is indirectly addressing the 16 most-
recently received samples. This in effect adds the samples together (an operation required by the
averaging process).
SACL 10 h - the 'C50 SACL instruction, as previously stated, stores the ACCL (ACCumulator
Low) bits in memory. In this particular case, when SACL is executed, the accumulator holds the
average of the 16 most recent samples received by the DSP and stored in memory. SACL stores
this average to the dma labeled OUTPUT.
interrupt service routine - an interrupt service routine (ISR) is a subroutine that is run every time
that a specific event occurs. In this case, program ex2_3.asm executes the ISR when a sample
from the CODEC is received by the DSP. In RUN mode, execution of the ISR is done
automatically.
EQUIPMENT REQUIRED
C5x VDE program
Ex2_1, ex2_2, ex2_2b and ex2_3 assembler and program files
Oscilloscope
Function generator
Exercise 1 – The Central Arithmetic Logic Unit
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with the role that the CALU plays within
a DSP.
DISCUSSION
NOTE: Some 'C50 assembler CALU instructions are briefly covered in this exercise. It will be
left up to you, the student, to cover the rest of the related material. The material can be found in
the following file: C:\LV91027\DOC\TMS320C5x_UsersGuide.pdf.
The Central Arithmetic Logic Unit (CALU) is where the most important signal processing
manipulations take place. The CALU, also known as the data path, is the principal arithmetic and
logic processing path for a DSP. It lies along the data (operand) bus and is an integral part of
the execution of nearly every instruction.
A fixed-point CALU contains:
• Multiplier(s)
• Accumulator(s)
• Operand registers
• Shifters
• At least one Arithmetic Logic Unit (ALU)
Signal processing algorithms are almost entirely devoted to arithmetic and logic operations. The
CALU is designed to execute these types of operations extremely rapidly. A DSP is
differentiated from a general-purpose processor by:
1. Its memory architecture (a DSP usually has a Harvard architecture).
2. The rapid execution time of the CALU (or data path).
Both the Multiplier and the ALU are used simultaneously during a MAC instruction. The CALU
is then said to be using its entire computational bandwidth. For most DSPs, when the entire
computational bandwidth of the CALU is used repetitively, a result is produced every clock
cycle.
The operand registers play an important role within the CALU. The registers are used to
temporarily store operands before they are supplied to the ALU or Multiplier for arithmetic
operations. The CALU of the TMS320C50 ('C50) has 3 operand registers:
• Memory-mapped Temporary REGister 0 (TREG0) is an operand register used by the Multiplier. It
holds one of the multiplication operands.
• The Product REGister (PREG) is a 32-bit operand register which stores the Multiplier result.
The value held in the PREG can be sent to the ALU for an arithmetic operation, or it can be
passed on to the Data Bus (DB) for another stage of processing.
• ACCB (the ACCumulator Buffer) provides a temporary storage place for the value held in the
ACCumulator register (ACC).
The ACC register is designed to hold the last arithmetic result produced by the ALU. The ALU is
designed to implement a wide range of arithmetic and logical operations.
EXAMPLE   OPERAND 1   OPERAND 2   OPERATION   OUTPUT
1         1011 0100   0001 1101   ADD         1101 0001
2         1011 0100   0001 1010   SUBTRACT    1001 1010
3         0010 1001   1011 1101   AND         0010 1001
4         0010 1001   1011 1101   OR          1011 1101
5         0111 0101   –           NEGATE      1000 1011
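The five examples in the table can be reproduced with a short Python sketch (illustration only; the function alu and the constant MASK8 are hypothetical names). The table uses 8-bit words, so each result is masked to 8 bits, which gives the 2s-complement bit pattern for negative results.

```python
MASK8 = 0xFF  # the table examples use 8-bit words


def alu(operation, a, b=0):
    """Apply one ALU operation and return the 8-bit result pattern."""
    results = {
        "ADD": a + b,
        "SUBTRACT": a - b,
        "AND": a & b,
        "OR": a | b,
        "NEGATE": -a,  # 2s-complement negation
    }
    return results[operation] & MASK8
```

For instance, format(alu("NEGATE", 0b01110101), "08b") gives the pattern shown in example 5.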
Some operations that are commonly executed by the ALU include addition, subtraction,
negation, and the logical operations AND, OR, XOR, and NOT. The majority of ALU instructions
execute within a single clock cycle. Most of the ALU instructions that take more than one clock
cycle rely on other units for pre- or post-processing of data. For example, adding a data value to
the ACC and then executing a binary shift requires 2 clock cycles on the TMS320C50; the binary
shift is an example of the type of processing that takes place after addition.
The ALUs of fixed-point DSPs execute 2s-complement arithmetic. The ALU executes operations using
twice the precision of the native word width of the processor. For example, the ALU of the 'C50,
a 16-bit fixed-point DSP, inputs, outputs, and executes with a 32-bit word width.
Most DSPs have an ALU mode of operation called sign-extension mode. When it is enabled, all ALU
outputs are sign-extended. Sign extension prevents a negative number from being mistaken for a
positive one. When the number of bits used to represent a word (e.g., 16 bits) is less than the
number of bits required to represent the same word inside of the CALU (32 bits), sign extension
copies the sign bit into the added MSBs.
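Sign extension can be illustrated with a minimal Python sketch (illustration only; the function name sign_extend and its parameters are hypothetical). The sign bit of the narrow word is copied into every added high-order bit, so a negative value stays negative in the wider field.

```python
def sign_extend(value, from_bits=16, to_bits=32):
    """Extend a from_bits-wide 2s-complement word to to_bits bits."""
    mask_from = (1 << from_bits) - 1
    value &= mask_from
    # If the sign bit is set, fill all added high-order bits with 1s.
    if value & (1 << (from_bits - 1)):
        value |= ((1 << to_bits) - 1) & ~mask_from
    return value
```

For example, the 16-bit word FFB7h (a negative number) becomes FFFF FFB7h, while 0077h (a positive number) becomes 0000 0077h.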
The last arithmetic or logical operation executed by the ALU is stored in the ACCumulator
(ACC). The result held in the ACC can either be stored in the ACC Buffer register (ACCB),
passed on to the ALU, or to another stage of processing using the Data Bus (DB). In the case of
the 'C50 DSP, two operands need to be input into the ALU to execute any of its arithmetic or
logical operations. One of the operands is supplied by the ACCumulator register (ACC). One of
three other locations provides the other data operand for an ALU operation:
• Data path (e.g., to fetch an operand from memory)
• Multiplier Product REGister (PREG)
• ACCumulator Buffer (ACCB) register
Multiplication is an essential operation used in virtually all digital signal processing applications.
In many of the applications where multiplication is used, half or more of the instructions executed
by the processor are multiplication operations. Central to nearly all programmable digital signal
processors is the single-cycle Multiplier. The Multiplier refers to the circuit within the DSP that
executes the multiplication of binary numbers. Depending on operand size (8-bit or 16-bit for the
C50), nearly all Multiplier instructions can be executed within one clock cycle.
Multiplication in fixed-point DSPs is executed with 2s-complement arithmetic. A Multiplier
requires a minimum of two operands to execute a multiplication. These operands are treated as
2s-complement numbers. In the TMS320C50, register TREG0 is always used as one of the operand
sources for the Multiplier. In certain cases, such as when the square instructions (SQRA and
SQRS) are executed, the Multiplier uses no operand other than TREG0. When
another multiplication operand is required, it is fetched from one of two other locations:
• Data memory using the Data Bus (DB)
• Program memory using the Program Bus (PB)
As previously stated, the Multiplier result is stored in a Product REGister (PREG). The product
register is twice as wide as the word width of the multiplication operands (native data word
width of the DSP).
OPERAND 1    OPERAND 2    OPERATION     RESULT                 PREG (AFTER SIGN EXT.)
0111 0111    0011 0111    MULTIPLIER    0001 1001 1001 0001    0001 1001 1001 0001
(+ 119)      (+ 55)                     (+ 6545)               (+ 6545)
0110 0110    1011 0111    MULTIPLIER    0010 0010 1110 1010    1110 0010 1110 1010
(+ 102)      (- 73)                     (+ 8938) FALSE         (- 7446)
All Multiplier results are sign-extended before they are stored in the Product REGister (PREG).
This, combined with the fact that the PREG has twice the operand word width, means that, by
itself, the Multiplier does not introduce any errors into computations. To keep the level of
arithmetic precision constant, the number of bits used to represent the results of
multiplication, accumulation and other arithmetic operations needs to be increased. That is why,
in DSPs, the Multiplier Product Register and the ALU ACCumulator (ACC) have a width twice
that of the native data word width.
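The two rows of the multiplication table above can be checked with a short model. This is a minimal sketch in Python, not 'C50 code: the function names `to_signed` and `multiply` are hypothetical, and the 8-bit operand width is taken from the table.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a 2s-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: the number is negative
        return value - (1 << bits)
    return value


def multiply(op1, op2, bits=8):
    """Multiply two `bits`-wide 2s-complement operands.

    Returns the bit pattern held in a product register that is
    twice as wide as the operands (sign-extended result).
    """
    product = to_signed(op1, bits) * to_signed(op2, bits)
    return product & ((1 << (2 * bits)) - 1)


preg = multiply(0b01110111, 0b00110111)        # (+119) x (+55)
print(f"{preg:016b}", to_signed(preg, 16))     # 0001100110010001 6545

preg = multiply(0b01100110, 0b10110111)        # (+102) x (-73)
print(f"{preg:016b}", to_signed(preg, 16))     # 1110001011101010 -7446
```

Because the product register is twice as wide as the operands, every possible product fits and no rounding or truncation is needed at this stage.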
              OPERAND 1      OPERAND 2      OPERATION    ACCUMULATOR          OVM CORRECTION
OVERFLOW      7FFF FFFF h    7FFF FFFF h    ADDITION     FFFF FFFE h          7FFF FFFF h
UNDERFLOW     8000 0000 h    8000 0000 h    ADDITION     0000 0000 h FALSE    8000 0000 h

maximum positive value    7FFF FFFF h    2^31 - 1
maximum negative value    8000 0000 h    -2^31
an overflowed value       FFFF FFFF h    -1
an underflowed value      0000 0000 h    0
Most signal processing applications require the addition of series of data values. These
operations, when executed within a fixed-point DSP, can easily lead to an overflow or underflow.
In many processors, a mode of operation exists which is used to decrease the error that is caused
when overflow or underflow occurs. This mode within the TMS320C50 DSP is named
OVerflow saturation Mode (OVM). Barring the occurrence of overflow or underflow, the
precision level within the ALU and the Multiplier is kept at the same level as when the
operands entered the CALU. However, at some point it is usually necessary to reduce the
precision of the results, because the data bus is only half the bit-width of the CALU results.
Therefore, the programmer must select the product register or accumulator bits which will be
passed on to the next stage of processing (via the data bus).
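The effect of overflow saturation can be sketched with a model of a 32-bit accumulator. This is an illustrative Python model only; the names `wrap32`, `acc_add` and `hex32` are hypothetical, and the `ovm` flag simply stands in for the saturation mode described above.

```python
MAX_POS = 0x7FFFFFFF       # maximum positive value, 2**31 - 1
MAX_NEG = -0x80000000      # maximum negative value, -2**31


def wrap32(value):
    """Keep only 32 bits and reinterpret them as a 2s-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def acc_add(a, b, ovm):
    """Add two 32-bit values; saturate if ovm is set, otherwise wrap around."""
    total = a + b
    if ovm:
        return max(MAX_NEG, min(MAX_POS, total))
    return wrap32(total)


def hex32(value):
    """Show a signed value as the 32-bit pattern held in the accumulator."""
    return f"{value & 0xFFFFFFFF:08X}"


print(hex32(acc_add(MAX_POS, MAX_POS, ovm=False)))   # FFFFFFFE (false result)
print(hex32(acc_add(MAX_POS, MAX_POS, ovm=True)))    # 7FFFFFFF (saturated)
print(hex32(acc_add(MAX_NEG, MAX_NEG, ovm=False)))   # 00000000 (false result)
print(hex32(acc_add(MAX_NEG, MAX_NEG, ovm=True)))    # 80000000 (saturated)
```

The saturated values are still wrong, but they are the closest representable values to the true sums, so the error is far smaller than with wrap-around.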
The selection of which bits to pass on is done with shifters that are located at the exit of the
PREG and of the ACC. A shifter can shift a binary number to the right or to the left by a
specified number of bits. Shifting also scales the value: shifting a number n bits to the left
effectively multiplies it by a power of two (2^n).
Pre- and Postscalers are used to scale values before they are input to or output from the
Multiplier and ALU. Scaling is an important operation in fixed-point DSPs because overflow can
be avoided by prescaling CALU inputs.
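As a rough illustration of bit selection with a shifter, the sketch below shifts a 32-bit result left and keeps the upper 16 bits, which is the half that fits on a 16-bit data bus. The function names are hypothetical, and the Q15 fractional format used in the example (value = integer / 2^15) is an assumption made for illustration, not something defined in this section.

```python
def scale_left(value, n):
    """Shifting left by n bits multiplies the value by 2**n."""
    return value << n


def select_high_word(acc, shift=0):
    """Shift a 32-bit result left by `shift` bits and keep the upper 16 bits."""
    shifted = (acc << shift) & 0xFFFFFFFF
    return shifted >> 16


print(scale_left(3, 4))                      # 48, i.e. 3 x 2**4

# Assumed Q15 operands: 0x4000 represents 0.5.
half = 0x4000
product = half * half                        # 32-bit product, 0x10000000
result = select_high_word(product, shift=1)  # drop the duplicated sign bit
print(hex(result), result / 2**15)           # 0x2000 0.25
```

The 16 bits that are discarded carry the least significant part of the product, which is where the loss of precision mentioned above occurs.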
DSP FAMILY                   METHOD USED TO AVOID OVERFLOW
AT&T DSP16xx                 4 guard bits
Analog Devices ADSP-21xx     8 guard bits
TI TMS320C2x and C5x         No guard bits. Intermediate results can be scaled.
Ideally, the size of an accumulator register should be larger than the size of the multiplier
product register by several bits. The extra bits, named guard bits, allow the programmer to
accumulate a number of values without the risk of overflowing the accumulator and without the
need to scale intermediate results to avoid overflow. A single-bit field, present in the 'C50 and
known as the carry bit (or the C bit), is associated with the ACC register. The C bit indicates
whether an ALU operation generated a carry or a borrow. The DSP can be programmed to
conditionally test this bit. The C bit, similar to a guard bit, is useful for extended-precision
arithmetic.
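The use of a carry bit for extended-precision arithmetic can be sketched as follows. This is a generic Python model, not the 'C50 instruction sequence: `add32` and `add64` are hypothetical names, and a 64-bit value is assumed to be held as two 32-bit halves.

```python
MASK32 = 0xFFFFFFFF


def add32(a, b, carry_in=0):
    """Add two 32-bit words; return (32-bit result, carry bit)."""
    total = a + b + carry_in
    return total & MASK32, total >> 32


def add64(a_hi, a_lo, b_hi, b_lo):
    """Add two 64-bit values held as (high word, low word) pairs."""
    lo, carry = add32(a_lo, b_lo)        # low halves first, producing a carry
    hi, _ = add32(a_hi, b_hi, carry)     # the carry feeds the high halves
    return hi, lo


# 0x00000001_FFFFFFFF + 0x00000000_00000001 = 0x00000002_00000000
hi, lo = add64(0x00000001, 0xFFFFFFFF, 0x00000000, 0x00000001)
print(f"{hi:08X} {lo:08X}")              # 00000002 00000000
```

Without the carry, the overflow out of the low word would be lost and the high word would be one too small.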
Exercise 2 – Memory Space
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with the basic characteristics of the
modified Harvard architecture, as used by DSPs.
DISCUSSION
Memory is an important part of any microcomputer or microprocessor. In computers like the one
you are using, memory is used to store program information, such as the program code for the
C5x VDE, and it is also used to store data. The DSP contains on-chip memory and is
also able to access off-chip memory through its external address and data buses. On-chip
memory is usually of two types:
• ROM (Read Only Memory) holds program code that is written during the manufacturing
process. ROM is a non-volatile memory because it retains its data after power is removed.
• RAM (Random Access Memory) is used to store temporary program information. RAM is a
volatile memory because when power is removed the stored information is lost.
Both categories of memory (ROM and RAM) are found on-chip (inside the DSP). The allocation
of these types of memory within the memory space can be configured in various ways.
The TMS320C50 DSP uses two types of on-chip RAM:
• SARAM (Single-Access RAM): a SARAM block can be written to or read from once within one
instruction cycle.
• DARAM (Dual-Access RAM): a DARAM block can be read from and written to in the same
instruction cycle.
The Harvard architecture has two parallel buses. One bus (the PB) is dedicated to the
addressing and transport of program information and the other (the DB) is dedicated to the
addressing and transport of data. Two parallel buses allow program and data memory to be
accessed simultaneously.
Each of the parallel buses of a 16-bit fixed-point DSP can allocate 2^16 (65,536) addresses to
on-chip memory and peripherals.
Most DSPs today use a modified Harvard architecture to increase their memory bandwidth.
The specific modifications present in the TMS320C5x that have been added to the traditional
Harvard structure are:
• a program/data memory, a memory that can be addressed by both the DB and the PB;
• an instruction cache that supplements program/data memory.
HARVARD ARCHITECTURE MODIFICATION 1

In the case of the TMS320C50 DSP, certain SARAM blocks can be configured as program/data
memory. This implies that each memory element within the SARAM block has been allocated a
data bus address and a program bus address. Program memory is addressed by the program bus.
Operands can only be stored in or read from program memory using the program bus. Data
memory is addressed by the data bus. Operands can only be stored in or read from data memory
using the data bus. Program/data memory is addressed by both the program bus and the data bus.
Operands can be stored in or read from program/data memory by either using the program bus or
the data bus.
The 'C50 has four memory configuration bits that select how data and program bus addresses
are allocated among the different on-chip memories, I/O ports, internal memory-mapped
registers and external memory-mapped peripherals. The memory configuration bits are:
• CNF: Selects whether on-chip DARAM block B0 is addressed by the PB or by the DB.
• RAM: Enables or disables addressing of the SARAM by the PB.
• OVLY: Enables or disables addressing of the SARAM by the DB.
• MP/MC#: Enables or disables addressing of the on-chip ROM by the PB.
The memory configuration bits should be initialized (set or cleared) at the beginning of a DSP
program and then they should no longer be changed. By altering the value of one of the bits,
memory elements either become mapped to other addresses (sometimes addresses on a different
bus) or become no longer address mapped at all.
HARVARD ARCHITECTURE MODIFICATION 2

In the case of the 'C50, the register named the Program Counter (PC) acts as an instruction cache.
It can store one instruction word (16 bits in width). Once loaded into the PC, the instruction can
be repeated the number of times specified by the RePeaT Counter register (RPTC).
During a repeat loop, the program bus does not have to be used to read the next instruction in the
program. The DSP simply fetches the next instruction from the instruction cache. When the
instruction from the cache is being executed, a Program Bus (PB) access is freed. The PB is no
longer required to fetch the next program instruction. The freed memory access can be used to
read another operand from program/data memory. Specialized instructions like the MAC
(Multiply and ACcumulate) when repeated, use the freed Program Bus access to fetch a total of
two operands during a single clock cycle.
When programming a DSP, the only memory space initializations that should be of concern to the
programmer are:
• The CNF, MP/MC#, RAM and OVLY bit initializations. These select the proper memory
configuration.
• The use of the DSK directives describing the memory locations where program and data are
stored: .entry, .ps, .text, .word, .byte, .data or .ds, .set.
Exercise 3 – Addressing
EXERCISE OBJECTIVE
Upon completion of this exercise, you will understand the function of the address generation
unit within a DSP and the specialized addressing modes that it offers.
EXERCISE DISCUSSION
A processor uses an address to identify specific memory storage spaces (such as a DARAM
memory element or a peripheral register). An address in fact becomes the name of a certain
location. The address is used any time that the processor is required to write an operand to or
read an operand from the location. Addressing is the means by which operand locations are
specified to the processor when a read or write is executed. Many addressing modes exist.
Depending on the addressing mode used with an instruction, an operand could be fetched
directly from an internal memory address or from a register.
The most common types of addressing found in DSPs are:
• implied addressing
• direct addressing
• immediate (short and long) addressing
• indirect addressing
• circular addressing
Different types of DSPs offer a variety of different addressing modes. The type of addressing
used with an instruction influences program flexibility and performance. Certain types of
addressing are meant to be used only in specific situations and others are restricted to a small
subset of the processor's instructions.
Implied addressing is when the operand addresses are implied by the instruction. An example of
a 'C50 instruction that uses implied addressing is ADDB. This instruction adds the ACC and
ACCB registers together. ACC and ACCB are thus the implied operands for the instruction.
Direct addressing encodes the operand address within the instruction word or within a word
following the instruction word.
An Example of Direct Addressing with One Program Word
Machine code:  1001110111101001
               (instruction opcode followed by the partially encoded operand address,
               7 bits wide)
The direct addressing used within the TMS320C50 is known as paged memory direct addressing.
When the instruction word is executed, the encoded 7-bit address is concatenated with the upper
9 bits of the address, which are held in a status and control register. In other forms of direct
addressing, the operand address is encoded within a word following the instruction word.
An Example of Direct Addressing with Two Program Words
Machine code:  1110110010101001   (instruction opcode)
               0001110101101110   (encoded operand address)
Memory-direct and register-direct addressing are other forms of direct addressing. Both use a
second word, following the instruction word, that encodes the operand address.
Processors using paged memory-direct addressing have data memory divided up into memory
pages. Each memory page corresponds to a section of memory. A special register (known in the
'C50 as the Data Page pointer, DP) stores the number of the current memory page. In the case of
the 'C50, the DP points to one of 512 possible data pages, each page containing 128 words.
Direct addressing within the TMS320C50 requires that the 7 lower bits of the data memory
address (dma) be encoded within the instruction word. When the word is executed, the 9 bits
from the Data memory Page pointer (DP) are concatenated with the 7 bits encoded within the
instruction word. The operation forms the full 16-bit data memory address of the operand. For
example, assume that the 9-bit DP holds the value 126 (7E h) and that the 7 lower bits of the
data memory address encoded within an instruction word are equal to 43 h. The full operand
address is then (7E h x 128) + 43 h = 3F43 h.
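The concatenation of the data page pointer and the 7-bit offset can be written out as a small calculation. This is a sketch in Python rather than DSP code; the function name `direct_address` is hypothetical, and the input values are the ones assumed in the text (DP = 7E h, offset = 43 h).

```python
def direct_address(dp, dma):
    """Form a 16-bit data memory address from a 9-bit page and a 7-bit offset."""
    if not 0 <= dp < 512:
        raise ValueError("DP must fit in 9 bits (0 to 511)")
    if not 0 <= dma < 128:
        raise ValueError("offset must fit in 7 bits (0 to 127)")
    return (dp << 7) | dma       # page number in the upper 9 bits


address = direct_address(0x7E, 0x43)
print(f"{address:04X} h")        # 3F43 h
print(f"{address:016b}")         # 0011111101000011
```

Changing only the DP moves the same instruction to a different 128-word page, which is why the DP must hold the correct page before a direct-addressed instruction is executed.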
Example of Short Immediate Addressing in the TMS320C50 DSP
Instruction:   ADD #05Ah
Machine code:  10111000 01011010
               ADD opcode: 10111000
               Operand:    01011010 (5A h)
Known as short immediate addressing, the type of addressing shown encodes the operand into
the instruction word.
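The packing of an opcode and a short immediate operand into one word can be sketched as below. The 8-bit opcode pattern is copied from the example above; the helper names and the even 8/8 split of the word are assumptions made for illustration, since field widths differ from one instruction to another.

```python
def encode_short_immediate(opcode, operand):
    """Pack an 8-bit opcode and an 8-bit operand into one 16-bit word."""
    if not 0 <= operand <= 0xFF:
        raise ValueError("a short immediate operand must fit in 8 bits")
    return (opcode << 8) | operand


def decode_operand(word):
    """Recover the operand from the low 8 bits of the instruction word."""
    return word & 0xFF


ADD_SHORT = 0b10111000           # opcode bits taken from the figure

word = encode_short_immediate(ADD_SHORT, 0x5A)
print(f"{word:016b}")            # 1011100001011010
print(f"{decode_operand(word):02X} h")   # 5A h
```

An operand such as B948 h does not fit in the 8-bit field, which is why a second program word is needed for long immediate addressing.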
Example of Long Immediate Addressing in the TMS320C50 DSP
Instruction:   ADD #0B948h
Machine code:  1011111110010000   (ADD opcode)
               1011100101001000   (operand, B948 h)
Known as long immediate addressing, the type of addressing shown encodes the operand into a
second word that follows the instruction word.
DSPs use indirect and circular addressing to manage sets of operand addresses. These are
required when performing repetitive calculations on data series, which are often stored
sequentially in memory. DSPs include an Address Generation Unit (AGU). The AGU is dedicated
to the calculation of addresses for the different types of addressing modes.
The AGU has its own separate arithmetic unit, so AGU arithmetic is independent of the CALU. All
address calculations take place in parallel with instruction execution. The incorporation of an
Address Generation Unit (AGU) within a DSP allows arithmetic processing to proceed at
maximum speed while multiple instruction operands are specified. The figure shows the
Auxiliary Register Arithmetic Unit (ARAU) of the TMS320C50 ('C50). The 'C50 AGU has
eight memory-mapped auxiliary registers, identified AR0 through AR7. The registers can be
used for storing addresses or temporary data. Indirect and circular addressing require the use of
some of the Auxiliary Registers (AR0 to AR7). Within the ST0 register of the TMS320C50 is a
3-bit field identified as ARP, the Auxiliary Register Pointer, which holds a value between 0
and 7. The field specifies the current auxiliary register (AR0 to AR7) being
used for indirect-register addressing. Once the appropriate registers have been configured, the
AGU provides the operand address required by the processor for the execution of an
instruction. As stated previously, the address generation unit operations are executed in parallel
with the CALU arithmetic instructions.
Any location in data memory can be read from or written to using an address contained in an
Auxiliary Register (AR0 to AR7). To select the specific AR used to address data memory, the
Auxiliary Register Pointer (ARP) must be loaded with the number of the AR (0 to 7) to be used.
The ARP points to the current auxiliary register used to address memory. When an assembler
instruction that supports indirect addressing is executed, such as the TMS320C50 addition
instruction written as ADD *, the processor fetches the data memory address from the current
auxiliary register (the one pointed to by the ARP). In this example, ARP is equal to 2 and so the
current auxiliary register is AR2: the address of the operand is fetched from AR2.
Many DSP applications manage data buffers. Circular addressing, also known as modulo
addressing, is used to manage circular buffers.
In real-time applications such as the ones executed by DSPs, the programmer must determine the
size of the data buffer and then must set aside a portion of memory for the buffer. The data
buffers implemented on DSPs generally use a first-in, first-out (FIFO) protocol. This means that
the first values that are written to the buffer will be the first values read out of the buffer. For the
programmer to manage data into and out of the buffer, two address pointers must be maintained
(in the case of the 'C50, two auxiliary registers are used as the pointers). One of the pointers, the
read pointer, indicates the current value to be read from the buffer. The second pointer, the write
pointer, indicates the current location to write a new value to in the buffer.
After each read or write operation in a FIFO buffer with linear addressing, the corresponding
pointer moves down (increments to the next location in the buffer). Once the pointers have
advanced to the end of the buffer, the program must reset them to point back to the beginning
of the buffer. In a FIFO buffer with circular addressing, after the read or the write pointer
reaches the end of the buffer it automatically advances to the start of the buffer. The automated
end-of-buffer verification and advance-to-start-of-buffer operation (usually automated by the
AGU) make the buffer appear circular to the programmer. Many DSP processors provide a form of
circular addressing. However, the ease of use and the mechanisms used to control it vary from
DSP to DSP. The approach taken to implement circular addressing in the TMS320C50 is to use
start and end address registers. The registers, respectively named CBSR1 and CBER1, hold the
start and end addresses for the circular buffer.
The AGU of the 'C50 takes charge of determining whether one of the write or read pointers is at
the end of the buffer or at the beginning. The AGU thus automates the end-of-buffer verification
and the advance-to-start-of-buffer operation. Circular addressing is indirect addressing with
added circular buffer management.
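The pointer behaviour described above can be modelled with a start address and an end address. The following is a minimal Python sketch, not the 'C50 mechanism itself: the class name `CircularPointer`, the buffer addresses 0300 h to 0303 h and the dictionary standing in for data memory are all hypothetical.

```python
class CircularPointer:
    """Address pointer that wraps from the end address back to the start."""

    def __init__(self, start, end):
        self.start = start       # plays the role of a buffer start register
        self.end = end           # plays the role of a buffer end register
        self.addr = start

    def advance(self):
        """Return the current address, then move to the next location."""
        current = self.addr
        if self.addr == self.end:
            self.addr = self.start    # automatic wrap-around
        else:
            self.addr += 1
        return current


memory = {}                                   # stand-in for data memory
write_ptr = CircularPointer(0x0300, 0x0303)   # 4-word buffer

for sample in [10, 20, 30, 40, 50]:
    memory[write_ptr.advance()] = sample

# The fifth sample wrapped around and replaced the oldest one at 0300 h.
print({f"{a:04X}": v for a, v in sorted(memory.items())})
```

With linear addressing, the comparison against the end address and the reset to the start address would have to be written as extra instructions in the program loop.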
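TMS320C50 Instructions
...
MAR *, AR3        ; ARP = 3: AR3 becomes the current auxiliary register
LAR AR3, #0984h   ; load AR3 with the data memory address 0984h
...
ADD *+, 0, AR0    ; add the operand addressed by AR3 to the ACC (shift of 0),
                  ; then increment AR3 to 0985h and update ARP to AR0
...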
Data buffers are often used and are always implemented with indirect addressing (circular or
linear). Processors require two additional elements to be specified within the indirect address
field of an instruction when indirect addressing is used:
• The content of the current AR can be incremented, decremented or it can stay unchanged. The
change to be implemented must be specified. E.g., Increment by 1: AR3 = 0984h,
AR3 + 1 = 0985h.
• Whether the Auxiliary Register Pointer (ARP) should be updated to another AR or should stay
the same must be specified. E.g., Update ARP from AR3 to AR0.
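The three instructions in the listing above can be traced with a small behavioural model. This is a Python sketch under stated assumptions: the class `Agu` and its method names are hypothetical, the memory content (25 at address 0984 h) and the starting accumulator value (100) are invented for the example, and only the address handling is modelled.

```python
class Agu:
    """Toy model of eight auxiliary registers and the ARP field."""

    def __init__(self):
        self.ar = [0] * 8        # AR0 to AR7
        self.arp = 0             # number of the current auxiliary register

    def mar(self, next_arp):
        """Model of 'MAR *, ARn': only the ARP is updated."""
        self.arp = next_arp

    def lar(self, n, value):
        """Model of 'LAR ARn, #value': load an auxiliary register."""
        self.ar[n] = value & 0xFFFF

    def indirect(self, step=0, next_arp=None):
        """Return the operand address, then modify the AR and the ARP."""
        address = self.ar[self.arp]
        self.ar[self.arp] = (address + step) & 0xFFFF
        if next_arp is not None:
            self.arp = next_arp
        return address


memory = {0x0984: 25}            # assumed data memory content
acc = 100                        # assumed accumulator content

agu = Agu()
agu.mar(3)                                            # MAR *, AR3
agu.lar(3, 0x0984)                                    # LAR AR3, #0984h
acc += memory[agu.indirect(step=+1, next_arp=0)]      # ADD *+, 0, AR0

print(acc, f"{agu.ar[3]:04X}", agu.arp)               # 125 0985 0
```

The address update and the ARP update cost no extra instructions because, as described earlier, they are carried out by the AGU in parallel with the addition.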
Most addressing modes covered in this section involve attaching a second word to the instruction
word. These addressing modes thus require two program words to be stored in memory,
increasing program size and slowing execution time.
To remedy the effects of so-called long addressing modes (2 program words in length), many
processors offer short versions of some of their addressing modes, or simply put, short
addressing modes. Short addressing modes use only one program word to specify both the
instruction and the address. However, by so doing, the range of addresses that can be specified is
reduced.
UNIT 3 – PROGRAM EXECUTION
UNIT OBJECTIVE
Upon completion of this unit, you will be familiar with the fundamentals of DSP program
execution.
UNIT FUNDAMENTALS
Program control refers to the rules (mechanisms) used in a processor for determining the next
instruction to be executed. The instruction set available to a DSP is executed by the Program
Controller unit. The instruction set controls such things as how data is sequenced through the
CALU and how values are read from and written to memory.
An instruction sent through the Program Controller passes through an organization of
computational hardware. The different execution stages of one instruction proceed in parallel
with the different execution stages of other instructions. Executing instructions in such a
fashion is known as pipelining: a pipeline executes instructions in an assembly-line fashion.
A must for developing efficient application code is detailed knowledge of the DSP
architecture being programmed. This is especially true when it comes to knowing which “dirty
tricks” can be used to take advantage of an architecture's strengths. A DSP that has an
orthogonal instruction set is simpler to program and optimize than a DSP whose commands work
only on specific ALU registers.
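The assembly-line idea behind pipelining can be made concrete with a small schedule generator. This is an illustrative Python sketch: the four stage names (fetch, decode, read, execute) are an assumption used for the example, and the function `pipeline_schedule` and the instruction labels I1 to I3 are hypothetical.

```python
STAGES = ["fetch", "decode", "read", "execute"]   # assumed stage names


def pipeline_schedule(instructions, stages=STAGES):
    """Return, for each cycle, which instruction occupies which stage."""
    cycles = len(instructions) + len(stages) - 1
    schedule = []
    for cycle in range(cycles):
        row = {}
        for position, stage in enumerate(stages):
            index = cycle - position      # instruction currently in this stage
            if 0 <= index < len(instructions):
                row[stage] = instructions[index]
        schedule.append(row)
    return schedule


program = ["I1", "I2", "I3"]
for cycle, row in enumerate(pipeline_schedule(program), start=1):
    cells = "  ".join(f"{stage}:{row.get(stage, '--')}" for stage in STAGES)
    print(f"cycle {cycle}  {cells}")

print(len(pipeline_schedule(program)), "cycles pipelined,",
      len(program) * len(STAGES), "cycles if each instruction ran alone")
```

Once the pipeline is full, one instruction completes every cycle even though each instruction still takes four cycles from fetch to execute.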
DSP architecture constrains the type of operations and instructions that may be performed by the
processor. A DSP's instruction set has a profound influence on the processor's suitability for
different tasks. Not all instructions found in one DSP have analogs in other DSPs using different
architectures. The instructions used with a given architecture must be natural and efficient.
Processor architectures also have a profound influence on the suitability of a DSP for certain
tasks. As a result, different types of DSP architectures are used for different types of tasks.
There are nevertheless many basic similarities between the DSP architectures available. These
similarities have been covered or will be covered in this course:
• CALU configured for digital signal processing
• specialized instruction set
• multiple memory banks and buses
• specialized addressing modes
• specialized execution control
• peripherals specialized for digital signal processing
These similarities are present because of the effort to provide high throughput for key signal
processing applications. Traditionally, high throughput was provided by specialized hardware
(designed to accelerate multiplications) and a dual-bus (Harvard) memory architecture.
Improvements since then have been incremental enhancements.
Examples of incremental enhancements that have been made to the basic Harvard architecture
are the addition of an instruction cache and a memory block accessible by both the program and
data bus (program/data memory). Both of these improvements were made to the TMS320C50
DSP.
The need, in recent years, for faster and better DSPs has led to the appearance of new
architectures. Processor performance can be improved by using faster clock speeds; however, this
approach has limits. To increase performance beyond these limits, newer types of DSP
architectures are being made. These designs focus primarily on increasing the amount of useful
work that gets done every clock cycle. Improving the instruction execution rate is done by
making modifications to the way the Program Controller unit and the pipeline operate. VLIW
and Superscalar architectures are some of the non-traditional architectures that have begun
to be used for programmable DSPs. They provide an 80% to 100% average increase in
performance over traditional DSP architectures.
Traditional DSP processors have used complex, compound instructions that allow a programmer
to encode multiple operations in a single instruction. A MAC instruction (Multiply and
ACcumulate) is an example of a complex, compound instruction. Processors using complex,
compound instructions are limited to issuing and executing one instruction per instruction cycle.
By issuing a single instruction and using a complex-instruction approach, DSP processors have
achieved very strong signal processing performance without requiring a large amount of program
memory.
Newer designs such as the VLIW and Superscalar architectures use parallelism to increase the
number of instructions executed per instruction cycle. These types of architectures, when
implemented within DSPs, are made to execute multiple RISC-like instructions during each
clock cycle. As opposed to a complex instruction, a RISC instruction performs a basic operation
such as a data move. A RISC instruction set is small: certain instructions (those that provide
similar operations) are eliminated. The process of elimination is based upon careful quantitative
analyses, leading in the end to higher performance.
VLIW and Superscalar architectures are similar. In a VLIW architecture, instruction parallelism
is established by the compiler: instruction scheduling is done at compile time.
Superscalar architectures schedule operation parallelism at run time, making it invisible to users.
By so doing, the Superscalar design adds an amount of hardware overhead to the CPU. This is
not true of VLIW, where the burden is placed on the compiler.
NEW TERMS AND WORDS

None
EQUIPMENT REQUIRED
C5x VDE program
Ex3_1 and ex3_2 assembler and program files
Oscilloscope
Function generator
Exercise 1 – The Program Controller
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with the function of the hardware and
software features that digital signal processors have evolved to handle program control.
EXERCISE DISCUSSION
All digital signal processors, and for that matter general-purpose processors, have a specialized
unit dedicated to executing the current instruction and determining the next instruction to
execute. Within the TMS320C50 DSP ('C50), the unit is known as the Program Controller. DSP
Program Controllers have evolved efficient hardware features to rapidly execute instructions.
Because these features add little or no extra execution time, Program Controller hardware is
said to be low-overhead (or zero-overhead). The Program Controller decodes instructions,
manages the pipeline, stores the central processing unit (CPU) status, and decodes conditional
operations.
The following software mechanisms are managed by a DSP Program Controller.
• branch
• subroutine
• reset
• interrupt
• repeat
• conditional processing
The software mechanisms listed above, though not unique to digital signal processors, are
used for program control. By using these software mechanisms, a DSP programmer is in fact
using specialized hardware features that belong to the Program Controller. The specialized
hardware in question can be categorized as follows:
• Program Counter register
• stack support
• repeat counters
• program counter-related hardware
• status registers
The Program Counter register (often abbreviated PC register) holds the program memory address
of the next instruction to be fetched and executed by the Program Controller. The content of the
program counter is updated every instruction cycle. Depending on the previous instruction
executed, the surrounding hardware (the program counter-related hardware) usually increments
the program counter by one. In certain cases, the program counter is loaded with an entirely
different program memory address.
These cases occur when the previous instruction executed was a program control instruction,
such as a call or a return from subroutine, or when an interrupt service routine (ISR) is
entered. Digital signal processors use stacks to save and restore address and status information
during subroutines and interrupt service routines (ISR). A stack can consist of any memory
device. Most often DSPs provide at least one of three kinds of stack support:
• shadow registers
• hardware stack
• software stack
Every time that an interrupt is executed, an interrupt context save is initiated. The contents of
key DSP registers are saved to their respective backup registers (shadow registers). The values
in the copied registers are still available to the interrupt service routine (ISR) code, but after
the context save they are protected while held in the shadow registers. The shadow registers are
copied back to the CPU registers when the return from the interrupt service routine is executed.
Context save and restore is automatic and reduces DSP ISR overhead; the programmer avoids
including the save and restore operations as instructions in the ISR code.
The hardware stack is used during interrupts and subroutines to save and restore the content of
the Program Counter register. The programmer usually does not have control over the hardware
stack. The hardware stack is used only invisibly, during subroutine calls, interrupt service
routines and repeat instructions. When a subroutine is called, an interrupt occurs or a repeat is
executed, the current content of the Program Counter register (the return address) is
automatically saved to the stack (pushed on to the stack). When a return operation occurs, the
return address is retrieved from the stack (popped from the stack) and loaded into the Program
Counter.
The key advantage of a software stack over a hardware stack is that its depth can be configured
by the programmer. This can be done by simply reserving an appropriately sized section of
memory. Hardware stacks, in contrast, are usually fairly shallow and the programmer must
carefully guard against stack overflow (by avoiding nesting of too many interrupts or
subroutines).
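The push and pop of return addresses, and the risk of overflowing a shallow stack, can be sketched as follows. This is a Python model with hypothetical names (`HardwareStack`, `ProgramController`); the depth of 8 is an assumption chosen for the example, and raising an exception merely stands in for whatever a real device does when its stack is full.

```python
class HardwareStack:
    """Fixed-depth stack of return addresses."""

    def __init__(self, depth=8):             # depth is an assumed value
        self.depth = depth
        self.slots = []

    def push(self, address):
        if len(self.slots) == self.depth:
            raise OverflowError("hardware stack overflow")
        self.slots.append(address)

    def pop(self):
        return self.slots.pop()


class ProgramController:
    """Tracks the program counter across calls and returns."""

    def __init__(self, stack, pc=0):
        self.stack = stack
        self.pc = pc             # address of the next instruction to fetch

    def call(self, target):
        self.stack.push(self.pc)     # save the return address
        self.pc = target             # load the PC with the subroutine address

    def ret(self):
        self.pc = self.stack.pop()   # restore the return address


controller = ProgramController(HardwareStack(), pc=0x0102)
controller.call(0x0200)
print(f"{controller.pc:04X}")    # 0200, executing the subroutine
controller.ret()
print(f"{controller.pc:04X}")    # 0102, back in the calling code
```

A software stack would replace the fixed `depth` with a reserved region of data memory whose size the programmer chooses.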
DSP algorithms frequently involve the repetitive execution of a small number of instructions.
Such operations are required in FIR and IIR filters, FFT and matrix multiplication algorithms
(these are different types of signal processing operations).
To eliminate looping overhead in DSPs, Program Controllers have been designed with circuits
capable of repetitively executing a small number of instructions. The operation they perform is
often called hardware looping. The following registers in the 'C50 are used by hardware loops:
• The Program Counter acts as the instruction cache that stores the instruction to be repeated
during single-instruction hardware looping.
• The RePeaT Counter register holds the number of times the instruction held in the instruction
cache must be repeated during single-instruction hardware looping.
• The multiple-instruction hardware looping registers used in the 'C50 (BRCR, PASR, and
PAER) are used for control and status of the hardware loop.
Hardware loops, as opposed to software loops, lose no time incrementing or decrementing an
index.
Example: Difference in the overhead required for a software and a hardware loop.
Software loop                    Hardware loop
      B = 16                           RPT #16
LOOP: MAC H0, X0                       MAC H0, X0
      B = B - 1
      Branch to LOOP if B > 0
The above examples implement an FIR filter. The one on the left uses a software loop and the
one on the right uses a hardware loop (done using the RPT instruction). The hardware loop
executes the RPT instruction only once and then automatically repeats the multiply and
accumulate instruction 16 times. Hardware looping overhead is therefore lower than software
looping overhead.
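The saving can be made concrete by counting the instructions executed in each version of the example. The Python sketch below is an illustration only; the function names are hypothetical, and it counts instructions rather than cycles. Because a branch typically takes several cycles, the real gap in execution time would be larger than the instruction counts suggest.

```python
def software_loop_instructions(passes):
    """Instructions executed by the software loop of the example."""
    setup = 1        # B = 16
    per_pass = 3     # MAC, B = B - 1, branch to LOOP
    return setup + per_pass * passes


def hardware_loop_instructions(passes):
    """Instructions executed by the hardware loop of the example."""
    setup = 1        # RPT, executed only once
    per_pass = 1     # MAC only; no index update, no branch
    return setup + per_pass * passes


if __name__ == "__main__":
    passes = 16
    print(software_loop_instructions(passes))   # 49
    print(hardware_loop_instructions(passes))   # 17
```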
The hardware loop in this example executes a single instruction several times; it is a
single-instruction hardware loop. During a single-instruction hardware loop, after the repeat
(RPT) instruction is executed, the microcode instruction to be repeated is loaded into the
instruction cache (PC register), and a counter (RPTC) is loaded with the number of times the
instruction is to be repeated. During the loop, the Program Counter, acting as the instruction
cache, supplies the instruction to be executed to the Program Controller. Many instructions that
take two or more cycles to execute take only one when executed from within a hardware loop that
uses an instruction cache. All DSPs use instruction caches (1 word deep) to implement
single-instruction hardware loops; however, not all DSPs use multi-word instruction caches to
implement multi-instruction hardware loops. Multi-instruction hardware loops that do not use an
instruction cache must re-read the instructions being repeated each time the processor
(Program Controller) proceeds through the loop. Because the repeated instructions must be
re-read, the program bus cannot be freed. This means that instructions that execute more
rapidly in single-instruction hardware loops do not do so in multi-instruction hardware loops
without instruction caches.
Based on your current knowledge of hardware loops, which of the following is true?
a. Multi-instruction loops are limited by the number of instructions that can be repeated.
b. Single-instruction and multi-instruction loops are often limited by the minimum number of
times that a loop can be repeated.
c. Single-instruction and multi-instruction loops are often limited by the maximum number of
times that a loop can be repeated.
d. All of the above.
Hardware loops have certain limitations associated with them that are not necessarily associated
with software loops.
The number of instructions repeated in multi-instruction loops might be limited by a maximum
value.
The minimum and the maximum number of times a loop can be repeated for both single- and
multi-instruction loops might also be limited.
The drawbacks of traditional software approaches to repeated instruction execution, however, are
that:
• Branch instructions typically require several instruction cycles to execute.
• The processor must usually use a register to maintain the loop index, which is the count of the
number of times the instruction(s) to be repeated must still be executed.
• The processor data path must then be used to increment or decrement the index and test whether
the loop condition has been met.
To avoid these problems, DSP processors have evolved special hardware control constructs that
repeat either a single instruction or a group of instructions a number of times.
As stated, the primary role of the Program Controller is to determine the next instruction to be
executed. Interrupts are used to signal both external events (a push-button is pressed) and
internal events (a word is received through the serial port) to a processor. All DSPs use
interrupts, and most use interrupts as their primary means of communicating with peripherals. An
interrupt is an event that causes the processor to stop executing its current program and to
branch to a special block of code called an interrupt service routine (ISR). The ISR code, once
called, typically deals with the source of the data that signaled the interrupt. For example, if a
word is received through the serial port, an interrupt is signaled, and the ISR executes the code
necessary to process the word. Once an interrupt is signaled to the Program Controller, a branch
instruction is executed and the Program Counter is loaded with the program memory address (pma)
of a special block of code (often called an interrupt vector). Interrupts can be disabled. In
fact, this occurs during DSP initialization, ISRs, and single-instruction hardware loops. It is
the Program Controller that disables interrupts for the duration of single-instruction hardware
loop execution. A direct consequence of this inability to service an interrupt is that a
programmer must carefully consider the maximum interrupt lockout time that can be accepted.
Most processors, including DSPs, sample the status of the interrupt lines every instruction cycle.
The processor uses status registers to signal interrupts (once sampled) and other information to
the Program Controller. From the previous discussion and procedure sections in this manual, you
have become acquainted with a few of the status and control registers of the TMS320C50 DSP.
Although not all DSPs have the same number of status and control registers, all DSPs do contain
these types of registers. Many of the registers used by the 'C50 Program Controller and CPU have
equivalent counterparts in other DSPs.
Exercise 2 – The Pipeline
EXERCISE OBJECTIVE
Upon completion of this exercise, you will know the advantages of having a deep pipeline as
used in DSPs. You will be familiar with the layout of a pipeline reservation table and be able to
use it to solve pipeline conflicts.
DISCUSSION
In everyday life, whenever a large task must be done rapidly, it is divided into smaller tasks that
are then distributed among workers. This process, whose beginnings took place in the early 20th
century, is now known as the assembly line. By dividing the task into smaller operations and
working on each operation separately yet at the same time, the overall task is finished much
faster. Digital Signal Processors (nearly all on the market) use the same process to execute
program instructions. The process, when used within a processor, is given the name of pipelining.
By dividing the sequence of operations into smaller pieces, and by executing the pieces in a
pipeline, processor performance is increased. The number of instructions executed per unit time is
increased without changing the total time required to execute an instruction.
Pipelining, though meant to improve performance, can complicate programming. For example:
• On some processors, pipelining causes certain instruction sequences to execute more slowly.
• On other processors, certain instruction sequences must be avoided for correct operation of the
program.
Pipelining represents a trade-off between efficiency and ease of use. To illustrate how
pipelining increases performance, consider the Texas Instruments TMS320C50 ('C50) DSP. It
uses four separate execution units to process a single instruction word. The units accomplish the
following actions:
• Fetch
• Decode
• Read
• Execute
Because four actions are implemented per instruction word, this pipeline is said to be a 4-level
pipeline. A DSP that implements five actions per instruction word possesses a 5-level pipeline.
Most DSPs are pipelined, although pipeline depth varies between types of DSPs. Compared with
general-purpose processors, DSPs have on average deeper pipelines. A processor with an ideal
N-level pipeline (one in which no problems occur during instruction execution) can execute
approximately N times as many instructions per unit time as the same processor without a
pipeline. However, processor performance begins to drop when the pipeline becomes too deep,
because the time required to control the pipeline execution stages becomes too large.
Without a pipeline, parallel execution of the different hardware execution units is not possible.
A processor using the execution units sequentially does not begin processing the next instruction
until the current instruction has been executed. If a clock cycle occurs every 50 ns (as it does
with the 'C50 found on the DSP circuit board), then in this case an instruction takes 200 ns to
complete.
With a pipeline, the execution stages work in parallel: while one stage is fetching the next
instruction, another is decoding the previous instruction, and so on. Because these operations
(Fetch, Decode, Read, and Execute) are done in parallel, the instruction cycle times are much
shorter than they are when the operations are executed sequentially. A subtle point about
pipelining is that an instruction may be spread out over multiple instruction cycles and yet
still appear to the programmer to execute in one instruction cycle. In reality, an instruction is
completed every instruction cycle, though each individual instruction takes 4 clock cycles to
execute.
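The timing argument above can be checked with a short calculation. The Python sketch below is an illustration only; the function names are hypothetical, and it assumes an ideal 4-level pipeline (no conflicts) together with the 50 ns clock cycle quoted for the 'C50.

```python
CLOCK_NS = 50   # clock period quoted in the text for the 'C50 on the circuit board
STAGES = 4      # Fetch, Decode, Read, Execute


def sequential_time_ns(instructions):
    """Each instruction passes through all four stages before the next one starts."""
    return instructions * STAGES * CLOCK_NS


def pipelined_time_ns(instructions):
    """Ideal pipeline: four cycles to fill, then one instruction completes per cycle."""
    if instructions == 0:
        return 0
    return (STAGES + instructions - 1) * CLOCK_NS


if __name__ == "__main__":
    for n in (1, 4, 100):
        print(n, sequential_time_ns(n), pipelined_time_ns(n))
    # 1   200    200   -> a single instruction still takes 200 ns
    # 4   800    350
    # 100 20000  5150  -> the speed-up approaches the pipeline depth (4)
```

Note that a single instruction is no faster with the pipeline; the gain appears only in the number of instructions completed per unit time.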
An important pipeline characteristic is that increasing the depth of a pipeline increases the
chance of resource contention and, by the same token, the programming complexity needed to avoid
pipeline conflicts. A pipeline conflict is a situation that arises from the natural operation of
the pipeline and that reduces processor performance. Pipeline conflicts can be avoided by careful
programming. A pipeline conflict falls into one of three classes:
• Structural conflicts
• Data conflicts
• Control conflicts
A structural conflict occurs during a given instruction cycle because two or more phases of a
pipeline require the same hardware resource, such as data bus use, register access, or memory
block access. A data conflict (also called a pipeline hazard) occurs when a dependence between
instructions exists and, because of the pipeline, a data operand is not provided to an instruction
at the appropriate time. This type of situation can occur because certain processor register
modifications can only happen during certain pipeline phases. For example, the 'C50 may only
modify the ARAU registers during the Read phase of the pipeline. An instruction that uses and
modifies an ARAU register and depends on the previous instruction for the contents of the
register will cause a pipeline data conflict.
A control conflict occurs when a conditional software control instruction is executed. The
execution of the program memory addresses sequentially following the conditional instruction
must be suspended until the conditional instruction has been executed. Though the previously
described situations are given the name of conflict, proper program execution is not necessarily
halted. In fact, to avoid resource contention, and in the process pipeline conflicts, programmable
DSPs can use three fundamentally different techniques:
• interlocking
• time-stationary coding
• data-stationary coding
A fine line exists between the techniques listed above. Interlocking is a type of pipeline
behavior: the pipeline reacts in certain ways when confronted with certain situations, such as
pipeline conflicts. Time-stationary and data-stationary coding, on the other hand, are programming
models, that is, code formats used by a programmer. The TI TMS320C5x family of DSPs uses
interlocking, while most members of the AT&T DSP16/16A family use time-stationary coding. The
members of the AT&T DSP32/32C family use data-stationary coding. As with most definitions of
technical concepts, the boundaries between the techniques used by the various DSPs for avoiding
pipeline conflicts are not rigid; most DSPs use a flavor of all three techniques.
Interlocks are not always easy to spot. Pipeline operation and interlocks can be visualized using
a reservation table. The columns correspond to the successive instruction cycles executed by the
DSP. The operations held in a column occur at the same time; progression through time is from
left to right. The four rows correspond to the four pipeline stages. The reservation table for a
5-level pipeline would have five rows.
One solution to pipeline conflicts is interlocking. An interlocking pipeline delays the
progression of an instruction through a pipeline stage. This occurs when the instruction is in
resource contention with the pipeline stage of another instruction (structural conflict).
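A reservation table of the kind described above can be built with a few lines of code. The Python sketch below is a simplified model and not a description of the actual 'C50 interlock logic: the function names are hypothetical, instructions are assumed to stay in order, and a stall is supplied by hand as "hold instruction i for k extra cycles before it enters stage s".

```python
STAGES = ["Fetch", "Decode", "Read", "Execute"]


def schedule(n_instructions, stalls=None):
    """Return start[i][s], the cycle in which instruction i enters stage s.

    stalls maps (instruction index, stage index) to the number of extra cycles
    the interlock holds that instruction before it may enter that stage.
    """
    stalls = stalls or {}
    last = len(STAGES) - 1
    start = []
    for i in range(n_instructions):
        row = []
        for s in range(len(STAGES)):
            earliest = row[s - 1] + 1 if s > 0 else 0
            if i > 0:
                prev = start[i - 1]
                # Stage s is free once the previous instruction has left it.
                freed = prev[s + 1] if s < last else prev[s] + 1
                earliest = max(earliest, freed)
            row.append(earliest + stalls.get((i, s), 0))
        start.append(row)
    return start


def reservation_table(start):
    """Rows are pipeline stages, columns are cycles, entries are instruction names."""
    last = len(STAGES) - 1
    n_cycles = start[-1][last] + 1
    table = [["--"] * n_cycles for _ in STAGES]
    for i, row in enumerate(start):
        for s, first in enumerate(row):
            end = row[s + 1] if s < last else first + 1
            for cycle in range(first, end):
                table[s][cycle] = f"I{i + 1}"
    return table


if __name__ == "__main__":
    # Hold the second instruction for one extra cycle before its Read stage.
    for name, row in zip(STAGES, reservation_table(schedule(3, {(1, 2): 1}))):
        print(f"{name:8s}", " ".join(row))
```

In the printed table, the stalled instruction occupies the Decode row for two columns, and the instruction behind it is held in Fetch for the same extra cycle, so the three instructions take seven cycles instead of six.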
Digital Signal Processor Unit 4 – Basic I/O
UNIT 4 – BASIC I/O
UNIT OBJECTIVE
Upon completion of this unit, you will have a sound understanding of the methods used by a
DSP for off-chip communication.
UNIT FUNDAMENTALS
Most signal processing applications require the DSP to communicate with the external (analog)
world. In these cases, data must usually be input to and/or output from the DSP. Filtering,
waveform generation, and companding are examples of signal processing applications that
require communication between the DSP and the outside world.
For any given DSP in existence today, features such as arithmetic performance, memory
bandwidth, addressing modes, execution control, and instruction set orthogonality were
carefully evaluated before the design was finalized. To manage communication with the outside
world, as well as ensure real-time signal processing, today's DSPs have also had to evolve
specialized peripherals. The following are found integrated into most DSPs:
• Synchronous serial ports
• Parallel ports
• Timers
• On-chip Analog to Digital and Digital to Analog converters
• Host ports
• Bit I/O ports
• On-chip DMA controller
• Clock generators
The listed peripherals, when integrated into DSPs, are designed to operate even when the CPU
is in power-down mode (idle).
Actual communication between the digital signal processor and off-chip circuitry, devices, or
peripherals is made via its input and output pins. The pins are located on the exterior surface of
the DSP integrated circuit package. Each pin corresponds to the output or input of a key
processor signal. Most DSP package pins are used for external memory interfaces (serial and
parallel port interfaces); some are dedicated to output and input of clock signals; others still
(such as the external interrupt lines or the reset pin) allow external devices to assert processor
states. For most DSPs, transmission and reception of data words is done using serial and parallel
ports. The serial and parallel interfaces are found among the DSP package input and output pins.
A serial interface transmits and receives data one bit at a time. Parallel interfaces send or receive
entire data words (8, 16, or 32 bits long) one at a time. To do this, a parallel port has a data line
for each bit sent. Parallel ports transmit more bits per second; however, they require more
external interface pins than serial ports. A parallel port is used to transmit and receive data words
to and from external hardware such as an off-chip memory block, a DIP switch, or a display unit. A
parallel port interface is usually made up of two types of signal lines:
• data bit signals.
• a strobe or handshake signal.
A serial port interface is usually made up of three separate signal lines:
• a bit clock signal.
• a frame synchronization signal.
• a data signal.
The serial port interface can be used for various applications, such as:
• Transmission and reception of data samples from a codec, an analog to digital (A/D), or a
digital to analog (D/A) converter.
• Communication with other processors (such as other DSPs).
• Communication with other external serial systems.
Once communication has been established with an external peripheral, such as a codec, an off-
chip memory block or another processor, the DSP signal processing application may be
implemented unhindered.
Certain DSPs support multi-processor setups (a form of parallelism). Parallelism not only refers
to the concurrent execution of multiple instructions; it can also refer to parallel processing.
Parallel processing is when two or more digital signal processor chips are used for a given
application (multi-processing) and are connected through a shared serial line, allowing
inter-processor communication. One chip is assigned as the master and the others as the slaves.
Many applications have stringent real-time constraints that require multiple DSPs to be used in
concert. DSPs that support multi-processor setups require special Time-Division Multiplexing
(TDM) serial port optimization.
TDM is used to manage communication between the processors over the shared serial line. In a
TDM network, time is divided into time slots. Each time slot is associated with a different
processor. During a given time slot, the associated processor may transmit data; the others must
receive and may not transmit. The destination for the transmitted data word can be included in
the data word, or it can be sent via a secondary data line in parallel with the data word. Either
approach can be used. The DSP peripherals and their interfaces are important to, and answer the
needs of, digital signal processor tasks.
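The TDM scheme described above can be modeled in a few lines. The Python sketch below is a toy model, not the 'C50 TDM serial port: the function name run_tdm is hypothetical, time slots are assumed to be assigned to processors in simple rotation, and the destination is carried alongside each word (a stand-in for either of the two addressing approaches mentioned in the text).

```python
def run_tdm(outboxes, n_slots):
    """Simulate n_slots time slots on a shared serial line.

    outboxes[p] is the list of (destination, word) pairs queued by processor p.
    Returns inboxes, where inboxes[p] lists the (sender, word) pairs received by p.
    """
    n_processors = len(outboxes)
    queues = [list(q) for q in outboxes]          # copy so the input is not modified
    inboxes = [[] for _ in range(n_processors)]
    for slot in range(n_slots):
        sender = slot % n_processors              # assumed rotation of slot ownership
        if not queues[sender]:
            continue                              # the owner has nothing to send
        destination, word = queues[sender].pop(0)
        inboxes[destination].append((sender, word))   # only the slot owner transmits
    return inboxes


if __name__ == "__main__":
    outboxes = [
        [(1, 0x1234)],                 # processor 0 sends one word to processor 1
        [(2, 0xBEEF), (0, 0x0001)],    # processor 1 sends to processors 2 and 0
        [],                            # processor 2 only listens
    ]
    print(run_tdm(outboxes, n_slots=6))
```

Because each slot has exactly one owner, two processors never drive the shared line at the same time, at the cost of each processor waiting for its turn.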
NEW TERMS AND WORDS

None
EQUIPMENT REQUIRED
C5x VDE program
Ex4_1 and ex4_2 assembler and program files
Oscilloscope
Function generator
Exercise 1 – DSP Peripherals
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with the specialized peripherals used by
DSPs.
DISCUSSION
The peripherals found on the TMS320C50 ('C50) DSP are good examples of the types used on
many existing programmable DSPs. The 'C50 used by the Lab-Volt Digital Signal Processor
FACET board has many of its on-chip peripherals communicating with devices on the circuit
board. The 'C50 serial port is used to communicate with the CODEC. The 'C50 parallel port is
used to communicate with the ROM memory chip, the 4 I/O INTERFACE displays, and the DIP
switch. The 'C50 master clock is externally generated by the oscillator and input to the DSP.
Two of the 'C50 external user interrupt lines are each connected to a push-button on the surface
of the FACET circuit board.
A DSP serial port is usually divided into two sections: a receive section and a transmit section.
The transmit and receive sections may be independent. In that case there will be a:
• receive data line
• receive frame synchronization line
• receive bit clock line
• transmit data line
• transmit frame synchronization line
• transmit bit clock line
In other DSPs, independent receive and transmit data pins exist; however, the frame
synchronization and bit clock lines are shared between the two sections. The 'C50 serial port
operates through three memory-mapped user registers (known as DRR, DXR, and SPC). DRR
(Data Receive Register) is where words received through the serial port receive data pin are
stored. DRR is memory-mapped to data memory address (dma) 20h. DXR (Data Transmit
Register) is where words to be transmitted through the serial transmit data pin are stored. Once a
word is stored in DXR, the serial port transmit circuitry takes charge of transmitting the word.
DXR is memory-mapped to dma 21h. SPC (Serial Port Control) is a status and mode control
register for the DSP serial port. Through the memory-mapped Serial Port Control register (SPC),
the 'C50 DSP allows the programmer to specify the serial port transmit and receive
characteristics.
The bit clock line polarity, the shift direction, the data word length, and whether the frame
synchronization signals are bit-length or word-length are serial port characteristics that certain
DSP chips may permit the programmer to configure.
A DSP parallel port is usually implemented in one of two ways. The main processor data bus can be
used as the parallel port, or the parallel port can be made separate from the processor external
bus interface. Processors that separate the parallel port from the external bus interface simplify
interfacing to external devices.
DSPs which use their data bus as a parallel port typically reserve a special section of their
address space for access to off-chip devices. In some DSPs, the reserved memory addresses are
accessed with specialized instructions. The 'C50 has two such parallel port instructions (IN and
OUT). The 'C50 IN and OUT instructions are used to read a data word from, and write a data word
to, an external I/O port. I/O port accesses are distinguished from program and data accesses by a
designated strobe or handshake pin. The pin is asserted when the external read or write is
performed.
Clock signals are used to sequence DSP operations. A clock signal consists of a square wave at
some known frequency. The highest frequency clock signal within a processor is known as the master
clock. The master clock signal is typically generated externally. However, a number of DSPs now
have phase-locked loops (PLL). These DSPs require only a very low frequency input signal, from
which the PLL generates the master clock signal. The master clock signal can be generated in many
different ways. The choice of which way to generate the clock signal can usually be configured by
the programmer.
A large number of programmable DSPs provide timers. A timer is a peripheral that changes the
content of a register at regular intervals in such a manner as to measure time. Some DSPs provide
a timer output pin. A square wave at the timer frequency can be output from the pin, providing a
software-controlled oscillator to the programmer. The 'C50 timer as used on the FACET circuit
board outputs a square wave at a frequency of 10 MHz. The signal is input to the codec as its
master clock. Recall that in Exercise 2 of Unit 2, we used the timer of the 'C50 DSP. The timer
register (TIM) was read at the beginning and at the end of an algorithm. In this manner the
duration of the algorithm was measured. Measuring the duration of an event is a possible timer
application; however, on DSPs, timers are usually used as a source of periodic interrupts. A timer
in reality consists of a:
• clock source
• prescaler
• counter
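How the three parts listed above combine to produce periodic interrupts can be sketched as follows. This Python model is generic and is not a description of the 'C50 timer registers: the class name Timer, the up-counting behavior, and the example values (a 20 MHz clock, a prescaler of 4, and a count of 625) are assumptions chosen only to give a round result.

```python
class Timer:
    """Generic timer: clock source -> prescaler -> counter -> interrupt."""

    def __init__(self, prescale, period):
        self.prescale = prescale    # clock ticks per counter step
        self.period = period        # counter steps per interrupt
        self._prescale_count = 0
        self.count = 0
        self.interrupts = 0

    def clock_tick(self):
        """Advance the timer by one tick of the clock source."""
        self._prescale_count += 1
        if self._prescale_count < self.prescale:
            return                  # the prescaler has not yet divided the clock down
        self._prescale_count = 0
        self.count += 1
        if self.count == self.period:
            self.count = 0
            self.interrupts += 1    # a periodic interrupt would be raised here


def interrupt_rate_hz(clock_hz, prescale, period):
    """Interrupt frequency of the model above."""
    return clock_hz / (prescale * period)


if __name__ == "__main__":
    timer = Timer(prescale=4, period=625)
    for _ in range(20_000):         # 1 ms of a 20 MHz clock
        timer.clock_tick()
    print(timer.interrupts)                      # 8
    print(interrupt_rate_hz(20e6, 4, 625))       # 8000.0
```

With these assumed values, the timer would interrupt the processor 8000 times per second, which is the kind of rate that could pace a sampling routine.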
The clock source usually consists of the DSP master clock signal.
A plethora of package pins exist on programmable DSPs. Related to some of these pins are
additional peripherals that we have not yet covered in this discussion:
• External user interrupts
• Bit I/O ports
The TMS320C50 DSP found on the FACET circuit board uses external user interrupts. In fact,
most DSPs provide this type of peripheral. An external interrupt functions exactly the same as an
“internal” interrupt; however, in most cases, external user interrupts are given lower response
priority than other interrupts.
The TMS320C50 DSP found on the FACET circuit board uses bit I/O ports (two of them, BIO#
and XF), also known in certain DSPs as general-purpose I/O pins, to establish communication
between the C5x VDE and the DSP. Bit I/O ports are software controlled. In this particular
application (communication between the C5x VDE and the DSP), software control of the BIO#
and XF I/O ports is done with a communication program (kernel) held in off-chip ROM. The
software is first run by the DSP when communication between the C5x VDE debugger program
and the DSP is attempted.
CODEC is the abbreviation for CODer-DECoder. It is an electronic circuit that converts analog
signals into digital representations, and decodes digital signals into analog form. Though signal
processing can be done entirely with digital signals, most often a conversion from analog to
digital and back again is required.
A CODEC is usually made up of the following components:
• a programmable input gain
• an anti-aliasing filter
• an Analog-to-Digital converter
• a Digital-to-Analog converter
• a post-filter
As stated, 'C50 communication with the codec is established through the DSP receive and
transmit serial interfaces. The dual serial communication can only be implemented after both the
serial DSP peripherals and the codec are initialized.
Exercise 2 – Digital Signal Processing: The FIR Filter
EXERCISE OBJECTIVE
Upon completion of this exercise, you will be familiar with a common DSP application, known
as filtering.
DISCUSSION
Signals are received and transmitted by both digital and analog processing systems. The effect of
system processing on a signal can be visualized (analyzed) in two different ways, either using the
time domain or the frequency domain. In the frequency domain a single-frequency signal is
represented as a peak.
More complex, and in turn, more common signals such as the square wave, can be represented as
a superposition of many harmonically related sinusoids. A signal visualized in the frequency
domain has a certain spectrum. In our example, the square wave spectrum shows that there is a
large voltage (amplitude) associated with a central frequency and then decreasing voltages
associated with side frequencies. Electrical components can be similarly described. Resistors,
capacitors, and inductors can be expressed in both the time and frequency domains. Certain
electrical circuits thus have the effect of attenuating or amplifying the frequency components of
signals.
The fact that electrical circuits produce a gain or loss that depends on signal frequency is
exploited when creating filters. A filter is a device which transmits signals at frequencies within
one or more frequency bands and attenuates signals at all other frequencies. An RC circuit,
shown above, acts as a filter. DC signals are transmitted un-attenuated through the circuit, while
higher frequency components are greatly attenuated. A filter is used, in effect, to shape the
spectrum of a signal. Filtering consists of the suppression or attenuation of unwanted signal
frequencies. The operation can be implemented by analog circuits (using different types of
electrical components) or using a DSP.
A filter is often defined, depending on the frequencies that it is meant to attenuate, as either:
• low-pass
• high-pass
• band-pass or band-stop (notch)
The above graphs are called Bode plots, and are also generally known as filter frequency responses.
From a frequency response graph (Bode plot), many filter properties and characteristics are
apparent. The pass band of a filter is defined as the range of frequencies over which signals pass
virtually un-attenuated through the filter. The cut-off frequency is the point where the
response drops 3 dB below its pass band level. The transition region is the area between the pass
band and the stop band. A large rate of change of gain with frequency within this region usually
improves filter performance (depending on the specific application). The stop band is defined as
the range of frequencies over which signals passing through the circuit are attenuated. The level
of attenuation is dependent on the filter design specifications. A filter has many other
characteristics; however, describing them in full would require a more in-depth discussion.
For the moment, it is important to remember that a filter creates a variation of signal gain with
frequency.
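The 3 dB cut-off point can be verified numerically for a simple filter. The Python sketch below assumes a first-order RC low-pass circuit with the output taken across the capacitor, whose gain magnitude is 1/sqrt(1 + (2πfRC)²); the function names and the component values (1 kΩ and 1 µF) are arbitrary choices made for the example.

```python
import math


def rc_cutoff_hz(r_ohms, c_farads):
    """Cut-off frequency of a first-order RC low-pass filter."""
    return 1 / (2 * math.pi * r_ohms * c_farads)


def rc_lowpass_gain_db(frequency_hz, r_ohms, c_farads):
    """Gain in dB of a first-order RC low-pass filter at the given frequency."""
    w_rc = 2 * math.pi * frequency_hz * r_ohms * c_farads
    magnitude = 1 / math.sqrt(1 + w_rc ** 2)
    return 20 * math.log10(magnitude)


if __name__ == "__main__":
    r, c = 1_000.0, 1e-6                    # assumed example values
    fc = rc_cutoff_hz(r, c)                 # about 159 Hz
    for f in (0.0, fc, 10 * fc, 100 * fc):
        print(f"{f:10.1f} Hz  {rc_lowpass_gain_db(f, r, c):7.2f} dB")
    # DC passes with 0 dB, the cut-off frequency gives about -3 dB,
    # and each further decade adds roughly 20 dB of attenuation.
```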
As stated, a filter takes advantage of the frequency domain representation of electronic circuit
components (such as the capacitor and inductor). However, filtering can also be performed by
digital signal processors. The effects once produced only by analog circuits can be
mathematically represented and efficiently performed by processors. Until recently, the most
common application for the DSP was the filter (the digital filter). DSPs are preferred over analog
circuits when implementing such things as filters, because digitizing a design ensures that the
same results can be reproduced time and time again. Digital filters are implemented as the
summation of the products of filter coefficients (Ai) and data samples (Sj). The coefficients
are stored in memory and represent the filter frequency response information. Usually, the
more coefficients used to represent a filter, the narrower the transition region of the filter. The
data samples represent the input signal. The operation shown above can be executed with the aid
of the Multiply and ACcumulate (MAC) instruction, found on nearly all DSPs. Traditionally,
DSPs have been specifically enhanced to implement filter operations in real time. The MAC
instruction was and still is the cornerstone of the filter operation.
RPT #79
MACD #C0,*-
APAC
The TMS320C50 DSP requires only three source code instructions to properly implement the
mathematical part of a filter algorithm.
The RPT source statement instructs the DSP to repeat the following instruction 80 times (79+1).
Each MACD instruction multiplies a filter coefficient C by an indirectly addressed data point
stored in memory and adds the product from the previous repetition to the accumulator. The final
product is therefore still in the product register when the loop ends. The APAC source statement
finishes the calculation of the filtered signal sample by adding the product register to the accumulator.
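The interplay of the three instructions can be followed with a simplified software model. The sketch below is an assumption-laden illustration, not a TMS320C50 simulator: it ignores word widths, product shifting, and overflow, and it assumes that the accumulator and product register have been cleared before the loop. The function name and the three-coefficient example are hypothetical.

```python
def rpt_macd_apac(coeffs, data):
    """Simplified model of RPT / MACD / APAC (hypothetical helper).

    coeffs: coefficients in the order the loop reads them (the first one
            is paired with the oldest sample).
    data:   data[0] is the newest sample, data[len(coeffs) - 1] is the
            oldest, and one spare slot follows it.
    Returns (accumulator after APAC, accumulator before APAC).
    """
    acc = 0                      # accumulator, assumed cleared beforehand
    preg = 0                     # product register, assumed cleared beforehand
    pointer = len(coeffs) - 1    # indirect address starts at the oldest sample

    for c in coeffs:             # RPT: repeat MACD once per coefficient
        acc += preg                          # MACD: add the previous product
        preg = c * data[pointer]             # MACD: form the new product
        data[pointer + 1] = data[pointer]    # MACD: copy the sample one address up
        pointer -= 1                         # "*-": post-decrement the address

    acc_before_apac = acc
    acc += preg                  # APAC: add the last product
    return acc, acc_before_apac

samples = [10, 20, 30, 0]        # newest, ..., oldest, spare slot
result, partial = rpt_macd_apac([1, 2, 3], samples)
print(partial, result)           # prints 70 100
print(samples)                   # prints [10, 10, 20, 30]
```

In this model the accumulator holds only 70 when the loop ends, because the last product (30) is still in the product register; APAC brings the total to 100. The data move also leaves every sample shifted one address higher, so the next input sample can be written to the first location.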
APPENDIX A – SAFETY
Safety is everyone's responsibility. All must cooperate to create the safest possible working
environment. Students must be reminded of the potential for harm, given common-sense safety
rules, and instructed to follow the electrical safety rules.
Any environment can be hazardous when it is unfamiliar. The F.A.C.E.T. computer-based
laboratory may be a new environment to some students. Instruct students in the proper use of the
F.A.C.E.T. equipment and explain what behavior is expected of them in this laboratory. It is up
to the instructor to provide the necessary introduction to the learning environment and the
equipment. This introduction helps prevent injury to students and damage to equipment.
The voltage and current used in the F.A.C.E.T. Computer-Based Laboratory are, in themselves,
harmless to the normal, healthy person. However, an electrical shock coming as a surprise will
be uncomfortable and may cause a reaction that could create injury. The students should be made
aware of the following electrical safety rules.
1. Turn off the power before working on a circuit.
2. Always confirm that the circuit is wired correctly before turning on the power. If required,
have your instructor check your circuit wiring.
3. Perform the experiments as you are instructed; do not deviate from the documentation.
4. Never touch “live” wires with your bare hands or with tools.
5. Always hold test leads by their insulated areas.
6. Be aware that some components can become very hot during operation. (However, this is not
a normal condition for your F.A.C.E.T. course equipment.) Always allow time for the
components to cool before proceeding to touch or remove them from the circuit.
7. Do not work without supervision. Be sure someone is nearby to shut off the power and
provide first aid in case of an accident.
8. Remove power cords by the plug, not by pulling on the cord. Check for cracked or broken
insulation on the cord.