Introduction to Programmable DSPs
The programmable digital signal processors (P-DSPs) are designed with features that are specifically required for digital signal processing applications. Conventional microprocessors are meant for general purpose applications and hence do not have these features. However, an advanced microprocessor or a RISC processor may use some of the techniques adopted in P-DSPs or may even have instructions that are specifically required for DSP applications. Such processors may have performance close to that of a P-DSP for certain operations. For example, the DEC Alpha 21064 computes a 1024-point complex FFT in 480 µs, as compared to the Analog Devices ADSP 21060, which takes about 460 µs to carry out the same operation. However, in terms of low power requirement, cost, real-time I/O capability and availability of high-speed on-chip memories, the P-DSPs have an advantage over the advanced microprocessors and the RISC processors. In this chapter some of the features specifically required for performing digital signal processing operations efficiently are discussed in detail.
MULTIPLIER AND MULTIPLIER ACCUMULATOR
One of the most common operations required in digital signal processing applications is array multiplication. For example, convolution and correlation require array multiplication. In Chapter 1, it was shown how the array multiplication can be done using a single multiplier and adder. The implementation scheme is reproduced in Fig. 2.1. One of the important requirements of these array multipliers is that they have to process the signals in real time. Before the next sample of the input signal arrives at the input to the array, the array multiplication should be completed. This requires the multiplication as well as the accumulation to be carried out using hardware elements. There are two approaches to solve this problem. A dedicated MAC unit may be implemented in hardware, which integrates multiplier and accumulator in a single hardware unit. This approach is adopted by the Motorola DSP processor DSP5600X. The other approach is to have the multiplier and accumulator separate. For example, in the Texas Instruments DSP processor 320C5X, the output of the multiplier is stored into the product register. The content of this register can in turn be added to the accumulator register ACC in the central ALU. In both of the above approaches, the MAC operation can be completed in one clock cycle. The presence of H/W multipliers and/or multiplier accumulator is one of the mandatory requirements of a P-DSP.
Fig. 2.1 Implementation of convolver with single multiplier/adder
In Fig. 2.1, $y_n$, the output at the nth sampling instant, is obtained by multiplying the array $\mathbf{x}_n = [x_n \; x_{n-1} \; x_{n-2} \; \ldots \; x_{n-M+3} \; x_{n-M+2} \; x_{n-M+1}]$, corresponding to the present and the past $M - 1$ samples of the input, with the array $\mathbf{h} = [h_0 \; h_1 \; h_2 \; \ldots \; h_{M-3} \; h_{M-2} \; h_{M-1}]$, corresponding to the impulse response sequence. To obtain $y_{n+1}$, the input signal array $\mathbf{x}_{n+1}$ is multiplied with the array $\mathbf{h}$. The vector $\mathbf{x}_{n+1}$ is obtained by shifting the array $\mathbf{x}_n$ towards the right so that the $(n+1)$th sample of the input data, $x_{n+1}$, becomes the first element and all the elements of $\mathbf{x}_n$ are shifted towards the right by one position, so that the $i$th element of $\mathbf{x}_n$ becomes the $(i+1)$th element of $\mathbf{x}_{n+1}$. Instead of shifting the elements of $\mathbf{x}_n$ towards the right all at a time after finishing the vector multiplication, each of the elements may be shifted separately soon after the MAC operation that uses that element is over. For example, after obtaining the product $x_{n-M+1} h_{M-1}$, the element $x_{n-M}$ may be made equal to $x_{n-M+1}$. Similarly, after obtaining the product $x_{n-M+2} h_{M-2}$, the element $x_{n-M+1}$ may be made equal to $x_{n-M+2}$, and so on. This is achieved in P-DSPs by using a special instruction called MACD (multiply accumulate with data shift). For example, TMS320C5X has the instruction MACD pgm, dma, which multiplies the content of the program memory pgm with the content of the data memory with address dma and stores the result in the product register. The content of the product register is added to the accumulator before the new product is stored. Further, the content of dma is copied to the next location, whose address is dma + 1.
MODIFIED BUS STRUCTURES AND MEMORY ACCESS SCHEMES IN P-DSPs
It may be noted that the MAC operation with data move (i.e., the MACD instruction) requires four memory accesses per instruction cycle. (An instruction cycle is the time that elapses from the fetching of an instruction until that particular instruction completes execution, including the time taken for writing the result into a register or memory. Many of the instructions in P-DSPs, including the MACD instruction, require only one processor clock period per instruction cycle. In conventional microprocessors one instruction cycle corresponds to several clock periods.) The four memory accesses per clock period required for the MACD instruction are as follows:
1. Fetch the MACD instruction from the program memory
2. Fetch one of the operands from the program memory
3. Fetch the second operand from the data memory
4. Write the content of the data memory with address dma into the location with the address dma + 1
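The multiply, accumulate and data-move steps listed above can be sketched in software. The following is a minimal Python model, not TMS320 code: the names `macd_fir_step` and `fir_filter` are hypothetical, the list `coeffs` stands in for program memory, and the list `data` stands in for data memory with one spare word at the end to receive the last data move. It assumes the taps are processed from the oldest sample to the newest, so that each data move lands on a location that has already been used.

```python
def macd_fir_step(coeffs, data, new_sample):
    """Compute one FIR output using MACD-style multiply, accumulate and data move.

    coeffs     -- impulse response h[0..M-1] (stands in for program memory)
    data       -- M + 1 words (stands in for data memory); data[0] is the newest sample
    new_sample -- the input sample that has just arrived
    """
    m = len(coeffs)
    if len(data) != m + 1:
        raise ValueError("data must hold M + 1 words")
    data[0] = new_sample
    acc = 0    # accumulator
    preg = 0   # product register
    # Walk from the oldest sample (highest address) down to the newest.
    for dma in range(m - 1, -1, -1):
        acc += preg                     # add the previous product to the accumulator
        preg = coeffs[dma] * data[dma]  # multiply program-memory and data-memory operands
        data[dma + 1] = data[dma]       # data move: copy dma into dma + 1
    acc += preg                         # the last product is still in the product register
    return acc


def fir_filter(coeffs, samples):
    """Filter a sequence of samples, reusing one delay line across calls."""
    data = [0] * (len(coeffs) + 1)
    return [macd_fir_step(coeffs, data, x) for x in samples]
```

Feeding a unit impulse through `fir_filter` returns the coefficients themselves, which is a quick check that the per-tap data move keeps the delay line aligned without a separate shifting pass.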
The relatively static impulse response coefficients are stored in the program memory and the samples of the input data are stored in the data memory. If the MACD instruction is to be executed in a machine with Von Neumann architecture, it requires four clock cycles. This is because in the Von Neumann architecture shown in Fig. 2.2 there is a single address bus and a single data bus for accessing the program as well as the data memory area. One of the ways by which the number of clock cycles required for the memory access can be reduced is to use more than one bus for both address and data. For example, in the Harvard architecture shown in Fig. 2.3, there are two separate buses for the program and data memory. Hence the content of program memory and data memory can be accessed in parallel. The instruction code can be fed from the program memory to the control unit while the operand is fed to the processing unit from the data memory. The processing unit, consisting of the registers and processing elements such as MAC units, multiplier, ALU, shifter, etc., is also referred to as the data path. The P-DSPs follow the modified Harvard architecture shown in Fig. 2.4. One set of buses is used to access a memory that has both program and data, and another to access a memory that has data alone. Data can also be transferred from one memory to another. The modified Harvard architecture is used in several P-DSPs, for example the P-DSPs from Texas Instruments and Analog Devices. With the Harvard architecture, the number of memory accesses per clock cycle was shown to be two. This can be increased further by using a larger number of buses. For example, by using three separate address and data buses, the number of memory accesses per clock cycle can be increased to three. Motorola DSP5600X, DSP96002, etc. have three separate buses. TMS320C54X has four address buses. Since the cost of an IC increases with the number of pins in the IC, extending a number of buses outside the chip would unduly increase the price. Hence the P-DSPs use multiple buses only for connecting the on-chip memory to the control unit and data path. For accessing off-chip memory, only a single bus is used for both the program memory and the data memory. Because of this, any operation that involves off-chip memory is slow compared to one that uses the on-chip memory.
MULTIPLE ACCESS MEMORY
The number of memory accesses per clock period can also be increased by using a high-speed memory that permits more than one memory access per clock period. For example, the DARAM (dual access RAM) permits two memory accesses per clock period. Multiple access RAM may be connected to the processing unit of the P-DSP by using the Harvard architecture. For example, DARAM connected to a P-DSP with two independent data and address buses can be used to achieve four memory accesses per clock period.
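The access counts quoted for the different bus and memory arrangements can be tabulated with a small calculation. The sketch below is only an idealised model: the function name `cycles_for_accesses` is hypothetical, and it assumes that the four MACD accesses can always be spread evenly over the available buses, which real hardware need not allow.

```python
import math


def cycles_for_accesses(num_buses, accesses_per_bus_per_cycle=1, accesses_needed=4):
    """Clock cycles needed to complete a given number of memory accesses.

    num_buses                  -- independent address/data bus pairs
    accesses_per_bus_per_cycle -- 1 for ordinary RAM, 2 for dual access RAM (DARAM)
    accesses_needed            -- 4 for the MACD instruction
    """
    per_cycle = num_buses * accesses_per_bus_per_cycle
    if per_cycle <= 0:
        raise ValueError("at least one access per cycle is required")
    return math.ceil(accesses_needed / per_cycle)


von_neumann = cycles_for_accesses(num_buses=1)   # single shared bus
harvard = cycles_for_accesses(num_buses=2)       # separate program and data buses
harvard_daram = cycles_for_accesses(num_buses=2, accesses_per_bus_per_cycle=2)
```

Under these assumptions the single-bus case needs four cycles, the two-bus case needs two, and two buses combined with DARAM bring the MACD instruction down to a single cycle.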
2.4 MULTIPORTED MEMORY
Another technique that is adopted for increasing the number of accesses per clock period is to use multiported memory. For example, the dual port memory has two independent data and address buses, as shown in Fig. 2.5, and hence two memory accesses can be achieved in a clock period. Multiported memories dispense with the need for storing the program and data in two different memory chips in order to permit simultaneous access to both program and data memory. However, one of the major limitations of the dual-ported memory is the increase in cost compared to two single-port memories of the same total capacity. This is because of the increased number of pins and the larger chip area required for the dual-ported memory. A larger number of I/O pins requires a larger and more expensive package and a larger die size.
Fig. 2.4 Modified Harvard architecture
Fig. 2.5 Block diagram of a dual-ported memory (Address Bus 1, Data Bus 1, Address Bus 2, Data Bus 2)
Digital Signal Processors
In this chapter we will briefly introduce DSP processor architectures and their features. In the previous chapters we have implemented various DSP algorithms such as convolution, FFT, filtering, correlation etc. in C. There are a few things common to all DSP algorithms, such as:
i) Processing on arrays is involved.
ii) Majority of operations are multiply and accumulate.
iii) Linear and circular shifting of arrays is required.
These operations require a large time when they are implemented on general purpose processors. This is because the hardware of general purpose processors is not optimized to perform such operations fast. Hence general purpose processors are not suitable for DSP operations. In particular, real time DSP operations are very difficult on general purpose processors. Hence DSP processors having an architecture suitable for DSP operations are developed.
5.1.1 Desirable Features of DSP Processors
Now let us see what features DSP processors should have so that DSP operations will be performed fast.
i) DSP processors should have multiple registers so that data (i.e. arrays) exchange from register to register is fast.
ii) DSP operations require multiple operands simultaneously. Hence the DSP processor should have multiple operand fetch capacity.
iii) DSP processors should have circular buffers to support circular shift operations.
iv) The DSP processor should be able to perform multiply and accumulate operations very fast.
v) DSP processors should have pointers to support jumps and shifts.
vi) Since DSP processors can be used with general processors, they should have multiprocessing ability.
vii) To support fast operation, the DSP processors should have on chip memory.
viii) For real time applications interrupts and timers are required. Hence DSP processors should have a powerful interrupt structure and timers.
The architectures of DSP processors are designed to have these features. The processors from Analog Devices, Texas Instruments, Motorola etc. are commonly used.
5.1.2 Types of Architectures
There are three types of standard architectures for microprocessors. They are as follows:
i) Von-Neumann Architecture: General purpose processors normally have this type of architecture. The architecture shares the same memory for program and data. The processors perform instruction fetch, decode and execute operations sequentially. In such an architecture, the speed can be increased by pipelining. This type of architecture contains a common internal address and data bus, ALU, accumulator, I/O devices and a common memory for program and data. This type of architecture is not suitable for DSP processors.
ii) Harvard Architecture: The Harvard architecture has separate memories for program and data. There are also separate address and data buses for program and data. Because of these separate on chip memories and internal buses, the speed of execution in the Harvard architecture is high.
Fig. 5.1.1 Harvard architecture showing separate program and data memories (one memory for storing programs, one for storing data)
In the above figure observe that there is a Program Memory Address (PMA) bus and a Program Memory Data (PMD) bus separate for program memory. Similarly there is a separate Data Memory Data (DMD) bus and Data Memory Address (DMA) bus for data memory. This is all on chip. The digital signal processor includes various registers, address generators, ALUs etc.
The Harvard architecture has a multiple bus structure and separate memories. Hence its speed is increased. It is possible to fetch the next instruction while the current instruction is executed. That is, the fetch, decode and execute operations are done in parallel.
iii) Modified Harvard Architecture: In this architecture data memory can be shared by data as well as programs. Fig. 5.1.2 illustrates this concept.
Fig. 5.1.2 Modified Harvard architecture: data memory shared by programs
Normally the program memory and data memory addresses are generated by separate address generators. The data address generator for programs can address program memory as well as data memory. This provides flexibility in the use of these memories. The speed of operation is also increased. The architecture shown in Fig. 5.1.2 above is normally on chip. Today's commonly used DSP processors normally have this type of architecture.
5.1.3 Dedicated MAC Unit
• Most of the operations in DSP involve array multiplication. Operations such as convolution and correlation require multiply and accumulate operations. In real time applications, the array multiplication and accumulation must be completed before the next sample of input comes. This requires very fast multiplication and accumulation. Hence a dedicated hardware unit called the multiplier-accumulator (MAC) is used. It is one of the computational units in the processor. The complete MAC operation is executed in one clock cycle.
• In the Texas Instruments DSP processor 320C5X, the output of the multiplier is stored into the product register. The contents of this product register are added to the accumulator register ACC in the central ALU.
• The DSP processors have a special instruction called MACD. This means multiply accumulate with data shift.
5.1.4 Multiple
• Some of the DSP processors use very long instruction word (VLIW) architecture. Such an architecture consists of multiple ALUs, MAC units, shifters etc. Fig. 5.1.3 shows the block diagram of the VLIW architecture.
Fig. 5.1.3 VLIW architecture
• The above architecture consists of a multiported register file. It is used for fetching the operands and storing the results.
• The read/write cross bar provides parallel random access by the functional units to the multiported register file. The functional units work concurrently with the load/store operation of data between a RAM and the register file.
Fig. 5.1.5 shows the instruction execution with pipeline. Here observe that when I1 is in the decode phase, the next instruction I2 is fetched. Similarly, when I2 goes to the decode phase, the next instruction I3 is fetched. Thus observe that all the functional units are executing four successive instructions at any time. On comparing Fig. 5.1.4 and Fig. 5.1.5 we observe that five instructions are executed in the same time if pipelining is used.
Fig. 5.1.5 Instruction execution with pipeline (time axis: value of T from 1 to 8)
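The overlap described for the pipeline can be reproduced with a short schedule generator. This is a sketch under stated assumptions: a four-stage pipeline in which every stage takes one time slot, with the stage names `fetch`, `decode`, `read` and `execute` chosen here for illustration (the text only states that four successive instructions are in flight at any time). The function names are hypothetical.

```python
STAGES = ["fetch", "decode", "read", "execute"]  # assumed stage names, one slot each


def pipeline_chart(num_instructions, stages=STAGES):
    """Map each time slot T (starting at 1) to the instruction held by every stage."""
    chart = {}
    for i in range(num_instructions):
        for s, name in enumerate(stages):
            # Instruction I(i+1) enters stage s in time slot i + s + 1.
            chart.setdefault(i + s + 1, {})[name] = f"I{i + 1}"
    return chart


def instructions_completed(num_slots, num_stages=len(STAGES), pipelined=True):
    """Number of instructions fully executed within num_slots time slots."""
    if pipelined:
        return max(0, num_slots - num_stages + 1)
    return num_slots // num_stages


chart = pipeline_chart(5)
```

With eight time slots, `instructions_completed(8)` gives five instructions for the pipelined case, matching the count quoted in the text, while the same model without overlap completes only two.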
5.1.6 Advanced Addressing Modes
Conventional microprocessors have addressing modes such as direct, indirect, immediate etc. The DSP processors have additional addressing modes because of which execution is fast. These addressing modes are discussed next.
5.1.6.1 Short Immediate Addressing
The operand is specified using a short constant. This short constant becomes part of a single word instruction. In the TMS320C5X series of DSP processors, an 8-bit operand can be specified as one of the operands in single word instructions such as add, subtract, AND, OR, XOR etc.
5.1.6.2 Short Direct Addressing
The lower order address of the operand is specified in the single word instruction. In TMS320CXX DSP processors, the lower 7 bits of the address are specified as part of the instruction. The higher 9 bits of the address are stored in the data page pointer. Each such data page consists of 128 words.
The CPU and I/O registers are accessed as memory locations. These registers are mapped in the starting page or the final page of the memory space. In TMS320C5X, page 0 corresponds to the CPU and I/O registers.
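The short direct addressing scheme described above amounts to concatenating a 9-bit page number with a 7-bit offset. A minimal sketch, with the hypothetical function name `direct_address` and plain integers standing in for the data page pointer and the instruction field:

```python
PAGE_BITS = 9     # higher-order bits held in the data page pointer
OFFSET_BITS = 7   # lower-order bits carried in the instruction word


def direct_address(data_page, offset):
    """Form a 16-bit data address from a 9-bit page and a 7-bit offset."""
    if not 0 <= data_page < (1 << PAGE_BITS):
        raise ValueError("data page must fit in 9 bits")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset must fit in 7 bits")
    return (data_page << OFFSET_BITS) | offset


words_per_page = 1 << OFFSET_BITS  # 128 words in each data page
```

Page 0 therefore covers addresses 0 to 127, which is where the text places the memory-mapped CPU and I/O registers.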
Architecture of TMS320C5X
Leading manufacturers of integrated circuits such as Texas Instruments (TI), Analog Devices and Motorola manufacture digital signal processor (DSP) chips. These manufacturers have developed a range of DSP chips with varied complexity. The underlying concepts are broadly the same. Some of these concepts are discussed in Chapter 2. In order to give a feel for the design of systems with DSP chips, in this chapter some details on the design of systems using the TMS320C5X DSP chip (denoted in brief as 5X) manufactured by TI are given.
The TMS320 DSP family consists of two types of single-chip DSPs: 16-bit fixed-point and 32-bit floating-point. These DSPs possess the operational flexibility of high-speed controllers and the numerical capability of array processors. Combining these two qualities, the TMS320 processors are inexpensive alternatives to custom fabricated VLSI and multichip bit-slice processors. TMS320C5X belongs to the fifth generation of TI's TMS320 family of DSPs. The first five generations of the TMS320 family are C1X, C2X, C3X, C4X and C5X. The C1X, C2X, C2XX and C5X are 16-bit fixed-point processors. Instruction sets of the higher generation fixed-point processors are upward compatible with the lower generation fixed-point processors. For example, C5X can execute the instructions of both C1X and C2X. The 54X is upward compatible with 5X. C3X and C4X are 32-bit floating-point processors and C4X is upward compatible with the C3X instruction set. The sixth generation C6X devices feature VelociTI, an advanced very long instruction word (VLIW) architecture developed by TI, and can execute 1600 MIPS. The eighth generation C8X devices have, on a single piece of silicon, a number of advanced DSPs (ADSPs) and a RISC master processor. Typical applications of the above families of TI DSPs are as follows:
C1X, C2X, C2XX, C5X, C54X: toys, hard disk drives, modems, cellular phones and active suspensions
C3X: filters, analysers, hi-fi systems, voice mail, imaging, bar-code readers, motor control, 3D graphics or scientific processing
C4X: parallel-processing clusters in virtual reality, image recognition, telecom routing and parallel-processing systems
C6X: wireless base stations, pooled modems, remote-access servers, digital subscriber loop systems, cable modems and multichannel telephone systems
C8X: video telephony, 3D computer graphics, virtual reality and a number of multimedia applications
The TI DSP chips have IC numbers with the prefix TMS320. If the next letter is C (e.g. TMS320C5X), it indicates that CMOS technology is used for the IC and the on-chip non-volatile memory is a ROM. If it is E (e.g. TMS320E5X), it indicates that the technology used is CMOS and the on-chip non-volatile memory is an EPROM. If it is neither (e.g. TMS3205X), it indicates that NMOS technology is used for the IC and the on-chip non-volatile memory is a ROM. Under C5X itself there are three processors, 'C50, 'C51 and 'C5X, that have an identical instruction set but differ in the capacity of on-chip ROM and RAM. The characteristics of some of the TMS320 family DSP chips are given in Table 3.1. The instruction set of TMS320C5X and other DSP chips is superior to the instruction set of conventional microprocessors such as 8085, Z80, etc., as most of the instructions require only a single cycle for execution. The multiply accumulate operation, used quite frequently in signal processing applications such as convolution, requires only one cycle in a DSP.
Table 3.1 Characteristics of some of the TMS320 family DSP chips (columns: cycle time (ns), on-chip RAM, total memory, parallel ports)
The block diagram of the internal architecture of C5X is shown in Fig. 3.1. The 320C5X DSPs are said to have an advanced Harvard architecture because they have separate memory bus structures for program and data and have instructions that enable data transfer between the program and data memory areas.
3.2 BUS STRUCTURE
Separate program and data buses allow simultaneous access to program instructions and data, providing a high degree of parallelism. For example, while data is multiplied, a previous product can be loaded into, added to or subtracted from the accumulator and, at the same time, a new address can be generated. Such parallelism supports a powerful set of arithmetic, logic and bit-manipulation operations that can all be performed in a single machine cycle. In addition, the 'C5X includes the control mechanisms to manage interrupts, repeated operations and function calling. The 'C5X architecture has four buses and their functions are as follows:
Program bus (PB): It carries the instruction code and immediate operands from program memory space to the CPU.
Program address bus (PAB): It provides addresses to program memory space for both reads and writes.
Data read bus (DB): It interconnects various elements of the CPU to data memory space.
Data read address bus (DAB): It provides the address to access the data memory space.
The program and data buses can work together to transfer data from on-chip data memory and internal or external program memory to the multiplier for single-cycle multiply/accumulate operations.
Fig. 3.1 Internal architecture of C5X
CPU registers (except ST0 and ST1), peripheral registers and I/O ports occupy data memory space. Some of the registers/execution units in the CPU of C5X processors and their functions are as follows.
CENTRAL ARITHMETIC LOGIC UNIT (CALU)
It consists of the following elements: a (16 × 16)-bit parallel multiplier; an arithmetic logic unit (ALU), accumulator (ACC), accumulator buffer (ACCB) and product register (PREG), each with 32 bits; and a 0-16-bit left barrel shifter and right barrel shifter. One of the operands for the ALU operation comes from ACC. The results of operations performed in the central ALU are stored in ACC. Either the higher order word or the lower order word of ACC can be loaded from memory. A 32-bit register denoted as ACCB is used for temporary storage of ACC. The hardware multiplier unit in the C5X processors performs 16 × 16 multiplication of numbers represented in 2's complement form. The 32-bit PREG holds the result of multiplication. The 16-bit temporary register 0 (TREG0) holds the multiplicand. The other operand for the multiplication can be specified using one of the addressing modes. The 0-16-bit left barrel shifter and right barrel shifter in the CALU permit the contents of memory to be left shifted by 0 to 16 bits before they are either fed to the ALU or stored from the ALU to memory. The CPU registers ACC and PREG can also be shifted using these shifters. In this case they require two cycles. A 5-bit register TREG1 specifies the number of bits by which the scaling shifter should shift either the incoming data to one of the CPU registers or vice versa. When the incoming data to the CPU is left shifted by the scaling shifter, the LSBs are filled with 0.
3.4 AUXILIARY REGISTER ALU (ARAU)
It consists of eight 16-bit auxiliary registers (ARs) AR0-AR7, a 3-bit auxiliary register pointer (ARP) and an unsigned 16-bit ALU. The ARAU calculates indirect addresses by using inputs from the ARs, the 16-bit index register (INDX) and the auxiliary register compare register (ARCR). The ARAU can autoindex the current AR while the data memory location is being addressed and can index either by ±1 or by the contents of the INDX. As a result, accessing data does not require the CALU for address manipulation; therefore, the CALU is free for other operations in parallel. This makes the instructions execute faster compared to conventional microprocessors. For example, let us consider the following sequence of 8085 instructions:
MOV A, M
INX H
These instructions enable the accumulator to be loaded using indirect addressing mode, and the HL register pair used as the address pointer is incremented. These two instructions can be replaced by a single 5X instruction LACC *+, 0. Further, any one of the auxiliary registers can be used as the address pointer and incremented by the above instruction. The register that will be used is specified by the content of the ARP. The auxiliary registers AR0-AR7 may also be used as general purpose registers for holding the operands for arithmetic and logical operations in the CALU. Some of the other registers of the ARAU and their functions are described in the sections that follow.
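The indirect addressing with automatic post-modification that the ARAU provides can be modelled in a few lines. This is an illustrative Python sketch, not C5X code: the class `ARAU` and its method names are hypothetical, memory is a plain list, and addresses are assumed to wrap at 16 bits.

```python
class ARAU:
    """Toy model of eight auxiliary registers, the ARP and the INDX step value."""

    def __init__(self):
        self.ar = [0] * 8   # AR0-AR7, 16-bit address pointers
        self.arp = 0        # 3-bit pointer selecting the current AR
        self.indx = 0       # step value used for strides larger than 1

    def indirect(self, step=0):
        """Return the address in the current AR, then post-modify it by step."""
        address = self.ar[self.arp]
        self.ar[self.arp] = (address + step) & 0xFFFF  # assumed 16-bit wrap-around
        return address

    def load_post_increment(self, memory):
        """Roughly what 'MOV A, M' followed by 'INX H' achieves: read, then step by 1."""
        return memory[self.indirect(step=1)]

    def load_post_index(self, memory):
        """Read through the current AR, then step by the content of INDX."""
        return memory[self.indirect(step=self.indx)]


# A 4 x 4 matrix stored row by row: stepping by INDX = 4 walks down a column.
memory = list(range(100, 116))
arau = ARAU()
arau.arp = 2
arau.ar[2] = 1
arau.indx = 4
column = [arau.load_post_index(memory) for _ in range(4)]
```

Because the pointer update happens inside the address unit, the arithmetic path in this model never touches the address, which mirrors the point that the CALU stays free for other work.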
3.5 INDEX REGISTER (INDX)
The 16-bit INDX is used by the ARAU as a step value (addition or subtraction by more than 1) to modify the address in the ARs during indirect addressing. For example, when the ARAU steps across a row of a matrix, the indirect address is incremented by 1. However, when the ARAU steps down a column, the address is incremented by the dimension of the matrix. The ARAU can add or subtract the value stored in the INDX from the current AR as part of the indirect address operation. INDX can also map the dimension of the address block used for bit-reversal addressing.
3.6 AUXILIARY REGISTER COMPARE REGISTER (ARCR)
The 16-bit ARCR is used for address boundary comparison. The CMPR instruction compares the ARCR to the selected AR and places the result of the compare in the TC bit of ST1.
3.7 BLOCK MOVE ADDRESS REGISTER (BMAR)
The 16-bit BMAR holds an address value used with block moves and multiply/accumulate operations. This register provides the 16-bit address for an indirect-addressed second operand.
BLOCK REPEAT REGISTERS (RPTC, BRCR, PASR, PAER)
All these registers are 16 bits wide. The repeat counter register (RPTC) holds the repeat count in a repeat single-instruction operation and is loaded by the RPT and RPTZ instructions. The block repeat counter register (BRCR) holds the count value for the block repeat feature. This value is loaded before a block repeat operation is initiated. The block repeat program address start register (PASR) indicates the 16-bit address where the repeated block of code starts. The block repeat program address end register (PAER) indicates the 16-bit address where the repeated block of code ends. The PASR and PAER are loaded by the RPTB instruction.
PARALLEL LOGIC UNIT (PLU)
It performs Boolean operations or the bit manipulations required of high-speed controllers. The PLU can set, clear, test or toggle bits in a status register, control register, or any data memory location. The PLU allows logic operations to be performed on data memory values directly without affecting the contents of the ACC or PREG. Results of a PLU function are written back to the original data memory location.
The 'C5X has 96 registers mapped into page 0 of the data memory space. All 'C5X DSPs have 28 CPU registers and 16 input/output (I/O) port registers but have different numbers of peripheral and reserved registers. Since the memory-mapped registers are a component of the data memory space, they can be written to and read from in the same way as any other data memory location. The memory-mapped registers are used for indirect data address pointers, temporary storage, CPU status and control, or integer arithmetic processing through the ARAU.
The program controller contains logic circuitry that decodes the instructions, manages the CPU pipeline, stores the status of CPU operations and decodes the conditional operations. Parallelism of architecture lets the 'C5X perform three concurrent memory operations in any given machine cycle: fetch an instruction, read an operand and write an operand. The program controller consists of the following elements:
16-bit program counter (PC)
16-bit status registers ST0 and ST1, processor mode status register (PMST) and circular buffer control register (CBCR)
(16 × 16)-bit hardware stack
Address generation logic
Instruction register
Interrupt flag register and interrupt mask register
SOME FLAGS IN THE STATUS REGISTERS
The status registers can be stored into data memory and loaded from data memory, thereby allowing the 'C5X status to be saved and restored for subroutines. ST0 and ST1 each have an associated 1-level deep shadow register stack for automatic context-saving when an interrupt trap is taken. These registers are automatically restored upon a return from interrupt. The bit assignment details for ST0 and ST1 are given in Fig. 3.2.
Fig. 3.2 Status register 0 (ST0) bit assignment (bits 15-13: ARP; bits 8-0: DP)
The significance of the various bits of ST0 and ST1 is as follows:
ARP (Auxiliary Register Pointer): These bits select the AR to be used in indirect addressing. When the ARP is loaded, the previous ARP value is copied to the auxiliary register buffer (ARB) in ST1.
OV (Overflow) flag bit: This bit indicates an arithmetic operation overflow in the ALU.
OVM (Overflow Mode) bit: This bit enables/disables the accumulator overflow saturation mode in the ALU.
INTM (Interrupt Mode) bit: This bit globally masks or enables all interrupts. The INTM bit has no effect on the non-maskable RS and NMI interrupts.
DP (Data Memory Page Pointer) bits: These bits specify the address of the current data memory page. The DP bits are concatenated with the 7 LSBs of an instruction word to form a direct memory address of 16 bits.
ARB (Auxiliary Register Buffer): This 3-bit field holds the previous value contained in the ARP in ST0. Whenever the ARP is loaded, the previous ARP value is copied to the ARB, except when using the LST #0 instruction. When the ARB is loaded using the LST #1 instruction, the same value is also copied to the ARP. This is useful when restoring context (when not using the automatic context save) in a subroutine that modifies the current ARP.
CNF (On-chip RAM configuration control bit): This 1-bit field enables the on-chip dual-access RAM block 0 (DARAM B0) to be addressable in data memory space or program memory space. The CNF bit can be modified by the LST #1 instruction. If CNF is 0, the on-chip DARAM block 0 is mapped into data memory space. The CNF bit can be cleared by a reset or the CLRC CNF instruction. When CNF is 1, the on-chip DARAM block 0 is mapped into program memory space. The CNF bit can be set by the SETC CNF instruction.
TC (Test/control flag bit): This 1-bit flag stores the results of the ALU or parallel logic unit (PLU) test bit operations. The status of the TC bit determines whether the conditional branch, call and return instructions are to be executed.
SXM (Sign-extension mode bit): This 1-bit field enables/disables sign extension of an arithmetic operation. The SXM bit does not affect the operations of certain arithmetic or logical instructions; the ADDC, ADDS, SUBB or SUBS instruction suppresses sign extension, regardless of SXM.
C (Carry bit): This 1-bit field indicates an arithmetic operation carry or borrow in the ALU. The single-bit shift and rotate instructions affect the C bit.
HM (Hold mode bit): This 1-bit field determines whether the central processing unit (CPU) stops or continues execution when acknowledging an active HOLD signal.
XF (XF pin status bit): This 1-bit field determines the level of the external flag (XF) output pin.
PM (Product shift mode bits): This 2-bit field determines the product shifter (P-SCALER) mode and shift value for the PREG output into the ALU.
Table 3.2 gives the PM bits and the function performed.
Table 3.2 PM bits and the function
PM bits    Function (P-SCALER mode for PREG output)
The 'C5X has a total address range of 224K words x 16 bits. The memory space is divided into four individually selectable memory segments: 64K-word program memory space, 64K-word local data memory space, 64K-word I/O ports and 32K-word global data memory space.
3.13.1 Program ROM
All 'C5X DSPs carry a 16-bit on-chip maskable programmable ROM (see Fig. 3.1 for sizes). Some of the 'C5X DSPs have boot loader code resident in the on-chip ROM, and the other 'C5X DSPs offer the boot loader code as an option. This memory is used for booting program code from slower external ROM or EPROM to fast on-chip or external RAM. Once the custom program has been booted into RAM, the boot ROM space can be removed from program memory space by setting the MP/MC bit in the processor mode status register (PMST). The on-chip ROM is selected at reset by driving the MP/MC pin low. If the on-chip ROM is not selected, the 'C5X devices start execution from off-chip memory.
3.13.2 Data/Program Dual-Access RAM
All 'C5X DSPs carry a 1056-word x 16-bit on-chip dual-access RAM (DARAM). The DARAM is divided into three individually selectable memory blocks: 512-word data or program DARAM block B0, 512-word data DARAM block B1 and 32-word data DARAM block B2. The DARAM is primarily intended to store data values but, when needed, can be used to store programs as well. DARAM blocks B1 and B2 are always configured as data memory; however, DARAM block B0 can be configured by software as data or program memory. DARAM improves the operational speed of the 'C5X CPU. The CPU operates with a 4-deep pipeline. In this pipeline, the CPU reads data on the third stage and writes data on the fourth stage. Hence, for a given instruction sequence, the second instruction could be reading data at the same time the first instruction is writing data. The dual data buses (DB and DAB) allow the CPU to read from and write to DARAM in the same machine cycle.
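The block layout described above can be summarised in a few lines of Python. This is only an illustrative sketch: the names `DARAM_BLOCKS` and `daram_mapping` are hypothetical, and the sketch assumes that the single CNF status bit is what selects the space for block B0 (0 for data, 1 for program), with B1 and B2 fixed as data memory.

```python
# Sizes of the three DARAM blocks in 16-bit words, as given in the text.
DARAM_BLOCKS = {"B0": 512, "B1": 512, "B2": 32}


def daram_mapping(cnf: int) -> dict:
    """Return the memory space each DARAM block is mapped into.

    cnf is the value of the CNF status bit (assumed: 0 = data, 1 = program).
    Only block B0 is affected; B1 and B2 are always data memory.
    """
    if cnf not in (0, 1):
        raise ValueError("CNF is a 1-bit field")
    return {
        "B0": "program" if cnf == 1 else "data",
        "B1": "data",
        "B2": "data",
    }


# The three blocks add up to the 1056 words quoted for the DARAM.
total_words = sum(DARAM_BLOCKS.values())
```

With `cnf = 1`, only B0 moves to program space, so 544 words of DARAM remain available as data memory.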
PM = 01: Left-shifted 1 bit; LSB zero-filled
PM = 10: Left-shifted 4 bits; 4 LSBs zero-filled
PM = 11: Right-shifted 6 bits; sign extended; 6 LSBs lost. The product is always sign extended, regardless of the value of the SXM bit
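The effect of the product shift modes listed above can be sketched in Python as follows. This is a minimal model, not the device's actual datapath: the function name `pscaler` is hypothetical, PREG is treated as a 32-bit two's-complement value, and treating the remaining code PM = 00 as "no shift" is an assumption, since that row does not appear in the rows above.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def pscaler(preg: int, pm: int) -> int:
    """Return the 32-bit word passed to the ALU for product shift mode pm."""
    preg &= MASK32
    if pm == 0b00:
        return preg                              # assumed: no shift
    if pm == 0b01:
        return (preg << 1) & MASK32              # left 1 bit, LSB zero-filled
    if pm == 0b10:
        return (preg << 4) & MASK32              # left 4 bits, 4 LSBs zero-filled
    if pm == 0b11:
        # Arithmetic right shift: sign extended, the 6 LSBs are lost.
        return (to_signed32(preg) >> 6) & MASK32
    raise ValueError("PM is a 2-bit field")
```

For instance, `pscaler(0xFFFFFFC0, 0b11)` treats the product as -64 and yields the 32-bit pattern for -1, which shows the sign extension on the right shift.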
The 'C5X architecture contains a considerable amount of on-chip memory to aid in system performance and integration: Program Read-Only Memory (ROM), Data/Program Dual-Access RAM (DARAM) and Data/Program Single-Access RAM (SARAM).
Almost all 'C5X DSPs carry a 16-bit on-chip single-access RAM (SARAM) of sizes varying from 1K to 9K x 16-bit words. Code can be booted from an off-chip ROM and then executed at full speed once it is loaded into the on-chip SARAM. The SARAM can be configured by software as data memory, as program memory or as a combination of both data memory and program memory. The SARAM is divided into 1K- and/or 2K-word blocks contiguous in address memory space. All 'C5X CPUs support parallel accesses to these SARAM blocks. However, one SARAM block can be accessed only once per machine cycle. In other words, the CPU can read from or write to one SARAM block while accessing another SARAM block.
On-Chip Memory Protection
The 'C5X DSPs have a maskable option that protects the contents of on-chip memories. When the related bit is set, no externally originating instruction can access the on-chip memory spaces.
Digital Signal Processors
3.14.5 Host Port Interface (HPI)
The HPI is available on the 'C57S and 'LC57. It is an 8-bit parallel I/O port that provides an interface to a host processor. Information is exchanged between the DSP and the host processor through on-chip memory that is accessible to both the host processor and the 'C57.
'C5X DSPs have the same CPU structure; however, they have different on-chip peripherals connected to their CPUs. The 'C5X DSP on-chip peripherals available are as follows: Clock Generator, Hardware Timer, Software-Programmable Wait-State Generators, Parallel I/O Ports, Host Port Interface (HPI), Serial Port, Buffered Serial Port (BSP) and Time-Division Multiplexed (TDM) Serial Port.
The clock generator consists of an internal oscillator and a phase-locked loop (PLL) circuit. The clock generator can be driven internally by a crystal resonator circuit or driven externally by a clock source. The PLL circuit can generate an internal CPU clock by multiplying the clock source by a specific factor, so a clock source with a frequency lower than that of the CPU can be used.
Three different kinds of serial ports are available: a general-purpose serial port, a time-division multiplexed (TDM) serial port and a buffered serial port (BSP). Each 'C5X contains at least one general-purpose, high-speed synchronous, full-duplexed serial port interface that provides direct communication with serial devices such as codecs, serial analog-to-digital (A/D) converters and other serial systems. The serial port is capable of operating at up to one-fourth the machine cycle rate (CLKOUT1). The serial port transmitter and receiver are double-buffered and individually controlled by maskable external interrupt signals. Data is framed either as bytes or as words. Five 16-bit registers (SPC, DRR, DXR, XSR, RSR) control and operate the serial port interface. The serial port control (SPC) register contains the mode control and status bits of the serial port. The data receive register (DRR) holds the incoming serial data, and the data transmit register (DXR) holds the outgoing serial data. The data transmit shift register (XSR) controls the shifting of the data from the DXR to the output pin. The data receive shift register (RSR) controls the storing of the data from the input pin to the DRR.
3.14.7 Buffered Serial Port (BSP)
The BSP is available on the 'C56 and 'C57 devices. It is a full-duplexed, double-buffered serial port and an autobuffering unit (ABU). The BSP provides flexibility on the data stream length. The ABU supports high-speed data transfer and reduces interrupt latencies. The BSP has a 2K-word buffer, which resides in the 'C5X internal memory. Five BSP registers control and operate the BSP.
A 16-bit hardware timer with a 4-bit prescaler is available. This programmable timer clocks at a rate that is between 1/2 and 1/32 of the machine cycle rate (CLKOUT1), depending upon the timer's divide-down ratio. The timer can be stopped, restarted, reset or disabled by specific status bits. Three registers control and operate the timer. The timer counter register (TIM) gives the current count of the timer. The timer period register (PRD) defines the period for the timer. The 16-bit timer control register (TCR) controls the operations of the timer.
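As a rough illustration of the timer arithmetic described above, the following Python sketch computes how often the timer would complete a period. It is a sketch under stated assumptions only: the function `timer_interrupt_rate` and its parameters are hypothetical, the divide-down ratio is taken directly as a number between 2 and 32 as quoted in the text, and `period_ticks` is simply the number of timer clock ticks per period (how that count is encoded in the PRD register is not specified here).

```python
def timer_interrupt_rate(clkout1_hz: float, divide_down: int,
                         period_ticks: int) -> float:
    """Return the rate (in Hz) at which the timer completes one period.

    clkout1_hz   : machine cycle rate (CLKOUT1) in Hz
    divide_down  : assumed overall divide-down ratio, 2..32 per the text
    period_ticks : assumed number of timer clock ticks in one period
    """
    if not 2 <= divide_down <= 32:
        raise ValueError("divide-down ratio must lie between 2 and 32")
    if period_ticks < 1:
        raise ValueError("period must be at least one timer tick")
    timer_clock_hz = clkout1_hz / divide_down   # rate at which the timer counts
    return timer_clock_hz / period_ticks
```

For example, with a 20 MHz machine cycle rate (a 50 ns cycle), a divide-down ratio of 2 and a period of 1000 ticks, the sketch gives a period rate of 10 kHz.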
Software-programmable wait-state logic is incorporated in 'C5X DSPs, allowing wait-state generation without any external hardware for interfacing with slower off-chip memory and I/O devices. This feature consists of multiple wait-state generating circuits. Each circuit is user-programmable to operate in different wait states for off-chip memory accesses.
TDM Serial Port
The TDM serial port available on the 'C50, 'C51 and 'C53 devices is a full-duplexed serial port that can be configured by software either for synchronous operations or for time-division multiplexed operations. The TDM serial port is commonly used in multiprocessor applications.
Parallel I/O Ports
A total of 64K I/O ports are available; 16 of these ports are memory-mapped in data memory space. Each of the I/O ports can be addressed by the IN or the OUT instruction. The memory-mapped I/O ports can be accessed with any instruction that reads from or writes to data memory. The IS signal indicates a read or write operation through an I/O port. The 'C5X can easily interface with external I/O devices through the I/O ports while requiring minimal off-chip address decoding circuits.
Four external interrupt lines (INT1-INT4) and five internal interrupts, a timer interrupt and four serial port interrupts, are user maskable. When an interrupt service routine (ISR) is executed, the contents of the program counter are saved on an 8-level hardware stack, and the contents of 11 specific CPU registers (ACC, ACCB, PREG, ST0, ST1, PMST, TREG0, TREG1, TREG2, INDX and ARCR) are saved in a one-deep stack (shadow registers). When a return from interrupt instruction is executed, the CPU registers' contents are restored.
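The context save described above can be modelled with a short Python sketch. It is a simplified illustration rather than a description of the actual hardware: the class `CpuContext` and its method names are hypothetical, only a single (non-nested) interrupt is modelled, and raising an error on stack overflow is a simplification chosen for the sketch.

```python
SHADOWED = ("ACC", "ACCB", "PREG", "ST0", "ST1", "PMST",
            "TREG0", "TREG1", "TREG2", "INDX", "ARCR")


class CpuContext:
    """Toy model of the program counter stack and the one-deep shadow registers."""

    STACK_DEPTH = 8

    def __init__(self):
        self.regs = {name: 0 for name in SHADOWED}
        self.pc = 0
        self.pc_stack = []      # models the 8-level hardware stack
        self.shadow = None      # one-deep copy of the 11 registers

    def enter_isr(self, vector: int) -> None:
        if len(self.pc_stack) >= self.STACK_DEPTH:
            raise OverflowError("hardware stack full (sketch simplification)")
        self.pc_stack.append(self.pc)       # save return address
        self.shadow = dict(self.regs)       # save all 11 registers at once
        self.pc = vector

    def return_from_isr(self) -> None:
        self.pc = self.pc_stack.pop()       # restore return address
        self.regs = dict(self.shadow)       # restore the saved registers


cpu = CpuContext()
cpu.regs["ACC"] = 5
cpu.pc = 0x100
cpu.enter_isr(0x02)
cpu.regs["ACC"] = 99        # the ISR is free to modify ACC
cpu.return_from_isr()
```

After the return, `cpu.regs["ACC"]` is 5 again and `cpu.pc` is back at 0x100. Because the shadow copy is only one deep, a second save before the first restore would overwrite it, so nested interrupts would need their own software context save.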
Digital Signal Processing
Fig. 5.1.5 shows the instruction execution with pipeline. Here observe that when I1 is in decode phase, next instruction I2 is fetched. Similarly when I2 goes to decode phase, next instruction I3 is fetched. Thus observe that all the functional units are executing four successive instructions at any time. On comparing Fig. 5.1.4 and Fig. 5.1.5 we observe that five instructions are executed in the same time if pipelining is used.
Fig. 5.1.5 Instruction execution with pipeline
5.1.6 Advanced Addressing Modes
Conventional microprocessors have addressing modes such as direct, indirect, immediate etc. The DSP processors have additional addressing modes because of which execution is fast. These addressing modes are discussed next.
Short Immediate Addressing
The operand is specified using a short constant. This short constant becomes the part of a single word instruction. In TMS320C5X series of DSP processors, an 8-bit operand can be specified as one of the operands in single word instructions such as add, subtract, AND, OR, XOR etc.
Short Direct Addressing
The lower order address of the operand is specified in the single word instruction. In TMS320CXX DSP processors, the lower 7 bits of the address are specified as the part of the instruction. The higher 9 bits of the address are stored in the data page pointer. Each such data page consists of 128 words.
Memory-Mapped Addressing
The CPU and I/O registers are accessed as memory locations. These registers are mapped in the starting page or final page of the memory space. In TMS320C5x, page 0 corresponds to CPU and I/O registers.
5.2.1 Features of TMS320C5x Processors
This family of processors has the following features:
1) Powerful 16 bit CPU.
2) 20, 25, 35 and 50 ns single cycle instruction execution time for 5 V operation. 25, 40 and 50 ns single cycle instruction execution time for 3 V operation.
3) 16 x 16 bit multiply/add operations can be performed in single cycle.
4) 224K x 16 bit maximum addressable external memory space. This space is divided into 64K program, 64K data, 64K I/O and 32K global memories.
5) Up to 32K x 16 bit single access on-chip program ROM.
6) Up to 9K x 16 bit single access on-chip program/data RAM (SARAM).
7) 1K x 16 bit dual access on-chip program/data RAM (DARAM).
8) Full duplex synchronous serial port for coder/decoder interface.
9) Time Division Multiplexed (TDM) serial port.
10) It has hardware/software wait state generation capability.
11) On-chip timer for control operations.
12) Repeat instructions for efficient use of program space.
13) It has buffered serial port.
14) It has host interface port.
15) It has multiple phase locked loop (PLL) clocking options, i.e. x1, x2, x3, x4 and x9.
16) Block move facility for data/program management operations.
17) Scan based emulation.
18) Boundary scan.
19) This family is manufactured in high performance static CMOS technology.
20) It has low power dissipation and power down modes.
21) IEEE standard Test Access Port (JTAG).
22) The processors are available in five packaging options.
The TMS320C5x family of processors is manufactured with static CMOS technology. It has advanced Harvard architecture. These processors can execute up to 50 million instructions per second (MIPS).
5.2.2 Architecture of TMS320C5x Processors
Fig. 5.2.1 shows the functional block diagram of DSP processors.
Fig. 5.2.1 Functional block diagram or architecture of TMS320C5x
The meanings of various symbols used in the above block diagram are given in Table 5.2.1.