Very-long-instruction-word (VLIW) processing is an important approach for substantially increasing the number of
instructions that are processed per cycle (Texas Instruments, 1999). A very-long-instruction word is essentially a
concatenation of several short instructions and requires multiple execution units, running in parallel, to carry out
the instructions in a single cycle. The principles of VLIW architecture and data flow for the TMS320C62x family of
advanced, fixed-point DSP processors are illustrated in Figure 12.13. The CPU contains two data paths and eight
independent execution units, organized in two sets: (L1, S1, M1 and D1) and (L2, S2, M2 and D2). In this case, each
short instruction is 32 bits wide and eight of these are linked together to form a very long instruction word packet
which may be executed in parallel.
The VLIW processing starts when the CPU fetches an instruction packet (eight 32-bit instructions) from the on-chip
program memory. The eight instructions in the fetch packet are formed into an execute packet, if they can be
executed in parallel, and then dispatched to the eight execution units as appropriate. The next 256-bit instruction
packet is fetched from the program memory while the execute packet is decoded and executed. If the eight
instructions in a fetch packet are not executable in parallel (for example if the eight instructions were all multiply-
accumulate instructions, then only two can be performed in a cycle because there are only two multipliers
available), then several execute packets will be formed and dispatched to the execution units, one at a time. A fetch
packet is always 256 bits wide (eight instructions), but an execute packet may vary between one and eight
instructions.
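The packing rule above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: each instruction is tagged only with the type of unit it needs (L, S, M or D), two units of each type are available, and instructions are grouped in program order, with a new execute packet starting whenever no unit of the required type is free. On the actual devices this grouping is worked out at compile time and encoded in the instructions themselves, so the function name and the resource model here are illustrative only.

```python
from collections import Counter

# Assumed resource model: two units of each type (L1/L2, S1/S2, M1/M2, D1/D2).
UNITS_PER_TYPE = {"L": 2, "S": 2, "M": 2, "D": 2}


def form_execute_packets(fetch_packet):
    """Split one fetch packet (a list of unit types such as ["M", "L", ...])
    into execute packets, keeping program order and starting a new execute
    packet whenever the next instruction has no free unit of its type."""
    assert len(fetch_packet) == 8, "a fetch packet holds eight instructions"
    packets = []
    current = []
    in_use = Counter()
    for unit in fetch_packet:
        if in_use[unit] == UNITS_PER_TYPE[unit]:
            # No free unit of this type in this cycle: dispatch what we have.
            packets.append(current)
            current = []
            in_use = Counter()
        current.append(unit)
        in_use[unit] += 1
    packets.append(current)
    return packets


# Eight multiply-type instructions: only two multipliers, so four execute packets.
print(form_execute_packets(["M"] * 8))
# A mix that fits the eight units exactly: a single execute packet.
print(form_execute_packets(["L", "S", "M", "D", "L", "S", "M", "D"]))
```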
The VLIW architecture is clearly designed to support instruction-level parallelism. This architecture, together with
fast clock speeds (typically 200 MHz), leads to very high-performance DSP processors. In the TMS320C62x, the
instruction parallelism is scheduled at compile time. However, the computational efficiency of such processors falls
if the instructions cannot be executed in parallel.
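The effect on throughput can be checked with simple arithmetic. The sketch below assumes, for illustration, that every execute packet completes in one cycle (ignoring pipeline stalls and multi-cycle instructions); the function names are hypothetical. At 200 MHz with eight instructions per cycle the peak is 1600 MIPS, but a stream of two-instruction execute packets, such as the eight-multiply fetch packet described above, delivers only a quarter of that.

```python
def peak_mips(clock_mhz, instructions_per_cycle=8):
    """Peak rate in MIPS if every cycle issues a full eight-instruction packet."""
    return clock_mhz * instructions_per_cycle


def effective_mips(clock_mhz, execute_packet_sizes):
    """Average rate in MIPS when each execute packet takes exactly one cycle."""
    instructions = sum(execute_packet_sizes)
    cycles = len(execute_packet_sizes)
    return clock_mhz * instructions / cycles


print(peak_mips(200))                      # 1600
print(effective_mips(200, [2, 2, 2, 2]))   # 400.0
```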
2) Explain the Von Neumann architecture, the Harvard architecture and the modified Harvard architecture in detail. How is
the architecture of an advanced digital signal processor different from the modified Harvard architecture?
A modified Harvard architecture machine is very much like a Harvard architecture machine, but it relaxes the strict
separation between instruction and data while still letting the CPU concurrently access two (or more) memory buses.
DSP processors support various addressing modes for executing instructions and accessing data. An efficient way
of accessing data (signal samples and filter coefficients) can significantly improve implementation performance, and
flexible ways of accessing data make programs easier to write. Data addressing modes therefore enhance DSP
implementation. The addressing modes of DSP processors are:
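Circular (modulo) addressing is one mode commonly provided by DSP processors for stepping through a buffer of signal samples: the address pointer wraps back to the start of the buffer instead of running past its end, so no instructions are spent testing for the boundary. The class below is a hypothetical software model of what the address generator does in hardware, not a description of any particular device.

```python
class CircularBuffer:
    """Software model of circular (modulo) addressing: the write pointer
    wraps to the start of the buffer instead of running past the end,
    as a DSP address generator would do in hardware."""

    def __init__(self, length):
        self.data = [0.0] * length
        self.ptr = 0  # index of the slot that will be overwritten next

    def push(self, sample):
        self.data[self.ptr] = sample
        self.ptr = (self.ptr + 1) % len(self.data)  # modulo address update

    def latest(self, k):
        """Return the sample written k steps ago (k = 0 is the newest)."""
        return self.data[(self.ptr - 1 - k) % len(self.data)]


buf = CircularBuffer(4)
for x in [1.0, 2.0, 3.0, 4.0, 5.0]:
    buf.push(x)
print(buf.data)       # [5.0, 2.0, 3.0, 4.0]
print(buf.latest(0))  # 5.0
print(buf.latest(3))  # 2.0
```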
What is the need for a DSP processor when high-speed processors are available?
A DSP is a specialized microprocessor whose architecture is optimized for the fast operational needs of
digital signal processing.
The goal of DSP processors is usually to measure, filter or compress continuous real-world analog
signals. Most general-purpose microprocessors can also execute digital signal processing algorithms successfully.
However, a general-purpose microprocessor does not process signals very efficiently. Add and subtract operations are
performed quite simply by general-purpose microprocessors in a single or very few clock cycles. The multiply and
divide operations are more complex: a digital multiply operation consists of a series of shift and add operations,
and division is more complex still.
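The shift-and-add method mentioned above can be illustrated as follows. This is a generic sketch for non-negative integers, not the microcode of any specific processor; it needs one loop iteration per multiplier bit, which is why a multiply takes many cycles on a processor without a hardware multiplier.

```python
def shift_add_multiply(a, b):
    """Multiply two non-negative integers using only shifts and adds,
    the way a processor without a hardware multiplier would."""
    result = 0
    while b:
        if b & 1:          # lowest bit of the multiplier is set
            result += a    # add the shifted multiplicand
        a <<= 1            # shift multiplicand left (multiply by 2)
        b >>= 1            # move on to the next multiplier bit
    return result


print(shift_add_multiply(13, 11))  # 143
```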
What are the salient features of the TMS320C67XX family of DSP processors?
DSP processor :
Instruction cycle - Instructions are executed in a single clock cycle.
Instruction execution - Parallel execution is possible.
Suitable for - Array processing operations.
Addressing mode - Direct and indirect addressing modes.
Computational units - Three separate computational units: ALU, MAC and shifter.
Address generation - Addresses are generated jointly by the data address generators (DAGs) and the program sequencer.
Application - Speech processing, audio processing, signal processing
Microprocessor :
Instruction cycle - Multiple clock cycles are required to execute one instruction.
Instruction execution - Instructions are always executed sequentially.
Suitable for - General-purpose processing.
Addressing mode - Direct, indirect, register, register indirect, immediate, etc.
Computational units - Only one main unit: the ALU.
Address generation - The program counter is incremented sequentially to generate an address.
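The single-cycle MAC unit listed for the DSP processor matters because the core of most filtering is a chain of multiply-accumulate operations. As a minimal sketch (the function name and the newest-first sample ordering are assumptions), one output of an FIR filter, y[n] = sum over k of h[k]·x[n-k], is a loop with one MAC per tap; a DSP processor can typically perform each MAC in one cycle, while a general-purpose microprocessor needs several cycles per multiply.

```python
def fir_output(coeffs, samples):
    """One FIR filter output y[n] = sum_k h[k] * x[n - k].
    `samples` holds x[n], x[n-1], ..., newest first (an assumed layout)."""
    acc = 0.0
    for h, x in zip(coeffs, samples):
        acc += h * x   # one multiply-accumulate (MAC) per filter tap
    return acc


# Four-tap moving-average filter.
print(fir_output([0.25, 0.25, 0.25, 0.25], [4.0, 8.0, 0.0, 4.0]))  # 4.0
```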
Explain the application of DSP processors in the following fields: 1. Radar signal processing 2. Speech recognition
Speech Recognition:
Voice recognition involves entering information into a computer by voice, with the computer listening to and
recognizing human speech. The speech recognition system operates in one of two modes. In the training mode, the
user trains the system to recognize his or her voice by speaking each word that is to be recognized into a microphone. The
system digitizes each word, creates a template of it and stores this template in its memory. In the
recognition mode, each spoken word is again digitized and its template is compared with the templates in
memory. When a match occurs, the word has been recognized and the system informs the user or takes some action.
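The text does not say how a spoken word is compared with the stored templates. One classic choice for isolated-word template matching is dynamic time warping (DTW), which tolerates words spoken faster or slower than the stored template. The sketch below uses that swapped-in technique on toy one-dimensional feature sequences; the template values, the threshold and the function names are all hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def recognize(word, templates, threshold):
    """Recognition mode: return the label of the closest stored template,
    or None if no template is within `threshold` (no match)."""
    best_label, best_dist = None, float("inf")
    for label, template in templates.items():
        dist = dtw_distance(word, template)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None


# Hypothetical templates stored during training mode (toy 1-D features).
templates = {
    "on":  [0.0, 1.0, 2.0, 1.0, 0.0],
    "off": [0.0, 2.0, 2.0, 0.0, 0.0, 0.0],
}
spoken = [0.0, 1.0, 1.0, 2.0, 1.0, 0.0]  # "on", spoken more slowly
print(recognize(spoken, templates, threshold=1.0))  # on
```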
The block diagram is shown in the figure.
This increases their independence by enabling them to perform simple tasks such as turning lights, a radio or a
TV on and off.
The figure above is the block diagram of the C67x DSP. The C6000 devices come with program memory, which on some
devices can be used as a program cache. The devices also have varying sizes of data memory. Peripherals such as a
direct memory access (DMA) controller, power-down logic and an external memory interface (EMIF) usually come
with the CPU, while peripherals such as serial ports and host ports are available only on certain devices.
Central processing unit (CPU)
The CPU contains:
Program fetch unit.
Instruction dispatch unit.
Instruction decode unit.
Two data paths, each with four functional units.
32 32-bit registers.
Control logic.
Test, emulation and interrupt logic.
The program fetch, instruction dispatch and instruction decode units can deliver up to eight 32-bit instructions to the
functional units every CPU clock cycle. Instructions are processed in each of the two data paths, each of which
contains four functional units and sixteen 32-bit general-purpose registers. A control register file provides the means to
configure and control various processor operations.
Internal Memory.
The C67x DSP has a 32-bit, byte-addressable address space. Internal memory is organized in separate data and program
spaces. When off-chip memory is used, these spaces are unified on most devices into a single memory space via the
external memory interface (EMIF).
Memory and peripheral options.
A variety of memory and peripherals options are available for the C6000 platform.
Large on-chip RAM, up to 7 Mbits.
Program cache.
Two-level cache.
32-bit external memory interface that supports SDRAM, SBSRAM, SRAM and other asynchronous memories for a broad
range of external memory requirements and maximum system performance.
DMA controller transfers data between address ranges in the memory map without intervention by the CPU.
EDMA controller performs the same functions as the DMA controller.
HPI (host-port interface) is a parallel port through which a host processor can directly access the CPU’s memory space.
Expansion bus is a replacement for the HPI, as well as an expansion of the EMIF.
McBSP (multichannel buffered serial port) is based on the standard serial port interface found on the TMS320C2000 devices.
Timers: the C6000 devices have two 32-bit general-purpose timers, used to:
Time events.
Count events.
Generate pulses.
Interrupt the CPU.
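The timer functions listed above can be modelled with a counter that is compared against a period value. This is a simplified sketch under an assumed behaviour (the counter resets and an event fires when it reaches the period); it ignores the register layout and exact reload timing of the real C6000 timers, and the function name is hypothetical.

```python
def run_timer(period, clocks):
    """Count input clocks; each time the counter reaches `period`, reset it
    and record an event (a pulse on the timer output and/or a CPU interrupt).
    Returns the clock numbers at which events occurred."""
    counter = 0
    events = []
    for tick in range(1, clocks + 1):
        counter += 1
        if counter == period:
            events.append(tick)  # generate pulse / interrupt the CPU
            counter = 0          # start timing the next period
    return events


print(run_timer(period=4, clocks=10))  # [4, 8]
```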