VLIW architecture explained in detail

1) Explain VLIW architecture in detail.
Very-long-instruction-word (VLIW) processing is an important approach for substantially increasing the number of
instructions that are processed per cycle (Texas Instruments, 1999). A very-long-instruction word is essentially a
concatenation of several short instructions and requires multiple execution units, running in parallel, to carry out
the instructions in a single cycle. The principles of VLIW architecture and data flow for the TMS320C62x family of
advanced fixed-point DSP processors are illustrated in Figure 12.13. The CPU contains two data paths and eight
independent execution units, organized in two sets: (L1, S1, M1 and D1) and (L2, S2, M2 and D2). In this case, each
short instruction is 32 bits wide, and eight of these are linked together to form a very-long-instruction-word packet,
which may be executed in parallel.
VLIW processing starts when the CPU fetches an instruction packet (eight 32-bit instructions) from the on-chip
program memory. The eight instructions in the fetch packet are formed into an execute packet, if they can be
executed in parallel, and then dispatched to the eight execution units as appropriate. The next 256-bit instruction
packet is fetched from the program memory while the execute packet is decoded and executed. If the eight
instructions in a fetch packet are not executable in parallel (for example, if the eight instructions were all multiply
instructions, only two could be performed in a cycle because only two multipliers are available), then several
execute packets are formed and dispatched to the execution units, one at a time. A fetch packet is always 256 bits
wide (eight instructions), but an execute packet may contain anywhere from one to eight instructions.
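The splitting of a fetch packet into execute packets can be sketched as follows. This is a simplified model under stated assumptions: each instruction needs one class of functional unit, there are two units per class (L, S, M, D), instructions are grouped in program order, and data dependencies are ignored. The names `unit_class` and `form_execute_packets` are hypothetical. On the real C62x the grouping is fixed at compile time rather than discovered at run time.

```c
#include <stddef.h>

/* Hypothetical model: each instruction needs one class of functional unit.
 * The C62x has two units of each class (L1/L2, S1/S2, M1/M2, D1/D2). */
enum unit_class { UNIT_L, UNIT_S, UNIT_M, UNIT_D, UNIT_CLASSES };

#define UNITS_PER_CLASS 2
#define FETCH_PACKET_SIZE 8

/* Split one fetch packet into execute packets, scanning in program order
 * (no reordering). A new execute packet starts whenever the next
 * instruction would need a third unit of the same class.
 * packet_sizes[k] receives the number of instructions in execute packet k.
 * Returns the number of execute packets formed. */
size_t form_execute_packets(const enum unit_class fetch[FETCH_PACKET_SIZE],
                            size_t packet_sizes[FETCH_PACKET_SIZE])
{
    size_t used[UNIT_CLASSES] = {0};
    size_t packets = 0;
    size_t current = 0;

    for (size_t i = 0; i < FETCH_PACKET_SIZE; i++) {
        enum unit_class u = fetch[i];
        if (used[u] == UNITS_PER_CLASS) {
            /* No free unit of this class: close the current execute packet. */
            packet_sizes[packets++] = current;
            current = 0;
            for (int c = 0; c < UNIT_CLASSES; c++)
                used[c] = 0;
        }
        used[u]++;
        current++;
    }
    packet_sizes[packets++] = current; /* the last packet is never empty */
    return packets;
}
```

With eight multiply instructions, the model produces four execute packets of two instructions each, matching the example in the text; a fetch packet using each unit class exactly twice fits in a single execute packet.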
The VLIW architecture is designed to exploit instruction-level parallelism. This architecture, together with
fast clock speeds (typically 200 MHz), leads to very high-performance DSP processors. In the TMS320C62x,
instruction parallelism is scheduled at compile time. However, the computational efficiency of such processors falls
if the instructions cannot be executed in parallel.
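As a worked example, assuming the typical 200 MHz clock and all eight execution units busy every cycle, the peak instruction rate is as below; the second line assumes, for illustration only, that execute packets average two instructions, which shows how efficiency falls when instructions cannot run in parallel.

```latex
\text{peak rate} = 8\,\frac{\text{instructions}}{\text{cycle}} \times 200\times 10^{6}\,\frac{\text{cycles}}{\text{s}} = 1.6\times 10^{9}\,\frac{\text{instructions}}{\text{s}} = 1600\ \text{MIPS}

\text{sustained rate (2 instructions per execute packet)} = 2 \times 200\times 10^{6} = 400\ \text{MIPS}
```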
2) Explain Von Neumann architecture, Harvard architecture and modified Harvard architecture in detail. How is the
architecture of an advanced digital signal processor different from the modified Harvard architecture?
A modified Harvard architecture machine is very much like a Harvard architecture machine, but it relaxes the strict
separation between instruction and data while still letting the CPU concurrently access
two (or more) memory buses.
VON NEUMANN ARCHITECTURE | HARVARD ARCHITECTURE
Based on the stored-program computer concept. | Based on the Harvard Mark I relay-based computer model.
The same physical memory address space is used for instructions and data. | Separate physical memory address spaces are used for instructions and data.
A common bus is used for data and instruction transfer. | Separate buses are used for transferring data and instructions.
Two clock cycles are required to execute a single instruction (one to fetch the instruction, one to access data). | An instruction is executed in a single cycle.
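The cycle-count row of the comparison can be illustrated with a deliberately simplified bus model. Assumptions (not from the notes): every instruction needs one instruction fetch plus a given number of data transfers, each bus completes one transfer per clock cycle, and pipelining is ignored. The function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified model: instruction i needs one instruction
 * fetch plus data_accesses[i] data transfers, and each memory bus can
 * complete one transfer per clock cycle. Pipelining is ignored. */

/* Von Neumann: one shared bus, so fetch and data transfers are serialized. */
unsigned long von_neumann_cycles(const unsigned data_accesses[], size_t n)
{
    unsigned long cycles = 0;
    for (size_t i = 0; i < n; i++)
        cycles += 1u + data_accesses[i];
    return cycles;
}

/* Harvard: separate program and data buses, so the instruction fetch
 * overlaps the data transfers; the busier bus sets the time. */
unsigned long harvard_cycles(const unsigned data_accesses[], size_t n)
{
    unsigned long cycles = 0;
    for (size_t i = 0; i < n; i++)
        cycles += data_accesses[i] > 1u ? data_accesses[i] : 1u;
    return cycles;
}
```

For four instructions that each read one data operand, the model gives 8 cycles on the shared bus and 4 cycles with separate buses, i.e. two cycles versus one cycle per instruction.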
Explain how higher throughput is obtained in a DSP using VLIW architecture.

The performance gains that can be achieved with VLIW architecture depend on the degree of parallelism in the
algorithm selected for a DSP application and on the number of functional units. Throughput is higher only if the
algorithm involves the execution of independent operations, because only independent operations can be scheduled
on different functional units in the same cycle.
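The difference between dependent and independent operations can be seen in a simple summation. In the first function below every addition needs the previous result, forming one chain; the second splits the work into two independent chains that a VLIW compiler could place on two units in the same cycle. This is a sketch: whether the compiler actually does so depends on the toolchain, and the function names are hypothetical.

```c
#include <stddef.h>

/* One accumulator: every addition depends on the previous result, so
 * the additions form a single chain that cannot run in parallel. */
long sum_dependent(const short x[], size_t n)
{
    long acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Two accumulators: even- and odd-indexed additions are independent,
 * so they could be issued to two functional units in the same cycle. */
long sum_two_chains(const short x[], size_t n)
{
    long acc0 = 0, acc1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        acc0 += x[i];     /* chain 0 */
        acc1 += x[i + 1]; /* chain 1, independent of chain 0 */
    }
    if (i < n)            /* odd length: one element left over */
        acc0 += x[i];
    return acc0 + acc1;
}
```

Both functions return the same result (integer addition is associative); only the dependency structure, and therefore the available parallelism, differs.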
Write the different features of DSP processors.

Architectural features, on-chip memory, special instructions, I/O capability, execution speed, type of arithmetic
(fixed or floating point) and word length.
Explain the different addressing modes of the TMS320C67XX DSP processor.
DSP processors support various addressing modes for executing instructions and accessing data. Efficient access to
data (signal samples and filter coefficients) can significantly improve implementation performance, and flexible ways
of accessing data make programs easier to write. The addressing modes of DSP processors are:
• Immediate Addressing Mode

The operand value is known explicitly and is included as part of the instruction.
• Register Addressing Mode
The operand is held in a processor register (reg); the data is referenced through that register.
• Direct Addressing Mode
The operand is in a memory location (mem); the data is referenced by giving its memory address directly in the
instruction.
• Indirect Addressing Mode
The operand's memory location is variable: its address is held in a register (Addrreg), and the operand is accessed
using Addrreg as a pointer.
• Special Addressing Modes
These modes support efficient implementation of real-time digital signal processing and FFT algorithms.
i. Circular Addressing Mode: A circular buffer allows a continuous stream of incoming data samples to be handled;
once the end of the buffer is reached, new samples are written at the beginning again.
ii. Bit-Reversed Addressing Mode: The address generation unit can generate bit-reversed indices, which are needed
to reorder data in FFT algorithms.
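Both special addressing modes can be emulated in software, which shows what the address generation hardware does for free. A minimal sketch, assuming a hypothetical 8-sample buffer; the names `circ_buf`, `circ_put` and `bit_reverse` are illustrative, not a TI API. The modulo operation stands in for the hardware wrap-around, and `bit_reverse` produces the index reordering used in an FFT.

```c
#include <stddef.h>

#define BUF_LEN 8 /* hypothetical buffer length */

typedef struct {
    short data[BUF_LEN];
    size_t head; /* index where the next sample is written */
} circ_buf;

/* Circular addressing: after the last slot, wrap back to slot 0.
 * DSP hardware performs this wrap in the address generator; here it is
 * done with a modulo operation. */
void circ_put(circ_buf *b, short sample)
{
    b->data[b->head] = sample;
    b->head = (b->head + 1) % BUF_LEN;
}

/* Bit-reversed addressing: reverse the lowest 'bits' bits of index i.
 * For an 8-point FFT (bits = 3), index 1 (001) maps to 4 (100). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned k = 0; k < bits; k++) {
        r = (r << 1) | (i & 1u); /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

After ten samples are written into the 8-slot buffer, samples 9 and 10 have overwritten slots 0 and 1, which is exactly the "start again at the beginning" behaviour described above.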
What are the differences between fixed-point and floating-point processors?

Fixed point floating-point
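The basic difference is that a fixed-point processor stores fractions as scaled integers and the programmer must manage the scaling, whereas a floating-point processor handles the exponent in hardware. A minimal sketch of fixed-point arithmetic, assuming the common Q15 format (a 16-bit signed integer v represents v / 32768); the type and function names are hypothetical.

```c
#include <stdint.h>

/* Q15 format (assumed for illustration): a 16-bit signed integer v
 * represents the fraction v / 32768, covering the range [-1, 1). */
typedef int16_t q15_t;

/* Fixed-point multiply: the 16 x 16 product is an exact 32-bit Q30
 * value; dividing by 2^15 brings it back to Q15 (truncating toward zero). */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t p = (int32_t)a * (int32_t)b; /* Q30 product */
    if (p == 0x40000000)                 /* only (-1) x (-1): +1.0 does not fit */
        return INT16_MAX;                /* saturate to the largest Q15 value */
    return (q15_t)(p / 32768);
}

/* Conversions for comparison with floating point.
 * f must lie in [-1, 1) so that the result fits in 16 bits. */
q15_t q15_from_float(float f) { return (q15_t)(f * 32768.0f); }

float q15_to_float(q15_t q) { return (float)q / 32768.0f; }
```

On a floating-point processor the same operation is simply `0.5f * 0.5f`; in Q15 the programmer must rescale the product and handle the single overflow case, which is why fixed-point code is harder to write but the hardware is simpler.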
What is the need for a DSP processor when high-speed processors are available?
A DSP is a specialized microprocessor whose architecture is optimized for the fast operational needs of digital
signal processing.
The goal of DSP processors is usually to measure, filter or compress continuous real-world analog signals. Most
general-purpose microprocessors can also execute digital signal processing algorithms, but they cannot process
signals as efficiently. Add and subtract operations are performed quite simply by general-purpose microprocessors
in a single clock cycle or very few clock cycles, but multiply and divide operations are more complex: a digital
multiply operation consists of a series of shift and add operations, and division is more complex still. A DSP
processor, by contrast, provides a dedicated hardware multiplier that completes a multiplication in a single cycle,
which suits the repeated multiply-accumulate operations of DSP algorithms.
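The "series of shift and add operations" can be made concrete with a short sketch of multiplication on a processor that has no hardware multiplier. The function name is hypothetical; the loop runs once per bit of the multiplier, so a 16-bit multiply can take up to 16 add-and-shift steps instead of one hardware cycle.

```c
#include <stdint.h>

/* Shift-and-add multiplication of two unsigned 16-bit numbers.
 * Each iteration examines one bit of b: if it is 1, the (shifted)
 * multiplicand is added to the product. Up to 16 iterations. */
uint32_t shift_add_mul(uint16_t a, uint16_t b)
{
    uint32_t product = 0;
    uint32_t addend = a; /* multiplicand, shifted left each step */
    while (b != 0) {
        if (b & 1u)
            product += addend; /* add when the current bit of b is 1 */
        addend <<= 1;          /* next bit has twice the weight */
        b >>= 1;
    }
    return product;
}
```

For example, `shift_add_mul(13, 11)` adds 13, 26 and 104 (the bits of 11 are 1011) to give 143.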
What are the salient features of the TMS320C67XX family of DSP processors?
Features of the TMS320C67x processor:

1. The advanced VLIW CPU of the TMS320C67x has 32 general-purpose registers, each 32 bits wide.
2. It has eight functional units: two multipliers and six ALUs.
3. It is supported by a C compiler and an assembly optimizer, which simplify assembly-language programming, and by
a Windows-based debugger interface for visibility into code execution.
4. It can execute up to eight instructions per cycle. Highly effective RISC-like code can be developed.
5. Instruction packing gives code size equivalence for eight instructions executed serially or in parallel.
6. It provides conditional execution of all instructions, which reduces branching. Code can be executed
independently on the functional units.
7. It provides support for key arithmetic operations. All the common operations required for control and data
manipulation applications are supported.
8. It provides hardware support for 32-bit single-precision and 64-bit double-precision IEEE floating-point
operations. 32 × 32-bit integer multiplication can be performed. It supports floating-point addition and
subtraction and a mixed-precision multiply instruction.
Compare DSP processor and microprocessor
DSP processor:
• Instruction cycle - An instruction is executed in a single clock cycle.
• Instruction execution - Parallel execution is possible.
• Suitable for - Array processing operations.
• Addressing modes - Direct and indirect addressing modes.
• Computational units - Three separate computational units: ALU, MAC and shifter.
• Address generation - Addresses are generated jointly by the data address generators (DAGs) and the program sequencer.
• Application - Speech processing, audio processing, signal processing.
Microprocessor:
• Instruction cycle - Multiple clock cycles are required to execute one instruction.
• Instruction execution - Instructions are always executed sequentially.
• Suitable for - General-purpose processing.
• Addressing modes - Direct, indirect, register, register indirect, immediate, etc.
• Computational units - Only one main unit: the ALU.
• Address generation - The program counter is incremented sequentially to generate an address.
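The MAC unit listed for the DSP processor exists because filtering is dominated by repeated multiply-accumulate steps. A minimal FIR filter sketch follows; the function name and the sample-history layout are assumptions for illustration. Each loop iteration is one MAC, which a DSP's MAC unit can complete per cycle, while a processor with only an ALU needs a separate (and slower) multiply and add.

```c
#include <stddef.h>
#include <stdint.h>

/* FIR filter output y[n] = sum_{k=0}^{taps-1} h[k] * x[n-k], written as a
 * loop of multiply-accumulate (MAC) steps. 'x_hist' holds the most recent
 * samples, with x_hist[0] = x[n], x_hist[1] = x[n-1], and so on. */
int64_t fir_output(const int16_t h[], const int16_t x_hist[], size_t taps)
{
    int64_t acc = 0; /* wide accumulator so repeated sums do not overflow */
    for (size_t k = 0; k < taps; k++)
        acc += (int32_t)h[k] * x_hist[k]; /* one MAC step */
    return acc;
}
```

With coefficients {1, 2, 3} and recent samples {4, 5, 6}, the output is 1·4 + 2·5 + 3·6 = 32.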
Explain the application of the DSP processor in the following fields: 1. Radar signal processing 2. Speech recognition
Speech Recognition:
Speech recognition involves entering information into a computer by voice, with the computer listening to and
recognizing human speech. The speech recognition system operates in one of two modes. In the training mode, the
user trains the system to recognize his or her voice by speaking each word in the vocabulary into a microphone. The
system digitizes each word, creates a template of it and stores this in its memory. In the recognition mode, each
spoken word is again digitized and its template is compared with the templates in memory. When a match occurs,
the word has been recognized and the system informs the user or takes some action.
The Block diagram is shown in the figure.
For people with disabilities, such a system increases independence by enabling them to perform simple tasks such
as turning lights, a radio or a TV on and off.
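The recognition mode described above can be sketched as nearest-template matching. This is a toy under stated assumptions: each template is a short fixed-length feature vector (`FEATURES` is a hypothetical length), similarity is the squared Euclidean distance, and a threshold decides whether any match is close enough. Practical recognizers use richer features and matching methods such as dynamic time warping; `recognize` and `distance_sq` are hypothetical names.

```c
#include <stddef.h>
#include <float.h>

#define FEATURES 4 /* hypothetical template length */

/* Squared Euclidean distance between two templates. */
static float distance_sq(const float a[FEATURES], const float b[FEATURES])
{
    float d = 0.0f;
    for (size_t i = 0; i < FEATURES; i++) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

/* Recognition mode: return the index of the stored template closest to
 * the spoken word, or -1 if even the best match is farther than
 * 'threshold' (no word recognized). */
int recognize(const float stored[][FEATURES], size_t n_words,
              const float spoken[FEATURES], float threshold)
{
    int best = -1;
    float best_d = FLT_MAX;
    for (size_t w = 0; w < n_words; w++) {
        float d = distance_sq(stored[w], spoken);
        if (d < best_d) {
            best_d = d;
            best = (int)w;
        }
    }
    return (best_d <= threshold) ? best : -1;
}
```

In training mode the `stored` array would be filled with one template per vocabulary word; in recognition mode each new word is compared against all of them.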
Radar Signal Processing:
• Radar transmits radio signals toward distant objects and analyzes their reflections.

• Data gathered can include the position and movement of the object; radar can also identify the object
through its "signature", the distinct reflection it generates.
• There are many forms of radar, such as continuous-wave (CW), Doppler, ground-penetrating and synthetic-aperture
radar, and they are used in many applications, from air traffic control to weather prediction.
• In modern radar systems, digital signal processing (DSP) is used extensively. At the transmitter end, it
generates and shapes the transmission pulses and controls the antenna beam pattern, while at the receiver, DSP
performs many complex tasks, including STAP (space-time adaptive processing) for the removal of clutter, and
beamforming (electronic steering of the beam direction).
• The front end of the radar receiver is still often analog because of the high frequencies involved. Complex IF
signals are digitized with fast, often multichannel, ADCs. However, digital technology is coming closer to the
antenna. Fast digital interfaces may also be required to detect the antenna position or to control other hardware.
• The main task of a radar's signal processor is to make decisions. After a signal has been transmitted, the
receiver starts receiving return signals, with returns from nearer objects arriving first, because time of arrival
translates into target range.
• The signal processor divides the whole receive period into a raster of range bins and must decide, for each
range bin, whether it contains an object.
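The two steps above, converting time of arrival to range and deciding bin by bin, can be sketched as follows. The range formula R = c·t/2 is standard (the factor 2 accounts for the round trip). The detection rule is an assumption for simplicity: a fixed power threshold per bin, whereas practical radars usually adapt the threshold (for example with CFAR detection). Function names are hypothetical.

```c
#include <stddef.h>

#define SPEED_OF_LIGHT 299792458.0 /* m/s */

/* Echo delay t (seconds) -> target range R = c * t / 2. The factor 2
 * accounts for the round trip to the target and back. */
double range_from_delay(double t)
{
    return SPEED_OF_LIGHT * t / 2.0;
}

/* Decide, bin by bin, whether each range bin contains a target.
 * power[i] is the received power in bin i; a fixed threshold is assumed
 * here for simplicity. Writes 1 (target) or 0 (no target) into
 * detected[i] and returns the number of detections. */
size_t detect_targets(const double power[], size_t n_bins,
                      double threshold, int detected[])
{
    size_t count = 0;
    for (size_t i = 0; i < n_bins; i++) {
        detected[i] = power[i] > threshold;
        count += (size_t)detected[i];
    }
    return count;
}
```

For example, an echo arriving 1 µs after transmission corresponds to a target about 150 m away.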
Explain, with a neat diagram, the architecture of TMS320C67XX DSP processors.
Architecture of the TMS320C67x DSP processor with the help of a neat block diagram:
The figure above is the block diagram for the C67x DSP. The C6000 devices come with program memory, which on some
devices can be used as a program cache. The devices also have varying sizes of data memory. Peripherals such as a
direct memory access (DMA) controller, power-down logic and an external memory interface (EMIF) usually come
with the CPU, while peripherals such as serial ports and host ports are present only on certain devices.
Central processing unit (CPU)
The CPU contains:
Program fetch unit.
Instruction dispatch unit.
Instruction decode unit.
Two data paths, each with four functional units.
32 32-bit registers.
Control logic.
Test, emulation and interrupt logic.
The program fetch, instruction dispatch and instruction decode units can deliver up to eight 32-bit instructions to
the functional units every CPU clock cycle. Instructions are processed in each of the two data paths; each data path
contains four functional units and sixteen 32-bit general-purpose registers. A control register file provides the means
to configure and control various processor operations.
Internal Memory.
The C67x DSP has a 32-bit, byte-addressable address space. Internal memory is organized in separate data and
program spaces. When off-chip memory is used, these spaces are unified on most devices into a single memory space
via the external memory interface (EMIF).
Memory and peripheral options.
A variety of memory and peripherals options are available for the C6000 platform.
Large on-chip RAM, up to 7 Mbits.
Program cache.
Two-level cache.
32-bit external memory interface that supports SDRAM, SBSRAM, SRAM and other asynchronous memories, for a broad
range of external memory requirements and maximum system performance.
DMA controller transfers data between address ranges in the memory map without intervention by the CPU.
EDMA controller performs the same functions as the DMA controller.
HPI is a parallel port through which a host processor can directly access the CPU's memory space.
Expansion bus is a replacement for the HPI, as well as an expansion of the EMIF.
McBSP (multichannel buffered serial port) is based on the standard serial port interface found on the TMS320C2000 devices.
The C6000 devices have two 32-bit general-purpose timers, which can be used to:
Time events.
Count events.
Generate pulses.
Interrupt the CPU.
