You are on page 1of 47

Execution Control

Execution Control
Execution control refers to the rules or mechanisms used in a processor for determining the next instruction to execute. several features of execution control hardware looping, interrupt handling, stacks, and relative branch support.

Hardware Looping
DSP algorithms frequently involve the repetitive execution of a small number of instructions- socalled inner loops or kernels. problems associated with traditional approaches to repeated instruction execution:
a natural approach to looping uses a branch instruction to jump back to the start of the loop. most loops execute a fixed number of times, the processor must usually use a register to maintain the loop index, that is, the count of the number of times the processor has been through the loop.

DSP processors have evolved to avoid these problems via hardware looping, also known as zero-overhead looping. Hardware loops are special hardware control constructs that repeat either a single instruction or a group of instructions some number of times. The key difference between hardware loops and software loops is that hardware loops lose no time incrementing or decrementing counters, checking to see if the loop is finished, or branching back to the top of the loop. This can result in considerable savings.

The software loop takes roughly three times as long to execute, assuming that all instructions execute in one instruction cycle. In fact, branch instructions usually take several cycles to execute, so the hardware looping advantage is usually even larger.

Types of H/w looping

Single-Instruction Hardware Loops and Multi-Instruction Hardware Loops

Single-Instruction Hardware Loops

A single-instruction hardware loop repeats a single instruction some number of times. Eg. Texas Instruments TMS320C2x A multi-instruction hardware loop repeats a group of instructions some number of times. Eg. Analog Devices ADSP-21 xx

Single-Instruction Hardware Loops

Single-instruction hardware loop
executes one instruction repeatedly the instruction needs to be fetched from program memory only once. the program bus can be used for accessing memory for purposes other than fetching an instruction, e.g., for fetching data or coefficient values that are stored in program memory.

Multi-Instruction Hardware Loops

must refetch the instructions in the block of code being repeated each time the processor proceeds through the loop. the processor's program bus is not available to access other data.

Loop Repetition Count

A feature that differentiates processors' hardware looping capabilities is the minimum and maximum number of times a loop can be repeated. Almost all processors support a minimum repetition count of one, and 65,536 is a common upper limit. Frequently, the maximum number of repetitions is lower if the repetition count is specified using immediate data, simply because the processor may place restrictions on the size of immediate data words.

Loop Repetition Count

A hardware looping pitfall found on some processors (e.g., Motorola DSP5600x and Zoran ZR3800x) is the following: a loop count of zero causes the processor to repeat the loop the maximum number of times. While this is not a problem for loops whose size is fixed, it can be a problem if the program dynamically determines the repetition count at run time.

Loop Effects on Interrupt Latency

Single-instruction hardware loops disables interrupts for the duration of their execution.
system designers making use of both singleinstruction hardware loops and interrupts must carefully consider the maximum interrupt lockout time they can accept and code their singleinstruction loops accordingly. Alternative: use multi-instruction loops on single instructions and breaking the single-instruction loop up into a number of smaller loops

Nesting Depth
A nested loop is one loop placed within another. The most common approaches to hardware loop nesting are:
Directly nestable Partially nestable Software nestable Nonnestable

Hardware-Assisted Software Loops

An alternative to nesting hardware loops is to nest a single hardware loop within a software loop that uses specialized instructions for software looping. TMS320C3x and TMS320C4x support a decrement-and branch-if-not-zero instruction. While this instruction costs one instruction cycle on the TMS320C3x and TMS320C4x and three on the DSP16xx, the number of cycles is less than would be required to execute individual decrement, test, and branch instructions. This is a reasonable compromise between
the hardware cost required to support nested hardware loops the execution time cost of implementing loops entirely in software.

An external event that causes the processor to stop executing its current program and branch to a special block of code called an interrupt service routine. All DSP processors support interrupts and most use interrupts as their primary means of communicating with peripherals.

Interrupt Sources
On-chip peripherals : generate interrupts when certain conditions are met External interrupt lines : asserted by external circuitry to interrupt the processor Software interrupts : generated either under software control or due to a software-initiated operation

Interrupt Vectors
All processors associate a different memory address with each interrupt. These locations are called interrupt vectors. Processors that provide different locations for each interrupt are said to support vectored interrupts. Typical interrupt vectors are one or two words long and are located in low memory. The interrupt vector usually contains a branch or subroutine call instruction that causes the processor to begin execution of the interrupt service routine located elsewhere in memory. On some processors, interrupt vector locations are spaced apart by several words.

Interrupt Enables
All processors provide mechanisms to globally disable interrupts. On some processors, this may be via special interrupt enable and interrupt disable instructions, while on others it may involve writing to a special control register. Most processors also provide individual interrupt enables for each interrupt source. This allows the processor to selectively decide which interrupts are allowed at any given time.

Interrupt Priorities and Automatically Nestable Interrupts

Prioritized interrupts some interrupts have a higher priority than others.
the one with the higher priority will be serviced, the one with the lower priority must wait for the processor to finish servicing the higher priority interrupt.

When a processor allows a higher-priority interrupt to interrupt a lower priority interrupt that is already executing, we call this automatically nestable interrupts.

Interrupt Latency
The amount of time between an interrupt occurring and the processor doing something in response to it. Formal definition: The minimum time from the assertion of an external interrupt line to the execution of the first word of the interrupt vector that can be guaranteed under certain assumptions.

Interrupt Latency
The details of and assumptions used in this definition are as follows:
Most processors sample the status of external interrupt lines every instruction cycle. For an interrupt to be recognized as occurring in a given instruction cycle, the interrupt line must be asserted some amount of time prior to the start of the instruction cycle; this time is referred to as the set-up time. we assume that these setup time requirements are missed. This lengthens interrupt latency by one instruction cycle.

Interrupt Latency
Depending on the processor synchronization can add from one to three instruction cycles to the processor's interrupt latency. we assume the processor is in an interruptible state, which typically means that it is executing the shortest interruptible instruction possible. we assume the processor is in an interruptible state, which typically means that it is executing the shortest interruptible instruction possible.

Mechanisms for Reducing Interrupts

Some processors provide "autobuffering" on their serial ports. This feature allows the serial port to save its received data directly to the processor's memory without interrupting the processor. After a certain number of samples have been transferred, the serial port interrupts the processor. This is a specialized form of DMA.

Processor stack support is closely tied to execution control. For example, subroutine calls typically place their return address on the stack, while interrupts typically use the stack to save both return address and status information. DSP processors typically provide one of three kinds of stack support:
Shadow registers: Shadow registers are dedicated backup registers that hold the contents of key processor registers during interrupt processing. Hardware stack: A hardware stack holds selected registers during interrupt processing or subroutine calls. Software stack: A software stack is a conventional stack using the processor's main memory to store values during interrupt processing or subroutine calls.

Relative Branch Support

All DSP processors support branch or jump instructions as one of their most basic forms of execution control. In PC-relative branching, the address to which the processor is to branch is specified as an offset from the current address. PC-relative addressing is useful for creating positionindependent programs can be relocated in memory when it is loaded into the processor's memory. In addition to supporting position-independent code, PC-relative branching can also save program memory in certain situations.


A technique for increasing the performance of a processor by breaking a sequence of operations into smaller pieces and executing these pieces in parallel when possible, thereby decreasing the overall time required to complete the set of operations. Unfortunately, in the process of improving performance, pipelining frequently complicates programming.

Pipelining and Performance

A hypothetical processor uses separate execution units to accomplish the following actions sequentially for a single instruction:
Fetch an instruction word from memory Decode the instruction Read a data operand from or write a data operand to memory Execute the ALU or MAC portion of the instruction

Assuming that each of the four stages above takes 20 ns to execute, and that they must be done sequentially

A pipelined implementation of this processor starts a new instruction fetch immediately after the previous instruction has been fetched . Similarly, it begins decoding each instruction as soon as the previous instruction is finished decoding. In essence, it overlaps the various stages of execution. As a result, the execution stages now work in parallel.

Pipeline Depth
Although most DSP processors are pipelined, the depth (number of stages) of the pipeline may vary from one processor to another. In general, a deeper pipeline allows the processor to execute faster but makes the processor harder to program. Most processors use three stages (instruction fetch, decode, and execute) or four stages (instruction fetch, decode, operand fetch, and execute). In three-stage pipelines the operand fetch is typically done in the latter part of the decode stage.

The execution sequence shown in Figure 9-2 is referred to as a perfect overlap, because the pipeline phases mesh together perfectly and provide 100 percent utilization of the processor's execution stages. In reality, processors may not perform as well as we have shown in our hypothetical example. The most common reason for this is resource contention.


One solution to this problem is interlocking. An interlocking pipeline delays the progression of the latter of the conflicting instructions through the pipeline


Branching Effects
Branching effects occur whenever there is a change in program flow, and not just for branch instructions. For example, subroutine call instructions, subroutine return instructions, and return from interrupt instructions are all candidates for the pipeline effects described above. Processors offering delayed branches frequently also offer delayed returns.

Branching Effects

Branching Effects

Interrupt Effects
Interrupts typically involve a change in a program's flow of control to branch to the interrupt service routine. The pipeline often increases the processor's interrupt response time, much as it slows down branch execution. When an interrupt occurs, almost all processors allow instructions at the decode stage or further in the pipeline to finish executing, because these instructions may be partially executed. What occurs past this point varies from processor to processor. We discuss several examples below:

Interrupt Effects

Interrupt Effects

Interrupt Effects

Pipeline Programming Models

The examples in the sections above have concentrated on the instruction pipeline and its behavior and interaction with other parts of the processor under various circumstances. Two major assembly code formats for pipelined processors:
time-stationary data-stationary

Pipeline Programming Models

In the time-stationary programming model, the processor's instructions specify the actions to be performed by the execution units (multiplier, accumulator, and so on) during a single instruction cycle. A good example is the AT&T DSP16xx family where a multiply-accumulate instruction looks like: a0=a0+p p=x*y y=*r0++ p=*pt++

Pipeline Programming Models

Data-stationary programming specifies the operations that are to be performed, but not the exact times during which the actions are to be executed. As an example, consider the following AT&T DSP32xx instruction: a1 = a1 + (*r5++ = *r4++) * *r3++