
Instruction Pipelining

• Products at various stages are worked on simultaneously; this process is called pipelining
• In a pipeline, new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end
• To apply this concept to instruction execution, note that an instruction passes through a number of stages
• As a simple approach, the pipeline has two independent stages
- The first stage fetches an instruction and buffers it
- When the second stage is free, the first stage passes it the buffered instruction

- While the second stage is executing the instruction, the first stage takes advantage of any unused memory cycles to fetch and buffer the next instruction. This is called instruction prefetch or fetch overlap
- When a conditional branch instruction is passed on from the fetch stage to the execute stage, the fetch stage fetches the next instruction in memory after the branch instruction. If the branch is taken, the fetched instruction must be discarded and a new instruction fetched
- To gain further speedup, the pipeline must have more stages

Consider the following decomposition of instruction processing
• Fetch instruction: Read the next instruction into a buffer
• Decode instruction: Determine the opcode and the operand specifiers
• Calculate operands: Calculate the effective address of each source operand
• Fetch operands: Fetch each operand from memory
• Execute instruction: Perform the indicated operation and store the result
• Write operand: Store the result in memory


• Figure 12.10 shows that a six-stage pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units
• However, to simplify the pipeline hardware, the timing is set up assuming that each instruction requires all six stages
• The diagram also assumes that all of the stages can be performed in parallel
• Another difficulty is the conditional branch instruction, which can invalidate several instruction fetches
• Figure 12.11 illustrates the effects of the conditional branch, using the same program as Figure 12.10
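The time-unit counts quoted for Figure 12.10 can be reproduced with a simple count. The sketch below assumes the same idealized conditions as the figure (every instruction uses all k stages, each stage takes one time unit, no stalls); the function names are illustrative and do not come from the source.

```python
def unpipelined_time(n_instructions: int, k_stages: int) -> int:
    """Time units when each instruction finishes all k stages before the next starts."""
    return n_instructions * k_stages


def pipelined_time(n_instructions: int, k_stages: int) -> int:
    """Time units for an ideal k-stage pipeline with no stalls.

    The first instruction needs k time units to pass through the pipeline;
    each of the remaining n - 1 instructions completes one time unit later.
    """
    if n_instructions == 0:
        return 0
    return k_stages + (n_instructions - 1)


n, k = 9, 6  # the case quoted for Figure 12.10
print(unpipelined_time(n, k))  # 54
print(pipelined_time(n, k))    # 14
```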

Fig.12.10 Timing of Pipeline

Figure 12.11 The Effect of a Conditional Branch on Instruction Pipeline operation

• In Figure 12.11, the branch is taken. This is not determined until the end of time unit 7
• At this point, the pipeline must be cleared of instructions that are not useful
• During time unit 8, instruction 15 enters the pipeline
• No instructions complete during time units 9 through 12; this is the performance penalty incurred because we could not anticipate the branch
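The penalty described for Figure 12.11 can be checked with a small timing model. This is a sketch under stated assumptions, not a reproduction of the figure: it assumes a six-stage pipeline, that the taken branch is the third instruction, and that its outcome is known at the end of its fifth stage (which places the decision at the end of time unit 7). The function name and parameters are hypothetical.

```python
def completion_times(k_stages, branch_position, resolve_stage, n_after_target):
    """Return the time unit in which each useful instruction completes.

    Instructions enter the pipeline one per time unit, starting at time 1.
    The instruction at `branch_position` (1-based) is a taken branch whose
    outcome is known at the end of stage `resolve_stage` (1-based).
    Instructions fetched after the branch are flushed, and the branch
    target enters the pipeline in the following time unit.
    """
    times = []
    # Instructions up to and including the branch complete normally.
    for position in range(1, branch_position + 1):
        times.append(position + k_stages - 1)
    # Time unit during which the branch occupies the resolving stage.
    resolved_at = branch_position + resolve_stage - 1
    # The target and its successors enter one per time unit after that.
    for offset in range(n_after_target):
        enter = resolved_at + 1 + offset
        times.append(enter + k_stages - 1)
    return times


times = completion_times(k_stages=6, branch_position=3,
                         resolve_stage=5, n_after_target=2)
idle = [t for t in range(min(times), max(times) + 1) if t not in times]
print(times)  # [6, 7, 8, 13, 14]
print(idle)   # [9, 10, 11, 12]
```

Under these assumptions the target instruction enters at time unit 8 and nothing completes during time units 9 through 12, matching the penalty described above.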

Pipeline Performance
• The cycle time T of an instruction pipeline is the time needed to advance a set of instructions one stage through the pipeline

$$T = \max_{1 \le i \le k}[T_i] + d = T_m + d$$

where
$T_i$ = time delay of the circuitry in the ith stage of the pipeline
$T_m$ = maximum stage delay
$k$ = number of stages in the instruction pipeline
$d$ = time delay of a latch, needed to advance signals and data from one stage to the next
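As a worked instance of the cycle-time relation, the following minimal sketch takes a list of per-stage delays and a latch delay and returns T. The delay values are made up for illustration and are not taken from the source.

```python
def cycle_time(stage_delays, latch_delay):
    """T = max[T_i] + d: the slowest stage plus the latch delay sets the cycle time."""
    if not stage_delays:
        raise ValueError("a pipeline needs at least one stage")
    return max(stage_delays) + latch_delay


# Hypothetical delays in nanoseconds for a six-stage pipeline.
delays_ns = [10, 8, 12, 9, 11, 7]
print(cycle_time(delays_ns, latch_delay=1))  # 13
```

Because the slowest stage determines T, shortening any other stage does not change the cycle time.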

Dealing with Branches
A variety of approaches have been taken for dealing with conditional branches
• Multiple streams
• Prefetch branch target
• Loop buffer
• Branch prediction
• Delayed branching

Multiple Streams

• Have two pipelines
• Prefetch each branch into a separate pipeline
• Use the appropriate pipeline
• Leads to bus and register contention
• Multiple branches lead to further pipelines being needed

Prefetch Branch Target
• Target of branch is prefetched in addition to the instructions following the branch
• Keep target until branch is executed
• Used by IBM 360/91

Loop Buffer
• Very fast memory
• Maintained by the fetch stage of the pipeline
• Check buffer before fetching from memory
• Very good for small loops or jumps
• Compare with a cache
• Used by CRAY-1
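The "check buffer before fetching from memory" behaviour can be sketched as follows. This is a toy model under several assumptions that are not in the source: main memory is a dictionary from address to instruction, the buffer has a fixed capacity, and the oldest entry is evicted first. The class and attribute names are hypothetical.

```python
from collections import OrderedDict


class LoopBuffer:
    """Small buffer holding recently fetched instructions, keyed by address."""

    def __init__(self, capacity, memory):
        self.capacity = capacity
        self.memory = memory         # dict: address -> instruction (stands in for main memory)
        self.buffer = OrderedDict()  # address -> instruction, oldest entry first
        self.hits = 0
        self.misses = 0

    def fetch(self, address):
        # Check the buffer before going to memory.
        if address in self.buffer:
            self.hits += 1
            return self.buffer[address]
        self.misses += 1
        instruction = self.memory[address]
        self.buffer[address] = instruction
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)  # evict the oldest entry
        return instruction


memory = {addr: f"instr{addr}" for addr in range(16)}
loop_buffer = LoopBuffer(capacity=8, memory=memory)
for _ in range(3):            # a four-instruction loop executed three times
    for addr in range(4, 8):
        loop_buffer.fetch(addr)
print(loop_buffer.hits, loop_buffer.misses)  # 8 4
```

Only the first pass through the loop goes to memory; later iterations are served from the buffer, which is why the technique suits small loops.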

Branch Prediction (1)
• Predict never taken
  - Assume that the jump will not happen
  - Always fetch the next instruction
  - Used by the 68020 and VAX 11/780
  - The VAX will not prefetch after a branch if a page fault would result (O/S v CPU design)

• Predict always taken
  - Assume that the jump will happen
  - Always fetch the target instruction
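The two static strategies above can be compared on a recorded sequence of branch outcomes. The sketch below is illustrative only: the trace is hypothetical (a loop-closing branch taken nine times and then not taken once), and the function name is not from the source.

```python
def static_accuracy(outcomes, predict_taken):
    """Fraction of branches predicted correctly by a fixed prediction."""
    if not outcomes:
        return 0.0
    correct = sum(1 for taken in outcomes if taken == predict_taken)
    return correct / len(outcomes)


# Hypothetical trace: True means the branch was taken.
trace = [True] * 9 + [False]
print(static_accuracy(trace, predict_taken=False))  # 0.1
print(static_accuracy(trace, predict_taken=True))   # 0.9
```

On this particular trace "predict always taken" does far better, but the result depends entirely on the mix of branches in the program.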

Branch Prediction (2)
• Predict by Opcode
  - Some instructions are more likely to result in a jump than others
  - Can get up to 75% success

• Taken/Not taken switch
  - Based on previous history
  - Good for loops
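A taken/not taken switch can be sketched in its simplest form as a single bit that remembers what the branch did last time. This one-bit version is an assumption chosen for brevity; the source does not say how many history bits are kept. The trace and names are hypothetical.

```python
def one_bit_predictor(outcomes, initial_prediction=False):
    """Predict each outcome as whatever the branch did last time.

    Returns the number of correct predictions.
    """
    prediction = initial_prediction
    correct = 0
    for taken in outcomes:
        if prediction == taken:
            correct += 1
        prediction = taken  # remember the most recent outcome
    return correct


# Hypothetical trace: a loop branch taken 9 times, then not taken, run twice.
trace = ([True] * 9 + [False]) * 2
print(one_bit_predictor(trace), "of", len(trace))  # 16 of 20
```

With one bit of history, each execution of the loop costs two mispredictions: one on entry and one on exit.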

Branch Prediction (3)
• Delayed Branch
  - Do not take the jump until you have to
  - Rearrange instructions

Branch Prediction State Diagram

Intel 80486 Pipelining
• The 80486 implements a five-stage pipeline
(i) Fetch
- Instructions are fetched from the cache or from external memory
- The objective of the fetch stage is to fill the prefetch buffers with new data
- The status of the prefetch relative to the other pipeline stages varies from instruction to instruction
- On average, about five instructions are fetched with each 16-byte load
- The fetch stage operates independently of the other stages
(ii) Decode stage 1 (D1)
- All opcode and addressing-mode information is decoded in the D1 stage
- The required information, including instruction-length information, is contained in at most the first 3 bytes of the instruction
- Hence, 3 bytes are passed to the D1 stage from the prefetch buffers
- The D1 decoder can then direct the D2 stage to capture the rest of the instruction
(iii) Decode stage 2 (D2)
- The D2 stage expands each opcode into control signals for the ALU
- It also controls the computation of the more complex addressing modes
(iv) Execute
- This stage includes ALU operations, cache access, and register update
(v) Write Back
- If needed, this stage updates registers and status flags modified during the preceding execute stage
• With the use of two decode stages, the pipeline can sustain a throughput of close to one instruction per clock cycle
• Complex instructions and conditional branches can slow down this rate

Figure 12.19 80486 Instruction Pipeline Examples

The Pentium Processor
• An overview of the Pentium 4 processor organization is depicted in Figure 4.13
(i) Register Organization
- It includes the following types of registers (Table 12.1)
(a) Integer Unit
Type                  Number   Length (bits)   Purpose
General               8        32              General-purpose registers
Segment               6        16              Contain segment selectors
Flags                 1        32              Status and control bits
Instruction pointer   1        32              Instruction pointer

(b) Floating-Point Unit
Type                  Number   Length (bits)   Purpose
Numeric               8        80              Hold floating-point numbers
Control               1        16              Control bits
Status                1        16              Status bits
Tag word              1        16              Specifies contents of numeric registers
Instruction pointer   1        48              Points to instruction interrupted by exception
Data pointer          1        48              Points to operand interrupted by exception

EFLAGS Register
(i) Trap Flag (TF): when set, causes an interrupt after the execution of each instruction
(ii) Interrupt Enable Flag (IF): when set, the processor recognizes external interrupts
(iii) Direction Flag (DF): determines whether the string processing instructions increment or decrement
(iv) I/O Privilege Flag (IOPL): when set, causes the processor to generate an exception on all accesses to I/O devices during protected mode operation
(v) Resume Flag (RF): allows the programmer to disable debug exceptions
(vi) Alignment Check (AC): activates if a word or doubleword is addressed on a nonword or nondoubleword boundary
(vii) Identification Flag (ID): if this bit can be set and cleared, then this processor supports the CPUID instruction
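The flags listed above occupy fixed bit positions in the 32-bit EFLAGS value. The sketch below decodes them from an integer; the bit positions are the standard IA-32 ones and are not stated in the source, and the dictionary and function names are illustrative.

```python
# Bit positions of the EFLAGS fields listed above (IA-32 layout, not from the source).
EFLAGS_BITS = {
    "TF": 8,   # trap flag
    "IF": 9,   # interrupt enable flag
    "DF": 10,  # direction flag
    "RF": 16,  # resume flag
    "AC": 18,  # alignment check
    "ID": 21,  # identification flag
}
IOPL_SHIFT, IOPL_MASK = 12, 0b11  # I/O privilege level is a 2-bit field


def decode_eflags(value):
    """Return each named single-bit flag as a bool, plus the IOPL field as an int."""
    flags = {name: bool((value >> bit) & 1) for name, bit in EFLAGS_BITS.items()}
    flags["IOPL"] = (value >> IOPL_SHIFT) & IOPL_MASK
    return flags


print(decode_eflags(0x0000_0202)["IF"])  # True
```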

Control Registers:
- The Pentium employs four 32-bit control registers to control various aspects of processor operation
- Register CR1 is unused
- The CR0 register contains system control flags
- The flags are as follows
(i) Protection Enable (PE): enables or disables protected mode
(ii) Monitor Coprocessor (MP)
(iii) Emulation (EM)
(iv) Task Switched (TS)
(v) Extension Type (ET)
(vi) Numeric Error (NE)
(vii) Write Protect (WP)
(viii) Alignment Mask (AM)
(ix) Not Write Through (NW)
(x) Cache Disable (CD)
(xi) Paging (PG)
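As an illustration of how the CR0 flags combine into one register value, the following sketch builds and tests a CR0 word. The bit positions are the standard IA-32 ones and are not given in the source; the helper names are hypothetical.

```python
# CR0 bit positions (IA-32 layout, not from the source).
CR0_BITS = {
    "PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
    "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31,
}


def cr0_value(*flag_names):
    """Build a CR0 value with the named flags set and all others clear."""
    value = 0
    for name in flag_names:
        value |= 1 << CR0_BITS[name]
    return value


def is_set(cr0, name):
    """Report whether the named flag is set in a CR0 value."""
    return bool((cr0 >> CR0_BITS[name]) & 1)


cr0 = cr0_value("PE", "PG")  # protected mode with paging enabled
print(hex(cr0))              # 0x80000001
print(is_set(cr0, "CD"))     # False
```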

- When paging is enabled, the CR2 and CR3 registers are valid
- The CR2 register holds the 32-bit linear address of the last page accessed before a page fault interrupt

MMX Registers:
- The Pentium MMX capability makes use of several 64-bit data types
- The processor does not include specific MMX registers
- The existing floating-point registers are used to store MMX values
- Thus the existing Pentium architecture is easily extended to support the MMX capability

Interrupt Processing:
- Interrupt processing within a processor is a facility provided to support the operating system

Interrupts and Exceptions:
- An interrupt is generated by a signal from hardware, and it may occur at random times during the execution of a program
- An exception is generated from software, and it is provoked by the execution of an instruction

Two sources of interrupts
1. Maskable interrupts
- Received on the processor's INTR pin
- The processor does not recognize a maskable interrupt unless the interrupt enable flag is set
2. Nonmaskable interrupts
- Received on the processor's NMI pin
- Recognition of such interrupts cannot be prevented

Two sources of exceptions
1. Processor-detected exceptions
- Result when the processor encounters an error while attempting to execute an instruction
2. Programmed exceptions
- These are instructions that generate an exception (INTO, INT3, INT, and BOUND)

Interrupt Vector Table:
- Interrupt processing on the Pentium uses the interrupt vector table
- Every type of interrupt is assigned a number, and this number is used to index into the interrupt vector table

Interrupt Handling:
- The interrupt handling routine uses the system stack to store the processor state
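The two ideas above (indexing a table by interrupt number, and saving processor state on a stack) can be sketched with a toy model. This is not how the hardware is implemented: the class, the 256-slot table of Python callables, and the two-field state dictionary are all simplifying assumptions made for illustration.

```python
class Processor:
    """Toy model: a vector table of handlers and a stack for saved state."""

    def __init__(self):
        self.vector_table = [None] * 256  # one handler slot per interrupt number
        self.stack = []
        self.state = {"ip": 0, "flags": 0}  # simplified stand-in for processor state

    def install(self, number, handler):
        self.vector_table[number] = handler

    def interrupt(self, number):
        # The interrupt number indexes directly into the vector table.
        handler = self.vector_table[number]
        if handler is None:
            raise LookupError(f"no handler installed for interrupt {number}")
        self.stack.append(dict(self.state))  # save processor state on the stack
        try:
            return handler(self)
        finally:
            self.state = self.stack.pop()    # restore state when the handler returns


def breakpoint_handler(cpu):
    cpu.state["ip"] = 0x1000  # the handler runs with its own instruction pointer
    return "breakpoint handled"


cpu = Processor()
cpu.state["ip"] = 100
cpu.install(3, breakpoint_handler)
print(cpu.interrupt(3))   # breakpoint handled
print(cpu.state["ip"])    # 100
```

Restoring the saved state after the handler returns is what lets the interrupted program resume where it left off.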