Pipelining is widely used in modern processors.
Pipelining improves system performance in terms of throughput.
Pipelined organization requires sophisticated compilation techniques.
Making the Execution of Programs Faster
Use faster circuit technology to build the processor and the main memory.
Arrange the hardware so that more than one operation can be performed at the same time.
In the latter way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Traditional Pipeline Concept
Laundry Example
Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold.
Washer takes 30 minutes.
Dryer takes 40 minutes.
“Folder” takes 20 minutes.
Traditional Pipeline Concept
Figure: sequential laundry timeline from 6 PM to midnight. Loads A, B, C, and D run one after another, each taking 30 + 40 + 20 minutes.
Sequential laundry takes 6 hours for 4 loads.
If they learned pipelining, how long would laundry take?
Traditional Pipeline Concept
Figure: pipelined laundry timeline starting at 6 PM, task order A, B, C, D. The timeline is made up of segments of 30, 40, 40, 40, 40, and 20 minutes.
Pipelined laundry takes 3.5 hours for 4 loads.
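The 6-hour and 3.5-hour figures in the laundry example can be checked with a short calculation. The sketch below is only an illustration: the function names are hypothetical, and it assumes that a load may wait between machines until the next machine is free.

```python
# Stage durations in minutes, taken from the laundry example.
STAGES = {"wash": 30, "dry": 40, "fold": 20}


def sequential_minutes(loads, stages=STAGES):
    """Each load finishes all stages before the next load starts."""
    return loads * sum(stages.values())


def pipelined_minutes(loads, stages=STAGES):
    """Each machine starts the next load as soon as both are ready.

    Assumption: a finished load can wait between machines.
    """
    machine_free_at = [0] * len(stages)  # time at which each machine is free
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stages.values()):
            start = max(t, machine_free_at[s])
            t = start + duration
            machine_free_at[s] = t
        finish = t
    return finish


print(sequential_minutes(4) / 60)  # 6.0 hours
print(pipelined_minutes(4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 × 40 + 20 = 210 minutes: the dryer, the slowest stage, is busy continuously once the first load reaches it.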
Traditional Pipeline Concept
Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload.
Pipeline rate is limited by the slowest pipeline stage.
Multiple tasks operate simultaneously using different resources.
Potential speedup = number of pipe stages.
Unbalanced lengths of pipe stages reduce speedup.
Time to “fill” the pipeline and time to “drain” it reduce speedup.
Stall for dependences.
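The effect of unbalanced stages and of fill/drain time on speedup can be made concrete with a small calculation. This is a sketch under stated assumptions: stage times are constant, there are no stalls, and the names `pipelined_time` and `speedup` are hypothetical. Under those assumptions, n tasks take the sum of the stage times plus (n - 1) times the slowest stage.

```python
def pipelined_time(n_tasks, stage_times):
    """Fill and drain cost one pass through all stages; after that,
    one task completes per slowest-stage time."""
    return sum(stage_times) + (n_tasks - 1) * max(stage_times)


def speedup(n_tasks, stage_times):
    """Sequential time divided by pipelined time."""
    return n_tasks * sum(stage_times) / pipelined_time(n_tasks, stage_times)


unbalanced = [30, 40, 20]  # laundry example
balanced = [30, 30, 30]    # same total time, equal stages

for n in (1, 4, 100, 10_000):
    print(n, round(speedup(n, unbalanced), 3), round(speedup(n, balanced), 3))
```

With many tasks the balanced pipeline approaches a speedup of 3, the number of stages, while the unbalanced one is capped at 90 / 40 = 2.25. With only four loads, fill and drain time keep both well below their limits.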
Use the Idea of Pipelining in a Computer
Fetch + Execution
Figure 8.1. Basic idea of instruction pipelining. (a) Sequential execution: I1, I2, and I3 each perform a fetch (F) followed by an execute (E). (b) Hardware organization: an instruction fetch unit and an execution unit separated by interstage buffer B1. (c) Pipelined execution: F2 overlaps E1 and F3 overlaps E2, so the three instructions complete in clock cycles 1 to 4.
Use the Idea of Pipelining in a Computer
Fetch + Decode + Execution + Write
(a) Instruction execution divided into four steps
Clock cycle   1    2    3    4    5    6    7
I1            F1   D1   E1   W1
I2                 F2   D2   E2   W2
I3                      F3   D3   E3   W3
I4                           F4   D4   E4   W4
(b) Hardware organization, with interstage buffers B1, B2, B3
F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
Figure 8.2. A 4-stage pipeline.
Textbook page: 457
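The staircase pattern of the 4-stage pipeline can be generated from one rule: in an ideal pipeline, instruction i occupies stage s during clock cycle i + s (counting both from zero). The sketch below assumes no stalls, and the function names are hypothetical.

```python
STAGES = "FDEW"  # Fetch, Decode, Execute, Write


def total_cycles(n_instructions, stages=STAGES):
    """Cycles needed by an ideal pipeline: fill time plus one per instruction."""
    return n_instructions + len(stages) - 1


def timing_rows(n_instructions, stages=STAGES):
    """One row per instruction, one cell per clock cycle ('' means not in the pipeline)."""
    rows = []
    for i in range(n_instructions):
        cells = [""] * total_cycles(n_instructions, stages)
        for s, name in enumerate(stages):
            cells[i + s] = f"{name}{i + 1}"
        rows.append(cells)
    return rows


for i, row in enumerate(timing_rows(4), start=1):
    print(f"I{i}  " + " ".join(cell.ljust(3) for cell in row).rstrip())
print("total cycles:", total_cycles(4))
```

For four instructions this gives 4 + 4 - 1 = 7 clock cycles, compared with 16 if each instruction had to finish all four steps before the next one started.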
Role of Cache Memory
Each pipeline stage is expected to complete in one clock cycle.
The clock period should be long enough to let the slowest pipeline stage complete.
Faster stages can only wait for the slowest one to complete.
Since main memory is very slow compared to execution, the pipeline is almost useless if each instruction needs to be fetched from main memory.
Fortunately, we have cache.
Pipeline Performance
The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
However, this increase would be achieved only if all pipeline stages require the same time to complete, and there is no interruption throughout program execution.
Unfortunately, this is not true.
Pipeline Performance
Figure 8.3. Effect of an execution operation taking more than one clock cycle. (Timing diagram for instructions I1 to I5 over clock cycles 1 to 9.)
Pipeline Performance
The previous pipeline is said to have been stalled for two clock cycles.
Any condition that causes a pipeline to stall is called a hazard.
Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. So some operation has to be delayed, and the pipeline stalls.
Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall.
Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.
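The two-cycle stall can be reproduced with a small timing model. This is a simplified sketch, not the textbook's method: it assumes in-order execution, that an instruction enters a stage only when the previous instruction has left it, and that the Execute step of I2 takes three cycles (an assumption chosen to match the two-cycle stall). The function name is hypothetical.

```python
def completion_cycles(stage_cycles):
    """Return the clock cycle in which each instruction finishes its last stage.

    stage_cycles: one list per instruction with the number of cycles
    it needs in each stage, for example [1, 1, 3, 1] for F, D, E, W.
    """
    n_stages = len(stage_cycles[0])
    stage_free_after = [0] * n_stages  # last cycle in which each stage was busy
    done = []
    for needs in stage_cycles:
        t = 0  # last cycle used by this instruction so far
        for s, cycles in enumerate(needs):
            start_after = max(t, stage_free_after[s])
            t = start_after + cycles
            stage_free_after[s] = t
        done.append(t)
    return done


ideal = [[1, 1, 1, 1]] * 4
slow_i2 = [[1, 1, 1, 1], [1, 1, 3, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

print(completion_cycles(ideal))    # [4, 5, 6, 7]
print(completion_cycles(slow_i2))  # [4, 7, 8, 9]
```

The fourth instruction completes in cycle 9 instead of cycle 7, which is the two-cycle stall. The same function can be used to check other cases by changing the per-stage cycle counts.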
Pipeline Performance
Instruction hazard
(a) Instruction execution steps in successive clock cycles
Clock cycle   1    2    3    4    5    6    7    8    9
I1            F1   D1   E1   W1
I2                 F2   F2   F2   F2   D2   E2   W2
I3                                     F3   D3   E3   W3
(b) Function performed by each processor stage in successive clock cycles
Clock cycle   1    2    3    4    5    6    7    8    9
F: Fetch      F1   F2   F2   F2   F2   F3
D: Decode          D1   idle idle idle D2   D3
E: Execute              E1   idle idle idle E2   E3
W: Write                     W1   idle idle idle W2   W3
Idle periods are called stalls (bubbles).
Figure 8.4. Pipeline stall caused by a cache miss in F2.
Pipeline Performance
Structural hazard
Load X(R1), R2
Figure 8.5. Effect of a Load instruction on pipeline timing. (Timing diagram for I1, I2 (Load), I3, I4, and I5 over clock cycles 1 to 7; the Load has an additional memory access step, M2, between E2 and W2.)
Pipeline Performance
Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.
Throughput is measured by the rate at which instruction execution is completed.
Pipeline stall causes degradation in pipeline performance.
We need to identify all hazards that may cause the pipeline to stall and to find ways to minimize their impact.
Quiz
Four instructions are executed on a 4-stage pipeline, and I2 takes two clock cycles for execution.
Please draw the timing figure for the 4-stage pipeline, and figure out the total number of cycles needed for the four instructions to complete.
Data Hazards
Data Hazards
We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.
Hazard occurs:
A ← 3 + A
B ← 4 × A
No hazard:
A ← 5 × C
B ← 20 + C
When two operations depend on each other, they must be executed sequentially in the correct order.
Another example:
Mul R2, R3, R4
Add R5, R4, R6
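The dependence check behind these examples is simple: a hazard exists when a later instruction reads a location that an earlier one writes. The sketch below is an illustration only. The `Instr` type and `depends_on` function are hypothetical, and the code assumes the operand order used in the Mul/Add example, with the destination written last.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    op: str
    src1: str
    src2: str
    dest: str  # assumption: destination is the last operand


def depends_on(later, earlier):
    """True if `later` reads the location that `earlier` writes."""
    return earlier.dest in (later.src1, later.src2)


# A <- 3 + A followed by B <- 4 x A: the second reads the new A.
print(depends_on(Instr("Mul", "4", "A", "B"), Instr("Add", "3", "A", "A")))   # True

# A <- 5 x C followed by B <- 20 + C: both only read C.
print(depends_on(Instr("Add", "20", "C", "B"), Instr("Mul", "5", "C", "A")))  # False

# Mul R2, R3, R4 followed by Add R5, R4, R6: Add reads R4.
mul = Instr("Mul", "R2", "R3", "R4")
add = Instr("Add", "R5", "R4", "R6")
print(depends_on(add, mul))  # True
```

This covers only the read-after-write case shown on the slide; it does not model other kinds of dependence.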
Data Hazards
Figure 8.6. Pipeline stalled by data dependency between D2 and W1. (Timing diagram for I1 (Mul), I2 (Add), I3, and I4 over clock cycles 1 to 9; the decode step of I2 is held until W1 has written its result and then completes as D2A.)
Operand Forwarding
Instead of reading from the register file, the second instruction can get its data directly from the output of the ALU after the previous instruction is completed.
A special arrangement needs to be made to “forward” the output of the ALU to the input of the ALU.
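Figure 8.7. Operand forwarding in a pipelined processor. (a) Datapath: the register file supplies Source 1 and Source 2 to registers SRC1 and SRC2, which feed the ALU; the ALU output goes to RSLT, which is written to the destination in the register file. (b) Position of the source and result registers in the processor pipeline: SRC1 and SRC2 feed E: Execute (ALU), RSLT feeds W: Write (register file), and a forwarding path runs from the ALU output back to its input.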
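The purpose of the forwarding path can be illustrated with a toy model. This is a deliberately simplified sketch, not the textbook's datapath: it assumes the result of an instruction sits in RSLT for one step before it is written to the register file, so the next instruction reads a stale register unless the value is forwarded. The function name, the register values, and the destination-last operand order are assumptions.

```python
import operator

OPS = {"Mul": operator.mul, "Add": operator.add}


def run(program, registers, forwarding):
    """Execute (op, src1, src2, dest) tuples with a one-step write-back delay."""
    regs = dict(registers)
    rslt = None  # (dest, value) produced by the ALU but not yet written back
    for op, src1, src2, dest in program:
        operands = []
        for src in (src1, src2):
            if forwarding and rslt is not None and rslt[0] == src:
                operands.append(rslt[1])    # take the value from the ALU output
            else:
                operands.append(regs[src])  # take the value from the register file
        value = OPS[op](*operands)
        if rslt is not None:
            regs[rslt[0]] = rslt[1]         # write step of the previous instruction
        rslt = (dest, value)
    if rslt is not None:
        regs[rslt[0]] = rslt[1]             # write step of the last instruction
    return regs


program = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
initial = {"R2": 3, "R3": 4, "R4": 0, "R5": 10, "R6": 0}

print(run(program, initial, forwarding=True)["R6"])   # 22, uses the new R4 = 12
print(run(program, initial, forwarding=False)["R6"])  # 10, uses the stale R4 = 0
```

Without forwarding, the hardware would have to stall the Add to avoid the wrong result; with forwarding, the Add can proceed without waiting for the register file to be updated.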
Handling Data Hazards in Software
Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
    NOP
    NOP
I2: Add R5, R4, R6
The compiler can reorder the instructions to perform some useful work during the NOP slots.
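A compiler pass that inserts the NOPs might look like the sketch below. It is an illustration under stated assumptions: instructions are tuples with the destination as the last operand, and a dependent instruction must be separated from its producer by two slots, as in the Mul/Add example. The function `insert_nops` is hypothetical, and it only pads; it does not attempt the reordering mentioned above.

```python
NOP = ("NOP",)


def insert_nops(program, gap=2):
    """Pad with NOPs so that an instruction reading a register is at least
    `gap` slots after the instruction that writes it."""
    out = []
    for op, *operands in program:
        sources = operands[:-1]  # assumption: last operand is the destination
        needed = 0
        # Look at the last `gap` emitted slots, nearest first.
        for back, prev in enumerate(reversed(out[-gap:])):
            if prev != NOP and prev[-1] in sources:
                needed = max(needed, gap - back)
        out.extend([NOP] * needed)
        out.append((op, *operands))
    return out


program = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
for instr in insert_nops(program):
    print(" ".join(instr))
```

For the example this prints the Mul, two NOPs, and then the Add. If an independent instruction already sits between the two, only one NOP is added, which is the effect a reordering compiler aims for.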
Side Effects
The previous example is explicit and easily detected.
Sometimes an instruction changes the contents of a register other than the one named as the destination.
When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect.
Example: condition code flags:
Add R1, R3
AddWithCarry R2, R4
Instructions designed for execution on pipelined hardware should have few side effects.
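The hidden dependence in the Add / AddWithCarry pair can be made visible by modeling the carry flag explicitly. The sketch below assumes 32-bit registers and a two-operand form in which the second operand is also the destination; both are assumptions, as are the register values.

```python
MASK = 0xFFFFFFFF  # assumption: 32-bit registers


def add(a, b, carry_in=0):
    """Return (result, carry_out) of a 32-bit addition."""
    total = a + b + carry_in
    return total & MASK, total >> 32


# Add R1, R3: R3 <- R1 + R3, and the carry flag is set as a side effect.
r1, r3 = 0xFFFFFFFF, 0x00000001
r3, carry = add(r1, r3)

# AddWithCarry R2, R4: R4 <- R2 + R4 + carry.
# Neither instruction names the carry flag, yet the second one depends on it.
r2, r4 = 0x00000000, 0x00000000
r4, _ = add(r2, r4, carry)

print(hex(r3), carry, hex(r4))  # 0x0 1 0x1
```

No register is shared between the two instructions, so a check that compares only named operands would miss the dependence. This is why side effects make hazards harder to detect.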
Instruction Hazards
Overview
Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls.
Cache miss
Branch
Unconditional Branches
Figure 8.8. An idle cycle caused by a branch instruction. (Timing diagram for I1, I2 (Branch), I3, Ik, and Ik+1 over clock cycles 1 to 6; the fetched I3 is discarded (X), and the execution unit is idle for one cycle before Ek.)
Branch Timing
Branch penalty
Reducing the penalty
Figure 8.9. Branch timing. (a) Branch address computed in Execute stage: I3 and I4 are fetched and then discarded (X X) before Fk. (b) Branch address computed in Decode stage: only I3 is discarded (X) before Fk.
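The number of discarded instructions (the X marks in Figure 8.9) follows from where the branch target becomes known: one instruction is fetched in each cycle the branch spends after its own fetch, up to and including the stage that computes the target. The sketch below is a minimal illustration for the 4-stage pipeline with no prediction or queueing; the function name is hypothetical.

```python
STAGES = ["F", "D", "E", "W"]


def branch_penalty(resolve_stage, stages=STAGES):
    """Cycles lost per taken branch: one wrongly fetched instruction for each
    stage the branch passes through after Fetch, up to the resolving stage."""
    return stages.index(resolve_stage)


print(branch_penalty("E"))  # 2: I3 and I4 are discarded
print(branch_penalty("D"))  # 1: only I3 is discarded
```

Moving the target computation from Execute to Decode halves the penalty, which is the reduction the slide refers to.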
2b. . Use of an instruction queue in the hardware organization of Figure 8.Instruction Queue and Prefetching Instruction fetch unit Instruction queue F : Fetch instruction D : Dispatch/ Decode unit E : Ex ecute instruction W : Write results Figure 8.10.
Conditional Branches
A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction.
The decision to branch cannot be made until the execution of that instruction has been completed.
Branch instructions represent about 20% of the dynamic instruction count of most programs.
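The 20% figure gives a rough sense of how much branches can cost. The estimate below is a back-of-the-envelope sketch, not a result from the source: it assumes an ideal rate of one cycle per instruction, that every branch pays the full penalty, and penalties of 1 or 2 cycles. The function name is hypothetical.

```python
def effective_cpi(branch_fraction, penalty_cycles, base_cpi=1.0):
    """Average cycles per instruction when a fraction of instructions
    are branches that each lose `penalty_cycles` cycles."""
    return base_cpi + branch_fraction * penalty_cycles


for penalty in (1, 2):
    cpi = effective_cpi(0.20, penalty)
    print(f"penalty {penalty}: {cpi:.2f} cycles per instruction, "
          f"{(cpi - 1) * 100:.0f}% slower than ideal")
```

Under these assumptions a two-cycle penalty makes the pipeline about 40% slower than ideal. The real cost is lower when some branches are not taken or when the penalty is hidden by other techniques.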