Architecture
Pipelining
Anugrah Jain
CSE, LNMIIT
Acknowledgement
[Figure: (a) Sequential execution — each instruction Ii completes its fetch (Fi) and execute (Ei) steps before the next instruction begins. (b) Hardware organization — an instruction fetch unit and an execution unit connected by an interstage buffer B1. (c) Pipelined execution — fetch and execution overlap across successive instructions.]
[Figure: four-stage pipeline — instructions I1–I4 flow through Fetch, Decode, Execute, and Write, one stage per clock cycle, separated by interstage buffers B1, B2, B3.
F: Fetch instruction   D: Decode instruction and fetch operands   E: Execute operation   W: Write results]
[Figure 8.3. Effect of an execution operation taking more than one clock cycle: instruction I2 spends three cycles in the Execute stage (E2), so instructions I3–I5 are delayed behind it.]
Pipeline Performance
• The pipeline in Figure 8.3 is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard – a condition in which a source or destination operand of an instruction is not available at the time expected in the pipeline; some operation must be delayed, and the pipeline stalls.
• Instruction (control) hazard – a delay in the availability of an instruction causes the pipeline to stall.
• Structural hazard – two instructions require the same hardware resource at the same time.
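As an illustration (not from the slides), the two-cycle stall of Figure 8.3 can be reproduced with a toy timing model in which one instruction's Execute stage takes extra cycles; the function name and the simple one-instruction-per-stage model are assumptions of this sketch:

```python
# Toy sketch of an in-order 4-stage pipeline (F, D, E, W). Each stage
# normally takes one clock cycle, but Execute may take several; an
# instruction enters a stage only after the previous instruction has
# freed it (results wait in the interstage buffers).

def pipeline_timing(exec_cycles):
    """Return the clock cycle in which each instruction's W stage completes."""
    f = d = e = w = 0              # cycle in which each stage last finished
    finish = []
    for ex in exec_cycles:
        f = f + 1                  # fetch: one cycle, one instruction per cycle
        d = max(d, f) + 1          # decode after fetch, once D is free
        e = max(e, d) + ex         # execute may need several cycles
        w = max(w, e) + 1          # write results
        finish.append(w)
    return finish

print(pipeline_timing([1, 1, 1, 1]))  # -> [4, 5, 6, 7], no stalls
print(pipeline_timing([1, 3, 1, 1]))  # -> [4, 7, 8, 9], I2's long Execute stalls the rest
```

With I2's Execute taking three cycles, every later instruction completes two cycles late — the two-cycle stall described above.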
Pipeline Performance
[Figure: pipeline stall caused by a data dependency — I2 (Add) uses the result of I1 (Mul), so its Decode is extended (D2, D2A) and E2 is delayed, delaying I3 and I4 behind it.]
(a) Datapath — the register file supplies the source operands SRC1 and SRC2 to the ALU; the result RSLT is written back to the destination register.
(b) Position of the source and result registers in the processor pipeline — a forwarding path carries the ALU output from the Execute stage (E) directly back to the ALU inputs, instead of waiting for the Write stage (W) to update the register file.
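The benefit of the forwarding path can be sketched with a small toy model (an illustration, not the slides' exact hardware): without forwarding, a dependent instruction must wait until the producer's Write stage has updated the register file; with forwarding, it can use the ALU result as soon as the producer's Execute stage finishes.

```python
# Toy 4-stage pipeline (F, D, E, W), one cycle per stage. deps[i] is True
# if instruction i reads the result of instruction i-1. Without forwarding
# the operand becomes available when W finishes; with forwarding, when E
# finishes.

def finish_cycles(deps, forwarding):
    """Return the clock cycle in which each instruction's W stage completes."""
    d = e = w = 0
    finish = []
    for i, dep in enumerate(deps):
        f = i + 1                          # one fetch per cycle
        d = max(d, f) + 1                  # decode after fetch, once D is free
        ready = e if forwarding else w     # cycle the needed operand exists
        start = max(e, d, ready if dep else 0)
        e = start + 1                      # execute once operands are ready
        w = max(w, e) + 1                  # write results
        finish.append(w)
    return finish

# I2 depends on I1 (like Add after Mul):
print(finish_cycles([False, True, False, False], forwarding=False))  # -> [4, 6, 7, 8]
print(finish_cycles([False, True, False, False], forwarding=True))   # -> [4, 5, 6, 7]
```

With forwarding, the dependent instruction loses no cycles; without it, I2 and everything behind it finish a cycle late, matching the D2A stall in the figure.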
[Figure: branch timing in a two-stage pipeline — the instruction fetched after a taken branch is discarded (X) and fetching resumes at the branch target Ik.]
Branch Timing
- Branch penalty: in the four-stage pipeline, if the branch decision for I2 is made in the Execute stage, the two instructions fetched behind it (I3, I4) must be discarded (X), and the penalty is two clock cycles.
- Reducing the penalty: computing the branch target address in the Decode stage discards only I3, reducing the penalty to one clock cycle.
[Figure: timing diagrams over clock cycles 1–7 for both cases. D: Dispatch/Decode, E: Execute instruction, W: Write results.]
Figure 8.10. Use of an instruction queue in the hardware organization of Figure 8.2b.
Conditional Branches
The loop can be reordered so that a useful instruction follows the branch:

LOOP    Decrement   R2
        Branch>0    LOOP
        Shift_left  R1
NEXT    Add         R1,R3

Here Shift_left occupies the branch delay slot: it is executed on every pass through the loop, whether or not the branch is taken.

Figure 8.13. Execution timing showing the delay slot being filled during the last two passes through the loop in Figure 8.12.
Branch Prediction
[Figure 8.14. Timing when a branch decision has been incorrectly predicted as not taken: I1 (Compare) is followed by I2 (Branch>0); the prediction is made during decode in cycle 3 (D2/P2), the speculatively fetched I3 and I4 are discarded (X) when E2 resolves the branch as taken, and fetching restarts at the branch target Ik.]
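A back-of-the-envelope sketch (the numbers below are assumed for illustration, not taken from the slides) shows why prediction accuracy matters: with a base cost of one cycle per instruction, each mispredicted branch adds the full branch penalty in stall cycles.

```python
# Effective average cycles per instruction (CPI) with branch stalls.
# branch_frac: fraction of instructions that are branches (assumed value).
# mispredict_rate: fraction of branches predicted incorrectly.
# penalty: stall cycles paid on each misprediction.

def effective_cpi(branch_frac, mispredict_rate, penalty, base_cpi=1.0):
    return base_cpi + branch_frac * mispredict_rate * penalty

# Assume 20% of instructions are branches and a 2-cycle misprediction penalty:
print(effective_cpi(0.20, 1.0, 2))   # every branch stalls (no prediction)
print(effective_cpi(0.20, 0.5, 2))   # half the branches predicted correctly
```

Halving the misprediction rate halves the extra cycles per instruction, which is the motivation for the prediction hardware shown in the figure.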
Branch Prediction
[Figure: pipeline timing with a memory-access stage — I2 (Load) passes through F2, D2, E2, M2, and W2, while the surrounding instructions I1 and I3–I5 follow the four-stage pattern.]
• = 210 MIPS.
Number of Pipeline Stages
• An n-stage pipeline can increase instruction throughput by up to n times.
• As the number of stages increases, so does the probability of the pipeline being stalled.
• The inherent delay of the basic operations limits how finely they can be subdivided.
• Hardware considerations (area, power, complexity, …) also limit the number of stages.
• Larson's performance/cost ratio: PCR = f / (c + k·h), where f is the maximum throughput, c the cost of all logic stages, k the number of stages, and h the cost of each latch.
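The PCR formula can be explored numerically. This is a sketch under an added assumption (common in pipeline analyses, but not stated on the slide): the maximum throughput of a k-stage pipeline is f = 1 / (T/k + S), where T is the total logic delay and S the latch delay, so deeper pipelining raises f while the latch cost k·h grows.

```python
# Larson's performance/cost ratio PCR = f / (c + k*h), swept over the
# number of stages k. T, S, c, and h are illustrative values chosen for
# this sketch, not figures from the slides.

def pcr(k, T=128.0, S=2.0, c=256.0, h=8.0):
    f = 1.0 / (T / k + S)          # assumed throughput model: f = 1/(T/k + S)
    return f / (c + k * h)         # performance per unit cost

best_k = max(range(1, 65), key=pcr)
print(best_k)                      # stage count maximizing PCR for these numbers
```

The sweep shows the trade-off from the bullets above: throughput keeps rising with k, but latch cost eventually dominates, so PCR peaks at an intermediate number of stages rather than growing without bound.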
Thank You