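Figure: two processor-versus-time schedules for P1 to P4.

Left (each processor runs all four steps of one task):
      time →
P1    a1 a2 a3 a4
P2    b1 b2 b3 b4
P3    c1 c2 c3 c4
P4    d1 d2 d3 d4

Right (each processor runs one step of every task):
      time →
P1    a1 b1 c1 d1
P2    a2 b2 c2 d2
P3    a3 b3 c3 d3
P4    a4 b4 c4 d4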
Basic Pipeline
Five-stage “RISC” load-store architecture
lw   IF  ID  EX  MEM WB
         IF  ID  EX  MEM WB
             IF  ID  EX  MEM WB
                 IF  ID  EX  MEM WB
Latency: 5 cycles
Throughput: 1 instr/cycle
Concurrency: 5
CPI = 1, where CPI is Cycles Per Instruction
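The figures above (latency of 5 cycles, steady-state CPI of 1) can be checked with a short calculation. The following is a minimal sketch that assumes an ideal pipeline with no stalls or hazards; the function names `pipeline_cycles` and `average_cpi` are illustrative and do not come from the slides.

```python
def pipeline_cycles(num_instructions, num_stages=5):
    """Total cycles for an ideal pipeline (assumption: no stalls or hazards)."""
    if num_instructions <= 0:
        return 0
    # The first instruction takes num_stages cycles (the latency).
    # Each later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def average_cpi(num_instructions, num_stages=5):
    """Average cycles per instruction, including the pipeline fill time."""
    return pipeline_cycles(num_instructions, num_stages) / num_instructions


print(pipeline_cycles(4))    # 8 cycles for the four instructions in the diagram
print(average_cpi(1000))     # 1.004, approaching CPI = 1 as the count grows
```

The fill time of the pipeline is why the average CPI only approaches 1 for long instruction sequences rather than equalling it exactly.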
Fetch Instruction (FI)
Read the next expected instruction into a buffer.
Decode Instruction (DI)
Determine the opcode and the operand specifiers.
Calculate Operands (CO)
Calculate the effective address of each source operand.
Fetch Operands (FO)
Fetch each operand from memory. Operands in registers need
not be fetched.
Execute Instruction (EI)
Perform the indicated operation and store the result, if any,
in the specified destination operand location.
Write Operand (WO)
Store the result in memory.
Timing diagram for instruction pipeline operation
Six-stage CPU instruction pipeline
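As a stand-in for the timing diagram, the six stages can be laid out programmatically. This is a rough sketch that assumes every stage takes one cycle and that no instruction stalls; the function name `timing_diagram` and the labels I1, I2, ... are hypothetical and chosen only for illustration.

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]


def timing_diagram(num_instructions):
    """Return one text row per instruction; each column is one clock cycle.

    Assumption: one cycle per stage and no stalls, so instruction i
    enters stage s in cycle i + s (counting from zero).
    """
    total_cycles = len(STAGES) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        cells = ["  "] * total_cycles
        for s, name in enumerate(STAGES):
            cells[i + s] = name
        rows.append(f"I{i + 1}: " + " ".join(cells).rstrip())
    return rows


for row in timing_diagram(4):
    print(row)
```

Each row is shifted one column to the right of the row above it, which is the staggered shape the timing diagram illustrates.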
Pipeline Performance: Clock & Timing
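Figure: adjacent stages $S_i$ and $S_{i+1}$ separated by a latch.
Stage delay : $\tau_m$
Latch delay : $d$
Clock period : $\tau = \max\{\tau_m\} + d$
Pipeline frequency : $f = 1/\tau$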
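The clock relations above can be evaluated numerically. The sketch below assumes delays given in nanoseconds and uses made-up stage delays purely as an example; the function name `pipeline_clock` is not from the slides.

```python
def pipeline_clock(stage_delays_ns, latch_delay_ns):
    """Return (clock period in ns, frequency in MHz) for a pipeline.

    The period is set by the slowest stage plus the latch delay;
    the frequency is the reciprocal of the period.
    """
    cycle_time_ns = max(stage_delays_ns) + latch_delay_ns
    frequency_mhz = 1000.0 / cycle_time_ns  # 1 / (1 ns) = 1000 MHz
    return cycle_time_ns, frequency_mhz


# Hypothetical delays: four stages of 2, 3, 4 and 3 ns, latch delay 1 ns.
period, freq = pipeline_clock([2.0, 3.0, 4.0, 3.0], 1.0)
print(period, freq)  # 5.0 200.0
```

Because the slowest stage fixes the period for every stage, balancing stage delays matters more than shortening an already fast stage.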
Advantages
• Delayed branch
– Uses a branch that does not take effect until
after the following instruction has executed
– That following instruction occupies the delay slot
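The difference in execution order can be sketched with a toy simulator. This is only an illustration under simplifying assumptions: instructions are (opcode, argument) pairs, only an unconditional "jump" is modeled, and a jump placed in a delay slot is ignored. The instruction names and the `run` function are hypothetical.

```python
def run(program, delayed):
    """Return the indices of the instructions in the order they execute."""
    executed = []
    pc = 0
    pending_target = None  # jump target waiting for the delay slot to finish
    while pc < len(program):
        op, arg = program[pc]
        executed.append(pc)
        next_pc = pc + 1
        if pending_target is not None:
            # This instruction was the delay slot; the jump takes effect now.
            next_pc = pending_target
            pending_target = None
        elif op == "jump":
            if delayed:
                pending_target = arg  # defer the jump by one instruction
            else:
                next_pc = arg         # normal branch: transfer immediately
        pc = next_pc
    return executed


program = [
    ("load", "A"),   # 0
    ("jump", 4),     # 1
    ("add", "A"),    # 2: the delay slot when branches are delayed
    ("sub", "A"),    # 3: skipped in both cases
    ("store", "A"),  # 4
]

print(run(program, delayed=False))  # [0, 1, 4]
print(run(program, delayed=True))   # [0, 1, 2, 4]
```

With the delayed branch, instruction 2 does useful work in the cycle that a normal branch would otherwise waste.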
Normal vs Delayed Branch