Module Outline:
• Pipelined Data Path Design
• Pipelined Control Path Design
• Hazards
• Exception Handling
stall = delay, i.e. the pipelined computer incurs a delay whenever there are
control or data dependences between instructions
ELEC6036 (Vincent Tam) Module 1: Pipelining Page 5
The Five Stages of Load
• As the instruction execution cycle uses different hardware components for different
steps, it is possible to set up a pipeline.
Pipeline stages:
  P1: Instruction fetch unit
  P2: Instruction analyzer
  P3: Address calculation unit
  P4: Data fetch unit
  P5: Instruction execution unit

Instruction number occupying each stage at each time step:
  Time:  1  2  3  4  5  6  7  8
  P1:    1  2  3  4  5  6  7  8
  P2:       1  2  3  4  5  6  7
  P3:          1  2  3  4  5  6
  P4:             1  2  3  4  5
  P5:                1  2  3  4
• Suppose each stage takes 1 nsec (nanosecond). Then each instruction still takes
5 nsec. But once the pipeline has been filled, a complete instruction rolls out
every 1 nsec. Thus, the ideal speedup is 5.
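The timing argument above can be checked with a small calculation. This is a minimal sketch, assuming an ideal pipeline with no stalls and equal stage times; the function names are made up for illustration and do not come from the notes.

```python
def pipeline_time_ns(n_instructions, n_stages=5, stage_ns=1.0):
    """Total time for n instructions on an ideal pipeline (assumes no stalls)."""
    if n_instructions == 0:
        return 0.0
    # The first instruction needs n_stages cycles to fill the pipeline;
    # every later instruction completes one cycle after its predecessor.
    return (n_stages + n_instructions - 1) * stage_ns


def unpipelined_time_ns(n_instructions, n_stages=5, stage_ns=1.0):
    """Total time when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages * stage_ns


def speedup(n_instructions, n_stages=5, stage_ns=1.0):
    """Ratio of unpipelined to pipelined execution time."""
    return (unpipelined_time_ns(n_instructions, n_stages, stage_ns)
            / pipeline_time_ns(n_instructions, n_stages, stage_ns))


print(pipeline_time_ns(8))          # 12.0
print(round(speedup(8), 2))         # 3.33
print(round(speedup(1_000_000), 2)) # 5.0
```

For the 8 instructions in the diagram the speedup is only about 3.33, because filling the pipeline costs 4 extra cycles; the figure of 5 is the limit for a long instruction stream.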
Design Issues:
• We have to make sure that the same resource (e.g., ALU) is not used in more than
one pipeline stage.
• If different pipeline stages use different resources, then the stages of
successive instructions can overlap.
• However, we must note that to retain the intermediate values produced by an
individual instruction for all its pipeline stages, we must include temporary registers
between the pipeline stages.
Program execution order    Time (in clock cycles)
(in instructions)          CC1   CC2   CC3   CC4   CC5   CC6   CC7
lw $1, 100($0)             IM    Reg   ALU   DM    Reg
lw $2, 200($0)                   IM    Reg   ALU   DM    Reg
lw $3, 300($0)                         IM    Reg   ALU   DM    Reg
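The staggered execution of the three lw instructions, and the role of the temporary registers between stages, can be sketched in a few lines. This is an illustrative model only: the list `regs` stands in for the inter-stage registers, the stage names IF/ID/EX/MEM/WB are used in place of IM/Reg/ALU/DM/Reg, and the function name `run` is an assumption.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def run(instructions, cycles):
    """Return, for each clock cycle, which instruction occupies each stage."""
    # regs[i] holds the instruction currently in stage i; it plays the role
    # of the temporary registers that sit between the pipeline stages.
    regs = [None] * len(STAGES)
    pending = list(instructions)
    history = []
    for _ in range(cycles):
        # On each clock edge, every stage hands its contents to the next one
        # and a new instruction (if any) enters the first stage.
        regs = [pending.pop(0) if pending else None] + regs[:-1]
        history.append(dict(zip(STAGES, regs)))
    return history


program = ["lw $1, 100($0)", "lw $2, 200($0)", "lw $3, 300($0)"]
for cc, snapshot in enumerate(run(program, 7), start=1):
    busy = {stage: instr for stage, instr in snapshot.items() if instr}
    print(f"CC{cc}: {busy}")
```

Each printed line corresponds to one column (CC1 to CC7) of the diagram: the first lw reaches WB in CC5 and the third in CC7.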
* CC#4 (IF): conflict of interests!
BEQ - Branch on Equal, e.g. if (R1 == 0)?
e.g. if 30% of the instructions are branches, then the total CPI = 1.5 * 0.3 + 1.0 * 0.7 = 1.15
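The CPI figure above is a weighted average over the instruction mix. A minimal sketch, using the numbers from the example (30% branches at 1.5 cycles, 70% other instructions at 1.0 cycle); the function name `effective_cpi` is an assumption:

```python
def effective_cpi(mix):
    """mix: list of (fraction_of_instructions, cpi) pairs; fractions must sum to 1."""
    total = sum(fraction for fraction, _ in mix)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("instruction fractions must sum to 1")
    # Weighted average: each class contributes fraction * cycles-per-instruction.
    return sum(fraction * cpi for fraction, cpi in mix)


cpi = effective_cpi([(0.3, 1.5), (0.7, 1.0)])
print(round(cpi, 2))  # 1.15
```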
(Figure: data hazard on R1. Labels: or r8, r1, r9; (W on R1); (R on R1); (Read on R1 with the new/updated value!))
(Figures: datapath diagram and control diagram, with the stages labelled IF, ID/RF, EX, MEM and WB)
• Insert a “bubble” into the pipeline to prevent 2 writes in the same cycle
- the control logic can be complex
- lose instruction fetch and issue opportunity
• No instruction is started in Cycle 6!!
* Add a NOOP "Mem" stage for R-type and ORi-type
* NOOP for WB
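The effect of the NOOP "Mem" stage can be illustrated by computing the cycle in which each instruction writes the register file. This sketch assumes that a load takes five stages, that an unpadded R-type takes four (no Mem), and that one instruction is issued per cycle; the names `STAGES`, `PADDED` and `write_conflicts` are hypothetical.

```python
# Assumed stage sequences; the unpadded R-type skips Mem and writes one cycle earlier.
STAGES = {
    "load":   ["Ifetch", "Reg/Dec", "Exec", "Mem", "Wr"],
    "r-type": ["Ifetch", "Reg/Dec", "Exec", "Wr"],
}
# Padded version: R-type gets a do-nothing Mem stage so that it also has five stages.
PADDED = {**STAGES, "r-type": ["Ifetch", "Reg/Dec", "Exec", "Mem(noop)", "Wr"]}


def write_cycles(program, stages):
    """Cycle (1-based) in which each instruction reaches Wr, issuing one per cycle."""
    return [issue + stages[kind].index("Wr")
            for issue, kind in enumerate(program, start=1)]


def write_conflicts(program, stages):
    """Cycles in which more than one instruction tries to write the register file."""
    cycles = write_cycles(program, stages)
    return sorted({c for c in cycles if cycles.count(c) > 1})


program = ["r-type", "load", "r-type", "r-type"]
print(write_conflicts(program, STAGES))  # [6]
print(write_conflicts(program, PADDED))  # []
```

With the padded stages every instruction writes exactly four cycles after it is fetched, so two writes can never land in the same cycle and no bubble is needed.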
• Ifetch: Instruction Fetch
- fetch the instruction from the instruction memory
• Reg/Dec: registers fetch and instruction decode
• Exec: calculate the memory address
• Mem: write the data into the data memory
EX - Execution stage
ID - Instruction Decode
IF - Instruction Fetch
PC - Program Counter
index addressing: r2 (base addr) + offset (35)
* No Mem access is needed (NOOP)
* if (r6 == r7) then jump to the inst at 100!
- ori is a misc (or independent) instr. to be inserted after BEQ
* Due to the first instr. that is completed
* NOOP! sub does not need any MEM operation!
* NOOP! - : no need!
(Other figure labels: 104, 15, r14, 110, 114)
common register as R2
* WB on R2 by 2nd instr should be AFTER WB on R2 by the 3rd instr
* WB on R3 by the last instr should be AFTER reading on R3 by the 4th instr,
i.e. RS should be after the OF in the above diagram!
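Ordering constraints of this kind follow from which registers two instructions read and write. The sketch below classifies the dependence between an earlier and a later instruction using the standard names RAW, WAR and WAW, which the notes themselves do not use. The `Instr` tuple and all example instructions except `or r8, r1, r9` are made up for illustration.

```python
from collections import namedtuple

# dest is the register written (or None); srcs are the registers read.
Instr = namedtuple("Instr", "text dest srcs")


def dependences(earlier, later):
    """Return the set of dependence types from `earlier` to `later`."""
    found = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        found.add("RAW")  # later must read after earlier writes
    if later.dest is not None and later.dest in earlier.srcs:
        found.add("WAR")  # later must write after earlier reads
    if earlier.dest is not None and earlier.dest == later.dest:
        found.add("WAW")  # the two writes must stay in program order
    return found


# Hypothetical instructions sharing R1, R2 and R3.
sub_r1 = Instr("sub r1, r2, r3", "r1", ("r2", "r3"))
or_r8 = Instr("or r8, r1, r9", "r8", ("r1", "r9"))
add_r2 = Instr("add r2, r4, r5", "r2", ("r4", "r5"))
and_r2 = Instr("and r2, r6, r7", "r2", ("r6", "r7"))
xor_r9 = Instr("xor r9, r3, r4", "r9", ("r3", "r4"))
ori_r3 = Instr("ori r3, r5, 15", "r3", ("r5",))

print(dependences(sub_r1, or_r8))   # {'RAW'}
print(dependences(add_r2, and_r2))  # {'WAW'}
print(dependences(xor_r9, ori_r3))  # {'WAR'}
```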
Data Hazards
(Figure labels: EX, MEM, WB)
MUX - Multiplexer
* Scheduled version: the instructions are rescheduled so as to reduce the load stalls
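The benefit of rescheduling can be illustrated by counting load stalls before and after reordering. This sketch assumes that a load costs one stall cycle when the very next instruction uses its result; the example program, its register numbers, and the function name `load_stalls` are hypothetical and not taken from the notes.

```python
def load_stalls(program):
    """Count stalls where a load is immediately followed by a user of its result.

    Each instruction is a tuple (op, dest, srcs).
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        op, dest, _ = prev
        if op == "lw" and dest in cur[2]:
            stalls += 1
    return stalls


# Hypothetical code for: a = b + c; d = e - f
original = [
    ("lw", "r1", ()), ("lw", "r2", ()),
    ("add", "r3", ("r1", "r2")), ("sw", None, ("r3",)),
    ("lw", "r4", ()), ("lw", "r5", ()),
    ("sub", "r6", ("r4", "r5")), ("sw", None, ("r6",)),
]
# Same instructions, reordered so that no load is directly followed by its user.
scheduled = [
    ("lw", "r1", ()), ("lw", "r2", ()), ("lw", "r4", ()),
    ("add", "r3", ("r1", "r2")), ("lw", "r5", ()),
    ("sw", None, ("r3",)), ("sub", "r6", ("r4", "r5")),
    ("sw", None, ("r6",)),
]

print(load_stalls(original))   # 2
print(load_stalls(scheduled))  # 0
```

The reordering only moves independent instructions between a load and its first use, so the program's result is unchanged.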
• External interrupts:
- allow pipeline to drain
- load PC with interrupt address
• Faults (within instruction, restartable)
- force trap instruction into IF
- disable writes till trap hits WB
- must save multiple PCs or PC + state
• Recall: precise exceptions ==> state of the machine is preserved as if program
executed up to the offending instruction
- all previous instructions completed
- offending instruction and all following instructions act as if they have not even started
- same system code will work on different implementations
• Load with data page fault, Add with instruction page fault?
• Solution 1: interrupt vector/instruction
• Solution 2: interrupt ASAP, restart everything incomplete
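The load/add case can be made concrete with a small timing sketch. It rests on one reading of the two solutions that the notes do not spell out: under Solution 1 each instruction carries its own exception status and it is examined only at WB, so faults are taken in program order; under Solution 2 the fault raised earliest in time is taken first. The stage offsets, policy names and function names are illustrative assumptions.

```python
# Cycles after issue at which each stage is reached (assumed five-stage pipeline).
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}


def raise_cycle(issue_cycle, stage):
    """Clock cycle in which a fault in the given stage is detected."""
    return issue_cycle + STAGE_OFFSET[stage]


def taken_first(faults, policy):
    """faults: list of (program_index, issue_cycle, stage, description)."""
    if policy == "status-vector":
        # Status is checked at WB, which instructions reach in program order.
        chosen = min(faults, key=lambda f: f[0])
    elif policy == "asap":
        # The fault detected earliest in time interrupts first.
        chosen = min(faults, key=lambda f: raise_cycle(f[1], f[2]))
    else:
        raise ValueError(f"unknown policy: {policy}")
    return chosen[3]


faults = [
    (0, 1, "MEM", "load: data page fault"),        # detected in cycle 4
    (1, 2, "IF", "add: instruction page fault"),   # detected in cycle 2
]
print(taken_first(faults, "status-vector"))  # load: data page fault
print(taken_first(faults, "asap"))           # add: instruction page fault
```

The add's fault occurs two cycles before the load's even though the load comes first in program order, which is why handling faults strictly as they occur does not by itself give precise exceptions.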
(Figure annotations: Write, Read (WB); vector-based computers)