
The Processor

Lecture # 9 & 10
Course Instructor: Dr. Afshan Jamil
Outline
• An overview of pipelining
• Pipelining analogy
• RISC-V Pipeline
• Pipeline performance example
• Pipeline speedup
• Pipeline and ISA design
• Pipeline hazards
• Structural hazard
• Data hazard
• Control hazard
• Pipeline summary
An overview of pipelining
• Pipelining is an implementation technique in which
multiple instructions are overlapped in execution.
• All steps in a task, called stages in pipelining, operate
concurrently.
• If we have separate resources for each stage, we can
pipeline the tasks.
• Pipelining improves performance by increasing
instruction throughput, as opposed to decreasing the
execution time of an individual instruction.
CONTD…

• If all the stages take about the same amount of time and there is enough work to do, then the speed-up due to pipelining equals the number of stages in the pipeline.
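This ideal speed-up can be checked with a minimal model (a sketch, not from the slides): n tasks through k stages of equal time take n·k cycles sequentially, but only k + n − 1 cycles pipelined, so the ratio approaches k as n grows.

```python
def sequential_time(n_tasks, n_stages, stage_time):
    # Every task runs all stages to completion before the next task starts.
    return n_tasks * n_stages * stage_time

def pipelined_time(n_tasks, n_stages, stage_time):
    # The first task takes n_stages cycles to drain;
    # each later task completes one cycle after the previous one.
    return (n_stages + n_tasks - 1) * stage_time

# With 5 balanced stages and many tasks, the speed-up approaches 5.
speedup = sequential_time(1000, 5, 1) / pipelined_time(1000, 5, 1)
print(round(speedup, 2))  # → 4.98
```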
Pipelining Analogy
• Pipelined laundry: overlapping execution
– Parallelism improves performance
RISC-V Pipeline

• Five stages, one step per stage:
1. IF: Instruction fetch from memory
2. ID: Instruction decode & register read
3. EX: Execute operation or calculate address
4. MEM: Access memory operand
5. WB: Write result back to register
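The overlap of these five stages can be sketched with a hypothetical helper (not part of the slides): with one instruction issuing per cycle and no stalls, instruction i occupies stage s during cycle i + s.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr_index, cycle):
    """Return the stage instruction instr_index occupies in a given cycle,
    assuming one instruction issues per cycle and no stalls."""
    s = cycle - instr_index
    return STAGES[s] if 0 <= s < len(STAGES) else None

# In cycle 2, three instructions are in flight simultaneously:
print([stage_of(i, 2) for i in range(3)])  # → ['EX', 'ID', 'IF']
```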
Pipeline performance example

• Contrast the average time between instructions of a single-cycle implementation, in which all instructions take one clock cycle, with a pipelined implementation. Assume that the operation times for the major functional units in this example are 200 ps for a memory access (instructions or data), 200 ps for an ALU operation, and 100 ps for a register file read or write. In the single-cycle model, every instruction takes exactly one clock cycle, so the clock cycle must be stretched to accommodate the slowest instruction.
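Plugging in the stage times above: the single-cycle clock must cover all five steps, while the pipelined clock only has to cover the slowest step. A quick check:

```python
# Functional-unit times from the example, in picoseconds.
stage_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

single_cycle_tc = sum(stage_ps.values())  # clock must fit the whole instruction
pipelined_tc = max(stage_ps.values())     # clock must fit only the slowest stage

print(single_cycle_tc, pipelined_tc)  # → 800 200
```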
CONTD…

Single-cycle (Tc = 800 ps): [timing diagram — each instruction occupies one full 800 ps clock cycle]
CONTD…

Pipelined (Tc = 200 ps): [timing diagram — a new instruction starts every 200 ps, with stages overlapping]
Pipeline Speedup
• If all stages are balanced (i.e., all take the same time):

  Time between instructions (pipelined)
      = Time between instructions (nonpipelined) / Number of stages

• If not balanced, speedup is less
• Speedup comes from increased throughput
– Latency (time for each instruction) does not decrease
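For the example above, the stages are not balanced: the 100 ps stages are stretched to the 200 ps clock, so the long-run speed-up is 800/200 = 4, not the stage count of 5. A hedged sketch of that calculation:

```python
def total_time_ps(n_instructions, cycle_ps, n_stages=5):
    # Pipeline fill (n_stages cycles for the first instruction),
    # then one instruction completes per cycle.
    return (n_stages + n_instructions - 1) * cycle_ps

n = 1_000_000
# Single-cycle: every instruction takes the full 800 ps.
speedup = (n * 800) / total_time_ps(n, 200)
print(round(speedup, 2))  # → 4.0
```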
Pipelining and ISA Design

• RISC-V ISA designed for pipelining
– All instructions are the same length
• Easier to fetch in one stage and decode in the second stage
• Contrast x86: 1- to 15-byte instructions
– Few and regular instruction formats
• Can decode and read registers in one step
CONTD…

– Load/store addressing
• Can calculate address in 3rd stage, access memory
in 4th stage
– Alignment of memory operands
• Memory access takes only one cycle
Pipeline Hazards

• Situations that prevent starting the next instruction in the next cycle
• Structural hazard
– A required resource is busy. It means that the
hardware cannot support the combination of
instructions that we want to execute in the same
clock cycle
CONTD…
• Data hazard
– When a planned instruction cannot execute in the
proper clock cycle because data that is needed to
execute the instruction is not yet available
• Control hazard
– When the proper instruction cannot execute in the
proper pipeline clock cycle because the instruction
that was fetched is not the one that is needed.
Structural Hazards
• When a planned instruction cannot execute in the proper
clock cycle because the hardware does not support the
combination of instructions that are set to execute.
• Conflict for use of a resource
• In RISC-V pipeline with a single memory
– Load/store requires data access
– Instruction fetch would have to stall for that cycle
• Hence, pipelined data paths require separate
instruction/data memories.
Structural Hazards
E.g., suppose a single (not separate) instruction and data memory in the pipeline below, with one read port
– then there is a structural hazard between the first and fourth instructions
[Pipeline diagram: four loads,
  ld x7, 100(x22)
  ld x8, 200(x22)
  ld x9, 300(x22)
  ld x9, 400(x22)
each pass through Instruction fetch, Reg, ALU, Data access, and Reg stages of 2 ns, with one load starting every 2 ns. With a single memory, the Data access of the first ld and the Instruction fetch of the fourth ld fall in the same cycle, causing the structural hazard.]

• RISC-V was designed to be pipelined: structural hazards are easy to avoid!
Data Hazards
• An instruction depends on completion of data access by a
previous instruction
– add x1, x2, x3
sub x4, x1, x5
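The dependence can be spotted mechanically: sub reads x1 in ID before add has written it back in WB. A minimal read-after-write detector (a sketch; the tuple encoding of instructions here is invented for illustration):

```python
def raw_hazard(producer, consumer):
    """True if consumer reads a register that producer writes (read-after-write)."""
    dest = producer[1]
    srcs = consumer[2]
    return dest is not None and dest in srcs

# (opcode, destination register, source registers)
add_inst = ("add", "x1", ("x2", "x3"))
sub_inst = ("sub", "x4", ("x1", "x5"))

print(raw_hazard(add_inst, sub_inst))  # → True
```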
Forwarding (Bypassing)
• Use result when it is computed
– Don’t wait for it to be stored in a register
– Requires extra connections in the datapath
Load-Use Data Hazard
• Can’t always avoid stalls by forwarding
– If value not computed when needed
– Can’t forward backward in time!
Code Scheduling to Avoid Stalls (Software
Solution)
• Reorder code to avoid use of load result in the next instruction
• C code for a = b + e; c = b + f;

        ld x1, 0(x31)               ld x1, 0(x31)
        ld x2, 8(x31)               ld x2, 8(x31)
stall   add x3, x1, x2              ld x4, 16(x31)
        sd x3, 24(x31)              add x3, x1, x2
        ld x4, 16(x31)              sd x3, 24(x31)
stall   add x5, x1, x4              add x5, x1, x4
        sd x5, 32(x31)              sd x5, 32(x31)

        13 cycles                   11 cycles
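The 13- and 11-cycle counts can be reproduced with a toy model (assumptions: a 5-stage pipeline with full forwarding, so only the load-use case costs one stall cycle):

```python
def count_cycles(program, n_stages=5):
    """program: list of (op, dest, sources). Cycles = pipeline fill + one
    cycle per instruction + one stall whenever an instruction uses the
    destination of the immediately preceding load."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev[0] == "ld" and prev[1] in cur[2]:
            stalls += 1
    return n_stages + len(program) - 1 + stalls

original = [
    ("ld", "x1", ()), ("ld", "x2", ()),
    ("add", "x3", ("x1", "x2")), ("sd", None, ("x3",)),
    ("ld", "x4", ()),
    ("add", "x5", ("x1", "x4")), ("sd", None, ("x5",)),
]
reordered = [
    ("ld", "x1", ()), ("ld", "x2", ()), ("ld", "x4", ()),
    ("add", "x3", ("x1", "x2")), ("sd", None, ("x3",)),
    ("add", "x5", ("x1", "x4")), ("sd", None, ("x5",)),
]
print(count_cycles(original), count_cycles(reordered))  # → 13 11
```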
CONTD…

Reordering Code to Avoid Pipeline Stall

• Example:
  ld x6, 0(x22)
  ld x8, 4(x22)
  sd x8, 0(x22)    ← data hazard: sd needs x8 immediately after the load
  sd x9, 4(x22)

• Reordered code:
  ld x6, 0(x22)
  ld x8, 4(x22)
  sd x9, 4(x22)    ← interchanged
  sd x8, 0(x22)    ← interchanged
Control Hazards

• Branch determines flow of control
– Fetching the next instruction depends on the branch outcome
– Pipeline can't always fetch the correct instruction: it is still working on the ID stage of the branch
• In RISC-V pipeline
– Need to compare registers and compute target
early in the pipeline
– Add hardware to do it in ID stage
Stall on Branch
• Wait until branch outcome determined
before fetching next instruction
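The cost of stalling on every branch can be estimated: if each branch adds p stall cycles and a fraction b of instructions are branches, CPI grows from 1 to 1 + b·p. The 17% branch frequency below is an assumed illustrative figure, not from the slides:

```python
def cpi_with_branch_stalls(branch_fraction, stall_cycles, base_cpi=1.0):
    # Every branch instruction adds stall_cycles bubbles on average.
    return base_cpi + branch_fraction * stall_cycles

# Assumed: 17% of instructions are branches, each costing 1 stall cycle.
print(cpi_with_branch_stalls(0.17, 1))  # → 1.17
```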
Branch Prediction
• Longer pipelines can’t readily determine branch
outcome early
– Stall penalty becomes unacceptable
• Predict outcome of branch
– Only stall if prediction is wrong
• In RISC-V pipeline
– Can predict branches not taken
– Fetch instruction after branch, with no delay
RISC-V with Predict Not Taken

[Diagrams: when the prediction is correct, the pipeline continues with no delay; when incorrect, the wrongly fetched instruction is discarded and fetch restarts at the branch target, inserting a bubble.]
More-Realistic Branch Prediction
• Static branch prediction
– Based on typical branch behavior
– Example: loop and if-statement branches
• Predict backward branches taken
• Predict forward branches not taken
• Dynamic branch prediction
– Hardware measures actual branch behavior
• e.g., record recent history of each branch
– Assume future behavior will continue the trend
• When wrong, stall while re-fetching, and update history
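A common dynamic scheme is a 2-bit saturating counter per branch, so two wrong predictions in a row are needed before the prediction flips. A minimal sketch (an illustration, not the exact hardware described on the slides):

```python
class TwoBitPredictor:
    """States 0-1 predict not-taken, 2-3 predict taken; updates saturate."""

    def __init__(self, state=2):
        self.state = state  # start weakly-taken (an arbitrary choice)

    def predict(self):
        return self.state >= 2  # True means "predict taken"

    def update(self, taken):
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)

# A loop branch: taken nine times, then falls through once.
p = TwoBitPredictor()
outcomes = [True] * 9 + [False]
correct = sum(p.predict() == t or p.update(t) for t in outcomes if (c := p.predict() == t) or True and (p.update(t) is None) and False) if False else 0
correct = 0
for t in outcomes:
    correct += (p.predict() == t)
    p.update(t)
print(correct)  # → 9
```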
Pipeline Summary
The BIG Picture

• Pipelining improves performance by increasing instruction throughput
– Executes multiple instructions in parallel
– Each instruction has the same latency
• Subject to hazards
– Structural, data, control
• Instruction set design affects complexity of pipeline
implementation
