Lecture - 17 - MIPS - Instruction Level Parallelism
MEL G642
Intro
1. Branch prediction eliminates most control dependencies.
2. But data dependencies make it harder to finish an instruction in a single cycle.
3. How to further enhance the performance of the processor?
   - Why do we execute only one instruction per cycle?
   - Why not execute more than one instruction per cycle?
   - Issue multiple instructions per cycle, decode them, execute them.
Q. How can multiple instructions possibly be executed in a given cycle?
Instruction Level Parallelism
Two primary methods to increase ILP:
1. Deeper pipeline
   - To overlap more instructions.
   - How to resolve hazards? Data dependencies have been resolved up to a certain extent.
   - CPI ≈ 1 is due to very few branch mispredictions.
   - Even though we have CPI ≈ 1, we can still improve the instructions per cycle.
2. Multiple issue
   - Issue multiple instructions in a cycle and execute multiple instructions.
   - Replicate the processor components (so this is not a multicore processor).
   - CPI < 1, so use IPC: instructions per cycle.
Concept of Speculation
An approach that allows the compiler or the processor to guess about the properties of other instructions.
[Figure: superscalar processor, with instructions issued to multiple execution units]
All this is ideally possible.
Approach I
- Multiple issue: dynamic multiple issue.
- Processor chooses instruction(s) to execute in each cycle.
- Compiler can help by reordering instructions.
- Processor resolves hazards without compiler help.
- A superscalar processor is an example; it may use in-order execution or out-of-order execution.
- Pentium processors are also superscalar.
What should be the ideal CPI for a pipelined processor? Ideal CPI = 1.
CPI calculation (superscalar): if 5 is the number of instructions which can be executed in parallel in 1 cycle, then CPI = 1/5 = 0.2.
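Issues
All instructions cannot be executed in the same cycle due to dependencies.
[Figure: five instructions with RAW dependencies; a dependent instruction stalls until the value it reads has been produced]
Considering RAW dependencies, the 5 instructions need more than one cycle. If they take 5 cycles, CPI = 5/5 = 1; if they fit in 3 cycles, CPI = 3/5 = 0.6.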
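The effect of RAW dependencies on CPI can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: unlimited execution units, one cycle per instruction, and only RAW dependencies enforced. The five-instruction program and its register numbers are hypothetical and are not taken from the slide.

```python
from typing import Dict, List, Tuple

# An instruction is (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def raw_schedule(program: List[Instr]) -> List[int]:
    """Earliest execution cycle (1-based) of each instruction, honoring RAW only."""
    produced_in: Dict[str, int] = {}  # register -> cycle of its latest producer
    cycles = []
    for dest, srcs in program:
        # An instruction executes one cycle after its latest producer.
        cycle = 1 + max((produced_in.get(s, 0) for s in srcs), default=0)
        cycles.append(cycle)
        produced_in[dest] = cycle
    return cycles


# Hypothetical program (not the one drawn on the slide).
program = [
    ("R1", ("R2", "R3")),     # I1
    ("R4", ("R1", "R5")),     # I2: RAW on R1 (from I1)
    ("R6", ("R7", "R8")),     # I3
    ("R9", ("R4", "R6")),     # I4: RAW on R4 (I2) and R6 (I3)
    ("R10", ("R11", "R12")),  # I5
]

cycles = raw_schedule(program)
total_cycles = max(cycles)
cpi = total_cycles / len(program)
print(cycles, total_cycles, cpi)
```

With this program, I1, I3 and I5 share the first cycle, I2 follows in the second and I4 in the third, so the five instructions take 3 cycles.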
WAW Dependencies
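- Write-after-write (WAW) dependency: the same register is written by more than one instruction.
[Figure: pipeline diagram (E, M, WB) over cycles 5, 6, 7. An earlier instruction that writes R4 is delayed, so R4 is updated first by the later instruction and then by the earlier one.]
- The value written last by the earlier instruction should not be the final value of R4; the final value of R4 must be the one from the later instruction.
- The processor has to figure out some way to WB the value in a later cycle (will be seen later).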
Example: Dependency Quiz
Consider a processor with a 5-stage pipeline and forwarding that can issue and execute 10 instructions in one cycle. In which cycle do we EXE and WB for each of the following instructions? Assume that the instructions are issued in the 0th cycle.

Given: 5-stage pipeline, forwarding, can issue 10 instructions per cycle.
An instruction with no dependencies: cycle 0 fetch, 1 decode, 2 EXE, 3 MEM, 4 WB.

Instruction       EXE   WB
MUL R2,R2,R2       2     4
ADD R1,R1,R2       3     5    (RAW on R2)
MUL R3,R3,R3       2     4
ADD R1,R1,R3       4     6    (RAW on R1 and R3)
MUL R4,R4,R4       2     4
ADD R1,R1,R4       5     7    (RAW on R1 and R4)

Write the EXE cycle first; the other stages will be set accordingly.
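The quiz answers can be checked with a short Python sketch. It assumes what the quiz states (fetch in cycle 0, decode in cycle 1, full forwarding, enough issue width for all six instructions) plus two simplifications of its own: every EXE takes one cycle, and WB always follows EXE by two cycles. The function name `quiz_schedule` is illustrative.

```python
def quiz_schedule(program):
    """Return a list of (exe_cycle, wb_cycle), one pair per instruction.

    Assumes fetch at cycle 0, decode at cycle 1, one-cycle EXE, full
    forwarding (a consumer can EXE the cycle after its producer's EXE),
    and WB two cycles after EXE.
    """
    exe_of_writer = {}  # register -> EXE cycle of its latest writer
    schedule = []
    for _op, dest, src1, src2 in program:
        exe = 2  # earliest possible EXE after fetch (0) and decode (1)
        for src in (src1, src2):
            if src in exe_of_writer:
                # RAW: wait until the producer has executed.
                exe = max(exe, exe_of_writer[src] + 1)
        schedule.append((exe, exe + 2))
        exe_of_writer[dest] = exe
    return schedule


program = [
    ("MUL", "R2", "R2", "R2"),
    ("ADD", "R1", "R1", "R2"),
    ("MUL", "R3", "R3", "R3"),
    ("ADD", "R1", "R1", "R3"),
    ("MUL", "R4", "R4", "R4"),
    ("ADD", "R1", "R1", "R4"),
]

schedule = quiz_schedule(program)
for (op, dest, s1, s2), (exe, wb) in zip(program, schedule):
    print(f"{op} {dest},{s1},{s2}: EXE={exe} WB={wb}")
```

The three MUL instructions are independent and all execute in cycle 2, while the ADD instructions form a chain through R1 and execute in cycles 3, 4 and 5.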
Removing False Dependencies
- RAW: TRUE dependencies. We need to obey them because the program computes that way; we may delay the EXE or use forwarding.
- WAR, WAW: false dependencies. We can execute if we take care of using the same register for multiple writes.
- If we can find a way by which waiting on the same register can be avoided, then false dependencies can be removed.
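To make the distinction concrete, here is a minimal Python sketch that classifies the dependencies between an earlier and a later instruction. The tuple representation and the function name `dependencies` are assumptions made for illustration only.

```python
def dependencies(earlier, later):
    """Classify register dependencies from `earlier` to `later`.

    Each instruction is (destination register, tuple of source registers).
    Returns a set containing any of "RAW", "WAR", "WAW".
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = set()
    if e_dest in l_srcs:
        found.add("RAW")  # later reads what earlier writes: true dependency
    if l_dest in e_srcs:
        found.add("WAR")  # later overwrites what earlier reads: false dependency
    if l_dest == e_dest:
        found.add("WAW")  # both write the same register: false dependency
    return found


mul1 = ("R2", ("R2", "R2"))  # MUL R2,R2,R2
add1 = ("R1", ("R1", "R2"))  # ADD R1,R1,R2
mul2 = ("R2", ("R4", "R4"))  # MUL R2,R4,R4

print(dependencies(mul1, add1))
print(dependencies(add1, mul2))
print(dependencies(mul1, mul2))
```

Only the RAW case carries a value from one instruction to the other; the WAR and WAW cases arise purely because the register name R2 is reused.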
Duplicating Register Values
Instructions issued in the same cycle:
ADD  R1,R2,R3
SUB  R4,R1,R5
ADDI R3,R4,01
SUB  R4,R8,R9
DIV  R10,R4,R11
[Figure: E, MEM and WB stages of these instructions over cycles C100, C101, ...]
SUB R4,R8,R9 does not wait on any earlier instruction, so its execution happens earlier and not in program order.
Register Renaming
Concept of physical registers
• Architectural registers: registers for programmer/compiler use (the programmer visualizes these registers).
• Physical registers: all places where the processor can put values.
• In parallel with FETCH and DECODE, the processor can rewrite the program to use physical registers.
This is register renaming. For this, a Register Allocation Table is needed.
Example (the dependencies marked between the instructions are RAW, WAW and WAR):
MUL R2,R2,R2
ADD R1,R1,R2
MUL R2,R4,R4
ADD R3,R3,R2
MUL R2,R6,R6
ADD R5,R5,R2
[Figure: Register Allocation Table mapping architectural registers R1 to R8 to physical registers, with the renamed instructions written alongside]
Without renaming, the example given is purely sequential.
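Renaming the six-instruction example can be sketched in Python as follows. This is a minimal illustration, not the slide's exact table: it assumes eight architectural registers initially mapped R1 to P1 through R8 to P8, and it hands out a fresh physical register (P9, P10, ...) for every write. The physical register numbers are therefore arbitrary and need not match the figure.

```python
def rename(program, num_arch_regs=8):
    """Rewrite a program to use physical registers.

    `program` is a list of (op, dest, src1, src2) using names R1..Rn.
    Returns (renamed program, final mapping table).
    Assumed initial mapping: Ri -> Pi; each write gets a new P register.
    """
    table = {f"R{i}": f"P{i}" for i in range(1, num_arch_regs + 1)}
    next_phys = num_arch_regs + 1
    renamed = []
    for op, dest, src1, src2 in program:
        # Read the sources through the table BEFORE updating the destination,
        # so an instruction like MUL R2,R2,R2 reads the old R2.
        p_src1, p_src2 = table[src1], table[src2]
        table[dest] = f"P{next_phys}"
        next_phys += 1
        renamed.append((op, table[dest], p_src1, p_src2))
    return renamed, table


program = [
    ("MUL", "R2", "R2", "R2"),
    ("ADD", "R1", "R1", "R2"),
    ("MUL", "R2", "R4", "R4"),
    ("ADD", "R3", "R3", "R2"),
    ("MUL", "R2", "R6", "R6"),
    ("ADD", "R5", "R5", "R2"),
]

renamed, table = rename(program)
for op, dest, s1, s2 in renamed:
    print(f"{op} {dest},{s1},{s2}")
```

After renaming, each MUL writes a different physical register, so the WAW and WAR dependencies on R2 disappear and only the RAW dependency between each MUL and the ADD that follows it remains.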
Cases
Case 1: find the ILP and the IPC when 1 DIV unit and 2 ALU units are present. This processor can issue instructions out of order.
- ILP = 5 instructions / 2 cycles = 2.5
- IPC = instructions / cycles actually taken with these units.
Case 2: 1 unit for DIV and 1 ALU.
- ILP = 5/2 = 2.5 (same)
- IPC = 5/4 = 1.25
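The difference between ILP (limited only by dependencies) and IPC (limited also by the available units) can be sketched with a small greedy scheduler in Python. The five-instruction program below is hypothetical, invented so that the arithmetic gives 5/2 and 5/4; it is not the program from the slide. The sketch also assumes one-cycle latency for every unit, out-of-order issue, and that only RAW dependencies matter (as if registers were already renamed).

```python
def producers(program):
    """For each instruction, the indices of the instructions it has a RAW on."""
    last_writer = {}
    deps = []
    for i, (_kind, dest, srcs) in enumerate(program):
        deps.append({last_writer[s] for s in srcs if s in last_writer})
        last_writer[dest] = i
    return deps


def cycles_needed(program, units):
    """Cycles to execute `program` with `units` = {unit kind: count}."""
    for kind, _dest, _srcs in program:
        if units.get(kind, 0) <= 0:
            raise ValueError(f"no unit available for {kind}")
    deps = producers(program)
    done = {}  # instruction index -> cycle in which it executed
    cycle = 0
    while len(done) < len(program):
        cycle += 1
        free = dict(units)  # units available in this cycle
        for i, (kind, _dest, _srcs) in enumerate(program):
            if i in done or free[kind] == 0:
                continue
            # Ready only if every producer finished in an earlier cycle.
            if all(d in done and done[d] < cycle for d in deps[i]):
                done[i] = cycle
                free[kind] -= 1
    return cycle


# Hypothetical program: (unit kind, destination, sources).
program = [
    ("ALU", "R1", ("R2", "R3")),
    ("ALU", "R4", ("R5", "R6")),
    ("DIV", "R7", ("R8", "R9")),
    ("ALU", "R10", ("R1", "R11")),  # RAW on R1
    ("ALU", "R12", ("R4", "R13")),  # RAW on R4
]
n = len(program)

ilp = n / cycles_needed(program, {"ALU": n, "DIV": n})      # unlimited units
ipc_2alu = n / cycles_needed(program, {"ALU": 2, "DIV": 1})
ipc_1alu = n / cycles_needed(program, {"ALU": 1, "DIV": 1})
print(ilp, ipc_2alu, ipc_1alu)
```

For this made-up program, ILP is a property of the code alone and stays at 2.5, while the IPC drops to 1.25 once only one ALU is available, because the four ALU instructions must then execute one per cycle.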
Thank You