Degree of overlap: Serial, Overlapped, Pipelined, Superpipelined/Superscalar
Depth: Shallow, Deep
Structure: Linear, Nonlinear
Scheduling of operations: Static, Dynamic
ASahu 1 ASahu slide2
[Figure: degree of overlap (Serial, Overlapped, Pipelined) and depth (Shallow, Deep)]
[Figure: linear pipeline A B C; nonlinear pipeline A B C with stage sequence A, B, C, B, C, A, C, A]
Static
  same sequence of stages for all instructions
  all actions in order
  if one instruction stalls, all subsequent instructions are delayed
Dynamic
  above conditions are relaxed
  higher throughput is achieved

type 1: beginnings (decode) and endings (putaway) in order
type 2: only beginnings in order
type 3: no order restrictions except dependencies
type 1 extended: beginnings in order, references that affect memory state are in order
  [note that a memory reference may lead to a page fault]
CS521, CSE IITG, 11/23/2012
[Figure: read/write dependence between a previous instruction (EX, W) and the current instruction (R, EX); delay = 3]
1. Data Forwarding / HW Approach
2. Instruction Reordering / SW Approach
Data forwarding path P1
  I:   add $t1,...
  I+1: add $s1,$t1,..
Data forwarding path P2
  I:   lw $t1,...
  I+1: add $s1,$t1,..
[Figure: pipeline diagrams (IM, RF, ALU, DM, RF) for I and I+1; in the P2 diagram I+1 repeats the IM stage]
Data forwarding path P3
  I:   add $t1,...
  I+1: sw $t1,..
Data forwarding path P4
  I:   lw $t1,...
  I+1: sw $t1,..
[Figure: pipeline diagrams (IM, RF, ALU, DM, RF) for I and I+1]
Data forwarding paths
[Figure: pipeline diagrams for I and I+1 with paths P1, P2, P3 and P4 drawn between the stages]

Data forwarding path list
  P1 (ALU to ALU): from ALUout (EX/DM) to ALU in 1/2
  P2: from DM/ALUout (DM/WB) to ALU in 1/2
  P3/P4: from DM/ALUout (DM/WB) to DM in
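
The path list above can be read as a lookup from the (producer, consumer) instruction pair to a forwarding path. The sketch below is only an illustration of that reading: the class names `alu`, `load`, `store` and the function `forwarding_path` are hypothetical, and the one-cycle stall for P2 is an assumption based on the repeated IM stage in the P2 diagram.

```python
# Hypothetical instruction classes: "alu" (e.g. add), "load" (lw), "store" (sw).
# Each entry: (path name, source latch -> destination, assumed stall cycles).
FORWARDING_PATHS = {
    ("alu", "alu"): ("P1", "ALUout (EX/DM) -> ALU in", 0),
    ("load", "alu"): ("P2", "DM/ALUout (DM/WB) -> ALU in", 1),  # assumed load-use stall
    ("alu", "store"): ("P3", "DM/ALUout (DM/WB) -> DM in", 0),
    ("load", "store"): ("P4", "DM/ALUout (DM/WB) -> DM in", 0),
}


def forwarding_path(producer, consumer):
    """Return the forwarding entry for instruction I (producer) and I+1 (consumer).

    Only the four back-to-back cases shown on the slides are covered, where the
    store consumes the forwarded register as its data operand. Any other pair
    returns None.
    """
    return FORWARDING_PATHS.get((producer, consumer))


if __name__ == "__main__":
    for pair in FORWARDING_PATHS:
        print(pair, "->", forwarding_path(*pair))
```

A table keeps the four cases visible side by side; real forwarding logic compares register numbers held in the pipeline latches, which this sketch does not model.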
Caused by Resource Conflicts
  Use of a hardware resource in more than one cycle (A B A C)
  Different sequences of resource usage by different instructions (A B C D vs. A C B D)
  Non-pipelined multicycle resources (F D X X)

Nonlinear Pipeline A B C
Reservation Table for X (required resources of an instruction in each cycle), cycles 1 to 8
  A: X X X
  B: X X
  C: X X X
[Table: the cycle positions of the marks are not recoverable from the extraction]
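Multifunctional Pipeline A B C
Reservation Table for X and Reservation Table for Y, cycles 1 to 8
[Tables: rows A, B, C; the cycle positions of the X and Y marks are not recoverable]
[Figure: occupancy of stages A, B, C over cycles 1 to 11 for successive initiations 1, 2, 3, ...; entries such as "1,2" mark collisions, and "1-3" means a collision among initiations 1, 2, 3]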
[Figure: two further occupancy charts for stages A, B, C over cycles 1 to 11, with collisions marked]
No collision for intervals 1, 8, 3 and 6
Collision vector for X: 1011010
  1: collision
  0: no collision
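
A collision vector can be derived mechanically from a reservation table: latency k is forbidden when some stage has two marks k cycles apart. The sketch below illustrates this under stated assumptions. The column positions in `X_TABLE` are not recoverable from these slides; they are taken from the three-stage example in the cited textbook chapter and should be treated as an assumption, chosen because they reproduce the vector 1011010 given above. The function names are hypothetical, and the leftmost bit is assumed to correspond to the largest latency.

```python
def forbidden_latencies(table):
    """table maps a stage name to the list of cycles (1-based) in which it is used."""
    forbidden = set()
    for cycles in table.values():
        for i, first in enumerate(cycles):
            for second in cycles[i + 1:]:
                # Two marks in one row, d cycles apart, forbid latency d.
                forbidden.add(abs(second - first))
    return forbidden


def collision_vector(table):
    """Bit string C_m ... C_1, where C_k = 1 means latency k causes a collision."""
    forbidden = forbidden_latencies(table)
    m = max(forbidden)
    return "".join("1" if k in forbidden else "0" for k in range(m, 0, -1))


# Assumed mark positions (three marks for A, two for B, three for C).
X_TABLE = {"A": [1, 6, 8], "B": [2, 4], "C": [3, 5, 7]}

if __name__ == "__main__":
    print(sorted(forbidden_latencies(X_TABLE)))  # latencies that collide
    print(collision_vector(X_TABLE))
```

With the assumed table, the forbidden latencies are 2, 4, 5 and 7, so latencies 1, 3, 6 and anything from 8 upward are collision free, in agreement with the intervals listed above.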
1, 8, 1, 8, ...   (1,8)   avg = 4.5
3, 3, 3, 3, ...   (3)     avg = 3
6, 6, 6, 6, ...   (6)     avg = 6
[Figure: state diagram with states 1011010 (initial), 1011011 and 1111111; arcs labeled with latencies 1, 3, 6 and 8+]
Minimum Average Latency?
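
The state diagram and the minimum average latency can be computed from the collision vector alone. The following is a minimal sketch, assuming the usual construction (shift the current state right by the chosen latency, then OR with the initial collision vector) and treating every latency larger than the vector length as the single arc "8+". The function names are hypothetical, and the search simply enumerates simple cycles, which is adequate only for small diagrams like this one.

```python
from fractions import Fraction


def state_diagram(cv):
    """cv is a bit string C_m ... C_1. Returns {state: {latency: next_state}}."""
    m = len(cv)
    init = int(cv, 2)
    edges = {}
    todo = [init]
    while todo:
        state = todo.pop()
        if state in edges:
            continue
        edges[state] = {}
        for k in range(1, m + 1):
            if not (state >> (k - 1)) & 1:        # bit C_k is 0: latency k allowed
                edges[state][k] = (state >> k) | init
        edges[state][m + 1] = init                # any latency > m returns to the start
        todo.extend(edges[state].values())
    return edges


def minimum_average_latency(edges):
    """Smallest average latency over all simple cycles of the state diagram."""
    best = None

    def walk(start, state, visited, total, count):
        nonlocal best
        for k, nxt in edges[state].items():
            if nxt == start:                      # closed a cycle
                avg = Fraction(total + k, count + 1)
                if best is None or avg < best:
                    best = avg
            elif nxt not in visited:
                walk(start, nxt, visited | {nxt}, total + k, count + 1)

    for s in edges:
        walk(s, s, {s}, 0, 0)
    return best


if __name__ == "__main__":
    diagram = state_diagram("1011010")
    print(sorted(format(s, "07b") for s in diagram))
    print(minimum_average_latency(diagram))
```

For the vector 1011010 this reaches exactly the three states shown in the figure, and the cycle (3) gives the smallest average among the cycles listed above.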
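Kai Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", Chapter 6

Consider a greedy cycle (k1, k2, ..., kn)
Let p = no. of 1s in the initial collision vector

\begin{align*}
k_1 - 1 &\le p &&\Rightarrow\; k_1 \le p + 1 \\
k_2 &\le 2p - k_1 + 2 &&\Rightarrow\; k_1 + k_2 \le 2p + 2 \\
k_3 &\le 3p - k_1 - k_2 + 3 \\
&\;\;\vdots \\
k_n &\le np - k_1 - k_2 - \dots - k_{n-1} + n \\
k_1 + k_2 + \dots + k_n &\le np + n &&\Rightarrow\; \mathrm{MAL} \le p + 1
\end{align*}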
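
As a quick numerical check of this bound (a worked instance added for illustration, using only the collision vector 1011010 and the cycle averages given earlier in these slides):

\begin{align*}
p &= 4 \quad\text{(number of 1s in } 1011010\text{)}, \qquad p + 1 = 5 \\
\text{cycle } (1,8):&\quad \frac{1 + 8}{2} = 4.5 \le 5 \\
\text{cycle } (3):&\quad \frac{3}{1} = 3 \le 5
\end{align*}

Both averages stay within the bound p + 1 = 5, and the smaller of the two, 3, is consistent with MAL \le p + 1.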
[Figure: branch instr with cond eval and target addr gen; next inline instr: delay = 2; target instr: delay = 5]
the order of cond eval and target addr gen may be different
cond eval may be done in previous instruction

Improving Branch Performance
Branch Elimination
  replace branch with other instructions
Branch Speed Up
  reduce time for computing CC and TIF
Branch Prediction
  guess the outcome and proceed, undo if necessary
Branch Target Capture
  make use of history
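
The two delays in the branch figure can be combined into an average cost per branch. This is a minimal sketch, assuming that delay = 2 applies when execution continues with the next inline instruction and delay = 5 when it continues at the target. The function name and the taken fractions used in the example are hypothetical, not figures from the slides.

```python
def average_branch_delay(taken_fraction, inline_delay=2, target_delay=5):
    """Expected delay per branch, weighting the two outcomes by how often they occur."""
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must lie between 0 and 1")
    return taken_fraction * target_delay + (1.0 - taken_fraction) * inline_delay


if __name__ == "__main__":
    # Illustrative taken fractions only.
    for fraction in (0.0, 0.5, 1.0):
        print(fraction, average_branch_delay(fraction))
```

Under this reading, each technique listed above reduces the average by removing branches, by shrinking one of the two delays, or by hiding the delay when a guess turns out to be right.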