ACA Iitg Sahu Lec09

CS521 CSE IITG 11/23/2012
Degree of overlap
    Serial, Overlapped, Pipelined, Superpipelined/Superscalar
Depth
    Shallow, Deep
Structure
    Linear, Nonlinear
Scheduling of operations
    Static, Dynamic
ASahu 1 ASahu slide2
[Figure: degree of overlap (Serial, Overlapped, Pipelined); depth (Shallow, Deep); structure (Linear pipeline A B C; Nonlinear pipeline A B C with stage sequence A, B, C, B, C, A, C, A)]
Static
    same sequence of stages for all instructions
    all actions in order
    if one instruction stalls, all subsequent instructions are delayed
Dynamic
    above conditions are relaxed
    higher throughput is achieved

type 1: beginnings (decode) and endings (putaway) in order
type 2: only beginnings in order
type 3: no order restrictions except dependencies
type 1 extended: beginnings in order, references that affect memory state are in order
    [note that a memory reference may lead to a page fault]
Type                          CPI
Serial                        5-6
Overlapped                    3
Pipelined (static)            1.5-2
Pipelined (dynamic)           1.2-1.5
Multiple instruction issue    < 1.0

Data dependencies => Data hazards
    RAW (read after write)
    WAR (write after read)
    WAW (write after write)
Resource conflicts => Structural hazards
    use of same resource in different stages
Procedural dependencies => Control hazards
    conditional and unconditional branches, calls/returns
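The three data hazard types can be told apart by comparing the registers two instructions read and write. A minimal sketch, assuming each instruction is described by a hypothetical (writes, reads) pair of register-name sets; the function name and the two sample instructions are illustrative and not part of the lecture.

```python
def classify_hazards(first, second):
    """Return the data hazards between two instructions.

    Each instruction is a (writes, reads) pair of register-name sets
    (an assumed encoding); `first` comes earlier in program order.
    """
    first_writes, first_reads = first
    second_writes, second_reads = second
    hazards = set()
    if first_writes & second_reads:    # later instruction reads an earlier result
        hazards.add("RAW")
    if first_reads & second_writes:    # later instruction overwrites an earlier operand
        hazards.add("WAR")
    if first_writes & second_writes:   # both instructions write the same register
        hazards.add("WAW")
    return hazards


# Illustrative pair: add $t1,$t2,$s1 followed by add $s1,$t1,$s5
add_t1 = ({"$t1"}, {"$t2", "$s1"})
add_s1 = ({"$s1"}, {"$t1", "$s5"})
print(sorted(classify_hazards(add_t1, add_s1)))
```

The check only looks at register names; whether a hazard actually causes a stall depends on the pipeline depth and on the forwarding paths available.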
[Figure: data hazard between a previous instruction (read/write) and the current instruction (read/write) across the R, EX and W stages; delay = 3]
    1. Data Forwarding / HW approach
    2. Instruction Reordering / SW approach
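Data forwarding path P1
    I:   add $t1,...
    I+1: add $s1,$t1,..
    [Pipeline diagram: I and I+1 pass through IM, RF, ALU, DM, RF]

Data forwarding path P2
    I:   lw  $t1,...
    I+1: add $s1,$t1,..
    [Pipeline diagram: I passes through IM, RF, ALU, DM, RF; I+1 is shown with IM repeated before RF, ALU, DM, RF]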
Data forwarding path P3
    I:   add $t1,...
    I+1: sw  $t1,..
    [Pipeline diagram: I and I+1 pass through IM, RF, ALU, DM, RF]

Data forwarding path P4
    I:   lw  $t1,...
    I+1: sw  $t1,..
    [Pipeline diagram: I and I+1 pass through IM, RF, ALU, DM, RF]
Data forwarding paths
    [Figure: paths P1 to P4 drawn on the IM, RF, ALU, DM, RF pipeline diagrams for the pairs lw/add (P2), add/sw (P3) and lw/sw (P4)]

Data forwarding path list
    P1: from ALU out (EX/DM) to ALU in 1/2
    P2: from DM/ALU out (DM/WB) to ALU in 1/2
    P3/P4: from DM/ALU out (DM/WB) to DM in
P1 = ALU to ALU
P2 = M to ALU
P3 = ALU to M
P4 = M to M

 1      move $t0 $zero
 2      addi $t2, $zero,100
 3  L:  lw   $t2 0($7)
 4      add  $t1 $t2 $s1
 5      add  $a $t1 $s5
 6      sw   $a 32($s3)
 7      add  $6 $3 $a
 8      addi $t0 $t0 1
 9      lw   $7 0($8)
10      sw   $7 8($0)
11      add  $s9 $s9 1
12      beq  $t0 $t2 L
13      hlt

[Figure: dependence graph over instructions 2 to 9, with labels WAW (at 2), P2 and P1 (at 3, 4, 5), P3 and "2 OPs" (at 6) and P4 (at 9)]

Patterson, D.A., and Hennessy, J.L., Computer Organization and Design: The Hardware/Software Interface, third edition, Chapter 6.4/6.5
Ebook can be found
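The P1 to P4 labels can be derived mechanically for back-to-back instructions of the listing. A minimal sketch, assuming a hypothetical tuple encoding of instructions 3 to 10, assuming that a store consumes its data register in the memory stage and its address register in the ALU, and looking only at adjacent instruction pairs.

```python
# (line, opcode, destination or None, ALU input registers, store data register or None)
# Hand-encoded from lines 3 to 10 of the listing; the encoding itself is an assumption.
PROGRAM = [
    (3, "lw", "$t2", ["$7"], None),
    (4, "add", "$t1", ["$t2", "$s1"], None),
    (5, "add", "$a", ["$t1", "$s5"], None),
    (6, "sw", None, ["$s3"], "$a"),
    (7, "add", "$6", ["$3", "$a"], None),
    (8, "addi", "$t0", ["$t0"], None),
    (9, "lw", "$7", ["$8"], None),
    (10, "sw", None, ["$0"], "$7"),
]


def forwarding_path(producer, consumer):
    """Return "P1".."P4" for an adjacent producer/consumer pair, or None."""
    _, p_op, p_dest, _, _ = producer
    _, _, _, c_alu_inputs, c_store_data = consumer
    if p_dest is None:
        return None
    from_memory = p_op == "lw"           # a load produces its value in the M stage
    if p_dest in c_alu_inputs:           # value needed at an ALU input
        return "P2" if from_memory else "P1"
    if p_dest == c_store_data:           # value needed as store data in M
        return "P4" if from_memory else "P3"
    return None


paths = {}
for producer, consumer in zip(PROGRAM, PROGRAM[1:]):
    path = forwarding_path(producer, consumer)
    if path is not None:
        paths[(producer[0], consumer[0])] = path
print(paths)
```

Dependences at a distance of two instructions, such as $a from line 5 to line 7, are deliberately not covered by this sketch.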
Caused by Resource Conflicts
    Use of a hardware resource in more than one cycle
        [Figure: successive instructions, each using stages A B A C]
    Different sequences of resource usage by different instructions
        [Figure: one instruction uses A B C D, another uses A C B D]
    Nonpipelined multicycle resources
        [Figure: successive instructions, each using F D X X]

Nonlinear pipeline A B C

Reservation table for X (required resources of instruction in each cycle)

        1  2  3  4  5  6  7  8
    A   X              X     X
    B      X     X
    C         X     X     X

Multifunctional pipeline A B C
    [Figure: reservation table for X and reservation table for Y, stages A, B, C against cycles 1 to 8]

[Figure: occupancy of stages A, B, C over cycles 1 to 11 for successive initiations 1, 2, 3, ..., with collisions marked; an entry such as 1-3 means initiations 1, 2, 3]
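Collision vector for X: 1011010
    1: collision
    0: no collision
No collision for intervals 1, 3, 6 and 8+

Latency sequences
    1, 8, 1, 8, ...    cycle (1,8)    avg = 4.5
    3, 3, 3, 3, ...    cycle (3)      avg = 3
    6, 6, 6, 6, ...    cycle (6)      avg = 6

State diagram (edge label = latency)
    1011010 --1--> 1111111
    1011010 --3--> 1011011
    1011010 --6--> 1011011
    1011011 --3--> 1011011
    1011011 --6--> 1011011
    1011010, 1011011, 1111111 --8+--> 1011010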
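The collision vector and the state transitions can be recomputed from the reservation table. A minimal sketch, assuming the table for X follows the stage sequence A, B, C, B, C, A, C, A given earlier (A in cycles 1, 6, 8; B in 2, 4; C in 3, 5, 7); the dictionary layout and the function names are illustrative.

```python
# Cycles in which each stage is used by one initiation of X (assumed encoding).
RESERVATION_TABLE_X = {
    "A": [1, 6, 8],
    "B": [2, 4],
    "C": [3, 5, 7],
}


def forbidden_latencies(table):
    """Distances between any two marks in the same row cause a collision."""
    forbidden = set()
    for cycles in table.values():
        for i, early in enumerate(cycles):
            for late in cycles[i + 1:]:
                forbidden.add(late - early)
    return forbidden


def collision_vector(table):
    """Bit string whose rightmost bit is latency 1; 1 means collision."""
    forbidden = forbidden_latencies(table)
    width = max(forbidden)
    return "".join("1" if k in forbidden else "0" for k in range(width, 0, -1))


def permissible_latencies(state):
    """Latencies with a 0 bit, plus width + 1 standing for "8 or more" here."""
    width = len(state)
    free = [k for k in range(1, width + 1) if state[width - k] == "0"]
    return free + [width + 1]


def next_state(state, latency, initial):
    """Shift the state right by the latency, then OR in the initial vector."""
    merged = (int(state, 2) >> latency) | int(initial, 2)
    return format(merged, "0{}b".format(len(initial)))


initial = collision_vector(RESERVATION_TABLE_X)
print(initial, permissible_latencies(initial))
for latency in permissible_latencies(initial):
    print(latency, next_state(initial, latency, initial))
```

Repeating the last loop for every newly reached state enumerates the whole state diagram.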
Minimum Average Latency (MAL)?

Latency cycles
    (1,8) (1,8,6,8) (3) (6) (3,8) (3,6,3)
Simple latency cycles (no figure repeats)
    (1,8) (3) (6) (3,8) (6,8)
Greedy latency cycles, from different starting states
    (1,8) (3)

MAL ≥ max no. of check marks in any row
MAL ≤ avg latency of any greedy cycle
avg latency of any greedy cycle ≤ no. of 1s in initial collision vector + 1
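The greedy cycles and the bounds on MAL can be checked numerically for the collision vector 1011010. A minimal sketch, assuming the usual shift-and-OR state transition and taking 3 as the maximum number of check marks in a row of the table for X; all function names are illustrative.

```python
from fractions import Fraction

INITIAL = "1011010"      # collision vector for X (rightmost bit = latency 1)
MAX_CHECK_MARKS = 3      # assumed from the table rows, which hold 3, 2 and 3 marks


def permissible_latencies(state):
    """Latencies with a 0 bit, plus width + 1 standing for "8 or more"."""
    width = len(state)
    free = [k for k in range(1, width + 1) if state[width - k] == "0"]
    return free + [width + 1]


def next_state(state, latency):
    """Shift the state right by the latency, then OR in the initial vector."""
    merged = (int(state, 2) >> latency) | int(INITIAL, 2)
    return format(merged, "0{}b".format(len(INITIAL)))


def reachable_states():
    """All states reachable from the initial collision vector."""
    seen, stack = {INITIAL}, [INITIAL]
    while stack:
        state = stack.pop()
        for latency in permissible_latencies(state):
            successor = next_state(state, latency)
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def greedy_cycle(start):
    """Always take the smallest permissible latency until a state repeats."""
    states, latencies, state = [start], [], start
    while True:
        latency = min(permissible_latencies(state))
        state = next_state(state, latency)
        latencies.append(latency)
        if state in states:
            cycle = tuple(latencies[states.index(state):])
            # Rotate so the same cycle found from different states compares equal.
            return min(cycle[i:] + cycle[:i] for i in range(len(cycle)))
        states.append(state)


greedy_cycles = {greedy_cycle(s) for s in reachable_states()}
averages = {c: Fraction(sum(c), len(c)) for c in greedy_cycles}
best_greedy = min(averages.values())
upper_bound = INITIAL.count("1") + 1

print(sorted(greedy_cycles))
print(MAX_CHECK_MARKS, best_greedy, upper_bound)
```

For this table the lower bound and the best greedy average coincide, so the two bounds together give MAL = 3.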
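Consider a greedy cycle (k_1, k_2, ..., k_n)
Let p = no. of 1s in initial collision vector

\[
\begin{aligned}
k_1 &\le p + 1 \\
k_2 &\le 2p - k_1 + 2 \\
k_3 &\le 3p - k_1 - k_2 + 3 \\
    &\;\;\vdots \\
k_n &\le np - k_1 - k_2 - \dots - k_{n-1} + n
\end{aligned}
\]

The second line is the same as k_1 + k_2 \le 2p + 2, and the last line gives

\[
k_1 + k_2 + \dots + k_n \le np + n
\]

Dividing by n, the average latency of the greedy cycle is at most p + 1, so MAL ≤ p + 1.

Kai Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", Chapter 6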
[Figure: a branch instruction performs cond eval and target addr gen; the next inline instruction follows with delay = 2, the target instruction with delay = 5]
    the order of cond eval and target addr gen may be different
    cond eval may be done in previous instruction

Improving Branch Performance
    Branch Elimination
        replace branch with other instructions
    Branch Speed Up
        reduce time for computing CC and TIF
    Branch Prediction
        guess the outcome and proceed, undo if necessary
    Branch Target Capture
        make use of history
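The two delays in the branch timing figure (2 cycles to the next inline instruction, 5 cycles to the target instruction) can be combined into an average delay per branch. A minimal sketch, assuming a hypothetical fraction of taken branches, which is a workload parameter and not a figure from the lecture.

```python
def expected_branch_delay(taken_fraction, inline_delay=2, target_delay=5):
    """Average delay per branch, weighting the two delays by outcome.

    taken_fraction is an assumed workload parameter in [0, 1]; the default
    delays are the two values shown in the branch timing figure.
    """
    if not 0.0 <= taken_fraction <= 1.0:
        raise ValueError("taken_fraction must lie between 0 and 1")
    return taken_fraction * target_delay + (1.0 - taken_fraction) * inline_delay


# Illustrative workloads only: never taken, half taken, always taken.
for fraction in (0.0, 0.5, 1.0):
    print(fraction, expected_branch_delay(fraction))
```

Each technique in the list can be read as lowering one of the inputs: branch elimination removes branches altogether, while speed up, prediction and target capture reduce the delay paid per branch.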