
CS521, CSE IITG, 11/23/2012

Degree of overlap
    Serial, Overlapped, Pipelined, Superpipelined/Superscalar
Depth
    Shallow, Deep
Structure
    Linear, Nonlinear
Scheduling of operations
    Static, Dynamic
ASahu 1 ASahu slide2

Degree of overlap: Serial, Overlapped, Pipelined
Depth: Shallow, Deep
[Figures: serial, overlapped and pipelined execution; shallow and deep pipelines]

Linear Pipeline: A B C
Nonlinear Pipeline: A B C
    Sequence: A, B, C, B, C, A, C, A

Static
    same sequence of stages for all instructions
    all actions in order
    if one instruction stalls, all subsequent instructions are delayed
Dynamic
    above conditions are relaxed
    higher throughput is achieved

type 1: beginnings (decode) and endings (putaway) in order
type 2: only beginnings in order
type 3: no order restrictions except dependencies
type 1 extended: beginnings in order, references that affect memory state are in order
    [note that a memory reference may lead to a page fault]


Type                          CPI
Serial                        5 to 6
Overlapped                    3
Pipelined (static)            1.5 to 2
Pipelined (dynamic)           1.2 to 1.5
Multiple instruction issue    < 1.0

Data dependencies => Data hazards
    RAW (read after write)
    WAR (write after read)
    WAW (write after write)
Resource conflicts => Structural hazards
    use of same resource in different stages
Procedural dependencies => Control hazards
    conditional and unconditional branches, calls/returns
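The three data hazard types can be told apart by comparing the registers that two instructions read and write. The sketch below is only an illustration: the function name `data_hazards` and the register-set representation are assumptions, not part of the slides.

```python
from typing import List, Set


def data_hazards(first_writes: Set[str], first_reads: Set[str],
                 second_writes: Set[str], second_reads: Set[str]) -> List[str]:
    """Classify the dependencies of a later instruction on an earlier one.

    Hypothetical helper: each instruction is described only by the sets of
    registers it writes and reads.
    """
    hazards = []
    if first_writes & second_reads:    # later instruction reads an earlier result
        hazards.append("RAW")
    if first_reads & second_writes:    # later instruction overwrites an earlier source
        hazards.append("WAR")
    if first_writes & second_writes:   # both instructions write the same register
        hazards.append("WAW")
    return hazards


# add $t1, $t2, $s1 followed by add $a, $t1, $s5
print(data_hazards({"$t1"}, {"$t2", "$s1"}, {"$a"}, {"$t1", "$s5"}))  # ['RAW']
```

Whether a detected dependency actually stalls the pipeline depends on the distance between the two instructions and on the forwarding hardware.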

[Figure: read/write timing of the previous instr (EX, W) and the current instr (R); delay = 3]

1. Data Forwarding / HW Approach
2. Instruction Reordering / SW Approach

Data forwarding path P1
    I:   add $t1,...
    I+1: add $s1,$t1,..
    [Figure: I and I+1 each pass through IM, RF, ALU, DM, RF]

Data forwarding path P2
    I:   lw  $t1,...
    I+1: add $s1,$t1,..
    [Figure: I passes through IM, RF, ALU, DM, RF; I+1 through IM, IM, RF, ALU, DM, RF]

Data forwarding path P3
    I:   add $t1,...
    I+1: sw  $t1,..
    [Figure: I and I+1 each pass through IM, RF, ALU, DM, RF]

Data forwarding path P4
    I:   lw  $t1,...
    I+1: sw  $t1,..
    [Figure: I and I+1 each pass through IM, RF, ALU, DM, RF]
Data forwarding paths
[Figure: paths P1 to P4 drawn between the IM, RF, ALU, DM, RF stages of I and I+1 for the pairs lw/add, add/sw and lw/sw]

Data forwarding path list
    P1:    from ALUout (EX/DM) to ALUin1/2
    P2:    from DM/ALUout (DM/WB) to ALUin1/2
    P3/P4: from DM/ALUout (DM/WB) to DMin
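The forwarding path list can be read as a lookup on where the value is produced and where it is consumed. The following is a minimal sketch under stated assumptions: the function name `forwarding_path` is hypothetical, only `lw` is treated as producing its value in memory, and a dependent `sw` is assumed to need the register as store data (not as an address base).

```python
def forwarding_path(producer: str, consumer: str) -> str:
    """Pick the forwarding path for instruction I (producer) and I+1 (consumer).

    Assumptions of this sketch: `lw` delivers its result from memory, every
    other producer delivers it from the ALU; `sw` consumes the value at the
    data memory input, every other consumer at an ALU input.
    """
    source = "M" if producer == "lw" else "ALU"
    sink = "M" if consumer == "sw" else "ALU"
    paths = {
        ("ALU", "ALU"): "P1",
        ("M", "ALU"): "P2",
        ("ALU", "M"): "P3",
        ("M", "M"): "P4",
    }
    return paths[(source, sink)]


print(forwarding_path("add", "add"))  # P1
print(forwarding_path("lw", "sw"))    # P4
```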

P1 = ALU to ALU
P2 = M to ALU
P3 = ALU to M
P4 = M to M

 1      move $t0 $zero
 2      addi $t2, $zero, 100
 3  L:  lw   $t2 0($7)         WAW with 2
 4      add  $t1 $t2 $s1       P2 from 3
 5      add  $a $t1 $s5        P1 from 4
 6      sw   $a 32($s3)        P3 from 5
 7      add  $6 $3 $a          2 OPs after 5
 8      addi $t0 $t0 1
 9      lw   $7 0($8)
10      sw   $7 8($0)          P4 from 9
11      add  $s9 $s9 1
12      beq  $t0 $t2 L
13      hlt

Patterson, D.A., and Hennessy, J.L., Computer Organization and Design: The Hardware/Software Interface, third edition, Chapter 6.4/6.5
Ebook can be found


Caused by Resource Conflicts
    Use of a hardware resource in more than one cycle
        A B A C
        A B A C
        A B A C
    Different sequences of resource usage by different instructions
        A B C D
        A C B D
    Nonpipelined multicycle resources
        F D X X
        F D X X

Nonlinear Pipeline: A B C
Reservation Table for X (required resources of instruction in cycle)
        1  2  3  4  5  6  7  8
    A   X  .  .  .  .  X  .  X
    B   .  X  .  X  .  .  .  .
    C   .  .  X  .  X  .  X  .
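A reservation table is just the stage usage sequence indexed by cycle. As a small sketch, the code below rebuilds the table for X from the sequence A, B, C, B, C, A, C, A given earlier for the nonlinear pipeline; the function name `reservation_table` and the dictionary layout are assumptions made for illustration.

```python
def reservation_table(sequence):
    """Map each stage to the list of cycles (1-based) in which it is used."""
    table = {}
    for cycle, stage in enumerate(sequence, start=1):
        table.setdefault(stage, []).append(cycle)
    return table


def render(table, cycles):
    """Return one text row per stage, 'X' where the stage is busy."""
    rows = []
    for stage, busy in table.items():
        cells = ["X" if c in busy else "." for c in range(1, cycles + 1)]
        rows.append(stage + " " + " ".join(cells))
    return rows


table_x = reservation_table("ABCBCACA")
print(table_x)  # {'A': [1, 6, 8], 'B': [2, 4], 'C': [3, 5, 7]}
for row in render(table_x, 8):
    print(row)
```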

Multifunctional Pipeline: A B C
Reservation Table for X and for Y (cycles 1 to 8), row entries in cycle order
    A   X/Y  Y  X  X
    B   X  Y  X
    C   Y  X  Y  X  Y  X

Collisions (cycles 1 to 11; entries are initiation numbers; 1-3 means 1,2,3)
        1    2    3    4    5    6    7    8    9    10   11
    A   1    .    2    .    3    1    4    1,2  5    2,3  6
    B   .    1    .    1,2  .    2,3  .    3,4  .    4,5
    C   .    .    1    .    1,2  .    1-3  .    2-4
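Collisions of this kind can be found by overlaying shifted copies of the reservation table. The sketch below assumes the table for X implied by the sequence A, B, C, B, C, A, C, A and a constant initiation latency; `occupancy` and `collisions` are hypothetical helper names.

```python
from collections import defaultdict

# Assumption: reservation table for X, derived from the usage sequence
# A, B, C, B, C, A, C, A (stage -> cycles in which it is used).
TABLE_X = {"A": [1, 6, 8], "B": [2, 4], "C": [3, 5, 7]}


def occupancy(table, latency, initiations):
    """Return {(stage, cycle): [initiation numbers]} for a constant latency."""
    usage = defaultdict(list)
    for k in range(initiations):
        start = k * latency                 # initiation k+1 starts this many cycles late
        for stage, cycles in table.items():
            for cycle in cycles:
                usage[(stage, start + cycle)].append(k + 1)
    return usage


def collisions(usage):
    """Keep only the slots claimed by more than one initiation."""
    return {slot: ids for slot, ids in usage.items() if len(ids) > 1}


clash = collisions(occupancy(TABLE_X, latency=2, initiations=6))
print(clash[("A", 8)])  # [1, 2]
print(clash[("C", 7)])  # [1, 2, 3]
print(collisions(occupancy(TABLE_X, latency=3, initiations=6)))  # {}
```

With these assumptions a latency of 2 collides in every stage, while a latency of 3 produces no collision at all.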

[Figures: usage of stages A, B, C over cycles 1 to 11 for two further initiation sequences of X; Collisions]


No collision for intervals 1, 8, 3 and 6
Collision vector for X: 1011010
    1: collision
    0: no collision
1,8,1,8,...   (1,8)   avg = 4.5
3,3,3,3,...   (3)     avg = 3
6,6,6,6,...   (6)     avg = 6
Minimum Average Latency?

[State diagram: states 1011010, 1011011 and 1111111; transitions labelled 1, 3, 6 and 8+]
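The collision vector and the states reached from it can be computed mechanically. This is a sketch, assuming the usual construction in which the current state is shifted right by the chosen latency and ORed with the initial collision vector; the set of colliding intervals {2, 4, 5, 7} is read off the vector 1011010, and all function names are hypothetical.

```python
def collision_vector(colliding, length):
    """Bit string C_length ... C_1; bit i is '1' if interval i causes a collision."""
    return "".join("1" if i in colliding else "0" for i in range(length, 0, -1))


def permissible(state):
    """Intervals within the vector length that cause no collision in this state."""
    n = len(state)
    return [i for i in range(1, n + 1) if state[n - i] == "0"]


def next_state(state, latency, initial):
    """Shift the state right by `latency` and OR it with the initial vector."""
    n = len(initial)
    if latency > n:                       # the "8+" arcs: nothing is left after the shift
        return initial
    return format((int(state, 2) >> latency) | int(initial, 2), f"0{n}b")


initial = collision_vector({2, 4, 5, 7}, 7)
print(initial)                                                      # 1011010
print(permissible(initial))                                         # [1, 3, 6]
print([next_state(initial, k, initial) for k in (1, 3, 6, 8)])
# ['1111111', '1011011', '1011011', '1011010']
```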

Latency Cycles
    (1,8) (1,8,6,8) (3) (6) (3,8) (3,6,3)
Simple Latency Cycles (no figure repeats)
    (1,8) (3) (6) (3,8) (6,8)
Greedy Latency Cycles
    (1,8) (3) from different starting states

MAL ≥ max no. of check marks in any row
MAL ≤ avg latency of any greedy cycle
avg latency of any greedy cycle ≤ no. of 1s in initial collision vector + 1
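The greedy cycles and the bounds on MAL can be checked for X with a few lines of code. The sketch assumes that a greedy cycle always takes the smallest permissible latency in the current state, that states follow the shift-and-OR rule, and that the most check marks in any row of the table for X is 3; the names `step`, `greedy_cycle` and `MAX_CHECKS` are hypothetical.

```python
def step(state, latency, initial):
    """Next state: shift right by `latency`, then OR with the initial vector."""
    n = len(initial)
    if latency > n:
        return initial
    return format((int(state, 2) >> latency) | int(initial, 2), f"0{n}b")


def greedy_cycle(start, initial):
    """Follow the smallest permissible latency from `start` until a state repeats."""
    n = len(initial)
    first_seen = {}          # state -> position in `latencies` where it was entered
    latencies = []
    state = start
    while state not in first_seen:
        first_seen[state] = len(latencies)
        zeros = [i for i in range(1, n + 1) if state[n - i] == "0"]
        latency = zeros[0] if zeros else n + 1
        latencies.append(latency)
        state = step(state, latency, initial)
    cycle = tuple(latencies[first_seen[state]:])
    # Report each cycle in one canonical rotation, e.g. (8, 1) as (1, 8).
    return min(cycle[i:] + cycle[:i] for i in range(len(cycle)))


INITIAL = "1011010"
MAX_CHECKS = 3               # assumption: rows A and C of the table for X hold 3 marks
p = INITIAL.count("1")       # number of 1s in the initial collision vector

cycles = {greedy_cycle(s, INITIAL) for s in ("1011010", "1011011", "1111111")}
for cycle in sorted(cycles):
    avg = sum(cycle) / len(cycle)
    print(cycle, avg, MAX_CHECKS <= avg <= p + 1)
# (1, 8) 4.5 True
# (3,) 3.0 True
```

Since the greedy cycle (3) has an average latency equal to the lower bound of 3, MAL = 3 for X under these assumptions.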

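Consider a greedy cycle $(k_1, k_2, \ldots, k_n)$
Let $p$ = no. of 1s in initial collision vector
    $k_1 - 1 \le p$, i.e. $k_1 \le p + 1$
    $k_2 \le 2p - k_1 + 2$, i.e. $k_1 + k_2 \le 2p + 2$
    $k_3 \le 3p - k_1 - k_2 + 3$
    ...
    $k_n \le np - k_1 - k_2 - \cdots - k_{n-1} + n$
    $k_1 + k_2 + \cdots + k_n \le np + n$
Dividing by $n$, the average latency of the greedy cycle is at most $p + 1$, so MAL $\le p + 1$

Kai Hwang, "Advanced Computer Architecture: Parallelism, Scalability, Programmability", Chapter 6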


[Figure: branch instr with cond eval and target addr gen; next inline instr, delay = 2; target instr, delay = 5]
    the order of cond eval and target addr gen may be different
    cond eval may be done in previous instruction

Improving Branch Performance
    Branch Elimination
        replace branch with other instructions
    Branch Speed Up
        reduce time for computing CC and TIF
    Branch Prediction
        guess the outcome and proceed, undo if necessary
    Branch Target Capture
        make use of history
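The two delays in the branch figure can be combined into an average cost per branch. The sketch below is a rough model, not taken from the slides: it assumes that the delay of 2 applies when execution continues with the next inline instruction, the delay of 5 when it continues at the target, and that each delay counts as lost cycles added to a base CPI. The function names and the sample fractions are hypothetical.

```python
def average_branch_delay(p_taken, delay_inline=2, delay_target=5):
    """Expected delay per branch, weighting both outcomes by their probability."""
    return (1 - p_taken) * delay_inline + p_taken * delay_target


def effective_cpi(base_cpi, branch_fraction, p_taken):
    """Base CPI plus the average branch delay spread over all instructions."""
    return base_cpi + branch_fraction * average_branch_delay(p_taken)


# Hypothetical workload: 20% branches, half of them taken.
print(average_branch_delay(0.5))      # 3.5
print(effective_cpi(1.0, 0.2, 0.5))
```

Under this model, branch speed up lowers the two delay values, while branch prediction lowers how often either delay is actually paid.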
