Chapter 6 Enhancing Performance with Pipelining
… the system will become the bottleneck. That bottleneck … of the next …
… the instruction level is trying multiprocessors to exploit parallelism at much
coarser levels. Parallel processing is the topic of Chapter 9, which appears on …
Historical Perspective and Further Reading
… the earliest superscalars, the development of out-of-order and speculative …
6.14 Exercises
6.1 [5] If the time for an ALU operation can be shortened by 25% (compared to the description in Figure 6.2 on page 373):
a. Will it affect the speedup obtained from pipelining? If so, by how much? If not, why not?
b. What if the ALU operation now takes 25% more time?
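A quick way to reason about the exercise above is to recompute both clock periods. The sketch below is a minimal Python model, assuming the stage latencies of Figure 6.2 are 200 ps (instruction fetch), 100 ps (register read), 200 ps (ALU), 200 ps (data access), and 100 ps (register write); check these against the figure. It also assumes the single-cycle clock is set by the slowest instruction (lw, which uses every stage) and the pipelined clock by the slowest stage, and it ignores pipeline-register overhead. The dictionary keys and helper names are hypothetical.

```python
# Stage latencies in picoseconds. ASSUMPTION: values taken to match Figure 6.2;
# verify them against the figure before relying on the numbers.
BASE_STAGES = {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100}


def pipeline_speedup(stages):
    """Steady-state speedup of a pipelined over a single-cycle datapath.

    Single-cycle clock: long enough for lw, which passes through every stage.
    Pipelined clock: long enough for the slowest stage. Pipeline-register
    overhead is ignored.
    """
    single_cycle_clock = sum(stages.values())
    pipelined_clock = max(stages.values())
    return single_cycle_clock / pipelined_clock


def with_alu_scaled(factor):
    """Copy of BASE_STAGES with only the ALU latency scaled by `factor`."""
    scaled = dict(BASE_STAGES)
    scaled["alu"] = round(BASE_STAGES["alu"] * factor)
    return scaled


for label, factor in [("baseline", 1.0), ("ALU 25% faster", 0.75), ("ALU 25% slower", 1.25)]:
    print(f"{label}: speedup = {pipeline_speedup(with_alu_scaled(factor)):.2f}")
```

With these assumed numbers, a faster ALU leaves the pipelined clock at 200 ps (fetch and data access still take 200 ps) but shortens the single-cycle clock, so the speedup drops from 4.00 to 3.75; a slower ALU becomes the critical stage and the speedup drops to 3.40.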
6.2 [10] <§6.1> A computer architect needs to design the pipeline of a new microprocessor. … Assume it is perfectly pipelined. How much speedup will it achieve compared to …
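Under the "perfectly pipelined" assumption, the speedup depends only on the number of stages k and the instruction count n. The Python sketch below computes it; the values of n and k are placeholders (20 stages is an arbitrary example), since the exercise's own figures are not legible in this copy.

```python
def ideal_pipeline_speedup(n_instructions, n_stages):
    """Speedup of a perfectly balanced pipeline over an unpipelined processor.

    Time is measured in stage-times (instruction latency / n_stages):
      unpipelined: n_instructions * n_stages
      pipelined:   n_stages to fill the pipe, then one completion per stage-time
    Hazards and pipeline-register overhead are ignored.
    """
    unpipelined = n_instructions * n_stages
    pipelined = n_stages + (n_instructions - 1)
    return unpipelined / pipelined


for n in (1, 10, 1_000_000):
    print(n, round(ideal_pipeline_speedup(n, n_stages=20), 3))
```

As n grows the ratio approaches k, which is why a long-running program on an ideal k-stage pipeline is usually credited with a speedup of about k.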
6.3 … Using a drawing …
<§§6.4, 6.5> … Practice: Forwarding in Mem…
… paths required and hazards that must be detected, … of the two operands. The number of cases should equal the … the hazard if no forwarding existed.
6.36 <§6.6> We have a program core consisting of five conditional branches. The program core will be executed thousands of times. Below are the outcomes of each branch for one execution of the program core (T for taken, N for not taken):
Branch 1: T-T
Branch 2: N-N-N-N
Branch 3: T-N-T-N-T-N
Branch 4: T-T-T-N-T
Branch 5: T-T-N-T-T-N
Assume the behavior of each branch remains the same for each program core execution. For dynamic schemes, assume each branch has its own prediction buffer and each buffer is initialized to the same state before each execution. List the predictions for the following branch prediction schemes:
1-bit predictor, initialized to predict taken
2-bit predictor, initialized to weakly predict taken
What are the prediction accuracies?
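The two dynamic schemes can be checked mechanically. This Python sketch simulates one execution of the program core, resetting each branch's predictor before the execution as the exercise specifies. It assumes the usual textbook predictors: a 1-bit predictor that predicts the last outcome, and a 2-bit saturating counter (states 0 and 1 predict not taken, 2 and 3 predict taken, 2 being "weakly taken"). The outcome strings are transcribed from the branch list above, which is partly damaged in this copy, so treat the totals as illustrative.

```python
# Outcome strings for one execution of the program core (T = taken).
# ASSUMPTION: transcribed from a damaged list; branch 1 may be incomplete.
BRANCHES = {1: "TT", 2: "NNNN", 3: "TNTNTN", 4: "TTTNT", 5: "TTNTTN"}


def one_bit(outcomes, initial="T"):
    """Correct predictions of a 1-bit predictor (predicts the last outcome)."""
    state, correct = initial, 0
    for actual in outcomes:
        if state == actual:
            correct += 1
        state = actual  # remember only the most recent outcome
    return correct


def two_bit(outcomes, initial=2):
    """Correct predictions of a 2-bit saturating counter (2 = weakly taken)."""
    state, correct = initial, 0
    for actual in outcomes:
        predicted = "T" if state >= 2 else "N"
        if predicted == actual:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if actual == "T" else max(state - 1, 0)
    return correct


total = sum(len(o) for o in BRANCHES.values())
for name, scheme in [("1-bit", one_bit), ("2-bit", two_bit)]:
    hits = sum(scheme(o) for o in BRANCHES.values())
    print(f"{name}: {hits}/{total} = {hits / total:.1%}")
```

The alternating branch 3 shows the classic weakness of the 1-bit scheme: it mispredicts every outcome after the first, while the 2-bit counter gets half of them right.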
6.37 [10] <§§6.4–6.6> Sketch all the forwarding paths for the branch inputs and show when they must be enabled (as we did on …).
6.38 <§§6.4–6.6> Write the logic to detect any hazards on the branch sources, as we did on page 410.
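The hazard logic for branch sources can be sketched as combinational checks. This Python model assumes the design in which beq compares its registers in the ID stage; the `PipeReg` fields are hypothetical stand-ins for ID/EX and EX/MEM pipeline-register contents (with the destination register already selected), not the chapter's actual signal names. It stalls when the instruction in EX writes a branch source or when a load in MEM does, forwards an ALU result from EX/MEM otherwise, and assumes the register file's write-then-read behavior within a cycle covers MEM/WB.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Hypothetical summary of the fields the hazard logic reads from one
    pipeline register (ID/EX or EX/MEM)."""
    reg_write: bool = False
    mem_read: bool = False   # True for a load
    dest: int = 0            # destination register, already selected by RegDst


def writes(reg: PipeReg, src: int) -> bool:
    """True if the instruction in `reg` will write register `src` (never $0)."""
    return reg.reg_write and reg.dest != 0 and reg.dest == src


def branch_hazard(rs: int, rt: int, id_ex: PipeReg, ex_mem: PipeReg) -> dict:
    """Stall/forward decisions for a beq whose comparison happens in ID."""
    sources = {"rs": rs, "rt": rt}
    # The result of the instruction now in EX is not available until EX ends.
    stall = any(writes(id_ex, s) for s in sources.values())
    # A load now in MEM has its data only at the end of MEM.
    stall = stall or (ex_mem.mem_read and any(writes(ex_mem, s) for s in sources.values()))
    # An ALU result already in EX/MEM can be forwarded to the comparator.
    forward = {name: writes(ex_mem, s) and not ex_mem.mem_read for name, s in sources.items()}
    return {"stall": stall, "forward_from_ex_mem": forward}


# Hypothetical case: an add writing $1 is in EX while beq $1, $2 is in ID.
print(branch_hazard(1, 2, id_ex=PipeReg(reg_write=True, dest=1), ex_mem=PipeReg()))
```

Note that a load immediately before the branch trips the first check and then, one cycle later, the second, so under these assumptions it costs two stall cycles.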
6.39 [10] The example … shows how … performance on our pipelined datapath with forwarding and stalls on a … load. Rewrite the following code to …ize performance on this datapath:
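To compare orderings of a code sequence on this datapath, it helps to count cycles the same way for each. The Python sketch below assumes a five-stage pipeline with full forwarding in which the only stall is one cycle for an instruction that reads the register loaded by the immediately preceding lw; branches are not modeled. It treats every source register as needed in EX, which is conservative for the data operand of sw. The `Instr` record and the example registers are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                   # mnemonic, e.g. "lw", "add", "sw"
    dest: Optional[int]       # register written, or None if none
    sources: Tuple[int, ...]  # registers read


def clock_cycles(program: List[Instr], stages: int = 5) -> int:
    """Cycles to run a straight-line sequence on a pipeline with forwarding.

    Only the load-use hazard is modeled: an instruction that reads the
    register loaded by the immediately preceding lw stalls one cycle.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        if prev.op == "lw" and prev.dest not in (None, 0) and prev.dest in cur.sources:
            stalls += 1
    return stages + (len(program) - 1) + stalls


naive = [Instr("lw", 1, (2,)), Instr("add", 3, (1, 4)), Instr("sub", 5, (6, 7))]
reordered = [Instr("lw", 1, (2,)), Instr("sub", 5, (6, 7)), Instr("add", 3, (1, 4))]
print(clock_cycles(naive), clock_cycles(reordered))  # 8 7
```

Moving an independent instruction between the load and its use hides the stall; moving uses next to loads does the opposite.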
6.40 [20] <§6.6> Consider the pipelined datapath in Figure 6.54 on page 4…. Can an attempt to flush and an attempt to stall occur simultaneously? If so, do they … actions? If there are any cooperating … ? Is there a simple change you can make to the datapath to ensure the … priority? You may want to consider the following code sequence to help …
6.41 [15] The Verilog code for implementing forwarding in Figure … on page 6.7-5 did not consider forwarding of a result as the value to be stored by a store instruction. Add this to the Verilog code.
6.42 … did not consider forwarding of a … . Make this simple addition to the Verilog code.
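For reference when extending the Verilog, the operand-selection rule of the forwarding unit can be written as a small function. This Python version is a sketch: the parameter names are hypothetical, and the 10/01/00 encoding and the EX/MEM-before-MEM/WB priority are intended to follow the forwarding conditions of Section 6.4, so check them against that section. The store-data forwarding asked for above would apply the same test to the register that supplies the sw data.

```python
def forward_select(src, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Mux select for one operand: "10" = EX/MEM, "01" = MEM/WB,
    "00" = register file. EX/MEM is checked first so that the most recent
    result wins when both pipeline registers hold a match."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src:
        return "10"
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src:
        return "01"
    return "00"


# Hypothetical: add $1 in EX/MEM and an older add $1 in MEM/WB; sub reads $1.
print(forward_select(1, True, 1, True, 1))  # 10
```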
6.43 [15] <§§6.6, 6.7> The Verilog code for implementing branch hazard detection and stalls in Figure 6.7.3 on … does not detect the possibility of data hazards … all data hazards for branch operands. … the forwarding and stall logic needed for completing …
6.44 [10] <§§6.6, 6.7> Rewrite the Verilog code in Figure 6.7.3 on page 6.… to implement a delayed branch strategy.
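Before changing the Verilog, it can help to pin down the behavior a delayed branch must produce. The Python sketch below is a behavioral model, not hardware: it executes a tiny hypothetical instruction list (the tuple format is invented for illustration) in which the instruction after a beq always executes and a taken branch redirects control only after that delay-slot instruction.

```python
def run_delayed(program, regs, max_steps=100):
    """Run a tiny hypothetical instruction list with one branch delay slot.

    Instructions: ("addi", rd, rs, imm), ("beq", rs, rt, target_index), ("nop",).
    The instruction after a beq always executes; a taken branch redirects
    control only after that delay-slot instruction. Branches placed in delay
    slots are not modeled. Returns the indices of executed instructions.
    """
    pc, redirect_after_slot, trace = 0, None, []
    for _ in range(max_steps):
        if pc >= len(program):
            break
        trace.append(pc)
        op, *args = program[pc]
        redirect = redirect_after_slot   # non-None only while in a delay slot
        redirect_after_slot = None
        if op == "addi":
            rd, rs, imm = args
            if rd != 0:                  # $0 stays zero
                regs[rd] = regs[rs] + imm
        elif op == "beq":
            rs, rt, target = args
            if regs[rs] == regs[rt]:
                redirect_after_slot = target
        pc = redirect if redirect is not None else pc + 1
    return trace


program = [
    ("beq", 0, 0, 3),    # 0: always taken ($0 == $0)
    ("addi", 1, 1, 5),   # 1: delay slot, executes anyway
    ("addi", 2, 2, 7),   # 2: skipped
    ("addi", 3, 3, 9),   # 3: branch target
]
regs = [0] * 4
print(run_delayed(program, regs), regs)  # [0, 1, 3] [0, 5, 0, 9]
```

In hardware terms this means the IF.Flush that discards the instruction after a taken branch is no longer needed; the compiler is responsible for filling the slot with useful work or a nop.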
6.45 [20] <§§6.6, 6.7> Rewrite the Verilog code in … on page 6.7-6 to implement a branch target buffer. Assume the buffer is implemented with a module with the following definition:
…edict(currentPC, …, update, destination …)
Make sure you accommodate all three possibilities: a correct prediction, a miss in the buffer (that is, miss = true), and an incorrect prediction. In the last two cases, you must also update the prediction.
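The three cases can be prototyped before writing the Verilog. The Python class below is a hypothetical behavioral model of a branch target buffer, not the module in the exercise: its `miss` flag plays the role of the module's miss output, a miss falls back to PC + 4, and both a miss and an incorrect prediction update the stored target. It also reports whether the instructions fetched after the branch must be flushed.

```python
from typing import Dict, Tuple


class BranchTargetBuffer:
    """Hypothetical BTB model keyed by branch PC (no capacity limit)."""

    def __init__(self) -> None:
        self.entries: Dict[int, int] = {}

    def predict(self, pc: int) -> Tuple[int, bool]:
        """Return (predicted next PC, miss flag); a miss predicts PC + 4."""
        if pc in self.entries:
            return self.entries[pc], False
        return pc + 4, True

    def resolve(self, pc: int, actual_next: int) -> Tuple[str, bool]:
        """Classify the prediction for `pc`, update the buffer if needed,
        and report whether wrong-path instructions must be flushed."""
        predicted, miss = self.predict(pc)
        flush = predicted != actual_next
        if miss:
            self.entries[pc] = actual_next   # install an entry on a miss
            return "miss", flush
        if flush:
            self.entries[pc] = actual_next   # correct a wrong prediction
            return "mispredict", True
        return "correct", False


btb = BranchTargetBuffer()
print(btb.resolve(0x40, 0x80))  # ('miss', True)
print(btb.resolve(0x40, 0x80))  # ('correct', False)
```

For simplicity `resolve` re-reads the prediction; a pipelined design would instead carry the predicted PC along with the branch from IF to the stage where it resolves.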
6.46 [1 month] <§§…4, 6.3–6.8> If you have access to a simulation system such as Verilog or ViewLogic, first design the … datapath in Chapter 5. Then evolve this design into a pipelined organization, … Be sure to run MIPS programs at each step to ensure that the design continues to operate correctly.
6.47 <§6.9> The following code has been unrolled once but not yet scheduled. Assume the loop index is a multiple of two (i.e., … is a multiple of eight):
… $30, L…
Schedule this code for fast execution on the standard MIPS … (assume that it supports … instruction). Assume initial … and that branches are resolved in the MEM stage. How does the scheduled code compare against the original unscheduled code?
6.48 <§6.9> This exercise is similar to Exercise 6.47, but this time the code should be unrolled twice (creating three copies of the loop body). However, it is not known that the loop index is a multiple of three, and you will need to invent a means of ensuring that the loop executes correctly. Consider adding some code to the beginning or end of the loop that takes care of the iterations not handled by the loop.
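The remainder-handling idea is independent of MIPS, so it can be shown at a higher level first. In this Python sketch the loop body `x[i] -= c` is a hypothetical stand-in for the exercise's loop (whose listing is not legible in this copy): a short prologue handles the n mod 3 leftover elements, after which the unrolled loop always runs a whole number of three-element iterations. An epilogue after the loop would work equally well.

```python
def subtract_unrolled3(x, c):
    """In-place x[i] -= c with the loop body unrolled into three copies.

    The element count need not be a multiple of three: the prologue handles
    len(x) % 3 leftover elements first.
    """
    n = len(x)
    i = 0
    for _ in range(n % 3):   # prologue: 0, 1, or 2 leftover elements
        x[i] -= c
        i += 1
    while i < n:             # main loop: three copies of the body per pass
        x[i] -= c
        x[i + 1] -= c
        x[i + 2] -= c
        i += 3


data = [10, 20, 30, 40, 50]   # five elements: not a multiple of three
subtract_unrolled3(data, 1)
print(data)  # [9, 19, 29, 39, 49]
```

In MIPS the prologue count would be computed from the loop bound before entering the loop, so the unrolled body itself never needs an extra exit test.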
6.49 Using the code in Exercise 6.47, unroll the code … and schedule it for the static multiple-issue version of the MIPS processor described on pages 436–439. You may assume that the …
6.50 [10] As technology leads to smaller feature sizes, the wires become relatively slower (as compared to the logic). As logic becomes faster with the shrinking feature size and clock rates increase, wire delays consume more clock cycles. That is why the Pentium 4 has pipeline stages dedicated to transferring data along wires from one part of the pipeline to another. What are the drawbacks to having to add pipe stages for wire delays?
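One drawback can be put in numbers: if the added wire stages sit between instruction fetch and the point where branches are resolved, every misprediction discards more work. The Python sketch below uses the common CPI model CPI = base CPI + (branch fraction × misprediction rate × penalty); the workload figures and the assumption that each extra stage adds one cycle to the penalty are hypothetical, and the model deliberately leaves out the clock-rate gain that motivates the extra stages.

```python
def cpi_with_mispredictions(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """CPI when each mispredicted branch throws away `penalty_cycles` cycles
    of wrong-path work (all other stalls folded into base_cpi)."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 5% of them mispredicted.
before = cpi_with_mispredictions(1.0, 0.20, 0.05, penalty_cycles=20)
after = cpi_with_mispredictions(1.0, 0.20, 0.05, penalty_cycles=22)  # +2 wire stages
print(f"CPI {before:.3f} -> {after:.3f}")
```

Whether the deeper pipeline wins overall depends on whether the clock-rate improvement outweighs this CPI increase; longer load-use and forwarding distances are further costs the model does not capture.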
6.51 [30] <§6.10> … processors are introduced … editions of textbooks. To keep your textbook current, … developments in this area and write a one-page elaboration … 6.10. Use the World Wide Web to explore …
Answers to Check Yourself
§…, page 4…: 1. Stall on the LW result. 2. Bypass the ADD result. 3. No …
§6.7, page 6.7-…: Statements 1 and … are both true.
§…: Only statement … is completely …; … 2 … is partly accurate.
§…: … predication: software; … prediction: …; …: software; superscalar: hardware; EPIC: both, since there is substantial … support; multiple …; dynamic scheduling: hardware …