Paul Fake and Stephen Beard

Lab 6 Write-up

Note: We would like to credit Adam Miller for the test suite that we used to test
our instructions. Also, with mispim itself compiled with O3, Shang should run in about 3
minutes when compiled with O3, and about 20 minutes when compiled without optimization.

1) If you are building a processor and have to do static branch prediction (meaning
you have to assume at compile time whether a branch is taken or not), how should
you do it? You can make a different decision for branches that go forward or
backward.
Backward branches are typically loop branches. A loop condition passes on every
iteration except the last, so it may pass many times but fails only once per loop.
Backward branches are therefore taken far more often than not, so the compiler should
assume a backward branch will be taken and place an instruction from the branch-taken
path into the branch delay slot. As for forward branches, many more are taken than not
taken without optimization, but many more are not taken than taken with optimization.
Predicting that they will not be taken slows down the unoptimized case but speeds up the
optimized case, which matters more if most programs are released with optimization, so
the processor should predict that forward branches will not be taken.
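
As a rough illustration of the rule above (predict backward branches taken, forward
branches not taken), here is a minimal Python sketch. The function names and the
(branch address, target address, taken) trace format are our own assumptions for
illustration; they are not part of the simulator or the lab handout.

```python
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static rule: predict taken only if the branch jumps backward (or to itself)."""
    return target_pc <= branch_pc


def accuracy(trace):
    """Fraction of correct predictions.

    trace: iterable of (branch_pc, target_pc, actually_taken) tuples (hypothetical format).
    """
    trace = list(trace)
    if not trace:
        return 0.0
    correct = sum(
        predict_taken(pc, target) == taken for pc, target, taken in trace
    )
    return correct / len(trace)


# Made-up trace: a loop branch at 0x40 back to 0x20 that is taken 9 times and
# falls through once, plus a forward branch at 0x24 that is never taken.
loop_branch = [(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)]
forward_branch = [(0x24, 0x30, False)] * 10
print(accuracy(loop_branch + forward_branch))  # prints 0.95
```

The only misprediction in this toy trace is the final loop exit, which is the single
failure per loop described above.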

2) How good is the gcc MIPS compiler in filling the branch delay slot?
Without any optimization, the compiler does not place any useful instructions in the
branch delay slot. With O3, however, there are 8.25 times as many useful instructions
after a branch as there are useless ones, so the compiler does a good job of putting
useful instructions in the delay slot.
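
To put the 8.25 ratio in more familiar terms, the short Python sketch below converts a
useful-to-useless ratio into the share of delay slots that hold useful work. It assumes
every delay slot is counted as either useful or useless; the function name is ours.

```python
def useful_fraction(useful_to_useless_ratio: float) -> float:
    """Convert a useful:useless ratio into the fraction of delay slots doing useful work."""
    return useful_to_useless_ratio / (useful_to_useless_ratio + 1.0)


# 8.25 useful slots for every useless one is 8.25 out of 9.25 slots.
print(f"{useful_fraction(8.25):.1%}")  # prints 89.2%
```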

3) How good is the gcc MIPS compiler in avoiding load-use hazards?
Whether optimized or not, the instruction after a load never uses the result of the load.
Where the next instruction would otherwise depend on the load, the compiler places a
no-op after the load instead of the dependent instruction.
So, the compiler does a good job avoiding load-use hazards.
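
The check described above (does the instruction right after a load read the loaded
register?) can be sketched in Python as follows. The Instr record, the LOADS set, and
the two tiny example programs are hypothetical and only illustrate the idea; this is not
how the simulator represents instructions.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    op: str                 # mnemonic, e.g. "lw"
    dest: Optional[str]     # register written, or None
    srcs: Tuple[str, ...]   # registers read


# MIPS load mnemonics considered by this sketch.
LOADS = {"lw", "lh", "lhu", "lb", "lbu"}


def load_use_hazards(program):
    """Return indices of loads whose result is read by the very next instruction."""
    hazards = []
    for i in range(len(program) - 1):
        cur, nxt = program[i], program[i + 1]
        if cur.op in LOADS and cur.dest is not None and cur.dest in nxt.srcs:
            hazards.append(i)
    return hazards


unscheduled = [
    Instr("lw", "$t0", ("$sp",)),
    Instr("addu", "$t1", ("$t0", "$t2")),
]
padded = [
    Instr("lw", "$t0", ("$sp",)),
    Instr("nop", None, ()),
    Instr("addu", "$t1", ("$t0", "$t2")),
]
print(load_use_hazards(unscheduled))  # prints [0]
print(load_use_hazards(padded))       # prints []
```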

4) If you are building a 256-byte direct-mapped cache, what should you choose as
your block (line) size?
For O3, a block size of 32 bytes yields the highest hit rate, so that would be the ideal
block size for a 256-byte cache (assuming that most programs are released with
optimization).
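
For reference, the kind of measurement behind this answer can be sketched as a small
direct-mapped cache model in Python. The hit_rate function and the address traces are
our own illustrative assumptions. The toy trace only exercises spatial locality, so it
does not reproduce the 32-byte result, which depends on the real program's accesses.

```python
def hit_rate(addresses, block_size, cache_size=256):
    """Hit rate of a direct-mapped cache for a sequence of byte addresses."""
    num_lines = cache_size // block_size
    tags = [None] * num_lines          # one stored tag per cache line
    hits = 0
    total = 0
    for addr in addresses:
        block = addr // block_size     # which memory block the address is in
        index = block % num_lines      # which cache line that block maps to
        tag = block // num_lines       # distinguishes blocks sharing a line
        total += 1
        if tags[index] == tag:
            hits += 1
        else:
            tags[index] = tag          # miss: replace the line's contents
    return hits / total if total else 0.0


# Made-up trace: two sequential passes over a 128-byte array of 4-byte words.
trace = [a for _ in range(2) for a in range(0, 128, 4)]
for block_size in (4, 8, 16, 32, 64):
    print(block_size, hit_rate(trace, block_size))
```

Larger blocks help a sequential trace like this one, but they also leave fewer lines in
a 256-byte cache, so addresses that map to the same line evict each other more often.
For example, alternating between addresses 0 and 256 with 32-byte blocks never hits.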

5) What conclusions can you draw about the differences between compiling with no
optimization and -O3 optimization?
O3 optimization places useful instructions in branch delay slots, puts more non-hazardous
instructions after a load, and makes better use of registers, resulting in fewer loads and
stores. Without optimization, branch delay slots are filled with no-ops, load-use hazards
are remedied with no-ops, and most variables are kept in memory instead of in
registers.