Nicolas Farrell

CPE 315
Lab 6

1. Based on the data gathered with mipsim, the effectiveness of static branch prediction depends
largely on the level of compiler optimization. At the O0 level, predicting every branch taken was
consistently the better choice. At the O3 level, however, predicting backward branches taken and
forward branches not taken resulted in better run time. Overall, if I were designing a processor,
I would predict backward branches taken and forward branches not taken.
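The two static policies compared above can be sketched as follows. This is a minimal illustration, not mipsim's implementation: the `(pc, target, taken)` trace format, the function names, and the sample trace are all hypothetical.

```python
# Sketch: compare two static branch-prediction policies on a branch trace.
# Assumption: each trace entry is a hypothetical (pc, target, taken) tuple,
# not mipsim's actual trace format.

def predict_always_taken(pc, target):
    """Static policy: every branch is predicted taken."""
    return True

def predict_btfn(pc, target):
    """Backward taken, forward not taken: a branch whose target is at a lower
    address than the branch itself (e.g. a loop back-edge) is predicted taken."""
    return target < pc

def accuracy(trace, predictor):
    """Fraction of branches whose outcome the predictor guessed correctly."""
    correct = sum(1 for pc, target, taken in trace if predictor(pc, target) == taken)
    return correct / len(trace)

# Hypothetical trace: a loop back-edge at 0x40 taken 9 times and then falling
# through, plus a forward branch at 0x10 (e.g. an error check) never taken.
trace = [(0x40, 0x20, True)] * 9 + [(0x40, 0x20, False)] + [(0x10, 0x80, False)] * 10

print(f"always taken: {accuracy(trace, predict_always_taken):.2f}")
print(f"BTFN:         {accuracy(trace, predict_btfn):.2f}")
```

On this made-up trace, always-taken scores 0.45 and BTFN scores 0.95. Which policy wins depends on how often forward branches are actually taken, so a different mix of branches can reverse the result.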

2. At the O0 optimization level, gcc does not even attempt to fill the branch delay slot:
Branch delay slot:
Useful instruction: 0
Not useful instruction: 60024298
At the O3 level, however, gcc fills the branch delay slot very effectively, placing a useful
instruction there about 89% of the time:
Branch delay slot:
Useful instruction: 50146455
Not useful instruction: 6073340
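The 89% figure is the share of executed delay slots that held a useful instruction. A minimal sketch of that calculation, using the mipsim counts reported above (the function name is hypothetical):

```python
def delay_slot_fill_rate(useful, not_useful):
    """Fraction of executed branch delay slots that held a useful instruction."""
    total = useful + not_useful
    return useful / total if total else 0.0

# Counts copied from the mipsim output above.
o0 = delay_slot_fill_rate(0, 60024298)
o3 = delay_slot_fill_rate(50146455, 6073340)
print(f"O0: {o0:.1%}, O3: {o3:.1%}")
```

This prints a fill rate of 0.0% for O0 and 89.2% for O3.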

3. At the O3 optimization level, the compiler avoids load-use hazards roughly 88% of the time:
Load Use Hazard:
Has load use hazard: 5498222
Has no load use hazard: 41739242
At the O0 level, it avoids them only about 60% of the time:
Load Use Hazard:
Has load use hazard: 151276387
Has no load use hazard: 230915613
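The percentages above are the share of counted cases that did not produce a load-use hazard. The sketch below reproduces them from the mipsim counts and adds a rough stall estimate. The one-cycle-per-hazard penalty is an assumption borrowed from the classic five-stage MIPS pipeline with forwarding; mipsim's own stall accounting may differ, and the names are hypothetical.

```python
def hazard_avoidance_rate(with_hazard, without_hazard):
    """Fraction of counted cases with no load-use hazard."""
    total = with_hazard + without_hazard
    return without_hazard / total if total else 0.0

# Assumption: one stall cycle per load-use hazard (classic five-stage MIPS
# pipeline with forwarding). Not taken from mipsim.
STALL_CYCLES_PER_HAZARD = 1

# Counts copied from the mipsim output above.
counts = {"O3": (5498222, 41739242), "O0": (151276387, 230915613)}

for level, (hazards, clean) in counts.items():
    rate = hazard_avoidance_rate(hazards, clean)
    stalls = hazards * STALL_CYCLES_PER_HAZARD
    print(f"{level}: {rate:.1%} avoided, ~{stalls:,} stall cycles")
```

Under that assumption the O0 build would spend roughly 151 million cycles stalled on load-use hazards, against about 5.5 million for O3.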

4. At the O0 optimization level, the cache performed best with a block size of 8 words, but
performance did not drop off significantly all the way up to 64-word blocks. At the O3 level, the
cache performed best with 32-word blocks, with 64-word blocks only slightly worse. Overall, 32-word
blocks appear to be the best choice for a 256-byte direct-mapped cache, since they perform well at
both low and high optimization levels.
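The block-size tradeoff can be explored with a small direct-mapped cache model. This is a minimal sketch, not mipsim's cache simulator: it assumes 4-byte words, a byte-address trace, and the hypothetical function name `simulate_direct_mapped`.

```python
def simulate_direct_mapped(addresses, cache_bytes=256, block_words=8, word_bytes=4):
    """Return (hits, misses) for a byte-address trace on a direct-mapped cache."""
    block_bytes = block_words * word_bytes
    num_lines = cache_bytes // block_bytes
    if num_lines < 1:
        raise ValueError("block is larger than the cache")
    tags = [None] * num_lines  # one stored tag per line; None means invalid
    hits = misses = 0
    for addr in addresses:
        block_number = addr // block_bytes
        index = block_number % num_lines   # which line the block maps to
        tag = block_number // num_lines    # identifies the block within that line
        if tags[index] == tag:
            hits += 1
        else:
            misses += 1
            tags[index] = tag              # replace whatever was in the line
    return hits, misses

# A purely sequential walk over 256 words (1 KB).
sequential = [4 * i for i in range(256)]
for words in (8, 32, 64):
    print(words, simulate_direct_mapped(sequential, block_words=words))
```

A purely sequential trace rewards larger blocks (32 misses with 8-word blocks, 8 with 32-word blocks, 4 with 64-word blocks). Note, however, that a 256-byte cache with 32-word (128-byte) blocks has only two lines, so a real program that alternates between conflicting addresses can pay for large blocks with extra conflict misses.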

5. Based on the output, the primary differences between O0 and O3 optimization are in dynamic
instruction count and register usage. The O0 build executed almost 1.4 billion dynamic
instructions, whereas the O3 build did the same task in about 330 million. The difference in
register usage shows up as a reduction in loads and stores: the O0 build performs a large number of
unnecessary loads and stores, while the O3 build keeps values in registers to avoid them.
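One way to put the instruction-count difference in perspective is the CPU performance equation, CPU time = instruction count x CPI x clock period. The sketch below assumes both builds run at the same CPI and clock rate, which is a simplification (the O0 build's higher load-use hazard rate would likely raise its CPI); the function name is hypothetical and the counts are the rounded figures above.

```python
# Rounded dynamic instruction counts from the runs described above.
O0_INSTRUCTIONS = 1.4e9
O3_INSTRUCTIONS = 330e6

def speedup_same_cpi(ic_slow, ic_fast):
    """With equal CPI and clock period, CPU time is proportional to
    instruction count, so speedup is just the instruction-count ratio."""
    return ic_slow / ic_fast

print(f"~{speedup_same_cpi(O0_INSTRUCTIONS, O3_INSTRUCTIONS):.1f}x fewer instructions at O3")
```

This gives a ratio of roughly 4.2x, which would be a lower bound on the real speedup if O0's CPI is in fact higher.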