
Review by Binod Kumar (Roll No. 143079023)

Strong points:

The authors show that Value Prediction (VP) can be leveraged to reduce the aggressiveness
of the out-of-order engine by executing many instructions in-order, either in the frontend
or at the commit stage. This reduces the number of physical register file (PRF) ports
required by the out-of-order engine, yielding significant silicon area savings, significant
power savings in the scheduler and the register file, and a shorter register file access time.
Since prediction and validation are done in-order, banking the register file can greatly
decrease the number of PRF ports required by the value prediction hardware. In this way,
the authors obtain performance on par with a wide processor using VP, but with a smaller
out-of-order engine and a PRF of similar complexity to that of a processor without VP.
Experimental results indicate that the proposed technique, EOLE, achieves a speedup of 5%
or more on several benchmarks and a 10% performance improvement on one benchmark.
The authors have extensively analyzed the impact of instruction queue size and issue width
on their proposed architecture. They also provide an in-depth analysis of the hardware
complexity and propose several solutions for mitigating the hardware cost.
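The premise underlying these results is that a value predictor can forecast an instruction's result from its past behavior. A minimal last-value-plus-stride predictor sketch, for intuition only (the table organization and confidence scheme here are hypothetical simplifications, not the paper's actual predictor):

```python
class StridePredictor:
    """Toy per-PC stride value predictor (illustration only).

    Predicts next result = last value + last observed stride, and
    only speculates once the stride has repeated (a crude stand-in
    for the confidence mechanisms real predictors use).
    """

    def __init__(self):
        self.table = {}  # pc -> (last_value, stride, confidence)

    def predict(self, pc):
        if pc not in self.table:
            return None
        last, stride, conf = self.table[pc]
        # Speculate only after the stride has been seen twice in a row.
        return last + stride if conf >= 2 else None

    def train(self, pc, actual):
        if pc in self.table:
            last, stride, conf = self.table[pc]
            new_stride = actual - last
            conf = conf + 1 if new_stride == stride else 0
            self.table[pc] = (actual, new_stride, conf)
        else:
            self.table[pc] = (actual, 0, 0)


pred = StridePredictor()
# A load that walks an array produces values 10, 20, 30, 40 ...
for value in [10, 20, 30, 40]:
    pred.train(pc=0x400, actual=value)
# Once confident, the predictor forecasts the next value, 50.
```

Predictions like this one are what EOLE executes in-order (and later validates), which is precisely why the out-of-order engine can be made less aggressive.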

Weak points:

The authors claim that the register file in the OoO engine would be less likely to become a
temperature hotspot than in a conventional design. This is not always true: the proposal
relies on a distributed register file organization, with one file servicing reads from the
OoO engine and the other servicing reads from the late-execution/validation-and-training
stage, but hotspot formation also depends on the access pattern to the register file. Very
frequent (and bursty) accesses can still cause a local temperature increase. Without
experimental results from temperature simulations, the authors' claims cannot be
substantiated.
The authors provide no experimental results on energy/power savings, although they make
such claims when discussing hardware complexity. Performance benefits alone are not
compelling, since other factors (area, power, etc.) may be severely affected, lowering the
overall benefit.

Points of disagreement:
If instructions were perfectly scheduled, eventually removing all stall cycles due to RAW
dependencies, Value Prediction would certainly offer no benefit at all, because the actual
results would always be computed before the predicted values became useful. Admittedly,
perfect scheduling is not possible in current processors because of their complex pipelines
and memory hierarchies. An analysis of the efficacy of instruction scheduling versus the
required amount of value prediction is therefore needed to establish the impact of Value
Prediction in the proposed architecture.

Suggestions for improvement:


The number of ALUs in the proposed Early-Execution stage could also be limited.
Specifically, µ-ops and their predicted/computed results could be buffered after
the Early-Execution/Rename stage. Dispatch groups of up to 8 µ-ops could then be
built, with the extra constraint of at most 4 early-executions or prediction
writes to the PRF per dispatch group.
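A sketch of such a group former (the function and flag encoding are hypothetical; each µ-op is represented only by a boolean marking whether it writes a predicted or early-executed result to the PRF):

```python
def form_dispatch_groups(uops, group_size=8, max_prf_writes=4):
    """Partition an in-order µ-op stream into dispatch groups.

    Each group holds at most `group_size` µ-ops, of which at most
    `max_prf_writes` write a predicted/early-executed result to the
    PRF. `uops` is a list of booleans: True if the µ-op writes
    such a result.
    """
    groups, current, writes = [], [], 0
    for writes_prf in uops:
        # Close the current group when it is full, or when adding
        # another PRF writer would exceed the write-port budget.
        if len(current) == group_size or (writes_prf and writes == max_prf_writes):
            groups.append(current)
            current, writes = [], 0
        current.append(writes_prf)
        writes += writes_prf
    if current:
        groups.append(current)
    return groups


# Ten µ-ops, six of which write to the PRF: the 4-writer cap forces
# a split before the fifth writer, giving two dispatch groups.
stream = [True, True, False, True, True, True, False, False, True, False]
groups = form_dispatch_groups(stream)
```

The trade-off this exposes is the usual one: a tighter write budget shrinks the PRF port count but fragments dispatch groups, costing frontend bandwidth in write-heavy regions.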
It would be interesting to evaluate the possible variations of EOLE designs (i.e., the full
range of hardware-complexity mitigation techniques for both Early and Late Execution), as
well as other possible candidates for Late Execution, e.g., indirect jumps, returns, and
store address computations.

Points which are not clear:


The authors' major claim is that EOLE increases performance while redistributing execution
to different stages of the pipeline, reducing out-of-order engine complexity in the process.
However, predictor accuracy still plays a pivotal role. An analysis of how prediction
accuracy affects the hardware savings (in terms of out-of-order engine complexity) would
have clarified the impact of the suggested microarchitectural changes.

Points to be discussed in class:


It might be possible to further limit the number of effective write ports on the
PRF required by Early Execution and Value Prediction, since many µ-ops are
neither predicted nor early-executed and therefore generate no writes to the PRF.
One could thus limit the number of µ-ops that write to the PRF at the exit of
the Early-Execution stage.
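A back-of-the-envelope way to frame this discussion: if each renamed µ-op independently needed such a PRF write with some probability (an illustrative independence assumption, not a claim about real workloads), the chance that a reduced port count is ever insufficient follows a binomial tail:

```python
from math import comb


def overflow_probability(width, ports, write_frac):
    """Probability that more than `ports` of `width` renamed µ-ops
    need a PRF write in one cycle, assuming each µ-op independently
    writes a predicted/early-executed result with probability
    `write_frac` (an illustrative independence assumption).
    """
    return sum(
        comb(width, k) * write_frac**k * (1 - write_frac) ** (width - k)
        for k in range(ports + 1, width + 1)
    )


# With 8-wide rename and ~30% of µ-ops producing such writes,
# 4 write ports would overflow in only about 6% of cycles.
p = overflow_probability(width=8, ports=4, write_frac=0.3)
```

Under that assumption, halving the port count from 8 to 4 stalls only a small fraction of cycles, which is the intuition behind limiting effective write ports; a real evaluation would of course need measured per-benchmark write rates rather than an independence model.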
Value Prediction can also be tackled from the compiler point of view rather than the
microarchitecture. It would be interesting to characterize the interaction of VP with
compiler optimizations: less optimized programs should benefit more from VP, since
standard optimization passes effectively eliminate repetitive and highly predictable stack
loads. Compiler optimization therefore needs to be discussed alongside value prediction to
get a holistic picture of the overall benefits.
