Superscalar Architecture
• The VSP processor dynamically varies its pipeline depth and clock frequency
according to program behaviour, and unifies multiple pipeline stages into one
stage for low-energy operation when the processor workload is light.
• The proposed technique resizes the BTB along with pipeline scaling. In
addition, to prevent the prediction accuracy from degrading, BTB updates are
limited according to branch instruction type once the BTB size has been
reduced.
• The BTB size can be reduced to one-eighth after pipeline unification with only
0.02% prediction accuracy degradation. This results in a 9.2% reduction in the
dynamic energy of the processor core. The leakage energy consumption in the BTB
is reduced by 87.5%.
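The one-eighth size and the 87.5% leakage figure quoted above are consistent with each other under a simple model. The sketch below is only a back-of-the-envelope check: it assumes leakage energy scales linearly with the number of powered BTB entries, and it takes the 2K-entry baseline from the evaluation text. The function names are hypothetical.

```python
# Assumption: BTB leakage energy is proportional to the number of powered entries.
FULL_ENTRIES = 2048      # 2K-entry baseline BTB, as quoted in the evaluation
REDUCTION_FACTOR = 8     # BTB shrinks to one-eighth after pipeline unification


def reduced_entries(full_entries: int, factor: int) -> int:
    """Number of BTB entries that stay active after resizing."""
    return full_entries // factor


def leakage_reduction(factor: int) -> float:
    """Fraction of leakage saved if only 1/factor of the entries stay powered."""
    return 1.0 - 1.0 / factor


print(reduced_entries(FULL_ENTRIES, REDUCTION_FACTOR))   # 256
print(f"{leakage_reduction(REDUCTION_FACTOR):.1%}")      # 87.5%
```

Under this assumption, shrinking a 2K-entry BTB by a factor of eight leaves 256 active entries, which matches the size reported in the summary.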
Problem Statement :
• A deeper superscalar pipeline achieves higher performance but consumes more
energy. To reduce the energy of a deeply pipelined processor, a variable
stage pipeline (VSP) architecture has been proposed, which reduces energy
consumption by dynamically unifying pipeline stages according to program
behaviour.
Variable Stages Pipeline :
• To dynamically vary the pipeline depth, pipeline registers are replaced with
the circuit shown in Fig. 2.
• The proposed technique can be applied not only to VSP but also to PSU
and DPS.
Related Work :
• A BTB resizing technique has previously been proposed to reduce energy
consumption. It requires a profiling phase and relies on software. By
contrast, our resizing technique is implemented purely in hardware and does
not require any profiling.
• Lazy BTB aims at BTB energy reduction by filtering out redundant BTB
lookups using dynamic profiling. However, Lazy BTB degrades performance by
1.7% and incurs a small penalty (two cycles) on a branch misprediction.
Our technique incurs only a 0.05% performance loss on average on a 4-wide
superscalar processor.
Methodology :
• For performance evaluation and power estimation, FabScalar is used as the
baseline processor.
• Table II shows the six benchmark programs and their reference input sets.
Each benchmark program is fast-forwarded to its single simulation point
specified by SimPoint.
• Figure 6 shows how an instruction goes through the fetch stages under the
deeper and the shallower pipeline, respectively.
• Figure 7 describes the basic microarchitecture of our BTB resizing technique.
• Figure 8 shows the detailed BTB implementation of the proposed technique.
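The resizing behaviour can be illustrated with a small software model. This is a minimal sketch under several assumptions that are not taken from the source: a direct-mapped BTB, 4-byte instructions, entries outside the reduced range being lost on resize, and a hypothetical policy in which only conditional branches may update the reduced BTB. The names `ResizableBTB`, `BranchType` and `reduced_update_types` are illustrative only; the slides do not say which branch types are filtered.

```python
from enum import Enum, auto


class BranchType(Enum):
    CONDITIONAL = auto()
    UNCONDITIONAL = auto()
    CALL = auto()
    RETURN = auto()


class ResizableBTB:
    """Direct-mapped BTB model whose active size can shrink (hypothetical)."""

    def __init__(self, full_entries=2048,
                 reduced_update_types=(BranchType.CONDITIONAL,)):
        self.full_entries = full_entries
        self.active_entries = full_entries
        # Assumed policy: branch types still allowed to update the reduced BTB.
        self.reduced_update_types = frozenset(reduced_update_types)
        self.table = {}  # index -> (branch pc, target)

    def _index(self, pc):
        # Drop the byte offset (assumes 4-byte instructions), then wrap
        # around the currently active number of entries.
        return (pc >> 2) % self.active_entries

    def resize(self, entries):
        """Change the active size; entries beyond the new range are dropped."""
        self.active_entries = entries
        self.table = {i: e for i, e in self.table.items() if i < entries}

    def lookup(self, pc):
        """Return the predicted target for pc, or None on a BTB miss."""
        entry = self.table.get(self._index(pc))
        if entry is not None and entry[0] == pc:
            return entry[1]
        return None

    def update(self, pc, target, btype):
        """Install a target; filtered by branch type while the BTB is reduced."""
        reduced = self.active_entries < self.full_entries
        if reduced and btype not in self.reduced_update_types:
            return False  # update suppressed to protect the small BTB
        self.table[self._index(pc)] = (pc, target)
        return True
```

In this model, calling `resize(256)` when the pipeline is unified and `resize(2048)` when it returns to the deep configuration mimics resizing along with pipeline scaling, while the update filter keeps less useful branch types from evicting entries in the small table.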
Evaluation :
A. Evaluation of prediction accuracy :
• Figure 9 shows the prediction accuracy when the BTB size is reduced
from 2K to 64 entries. All other parameters are the same as in Table I.
• Although the drowsy strategy can significantly reduce leakage energy in the
BTB, it is difficult to implement because it requires a separate voltage
controller for each BTB entry.
• BAF reduces the BTB leakage energy by 83% with the drowsy strategy,
whereas the proposed technique reduces it by up to 87.5% with an easily
implemented leakage control technique.
Summary and Future Work :
• A BTB resizing technique for a variable stages superscalar architecture is
proposed.
• The performance evaluation results show that the proposed technique reduces
the BTB size from 2K to 256 entries with negligible performance
degradation.