
RECVF review --- Binod kumar, 143079023

Strong points

The paper uses the widely researched concept of redundant execution to achieve fault tolerance
through execution assistance. The main contribution is the forwarding of critical values, which act
as hints for execution on the second core (the trailing core). This leads to an energy-efficient
fault-tolerant architecture. The proposed technique cleverly uses DVFS to reduce energy
consumption. The experimental results show an improvement in energy consumption compared to two
previously reported techniques, CRT and PVA. In general, per-core DVFS has been shown to be
effective in reducing the power of a chip-level redundant architecture. That reduction, however, is
appreciable only for certain kinds of programs, such as those with poor branch prediction and poor
L1 data cache performance: these events create slack, which provides an opportunity to operate the
trailing core at a lower frequency. It is indeed a strong point that the technique introduced in
this paper (i.e., critical value forwarding) does not rely on these program properties. Another
strong point is that the area overhead is minimal compared to other fault-tolerant architectures.
The paper shows that forwarding the values of critical instructions provides fault tolerance
together with an improvement in power. This is definitely a better approach than simply
forwarding the outcomes of branches, or forwarding the outcomes of all load and branch instructions
(together these constitute nearly one-third of all instructions in SPEC'00 programs), as reported
in the literature. This is clearly summarized in the authors' key finding that 80% of the speedup of
forwarding can be achieved by forwarding the results of just 10-15% of all instructions. The
performance degradation with the proposed technique is also small.

Weak points

One weak point of the proposed technique is its probable inability to cover all faults that occur
in the cache-coherence circuitry. This is because it does not redundantly access the memory
hierarchy for unverified cache lines obtained from cache-to-cache transfers. The authors could have
commented on this aspect, in particular on whether fault recovery is still possible to some extent.
Another weak point is that the proposed technique gives no hint towards fault localisation (i.e.,
whether the fault occurred in the trailing core or the leading core). If fault localisation were
possible, debugging the corresponding electrical errors would become relatively easier. The authors
could have analyzed the chances of fault diagnosis with some additional hardware, along the lines of
"Online Diagnosis of Hard Faults in Microprocessors" by Bower et al., ACM TACO, 2007. The authors
claim that the architecture provides a high degree of coverage for processor control and execution
logic. This may not always be true, i.e., for all kinds of programs. For example, if a sparse matrix
multiplication/computation is performed, the execution logic is exercised relatively little. The
proposed architecture would surely be able to perform fault recovery in this case. However, in some
other program, a fault that was latent during the sparse matrix computation may appear, and fault
detection and recovery may not be as smooth.

Disagreement

For the identification of critical instructions for value forwarding, the authors use the concept
introduced in reference no. 32. These are merely heuristics, and a plausible explanation of their
relevance for all kinds of programs needs to be provided. Although the results shown by the authors
in the present work seem to validate these heuristics, reasoning as to why the other heuristics do
not perform as well as the fanout2 heuristic is definitely needed.
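
To make the discussion of the fanout heuristic concrete, here is a minimal sketch of how a fanout-based
criticality test could work. It assumes that "fanout2" means "the result is read by at least two later
instructions before its register is overwritten"; that reading, the trace format, and the name
fanout_critical are illustrative assumptions, not the authors' implementation.

```python
def fanout_critical(trace, threshold=2):
    """Return indices of instructions whose result has at least `threshold` consumers.

    `trace` is a list of (dest_register, [source_registers]) tuples in program
    order; dest_register is None for instructions that produce no value.
    This trace format is hypothetical and chosen only for illustration.
    """
    last_writer = {}            # register -> index of the instruction that last wrote it
    fanout = [0] * len(trace)   # number of consumers seen for each producer
    for idx, (dest, sources) in enumerate(trace):
        for reg in set(sources):
            producer = last_writer.get(reg)
            if producer is not None:
                fanout[producer] += 1
        if dest is not None:
            last_writer[dest] = idx
    return [i for i, count in enumerate(fanout) if count >= threshold]


# r1 = load; r2 = r1 + 1; r3 = r1 * r2; branch on r3
example_trace = [("r1", []), ("r2", ["r1"]), ("r3", ["r1", "r2"]), (None, ["r3"])]
critical = fanout_critical(example_trace)
```

In this toy trace only the load is marked critical, since r1 is the only value with two consumers. The
branch has no consumers at all, which is why a purely fanout-based test never selects it.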
As is common with any signature-based error detection technique, aliasing is a concern in this
paper too: a fingerprint generated by a faulty leading core may match that of the trailing core, or
vice versa. The authors do not comment much on these scenarios. A very interesting case is when
both cores are faulty and the generated fingerprints (signatures) nevertheless match: how does
fault recovery happen in such a scenario?
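
The aliasing concern can be illustrated with a small sketch. It assumes a fingerprint formed by
truncating a CRC-32 over the retired results to a few bits; this hash, the 8-bit width, and the
function names are assumptions chosen for illustration and are not taken from the paper. Under the
further assumption that the hash spreads errors uniformly, a k-bit fingerprint aliases with
probability of roughly 2^-k.

```python
import zlib


def fingerprint(results, bits=8):
    """Hash a stream of 64-bit results down to `bits` bits (truncated CRC-32)."""
    data = b"".join(value.to_bytes(8, "little") for value in results)
    return zlib.crc32(data) & ((1 << bits) - 1)


def find_aliasing_fault(results, position, bits=8):
    """Search for a nonzero bit-flip mask that corrupts results[position]
    but leaves the fingerprint unchanged. Returns the mask, or None."""
    reference = fingerprint(results, bits)
    for mask in range(1, 1 << (bits + 1)):
        faulty = list(results)
        faulty[position] ^= mask
        if fingerprint(faulty, bits) == reference:
            return mask
    return None


def aliasing_probability(bits):
    """Approximate chance that a random error goes undetected (uniform-hash assumption)."""
    return 2.0 ** -bits


good = [10, 20, 30]
mask = find_aliasing_fault(good, position=1)
faulty = [10, 20 ^ mask, 30] if mask is not None else None
```

The corrupted stream differs from the correct one, yet both cores would report the same fingerprint,
so the comparison alone cannot reveal the fault.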

Point of discussion

This method can be used with coarse-grained multithreading to schedule several trailing threads on
a few processor cores while allowing leading threads to occupy individual cores. This can definitely
improve CMP throughput compared to the proposal in this paper. The idea of critical value
forwarding can be used for scheduling in this case. Thus, compared to this proposal, where each
leading/trailing thread combination requires two cores for execution, multiple trailing threads
would need to be multiplexed on a single trailing core. However, the multiplexing scheme must ensure
a small performance penalty compared to non-redundant execution.

Possibilities of improvement

There are a few opportunities (enhancements) that stem from the use of execution assistance
through critical value forwarding. One of them is adaptive critical value forwarding, which can be
based on monitoring some parameters of the leading core's execution. For example, retirement stalls
in the leading core can be monitored, and execution assistance increased for programs that show
many such stalls. Similarly, branch forwarding may also be carried out adaptively. The fanout()
heuristic fails to gauge the importance of the special case of branch/jump instructions, as these do
not produce any values to be consumed. However, they can be very important for performance. So,
branch, jump and call instructions that are mispredicted should be treated as lying on the critical
path, and their outcomes forwarded from the leading core to the trailing core, where they can be
used in place of branch predictions.
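
The adaptive scheme suggested above could be sketched as a simple per-interval controller. The
sketch assumes that criticality is decided by a fanout threshold (lower threshold means more values
forwarded) and that the leading core exposes a count of retirement-stall cycles per interval. The
function name, the 30%/10% stall bounds and the threshold range are hypothetical values picked for
illustration, not numbers from the paper.

```python
def adapt_threshold(threshold, stall_cycles, interval_cycles,
                    high=0.30, low=0.10, min_threshold=1, max_threshold=4):
    """Return the fanout threshold to use in the next monitoring interval.

    Many retirement stalls in the leading core suggest the trailing core is
    falling behind, so the threshold is lowered to forward more values.
    Few stalls suggest forwarding can be reduced to save bandwidth and power.
    """
    stall_fraction = stall_cycles / interval_cycles
    if stall_fraction > high:
        return max(min_threshold, threshold - 1)
    if stall_fraction < low:
        return min(max_threshold, threshold + 1)
    return threshold


# Example: 40% of the interval spent stalled, so assistance is increased.
next_threshold = adapt_threshold(2, stall_cycles=400, interval_cycles=1000)
```

The dead band between the two bounds keeps the threshold from oscillating when the stall rate hovers
around a single cut-off.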

Suggestions

Execution assistance is an ideal candidate technique for performance enhancement in addition to
fault tolerance. Many works in the literature have observed that redundant execution in a
leader-follower fashion can help achieve performance gains. This, however, requires the execution to
be adaptive for optimum improvement in performance, because some programs require more execution
assistance (i.e., more instruction results to be forwarded) than others to achieve good performance.
A static forwarding scheme would have to provide this higher amount to all programs, whether they
need it or not. This wastes interconnect power, core power, core-to-core bandwidth and chip area. In
contrast, adaptive execution assistance provides higher assistance only to the programs that need
it. This can increase hardware efficiency while achieving the goals of fault tolerance, performance
improvement and energy efficiency.
The enhancement suggested in Point of discussion can potentially solve the problem of throughput
loss, which is inherent in the present proposal because two cores are used to execute a single
program. However, another important aspect is the order in which execution requests are handled by
the trailing core. For a multiplexed scheme, priority-based scheduling can be performed such that a
higher priority is assigned to trailing threads whose counterparts are stalled in the leading core.
This can also assist in performance improvement.
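
One possible reading of this priority rule is sketched below: among the trailing threads
multiplexed on one core, those whose leading thread is currently stalled are served first, and ties
are broken by arrival order. The data layout and the name schedule_trailing are hypothetical; a real
scheduler would also have to account for context-switch cost and fairness.

```python
import heapq


def schedule_trailing(threads):
    """Order trailing threads for execution on a shared trailing core.

    `threads` is a list of (name, leader_stalled) pairs in arrival order.
    Threads whose leader is stalled get priority 0, all others priority 1.
    """
    heap = [(0 if stalled else 1, arrival, name)
            for arrival, (name, stalled) in enumerate(threads)]
    heapq.heapify(heap)
    order = []
    while heap:
        _, _, name = heapq.heappop(heap)
        order.append(name)
    return order


order = schedule_trailing([("A", False), ("B", True), ("C", False), ("D", True)])
```

Here B and D run before A and C because their leading threads are waiting on them.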
