Professional Documents
Culture Documents
Ieee
Ieee
net/publication/348851345
CITATIONS READS
3 601
4 authors, including:
Osman Hasan
National University of Sciences and Technology
313 PUBLICATIONS 2,768 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Runtime Hardware Trojan Monitors Through Modeling Burst Mode Communication Using Formal Verification View project
Statistical Model Checking of Relief Supply Location and Distribution in Natural Disaster Management View project
All content following this page was uploaded by Mahmoud Masadeh on 31 January 2021.
Abstract—Redundancy has been a general method to produce a overhead can be reduced by adapting a selective or partial
fault-tolerance system. The Triple Modular Redundancy (TMR) redundancy [4], to protect only the essential parts of the system
with majority voters covers 100% single fault-masking, where while keeping the remaining non-essential parts unprotected.
the minimum area overhead is 200%. On the other hand, ap-
proximate computing is suitable for applications that can tolerate On the other hand, approximate computing with its design-
errors and imprecision in their underlying computations. Thus, efficiency has been proposed to minimize the overhead asso-
inexact results allow reducing the computational complexity and ciated to fault tolerance [5].
hardware requirements with increased performance and power Approximate computing (AC) has emerged as an enabler for
efficiency. This work explains how approximate computing could enhanced performance and reduced power consumption [6]. It
provide low-cost fault-tolerant architectures with an enhanced
system’s reliability. In particular, we implement a novel Quadru- has been used in a large range of applications, e.g., multimedia,
ple Modular Redundancy (QMR) designs using three identical machine learning, big data and scientific applications, which
approximate modules in addition to the exact module. Moreover, are error-resilient, i.e., these applications are able to tolerate
a two-steps magnitude-based voter is proposed to be able to imprecision in their underlying computations, due to several
tolerate approximation error. To validate our approach, we factors including a noisy and/or redundant input data and
conducted experiments and the results showed the ability to
achieve high fault tolerance, i.e., 99.88%, while reducing the user perceptual limitations [7] [8]. Approximation accuracy
probability of system failure by 15%, with 62% and 49.5% depends on the application, the user and input data, where
reduced area and power, respectively, compared to the traditional the real definition of good quality of results is flexible. For
TMR. example, in image processing, the final result images are
perceived by humans, which is subjective between different
I. I NTRODUCTION
people. Image processing applications can accept up to 10%
Modern integrated circuits (ICs) are increasingly moving errors in an error-tolerant context [9]. Such an error margin for
towards reduced feature sizes with high integration density. accepted results can be leveraged upon to improve execution
However, this consequently results in reduced noise margins time and energy. In this paper, we investigate the use of AC
and an increase in susceptibility of applied voltages to external in designing efficient fault-tolerance systems.
effects caused by noise and radiation [1]. Such critical effects Triple Modular Redundancy (TMR) is one of the most tradi-
can cause permanent damage (hard errors) or transient faults tional fault-tolerance techniques, which consists of a triplicated
(soft errors), that may lead to a faulty system. Therefore, module and a voter for masking the errors, which may affect
error mitigation through fault tolerance, at both hardware one of the three redundant modules. The overhead of the TMR
and software levels, is proposed to solve this major issue, is a 200% increase in area, where the output of the system is
which significantly affects embedded and microprocessor- compared with the gold execution to detect the occurrence
based systems. of faults that cause errors. Approximate TMR (ATMR) [10]
Fault-tolerant computing can be done in hardware, soft- has been presented for hardware projects as a way to achieve
ware or in a hybrid fashion. In hardware-based systems, a fault coverage as good as the traditional TMR without a
components redundancy techniques, such as double modular huge area overhead. However, as stated in [11], the previously
redundancy (DMR) and triple modular redundancy (TMR), researched approaches of ATMR, depend on including dif-
are used for fault detection and correction, respectively [2]. ferent versions of the approximate module, where only one
For software-based systems, redundancy techniques include module could give an inexact output at each input vector
the addition of redundant code, without additional hardware (these modules are approximated in a complementary manner).
components. Such technique is very attractive in designs This allows the majority voter, for any input vectors, to select
based on commercial-off-the-shelf designs such as commercial two matching outputs out of three. However, this can be an
microprocessors [3]. Moreover, hybrid solutions combine the unrealistic assumption, since different approximate modules
benefits of both hardware and software techniques. could lead to different results with minor similarities between
The addition of redundant components, either hardware or their outputs. To overcome such infeasible assumption, we
software or both, introduces a non-negligible implementation propose a highly-reliable innovative scheme by using three
overhead in terms of size, power and performance. Such identical approximate modules in addition to the exact one.