Professional Documents
Culture Documents
A Cost Effective Self Healing
A Cost Effective Self Healing
Abstract—In this paper, self-healing concept for hardware sys- healing capabilities in mind [10]. DMR and TMR are two
tems is investigated and a new approach is proposed. Hardware types of redundancy techniques. In DMR technique, there are
systems have been proposing imitations to biological organisms two identical instances running in parallel for the same appli-
in the way they offer healing and recovery abilities. Digital
systems with inspired homogeneous architecture have improved cation and their outputs are connected to a comparator that
capabilities to compensate for any faults. Self-healing is defined triggers a mismatch whenever a fault occurs. TMR has three
by the ability of a system to detect faults or failures and fix them. identical modules running in parallel to provide fault masking
One of the main problems in current self-healing approaches is by a majority voter at their outputs to mask a signal fault. Em-
area overhead and scalability for complex structures considering bryonic hardware is a hardware system with self-healing based
they are based on redundancy and spare blocks. This paper pro-
poses a different approach for self-healing based on embryonic on replication of small building blocks with exact architecture
structures without a need for spare cells. The area overhead is and it is inspired by the multicellular organisms cell division
lower compared to other approaches relying on spare cells. The and differentiation mechanisms. The drawbacks of current
proposed approach relies on time multiplexing two functions in techniques are area and power consumption where they require
one cell within one clock cycle. The reliability of the proposed redundant hardware which is an extra overhead. EmHW is the
technique is studied and compared to conventional system with
different failure rates. This approach is capable of healing up same principle of self healing mechanism of multi-cellular
to 50% of the cells where each cell can cover another neighbor organisms for fault tolerance hardware system. Similar to
failed cell at most. The area overhead is 9% for the proposed multi-cellular organisms in nature the structure of circuits
approach which is much lower compared to other approaches based on EmHW is two-dimensional array of electronic cells
using spare cell. The proposed approach is applied to investigate (e-cell) with hardware reconfiguration capability. E-cells have
two case studies; ALU array, and neural network.
the abilities of adaptive self-healing (recovering) based on its
Keywords: Self-Healing, Embryonic hardware, Fault toler- structure. The structure is homogeneous where every cell can
ant, Evolvable hardware. perform all set of functions and can be configured to operate
one of them as shown in Fig. 1. Every cell is configured
I. I NTRODUCTION to performs only a specific function that is decided by the
configuration information in each cell. When a cell fails the
Nowadays, the electronic circuit industry is increasing in responsible module of self-diagnosis triggers self-healing and
complexity very rapidly with new devices and architectures the faulty cell will be replace by the spare cell. Fig. 1 shows
of hardware systems. Hardware failures happening -while general cellular structure, it is composed of six modules;
running some real time tasks- are common during operation for I/O module, address module, configuration module, control
heating, aging or surrounding environment parameters. Such module, function module, and detection module. The operation
failures will be a hindrance to the system in order to work of each one is as follows; I/O module is used for transmission
properly. Self-healing is a key approach for reliable electronic signals between different cells. Address module determines
systems that run under harsh environments, such as cosmic gene information from its coordinate of location information
radiation and cruel temperature in space applications. Self- and determines the location of the cells. Configuration module
healing is done via multiple approaches and many of them are stores the configuration information of all the cells which
based on redundancy which relies on recovering faulty cells mimics DNA in biological cells. it also provides information
and the reintegration of them back into the system. Self-repair for realization of self healing. Control module controls all
is the replacement of faulty cells by functioning spare cells. operations of the cell such as fault case, transparent case, and
Self-healing term can be used for both concepts. The role of idle case. Function module is the processing unit or the cor-
self healing is to recover the fault in the system to maintain the responding function which is determined by the configuration
operation with highest performance and longest time possible. information. Detection module detects cell working status, if
Some of the self healing research in the circuit level is based it is in a normal working condition or not.
on the circuit replication [1], [2]. Where, the faulty cell is
replaced by the spare one after the detection of a faulty cell Self-healing is not only crucial for reliability but it’s also
by the controlling part. Self-healing can be achieved in systems one of the main aspects for an intelligent system [10]. Other
over different levels. The first is application level healing [3], self-healing approaches inspired from hardware intelligence
which is the ability of an application, or a service, to heal paradigms include evolvable hardware, where a system can
itself. Next comes the System level healing [4], [5] which be evolved from an initial operating point with errors or low
depends on a programming language and design patterns that fitness until it finds a way around faulty cells through rearrang-
we apply internally. Finally, the level of hardware healing from ing cells with genetic programming. Another mechanism can
architecture and down to circuit techniques is the lowest level be considered where an online synthesis process takes place
of healing [6]. allowing the system to find a new synthesized architecture
There are multiple methods for self-healing such as Dual whenever a fault is detected. Such systems rely on a similar
Modular Redundancy (DMR) [7], Triple Modular Redundancy uniform organization like FPGAs or Genetic Programming
(TMR) [8], Embryonic Hardware EmHW [6], [9], as well units [11].
as other techniques like evolvable hardware and Intelligent The rest of the paper is organized as follows.Section II in-
Hardware System (IHW) framework that are designed with troduces proposed self-healing techniques. Section III presents
Reliability
which has three nodes, three hidden layer with 4 nodes 0.5
in each layer, and output layer with one node. This NN 0.4
is used for classification between two different classes. NN
0.3
is implemented using VHDL and the simulation results are
0.2
obtained using ISE Xilinx 14.4 on vertex 5.
0.1
system. The reliability is the ability of the system to perform Fig. 9. Reliability of conventional system.
its function within a certain period of time. In this paper the
reliability analysis is considered. As many analyses are done
1
in previous work on reliability [15], [16]. The probability of
success for system is 0.9
0.8
cell can run two functions at same clock period for neighbor 0.3
fault cell case then the reliability of the system is given by
0.2
Where m is number of active unites for k function out from Fig. 10. Reliability of the proposed technique.
these unites. Fig. 9 and Fig. 10 show the reliability of the
conventional and proposed technique for different failure rate. 350
Z ∞ 50
The MTTF of the traditional and proposed design with fail- Fig. 11. MTTF of the proposed and conventional technique.
ure rate 0.00315(times/year) is shown in Fig. 11 . Therefor, the
MTTF indicates proposed method has a better performance.
V. C ONCLUSION multiplexing the active cell will perform its task operation and
Digital systems performance reduces with time in case of the task of the faulty cell. So, it gives the flexibility to repair
faulty parts in the system. So, researchers work on self-healing 50% of cells without using spare cells. Also, the reliability of
system to save performance with its age. Many of approaches proposed technique is studied and compared to conventional
are done on self healing which is based on redundancy; the system in different failure rate. The proposed approach is
spare parts are used instead of faulty parts. The main drawback implemented for two cases; ALU array and NN. The proposed
of these approaches is area overhead. This paper presented method is implemented using VHDL and the simulation results
an approach of self healing without redundancy. The idea is are obtained using ISE Xilinx 14.4 vertex 5. For area overhead
based on instead using spare blocks, every block (cell) will the proposed technique has 9% overhead which is small as we
perform as spare cell for its number and using time division compared it with the previous work in the paper.