A Novel Fault Tolerant Design and an Algorithm for Tolerating Faults in Digital Circuits

R. V. Kshirsagar1, R. M. Patrikar2
1 Priyadarshini College of Engg. & Arch., Nagpur 440019, India
2 Visvesvaraya National Institute of Tech., Nagpur 440022, India
e-mail: ravi_kshirsagar@yahoo.com, rajendra@computer.org

Abstract - This paper proposes a novel fault tolerant algorithm for tolerating stuck-at faults in digital circuits. We consider single stuck-at faults occurring either at a gate input or at a gate output. A stuck-at fault may adversely affect the functionality of the user-implemented design. A novel fault tolerant design based on hardware redundancy (replication) is presented for the single fault model to tolerate transient as well as permanent faults. The design is also suitable for highly dependable systems implemented by means of Field Programmable Gate Arrays (FPGAs) at the RTL level. This approach offers the possibility of using larger and more cost-effective devices that contain interconnect defects without compromising on performance or configurability. The algorithm presented here demonstrates the fault tolerance capability of the design; it is implemented for a full adder circuit but can be generalized to any other digital circuit. Using exhaustive testing, the functioning of all three full adders can be easily verified. If a stuck-at fault occurs, the circuit configures itself to select the other, fault-free outputs. We have evaluated our novel fault tolerant technique (NFT) on five different circuits: full adder, encoder, counter, shift register and microprocessor. The proposed design approach also scales well to larger digital circuits and does not require fault detection. We also present the results of the triple modular redundancy (TMR) method and compare them with our technique. All possible faults are tested by injecting the faults using a multiplexer.

Index Terms - Fault tolerance, fault injection, field programmable gate arrays (FPGA), reconfiguration, triple modular redundancy (TMR), novel fault tolerant technique (NFT).

I. INTRODUCTION

Reliability and performance are two important factors that are becoming major concerns for next-generation very deep sub-micron systems. Their reduced supply voltages, and therefore noise margins, together with their reduced internal capacitances, will dramatically increase their susceptibility and sensitivity to radiation and to noise in general, making system failures extremely likely [1], [2]. As a consequence, not only will systems oriented to mission-critical applications (e.g., space, avionics, transport) reinforce the use of fault tolerance, but general-purpose systems implemented in next-generation very deep sub-micron technologies, including FPGAs, will also require some form of fault tolerance [4], [5].

There are two fundamentally different approaches to increasing the reliability of computing systems. The first is called fault prevention (also known as fault intolerance) and the second, fault tolerance. In the traditional fault prevention approach, the objective is to increase reliability by a priori elimination of all faults. Since this is almost impossible to achieve in practice, the goal of fault prevention is to reduce the probability of a system failure to an acceptably low value. In the fault tolerance approach, faults are expected to occur during computation, but their effects are automatically countered by incorporating redundancy, that is, additional resources, so that valid computation can continue even in the presence of faults. These resources may include additional hardware (hardware redundancy), redundant information (information redundancy), additional software (software redundancy), more time (time redundancy), or a combination of these. They are redundant in the sense that they can be omitted from a system without affecting its normal operation. Most of the early work in fault-tolerant system design was motivated by aerospace applications, and in particular by the requirement for computers to operate unattended for long periods of time. While this application is still an important one, fault tolerance is now regarded as a desirable, and in some cases essential, feature of a wide range of computing systems, especially in applications where reliability, availability, and safety are of vital importance.

A fault-tolerant design can provide dramatic improvements in system availability and lead to a substantial reduction in maintenance costs as a consequence of fewer system failures. For commercial systems, nonredundant (i.e., fault prevention) techniques have been preferred, mainly because a redundant design results in increased overhead in terms of area, power consumption and the like. Reliability is improved by using reliable components, refined interconnections, and so on. However, this approach has limited effectiveness in counteracting faults in hardware, and there are applications in which system failures once per day or once per week are not acceptable. To tolerate permanent faults in system hardware, redundancy is the most commonly used approach, though it results in increased overhead in terms of area, power consumption and the like.

The typical faults affecting interconnections are breaks, known as opens, and unwanted connections between points, known as shorts. A short between a signal line and ground or power can make the signal remain at a fixed voltage level. Such a fault is logically modeled as the signal being stuck at the corresponding fixed logic value v (0 or 1) and is denoted by s-a-v, i.e., the line always has the same logic value v, regardless of the inputs that would normally affect it.

Testing is unlikely to detect the presence of short-lived transient faults [2]. Modeling intermittent and transient faults requires statistical data on their probability of occurrence, which are usually not available. According to field studies, such non-permanent faults are the dominant cause of very large scale integration (VLSI) circuit and system failures (82-98%) [3]. The algorithm and design presented here deal not only with permanent stuck-at faults but also with transient faults in digital circuits [7].

The common form of modular redundancy in practical systems is Triple Modular Redundancy (TMR), used for single event upset (SEU) mitigation [6]. The basic concept of triple redundancy is that a sensitive circuit can be hardened to SEUs by implementing three copies of the same circuit and performing a bit-wise 'majority vote' on the outputs of the triplicated circuit, as shown in Fig. 1. The circuit in question can be a mere flip-flop or an entire logic design. The function of the majority voter is to output the logic value ('1' or '0') that corresponds to at least two of its inputs. For example, if two or more of the voter's three inputs are '1', then the output of the voter is '1'. The truth table for the majority vote circuit is shown in Table 1. If the inputs of the voter are labeled A, B, and C, and the output V, then the boolean equation for the voter is: V = AB + AC + BC. The logic gate representation of the majority voter is shown in Fig. 2.

Fig. 1. Triple modular redundancy with voter (the inputs feed the original circuit and two duplicated circuits; their outputs feed the voter, which drives the output)

TABLE 1
MAJORITY VOTE TRUTH-TABLE

A  B  C  |  V
0  0  0  |  0
0  0  1  |  0
0  1  0  |  0
0  1  1  |  1
1  0  0  |  0
1  0  1  |  1
1  1  0  |  1
1  1  1  |  1

Fig. 2. Majority voter circuit (inputs A, B, C; output V)

The drawback of the TMR circuit is that the voter circuit is not 100% reliable, whereas our design is 100% fault tolerant and reliable. We have designed a circuit (NFT), shown in Fig. 3 and based on the algorithm presented here, which replaces the voter circuit of the TMR method.
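The majority-vote equation V = AB + AC + BC can be checked against Table 1 with a few lines of Python. This is only an illustrative sketch: the function name `majority_vote` and the list `truth_table` are hypothetical names introduced here, not part of the authors' VHDL design.

```python
from itertools import product


def majority_vote(a: int, b: int, c: int) -> int:
    """Bit-wise majority of three 1-bit inputs: V = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


# Enumerate all input combinations in the A, B, C order used by Table 1.
truth_table = [
    (a, b, c, majority_vote(a, b, c)) for a, b, c in product((0, 1), repeat=3)
]

if __name__ == "__main__":
    for a, b, c, v in truth_table:
        print(a, b, c, "|", v)
```

The same function also shows why a single faulty copy is masked: whenever two of the three inputs carry the correct value, the third cannot change the result.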

II. METHODOLOGY

The algorithm presented in this paper can be generalized for any digital circuit. This paper presents a novel technique to tolerate stuck-at faults at the outputs of any digital circuit under test (CUT). This approach allows achieving fault tolerance with respect to all possible faults. If there is a fault at one of the outputs, the circuit itself detects the fault and configures itself to provide the fault-free output. In place of the voter circuit we have used a novel circuit, shown in Fig. 3, consisting of ex-or gates, a priority encoder and a multiplexer, to produce a fault-free output at any moment of time. This circuit is also capable of tolerating bridging faults, i.e., shorts between two signal lines. We also present a comparison between the TMR technique and our technique (NFT) in terms of area overhead and performance. The idea is implemented for a full adder circuit as shown in Fig. 4.

Fig. 3. Novel fault tolerant circuit (labels: inputs A, B, C; ex-or outputs s1, s2; priority encoder pe with inputs i1, i2 and output o; 2:1 mux; output V)

III. ALGORITHM

The algorithm is presented for a single fault simulation model, which consists of the following steps:

1. Choose the interconnect or output of the CUT randomly.
2. Assign values to the primary inputs.
3. Perform logic simulation and record the monitored values.
4. Force the inverse value (inject fault) on the chosen interconnect.
5. Again perform the logic simulation and record the monitored values.
6. Compare the monitored values in case (3) and (5).

It is assumed that the design has 'n' primary inputs, 'm' primary outputs, and a list of combinational circuits. The circuit under test is denoted as CUT here. We have used the following notations for the main signals, components and circuits:

• The primary inputs are denoted as (i1, i2, i3, ..., in)
• The primary outputs are denoted as (y1, y2, y3, ..., ym)
• The intermediate inputs are denoted as (is1, is2, ..., isa)
• The intermediate outputs are denoted as (os1, os2, ..., osb)
• CUT - Circuit under test
• EOR - Ex-or gate
• PE - Priority encoder
• MUX - Multiplexer

Algorithm single-stuck-at-fault
Inputs : (i1, i2, i3, ..., in) and CUT
Outputs : (y1, y2, y3, ..., ym)
begin
    for CUT do
        create two more copies of CUT and 2*m EORs
        connect the similar outputs of CUT and its copies to the inputs of EORs
    end for
    for all EOR do
        create m copies of PE
        connect similar outputs of EORs to the inputs of PEs
    end for
    for all PE do
        create m copies of MUX
        connect the outputs of PEs to select lines of MUXs
        connect the similar outputs of CUT and its copies to the input lines of MUXs
    end for
    for all inputs i1 to in do
        check the outputs y1 to ym at MUXs
    end for
end algorithm

This algorithm can be used to generate a fault tolerant design for any digital circuit.
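The single-fault simulation procedure (record the fault-free outputs, force the inverse value on one randomly chosen net, simulate again, and compare) can be sketched in Python for a gate-level full adder. This is a minimal illustration under stated assumptions, not the authors' implementation: the gate structure and the net names `x1`, `a1`, `a2`, as well as the functions `simulate` and `inject_and_compare`, are hypothetical choices made for this example.

```python
import random

# Hypothetical net names for a two-half-adder full adder:
# x1 = a xor b, a1 = a and b, a2 = x1 and ci, s = x1 xor ci, co = a1 or a2.
NETS = ("a", "b", "ci", "x1", "a1", "a2", "s", "co")


def simulate(a, b, ci, flipped=None):
    """Evaluate the full adder; the net named `flipped` (if any) is inverted."""
    nets = {}

    def drive(name, value):
        # Forcing the inverse value on one net models the injected fault.
        nets[name] = value ^ 1 if name == flipped else value
        return nets[name]

    a = drive("a", a)
    b = drive("b", b)
    ci = drive("ci", ci)
    x1 = drive("x1", a ^ b)
    a1 = drive("a1", a & b)
    a2 = drive("a2", x1 & ci)
    drive("s", x1 ^ ci)
    drive("co", a1 | a2)
    return nets["s"], nets["co"]


def inject_and_compare(a, b, ci, rng):
    """Run one fault-injection experiment and report whether the outputs differ."""
    net = rng.choice(NETS)                    # choose a net at random
    golden = simulate(a, b, ci)               # fault-free simulation
    faulty = simulate(a, b, ci, flipped=net)  # simulation with the injected fault
    return net, golden, faulty, golden != faulty


if __name__ == "__main__":
    print(inject_and_compare(0, 0, 1, random.Random(0)))
```

A fault that leaves the monitored outputs unchanged for a given input vector is simply not observable for that vector, which is why the comparison is repeated over several vectors and nets.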

Fig. 4. Circuit to tolerate interconnect stuck-at-faults (three full adders fa1, fa2, fa3 with inputs a, b, ci; d flip-flops with outputs q1 to q6; ex-or outputs s4, s5, c4, c5; priority encoders pe1, pe2 with inputs i1 to i4 and outputs o1, o2; multiplexers m1, m2 with outputs sum_o and carry_o; reset and clk)

The circuit under test is triplicated. A three-bit input is applied to the full adder and its copies. We have used d flip-flops (d-ffs) at the outputs of the CUTs so as to synchronize the outputs. The similar outputs of the CUT and its copies are fed to the ex-or gates. The similar outputs of the ex-or gates are in turn fed to the priority encoders. The outputs of the encoders are fed to two different multiplexers as select lines. The inputs to these multiplexers are the similar outputs of two of the full adders, i.e., fa1 and fa3. The system is designed in such a way that it tests the outputs of the full adders, and if any of them is stuck at any fault level, the circuit selects the fault-free output through the priority encoder, which is finally propagated to the output (sum_o / carry_o) through mux1 / mux2. It is assumed that only one fault occurs at a time. This technique can be generalized and implemented for tolerating faults in any other circuit.

IV. TESTING STRATEGY

All possible faults are tested exhaustively by injecting the faults (0, 1) in all the nodes of the CUT using a multiplexer, as shown in Fig. 5. The multiplexer can be inserted at the output net of the circuit to be tested.

Fig. 5. Logic node (labels: fault, enable, data, '1', '0', node)

The outputs of the full adders (fa1, fa2 and fa3) are s1, s2, s3 and c1, c2, c3, i.e., sum and carry respectively. The truth table of the full adder for exhaustive testing is shown in Table 2.

TABLE 2
TRUTH-TABLE (EXHAUSTIVE TESTING)

   Input     |  Output
a   b   ci   |  s   co
0   0   0    |  0   0
0   0   1    |  1   0
0   1   0    |  1   0
0   1   1    |  0   1
1   0   0    |  1   0
1   0   1    |  0   1
1   1   0    |  0   1
1   1   1    |  1   1

This circuit can also be tested for any stuck-at fault using pseudo-random testing, as shown in Table 3.

TABLE 3
TRUTH-TABLE (PSEUDO-RANDOM TESTING)

   Input     |  Output
a   b   ci   |  s   co
0   0   0    |  0   0
0   1   1    |  0   1
0   0   1    |  1   0
1   1   1    |  1   1

If the input (a b ci) is "001", then outputs s1, s2 and s3 are equal to '1', and since both inputs to each ex-or gate are the same, the outputs s4 and s5 of the ex-or gates will be '0'; these are connected to the inputs of the priority encoder. The priority encoder is designed in such a way that whenever i1/i3 is '0' the output of the priority encoder is assigned the value '0', and whenever i2/i4 is '0' the output is assigned the value '1'. The encoder outputs (o1/o2) are connected to the select lines of mux1/mux2. Here the priority is given to the i1/i3 inputs, hence s1 ('1') / c1 ('0') is propagated to the output sum_o / carry_o through mux1/mux2, which is the correct (fault-free) output. Now assume that s1 is s-a-0. Then the output s4 of the ex-or gate becomes '1' (as s1 = q1 = '0' and s2 = s3 = '1'), i.e., i2 becomes '0'; therefore o1 will be '1', which is the select signal for mux1, and hence s3 ('1') is propagated to the output sum_o through mux1, making it the fault-free/correct output.

Now let us consider the input "011". In this case outputs s1, s2 and s3 will be '0', outputs s4 and s5 of the ex-or gates will also be '0', and as i1 is '0', the select line of mux1 will be '0', making the final sum output (sum_o) '0'. Now assume that s1 is s-a-1. Then the output s4 of the ex-or gate becomes '1' (as s1 = q1 = '1' and s2 = s3 = '0'), i.e., i2 becomes '0'; therefore o1 will be '1', which is the select signal for mux1, and hence s3 ('0') is propagated to the output sum_o through mux1, making it the fault-free/correct output. A similar testing and detecting technique is applied using the remaining test vectors, "000" and "111", and by inserting faults at either c1, c2 or c3 to check the carry output carry_o at the output of mux2.

TABLE 4
FAULT COVERAGE USING NOVEL FAULT TOLERANT TECHNIQUE (NFT)

Sr. No.  Circuit                 No. of injected faults  No. of tolerated faults  Fault tolerance (%)
1.       Full Adder              46                      46                       100
2.       8 to 3 Encoder          96                      96                       100
3.       3-Bit shift register    20                      20                       100
4.       3-Bit ripple counter    42                      42                       100
5.       8-Bit microprocessor    168                     168                      100

TABLE 5
COMPARISON OF THREE TECHNIQUES IN TERMS OF AREA AND PERFORMANCE FOR FULL ADDER CIRCUIT USING ISE 8.2i

Sr. No.                      1.       2.       3.
Fault tolerance technique    None     T.M.R.   N.F.T.
No. of instances             09       22       21
No. of nets                  10       29       28
No. of CLBs                  01       05       05
No. of LUTs (F, G)           02       09       08
No. of i/o pads              05       11       11
No. of i/ps                  03       09       09
No. of o/ps                  02       02       02
Total equiv. gate counts     12       54       48
Delay (ns)                   8.369    13.364   9.977
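The selection behaviour described for the "001" and "011" test vectors can be sketched in Python for a single output bit. This is a hedged behavioural model, not the authors' VHDL: the flip-flops are omitted, the function names are hypothetical, the ex-or wiring (copy 1 with copy 2, copy 2 with copy 3) is an assumption inferred from the worked example, and the encoder output when both inputs are '1' is not described in the text, so a default of '0' is assumed.

```python
from itertools import product


def full_adder(a, b, ci):
    """Fault-free full adder: returns (sum, carry)."""
    return a ^ b ^ ci, (a & b) | (ci & (a ^ b))


def priority_encoder(i1, i2):
    """Return the mux select bit; i1 has priority over i2."""
    if i1 == 0:
        return 0
    if i2 == 0:
        return 1
    return 0  # assumption: this case is not described in the text


def nft_select(y1, y2, y3):
    """Pick a fault-free bit from three copies of one output."""
    i1 = y1 ^ y2  # assumed wiring of the first ex-or gate
    i2 = y2 ^ y3  # assumed wiring of the second ex-or gate
    select = priority_encoder(i1, i2)
    return y3 if select else y1  # mux inputs: copy 1 (fa1) and copy 3 (fa3)


def tolerates_all_single_output_faults():
    """Inject s-a-0 and s-a-1 at each output of each copy, for all inputs."""
    for a, b, ci in product((0, 1), repeat=3):
        golden = full_adder(a, b, ci)
        for copy, bit, stuck in product(range(3), range(2), (0, 1)):
            outputs = [list(golden) for _ in range(3)]
            outputs[copy][bit] = stuck  # single stuck-at fault on one output
            observed = tuple(
                nft_select(*(o[k] for o in outputs)) for k in range(2)
            )
            if observed != golden:
                return False
    return True


if __name__ == "__main__":
    print(nft_select(0, 1, 1))  # input "001", s1 stuck-at-0
    print(nft_select(1, 0, 0))  # input "011", s1 stuck-at-1
    print(tolerates_all_single_output_faults())
```

Under these assumptions the model covers only single faults at the outputs of the three copies; faults inside the ex-or gates, encoder or multiplexer are outside the scope of this sketch.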

V. SIMULATION AND SYNTHESIS

The fault-tolerance ability of the design was tested and verified by means of VHDL simulation programs. We have written VHDL code for the top entity and the various components. We have used Modelsim SE 6.2e for verifying the design. The circuits were synthesized and tested for Xilinx's XCV50-6PQ240 FPGA. Synthesis was done using Xilinx's synthesizer tool (XST) of ISE Foundation series 8.2i. Table 4 shows the results in terms of fault coverage for the five circuits mentioned earlier.

We have compared three implementations of the full adder circuit to check area, performance and power dissipation in Xilinx's XCV50-6PQ240 FPGA: the normal CUT, TMR, and our technique (NFT), for permanent as well as transient faults. Table 5 shows the results in terms of area and performance for the three implementations of the full adder circuit. Using our technique we have reduced the area to some extent. In terms of performance, the normal full adder circuit without fault tolerance had a maximum delay of 8.369 ns, the TMR version had a delay of 13.364 ns, and our NFT had a delay of 9.977 ns, representing an improvement of 25.344% in performance over TMR using our technique. Power dissipation in each circuit was evaluated using Xilinx's XPower tool. Table 6 shows the comparison of the three implementations in terms of power dissipation. Total power dissipation using NFT was also less than with the TMR method. The improvement in performance and power dissipation was greater when the technique was used on larger circuits.

TABLE 6
COMPARISON IN TERMS OF POWER DISSIPATION

Sr. No.  Fault tolerance technique  Power dissipation (mW)
1.       None                       127.77
2.       T.M.R.                     135.73
3.       N.F.T.                     129.54

VI. CONCLUSION

Many techniques have been suggested in the past to detect and tolerate interconnect stuck-at faults. The technique presented in this paper addresses stuck-at faults with a novel algorithm. We have successfully implemented and tested this design on Xilinx's XCV50-6PQ240 FPGA. The design is also capable of producing fault-free outputs even when stuck-at faults occur at the inputs of the priority encoders / multiplexers and at the outputs of the priority encoders. The methodology could be generalized to other such circuits in a similar way.

REFERENCES

[1] M. K. Stojcev, G. Lj. Djordjevic, and T. R. Stankovic, “Implementation of self-checking two-level combinational logic on FPGA and CPLD circuits”, Journal of Microelectronics Reliability, Volume 7, issue 44, pp. 173-178, 2004.
[2] P. K. Lala, “Self-checking and fault-tolerant digital system design”, San Francisco: Morgan Kaufmann Publishers, 2001.
[3] X. Castilo et al., “Boundary-scan test: A practical approach”, Dordrecht: Kluwer Academic Publishers, 1993.
[4] F. Hanchek and S. Dutt, “Methodologies for Tolerating Cell and Interconnect Faults in FPGAs”, IEEE Transactions on Computers, Vol. 47, No. 1, pp. 15-33, January 1998.
[5] J. Lach, W. H. Mangione-Smith, and M. Potkonjak, “Low Overhead Fault-Tolerant FPGA Systems”, IEEE Trans. on VLSI Systems, 6(2), pp. 212-221, June 1998.
[6] S. Habinc, “Functional Triple Modular Redundancy (FTMR)”, Design and Assessment Report, Gaisler Research, FPGA-003-01, ver. 0.2, pp. 1-55, December 2002.
[7] M. Alderighi, S. D'Angelo, C. Metra, and G. R. Sechi, “Novel Fault-Tolerant Adder Design for FPGA-Based Systems”, IEEE Proceedings on On-Line Testing Workshop, pp. 54-58, 2001.
