Abstract FSMs to automatically recover from single event upsets (SEUs) within the This paper presents a technique for same clock cycle. We shall report on the automatic error correction of FSMs for efficacy of this method to correct single single event upsets (SEU). We present a bit errors and an extension to detect two general technique of adding Hamming bit errors. Area and delay results of error checking bits to automatically using this technique on selected designs recover from single bit errors within the are also reported. same clock cycle. Experimental results demonstrate the efficacy of the method. Section 2 introduces the FSM model and the effect of single event upsets. Section 3 describes the use of Hamming error- 1 Introduction checking bits to automatically correct single bit errors. The experimental technique is presented in Section 4. FPGAs are increasingly being used for Results on selected FSM examples are mission-critical applications in presented in Section 5. Section 6 hazardous operating environments discusses the current status and future (space, military, medical etc.). In these work. applications the circuit must be fault- tolerant, especially finite state machines (FSMs), where failures are very hard to 2 FSM Model and Single Event detect [1]. Most single-fault-tolerant Upsets FSMs are implemented using triple module redundancy (TMR) or by using a A finite state machine is defined to be a single-error-correcting (SEC) code 6-tuple [10]: during state encoding [6, 7]. Typical instances of SEC employ a minimal M ( I , O , S , , , s0) encoding (binary, Gray, etc.) scheme to where: force a Hamming [2] distance of three. I is a finite nonempty set of inputs; O is Commercially available FPGA synthesis a finite nonempty set of outputs; S is a tools [8, 9] only implement error finite nonempty set of states; recovery to a specified recovery state. : I S S is the state transition They do not implement error correction. function; : I S O is the output In this paper, we investigate a general function; and s0 is the initial state. Figure technique of adding Hamming error
Kumar 1 118/MAPLD 2004
1 shows the hardware model of an FSM. 3 Error Correction An FSM is typically modeled as a set of state registers to store the state vector, We use a Hamming error-correcting combinational state decode for the next- code [1, 2, 3, 10]. To the n encoding bits, state and output logic. k parity checking bits are added to form an (n + k)-bit code. The location of each Outputs of the bits in the new code is assigned a Inputs decimal value starting at 1 for the most Combinational significant bit and n + k to the least Logic significant bit. k parity checks are performed on selected bits of each encoding. The result of each parity check is recorded as 1 if an error has been detected or 0 if no error has been detected. State Present State Vector Next state The number of parity bits k must satisfy the inequality 2k n k 1 . The parity Clock Reset checks are performed such that their value when an error occurs is equal to Figure 1: FSM Model the decimal value assigned to the location of the erroneous bit, and is Consider an FSM with five states S0, S1, equal to zero if no error occurs. This S2, S3, and S4 encoded using a binary or number is called the position number. sequential encoding. The encodings are The parity checking bits are placed in displayed in Figure 2: positions 1, 2, 4, …, 2k-1 such that they S0 000 are independent of each other and S1 001 encoded only in terms of the encoding S2 010 bits. Consider the example in Figure 2, S3 011 the original encoding uses three bits S4 100 (n=3), then k = 3 satisfies the above Figure 2: Example FSM Encoding inequality, thus three parity bits must be If the FSM state register experienced a added to the encoding bits to generate single event upset, say in state S0, it the error correcting code. could place the FSM in states S1, S2, or S4 depending on which bit of the state- The position numbers are shown in register experienced the upset. It is hard Table 1. From the table we observe that to determine if the current state is a an error in position 1, 3, or 5 should result of a valid state transition or the result in a 1 in the least significant bit result of a single event upset. It is hard to (c0) of the position number. Hence, the recover from such upsets because the code must be designed to have digits in upset actually placed the FSM in a legal position 1, 3, and 5 to have even parity. state. If the FSM had transitioned from If a parity check of these bits shows an S4 to states 101 or 110, it would have odd parity, the corresponding position been easier to detect the illegal transition number bit (c0) is set to 1, otherwise to and recover from the illegal state. 0. Also, from the table we observe that
Kumar 2 118/MAPLD 2004
an error in position 2, 3, or 6 should 4,5,6 parity check: 0 0 1; parity check is result in the center bit (c1) being set to 1 odd; c2 = 1. and an error in position 4, 5, 6 should 2,3,6 parity check: 0 0 1; parity check is result in the most significant bit (c2) odd; c1 = 1. being set to 1. 1,3,5 parity check: 0 0 0; parity check is Table 1: Position Numbers even; c0 = 0. Error position Position number The position number c2c1c0 is 110, c2 c1 c0 which means that the location of the 0 (no error) 0 0 0 error is in position 6. To correct the 1 0 0 1 error, the bit in position 6 is inverted, 2 0 1 0 and the correct encoding 000000 is 3 0 1 1 obtained. 4 1 0 0 5 1 0 1 The error correction outlined in this 6 1 1 0 section has been implemented to handle any encoding style and is automatically The parity bits are chosen as follows: inserted for FSMs. p1 is selected to establish even parity in bits 1, 3, 5. 4 Experimental Technique P2 is selected to establish even parity in bits 2, 3, 6. We collected seven sample FSM designs P3 is selected to establish even parity in (in VHDL) with varying complexity bits 4, 5, 6. with regards to number of states, inputs, Table 2: Hamming code for FSM and outputs. We added error correction Position 1 2 3 4 5 6 to the FSMs and wrote out a VHDL p1 p2 e2 p3 e1 e0 description with the correction logic 0 0 0 0 0 0 (after logic synthesis). 0 1 0 1 0 1 1 0 0 1 1 0 We generated a test-bench with a 1 1 0 0 1 1 thousand random test vectors. We 1 1 1 0 0 0 simulated the original FSM description and forced one of the state-bits in the Table 2 shows the Hamming code for synthesized VHDL to a constant value the example FSM from Figure 2. and simulated it and compared the outputs of the original and synthesized The error location and correction is designs. The outputs were identical performed by computing the parity because the single bit error was checks and determining the position corrected. We conducted the same number. For example, if we assume that experiment by forcing one of the parity bit zero of the encoding (e0) of state S0 is bits, we observed the same behavior, inverted (SEU), the new code is: because the single bit error was 000001. The three parity checks yield corrected. the following position number bits: We then forced two state bits (two parity bits) and re-simulated the synthesized
Kumar 3 118/MAPLD 2004
design and now we observed differences Vol. C-20, No. 10, October 1971, pp. between the original and synthesized 1167 – 1177. designs. The two bit errors actually produced functional errors whereas the [5] R Leveugle, “Optimized state single bit errors were corrected. assignment of single fault tolerant FSMs based on SEC codes,” Proc. 30th DAC, The other experiment compared the area 1993, pp. 14 – 18. and delay results of this technique with the more common usage of TMRs. [6] C. Bolchini, R. Montandon, F. Salice, and D. Sciuto, “A State Encoding 5 Results For Self-Checking Finite State Machines,” Proceedings of the ASP- Results on a set of example FSMs is DAC '95/CHDL '95/VLSI '95., IFIP presented in this section. International Conference on Hardware Description Languages; IFIP Use Darren’s tables and graphs here. International Conference on Very Large Scale Integration., Asian and South 6 Discussion Pacific, 1995, pp. 711 – 716.
[7] R. Rochet, R. Leveugle, and G.
We discuss the current status and future Saucier, “Analysis and Comparison of extensions. Fault Tolerant FSM Architecture Based on SEC Codes,” Proc. IEEE References International Workshop on Defect and Fault Tolerance in VLSI Systems, 1993, [1] S. Leveugle, L. Martinez, “Design pp. 9 – 16. methodology of FSMs with intrinsic fault tolerance and recovery [8] Precision™ Synthesis Reference capabilities,” IEEE Proc. EuroASIC’92, Manual, Mentor Graphics Corp. 2004. 1992, pp. 201 – 206. [9] Synplify Pro™ Reference Manual, [2] R.W. Hamming, “Error Detecting Synplicity Inc. 2003. and Error Correcting Codes,” The Bell System Technical Journal, Vol. 29, April [10] Z. Kohavi, Switching and Finite 1950, pp. 147 – 160. Automata Theory, McGraw-Hill Book Company. [3] D.B. Armstrong, “A general method of applying error correction to [11] M. Berg, “A Simplified Approach synchronous digital systems,” The Bell to Fault Tolerant State Machine Design System Technical Journal, Vol. 40, No. for Single Event Upsets,” Mentor 2, March 1961, pp. 577 – 593. Graphics Users’ Group User2User Conference, 2004. [4] J.F. Meyer, “Fault-tolerant sequential machines,” IEEE Trans. On Computers,