You are on page 1of 5

Abstract— Memory cells have been protected from soft errors for more than a decade due to the

increase in soft error rate in logic circuits, the encoder and decoder circuitry around the memory blocks have become susceptible to soft errors as well and must also be protected. I introduce a new approach to design fault-secure encoder and decoder circuitry for memory designs. The key novel contribution of this paper is identifying and defining a new class of error-correcting codes whose redundancy makes the design of fault-secure detectors (FSD) particularly simple. Further quantify the importance of protecting encoder and decoder circuitry against transient errors, illustrating a scenario where the system failure rate (FIT) is dominated by the failure rate of the encoder and decoder. I prove that Euclidean Geometry Low-Density Parity-Check (EG-LDPC) codes have the fault-secure detector capability. Using some of the smaller EGLDPC codes, I can tolerate bit or nanowire defect rates of 10% and fault rates of 10-18 upsets/device/cycle, achieving a FIT rate at or below one for the entire memory system and a memory density of 1010 bit/cm 2 with nanowire pitch of 10 nm for memory blocks of 10 Mb or larger. Larger EG-LDPC codes can achieve even higher reliability and lower area overhead. Index Terms—Decoder, encoder, fault tolerant, memory, nanotechnology.

I.INTRODUCTION Nanotechnology provides smaller, faster, and lower energy devices, which allow more powerful and compact circuitry; however, these benefits come with a cost—the nanoscale devices may be less reliable. Thermal- and shot-noise estimations[5],[8] alone suggest that the transient fault rate of an individual nanoscale device may be orders of magnitude higher than today‘s devices. As a result, we can expect combinational logic to be susceptible to transient faults, not just the storage and communication systems. Therefore, to build fault-tolerant nanoscale systems, we must protect both combinational logic and memory against transient faults. In the present work we introduce a fault-tolerant nanoscale memory architecture which tolerates transient faults both in the storage unit and in the supporting logic. In this paper, we introduce a fault-tolerant nanoscale memory architecture which tolerates transient faults both in the storage unit and in the supporting logic (i.e., encoder, decoder (corrector), and detector circuitries). Particularly, we identify a class of errorcorrecting codes (ECCs) that guarantees the existence of a simple fault-tolerant detector design. This class satisfies a new, restricted definition for ECCs which guarantees that the ECC codeword has an appropriate

redundancy structure such that it can detect multiple errors occurring in both the stored codeword in memory and the surrounding circuitries. We call this type of error-correcting codes, fault-secure detector capable ECCs (FSD-ECC). The parity-check Matrix of an FSD-ECC has a particular structure that the decoder circuit, generated from the parity-check Matrix, is Fault-Secure. The ECCs we identify in this class are close to optimal in rate and distance, suggesting we can achieve this property without sacrificing traditional ECC metrics. We use the faultsecure detection unit to design a fault-tolerant encoder and corrector by monitoring their outputs. If a detector detects an error in either of these units, that unit must repeat the operation to generate the correct output vector. Using this retry technique, we can correct potential transient errors in the encoder and corrector outputs and provide a fully fault-tolerant memory system. However combinational logic has already started showing susceptibility to soft errors, and therefore the encoder and decoder (corrector) units will no longer be immune from the transient faults. Therefore, protecting the memory system support logic implementation is more important. Here we proposed a fault tolerant memory system that tolerates multiple errors in each memory word as well as multiple errors in the encoder and corrector units. We illustrate using Euclidean Geometry codes and Projective Geometry codes to design the faulttolerant memory system, due to their well-suited characteristics for this application. II.SYSTEM OVERVIEW An overview of our proposed reliable memory system is shown in Fig. 1 and is described in the following. The information bits are fed into the encoder to encode the information vector, and the fault secure detector of the encoder verifies the validity of the encoded vector. If the detector detects any error, the encoding operation must be redone to generate the correct codeword. The codeword is then stored in the memory. During memory access operation, the stored codewords will be accessed from the memory unit. Codewords are susceptible to transient faults while they are stored in the memory; therefore a corrector unit is designed to correct potential errors in the retrieved codewords. In our design all the memory words pass through the corrector and any potential error in the memory words will be corrected. Similar to the encoder unit, a fault-secure detector monitors the operation of the corrector unit. All the units shown in Fig. 1 are implemented in fault-prone, nanoscale circuitry,the

Further detail the scrubbing operation and potential optimization to achieve high performance and high reliability. In order to avoid accumulation of too many errors in any memory word that surpasses the code correction capability. is Fault-Secure. 2) Any two points are connected by exactly one line. To perform the periodic scrubbing operation. 4) Two lines intersect in exactly one point or they are parallel.only component which must be implemented in reliable circuitry are two OR gates III. where hi . Large memories are conventionally organized as sets of smaller memory blocks called banks.1 OVERVIEW OF OUR PROPOSED FAULTTOLERANT MEMORY ARCHITECTURE that accumulate the syndrome bits for the detectors. generated from the parity-check Matrix.. EG is a finite geometry that is shown to have the following fundamental structural properties: 1) Every line consists of  points. Let EG be a Euclidean Geometry with n points and J lines. Euclidean Geometry codes are also called EG-LDPC codes based on the fact that they are low-density paritycheck (LDPC) codes [14]. are fault-secure detector capable ECCs (FSD-ECC). the normal memory access operation is stopped and the memory performs the scrub operation.2 BANKED MEMORY ORGANIZATION WITH CLUSTER SIZE OF 2 H displays the lines that intersect at a specific point in EG and have weight  . during this period. the system must perform memory scrubbing. Therefore. transient errors accumulate in the memory words over time. i.ECCS WITH FAULT SECURE DETECTOR Particularly. LDPC codes have a limited number of 1‘s in each row and column of the matrix. A row in H displays the points on a specific line of EG and have weight  . 3) Every point is intersected by  lines. respectively. whose rows and columns corresponds to lines and points in an EG Euclidean geometry. The rows of H are called the incidence vectors of the lines in EG. The reason for breaking a large memory into smaller banks is to trade off overall memory density for access speed and reliability. MEMORY SCRUBBING Memory scrubbing is the process of periodically reading memory words from the memory. this limit guarantees limited complexity in their associated detectors and correctors making them fast and light weight [9]. j  0 otherwise. FIG. EUCLIDEAN GEOMETRY CODE REVIEW The construction of Euclidean Geometry codes based on the lines and points of the corresponding finite geometries [27]. We call this type of error-correcting codes. each memory bit can be upset by a transient fault with certain probability. and the columns of H are called the intersecting vectors . j  1 if and only if the ith line of EG contains the jth point of EG. correcting any potential errors. restricted definition for ECCs which guarantees that the ECC codeword has an appropriate redundancy structure such that it can detect multiple errors occurring in both the stored codeword in memory and the surrounding circuitries. they do not intersect.A column in FIG. Data bits stay in memory for a number of cycles and. and writing them back into the memory [12]. The parity-check Matrix of an FSD-ECC has a particular structure that the decoder circuit. and hi . This class satisfies a new. we identify a class of errorcorrecting codes (ECCs) that guarantees the existence of a simple fault-tolerant detector design. Let H be a J  n binary matrix.e.

n  2 2t 2t detector circuit for the (15. Therefore each bit of the syndrome vector is the product of c with one row of the parity-check matrix. we present a brief review of this correcting technique. For the whole detector. the implementation has n syndrome bits instead of n  k . 2t IV. parity-check matrix H of an EG Euclidean geometry can be formed by taking the incidence vector of a line in EG and its 2  2 cyclic shifts as rows. which is basically implementing the following vector-matrix multiplication on the received encoded vector c and parity-check matrix H : s  c. is considered here.  3t . It is shown in [15] that type-I 2-D EG-LDPC has the following parameters for any positive integer t  2 : • Information bits. t • Minimum distance. 5) EG-LDPC CODE An error is detected if any of the syndrome bits has a nonzero value. CORRECTOR One-step majority-logic correction is a fast and relatively compact error-correcting technique [1]. 1 . nn . This binary sum is implemented with an XOR gate. therefore this code is a cyclic code. Then we show the one-step majority-logic corrector for EGLDPC codes. • Dimensions of the parity-check matrix. Therefore. A special subclass of EG-LDPC codes. H is the incidence matrix of the lines in EG over the points in EG. to generate one digit of the syndrome vector we need a  -input XOR gate. 7. and therefore the number of rows do not necessarily represents the rank of the H matrix. The (2 2t  1)  (2 2t  1) .4 SERIAL ONE-STEP MAJORITY LOGIC CORRECTOR . The final error detection signal is implemented by an OR function of all the syndrome bits.3 FAULT-SECURE DETECTOR FOR (15.DESIGN STRUCTURE FAULT SECURE DETECTOR The core of the detector operation is to generate the syndrome vector. Since the row weight of the parity-check matrix is  . The output of this -input OR gate is the error detector signal. 5) EG-LDPC code. H are not necessarily linearly independent. • Row weight of the parity-check matrix. type-I 2-D EG-LDPC.   2t . This product is a linear binary sum over digits of c where the corresponding digit in the matrix row is 1.  2 t . it takes n(   1) 2-input XOR gates.H T ………… (1). Since the matrix is n  n . d m in  2  1 . k  2 • Length. In this section. There is a limited class of ECCs that are onestep-majority correctable which include type-I twodimensional EG-LDPC. • Column weight of the parity-check matrix. The rank of H is n  k which makes the code of this matrix linear code. 7. It is important to note that the rows of FIG. Fig. It is shown in [15] that H is a LDPC matrix. or (   1) 2-input XOR gates. 3 shows the FIG. and therefore the code is an LDPC code.of the points in EG.

encoder. ren(read enable). [1] Shu Lin and Daniel J. here address line is the information vector. The memory output is combination of coded vector and error vector. 2004.5 BEHAVIORAL WAVEFORM FOR THE MEMORY SYSTEM SIMULATION FAULT SECURE Figure 6 shows the schematic diagram of the proposed memory system. And also takes less area compared to other ECC techniques and in this architecture there is no need of decoder because we use systematic generated matrix. Error Control Coding. The majority value indicates the correctness of the code-bit under consideration. inputs are I (information vector). V.5. This corrected coded word is given to the detector to FIG. e.6 SCHEMATIC OF THE PROPESD MEMORY SYSTEM VI. Forshaw. ONE-STEP MAJORITY-LOGIC CORRECTOR One-step majority logic correction is the procedure that identifies the correct value of a each bit in the codeword directly from the received codeword. This method consists of two parts: 1) Generating a specific set of linear sums of the received vector bits and 2) Finding the majority value of the computed linear sums. This memory output is given as an input to the corrector which corrects the coded word. The theory behind the one-step majority corrector and the proof that EG-LDPC codes have this property are available in [1]. otherwise it is kept unchanged. and e (error vector) to introduce an error.g. Avoiding iteration makes the correction latency both small and deterministic. In this the encoded word is given to the memory for this if ‗wen‘ is ‗1‘(high) data is write into memory in a perticular address. wen(write enable). In fig. . R. 2004 [5] M. if the majority value is 1. D. Nikolic´.5. Costello. Prentice Hall.e.RESULTS AND ANALYSIS The behavioral simulation waveforms for the fault secure memory system is shown in fig.SUMMARY In this project implementation of fault secure encoder and decoder for memory applications.‖ Nanotechnology.At the corrector side detector sinal is ‗md‘. Stadler. pp. Here we overview the structure of such correctors for EGLDPC codes. This sum is called Parity-Check sum. and K. If ‗ren‘ is high data is read and given as an output of memory. vol. These steps correct a potential error in one code bit let‘s say. clk. S220–S223.check whether coded word is correct or not. ―A short review of nanoelectronic architectures. c n 1 . Crawley. The main advantage of the proposed architecture is using this detect-and-repeat technique we can correct potential transient errors in the encoder or corrector output and provide fault-tolerant memory system with fault-tolerant supporting circuitry. FIG.. and detector circuitries). The one-step majority logic error correction is summarized in the following procedure. 15. decoder (corrector). Using this architecture tolerates transient faults both in the storage unit and in the supporting logic (i. the bit is inverted. second edition. this is in contrast to the general messagepassing error correction strategy which may demand multiple iterations of error diagnosis and trial correction. This technique can be implemented serially to provide a compact implementation or in parallel to minimize correction latency. The core of the one-step majority-logic corrector is generating  parity-check sums from the appropriate rows of the parity-check matrix. A linear sum of the received encoded vector bits can be formed by computing the inner product of the received vector and a row of a parity-check matrix.

MA: MIT Press. 47. vol.‖ IEEE Trans. [14] W.. ―Low-density paritycheck codes based on finite geometries: A rediscovery and new results. [] Y. 39. Weldon. J. S. [12] A. Jul. A.. Tang. Press. no. Jan.I. 2001.‖ IEEE Trans. no. 7. no. Costello. 2nd ed. pp. Englewood Cliffs. Lin and D. Saleh. 572–596. pp. Kou. 51. Jr. Cambridge. MA: M. pp. ―Error rate in current-controlled logic processors with shot noise. 2711–2736. G. 83–86. Reliab. Inf. Kish. 2005. [27] H. [15] S. NJ: Prentice-Hall. Error Control Coding. 2004. Cambridge. Lin. and M. J. Feb.‖ Fluctuation Noise Lett. 1972. Theory. 2. P. Serrano. J. 1990. C. . W. 2nd Ed. Inf. Kim and L.T. Xu. and K.. Lin. Peterson and E.[8] J. 114–122. 1. S. vol. no. J. 1. Low-Density Parity-Check Codes. Theory. pp. 1963. and J. Gallager. Fossorier. 2004. ―Reliability of scrubbing recoverytechniques for memory systems. ―Codes on finite geometries.‖ IEEE Trans. vol. 4. vol. Patel. S. Abdel-Ghaffar. [9] R. Error-Correcting Codes.

7 7057080398907.:9 147 90        ! ./0. 2.35489.41 90 54398 3   %0701470   8 90 3.7 8:24.88 41  ! .97 2:95.947  .4/08   8   9     9       W4:230941905.2.70 430 8905 2.390 3.4/0  ' $$%#&%&# &%$&#%% # %0 .36:0 (  %070 8 ..4/0   850. 343074 .97  %8 574/:.3/ 9070147090. . 07090.088.89.9390.9 90 748 41  . .8.4/0/ .943 8 .947 41 .8 90 1443 5.7 3/0503/039  . 07747 /090.47708543/3/9 3 90 2.08 90 .0790543983 98 843 3 ( 9.97    9 8  .907747 .7 30. 30 3  . 25020393 90 1443 .97    &% $&#%% # #      !  3077478/090. 9 430 74 41 90 5.0.2.97  9 8 25479. -701 70.7.3 :.4/0  $3.9 41 .4/08  950   ! 8.97  .  . !  3 98 80.4/0  $3.4/0 8 .8 748 90701470 98 .0.943-98          W039  3   9 9 /090.90 8 9007747/090.0/ 03.2.947 2.:0  %0 13. 1.3-014720/-9.2.9 . 3.7 .4770.7 -3..3/5.0./ 41 3   %0  9  L  9   5..97 74 8   %8 -3. 8198 .3/ 98     ./03.3419083/7420 -98 ..908  47 90 40 /090.9.3  # .3042097.36:0  %03 0 84 90 430 8905 2.4770.39 94 3490 9. 90 83/7420 -98  %0 4:95:9 41 98 35:9 # .947 147  !..943 8 94 0307.79 .209078147.908   9       9 W32:2/89.79 .0.0.039007 9 K   W31472.88 41 8 9.9 8 .0 41 98 .:/0 950  94 /203843.0. .0.9..  %        $#  ##% #   $%!  #%   %0701470 0.4770.479 .341   8 3  .0.943 43 90 70..97   41 .438/070/070 98843 3 ( 9.8 3 83/7420 -98 3890. 2.0.97 %07.97419030834.4770.4/0 41 982.79 .90430/9419083/7420.8 .4770..90 90 83/7420 .0  / 23   3 L 3   W 2038438 41 90 5.0. ! 2.943 .0.07/9841 .943  0 5708039 .9470300/ ..470 41 90 /090..90 47 8   35:9 #.3 # 1:3.0. 2.90     848 90 .97     W #4 09 41 90 5.08 3 8   35:9  #.7 8:2 8 25020390/ 9 .0.3/70.79 .9 950     ! .947 8 90 574/:..79 . 8:-.479 4.79 .978 8  940307. . 2.947 4507.9  8 .0 .088.93 90.34190  2.0 2. 8 35:9 #.79 ..   ##% #  30 8905 2.947 . 8 -. 30.943 41 .-0 .947  9 9. 8 25020390/ - .3/ 90701470 90 3:2-07 41 748 /4 349 30..978 3 L 3   902502039. 290/ .947 .0.4/08.3.7.943 83.3!.97 30.479 4.425.70 349 30.0 902./03.94783.90/1.090740941905. -9 41 90 83/7420 .0.

43/0/943    ( 478.0/ .8094130.943 2.0.3//0.0 90 897:.943 47 3 5.4/0/47/8.90/2.. 07747 .7.9 2502039.943 ..:702024788902884331  3 1  35:98.0.97    ($:3.4/0/ . 3   ' #$&%$$$ %0 -0.930884190. .3/90574419.39.0 .42-3.7.:980.4770. 0  .0 41 3.4    55 $ $    . -49 82.7.947 1 703 8 /.  %82094/.9 2502039..3/ 07747.9 47 349 9 90 .0790/ 49078098 059 :3.4/07.4770.55.3/ 574.479 4.47.78:2419070.947 .947-98.70 31472.93.  8479 70.//70883089031472.:9 80.344  .:800:8088902.9 549039..9743.90.79 .03 .335:9 94 90 .307747 3989003.36:0 .3/ 4. 2.90 ..70/ 3 90 1443 574.941 90 70.9438 41 07747 /. .:08 90-983.38039 1.0.9.3/ .30/  %0 9047 -03/ 90 430 8905 2. 03 790 03.3488 .0.38039 077478 3 90 03.4770.850.30 48904 7747439744/3 !7039..39 20247 88902 9 1.947 .90 748 41 90 5.7.479 4.7 .943.0.:.90. 82:.3/ 0 07747 .79 .0.947 .943 41 1.4/047/9883.:70 03.4/0/ 47/ 8 .3/ 2:950 907.943  .07.7.90.470 41 90 430 8905 2.9478 147  !. 80.0307.4770.2  41 905745480/2024788902     $% %!# !$ # $$%   ' $&#     3 98 5740.9:708  .0 98 5745079 .3 -0 25020390/ 807.0.8.4770.943 ... /.70.4/07 47 .-0  .9 1742 90 70.02.908 90 .3/ 7050.943 190 2. -9 3 90 .4770. 8:2  %0 .479 .7.0/ !.0 . 077473430.94783.947 .:9 9407.3/3908:5547934.943 .9 /039108 90 .041905745480/.0 5.0.425:90/30. .1.4770.3/     3/3 90 2.4770..70 .9 . 94 574.3/ /090.943 8 90 574.425.943 897.947 4:95:9 .36:08 .4770.4770.9.947 %8202474:95:98.943 8 8:22.4/0-909 88.947 94 .4/07 .:0 41 90 .:98 -49 3 90 8947.03..4770.2088.0.0.0.08 90 .4/0714720247. .39 8:554793 .4770.-0  703 70..557457.3/. 0907 .78:28      %0 2.9:70 9070 8 34 300/ 41 /0.0/03.425.9 !.2./0 . 0.908 97.947.7.0. .479 4.479.90.9438 &83 98 . 870. 5079.4/07 -0.3400.79 0.4397.97  %8 8:2 8 ..0.947 8 0307.014728 147 901.943. .0/:70  %080 89058 .8 2/     ' # $&%  ' #  # % &% $&#  #$$% :70  848 90 8.438/07./07  7.4770.9.4770.3490. 8 790 394 20247 3 . /02.3-014720/-.84 9.36:0 0 .4/047/ /70.0.4/0/. 5.8.03 94 90 /090. 97./ 03..4/08 .78:2841 9070.0:39.4/0/ 47/ %8 .  0 03. 8:28 1742 90. .//7088  070 .9.03 94 9020247  147 98 1 03 8    /..3.0.4/0/ 47/ 8 .4770.425:93903307574/:.9 90.:9 9407../..:0 41 .79 .3/.9:708:83 98 /090.98 90 .4770.:0 3/.943 41 .479.:9708  %0 2.3 4:95:9 41 20247  %0 20247 4:95:9 8 .8994900307./0 1./.9 .  %8 90.9:70 41 8:.4/08  30.479 .4/3 907.947 .903.4/07  /0.-0 3 (  070 0 4.7.9 .0.97  %0 430 8905 2.3/ 3 98 ..0/. 74 41 .0.90/ .0/:70 9.9:70 9407.4/0 -9:3/07.93  5.3/ /09072389.. .08 088 .4770..3/ 97.43889841945.4770..947  94 3974/:.947  .70/ 94 4907  90..9478/0/090. 549039.903.947 -98.798   0307. 2.:97  3/ .4770.4770.  $%! #%   ##% #  30 8905 2.3 .0 942320 .0/ . # $9.883 07747 . .

3    (      !0907843 .0790.439740/ 4.4    34    55     :     (   $..11.:77039 .93 4/08 3// .38 31 %047 .2-7/0  %!7088    (   4:  $  3  .80/ 43 1390 04209708  70/8.90  #0.07 4 0389!.    ( %.3/   !.79 0.3/   !    4884707  4 /0389 5.07 .7:--3 70.36:08 147 20247 889028   %7.-  .90 3 . 574.7  4/08 431390 04209708 %7.943480099 .4.0.34  .4/08 .4.79 .9:.38  #0. .3  : $ 3 .4  34  55   .-9 41 8.3/ 30 708:98   %7.3/  48904 7747439744/3 3/0/  3044/ 118 !7039.3/  $ -/0 .38  31  %047  .0    $077. (   2 .08847898493480 :.0 .2-7/0   % !7088   ($ 3.4  34  55    0-           .4/08 -.3/     0/43  7  7747 4770.3/   8  7747 7.4  34   55       (#  .