
1368 IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 14, NO. 12, DECEMBER 2006

Sequential Element Design With Built-In Soft Error Resilience
Ming Zhang, Member, IEEE, Subhasish Mitra, Senior Member, IEEE, T. M. Mak, Senior Member, IEEE,
Norbert Seifert, Senior Member, IEEE, Nicholas J. Wang, Quan Shi, Kee Sup Kim, Member, IEEE,
Naresh R. Shanbhag, Fellow, IEEE, and Sanjay J. Patel, Member, IEEE

Abstract—This paper presents a built-in soft error resilience (BISER) technique for
correcting radiation-induced soft errors in latches and flip-flops. The presented
error-correcting latch and flip-flop designs are power efficient, introduce minimal speed
penalty, and employ reuse of on-chip scan design-for-testability and design-for-debug
resources to minimize area overheads. Circuit simulations using a sub-90-nm technology
show that the presented designs achieve more than a 20-fold reduction in cell-level
soft error rate (SER). Fault injection experiments conducted on a microprocessor model
further demonstrate that chip-level SER improvement is tunable by selective placement
of the presented error-correcting designs. When coupled with error correction code to
protect in-pipeline memories, the BISER flip-flop design improves chip-level SER by
10 times over an unprotected pipeline with the flip-flops contributing an extra 7–10.5%
in power. When only soft errors in flip-flops are considered, the BISER technique
improves chip-level SER by 10 times with an increased power of 10.3%. The error
correction mechanism is configurable (i.e., can be turned on or off), which enables the
use of the presented techniques for designs that can target multiple applications with a
wide range of reliability requirements.

Index Terms—Circuit simulation, error correction, fault injection, sequential element
design, soft error rate (SER).

I. INTRODUCTION

SOFT errors are radiation-induced transient errors caused by neutrons from cosmic rays
and alpha particles from packaging materials [1]. Soft error protection is very important
for enterprise computing and communication applications since the system-level soft error
rate (SER) has been rising with technology scaling and increasing system complexity.
Several designs today implement extensive error detection and correction (ECC) mainly for
on-chip SRAMs and register-files. However, memory protection is not enough for designs
manufactured in advanced technologies because soft errors in flip-flops, latches, and
combinational logic, also referred to as logic soft errors, are significant contributors
to the system-level SER [2], [3]. Logic soft errors pose a major challenge to robust
enterprise computing and networking platform designs. For many designs targeting such
platforms, not only the memory elements but also the latches and flip-flops must be
protected from soft errors in order to satisfy the system data integrity requirements.

This work presents a new family of sequential element designs with built-in soft error
resilience (BISER) that demonstrates four major contributions:
1) an error correction technique that achieves a more than 20-fold reduction in the
   cell-level SER of latches and flip-flops;
2) a reuse paradigm that helps lower the power and area penalties of the error
   correction technique;
3) a set of power saving techniques including an economy operation mode;
4) a set of fault injection results to illustrate the chip-level effectiveness and
   power penalty of the BISER technique.

The rest of this paper is organized as follows. In Section II, we describe the principle
of the new error-correcting designs. Section III describes the reuse paradigm that
further reduces the overhead of the presented technique. Section IV presents the key
circuit design considerations and the cell-level characterization results. Section V
presents the chip-level SER, performance, power, and area impact of the BISER technique
based on fault injection results. Other BISER design variations are briefly described in
Section VI. Finally, related work and conclusions are described in Sections VII and VIII,
respectively.

Manuscript received August 30, 2005. This work was supported in part by MARCO Gigascale
Systems Research Center (GSRC).
M. Zhang, T. M. Mak, N. Seifert, Q. Shi, and K. S. Kim are with Intel Corporation,
Folsom, CA 95630 USA (e-mail: ming.y.zhang@intel.com).
S. Mitra is with the Department of Electrical Engineering, Stanford University,
Stanford, CA 94305 USA.
N. J. Wang, N. R. Shanbhag, and S. J. Patel are with the Coordinated Science Laboratory,
University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA.
Digital Object Identifier 10.1109/TVLSI.2006.887832
1063-8210/$20.00 © 2006 IEEE

Fig. 1. Circuit schematic of a representative single-port D-latch.

II. ERROR CORRECTION IN SEQUENTIAL ELEMENTS

A master–slave flip-flop, as implemented in several cycle-based microprocessor designs,
is composed of a master latch followed by a slave latch. A representative single-port
latch is shown in Fig. 1. The inverter IN1 is used to generate complementary clock
signals locally; the transmission gate TG1 allows
ZHANG et al.: SEQUENTIAL ELEMENT DESIGN WITH BISER 1369

Fig. 2. Block diagram of an error-correcting flip-flop design.
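
The role of the C-element and keeper in Fig. 2 can be illustrated with a small behavioural sketch. This is not the authors' circuit or simulator; it only assumes what the text states (the C-element inverts when its two inputs O1 and O2 match, and the keeper holds Q when they differ). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class CElementWithKeeper:
    """Behavioural sketch of the inverting C-element plus keeper at node Q."""

    q: int = 0  # value currently held at Q

    def evaluate(self, o1: int, o2: int) -> int:
        if o1 == o2:
            # Matching inputs: the C-element acts as an inverter and drives Q.
            self.q = 1 - o1
        # Mismatched inputs: pull-up and pull-down paths are both off, so the
        # keeper retains the previous value of Q.
        return self.q


def truth_table(previous_q: int):
    """Return (O1, O2, Q) rows for a given previously held value of Q."""
    rows = []
    for o1, o2 in product((0, 1), repeat=2):
        rows.append((o1, o2, CElementWithKeeper(q=previous_q).evaluate(o1, o2)))
    return rows


# Both flip-flops store 1, so Q settles to 0; then an SEU flips O2 only.
cell = CElementWithKeeper()
q_before = cell.evaluate(1, 1)
q_after_seu = cell.evaluate(1, 0)
```

Under the single event upset assumption only one input can be wrong at a time, so `q_after_seu` equals `q_before` in this model: the upset is masked rather than propagated to Q.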

data at the D pin to flow through when the clock signal CLK is high; the inverter IN3 and
tristate inverter TI1 form a regenerative loop to store the sampled data when CLK is low.

The key to BISER is a new flip-flop design, composed of two flip-flops joined with a
C-element as shown in Fig. 2. To illustrate how error correction is achieved by the
C-element, consider the scenario where a particle strikes one of the four latches when
CLK is low. Note that only one latch is affected by a particle strike under the
assumption of a single event upset (SEU). When CLK is high, latches LB and PH1 are
transparent, and the same data is stored in these two latches (assuming that no soft
error has affected PH2 or LA). As shown in Table I, the C-element acts as an inverter
when the outputs of LB and PH1 match, and the flip-flop output Q has the correct value.
When CLK turns low, latches LB and PH1 hold the stored logic value inside their feedback
loops. The contents of latches LB and PH1 are now prone to soft errors, while LA and PH2
are not error-prone because they are transparent and driven by the preceding logic
stages. Suppose that a particle strike flips the logic value stored in PH1. The two
inputs of the C-element will be different but the error will not propagate to the
C-element output. Using the same principle, error correction is also enabled during the
scenario where a particle strike occurs when CLK is high.

TABLE I
TRUTH TABLE OF THE C-ELEMENT

The purpose of the keeper circuit in Fig. 2 is to fight the leakage current in the
C-element when both the pull-up and the pull-down paths in the C-element are shut off,
which occurs only when the content of one bistable gets flipped by a particle strike.
Depending on the process technology and the clock frequency, the keeper structure may
not be required. A particle strike in the keeper circuit will not cause an error at Q,
because both O1 and O2 hold the correct logic values under the SEU assumption, and
hence, the Q node is strongly driven by the C-element. Simulations have shown that the
error-correcting design can achieve an SER reduction of more than 20-fold when compared
to an unprotected flip-flop. More details are provided in Section IV.

III. REUSE PARADIGM

Scan DFT has become a de facto test standard in the industry because it enables an
automated solution to high quality production testing at low cost. In addition, scan is
extremely valuable for postsilicon debug activities [4] because it provides access to
the internal nodes of an integrated circuit. Fig. 3(a) shows the block diagram of a scan
flip-flop design of a microprocessor, comprising system and scan portions. Each portion
is a master–slave flip-flop composed of two latches. Note that the two-port latches,
such as the PH1 block in the system portion or the LA block in the scan portion, can
sample from one of the two data lines depending on which clock signal is active.

The scan flip-flop in Fig. 3(a) has two operation modes: test and normal. The scan
clocks for the test mode are illustrated in Fig. 4. Clocks SCA and SCB are applied along
the scan chain [Fig. 3(b)] alternately to shift a test pattern into latches LA and LB.
Next, the UPDATE clock is applied to move the contents of LB to PH1. Thus, a test
pattern is written into the system portion. Next, functional clock CLK is applied which
captures the system’s response to the test pattern. Finally, the CAPTURE clock is
applied to move the contents of PH1 to LA. The system response can then be scanned out
by alternately applying clocks SCA and SCB. During normal system operation, the scan
portion is shut off by assigning zero values to the scan clocks (SCA, SCB, UPDATE, and
CAPTURE), while the system portion is being clocked at full speed. The main reasons for
using this style of scan DFT instead of the classical scan flip-flop with a multiplexer
are simplified postsilicon debug and at-speed functional testing. The scan portion of
the flip-flop design in Fig. 3

Fig. 3. (a) Block diagram of a scan flip-flop design. (b) Scan chain.
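
The test-mode sequence for the scan flip-flop of Fig. 3 (shift in with SCA and SCB, UPDATE, one functional CLK, CAPTURE, shift out) can be sketched behaviourally as follows. This is a simplified illustration under stated assumptions: each clock is modelled as an instantaneous pulse, the system flip-flop is reduced to its slave latch PH1, the shift-out starts with an SCB pulse, and the `logic` argument is a hypothetical stand-in for the combinational logic between flip-flops.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScanCell:
    la: int = 0   # scan-portion master latch
    lb: int = 0   # scan-portion slave latch (drives the next cell's scan input)
    ph1: int = 0  # system-portion slave latch (drives the logic)


def pulse_sca(chain: List[ScanCell], scan_in: int) -> None:
    """SCA pulse: every LA samples the LB of the previous cell (or scan-in)."""
    sources = [scan_in] + [cell.lb for cell in chain[:-1]]
    for cell, bit in zip(chain, sources):
        cell.la = bit


def pulse_scb(chain: List[ScanCell]) -> None:
    """SCB pulse: every LB samples its own LA."""
    for cell in chain:
        cell.lb = cell.la


def run_scan_test(pattern: List[int],
                  logic: Callable[[List[int]], List[int]]) -> List[int]:
    chain = [ScanCell() for _ in pattern]
    # 1) Shift the pattern in; the bit for the last cell goes in first.
    for bit in reversed(pattern):
        pulse_sca(chain, bit)
        pulse_scb(chain)
    # 2) UPDATE: move LB into PH1, writing the pattern into the system portion.
    for cell in chain:
        cell.ph1 = cell.lb
    # 3) One functional CLK: the system flip-flops capture the logic response.
    response = logic([cell.ph1 for cell in chain])
    for cell, bit in zip(chain, response):
        cell.ph1 = bit
    # 4) CAPTURE: move PH1 into LA.
    for cell in chain:
        cell.la = cell.ph1
    # 5) Shift the response out, observing the LB of the last cell.
    observed = []
    for _ in chain:
        pulse_scb(chain)
        observed.append(chain[-1].lb)
        pulse_sca(chain, 0)
    return observed[::-1]  # reorder so index i corresponds to cell i


inverted = run_scan_test([1, 0, 1, 1], lambda s: [1 - b for b in s])
rotated = run_scan_test([1, 0, 1, 1], lambda s: s[1:] + s[:1])
```

The two calls use toy logic functions (bitwise inversion and a left rotation) only to show that the scanned-out bits match what the logic would produce for the scanned-in pattern.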

is sometimes designed to operate at-speed when scan is used for at-speed functional
test [5].

Fig. 4. Clock waveforms for a scan flip-flop in test mode.

A key observation is that scan resources [e.g., latches LA and LB in Fig. 3(a)] are
unused during normal operation, but still occupy chip area and consume leakage power.
The concept of reusing scan resources for error correction is illustrated in more detail
by the error-correcting scan flip-flop design (denoted EC design) in Fig. 5. The EC
design is modified based on the scan flip-flop design in Fig. 3 by adding an OR gate to
the clock path of LB and an AND gate to the clock path of LA, as well as rerouting the
2-D data port of the latch LA. The EC design has three distinct operation modes: normal,
test, and economy.

A. Normal and Test Modes

When the EC design operates in normal mode, the scan clocks SCA, SCB, UPDATE, and TEST
are forced low, while the CAPTURE signal is high. This equivalently converts the scan
portion into a master–slave flip-flop that operates in parallel with the system
flip-flop. Error correction is then achieved in the same way as described in Section II.

The test mode operation of the EC design is activated by forcing the TEST signal high.
This ensures that the output of scan portion O2 becomes a “don’t care” to the output of
the EC design Q so that the shifting of a test sequence along the scan chain does not
interfere with the operation of the EC design. The clock waveforms for the EC design in
test mode remain the same, as shown previously in Fig. 4. The EC design requires an
extra control signal TEST at the cell level compared to the scan flip-flop. However, at
the chip level, the TEST signal is directly generated from the available test circuitry.
Moreover, the TEST signal remains static (either high or low) in any operation mode.
Minimal buffering and routing are required since there are no strict timing constraints
to meet. As a result, the EC design does not increase the number of timing-critical
signals in the system, and hence, does not require major architectural changes.

B. Economy Mode

The economy mode is motivated by the fact that in a modern chip design environment, a
single design often targets several application segments at the same time with
apparently conflicting requirements. For example, a mobile application (e.g., a laptop)
requires very low power consumption but may not have a stringent requirement for low
SER. On the other hand, an enterprise application (e.g., data centers) may be less power
constrained, but may have to satisfy very stringent data integrity requirements against
soft errors. This provides a motivation to introduce an economy mode into the EC design.
The system can adaptively switch between the normal and economy modes depending on the
criticality of the application. If the application requires high reliability, the system
works under normal mode. Otherwise, the system switches into an economy mode, which
significantly reduces the power consumption by turning off part of the EC design
circuitry.

One way to invoke an economy mode for the EC design is to disable its scan portion by
assigning proper values to the scan clocks, as illustrated in Fig. 6. The signal CAPTURE
is forced low so that the second clock port C2 of latch LA is always low even though the
clock signal CLK is still toggling during economy mode. The signal SCB is forced high so
that the clock port C1 of latch LB is always high. The combined assignment of CAPTURE
and SCB signal values ensures that the scan portion, equivalent to a shadow master–slave
flip-flop, has an opaque master latch and a transparent slave latch. As a result, the
scan portion does not consume any dynamic power caused by internal data or clock
activities. The signal TEST is also forced high so

Fig. 5. Error-correcting scan flip-flop design.

Fig. 6. Error-correcting scan flip-flop design in economy mode.
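
The static control settings for the normal and economy modes can be summarized in a small sketch. The values of SCA, SCB, UPDATE, CAPTURE, and TEST in normal mode, and of CAPTURE, SCB, and TEST in economy mode, follow the text; holding SCA and UPDATE low in economy mode is an assumption, as is the clock polarity of the AND and OR gating (LA transparent when CLK is low, LB transparent when CLK is high). All names are hypothetical.

```python
# Static control-signal values per mode (test mode pulses the scan clocks
# instead of holding them static, so it is not listed here).
MODES = {
    "normal":  {"SCA": 0, "SCB": 0, "UPDATE": 0, "CAPTURE": 1, "TEST": 0},
    # SCA = 0 and UPDATE = 0 in economy mode are assumptions.
    "economy": {"SCA": 0, "SCB": 1, "UPDATE": 0, "CAPTURE": 0, "TEST": 1},
}


def scan_latch_clocks(mode: str, clk: int) -> dict:
    """Effective clocks seen by the scan latches for a given CLK level."""
    s = MODES[mode]
    return {
        # AND gate on the LA clock path; the use of inverted CLK is assumed.
        "LA_C2": s["CAPTURE"] & (1 - clk),
        # OR gate on the LB clock path.
        "LB_C1": s["SCB"] | clk,
    }


def error_correction_enabled(mode: str) -> bool:
    """TEST high makes O2 a don't-care, which disables the C-element."""
    return MODES[mode]["TEST"] == 0


normal_low = scan_latch_clocks("normal", 0)
normal_high = scan_latch_clocks("normal", 1)
economy = [scan_latch_clocks("economy", clk) for clk in (0, 1)]
```

In this model the scan latches toggle with CLK in normal mode, forming a shadow master–slave flip-flop, while in economy mode LA stays opaque and LB stays transparent regardless of CLK, so the scan portion sees no clock activity.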

that the C-element is disabled and the value of Q depends solely on the operation of the
system portion. This added economy mode roughly halves the dynamic power consumption of
the EC design when it runs noncritical applications.

IV. CIRCUIT DESIGN CONSIDERATIONS AND RESULTS

The scan portion in a scan flip-flop [Fig. 3(a)] typically operates at a lower speed
than the system clock during test mode [4]. As a result of this relaxed timing
requirement, slow transistors (those with smaller channel widths, longer channel
lengths, or higher threshold voltages) are sometimes used in the scan portion to lower
leakage power during normal mode without affecting its functionality in test mode. The
scan portion in an EC design, on the other hand, needs to meet the same timing
constraints (setup time, CLK-to-Q delay, etc.) as the system flip-flop in order to
guarantee the correct operation. This means that transistors at critical locations
within the scan portion of the EC design need to be replaced with the same type as those
used in the system portion.

Due to the presence of both the nMOS and pMOS stacks in the C-element, it is very
difficult to match the delay of a C-element with that of an inverter by only sizing up
the transistors. Instead, low-threshold-voltage (low-VT) transistors are used in the
transistor stacks. Furthermore, the keeper circuit needs to be designed with great care.
More specifically, the forward inverter is kept at minimal size while the feedback
inverter is intentionally weakened by using a larger channel length value. This
minimizes the extra

TABLE II
CELL-LEVEL POWER, AREA, AND SER COMPARISONS

delay caused by contention in the keeper loop when the C-element writes a new value into
the keeper circuit. The impact of these circuit modifications is accurately modeled by
our simulation methodology, which is described as follows.

Comparisons are conducted between the proposed EC design, a reference scan flip-flop
(denoted MSFF), and a dual interlocked storage cell [6] flip-flop with scan circuitry
(denoted DICE). The DICE design has been chosen for comparison because it is one of the
best known classical circuit hardening techniques. Since the circuit styles and
transistor sizes may be significantly different across all designs being compared, we
adopt a unified approach previously suggested in [7] to analyze the timing and power of
the flip-flops under investigation. Fig. 7 illustrates the test bench we used. We assume
that both external capacitive loads are fan-out of 4. We also insert buffers between
CLKI (fed by an ideal voltage source) and the CLK pin of the flip-flop, as well as
between DI and D. These buffering inverters serve two purposes: 1) provide realistic D
and CLK signals to the flip-flop and 2) capture power dissipation differences due to
different CLK and D loadings in different flip-flop designs. While optimizing the
various flip-flop designs, the objective is to match the timing parameter, D-to-Q delay,
with that of the reference design MSFF. We make the following assumptions during the
power measurement of all flip-flops:
1) data activity factor (average number of output transitions per clock cycle) is 0.25;
2) low-to-high and high-to-low data transitions are equally likely.

Fig. 7. Simulation test bench for timing and power measurements.

The cell layout areas are estimated by an internal tool at Intel, with a worst case
error of 5% compared to real layouts. The SERs are obtained from an internal
simulator [8].

Table II shows the simulated cell-level power, area, and SER of all the designs. All the
measurements are normalized with respect to those of MSFF. The D-to-Q delay parameters
are already equalized for all designs and hence not shown. The EC design achieves a
20-fold SER improvement over the reference design (MSFF) with the power and area
overhead of 1.43 and 0.17, respectively. This also indicates a 7% power and 26% area
savings as compared to the DICE design. As mentioned earlier, these savings are a direct
result of reusing existing on-chip resources. Another distinct advantage of the EC
design over DICE is the economy mode, which can further lower the overheads when
noncritical applications are running. Note that an SER improvement of more than 20-fold
is theoretically possible for the EC design but is beyond the resolution of the
simulator we used.

V. CHIP-LEVEL RESULTS

In this section, we explore the impact of BISER on a microprocessor by conducting a case
study. We focus on the impact that BISER has on SER and power consumption, identifying
the tradeoff between these two metrics. The chip-level area penalty caused by BISER is
also estimated. To conduct this investigation, we adopt the methodology used in [9],
which is summarized below.

We use a highly detailed register transfer level model of a deeply pipelined,
out-of-order microprocessor similar in complexity to the Alpha 21264. The fault model is
a single bit flip of a state element, either a flip-flop or a RAM cell. In order to
understand how various logic blocks in the pipeline contribute to the failure rate of
the microarchitecture, each flip-flop or RAM cell in the processor is categorized based
on the general function provided by that bit of state. The cumulative error coverage as
a function of protected states can then be obtained by means of fault injection
experiments. Two fault injection campaigns are then performed to characterize the
processor under two scenarios: one where all memories are protected and soft errors
affect flip-flops only and the other where soft errors affect both flip-flops and
in-pipeline memories.

A. Scenario I: Soft Errors Affect Flip-Flops Only

To emulate this scenario, faults are only injected into flip-flop states within the
processor pipeline because memories are assumed to be protected by ECC, and hence, not
affected by soft errors. The error coverage as a function of cumulative flip-flop state
coverage is shown in Fig. 8. The chip-level power penalty of selectively applying BISER
techniques to flip-flops is also estimated based on the cell-level power, total chip
power, and the percentage of protected flip-flops. The power and area penalties are
listed together with chip-level SER improvement in Table III. The chip-level SER can be
improved by ten times with an increased power of 10.3%.

B. Scenario II: Soft Errors Affect Both Flip-Flops and Memories

To emulate this scenario, faults are injected into both flip-flop and memory states
within the processor pipeline. The error coverage plot is shown in Fig. 9. The raw SERs
of flip-flops are different from those of RAM cells. Since fault injection was conducted
into both

TABLE IV
CHIP-LEVEL POWER, AREA, AND PERFORMANCE PENALTIES OF BISER
AS A FUNCTION OF CHIP-LEVEL SER WHEN SOFT ERRORS AFFECT
BOTH FLIP-FLOPS AND IN-PIPELINE MEMORIES

Fig. 8. Error coverage versus state coverage when all memories are protected
and soft errors affect flip-flops only.
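
A curve of the kind plotted in Fig. 8 can be assembled from per-category fault injection counts as sketched below. This is an illustrative model, not the methodology of [9]: it assumes categories are protected in decreasing order of errors per state bit, and that a protected bit still contributes 1/20 of its original errors (taking the cell-level 20-fold figure at face value). The category names and counts are invented for the example.

```python
from typing import List, Tuple


def coverage_curve(categories: List[Tuple[str, int, int]]):
    """categories: (name, state_bits, errors_observed) per functional category.

    Returns (name, cumulative state coverage, cumulative error coverage),
    protecting the categories with the most errors per bit first.
    """
    total_bits = sum(bits for _, bits, _ in categories)
    total_errors = sum(errs for _, _, errs in categories)
    ordered = sorted(categories, key=lambda c: c[2] / c[1], reverse=True)
    curve, bits_so_far, errs_so_far = [], 0, 0
    for name, bits, errs in ordered:
        bits_so_far += bits
        errs_so_far += errs
        curve.append((name, bits_so_far / total_bits, errs_so_far / total_errors))
    return curve


def chip_ser_improvement(error_coverage: float,
                         cell_improvement: float = 20.0) -> float:
    """Chip-level SER improvement when a fraction of errors is covered."""
    residual = (1.0 - error_coverage) + error_coverage / cell_improvement
    return 1.0 / residual


# Hypothetical fault-injection summary: (category, state bits, errors).
example = [("datapath", 300, 30), ("control", 100, 50), ("debug", 600, 20)]
curve = coverage_curve(example)
```

With these invented numbers, protecting 10% of the state bits covers 50% of the errors. Under the stated model, a ten-fold chip-level improvement requires an error coverage of about 94.7%, since (1 - c) + c/20 = 0.1 gives c = 0.9/0.95.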

flip-flop and RAM states, the error coverage curve depends on the relative SER of
flip-flops to RAMs. We consider various scenarios based on the comparative SER study in
[10]. The first curve (circle) corresponds to the case where the SER of the flip-flop is
the same as that of the RAM cell; the second curve indicates the SER of flip-flop is
one-tenth of that of the RAM cell; and so on. All three curves exhibit similar behavior.
They start to rise abruptly in the beginning because most of the error coverage is
achieved by protecting the RAMs in certain logic blocks. The curve corresponding to
higher flip-flop SER is lower because flip-flops contribute to more errors in that case.
A prediction has been made [10] that flip-flop SER will become larger relative to SRAM
SER as process technology advances, meaning it will be more important to protect
flip-flops.

The chip-level power penalty results are shown in Table IV. A ten-fold improvement in
chip-level SER can be achieved by paying 7.0%–10.5% power penalty, depending on the
flip-flop SER relative to SRAM SER. The higher the flip-flop SER, the more power penalty
is needed to achieve the same chip-level SER improvement.

TABLE III
CHIP-LEVEL POWER AND AREA PENALTIES OF BISER AS A FUNCTION OF CHIP-LEVEL SER WHEN ALL
MEMORIES ARE PROTECTED AND SOFT ERRORS AFFECT FLIP-FLOPS ONLY

Fig. 9. Error coverage versus state coverage when errors affect both flip-flops and
in-pipeline memories.

VI. OTHER DESIGN VARIATIONS

The same principles of BISER have been employed in various sequential element designs.
Examples include error-correcting scanout and mux-scan flip-flops [3], and sequential
element designs that correct combinational logic soft errors [11]. We present two
additional design variations in this section.

A. Low-Power EC Design

The EC design described in Section V inherits the main features of scan clocks in a scan
flip-flop, i.e., all scan clocks (SCA, SCB, CAPTURE, and UPDATE) are globally routed and
are not timing critical. A low-power error-correcting design named EC-LP is also
possible as shown in Fig. 10. The CAPTURE signal in this design is integrated into the
clock generation circuit, instead of being globally routed. This configuration reduces
clock loading inside the cell and eliminates one pin, which results in lower power
consumption. The main difference between EC-LP and EC designs stems from the clock
waveforms in test mode, as illustrated in Fig. 11. Note that the EC-LP design does not
have an economy mode.

B. Error-Correcting Scan Pulse Latch

Latches with pulse clocking, also known as pulse latches, have been used in several
microprocessors. For example, on the

Fig. 10. Low-power error-correcting design.

Fig. 11. Clock waveforms for a low-power error-correcting design in test mode.

Itanium 2 processor, 95% of all static latches are pulse latches [12]. A pulse latch
combines the behavioral merits of both an edge-triggered flip-flop and a level-sensitive
latch. On the one hand, it provides a relatively wider transparency window than that of
a flip-flop, which allows cycle stealing and skew tolerance. On the other hand, it
maintains edge triggering behavior and avoids the relatively long hold times of a
level-sensitive latch. Moreover, a pulse latch has speed, power, and area advantages
over its flip-flop counterpart: a pulse latch is faster due to the reduced number of
logic gate levels between its input and output, consumes less power due to roughly
halved clock loading and reduced data switching activities, and occupies smaller silicon
area due to smaller transistor counts.

Scan pulse latches have been implemented in a microprocessor for production testing and
postsilicon debug purposes [8], [12], [13]. Fig. 12 shows a scan pulse latch with system
and scan portions. During test operation, the system and scan portions form a
master–slave flip-flop to shift the test sequence in and out. During normal operation,
SHIFT is assigned a low value so that the system portion is isolated from the scan-in
data at SI. Note that data is written from both directions into the latch cell during
normal operation. This design eliminates the need for an interrupted feedback inverter,
and hence, saves clock power.

Fig. 12. Scan pulse latch design.

The design illustrated in Fig. 13 is an error-correcting scan pulse latch. During test
mode operation, SCKA and SCKB signals are pulsed high alternately to shift in the test
sequence while SHIFT is high. Once the test sequence reaches the desired location, the
system’s response is captured by one or more PCK pulses while SHIFT is low. To shift out
the system response, SCKA and SCKB are again alternately pulsed high while SHIFT is kept
high and PCK is kept low. During normal mode operation, SCKA, SCKB, and SHIFT signals
are kept low while PCK is used to latch system data. An SEU in either the PL1 or PL2
block will be corrected by the C-element as explained earlier.

VII. RELATED WORK

Prior work for soft error protection in memory and sequential elements can be broadly
classified into the following categories: process-level, device-level, circuit-level,
and system-level techniques. We briefly review and compare some representative
techniques in this section.

At the process level, silicon-on-sapphire (SOS) and silicon-on-insulator (SOI)
technologies have been developed as a soft error mitigation technique for space and
military applications. These technologies are immune to radiation-induced

Fig. 13. (a) Error-correcting scan pulse latch. (b) Circuit schematics of the PL1 or PL2 block in (a). (c) Circuit schematics of the C block in (a).

latch-up as the parasitic p-n-p-n structure does not exist anymore due to full
electrical isolation of individual transistors. Moreover, the thin silicon film reduces
the charge collection depth, which in turn, lowers the sensitivity to soft errors. Prior
work has demonstrated the SER improvement of SRAM devices fabricated on SOI over those
on bulk: 2× improvement for the 180-nm node and 5× for the 90-nm node [14]. However, the
robustness of an SOI device with a floating body may be compromised by an ionizing
particle that triggers the inherent and parasitic bipolar transistors, and hence,
results in charge amplification. Employing external body contacts could cure this
problem but significantly increases the area penalty.

At the device level, rad-hard-by-design (RHBD) techniques have been used to lower a
device’s sensitivity to radiation. For example, guardbanding around nMOS and/or pMOS
transistors greatly reduces the susceptibility of CMOS circuits to radiation-induced
latch-up [15]. However, applying such techniques to an existing standard cell library
could be time consuming since the layouts of many devices need to be regenerated.

At the circuit level, there are two main approaches to reduce the effects of soft
errors: 1) increasing the critical charge of a circuit node and 2) adding transistors to
enable redundant storage of information. Capacitive hardening [16], resistive hardening
[17], and the use of high-drive transistors have been used to increase the critical
charge. These techniques tend to increase the power consumption and lower the speed of
the circuits. Redundant circuit techniques include the low power Whitaker cell [18],
Barry/Dooley design [19], [20], and DICE design [6]. These redundant transistor-based
designs usually require at least twice as many transistors as unprotected circuits,
which typically indicates very high area and power penalties. The presented BISER
technique overcomes this limitation by reusing existing on-chip DFT resources.

At the system level, hardware and time redundancy techniques have been proposed to
combat soft errors. Classical hardware redundancy techniques include chip-level
duplication (as used in HP-Tandem machines [21]), block-level duplication (as used in
IBM Z-Series machines [21]), triple modular redundancy [22], parity prediction
[23]–[25], application-specific error detection techniques [26]–[28], DIVA [29], and
several others. A major benefit of these techniques is that they do not assume any
particular error mechanism, and hence, work for most error sources. However, except for
the application-specific error detection techniques, the area and power overheads are
generally very high. Major time redundancy techniques include error detection with
shifted operands (RESO) [30], redundant execution using spare elements (REESE) [31],
multithreading for transient error detection [32]–[36], and software implemented
hardware fault tolerance (SIHFT) [37], [38]. A key drawback for these techniques is that
the performance penalty can be very significant: around 40% for multithreading, and
40%–200% for SIHFT. Furthermore, the power overheads of these techniques are also
significant due to redundant execution, although there is a lack of published power
overheads. These techniques are mainly applicable for specific designs such as
microprocessors.

VIII. CONCLUSIONS

The BISER paradigm presented in this paper is the key to the development of power-, performance-, and area-efficient soft error protection techniques. BISER has the following unique advantages over other major soft error protection techniques:
1) minimal area and power overheads because resources already present for test and debug are reused for soft error resilience;
2) minimal routing overhead;
3) no requirement for major architectural changes;
4) applicability to any digital design (e.g., microprocessors, network processors, ASICs);
5) broad spectrum of design choices suitable for adaptive applications with a wide range of power and performance tradeoffs;
6) additional power saving techniques such as economy mode operation.

ACKNOWLEDGMENT

The authors would like to thank K. Ganesh, J. Maiz, P. Shipley, A. Vo, S. Walstra, and V. Zia, from Intel Corporation, for their discussions and assistance during the course of this research.
[25] C. Zeng, N. R. Saxena, and E. J. McCluskey, “Finite state machine syn-
REFERENCES
thesis with concurrent error detection,” in Proc. Int. Test Conf., 1999,
[1] R. C. Baumann, “Soft errors in advanced semiconductor devices-Part pp. 672–680.
I: The three radiation sources,” IEEE Trans. Device Mater. Reliab., vol. [26] K. H. Huang and J. A. Abraham, “Algorithm based fault tolerance
1, no. 1, pp. 17–22, Mar. 2001. for matrix operations,” IEEE Trans. Comput., vol. C-33, no. 6, pp.
[2] M. Zhang and N. R. Shanbhag, “A soft error rate analysis (SERA) 518–528, Jun. 1984.
methodology,” in Proc. IEEE Int. Conf. Comput.-Aided Des., 2004, pp. [27] W. J. Huang, N. R. Saxena, and E. J. McCluskey, “A reliable LZ data
111–118. compressor on reconfigurable coprocessors,” in Proc. IEEE Field Pro-
[3] S. Mitra, M. Zhang, T. M. Mak, N. Seifert, V. Zia, and K. S. Kim, gram. Custom Comput. Mach., 2000, pp. 249–258.
“Logic soft errors: A major barrier to robust platform design,” in Proc. [28] J. Y. Jou and J. A. Abraham, “Fault-tolerant FFT networks,” IEEE
IEEE Int. Test Conf., 2005, pp. 687–696. Trans. Comput., vol. 37, no. 5, pp. 548–561, May 1988.
[4] R. Kuppuswamy, P. DesRosier, D. Feltham, R. Sheikh, and P. [29] T. M. Austin, “DIVA: A reliable substrate for deep submicron microar-
Thadikaran, “Full hold-scan systems in microprocessors: Cost/benefit chitecture design,” in Proc. Int. Symp. Microarch., 1999, pp. 196–207.
analysis,” Intel Technol. J., vol. 8, pp. 63–72, Feb. 2004. [30] J. H. Patel and L. Y. Fung, “Concurrent error detection in ALUs by
[5] A. Carbine and D. Feltham, “Pentium pro processor design for test and recomputing with shifted operands,” IEEE Trans. Comput., vol. C-31,
debug,” in Proc. IEEE Int. Test Conf., 1999, pp. 294–303. no. 7, pp. 589–595, Jul. 1982.
[6] T. Calin, M. Nicolaidis, and R. Velazco, “Upset hardened memory de- [31] J. B. Nickel and A. K. Somani, “REESE: A method of soft error detec-
sign for submicron CMOS technology,” IEEE Trans. Nucl. Sci., vol. tion in microprocessors,” in Proc. Int. Conf. Dependable Syst. Netw.,
43, no. 6, pp. 2874–2878, Dec. 1996. 2001, pp. 401–410.
[7] V. Stojanovic, V. G. Oklobdzija, and R. Bajwa, “A unified approach in [32] E. Rotenberg, “AR-SMT: A microarchitectural approach to fault toler-
the analysis of latches and flip-flops for low power systems,” in Proc. ance in microprocessors,” in Proc. Int. Symp. Fault-Tolerant Comput.,
Int. Symp. Low Power Electron. Des., 1998, pp. 227–232. 1999, pp. 84–91.
[8] N. Seifert, V. Ambrose, P. Shipley, M. Pant, and B. Gill, “Radiation [33] N. R. Saxena, S. Fernandez-Gomez, W. Huang, S. Mitra, S. Yu, and E.
induced clock jitter and race,” in Proc. Int. Phys. Reliab. Symp., 2005, J. McCluskey, “Dependable computing and online testing in adaptive
pp. 215–222. and configurable systems,” IEEE Des. Test. Comput., vol. 17, no. 1, pp.
[9] N. J. Wang, J. Quek, T. M. Rafacz, and S. J. Patel, “Characterizing the 29–41, Jan. 2000.
effects of transient faults on a high-performance processor pipeline,” in [34] J. Ray, J. C. Hoe, and B. Falsafi, “Dual use of superscalar datapath for
Proc. Int. Conf. Dependable Syst. Netw., 2004, pp. 61–70. transient-fault detection and recovery,” in Proc. Int. Symp. Microarch.,
[10] R. C. Baumann, “The impact of technology scaling on soft error rate 2001, pp. 214–224.
performance and limits to the efficacy of error correction,” in Dig. IEEE [35] S. S. Mukherjee, M. Kontz, and S. K. Reinhardt, “Detailed design
Int. Electron Devices Meeting, 2002, pp. 329–332. and evaluation of redundant multi-threading alternatives,” in Proc. Int.
[11] S. Mitra, M. Zhang, S. Waqas, N. Seifert, B. Gill, and K. S. Kim, “Com- Symp. Comput. Arch., 2002, pp. 99–110.
binational logic soft error correction,” in Proc. IEEE Int. Test Conf., [36] T. N. Vijaykumar, I. Pomeranz, and K. Cheng, “Transient-fault
2006, Paper No. 29.2. recovery using simultaneous multithreading,” in Proc. Int. Symp.
[12] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J. Comput. Arch., 2002, pp. 87–98.
Sullivan, and T. Grutkowski, “The implementation of the Itanium 2 [37] N. Oh, P. P. Shirvani, and E. McCluskey, “Error detection by dupli-
microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp. cated instructions in super-scalar processors,” IEEE Trans. Reliab., vol.
1448–1460, Nov. 2002. 51, no. 1, pp. 63–75, Mar. 2002.
[13] D. D. Josephson, S. Poehlman, V. Govan, and C. Mumford, “Test [38] N. Oh, S. Mitra, and E. J. McCluskey, “ED4I: Error detection by di-
methodology for the McKinley processor,” in Proc. Int. Test Conf., verse data and duplicated instructions,” IEEE Trans. Comput., vol. 51,
2001, pp. 578–585. no. 2, pp. 180–199, Feb. 2002.

Ming Zhang (S’06–M’07) received the B.S. degree in physics from Peking University, Beijing, China, in 1999 and the M.S. and Ph.D. degrees in electrical engineering from the University of Illinois at Urbana-Champaign (UIUC), Urbana, in 2001 and 2006, respectively.
He is currently a Staff Computer-Aided Design (CAD) Engineer with Intel Corporation, Folsom, CA. From 1999 to 2001, he developed microelectromechanical systems for nanolithography applications at the Microelectronics Laboratory of UIUC. From 2004 to 2005, he interned at Intel Corporation and developed soft error resilient circuits and fault-tolerant architectures. His Ph.D. research work included a soft error rate analysis methodology and various classes of soft-error tolerant circuit design techniques. His current research interests include error-resilient and low-power circuits, variation and degradation-tolerant circuits and architectures, and circuit/architecture design for nanotechnology. He has published more than twenty technical papers and holds seven issued or pending U.S. patents. He serves on the program committees of several IEEE conferences and symposia.
Dr. Zhang was a recipient of the M. E. Van Valkenburg Research Award for demonstrated excellence in circuit and system research and the University Award for excellence in teaching an undergraduate-level circuit class.

Subhasish Mitra (SM’06) is an Assistant Professor in the Departments of Electrical Engineering and Computer Science, Stanford University, Stanford, CA. His research interests include robust system design, VLSI design and test, computer-aided design (CAD), fault-tolerant computing, and computer architecture. Prior to joining Stanford, he was a Principal Engineer at Intel Corporation, Hillsboro, OR, where he was responsible for developing enabling technologies for robust system design – Design for Excellence (Reliability, Testability, and Debug) – in advanced technologies. He has published more than 90 technical papers in leading conferences and journals, and invented design and test techniques that have shown widespread proliferation in the industry. His X-Compact technique for test compression is being used by more than 40 Intel products, and is supported by major CAD tools.
Dr. Mitra was a recipient of the IEEE Circuits and Systems Society Donald O. Pederson Award for the best paper published in the IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, a Best Paper Award at the Intel Design and Test Technology Conference for his work on built-in soft error resilience, a Best Paper Award nomination at the Design Automation Conference, a Divisional Recognition Award from the Intel Mobility Group for a breakthrough soft error protection technology, a Terman Fellowship from the Stanford School of Engineering, the Sundaram Seshu Scholar Lecturer at the Coordinated Science Laboratory of the University of Illinois at Urbana-Champaign, and the Intel Achievement Award, Intel’s highest honor, for the development and deployment of a breakthrough test compression technology that achieved an order of magnitude reduction in scan test cost. He has held consulting positions at several companies, and serves on the organizing and program committees of several IEEE and ACM sponsored conferences, symposia, and workshops.

T. M. Mak (SM’01) received the B.S.E.E. degree from Hong Kong Polytechnic University, Hong Kong, in 1979.
He is a Research Scientist with the Design Technology Solution Group, Intel Corporation, Santa Clara, CA, carrying out test research. He is currently serving a second term assignment to mentor MARCO/Focus Center Research Program (FCRP) research. He has been with Intel for over 22 years and has worked on a variety of areas including test development, product engineering, design automation, and design for test. His current research interests range from defect-based testing, fault effects as a result of nanometer technology, circuit level and physical design test issues, I/O interface and analog testing, and fault tolerant and online testing. He currently holds eight patents with seven more pending. He has served on the program committees of various conferences and workshops.
Dr. Mak was a recipient of the SRC Outstanding Industrial Mentor Award in both 1997 and 2004, the Best Paper Award at the International Test Conference in 2004, and a Best Panel Award from VTS in 2004.

Norbert Seifert (SM’04) received the M.S. degree in physics from Vanderbilt University, Nashville, TN, in 1994, and the Diplom Ingenieur and Ph.D. degrees in physics from the Technical University of Vienna, Vienna, Austria, in 1990 and 1993, respectively.
He is currently a Staff Reliability and Design Engineer with Intel Corporation, Hillsboro, OR, where he is responsible for developing a coherent chip-level SER methodology. He is also studying the impact of NBTI on circuit and system performance. He worked in TCAD and circuit design in the Alpha Development Group (DEC/Compaq/HP) from 1997 to 2003. Prior to joining DEC, he studied charge transfer processes in atomic collisions as a postdoctoral associate at North Carolina State University, Raleigh, and computational fluid dynamics of high-power laser material processing as a postdoctoral associate at the Technical University of Vienna. He has worked extensively on the interaction of radiation with matter, in general, and on the response of digital circuits in particular. He has published more than 30 technical papers on this topic and has presented soft error tutorials at several reliability conferences. He actively serves on the organizing and program committees at several IEEE sponsored conferences. He is a frequent reviewer for leading reliability journals and is a co-editor of the September 2005 IEEE TRANSACTIONS ON DEVICE AND MATERIALS RELIABILITY Special Issue on Soft Errors and Data Integrity in Terrestrial Computer Systems.

Nicholas J. Wang received the B.S. and M.S. degrees in electrical and computer engineering from the University of Illinois at Urbana-Champaign, Urbana, where he is currently pursuing the Ph.D. degree in electrical and computer engineering.
His research interests include fault-tolerant computer architectures.

Quan Shi received the B.S. and M.S. degrees in radio electronics from Beijing Normal University, Beijing, China, in 1986 and 1989, respectively, and the Ph.D. degree in electrical engineering from the University of New Mexico, Albuquerque, in 2000.
He is currently a Design Engineer with Intel, Hillsboro, OR. Before joining Intel, he was with the NASA Institute of Advanced MicroElectronics in Albuquerque, NM. His research interests include circuit hardening techniques, circuit reliability, circuit modeling, and asynchronous circuits.

Kee Sup Kim (M’92) received the Ph.D. degree from the University of Wisconsin-Madison, Madison.
Currently, he is Director of DFX at the Mobility Group at Intel Corporation, Hillsboro, OR, where he is in charge of developing solutions for design for testability, manufacturability, debug, and reliability for communications products. Previously, he worked on various aspects of testing and DFT for Intel CPU products. He served as an organizing committee member for the International Conference on Computer Design.
Dr. Kim was a co-recipient of the IEEE Donald O. Pederson Award for his work in test compression.

Naresh R. Shanbhag (F’06) received the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 1993.
Since August 1995, he has been with the Department of Electrical and Computer Engineering, and the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, Urbana, where he is presently a Professor. From 1993 to 1995, he worked at AT&T Bell Laboratories, Murray Hill, NJ, where he was the lead chip architect for AT&T’s 51.84-Mb/s transceiver chips over twisted-pair wiring for Asynchronous Transfer Mode (ATM)-LAN and very high-speed digital subscriber line (VDSL) chip-sets. His research interests include the design of integrated circuits and systems for broadband communications including low-power/high-performance VLSI architectures for error-control coding, equalization, as well as digital integrated circuit design. He has published more than 90 journal articles, book chapters, and conference publications in this area and holds three U.S. patents. He is also a co-author of the research monograph Pipelined Adaptive Digital Filters (Kluwer, 1994). From 1997 to 1999, he was a Distinguished Lecturer for the IEEE Circuits and Systems Society. From 1997 to 1999 and from 1999 to 2002, he served as an Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, PART II: EXPRESS BRIEFS and the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, respectively. He has served on the technical program committees of various conferences. He is also a co-founder and the Chief Technology Officer of Intersymbol Communications, Inc. (a wholly owned subsidiary of Kodeos Communications, Inc., since March 2006), Champaign, IL, which was founded in 2000, and where he provides strategic directions in the development of mixed-signal receivers for next generation optical fiber links.
Dr. Shanbhag was a recipient of the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS Best Paper Award in 2001, the IEEE Leon K. Kirchmayer Best Paper Award in 1999, the Xerox Faculty Award in 1999, the National Science Foundation CAREER Award in 1996, and the Darlington Best Paper Award from the IEEE Circuits and Systems Society in 1994.

Sanjay J. Patel (M’99) received the B.S., M.S., and Ph.D. degrees in computer science and engineering from the University of Michigan, Ann Arbor, in 1990, 1992, and 1999, respectively.
Currently, he is an Associate Professor in the Electrical and Computer Engineering Department and a Willett Faculty Scholar, University of Illinois at Urbana-Champaign, Urbana. He is also serving as Chief Architect at AGEIA Technologies, St. Louis, MO. His research interests include processor microarchitecture, computer architecture, and high performance and reliable computer systems. He has worked with architecture, hardware verification, logic design, and performance modeling at Digital Equipment Corporation, Intel Corporation, and HAL Computer Systems, as well as provided consultation for Transmeta, Jet Propulsion Laboratory, HAL, Intel, and AGEIA Technologies.
Dr. Patel is a member of the IEEE Computer Society.
