Abstract— Field-programmable gate array (FPGA)-based single-event upset (SEU) emulation in user flip-flops is essential for reliability evaluation of mapped designs. Previous approaches to inject SEUs in user flip-flops utilized the configuration memory bits that control the set and reset settings of the flip-flops. In contrast, this paper presents a novel approach for SEU emulation in user flip-flop contents through single-frame on-chip partial reconfiguration (PR). The presented methodology exploits the inherent architectural features of the latest Xilinx FPGAs to support state initialization of flip-flops during PR. The proposed approach does not require instrumentation overhead for flip-flops and reduces the fault-injection times by orders of magnitude.

Index Terms— Fault injection, radiation effects, reliability evaluations.

Manuscript received September 26, 2017; revised December 6, 2017 and January 29, 2018; accepted February 26, 2018. Date of publication March 6, 2018; date of current version April 12, 2018. This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant ESP2014-54505-C2-1-R.
A. Ullah is with the Center for Advanced Studies in Engineering, Islamabad 44000, Pakistan, and also with the ARIES Research Center, Escuela Politécnica Superior, Universidad Antonio de Nebrija, 28015 Madrid, Spain (e-mail: anees.ullah@case.edu.pk; aullah@nebrija.es).
P. Reviriego and J. A. Maestro are with the ARIES Research Center, Escuela Politécnica Superior, Universidad Antonio de Nebrija, 28015 Madrid, Spain (e-mail: previrie@nebrija.es; jmaestro@nebrija.es).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNS.2018.2812719
0018-9499 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Szeged. Downloaded on February 13,2023 at 16:58:51 UTC from IEEE Xplore. Restrictions apply.
IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 65, NO. 4, APRIL 2018

I. INTRODUCTION

Fault injection is an important tool for low-cost reliability evaluation of electronic circuits against radiation-induced soft errors. There are two approaches to injecting faults and evaluating their effects: through simulation or emulation. Fault injection through simulation utilizes the traditional circuit modeling and simulation tools (with their corresponding technology libraries). This minimizes the experimental setup times and provides higher observability on a per-cycle basis, but suffers from very slow processing times. On the other hand, fault injection by emulation utilizes a hardware platform for this purpose and provides much faster injection and evaluation times, but limited observability [1].

Field-programmable gate arrays (FPGAs) provide an ideal fault-emulation platform due to their higher logic densities and reconfigurability. The design mapped to an FPGA could represent the end product on a deployed FPGA or it may be an ASIC design utilizing the FPGA as emulation platform [2]. For an FPGA system, the most relevant fault model is a single-event upset (SEU) due to its dependence on SRAM technology. Since in such a design the configuration memory bits significantly outnumber the user memory bits, SEU-emulation in user memories may not be critical for reliability evaluation of FPGA designs [3]. However, a fault-injection methodology may support it for completeness. In contrast, when the FPGA is used as substitute hardware for fault emulation in ASICs, only SEU injection in user memory elements is of interest; in particular, in flip-flops (FFs) due to their dynamic per-clock cycle nature. Similarly, single-event transient (SET) effects can be indirectly emulated as SEUs in an SRAM-FPGA in the case of substitute use, since SETs become observable only when registered by memory elements. Therefore, SEU-emulation in user flip-flops is vital for reliability evaluation of ASICs.

There are two main approaches for SEU-emulation in flip-flops on an FPGA, i.e., instrumentation based and reconfiguration based. Instrumentation-based approaches add extra logic for run-time bitflip insertion through the logical ports of the flip-flops. These modifications of the circuits can be done at different levels of abstraction, i.e., HDL, netlist, mapping, or place and route. These approaches are significantly fast, but they have to instrument each design flip-flop, resulting in huge resource overhead. Reconfiguration-based approaches utilize the configuration memory for run-time SEU injection using a methodology called readback capture and restore [4]. The capture operation updates the shadow configuration cells (i.e., captured cells) with instantaneous flip-flop values while the clock is stopped. Then, a readback procedure accesses the corresponding configuration memory and flips the cell state on which the error is to be injected. This is followed by a configuration memory write-back and a restore operation. The restore operation reinitializes the flip-flops with the modified values. The clock is then allowed to resume after this operation is completed. This methodology has been widely used for flip-flop fault injection by the well-known FT-UNSHADES framework [5]. The platform supports fault injection in configuration memory as well as in flip-flop contents. However, injection in flip-flops is based on total reconfiguration, which is a time-consuming process. Furthermore, FT-UNSHADES is based on a custom printed circuit board hosting multiple FPGAs for controlling the whole injection process. A recent work in [3] combines both instrumentation and reconfiguration-based approaches to achieve a tradeoff between the area and the reconfiguration delay. This is based on the utilization of the local reset lines of flip-flops in addition to using the configuration memory control bits that define the functionality of the local reset lines. Although the proposed method is an improvement compared to the previous ones,
it limits the utilization of the FPGA slice resources to one flip-flop out of the available “n” flip-flops (this value depends on the FPGA family: for Virtex-5 it was 4; for 7-series FPGAs, it is 8) due to architectural limitations. This limitation is due to the fact that the slice’s flip-flops have shared control lines and they cannot be triggered independently. Moreover, their approach is based on reverse engineering to locate the part of the configuration memory which controls the behavior of local reset lines of the flip-flop. This limits the applicability of the method across other device families.

This paper presents an efficient partial-reconfiguration-based SEU injection approach in user flip-flops without inserting any extra hardware, while achieving a significant increase in fault-injection speeds over existing reconfiguration-based approaches. The presented methodology is based on single-frame fault injection at a selected fault site (flip-flop) with the assertion of the chip-wide global reset line in a controlled manner to avoid corruption of other state elements of the design. Since the presented methodology utilizes the standard Xilinx partial reconfiguration (PR) flow and does not require any reverse engineering, it is generic and can be applied across several Xilinx FPGA families. Moreover, because of the usage of standard design components, the overall methodology is quite simple to implement. Therefore, the design and the subsequent nonrecurring engineering efforts are minimized.

The rest of this paper is organized as follows. Section II presents the state-of-the-art SEU fault-injection platforms supporting flip-flops. Section III outlines the presented approach. Section IV presents the developed test vehicle and Section V shows the experimental results, while Section VI provides the conclusion of this paper.

II. RELATED WORK

SRAM-based FPGAs have been used for SEU-emulation in user design flip-flops by many researchers over the last two decades. The existing techniques can be broadly classified into three categories: instrumentation-based approaches, reconfiguration-based approaches, and combined approaches.

A. Instrumentation-Based Approaches

The instrumentation-based approaches insert extra hardware in the design’s flip-flops providing anchor points to the logical layer for run-time injection at the design’s speed. Some of the early works that use scan-chain-based methodologies for transient fault injection in flip-flop contents were reported in [6] and [7]. A complete solution based on a variation of scan-chain-based instrumentation was proposed by López-Ongil et al. [8] to increase the fault-injection speed and to minimize the error logging and the communication time with the PC. A fault injection and fault masking analysis platform is presented by Naviner et al. [9]. It is based on saboteur insertion for supporting fault injection in any node of interest. It can support multiple-fault models and is highly parameterizable. However, this flexibility and completeness is achieved with very high resource overhead. Similarly, the Netlist Fault Injection tool [10] modifies the FPGA primitive’s library for instrumentation of flip-flops and automates the mapping process from HDL level, thereby incurring slightly less overhead compared to the previous approaches. The downside of instrumentation-based approaches is that they have huge resource overheads. Moreover, the modifications often have to be applied at a postsynthesis level, imposing the need to develop ad hoc methodologies and tools for these tasks, thereby increasing design effort.

B. Reconfiguration-Based Approaches

Reconfiguration-based approaches modify the corresponding captured cells in the configuration memory related to the flip-flops for fault injection. This process requires the capturing of the current state of the flip-flops in the captured cells of the configuration memory, reading back the bitstream, modifying it and writing it back to the captured cells for fault injection. Swift et al. [11] show the partial reconfiguration-based injection in input–output configuration bits, the adjacent bits and the bits used for their registers’ settings (i.e., set and reset) of an input–output block. Through single or multiple-frame partial or full reconfiguration, the same fault-injection technique can be extended to inject faults in all the user accessible resources in the FPGA, such as the FPGA configuration bits as well as the FFs. However, the methodology does not target the FFs but only their configuration settings that affect their behavior. Antoni et al. [12] utilize a reconfiguration-based fault injection through the JBits API in flip-flops. In order to not affect other flip-flops, a complicated mechanism is adopted to avoid unintended state corruption. Moreover, the JBits API is obsolete and not applicable to state-of-the-art FPGAs. The reported fault-injection times are 3.5 s per injection. Similarly, FADES [13] is another platform based on JBits and reports 3.3 s per fault injection. FT-UNSHADES [14]–[16] is a well-known platform for flip-flop fault injection based upon the Xilinx capture and restore approach. This platform utilizes full-bitstream-based reconfiguration for flip-flop injection through the JTAG interface. For fault injection in flip-flops, the platform reads back the bitstream and the current state of the flip-flops from the FPGA. Then, it iterates over the bitstream and readback file and updates the INIT values of the flip-flops while the intended flip-flop state is inverted. This merging is a serial process and is off-chip; therefore, it takes a long time compared to on-chip PR-based approaches. Although an improved version in the form of FT-UNSHADES2 [5] improves upon the previous versions to include various features, the platform is still using full-bitstream-based fault injection for flip-flops. The drawback of full-reconfiguration-based approaches is that they have significant overhead with respect to the fault-injection time compared to the instrumentation-based approaches.

C. Combined Approaches and Limitations

Both instrumentation and reconfiguration approaches are limited due to their resource and reconfiguration overheads. Researchers have developed hybrid approaches to achieve an optimal solution. López-Ongil et al. [17] presented a unified fault-injection platform achieving results with injection in
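The capture, readback, modify, write-back and restore sequence described in Section II-B can be illustrated with a toy software model. The sketch below is only an illustration under stated assumptions: `ToyFabric` and `inject_seu` are hypothetical names, the captured cells are modeled as a plain Python list rather than as configuration frames, and no real Xilinx interface is involved.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFabric:
    """Hypothetical model: live flip-flop values plus one captured cell each."""
    flops: list[int]
    captured: list[int] = field(default_factory=list)
    clock_running: bool = True

    def capture(self) -> None:
        # Capture step: copy the live flip-flop values into the captured cells.
        self.captured = list(self.flops)

    def readback(self) -> list[int]:
        # Readback step: return a copy of the captured cells.
        return list(self.captured)

    def write_back(self, cells: list[int]) -> None:
        # Write-back step: overwrite the captured cells with modified values.
        self.captured = list(cells)

    def restore(self) -> None:
        # Restore step: reload every flip-flop from its captured cell.
        self.flops = list(self.captured)


def inject_seu(fabric: ToyFabric, target: int) -> None:
    """Flip one flip-flop through the captured cells while the clock is stopped."""
    fabric.clock_running = False      # stop the clock before capturing
    fabric.capture()
    cells = fabric.readback()
    cells[target] ^= 1                # invert the bit of the selected fault site
    fabric.write_back(cells)
    fabric.restore()
    fabric.clock_running = True       # resume the clock after the restore


fabric = ToyFabric(flops=[1, 0, 1, 1])
inject_seu(fabric, target=2)
print(fabric.flops)
```

In this model all flip-flops are restored, but only the targeted one changes value, because every other captured cell still holds the value that was captured a moment earlier.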
ULLAH et al.: EFFICIENT METHODOLOGY FOR ON-CHIP SEU INJECTION IN FLIP-FLOPS 991
it can be noted that the other input to the multiplexer “M1” is from the configurable logic block (CLB) logic (e.g., LUTs). In the normal operation of the device, the multiplexer always forwards the CLB outputs to the flip-flop. Although GSR and SR can both be used for transferring the INIT storage value into the flip-flop, the utilization of the local SR signal needs instrumentation methods that would add resource overhead for fault injection, as was done in [3]. It can be noted that the global write enable (GWE) signal enables the usage of the flip-flop for writing purposes and acts in a similar fashion to the clock enable (CE) signal local to the slice. The GWE signal disables writing to memory elements during the bitstream download process until the configuration memory is initialized. The assertion of these global signals has to be done carefully to avoid undesired effects on the design. The asynchronous nature of these signals along with signal propagation delays across the chip means that timing violations can be caused if the clocking constraints are not met. Correct timing can be ensured by the reconfiguration controller in the user design. For example, when a microprocessor is used for controlling the internal configuration access port (ICAP), a few no-operation instructions can ensure timing, although in most cases the software processing delay is enough. In addition to these global signals, there are other important signals that avoid data contention during full reconfiguration or PR, as mentioned in [20]. After the flip-flop is reinitialized from the captured cell, the propagation and analysis of the injected fault can be resumed by activating the clock again.

The global signals GCAP and GSR, necessary for state saving and restoring in the proposed methodology, can cause state corruption of other memory elements. Therefore, it is imperative to protect other memory elements from unintentional state corruption. To achieve such a selective injection, the FPGA architecture should be considered. Fig. 1 shows that memory elements are grouped into clock regions for local control over clock and reset circuitry and ease of clock distribution. Each clock region can be separately controlled with tristate buffers to enable or disable the GSR and GCAP signals. This capability allows the masking of the GSR and GCAP effects for the static region and their unmasking for the DUT region. This requires knowledge of certain configuration memory frames and individual control over particular bits that enable GSR and GCAP to a region. These configuration frames belong to a special configuration memory space that is responsible for controlling the clock and reset circuitry to the FPGA clock regions. Since this depends on the FPGA architecture, only vendor design automation tools can generate such configuration as part of the bitstream. Such capability is not only important for fault injection but also for dynamic PR for state initialization [21]. These architectural features can be leveraged for run-time fault injection in flip-flops given that the DUT region conforms to PR rules, particularly at the floor planning and the bitstream generation phases. The safe reinitialization of flip-flops in a partially reconfigurable region is ensured when the user applies the reset after reconfiguration (RAR) constraint [21]. These constraints embed certain configuration frames specific to the floor-planned PR-region in the partial bitstream. Although these configuration frames increase the size of the partial bitstream, they are vital for proper reinitialization of the PR-region after the partial bitstream is delivered to the FPGA. The purpose of this feature is to ensure that the partially reconfigurable design’s flip-flops are initialized to the user-defined RT-level description. In our case, we utilize this configuration frame to protect the static region from undesired state corruption during the capture and restore operations when the GSR is asserted. Furthermore, we utilize the partial bitstream once for unmasking the PR-region and masking the rest of the FPGA. The described fault-injection technique uses single-frame modifications, thereby achieving significantly faster injection times. Section III-B illustrates the implementation flow for achieving fault injection in flip-flops through PR.

B. Design Flow and Fault Site List Generation

The design and fault list generation flow for Xilinx FPGAs is shown in Fig. 2. The flow starts with the description of static and reconfigurable regions. The static region contains the portion of the design that is fixed during run time while the reconfigurable region undergoes changes. The static region consists of a microprocessor and its peripherals connected through a standard bus protocol. The static region is described with block design in the Xilinx flow while the reconfigurable region is defined through HDL for description of the DUT. A standard PR-flow is followed for floor planning the design and generating the full and partial bitstreams. It is worth mentioning that the partial bitstreams are generated with settings for RAR and the generation of the logic allocation file. The RAR partial bitstream is essential for masking the static region and unmasking the reconfigurable region and is downloaded once for these configuration settings. It remains active during the rest of the fault-injection operation. The logic allocation file contains the design flip-flop information only related to the reconfigurable region housing the DUT. The logic allocation file uniquely identifies every memory element, for example, flip-flops, in the configuration memory space. The information is vital for SEU injection in a specific flip-flop as it contains the exact bit address of the corresponding capture cell. All the generated files are fed to the developed fault site list generator block, which is responsible for generating a fault dictionary to be used during SEU injection at run time. The fault dictionary represents a mapping of the flip-flop locations of the mapped design on the FPGA to the exact location in the configuration memory space. The format is designed to uniquely identify the flip-flop position in terms of its configuration frame and corresponding capture cell bit offset. This fault dictionary is converted into a binary format to be stored in off-chip nonvolatile memories (DRAM or SD card) to be later used for fault injection.

C. SEU Injection Algorithm

The fault-injection flow is presented in Algorithm 1 to be used with an on-chip processor on Xilinx FPGAs. The algorithm starts by loading the RAR partial bitstream from the external nonvolatile memory into the configuration memory. This is an essential one-time step for masking the static region and unmasking the reconfigurable region. The next step is to load the fault dictionary from the external memory into
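The fault dictionary of Section III-B, which maps each flip-flop to a configuration frame and a capture cell bit offset, can be sketched as a small binary record format. This is a minimal illustration, not the format used in the paper. The record layout (32-bit frame address, 16-bit bit offset), the frame length of 101 32-bit words, the bit ordering inside a word, and the names `FaultSite`, `pack_dictionary`, `unpack_dictionary` and `flip_bit` are all assumptions made for the example.

```python
import struct
from typing import NamedTuple

# Assumed record layout: little-endian 32-bit frame address, 16-bit bit offset.
RECORD = struct.Struct("<IH")
WORDS_PER_FRAME = 101  # assumed frame length in 32-bit words


class FaultSite(NamedTuple):
    name: str            # hypothetical hierarchical flip-flop name
    frame_address: int   # configuration frame holding the capture cell
    bit_offset: int      # position of the capture cell inside that frame


def pack_dictionary(sites: list[FaultSite]) -> bytes:
    """Serialize the fault sites into fixed-size binary records."""
    return b"".join(RECORD.pack(s.frame_address, s.bit_offset) for s in sites)


def unpack_dictionary(blob: bytes) -> list[tuple[int, int]]:
    """Recover (frame_address, bit_offset) pairs from the binary blob."""
    return [RECORD.unpack_from(blob, i) for i in range(0, len(blob), RECORD.size)]


def flip_bit(frame_words: list[int], bit_offset: int) -> list[int]:
    """Return a copy of one frame with the selected capture cell inverted."""
    words = list(frame_words)
    words[bit_offset // 32] ^= 1 << (bit_offset % 32)  # assumed bit ordering
    return words


sites = [
    FaultSite("dut/count_reg[0]", 0x00400A9F, 1234),
    FaultSite("dut/count_reg[1]", 0x00400A9F, 1266),
]
blob = pack_dictionary(sites)
frame_address, bit_offset = unpack_dictionary(blob)[0]
modified = flip_bit([0] * WORDS_PER_FRAME, bit_offset)
print(hex(frame_address), bit_offset, hex(modified[bit_offset // 32]))
```

Fixed-size records keep the lookup on an embedded processor trivial: the entry for fault site k starts at byte offset k times the record size, so no parsing is needed at run time.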
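The per-region masking of the global signals described in Section III-A can also be illustrated with a toy model. The sketch below assumes a simplified view in which each region carries a single enable flag for GSR. The names `Region` and `assert_gsr` are hypothetical, and the real mechanism is set through vendor-generated configuration frames rather than a software flag.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical clock-region model with a GSR enable flag."""
    name: str
    flops: list[int]   # live flip-flop values
    init: list[int]    # values that a GSR assertion would load
    gsr_enabled: bool  # stands in for the per-region mask setting


def assert_gsr(regions: list[Region]) -> None:
    """Reinitialize flip-flops only in regions where GSR is unmasked."""
    for region in regions:
        if region.gsr_enabled:
            region.flops = list(region.init)


static = Region("static", flops=[1, 1, 0], init=[0, 0, 0], gsr_enabled=False)
dut = Region("dut", flops=[1, 0, 1], init=[1, 1, 1], gsr_enabled=True)
assert_gsr([static, dut])
print(static.flops, dut.flops)
```

The masked static region keeps its running state, while the unmasked DUT region takes the values prepared for it, which is the selective behavior the methodology relies on.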