
IEEE TRANSACTIONS ON NUCLEAR SCIENCE, VOL. 65, NO. 4, APRIL 2018 989

An Efficient Methodology for On-Chip SEU Injection in Flip-Flops for Xilinx FPGAs
Anees Ullah, Pedro Reviriego, and Juan Antonio Maestro

Abstract— Field-programmable gate array (FPGA)-based single-event upset (SEU) emulation in user flip-flops is essential for reliability evaluation of mapped designs. Previous approaches to inject SEUs in user flip-flops utilized the configuration memory bits that control the set and reset settings of the flip-flops. In contrast, this paper presents a novel approach for SEU emulation in user flip-flop contents through single-frame on-chip partial reconfiguration (PR). The presented methodology exploits the inherent architectural features of the latest Xilinx FPGAs to support state initialization of flip-flops during PR. The proposed approach does not require instrumentation overhead for flip-flops and reduces the fault-injection times by orders of magnitude.

Index Terms— Fault injection, radiation effects, reliability evaluations.

Manuscript received September 26, 2017; revised December 6, 2017 and January 29, 2018; accepted February 26, 2018. Date of publication March 6, 2018; date of current version April 12, 2018. This work was supported by the Spanish Ministry of Economy and Competitiveness under Grant ESP2014-54505-C2-1-R.
A. Ullah is with the Center for Advanced Studies in Engineering, Islamabad 44000, Pakistan, and also with the ARIES Research Center, Escuela Politécnica Superior, Universidad Antonio de Nebrija, 28015 Madrid, Spain (e-mail: anees.ullah@case.edu.pk; aullah@nebrija.es).
P. Reviriego and J. A. Maestro are with the ARIES Research Center, Escuela Politécnica Superior, Universidad Antonio de Nebrija, 28015 Madrid, Spain (e-mail: previrie@nebrija.es; jmaestro@nebrija.es).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TNS.2018.2812719
0018-9499 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

I. INTRODUCTION

FAULT injection is an important tool for low-cost reliability evaluation of electronic circuits against radiation-induced soft errors. There are two approaches to injecting faults and evaluating their effects: through simulation or emulation. Fault injection through simulation utilizes the traditional circuit modeling and simulation tools (with their corresponding technology libraries). This minimizes the experimental setup times and provides higher observability on a per-cycle basis, but suffers from very slow processing times. On the other hand, fault injection by emulation utilizes a hardware platform for this purpose and provides much faster injection and evaluation times, but limited observability [1].

Field-programmable gate arrays (FPGAs) provide an ideal fault-emulation platform due to their higher logic densities and reconfigurability. The design mapped to an FPGA could represent the end product on a deployed FPGA or it may be an ASIC design utilizing the FPGA as emulation platform [2]. For an FPGA system, the most relevant fault model is a single-event upset (SEU) due to its dependence on SRAM technology. Since in such a design the configuration memory bits significantly outnumber the user memory bits, SEU emulation in user memories may not be critical for reliability evaluation of FPGA designs [3]. However, a fault-injection methodology may support it for completeness. In contrast, when the FPGA is used as substitute hardware for fault emulation in ASICs, only SEU injection in user memory elements is of interest; in particular, in FFs due to their dynamic per-clock cycle nature. Similarly, single-event transient (SET) effects can be indirectly emulated as SEUs in an SRAM-FPGA in the case of substitute use, since SETs become observable only when registered by memory elements. Therefore, SEU emulation in user flip-flops is vital for reliability evaluation of ASICs.

There are two main approaches for SEU emulation in flip-flops on an FPGA, i.e., instrumentation based and reconfiguration based. Instrumentation-based approaches add extra logic for run-time bit-flip insertion through the flip-flops' logical ports. These modifications of the circuits can be done at different levels of abstraction, i.e., HDL, netlist, mapping, or place and route. These approaches are very fast, but they have to instrument each design flip-flop, resulting in huge resource overhead. Reconfiguration-based approaches utilize the configuration memory for run-time SEU injection using a methodology called readback capture and restore [4]. The capture operation updates the shadow configuration cells (i.e., captured cells) with instantaneous flip-flop values while the clock is stopped. Then, a readback procedure accesses the corresponding configuration memory and flips the cell state on which the error is to be injected. This is followed by a configuration memory write-back and a restore operation. The restore operation reinitializes the flip-flops with the modified values. The clock is then allowed to resume after this operation is completed. This methodology has been widely used for flip-flop fault injection by the well-known FT-UNSHADES framework [5]. The platform supports fault injection in configuration memory as well as in flip-flop contents. However, injection in flip-flops is based on total reconfiguration, which is a time-consuming process. Furthermore, FT-UNSHADES is based on a custom printed circuit board hosting multiple FPGAs for controlling the whole injection process. A recent work in [3] combines both instrumentation and reconfiguration-based approaches to achieve a tradeoff between the area and the reconfiguration delay. This is based on the utilization of the local reset lines of flip-flops in addition to using the configuration memory control bits that define the functionality of the local reset lines. Although the proposed method is an improvement compared to the previous ones,
it limits the utilization of the FPGA slice resources to one flip-flop out of the available “n” flip-flops (this value depends on the FPGA family: for Virtex-5 it was 4; for 7-series FPGAs, it is 8) due to architectural limitations. This limitation is due to the fact that the slice’s flip-flops have shared control lines and they cannot be triggered independently. Moreover, their approach is based on reverse engineering to locate the part of the configuration memory which controls the behavior of local reset lines of the flip-flop. This limits the applicability of the method across other device families.

This paper presents an efficient partial-reconfiguration-based SEU injection approach in user flip-flops without inserting any extra hardware, while achieving a significant increase in fault-injection speeds over existing reconfiguration-based approaches. The presented methodology is based on single-frame fault injection at a selected fault site (flip-flop) with the assertion of the chip-wide global reset line in a controlled manner to avoid corruption of other state elements of the design. Since the presented methodology utilizes the standard Xilinx partial reconfiguration (PR) flow and does not require any reverse engineering, it is generic and can be applied across several Xilinx FPGA families. Moreover, because of the usage of standard design components, the overall methodology is quite simple to implement. Therefore, the design and the subsequent nonrecurring engineering efforts are minimized.

The rest of this paper is organized as follows. Section II presents the state-of-the-art SEU fault-injection platforms supporting flip-flops. Section III outlines the presented approach. Section IV presents the developed test vehicle and Section V shows the experimental results, while Section VI provides the conclusion of this paper.

II. RELATED WORK

SRAM-based FPGAs have been used for SEU emulation in user design flip-flops by many researchers over the last two decades. The existing techniques can be broadly classified into three categories: instrumentation-based approaches, reconfiguration-based approaches, and combined approaches.

A. Instrumentation-Based Approaches

The instrumentation-based approaches insert extra hardware in the design’s flip-flops providing anchor points to the logical layer for run-time injection at the design’s speed. Some of the early works that use scan-chain-based methodologies for transient fault injection in flip-flop contents were reported in [6] and [7]. A complete solution based on a variation of scan-chain-based instrumentation was proposed by López-Ongil et al. [8] to increase the fault-injection speed and to minimize the error logging and the communication time with the PC. A fault injection and fault masking analysis platform is presented by Naviner et al. [9]. It is based on saboteur insertion for supporting fault injection in any node of interest. It can support multiple-fault models and is highly parameterizable. However, this flexibility and completeness is achieved with very high resource overhead. Similarly, the Netlist Fault Injection tool [10] modifies the FPGA primitive’s library for instrumentation of flip-flops and automates the mapping process from HDL level, thereby incurring slightly less overhead compared to the previous approaches. The downside of instrumentation-based approaches is that they have huge resource overheads. Moreover, the modifications often have to be applied at a postsynthesis level, imposing the need to develop ad hoc methodologies and tools for these tasks, thereby increasing design effort.

B. Reconfiguration-Based Approaches

Reconfiguration-based approaches modify the corresponding captured cells in the configuration memory related to the flip-flops for fault injection. This process requires capturing the current state of the flip-flops in the captured cells of the configuration memory, reading back the bitstream, modifying it, and writing it back to the captured cells for fault injection. Swift et al. [11] show partial-reconfiguration-based injection in input–output configuration bits, the adjacent bits, and the bits used for their registers’ settings (i.e., set and reset) of an input–output block. Through single- or multiple-frame partial or full reconfiguration, the same fault-injection technique can be extended to inject faults in all the user-accessible resources in the FPGA, such as the FPGA configuration bits as well as the FFs. However, the methodology does not target the FFs but only the configuration settings that affect their behavior. Antoni et al. [12] utilize a reconfiguration-based fault injection through the JBits API in flip-flops. In order to not affect other flip-flops, a complicated mechanism is adopted to avoid unintended state corruption. Moreover, the JBits API is obsolete and not applicable to state-of-the-art FPGAs. The reported fault-injection times are 3.5 s per injection. Similarly, FADES [13] is another platform based on JBits and reports 3.3 s per fault injection. FT-UNSHADES [14]–[16] is a well-known platform for flip-flop fault injection based upon the Xilinx capture and restore approach. This platform utilizes full-bitstream-based reconfiguration for flip-flop injection through the JTAG interface. For fault injection in flip-flops, the platform reads back the bitstream and the current state of the flip-flops from the FPGA. Then, it iterates over the bitstream and readback file and updates the INIT values of the flip-flops while the intended flip-flop state is inverted. This merging is a serial process and is off-chip; therefore, it takes a long time compared to on-chip PR-based approaches. Although an improved version in the form of FT-UNSHADES2 [5] improves upon the previous versions to include various features, the platform still uses full-bitstream-based fault injection for flip-flops. The drawback of full-reconfiguration-based approaches is that they have significant overhead with respect to the fault-injection time compared to the instrumentation-based approaches.

C. Combined Approaches and Limitations

Both instrumentation and reconfiguration approaches are limited due to their resource and reconfiguration overheads. Researchers have developed hybrid approaches to achieve an optimal solution. López-Ongil et al. [17] presented a unified fault-injection platform achieving results with injection in
configuration memory through reconfiguration while in the user flip-flops through instrumentation. Similarly, a recent work by Serrano et al. [3] presented an interesting approach utilizing both instrumentation and partial reconfiguration for SEU injection in flip-flops. They achieve flip-flop content inversion through local reset lines. The inversion is achieved by modifying some configuration bits that control the behavior of the local reset line for each flip-flop. The position of these bits in configuration memory is obtained through reverse engineering of portions of the bitstream. Moreover, it has a resource overhead of three look-up tables (LUTs) per flip-flop that needs to be controlled. In contrast to [3], this paper uses a global reset line for state inversion of the flip-flops. A nonintrusive PR-based methodology is used, which results in very fast fault-injection times and does not require any modifications to the standard FPGA design flow. Similarly, the methodology used in [5] cannot be applied to on-chip injectors. This is because the total reconfiguration may result in state corruption of the injector control circuitry and other memory elements. Section III presents the proposed methodology.

III. PROPOSED METHODOLOGY

The proposed methodology supports SEU injection in the design’s flip-flops mapped on Xilinx SRAM-based FPGAs. In particular, the fault injection is achieved through PR. A read–modify–write (RMW) operation is performed on a single configuration frame followed by asserting the chip-wide global signals. This proposed methodology protects against unintentional state corruption resulting from the activation of the global reset signal, hence making it feasible to be used with on-chip fault injection in flip-flops. The proposed methodology achieves fast injection times without incurring any hardware overhead for FF instrumentation. Moreover, the methodology does not require any reverse engineering and is based on the standard PR flow. Therefore, it is applicable across different Xilinx FPGA families. Section III-A explains the details of our proposal.

A. SEU Injection in Flip-Flops Through Xilinx PR

This section outlines the architectural features of the Xilinx SRAM-based FPGAs which are essential for fault injection in the flip-flop contents through the proposed methodology. Sequential logic elements, for example flip-flops, need to be initialized to predefined user values for correct execution of circuits mapped onto SRAM-based FPGAs. The RT-level user-defined initialization of memory elements is embedded inside the generated configuration bitstream. During the initial bitstream download process, captured cells are initialized with these values. As this process is done before the FPGA enters the user mode of operation, the global asynchronous signals which are vital for transferring captured cell values to the flip-flops do not cause any synchronization issues. However, to achieve fault injection while the design is in operation, changing the flip-flop contents through the captured cells needs particular considerations. This is due to the asynchronous nature of the global signals along with the fact that they are connected to every memory element in the FPGA. Therefore, it could result in unintentional state corruption of the design.

Fig. 1. SEU Injection in flip-flops using shadow registers.

Fig. 1 shows the approach for SEU injection in flip-flops, while the design is active, utilizing PR. Fig. 1 consists of configuration control logic, clock regions, and a magnified view of a flip-flop inside an FPGA. Fig. 1 shows a functional model of the flip-flop’s capture and restore circuitry along with the masking logic that locally controls the behavior of the global signals and is extracted from [18] and [19]. The flip-flop is augmented with circuitry to control its initialization and state capture from the shadow storage (INIT) memory element. The INIT memory element is usually implemented with reduced functionalities as opposed to the actual flip-flop to save silicon area. The flip-flop itself is implemented as a master–slave configuration and can be utilized as a latch. While the design is in operational mode, the current state of a flip-flop can be captured in the INIT storage by the assertion of the Global CAPture (GCAP) signal. It can be noted that the INIT storage can be initialized from the bitstream or from the current state of the flip-flop through the GCAP signal, which controls a multiplexer “M2.” The GCAP is a global signal that connects to every memory element of the FPGA. Before the capture process is initiated, the clock to the partially reconfigurable design [which represents a design under test (DUT)] has to be stopped to freeze the state of flip-flops. The captured values (now stored in the INIT captured cells) can then be read back from the configuration memory to invert the contents of a flip-flop and written back to the configuration memory. This is followed by asserting the global set reset (GSR) [4] signal to transfer capture cell values to the actual flip-flops. Fig. 1 also shows that the shadow storage output is connected to the flip-flop through a multiplexer “M1” controlled by the logical OR of GSR and the local set reset (SR) line. This means that the GSR or SR signal can be used to allow the INIT storage output to affect the flip-flop contents. The FPGA internal architecture is designed in a way that the SR line is shared by all the flip-flops inside a slice, keeping in view that the neighboring flip-flops will be fed by the same clocking and reset signals.
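The capture and restore path described above can be illustrated with a small behavioral model. The sketch below is a simplified Python illustration, not the Xilinx circuit itself: the class `FlipFlop`, its `masked` flag, and the helper `inject_seu` are hypothetical names introduced here, the local SR input of M1 is omitted, and each global signal is modeled as an instantaneous call while the DUT clock is assumed to be stopped.

```python
from dataclasses import dataclass


@dataclass
class FlipFlop:
    """Toy model of a user flip-flop with its INIT shadow cell (hypothetical)."""
    q: int = 0            # live flip-flop state
    init: int = 0         # INIT (captured) cell in configuration memory
    masked: bool = False  # True: the region mask blocks GCAP and GSR

    def gcap(self) -> None:
        # M2 path: copy the live state into INIT, unless the region is masked.
        if not self.masked:
            self.init = self.q

    def gsr(self) -> None:
        # M1 path: reload the flip-flop from INIT, unless the region is masked.
        if not self.masked:
            self.q = self.init


def inject_seu(flops, target):
    """Capture all states, flip one captured bit, then restore through GSR."""
    for ff in flops:
        ff.gcap()
    flops[target].init ^= 1  # read-modify-write of a single captured bit
    for ff in flops:
        ff.gsr()


# DUT region (unmasked, clock stopped) and one static flip-flop (masked).
dut = [FlipFlop(q=1), FlipFlop(q=0), FlipFlop(q=1)]
static_ff = FlipFlop(q=1, init=0, masked=True)

inject_seu(dut + [static_ff], target=1)
dut_state = [ff.q for ff in dut]

# A flip-flop that is neither captured nor masked is overwritten by GSR.
unprotected = FlipFlop(q=1, init=0)
unprotected.gsr()

print(dut_state, static_ff.q, unprotected.q)
```

In this model only the targeted DUT flip-flop changes state. The masked flip-flop keeps its value although its INIT cell holds a different one, whereas the unprotected flip-flop is overwritten with its stale INIT value, which is the kind of corruption that masking the global signals is meant to prevent.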

It can also be noted that the other input to the multiplexer “M1” is from the configurable logic block (CLB) logic (e.g., LUTs). In the normal operation of the device, the multiplexer always forwards the CLB outputs to the flip-flop. Although GSR and SR can both be used for transferring the INIT storage value into the flip-flop, the utilization of the local SR signal needs instrumentation methods that would add resource overhead for fault injection, as was done in [3]. It can be noted that the global write enable (GWE) signal enables the usage of the flip-flop for writing purposes and acts in a similar fashion to the clock enable (CE) signal local to the slice. The GWE signal disables writing to memory elements during the bitstream download process until the configuration memory is initialized. The assertion of these global signals has to be done carefully to avoid undesired effects on the design. The asynchronous nature of these signals along with signal propagation delays across the chip means that timing violations can be caused if the clocking constraints are not met. Meeting these constraints can be ensured by the reconfiguration controller in the user design. For example, when a microprocessor is used for controlling the internal configuration access port (ICAP), a few no-operation instructions can ensure timing, although in most cases the software processing delay is enough. In addition to these global signals, there are other important signals that avoid data contention during full reconfiguration or PR, as mentioned in [20]. After the flip-flop is reinitialized from the captured cell, the propagation and analysis of the injected fault can be resumed by activating the clock again.

The global signals GCAP and GSR, necessary for state saving and restoring in the proposed methodology, can cause state corruption of other memory elements. Therefore, it is imperative to protect other memory elements from unintentional state corruption. To achieve such a selective injection, the FPGA architecture should be considered. Fig. 1 shows that memory elements are grouped into clock regions for local control over clock and reset circuitry and ease of clock distribution. Each clock region can be separately controlled with tristate buffers to enable or disable the GSR and the GCAP signals. This capability allows the masking of the GSR and GCAP effects for the static region and their unmasking for the DUT region. This requires knowledge of certain configuration memory frames and individual control over particular bits that enable GSR and GCAP to a region. These configuration frames belong to a special configuration memory space that is responsible for controlling the clock and reset circuitry to the FPGA clock regions. Since this depends on the FPGA architecture, only vendor design automation tools can generate such configuration as part of the bitstream. Such capability is not only important for fault injection but also for dynamic PR for state initialization [21]. These architectural features can be leveraged for run-time fault injection in flip-flops given that the DUT region conforms to PR rules, particularly at the floor planning and the bitstream generation phases. The safe reinitialization of flip-flops in a partially reconfigurable region is ensured when the user applies the reset after reconfiguration (RAR) constraint [21]. These constraints embed certain configuration frames specific to the floor-planned PR-region in the partial bitstream. Although these configuration frames increase the size of the partial bitstream, they are vital for proper reinitialization of the PR-region after the partial bitstream is delivered to the FPGA. The purpose of this feature is to ensure that the partially reconfigurable design’s flip-flops are initialized according to the user-defined RT-level description. In our case, we utilize this configuration frame to protect the static region from undesired state corruption during the capture and restore operations when the GSR is asserted. Furthermore, we utilize the partial bitstream once for unmasking the PR-region and masking the rest of the FPGA. The described fault-injection technique uses single-frame modifications, hence achieving very fast injection times. Section III-B illustrates the implementation flow for achieving fault injection in flip-flops through PR.

B. Design Flow and Fault Site List Generation

The design and fault list generation flow for Xilinx FPGAs is shown in Fig. 2. The flow starts with the description of static and reconfigurable regions. The static region contains the portion of the design that is fixed during run time while the reconfigurable region undergoes changes. The static region consists of a microprocessor and its peripherals connected through a standard bus protocol. The static region is described with a block design in the Xilinx flow while the reconfigurable region is defined through HDL for description of the DUT. A standard PR-flow is followed for floor planning the design and generating the full and partial bitstreams. It is worth mentioning that the partial bitstreams are generated with settings for RAR and the generation of the logic allocation file. The RAR partial bitstream is essential for masking the static region and unmasking the reconfigurable region and is downloaded once for these configuration settings. It remains active during the rest of the fault-injection operation. The logic allocation file contains the design flip-flop information only related to the reconfigurable region housing the DUT. The logic allocation file uniquely identifies every memory element, for example, flip-flops, in the configuration memory space. The information is vital for SEU injection in a specific flip-flop as it contains the exact bit address of the corresponding capture cell. All the generated files are fed to the developed fault site list generator block which is responsible for generating a fault dictionary to be used during SEU injection at run time. The fault dictionary represents a mapping of the flip-flop locations of the mapped design on the FPGA to the exact location in the configuration memory space. The format is designed to uniquely identify the flip-flop position in terms of its configuration frame and corresponding capture cell bit offset. This fault dictionary is converted into a binary format to be stored in off-chip memory (DRAM or a nonvolatile SD card) to be later used for fault injection.

C. SEU Injection Algorithm

The fault-injection flow is presented in Algorithm 1 to be used with an on-chip processor on Xilinx FPGAs. The algorithm starts by loading the RAR partial bitstream from the external nonvolatile memory into the configuration memory. This is an essential one-time step for masking the static region and unmasking the reconfigurable region. The next step is to load the fault dictionary from the external memory into
the system memory. This is followed by applying a “reset” to the reconfigurable DUT region. Then, the algorithm picks up a fault (frame and bit pair) from the fault dictionary for injection. The algorithm enters the frame formatting phase to convert the frame and bit pair to the frame address register format [4]. The top/bottom field identifies the FPGA half in which the flip-flop resides. The row address identifies the corresponding resource row inside the half. The column address pinpoints the resource column (e.g., CLB and BRAM) while the minor offset locates the exact frame inside the resource column (each resource has a different number of frames; the CLB has 36 minor frames). Last, the bit offset is the position of captured cells (INT0/INT1) related to a flip-flop. These parameters are extracted from the fault dictionary entry with several masking operations (lines 5–12 of the algorithm) carried out according to [4]. This leads the algorithm to the state capture, readback, and modify phase. The algorithm picks up an injection cycle either serially or randomly, resets the benchmark circuit (and starts DUT execution) and waits for the injection cycle to be reached. This is supported by the clock and reset manager in hardware. When the injection cycle is reached, the clock to the benchmark circuit is stopped immediately. The clock and reset manager then interrupts the processor to start the fault-injection procedure using the Xilinx software libraries. The first of the executed commands is the GCAPTURE, which is issued through the ICAP to the configuration memory. As an RAR partial bitstream is already downloaded (line 1) to the FPGA, this command only captures the state of the reconfigurable region as the static region is masked. After the current state of the flip-flops is captured in the configuration memory shadow cells, the Readback() function utilizes the fault dictionary format (lines 5–12) to read back the exact configuration memory frame. Once the frame is in the processor memory space, the corresponding word and bit position from the fault dictionary is used to invert a specific bit. These operations are achieved by using the logic outlined in lines 20–22. The next phase is configuration memory write-back and GSR triggering. The WriteBack() function transfers the modified FF content to the configuration memory. This is followed by a software-level trigger of the GSR port located on the Startup peripheral of the system. This results in the inversion of the selected single flip-flop in the reconfigurable region while avoiding any changes in the static region as it is masked. Note that after the TriggerGSR() function at line 24, there is a software delay introduced to enable the physical signal propagation and avoid timing violations. In this case, the delay involved due to the Advanced eXtensible Interface (AXI) protocol is enough to ensure no timing violation will occur. However, if the proposed methodology is to be used without a processor, a delay of some cycles after the triggering of GSR is necessary. The injection of an SEU is followed by its testing in the final phase. This is accomplished by the application of test patterns from the processor to the DUT. After the completion of the fault insertion, the DUT design is resumed and is concurrently checking its inputs for errors. The software counter is automatically incremented upon the detection of an error. Also, the detected fault site is saved. These steps are outlined in lines 28–32. Note that the algorithm is equipped with additional techniques that allow the count of the elapsed clock cycles between the start and the end of the operations of interest, as shown in line 17 and lines 26 and 27. Finally, once all the fault sites are evaluated, the algorithm reports the final results to the PC using the UART interface.

Fig. 2. Implementation flow for Xilinx FPGAs.

IV. TEST VEHICLE

The experimental results are collected on a Zynq-based system-on-chip (SoC) integrating a dual-core Cortex-A9 processor and an Artix-7 FPGA available on the ZedBoard. It is worth mentioning that any other microprocessor, for example, a Xilinx MicroBlaze, or a custom hardware architectural implementation is also possible, provided that the configuration memory is accessed through the ICAP. The processor in the static region is realized through an ARM core in the processing system (PS) region while the rest of the static region and the reconfigurable region map into the programmable logic (PL). Several AXI peripherals including ICAP, Startup primitive (for control over the GSR line), timer, and UART controller are used from the Xilinx IP catalogs. A custom HDL clock manager module is designed and connected through the AXI bus to control the CE signal to both copies of the benchmark circuit, i.e., the DUT and the GOLD modules. This enables a cycle-by-cycle application of inputs to the benchmarks from the processor. This module also keeps track of the benchmark cycles, enables selection of a specific cycle for SEU injection, and gives feedback to the processor when such a cycle event occurs. Apart from these responsibilities, the clock manager also resets the benchmark circuit when the evaluation of an SEU is completed. This is necessary to avoid unintended SEU accumulation in registers. The comparator module is responsible for detecting any mismatches between the GOLD and the DUT copies of the benchmark circuit while the inputs are being applied from the processor. An error event interrupts the processor when an error is detected. The processor keeps count of the detected faults and resets the benchmark circuit for the next SEU evaluation. Note that it is also possible to duplicate the execution in time without hardware replication. In such a case, two runs will be required to collect the expected and the fault-injection (FI) results, and the comparison can then be done offline.

V. RESULTS AND DISCUSSION

The experimental results are collected using the commonly adopted International Test Conference (ITC) benchmarks and


a few DSP circuit designs. The considered circuits contain feedforward as well as feedback circuits. Also, different architectures of the same benchmark circuit are considered for their reliability evaluation. For example, the FIR filter has been implemented with two architectural choices; the first is a serial design and the second one is a cascaded serial one. Table I shows the resource utilization of the benchmark designs mapped into reconfigurable regions. Note that the resource utilization of the static region is not reported. This is due to the fact that the ARM processor is in the PS region and a few of the required static IPs are in the PL region. Therefore, the static logic in the PS and PL regions remains fixed for all the benchmarks. The size of the comparator required to implement the scheme in Fig. 3 is variable, but it is not reported because it is not a requirement of the methodology and users can run this comparison in software. Table I specifies the resources in terms of LUTs and FFs for the DUT copy of the implemented benchmark circuit. The number of flip-flops is proportional to the size of the fault list that needs to be injected. Furthermore, the size of the partial bitstream with RAR features is also reported. This bitstream needs to be downloaded once before the start of the fault-injection campaign. This is necessary for masking the static region and unmasking the reconfigurable region as discussed in Section III-A. As shown in Table I, and because of the similar floor-planning requirements (the small size of each benchmark circuit and the enabled RAR features), all the ITC benchmarks have an RAR bitstream size of 149 kB.

Algorithm 1 Fault Injection in Flip-Flop
Inputs: Partial Bitstream, Fault Dictionary
Outputs: Log file, Timing measurements
************ Initialization Phase ************
1: LoadPartialBitstream();
2: fd_array[size] = LoadFaultDictionary();
3: ApplyDutReset();
4: foreach fault site “fs” in fd_array do
    // frame formatting
5:     frame = fs.frame;
6:     Bit = fs.bit;
7:     Top = (frame & 0x00400000) >> 22;
8:     Row = (frame & 0x003E0000) >> 17;
9:     Major = (frame & 0x0001FF80) >> 7;
10:    Minor = (frame & 0x0000001F) >> 0;
11:    word = Bit/32;
12:    INITbit = Bit%32;
    // state capture, readback and modify operation
13:    Cycle = PickUpInjectCycle();
14:    EnableClockDUT();
15:    ApplyDutReset();
16:    StopClockDUT(Cycle);
17:    Time1 = StartTimer();
18:    TriggerCapture();
19:    rdframe[] = Readback(Top,Block,Row,Major,Minor);
20:    rdword = rdframe[word];
21:    rdword = (rdword ^ (1 << INITbit));
22:    rdframe[word] = rdword;
    // configuration memory writeback and GSR triggering
23:    WriteBack(Top,Block,Row,Major,Minor);
24:    TriggerGSR();
25:    Wait();
26:    Time2 = StopTimer();
27:    CyclesTaken = Time2 - Time1;
    // testing
28:    while (fault is not detected)
29:        ApplyDutInputs();
30:        if (fault is detected)
31:            detected_faults++;
32:            detectedsites[]=fs;
33: end_foreach
34: Report(detected_faults);
35: Report(detectedsites[]);
36: Report(MeasuredTime);

Fig. 3. Processor-based implementation.

TABLE I
RESOURCE UTILIZATION OF BENCHMARKS

In 7-series FPGAs, RAR features are physically supported by instantiating partition pins in the routing interconnect tile. Since in these FPGA families the routing interconnect tiles are located on the left and the right of the CLB columns, the PR requirements dictate that the minimum size to support RAR features is two CLB columns. This is an essential consideration ensured by vendors’ design tools through PR design rule checks. Therefore, postreconfiguration flip-flop state initialization depends on this kind of physical layout and will not work without it.

Table II presents the fault-injection statistics of the benchmark circuits for different numbers of SEUs per FF, i.e.,


100 SEUs/FF, 400 SEUs/FF, and 700 SEUs/FF. The results appear as a ratio of the number of SEUs that caused failure at the circuit’s primary outputs (i.e., the numerator) over the total number of injected SEUs (i.e., the denominator). The injected SEUs are a subset of the entire space of all the possible SEUs for a benchmark. This space is the product of the size of the inputs, the number of SEU sites (i.e., flip-flops in our case), and the total number of cycles taken by the execution of the benchmark. As the total number of SEUs is the product of three variables, the entire fault space may be quite large and in certain cases infinite, for example, in finite-state machines (FSMs) with a serial infinite data stream (i.e., B01 and B02 in our benchmarks). Therefore, the injected SEUs could be sampled randomly by selecting certain clock cycles for injection or by randomly selecting an input for each clock cycle. It can be noted from Table II that, for different numbers of injected SEUs per FF, the obtained benchmark’s failure rate remains almost the same for 100 SEUs per FF and above. The experimental test results showed that the failure rate is quite sensitive to variations in the input patterns. The FIR filters were tested exhaustively with all possible 16-bit input patterns. The cascaded serial architecture shows more vulnerability to SEUs than the fully serial architecture because of slow fault propagation and reset postinjection. The Viterbi decoder is tested with a serial bitstream of inputs. The results show that it is quite resilient to SEUs in flip-flops due to its internal error correction capabilities.

TABLE II
FAULTS STATISTICS

Since fault injection should realistically mimic the behavior of beam irradiation, it is very important to validate any fault-injection platform. However, separately irradiating flip-flops is not possible for validation, as previously noted in [3]. In addition, in our case we are only presenting a methodology for SEU injection in flip-flops and not a fault-injection platform. Fault-injection platforms are approved when the same test platform shows consistent results for both fault injections and radiation test experiments. Our methodology can be easily integrated into existing approved fault-injection platforms.

TABLE III
CYCLES TAKEN BY INJECTION OPERATIONS

Table III gives the minimum and the maximum number of clock cycles taken by the individual fault-injection operations required to upset a flip-flop. Modifying the configuration memory always requires an RMW cycle [20] on Xilinx FPGAs. In addition, the flip-flop bit inversion requires capture and restore operations. The capture operation is carried out using the Xilinx software routine. However, the restore operation is carried out using the Startup primitive instantiated inside the AXI ICAP peripheral instead of the GRESTORE library command. This is because the restore routines in the Xilinx software libraries were not working. This may be due to limitations of the software library as noted in [22]. The processor controls the GSR input of the startup primitive through an AXI GPIO port. Therefore, the capture operation takes more clock cycles than the restore. A hardware instantiation of both of the primitives and their direct control from the EMIO ports of the PS could improve the cycle count further. The actual fault-injection time depends on several factors. First, the speed of injection depends on the operating frequencies of the processor and the interconnect AXI bus. Second, the frequency with which the ICAP primitive is toggled determines how fast the data can be accessed and retrieved from the configuration memory. Finally, the frame size, which depends on the FPGA family, is also an important variable that may affect the injection speed.

Table IV reports the faults injected per second considering these variables. Table IV compares three methods: the proposed one and two relevant works. Note that the proposed method has been extrapolated with different bus frequencies that are supported with various 7-series FPGAs. These maximum bus frequencies are related to the ICAP bus interface port and not to the ICAP switching clock port, which always remains at 100 MHz. It can be noted that on Kintex-7 the maximum supported AXI frequency of 220 MHz can be achieved. These frequencies are based on Xilinx ICAP documents [23]. The second parameter that needs to be considered when evaluating the fault-injection times is the frame size. For example, in the case of Virtex-5 FPGAs, the frame is composed of 41 words each of 32 bits. However, in the case of the 7-series FPGAs, the frame size has increased to 101 32-bit words. While the frame size has increased, the ICAP switching frequency has not seen any improvements in current FPGAs. This frequency determines how fast the data exchange with the configuration memory can run. A single fault injection using our methodology takes 142.27 µs when the PS is running at 666.7 MHz while the AXI bus and the ICAP_clk pin are clocked


at 100 MHz. When the frequency of the AXI bus is changed from 100 to 120 MHz, while the ICAP_clk remains at 100 MHz, the fault-injection time reduces to 118.5 µs. Similarly, when the AXI bus is clocked at 220 MHz on Kintex-7, the resulting fault-injection time reduces to 64 µs. Serrano et al. [3] reported a fault-injection time of 630 µs on a Virtex-5 FPGA platform. Therefore, we can achieve a 5.3 times improvement in fault-injection speed without incurring any hardware overhead for FF instrumentation. Further improvements in speed can be achieved using the ICAP port when highly optimized hardware controllers are developed or when streaming interfaces between the PS and PL are used. It is worth mentioning that the smallest granularity to access and change the configuration memory is a single frame using PR. Since our proposed methodology is based on frame-level configuration memory modifications, we can achieve faster SEU injection times than commonly used FI techniques. The reported faults per second in [3] and [5] are compared with our proposed method in Table IV. It can be noted that when the proposed methodology is implemented on a Kintex-7 FPGA, the supported number of faults per second is 15 625 compared to the 10 000 faults per second supported by FTUNSHADES2 [5]. Moreover, if an FSM-based ICAP controller is developed for the proposed methodology, the fault-injection speeds will be improved by orders of magnitude. As discussed by Nazar and Carro [24], the injection cycles depend on the number of words in the frame and some commands for setting up the configuration engine. For Virtex-5, they reported 215 cycles. Similarly, for 7-series FPGAs, the number of cycles required for injection with an FSM-based solution would be far less than measured with AXI-timers in Table III. Therefore, our proposed methodology provides the fastest achievable fault-injection times compared to the recent fault-injection approaches in flip-flops, even though the 7-series frames are more than twice the size of those of the Virtex-5 FPGAs on which the previous approaches were implemented.

TABLE IV
FAULTS INJECTED PER SECOND

VI. CONCLUSION

This paper has presented a fast SEU injection approach into user flip-flops on Xilinx FPGAs. The methodology leverages the existing state initialization features that are applied post-PR and does not require any reverse engineering. The exploitation of this feature allows selective state capture and restore capabilities, which are vital for fault injection in a target flip-flop without corrupting other state elements. The presented methodology is particularly suitable for state-of-the-art FPGA SoCs due to on-chip availability of a processor along with PL. Finally, because the methodology is on-chip, it can be easily reproduced using available FPGAs. As a result, the proposed FI technique supports very fast fault injection, making it suitable for large injection campaigns.

REFERENCES

[1] A. Ullah, E. Sanchez, L. Sterpone, L. A. Cardona, and C. Ferrer, “An FPGA-based dynamically reconfigurable platform for emulation of permanent faults in ASICs,” Microelectron. Rel., vol. 75, pp. 110–120, Aug. 2017.
[2] H. M. Quinn, D. A. Black, W. H. Robinson, and S. P. Buchner, “Fault simulation and emulation tools to augment radiation-hardness assurance testing,” IEEE Trans. Nucl. Sci., vol. 60, no. 3, pp. 2119–2142, Jun. 2013.
[3] F. Serrano, J. A. Clemente, and H. Mecha, “A methodology to emulate single event upsets in flip-flops using FPGAs through partial reconfiguration and instrumentation,” IEEE Trans. Nucl. Sci., vol. 62, no. 4, pp. 1617–1624, Aug. 2015.
[4] 7 Series FPGAs Configuration User Guide (UG470), Xilinx Inc., San Jose, CA, USA, Sep. 2016.
[5] J. Mogollon, H. Guzman-Miranda, J. Napoles, J. Barrientos, and M. Aguirre, “FTUNSHADES2: A novel platform for early evaluation of robustness against SEE,” in Proc. 12th Eur. Conf. Radiat. Effects Compon. Syst., Sep. 2011, pp. 169–174.
[6] P. Civera, L. Macchiarulo, M. Rebaudengo, M. S. Reorda, and M. Violante, “An FPGA-based approach for speeding-up fault injection campaigns on safety-critical circuits,” J. Electron. Test., Theory Appl., vol. 18, no. 3, pp. 261–271, Jun. 2002.
[7] P. Civera, L. Macchiarulo, M. Rebaudengo, M. S. Reorda, and M. Violante, “New techniques for efficiently assessing reliability of SOCs,” Microelectron. J., vol. 34, no. 1, pp. 53–61, Jan. 2003.
[8] C. Lopez-Ongil, M. Garcia-Valderas, M. Portela-Garcia, and L. Entrena, “Autonomous fault emulation: A new FPGA-based acceleration system for hardness evaluation,” IEEE Trans. Nucl. Sci., vol. 54, no. 1, pp. 252–261, Feb. 2007.
[9] L. A. B. Naviner, J. F. Naviner, G. G. dos Santor, Jr., E. C. Marques, and N. M. Pavia, Jr., “FIFA: A fault-injection–fault-analysis-based tool for reliability assessment at RTL level,” Microelectron. Rel., vol. 51, pp. 1459–1463, Sep. 2011.
[10] W. Mansour and R. Velazco, “An automated SEU fault-injection method and tool for HDL-based designs,” IEEE Trans. Nucl. Sci., vol. 60, no. 4, pp. 2728–2733, Aug. 2013.
[11] G. M. Swift et al., “Dynamic testing of Xilinx Virtex-II field programmable gate array (FPGA) input/output blocks (IOBs),” IEEE Trans. Nucl. Sci., vol. 51, no. 6, pp. 3469–3474, Dec. 2004.
[12] L. Antoni, R. Leveugle, and B. Feher, “Using run-time reconfiguration for fault injection applications,” IEEE Trans. Instrum. Meas., vol. 52, no. 5, pp. 1468–1473, Oct. 2003.
[13] D. de Andres, J.-C. Ruiz, D. Gil, and P. Gil, “Fault emulation for dependability evaluation of VLSI systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 4, pp. 422–431, Apr. 2008.
[14] M. A. Aguirre, J. N. Tombs, A. Torralba, and L. G. Franquelo, “UNSHADES-1: An advanced tool for in-system run-time hardware debugging,” in Field Programmable Logic and Application. Springer-Verlag, Sep. 2003, pp. 1170–1173.
[15] M. A. Aguirre et al., “Microprocessor and FPGA interfaces for in-system co-debugging in field programmable hybrid systems,” Microprocess. Microsyst., vol. 29, nos. 2–3, pp. 75–85, Apr. 2005.
[16] M. A. Aguirre et al., “Selective protection analysis using a SEU emulator: Testing protocol and case study over the Leon2 processor,” IEEE Trans. Nucl. Sci., vol. 54, no. 4, pp. 951–956, Aug. 2007.
[17] C. López-Ongil et al., “A unified environment for fault injection at any design level based on emulation,” IEEE Trans. Nucl. Sci., vol. 54, no. 4, pp. 946–950, Aug. 2007.
[18] D. P. Schultz, L. C. Hung, and F. E. Goetting, “Programmable logic device capable of preserving user data during partial or complete reconfiguration,” U.S. Patent 6 507 211 B1, Jan. 14, 2003.
[19] M. L. Voogel, “Programmable memory element with power save mode in a programmable logic device,” U.S. Patent 7 239 173 B1, Jul. 3, 2007.
[20] B. J. Blodget et al., “Reconfiguration of a programmable logic device using internal logic,” U.S. Patent 6 920 627 B2, Jul. 19, 2005.
[21] Vivado Design Suite User Guide: Partial Reconfiguration (UG909), Xilinx Inc., San Jose, CA, USA, Apr. 2016.
[22] M. Happe, A. Traber, and A. Keller, “Preemptive hardware multitasking in ReconOS,” in Applied Reconfigurable Computing. Springer, Mar. 2015, pp. 79–90.
[23] AXI HWICAP v3.0 LogiCORE IP Product Guide (PG134), Xilinx Inc., San Jose, CA, USA, Oct. 2016.
[24] G. L. Nazar and L. Carro, “Fast single-FPGA fault injection platform,” in Proc. IEEE Int. Symp. Defect Fault Tolerance VLSI Nanotechnol. Syst. (DFT), Oct. 2012, pp. 152–157.
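As a supplementary illustration, the frame formatting and bit inversion steps of Algorithm 1 (lines 7–12 and 20–22) can be modeled in software. The sketch below is a minimal Python model and not the on-target implementation: the function names decode_frame_address, locate_bit, and flip_ff_bit are hypothetical, the masks and shifts are copied from the listing, and the Block field is omitted because the listing does not derive it. The frame is assumed to be held as a list of 101 32-bit words, the 7-series frame size quoted in Section V.

```python
# Software model of Algorithm 1, lines 7-12 and 20-22 (illustrative only).
# Masks and shifts are copied from the listing; the Block field is not
# derived there, so it is left out here as well.

WORDS_PER_FRAME = 101  # 7-series frame size quoted in the text (32-bit words)


def decode_frame_address(frame):
    """Split a 32-bit frame address into the fields of lines 7-10."""
    return {
        "top": (frame & 0x00400000) >> 22,
        "row": (frame & 0x003E0000) >> 17,
        "major": (frame & 0x0001FF80) >> 7,
        "minor": (frame & 0x0000001F) >> 0,
    }


def locate_bit(bit):
    """Lines 11-12: word index in the frame and bit index inside that word."""
    return bit // 32, bit % 32


def flip_ff_bit(rdframe, bit):
    """Lines 20-22: invert one bit of a frame held as a list of 32-bit words."""
    word, init_bit = locate_bit(bit)
    modified = list(rdframe)  # leave the caller's readback buffer untouched
    modified[word] = (modified[word] ^ (1 << init_bit)) & 0xFFFFFFFF
    return modified


if __name__ == "__main__":
    blank_frame = [0] * WORDS_PER_FRAME
    print(decode_frame_address(0x00420185))
    print(hex(flip_ff_bit(blank_frame, 70)[2]))
```

Applying flip_ff_bit twice with the same bit position returns the original frame, which mirrors the fact that an injected upset is a single XOR on the captured flip-flop state.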

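The relation between the fault-injection times and the faults-per-second figures discussed in Section V can be checked with simple arithmetic. The following Python sketch is only a consistency check under the assumption that the rate is the reciprocal of the per-fault injection time (the time spent applying test inputs is ignored); the function and constant names are hypothetical, and the times are those quoted in the text.

```python
# Consistency check of the timing figures quoted in Section V (illustrative).
# Assumption: faults per second = 1 / (injection time per fault).

REPORTED_TIMES_US = {
    "AXI 100 MHz": 142.27,
    "AXI 120 MHz": 118.5,
    "AXI 220 MHz (Kintex-7)": 64.0,
}
SERRANO_VIRTEX5_US = 630.0  # injection time reported in [3]


def faults_per_second(injection_time_us):
    """Convert a per-fault injection time in microseconds to a rate."""
    return 1_000_000 / injection_time_us


def speedup(baseline_us, proposed_us):
    """Ratio of a baseline injection time to a proposed injection time."""
    return baseline_us / proposed_us


if __name__ == "__main__":
    for label, time_us in REPORTED_TIMES_US.items():
        rate = faults_per_second(time_us)
        gain = speedup(SERRANO_VIRTEX5_US, time_us)
        print(f"{label}: {rate:.0f} faults/s, {gain:.2f}x vs. [3]")
```

Under this assumption, 64 µs corresponds to 15 625 faults per second, which agrees with the Kintex-7 figure, and 630 µs / 118.5 µs is approximately 5.3, which suggests that the quoted 5.3 times improvement refers to the 120 MHz AXI configuration; the latter is a reading of the numbers rather than something the text states explicitly.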