You are on page 1of 26

Tradeoffs in Flight-Design Upset

Mitigation in State-of-the-Art FPGAs

Hardened By Design
vs.
Design-Level Hardening
Gary M. Swift and Ramin Roosta
Jet Propulsion Laboratory / California Institute of Technology

The research done in this paper was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under contract with the
National Aeronautics and Space Administration (NASA) and was partially sponsored by the NASA Electronic Parts and Packaging Program.

Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not
constitute or imply its endorsement by the United States Government or the Jet Propulsion Laboratory, California Institute of Technology.

Swift and Roosta 1 144_C4 / MAPLD04


In the beginning was Actel …

• Leveraging from a commercial product line


 ONO anti-fuse based  one-time programmable
(OTP)

• “beginning” = 1993
 Reference:
Katz, R.; Barto, R.; McKerracher, P.; Carkhuff, B.; Koga, R.;
“SEU hardening of field programmable gate arrays
(FPGAs) for space applications and device
characterization,” IEEE Transactions on Nuclear Science,
Dec. 1994

Swift and Roosta 2 144_C4 / MAPLD04


Later, Xilinx
Leveraging from a commercial product line
 SRAM based  reconfigurable

“later” = 1998
 Reference: Guertin, S.M.; Swift, G.M.; Nguyen, D.; “Single-event
upset test results for the Xilinx XQ1701L PROM”, Radiation Effects
Data Workshop Record, 1999
 Quote:
(Xilinx SRAM-based FPGAs)… “do appear suited to a broad
range of other (non-critical) applications, such as sensor and
camera controllers.”

Swift and Roosta 3 144_C4 / MAPLD04


OUTLINE
• FPGAs:
A key enabling technology for modern spacecraft

• Background in radiation testing of FPGAs


▫ Earlier, Katz/Swift collaboration
▫ Recently, Xilinx Consortium

• Feature Comparison

• Triple Modular Redundancy (TMR) -


hardware approach vs. software approach

• Concluding Remarks

Swift and Roosta 4 144_C4 / MAPLD04


FPGAs: A key “enabling technology”

Like custom ASICs, FPGAs can replace whole boards


 Saving mass, volume, power
 Achieving extra functionality

FPGAs are much cheaper than ASICs


 Design efforts can be later in the schedule
 Design mistakes don’t require a re-spin through the
foundry

Swift and Roosta 5 144_C4 / MAPLD04


MER Pyro-Controller

Used self-checking of configuration to initiate a re-


configuration after spotting an upset

Swift and Roosta 6 144_C4 / MAPLD04


MER Pyro-Controller
Nearing Mars Xilinx XQR4062XL
30
predicted
25 MER-A Nov. 23
MER-A
20 MER-B Nov. 23
# of Upsets

MER-B

15

10
Oct. 28
5 Oct. 28
MER-A

MER-B
0
0 50 100 150
Days after Launch
Swift and Roosta 7 144_C4 / MAPLD04
My Background

• Actel experience is older


 No direct involvement in radiation tests since the
ONO anti-fuse was replaced
 Results here are from others’ work

• Xilinx experience is recent


 Active participant in Xilinx Rad Test Consortium
 Currently, finishing two+ year test campaign
targeting the Virtex II family

Swift and Roosta 8 144_C4 / MAPLD04


Currently Available Devices

Actel RT54SX-S family vs. Xilinx Virtex II family


(-SU)

Note: both are essentially immune to single-event latchup


and have good total ionizing dose tolerance,
[ Actel > 135 krad(Si); Xilinx > 200 krad(Si) ]

Swift and Roosta 9 144_C4 / MAPLD04


Main Feature Comparison
Actel Xilinx
RT54SX72S XQR2V6000

Gates: 72,000 ~6M ( /~3.2 )


flip/flops: 2012 67,584 / 3.2 = ~20k

I/O Pins: 360 824 / 3 = 274

Speed external : 230 MHz 622 Mb/s (I-mode LVDS)


Speed internal : 310 MHz 360 MHz

Swift and Roosta 10 144_C4 / MAPLD04


Extra Features Comparison
Actel Xilinx
RT54SX72S XQR2V6000

Block RAM: no 2.5 Mb

I/O Standards: many many

Others: hardwired TMR Clock Manager


Multipliers

Swift and Roosta 11 144_C4 / MAPLD04


Actel: What bits can upset?

User flip-flops only


 Direct hits of same flip/flop in multiple domains
▫ Very unlikely due to layout
 Clock domain hits

SEFI modes essentially eliminated

Swift and Roosta 12 144_C4 / MAPLD04


Xilinx: What bits can upset?

× NAND
× Ex-OR
• Configuration Bits × Flip-Flop type
 Logical Function × etc…

 Routing
 User Options
× Type of I/O
• Block RAM × Mode of Block RAM Access
× Clock Manager
× etc…
• User Flip-flops

• Control Registers

Swift and Roosta 13 144_C4 / MAPLD04


Xilinx: Heavy Ion Test Results

Low Threshold Low Susceptibility


(soft) (hard)
1.E-07
Cross Section per Bit (cm2)

1.E-08

1.E-09

1.E-10
X-2V1000 configuration bits
Weibull Curve Fit

1.E-11
0 10 20 30 40 50 60 70
2
LET (MeV-cm /mg)

Resulting in fairly low in-space rates:


~6 per day for 2V6000 in GCRmin.
Swift and Roosta 14 144_C4 / MAPLD04
Actel: Heavy Ion Test Results
Where’s Threshold Low Susceptibility
??? (~100x harder)
Data for two 1.E-09
10-9
RTAX2000S
prototypes
at 1 MHz using
Cross Section (cm )
2

checkerboard
pattern

from Fig. 12,


JJ Wang et al.,
NSREC 2003
[Ref. 1] 307
315
1.E-10
10-10
0 20 40 60 80 100 120
2
LET (MeV-cm /mg)

Very low in-space rates (assume LETth > 40 achieved):


~1 per 6800 years for SX72-S in GCRmin.
Swift and Roosta 15 144_C4 / MAPLD04
Actel-style TMR

SX-A “R” cell

triplicates to:

RTSX-S
“R” cell

Swift and Roosta 16 144_C4 / MAPLD04


Actel-style TMR

Actel-style TMR is fairly straightforward:

 Each flip-flop is replaced by three plus feedback voter

 Triplicated elements spread out physically

 Uses one clock/inverse-clock domain

 No external parts needed

Swift and Roosta 17 144_C4 / MAPLD04


Xilinx-style TMR
Xilinx-style TMR is more complicated:
 First, it’s not too useful without
configuration scrubbing
 Whole functional blocks are triplicated,
not individual flip-flops
 Three voters are used
 Three clock domains
 Elimination of:
▫ Weak keepers (aka half latches)
▫ Use of configuration cells as part of the design
- For example, SRL16
 Needs some external circuitry
(at least, a watchdog timer + PROMs)
Swift and Roosta 18 144_C4 / MAPLD04
Xilinx-style TMR

Swift and Roosta 19 144_C4 / MAPLD04


Xilinx-style TMR
In Xilinx-style TMR, I/O’s use three pins tied externally :

P
Minority
Voter

D0
P
Minority
Voter

D1 D
P
Minority
Voter

D2
Board Traces

Pins

Swift and Roosta 20 144_C4 / MAPLD04


Xilinx TMRtool

• Xilinx-style TMR done


by hand is difficult and Design Entry Simulation
tedious
EDIF NGC

• An automated tool EDIF


TMR XTMR
XILINX
Back-Annotation
which integrates into Timing, ncd2edif,
ncd2vhdl,
the design flow has ncd2verilog
XILINX
been developed Implementation
NGO

(“now” available) Translate, Map,


Floorplan, Par,
BitGen NCD

FPGA
• In-beam testing shows BIT

tool is very effective


Swift and Roosta 21 144_C4 / MAPLD04
Upset Comparison
• ATMR now has eliminated:
 Upsets of static storage elements, and
 SEFIs
• ATMR upsets from:
 Transients that are clocked into storage
 Clock tree hits

• Xilinx FPGAs have a small susceptibility to two types


of SEFIs
 Reset (sometimes only partial)
 Disable scrub port
• XTMR in combination with scrubbing can lower
system upset rates below the SEFI rate

Swift and Roosta 22 144_C4 / MAPLD04


Rate Comparison

• Actel
• Dominated by transients
• Roughly one system error per thousand years (GCRmin)

• Xilinx
• Dominated by SEFI rate
• Expect one SEFI per ~65 years in GCRmin
• Expect one system error ~5-20x less often

GCR = Galactic Cosmic Ray background (interplanetary space)


almost identical to geosynchronous orbit

Swift and Roosta 23 144_C4 / MAPLD04


CONCLUSIONS
For the present –

Both can achieve very acceptable radiation tolerance

Actel wins on:


▫ Less burden on the designer
▫ No auxiliary components
▫ Lower SEFI susceptibility

Xilinx wins on:


▫ Designer control of the resources vs. hardness tradeoff
▫ On-chip feature set
▫ Re-configurability

Competition is good.

Swift and Roosta 24 144_C4 / MAPLD04


Acronyms
FPGA - Field Programmable Gate Array
ASIC - Application Specific Integrated Circuit
SEU - Single Event Upset
SEFI - Single Event Functionality Interrupt
TMR - Triple Modular Redundancy
ATMR - Actel-style TMR
XTMR - Xilinx-style TMR
LET - Linear Energy Transfer (proportional to deposited
charge per micron for a heavy ion strike on an active node)
GCRmin - Galactic Cosmic Ray background (highest during
“solar minimum” period of ~11-yr cycle of sunspots)
MER - Mars Exploration Rovers
(i.e., Spirit and Opportunity)

Swift and Roosta 25 144_C4 / MAPLD04


Additional References
[1] J.J. Wang, W. Wong, S. Wolday, B. Cronquist, J. McCollum,
R. Katz, I. Kleyner, “Single event upset and hardening in 0.15 
antifuse-based field programmable gate array,” IEEE
Transactions on Nuclear Science, Dec. 2003

[2] Jih-Jong Wang, R.B. Katz, F. Dhaoui, J.L. McCollum, W. Wong,


B.E. Cronquist, R.T. Lambertson, E. Hamdy, I. Kleyner,
W. Parker, “Clock buffer circuit soft errors in antifuse-based field
programmable gate arrays,” IEEE Transactions on Nuclear
Science, Dec. 2000

[3] R. Katz, J.J. Wang, R. Koga, K.A. LaBel, J. McCollum,


R. Brown, R.A. Reed, B. Cronquist, S. Crain, T. Scott, W.
Paolini, B. Sin, “Current radiation issues for programmable
elements and devices,” IEEE Transactions on Nuclear Science,
Dec. 1998

Swift and Roosta 26 144_C4 / MAPLD04

You might also like