Realistic Failure Rates and Prediction Confidence
4.1 Data Accuracy
There are several sources of failure rate data compiled by defense, telecommunications,
process industries, oil and gas and other organizations. Some are published in the form of
data handbooks such as:
OREDA (Offshore data)
Some are in-house data collections that are not generally available. These occur in:
large industrial manufacturers
public utilities.
Data collection activities were at their peak in the 1980s but, sadly, they declined during the
1990s and the majority of published sources have not been updated since that time.
Failure data are usually, unless otherwise specified, taken to refer to random failures (i.e.
constant failure rates). It is important to read, carefully, any covering notes since, for a given
temperature and environment, a stated component, despite the same description, may exhibit
a wide range of failure rates because:
1. Some failure-rate data include items replaced during preventive maintenance whereas
others do not. These items should, ideally, be excluded from the data but, in practice,
it is not always possible to identify them. This can affect rates by an order of
magnitude.
2. Failure rates are affected by the tolerance of a design and this will cause a variation in the
values. Because definitions of failure vary, a given parametric drift may be included in
one database as a failure, but ignored in another.
3. Although nominal environmental and quality assurance levels are described in some
databases, the range of parameters covered by these broad descriptions is large. They
represent, therefore, another source of variability.
4. Component parts often are only described by reference to their broad type (e.g. signal
transformer). Data are therefore combined for a range of similar devices rather than being
separately grouped, thus widening the range of values. Furthermore, different failure
modes are often mixed together in the data.
5. The degree of data screening will affect the relative numbers of intrinsic and induced
failures in the quoted failure rate. An example would be not including a systematic failure
whose re-occurrence is designed out.
6. Reliability growth occurs because field experience is used to enhance reliability as a
result of modifications. This will influence the failure rate data.
7. Trial and error replacement is sometimes used as a means of diagnosis and this can
artificially inflate failure rate data.
8. Some data record undiagnosed incidents and ‘no fault found’ visits. If these are included
in the statistics as faults, then failure rates can be inflated. Quoted failure rates are
therefore influenced by the way they are interpreted by an analyst.
Failure rate values can span one or two orders of magnitude as a result of different combinations
of these factors. Prediction calculations are explained in Chapters 8 and 9 but it will be seen
(Section 4.4) that the relevance of failure rate data is more important than refinements in the model
used for the calculation. The data sources described in Section 4.2 can at least be subdivided into
‘site/company specific’, ‘industry specific’ and ‘generic’ and research, described in Section 4.4,
confirms that the more specific the data source the greater the confidence in the prediction.
Data are presented in one of two forms:
1. Tables: lists of failure rates such as those in Appendices 3 and 4, with or without
multiplying factors, for such parameters as quality and environment. Sometimes failure
rates are tabulated, for a given component type, against ambient temperature and the ratio
of applied to rated stress (power or voltage).
2. Regression Models: obtained by regression analysis of the data. These are presented
in the form of equations that provide a failure rate as a result of inserting the device
parameters into the appropriate expression. Because of the large number of variables
involved in describing microelectronic devices, data are often expressed in the form
of models. These regression equations (WHICH GIVE A TOTALLY MISLEADING
IMPRESSION OF PRECISION) involve some or all of the following:
• complexity (number of gates, bits, equivalent number of transistors)
• number of pins
• junction temperature (see Arrhenius, Section 11.2)
• package (ceramic and plastic packages)
• technology (CMOS, NMOS, bipolar, etc.)
• type (memory, random LSI, analogue, etc.)
• voltage or power loading
• quality level (affected by screening and burn-in)
• environment
• length of time in manufacture.
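As an illustration of how one of these parameters, junction temperature, might enter such a model, the following is a minimal sketch of the Arrhenius relationship. It is not taken from any of the data sources: the function name, the activation energy of 0.7 eV and the two temperatures are assumptions chosen only for the example.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def arrhenius_acceleration(t_ref_c, t_actual_c, activation_energy_ev):
    """Return the ratio of the failure rate at t_actual_c to that at t_ref_c.

    Both temperatures are junction temperatures in degrees Celsius.
    The activation energy is an assumed, device-dependent value in eV.
    """
    t_ref_k = t_ref_c + 273.15
    t_actual_k = t_actual_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (
        1.0 / t_ref_k - 1.0 / t_actual_k
    )
    return math.exp(exponent)


# Hypothetical example: reference 50 degC, actual 100 degC, assumed 0.7 eV.
factor = arrhenius_acceleration(50.0, 100.0, 0.7)
print(f"Failure rate multiplier: {factor:.1f}")
```

The result is very sensitive to the assumed activation energy, which is one reason why the apparent precision of such equations should be treated with caution.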
Although empirical relationships have been established relating certain device failure rates
to specific stresses, such as voltage and temperature, no precise formula exists which links specific
environments to failure rates. The permutation of different values of environmental factors, such as
those listed in Chapter 12, is immense. General adjustment (multiplying) factors have been evolved
and these are often used to scale up basic failure rates to particular environmental conditions.
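The use of multiplying factors can be sketched as follows. The base rate, the factor values and the function name are hypothetical and are not drawn from any of the databases; rates are assumed to be expressed per million hours.

```python
def adjusted_failure_rate(base_rate_pmh, *factors):
    """Scale a base failure rate (per million hours) by multiplying factors.

    Each factor (e.g. for quality or environment) must be positive.
    """
    rate = base_rate_pmh
    for factor in factors:
        if factor <= 0:
            raise ValueError("multiplying factors must be positive")
        rate *= factor
    return rate


# Hypothetical values: base rate 0.1 pmh, quality factor 2, environment factor 4.
print(adjusted_failure_rate(0.1, 2.0, 4.0))
```

Because the factors multiply, the uncertainty in each one compounds, so the adjusted rate is no more precise than the least certain factor applied.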
Because failure rate is, probably, the least precise engineering parameter, it is important to bear
in mind the limitations of a reliability prediction. The research described in Section 4.4 makes
it possible to express predictions using confidence intervals. The resulting MTBF, availability
(or whatever) should not be taken as an absolute parameter but rather as a general guide to the
design reliability. Within the prediction, however, the relative percentages of contribution to the
total failure rate are of a better accuracy and provide a valuable tool in design analysis.
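The relative contributions mentioned above might be calculated as in the following sketch. It assumes constant failure rates, that any component failure fails the system (so that the rates simply add), and rates expressed per million hours. The component names and values are hypothetical.

```python
def contribution_percentages(component_rates):
    """Return each component's percentage share of the total failure rate."""
    total = sum(component_rates.values())
    if total <= 0:
        raise ValueError("total failure rate must be positive")
    return {name: 100.0 * rate / total for name, rate in component_rates.items()}


def mtbf_hours(total_rate_per_million_hours):
    """MTBF in hours for a constant failure rate given per million hours."""
    return 1e6 / total_rate_per_million_hours


# Hypothetical component failure rates, per million hours.
rates = {"power supply": 5.0, "processor": 3.0, "relay": 2.0}
for name, share in contribution_percentages(rates).items():
    print(f"{name}: {share:.0f}% of total")
print(f"MTBF: {mtbf_hours(sum(rates.values())):.0f} hours")
```

If every rate in the table were in error by the same multiplying factor, the MTBF would change but the percentages would not, which is why the ranking of contributors is more robust than the absolute figure.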
Because of the differences between data sources, comparisons of reliability should always
involve the same data source in each prediction.
For a reliability assessment to be meaningful, it must address a specific system failure mode.
To predict that a safety (shutdown) system will fail at a rate of, say, once per annum is, on
its own, saying very little. It might be that 90% of the failures lead to a spurious shutdown
and 10% to a failure to respond. If, on the other hand, the ratios were to be reversed then the
picture would be quite different.
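The point can be made numerically with a short sketch using the figures quoted above (one failure per annum, split 90%/10% between the two modes). The function name and the mode labels are illustrative only.

```python
def split_by_mode(total_rate, mode_fractions):
    """Apportion a total failure rate between failure modes.

    mode_fractions maps each mode name to its fraction of all failures;
    the fractions must sum to 1.
    """
    if abs(sum(mode_fractions.values()) - 1.0) > 1e-9:
        raise ValueError("mode fractions must sum to 1")
    return {mode: total_rate * frac for mode, frac in mode_fractions.items()}


total_per_annum = 1.0
as_stated = split_by_mode(
    total_per_annum, {"spurious shutdown": 0.9, "failure to respond": 0.1}
)
reversed_case = split_by_mode(
    total_per_annum, {"spurious shutdown": 0.1, "failure to respond": 0.9}
)
print(as_stated)
print(reversed_case)
```

The total rate is identical in both cases, yet the rate of failure to respond differs by a factor of nine between them.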
The failure rates, mean times between failures or availabilities must therefore be assessed
for defined failure types (modes). In order to achieve this, the appropriate component level
failure modes must be applied to the prediction models that are described in Chapters 8
and 9. Component failure mode data are sparse but a few of the sources do contain some
information. The following sections indicate where this is the case.
4.2.1 Electronic Failure Rates
4.2.1.1 US Military Handbook 217 (generic, no failure modes)
This is one of the better known data sources and was from RADC (Rome Air Development Center in
the USA). Opinions are sharply divided as to its value due to the unjustified precision implied
by the regression model nature of its microelectronics sections. It covers:
microelectronics
discrete semiconductors
tubes (thermionic)
lasers
resistors and capacitors
inductors
connections and connectors
meters
crystals
lamps, fuses and other miscellaneous items.
4.2.2.3 TECHNIS (the author) (industry and generic, many failure modes, some repair times)
For over twenty-five years, the author has collected a wide range of failure rate and mode data
as well as recording the published data mentioned here. This is available to clients on a report
basis. An examination of these data has revealed a 40% improvement in failure rates between
the 1980s and the 1990s.
4.2.2.4 UKAEA (industry and generic, many failure modes)
This databank is maintained by the Systems Reliability Department (SRD) of UKAEA at
Warrington, Cheshire, who have collected the data as a result of many years of consultancy. It
is available on disk to members who pay an annual subscription.
4.2.2.5 Sources of Nuclear Generation Data (industry specific)
In the UK, UKAEA, above, has some nuclear data, as has NNC (National Nuclear
Corporation) although this may not be openly available.
In the USA Appendix III of the WASH 1400 study provided much of the data frequently
referred to and includes failure rate ranges, event probabilities, human error rates and some
common cause information. The IEEE standard IEEE500 also contains failure rates and
restoration times. In addition there is NUCLARR (Nuclear Computerized Library for Assessing
Reactor Reliability), which is a PC-based package developed for the Nuclear Regulatory
Commission and contains component failure rates and some human error data. Another US
source is the NUREG publication. Some of the EPRI data are related to nuclear plants.
In France, Électricité de France (EDF) provides the EIReDA mechanical and electrical failure
rate database, which is available for sale.
In Sweden the TBook provides data on components in Nordic nuclear power plants.
4.2.2.6 US Sources of Power Generation Data (industry specific)
The EPRI (Electric Power Research Institute) of GE Co., New York, data scheme is largely
gas turbine generation failure data in the USA.
There is also the GADS (Generating Availability Data System) operated by NERC (North
American Electric Reliability Council). They produce annual statistical summaries based on
experience from power stations in the USA and Canada.
4.2.2.7 SINTEF (industry specific)
SINTEF (at Trondheim) is part of the Norwegian Institute of Technology and, amongst many
activities, collects failure rate data as, for example, data sheets on fire and gas detection
equipment.
Many companies (e.g. Siemens) and for that matter firms of RAMS consultants (e.g. RM
Consultants Ltd) maintain failure-rate data but only for use by that organization.
4.2.3 Some Older Sources
A number of sources have been much used and are still frequently referred to. They are,
however, somewhat dated but are listed here for completeness.
Reliability Prediction Manual for Guided Weapon Systems (UK MOD) – DX99/013–100
Reliability Prediction Manual for Military Avionics (UK MOD) – RSRE250
UK Military Standard 00–41
Electronic Reliability Data – INSPEC/NCSR (1981)
Green and Bourne, Reliability Technology, Wiley 1972 (book)
Frank Lees, Loss Prevention in the Process Industries, Butterworth-Heinemann (book).
4.3 Data Ranges
For some components there is fairly close agreement between different sources whereas
in other cases there is a wide range of failure rate values, the reasons for which were
summarized in Section 4.1.
The FARADIP.THREE database was created to show the ranges of failure rate for most
component types. This database, currently version 6.5 in 2010 (but updated annually),
is a summary of Technis data together with most of the other databases and shows, for each
component, the range of failure rate values that is to be found from them. Where a value
in the range tends to predominate then this is indicated. Failure mode percentages are also
included. It is available as a software package (with FMEA facilities) from the author at
26 Orchard Drive, Tonbridge, Kent TN10 4LG, UK (technis.djs@virgin.net) and includes:
Microelectronics:
logic and linear
memory.
Discrete:
diodes and transistors
optoelectronics
lamps and displays
crystals and piezo devices
tubes.
Passive:
capacitors
resistors