0% found this document useful (0 votes)

361 views80 pages

Section 7a Reliability Notes

Uploaded by

Georgina Sule

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

361 views80 pages

Section 7a Reliability Notes

Uploaded by

Georgina Sule

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

WARWICK MANUFACTURING GROUP

Section
7
Product Excellence using 6 Sigma (PEUSS)

Introduction
to
Reliability
Warwick Manufacturing Group
AN INTRODUCTION TO RELIABILITY
ENGINEERING

Contents

1 Introduction 1
2 Measuring reliability 4
3 Design for reliability 12
4 Reliability management 34
5 Summary 35

Copyright © 2007 University of Warwick

Warwick Manufacturing Group
Introduction to Reliability Engineering Page 1

RELIABILITY ENGINEERING
1 Introduction
1.1 Definition

Most people will have some concept of what reliability is from everyday life, for example,
people may discuss how reliable their washing machine has been over the length of time they
have owned it. Similarly, a car that doesn’t need to go to the garage for repairs often, during
its lifetime, would be said to have been reliable. It can be said that reliability is quality over
time. Quality is associated with workmanship and manufacturing and therefore if a product
doesn’t work or breaks as soon as you buy it you would consider the product to have poor
quality. However if over time parts of the product wear-out before you expect them to then
this would be termed poor reliability. The difference therefore between quality and reliability
is concerned with time and more specifically product life time.
Reliability engineering has both quantitative and qualitative aspects; measurements of
reliability are necessary for customer requirements compliance. However measuring
reliability does not make a product reliable, only by designing in reliability can a product
achieve its reliability targets. These lecture notes will therefore introduce some of the
terminology used in reliability engineering. It will provide information about measuring
reliability as well as designing for reliability. Moreover it will emphasise the importance of
good engineering principles to ensure product reliability. By identifying possible causes of
failure and elimination will obviously help to improve product reliability.
The formal definition of reliability is as follows: The ability of an item to perform a required
function under stated conditions for a stated period of time. BS4778
Another definition concerns the probabilistic nature of measuring reliability, i.e. the
probability of an item to perform a required function under specified conditions for a stated
period of time. It is therefore a measure of engineering uncertainty and to quantify reliability
involves the use of statistics and more specifically probability theory. These notes will also
describe some useful probability distributions that can describe the lifetime behaviour of
products.

1.2 What is reliability?

Reliability is associated with unexpected failures of products or services and understanding

why these failures occur is key to improving reliability. The main reasons why failures occur
include:
• The product is not fit for purpose or more specifically the design is inherently
incapable.
• The item may be overstressed in some way.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 2

• Failures can be caused by wear-out

• Failures might be caused by variation.

• Wrong specifications may cause failures.

• Misuse of the item may cause failure.

• Items are designed for a specific operating environment and if they are then used
outside this environment then failure can occur.
There are many reasons for failure in items the list above is a generic list.

The load and strength of an item may be generally known, however there will always be an
element of uncertainty. The actual strength values of any population of components will vary;
there will be some that are relatively strong, others that are relatively weak, but most will be
of nearly average strength. Similarly there will be some loads greater than others but mostly
they will be average. Figure 1, below shows the load strength relationship with no overlaps.
Probability

Load Strength

Figure 1: load strength relationship , no overlaps

However if, as shown in figure 2, there is an overlap of the two distributions then failures will
occur. There therefore needs to be a safety margin to ensure that there is no overlap of these
distributions.

Failure
Probability

Load Strength

Figure 2 load strength relationship - overlaps

Warwick Manufacturing Group
Introduction to Reliability Engineering Page 3

It is clear that to ensure good reliability the causes of failure need to be identified and
eliminated. Indeed the objectives of reliability engineering are:
• To apply engineering knowledge to prevent or reduce the likelihood or frequency of
failures;
• To identify and correct the causes of failure that do occur;

• To determine ways of coping with failures that do occur;

• To apply methods of estimating the likely reliability of new designs, and for analysing
reliability data.
These notes will discuss some of the techniques that can be used to identify failures as well as
the statistical techniques for analysing reliability data.

1.3 Why is Reliability important?

Unreliability has a number of unfortunate consequences and therefore for many products and
services is a serious threat. For example poor reliability can have implications for:
• Safety

• Competitiveness

• Profit margins

• Cost of repair and maintenance

• Delays further up supply chain

• Reputation

• Good will

KEY POINTS

• Reliability is a measure of uncertainty and therefore estimating reliability means

using statistics and probability theory
• Reliability is quality over time

• Reliability must be designed into a product or service

• Most important aspect of reliability is to identify cause of failure and eliminate in

design if possible otherwise identify ways of accommodation
• Reliability is defined as the ability of an item to perform a required function without
failure under stated conditions for a stated period of time
• The costs of unreliability can be damaging to a company
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 4

2 Measuring reliability
2.1 Requirements

Many customers will produce a statement of the reliability requirements that is included in the
specification of the product. This statement should include the following:
• The definition of failure related to the product’s function and should cover all failure
modes relevant to the function;
• A full description of the environments in which the product will be stored, transported,
operated and maintained;
• A statement of the reliability requirement

Care must be given in defining failure to ensure that the failure criteria are unambiguous.
Failure should always relate to a measurable parameter or to a clear indication. For example, a
definition of failure could include ‘failure of a function to operate’. To be able to design for
the load of the product the design team must have accurate information concerning the
environment of the product. If an item must fully operate at high altitude with extreme
changes in temperature then the design must be robust enough to withstand such
environmental factors. Similarly if a product is stored in extreme conditions prior to use then
the design must accommodate for the storage conditions.
The reliability requirement should be stated in a way which can be verified, and which makes
sense relative to the use of the product. The simplest requirement is to state that no failure will
occur under stated conditions. Reliability requirements based on life parameters (see section
2.3) must be based on the corresponding life distributions. A common parameter used is
MTBF, when a constant failure rate is assumed.
Reliability and Maintainability case

The UK MOD has recently moved away from prescriptive reliability requirements and now
requests a reliability case from their suppliers. The reliability and maintainability (R&M) case
is defines as “A reasoned, auditable argument created to support the contention that a defined
system satisfies the R&M requirements” . DEF STAN 0042 part 3 is a document produced by
the UK MOD that gives guidance on what goes into an R&M case.

2.2 The bath tub curve

The bath-tub curve is a representation of the reliability performance of components or non-

repaired items. It observes the reliability performance of a large sample of homogenous items
entering the field at some start time (usually zero). If we observe the items over their lifetime
without replacement then we can observe three distinct shapes or periods. Figure 3 shows the
bath-tub curve and these 3 periods. The infant mortality or early failures portion shows that
the population will initially experience a high hazard function that starts to decrease. This
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 5

period of time represents the burn-in or debugging period where weak items are weeded out.
After the initial phase when the weak components have been weeded out and mistakes
corrected, the remaining population reaches a relatively constant hazard function period,
known as the useful life period. From figure 3 you can see that the hazard function is constant,
this shape can be modelled by the exponential distribution (see section 2.3) when failure are
occurring randomly through time. The final portion of the bath-tub curve is called the wear-
out phase, this is when the hazard function increases with time.
Hazard function

Useful Life

Time
Infant Wear Out
Mortality
Figure 3 The bath-tub curve

2.3 Life distributions

2.3.1 Distribution functions

If you take a large number of measurements you can draw a histogram to show the how the
measurements vary. A more useful diagram, for continuous data, is the probability density
function. The y axis is the percentage measured in a range(shown on the x-axis) rather than
the frequency as in a histogram. If you reduce the ranges(or intervals) then the histogram
becomes a curve which describes the distribution of the measurements or values. This
distribution is the probability density function or PDF. Figure 4, below, shows an example
of a PDF. The area under the curve of the distribution is equal to 1, i.e.

−∞
∫ f (x)dx = 1
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 6

The probability of a value falling between any two values x1 and x2 is the area bounded by
this interval, i.e.
x2

p(x1 < x < x2 ) = ∫ f (x)dx

x
1

f(x)Frequency

values

Figure 4 Probability density function

In reliability since we are usual discussing time we will change x to t, i.e. f(t). The CDF,
cumulative distribution function or F(t), gives the probability that a measured value will
fall between -∞ and t, i.e.
F (t) = ∫ f (t)dt
−∞
t

Figure 5, below, shows the CDF as x tends to ∞ F(t) tends to 1.

1
1 .2
1
cy(

0 .8

0 .6
re q u e
nCDF,F(t)

0 .4
F

0 .2
0
1 1 2 3 3 4 5 t 5 6 7 7 9

V a lu e

Figure 5 Cumulative distribution function F(t)

Warwick Manufacturing Group
Introduction to Reliability Engineering Page 7

In reliability engineering we are concerned with the probability that an item will survive for a
stated interval of time (or cycles or distance etc.) i.e. there is no failure in the interval (0 to t).
This is known as the survival function and is given by R(t). From the definition:

R(t) = 1 − F (t) = ∞ f (t)dt = 1 − ∫ ∫ f (t)dt

t −∞

1.2
1
(%)

0.8
0.6
R(t)

0.4
Frequenc

0.2
y

0
1 1 2 3 3 4 5t5 6 7 7 9

Value

Figure 6 Survival function R(t)

Another important function in reliability is the hazard function h(t); it is the conditional
probability of failure in the interval t to (t+dt), given that no failure has occurred by t:

f (t ) f (t )
h(t) = R(t ) = 1− F (t )
The bath tub curve, shown in figure 3, shows 3 different hazard functions, one decreasing,
one constant and one increasing hazard function.

2.3.2 Particular life distributions

There are 3 continuous life distributions that are commonly used in reliability engineering;
these are called, the exponential, Weibull and lognormal distributions. The normal distribution
as discussed in both the Six Sigma and SPC lectures is not generally used in reliability
engineering (although it is sometimes used).
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 8

The Exponential Distribution

When an item is subject to failures that occur in random intervals and the expected number of
failures is the same for long periods of time then the distribution of failures is said to fit an
exponential distribution. The PDF, CDF and survival function is given as:
f (t) = λe−λt and F (t) = 1 − e−λt and R(t) = e−λt

The hazard function for the exponential distribution is given as:

λe− λt
h(t) = − λt =λ
e
Notice that the hazard function is not a function of time and is in fact a constant equal to λ..

For repaired items, λ, is called the failure rate and 1/λ is called the mean time between failures
(MTBF), sometimes denoted as θ. An important point to note is that 63.2% of items will have
failed by time t=θ.
The failure rate can be calculated as the total number of failures divided by the total operating
time.
The exponential distribution is the most commonly used distribution in reliability engineering
and models the useful life portion of the bath-tub curve.

The Weibull Distribution

This distribution takes account of a non-constant hazard function. The Survival function is
given by:
t β
−( )
R(t) = e η

where β is the shape parameter and η is the scale parameter or characteristic life. The
characteristic life is the life at which 63.2% of the population will have failed.
When β = 1, the hazard function is constant and therefore the data can be modelled by an
exponential distribution with η=1/λ .
When β<1, we get a decreasing hazard function and

When β>1, we get a increasing hazard function

Figure 7, below, shows the Weibull shape parameters superimposed on the bath-tub curve.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 9

function
Useful Life
Hazar

β=1
d

β<1 β>1

Time
Infant Wear Out
Mortality
Figure 7 bath tub curve and the weibull distribution

When β>3.5, the Weibull distribution is an approximation for the Normal distribution.

There is also a three parameter version of the Weibull distribution and this is called the
location parameter. It is sometimes called the failure free time or the minimum life.
Other notation often used with the Weibull distribution is the B n-life, this is the time by which
n% of the population can be expected to fail. (n is the proportion failing).

The Lognormal Distribution

The lognormal distribution is more versatile than the normal distribution and is a better fit to
reliability data, such as for populations with wear-out characteristics. Also, it does not have
the Normal’s disadvantage of extending below zero i.e. always positive. The lognormal
distribution and the normal distribution describe situations when the hazard function is
increasing.

2.4 Modelling system reliability

Usually multiple components make up a systems and we often want to know the reliability of
a system that uses more than one component. How the components are connected together
determines what type of system reliability model is used.
There are different types of system reliability models and theses are typically used to analyse
items such as an aircraft completing its flight successfully.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 10

2.4.1 Series systems

Simplest reliability model is a serial model where all the components must be working for the
system to be successful, for example:

A B Z
Figure 8 example of components connected in series

The reliability of this model is calculated by:

RS = RA * RB ….RZ

If the components can be assumed to be exponentially distributed then the system reliability
can be calculated as:

RS = e−λ t A * e
−λ t
B *....* e
−λ
Z
t

The Failure rate of the system is calculated as by adding the failure rates together, i.e.

λS = λA + λB + .......... + λZ
2.4.2 Active redundancy

One of the most common forms of redundancy is the parallel reliability model where two
independent items are operating but the system can successfully operate as long as one of
them is working, diagrammatically:
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 11

B
Figure 9 dual redundant system

The reliability of the system is equal to the probability of item 1 or item 2 surviving, e.g.

RS = RA + RB – (RA* RB)

And for the constant hazard function case:

RS = e−λ t A + e
−λ t
B − e
−(λ
A
+λ
B
)t

This example assumes each item is not repaired after failure i.e. non-maintained system.

2.4.3 M-out-of-N redundancy

In some active parallel redundant configurations, m out of the n items may be required to be
working for the system to function. The reliability of an m-out-of-n system, with n identical
independent items is given by:

m −1
R = 1 − ∑ n Ci Ri (1 − R)n
−i
i =0

There are other system reliability models including the standby redundancy situation but those
above are the simplest.

2.5 Reliability prediction

One of the most common methods for predicting the reliability of a system is called the parts
count method.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 12

Assuming all the parts in a system are independently exponentially distributed, i.e. one part
does not cause the other to fail then the overall system failure rate can be calculated using the
series system model shown above. For example, the failure rate of a printed circuit board is
the sum of the failure rates of each of the components.
For example:

Component type Quantity Failure rate Quantity*failure rate

-6 -6
Ceramic capacitor 30 0.00001 * 10 0.0003 * 10
-6 -6
Tantalum capacitor 10 0.0003 * 10 0.003 * 10
-6 -6
Carbon resistor 30 0.00001 * 10 0.0003 * 10
-6 -6
Diodes 10 0.0002 * 10 0.002 * 10
-6 -6
Transistors 15 0.0005 * 10 0.0075 * 10
-6 -6
Logic IC 20 0.001 * 10 0.020 * 10
-6
PCB failure rate = 0.035800 * 10

The failure rates for components can be estimated from company in-service databases or can
be attained from published handbooks and published data.

KEY POINTS

• PDF, CDF, Reliability function and hazard function

• Bath-tub curve – infant mortality, useful life and wear-out

• Exponential distribution most widely used – constant hazard function

• Weibull with shape parameter, can model decreasing and increasing hazard function.
rd
When Beta =1 is equal to exponential. Characteristic life is the 63 percentile
• Series systems modelling used for estimating system reliability by using parts count
method

3 Design for reliability

The objective of design for reliability is to design a given product that meets its requirements
under the specified environmental conditions. To achieve this good sound engineering design
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 13

rules should be followed. However there are a few general principles that should observed,
these include:
• Component selection – well-established and known components should be used
(company usually have their own approved components list). If this is not he case then
analysis must be done to check the component is fit for purpose.
• Consider the load-strength relationship and ensure there is an adequate safety margin.

• Minimum complexity

• Diversity – avoids common mode failures

• Analyse failure modes and their effects (FMEA)

• Identify any single point failures and either mitigate or design them out.

• Use lessons learned from previous products to design out any known

weaknesses. Ultimately the aim is to maximise reliability during service life by:

• Measurement & control of manufacturing quality / screening

• Optimized design & build process to improve intrinsic reliability

• Assure no systematic faults present in product

• Provide sufficient margin to meet life requirements

3.1 Product life cycle

Each product has a life cycle, figure 10 illustrates a generic product life cycle. There are a
number of tools and techniques that are most useful at various stages of the product life cycle.
For example, at the design stage, it is most appropriate to use techniques that will be useful for
design reviews. Testing parts for fitness of purpose using accelerated life testing is also
necessary at this stage. When the product has been built it becomes costly to change the
design so all design reviews need to be done as early as possible in the product life cycle.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 14

Design
FMECA, FTA, PoF,RBD
FE,accelerated life test

Development
Development Test
Use
Field data analysis
FRACAS

Test
Manufacture
ESS, Burn-in
SPC

Figure 10: Product life cycle

Development testing is used to investigate the robustness of the product and to identify any
design weaknesses with respect to the load. Development testing incorporates environmental
testing and is used for fitness of purpose of the product.
When the product has been developed, the design closed and ready for production, statistical
process control and other quality engineering tools are imperative for ensuring a good quality
product.
Environmental stress screening or burn-in is sometimes used to test all manufactured units
prior to release to the customer. The purpose of ESS is to identify any manufacturing
weaknesses in individual items.
When in-service, product performance data should be collected to check the product
reliability and also to feed forward to new product design in the form of lessons learned.
More discussion on some of these tools and techniques is given in later sections.

3.2 Reliability tools and techniques

3.2.1 Introduction

Some of the tools that are useful during the design stage can be thought of as tools for fault
avoidance. The fall into two general methods, bottom-up and top-down.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 15

3.2.2 Top-down method

• Undesirable single event or system success at the highest level of interest (the top event)
should be defined.
• Contributory causes of that event at all levels are then identified and analysed.

• Start at highest level of interest to successively lower levels

• Event-oriented method

• Useful during the early conceptual phase of system design

• Used for evaluating multiple failures including sequentially related failures and common-
cause events
Some examples of top-down methods include: Fault tree analysis (FTA); Reliability block
diagram (RBD) and Markov analysis
Fault tree analysis

Fault tree analysis is a systematic way of identifying all possible faults that could lead to
system fail-danger failure. The FTA provides a concise description of the various
combinations of possible occurrences within the system that can result in predetermined
critical output events. The FTA helps identify and evaluate critical components, fault paths,
and possible errors. It is both a reliability and safety engineering task, and it is a critical data
item that is submitted to the customer for their approval and their use in their higher-level
FTA and safety analysis. The key elements of a FTA include:
– Gates represent the outcome

– Events represent input to the gates

– Cut sets are groups of events that would cause a system to fail

The following diagram shows the flowchart symbols that are used in fault tree analysis in
order to aid with the correct reading of the fault tree.
FTA can be done qualitatively by drawing the tree and identifying all the basic events.
However to identify the probability of the top event then probabilities or reliability figures
must be input for the basic events. Using logic the probabilities are worked up to given a
probability that the top event will occur. Often the data from an FMEA are used in
conjunction with an FTA.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 16

A rectangle signifies a fault or undesired event caused by one or more

preceding causes acting through logic gates

Circle signifies a primary failure or basic fault that requires no further development

Diamond denotes a secondary failure or undesired event but not developed further

And gate denotes that a failure will occur if all inputs fail (parallel redundancy)

Or gate denotes a failure will occur if any input fails (series reliability)

Transfer event

• FTA is used to:

– investigate potential faults;

– its modes and causes;

– and to quantify their contribution to system unreliability in the course of product

design .

Reliability block diagram

The RBD is discussed and shown in section 2.4 above. It is however among the first tasks to
be completed. It model system success and gives results for the total system. As shown in
section 2.4, it deals with different system configuration, including, parallel, redundant,
standby and alternative functional paths. It doesn’t provide any fault analysis and uses
probabilistic measures to calculate system reliability.

3.2.3 Bottom-up method

• Identify fault modes at the component level

Warwick Manufacturing Group
Introduction to Reliability Engineering Page 17

• For each fault mode the corresponding effect on performance is deduced for the next
higher system level
• The resulting fault effect becomes the fault mode at the next higher system level, and
so on
• Successive iterations result in the eventual identification of the fault effects at all
functional levels up to the system level.

• Rigorous in identifying all single fault modes

• Initially may be qualitative

Some examples of bottom-up methods include: Event tree analysis (ETA); FMEA and Hazard
and operability study (HAZOP).
Event tree analysis

• considers a number of possible consequences of an initiating event or a system failure.

• may be combined with a fault tree

• used when it is essential to investigate all possible paths of consequent events their
sequence
• analysis can become very involved and complicated when analysing larger systems
Example:

A PA1 = 0.5 C1 Pc1 = 0.5 Car came to slow stop, no

damage to the car, other
property or injuries Pc1=0.5
C C2 Pc2 = 0.4 Car came to slow stop,
no damage to the wheel,
B
Pc2= 0.5*0.3*0.4=0.06
P = 0.3 C
Car collided with the
A B1 C3 Pc3 = 0.6 centre divider, damage to
the car and the divider
PA2 = 0.5 Pc3=0.5*0.3*0.6=0.09
C
CP = 0.2 Car ran off the road, damage
B 4 c4 to the car, driver injured
Pc4=0.5*0.7*0.2=0.07
PB2 = 0.7 C CP = 0.8 Collision with another vehicle,
A no property damage or injury
B property damage, no injury 5 c5 damage to both, both drivers
C damage to the car only, no other property damage injured
Pc5=0.5*0.7*0.8=0.28
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 18

Failure Modes and Effects Analysis (FMEA)

Failure mode and effect analysis (FMEA) is a bottom -up, qualitative dependability analysis
method, which is particularly suited to the study of material, component and equipment
failures and their effects on the next higher functional system level. Iterations of this step
(identification of single Failure modes and the evaluation of their effects on the next higher
system level) result in the eventual identification of all the system single failure modes.
FMEA lends itself to the analysis of systems of different technologies (electrical, mechanical,
hydraulic, software, etc.) with simple functional structures. FMECA extends the FMEA to
include criticality analysis by quantifying failure effects in terms of probability of occurrence
and the severity of any effects. The severity of effects is assessed by reference to a specified
scale.
FMEAs or FMECAs are generally done where a level of risk is anticipated in a program early
in product or process development. Factors that may be considered are new technology, new
processes, new designs, or changes in the environment, loads, or regulations. FMEAs or
FMECAs can be done on components or systems that make up products, processes, or
manufacturing equipment. They can also be done on software systems.

The FMEA or FMECA, analysis generally follows the following steps:

• Identification of how the component of system should perform;
• Identification of potential failure modes, effects, and causes;
• Identification of risk related to failure modes and effects;
• Identification of recommended actions to eliminate or reduce the risk;
• Follow-up actions to close out the recommended actions.

Benefits include:
• Identifies systematically the cause and effect relationships.
• Gives an initial indication of those failure modes that are likely to be critical,
especially single failures that may propagate.
• Identifies outcomes arising from specific causes or initiating events that are believed
to be important.
• Provides a framework for identification of measures to mitigate risk.
• Useful in the preliminary analysis of new or untried systems or processes.

Limitations include:
• The output data may be large even for relatively simple systems.
• May become complicated and unmanageable unless there is a fairly direct (or "single-
chain") relationship between cause and effect may not easily deal with time sequences,
restoration processes, environmental conditions, maintenance aspects, etc.
• Prioritising mode criticality is complicated by competing factors involved
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 19

Physics of Failure (PofF).

In simple terms, Physics-of-Failure is an understanding of the physical properties of the
materials, processes, and technologies used in the design and how those properties can interact
with the life hazard conditions placed on the design during the product’s full life cycle. The
reliability engineer must understand the customer’s use and misuse conditions and
component/ environment interactions to assist the design team in working around limitations
inherent in the selected design approach.

This type of analysis will answer most of the why, where, when, and how questions about the
life of a component. Analysis methods prove essential in understanding, determining, and
applying appropriate corrective action for root cause of failure. Understanding the root cause
of a failure is essential in today’s highly competitive market for successfully manufacturing
quality components.

3.3 Reliability testing

3.3.1 Accelerated life testing

The concept of accelerated testing is to compress time and accelerate the failure mechanisms
in a reasonable test period so that product reliability can be assessed. The only way to
accelerate time is to stress potential failure modes. These include electrical and mechanical
failures. Failure occurs when the stress exceeds the product’s strength. In a product’s
population, the strength is generally distributed and usually degrades over time. Applying
stress simply simulates aging. Increasing stress increases the unreliability and improves the
chances for failure occurring in a shorter period of time. This also means that a smaller sample
population of devices can be tested with an increased probability of finding failure. Stress
testing amplifies unreliability so failure can be detected sooner. Accelerated life tests are also
used extensively to help make predictions. Predictions can be limited when testing small
sample sizes. Predictions can be erroneously based on the assumption that life-test results are
representative of the entire population. Therefore, it can be difficult to design an efficient
experiment that yields enough failures so that the measures of uncertainty in the predictions
are not too large. Stresses can also be unrealistic. Fortunately, it is generally rare for an
increased stress to cause anomalous failures, especially if common sense guidelines are
observed.

Anomalous testing failures can occur when testing pushes the limits of the material out of the
region of the intended design capability. The natural question to ask is: What should the
guidelines be for designing proper accelerated tests and evaluating failures? The answer is:
Judgment is required by management and engineering staff to make the correct decisions in
this regard. To aid such decisions, the following guidelines are provided:

1. Always refer to the literature to see what has been done in the area of accelerated
testing.

2. Avoid accelerated stresses that cause “nonlinearities,” unless such stresses are
plausible in product-use conditions. Anomalous failures occur when accelerated stress
causes “nonlinearities” in the product. For example, material changing phases from
solid to liquid, as in a chemical “nonlinear” phase transition (e.g., solder melting,
inter-metallic changes, etc.); an electric spark in a material is an electrical
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 20

nonlinearity; material breakage compared to material flexing is a mechanical

nonlinearity.

3. Tests can be designed in two ways: by avoiding high stresses or by allowing them,
which may or may not cause nonlinear stresses. In the latter test design, a concurrent
engineering design team reviews all failures and decides if a failure is anomalous or
not. Then a decision is made whether or not to fix the problem. Conservative decisions
may result in fixing some anomalous failures. This is not a concern when time and
money permit fixing all problems. The problem occurs when normal failures are
labeled incorrectly as anomalous and no corrective action is taken.

Accelerated life testing is normally done early in the design process as a method for testing
for fit for purpose. It can be done at the component level or the sub -assembly level but is
rarely done at a system level as there are usually too many parts and factors that can cause
failures and these can be difficult to control and monitor.

Step-Stress Testing is an alternative test; it usually involves a small sample of devices

exposed to a series of successively higher and higher steps of stress. At the end of each stress
level, measurements are made to assess the results to the device. The measurements could be
simply to assess if a catastrophic failure has occurred or to measure the resulting parameter
shift due to the step’s stress. Constant time periods are commonly used for each step-stress
period. This provides for simpler data analysis. There are a number of reasons for performing
a step-stress test, including:

• Aging information can be obtained in a relatively short period of time. Common step-
stress tests take about 1 to 2 weeks, depending on the objective.
• Step-stress tests establish a baseline for future tests. For example, if a process
changes, quick comparisons can be made between the old process and the new
process. Accuracy can be enhanced when parametric change can be used as a measure
for comparison. Otherwise, catastrophic information is used.
• Failure mechanisms and design weaknesses can be identified along with material
limitations. Failure-mode information can provide opportunities for reliability growth.
Fixes can then be put back on test and compared to previous test results to assess fix
effectiveness.
• Data analysis can provide accurate information on the stress distribution in which the
median-failure stress and stress standard deviation can be obtained.

3.3.2 Reliability enhancement testing or HALT

The goal of Reliability enhancement testing (RET) is to identify any potential failure modes
that are inherent in a design early in the design process. Identifying the root cause of the
failure mode and then incorporating a fix to the design can achieve reliability growth. This is
accomplished by designing out the possibility of potential failure modes occurring with the
customer and reducing the inherent risk associated with new product development. RET at the
unit or subassembly level utilizes step-stress testing as its primary test method. It should be
noted that Highly Accelerated Life Testing (HALT) is not meant to be a simulation of the real
world but a rapid way to stimulate failure modes. These methods commonly employ
sequential testing, such as step-stressing the units with temperature and then vibration. These
two stresses can be combined so that temperature and vibration are applied simultaneously.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 21

This speeds up testing, and if an interactive vibration/temperature failure mode is present, this
combined testing may be the only way to find it. Other stresses used may be power step-
stress, power cycling, package preconditioning with infrared (IR) reflow, electrostatic-
discharge (ESD) simulation, and so forth. The choice depends on the intended type of unit
under test and the unit’s potential failure modes.

HALT is primarily for assemblies and subassemblies. The HALT test method utilizes a HALT
chamber. Today, these multi -stress environmental systems are produced by a large number of
suppliers. The chamber is unique and can perform both temperature and vibration step-stress
testing.

3.3.3 Demonstration testing

Demonstration of reliability may be required as part of a development and production

contract, or prior to release to production, to ensure that the requirements have been met. Two
basic forms of reliability measurement are used:
1. A sample of units may be subjected to a formal reliability test, with conditions
specified in detail.
2. Reliability may be monitored during development and use.
The first method has been shown to be problematic and subject to sever limitations and
practical problems. The limitations include:
• PRST (Probability ratio sequential test) assumes a constant hazard function;
• It implies that MTBF is an inherent parameter of a system;
• Extremely costly
• It is an acceptance test
• Objective is to have no or very few failures
It has been shown that a well managed reliability growth programme as discussed earlier
would avoid the need for demonstration testing as they concentrate on how to improve
products. It is has also been argued that the benefit to the product in terms of improved
reliability is sometimes questionable having used PRST methods.

3.3.4 Environmental Stress Screening

If all processes were under complete control, product screening or monitoring would be
unnecessary. If products were perfect, there would be no field returns or infant mortality
problems, and customers would be satisfied with product reliability and quality. However, in
the real world, unacceptable process and material variations exist. Product flaws need to be
anticipated before customers receive final products and use them. This is the primary reason
that a good screening and monitoring program is needed to provide high quality products.
Screening and monitoring programs are a major factor in achieving customer satisfaction.
Parts are screened in the early production stage until the process is under control and any
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 22

material problems have been resolved. Once this occurs, a monitoring program can ensure that
the process has not changed and that any deviations have been stabilized. Here, the term
“screening” implies 100% product testing while “monitoring” indicates a sample test. Screens
are based upon a product’s potential failure modes. Screening may be simple, such as on-off
cycling of the unit, or it may be more involved, requiring one or more powered environmental
stress screens. Usually, screens that power up the unit, compared with non-powered screens,
provide the best opportunity to precipitate failure-mode problems. Screens are constantly
reviewed and may be modified based on screening yield results. For example, if field returns
are low and the screen yields are high (near 100 percent), the screen should be changed to find
all the field issues. If yields are high with acceptable part per million (PPM) field returns, then
a monitoring program will replace the screen. In general, monitoring is preferred for low-
cost/high-volume jobs. A major caution for selecting the correct screening program is to
ensure that the process of screening out early life failures does not remove too much of a
product’s useful life. Manufacturers have noted that, in the attempt to drive out early life
failure, the useful life of some products can become reduced. If this occurs, customers will
find wear-out failure mechanisms during early field use.

The information obtained when a product is first introduced to the Development Phase in the
HALT process enables the development of a HASS test. HALT, is a highly accelerated
reliability growth Test-Analyze-And-Fix (TAAF) process. In HASS, failures are analyzed,
and corrective actions are implemented. The test is repeated until the observed failure modes
have been fixed, and the environmental technology limits of the part are understood. This
information is used for the Production Phase. At this stage, one either develops a traditional
ESS or a HASS test. HASS test combines thermal cycling, vibration, and power stress
simultaneously. The testing range is within the operating limits that are known from prior
HALT testing performed. Similar to HALT, HASS is an aggressive screening program to help
weed out failure modes and implement corrective actions as soon as possible. This process
enables products to be moved quickly into a monitoring program. The HASS process typically
helps to reduce screen time (30 percent to 80 percent) and move a product more quickly into
the Monitoring Phase. For example, a common screen uses 168-hour burn-in, 20-hour thermal
shock, and a 60-minute vibration test. Since this is a fairly lengthy screen, it is advantageous
to work with a HASS program. In the HASS process, this test is quickly reduced to a
monitoring program. Since faster test results help in implementing product improvements and
moving to a monitoring test, cost savings can be passed onto the customer. If HASS
precipitates a failure, an immediate failure analysis is performed to determine the root cause.
A 100 percent screen is maintained until the process is in control. At that point, monitoring
can be performed. The monitoring also includes a HALT at given intervals to ensure that the
product safety margins have not deteriorated from those obtained in the Evaluation Phase.
Thus, knowledge of the HALT and HASS environmental limits, relative to a customer
specification, is very helpful in providing engineering confidence in the proper design of the
screening and the subsequent monitoring test. Such sound practices are important for
providing a highly reliable product.

3.3.5 Reliability Growth/Enhancement h Planning

Traditionally, the need for Reliability Growth planning has been for large subsystems or
systems. This is simply because of the greater risk in new product development at that level
compared to the component level. Also, in programs where one wishes to push mature
products or complex systems to new reliability milestones, inadequate strategies will be
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 23

costly. A program manager must know if Reliability Growth can be achieved under required
time and cost constraints. A plan of attack is required for each major subsystem so that
system-level reliability goals can be met. However Reliability Growth planning is
recommended for all new “platforms,” whether they are complex subsystems or simple
components. In a commercial environment with numerous product types, the emphasis must
be on platforms rather than products. Often there may be little time to validate, let alone
assess, reliability. Yet, without some method of assessment, platforms could be jeopardized.
Accelerated testing is, without question, the featured Reliability Growth tool for industry. It is
important to devise reliability planning during development that incorporates the most time-
and cost effective testing techniques available.

Reliability growth can occur at the design and development stage of a project but most of the
growth should occur in the first accelerated testing stage, early in design. Generally, there are
two basic kinds of Reliability Growth test methods used: constant stress testing and step-stress
testing. Constant stress testing applies to an elevated stress maintained at a particular level
over time, such as isothermal aging, in which parts are subjected to the same temperature for
the entire test (similar to a burn-in). Step-stress testing can apply to such stresses as
temperature, shock, vibration, and Highly Accelerated Life Test (HALT). These tests
stimulate potential failure modes, and Reliability Growth occurs when failure modes are
fixed. No matter what the method, Reliability Growth planning is essential to avoid wasting
time and money when accelerated testing is attempted without an organized program plan.

Today’s competitive market requires thorough planning, especially since platform complexity
has increased dramatically as competition and technological advances have driven down size
and costs. Traditional methodology provides us with many valuable tools, such as test
planning, growth tracking and assessment, fix-effectiveness factor estimation, corrective
action review team operations, and Test-Analyze-And-Fix (TAAF) strategies. All methods
use a FRACAS type approach to audit corrective actions.

There are numerous types of accelerated tests including HALT, Step-Stress, Highly
Accelerated Stress Screening (HASS), Environmental Stress Screening (ESS), failure-free
accelerated testing, Reliability Growth and accelerated reliability growth. These practices are
all important, since each has been used today in both commercial and industrial applications
for ensuring product reliability. The methods have not been without confusion. Confusion
exists as to when and which test method should be used and the Reliability Growth that can be
achieved with any one method. The methods should be to integrating and implemented
throughout the product development cycle using Table 1 summarizes the approach and how
these tests fit into the product life cycle.

Accelerated test or Stage of product Definitions and uses

methods life cycle
Reliability Growth or Design and Reliability Growth is the positive
Reliability Enhancement development improvement in a reliability parameter
over a period of time due to changes in
product design or the manufacturing
process. A Reliability Growth program is
commonly established to help
systematically plan for reliability
achievement over a program’s duration so
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 24

that resources and reliability risks can be

managed.
Understanding Customer Proposal and Understanding Customer Requirements is
Requirements concept design a common sense topic that is often
overlooked. It can be a simple exercise or
include a full approach. In a full approach,
tools such as FMEA, competitive
Benchmarking, QFD and Reliability
Predictive Modeling are used to assure the
smartest approach in a product maturation
program.
HALT (Highly Design and HALT is a type of step-stress test that
Accelerated Life Test) development often combines two stresses, such as
temperature and vibration. This highly
accelerated stress test is used for finding
failure modes as fast as possible and
assessing product risks. Frequently it
exceeds the equipment-specified limits.
Step-Stress Test Design and Exposing small samples of product to a
development of units series of successively higher “steps” of a
or components stress (like temperature), with a
measurement of failures after each step.
This test is used to find failures in a short
period of time and to perform risk studies.

Failure-Free Test or Post design This is also termed zero failure testing.
demonstration test This is a statistically significant reliability
test used to demonstrate that a particular
reliability objective can be met at a certain
level of confidence. For example, the
reliability objective may be 1000 FITs (1
million hours MTTF) at the 90 percent
confidence level. The most efficient
statistical sample size is calculated when
no failures are expected during the test
period. Hence the name.
ESS (Environmental production This is an environmental screening test or
Stress Screening) tests used in production to weed out latent
and infant mortality failures.

HASS (Highly production This is a screening test or tests used in

Accelerated production to weed out infant mortality
Stress Screen) failures. This is an aggressive test since it
implements stresses that are higher than
common ESS screens. When aggressive
levels are used, the screening should be
established in HALT testing.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 25

The basic methodology in Reliability Growth planning should minimally consist of

the following
• The design of appropriate accelerated tests to stress expected or unexpected failure
modes of the subsystems. These tests should be chosen and designed to stimulate
failures at a faster rate. An effective program will include a streamlined root-cause
corrective action plan. Without a complete plan, accelerated testing will be wasted
and Reliability Growth opportunities lost.
• The correct use of historical acceleration factors and Reliability Growth parameters.
This will enable estimates of accelerated Reliability Growth over the program’s
testing phases to be generated.
• The accurate tracking and assessment of Reliability Growth during and after each test
phase and corrective action. This helps with correct assessment techniques for fix
effectiveness to properly evaluate compliance of growth goals/milestones. Planning
Reliability Growth is not enough; periodic assessments of reliability are also essential
so that management is assured that their achievement of the planned Reliability
Growth goals is realistic.
• Good monitoring and management of corrective actions using FRACAS

3.4 FRACAS

A Failure Reporting and corrective action system (FRACAS) is closed-loop coordinated

system that is used to manage failures throughout the product life cycle. It records
information about failures of a product and when and where they occurred but it also enforces
corrective action details are documented.
It is used from the beginning of a project until it is withdrawn from service. All personnel use
the system and are responsible for any FRACAS they own.
There will be one group who will have overall control of the FRACAS administration and
their function would be to ensure that all FRACAS raised are acted upon and closed out and
also to collect lessons learned and feed forward as appropriate.
For example: an engineer is working on the reliability growth programme and has
encountered a failure of the product during the step-stress test of the unit. The engineer should
raise a FRACAS and state the problem as well as give details of the unit number, date of
failure, what is was doing when it failed, reference to the test conditions. This FRACAS
would then be passed to an engineer to diagnose the fault and ultimately find a fix. The unit
can be fixed and the testing resumed. However the FRACAS is not closed until the fix has
been documented and implemented in the design and ultimately drawings (assuming the fault
is due to design). The owner of this FRACAS is responsible for seeing the problem through
and an engineering review of FRACAS would show all outstanding open FRACAS. This
ensures that problems are getting solved rather than stacked up.

3.5 In-service data analysis

Data can be collected from the field and analysed for a number of reasons. These broadly
include checking that the product service reliability is meeting requirements; looking at the
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 26

service performance and identifying any systematic faults that can be fixed in this and future
products.
Exploratory Data Analysis can be used on such data to look for trends or systematic failures.
To analyse such data the data recorded should include the following basic information:
• Date of failure
• Reason for removal from field
• Serial number of product removed
• Product type
• Customer
• Length of time operating prior to failure
• Date entered service
After diagnosis of the product the following information is required:
• Root cause of failure
• Repair information – what was removed and repaired and when
• Diagnostic information
• Category of failure i.e. was is a design failure or a component failure or a
manufacturing failure or a misuse failure or a diagnostic failure or was is working and
no fault found.
• When was it shipped out after repair

This type of information allows the product support organisation within a company to analyse
the data to identify where the majority of failures are occurring.
For example:
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 27
Find ‘Root’ Cause Determine ‘Real’ Cause
Agree Categorisation
of N.F.F. Removals of all Unit Removals of Confirmed Removals

External Fault
No Fault Found

Confirmed Faulty Design

Build
Confirmed Not Faulty Supplier
Test
Understand Why
Fault Isolation ‘Fails’

Troubleshooting
System Design
Fault Isolation Manual

Figure 11: Example of root cause analysis

By finding out that most confirmed failures are due to build problems the data can then be
filtered and analysed to explore what types of build failures are occurring and eventually to
investigate ways of reducing such failures.

3.6 Risk analysis

Risk management applies to all new product development. Common technical risk areas
include performance, producibility, production, scheduling and resources. Risk varies
depending on whether customer requirements match technology performance capability
predictions, if field experience is available on analogous assemblies, if the technology is
revolutionary or evolutionary, if the application is new, if the intended use environment is
harsh and different from previous field experience, and so forth. Risks are often assessed in
categories. A technology management risk matrix is often used in industry as shown below:

Evolutionary Revolutionary
Same application Low Risk High Risk
New Application Moderate Risk Very High Risk

Revolutionary technologies carry a higher risk. For example, when the first airplanes were
developed in the early 1900s, flying these early machines often resulted in injury or death.
Now that flying is a mature technology, the risks of flying are very low. Evolutionary changes
to the aircraft having similar applications today carry low risks since the technology is mature.

The goal of a risk management program is to make correct decisions at key points in the
program. Technology risk management is essential to the success of any development
program. Risk issues and their consequences concern everyone involved with a program’s
success. The larger and the more undeveloped a technical program, the more important it is to
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 28

manage risks. In the case of a reasonably large and/or complex program, many technical
details can impact the system. The purpose of risk management is to identify, assess and
mitigate risks throughout the project.

Since component and subsystem risks are magnified at the system level, it is important that
program management becomes aware of issues early in the program. All potential risk areas
require identification and risk handling. Management can then direct resources to prioritized
risk areas and conserve valuable time and expenses. These benefits are best realized when
technical risk issues can be properly identified, assessed, quantified, and finally handled both
at the system and the subsystem level.

3.7 REMM

The REMM project developed a methodology that supports product reliability enhancement
and consequently is viewed primarily as a design for reliability tool although it could be used
by management and product support to view the reliability status of products. It is viewed as a
move away from ‘playing the numbers game’ using reliability predictions towards a more
considered approach. Its purpose is to encourage design for reliability by providing engineers
with reliability information and lessons learned on previous designs as well as providing a
proactive holistic approach to design
Current reliability prediction techniques for electronics have been shown to be
unrepresentative of real situations. There is a dichotomy between the way reliability is
assessed (predicted) and the way it is actually achieved.
Many aerospace companies continue to use MIL Handbook 217 for reliability predictions.
Usually the predictions are made on a parts count basis and then the overall predicted failure
rate is factored using in-service performance of previous products. So if a prediction had been
done on a product that is now in the field then a multiplicative factor can be obtained by
comparing the in-service reliability and that prediction. Such factors are therefore used in
predictions for new designs.
The problem with this approach is that the product reliability is not really considered. The
parts count prediction is based on the number of components of each type and their
corresponding failure rates and as is widely known this is all based on the assumption of
constant failure rates. In addition there is a temptation when given an MTBF requirement to
use the prediction to show that it will be met without doing any engineering analysis. In other
words the prediction is often altered rather than the design or manufacture of the product.
The R&M case was developed by the UK MOD as a move away from prescribing reliability
methods and tools towards a more measured approach to achieving reliable products. The
definition is given as “A reasoned, auditable argument created to support the contention that a
defined system satisfies the R&M requirement and is therefore concerned with providing
progressive assurance that the product will be reliable. The impetus for this move towards the
R&M case is partly driven by experience. In the past the MOD, in their contracts, prescribed
specific reliability techniques and tools to be performed. They believed that using such tools
and techniques would produce the required reliable products. However, in many cases, the
opposite occurred. Payments were made to suppliers for producing the required documents
however no evidence was necessary to show that any of the tools and techniques applied
actually affected the product design and development. In other words the focus was on
producing documents rather than showing that the product would meet the necessary
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 29

reliability requirements in service. This situation is similar to the discussion on prediction

outlined earlier and highlights that there is a synergy between the aims of REMM and the
R&M case.

REMM Process
The REMM process was developed by the project team based on the philosophy of
evolutionary design of products i.e. any new product design can be based on an existing
product design. To design a new product the REMM philosophy urges the user to consider the
differences between the new product design and previous product designs. These differences
would include functionality, process (design and manufacture) and environmental changes to
enable the project team to concentrate on the risky aspects of the new product design and
development. Figure 11 below shows the simplified REMM process flow. As in DEF STAN
00-42 part3 the REMM process starts with analysing reliability requirements and then moves
on to identifying the reliability risks. The risks are identified in REMM by identifying the
novel aspects of the product design, manufacture, application and use.
Having captured requirements a similar product is identified and its associated data is
analysed in order to inform the project team of any reliability issues. The new design is
therefore altered in light of this data analysis. For example, if a particular component used in a
previous product and likely to be used in the new product is shown from analysis to be
causing reliability problems in the field then by highlighting such issues the project team can
make informed decisions regarding the new product design.

Capture requirements

Review lessons learned

Select similar equipment
through data analysis

Modify design

Service data
Capture differences between base and new design

Reliability plan generated via expert system

Reliability assessment model

Reliability tasks guidance

Reliability Case

Figure 12: Simplified REMM process

The REMM process therefore encourages the active use of lessons learned from previous
product manufacture and performance.
A REMM tool has been developed and implemented at Goodrich engine controls to
encompass the process shown in figure 12. The tool allows the user to design a new product
based on all previous products. Obviously there will be changes to designs over time for
numerous reasons, e.g. new technology, additional functionality, different installations, new
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 30

environment etc. These differences between the new design and previous designs are captured
by the tool and fed into the REMM expert system. Capturing these differences focuses the
project team on the risky aspects of this new product.

High Level Tasks - REMM Expert system

The REMM expert system has been implemented at Goodrich engine controls and is a
knowledge base expert system. It has been populated with rules that were written by
interviewing chief engineers across the different disciplines, i.e. electronic, mechanical,
components, process and reliability. The rules consist of all the possible changes structured
hierarchically. So for example, Environmental changes can be due to vibration, thermal,
humidity and shock changes. Looking at vibration changes, the input fact could be, for
example:
• Small change in level of vibration;
• Significant change in level of vibration;
• Small change in vibration frequency range;
• Significant change in vibration frequency range.

Rules for new product design aspects have also been generated and so the differences
captured include both actual differences and novel aspects.
The REMM tool is used at the concept stage to design the new product using information on
previous designs. One of the outputs of the tool is a task list generated from the expert system.
The tasks suggested may well be tasks that are already planned but it gives a starting point for
developing the reliability plan as it focuses on the high level risk areas in the new product
design. The task list can therefore be used to develop the reliability programme and plan

The tool developed at Goodrich establishes the skeleton of the product reliability case when
the expert system is run. The skeleton reliability case consists of the product description in
terms of functionality, installation, use, environment and technology, it lists the hardware and
provides a list of all the differences and novel aspects of the design. It also contains the task
list and provides references to guidance material or procedures related to specific tasks. When
a task has been completed the case is updated with a reference to a company report detailing
the outcome and solution. To close the loop the tool ought to be linked to the FRACAS
system to ensure that the solution found is implemented to improve the reliability of the
product.
At present the REMM tool is useful at the beginning of the project and really only identifies
fairly high level tasks but by implementing the statistical model the project team can identify
lower level, specific risks to update the skeleton reliability case document.

Low Level Tasks - REMM statistical model

The REMM statistical model aims to estimate reliability of a new design at specified times in
the product lifecycle to aid engineering understanding of performance and to help inform
downstream processes that will support enhancement. The model will support ‘what if’
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 31

analysis. This means that alternative scenarios can be investigated and the impact on
reliability estimated. Therefore the REMM statistical model provides a tracking system to
help analyse how reliability will and does evolve throughout the lifecycle by integrating the
numerical estimates with the engineering understanding of reliability. This is a deliberate
move to a proactive approach to design for reliability.
The REMM statistical model requires engineering concerns about potential faults in the item
of equipment to be identified and estimates when they are likely to occur as failures under
different scenarios. This model is therefore a Bayesian type statistical model.
Two basic types of input are required to populate the model.
• Engineering concerns with the new design and an estimate of their probability of
occurrence in service assuming no actions taken
• Reliability profile of the engineering concern or fault class to which it belongs

The data therefore comes from two sources – structured engineering judgement and historical
event data.
The basic output from the REMM statistical model will be the estimated reliability function,
also known as the survival function. This provides an estimate of the probability of surviving
until a specified time. Figure 13, below shows the model formulation as the reliability of the
new design is a combination of expert judgement and event data with respect to pre-defined
categories.

Reliability Estimate for New System Design

Failure Class Design Component Build

Expert
ND NC NB
Judgement

Event
RD(t) RC(t) RB(t)
Data

Figure 13: Model formulation

The engineering concerns data or expert judgement data is collected by conducting one-to-one
semi-structured interviews of members of the project team. Prior to the individual elicitation
interviews a group session is held to identify the changes in the new design as well as the
novel aspects and brief each expert on the process.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 32

In the individual interviews each expert was asked to concentrate on the new product and
consider the following questions:
• Do you have any concerns about any aspect of the product?
• What are they?
• What is the likelihood that your concerns will cause a failure in the field? This was
done by asking the designers and specialist experts to rate risks on a scale of 0-1.
• What mitigation action could be taken to avoid this concern occurring as a failure?

The resulting information is used to provide a list of concerns for the project team as well as
an input to the statistical model.
The concern list on its own can be used as a risk assessment and the project team can decide
what actions to take to reduce the risk and improve the reliability. The concern list can be
added to the skeleton reliability case and updated as mitigating actions are taken and concerns
resolved. This process is an iterative process and would be implemented at key stages in the
design and development process. The reliability case would therefore be a living document
that grows as more information is gathered.
Running the model provides more information about the reliability of the product at a specific
time in the product development process. The reliability estimate can be compared with the
reliability requirement. If the estimate is lower then closer inspection of the results can help to
identify the key tasks to improve the reliability. For example, in figure 13, assuming the
overall reliability estimate is unacceptable then the data for the category contributing most to
this estimate could be investigated further. In figure 14, looking at the data for the preload
concerns can help guide the project team towards mitigating actions that may improve the
reliability. ‘What if’ analysis can be done to identify those actions that would give the biggest
improvement in reliability and therefore providing guidance to the project team.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 33

Estimated F-Lynx

0.9 Cracking Crack (est)

0.8 Lubrication
0.7 Lub (est)
0.6 Corrosion
R e l i a b i l ity

0.5 Preload Corr (Est)

0.4 Pre (Est)

0.3
0.2 Overall Overall (Est)
0.1
0
0 500 1000 1500 2000
Operational Time

Figure 14: Example of output from REMM statistical model

All the analysis undertaken can be added to the reliability case to show how the product is
being progressively improved by identifying and subsequently reducing product risks.
Reliability Measurements
From the discussion above the REMM tool is used to help build the reliability programme
plan and the skeleton reliability case and is carried out at the concept design stage. The
statistical model is implemented at key points in the design and development process. The
concerns are updated throughout as more data is gathered.
The reliability estimates can therefore be plotted at each of the key stages to show how the
reliability estimate is changing throughout the development of the product and therefore how
close the estimate is to the requirement. Figure 15, below, shows an example of reliability
changing throughout the product life cycle.
Warwick Manufacturing Group
Introduction to Reliability Engineering Page 34

Updated

Required

Current
estimate

Lifecycle stage

Figure 15: example of reliability varying over the lifecycle

4 Reliability management

Key aspects of reliability management include:

• Corporate level involvement

• Integral part of product development not parallel

• Reliability procedures integrated into design process

• Built into programme plan and produce a reliability plan

• Ownership of the reliability plan within the design team

A reliability plan should contain the following:

• Statement of reliability requirement

• Organisation for reliability

• Reliability activities to be performed and why

• Timing of major activities

• Management of suppliers

• Standards and company procedures to be used

Warwick Manufacturing Group
Introduction to Reliability Engineering Page 35

• Lesson learned feedback

• Risk Analysis/risk register

• Data collection and analysis procedure

• Reliability monitoring plan

Reference

O’Connor (2002). Chapter 15, Reliability management, Practical Reliability Engineering,

John Wiley

5 Summary

These lecture notes provide information about why reliability is important, moreover they
give an overview of all the aspects of reliability engineering, including:
• Need for Reliability requirements;

• Need for planning to achieve reliability throughout the product life cycle;

• How to measure reliability;

• Other factors that are affected by poor reliability such as safety, competitiveness,
goodwill, maintenance costs and ultimately profit.
• Design for reliability;

• The bath-tub curve and Reliability (life distributions);

• Reliability methods – FTA,ETA,FMEA,RBD;

• Reliability testing – reliability growth planning, ESS, HALT, HASS, step-stress

testing;
• Need for good reliability management and planning;

• Managing risk;

• Using FRACAS;

• The importance of collecting an analysing in-service data.

This lecture is an overview of reliability engineering and therefore it gives an appreciation for
the topic. It is not expected to be a reliability-engineering manual but gives a flavour for the
importance of the topic and how it fits into the design, development and use of a product.
Warwick Manufacturing Group

Introduction To Reliability
100% (2)
Introduction To Reliability
37 pages
Section 14b Reliability Lecture Notes
100% (1)
Section 14b Reliability Lecture Notes
37 pages
Reliability Engineering Self Study Report
100% (1)
Reliability Engineering Self Study Report
16 pages
Reliability Engineering Insights
100% (1)
Reliability Engineering Insights
15 pages
Introduction to Reliability Engineering
100% (1)
Introduction to Reliability Engineering
125 pages
Introduction To Reliability Engineering - CERN 06.11
No ratings yet
Introduction To Reliability Engineering - CERN 06.11
37 pages
System Reliability & Maintainability Overview
100% (1)
System Reliability & Maintainability Overview
36 pages
RJS Relibility PPT-1
100% (1)
RJS Relibility PPT-1
61 pages
Introduction to Reliability Engineering
80% (5)
Introduction to Reliability Engineering
48 pages
Understanding Reliability Engineering Concepts
No ratings yet
Understanding Reliability Engineering Concepts
49 pages
Introduction To Reliability Engineering - CERN 06.11
No ratings yet
Introduction To Reliability Engineering - CERN 06.11
48 pages
Reliability and Maintainability Engineering
100% (1)
Reliability and Maintainability Engineering
60 pages
Reliability Engineering Course Overview
No ratings yet
Reliability Engineering Course Overview
27 pages
INDE 6336 Reliability Engineering: Instructor: Dr. Qianmei (May) Feng E217-D3, (713) 743-2870 Qmfeng@uh - Edu
No ratings yet
INDE 6336 Reliability Engineering: Instructor: Dr. Qianmei (May) Feng E217-D3, (713) 743-2870 Qmfeng@uh - Edu
27 pages
Benefits of Reliability Engineering
No ratings yet
Benefits of Reliability Engineering
13 pages
Introduction to Reliability Engineering
No ratings yet
Introduction to Reliability Engineering
52 pages
Reliability Engineering Introduction
No ratings yet
Reliability Engineering Introduction
37 pages
Understanding Reliability and Maintenance
No ratings yet
Understanding Reliability and Maintenance
81 pages
ME 424 - Lec 03 - 1 - Reliability
No ratings yet
ME 424 - Lec 03 - 1 - Reliability
48 pages
Reliability Engineering Overview Guide
No ratings yet
Reliability Engineering Overview Guide
21 pages
Reliability Engineering Concepts Overview
No ratings yet
Reliability Engineering Concepts Overview
15 pages
Reliability
100% (1)
Reliability
16 pages
Practical Reliability Engineering
No ratings yet
Practical Reliability Engineering
246 pages
UJ Module Lecture 1 Rev3
No ratings yet
UJ Module Lecture 1 Rev3
165 pages
Understanding Basic Reliability Concepts
No ratings yet
Understanding Basic Reliability Concepts
45 pages
01-Reliability Management
100% (1)
01-Reliability Management
120 pages
Reliability Engineering Overview
No ratings yet
Reliability Engineering Overview
9 pages
Chapter Seven
No ratings yet
Chapter Seven
34 pages
Introduction to Reliability Engineering
100% (1)
Introduction to Reliability Engineering
9 pages
Reliability Engineering Overview
No ratings yet
Reliability Engineering Overview
26 pages
Understanding Reliability in Engineering
100% (1)
Understanding Reliability in Engineering
15 pages
Eie512 02
No ratings yet
Eie512 02
45 pages
Introduction To Reliability
No ratings yet
Introduction To Reliability
26 pages
Reliability Engineering Overview and Analysis
No ratings yet
Reliability Engineering Overview and Analysis
19 pages
Reliability and Quality Engineering Course
No ratings yet
Reliability and Quality Engineering Course
61 pages
Understanding Reliability in Engineering
No ratings yet
Understanding Reliability in Engineering
27 pages
Evaluating Scribd's Reliability Factors
100% (1)
Evaluating Scribd's Reliability Factors
27 pages
Understanding Reliability in Engineering
No ratings yet
Understanding Reliability in Engineering
2 pages
Lesson 01 Quality Control and Reliability
No ratings yet
Lesson 01 Quality Control and Reliability
38 pages
Reliability Improvement Methods
No ratings yet
Reliability Improvement Methods
25 pages
Testing Methods for Reliability Engineering
No ratings yet
Testing Methods for Reliability Engineering
72 pages
Introduction To Reliability Engineering
No ratings yet
Introduction To Reliability Engineering
12 pages
Module 3
No ratings yet
Module 3
79 pages
CGE676 Lect. 1 PDF
No ratings yet
CGE676 Lect. 1 PDF
12 pages
Reliability Blueprint PDF
100% (3)
Reliability Blueprint PDF
30 pages
01 - Reliability Basics
100% (1)
01 - Reliability Basics
39 pages
2018 CRE Primer Chap 2
No ratings yet
2018 CRE Primer Chap 2
96 pages
Through The Design Process: Guide On Achieving Reliability
100% (1)
Through The Design Process: Guide On Achieving Reliability
19 pages
Understanding Reliability Engineering
100% (1)
Understanding Reliability Engineering
31 pages
Understanding Reliability Engineering
No ratings yet
Understanding Reliability Engineering
18 pages
Introduction to Reliability Theory
100% (2)
Introduction to Reliability Theory
19 pages
Reliability Engineering: Chapter Outline
No ratings yet
Reliability Engineering: Chapter Outline
14 pages
Understanding Reliability Engineering Concepts
No ratings yet
Understanding Reliability Engineering Concepts
31 pages
Reliability Management in Quality Control
No ratings yet
Reliability Management in Quality Control
15 pages
Reliability and Maintainability Insights
100% (2)
Reliability and Maintainability Insights
50 pages
Understanding Probability Density Functions
No ratings yet
Understanding Probability Density Functions
6 pages
Eterna Oil Ownership and Rainoil Insights
No ratings yet
Eterna Oil Ownership and Rainoil Insights
2 pages
Six Sigma Roles and Responsibilities
No ratings yet
Six Sigma Roles and Responsibilities
4 pages
List of Lube Blending Plants in Nigeria
100% (1)
List of Lube Blending Plants in Nigeria
4 pages
Capacity Analysis for Photocopy Centre
No ratings yet
Capacity Analysis for Photocopy Centre
34 pages
Questionnaire on Crisis Management Strategies
No ratings yet
Questionnaire on Crisis Management Strategies
6 pages
Risk Analysis Techniques for Business Decisions
No ratings yet
Risk Analysis Techniques for Business Decisions
31 pages
QUESTIONNAIRE
100% (2)
QUESTIONNAIRE
5 pages
Weibull Distribution 2
No ratings yet
Weibull Distribution 2
3 pages
Availability, Reliability, SIL
No ratings yet
Availability, Reliability, SIL
3 pages
Essential Guide to Planned Maintenance
No ratings yet
Essential Guide to Planned Maintenance
1 page
Storage Tank Design and Materials Guide
No ratings yet
Storage Tank Design and Materials Guide
1 page
Understanding Batch Production Systems
No ratings yet
Understanding Batch Production Systems
19 pages
Pareto Analysis of Equipment Failures
No ratings yet
Pareto Analysis of Equipment Failures
2 pages
Chi Square Test for Filling Machine Reliability
No ratings yet
Chi Square Test for Filling Machine Reliability
16 pages
Convert Equipment Downtime Logs for Reliability
No ratings yet
Convert Equipment Downtime Logs for Reliability
14 pages
Batch Production System Reliability
No ratings yet
Batch Production System Reliability
53 pages
Monthly Failure Data Analysis 2010-2015
No ratings yet
Monthly Failure Data Analysis 2010-2015
168 pages
Reliability 1
No ratings yet
Reliability 1
13 pages
Mem Construction 4
No ratings yet
Mem Construction 4
21 pages
MEM 6 Exams Questions
No ratings yet
MEM 6 Exams Questions
4 pages
Pareto Chart
No ratings yet
Pareto Chart
15 pages
Empirical System Reliability
No ratings yet
Empirical System Reliability
7 pages
What ATF To Use For Toyota
No ratings yet
What ATF To Use For Toyota
6 pages
Chevron Havoline Motor Oil Range
No ratings yet
Chevron Havoline Motor Oil Range
4 pages
Group II Base Oil 500N Data Sheet
No ratings yet
Group II Base Oil 500N Data Sheet
1 page
Chevron Capella HFC
No ratings yet
Chevron Capella HFC
1 page
Chevron Clarity Hydraulic Oil AW
No ratings yet
Chevron Clarity Hydraulic Oil AW
1 page
SDS - TX GST Advantage RO 32, 46
No ratings yet
SDS - TX GST Advantage RO 32, 46
9 pages
SDS - TX Rando HD 46
No ratings yet
SDS - TX Rando HD 46
9 pages
How To Test Software
100% (3)
How To Test Software
166 pages
MODERATION
No ratings yet
MODERATION
2 pages
DOE and Robust Parameter Design
No ratings yet
DOE and Robust Parameter Design
40 pages
Kafd - Msc.upm.21.7 (Cement)
No ratings yet
Kafd - Msc.upm.21.7 (Cement)
5 pages
Open Channel Flow Devices Guide
No ratings yet
Open Channel Flow Devices Guide
52 pages
World of Mathematics On Apollonius of Perga
No ratings yet
World of Mathematics On Apollonius of Perga
7 pages
Class 9 Maths New
No ratings yet
Class 9 Maths New
4 pages
Unit - 3 Notes Python
No ratings yet
Unit - 3 Notes Python
37 pages
Design and Optimization of Runner and Gating Systems For Permanent Mould Casting
No ratings yet
Design and Optimization of Runner and Gating Systems For Permanent Mould Casting
7 pages
Finance Students' Portfolio Quiz
No ratings yet
Finance Students' Portfolio Quiz
4 pages
Partial Orderings and Lattices
No ratings yet
Partial Orderings and Lattices
19 pages
NCERT Solutions For CBSE Class 8 Maths Chapter 10 Visualising Solid Shapes
No ratings yet
NCERT Solutions For CBSE Class 8 Maths Chapter 10 Visualising Solid Shapes
13 pages
Variational Approach to FRW Cosmology
No ratings yet
Variational Approach to FRW Cosmology
2 pages
Modelling Phase Change in A 3D Thermal Transient Analysis
No ratings yet
Modelling Phase Change in A 3D Thermal Transient Analysis
22 pages
Advances in Open-Channel Hydraulics After V.T. Chow's Book
No ratings yet
Advances in Open-Channel Hydraulics After V.T. Chow's Book
16 pages
GCSE Answers Limits of Accuracy
No ratings yet
GCSE Answers Limits of Accuracy
4 pages
Horizontal Distance Measurement in Surveying
No ratings yet
Horizontal Distance Measurement in Surveying
20 pages
Residential Electrician Skill Standards Guide
No ratings yet
Residential Electrician Skill Standards Guide
34 pages
Contactiving Conductivity Sensor
100% (1)
Contactiving Conductivity Sensor
134 pages
Two-Arm Helical Antenna Analysis
No ratings yet
Two-Arm Helical Antenna Analysis
24 pages
Control Systems Assignment
No ratings yet
Control Systems Assignment
2 pages
Answer Key 2017 Civil Service Examinations
No ratings yet
Answer Key 2017 Civil Service Examinations
2 pages
MHT Cet Syllabus
No ratings yet
MHT Cet Syllabus
1 page
9709 w21 Ms 53
No ratings yet
9709 w21 Ms 53
13 pages
Filomat
No ratings yet
Filomat
17 pages
Enventive Tolerance Analysis Reports Tutorial
No ratings yet
Enventive Tolerance Analysis Reports Tutorial
12 pages
r07 Remedial Mathematics
No ratings yet
r07 Remedial Mathematics
2 pages
Python Fundamentals CheatSheet Updated
No ratings yet
Python Fundamentals CheatSheet Updated
5 pages
HEC-1 Flood Hydrograph User Manual
No ratings yet
HEC-1 Flood Hydrograph User Manual
434 pages
Functions of Several Variables Explained
No ratings yet
Functions of Several Variables Explained
118 pages