

Assoc. Prof. Ir. Dr. Cheong Kuan Yew

School of Materials & Mineral Resources Engineering
Engineering Campus
Universiti Sains Malaysia
Topic Outcome:
At the end of this topic, students will be able to:
Describe the term “reliability”.
Discuss how to improve the reliability of a product.
Apply the statistical aspects of reliability.
Explain the life-history curve of a product.
Construct and discuss the OC curve related to reliability.
Use life & reliability testing plans.
Differentiate between availability & maintainability.
Topic Outline:
What is Reliability?
Achieving Reliability.
Statistical Aspects of Reliability.
Life-History Curve.
Failure Analysis.
Availability & Maintainability.
(1) What is Reliability?
A reliable product or service is one that works for a
longer period of time before failure occurs.
Reliability is:
Quality over a long run, or
A probability of success, or,
A probability (numerical value) that a product or a
service will perform its intended function
satisfactorily for a prescribed life under certain
stated environmental conditions.
4 factors associated with reliability.
Can reliability be measured?
(numerical value)

Reliability, R, is usually measured on

a “0” to “1” scale,
a percentage, or
parts per million (ppm).
R = 1 or 100% → no failure of the product or service.
R = 0 or 0% → certain failure.
R = 0.5 or 50% → the product or service is
expected to fail on half the occasions when it is used.
Common expressions of reliability also include:
Failure rate per 1000 hours (2% failures per
1000 h) (failure in time).
Failure rate per 1000 usage cycles.
Mean Time Between Failure (MTBF) (MTBF =
3,500 h).
Probability of an item or service failing (failure
= 2ppm).
Reliability over a fixed period (99% reliability
after 5,000 h).
What is the intended function of the product/service?

Products are designed for particular applications.
The product/service is expected to perform its task.
What is the intended life of the product/service?

How long the product/service is expected to last.

Product life
Time, or
What is the environment conditions
of the product/service usage?

Storage, transportation, and usage conditions.

Would it be more severe than the actual recommended conditions?
Relationship between Reliability
& Customer Expectation
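Table: Key customer variables versus product categories/applications.
Product categories/applications (columns): Calculator, PC, Pacemaker, Auto, Airline, Satellite.
Key customer variables (rows): Price; Discomfort & repercussion caused by malfunction; Reliability; Customer expectations.
Each variable is rated low for the calculator; price rises to extremely high at the other end of the range.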

E. R. Hnatek, “Practical Reliability of Electronic Equipment and Products”, Marcel Dekker: NY, 2003, p. 3.
Reliability vs Durability vs Robustness
Durability: The ability to endure with ongoing preventive maintenance.

Functional: continues to function correctly with no unscheduled maintenance.


Robustness: The ability to continue functioning correctly under
stressed conditions (high temperature, altitude, shock, ...).
(2) Achieving Reliability
Reliability must start from the very beginning.
Fully understand the customer’s needs, market
requirements, competitive analysis, comparison
with previous products, etc.
Then translate these into the subsequent phases.
2 approaches: (1) top-down (2) bottom-up.
Top-down → market demand & competitive analysis.
Bottom-up → comparing the current product to the previous
product in terms of complexity, technology
capability, and design/manufacturing process.
By combining the two approaches, there are
6 main factors through which a reliable
product/service can be achieved:
(1) Emphasis (Penegasan).
(2) System.
(3) Design.
(4) Production.
(5) Transportation.
(6) Maintenance.
(1) Emphasis

Increased emphasis is needed because of:

Legal means,
More complicated product/service, or
(2) System Reliability

Complex product/service
→ many different components
→ a system
→ the chance of a component not working is increased.
Therefore, the method of arranging the components
affects the entire system reliability.
Arrangements: series, parallel, or combination.
(2) System Reliability
(Components in Series arrangement)

Component A Component B Component C

RA=0.955 RB=0.750 RC=0.999

Reliability of a system depends on the reliability of the individual components.
Multiplicative theorem is applied.
System Reliability, RS = (RA)(RB)(RC) = (0.955)(0.750)(0.999) = 0.716
System reliability is always less than the lowest
reliability value of the components.
If one component fails, the whole system
stops working.
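The series calculation above can be checked with a short Python sketch. The function name `series_reliability` is an illustrative choice, not part of the lecture material, and the components are assumed to fail independently.

```python
from math import prod


def series_reliability(reliabilities):
    """Series arrangement: multiply the component reliabilities
    (multiplicative theorem, assuming independent components)."""
    return prod(reliabilities)


# Component values from the slide: RA, RB, RC
components = [0.955, 0.750, 0.999]
r_series = series_reliability(components)

print(round(r_series, 3))          # 0.716
print(r_series < min(components))  # True: lower than the weakest component
```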
(2) System Reliability
(Components in Parallel arrangement)

Component I (RI = 0.750)

Component J (RJ = 0.840)
If one component fails, the whole system is still able to work;
it fails only when all parallel components stop functioning.
RS = 1 - (probability that component I fails)(probability that
component J fails)
RS = 1 - (1-RI)(1-RJ) = 1-(1-0.750)(1-0.840) = 0.960
(2) System Reliability
(Components in Parallel arrangement)

As the number of components in parallel increases, the

reliability increases.
The reliability for a parallel arrangement of components is
greater than the reliability of the individual components.
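A minimal Python sketch of the parallel calculation, assuming independent components. The function name `parallel_reliability` and the third component (a second unit with R = 0.840) are hypothetical additions, used only to show the effect of adding more parallel components.

```python
from math import prod


def parallel_reliability(reliabilities):
    """Parallel arrangement: the system fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Components I and J from the slide
r_two = parallel_reliability([0.750, 0.840])

# Hypothetical third parallel component with R = 0.840 (not in the slides)
r_three = parallel_reliability([0.750, 0.840, 0.840])

print(round(r_two, 3))    # 0.96
print(round(r_three, 4))  # 0.9936
```

Each added parallel component raises the system reliability, and the result exceeds the reliability of any single component.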
(2) System Reliability
(Components in Combination arrangement)

[Diagram: Component A (RA = 0.955) in series with the parallel pair Component I (RI = 0.750) and Component J (RJ = 0.840), followed in series by Component C (RC = 0.999).]

Series + parallel arrangements of components

RS = (RA)(RI,J)(RC) = (0.955)(0.960)(0.999) = 0.916
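The combination arrangement can be sketched in Python by first reducing the parallel pair I, J to a single equivalent component and then treating the chain as a series system. The helper names `series` and `parallel` are illustrative, and independence of components is assumed. With the diagram values RA = 0.955 and RC = 0.999 the result is about 0.916; if RA and RC are rounded to 0.95 and 0.99, the product is about 0.90.

```python
from math import prod


def series(*reliabilities):
    """Series arrangement: product of the reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """Parallel arrangement: one minus the product of the failure probabilities."""
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Step 1: collapse the parallel pair I, J into one equivalent component
r_ij = parallel(0.750, 0.840)

# Step 2: series chain A -> (I, J) -> C
r_system = series(0.955, r_ij, 0.999)

print(round(r_ij, 3))      # 0.96
print(round(r_system, 3))  # 0.916
```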

[Diagram: Component A (RA = 0.955) in series with the parallel pair I (RI = 0.750) and J (RJ = 0.840), then Component C (RC = 0.999), then the parallel group K (RK = 0.750), L (RL = 0.840), and M.]

RS = (RA)(RI,J)(RC)(RK,L,M)
= (RA)[1-(1-RI)(1-RJ)](RC)[1-(1-RK)(1-RL)(1-RM)]
(3) Design

The fewer the components, the greater the reliability.

Approximate calculation (series): Rs = R^n,
where n is the number of components and R is the
reliability of the component, assuming the reliability
is the same for all the components.
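The approximation Rs = R^n shows how quickly series reliability drops as the component count grows. In the sketch below, the component reliability R = 0.99 and the component counts are hypothetical values chosen only for illustration.

```python
def equal_component_series(r, n):
    """Series reliability of n components that all have reliability r."""
    return r ** n


# Hypothetical example: every component has R = 0.99
for n in (1, 10, 50, 100):
    print(n, round(equal_component_series(0.99, n), 3))
```

Even with 99% reliable components, a 100-component series system works only about 37% of the time, which is why reducing the number of components is a design goal.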
(3) Design

Other techniques:
Having a backup or redundant component (in parallel
arrangement; cheaper redundant component) →
see next slide.
Having a fail-safe type of device (safety concern).
Protection for certain environments.
Greater investment in reliability (RM) → higher reliability,
up to a certain level.
(3) Design
Redundancy (Kelebihan) is the duplication of parts or
features in such a way that the duplicate can take over
the function of another part in the event of failure.
Eg: In a two-engine aircraft, the second aircraft engine
will propel the aircraft if one engine fails.
The addition of redundant parts to a product can
improve the reliability of a system enormously.
It is especially important for safety-related products and services.
There are 2 types of redundancy: active and passive.
(3) Design
(Active redundancy)
Active redundancy occurs when all the redundant items
are in operation at the same time.
All 4 aircraft engines operate at the same time,
even though only 1 engine is enough at cruising altitude
(one engine is not enough for take-off).
Both hydraulic brake circuits in your car are always
working. Only one circuit is enough to stop the car in
normal driving (but not enough for an emergency stop).
(3) Design
(Passive redundancy)
Passive redundancy occurs when the redundant items
are available but not put into use until the active
item fails.
The spare tyre in your car can be used in the event
of a puncture.
Spare globe in an overhead projector.
(4) Production

Basic quality techniques will minimize the risk of

product/service unreliability.
Emphasis should be placed on those components which are
least reliable.
(5) Transportation

The product is transported to the customer.

The actual performance of the product in the customer's hands is
the final evaluation.
Good packaging techniques and shipment evaluation are essential.
(6) Maintenance

Designers try to eliminate the need for customer maintenance.

Is it practical?
The product should give ample warning when failure occurs
(light or buzzer).
Maintenance should be simple and easy to perform.
Discipline & Tasks Involved with
Product Reliability
A study of 72 nondefense corporations revealed that the
product reliability techniques they preferred and felt to
be important were the following:
Parts control: 76%
Failure analysis and corrective action: 65%
Environmental stress screening: 55%
Test, analyze, fix: 50%
Reliability qualification tests: 32%
Design reviews: 24%
Failure modes, effects, and criticality analysis: 20%

N.H. Criscimagna, “Benchmarking Commercial Reliability Practices”, IITRI, 1997.

Reliability Goals & Metrics

Typical reliability metrics for a high-reliability, high-availability, fault-tolerant product are shown below:
Metric (viewpoint): Definition

Corrective maintenance (CM) (what customers see): CMs are maintenance activities done in a reactive mode and exclude proactive activity such as preventive maintenance.
Part replacement (PR) rate (what the factory & logistics organization see): A part replacement is any part replaced during a CM activity.
Failure rate (what engineers see): A returned part that fails a manufacturing or engineering test. Any parts that pass all tests are called no trouble found (NTF). NTFs are important because they indicate a problem with our test capabilities, diagnostics, or support process.
Each of the stated reliability metrics takes one of three forms:
CM/PR/failure rate goal → based on market demand.
Expected CM/PR/failure rate → based on predictions
(Technology/Process Capability).
Actual CM/PR/failure rate → based on measurement.
The relationship among the various metrics (CM/PR/FR):
Actual > Expected → potential design or process problems.
Expected > Goal → consider new technology.
Actual > Goal → potential competitive disadvantage.
Reliability Prediction

[Figure: reliability model of the product's CM/PR/FR, shown against the actual and goal values.]

Limitations of Reliability Prediction

Simple techniques omit a great deal of distinguishing
detail, and the prediction itself suffers from inaccuracy.
Detailed prediction techniques can become
bogged down in detail and become very costly. The
prediction will also lag far behind and may hinder
timely product development.
Considerable effort is required to generate sufficient data
on a part class/level to report statistically valid
reliability figures for the class/level.
Other variants that can affect the stated failure rate of a
given system are uses, operator
procedures, maintenance and rework practices,
measurement techniques or definitions of failure,
operating environments, and excess handling differing
from those addressed by the modeling technique.
(3) Statistical Aspects of Reliability

(Distribution Applicable to Reliability)

Types of probability distribution used in reliability
studies are:
continuous probability distribution
discrete probability distribution
Negative binomial
(Frequency Distribution Curves)
Only Exponential, Normal, and Weibull distributions are
widely used.
Their frequency distributions, f(t), as a function of time
are given below.
(Reliability Curves)

Reliability curves for the exponential, normal, and Weibull
distributions are given below:
[Figure: reliability curves, R(t) versus time, for the exponential, normal, and Weibull distributions.]
(Reliability Curves)
Reliability as a function of time.
Exponential: $R_t = e^{-t/\theta}$
Normal: $R_t = 1.0 - \int_0^t f(t)\,dt$ (the integral is the area under the normal curve)
Weibull: $R_t = e^{-(t/\theta)^{\beta}}$
Rt = reliability at time t
t = test time or cycle
θ = mean life or Mean Time Between Failure (MTBF)
β = Weibull slope (or shape parameter)
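The three reliability functions can be evaluated with the Python standard library. This is a sketch under stated assumptions: the function names are illustrative, the parameter values in the example (θ = 1000 h, σ = 100 h) are hypothetical, and the normal case integrates the density from minus infinity rather than from 0, which makes a negligible difference when θ is several σ above zero.

```python
import math


def reliability_exponential(t, theta):
    """R(t) = exp(-t / theta), theta = mean life (MTBF)."""
    return math.exp(-t / theta)


def reliability_weibull(t, theta, beta):
    """R(t) = exp(-(t / theta) ** beta), beta = Weibull slope."""
    return math.exp(-((t / theta) ** beta))


def reliability_normal(t, theta, sigma):
    """R(t) = 1 - (area under the normal curve up to t).
    Assumption: the area is taken from -infinity, not from 0."""
    z = (t - theta) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical parameters: theta = 1000 h, sigma = 100 h
t = 1000.0
print(round(reliability_exponential(t, 1000.0), 3))       # 0.368
print(round(reliability_weibull(t, 1000.0, beta=1.0), 3))  # 0.368 (beta = 1 matches exponential)
print(reliability_normal(t, 1000.0, 100.0))                # 0.5 at t = theta
```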
(Failure-Rate Curves)
Failure rate, λ, is important in describing the life-history
curve of a product.
Failure rate → probability of a failure during a stated
period of time, cycle, or number of impacts.
Failure rate can be estimated from test data by use of
the formulae:
(1) Time terminated without replacement.

$\lambda_{est} = \dfrac{\text{number of test failures}}{\sum (\text{test time or cycle})}$

$\lambda_{est} = \dfrac{r}{\sum t + (n - r)T}$

λest = estimated failure rate
r = number of test failures
t = test time for a failed item
n = number of items tested
T = termination time
(Failure-Rate Curves)

•Determine the failure rate for an item when a test of
9 items is terminated at the end of 22 hours. Four of the
items failed after 4, 12, 15, and 21 h, respectively. Five
items were still operating at the end of 22 h.

$\lambda_{est} = \dfrac{r}{\sum t + (n - r)T} = \dfrac{4}{(4 + 12 + 15 + 21) + (9 - 4)22} = 0.025$
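The time-terminated estimate without replacement can be reproduced in a few lines of Python. The function and parameter names are illustrative choices, not notation from the slides.

```python
def failure_rate_without_replacement(failure_times, n_items, end_time):
    """lambda_est = r / (sum of failure times + (n - r) * T),
    for a test stopped at time T with failed items not replaced."""
    r = len(failure_times)
    total_test_time = sum(failure_times) + (n_items - r) * end_time
    return r / total_test_time


# 9 items, test stopped at 22 h, failures at 4, 12, 15 and 21 h
lam = failure_rate_without_replacement([4, 12, 15, 21], n_items=9, end_time=22)
print(round(lam, 3))  # 0.025
```

When every item is run to failure, n - r is zero and the same function gives the failure-terminated estimate r / Σt.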
(Failure-Rate Curves)

(2) Time terminated with replacement.

$\lambda_{est} = \dfrac{r}{\sum t}$

λest = estimated failure rate
r = number of test failures
Σt = total test time accumulated over all test positions
(Failure-Rate Curves)

•Determine the failure rate for 50 items that are tested for
15 h. When failure occurs, the item is replaced with
another unit. At the end of 15 h, 6 of the items had failed.

$\lambda_{est} = \dfrac{r}{\sum t} = \dfrac{6}{50(15)} = 0.008$
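A minimal sketch of the with-replacement case, assuming 6 failures were observed. Because each failed item is replaced at once, every one of the 50 test positions accumulates the full 15 h. The function and parameter names are illustrative.

```python
def failure_rate_with_replacement(n_failures, n_positions, test_time):
    """lambda_est = r / (total test time), where each test position
    runs for the whole test because failed items are replaced."""
    total_test_time = n_positions * test_time
    return n_failures / total_test_time


lam_replaced = failure_rate_with_replacement(n_failures=6, n_positions=50, test_time=15)
print(lam_replaced)  # 0.008
```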
(Failure-Rate Curves)

(3) Failure terminated.

Determine the failure rate for 6 items that are tested
to failure. Test cycles are 1025, 1550, 2232, 3786,
5608, and 7918.

$\lambda_{est} = \dfrac{r}{\sum t} = \dfrac{6}{1025 + 1550 + 2232 + 3786 + 5608 + 7918} = 0.00027$
(Failure-Rate Curves)
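Exponential: $\lambda = \dfrac{1}{\theta}$
Normal: $\lambda = \dfrac{e^{-\frac{1}{2}\left(\frac{t-\theta}{\sigma}\right)^{2}}}{\int_t^{\infty} e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^{2}}\,dx}$
Weibull: $\lambda = \left(\dfrac{\beta}{\theta}\right)\left(\dfrac{t}{\theta}\right)^{\beta-1}$
Constant failure rate: exponential distribution and Weibull distribution (β = 1).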
(Failure-Rate Curves)
[Figure: failure-rate curves, λ versus time, for the exponential, normal, and Weibull distributions.]
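The effect of the Weibull slope β on the failure rate can be seen numerically. The sketch below evaluates λ = (β/θ)(t/θ)^(β-1) for three slopes; the function name, θ = 1000 h, and the time points are hypothetical values chosen for illustration.

```python
def weibull_failure_rate(t, theta, beta):
    """lambda(t) = (beta / theta) * (t / theta) ** (beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)


theta = 1000.0                    # hypothetical scale value, in hours
times = (100.0, 500.0, 1000.0)    # hypothetical evaluation times

for beta in (0.5, 1.0, 3.0):
    rates = [weibull_failure_rate(t, theta, beta) for t in times]
    # beta < 1: rate falls with time; beta = 1: constant 1/theta; beta > 1: rate rises
    print(beta, [round(rate, 5) for rate in rates])
```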
(Failure-Rate Curves)

In the previous equations, θ is the mean life or
Mean Time Between Failure (MTBF).
It describes how much time elapses between failures.
It is used when speaking of repairable systems.
Another parameter that can be used to describe
reliability as a function of time is Mean Time to
Failure (MTTF).
It is used for non-repairable systems.
(Failure-Rate Curves)

The amount of time that a system is actually

operating is of great concern.
Eg: without a radar screen, air traffic controllers are
sightless and therefore out of operation. To be
considered reliable, the radar must be functional for
a significant amount of the expected operating time.
Since many systems need preventive or
corrective maintenance, a system’s reliability
can be judged in terms of the amount of time
it is available for use:
(Failure-Rate Curves)

$\text{Availability} = \dfrac{\text{MTTF}}{\text{MTTF} + \text{mean time to repair}}$

The MTTF value can be replaced by MTBF.

Mean time to repair = mean down time (MDT).

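•Determine the failure rate and MTBF for 6 items
that are tested to failure. Test cycles are 1025,
1550, 2232, 3786, 5608, and 7918.

$\lambda_{est} = \dfrac{r}{\sum t} = \dfrac{6}{1025 + 1550 + 2232 + 3786 + 5608 + 7918} = 0.00027$ (units of time$^{-1}$; here, per cycle)

$\theta = \dfrac{1}{\lambda_{est}} = \dfrac{1}{0.00027} = 3704$ cycles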
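The failure rate and MTBF for this data set can be checked in Python. Variable names are illustrative. Note that inverting the rounded rate 0.00027 gives about 3704 cycles, while working from the unrounded data gives 3686.5 cycles; the gap is purely a rounding effect.

```python
cycles_to_failure = [1025, 1550, 2232, 3786, 5608, 7918]

r = len(cycles_to_failure)            # all 6 items failed
total_cycles = sum(cycles_to_failure)

failure_rate = r / total_cycles       # lambda_est, per cycle
mtbf_exact = total_cycles / r         # theta = 1 / lambda_est, unrounded
mtbf_from_rounded = 1 / round(failure_rate, 5)

print(total_cycles)                   # 22119
print(round(failure_rate, 5))         # 0.00027
print(mtbf_exact)                     # 3686.5
print(round(mtbf_from_rounded))       # 3704
```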

Windshield-wiper motors are readily available and easy
to install. Calculate the availability of the windshield
wipers on a bus driven eight hours a day, if the MTBF
is 1250 hours. When the windshield-wiper motor must
be replaced, the bus is out of service for a total of 24 hours.

$\text{Availability} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MDT}} = \dfrac{1250}{1250 + 24} = 0.98$

The bus is available 98 percent of the time.
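The availability calculation for the bus example, as a short Python sketch. The function name `availability` is an illustrative choice, and the 24 of downtime is taken to be in hours.

```python
def availability(mtbf, mean_down_time):
    """Availability = MTBF / (MTBF + MDT), both in the same time unit."""
    return mtbf / (mtbf + mean_down_time)


bus_availability = availability(mtbf=1250, mean_down_time=24)
print(round(bus_availability, 2))  # 0.98
```

Halving the downtime to 12 h would raise the availability to about 0.99 without any change in MTBF, which is why repair time matters as much as failure rate.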
(4) Life-History Curve

Does the failure pattern change over the life of the product?
For most products and some services, the pattern of
failures does change. Typically (but not always), the
pattern of failures follows what is known as a “bathtub”
curve. This is the life-history curve of the product.
[Figure: bathtub curve of failure rate (λ) versus time (t), showing the debugging phase, the chance failure phase, and the wear-out phase.]
Life-History Curve
(Debugging Phase)
It is also called the burn-in, infant-mortality, or early failure phase.
With a new machine or service, we often find it fails a few
times before it ‘settles down’ to a reliable state of operation.
Weibull distribution with β < 1 is used to describe the
occurrence of failures in this phase.
Product is under warranty (usually).
It is a significant quality cost.
Life-History Curve
(Chance Failure Phase)
Random or constant failure phase.
Failure rate is constant.
The product or service has ‘settled down’ and is reliable.
Any failures that do occur are random.
Exponential and Weibull (β = 1) distributions are used to
describe this phase.
Reliability studies and Sampling Plans are concerned
with this phase.
The lower the failure rate, the better the product.
Life-History Curve
(Wear Out Phase)
The product is wearing out or the service support
systems are beginning to fail.
Wear out failures tend to have a sharp rise in failure rate.
The normal distribution best describes this phase.
Weibull distribution (β > 1) can be used depending on the
type of wear-out distribution.
(8) Availability and Maintainability
Time related factors of availability, reliability, and
maintainability are interrelated.
Eg: when a water line breaks (reliability) it is no
longer available to provide water to customers
and must be repaired or maintained.
Availability, A:
A time-related factor.
Measures the ability of a product or service to
perform its designated function.
Product available → operation + standby.
Availability and Maintainability

$A = \dfrac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MDT}}$

MDT = mean downtime = mean time to repair (MTTR) or
time to obtain a replacement.
If MTBF is defined as mean time before failure, then MTBF = MTTF.

Availability and Maintainability

Maintainability → the ease with which preventive and corrective
maintenance on a product or service can be achieved.
Mean time to repair, mean time to service, repair hours
per number of operating hours, preventive
maintenance cost, and downtime probability → figures of
merit for maintainability.
Keeping maintainability low (short, inexpensive repairs) → may be a more
cost-effective method of keeping availability high than
concentrating on reliability.