
Reliability

Assoc. Prof. Ir. Dr. Cheong Kuan Yew


School of Materials & Mineral Resources Engineering
Engineering Campus
Universiti Sains Malaysia
Topic Outcome:
At the end of this topic, students will be able to:
Describe the term “reliability”.
Discuss how to improve the reliability of a product.
Apply the statistical aspects of reliability.
Explain the life-history curve of a product.
Construct and discuss an OC curve related to reliability.
Use life & reliability testing plans.
Differentiate between availability & maintainability.
Topic Outline:
What is Reliability?
Achieving Reliability.
Statistical Aspects of Reliability.
Life-History Curve.
Failure Analysis.
Availability & Maintainability.
(1) What is Reliability?
A reliable product or service is one that works for a
longer period of time before failure occurs.
Reliability is:
Quality over a long run, or
A probability of success, or,
A probability (numerical value) that a product or a
service will perform its intended function
satisfactorily for a prescribed life under certain
stated environmental conditions.
4 factors associated with reliability.
Can reliability be measured?
(numerical value)

Reliability, R, is usually measured on:
a “0” to “1” scale,
a percentage, or
parts per million (ppm).
Examples:
R = 1 or 100% → no failure of the product or service.
R = 0 or 0% → certain failure.
R = 0.5 or 50% → the product or service is expected to fail on half the occasions when it is used.
Common expressions of reliability also include:
Failure rate per 1000 hours (2% failures per 1000 h); related measure: failure in time (FIT).
Failure rate per 1000 usage cycles.
Mean Time Between Failure (MTBF) (MTBF =
3,500 h).
Probability of an item or service failing (failure
= 2ppm).
Reliability over a fixed period (99% reliability
after 5,000 h).
What is the intended function of the
product/service?

Product/service
designed for particular applications.
expected to perform its task.
What is the intended life of the
product/service?

How long is the product/service expected to last?


Product life
Usage
Time, or
Both.
What are the environmental conditions
of the product/service usage?

Storage, transportation, and usage conditions.
Would they be more severe than the recommended conditions?
Relationship between Reliability
& Customer Expectation
Key Customer Variables Versus Product Categories/Applications Environment
Product categories (left to right): Calculator, PC, Pacemaker, Auto, Airline, Satellite
Customer variables, each rising from low (calculator) toward extremely high (satellite):
Price
Discomfort & repercussion caused by malfunction
Designed-in reliability
Customer expectations

E. R. Hnatek, “Practical Reliability of Electronic Equipment and Products”, Marcel Dekker: NY, 2003, p3.
Reliability vs Durability vs
Robustness
Durability: the ability to endure with ongoing preventive maintenance/servicing.
Functional: continues to function correctly with no unscheduled breakdown.
Robustness: the ability to continue functioning correctly under stressed conditions (high temperature, altitude, shock, ...).
(2) Achieving Reliability
Reliability must start from the very beginning.
Fully understand the customer’s needs, market requirements, competitive analysis, comparison with previous products, etc.
Then translate these to the subsequent phases.
2 approaches: (1) top-down (2) bottom-up.
Top-down → market demand & competitive analysis.
Bottom-up → comparing the current product with the previous product in terms of complexity, technology capability, and design/manufacturing process.
Combining the two approaches gives 6 main factors through which a reliable product/service can be achieved:
(1) Emphasis (Penegasan).
(2) System.
(3) Design.
(4) Production.
(5) Transportation.
(6) Maintenance.
(1) Emphasis

Increased emphasis is needed because of:
Legal means,
More complicated products/services, or
Automation.
(2) System Reliability

Complex product/service
→ many different components
→ a system
→ the chance of a component not working increases.
Therefore, the method of arranging the components
affects the reliability of the entire system.
Series, parallel, or combination.
(2) System Reliability
(Components in Series arrangement)

Component A ($R_A = 0.955$) → Component B ($R_B = 0.750$) → Component C ($R_C = 0.999$)

The reliability of a system depends on its individual components.
The multiplicative theorem is applied.
System reliability: $R_S = (R_A)(R_B)(R_C) = (0.955)(0.750)(0.999) = 0.716$
With every component reliability below 1, the system reliability is always less than the lowest component reliability.
If one component fails, the whole system stops working.
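The series calculation above can be reproduced in a few lines. This is a minimal sketch; the function name `series_reliability` is an illustrative choice, not something from the slides.

```python
from math import prod

def series_reliability(reliabilities):
    """Multiplicative theorem: a series system works only if every component works."""
    return prod(reliabilities)

# Component values from the series example: R_A, R_B, R_C
r_s = series_reliability([0.955, 0.750, 0.999])
print(round(r_s, 3))  # 0.716
```

The result is below the weakest component (0.750), as the slide states.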
(2) System Reliability
(Components in Parallel arrangement)

Component I ($R_I = 0.750$) in parallel with Component J ($R_J = 0.840$)

If one component fails, the whole system is still able to work; it stops only when all the parallel components have failed.
$R_S = 1 - (\text{probability that component I fails})(\text{probability that component J fails})$
$R_S = 1 - (1 - R_I)(1 - R_J) = 1 - (1 - 0.750)(1 - 0.840) = 0.960$
(2) System Reliability
(Components in Parallel arrangement)

As the number of components in parallel increases, the system reliability increases.
The reliability of a parallel arrangement is greater than the reliability of any of its individual components.
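A short sketch of the parallel formula shows both points: the two-component example, and the gain from adding more parallel units. The function name `parallel_reliability` and the repeated 0.750 component are illustrative assumptions.

```python
from math import prod

def parallel_reliability(reliabilities):
    """A parallel system fails only if every component fails."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

# Components I and J from the parallel example
print(round(parallel_reliability([0.750, 0.840]), 3))  # 0.96

# Assumed illustration: stacking identical components with R = 0.750 in parallel
for n in range(1, 5):
    print(n, round(parallel_reliability([0.750] * n), 4))
```

Each extra parallel component raises the system value, but by a smaller amount each time.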
(2) System Reliability
(Components in Combination arrangement)

Component A ($R_A = 0.955$) → [Component I ($R_I = 0.750$) in parallel with Component J ($R_J = 0.840$)] → Component C ($R_C = 0.999$)

Series + parallel arrangements of components.
$R_S = (R_A)(R_{I,J})(R_C) = (0.955)(0.960)(0.999) = 0.916$
Q&A

Component A ($R_A = 0.955$) → [I ($R_I = 0.750$) in parallel with J ($R_J = 0.840$)] → Component C ($R_C = 0.999$) → [K ($R_K = 0.750$), L ($R_L = 0.840$), and M ($R_M = 0.999$) in parallel]

$R_S = (R_A)(R_{I,J})(R_C)(R_{K,L,M})$
$= (R_A)[1-(1-R_I)(1-R_J)](R_C)[1-(1-R_K)(1-R_L)(1-R_M)]$
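One way to evaluate this Q&A numerically is to compute each parallel block first and then chain the blocks in series. The sketch below assumes the layout implied by the formula (A, then I and J in parallel, then C, then K, L, and M in parallel); the helper names are illustrative.

```python
from math import prod

def series(reliabilities):
    """All blocks must work."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one block must work."""
    return 1.0 - prod(1.0 - r for r in reliabilities)

r_ij = parallel([0.750, 0.840])          # block I, J
r_klm = parallel([0.750, 0.840, 0.999])  # block K, L, M
r_s = series([0.955, r_ij, 0.999, r_klm])

print(round(r_ij, 5), round(r_klm, 5), round(r_s, 3))
```

Under these assumptions the system reliability comes out at about 0.916.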
(3) Design

The fewer the components, the greater the reliability.
Approximate calculation (series): $R_S = R^n$,
where n is the number of components and R is the reliability of each component, assuming the reliability is the same for all the components.
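To see how quickly component count erodes reliability, the approximation can be tabulated. This sketch assumes a hypothetical component reliability of R = 0.99; that value is not from the slides.

```python
def series_approximation(r, n):
    """R_S = R**n for n identical components in series."""
    return r ** n

R = 0.99  # assumed component reliability, for illustration only
for n in (1, 10, 50, 100):
    print(n, round(series_approximation(R, n), 3))
```

Even with 99% reliable components, a 100-component series system works only about 37% of the time.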
(3) Design

Other techniques:
Having a backup or redundant component (in parallel arrangement; a cheaper redundant component) → see next slide.
Over-design.
Having a fail-safe type of device (safety concern).
Maintenance.
Protection for certain environments.
Investment in reliability (RM) → higher reliability, up to a certain level.
(3) Design
(redundancy)
Redundancy (Kelebihan) is the duplication of parts or
features in such a way that the duplicate can take over
the function of another part in the event of failure.
Eg: In a two-engine aircraft, the second aircraft engine
will propel the aircraft if one engine fails.
The addition of redundant parts to a product can
improve the reliability of a system enormously.
It is important with safety-related products and services.
There are 2 types of redundancy:
(3) Design
(Active redundancy)
Active redundancy occurs when all the redundant items
are in operation at the same time.
Examples:
All 4 aircraft engines operate at the same time,
although only 1 engine is enough at cruising altitude
(one engine is not enough for take-off).
Both hydraulic brake circuits in your car are always
working. Only one circuit is enough to stop the car in
normal driving (but not enough for an emergency
stop).
(3) Design
(Passive redundancy)
Passive redundancy occurs when the redundant items
are available but not put into use until the active
item fails.
Examples:
The spare tyre in your car can be used in the event
of a puncture.
Spare globe in an overhead projector.
(4) Production

Basic quality techniques will minimize the risk of product/service unreliability.
Emphasis should be placed on those components which are
least reliable.
(5) Transportation

The product is transported to the customer.
The actual performance of the product in the customer’s hands is
the final evaluation.
Good packaging techniques and shipment evaluation are
essential.
(6) Maintenance

Designers try to eliminate the need for customer maintenance.
Is it practical?
The product should give ample warning when a failure occurs
(light or buzzer).
Maintenance should be simple and easy to perform.
Discipline & Tasks Involved with
Product Reliability
A study of 72 nondefense corporations revealed that the
product reliability techniques they preferred and felt to
be important were the following:
Supplier control 76%
Parts control 72%
Failure analysis and corrective action 65%
Environmental stress screening 55%
Test, analyze, fix 50%
Reliability qualification tests 32%
Design reviews 24%
Failure modes, effects, and criticality analysis 20%

N.H. Criscimagna, “Benchmarking Commercial Reliability Practices”, IITRI, 1997.


Reliability Goals & Metrics

Typical reliability metrics for a high-reliability, high-availability, fault-tolerant product are shown below:

Metric: Corrective maintenance (CM) rate
Viewpoint: What customers see
Definition: CMs are maintenance activities done in a reactive mode and exclude proactive activity such as preventive maintenance.

Metric: Part replacement (PR) rate
Viewpoint: What factory & logistics organization see
Definition: A part replacement is any part replaced during a CM activity.

Metric: Failure rate
Viewpoint: What engineers see
Definition: A returned part that fails a manufacturing or engineering test. Any parts that pass all tests are called no trouble found (NTF). NTFs are important because they indicate a problem with our test capabilities, diagnostics, or support process.
Each of the stated reliability metrics takes one of three forms:
CM/PR/failure rate goal → based on market demand.
Expected CM/PR/failure rate → based on predictions (technology/process capability).
Actual CM/PR/failure rate → based on measurement.
The relationship among the various metrics
(expected, actual, and goal CM/PR/FR)
Actual > Expected: potential design or process problems.
Expected > Goal: consider new technology or design/mfg/maintenance.
Actual > Goal: potential competitive disadvantage.
Reliability Prediction

[Diagram: the expected, actual, and goal CM/PR/FR triangle, with the reliability model attached to the expected CM/PR/FR as the baseline understanding of the product’s reliability.]
Reliability Prediction

Limitations of Reliability Prediction

Simple techniques omit a great deal of distinguishing detail, so the prediction suffers from inaccuracy.
Detailed prediction techniques can become bogged down in detail and become very costly. The prediction will also lag far behind and may hinder timely product development.
Considerable effort is required to generate sufficient data on a part class/level to report statistically valid reliability figures for that class/level.
Other variables that can affect the stated failure rate of a given system are uses, operator procedures, maintenance and rework practices, measurement techniques or definitions of failure, operating environments, and excess handling differing from those addressed by the modeling technique.
(3) Statistical Aspects of Reliability

(Distribution Applicable to Reliability)


Types of probability distribution used in reliability
studies are:
continuous probability distribution
Exponential
Normal
Weibull
Gamma
discrete probability distribution
Geometric
Negative binomial
(Frequency Distribution Curves)
Only Exponential, Normal, and Weibull distributions are
widely used.
Their frequency distributions, f(t), as a function of time
are given below.
(Reliability Curves)

Reliability curves for the exponential, normal, and Weibull distributions are given below:
[Figure: reliability curves; panels: Exponential, Normal, Weibull]
(Reliability Curves)
Reliability as a function of time.
Exponential: $R_t = e^{-t/\theta}$
Normal: $R_t = 1.0 - \int_0^t f(t)\,dt$ (the integral is the area under the normal curve)
Weibull: $R_t = e^{-(t/\theta)^{\beta}}$
where
$R_t$ = reliability at time t
t = test time or cycle
$\theta$ = mean life or Mean Time Between Failure (MTBF)
$\beta$ = Weibull slope (or shape parameter)
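The three reliability functions can be evaluated directly. This is a sketch under assumed parameters (θ = 1000 h, σ = 100 h, β = 2), none of which come from the slides; the normal case uses Python's `statistics.NormalDist` to obtain the area under the normal curve between 0 and t.

```python
import math
from statistics import NormalDist

def reliability_exponential(t, theta):
    """R_t = exp(-t / theta)."""
    return math.exp(-t / theta)

def reliability_weibull(t, theta, beta):
    """R_t = exp(-(t / theta) ** beta)."""
    return math.exp(-((t / theta) ** beta))

def reliability_normal(t, theta, sigma):
    """R_t = 1 - (area under the normal curve from 0 to t)."""
    dist = NormalDist(mu=theta, sigma=sigma)
    return 1.0 - (dist.cdf(t) - dist.cdf(0.0))

theta, sigma, beta = 1000.0, 100.0, 2.0  # assumed values, for illustration only
for t in (250.0, 500.0, 1000.0):
    print(t,
          round(reliability_exponential(t, theta), 3),
          round(reliability_normal(t, theta, sigma), 3),
          round(reliability_weibull(t, theta, beta), 3))
```

With β = 1 the Weibull expression reduces to the exponential one, which is why both describe a constant failure rate.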
(Failure-Rate Curves)
The failure rate, λ, is important in describing the life-history curve of a product.
Failure rate → the probability of a failure during a stated period of time, cycle, or number of impacts.
The failure rate can be estimated from test data by use of the formulae:
(1) Time terminated without replacement.
(Failure-Rate Curves)

$\lambda_{est} = \dfrac{\text{number of test failures}}{\sum (\text{test time or cycle})}$

$\lambda_{est} = \dfrac{r}{\sum t + (n - r)T}$

$\lambda_{est}$ = estimated failure rate
r = number of test failures
t = test time for a failed item
n = number of items tested
T = termination time
(Failure-Rate Curves)

Q&A
• Determine the failure rate for an item when the test of 9 items is terminated at the end of 22 hours. Four of the items failed after 4, 12, 15, and 21 h, respectively. Five items were still operating at the end of 22 h.
$\lambda_{est} = \dfrac{r}{\sum t + (n - r)T} = \dfrac{4}{(4 + 12 + 15 + 21) + (9 - 4)22} = 0.025$
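The time-terminated estimate without replacement can be checked with a small helper. The function and argument names below are illustrative, not from the slides.

```python
def failure_rate_time_terminated(failure_times, n_tested, t_end):
    """lambda_est = r / (sum of failure times + (n - r) * T), no replacement."""
    r = len(failure_times)
    total_test_time = sum(failure_times) + (n_tested - r) * t_end
    return r / total_test_time

# 9 items, test stopped at 22 h, failures at 4, 12, 15, and 21 h
rate = failure_rate_time_terminated([4, 12, 15, 21], n_tested=9, t_end=22)
print(round(rate, 3))  # 0.025
```

The survivors contribute their full 22 h each to the denominator, which is why the total test time is 162 h rather than 52 h.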
(Failure-Rate Curves)

(2) Time terminated with replacement.

$\lambda_{est} = \dfrac{r}{\sum t}$

$\lambda_{est}$ = estimated failure rate
r = number of test failures
t = test time for a failed item
(Failure-Rate Curves)

Q&A
• Determine the failure rate for 50 items that are tested for 15 h. When a failure occurs, the item is replaced with another unit. At the end of 15 h, 6 of the items had failed.
$\lambda_{est} = \dfrac{r}{\sum t} = \dfrac{6}{50(15)} = 0.008$
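With replacement, every test position stays occupied for the whole test, so the total test time is simply positions times duration. A minimal sketch, with illustrative names:

```python
def failure_rate_with_replacement(n_failures, n_positions, test_duration):
    """lambda_est = r / (n * T) when failed items are replaced immediately."""
    return n_failures / (n_positions * test_duration)

# 50 test positions run for 15 h with 6 failures
print(failure_rate_with_replacement(6, 50, 15))  # 0.008
```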
(Failure-Rate Curves)

(3) Failure terminated

Q&A
Determine the failure rate for 6 items that are tested to failure. Test cycles are 1025, 1550, 2232, 3786, 5608, and 7918.
$\lambda_{est} = \dfrac{r}{\sum t} = \dfrac{6}{1025 + 1550 + 2232 + 3786 + 5608 + 7918} = 0.00027$
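(Failure-Rate Curves)
Exponential: $\lambda = \dfrac{1}{\theta}$
Normal: $\lambda = \dfrac{e^{-\frac{1}{2}\left(\frac{t-\theta}{\sigma}\right)^{2}}}{\int_t^{\infty} e^{-\frac{1}{2}\left(\frac{x-\theta}{\sigma}\right)^{2}}\,dx}$
Weibull: $\lambda = \left(\dfrac{\beta}{\theta}\right)\left(\dfrac{t}{\theta}\right)^{\beta-1}$
Constant failure rate: exponential distribution and Weibull distribution ($\beta = 1$).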
(Failure-Rate Curves)
[Figure: failure-rate curves; panels: Exponential, Normal, Weibull]
(Failure-Rate Curves)

In the previous equations, θ is the mean life or Mean Time Between Failures (MTBF).
MTBF:
The average time that elapses between failures.
It is used when speaking of repairable systems.
Another parameter that can be used to describe reliability as a function of time is Mean Time to Failure (MTTF).
MTTF:
It is used for non-repairable systems.
(Failure-Rate Curves)

The amount of time that a system is actually operating is of great concern.
E.g.: without a radar screen, air traffic controllers are
sightless and therefore out of operation. To be
considered reliable, the radar must be functional for
a significant amount of the expected operating time.
Since many systems need preventive or
corrective maintenance, a system’s reliability
can be judged in terms of the amount of time
it is available for use:
(Failure-Rate Curves)

$\text{Availability} = \dfrac{\text{MTTF}}{\text{MTTF} + \text{mean time to repair}}$

The MTTF value can be replaced by MTBF.
Mean time to repair = mean down time (MDT).
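Q&A
• Determine the failure rate and MTBF for 6 items that are tested to failure. Test cycles are 1025, 1550, 2232, 3786, 5608, and 7918.
$\lambda_{est} = \dfrac{r}{\sum t} = \dfrac{6}{1025 + 1550 + 2232 + 3786 + 5608 + 7918} = \dfrac{6}{22119} = 0.00027$ per cycle (units of time$^{-1}$)
$\theta = \dfrac{1}{\lambda_{est}} = \dfrac{22119}{6} \approx 3686.5$ cycles (3704 cycles if the rounded value $\lambda_{est} = 0.00027$ is used)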
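For a failure-terminated test, the failure rate and the MTBF both follow from the total of the observed cycles. The sketch below uses the six test results from the question; variable names are illustrative.

```python
cycles_to_failure = [1025, 1550, 2232, 3786, 5608, 7918]

r = len(cycles_to_failure)           # every item was run to failure
total_cycles = sum(cycles_to_failure)

failure_rate = r / total_cycles      # lambda_est, per cycle
mtbf = total_cycles / r              # theta = 1 / lambda_est

print(total_cycles)                  # 22119
print(f"{failure_rate:.5f}")         # 0.00027
print(mtbf)                          # 3686.5
```

Taking the reciprocal of the rounded rate 0.00027 gives about 3704 cycles instead, so it is safer to carry the unrounded value through the calculation.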
Q&A

Windshield-wiper motors are readily available and easy to install. Calculate the availability of the windshield
wipers on a bus driven eight hours a day, if the MTBF
is 1250 hours. When the windshield-wiper motor must
be replaced, the bus is out of service for a total of 24
hours.
$\text{Availability} = \dfrac{1250}{1250 + 24} = 0.98$
The bus is available 98 percent of the time.
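The availability ratio is easy to check and to reuse for other downtime figures. A minimal sketch; the function name and the 4-hour comparison case are illustrative assumptions.

```python
def availability(mtbf, mean_downtime):
    """Availability = MTBF / (MTBF + mean downtime)."""
    return mtbf / (mtbf + mean_downtime)

# Bus example: MTBF of 1250 h, 24 h out of service per replacement
print(round(availability(1250, 24), 2))  # 0.98

# Assumed comparison: the same motor with only 4 h of downtime
print(round(availability(1250, 4), 4))
```

Shortening the downtime raises availability without any change to the motor's MTBF.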
(4) Life-History Curve

Does the failure pattern change over the life of the product?
For most products and some services, the pattern of
failures does change. Typically (but not always), the
pattern of failures follows what is known as a “bathtub”
curve. This is the life-history curve of the product.
[Figure: bathtub curve of failure rate (λ) versus time (t), divided into the Debugging Phase, Chance Failure Phase, and Wear Out Phase]
Life-History Curve
(Debugging Phase)
It is also called the burn-in, infant-mortality, or early failure
phase.
A new machine or service often fails a few
times before it ‘settles down’ to a reliable state of
performance.
A Weibull distribution with β < 1 is used to describe the
occurrence of failures in this phase.
The product is (usually) under warranty.
It is a significant quality cost.
Life-History Curve
(Chance Failure Phase)
Random or constant failure phase.
The failure rate is constant.
The product or service has ‘settled down’ and is reliable.
Any failures that do occur are random.
Exponential and Weibull (β = 1) distributions are used to
describe this phase.
Reliability studies and sampling plans are concerned
with this phase.
The lower the failure rate, the better the product.
Life-History Curve
(Wear Out Phase)
The product is wearing out or the service support
systems are beginning to fail.
Wear-out failures tend to show a sharp rise in failure rate.
The normal distribution best describes this phase.
A Weibull distribution (β > 1) can be used depending on the
type of wear-out distribution.
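The three phases of the life-history curve correspond to the three regimes of the Weibull failure rate, λ = (β/θ)(t/θ)^(β−1). The sketch below evaluates it for β below, equal to, and above 1; θ = 1000 and the sample times are assumed values chosen only for illustration.

```python
def weibull_failure_rate(t, theta, beta):
    """lambda(t) = (beta / theta) * (t / theta) ** (beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

theta = 1000.0  # assumed scale parameter, for illustration only
phases = ((0.5, "debugging"), (1.0, "chance failure"), (3.0, "wear out"))

for beta, phase in phases:
    rates = [weibull_failure_rate(t, theta, beta) for t in (100.0, 500.0, 900.0)]
    print(phase, beta, [round(rate, 6) for rate in rates])
```

The printed rates fall with time for β = 0.5, stay at 1/θ for β = 1, and rise for β = 3.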
(8) Availability and
Maintainability
Time-related factors of availability, reliability, and
maintainability are interrelated.
E.g.: when a water line breaks (reliability), it is no
longer available to provide water to customers
and must be repaired or maintained.
Availability, A:
A time-related factor.
Measures the ability of a product or service to
perform its designated function.
A product is available while in operation or on standby.
Availability and Maintainability

$A = \dfrac{\text{Uptime}}{\text{Uptime} + \text{Downtime}} = \dfrac{\text{MTBF}}{\text{MTBF} + \text{MDT}}$

MDT = mean downtime = mean time to repair (MTTR) or time to obtain a replacement.
Availability and Maintainability

[Figure: timeline (Time axis) marking the MTTF, MDT, and MTBF intervals]

If MTBF is defined as mean time before failure, then MTBF = MTTF.
Availability and Maintainability

Maintainability:
The ease with which preventive and corrective maintenance on a product
or service can be achieved.
Figures of merit for maintainability: mean time to repair, mean time to service, repair hours
per number of operating hours, preventive
maintenance cost, and downtime probability.
Keeping the maintainability figures low (for example, a short mean time to repair) can be a more cost-effective
method of keeping availability high than
concentrating on reliability.