
MTBFMONTREAL.CA
A GOAL WITHOUT A PLAN…
IS JUST A WISH

Reliability engineering is an engineering field that deals with the
study, evaluation, and life-cycle management of reliability: the
probability that a system or component performs its intended function
(without failure) under stated conditions for a specified period of time.

Reliability from Concept to Culture

RELIABILITY AND MAINTENANCE PROGRAM FOR DESIGN AND MANUFACTURING
Intro
Dr. Sorin Voiculescu
ISBN-13: 978-1607730606
ISBN-10: 160773060X

REFERENCE 1

❑ Title of book: 50 ways to improve product reliability
❑ Author: Mike Silverman
❑ ISBN#: 978-1607730606

INDU 6391 DR. SORIN VOICULESCU
REFERENCE 2
(REQUIRED)

❑ Title of book: Reliability Engineering
❑ Author: K.C. Kapur, M. Pecht
❑ ISBN#: 978-1118140673

Reliability Engineering presents an integrated approach to the design,
engineering, and management of reliability activities throughout the life
cycle of a product, including concept, research and development, design,
manufacturing, assembly, sales, and service. Containing illustrative guides
that include worked problems, numerical examples, homework problems, a
solutions manual, and class-tested materials, it demonstrates to product
development and manufacturing professionals how to distribute key
reliability practices throughout an organization.
The authors explain how to integrate reliability methods and techniques in
the Six Sigma process and Design for Six Sigma (DFSS). They also discuss
relationships between warranty and reliability, as well as legal and
liability issues. Other topics covered include:
• Reliability engineering in the 21st Century
• Probability life distributions for reliability analysis
• Process control and process capability
• Failure modes, mechanisms, and effects analysis
• Health monitoring and prognostics
• Reliability tests and reliability estimation
Reliability Engineering provides a comprehensive list of references on the
topics covered in each chapter. It is an invaluable resource for those
interested in gaining fundamental knowledge of the practical aspects of
reliability in design, manufacturing, and testing. In addition, it is useful for
implementation and management of reliability programs.
REFERENCE 3
(OPTIONAL)

❑ Title of book: Reliability Engineering and Risk Analysis: A Practical
Guide, Third Edition
❑ Author: M. Modarres, M. P. Kaminskiy, V. Krivtsov
❑ ISBN#: 1498745873

This undergraduate and graduate textbook provides a practical and
comprehensive overview of reliability and risk analysis techniques.
Written for engineering students and practicing engineers, the book is
multi-disciplinary in scope. The new edition has new topics in classical
confidence interval estimation; Bayesian uncertainty analysis; models
for physics-of-failure approach to life estimation; extended
discussions on the generalized renewal process and optimal
maintenance; and further modifications, updates, and discussions. The
book includes examples to clarify technical subjects and many end of
chapter exercises.

ISBN-13: 978-0873898379
ISBN-10: 0873898370

REFERENCE 4
(OLD REFERENCE)

❑ Title of book: The certified reliability engineer handbook
❑ Author: D. W. Benbow, H. W. Broome
❑ ISBN#: 978-0873898379
❑ Edition: 2nd

The structure of this book is based on that of the Body of
Knowledge specified by ASQ for the Certified Reliability
Engineer, which includes design review and control;
prediction, estimation, and apportionment methodology;
failure mode and effects analysis; the planning, operation,
and analysis of reliability testing and field failures,
including mathematical modeling; understanding human
factors in reliability; and the ability to develop and
administer reliability information systems for failure
analysis, design and performance improvement, and
reliability program management over the entire product
life cycle.

REFERENCE 5
(OLD REFERENCE)

❑ Title of book: Handbook of Reliability, Availability, Maintainability and Safety in Engineering Design
❑ Author: R. F. Stapelberg
❑ ISBN#: 978-1-84800-174-9

In the past two decades, industry, particularly the process
industry, has witnessed the development of several large
‘super-projects’, most in excess of a billion dollars. These large
super-projects include the exploitation of mineral resources such
as alumina, copper, iron, nickel, uranium and zinc, through the
construction of huge complex industrial process plants. Although
these super-projects create many thousands of jobs, resulting in
a significant decrease in unemployment, especially during
construction, as well as projected increases in the wealth and
growth of the economy, they bear a high risk in achieving their
forecast profitability through maintaining budgeted costs. Most
of the super-projects have either exceeded their budgeted
establishment costs or have experienced operational costs far
in excess of what was originally estimated in their feasibility
prospectus scope. This has been the case not only with projects
in the process industry but also with the development of
infrastructure and high-technology projects in the petroleum
and defense industries. The more significant contributors to the
cost ‘blow-outs’ experienced by these projects can be
attributed to the complexity of their engineering design, both in
technology and in the complex integration of systems. These
systems on their own are usually adequately designed and
constructed, often on the basis of previous similar, though
smaller designs.

WHAT WILL INDU 6391 PRESENT?

Definition phase:
- benchmarking
- reliability by similarity (similar products performance analysis)
- reliability metrics appropriate to the project
- establish targets / break down targets / write contracts
- lessons learned implementation

Design phase:
- evaluate design capabilities by reliability predictions
- FMEA (failure mode and effects analysis)
- fault trees, physics of failure
- HALT for design (identify weaknesses and increase the final design’s
reliability by early testing)

WHAT WILL INDU 6391 PRESENT?

Validation phase:
- reliability by similarity (taking into account design changes)
- reliability growth
- testing (when no similarity is possible)
- requirements validation testing
- life testing
- accelerated life testing
- endurance testing
- HALT for reliability (modern tool)

Manufacturing phase:
- reduce the risks potentially induced by the production line using ESS testing

Operational phase:
- FRACAS (follow-up of field performance)
- early trend detection and corrective actions
- real-time health monitoring
- optimize the maintenance tasks
- maintenance models
- reliability centered maintenance
- MSG-3

INDU 6391 ESSENTIAL

RELIABILITY VS. QUALITY

The everyday usage term "quality of a product" is loosely taken to mean
its inherent degree of excellence. In industry, this is made more precise by
defining quality to be "conformance to requirements at the start of use".
Assuming the product specifications adequately capture customer
requirements, the quality level can now be precisely measured by the
fraction of units shipped that meet specifications.

But how many of these units still meet specifications after a week of
operation? Or after a month, or at the end of a one-year warranty
period? That is where "reliability" comes in. Quality is a snapshot at the
start of life and reliability is a motion picture of the day-by-day
operation. Time zero defects are manufacturing mistakes that escaped
final test. The additional defects that appear over time are "reliability
defects" or reliability fallout.
The quality level might be described by a single fraction defective. To
describe reliability fallout a probability model that describes the fraction
fallout over time is needed. This is known as the life distribution model.
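
To make the snapshot versus motion picture distinction concrete, the sketch below combines a time-zero fraction defective with a life distribution. The exponential model, the function name `fraction_failed`, and every numeric value are assumptions chosen for illustration only; they do not come from the course material.

```python
import math

def fraction_failed(t_hours, fraction_defective, mttf_hours):
    """Fraction of shipped units not meeting specifications at time t.

    Assumes units that pass at time zero fail according to an
    exponential life distribution with the given MTTF (an assumption).
    """
    surviving_good = (1.0 - fraction_defective) * math.exp(-t_hours / mttf_hours)
    return 1.0 - surviving_good

# Hypothetical values: 1% defective at shipment, MTTF of 50,000 hours.
checkpoints = [("start of use", 0), ("one week", 168), ("one month", 720), ("one year", 8760)]
for label, t in checkpoints:
    print(f"{label:>12}: {fraction_failed(t, 0.01, 50_000):.4f}")
```

At t = 0 the result is the quality level alone; the growth after that is the reliability fallout.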

From an operating point of view: reliability is the quality degradation
over time.
Operational reliability of a product is highly influenced by quality
❑ in manufacturing
❑ of components
❑ of storage and transport
❑ of processes used in design
❑ of the user

Reference: WEB

ENGINEERING INTEGRITY
RAMS

WHY INVEST IN A PLAN WHEN WE HAVE A
SIGNED CONTRACT?

For a product that enters into service in 2020, with obsolescence after 10 years:
Contractual MTTF (mean time to failure, average operating time) = 15 years
Let’s suppose actual MTTF = 5 years (reliability not met)
Operating time to realize that the contractual value cannot be met: 3 years
Calendar time to understand root cause: 0.5 years
Calendar time to negotiate (argue) with the supplier: 0.5 years
Calendar time to develop a corrective action and to certify it: 1 year
Calendar time to retrofit: 1 year

Updated product after 6 years

The values above are fictitious numbers intended to highlight the need for a reliability program plan.
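
The arithmetic behind the 6-year figure can be checked with a short sketch. The durations are the fictitious values from the slide; the dictionary keys and variable names are illustrative, and adding operating time to calendar time follows the slide's own simplification.

```python
# Fictitious durations from the slide, in years.
steps = {
    "operate until the shortfall is visible": 3.0,
    "understand root cause": 0.5,
    "negotiate with supplier": 0.5,
    "corrective action and certification": 1.0,
    "retrofit": 1.0,
}
service_life_years = 10.0  # obsolescence horizon from the slide

years_to_fix = sum(steps.values())
remaining_years = service_life_years - years_to_fix
print(f"Updated product after {years_to_fix:g} years")
print(f"Service life left with the corrected product: {remaining_years:g} years")
```

Under these numbers, more than half of the product's service life passes before the corrected product is in the field.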

BARRIERS TO IMPLEMENTING RELIABILITY

EXAMPLE ON HONEYWELL TRUE-STEAM

Disclaimer: This example is only intended to present a specific problem on a particular Honeywell product. It is not intended to harm in any way the good image of Honeywell. Remember: there is improvement potential in any product of any company. Honeywell removed the product from the market.

RELIABILITY'S PLACE IN THE PYRAMID OF NEEDS

Companies need to secure a design first (something to sell), to make
it safe and attractive to customers (in most cases this means not
expensive), to assess quality and, last (but not least), reliability.

SOME DEFINITIONS
❑ Reliability
❑ Failure
❑ Risk
❑ Types of Reliability
❑ System break-down
❑ FTA (basics)

RELIABILITY DEFINITION
Reliability engineering is an engineering field that deals with the
study, evaluation, and life-cycle management of reliability: the
ability of an item to perform its required function under stated
conditions for a specified period of time.

Suppose that T is the random time-to-failure of a unit. We also say
that T marks a hard or traumatic failure.

$R(t) = P\{\text{unit does not fail over } [0, t]\} = P\{T > t\}$

[Figure: R(t) versus life time t; vertical axis from 0 to 1]

Life time: time to failure, number of cycles to failure
Conditions: physical, chemical, mechanical stresses…

Reliability engineering relies heavily on statistics and probability theory
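
As a minimal sketch of how R(t) = P{T > t} can be estimated from data, the example below counts the fraction of units still working at time t. The function name `empirical_reliability` and the sample of times to failure are hypothetical, and the sketch assumes every unit was run to failure (no censoring).

```python
def empirical_reliability(times_to_failure, t):
    """Estimate R(t) = P{T > t} as the fraction of units still working at t."""
    if not times_to_failure:
        raise ValueError("need at least one observed time to failure")
    survivors = sum(1 for life in times_to_failure if life > t)
    return survivors / len(times_to_failure)

# Hypothetical times to failure (hours) for ten units, all run to failure.
sample = [120, 340, 410, 560, 700, 820, 990, 1300, 1750, 2400]
for t in (0, 500, 1000, 2500):
    print(t, empirical_reliability(sample, t))
```

The estimate starts at 1 for t = 0 and drops to 0 once t exceeds the longest observed life, which is the shape sketched on the slide.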


Only 2 of the 5 items contributing to the definition can be drawn on a
2D graph.
If any of these 5 items changes, the reliability changes. That is why, in
order to assess, estimate, or study reliability, one needs to have
ALL 5 items clearly defined.

KEY COMPONENTS

❑ Item
The item is what we are studying. It embeds:
Components: component quality plays an important role
Design: e.g. high temperature spots will reduce long-term
performance (will be detailed in upcoming lectures)
Design margins: the robustness of a design also plays an important role
Manufacturing process: e.g. hand soldering provides less reliable
results than automated soldering
Technology: e.g. leaded solder performs differently from
lead-free solder
Transport, storage, installation, etc.
❑ Probability
A probability of 51% means that there is a 49% chance that the
conclusion is incorrect based on the data.
The probability is generally translated into the reliability target (e.g.
x% surviving at time t, PPM under warranty, cost of operating over a
specified time period, etc.)
Reliability is a number in the range 0 to 1.

❑ Required function
This should be defined for every part, subassembly, and product. The
statement of the required function should explicitly state or imply a
failure definition. For example, a pump's required function might be
moving at least 20 gallons per minute. The implied failure definition
would be moving fewer than 20 gallons per minute.
When defining the function, one has to have a clear definition of the
failure mode. If one function can fail in multiple ways (i.e. it has multiple
failure modes), then each failure mode has its own reliability function. The
reliability of the unit’s function is a combination of the individual
reliabilities.
❑ Stated conditions
These include: environmental conditions, maintenance conditions, usage
conditions, storage and moving conditions, and possibly others.
❑ Specified period of time
DO NOT mix calendar time with time as a measure of functioning. State
which measure is used (operating hours, calendar time, cycles, km, miles,
etc.). See the example on business vs. commercial aircraft utilisation
given in class.
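
One common way to combine the individual failure-mode reliabilities is sketched below. It assumes the function is lost as soon as any one failure mode occurs and that the modes are statistically independent, so the reliabilities multiply. Both assumptions, the function name `function_reliability`, and the pump numbers are illustrative and not taken from the slides.

```python
import math

def function_reliability(mode_reliabilities):
    """Reliability of a function that is lost if any one failure mode occurs.

    Assumes the failure modes are statistically independent (an assumption).
    """
    return math.prod(mode_reliabilities)

# Hypothetical pump with three failure modes, each evaluated at the same time t.
modes = {"seal leak": 0.99, "bearing seizure": 0.98, "impeller wear": 0.95}
print(round(function_reliability(modes.values()), 4))
```

The combined value is always lower than the weakest individual mode, which is why each failure mode needs its own clear definition.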
EXAMPLE IN CLASS

Electronic “on” time (in hours) versus flight hours:

COMPARISON RESULTS
Reliability cannot predict the exact time to failure of a unit. It always
deals with a population. Reliability provides metrics to measure the
performance of the population.
One of these metrics is the MTBUR.
MTBUR (mean time between unscheduled removals) is the average
expected time between two consecutive unscheduled removals.
That means that, for an MTBUR of 18,000 operating hours, on
average, one unscheduled removal occurs for each 18,000 cumulated
operating hours. Under the assumption that all units operate identically,
a fleet of 10 units will cumulate 18,000 hours when each operates 1,800
hours; a fleet of 180 units will cumulate 18,000 hours when each operates
100 hours.
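
The fleet arithmetic above can be sketched as follows. The function name `mtbur` and its parameters are illustrative; the sketch assumes, as the text does, that all units operate identically.

```python
def mtbur(units, hours_per_unit, unscheduled_removals):
    """Mean time between unscheduled removals from cumulated fleet hours."""
    if unscheduled_removals <= 0:
        raise ValueError("at least one removal is needed to estimate MTBUR")
    return units * hours_per_unit / unscheduled_removals

# The two fleets from the text each cumulate 18,000 operating hours.
print(mtbur(10, 1_800, 1))   # 18000.0
print(mtbur(180, 100, 1))    # 18000.0
```

With one removal observed in each case, both fleets give the same MTBUR even though no single unit has operated anywhere near 18,000 hours.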

MANAGEMENT’S RELIABILITY
MEASUREMENT

Reliability is not performed for the sake of reliability. It is a means to
achieve other targets:
❑ Safety
❑ Catastrophes
❑ Image (media impact)
❑ Availability
❑ Dispatch interruption rate
❑ Mission interruption rate
❑ Warranty
❑ Scheduled maintenance cost
❑ Life cycle cost
❑ Aftermarket
❑ Marketing
❑ Liability
❑ USD, CAD, EUR, YEN, etc.

STATISTICS AND RELIABILITY

Failures do not happen at fixed times; they occur randomly, following
a probability distribution.

The PDF (probability density function) is the basic description of the
time to failure of an item.
All other functions related to an item’s reliability can be derived from
the PDF.

Reference: Z. Klim
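
As a sketch of how other reliability functions follow from the PDF, the example below integrates an assumed exponential PDF numerically to obtain F(t), then uses the standard relations R(t) = 1 - F(t) and h(t) = f(t) / R(t). The exponential model, the MTTF of 1,000 hours, and all function names are assumptions made for illustration.

```python
import math

def pdf(t, mttf=1000.0):
    """Exponential probability density function (an assumed model)."""
    return math.exp(-t / mttf) / mttf

def cdf(t, steps=10_000):
    """F(t): integral of the PDF from 0 to t (trapezoidal rule)."""
    if t <= 0:
        return 0.0
    h = t / steps
    inner = sum(pdf(i * h) for i in range(1, steps))
    return h * (0.5 * pdf(0.0) + inner + 0.5 * pdf(t))

def reliability(t):
    """R(t) = 1 - F(t)."""
    return 1.0 - cdf(t)

def hazard_rate(t):
    """h(t) = f(t) / R(t), the instantaneous failure rate."""
    return pdf(t) / reliability(t)

t = 500.0
print(f"F({t:g}) = {cdf(t):.4f}, R({t:g}) = {reliability(t):.4f}, h({t:g}) = {hazard_rate(t):.6f}")
```

Only `pdf` encodes the model; swapping in another density leaves the other three functions unchanged.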

The cumulative distribution function of a real-valued random variable
X is the function given by

$F_X(x) = P(X \le x)$

where the right-hand side represents the probability that the random
variable X takes on a value less than or equal to x. The probability
that X lies in the semi-closed interval (a, b], where a < b, is
therefore

$P(a < X \le b) = F_X(b) - F_X(a)$

The probability that the value of the random variable T is less than or
equal to “t” is defined as the cumulative probability function

$F(t) = P(T \le t)$


R(t) + F(t) = 1

Reference: Z. Klim

FAILURE

Failure is the state or condition of not meeting a desirable or
intended objective, and may be viewed as the opposite of success.
Product failure ranges from failure to sell the product to fracture of
the product, in the worst cases leading to personal injury, the province
of forensic engineering.
The criteria for failure are heavily dependent on context of use, and
may be relative to a particular observer or belief system. A situation
considered to be a failure by one might be considered a success by
another, particularly in cases of direct competition or a zero-sum
game. Similarly, the degree of success or failure in a situation may be
differently viewed by distinct observers or participants, such that a
situation that one considers to be a failure, another might consider to
be a success, a qualified success or a neutral situation.
It may also be difficult or impossible to ascertain whether a situation
meets criteria for failure or success due to ambiguous or ill-defined
definition of those criteria. Finding useful and effective criteria, or
heuristics, to judge the success or failure of a situation may itself be a
significant task.
Failure can be differentially perceived from the viewpoints of the
evaluators. A person who is only interested in the final outcome of an
activity would consider it to be an Outcome Failure if the core issue
has not been resolved or a core need is not met. A failure can also be
a process failure whereby although the activity is completed
successfully, a person may still feel dissatisfied if the underlying
process is perceived to be below expected standard or benchmark.

Reference: WIKIPEDIA

FAILURE
[Figure: item status (function, degraded, failed) versus time, illustrating sudden, partial, degradation, intermittent, and nuisance failures]
Failure is an event which causes the system performance to deviate
from the specified performance: the termination of the ability of an
item to perform its required function.
❑ Fault is an erroneous state of system hardware or software
❑ Error is the manifestation of a fault
Failure classification
❑ Random failure: no apparent root cause
❑ Active failure: evident at the moment of occurrence. Either it
immediately produces an observable deterioration in the system
performance (self-evident), or the deterioration is not observable
but the failure is indicated by the monitoring system
❑ Dormant (latent) failure: not immediately observable at the
moment of occurrence. It produces no immediately observable effect
on the system performance and is not indicated by the monitoring
system
❑ Independent failure: the occurrence of one failure does not affect the
probability of another
❑ Common mode failure: an event having a single external cause
with multiple failure effects, which are not consequences of each other
❑ Cascading failures: a single event, not necessarily hazardous in
itself, can precipitate a series of other failures
Based on popular belief (which is not quite wrong), in order to make
sure the true root cause is understood, one should ask the question
WHY up to 7 times.
E.g. the car failed. WHY? (1) Because it does not start anymore.
WHY? (2) Because the starter does not turn. WHY? (3) Because
there is no electrical power. WHY? (4) Because the power is off.
WHY? (5) Because the battery is dead. WHY? (6) Because it is a
very cold morning. WHY? (7) (this time, the question is why the car does
not start on a cold morning) Because cold weather is often fingered as the
culprit when car batteries die, but actually warm temperatures do the
most damage to them. High temperatures quicken corrosion of internal
plates and vaporize the electrolyte faster. But car batteries usually go
dead in cold weather mostly because damage done during the summer
doesn’t show up until the battery is more taxed. A cold battery has
reduced cranking power, and cold temperatures thicken motor oil,
making it harder to turn the engine over [1].
The X WHY technique is generally combined with the octopus
approach, where the engineer puts himself in the place of the device
(octopus) and tries to imagine what the product (in this case himself)
experiences in different states. The engineer has to consider vibration,
temperature, extreme operating conditions, corner envelope usage, power
variations, day/night, summer/winter, thermal cycling, high
or low temperature, humidity, corrosive agents, dust/pollution, jamming
factors, forces of any kind, cosmic radiation, etc.

1 https://www.consumerreports.org/cro/news/2009/11/q-a-why-do-car-batteries-die-in-winter/index.htm

PHYSICAL BREAKDOWN

A breakdown is always related to a design or a real product, of
which it is a breakdown. It is identified and versioned as an object in
its own right. It has a number of constituents (breakdown elements),
often structured hierarchically, that make up the breakdown
structure.

Reference: WEB

FUNCTIONAL BREAKDOWN

A functional block diagram in systems engineering and software
engineering is a block diagram that describes the functions and
interrelationships of a system.
The functional block diagram can picture:
❑ functions of a system, pictured by blocks
❑ input and output elements of a block, pictured with lines
❑ the relationships between the functions
❑ the functional sequences and paths for matter and/or signals

Reference: WEB

BREAKDOWN

In order to assess the impact of a failure on the top level function,
one must assess the link between the two. A typical breakdown
represents this link, starting from the functional level, going down to
the technical functions that ensure the upper function, and linking these
technical functions to the physical piece-part/component/installation
involved.
For example, a screen has the function to display information, one
of its technical functions is the power supply, and one of the
components is the 110 V plug.
Similarly, the breakdown for a simple bicycle is shown below:

Reference: WEB

TYPICAL BREAKDOWN
Below is the typical breakdown to be considered.
For very complex systems, the operational function might be split into
multiple layers. Also, especially in electronics, do not ignore the
potential contribution of one technical function to multiple system
functions, as well as the potential contribution of one piece part to
multiple technical functions. The technical function power supply of a
modern screen impacts both the function display information (image)
on the screen and the function acquire images with the integrated
camera.
Note that for very simple systems, operational functions might be
directly related to technical functions, and these might be composed
of a single installation/component/piece-part level.
Even for very simple systems, do not skip breaking down the system
into operational and then technical (sub)functions, as this exercise
might reveal interconnections that would otherwise be missed.

Reference: WEB
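
The screen example can be written down as a small data structure to trace which system functions depend on a given piece part. Apart from "display information", "acquire images", "power supply", and the plug, which come from the slides, every name below is a hypothetical placeholder, and the two-level dictionary layout is only one possible representation.

```python
# Hypothetical breakdown of a screen with an integrated camera.
system_functions = {
    "display information": ["power supply", "video processing"],
    "acquire images": ["power supply", "camera control"],
}
technical_functions = {
    "power supply": ["110 V plug", "power board"],
    "video processing": ["display panel", "video board"],
    "camera control": ["camera module"],
}

def impacted_system_functions(component):
    """List the system functions that depend on a given piece part."""
    hit_technical = {tf for tf, parts in technical_functions.items() if component in parts}
    return sorted(
        sf for sf, tfs in system_functions.items() if hit_technical.intersection(tfs)
    )

print(impacted_system_functions("power board"))    # ['acquire images', 'display information']
print(impacted_system_functions("camera module"))  # ['acquire images']
```

A part of the shared power supply reaches both system functions, which is the kind of interconnection the breakdown exercise is meant to reveal.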

DYSFUNCTIONAL ASSESSMENT

The definition of reliability involves working with failures. In order to
address the impact of a failure, one must add a supplementary layer on the
functional breakdown, a layer that represents the definitions of the
failures of the system functions and technical functions, as well as the
failures of the piece-part/component/installation. The overall interaction
of these failures is essential in evaluating the impact of a low level
failure (e.g. a resistor) at technical function level (e.g. loss of power
supply) and at system level (e.g. loss of the function display information
on the screen).
Generally, the technique of “THE x WHY?” implies that, in order to
assess the root cause of a high system level failure, one has to ask
“WHY?” multiple times. The approach involves people outside
reliability engineering, e.g. maintenance, sales, customers, weight,
design, logistics, management, program, etc.
Why would the failure occur? Because the system would not react.
Why would the system not react? Because the …. Why would….
At this point of the course, all that is requested is to:
❑ consider the functional assessment
❑ define failure modes for each system function, technical function
and piece-part/component/installation
Two of the most popular tools used to assess the impact of and the link
between these failures are referenced in the following slides. Specifics
on these tools will be the topic of a future lecture.

EXAMPLE OF FAILURE DEFINITION

Reference: WEB

RISK

Failure = the system is not performing its intended function

We’d like to know:
- when will the first failure occur?
- how long will it take between two consecutive failures?
Both can be measured in hours, km / miles, or number of cycles.

Answer: time to failure is a random variable
→ one cannot provide the precise time of occurrence but can give the
probability of occurrence before a certain date

One cannot say « failure will occur at 80 months », but:
« there is an x% chance that failure occurs before 84 months »

This x% is the risk: what is the risk level we’re accepting?
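
A statement of the form « there is an x% chance that failure occurs before 84 months » can be produced once a life distribution is chosen. The sketch below assumes an exponential time to failure and a hypothetical MTTF of 400 months; neither the model nor the number comes from the slides.

```python
import math

def probability_of_failure_before(t, mttf):
    """F(t) = P{T <= t} for an exponential time to failure (assumed model)."""
    return 1.0 - math.exp(-t / mttf)

# Hypothetical unit with an MTTF of 400 months.
risk = probability_of_failure_before(84, 400)
print(f"There is a {risk:.1%} chance that failure occurs before 84 months")
```

Whether that percentage is acceptable is the management decision the slide asks about; the calculation only supplies the number.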


RISK
Risk is the potential of losing something of value. Values (such as physical
health, social status, emotional well-being or financial wealth) can be
gained or lost when taking risk resulting from a given action or inaction,
foreseen or unforeseen. Risk can also be defined as the intentional
interaction with uncertainty. Uncertainty is a potential, unpredictable,
unmeasurable and uncontrollable outcome; risk is a consequence of
action taken in spite of uncertainty.
Risk perception is the subjective judgment people make about the
severity and probability of a risk, and may vary from person to person. Any
human endeavor carries some risk, but some are much riskier than others.

The risk is a measure of a danger which puts together the measure of
occurrence of the unwanted event and the measure of the consequences
of this event.

[Figure: RISK as the combination of Probability and Severity]

Reference: WIKIPEDIA
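
One simple way to put probability and severity together is to multiply them, as in the sketch below. The product form is a common convention and an assumption here, not something the slide prescribes; the function name `risk_score`, the events, and all values are hypothetical.

```python
def risk_score(probability, severity):
    """Combine likelihood and consequence into one number (product form)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    return probability * severity

# Hypothetical events: (probability over the mission, severity as cost in USD).
events = {
    "loss of display": (0.05, 2_000),
    "battery fire": (0.0001, 5_000_000),
}
ranked = sorted(events, key=lambda name: risk_score(*events[name]), reverse=True)
for name in ranked:
    print(name, risk_score(*events[name]))
```

In this made-up case the rare but severe event ranks above the frequent but cheap one, which a probability alone would not show.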

RISKS

❑ safety
❑ media impact
❑ availability

❑ mission interruption
❑ scheduled maintenance cost
❑ life cycle cost
❑ aftermarket
❑ warranty
❑ marketing
❑ liability
❑ program cost
❑ company’s reputation

Reliability impacts risk through its probability value.

EXAMPLE OF STANDARDS: INTERNATIONAL ELECTROTECHNICAL COMMISSION 6158

EXAMPLES

TYPES OF RELIABILITY

❑ Design Reliability
The design reliability of a product is the predicted reliability performance of
the product at the end of the development phase.
The prediction may be based on field experience from similar products, testing,
expert judgment, and various types of analysis.
The prediction is based on nominal environmental and operational conditions
used during the design process.
❑ Inherent Reliability
The reliability of the products produced will tend to differ from the design
reliability due to quality variations.
The variations result from some of the components not conforming to the design
specification and/or assembly errors.
The reliability of produced items is often referred to as the inherent reliability.
❑ Field (Operational) Reliability
The field reliability is the reliability of the product subsequent to the sale of the
product.
The field reliability is calculated based on recorded failures and malfunctions.
The field reliability is also called the actual reliability.
Very often, the field reliability of a product differs from the design reliability
due to environmental and operational conditions varying from customer to
customer and differing from the nominal values used in the design process.
It also depends on the maintenance actions carried out by the customers during
the use of the product.
Reference: Z. Klim

RMAS & PROGRESSION OF ESTIMATES
[Figure: progression of estimates from Manufacturer to User]

Manufacturer (intrinsic characteristics):
❑ Intrinsic reliability
❑ Intrinsic maintainability
❑ Intrinsic availability

User (real characteristics, depending on usage and maintenance):
❑ Customer support
❑ Maintenance
❑ Maintenance support
❑ Operational availability

MAINTENANCE AND RISK

RISK RATING EXAMPLE

Reference: web

PROBABILITY OF OCCURRENCE

❑ It is the mathematical measure of the risk.
❑ It is measured in values between 0 and 1.
❑ It is linked to a time period measured in the same unit of measure
as the reliability (e.g. operating hours, km, etc.).

E.g. the risk of derailment of a train car is 7.8 derailments per billion
freight car-miles (FCM).
This is equivalent to 7.8E-9 per car-mile.
For a 1,500 mile distance (Montreal to Orlando), this probability
becomes 1.17E-5 / mission:
1.17E-5 = 7.8E-9 * 1,500
For a life of 10 years and 180 missions a year (1,800 missions), the probability
becomes 2.1E-2*:
2.1E-2 ≈ 1.17E-5 * 1,800

*Under the assumption of a constant probability and of no
maintenance action taken
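
The arithmetic of the derailment example can be checked with a short script. This is only a sketch: the function names are illustrative, and the per-mission value uses the rare-event approximation (rate times exposure), with the life value computed both linearly and as "at least one event over independent missions".

```python
# Sketch of the derailment example; names are illustrative, not from the slides.
RATE_PER_CAR_MILE = 7.8e-9   # 7.8 derailments per billion freight car-miles
MISSION_MILES = 1_500        # Montreal to Orlando
MISSIONS_PER_YEAR = 180
LIFE_YEARS = 10


def mission_probability(rate_per_mile, miles):
    """Rare-event approximation: probability ~ rate * exposure."""
    return rate_per_mile * miles


def life_probability(p_mission, n_missions):
    """Probability of at least one event over n independent missions,
    assuming a constant per-mission probability and no maintenance."""
    return 1.0 - (1.0 - p_mission) ** n_missions


p_mission = mission_probability(RATE_PER_CAR_MILE, MISSION_MILES)
n_missions = MISSIONS_PER_YEAR * LIFE_YEARS
p_life_linear = p_mission * n_missions            # simple multiplication
p_life_exact = life_probability(p_mission, n_missions)

print(f"per mission: {p_mission:.3g}")
print(f"over life (linear): {p_life_linear:.3g}")
print(f"over life (at least one event): {p_life_exact:.3g}")
```

For probabilities this small, the linear product and the "at least one event" formula give nearly the same value.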

MAINTENANCE AND RISK

The maintenance action, ideally, returns the risk to its original
value (as good as new).
Risk-based maintenance (RBM) prioritizes maintenance resources
toward assets that carry the most risk if they were to fail. It is a
methodology for determining the most economical use of maintenance
resources. This is done so that the maintenance effort across a facility
is optimized to minimize the risk of failure.
A risk-based maintenance strategy is based on two main phases:
1. Risk assessment
2. Maintenance planning based on the risk
The maintenance type and frequency are prioritized based on the
risk of failure. Assets that have a greater risk and consequence of
failure are maintained and monitored more frequently. Assets that
carry a lower risk are subjected to less stringent maintenance
programs. Implementing a risk-based maintenance process means that
the total risk of failure is minimized across the facility in the most
economical way.
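
The two phases can be sketched as a small ranking exercise. This is a hypothetical illustration, not a prescribed RBM procedure: it assumes risk is scored as failure probability times consequence cost, and the asset names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    failure_probability: float  # per planning period, between 0 and 1 (assumed)
    consequence_cost: float     # cost incurred if the failure occurs (assumed)

    @property
    def risk(self) -> float:
        # Phase 1, risk assessment: probability of occurrence times consequence.
        return self.failure_probability * self.consequence_cost


def prioritize(assets):
    """Phase 2, maintenance planning: highest-risk assets come first."""
    return sorted(assets, key=lambda a: a.risk, reverse=True)


# Hypothetical facility
assets = [
    Asset("pump", 0.05, 20_000),
    Asset("conveyor", 0.20, 2_000),
    Asset("compressor", 0.01, 500_000),
]

ranked = prioritize(assets)
for asset in ranked:
    print(f"{asset.name}: risk = {asset.risk:,.0f}")
```

Note that the conveyor fails most often but ranks last, because its consequence is small; the ranking follows risk, not failure frequency alone.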

BASIC METRICS

❑ Reliability
❑ MTTF
❑ MTBF
❑ MTBUR
❑ Maintainability
❑ Availability
❑ Failure rate

« RELIABILITY » AS METRIC
Reliability engineering is an engineering field that deals with
the study, evaluation, and life-cycle management of reliability:
the ability of a system or component to perform its required
function under stated conditions for a specified period of time.

$$R(t) = P\{\text{unit does not fail over } [0, t]\} = P\{T > t\}$$

[Figure: reliability function R(t), starting at 1 for t = 0 and decreasing with t; the value R(tfix) is read at t = tfix]

For a given value of time tfix, one can compute the reliability of a
system, R(tfix), and express it as a fixed value. Industry often
uses a statement like “Reliability of 93%”; such a statement always
implies a fixed time.

Examples of tfix: warranty time, 100,000 km / 75K miles, A-check /
C-check, 15 years (end of life), etc.
Institute of Electrical and Electronics Engineers (1990) IEEE Standard Computer Dictionary: A Compilation
of IEEE Standard Computer Glossaries. New York, NY ISBN 1-55937-079-3
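
As an illustration of R(tfix) = P{T > tfix}, the probability can be estimated from a sample as the fraction of units still working at tfix. This is a minimal sketch, assuming every unit in the sample was observed until failure (no suspensions); the function name and the sample values are hypothetical.

```python
def empirical_reliability(times_to_failure, t_fix):
    """Estimate R(t_fix) = P{T > t_fix} as the fraction of units whose
    time to failure exceeds t_fix (complete data assumed)."""
    if not times_to_failure:
        raise ValueError("need at least one observation")
    survivors = sum(1 for t in times_to_failure if t > t_fix)
    return survivors / len(times_to_failure)


# Hypothetical times to failure, in km, for 10 units
sample_km = [40_000, 85_000, 120_000, 150_000, 95_000,
             210_000, 130_000, 60_000, 175_000, 110_000]

r_100k = empirical_reliability(sample_km, 100_000)
print(f"R(100,000 km) = {r_100k:.0%}")
```

The statement "Reliability of 60%" produced here is only meaningful together with tfix = 100,000 km.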

COMPONENT DEFINITION

For reliability purposes, component refers to the lowest-level part/LRU a
system is broken down into.
A chair in a space elevator, for example, is the component (LRU = line
replaceable unit) if a failure involves the complete removal and
replacement of the chair.
The same chair can be broken down into parts in the service center if the
design and the manuals allow the maintenance team to do so. E.g. the chair’s
controller (computer) that drives the actuator becomes the
component for the maintenance team, as this computer is the part they
remove and replace in order to fix the failure “actuator not working”.
The controller can be split into piece-parts in the Supplier’s shop if
they can fix the failure by replacing a specific piece-part. The
piece-part is the component for the Supplier.

SYSTEM RELIABILITY
[Figure: components C1 and C2 connected in parallel]

$$R_{parallel} = 1 - \prod_{i=1}^{n} (1 - R_i)$$
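
A minimal sketch of the parallel formula, assuming the components fail independently; the function name and the example reliabilities are illustrative.

```python
from math import prod


def parallel_reliability(reliabilities):
    """R_parallel = 1 - prod(1 - R_i): the system fails only if
    every component fails (independent components assumed)."""
    for r in reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must be between 0 and 1")
    return 1.0 - prod(1.0 - r for r in reliabilities)


# Two components in parallel, as C1 and C2 on the slide (values are hypothetical)
r_system = parallel_reliability([0.90, 0.80])
print(f"R_parallel = {r_system:.4f}")
```

The parallel system is more reliable than its best component, since both components must fail for the system to fail.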
PARALLEL SYSTEM M OUT OF N

In an m out of n parallel configuration, the system performs its
intended function if at least m out of a total of n components are operational.

[Figure: components C1, C2, ..., Ci, ..., Cn in parallel, feeding an m/n voter]

For identical components of reliability r_C(t), the system’s reliability is:

$$R_S(t) = \text{Prob}\{m \text{ out of } n \text{ operate}\} = \sum_{k=m}^{n} \frac{n!}{k!\,(n-k)!} \, r_C(t)^{k} \, \left(1 - r_C(t)\right)^{n-k}$$
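
The sum above can be evaluated directly. The sketch below assumes identical, independent components and takes the component reliability at the time of interest as a plain number; the function name is illustrative.

```python
from math import comb


def m_out_of_n_reliability(m, n, r):
    """Probability that at least m of n identical, independent components
    (each with reliability r) are operational."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must be between 0 and 1")
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(m, n + 1))


# Hypothetical 2-out-of-3 system with component reliability 0.9
r_2oo3 = m_out_of_n_reliability(2, 3, 0.9)
print(f"R(2 out of 3) = {r_2oo3:.3f}")
```

Two limiting cases are a useful check: m = 1 reduces to the plain parallel system, and m = n reduces to a series system.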

Up to 19.3.2 (standby systems) - excluded

ACHIEVING RELIABILITY THROUGH
REDUNDANCY
Redundancy can only be used when the functional design of the system
allows for the incorporation of replicated components. It is used extensively
in electronic products to achieve high reliability when individual
components have unacceptably low reliability. Building in redundancy
corresponds to using a module consisting of (M) replications of a
component.
The number of replications needed depends on the actual and the
allocated reliability. The reliability increases as the number of replicated
components (M) increases (see figure). The decision regarding the use of
redundancy has implications for production cost and must take into account
other constraints such as weight and/or volume. We need to ensure that
these constraints are not violated.

The manner in which the replicates are put to use depends on the type of
redundancy:
❑ In active redundancy, all (M) components of the module are in their
operational state, or “fully energized,” when put into use.
❑ In passive redundancy, only one component is in its fully energized state
and the remaining are either partially energized (warm standby) or kept
in reserve and energized when put into use (cold standby).

The module fails only when all components in the module have failed.
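
For active redundancy, the link between the number of replications (M), the actual component reliability and the allocated reliability can be sketched as follows. The sketch assumes identical, independent components, so the module reliability is 1 - (1 - r)^M; the function names and numbers are hypothetical, and cost, weight and volume constraints are not modelled.

```python
def active_module_reliability(r, m):
    """Module of m identical, independent, fully energized components:
    it fails only if all m components fail."""
    return 1.0 - (1.0 - r) ** m


def replicas_needed(r, allocated):
    """Smallest M such that the module meets the allocated reliability."""
    if not 0.0 < r < 1.0 or not 0.0 < allocated < 1.0:
        raise ValueError("r and allocated must be strictly between 0 and 1")
    m = 1
    while active_module_reliability(r, m) < allocated:
        m += 1
    return m


# Hypothetical: component reliability 0.80, allocated module reliability 0.99
for m in range(1, 5):
    print(m, round(active_module_reliability(0.80, m), 4))

m_required = replicas_needed(0.80, 0.99)
print("replications needed:", m_required)
```

Each added replication brings a smaller gain than the previous one, which is why the cost and weight constraints mentioned above usually decide where to stop.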

Reference: Z. Klim

WEBTOOL

http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems/simulator/NonSerPar/nsnpframe.html

EXAMPLE OF USE

TIME TO FAILURE

❑ Random variable "T" is a measurement of the possible outcome of
an experiment
❑ A particular value taken by the random variable "T" is denoted by t
❑ Time to failure "T" is a continuous random variable

MTTF: MEAN TIME TO FAILURE
MTBF: MEAN TIME BETWEEN FAILURES

MTTF / MTBF

MTTF / MTBF / MTTR

[Figure: timeline illustrating MTBF and MTTF]

MTTR: Mean time to repair is the average time needed to recover a
unit to an operational state.
FAILURE DEFINITION IS IMPORTANT
For example, the reliability of a car fleet is observed, measured in MTTF
(time to first failure). For academic purposes, let’s suppose that only 2
failure modes are being observed: tire burst and main computer, both
failures leading to unavailability of the car.

Let’s assume the following data: over the last 12 months, the fleet has
cumulated 100,000 km (so the time definition is in km) up to the first
failure. This 100,000 km is the sum of the kilometers cumulated by each car
up to its first failure.

5 computers and 20 tires failed (confirmed failures) over the same time period.

car       km   failures
  1    1,500   1
  2    1,000   1
  3    2,500   1
  4    5,500   1
  5    6,500   1
  6    1,700   1
  7    2,600   -
  8    3,400   1
  9    4,500   1
 10    1,600   -
 11    2,400   1
 12    2,100   1
 13    1,900   -
 14    1,600   1
 15    1,500   -
 16    1,000   1
 17    2,500   1
 18    5,500   1
 19    5,900   -
 20    1,700   1
 21    2,600   1
 22    3,400   -
 23    4,500   1
 24    4,600   1
 25    5,800   1
 26    6,800   1
 27    1,300   1
 28    5,900   1
 29    1,400   1
 30    6,800   2
total 100,000  25 (20 tire failures, 5 computer failures)

Case 1: computer reliability

Each car is equipped with one computer, so the computers have cumulated
100,000 km and 5 failures. The cars whose computer did not fail add
suspension times to our analysis, as their computer worked but the failure
(blue) was not due to a computer. The same applies to non-failed cars, as
their computers add operating time without failure.

The above leads to MTTF = 100,000 km / 5 = 20,000 km

Case 2: tire reliability

Each car is equipped with 4 tires, so the tires have cumulated
4 * 100,000 km and 20 failures. The cars whose tires did not fail add
suspension times to our analysis, as their tires worked but the failure
(yellow) was not due to a tire. The same applies to non-failed cars, as
their tires add operating time without failure. Moreover, the non-failed
tires (most of the cars have one tire failed and 3 non-failed) add
operational time, as their operation was suspended without failure.

The above leads to MTTF = 400,000 km / 20 = 20,000 km

Case 3: car fleet reliability

For 100,000 km (cumulated by failed and non-failed cars) the data shows
25 failures.

The above leads to MTTF = 100,000 km / 25 = 4,000 km
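
The three cases differ only in what is counted as exposure and what is counted as a failure. The short sketch below reproduces them from the fleet totals; the constant and function names are illustrative.

```python
FLEET_KM = 100_000        # km cumulated by the 30 cars
TIRE_FAILURES = 20
COMPUTER_FAILURES = 5
TIRES_PER_CAR = 4
COMPUTERS_PER_CAR = 1


def mttf(cumulated_time, failures):
    """Cumulated operating time (here in km) divided by the number of failures."""
    if failures <= 0:
        raise ValueError("at least one failure is needed")
    return cumulated_time / failures


# Case 1: one computer per car, so computer exposure equals fleet km
mttf_computer = mttf(FLEET_KM * COMPUTERS_PER_CAR, COMPUTER_FAILURES)

# Case 2: four tires per car, so tire exposure is four times the fleet km
mttf_tire = mttf(FLEET_KM * TIRES_PER_CAR, TIRE_FAILURES)

# Case 3: the car is the unit, and any failure of either type counts
mttf_car = mttf(FLEET_KM, TIRE_FAILURES + COMPUTER_FAILURES)

print(mttf_computer, mttf_tire, mttf_car)
```

The same field data therefore yields 20,000 km or 4,000 km depending on the failure definition, which is the point of the slide title.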

MTBUR

MTBUR is an acronym for Mean Time Between Unscheduled Removals. This is
an operational measurement. If all removals were caused by an actual
component failure, MTBUR would be equivalent to MTBF; that
is not usually the case, so MTBUR is usually less than MTBF.

MTBX IN INDUSTRY

The notation MTBX is used generically to highlight that the following is
applicable to MTTF, MTBF, MTBUR, etc.
The MTBX is defined as the cumulated time divided by the number
of events. The X varies depending on the definition of the events
(see previous slide, e.g. X = UR if the events are unscheduled
removals, etc.).
It is important to have a clear definition of the event. The following list
is not exhaustive:
❑ removal
❑ unscheduled removal
❑ justified removal
❑ failure
❑ induced failure
❑ confirmed failure

MTBUR = 2 MINUS 7

MTBUR: Mean time between unscheduled removals

MTBF: J (justified) uses box 4, N (non-induced) uses box 6, C (confirmed) uses box 8

[Figure: classification tree of removals, boxes numbered 1 to 9; box 6 carries the label "Predictions"]

1 Removal: unit removed
  2 Unscheduled Removal: known or suspected malfunction
    4 Failure / Fault: failure/fault found
      6 Failure / Fault: unit used within specification
        8 Confirmed / Accepted Failure / Fault: failure/fault substantiates the reason for removal
        9 Unconfirmed Failure / Fault: failure/fault does not substantiate the reason for removal
      7 Induced Failure / Fault: unit used out of specification
    5 Unjustified Removal: No Failure / No Fault Found (NFF)
  3 Scheduled Removal: unit removed to perform maintenance
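
The effect of the event definition on the resulting MTBX can be sketched with a few removal records. Everything here is hypothetical: the record fields, the counts and the cumulated hours are invented, and the nesting of the boxes (2 splits into 4 and 5, 4 into 6 and 7, 6 into 8 and 9) is an assumption read from the box numbering.

```python
CUMULATED_HOURS = 40_000  # hypothetical fleet operating time


def make(n, **flags):
    """Build n identical removal records (helper for the example data)."""
    return [dict(flags) for _ in range(n)]


records = (
    make(2, scheduled=True, fault_found=False, induced=False, confirmed=False)
    + make(2, scheduled=False, fault_found=False, induced=False, confirmed=False)  # NFF
    + make(1, scheduled=False, fault_found=True, induced=True, confirmed=False)
    + make(1, scheduled=False, fault_found=True, induced=False, confirmed=False)
    + make(4, scheduled=False, fault_found=True, induced=False, confirmed=True)
)


def count_boxes(records):
    """Count the events falling in boxes 2, 4, 6, 7 and 8 of the tree."""
    unscheduled = [r for r in records if not r["scheduled"]]   # box 2
    justified = [r for r in unscheduled if r["fault_found"]]   # box 4
    induced = [r for r in justified if r["induced"]]           # box 7
    non_induced = [r for r in justified if not r["induced"]]   # box 6
    confirmed = [r for r in non_induced if r["confirmed"]]     # box 8
    return {2: len(unscheduled), 4: len(justified), 6: len(non_induced),
            7: len(induced), 8: len(confirmed)}


def mtbx(cumulated_time, events):
    if events <= 0:
        raise ValueError("at least one event is needed")
    return cumulated_time / events


boxes = count_boxes(records)
mtbur = mtbx(CUMULATED_HOURS, boxes[2] - boxes[7])   # "2 minus 7"
mtbf_j = mtbx(CUMULATED_HOURS, boxes[4])             # justified
mtbf_n = mtbx(CUMULATED_HOURS, boxes[6])             # non-induced
mtbf_c = mtbx(CUMULATED_HOURS, boxes[8])             # confirmed
print(round(mtbur), round(mtbf_j), round(mtbf_n), round(mtbf_c))
```

With the same operating hours, the stricter the event definition, the fewer events are counted and the higher the resulting MTBX.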
DATA – HOURS TO FAILURE EXAMPLE
The following pages present an example of a unit that should meet, by
contract, a minimum MTBUR value of 4,000 hours.
The overall analysis of the unit’s performance, measured by MTBUR = (sum
of times to removal) / (number of removals), shows compliance with the
contractual value.

[Figure: MTBUR cumulated over the life]
[Figure: partial extract of the data]

The first approach is to take each individual item contributing to the analysis
and to enter its known operating time (first column) and whether it ended in a
removal (second column) or a suspension (third column).
Using the time to removal or to suspension of each individual unit in the fleet
gives the performance over the life of the product since entry into service.
This approach is not sensitive to any design / manufacturing / installation /
operational / maintenance changes over time, as one is unable to say in which
year the failure encountered after 8h occurred.
Note: complete data is in the file “MTBUR numerical example” attached on
Moodle.
MTBX IN INDUSTRY
It is very common not to have access to each individual product’s performance, due to lack of
traceability, but to have access to the overall performance of the fleet.
For example, suppose I am a car producer and I track the performance of an electronic
computer. I shipped the computer with serial number XX to my Customer A, who has Car Number 1
and Car Number 2. I do not know how many hours unit serial XX has operated,
because I do not know whether it was installed on Car Number 1 or on Car Number 2 (my Customer did
not provide me with this data), but I know that, during the month of October, both cars together
accumulated ZZ hours. Extending this situation to all my Customers, I cannot use the exact
operating time to removal/failure of each individual computer to understand the overall
performance of my computers.

What if, instead of using the time to failure of each component in the sample, we use the
performance observed over a specific period of time?
Instead of the exact operating time of each individual computer, I then rely on the
number of hours cumulated over a period of time and the number of failures over the same
period.
Depending on the time frame chosen for the study, multiple options are available when counting
the cumulated operating time and the cumulated number of removals/failures:
❑ over the last fiscal / calendar year: very useful for economic purposes, as many financial targets
are set per fiscal year and these targets are influenced by the reliability performance
❑ over the last four months: useful to observe a trend, as well as seasonal (e.g. winter/summer)
behaviour
❑ over the last 12 months: very useful to observe a more stable performance; as 12 months of data
generally cover a large amount of time (number of hours, cycles, km, miles, etc.) and a
significant number of events, short-term variability is attenuated
❑ during a specific time frame: e.g. in the car/train industry, 50,000 km regardless of calendar time;
some particular cases might require observation over a non-standard time frame
❑ since the beginning of life: MTBX shows the overall performance of the product since entry
into service; as today’s products generally have long lives (years), one should pay attention to
major changes that might impact any of the 3 parameters of the reliability definition (product,
operational conditions, function), e.g. changes in manufacturing, design updates, extra
functionality added to the product, changes in operating conditions, changes in maintenance
procedures, a change of maintenance team (in most cases to a better one), etc. OBSERVATION: this
approach should give the same result as the previous approach using the individual
contribution of each item in the sample.

It makes sense to track the variations of this MTBX number over time.

MTBUR: MOVING AVERAGE
Moving average reports are used to show the performance of a product
over time. Depending on the reliability of the product and the size of the
sample, among other things, the performance (MTBUR/MTBF) of the product
measured over small time periods (say monthly) could vary widely and will
not be a real indicator of performance.
Take for example a product that in one month had 2 unscheduled removals
over 20,000 Flight Hours and in the following months had 3 unscheduled
removals and 1 unscheduled removal respectively, for the same amount of
flight hours. The monthly MTBUR for this product will be 10,000, 6,667 and
20,000 Flight Hours for each month respectively: a wide variation. The
variation is considered natural noise in the data.
In order to get a better picture of the true performance of the product, we can
instead calculate the average performance over a fixed period of time
ending on the month in question. This is a moving average. So for a 3 month
moving average, data from any one month is averaged with data from the
prior two months. However, in many cases, 3 months may still be too short a
time period to dampen, or "filter" out, this natural noise, so 6 and 12 month
moving average reports are also provided.
Note also that using a "longer" moving average report (12MMA vs 3MMA)
may have its downsides. The longer the time period used for moving
averages, the longer it will take for true shifts in the performance to show
up.
A balanced use of these reports should provide the information
needed for making decisions, always keeping in mind what question you're
trying to answer.
The following pages present different analyses of the same data. First is the
“classic approach”; then the data is obtained by reading the monthly
operating times and failures.

Reference: https://havrel.honeywell.com/docs/index.cfm?content=help/FAQ.cfm#8

DATA – CUMULATED OPERATING TIME
The same numerical data as before is now analyzed by month:
instead of each operating time to removal, we analyze the cumulated
operating time per month and the number of removals per month.
The following analysis takes into account the monthly operating time
instead of the individual times to failure. So, for 201202 (February
2012), without knowing how many units operated, it is known that the
cumulative operating time was 1,091h.
The overall MTBUR (sum of all operating times divided by number of
removals) is (obviously) the same as the one computed before.

[Figure: partial extract of the data]


3 AND 12 MONTHS MOVING AVERAGE
MTBUR

Instead of using the total operating time, the 3 months moving
average MTBUR gives the performance over the last 3 months
(computed as the sum of the operating times of the last 3 months
divided by the number of removals encountered over the last 3
months).
The 3 months moving average MTBUR can be computed for any
month that has data for the 2 preceding months.

In a similar manner, the 12 months moving average reflects the
performance of the last 12 months.
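
A minimal sketch of the moving average computation, using the three-month Flight Hours example quoted earlier in this section (20,000 hours per month with 2, 3 and 1 unscheduled removals). The function name is illustrative; months without a complete window, or windows without any removal, are reported as None.

```python
def moving_average_mtbx(hours, events, window):
    """For each month, sum the operating time and the events of the last
    `window` months (including the current one) and divide them."""
    if len(hours) != len(events):
        raise ValueError("hours and events must have the same length")
    result = []
    for i in range(len(hours)):
        if i + 1 < window:
            result.append(None)          # not enough history yet
            continue
        h = sum(hours[i + 1 - window : i + 1])
        e = sum(events[i + 1 - window : i + 1])
        result.append(h / e if e else None)
    return result


monthly_hours = [20_000, 20_000, 20_000]
monthly_removals = [2, 3, 1]

monthly_mtbur = moving_average_mtbx(monthly_hours, monthly_removals, 1)
three_month_mtbur = moving_average_mtbx(monthly_hours, monthly_removals, 3)

print([round(v) for v in monthly_mtbur])
print(three_month_mtbur)
```

The monthly values swing between roughly 6,667 and 20,000 hours, while the 3 months value for the last month is 60,000 / 6 = 10,000 hours. Note that the moving average divides summed hours by summed removals; it is not the mean of the monthly MTBUR values.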

DATA INTERPRETATION
[Figure: 3 months MTBUR, 12 months MTBUR and Contract value, plotted monthly on a log scale (axis ticks at 1,800 and 18,000), from 201101 to 201611]

▪ The 3 months average is not representative, due to the low number of removals
and to the high variation of this number (too much noise)
▪ For a short period of time at the end of summer 2013, performance
was below target
▪ 2013-2016 shows a performance around the minimum contractual value
▪ Since March 2016, the unit is performing below target
It is interesting to understand the removal reason that induced the
decreasing trend from 2011 to 2013. It looks like an ageing failure mode
is setting in.
▪ Once the ageing units are balanced by the newly installed ones, over
2013-2016, the overall performance is quite constant
▪ From March 2016, the unit’s performance is constantly decreasing. Action
needs to be taken to improve the performance. The generic word "action" is
defined by the following process (FRACAS – failure reporting, analysis,
and corrective action system):
  - Identify the driving removal reason and the driving failure mode(s)
  - Understand the field root causes initiating these failure modes
  - Update the FMEA
  - Implement actions to reduce/eliminate the impact of these root causes (either by eliminating the root
    causes or by reducing/eliminating their impact)
  - Monitor the effectiveness of the measures by tracking the performance of the units with these corrective
    actions implemented
EXAMPLE OF USE

An investigation of the fuel control system of the F100 Engine was
conducted. Some of the recommendations were justified using the
MTBUR moving average technique.

[Figure: extract of the report showing a recommendation and its justification]

Reference: Google books: Final Report on the Fuel Control system of the F100 Engine

MTBUR – 12 MONTHS MOVING AVERAGE
EXAMPLE ON A CAR ITEM
Example of variations induced by the age of the fleet. Early years
showed a very good MTBUR due to the fact that all vehicles were
equipped with new items. Towards 2012, the MTBUR increases,
potentially due to any of the following:
❑ design change (new parts, improved operational conditions: e.g.
reduced temperature, vibration, improved hermeticity to moisture,
etc.)
❑ large order (many new vehicles on the market)
❑ maintenance improved
❑ operating conditions improved
❑ new manufacturing line and/or process (or existing one upgraded)
❑ new supplier
❑ etc.

[Figure: 12-M MTBUR versus TARGET, in unit km (axis from 0 to 25,000), from Feb-05 to Feb-13]

B10

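B10 is the operating time by which 10% of a sample of devices
will have failed, i.e. the time at which R(t) = 90%.

[Figure: two reliability functions R(t), 1 and 2, starting at 1; the horizontal line 10% below 1 crosses them at B10 of component 1 and B10 of component 2, with B10 (1) < B10 (2)]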
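
On a sample of observed times to failure, B10 can be read as the time at which 10% of the units have failed. The sketch below is one simple empirical convention, assuming complete data (every unit ran to failure); the function name and the sample are hypothetical, and in practice the value is often read from a fitted life distribution instead.

```python
import math


def bx_life(times_to_failure, x):
    """Smallest observed time by which at least a fraction x of the sample
    has failed (complete data assumed, no suspensions)."""
    if not times_to_failure:
        raise ValueError("need at least one observation")
    if not 0.0 < x <= 1.0:
        raise ValueError("x must be in (0, 1]")
    ordered = sorted(times_to_failure)
    # Number of failures needed to reach the fraction x; the small tolerance
    # guards against floating-point round-up of x * n.
    k = max(1, math.ceil(x * len(ordered) - 1e-9))
    return ordered[k - 1]


# Hypothetical times to failure (hours) for 20 devices
sample_hours = [1200, 300, 4500, 800, 2500, 3100, 950, 1800, 2200, 5200,
                640, 1500, 2750, 3900, 1100, 4100, 700, 2000, 3300, 1650]

b10 = bx_life(sample_hours, 0.10)
b50 = bx_life(sample_hours, 0.50)
print("B10 =", b10, "h; B50 =", b50, "h")
```

With 20 devices, 10% corresponds to the second failure, so B10 is the second-smallest time in the sample.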
BX – TOOL FOR COMPARISON

Depending on the value of X, the comparison can give different results.
In the example below, for a small value of X, the component represented
by the red reliability function (2) provides a better value (B2 > B1).
For a large value of X, component 1 (black reliability function on the
graph) provides better results (B1 > B2).
Even though, from a mathematical point of view, X can take any value
within (0, 1), typical values for X are 5%, 10% or even (rarely) 20%.

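[Figure: reliability functions R(t) of components 1 (black) and 2 (red),
which cross. At the X% level the B values on the t axis appear in the
order 1, 2; at the Y% level they appear in the order 2, 1.]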
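The crossing of the two curves can be reproduced numerically. The sketch below assumes both components follow a Weibull reliability function R(t) = exp(-(t/η)^β), for which BX = η(-ln(1-X))^(1/β). The parameter values are hypothetical and chosen only so that the curves cross.

```python
import math


def b_life(x, beta, eta):
    """Time by which a fraction x of units has failed, for a Weibull R(t)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return eta * (-math.log(1.0 - x)) ** (1.0 / beta)


# Hypothetical components: 1 shows early failures, 2 wears out.
comp1 = dict(beta=0.8, eta=3000.0)
comp2 = dict(beta=3.0, eta=1000.0)

for x in (0.05, 0.10, 0.50, 0.90):
    print(f"B{round(x * 100)}: comp1={b_life(x, **comp1):8.1f}  "
          f"comp2={b_life(x, **comp2):8.1f}")
```

With these values component 2 has the better B5 and B10, while component 1 has the better B90, so the ranking depends on the X chosen for the comparison.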
INDU 6391 DR. SORIN VOICULESCU 146
AVAILABILITY

INDU 6391 DR. SORIN VOICULESCU 147

MTBFMONTREAL.CA
MUT / MDT
Mean up time (MUT) is a measure of the mean time a machine,
typically a computer, has been working and available. Uptime is the
opposite of downtime. It is often used as a measure of computer
operating system reliability or stability, in that it represents the
time a computer can be left unattended without crashing or needing
to be rebooted for administrative or maintenance purposes.
Mean down time (MDT) is a measure of the mean time during which a
system is unavailable. Downtime, or outage duration, refers to a period
of time during which a system fails to provide or perform its primary
function.

$MUT = \frac{\text{Cumulated running time after 1st failure}}{\text{Number of intervals between 2 consecutive failures}}$

$MDT = \frac{\text{Cumulated down-time}}{\text{Number of failures after the 1st one}}$

$MTBF = MUT + MDT$

$Availability = \frac{MUT}{MUT + MDT}$
Reference: WIKIPEDIA MTBF
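A small sketch of these ratios, assuming the up and down durations between consecutive failures have already been extracted from a maintenance log. The availability expression MUT / (MUT + MDT) is the usual steady-state form and is taken here as an assumption; the function name and the durations are hypothetical.

```python
def availability_metrics(up_times, down_times):
    """Return (MUT, MDT, MTBF, availability) from observed durations."""
    if not up_times or not down_times:
        raise ValueError("at least one up and one down interval is required")
    mut = sum(up_times) / len(up_times)
    mdt = sum(down_times) / len(down_times)
    mtbf = mut + mdt
    return mut, mdt, mtbf, mut / mtbf


# Hypothetical log, in hours: three running periods and three outages.
mut, mdt, mtbf, availability = availability_metrics([90, 110, 100], [10, 8, 12])
print(mut, mdt, mtbf, round(availability, 4))
```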
INDU 6391 DR. SORIN VOICULESCU 148
EXAMPLE IN CLASS

INDU 6391 DR. SORIN VOICULESCU 149

MTBFMONTREAL.CA
MAINTAINABILITY

Reliability from Concept to Culture


The following items are typical:
❑ diagnosis time
❑ part procurement time
❑ teardown time
❑ rebuild time
❑ verification time

INDU 6391 DR. SORIN VOICULESCU 150


MTBFMONTREAL.CA
MAINTAINABILITY

In engineering, maintainability is the ease with which a product can be
maintained in order to:
❑ isolate defects or their cause,
❑ correct defects or their cause,
❑ repair or replace faulty or worn-out components without having to
replace still working parts,
❑ prevent unexpected breakdowns,
❑ maximize a product's useful life,
❑ maximize efficiency, reliability, and safety,
❑ meet new requirements,
❑ make future maintenance easier, or
❑ cope with a changed environment

In telecommunication and several other engineering fields, the term
maintainability has the following meanings:
❑ A characteristic of design and installation, expressed as the
probability that an item will be retained in or restored to a specified
condition within a given period of time, when the maintenance is
performed in accordance with prescribed procedures and resources.
❑ The ease with which maintenance of a functional unit can be
performed in accordance with prescribed requirements.

INDU 6391 DR. SORIN VOICULESCU 151


MTBFMONTREAL.CA
FAILURE RATE

Failure rate is the frequency with which an engineered system or
component fails, expressed in failures per unit of time. It is often
denoted by the Greek letter λ (lambda) and is widely used in
reliability engineering.
Depending on the timeframe considered, one can have:
- Instantaneous failure rate (time interval -> 0)
- Daily failure rate
- Annualized failure rate
- 75K miles failure rate, etc.
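As a rough illustration, an average failure rate can be estimated as the number of failures divided by the cumulated operating time, then rescaled to the timeframe of interest. The sketch assumes a constant rate over the observation period; the function name and the fleet figures are hypothetical.

```python
HOURS_PER_DAY = 24
HOURS_PER_YEAR = 8760  # 365 days x 24 h, ignoring leap years


def failure_rate(failures, cumulative_time):
    """Average failure rate: failures per unit of cumulated operating time."""
    if cumulative_time <= 0:
        raise ValueError("cumulative_time must be positive")
    return failures / cumulative_time


# Hypothetical test: 50 units run for 2000 h each, 6 failures observed.
lam_per_hour = failure_rate(failures=6, cumulative_time=50 * 2000)
lam_per_day = lam_per_hour * HOURS_PER_DAY
lam_per_year = lam_per_hour * HOURS_PER_YEAR
print(lam_per_hour, lam_per_day, lam_per_year)
```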

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 153


MTBFMONTREAL.CA
SOFTWARE

Software reliability is the probability of failure-free software
operation for a specified period of time in a specified environment.
It is also an important factor affecting system reliability. It differs
from hardware reliability in that it reflects design perfection rather
than manufacturing perfection. The high complexity of software is the
major contributing factor to software reliability problems. Software
reliability is not a function of time, although researchers have come
up with models relating the two. Modeling techniques for software
reliability are maturing, but before using one we must carefully
select the model that best suits our case. Measurement in software is
still in its infancy: no good quantitative methods have been developed
to represent software reliability without excessive limitations.
Various approaches can be used to improve the reliability of software;
however, it is hard to balance development time and budget against
software reliability.

Dr. Sorin Voiculescu

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 161


MTBFMONTREAL.CA
SOFTWARE

A partial list of the distinct characteristics of software compared to
hardware is given below [Keene94]:
Failure cause: Software defects are mainly design defects.
Wear-out: Software does not have an energy-related wear-out phase.
Errors can occur without warning.
Repairable system concept: Periodic restarts can help fix software
problems.
Time dependency and life cycle: Software reliability is not a function of
operational time.
Environmental factors: Do not affect software reliability, except that
they might affect program inputs.
Reliability prediction: Software reliability cannot be predicted from any
physical basis, since it depends completely on human factors in design.
Redundancy: Cannot improve software reliability if identical software
components are used.
Interfaces: Software interfaces are purely conceptual rather than visual.
Failure rate motivators: Usually not predictable from analyses of
separate statements.
Built with standard components: Well-understood and extensively tested
standard parts help improve maintainability and reliability, but the
software industry has not followed this trend. Code reuse has been
around for some time, but to a very limited extent. Strictly speaking
there are no standard parts for software, except some standardized logic
structures.

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 162


MTBFMONTREAL.CA
SOFTWARE

Software reliability, however, does not show the same characteristics
as hardware reliability. A possible curve is shown in Figure 2, where
software reliability is projected on the same axes. [RAC96] There are
two major differences between the hardware and software curves. One
difference is that in the last phase, software does not have an
increasing failure rate as hardware does. In this phase, software is
approaching obsolescence; there is no motivation for any upgrades or
changes to the software. Therefore, the failure rate will not change.
The second difference is that in the useful-life phase, software
experiences a drastic increase in failure rate each time an upgrade is
made. The failure rate then levels off gradually, partly because the
defects are found and fixed after the upgrades.

The upgrades in Figure 2 are feature upgrades, not upgrades for
reliability. For feature upgrades, the complexity of software is likely
to increase, since the functionality of the software is enhanced. Even
bug fixes may cause more software failures, if the fix introduces other
defects into the software. For reliability upgrades, a drop in software
failure rate is possible if the goal of the upgrade is enhancing
software reliability, such as a redesign or reimplementation of some
modules using better engineering approaches, such as the clean-room
method.

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 163


MTBFMONTREAL.CA
SOFTWARE

Software Reliability Models


A proliferation of software reliability models has emerged as people
try to understand how and why software fails, and try to quantify
software reliability. Over 200 models have been developed since the
early 1970s, but how to quantify software reliability remains largely
unsolved. Interested readers may refer to [RAC96], [Lyu95]. Although
there are many models, with more emerging, none of them can capture a
satisfying amount of the complexity of software; constraints and
assumptions have to be made for the quantifying process. Therefore,
there is no single model that can be used in all situations. No model
is complete or even representative. One model may work well for a
certain set of software, but may be completely off track for other
kinds of problems.
Most software models contain the following parts: assumptions,
factors, and a mathematical function that relates the reliability to
the factors. The mathematical function is usually higher-order
exponential or logarithmic.
Software modeling techniques can be divided into two subcategories:
prediction modeling and estimation modeling. [RAC96] Both kinds of
modeling techniques are based on observing and accumulating
failure data and analyzing it with statistical inference. The major
differences between the two kinds of model are shown in Table 1.

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 164


MTBFMONTREAL.CA
SOFTWARE

Reliability from Concept to Culture


Table 1. Difference between software reliability prediction models and
software reliability estimation models
Representative prediction models include Musa's Execution Time Model, Putnam's
Model, and Rome Laboratory models TR-92-51 and TR-92-15. Using
prediction models, software reliability can be predicted early in the
development phase and enhancements can be initiated to improve the reliability.
Representative estimation models include exponential distribution models,
the Weibull distribution model, Thompson and Chelson's model, etc. Exponential
models and the Weibull distribution model are usually classed as classical fault
count/fault rate estimation models, while Thompson and Chelson's model belongs
to the Bayesian fault rate estimation models.
Two conclusions can be drawn: first, the field has matured to the point that
software models can be applied in practical situations and give meaningful
results; second, there is no one model that is best in all situations. [Lyu95]
Because of the complexity of software, any model has to make extra assumptions,
and only a limited number of factors can be taken into consideration. Most
software reliability models ignore the software development process and focus
on the results: the observed faults and/or failures. By doing so, complexity is
reduced and abstraction is achieved; however, the models tend to become
specialized and apply to only a portion of the situations and a certain class
of problems. We have to carefully choose the right model for our specific case.
Furthermore, the modeling results cannot be blindly believed and applied.
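To give a feel for what such a model looks like, here is a sketch of the commonly published form of Musa's basic execution time model. The equations are not given on the slides and are taken here as an assumption: the expected number of failures after execution time τ is μ(τ) = ν0(1 - exp(-λ0τ/ν0)) and the failure intensity is λ(τ) = λ0 exp(-λ0τ/ν0), where λ0 is the initial failure intensity and ν0 the total expected number of failures. The parameter values are hypothetical.

```python
import math


def musa_basic(tau, lam0, nu0):
    """Return (expected failures, failure intensity) after execution time tau."""
    decay = math.exp(-lam0 * tau / nu0)
    mu = nu0 * (1.0 - decay)   # expected cumulative failures
    lam = lam0 * decay         # current failure intensity
    return mu, lam


# Hypothetical parameters: 10 failures per CPU-hour initially, 100 in total.
for tau in (0.0, 10.0, 50.0):
    mu, lam = musa_basic(tau, lam0=10.0, nu0=100.0)
    print(f"tau={tau:5.1f}  expected failures={mu:6.2f}  intensity={lam:6.3f}")
```

In this form the intensity falls linearly with the number of failures already experienced, λ = λ0(1 - μ/ν0), which is the model's central assumption.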

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 165


MTBFMONTREAL.CA
SOFTWARE

Software Reliability Metrics


Measurement is commonplace in other engineering fields, but not in software
engineering. Though frustrating, the quest to quantify software reliability has never
ceased. Until now, we still have no good way of measuring software reliability.
Measuring software reliability remains a difficult problem because we don't have a
good understanding of the nature of software. There is no clear definition of which
aspects are related to software reliability. We cannot find a suitable way to measure
software reliability, or most of the aspects related to it. Even the most obvious
product metrics, such as software size, have no uniform definition.
If we cannot measure reliability directly, it is tempting to measure something related
to reliability that reflects its characteristics. The current practices of software
reliability measurement can be divided into four categories: [RAC96]
Product metrics
Software size is thought to be reflective of complexity, development effort and
reliability. Lines Of Code (LOC), or LOC in thousands (KLOC), is an intuitive initial
approach to measuring software size, but there is no standard way of counting.
Typically, source code is used (SLOC, KSLOC) and comments and other non-executable
statements are not counted. This method cannot faithfully compare software not written
in the same language. The advent of new technologies for code reuse and code
generation also casts doubt on this simple method.
The function point metric is a method of measuring the functionality of a proposed
software development based upon a count of inputs, outputs, master files, inquiries,
and interfaces. The method can be used to estimate the size of a software system as
soon as these functions can be identified. It is a measure of the functional complexity
of the program. It measures the functionality delivered to the user and is independent
of the programming language. It is used primarily for business systems; it is not
proven in scientific or real-time applications.
Complexity is directly related to software reliability, so representing complexity is
important. Complexity-oriented metrics determine the complexity of a program's
control structure by simplifying the code into a graphical representation.
A representative metric is McCabe's Complexity Metric.
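McCabe's metric can be computed directly from the control-flow graph as V(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The sketch below applies this to a hypothetical function containing one if/else and one loop; the graph and the node names are illustrative only.

```python
def cyclomatic_complexity(edges, num_components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph given as edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * num_components


# Hypothetical control-flow graph: an if/else followed by a loop.
edges = [
    ("entry", "if"),
    ("if", "then"),
    ("if", "else"),
    ("then", "loop"),
    ("else", "loop"),
    ("loop", "body"),
    ("body", "loop"),
    ("loop", "exit"),
]
print(cyclomatic_complexity(edges))
```

The result equals the number of decision points (the if and the loop test) plus one.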

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 166


MTBFMONTREAL.CA
SOFTWARE

Test coverage metrics are a way of estimating faults and reliability by
performing tests on software products, based on the assumption that software
reliability is a function of the portion of software that has been successfully
verified or tested. Detailed discussion of various software testing methods
can be found in the topic Software Testing.
Project management metrics
Researchers have realized that good management can result in better products.
Research has demonstrated that a relationship exists between the development
process and the ability to complete projects on time and within the desired
quality objectives. Costs increase when developers use inadequate processes.
Higher reliability can be achieved by using a better development process, risk
management process, configuration management process, etc.
Process metrics
Based on the assumption that the quality of the product is a direct function of the
process, process metrics can be used to estimate, monitor and improve the
reliability and quality of software. ISO-9000 certification, or "quality
management standards", is the generic reference for a family of standards
developed by the International Organization for Standardization (ISO).
Fault and failure metrics
The goal of collecting fault and failure metrics is to be able to determine when
the software is approaching failure-free execution. Minimally, both the number
of faults found during testing (i.e., before delivery) and the failures (or other
problems) reported by users after delivery are collected, summarized and
analyzed to achieve this goal. Test strategy is closely related to the
effectiveness of fault metrics: if the testing scenario does not cover the full
functionality of the software, the software may pass all tests and yet be prone
to failure once delivered. Usually, failure metrics are based upon customer
information regarding failures found after release of the software. The failure
data collected is then used to calculate failure density, Mean Time Between
Failures (MTBF) or other parameters to measure or predict software reliability.
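A minimal sketch of two such parameters. It assumes that density is expressed as faults per thousand source lines (KSLOC) and that MTBF is estimated as cumulated operating time divided by the number of failures; both conventions, the function names and the figures are assumptions made for illustration.

```python
def fault_density(faults, sloc):
    """Faults per thousand source lines of code (KSLOC)."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return faults / (sloc / 1000.0)


def mtbf(operating_hours, failures):
    """Point estimate of MTBF from field data."""
    if failures <= 0:
        raise ValueError("at least one failure is needed for an estimate")
    return operating_hours / failures


# Hypothetical release: 45 faults found in 30,000 SLOC during testing,
# then 12 field failures reported over 6,000 cumulated operating hours.
print(fault_density(45, 30_000), mtbf(6_000, 12))
```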

Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 167


MTBFMONTREAL.CA
SOFTWARE
Software Reliability Improvement Techniques

Good engineering methods can largely improve software reliability.
Before the deployment of software products, testing, verification and validation are necessary
steps. Software testing is heavily used to trigger, locate and remove software defects.
Software testing is still in its infancy; testing is crafted to suit specific needs in various
software development projects in an ad-hoc manner. Various analysis tools such as trend
analysis, fault-tree analysis, Orthogonal Defect Classification and formal methods can also
be used to minimize the possibility of defect occurrence after release and therefore improve
software reliability.
After deployment of the software product, field data can be gathered and analyzed to study
the behavior of software defects. Fault tolerance and fault/failure forecasting techniques
provide guidelines to minimize fault occurrence or the impact of a fault on the system.
Conclusions
Software reliability is a key part of software quality. The study of software reliability can be
categorized into three parts: modeling, measurement and improvement.
Software reliability modeling has matured to the point that meaningful results can be obtained
by applying suitable models to the problem. Many models exist, but no single model
can capture a necessary amount of the software characteristics. Assumptions and abstractions
must be made to simplify the problem. There is no single model that is universal to all
situations.
Software reliability measurement is still naive. Measurement is far from commonplace in
software, unlike in other engineering fields. "How good is the software, quantitatively?" As
simple as the question is, there is still no good answer. Software reliability cannot be
directly measured, so other related factors are measured to estimate software reliability and
compare it among products. The development process and the faults and failures found are all
factors related to software reliability.
Software reliability improvement is hard. The difficulty of the problem stems from insufficient
understanding of software reliability and, in general, of the characteristics of software. Until
now there is no good way to conquer the complexity problem of software. Complete testing of a
moderately complex software module is infeasible. A defect-free software product cannot be
assured. Realistic constraints of time and budget severely limit the effort put into software
reliability improvement.
As more and more software creeps into embedded systems, we must make sure it doesn't
embed disasters. If not considered carefully, software reliability can be the reliability
bottleneck of the whole system. Ensuring software reliability is no easy task. As hard as the
problem is, promising progress is still being made toward more reliable software. More
standard components and better processes are being introduced in the software engineering field.
Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

INDU 6391 DR. SORIN VOICULESCU 168


TOOLS
FTA
FMEA

INDU 6391 DR. SORIN VOICULESCU 182

MTBFMONTREAL.CA
FMEA: BOTTOM-UP APPROACH

Failure Modes and Effects Analysis (FMEA) is a systematic technique for
identifying and preventing product and process problems before they
occur.
With FMEA, you explore potential failure modes at the lowest level
(installation/component/piece-part) and identify the potential effects
of each failure up to the system level. It generally addresses the
propagation of the effect of a single failure up to the system level.
[Diagram: Low level -> System]

Dr. Sorin Voiculescu

Reference: WEB

INDU 6391 DR. SORIN VOICULESCU 183


MTBFMONTREAL.CA
SEVERITY RANKING
An industry-dedicated ranking is generally used.
The table must be common for all the items in the project.
If you do not have any reference, you can use one of the below:

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 185


MTBFMONTREAL.CA
PROBABILITY RANKING
An industry-dedicated ranking is generally used.
The table must be common for all the items in the project.
If you do not have any reference, you can use one of the below:

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 186


MTBFMONTREAL.CA
DETECTABILITY RANKING

For INDU 6391, DET = 1 for all elements (for simplicity)
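A common way to combine the three rankings is the risk priority number, RPN = SEV x PROB x DET. The slides do not state this formula, so it is used here as an assumption; the items, failure modes and rankings are hypothetical. DET defaults to 1, as stated above for INDU 6391.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    item: str
    mode: str
    sev: int      # severity ranking
    prob: int     # probability (occurrence) ranking
    det: int = 1  # detectability ranking, fixed to 1 for INDU 6391

    @property
    def rpn(self):
        """Risk priority number (assumed form: SEV x PROB x DET)."""
        return self.sev * self.prob * self.det


# Hypothetical FMEA rows.
modes = [
    FailureMode("pump", "seal leak", sev=7, prob=3),
    FailureMode("motor", "winding short", sev=9, prob=2),
    FailureMode("sensor", "drift", sev=4, prob=5),
]
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(m.item, m.mode, m.rpn)
```

Because the ranking tables must be common to all items in the project, the resulting numbers are comparable only within that project.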

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 187


MTBFMONTREAL.CA
FTA: TOP-DOWN APPROACH

Fault Tree Analysis (FTA) is a top-down approach to failure analysis.
You can use an FTA to identify high-level (system) failures and to
eliminate the cause of the failure. An FTA is a systematic, deductive
method for defining a single specific undesirable event and determining
all possible failures and combinations of failures that could cause the
event in question to occur.
[Diagram: System -> Component]

INDU 6391 DR. SORIN VOICULESCU 188


MTBFMONTREAL.CA
QUANTITATIVE VS QUALITATIVE IN
RELIABILITY (AND SAFETY)

Qualitative
❑ No numerical probability
❑ Used in early phases of the project
❑ Allows early identification of the top risk elements
❑ Some safety requirements checked (e.g. no single event leads to
top event)
❑ Early link between FMEA and FTA
❑ FMEA probability is expressed on a qualitative scale, e.g. low,
medium, high; 1 to 5; 1 to 10; etc.
❑ FTA has no probability number associated

Quantitative
❑ Numerical probability
❑ Used in detailed phases of the project
❑ Allows computation of the predicted reliability of the design and
break-down by major sub-systems (e.g. power supply, motor, etc.)
❑ Numerical validation of the safety requirements (e.g. top event
probability is less than 2E-9/h)
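A sketch of the quantitative side for a small fault tree, assuming independent basic events: an AND gate multiplies the input probabilities and an OR gate gives one minus the product of the complements. The tree structure and the per-hour probabilities are hypothetical; the 2E-9/h target is the example quoted above.

```python
from math import prod


def and_gate(probs):
    """All inputs must occur (independent events)."""
    return prod(probs)


def or_gate(probs):
    """At least one input occurs (independent events)."""
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical tree: TOP = (A AND B) OR C, probabilities per hour.
p_a, p_b, p_c = 1e-5, 1e-4, 5e-10
p_top = or_gate([and_gate([p_a, p_b]), p_c])
meets_target = p_top < 2e-9
print(p_top, meets_target)
```

The qualitative check on the same tree would simply note that C alone leads to the top event, i.e. a single event can cause it.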

INDU 6391 DR. SORIN VOICULESCU 203


MTBFMONTREAL.CA
QUANTITATIVE VS QUALITATIVE IN
RELIABILITY (AND SAFETY): FMEA
EXAMPLE
Qualitative

Reliability from Concept to Culture


Quantitative

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 204


MTBFMONTREAL.CA
QUANTITATIVE VS QUALITATIVE IN
RELIABILITY (AND SAFETY): FTA EXAMPLE

Qualitative

Reliability from Concept to Culture


Quantitative

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 205


MTBFMONTREAL.CA
STATISTICS FOR RELIABILITY
❑ Exponential
❑ Weibull
❑ Normal
❑ Log-Normal

INDU 6391 DR. SORIN VOICULESCU 206


MTBFMONTREAL.CA
MATHEMATICAL REPRESENTATION OF
RELIABILITY

The graph below shows two curves:
- C1: a step-down curve that follows the evolution of the operating
fraction (in percentage) of a finite sample operating under given
conditions. Note that both the operation (WHAT function the units
should produce) and the operating conditions are defined. Each time
one unit out of n fails, the operating fraction reduces by 1/n.
- C2: a continuous curve that APPROXIMATES the above step-down
one.
In theory, the larger the sample size, the closer the step-down curve
C1 gets to a continuous form. The better the mathematical model chosen
for C2, the closer C2 gets to how the products perform in the field.
REMEMBER: a mathematical model for C2 is only as good as the
selection criteria are.
C2 can be mathematically modeled by a continuous function
$R(t \mid \alpha_1, \alpha_2, \ldots, \alpha_s)$ where $\alpha_1, \ldots, \alpha_s$ are the model's parameters.
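The step-down curve C1 can be built directly from observed failure times. The sketch below assumes a complete sample (every one of the n units has failed and its failure time is known); the function name and the failure times are hypothetical.

```python
def empirical_reliability(failure_times):
    """Return (time, surviving fraction) steps for a complete sample of n units."""
    times = sorted(failure_times)
    n = len(times)
    # After the i-th failure, n - i units out of n are still operating.
    return [(t, (n - i) / n) for i, t in enumerate(times, start=1)]


# Hypothetical failure times, in hours, for a sample of 5 units.
steps = empirical_reliability([560, 120, 1500, 340, 900])
for t, surviving in steps:
    print(t, surviving)
```

Fitting C2 then amounts to choosing the model parameters so that the continuous curve passes as close as possible to these steps.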

INDU 6391 DR. SORIN VOICULESCU 207


MTBFMONTREAL.CA
MATHEMATICAL REPRESENTATION OF
RELIABILITY

Independent of the chosen model, some important functions are generally
used in reliability*:
❑ reliability function $R(t) = \Pr(T \ge t)$: the probability that $T$
takes a value higher than or equal to $t$
❑ unreliability (cumulative distribution) $F(t) = 1 - R(t)$: the
probability that $T$ takes a value less than $t$ (for a continuous $T$,
equivalently less than or equal to $t$)
❑ probability density function $f(t) = -\frac{dR(t)}{dt}$: a function
that describes the relative likelihood for this random variable to take
on a given value
❑ hazard rate $h(t)$ and failure rate $\lambda(t)$:
$h(t) = \lambda(t) = \frac{f(t)}{R(t)}$: the frequency with which an
engineered system or component fails, expressed in failures per unit of
time (many papers and references use these terms interchangeably. The
hazard rate is the limit of the instantaneous failure rate given no
failures up to time $t$)
❑ $MTTF = \int_0^{\infty} R(t)\,dt$

The graphic form of each of the above varies depending on the model and
the parameter values.

* As defined in a previous lecture, the value of the time to failure $T$
cannot be known. The time to failure $T$ is a random variable.
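The relations between these functions can be checked numerically. The sketch below assumes an exponential model R(t) = exp(-λt) with a hypothetical λ, differentiates R numerically to obtain f, and integrates R with the trapezoidal rule to obtain the MTTF; for this model the expected results are h(t) = λ and MTTF = 1/λ.

```python
import math

lam = 0.002  # hypothetical constant failure rate, per hour


def R(t):
    return math.exp(-lam * t)


def F(t):
    return 1.0 - R(t)


def f(t, dt=1e-3):
    # Central difference approximation of f(t) = -dR/dt.
    return -(R(t + dt) - R(t - dt)) / (2.0 * dt)


def h(t):
    return f(t) / R(t)


def mttf(upper=20_000.0, steps=200_000):
    # Trapezoidal rule on [0, upper]; upper is chosen so R(upper) is negligible.
    dt = upper / steps
    total = 0.5 * (R(0.0) + R(upper))
    total += sum(R(i * dt) for i in range(1, steps))
    return total * dt


print(h(100.0), mttf())
```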

INDU 6391 DR. SORIN VOICULESCU 208


MTBFMONTREAL.CA
RELIABILITY PREFERRED MODELS

Choosing the right model to mathematically represent the reliability
evolution of a function within a specific project is critical for the
program. The selection of the model can largely impact:
❑ target definition
❑ test plan set-up
❑ test results interpretation
❑ trade-offs
❑ each phase exit decision (go/no-go decision)
INDU 6391 presents the following models:
❑ exponential: one parameter 𝜆
❑ Weibull: two parameters 𝛽 and 𝜂
❑ normal: two parameters 𝜇 and 𝜎
❑ log-normal: two parameters 𝜇 and 𝜎
Defining a reliability law is equivalent to defining the parameters
for the chosen model.
Note that the above 4 models are generally sufficient to characterize
most products and their related failure modes. Still, some
specific cases (specific failure modes, new technologies, etc.) might
require the use of other models.
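For reference, the reliability functions of the four models can be sketched as follows, using their standard textbook forms. For the log-normal model, 𝜇 and 𝜎 are taken to be the mean and standard deviation of ln(T); this parameterization and the function names are assumptions, since the slides list only the parameter symbols.

```python
import math


def _phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def r_exponential(t, lam):
    return math.exp(-lam * t)


def r_weibull(t, beta, eta):
    return math.exp(-((t / eta) ** beta))


def r_normal(t, mu, sigma):
    return 1.0 - _phi((t - mu) / sigma)


def r_lognormal(t, mu, sigma):
    # mu and sigma describe ln(T); valid for t > 0.
    return 1.0 - _phi((math.log(t) - mu) / sigma)


# Hypothetical parameter values, t in hours.
t = 500.0
print(r_exponential(t, lam=0.001), r_weibull(t, beta=2.0, eta=1000.0),
      r_normal(t, mu=1000.0, sigma=200.0), r_lognormal(t, mu=7.0, sigma=0.5))
```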

INDU 6391 DR. SORIN VOICULESCU 212


MTBFMONTREAL.CA
CHOOSING THE RIGHT MODEL
The reliability model is obviously related to the product. Field experience
has demonstrated that models are generally related to failure mode
mechanisms and are transferable from one product to another. The graph
below highlights the most usual associations of failure mode and
mathematical model:
❑ Exponential: a commonly used distribution in reliability engineering.
Mathematically, it is a fairly simple distribution, which often leads to
its use in inappropriate situations. It is used to model the behavior of units
that have a constant failure rate (or units that do not degrade with time or
wear out). There is no dominant failure mechanism and random failures
are expected.
❑ Weibull: one of the most widely used lifetime distributions in reliability
engineering. It is a versatile distribution that can take on the characteristics
of other types of distributions, based on the value of the shape parameter
𝛽. It can characterize:
✓ 0 < 𝛽 < 1: early life of the product
✓ 𝛽 = 1: random failures (equals the exponential distribution)
✓ 𝛽 > 1: failure modes induced by wear-out
❑ Normal: also known as the Gaussian distribution, is the most widely used
general-purpose distribution. It is for this reason that it is included among
the lifetime distributions commonly used for reliability and life data
analysis. There are some who argue that the normal distribution is
inappropriate for modeling lifetime data because the left-hand limit of the
distribution extends to negative infinity. This could conceivably result in
modeling negative times-to-failure. However, provided that the distribution
in question has a relatively high mean and a relatively small standard
deviation, the issue of negative failure times should not present itself as a
problem. The normal distribution has been shown to be useful
for modeling the lifetimes of consumable items, such as printer toner
cartridges.

INDU 6391 DR. SORIN VOICULESCU 213


MTBFMONTREAL.CA
CHOOSING THE RIGHT MODEL

❑ Log-Normal: The lognormal distribution is commonly used to model
the lives of units whose failure modes are of a fatigue-stress nature.
Since this includes most, if not all, mechanical systems, the lognormal
distribution can have widespread application. Consequently, the
lognormal distribution is a good companion to the Weibull distribution
when attempting to model these types of units. As may be surmised
from the name, the lognormal distribution has certain similarities to
the normal distribution. A random variable is lognormally distributed if
the logarithm of the random variable is normally distributed.
The graph below visualizes the relation between the failure mode
and the mathematical model :
[Figure: decision tree starting from the failure mode and leading to the
mathematical model to choose (shown as "?" in the figure). Recoverable
labels: random (hazard) failures; component degradation / ageing;
wear-out; fatigue; software and external events (systematic failures);
mechanical force; corrosion (chemical); vibration; heat; variable or
constant load.]

CHOOSING THE RIGHT MODEL

For cases not covered by the previous page, or when the association
with the model proposed on the previous page is in question, more
extensive work is needed before making the choice.
Other means to select the law include:
❑ Internet search
❑ manuals / literature
❑ vendor test results
❑ physics of failure (PoF)
❑ expert opinion, etc.
Sometimes it is impossible to decide upfront on the model; in such cases,
a test needs to be performed and the model is decided based on the
test results.

Reminder: the use of a mathematical model for the reliability of a
part/component/LRU/system/etc. requires a fixed function (and a
defined failure mode) as well as fixed operating conditions.
A change of the function might imply a change of the model, or of the
parameters if the same model still corresponds to the new function.
A change of the operating conditions generally impacts the value of the
model's parameters, especially the scale one (the time-related one).

USE OF THE MATHEMATICAL MODELS
DURING THE PROGRAM

Approximating the reliability evolution by a parametric mathematical
model is useful to:
❑ visualize the reliability evolution over time
❑ predict the evolution of a system
❑ provide input to other domains (e.g. safety)
During the DESIGN phase, reliability targets can be translated into
minimum requirements on the model's parameters. In other words, if the
design target is a maximum mission failure probability 𝑝 = 10^−5 for a
failure mode of a system, and that failure mode is modeled by a Weibull
distribution, the target can be translated into minimum parameter
requirements, as presented in the graph below:

[Figure: minimum scale parameter 𝜂 (vertical axis, 0 to 1×10^6) versus
shape parameter 𝛽 (horizontal axis, 1.4 to 3). The region above the curve
is marked OK, the region below is marked reject. Values of the curve as
displayed in the figure:]

 #    𝛽      𝜂
 1    1.2    2128771.4
 2    1.4    1130483.9
 3    1.6    696525.2
 4    1.7    475370
 5    1.9    349129
 6    2.1    270738.5
 7    2.3    218819.1
 8    2.5    182645
 9    2.6    156395.3
10    2.8    136704
11    3      121519.9

Any product whose specific failure mode is characterized by a
Weibull with parameters 𝛽 and 𝜂 below the curve is rejected. Only
products with model parameters above the curve meet the
design target.
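
The translation from a failure probability target to a minimum scale parameter can be sketched as follows. This is a minimal sketch, assuming the mission starts with a new unit (age zero) and lasts a hypothetical 100 hours; the function names and the mission duration are illustrative assumptions, not values from the course tool.

```python
import math

def weibull_min_eta(p_max, mission_time, beta):
    """Smallest eta for which F(mission_time) <= p_max (unit new at mission start)."""
    # F(t) = 1 - exp(-(t/eta)**beta) <= p_max
    #   <=>  eta >= t / (-ln(1 - p_max))**(1/beta)
    return mission_time / (-math.log1p(-p_max)) ** (1.0 / beta)

def weibull_unreliability(t, beta, eta):
    """F(t) = 1 - exp(-(t/eta)**beta)."""
    return -math.expm1(-((t / eta) ** beta))

def meets_target(p_max, mission_time, beta, eta):
    """True when the (beta, eta) couple lies on or above the boundary curve."""
    return eta >= weibull_min_eta(p_max, mission_time, beta)

P_MAX = 1e-5         # design target from the slide
T_MISSION_H = 100.0  # hypothetical mission duration, in hours
for beta in (1.2, 2.0, 3.0):
    eta_min = weibull_min_eta(P_MAX, T_MISSION_H, beta)
    print(f"beta = {beta}: eta must be at least {eta_min:.1f} h")
```

For a fixed target, the required 𝜂 drops quickly as 𝛽 grows, which matches the decreasing shape of the boundary curve.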
USE OF THE MATHEMATICAL MODELS
DURING THE PROGRAM

During the VALIDATION phase, the mathematical models are used to
❑ build an optimal test plan
❑ predict the time and the cost of the test
❑ convert a test plan to an accelerated test plan
❑ validate the compliance of the test results with the targets
During the OPERATION phase, the mathematical models are used to
❑ validate the model's assumptions
❑ validate the field performance of the product against targets

REMINDER

Many people associate reliability engineering exclusively with
dedicated statistics. In order to apply these statistics, it is extremely
important to:
❑ define the function
❑ define the operating conditions
❑ choose the right model
Statistics model failures; statistical analysis does not improve reliability.
It helps in setting targets and measuring the evolution throughout the
program.
The only way to improve reliability is by acting on one or several of the
following product-related aspects:
❑ design
❑ components quality
❑ manufacturing process
❑ production screening
❑ transport conditions
❑ operating conditions
❑ storage
❑ maintenance
All the previous lectures gave you the means to achieve the most reliable
design before testing. Reliability modeling and statistics are now used to
confirm the compliance of the design with the requirements.

EXPONENTIAL DISTRIBUTION
$$R(t) = \exp(-\lambda t)$$

EXPONENTIAL

The most popular law in industry, due to:
❑ ease of use (one parameter)
❑ maintenance (linearizes the failure rate)
❑ representativeness for very complex systems

[Figure: three panels plotted against t for the exponential law:
unreliability F(t) = 1 − exp(−λt), probability density (PDF)
f(t) = λ exp(−λt), and failure rate λ (constant).]

EFFECT OF LAMBDA ON THE
EXPONENTIAL DISTRIBUTION

Reference: RELIAWIKI.COM

MEMORYLESS EFFECT
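The exponential distribution is the only continuous distribution that is
memoryless. Starting from

$$P(T \ge t) = e^{-\lambda t}$$

the probability of surviving a further time t, given survival up to time s, is

$$P(T \ge t+s \mid T \ge s) = \frac{P(T \ge t+s \,\cap\, T \ge s)}{P(T \ge s)} = \frac{P(T \ge t+s)}{P(T \ge s)}$$

$$P(T \ge t+s \mid T \ge s) = \frac{e^{-\lambda (t+s)}}{e^{-\lambda s}} = \frac{e^{-\lambda t - \lambda s}}{e^{-\lambda s}} = \frac{e^{-\lambda s}\, e^{-\lambda t}}{e^{-\lambda s}}$$

$$P(T \ge t+s \mid T \ge s) = e^{-\lambda t} = P(T \ge t)$$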
This result indicates that the conditional reliability function for the lifetime
of a component that has survived to time s is identical to that of a new
component. This is the so-called "used-as-good-as-new" assumption.
The lifetime of a fuse in an electrical distribution system may be assumed
to have an exponential distribution. It will fail when there is a power surge
causing the fuse to burn out. Assuming that the fuse does not undergo any
degradation over time and that power surges that cause failure are likely
to occur equally over time, then use of the exponential lifetime distribution
is appropriate, and a used fuse that has not failed is as good as new.
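
The memoryless property can be checked numerically. The snippet below is a minimal sketch; the failure rate value is a hypothetical one chosen for illustration.

```python
import math

def reliability(t, lam):
    """Exponential reliability R(t) = exp(-lam * t)."""
    return math.exp(-lam * t)

def conditional_reliability(t, s, lam):
    """P(T >= t + s | T >= s) = R(t + s) / R(s)."""
    return reliability(t + s, lam) / reliability(s, lam)

LAM = 1e-3  # hypothetical failure rate, per hour
for age in (0.0, 500.0, 5000.0):
    # The survival probability over the next 100 h does not depend on the age.
    print(age, conditional_reliability(100.0, age, LAM), reliability(100.0, LAM))
```

Whatever the age of the unit, the two printed probabilities agree, which is the "used-as-good-as-new" behavior of the fuse example.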
MEMORYLESS EFFECT

Implications:
MTTF = MTBF (replacement is as good as new)
A time interval Δ𝑡 has the same impact (in percentage) on the
reliability, independent of the value of the starting time:

$$R(t) - R(t+\Delta t) = e^{-\lambda t} - e^{-\lambda (t+\Delta t)} = e^{-\lambda t} - e^{-\lambda t}\, e^{-\lambda \Delta t} = e^{-\lambda t}\left(1 - e^{-\lambda \Delta t}\right)$$

If, for example, during Δ𝑡 = 100 operating hours a new product
loses 50% of its reliability, R(t = 0h + Δ𝑡 = 100h) = 0.5,
then after 200 hours it will lose 50% of the remaining value:
R(200h) = R(100h) * (% decrease due to functioning Δ𝑡 = 100)
= 0.5 * 0.5 = 0.25.
Under the same assumptions, if a product reaches 75% reliability
after 41.49h, R(41.49h) = 0.75, then after 141.49 hours (100 more
operational hours) R(41.49 + 100) will reduce to half of
R(41.49), which means:
R(141.49) = R(41.49) * 0.5 = 0.75 * 0.5 = 0.375
Continuing the logic, based on the memoryless effect, after
100 more operating hours the reliability value will again be halved:
R(241.49) = R(141.49) * 0.5 = 0.375 * 0.5 = 0.1875
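
The halving example above can be reproduced with a few lines. This is a sketch under the slide's assumption that R(100 h) = 0.5; the helper names are illustrative.

```python
import math

HALVING_TIME_H = 100.0                # R drops by half every 100 h (slide example)
LAM = math.log(2.0) / HALVING_TIME_H  # failure rate implied by R(100 h) = 0.5

def reliability(t):
    """Exponential reliability R(t) = exp(-LAM * t)."""
    return math.exp(-LAM * t)

def time_to_reliability(r):
    """Operating time at which the reliability equals r."""
    return -math.log(r) / LAM

t75 = time_to_reliability(0.75)  # close to the 41.49 h quoted in the slide
for t in (100.0, 200.0, t75, t75 + 100.0, t75 + 200.0):
    print(f"R({t:.2f} h) = {reliability(t):.4f}")
```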
EXPONENTIAL DISTRIBUTION FOR
T=MTBF

Knowing that:

$$\lambda = \frac{1}{MTTF}$$

let's compute R(t = MTTF):

$$R(t = MTTF) = e^{-MTTF \cdot \lambda} = e^{-MTTF \cdot \frac{1}{MTTF}} = e^{-1} \approx 0.3679$$

Based on the definition applied to t = MTTF, reliability is the
probability that a system performs its intended function over a duration
equal to the MTTF (under given conditions).
The above equation translates into 63.21% of the units having FAILED by
the time t = MTTF. For example, a company owns 100
computers with an MTTF of 10,000 hours. This translates into
approximately 63 computers having FAILED before 10,000 operational
hours.
As the exponential law has MTBF = MTTF, the above is also
applicable to the MTBF (considering that a repair brings the component
to a state equivalent to as good as new).
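
The computer fleet example can be verified with a short calculation. This is a minimal sketch; the function name is an illustrative assumption.

```python
import math

def expected_failed(fleet_size, t, mttf):
    """Expected number of failed units at time t under the exponential law."""
    return fleet_size * (1.0 - math.exp(-t / mttf))

# 100 computers with an MTTF of 10,000 h, observed at t = MTTF
print(expected_failed(100, 10_000.0, 10_000.0))
```

The result is close to 63 units, in line with the 63.21% figure.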
CONSTANT FAILURE HAZARD - USAGE

Much easier to use:
• Simple reliability equation
• Field returns interpretation (total FH and total fails)
• Test setup confidence level
• 100 units operating 1 hour = 1 unit operating 100 hours
• Allows prediction models (generally for electronics)
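
Under the constant failure rate assumption, field returns reduce to a ratio of total failures to cumulated operating hours. The snippet below is a sketch of that point estimate; the fleet figures are hypothetical and the function names are illustrative.

```python
def cumulated_hours(units, hours_per_unit):
    """Total operating time accumulated by a group of identical units."""
    return units * hours_per_unit

def failure_rate_estimate(total_hours, total_failures):
    """Point estimate of a constant failure rate: failures per operating hour."""
    if total_hours <= 0:
        raise ValueError("total operating time must be positive")
    return total_failures / total_hours

# 100 units for 1 hour and 1 unit for 100 hours carry the same information.
print(cumulated_hours(100, 1.0) == cumulated_hours(1, 100.0))

# Hypothetical field data: 5 failures over 250,000 cumulated hours.
lam = failure_rate_estimate(250_000.0, 5)
print(f"lambda = {lam:.2e} per hour, MTBF = {1.0 / lam:.0f} h")
```

This equivalence only holds when the failure rate does not depend on the age of the units.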

RISK-BASED MAINTENANCE (RBM)

Theoretically, the maintenance task is intended to bring the failure
rate back to its “as good as new” value. Still in theory, maintenance tasks
performed at a correct interval allow the variations of the failure rate
over time to be approximated by a constant value (see below).
In conjunction with a validated maintenance program, the hypothesis of
an exponential behavior of a product can be a valid one.

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 230


MTBFMONTREAL.CA
WEIBULL DISTRIBUTION

$$MTTF = \eta \cdot \Gamma\left(1 + \frac{1}{\beta}\right)$$

Reference: RELIAWIKI.COM

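The Weibull quantities can be evaluated with the standard library alone. This is a minimal sketch using the two-parameter Weibull; the 𝜂 value and the function names are illustrative assumptions.

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def weibull_hazard(t, beta, eta):
    """Failure rate h(t) = (beta/eta) * (t/eta)**(beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)

def weibull_mttf(beta, eta):
    """MTTF = eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

ETA_H = 1000.0  # hypothetical scale parameter, in hours
for beta in (0.5, 1.0, 2.0):
    print(
        f"beta = {beta}: MTTF = {weibull_mttf(beta, ETA_H):.1f} h, "
        f"h(100 h) = {weibull_hazard(100.0, beta, ETA_H):.2e}, "
        f"h(1000 h) = {weibull_hazard(1000.0, beta, ETA_H):.2e}"
    )
```

The failure rate decreases with time for 𝛽 < 1, stays constant for 𝛽 = 1 and increases for 𝛽 > 1, matching the early life, random and wear-out regimes.
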
BETA PARAMETER AND FAILURE MODE

Field experience associates specific BETA values (or intervals) with
specific failure modes. Papers and references exist on the typical
BETA value for simple components, which generally have one single
major failure mode.
For example, an extract from
http://www.barringer1.com/wdbase.htm:

WEIBULL DISTRIBUTION PARAMETERS
REPRESENTATION

As observed throughout field experience, in most cases the 𝛽
parameter takes values between 0.4 and 7. The graph below
presents all the couples (𝛽, 𝜂) that satisfy the reliability requirement.
Any (𝛽, 𝜂) couple in the green area characterizes a product that is
better than the requirement. Any (𝛽, 𝜂) couple in the red area
characterizes a product that is below the requirement.

IMPORTANT REMARK: depending on the requirement definition, the
shape of the graph can change and the green and red areas can reverse.
Always pay attention to the meaning of each (𝛽, 𝜂) couple.

[Figure: 𝜂 (vertical axis, 0 to 1×10^6) versus 𝛽 (horizontal axis, 1.4 to 3);
the region above the curve is marked OK and the region below NOK.]

Reference:

NORMAL DISTRIBUTION

The normal distribution, also known as the Gaussian distribution, is the
most widely-used general purpose distribution. It is for this reason that
it is included among the lifetime distributions commonly used for
reliability and life data analysis. There are some who argue that the
normal distribution is inappropriate for modeling lifetime data
because the left-hand limit of the distribution extends to negative
infinity. This could conceivably result in modeling negative times-to-failure.
However, provided that the distribution in question has a
relatively high mean and a relatively small standard deviation, the
issue of negative failure times should not present itself as a problem.
Nevertheless, the normal distribution has been shown to be useful for
modeling the lifetimes of consumable items, such as printer toner
cartridges.

Dr. Sorin Voiculescu

Reference: RELIAWIKI.COM

NORMAL DISTRIBUTION PARAMETERS
REPRESENTATION

Both parameters 𝜇 and 𝜎 can practically take values across the entire
(0, ∞) range. It has been noted that replacing the 𝜎 parameter by
the ratio q = 𝜎/𝜇 not only provides an easier means to handle the
parameters but can also be linked to some manufacturing quality
aspects. Existing literature claims that a high-quality manufacturing
process should not yield q values that exceed 0.09. A q value of
0.20 shows a large variability in manufacturing and thus a low-quality
process.
The graph below presents all the couples (𝑞, 𝜇) that satisfy the
reliability requirement. Any (𝑞, 𝜇) couple in the green area
characterizes a product that is better than the requirement. Any (𝑞, 𝜇)
couple in the red area characterizes a product that is below the requirement.

IMPORTANT REMARK: depending on the requirement definition, the
shape of the graph can change and the green and red areas can reverse.
Always pay attention to the meaning of each (𝑞, 𝜇) couple.

[Figure: 𝑞 (vertical axis, 0.04 to 0.12) versus 𝜇 (horizontal axis, 1.1 to 1.6);
the upper region is marked NOK and the lower region OK.]


Reference: web

LOGNORMAL DISTRIBUTION

The lognormal distribution is commonly used to model the lives of units
whose failure modes are of a fatigue-stress nature. Since this includes
most, if not all, mechanical systems, the lognormal distribution can
have widespread application. Consequently, the lognormal
distribution is a good companion to the Weibull distribution when
attempting to model these types of units. It is used to model
failures due to crack propagation, material fatigue failures,
and material strength. As may be surmised by the name, the
lognormal distribution has certain similarities to the normal
distribution. A random variable is lognormally distributed if the
logarithm of the random variable is normally distributed.

Dr. Sorin Voiculescu

Reference: web

INDU 6391 DR. SORIN VOICULESCU 241


MTBFMONTREAL.CA
LOGNORMAL DISTRIBUTION

Reliability from Concept to Culture


There is no closed form for the reliability function.

Dr. Sorin Voiculescu

Reference: web

INDU 6391 DR. SORIN VOICULESCU 242


MTBFMONTREAL.CA
EFFECT OF PARAMETERS ON LOGNORMAL
DISTRIBUTION

Reliability from Concept to Culture


Dr. Sorin Voiculescu

Reference: web

INDU 6391 DR. SORIN VOICULESCU 243


MTBFMONTREAL.CA
LOGNORMAL DISTRIBUTION PARAMETERS
REPRESENTATION

Both parameters 𝜇 and 𝜎 can practically take values across the entire
(0, ∞) range. Based on the normal distribution approach, it has been
noted that replacing the 𝜎 parameter by the ratio $q = \mu \cdot e^{\sigma^{2}} - 1$
not only provides an easier means to handle the parameters but can also
be linked to some manufacturing quality aspects. Existing literature
claims that a high-quality manufacturing process should not yield q
values that exceed 0.09. A q value of 0.20 shows a large variability
in manufacturing and thus a low-quality process.
The graph below presents all the couples (𝑞, 𝜇) that satisfy the
reliability requirement. Any (𝑞, 𝜇) couple in the green area
characterizes a product that is better than the requirement. Any (𝑞, 𝜇)
couple in the red area characterizes a product that is below the requirement.

IMPORTANT REMARK: depending on the requirement definition, the
shape of the graph can change and the green and red areas can reverse.
Always pay attention to the meaning of each (𝑞, 𝜇) couple.

[Figure: 𝑞 (vertical axis, 0.04 to 0.12) versus 𝜇 (horizontal axis, 1.1 to 1.6);
the upper region is marked NOK and the lower region OK.]

Reference: web

REMEMBER
Simple distributions used in reliability have either one parameter (exponential)
or two parameters.
The exponential model is a particular case of Weibull with 𝛽 = 1.
Simple distributions with two parameters have:
❑ one “shape” parameter: 𝛽 for Weibull and 𝜎 for Normal and Lognormal. This
parameter is under less control for a given technology and manufacturing line.
Changes to this parameter are generally made by changing the failure mode:
choosing a different technology, different materials with different failure modes,
changing the manufacturing processes, or any other change that impacts the PoF
of the failure mode under discussion. Generally*, design changes that involve
none of the above should not impact this parameter. For example, if
a design change intends to improve the reliability of an IC by reducing the
environmental temperature and that IC follows a Weibull of 𝛽 = 1.8, the
improved performance should still be modeled by a Weibull of 𝛽 = 1.8.
As another example, consider doubling the life of a bearing by using new
materials without changing the failure mode: if the failure mode (wear-out) of
the old bearing design is modeled by a Weibull of shape parameter 𝛽 = 2.8 and
𝜂 = 10,000 h, the new design is equivalent to doubling the scale parameter.
This means that the reliability of the new design will be modeled by a Weibull
of shape parameter 𝛽 = 2.8 (same value as the old one, as the failure mode did
not change) and 𝜂 = 20,000 h (twice the value of the old design).

❑ one scale parameter: 𝜂 for Weibull and 𝜇 for Normal and Lognormal. Design
improvements (except technological changes, materials changes, manufacturing
process) should directly impact the scale parameter. This will allow us later to
model the accelerated testing.

* Disclaimer: Literature and references exist that support cases where the shape
parameter changes for the same failure mode under different operating
conditions, but the approach is less practical to use. CSS stands for Changing
Shape and Scale approach. Historical data shows that, except for some specific
technologies, there is very low added value in using the CSS.
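
The bearing example can be checked numerically: with the shape parameter unchanged, doubling 𝜂 doubles every life percentile and the MTTF. This is a minimal sketch; the choice of the 10% percentile and the function names are illustrative assumptions.

```python
import math

def weibull_bx(x_fraction, beta, eta):
    """Time at which a fraction x of the units has failed (BX life)."""
    return eta * (-math.log1p(-x_fraction)) ** (1.0 / beta)

def weibull_mttf(beta, eta):
    """MTTF = eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

BETA = 2.8                            # unchanged: same failure mode (wear-out)
ETA_OLD, ETA_NEW = 10_000.0, 20_000.0  # scale parameter doubled by the redesign

for label, eta in (("old", ETA_OLD), ("new", ETA_NEW)):
    print(
        f"{label}: B10 = {weibull_bx(0.10, BETA, eta):.0f} h, "
        f"MTTF = {weibull_mttf(BETA, eta):.0f} h"
    )
```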

MTBF FOR SAFETY

Though true only under the exponential assumption, most safety
engineers consider:

If the exponential distribution is not demonstrated, then the safety
requirement is actually the probability of failing the last mission,
which is 1 − R(t|T), where R(t|T) is the reliability of the last mission
of duration t and the last mission starts at time T:

$$p = 1 - R(t \mid T) = 1 - \frac{R(t+T)}{R(T)}$$
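
The last-mission failure probability can be sketched for a Weibull model as follows. The parameter values are hypothetical and the function names are illustrative assumptions.

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def last_mission_failure_probability(t_mission, t_start, beta, eta):
    """p = 1 - R(t + T) / R(T) for a mission of duration t starting at age T."""
    survive_end = weibull_reliability(t_start + t_mission, beta, eta)
    survive_start = weibull_reliability(t_start, beta, eta)
    return 1.0 - survive_end / survive_start

ETA_H = 10_000.0  # hypothetical scale parameter, in hours
for beta in (1.0, 2.8):
    for age in (0.0, 5_000.0):
        p = last_mission_failure_probability(10.0, age, beta, ETA_H)
        print(f"beta = {beta}, mission starts at {age:.0f} h: p = {p:.3e}")
```

With 𝛽 = 1 the probability does not depend on the starting age; with 𝛽 > 1 the last mission is the most demanding one.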

ASSIGNMENT – SAFETY TARGETS–
SAFETY REQUIREMENTS PER OPERATING
HOUR
The following is for information only (not required for the exam).
The assignment was intended to highlight the importance of choosing the right
distribution function. In reality, the 1E-6 at the end of life is a durability
requirement for the automotive industry, but the designers have to meet,
according to ISO 26262, a failure rate of 1E-8 per operating hour for
ASIL D (worst case, loss of human life). Note that 1E-8 is larger than all the
values in red on the previous slide (so a less stringent requirement).

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 259


MTBFMONTREAL.CA
REMEMBER

Simple distributions with two parameters have:
❑ one “shape” parameter: 𝛽 for Weibull and 𝜎 for Normal and
Lognormal.
This parameter is under less control for a given technology and
manufacturing line.
Changes to this parameter come from: a different technology, different
materials, improved manufacturing processes, or any other change
that impacts the PoF of the failure mode under discussion.
❑ one scale parameter: 𝜂 for Weibull and 𝜇 for Normal and
Lognormal.
Design improvements (except technological changes, materials
changes, manufacturing process) should directly impact the scale
parameter.
This will allow us later to model the accelerated testing.

LIFE LIMITED

A hard time (HT) component is a component that requires a specific action
at a specific interval (overhaul, refurbishment, bench check, etc.) per
the manufacturer's recommendations.
On-Condition (OC) is a preventive primary maintenance process that
requires a system, component, or appliance to be inspected periodically
or checked against some appropriate physical standard to determine
if it can continue in service. The standard ensures that the unit is
removed from service before failure during normal operation. These
standards may be adjusted based on operating experience or tests,
as appropriate, in accordance with (IAW) a carrier's approved reliability
program or maintenance manual.

Condition Monitoring (CM) is a process for systems, components, or
appliances that have neither HT nor OC maintenance as their primary
maintenance process. It is accomplished by appropriate means
available to an operator for finding and solving problem areas. The
user must control the reliability of systems or equipment based on
knowledge gained by analysis of failures or other indications of
deterioration.

INDU 6391 DR. SORIN VOICULESCU 262


MTBFMONTREAL.CA
OBSERVED MTTF WHEN OPERATING WITH
HARD TIME

The intrinsic MTTF is equal to the area under the reliability graph:

$$MTTF = \int_{0}^{\infty} R(t)\, dt$$

When operating with hard time T, at time T the non-failed units are
set back to a state “as good as new” (by overhaul/maintenance or by
being replaced with new units). The intrinsic performance of the unit
does not change.
The observed MTTF is what the user notices in the field, based on the
cumulated operating time and the observed number of failures.
For failure modes with an increasing failure rate (wear-out), this
observed MTTF is larger than the intrinsic one because, through
the action taken at hard time, the user renews the fleet (or does the
equivalent of a renewal).
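
The effect of a hard time on the observed MTTF can be sketched numerically. The relation used below, cumulated operating time per failure equal to the integral of R(t) from 0 to T divided by 1 − R(T), is the usual age-replacement result and is an assumption of this sketch, not a formula given in the slides; the Weibull parameters are hypothetical.

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def intrinsic_mttf(beta, eta):
    """Area under R(t) from 0 to infinity: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

def observed_mttf(hard_time, beta, eta, steps=10_000):
    """Operating time per observed failure when units are renewed at hard_time."""
    h = hard_time / steps
    # Trapezoidal integration of R(t) over [0, hard_time]
    area = 0.5 * (
        weibull_reliability(0.0, beta, eta)
        + weibull_reliability(hard_time, beta, eta)
    )
    area += sum(weibull_reliability(i * h, beta, eta) for i in range(1, steps))
    area *= h
    # Fraction of units that fail before reaching the hard time
    failed_fraction = 1.0 - weibull_reliability(hard_time, beta, eta)
    return area / failed_fraction

BETA, ETA_H = 2.8, 10_000.0  # hypothetical wear-out failure mode
print(f"intrinsic MTTF: {intrinsic_mttf(BETA, ETA_H):.0f} h")
for hard_time in (2_500.0, 5_000.0, 10_000.0):
    print(f"hard time {hard_time:.0f} h: observed MTTF = "
          f"{observed_mttf(hard_time, BETA, ETA_H):.0f} h")
```

For 𝛽 = 1 the same function returns 𝜂 whatever the hard time, which is consistent with the memoryless behavior of the exponential law.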

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 263


GOAL SETTING

Setting design targets

FROM PROGRAM TARGETS TO
RELIABILITY TARGETS

Reliability targets can be fixed in terms of
❑ reliability value at a time t
❑ failure rate value at a time t
❑ BX for a given X% value
❑ mission failure probability
❑ MTTF value for units running to failure
❑ MTBF value for life limited units
❑ Etc.

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 265


MTBFMONTREAL.CA
EXCEL TOOL

INDU 6391 offers a tool to easily decide on the model parameter targets
based on the reliability target and on the chosen model. There are 4
files available on vosorin.com.
Step 1: select the appropriate file, based on the mathematical model
to be used
Step 2: understand the color coding.
➢ The user can modify only the blue cells, intended for data entry.
➢ Tool output is listed in green cells.
➢ Yellow cells are general comments, intended to ease the use of the
tool.
Step 3: select the reliability target definition
Step 4: enter input data
Step 5: click RUN (if the button exists)
Step 6: read output data (model parameter value(s))

INDU 6391 DR. SORIN VOICULESCU 266


MTBFMONTREAL.CA
EXCEL TOOL

Multiple options are embedded

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 267


MTBFMONTREAL.CA
EXPONENTIAL - CASE 1

Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For the exponential case, the value of T
is of no importance, as the failure rate is constant. The tool is of
very low value here, as the input and the output are the same.

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 268


MTBFMONTREAL.CA
WEIBULL - CASE 1

Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For a given set of data, the tool
automatically provides all the couples (𝛽, 𝜂) that satisfy the condition
and computes the associated MTTF.
Any product characterized by a (𝛽, 𝜂) couple situated in the red
area does not comply with the desired target.

OK
Dr. Sorin Voiculescu

NOK

INDU 6391 DR. SORIN VOICULESCU 269


MTBFMONTREAL.CA
NORMAL - CASE 1

Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For a given set of data, the tool
automatically provides all the couples (𝑞, 𝜇) that satisfy the condition
and computes the associated MTTF.
Any product characterized by a (𝑞, 𝜇) couple situated in the red area
does not comply with the desired target.

NOK

OK
Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 270


MTBFMONTREAL.CA
LOGNORMAL - CASE 1
Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For a given set of data, the tool provides
all the couples (𝑞, 𝜇) that satisfy the condition and computes the
associated MTTF.
Any product characterized by a (𝑞, 𝜇) couple situated in the red area
does not comply with the desired target.
Once the data is entered, the user has to click RUN.

NOK
Dr. Sorin Voiculescu

OK

INDU 6391 DR. SORIN VOICULESCU 271


MTBFMONTREAL.CA
EXPONENTIAL - CASE 2

Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool provides the value of the 𝜆 parameter that satisfies the condition
and computes the equivalent MTTF.
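
For the exponential law, this conversion has a closed form. The snippet below is a sketch of the underlying calculation, not the course Excel tool; the target values and the function name are hypothetical.

```python
import math

def max_lambda_for_reliability(r_min, t):
    """Largest constant failure rate such that R(t) = exp(-lambda * t) >= r_min."""
    if not 0.0 < r_min < 1.0 or t <= 0.0:
        raise ValueError("expected 0 < r_min < 1 and t > 0")
    return -math.log(r_min) / t

# Hypothetical target: R(1000 h) >= 0.90
lam = max_lambda_for_reliability(0.90, 1000.0)
print(f"lambda <= {lam:.3e} per hour, equivalent MTTF >= {1.0 / lam:.0f} h")
```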

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 272


MTBFMONTREAL.CA
WEIBULL - CASE 2

Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool automatically provides all the couples (𝛽, 𝜂) that satisfy the
condition and computes the associated MTTF.
Any product characterized by a (𝛽, 𝜂) couple situated in the red
area does not comply with the desired target.

OK
Dr. Sorin Voiculescu

NOK

INDU 6391 DR. SORIN VOICULESCU 273


MTBFMONTREAL.CA
NORMAL - CASE 2

Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool automatically provides all the couples (𝑞, 𝜇) that satisfy the
condition and computes the associated MTTF.
Any product characterized by a (𝑞, 𝜇) couple situated in the red area
does not comply with the desired target.
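
For the normal law with q = 𝜎/𝜇, the boundary between the OK and NOK areas can be sketched as follows. This is an illustrative calculation, not the course Excel tool; the target values and function names are hypothetical, and the derivation assumes R(T) = 1 − Φ((T − 𝜇)/𝜎).

```python
from statistics import NormalDist

def normal_min_mu(r_min, t, q):
    """Smallest mean mu such that R(t) >= r_min when sigma = q * mu."""
    z = NormalDist().inv_cdf(r_min)
    # R(t) = Phi((mu - t) / (q * mu)) >= r_min  <=>  mu * (1 - q * z) >= t
    denom = 1.0 - q * z
    if denom <= 0.0:
        raise ValueError("no finite mu satisfies the target for this q")
    return t / denom

def normal_reliability(t, mu, q):
    """R(t) = 1 - Phi((t - mu) / sigma) with sigma = q * mu."""
    return 1.0 - NormalDist(mu, q * mu).cdf(t)

# Hypothetical target: R(1000 h) >= 0.99
for q in (0.04, 0.09, 0.20):
    mu_min = normal_min_mu(0.99, 1000.0, q)
    print(f"q = {q}: mu must be at least {mu_min:.1f} h "
          f"(check: R = {normal_reliability(1000.0, mu_min, q):.4f})")
```

A larger q (more manufacturing variability) requires a larger mean life to meet the same target.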

NOK
Dr. Sorin Voiculescu

OK

INDU 6391 DR. SORIN VOICULESCU 274


MTBFMONTREAL.CA
LOGNORMAL - CASE 2
Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool provides all the couples (𝑞, 𝜇) that satisfy the condition and
computes the associated MTTF.
Any product characterized by a (𝑞, 𝜇) couple situated in the red area
does not comply with the desired target.
Once the data is entered, the user has to click RUN.

NOK
Dr. Sorin Voiculescu

OK

INDU 6391 DR. SORIN VOICULESCU 275


MTBFMONTREAL.CA
EXPONENTIAL - CASE 3

Case 3: the reliability is measured by the minimum accepted time BX
(written LX for bearings) at which X% of the units have failed. For a
given set of data, the tool provides the value of the 𝜆 parameter that
satisfies the condition and computes the equivalent MTTF.

Reliability from Concept to Culture


Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 276


MTBFMONTREAL.CA
WEIBULL - CASE 3

Case 3: the reliability is measured by the minimum accepted time BX
(written LX for bearings) at which X% of the units have failed. For a
given set of data, the tool automatically provides all the couples
(𝛽, 𝜂) that satisfy the condition and computes the associated MTTF.
Any product characterized by a (𝛽, 𝜂) couple situated in the red
area does not comply with the desired target.

OK
Dr. Sorin Voiculescu

NOK

INDU 6391 DR. SORIN VOICULESCU 277


MTBFMONTREAL.CA
NORMAL - CASE 3

Case 3: the reliability is measured by the minimum accepted time BX
(written LX for bearings) at which X% of the units have failed. For a
given set of data, the tool automatically provides all the couples
(𝑞, 𝜇) that satisfy the condition and computes the associated MTTF.
Any product characterized by a (𝑞, 𝜇) couple situated in the red area
does not comply with the desired target.

NOK
Dr. Sorin Voiculescu

OK

INDU 6391 DR. SORIN VOICULESCU 278


MTBFMONTREAL.CA
LOGNORMAL - CASE 3
Case 3: the reliability is measured by the minimum accepted time BX
(written LX for bearings) at which X% of the units have failed. For a
given set of data, the tool provides all the couples (𝑞, 𝜇) that satisfy
the condition and computes the associated MTTF.
Any product characterized by a (𝑞, 𝜇) couple situated in the red area
does not comply with the desired target.
Once the data is entered, the user has to click RUN.

NOK
Dr. Sorin Voiculescu

OK

INDU 6391 DR. SORIN VOICULESCU 279


MTBFMONTREAL.CA
EXPONENTIAL - CASE 4

Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. The removal time might be associated either
with the end of life or with a hard time (restoration, overhaul) that
resets the unit to a state of “as good as new”. For a given set of data,
the tool provides the value of the 𝜆 parameter that satisfies the
condition and computes the equivalent MTTF.

Dr. Sorin Voiculescu

INDU 6391 DR. SORIN VOICULESCU 280


MTBFMONTREAL.CA
WEIBULL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. The removal time might be associated either
with the end of life or with a hard time (restoration, overhaul) that
resets the unit to an “as good as new” state. For a given set of data,
the tool automatically provides all the couples 𝛽, 𝜂 that satisfy the
condition and computes the associated MTTF.
Any product characterized by a 𝛽, 𝜂 couple situated in the red area
does not comply with the desired target.
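
One way to trace the boundary of the compliant area is sketched below. It rests on an assumption that the slide does not spell out: the mission failure probability is taken as the conditional probability of failing during tm given survival to the mission start, evaluated for the worst mission flown before Th (the last one when 𝛽 > 1, the first one when 𝛽 < 1). All names and numbers are hypothetical.

```python
import math

def weibull_mission_failure_prob(beta, eta, age, mission_time):
    """P(fail within mission_time | unit survived to the given age)."""
    h_start = (age / eta) ** beta
    h_end = ((age + mission_time) / eta) ** beta
    return 1.0 - math.exp(-(h_end - h_start))

def weibull_min_eta(beta, p_max, mission_time, hard_time):
    """Smallest eta keeping every mission before hard_time within p_max."""
    # Cumulative-hazard increment (times eta**beta) of the first and last mission.
    first = mission_time ** beta
    last = hard_time ** beta - (hard_time - mission_time) ** beta
    worst = max(first, last)
    return (worst / -math.log(1.0 - p_max)) ** (1.0 / beta)

def weibull_mttf(beta, eta):
    """Mean life of a Weibull distribution (run to failure)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

# Hypothetical target: p <= 0.1% for a 10 h mission, removal at 5000 h.
for beta in (0.8, 1.0, 2.0, 3.0):
    eta = weibull_min_eta(beta, 0.001, 10.0, 5000.0)
    print(f"beta={beta:.1f}  min eta={eta:10.0f} h  MTTF={weibull_mttf(beta, eta):10.0f} h")
```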

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

NORMAL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. The removal time might be associated either
with the end of life or with a hard time (restoration, overhaul) that
resets the unit to an “as good as new” state. For a given set of data,
the tool automatically provides all the couples 𝜎, 𝜇 that satisfy the
condition and computes the associated MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.
Once the data is entered, the user clicks RUN.
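
For the Normal distribution there is no closed form, so the boundary has to be found numerically. The sketch below uses bisection and assumes, as an interpretation of the slide rather than a description of the tool, that the condition applies to the last mission before Th (the worst one, since the Normal failure rate increases with age). Names, the search bracket and the numbers are hypothetical.

```python
import math

def normal_reliability(t, mu, sigma):
    """R(t) for a Normal life distribution; erfc keeps precision in the tails."""
    return 0.5 * math.erfc((t - mu) / (sigma * math.sqrt(2.0)))

def last_mission_failure_prob(mu, sigma, mission_time, hard_time):
    """P(fail during the last mission before removal | survived to its start)."""
    r_start = normal_reliability(hard_time - mission_time, mu, sigma)
    r_end = normal_reliability(hard_time, mu, sigma)
    return 1.0 - r_end / r_start

def normal_min_mu_case4(sigma, p_max, mission_time, hard_time, tol=1e-6):
    """Smallest mu meeting the target for a given sigma, found by bisection."""
    lo = hard_time - 20.0 * sigma  # expected to violate the target
    hi = hard_time + 20.0 * sigma  # expected to meet the target
    if last_mission_failure_prob(lo, sigma, mission_time, hard_time) <= p_max:
        raise ValueError("search bracket too narrow for these inputs")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if last_mission_failure_prob(mid, sigma, mission_time, hard_time) > p_max:
            lo = mid  # still non-compliant: move the lower bound up
        else:
            hi = mid  # compliant: tighten the upper bound
    return hi

# Hypothetical target: p <= 0.1% for a 10 h mission, removal at 5000 h.
mu = normal_min_mu_case4(200.0, 0.001, 10.0, 5000.0)
print(f"sigma=200 h  min mu (= MTTF)={mu:.1f} h")
```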

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

LOGNORMAL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. The removal time might be associated either
with the end of life or with a hard time (restoration, overhaul) that
resets the unit to an “as good as new” state. For a given set of data,
the tool provides all the couples 𝜎, 𝜇 that satisfy the condition and
computes the associated MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.
Once the data is entered, the user clicks RUN.

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

EXPONENTIAL - CASE 5

Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool provides the value of the 𝜆
parameter that satisfies the condition and computes the equivalent
MTTF.

WEIBULL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool automatically provides all the
couples 𝛽, 𝜂 that satisfy the condition and computes the associated
MTTF.
Any product characterized by a 𝛽, 𝜂 couple situated in the red area
does not comply with the desired target.
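
Since the Weibull mean life is 𝜂·Γ(1 + 1/𝛽), the boundary of the compliant area has a closed form. The sketch below illustrates it; the function name and the 2000 h target are hypothetical.

```python
import math

def weibull_min_eta_for_mttf(beta, mttf_min):
    """Smallest eta for which the Weibull MTTF reaches mttf_min."""
    # MTTF = eta * Gamma(1 + 1/beta) >= mttf_min
    return mttf_min / math.gamma(1.0 + 1.0 / beta)

# Hypothetical target: MTTF of at least 2000 h.
for beta in (0.5, 1.0, 2.0, 3.5):
    eta = weibull_min_eta_for_mttf(beta, 2000.0)
    print(f"beta={beta:.1f}  min eta={eta:8.1f} h")
```

For 𝛽 = 1 the result is 𝜂 = MTTF, which is the exponential case.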

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

NORMAL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool automatically provides all the
couples 𝜎, 𝜇 that satisfy the condition and computes the associated
MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.
This case is of less interest for a Normal distribution, as the
parameter 𝜇 has the same value as the MTTF.

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

LOGNORMAL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool provides all the couples 𝜎, 𝜇
that satisfy the condition and computes the associated MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.
This case is of less interest for a Lognormal distribution, as the MTTF
follows directly from the parameters: MTTF = exp(𝜇 + 𝜎²/2), while 𝜇
itself is the logarithm of the median life.
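
For reference, the mean of a Lognormal distribution with log-mean mu and log-standard-deviation sigma is exp(mu + sigma²/2), and exp(mu) is its median. The short sketch below uses that standard result to compute the smallest mu meeting an MTTF target; the function name and the numbers are hypothetical.

```python
import math

def lognormal_min_mu_for_mttf(sigma, mttf_min):
    """Smallest log-mean mu for which the Lognormal MTTF reaches mttf_min."""
    # MTTF = exp(mu + sigma**2 / 2) >= mttf_min
    return math.log(mttf_min) - sigma ** 2 / 2.0

# Hypothetical target: MTTF of at least 2000 h.
for sigma in (0.3, 0.6, 0.9):
    mu = lognormal_min_mu_for_mttf(sigma, 2000.0)
    print(f"sigma={sigma:.1f}  min mu={mu:.3f}  median life={math.exp(mu):.0f} h")
```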

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

EXPONENTIAL - CASE 6

Case 6: the reliability is measured by the minimum observed MTTF
value for a unit removed at a hard time Th > MTTF. For a given set of
data, the tool provides the value of the 𝜆 parameter that satisfies the
condition and computes the equivalent MTTF.

As the failure rate is constant, this case is of low interest.

WEIBULL - CASE 6
Case 6: the reliability is measured by the minimum observed
MTTF/MTBF value for a unit removed at a hard time Th > MTTF. For a
given set of data, the tool automatically provides all the couples
𝛽, 𝜂 that satisfy the condition and computes the associated MTTF.
Any product characterized by a 𝛽, 𝜂 couple situated in the red area
does not comply with the desired target.
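
The observed value can be illustrated with the classical age-replacement result, taken here as an assumption about what the tool computes: the mean operating time between failures of a unit restored to "as good as new" at Th is the integral of R(t) from 0 to Th divided by F(Th). The sketch evaluates it with Simpson's rule; names and numbers are hypothetical, and 𝛽 ≥ 1 is assumed so that the integrand is smooth at t = 0.

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) for a Weibull distribution with shape beta and scale eta."""
    return math.exp(-((t / eta) ** beta))

def observed_mtbf(beta, eta, hard_time, steps=2000):
    """Mean time between failures with removal at hard_time (steps must be even)."""
    h = hard_time / steps
    total = weibull_reliability(0.0, beta, eta) + weibull_reliability(hard_time, beta, eta)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2  # composite Simpson weights
        total += weight * weibull_reliability(i * h, beta, eta)
    mean_uptime_per_cycle = total * h / 3.0
    failures_per_cycle = 1.0 - weibull_reliability(hard_time, beta, eta)
    return mean_uptime_per_cycle / failures_per_cycle

# Hypothetical unit: eta = 1000 h, removed at 500 h.
for beta in (1.0, 2.0, 3.0):
    print(f"beta={beta:.1f}  observed MTBF={observed_mtbf(beta, 1000.0, 500.0):8.1f} h")
```

With 𝛽 = 1 the result equals 𝜂 whatever the value of Th, which matches the remark that this case is of low interest for a constant failure rate.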

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

NORMAL - CASE 6
Case 6: the reliability is measured by the minimum observed
MTTF/MTBF value for a unit removed at a hard time Th > MTTF. For a
given set of data, the tool automatically provides all the couples
𝜎, 𝜇 that satisfy the condition and computes the associated MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

LOGNORMAL - CASE 6
Case 6: the reliability is measured by the minimum observed MTTF/MTBF
value for a unit removed at a hard time Th > MTTF. For a given set of
data, the tool provides all the couples 𝜎, 𝜇 that satisfy the condition
and computes the associated MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.
Once the data is entered, the user clicks RUN.

[Figure: plot of the parameter couples showing the OK and NOK (red) regions]

FROM FIELD MTBF TO WEIBULL
PARAMETERS

The last file, “TOOL01_Requirement_5_FIELD_to_Weibull”, introduces
means to obtain the equivalent Weibull parameters from:
❑ field MTBF
❑ predicted failure rate
❑ expected % of failures at a specific time

Sheet “field MTBF to Weibull”
This tool uses field performance (and some assumptions) to provide the
WEIBULL ETA (scale) parameter.
The user is expected to know the number of cumulated operating
hours/cycles over the last period of time (for simplicity, 1 period of
time = 1 year), the number of failures, the BETA (shape parameter)
value, as well as the number of units running during the last period of
time and their age.
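
The slide does not state which assumptions the sheet makes, so the sketch below is only one plausible way to turn such inputs into an ETA value, not the spreadsheet's method. It assumes that the expected number of failures of a unit over the period equals the increase of its Weibull cumulative hazard (t/𝜂)^𝛽 between its age at the start and at the end of the period, and it picks the 𝜂 that makes the fleet total equal to the observed number of failures. The function name and the fleet data are hypothetical.

```python
def eta_from_field_data(beta, failures, units):
    """Estimate eta from one period of field data, for an assumed beta.

    units: list of (age_at_period_start, operating_hours_in_period) per unit.
    """
    # Sum over the fleet of (end_age**beta - start_age**beta); dividing by
    # eta**beta gives the expected number of failures under the assumption above.
    exposure = sum((age + hours) ** beta - age ** beta for age, hours in units)
    return (exposure / failures) ** (1.0 / beta)

# Hypothetical fleet: three units of different ages, 1000 h each last year, 2 failures.
fleet = [(0.0, 1000.0), (2000.0, 1000.0), (4000.0, 1000.0)]
for beta in (1.0, 1.5, 2.0):
    print(f"beta={beta:.1f}  eta={eta_from_field_data(beta, 2, fleet):8.1f} h")
```

With 𝛽 = 1 the estimate reduces to cumulated hours divided by failures, the usual field MTBF.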

FROM FIELD MTBF TO WEIBULL
PARAMETERS

Sheet “Predicted FR to Weibull”

This tool provides the ETA parameter of the WEIBULL distribution that
satisfies the average FR entered by the user under the assumption of a
given BETA (shape parameter) value. The user is also asked to enter the
average expected operating time per year. (FYI: the tool predicts the
FR evolution over the first 5 years and looks for the ETA value that
makes the average WEIBULL FR equal to the predicted FR.)
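
If the "average WEIBULL FR" is read as the time average of the failure rate over the 5-year horizon T, it equals the cumulative hazard divided by T, that is (T/𝜂)^𝛽 / T, and ETA follows in closed form. This reading is an assumption (the sheet may average yearly values instead), and the function name and numbers are hypothetical.

```python
def eta_from_average_fr(beta, avg_fr, hours_per_year, years=5):
    """Eta whose time-averaged Weibull failure rate over the horizon equals avg_fr."""
    horizon = years * hours_per_year
    # (horizon / eta)**beta / horizon = avg_fr, solved for eta
    return horizon / (avg_fr * horizon) ** (1.0 / beta)

# Hypothetical prediction: 1e-5 failures per hour, 3000 operating hours per year.
for beta in (1.0, 1.5, 2.0):
    print(f"beta={beta:.1f}  eta={eta_from_average_fr(beta, 1e-5, 3000.0):10.0f} h")
```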

Sheet “Failure % to Weibull”

This tool provides the BETA and ETA parameters required to meet a
specific % of failures at two moments in time (years or time periods).
The user is also asked to enter the average expected operating time per
year.
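
Two (time, % failed) points determine a Weibull distribution exactly, because ln(-ln(1 - F)) is linear in ln(t) with slope 𝛽. The sketch below applies that relation; times are operating hours (years multiplied by the hours per year), and the function name and numbers are hypothetical.

```python
import math

def weibull_from_two_points(t1, f1, t2, f2):
    """Return (beta, eta) of the Weibull CDF passing through (t1, f1) and (t2, f2)."""
    y1 = math.log(-math.log(1.0 - f1))
    y2 = math.log(-math.log(1.0 - f2))
    beta = (y2 - y1) / (math.log(t2) - math.log(t1))
    eta = t1 / (-math.log(1.0 - f1)) ** (1.0 / beta)
    return beta, eta

# Hypothetical targets: 1% failed after 1 year and 10% after 5 years, 3000 h per year.
beta, eta = weibull_from_two_points(1 * 3000.0, 0.01, 5 * 3000.0, 0.10)
print(f"beta={beta:.3f}  eta={eta:.0f} h")
```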

WOLFRAM – FREE COMPUTATIONAL
TOOL
https://www.wolframalpha.com/input/ to plot Reliability
(λ = failure rate, s = standard deviation, b = shape, h = scale,
value = upper limit of the time axis)

❑ Exponential: Plot[1-CDF[ExponentialDistribution[λ], t], {t,0,value}]
❑ Normal: Plot[1-CDF[NormalDistribution[µ, s], t], {t,0,value}]
❑ LogNormal: Plot[1-CDF[LogNormalDistribution[µ, s], t], {t,0,value}]
❑ Weibull: Plot[1-CDF[WeibullDistribution[b, h], t], {t,0,value}]

https://www.wolframalpha.com/input/ to plot PDF

❑ Exponential: Plot[PDF[ExponentialDistribution[λ], t], {t,0,value}]
❑ Normal: Plot[PDF[NormalDistribution[µ, s], t], {t,0,value}]
❑ LogNormal: Plot[PDF[LogNormalDistribution[µ, s], t], {t,0,value}]
❑ Weibull: Plot[PDF[WeibullDistribution[b, h], t], {t,0,value}]

Multiple plots example:

Plot[{PDF[ExponentialDistribution[0.021], t], PDF[WeibullDistribution[2, 1/0.041], t]}, {t,0,100}]
WOLFRAM – FREE COMPUTATIONAL
TOOL

$MTTF = \int_{0}^{\infty} R(t)\,dt$

https://www.wolframalpha.com/input/?i=integrate for MTTF/MTBF
(run to failure)

❑ Normal
  ❑ Function to integrate: 1-CDF[NormalDistribution[µ, s], t]
  ❑ Variable: t
  ❑ Lower limit: 0
  ❑ Upper limit: ∞

❑ LogNormal
  ❑ Function to integrate: 1-CDF[LogNormalDistribution[µ, s], t]
  ❑ Variable: t
  ❑ Lower limit: 0
  ❑ Upper limit: ∞

❑ Weibull
  ❑ Function to integrate: 1-CDF[WeibullDistribution[b, h], t]
  ❑ Variable: t
  ❑ Lower limit: 0
  ❑ Upper limit: ∞
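
The same integral can be checked offline. The sketch below is a rough numerical cross-check in Python, not a replacement for the Wolfram query: it integrates R(t) with Simpson's rule up to a finite limit chosen far enough in the tail, and compares the result with the known means. Parameter values, function names and the truncation limits are hypothetical choices.

```python
import math

def integrate_reliability(reliability, upper, steps=20000):
    """Composite Simpson integral of reliability(t) from 0 to upper (steps even)."""
    h = upper / steps
    total = reliability(0.0) + reliability(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * reliability(i * h)
    return total * h / 3.0

# Weibull with shape b = 2 and scale h = 1000: closed-form mean is h * Gamma(1 + 1/b).
b, h = 2.0, 1000.0
mttf_weibull = integrate_reliability(lambda t: math.exp(-((t / h) ** b)), upper=10.0 * h)
print(f"Weibull: numeric={mttf_weibull:.3f}  closed form={h * math.gamma(1.0 + 1.0 / b):.3f}")

# Normal with mean mu = 1000 and standard deviation s = 100: the mean is mu
# (the part of the distribution below t = 0 is negligible for these values).
mu, s = 1000.0, 100.0
normal_r = lambda t: 0.5 * math.erfc((t - mu) / (s * math.sqrt(2.0)))
mttf_normal = integrate_reliability(normal_r, upper=mu + 10.0 * s)
print(f"Normal:  numeric={mttf_normal:.3f}  expected={mu:.3f}")
```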