RPP 2021 - 01 Intro - Student

MTBFMONTREAL.CA
A GOAL WITHOUT A PLAN…
IS JUST A WISH
Reliability engineering is an engineering field that deals with the
study, evaluation, and life-cycle management of reliability: the
probability that a system or component performs its
intended function (without failure) under stated conditions for a
specified period of time.
Reliability from Concept to Culture
RELIABILITY AND MAINTENANCE PROGRAM
FOR DESIGN AND MANUFACTURING
Intro
Dr. Sorin Voiculescu
REFERENCE 1
❑ Title of book: 50 ways to improve product reliability
❑ Author: Mike Silverman
❑ ISBN-13: 978-1607730606
❑ ISBN-10: 160773060X

REFERENCE 2
(REQUIRED)
❑ Title of book: Reliability Engineering

❑ Author: K.C. Kapur, M. Pecht
❑ ISBN#: 978-1118140673

Reliability Engineering presents an integrated approach to the design,
engineering, and management of reliability activities throughout the life
cycle of a product, including concept, research and development, design,
manufacturing, assembly, sales, and service. Containing illustrative guides
that include worked problems, numerical examples, homework problems, a
solutions manual, and class-tested materials, it demonstrates to product
development and manufacturing professionals how to distribute key
reliability practices throughout an organization.
The authors explain how to integrate reliability methods and techniques in
the Six Sigma process and Design for Six Sigma (DFSS). They also discuss
relationships between warranty and reliability, as well as legal and
liability issues. Other topics covered include:
• Reliability engineering in the 21st Century
• Probability life distributions for reliability analysis
• Process control and process capability
• Failure modes, mechanisms, and effects analysis
• Health monitoring and prognostics
• Reliability tests and reliability estimation

Reliability Engineering provides a comprehensive list of references on the
topics covered in each chapter. It is an invaluable resource for those
interested in gaining fundamental knowledge of the practical aspects of
reliability in design, manufacturing, and testing. In addition, it is useful for
implementation and management of reliability programs.
REFERENCE 3
(OPTIONAL)

❑ Title of book: Reliability Engineering and Risk Analysis: A Practical
Guide, Third Edition
❑ Author: M. Modarres, M. P. Kaminskiy, V. Krivtsov
❑ ISBN#: 1498745873
This undergraduate and graduate textbook provides a practical and

comprehensive overview of reliability and risk analysis techniques.
Written for engineering students and practicing engineers, the book is
multi-disciplinary in scope. The new edition has new topics in classical
confidence interval estimation; Bayesian uncertainty analysis; models
for physics-of-failure approach to life estimation; extended
discussions on the generalized renewal process and optimal
maintenance; and further modifications, updates, and discussions. The
book includes examples to clarify technical subjects and many end of

chapter exercises.

REFERENCE 4
(OLD REFERENCE)
❑ Title of book: The certified reliability engineer handbook
❑ Author: D. W. Benbow, H. W. Broome
❑ ISBN-13: 978-0873898379
❑ ISBN-10: 0873898370
❑ Edition: 2nd
The structure of this book is based on that of the Body of

Knowledge specified by ASQ for the Certified Reliability
Engineer, which includes design review and control;
prediction, estimation, and apportionment methodology;
failure mode and effects analysis; the planning, operation
and analysis of reliability testing and field failures,
including mathematical modeling; understanding human
factors in reliability; and the ability to develop and
administer reliability information systems for failure

analysis, design and performance improvement and
reliability program management over the entire product
life cycle.

REFERENCE 5
(OLD REFERENCE)
❑ Title of book: Handbook of Reliability, Availability, Maintainability and Safety in
Engineering Design
❑ Author: R. F. Stapelberg
❑ ISBN#: 978-1-84800-174-9
In the past two decades, industry—particularly the process

industry—has witnessed the development of several large
‘super-projects’, most in excess of a billion dollars. These large
super-projects include the exploitation of mineral resources such
as alumina, copper, iron, nickel, uranium and zinc, through the
construction of huge complex industrial process plants. Although
these super-projects create many thousands of jobs, resulting in
a significant decrease in unemployment, especially during
construction, as well as projected increases in the wealth and
growth of the economy, they bear a high risk in achieving their
forecast profitability through maintaining budgeted costs. Most
of the super-projects have either exceeded their budgeted
establishment costs or have experienced operational costs far
in excess of what was originally estimated in their feasibility
prospectus scope. This has been the case not only with projects
in the process industry but also with the development of
infrastructure and high-technology projects in the petroleum
and defense industries. The more significant contributors to the
cost ‘blow-outs’ experienced by these projects can be
attributed to the complexity of their engineering design, both in

technology and in the complex integration of systems. These
systems on their own are usually adequately designed and
constructed, often on the basis of previous similar, though
smaller designs.

WHAT WILL INDU 6391 PRESENT?
Definition phase:
- benchmarking,
- reliability by similarity (similar products performance analysis)

- reliability metrics appropriate to project
- establish targets / break-down targets / write contracts
- lessons learned implementation
Design phase:
- evaluate design capabilities by reliability predictions
- FMEA (failure mode and effects analysis),
- fault trees, physics of failure
- HALT for design (identify weakness and increase the final design’s
reliability by early testing)

WHAT WILL INDU 6391 PRESENT?
Validation phase:
- reliability by similarity (taking into account design changes)
- reliability growth

- testing (when no similarity is possible)
- requirements validation testing
- life testing
- accelerated life testing
- endurance testing
- HALT for reliability (modern tool)
Manufacturing phase:
- reduce the risks potentially induced by the production line using ESS testing
Operational phase:
- FRACAS (follow-up field performance)
- early trends detection and corrective actions
- real-time health monitoring
- optimize the maintenance tasks
- maintenance models
- reliability centered maintenance
- MSG3

RELIABILITY VS. QUALITY
The everyday usage term "quality of a product" is loosely taken to mean

its inherent degree of excellence. In industry, this is made more precise by
defining quality to be "conformance to requirements at the start of use".
Assuming the product specifications adequately capture customer
requirements, the quality level can now be precisely measured by the
fraction of units shipped that meet specifications.

But how many of these units still meet specifications after a week of
operation? Or after a month, or at the end of a one-year warranty
period? That is where "reliability" comes in. Quality is a snapshot at the
start of life and reliability is a motion picture of the day-by-day
operation. Time zero defects are manufacturing mistakes that escaped
final test. The additional defects that appear over time are "reliability
defects" or reliability fallout.
The quality level might be described by a single fraction defective. To
describe reliability fallout, a probability model that describes the fraction
fallout over time is needed. This is known as the life distribution model.
From an operating point of view: reliability is quality degradation
over time.
Operational reliability of a product is highly influenced by quality
❑ in manufacturing
❑ of components
❑ of storage and transport
❑ of processes used in design

❑ of the user
Reference: WEB

ENGINEERING INTEGRITY
RAMS
WHY INVEST IN A PLAN WHEN WE HAVE A
SIGNED CONTRACT?
For a product that enters into service in 2020, with obsolescence after 10 years:
Contractual MTTF (mean time to failure, average operating time) =
15 years
Let’s suppose actual MTTF = 5 years (reliability not met)
Operating time to realize that the contractual value cannot be met: 3
years
Calendar time to understand root cause: 0.5 years
Calendar time to negotiate (argue) with Supplier: 0.5 years
Calendar time to bring a corrective action and to certify: 1 year
Calendar time to retrofit: 1 year
Updated product after 6 years (3 + 0.5 + 0.5 + 1 + 1)

The above values are fictitious numbers intended to highlight the need for a reliability program plan.
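
The timeline above can be tallied in a few lines of Python. This is only a sketch: the variable names are illustrative, and the figures are the made-up values from the slide.

```python
# Made-up values from the slide, all in years.
obsolescence_years = 10.0   # product is obsolete 10 years after entry into service

# Each step that delays the availability of the corrected product.
delays = {
    "realize the contractual MTTF cannot be met": 3.0,
    "understand root cause": 0.5,
    "negotiate with Supplier": 0.5,
    "corrective action and certification": 1.0,
    "retrofit": 1.0,
}

years_to_updated_product = sum(delays.values())
years_left_with_fix = obsolescence_years - years_to_updated_product

print(f"Updated product after {years_to_updated_product:g} years")
print(f"Service life remaining with the fix: {years_left_with_fix:g} years")
```

With these numbers the corrected product is in service for only the last 4 of the 10 years, which is the point of planning reliability before the contract is signed.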

BARRIERS IN IMPLEMENTING THE
RELIABILITY


EXAMPLE ON HONEYWELL TRUE-STEAM

Disclaimer: This example is only intended to present a specific problem on a particular Honeywell product. It is not intended to harm in any way the good image of Honeywell. Remember: there is improvement
potential in any product of any company. Honeywell removed the product from the market.

RELIABILITY PLACE IN THE PYRAMID OF
NEEDS
Companies need to secure a design first (something to sell), to make
it safe, attractive to customers (in most cases this means not
expensive), to assess quality and, last (but not least),
reliability.


SOME DEFINITIONS
❑ Reliability
❑ Failure
❑ Risk
❑ Types of Reliability
❑ System break-down
❑ FTA (basics)

RELIABILITY DEFINITION
ability of an item to perform its required function under stated
conditions for a specified period of time.

Suppose that T is the random time-to-failure of a unit. We also say
that T is the time to a hard or traumatic failure.
$$R(t) = P\{\text{unit does not fail over } [0, t]\} = P\{T > t\}$$
[Figure: R(t) plotted against t, decreasing from 1 toward 0]
Life time: time to failure, number of cycles to failure
Conditions: physical, chemical, mechanical stresses…
Reliability engineering relies heavily on statistics and probability theory
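
As an illustration of the definition R(t) = P{T > t}, the sketch below estimates R(t) from a sample of observed times-to-failure as the fraction of units still working at time t. The function name and the sample data are hypothetical, and the sketch assumes every unit was run until it failed (no censoring).

```python
def empirical_reliability(failure_times, t):
    """Estimate R(t) = P{T > t} as the fraction of units whose time-to-failure exceeds t."""
    if not failure_times:
        raise ValueError("need at least one observed time-to-failure")
    survivors = sum(1 for time_to_failure in failure_times if time_to_failure > t)
    return survivors / len(failure_times)


# Hypothetical times-to-failure (operating hours) of 8 units, all run to failure.
sample_hours = [120, 340, 560, 610, 800, 950, 1200, 1500]

for t in (0, 500, 1000, 2000):
    print(f"R({t}) is estimated as {empirical_reliability(sample_hours, t):.2f}")
```

The estimate starts at 1 for t = 0 and drops to 0 once t exceeds the longest observed life, matching the shape of the R(t) curve.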

Only 2 of the 5 items contributing to the definition can be drawn on a
2D graph.
If any of these 5 items changes, the reliability changes. That is why, in
order to assess, estimate, or study the reliability, one needs to have
ALL 5 items clearly defined.

KEY COMPONENTS
❑ Item
Item is what we are studying and embeds:
Components: components quality plays an important role

Design: e.g. high-temperature spots will reduce long-term
performance (will be detailed in upcoming lectures)
Design margins: the robustness of a design also plays an important role
Manufacturing process: e.g. hand soldering gives less reliable
results than automated soldering
Technology: e.g. lead-based solder performs differently
from lead-free solder
Transport, storage, installation, etc.
❑ Probability
A probability of 51% means that there is a 49% chance that the
conclusion is incorrect based on the data.
The probability is generally translated into the reliability target (e.g.
x% surviving at time t, PPM under warranty, cost of operating over a
specified time period, etc.)
Reliability is a number between 0 and 1.

❑ Required function
This should be defined for every part, subassembly, and product. The
statement of the required function should explicitly state or imply a
failure definition. For example, a pump's required function might be
moving at least 20 gallons per minute. The implied failure definition
would be moving fewer than 20 gallons per minute.
When defining the function, one has to have a clear definition of the
failure mode. If one function can fail in multiple ways (i.e. multiple
failure modes), then each failure mode has its own reliability function. The
reliability of the unit’s function is a combination of the individual
failure-mode reliabilities.
❑ Stated conditions
These include: environmental conditions, maintenance conditions, usage
conditions, storage and moving conditions, possibly others
❑ Specified period of time
DO NOT mix calendar time and TIME as a measure of functioning
(operating hours, cycles, km, miles, etc.). See the
example on business vs. commercial aircraft utilisation given in class.
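
The slide does not say how the failure-mode reliabilities of one function are combined. One common way, sketched below, assumes that the failure modes are statistically independent and that the function is lost as soon as any one mode occurs, so the reliabilities multiply. Both the independence assumption and the pump figures are illustrative, not taken from the course.

```python
import math


def function_reliability(mode_reliabilities):
    """Reliability of a function that fails if ANY of its failure modes occurs.

    Assumes independent failure modes (an assumption of this sketch), with all
    reliabilities given for the same period of time and the same conditions.
    """
    for r in mode_reliabilities:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each reliability must lie between 0 and 1")
    return math.prod(mode_reliabilities)


# Hypothetical pump, three failure modes of the function "move at least 20 gal/min":
# flow below 20 gal/min, external leak, fails to start.
pump_mode_reliabilities = [0.98, 0.995, 0.99]
print(f"Function reliability: {function_reliability(pump_mode_reliabilities):.4f}")
```

Under this assumption the function is always less reliable than its least reliable failure mode, which is why each mode needs its own clear failure definition.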

EXAMPLE IN CLASS

Electronic “on” time (in hours) versus flight hours:

COMPARISON RESULTS
Reliability cannot predict the exact time to failure of a unit. It always
deals with a population. Reliability provides metrics to measure the
performance of the population.
One of these metrics is the MTBUR.
MTBUR = mean time between unscheduled removals: the average
expected time between two consecutive removals.

That means that, for an MTBUR of 18,000 (operating hours), on
average, one unscheduled removal occurs for each 18,000 cumulated operating
hours (under the assumption that all units operate identically, a fleet
of 10 units will cumulate 18,000 hours if each operates 1,800 hours; a
fleet of 180 units will cumulate 18,000 hours when each operates
100 hours).
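
The fleet arithmetic above can be checked with a short sketch. The function names are illustrative; the only relation used is MTBUR = cumulated operating hours divided by the number of unscheduled removals.

```python
def mtbur(cumulated_operating_hours, unscheduled_removals):
    """Mean time between unscheduled removals, estimated from fleet data."""
    if unscheduled_removals <= 0:
        raise ValueError("at least one unscheduled removal is needed to estimate MTBUR")
    return cumulated_operating_hours / unscheduled_removals


def expected_removals(fleet_size, hours_per_unit, mtbur_hours):
    """Average number of removals expected when every unit operates hours_per_unit."""
    return fleet_size * hours_per_unit / mtbur_hours


# Both fleets from the slide cumulate 18,000 hours, so each expects one removal on average.
print(expected_removals(fleet_size=10, hours_per_unit=1_800, mtbur_hours=18_000))
print(expected_removals(fleet_size=180, hours_per_unit=100, mtbur_hours=18_000))
```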

MANAGEMENT’S RELIABILITY
MEASUREMENT
Reliability is not performed for the sake of reliability. It is a means to
achieve other targets:
❑ Safety

❑ Catastrophes
❑ Image (media impact)
❑ Availability
❑ Dispatch interruption rate
❑ Mission interruption rate
❑ Warranty
❑ Scheduled maintenance cost
❑ Life cycle cost
❑ Aftermarket
❑ Marketing
❑ Liability
❑ USD, CAD, EUR, YEN, etc.

STATISTICS AND RELIABILITY
Failures do not happen at fixed times; they occur randomly according to
a distribution.

The PDF (probability density function) is the basic description of the time to failure of an item.
All other functions related to an item’s reliability can be derived from
the PDF.
Reference: Z. Klim
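
To make the statement about the PDF concrete, the sketch below derives the cumulative distribution F(t), the reliability R(t) = 1 - F(t) and the failure rate h(t) = f(t) / R(t) from a PDF by numerical integration. The exponential PDF and its rate are assumptions chosen only for illustration; any other life distribution could be passed in instead.

```python
import math


def exponential_pdf(t, rate):
    """PDF f(t) of an exponential life distribution (used here only as an example)."""
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def cdf_from_pdf(pdf, t, steps=10_000):
    """F(t): integral of the PDF from 0 to t, computed with the trapezoidal rule."""
    if t <= 0:
        return 0.0
    h = t / steps
    area = 0.5 * (pdf(0.0) + pdf(t))
    for i in range(1, steps):
        area += pdf(i * h)
    return area * h


def reliability_from_pdf(pdf, t):
    """R(t) = 1 - F(t)."""
    return 1.0 - cdf_from_pdf(pdf, t)


def hazard_from_pdf(pdf, t):
    """Failure rate h(t) = f(t) / R(t)."""
    return pdf(t) / reliability_from_pdf(pdf, t)


RATE = 1.0 / 1000.0  # hypothetical: one failure per 1,000 operating hours


def pdf(t):
    return exponential_pdf(t, RATE)


t = 500.0
print("F(t) =", cdf_from_pdf(pdf, t))
print("R(t) =", reliability_from_pdf(pdf, t))
print("h(t) =", hazard_from_pdf(pdf, t))
```

For the exponential case the computed failure rate comes back as the constant rate that was put in, which is a convenient sanity check of the derivation.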

The cumulative distribution function of a real-valued random variable
X is the function given by
$$F_X(x) = P(X \le x)$$
where the right-hand side represents the probability that the random
variable X takes on a value less than or equal to x. The probability
that X lies in the semi-closed interval (a, b], where a < b, is
therefore
$$P(a < X \le b) = F_X(b) - F_X(a)$$
The probability that the value of the random variable T is less than or
equal to “t” is defined as the cumulative distribution function
$$F(t) = P(T \le t)$$
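
The interval probability described above is the difference of two values of the cumulative distribution function. A minimal sketch, assuming an exponential life distribution with a hypothetical mean life (both are illustrative choices, not course data):

```python
import math


def exponential_cdf(t, mean_life):
    """F(t) = P(T <= t) for an exponential life distribution (assumed model)."""
    return 0.0 if t <= 0 else 1.0 - math.exp(-t / mean_life)


def interval_probability(cdf, a, b):
    """P(a < T <= b) = F(b) - F(a), valid for a < b."""
    if not a < b:
        raise ValueError("requires a < b")
    return cdf(b) - cdf(a)


MEAN_LIFE_HOURS = 1000.0  # hypothetical mean life in operating hours


def cdf(t):
    return exponential_cdf(t, MEAN_LIFE_HOURS)


# Probability that a unit fails between 200 and 500 operating hours.
p_fail_between = interval_probability(cdf, 200.0, 500.0)
print(f"P(200 < T <= 500) = {p_fail_between:.4f}")
```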

R(t) + F(t) = 1

Reference: Z. Klim

FAILURE
Failure is the state or condition of not meeting a desirable or

intended objective, and may be viewed as the opposite of success.
Product failure ranges from failure to sell the product to fracture of
the product, in the worst cases leading to personal injury, the province
of forensic engineering.

The criteria for failure are heavily dependent on context of use, and
may be relative to a particular observer or belief system. A situation
considered to be a failure by one might be considered a success by
another, particularly in cases of direct competition or a zero-sum
game. Similarly, the degree of success or failure in a situation may be
differently viewed by distinct observers or participants, such that a
situation that one considers to be a failure, another might consider to
be a success, a qualified success or a neutral situation.
It may also be difficult or impossible to ascertain whether a situation
meets criteria for failure or success due to ambiguous or ill-defined
definition of those criteria. Finding useful and effective criteria, or
heuristics, to judge the success or failure of a situation may itself be a
significant task.
Failure can be differentially perceived from the viewpoints of the
evaluators. A person who is only interested in the final outcome of an
activity would consider it to be an Outcome Failure if the core issue
has not been resolved or a core need is not met. A failure can also be
a process failure whereby although the activity is completed
successfully, a person may still feel dissatisfied if the underlying
process is perceived to be below expected standard or benchmark.
Reference: WIKIPEDIA

FAILURE
[Figure: item status (function, degraded, failed) versus time, showing sudden, partial, degradation, and intermittent (nuisance) failures]
Failure is an event which causes the system performance to deviate
from the specified performance.
It is also defined as the termination of the ability of an item to perform its required
function.
❑ Fault is an erroneous state of system hardware or software
❑ Error is the manifestation of a fault
Failure classification
❑ Random failure: no apparent root cause
❑ Active failure: evident at the moment of occurrence. It either
immediately produces an observable deterioration in the system
performance (self-evident), or the system deterioration is not
observable but the failure is indicated by the monitoring system
❑ Dormant (latent) failure: not immediately observable at the
moment of occurrence. It produces no immediately observable effect
on the system performance and is not indicated by the monitoring
system
❑ Independent failure: the occurrence of one failure does not affect the
probability of another
❑ Common mode failure: an event having a single external cause
with multiple failure effects, which are not consequences of each other
❑ Cascading failures: a single event, not necessarily hazardous in
itself, can precipitate a series of other failures
FAILURE
Based on popular belief (which is not quite wrong), in order to make
sure the true root cause is understood, one should ask the question
WHY up to 7 times.
E.g. the car failed. WHY? (1). Because it does not start anymore.
WHY? (2). Because the starter does not turn. WHY? (3). Because
there is no electrical power. WHY? (4). Because the power is off.
WHY? (5). Because the battery is dead. WHY? (6). Because it is a
very cold morning. WHY? (7), which this time means "why does it not
start on a cold morning?". Because cold weather is often fingered as the
culprit when car batteries die, but actually warm temperatures do the
most damage to them. High temperatures quicken corrosion of internal
plates and vaporize the electrolyte faster. But car batteries usually go
dead in cold weather mostly because damage done during the summer
doesn’t show up until the battery is more taxed. A cold battery has
reduced cranking power, and cold temperatures thicken motor oil,
making it harder to turn the engine over [1].
The X WHY technique is generally combined with the octopus
approach, where the engineer puts himself in the place of the device
(octopus) and tries to imagine what the product (in this case himself)
experiences in its different states. The engineer has to consider vibration, temperature,
extreme operating conditions, corner-of-envelope usage, power
variations, day/night, summer/winter, thermal cycling, high
or low temperature, humidity, corrosive agents, dust/pollution, jamming
factors, forces of any kind, cosmic radiation, etc.
[1] https://www.consumerreports.org/cro/news/2009/11/q-a-why-do-car-batteries-die-in-winter/index.htm

PHYSICAL BREAKDOWN
A breakdown is always related to a design or a real product, of
which it is a breakdown. It is identified and versioned as an object in
its own right. It has a number of constituents (breakdown elements),
often structured hierarchically, that make up the breakdown
structure.
Reference: WEB

FUNCTIONAL BREAKDOWN
A functional block diagram in systems engineering and software
engineering is a block diagram that describes the functions and
interrelationships of a system.
The functional block diagram can picture:
❑ the functions of a system, pictured by blocks
❑ the input and output elements of a block, pictured with lines
❑ the relationships between the functions
❑ the functional sequences and paths for matter and/or signals
Reference: WEB

BREAKDOWN
In order to assess the impact of a failure on the top-level function,
one must assess the link between the two. A typical breakdown
represents this link, starting from the functional level, going down to the
technical functions that ensure the upper function, and linking these
technical functions to the physical piece-part/component/installation
involved.
For example, a screen has the function of displaying information, one
of its technical functions is the power supply, and one of the
components is the 110 V plug.
Similarly, the breakdown of a simple bicycle is shown below:
Reference: WEB

TYPICAL BREAKDOWN
Below is the typical break-down to be considered.
For very complex systems, the operational function might be split into
multiple layers. Also, especially in electronics, do not ignore the
potential contribution of one technical function to multiple system
functions, as well as the potential contribution of one piece-part to
multiple technical functions. The technical function power supply of a
modern screen impacts both the function display information (image)
on the screen and the function acquire images with the integrated
camera.
Note that for very simple systems, operational functions might be
directly related to technical functions, and these might consist
of a single installation/component/piece-part level.
Even for very simple systems, do not skip breaking down the system
into operational and then technical (sub)functions, as this exercise
might reveal interconnections that would otherwise be missed.
Reference: WEB
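
The break-down described above can be sketched as two small mappings, from system functions to technical functions and from technical functions to piece-parts. The screen content below is hypothetical and only loosely follows the example in the text; the helper shows how a shared technical function (power supply) links one piece-part to several system functions.

```python
# Hypothetical break-down of a screen with an integrated camera.
SYSTEM_FUNCTIONS = {
    "display information": ["power supply", "image processing"],
    "acquire images": ["power supply", "camera sensing"],
}
TECHNICAL_FUNCTIONS = {
    "power supply": ["110 V plug", "power board"],
    "image processing": ["display controller"],
    "camera sensing": ["camera module"],
}


def system_functions_affected_by(piece_part):
    """Return the system functions that depend on the given piece-part."""
    affected_technical = {
        technical
        for technical, parts in TECHNICAL_FUNCTIONS.items()
        if piece_part in parts
    }
    return sorted(
        system
        for system, technical_list in SYSTEM_FUNCTIONS.items()
        if affected_technical.intersection(technical_list)
    )


print(system_functions_affected_by("110 V plug"))
print(system_functions_affected_by("camera module"))
```

Walking the mappings upward from a piece-part is the same path a dysfunctional assessment follows when it traces a low-level failure to the system functions it can affect.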

DYSFUNCTIONAL ASSESSMENT
The reliability definition involves working with failures. In order to address
the impact of a failure, one must add a supplementary layer to the
functional break-down: a layer that represents the definitions of the
failures of the system functions and technical functions, as well as the failures
of the piece-part/component/installation. The overall interaction of
these failures is essential in evaluating the impact of a low-level
failure (e.g. a resistor) at technical function level (e.g. loss of power
supply) and at system level (e.g. loss of the function display information on the
screen).
Generally, the “X WHY?” technique implies that, in order to
assess the root cause of a high system-level failure, one has to ask
“WHY?” multiple times. The approach involves people outside
reliability engineering, e.g. maintenance, sales, customers, weight,
design, logistics, management, program, etc.
Why would the failure occur? Because the system would not react.
Why would the system not react? Because the …. Why would….
At this point of the course, all that’s requested is to:
❑ consider the functional assessment
❑ define failure modes for each system function, technical function
and piece-part/component/installation
Two of the most popular tools used to assess the impact and the link
between these failures are referenced in the following slides. Specifics
on these tools will be the topic of a future lecture.

EXAMPLE OF FAILURE DEFINITION

Reference: WEB

RISK
Failure = the system is not performing the intended function
We’d like to know:
- when will the first failure occur?
- how long will it take between two consecutive failures?
Both can be measured in hours, km / miles, or number of cycles.
Answer: time to failure is a random variable
=> one cannot provide the precise time of arrival but can give the probability of
occurrence before a certain date
One cannot say « failure will occur at 80 months », but:
« there is an x% chance that failure occurs before 84 months »
This probability is the risk: what is the risk level we’re accepting?
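
A statement of the form "x% chance of failure before 84 months" needs a life distribution. The sketch below assumes an exponential distribution (constant failure rate) and a hypothetical MTTF of 180 months; both are illustrative assumptions, not values given in the course.

```python
import math


def prob_failure_before(t, mttf):
    """P(T <= t), assuming an exponentially distributed time to failure."""
    return 1.0 - math.exp(-t / mttf)


def time_for_risk(risk, mttf):
    """Time at which the probability of failure reaches the accepted risk level."""
    if not 0.0 < risk < 1.0:
        raise ValueError("risk must be strictly between 0 and 1")
    return -mttf * math.log(1.0 - risk)


MTTF_MONTHS = 180.0  # hypothetical MTTF of 15 years

x = prob_failure_before(84.0, MTTF_MONTHS)
print(f"Chance of failure before 84 months: {100 * x:.1f}%")
print(f"Months until a 10% risk is reached: {time_for_risk(0.10, MTTF_MONTHS):.1f}")
```

The second function answers the question from the other side: given the risk level one is willing to accept, how long can the item be operated.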


RISK
Risk is potential of losing something of value. Values (such as physical
health, social status, emotional well-being or financial wealth) can be
gained or lost when taking risk resulting from a given action or inaction,
foreseen or unforeseen. Risk can also be defined as the intentional
interaction with uncertainty. Uncertainty is a potential, unpredictable,
unmeasurable and uncontrollable outcome; risk is a consequence of
action taken in spite of uncertainty.

Risk perception is the subjective judgment people make about the
severity and probability of a risk, and may vary person to person. Any
human endeavor carries some risk, but some are much riskier than others.
The risk is a measure of a danger which puts together the measure of
occurrence of the unwanted event and the measure of the consequences
of this event.
[Figure: RISK as the combination of Probability and Severity]
Reference: WIKIPEDIA

RISKS
❑ safety
❑ media impact
❑ availability

❑ mission interruption
❑ scheduled maintenance cost
❑ life cycle cost
❑ aftermarket
❑ warranty
❑ marketing
❑ liability
❑ program cost
❑ company’s reputation
Reliability impacts the risk through its probability value.

EXAMPLE OF STANDARDS:
INTERNATIONAL ELECTROTECHNICAL
COMMISSION 6158


EXAMPLES
TYPES OF RELIABILITY
❑ Design Reliability
The design reliability of a product is the predicted reliability performance of
the product at the end of the development phase.
The prediction may be based on field experience from similar products, testing,

expert judgment, and various types of analysis.
The prediction is based on nominal environmental and operational conditions
used during the design process.
❑ Inherent Reliability
The reliability of the products produced will tend to differ from the design
reliability due to quality variations.
The variations result from some of the components not conforming to the design
specification and/or assembly errors.
The reliability of produced items is often referred to as the inherent reliability.
❑ Field (Operational) Reliability
The field reliability is the reliability of the product subsequent to the sale of the
product.
The field reliability is calculated based on recorded failures and malfunctions.
The field reliability is also called the actual reliability.
Very often, the field reliability of a product differs from the design reliability
due to environmental and operational conditions varying from customer to

customer and differing from the nominal values used in the design process.
It also depends on the maintenance actions carried out by the customers during
the use of the product.
Reference: Z. Klim

RMAS & PROGRESSION OF ESTIMATES
[Figure: progression of estimates from Manufacturer to User]
Manufacturer (intrinsic characteristics):
❑ Intrinsic reliability
❑ Intrinsic maintainability
❑ Intrinsic availability
User (real characteristics, depending on usage and maintenance):
❑ Customer support
❑ Maintenance
❑ Maintenance support
❑ Operational availability

MAINTENANCE AND RISK
RISK RATING EXAMPLE
Reference: web
PROBABILITY OF OCCURRENCE
❑ It is the mathematical measure of the risk.
❑ It is measured in values between 0 and 1
❑ It is linked to a time period measured in the same unit of measure
as the reliability (e.g. operating hour, km, etc.)
E.g. the risk of derailment of a train car is 7.8 derailments per billion
freight car-miles (FCM)
This is equivalent to 7.8E-9 derailments per car per mile.
For a 1,500-mile distance (Montreal to Orlando), this probability
becomes 1.17E-5 / mission
1.17E-5 = 7.8E-9 * 1,500
For a life of 10 years and 180 missions a year (1,800 missions), the probability
becomes 2.1E-2*
2.1E-2 ≈ 1.17E-5 * 1,800
*Under the assumption of a constant probability and of no
maintenance action taken
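
The derailment arithmetic can be reproduced step by step. The sketch below uses only the inputs stated in the example (rate per car-mile, mission length, missions per year, years of life); the variable names are illustrative. It computes both the simple product, which is valid for rare events, and the exact probability of at least one event over the life under the same constant-probability, no-maintenance assumption.

```python
rate_per_car_mile = 7.8e-9   # 7.8 derailments per billion freight car-miles
mission_miles = 1_500        # Montreal to Orlando
missions_per_year = 180
life_years = 10

missions = missions_per_year * life_years

# Probability of a derailment during one mission.
p_mission = rate_per_car_mile * mission_miles

# Rare-event approximation: probabilities simply add up over the missions.
p_life_approx = p_mission * missions

# Exact probability of at least one derailment over all missions.
p_life_exact = 1.0 - (1.0 - p_mission) ** missions

print(f"per mission: {p_mission:.3e}")
print(f"over the life (approximation): {p_life_approx:.3e}")
print(f"over the life (exact): {p_life_exact:.3e}")
```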

MAINTENANCE AND RISK
Ideally, the maintenance action reduces the risk to its original
value (as good as new).
Risk-based maintenance (RBM) prioritizes maintenance resources
toward assets that carry the most risk if they were to fail. It is a

methodology for determining the most economical use of maintenance
resources. This is done so that the maintenance effort across a facility
is optimized to minimize any risk of a failure.
A risk-based maintenance strategy is based on two main phases:
1. Risk assessment
2. Maintenance planning based on the risk
The maintenance type and frequency are prioritized based on the
risk of failure. Assets that have a greater risk and consequence of
failure are maintained and monitored more frequently. Assets that
carry a lower risk are subjected to less stringent maintenance
programs. Implementing a risk-based maintenance process means that
the total risk of failure is minimized across the facility in the most
economical way.
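
The two phases of risk-based maintenance can be sketched as a risk score followed by a ranking. The sketch below takes risk as probability of failure times consequence, a common convention that is assumed here; the asset names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    failure_probability: float  # probability of failure over the planning period
    consequence_cost: float     # cost incurred if the failure occurs

    @property
    def risk(self) -> float:
        """Phase 1, risk assessment: probability times consequence."""
        return self.failure_probability * self.consequence_cost


def prioritize(assets):
    """Phase 2, maintenance planning: highest-risk assets come first."""
    return sorted(assets, key=lambda asset: asset.risk, reverse=True)


# Hypothetical facility.
assets = [
    Asset("cooling pump", 0.10, 50_000),
    Asset("conveyor motor", 0.30, 5_000),
    Asset("control cabinet", 0.02, 400_000),
]

ranking = [asset.name for asset in prioritize(assets)]
print(ranking)
```

Note that the asset most likely to fail (the conveyor motor) ends up last: ranking by risk rather than by failure probability alone is what distinguishes this approach.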

BASIC METRICS
❑ Reliability
❑ MTTF
❑ MTBF
❑ MTBUR
❑ Maintainability
❑ Availability
❑ Failure rate

« RELIABILITY » AS METRIC
Reliability engineering is an engineering field that deals with
the study, evaluation, and life-cycle management of reliability:
the ability of a system or component to perform its required
function under stated conditions for a specified period of time.

[Figure: R(t) plotted against t, decreasing from 1 toward 0; the point (tfix, R(tfix)) is marked on the curve]
For a given value of time tfix, one can compute the reliability of a
system R(tfix) and express it as a fixed value. Industry often
uses a statement like “Reliability of 93%”; such a statement always
implies a fixed time.
Examples of tfix: warranty time, 100,000 km / 75K miles, A-check /
C-check, 15 years (end of life), etc.
Institute of Electrical and Electronics Engineers (1990) IEEE Standard Computer Dictionary: A Compilation
of IEEE Standard Computer Glossaries. New York, NY ISBN 1-55937-079-3
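
A statement like "Reliability of 93%" only has meaning together with its fixed time. The sketch below evaluates R(tfix) and, in the other direction, the MTTF needed to reach a target reliability at tfix. It assumes an exponential life distribution, and the 2-year warranty is a hypothetical value of tfix.

```python
import math


def reliability_at(t_fix, mttf):
    """R(t_fix) for an exponential life distribution (assumed model)."""
    return math.exp(-t_fix / mttf)


def mttf_required(t_fix, target_reliability):
    """MTTF needed so that R(t_fix) equals the target, under the same assumption."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target reliability must be strictly between 0 and 1")
    return -t_fix / math.log(target_reliability)


WARRANTY_YEARS = 2.0  # hypothetical fixed time t_fix

needed_mttf = mttf_required(WARRANTY_YEARS, 0.93)
print(f"MTTF needed for 93% reliability at {WARRANTY_YEARS:g} years: {needed_mttf:.1f} years")
print(f"Same MTTF, reliability at 10 years: {reliability_at(10.0, needed_mttf):.2f}")
```

The same item therefore carries a different reliability figure for every choice of tfix, which is why the time must always be quoted with the percentage.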

COMPONENT DEFINITION
For reliability purposes, component refers to the lowest-level part/LRU a
system is broken down into.
A chair in a space elevator, for example, is the component (LRU = line
replaceable unit) if a failure involves the complete removal and
replacement of the chair.
The same chair can be broken down into parts in the service center if the
design and the manuals allow the maintenance team to do so. E.g. the chair’s
controller (computer) that drives the actuator becomes the
component for the maintenance team, as this computer is the part they
remove and replace in order to fix the failure “actuator not working”.
The controller can be split into piece-parts in the Supplier’s shop if
they can fix the failure by replacing a specific piece-part. The piece-part
is then the component for the Supplier.

MTBFMONTREAL.CA
SYSTEM RELIABILITY
[Figure: block diagram with components C1 and C2]

$$R_{parallel} = 1 - \prod_{i=1}^{n} (1 - R_i)$$

MTBFMONTREAL.CA
PARALLEL SYSTEM M OUT OF N
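In an m out of n parallel configuration, the system performs its
intended function if at least m out of a total of n components are operational.

[Figure: block diagram with components C1, C2, ..., Ci, ..., Cn feeding an m/n voting block]

For identical components of reliability $r_C(t)$, the system’s reliability is:

$$R_S(t) = \mathrm{Prob}(\text{at least } m \text{ out of } n \text{ operate}) = \sum_{k=m}^{n} \frac{n!}{k!\,(n-k)!}\, [r_C(t)]^{k}\, [1 - r_C(t)]^{n-k}$$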
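
The two system formulas above can be checked numerically. The sketch below is only an illustration: it assumes independent components, and the function names and the component reliability of 0.9 are hypothetical values chosen for the example, not data from the course.

```python
from math import comb


def parallel_reliability(reliabilities):
    """R_parallel = 1 - prod(1 - R_i), assuming independent components."""
    unreliability = 1.0
    for r in reliabilities:
        unreliability *= (1.0 - r)
    return 1.0 - unreliability


def m_out_of_n_reliability(m, n, r):
    """Probability that at least m of n identical, independent components operate."""
    return sum(comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))


if __name__ == "__main__":
    r = 0.9  # hypothetical component reliability at a fixed time
    print("parallel, 3 units :", parallel_reliability([r, r, r]))
    print("1 out of 3        :", m_out_of_n_reliability(1, 3, r))
    print("2 out of 3        :", m_out_of_n_reliability(2, 3, r))
    print("3 out of 3        :", m_out_of_n_reliability(3, 3, r))
```

With m = 1 the m out of n expression reduces to the plain parallel formula, and with m = n it reduces to a series system (r to the power n), which is a quick sanity check on any implementation.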

MTBFMONTREAL.CA
Up to 19.3.2 (standby systems) - excluded

MTBFMONTREAL.CA
ACHIEVING RELIABILITY THROUGH
REDUNDANCY
Redundancy can only be used when the functional design of the system
allows for the incorporation of replicated components. It is used extensively
in electronic products to achieve high reliability when individual
components have unacceptably low reliability. Building in redundancy
corresponds to using a module consisting of (M) replications of a
component.
The number of replications needed depends on the actual and the
allocated reliability. The reliability increases as the number of replicated
components (M) increases (see figure). The decision regarding the use of
redundancy has implications for production cost and must take into account
other constraints such as weight and/or volume. We need to ensure that
these constraints are not violated.
The manner in which the replicates are put to use depends on the type of
redundancy:
❑ In active redundancy, all (M) components of the module are in their
operational state, or “fully energized,” when put into use.
❑ In passive redundancy, only one component is in its fully energized state
and the remaining are either partially energized (warm standby) or kept
in reserve and energized when put into use (cold standby).

If all components in the module have failed, then the module has failed.
Reference: Z. Klim
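
As an illustration of how the number of replications depends on the actual and the allocated reliability, the sketch below assumes active redundancy with M identical, independent components, where the module works as long as at least one component works, so that R_module = 1 - (1 - r)^M. The function names and the numerical values are hypothetical.

```python
import math


def module_reliability(r, m):
    """Reliability of a module of m identical, independent components in active redundancy."""
    return 1.0 - (1.0 - r) ** m


def replications_needed(r, allocated):
    """Smallest M such that 1 - (1 - r)**M >= allocated (assumption: active redundancy)."""
    if not (0.0 < r < 1.0 and 0.0 < allocated < 1.0):
        raise ValueError("r and allocated must lie strictly between 0 and 1")
    if r >= allocated:
        return 1
    m = math.ceil(math.log(1.0 - allocated) / math.log(1.0 - r))
    # Guard against floating-point rounding in the logarithm ratio.
    while module_reliability(r, m) < allocated:
        m += 1
    return m


if __name__ == "__main__":
    r_component = 0.80   # hypothetical actual component reliability
    r_allocated = 0.99   # hypothetical allocated module reliability
    m = replications_needed(r_component, r_allocated)
    print(m, module_reliability(r_component, m))
```

The sketch ignores cost, weight and volume; in practice those constraints decide whether the computed M is acceptable.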

MTBFMONTREAL.CA
WEBTOOL
http://www.ecs.umass.edu/ece/koren/FaultTolerantSystems/simulator/NonSerPar/nsnpframe.html


INDU 6391
EXAMPLE OF USE

123
MTBFMONTREAL.CA
TIME TO FAILURE
❑ Random Variable "T" is a measurement of the possible outcome of an experiment
❑ Particular value taken by R.V. "T" is denoted by t
❑ Time to failure "T": continuous random variable

MTBFMONTREAL.CA
MTTF: MEAN TIME TO FAILURE
MTBF: MEAN TIME BETWEEN FAILURES


MTBFMONTREAL.CA
MTTF / MTBF / MTTR
[Figure: diagram with the MTBF and MTTF intervals marked]
MTTR: Mean time to repair is the average time needed to restore a
unit to an operational state.

MTBFMONTREAL.CA
FAILURE DEFINITION IS IMPORTANT
For example, a car fleet's reliability is observed, measured in MTTF, time
to first failure. For academic purposes, let’s suppose that only 2 failure
modes are being observed: tire burst and main computer, both failures
leading to unavailability of the car.
Let’s assume the following data: over the last 12 months, the fleet has
cumulated 100,000 km (so time is defined in km) up to the first failure.
This 100,000 km is the sum of the kilometers cumulated by each car up to
its first failure.
5 computers and 20 tires failed (confirmed failures) over the same time period.

car      km            failures (tire + computer)
total    100,000.00    25 (20 tire, 5 computer)
1        1,500.00      1
2        1,000.00      1
3        2,500.00      1
4        5,500.00      1
5        6,500.00      1
6        1,700.00      1
7        2,600.00      0
8        3,400.00      1
9        4,500.00      1
10       1,600.00      0
11       2,400.00      1
12       2,100.00      1
13       1,900.00      0
14       1,600.00      1
15       1,500.00      0
16       1,000.00      1
17       2,500.00      1
18       5,500.00      1
19       5,900.00      0
20       1,700.00      1
21       2,600.00      1
22       3,400.00      0
23       4,500.00      1
24       4,600.00      1
25       5,800.00      1
26       6,800.00      1
27       1,300.00      1
28       5,900.00      1
29       1,400.00      1
30       6,800.00      2

MTBFMONTREAL.CA
Case 1: computer reliability
Data: same fleet table as in the FAILURE DEFINITION IS IMPORTANT slide
(total: 100,000.00 km, 20 tire failures, 5 computer failures).
Each car is equipped with one computer, so the computers have cumulated
100,000 km and 5 failures. The cars whose computer did not fail add
suspension times to our analysis, as their computer worked but the failure
(blue in the table) was not due to a computer. The same applies to
non-failed cars, as their computers add operating time without failure.
The above leads to MTTF = 100,000 km / 5 = 20,000 km

MTBFMONTREAL.CA
Case 2: tire reliability
Data: same fleet table as in the FAILURE DEFINITION IS IMPORTANT slide
(total: 100,000.00 km, 20 tire failures, 5 computer failures).
Each car is equipped with 4 tires, so the tires have cumulated
4 * 100,000 km and 20 failures. The cars without a tire failure add
suspension times to our analysis, as their tires worked but the failure
(yellow in the table) was not due to a tire. The same applies to non-failed
cars, as their tires add operating time without failure. Moreover, the
non-failed tires (most of the cars have one tire failed and 3 non-failed)
add operational time, as their operation was suspended without failure.
The above leads to MTTF = 400,000 km / 20 = 20,000 km

MTBFMONTREAL.CA
Case 3: car fleet reliability
Data: same fleet table as in the FAILURE DEFINITION IS IMPORTANT slide
(total: 100,000.00 km, 20 tire failures, 5 computer failures).
For 100,000 km (cumulated by failed and non-failed cars) the data shows
25 failures.
The above leads to MTTF = 100,000 km / 25 = 4,000 km
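
The three cases can be reproduced in a few lines, which makes the effect of the failure definition explicit. This is a minimal sketch using only the fleet totals quoted in the slides (100,000 km, 20 tire failures, 5 computer failures); the variable and function names are illustrative.

```python
def mttf(cumulated_exposure_km, failures):
    """MTTF as cumulated exposure divided by the number of failures."""
    return cumulated_exposure_km / failures


TOTAL_KM = 100_000          # fleet kilometers cumulated up to first failure
TIRE_FAILURES = 20
COMPUTER_FAILURES = 5
TIRES_PER_CAR = 4
COMPUTERS_PER_CAR = 1

# Case 1: each car carries one computer, so exposure equals the fleet km.
computer_mttf = mttf(COMPUTERS_PER_CAR * TOTAL_KM, COMPUTER_FAILURES)

# Case 2: each car carries four tires, so exposure is four times the fleet km.
tire_mttf = mttf(TIRES_PER_CAR * TOTAL_KM, TIRE_FAILURES)

# Case 3: the car is the unit; any failure mode counts.
car_mttf = mttf(TOTAL_KM, TIRE_FAILURES + COMPUTER_FAILURES)

if __name__ == "__main__":
    print("computer MTTF [km]:", computer_mttf)
    print("tire MTTF [km]    :", tire_mttf)
    print("car MTTF [km]     :", car_mttf)
```

The same field data gives 20,000 km or 4,000 km depending on what is counted as the unit and as the failure.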

MTBFMONTREAL.CA
MTBUR
An acronym for Mean Time Between Unscheduled Removals. This is
an operational measurement. If all removals were due to actual
component failure, then MTBUR would be equivalent to MTBF, but that
is not usually the case, and so MTBUR is usually less than MTBF.


MTBFMONTREAL.CA
MTBX IN INDUSTRY
The notation MTBX is used generically to highlight that the following
applies to MTTF, MTBF, MTBUR, etc.
The MTBX is defined as the cumulated time divided by the number
of events. The X varies depending on the definition of the events
(see previous slide, e.g. X = UR if the events are unscheduled
removals, etc.).
It is important to have a clear definition of the event. The following list
is not exhaustive:
❑ removal
❑ unscheduled removal
❑ justified removal
❑ failure
❑ induced failure
❑ confirmed failure

MTBFMONTREAL.CA
MTBUR = 2 MINUS 7
MTBUR: Mean time between unscheduled removals
MTBF: J (justified) 4, N (non-induced) 6, C (confirmed) 8

[Figure: removal classification tree; the numbered boxes are listed below]
1 Removal: unit removed
2 Unscheduled Removal: known or suspected malfunction
3 Scheduled Removal: unit removed to perform maintenance
4 Failure / Fault: failure/fault found
5 Unjustified Removal: No Failure / No Fault Found (NFF)
6 Failure / Fault: unit used within specification
7 Induced Failure / Fault: unit used out of specification
8 Confirmed / Accepted Failure / Fault: failure/fault substantiates the reason for removal
9 Unconfirmed Failure / Fault: failure/fault does not substantiate the reason for removal
(figure label between boxes 6 and 7: Predictions)
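
To make the event definitions concrete, the sketch below computes several MTBX values from the same cumulated time. It reads the slide title as "the events counted for MTBUR are box 2 minus box 7", and uses boxes 4, 6 and 8 for the justified, non-induced and confirmed MTBF. The box counts and the 40,000 operating hours are hypothetical numbers invented for the example.

```python
def mtbx(cumulated_time, events):
    """Generic MTBX: cumulated time divided by the number of events."""
    if events <= 0:
        raise ValueError("at least one event is needed to compute an MTBX")
    return cumulated_time / events


# Hypothetical counts per box of the removal classification tree.
box = {
    1: 10,  # removals
    2: 8,   # unscheduled removals
    3: 2,   # scheduled removals
    4: 6,   # failure / fault found (justified)
    5: 2,   # unjustified removals (NFF)
    6: 5,   # failures with the unit used within specification (non-induced)
    7: 1,   # induced failures
    8: 4,   # confirmed failures
    9: 1,   # unconfirmed failures
}
HOURS = 40_000  # hypothetical cumulated operating time

mtbur = mtbx(HOURS, box[2] - box[7])   # slide title: 2 minus 7
mtbf_justified = mtbx(HOURS, box[4])
mtbf_non_induced = mtbx(HOURS, box[6])
mtbf_confirmed = mtbx(HOURS, box[8])

if __name__ == "__main__":
    print("MTBUR    :", round(mtbur, 1))
    print("MTBF (J) :", round(mtbf_justified, 1))
    print("MTBF (N) :", round(mtbf_non_induced, 1))
    print("MTBF (C) :", round(mtbf_confirmed, 1))
```

Because each step down the tree can only keep or reduce the number of counted events, the resulting figures are ordered MTBUR <= MTBF (J) <= MTBF (N) <= MTBF (C) for the same cumulated time.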
MTBFMONTREAL.CA
DATA – HOURS TO FAILURE EXAMPLE
The following pages present an example of a unit that should meet, by
contract, a minimum MTBUR value of 4,000 hours.
The overall analysis of the unit’s performance, measured by MTBUR = (sum
of times to failure) / (number of failures), shows compliance with the
contractual value.

Cumulated over the life
Partial extract of the data
The first approach is to take each individual item contributing to the analysis
and to enter its known operating time (first column), removal (second
column) or suspension (third column).
Using the time to removal or to suspension for each individual unit in the fleet
gives the performance over the life of the product since entry into service.
This approach is not sensitive to any design / manufacturing / installation /
operational / maintenance changes over time, as one is unable to say in which
year the failure encountered after 8 h occurred.
Note: the complete data is in the file “MTBUR numerical example” attached on
Moodle.
MTBFMONTREAL.CA
MTBX IN INDUSTRY
It is very common not to have access to each individual product’s performance due to lack of
traceability, but to have access to the overall performance of the fleet.
For example, let’s suppose I am a car producer and I track the performance of an electronic
computer. I shipped the computer with serial number XX to my Customer A who has Car Number 1
and Car Number 2. I do not know how many hours the unit with serial number XX has operated
because I do not know if it was installed on car Number 1 or on car Number 2 (my Customer did
not provide me with this data), but I know that, during the month of October, both cars have
summed ZZ number of hours. Extending this situation to all my Customers, I cannot use the exact
operating time to removal/failure of each individual computer to understand the overall
performance of my computers.

How about if, instead of using time to failure for each component in the sample size, we use the
observed performance over a specific period of time?
So, instead of using the exact operating time for each individual computer, I have to stick to a
number of hours cumulated over a period of time and a number of failures over the same period
of time.
Depending on the time frame chosen for the study, multiple options are available when counting
the cumulated operating time and cumulated number of removals/failures :
❑ over the last fiscal/ calendar year: very useful for economic purposes as many financial targets
are set per fiscal year and these targets are influenced by the reliability performance
❑ over the last four months: useful to observe a trend, as well as seasonal (e.g. winter/summer)
behaviour
❑ over the last 12 months: very useful to observe a more stable performance; as 12 month data
generally integrates a large value for time (number of hours, cycles, km, miles, etc.) and an
important number of events, short time variability is attenuated.
❑ during a specific time frame: e.g. car/train industry 50,000 km regardless of the calendar time;
some particular cases might require observation over a non-standard time frame
❑ since the beginning of life: MTBX shows the overall performance of the product since the entry
into service; as today’s products have generally long lives (years), one should pay attention to
major changes that might impact any of the 3 parameters of the reliability definition (product,
operational conditions, function), e.g. changes in manufacturing, design updates, extra-
functionality added to product, operating conditions change, maintenance procedures change,
maintenance team change (in most cases to a better one), etc. OBSERVATION: this
approach should give the same result as the previous approach using the individual
contribution of each item in the sample.
It makes sense to see variations of this MTBX number over time.

MTBFMONTREAL.CA
MTBUR: MOVING AVERAGE
Moving average reports are used to show the performance of a product
over time. Depending on the reliability of the product and the size of the
sample, among other things, the performance (MTBUR/MTBF) of the product
measured over small time periods (say monthly) could vary widely and will
not be a real indicator of performance.
Take for example a product that in one month had 2 unscheduled removals
over 20,000 Flight Hours and in the following months had 3 unscheduled
removals and 1 unscheduled removal respectively, for the same amount of
flight hours. The monthly MTBUR for this product will be 10,000, 6,667 and
20,000 Flight Hours for each month respectively; a wide variation. The
variation is considered natural noise in the data.
In order to get a better picture of true performance of the product, we can
instead calculate the average performance over a fixed period of time
ending on the month in question. This is a moving average. So for a 3 month
moving average, data from any one month is averaged with data from the
prior two months. However, in many cases, 3 months may still be too short a
time period to dampen, or "filter" out, this natural noise, so 6 and 12 month
moving average reports are also provided.
Note also that using a "longer" moving average report (12MMA vs 3MMA)
may have its downsides. The longer the time period used for moving
averages, the longer it will take for true shifts in the performance to show
up.
A balanced use of these reports should provide the information
needed for making decisions, always keeping in mind what question you're
trying to answer.
The following pages present different analyses of the same data. First is the
“classic approach”; then the data is obtained by reading the monthly
operating times and failures.
Reference: https://havrel.honeywell.com/docs/index.cfm?content=help/FAQ.cfm#8

MTBFMONTREAL.CA
DATA – CUMULATED OPERATING TIME
The same numerical data as before is now analyzed by month:
instead of each operating time to removal, we analyze the cumulated
operating time per month and the number of removals per month.
The following analysis takes into account the monthly operating time
instead of the individual times to failure. So, for 201202 (February
2012), without knowing how many units operated, it is known that the
cumulative operating time was 1,091 h.
The overall MTBUR (sum of all operating times divided by the number of
removals) is (obviously) the same as the one computed before.
Partial extract of the data


MTBFMONTREAL.CA
3 AND 12 MONTHS MOVING AVERAGE
MTBUR
Instead of using the total operating time, the 3 months moving
average MTBUR gives the performance over the last 3 months
(computed as the sum of the operating times of the last 3 months
divided by the number of removals encountered over the last 3
months).
The 3 months moving average MTBUR can be computed for any
month that has operating data for the 2 previous months.
In a similar manner, the 12 months moving average reflects the
performance of the last 12 months.
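
The moving average described above can be sketched as follows. The input is the three-month example given earlier in the MOVING AVERAGE slide (2, 3 and 1 unscheduled removals over 20,000 Flight Hours each); the function name and the choice to return None for months without enough history are assumptions made for this illustration.

```python
def moving_average_mtbur(hours, removals, window):
    """MTBUR over a sliding window: sum of hours / sum of removals in the window.

    Returns one value per month; None where fewer than `window` months of data
    exist or where the window contains no removal.
    """
    if len(hours) != len(removals):
        raise ValueError("hours and removals must have the same length")
    result = []
    for i in range(len(hours)):
        start = i + 1 - window
        if start < 0:
            result.append(None)
            continue
        total_hours = sum(hours[start:i + 1])
        total_removals = sum(removals[start:i + 1])
        result.append(total_hours / total_removals if total_removals else None)
    return result


monthly_hours = [20_000, 20_000, 20_000]
monthly_removals = [2, 3, 1]

monthly_mtbur = moving_average_mtbur(monthly_hours, monthly_removals, window=1)
three_month_mtbur = moving_average_mtbur(monthly_hours, monthly_removals, window=3)

if __name__ == "__main__":
    print([round(v) for v in monthly_mtbur])   # monthly values vary widely
    print(three_month_mtbur)                   # one smoothed value for month 3
```

Note that the window sums hours and removals before dividing; averaging the three monthly MTBUR values instead would give a different, less meaningful number.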

MTBFMONTREAL.CA
DATA INTERPRETATION
[Figure: 3 months MTBUR, 12 months MTBUR and Contract value per month, from 201101 to 201611; log scale with ticks at 1,800 and 18,000]

▪ The 3 months average is not representative, due to the low number of removals
and to the high variation of this number (too much noise)
▪ For a short period of time at the end of the summer 2013, performance
was below target.
▪ 2013-2016 shows a performance around the minimum contractual value
▪ Since March 2016, the unit is performing below target
It is interesting to understand the removal reason that induced the
decreasing trend from 2011 to 2013. It looks like an ageing failure mode
is setting in.
▪ Once the ageing units are balanced by the newly installed ones, over
2013-2016, the overall performance is quite constant
▪ From March 2016, the unit’s performance is constantly decreasing. Action
needs to be taken to improve the performance. The generic word "action" is
defined by the following process (FRACAS – failure reporting, analysis,
and corrective action system):
- Identify the driving removal reason and the driving failure mode(s)
- Understand the field root causes initiating these failure modes
- Update the FMEA
- Implement actions to reduce/eliminate the impact of these root causes (either by eliminating the root
causes or by reducing/eliminating their impact)
- Monitor the effectiveness of the measures by tracking the performance of the units with these corrective
actions implemented
MTBFMONTREAL.CA
EXAMPLE OF USE
An investigation of the fuel control system of the F100 Engine was
conducted. Some of the recommendations were justified using the
MTBUR moving average technique.
Recommendation

Justification
Reference: Google books: Final Report on the Fuel Control system of the F100 Engine

MTBFMONTREAL.CA
MTBUR – 12 MONTHS MOVING AVERAGE
EXAMPLE ON A CAR ITEM
Example of variations induced by the age of the fleet. Early years
showed a very good MTBUR because all vehicles were
equipped with new items. Towards 2012, the MTBUR increases,
potentially due to any of the following:
❑ design change (new parts, improved operational conditions: e.g.
reduced temperature, vibration, improved hermeticity to moisture,
etc.)
❑ large order (many new vehicles on market)
❑ maintenance improved
❑ operating conditions improved
❑ new manufacturing line and/or process (or existing one upgraded)
❑ new supplier
❑ etc.
[Figure: 12-month moving average MTBUR (12-M MTBUR) versus TARGET, in UNIT KM (axis from 0 to 25,000), from Feb-05 to Feb-13]

MTBFMONTREAL.CA
B10
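B10 is the operating time by which 10% of a sample of devices
will have failed.

[Figure: two reliability functions R(t), labelled 1 and 2; the time at which each curve has dropped by 10% from R(t) = 1 marks its B10, and B10 of curve 1 < B10 of curve 2]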

MTBFMONTREAL.CA
BX – TOOL FOR COMPARISON
Depending on the X value, the results might be different. In the
example below, for a small value of X, the component represented by
the red reliability function (2) provides a better value (B2 > B1). For a
large value of X, component 1 (black reliability function on the
graph) provides better results (B1 > B2).
Even though, from a mathematical point of view, X can take any value
within (0, 1), typical values for X are 5%, 10% or even (rarely) 20%.
[Figure: two crossing reliability functions R(t), black (1) and red (2), with the B-lives of both components marked for two levels, X% and Y%]
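
The reversal in ranking can be reproduced numerically. The sketch below assumes, purely for illustration, that both components follow a Weibull reliability function R(t) = exp(-(t/eta)^beta); the shape and scale values are hypothetical and were chosen so that the two curves cross. The B-life for a fraction X is the time at which R(t) = 1 - X.

```python
import math


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t / eta) ** beta); an assumed model, not taken from the slides."""
    return math.exp(-((t / eta) ** beta))


def b_life(x, beta, eta):
    """Time by which a fraction x of the units has failed, i.e. R(B_x) = 1 - x."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    return eta * (-math.log(1.0 - x)) ** (1.0 / beta)


# Hypothetical components: 1 has a constant failure rate, 2 wears out.
COMPONENT_1 = {"beta": 1.0, "eta": 1000.0}
COMPONENT_2 = {"beta": 3.0, "eta": 600.0}

if __name__ == "__main__":
    for x in (0.05, 0.10, 0.20, 0.50):
        b1 = b_life(x, **COMPONENT_1)
        b2 = b_life(x, **COMPONENT_2)
        better = "2" if b2 > b1 else "1"
        print(f"B{int(x * 100):>2}: comp 1 = {b1:7.1f}, comp 2 = {b2:7.1f}, better: {better}")
```

With these assumed parameters, component 2 is better at the typical levels (B5, B10, B20) and component 1 becomes better at B50, which is why the X value must be stated whenever B-lives are compared.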
INDU 6391
AVAILABILITY

147
MTBFMONTREAL.CA
MUT / MDT
Mean up time (MUT): is a measure of the mean time a machine,
typically a computer, has been working and available. Uptime is the
opposite of downtime. It is often used as a measure of computer
operating system reliability or stability, in that this time represents the
time a computer can be left unattended without crashing, or needing
to be rebooted for administrative or maintenance purposes.

Mean down time (MDT): is a measure of the mean time when a system
is unavailable. Downtime or outage duration refers to a period of
time that a system fails to provide or perform its primary function.
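$$\text{MUT} = \frac{\text{Cumulated running time after 1st failure}}{\text{Number of intervals between 2 consecutive failures}}$$

$$\text{MDT} = \frac{\text{Cumulated down-time}}{\text{Number of failures after the 1st one}}$$

$$\text{MTBF} = \text{MUT} + \text{MDT}$$

$$\text{Availability} = \frac{\text{MUT}}{\text{MUT} + \text{MDT}}$$

Reference: WIKIPEDIA MTBF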
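
A small numerical illustration of these definitions follows. It takes availability as MUT / (MUT + MDT), the usual steady-state form, and uses a hypothetical log of alternating up and down durations (in hours) recorded after the first failure; names and numbers are assumptions for the example.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    if not values:
        raise ValueError("at least one interval is required")
    return sum(values) / len(values)


def availability_metrics(up_times, down_times):
    """Return (MUT, MDT, MTBF, availability) from up and down durations."""
    mut = mean(up_times)
    mdt = mean(down_times)
    mtbf = mut + mdt
    return mut, mdt, mtbf, mut / mtbf


# Hypothetical field log, in hours.
up_times = [950.0, 1020.0, 1030.0]
down_times = [50.0, 30.0, 20.0]

mut, mdt, mtbf, availability = availability_metrics(up_times, down_times)

if __name__ == "__main__":
    print(f"MUT  = {mut:.1f} h")
    print(f"MDT  = {mdt:.1f} h")
    print(f"MTBF = {mtbf:.1f} h")
    print(f"Availability = {availability:.4f}")
```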
INDU 6391
EXAMPLE IN CLASS

149
MTBFMONTREAL.CA
MAINTAINABILITY

The following items are typical:
❑ diagnosis time
❑ part procurement time
❑ teardown time
❑ rebuild time
❑ verification time

MTBFMONTREAL.CA
MAINTAINABILITY
In engineering, maintainability is the ease with which a product can be
maintained in order to:
❑ isolate defects or their cause,
❑ correct defects or their cause,
❑ repair or replace faulty or worn-out components without having to
replace still working parts,
❑ prevent unexpected breakdowns,
❑ maximize a product's useful life,
❑ maximize efficiency, reliability, and safety,
❑ meet new requirements,
❑ make future maintenance easier, or
❑ cope with a changed environment
In telecommunication and several other engineering fields, the term
maintainability has the following meanings:
❑ A characteristic of design and installation, expressed as the
probability that an item will be retained in or restored to a specified
condition within a given period of time, when the maintenance is
performed in accordance with prescribed procedures and resources.
❑ The ease with which maintenance of a functional unit can be
performed in accordance with prescribed requirements.

MTBFMONTREAL.CA
FAILURE RATE
Failure rate is the frequency with which an engineered system or
component fails, expressed in failures per unit of time. It is often
denoted by the Greek letter λ (lambda) and is widely used in
reliability engineering.

Depending on the timeframe considered, one can have:
- Instantaneous failure rate (time interval -> 0)
- Daily failure rate
- Annualized failure rate
- 75K miles failure rate, etc.
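
As a rough illustration, an observed failure rate can be estimated as the number of failures divided by the cumulated operating time, and then scaled to a chosen timeframe. The sketch assumes a constant failure rate over the timeframe (a simplification), and the field numbers are hypothetical.

```python
def failure_rate(failures, cumulated_time):
    """Observed failure rate: failures per unit of cumulated operating time."""
    if cumulated_time <= 0:
        raise ValueError("cumulated_time must be positive")
    return failures / cumulated_time


def expected_failures(rate, timeframe):
    """Expected failures per unit over a timeframe, assuming a constant rate."""
    return rate * timeframe


# Hypothetical field data: 4 failures over 200,000 cumulated operating hours.
lam = failure_rate(4, 200_000)                 # failures per hour
daily = expected_failures(lam, 24)             # per unit per day of continuous use
annualized = expected_failures(lam, 8_760)     # per unit per year of continuous use

if __name__ == "__main__":
    print(f"lambda     = {lam:.2e} failures/h")
    print(f"daily      = {daily:.2e} failures per unit-day")
    print(f"annualized = {annualized:.4f} failures per unit-year")
    print(f"1 / lambda = {1 / lam:.0f} h")
```

Under the constant-rate assumption, 1/λ is the mean time between the counted events, which links this metric back to the MTBX definitions.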

MTBFMONTREAL.CA
SOFTWARE
Software Reliability is the probability of failure-free software
operation for a specified period of time in a specified environment.
Software Reliability is also an important factor affecting system
reliability. It differs from hardware reliability in that it reflects the
design perfection, rather than manufacturing perfection. The high
complexity of software is the major contributing factor of Software
Reliability problems. Software Reliability is not a function of time,
although researchers have come up with models relating the two. The
modeling technique for Software Reliability is maturing,
but before using the technique, we must carefully select the
appropriate model that best suits our case. Measurement in
software is still in its infancy. No good quantitative methods have
been developed to represent Software Reliability without excessive
limitations. Various approaches can be used to improve the reliability
of software; however, it is hard to balance development time and
budget with software reliability.
Reference: https://users.ece.cmu.edu/~koopman/des_s99/sw_reliability/

MTBFMONTREAL.CA
SOFTWARE
A partial list of the distinct characteristics of software compared to
hardware is given below [Keene94]:
Failure cause: Software defects are mainly design defects.
Wear-out: Software does not have an energy-related wear-out phase. Errors
can occur without warning.
Repairable system concept: Periodic restarts can help fix software
problems.
Time dependency and life cycle: Software reliability is not a function of
operational time.
Environmental factors: Do not affect software reliability, except that they
might affect program inputs.
Reliability prediction: Software reliability cannot be predicted from any
physical basis, since it depends completely on human factors in design.
Redundancy: Cannot improve software reliability if identical software
components are used.
Interfaces: Software interfaces are purely conceptual rather than visual.
Failure rate motivators: Usually not predictable from analyses of
separate statements.
Built with standard components: Well-understood and extensively tested
standard parts help improve maintainability and reliability. But in the
software industry, we have not observed this trend. Code reuse has been
around for some time, but to a very limited extent. Strictly speaking, there
are no standard parts for software, except some standardized logic
structures.

MTBFMONTREAL.CA
SOFTWARE
Software reliability, however, does not show the same characteristics
as hardware. A possible curve is shown in Figure 2 if we project
software reliability on the same axes [RAC96]. There are two major
differences between hardware and software curves. One difference is that
in the last phase, software does not have an increasing failure rate
as hardware does. In this phase, software is approaching obsolescence;
there is no motivation for any upgrades or changes to the software.
Therefore, the failure rate will not change. The second difference is that in
the useful-life phase, software will experience a drastic increase in failure
rate each time an upgrade is made. The failure rate levels off gradually,
partly because of the defects found and fixed after the upgrades.
The upgrades in Figure 2 imply feature upgrades, not upgrades for
reliability. For feature upgrades, the complexity of software is likely to be
increased, since the functionality of software is enhanced. Even bug fixes
may be a reason for more software failures, if the bug fix induces other
defects into software. For reliability upgrades, it is possible to incur a drop
in software failure rate, if the goal of the upgrade is enhancing software
reliability, such as a redesign or reimplementation of some modules using
better engineering approaches, such as the clean-room method.

MTBFMONTREAL.CA
SOFTWARE
Software Reliability Models

A proliferation of software reliability models has emerged as
people try to understand the characteristics of how and why software
fails, and try to quantify software reliability. Over 200 models have
been developed since the early 1970s, but how to quantify software
reliability still remains largely unsolved. Interested readers may refer
to [RAC96], [Lyu95]. As many models as there are, and with many more
emerging, none of the models can capture a satisfying amount of the
complexity of software; constraints and assumptions have to be made
for the quantifying process. Therefore, there is no single model that
can be used in all situations. No model is complete or even
representative. One model may work well for a certain set of
software, but may be completely off track for other kinds of
problems.
Most software models contain the following parts: assumptions,
factors, and a mathematical function that relates the reliability to
the factors. The mathematical function is usually higher order
exponential or logarithmic.
Software modeling techniques can be divided into two subcategories:
prediction modeling and estimation modeling [RAC96]. Both kinds of
modeling techniques are based on observing and accumulating
failure data and analyzing it with statistical inference. The major
differences between the two kinds of models are shown in Table 1.

MTBFMONTREAL.CA
SOFTWARE

Table 1. Difference between software reliability prediction models and
software reliability estimation models
Representative prediction models include Musa's Execution Time Model, Putnam's
Model, and Rome Laboratory models TR-92-51 and TR-92-15, etc. Using
prediction models, software reliability can be predicted early in the
development phase and enhancements can be initiated to improve the reliability.
Representative estimation models include exponential distribution models, the
Weibull distribution model, Thompson and Chelson's model, etc. Exponential
models and the Weibull distribution model are usually referred to as classical fault
count/fault rate estimation models, while Thompson and Chelson's model belongs
to the Bayesian fault rate estimation models.
First, the field has matured to the point that software models can be applied in
practical situations and give meaningful results; second, there is no one
model that is best in all situations [Lyu95]. Because of the complexity of
software, any model has to have extra assumptions. Only limited factors can be
taken into consideration. Most software reliability models ignore the software
development process and focus on the results -- the observed faults and/or
failures. By doing so, complexity is reduced and abstraction is achieved;
however, the models tend to specialize and apply to only a portion of the
situations and a certain class of problems. We have to carefully choose the
right model that suits our specific case. Furthermore, the modeling results cannot
be blindly believed and applied.

MTBFMONTREAL.CA
SOFTWARE
Software Reliability Metrics

Measurement is commonplace in other engineering fields, but not in software
engineering. Though frustrating, the quest to quantify software reliability has never
ceased. Until now, we still have no good way of measuring software reliability.
Measuring software reliability remains a difficult problem because we don't have a
good understanding of the nature of software. There is no clear definition of what
aspects are related to software reliability. We cannot find a suitable way to measure
software reliability, or most of the aspects related to it. Even the
most obvious product metrics, such as software size, have no uniform definition.
It is tempting to measure something related to reliability to reflect the characteristics, if
we cannot measure reliability directly. The current practices of software reliability
measurement can be divided into four categories: [RAC96]
Product metrics
Software size is thought to be reflective of complexity, development effort and
reliability. Lines Of Code (LOC), or LOC in thousands (KLOC), is an intuitive initial
approach to measuring software size. But there is no standard way of counting.
Typically, source code is used (SLOC, KSLOC) and comments and other non-executable
statements are not counted. This method cannot faithfully compare software not written
in the same language. The advent of new technologies for code reuse and code
generation also casts doubt on this simple method.
The function point metric is a method of measuring the functionality of a proposed software
development based upon a count of inputs, outputs, master files, inquiries, and
interfaces. The method can be used to estimate the size of a software system as soon as
these functions can be identified. It is a measure of the functional complexity of the
program. It measures the functionality delivered to the user and is independent of the
programming language. It is used primarily for business systems; it is not proven in
scientific or real-time applications.
Complexity is directly related to software reliability, so representing complexity is
important. Complexity-oriented metrics determine the complexity of a
program's control structure by simplifying the code into a graphical representation.
A representative metric is McCabe's Complexity Metric.
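
As a small illustration of a complexity-oriented metric, McCabe's cyclomatic complexity of a control-flow graph is V(G) = E - N + 2P, where E is the number of edges, N the number of nodes and P the number of connected components. The sketch below computes it for a graph given as an adjacency dictionary; the graph encoding and the node names are assumptions made for this example.

```python
def cyclomatic_complexity(cfg, components=1):
    """McCabe's V(G) = E - N + 2P for a control-flow graph.

    cfg maps each node to the list of nodes it can branch to.
    """
    nodes = set(cfg)
    for targets in cfg.values():
        nodes.update(targets)
    edges = sum(len(targets) for targets in cfg.values())
    return edges - len(nodes) + 2 * components


# Straight-line code: entry -> statement -> exit
straight_line = {"entry": ["stmt"], "stmt": ["exit"]}

# A single if/else: the condition branches to two blocks that join again.
if_else = {"cond": ["then", "else"], "then": ["join"], "else": ["join"]}

# A while loop: the condition either enters the body or leaves the loop.
while_loop = {"entry": ["cond"], "cond": ["body", "exit"], "body": ["cond"]}

if __name__ == "__main__":
    print("straight line:", cyclomatic_complexity(straight_line))
    print("if/else      :", cyclomatic_complexity(if_else))
    print("while loop   :", cyclomatic_complexity(while_loop))
```

Each additional decision point adds one to the value, which matches the intuition that more branches mean more paths to test.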

MTBFMONTREAL.CA
SOFTWARE
Test coverage metrics are a way of estimating fault and reliability by

performing tests on software products, based on the assumption that software
reliability is a function of the portion of software that has been successfully
verified or tested. Detailed discussion about various software testing methods
can be found in topic Software Testing.

Project management metrics
Researchers have realized that good management can result in better products.
Research has demonstrated that a relationship exists between the development
process and the ability to complete projects on time and within the desired
quality objectives. Costs increase when developers use inadequate processes.
Higher reliability can be achieved by using better development process, risk
management process, configuration management process, etc.
Process metrics
Based on the assumption that the quality of the product is a direct function of the
process, process metrics can be used to estimate, monitor and improve the
reliability and quality of software. ISO-9000 certification, or "quality
management standards", is the generic reference for a family of standards
developed by the International Standards Organization(ISO).
Fault and failure metrics
The goal of collecting fault and failure metrics is to be able to determine when
the software is approaching failure-free execution. Minimally, both the number
of faults found during testing (i.e., before delivery) and the failures (or other
problems) reported by users after delivery are collected, summarized and
analyzed to achieve this goal. The test strategy strongly influences the effectiveness of fault metrics: if the testing scenarios do not cover the full functionality of the software, the software may pass all tests and yet be prone to failure once delivered. Usually, failure metrics are based upon customer
information regarding failures found after release of the software. The failure
data collected is therefore used to calculate failure density, Mean Time Between
Failures (MTBF) or other parameters to measure or predict software reliability.

MTBFMONTREAL.CA
SOFTWARE
Software Reliability Improvement Techniques
Good engineering methods can largely improve software reliability.
Before the deployment of software products, testing, verification and validation are necessary steps. Software testing is heavily used to trigger, locate and remove software defects. Software testing is still in its infancy; testing is crafted to suit the specific needs of each software development project in an ad hoc manner. Various analysis tools, such as trend analysis, fault-tree analysis, Orthogonal Defect Classification and formal methods, can also be used to minimize the possibility of defect occurrence after release and therefore improve software reliability.
After deployment of the software product, field data can be gathered and analyzed to study the behavior of software defects. Fault tolerance and fault/failure forecasting techniques provide helpful techniques and guiding rules to minimize fault occurrence or the impact of a fault on the system.
Conclusions
Software reliability is a key part of software quality. The study of software reliability can be categorized into three parts: modeling, measurement and improvement.
Software reliability modeling has matured to the point that meaningful results can be obtained by applying suitable models to the problem. Many models exist, but no single model can capture a sufficient amount of the software characteristics. Assumptions and abstractions must be made to simplify the problem. No single model is universal to all situations.
Software reliability measurement is naive. Measurement is far less commonplace in software than in other engineering fields. "How good is the software, quantitatively?" As simple as the question is, there is still no good answer. Software reliability cannot be directly measured, so other related factors are measured to estimate software reliability and compare it among products. The development process and the faults and failures found are all factors related to software reliability.
Software reliability improvement is hard. The difficulty of the problem stems from insufficient understanding of software reliability and, in general, of the characteristics of software. So far there is no good way to conquer the complexity problem of software. Complete testing of a moderately complex software module is infeasible. A defect-free software product cannot be assured. Realistic constraints of time and budget severely limit the effort put into software reliability improvement.
As more and more software creeps into embedded systems, we must make sure it does not embed disasters. If not considered carefully, software reliability can be the reliability bottleneck of the whole system. Ensuring software reliability is no easy task. As hard as the problem is, promising progress is still being made toward more reliable software. More standard components and better processes are being introduced in the software engineering field.

FTA AND FMEA TOOLS

MTBFMONTREAL.CA
FMEA: BOTTOM-UP APPROACH
Failure Modes and Effects Analysis (FMEA) is a systematic technique for identifying and preventing product and process problems before they occur.
With FMEA, you explore the potential failure modes of the lowest level (installation/component/piece-part) and identify the potential effects of each failure up to the system level. It generally addresses the effect propagation of a single failure up to the system level.
[Figure: effect propagation from the low level up to the system level]
Reference: WEB

MTBFMONTREAL.CA
SEVERITY RANKING
An industry-dedicated ranking is generally used.
The table must be common for all the items in the project.
If you do not have any reference, you can use one of the below:


MTBFMONTREAL.CA
PROBABILITY RANKING
An industry-dedicated ranking is generally used.
The table must be common for all the items in the project.
If you do not have any reference, you can use one of the below:


MTBFMONTREAL.CA
DETECTABILITY RANKING
For INDU 6391, DET = 1 for all elements (for simplicity)


MTBFMONTREAL.CA
FTA: TOP-DOWN APPROACH
Fault Tree Analysis (FTA) is a top-down approach to failure analysis.
You can use an FTA to identify high-level (system) failures and to eliminate the cause of the failure. An FTA is a systematic, deductive method for defining a single specific undesirable event and determining all possible failures and combinations of failures that could cause the event in question to occur.
[Figure: top-down decomposition from the system level to the component level]

MTBFMONTREAL.CA
QUANTITATIVE VS QUALITATIVE IN
RELIABILITY (AND SAFETY)
Qualitative
❑ No numerical probability
❑ Used in early phases of the project
❑ Allows early identification of the top risk elements
❑ Some safety requirements checked (e.g. no single event leads to top event)
❑ Early link between FMEA and FTA
❑ FMEA probability is expressed on a qualitative scale (e.g. low, medium, high; 1 to 5; 1 to 10; etc.)
❑ FTA has no probability number associated
Quantitative
❑ Numerical probability
❑ Used in detailed phases of the project
❑ Allows computation of the predicted reliability of the design and its breakdown by major sub-systems (e.g. power supply, motor, etc.)
❑ Numerical validation of the safety requirements (e.g. top event probability is less than 2E-9/h)
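As an illustration of what a quantitative FTA adds, the sketch below propagates basic event probabilities through AND and OR gates to a top event. It is a minimal example that assumes independent basic events; the tree shape and the probability values are hypothetical and do not come from the course material.

```python
from math import prod

def and_gate(probabilities):
    """AND gate: the output event needs every input event (assumed independent)."""
    return prod(probabilities)

def or_gate(probabilities):
    """OR gate: the output event needs at least one input event (assumed independent)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

# Hypothetical tree, for illustration only: TOP = (A AND B) OR C
p_a, p_b, p_c = 1e-5, 1e-5, 1e-9   # assumed per-hour basic event probabilities
p_top = or_gate([and_gate([p_a, p_b]), p_c])

requirement = 2e-9  # example threshold quoted in the comparison above
print(f"top event probability: {p_top:.2e} per hour")
print("requirement met" if p_top < requirement else "requirement not met")
```

In a qualitative FTA the same tree would be drawn without numbers and used only to check structural requirements, such as the absence of a single event leading to the top event.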

MTBFMONTREAL.CA
RELIABILITY (AND SAFETY): FMEA
EXAMPLE
Qualitative

Quantitative

MTBFMONTREAL.CA
RELIABILITY (AND SAFETY): FTA EXAMPLE
Qualitative

Quantitative

MTBFMONTREAL.CA
STATISTICS FOR RELIABILITY
❑ Exponential
❑ Weibull
❑ Normal
❑ Log-Normal

MTBFMONTREAL.CA
MATHEMATICAL REPRESENTATION OF
RELIABILITY
The graph below shows two curves:
- C1: a step-down curve that follows the evolution of the operating fraction (in percentage) of a finite sample operating under given conditions. Both the operation (WHAT function the units should perform) and the operating conditions are defined. Each time one unit out of n fails, the operating fraction reduces by 1/n.
- C2: a continuous curve that APPROXIMATES the step-down curve above.
In theory, the larger the sample size, the closer the step-down curve C1 gets to a continuous form. The better the mathematical model chosen for C2, the closer C2 gets to how the products perform in the field.
REMEMBER: a mathematical model for C2 is only as good as the selection criteria are.
C2 can be mathematically modeled by a continuous function $R(t \mid \alpha_1, \alpha_2, \ldots, \alpha_s)$, where $\alpha_1, \ldots, \alpha_s$ are the model's parameters.

MTBFMONTREAL.CA
RELIABILITY
Independent of the chosen model, some important functions are generally used in reliability*:
❑ reliability function $R(t) = P(T \geq t)$: the probability that T will take a value higher than or equal to $t$
❑ unreliability (cumulative distribution) $F(t) = 1 - R(t)$: the probability that T will take a value less than or equal to $t$
❑ probability density function $f(t) = -\frac{dR(t)}{dt}$: a function that describes the relative likelihood for this random variable to take on a given value
❑ hazard rate $h(t)$ and failure rate $\lambda(t)$, with $h(t) = \lambda(t) = \frac{f(t)}{R(t)}$: the frequency with which an engineered system or component fails, expressed in failures per unit of time (many papers and references use these terms interchangeably; the hazard rate is the limit of the instantaneous failure rate given no failures up to time t)
❑ $MTTF = \int_0^{\infty} R(t)\,dt$
The graphic form of each of the above varies depending on the model and the parameter values.
* As defined in a previous lecture, the value of the time to failure $T$ cannot be known. The time to failure $T$ is a random variable.
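The relations between these functions can be checked numerically. The sketch below uses a Weibull reliability function purely as an example model; the parameter values are hypothetical, and the MTTF is obtained by a simple trapezoidal integration of R(t) rather than by a closed form.

```python
import math

def reliability(t, beta, eta):
    """R(t) for a Weibull model (chosen here only as an example)."""
    return math.exp(-((t / eta) ** beta))

def unreliability(t, beta, eta):
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(t, beta, eta)

def pdf(t, beta, eta):
    """f(t) = -dR/dt, written in closed form for the Weibull model."""
    return (beta / eta) * (t / eta) ** (beta - 1.0) * reliability(t, beta, eta)

def hazard(t, beta, eta):
    """h(t) = f(t) / R(t)."""
    return pdf(t, beta, eta) / reliability(t, beta, eta)

def mttf_numeric(beta, eta, steps=200_000):
    """Trapezoidal estimate of the integral of R(t) from 0 to 10 * eta."""
    upper = 10.0 * eta  # R(t) is negligible beyond this point for beta >= 1
    dt = upper / steps
    total = 0.5 * (reliability(0.0, beta, eta) + reliability(upper, beta, eta))
    for i in range(1, steps):
        total += reliability(i * dt, beta, eta)
    return total * dt

beta, eta = 2.0, 1000.0  # hypothetical parameters
t = 500.0
print(f"R = {reliability(t, beta, eta):.4f}, F = {unreliability(t, beta, eta):.4f}")
print(f"f = {pdf(t, beta, eta):.3e}, h = {hazard(t, beta, eta):.3e}")
print(f"MTTF (numeric) = {mttf_numeric(beta, eta):.1f}")
```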

MTBFMONTREAL.CA
RELIABILITY PREFERRED MODELS
Choosing the right model to mathematically represent the reliability evolution of a function within a specific project is critical for the program. The selection of the model can largely impact:
❑ target definition
❑ test plan set-up
❑ test results interpretation
❑ trade-offs
❑ each phase exit decision (go/no-go decision)
INDU 6391 presents the following models:
❑ exponential: one parameter 𝜆
❑ Weibull: two parameters 𝛽 and 𝜂
❑ normal: two parameters 𝜇 and 𝜎
❑ log-normal: two parameters 𝜇 and 𝜎
Defining a reliability law is equivalent to defining the parameters of the chosen model.
Note that the above 4 models are generally sufficient to characterize most products and their related failure modes. Still, some specific cases (specific failure modes, new technologies, etc.) might require the use of other models.

MTBFMONTREAL.CA
CHOOSING THE RIGHT MODEL
The reliability model is obviously related to the product. Field experience has demonstrated that models are generally related to failure mode mechanisms and are transferable from one product to another. The graph below highlights the most usual associations of failure mode and mathematical model:
❑ Exponential: a commonly used distribution in reliability engineering. Mathematically, it is a fairly simple distribution, which often leads to its use in inappropriate situations. It is used to model the behavior of units that have a constant failure rate (units that do not degrade with time or wear out). There is no dominant failure mechanism and random failures are expected.
❑ Weibull: one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions, based on the value of the shape parameter 𝛽. It can characterize:
✓ 0 < 𝛽 < 1: early life of the product
✓ 𝛽 = 1: random failures (equals the exponential distribution)
✓ 𝛽 > 1: failure modes induced by wear-out
❑ Normal: also known as the Gaussian distribution, it is the most widely used general-purpose distribution. It is for this reason that it is included among the lifetime distributions commonly used for reliability and life data analysis. Some argue that the normal distribution is inappropriate for modeling lifetime data because the left-hand limit of the distribution extends to negative infinity. This could conceivably result in modeling negative times-to-failure. However, provided that the distribution in question has a relatively high mean and a relatively small standard deviation, the issue of negative failure times should not present itself as a problem. The normal distribution has been shown to be useful for modeling the lifetimes of consumable items, such as printer toner cartridges.

MTBFMONTREAL.CA
❑ Log-Normal: commonly used to model the lives of units whose failure modes are of a fatigue-stress nature. Since this includes most, if not all, mechanical systems, the lognormal distribution can have widespread application. Consequently, the lognormal distribution is a good companion to the Weibull distribution when attempting to model these types of units. As may be surmised by the name, the lognormal distribution has certain similarities to the normal distribution. A random variable is lognormally distributed if the logarithm of the random variable is normally distributed.
The graph below visualizes the relation between the failure mode and the mathematical model:
[Figure: tree of component failure modes and their associated models. Branches recoverable from the slide: random failures (software and external events, systematic failures) and degradation/ageing (wear-out, fatigue, chemical corrosion), driven by mechanical force, vibration, heat, and variable or constant load.]

MTBFMONTREAL.CA
For cases not covered by the previous page, or when the association to the model proposed on the previous page is in question, one should consider some more extensive work before making the choice. Other means to select the law might be:
❑ Internet search
❑ manuals / literature
❑ vendor test results
❑ PoF
❑ expert opinion, etc.
Sometimes it is impossible to decide upfront on the model; in such cases, a test needs to be performed and the model is decided based on the test results.
Reminder: the use of a mathematical model for the reliability of a part/component/LRU/system/etc. requires a fixed function (and a defined failure mode) as well as fixed operating conditions.
A change of the function might imply a change of the model, or of the parameters if the same model corresponds to the new function.
A change of the operating conditions generally impacts the value of the model's parameters, especially the scale one (the time-related one).

MTBFMONTREAL.CA
USE OF THE MATHEMATICAL MODELS
DURING THE PROGRAM
Approximating the reliability evolution by a parametric mathematical model is useful to:
❑ visualize the reliability evolution over time
❑ predict the evolution of a system
❑ provide input to other domains (e.g. safety)
During the DESIGN phase, reliability targets can be translated into minimum requirements on the model's parameters. In other words, if the design target is a maximum mission failure probability $p = 10^{-5}$ for a failure mode of the system that is modeled by a Weibull distribution, this can be translated into a minimum parameter requirement as presented in the graph below:
[Figure: boundary curve of η (vertical axis, 0 to 1×10^6) versus β (horizontal axis, 1.4 to 3); the area above the curve is marked OK and the area below is marked reject. Boundary values H listed on the slide:]
β      η
1.2    2128771.4
1.4    1130483.9
1.6    696525.2
1.7    475370
1.9    349129
2.1    270738.5
2.3    218819.1
2.5    182645
2.6    156395.3
2.8    136704
3      121519.9
Any product whose specific failure mode is characterized by a Weibull with parameters 𝛽 and 𝜂 below the curve is rejected. Only products with model parameters above the curve meet the design target.
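The translation from a mission failure probability target to a minimum scale parameter can be sketched as follows. This is only an illustration under stated assumptions: the mission is assumed to start with a new unit, and the mission duration is a hypothetical value, so the output is not expected to reproduce the boundary values of the slide, whose mission definition is not given here.

```python
import math

def min_eta(beta, mission_time, p_max):
    """Smallest Weibull scale eta such that 1 - R(mission_time) <= p_max."""
    # 1 - exp(-(t/eta)^beta) <= p   <=>   eta >= t / (-ln(1 - p))^(1/beta)
    return mission_time / (-math.log1p(-p_max)) ** (1.0 / beta)

mission_time = 10.0   # hours, hypothetical mission duration
p_max = 1e-5          # maximum mission failure probability from the text

for beta in (1.2, 1.6, 2.0, 2.5, 3.0):
    print(f"beta = {beta:.1f} -> eta >= {min_eta(beta, mission_time, p_max):,.0f} h")
```

For each 𝛽, any 𝜂 above the printed value meets the target, which matches the reading of the graph: the acceptable region lies above the boundary curve.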
MTBFMONTREAL.CA
USE OF THE MATHEMATICAL MODELS
DURING THE PROGRAM
During the VALIDATION phase, the mathematical models are used to:
❑ build an optimal test plan
❑ predict the time and the cost of the test
❑ convert a test plan to an accelerated test plan
❑ validate the compliance of the test results against targets
During the OPERATION phase, the mathematical models are used to:
❑ validate the model's assumptions
❑ validate field performance of the product against targets

MTBFMONTREAL.CA
REMINDER
Many people associate the engineering of reliability exclusively with dedicated statistics. In order to apply these statistics, it is extremely important to:
❑ define the function
❑ define the operating conditions
❑ choose the right model
Statistics model failures; statistical analysis does not improve reliability. It helps in setting targets and measuring the evolution throughout the program.
The only means to improve reliability is to act on one or several of the following product-related aspects:
❑ design
❑ components quality
❑ manufacturing process
❑ production screening
❑ transport conditions
❑ operating conditions
❑ storage
❑ maintenance
All the previous lectures gave you the means to achieve the most reliable design before testing. Reliability modeling and statistics now serve to confirm the compliance of the design with the requirements.
MTBFMONTREAL.CA
EXPONENTIAL DISTRIBUTION
$R(t) = \exp(-\lambda t)$

MTBFMONTREAL.CA
EXPONENTIAL
The most popular law in industry, due to:
❑ ease of use (one parameter)
❑ maintenance (linearizes the failure rate)
❑ representativeness for very complex systems
[Figure: three panels for the exponential distribution, each plotted against t: unreliability F(t) = 1 − exp(−λt), probability density function (PDF) f(t) = λ exp(−λt), and failure rate]

MTBFMONTREAL.CA
EFFECT OF LAMBDA ON THE

Reference: RELIAWIKI.COM

MTBFMONTREAL.CA
MEMORYLESS EFFECT
The exponential distribution is the only continuous distribution satisfying the memoryless property. With
$P(T \geq t) = e^{-\lambda t}$
the probability of surviving a further time $t$ after having survived to time $s$ is:
$P(T \geq t+s \mid T \geq s) = \frac{P(T \geq t+s \,\cap\, T \geq s)}{P(T \geq s)} = \frac{P(T \geq t+s)}{P(T \geq s)}$
$P(T \geq t+s \mid T \geq s) = \frac{e^{-\lambda (t+s)}}{e^{-\lambda s}} = \frac{e^{-\lambda t - \lambda s}}{e^{-\lambda s}} = \frac{e^{-\lambda s} \cdot e^{-\lambda t}}{e^{-\lambda s}}$
$P(T \geq t+s \mid T \geq s) = e^{-\lambda t} = P(T \geq t)$
This result indicates that the conditional reliability function for the lifetime of a component that has survived to time s is identical to that of a new component. This is the so-called "used-as-good-as-new" assumption.
The lifetime of a fuse in an electrical distribution system may be assumed to have an exponential distribution. It will fail when there is a power surge causing the fuse to burn out. Assuming that the fuse does not undergo any degradation over time and that power surges that cause failure are likely to occur equally over time, then use of the exponential lifetime distribution is appropriate, and a used fuse that has not failed is as good as new.
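The memoryless property can be checked numerically by comparing the conditional reliability of a used unit with the reliability of a new one. The sketch below does this for an exponential model and, as a contrast, for a Weibull wear-out model; the failure rate and the Weibull parameters are hypothetical values chosen for illustration.

```python
import math

def conditional_reliability(reliability, t, s):
    """P(T >= t + s | T >= s) = R(t + s) / R(s)."""
    return reliability(t + s) / reliability(s)

LAMBDA = 1e-3  # hypothetical constant failure rate, per hour

def r_exponential(t):
    return math.exp(-LAMBDA * t)

def r_weibull(t, beta=2.0, eta=1000.0):
    # hypothetical wear-out model, used only as a contrast
    return math.exp(-((t / eta) ** beta))

t, s = 100.0, 500.0
print("exponential: new", r_exponential(t),
      "used", conditional_reliability(r_exponential, t, s))
print("Weibull:     new", r_weibull(t),
      "used", conditional_reliability(r_weibull, t, s))
```

For the exponential model the two values coincide; for the wear-out model the used unit is less reliable over the same additional duration.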
MTBFMONTREAL.CA
MEMORYLESS EFFECT
Implications:
MTTF = MTBF (replacement is as good as new)
A time interval Δ𝑡 has the same impact (in percentage) on the reliability, independent of the value of the starting time:
$R(t) - R(t + \Delta t) = e^{-\lambda t} - e^{-\lambda (t + \Delta t)}$
$= e^{-\lambda t} - e^{-\lambda t} \cdot e^{-\lambda \Delta t} = e^{-\lambda t} \left(1 - e^{-\lambda \Delta t}\right)$
If, for example, during Δ𝑡 = 100 operating hours a new product loses 50% of its reliability, R(t = 0 h + Δ𝑡 = 100 h) = 0.5, then after 200 hours it will lose 50% of the remaining value:
R(200h) = R(100h) * (% decrease due to functioning Δ𝑡 = 100) = 0.5 * 0.5 = 0.25.
Under the same assumptions, if a product reaches 75% reliability after 41.50 h, R(41.50h) = 0.75, then after 141.50 hours (100 more operating hours), R(41.50 + 100) will reduce to half of R(41.50):
R(141.50) = R(41.50) * 0.5 = 0.75 * 0.5 = 0.375
Continuing the logic, based on the memoryless effect, after 100 more operating hours the reliability value will again be halved:
R(241.50) = R(141.50) * 0.5 = 0.375 * 0.5 = 0.1875
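The halving pattern can be reproduced with a few lines of code. The sketch below derives the failure rate from the assumption used in the example (reliability drops by 50% over 100 operating hours) and evaluates R(t) at the successive times; the variable names are illustrative.

```python
import math

HALF_LIFE_H = 100.0                     # R drops by 50% every 100 operating hours
LAMBDA = math.log(2.0) / HALF_LIFE_H    # from exp(-lambda * 100) = 0.5

def reliability(t):
    return math.exp(-LAMBDA * t)

t75 = -math.log(0.75) / LAMBDA          # time at which R = 0.75

for t in (100.0, 200.0, t75, t75 + 100.0, t75 + 200.0):
    print(f"R({t:7.2f} h) = {reliability(t):.4f}")
```

Whatever the starting time, adding 100 operating hours multiplies the reliability by the same factor of 0.5.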
MTBFMONTREAL.CA
EXPONENTIAL DISTRIBUTION FOR
T=MTBF
Knowing that:
$\lambda = \frac{1}{MTTF}$
Let's compute R(t = MTTF):
$R(t = MTTF) = e^{-MTTF \cdot \lambda} = e^{-MTTF \cdot \frac{1}{MTTF}} = e^{-1} \approx 0.368$
Based on the definition applied to t = MTTF, this reliability is the probability that a system performs its intended function over a duration equal to the MTTF (under given conditions).
The above equation means that 63.21% of the units have FAILED by the time the MTTF is reached. For example, a company owns 100 computers with an MTTF of 10,000 hours. This translates into approximately 63 computers having FAILED before 10,000 operating hours.
As the exponential law has MTBF = MTTF, the above is also applicable to MTBF (considering that a repair brings the component to a state equivalent to as good as new).
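The computer fleet example can be verified directly. The sketch below evaluates the exponential reliability at t = MTTF and the expected number of failed units, using the fleet size and MTTF quoted in the example; the function name is illustrative.

```python
import math

def reliability_exponential(t, mttf):
    """R(t) = exp(-t / MTTF), since lambda = 1 / MTTF."""
    return math.exp(-t / mttf)

mttf = 10_000.0   # hours, from the example
fleet = 100       # computers, from the example

r_at_mttf = reliability_exponential(mttf, mttf)
expected_failed = fleet * (1.0 - r_at_mttf)

print(f"R(MTTF) = {r_at_mttf:.4f}")
print(f"expected failed units at t = MTTF: {expected_failed:.1f} out of {fleet}")
```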

MTBFMONTREAL.CA
CONSTANT FAILURE HAZARD - USAGE
Much easier to use:
• Simple reliability equation
• Field returns interpretation (total FH and total fails)
• Test setup confidence level
• 100 units operating 1 hour = 1 unit operating 100 hours
• Allows prediction models (generally for electronics)
MTBFMONTREAL.CA
RISK-BASED MAINTENANCE (RBM)
Theoretically, the maintenance task is intended to restore the failure rate to an “as good as new” state. Still in theory, maintenance tasks performed at a correct interval will make the failure rate variations over time approximable by a constant value (see below).

In conjunction with a validated maintenance program, the hypothesis of exponential behavior of a product can be a valid hypothesis.

MTBFMONTREAL.CA
WEIBULL DISTRIBUTION
$MTTF = \eta \cdot \Gamma\left(1 + \frac{1}{\beta}\right)$
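The Weibull MTTF formula can be evaluated with the gamma function from the Python standard library. The sketch below is a minimal example; the parameter values are hypothetical.

```python
import math

def weibull_mttf(beta, eta):
    """MTTF = eta * Gamma(1 + 1/beta) for a two-parameter Weibull."""
    return eta * math.gamma(1.0 + 1.0 / beta)

eta = 1000.0  # hypothetical scale parameter, in hours
for beta in (0.5, 1.0, 2.0, 3.0):
    print(f"beta = {beta:.1f}: MTTF = {weibull_mttf(beta, eta):.1f} h")
```

For 𝛽 = 1 the gamma factor equals 1, so the MTTF equals 𝜂, consistent with the exponential case where MTTF = 1/𝜆.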

Reference: RELIAWIKI.COM

MTBFMONTREAL.CA
BETA PARAMETER AND FAILURE MODE
Field experience associates specific BETA values (or intervals) with specific failure modes. Papers and references exist on the typical BETA value for simple components, which generally have a single major failure mode.

For example, an extract from http://www.barringer1.com/wdbase.htm:

MTBFMONTREAL.CA
WEIBULL DISTRIBUTION PARAMETERS
REPRESENTATION
As observed throughout field experience, in most cases the 𝛽 parameter takes values between 0.4 and 7. The graph below presents all the (𝛽, 𝜂) couples that satisfy the reliability requirement.
Any (𝛽, 𝜂) couple in the green area characterizes a better-than-requirement product. Any (𝛽, 𝜂) couple in the red area characterizes a lower-than-requirement product.
IMPORTANT REMARK: depending on the requirement definition, the shape of the graph can change and the green and red areas can reverse. Always pay attention to the meaning of each (𝛽, 𝜂) couple.
[Figure: boundary curve of η (vertical axis, 0 to 1×10^6) versus β (horizontal axis, 1.4 to 3), with the OK area above the curve and the NOK area below; the boundary values H are the same as those of the DESIGN-phase graph shown earlier]
Reference:

MTBFMONTREAL.CA
NORMAL DISTRIBUTION
The normal distribution, also known as the Gaussian distribution, is the most widely used general-purpose distribution. It is for this reason that it is included among the lifetime distributions commonly used for reliability and life data analysis. Some argue that the normal distribution is inappropriate for modeling lifetime data because the left-hand limit of the distribution extends to negative infinity. This could conceivably result in modeling negative times-to-failure. However, provided that the distribution in question has a relatively high mean and a relatively small standard deviation, the issue of negative failure times should not present itself as a problem. The normal distribution has been shown to be useful for modeling the lifetimes of consumable items, such as printer toner cartridges.

MTBFMONTREAL.CA
NORMAL DISTRIBUTION PARAMETERS
REPRESENTATION
Both parameters 𝜇 and 𝜎 can practically take values across the entire (0, ∞) range. It has been noted that replacing the 𝜎 parameter by the ratio q = 𝜎/𝜇 not only provides an easier means to handle the parameters but can also be linked to some manufacturing quality aspects. Existing literature claims that a high-quality manufacturing process should not yield q values that exceed 0.09. A q value of 0.20 shows a large variability in manufacturing and thus a low-quality process.
The graph below presents all the (𝑞, 𝜇) couples that satisfy the reliability requirement. Any (𝑞, 𝜇) couple in the green area characterizes a better-than-requirement product. Any (𝑞, 𝜇) couple in the red area characterizes a lower-than-requirement product.
IMPORTANT REMARK: depending on the requirement definition, the shape of the graph can change and the green and red areas can reverse. Always pay attention to the meaning of each (𝑞, 𝜇) couple.
[Figure: q (vertical axis, 0.04 to 0.12) versus μ (horizontal axis, 1.1 to 1.6), with the NOK area at high q and the OK area at low q]
Reference: web

MTBFMONTREAL.CA
LOGNORMAL DISTRIBUTION
The lognormal distribution is commonly used to model the lives of units whose failure modes are of a fatigue-stress nature. Since this includes most, if not all, mechanical systems, the lognormal distribution can have widespread application. Consequently, the lognormal distribution is a good companion to the Weibull distribution when attempting to model these types of units. It is used to model failures due to crack propagation, material fatigue failures, and material strength. As may be surmised by the name, the lognormal distribution has certain similarities to the normal distribution. A random variable is lognormally distributed if the logarithm of the random variable is normally distributed.
Reference: web

MTBFMONTREAL.CA
LOGNORMAL DISTRIBUTION

There is no closed form for the reliability function.
Reference: web
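Although the lognormal reliability function has no closed form, it can be evaluated numerically through the complementary error function. The sketch below assumes the usual parameterization in which 𝜇 and 𝜎 are the mean and standard deviation of ln(T); this convention and the parameter values are assumptions, not taken from the slides.

```python
import math

def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma), evaluated with erfc."""
    z = (math.log(t) - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2.0))

mu, sigma = 5.0, 0.5   # hypothetical parameters of ln(T)
for t in (50.0, 100.0, math.exp(mu), 300.0):
    print(f"R({t:7.1f}) = {lognormal_reliability(t, mu, sigma):.4f}")
```

At t = exp(𝜇), the median life, the reliability equals 0.5 regardless of 𝜎.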

MTBFMONTREAL.CA
EFFECT OF PARAMETERS ON LOGNORMAL
DISTRIBUTION

Reference: web

MTBFMONTREAL.CA
LOGNORMAL DISTRIBUTION PARAMETERS
REPRESENTATION
Both parameters 𝜇 and 𝜎 can practically take values across the entire (0, ∞) range. Based on the normal distribution approach, it has been noted that replacing the 𝜎 parameter by the ratio $q = \mu \cdot e^{\sigma^{2}} - 1$ not only provides an easier means to handle the parameters but can also be linked to some manufacturing quality aspects. Existing literature claims that a high-quality manufacturing process should not yield q values that exceed 0.09. A q value of 0.20 shows a large variability in manufacturing and thus a low-quality process.
The graph below presents all the (𝑞, 𝜇) couples that satisfy the reliability requirement. Any (𝑞, 𝜇) couple in the green area characterizes a better-than-requirement product. Any (𝑞, 𝜇) couple in the red area characterizes a lower-than-requirement product.
IMPORTANT REMARK: depending on the requirement definition, the shape of the graph can change and the green and red areas can reverse. Always pay attention to the meaning of each (𝑞, 𝜇) couple.
[Figure: q (vertical axis, 0.04 to 0.12) versus μ (horizontal axis, 1.1 to 1.6), with the NOK area at high q and the OK area at low q]
Reference: web

MTBFMONTREAL.CA
REMEMBER
Simple distributions used in reliability have either one parameter (exponential) or two parameters.
The exponential model is a particular case of Weibull with 𝛽 = 1.
Simple distributions with two parameters have:
❑ one “shape” parameter: 𝛽 for Weibull and 𝜎 for Normal and Lognormal. This parameter is under less control for a given technology and manufacturing line. Changes to this parameter are generally made by changing the failure mode: choosing a different technology, different materials with different failure modes, changing the manufacturing processes, or any other change that impacts the PoF of the failure mode under discussion. Generally*, design changes that do not involve any of the above should not impact this parameter. For example, if a design change intends to improve the reliability of an IC by reducing the environmental temperature and that IC follows a WEIBULL of 𝛽 = 1.8, the improved performance should still be modeled by a WEIBULL of 𝛽 = 1.8.
For example, consider doubling the life of a bearing by using new materials without changing the failure mode: if the old bearing design failure mode (wear-out) is modeled by a WEIBULL of shape parameter 𝛽 = 2.8 and 𝜂 = 10,000 h, the new design is equivalent to doubling the scale parameter. This means that the new design reliability will be modeled by a WEIBULL of shape parameter 𝛽 = 2.8 (same value as the old one, as the failure mode did not change) and 𝜂 = 20,000 h (twice the value of the old design).
❑ one scale parameter: 𝜂 for Weibull and 𝜇 for Normal and Lognormal. Design improvements (except technological changes, materials changes, manufacturing process) should directly impact the scale parameter. This will allow us later to model accelerated testing.
* Disclaimer: literature and references exist that support cases where the shape parameter changes for the same failure mode under different operating conditions, but the approach is less practical to use. CSS stands for the Changing Shape and Scale approach. Historical data shows that, except for some specific technologies, there is very little added value in using the CSS.

MTBFMONTREAL.CA
MTBF FOR SAFETY
Though true only under exponential assumptions, most safety engineers consider:

If the exponential distribution is not demonstrated, then the safety requirement is actually the probability of failing the last mission, which is 1 − R(t|T) (R(t|T) being the reliability of the last mission of duration t, which starts at time T):
$p = 1 - R(t \mid T) = 1 - \frac{R(t+T)}{R(T)}$
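The difference between the exponential case and a wear-out case can be made visible by evaluating the last-mission failure probability at several starting ages. The sketch below is an illustration only; the failure rate, the Weibull parameters, and the mission duration are hypothetical.

```python
import math

def last_mission_failure_probability(reliability, t, start):
    """p = 1 - R(t + T) / R(T) for a mission of duration t starting at age T."""
    return 1.0 - reliability(t + start) / reliability(start)

def r_exponential(t, lam=1e-4):
    # hypothetical constant failure rate, per hour
    return math.exp(-lam * t)

def r_weibull(t, beta=2.0, eta=10_000.0):
    # hypothetical wear-out model
    return math.exp(-((t / eta) ** beta))

mission = 10.0  # hours, hypothetical mission duration
for start in (0.0, 1_000.0, 5_000.0):
    p_exp = last_mission_failure_probability(r_exponential, mission, start)
    p_wei = last_mission_failure_probability(r_weibull, mission, start)
    print(f"T = {start:7.0f} h: exponential p = {p_exp:.3e}, Weibull p = {p_wei:.3e}")
```

Under the exponential assumption the mission failure probability does not depend on the age T, whereas for the wear-out model it grows with T, so the last mission is the worst case.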

MTBFMONTREAL.CA
ASSIGNMENT – SAFETY TARGETS–
SAFETY REQUIREMENTS PER OPERATING
HOUR
The following is for information only (not required for the exam).
The assignment was intended to highlight the importance of choosing the right distribution function. In reality, the 1E-6 at the end of life is a durability requirement for the automotive industry, but the designers have to meet, according to ISO 26262, a failure rate of 1E-8 per operating hour for ASIL D (worst case, loss of human life). Note that 1E-8 is larger than all the values in red on the previous slide (so it is a less stringent requirement).

MTBFMONTREAL.CA
REMEMBER
Simple distributions with two parameters have:
❑ one “shape” parameter: 𝛽 for Weibull and 𝜎 for Normal and Lognormal.
➢ It is under less control for a given technology and manufacturing line.
➢ Changes to this parameter come from: a different technology, different materials, improving manufacturing processes, or any other change that impacts the PoF of the failure mode under discussion.
❑ one scale parameter: 𝜂 for Weibull and 𝜇 for Normal and Lognormal.
➢ Design improvements (except technological changes, materials changes, manufacturing process) should directly impact the scale parameter.
➢ This will allow us later to model accelerated testing.

MTBFMONTREAL.CA
LIFE LIMITED
A hard time (HT) component is a component that requires a specific action at a specific interval (overhaul, refurbishment, bench check, etc.) per the manufacturer's recommendations.
On-Condition (OC) is a preventive primary maintenance process that requires a system, component, or appliance to be inspected periodically or checked against some appropriate physical standard to determine if it can continue in service. The standard ensures that the unit is removed from service before failure during normal operation. These standards may be adjusted based on operating experience or tests, as appropriate, in accordance with (IAW) a carrier's approved reliability program or maintenance manual.
Condition Monitoring (CM) is a process for systems, components, or appliances that have neither HT nor OC maintenance as their primary maintenance process. It is accomplished by appropriate means available to an operator for finding and solving problem areas. The user must control the reliability of systems or equipment based on knowledge gained by analysis of failures or other indications of deterioration.

MTBFMONTREAL.CA
OBSERVED MTTF WHEN OPERATING WITH
HARD TIME
The intrinsic MTTF is equal to the area under the reliability curve:
$MTTF = \int_0^{\infty} R(t)\,dt$
When operating with a hard time T, at time T the non-failed units are set back to an “as good as new” state (by overhaul/maintenance or by being replaced with new units). The intrinsic performance of the unit does not change.
The observed MTTF is what the user notices in the field, based on the cumulated operating time and the observed number of failures. Obviously, this observed MTTF is larger than the intrinsic one as, by the action taken at hard time, the user renews the fleet (or performs an action equivalent to renewal).
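One way to quantify the observed MTTF is a renewal argument: each unit accumulates on average the integral of R(t) from 0 to T operating hours per cycle, and a fraction 1 − R(T) of the cycles ends with a failure. The sketch below applies this to a Weibull wear-out model. It assumes that both failed units and units reaching the hard time are restored to as good as new; the parameter values are hypothetical.

```python
import math

def weibull_reliability(t, beta, eta):
    return math.exp(-((t / eta) ** beta))

def integrate_reliability(beta, eta, upper, steps=100_000):
    """Trapezoidal estimate of the integral of R(t) from 0 to upper."""
    dt = upper / steps
    total = 0.5 * (weibull_reliability(0.0, beta, eta)
                   + weibull_reliability(upper, beta, eta))
    for i in range(1, steps):
        total += weibull_reliability(i * dt, beta, eta)
    return total * dt

def observed_mttf(beta, eta, hard_time):
    """Cumulated operating time per observed failure, with renewal at hard_time."""
    mean_cycle_time = integrate_reliability(beta, eta, hard_time)
    failure_fraction = 1.0 - weibull_reliability(hard_time, beta, eta)
    return mean_cycle_time / failure_fraction

beta, eta, hard_time = 2.8, 10_000.0, 5_000.0   # hypothetical wear-out case
intrinsic = eta * math.gamma(1.0 + 1.0 / beta)
print(f"intrinsic MTTF: {intrinsic:,.0f} h")
print(f"observed MTTF with hard time {hard_time:,.0f} h: "
      f"{observed_mttf(beta, eta, hard_time):,.0f} h")
```

With 𝛽 = 1 the same function returns 𝜂 for any hard time, which is consistent with the memoryless effect: renewing units brings a gain only when the failure mode involves wear-out.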

GOAL SETTING
Setting design targets
MTBFMONTREAL.CA
FROM PROGRAM TARGETS TO
RELIABILITY TARGETS
Reliability targets can be fixed in terms of:
❑ reliability value at a time t
❑ failure rate value at a time t
❑ Bx for a given X% value
❑ mission failure probability
❑ MTTF value for units running to failure
❑ MTBF value for life-limited units
❑ etc.
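As an example of one of these target types, the Bx life can be computed in closed form for a Weibull model. The sketch below assumes the common definition of Bx as the time by which X% of the units are expected to have failed; this definition and the parameter values are assumptions made for illustration.

```python
import math

def weibull_bx(x_percent, beta, eta):
    """Time by which x_percent of the units are expected to have failed."""
    # F(Bx) = x/100  <=>  Bx = eta * (-ln(1 - x/100))^(1/beta)
    return eta * (-math.log(1.0 - x_percent / 100.0)) ** (1.0 / beta)

beta, eta = 2.8, 10_000.0   # hypothetical Weibull parameters
for x in (1, 10, 50):
    print(f"B{x} = {weibull_bx(x, beta, eta):,.0f} h")
```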

MTBFMONTREAL.CA
EXCEL TOOL
INDU 691 offers a tool to easy decide model parameters targets

based on the reliability target and on the chosen model. There are 4
files available on vosorin.com.

Step 1: select the appropriate file, based on the mathematical model
to be used
Step 2: understand the color coding.
➢ User can modify only the blue cells, intended to enter data.
➢ Tool output is listed in green cells.
➢ Yellow cells are general comments, intended to ease the use of the
tool.
Step 3: select the reliability target definition
Step 4: enter input data
Step 5: click RUN (if the button exists)
Step 6: read output data (model parameter value(s))

EXCEL TOOL
Multiple options are embedded


EXPONENTIAL - CASE 1
Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For the exponential case, the value of T
is of no importance as the failure rate is constant. The tool is of
very low value here, as the input and the output are the same.


WEIBULL - CASE 1
Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For a given set of data, the tool
automatically provides all the couples 𝛽, 𝜂 that satisfy the condition
and computes the associated MTTF.

Any product characterized by a 𝛽, 𝜂 couple situated in the red
area does not comply with the desired target.
[Plot: OK and NOK (red) regions of the 𝛽, 𝜂 plane]
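
The boundary between compliant and non-compliant couples can be sketched in closed form. This is an illustration under the standard Weibull failure rate h(T) = (𝛽/𝜂)(T/𝜂)^(𝛽-1), not the spreadsheet's actual implementation; the function names are hypothetical.

```python
import math

def failure_rate(t, beta, eta):
    """Weibull failure rate at time t."""
    return (beta / eta) * (t / eta) ** (beta - 1)

def weibull_eta_min_for_failure_rate(lambda_max, t, beta):
    """Smallest eta such that h(t) <= lambda_max for the given beta.

    h(t) = beta * t^(beta-1) / eta^beta decreases as eta grows, so every
    eta above the returned value also complies.
    """
    return (beta * t ** (beta - 1) / lambda_max) ** (1.0 / beta)

def weibull_mttf(beta, eta):
    return eta * math.gamma(1 + 1 / beta)

# Assumed target: failure rate <= 1e-4 per hour at T = 1000 h
for beta in (0.8, 1.0, 1.5, 2.0, 3.0):
    eta = weibull_eta_min_for_failure_rate(1e-4, 1000.0, beta)
    print(beta, eta, weibull_mttf(beta, eta))
```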

NORMAL - CASE 1
Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For a given set of data, the tool
automatically provides all the couples 𝜎, 𝜇 that satisfy the condition
and computes the associated MTTF.

Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.
[Plot: OK and NOK (red) regions of the 𝜎, 𝜇 plane]

LOGNORMAL - CASE 1
Case 1: the reliability is measured by the maximum accepted failure
rate 𝜆(T) at a given time T. For a given set of data, the tool provides
all the couples 𝜎, 𝜇 that satisfy the condition and computes the
associated MTTF.

Once the data is entered, the user has to click RUN.
[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]

EXPONENTIAL - CASE 2
Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool provides the value of the 𝜆 parameter that satisfies the condition
and computes the equivalent MTTF.
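
For the exponential model this case has a one-line solution, since R(T) = exp(-𝜆T). A minimal sketch with assumed example numbers (the function name is illustrative, not taken from the tool):

```python
import math

def exponential_lambda_max(r_target, t):
    """Largest constant failure rate for which exp(-lambda * t) >= r_target."""
    return -math.log(r_target) / t

lam = exponential_lambda_max(0.90, 1000.0)  # assumed target: R(1000 h) >= 0.90
mttf = 1.0 / lam                            # equivalent MTTF of the exponential model
print(lam, mttf)
```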


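WEIBULL - CASE 2
Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool automatically provides all the couples 𝛽, 𝜂 that satisfy the
condition and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝛽, 𝜂 plane]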

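NORMAL - CASE 2
Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool automatically provides all the couples 𝜎, 𝜇 that satisfy the
condition and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]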
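
For a Normal life model the boundary is a straight line in the 𝜎, 𝜇 plane: R(T) ≥ R_target is equivalent to 𝜇 ≥ T + z·𝜎, where z is the standard normal quantile of R_target. The sketch below uses Python's standard `statistics.NormalDist`; the function names and numbers are assumptions for illustration, not the tool's own.

```python
from statistics import NormalDist

def normal_mu_min(r_target, t, sigma):
    """Smallest mean mu such that R(t) = 1 - Phi((t - mu)/sigma) >= r_target."""
    z = NormalDist().inv_cdf(r_target)  # standard normal quantile of the target
    return t + z * sigma

def normal_reliability(t, mu, sigma):
    return 1.0 - NormalDist(mu, sigma).cdf(t)

# Assumed target: R(1000 h) >= 0.90, scanned over a few sigma values
for sigma in (50.0, 100.0, 200.0):
    mu = normal_mu_min(0.90, 1000.0, sigma)
    print(sigma, mu, normal_reliability(1000.0, mu, sigma))
```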

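LOGNORMAL - CASE 2
Case 2: the reliability is measured by the minimum accepted
reliability value R(T) at a given time T. For a given set of data, the
tool provides all the couples 𝜎, 𝜇 that satisfy the condition and
computes the associated MTTF.

[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]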

EXPONENTIAL - CASE 3
Case 3: the reliability is measured by the minimum accepted time BX at
which X% of the units have failed (B becomes L for bearings, e.g.
L10). For a given set of data, the tool provides the value of the 𝜆
parameter that satisfies the condition and computes the equivalent
MTTF.


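WEIBULL - CASE 3
Case 3: the reliability is measured by the minimum accepted time BX at
which X% of the units have failed. For a given set of data, the
tool automatically provides all the couples 𝛽, 𝜂 that satisfy the
condition and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝛽, 𝜂 plane]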
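
A BX requirement says that the unreliability at time BX must not exceed X%, so for a Weibull model the limiting 𝜂 for each 𝛽 follows from 1 - exp(-(BX/𝜂)^𝛽) = X/100. A minimal sketch (function names and the B10 example are assumptions, not taken from the tool):

```python
import math

def weibull_eta_min_for_bx(bx_time, x_percent, beta):
    """Smallest eta for which at most x_percent of units fail before bx_time."""
    return bx_time / (-math.log(1 - x_percent / 100)) ** (1.0 / beta)

def weibull_unreliability(t, beta, eta):
    return 1.0 - math.exp(-((t / eta) ** beta))

# Assumed target: B10 >= 1000 h
for beta in (1.0, 1.5, 2.0, 3.0):
    eta = weibull_eta_min_for_bx(1000.0, 10, beta)
    print(beta, eta, eta * math.gamma(1 + 1 / beta))  # beta, eta, MTTF
```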

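NORMAL - CASE 3
Case 3: the reliability is measured by the minimum accepted time BX at
which X% of the units have failed. For a given set of data, the
tool automatically provides all the couples 𝜎, 𝜇 that satisfy the
condition and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]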

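LOGNORMAL - CASE 3
Case 3: the reliability is measured by the minimum accepted time BX at
which X% of the units have failed. For a given set of data, the
tool provides all the couples 𝜎, 𝜇 that satisfy the condition and
computes the associated MTTF.

Once the data is entered, the user has to click RUN.
[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]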

EXPONENTIAL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. The removal time might be associated either
with the end of life or with a hard time (restoration, overhaul) that
resets the unit to a state of “as good as new”. For a given set of
data, the tool provides the value of the 𝜆 parameter that satisfies the
condition and computes the equivalent MTTF.

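WEIBULL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. For a given set of data, the tool
automatically provides all the couples 𝛽, 𝜂 that satisfy the condition
and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝛽, 𝜂 plane]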
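
One possible reading of this requirement, sketched below, takes the worst-case mission as the last one before removal, so that p = 1 - R(Th)/R(Th - tm). That interpretation is an assumption (the slide does not give the formula), it is the worst case only when 𝛽 ≥ 1, and the function names are hypothetical.

```python
import math

def mission_failure_probability(beta, eta, t_mission, t_removal):
    """P(fail during the last mission | unit survived up to its start)."""
    start = t_removal - t_mission
    exponent = (t_removal / eta) ** beta - (start / eta) ** beta
    return -math.expm1(-exponent)  # 1 - exp(-exponent), accurate for small values

def weibull_eta_min_for_mission(p_max, beta, t_mission, t_removal):
    """Smallest eta for which the last-mission failure probability is <= p_max."""
    start = t_removal - t_mission
    numerator = t_removal ** beta - start ** beta
    return (numerator / -math.log1p(-p_max)) ** (1.0 / beta)

# Assumed target: p <= 1e-3 for a 10 h mission, removal at Th = 5000 h
for beta in (1.0, 2.0, 3.0):
    print(beta, weibull_eta_min_for_mission(1e-3, beta, 10.0, 5000.0))
```

With 𝛽 = 1 the removal time drops out of the result, consistent with the constant failure rate of the exponential case.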

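NORMAL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. For a given set of data, the tool
automatically provides all the couples 𝜎, 𝜇 that satisfy the condition
and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]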

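LOGNORMAL - CASE 4
Case 4: the reliability is measured by the maximum accepted
probability p of failing a mission of duration tm when the unit is
removed after a time Th. For a given set of data, the tool
provides all the couples 𝜎, 𝜇 that satisfy the condition and computes
the associated MTTF.

[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]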

EXPONENTIAL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool provides the value of the 𝜆
parameter that satisfies the condition and computes the equivalent
MTTF.


WEIBULL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool automatically provides all the
couples 𝛽, 𝜂 that satisfy the condition and computes the associated
MTTF.

[Plot: OK and NOK regions of the 𝛽, 𝜂 plane]
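
For a Weibull model the MTTF is 𝜂·Γ(1 + 1/𝛽), so the limiting 𝜂 for each 𝛽 is obtained by dividing the MTTF target by the gamma term. A minimal sketch with an assumed target of 1000 h (function names are illustrative):

```python
import math

def weibull_mttf(beta, eta):
    return eta * math.gamma(1 + 1 / beta)

def weibull_eta_min_for_mttf(mttf_target, beta):
    """Smallest eta for which the Weibull MTTF reaches mttf_target."""
    return mttf_target / math.gamma(1 + 1 / beta)

for beta in (0.5, 1.0, 2.0, 3.5):
    eta = weibull_eta_min_for_mttf(1000.0, beta)
    print(beta, eta, weibull_mttf(beta, eta))
```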

NORMAL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool automatically provides all the
couples 𝜎, 𝜇 that satisfy the condition and computes the associated
MTTF.

This case is of less interest for a Normal distribution, as the
parameter 𝜇 has the same value as the MTTF.
[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]

LOGNORMAL - CASE 5
Case 5: the reliability is measured by the minimum accepted MTTF
value. For a given set of data, the tool provides all the couples 𝜎, 𝜇
that satisfy the condition and computes the associated MTTF.

For a Lognormal distribution, 𝜇 is the logarithm of the median life,
not of the MTTF: MTTF = exp(𝜇 + 𝜎²/2), so each 𝜎 fixes
𝜇 = ln(MTTF) − 𝜎²/2.
[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]

EXPONENTIAL - CASE 6
Case 6: the reliability is measured by the minimum observed MTTF
value for a unit removed at a hard time Th > MTTF. For a given set of
data, the tool provides the value of the 𝜆 parameter that satisfies the
condition and computes the equivalent MTTF.

As the failure rate is constant, this case is of low interest.

WEIBULL - CASE 6
Case 6: the reliability is measured by the minimum observed
MTTF/MTBF value for a unit removed at a hard time Th > MTTF. For a
given set of data, the tool automatically provides all the couples
𝛽, 𝜂 that satisfy the condition and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝛽, 𝜂 plane]
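
This case has no closed form, but the limiting 𝜂 for a given 𝛽 can be found numerically. The sketch below assumes the age-replacement relation, observed MTBF = (area under R(t) from 0 to Th) / (1 - R(Th)), and solves for 𝜂 by bisection. The relation, the search bracket, and the function names are assumptions for illustration, not the spreadsheet's method.

```python
import math

def observed_mtbf(beta, eta, hard_time, n=4000):
    """Area under the Weibull R(t) on [0, Th] (Simpson rule) divided by F(Th)."""
    def rel(t):
        return math.exp(-((t / eta) ** beta))
    h = hard_time / n
    area = rel(0.0) + rel(hard_time)
    for i in range(1, n):
        area += (4 if i % 2 else 2) * rel(i * h)
    area *= h / 3
    unreliability = -math.expm1(-((hard_time / eta) ** beta))
    return area / unreliability

def eta_for_observed_mtbf(target, beta, hard_time, rel_tol=1e-9):
    """Smallest eta whose observed MTBF reaches target (bisection on eta)."""
    lo = hard_time * 1e-2  # assumed lower end of the search bracket
    if observed_mtbf(beta, lo, hard_time) >= target:
        raise ValueError("target is below the search bracket")
    hi = lo
    while observed_mtbf(beta, hi, hard_time) < target:
        hi *= 2.0          # observed MTBF grows with eta, so this terminates
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if observed_mtbf(beta, mid, hard_time) < target:
            lo = mid
        else:
            hi = mid
    return hi

# Assumed target: observed MTBF >= 1000 h with beta = 2 and Th = 1500 h
print(eta_for_observed_mtbf(1000.0, 2.0, 1500.0))
```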

NORMAL - CASE 6
Case 6: the reliability is measured by the minimum observed
MTTF/MTBF value for a unit removed at a hard time Th > MTTF. For a
given set of data, the tool automatically provides all the couples
𝜎, 𝜇 that satisfy the condition and computes the associated MTTF.

[Plot: OK and NOK regions of the 𝜎, 𝜇 plane]

LOGNORMAL - CASE 6
Case 6: the reliability is measured by the minimum observed MTTF/MTBF
value for a unit removed at a hard time Th > MTTF. For a given set of
data, the tool provides all the couples 𝜎, 𝜇 that satisfy the condition
and computes the associated MTTF.
Any product characterized by a 𝜎, 𝜇 couple situated in the red area
does not comply with the desired target.

[Plot: OK and NOK (red) regions of the 𝜎, 𝜇 plane]

FROM FIELD MTBF TO WEIBULL PARAMETERS
The last file, “TOOL01_Requirement_5_FIELD_to_Weibull”, introduces
means to obtain the equivalent Weibull parameters from:
❑ field MTBF
❑ predicted failure rate
❑ expected % of failures at a specific time
Sheet “field MTBF to Weibull”
This tool uses field performance (and some assumptions) to provide the
WEIBULL ETA (scale) parameter.
The user is supposed to know the number of cumulated operating
hours/cycles over the last period of time (for simplicity, 1 period of
time = 1 year), the number of failures, the BETA (shape parameter)
value, as well as the number of units running during the last period
of time and their age.

Sheet “Predicted FR to Weibull”
This tool provides the ETA parameter of the WEIBULL distribution that
satisfies the average FR entered by the user under the assumption of a
given BETA (shape parameter) value. The user is also asked to enter the
average expected operating time per year. (FYI: the tool predicts the
FR evolution over the first 5 years and looks for the ETA value that
makes the average WEIBULL FR equal to the predicted FR.)
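
One way to read "average WEIBULL FR" is the time average of the failure rate over the horizon T (5 years of operation), which equals the cumulative hazard divided by T: (T/ETA)^BETA / T. Under that assumption ETA has a closed form. The tool may average differently (for example over yearly values), so the sketch below is only an illustration with hypothetical names.

```python
def average_failure_rate(beta, eta, horizon):
    """Time-averaged Weibull failure rate on [0, horizon]: H(horizon) / horizon."""
    return (horizon / eta) ** beta / horizon

def eta_from_predicted_fr(predicted_fr, beta, hours_per_year, years=5):
    """ETA that makes the time-averaged Weibull FR equal to predicted_fr."""
    horizon = years * hours_per_year
    return horizon / (predicted_fr * horizon) ** (1.0 / beta)

# Assumed inputs: predicted FR = 2e-5 per hour, BETA = 1.5, 3000 h per year
eta = eta_from_predicted_fr(2e-5, 1.5, 3000.0)
print(eta, average_failure_rate(1.5, eta, 5 * 3000.0))
```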
Sheet “Failure % to Weibull”
This tool provides the BETA and ETA parameters required to meet a
specific % of failures at 2 moments in time (years or time periods).
The user is also asked to enter the average expected operating time
per year.
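
Two points on the Weibull unreliability curve determine both parameters exactly, because ln(-ln(1 - F)) is linear in ln(t). The sketch below shows that calculation; the function names and the example numbers are assumptions, not values from the spreadsheet.

```python
import math

def weibull_from_two_points(t1, f1, t2, f2):
    """BETA and ETA of the Weibull passing through F(t1) = f1 and F(t2) = f2.

    t1, t2 are operating times (years * hours per year); f1, f2 are
    fractions failed, with 0 < f1 < f2 < 1 and t1 < t2.
    """
    h1 = -math.log(1 - f1)  # cumulative hazard at t1
    h2 = -math.log(1 - f2)  # cumulative hazard at t2
    beta = math.log(h2 / h1) / math.log(t2 / t1)
    eta = t1 / h1 ** (1.0 / beta)
    return beta, eta

# Assumed inputs: 1% failed after 1 year and 5% after 3 years, at 2000 h per year
beta, eta = weibull_from_two_points(1 * 2000.0, 0.01, 3 * 2000.0, 0.05)
print(beta, eta)
```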

WOLFRAM – FREE COMPUTATIONAL TOOL
(Replace the Greek letters and "value" with numbers before running.)
https://www.wolframalpha.com/input/ to plot Reliability
❑ Exponential: Plot[1 - CDF[ExponentialDistribution[λ], t], {t, 0, value}]
❑ Normal: Plot[1 - CDF[NormalDistribution[μ, σ], t], {t, 0, value}]
❑ LogNormal: Plot[1 - CDF[LogNormalDistribution[μ, σ], t], {t, 0, value}]
❑ Weibull: Plot[1 - CDF[WeibullDistribution[β, η], t], {t, 0, value}]
https://www.wolframalpha.com/input/ to plot PDF
❑ Exponential: Plot[PDF[ExponentialDistribution[λ], t], {t, 0, value}]
❑ Normal: Plot[PDF[NormalDistribution[μ, σ], t], {t, 0, value}]
❑ LogNormal: Plot[PDF[LogNormalDistribution[μ, σ], t], {t, 0, value}]
❑ Weibull: Plot[PDF[WeibullDistribution[β, η], t], {t, 0, value}]
Multiple plots example:
Plot[{PDF[ExponentialDistribution[0.021], t], PDF[WeibullDistribution[2, 1/0.041], t]}, {t, 0, 100}]

WOLFRAM – FREE COMPUTATIONAL TOOL
$$MTTF = \int_0^{\infty} R(t)\,dt$$
https://www.wolframalpha.com/input/?i=integrate for MTTF/MTBF
(run to failure)
❑ Normal
➢ Function to integrate: 1 - CDF[NormalDistribution[μ, σ], t]
➢ Variable: t
➢ Lower limit: 0
➢ Upper limit: ∞
❑ LogNormal
➢ Function to integrate: 1 - CDF[LogNormalDistribution[μ, σ], t]
➢ Variable: t
➢ Lower limit: 0
➢ Upper limit: ∞
❑ Weibull
➢ Function to integrate: 1 - CDF[WeibullDistribution[β, η], t]
➢ Variable: t
➢ Lower limit: 0
➢ Upper limit: ∞
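
The same integral can be checked offline. The sketch below integrates R(t) numerically with Simpson's rule and compares the result with the closed-form means, 𝜂·Γ(1 + 1/𝛽) for the Weibull and exp(𝜇 + 𝜎²/2) for the Lognormal. The finite upper limits stand in for ∞ and are an assumption that the remaining tail is negligible for these example parameters; the function names are illustrative.

```python
import math

def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b] with n (even) sub-intervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def weibull_reliability(t, beta, eta):
    return math.exp(-((t / eta) ** beta))

def lognormal_reliability(t, mu, sigma):
    """1 - CDF of a lognormal whose log-life has mean mu and std dev sigma."""
    if t <= 0:
        return 1.0
    return 0.5 * math.erfc((math.log(t) - mu) / (sigma * math.sqrt(2)))

# Assumed example parameters
mttf_weibull = simpson(lambda t: weibull_reliability(t, 2.0, 1000.0), 0.0, 10000.0)
mttf_lognormal = simpson(lambda t: lognormal_reliability(t, math.log(1000.0), 0.5), 0.0, 50000.0)

print(mttf_weibull, 1000.0 * math.gamma(1.5))                     # numeric vs closed form
print(mttf_lognormal, math.exp(math.log(1000.0) + 0.5 ** 2 / 2))  # numeric vs closed form
```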
