What Every Engineer Should Know About Reliability Engineering I

290.pdf
A SunCam online continuing education course
WHAT EVERY ENGINEER SHOULD KNOW ABOUT

RELIABILITY ENGINEERING I
by
O. Geoffrey Okogbaa, Ph.D., PE


Contents
Introduction
1.1 Definition of Reliability
1.1.1 Performance and Reliability
1.1.2 Trade-offs: Reliability versus Cost
1.1.3 Time Element of Reliability
1.1.4 Operating Condition
1.1.5 Other Performability Measures
1.2 Definition of Failure
Reliability Models
2.1 Parametric and Nonparametric Relationships
2.2 Failure Density Function
2.2.1 Failure Probability in the interval (t1,t2)
2.3 Reliability of Component of age t
2.4 Conditional Failure Rate (Hazard Function)
2.5 Mean Time To Failure (MTTF and MTBF)
2.6 Hazard Functions for Common Distributions
2.6.1 Exponential
2.6.2 Normal Distribution (Standard Normal Distribution)
2.6.3 Log Normal Distribution
2.6.4 Weibull Distribution
2.7 Estimating R(t), h(t), f(t) Using Empirical Data
2.7.1 Small sample size (n < 10)
2.7.2 Large Sample size (n > 10)
Static Reliability
3.1 Series System
3.2 Parallel Systems
Reliability Improvement
3.1 Redundancy-High level
3.2 Redundancy-Low level
3.3 Active and Standby Redundancy
www.SunCam.com Copyright© 2017 O. Geoffrey Okogbaa, PE Page 2 of 39

3.3.1 Active or Parallel System Models
3.4 Passive or Standby Configuration with Switching
3.4 Imperfect Switching
3.5 Shared Load Models
Repairable Systems (Availability Analysis)
4.1 Definition of Measures of System Effectiveness
4.1.1 Serviceability
4.1.2 Reparability
4.1.3 Operational Readiness (OR)
4.1.4 Availability (A)
4.1.5 Intrinsic Availability (AI)
4.1.6 Maintainability
4.2 System Availability
4.2.1 Computation of Availability
4.2.2 Repair Function
4.2.3 Availability Modeling
Redesign of the Automobile Braking System Using Redundancy Concepts
5.1 Basic Brake Design (Design a)
5.2 Unit or System Redundancy (Design b)
5.3 Component Redundancy (Design c)
5.4 Hybrid/Compromise Redundancy (Design d)
Maintenance
6.1 Preventive Maintenance (PM)
6.2 Predictive Maintenance (PdM)
6.3 Corrective Maintenance (CM)
6.4 Summary
Summary
References


Introduction
Reliability considerations play an increasing role in virtually all human endeavors, and particularly
in engineering design. As the demand for systems that perform better and cost less increases, there
is a concomitant need, and often a requirement, to minimize the probability of component and
system failures. Such failures, if not properly mitigated, can lead to increased cost and
inconvenience, or can threaten individual and public safety.
1.1 Definition of Reliability

Reliability is defined as the probability that a system (facility, device, or component), when
operating under stated operating conditions, will perform its intended function adequately for a
specified period of time. In practice, reliability may be viewed or defined differently from one
system or component to another. However, the system or unit of interest typically determines what
is being studied, so there is usually no ambiguity. Based on this definition, we can surmise the
following about reliability:
• It is a probability (a conditional probability)
• It is a design parameter (its value can be specified just as strength or weight can)
• It is time dependent (its value changes with time and age)
• It depends on the operating conditions and the environment
In defining reliability, no distinction is made between failure types. In practice, however, there is
a great deal of concern not only with the probability of failure but also with the potential
consequences of the different modes of failure. In reliability analysis, attention is focused not just
on economic losses or inconvenience but also on the impact of failures on public safety and
well-being. For example, a home appliance manufacturer must be concerned not only with frequent
failures and the cost of maintenance but also with the fact that such failures could become a safety
hazard through shock or electrocution. For a system such as an aircraft, there is even less
distinction between reliability and safety considerations. Overall, safety and reliability go hand in
hand.
1.1.1 Performance and Reliability

The trade-offs between performance and reliability are often subtle and involve loading, complexity,
and similar factors. While performance is frequently improved through overdesign and overloading, a
high reliability requirement is often met by worst-case design and, most assuredly, by determining
the interference region between stress and strength. In other words, reliability is the probability
that the load (stress, s) is less than the strength (capacity, c), i.e., P(c > s).
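As a rough illustration of the interference idea, the sketch below evaluates P(c > s) under the added assumption that capacity and stress are independent and normally distributed, so that their difference is also normal. The function names and all numerical values are hypothetical and are not taken from the course.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interference_reliability(mu_c: float, sigma_c: float,
                             mu_s: float, sigma_s: float) -> float:
    """P(c > s) for independent normal capacity c and stress s (assumption).

    c - s is normal with mean mu_c - mu_s and variance sigma_c^2 + sigma_s^2,
    so P(c - s > 0) = Phi((mu_c - mu_s) / sqrt(sigma_c^2 + sigma_s^2)).
    """
    z = (mu_c - mu_s) / math.sqrt(sigma_c ** 2 + sigma_s ** 2)
    return normal_cdf(z)


# Hypothetical values: capacity 500 +/- 40, stress 350 +/- 30 (same units).
# Here z = (500 - 350) / 50 = 3.
r = interference_reliability(mu_c=500.0, sigma_c=40.0, mu_s=350.0, sigma_s=30.0)
print(f"R = P(c > s) = {r:.5f}")
```

Widening either spread, or moving the two means closer together, enlarges the interference region and lowers the computed reliability.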
1.1.2 Trade-offs: Reliability versus Cost

In designing a race car, performance is the overriding goal. The designer must tolerate a high
probability of breakdown in exchange for a high probability of winning the race. For a commercial
airliner, safety and reliability are paramount, so performance and speed are sacrificed. For military
aircraft, performance and reliability are equally important: performance must be high, or the number
of losses during combat missions would be high. In situations other than life and death, reliability
is viewed in terms of economics. Management is concerned about reliability but less so about the
technical jargon surrounding it, so the best way to communicate the importance of reliability to
management is in terms of dollars and cents.
1.1.3 Time Element of Reliability

The way in which time is specified can vary with the nature of the system under consideration.
a) For a system that operates intermittently (car, shoes, etc.), one must specify whether calendar
time or number of operating hours is the metric to be used.
b) If the system operation is cyclic (a switch, etc.), then time is likely to be specified in terms of
the number of operations.
c) If reliability is specified in calendar time, it may also be necessary to indicate the number or
frequency of system starts and stops.
1.1.4 Operating Condition

a) Principal design loads (weight, electrical load)
b) Environmental conditions (dust, salt, vibration) and temperature extremes
1.1.5 Other Performability Measures

In addition to reliability, other quantities used to characterize the performability of a system include:
• MTTF and Failure rate for repairable system
• System Safety
• Availability and
• Maintainability.
1.2 Definition of Failure
A system or unit is commonly said to have failed when it ceases to perform its intended function.
When there is total cessation of function (for example, an engine stops running or a structure
collapses), the system has clearly failed. However, a system can also be considered to be in a failed
state when its deterioration function falls within a certain critical region or boundary. Such subtle
forms of failure make it necessary to define quantitatively what is meant by failure. Typical failure
types include creep, degradation, catastrophic, intermittent, drift, fracture, crack, shock, etc.
As a result, the mathematical model of reliability can be quite complex because of the different
component probability distributions, the complexity of the interference between stress and strength,
environmental conditions and stresses, and variations in equipment use conditions.
Reliability Models
2.1 Parametric and Nonparametric Relationships
Define t as the random variable representing the time to failure and T as the age of the system. If
the failure density function is f(t), then the probability of failure by age T is P(t ≤ T) = F(T).
F(t) is also known as the distribution function of the failure process. The nonparametric
relationship between f(t) and F(t) is:
$$F(t) = \int_0^t f(s)\,ds$$
The reliability function is given by:
$$R(t) = 1 - F(t) = 1 - \int_0^t f(s)\,ds = \int_t^\infty f(s)\,ds$$
If t is a negative exponential random variable with a constant parameter θ (the Mean Time to
Failure, or MTTF), the relation between f(t), F(t), and R(t) is:
$$f(t) = \frac{1}{\theta}\exp\left(-\frac{t}{\theta}\right)$$
$$F(t) = \int_0^t \frac{1}{\theta}\exp\left(-\frac{s}{\theta}\right)ds = \left[-\exp\left(-\frac{s}{\theta}\right)\right]_0^t = -\left[\exp\left(-\frac{t}{\theta}\right) - 1\right] = 1 - \exp\left(-\frac{t}{\theta}\right)$$
$$R(t) = 1 - F(t) = \exp\left(-\frac{t}{\theta}\right)$$
Figure 1: The Failure Density Function (f(t) plotted against t, with t1 and t2 marked on the time axis)
2.2 Failure Density Function
The failure density function gives the relative frequency of failure from the viewpoint of initial
operation at time t = 0. The failure distribution function F(t) is the special case of the interval
probability in Section 2.2.1 with t1 = 0 and t2 = t, i.e., F(t2) = F(t).
2.2.1 Failure Probability in the interval (t1,t2)
$$\int_{t_1}^{t_2} f(t)\,dt = \int_0^{t_2} f(t)\,dt - \int_0^{t_1} f(t)\,dt = F(t_2) - F(t_1) = [1 - R(t_2)] - [1 - R(t_1)] = R(t_1) - R(t_2)$$
2.3 Reliability of Component of age t

The reliability (or survival probability) of a fresh unit with mission duration x is by definition
$R(x) = \bar{F}(x) = 1 - F(x)$, where F(x) is the life distribution of the unit. The corresponding
conditional reliability of a unit of age t for an additional duration x is given by:
$$\bar{F}(x \mid t) = \frac{\bar{F}(t \cap x)}{\bar{F}(t)}, \quad \text{where } \bar{F}(t) > 0$$
But $\bar{F}(t \cap x) = \bar{F}(t + x)$, that is, the unit must survive a total life of (t + x). Hence:
$$\bar{F}(x \mid t) = \frac{\bar{F}(t + x)}{\bar{F}(t)}$$
Similarly, the conditional probability of failure during the interval of duration x is F(x|t), where
by definition $F(x \mid t) = 1 - \bar{F}(x \mid t)$. Hence:
$$F(x \mid t) = 1 - \frac{\bar{F}(t + x)}{\bar{F}(t)} = \frac{\bar{F}(t) - \bar{F}(t + x)}{\bar{F}(t)}$$
2.4 Conditional Failure Rate (Hazard Function)
The conditional failure probability is given by F(x|t). Hence the conditional failure rate is:
$$\frac{F(x \mid t)}{x} = \frac{1}{x}\left[\frac{\bar{F}(t) - \bar{F}(t + x)}{\bar{F}(t)}\right] = \frac{1}{x}\left[\frac{R(t) - R(t + x)}{R(t)}\right]$$
The hazard function is the limit of the failure rate as the interval (x in this case) approaches zero.
It is also referred to as the instantaneous failure rate because the interval in question is very
small. The hazard rate describes the conditional probability of failure in the next instant x (or Δt),
given survival up to time t.
$$h(t) = \lim_{x \to 0}\left[\frac{R(t) - R(t + x)}{x}\right]\frac{1}{R(t)} = \lim_{x \to 0}\left[\frac{R(t + x) - R(t)}{x}\right]\left(\frac{-1}{R(t)}\right) = -\frac{1}{R(t)}\frac{d}{dt}R(t)$$
Note: If f(t) is the exponential, then and only then is h(t) = λ, or 1/θ.
Note: $R(t) = 1 - F(t) \Rightarrow \frac{d}{dt}R(t) = -\frac{d}{dt}F(t) = -f(t)$
$$\therefore\; h(t) = -\frac{1}{R(t)}\frac{d}{dt}R(t) = -\frac{1}{R(t)}\left[-\frac{d}{dt}F(t)\right] = -\frac{1}{R(t)}\left[-f(t)\right] = \frac{f(t)}{R(t)} \;\Rightarrow\; f(t) = h(t)R(t)$$
This parametric relationship between the hazard function, the reliability function, and the
density function is perhaps the most important relationship in reliability work. We can explore it
further to establish a more general relationship among these functions, which makes it easy to
determine the reliability function or the density function once the hazard function is known or given.
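A minimal numerical sketch of that idea: given a hazard function h(t), the reliability follows from R(t) = exp(−∫ h(τ)dτ) taken over (0, t), and the density from f(t) = h(t)R(t). The trapezoidal integration, the function names, and the two example hazard functions are illustrative assumptions, not part of the course.

```python
import math
from typing import Callable


def reliability_from_hazard(h: Callable[[float], float], t: float,
                            steps: int = 10_000) -> float:
    """R(t) = exp(-integral of h over (0, t)), using the trapezoidal rule."""
    if t <= 0.0:
        return 1.0
    dt = t / steps
    area = 0.5 * (h(0.0) + h(t))          # end points get half weight
    for k in range(1, steps):
        area += h(k * dt)                 # interior points get full weight
    return math.exp(-area * dt)


def density_from_hazard(h: Callable[[float], float], t: float) -> float:
    """f(t) = h(t) * R(t)."""
    return h(t) * reliability_from_hazard(h, t)


# Hypothetical constant hazard: recovers the exponential, R(t) = exp(-lam * t).
lam = 0.002
r_const = reliability_from_hazard(lambda tau: lam, 500.0)

# Hypothetical linearly increasing hazard h(t) = k * t: R(t) = exp(-k * t^2 / 2).
k = 1e-5
r_linear = reliability_from_hazard(lambda tau: k * tau, 300.0)

print(r_const, math.exp(-lam * 500.0))
print(r_linear, math.exp(-k * 300.0 ** 2 / 2.0))
```

Both test cases have closed-form answers, so the printed pairs give a quick check of the numerical integration.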
$$h(t) = \frac{f(t)}{R(t)} = -\frac{1}{R(t)}\frac{d}{dt}R(t) = -\frac{d}{dt}\ln R(t)$$
$$R(t) = \exp\left[-\int_0^t h(\tau)\,d\tau\right] \;\Rightarrow\; f(t) = h(t)\exp\left[-\int_0^t h(\tau)\,d\tau\right]$$
2.5 Mean Time To Failure (MTTF and MTBF)
The mean time to failure is the expected value of the time to failure. By definition, the expected
value of a random variable x with density function f(x) is:
$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad -\infty < x < \infty$$
For the mean time to failure (the expected time to failure, or the average life of the system) we have:
$$E(T) = \int_0^{\infty} R(s)\,ds = \text{MTTF} = \text{expected life of the system}, \quad 0 \le T < \infty$$
This agrees with the usual definition of the mean, as integration by parts shows:
$$E(T) = \int_0^{\infty} R(s)\,ds = \int_0^{\infty} s f(s)\,ds$$
Using $\int u\,dv = uv - \int v\,du$, let u = R(s) and dv = ds, so that v = s and du = dR(s) = −f(s)ds. Then:
$$\int_0^{\infty} R(s)\,ds = \Big[sR(s)\Big]_0^{\infty} + \int_0^{\infty} s f(s)\,ds$$
At s = 0, R(0) = 1 and sR(s) = 0. As s → ∞, R(s) → 0 and, for any distribution with a finite mean,
sR(s) → 0. Hence:
$$E(T) = \text{MTTF} = \int_0^{\infty} R(s)\,ds = \int_0^{\infty} s f(s)\,ds$$
Example with exponential density function
$$f(t) = \frac{1}{\theta}\exp(-t/\theta)$$
Note: if $f(t) = \lambda e^{-\lambda t}$, then and only then $h(t) = \frac{f(t)}{R(t)} = \lambda = \frac{1}{\text{MTTF}}$.
$$E(T) = \text{MTTF} = \int_0^{\infty} t\,\frac{1}{\theta}\exp(-t/\theta)\,dt = \frac{1}{\theta}\int_0^{\infty} t\exp(-t/\theta)\,dt$$
Using integration by parts, $\int u\,dv = uv - \int v\,du$, let u = s and dv = exp(−s/θ)ds, so that
v = −θ exp(−s/θ):
$$E(T) = \frac{1}{\theta}\left\{\Big[-\theta s\exp(-s/\theta)\Big]_0^{\infty} + \theta\int_0^{\infty}\exp(-s/\theta)\,ds\right\} = \Big[-s\exp(-s/\theta)\Big]_0^{\infty} - \theta\Big[\exp(-s/\theta)\Big]_0^{\infty}$$
As s → ∞, s exp(−s/θ) → 0; when s = 0, s exp(−s/θ) = 0.
As s → ∞, exp(−s/θ) → 0; when s = 0, exp(−s/θ) = 1.
$$\therefore\; E(T) = 0 - \theta\,[0 - 1] = \theta = \text{MTTF} = \frac{1}{\lambda}$$
$$R(t = \theta) = R(\text{MTTF}) = \exp\left(-\frac{\theta}{\theta}\right) = \exp(-1) = 0.3679, \quad F(\text{MTTF}) = 1 - R(\text{MTTF}) = 0.6321$$
For the Normal Density:
$$F(t) = P(T \le t) = \Phi(Z_0), \quad Z_0 = \frac{t - \mu}{\sigma} = \frac{t - \text{MTTF}}{\sigma}$$
$$\text{When } t = \text{MTTF}: \; Z_0 = \frac{\text{MTTF} - \text{MTTF}}{\sigma} = 0 \;\Rightarrow\; \Phi(0) = 0.5 \;\Rightarrow\; F(\text{MTTF}) = 0.5$$
$$R(\text{MTTF}) = 1 - F(\text{MTTF}) = 0.5$$
Thus, even if the MTTF is the same and known, the reliability at a given time can differ depending on
the distribution or density function associated with failure. Please note that for a non-repairable
system we have the MTTF, namely the mean time to failure. For repairable systems it is the mean time
to first failure (MTFF).
Note on MTTF: The MTTF is the average life of the system. It can be derived or estimated from the
reliability function, and its expression depends on that function. This means that the MTTF
expressions for the normal distribution and the Weibull distribution are different and specific to
each distribution.
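The comparison of R(MTTF) for the two distributions can be checked with a short sketch. The MTTF and standard deviation values are hypothetical; only the resulting reliabilities at t = MTTF come from the text.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


mttf = 1000.0    # hypothetical MTTF in hours, shared by both models
sigma = 200.0    # hypothetical standard deviation for the normal model

# Exponential life: R(t) = exp(-t / theta) with theta = MTTF.
r_exp = math.exp(-mttf / mttf)

# Normal life: R(t) = 1 - Phi((t - mu) / sigma) with mu = MTTF.
r_norm = 1.0 - normal_cdf((mttf - mttf) / sigma)

print(f"Exponential: R(MTTF) = {r_exp:.4f}, F(MTTF) = {1.0 - r_exp:.4f}")
print(f"Normal:      R(MTTF) = {r_norm:.4f}, F(MTTF) = {1.0 - r_norm:.4f}")
```

Neither result depends on the particular MTTF or sigma chosen, which is why the text can state them as fixed numbers.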
2.6 Hazard Functions for Common Distributions

It is important to note that not every proposed hazard function is legitimate. A valid hazard
function must be nonnegative, and its integral over (0, ∞) must diverge so that R(t) → 0 as t → ∞.
Only such hazard functions produce valid reliability and probability density functions.
2.6.1 Exponential
Given: $f(t) = \frac{1}{\theta}e^{-t/\theta}$, $R(t) = e^{-t/\theta}$, or $f(t) = \lambda e^{-\lambda t}$, $R(t) = e^{-\lambda t}$, where λ = 1/θ:
$$h(t) = \frac{f(t)}{R(t)} = \frac{1}{\theta} = \lambda$$
Note: This is true only when f(t) is the exponential. Properties of the exponential distribution
include the memoryless property, occurrences that follow a Poisson process, and a constant failure
rate.
2.6.2 Normal Distribution (Standard Normal Distribution)

With $z = (t - \mu)/\sigma$:
$$f(t) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right) = \frac{\phi(z)}{\sigma}$$
$$R(t) = 1 - \int_{-\infty}^{z}\frac{1}{\sqrt{2\pi}}\exp\left(-\frac{\tau^2}{2}\right)d\tau = 1 - F(t) = 1 - \Phi(z)$$
$$h(t) = \frac{f(t)}{R(t)} = \frac{\phi(z)/\sigma}{R(t)} = \frac{\phi(z)}{\sigma\,(1 - \Phi(z))}$$
where φ(z) = pdf of the standard normal variable, and Φ(z) = cdf of the standard normal variable.
2.6.3 Log Normal Distribution

$$f(t) = \frac{1}{\sigma t\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln t - \mu}{\sigma}\right)^2\right]$$
$$F(t) = \int_0^t \frac{1}{\sigma\tau\sqrt{2\pi}}\exp\left[-\frac{1}{2}\left(\frac{\ln\tau - \mu}{\sigma}\right)^2\right]d\tau = \Phi\left(\frac{\ln t - \mu}{\sigma}\right)$$
$$R(t) = 1 - F(t) = 1 - P(T \le t) = 1 - P\left(z \le \frac{\ln t - \mu}{\sigma}\right) = 1 - \Phi(z)$$
$$h(t) = \frac{f(t)}{R(t)} = \frac{\phi\left(\frac{\ln t - \mu}{\sigma}\right)/(\sigma t)}{1 - F(t)} = \frac{\phi(z)}{\sigma t\,(1 - \Phi(z))}$$
2.6.4 Weibull Distribution

$$f(t) = \frac{\beta\,(t - \delta)^{\beta - 1}}{(\theta - \delta)^{\beta}}\exp\left[-\left(\frac{t - \delta}{\theta - \delta}\right)^{\beta}\right], \quad t \ge \delta \ge 0$$
$$R(t) = 1 - F(t) = 1 - \int_0^t f(\tau)\,d\tau = \exp\left[-\left(\frac{t - \delta}{\theta - \delta}\right)^{\beta}\right], \quad h(t) = \frac{\beta\,(t - \delta)^{\beta - 1}}{(\theta - \delta)^{\beta}}$$
where β is the shape parameter, θ is the characteristic life, and δ is the minimum life.
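The four hazard functions of Section 2.6 can be written out as a short sketch using only the standard library. The function names are hypothetical, and the Weibull parameter names (beta for shape, theta for characteristic life, delta for minimum life) are an assumed reading of the three-parameter form.

```python
import math


def phi(z: float) -> float:
    """Standard normal pdf."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Phi(z: float) -> float:
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def hazard_exponential(t: float, theta: float) -> float:
    """h(t) = 1 / theta, independent of t."""
    return 1.0 / theta


def hazard_normal(t: float, mu: float, sigma: float) -> float:
    """h(t) = phi(z) / (sigma * (1 - Phi(z))), z = (t - mu) / sigma."""
    z = (t - mu) / sigma
    return phi(z) / (sigma * (1.0 - Phi(z)))


def hazard_lognormal(t: float, mu: float, sigma: float) -> float:
    """h(t) = phi(z) / (sigma * t * (1 - Phi(z))), z = (ln t - mu) / sigma."""
    z = (math.log(t) - mu) / sigma
    return phi(z) / (sigma * t * (1.0 - Phi(z)))


def hazard_weibull(t: float, beta: float, theta: float, delta: float = 0.0) -> float:
    """h(t) = beta * (t - delta)^(beta - 1) / (theta - delta)^beta, for t > delta."""
    return beta * (t - delta) ** (beta - 1.0) / (theta - delta) ** beta


# Hypothetical parameter values, chosen only to exercise each function.
print(hazard_exponential(100.0, theta=50.0))
print(hazard_normal(10.0, mu=10.0, sigma=2.0))
print(hazard_lognormal(1.0, mu=0.0, sigma=1.0))
print(hazard_weibull(100.0, beta=2.0, theta=50.0))
```

With beta = 1 and delta = 0 the Weibull hazard reduces to the constant 1/theta of the exponential, which is a convenient sanity check. For t far above the mean, 1 − Phi(z) underflows, so a production version would need a more careful tail evaluation.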
2.7 Estimating R(t), h(t), f(t) Using Empirical Data
2.7.1 Small sample size (n < 10)
Median estimator using order statistics
Consider the following ordered failure times:
${}_{O}T_1, {}_{O}T_2, {}_{O}T_3, {}_{O}T_4, \ldots, {}_{O}T_n$, where ${}_{O}T_1 < {}_{O}T_2 < {}_{O}T_3 < \ldots < {}_{O}T_n$

Let ${}_{n}P_j = \hat{F}({}_{O}T_j)$; that is, ${}_{n}P_j$ is the fraction of the population failing prior to the jth
observation in a sample of size n. The best estimate for ${}_{n}P_j$ is the median value, so the cumulative
distribution at the jth ordered failure time is estimated as:
$${}_{n}P_j = \hat{F}({}_{O}T_j) = \frac{j - 0.3}{n + 0.4}$$
$$\hat{R}({}_{O}T_j) = 1 - \hat{F}({}_{O}T_j) = 1 - \frac{j - 0.3}{n + 0.4} = \frac{n + 0.4 - j + 0.3}{n + 0.4} = \frac{n - j + 0.7}{n + 0.4}$$
$$\hat{h}({}_{O}T_j) = \frac{\hat{R}({}_{O}T_j) - \hat{R}({}_{O}T_{j+1})}{({}_{O}T_{j+1} - {}_{O}T_j)\,\hat{R}({}_{O}T_j)} = \frac{1}{({}_{O}T_{j+1} - {}_{O}T_j)(n - j + 0.7)}$$
$$\hat{f}({}_{O}T_j) = \frac{\hat{R}({}_{O}T_j) - \hat{R}({}_{O}T_{j+1})}{{}_{O}T_{j+1} - {}_{O}T_j} = \frac{1}{(n + 0.4)({}_{O}T_{j+1} - {}_{O}T_j)}$$
2.7.2 Large Sample size (n > 10)

With N units placed on test and $\bar{N}(t)$ units still surviving at time t:
$$R(t) = \frac{\bar{N}(t)}{N}, \quad f(t) = \frac{\bar{N}(t) - \bar{N}(t + x)}{N\,x}$$
$$h(t) = \frac{f(t)}{R(t)} = \frac{\bar{N}(t) - \bar{N}(t + x)}{\bar{N}(t)\,x}, \quad \text{where } x = \Delta t$$
Estimation Using Empirical Data

$$f_e(t) = \frac{n_f(t)}{n_0\,\Delta t}$$
$$h_e(t) = \frac{n_f(t)}{n_s\,\Delta t}$$
$$R_e(t) = \frac{f_e(t)}{h_e(t)}, \quad \text{and } F_e(t) = 1 - R_e(t)$$
These expressions apply to empirical data, where:
• nf(t) = the number of units that failed during the interval
• n0 = the original number of items put on test
• ns(t) = the number of units surviving at a given instant
Example: 300 electronic circuit boards were cycled for 6000 hours, as shown in Table 1. The number
of units that failed (nf) in each interval and the number that survived (ns) to the end of it are
listed first. The numerical values of the reliability measures are then computed using the formulas
shown.
Interval (hours)     nf     ns
0 < t < 1000        100    200
1000 < t < 2000      80    120
2000 < t < 3000      60     60
3000 < t < 4000      40     20
4000 < t < 5000      10     10
5000 < t < 6000       8      2
t > 6000              2      0

t (hours)    nf     ns    Cum. failed    R(t)       F(t)      f(t)        h(t)
1000        100    200    100            0.6667     0.3333    0.000333    0.00033
2000         80    120    180            0.4000     0.6000    0.000267    0.00040
3000         60     60    240            0.2000     0.8000    0.00020     0.00050
4000         40     20    280            0.0667     0.9333    0.000133    0.000667
5000         10     10    290            0.03333    0.9667    0.000033    0.0005
6000          8      2    298            0.00667    0.9933    0.000027    0.0008
>6000         2      0    300            0.000      1.000     0.000007    0.001
Table 1: Failure data and the resulting Distributions
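The computed columns of Table 1 can be reproduced from the failure counts alone, as in the following sketch. It assumes a uniform interval width of 1000 hours (including the open-ended last row), divides f(t) by the original sample size, and divides h(t) by the number of survivors at the start of each interval, which is the convention that matches the tabulated hazard values. The variable names are hypothetical.

```python
n0 = 300                                  # units placed on test
dt = 1000.0                               # interval width in hours (assumed uniform)
failures = [100, 80, 60, 40, 10, 8, 2]    # nf for each interval, from Table 1

rows = []
survivors = n0
for nf in failures:
    at_start = survivors                  # survivors entering the interval
    survivors -= nf                       # survivors leaving the interval (ns)
    reliability = survivors / n0          # R at the end of the interval
    rows.append({
        "nf": nf,
        "ns": survivors,
        "cum_failed": n0 - survivors,
        "R": reliability,
        "F": 1.0 - reliability,
        "f": nf / (n0 * dt),              # density: failures per original unit per hour
        "h": nf / (at_start * dt),        # hazard: failures per surviving unit per hour
    })

for i, row in enumerate(rows, start=1):
    print(f"{i}: ns={row['ns']:3d}  R={row['R']:.4f}  F={row['F']:.4f}  "
          f"f={row['f']:.6f}  h={row['h']:.6f}")
```

The hazard column rises overall while the density column falls, because the same number of failures counts for more as the surviving population shrinks.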

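In terms of the table columns:
$$\hat{R}(t) = \frac{\bar{N}(t)}{N} = \frac{n_S}{N}, \quad f(t) = \frac{\bar{N}(t) - \bar{N}(t + \Delta t)}{N\,\Delta t}$$
$$h(t) = \frac{\bar{N}(t) - \bar{N}(t + \Delta t)}{\bar{N}(t)\,\Delta t}$$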
Example: Table 2 represents the failure data for a small sample. We will show how to compute the
various statistics such as the failure density function, the reliability function, the failure distribution
function and the hazard function
Figure 2: Reliability Function R(t)
Figure 3: Failure Distribution F(t)
Figure 4: Density Function f(t)
Figure 5: Hazard Function h(t)
Table 2: Failure Data for Eight Springs

Failure Number    Kilocycles to Failure
1                 190
2                 245
3                 265
4                 300
5                 320
6                 325
7                 370
8                 400
For the data in Table 2, n < 10, so we use the following formulas to compute f(t), R(t), and h(t).

$$\hat{F}({}_{O}T_j) = \frac{j - 0.3}{n + 0.4}$$
$$\hat{R}({}_{O}T_j) = 1 - \hat{F}({}_{O}T_j) = 1 - \frac{j - 0.3}{n + 0.4} = \frac{n - j + 0.7}{n + 0.4}$$
$$\hat{h}({}_{O}T_j) = \frac{\hat{R}({}_{O}T_j) - \hat{R}({}_{O}T_{j+1})}{({}_{O}T_{j+1} - {}_{O}T_j)\,\hat{R}({}_{O}T_j)} = \frac{\dfrac{n - j + 0.7}{n + 0.4} - \dfrac{n - (j + 1) + 0.7}{n + 0.4}}{({}_{O}T_{j+1} - {}_{O}T_j)\,\dfrac{n - j + 0.7}{n + 0.4}} = \frac{(n - j + 0.7 - n + j + 1 - 0.7)(n + 0.4)}{(n - j + 0.7)({}_{O}T_{j+1} - {}_{O}T_j)(n + 0.4)}$$
$$\hat{h}({}_{O}T_j) = \frac{1}{({}_{O}T_{j+1} - {}_{O}T_j)(n - j + 0.7)}$$
$$\hat{f}({}_{O}T_j) = \frac{\hat{R}({}_{O}T_j) - \hat{R}({}_{O}T_{j+1})}{{}_{O}T_{j+1} - {}_{O}T_j} = \frac{1}{(n + 0.4)({}_{O}T_{j+1} - {}_{O}T_j)}$$
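As a check on these small-sample formulas, the sketch below applies them to the eight spring failure times of Table 2. The variable names are hypothetical. The hazard is computed interval by interval, without the merging of two intervals that the tabulated results use for one entry.

```python
times = [190, 245, 265, 300, 320, 325, 370, 400]   # kilocycles, from Table 2
n = len(times)


def median_rank(j: int, n: int) -> float:
    """Median-rank estimate of F at the j-th ordered failure (j = 1..n)."""
    return (j - 0.3) / (n + 0.4)


results = []
for j in range(1, n + 1):
    F = median_rank(j, n)
    R = 1.0 - F                                   # equals (n - j + 0.7) / (n + 0.4)
    if j < n:
        gap = times[j] - times[j - 1]             # T_(j+1) - T_(j)
        f = 1.0 / ((n + 0.4) * gap)               # density estimate
        h = 1.0 / (gap * (n - j + 0.7))           # hazard estimate
    else:
        f = h = None                              # no following interval
    results.append((j, times[j - 1], F, R, f, h))

for j, t, F, R, f, h in results:
    f_txt = f"{f:.4f}" if f is not None else "-"
    h_txt = f"{h:.4f}" if h is not None else "-"
    print(f"{j}  t={t}  F={F:.3f}  R={R:.3f}  f={f_txt}  h={h_txt}")
```

Because each density and hazard estimate depends on the gap to the next failure, a very short gap (such as the 5 kilocycles between failures 5 and 6) produces a sharp spike, which is one motivation for merging neighboring intervals.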
Table 3: Computation of Reliability Measures for the Spring Test Data
Failure Number    t      t(i+1) − t(i)    F(t)     R(t)     f(t)      h(t)
1                 190    55               0.083    0.917    0.0022    0.0024
2                 245    20               0.202    0.798    0.0060    0.0075
3                 265    35               0.321    0.679    0.0034    0.0050
4                 300    20               0.440    0.560    0.0059    0.0171*
5                 320    5                0.560    0.440    0.0248    -
6                 325    45               0.679    0.321    0.0025    0.0082
7                 370    30               0.798    0.202    0.0040    0.0198
8                 400    -                0.917    0.083    -         -
*This value of the hazard rate was obtained by combining intervals four and five and treating them
as a single interval of 20 + 5 = 25 kilocycles.
Static Reliability
In performing the reliability analysis of a complex system, it is almost impossible to treat the
system in its entirety. A logical approach is to decompose the system into functional entities
composed of units, subsystems, or components. Each entity is assumed to be in one of two states:
good or bad (success or failure). System block diagrams (SBD) are generated where necessary to show
the desired system operation, and models are then formulated to fit the logical structure.
After the system block diagram has been completed, the reliability block diagram (RBD) is
developed. The RBD is a logical diagram, or graph, whose edges represent the system components. It
shows how the components are related or connected in terms of their reliability and how the system
will successfully operate. It provides a success-oriented view of the system and facilitates the
computation of system reliability from component reliabilities. Note that the RBD may differ from
the SBD: the SBD shows how the components are physically or functionally connected, while the RBD
shows how the system will successfully operate (or not).
The unit or subsystem reliabilities are computed and then used to derive the overall system
reliability. Most systems can be decomposed into series, parallel, or hybrid structures. When the
structure is more complex, more general techniques are used. The series and parallel models assume
that each unit or entity is independent of the others. In a series structure, the functional
operation of the system depends on the proper operation of all system components, so any failure
along a series path causes the entire series path to fail. By contrast, parallel paths are
redundant: all of the parallel paths must fail for the parallel network to fail.
3.1 Series System
Fig. 6. Series Configuration System (blocks 1, 2, ..., n connected in a single path)

The block diagram shows that a single path from cause to effect is created. Failure of any component
is represented by removal of that component, which interrupts the path and causes system failure.
Define:
Ei = event that subsystem i will operate successfully
Ri = survival probability of subsystem i
Rs = system reliability
Then $R_s(\text{series}) = P[E_1 \cap E_2 \cap E_3 \cap \ldots \cap E_n]$.
But for any two independent events A and B, $P(A \cap B) = P(A)\,P(B)$, so:
$$R_s = P(E_1)P(E_2)P(E_3)\cdots P(E_n) = \prod_{i=1}^{n} R_i$$
Let R1 = 0.90, R2 = 0.85, R3 = 0.99; then Rs = (0.90)(0.85)(0.99) = 0.7574.

Note: The reliability of a series system is no better than the reliability of the worst component.
For components with constant failure rates λi:
$$R_s = \begin{cases} e^{-(\lambda_1 + \lambda_2 + \ldots + \lambda_n)t}, & \text{if the } \lambda_i\text{'s are different} \\ e^{-n\lambda t}, & \text{if the } \lambda_i\text{'s are the same} \end{cases}$$
For a series structure, the system reliability decreases rapidly as the number of series components
increases, so the reliability of a series system is at most as good as that of its poorest or least
reliable component. Let qi = probability that subsystem or component i will fail. Then, for small qi:
$$R_s = (1 - q_1)(1 - q_2)(1 - q_3)\cdots(1 - q_n) = \prod_{i=1}^{n}(1 - q_i) \approx \begin{cases} 1 - nq, & \text{if the } q\text{'s are identical} \\ 1 - \sum_{i=1}^{n} q_i, & \text{if the } q\text{'s are different} \end{cases}$$
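For a series configuration, Rs(t) = R1(t) × R2(t) × … × Rn(t). If the components have exponential
lives, the component reliabilities multiply:
$$R_s(t) = e^{-\lambda_1 t}\, e^{-\lambda_2 t} \cdots e^{-\lambda_n t} = e^{-t\sum_i \lambda_i}$$
The system failure rate is $\lambda_S = \sum_i \lambda_i$, i = 1, ..., n, with system MTBF $\theta_s = 1/\lambda_S$:
$$\int_0^{\infty} R_s(t)\,dt = \int_0^{\infty} e^{-t\sum_i \lambda_i}\,dt = \theta_s$$
$$\theta_s = \frac{1}{\sum_i \lambda_i} = \frac{1}{\dfrac{1}{\theta_1} + \dfrac{1}{\theta_2} + \dfrac{1}{\theta_3} + \ldots + \dfrac{1}{\theta_n}}$$
If the components are identical, i.e., λ1 = λ2 = λ3 = … = λn = λ, then:
$$\theta_s = \frac{1}{n\lambda} = \frac{\theta}{n}$$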
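The series results above can be sketched in a few lines. The function names are hypothetical, and the component MTBF values in the second example are invented for illustration. The reliabilities 0.90, 0.85, and 0.99 are those of the worked example.

```python
import math
from typing import Sequence


def series_reliability(reliabilities: Sequence[float]) -> float:
    """Rs = product of the component reliabilities (independent components)."""
    return math.prod(reliabilities)


def series_mtbf(component_mtbfs: Sequence[float]) -> float:
    """System MTBF for exponential components: 1 / sum(1 / theta_i)."""
    return 1.0 / sum(1.0 / theta for theta in component_mtbfs)


rs = series_reliability([0.90, 0.85, 0.99])
print(f"Series reliability: {rs:.5f}")

# Hypothetical component MTBFs in hours.
mtbf = series_mtbf([1000.0, 2000.0, 4000.0])
print(f"Series MTBF: {mtbf:.1f} hours")
```

The system MTBF is always lower than the smallest component MTBF, which mirrors the note that a series system is no better than its worst component.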
3.2 Parallel Systems
In many systems, several signal paths perform the same operation. If the system configuration is such
that failure of one or more paths still allows the remaining path or paths to perform properly, the
system can be represented by a parallel model. A parallel system is not considered to have failed
unless all of its components have failed. The reliability block diagram is shown in Fig. 7.
Fig. 7. Parallel Configuration (blocks 1, 2, ..., n connected in parallel)
Define Qs = unreliability of the system
Let $E_i$ = event that component i works, $R_i = P(E_i)$
Let $\bar{E}_i$ = event that component i does not work, $Q_i = P(\bar{E}_i)$
$$Q_s = P(\bar{E}_1)P(\bar{E}_2)\cdots P(\bar{E}_n) = (1 - P(E_1))(1 - P(E_2))\cdots(1 - P(E_n)) = \prod_{i=1}^{n}(1 - P(E_i))$$
$$R_s = 1 - Q_s = 1 - \prod_{i=1}^{n}(1 - P(E_i)) = 1 - \prod_{i=1}^{n}(1 - R_i)$$
If R1 = 0.9, R2 = 0.85, R3 = 0.99, then Qs = (0.1)(0.15)(0.01) = 0.00015 and Rs = 1 − Qs = 0.99985.
Consider a three-unit redundant system (three components in parallel, Fig. 7). With Ri and Qi defined
as above, the probability that the system fails is:
$$Q_s = P(\bar{E}_1)P(\bar{E}_2)P(\bar{E}_3) = (1 - R_1)(1 - R_2)(1 - R_3)$$
$$R_s = 1 - Q_s = 1 - (1 - R_1)(1 - R_2)(1 - R_3)$$
$$R_s = 1 - \prod_{i=1}^{3}(1 - R_i) = 1 - (1 - R_3)(1 - R_1 - R_2 + R_1 R_2) = R_1 + R_2 + R_3 - R_1 R_2 - R_1 R_3 - R_2 R_3 + R_1 R_2 R_3$$
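A short sketch of the parallel calculation, using the reliabilities from the worked example. It also evaluates the expanded three-unit expression to confirm that both forms agree. The function name is hypothetical.

```python
import math
from typing import Sequence


def parallel_reliability(reliabilities: Sequence[float]) -> float:
    """Rs = 1 - product of (1 - Ri) for independent active-parallel components."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


r1, r2, r3 = 0.90, 0.85, 0.99
rp = parallel_reliability([r1, r2, r3])

# Expanded three-unit form, term by term.
expanded = r1 + r2 + r3 - r1 * r2 - r1 * r3 - r2 * r3 + r1 * r2 * r3

print(f"Parallel reliability: {rp:.5f}")
print(f"Expanded form:        {expanded:.5f}")
```

In contrast to the series case, the parallel system is more reliable than its best component.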

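If we assume that the failure rate h(t) is constant, then the failure density function is the
exponential distribution. We can show this using the general relationship between R(t), h(t), and
f(t).
$$\text{Given } h(t) = c: \quad R(t) = \exp\left[-\int_0^t h(\tau)\,d\tau\right] = \exp\left[-c\int_0^t d\tau\right] = e^{-ct}$$
$$f(t) = h(t)R(t) = ce^{-ct} = \lambda e^{-\lambda t} \quad (c = \lambda), \text{ which is the exponential distribution}$$
$$R(t) = e^{-\lambda t}; \text{ taking logs, } -\ln R(t) = \lambda t \;\Rightarrow\; t = \frac{1}{\lambda}\ln\left(\frac{1}{R(t)}\right) = \text{MTTF}\,\ln\left(\frac{1}{R(t)}\right)$$
For three exponential units in active parallel:
$$R_s(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} + e^{-\lambda_3 t} - e^{-\lambda_1 t}e^{-\lambda_2 t} - e^{-\lambda_1 t}e^{-\lambda_3 t} - e^{-\lambda_2 t}e^{-\lambda_3 t} + e^{-\lambda_1 t}e^{-\lambda_2 t}e^{-\lambda_3 t}$$
$$\text{If } \lambda_1 = \lambda_2 = \lambda_3 = \lambda: \quad R(t) = 3e^{-\lambda t} - 3e^{-2\lambda t} + e^{-3\lambda t}$$
$$\text{MTTF} = E(T) = \int_0^{\infty} R_s(t)\,dt = \int_0^{\infty} 3e^{-\lambda t}\,dt - \int_0^{\infty} 3e^{-2\lambda t}\,dt + \int_0^{\infty} e^{-3\lambda t}\,dt$$
$$= \left[-\frac{3}{\lambda}e^{-\lambda t}\right]_0^{\infty} + \left[\frac{3}{2\lambda}e^{-2\lambda t}\right]_0^{\infty} + \left[-\frac{1}{3\lambda}e^{-3\lambda t}\right]_0^{\infty}$$
$$= \frac{3}{\lambda} - \frac{3}{2\lambda} + \frac{1}{3\lambda} = \frac{1}{\lambda}\left[3 - \frac{3}{2} + \frac{1}{3}\right] = \frac{1}{\lambda}\left[1 + \frac{1}{2} + \frac{1}{3}\right]$$
$$\therefore\; \text{MTTF} = \frac{1}{\lambda}\sum_{i=1}^{n}\frac{1}{i}, \text{ if the components are identical}$$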
These results hold for active parallel systems, in which all the components are active from time
zero. In a different type of redundant system, namely the standby system, the second unit is turned
on only after the first unit fails. In that scenario, the failure rates are no longer
independent but depend on the failure of the main or primary unit.
There is a relationship between the design life of a system or component and the end-of-life
reliability. In practice, the engineer will set the design life so as to achieve a desired end-of-life reliability
goal. For example, if the design life is 5 years, we want the reliability at the end of those 5 years
to equal some target value. Note also that the MTTF is a single but important point in
the time domain and is therefore measured in units of time.
Example: Through predictive analytics, the MTTF of systems with constant failure rate has
been determined to be MTTF0 = 15 years. The engineer wants to set the design life in
consonance with the predetermined MTTF so that the end-of-life reliability is 85%.
a). Determine the design life T with respect to the MTTF.
b). To enhance system performance, two identical units with the same failure rate are used in
an active parallel configuration to increase the design life. How does this new
configuration affect the design life, given that the end-of-life reliability remains the same?


Solution part (a).

Let the failure rate be $\lambda = 1/MTTF$.
$$R(T) = e^{-\lambda T} \;\Rightarrow\; \text{taking logs: } -\lambda T = \ln R(T) \;\Rightarrow\; T = \frac{1}{\lambda}\ln\left(\frac{1}{R(T)}\right)$$
$$T = \ln\left(\frac{1}{0.85}\right) MTTF_0 = 0.1625\,MTTF_0 = 2.44 \text{ yrs}$$
Solution part (b).
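$$R(T) = 2e^{-\lambda T} - e^{-2\lambda T}; \text{ let } x = e^{-\lambda T} \;\Rightarrow\; e^{-2\lambda T} - 2e^{-\lambda T} + R = 0 \;\Rightarrow\; x^2 - 2x + R = 0$$
$$x = \frac{2 \pm \sqrt{4 - 4R}}{2} = 1 \pm \sqrt{1 - R}$$
The root $x = 1 + \sqrt{1 - R}$ is not permissible because it would give $x = e^{-\lambda T} > 1$. Hence $x = 1 - \sqrt{1 - R}$ and
$$T = \frac{1}{\lambda}\ln\left(\frac{1}{x}\right) = \ln\left(\frac{1}{1 - \sqrt{1 - R}}\right) MTTF_0 = 0.4899\,MTTF_0 = 0.4899(15) = 7.35 \text{ yrs}$$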

The redundant arrangement has more than three times the design life of the single unit.
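Both parts of this example can be reproduced in a few lines. This is a sketch under the example's own assumptions (constant failure rate, two identical active-parallel units); the function names are illustrative only.

```python
from math import log, sqrt

def design_life_single(mttf, r_end):
    # R(T) = exp(-T / MTTF)  =>  T = MTTF * ln(1 / R)
    return mttf * log(1.0 / r_end)

def design_life_two_parallel(mttf, r_end):
    # R(T) = 2x - x^2 with x = exp(-T / MTTF); the admissible root is x = 1 - sqrt(1 - R)
    x = 1.0 - sqrt(1.0 - r_end)
    return mttf * log(1.0 / x)

t_single = design_life_single(15.0, 0.85)
t_pair = design_life_two_parallel(15.0, 0.85)
print(f"single: {t_single:.2f} yr, parallel pair: {t_pair:.2f} yr, ratio: {t_pair / t_single:.2f}")
```

The ratio of the two design lives comes out slightly above three, in line with the statement above.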
Reliability Improvement
One approach to reliability improvement is to alter the system structure so as to obtain higher
reliability while maintaining the basic system function. This is generally accomplished by creating
additional parallel paths in the system structure and is usually termed REDUNDANCY. The
straightforward approach is to take the existing system and connect a duplicate one in parallel. This results
in two separate systems. Such an approach, which involves paralleling the entire system or unit, is
called system or unit redundancy. A different approach is to parallel each component, resulting in
component redundancy. The hybrid model resulting from a mix of both system and component
redundancies is called compromise redundancy.
Fig 8.System Redundancy Fig 9.Component Redundancy


3.1 Redundancy-High level

High level redundancy is based on the system or subsystem (See Fig 8). Each subsystem
consists of individual units in series. The resulting serial configuration is placed in parallel with other
subsystems to form a bank. Several such banks are placed in parallel to form a High Level
Redundant system. Assuming there are ‘m’ identical components per serial configuration subsystem
which form a bank and ‘n’ banks in parallel, then the system reliability assuming identical
components is given by:
$$R_{series} = R^m$$
$$Q_{series} = (1 - R^m)^n$$
$$R_{subsys} = 1 - (1 - R^m)^n$$
$$Q_s = \left[1 - \left(1 - (1 - R^m)^n\right)\right]^n$$
$$\Rightarrow R_s = 1 - Q_s = 1 - \left[1 - \left(1 - (1 - R^m)^n\right)\right]^n$$
Low level redundancy gives higher reliability values than high level redundancy provided that
the following conditions hold: the reliability of a component does not depend on the configuration in
which it is located, the failure processes are truly independent in both configurations, and the
component reliabilities are the same in both configurations.
Example: From Fig 8, Let Ri=0.9 for all components. Use n=2 and m=3. Note that in fig 8, m=2.
$$R_{series} = R^m = (0.9)^3 = 0.729$$
$$Q_{series} = (1 - R^3)^2 = (0.271)^2 = 0.0734$$
$$R_{subsys} = 1 - (1 - R^m)^n = 0.92656$$
$$Q_s = \left[1 - \left(1 - (1 - R^3)^2\right)\right]^2 = (0.0734)^2 = 0.005394$$
$$R_s = 1 - Q_s = 1 - \left[1 - \left(1 - (1 - R^m)^n\right)\right]^n = 1 - 0.005394 = 0.994606$$
3.2 Redundancy-Low level

Low level redundancy is redundancy based on the component (See Fig 9). Thus
components are placed in parallel in banks where each bank consists of individual units in parallel.
Assuming there are ‘m’ components per bank and ‘n’ banks in series in the system then the system
reliability assuming identical components is given by:
$$\text{For each bank: } R_{sub} = 1 - (1 - R)^m$$
$$\text{For n banks: } R_{sys} = \left[1 - (1 - R)^m\right]^n$$
From Fig 9, assuming Ri = 0.9 for all components and m = 4, n = 2:
$$R_{sub} = 1 - (1 - R)^m = 1 - (1 - 0.9)^4 = 1 - (0.1)^4 = 1 - 0.0001 = 0.9999$$
$$R_{sys} = \left[1 - (1 - R)^m\right]^n = (0.9999)(0.9999) = 0.9998$$
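The two worked examples in Sections 3.1 and 3.2 can be reproduced with the short sketch below. It follows the formulas as they are written in those sections (for the high-level case, n series chains per bank and n banks in parallel) and assumes identical, independent components; the function names are hypothetical.

```python
def high_level_reliability(r, m, n):
    """System-level redundancy, following the Section 3.1 formulas."""
    r_series = r ** m                  # m identical units in series
    q_bank = (1.0 - r_series) ** n     # a bank fails if all n series chains fail
    return 1.0 - q_bank ** n           # the system fails if all n banks fail

def low_level_reliability(r, m, n):
    """Component-level redundancy, following the Section 3.2 formulas."""
    r_bank = 1.0 - (1.0 - r) ** m      # m identical units in parallel
    return r_bank ** n                 # n banks in series

print(round(high_level_reliability(0.9, m=3, n=2), 6))
print(round(low_level_reliability(0.9, m=4, n=2), 4))
```

Note that the two examples use different values of m, so the printed numbers illustrate each formula rather than a like-for-like comparison.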


3.3 Active and Standby Redundancy

3.3.1 Active or Parallel System Models
Fig 10: Active Parallel System
For this system, we can compute the unreliability of the system and, based on the
complementary nature of R(t) and F(t), we can compute the reliability. The unreliability of the system
is given by the probability that all components have failed by time t, i.e.,
$$Q_s(t) = P[t_1 < t \cap t_2 < t \cap t_3 < t \cap \cdots \cap t_n < t]$$
If the failure mechanisms of the components are independent, then:
$$Q_s(t) = P(t_1 < t)P(t_2 < t)P(t_3 < t)\cdots P(t_n < t)$$
but $P(t_i < t) = 1 - P(t_i > t) = 1 - R_i(t)$, so
$$Q_s(t) = \prod_{i=1}^{n}\left[1 - R_i(t)\right] \;\Rightarrow\; R(t) = 1 - \prod_{i=1}^{n}\left[1 - R_i(t)\right]$$
$$\text{If } R_i(t) = e^{-\lambda_i t} \text{ then } R(t) = 1 - \prod_{i=1}^{n}\left(1 - e^{-\lambda_i t}\right)$$
$$\text{For two identical units, } R(t) = 1 - \left(1 - e^{-\lambda t}\right)\left(1 - e^{-\lambda t}\right) = 2e^{-\lambda t} - e^{-2\lambda t}$$
Note: In the exponential case, failure rates are summed to combine independent series elements in
reliability analysis. In general, the exponents are summed whenever a product of exponential
terms is required.
3.4 Passive or Standby Configuration with Switching
Fig. 11: Passive Standby with Switching (units 1, 2, ..., n selected by switch S)
Fig. 12: Success Modes for 2-Unit Standby with Switching (Mode 1 and Mode 2, with failure time t1 and mission time t)
www.SunCam.com Copyright © 2017 O. Geoffrey Okogbaa, PE


The system operates as follows. First the primary unit is switched on with the other unit in standby.
Should the primary unit fail, the switching mechanism (a perfect switch) switches over to the
standby unit, which then works until time t. This results in two success modes, as depicted in Fig 12.
Mode 1: The primary unit works until the end of life, t.
Mode 2: The primary unit fails at t1 and the standby unit takes over and works until t.
$$R_s(t) = P\left[(t_1 > t) \cup (t_1 \le t \cap t_2 > t - t_1)\right]$$
Since the success modes are mutually exclusive,
$$R_{s2}(t) = P[t_1 > t] + P[t_1 \le t \cap t_2 > t - t_1]$$
$$R_{s2}(t) = R_1(t) + \int_0^t f_1(t_1)R_2(t - t_1)\,dt_1$$
If $\lambda_1 = \lambda_2 = \lambda$:
$$R_{s2}(t) = e^{-\lambda t} + \int_0^t \lambda e^{-\lambda t_1}e^{-\lambda(t - t_1)}\,dt_1 = e^{-\lambda t} + \lambda t e^{-\lambda t} = e^{-\lambda t}(1 + \lambda t)$$
In general, for an n-unit standby system with identical components, one primary and (n-1)
standby units, the system reliability is given by
$$R_{sn}(t) = e^{-\lambda t}\sum_{i=0}^{n-1}\frac{(\lambda t)^i}{i!}$$
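The general n-unit expression is a truncated Poisson sum and is straightforward to evaluate. The sketch below assumes identical units, a constant failure rate, and perfect switching; the function name and the sample values (0.02/hr, 50 hr) are illustrative choices.

```python
from math import exp, factorial

def standby_reliability(lam, t, n):
    """Reliability of an n-unit standby system with perfect switching."""
    x = lam * t
    # e^(-lam*t) * sum_{i=0}^{n-1} (lam*t)^i / i!
    return exp(-x) * sum(x ** i / factorial(i) for i in range(n))

for units in (1, 2, 3):
    print(units, round(standby_reliability(0.02, 50.0, units), 4))
```

With n = 1 the expression reduces to the single-unit exponential reliability, and with n = 2 it reduces to the two-unit result derived above.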
3.4 Imperfect Switching
There are several ways in which a switch can fail. The failure modes depend on the switching
mechanism and the system.
Case I: When the switch fails to operate when called upon.
In case I, the probability that the switch performs when called upon to do so is $p_s$.
For the two-unit standby system, the system reliability is given as:
$$R_s(t) = R_1(t) + p_s\int_0^t f_1(t_1)R_2(t - t_1)\,dt_1 = e^{-\lambda t}\left[1 + p_s\lambda t\right] \text{ for a constant failure rate system.}$$
Example: Let $p_s = 0.99$ and $\lambda = 0.02$/hr for a two-unit standby system with constant failure rate. Find the
reliability for a mission time of 50 hours.
$$\text{Solution: } R_s(t) = e^{-\lambda t}\left[1 + p_s\lambda t\right] = e^{-0.02(50)}\left[1 + 0.99(50)(0.02)\right] = 0.3679(1.99) = 0.7321$$
Case II: When the switch is a complex piece of equipment with a constant failure rate equal to $\lambda_{sw}$.
In case II, $R_{sw}(t) = e^{-\lambda_{sw} t}$ is the reliability of the switching mechanism.
For the two-unit standby system:
$$R_s(t) = P\left[(t_1 > t) \cup (t_1 \le t \cap t_{sw} > t_1 \cap t_2 > t - t_1)\right]$$
$$R_s(t) = R_1(t) + \int_0^t f_1(t_1)R_{sw}(t_1)R_2(t - t_1)\,dt_1 = R_1(t) + \int_0^t f_1(t_1)e^{-\lambda_{sw} t_1}R_2(t - t_1)\,dt_1$$
$$\Rightarrow R_s(t) = e^{-\lambda t}\left[1 + \frac{\lambda}{\lambda_{sw}}\left(1 - e^{-\lambda_{sw} t}\right)\right], \quad t \ge 0$$
Example: A two-unit standby system has a switch with constant failure rate $\lambda_{sw} = 0.001$/hr.
The two units have an identical constant failure rate of $\lambda = 0.04$/hr. Find R(60).
$$R_s(60) = e^{-0.04(60)}\left[1 + \frac{0.04}{0.001}\left(1 - e^{-0.001(60)}\right)\right] = 0.302$$
Example: Consider a two-unit standby redundant system that has a constant switch failure rate of $\lambda_s$.
If the switch fails, the system fails. In this system both units have identical time-to-failure pdfs given
by $f(t) = \lambda e^{-\lambda t}$.
(a). Find the system reliability function.
(b). If $\lambda_s = 0.01$/hr and the subsystems both have a constant failure rate $\lambda = 0.02$/hr, find R(50).
Solution
$$R_s(t) = P\left[(t_1 > t \cap t_{sw} > t) \cup (t_1 \le t \cap t_2 > (t - t_1) \cap t_{sw} > t)\right]$$
$$= R_1(t)R_{sw}(t) + R_{sw}(t)\int_0^t f_1(t_1)R_2(t - t_1)\,dt_1 \;\Rightarrow\; R_s(t) = e^{-\lambda_s t}\left[R(t) + \int_0^t f(t_1)R(t - t_1)\,dt_1\right]$$
$$R_s(t) = e^{-0.01t}\left[e^{-0.02t} + \int_0^t 0.02e^{-0.02t_1}e^{-0.02(t - t_1)}\,dt_1\right] = e^{-0.03t}\left[1 + 0.02t\right]$$
$$R(50) = e^{-0.03(50)}\left[1 + 0.02(50)\right] = 0.45$$
Many other types of switch failure may be encountered in practical situations. For example, a switch
may fail to hold a subsystem online or the switch may inadvertently sense a failure.
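The three closed-form switching models for a two-unit standby system can be evaluated side by side. This is a sketch for the constant failure rate case only; the function names are hypothetical, and the inputs are the values from the three examples in this section (with the second example evaluated at t = 60 hr).

```python
from math import exp

def standby_switch_on_demand(lam, t, p_s):
    # Case I: the switch works with probability p_s when called upon.
    return exp(-lam * t) * (1.0 + p_s * lam * t)

def standby_switch_failure_rate(lam, lam_sw, t):
    # Case II: the switch must survive until the primary unit fails at t1.
    return exp(-lam * t) * (1.0 + (lam / lam_sw) * (1.0 - exp(-lam_sw * t)))

def standby_switch_critical(lam, lam_s, t):
    # Switch failure fails the system: the switch must survive the whole mission.
    return exp(-(lam + lam_s) * t) * (1.0 + lam * t)

print(round(standby_switch_on_demand(0.02, 50.0, 0.99), 4))
print(round(standby_switch_failure_rate(0.04, 0.001, 60.0), 4))
print(round(standby_switch_critical(0.02, 0.01, 50.0), 4))
```

Setting `p_s = 1` in the first function recovers the perfect-switch result, which gives a quick sanity check on the implementation.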
3.5 Shared Load Models
In this type of configuration, the parallel subsystems share the load equally and as a subsystem
fails, the surviving subsystem must sustain the increased load. Thus, as successive subsystems fail the
failure rate of the surviving components increases.
Example: A shared load parallel configuration would be when cables are used to support a
load or bolts are used to support a machine component. In each case the supporting cables or bolts
equally share in the support of the system.
Define the following system parameters:
$f_h(t)$ = pdf for time to failure under half load, $f_f(t)$ = pdf for time to failure under full load
In the ensuing analysis, it is assumed that when a failure occurs, the survivor follows the pdf
$f_f(t)$ and that this pdf does not depend on the interval of elapsed time.
Fig 14: Success Modes for Shared Load Parallel System (Modes 1, 2, and 3; time axis marked t2, t1, t)


The y-axis represents the success modes

Mode 1: both subsystems work until time t.
Mode 2: both work for a while, subsystem 1 fails and subsystem 2 works to completion.
Mode 3: both work for a while, subsystem 2 fails and subsystem 1 works to completion.
Let us consider each of the modes separately:
1. For Mode 1, both components survive, hence
$$P[t_1 > t \cap t_2 > t] = [R_h(t)]^2, \text{ where } R_h(t) = \int_t^\infty f_h(\tau)\,d\tau$$
2. For Mode 2, both units work for a while, then subsystem 1 fails at t1 and subsystem 2 works
to completion at time t.
$$P\left[(t_1 \le t, \text{ under half load}) \cap (t_2 > t_1, \text{ under half load}) \cap (t_2 > t - t_1, \text{ under full load})\right]$$
$$= \int_0^t f_h(t_1)R_h(t_1)R_f(t - t_1)\,dt_1, \text{ where } R_f(t) = \int_t^\infty f_f(\tau)\,d\tau$$
3. For Mode 3, the situation is identical to the second mode. Both work for a while, then subsystem
2 fails at t2 and subsystem 1 works to completion at time t.
If we assume the components are identical then the system reliability is:
$$R_s(t) = [R_h(t)]^2 + 2\int_0^t f_h(t_1)R_h(t_1)R_f(t - t_1)\,dt_1$$
Let $\lambda_h$ = half load failure rate and $\lambda_f$ = full load failure rate. Then
$$R_s(t) = e^{-2\lambda_h t} + 2\int_0^t \lambda_h e^{-\lambda_h \tau}e^{-\lambda_h \tau}e^{-\lambda_f(t - \tau)}\,d\tau$$
$$R_s(t) = e^{-2\lambda_h t} + \frac{2\lambda_h}{2\lambda_h - \lambda_f}\left[e^{-\lambda_f t} - e^{-2\lambda_h t}\right], \quad t \ge 0$$
Example: Assume that the shared load parallel system has constant failure rates, with
$\lambda_f = 0.001$/hr and $\lambda_h = 0.05$/hr. Find the reliability of the system at t = 300 hrs. (In practice the failure rate
at full load would normally be the higher of the two, because a component at full load is at
higher risk of failure than one working at half load; the values are used here as given.)
$$R_s(t) = e^{-2\lambda_h t} + \frac{2\lambda_h}{2\lambda_h - \lambda_f}\left[e^{-\lambda_f t} - e^{-2\lambda_h t}\right]$$
$$R_s(300) = e^{-2(0.05)(300)} + \frac{2(0.05)}{2(0.05) - 0.001}\left[e^{-0.001(300)} - e^{-2(0.05)(300)}\right] = 0.7483$$
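As a cross-check on the shared-load result, the sketch below evaluates the closed form obtained by carrying out the integral (in which both decaying terms from the half-load phase combine into e^(-2·λh·t)) and compares it with a direct midpoint-rule integration of the mode probabilities. It assumes constant rates with 2·λh ≠ λf; the function names and the step count are illustrative choices.

```python
from math import exp

def shared_load_closed_form(lam_h, lam_f, t):
    """Two-unit shared-load reliability, constant rates, 2*lam_h != lam_f."""
    ratio = 2.0 * lam_h / (2.0 * lam_h - lam_f)
    return exp(-2.0 * lam_h * t) + ratio * (exp(-lam_f * t) - exp(-2.0 * lam_h * t))

def shared_load_numeric(lam_h, lam_f, t, steps=20000):
    """Same quantity by midpoint-rule integration of f_h(tau) R_h(tau) R_f(t - tau)."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * h
        total += lam_h * exp(-lam_h * tau) * exp(-lam_h * tau) * exp(-lam_f * (t - tau))
    # Mode 1 (both survive) plus Modes 2 and 3 (either unit fails first).
    return exp(-2.0 * lam_h * t) + 2.0 * total * h

closed = shared_load_closed_form(0.05, 0.001, 300.0)
numeric = shared_load_numeric(0.05, 0.001, 300.0)
print(round(closed, 4), round(numeric, 4))
```

Agreement between the two values gives some confidence that the closed form matches the integral it was derived from.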
Repairable Systems (Availability Analysis)

In many classes of systems where maintenance (preventive, predictive, and corrective) plays a
central role, reliability is no longer the main focus. In the case of repairable systems (as a result of
corrective maintenance), we are interested in:
• the probability of failure
• the number of failures
• the time required to make repairs
For such considerations, two new metrics (or parameters) of system effectiveness become the focus
of attention, namely (i) Availability and (ii) Maintainability. Other related measures include:
• reparability
• operational readiness
• intrinsic availability
• serviceability
Figure 15: System Effectiveness Measures (timeline labels: Operating Time, Free Time, Down Time, Active Repair, Admin Time, A&L Time)
4.1 Definition of Measures of System Effectiveness

4.1.1 Serviceability
This is the ease with which a system can be repaired. It is a characteristic of the system design
and must be planned at the design phase. It is difficult to measure on a numeric scale.
4.1.2 Reparability
This is the probability that a system will be restored to a satisfactory condition in a specified
interval of active repair time. This metric is very valuable to management since it helps to quantify
the workload for the repair crew. A major concern with reparability is secondary failure during
repair or maintenance. Secondary failure is the failure of an item caused by the failure of another item,
whether as a result of repair, maintenance, or sheer inducement, and it may also affect performance.
4.1.3 Operational Readiness (OR)

This is the probability that a system is operating or can operate satisfactorily when the
system is used under stated conditions. This includes free (idle) time.
$$OR = \frac{\text{Operating time} + \text{idle time}}{\text{Operating time} + \text{idle time} + \text{downtime}}$$

4.1.4 Availability (A)

This is the probability that a system is available when needed or the probability that a system
is available for use at a given time. It is simply the proportion of time the system is in an operating
state and it considers only operating time and down time. It excludes free or idle time.
$$A = \frac{\text{Operating time}}{\text{Operating time} + \text{downtime}}$$
4.1.5 Intrinsic Availability (AI)
This is defined as the probability that a system is operating in a satisfactory manner at any
point in time. In this context, time is limited to operating and active repair time. Intrinsic availability
is more restrictive than availability. Because active repair time is only part of the downtime, its
numeric value is never less than that of availability.
$$AI = \frac{\text{Operating time}}{\text{Operating time} + \text{active repair time}}$$
Summary of the Effectiveness Measures Based on Fig 15
The figure is not drawn to scale. The tick marks divide the time
horizon into units of one as shown on the diagram.
$$OR \text{ (Operational Readiness)} = \frac{\text{Operating time} + \text{idle time}}{\text{Operating time} + \text{idle time} + \text{downtime}} = \frac{14}{20}$$
$$A \text{ (Availability)} = \frac{\text{Operating time}}{\text{Operating time} + \text{downtime}} = \frac{9}{15}$$
$$AI \text{ (Intrinsic Availability)} = \frac{\text{Operating time}}{\text{Operating time} + \text{active repair time}} = \frac{9}{13}$$
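The three ratios can be computed from a simple time budget. In the sketch below, the individual durations (operating 9, idle 5, downtime 6, active repair 4 time units) are not stated explicitly in the text; they are inferred from the fractions 14/20, 9/15, and 9/13 and should be read as an assumption. The function name is illustrative.

```python
def effectiveness_measures(operating, idle, downtime, active_repair):
    """Return (operational readiness, availability, intrinsic availability)."""
    readiness = (operating + idle) / (operating + idle + downtime)
    availability = operating / (operating + downtime)
    intrinsic = operating / (operating + active_repair)
    return readiness, availability, intrinsic

# Durations inferred from the Fig 15 summary fractions (assumed, not read from the figure).
oper_ready, avail, intrinsic = effectiveness_measures(
    operating=9, idle=5, downtime=6, active_repair=4
)
print(f"OR = {oper_ready:.3f}, A = {avail:.3f}, AI = {intrinsic:.3f}")
```

Because active repair time is a subset of downtime, the intrinsic availability returned here cannot fall below the availability.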
4.1.6 Maintainability
This is the probability that a system can be repaired in a given interval of downtime.
4.2 System Availability

For a repairable system, a fundamental parameter of interest is availability, defined as:
A(t) = probability that a system is performing satisfactorily at time t, considering only operating time
and downtime. This definition refers to point availability and is often not a true measure of the system
performance. Often it is necessary to determine interval or mission availability, defined as:
$$A^*(T) = \frac{1}{T}\int_0^T A(t)\,dt$$
This is the value of the point availability averaged over some interval of time T. This time interval may
represent the design life of the system or the time to accomplish some mission.

The steady state or asymptotic availability is given by:
$$A^*(\infty) = \lim_{T \to \infty}\frac{1}{T}\int_0^T A(t)\,dt$$
If the system or its components cannot be repaired then the point availability at time t is simply the
probability that it has not failed between time 0 and t, hence
$$A(t) = R(t)$$
Substituting $A(t) = R(t)$:
$$A^*(T) = \frac{1}{T}\int_0^T R(t)\,dt$$
$$\text{As } T \to \infty, \; \int_0^T R(t)\,dt \to MTTF, \text{ hence } A^*(\infty) = \lim_{T \to \infty}\frac{MTTF}{T} = 0$$
This result is quite intuitive given our assumption. Since all systems eventually fail and there is no
repair, then the availability averaged over an infinitely long time is zero.
The asymptotic availability is especially useful when both the failure and repair processes are
exponentially distributed. It is also useful for evaluating the overall availability since for a reasonable
time period T, availability is insensitive to the details of repair and failure process. It may also be used
as an approximation even when the failure and repair distributions are not exponential.
Example: A non-repairable system has a known MTTF and is characterized by a constant failure rate.
The system mission availability must be 0.95. Find the maximum design life that can be tolerated in
terms of the MTTF?
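$$R(T) = e^{-\lambda T}$$
$$A^*(T) = \frac{1}{T}\int_0^T e^{-\lambda t}\,dt = \frac{1}{\lambda T}\left(1 - e^{-\lambda T}\right)$$
Expanding the exponential, assuming $\lambda T \ll 1$:
$$A^*(T) = \frac{1}{\lambda T}\left(1 - 1 + \lambda T - \frac{1}{2}(\lambda T)^2 + \cdots\right) \approx 1 - \frac{1}{2}\lambda T$$
$$0.95 = 1 - \frac{1}{2}\lambda T \;\Rightarrow\; \lambda T = 0.1; \text{ but } MTTF = \frac{1}{\lambda}, \text{ so } T = (0.1)(MTTF)$$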
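The first-order expansion can be compared against the exact expression by solving (1 - e^(-x))/x = 0.95 for x = λT numerically. The sketch below uses plain bisection, which applies because the mission availability decreases monotonically in x; the function names, bracket, and tolerance are illustrative choices.

```python
from math import expm1

def mission_availability(x):
    """A*(T) for a non-repairable, constant failure rate system, with x = lambda * T."""
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -expm1(-x) / x

def max_design_life_fraction(target, lo=1e-9, hi=10.0, tol=1e-10):
    """Largest x = T / MTTF for which A*(T) is still at least `target`."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mission_availability(mid) > target:
            lo = mid   # availability still above target, so T can be longer
        else:
            hi = mid
    return 0.5 * (lo + hi)

x_exact = max_design_life_fraction(0.95)
print(f"exact T/MTTF = {x_exact:.4f}, first-order estimate = 0.1000")
```

The exact root is a few percent larger than the first-order estimate, so T = 0.1 MTTF is slightly conservative.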
4.2.1 Computation of Availability

In order to calculate availability, one must take the repair rate into account, even though it
may be large compared to the failure rate. Rates are the reciprocals of the corresponding mean times:
• A mean repair time of 5 hours corresponds to a repair rate of µ = 1/5 = 0.2 per hour
• An MTTF of 400 hours corresponds to a failure rate of λ = 1/400 = 0.0025 per hour
4.2.2 Repair Function

Assuming that the repair rate µ(t) = µ is constant, the probability density function of the repair
time is exponential:
$$f(t) = \mu e^{-\mu t}, \text{ with the MTTR (mean time to repair)} = \frac{1}{\mu}$$
Although the exponential may not reflect the details of the distribution very accurately, it
provides a reasonable approximation for predicting availability, since availability tends to depend more on
the MTTR than on the details of the distribution. Therefore, even when the pdf of the repair time
distribution is clustered about the MTTR and the resulting distribution does not appear to be
exponential, the constant repair rate (exponential) model seems to adequately predict the
asymptotic availability.
So long as failures are revealed immediately, the time to repair is the primary factor that
determines availability. If the system is not in continuous operation (as in standby), failures may occur
but remain unrevealed. The primary loss of availability will then be failures in standby mode that are
not detected until an attempt is made to use the system. A primary solution to this failure type is periodic
testing. While periodic testing may help detect more failures, it may also lead to loss of availability due
to the downtime for testing. The longer it takes to detect failures, the lower the system availability.
4.2.3 Availability Modeling

Consider a two-state system (working or failed). A(t) and Ã(t) are the probabilities that
the system is operational or failed, respectively, at any time t. The initial conditions are thus:
A(0) = 1, Ã(0) = 0, and A(t) + Ã(t) = 1
Differential equations can be used to develop the equation for availability. Consider the change
in A(t) between t and t+Δt. There are two contributions or possibilities.
• λΔt is the conditional probability of failure during Δt, given that the system was available at
time t.
• µΔt is the conditional probability of repair during Δt given system failure.
Some assumptions
• P[system failure during Δt] = λΔt
• P[repair during Δt | system failure] = µΔt
• A(t+Δt) = A(t)[1-λΔt] + [1-A(t)]µΔt
• Either the system was available at time t and did not fail in the interval Δt, OR it was in the
failed state at time t, with probability (1-A(t)), and was repaired during Δt with probability µΔt
Hence
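$$A(t + \Delta t) = A(t)[1 - \lambda\Delta t] + [1 - A(t)]\mu\Delta t = A(t) - \lambda A(t)\Delta t + \mu\Delta t - \mu A(t)\Delta t$$
$$\text{Hence: } \frac{A(t + \Delta t) - A(t)}{\Delta t} = -(\lambda + \mu)A(t) + \mu$$
$$\text{As } \Delta t \to 0, \text{ we have } \frac{d}{dt}A(t) = -(\lambda + \mu)A(t) + \mu$$
$$\text{The solution to this differential equation is given by: } A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}e^{-(\lambda + \mu)t}$$
Integrating factor (IF) approach:
$$A'(t) + (\lambda + \mu)A(t) = \mu, \quad IF = e^{\int(\lambda + \mu)\,dt} = e^{(\lambda + \mu)t}$$
$$\frac{d}{dt}\left(A(t)e^{(\lambda + \mu)t}\right) = \mu e^{(\lambda + \mu)t} \;\Rightarrow\; A(t)e^{(\lambda + \mu)t} = \mu\int e^{(\lambda + \mu)t}\,dt + C$$
$$A(t)e^{(\lambda + \mu)t} = \frac{\mu}{\lambda + \mu}e^{(\lambda + \mu)t} + C; \text{ at } t = 0, \; A(0) = 1$$
$$\text{Hence } C = 1 - \frac{\mu}{\lambda + \mu} = \frac{\lambda}{\lambda + \mu}$$
$$\Rightarrow A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}e^{-(\lambda + \mu)t}$$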
As t becomes large, the availability A(t) clearly approaches a constant value. The steady state availability
is given by
$$A^* = \lim_{t \to \infty}A(t) = \frac{\mu}{\lambda + \mu} = \frac{\text{Mean repair rate}}{\text{Mean failure rate} + \text{Mean repair rate}}$$
If we divide the numerator and denominator by $\lambda\mu$, this can be rewritten as
$$\frac{\mu}{\lambda + \mu} = \frac{1/\lambda}{(1/\mu) + (1/\lambda)}$$
$$\text{hence } A^*(\infty) = A(\infty) = \frac{\text{Mean time to failure}}{\text{Mean time to repair} + \text{Mean time to failure}}$$
$$A(\infty) = \frac{MTTF}{MTTF + MTTR}$$
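The point and steady-state availability expressions can be evaluated directly. The sketch below assumes exponential failure and repair times and, purely for illustration, reuses the figures from Section 4.2.1 (MTTF of 400 hours, mean repair time of 5 hours); the function names are hypothetical.

```python
from math import exp

def point_availability(lam, mu, t):
    """A(t) for a two-state system that starts in the working state."""
    total = lam + mu
    return mu / total + (lam / total) * exp(-total * t)

def steady_state_availability(mttf, mttr):
    """Limit of A(t) as t grows: MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

lam, mu = 1.0 / 400.0, 1.0 / 5.0   # per hour
for hours in (0.0, 5.0, 20.0, 100.0):
    print(hours, round(point_availability(lam, mu, hours), 5))
print("steady state:", round(steady_state_availability(400.0, 5.0), 5))
```

The transient term decays at the rate λ + µ, so with a repair rate much larger than the failure rate the availability settles to its steady-state value within a few mean repair times.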
Example: The following table (Table 4) shows the times (in days) over a 6-month period at which failure
of a production line occurred (tf) and the times (tr) at which the plant was brought back online following
repair. Question:
(a). Calculate the 6-month availability from the plant data
(b). Estimate the MTTF and the MTTR from the data
(c). Estimate the interval (steady state) availability

Table 4: Failure and Repair Times of a Production Line

S/N    tf (days)    tr (days)
1      12.80        13.00
2      14.20        14.80
3      25.40        25.80
4      31.40        33.40
5      35.30        35.60
6      56.40        57.30
7      62.70        62.80
8      131.20       134.90
9      146.70       150.00
10     177.00       177.10

a) During the 6 months (182.5 days) there were 10 failures.
$$\tilde{A}(T) = \frac{1}{T}\sum_{i=1}^{10}(t_{ri} - t_{fi}) = \frac{1}{182.5}\left[0.2 + 0.6 + 0.4 + 1.9 + 0.3 + 0.9 + 0.1 + 3.7 + 3.3 + 0.1\right]$$
$$\tilde{A}(T) = 0.0630 \;\Rightarrow\; A(T) = 1 - 0.0630 = 0.937$$
b) Let $t_{r0} = 0$. We first estimate the MTTF and then the MTTR from the data.
$$MTTF = \frac{1}{N}\sum_{i=1}^{10}(t_{fi} - t_{r,i-1}) = \frac{1}{10}\left[12.8 + 1.2 + 10.6 + 5.6 + 2.0 + 20.8 + 5.4 + 68.4 + 11.8 + 27.0\right]$$
$$MTTF = \frac{1}{10}(165.6) = 16.56 \text{ days}$$
$$MTTR = \frac{1}{N}\sum_{i=1}^{10}(t_{ri} - t_{fi}) = \frac{1}{10}\left[0.2 + 0.6 + 0.4 + 1.9 + 0.3 + 0.9 + 0.1 + 3.7 + 3.3 + 0.1\right] = 1.15 \text{ days}$$
c)
$$A^*(\infty) = \frac{\mu}{\lambda + \mu} = \frac{\mu/\mu}{(\lambda/\mu) + (\mu/\mu)} = \frac{1}{1 + \frac{MTTR}{MTTF}} = \frac{1}{1 + \frac{1.15}{16.56}} = 0.935$$
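The same estimates can be obtained programmatically from the failure and repair timestamps. The sketch below recomputes every duration directly from the Table 4 entries as printed, so its results agree with the hand calculation only to about two decimal places (for row 4 the table entries give a repair time of 2.0 days). The function and variable names are illustrative.

```python
# Table 4 as printed: failure times tf and return-to-service times tr, in days.
t_fail = [12.8, 14.2, 25.4, 31.4, 35.3, 56.4, 62.7, 131.2, 146.7, 177.0]
t_back = [13.0, 14.8, 25.8, 33.4, 35.6, 57.3, 62.8, 134.9, 150.0, 177.1]
horizon = 182.5  # six months, in days

def estimate_availability(t_fail, t_back, horizon):
    n = len(t_fail)
    # Repair durations: tr_i - tf_i
    repairs = [back - fail for fail, back in zip(t_fail, t_back)]
    # Up-time durations: tf_i - tr_(i-1), with tr_0 = 0
    previous_back = [0.0] + t_back[:-1]
    uptimes = [fail - prev for fail, prev in zip(t_fail, previous_back)]
    mttr = sum(repairs) / n
    mttf = sum(uptimes) / n
    interval = 1.0 - sum(repairs) / horizon   # observed 6-month availability
    steady = mttf / (mttf + mttr)             # steady-state estimate
    return mttf, mttr, interval, steady

mttf, mttr, interval, steady = estimate_availability(t_fail, t_back, horizon)
print(f"MTTF = {mttf:.2f} d, MTTR = {mttr:.2f} d, A(T) = {interval:.3f}, A(inf) = {steady:.3f}")
```

The observed interval availability and the steady-state estimate are close but not identical, because the observation window ends partway through an operating period.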
Redesign of the Automobile Braking System Using Redundancy Concepts

In order to accomplish this task, we will examine four design configurations and compute
their reliabilities with the goal of determining the optimal design. We will use the following notation
and symbols:
M Master Cylinder
W Wheel Cylinder
L Hydraulic Lines
Note: Safe braking is achieved when either the front brake works or the rear brake works or both.
R(M)=0.995, R(Wi)=0.999, R(Li)=0.999


5.1 Basic Brake Design (Design a)

This is the traditional brake design with two front and two rear wheel cylinders connected by
hydraulic lines to a master cylinder (design a).
Figure 16a: Basic Design (layout: brake pedal and master cylinder M1 feeding lines L1 to L4 and wheel cylinders W1 to W4 at the left front, right front, left rear, and right rear wheels)
Figure 16b: Basic Design (block diagram: brake pedal, M1 in series with two parallel branches, front L1-W1-L2-W2 and rear L3-W3-L4-W4, leading to safe braking)
From Fig 16b, the reliability of the basic design (a) is:
$$R_a = R_{M1}\left\{1 - \left[(1 - R_{W1}R_{L1}R_{W2}R_{L2})(1 - R_{W3}R_{L3}R_{W4}R_{L4})\right]\right\} = 0.995\left[1 - (1 - 0.999^4)^2\right] = 0.99498$$


5.2 Unit or System Redundancy (Design b)

Install a duplicate set of brake shoes and cylinder on each wheel and feed these with separate hydraulic
lines attached to a second master cylinder. This results in two separate systems and doubles the cost,
weight and volume of the system.
Figure 17a: Unit/System Redundancy (layout: brake pedal with two master cylinders M1 and M'1, each feeding its own lines L1 to L4 or L'1 to L'4 and wheel cylinders W1 to W4 or W'1 to W'4 at all four wheels)
Fig 17b: Unit/System Redundancy (block diagram: two complete copies of the basic design in parallel, M1 with front L1-W1-L2-W2 and rear L3-W3-L4-W4, and M'1 with front L'1-W'1-L'2-W'2 and rear L'3-W'3-L'4-W'4, leading to safe braking)
Reliability of Design b, with Ra = 0.99498 from Fig 16a and 16b:
$$R_b = 1 - (1 - R_a)(1 - R_a) = 1 - (1 - 0.99498)^2 = 0.9999748$$


5.3 Component Redundancy (Design c)

Parallel two master cylinders and run two parallel hydraulic lines to each wheel which connects to a
parallel pair of wheel cylinders. In this case, each component is in parallel. Components are
individually paralleled.
Reliability for Design c
For the hydraulic lines:
$$R_L = 1 - \left[(1 - R_{Li})(1 - R_{Li})\right] \text{ for each set of parallel hydraulic lines}$$
For the wheel cylinders:
$$R_W = 1 - \left[(1 - R_{Wi})(1 - R_{Wi})\right] \text{ for each set of parallel wheel cylinders}$$
Fig 18a: Component Redundancy (layout: brake pedal with parallel master cylinders M1 and M'1; each wheel served by a parallel pair of lines, L and L', and a parallel pair of wheel cylinders, W and W')
Fig 18b: Component Redundancy (block diagram: M1 in parallel with M'1, followed by a front branch of the pairs L1/L'1, W1/W'1, L2/L'2, W2/W'2 in series and a rear branch of the pairs L3/L'3, W3/W'3, L4/L'4, W4/W'4 in series, leading to safe braking)
Each branch consists of two line pairs and two wheel cylinder pairs in series.
For the top half:
$$\left[1 - \prod_{i=1}^{2}(1 - R_{Li})\right]^2\left[1 - \prod_{i=1}^{2}(1 - R_{Wi})\right]^2 = (0.999998)(0.999998) = 0.999996$$
For the lower half:
$$\left[1 - \prod_{i=1}^{2}(1 - R_{Li})\right]^2\left[1 - \prod_{i=1}^{2}(1 - R_{Wi})\right]^2 = (0.999998)(0.999998) = 0.999996$$
Reliability of the wheel and hydraulic line subsystem:
$$R_{WL} = 1 - (1 - 0.999996)(1 - 0.999996) \approx 1$$
For the master cylinder subsystem:
$$R_M = 1 - (1 - 0.995)(1 - 0.995) = 0.999975$$
$$R_C = R_{WL} \cdot R_M = 0.999975(1) = 0.999975$$


5.4 Hybrid/Compromise Redundancy (Design d)

In a compromise system, a single brake pedal activates two separate master cylinders. One master
cylinder feeds a set of hydraulic lines, which connects to the front wheel brakes, and the other master
cylinder operates the rear wheel brake cylinder through its own set of lines.
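Figure 19a: Hybrid/Compromise Redundancy (layout: brake pedal with master cylinder M1 feeding L1, W1, L2, W2 at the front wheels and master cylinder M'1 feeding L3, W3, L4, W4 at the rear wheels)
Fig 19b: Hybrid/Compromise Redundancy (block diagram: front branch M1-L1-W1-L2-W2 in parallel with rear branch M'1-L3-W3-L4-W4, leading to safe braking)
$$R_F = R_{M1}R_{L1}R_{W1}R_{L2}R_{W2} = (0.995)(0.999^4) \quad (R_F = \text{reliability front}, \; R_R = \text{reliability rear})$$
$$R_F = R_R, \text{ hence: } R_d = 1 - \left[(1 - R_F)(1 - R_R)\right] = 1 - \left[1 - (0.995)(0.999^4)\right]^2 = 0.9999195$$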
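The four brake designs can be evaluated together from the component reliabilities given at the start of this section. This is a sketch that assumes independent failures and follows the block diagrams described for Figures 16b to 19b; the helper `parallel` and the variable names are illustrative.

```python
R_M, R_W, R_L = 0.995, 0.999, 0.999   # master cylinder, wheel cylinder, hydraulic line

def parallel(*reliabilities):
    """Reliability of independent blocks in parallel."""
    q = 1.0
    for r in reliabilities:
        q *= 1.0 - r
    return 1.0 - q

# One front (or rear) branch: two lines and two wheel cylinders in series.
branch = (R_W * R_L) ** 2

design_a = R_M * parallel(branch, branch)                  # basic design
design_b = parallel(design_a, design_a)                    # duplicate whole system
paired_branch = (parallel(R_L, R_L) * parallel(R_W, R_W)) ** 2
design_c = parallel(R_M, R_M) * parallel(paired_branch, paired_branch)  # every component paired
design_d = parallel(R_M * branch, R_M * branch)            # one master cylinder per axle

for name, value in [("a", design_a), ("b", design_b), ("c", design_c), ("d", design_d)]:
    print(f"Design {name}: {value:.7f}")
```

Printing to seven decimal places matters here, since designs b, c, and d differ only beyond the fourth decimal.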
Maintenance
Relatively few systems can operate without a breakdown, so maintenance is needed to keep
the system running and to ensure minimal interruption to production. The objective of
maintenance is to increase the reliability (or, more appropriately, the availability in the case of a repairable
system) of the system over the long haul by reducing the aging and wear-out effects due to corrosion,
fatigue and other internal and environmental problems that negatively affect the long-term operability
of the system. Although the primary metric for judging effective maintenance is the resulting increase
in reliability after maintenance, the criterion most often considered is system availability, especially in
the case of a repairable or maintained system. Availability is defined in this case as the probability that
the system will be operational when needed. The distinction is clear. Generally, reliability in a strict
sense refers to unmaintained systems or systems that are not repairable, whereas availability refers to maintained
or repairable systems. For satellites and one-shot space systems, we talk about reliability, whereas for
cars or machinery we talk about availability, again in the strictest sense. We also make reference to
the notion of idealized maintenance, which is a very rare case where maintenance returns the system
to as-good-as-new condition.
Considerable maintenance benefits can be realized when the maintenance intervals are chosen
such that, for a given system, the positive effect of maintenance on wear-out (increasing failure rate)
outweighs its negative effect of reintroducing wear-in (decreasing failure rate). This applies especially to a
system with different components. In such a case, it would be better to perform maintenance only
on those elements for which the wear-out effect dominates. For example, one may choose to replace
worn spark plugs in a car rather than replace a fuel injector with a new one (which may itself be defective).
There are three basic types of maintenance, namely, Preventive, Predictive and Corrective,
and they are delineated as follows:
• Preventive (PM) – involves greasing, oiling, changing filters
• Predictive (PdM) – Inspections
• Corrective – Repairs
In general maintenance does not return the system to as good as new condition. A system that
has undergone any form of maintenance can be in one of several states after repair, namely:
a) As good as new
b) New better than old
c) As bad as old
d) Worse than old
6.1 Preventive Maintenance (PM)

Definition of Preventive Maintenance (PM): “Schedule of planned maintenance actions aimed
at the prevention of breakdowns and failures.” Preventive maintenance is the planned maintenance
of plant infrastructure and equipment with the goal of improving equipment life by preventing excess
depreciation and impairment. This type of maintenance includes, but is not limited to, adjustments,
cleaning, lubrication, repairs, and replacements for the express purpose of extending equipment
life. However, by its nature, it can also lead to common-mode failures, where a related or connected
component is damaged during the maintenance of its neighbor. Preventive maintenance
standards provide the fundamental principles and crucial guidelines for establishing a successful
preventive maintenance program. Because the needs of different plants vary, the type and amount
of preventive maintenance required also varies greatly from plant to plant. It is therefore extremely
difficult to establish a successful preventive maintenance program without proper guidelines and
instructions regarding the specific plant or equipment. The primary goal of PM is to preserve and
enhance equipment reliability. Therefore, any planned activity that increases the life of the plant or
equipment/component and helps it run more efficiently is desirable. Examples of PM
include tasks such as:
• Oil changes,
• Greasing,
• Changing filters,
• Belt tightening.
Preventive maintenance should be performed on equipment as recommended by the Original
Equipment Manufacturer (OEM). However, we must determine whether the cost of performing PM is
greater than the replacement cost. If the PM cost is higher than the replacement cost, then
consideration should be given to replacing the unit. Typically, equipment manufacturers outline
preventive maintenance procedures and guidelines in the OEM manuals, including:
• Oil and/or grease types, and quantities
• Time periods (weekly, monthly, quarterly)
• V-belt inspections & Torque settings
• General visual inspections
These guidelines should be used when creating a PM program.
In a recent study intended to raise awareness of the importance of regular PM for a common
household furnace, the US Department of Energy (DOE) reported that for every 10 minutes an
average furnace runs, it releases energy equivalent to 3.5 sticks of dynamite. The lesson to
be taken away from this study is that not performing PM wastes energy and costs money.
In addition to the guidelines and procedures that manufacturers provide in their manuals, the
American National Standards Institute (ANSI) specifies standards and recommendations for PM to help
businesses determine the type and frequency of inspections and maintenance procedures, define the
minimum requirements for servicing and maintaining plant equipment, serve as a comprehensive
maintenance checklist, and supplement more specific instructions, manufacturer publications, and
other standards. Through the application of these standards, industrial firms can improve automation
and operate more efficiently, produce higher quality products, minimize energy consumption, reduce
insurance inventories and business loss due to production delays, and increase overall safety levels. In
addition, preventive maintenance measures can drastically reduce errors in day-to-day operations, as
well as increase the overall preparedness of plants in the case of an emergency.
For PM, we must choose the maintenance interval for which the positive effect on wear-out
is greater than the negative effect of wear-in. Typically PM is performed on those components
for which the wear-out effect dominates. Even with wear-out present, the constant failure rate model
may suffice. Wear-in, like burn-in, is a period of stabilization for the components.
As a general rule, only trained, qualified maintenance personnel should perform PM activities.


PM training is important to:
• Ensure proper techniques and procedures are followed.
• Reduce over-greasing, which is often worse than not greasing enough.
• Reduce improper tightening, which increases shaft wear and shortens shaft life.
• Reduce common-mode failures due to poor maintenance practices.
• Ensure proper lubricants are used so as not to shorten equipment life.
Benefits of PM
• Increases life of equipment
• Reduces failures and breakdowns
• Reduces costly down time
• Decreases cost of replacement
6.2 Predictive Maintenance (PdM)
Definition – Predictive maintenance techniques help determine the
condition of in-service equipment in order to predict when maintenance should be performed. The
primary goal is to minimize disruption of normal system operations, while allowing for budgeted and
scheduled repairs. It also involves data analytics and rigorous mathematical methods as well as:
• Vibration Analysis
• Infrared Thermography
• Oil Analysis
• Visual Inspections
Benefits of PdM
• Provides increased operational life
• Results in decreased downtime
• Allows for scheduled downtime
• Allows for money to be budgeted for repairs
• Lowers need for extensive parts inventory
• DOE reports an estimated 8-12% cost savings from having an effective PdM program
According to the USDOE, a good PdM program will lead to:
• Reduction in maintenance costs: 25-30%
• Elimination of breakdowns: 70-75%
• Reduction of downtime: 35-45%
• Increase in production: 20-25%
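As a rough illustration of what these ranges imply, the sketch below applies the USDOE percentages for maintenance cost and downtime to a hypothetical plant. The baseline figures (annual maintenance cost and annual downtime hours) are invented for the example and do not come from the USDOE report.

```python
# USDOE ranges quoted above, expressed as (low, high) fractional reductions
PDM_RANGES = {
    "maintenance_cost": (0.25, 0.30),
    "downtime": (0.35, 0.45),
}


def remaining_after_reduction(baseline, low, high):
    """Return (minimum, maximum) remaining value after a reduction between low and high."""
    return baseline * (1.0 - high), baseline * (1.0 - low)


if __name__ == "__main__":
    # Hypothetical baseline values, for illustration only
    baseline = {
        "maintenance_cost": 200_000.0,  # dollars per year
        "downtime": 400.0,              # hours per year
    }
    for key, (low, high) in PDM_RANGES.items():
        lo_val, hi_val = remaining_after_reduction(baseline[key], low, high)
        print(f"{key}: {baseline[key]:,.0f} -> between {lo_val:,.0f} and {hi_val:,.0f}")
```

Actual savings depend on the plant, its equipment, and how well the program is run, so the ranges should be read as estimates.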
PdM is often performed by specialized contract technicians who:
• Are qualified and trained on the latest technology
• Possess the proper equipment
• Are able to provide trending and historical data in report form
Some of the techniques used to implement effective PdM programs include:
• Oil Analyses
• Thermography
• Vibration Analyses (VA)
1). Oil Analyses
Oil analysis is a long-term program that may take years before its benefits are seen.
Oil analyses include oil analysis and wear particle analysis.
a) Oil analysis determines:
i). Condition of the oil. ii). Quality of the lubricant. iii). Suitability for continued use.
b) Wear particle analysis determines:
i). Mechanical condition of machine components. ii). Particle size, types, etc.
Oil Analysis results may:
• Detail the types of metal fragments in the sample
• Show a continued increase in particle size
• Recommend increasing sampling intervals
• Recommend shutting down machine
2). Thermography
This is used for electrical infrared inspections to detect hot spots, load imbalances, and corrosion
at a safe distance and to detect failures due to excessive heat. Specific applications include:
• Indoor equipment such as motor control centers (MCCs), disconnect switches, and transformers.
• Outdoor equipment such as substations, transformers, and outdoor circuit breakers.
3). Vibration
Vibration tests are usually done on large equipment, such as blowers and pumps, to:
• Determine whether bearings or components are loose, moving, or wearing.
• Allow for scheduled repair of equipment.
• Provide trending that enables shutdown of equipment BEFORE failure and major damage.
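The trending idea in the last bullet can be sketched as follows. The example fits a straight line by least squares to periodic vibration readings and estimates how long it would take the trend to reach an alarm limit. The readings, the alarm limit, and the assumption of a linear trend are all hypothetical; in practice, limits come from the equipment manufacturer or applicable standards, and the trend may not be linear.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def weeks_until_limit(weeks, readings, limit):
    """Estimate weeks remaining after the last reading until the fitted trend reaches `limit`.

    Returns None when the fitted trend is flat or decreasing, and 0.0 when the
    trend has already reached the limit.
    """
    slope, intercept = fit_line(weeks, readings)
    if slope <= 0:
        return None
    crossing_week = (limit - intercept) / slope
    return max(0.0, crossing_week - weeks[-1])


if __name__ == "__main__":
    # Hypothetical data: week of each survey and vibration velocity in mm/s
    weeks = [0, 4, 8, 12, 16]
    readings = [2.0, 2.4, 2.8, 3.2, 3.6]
    alarm_limit = 4.5  # assumed alarm level, mm/s
    remaining = weeks_until_limit(weeks, readings, alarm_limit)
    print(f"Estimated weeks until alarm limit: {remaining:.1f}")
```

With these made-up readings the fitted trend reaches the assumed limit roughly nine weeks after the last survey, which would leave time to schedule the repair rather than react to a failure.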
6.3 Corrective Maintenance (CM)

Definition – Repair of equipment/machinery in order to bring it to its original operating condition.
Corrective maintenance is a form of maintenance that is implemented when the system breaks
down or when there is a fault or problem in a system, with the goal of restoring the system to an
operating condition. In some cases, it may not be possible to predict or prevent a failure, thus
corrective maintenance becomes the only option. In other instances, a system can require repairs as a
result of insufficient preventive maintenance, and in some other situations, it may be desirable to focus
on corrective, rather than preventive, repairs as part of a maintenance strategy. The process of
corrective maintenance begins with the failure and a diagnosis of the failure to determine why it
occurred. The diagnosis process may include a physical inspection of the system, the use of
diagnostic equipment to evaluate the system, interviews with users, and a number of other steps. It is
important to determine what caused the problem in order to take appropriate action and to recognize
that multiple failures of components or software may occur simultaneously.


6.4 Summary
In summary, properly orchestrated maintenance programs have significant payoffs, including but not
limited to the following:
• Keeps equipment running longer
• Allows for scheduled, budgeted repairs
• Reduces unscheduled downtime
• Makes life less stressful
Summary
Reliability Engineering is concerned with the design, implementation, and prediction of the
life profile of a system or component using a disciplined analysis approach that has strong roots in
statistics, mathematics, and engineering. Given a system, subsystem, or component, one of the major
challenges of reliability analysis is to provide an understanding of the inherent failure mechanisms that
undergird such a system and to develop the appropriate analytical scheme to determine the system's
life profile. The problem becomes even more daunting given the phenomenon of aging and related
transient phenomena, as well as the practical realities of little or no data. Today, these challenges still
persist, especially as products become more and more miniaturized and as companies try to shorten the
time to market in order to gain market share. This first course in a two-course sequence has examined
some of the basic issues related to reliability, such as:
• The various viewpoints of reliability, especially the engineering design viewpoint.
• The use of the nonparametric approach to estimate the reliability function.
• The performance measures used to characterize reliability.
• Appropriate reliability-based intervention strategies that lead to an optimally maintained system.
• Availability, maintainability, and performability measures.
The second course in the sequence will focus on the all-important areas of dependency analysis,
interference theory, data analysis, and testing.
References
1. Elsayed, E. A. (1996), Reliability Engineering, Addison Wesley, Reading, MA.
2. Lewis, E. E. (1995), Introduction to Reliability Engineering, Wiley & Sons, New York.
3. Tobias, P. A. and Trindade, D. C. (1994), Applied Reliability, 2nd ed., Van Nostrand
Reinhold, New York.
4. Kapur, K. C. and Lamberson, L. R. (1977), Reliability in Engineering Design, Wiley & Sons, New York.
5. Nelson, W. (1990), Accelerated Testing: Statistical Models, Test Plans, and Data Analyses,
Wiley & Sons, New York.
6. Nelson, W. (1982), Applied Life Data Analysis, Wiley & Sons, New York.
7. Barlow, R. E. and Proschan, F. (1975), Statistical Theory of Reliability and Life Testing, Holt,
Rinehart, and Winston, New York.
8. Operations & Maintenance Best Practices: A Guide to Achieving Operational Efficiency, Chapter 5:
Types of Maintenance Programs, Release 3.0, US Department of Energy (USDOE), August 2010.

www.SunCam.com Copyright© 2017 O. Geoffrey Okogbaa, PE Page 2 of 39
