Contents
Introduction
1.1 Definition of Reliability
1.1.1 Performance and Reliability
1.1.2 Trade-offs: Reliability versus Cost
1.1.3 Time Element of Reliability
1.1.4 Operating Condition
1.1.5 Other Performability Measures
1.2 Definition of Failure
Reliability Models
2.1 Parametric and Nonparametric Relationships
2.2 Failure Density Function
2.2.1 Failure Probability in the interval (t1,t2)
2.3 Reliability of Component of age t
2.4 Conditional Failure Rate (Hazard Function)
2.5 Mean Time To Failure (MTTF and MTBF)
2.6 Hazard Functions for Common Distributions
2.6.1 Exponential
2.6.2 Normal Distribution (Standard Normal Distribution)
2.6.3 Log Normal Distribution
2.6.4 Weibull Distribution
2.7 Estimating R(t), h(t), f(t) Using Empirical Data
2.7.1 Small sample size (n < 10)
2.7.2 Large Sample size (n > 10)
Static Reliability
3.1 Series System
3.2 Parallel Systems
Reliability Improvement
3.1 Redundancy-High level
3.2 Redundancy-Low level
3.3 Active and Standby Redundancy
Introduction
Reliability considerations play an increasing role in virtually all human endeavors, and most
specifically in engineering design. As the demand for systems that perform better and cost less
increases, there is a concomitant need, and perhaps even a requirement, to minimize the probability of
component and/or system failures. Such failures, if not properly mitigated, could lead to increased cost
and inconvenience, or could threaten individual and public safety.
both performance and reliability are equally useful. Performance must be high or the number of losses
during combat mission would be high. In situations other than life and death, reliability is viewed in
terms of economics. While management is concerned about reliability, management is less concerned
about the technical jargon surrounding reliability. So the best way to communicate the importance of
reliability to management is in terms of dollars and cents.
Reliability Models
2.1 Parametric and Nonparametric Relationships
Define t as the random variable representing the time to failure and T as the age of the system. If
the failure density function is given by f(t), then Prob(t < T) is the probability of failure by age T and is
represented as F(T). F is also known as the distribution function of the failure process. The
nonparametric relationship between f(t) and F(t) is given as:
$$F(t) = \int_0^t f(s)\,ds$$
The reliability function is given by:
$$R(t) = 1 - F(t) = 1 - \int_0^t f(s)\,ds = \int_t^\infty f(s)\,ds$$
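If t is a negative exponential random variable with a constant parameter θ (the Mean Time to
Failure, or MTTF), we can show the relation between f(t), F(t), and R(t) directly:
$$f(t) = \frac{1}{\theta}\exp\left(-\frac{t}{\theta}\right)$$
$$F(t) = \int_0^t \frac{1}{\theta}\exp\left(-\frac{s}{\theta}\right)ds = \left[-\exp\left(-\frac{s}{\theta}\right)\right]_0^t = -1\left[\exp\left(-\frac{t}{\theta}\right) - 1\right] = 1 - \exp\left(-\frac{t}{\theta}\right)$$
$$R(t) = 1 - F(t) = \exp\left(-\frac{t}{\theta}\right)$$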
[Figure: failure density f(t) plotted against t, with the interval (t1, t2) marked]
2.2 Failure Density Function
This gives a relative frequency of failure from the viewpoint of initial operation at time t =0. The
failure distribution function F(t) is the special case when t1 = 0 and t2 =t, i.e. F(t2)= F(t)
2.2.1 Failure Probability in the interval (t1,t2)
$$\int_{t_1}^{t_2} f(t)\,dt = \int_0^{t_2} f(t)\,dt - \int_0^{t_1} f(t)\,dt = F(t_2) - F(t_1) = [1 - R(t_2)] - [1 - R(t_1)] = R(t_1) - R(t_2)$$
to time (t+x): $R(x \mid t) = \dfrac{R(t+x)}{R(t)}$
Similarly, the conditional probability of failure during the interval of duration x is $F(x \mid t) = 1 - R(x \mid t)$.
$$h(t) = -\frac{1}{R(t)}\,\frac{d}{dt}R(t)$$
Note: If f(t) is the exponential density, then and only then is h(t) = λ (that is, 1/θ), a constant.
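For the mean time to failure (the expected time to failure, or the average life of the system) we have:
$$E(T) = \int_0^\infty R(s)\,ds, \quad 0 \le T \le \infty, \qquad E(T) = \text{MTTF} = \text{expected life of the system}$$
By integration by parts, the mean time to failure can also be written as:
$$E(T) = \int_0^\infty R(s)\,ds = \int_0^\infty s f(s)\,ds$$
How? Use $\int u\,dv = uv - \int v\,du$ with $u = R(s)$ and $dv = ds$, so that $du = -f(s)\,ds$ and $v = s$:
$$\int_0^\infty R(s)\,ds = \big[s R(s)\big]_0^\infty + \int_0^\infty s f(s)\,ds = \int_0^\infty s f(s)\,ds$$
provided $s R(s) \to 0$ as $s \to \infty$.
For the exponential density:
$$E(T) = \int_0^\infty e^{-s/\theta}\,ds = -\theta\big[e^{-s/\theta}\big]_0^\infty = -\theta(0 - 1) = \theta = \text{MTTF} = \frac{1}{\lambda}$$
At $t = \theta$:
$$R(t = \theta) = R(\text{MTTF}) = \exp\left(-\frac{t}{\theta}\right) = \exp\left(-\frac{\theta}{\theta}\right) = \exp(-1) = 0.3679, \qquad F(\text{MTTF}) = 1 - R(\text{MTTF}) = 0.6321$$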
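For the Normal Density:
$$P(t) = F(t) = \Phi\left(\frac{t - \mu}{\sigma}\right) = \Phi\left(\frac{t - \text{MTTF}}{\sigma}\right) = \Phi(Z_0)$$
When $t = \text{MTTF}$: $Z_0 = \dfrac{\text{MTTF} - \text{MTTF}}{\sigma} = 0$, so $\Phi(0) = 0.5$ and $F(\text{MTTF}) = 0.5$
$$R(\text{MTTF}) = 1 - F(\text{MTTF}) = 0.5$$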
Thus, even if the MTTF is the same and known, the reliability at that time can differ depending on the
distribution or density function associated with failure. Please note that for non-repairable systems we
use MTTF, namely mean time to failure. For repairable systems it is mean time to first failure (MTFF).
Note on MTTF: MTTF is the average life of the system. The MTTF can be derived or estimated
from the reliability function, so its form depends on that function. That means that the MTTF
expressions for the normal distribution and the Weibull distribution will differ and are
specific to each distribution.
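The comparison above can be checked numerically. The following is a minimal sketch, assuming hypothetical values for the MTTF and for the normal standard deviation (neither number comes from the text); the function names are illustrative only.

```python
import math
from statistics import NormalDist

def reliability_exponential(t, mttf):
    """R(t) = exp(-t / MTTF) for a constant failure rate."""
    return math.exp(-t / mttf)

def reliability_normal(t, mttf, sigma):
    """R(t) = 1 - Phi((t - MTTF) / sigma) for normally distributed life."""
    return 1.0 - NormalDist(mu=mttf, sigma=sigma).cdf(t)

mttf = 1000.0   # hypothetical mean life (hours)
sigma = 200.0   # hypothetical standard deviation (hours)

# Same MTTF, different distributions, different reliability at t = MTTF.
print(round(reliability_exponential(mttf, mttf), 4))
print(round(reliability_normal(mttf, mttf, sigma), 4))
```

The exponential case gives about 0.3679 and the normal case gives 0.5, regardless of the values chosen for the MTTF and sigma.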
Hence the cumulative distribution at the jth ordered failure time tj is estimated as:
$$\hat{F}(t_j) = \frac{j - 0.3}{n + 0.4}$$
$$\hat{R}(t_j) = 1 - \hat{F}(t_j) = 1 - \frac{j - 0.3}{n + 0.4} = \frac{n + 0.4 - j + 0.3}{n + 0.4} = \frac{n - j + 0.7}{n + 0.4}$$
$$\hat{h}(t_j) = \frac{\hat{R}(t_j) - \hat{R}(t_{j+1})}{(t_{j+1} - t_j)\,\hat{R}(t_j)} = \frac{1}{(t_{j+1} - t_j)(n - j + 0.7)}$$
$$\hat{f}(t_j) = \frac{\hat{R}(t_j) - \hat{R}(t_{j+1})}{t_{j+1} - t_j} = \frac{1}{(n + 0.4)(t_{j+1} - t_j)}$$
t nf ns
0<t<1000 100 200
1000<t<2000 80 120
2000<t<3000 60 60
3000<t<4000 40 20
4000<t<5000 10 10
5000<t<6000 8 2
t >6000 2 0
t        nf    ns = Ns(t)   N − Ns(t)   R̂(t)     F̂(t)     f̂(t)       ĥ(t)
1000     100   200          100         0.6667   0.3333   0.000333   0.000333
2000     80    120          180         0.4000   0.6000   0.000267   0.000400
3000     60    60           240         0.2000   0.8000   0.000200   0.000500
4000     40    20           280         0.0667   0.9333   0.000133   0.000667
5000     10    10           290         0.0333   0.9667   0.000033   0.000500
6000     8     2            298         0.0067   0.9933   0.000027   0.000800
>6000    2     0            300         0.0000   1.0000   0.000007   0.001000
Table 1: Failure data and the resulting Distributions
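The columns of Table 1 can be reproduced with a short script. This is a sketch under stated assumptions: every interval, including the open-ended last one, is treated as 1000 time units wide (which is what the tabulated hazard of 0.001 implies), and the function name `empirical_measures` is illustrative.

```python
def empirical_measures(failures, width):
    """Large-sample estimates of R, F, f and h from grouped failure counts.

    failures: number of failures observed in each consecutive interval
    width:    common interval width (assumed equal for all intervals)
    """
    total = sum(failures)
    survivors = total
    rows = []
    for i, n_f in enumerate(failures, start=1):
        at_risk = survivors              # units alive at the start of the interval
        survivors -= n_f                 # units alive at the end of the interval
        R = survivors / total
        f = n_f / (total * width)        # failures per unit time, per original unit
        h = n_f / (at_risk * width)      # failures per unit time, per unit at risk
        rows.append((i * width, R, 1.0 - R, f, h))
    return rows

failure_counts = [100, 80, 60, 40, 10, 8, 2]   # nf column of the table
for t_end, R, F, f, h in empirical_measures(failure_counts, 1000):
    print(f"{t_end:>5} {R:.4f} {F:.4f} {f:.6f} {h:.6f}")
```

Note that the density estimate divides by the original population, while the hazard estimate divides by the number still at risk, which is why the hazard grows even as the density falls.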
[Figure: reliability function R(t) plotted against t]
[Figure: hazard function h(t) plotted against t]
For the data in table 2, n < 10, so we will use the following formulas to compute f(t), R(t), and h(t),
where $t_j$ is the jth ordered failure time.
$$\hat{F}(t_j) = \frac{j - 0.3}{n + 0.4}$$
$$\hat{R}(t_j) = 1 - \hat{F}(t_j) = 1 - \frac{j - 0.3}{n + 0.4} = \frac{n - j + 0.7}{n + 0.4}$$
$$\hat{h}(t_j) = \frac{\hat{R}(t_j) - \hat{R}(t_{j+1})}{(t_{j+1} - t_j)\,\hat{R}(t_j)} = \frac{\dfrac{n - j + 0.7}{n + 0.4} - \dfrac{n - (j+1) + 0.7}{n + 0.4}}{(t_{j+1} - t_j)\,\dfrac{n - j + 0.7}{n + 0.4}} = \frac{(n - j + 0.7 - n + j + 1 - 0.7)(n + 0.4)}{(n - j + 0.7)(t_{j+1} - t_j)(n + 0.4)}$$
$$\hat{h}(t_j) = \frac{1}{(t_{j+1} - t_j)(n - j + 0.7)}$$
$$\hat{f}(t_j) = \frac{\hat{R}(t_j) - \hat{R}(t_{j+1})}{t_{j+1} - t_j} = \frac{1}{(n + 0.4)(t_{j+1} - t_j)}$$
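The small-sample estimators can be sketched in a few lines of Python. The failure times used here are the eight values listed in the spring test table (Table 3); the function name `median_rank_measures` is illustrative, and the sketch does not reproduce the interval-combining adjustment described in that table's footnote.

```python
def median_rank_measures(times):
    """Small-sample estimates at each ordered failure time.

    Returns tuples (t_j, F, R, f, h); f and h are None for the last
    failure because there is no following interval.
    """
    n = len(times)
    rows = []
    for j, t in enumerate(times, start=1):
        F = (j - 0.3) / (n + 0.4)
        R = 1.0 - F
        if j < n:
            dt = times[j] - t                     # times[j] is t_{j+1} (0-based list)
            f = 1.0 / ((n + 0.4) * dt)
            h = 1.0 / (dt * (n - j + 0.7))
        else:
            f = h = None
        rows.append((t, F, R, f, h))
    return rows

spring_failures = [190, 245, 265, 300, 320, 325, 370, 400]   # kilocycles
for t, F, R, f, h in median_rank_measures(spring_failures):
    print(t, round(F, 3), round(R, 3),
          None if f is None else round(f, 4),
          None if h is None else round(h, 4))
```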
Table 3: Computation of Reliability Measures for the Spring Test Data
Failure Number   t (kilocycles)   t(i+1) − t(i)   F̂(t)    R̂(t)    f̂(t)     ĥ(t)
1                190              55              0.083   0.917   0.0022   0.0024
2                245              20              0.202   0.798   0.0060   0.0075
3                265              35              0.321   0.679   0.0034   0.0050
4                300              20              0.440   0.560   0.0059   0.0171*
5                320              5               0.560   0.440   0.0248
6                325              45              0.679   0.321   0.0025   0.0082
7                370              30              0.798   0.202   0.0040   0.0198
8                400              -               0.917   0.083   -        -
*This value of the hazard rate was obtained by combining intervals four and
five and thus treating them as a single interval of 20 + 5 = 25 kilocycles.
Static Reliability
In performing the reliability analysis of a complex system, it is almost impossible to treat the
system in its entirety. A logical approach is to decompose the system into functional entities composed
of units, subsystems or components. Each entity is assumed to have either of two states – good or
bad (success or failure). System block diagrams (SBD) where necessary are generated to show
desirable system operation. Models are then formulated to fit the logical structure.
After the system block diagram has been completed, the reliability block diagram (RBD) is
developed. The RBD is a logical diagram or graph whose edges represent the system components and
which indicates how the system will successfully operate. It is a graphical representation of the
components of the system and how they are related or connected in terms of their reliability. It
provides a success-oriented view of the system and facilitates the computation of system reliability
from component reliabilities. It should be noted that the RBD may differ from the system block
diagram: the SBD shows how the components are physically or functionally connected, while the
RBD shows how the system will successfully operate (or not).
The unit or subsystem reliabilities are computed and subsequently used to derive the overall
system reliability. Most systems can be decomposed into series, parallel or hybrid structures. In many
cases when the structure is of a more complicated or complex nature, more general techniques are
used. In the series and parallel models, the assumption is that each unit or entity is independent of the
others. In a series structure the functional operation of the system depends on the proper operation
of all system components. Parallel paths are redundant, meaning that all of the parallel paths must fail
for the parallel network to fail. By contrast, any failure along a series path causes the entire series path
to fail.
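For a series configuration, $R_s(t) = R_1(t) \times R_2(t) \times \dots \times R_n(t)$. If the components have exponential lives, then
$$R_s(t) = e^{-\lambda_1 t}\, e^{-\lambda_2 t} \cdots e^{-\lambda_n t} = e^{-t \sum_{i=1}^{n} \lambda_i}$$
The system failure rate is $\lambda_S = \sum_i \lambda_i$, i = 1, ..., n, with system MTBF $\theta_s = 1/\lambda_S$:
$$\int_0^\infty R_s(t)\,dt = \int_0^\infty e^{-t \sum_i \lambda_i}\,dt = \theta_s$$
$$\theta_s = \frac{1}{\sum_i \lambda_i} = \frac{1}{\dfrac{1}{\theta_1} + \dfrac{1}{\theta_2} + \dfrac{1}{\theta_3} + \dots + \dfrac{1}{\theta_n}}$$
If the components are identical, i.e., $\lambda_1 = \lambda_2 = \lambda_3 = \dots = \lambda_n = \lambda$, then $\theta_s = \dfrac{1}{n\lambda} = \dfrac{\theta}{n}$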
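As a quick illustration of the series formulas, the sketch below evaluates the system reliability and MTBF for exponential components. The failure rates and mission time are hypothetical values chosen only for the example.

```python
import math

def series_reliability(rates, t):
    """R_s(t) = exp(-t * sum(lambda_i)) for independent exponential components."""
    return math.exp(-sum(rates) * t)

def series_mtbf(rates):
    """theta_s = 1 / sum(lambda_i)."""
    return 1.0 / sum(rates)

rates = [0.001, 0.002, 0.003]   # hypothetical failure rates (per hour)
t = 100.0                       # hypothetical mission time (hours)

# The system value equals the product of the component reliabilities.
product = math.prod(math.exp(-lam * t) for lam in rates)
print(series_reliability(rates, t), product)
print(series_mtbf(rates))

# Identical components: theta_s = theta / n.
print(series_mtbf([0.01] * 4))
```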
3.2 Parallel Systems
In many systems, several signal paths perform the same operation. If the system configuration is such
that failure of one or more paths still allows the remaining path or paths to perform properly, the
system can be represented by a parallel model. A parallel system is one that is not considered to have
failed unless all components have failed. The reliability block diagram is represented as
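Define $Q_s$ = unreliability of the system
Let $E_i$ = event that component i works, $R_i = P(E_i)$
Let $\bar{E}_i$ = event that component i does not work, $Q_i = P(\bar{E}_i)$
[Fig. 7. Parallel Configuration: components 1, 2, ..., n connected in parallel]
$$Q_s = \prod_{i=1}^{n} P(\bar{E}_i) = \prod_{i=1}^{n} Q_i$$
$$R_s = (1 - Q_s) = 1 - \prod_{i=1}^{n} \left(1 - P(E_i)\right) = 1 - \prod_{i=1}^{n} (1 - R_i)$$
For example, if $R_1 = 0.9$, $R_2 = 0.85$, $R_3 = 0.99$, then $Q_s = 0.1(0.15)(0.01) = 0.00015$ and $R_s = (1 - Q_s) = 0.99985$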
Consider a three-unit redundant system (three components in parallel, Fig. 7).
Let $E_i$ = event that component i works, $R_i = P(E_i)$
Let $\bar{E}_i$ = event that component i does not work, $Q_i = P(\bar{E}_i)$
The probability that the system fails is:
$$Q_s = P(\bar{E}_1)P(\bar{E}_2)P(\bar{E}_3) = (1 - R_1)(1 - R_2)(1 - R_3)$$
$$R_s = 1 - Q_s = 1 - (1 - R_1)(1 - R_2)(1 - R_3)$$
$$R(t) = 1 - \prod_{i=1}^{3} (1 - R_i) = 1 - (1 - R_3)(1 - R_1 - R_2 + R_1 R_2)$$
$$= R_1 + R_2 + R_3 - R_1 R_2 - R_1 R_3 - R_2 R_3 + R_1 R_2 R_3$$
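The parallel formula and its three-unit expansion can be checked against each other with a few lines of Python. This is a minimal sketch; the function names are illustrative, and the component reliabilities are the ones used in the numerical example for Fig. 7.

```python
import math

def parallel_reliability(reliabilities):
    """R_s = 1 - prod(1 - R_i) for independent components in active parallel."""
    q_system = math.prod(1.0 - r for r in reliabilities)
    return 1.0 - q_system

def three_unit_expansion(r1, r2, r3):
    """Expanded form of the three-unit parallel reliability."""
    return r1 + r2 + r3 - r1 * r2 - r1 * r3 - r2 * r3 + r1 * r2 * r3

print(parallel_reliability([0.9, 0.85, 0.99]))
print(three_unit_expansion(0.9, 0.85, 0.99))
```

Both calls give approximately 0.99985, which confirms that the expansion is just the product form multiplied out.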
If we assume that the failure rate h(t) is constant, then the failure density function is the exponential
distribution. We can show this by using the non-parametric relationship between R(t), h(t), and f(t).
Given $h(t) = c$:
$$R(t) = \exp\left[-\int_0^t h(\tau)\,d\tau\right] = \exp\left[-c\int_0^t d\tau\right] = e^{-ct}$$
$$f(t) = h(t)R(t) = c\,e^{-ct} = \lambda e^{-\lambda t}$$
with $c = \lambda$, which is the exponential distribution.
$$R(t) = e^{-\lambda t}; \quad \text{taking logs, } -\ln R(t) = \lambda t \;\Rightarrow\; t = \frac{1}{\lambda}\ln\frac{1}{R(t)} = \text{MTTF}\,\ln\frac{1}{R(t)}$$
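$$R_s(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} + e^{-\lambda_3 t} - e^{-\lambda_1 t}e^{-\lambda_2 t} - e^{-\lambda_1 t}e^{-\lambda_3 t} - e^{-\lambda_2 t}e^{-\lambda_3 t} + e^{-\lambda_1 t}e^{-\lambda_2 t}e^{-\lambda_3 t}$$
If $\lambda_1 = \lambda_2 = \lambda_3 = \lambda$, then $R(t) = 3e^{-\lambda t} - 3e^{-2\lambda t} + e^{-3\lambda t}$
$$\text{MTTF} = E(T) = \int_0^\infty R_s(t)\,dt = \int_0^\infty 3e^{-\lambda t}\,dt - \int_0^\infty 3e^{-2\lambda t}\,dt + \int_0^\infty e^{-3\lambda t}\,dt$$
$$= \left[-\frac{3}{\lambda}e^{-\lambda t}\right]_0^\infty + \left[\frac{3}{2\lambda}e^{-2\lambda t}\right]_0^\infty - \left[\frac{1}{3\lambda}e^{-3\lambda t}\right]_0^\infty$$
$$= \frac{3}{\lambda} - \frac{3}{2\lambda} + \frac{1}{3\lambda} = \frac{1}{\lambda}\left(3 - \frac{3}{2} + \frac{1}{3}\right) = \frac{1}{\lambda}\left(1 + \frac{1}{2} + \frac{1}{3}\right)$$
In general,
$$\text{MTTF} = \frac{1}{\lambda}\sum_{i=1}^{n} \frac{1}{i}, \quad \text{if components are identical}$$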
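The closed-form MTTF for identical parallel units can be cross-checked by integrating the reliability function numerically. This is a sketch only: the failure rate is a hypothetical value, and the trapezoidal rule with a finite upper limit is an assumption made to keep the example dependency-free.

```python
import math

def parallel_mttf_identical(lam, n):
    """MTTF = (1/lambda) * sum_{i=1..n} 1/i for n identical active-parallel units."""
    return sum(1.0 / i for i in range(1, n + 1)) / lam

def parallel_reliability_identical(lam, n, t):
    """R(t) = 1 - (1 - exp(-lambda t))^n."""
    return 1.0 - (1.0 - math.exp(-lam * t)) ** n

def numeric_mttf(lam, n, upper=3000.0, steps=100_000):
    """Trapezoidal approximation of the integral of R(t) from 0 to `upper`."""
    h = upper / steps
    total = 0.5 * (parallel_reliability_identical(lam, n, 0.0)
                   + parallel_reliability_identical(lam, n, upper))
    for k in range(1, steps):
        total += parallel_reliability_identical(lam, n, k * h)
    return total * h

lam = 0.01   # hypothetical failure rate (per hour)
print(parallel_mttf_identical(lam, 3))   # (1 + 1/2 + 1/3) / lambda
print(numeric_mttf(lam, 3))
```

The upper limit of 3000 hours is 30 mean lives at this rate, so the truncated tail is negligible.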
These results hold for active parallel systems, in which all the components are active in the
system starting from time zero. In a different type of redundant system, namely the standby system,
the second unit is turned on only after the first unit fails. In that scenario, the failure rates are no longer
independent but depend on the failure of the main or primary unit.
There is a relationship between the design life of a system or component and its end-of-life
reliability. In practice, the engineer sets the design life so as to achieve a desired end-of-life reliability
goal. For example, if the design life is 5 years, we want the reliability at the end of those 5 years
to equal some target value. Please also note that the MTTF is a single but important point in
the time domain and is therefore measured in units of time.
Example: Through predictive analytics, the MTTF of a system with constant failure rate has
been determined to be MTTF0 = 15 years. The engineer wants to set the design life in
consonance with the predetermined MTTF so that the end-of-life reliability is 85%.
a). Determine the design life T with respect to MTTF
b). To enhance system performance, two identical units with same failure rate are utilized as part
of the active parallel system configuration to increase the design life. How does this new
configuration affect the design life given that the end-of-life reliability remains the same?
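Solution:
a) For the single unit, $R = e^{-T/\text{MTTF}_0}$, so
$$T = \ln\left(\frac{1}{R}\right)\text{MTTF}_0 = \ln\left(\frac{1}{0.85}\right)\text{MTTF}_0 = 0.1625\,\text{MTTF}_0 = 0.1625(15) = 2.44 \text{ yrs}$$
b) For two identical units in active parallel, let $x = e^{-\lambda T}$. Then $R = 2x - x^2$, i.e. $x^2 - 2x + R = 0$, so
$$x = \frac{2 \pm \sqrt{4 - 4R}}{2} = 1 \pm \sqrt{1 - R}$$
The root $x = 1 + \sqrt{1 - R}$ is not permissible because it gives $x \ge 1$, whereas $x = e^{-\lambda T}$ is the reliability of a single unit and cannot exceed 1. Hence $x = 1 - \sqrt{1 - R}$ and
$$T = \ln\left(\frac{1}{x}\right)\text{MTTF}_0 = \ln\left(\frac{1}{1 - \sqrt{1 - R}}\right)\text{MTTF}_0 = 0.4899\,\text{MTTF}_0 = 0.4899(15) = 7.35 \text{ yrs}$$
The redundant arrangement has more than three times the design life of the single unit.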
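The design-life calculation can be sketched as follows, using the example's MTTF of 15 years and end-of-life reliability of 0.85. The function names are illustrative; the two-unit case assumes identical, independent units in active parallel, so that R = 2x − x² with x = exp(−T/MTTF).

```python
import math

def design_life_single(mttf, r_end):
    """T = MTTF * ln(1/R) for one unit with constant failure rate."""
    return mttf * math.log(1.0 / r_end)

def design_life_two_parallel(mttf, r_end):
    """Design life for two identical active-parallel units.

    Solves x^2 - 2x + R = 0 for x = exp(-T/MTTF), keeping the root below 1.
    """
    x = 1.0 - math.sqrt(1.0 - r_end)
    return mttf * math.log(1.0 / x)

single = design_life_single(15.0, 0.85)
redundant = design_life_two_parallel(15.0, 0.85)
print(round(single, 2), round(redundant, 2), round(redundant / single, 2))
```

The single unit comes out at roughly 2.44 years and the parallel pair at roughly 7.35 years, a ratio of about 3.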
Reliability Improvement
One approach to reliability improvement is to alter the system structure so as to obtain higher
reliability while maintaining the basic system function. This is generally accomplished by creating
additional parallel paths in the system structure and is usually termed REDUNDANCY. The
straightforward approach is to take the existing system and connect a duplicate one in parallel. This results
in two separate systems. Such an approach, which involves paralleling the entire system or unit, is
called system or unit redundancy. A different approach is to parallel each component, resulting in
component redundancy. The hybrid model resulting from a mix of both system and component
redundancies is called compromise redundancy.
Under certain conditions, low-level redundancies give higher reliability values than high-level
redundancies. The conditions are: the reliabilities of the components cannot depend on the
configuration in which they are located, the failure processes must be truly independent for both
configurations, and the component reliabilities must be the same for both configurations.
Example: From Fig 8, let Ri = 0.9 for all components. Use n = 2 and m = 3. Note that in Fig 8, m = 2.
$$R_{series} = R^m = (0.9)^3 = 0.729$$
$$R_{subsys} = 1 - (1 - R^m)^n = 0.92656$$
$$Q_s = \left(1 - \left[1 - (1 - R^3)^2\right]\right)^2 = (0.0734)^2 = 0.005394$$
$$R_s = 1 - Q_s = 1 - \left(1 - \left[1 - (1 - R^m)^n\right]\right)^n = 1 - 0.005394 = 0.994606$$
From Fig 9, assuming Ri = 0.9 for all components and m = 4, n = 2:
$$R_{sub} = 1 - (1 - R)^m = 1 - (1 - 0.9)^4 = 1 - (0.1)^4 = 1 - 0.0001 = 0.9999$$
$$R_{sys} = \left[1 - (1 - R)^m\right]^n = (0.9999)(0.9999) = 0.9998$$
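A short sketch makes the comparison between the two redundancy levels concrete. The function and parameter names are illustrative, and identical, independent components are assumed. The first two calls reproduce the numbers of the Fig 8 and Fig 9 examples; the last two compare the arrangements for the same four components.

```python
def high_level_redundancy(r, in_series, paths):
    """Whole series chains duplicated in parallel: 1 - (1 - r^m)^n."""
    return 1.0 - (1.0 - r ** in_series) ** paths

def low_level_redundancy(r, in_parallel, stages):
    """Each component paralleled, stages in series: (1 - (1 - r)^m)^n."""
    return (1.0 - (1.0 - r) ** in_parallel) ** stages

print(high_level_redundancy(0.9, 3, 2))   # Fig 8 example: m = 3 in series, n = 2 paths
print(low_level_redundancy(0.9, 4, 2))    # Fig 9 example: m = 4 in parallel, n = 2 stages

# Same four components arranged both ways (2 in series, duplicated once).
print(high_level_redundancy(0.9, 2, 2))   # 1 - 0.19^2
print(low_level_redundancy(0.9, 2, 2))    # 0.99^2
```

For the four-component case the low-level arrangement gives the higher value, consistent with the conditions stated above.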
For this two-unit system, we can compute the unreliability of the system and, from the
complementary nature of R(t) and F(t), obtain the reliability. The unreliability of the system
is the probability that all components have failed by time t, i.e.,
$$Q_s(t) = P[(t_1 < t) \cap (t_2 < t) \cap (t_3 < t) \cap \dots \cap (t_n < t)]$$
If the failure mechanisms of the components are independent, then:
$$Q_s(t) = P(t_1 < t)\,P(t_2 < t)\,P(t_3 < t) \cdots P(t_n < t)$$
but $P(t_i < t) = 1 - P(t_i > t) = 1 - R_i(t)$, so
$$Q_s(t) = \prod_{i=1}^{n} [1 - R_i(t)], \qquad R(t) = 1 - \prod_{i=1}^{n} [1 - R_i(t)]$$
If $R_i(t) = e^{-\lambda_i t}$ then
$$R(t) = 1 - \prod_{i=1}^{n} \left(1 - e^{-\lambda_i t}\right)$$
For two identical units, $R(t) = 1 - \left(1 - e^{-\lambda t}\right)\left(1 - e^{-\lambda t}\right) = 2e^{-\lambda t} - e^{-2\lambda t}$
Note: Failure rates in the exponential case are summed to combine independent series elements in
reliability analysis. In general the exponents are summed when a product of exponential
elements is desired.
Fig. 12: Success modes (Mode 1 and Mode 2) for a 2-unit standby system with switching
The system operation is as follows. First the primary unit is switched on with the other unit in standby.
Should the primary unit fail, the switching mechanism (a perfect switch) switches over to the
standby unit, which then works until time t. This results in two success modes, as depicted in Fig. 12.
Mode 1: Primary unit works till end of life, t.
Mode 2: Primary unit fails at t1 and the standby unit takes over and works till t.
$$R_s(t) = P[(t_1 > t) \cup (t_1 \le t \cap t_2 > t - t_1)]$$
Assuming that the success modes are mutually exclusive, then
$$R_{s2}(t) = P[(t_1 > t)] + P[(t_1 \le t \cap t_2 > t - t_1)]$$
$$R_{s2}(t) = R_1(t) + \int_0^t f_1(t_1)\,R_2(t - t_1)\,dt_1$$
If $\lambda_1 = \lambda_2 = \lambda$:
$$R_{s2}(t) = e^{-\lambda t} + \int_0^t \lambda e^{-\lambda t_1}\, e^{-\lambda (t - t_1)}\,dt_1 = e^{-\lambda t} + \lambda t\, e^{-\lambda t} = e^{-\lambda t}(1 + \lambda t)$$
In general, for an n-unit standby system with identical components, one primary and (n-1)
standby units, the system reliability is given by
$$R_{sn}(t) = e^{-\lambda t} \sum_{i=0}^{n-1} \frac{(\lambda t)^i}{i!}$$
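The general standby formula is easy to evaluate directly. The sketch below assumes perfect switching and identical exponential units, as in the derivation; the failure rate and mission time are hypothetical values, and the active-parallel function is included only for comparison.

```python
import math

def standby_reliability(lam, t, n_units):
    """R(t) = exp(-lam t) * sum_{i=0}^{n-1} (lam t)^i / i!  (perfect switching)."""
    x = lam * t
    return math.exp(-x) * sum(x ** i / math.factorial(i) for i in range(n_units))

def active_parallel_two_units(lam, t):
    """R(t) = 2 exp(-lam t) - exp(-2 lam t) for two identical active units."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)

lam, t = 0.02, 50.0   # hypothetical failure rate (per hour) and mission time (hours)
print(standby_reliability(lam, t, 2))        # exp(-1) * (1 + 1)
print(active_parallel_two_units(lam, t))
print(standby_reliability(lam, t, 3))
```

With a perfect switch, the two-unit standby system is more reliable than the two-unit active parallel system for the same units, because the standby unit does not age while waiting.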
3.4 Imperfect Switching
There are several ways in which a switch can fail. The failure modes depend on the switching
mechanism and the system.
Case I: The switch fails to operate when called upon.
In case I, the probability that the switch performs when called upon to do so is ps.
For the two-unit standby system, the system reliability is given as:
$$R_s(t) = R_1(t) + p_s \int_0^t f_1(t_1)\,R_2(t - t_1)\,dt_1 = e^{-\lambda t}\left[1 + p_s \lambda t\right]$$
for a constant failure rate system.
Example: Let ps = 0.99 and λ = 0.02/hr for a two-unit standby system with constant failure rate. Find the
reliability for a mission time of 50 hours.
Solution: $R_s(50) = e^{-\lambda t}\left[1 + p_s \lambda t\right] = e^{-0.02(50)}\left[1 + 0.99(50)(0.02)\right] = 0.7321$
Case II: The switch is a complex piece of equipment with a constant failure rate equal to λsw.
In case II, $R_{sw}(t) = e^{-\lambda_{sw} t}$ is the reliability of the switching mechanism.
For the two-unit standby system:
$$R_s(t) = P[(t_1 > t) \cup (t_1 \le t \cap t_{sw} > t_1 \cap t_2 > t - t_1)]$$
$$R_s(t) = R_1(t) + \int_0^t f_1(t_1)\,R_{sw}(t_1)\,R_2(t - t_1)\,dt_1 = R_1(t) + \int_0^t f_1(t_1)\,e^{-\lambda_{sw} t_1}\,R_2(t - t_1)\,dt_1$$
$$R_s(t) = e^{-\lambda t}\left[1 + \frac{\lambda}{\lambda_{sw}}\left(1 - e^{-\lambda_{sw} t}\right)\right], \quad t \ge 0$$
Example: A two-unit standby system has a switch with constant failure rate λsw = 0.001/hr.
The two units have identical constant failure rates of λ = 0.04/hr. Find R(60).
$$R_s(60) = e^{-0.04(60)}\left[1 + \frac{0.04}{0.001}\left(1 - e^{-0.001(60)}\right)\right] = 0.302$$
Example: Consider a two-unit standby redundant system whose switch has a constant failure rate of λs.
If the switch fails, the system fails. In this system both units have identical time-to-failure pdfs given
by f(t) = λ exp(-λt).
(a) Find the system reliability function.
(b) If λs = 0.01/hr and the subsystems both have a constant failure rate λ = 0.02/hr, find R(50).
Solution
$$R_s(t) = P[(t_1 > t \cap t_{sw} > t) \cup (t_1 \le t \cap t_2 > (t - t_1) \cap t_{sw} > t)]$$
$$= R_1(t)R_{sw}(t) + R_{sw}(t)\int_0^t f_1(t_1)\,R_2(t - t_1)\,dt_1 \;\Rightarrow\; R_s(t) = e^{-\lambda_s t}\left[R(t) + \int_0^t f(t_1)\,R(t - t_1)\,dt_1\right]$$
$$R_s(t) = e^{-0.01t}\left[e^{-0.02t} + \int_0^t 0.02\,e^{-0.02 t_1}\,e^{-0.02(t - t_1)}\,dt_1\right] = e^{-0.03t}\left[1 + 0.02t\right]$$
$$R(50) = e^{-0.03(50)}\left[1 + 0.02(50)\right] = 0.45$$
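The three switching models discussed in this section can be evaluated side by side. This is a sketch assuming identical exponential units throughout; the function names are illustrative, and the inputs are the values given in the three worked examples.

```python
import math

def standby_switch_on_demand(lam, t, p_s):
    """Case I: switch works on demand with probability p_s."""
    return math.exp(-lam * t) * (1.0 + p_s * lam * t)

def standby_switch_failure_rate(lam, lam_sw, t):
    """Case II: switch must survive until the changeover at t1."""
    return math.exp(-lam * t) * (1.0 + (lam / lam_sw) * (1.0 - math.exp(-lam_sw * t)))

def standby_switch_critical(lam, lam_sw, t):
    """Switch failure fails the whole system: exp(-(lam + lam_sw) t) * (1 + lam t)."""
    return math.exp(-(lam + lam_sw) * t) * (1.0 + lam * t)

print(round(standby_switch_on_demand(0.02, 50.0, 0.99), 4))
print(round(standby_switch_failure_rate(0.04, 0.001, 60.0), 4))
print(round(standby_switch_critical(0.02, 0.01, 50.0), 4))
```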
Many other types of switch failure may be encountered in practical situations. For example, a switch
may fail to hold a subsystem online or the switch may inadvertently sense a failure.
3.5 Shared Load Models
In this type of configuration, the parallel subsystems share the load equally and as a subsystem
fails, the surviving subsystem must sustain the increased load. Thus, as successive subsystems fail the
failure rate of the surviving components increases.
Example: A shared load parallel configuration would be when cables are used to support a
load or bolts are used to support a machine component. In each case the supporting cables or bolts
equally share in the support of the system.
Define the following system parameters:
fh(t) = pdf for time to failure under half load, ff(t) = pdf for time to failure under full load
In the ensuing analysis, it is assumed that when a failure occurs, the survivor follows the pdf
ff(t), and that this pdf does not depend on the interval of elapsed time.
Fig 14: Success Modes (Mode 1, Mode 2 and Mode 3) for Shared Load Parallel System
2. For Mode 2, both units work for a while, then subsystem 1 fails at t1 and subsystem 2 works
to completion at time t.
$$P[(t_1 \le t,\ \text{under half load}) \cap (t_2 > t_1,\ \text{under half load}) \cap (t_2 > t - t_1,\ \text{under full load})]$$
$$= \int_0^t f_h(t_1)\,R_h(t_1)\,R_f(t - t_1)\,dt_1, \quad \text{where } R_f(t) = \int_t^\infty f_f(\tau)\,d\tau$$
3. For Mode 3, the third mode is identical to the second. Both work for a while, then subsystem
2 fails at t2 and subsystem 1 works to completion at time t.
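If we assume the components are identical then the system reliability is:
$$R_s(t) = [R_h(t)]^2 + 2\int_0^t f_h(t_1)\,R_h(t_1)\,R_f(t - t_1)\,dt_1$$
Let λh = half-load failure rate and λf = full-load failure rate. Then
$$R_s(t) = e^{-2\lambda_h t} + 2\lambda_h \int_0^t e^{-\lambda_h \tau}\, e^{-\lambda_h \tau}\, e^{-\lambda_f (t - \tau)}\,d\tau$$
$$R_s(t) = e^{-2\lambda_h t} + \frac{2\lambda_h}{2\lambda_h - \lambda_f}\left[e^{-\lambda_f t} - e^{-2\lambda_h t}\right], \quad t \ge 0$$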
Example: Assume that the shared load parallel system has constant failure rate and with
f =0.001/hr, and h =0.05/hr. Find the reliability of the system at t=300 hrs. Notice the failure rate
at full load is much higher than that at half load. The reason is that at full load a component is at
higher risk of failure than when it is working at half load.
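$$R_s(t) = e^{-2\lambda_h t} + \frac{2\lambda_h}{2\lambda_h - \lambda_f}\left[e^{-\lambda_f t} - e^{-2\lambda_h t}\right]$$
$$R_s(300) = e^{-2(0.05)(300)} + \frac{2(0.05)}{2(0.05) - 0.001}\left[e^{-0.001(300)} - e^{-2(0.05)(300)}\right] = 0.7483$$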
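The closed-form shared-load reliability can be evaluated with the sketch below. It assumes two identical units with constant failure rates and requires that 2λh differ from λf (otherwise the closed form divides by zero); the function name is illustrative, and the inputs are the rates stated in the example.

```python
import math

def shared_load_reliability(lam_half, lam_full, t):
    """Two-unit shared-load system with constant failure rates.

    R(t) = exp(-2 lh t) + 2 lh / (2 lh - lf) * (exp(-lf t) - exp(-2 lh t))
    Assumes 2 * lam_half != lam_full.
    """
    both = 2.0 * lam_half                       # combined rate while both units share the load
    both_survive = math.exp(-both * t)          # Mode 1: neither unit fails
    one_survives = (both / (both - lam_full)) * (math.exp(-lam_full * t) - both_survive)
    return both_survive + one_survives          # Modes 2 and 3 combined

print(round(shared_load_reliability(0.05, 0.001, 300.0), 4))
```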
• the probability of failure
• the number of failures
• the time required to make repairs
For such considerations, two new metrics (or parameters) of system effectiveness become the focus
of attention, namely (i) Availability and (ii) Maintainability. Other related measures include:
• reparability
• operational readiness
• intrinsic availability, and
• serviceability
Figure 15: System Effectiveness Measures (time categories shown: operating time, free time, down time, active repair time, admin time, A&L time)
4.1.2 Reparability
This is the probability that a system will be restored to a satisfactory condition in a specified
interval of active repair time. This metric is very valuable to management since it helps to quantify
the workload for the repair crew. A major concern with reparability is secondary failure during
repair or maintenance. A secondary failure is the failure of an item caused by the failure of another
item, whether through repair, maintenance or sheer inducement, and it may also affect performance.
4.1.6 Maintainability
This is the probability that a system can be repaired in a given interval of downtime.
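If the system or its components cannot be repaired, then the point availability at time t is simply the
probability that it has not failed between time 0 and t, hence
$$A(t) = R(t)$$
The availability averaged over the interval (0, T) is
$$A^*(T) = \frac{1}{T}\int_0^T A(t)\,dt$$
Substituting $A(t) = R(t)$:
$$A^*(T) = \frac{1}{T}\int_0^T R(t)\,dt$$
As $T \to \infty$, $\int_0^T R(t)\,dt \to \text{MTTF}$. Hence
$$A^*(\infty) = \lim_{T \to \infty} \frac{\text{MTTF}}{T} = 0$$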
This result is quite intuitive given our assumption. Since all systems eventually fail and there is no
repair, then the availability averaged over an infinitely long time is zero.
The asymptotic availability is especially useful when both the failure and repair processes are
exponentially distributed. It is also useful for evaluating the overall availability since for a reasonable
time period T, availability is insensitive to the details of repair and failure process. It may also be used
as an approximation even when the failure and repair distributions are not exponential.
Example: A non-repairable system has a known MTTF and is characterized by a constant failure rate.
The system mission availability must be 0.95. Find the maximum design life that can be tolerated in
terms of the MTTF?
$$R(T) = e^{-\lambda T}$$
$$A^*(T) = \frac{1}{T}\int_0^T e^{-\lambda t}\,dt = \frac{1}{\lambda T}\left(1 - e^{-\lambda T}\right)$$
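The expression for A*(T) has no closed-form inverse, so the design life for a target availability has to be found numerically. The sketch below uses simple bisection on x = λT = T/MTTF; the bracket, the tolerance and the function names are assumptions made for illustration.

```python
import math

def interval_availability(x):
    """A*(T) = (1 - exp(-x)) / x, where x = lambda * T = T / MTTF."""
    return -math.expm1(-x) / x

def max_design_life_ratio(target, lo=1e-9, hi=50.0, tol=1e-10):
    """Largest x = T / MTTF with A*(T) >= target, found by bisection.

    Relies on A*(T) decreasing monotonically from 1 as x grows.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if interval_availability(mid) > target:
            lo = mid      # availability still above target: T can be longer
        else:
            hi = mid
    return 0.5 * (lo + hi)

ratio = max_design_life_ratio(0.95)
print(round(ratio, 4))    # design life as a fraction of the MTTF
```

For a target of 0.95 the design life comes out at roughly a tenth of the MTTF.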
Although the exponential may not reflect the details of the repair time distribution very accurately, it
provides a reasonable approximation for predicting availability, since availability tends to depend more on
the MTTR than on the details of the distribution. Therefore, even when the pdf of the repair time
is clustered about the MTTR and the resulting distribution does not appear to be
exponential, the constant repair rate (exponential) model seems to adequately predict the
asymptotic availability.
So long as failures are revealed immediately, the time to repair is the primary factor that
determines availability. If the system is not in continuous operation (as in standby), failures may occur
but remain unrevealed. The primary loss of availability will then be failures in standby mode that are
not detected until an attempt is made to use the system. A primary solution to this failure type is periodic
testing. While periodic testing may help detect more failures, it may also lead to loss of availability due
to the downtime for testing. The longer it takes to detect failures, the lower the system availability.
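With λ the failure rate and μ the repair rate:
$$\frac{d}{dt}A(t) + (\lambda + \mu)A(t) = \mu, \qquad \text{integrating factor } p = e^{\int (\lambda + \mu)\,dt} = e^{(\lambda + \mu)t}$$
$$\frac{d}{dt}\left(A(t)\,e^{(\lambda + \mu)t}\right) = \mu\,e^{(\lambda + \mu)t} \;\Rightarrow\; A(t)\,e^{(\lambda + \mu)t} = \frac{\mu}{\lambda + \mu}\,e^{(\lambda + \mu)t} + C$$
At $t = 0$, $A(0) = 1$. Hence
$$C = 1 - \frac{\mu}{\lambda + \mu} = \frac{\lambda}{\lambda + \mu}$$
$$A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\,e^{-(\lambda + \mu)t}$$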
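As t becomes large, the availability A(t) clearly approaches a constant value. The steady state availability
is given by
$$A^* = \lim_{t \to \infty} A(t) = \frac{\mu}{\lambda + \mu} = \frac{\text{Mean repair rate}}{\text{Mean failure rate} + \text{Mean repair rate}}$$
If we rewrite $\dfrac{\mu}{\lambda + \mu}$ as
$$\frac{\mu}{\lambda + \mu} = \frac{1}{1 + \dfrac{\lambda}{\mu}} = \frac{1}{1 + \dfrac{1/\text{MTTF}}{1/\text{MTTR}}} = \frac{1}{1 + \dfrac{\text{MTTR}}{\text{MTTF}}}$$
hence
$$A^*(\infty) = A(\infty) = \frac{\text{Mean time to failure}}{\text{Mean time to repair} + \text{Mean time to failure}}$$
$$A(\infty) = \frac{\text{MTTF}}{\text{MTTF} + \text{MTTR}}$$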
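The point availability and its steady-state limit can be evaluated as follows. This is a minimal sketch assuming constant failure and repair rates; the rate values are hypothetical and the function names are illustrative.

```python
import math

def point_availability(lam, mu, t):
    """A(t) = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu) t), with A(0) = 1."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

def steady_state_availability(mttf, mttr):
    """A(inf) = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)

lam = 0.01   # hypothetical failure rate (per hour)
mu = 0.5     # hypothetical repair rate (per hour)

for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(point_availability(lam, mu, t), 5))

# The limit agrees with MTTF / (MTTF + MTTR), where MTTF = 1/lam and MTTR = 1/mu.
print(round(steady_state_availability(1.0 / lam, 1.0 / mu), 5))
```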
Example: In the following table (Table 4), the times (in days) over a 6-month period at which failure
of a production line occurred (tf) and times (tr) at which the plant was brought back online following
repair are as shown. Question:
(a). Calculate the 6-month availability from the plant data
(b). Estimate the MTTF and the MTTR from the data
(c). Estimate interval (steady state) availability
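a) During the 6 months (182.5 days) there were 10 failures. The unavailability is
$$\tilde{A}(T) = \frac{1}{T}\sum_{i=1}^{10} (t_{ri} - t_{fi}) = \frac{1}{182.5}\left[0.2 + 0.6 + 0.4 + 1.9 + 0.3 + 0.9 + 0.1 + 3.7 + 3.3 + 0.1\right] = 0.0630$$
$$A(T) = 1 - \tilde{A}(T) = 1 - 0.0630 = 0.937$$
b)
$$\text{MTTF} = \frac{1}{10}\left[12.8 + 1.2 + 10.6 + 5.6 + 2.0 + 20.8 + 5.4 + 68.4 + 11.8 + 27.0\right] = \frac{1}{10}(165.6) = 16.56 \text{ days}$$
$$\text{MTTR} = \frac{1}{N}\sum_{i=1}^{10} (t_{ri} - t_{fi}) = \frac{1}{10}\left[0.2 + 0.6 + 0.4 + 1.9 + 0.3 + 0.9 + 0.1 + 3.7 + 3.3 + 0.1\right] = 1.15 \text{ days}$$
c)
$$A^*(\infty) = \frac{\mu}{\lambda + \mu} = \frac{\mu/\mu}{(\lambda/\mu) + (\mu/\mu)} = \frac{1}{1 + \dfrac{\text{MTTR}}{\text{MTTF}}} = \frac{1}{1 + \dfrac{1.15}{16.56}} = 0.935$$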
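The arithmetic of this example can be reproduced from the repair durations and operating periods listed in the solution. The sketch below takes those two lists as given (the underlying failure and repair timestamps of Table 4 are not used), and the variable names are illustrative.

```python
repair_durations = [0.2, 0.6, 0.4, 1.9, 0.3, 0.9, 0.1, 3.7, 3.3, 0.1]          # days down
operating_periods = [12.8, 1.2, 10.6, 5.6, 2.0, 20.8, 5.4, 68.4, 11.8, 27.0]    # days up
observation_days = 182.5

# (a) Availability over the observation period.
unavailability = sum(repair_durations) / observation_days
availability = 1.0 - unavailability

# (b) Mean time to failure and mean time to repair.
mttf = sum(operating_periods) / len(operating_periods)
mttr = sum(repair_durations) / len(repair_durations)

# (c) Steady-state availability from the two means.
steady_state = mttf / (mttf + mttr)

print(round(availability, 3), round(mttf, 2), round(mttr, 2), round(steady_state, 3))
```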
W: Wheel Cylinder
L: Hydraulic Lines
Note: Safe braking is achieved when either the front brake works or the rear brake works or both.
R(M) = 0.995, R(Wi) = 0.999, R(Li) = 0.999
For each of these in series
For the Top half:
fatigue and other internal and environmental problems that negatively affect the long-term operability
of the system. Although the primary metric for judging effective maintenance is the resulting increase
in reliability after maintenance, the criterion most often considered is system availability, especially in
the case of repairable or maintained systems. Availability is defined in this case as the probability that
the system will be operational when needed. The distinction is clear. Generally, reliability in a strict
sense refers to unmaintained or non-repairable systems, whereas availability refers to maintained
or repairable systems. For satellites and one-shot space systems, we talk about reliability, whereas for
cars or machinery we talk about availability, again in the strictest sense. We also make reference to
the notion of idealized maintenance, which is a very rare case where maintenance returns the system
to as-good-as-new condition.
Considerable maintenance benefits can be realized when the maintenance intervals are chosen
such that, for a given system, the positive effects of wear-out time (increasing failure rate) are greater
than the negative effects of wear-in time (decreasing failure rate). This may apply especially in a
system with different components. In such a case, it would be better to perform maintenance only
on those elements for which the wear-out effect dominates. For example, one may choose to replace
worn spark plugs in a car rather than replace a fuel injector (which may be defective) with a new one.
There are three basic types of maintenance, namely, Preventive, Predictive and Corrective,
and they are delineated as follows:
• Preventive (PM) – involves greasing, oiling, changing filters
• Predictive (PdM) – Inspections
• Corrective – Repairs
In general maintenance does not return the system to as good as new condition. A system that
has undergone any form of maintenance can be in one of several states after repair, namely:
a) As good as new
b) New better than old
c) As bad as old
d) Worse than old
instructions regarding the specific plant or equipment. The primary goal of PM is to preserve and
enhance equipment reliability. Therefore, any planned activity that increases the life of the plant or
equipment/component and helps such an entity to run more efficiently is desirable. Examples of PM
include tasks such as:
• Oil changes,
• Greasing,
• Changing filters,
• Belt tightening.
Preventive maintenance should be performed on equipment as recommended by the Original
Equipment Manufacturer (OEM). However, we must determine whether the cost of performing PM is
greater than the replacement cost. If the PM cost is higher than the replacement cost, then
consideration should be given to replacement of the unit. Typically, equipment manufacturers outline
preventive maintenance procedures and guidelines in the OEM manuals, including:
• Oil and/or grease types, and quantities
• Time periods (weekly, monthly, quarterly)
• V-belt inspections & Torque settings
• General visual inspections
These guidelines should be used when creating a PM program.
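The comparison between PM cost and replacement cost described above can be sketched as follows. This is a minimal illustration, not a prescribed procedure: the function names, the assumption that PM cost is labor time multiplied by a labor rate plus materials, and the choice to total PM cost over a planning horizon are all hypothetical.

```python
def pm_cost_per_event(labor_hours, labor_rate, materials_cost):
    """Cost of one PM event: time spent times labor rate, plus materials (assumed model)."""
    return labor_hours * labor_rate + materials_cost


def recommend_action(cost_per_event, events_over_horizon, replacement_cost):
    """Return 'replace' if total PM cost over the horizon exceeds the replacement cost."""
    total_pm_cost = cost_per_event * events_over_horizon
    return "replace" if total_pm_cost > replacement_cost else "maintain"


if __name__ == "__main__":
    # Hypothetical figures: 2 hours at 50 per hour plus 30 in materials, 4 PM events.
    cost = pm_cost_per_event(labor_hours=2.0, labor_rate=50.0, materials_cost=30.0)
    print(cost, recommend_action(cost, events_over_horizon=4, replacement_cost=400.0))
```

In practice the comparison would also account for downtime and the remaining life of the unit, which this sketch leaves out.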
In an effort to raise awareness of the importance of regular PM for a common household furnace,
the US Department of Energy (DOE) reported in a recent study that, for every 10 minutes it runs, an
average furnace releases the energy equivalent of 3.5 sticks of dynamite. The lesson to
be taken away from this study is that not performing PM wastes energy and costs money.
In addition to the guidelines and procedures that manufacturers provide in their manuals, the
American National Standards Institute (ANSI) specifies standards and recommendations for PM to help
businesses determine the type and frequency of inspections and maintenance procedures, define the
minimum requirements for servicing and maintaining plant equipment, serve as a comprehensive
maintenance checklist, and supplement more specific instructions, manufacturer publications, and
other standards. Through the application of these standards, industrial firms can improve automation
and operate more efficiently, produce higher quality products, minimize energy consumption, reduce
insurance inventories and business loss due to production delays, and increase overall safety levels. In
addition, preventive maintenance measures can drastically reduce errors in day-to-day operations, as
well as increase the overall preparedness of plants in the case of an emergency.
For PM, we must choose the maintenance interval for which the positive effect on wear-out
is greater than the negative effect on wear-in. Typically, PM is performed on those components
for which the wear-out effect dominates. Even with wear-out present, the constant failure rate model
may suffice. Wear-in, like burn-in, is a period of stabilization for the components.
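To illustrate the choice of interval, the sketch below assumes a Weibull life distribution and idealized (as-good-as-new) PM performed every T hours, for which the reliability with maintenance is R_M(t) = R(T)^N R(t - NT), where N is the number of completed PM intervals. The Weibull form, the function names and all numerical values are assumptions made for illustration only.

```python
import math


def weibull_reliability(t, scale, shape):
    """R(t) = exp(-(t/scale)**shape) for an assumed Weibull life distribution."""
    return math.exp(-((t / scale) ** shape))


def reliability_with_ideal_pm(t, interval, scale, shape):
    """Reliability at time t with as-good-as-new PM every `interval` hours.

    R_M(t) = R(T)**N * R(t - N*T), with N = floor(t / T).
    """
    n = int(t // interval)
    return (weibull_reliability(interval, scale, shape) ** n
            * weibull_reliability(t - n * interval, scale, shape))


if __name__ == "__main__":
    # Hypothetical values: mission time, PM interval and Weibull scale in hours.
    t, interval, scale = 1000.0, 250.0, 1000.0
    # shape < 1: decreasing failure rate (wear-in); shape > 1: increasing (wear-out).
    for shape, label in [(0.5, "wear-in"), (1.0, "constant"), (2.0, "wear-out")]:
        base = weibull_reliability(t, scale, shape)
        with_pm = reliability_with_ideal_pm(t, interval, scale, shape)
        print(f"{label:9s} shape={shape}: no PM {base:.4f}, with PM {with_pm:.4f}")
```

With these hypothetical values, PM raises reliability only when the shape parameter is 2.0 (wear-out), leaves it unchanged when it is 1.0 (constant failure rate), and lowers it when it is 0.5 (wear-in).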
As a general rule, only trained, qualified maintenance personnel should perform PM activities.
6.4 Summary
In summary, properly orchestrated maintenance programs have significant payoffs, including but not
limited to:
• Keeping equipment running longer
• Allowing for scheduled, budgeted repairs
• Reducing unscheduled downtime
• Making life less stressful
Summary
Reliability Engineering is concerned with the design, implementation, and prediction of the
life profile of a system or component using a disciplined analysis approach that has strong roots in
statistics, mathematics and engineering. Given a system, subsystem or component, one of the major
challenges of reliability analysis is to provide an understanding of the inherent failure mechanisms that
undergird such a system and to develop the appropriate analytical scheme to determine the system's
life profile. The problem becomes even more daunting given the phenomenon of aging and related
transient phenomena, as well as the practical reality of little or no data. Today, these challenges still
persist, especially as products get more and more miniaturized and as companies try to shorten the
time to market in order to gain market share. This first course in a two-course sequence has examined some
of the basic issues related to reliability such as:
• The various viewpoints of reliability, especially the engineering design viewpoint.
• The use of a nonparametric approach to estimate the reliability function.
• The performance measures used to characterize reliability.
• Appropriate reliability-based intervention strategies that lead to an optimally maintained system.
• Availability, maintainability and performability measures.
The second course in the sequence will focus on the all-important areas of dependency analysis,
interference theory, data analysis and testing.
References
1. Elsayed, E. A. (1996), Reliability Engineering, Addison Wesley, Reading, MA.
2. Lewis, E. E. (1995), Introduction to Reliability Engineering, Wiley & Sons, New York.
3. Tobias, P. A. and Trindade, D. C. (1994), Applied Reliability, 2nd ed., Van Nostrand
Reinhold, New York.
4. Kapur, K. C. and Lamberson, L. R. (1977), Reliability in Engineering Design, Wiley & Sons, New York.
5. Nelson, W. (1990), Accelerated Testing: Statistical Models, Test Plans, and Data Analyses,
Wiley & Sons, New York.
6. Nelson, W. (1982), Applied Life Data Analysis, Wiley & Sons, New York.
7. Barlow, R. E. and Proschan, F. (1975), Statistical Theory of Reliability and Life Testing, Holt,
Rinehart, and Winston, New York.
8. Operations & Maintenance Best Practices, A Guide to Achieving Operational Efficiency: Chapter 5,
Types of Maintenance Programs, Release 3.0, US Dept of Energy (USDOE), August 2010.