
INTRODUCTION TO RELIABILITY

ENGINEERING
Prof. Neeraj Kumar Goyal
Multidisciplinary
IIT Kharagpur
INDEX

S.NO TOPICS PAGE.NO


Week 1
1 Lecture 01: Introduction to Reliability Engineering 3
2 Lecture 02: Introduction to Reliability Engineering 17
3 Lecture 03: Introduction to Reliability Engineering 31
4 Lecture 04: Probability Basics 47
5 Lecture 05: Probability Basics (Contd.) 68

Week 2
6 Lecture 06: Constant Failure Rate Model-I 91
7 Lecture 07: Constant Failure Rate Model-II 112
8 Lecture 08: Constant Failure Rate Model-III 128
9 Lecture 09: Two Parameter Exponential Distribution 142
10 Lecture 10: Weibull Distribution (2 Parameter) 155

Week 3
11 Lecture 11: Burn-in Screening for Weibull 190
12 Lecture 12: Weibull Distribution 220
13 Lecture 13: Normal Distribution 247
14 Lecture 14: Lognormal Distribution 271
15 Lecture 15: System Reliability Modelling 284

Week 4
16 Lecture 16: System Reliability Modelling (Contd.) 298
17 Lecture 17: System Reliability Modelling (Contd.) 313
18 Lecture 18: System Reliability Modelling (Contd.) 327
19 Lecture 19: System Reliability Modelling (Contd.) 340
20 Lecture 20: System Reliability Modelling (Contd.) 356

Week 5
21 Lecture 21: Markov Analysis 372
22 Lecture 22: Markov Analysis (Contd.) 388
23 Lecture 23: Markov Analysis (Contd.) 404
24 Lecture 24: Markov Analysis (Contd.) 419
25 Lecture 25: Markov Analysis (Contd.) 434

Week 6
26 Lecture 26: Failure Data Analysis: Non-Parametric Approach 445
27 Lecture 27: Failure Data Analysis: Non-Parametric Approach (Contd.) 467
28 Lecture 28: Failure Data Analysis: Non-Parametric Approach (Contd.) 485
29 Lecture 29: Failure Data Analysis: Non-Parametric Approach (Contd.) 502
30 Lecture 30: Failure Data Analysis (Parametric) 514

Week 7
31 Lecture 31: Failure Data Analysis (Parametric) (Contd.) 537
32 Lecture 32: Failure Data Analysis (Parametric) (Contd.) 551
33 Lecture 33: Goodness of Fit 563
34 Lecture 34: Goodness of Fit (GoF) Tests 583
35 Lecture 35: Goodness of Fit (GoF) Tests (Contd.) 607

Week 8
36 Lecture 36: Maintainability and Availability 619
37 Lecture 37: Maintainability and Availability (Contd.) 636
38 Lecture 38: Maintainability and Availability (Contd.) 656
39 Lecture 39: Maintainability and Availability (Contd.) 671
40 Lecture 40: Summary of the course Introduction to Reliability Engineering 698

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 01
Introduction to Reliability Engineering
Good morning, everyone. I welcome you all to the first lecture of the course Introduction to
Reliability Engineering. During this eight-week course, we will cover the fundamental
concepts of reliability and emphasize their significance. Specifically, we will explore how
probability and statistics contribute to assessing reliability indices. So, in this first lecture
today, we will be discussing basic terms, definitions, and why they are essential.

(Refer Slide Time: 1:08)

Allow me to provide a brief introduction about myself. My name is Professor Neeraj Kumar
Goyal, and I hold the position of Associate Professor at the Subir Chowdhury School of
Quality and Reliability at IIT Kharagpur. I earned my PhD in Reliability Engineering from
IIT Kharagpur. In the year 2000, I completed my BE in Electronics and Communication from
MNIT Jaipur. Additionally, I worked at Secure Meters Limited in Udaipur for one year, from
2000 to 2001. I have carried out research and consultancy projects for many
organizations, including DRDO, MTRDC, DIC, ISRO, NPCIL, BARC, Secure Meters, Vodafone,
AERB, Railways, MHRD, DST, the Indian Army, IGCAR, John Deere, and CESC.

I have also guided students: 7 PhDs have been completed, 5 are ongoing, and 2 MS
degrees have been completed.

(Refer Slide Time: 2:08)

I have collaborated on books with my scholars. So far, we have authored three books.
The first is on early software reliability prediction using a fuzzy
logic approach. The second is on artificial neural network applications for software reliability
prediction. The third is on interconnection network reliability evaluation. I have
also written around 45 journal papers and 13 conference papers. Here is a
collection of consultancy and research projects we have done.

(Refer Slide Time: 2:43)

So, before we delve into the details of reliability engineering, we will
first go over some basic terms and definitions that will be used throughout this subject.
Four basic terms are used for performance evaluation: reliability,
availability, maintainability, and safety. We will discuss these one by one.

(Refer Slide Time: 3:09)

Before going into detail, we will first discuss what is meant by these terms. Here we aim
to develop a general understanding of what these terms are and how they can be used for our
purposes. Before that, let us also discuss where reliability engineering
concepts are applicable. They are generally applicable to most
industries as well as to everyday life. In industry, these concepts are becoming more and
more valuable; nowadays, for example, there is a demand for RAMS engineers. RAMS means
reliability, availability, maintainability, and safety.

We also have a high demand for reliability engineers. Now, what do these engineers do?
They evaluate the product, its life, and the possible problems the product
may face during its lifecycle, and they try to predict those problems. They also try to
address them as much as possible during the design, manufacturing, and operation or
maintenance phases. Here, knowledge of reliability engineering helps any organization to grow.
Earlier, we used to have systems that were very robust and very simple. So,
human dependency was also not very high.

These days, systems are complex. That means they have multiple devices and multiple
components, and these are interlinked. We also talk about
multidisciplinary systems, because no system is purely electronic or purely mechanical. So,
there are not only multiple components; there are also different types of components
coming together to perform certain functions. Now, reliability comes into the picture
because the products made by any organization are sold to customers.

So, let us take general electronics and home-use products. Suppose these products do not
work correctly for a sufficient period; for example, we get a TV at home and it fails within
three months. We will be very dissatisfied with the product and may not buy the same
product again. We will also not recommend it. Therefore, it is
essential that the products we make are not only able to do the task but are also able
to do it for an extended period of time.

So, the design life concept comes into the picture: the product is not only designed to
do certain things; it needs to do those things for a certain period. Hence, reliability
comes into the picture. Similarly, other concepts like maintainability, availability, and safety
come into view. Mostly these terms are considered to be customer-oriented. Who is
demanding reliability? The customer, the users, or even the general public, for example
when we are talking about the reliability of nuclear power plants, traffic systems, or
railway systems.

Now, these systems have safety consequences. Even if we are not using
them, we want them to be safe, because their failures can affect us: we
may be exposed to radiation or crushed on the road even though it is not our mistake.

So, for all these systems where there are safety concerns or high losses, we need to
make sure that they are reliable; if they are not reliable, those systems become useless.
Suppose we purchase a TV and it fails every 6 months; we
are not going to purchase a TV every 6 months, because that would be highly costly for us.
So, either because of cost or because of safety concerns, we have to ensure that these
products are able not only to do the job but to do it for the desired time period, or the
time period we generally expect, and that they are safe.

So, let us discuss some terms we will use throughout these lectures. The first term is reliability.
Why do we need reliability? We need it so that we
can understand the product's performance in terms of the duration for which the product works
without failure. So, if a product works for a longer period without failure, it has better reliability.

So, this is a measure we obtain from data to see how
long a product will work. Where is it applicable? As we discussed, reliability is applicable
almost everywhere. Everything we use has a certain life, and we want it to work for
that life without failure. But it matters most, as we discussed, wherever safety is essential,
wherever the product has a high cost, or wherever the product is used in high volume.

So, consumer products produced in high volume also need to be reliable, because if customers
come to know that they are failing fast, the company will lose the market. So, reliability,
meaning a longer duration of failure-free work, is something people demand, but it has to be
ensured by the supplier or the manufacturer. When is it useful? For
high-volume consumer products, unless
there is competition, there may not be much focus on reliability.

But when these products have multiple manufacturers, there is fierce competition. In this
competition, a product that is unreliable or less reliable will soon be phased out of the
market because customers will not pick it. So, we need to determine how much reliability
we expect from our product. Accordingly, we can decide our warranty policies,
maintenance requirements, etc. Next is maintainability. In maintainability, we talk
about the time required for maintenance, primarily repair time, but also other types of maintenance.

So, here the failed product goes for maintenance. Maintenance is done after a
failure, but sometimes we also do maintenance to prevent the device
from failing: if we can predict that the condition of the system is deteriorating, we can
repair it before it actually fails. That is done especially where the consequences in terms of
losses, safety, or human life are high.

Maintainability is about ensuring that maintenance time is optimized, so that we do not take
much time in maintenance. For example, consider the
refrigerator in our home. If our refrigerator fails, what happens?
Because of the failure, we cannot use that refrigerator. The purpose of a
refrigerator is to ensure that our food items, ice, etc. are not
wasted and remain usable.

Now, when this refrigerator fails, all of these functions are affected, and we cannot
use it for those purposes. Since I am losing the function, I will be dissatisfied as
a customer. But I can still tolerate it if my fridge is repaired within 1 day;
1 or 2 days I can easily manage. But if the fridge takes one week to repair, then what will
happen? I will be highly dissatisfied, and I may not buy a refrigerator from such a
company again.

So, I will prefer the company that can repair my fridge in the least time,
because that way my inconvenience is short. While the fridge is failed, I am not able to
use it, so whatever purpose it serves and whatever dependency I have on the refrigerator
is affected. So, maintainability comes into the picture.

So, when companies design their maintenance procedures to ensure that devices are repaired
in the least time, customers are more satisfied. This is generally not required for
use-and-throw items; it matters
mostly for somewhat costly devices and for engineering setups like nuclear
power plants, thermal power plants, and chemical industries.

So, all these big industries, steel industries, generators, and other big setups
need to be maintained, because if they fail, we will not throw them away; we need
to repair them. So, we need to ensure that the repair time is as small as possible. So, where
is it applicable? It is applicable wherever the items are costly or large,
mostly costly equipment and infrastructure items. In those
cases we want to make sure that our maintainability is high. High maintainability
means we are able to repair the item in less time.

Availability, likewise, is generally discussed for repairable systems. We are concerned
about the availability of the same kind of costly items we discussed for maintainability.
Availability considers a system that alternates between up and down periods.
Let us say this is our up period and this is our down period.
Down period means the system is under repair, and up period means the system is in
operation: it is working and functional.

Now, what is availability? Let us say this is the time to failure.
Generally, for reliability we use time to failure. How long is the working time? The system
started working at time t = 0, worked up to here, and then failed.
So, from 0 to that point is the time to failure (TTF). This is the time it has taken to fail,
which is the same as the time it has been operating. So, TTF is the operating time, while
the downtime is the time taken for repair, the time to repair (TTR).

So, we can call the repair time in the first instance TTR 1, in the second instance TTR 2, and
similarly we can have TTR 3. So, what is availability? A system is in one of two
states: either it is up or it is down. We want the system to be in the up state. So, what is the
probability that the system is in the up state? We can say it is uptime divided by uptime plus
downtime, where, over one cycle, we can take the average value of uptime and the average
value of downtime.

$$\text{Availability} = \frac{\text{Uptime}}{\text{Uptime} + \text{Downtime}}$$

So, we can say the average value of uptime is the MTTF (mean time to failure), the average of
TTF. The average downtime is the MTTR (mean time to repair), the average time taken to
repair, so that availability is MTTF divided by MTTF plus MTTR. We want high availability
because the up period is the period for which we are using the system. We can achieve high
availability either by having high reliability, which means the time to failure is high, or by
having high maintainability, which means the time to repair is low. If the time to
repair is low or the time to failure is high, we will observe high availability.
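
The relation between availability, MTTF, and MTTR can be checked with a few lines of Python. This is only a minimal sketch: the TTF and TTR values below are hypothetical numbers chosen for illustration, and the function name `availability` is an assumption, not something from the lecture slides.

```python
from statistics import mean

def availability(ttf_hours, ttr_hours):
    """Estimate availability as MTTF / (MTTF + MTTR)."""
    mttf = mean(ttf_hours)   # average uptime per cycle
    mttr = mean(ttr_hours)   # average downtime per cycle
    return mttf / (mttf + mttr)

# Hypothetical observations for three up/down cycles (hours).
ttf = [900.0, 1100.0, 1000.0]   # TTF 1, TTF 2, TTF 3
ttr = [20.0, 30.0, 10.0]        # TTR 1, TTR 2, TTR 3

base = availability(ttf, ttr)
faster_repair = availability(ttf, [t / 2 for t in ttr])  # better maintainability
longer_life = availability([t * 2 for t in ttf], ttr)    # better reliability

print(f"baseline availability:        {base:.4f}")
print(f"with halved repair time:      {faster_repair:.4f}")
print(f"with doubled time to failure: {longer_life:.4f}")
```

Both changes raise the availability above the baseline, which is the point made above: availability can be improved either through reliability or through maintainability.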

So, we want the system to be available; only if it is available are we able to use it for its
functions, as in the example of
the fridge. So, we want it to be highly available. First, we do not want it to fail. And if it
fails, it should be repaired quickly; if it is repaired quickly, my availability will be higher. So,
availability is a useful performance requirement for almost all systems with
high cost involvement, with safety involvement, or that are
bulky.

So, we need to evaluate availability so that we can understand for how much of its
overall life the system will be useful. Generally, for safety requirements, we
look for high reliability, whereas availability is the concern in cases where the consequences
of failure are not very high. Consequences mean whatever happens
after a failure. Mostly we are losing money; if we are only losing money and there is no threat
to life, then the failure has no safety consequences, only loss of function or
loss of money.

So, in loss-of-money scenarios, we will be working on availability. If
availability is high, we are able to use the system more, generate more money,
or satisfy our function for a longer time. But for safety requirements,
a failure is going to harm people, which is not permitted. So, for safety
requirements, we have to achieve high reliability; high reliability means the time to failure is
high, because we do not want the system to fail.

So, for those kinds of cases we are more interested in reliability, but for cost scenarios,
we are more concerned about availability. Availability is applicable to repairable systems,
wherever repair is possible. If a system cannot be repaired, we cannot talk
about maintainability or availability. Let us take satellites. Satellite repair
is very cumbersome, and most of the failures may not be repairable. So, if they are not
repairable, we will not be talking about maintainability or availability; we will mostly be
talking about reliability.

Then comes safety, which is measured in terms of risk. In reliability, every
failure counts towards unreliability, but not all failures lead to harm to people.
Safety is concerned with the cases where people may be harmed: people can get injured,
or there is a possibility of death. Whenever those kinds of
failures are possible, there is a safety concern for the system. So, wherever such
possibilities exist, we must ensure that safety is kept high.

So, we need to evaluate safety. How do we evaluate it? Safety is a subjective term,
so we cannot evaluate it directly. We evaluate it in terms of risk. Risk is the opposite of
safety, that is, unsafe means risky, and risk is a measurable term: we try
to estimate the probability that the system will be unsafe. This is applicable to
scenarios like railways, aircraft, nuclear power, thermal power, and chemical
plants. In all these cases, where there is a possibility of harm to human beings, safety or
risk has to be analyzed, and in all these cases we find it useful.

So, to explain the RAMS terms, let us take the example of a
passenger train. For a passenger train, reliability would mean that when we board the
train, we reach our destination without failure: from origin to destination
there is no failure.

Maintainability means that whenever the journey is completed, the system goes for
maintenance. During the journey, too, it may fail; sometimes we have observed that
while travelling by train, something goes wrong in between
and the train gets stuck. Maintainability deals with
how to address these failures: that means calling support, and how quickly they are able to
fix the problem, because if they take more time, we will be stuck on the way for a longer period.

Availability accounts for the fact that this train can be unavailable, either due to a failure
while we are travelling or because it is a vast system and, to keep it running,
we have to maintain it. So, it goes for maintenance; regular
maintenance such as cleaning, some repairs, and some replacements is carried out, and that takes
time. So, the train is not always available: there is regular
maintenance, and then there is periodic maintenance.

So, some time is lost in that. Availability tells us how much of the time the train is going to
be available to us, that is, how much time is not spent on repair, whether the repair is due to
a failure during travel or is done to keep the train in good health. Similarly, whenever we
talk about risk, we are talking about failures like brake failures or
wheel failures, or about the chances of an accident.

In all those cases, what will happen, how the system will behave, and how much harm
is possible are what we evaluate in terms of risk. I hope this is understandable. So, we will
continue to the next slide.

(Refer Slide Time: 24:22)

Let us go through the definitions of each term. What is reliability? Reliability is defined as
the probability that a product or system will perform the required function for a given period
of time when used under stated operating conditions. Reliability was initially
defined as an ability, but later the definition was changed and it was made a
probability. So, how do we measure reliability? We measure it in terms of
probability.

Probability of what? The probability that the product or system of concern
will perform the required function. So, we need to know what the functions of the
product are, and the product should continue to perform them for a given period
of time; it is not enough that it just worked and then failed. And why do we also mention
operating conditions here? We are making sure that the system works under the stated operating
conditions: the environment, the kind of use, and the way people are going to use it.

Because if the environment is harsh, or if people are abusing the device, the
probability will change and the probability of failure may become higher, so reliability
becomes poorer. So, whenever we promise reliability, we also state under
what conditions this reliability is valid. Then comes availability, which is similar to reliability.
It is also a probability: the probability that the product is in a state to perform the required
function. As we discussed, there are two states: the system can be in a working state or it
can be under repair.

So, what is the probability that the system is in a working state
under given conditions, at any given instant of time?
Sometimes availability is discussed over a
given time interval, but this is not used frequently. The most used form is at an instant of
time, that is, availability at time t: the probability that at time t the
system is in the working state, not in the failed or repair state, given that the required external
resources are provided. External resources matter because, for availability, maintenance may be involved.

So, maintenance may require external resources. Maintainability is
again a probability; as we see, all these terms are measured in terms of probability,
which is why we have to understand probability for this subject. Maintainability is the
probability that an item, under given conditions of use, is retained in good condition or is
restored from a failed condition to a condition in which it performs the required
function, when maintenance is performed under given conditions using standard procedures
and resources.

We cannot assume that maintenance will be done in an arbitrary way; whatever is the defined
way of doing maintenance should be considered here. If
there is a lack of procedures and resources, we cannot assess maintainability, because
maintenance may then sometimes take a very long time and will have high variability. In that
case, maintainability assessment becomes challenging.

So, for engineering systems, this covers the maintenance crew and the
availability of resources, both spare parts and the equipment
required for opening and repairing the machines. Then comes safety. Safety
is freedom from unacceptable risk to human health or the environment. So, anything that
can damage human health or the environment has safety consequences.
Safety is expressed in terms of risk, which is the combination of the expected
frequency of loss and the expected degree of severity of the loss.

So, here we have to ensure that the probability of a certain type of loss is within certain
limits. If that probability of loss becomes high, our risk is high and becomes
unacceptable, and the system is
considered unsafe. But if the system lies within the acceptable risk, we
consider it safe and usable, because nothing can be made perfectly safe.
We cannot claim that a system is risk free or has zero risk; that kind of scenario does not
exist. So, we have to make the risk as low as practically possible, and in that case the
system is considered to be safe.
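
The idea of comparing risk against an acceptable limit can be sketched in code. Note the assumptions: the lecture only says risk is a "combination" of frequency and severity, so taking the product is one common choice assumed here, and the hazard names, numbers, and the limit `ACCEPTABLE_LIMIT` are all hypothetical.

```python
def risk(frequency_per_year, severity):
    """Risk as expected loss per year (assumed rule: frequency times severity)."""
    return frequency_per_year * severity

def is_acceptable(risk_value, limit):
    """A system is treated as safe when its risk does not exceed the limit."""
    return risk_value <= limit

# Hypothetical hazards: (expected frequency per year, severity in loss units).
hazards = {
    "brake failure": (1e-4, 1000.0),
    "door fault": (1e-2, 1.0),
}
ACCEPTABLE_LIMIT = 0.05  # hypothetical acceptable risk, loss units per year

assessment = {}
for name, (freq, sev) in hazards.items():
    r = risk(freq, sev)
    assessment[name] = (r, is_acceptable(r, ACCEPTABLE_LIMIT))
    print(f"{name}: risk = {r:.3f}, acceptable = {assessment[name][1]}")
```

In this sketch the rare but severe hazard carries the larger risk, which shows why frequency alone is not enough to judge safety.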

(Refer Slide Time: 29:55)

Let us discuss reliability versus safety. Reliability techniques increase reliability but may
not necessarily increase safety. Why? While increasing reliability, you may
improve the reliability of certain functions, but to increase safety you have to work
on those functions or failures that have harmful consequences; not all failures
lead to harm to the environment and people.

So, if you work on decreasing the failures that harm people and the
environment, then you are increasing safety. But if you work on other
failures of the device, it may improve reliability without improving
safety. That is because reliability concerns all failures, while safety concerns only some of
them. Also, an accident can occur without a component failure, and a component can fail
without causing an accident. Even though there is no failure, you may meet with an accident;
for example, two vehicles can collide.

Similarly, a component can fail on its own without any accident.
So, the latter is mainly a reliability concern, and the former
is our safety concern.

(Refer Slide Time: 31:11)

Let us compare reliability and availability, which we have already discussed.
If we increase reliability, we can achieve high
availability, but we can also achieve it by reducing downtime. So, high
availability does not necessarily require high reliability; it can also come from high
maintainability, that is, reduced repair time.

Similarly, high reliability may sometimes come at the cost of availability. For
repairable systems, we do preventive maintenance; when we do preventive maintenance,
our availability decreases because the system is in downtime, while reliability is high
because the system is kept in good health. We will discuss these terms in detail in the
following lecture, lecture number 2. So, we will stop here today. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 02
Introduction to Reliability Engineering (Contd.)
Hello everyone. We are now at the second lecture, which is a continuation of our previous
lecture. Today we will discuss the basic terms used in reliability engineering.

(Refer Slide Time: 00:44)

Let us first see what reliability is. As we discussed, reliability is the probability that the system
performs a specific function for a given period of time under stated conditions. Now,
what is that given period of time? Let us say it is t; we can also call this time t the mission
time. For the railway example we discussed, the mission time could be a 1-hour trip
time. So, the whole trip time can be a mission time, which should be completed without failure.
How do we express that the mission is completed without failure? We consider a random variable here.

As we discussed, this random variable is the time to failure, TTF. The TTF has
to be beyond time t. What does that mean? If I am going to use the system for a certain time t,
and my TTF lies somewhere beyond t, then I am fine. My system is reliable because I do not
have any failure in this duration. So, what is the probability that failure does not happen in
this region, that is, failure happens outside this region? The probability that my time to
failure is larger than t is my reliability, R(t) = P(TTF > t). And what is my unreliability? It is
the probability that failure happens within this period. So, unreliability is the cumulative
distribution function (CDF) of TTF, F(t) = 1 - R(t); that is the probability that my failure
occurs before or at time t.
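
A simple way to see these two quantities is to estimate them from observed failure times. The following is a minimal sketch, assuming a small hypothetical sample of times to failure and a hypothetical mission time; the function name `empirical_reliability` is not from the lecture.

```python
def empirical_reliability(ttf_samples, t):
    """Fraction of units whose time to failure exceeds the mission time t."""
    survivors = sum(1 for ttf in ttf_samples if ttf > t)
    return survivors / len(ttf_samples)

# Hypothetical times to failure (hours) for ten units.
ttf_samples = [120, 340, 560, 610, 750, 800, 920, 1100, 1500, 2100]
mission_time = 600  # hypothetical mission time t (hours)

R = empirical_reliability(ttf_samples, mission_time)  # estimate of P(TTF > t)
F = 1 - R                                             # unreliability, P(TTF <= t)

print(f"R({mission_time}) is approximately {R:.2f}")
print(f"F({mission_time}) is approximately {F:.2f}")
```

Three of the ten units fail before the mission time, so the estimated unreliability is the fraction of failed units and the reliability is the fraction that survive.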

So, failure happens before completing the mission or the trip, and if that happens the
system is considered to be unreliable. So, unreliability is the probability that failure will happen
before completing the operation. Reliability and unreliability are very commonly used
terms. Then, we also use the PDF, the probability density function. We will discuss it in a little
more detail in the following classes when we introduce basic probability. We can say the
probability density function is the slope of the unreliability curve, that is, the slope of the CDF.

Alternatively, we can say it is the negative slope of the reliability curve.
So, with F(t) and R(t) denoting unreliability and
reliability, f(t) = dF(t)/dt = -dR(t)/dt.
However, the probability density function is not itself a probability, because it is per unit
time. So, it has dimensions of inverse time, while reliability and unreliability are
dimensionless quantities. Then comes the mean time to failure. The mean time to failure, as we
know, is the expectation or mean.
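
The statement that the density is the slope of the unreliability curve, and the negative slope of the reliability curve, can be checked numerically. This is a minimal sketch under an assumption not made in the lecture at this point: an exponential time to failure with an illustrative rate `LAM`. The helper `slope` is a plain central difference.

```python
import math

LAM = 0.002  # assumed constant failure rate per hour (illustrative value)

def R(t):
    """Reliability of the assumed exponential model."""
    return math.exp(-LAM * t)

def F(t):
    """Unreliability (CDF), F(t) = 1 - R(t)."""
    return 1.0 - R(t)

def f(t):
    """Probability density of the assumed exponential model (per hour)."""
    return LAM * math.exp(-LAM * t)

def slope(func, t, h=1e-3):
    """Central-difference estimate of d(func)/dt at time t."""
    return (func(t + h) - func(t - h)) / (2 * h)

t = 500.0
dF = slope(F, t)   # slope of the unreliability curve
dR = slope(R, t)   # slope of the reliability curve

print(f"f(t)   = {f(t):.8f} per hour")
print(f"dF/dt  = {dF:.8f} per hour")
print(f"-dR/dt = {-dR:.8f} per hour")
```

All three printed values agree closely, and they carry units of inverse time, unlike R(t) and F(t), which are dimensionless.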

How do we calculate the expectation of a random variable t? Since time is a continuous
variable, we integrate over its whole range. That is generally
taken from -∞ to ∞, but time is a positive quantity, so there is nothing from -∞ to 0.
So, the integral starts from 0 and goes up to ∞, and since this is E(t), the integrand is the
random variable t times its density f(t). So,

$$\text{MTTF} = E(t) = \int_{0}^{\infty} t \, f(t) \, dt$$
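
The mean time to failure as the integral of t f(t) from 0 to ∞ can be approximated numerically. This is a minimal sketch, assuming an exponential density with an illustrative rate `LAM` (so the exact answer is 1/LAM); the upper limit and step count are arbitrary choices that make the truncated tail negligible.

```python
import math

LAM = 0.002  # assumed constant failure rate per hour (illustrative value)

def f(t):
    """Probability density of the assumed exponential model."""
    return LAM * math.exp(-LAM * t)

def mttf_numeric(pdf, upper, steps):
    """Trapezoidal approximation of the integral of t * pdf(t) from 0 to upper."""
    h = upper / steps
    total = 0.5 * (0.0 * pdf(0.0) + upper * pdf(upper))  # end points
    for i in range(1, steps):
        t = i * h
        total += t * pdf(t)                               # interior points
    return total * h

mttf = mttf_numeric(f, upper=20000.0, steps=200000)

print(f"numerical MTTF: {mttf:.3f} hours")
print(f"exact 1/LAM:    {1.0 / LAM:.3f} hours")
```

The lower limit is 0 rather than -∞ because time to failure cannot be negative, exactly as argued above.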

There is an important term which we will be discussing throughout this course, and it is
very widely used in reliability theory. That is the failure rate; we
sometimes call it the instantaneous failure rate or the hazard rate. What is it?
The instantaneous failure rate or hazard rate is the instantaneous rate of
failure at time t. It is the conditional probability of failure per unit time in the time interval t to (t
+∂t). That means, if we have a time interval from t to (t +∂t), we take f(t) there.

So, the value of f(t) in this short interval is my probability of failure
per unit time. However, here there is a condition. What is that condition?
The item should not have failed before time t; that
means it should be reliable up to t. So, the hazard rate becomes Z(t) = f(t)/R(t). Z(t) is also
represented as λ(t), and sometimes as h(t) for hazard rate. We will mainly be using λ(t). It equals
the probability density function, which is the failure probability per unit time, divided by the reliability.

Failure probability per unit time is conditional. What is the conditional? That it should have been
working; means out of the working unit what the probability that it failed. So, the working unit
probability is R(t). Out of the working unit what is the probability that it failed is hazard rate; we
can get this in a stepwise manner before going for a description of hazard rate. Let us first see
what conditional reliability is.

(Refer Slide Time: 07:24)

So, conditional reliability is often used for reliability after the burn-in period, for systems like
railways, aircraft, etc. It also gives reliability after preventive maintenance, or after repair on failure.
Repair on failure means that once the item fails, it is repaired. Preventive maintenance means the
component or the system has not failed yet; however, we know that it may fail, so to reduce the
chances of failure, we do some maintenance so that it does not fail. So, as we
see here, this is R(t|T0), where T0 is the time for which the item has already been operated. If you
look here, the item has worked for T0, and this point is (T0+t). So, what is the reliability during this period?

That is the probability that the item works from T0 to (T0+t), given that it has worked up to time T0.
So, this is the probability that the time to failure T is beyond (T0+t), given that the time to failure
is greater than T0; that means it has not failed till T0, which is my condition here. Under this
condition, what is the probability that it works from T0 to (T0+t), that is, that the failure time is
beyond (T0+t)? How can we get this? We know that the conditional probability P(x|y) is equal to
P(x ∩ y)/P(y).

Here, x is the event T > T0+t and y is the event T > T0. If the item survives beyond (T0+t), it has
certainly survived beyond T0, so the intersection of the two events is the same as T > T0+t, and its
probability P(T > T0+t) is R(T0+t). Similarly, the probability of the condition, P(T > T0), is R(T0).
So, this is how we get the conditional reliability:

$$R(t \mid T_0) = P(T > T_0 + t \mid T > T_0) = \frac{R(T_0 + t)}{R(T_0)}$$
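The ratio R(T0+t)/R(T0) can be tried out numerically. The following is a minimal Python sketch, not part of the lecture slides: the two reliability functions and all their parameter values are hypothetical, chosen only to contrast a constant failure rate with a decreasing one.

```python
import math

def conditional_reliability(R, t, T0):
    """R(t | T0) = R(T0 + t) / R(T0): survive a further t given survival to T0."""
    return R(T0 + t) / R(T0)

# Hypothetical reliability functions, used only for illustration.
def R_exponential(t, lam=0.002):
    # Constant failure rate: past operation does not change future reliability.
    return math.exp(-lam * t)

def R_early_failures(t, scale=1000.0, shape=0.5):
    # Decreasing failure rate: items that survive the early period do better.
    return math.exp(-((t / scale) ** shape))

r_exp = conditional_reliability(R_exponential, t=100.0, T0=50.0)
r_dfr = conditional_reliability(R_early_failures, t=100.0, T0=50.0)

print(r_exp, R_exponential(100.0))     # equal for a constant failure rate
print(r_dfr, R_early_failures(100.0))  # conditional value is higher after 50 h
```

With a decreasing failure rate, the 100-hour reliability seen after 50 hours of prior operation is higher than that of a new item, which is the motivation for burn-in.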

This conditional reliability concept is often used for burn-in. What is the burn-in period?
Initially, because of manufacturing issues and so on, there is a period in which the number of
failures is very high. So, the failure rate is high.

In that case, to remove the weak or faulty part of the population, what do we do?
We do burn-in. That means, for example, we keep the device under elevated temperature and observe
whether it keeps on working or not. So, stresses are applied, and then we see whether it has failed or
not. For a motorcycle or a car, before giving it to the customer, they may run it for eight hours, or
for one or two days, or even one or two months, and then give it to the customer. That is done
because, through this initial running, whatever weaknesses or problems are there can be detected
and corrected before the product is given to the customer.

In the same way, trains, before being put into public use, are run on dry runs, as is also done for
systems like nuclear power plants. So, even for one or two years, they may run the train without any
passengers to see whether there is a failure. Because initially the chances of failure are high,
there can be problems due to manufacturing or installation, and those problems are difficult to
remove in advance. But if we have an initial failure-free period, then we can be more confident that
the device will run normally after that period; that is, it will have good reliability.

So, this becomes our conditional reliability. During this period T0, the customer does not
observe any failures, because the customer is not using the product yet; it is being used
within the organization. Whatever failure happens here is corrected, and the customer
does not come to know of it. So, the customer sees the reliability after time T0, that is, from T0 to
(T0+t); this is the conditional reliability, because we have already spent time T0 and have ensured
that the component is not in a failed condition at that point.
Similarly, for preventive maintenance, the same philosophy works.

(Refer Slide Time: 12:19)

Let us see how the failure rate relates to the other functions. We have shown that the
failure rate is f(t)/R(t), but how does it come about? Let us see the concept of failure rate here.
Let us take a small time interval from t to (t+∂t). Now, during this time interval, what is the
probability of failure? R(t) is a decreasing curve, the probability of success up to time t; we will
discuss all this. So, what are the chances of failure in this interval?

Failure in this interval means that the item fails between t and (t+∂t). That is, the time to
failure T is beyond t, because the item has not failed up to t, but T has not exceeded (t+∂t),
because it has failed before (t+∂t). The probability that T is greater than t is R(t), and the
probability that T is greater than (t+∂t) is R(t+∂t), so the probability of failure in the interval
is R(t) - R(t+∂t). In another way, I can calculate the same probability as the integral of f(x)dx
from t to (t+∂t). This gives me the failure probability in the time interval t to (t+∂t):

$$P(t < T \le t + \partial t) = R(t) - R(t + \partial t) = \int_t^{t + \partial t} f(x)\, dx$$

Once we have this, we can get the conditional probability of failure. Now I want to know the
probability of failure in the same interval, but with the condition that the item has survived up
to time t. We keep this condition because we want to ensure that the product is in working
condition. So, out of the working products, what is the proportion failing per unit time? That is
of concern; that is what we call the failure rate.

While f(t), which we discussed earlier, is the PDF, which is concerned with the failure probability per
unit time during this interval, there is no condition on it. It is out of the whole population: out of the
whole population, what is the probability that an item will fail in this duration? For the failure rate,
however, the question is what the probability is that it will fail in this duration given that it is
working at time t; that means we consider only the population surviving up to t. How can we get this?
It is a conditional probability: the probability of failure during this interval divided by the
probability that the item has survived up to time t, which is R(t). Taken per unit time, this becomes
the conditional probability of failure per unit time, which is the failure rate. So, the failure rate
becomes the same f(t)/R(t) that we have written earlier.

So, the same thing we call the instantaneous hazard rate function. The conditional probability
of failure per unit time over the interval is [R(t) - R(t+∂t)]/[∂t R(t)], and taking the limit as ∂t
tends to 0 gives the instantaneous failure rate or hazard rate. I can take 1/R(t) outside the limit;
if I also take a negative sign outside, the remaining portion is [R(t+∂t) - R(t)]/∂t. If I apply
the limit, this portion becomes dR(t)/dt. So, we get the following:

$$\lambda(t) = \lim_{\partial t \to 0} \frac{R(t) - R(t + \partial t)}{\partial t\, R(t)} = -\frac{1}{R(t)} \frac{dR(t)}{dt} = \frac{f(t)}{R(t)}$$
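The limiting argument can be checked numerically by using a small but finite ∂t. This is only an illustrative Python sketch; the reliability function used here is an assumption made for the check, not something taken from the slide.

```python
import math

def R(t):
    # Illustrative reliability function (an assumption for this sketch).
    return math.exp(-2.5e-6 * t * t)

def f(t):
    # PDF, obtained as the negative derivative of R(t) above.
    return 5e-6 * t * math.exp(-2.5e-6 * t * t)

def hazard_finite_difference(t, dt=1e-4):
    # [R(t) - R(t + dt)] / (dt * R(t)) for a small but finite dt.
    return (R(t) - R(t + dt)) / (dt * R(t))

t = 100.0
approx = hazard_finite_difference(t)
exact = f(t) / R(t)   # for this R(t), the hazard rate is 5e-6 * t
print(approx, exact)
```

As ∂t shrinks, the finite-difference value approaches f(t)/R(t), which is the statement of the limit.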

So, this is how we get it. In a more straightforward way, if you see, this is the conditional
PDF, given that there is no failure up to time t; that becomes the same
thing. Now, if we use this formula, we will also be able to get R(t), because we
know λ(t) = f(t)/R(t). These functions are uniquely defined by each other: if we have one
function λ(t), then f(t) is uniquely defined by it. We cannot have two different f(t) for a
given λ(t). For one function λ(t), we will have a unique f(t) and a unique R(t).
So, they are uniquely defined by each other.

So, we can also calculate in reverse: if we have any one of the functions, we can get all the other
functions. If we have λ(t), then we can calculate R(t). This R(t) comes out to be e to the
power minus the integral from 0 to t of λ(t)dt. How does it come out? We know that
λ(t) = -(1/R(t)) dR(t)/dt. If we take dt to the other side, then λ(t)dt = -dR(t)/R(t), and we
integrate both sides from 0 to t. The integral of dR/R is ln R(t), and since R(0) = 1, ln R(0) = 0.
So, ln R(t) is equal to minus the integral from 0 to t of λ(x)dx.

So, from here, taking the exponential of both sides, my R(t) comes out to be e to the power minus
the integral from 0 to t of λ(x)dx. I write x here so that we are able to differentiate between the
mission time t and the integrating variable:

$$R(t) = \exp\left(-\int_0^t \lambda(x)\, dx\right)$$

Similarly, f(t) is equal to λ(t) into R(t), and R(t) we already have. So, if we want to calculate
f(t) in terms of λ(t), it becomes

$$f(t) = \lambda(t)\, R(t) = \lambda(t) \exp\left(-\int_0^t \lambda(x)\, dx\right)$$

So, similarly, if we have R(t), we can get λ(t) and we can get f(t); all the functions can be
evaluated interchangeably with simple mathematics.
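To make the conversion from λ(t) to R(t) and f(t) concrete, here is a minimal Python sketch that evaluates the integral numerically. The function names, the trapezoidal rule, and the constant hazard rate of 0.002 per hour are assumptions made for illustration only.

```python
import math

def reliability_from_hazard(hazard, t, steps=10_000):
    """R(t) = exp(-integral_0^t hazard(x) dx), using the trapezoidal rule."""
    dx = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        area += hazard(i * dx)
    return math.exp(-area * dx)

def pdf_from_hazard(hazard, t):
    """f(t) = hazard(t) * R(t)."""
    return hazard(t) * reliability_from_hazard(hazard, t)

def constant_hazard(x):
    # Hypothetical constant hazard rate of 0.002 failures per hour.
    return 0.002

r_400 = reliability_from_hazard(constant_hazard, 400.0)
f_400 = pdf_from_hazard(constant_hazard, 400.0)
print(round(r_400, 3), f_400)
```

Any other hazard function of time can be passed in place of `constant_hazard`; for a constant rate the result reduces to the familiar exponential reliability.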

(Refer Slide Time: 20:15)

There is one important and very famous curve, which is called the bathtub curve. This
curve describes how life goes on, and a similar life pattern is experienced by humans, by animals,
and even by products. If we look at it, we have three phases here: burn-in, useful life,
and wear-out. How does it come about? There are generally three sources of failure
involved here: early failures, random failures, and wear-out failures.
Now, let us see what these early failures are.

These early failures happen because of mistakes made during design or during manufacturing,
or because of weakness in the parts, or because of installation problems or transit and
transportation problems. Because of this, a product may fail within a very short time of use.
So, initially the failure rate is high; that means the failure probability per unit time is high.
But later on, those products that have survived continue to survive longer. So, in this curve of
failure rate, the failure rate keeps decreasing; this is called a decreasing failure rate, DFR.

Then, another source is random failures. Random failures happen due to various reasons, and
they can happen with any device at any time. They generally do not have one particular reason;
any component can fail for any reason, and accidents can happen. So, accidental deaths
in human life can be called random failures. If we compare with human life, early failures are
like infant mortality: the child at the time of birth is most vulnerable, so at that time we have
high death rates. So, during pregnancy, at the time of birth, or during the initial years, up to
three to five years of age, there is a high chance of death.

Why is that? Because the baby requires care. Why do they require care? Because they are
vulnerable to the environment, to handling, and to various accidents. So, to keep this early
failure rate low, care has to be ensured. Once they reach age five or more, the chances of
failure due to these kinds of infancy-related issues become less. But there are still random
failures. Random failures can be due to any reason: the development of certain
diseases, or accidents, or something that suddenly happens to someone.

But it does not happen to most people; it happens to some of them. Similarly, something
can go wrong when using a system. So, systems have these kinds of random failures.
Then, we have the wear-out failures. These wear-out failures are age-related failures:
when we grow old, our body weakens and we become susceptible to diseases. And
after a certain age, let us say 70 or 80 years, the wear-out is so high that our body gives up.
So, initially the failure rate due to wear-out is low, because there is no wear-out; but later in
life, we see that due to wear-out, the chances of death rise. The same thing that happens with
human beings happens with components and systems. Systems also wear out; like our tires, they
wear, and we have to replace them. Similarly, most of the components and parts of our
systems cannot survive long; they cannot work continuously for an infinite period of time.

So, after a certain period of time, changes in chemical properties, changes in mechanical
properties, and other changes in the system will lead to these failures. So, everything ages, and
once it ages, it wears out. Now, if we add these three curves, the early failure curve, the random
failure curve, and the wear-out failure curve, we will see this bathtub shape. In the initial
period, the dominant failure mode is the early failure, DFR. During the useful
life, which in the human case may be from around 5 years to somewhere around 50 years, we do not
see much chance of failure, and the chances of failure stay the same. A person can still die at
any moment, not because of age but because of random causes.
These are random failures, so this is a constant failure rate.

Then, we have the increasing failure rate, that is, our wear-out failures. Here we have failures
because of aging, because of the weakness that has developed in the system or the human
being; this is the increasing failure rate, IFR. So, this kind of failure rate pattern is
generally observed with most systems. If we understand that, we can set our
expectations better and manage our systems better.
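The idea of adding the three failure-rate curves can be sketched in a few lines of Python. Everything numeric here is hypothetical: the Weibull form of the early and wear-out hazard rates (the Weibull distribution is covered later in the course) and all parameter values are assumptions chosen only to produce the three regions.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate: (shape / scale) * (t / scale) ** (shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    early = weibull_hazard(t, shape=0.5, scale=2000.0)     # decreasing (DFR)
    random_part = 0.0002                                   # constant failure rate
    wear_out = weibull_hazard(t, shape=5.0, scale=5000.0)  # increasing (IFR)
    return early + random_part + wear_out

times = [10, 100, 1000, 2000, 4000, 6000]   # hours
rates = [bathtub_hazard(t) for t in times]
for t, rate in zip(times, rates):
    print(t, f"{rate:.6f}")
```

The printed rates fall over the first few entries, flatten, and then rise again, which is the bathtub shape described above.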

(Refer Slide Time: 26:14)

Let us see one example. If the TTF distribution is given here, that is, if f(t) is given, then what
will be the reliability? Here f(t) = 0.001/(0.001t + 1)^2, defined for t greater than 0. As we
discussed, reliability equals the integral of f(t)dt from t to infinity. So, if we want to calculate
reliability for 100 hours of operating life, we integrate from 100 to infinity; since f(t) is defined
for all t greater than 0 and we are talking about t greater than 100, we can use the same f(t). To
solve this, we can substitute 0.001t + 1 = x, so that 0.001 dt = dx.

If I apply t equal to 100, 100 into 0.001 becomes 0.1, plus 1 is 1.1, so the lower limit is x = 1.1;
the upper limit of infinity remains infinity; and the integrand becomes dx upon x square. So, what I
get is minus 1 upon x, evaluated from 1.1 to infinity. At infinity this is 0, and minus minus becomes
plus, so this becomes 1 upon 1.1:

$$R(100) = \int_{100}^{\infty} \frac{0.001}{(0.001t + 1)^2}\, dt = \int_{1.1}^{\infty} \frac{dx}{x^2} = \left[-\frac{1}{x}\right]_{1.1}^{\infty} = \frac{1}{1.1} \approx 0.909$$

In general, R(t) = 1/(0.001t + 1), and if I apply t equal to 100 there, I get the same value. Here I
have solved directly in terms of values; we can also first solve in terms of t and then apply the
value of t. So, the value of 1 upon 1.1 turns out to be 0.909.
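As a cross-check of this example, the following minimal Python sketch compares the closed-form reliability with a numerical integration of the PDF. The function names and the midpoint rule are choices made for this sketch only.

```python
def pdf(t):
    # TTF density from the example: f(t) = 0.001 / (0.001 t + 1)^2.
    return 0.001 / (0.001 * t + 1.0) ** 2

def reliability(t):
    # Closed form obtained by integrating f from t to infinity.
    return 1.0 / (0.001 * t + 1.0)

def unreliability_numeric(t, steps=100_000):
    # F(t) = integral of f from 0 to t, by the midpoint rule.
    dt = t / steps
    return sum(pdf((i + 0.5) * dt) for i in range(steps)) * dt

r_100 = reliability(100.0)
r_100_numeric = 1.0 - unreliability_numeric(100.0)
print(round(r_100, 3), round(r_100_numeric, 3))   # both round to 0.909
```

Integrating from 0 to t and subtracting from 1 avoids having to truncate the infinite upper limit.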

(Refer Slide Time: 28:38)

Similarly, let us do another example, computing MTTF and reliability. Let us say there are
two reliability functions given: one is R(t) = e to the power minus 0.002t, and the other is
R(t) = (1000 - t)/1000. The second function is defined for values of t from 0 to 1000; the first
is defined for all positive values of t. Now, MTTF is the integral of R(t)dt from 0 to infinity.
For the first function, I integrate e to the power minus 0.002t from 0 to infinity. I get
minus 1 upon 0.002 into e to the power minus 0.002t, evaluated from 0 to infinity. If I put
infinity here, this becomes 0, because e to the power minus infinity is 0.

If I put 0, the exponential becomes 1, and so the result is 1 upon 0.002. This is 500
hours. Similarly, the second function is defined from 0 to 1000 only, so I integrate from 0 to
1000. Dividing through by 1000, the integrand is 1 minus t divided by 1000, so the integral is
t minus t square by 2000, evaluated from 0 to 1000. At t equal to 1000, this is 1000 minus
1000 square divided by 2000, that is, 1000 minus 500, which gives 500. At t equal to 0,
both terms are 0. So, as we see, in this case both functions give an MTTF of 500 hours.

$$\mathrm{MTTF}_1 = \int_0^{\infty} e^{-0.002t}\, dt = \frac{1}{0.002} = 500 \text{ hr}, \qquad \mathrm{MTTF}_2 = \int_0^{1000} \left(1 - \frac{t}{1000}\right) dt = 1000 - \frac{1000^2}{2000} = 500 \text{ hr}$$

But if I calculate the reliability for 400 hours, I simply apply the value of t. For the first
function, this becomes e to the power minus 0.002 into 400, which gives a reliability of 0.449.
For the second function, t equal to 400 gives 1000 minus 400, that is, 600 divided by 1000,
which is 0.6. So, as we see here, the two functions have the same MTTF, but their reliability
behavior is different. So, we should not say that if two systems have the same MTTF, then they
have the same reliability also.

So, we should keep in mind that reliability may be different even though the systems have the
same MTTF.
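The comparison of the two reliability functions can be reproduced with a short Python sketch. The numerical integration scheme and the truncation of the infinite upper limit at 20,000 hours are assumptions of this sketch, not part of the lecture.

```python
import math

def R1(t):
    return math.exp(-0.002 * t)               # defined for all t >= 0

def R2(t):
    return max(0.0, (1000.0 - t) / 1000.0)    # defined for 0 <= t <= 1000

def mttf(R, upper, steps=200_000):
    """MTTF = integral of R(t) dt from 0 to `upper`, by the midpoint rule."""
    dt = upper / steps
    return sum(R((i + 0.5) * dt) for i in range(steps)) * dt

# For R1 the infinite upper limit is truncated at 20,000 hours (an assumption);
# the neglected tail is about 500 * exp(-40), which is negligible.
mttf_1 = mttf(R1, upper=20_000.0)
mttf_2 = mttf(R2, upper=1_000.0)

print(round(mttf_1, 2), round(mttf_2, 2))          # both about 500 hours
print(round(R1(400.0), 3), round(R2(400.0), 3))    # 0.449 and 0.6
```

Both functions integrate to about 500 hours, yet at 400 hours they give clearly different reliabilities, which is the point of the example.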

(Refer Slide Time: 31:31)

Similarly, if we have λ(t) available to us, we can calculate the design life. So, what is
the design life? It is the life up to which our reliability does not fall below a target value;
here, we want to know the life at which our reliability is 0.98. So, I can write down the R(t)
function, that is, exponential of minus the integral from 0 to t of λ(x)dx. If I solve this, the
integral gives a term in x square by 2, and the exponent becomes 2.5 into 10 to the power minus 6,
t square (which is consistent with a linear failure rate λ(t) = 5 x 10^-6 t). So, what is my design
life? I want to know the value of t for which this reliability will be 0.98. So, I set R(t) equal to
0.98 and solve for t, taking the log of both sides.

So, ln of 0.98 is equal to minus 2.5 into 10 to the power minus 6, t square. Then,
t square is equal to minus ln of 0.98, divided by 2.5 into 10 to the power minus 6, and t is the
square root of that. Since ln is being calculated for a value less than 1, it is negative, so
negative times negative becomes plus. If we solve, we get the value 89.89, that is, approximately
90 hours.

$$R(t) = e^{-2.5 \times 10^{-6} t^2} = 0.98 \;\Rightarrow\; t = \sqrt{\frac{-\ln 0.98}{2.5 \times 10^{-6}}} \approx 89.89 \text{ hr}$$

So, using these functions, we can evaluate reliabilities, as we see. We will stop here today and
continue our discussion in the next lecture. Thank you.
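As a numerical check of the design-life example from this lecture, here is a minimal Python sketch. The function name and the parameter name `k` are hypothetical; `k` stands for the coefficient 2.5 x 10^-6 in the exponent of R(t).

```python
import math

def design_life(target_reliability, k=2.5e-6):
    """Solve exp(-k * t**2) = target_reliability for t (in hours)."""
    return math.sqrt(-math.log(target_reliability) / k)

t_design = design_life(0.98)
print(round(t_design, 2))   # 89.89
```

Substituting the result back into R(t) returns 0.98, which confirms the algebra in the example.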

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 03
Introduction to Reliability Engineering (Contd.)
Hello everyone. We are continuing our lecture series; this is our third lecture. In the previous
two lectures, we discussed some terms and their definitions that are commonly used in
reliability engineering. We will continue that discussion, and now we will discuss the terms
which are used for maintainability, availability, and safety.

(Refer Slide Time: 01:00)

As we discussed in reliability, we want to know the reliability. But the problem is that many
times, when it comes to reliability, we are interested in knowing at what time the failure will
occur. Unfortunately, we can never determine exactly when a system or component will fail.
For example, if I am using a cooler, fridge, AC, or my vehicle, I cannot be sure that it will
fail at this moment. Similarly, nobody can predict the exact moment that a person will die.

However, we can assess the probability of failure, meaning the likelihood that a certain
system will fail. A higher probability means a higher chance of failure. For instance, if a large
population is using a particular device, we can estimate the proportion of the devices that may
fail.

Although we cannot predict the exact time of failure, we can estimate the chances of failure.
The prediction, however, is limited to the chances of failure and not the exact time to failure.
For instance, if I use 100 tube lights in the same room, on average, I can expect to see around
5-6 failures per year. Alternatively, if the tube lights are beyond 2-3 years of use, the failure
rate can increase to 30-40%.

Therefore, we cannot claim that any system is risk-free or 100% reliable. There is always a
chance of failure or an accident. However, we can work towards improving the reliability of
the system to increase the chances of success and minimize the chances of failure and
accidents.

(Refer Slide Time: 05:16)

Now we will discuss maintainability. Maintenance is a combination of all technical and
administrative actions, including supervision actions, intended to retain the product in good
condition, or to restore it from a failed state to a good state, that is, a state in which it can
perform a required function.

The intent of maintenance is that the system should be able to perform the required function.
The system is designed for certain functions and should be able to perform them. Maintenance
is for that purpose: either to ensure that the system continues to perform the function without
failure or, if it fails, to repair it and bring it back to a working condition. Even though it
may not be fully restored, it can still perform the required function.
Therefore, maintenance is a set of activities aimed at accomplishing all these objectives.

There are different types of maintenance. One is corrective maintenance, which was the approach
used before World War 2, when dependency on systems was not very high. Whenever a system
failed, people would correct it. It is carried out after fault recognition, meaning that once a
failure occurs, the fault is identified, and the system is put back into a working state.
However, during World War 2, systems became larger and more complex, and a failure could mean
losing the war: if equipment did not work, troops and lives could be lost.

Similarly, during the Industrial Revolution, when dependency on machines became higher,
if a machine failed, the whole staff would not be able to do any production. Repair costs
may not have been high, but repair time was very costly. During downtime, the function
could not be performed, and there were high losses, including mission losses. Therefore, a
philosophy of proactive maintenance emerged from the Second World War onwards.

Preventive maintenance came first. In preventive maintenance, simpler systems with only
a few failure-prone components were assessed to determine how long the components generally
took to fail. Before the failure occurred, the aged components were replaced with new
components. This is applicable only to components that fail due to ageing, such as gears,
bearings, moving parts, friction parts, computer screens, etc. Here, predetermined intervals
are chosen from earlier experience and data to identify the safe interval within which there is
little chance of failure. At the end of such an interval, after which there is a high probability
of failure, the item is replaced with a new one, or it is maintained or repaired using certain
procedures so that it becomes as good as new. The degradation is removed, the system
becomes good again, and it continues working. This is called preventive maintenance.

However, preventive maintenance may result in replacing good components and in inducing
errors during maintenance. Machines that were working fine may start creating problems
because of human error or other reasons during maintenance. Therefore, the concept of
predictive maintenance was introduced, to avoid stopping a well-running system for maintenance.
In predictive maintenance, the system is observed using various sensors, which are widely used
nowadays. Certain parameters are frequently checked and inspected, and based on those
parameters, it is determined whether a particular component or system is in a healthy or
unhealthy state. If the system is healthy, it is not touched, and we continue with it. However,
if the system is found to be developing faults or degradation, it is replaced or repaired.
This is called predictive maintenance.

Predictive maintenance helps reduce the replacement of good components and unwanted
interactions with the system. However, predictive maintenance may not always be possible, as
the failure must be identifiable in advance. Without a means to identify an impending failure,
it may not be feasible. Collectively, preventive and predictive maintenance are called proactive
maintenance, with proactive maintenance occurring before failure and corrective maintenance
taking place after failure. By performing maintenance before failure occurs, we can prevent
system failures.

(Refer Slide Time: 12:54)

The terms commonly used in maintainability are analogous to those discussed earlier for
reliability. Maintenance refers to the procedures, ways, or tools used for upkeep.
Maintainability is the probability that the repair time is less than or equal to the target time.
This means that maintenance should be completed within a certain period of time. For
reliability, the time to failure has to be beyond the target time, but for repair, the time to
repair should be less than the target time.

If the probability density function (PDF) of the repair time is h(x), then integrating h(x) from
0 to t gives the maintainability, H(t). The mean time to repair (MTTR) can be calculated as the
integral from 0 to infinity of t multiplied by h(t) dt, or equivalently as the integral from 0 to
infinity of 1 minus the cumulative distribution function (CDF) H(t). Since we are talking about
time, the limits are from 0 to infinity.

$$H(t) = \Pr\{T \le t\} = \int_0^t h(x)\, dx, \qquad \mathrm{MTTR} = \int_0^{\infty} t\, h(t)\, dt = \int_0^{\infty} \left[1 - H(t)\right] dt$$

The repair rate is analogous to the failure rate. It is the instantaneous rate of repair at time t,
and the concept of failure rate can be applied here as well. In the repair rate μ(t), the PDF is
h(t), and the role of the survival function R(t) is played by 1 minus H(t), that is, 1 minus the
CDF of the repair time.

(Refer Slide Time: 15:07)

$$\mu(t) = \frac{h(t)}{1 - H(t)}$$

Now, let us take one example. The time to repair a power generator is best described by the
following distribution: h(t) = t^2/333 for t between 1 and 10 hours. So, this is our PDF for the
repair time. The minimum repair time is 1 hour and the maximum is 10 hours, so the repair will
complete within 1 to 10 hours, and within this range the PDF, that is, the probability of
completing the repair per unit time, is h(t).

Now, what is the probability that the repair will complete in 6 hours? This means I am
interested in the maintainability for 6 hours, H(6), which is the probability that the repair time
is less than or equal to 6. Since the PDF starts at 1, the lower limit 0 is replaced by 1, and the
upper limit is 6. So, the integral is of t square by 333 dt. If I integrate this, it becomes
t cube by 3 into 333, and if I put the limits 6 and 1, it becomes 6 cube minus 1 cube, divided by
3 into 333. Solving this, I get the value 0.2152. This is the probability that the repair will
complete in 6 hours, or the maintainability for 6 hours.

What is the mean time to repair? As we saw, this integral is from 0 to infinity, but since the PDF
is defined from 1 to 10 only, 0 is replaced by 1 and the upper limit is 10. The MTTR can be
calculated as the integral of 1 minus H(t), or as the integral of t times the PDF, that is, t times
h(t) dt. If I multiply t with the PDF, it becomes t cube by 333 dt. If I integrate this, it becomes
t to the power 4 divided by 4 into 333. Putting the limits 10 and 1 gives 10 to the power 4 minus 1,
divided by 4 into 333, which is 7.51 hours. This is the expected mean time for repair.

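$$H(6) = \Pr\{T \le 6\} = \int_1^6 \frac{t^2}{333}\, dt = \frac{1}{3 \times 333}\left(6^3 - 1\right) = 0.2152$$

$$\mathrm{MTTR} = \int_1^{10} \frac{t^3}{333}\, dt = \frac{1}{4 \times 333}\left(10^4 - 1\right) = 7.51 \text{ hr}$$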

Next question is, what is the median time to repair? The median time is the time at which the
CDF value, that is, the maintainability, is 0.5. So, I want to know the time t at which H(t)
is 0.5. H(t) can be calculated as t cube minus 1, divided by 999. If I set this equal to 0.5,
it becomes 0.5 into 999 is equal to t cube minus 1. So, t cube will be equal to 1 plus 999 into
0.5, which is 500.5, and my t value will be 500.5 raised to the power of 1/3, which is
approximately 7.94 hours. This gives my median time to repair, as at this time, the
probability is 0.5.
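The three results of this repair-time example can be checked numerically. The following is a minimal sketch in Python, assuming the PDF h(t) = t²/333 on 1 to 10 hours as stated in the example; the function names are illustrative and not part of the lecture.

```python
# Repair-time example: PDF h(t) = t^2 / 333 for 1 <= t <= 10 hours.
T_MIN, T_MAX = 1.0, 10.0


def maintainability(t):
    """CDF H(t) = integral from 1 to t of u^2/333 du = (t^3 - 1) / 999."""
    if t <= T_MIN:
        return 0.0
    if t >= T_MAX:
        return 1.0
    return (t**3 - 1.0) / 999.0


def mean_time_to_repair():
    """MTTR = integral from 1 to 10 of t * t^2/333 dt = (10^4 - 1) / (4 * 333)."""
    return (T_MAX**4 - T_MIN**4) / (4.0 * 333.0)


def median_time_to_repair():
    """Solve H(t) = 0.5 for t, i.e. t^3 = 1 + 0.5 * 999 = 500.5."""
    return (1.0 + 0.5 * 999.0) ** (1.0 / 3.0)


print("H(6)   =", round(maintainability(6), 4))
print("MTTR   =", round(mean_time_to_repair(), 2), "hr")
print("median =", round(median_time_to_repair(), 2), "hr")
```

Rounded as above, the first two values agree with the 0.2152 and 7.51 hours obtained in the lecture, and the median comes out close to 7.94 hours.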

(Refer Slide Time: 19:31)

Next, let us discuss availability. The most commonly used term for availability is "average
availability". What is average availability? Average availability is 1/t multiplied by the
integral from 0 to t of A(t) dt. How do we take the average of any function? To find the
average of a function f from 0 to t, we calculate the integral from 0 to t of f(x) dx divided
by the integral from 0 to t of dx. Therefore, it becomes 1/t multiplied by the integral of
f(x) dx from 0 to t. Here, f(x) is the availability function, also known as point availability
or instantaneous availability, which is a function of time.

$$A(T) = \frac{1}{T}\int_0^T A(t)\,dt$$

As we can see here, the average availability becomes 1/T multiplied by the integral from 0 to
T of A(t) dt when we take the average over an interval. If we have an exponential failure
distribution and an exponential repair distribution with parameters lambda and mu, we can
obtain the availability function. We will discuss the derivation later, maybe around the
seventh or eighth week. If the failure distribution is exponential with failure rate lambda,
and the repair distribution is also exponential with repair rate mu, then the availability
function is mu/(mu+lambda) + (lambda/(mu+lambda)) * e^(-(lambda+mu)t).

$$A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu}\,e^{-(\lambda + \mu)t}$$

If we plot t versus A(t), it will look something like this. If we put t equal to infinity, then
the exponential term becomes 0, because e raised to minus infinity equals 0. Therefore, the
curve settles at mu/(mu+lambda), which means availability will never be below
mu/(mu+lambda). At the start, when t equals 0, A(t) becomes (mu+lambda)/(mu+lambda), which
means availability starts from 1 and decreases. This is our instantaneous or point
availability, which varies with time for one device that has exponential distributions for
repair and failure.

If we want the interval or mission availability, we need to integrate from t1 to t2 using the
same averaging formula. The result will be (1/(t2-t1)) multiplied by the integral from t1 to t2
of A(t) dt. As we see, the point availability, which is a function of time, becomes almost
constant after a certain period of time, and the value that it settles at is called the
inherent or steady state availability. This value does not change even though time is
changing. It is independent of time because failure and repair have happened many times.
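The point, interval, and steady state availability for the exponential case can be sketched in Python as follows. This is only an illustration: the rates `lam` and `mu` are hypothetical values chosen for the example, and the interval availability uses the closed-form integral of the A(t) expression shown on the slide.

```python
import math


def point_availability(t, lam, mu):
    """A(t) = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu) * t)."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)


def interval_availability(t1, t2, lam, mu):
    """(1/(t2-t1)) * integral of A(t) from t1 to t2, using the closed-form integral."""
    s = lam + mu
    integral = (mu / s) * (t2 - t1) + (lam / s**2) * (
        math.exp(-s * t1) - math.exp(-s * t2)
    )
    return integral / (t2 - t1)


def steady_state_availability(lam, mu):
    """Limit of A(t) as t tends to infinity."""
    return mu / (lam + mu)


# Hypothetical failure and repair rates (per hour), for illustration only.
lam, mu = 0.01, 0.5

for t in (0, 1, 5, 20, 100):
    print(f"A({t}) = {point_availability(t, lam, mu):.5f}")
print(f"average over 0 to 100 h = {interval_availability(0, 100, lam, mu):.5f}")
print(f"steady state            = {steady_state_availability(lam, mu):.5f}")
```

With these assumed rates, the printed values start at 1 for t = 0 and settle towards mu/(mu+lambda), which is the behaviour described for the plot.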

The limit of A(t) as t tends to infinity is the inherent or steady state availability, which
is mu/(mu+lambda). What is mu? Mu is 1/MTTR, and lambda is 1/MTTF. Therefore, if we apply
this here, it becomes (1/MTTR) divided by (1/MTTR + 1/MTTF). If we simplify this, it
becomes MTTF/(MTTR+MTTF), which is the same as the previous result.

$$A_{inh} = A_{SS} = \lim_{T \to \infty} A(T) = \frac{MTBF}{MTBF + MTTR} = \frac{MUT}{MUT + MDT_{CM}}$$

This has some different notations. We can say that if we use the uptime and downtime basis,
the mean uptime divided by the mean uptime plus mean downtime is the inherent or steady
state availability. Here, downtime is only considered due to corrective maintenance, which
means the downtime that occurs only due to repair after failure. We are not counting the
downtime that occurs due to preventive or predictive maintenance.

(Refer Slide Time: 25:15)

If we also include the downtime due to predictive and preventive maintenance, that is, our
proactive maintenance, then we call it achieved availability. It is mean uptime divided by
mean uptime plus the downtime due to both causes: the repair time taken after failures and
the time taken by preventive maintenance actions. Both downtimes are combined here, and then
we call it achieved availability.

Here we use the term MTBM, which is the mean time between maintenance actions. A maintenance
action can be either a repair on failure or an action taken to avoid the failure, that is, a
preventive maintenance action. And M bar is the mean system downtime considering all
maintenance downtime, corrective as well as preventive. Then comes operational availability,
that is, how much time the system is actually available for operation. In that case, we
include additional downtime due to administrative reasons. These administrative reasons are
mostly things like decision making, and there can also be downtime due to spare parts.

$$A_a = \frac{MTBM}{MTBM + \bar{M}} = \frac{MUT}{MUT + MDT_{CM} + MDT_{PM}}$$

Sometimes our devices fail, but we do not have a spare available. Then there is maintenance
delay time, which here relates to crew availability: the specific person or the specific
technicians required may not be available. When we combine all these types of downtime, we
get operational availability, which is very close to practical experience, because we count
the downtime due to corrective maintenance and preventive maintenance as well as all the
delays, such as maintenance delays and logistic delays.

$$A_o = \frac{MTBM}{MTBM + \bar{M} + MDT} = \frac{MUT}{MUT + MDT_{CM} + MDT_{PM} + MDT_{adm}}$$

Supply delay time is the delay due to spare parts, and maintenance delay time is the delay due
to the availability of the crew or of the equipment which is required for repair. Generally,
if we want to achieve high reliability, what will happen? To achieve high reliability we have
to do preventive maintenance, so many times we have to put more time into it. This may
decrease achieved availability, and it may decrease our operational availability.

So we have to look into how much time to invest. Most of the time we still have to achieve
high reliability, the reason being that there are high losses whenever there is a failure. So
we may have to invest some time in preventive maintenance, which may reduce availability,
sometimes but not always, because there is a gain as well: an increase in uptime is expected.
Generally, however, that gain is not realized, because the uptime may also be a little
reduced when we do maintenance early, before the life is fully consumed. So overall,
preventive maintenance generally tends to give you lower availability.
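To see how the three definitions relate, the uptime and downtime forms of the slide formulas can be written out in Python. This is a sketch only: the function names are illustrative, and the numbers for mean uptime and the three mean downtime components are hypothetical, not data from the lecture.

```python
def inherent_availability(mut, mdt_cm):
    """Counts only downtime due to corrective maintenance."""
    return mut / (mut + mdt_cm)


def achieved_availability(mut, mdt_cm, mdt_pm):
    """Adds downtime due to preventive maintenance."""
    return mut / (mut + mdt_cm + mdt_pm)


def operational_availability(mut, mdt_cm, mdt_pm, mdt_adm):
    """Adds administrative, supply, and maintenance delays as well."""
    return mut / (mut + mdt_cm + mdt_pm + mdt_adm)


# Hypothetical mean values in hours, for illustration only.
mut, mdt_cm, mdt_pm, mdt_adm = 500.0, 8.0, 4.0, 12.0

a_inh = inherent_availability(mut, mdt_cm)
a_a = achieved_availability(mut, mdt_cm, mdt_pm)
a_o = operational_availability(mut, mdt_cm, mdt_pm, mdt_adm)

print(f"inherent    = {a_inh:.4f}")
print(f"achieved    = {a_a:.4f}")
print(f"operational = {a_o:.4f}")
```

For the same mean uptime, each added downtime component can only lower the result, so inherent availability is the largest of the three and operational availability the smallest.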

(Refer Slide Time: 28:45)

If we consider our main interest, it is mostly in downtime, such as for railways or other
facilities where downtime is highly costly. Mean downtime can be calculated from
unavailability. What is unavailability? Unavailability is 1 minus steady-state availability.
There are 60 minutes in each hour, 24 hours in each day, and 365 days in a year. Therefore, if
we multiply the unavailability by 365, 24, and 60, we can calculate the downtime in terms of
minutes per year.

$$MDT = (1 - A_{SS}) \times 365 \times 24 \times 60 \ \text{min/yr}$$

For example, if our availability requirement is 99.99 percent, that is 99.99 divided by 100,
which gives us an availability of 0.9999. The unavailability is then 1 minus availability,
which equals 0.0001, or 10 to the power of minus 4, that is, 0.01 percent. We can use this
value to calculate the mean downtime.

In the table provided, we can see that when the availability is 90 percent, the downtime is 10
percent. This means that in a year, we are spending 876 hours or 52,560 minutes on
downtime. If we aim for 99 percent availability, the downtime is 87.6 hours or 5,256 minutes
per year. If we aim for 99.9 percent availability, the downtime allowed is only 0.1 percent,
which is equivalent to 8.76 hours of downtime per year. We can obtain other figures in a
similar manner.

As we can see, these figures need to be determined based on our requirements, and the target
is set accordingly to allow for the desired level of downtime.
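The downtime figures quoted from the table follow directly from the formula above. A minimal Python sketch, with an illustrative function name, is:

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability):
    """Mean downtime per year implied by a steady-state availability (0 to 1)."""
    return (1.0 - availability) * MINUTES_PER_YEAR


for availability in (0.90, 0.99, 0.999, 0.9999):
    minutes = downtime_minutes_per_year(availability)
    print(
        f"availability {availability:.2%}: "
        f"{minutes:,.2f} min/yr = {minutes / 60:,.3f} hr/yr"
    )
```

For 90 percent availability this gives about 52,560 minutes, or 876 hours, per year, in line with the table.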

(Refer Slide Time: 31:07)

There are some safety-related terms, the first being system safety. This involves the
application of engineering and management principles, criteria, and techniques to achieve an
acceptable level of mishap risk. As we discussed earlier, risk cannot be reduced to zero, but
we can aim for a risk level of, let's say, 10 to the power of minus 6 failure probability per
year. This means that out of 1 million devices, one device may fail in a year, that is, there
is a 1 in a million chance of failure per year. While this is a low risk, it is not zero, so
an acceptable level of risk must be determined. System safety aims to achieve a low and
acceptable level of risk by using engineering and management approaches.

However, there are operational effectiveness and suitability constraints that must be
considered. We cannot assume that everything is possible; we must consider practical factors
such as time and cost. The safety measures cannot be so expensive that the system becomes
unusable or less helpful, and this holds throughout the system's lifecycle phases. The second
term is accident, which refers to an event or a series of events resulting in loss of human
life, damage to property, or environmental damage.

What is a hazard? A hazard is a potential condition that can lead to something bad happening.
However, for that condition to cause harm, certain accidental events must occur or certain
things must go wrong. Therefore, a hazard is a condition that exists in the system, and an
accident turns that hazard into a mishap. While a hazard is a deterministic condition that
either exists or does not, an accident makes it probabilistic because the accidental event
may happen at any moment. As a result, the probability of a mishap happening is captured by
the risk, which is a probabilistic term that combines hazard and accident.

(Refer Slide Time: 33:36)

What is a safe system? A safe system is a probabilistic system that ensures there is no single
point of failure leading to disasters. Therefore, redundancy in terms of sensors, computers,
and effectors is implemented to minimize the likelihood of harm. At least 2-3 accidents or
failures should occur before something bad happens; a single failure can happen at any
moment due to any reason.

An inherently safe system is a clever mechanical arrangement that cannot cause harm.
However, every arrangement can fail, and when it does, it may lead to harm. Therefore, a
probabilistic concept may also come into play, and an inherently safe design is not always
possible.

A failsafe system is one that will not cause harm on failure. So even if it fails, it fails in a
condition that will not result in harmful consequences.

A fault-tolerant system can continue to operate despite faults, although its operation may be
degraded. Fault-tolerance means that even though there is a fault, the system can still
continue to operate without causing problems.

So which is better, failsafe or fault-tolerant? While we would prefer a failsafe system, it is not
always achievable. Therefore, we often have to settle for a fault-tolerant system, where the
system is tolerant of faults and does not cause problems.

(Refer Slide Time: 35:20)

Whenever we discuss the concept of risk, the concept of probability comes into play.
Probability refers to the likelihood of a particular adverse event occurring during a specified
period or resulting from a particular challenge, such as a threat or failure. When such an
adverse event occurs, we want to know what the probability of that event is. An adverse event
is an occurrence that produces harm or loss to humans, populations, or the environment.

Risk is calculated by multiplying the probability of each possible event by its consequence
and adding up these products. The consequence (Ci) refers to the amount of loss, and the
probability (pi) refers to the likelihood of that consequence occurring. If we sum up pi
times Ci over all events, we get the total risk. This is the formula for linear risk.
However, there is also nonlinear risk, which applies when high-consequence events are
perceived to have a higher risk than low-consequence events.

For example, there may be 10 motorbike accidents resulting in 1 death each, and 1 bus
accident resulting in 10 deaths. Both cases have the same linear risk of 10 deaths per year.
However, in the second case, the consequence is higher (10 deaths), so the risk is perceived
to be higher. To account for this, we may use a nonlinear factor (such as 2) to adjust the risk
calculation. This formula is used only in specific conditions when specified.
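The motorbike and bus comparison can be worked through in a few lines of Python. The lecture does not spell out how the nonlinear factor enters the formula, so this sketch assumes it is an exponent k applied to the consequence, that is, risk as the sum of pi times Ci raised to the power k; the function name and the use of yearly frequencies in place of probabilities are also assumptions made for illustration.

```python
def risk(events, k=1):
    """Sum of frequency * consequence**k over all events.

    events: iterable of (frequency_per_year, consequence) pairs.
    k: assumed nonlinear exponent on the consequence; k = 1 gives linear risk.
    """
    return sum(frequency * consequence**k for frequency, consequence in events)


motorbike = [(10, 1)]  # 10 accidents per year, 1 death each
bus = [(1, 10)]        # 1 accident per year, 10 deaths

print("linear risk:   ", risk(motorbike), risk(bus))
print("nonlinear, k=2:", risk(motorbike, k=2), risk(bus, k=2))
```

Under this assumed form, both cases have a linear risk of 10 deaths per year, while with k equal to 2 the bus case is weighted ten times higher than the motorbike case.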

(Refer Slide Time: 38:11)

There can be residual risk when designing a system. We develop control measures to reduce
the risk, and this residual risk becomes the factor by which we decide whether the system will
continue to run. In risk analysis, we use all available information to identify hazards and
estimate risks, and then we use the analysis results to determine whether the system should be
allowed to run.

Risk-based approaches are used in various contexts, including for economic purposes. In
terms of safety, we use risk-based approaches to ensure that products, processes, and systems
are safe, considering all hazards and associated risks. This complete process is called
probabilistic risk assessment, which includes various subjects. More details can be found by
following this approach.

In future lectures, we will discuss reliability, availability, and maintainability, with a focus on
reliability. We have used the following references in preparing these lectures. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 04
Probability Basics

Hello, everyone. This is lecture number 4 of our series on Introduction to Reliability
Engineering. Earlier, we discussed various terms and definitions and their formulas, which
are used in reliability engineering. As we have seen, most of the terms are measured in
terms of probability.

Therefore, we have to understand and refresh our probability concepts. Here, we will go
through basic probability concepts which you might have studied in your secondary
education or in your B.Tech. The same will be refreshed here because these are the terms
which we will be using when we discuss other things.

(Refer Slide Time: 01:15)

We will start with sample space. To understand sample space, we first need a common term,
experiment. So what is an experiment? The term is used in statistics to describe a process
which generates data. So, an experiment is a process by which we are generating data.

This data consists of the various possible outcomes of the experiment. The set of all
possible outcomes is called the sample space. That means anything which is possible when
we are doing the experiment is covered in the sample space. Every outcome in the sample
space is called a sample point.

(Refer Slide Time: 02:05)

Let us take an example involving tossing a coin and throwing a dice. When we toss a coin,
there are two possible outcomes, head or tail. Similarly, when we throw a dice, we have six
possible outcomes, 1 to 6.

Here, the experiment is done in the following way. We first toss a coin. If the coin toss
gives a head, then we toss the coin again. If the coin toss gives a tail, then we throw the
dice. So what are the possible outcomes? In the first case, we get a head the first time,
and the second time we get either a head or a tail. In the second case, we get a tail the
first time.

If we get a tail the first time, then we throw the dice. So we will have a tail, followed by
one of the six possibilities on the dice: 1, 2, 3, 4, 5, 6. This describes our sample space,
where each outcome is a sample point: head and head, head and tail, tail and 1, and so on up
to tail and 6. All these outcomes are sample points, and the set of all these points is
called the sample space.

Similarly, whatever experiment we are doing, we will have a set of all possibilities. When
we list them out, each possibility is called a sample point, and the set of all
possibilities is called the sample space.
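The sample space of the coin and dice experiment described above can be listed out programmatically. This is a small Python sketch; the labels "H" and "T" and the tuple representation are choices made for illustration.

```python
coin = ("H", "T")
dice = (1, 2, 3, 4, 5, 6)

# Toss a coin first: on a head, toss the coin again; on a tail, throw the dice.
sample_space = []
for first in coin:
    if first == "H":
        sample_space.extend((first, second) for second in coin)
    else:
        sample_space.extend((first, face) for face in dice)

print(sample_space)
print("number of sample points:", len(sample_space))
```

Each tuple in the list is one sample point, and the whole list is the sample space: two points that start with a head and six that start with a tail.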

(Refer Slide Time: 04:10)

But we are more interested in events than in individual sample points. What is an event? An
event is a subset of the sample space. It may be the complete sample space, it may be the
null set, or it may be a partial set of the sample space. Any event A with respect to a
particular sample space is simply a set of possible outcomes.

For example, say event A is that the outcome of throwing a dice is divisible by 3. We have
two possible outcomes here, 3 and 6. So event A means 3 and 6: two sample points are there,
but together they have a meaning. What is the meaning? They are divisible by 3. So we have
the event A, which is a set of two sample points, 3 and 6.

We can have another example: the operational states of components in which the system is
considered to be functioning. There can be various states of the components; in some states
the system may be functioning, and in some states the system may be considered failed.

So, the states of components in which the system is considered to be functioning can be one
event. Other events can be failure events. Therefore, what is an event? It is a group of
outcomes of the sample space, and the members of this group, the sample points, have some
common characteristic.

So, whenever we define an event, we have to see that there is some meaning to it. We cannot
call anything an event. To call it an event, we should have some purpose, some meaning
assigned to it. So, we are interested in events, not in individual sample points.

(Refer Slide Time: 06:11)

Now, as we see, different events will be different sets, but they are all subsets of the
sample space. So, in the sample space S, we have event E1 here, E2 here, and so on. There
are various possibilities. A set can be a null set or empty set: an event which does not
contain any outcome of the sample space is the null set.

Then, we have the union, for two or more events. We are showing it here for two events, E1
and E2. If we take the union of these, then all possibilities covered in E1 and E2 together
we call the union. So the outcomes which are in E1, the outcomes which are in E2, and the
outcomes which are common to both are all covered here.

If we talk about intersection, then only the outcomes which are common to both events are
covered in the intersection. Two events are mutually exclusive when they have nothing in
common, so the intersection of the two events results in a null set.

Next is containment. Here, E2 is contained in E1: all outcomes in E2 are already part of E1.
The complement event is everything in the sample space outside the event. So event E is
here, and event E bar is everything in the sample space which is not in E. That is our
complement event.
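These set relations map directly onto Python's built-in `set` type. The sketch below uses the outcomes of a single dice throw as the sample space; the particular events chosen (even, odd, divisible by 3) are only illustrative.

```python
S = {1, 2, 3, 4, 5, 6}      # sample space of one dice throw
even = {2, 4, 6}
odd = {1, 3, 5}
divisible_by_3 = {3, 6}
six = {6}

union = even | divisible_by_3         # outcomes in either event
intersection = even & divisible_by_3  # outcomes common to both events
mutually_exclusive = (even & odd) == set()  # nothing in common gives the null set
contained = six <= even               # every outcome of `six` is also in `even`
complement_of_even = S - even         # everything in S that is not in `even`

print("union:", sorted(union))
print("intersection:", sorted(intersection))
print("even and odd mutually exclusive:", mutually_exclusive)
print("six contained in even:", contained)
print("complement of even:", sorted(complement_of_even))
```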

(Refer Slide Time: 08:00)

As we go on, we will see that we have various states and various possibilities. Counting
these possibilities and states is required whenever we study reliability engineering. That
is why we will briefly discuss the methods which are used to count the total sample points,
or to assess the sample space.

Let us say we have k operations, and each operation can be performed in a different number
of ways. The first operation can be performed in n1 ways, the second in n2 ways, and so on,
and the kth operation in nk ways. If we do these operations one by one, in how many ways
can the whole sequence be performed?

If we multiply these possibilities, the number of ways in which this sequence of operations
can be performed is n1 into n2 into, and so on, till nk. Let us take an example: in a
voltage divider circuit, we use two resistances, r1 and r2.

So, here we have two resistances. Each resistance, let us say, has five states: working
within tolerance, which is our one working state, failed in short mode, failed in open
mode, reduced resistance, and increased resistance.

As we see here, working within tolerance is the only state in which our system will work
properly. In all other cases, the system will not be able to give us the proper output; the
output will not have the voltage division that we require. So, how many states are there?
Both resistances can have 5 states each, so the total possibilities are 25.

For example: first one working and second one working; first one working and second failed
in short mode; first one working and second failed in open mode; and so on. So, we have 5
into 5 possibilities, out of which there is only one in which the system will work: when r1
is working within tolerance and r2 is also working within tolerance.

When any one of them is in any other state, it will result in a failure. So the system has
one success state and 24 failure states. We need this counting process to understand how
many states there will be, and to find out which are the failure cases and which are the
success cases.
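The 25 states of the voltage divider can be enumerated with `itertools.product`, which applies the multiplication rule directly. This is a sketch; the state labels are shortened versions of the five states named in the lecture.

```python
from itertools import product

RESISTOR_STATES = (
    "within tolerance",
    "short",
    "open",
    "reduced resistance",
    "increased resistance",
)

# Each system state is a pair (state of r1, state of r2): 5 x 5 possibilities.
system_states = list(product(RESISTOR_STATES, repeat=2))

# The divider works only if both resistors are within tolerance.
success_states = [
    state for state in system_states
    if all(r == "within tolerance" for r in state)
]
failure_states = [state for state in system_states if state not in success_states]

print("total states:  ", len(system_states))
print("success states:", len(success_states))
print("failure states:", len(failure_states))
```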

(Refer Slide Time: 10:52)

The first way of counting is permutation. Permutation is used for arranging: either we
arrange all the items in a sequence, or we arrange a part of the set in a sequence. Here,
the order of positioning is important. So which comes first, which comes second, and so on,
is important.

We have n different objects here, say x1 to xn. All are different, they are all distinct.
If we take n out of n, the number of permutations is factorial n. The first place can be
filled in n ways, the second can be filled in n minus 1 ways, and so on till 1. That is our
factorial n.

But if we want to pick some items from here, let us say r items, and arrange those r items,
then out of the factorial n arrangements, the arrangements of the remaining n minus r items
do not count. So factorial n divided by factorial n minus r will be the number of
permutations of selecting r items from n items and arranging them.

These were distinct items. Sometimes, the items may not be distinct; some items can be
similar. If n1 items are of one kind, n2 of a second kind, and so on up to nk of the kth
kind, and we are arranging all of them, then the total number of possibilities will be
factorial n divided by factorial n1, factorial n2, till factorial nk. These divisors are the
number of arrangements which we are losing because of the similarity.

So, the total arrangements would have been factorial n if all items were distinct, but since
there are groups of similar items, we divide by the factorials of those group sizes. If a
kind has only one item, its factor is factorial 1, which is 1, and this makes no change.

(Refer Slide Time: 13:03)

Let us take an example. In one year, three awards, one for research, one for teaching, and
one for service, are given to a class of 25 graduate students in a statistics department. If
each student can receive at most one award, how many possible selections are there? Since
one person can receive only one award, and there are three positions in sequence, 1, 2, 3,
we have 25 P 3 possible permutations.

The first award can be given to any of the 25 people, and one of them will receive it. So,
the second award can be given only to the remaining 24 people, and the third award only to
one of the remaining 23 people. This gives us the number of permutations, 25 into 24 into
23, that is 13,800.

If there were no condition that one person can receive only one award, then it would have
been 25 raised to the power 3, because anyone could be given any award. The same person
could be given more than one award, so there would be 25, 25, 25 possibilities.
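The award example can be checked with Python's standard `math.perm` (available from Python 3.8), which computes factorial n divided by factorial n minus r. The variable names below are illustrative.

```python
from math import factorial, perm

students, awards = 25, 3

# At most one award per student: ordered selection without repetition.
without_repetition = perm(students, awards)

# Same value from the factorial formula n! / (n - r)!.
from_formula = factorial(students) // factorial(students - awards)

# No restriction: any student may receive any award.
with_repetition = students ** awards

print("25 P 3 =", without_repetition)
print("25 ^ 3 =", with_repetition)
```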

(Refer Slide Time: 14:17)

But many times we more frequently use combinations. What are we trying to do in
combinations? We have a set of n elements, and we want to partition these n elements into
separate groups. Let us say we are partitioning them into r groups of sizes n1, n2, till nr.

The summation of ni, for i equal to 1 to r, will be equal to n, since the n items are
divided among the different groups. The number of possible ways in which the items can be
divided into these groups is factorial n divided by factorial n1, factorial n2, till
factorial nr. If you see, this is the same formula as we used for permutations with similar
items, because within a group, items are considered to be the same.

But most of the time, we are dealing with two states, or two possible groups, like failure
and success. In that case, if we take r items out of n, how many items are remaining? That
is n minus r. So we have two groups, of sizes r and n minus r.

The number of ways will be equal to factorial n divided by factorial r into factorial n
minus r. It is the same formula. Here, writing both group sizes is redundant, because once
we write n C r, or n C n minus r, the second group size becomes obvious. So only n C r is
written.

Rather than writing two terms, we write only one term. As you know, n C r is equal to n C n
minus r, because this is a division into two groups: whether we take r items or n minus r
items, it gives the same two groups.

(Refer Slide Time: 16:35)

So in how many ways can 7 graduate students be assigned to 1 triple and 2 double hostel
rooms? We have 7 people, and out of these 7 people we have to choose 3 persons for the
triple room, 2 persons for the first double room, and 2 persons for the second double room.

So, we are dividing the 7 into three parts, 3, 2, 2. That is factorial 7 divided by
factorial 3, factorial 2, factorial 2. Factorial 7 is 7 into 6 into 5 into 4 into 3 into 2
into 1, factorial 3 is 3 into 2 into 1, and each factorial 2 is 2 into 1. Factorial 3
cancels, and 2 into 2 is 4, which cancels the 4, so we are left with 7 into 6 into 5. 5
sixes are 30, and 7 threes are 21, so we get 210.
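The partition formula, factorial n divided by the factorials of the group sizes, can be written as a small helper in Python. The function name `multinomial` is an illustrative choice; `math.comb` and `math.factorial` are standard library functions.

```python
from math import comb, factorial


def multinomial(*group_sizes):
    """Ways to split sum(group_sizes) distinct items into ordered groups of these sizes."""
    ways = factorial(sum(group_sizes))
    for size in group_sizes:
        ways //= factorial(size)  # each division is exact
    return ways


# 7 students into one triple room and two double rooms.
print("7! / (3! 2! 2!) =", multinomial(3, 2, 2))

# Same count by choosing room by room: 7C3, then 4C2, then 2C2.
print("7C3 * 4C2 * 2C2 =", comb(7, 3) * comb(4, 2) * comb(2, 2))

# With two groups the formula reduces to nCr, and nCr equals nC(n-r).
print("5C3 =", multinomial(3, 2), "and 5C2 =", comb(5, 2))
```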

(Refer Slide Time: 17:25)

Now, let us apply this to another example. Let us say there is a system which has 5
identical units: Unit 1, Unit 2, Unit 3, Unit 4, Unit 5. Identical means they are coming
from the same process, or they are similar in nature. The system will fail when 3 or more
units fail.

So if any 3 units fail, or 4 units fail, or 5 units fail, the system will be in failed
condition. The question is, how many total system states are there? Each unit can be in 2
states, either failure or success. So 2 into 2 into 2 into 2 into 2, that is 2 raised to the
power 5, which is 32.

That is my total number of system states. Now, what are the system states in which exactly
3 units are failing? That means 3 failing and 2 working. For example, the first three units
failed and the last two working; or the first two failed, the third working, the fourth
failed, and the fifth working; and so on.

So, out of the 5 devices, I am choosing 3 for failure and 2 for success. I can say 5 C 3 or
5 C 2. That will be equal to factorial 5 divided by factorial 3 into factorial 2, which
comes out to be 10. Similarly, how many states will there be with 4 units failed? That means
I am partitioning the 5 into 4 and 1. I select 4 out of the 5, so 5 C 4, that is factorial 5
divided by factorial 4 into factorial 1, which is 5.

And states with 5 units failed means all failed. There is only 1 case in which all failed.
That is 5 C 5, factorial 5 divided by factorial 5 into factorial 0, which gives 1. We know
factorial 0 and factorial 1 are both equal to 1. The system fails in all three of these
cases, so the total number of failure states is 10 plus 5 plus 1, that is 16. This is how
we are able to understand, in reliability, how we get the failure states and success states.
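The same counts can be obtained by brute-force enumeration of all 2 to the power 5 unit states and compared with the combination formula. In this Python sketch, a unit state of 1 means failed and 0 means working; that encoding is a choice made for illustration.

```python
from itertools import product
from math import comb

N_UNITS = 5
FAIL_THRESHOLD = 3  # the system fails when 3 or more units have failed

# Every system state is a tuple of unit states: 1 = failed, 0 = working.
all_states = list(product((0, 1), repeat=N_UNITS))

# Number of system states with exactly k failed units, for k = 0..5.
states_by_failures = {
    k: sum(1 for state in all_states if sum(state) == k)
    for k in range(N_UNITS + 1)
}

failure_states = sum(
    count for k, count in states_by_failures.items() if k >= FAIL_THRESHOLD
)
success_states = len(all_states) - failure_states

print("total states:", len(all_states))
for k in range(FAIL_THRESHOLD, N_UNITS + 1):
    print(f"{k} units failed: {states_by_failures[k]} states, 5C{k} = {comb(N_UNITS, k)}")
print("failure states:", failure_states, "success states:", success_states)
```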

(Refer Slide Time: 19:40)

Now, we try to understand basic probability. Whenever we are doing an experiment, we know
that there are N different equally likely outcomes, the sample space we can say. But we are
interested in an event A, and n of these outcomes correspond to the event A. What is the
probability of A?

The probability is the number of outcomes in which event A occurs divided by the total
number of outcomes. The same thing applies from the data, or observations, point of view.
In that case, capital N is the total number of observations and small n is the number of
observations in which event A is true.

Then the probability of event A is n, the number of times event A occurred, divided by the
total number of trials which are done, that is capital N. But this will not be an accurate
estimate if we are not able to do the experiment a sufficient number of times. So, the
number of trials should be very, very large, that is, N tending to infinity.

In that case, this gives us a true estimate of the probability of event A. The probability
of event A obeys the following postulates. Probability is always a non-negative quantity,
and it is always a value from 0 to 1. It is a proportion, so it can be maximum 1 and
minimum 0.

The probability of a certain event equals 1. If any event is certain, we are sure about it,
there is no doubt about it, then its probability is equal to 1. The probability of the
complement event, that is, the probability of A bar, is equal to 1 minus P(A). As we saw
earlier, this is A, and this is A bar.

The total space is S, so the probability of A bar will be equal to the probability of the
sample space minus the probability of A. And we know the probability of the sample space is
1. So, this becomes 1 minus P(A).
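The relative-frequency view of probability, n divided by N for a large number of trials, can be illustrated with a short simulation. This Python sketch assumes a fair six-sided dice and uses the event "outcome divisible by 3" as an example; the function name, the trial counts, and the fixed random seed are choices made for illustration.

```python
import random


def relative_frequency(event, n_trials, seed=0):
    """Estimate P(event) as n / N over N simulated throws of a fair dice."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    hits = sum(1 for _ in range(n_trials) if event(rng.randint(1, 6)))
    return hits / n_trials


def divisible_by_3(face):
    return face % 3 == 0


def not_divisible_by_3(face):
    return not divisible_by_3(face)


for n_trials in (10, 1_000, 100_000):
    print(n_trials, relative_frequency(divisible_by_3, n_trials))

p_a = relative_frequency(divisible_by_3, 100_000)
p_a_bar = relative_frequency(not_divisible_by_3, 100_000)
print("P(A) + P(A bar) =", p_a + p_a_bar)
```

As the number of trials grows, the estimate should move closer to the classical value of 2 out of 6, and the estimates for A and A bar over the same throws add up to 1.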

(Refer Slide Time: 22:07)

The Idempotence Law is also widely used. If we take the union of an event A with itself, we
get only A; the event does not change. Similarly, if we take the intersection of A with A,
the event remains A.

If we take the intersection with A multiple times, it remains A, and the same holds for the
union. If I take A intersection B and then intersect again with A or with B, it is not going
to change; it remains A intersection B. So A intersect B intersect A intersect B remains
A intersect B, because A intersect A is A and B intersect B is B. The same applies to the
union.

Now, say the probability of A is p and the probability of A bar is q. If we write p into p,
meaning the probability of A intersection A, this is always p; it is not equal to p square.
Why? Because A intersection A remains A only. So, for the same event, p into p does not give
p square; it gives p only.

Similarly, if we take the union of A with itself again and again, then p plus p will not be
equal to 2p; it will be equal to p only. There can be confusion here. Many times there are
two different events, A and B, with P(A) equal to P(B) equal to p. Let us say we are using
two resistors and both are identical, so the reliability, or probability of success, is the
same for both, say p.

In that case, the sum P(A) plus P(B) is 2p, because these are two different events, not the
same event. So, we have to be cautious: whenever the same event appears more than once, we
must first simplify at the event level by applying the Idempotence Law, and only then
substitute probabilities. So, for the same event, p plus p is p, p plus q is 1, and p dot q
is 0. Here p plus q means A or A bar happens,

which is the complete sample space, so it is 1. For p dot q, there is nothing common between
A and A bar, so the probability of the intersection is always 0.
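
The event-level rules above can be checked with Python sets. This is a minimal sketch; the
die sample space and the choice of A as "even number" are assumptions for illustration only.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one roll of a fair die (assumed example)
A = {2, 4, 6}                     # event A: an even number shows
A_bar = sample_space - A          # complement of A

def prob(event):
    """Probability of an event as (favourable outcomes) / (all outcomes)."""
    return Fraction(len(event), len(sample_space))

p, q = prob(A), prob(A_bar)
print(prob(A | A) == p)   # union of A with itself is still A
print(prob(A & A) == p)   # intersection of A with itself is still A
print(prob(A | A_bar))    # A or A bar covers the whole sample space
print(prob(A & A_bar))    # A and A bar share no outcomes
```

The point of working with sets first is that the simplification happens on events, before
any probabilities are multiplied or added.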

(Refer Slide Time: 25:12)

Two concepts come into the picture here: independent events and mutually exclusive events.
Two events A and B are said to be independent if the occurrence of event A does not depend on
the occurrence of B. That means events A and B do not influence each other.

In that case, the probability of occurrence of A given that B has occurred is P(A). Similarly,
the probability of occurrence of B given that A has occurred is P(B). They do not change. Two
events are said to be mutually exclusive if both cannot occur simultaneously. When tossing a
single coin, head and tail cannot come together. But if you toss two coins, their outcomes
are independent.

If head comes on the first coin, it is not going to influence whether head or tail comes on
the second. So, the two events are independent. But for the same coin, the outcomes head and
tail are mutually exclusive: if head comes, tail will not come, and if tail comes, head will
not come. So, the probability of the intersection of two mutually exclusive events is 0,
because they do not have anything in common.

Mutually exclusive events are generally dependent events because they influence each other:
if head is there, then tail will not come. So, they are not considered independent; they are
dependent.
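
The two-coin discussion above can be checked by enumerating the sample space. The following
Python sketch is only an illustration; the event names are assumptions introduced here.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins: ('H', 'H'), ('H', 'T'), ...
space = set(product("HT", repeat=2))

def prob(event):
    return Fraction(len(event), len(space))

first_head = {s for s in space if s[0] == "H"}
first_tail = {s for s in space if s[0] == "T"}
second_head = {s for s in space if s[1] == "H"}

# Different coins: the joint probability equals the product, so they are independent.
independent = prob(first_head & second_head) == prob(first_head) * prob(second_head)

# Same coin: head and tail share no outcome, so they are mutually exclusive ...
mutually_exclusive = len(first_head & first_tail) == 0

# ... and the product rule fails for them, so they are dependent.
exclusive_pair_independent = (
    prob(first_head & first_tail) == prob(first_head) * prob(first_tail)
)
print(independent, mutually_exclusive, exclusive_pair_independent)
```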

(Refer Slide Time: 26:45)

The joint probability that A and B both occur, that is, the probability of A intersection B,
is the probability of A multiplied by the probability of B given A, or equivalently the
probability of B multiplied by the probability of A given B.

This is called the Law of Intersection. This is the formula for dependent events. If the
events are independent, we know that the probability of B given A is P(B) and the probability
of A given B is P(A). So, in that case, the first form simply becomes P(A)P(B) and the second
becomes P(B)P(A), which is the same as P(A)P(B).

This is called the Multiplication Theorem. So, for independent events, the probability of the
intersection is simply the product of the probabilities.

(Refer Slide Time: 27:45)

Next is the Union Law for two events. There is a little difficulty here. If we have two
events A and B, the intersection area, A intersection B, is counted in event A and also in
event B. When we take the probability of event A, the outcomes common to both, A intersection
B, are covered, and the same outcomes are also part of B.

So the same outcomes are counted twice. In the probability space, this area is counted twice,
in A and in B. But we have to count it only once, so we subtract it once. The intersection
probability is subtracted from P(A) plus P(B) so that A intersection B is not counted twice.

In the case of independent events, the probability of A union B is P(A) plus P(B) minus P(A)
into P(B). If they are dependent, the term subtracted is the probability of A intersection B,
which we can calculate either as P(A) into P(B) given A or as P(B) into P(A) given B.

If the events are mutually exclusive, the intersection term is equal to 0. So we have one
form for independent events, one for dependent events, and one for mutually exclusive events.
For mutually exclusive events, the probability of A union B is simply the sum of the
probabilities.
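
The intersection and union laws above can be verified on a small sample space. This is a
minimal sketch; the die and the two events A (even number) and B (number greater than 3) are
assumed examples, chosen so that the events are dependent.

```python
from fractions import Fraction

space = set(range(1, 7))   # one roll of a fair die (assumed example)
A = {2, 4, 6}              # even number
B = {4, 5, 6}              # number greater than 3

def prob(event):
    return Fraction(len(event), len(space))

p_joint = prob(A & B)                          # P(A and B), counted directly
p_b_given_a = Fraction(len(A & B), len(A))     # P(B | A): sample space reduced to A
p_union = prob(A) + prob(B) - p_joint          # union law, overlap subtracted once

print(p_joint, prob(A) * p_b_given_a)          # both forms of the intersection agree
print(p_union, prob(A | B))                    # union law agrees with direct counting
```

Here P(A)P(B) is 1/4 while P(A intersection B) is 1/3, so using the independent form would
give the wrong union probability for these two events.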

For three independent events, the same formula becomes P(A) plus P(B) plus P(C), minus the
two-event intersections AB, AC and BC, plus the three-event intersection ABC. As we see here,
if I have n events, the number of terms is 2 to the power n, minus 1.

Here we have 3 events, so 2 to the power 3 minus 1, that is 7 terms: 1, 2, 3, 4, 5, 6, 7.
All possibilities are counted: 1 out of 3, 2 out of 3, and 3 out of 3. The only possibility
not counted is the absence of all events. The total number of possible combinations is 2 to
the power n, and this is the only one we are not counting here.

This formula creates a problem. If we have, let us say, 10 items, the count becomes 2 to the
power 10 minus 1, that is 1023. We would have to evaluate that many combinations, which is
cumbersome. We will see a little later how this can be addressed.
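
To see how quickly the number of terms grows, the expansion above can be sketched in Python
for independent events. The function name and the probability values are assumptions for
illustration.

```python
from itertools import combinations
from math import prod

def union_inclusion_exclusion(probs):
    """P(union) for independent events, expanded term by term.

    Returns the probability and the number of terms evaluated (2**n - 1).
    """
    n = len(probs)
    total, terms = 0.0, 0
    for k in range(1, n + 1):
        sign = 1 if k % 2 == 1 else -1      # add odd-sized groups, subtract even-sized
        for subset in combinations(probs, k):
            total += sign * prod(subset)    # independence: intersection = product
            terms += 1
    return total, terms

print(union_inclusion_exclusion([0.9, 0.8, 0.7]))   # three events, 7 terms
print(union_inclusion_exclusion([0.5] * 10)[1])     # ten events, 1023 terms
```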

(Refer Slide Time: 30:28)

Then there is the concept of conditional probability. Conditional probability is a little
different from intersection. In conditional probability also, both events are happening:
A happening given B. That means, given that event B has happened, what is the probability
that A will happen?

When we evaluate this probability, both events are happening, just as in A intersection B,
but the reference point has changed. Whenever we say intersection, the reference is the whole
sample space, that is, P(A intersection B) divided by the probability of the sample space.

But when we say conditional probability, event B has already happened, so our sample space is
reduced to only B. The probability of our sample space was 1, but now it has changed to the
probability of B. So, we have to divide by the probability of B: out of B, what is the
probability that A intersection B happens? That is the probability of A given B.

So, given that B has happened, the probability of A is limited to this area, that is,
A intersection B. So the probability of A intersection B divided by the probability of B
gives A given B. Similarly, B given A means: given that A has happened, what is the
probability that B will happen? That is again the intersection area, so it is the
intersection probability divided by P(A). From here, P(A intersection B) can be calculated as
P(B) multiplied by P(A given B), or as P(A) multiplied by P(B given A).

(Refer Slide Time: 32:06)

As an example, consider a regularly scheduled flight, which has a departure and an arrival.
The probability that the flight departs on time is 0.83. The probability that it arrives on
time is 0.82. The probability that the flight both departs and arrives on time is 0.78.

Now, suppose I know that my flight has departed on time. I want to know the probability that
it will arrive on time.

That is the probability of arriving on time given that it has departed on time: out of all
on-time departures, how often has it arrived on time? The probability of an on-time departure
is 0.83, and the probability of an on-time arrival together with an on-time departure is
0.78. So 0.78 divided by 0.83 gives 0.94. That means there is a 94% chance that a flight
which has departed on time will arrive on time.

Similarly, I can evaluate the probability that the flight departed on time given that it has
arrived on time. That is the probability of A intersection D divided by the probability of A,
where A is on-time arrival and D is on-time departure, that is 0.78 divided by 0.82. That
comes out to be 0.95.
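
The flight example can be reproduced in a few lines of Python. The variable names are
assumptions introduced here; the three input probabilities are the ones given in the lecture.

```python
p_depart = 0.83   # P(D): flight departs on time
p_arrive = 0.82   # P(A): flight arrives on time
p_both = 0.78     # P(A and D): departs and arrives on time

# Conditional probability: divide the joint probability by the probability
# of the event that is already known to have happened.
p_arrive_given_depart = p_both / p_depart
p_depart_given_arrive = p_both / p_arrive

print(round(p_arrive_given_depart, 2))   # about 0.94
print(round(p_depart_given_arrive, 2))   # about 0.95
```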

(Refer Slide Time: 33:40)

I think we will stop it here, and we will continue this discussion to the next lecture.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 05
Probability Basics (Contd.)

(Refer Slide Time: 00:36)

Hello everyone, we have now reached lecture number 5. This is a continuation of our previous
lecture, lecture number 4 on probability basics. We will first discuss De Morgan's theorem. De
Morgan's theorem is related to the complement of events.

If we have two events, A and B, and we are interested in their union, taking the complement of
it means we are talking about the area outside A union B. This can be obtained by finding the
intersection of what is outside A and what is outside B. Therefore, anything that is outside A
and also outside B gives us the complement of A union B.

Similarly, if we are interested in the complement of A intersection B, we need to find the
union of what is outside A and what is outside B. This is because everything outside A and
everything outside B has to be covered.

So, in a simple way, we can see that when we take the complement of a group, the individual
elements will be complemented and the signs will be reversed. If it is a union, it will become an
intersection, and if it is an intersection, it will become a union.

We can use this property to simplify cases where the events are independent. For example, for
the union of n individual events, we would have had 2 to the power n minus 1 terms to solve
and evaluate. However, using De Morgan's theorem, we can simplify this: the probability of
A union B is equal to 1 minus the probability of the complement of A union B.

The complement of A union B is the intersection of A complement and B complement, and since
the events are independent, their probabilities can be multiplied. So, the probability of
A union B can be expressed as 1 minus the probability of the complement of A union B, which is
equal to 1 minus P(A complement) P(B complement).

Similarly, for three events, we can simplify it as 1 minus P(A complement) P(B complement)
P(C complement), which is much simpler than solving it the other way. If there were four
events, with D also involved, the product would simply be multiplied by the probability of
D complement as well.
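
The De Morgan shortcut for independent events can be sketched as a one-line function. The
function name and the sample probabilities are assumptions for illustration.

```python
from math import prod

def union_independent(probs):
    """P(A1 or A2 or ... or An) for independent events, via De Morgan:
    1 minus the product of the complement probabilities."""
    return 1 - prod(1 - p for p in probs)

print(union_independent([0.8, 0.8]))             # two events
print(union_independent([0.9, 0.8, 0.7]))        # three events
print(union_independent([0.9, 0.8, 0.7, 0.6]))   # a fourth event adds one more factor
```

Only n factors are multiplied, instead of evaluating 2 to the power n minus 1 terms.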

(Refer Slide Time: 03:26)

Let's take an example where an electrical system consists of four components: A, B, C, and D.
Their reliability values are given as 0.9, 0.9, 0.8, and 0.8. The system will work if A and B work,
and if either C or D works. Generally, if we represent the system diagrammatically, it will look
like A-B, and then either C or D can work. So, the system will work if A works, B works, and
either C or D works. Now, we want to know the probability of the entire system working, which
means A works, B works, and either C or D works. This can be represented as the probability of
(A intersection B intersection C) union with (A intersection B intersection D), or we can evaluate
it directly.

If we calculate it directly, then we can solve it in this way: the probability of A is 0.9, the
probability of B is 0.9, and we want to calculate the probability of (C union D) working. We can
calculate this by subtracting the probability of (C not working) and (D not working) from 1, i.e.,
1 - probability of (C bar) times the probability of (D bar). What is the probability of (C bar)?
That is 1 minus 0.8, and the probability of (D bar) is also 1 minus 0.8. So, 0.9 times 0.9 times 1
minus 0.2 times 0.2 gives us the probability of 0.7776. Therefore, we are able to calculate this.

$$P(S) = P[A \cap B \cap (C \cup D)] = P(A)\,P(B)\,P(C \cup D) = 0.9 \times 0.9 \times (1 - (1 - 0.8)(1 - 0.8)) = 0.81 \times (1 - 0.04) = 0.7776$$

Now, our second question is: what is the probability that component C does not work given that
the entire system works? That means the case where A, B, and D work but C does not work, in
which the system will still work. We can represent this as the intersection of (C bar) with S,
where S represents the event that the entire system is working. If we take the intersection of
(C bar) with S, then the term C dot (C bar) becomes 0 and is removed. The only thing remaining
is P(A) P(B) P(D) P(C bar), which we divide by the probability that the system is working. The
probability of the system working is already available to us as 0.7776. The numerator is 0.9
times 0.9 times 0.8 times the probability of (C bar), which is 0.2. This gives us a result of
0.1667, which means that there is a 16.67% chance that component C is not working when the
system is working.

$$P(\bar{C} \mid S) = \frac{P(\bar{C} \cap S)}{P(S)} = \frac{P(A)\,P(B)\,P(D)\,P(\bar{C})}{P(S)} = \frac{0.9 \times 0.9 \times 0.8 \times 0.2}{0.7776} = \frac{0.1296}{0.7776} \approx 0.1667$$
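
Both results of this example can be cross-checked by brute force, enumerating all 16
working/failed combinations of the four components. This is a sketch under the lecture's
assumption that the components are independent; the function and variable names are
introduced here for illustration.

```python
from itertools import product

reliability = {"A": 0.9, "B": 0.9, "C": 0.8, "D": 0.8}

def system_works(state):
    """The system works if A and B work and at least one of C, D works."""
    return state["A"] and state["B"] and (state["C"] or state["D"])

p_system = 0.0                 # P(S)
p_c_failed_and_system = 0.0    # P(C bar and S)
for bits in product([True, False], repeat=4):
    state = dict(zip(reliability, bits))
    p_state = 1.0
    for name, works in state.items():
        p_state *= reliability[name] if works else 1 - reliability[name]
    if system_works(state):
        p_system += p_state
        if not state["C"]:
            p_c_failed_and_system += p_state

p_c_failed_given_system = p_c_failed_and_system / p_system
print(round(p_system, 4))                  # 0.7776
print(round(p_c_failed_given_system, 4))   # 0.1667
```

The enumeration gives P(C bar and S) = 0.1296, and 0.1296 divided by 0.7776 is exactly 1/6.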

(Refer Slide Time: 06:31)

There is another important result called the Total Probability Theorem. The Total Probability
Theorem is useful when we have mutually exclusive events. For example, we may have two events,
E and E bar, where E bar is the complement of E. These mutually exclusive events divide the
entire sample space into two parts. Now, we want to know the probability of A, but we only
know the probability of event A occurring with E or without E.

To calculate the probability of A, we take the intersection of A with both E and E bar, since
together they cover the total sample space. Therefore, the probability of A is the sum of the
probability of A intersection E and the probability of A intersection E bar. If A depends on
E, each intersection is calculated by multiplying the probability of the condition by the
conditional probability of A. However, if the events are independent, we can directly
calculate the probability without using the Total Probability Theorem. The Total Probability
Theorem is mostly used when there is a dependency between events.

$$P(A) = P[(E \cap A) \cup (\bar{E} \cap A)] = P(E \cap A) + P(\bar{E} \cap A)$$
$$P(A) = P(E)\,P(A \mid E) + P(\bar{E})\,P(A \mid \bar{E})$$

(Refer Slide Time: 07:51)

The total probability theorem extends to more parts. Let us say we divide the whole sample
space into k events, B1 to Bk; here it is shown with 5 events. These k events are mutually
exclusive, so they do not share any common space. In that case, summing the intersection
probabilities of A with Bi from i equals 1 to k gives me the probability of A. In terms of
conditional probabilities, each term is the probability of Bi multiplied by the probability
of A given that Bi has occurred.

(Refer Slide Time: 08:27)

How is this useful? Let us take an example. We have an assembly plant with 3 machines, B1, B2
and B3, making 30 percent, 45 percent and 25 percent of the production. So 0.3 of the
production is covered by B1, 0.45 by B2 and 0.25 by B3. The total sample space is divided into
3 parts, and 0.3, 0.45 and 0.25 sum to 1.

Now, we know that machine B1 produces 2 percent defectives, machine B2 produces 3 percent
defectives and machine B3 produces 2 percent defectives. If I randomly select a product from
the output, what is the probability that it is defective? It is the probability that it comes
from machine 1 and is defective, plus the probability that it comes from machine 2 and is
defective, plus the probability that it comes from machine 3 and is defective.

So, the probability of A is the probability of B1 into the probability of A given B1, plus the
probability of B2 into A given B2, plus the probability of B3 into A given B3. So, 0.3 into
0.02, plus 0.45 into 0.03, plus 0.25 into 0.02; if we sum these up, we get the probability of
a defective, 0.0245.

Then there is the Bayes rule. In this example, the response variable is whether the product
is defective, and the independent variables are the machines. The machine shares do not
change, but the outcome, a defective product, depends on which machine produced it. So, the
machines are the independent variables and the outcome is the dependent variable.

(Refer Slide Time: 10:29)

But with Bayes we can interchange these: we can assess the probability of the independent
variable given that the dependent variable is known. The independent variable here is the
cause.

So, we want to know the probability of a cause based on the outcome; we want to check the
cause based on the evidence or observations we have. We want to know the probability that the
outcome comes from section Br given that outcome A is observed. This is simply the reverse
formula: we know that the probability of Br intersection A is the probability of A multiplied
by the probability of Br given A. If we divide by the probability of A, the probability of Br
given A is equal to the probability of Br intersection A divided by the probability of A.

The probability of A, using the total probability theorem, can be expressed as a sum over all
Bi. So, the probability of Br given A can be calculated using this formula, and it can also be
written in terms of conditional probabilities.

So, the probability of Br given A is equal to the probability of Br multiplied by the
probability of A given Br, divided by the sum, over all cases i equal to 1 to k, of the
probability of Bi multiplied by the probability of A given Bi. In the example taken earlier,
there were 3 machines, B1, B2 and B3.

$$P(B_r \mid A) = \frac{P(B_r \cap A)}{P(A)} = \frac{P(B_r \cap A)}{\sum_{i=1}^{k} P(B_i \cap A)}$$

$$P(B_r \mid A) = \frac{P(B_r \cap A)}{\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)} = \frac{P(B_r)\,P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)}$$

Now, we want to know the opposite: given that the product we have found is defective, what is
the probability that it has come from machine B3? That means the probability of B3 given A.
We can use this formula: the probability of B3 multiplied by the probability of A given B3,
divided by the sum over all Bi, or we can directly use P(A), which we have already calculated
as 0.0245.

$$P(B_3 \mid A) = \frac{P(B_3)\,P(A \mid B_3)}{\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)} = \frac{0.25 \times 0.02}{0.0245} = \frac{10}{49}$$

The probability of B3 is 0.25, and the probability of a defective given B3 is 0.02. If we
solve this, it comes out to be 10 divided by 49, about 0.204. That is the probability that a
defective product which we found belongs to machine B3. Similarly, we can get it for B2 and
B1.
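
The same reverse calculation can be done for all three machines at once. This is a minimal
sketch using exact fractions; the variable names are assumptions for illustration, and the
input values are those of the lecture example.

```python
from fractions import Fraction

# Prior P(Bi): share of production; likelihood P(A | Bi): defect rate per machine.
prior = {"B1": Fraction(30, 100), "B2": Fraction(45, 100), "B3": Fraction(25, 100)}
likelihood = {"B1": Fraction(2, 100), "B2": Fraction(3, 100), "B3": Fraction(2, 100)}

# Denominator from the total probability theorem: P(A).
p_a = sum(prior[m] * likelihood[m] for m in prior)

# Bayes rule: P(Bi | A) = P(Bi) * P(A | Bi) / P(A).
posterior = {m: prior[m] * likelihood[m] / p_a for m in prior}

for machine, value in posterior.items():
    print(machine, value, float(value))
```

The three posterior probabilities sum to 1, and the largest one points to the machine that is
the most likely source of a defective product.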

So, it helps us to investigate and see which are the more likely sources from where the
defect might have come. We can do cause analysis, or reverse analysis, using the Bayes rule.
There is a lot more about Bayes which you can read.

(Refer Slide Time: 13:21)

Next, we move on to probability distributions. Common terms used here are PDF and CDF. There
are generally two types of random variables: discrete and continuous. For a discrete random
variable, the set of ordered pairs (x, f(x)) is called the probability function, the
probability distribution function, or the probability mass function. Here, f(x) represents a
probability, because the total probability of 1 is divided into small probabilities over all
outcomes. f(x) is always greater than or equal to 0, and the summation of f(x) over all x
gives 1. So f(x) is a value between 0 and 1, representing the probability that the random
variable X has a value equal to x.

$$f(x) \ge 0, \qquad \sum_x f(x) = 1, \qquad P(X = x) = f(x)$$

The cumulative probability distribution function is the summation of probabilities. If we want
to know the cumulative probability up to a value x, we sum all the probabilities from minus
infinity to x. For example, if x is 2, then the cumulative probability includes all
probabilities up to x=2. As x increases, the cumulative probability can only increase. The
cumulative probability for x=3 is the cumulative probability up to 2 plus the probability
of 3. Therefore, the cumulative distribution never decreases.

$$F(x) = P(X \le x) = \sum_{t=-\infty}^{x} f(t)$$
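
A small Python sketch of a probability mass function and its cumulative sum is given below.
The random variable (number of heads in two tosses of a fair coin) is an assumed example, not
one from the lecture.

```python
from fractions import Fraction
from itertools import accumulate

# PMF of X = number of heads in two fair coin tosses (assumed example).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Postulates: every f(x) is non-negative and the values sum to 1.
assert all(p >= 0 for p in pmf.values())
assert sum(pmf.values()) == 1

# CDF: running sum of f(t) for all t up to and including x.
xs = sorted(pmf)
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))
print(cdf)
```

The running sum never decreases and reaches 1 at the largest value of x.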

(Refer Slide Time: 15:18)

Then comes the probability density function. For both discrete and continuous random
variables we say PDF, but there is a little difference. For a continuous random variable, it
is called the density function, not the distribution function. It is called the density
function because here small f(x) is equal to dF(x)/dx, the slope of the cumulative
distribution function.

Capital F(x) is a unitless quantity, but x is not unitless, and we are dividing by dx. So, the
density function small f(x) is not unitless, whereas f(x) is unitless in the case of a
discrete random variable. Only when we multiply by dx does it become unitless, that is, a
probability. So we multiply by dx and integrate: the integration of small f(x) over all
values of x, the area under this curve, is always 1.

Now, if I want to calculate the probability from a to b, that is the area under the curve from
a to b: the integration from a to b of small f(x) dx gives me the probability that X lies
between a and b. In the discrete case we should be careful with the sign, whether equality is
included. In the continuous case, whether we put the equal sign or not does not make a
difference, because point probabilities are equal to 0. A point does not have any width, so
it will not have any area under it.

$$f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1, \qquad P(a \le X \le b) = \int_a^b f(x)\,dx$$

So whether we put the equal sign here or not does not make a difference; we should only be
careful that the meaning is correct. But for a discrete distribution it matters, because
whether the value x is counted depends on the equal sign. If I say X is less than x, then the
probability of x will not be counted. So, that makes a difference here.
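
The area-under-the-curve idea can be illustrated numerically. The sketch below assumes an
exponential density with rate 0.5 purely as an example; the lecture does not specify a density
at this point, and the helper names are introduced here.

```python
import math

RATE = 0.5  # assumed rate of an exponential density, for illustration only

def pdf(x):
    """Density f(x) = RATE * exp(-RATE * x) for x >= 0, and 0 otherwise."""
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(f, a, b, steps=100_000):
    """Trapezoidal rule for the area under f between a and b."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

total_area = integrate(pdf, 0.0, 60.0)   # the tail beyond 60 is negligible here
p_1_to_3 = integrate(pdf, 1.0, 3.0)      # P(1 <= X <= 3) as an area
exact = math.exp(-RATE * 1.0) - math.exp(-RATE * 3.0)   # closed form for this density

print(round(total_area, 6), round(p_1_to_3, 6), round(exact, 6))
```

Because a single point has zero width, P(1 <= X <= 3) and P(1 < X < 3) are the same area for
a continuous random variable.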

(Refer Slide Time: 17:24)

For the cumulative distribution function of the same continuous random variable, I can get it
by integration. The integration of f(x)dx from minus infinity to t gives me the cumulative
distribution F(t). We know that if t is infinity, then F(t) becomes 1.

So, F(t) will look something like this: it always starts from 0 and finally reaches 1. And if
I take the area under the curve of f(x), it always gives me 1, as we have seen already.

$$F(t) = \int_{-\infty}^{t} f(x)\,dx, \qquad F(t) \to 1 \text{ as } t \to \infty, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

(Refer Slide Time: 18:07)

Now, there are two important terms which we use in engineering decisions: one is mean and the
other is variance. The mean tells us where the values are centred, and the variance tells us
how much spread, or variability, there is in the values.

For a discrete random variable, the mean is nothing but the summation of x multiplied by f(x),
where f(x) is the weight, that is, the proportion of time x is expected to occur. If we
multiply these and sum them up, we get the value of mu. The variance is the second moment
about the mean.

It measures how far the values are from the mean. Whether x is on the negative side or the
positive side of the mean, the distance is x minus mu, and since we are taking the square,
this is never negative. So, the square of this distance is multiplied by f(x) and summed, and
that gives me the variance. For continuous random variables, the same thing is done with
integration: the mean is the integral of x f(x)dx, and the variance is the integral of
x minus mu whole square f(x)dx, both taken over the full range of values of x.
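
For the discrete case, the two sums described above can be computed directly. The PMF used
below (number of heads in two fair coin tosses) is an assumed example.

```python
from fractions import Fraction

# Assumed example PMF: number of heads in two tosses of a fair coin.
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

# Mean: each value weighted by its probability.
mean = sum(x * p for x, p in pmf.items())

# Variance: squared distance from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

print(mean, variance)
```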

(Refer Slide Time: 19:33)

Covariance is used when there is a dependency between X and Y. If they are independent, the
covariance is equal to 0. The covariance of X and Y is the expected value of the product of
X minus mu x and Y minus mu y. If we solve that, it becomes the expected value of X into Y,
minus mu x into mu y.

If they are independent, the expected value of X into Y is the expected value of X multiplied
by the expected value of Y, which is mu x mu y, so the covariance is 0. If they are dependent,
the expected value of X into Y will in general not be equal to mu x mu y, and only in that
case do we have a nonzero covariance.

$$\sigma_{xy} = E[(X - \mu_x)(Y - \mu_y)] = E(XY) - \mu_x \mu_y$$
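
The covariance formula can be tried on two small joint distributions. Both joint PMFs below
are assumed examples: one for two independent fair 0/1 variables, and one where Y always
equals X.

```python
from fractions import Fraction

def covariance(joint):
    """Covariance E(XY) - E(X)E(Y); joint maps (x, y) to P(X = x, Y = y)."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y

quarter, half = Fraction(1, 4), Fraction(1, 2)

# Two independent fair 0/1 variables: every pair is equally likely.
independent = {(0, 0): quarter, (0, 1): quarter, (1, 0): quarter, (1, 1): quarter}

# Fully dependent case: Y always takes the same value as X.
dependent = {(0, 0): half, (1, 1): half}

print(covariance(independent), covariance(dependent))
```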

(Refer Slide Time: 20:16)

We will now discuss some important distributions, starting with distributions for discrete
random variables: first uniform, then Bernoulli, then binomial, then Poisson. The uniform
distribution, as you know, means all outcomes are equally likely. Since we are talking about
the discrete case, let us say there are k possible outcomes. For example, when tossing a
coin, we have two possible outcomes.

When we are rolling a die, we have 6 possible outcomes. All outcomes are equally probable, so
let us say the probability of each is p. If I am talking about 6 outcomes, each has
probability p, so the total will be 6p.

The summation should be equal to 1, so 6p equals 1 and p is equal to 1 upon 6. With k
outcomes, kp is equal to 1, so p is equal to 1 upon k. This p is nothing but f(x), which is
constant: all outcomes are equally likely with probability 1 upon k. For the mean of this
distribution, every f(x) is the same, so each xi is multiplied by 1 upon k and summed.

So, the summation of x divided by k gives mu, and sigma squared is 1 upon k times the
summation of x minus mu whole square.
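
For the die mentioned above, the uniform mean and variance work out as follows. This is a
minimal sketch with exact fractions; the variable names are assumptions for illustration.

```python
from fractions import Fraction

outcomes = [1, 2, 3, 4, 5, 6]   # faces of a fair die
k = len(outcomes)
p = Fraction(1, k)              # f(x) = 1/k for every outcome

mu = sum(x * p for x in outcomes)                     # (sum of x) / k
sigma_sq = sum((x - mu) ** 2 * p for x in outcomes)   # (1/k) * sum of (x - mu)^2

print(mu, sigma_sq)
```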

(Refer Slide Time: 22:00)

Then there is the Bernoulli distribution. Tossing a coin is equally probable, so we can use
the uniform distribution for it. But if the two outcomes are not equally likely, we use
Bernoulli; we can also use Bernoulli for the equally likely case. There are two possible
outcomes. In reliability, for example, we talk about two outcomes: success or failure.

Here the random variable represents whether the system state is success or failure. If the
probability of the success state is p, then the failure probability is 1 minus p, which we
represent as q. The Bernoulli random variable can have two outcomes, 0 or 1: 0 means failure
and 1 means success.

So, f(x) can be given as p to the power x into q to the power 1 minus x. When I put x equal
to 0, p to the power 0 becomes 1 and q to the power 1 remains q, so f(0) is q. If I put x
equal to 1, p to the power 1 is p and q to the power 0 becomes 1, so f(1) is p.

The mean of this distribution is p and the variance is pq. But this is for a single item, a
single coin toss or a single item failure, and most of the time we have multiple such events
happening together.

(Refer Slide Time: 23:28)

If we do multiple Bernoulli trials, and in each trial the probability of success is the same
p and the probability of failure is 1 minus p, then the number of successes follows a
binomial distribution. In the binomial distribution, the Bernoulli trial is done n number of
times.

So, here we have two parameters: the number of trials n and the probability of success p. The
binomial probability involves nCx because each number of successes can occur in several ways.
Let us say I am doing 4 trials. The 4 trials can all be successes, or one of them can fail,
and that one failure can happen in various ways.

$$b(x; n, p) = \binom{n}{x} p^x q^{n-x}, \qquad x = 0, 1, \ldots, n$$

Let us say I am talking about x successes out of n, for example 2 successes out of 4. The
possible cases of 2 successes out of 4 are 1100, 1010, 1001, 0110, 0101 and 0011. There are 6
possible states: 4C2 is 4 into 3 divided by 2, which is 6.

What is the probability of each state? It is the same for all of them. Two trials are
successes, giving p into p, that is p squared, and two are failures, giving q into q, that is
q squared. So, the total probability is p squared q squared, plus again p squared q squared,
and so on for each state.

If I sum it up, that becomes nCx p to the power x q to the power n minus x, where x is the
number of successes and n minus x is the number of failures. This gives me the binomial
probability of observing x successes out of n trials, where p is the probability of success.
For Bernoulli the mean was p; now n trials are done, so the mean becomes n into p.
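
The counting argument above can be reproduced by listing the success/failure patterns. In the
sketch below, the value p = 0.9 and the function name are assumptions for illustration; the
case of 2 successes out of 4 trials is the one from the lecture.

```python
from itertools import product
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, x, p = 4, 2, 0.9   # p = 0.9 is an assumed value for illustration

# Enumerate all 2**n success (1) / failure (0) patterns; keep those with x successes.
states = ["".join(bits) for bits in product("10", repeat=n) if bits.count("1") == x]
print(states)   # the 4C2 = 6 patterns

# Every such pattern has the same probability p**x * q**(n - x).
brute_force = len(states) * p**x * (1 - p) ** (n - x)
print(brute_force, binomial_pmf(x, n, p))
```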

Sigma squared is likewise the sum of the individual variances, that is, pq added n times,
which becomes npq. Now consider the calculation of nCx, which is factorial n divided by
factorial n minus x into factorial x. If n is very large, say n is 1000 and I want the
probability of 10 out of 1000, this calculation is difficult because the value of factorial
1000 is very high.

So, in that case, the binomial distribution is often approximated by either the normal
distribution or the Poisson distribution. When the value of p is somewhere around the middle,
0.4, 0.5, 0.6 or thereabouts, the binomial distribution approximately follows the normal
distribution. For the normal distribution, we know there are two parameters, mu and sigma.
Here mu is equal to np and sigma is equal to the square root of npq. With these two
parameters, we can use the normal tables to evaluate the probabilities.

(Refer Slide Time: 27:05)

Let us take an example. Let X be a random variable representing the number of failed components
among 5 independent, identical components, where each component has one chance in 100 of failing.
So, the probability of failure here is p equal to 1 out of 100, that is 0.01.

X, which can take the values 0, 1, 2, up to 5, has a binomial distribution with p equal to 0.01
and n equal to 5. The mean number of failures is np, that is 5 into 0.01, which is 0.05. The
variance is npq, that is 5 into 0.01 into 0.99, which is 0.0495. The probability of exactly 1
failure, f(1), is 5C1 into p to the power 1 into (1 minus p) to the power 4, which comes out to
0.048.
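
The numbers in this example can be reproduced with a short Python sketch. The function name binomial_pmf is only an illustrative choice and is not part of the lecture; the values of n and p are the ones stated in the example.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes (here: failures) in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 5, 0.01                # 5 components, each failing with probability 0.01
mean = n * p                  # np
variance = n * p * (1 - p)    # npq
p_one_failure = binomial_pmf(1, n, p)

print(f"mean = {mean:.4f}, variance = {variance:.4f}, P(X = 1) = {p_one_failure:.4f}")
```

Summing binomial_pmf(x, n, p) over x from 0 to n should give 1, which is a useful sanity check on any hand calculation.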

(Refer Slide Time: 28:26)

Then comes the Poisson distribution. In the binomial case we were counting successes over a
countable number of trials, like 5 or 10, so the sample space was discrete.

The Poisson random variable is also discrete, but here we count occurrences over a continuous
space, such as time. Let us say we want to know how many failures occur in 100 hours; in that
case, we will use the Poisson distribution. So, here we want to know the number of successes
occurring in a given time interval or in a specified region. For example, over 100 kilometers
that someone has travelled, how many failures will the person observe, or how many breaks will
the person take?

So, let us say mu is the average number of successes. Mu is the only parameter of the Poisson
distribution, and it gives the mean number of successes. The probability of x successes is e to
the power minus mu multiplied by mu to the power x, divided by factorial x, where x can be 0, 1,
2 and so on up to infinity. The mean and the variance are both equal to mu here.

$$p(x; \mu) = \frac{e^{-\mu} \mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

So, the Poisson distribution has a single parameter mu. In reliability this mu is taken as
lambda t, where lambda is the failure rate, also called the arrival rate, and t is the time. So,
multiplying the constant failure rate (arrival rate) by the time gives the expected number of
events in that time.

So, let us say the arrival rate is 10 per day; for instance, our production rate is 10 per day.
If I want to know the production in 10 days, that will be 10 into 10. So, mu will be equal to
100, where 10 is the arrival rate and t is 10 days. In the same way mu can be calculated in
other cases.

As we discussed earlier for the binomial distribution, the value of p may be on the lower side,
somewhere around 0.01 to 0.1, or on the higher side, let us say 0.99. In the case of 0.99, I can
interchange the roles of p and q: I count the complementary outcome instead, so that the
probability I work with is 0.01.

In that case the probability becomes small, and when the probability is small I can approximate
the binomial distribution by the Poisson distribution.

So, if p is, let us say, 0.01, then q will be 0.99 and the mean will be n into p. If n is 1000,
then 1000 into 0.01 gives 10, so mu becomes 10, and I will be able to use the Poisson
distribution with mu equal to 10. So, as we see here, the binomial distribution is often
difficult to evaluate, and we may be able to use the Poisson distribution as the approximation.
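
How close the approximation is for n equal to 1000 and p equal to 0.01 can be checked with a small Python sketch. The helper names and the particular values of x printed below are illustrative assumptions, not part of the lecture.

```python
from math import comb, exp, factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability of x successes in n trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, mu: float) -> float:
    """Poisson probability of x occurrences when the mean is mu."""
    return exp(-mu) * mu**x / factorial(x)

n, p = 1000, 0.01
mu = n * p  # Poisson mean used for the approximation

# Compare the two probabilities at a few illustrative values of x.
for x in (5, 10, 15):
    print(f"x={x:2d}  binomial={binomial_pmf(x, n, p):.5f}  poisson={poisson_pmf(x, mu):.5f}")
```

Python integers do not overflow, so the exact binomial value is still computable here; the point of the comparison is only to show how close the two distributions are when p is small.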

(Refer Slide Time: 31:53)

Let X be the random variable representing the number of failures and subsequent repairs of a
restorable system over a 1 year period. Assume X has a Poisson distribution with a mean failure
rate of 2 failures per year.

So, lambda is 2 per year, and we want to know the probability that there is no more than one
failure. No more than one failure means x equal to 0 or 1, so we want the probability that X is
less than or equal to 1, that is, the cumulative probability up to 1, F(1).

So, we sum over x equal to 0 to 1. The rate is 2 failures per year and the period is 1 year, so
mu will be 2 into 1, that is 2. Each term is e to the power minus 2 into 2 to the power x divided
by factorial x. If I put x equal to 0, I get e to the power minus 2 into 1; if I put x equal to
1, I get e to the power minus 2 into 2 divided by factorial 1. Adding them, 1 plus 2 gives 3 into
e to the power minus 2, which comes out to 0.406.

$$P(X \le 1) = F(1) = \sum_{x=0}^{1} \frac{e^{-2}\, 2^{x}}{x!} = 0.406$$
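
The same cumulative probability can be checked with a few lines of Python. The function name poisson_cdf is an illustrative choice; the rate and the period are the ones given in the example.

```python
from math import exp, factorial

def poisson_cdf(k: int, mu: float) -> float:
    """P(X <= k) for a Poisson random variable with mean mu."""
    return sum(exp(-mu) * mu**x / factorial(x) for x in range(k + 1))

failure_rate = 2.0               # failures per year
period = 1.0                     # years
mu = failure_rate * period       # expected number of failures in the period

p_at_most_one = poisson_cdf(1, mu)
print(f"P(X <= 1) = {p_at_most_one:.3f}")
```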

(Refer Slide Time: 33:05)

"So, that's all for now. For more details, you can refer to the Walpole book which can help you
understand probability and the importance of continuous distributions in reliability theory. We
will discuss them one by one in our upcoming lectures. Thank you."

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 06
Constant Failure Rate Model-I

Hello everyone, we have completed one week of lectures, and now we have moved on to the
second week. Our focus this week will be on constant failure rate models, also known as the
exponential distribution. In the previous week, we discussed various reliability indices, their
definitions, and their basic relationships. This week, we will be focusing on the exponential
distribution, including how it is used for modeling and how it can be used to evaluate reliability.
(Refer Slide Time: 00:59)

As discussed in previous lectures, the exponential distribution is characterized by a constant failure
rate. This failure rate, denoted as λ(t), remains unchanged over time. Therefore, if the failure rate
remains constant, the default model becomes the exponential distribution.

Failures can occur due to wear and tear, but the exponential distribution finds its main application
where failures occur by chance. As we discussed earlier, the bathtub curve plots λ(t) against t, and
its flat middle portion is the failure rate during the useful life of the component. In the case of
the exponential distribution, λ is constant during the useful life of the component. This constant
rate is significant because it represents the period during which the component is expected to
function correctly, free of manufacturing or design defects and of degradation.

Failures in this period occur randomly, due to chance events, and any component can fail for these
reasons. Therefore, there is no specific assignable cause to be found, and this can happen to any
device. Hence, any device can fail, and its time to failure follows the exponential distribution.
Since the exponential distribution is a one-parameter distribution with several convenient
properties, it is easy to handle and can be used in most reliability calculations. When referring to
system reliability standards, if no distribution is mentioned, it is assumed to be an exponential
distribution.

The exponential distribution is so popular that reliability has become synonymous with it. The
reliability formula R(t) = e^(-λt) is applicable only to the exponential distribution. Other
distributions are also used, but they are mostly confined to the data analysis part. Specifically,
for reliability prediction data, the assumed distribution is almost always the exponential
distribution. When collecting and analyzing data, other distributions with a non-constant failure
rate may be used, but due to its popularity, the exponential distribution is used almost everywhere.
This distribution has only one parameter, lambda; reliability is then a function of time t and of
this single parameter.

Lambda, which is the parameter, can be easily evaluated from the data for reliability. Reliability
data is generally the time to failure, like t1, t2, t3, because the random variable is the time to failure.
For example, if we put 100 devices on a test and had ten failures in 500 hours, the times may be
recorded as first failure at 10 hours, second at 15 hours, third at 100 hours, fifth at 150 hours, sixth
at 200 hours, and seventh at 250 hours, and so on. We may have various data points listed as t1 to
tn. Out of the 100 devices, 90 devices have not failed, so they have worked for 500 hours. Lambda
can be easily calculated as the number of failures divided by the cumulative time of operation. In
our example, the number of failures is 10, and the cumulative time of operation is the summation of
ti, where i ranges from 1 to 10, plus 90 multiplied by 500 hours for the remaining 90 devices.
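
This estimate can be sketched in Python as follows. The lecture does not list all ten failure times, so the list below is hypothetical and serves only to make the sketch runnable; the function name and argument names are likewise illustrative assumptions.

```python
def estimate_failure_rate(failure_times, n_unfailed, test_duration):
    """Failures divided by cumulative operating time (valid for the exponential model only)."""
    total_time = sum(failure_times) + n_unfailed * test_duration
    return len(failure_times) / total_time, total_time

# HYPOTHETICAL failure times in hours, invented for illustration only.
failure_times = [10, 15, 100, 120, 150, 200, 250, 300, 400, 450]

lam, total_time = estimate_failure_rate(failure_times, n_unfailed=90, test_duration=500)
print(f"cumulative operating time = {total_time} h, lambda = {lam:.6f} per hour")
```

The 90 devices that survived the test contribute 90 into 500 hours of operating time even though they add nothing to the failure count.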

This formula is only valid for the exponential distribution, but it is so easy to understand that
industry assumes it whenever there are multiple failures but many devices have not failed. These
are called censored devices because they have not failed, and we do not know when they will fail.
However, we know they will fail only after 500 hours since they have not failed yet. Using this
information, we can easily calculate lambda, which can help us evaluate other parameters. The
same is not the case when considering time-dependent failure rate models.

In this week, our focus will mostly be on discussing the constant failure rate models. After
completing the constant failure rate models, we will discuss time-dependent failure rate models.
(Refer Slide Time: 07:46)

In the case of constant failure rate models, λ(t), that is our hazard rate, is constant. Now, we
know from the formula that R(t) is given as e to the power minus the integration from 0 to t of
z(x) dx, or λ(x) dx; this formula we have already discussed in previous classes. Here the hazard
rate is the constant value lambda, so lambda comes out of the integral and we are left with the
integration of dx. This gives e to the power minus lambda into x evaluated from 0 to t, that is t
minus 0, which is t. So, we get e to the power minus lambda t.

$$\lambda(t) = z(t) = \lambda$$

$$R(t) = e^{-\int_0^t \lambda\, dx} = e^{-\lambda t}$$

So, this is how we get R(t). This distribution is applicable only for t greater than or equal to
0; time is not considered to be a negative quantity here. Similarly, we can calculate the
unreliability, which we can also call the failure probability. The failure probability is nothing
but 1 minus the reliability, so that is 1 minus e to the power minus lambda t. The probability
density function can be evaluated from the earlier formula that λ(t) is equal to f(t) upon R(t).

$$F(t) = 1 - R(t) = 1 - e^{-\lambda t}$$

$$f(t) = \lambda(t) R(t) = \lambda e^{-\lambda t}$$

So, from here f(t) will be equal to λ(t) into R(t). Now, λ(t) is constant here, that is lambda, and
R(t) we have already calculated as e^(-λt). So, their product, λe^(-λt), becomes the PDF here. For
the mean time to failure, we know that MTTF is equal to the integration of R(t) dt over the full
range. Now, t cannot be negative, so the range is from 0 to infinity. R(t) is e^(-λt), and if you
integrate this you get minus one upon lambda into e to the power minus lambda t, evaluated from 0
to infinity.

$$\text{MTTF} = \int_0^\infty e^{-\lambda t}\, dt = \left. -\frac{1}{\lambda} e^{-\lambda t} \right|_0^\infty = \frac{1}{\lambda}$$

When we put the upper limit, infinity, the term becomes 0. At the lower limit, t equal to 0, minus
minus becomes plus and we get 1 upon lambda. So, as we see, MTTF comes out to be 1 upon lambda. In
other words, MTTF is the inverse of the only parameter here, the constant failure rate lambda.
Similarly, we can evaluate the variance, which is nothing but the expectation of (t minus MTTF)
whole square.

So, as we know from the formula, this becomes the integration from 0 to infinity of t squared f(t)
dt, minus the mean square, where the mean here is MTTF. If we evaluate this integral, we get 2 MTTF
square. So, 2 MTTF square minus MTTF square gives us MTTF square, and since MTTF is 1 upon lambda,
this becomes 1 upon lambda square.

$$\sigma^2 = \int_0^\infty t^2 f(t)\, dt - \text{MTTF}^2 = \text{MTTF}^2 = \frac{1}{\lambda^2}$$

So, the standard deviation is the square root of the variance, which is one upon lambda. So, we see
here that the standard deviation of the exponential distribution is the same as the MTTF. If our
MTTF is increasing, the variability in the distribution is also increasing, which means the
uncertainty in our distribution is also increasing.

$$\sigma = \frac{1}{\lambda} = \text{MTTF}$$
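
These closed-form results can be checked numerically. The sketch below is only an illustration under stated assumptions: the value lambda = 2, the midpoint-rule integrator and the truncation of the infinite range at 50 upon lambda are all choices made here, not part of the lecture.

```python
from math import exp, sqrt

def reliability(t: float, lam: float) -> float:
    return exp(-lam * t)             # R(t) = e^(-lambda t)

def unreliability(t: float, lam: float) -> float:
    return 1.0 - exp(-lam * t)       # F(t) = 1 - R(t)

def pdf(t: float, lam: float) -> float:
    return lam * exp(-lam * t)       # f(t) = lambda e^(-lambda t)

def integrate(func, upper: float, steps: int = 200_000) -> float:
    """Midpoint-rule integral of func over [0, upper]."""
    h = upper / steps
    return h * sum(func((i + 0.5) * h) for i in range(steps))

lam = 2.0              # assumed illustrative failure rate
upper = 50.0 / lam     # the tail beyond this point is of order e^-50, so it is ignored

mttf = integrate(lambda t: reliability(t, lam), upper)
variance = integrate(lambda t: t * t * pdf(t, lam), upper) - mttf**2

print(f"MTTF     = {mttf:.6f}  (1/lambda   = {1 / lam:.6f})")
print(f"variance = {variance:.6f}  (1/lambda^2 = {1 / lam**2:.6f})")
print(f"sigma    = {sqrt(variance):.6f}")
```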

(Refer Slide Time: 11:51)

Now, let us discuss how the various functions of the exponential distribution change with the
parameter. For that, I have put an Excel sheet here. If you see, I have put all the formulas in this
sheet, and the only parameter is lambda. So, I can visualize how the distribution varies when I
change the value of lambda.

So, lambda is here. If you see this exponential distribution, I have plotted lambda here; this is my
lambda value. Since lambda is constant, this will always be a straight horizontal line; it does not
change with time.

Next, this curve is small f(t), that is the probability density function. If you see, the
probability density function starts with a high value and then decreases. Its starting value is the
same as the lambda value, that is 2, because if we put t equal to 0 in lambda e to the power minus
lambda t, it becomes lambda. So, it starts from lambda and then keeps on decreasing at an
exponential rate.

Then we have the reliability. Reliability always starts from one and then keeps on decreasing
exponentially. Similarly, I have also plotted the unreliability, which always starts from 0 and then
keeps on increasing. These curves I have plotted for lambda equal to 2, on a time scale from 0 to 4.
Now, let us say I change lambda from 2 to 5, and let us see what kind of change happens.

When I change it to 5, you see that the fall has become faster. The f(t), which was changing slowly,
is now coming down very fast. Similarly, the reliability is also coming down very fast. For better
understanding, let us do it one by one, and first see how the reliability is affected.

So, if I put lambda equal to 2, my MTTF is 0.5 hour, and the curve looks like this. If I put lambda
equal to, let us say, 0.5, you see it takes a longer time to decrease because the MTTF is higher. If
I put 0.1, the curve looks almost like a straight line over this range, because we are seeing only
its starting portion, and it decreases only to around 0.6.

So, it slopes down slowly over a long range; the MTTF is 10 here. But when I put 5 here, you will
see there is a sudden and sharp fall: within 0.5 hours to 1 hour the reliability becomes almost
equal to 0. Similarly, let us look at the impact of changing this parameter on the values of f(t),
our failure density, that is the probability density function. When lambda is large, it follows
almost the same pattern as the reliability curve: with 5, it starts from 5 and then decreases
steeply, which means most of the failures are observed in this early region.

If I make it 0.5, then it starts from 0.5 but takes longer to decrease, and it does not reach 0 very
fast. If I make it still smaller, 0.1, we see that the decrease is even smaller and looks almost
linear, because we are seeing only the initial part of the fall. Similarly, whatever we have
observed for R(t), we can observe for F(t): F(t) is one minus R(t). So, whatever pattern the
reliability has, the unreliability has the opposite pattern; at whatever rate the reliability
decreases, the unreliability increases at the same rate.

If I put lambda equal to 1, then it will look like this. If I put lambda equal to 5, the curves
change steeply: the unreliability quickly reaches 1 and the reliability quickly reaches 0. So, this
gives us an idea of how the exponential curves look; the failure rate itself always stays constant
here. So, we will continue our discussion now.
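
The spreadsheet experiment can be imitated in Python by tabulating the three curves for several values of lambda. This is only a sketch: the lambda values are the ones tried in the demonstration, while the time points and the function name curves are assumptions made here.

```python
from math import exp

def curves(lam: float, times):
    """Return (t, f(t), R(t), F(t)) rows for an exponential model with rate lam."""
    return [(t, lam * exp(-lam * t), exp(-lam * t), 1.0 - exp(-lam * t)) for t in times]

times = [0.0, 0.5, 1.0, 2.0, 4.0]     # assumed sample points on the 0 to 4 time scale
table = {lam: curves(lam, times) for lam in (0.1, 0.5, 2.0, 5.0)}

for lam, rows in table.items():
    print(f"lambda = {lam} (MTTF = {1 / lam:g})")
    for t, f, r, big_f in rows:
        print(f"  t={t:3.1f}  f={f:7.4f}  R={r:6.4f}  F={big_f:6.4f}")
```

For every lambda, f(0) equals lambda and R(0) equals 1; a larger lambda only makes the curves fall, or rise in the case of F(t), more quickly.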

(Refer Slide Time: 18:29)

Now, many times we are interested in evaluating the design life, or some other life such as the mean
life or the median life. We have calculated the MTTF, which is equal to 1 upon lambda. Now, we want
to calculate the reliability when time is equal to MTTF, that is, when time is equal to 1 upon
lambda.

So, we know the reliability at time t is e to the power minus lambda t. When t is equal to 1 upon
lambda, this becomes e to the power minus 1, which is equal to 0.368. What does it mean? When time
is equal to the mean time, only 36.8 percent of the equipment is still working, and around 63.2
percent has failed. Let us say the MTTF is 500 hours and I put 100 devices on test.

$$R(\text{MTTF}) = e^{-\lambda \times \text{MTTF}} = e^{-1} = 0.368$$

Then, in 500 hours I am expecting that out of 100, almost 63 devices will fail and only 37 will be
working. That means most of the devices are going to fail. So, when we use MTTF as a criterion for
reliability, we have to be a little careful. We have also seen in one example earlier that when we
used two different distributions, the reliability was different even for the same MTTF.

Reliability tells us the probability of failure, or what proportion of failures I am expecting from
the population, which is a much more important parameter to know than the MTTF. Because of the large
variability in the exponential distribution, the MTTF can be a little misleading here: by the time t
is equal to MTTF, a large proportion of the devices have already failed.

So, we have to be cautious when using MTTF as the reliability parameter. MTTF should not be
considered a good operating time for the system, because in general we would like most of our
devices to be working. MTTF is also not the 50 percent point. If you are interested in the 50
percent point, then we go for the median time: by the time t median, 50 percent will be working and
50 percent will have failed. So, it is the midpoint of the failures.

$$R(t_{med}) = 0.50 = e^{-\lambda t_{med}}$$

$$t_{med} = -\frac{1}{\lambda} \ln(0.5) = \frac{0.69315}{\lambda} = 0.69315\ \text{MTTF}$$

$$R(t_d) = 0.90 = e^{-\lambda t_d}$$

$$t_d = -\frac{1}{\lambda} \ln(0.9) = 0.10536\ \text{MTTF}$$

So, by the median time we expect 50 percent failures, so the reliability becomes 0.5, and that is
equal to e to the power minus lambda t with t equal to t median. Using the reverse calculation, if I
take the log on both sides, ln of 0.5 will be equal to minus lambda t median. So, t median will be
equal to minus 1 upon lambda into ln of 0.5, which comes out to be 0.69315 upon lambda.

So, 1 upon lambda is equal to MTTF and can be replaced by MTTF. The median time is then around 0.69,
or around 70 percent, of the MTTF. So, the median is less than the MTTF. Now, let us say we are
interested in the time up to which some reliability target will be met. We are producing a certain
product and our reliability target is 0.9. We are promising that a reliability of 0.9 will be there
for a certain time, and I want to know how much that time will be.

So, that means we are promising that our design will have only 10 percent failures by time td. So,
td becomes our design life: within the design life we are expecting only 10 percent to fail. We are
designing for 90 percent reliability, 10 percent failures only. So, the reliability e to the power
minus lambda into td will be 0.9 here. You can also design for 95 percent reliability, in which case
this will become 0.95; you can also design for 98 percent reliability.

So, how much reliability you design for is to be determined based on the customer or the product.
If the product requires a very high reliability to be delivered, then we set this reliability
accordingly and use that value here. Then, if the constant failure rate parameter is known to us, we
can get the value of td. In this case td will be equal to, as we have done earlier, minus 1 upon
lambda into ln of 0.9, which comes out to be 0.10536 MTTF, that is, around 10 percent of the MTTF.

So, let us say my MTTF was 500 hours. That means by 500 hours I was expecting 63 percent failures.
But for 10 percent failures, I have to promise a life which is about 10 percent of that, roughly 50
hours.

So, the design life I will be aiming for here is about 50 hours. That means I will be telling the
customer that I am giving the equipment for 50 hours, because only in this duration do I expect to
meet the reliability criterion of 0.9, though the MTTF is 500 hours. So, we have to be cautious
here: it is better to design or promise on the basis of the design life for a given reliability
value rather than on the MTTF.
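
The relationship between a reliability target and the corresponding time can be written as a one-line function. The sketch below uses the MTTF of 500 hours from the discussion above; the function name time_for_reliability is an illustrative assumption.

```python
from math import exp, log

def time_for_reliability(target: float, lam: float) -> float:
    """Time t at which R(t) = target for a constant failure rate lam."""
    return -log(target) / lam

mttf = 500.0        # hours, as in the discussion above
lam = 1.0 / mttf    # constant failure rate per hour

r_at_mttf = exp(-lam * mttf)               # reliability when t equals the MTTF
t_median = time_for_reliability(0.5, lam)  # time by which half the units have failed
t_design = time_for_reliability(0.9, lam)  # design life for 90 percent reliability

print(f"R(MTTF) = {r_at_mttf:.3f}")
print(f"t_med   = {t_median:.1f} h ({t_median / mttf:.5f} MTTF)")
print(f"t_d     = {t_design:.1f} h ({t_design / mttf:.5f} MTTF)")
```

The exact design life is a little above 50 hours, which is the rounded figure of about 10 percent of the MTTF used in the discussion.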

(Refer Slide Time: 24:55)

Let us take one example. There is a microwave transmitter with a failure rate of 0.00034 failures
per operating hour. Can we determine the MTTF? This failure rate is our lambda, so the MTTF will be
equal to 1 upon lambda, that is 1 upon 0.00034. We want to know t median; as we have seen earlier,
that is 0.69315, or around 70 percent, of the MTTF, so we can calculate it from the MTTF we have
just obtained.

Similarly, we want to know the reliability for 30 days of continuous operation. The failure rate is
given per hour, so we have to convert days into hours: 30 into 24 gives the number of hours. The
reliability will be equal to e to the power minus 30 into 24 multiplied by lambda, where lambda is
0.00034. The design life for a reliability of 0.95 is, in the same way, minus 1 upon lambda into ln
of 0.95; calculating this gives us t 0.95.
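
The four quantities asked for in this example can be computed directly. The sketch below uses only the failure rate and the targets stated in the example; the variable names are illustrative.

```python
from math import exp, log

lam = 0.00034                      # failures per operating hour

mttf = 1.0 / lam                   # mean time to failure, hours
t_median = -log(0.5) / lam         # median time to failure, hours
hours_30_days = 30 * 24            # continuous operation, so 24 hours per day
r_30_days = exp(-lam * hours_30_days)
t_095 = -log(0.95) / lam           # design life for reliability 0.95, hours

print(f"MTTF        = {mttf:.1f} h")
print(f"t_median    = {t_median:.1f} h")
print(f"R(30 days)  = {r_30_days:.4f}")
print(f"design life = {t_095:.1f} h = {t_095 / 24:.2f} days")
```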

(Refer Slide Time: 26:37)

So, let us do these calculations using the Excel sheet. My lambda here was 0.00034. I want to
calculate the MTTF, which is equal to 1 divided by lambda. Then we can calculate t median: the
median time to failure is equal to MTTF into 0.69315.

Next, we have to calculate the reliability for 30 days, that is 30 into 24 hours. This is equal to
the exponential of minus lambda into 30 into 24, and our reliability comes out to be 78 percent, or
0.78. Next, we wanted to calculate the design life for a reliability of 0.95, t 0.95. As we have
seen, this is equal to minus ln of 0.95 divided by lambda.

So, my design life comes out to be around 150 hours. If we want to convert it into days, I can
divide this by 24 because the equipment is in continuous operation, which gives around 6 days. So,
the design life for this equipment will be around 6 days: I will be promising the customer that for
6 days it will run continuously with a reliability of 0.95. So, we will stop here and continue our
discussion in the next lecture. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 07
Constant Failure Rate Model-II

Hello everyone, now, we move to lecture number 7, which is continuation of lecture number 6
on exponential distribution or constant failure rate models.

(Refer Slide Time: 00:38)

Constant failure rate models show a very important and distinct property: memorylessness. What
does this memorylessness mean? We can also call it forgetfulness. So, what does this
forgetfulness mean in terms of the exponential distribution, or CFR model? What it is forgetting
is the age. For someone like us, as we age, our failure rate, our hazard rate, or we can say our
fatality or mortality rate, will increase.

Similarly, as components age we expect that they will wear out and degrade, and because of that
the chances of failure, the failure rate, will increase later on. But here, CFR, a constant
failure rate, has a specific meaning: if the component has survived up to a certain time, then
its probability of failure per unit time remains the same. It does not change.

What does it mean? In terms of aging, it means that the component is not aging; the component is
forgetting its age. It does not remember that it has aged; it does not remember how long it has
worked. This property I can explain using the conditional reliability formula, which we have
already discussed in a previous class. The conditional reliability R(t given t0) is equal to
R(t plus t0) divided by R(t0). What does it mean? I am at time t0, the component is working, and
I want to know the reliability for a further time t.

That means up to time t plus t0, given that I am sitting at time t0. Let us take the example of a
watch. I have used the watch for, let us say, one year. So, my t0 is one year, and after one year
of use it is currently working. Now, let us say the watch follows the exponential distribution.

In that case, I am interested in knowing what the reliability will be if I use this watch for a
further 2 years. Today it is working at one year, so that means on completion of 3 years, but
from today it is 2 years. So, I want the reliability for 2 years given that it has worked for one
year. Now, it is following the constant failure rate lambda.

If I calculate this, the numerator is e to the power minus lambda into the total time, 1 plus 2,
that is e to the power minus 3 lambda. The denominator is e to the power minus lambda t0 with t0
equal to 1, that is e to the power minus lambda. Dividing, I get e to the power minus 2 lambda.
That is the same as the reliability of a new watch for 2 years. As you see here, whatever my
failure probability is for 2 years, it remains the same.

Let us say I have used the watch for 2 years, and now I am talking about 2 more years, that means
up to 4 years. In that case, the value will be e to the power minus 4 lambda divided by e to the
power minus 2 lambda; again, I get e to the power minus 2 lambda. That means the probability of
working for 2 years is the same irrespective of the age, whether I have used it for one year or
for 2 years. If it is working today, then for the next 2 years the probability of failure, or the
probability of success, is going to be the same. It changes only with the length of the further
time.

If I make it 3 years, this will become e to the power minus 3 lambda, irrespective of the time it
has already spent. So, in a way, we are saying that this is the memoryless property: the
component does not remember how long it has worked. Whatever time it has worked for, it does not
accumulate any damage; it remains the same as when it was new. The chances of failure do not
change with the age or with the life. This memoryless property becomes very useful here, because
the chances of failure per unit time are not changing with time.
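
The watch example can be checked numerically with the conditional reliability formula. In the sketch below the failure rate of 0.1 per year is a hypothetical value chosen only for illustration, and the function names are assumptions made here.

```python
from math import exp, isclose

def reliability(t: float, lam: float) -> float:
    return exp(-lam * t)

def conditional_reliability(t: float, t0: float, lam: float) -> float:
    """R(t | t0) = R(t + t0) / R(t0): survive a further time t given survival to t0."""
    return reliability(t + t0, lam) / reliability(t0, lam)

lam = 0.1      # HYPOTHETICAL failure rate per year, for illustration only
further = 2.0  # further mission time in years

for age in (0.0, 1.0, 2.0, 10.0):
    print(f"age = {age:4.1f} y  R({further:g} | age) = {conditional_reliability(further, age, lam):.6f}")
```

Every row prints the same value, e to the power minus 2 lambda, whatever the age of the watch.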

(Refer Slide Time: 05:45)

Now, the case of multiple failure modes. Generally, our devices are complex systems with multiple
components and multiple subsystems. Let us say I talk about a communication system. In a
communication system the failure can happen due to the antenna failure, or due to the failure of
the receiver, or due to the failure of the transmitter.

If there are multiple elements there, it can also be due to a power failure. So, as we see, the
system can fail in multiple ways. Why? Because there are multiple components, which can fail in
different ways, so the system can also fail in different ways. These different ways we call the
failure modes of the system.

Now, behind these failure modes is the failure mechanism. The failure mechanism tells us why they
are failing, how they are failing, and what is changing. See, if everything remains the same,
then there is no chance that there will be a failure. If things stay the same and the usage also
stays the same, then there will be no failure; if it is working today, it will work next time
also. But what happens over time is that the way we are using it, that is our load, changes, and
the components or the system that we are using also change, because nothing remains the same;
everything changes with time and usage.

So, as they change, there is degradation, or some sort of fault develops, or excessive stress
comes onto the component. So, different failure mechanisms are involved. The causes of antenna
failure, its failure pattern and its failure distribution would be different from the failure
distribution of the receiver, the transmitter or the power source. So, different failure modes
generally involve different failure mechanisms.

For antenna failure, it may be a dimensional or material problem, or there may be binding or some
other issue. But if I am talking about a power failure, it may be that a battery cell is not
working properly, or some wiring failure has happened, or the battery is not retaining its
charge. So, various different reasons, different failure mechanisms, will be there.

Because of this, they tend to have different failure distributions. Now, I want to know the
reliability. For that, I need to know the reliability against each failure mode, because for my
system to work, the system should not fail in any failure mode. Let us say there are n failure
modes, that is, n possible components or n possible ways in which the system can fail. If any one
of the failure modes occurs, the system will fail.

So, that means the system should be reliable against each failure mode; it should not fail in
that failure mode. So, 1 minus the failure probability for each failure mode is my Ri(t). The
probability that the system does not fail in the ith failure mode is what we are calling Ri(t).
So, for system reliability, we want the probability that it does not fail in any of them, and
'does not fail in any' means an intersection: it fails neither in failure mode 1, nor in failure
mode 2, and so on up to failure mode n.

So, let us say we get the reliabilities $R_1(t)$, $R_2(t)$ and so on for the failure modes. As
we discussed earlier, we follow the multiplication principle here: whenever all have to work,
there is an intersection involved, this is working and this is working and this is working,
and for such an "and" relationship we multiply the probabilities.

So, once we multiply, we get the system reliability, $R(t) = \prod_{i=1}^{n} R_i(t)$, which is
the probability that all are reliable, or that none of the failures occurs. Now, as we know,
$R_i(t) = e^{-\int_0^t \lambda_i(x)\,dx}$; this is the general model, where $\lambda_i$ can be
time dependent. So, first we will do it for the time-dependent case, and then we will see what
changes if the failure rate is time independent, that is, constant.

The same formula that we develop for the time-dependent case will be applicable for the
time-independent case also. Let us see how this can be solved. We are multiplying the
reliabilities, so, for example, if I have 2 failure modes, this will be
$e^{-\int_0^t \lambda_1(x)\,dx}$ multiplied by $e^{-\int_0^t \lambda_2(x)\,dx}$.

Now, the same thing I can write as
$e^{-\left(\int_0^t \lambda_1(x)\,dx + \int_0^t \lambda_2(x)\,dx\right)}$, and further as
$e^{-\int_0^t [\lambda_1(x) + \lambda_2(x)]\,dx}$. In the same way, if I have n failure modes,
the integrand becomes $\lambda_1(x) + \lambda_2(x) + \lambda_3(x)$ up to $\lambda_n(x)$. So, I
can write the system reliability as $e^{-\int_0^t \sum_{i=1}^{n} \lambda_i(x)\,dx}$, which is
the same thing that is written here.

So, what does this mean? If I call this sum $\lambda(x)$, then
$\lambda(x) = \sum_{i=1}^{n} \lambda_i(x)$. It means that if I sum up the individual failure
rates at a given time t, I get the failure rate of the system at time t. So, when a system has
multiple failure modes, the system failure rate is the sum of the failure rates of the
individual failure modes.

This is generic; it does not matter whether the failure rate is time dependent or time
independent. Whatever the failure rate is at time t, we take it for all failure modes, sum it
up, and we get the system failure rate. So, where reliabilities are multiplied, failure rates
are summed.

So, this becomes an easy formula to evaluate: the system reliability is
$R(t) = e^{-\int_0^t \lambda(x)\,dx}$, where $\lambda(x)$ is nothing but the sum of the
individual failure rates of all the failure modes. Once we know the individual failure rates,
we simply sum them up to get the system failure rate. The inherent assumption here is that the
failure modes are independent, that is, one failure mode does not influence another failure
mode.
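The summation rule can be checked numerically. The following is a minimal sketch, assuming two hypothetical failure modes (one constant rate and one linearly increasing rate, with illustrative values that are not from the lecture) and a simple trapezoidal integration; the function name `system_reliability` is also just an illustration.

```python
import math

def system_reliability(hazards, t, steps=10_000):
    """Approximate R(t) = exp(-integral from 0 to t of sum_i lambda_i(x) dx)."""
    def total_rate(x):
        # System failure rate: sum of the failure-mode rates at time x.
        return sum(h(x) for h in hazards)

    dx = t / steps
    # Trapezoidal rule for the cumulative hazard.
    area = 0.5 * (total_rate(0.0) + total_rate(t))
    for k in range(1, steps):
        area += total_rate(k * dx)
    return math.exp(-area * dx)

# Hypothetical failure modes (illustrative values only).
modes = [lambda x: 0.002, lambda x: 0.0001 * x]

# Reliability from the summed failure rate ...
r_system = system_reliability(modes, 10.0)
# ... equals the product of the per-mode reliabilities.
r_product = system_reliability(modes[:1], 10.0) * system_reliability(modes[1:], 10.0)
```

Both routes should agree, which is just the statement that multiplying reliabilities is the same as summing failure rates inside the exponent, provided the failure modes are independent.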

(Refer Slide Time: 14:21)

Now, let us consider that the failure modes follow the constant failure rate distribution.
Then each failure rate $\lambda_i(t)$ becomes independent of time t; it becomes $\lambda_i$. We
know that for a constant failure rate the failure rate is 1 upon MTTF, so
$\lambda_i = 1/\mathrm{MTTF}_i$, and the reliability for each failure mode will be
$R_i(t) = e^{-\lambda_i t}$.

So, we can get the system reliability: it will be e to the power minus the sum of the
individual failure rates into t, that is, $R(t) = e^{-\lambda t}$, where
$\lambda = \sum_{i=1}^{n} \lambda_i$. From here, if I want to know the MTTF, it will be 1 upon
$\lambda$, where $\lambda$ is the summation of the $\lambda_i$.

So, if I replace $\lambda_i$ with $1/\mathrm{MTTF}_i$, I can say that 1 upon MTTF of the system
is equal to $\sum_{i=1}^{n} 1/\mathrm{MTTF}_i$, or equivalently
$\lambda = \sum_{i=1}^{n} \lambda_i$. That means either we sum the inverses of the MTTFs to get
the inverse of the system MTTF, or we simply sum up the failure rates to get the failure rate
of the system.

But the MTTF property is true only for the CFR model. The relation
$\lambda(t) = \sum_{i} \lambda_i(t)$ is true for all distributions, but the MTTF relation holds
only for CFR, because only in the CFR case does $\lambda_i$ become $1/\mathrm{MTTF}_i$; for
other distributions $\lambda_i$ may not be 1 upon MTTF. Now, suppose all failure modes are
identical, that is, $\lambda_1 = \lambda_2 = \dots = \lambda_n$.

So, if I substitute this, the summation of $\lambda_i$ over n terms gives me $n\lambda_1$. So,
my system failure rate will be n times the component failure rate, and the MTTF will be 1 upon
$n\lambda_1$; since $\lambda_1$ is 1 upon $\mathrm{MTTF}_1$, that becomes $\mathrm{MTTF}_1$ upon
n. That means, if I have, let us say, 10 components or 10 failure modes, all with the same
MTTF, then the system MTTF will be one tenth of the component MTTF, or the system failure rate
will be 10 times the failure rate of each component.
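As a small illustration of the inverse-MTTF rule, here is a sketch that assumes independent failure modes with constant failure rates; the MTTF values and the function name `system_mttf` are hypothetical.

```python
def system_mttf(component_mttfs):
    """System MTTF for independent constant-failure-rate modes."""
    # Each failure rate is 1/MTTF_i; the system failure rate is their sum.
    system_rate = sum(1.0 / m for m in component_mttfs)
    return 1.0 / system_rate

# Hypothetical values in hours.
identical = system_mttf([1000.0] * 10)        # ten identical failure modes
mixed = system_mttf([200.0, 500.0, 1000.0])   # three different failure modes
```

With ten identical modes the result is one tenth of the component MTTF, and with different modes the system MTTF is always smaller than the smallest component MTTF.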

(Refer Slide Time: 17:34)

Let us take one example. An aircraft engine consists of 3 modules, which have constant failure
rates $\lambda_1$, $\lambda_2$ and $\lambda_3$. This also means that we have 3 different
failure modes or components. So, how much will the failure rate of the system be? The system
failure rate will be the sum of the three, $\lambda = \lambda_1 + \lambda_2 + \lambda_3$, which
comes out to be 0.0195 per operating hour. Next, we want to know the reliability function.

So, R(t) will be equal to e to the power minus the summation of the failure rates into t, that
is, $R(t) = e^{-0.0195 t}$. And what is the MTTF? The MTTF is 1 upon this $\lambda$, which comes
out to be around 51.28 operating hours. So, we can easily calculate the MTTF, the failure rate,
the reliability, the unreliability, the PDF, everything for the system, if we have the data for
the individual components or individual failure modes of the system.
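The quantities mentioned for this example can be computed in a few lines. This is a sketch that takes only the summed failure rate of 0.0195 quoted above; the 10-hour mission time and the function names are assumptions made for illustration.

```python
import math

SYSTEM_RATE = 0.0195  # summed failure rate quoted in the lecture (per operating hour)

def reliability(t, rate=SYSTEM_RATE):
    """R(t) = exp(-lambda t) for a constant system failure rate."""
    return math.exp(-rate * t)

def unreliability(t, rate=SYSTEM_RATE):
    """F(t) = 1 - R(t)."""
    return 1.0 - reliability(t, rate)

def pdf(t, rate=SYSTEM_RATE):
    """f(t) = lambda exp(-lambda t)."""
    return rate * math.exp(-rate * t)

mttf = 1.0 / SYSTEM_RATE    # about 51.28 operating hours
r_10 = reliability(10.0)    # 10 operating hours is a hypothetical mission time
```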

(Refer Slide Time: 18:53)

There is another kind of system, which fails on demand. Think of a compressor, or an electric
bulb which you use at home, or other similar devices; motors and generators are similar. They
follow a pattern like this: this is off, let us say, and this is on, and in between a switching
is carried out.

Because of the electrical nature of these devices, at the time of switching there is sometimes
a surge current, and a state change happens. Components which were not carrying any load
suddenly start carrying load. Because of this state change, there is another failure mechanism
involved.

So, there are different failure mechanisms involved; there are actually 3 failure mechanisms we
are going to discuss here. Let us say off is the idle state, on is the operating state, and in
between there is switching. There is a possibility that the device fails during switching, and
this probability we call p: p is the probability that it fails when we switch it.

In the idle or operating state the device stays continuously, but switching is a one-time,
instantaneous phenomenon. So, for this instantaneous event, p is not a function of time; it is
a probability. If, let us say, p is equal to 0.1, what does it mean? If I make 100 switching
actions, I expect it to fail 10 times; if p is 0.01, I expect one failure out of 100
switchings. So, every switching involves a different failure mechanism, because parts are
stressed differently when there is a switching.

Then there are 2 modes. In the idle mode, let us say the failure rate is $\lambda_i$, and in
the on mode, that is the working mode, let us say the failure rate is $\lambda_o$; failure rate
means probability of failure per unit time. Now, in general we can assume that we know, on an
average, how much time it is going to be on and how much time it is going to be off.

So, let us say the on duration is $t_o$ and the off duration is $t_i$; $t_o$ is the operating
time and $t_i$ is the idle time. So, one cycle time is $t_o + t_i$, and in this one cycle I have
a proportion of operating time and a proportion of idle time. Now, I want to know the failure
rate. To calculate it, note that in one cycle there is one switching, so we look at the
probability of failure in one cycle.

So, how much switching is there per unit time? Let us say the idle time is 10 hours and the
working time is 20 hours, so my cycle time is 30 hours: in 30 hours it spends 10 hours in the
idle mode and 20 hours in the operating mode. So, in 30 hours I will have only one switching;
only one time it will be switched on. During switching off there is generally no stress coming,
so it is generally not counted; most of the time we consider only the switching on.

So, what is the failure rate due to switching? For one cycle the probability of failure is p,
so p divided by $t_o + t_i$ gives me the expected number of failures per unit time due to
switching. What is the contribution of the operating hours? It operates for $t_o$ out of
$t_o + t_i$, so the contribution is $t_o$ divided by $t_o + t_i$, multiplied by the failure rate
$\lambda_o$.

Similarly, for the idle time I will have $t_i$ divided by $t_o + t_i$, multiplied by
$\lambda_i$. Together these give me the total or effective failure rate for this kind of
system, where we have switching as well as idle and operating states. The same formula is given
here: $\lambda_{eff} = \dfrac{p + \lambda_o t_o + \lambda_i t_i}{t_o + t_i}$. If I want to know
the reliability, I can easily calculate it as $R(t) = e^{-\lambda_{eff} t}$.
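The effective failure rate formula translates directly into code. The sketch below reuses the 20-hour operating and 10-hour idle durations from the illustration above; the switching probability, the two failure rates and the 100-hour mission time are hypothetical values chosen only to show the calculation. All times must be in the same unit as the failure rates (hours here).

```python
import math

def effective_failure_rate(p, rate_on, t_on, rate_idle, t_idle):
    """Effective rate = (p + lambda_o * t_o + lambda_i * t_i) / (t_o + t_i)."""
    cycle = t_on + t_idle  # one cycle contains exactly one switch-on
    return (p + rate_on * t_on + rate_idle * t_idle) / cycle

# Hypothetical device: rates per hour, durations in hours.
lam_eff = effective_failure_rate(
    p=0.02,            # failure probability per switch-on (assumed)
    rate_on=0.001,     # operating failure rate (assumed)
    t_on=20.0,
    rate_idle=0.0001,  # idle failure rate (assumed)
    t_idle=10.0,
)
r_100 = math.exp(-lam_eff * 100.0)  # reliability over 100 hours
```

Writing the formula with the cycle time in the denominator makes the unit requirement visible: the switching term p / (t_o + t_i) is only a per-hour rate if the cycle time is expressed in hours.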

(Refer Slide Time: 24:56)

Let us take an example for this. Let us say there is an air conditioner compressor. As we know,
a compressor, whether in an AC or in a refrigerator, does not operate continuously; it operates
in cycles. Depending on the temperature requirement it will start, run for some time, and when
the threshold temperature is achieved it will shut down and wait until the temperature
increases again and reaches a certain value.

So, as we see here, the compressor works on an average 20 minutes in each hour; that means out
of 60 minutes it is operating for 20 minutes and idle for 40 minutes. It has experienced a
failure rate of 0.01 failures per operating hour, so the failure rate in the operating state is
0.01. The idle or dormant failure rate is generally very, very low; it is so low that many
times we can ignore it, and if we take it as 0, the idle portion of our formula becomes 0.

Now, one switching happens in each 60-minute cycle, that is, one per hour. So, my effective
failure rate is 40 divided by 60 into 0.0002 for the idle state, plus 20 divided by 60 into
0.01 for the operating state, plus the switching term. The failure probability for each
switching is 0.03, and since the failure rates are per hour the cycle time has to be taken as
1 hour, so 0.03 divided by 1 gives the failure rate due to the switching operation. So, my
effective failure rate comes out to be about 0.0335 per hour. I want to know the probability
that the compressor works reliably for 24 hours.

So, this is a per-hour failure rate, and I want to know the reliability for 24 hours. So, this
will become e to the power minus the effective failure rate multiplied by 24, and my
reliability comes out to be about 0.448. Similarly, for any other device, like an electric
bulb, you can do this calculation if you know that the failure reasons for switching, for
normal operation and for the non-operating state are different.

If you know the failure rates and the probability of switching failure, you will be able to
calculate the effective failure rate. This switching failure probability is also called the
on-demand failure probability. It is very much applicable to compressors, motors, or even our
engines, even bikes etcetera, whenever we start them.

Sometimes they do not start; there is a starting failure. Why? Because for starting they need
some excessive current or some excessive force, and because of that there is a chance of
failure. Sometimes some other circuitry is also involved for starting, which is different from
the usual circuitry. So, the starting failure probability may be different from the usual
working failure probability. The same is applicable to motors, pumps etcetera.

(Refer Slide Time: 29:06)

Now let us see repetitive loading. What does it mean? It means there is some operation which
is carried out almost instantaneously. Let us say there is a punch machine: it will just punch
and go, punch and go, and each time one load cycle is applied. At some moment it may fail to
punch because of the load cycle. So, a system may experience repetitive loading.

Here the non-operating time does not matter, and it is not like switching on and working for a
longer period. It is just a one-time operation, a kind of switching: it just goes, does the
work and goes off. The cycle may be longer also, but generally it will not be long. Let us say
the failure probability in one load cycle is p. If I am interested in reliability, then the
probability that it does not fail in one cycle is the reliability, and that is equal to 1
minus p.

Now, let us say n such cycles are applied in a given time t. For the system to work and sustain
n cycles, it should not fail in any of the n cycles. So, the reliability for n cycles is the
product of the single-cycle reliabilities, $R_n = R^n$, and since R is 1 minus p, this is
$(1 - p)^n$.

Now, I can rewrite this using the exponential and the log. For any positive value x, the
exponential of the log of x is equal to x, and similarly ln of e to the power x is equal to x.
Here I use $x = e^{\ln x}$ with $x = (1 - p)^n$, so $R_n = e^{\ln (1 - p)^n}$, and the n can
come out: $R_n = e^{n \ln(1 - p)}$. Now, we know that in such cases the chance of failure in one
cycle is very, very small.

Because p is very small, if we expand $\ln(1 - p)$ we can take it as approximately equal to
minus p. If I replace $\ln(1 - p)$ with minus p, $R_n$ becomes $e^{-np}$. This looks like the
same formula as $e^{-\lambda t}$, but there is no t here. How can we get t? We know that n
cycles are applied in time t, so the time for each cycle is t divided by n.

So, $\Delta t$, the average time of each cycle, is t divided by n, or we can say the number of
cycles is $n = t / \Delta t$. So, the reliability will be e to the power minus t divided by
$\Delta t$ multiplied by p. Now, I take t out and write the rest, p upon $\Delta t$, as
$\lambda$.

So, $\lambda = p / \Delta t$ is the probability of failure per unit time, and the reliability
becomes $R(t) = e^{-\lambda t}$. So, as we see, the repetitive loading case can also be
converted into an exponential distribution, where the failure rate parameter is nothing but the
probability of failure per cycle divided by the cycle time.
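To see how good the approximation is, one can compare the exact product form with the exponential form. This is a minimal sketch with hypothetical numbers (failure probability 0.001 per cycle, one cycle every 2 seconds, 600 seconds of operation); the function names are illustrative.

```python
import math

def reliability_exact(p, n):
    """Exact reliability over n independent load cycles: (1 - p)^n."""
    return (1.0 - p) ** n

def reliability_exponential(p, cycle_time, t):
    """Exponential approximation with lambda = p / cycle_time."""
    lam = p / cycle_time
    return math.exp(-lam * t)

p = 0.001          # failure probability per cycle (assumed)
cycle_time = 2.0   # seconds per cycle (assumed)
t = 600.0          # operating time in seconds (assumed)
n = int(t / cycle_time)

exact = reliability_exact(p, n)
approx = reliability_exponential(p, cycle_time, t)
```

Because ln(1 - p) is slightly more negative than -p, the exponential form gives a reliability that is very slightly higher than the exact value; the gap shrinks as p gets smaller.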

(Refer Slide Time: 33:23)

Let us take one example. There is a cartoning machine, which does the carton packing in a food
processing facility. Let us say we have chips packets or cake packets or apple packets, like
that; here it is given that coffee cans are packed, 12 cans of coffee in one carton. The
cartoning machine can fail with a probability of 0.005 per application; that means in making
one carton, the failure probability is 0.005.

One carton has 12 cans and the production rate is 30 cans per minute. So, how many cartons are
made in 1 minute? In 1 minute 30 cans are produced and 12 go into each carton, so the number of
cartons is 30 divided by 12 per minute. That is my rate of carton production. So here my p, the
probability of failure for each carton, is 0.005, my frequency is 30 divided by 12 cartons per
minute, and I am talking about a 1 hour production run.

So, T becomes 60 minutes. The time per carton, $\Delta t$, is the inverse of the frequency,
that is 12 by 30 or 0.4 minute. So, my $\lambda$ will be equal to p divided by $\Delta t$, that
is 0.005 divided by 0.4, which is 0.005 into 30 by 12, and that becomes 0.0125 per minute. I
want to know the reliability for 60 minutes, so it will be e to the power minus 0.0125 into 60,
that is $e^{-0.75}$, which comes out to be about 0.4724.

So, as we have seen, with the approximation to the exponential distribution we are able to
solve problems for these kinds of systems, which do not directly follow the exponential
distribution but have an instantaneous failure probability; this we are able to convert into an
exponential distribution, and it helps us to calculate the values very easily. So, we will stop
here and continue our discussion in the next lecture. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 08
Constant Failure Rate Model- III

Hello everyone, we will continue our discussion from our previous lecture that is on constant
failure rate model or exponential distribution.

(Refer Slide Time: 00:38)

We have already studied the exponential distribution: how to get the failure rate, the PDF,
the reliability, the unreliability etcetera. Then we discussed how, in some particular
applications, the exponential distribution can be used to model the situation. Now, we are
going to discuss another situation, in which we again have one component which fails with a
constant failure rate, that is, follows the exponential distribution.

$$p_t(n) = \frac{e^{-\lambda t} (\lambda t)^n}{n!}$$

Now, if this component fails, it gets immediately repaired or replaced. Since the repair or
replacement is immediate, we do not lose much time, so our function is not lost and we are
able to use the system continuously. So, here we will observe a certain number of failures
in a given time, and that number of failures depends on the distribution.

We write this number of failures as n. It is a discrete random variable taking the values
0, 1, 2, 3, like that, because in a given time t there may be 0 failures, 1 failure, 2
failures, 3 failures, and their probabilities would be different. The probability that the
number of failures equals a particular value n, where n can be any such number, is given by
the Poisson distribution, which we have already discussed in the discrete distributions.

As we discussed earlier, when we have a constant failure rate or constant arrival rate, that
is, the arrival rate is not a function of time, we denote it as $\lambda$. Then the expected
number of failures in time t will be $\lambda t$, because in unit time we expect $\lambda$
failures. This expected number of failures becomes our $\mu$; $\mu$ is the parameter of the
Poisson distribution, and it is both the mean and the variance of the Poisson distribution.

So, this gives us the Poisson distribution. As we discussed earlier, its probability is e to
the power minus $\mu$ into $\mu$ to the power n divided by factorial n, and with
$\mu = \lambda t$ we get the same thing as written here. Once we use this formula, we can get
the probability of 0 failures, 1 failure, 2 failures, 3 failures. So, this makes it very easy
to solve spare parts related problems.
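The Poisson probabilities can be evaluated with the standard library alone. A minimal sketch, assuming a hypothetical failure rate of 0.2 per year over 5 years (so that the expected number of failures is 1); the function name is illustrative.

```python
import math

def failure_count_probability(n, rate, t):
    """P(exactly n failures in time t) for a constant failure rate."""
    mu = rate * t  # expected number of failures in time t
    return math.exp(-mu) * mu ** n / math.factorial(n)

# Hypothetical values: 0.2 failures per year, 5 years.
probs = [failure_count_probability(n, rate=0.2, t=5.0) for n in range(4)]
```

The list `probs` holds the probabilities of 0, 1, 2 and 3 failures; summed over all n they add up to 1.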

$$Y_n = \sum_{i=1}^{n} T_i$$

$$P(Y_n \le t) = F_{Y_n}(t) = 1 - e^{-\lambda t} \sum_{i=0}^{n-1} \frac{(\lambda t)^i}{i!}$$

So, that was in the number-of-failures domain; we can also get the probability in the time
domain. That means, we ask how much time it takes to observe n failures, so here time is the
random variable. We want to know the time to the n-th failure, which we represent as $Y_n$.
$Y_n$ will be the summation of the $T_i$, where $T_i$ is the time between failures: $T_1$ is
the time from 0 to the first failure, $T_2$ is the time from the first to the second failure,
and so on.

The summation of these gives us the time from 0 up to that failure. So, $Y_1$ is essentially
equal to $T_1$, the time to first failure; $Y_2$, the time to second failure, is the sum of
$T_1$ and $T_2$; $Y_3$, the time to third failure, is $T_1$ plus $T_2$ plus $T_3$. So, $Y_n$
is essentially the summation of $T_i$ for i equal to 1 to n.

Now, $Y_n$ is the time to the n-th failure. What is the probability that the time to the n-th
failure is less than t? It means our n-th failure happens before time t. That is the
unreliability when I have n components in my hand: all n components fail before time t.

I do not have any further spares or components, so if all the spares and components fail
before time t, I will not be able to use the system; it will be in the failed state. This is
a cumulative distribution function, which we can represent as $F_{Y_n}(t)$.

Here $Y_n$ is the random variable. If you see, this expression is very similar to the Poisson
formula; the only difference is that it is 1 minus the sum. Why? Because the system works as
long as the number of failures is less than n: there may be 0 failures, or 1 failure, 2
failures, and so on up to n minus 1 failures. For example, if I have 4 components, the system
survives with 0, 1, 2 or 3 failures.

So, $F_{Y_n}(t)$ is given as 1 minus e to the power minus $\lambda t$ into the summation, for
i equal to 0 to n minus 1, of $\lambda t$ raised to the power i divided by factorial i. If we
have S spares available, we have one device which is already working plus S spares, so the
total number of devices is S plus 1. The system reliability is then the summation of the
probabilities from 0 failures up to S failures, because S plus 1 failures means the total
system is lost and we cannot use it. So, this turns out to be the system reliability.

$$R_S(t) = \sum_{n=0}^{S} p_t(n)$$
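The link between the two views (number of failures and time to the n-th failure) can be checked numerically: with S spares the system fails exactly when the (S+1)-th failure occurs before t. The sketch below assumes a hypothetical failure rate of 0.1 per year, 8 years and 2 spares; the function names are illustrative.

```python
import math

def poisson_pmf(n, mu):
    """P(exactly n failures) when mu failures are expected."""
    return math.exp(-mu) * mu ** n / math.factorial(n)

def reliability_with_spares(spares, rate, t):
    """R_S(t): probability of at most S failures in time t."""
    mu = rate * t
    return sum(poisson_pmf(n, mu) for n in range(spares + 1))

def time_to_nth_failure_cdf(n, rate, t):
    """F_{Y_n}(t): probability that the n-th failure occurs before t."""
    mu = rate * t
    return 1.0 - math.exp(-mu) * sum(mu ** i / math.factorial(i) for i in range(n))

# Hypothetical values: rate 0.1 per year, 8 years, 2 spares.
r_s = reliability_with_spares(2, 0.1, 8.0)
f_y3 = time_to_nth_failure_cdf(3, 0.1, 8.0)  # third failure exhausts 1 unit + 2 spares
```

Here `r_s + f_y3` should equal 1, since the system is either still working or has seen its third failure.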

(Refer Slide Time: 07:37)

Let us take an example. A specially designed welding machine has a non-repairable motor
with a constant failure rate of 0.05 failures per year. The company has purchased 2 spare
motors, so we have S equal to 2 here. That means I can continue working even if there are
two failures; if a third failure occurs, I will not be able to use the system. The design
life of the welding machine is 10 years; I want to use it up to 10 years.

So, what is the probability that two spares will be adequate? That means my system continues
to work up to 10 years; my spares may or may not be consumed, but my system will not be in
the failed state. First, what is the expected number of failures, $\mu = \lambda t$?

$\lambda$ is 0.05 per year and t is 10 years; we have to remember that the unit is per year.
So, 10 into 0.05 is 0.5, and this is our $\mu$. Now we can get the reliability: with 0
failures, 1 failure or 2 failures the system can sustain and it will be reliable; if the
third failure occurs, it will be unreliable. So, $R_2(10)$ is the summation, for n equal to 0
to 2, of e to the power minus 0.5 into 0.5 raised to the power n divided by factorial n. The
e to the power minus 0.5 we can take outside, and then for n equal to 0 the term is 0.5
raised to the power 0 divided by factorial 0.

That is 1 divided by 1. For n equal to 1 it is 0.5 divided by factorial 1, which is 1, so
plus 0.5. For n equal to 2 it is 0.5 squared, that is 0.25, divided by factorial 2, which is
2, so plus 0.125. When we solve this, $e^{-0.5}(1 + 0.5 + 0.125)$, we get 0.9856. So, as we
see, this has a very interesting application: when we have spares, we can calculate the
reliability of the system for a given time.

Or we can look at it in reverse and ask how many spares would be adequate. Here I get 0.9856,
so I am able to achieve almost 98.5 percent reliability.

If my target is, let us say, 99 percent, I can add 1 more spare, so S becomes 3. In that case
I get additional reliability. How much? The term added to the summation is 0.5 raised to the
power 3, that is 0.125, divided by factorial 3, which is 6, again multiplied by e to the
power minus 0.5, which takes the reliability to about 0.998, above the 99 percent target.

So, we can see whether this is sufficient or not, or how many spares are required. This helps
us to get the reliability estimate when we have spares, but the condition is that the
components should follow the constant failure rate. If it is not a constant failure rate,
then it will not be a Poisson process and we will not be able to use this formula.
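The spares question can be turned into a small search: keep adding spares until the target reliability is met. This is a sketch using the welding machine data from the example (0.05 failures per year, 10 years); the function names and the `max_spares` cut-off are illustrative assumptions.

```python
import math

def reliability_with_spares(spares, rate, t):
    """Probability of at most `spares` failures in time t (Poisson process)."""
    mu = rate * t
    return math.exp(-mu) * sum(mu ** n / math.factorial(n) for n in range(spares + 1))

def spares_needed(target, rate, t, max_spares=100):
    """Smallest number of spares whose reliability meets the target."""
    for s in range(max_spares + 1):
        if reliability_with_spares(s, rate, t) >= target:
            return s
    raise ValueError("target not reachable within max_spares")

r2 = reliability_with_spares(2, 0.05, 10.0)   # two spares, as purchased
r3 = reliability_with_spares(3, 0.05, 10.0)   # one more spare
needed = spares_needed(0.99, 0.05, 10.0)      # spares for a 99 percent target
```

The search is only valid under the constant failure rate assumption stated above, because the Poisson count of failures depends on it.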

(Refer Slide Time: 11:06)

Sometimes we are interested in reliability bounds. Most of the time we are interested in the
lower bound on reliability; we want to say that the reliability is greater than something.
So, $R_L(t)$ means that at time t my reliability is at least this much; anything better than
this is good.

Similarly, for things which are of a negative, undesired nature, we are interested in the
upper limit, like our failure probability is not more than a certain value. So, we always
want a conservative estimate rather than an optimistic one. $R_L$ gives us a conservative
estimate of reliability; we expect the actual reliability to be better than this. But
sometimes, if required, we can also calculate the upper bound on reliability, $R_U(t)$.

Now, the failure rate may be a function of time or it may be constant; if it is constant, we
have the exponential distribution. Let us say the failure rate is a function of time, but we
know that over the time of interest its value lies somewhere between $\lambda_L$ and
$\lambda_U$; that is, $\lambda(t)$ falls between a lower limit $\lambda_L$ and an upper limit
$\lambda_U$, each of which is a constant value.

So, any time we evaluate a bound, it gives us two values, a lower limit and an upper limit.
If we know the lower and upper limits on $\lambda$, these are constant values, so we can use
the reliability equation of the exponential distribution, and that will give us the bounds on
the reliability. Since we are taking e to the power minus the integral of $\lambda$, when
$\lambda$ is more the reliability will be less.

Since this is an inverse relationship, for $\lambda_L$ we get $R_U$: for the lower value of
$\lambda$ we get the upper value of reliability. Similarly, for the upper value of $\lambda$
we get the lower limit on reliability.

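$$0 \le \lambda_L \le \lambda(t) \le \lambda_U$$

$$\int_0^t \lambda_L\,dx \le \int_0^t \lambda(x)\,dx \le \int_0^t \lambda_U\,dx$$

$$e^{-\int_0^t \lambda_L\,dx} \ge e^{-\int_0^t \lambda(x)\,dx} \ge e^{-\int_0^t \lambda_U\,dx}$$

$$e^{-\lambda_L t} \ge R(t) \ge e^{-\lambda_U t}$$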

So, as we see, by using the exponential distribution we can simply take $e^{-\lambda_L t}$,
which gives us the upper bound on the reliability, and $e^{-\lambda_U t}$, which gives us the
lower bound on the reliability. This helps us to bound the reliability even if the failure
rate is not constant, as long as we know that it is bounded between certain values.
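A quick numerical check of the bounds, as a sketch: assume a hypothetical linearly increasing failure rate, lambda(x) = 0.01 + 0.001 x, on the interval from 0 to 10, so that it stays between 0.01 and 0.02 there. None of these numbers come from the lecture.

```python
import math

def reliability_bounds(rate_low, rate_high, t):
    """Return (lower, upper) bounds on R(t) when rate_low <= lambda(x) <= rate_high on [0, t]."""
    lower = math.exp(-rate_high * t)  # largest failure rate gives the lowest reliability
    upper = math.exp(-rate_low * t)   # smallest failure rate gives the highest reliability
    return lower, upper

t = 10.0
lower, upper = reliability_bounds(0.01, 0.02, t)

# Actual reliability for lambda(x) = 0.01 + 0.001 x:
# the cumulative hazard is 0.01 t + 0.001 t^2 / 2.
actual = math.exp(-(0.01 * t + 0.001 * t ** 2 / 2.0))
```

The actual reliability falls between the two bounds, and the lower bound is the conservative figure one would quote.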

(Refer Slide Time: 14:18)

Now, let us see constant failure rate redundant components. Redundant components means that we have, say, two components, and even if one of them fails my system will continue to work. Let us say two pipelines are coming to our home from the same source. Now, if one pipeline breaks, then the second will continue to work. So, in that case, our water supply continues; it does not fail.

So, let us say the failure probability of each pipeline is the same, or take any other similar system where parallel redundancy is there. Then let us say that each component has a failure rate lambda, so its reliability for time t is going to be e to the power minus lambda t, and let us say we have two components.

So, R1 t and R2 t we can get. Now, for system reliability, we know that the parallel case is the union case. Rt we can write in two ways. One is 1 minus the product of 1 minus Ri t, that is, 1 minus the product of 1 minus R1 t and 1 minus R2 t; this we discussed earlier when we discussed probability.

Or we can also apply the union formula, that is, Rt is equal to R1 t union R2 t. Generally we do not write this for probabilities; we have to apply the union on the events, so here we would have to call the events T1 and T2, but for simplification and understanding purposes I am writing it like this.

So, that becomes R1 t plus R2 t minus R1 t into R2 t. So, as you see here, the same formula comes out for Rt. Now, if we put in R1 t and R2 t, R1 t is e to the power minus lambda t and R2 t is also e to the power minus lambda t, so the sum becomes 2 into e to the power minus lambda t, and the product is e to the power minus lambda t into e to the power minus lambda t. That product is e to the power minus lambda t, squared, or we can say e to the power minus 2 lambda t. So, Rt is 2 e to the power minus lambda t minus e to the power minus 2 lambda t.

Now, once we have this reliability, from the reliability we can get the system hazard rate. The hazard rate is given as ft upon Rt, where ft is minus d Rt over dt. So, here if I calculate ft, ft will be equal to minus d Rt over dt.

So, Rt is given to me here, and I have to differentiate it. When I differentiate the first term, I get minus 2 lambda e to the power minus lambda t. When I differentiate the second term, minus 2 lambda comes out, so minus into minus gives plus 2 lambda e to the power minus 2 lambda t.

So, when I take the negative of this, it becomes 2 lambda e to the power minus lambda t minus 2 lambda e to the power minus 2 lambda t. This becomes our ft, the same thing we have written here. And what is Rt? Rt is 2 e to the power minus lambda t minus e to the power minus 2 lambda t.

Now, in ft we can take 2 lambda e to the power minus lambda t as common. The first term is the common factor itself, so it becomes 1. For the second term, 2 lambda is taken out and e to the power minus lambda t is taken out, so if I divide e to the power minus 2 lambda t by e to the power minus lambda t, the remaining will be e to the power minus lambda t. So, ft is 2 lambda e to the power minus lambda t into 1 minus e to the power minus lambda t.

Similarly, for Rt, if I take 2 e to the power minus lambda t as common, the first term becomes 1. In the second term 2 is not there, so it becomes 1 by 2 into e to the power minus lambda t. So, Rt is 2 e to the power minus lambda t into 1 minus 0.5 e to the power minus lambda t.

Now, 2 e to the power minus lambda t cancels from the numerator and the denominator, and only lambda remains outside. So, we get lambda into 1 minus e to the power minus lambda t, divided by 1 minus 0.5 e to the power minus lambda t. This gives us the failure rate of the system.
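The steps above can be written compactly. The following is only a restatement of the spoken derivation, assuming two identical, independent components, each with constant failure rate lambda:

```latex
\begin{aligned}
R(t) &= 2e^{-\lambda t} - e^{-2\lambda t} \\
f(t) &= -\frac{dR(t)}{dt} = 2\lambda e^{-\lambda t} - 2\lambda e^{-2\lambda t} \\
\lambda(t) &= \frac{f(t)}{R(t)}
  = \frac{2\lambda e^{-\lambda t}\left(1 - e^{-\lambda t}\right)}
         {2 e^{-\lambda t}\left(1 - 0.5\, e^{-\lambda t}\right)}
  = \frac{\lambda\left(1 - e^{-\lambda t}\right)}{1 - 0.5\, e^{-\lambda t}}
\end{aligned}
```

At t equal to 0 the numerator vanishes, so the system hazard rate starts from 0, and as t grows both exponential terms vanish, so it approaches lambda.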

Now, this failure rate is not constant. If the system were a series system, which means both components are required for the system to work, then Rt would be equal to R1 t multiplied by R2 t, which comes out to be e to the power minus 2 lambda t. In that case the system again has a constant failure rate, because this is again an exponential distribution with parameter 2 lambda.

So, the failure rate, which was lambda earlier, has now become 2 lambda. But in the case of parallel systems, we are not able to reduce it to a constant value independent of time; as we see, the time-dependent terms do not cancel out. So, here lambda t becomes a function of time.

So, for a series system, when all components follow a constant failure rate, the system also follows a constant failure rate. But for redundant components or parallel systems, the system hazard rate depends on time, even when the components follow a constant failure rate. And if you see, this hazard rate is increasing with time, and it asymptotically approaches the single component hazard rate as time progresses. This we can see with the example which I have already drawn here in an Excel sheet.

So, to see this, I have to open this. If you see here, I have taken the value of lambda here. So, when lambda is 1, that is, the failure rate is 1, and I plot this for time t equal to 0 to 10, it looks like this. If you see, it is starting from 0 at t equal to 0, because it is time dependent, and then it keeps increasing, and it goes up towards lambda equal to 1. And if you see, it has almost reached lambda at around t equal to 4, that is, when time t is about 4 times the MTTF, 4 into 1 upon lambda.

Now, what happens if I increase the failure rate lambda? Let us say I make it 2. When the failure rate is 2, if you see, the rise is sharper; in less time, at around t equal to 2, the hazard rate has almost become equal to 2.
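The spreadsheet experiment above can be reproduced with a short Python sketch. The function name `parallel_hazard` is an assumption made for this illustration; the formula is the system hazard rate derived earlier for two identical parallel components.

```python
import math

def parallel_hazard(lam, t):
    """Hazard rate of two identical parallel components, each with constant rate lam."""
    x = math.exp(-lam * t)
    return lam * (1.0 - x) / (1.0 - 0.5 * x)

for lam in (1.0, 2.0):
    t_four_mttf = 4.0 / lam  # four times the component MTTF
    ratio = parallel_hazard(lam, t_four_mttf) / lam
    print(f"lambda={lam}: h(0)={parallel_hazard(lam, 0.0):.3f}, "
          f"h({t_four_mttf:g})/lambda={ratio:.4f}")
```

For either value of lambda, the hazard rate starts at 0 and is within about 1 percent of lambda by t equal to 4 upon lambda, which is why the curve looks flat beyond that point.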

Now, let us say I reduce this and make it 0.5. Then it will again rise to 0.5, but it will take more time to rise to 0.5; that would be around 2 into 4, around 8. If you see, it gets there at around time t equal to 8 hours, if the failure rate is per hour. The unit generally used for failure rate is per hour, that is, per operating hour. So, if it is not mentioned, assume that it is per hour. So, if you see, in about 8 hours it will be reaching the failure rate.

So, as we see, the larger the failure rate, the more quickly the system hazard rate approaches the hazard rate of a single component, even though we have multiple components here. So, this helps us to understand how this hazard rate is changing with time, and as we see, the hazard rate is increasing with time. So, this is an increasing hazard rate here. Now, for this case, if we want to calculate the MTTF, how will we calculate it? As we know, MTTF is the integration from 0 to infinity of Rt dt.

So, Rt is this, and I simply have to integrate it from 0 to infinity. We know that the integration of e to the power ax is 1 upon a into e to the power ax. So, if we apply the same thing, the first term gives 2 into minus 1 upon lambda into e to the power minus lambda t, and the second term gives 1 upon minus 2 lambda, and minus into minus becomes plus, so it is plus 1 upon 2 lambda into e to the power minus 2 lambda t.

Now, apply the limits 0 to infinity. When we put infinity, both terms become 0, because exponential of minus infinity is equal to 0. When we put 0, exponential of 0 is equal to 1, so after subtracting the lower limit the first term becomes plus 2 upon lambda and the second term becomes minus 1 upon 2 lambda. So, this comes out to be 2 upon lambda minus 1 upon 2 lambda.

Now, if we take 2 lambda as the common denominator, this becomes 4 minus 1, that is, 3 by 2 lambda, which is 1.5 divided by lambda, and we know 1 upon lambda is the MTTF of an individual component. So, the MTTF of the system here turns out to be 1.5 times the MTTF of an individual component. So, if we have two components working in parallel, the MTTF is not doubled; it is only 1.5 times. And again, all this is applicable when we have the exponential distribution.
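As a quick check of the 1.5 upon lambda result, the integral of Rt can also be evaluated numerically. This is a minimal sketch using a plain trapezoidal rule; the function names, the step count, the truncation of the upper limit at 40 component MTTFs, and the value lambda equal to 0.5 are all assumptions made for the illustration.

```python
import math

def parallel_reliability(lam, t):
    """R(t) for two identical parallel components with constant failure rate lam."""
    return 2.0 * math.exp(-lam * t) - math.exp(-2.0 * lam * t)

def mttf_numeric(lam, steps=200_000, horizon_in_mttf=40.0):
    """Trapezoidal integration of R(t) from 0 to horizon_in_mttf / lam."""
    upper = horizon_in_mttf / lam
    h = upper / steps
    total = 0.5 * (parallel_reliability(lam, 0.0) + parallel_reliability(lam, upper))
    for i in range(1, steps):
        total += parallel_reliability(lam, i * h)
    return total * h

lam = 0.5  # hypothetical failure rate per hour
print(mttf_numeric(lam), 1.5 / lam)
```

The two printed values should agree closely, since the tail of Rt beyond 40 component MTTFs is negligible.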

(Refer Slide Time: 26:30)

Now, let us take one example. A microwave transmitter is there, which has a failure rate of 0.00034 per operating hour. This we already considered, but earlier we considered only one equipment. Now, let us say we add one more transmitter. So, there is a two-transmitter system where, if one fails, the other one will continue to do the function. So, in this case, what is the reliability of the parallel system? It is 2 e to the power minus lambda t minus e to the power minus 2 lambda t.

Then we want to calculate the reliability for a 30 days period. Earlier also we calculated this: 30 into 24 hours comes out to be 720 hours. Because the failure rate is given per operating hour, we have to convert the time into hours, and we get 720 hours. So, 2 e to the power minus lambda into 720, minus e to the power minus 2 lambda into 720, gives me the reliability 0.95285. Now, if you see, earlier the reliability for a single unit was 0.78286, around 78 percent; now it has become about 95 percent.

So, it is a very high increase in the reliability. And what is going to be the MTTF for the new system? That is 1.5 times the single system MTTF, or we can say 1.5 divided by lambda. So, this is 1.5 times the MTTF of the single unit. Earlier it was around 2941 hours; now it has become around 4411 hours. And the hazard rate function lambda t is given here; the same hazard rate function I have taken from earlier, and just by putting the values in it we are able to get the same curve.
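The numbers in this example can be reproduced with a few lines of Python. The failure rate and the 30 day period come from the example; the variable names are chosen here only for illustration.

```python
import math

lam = 0.00034          # failure rate per operating hour (from the example)
t = 30 * 24            # 30 days expressed in hours, that is 720

r_single = math.exp(-lam * t)
r_parallel = 2.0 * r_single - r_single ** 2   # same as 2e^(-lam t) - e^(-2 lam t)

mttf_single = 1.0 / lam
mttf_parallel = 1.5 / lam

print(f"R single   = {r_single:.5f}")
print(f"R parallel = {r_parallel:.5f}")
print(f"MTTF single = {mttf_single:.0f} h, parallel = {mttf_parallel:.0f} h")
```

The reliabilities come out close to the 0.78286 and 0.95285 quoted above, and the MTTF values are about 2941 and 4412 hours (the lecture truncates the latter to 4411).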

And we see it is plotted here from 0 to around 5000 hours. If you see, this hazard rate is increasing up to here, and further also it may increase a little bit, but it is asymptotically becoming parallel to the x axis.

(Refer Slide Time: 28:52)

So, I will stop here. And we will continue our discussion in next lecture.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 09
Two Parameter Exponential Distribution

Hello everyone, welcome to lecture number 9. So far, we have covered topics such as
reliability definitions and basic probability concepts, which will be relevant in future lectures.
We also explored the constant failure rate or exponential distribution. Today, we will
introduce a variation of the exponential distribution called the two-parameter exponential
distribution, which is commonly used in modelling the time until a certain event occurs. This distribution adds a failure-free period before the constant failure rate begins and has two parameters - the scale parameter and the location parameter. We will dive deeper into this topic in this lecture.

(Refer Slide Time: 01:05)

So, the two-parameter exponential distribution adds an additional parameter, the time t0. For the single parameter exponential distribution, e to the power minus lambda t was the reliability equation.

$$f(t) = \lambda(t) R(t) = \lambda e^{-\lambda (t - t_0)}, \quad \text{where } 0 \le t_0 \le t < \infty$$

So, here lambda was the parameter. Whenever we talk about a distribution there are two things. One is the random variable of which the distribution is a function. So, when we say reliability, it is evaluated against the value of the random variable. Here, time to failure is the random variable, so reliability is a function of time to failure. Similarly, hazard rate is also a function of time to failure, and the failure probability or unreliability is also a function of time to failure.

$$R(t) = e^{-\lambda (t - t_0)}, \quad \text{where } t \ge t_0$$

Now, this time to failure will appear in the distribution in some form; that is t here. But what are these parameters, lambda etcetera? From system to system there will be a difference: one system will have one reliability and another system will have a different reliability. Though they are both following the exponential distribution, they are following different exponential distributions, and that difference is determined by the value of the parameter.

So, one system having one value of the parameter lambda and another system having another value of lambda will both follow the exponential distribution, but they will have different exponential distributions because the value of the parameter is different.

$$\text{MTTF} = t_0 + \frac{1}{\lambda}$$

So, once we determine the parameter for a particular component or system, that defines the distribution for that system. So, the parameter gives the different instances of the same distribution. Earlier we had one parameter, that is lambda, which is also called the failure rate or constant failure rate. We can also write the reliability as e to the power minus t upon MTTF. Though MTTF is the mean of the distribution, it can also serve as the parameter, because it is determined by the value of lambda. Now, generally, reliability at time t equal to 0 is 1, and then it keeps on decreasing by whatever amount is applicable.

So, here we are discussing a specific case where we say that up to time t0 we have no chance of failure. That means this is the minimum life, the life for which the component or the system is certain to survive; it is not possible that the system will fail during this period. This period from 0 to t0 gives us the additional parameter, which we can call the guaranteed lifetime, or we can say the minimum life. So, this is the time for which there is no possibility of failure.

So, what will happen in this case? The reliability stays at 1 up to t0, and from there it again keeps on decreasing. It is the same curve, only shifted by the time t0. So, this shifting is happening because of the guaranteed life.

So, how do we take this additional parameter into account in our equations? The reliability has the same form if we replace t by t minus t0. So, if t dash is the new variable, t dash equal to t minus t0, then R becomes e to the power minus lambda into t minus t0. Similarly, if you differentiate this, ft will be minus d Rt over dt, or we can say lambda t into Rt. And lambda t here is again nothing but the constant lambda.

But note that this lambda is not applicable for 0 to t0: because there is no possibility of failure during this period, the failure rate there is 0. So, here the reliability function is defined only when our time t is greater than or equal to t0 and less than infinity.

So, for t less than t0 it is kind of undefined, or we can say the reliability is equal to 1. So, small ft comes out to be lambda into e to the power minus lambda into t minus t0. Similarly, the reliability curve starts decreasing from t0 only; below time t0 we consider that there is no distribution, it is a fixed value. Now the MTTF: when we integrate this reliability from 0 to infinity, we know that the same curve is there, only shifted, and from 0 up to t0 we have an additional area which is added.

Now, up to t0 the reliability is 1, so the area under the curve there is t0. So, t0 is added to the earlier MTTF, which was 1 upon lambda. Similarly, we can get the design life function, because Rt is this. If we solve this for time t, then minus ln of Rt will be equal to lambda into t minus t0, just taking minus ln on both sides. Then, if I divide by lambda, t will be equal to t0 minus 1 upon lambda into ln of Rt.
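The area argument and the inversion above can be written out explicitly. This is only a restatement of the spoken steps, taking the reliability as 1 for t less than t0 as discussed:

```latex
\begin{aligned}
\text{MTTF} &= \int_0^{\infty} R(t)\,dt
  = \int_0^{t_0} 1\,dt + \int_{t_0}^{\infty} e^{-\lambda (t - t_0)}\,dt
  = t_0 + \frac{1}{\lambda} \\
R(t) = e^{-\lambda (t - t_0)}
  &\;\Longrightarrow\; -\ln R(t) = \lambda (t - t_0)
  \;\Longrightarrow\; t = t_0 - \frac{1}{\lambda}\ln R(t)
\end{aligned}
```

Since ln of Rt is negative for any reliability below 1, the second term always adds to t0.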

$$t_{med} = t_0 - \frac{1}{\lambda}\ln(0.5) = t_0 + \frac{0.69315}{\lambda}$$

So, here, say I am interested to know the time t equal to t median. We know that at t equal to t median the reliability becomes 0.5. So, t median will be equal to t0 minus ln of 0.5 divided by lambda, and ln of 0.5 turns out to be around minus 0.7, that is, minus 0.69315. Similarly, I may be interested to know my design life, which means the life for a certain reliability.

$$t_d = t_0 - \frac{1}{\lambda}\ln(R)$$

$$t_{0.9} = t_0 + \frac{0.10536}{\lambda}$$

So, say I am interested to know the design life for reliability equal to 0.9. This can be obtained by placing 0.9 as the reliability. So, the design life t 0.9 will be t0 minus ln of 0.9 divided by lambda. ln of 0.9 turns out to be minus 0.10536, so that will become plus. So, similarly, whatever my reliability target is, I can know what design life we can offer.

(Refer Slide Time: 09:12)


Let us take one example. Let us say for the system the failure rate is given as 0.001, the unit may be per hour, and t0 is, let us say, 200 hours. So, we are interested to know the reliability function. The reliability function is exponential of minus lambda into t minus t0, and this function is defined only when t is greater than or equal to 200.

Then we apply the same formula for MTTF, which as we know is t0 plus 1 upon lambda. So, 200 plus 1 upon 0.001, that is, 200 plus 1000, gives 1200 hours. When we want to know t median, as we discussed earlier, t median is t 0.5, the design life for reliability 0.5.

So, this is t0 minus ln of 0.5 divided by lambda, which is 0.001, and ln of 0.5 turns out to be minus 0.69315. This gives me a t median of 893.15 hours. If I am interested to know the design life for 95 percent reliability, or R equal to 0.95, I have to use the same formula: 200 minus ln of 0.95 divided by 0.001.

If we solve this, we will get 251.3. We can calculate this using a scientific calculator: we have to calculate ln of 0.95, which is about minus 0.0513. If we divide this by 0.001 and multiply by minus 1, we get about 51.3, and once we add this to 200, that becomes around 251.3. There may be small variations in the last digits depending on the calculator and on rounding in hand calculation.

So, it comes to around 251 hours. The standard deviation sigma we can also calculate, which is 1 upon lambda. As we see, for two parameters also sigma square is the same, 1 upon lambda square. The variability is the same because, by changing the distribution from this to this, the variability has not increased or decreased; the shifted portion does not have any variability.

So, the variability essentially remains the same, and all of it applies for time t greater than t0. So, the variance comes out to be 1 upon lambda square, and the standard deviation turns out to be 1 upon lambda, that is, 1000 hours.
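The worked example can be checked with a short Python sketch. The values of lambda and t0 come from the example; the helper names `reliability` and `design_life` are assumptions introduced here for illustration.

```python
import math

lam = 0.001   # failure rate per hour (from the example)
t0 = 200.0    # guaranteed (minimum) life in hours

def reliability(t):
    """R(t) for the two-parameter exponential; taken as 1 before t0."""
    return 1.0 if t < t0 else math.exp(-lam * (t - t0))

def design_life(r):
    """Time at which reliability has dropped to r (0 < r <= 1)."""
    return t0 - math.log(r) / lam

mttf = t0 + 1.0 / lam
t_median = design_life(0.5)
t_95 = design_life(0.95)
sigma = 1.0 / lam   # the shift by t0 does not change the spread

print(f"MTTF = {mttf:.1f} h, median = {t_median:.2f} h, "
      f"t(0.95) = {t_95:.2f} h, sigma = {sigma:.1f} h")
```

The results should match the hand calculation: an MTTF of 1200 hours, a median of about 893.15 hours, a 95 percent design life of about 251.3 hours, and a standard deviation of 1000 hours.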

(Refer Slide Time: 13:14)

All this we have taken from this book, and we will continue the further discussion with the time variant distributions. The first of those discussions will be on the Weibull distribution. Before going to the Weibull distribution, let us have one important discussion on this lambda.

So, what is this constant failure rate? If you see, in the professional world there is a lot of dispute about it. As we discussed earlier, all the standards and processes, and any product manufacturer's catalogue you refer to, will quote one quantity, that is MTBF or MTTF. They will tell you that their component or part has a certain MTTF.

So, by default, we are supposed to assume that this is following the exponential distribution. And if we want to calculate reliability for a given time t, it will be e to the power minus t upon MTBF. Now, the question arises here, because by default we are assuming that the system is not going to deteriorate.

So, let us say, if we see the component handbooks and the component MTTF etcetera, for capacitors or resistors you will see that the failure rate is somewhere around 3 into 10 to the power minus 9 per hour.

So, a failure rate per hour of this much is okay. But if I convert this into MTTF, it will turn out to be 10 to the power 9 divided by 3 hours, which is a big number. Let us see this number in another sense from Excel. We are interested in 1 divided by 3e-9. If you see, that number is quite large, with so many threes, that is, around 3.3 into 10 to the power 8 hours. Now, let us say we divide this by 24. If we divide by 24 we will get the number of days, which is around 1,38,00,000.

Now, if I divide this by 365.25 I will get years. If we see, the years are coming out to be somewhere around 38,000. So, when my lambda was 3 into 10 to the power minus 9 per hour, my MTTF turns out to be somewhere around 38,000 years. So, it raises a natural question: what does this MTTF mean? Is the component going to survive 38,000 years? The answer is definitely no; no component can work for this long a period of time.
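The unit conversion done in the spreadsheet can be sketched in Python as follows. The failure rate is the one quoted above; the variable names are chosen here only for illustration.

```python
lam = 3e-9                      # failures per hour (value quoted in the lecture)
mttf_hours = 1.0 / lam
mttf_days = mttf_hours / 24.0
mttf_years = mttf_days / 365.25

print(f"{mttf_hours:.3e} hours, {mttf_days:.3e} days, {mttf_years:.0f} years")
```

This gives roughly 3.3 into 10 to the power 8 hours, about 1.39 into 10 to the power 7 days, and about 38,000 years.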

So, what does it mean in that case? You will find many people saying that because of this the MTTF becomes useless. But in the practical sense it is still useful, and because of its simplicity it opens up a lot of possibilities which would not have existed if we were not taking this as a constant failure rate. But is it still valid?

So, the validity question comes into the picture again and again. We know that generally, if we plot the failure rate against time, it follows this kind of bathtub curve. Now, in this bathtub curve we have this constant failure rate region.

So, let us say this is the life at which deterioration starts; let us say this is my design life, td. And this is the time up to which I am doing the burn-in, that is tb. Now, our concern is that the failure rate is changing here and changing here. So, what is limiting the life? Most of the time, life is limited by this phenomenon: whenever the increasing failure rate starts, the majority of the components will start failing. This is the place where the failures will start.

But generally this life may be only 30 years for a component like this. While I am writing 38,000 years here, the life may actually be around 30 years. That means in around 30 years the deterioration will start and my components will start failing very quickly, so that in around 30 to 50 years most of the components may fail, because then we are talking about this region.

But most of the time we are not going to use the equipment up to 30 years. The equipment in which I am using this component may be used only for 10 years. So, for 10 years, the region that is valid for me is only this one, the constant failure rate region; it does not go into this region, that is, my increasing failure rate.

And this initial region of decreasing failure rate is already taken care of by choosing better manufacturing methods and by burn-in and segregation methods, that is, stressing the components and then removing the failed ones. Once we do that, this initial period is also eliminated. So, the failures of this infant mortality period or burn-in period are also not sent to the customers.

So, when people are using a component, they see only this constant failure rate region. And for the constant failure rate region, the component MTTF or lambda will be the one that we have calculated. The condition here is that the component is not used beyond time td. If the component is used beyond time td, then the constant failure rate distribution will not be valid, and in that case my calculations will also not be valid.

So, this calculation is valid when the design life is up to 10 years or 20 years or 30 years, but not if the use goes beyond the time when deterioration starts, which may be different for different devices.

So, before deterioration takes place, the constant failure rate assumption is very much valid, and this high MTTF and low lambda are also valid. Now, for the system, let us say I am using almost 1000 components like this. So, I sum lambda over i equal to 1 to 1000.

So, in that case the system lambda will be 1000 times the component lambda, that is, of the order of 10 to the power minus 6, and the MTTF will become 1000 times less, that is, MTTF divided by n. So, this will become 38 years. And let us say not all components are of the order of 10 to the power minus 9; some components will be 10 to the power minus 8, and some will be 10 to the power minus 7 also.

So, in that case, if the components are of the order of 10 to the power minus 8, this will be around 3.8 years. So, the system MTTF is of our concern, because the system failure probability is going to be high; the system MTTF is only 38 years here. That means we expect the system to fail well within this time: within 38 years we are expecting around 63 percent of the systems to fail.
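The system-level numbers above follow from adding the component failure rates, assuming all parts are in series and have constant failure rates. A minimal sketch, in which the function name `series_mttf_years` and the second case (parts ten times worse, at 3e-8 per hour) are illustrative assumptions:

```python
import math

HOURS_PER_YEAR = 24 * 365.25

def series_mttf_years(component_rates_per_hour):
    """MTTF in years of a series system of constant-failure-rate parts."""
    system_rate = sum(component_rates_per_hour)   # failure rates add in series
    return 1.0 / system_rate / HOURS_PER_YEAR

mttf_a = series_mttf_years([3e-9] * 1000)   # 1000 parts at 3e-9 per hour
mttf_b = series_mttf_years([3e-8] * 1000)   # hypothetical: parts ten times worse

# Fraction of systems expected to have failed by t equal to the MTTF
failed_by_mttf = 1.0 - math.exp(-1.0)

print(f"{mttf_a:.1f} years, {mttf_b:.2f} years, {failed_by_mttf:.3f}")
```

The 63 percent figure is 1 minus e to the power minus 1, which is the unreliability of any exponential distribution evaluated at its own MTTF.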

So, here we see that failures are happening before this period itself. So, the calculations which we have done become valid here, because at the system level the MTTF comes out to be quite low; it may be some 4 years or 5 years, because multiple components are there and multiple failure rates are there.

When we sum them up, the MTTF comes out to be much less than this design life period. So, that makes sense, and it also serves the purpose of comparison, because within this period we expect that this component will fail with this failure rate. So, when people ask about validity, whether we should use this constant failure rate or MTTF or not, I would suggest that we should use it, but we should be aware that an MTTF of 38,000 years does not mean that the component is going to survive for 38,000 years.

Here it means that the MTTF of 38,000 years is to be used when the component is used only for 20 or 30 years, before the deterioration takes place. So, before deterioration sets in, the MTTF is valid, but it is not indicative of the life of the component. Since most of the time the system life or the system MTTF is less than the time at which this component starts deteriorating, the constant failure rate or this kind of high MTTF can still be valid at the component level.

Because at the system level we will then have a meaningful result: 38 years, or 3.8 years if it was 10 to the power minus 8. We have some valid outcomes, and this is quite useful. Most of the time in industry, when you work as a manufacturer or as an employee in any company which is producing consumer goods or other goods, you will see that there is a certain life for which you are liable, like the warranty life, and then a certain life for which you perceive that the system is supposed to work, because after that some critical or costly components start failing and the system becomes unsupportable, or the technology itself is supposed to improve.

So, let us say the warranty life you are giving is around 3 years, and the system life may be around 10 years, let us say for a refrigerator; not a TV, TVs have a much shorter life.

So, during this period, the kinds of failures the system will see will not be life related failures; those will be the random failures belonging to this constant failure rate region. Because of these random failures, the assumption of the exponential distribution is meaningful, because this is the area of concern. Once the deterioration failures start, let us say after 10 years, when multiple failures start happening, we have no intent to use the system in that region; we are going to use the system in this region where the failure rate is constant.

So, when we do the design, these life related failures are generally perceivable: you can predict them and you can find out a safe life. So generally there is a preventive maintenance policy. Before these components actually start deteriorating, the preventive maintenance policy sets in, and the components are discarded before the deterioration can actually lead to failure.

So, we are taking out both populations: this population we are addressing by the burn-in, and this population we are addressing with the preventive maintenance.

So, effectively, most of the time we are working with components or parts which are sitting in this region, and our concern is mostly these random failures. So, here these models provide us a very good way to take these random failures into account and to understand that, though we are not able to foresee any failure, there is still a probability of failure lying with the systems. So, I will stop here today. In the next lecture we will continue with the time dependent failure distributions. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 10
Weibull Distribution (2 Parameter)

In lecture 10, we will delve into the Weibull distribution, which is another widely used
probability distribution. Unlike the exponential distribution, which assumes a constant failure
rate, the Weibull distribution can accommodate increasing, decreasing, or constant hazard rates,
making it a versatile model. The 2-parameter Weibull distribution is more commonly used than
the 3-parameter version.

The Weibull distribution is highly popular in data analysis for reliability estimation. In fact, it is
so ubiquitous that many refer to reliability analysis as "Weibull analysis." Its versatility and
applicability have contributed to its widespread adoption.

(Refer Slide Time: 01:26)

So, today, we are going to start discussing the Weibull distribution. The first one which we will be discussing is the 2 parameter Weibull distribution, which is the most common one. So, where the exponential distribution has 1 parameter, this has 2 parameters. And just as there is a 2 parameter exponential distribution with a location parameter, there is a 3 parameter Weibull distribution.

$$\lambda(t) = a t^{b}, \quad t \ge 0$$

The 2-parameter Weibull distribution can model not only a constant failure rate, but also an
increasing hazard rate and a decreasing hazard rate. So, we can say that the hazard rate, or
instantaneous failure rate, lambda(t) is some constant a multiplied by time raised to a power b.

$$\lambda(t) = \frac{\beta}{\theta}\left(\frac{t}{\theta}\right)^{\beta-1}$$

So, for different values of the power b, lambda(t) will take different forms. When b is 0,
lambda(t) will be equal to a, because t raised to the power 0 is 1. So, this becomes time
independent, that is, a constant failure rate. If b is less than 0, a negative quantity, then
lambda(t) will be a into t to a negative power. That means it is an inverse function of time with
some power, so lambda(t) decreases with time. When b is greater than 0, that is, b is a positive
quantity, lambda(t) will increase with time.

So, that becomes an increasing function of time. This form is sometimes used, but the form more
commonly used to represent the failure rate of the Weibull distribution is beta upon theta into
t upon theta raised to the power beta minus 1, where theta is called the scale parameter and beta
is called the shape parameter. Generally, a parameter which divides or multiplies the random
variable is a scale parameter; here the random variable of interest is t.

So, since theta divides t, it modifies or changes the scale of t; that is why it is called the
scale parameter. But the power changes the behavior, the pattern of the curve. So, beta is called
the shape parameter: when beta changes, the shape of the lambda(t) and R(t) functions will
change, but if you change theta, this will result only in a contraction or expansion of the scale
of the axis.
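As a rough illustration of how the shape parameter controls the trend of the hazard rate, the following Python sketch evaluates lambda(t) at a few times for three values of beta. The function name `weibull_hazard` and the sample values of theta, beta and t are assumptions chosen only for illustration; they are not taken from the lecture.

```python
def weibull_hazard(t, beta, theta):
    """Hazard rate lambda(t) = (beta/theta) * (t/theta)**(beta - 1), for t > 0."""
    return (beta / theta) * (t / theta) ** (beta - 1)

theta = 1000.0                          # illustrative scale parameter (hours)
times = [100.0, 500.0, 1000.0, 2000.0]  # illustrative evaluation times

for beta in (0.5, 1.0, 2.0):            # decreasing, constant, increasing hazard
    rates = [weibull_hazard(t, beta, theta) for t in times]
    print(f"beta={beta}: " + ", ".join(f"{r:.6f}" for r in rates))
```

With beta below 1 the printed rates fall with time, with beta equal to 1 they stay constant, and with beta above 1 they rise.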

So, it will be like stretching the distribution or contracting the distribution. Theta is also
called the characteristic life because, as we will see later, when t is equal to theta the
reliability is the same irrespective of beta. This is the same reliability as we get when t is
equal to the MTTF for the exponential distribution. Now, some cases: when beta is less than 1,
this is called a decreasing failure rate, and the distribution will look like the exponential,
only sharper. Beta equal to 1 is a constant failure rate, which is the exponential distribution
itself. And for beta greater than 1, f(t) versus t can take several different shapes, as drawn
here.

So, the distribution can take different kinds of shapes depending on the value of beta. When beta
is somewhere greater than 3, it takes a shape quite similar to the normal distribution. As beta
becomes larger, it looks more and more like a normal distribution. When beta is some value
between 1 and 3, the distribution is skewed.

(Refer Slide Time: 05:29)

Or, another way: we know this is our lambda(t). So, we know reliability can be given as e to the
power minus the integration from 0 to t of lambda(x) dx. Here lambda(x) is beta upon theta into
x upon theta raised to the power beta minus 1. If we integrate this from 0 to t, it becomes
t upon theta raised to the power beta. To see this in a simpler way, if you differentiate
t upon theta raised to the power beta with respect to t, what you get is beta into t upon theta
raised to the power beta minus 1, into 1 upon theta.

$$R(t) = \exp\left[-\int_0^t \frac{\beta}{\theta}\left(\frac{x}{\theta}\right)^{\beta-1} dx\right] = e^{-\left(\frac{t}{\theta}\right)^{\beta}}$$

This is the same as if we assume t upon theta as x; then 1 upon theta dt is equal to dx. So,
t upon theta raised to the power beta becomes x raised to the power beta, and in the derivative
with respect to t we replace dt by theta dx. That gives 1 upon theta into the derivative of x to
the power beta, which is beta x to the power beta minus 1. So, this becomes beta upon theta into
x to the power beta minus 1, and we convert back to t using x is equal to t upon theta.

So, this is beta upon theta into t upon theta raised to the power beta minus 1, which is the same
as our hazard rate expression. So, if we integrate the hazard rate from 0 to t, what we get is
t upon theta raised to the power beta, and the reliability is e to the power minus t upon theta
raised to the power beta. So, that is why we write the failure rate as the derivative of t upon
theta raised to the power beta: it is much easier to process. So, the more commonly used
expression for reliability is e to the power minus t upon theta raised to the power beta, where
theta is the characteristic life and beta is the shape parameter.
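The relationship between the hazard rate and the reliability can also be checked numerically. The sketch below integrates the Weibull hazard rate with a simple midpoint rule and compares the result with the closed-form expression. The function names, the step count and the values of beta, theta and t are assumptions made for this illustration.

```python
import math

def hazard(x, beta, theta):
    """Weibull hazard rate (beta/theta) * (x/theta)**(beta - 1)."""
    return (beta / theta) * (x / theta) ** (beta - 1)

def reliability_closed_form(t, beta, theta):
    """R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def reliability_by_integration(t, beta, theta, steps=20_000):
    """R(t) = exp(-integral of the hazard from 0 to t), using the midpoint rule."""
    dx = t / steps
    area = sum(hazard((i + 0.5) * dx, beta, theta) for i in range(steps)) * dx
    return math.exp(-area)

beta, theta, t = 2.5, 1000.0, 800.0  # illustrative values only
print(reliability_closed_form(t, beta, theta))
print(reliability_by_integration(t, beta, theta))
```

The two printed values should agree to several decimal places, which is what the integration argument above predicts.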

Now, if we take t equal to theta, then irrespective of beta, t upon theta will be 1. So, the
reliability will become e to the power minus 1 raised to the power beta, which is equal to e to
the power minus 1. So, irrespective of the value of beta, at time t equal to theta the
reliability is always equal to 0.368, which is the same as when t is equal to the MTTF of an
exponential distribution. We can also calculate f(t), the PDF. We know the PDF is minus dR(t)
over dt, or we can say it is equal to lambda(t) multiplied by R(t).

$$f(t) = -\frac{dR(t)}{dt} = \frac{\beta}{\theta}\left(\frac{t}{\theta}\right)^{\beta-1} e^{-\left(\frac{t}{\theta}\right)^{\beta}}$$

So, the first factor is my lambda(t) and the second is my R(t). Or, if we differentiate R(t)
directly, the exponential term reproduces itself and the derivative of its exponent produces
beta upon theta into t upon theta raised to the power beta minus 1, which we have already
obtained, multiplied by e to the power minus t upon theta raised to the power beta.
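Both statements, that R(theta) is 0.368 for every beta and that f(t) equals minus dR/dt, can be spot-checked with a few lines of Python. This is only a sketch: the function names, the evaluation time t = 700 and the list of beta values are assumptions picked for illustration.

```python
import math

def weibull_reliability(t, beta, theta):
    """R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def weibull_pdf(t, beta, theta):
    """f(t) = lambda(t) * R(t)."""
    hazard = (beta / theta) * (t / theta) ** (beta - 1)
    return hazard * weibull_reliability(t, beta, theta)

def minus_dR_dt(t, beta, theta, h=1e-3):
    """Central-difference estimate of -dR/dt."""
    return -(weibull_reliability(t + h, beta, theta)
             - weibull_reliability(t - h, beta, theta)) / (2 * h)

theta = 1000.0
for beta in (0.5, 1.0, 2.0, 3.5):
    print(f"beta={beta}: R(theta)={weibull_reliability(theta, beta, theta):.3f}, "
          f"f(700)={weibull_pdf(700.0, beta, theta):.6e}, "
          f"-dR/dt={minus_dR_dt(700.0, beta, theta):.6e}")
```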

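$$\text{MTTF} = \theta\,\Gamma\left(1 + \frac{1}{\beta}\right)$$

where

$$\Gamma(x) = \int_0^{\infty} y^{x-1} e^{-y}\,dy, \quad \text{where } \Gamma(x) = (x-1)\,\Gamma(x-1)$$

$$\Gamma(x) = (x-1)!, \quad \text{for positive integer values of } x$$
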
Now, if we want to calculate the MTTF, this is a somewhat complex evaluation, so we are not
going into the derivation. If we integrate the reliability function, we get the MTTF as theta
into the gamma function of 1 plus 1 upon beta, where the gamma function of x is mathematically
defined as the integration from 0 to infinity of y to the power x minus 1 into e to the power
minus y dy.

The gamma function has certain properties. We know factorial x is equal to x into factorial of
x minus 1; for the gamma function, gamma of x is x minus 1 into gamma of x minus 1. This is the
property of the gamma function. And if x is a positive integer, we can continue to apply this
property until we finally reach 1.

So, we get x minus 1 into x minus 2 and so on down to 1, which is factorial of x minus 1. So,
gamma of x becomes factorial of x minus 1 if x is a positive integer. For non-integer values we
have to refer to tables; there are gamma tables available in which we can find the values when x
is somewhere from 0 to 1. The variance for this distribution is calculated as theta square into
gamma of 1 plus 2 upon beta minus the square of gamma of 1 plus 1 upon beta. The design life we
can again obtain from the reliability expression. So, I will erase this.
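Instead of gamma tables, the gamma function in Python's standard `math` module can be used. The sketch below wraps the MTTF and variance formulas in two helper functions (the names `weibull_mttf` and `weibull_variance` are assumptions for this illustration) and checks them on the case beta equal to 1, where the Weibull reduces to the exponential distribution with mean theta and variance theta squared.

```python
import math

def weibull_mttf(beta, theta):
    """MTTF = theta * Gamma(1 + 1/beta)."""
    return theta * math.gamma(1 + 1 / beta)

def weibull_variance(beta, theta):
    """Variance = theta**2 * (Gamma(1 + 2/beta) - Gamma(1 + 1/beta)**2)."""
    g1 = math.gamma(1 + 1 / beta)
    g2 = math.gamma(1 + 2 / beta)
    return theta ** 2 * (g2 - g1 ** 2)

# Gamma(x) equals (x - 1)! for positive integers x.
for x in range(1, 6):
    print(x, math.gamma(x), math.factorial(x - 1))

# Illustrative check: beta = 1 is the exponential case.
theta = 500.0
print(weibull_mttf(1.0, theta), weibull_variance(1.0, theta))
```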

(Refer Slide Time: 11:13)

For the design life, we know that R is equal to e to the power minus t upon theta raised to the
power beta. From here, we want to know t_R, the time at which reliability is equal to R. If I
take minus ln of R, this will be equal to t_R divided by theta raised to the power beta. So,
t_R upon theta will be equal to minus ln of R raised to the power 1 upon beta, and t_R will be
equal to theta into minus ln of R raised to the power 1 upon beta.

So, suppose I am interested to know the median time to failure. For the median time to failure
my reliability will be equal to 0.5. So, I can get t_0.5, the median time, which will be equal
to theta into minus ln of 0.5 whole raised to the power 1 upon beta. So, we can use this formula
to get the median life.

If I am interested to know the 90 percent reliability life, which we are calling the design
life, then R will be equal to 0.90; the 0.5 will be replaced with 0.90. There is another
parameter which is used, for the exponential or any distribution, which we call the Bx life.
This x can be 1, can be 10, can be 0.1, can be 5. What does it mean? B10 or Bx life means the
life at which we will observe x percent of failures.

So, this is similar to t_R. When I say t_0.90, that means the time at which reliability is 90
percent, or we can say this is equal to B10, the time at which unreliability is 10 percent. So,
t_0.90 is equal to B10. Similarly, t_0.99 will be equal to B1: 99 percent reliability means
1 percent failure. So, Bx is a common notation, referred to by many people. B1 life means the
time by which we expect that 1 percent of the population will fail, and in general Bx life is
the time by which x percent of the population will fail.
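The design life formula and the Bx notation translate directly into code. In this minimal sketch the helper names `design_life` and `bx_life` and the parameter values are assumptions for illustration; the formula itself is the one derived above.

```python
import math

def design_life(reliability, beta, theta):
    """Time t_R at which R(t) has dropped to the given reliability."""
    return theta * (-math.log(reliability)) ** (1 / beta)

def bx_life(x_percent, beta, theta):
    """Time by which x percent of the population is expected to fail."""
    return design_life(1 - x_percent / 100, beta, theta)

beta, theta = 2.0, 1000.0  # illustrative values only
print("median life:", design_life(0.5, beta, theta))
print("t_0.90     :", design_life(0.90, beta, theta))
print("B10 life   :", bx_life(10, beta, theta))
print("B1 life    :", bx_life(1, beta, theta))
```

The t_0.90 and B10 lines print the same number, since 90 percent reliability and 10 percent failure describe the same point in time.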

Similarly, we can get the mode. As we have seen, when beta is less than or equal to 1, as for
the exponential, the maximum value of f(t) comes at t equal to 0. So, the mode for beta less
than 1 or beta equal to 1 is at t equal to 0. But for beta greater than 1 the shape changes.

So, because of the change in shape, the mode comes out to be at theta into 1 minus 1 upon beta
whole raised to the power 1 upon beta. So, this formula can be used to calculate the mode of the
distribution. The mode is the point where f(t) has its maximum value. So, the value of t at
which f(t) is maximum is your t mode. If you want, you can get it by differentiating f(t) and
setting the derivative equal to 0 to get the value of t mode.
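One way to convince oneself of the mode formula without doing the differentiation is to compare it against a brute-force search for the maximum of f(t). The sketch below does this on a coarse grid; the function names, the grid spacing of 0.5 and the parameter values are assumptions made for this illustration.

```python
import math

def weibull_pdf(t, beta, theta):
    """f(t) = (beta/theta) * (t/theta)**(beta - 1) * exp(-(t/theta)**beta)."""
    return (beta / theta) * (t / theta) ** (beta - 1) * math.exp(-((t / theta) ** beta))

def weibull_mode(beta, theta):
    """Mode from the closed-form expression; 0 when beta <= 1."""
    if beta <= 1:
        return 0.0
    return theta * (1 - 1 / beta) ** (1 / beta)

beta, theta = 2.0, 1000.0                  # illustrative values only
grid = [0.5 * i for i in range(1, 6001)]   # t = 0.5, 1.0, ..., 3000.0
grid_mode = max(grid, key=lambda t: weibull_pdf(t, beta, theta))

print("formula    :", weibull_mode(beta, theta))
print("grid search:", grid_mode)
```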

(Refer Slide Time: 15:07)

Now, let us see what these Weibull distributions look like. I have again made use of an Excel
sheet here. If you look here, I have put my theta value as 1000 and beta as 2. In that case I
can calculate everything here. My MTTF, as you see here, is theta into the gamma function of
1 plus 1 upon beta, so this is equal to 1000 into the gamma function value. Earlier Excel
versions did not have the gamma function directly; they gave the gamma ln function, which gives
the log of the gamma function value.

So, we have to take the exponential of that to get the gamma function value. Newer Excel
versions have the gamma function directly, so you can use it directly as gamma of 1 plus 1 upon
beta.

So, beta is in cell B2 here; B is the column and 2 is the row. To have a better explanation, let
me check whether the gamma function is available here or not. See, here the gamma function is
not there. So, I will use gamma ln of 1 plus 1 upon beta. So, 1 plus 1 divided by beta, where
beta is 2, and of this value I have to take the exponential.

So, I will use the exponential function here, and this whole value has to be multiplied with the
characteristic life theta, which is 1000 here. Once I multiply, I get the MTTF, which is the
same as shown here.
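The same log-gamma workaround can be reproduced outside the spreadsheet. In Python, `math.lgamma` plays the role of Excel's GAMMALN and `math.gamma` the role of the newer GAMMA function; the sketch below, using the theta and beta values from the sheet, shows that both routes give the same MTTF. The variable names are assumptions for this illustration.

```python
import math

theta, beta = 1000.0, 2.0   # values entered in the spreadsheet
x = 1 + 1 / beta

gamma_via_log = math.exp(math.lgamma(x))  # mirrors EXP(GAMMALN(x))
gamma_direct = math.gamma(x)              # mirrors GAMMA(x)

mttf = theta * gamma_via_log
print(gamma_via_log, gamma_direct)
print(f"MTTF = {mttf:.2f} hours")
```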

(Refer Slide Time: 17:27)

Variance is again calculated in a similar way. So, we take the gamma value of 1 plus 2 upon beta
and the gamma value of 1 plus 1 upon beta. The gamma value of 1 plus 1 upon beta is squared and
subtracted from the gamma value of 1 plus 2 upon beta, and whatever comes is multiplied with the
square of the theta value; B1 is theta, that is 1000. So, we can get the variance here. If I
want to know the standard deviation, it will be this value raised to the power 0.5, the square
root of this value, which comes out to be 463.

(Refer Slide Time: 18:17)

Now, here I have taken a few values of x, from 0 to 2, and we have tried to calculate small
f(x), capital F(x), R(x) and z(x). Since theta is 1000, these values of x are very small in
comparison. So, I can take larger values of x instead, spaced with a difference of, let us say,
50.

If I take a difference of 50, you will see how small f(x) comes out. For the small f(x)
calculation we know the formula: beta upon theta into t upon theta raised to the power beta
minus 1, multiplied by the exponential of minus t upon theta raised to the power beta. So, if
you see here in the cell, this is t upon theta raised to the power beta minus 1, multiplied by
beta upon theta, where beta is this cell and theta is this cell, multiplied by the exponential
of minus t upon theta raised to the power beta.

Once we get this, we have small f(x). Similarly, we can get capital F(x), which is 1 minus the
exponential of the negative of t divided by theta whole raised to the power beta. Similarly,
R(x) we can take as 1 minus F(x), and z(x) is nothing but small f(x) upon R(x). So, the same
thing we are able to get here. Maybe the resolution is poor and z(x) is not visible; I will
increase the resolution here so that we are able to see it.
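The spreadsheet columns can be mirrored in a short Python sketch. It uses the sheet's theta of 1000 and beta of 2; the step of 250 between rows and the helper name `row` are assumptions chosen to keep the printed table short.

```python
import math

theta, beta = 1000.0, 2.0  # values used in the spreadsheet

def row(t):
    """Return f(t), F(t), R(t) and z(t) for one value of t."""
    ratio = t / theta
    R = math.exp(-(ratio ** beta))
    f = (beta / theta) * ratio ** (beta - 1) * R
    F = 1 - R
    z = f / R   # hazard rate is the density divided by the reliability
    return f, F, R, z

print(f"{'t':>6} {'f(t)':>12} {'F(t)':>10} {'R(t)':>10} {'z(t)':>12}")
for t in range(0, 2001, 250):
    f, F, R, z = row(t)
    print(f"{t:>6} {f:>12.6e} {F:>10.4f} {R:>10.4f} {z:>12.6e}")
```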

(Refer Slide Time: 20:29)

Now, let us investigate what this Weibull distribution looks like. When I take beta equal to 2,
you see z(t) becomes linear, because beta minus 1 is 1. Let me use the pen here. We know that
z(x), or lambda(x), is equal to beta upon theta into x upon theta raised to the power beta
minus 1. Now, when beta is equal to 2, this will be equal to 2 upon theta into x upon theta
raised to the power 1. So, that is 2 upon theta square into x.

So, we say that z(x) is proportional to x; it is a linear function. But when beta is equal to 1,
this will be x to the power 0, which is a constant value. If beta is equal to 3, then this will
become x square, so z(x) will be a quadratic function of x. The same we can see in the plot.

(Refer Slide Time: 22:38)

So, if we change the value of beta, let us say we make it 1, then z(t) will become a constant
value. And when we make beta equal to 3, you see that this becomes a parabolic function. Now, if
we make beta equal to 4, there is a much sharper increase, and if I make it, let us say, 5, it
increases even faster.

(Refer Slide Time: 23:23)

Now, let us investigate how the distribution is changing. Let us look at the PDF. The PDF here
is plotted only up to 1000, so let us extend it a little more; let us try a gap of 100. So, you
see that when beta is equal to 5, the shape is almost looking like the normal distribution. With
beta equal to 3 it is also almost looking like it. But when beta is equal to 2, which is called
the Rayleigh distribution, a little skewness is there. When beta becomes larger, this is almost
similar to the normal distribution.

(Refer Slide Time: 24:05)

Similarly, you see that the PDF shape changes with beta: whenever I change beta, the shape
changes. That is why beta is called the shape parameter. If beta is equal to 1, this will become
exponential. Both beta and theta have to be greater than 0; they cannot be less than or equal
to 0. Now, let us see what is the impact of theta.

If I make theta equal to 500, that is, half, what will happen? The distribution is supposed to
contract. Whatever values I was getting on the scale from 0 to 2000, the same values I will now
get on the scale from 0 to 1000. You see, it will have the same pattern from 0 to 1000.

So, it is contracting. If you see, the shape is the same here also; the only change that
happened is that the scale changed. If I make theta 100, it becomes shorter still, and if I make
it 1000 again, it goes back to the larger range.

So, when theta is changed, the shape is not changed; the shape of the distribution remains the
same. The only thing is that when theta becomes higher the distribution is stretched out, and
when theta becomes lesser it is contracted. The reliability function always starts at 1 at t
equal to 0, and it is a non-increasing function. So, as time progresses, the reliability
decreases.
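The pure rescaling effect of theta can be confirmed numerically: with the same beta, the failure probability at time t for theta equal to 500 should equal the failure probability at time 2t for theta equal to 1000. The sketch below checks this; the function name `weibull_cdf` and the chosen time points are assumptions for illustration.

```python
import math

def weibull_cdf(t, beta, theta):
    """F(t) = 1 - exp(-(t/theta)**beta)."""
    return 1 - math.exp(-((t / theta) ** beta))

beta = 2.0
for t in (100.0, 250.0, 500.0, 1000.0):
    f_half = weibull_cdf(t, beta, 500.0)        # theta halved
    f_full = weibull_cdf(2 * t, beta, 1000.0)   # original theta, time doubled
    print(f"t={t:>6}: F(t; 500)={f_half:.6f}  F(2t; 1000)={f_full:.6f}")
```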

(Refer Slide Time: 25:06)

Now let us take one example. Let us say that there is a compressor which is experiencing wear
out. Whenever we say wear out, this means an increasing hazard rate, or beta greater than 1.
Here we are given that lambda(t) follows this pattern: lambda(t) is equal to 2 upon 1000
multiplied with t divided by 1000.

$$\lambda(t) = \frac{2}{1000}\left(\frac{t}{1000}\right) = 2 \times 10^{-6}\, t$$

If we compare this with our function, beta upon theta into t upon theta raised to the power beta
minus 1, we see t to the power 1, so definitely beta is equal to 2. When beta is equal to 2, the
power beta minus 1 becomes 2 minus 1, that is 1, and by comparing the remaining terms theta is
equal to 1000.

So, if you put beta equal to 2 and theta equal to 1000 here, we get 2 upon 1000 multiplied by
t upon 1000 raised to the power 1, which is the same as the given lambda(t). So, by comparison,
we get beta equal to 2 and theta equal to 1000.

Now, let us say we want to know the design life at which reliability is 0.99. So, R(t) is equal
to 0.99, and I want to know the value of t, that is, t_0.99. As we discussed earlier, I can take
the log of both sides; that will give me minus ln of 0.99 equal to t upon 1000 whole square.

So, t upon 1000 will be equal to the square root of minus ln of 0.99, and t will be equal to
1000 into the square root of minus ln of 0.99. This gives me 100.25 hours. For the MTTF
calculation, we again use theta into the gamma function of 1 plus 1 upon beta, with beta equal
to 2 and theta equal to 1000. The gamma function of 1 plus 1 by 2, that is the gamma function of
1.5, is equal to 0.886227, so the MTTF is 886.23 hours. Similarly, we can calculate sigma
square. Sigma square is theta square, 1000 square, which is 10 to the power 6, multiplied by
gamma of 1 plus 2 upon beta minus the square of gamma of 1 plus 1 upon beta.

So, 2 upon 2 is 1, which gives gamma of 2, minus gamma of 1.5 whole square. The gamma value of
2, since 2 is a positive integer, is equal to factorial of 1, that is 1. If we solve this, we
get 214601.7. If we take the square root of this, my sigma value comes out to be 463.25 hours.
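The numbers in this example can be reproduced with a few lines of Python using the standard `math` module. The variable names are assumptions for this sketch; the parameters beta equal to 2 and theta equal to 1000 are the ones identified above.

```python
import math

beta, theta = 2.0, 1000.0   # parameters identified from the given hazard rate

t_099 = theta * (-math.log(0.99)) ** (1 / beta)        # design life for R = 0.99
mttf = theta * math.gamma(1 + 1 / beta)                # theta * Gamma(1.5)
variance = theta ** 2 * (math.gamma(1 + 2 / beta) - math.gamma(1 + 1 / beta) ** 2)
sigma = math.sqrt(variance)

print(f"t_0.99   = {t_099:.2f} hours")
print(f"MTTF     = {mttf:.2f} hours")
print(f"variance = {variance:.1f}")
print(f"sigma    = {sigma:.2f} hours")
```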

So, when beta is equal to 2, our lambda(t) is proportional to time t. So, the hazard of failure
is a linearly increasing function, and the Weibull distribution becomes the Rayleigh
distribution. As we know, for different values of beta the Weibull distribution represents
different kinds of distributions; when beta is equal to 1 it becomes the exponential
distribution.

(Refer Slide Time: 29:55)

Let us take one more example. We are given a Weibull failure distribution with shape parameter
1 by 3, so my beta is equal to 1 by 3, and the scale parameter theta is equal to 16,000. In that
case, we can find out all the values. The reliability R(t) will be equal to e to the power minus
t upon theta raised to the power beta. Beta is less than 1, so it is a decreasing failure rate.

If I take the failure rate, it will be equal to beta upon theta, that is 1 by 3 into 1 by
16,000, multiplied by t upon 16,000 raised to the power beta minus 1, which is minus 2 by 3. So,
if you see, t has a negative power; that means with time the failure rate is going to decrease.
So, it is a decreasing failure rate.

The decreasing failure rate region in the bathtub curve we also call the infant mortality
period, because here the failure rate is decreasing. Now, this is undesired, because in this
case many failures will happen at the start of the life.

Because any person who is using this product, the moment he purchases it and starts to use it,
it fails. So, it is the worst customer experience, which is highly undesirable. In the wear-out
region, when the product is failing, the equipment has already worked for the desired life,
whatever the desired life is, maybe 10 years or 5 years depending on the equipment.

So, there, if it is failing, it is expected, and the customer has already used the equipment for
a certain period of time, so he is satisfied. But these early failures are going to lead to very
high dissatisfaction among customers. The company's reputation will also be affected, because
whenever they launch a new product and start having this kind of failure, the company image will
be severely affected. So, this region we have to address; we have to make sure that the customer
is not seeing these failures.

So, one way of doing that is burn-in. In burn-in we keep the product within the company, operate
it at higher stress and check it, to make sure that the products do not fail during some initial
period. So, this period is passed within the company, and only those products which do not fail
are sent to the customer. So, the customer starts seeing the life from a point which is near to
the constant failure rate region.

Now, how can we calculate the MTTF for this? We know the formula: MTTF is theta into the gamma
function of 1 plus 1 upon beta. With beta equal to 1 by 3, 1 upon beta becomes 3, so this is
theta into the gamma function of 1 plus 3, that is the gamma function of 4. The gamma function
of 4 is factorial 3, which is 6.

So, 6 into 16,000 gives me 96,000 hours. The median, as we have discussed earlier, is theta into
minus ln of 0.5, which comes out to be 0.693, raised to the power 1 upon beta. 1 upon beta is 3
because beta is 1 by 3, and this gives me 5328 hours. Since beta is less than 1, the mode will
be equal to 0. And sigma square is theta square into gamma of 1 plus 2 upon beta minus the
square of gamma of 1 plus 1 upon beta. Here 2 upon beta is 2 divided by 1 by 3, that is 6.

So, that will be 1 plus 6, equal to 7. Gamma of 7 is factorial 6, that is 720, and similarly
1 plus 1 upon beta gives gamma of 4, which is factorial 3, that is 6. Once we solve this, we get
the value of sigma square, and from here we get the standard deviation sigma equal to 418.4 into
10 to the power 3 hours.
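As a quick check of the arithmetic in this example, the sketch below recomputes the MTTF, median and standard deviation with Python's `math` module. The variable names are assumptions for this illustration; beta equal to 1/3 and theta equal to 16,000 are the values given in the example.

```python
import math

beta, theta = 1 / 3, 16000.0   # shape and scale from the example

mttf = theta * math.gamma(1 + 1 / beta)              # theta * Gamma(4) = theta * 3!
median = theta * (-math.log(0.5)) ** (1 / beta)      # theta * (0.693)**3
variance = theta ** 2 * (math.gamma(1 + 2 / beta) - math.gamma(1 + 1 / beta) ** 2)
sigma = math.sqrt(variance)                          # theta * sqrt(6! - (3!)**2)

print(f"MTTF   = {mttf:.0f} hours")
print(f"median = {median:.0f} hours")
print(f"sigma  = {sigma:.0f} hours")
```

The standard deviation is more than four times the MTTF here, which reflects how strongly skewed the distribution is when beta is well below 1.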

Now, suppose I am interested to know the design life for 90 percent reliability. That means I
want to sell the product and promise as its life the time up to which reliability does not fall
below 90 percent. Using the design life formula, t_0.90 is 16,000 into minus ln of 0.90 whole
cube, which is 18.71 hours. So, I will promise the product for around 18 hours: there is a 90
percent chance that the system will continue to work for 18 hours. So, my reliability is 90
percent here, or we can say this is the B10 life. Similarly, if I want to calculate the B1 life,
B1 means 1 percent, that is 0.01 failure probability.

So, my reliability will be equal to 1 minus 0.01, that is 0.99. So, I want to know the life at
which reliability is 0.99, that is t_0.99. We use the same formula: 16,000 into minus ln of 0.99
whole cube, which comes out to be 0.0162 hours. So, you see that initially the number of
failures is very high: 1 percent of failures happen within about 0.016 hours, and 10 percent of
failures happen within 18 hours. So, we will stop here. We will continue this discussion in our
next lecture. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 11
Burn-In Screening for Weibull
Hello everyone. So, now, we are moving to lecture number 11 which is in continuation with our
previous lecture on Weibull distribution.

(Refer Slide Time: 00:36)

In the previous lecture we discussed the basic equations used in the Weibull distribution and
the calculation of design life, variance, MTTF, et cetera. Now, let us discuss burn-in. In the
previous example we saw a case of decreasing failure rate. Whenever we have a decreasing failure
rate, we use burn-in so that out-of-box failures, the immediate failures, are not experienced by
the customer.

So, the customer is able to observe the useful life period; he does not see the infant mortality
period. So, what we are doing here is having a burn-in period for that. The question is how long
a period will be good enough for us to ensure that the customer is getting good reliability.

This matters because burn-in is going to be costly for the company: the company has to keep the
product for a certain period T0 within its premises under stress conditions. Many times a high
temperature is applied, around 60 to 70 degrees Celsius, and all electrical stresses are also
applied to the product. For a non-electrical product, like bikes et cetera, they will run the
bike for, let us say, 4 or 5 hours on rough terrain rather than smooth terrain, which will
produce more load on the system.

So, if there is any weakness in the product, any manufacturing defect or other kind of design
defect, the product will fail faster. The products which fail will be removed; they will not be
sent to the customer. So, the customer will only be sent those products which pass this burn-in
screening. We call this a screen because it acts like a sieve: the faulty products are dropped
here and only good products are passed.

So, screening strength means the capability of this screen to stop bad products from reaching
the customer. That screening strength will depend on the stresses which we choose for this
screen as well as the time for which we keep the product under these stresses. Here, we assume
the general case in which the same stresses as the normal use conditions are applied.

(Refer Slide Time: 03:42)

If we do that, we are keeping the product for time T0 within the premises, then we are sending
the product to the customer, and we want to know the reliability for a time small t, which is
the time observed by the customer. From the conditional reliability formula, R of t given T0 is
the reliability at time t plus T0 divided by the reliability at time T0. We know the Weibull
reliability formula is e to the power minus t upon theta raised to the power beta. Here the
numerator is evaluated at t plus T0, so it becomes e to the power minus t plus T0 divided by
theta raised to the power beta, divided by e to the power minus T0 upon theta raised to the
power beta.

If we simplify, this becomes a single exponential. We know that for the exponential
distribution burn-in makes no difference. We can see it here: if beta is equal to 1, this will
become e to the power minus t plus T0 divided by theta, divided by e to the power minus T0
divided by theta.

If we take the denominator up, this becomes e to the power of minus t plus T0 divided by theta,
plus T0 divided by theta. The T0 terms cancel, and this is equal to e to the power minus t upon
theta. So, burn-in has no impact on the exponential distribution, the constant failure rate
case. Burn-in is only effective for a decreasing hazard rate. For an increasing hazard rate,
burn-in will decrease the reliability.
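The three cases can be compared numerically with the conditional reliability formula. In the sketch below, the function names and the values of theta, the mission time t and the burn-in time T0 are assumptions chosen for illustration.

```python
import math

def reliability(t, beta, theta):
    """R(t) = exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def conditional_reliability(t, burn_in, beta, theta):
    """R(t | T0) = R(t + T0) / R(T0)."""
    return reliability(t + burn_in, beta, theta) / reliability(burn_in, beta, theta)

theta, t, burn_in = 1000.0, 200.0, 100.0   # illustrative values only
for beta in (0.5, 1.0, 2.0):
    plain = reliability(t, beta, theta)
    screened = conditional_reliability(t, burn_in, beta, theta)
    print(f"beta={beta}: R(t)={plain:.4f}  R(t|T0)={screened:.4f}")
```

For beta of 0.5 the burned-in reliability is higher, for beta of 1 it is unchanged, and for beta of 2 it is lower.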

So, let us take the previous example, which we discussed in the previous lecture: there is a
Weibull failure distribution with shape parameter beta equal to 1 by 3, which is less than 1 and
so means a decreasing failure rate, and scale parameter 16,000. We want to determine the
reliability function and the design life for 0.90 reliability, given that we are doing a 10
hours burn-in. So, the conditional reliability R of t_R given T0 is set to our target value 0.9.
That is, the reliability at t_0.90 is aimed to be 90 percent, given that 10 hours are already
spent.

So, we use this formula with t equal to t_0.90 and T0 equal to 10 hours: the exponential of
minus the quantity t_0.90 plus 10 divided by 16,000 whole raised to the power 1 by 3, minus 10
divided by 16,000 whole raised to the power 1 by 3, is equal to our target value 0.9. If we take
the log of both sides, minus ln of 0.90 will be equal to the value inside: t_0.90 plus 10
divided by 16,000 raised to the power 1 by 3, minus 10 divided by 16,000 raised to the power
1 by 3.

Now, the second term we can take to the other side. So, t_0.90 plus 10 divided by 16,000 whole
raised to the power 1 by 3 will be equal to minus ln of 0.90 plus 10 divided by 16,000 whole
raised to the power 1 by 3. Now, this power 1 by 3 on the left we can take to the other side,
where it becomes a power of 3.

So, t_0.90 plus 10 divided by 16,000 will be equal to minus ln of 0.90 plus 10 divided by 16,000
whole raised to the power 1 by 3, and this whole raised to the power 3. To get t_0.90, we
multiply the right side by 16,000 and then take the 10 across as a subtraction.

So, t_0.90 will be equal to 16,000 into minus ln of 0.90 plus 10 divided by 16,000 raised to the
power 1 by 3, whole raised to the power 3, minus 10. If we solve this, we get 101.24 hours. As
we saw in the previous example, when we did not do the burn-in, for 0.9 reliability my system
had a life of only 18.71 hours; that means 10 percent of failures were happening in only 18 or
around 19 hours.

$$R(t_R \mid T_0) = R = 0.9$$

$$R(t_{0.90} \mid 10) = \exp\left\{-\left[\left(\frac{t_{0.90}+10}{16000}\right)^{1/3} - \left(\frac{10}{16000}\right)^{1/3}\right]\right\} = 0.9$$

$$-\ln 0.90 = \left(\frac{t_{0.90}+10}{16000}\right)^{1/3} - \left(\frac{10}{16000}\right)^{1/3}$$

$$t_{0.90} = 16000\left[-\ln 0.90 + \left(\frac{10}{16000}\right)^{1/3}\right]^{3} - 10 = 101.24\ \text{hr}$$

But if we apply a 10 hours burn-in, then the customer will see a life of 101 hours. You see, it
is a significant increase in the life seen by the customer. What is happening is that the
products which were failing fast, in a shorter period, are not sent to the customer; those
products are kept within the premises, repaired, and only then sent to the customer.
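The effect of the 10 hour burn-in on the 0.90 design life can be reproduced with a small Python sketch. The function name `design_life_after_burn_in` is an assumption for this illustration; it implements the rearranged conditional reliability equation, and setting the burn-in time to zero recovers the ordinary design life.

```python
import math

def design_life_after_burn_in(target, burn_in, beta, theta):
    """Solve R(t | T0) = target for t, where R(t) = exp(-(t/theta)**beta)."""
    inner = -math.log(target) + (burn_in / theta) ** beta
    return theta * inner ** (1 / beta) - burn_in

beta, theta = 1 / 3, 16000.0   # parameters from the example

without_burn_in = design_life_after_burn_in(0.90, 0.0, beta, theta)
with_burn_in = design_life_after_burn_in(0.90, 10.0, beta, theta)

print(f"t_0.90 without burn-in : {without_burn_in:.2f} hours")
print(f"t_0.90 with 10 h burn-in: {with_burn_in:.2f} hours")
```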

So, the problem which the customer was supposed to see has already been rectified by the
manufacturing unit, and this results in a significant increase in the life of the product. So,
the customer gets 0.9 reliability for almost 100 hours in place of 18 hours; 10 percent of
failures happen in approximately 100 hours. So, this becomes quite useful here. And, as we see,
burn-in is used when there is a decreasing failure rate.

(Refer Slide Time: 09:30)

As we discussed multiple failure modes earlier for the exponential distribution, let us discuss the same for the Weibull distribution. We know from the earlier discussion that the system failure rate is the summation of the failure rates of the failure modes. So, lambda t is equal to the summation of lambda i t, and if there are n failure modes, i goes from 1 to n. Failure modes and components in series have the same effect, so failure modes can be treated as series components.

$$\lambda(t) = \sum_{i=1}^{n} \lambda_i(t) = \sum_{i=1}^{n} \frac{\beta_i}{\theta_i}\left(\frac{t}{\theta_i}\right)^{\beta_i - 1}$$

So, here what is lambda i t? Lambda i t is beta i upon theta i into t upon theta i raised to the power beta i minus 1; the same formula is kept here. Now, we cannot convert this sum into the standard form, that is, beta upon theta into t upon theta raised to the power beta minus 1, or we can say a t to the power b.

We are not able to convert it because it is a summation. For example, take beta 1 upon theta 1 into t upon theta 1 raised to the power beta 1 minus 1, plus beta 2 upon theta 2 into t upon theta 2 raised to the power beta 2 minus 1. With different powers of t, I cannot convert this into the standard form.

So, that is why, when we sum these failure rates, the result may not follow the Weibull distribution. Even though the components follow the Weibull distribution and are in series, the system may not follow the Weibull distribution. In case of the exponential distribution this was true: if the components or the multiple failure modes have exponential distributions, the resultant distribution is also exponential. But the same is not true, in general, for the Weibull distribution.

But let us consider a special case, when all failure modes have the same shape parameter; that means beta is the same for all, all beta i equal to beta. Theta is different for different failure modes but beta is the same, so lambda t becomes the summation of beta upon theta i into t upon theta i raised to the power beta minus 1.

 −1 
n    t   −1
n 1
 (t ) =    = t  
i =1 
 i  i  i =1 
 i

So, beta and t do not depend on i. I can take beta outside the summation sign; similarly, t raised to the power beta minus 1 I can also take outside, and what remains inside is 1 upon theta i raised to the power beta. Now, this is similar to the approach we have taken earlier: the failure rate is beta into the summation of 1 upon theta i raised to the power beta, into t to the power beta minus 1. This I can say is a t to the power b, where a is beta into this summation.

We know the equivalent standard form, beta upon theta into t upon theta raised to the power beta minus 1, which is equal to beta upon theta raised to the power beta, into t raised to the power beta minus 1. So, if we compare the two, b is equal to beta minus 1 and a is equal to beta upon theta raised to the power beta.

So, here we are able to get the standard form, which is the Weibull failure rate form. If b is equal to beta minus 1, then beta is equal to b plus 1, so the system has the same beta value.

So, the shape parameter is the same beta, and since beta is known to me, I can get theta. From a equal to beta upon theta raised to the power beta, theta raised to the power beta is equal to beta upon a, and solving further, theta is equal to beta upon a, whole raised to the power 1 upon beta.

If I apply the same formula here, my a is beta into the summation, so beta upon a is 1 upon the summation, i equal to 1 to n, of 1 upon theta i raised to the power beta, because beta and beta get cancelled. This whole raised to the power 1 upon beta gives theta as the summation, i equal to 1 to n, of 1 upon theta i raised to the power beta, whole raised to the power minus 1 upon beta.

So, that means when beta is the same, the system also follows the Weibull distribution, with a shape parameter which is the same as the component shape parameter beta, and a characteristic life given by this formula: the summation, i equal to 1 to n, of 1 upon theta i raised to the power beta, whole raised to the power minus 1 upon beta. So, when components are in series or when there are multiple failure modes, the result is also a Weibull distribution if all the components which are following the Weibull distribution have the same beta; otherwise it will not be a Weibull distribution.
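A quick numerical check of the common-beta result can be sketched as follows. The helper names `weibull_hazard` and `series_theta` and the three scale parameters are illustrative assumptions, not values from the lecture; the check only confirms that summing the individual failure rates gives the same number as the single Weibull failure rate with the equivalent characteristic life.

```python
import math

def weibull_hazard(t, beta, theta):
    """Two-parameter Weibull failure rate (beta/theta) * (t/theta)**(beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)

def series_theta(thetas, beta):
    """Equivalent characteristic life for series failure modes sharing one beta:
    [sum over i of (1/theta_i)**beta] ** (-1/beta)."""
    return sum(th ** -beta for th in thetas) ** (-1.0 / beta)

# Illustrative (assumed) values: three failure modes with a common shape parameter.
beta = 2.0
thetas = [500.0, 800.0, 1200.0]
theta_sys = series_theta(thetas, beta)

t = 100.0
summed = sum(weibull_hazard(t, beta, th) for th in thetas)
equivalent = weibull_hazard(t, beta, theta_sys)

print(theta_sys, summed, equivalent)
```

If the shape parameters differ between modes, no single `theta_sys` makes the two quantities agree for every t, which is the point made in the lecture.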

(Refer Slide Time: 16:24)

There can be another special case, where all components are identical; that means theta i is equal to theta and beta i is equal to beta for every component. Then this becomes very simple: I have to sum the same term n times, the summation of beta upon theta into t upon theta raised to the power beta minus 1. So, this becomes n into beta upon theta into t upon theta raised to the power beta minus 1. If we see here, this again has the same shape parameter beta.

 −1  −1
n
   t     t 
 (t ) =    = n   
i =1         

Let us compare this with the standard form. Suppose I replace theta by theta divided by n raised to the power 1 upon beta. Then the failure rate is beta upon theta divided by n raised to the power 1 upon beta, into t upon theta divided by n raised to the power 1 upon beta, whole raised to the power beta minus 1.

Once we take the n terms up to the numerator, this becomes n raised to the power 1 upon beta into beta upon theta, into n raised to the power beta minus 1 upon beta, into t upon theta raised to the power beta minus 1.

Multiplying these, the power of n is 1 upon beta plus beta minus 1 upon beta, which is 1 plus beta minus 1, upon beta, that is, beta upon beta, which is 1. So we get n, which is the same as the n we had here.

So, what we are able to do is to convert this into the standard form, where theta is changed into theta divided by n raised to the power 1 upon beta and beta remains the same. So, the system follows the Weibull distribution with the same shape parameter beta and a characteristic life equal to theta upon n raised to the power 1 upon beta, where n is the number of failure modes or the number of components.
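The steps above can be collected in one line. This is only a restatement of the spoken derivation; the symbol θ_s for the system characteristic life is introduced here for convenience.

```latex
\lambda(t) = n\,\frac{\beta}{\theta}\left(\frac{t}{\theta}\right)^{\beta-1}
           = \frac{\beta}{\theta_s}\left(\frac{t}{\theta_s}\right)^{\beta-1},
\qquad \theta_s = \frac{\theta}{n^{1/\beta}},
\qquad \text{since } n^{1/\beta}\cdot n^{(\beta-1)/\beta} = n .
```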

(Refer Slide Time: 19:09)

Let us see an example. Let us say a jet engine has 5 modules, each following the Weibull distribution, and the shape parameter of all of them is 1.5. Their scale parameters are 3600, 7200, 5850, 4780 and 9300.

We want to find out the MTTF and the median time to failure. I have put this in an Excel sheet here so that we are able to see the calculation. For each module we calculate 1 upon theta raised to the power beta. For the first, that is 1 divided by 3600 raised to the power 1.5, since beta is 1.5. Similarly, we calculate the others: 1 divided by 7200, 1 divided by 5850, 1 divided by 4780 and 1 divided by 9300, each raised to the power 1.5. So, what we have got here is 1 upon theta i raised to the power beta.

(Refer Slide Time: 21:17)

This is from our previous result for the case when the thetas are not all equal: we need the summation of 1 upon theta i raised to the power beta. So, when we sum it up, this is equal to the sum of all these values.

(Refer Slide Time: 21:05)

Now we have got this sum. If we multiply it with beta into t raised to the power beta minus 1, we get lambda t. But what we want to calculate here is theta.

(Refer Slide Time: 22:31)

So, theta is this summation raised to the power minus 1 upon beta, that is, the summation whole to the power minus 1 divided by 1.5. So, my theta, or my characteristic life, comes out to be 1842.67.

(Refer Slide Time: 23:05)

My beta is the same, 1.5. Now, I want to calculate the MTTF. MTTF is equal to theta, which I have got, multiplied by the gamma function of 1 plus 1 divided by beta. This is a newer version of Excel, so I am directly having the gamma function here. My beta is 1.5. Once I do this, I get the MTTF, which is approximately 1663.47.

(Refer Slide Time: 23:55)

Similarly, I can get the median. The MTTF we have already calculated; now t median, or we can say t 0.5, is theta into minus ln of 0.5 whole raised to the power 1 divided by beta. My theta is this one and beta is 1.5, and we get 1443 cycles.

And the reliability function is exponential of minus t upon theta raised to the power beta; for any given t, we will get the reliability value. We have used Excel for the calculation because it is faster, but we can do it by calculator also.
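The spreadsheet steps for the jet engine example can also be written as a short Python sketch. It assumes the five scale parameters as used in the calculation (3600, 7200, 5850, 4780, 9300) and uses `math.gamma` in place of the Excel gamma function; the variable names are illustrative.

```python
import math

beta = 1.5
thetas = [3600, 7200, 5850, 4780, 9300]  # module scale parameters (cycles)

# System characteristic life: [sum of (1/theta_i)**beta] ** (-1/beta)
theta_sys = sum(th ** -beta for th in thetas) ** (-1 / beta)

# MTTF = theta * Gamma(1 + 1/beta); median = theta * (ln 2) ** (1/beta)
mttf = theta_sys * math.gamma(1 + 1 / beta)
t_median = theta_sys * math.log(2) ** (1 / beta)

print(f"theta = {theta_sys:.2f}, MTTF = {mttf:.2f}, median = {t_median:.1f}")
```

Note that minus ln of 0.5 is the same as ln 2, which is what the median line uses.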

(Refer Slide Time: 25:35)

Let us take another example. An electrical system has 4 connectors in series, each having a Weibull failure rate with beta equal to 3 by 4, that is 0.75, and theta given as 2000. I am having 4 connectors, so n is 4. What will be my reliability?

As we know, the reliability for one connector is e to the power minus t upon theta raised to the power beta. For 4 connectors in series the exponent gets multiplied by 4. So, the system reliability is exponential of minus 4 into t upon theta raised to the power beta, where t is 150 hours; that is 150 divided by 2000, raised to the power 0.75. This comes out to be 0.5637.

(Refer Slide Time: 27:28)

If we want to calculate the characteristic life of the system, we know that beta will remain the same because the connectors are all identical, and the system theta will be equal to theta divided by n to the power 1 upon beta, that is, theta multiplied by n raised to the power minus 1 divided by beta. So, our new theta for the system becomes 314.98.

(Refer Slide Time: 28:12)

So, if we want, we could have directly calculated the reliability as exponential of minus 150 divided by the system theta, whole raised to the power beta. We get the same reliability; either way gives the same result. So, effectively, we are able to calculate the new theta and beta for the system.
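Both routes to the connector reliability can be compared in a few lines. This is a sketch using the values from the example (beta of 0.75, theta of 2000, n of 4, t of 150 hours); the variable names are illustrative.

```python
import math

beta, theta, n, t = 0.75, 2000.0, 4, 150.0

# Route 1: multiply the single-connector exponent by n.
r_direct = math.exp(-n * (t / theta) ** beta)

# Route 2: use the equivalent system characteristic life theta / n**(1/beta).
theta_sys = theta / n ** (1 / beta)
r_equivalent = math.exp(-((t / theta_sys) ** beta))

print(f"theta_sys = {theta_sys:.2f}")
print(f"R(150) direct = {r_direct:.4f}, via theta_sys = {r_equivalent:.4f}")
```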

(Refer Slide Time: 28:48)

What we have discussed can also be extended to the 3 parameter Weibull. As we discussed for the 2 parameter exponential distribution, one additional parameter t0 is added here. What is t0? t0 is the time up to which we expect no failure. So, in this case we replace t with t minus t0; whatever we had with the 2 parameter Weibull, we just replace t with t minus t0, and this becomes the 3 parameter Weibull distribution.

Here the third parameter is called the location parameter, because if we change the value of t0 there will be no change in the shape and no change in the scale; the only change is that the distribution is shifted. If I increase the value of t0 it shifts towards the right; if I decrease the value of t0 it shifts towards the left.

So, this is a location parameter because it changes only the location of the PDF and makes no other change to the PDF. But it is also the guaranteed life, or we can say the life within which failure will not happen. These functions are defined for t greater than t0.

Now, the MTTF. As we saw for the exponential also, the reliability is 1 up to t0, so the area under the reliability curve gets an additional 1 into t0, that is t0. So, whatever MTTF we got with the 2 parameter distribution, if we simply add t0 to it, we get the MTTF for the 3 parameter distribution.

Similarly, for t median we have to add only t0, because the distribution is simply shifted by t0 towards the right, and for tR also we need only add t0 to whatever formula we had earlier for the design life. The variance does not change; it remains the same, because variability is not changed by the location.

(Refer Slide Time: 31:02)

For example, let beta be equal to 4, t0 be 100 and theta be 780. What is the MTTF? MTTF is 100 plus 780 into gamma of 1 plus 1 upon beta, and we get 806.99 hours. t median is t0 plus theta into minus ln of 0.5, which is 0.6931, whole raised to the power 1 upon beta. So, we get 100 plus 780 into 0.6931 to the power 1 upon 4, which is 811.7.

Sigma square is equal to theta square into gamma of 1 plus 2 upon beta minus the square of gamma of 1 plus 1 upon beta; the equal sign is missing on the slide. Once we calculate this and take the square root, we get 198. If I want to calculate the reliability for 500 hours, the reliability is e to the power minus t minus t0, that is 500 minus 100, divided by 780, whole raised to the power 4. It comes out to be 0.933.
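The three-parameter example can be checked with the following sketch, which uses the values from the example (beta of 4, theta of 780, t0 of 100) and Python's `math.gamma`. The variable names are illustrative.

```python
import math

beta, theta, t0 = 4.0, 780.0, 100.0

g1 = math.gamma(1 + 1 / beta)   # Gamma(1 + 1/beta)
g2 = math.gamma(1 + 2 / beta)   # Gamma(1 + 2/beta)

mttf = t0 + theta * g1                            # location shifts the mean by t0
t_median = t0 + theta * math.log(2) ** (1 / beta) # and the median by t0
sigma = theta * math.sqrt(g2 - g1 ** 2)           # spread does not depend on t0
r_500 = math.exp(-(((500 - t0) / theta) ** beta)) # valid because 500 > t0

print(f"MTTF = {mttf:.2f}, median = {t_median:.1f}, sigma = {sigma:.1f}, R(500) = {r_500:.3f}")
```

Only the mean, median and design life pick up the `t0` term; the standard deviation line has no `t0` in it, matching the remark that the variance is unchanged by location.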

(Refer Slide Time: 32:20)

So, we will stop our discussion here. A little more discussion is left for the Weibull distribution; we will continue with that and then start discussing other distributions like the normal and lognormal distributions. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 12
Weibull Distribution

(Refer Slide Time: 00:26)

Hello everyone. We have been discussing the Weibull distribution, and we have now moved to lecture number 12. We will discuss more about the three-parameter Weibull distribution as well as one more application of the Weibull distribution. We have discussed what the three-parameter distribution is. I have built an Excel sheet here, where I have already put up all the formulas we discussed for the three-parameter Weibull distribution, and I have included the third parameter, t0. So, if we take t0 equal to 10, what will the impact be?

I have started this calculation from t0 plus a slightly higher value, because at t equal to t0, that is 10, some of the values will be undefined. Now, if I change the location parameter, what will happen? If I make it, let us say, 20, the change is not visible. If I make it 100, or 90, you see that the values are changed.

So, the distribution is simply shifted; nothing much changes. If I make it 150, it only shifts towards the right and nothing else changes, because up to this value the reliability is equal to 1. Now, let us set t0 back to 20.

But let us look at a change in beta. If I change beta, the distribution becomes more and more peaked. There is another thing which you can see here. This is the small f t distribution; small f t is the PDF. What does the PDF tell us? It tells us where the failures are concentrated. If we have a full population, most of the failures are concentrated in this region, while some equipment still continues to work for a longer period. Now, if I make beta equal to 3, the curve becomes much sharper; the peakedness is increasing.

So, the failures are also concentrated in a smaller area. This is also visible if you look at the standard deviation. When I took beta equal to 2, the standard deviation was 463; when I made beta equal to 3, the standard deviation has become 324. That means the standard deviation has reduced, the variability has reduced, and the distribution has become more peaked.

Now, if I increase this further, let us say I make beta 6, then my standard deviation decreases further; it becomes only 180, and the distribution is very sharp. That means most of my failures are going to be in a smaller region. For a company this is desirable: a higher beta gives less variance, and because of the smaller standard deviation you get a larger failure-free period. If you see, up to almost 400 to 500 hours the chances of failure are very small.
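The trend in standard deviation with beta can be reproduced outside the spreadsheet. This is a minimal sketch; the function name `weibull_std` is hypothetical, and theta of 1000 is an assumption inferred from the quoted standard deviations (463, 324, 180) and from the later remark about taking 1000.

```python
import math

def weibull_std(beta, theta):
    """Standard deviation of a Weibull distribution; the location t0 does not affect it."""
    g1 = math.gamma(1 + 1 / beta)
    g2 = math.gamma(1 + 2 / beta)
    return theta * math.sqrt(g2 - g1 * g1)

theta = 1000.0  # assumed spreadsheet value
stds = {beta: weibull_std(beta, theta) for beta in (2, 3, 6)}

for beta, sd in stds.items():
    print(f"beta = {beta}: standard deviation = {sd:.1f}")
```

The values fall steadily as beta grows, which is the narrowing of the PDF described above.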

But when we move to this area, most of the failures occur, and very few units survive very long. So, the failures are concentrated in a smaller zone. That would be the desirable situation for the manufacturer, because it helps him understand where the failures are occurring. But this happens only when there is deterioration.

Once we know that deterioration kind of failures are there, we have much more certainty about the failure time. Now, if we simply change theta, say to 2000, the distribution stretches, and you can see where most of the failures are now concentrated.

The failures start to happen after a longer period, around 1000. When we took theta as 1000, most of the failures were happening somewhere around 600 onwards, and the probability of earlier failure was very small, up to around 300 or 400. When we made it 2000, the probability of failure was very small up to somewhere around 800; when we made it 3000, the probability of failure was small almost up to 1000 or 1100.

As we see here, a high characteristic life means that my failures start late when beta is high. When beta is small, it does not make much difference; the probability of early failure still exists and is comparable. But when beta is high, the chances of early failures are less, and most of the failures are concentrated in a smaller zone.

So, this gives us an understanding that if we know more about the system, if we know the degradation mechanism and are able to design the system better, then we expect beta to be larger, so that the variability in the failure time is less. That will help us decide our policies better, and our surprises will be fewer. But chance failures cannot be ignored; chance failures, which are due to random causes, may still happen, and they may further raise the f t value in the early period.

Because of them, early failures may be more. But if those can be removed, then you have a very small chance of failure up to a certain life period. In that case the MTTF is much more meaningful. If we look at it, it is 918; that is almost the centre of where the failures are happening when beta is large. But when beta is small, the MTTF does not have as much meaning because the variability is high. If we take beta as 1.5 and look at 1000, it is already well past the peak; the MTTF value lies far away from the peak.

So, the Weibull distribution can show you the various ways in which failures are happening, and it will help you to model various scenarios which occur in the field or in manufacturing. If you are able to understand this better, it will give you a lot of insight into your design process and your manufacturing processes.

(Refer Slide Time: 08:43)

As we discussed for the exponential distribution, we can put two components in parallel, or in a redundant fashion; that means if one component fails, the system will still continue to work. For system failure, both components should fail. So, we have parallel redundancy here; two out of the two components should fail for the system to fail.

We have already seen, when we discussed the CFR model, the constant failure rate model, that when we put two components in parallel the system reliability becomes R1 plus R2 minus R1 R2. We are assuming that both are identical and independent. Since both are identical, R1 t is equal to R2 t, which is equal to R t. So, this can be written as R t plus R t minus R t square.

So, this becomes 2 R t minus R t square. What is R t? R t is the single component reliability, which is e to the power minus t upon theta raised to the power beta. So, our system reliability in this case becomes 2 into e to the power minus t upon theta raised to the power beta, minus R t square. When I take the square of R t, the power gets multiplied by 2, so that is e to the power minus 2 into t upon theta raised to the power beta.

This second quantity I can write as e to the power minus, t into 2 raised to the power 1 upon beta divided by theta, whole raised to the power beta; or I can write it as e to the power minus, t upon theta divided by 2 to the power 1 upon beta, whole raised to the power beta. This is the same as what we did earlier in the case of multiple failure modes: our theta is changed to theta divided by 2 raised to the power 1 upon beta.

So, this becomes the standard form of the Weibull reliability formula, where beta is the same and theta has become theta divided by 2 raised to the power 1 upon beta. Now, we know that when R t is equal to e to the power minus t upon theta raised to the power beta, then my MTTF, which is equal to the integration from 0 to infinity of R t dt, comes out to be theta into the gamma function of 1 plus 1 upon beta.

If we apply the same thing here, then for the first term the 2 remains as it is, and the integration from 0 to infinity gives theta into the gamma function of 1 plus 1 upon beta. For the second term, theta is theta divided by 2 raised to the power 1 upon beta and beta is the same beta.

So, when I apply the formula, the second term becomes theta upon 2 raised to the power 1 upon beta, into the gamma function of 1 plus 1 upon beta. So, the MTTF of this system is 2 theta gamma of 1 plus 1 upon beta, minus theta upon 2 raised to the power 1 upon beta into gamma of 1 plus 1 upon beta.

Now, I can take theta into the gamma function of 1 plus 1 upon beta as common from both terms. Then what is remaining? 2 is remaining from the first term and 1 upon 2 raised to the power 1 upon beta is remaining from the second. So, the remaining factor is 2 minus 2 to the power minus 1 upon beta.
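The closed form obtained here, theta into gamma of 1 plus 1 upon beta, multiplied by 2 minus 2 to the power minus 1 upon beta, can be checked against a direct numerical integration of the system reliability. This is a sketch under assumed illustrative values (beta of 2, theta of 1000); the function names are hypothetical, and a plain trapezoidal rule is used so that no external library is needed.

```python
import math

def parallel_reliability(t, beta, theta):
    """Reliability of two identical, independent Weibull units in parallel: 2R - R^2."""
    r = math.exp(-((t / theta) ** beta))
    return 2 * r - r * r

def parallel_mttf(beta, theta):
    """Closed form: theta * Gamma(1 + 1/beta) * (2 - 2**(-1/beta))."""
    return theta * math.gamma(1 + 1 / beta) * (2 - 2 ** (-1 / beta))

beta, theta = 2.0, 1000.0  # illustrative values, not from the lecture

# Trapezoidal integration of R_s(t) from 0 to 10*theta; the tail beyond is negligible here.
step = 1.0
n_steps = int(10 * theta / step)
values = [parallel_reliability(i * step, beta, theta) for i in range(n_steps + 1)]
numeric_mttf = step * (sum(values) - 0.5 * (values[0] + values[-1]))

print(parallel_mttf(beta, theta), numeric_mttf)
```

For small beta the reliability curve has a long tail, so the upper limit of the integration would need to be much larger than 10 theta.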

So, the MTTF becomes theta into the gamma function of 1 plus 1 upon beta, multiplied by 2 minus 2 raised to the power minus 1 upon beta. The MTTF calculation is clear now. Let us say we want to calculate lambda s or fs. We know that f t is equal to minus dR t over dt, so differentiating the system reliability with respect to t, with a minus sign, will give me fs t. In the first term, the 2 remains as it is and we have to differentiate the exponential.

Differentiating the first exponential gives beta into t raised to the power beta minus 1 upon theta raised to the power beta, into e to the power minus t upon theta raised to the power beta. A minus sign also comes down from the exponent, but it is cancelled by the minus sign in minus dR over dt, so that is why I am not writing it; minus into minus becomes plus here.

For the second term, the minus sign which comes down is again compensated in the same way, so the sign of the term remains minus. Differentiating it produces 2 into beta t to the power beta minus 1 divided by theta raised to the power beta, into e to the power minus 2 into t upon theta raised to the power beta. In both terms, I can write the common factor as beta upon theta into t upon theta raised to the power beta minus 1.

If I take this as common, and also take e to the power minus t upon theta raised to the power beta as common, then what is remaining? From the first term, 2 is remaining. The second term has 2 into e to the power minus 2 into t upon theta raised to the power beta, which is 2 into the square of e to the power minus t upon theta raised to the power beta.

One e to the power minus t upon theta raised to the power beta has come out as common and one remains, so from the second term 2 into e to the power minus t upon theta raised to the power beta is remaining. This gives me fs t: fs t becomes beta upon theta into t upon theta raised to the power beta minus 1, into e to the power minus t upon theta raised to the power beta, into 2 minus 2 e to the power minus t upon theta raised to the power beta. The 2 in the second term can also be seen from the equivalent theta: 2 raised to the power 1 upon beta, multiplied by 2 raised to the power 1 upon beta whole raised to the power beta minus 1, gives 2.

So, this is my fs t. To get lambda t, we use lambda t equal to f t upon R t. My f t is already available here, and I have to divide it by R t. What is my R t here? R t is 2 e to the power minus t upon theta raised to the power beta, minus e to the power minus 2 into t upon theta raised to the power beta. If I take e to the power minus t upon theta raised to the power beta as common, then this is equal to e to the power minus t upon theta raised to the power beta, into 2 minus e to the power minus t upon theta raised to the power beta.

If you see here, the common exponential gets cancelled between the numerator and the denominator, and what is remaining is beta upon theta into t upon theta raised to the power beta minus 1, multiplied by 2 minus 2 e to the power minus t upon theta raised to the power beta, divided by 2 minus e to the power minus t upon theta raised to the power beta.

This is very similar to what we got for the constant failure rate. The only difference is that beta was 1 there; if we put beta equal to 1, this becomes lambda into 2 minus 2 e to the power minus lambda t, divided by 2 minus e to the power minus lambda t. There, when t tends to infinity, both exponential terms tend to 0, so the ratio becomes 2 upon 2 and the failure rate becomes lambda. As we discussed earlier, when two components are in parallel, then as time becomes large the system failure rate reaches the single component failure rate lambda.

The same is applicable here. When time is large, this exponential tends to 0 and this one also tends to 0, so the ratio becomes 2 upon 2, which is equal to 1, and what is remaining is beta upon theta into t upon theta raised to the power beta minus 1, which is the failure rate of one component.

So, when time tends to infinity, the failure rate of the system tends to the failure rate of one component. But, as we discussed earlier, just as components in parallel do not result in an exponential distribution, Weibull components in parallel do not result in a Weibull distribution.

So, if two Weibull components are in parallel, the system distribution is not Weibull; as you see here, it cannot be represented in the standard form of the Weibull distribution. And, as we discussed, for larger values of t the system failure rate approaches the failure rate of a single component.
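The limiting behaviour of the parallel failure rate can be seen numerically. The sketch below uses assumed illustrative values (beta of 2, theta of 1000) and hypothetical function names; it also cross-checks the derived expression against f_s upon R_s, with f_s obtained by a central finite difference of the system reliability.

```python
import math

def weibull_hazard(t, beta, theta):
    """Single-component Weibull failure rate."""
    return (beta / theta) * (t / theta) ** (beta - 1)

def parallel_reliability(t, beta, theta):
    r = math.exp(-((t / theta) ** beta))
    return 2 * r - r * r

def parallel_hazard(t, beta, theta):
    """Derived form: single-unit hazard times (2 - 2R) / (2 - R)."""
    r = math.exp(-((t / theta) ** beta))
    return weibull_hazard(t, beta, theta) * (2 - 2 * r) / (2 - r)

beta, theta = 2.0, 1000.0  # illustrative values, not from the lecture

# Cross-check at one time point: lambda_s = f_s / R_s with f_s = -dR_s/dt.
t, h = 800.0, 1e-3
f_numeric = -(parallel_reliability(t + h, beta, theta)
              - parallel_reliability(t - h, beta, theta)) / (2 * h)
hazard_numeric = f_numeric / parallel_reliability(t, beta, theta)

# Ratio of system hazard to single-component hazard at increasing times.
ratios = {tt: parallel_hazard(tt, beta, theta) / weibull_hazard(tt, beta, theta)
          for tt in (100.0, 1000.0, 5000.0)}
print(hazard_numeric, parallel_hazard(t, beta, theta), ratios)
```

The ratio is close to 0 early in life, when both units are almost certainly working, and moves towards 1 as time becomes large.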

(Refer Slide Time: 19:20)

Let us take one example. We have two fuel pumps, and both have a Weibull failure distribution with beta equal to 1 by 2 and theta equal to 1000 hours. That means the reliability of each pump is e to the power minus t upon 1000 raised to the power 1 by 2.

Now, we want to know the reliability for a 100-hour mission. We know that Rs t is equal to twice of e to the power minus t upon theta raised to the power beta, with beta 0.5, minus the square of this, that is, e to the power minus twice of t upon theta raised to the power beta. Once we apply this formula, we can calculate the 100-hour reliability.


(Refer Slide Time: 20:23)

Let us see how we can get this here. The first term is equal to 2 into exponential of minus t divided by theta, whole raised to the power beta, where t is 100, theta is 1000 and beta is 1 by 2, that is 0.5. The second term is equal to exponential of minus 2 into t divided by theta whole raised to the power beta, with beta 0.5. My reliability is the first term minus the second term.

So, my system reliability Rs comes out to be 0.9265. What is my MTTF? As we discussed earlier, for the first term the MTTF is equal to 2 into theta, which is 1000, multiplied by the gamma function of 1 plus 1 divided by beta, where beta is 0.5. For the second term, as we discussed earlier, theta is changed to theta divided by 2 raised to the power 1 divided by beta.

245
So, 0.1 by 0.5, 1 by 1 by 2 if I do this will become 2, multiply by gamma of 1 plus 1 divided
by 1 by 2 that will become 2 and answer would be equal to this minus this which comes out
to be 3500 hours. So, my MTTF also we are able to calculate this is my MTTF. As we know
if I, if I did not have this if I want to know for single components, for single component this
was exponential minus t was let us say 100 divided by 1000 whole raised to the power 0.5.
This was my single component reliability that is 0.7288 and MTTF for the same is 1000 into
gamma function of 1 plus 3, 1 plus 2, 1 plus 1 by 2 is 2 that comes out to be 2000.
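
The numbers in this fuel pump example can be checked with a short script. This is only a minimal sketch using the Python standard library; the function names are chosen here for illustration and are not from the lecture.

```python
import math

def weibull_reliability(t, theta, beta):
    """Reliability of a single Weibull component: exp(-(t/theta)**beta)."""
    return math.exp(-((t / theta) ** beta))

def parallel_pair_reliability(t, theta, beta):
    """Two identical components in parallel: Rs = 2R - R**2."""
    r = weibull_reliability(t, theta, beta)
    return 2 * r - r ** 2

def weibull_mttf(theta, beta):
    """MTTF of a single Weibull component: theta * Gamma(1 + 1/beta)."""
    return theta * math.gamma(1 + 1 / beta)

def parallel_pair_mttf(theta, beta):
    """MTTF of the parallel pair, integrating 2R - R**2 term by term.

    R**2 is again a Weibull reliability with scale theta / 2**(1/beta).
    """
    g = math.gamma(1 + 1 / beta)
    return 2 * theta * g - (theta / 2 ** (1 / beta)) * g

theta, beta, t = 1000.0, 0.5, 100.0
r_single = weibull_reliability(t, theta, beta)
r_system = parallel_pair_reliability(t, theta, beta)
mttf_single = weibull_mttf(theta, beta)
mttf_system = parallel_pair_mttf(theta, beta)

print(f"R single = {r_single:.4f}, R system = {r_system:.4f}")
print(f"MTTF single = {mttf_single:.0f} h, MTTF system = {mttf_system:.0f} h")
```
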

So, if we see here, by having two components the MTTF has changed from 2000 to 3500. In
case of CFR the ratio is 1.5, which means that with an MTTF of 2000 it would have been 3000.
But because beta is 0.5 here, the improvement is larger: it is almost double, 2000 has
become 3500, while the reliability has changed from 0.72 to 0.92 when we use the components
in parallel. Let us investigate the same thing for another value; let us hypothetically
assume beta equal to 2.

So, in that case let us see how it impacts. Rather than 0.5, wherever I have taken beta I
will take 2, and I get the reliability and the MTTF. Since beta is 2, 1 by beta becomes 0.5,
so I will write 0.5 here. If you see, here the ratio of system MTTF to component MTTF is
1.29, but when beta was 0.5 the ratio was 3500 divided by 2000, which is 1.75.

So, depending on the value of beta you may have a different improvement. When beta was less
than 1, the improvement in MTTF because of the redundancy was more, 1.75 times, but when
beta became 2 my improvement has become only 1.29 times. So, for different values of beta
we may have a different improvement in MTTF when we are having redundancy in the
components.

If I write the beta value separately, it will be much easier to investigate how this happens.
Let us say beta is 2 and theta is 1000, and I will also write the time, which is 100 here. Now
I will write the standard formula: the first term is 2 into exponential of minus (time divided
by theta) raised to the power beta, and the second term is exponential of minus 2 into (time
divided by theta) raised to the power beta.

The MTTF becomes 2 into theta into gamma of 1 plus 1 upon beta, minus theta divided by 2
raised to the power 1 by beta, multiplied by gamma of 1 plus 1 divided by beta. Dividing this
by the single-component MTTF, theta into gamma of 1 plus 1 upon beta, we can take the ratio,
which is 2 minus 2 raised to the power minus 1 upon beta. Let me just check this formula once
again: in the MTTF we have theta divided by 2 raised to the power 1 upon beta, so when I take
the division the minus sign is adjusted in the exponent. So, when beta is 0.5 the ratio is
1.75, and when I take beta equal to 1 it is 1.5.

So, as we see here, when we put beta as 0.5 the ratio is 1.75. When beta is around 1.5, I
will have about 1.39; making it 2 gives 1.29; then I am making it 4, and the ratio reduces
further. As beta increases, the effect of redundancy on the improvement in MTTF is not as
much as it was for the lower values of beta. You can try this exercise once again to see how
the change of beta changes the improvement in MTTF; the reliability improvement can also
be seen here. And this will give you an idea that whether adding redundancy is effective or
not can also depend on the value of beta.
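
The dependence of the MTTF improvement on beta can be tabulated in a few lines. This is a sketch under the assumption of two identical Weibull components in parallel, for which the gamma term and theta cancel in the ratio; `mttf_ratio` is an illustrative name.

```python
def mttf_ratio(beta):
    """System MTTF / component MTTF for two identical parallel Weibull units.

    Gamma(1 + 1/beta) and theta cancel, leaving 2 - 2**(-1/beta).
    """
    return 2 - 2 ** (-1 / beta)

# Larger beta gives a smaller gain from redundancy.
for beta in (0.5, 1.0, 2.0, 4.0):
    print(f"beta = {beta}: MTTF ratio = {mttf_ratio(beta):.2f}")
```

Because theta does not appear in the ratio, changing theta scales both MTTFs in the same proportion.
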

So, you can do a more intensive exercise by taking different values of beta, and different
thetas also. Theta will not change the MTTF ratio, because both MTTFs scale in the same
proportion with theta. But you can also look into how the reliability ratios change. All this
you can do, and more exercises will be given to you as an assignment. So thank you. Next
time we will continue our discussion with the normal distribution. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 13
Normal Distribution

Hello everyone. Now, we will start our lecture number 13 which is about Normal
Distribution. So, in previous lecture we discussed about Weibull distribution which is a time
dependent failure rate distribution. As we learned, when the shape parameter of the Weibull
distribution becomes greater than 3, it tends to resemble a normal distribution. The normal
distribution is an increasing failure rate distribution where the failure rate is also time-
dependent. So, today we will delve into the topic of normal distribution and discuss its
characteristics in detail.

(Refer Slide Time: 01:04)

The normal distribution, as we know, is very widely used, especially in quality. In quality,
almost all the formulas are driven by the inherent assumption that the distribution is
normal. Whoever has gone through standard statistics, estimation, etcetera, is very well
versed with the normal distribution.

So, the inherent assumption most of the time is the normal distribution. For a reliability
application, a normal distribution can represent the stress distribution. It can represent
the strength distribution, the wear distribution, the fatigue distribution, and many other
cases. So, the normal distribution is used in reliability theory also, though maybe not as
widely as the Weibull distribution for time-to-failure data.

But apart from time to failure, other parameters generally follow the normal distribution. The
PDF of the normal distribution is bell shaped; whenever someone is asked to draw a
distribution, they will by default draw it like this. Why? Because the normal distribution is
such a popular distribution. It is symmetrical: whatever the value is at a given distance on
the right-hand side of the mean, we will have the same value at the same distance on the
left-hand side.

So, the normal distribution is symmetrical around the mean. This is the mean, and this
inflection point is mu minus sigma. Similarly, this inflection point is mu plus sigma. The
PDF, the probability density function, is given by small f(t), that is 1 upon (under root
2 pi, into sigma) into e to the power minus half of (t minus mu divided by sigma) whole
square, where t is defined for all values from minus infinity to plus infinity.

$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}\right], \quad -\infty < t < \infty$$

Here, t is the random variable, which can take both positive and negative values. This
distribution has two parameters, mu and sigma. By its nature, mu is the mean and sigma is
the standard deviation, so sigma square becomes the variance. If we want to calculate the
CDF for this, that is F(t), then F(t) is the integral from minus infinity to t of f(x) dx.

For small f(x) we have to substitute this expression; the d x is missing on the slide, so this
is d x. When we integrate this, we will get the CDF. If I try to draw it here, then at the
mean it will have the value 0.5, and it will look something like this; you will see more of
this a little later. Now, unlike the exponential distribution and the Weibull distribution,
where we could integrate small f(t) to get capital F(t) and capital R(t) in closed form, that
is not feasible here; we are not able to integrate this analytically.

So, how do we get the F(t) value then? To get the capital F(t) value there is a
transformation: we can transform the normal distribution into a standard normal
distribution. What is the difference between the normal distribution and the standard normal
distribution?

The standard normal distribution does not have any parameter. This is our standard normal
PDF. If you see here, there is no parameter; it is directly a function of Z, and nothing in it
is unknown. But for the normal distribution, there are two parameters, mu and sigma, which
need to be known for the distribution to be defined. How can we convert this into the
standard normal distribution?

(Refer Slide Time: 05:06)

To convert this into the standard normal distribution, we consider that Z is equal to t minus
mu divided by sigma (not sigma square). When we put Z equal to this, the derivative d z over
d t will be equal to 1 upon sigma.

So, what happens is that the sigma which we had is removed, because once we divide by this
1 upon sigma it is gone, and t minus mu upon sigma becomes Z. So, this becomes 1 upon under
root 2 pi into e to the power minus half Z square. This is because of the transformation of
variables.

So, we can remember that the standard normal density, small phi of Z, is given as 1 upon
under root 2 pi into e to the power minus half Z square. It does not have any parameter; that
is why it is a standard. Since this is a standard, we can take different values of Z and get
the values of small phi of Z.

But our concern is not to get small phi of Z; our concern is to get capital phi of Z. Capital
phi of Z is the integral of this small phi. Now, this is also not integrable directly. So,
numerical methods and approximation methods are used to give the values of capital phi of Z,
which is the CDF.

But since this is a standard and does not have any parameter, for different values of Z I can
get the values of capital phi of Z, which I can also call capital F of Z, the CDF. Standard
tables are available for this; whenever we have the statistical tables, we will see this kind
of table.

(Refer Slide Time: 07:14)

You see, this is the standard normal table, where the Z values are here and these are the CDF
values. If I am interested in the value of the CDF for minus 3, then I take minus 3.0, and
0.00135 is the value. If I am interested to know it for minus 3.02, then this will be my
value, 0.00126.

So, with this standard table, because there is no unknown parameter, for different values of
Z I can get the values of capital phi of Z. I can also do the same thing using Excel, and I
will be using Excel for this purpose in this lecture. You can use Excel, or if you do not
have access to Excel, you can use the standard normal tables.
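
As an alternative to the table or Excel, the standard normal CDF can be evaluated through the error function. This is a minimal sketch using Python's `math.erf`; it is not the method used in the lecture, and the function names are illustrative.

```python
import math

def std_normal_pdf(z):
    """Small phi(z): standard normal density, no parameters."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def std_normal_cdf(z):
    """Capital Phi(z), written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def normal_cdf(t, mu, sigma):
    """F(t) for a normal distribution via the transformation z = (t - mu) / sigma."""
    return std_normal_cdf((t - mu) / sigma)

# Reproduce the two table lookups discussed above.
for z in (-3.0, -3.02):
    print(f"Phi({z}) = {std_normal_cdf(z):.5f}")
```
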

Now, how do we get the CDF? We will replace t minus mu divided by sigma by Z, and that
will give us the value required for the CDF. And we know the CDF is the unreliability if this
random variable is time to failure. So, reliability is equal to 1 minus F(t), or I can say
this is equal to 1 minus capital phi of Z.

$$\lambda(t) = \frac{f(t)}{R(t)} = \frac{f(t)}{1 - \Phi\left(\dfrac{t-\mu}{\sigma}\right)}$$

Here Z is equal to t minus mu divided by sigma. Similarly, we know the failure rate formula:
the failure rate is given as small f(t) upon R(t). Small f(t) is given here and R(t) is
1 minus capital phi of Z. So, we can use these formulas to get the failure rate for the
normal distribution.
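
The failure rate formula can be evaluated numerically as follows. This is a sketch using `statistics.NormalDist` from the Python standard library; the values mu = 1 and sigma = 0.2 are only illustrative.

```python
from statistics import NormalDist

def normal_failure_rate(t, mu, sigma):
    """lambda(t) = f(t) / R(t), with R(t) = 1 - Phi((t - mu) / sigma)."""
    dist = NormalDist(mu, sigma)
    reliability = 1 - dist.cdf(t)
    return dist.pdf(t) / reliability

mu, sigma = 1.0, 0.2  # illustrative parameter values
times = [0.6, 0.8, 1.0, 1.2, 1.4]
rates = [normal_failure_rate(t, mu, sigma) for t in times]

# The failure rate grows with time: an increasing failure rate distribution.
for t, rate in zip(times, rates):
    print(f"t = {t}: failure rate = {rate:.4f}")
```
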

(Refer Slide Time: 09:11)

So, I have done this exercise in Excel. In Excel, if you see, I have put up all the values
here for demonstration purposes. I have two parameters here, mu and sigma. Now, let us see
how the curve changes with them. If we look at small f(t), it contains t minus mu. So, here
mu is the location parameter.

Because if I change mu, f(t) will simply shift left or right: if I increase mu it will shift
right, and if I decrease mu it will shift left. And sigma is the scale parameter here. If,
let us say, I make sigma 10 times, then what will happen? The exponent takes the same value
if I take t also 10 times and mu also 10 times, so the curve keeps its shape on a horizontal
axis stretched 10 times. So, this has the impact of changing the scale.

So, if I increase sigma the curve will stretch, and if I decrease it the curve will be
contracted. That is why the peakedness will be more if sigma is less, and the curve will have
a larger width if sigma is more. The same thing we can see from here. Let us first see how
sigma changes it. Right now, I have taken sigma equal to 0.2.

(Refer Slide Time: 10:49)

If I take sigma equal to, let us say, 0.5, let us see what will happen to small f(t). You
see, small f(t) has become wider. If I then take sigma as 0.1, it becomes narrower and
sharper. Generally, we desire sigma to be small for any distribution. Why? Because the lesser
the variability, the lesser the uncertainty, and the easier it is for us to make a decision.

More uncertainty means there is more variability, and because of more variability it is
difficult to decide what the right value is, and we have to consider the full range of
possibilities. In the same way, if I change mu, the curve should shift.

(Refer Slide Time: 11:44)

So, let us say I change mu from 1 to 1.2. If I make it 1.2, see, the mean is just shifted to
the right. If I make it 0.8, then it will be shifted left. So, mu acts as a location
parameter and sigma acts as a scale parameter. And if I increase the value of sigma, the
width of the distribution will be more.
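
The role of mu and sigma can also be checked without a plot. The following is a small sketch with `statistics.NormalDist`, using the same parameter values as the spreadsheet demonstration; `peak_height` is an illustrative helper.

```python
from statistics import NormalDist

def peak_height(sigma):
    """PDF value at the mean, 1 / (sigma * sqrt(2*pi)); it does not depend on mu."""
    return NormalDist(0.0, sigma).pdf(0.0)

# Smaller sigma gives a taller, narrower curve; larger sigma a flatter, wider one.
for sigma in (0.1, 0.2, 0.5):
    print(f"sigma = {sigma}: peak height = {peak_height(sigma):.3f}")

# Changing mu only shifts the curve: the same height is found 0.2 further right.
base = NormalDist(1.0, 0.2)
shifted = NormalDist(1.2, 0.2)
print(base.pdf(1.1), shifted.pdf(1.3))
```
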

(Refer Slide Time: 12:19)

If you see here, this is the failure rate function, which is increasing: whatever
combinations of parameters I have taken, it is always increasing. If I take 0.1, or if I take
this as, let us say, 2, it shows either a similar or an increasing trend; it generally does
not decrease. So, as we can see, the normal distribution is a symmetrical distribution around
the mean, and it has these properties.

Let us take an example. For the normal distribution a lot of literature is available, so if
you want to know more about it, you can study more. Here we are not discussing it in much
detail; we are just looking at how we can use it for calculation purposes.

(Refer Slide Time: 13:30)

So, let us take one example where there is wear-out. Wear-out means something like a tire:
over time it keeps losing material, it keeps wearing out. Here we have an oil drilling bit.
Like the mechanical tools which we use for drilling etcetera, over time it will lose
material.

So, it will be continuously wearing out as we use it. Let us say this wear-out is normally
distributed with a mean of 120 drilling hours and a standard deviation of 14 hours. So, mu is
equal to 120 hours and sigma is equal to 14 hours. Now, drilling occurs 12 hours each day.

So, in one day it works only 12 hours. How many days should drilling continue before the
operation is stopped in order to replace the drill bit, if a 95 percent reliability is
desired? What does it mean? We have to decide an interval here: if I start using the bit from
time t equal to 0, how long should I work so that I am getting 95 percent reliability? I am
taking a 5 percent chance that the drill bit may fail, but with 95 percent chance the drill
bit will not wear out during this period.

$$\Pr\left\{ z \ge \frac{t_{0.95} - 120}{14} \right\} = 1 - \Phi\left(\frac{t_{0.95} - 120}{14}\right) = 0.95$$

So, I want to know the life for which reliability is 0.95. Reliability is equal to 0.95, and
that is equal to 1 minus capital phi of (t minus mu divided by sigma). Now, I want to know
what the value of t is for this. So, I can solve this in a reverse manner: capital phi of
(t 0.95 minus mu divided by sigma) is equal to 1 minus 0.95. That means my failure
probability is 0.05.

Now, here mu is in hours and sigma is also in operating hours, so whatever result we get for
t will also be in hours. So, I can write: t 0.95 minus mu, divided by sigma, is equal to phi
inverse of 0.05. Phi inverse is the inverse of the standard normal CDF. This I can get in
multiple ways.

I will show you how we can get this value. Let us say for now that it turns out to be minus
1.645. So, t 0.95 minus mu, divided by sigma, is equal to this value, and t 0.95 would be
equal to mu minus 1.645 into sigma.

$$\frac{t_{0.95} - 120}{14} = -1.645, \quad \text{or} \quad t_{0.95} = 96.97\ \text{hr}$$

So, once we supply the values here, this comes out to be 96.97 hours. Now, this is in hours
and I want to convert it into days, so I have to divide it by 12. That comes out to be
approximately 8 days. Now, let us first see how we can do this calculation using the Excel
sheet.

(Refer Slide Time: 17:42)

So, I have put up an Excel sheet here. The mu value here is 120, and sigma, or SD, the
standard deviation, is 14. As we discussed earlier, here we want to know phi inverse for
0.05. So, the Z value will be equal to norm inverse; phi inverse means norm inverse, the
NORMINV function. We will use norm inverse of the probability 0.05. And what would be the
mean?

This function asks for the mean; generally, it gives the value of t directly. If we want to
solve with the standard normal distribution, then we first take zero mean: the standard
normal distribution has 0 mean and sigma as unity. If you replace mu by 0 and sigma by 1, the
normal distribution becomes the standard normal distribution. So, let us do this. Now, you
see that we got the same value, minus 1.645.

Now, since we have got this Z, I want to calculate t. As we discussed, t will be equal to Z
multiplied by sigma plus mu, which turns out to be 96.97 hours. I want to get this in days,
so that will be equal to this divided by 12. That is approximately 8 days.

We have done this in two steps, by first getting the standard normal value and then the
normal value. We could have also got it directly. How? By norm inverse of 0.05 with this mu
and this sigma, which gives us the value directly, 96.972 hours. This is how we can solve
this problem using Excel.
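
For readers without Excel, the same two routes can be sketched in Python. Here `statistics.NormalDist.inv_cdf` stands in for the norm inverse function; this substitution and the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

mu, sigma = 120.0, 14.0      # wear-out life of the drill bit, in drilling hours
hours_per_day = 12.0
target_reliability = 0.95

# Two-step route: standard normal quantile first, then t = mu + z * sigma.
z = NormalDist().inv_cdf(1 - target_reliability)
t_two_step = mu + z * sigma

# Direct route: quantile of the normal distribution with the given mu and sigma.
t_direct = NormalDist(mu, sigma).inv_cdf(1 - target_reliability)

days = t_direct / hours_per_day
print(f"z = {z:.3f}, t = {t_direct:.2f} h, about {days:.1f} days")
```
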

(Refer Slide Time: 20:22)

Now, let us see if we can use the standard normal table also. If we look at the table, our
interest is to get phi inverse for 0.05, that is, the Z for which the capital phi value is
0.05. So, let us search here for the value which comes closest to 0.05. It is coming
somewhere here, between these two entries. So, what is the Z value here?

This value is in the row minus 1.6, and the column here is 4. So, I can say it is somewhere
in between minus 1.64 and minus 1.65, and I can take minus 1.645, which is the same value we
got earlier. This table gives us only the Z value. Once we get this Z value, we can get the
value of t by multiplying the Z value with sigma and adding the mu value to it; that will
give us this 96.97 hours. Let us continue our discussion with the next example.

(Refer Slide Time: 22:06)

Let us say we have another problem, where 5 percent of a certain grade of tires wears out
before 25,000 miles. That means within 25,000 miles we have a wear-out of 5 percent, and
another 5 percent of the tires exceed 35,000 miles. So, 5 percent is below 25,000 and
5 percent is above 35,000.

Now, we want to know the tire reliability at 24,000 miles, if wear-out is normally
distributed. Let us first see in graphical form what this says. If we draw f(t), then below
25,000 miles the area under the curve is 5 percent.

Similarly, above 35,000 miles the area is 5 percent. We know that this distribution is
symmetrical around the mean, with 5 percent on this side below 25,000 and 5 percent on that
side above 35,000. So, how much will the mean be? We can get it directly: mu is the middle
point of these two.

Because, if you see here, these two points are at exactly the same distance from the mean.
So, the middle point is 25,000 plus 35,000 divided by 2, and we get 30,000. But we can solve
it in another way also. Let us say Z 1 corresponding to this is equal to 25,000 minus mu
divided by sigma, and Z 2 is equal to 35,000 minus mu divided by sigma.

And we know that the probability that Z is less than or equal to Z 1 is equal to 0.05, and
the probability that Z is greater than Z 2 is equal to 0.05. This we can also write as
1 minus the probability that Z is less than or equal to Z 2, and that is equal to 0.05.

So, the probability that Z is less than or equal to Z 2 will be equal to 1 minus 0.05, that
is 0.95. Now, if we look at the table, we have already seen that for 0.05 the value was minus
1.645. Similarly, for the value 0.95, because of the symmetry, this is plus 1.645. So, now we
know Z 1 and Z 2.

So, here we can say that 25,000 minus mu divided by sigma is minus 1.645, and 35,000 minus
mu divided by sigma is plus 1.645. If we solve these, we can get the values easily. We can
write the first equation as 25,000 minus mu is equal to minus 1.645 sigma, and the second
equation as 35,000 minus mu is equal to 1.645 sigma. If I subtract the first equation from
the second, then what will happen?

On the left, 35,000 minus 25,000 becomes 10,000 and mu minus mu becomes 0; on the right,
this minus this will be 2 into 1.645 into sigma. So, I get sigma equal to 10,000 divided by
(2 into 1.645), that is 3039.5. And similarly, if I want to get mu, I will add the two
equations.

So, that will become 60,000 minus 2 mu equal to 0. So, 2 mu is equal to 60,000 and mu is
equal to 30,000. So, I am able to get mu and I am able to get sigma also. Once I get these,
my parameters are known, and since I know the parameters mu and sigma, I can calculate any
probability I want.

The probability which I want to know is the reliability at 24,000 miles. Reliability is
1 minus the CDF, that is 1 minus capital phi of Z at 24,000 miles. So, Z is 24,000 minus mu
divided by sigma, that is 24,000 minus 30,000 divided by 3039.5. This value comes out to be
minus 1.97, which I will look up in the table.

(Refer Slide Time: 28:25)

For minus 1.97, I look in the table at the row minus 1.9 and count across the columns 0, 1,
2, 3, 4, 5, 6, 7 to get 0.02442. So, 0.02442 comes out to be the value of capital phi. And if
I subtract this value from 1, the value which I will get is approximately 0.9756.

(Refer Slide Time: 29:00)

So, we can use the table, or we can use Excel directly to get these values. The mu which I
have calculated is 30,000 and the standard deviation which I have got is 3039.5. So, I can
get R at 24,000 miles as 1 minus norm distribution; I will use the norm distribution
function, NORMDIST, here.

Norm distribution is for x, where x is 24,000, the mean is 30,000 and sigma is this. And I
want the cumulative value, so I will use true here. If I do this I get the probability
0.97581 directly. I do not have to first use the standard normal and then convert into the
normal; I can calculate it directly using this Excel sheet.
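
The whole tire example, from the two percentile conditions to the reliability at 24,000 miles, can be sketched as below. It uses `statistics.NormalDist` in place of Excel and the exact 95th-percentile value rather than the rounded 1.645, so sigma differs slightly from 3039.5; the variable names are illustrative.

```python
from statistics import NormalDist

low, high = 25_000.0, 35_000.0   # 5 % of tires fail below low, 5 % survive beyond high

z95 = NormalDist().inv_cdf(0.95)  # about 1.645; by symmetry the 5 % point is -z95

# Adding the two equations gives mu; subtracting them gives sigma.
mu = (low + high) / 2
sigma = (high - low) / (2 * z95)

# Reliability at 24,000 miles: R = 1 - Phi((24000 - mu) / sigma).
reliability_24k = 1 - NormalDist(mu, sigma).cdf(24_000.0)

print(f"mu = {mu:.0f} miles, sigma = {sigma:.1f} miles")
print(f"R(24,000 miles) = {reliability_24k:.4f}")
```
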

(Refer Slide Time: 30:31)

There is an important theorem which is used with respect to the normal distribution. Suppose
we have different independent random variables x1, x2, up to xn, and we take the summation of
those random variables as y. These may have different distributions, and they may not follow
the normal distribution.

If they all follow the normal distribution, then for any number n, whether 5 or 10, the sum
will always be normally distributed. But the central limit theorem states that, irrespective
of the distributions of x1, x2, up to xn, if n is very high, approaching infinity, then the
summation of these random variables will approximately follow the normal distribution.

We have two parameters for the normal distribution, mean and standard deviation. The mean of
this output random variable is nothing but the summation of the means of the random variables
x1 to xn, and the variance of this output random variable is nothing but the summation of the
variances of the individual random variables.

So, let us say these are following the exponential distribution; then the mean of each is
1 upon lambda. If all are the same, then mu will be equal to n upon lambda, and because the
variance of each is 1 upon lambda square, the variance will be n upon lambda square.
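
The mean and variance of a sum of identical exponential variables can be checked by simulation. This is only a sketch: the values n = 30, lambda = 2, the number of trials and the seed are arbitrary choices made here, not from the lecture.

```python
import random
import statistics

def simulate_sums(n, lam, trials, seed=0):
    """Draw `trials` sums of n independent exponential(lam) random variables."""
    rng = random.Random(seed)
    return [sum(rng.expovariate(lam) for _ in range(n)) for _ in range(trials)]

n, lam = 30, 2.0  # illustrative values
sums = simulate_sums(n, lam, trials=5000)

sample_mean = statistics.fmean(sums)
sample_var = statistics.variance(sums)
theory_mean = n / lam          # sum of the individual means, n / lambda
theory_var = n / lam ** 2      # sum of the individual variances, n / lambda**2

print(f"mean: simulated {sample_mean:.2f}, theory {theory_mean:.2f}")
print(f"variance: simulated {sample_var:.2f}, theory {theory_var:.2f}")
```
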

For any other distribution, whatever the means and variances are, we put them here to
calculate the mean and variance of the resultant random variable. Now, since we know the
mean and variance of the resultant random variable, we know its distribution, and we can use
it to get any quantity of interest. We will stop here today, and we will continue our
discussion with the next distribution, which is the lognormal distribution. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 14
Lognormal Distribution
Hello everyone, in previous lecture, we discussed about normal distribution. Now, we will be
discussing about lognormal distribution in this lecture. So, we have a brief discussion about
lognormal distribution here.

(Refer Slide Time: 0:42)

So, the lognormal distribution is a kind of special form of the normal distribution: if the
random variable X follows a lognormal distribution, then ln of X follows the normal
distribution. As we said, if X follows a normal distribution, then X minus mu divided by
sigma follows the standard normal distribution. In the same way, if X follows the lognormal
distribution, then ln of X follows the normal distribution, or we can say that ln of X minus
mu, divided by sigma, follows the standard normal distribution.

So, we are able to do the same thing that we did with the normal distribution. Here X is
changed to ln of X. In the earlier case, we had under root 2 pi into sigma in the
denominator; here we have an additional factor t, because when we differentiate ln of t we
get 1 upon t. So, this 1 upon t also comes here. Regarding the exponential part, that is
minus half of this term: if you see, this is nothing but ln of (t divided by t median),
divided by S, whole square. This can also be written as ln of t minus ln of t median, divided
by S, whole square.

If I want, I can also write it in another form, that is minus half of (ln of t minus mu
dash, divided by S) whole square. So, sometimes you will find this distribution represented
in this format, that is 1 upon (under root 2 pi, into S, into t) into e to the power minus
half of this term. This form is also used many times, and many times we use t median
directly. Whether we say mu dash or ln of t median, both are the same.

So, why do we use t median? Because if I put t equal to t median here, then this becomes ln
of 1, and ln of 1 is 0. That means Z is 0, and F(t) at Z equal to 0, as we know from the
normal distribution, comes out to be 0.5. Since the value of t at which the CDF becomes 0.5
is the median, this value is called t median.

(Refer Slide Time: 4:13)

So, we can use the median value directly here, or we can use the mu form also. As we
discussed earlier, I can take this part, that is ln of (t divided by t median), whole divided
by S, and call it Z. Once I do this, I can convert the lognormal distribution into the
standard normal distribution, and I can use the standard normal tables to get the CDF values,
or vice versa, get the values of Z for given CDF values. In the normal distribution, we had
the mean as mu and the standard deviation as sigma. But here the relationship is not X minus
mu; it is ln of t minus ln of t median.

Here we can say ln of t median is the mu. So, this mu is not exactly the mean here. For this
distribution the mean is given by this formula, that is t median into e to the power S square
by 2. If I am using mu, then t median will be equal to exponential of mu. So, I can say
t median is e to the power mu if I am using this other format, where Z is equal to ln of t
minus mu, divided by S.

Similarly, the variance for this is given as t median square into e to the power S square,
multiplied by (e to the power S square minus 1). And regarding the mode: for a normal
distribution the median, mode and mean are all the same, because it is a symmetric
distribution; the left side and the right side mirror each other, and neither side has a
longer tail. Around the mean, both sides have the same shape.

So, because of that, the mean, median and mode are all equal to mu there, but the same is not
true for the lognormal distribution. If we look at this lognormal distribution a little more
deeply, the PDF is 1 upon (under root 2 pi, into S, into t) into e to the power minus half of
(ln of t minus ln of t median, divided by S) whole square, as we see here. Since this is ln,
the term which I am taking here becomes ln of (t divided by t median).

$$\text{PDF:}\quad f(t) = \frac{1}{\sqrt{2\pi}\, s\, t} \exp\left[-\frac{1}{2s^{2}}\left(\ln\frac{t}{t_{med}}\right)^{2}\right], \quad t \ge 0$$

So, this t median is not a location parameter here; t median becomes the scale parameter. That means, if I change t median, I have to change t by the same factor: if I make t median 10 times larger, then I have to multiply t by 10 to get the same values. And this term is divided by S. So, effectively, if I take the 1 upon S inside the ln function, this becomes t divided by t median raised to the power 1 upon S. If I write it like this, then S becomes the shape parameter.

So, if I change S, the shape of the distribution will change, because it is acting as a power of t. That is why the lognormal distribution also represents a family of distributions: it can take different shapes. The normal distribution always has the same bell-shaped curve, but the lognormal can have various shapes, like the Weibull distribution. Because of that, the lognormal distribution also fits a variety of data.
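
The scale and shape roles of t median and S can be checked numerically. The following is a minimal Python sketch, not part of the lecture; the function names `lognormal_pdf` and `lognormal_cdf` are illustrative, and the numbers are chosen only for demonstration.

```python
import math
from statistics import NormalDist

def lognormal_pdf(t, t_med, s):
    """Lognormal PDF with median t_med (scale) and shape s, for t > 0."""
    z = math.log(t / t_med) / s
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * s * t)

def lognormal_cdf(t, t_med, s):
    """Lognormal CDF: standard normal CDF evaluated at ln(t / t_med) / s."""
    return NormalDist().cdf(math.log(t / t_med) / s)

# Scale: multiplying both t and t_med by 10 leaves the CDF unchanged,
# while the PDF is divided by 10 because of the 1/t factor.
print(lognormal_cdf(3000, 5000, 0.2), lognormal_cdf(30000, 50000, 0.2))
print(lognormal_pdf(3000, 5000, 0.2), 10 * lognormal_pdf(30000, 50000, 0.2))

# Shape: changing s changes the form of the curve, not just its stretch.
for s in (0.1, 0.5, 1.0, 2.0):
    print(s, [round(lognormal_pdf(t, 1.0, s), 4) for t in (0.5, 1.0, 2.0)])
```

Each pair of printed values in the first two lines should agree, which is what "t median is the scale parameter" means in practice.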

Further, the lognormal distribution is defined only for values of the random variable greater than or equal to 0, because we know that ln of a negative value does not exist. The minimum value of X can be 0, and for that ln goes to minus infinity. We cannot take the log of negative values, so this distribution is not valid for random variables that take negative values. That makes it very well suited for reliability applications. It is very widely used for repair distributions: time to repair, TTR, many times follows the lognormal distribution.

Even time-to-failure data, when you try to fit it with the lognormal distribution, will many times fit better. It is also very usable because it respects the property that time cannot be negative. So, this becomes a very useful distribution for studies where maintainability and reliability are examined.

So, here we can use these formulas to get the quantities in which we are interested, and we can also get the failure rate for the lognormal distribution using the same f(t) upon R(t). Let us see how this looks; I will again use the Excel sheet for this.

(Refer Slide Time: 10:52)

So, let us look at it. Here I have t median and I have the S value. Actually, the lognormal distribution function in Excel uses mu: it uses the formula Z equal to ln of X minus mu, divided by S. So, this value is S, and for mu I have to get the value of mu from t median, or I can get t median from mu.

So, the parameters here are mu and S, not t median and S. In the Excel sheet we have to supply the values of mu and S; that is why it is set up like this. As we discussed earlier, ln of t median is mu, so exponential of mu gives me t median. The sigma here is nothing but S; I can write it as S for better clarity.

Now, here I have taken a few values of X and I have got the value of f(x). f(x) is the lognormal distribution value with respect to X; for mu I have taken cell B1, the S value is given in cell B2, and FALSE means I want the PDF value. If I want the CDF, I make the same argument TRUE, and I get the CDF value. R(x) is nothing but 1 minus the CDF, and Z(x) is f(x) divided by R(x), which I get from here. So, let us see what happens if I change mu and what happens if I change S. Let us change mu first.
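
The same spreadsheet columns can be reproduced outside Excel. Below is a small Python sketch using only the standard library; the parameter values mu = 1.0 and S = 0.5 and the grid of X values are assumptions for illustration, since the values in the lecture's sheet are not given in the transcript.

```python
import math
from statistics import NormalDist

def lognormal_columns(x, mu, s):
    """Return (f, F, R, z) at x: PDF, CDF, reliability and failure rate."""
    u = (math.log(x) - mu) / s          # standardised value, Z = (ln x - mu) / S
    f = NormalDist().pdf(u) / (s * x)   # PDF
    F = NormalDist().cdf(u)             # CDF
    R = 1.0 - F                         # reliability
    return f, F, R, f / R               # failure rate z(x) = f(x) / R(x)

mu, s = 1.0, 0.5                        # assumed parameters
print("t_median =", math.exp(mu))       # t_median = exp(mu)
for x in (0.5, 1.0, 2.0, 4.0, 8.0):
    f, F, R, z = lognormal_columns(x, mu, s)
    print(f"x={x:4.1f}  f={f:.4f}  F={F:.4f}  R={R:.4f}  z={z:.4f}")
```

Changing `mu` or `s` and rerunning plays the same role as editing the two parameter cells in the sheet.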

So, what do we expect with mu? Mu sets the scale, since t median is exponential of mu. If I increase it, the same distribution will be stretched; it will be spread over a wider range of values. If I make it smaller, it will contract, and you will have these values in a smaller region. Similarly, I can take different values of S. For S equal to 1 these values are there. If I take S equal to 0.1, you will see that the shape changes completely. If I make it 0.5 it takes a different, kind of curved shape; if I make it 0.9 it will be in this shape. If I make it, let us say, 2, that is another shape.

So, by changing the value of S, we are able to capture different failure patterns. Shape here means the shape of the PDF, and the PDF tells us where the failures are more and where the failures are fewer. When we are able to change the shape, that means we are able to change how failures are distributed over time. In the normal distribution most of the failures are in the central region, but if you have tails, like in this case, then we see that most of the failures are in the initial region only.

Later on, the number of failures reduces, and the failure rate behaves the same way: it is a decreasing failure rate. So, here also we can have a decreasing failure rate or an increasing failure rate. If you see S equal to 0.5, we have an increasing failure rate here; Z(x) is increasing. And if you see the density function here, that means failures are more (())(16:22) failures are less; that behavior also changes. So, because of this, what is happening?

We are able to capture a variety of failure patterns, and we can represent those failure patterns by the same lognormal distribution with different values of the parameters mu and S. The reliability function of the lognormal distribution also reflects how f(x) and Z(x) behave. And, as usual, reliability is a decreasing function.
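
To see how S changes the failure-rate pattern, one can evaluate z(t) = f(t) / R(t) at a few times for two shape values. This is an illustrative sketch only; mu = 0 and the time points are assumed and are not the values used in the lecture's sheet.

```python
import math
from statistics import NormalDist

def failure_rate(t, mu, s):
    """Lognormal failure rate z(t) = f(t) / R(t)."""
    u = (math.log(t) - mu) / s
    f = NormalDist().pdf(u) / (s * t)
    return f / (1.0 - NormalDist().cdf(u))

mu = 0.0  # assumed, so t_median = 1
for s in (0.5, 2.0):
    rates = [failure_rate(t, mu, s) for t in (0.5, 1.0, 5.0)]
    print(f"S={s}: z(0.5)={rates[0]:.3f}  z(1)={rates[1]:.3f}  z(5)={rates[2]:.3f}")
```

With these assumed values, S = 0.5 gives a rate that rises between t = 0.5 and t = 1, while S = 2 gives a rate that falls between t = 1 and t = 5.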

(Refer Slide Time: 16:57)

So, take different cases: I have taken 0.5; if I take 1 then it will look like this, and let us say this is 2. If I increase mu, it will take more time to deteriorate. So, if mu is high, that means it will take more time to fail, or the life will be longer. This is similar to what we have seen in the exponential case, where 1 upon lambda, or the MTBF, does the same thing. In the lognormal case mu has the same impact: the scale parameter has a direct implication on the mean. So, here we can use this.

(Refer Slide Time: 17:49)

Now, let us take an example to understand how we can calculate these quantities. Suppose the fatigue wearout of a component follows the lognormal distribution, and the parameters given are t median equal to 5000 hours and S equal to 0.20. If we have these 2 parameters, we know everything about the lognormal distribution, so we can calculate everything. What is the MTTF?

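$$\text{MTTF} = 5000\, e^{(0.20)^{2}/2} = 5101 \text{ hr}$$

$$\sigma^{2} = 5000^{2}\, e^{(0.20)^{2}} \left[ e^{(0.20)^{2}} - 1 \right] = 1.0619 \times 10^{6}$$

$$\sigma = 1030 \text{ hr}$$

$$t_{mode} = \frac{5000}{e^{(0.20)^{2}}} = 4804 \text{ hr}$$

$$R(3000) = 1 - \Phi\left( \frac{1}{0.2} \ln \frac{3000}{5000} \right) = 1 - \Phi(-2.55) = 0.99461$$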

MTTF is t median times e to the power S squared by 2. You can calculate this; it comes out to be 5101 hours. Similarly, we can calculate sigma squared: sigma squared is t median squared times e to the power S squared, multiplied by e to the power S squared minus 1. We apply this and we get this value for sigma squared.

So, the standard deviation is the square root of this, that is 1030 hours. The mode is t median divided by e to the power S squared, so 5000 divided by e to the power 0.2 squared; in this case the mode is 4804 hours. Now, suppose I am interested in the reliability at 3000 hours. As we discussed, reliability is 1 minus the CDF. And what is the CDF here? We can get it from the standard normal distribution: the CDF is phi of ln of t minus ln of t median, divided by S, where S is 0.2. I can also write this as phi of ln of t divided by t median, whole divided by S. So, this will be ln of 3000 divided by 5000, because I want to calculate for 3000 hours and the median is 5000. Ln of this divided by 0.2 comes out to be minus 2.55.
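
The hand calculation for this example can be cross-checked with a few lines of Python. This is a sketch using the example's parameters (t median 5000 hours, S 0.20); the variable and function names are illustrative and not from the lecture.

```python
import math
from statistics import NormalDist

t_med, s = 5000.0, 0.20  # parameters from the example (hours, dimensionless)

mttf = t_med * math.exp(s**2 / 2)
variance = t_med**2 * math.exp(s**2) * (math.exp(s**2) - 1)
std_dev = math.sqrt(variance)
t_mode = t_med / math.exp(s**2)

def reliability(t):
    """R(t) = 1 - Phi(ln(t / t_med) / s)."""
    return 1.0 - NormalDist().cdf(math.log(t / t_med) / s)

print(f"MTTF     = {mttf:.0f} hr")
print(f"variance = {variance:.4g} hr^2")
print(f"std dev  = {std_dev:.0f} hr")
print(f"mode     = {t_mode:.0f} hr")
print(f"R(3000)  = {reliability(3000.0):.5f}")
```

Because the code does not round Z to minus 2.55, its R(3000) differs from the table-based value in the fourth or fifth decimal place.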

If I take the standard normal CDF value for this and subtract it from 1, that turns out to be 0.99461. Now, let us say my design target for reliability is 0.95, and I want to know how much design life can be offered for that reliability target. What does it mean? It means my reliability at time t 0.95 is 0.95. That means phi at the corresponding Z is equal to 1 minus 0.95, that is 0.05. So, Z will be nothing but phi inverse of 0.05, which I can look up; that is minus 1.645, as we remember.

Now, this minus 1.645 is for Z, and Z is nothing but ln of t 0.95 divided by t median, whole divided by S. So, ln of t 0.95 divided by t median will be equal to minus 1.645 into S. To solve for t 0.95 I have to take the exponential: t 0.95 divided by t median will be equal to e to the power minus 1.645 into S. So, t 0.95 will be equal to t median times e to the power minus 1.645 into S. I know S is 0.20 and t median is 5000, so that becomes 5000 times e to the power minus 1.645 into 0.20. On the slide, where Z is rounded to minus 1.64, this comes out to be 3602 hours; with minus 1.645 it is about 3598 hours. Let us solve this using the Excel sheet also.
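
The design-life steps above can be sketched in Python as follows. `design_life` is a hypothetical helper name; `NormalDist().inv_cdf` from the standard library stands in for the phi inverse lookup.

```python
import math
from statistics import NormalDist

def design_life(r_target, t_med, s):
    """Time t at which R(t) equals r_target for a lognormal life."""
    z = NormalDist().inv_cdf(1.0 - r_target)   # phi inverse of (1 - R)
    return t_med * math.exp(z * s)

# Full-precision z (about -1.645)
print(design_life(0.95, 5000.0, 0.20))

# Hand calculation with z rounded to -1.64
print(5000.0 * math.exp(-1.64 * 0.20))
```

The two printed values differ by a few hours, which is purely the effect of rounding Z.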

(Refer Slide Time: 22:17)

So, let us solve this using the Excel sheet. t median is given as 5000, and S is given as 0.2. Now I can calculate whichever values I am interested in. MTTF is equal to t median multiplied by e to the power S squared by 2: exponential of S squared, S being 0.2, whole divided by 2. That gives 5101, which matches what we calculated. Then, if I want the variance, it is equal to t median squared, multiplied by exponential of S squared, multiplied by e to the power S squared minus 1.

Once we apply this, we get approximately 1.0619 times 10 to the power 6, and the standard deviation is the square root of this, that is this value raised to the power 0.5: 1030 hours. The variance is in hour squared, the square of the unit, while the standard deviation and the MTTF are in hours. Similarly, we can get t mode: t mode is equal to t median divided by exponential of S squared, which is 4803.95, approximately 4804. Then I want to calculate the reliability at 3000 hours. This I can do directly in Excel, because Excel also has the lognormal distribution.

So, it is 1 minus the lognormal distribution value, where X is 3000, because I want this value for 3000 hours. And what is the mean argument? As we remember, mu is ln of t median, so I will take ln of t median here; this is not t median, this is ln of t median. The standard deviation argument is the value of S. I want the cumulative value, so I put TRUE. This gives me the reliability at 3000 hours: 0.99468, essentially the same as before. Next, I want to know the design life.

So, my reliability target is, let us say, 0.95. That means I will take the lognormal inverse. What is the probability for which I want the inverse? It is 1 minus 0.95, because 0.95 is the reliability, so the unreliability, or CDF, is 1 minus this. The mean argument, as we discussed earlier, is ln of t median, and S is 0.2. Once we calculate this, we get the design life. Now, why are our hand calculations not exactly matching the Excel sheet?

Because in the hand calculations we dropped the higher digits; here we took only minus 1.64, whereas the value is minus 1.645 or so. Excel works with the full-precision values, so because of the decimal places there is a slight difference. But, as we see, we are able to get the design life here. So, we can use the Excel sheet to solve our problems, and we can get all the probability values that we have been using for various purposes. We can also do the same calculation by hand, using a calculator.

For the normal and lognormal distributions, we may have to refer to the standard normal tables, which give the cumulative probability for different values of Z. If I want to know the probability, I search along the row and column to find the value of Z and pick up the corresponding probability from the table. If I want to know the Z value, I first search the table for the matching probability, then read off the corresponding row and column values, and from those I find the value of Z.

Once I find the value of Z, then for the normal distribution I multiply it by sigma and add mu; that gives me the value of the random variable X. For the lognormal distribution, the Z value I have got is multiplied by S. Then, if I have t median, I take the exponential of that and multiply by t median. If I have mu, I first add mu and then take the exponential; that gives me the value of t.
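
The conversion from a Z value back to the random variable can be written compactly. In this sketch the function names are illustrative, and the standard library's `NormalDist().inv_cdf` replaces the reverse table lookup.

```python
import math
from statistics import NormalDist

def normal_quantile(p, mu, sigma):
    """Normal case: x = mu + z * sigma, with z = phi inverse of p."""
    return mu + NormalDist().inv_cdf(p) * sigma

def lognormal_quantile_from_median(p, t_med, s):
    """Lognormal case given t_median: t = t_med * exp(z * s)."""
    return t_med * math.exp(NormalDist().inv_cdf(p) * s)

def lognormal_quantile_from_mu(p, mu, s):
    """Lognormal case given mu: t = exp(mu + z * s)."""
    return math.exp(mu + NormalDist().inv_cdf(p) * s)

# Both lognormal forms agree when mu = ln(t_median).
print(lognormal_quantile_from_median(0.05, 5000.0, 0.2))
print(lognormal_quantile_from_mu(0.05, math.log(5000.0), 0.2))
```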

So, with this we have covered most of the major distributions. There are other distributions, like the gamma distribution and the beta distribution; there are many. But these are the distributions most commonly used for capturing failure patterns or repair patterns.

As we discussed, they are not very difficult to use. We can use them to calculate the reliability, unreliability, failure rate and PDF, whichever value we require. These are the four major functions; then there is the MTTF, and then there is the variance. All of these we can calculate once we know the parameters of the distribution.

So, here we will stop discussing these major distributions. Next time, we will start discussing system reliability models. There we will discuss that a system is made up of components: if we know the reliability of the components, how can we get the reliability of the system? So, I will stop here and we will continue in the next lecture. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 15
System Reliability Modelling
Hello everyone. Now we are moving to lecture number 15, which is on system reliability modeling. In earlier lectures we discussed how component reliabilities can be represented by various distributions, like the exponential, Weibull, lognormal and normal distributions. We also discussed various terms and definitions used in reliability theory and how they are related to these distributions.

If we know the distribution, we are able to find the reliability parameters or the probability values we are concerned with. Today we will be discussing system reliability modeling. What generally happens is that systems are few and are used in small numbers only. Many times, when we are designing the system, we do not have any past history of the system.

So, in that case we try to use historical data on the component reliability values. If we know these component reliability values, then we can evaluate the system reliability. So, at the design stage, or later when we are launching the product, we are able to know how reliable the system is going to be.

Similarly, when we see a deficiency in reliability for various products, we can use approaches like redundancy, so that we are able to achieve better reliability even though the component reliability or the subsystem reliability is not so good. So, today we will be discussing how various components or subsystems, when they act together in a certain manner and contribute to the reliability, can be modeled to evaluate the reliability of the system.

(Refer Slide Time: 2:12)

For this purpose, we will be using the reliability block diagram. We use the reliability block diagram for representing the system, just as we generally use flow diagrams, functional flow diagrams and other kinds of system representation. As the name suggests, it is made of blocks of reliability: we will be using blocks where each block represents a component or subsystem reliability value.

We already know physical diagrams, like system layouts, piping layouts and process layouts. The reliability block diagram is more of a logic diagram. We also have functional block diagrams for the system, where we try to explain how the system is functioning, that is, how the various components are contributing to the system functions.

So, in the case of reliability, we represent the components by blocks: this is one component block, this can be another component block, and the functional relationships are represented by the connecting lines. These connecting lines represent whether the components are required to function together or whether they are compensating for each other.

Generally, when we draw a system in series like this, it means that both the components, let us say C1 and C2, need to work for the system to work. As we can make out, for the system to work there should be a path that starts here and ends here. To have that path from start to end, all these functional blocks should work; if any functional block is not working, our system function will not complete.

Similarly, we use parallel configurations. A parallel configuration represents that my system can function in two ways: if component 1 is working, I will be able to start from here and reach my target. Similarly, if component 2 is working, then also I will be able to achieve my functionality. So, when we have redundancy, we have parallel combinations; redundancy means something extra.

I could do with C1 only; my function could be completed with C1 alone, but I have kept C2 extra for reliability purposes, so that when C1 is not working C2 can continue to work and my system continues to function. So, my system does not fail though component C1 has failed. Similarly, if C2 fails, then also C1 can keep working. So, here the system is able to tolerate one component failure; even with one failure the system continues to work.

Whenever we use blocks, we call it a reliability block diagram. I can represent the same thing with a line diagram: C1 and C2 are edges, this is the starting node and this is the terminal or ending node. For my system to be successful, both C1 and C2 need to work. Similarly, for parallel I can represent it like this. Here I do not have to draw the blocks; if I place the components on the edges of the line diagram, I get the same thing. Both can be solved in the same way and mean almost the same thing; it is only a little different in representation.
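
The series and parallel logic described above can be expressed as simple true/false functions of the component states. This is an illustrative Python sketch; the function names are not from the lecture, and it models only whether the system works, not its probability.

```python
def series_works(states):
    """A series system works only if every component works."""
    return all(states)

def parallel_works(states):
    """A parallel (redundant) system works if at least one component works."""
    return any(states)

# Enumerate the four combinations of C1 and C2 working (True) or failed (False).
for c1 in (True, False):
    for c2 in (True, False):
        print(f"C1={c1!s:5}  C2={c2!s:5}  "
              f"series={series_works([c1, c2])!s:5}  "
              f"parallel={parallel_works([c1, c2])}")
```

The parallel arrangement fails only in the row where both C1 and C2 have failed, which is the one-failure tolerance described above.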

(Refer Slide Time: 6:17)

To understand what the reliability block diagram is, let us say we have a simple voltage divider circuit like this, where we have a supply voltage and two resistances, R1 here and R2 here. We also have a fuse, and we have the bulb connected across R2. So, whatever voltage is here will divide, and whatever voltage comes as output will be used for the bulb.

Now, if you look at the reliability block diagram, for this system to work properly I need all the elements of the system to work. If, let us say, my supply fails, then my system will not work, so the supply has to work for my system to work. Then this resistance should not fail: the resistance should be working within its specification, 300 plus or minus 30 ohm. Then only will my system work; if that is violated, my system will not work properly.

Similarly, my R2 resistance also has to work; if it does not, the voltage division will not take place and my system will not work. The fuse also needs to work: if the fuse blows unnecessarily, or does not blow when it is required to, then also there is a problem. So, the fuse reliability also comes into the picture.

Similarly, if the bulb itself is not working, then also the circuit will fail. So, for this circuit to work, all my system components, the supply, R1, R2, the fuse and the bulb, need to work. Since I need everything to work, these come in series: it is a series system where all these components need to work. For my system to be reliable, each one of them has to be reliable. Then only will I get the system reliability.

(Refer Slide Time: 8:23)

Now, let us see how we evaluate such systems. The general combinations we will be discussing start with the series system. A series system looks like this: components are in series, which means all components need to work for the system to work. In a parallel system, components or series of components are in parallel, and if any one of them works the system will work. Then we may have series-parallel systems: series combinations of parallel groups, or parallel combinations of series groups, like this, and there may be others.

So, this is a series of parallel groups and this is a parallel of series groups. We also have low-level versus high-level redundancy. Low-level redundancy means that we are applying the redundancy, giving more components, at the lower level, that is at the component level. High-level redundancy means we are giving redundancy at the higher level, that is, we are using a duplicate of the entire system.

So, high-level redundancy means redundancy at the system level, and low-level redundancy means redundancy at the component level. A k-out-of-m system is one which consists of m subsystems or components, and if k of them are working then the system will function. That means it can tolerate m minus k failures.
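
The k-out-of-m rule can be written as a one-line check on the component states. This is a hedged sketch with an illustrative function name; it models only the working/failed logic, not the probabilities.

```python
def k_out_of_m_works(states, k):
    """A system of m = len(states) units works if at least k of them work."""
    return sum(bool(x) for x in states) >= k

# m = 4, k = 3: the system tolerates m - k = 1 failure.
print(k_out_of_m_works([True, True, False, True], 3))   # one unit failed
print(k_out_of_m_works([True, False, False, True], 3))  # two units failed
```

Under this definition a series system of m components is the special case k = m, and a parallel system is the special case k = 1.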

But if the failures are more than this, the system will fail. Then there are standby systems, like our generators or UPS. These remain on standby and are not used in general, but when the primary unit fails, they are brought into the system and start functioning. Non-series-parallel systems are mostly found in various networks which cannot be reduced directly to series or parallel, where multiple ways of connection are possible, like if we have a connection like this.

In that case it is neither series nor parallel, because of this element. So, this kind of system needs to be analyzed a little differently from the series-parallel system. Then there are common mode failures. Common mode failures are failures which can happen because of a common reason. The systems are arranged in a redundant fashion, but because of the common reason all of them can fail together.

Then there are three-state devices. We discussed the resistance: a resistance can have three states. One is that it is working, another is that it has failed in short mode, and another is that it has failed in open mode. So, it has three states. There can be more states, but these three are the major potential states for various devices. Devices can fail in multiple ways, so they may have multiple states.

(Refer Slide Time: 11:35)

Let us see some notation which we will be using throughout; we will try to stick to it. Lambda i is the failure rate of component i, so each component i has a failure rate. Whenever we are using the exponential distribution, we know it has a constant failure rate, and we represent each component's failure rate by lambda i. If i equals 1 this becomes lambda 1, if i equals 2 it becomes lambda 2, and so on.

Lambda s is the failure rate of the system: for the overall system, the failure rate is given as lambda s. Whenever we use these terms with the exponential distribution they are independent of time, but many times they may be functions of time. So, we may use lambda s of t and lambda i of t, as applicable.

Reliability is calculated for time t; this definition we have already discussed. R s of t is the reliability of the system, and t is the mission time, that is, the time for which we want the system to keep working without failure. MTTF i is the MTTF of component i, and MTTF s is the MTTF of the system.

(Refer Slide Time: 12:55)

All these terms we have already discussed, and we have also seen how they can be evaluated at the component level. We will now see how, if we know the component-level values, we get the system-level values. The first and very common configuration is the series system. In a series system everything we are using is important; if anything fails, the system will fail. If, let us say, component 2 has failed, my system will not work and I will not get the required output.

If any component fails, my system will not work, so every component has to work in a series system. Let us say we are concerned with two events, event E1 and event E2. Event E1 means that component 1 does not fail, and event E2 means that component 2 does not fail. So, the probability of event E1 is the reliability R1, and the probability of event E2 is the reliability R2.

So, R1 is the probability that component 1 does not fail and R2 is the probability that component 2 does not fail. Generally, these are functions of time, so if I consider a time t, I can also write them as R1 of t and R2 of t. R1 is the reliability of component 1 and R2 is the reliability of component 2. Now I want to know the reliability of the system. The system will only be reliable when component 1 does not fail and component 2 does not fail. Since it is an "and" relationship, we use the intersection here.

So, I am interested in the probability that both events occur, that is, event E1 is true and event E2 is also true, because only in that case will my system be reliable. So, the system reliability is the probability of E1 intersection E2, that is E1 and E2. As we discussed earlier, if the events are independent then this probability is equal to the probability of E1 into the probability of E2, and that is equal to R1 into R2.

$$R_s = P(E_1 \cap E_2) = P(E_1)\, P(E_2) = R_1 R_2$$

This is only applicable when the components are independent. In general, during reliability evaluation it is rarely assumed that events are dependent; almost all the time we assume independence among the events. Only if there is a visible or identifiable dependency do we treat them as dependent events. Another way to see the series system is that for the system to function both components have to function. If I have n components, I can extend it like this.

All n components have to work, so the events are E1, E2 and so on up to En, and the reliability of the system is the product of the reliabilities of the n components: R1 into R2 into R3 and so on. I can also write it as the product, pi, over i equal to 1 to n of R i; if t is involved, I can write R s of t.

$$R_s(t) = R_1(t) \times R_2(t) \times \cdots \times R_n(t) \le \min\{R_1(t), R_2(t), \ldots, R_n(t)\}$$

Now, reliabilities are probability values, so they are always somewhere between 0 and 1. If I multiply reliabilities, the result will always become smaller; it cannot become higher, because I am multiplying by a value less than 1. Anything that gets multiplied by a value between 0 and 1 is always going to be reduced.

So, whatever is the minimum of these reliabilities, my system reliability is going to be less than that, because it will be further reduced by multiplication with the other terms. Let us say I have reliabilities of 0.7, 0.9, 0.95 and 0.99. Here, my system reliability is going to be less than 0.7, because if I multiply these, 0.7 into 0.9 into 0.95 into 0.99 is definitely going to be less than 0.7.
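
The product rule and the bound by the weakest component can be checked directly with the four values just mentioned. This is a minimal sketch assuming independent components; `series_reliability` is an illustrative name.

```python
import math

def series_reliability(reliabilities):
    """Series system of independent components: product of reliabilities."""
    return math.prod(reliabilities)

r = [0.7, 0.9, 0.95, 0.99]
rs = series_reliability(r)
print(f"R_s = {rs:.6f}, weakest component = {min(r)}")
```

The printed system reliability is below 0.7, the reliability of the weakest component.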

Because of that, we can say that for a series system the component with the minimum reliability is going to decide the system reliability, because the system reliability is going to be less than that. In such a system, even if I am using some components which are very highly reliable, if even one component is less reliable my system reliability will fall.

So, the weakest component will generally define the reliability of such systems. Wherever we find weak components, we have to improve them so that we are able to improve the system reliability. I have given one example here. If the component reliability is 0.9 and I am using 10 components in series, n equal to 10, then my reliability turns out to be 0.3487. That means, though my component reliability is 90 percent, the reliability of the system is only around 35 percent.

Here I have around a 65 percent chance of failure, while at the component level I had only a 10 percent chance of failure. If I were using 100 such components, my reliability would be of the order of 10 to the power minus 5, so low that I can assume it is almost 0; this is always going to fail. If I use 1000, it is of the order of 10 to the power minus 46. If the reliability is 0.95, then these values turn out to be 0.5987 for 10 components and 0.00592 for 100. Again, for 100 the reliability has become very small, about 0.6 percent, which practically means that this system is not going to work.

And for 1000 it is almost 0. For 0.99 reliability, each component has only a 1 percent chance of failure. In that case I have around 90 percent reliability for 10 components, and around 36 or 37 percent reliability for 100 components. So, that system is just surviving: sometimes it survives, sometimes not, and most of the time it is in the failed state. And for 1000 components it is failed. Only when I take a component reliability of 0.999 does it improve.

With 10 such components in series, my reliability turns out to be 0.99, which means my system is quite reliable. Even for 100 components the system is around 90 percent reliable, but for 1000 it will again fall to around 36 percent. So, as we see, in a series system with many components we have to make sure that our components are very, very reliable.
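
The table of values discussed above is simply R raised to the power n for identical, independent components in series. A short Python sketch (with an illustrative function name) regenerates it:

```python
def series_identical(r, n):
    """Reliability of n identical independent components in series: r ** n."""
    return r ** n

counts = (10, 100, 1000)
print("R_comp   " + "".join(f"n={n:<12}" for n in counts))
for r in (0.9, 0.95, 0.99, 0.999):
    print(f"{r:<9}" + "".join(f"{series_identical(r, n):<14.4g}" for n in counts))
```

Reading across a row shows how quickly the system reliability drops as components are added; reading down a column shows how much component reliability is needed to keep a large series system usable.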

If even a few components are unreliable, that will bring our system reliability down to a very low level, because if any component fails, my system will fail; whatever the reason, my system is not working. So, in such cases we have to be careful and make sure that our component reliabilities are very, very high.

(Refer Slide Time: 20:41)

Now, let us discuss how to calculate the system reliability if the component reliabilities follow the exponential distribution. We already know that for the exponential distribution R i of t is equal to e to the power minus lambda i t; this we discussed under the exponential distribution. If I want the system reliability, we know that for a series system it is the product of the R i of t, that is, the product over i equal to 1 to n of R i of t.

$$R_S(t) = \prod_{i=1}^{n} R_i(t)$$

$$R_S(t) = \prod_{i=1}^{n} \exp(-\lambda_i t)$$

$$R_S(t) = \exp\left(-\sum_{i=1}^{n} \lambda_i t\right)$$

$$R_S(t) = \exp(-\lambda_s t)$$

Now, if I substitute this formula, R s t is equal to the product over i equal to 1 to n of e to
the power minus lambda i t. This we have already validated; it is a repetition of what we
discussed earlier. A product of exponentials is the exponential of the sum of the exponents: for
example, e to the power minus lambda 1 t into e to the power minus lambda 2 t is nothing but e to
the power minus (lambda 1 plus lambda 2) t.

So, R s t is the exponential of minus the summation of lambda i t. This summation of lambda i
we can call lambda s. So, R s t becomes e to the power minus lambda s t, where lambda s is
nothing but the summation of the individual failure rates lambda i. This is how we can calculate
the reliability. And we know that if the reliability is e to the power minus lambda t, the MTTF
is its integral from 0 to infinity dt, which turned out to be 1 upon lambda; this we have
already evaluated.

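$$
\lambda_s = \sum_{i=1}^{n} \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_n
$$

$$
\mathrm{MTTF}_s = \int_0^{\infty} R_s(t)\,dt = \frac{1}{\lambda_s} = \frac{1}{\sum_{i=1}^{n} \lambda_i}
$$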

So, I am not going to repeat that derivation; it is the same. For the system, the MTTF becomes
1 upon lambda s, and what is lambda s? Lambda s is the summation of the individual lambdas. So,
if we take the individual failure rates and sum them up, we get the system failure rate, and the
MTTF of the system is 1 upon this summation of failure rates.

Let us take the previous example: 10 components are in series and all are the same, all having
failure rate lambda. In that case, what will the system MTTF be? 1 upon 10 lambda, or we can
say MTTF i divided by 10, where MTTF i means the component MTTF. If I make it 100 components,
this becomes MTTF i divided by 100. So, as we use more identical components in series, the
MTTF is reduced in the same proportion.
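
As a small sketch of these relations, the code below sums constant failure rates to get the series system failure rate, reliability and MTTF. The component failure rate `lam` is a hypothetical value chosen only for illustration, and the function names are not from the lecture.

```python
import math


def series_failure_rate(failure_rates):
    """System failure rate of a series system of CFR components."""
    return sum(failure_rates)


def series_reliability(failure_rates, t):
    """R_s(t) = exp(-lambda_s * t) for a series system of CFR components."""
    return math.exp(-series_failure_rate(failure_rates) * t)


def series_mttf(failure_rates):
    """MTTF_s = 1 / lambda_s for a series system of CFR components."""
    return 1.0 / series_failure_rate(failure_rates)


lam = 1.0e-4  # hypothetical component failure rate (per hour)
component_mttf = 1.0 / lam

mttf_10 = series_mttf([lam] * 10)    # ten identical components in series
mttf_100 = series_mttf([lam] * 100)  # one hundred identical components

print(component_mttf, mttf_10, mttf_100)
```

With identical components, the printed system MTTF values are the component MTTF divided by the number of components.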

(Refer Slide Time: 23:26)

Let us take one example. Let us say we have a four-component series system where the components
are independent and identically distributed. Because they are independent, we can use our
product law, and identically distributed means every component has the same distribution, which
here is the constant failure rate (CFR) model. The value given to me is that the system
reliability R s 100 is equal to 0.95.

So, I can write that 0.95 is equal to R s t. What is my R s t? That is e to the power minus
lambda s t. And what is my lambda s? That is the individual lambda into 4, because four
components are there. So, e to the power minus 4 lambda into t is equal to 0.95, and I can
reverse evaluate: t is also given to us as 100, so lambda into 400 is equal to minus l n of 0.95.

$$
R_s(100) = e^{-100\lambda_s} = e^{-100(4)\lambda} = 0.95
$$

$$
\lambda = \frac{-\ln 0.95}{400} = 0.000128
$$

$$
\mathrm{MTTF} = \frac{1}{0.000128} = 7812.5
$$

So, lambda becomes minus l n of 0.95 divided by 400; this is my failure rate for each
component. I want to know the MTTF of the component, and MTTF is nothing but 1 upon lambda, so
the MTTF is the inverse of this, which comes out to be 7812.5. So, as we see here, we are able
to reverse calculate the component reliability, component MTTF or component failure rate when
we know the system reliability, knowing that all components are identical.
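
The reverse calculation can be checked with the short sketch below, which assumes four identical CFR components in series, as in the example. Note that 7812.5 comes from inverting the rounded failure rate 0.000128; without rounding, the component MTTF is slightly lower, about 7798. Variable names are illustrative.

```python
import math

n_components = 4
t = 100.0
system_reliability = 0.95

# R_s(t) = exp(-n * lam * t)  =>  lam = -ln(R_s) / (n * t)
lam = -math.log(system_reliability) / (n_components * t)
component_mttf = 1.0 / lam

# MTTF obtained from the failure rate rounded to 0.000128, as on the slide.
rounded_lam = round(lam, 6)
component_mttf_rounded = 1.0 / rounded_lam

print(lam, component_mttf, component_mttf_rounded)
```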

(Refer Slide Time: 25:23)

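$$
R_s(500) = R_1(500)\,R_2(500)\,R_3(500) = 0.5474
$$

$$
\lambda_s = \lambda_1 + \lambda_2 + \lambda_3 = 1.205 \times 10^{-3}\ \text{per hour}
$$

$$
\mathrm{MTTF}_s = \frac{1}{\lambda_s} = \frac{1}{1.205 \times 10^{-3}} = 830\ \text{hours}
$$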

Let us take one more example. Let us say we have three components here; their failure rates
lambda 1, lambda 2, lambda 3 are given, and our mission time is 500 hours. I want to know the
reliability, and I can do this in various ways. One way is to calculate the individual
component reliabilities. So, I can get R1 500, that is e to the power minus lambda 1 into 500.

Similarly, I can get R2 500 and R3 500. I will show this in Excel sheet calculations so that
you can follow how we are solving it. I will put the lambda values here: failure rate 1 is
0.065 e minus 3, failure rate 2 is 0.18 e minus 3, and failure rate 3 is 0.96 e minus 3. So, I
have got my lambda 1, lambda 2, lambda 3.

From here I can get R1, R2, R3. R1 is equal to the exponential of minus lambda into t; I can
write 500 directly, or I can put the t value in a cell and use it. So, I will put time t equal
to 500 here. My reliability R1 turns out to be 0.9680. For R2 I use the same formula, with the
time reference corrected so that it still points to t, and the same thing I can do for the
third. As you see, I can calculate R1, R2, R3.

Now I want to know my system reliability. As we know, R s is equal to the product of all the
above terms; by multiplying the component reliabilities you get the system reliability. So, my
system reliability comes out to be 0.5474. If I want to know the failure rate of the system, we
know that the system failure rate is the summation of the component failure rates. So, that is
equal to the sum of these three failure rates.

I can also calculate the reliability again from the system failure rate. That is nothing but
the exponential of minus t into this summation of failure rates. As you see, whether I use the
multiplication of reliabilities or the direct formula, both reliabilities are the same, because
both represent the same thing: either you sum up the failure rates and then take the
exponential of minus lambda t, or you calculate the reliabilities individually and then
multiply. Both will give you the same answer.
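
The spreadsheet steps above can be mirrored in Python. This sketch uses the failure rates and mission time from the example and computes the system reliability both ways, along with the system MTTF as 1 upon the system failure rate; the variable names are illustrative.

```python
import math

failure_rates = [0.065e-3, 0.18e-3, 0.96e-3]  # per hour, from the example
t = 500.0                                     # mission time in hours

# Route 1: individual reliabilities, then multiply.
component_reliabilities = [math.exp(-lam * t) for lam in failure_rates]
rs_product = math.prod(component_reliabilities)

# Route 2: sum the failure rates, then take exp(-lambda_s * t).
system_failure_rate = sum(failure_rates)
rs_direct = math.exp(-system_failure_rate * t)

system_mttf = 1.0 / system_failure_rate

print(component_reliabilities)
print(rs_product, rs_direct, system_failure_rate, system_mttf)
```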

For the MTTF of the system, as we know, that is equal to 1 divided by lambda s. The value is
displayed here in scientific notation; if I want the general format, I can switch the cell to
the general number format. As you see, this turns out to be approximately 830 hours, the same
as what we have on the slide. We will stop here today and continue our discussion in the next
lecture. Thank you.

Introduction to Reliability Engineering
Professor. Neeraj Kumar Goyal
Indian Institute of Technology, Kharagpur
Lecture No. 16
System Reliability Modelling (Contd.)

Hello everyone. So, welcome back to lecture number 16. We will continue our discussion from
where we left off in lecture number 15. In lecture number 15, we discussed the series model,
and we also discussed it for the exponential distribution.

(Refer Slide Time: 00:44)

Continuing our discussion, today we will discuss the Weibull model, that is, the case where the
distribution of each component is Weibull. The two-parameter Weibull distribution has theta and
beta as its parameters. So, the component reliability can be expressed as R i t is equal to e to
the power minus t upon theta i, whole raised to the power beta i. Different components will have
different values of theta i and different values of beta i.

$$
R_s(t) = \prod_{i=1}^{n} \exp\left[-\left(\frac{t}{\theta_i}\right)^{\beta_i}\right]
       = \exp\left[-\sum_{i=1}^{n} \left(\frac{t}{\theta_i}\right)^{\beta_i}\right]
$$

Now, I want to know the system reliability. For system reliability, the formula is the same,
multiplication of reliability values, irrespective of the distribution. So, R s t is equal to the
product over i equal to 1 to n of R i t, and each R i t is the exponential of minus t upon theta i,
raised to the power beta i. As we discussed in the case of multiple failure modes, if we multiply
all of these, the exponents get added up. So, if I take the exponential outside, the product
becomes a summation in the exponent.

This is the summation over i equal to 1 to n of t upon theta i raised to the power beta i. It is
the integral of the failure rate of all the components, taken at the system level. If I want to
know the system PDF, that is nothing but minus dR s t over dt. To differentiate, recall that for
the exponential of some function f of x, differentiating with respect to x gives f dash x times
e to the power f x. Using the same rule here, the exponential term remains as it is and is
multiplied by the derivative of the summation, whose terms are beta i divided by theta i times
t upon theta i raised to the power beta i minus 1.

$$
f_s(t) = \exp\left[-\sum_{i=1}^{n} \left(\frac{t}{\theta_i}\right)^{\beta_i}\right]
         \left[\sum_{i=1}^{n} \frac{\beta_i}{\theta_i}\left(\frac{t}{\theta_i}\right)^{\beta_i - 1}\right]
$$

Beta i comes from the rule that the derivative of x raised to the power n is n x raised to the
power n minus 1. The 1 upon theta i comes because it is the multiplying factor of t. So, we get
beta i upon theta i, times t upon theta i raised to the power beta i minus 1, and overall this
gives the PDF. If I am interested in the system failure rate, that is f s t upon R s t. When we
divide, the exponential term gets cancelled, and only the summation which we calculated remains.
Each term of that summation is lambda i t, the failure rate of component i, which is the
derivative of that component's exponent.

$$
\lambda_s(t) = \frac{f_s(t)}{R_s(t)}
= \frac{\exp\left[-\sum_{i=1}^{n} \left(\frac{t}{\theta_i}\right)^{\beta_i}\right]
        \left[\sum_{i=1}^{n} \frac{\beta_i}{\theta_i}\left(\frac{t}{\theta_i}\right)^{\beta_i - 1}\right]}
       {\exp\left[-\sum_{i=1}^{n} \left(\frac{t}{\theta_i}\right)^{\beta_i}\right]}
= \sum_{i=1}^{n} \frac{\beta_i}{\theta_i}\left(\frac{t}{\theta_i}\right)^{\beta_i - 1}
$$

So, lambda i t is equal to beta i upon theta i, times t upon theta i raised to the power beta i
minus 1, and if we sum these up we get the system failure rate. For a series system, the system
failure rate is always the summation of the individual failure rates lambda i t, whether the
distribution is exponential or Weibull. Now, the system MTTF is a little tricky here. The
exponential distribution was easy to integrate, and we were able to get the system MTTF
directly. The same is not the case here; most of the time we have to solve this integral
numerically.

$$
\mathrm{MTTF}_s = \int_0^{\infty} R_s(t)\,dt
= \int_0^{\infty} \exp\left[-\sum_{i=1}^{n} \left(\frac{t}{\theta_i}\right)^{\beta_i}\right] dt
$$

(Refer Slide Time: 05:09)
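
The numerical solution of the MTTF integral for a Weibull series system can be sketched as follows. This is only an illustration under stated assumptions: it uses a hand-written composite Simpson's rule and truncates the infinite upper limit where the reliability has become negligible. Both choices, all function names, and the parameter values in the demonstration are assumptions of this sketch, not the lecturer's method or data.

```python
import math


def weibull_series_reliability(t, thetas, betas):
    """R_s(t) = exp(-sum((t / theta_i) ** beta_i)) for a series system."""
    return math.exp(-sum((t / th) ** b for th, b in zip(thetas, betas)))


def find_upper_limit(thetas, betas, tol=1e-12):
    """Double the upper limit until R_s has dropped below tol."""
    upper = min(thetas)
    while weibull_series_reliability(upper, thetas, betas) > tol:
        upper *= 2.0
    return upper


def simpson(func, a, b, n=20000):
    """Composite Simpson's rule with an even number of intervals n."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3.0


def weibull_series_mttf(thetas, betas):
    """Approximate MTTF_s as the integral of R_s(t) from 0 to a large limit."""
    upper = find_upper_limit(thetas, betas)
    return simpson(lambda t: weibull_series_reliability(t, thetas, betas),
                   0.0, upper)


# Hypothetical parameters, for illustration only.
demo_thetas = [100.0, 150.0, 200.0]
demo_betas = [1.2, 1.0, 2.0]
print(weibull_series_mttf(demo_thetas, demo_betas))
```

A quick sanity check is the single-component case, where the result should agree with the closed-form Weibull mean, theta times Gamma(1 + 1/beta), and the all-beta-equal-to-1 case, where it should agree with 1 upon the sum of the failure rates.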

Let us take an example. An air conditioner consists of three sub-systems, each having a Weibull
time-to-failure distribution with parameters as shown below. Let us put the same thing in the
sheet. This column is my sub-system number, this is my scale parameter theta, and this is my
shape parameter beta. Now, if I want to know the reliability, a time has to be given; let us say
the time is 10. Then, how much will the reliability be? I know the reliability formula: R t is
equal to the exponential of minus t divided by theta, whole raised to the power beta, where
theta is my scale parameter. This becomes my reliability for the first sub-system, and
similarly I can get the reliability for each of the other sub-systems.

From here, if I am interested in the system reliability R s t, that is nothing but the product
of the above terms, because it is a series system, and for a series system the system
reliability is the multiplication of the component reliabilities. So, this value turns out to be
my system reliability. If I am interested in the system failure rate, it is now a function of
time, but for a given time I can calculate it.

For a given time, I first calculate the individual failure rates. As we discussed earlier, the
failure rate is equal to beta divided by theta, multiplied by t divided by theta, whole raised to
the power beta minus 1; for t, I again use dollar signs so that the cell reference stays fixed.
This becomes my failure rate for each sub-system, and the failure rate of the system is the sum
of these failure rates.

Can I get the system reliability from this failure rate? For the exponential distribution I
could, simply by multiplying the failure rate by t and taking the exponential of the negative.
That is not feasible here, because the failure rate is a function of time t; I would have to
integrate it with respect to t. If I multiply directly, I will not get the same answer. So, to
calculate reliability, I calculate the component reliabilities individually and multiply them,
while this sum gives me the system-level failure rate, here the system failure rate at 10 days.
Like this, whatever parameter I am interested in, I can calculate and use.
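
The sheet calculation can be sketched in Python as below. The sub-system parameters from the slide are not reproduced in this transcript, so the `(theta, beta)` pairs here are hypothetical placeholders chosen only to make the sketch runnable; the function names are likewise illustrative. The last quantity shows the shortcut that works only for the exponential case and gives a different answer here.

```python
import math


def weibull_reliability(t, theta, beta):
    """R(t) = exp(-(t / theta) ** beta)."""
    return math.exp(-((t / theta) ** beta))


def weibull_failure_rate(t, theta, beta):
    """lambda(t) = (beta / theta) * (t / theta) ** (beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1)


# Hypothetical (theta, beta) pairs; the actual slide values are not available.
subsystems = [(50.0, 1.5), (80.0, 2.0), (120.0, 1.2)]
t = 10.0


def system_reliability_at(time):
    """Series system: product of the sub-system reliabilities."""
    return math.prod(weibull_reliability(time, th, b) for th, b in subsystems)


system_reliability = system_reliability_at(t)
system_failure_rate = sum(weibull_failure_rate(t, th, b) for th, b in subsystems)

# Shortcut valid only for constant failure rates; it is wrong in general here.
naive_reliability = math.exp(-system_failure_rate * t)

print(system_reliability, system_failure_rate, naive_reliability)
```

The system failure rate is the instantaneous rate at time t, which is why multiplying it by t does not reproduce the system reliability.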

(Refer Slide Time: 09:47)

Now, let us discuss the parallel system. Before going to the parallel system, one brief
refresher on the series case; we have already discussed it, which is why I did not include it
here. Let us say we have two components in series following the exponential distribution.

$$
R_s(t) = 1 - \prod_{i=1}^{n} \left(1 - R_i(t)\right)
$$

$$
\mathrm{MTTF}_s = \int_0^{\infty} R_s(t)\,dt
$$

$$
\lambda_s(t) = \frac{f_s(t)}{R_s(t)}
$$

Then, if we are talking about two such components in series, the system failure rate will be
the summation of the two component failure rates. Now, let us go forward with the parallel
system. For a parallel system, the components are in parallel in the reliability block diagram.
But functionally, or in the actual circuit, it is not necessary that they are in parallel. For
example, let us say we put two resistances in parallel.

Let us say this value is R and this value is also R, but my resistance requirement is R by 2.
In that case, if one of them fails, my job is not done, because the resistance becomes R, and R
is not acceptable; my circuit will fail. So, though functionally they are in parallel, for
reliability purposes they are not in parallel, because if one of the resistances fails, my
system fails. So, we do not look at the functional layout. We look at whether all the
components are compulsorily required to work, or whether one can take over the functional
requirement of another.

Only when they are able to take over the functional requirement do we model them in parallel.
In parallel, this component and this component will generally be doing the same function, so if
one fails, the other will take care of it. In this case, to calculate the system reliability I
have to use the union law. Let us say I have two components with reliabilities R1 and R2, and as
we discussed earlier, this is event E1 and this is event E2.

So, the reliability I am interested in is equal to the probability of the union of E1 and E2.
From the union law, this is equal to probability of E1 plus probability of E2 minus probability
of E1 into probability of E2, given that E1 and E2 are independent. If they are not
independent, the last term becomes probability of E1 given E2, multiplied by probability of E2.
Now, when we use this formula it becomes troublesome, because the number of terms keeps
increasing. For two components, the number of terms generated is 2 raised to the power 2,
minus 1.

If I have n components, I will have 2 to the power n, minus 1 terms. Let us take an example of
3 components: R is equal to probability of E1 union E2 union E3. For two events we had R1 plus
R2 minus R1R2. In the same way, for 3 events we first get R1 plus R2 plus R3, and then minus
the combinations of 2 events, that is, minus R1R2 minus R1R3 minus R2R3; this is 3 C 2,
choosing 2 events at a time out of the 3. First I choose one out of the three, the single
combinations; then the double combinations. All combinations with an odd number of events are
preceded by a plus sign, and those with an even number of events are preceded by a negative
sign.

And for three out of three, we add plus R1R2R3. If you see here, this is 2 to the power 3,
minus 1: 7 terms are there, 1, 2, 3, 4, 5, 6, 7. If we use more components, the number of terms
becomes very, very high, so this formula is a little large to solve. In place of solving this,
we can do it in a little easier way, using the reverse formula. What is the reverse formula?
This also we have discussed briefly earlier; I will discuss it again in detail here. If I want
to know the probability of E1 union E2, I can write it as 1 minus probability of E1 union E2,
whole bar.

That means the complement of E1 union E2. Now, we can use De Morgan's theorem, which says that
E1 union E2, whole bar, is equal to E1 bar intersection E2 bar. That is, when the bar over the
complete event is broken, each individual event is complemented and the sign is also inverted:
union becomes intersection, and if it had been intersection, it would have become union. Now,
what happens? This becomes simpler, because for independent events I only need the probability
of E1 bar and the probability of E2 bar, and I multiply them.

And what is probability of E1 bar? That is 1 minus R1. And what is probability of E2 bar? That
is 1 minus R2. Similarly, if I have 3 events and I am interested in probability of E1 union E2
union E3, I can write it in the same way as 1 minus probability of E1 union E2 union E3, whole
bar, which becomes 1 minus probability of E1 bar intersection E2 bar intersection E3 bar. That
I can write as 1 minus the product of 1 minus R1, 1 minus R2 and 1 minus R3. In general, if I
have n elements, I can simply write R s is equal to 1 minus the product of the 1 minus R i's.

Essentially, 1 minus R i is the failure probability F i of component i. Let us say we are
talking about two elements. If any one of the elements works, the system works; so the system
fails only when both elements fail, which is E1 bar intersection E2 bar. If n elements are
there, then all n elements must fail, and then only the system fails.

So, how can I get the system failure probability? It is the multiplication of the failure
probabilities of each component; by multiplying these, I get the unreliability of the system.
So, the system unreliability F s is equal to the product over i equal to 1 to n of 1 minus R i,
where 1 minus R i is F i. And if I want R s, R s is equal to 1 minus F s; subtracting from 1
gives the reliability. So, we use two rules: whenever a series system is there, the
reliabilities are multiplied to get the system reliability; whenever a parallel system is
there, the failure probabilities are multiplied to get the system failure probability. A series
system is reliable only when all components work, and a parallel system is unreliable only when
all components fail. With this analogy our computation becomes simpler, and we do not have to
use the inclusion-exclusion formula, where we need to calculate so many terms. So, we can
calculate the reliability by using this formula.
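
To see that the two routes agree, the sketch below computes the reliability of a parallel system of independent components both by the inclusion-exclusion expansion and by the complement product. The reliability values are hypothetical, and the function names are illustrative.

```python
import math
from itertools import combinations


def parallel_reliability_inclusion_exclusion(reliabilities):
    """Sum over all non-empty subsets: plus for odd size, minus for even size."""
    total = 0.0
    for k in range(1, len(reliabilities) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(reliabilities, k):
            total += sign * math.prod(combo)
    return total


def parallel_reliability(reliabilities):
    """R_s = 1 - product of (1 - R_i)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def inclusion_exclusion_term_count(n):
    """Number of terms in the expansion, which equals 2**n - 1."""
    return sum(math.comb(n, k) for k in range(1, n + 1))


reliabilities = [0.9, 0.8, 0.7]  # hypothetical component reliabilities

print(parallel_reliability_inclusion_exclusion(reliabilities))
print(parallel_reliability(reliabilities))
print(inclusion_exclusion_term_count(len(reliabilities)))
```

The expansion needs a number of terms that grows as 2 to the power n, minus 1, while the complement product needs only n factors.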

Once we get this reliability equation, integrating it from 0 to infinity gives the system MTTF.
And the system failure rate is nothing but f s t divided by R s t, where f s t is minus
dR s t over dt. So, the same formulas which we discussed apply here. In a simple way, remember
that for a series system the reliabilities get multiplied to give the system reliability, and
for a parallel system the unreliabilities get multiplied to give the system unreliability. So,
for a parallel system the system unreliability decreases with each added component, and for a
series system the system reliability decreases, because with each multiplication the
probability becomes lower.

So, for a series system, the reliability is limited by the least reliable component: the system
can never be more reliable than its weakest component. Similarly, for a parallel system, the
reliability is governed by the most reliable component, that is, the least unreliable one. The
system failure probability is always less than or equal to the smallest component failure
probability, so the reliability of the system is at least as high as that of the strongest
component which we have.
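
These two bounds can be checked numerically with a short sketch, assuming independent components; the reliability values below are hypothetical.

```python
import math

reliabilities = [0.90, 0.95, 0.99]  # hypothetical component reliabilities

# Series: product of reliabilities, never above the weakest component.
series_reliability = math.prod(reliabilities)

# Parallel: 1 - product of unreliabilities, never below the strongest component.
parallel_reliability = 1.0 - math.prod(1.0 - r for r in reliabilities)

print(series_reliability, min(reliabilities))
print(parallel_reliability, max(reliabilities))
```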

(Refer Slide Time: 20:58)

Let us take one example. A system consists of three components in parallel configuration. These
are the failure rates; I think they are the same as what we used earlier, but I will type them
again. FR1 is 0.065e minus 3, FR2 is 0.18e minus 3, and FR3 is 0.96e minus 3. These are per
hour, so my time should also be in hours, and my time is 500 hours. Remember that we should use
the same unit throughout. Now, I can get R1, R2, R3 as we did earlier. R1 is equal to the
exponential of minus lambda into t, and I fix the reference to t because I am going to use the
same t for all.

So, I have got R1, R2, R3. Now, rather than the reliabilities, I will calculate the
unreliabilities F1, F2, F3. What is F1? F1 is 1 minus R1; similarly, I get the others. As we
discussed earlier, for a series system I took the multiplication of reliabilities; for a
parallel system I have to take the multiplication of unreliabilities, and my reliability turns
out to be 1 minus this product. So, my system failure probability is this and my system
reliability is this. The same thing is done here on the slide: 0.99895 becomes my reliability.
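
The same parallel calculation can be sketched in Python using the failure rates and mission time from the example; variable names are illustrative.

```python
import math

failure_rates = [0.065e-3, 0.18e-3, 0.96e-3]  # per hour, from the example
t = 500.0                                     # hours

reliabilities = [math.exp(-lam * t) for lam in failure_rates]
unreliabilities = [1.0 - r for r in reliabilities]

# Parallel system: multiply the unreliabilities, then subtract from 1.
system_unreliability = math.prod(unreliabilities)
system_reliability = 1.0 - system_unreliability

print(unreliabilities)
print(system_unreliability, system_reliability)
```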

And how much will the system failure rate be here? To get the system failure rate or the MTTF,
we first need the system reliability as a function of time. I could work with the product form,
but that would be a little difficult to follow; I will use the expanded formula which we
discussed earlier. We have three elements, so R s is R1 plus R2 plus R3, minus R1R2, minus
R2R3, minus R1R3, plus R1R2R3. We know R1 is e to the power minus lambda 1 t, so the first
three terms are e to the power minus lambda 1 t, plus e to the power minus lambda 2 t, plus e
to the power minus lambda 3 t.

Then, minus R1R2 is minus e to the power minus (lambda 1 plus lambda 2) t, minus R2R3 is minus
e to the power minus (lambda 2 plus lambda 3) t, minus R1R3 is minus e to the power minus
(lambda 1 plus lambda 3) t, and plus R1R2R3 is plus e to the power minus (lambda 1 plus lambda
2 plus lambda 3) t. This is my R s t, and I want to know the MTTF. The MTTF of the system is
the integral of R s t from 0 to infinity dt, so I integrate all these terms with respect to
time t. We know that the integral from 0 to infinity of e to the power minus lambda t dt is
equal to 1 upon lambda.

So, integrating term by term, the first term gives 1 upon lambda 1, the second gives 1 upon
lambda 2, and the third gives 1 upon lambda 3. Then the minus terms come: minus 1 upon (lambda
1 plus lambda 2), minus 1 upon (lambda 2 plus lambda 3), and minus 1 upon (lambda 1 plus
lambda 3). The last term is plus 1 upon (lambda 1 plus lambda 2 plus lambda 3). This becomes my
system MTTF. I can do this in the Excel sheet. I have got FR1, FR2, FR3; I take FR1 plus FR2,
which is this cell plus this cell.

Similarly, I get FR1 plus FR3 and FR2 plus FR3, and then FR1 plus FR2 plus FR3, which is this
plus this plus this. Now, to calculate the MTTF of the system in an easier way, I insert a
column for 1 upon FR, that is 1 divided by each of these values. I am not interested in 1 upon
t, so I move the time value a little lower and delete this row. Now, what is the formula? The
MTTF is equal to the sum of 1 upon FR for the individual failure rates, minus the sum of 1 upon
FR for the double combinations, plus 1 upon FR for the combination of three.

As you see, around 16877, or roughly 16,900 hours, becomes my MTTF. So, for the MTTF
calculation in the parallel case, what do I have to do? I take the single combinations, the
double combinations, and so on; the number of combinations is 2 to the power n, minus 1, like
the seven combinations we have here. Accordingly we combine them and get the system MTTF.
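
The term-by-term integration can be written as a short function that loops over all combinations of failure rates. This is a sketch for independent CFR components in parallel; the function name is illustrative, and the failure rates are those of the example.

```python
from itertools import combinations


def parallel_mttf_exponential(failure_rates):
    """MTTF of a parallel CFR system by integrating the expanded R_s(t).

    Each subset of components contributes 1 / (sum of its failure rates),
    with a plus sign for odd-sized subsets and a minus sign for even-sized.
    """
    mttf = 0.0
    for k in range(1, len(failure_rates) + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for combo in combinations(failure_rates, k):
            mttf += sign / sum(combo)
    return mttf


failure_rates = [0.065e-3, 0.18e-3, 0.96e-3]  # per hour, from the example
system_mttf = parallel_mttf_exponential(failure_rates)

print(system_mttf)
```

Because the loop visits 2 to the power n, minus 1 subsets, this direct approach is practical only for a small number of components.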

(Refer Slide Time: 29:19)

Let us see one more example, a simpler one. Let us say we have two components in parallel, both
following the constant failure rate model. We know R s t is equal to 1 minus the product of the
unreliabilities, or we can say it is equal to R1 t plus R2 t, minus R1 t into R2 t. R1 t is e
to the power minus lambda 1 t, R2 t is e to the power minus lambda 2 t, and R1 t into R2 t is
e to the power minus (lambda 1 plus lambda 2) t. To get the MTTF, I integrate this from 0 to
infinity. The first term gives 1 upon lambda 1, the second gives 1 upon lambda 2, and the third
gives minus 1 upon (lambda 1 plus lambda 2). So, as you see here, this becomes my MTTF of the
system.

Now, let us see the case when both lambdas are the same. Then this becomes 1 upon lambda plus 1
upon lambda minus 1 upon 2 lambda, or we can say 2 upon lambda minus 1 upon 2 lambda. Taking 2
lambda as the common denominator, the numerator becomes 4 minus 1, so this is equal to 3 divided
by 2 lambda. As we discussed earlier, 1 upon lambda is the MTTF of a component. So, the MTTF of
the system is 1.5 times the MTTF of an individual component, because 3 by 2 is 1.5. So, when we
use two components in parallel, the MTTF is not doubled, whereas when we use two components in
series, the MTTF becomes half.

So, the improvement shows diminishing returns. When we use three components, we get a different
value, and it is not 3 times the component MTTF. It becomes 3 upon lambda, minus three cases of
1 upon 2 lambda, plus one case of 1 upon 3 lambda, that is, 3 upon lambda minus 3 by 2 lambda
plus 1 by 3 lambda. If I take 6 lambda as the common denominator, the numerator becomes 18
minus 9 plus 2, which is 11, so the result is 11 by 6 lambda, and 1 upon lambda I can write as
MTTF i. So, it is 11 by 6, around 1.83 times MTTF i, which is less than 2.
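
The multipliers 3 by 2 and 11 by 6 can be reproduced exactly with fractions. The sketch below assumes n identical, independent CFR components in parallel and evaluates the same plus and minus combination sum; the function name is illustrative. The comparison with the harmonic sum 1 + 1/2 + ... + 1/n is a known identity and is included only as a cross-check, not as part of the lecture.

```python
from fractions import Fraction
from math import comb


def parallel_mttf_multiplier(n):
    """MTTF_s / MTTF_i for n identical CFR components in parallel.

    There are C(n, k) combinations of k components, each contributing
    1 / (k * lambda), with alternating signs starting from plus.
    """
    return sum(Fraction((-1) ** (k + 1) * comb(n, k), k)
               for k in range(1, n + 1))


def harmonic_sum(n):
    """1 + 1/2 + ... + 1/n, used here only as a cross-check."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


for n in range(1, 6):
    print(n, parallel_mttf_multiplier(n), float(parallel_mttf_multiplier(n)))
```

Each added component contributes a smaller increment than the previous one, which is the diminishing return described above.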

So, for two components it became 1.5 times the MTTF; for three components it is not even 2
times the MTTF, but a little less than that. We will stop this lecture here and continue our
discussion in the next lecture. Thank you.

Introduction to Reliability Engineering
Professor. Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 17
System Reliability Modelling (Contd.)

Hello everyone. So, now we are moving to lecture number 17. Today we will continue our
discussion of system reliability modelling. In previous lectures, we discussed how a series
system is modelled for reliability evaluation and how we can evaluate reliability for a
parallel system. In a series system, all parts of the system should work for the system to
work, and in a parallel system there is redundancy: if one part does not work but another one
is working, the system will still continue functioning. Today, we will discuss some more
configurations in which a system can be arranged, for reliability evaluation purposes.

(Refer Slide Time: 01:07)

One is the series-parallel system, that is, series combinations which are connected in
parallel. If we see here, we have a series of 4 components here, a series of 3 here, a series
of 5 here, and a series of 2 components here, and these branches are connected in parallel. Now
we want to know the reliability of this system. To solve this, we generally first solve the
series branches.

$$
\begin{aligned}
r_1 &= r_{11} r_{12} r_{13} r_{14} = (0.65)(0.72)(0.77)(0.82) = 0.2955 \\
r_2 &= r_{21} r_{22} r_{23} = (0.75)(0.67)(0.95) = 0.4774 \\
r_3 &= r_{31} r_{32} r_{33} r_{34} r_{35} = (0.90)(0.77)(0.68)(0.81)(0.82) = 0.3130 \\
r_4 &= r_{41} r_{42} = (0.95)(0.87) = 0.8265
\end{aligned}
$$

So, let us say the reliability values I have here are: this component has 0.65, this is 0.72,
this is 0.77, and this is 0.82. As we know, the reliability of a series system is the
multiplication of the reliability values. So, r1 is the multiplication of 0.65, 0.72, 0.77 and
0.82, which turns out to be 0.2955, as we have evaluated here.

So, we have now converted this branch into one block, r1. Similarly, these 3 blocks can be
converted into a single block, which we can call r2; it is the multiplication of r21, r22 and
r23. r21 is 0.75, r22 is 0.67 and r23 is 0.95, and when we multiply the reliabilities of these
3, r2 comes out to be 0.4774. Similarly, r3 is the combined block of this branch: multiplying
0.9, 0.77, 0.68, 0.81 and 0.82, the reliability turns out to be 0.3130.

Similarly, for these two components the reliabilities are 0.95 and 0.87; when we multiply the
two, r4 is equal to 0.8265. As we see here, our system is now converted into a parallel system
of 4 blocks, each block representing one series branch. So, the system reliability is the
parallel combination of these, and for parallel we know it is 1 minus the product of 1 minus
r i, with i equal to 1 to 4.

When we apply this formula with 1 minus r1, 1 minus r2, 1 minus r3 and 1 minus r4, we get the
final reliability. Let us do it in the Excel sheet.

(Refer Slide Time: 04:28)

For this calculation, what is r1? r1 is equal to 0.65 multiplied by 0.72, multiplied by 0.77,
multiplied by 0.82. What is r2? r2 is equal to 0.75 multiplied by 0.67, multiplied by 0.95.
And r3 is equal to 0.9 multiplied by 0.77, multiplied by 0.68, multiplied by 0.81, multiplied
by 0.82. And r4 is the product of 2 components, 41 and 42, which are 0.95 and 0.87.

So, we have evaluated the series branches, and these branches are in parallel. For parallel,
what do we need to do? First, we take 1 minus r1, and similarly 1 minus r2, 1 minus r3 and
1 minus r4. I can get this easily as 1 minus r1, and if I copy and paste the formula, I get
1 minus r2, 1 minus r3 and 1 minus r4. Then I need to multiply them, so I take the product of
these terms.

When I take the product of these terms, what I get is the system unreliability, because each
1 minus ri is the unreliability of that branch. As we discussed earlier, a parallel system
fails only when all the parallel branches fail, so the branch unreliabilities get multiplied.
This product is my system unreliability, and if I want the reliability, it is 1 minus this.

So, as we discussed earlier and as you see now, for series components the reliabilities are
multiplied to get the system reliability. For parallel systems, we first convert the
reliability values into unreliabilities, multiply the unreliabilities to get the system
unreliability, and then subtract it from 1 to get the system reliability.

(Refer Slide Time: 07:25)

Similarly, we may have a parallel-series system. In a parallel-series system, as we see here,
the components are first connected in parallel, and these parallel blocks are then connected
in series. So, this is one parallel block, this is another, this is another, and they are
connected in series. To calculate the reliability of a parallel block, as we discussed
earlier, we first convert to unreliabilities: this becomes 0.19, this becomes 0.17, this
becomes 0.22 and this becomes 0.21. Then we multiply these unreliabilities and subtract the
result from 1, which gives the reliability of this block as 0.9985.

Similarly, for the second block I take 1 minus each value, multiply, and subtract the result
from 1, which gives 0.9973. For the third block I take 0.02 and 0.09; when we multiply, we
get 0.0018, and when I subtract from 1, I get 0.9982. Now I have r1, r2 and r3, and they are
connected in series, so I multiply the reliability values: 0.9985 into 0.9973 into 0.9982,
which gives the final reliability as 0.9939. For practice, I am showing this once again using
the Excel sheet.

(Refer Slide Time: 09:17)

In the Excel sheet, to solve the first block, I first enter 0.81, 0.83, 0.78 and 0.79. The
unreliability of each component is 1 minus its reliability. From these unreliabilities I take
the product of all the terms, which gives the unreliability of the first block, that is
1 minus r1, and r1 is 1 minus this value.

Similarly, I can get the second block reliability for 0.92, 0.81 and 0.82: I take 1 minus
each of these, then 1 minus the product of these terms, and this gives me r2. Similarly I can
get r3, which we have already shown in the calculation: it is equal to 1 minus (1 minus 0.98)
into (1 minus 0.91). So, my system is the series of r1, r2 and r3, and RS is equal to r1
multiplied by r2 multiplied by r3, which comes out to be 0.9939. So, as we see, when
components are in a series-parallel combination, whatever the combination is, we can reduce
and solve it step by step.
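
The parallel-series calculation above can be sketched the same way in Python. This is a minimal sketch that mirrors the spreadsheet columns; the name `block_reliability` is illustrative, and independent component failures are assumed.

```python
from math import prod

# Parallel blocks (component reliabilities from the lecture example),
# with the blocks themselves connected in series.
blocks = [
    [0.81, 0.83, 0.78, 0.79],
    [0.92, 0.81, 0.82],
    [0.98, 0.91],
]

def block_reliability(components):
    """A parallel block fails only if every component in it fails."""
    unreliability = prod(1.0 - r for r in components)
    return 1.0 - unreliability

block_values = [block_reliability(b) for b in blocks]

# The reduced blocks r1, r2, r3 are in series, so their reliabilities multiply.
system_reliability = prod(block_values)

print([round(r, 4) for r in block_values])
print(round(system_reliability, 5))
```

Carried at full precision, the product is about 0.99398, which is the 0.9939 shown on the slide to four digits.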

You may take some more configurations. Let us say we have a system like this, where we have
reliability 0.9 here, 0.8 here and 0.7 here, and I want to know the system reliability. We
try to reduce it: as we see, 0.8 and 0.7 are in parallel, so first I reduce this block. The
unreliabilities are 0.2 and 0.3, so the block reliability is 1 minus 0.2 into 0.3, that is
1 minus 0.06, which is 0.94. So, this block becomes 0.94.

Now I can get the reliability easily: 0.9 into 0.94, which is 0.846. Similarly, any other
combination that is made up of series and parallel connections can always be solved using
these methods.
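
Because the reduction is applied block by block, it lends itself to a small recursive evaluator. The sketch below is one possible way to write it; the nested-tuple encoding of a block diagram is a hypothetical choice made for illustration, not something defined in the lecture, and independent failures are assumed.

```python
from math import prod

def reliability(block):
    """Evaluate a nested series/parallel block diagram.

    A block is either a number (a component reliability) or a tuple
    (kind, children) with kind equal to "series" or "parallel".
    """
    if isinstance(block, (int, float)):
        return float(block)
    kind, children = block
    values = [reliability(child) for child in children]
    if kind == "series":
        return prod(values)
    if kind == "parallel":
        return 1.0 - prod(1.0 - v for v in values)
    raise ValueError(f"unknown block type: {kind!r}")

# The small example from the lecture: 0.9 in series with (0.8 parallel 0.7).
small_system = ("series", [0.9, ("parallel", [0.8, 0.7])])
result = reliability(small_system)
print(round(result, 3))
```

The inner parallel block reduces to 0.94 and the whole system to 0.846, as in the hand calculation.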

(Refer Slide Time: 12:53)

Generally, a system may have redundancy in different ways. For example, you may have multiple
components available with you, and to achieve the system function you can connect them as a
parallel system.

If you put components in parallel, that means multiple components have to fail for a
particular function to fail; only then will the system fail. In that case we are doing
redundancy at the lower, component level. Let us say we have a design such as a general
communication system: you may have an antenna here, a receiver here, a transmitter here and
a power supply here.

Now, if you want redundancy, there are 2 possible ways to do it. One is to make the complete
system redundant: you use 2 such systems separately, each with A, R, T and P. The other is to
use one system in which each component has redundancy: there are 2 A, 2 R, 2 T and 2 power
supplies in the one system.

Duplicating each component inside the system is low-level redundancy, because the redundancy
is applied at the component level. Duplicating the complete system is high-level redundancy,
because the duplication is done at the system level itself. So, when we investigate the
reliability of such systems, we speak of high-level redundancy and low-level redundancy.

(Refer Slide Time: 15:01)

Let us take the low-level redundancy first. We discussed 4 components there; let us say there
are only 2. If we talk about a transceiver system, we have only a transmitter and a receiver.
Let us say A means the transmitter and B means the receiver, and the transceiver works when
both the transmitter and the receiver are working. In that case, what do we mean by low-level
redundancy?

We connect 2 transmitters in parallel, so if either transmitter works, we are able to
transmit. So, at the design level, in the circuit, we have 2 transmitters and 2 receivers.
Whenever the transmit function is needed, either unit can do it; similarly, the receive
function can be done by either unit. We can call them A1, A2, B1 and B2 to differentiate
between them.

The other case is that one system has A1 and B1 and another system has A2 and B2. That means
we have a complete transceiver 1, say on one PCB, and another complete transceiver 2, and if
either transceiver is working, the system works. Now, let us see how this makes a difference.

Suppose we want to know the reliability for this. Let us say all reliabilities are the same,
r. Then, as per the principles we discussed, this pair is in parallel, so it becomes
1 − (1 − r)², and similarly this pair also becomes 1 − (1 − r)². If we expand, each becomes
2r − r². These two blocks are in series, so we multiply them, and the reliability of the
low-level design is (2r − r²)².

Now, here we have the high-level redundancy, which means that at least one complete pair
should work. Each component again has reliability r. Within a pair, the components are
connected in series, so the reliability of this pair is r² and of this pair is also r². The
pairs are in parallel, so the reliability is 1 − (1 − r²)², which expands to 2r² − r⁴. So,
for low-level redundancy we have (2r − r²)² and for high-level redundancy we have 2r² − r⁴.
The natural question is which one is higher.

(Refer Slide Time: 18:19)

\[
\begin{aligned}
R_{low} - R_{high} &= (2r - r^2)^2 - (2r^2 - r^4) \\
&= 2r^2 (r - 1)^2 \ge 0
\end{aligned}
\]

As we see here, I take the reliability of the low-level redundancy minus the reliability of
the high-level redundancy. Expanding (2r − r²)² gives 4r² plus r⁴ minus 4r³, and from this we
subtract 2r² minus r⁴. When we simplify, 4r² minus 2r² is 2r², r⁴ plus r⁴ is 2r⁴, and we have
minus 4r³, so the difference is 2r² − 4r³ + 2r⁴.

If I take 2r² common, this becomes 2r²(1 − 2r + r²), which we know is 2r²(1 − r)². Now, r² is
a square, (1 − r)² is also a square, and 2 is positive. So, for any value of r this can never
be negative; it is always greater than or equal to 0. That means R low is never lower than
R high, and it is strictly higher whenever r is between 0 and 1. So, we can say that
lower-level redundancy gives better reliability compared to high-level redundancy, which is
also intuitive.

If we look here, in the low-level design, if this A fails and this B fails, my system can
still continue to work with the other A and the other B. But in the high-level design, if
this A fails and that B fails, my system will not work, because this subsystem fails and that
subsystem also fails. So, the low-level design works in more combinations of component states
and the high-level design works in fewer. Lower-level redundancy therefore generally gives
better reliability, but for a designer, implementing lower-level redundancy is challenging.
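
The "more combinations" argument can be checked by listing all 16 up/down states of A1, A2, B1 and B2. The sketch below does this by brute force; the function names and the trial value r = 0.9 are assumptions chosen only for illustration, and components are taken as independent.

```python
from itertools import product

def low_level_works(a1, a2, b1, b2):
    # (A1 or A2) in series with (B1 or B2)
    return (a1 or a2) and (b1 or b2)

def high_level_works(a1, a2, b1, b2):
    # (A1 and B1) in parallel with (A2 and B2)
    return (a1 and b1) or (a2 and b2)

def system_reliability(works, r):
    """Sum the probabilities of all component states in which the system works."""
    total = 0.0
    for state in product([True, False], repeat=4):
        if works(*state):
            p = 1.0
            for up in state:
                p *= r if up else (1.0 - r)
            total += p
    return total

states = list(product([True, False], repeat=4))
low_count = sum(1 for s in states if low_level_works(*s))
high_count = sum(1 for s in states if high_level_works(*s))

r = 0.9  # illustrative component reliability, not a value from the lecture
r_low = system_reliability(low_level_works, r)
r_high = system_reliability(high_level_works, r)

print(low_count, high_count)
print(r_low - r_high, 2 * r**2 * (1 - r) ** 2)
```

The low-level design works in 9 of the 16 states and the high-level design in 7, and the difference in reliability agrees with 2r²(1 − r)².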

How will the two transmitters be connected together so that the signal goes on to the
receivers, and how will the two receivers then do the same? That becomes challenging from the
design point of view, while it is much easier to put two transceivers in parallel and show
that they are working. So, high-level redundancy is easier to design but gives slightly lower
reliability, whereas low-level redundancy gives higher reliability but may pose design
challenges.

(Refer Slide Time: 21:14)

Let us say we have a system with 3 components. In both the low-level and the high-level
designs we have to consider the same components: in the earlier case, two transmitters and
two receivers. The same number of components is used in both, but the design is a little
different.

Now, let us take a similar example. One electronic system has a power supply, a receiver and
an amplifier, and their reliabilities are 0.8, 0.9 and 0.85. What will be the reliability for
low-level and high-level redundancy? For low-level redundancy, each component is duplicated:
we have 3 components and 2 of each. So, for the amplifier we have 2 amplifiers, A1 and A2.

\[
\begin{aligned}
R_{high} &= 1 - [1 - (0.8)(0.9)(0.85)]^2 = 0.849 \\
R_{low} &= [1 - (1 - 0.8)^2]\,[1 - (1 - 0.9)^2]\,[1 - (1 - 0.85)^2] = 0.929
\end{aligned}
\]

Then for the power supply we take P1 and P2, which are in parallel, and for the receiver we
have R1 and R2; both units are of the same make and design. So, this is my low-level
redundancy. The high-level redundancy has P1, R1 and A1 in one set and P2, R2 and A2 in
another set. For the low-level redundancy, we know the reliabilities: both power supplies are
0.8, both receivers are 0.9 and both amplifiers are 0.85.

To calculate the reliability of the power supply block, the unreliability is 0.2 into 0.2, so
the reliability is 1 minus 0.04, that is 0.96. Similarly, for the receiver block the
unreliability is 0.1 into 0.1, so the reliability is 1 minus 0.01, that is 0.99. For the
amplifier block it is 1 minus 0.15 squared, that is 1 minus 0.0225, which is 0.9775.

Once we multiply these 3 values, we get the low-level reliability as 0.929. For the
high-level redundancy, we first take the product of the 3 reliabilities, 0.8 into 0.9 into
0.85. We subtract this product from 1, take the square, and subtract from 1 again, which
gives the high-level reliability as 0.849. So, this can be solved in the same way as before,
without much difficulty.
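
For components with different reliabilities, the two designs can be compared with two short functions. This is a minimal sketch; `low_level`, `high_level` and the `copies` parameter are illustrative names, and all units are assumed to fail independently.

```python
from math import prod

def low_level(reliabilities, copies=2):
    """Each component is duplicated: a series of parallel groups."""
    return prod(1.0 - (1.0 - r) ** copies for r in reliabilities)

def high_level(reliabilities, copies=2):
    """The whole chain is duplicated: a parallel of series chains."""
    chain = prod(reliabilities)
    return 1.0 - (1.0 - chain) ** copies

# Power supply, receiver and amplifier reliabilities from the lecture example.
parts = [0.8, 0.9, 0.85]
r_low = low_level(parts)
r_high = high_level(parts)

print(round(r_low, 3), round(r_high, 3))
```

These reproduce the 0.929 and 0.849 from the slide, with the low-level design again the higher of the two.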

(Refer Slide Time: 24:50)

So, this gives us the reliability for low-level and high-level redundancy. There can be other
types of systems where multiple identical items are available and working. Let us say we have
a production plant with m machines, say 10, but we generally require only around 7 of them,
and that is enough to meet our production requirement.

That means if 7 out of the 10 machines are working, our production requirements are
fulfilled. So, we have redundancy here: with only 7 machines our job would continue, and the
extra 3 machines provide redundancy and improve the reliability, because if one or more
machines fail, we are able to use the others.

Or all of them are used continuously, and the load is transferred to the remaining machines.
So, here we need k of the m units to operate for system success. The same applies, let us
say, in an industry with a gas supply or a liquid supply. Multiple pumps may be installed and
all of them may be working continuously, but to meet the critical requirement you may not
need all the pumps; even if 1 or 2 fail, the system will continue to function and the
requirement will be fulfilled.

So, in this case we need only k units to work. Let us say we have 4 pumps, P1, P2, P3 and P4,
and all are the same, which means their reliabilities are the same, and we need only 3 pumps.
Then we have these combinations: if P1, P2 and P3 work and P4 does not, my system works; if
P1, P2 and P4 work and P3 does not, my system works; if P1, P3 and P4 work and P2 does not,
my system works; and if P1 does not work but P2, P3 and P4 work, my system also works.

So, we have 4 items, and exactly 3 of the 4 working gives 4C3 combinations. In all these
combinations the system works, and each combination has the same probability. How much is it?
3 units should work, which is r to the power 3, and 1 unit is not working, which is q, where
q is 1 minus r. The same holds for each of the combinations: 3 working and 1 not working. So,
when we sum over the 4C3 combinations, we get 4C3 r³ q.

There is another case in which the system works: all 4 are working, P1, P2, P3 and P4, which
is r to the power 4. In all cases where k or more than k units are working, the system is
working. So, we take all such combinations for i equal to k to m, that is k working, or more
than k, up to all working. mCi is the number of combinations with i units working, and each
such combination has probability r(t) raised to the power i, as we took r to the power 3 for
3 working, multiplied by (1 minus r(t)) raised to the power m minus i for the units that are
not working. When we take the summation, we get the system reliability.
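
The k-out-of-m summation can be written directly with the binomial coefficient and checked against a brute-force enumeration of unit states. This is a minimal sketch for identical, independent units; the function names and the trial value r = 0.9 are assumptions for illustration only.

```python
from itertools import product
from math import comb

def k_out_of_m(k, m, r):
    """Sum of mCi * r^i * (1 - r)^(m - i) for i = k .. m."""
    return sum(comb(m, i) * r**i * (1.0 - r) ** (m - i) for i in range(k, m + 1))

def k_out_of_m_by_enumeration(k, m, r):
    """Same quantity, obtained by listing every up/down state of the m units."""
    total = 0.0
    for state in product([1, 0], repeat=m):
        working = sum(state)
        if working >= k:
            total += r**working * (1.0 - r) ** (m - working)
    return total

r = 0.9  # illustrative unit reliability, not a value from the lecture
closed_form = k_out_of_m(3, 4, r)
brute_force = k_out_of_m_by_enumeration(3, 4, r)

print(closed_form, brute_force)
```

As a sanity check on the formula, k = m gives the series result r^m and k = 1 gives the parallel result 1 − (1 − r)^m.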

Once we get the system reliability, we can get other quantities. If we want the system
failure rate, we take the negative derivative of RS(t) with respect to t and divide by RS(t).
And if the failure rate of each unit is constant, the MTTF is the summation, for i equal to k
to m, of 1 upon (i lambda); since lambda is independent of i, we can write it as 1 upon
lambda multiplied by the summation of 1 upon i for i equal to k to m.

So, for the 3-out-of-4 system, i goes from 3 to 4, and the MTTF is 1 upon lambda multiplied
by (1 by 3 plus 1 by 4). For non-identical units, we either have to enumerate and evaluate
all the combinations, or we can use Markov models, which we will discuss a little later, next
week; Markov models can also be used to determine the reliability of such systems. For
identical units, as here, the number of working units follows the binomial distribution
or…

(Refer Slide Time: 30:13)

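\[
\begin{aligned}
R &= \exp\left(-0.88 \times 10^{-3} \times 500\right) = 0.6440 \\
R_s &= \sum_{i=3}^{4} {}^{4}C_i \; 0.6440^{\,i} \, (1 - 0.6440)^{4-i} = 0.5524 \\
MTTF_s &= \frac{1}{0.88 \times 10^{-3}} \sum_{i=3}^{4} \frac{1}{i} = 663 \text{ hrs.}
\end{aligned}
\]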
So, let us take one example. A system consists of 4 units, and it is considered to be working
if any 3 units are operational, as we discussed earlier. So, 3 working and 4 working are both
working scenarios. The failure rate of each unit is 0.88 into 10 to the power minus 3. The
reliability is e to the power minus lambda into t (the minus sign must be there), where t is
500, and it turns out to be 0.6440 for each unit; this becomes my r. Now, for the system
reliability I have 4C3 r³ q plus 4C4 r⁴. So, 4 into 0.6440 cubed into (1 minus 0.6440), plus
4C4, which is 1, multiplied by 0.6440 raised to the power 4.

Once we calculate this, we get the system reliability, 0.5524. And as we discussed earlier,
the system MTTF is 1 divided by lambda, multiplied by (1 by 3 plus 1 by 4), which comes out
to be 663 hours.
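
The worked example can be reproduced in a few lines of Python. This is a minimal sketch; the variable names are illustrative, and the failure rate is assumed to be per hour, consistent with the MTTF being quoted in hours.

```python
from math import comb, exp

failure_rate = 0.88e-3   # per unit; assumed to be per hour
mission_time = 500.0     # hours
k, m = 3, 4              # 3-out-of-4 system

# Per-unit reliability under the constant failure rate model.
unit_reliability = exp(-failure_rate * mission_time)

# System works if 3 or 4 units work.
system_reliability = sum(
    comb(m, i) * unit_reliability**i * (1.0 - unit_reliability) ** (m - i)
    for i in range(k, m + 1)
)

# MTTF of a k-out-of-m system of identical constant-failure-rate units.
system_mttf = (1.0 / failure_rate) * sum(1.0 / i for i in range(k, m + 1))

print(round(unit_reliability, 4), round(system_reliability, 4), round(system_mttf))
```

The three values agree with the slide: about 0.6440 per unit, 0.5524 for the system, and an MTTF of about 663 hours.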

(Refer Slide Time: 31:42)

We can do this quickly in the Excel sheet. The reliability is exponential of minus lambda t,
where lambda is 0.88e-3 and t is 500; this becomes my per-unit reliability. Now, I calculate
the first term, 4C3, that is 4, into the reliability raised to the power 3, multiplied by the
unreliability, that is (1 minus reliability) raised to the power 1. This becomes my first
combination.

The second combination is the reliability raised to the power 4, and the total of the two,
that is either 3 working or 4 working, becomes my system reliability. And the MTTF is equal
to 1 divided by lambda, where lambda is 0.88e-3, multiplied by (1 by 3 plus 1 by 4). So,
thank you for listening. We will continue our discussion on system reliability and discuss
other configurations; we will stop here for today. Thank you.

Introduction to Reliability Engineering
Professor. Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 18
System Reliability Modelling (Contd.)

Hello everyone. We will continue our discussion on system reliability modelling. We have
discussed the series configuration, the parallel configuration, parallel-series and
series-parallel systems, and also k-out-of-m models. Today we will discuss some more
configurations, which are a little more complex.

(Refer Slide Time: 00:51)

Let us look at this kind of system. This can be a communication network; it is also called a
bridge network. If we look at it, this system is not series-parallel: if I want to reduce it
using series and parallel reductions, I am not able to do so. E is not in parallel with B,
nor is it in series with B, nor is B in series with D, because there is another connection
coming in.

Because of this configuration, the series-parallel reduction method we discussed, in which we
reduce series elements and parallel elements step by step, is not possible here. So, to solve
these kinds of problems, we have to use different methods.

(Refer Slide Time: 01:44)

The first method we discuss is the decomposition method, which is also called the conditional
probability approach. It says that I can evaluate the reliability of this network by
decomposing it into two conditions. I can take any element X of the network and consider two
cases: one is the system reliability given that component X is good, multiplied by the
probability that X is good; the other is the system reliability given that X has failed,
multiplied by the probability that X has failed. This is, as we discussed, the total
probability theorem.

From the total probability theorem, we know that the probability of A is equal to the
probability of A given X1 multiplied by the probability of X1, plus the probability of A
given X1 bar multiplied by the probability of X1 bar. Here, X1 is the event that the
component is working. Generally, we can choose any element; the theorem is applicable to any
element. But as we see, this element is the one that is causing the trouble, because of which
we are not able to convert the network into series and parallel. So, we will apply the
theorem to this component, component E.

We want to know the reliability of the system. It is the reliability of the system given X1
is working, multiplied by the probability that X1 is working, which is the reliability of X1,
say R1, plus the reliability of the system given X1 is not working, multiplied by the
probability that X1 is not working, which is nothing but the unreliability Q of X1. So, we
are able to calculate the system reliability in two parts.

First, we calculate the system reliability with the component in working condition, that is,
perfectly reliable so that it cannot fail, and multiply by the reliability of the component,
RE. In the other case the component is not working, that is, in failed condition, and we
multiply by its probability of failure. If the component is working, it is perfectly
reliable: its reliability is 1, which means it is always working. So, there is no block here;
it is a shorting link. The component is converted into a shorting link in the diagram.

The other case is when E is not working, which means it is not available at all. It does not
exist, so it is removed from the system and this connection becomes open. When we solve these
two reliability block diagrams, this one gives the reliability of the system when component E
is working, and this one gives the reliability when E is not working.

So, how do we get the reliability when E is working? This diagram is now series-parallel, so
we can easily solve it: A and B are in parallel, and C and D are in parallel. The reliability
is 1 − (1 − Ra)(1 − Rb), in series with 1 − (1 − Rc)(1 − Rd), so we multiply the two.

Similarly, for the case when E is not working, A and C are in series, giving Ra Rc, and B and
D are in series, giving Rb Rd, and both are in parallel, so the reliability is
1 − (1 − Ra Rc)(1 − Rb Rd). Whatever I get for the first case I multiply by RE, whatever I
get for the second case I multiply by QE, and then I take the sum of both, which gives the
system reliability.

So, my system reliability is RE into [1 − (1 − Ra)(1 − Rb)] into [1 − (1 − Rc)(1 − Rd)], plus
(1 − RE), since QE is 1 − RE, into [1 − (1 − Ra Rc)(1 − Rb Rd)]. If we solve this, we get the
system reliability.
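
The decomposition on E translates almost line by line into code. The sketch below is a minimal illustration under the assumption of independent components; `bridge_reliability` is an illustrative name, and 0.99 is the component reliability used later in the lecture for the equal-reliability case.

```python
def bridge_reliability(ra, rb, rc, rd, re):
    """Bridge network reliability by conditioning on the bridge element E."""
    # E working (replaced by a short): (A parallel B) in series with (C parallel D).
    given_e_up = (1 - (1 - ra) * (1 - rb)) * (1 - (1 - rc) * (1 - rd))
    # E failed (removed, open): (A series C) in parallel with (B series D).
    given_e_down = 1 - (1 - ra * rc) * (1 - rb * rd)
    # Total probability theorem over the two states of E.
    return re * given_e_up + (1 - re) * given_e_down

r = 0.99
rs = bridge_reliability(r, r, r, r, r)

# Closed form for equal reliabilities, for comparison.
polynomial = 2 * r**2 + 2 * r**3 - 5 * r**4 + 2 * r**5

print(rs, polynomial)
```

For equal reliabilities the result coincides with 2R² + 2R³ − 5R⁴ + 2R⁵, which at R = 0.99 is about 0.99979805.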

(Refer Slide Time: 07:18)

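\[
\begin{aligned}
R_S &= (1 - Q_A Q_B)(1 - Q_C Q_D)\, R_E + \left(1 - (1 - R_A R_C)(1 - R_B R_D)\right) Q_E \\
&= R_A R_C + R_B R_D + R_A R_D R_E + R_B R_C R_E - R_A R_B R_C R_D \\
&\quad - R_A R_C R_D R_E - R_A R_B R_C R_E - R_B R_C R_D R_E - R_A R_B R_D R_E \\
&\quad + 2 R_A R_B R_C R_D R_E
\end{aligned}
\]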

The same thing is shown here. If we expand, the first term becomes Ra plus Rb minus Ra Rb,
and the second term becomes Rc plus Rd minus Rc Rd. If we assume all reliabilities are equal
to R, each of these is 2R − R², so their product is (2R − R²)². For the E-failed case, Ra Rc
is R² and Rb Rd is R², so we have 1 − (1 − R²)², which is 1 − (1 − 2R² + R⁴).

So, that is 2R² − R⁴, and the other is (2R − R²)², which expands to 4R² + R⁴ − 4R³. I have to
multiply the first expression, (2R − R²)², by R, and the second, 2R² − R⁴, by 1 − R. When we
solve this, we get the same answer. Let me do this here.

(2R − R²)² multiplied by R: the square gives 4R² + R⁴ − 4R³, and multiplying by R gives
4R³ + R⁵ − 4R⁴. Similarly, the second part, (1 − R) into (2R² − R⁴), is equal to
2R² − R⁴ − 2R³ + R⁵.

When we sum both, what do we get? Let us go from the low powers to the high. There is only
one R² term, 2R². For R³, we have 4R³ here and minus 2R³ here, so that becomes 2R³. For R⁴,
we have minus 4 here and minus 1 here, so minus 5R⁴.

For R⁵, we have one here and one here, so 2R⁵. It is the same expression as on the slide. So,
if each component reliability is 0.99, I can get the system reliability using this formula.
Now, let us say we have another network which is more complex, with another such bridge
section added to this one.

(Refer Slide Time: 11:35)

Let us make another system here, like the one we have made, with one more bridge section. The
complexity may be higher. Here, when I apply the decomposition to one bridge component, the
network that remains will again contain a component that is not series-parallel, and solving
it will again require the decomposition theorem. So, if I decompose on these two components,
I will actually have 4 combinations.

Similarly, if I have a more complex network with more such elements, the number of
combinations is multiplied by 2 every time. If I have 3 such components, it becomes 8
combinations. So, the more complexity and irreducibility there is due to non-series-parallel
connections, the more combinations there are, and solving the problem with the decomposition
theorem becomes difficult and time consuming. It can still be solved using the decomposition
theorem; the only thing is that it requires more time and effort.

(Refer Slide Time: 13:05)

To solve these problems, another method which can be used is the path set method. A path set
is a set of components which, if they are working, make the system work. Like here, if A and
C work, my system will work: I will be able to transfer the signal from here to here, so in
the RBD also the system is working. Or if B and D are working, my system will also work. So,
we have to list the combinations of components which, when they work, make the system work.

I can have multiple combinations here: if A and C are working, my system will work; if B and
D are working, my system will work. Another path goes from A through E and then to D, which
means that if A, E and D are working, my system will also work. Or if B, E and C are working,
my system will also work. There is no other minimal combination here.

Though I can say that if A, B, E and C are all working, the system is also going to work, and
ABEC is also called a path, it is not a minimal path, because B, E and C working is already
sufficient for the system to work.

So, a minimal path set is one of which no proper subset is a path. For example, BEC is a
minimal path: if B does not work and only E and C work, I cannot ensure that my system will
be working. But if B, E and C all 3 are working, I am sure that my system is going to work.
No subset of BEC is enough to make sure that the system works.
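
For a small network, the minimal path sets can be found by walking every route from input to output that never revisits a node. The sketch below assumes a hypothetical node labelling of the bridge (S for the input, T for the output, 1 and 2 for the inner junctions); these labels and the function names are not from the lecture.

```python
# Hypothetical node labelling: each component joins two nodes.
EDGES = {
    "A": ("S", "1"),
    "B": ("S", "2"),
    "C": ("1", "T"),
    "D": ("2", "T"),
    "E": ("1", "2"),  # bridge element, usable in both directions
}

def minimal_paths(edges, source="S", sink="T"):
    """Return the component sets of all simple source-to-sink routes."""
    adjacency = {}
    for name, (u, v) in edges.items():
        adjacency.setdefault(u, []).append((v, name))
        adjacency.setdefault(v, []).append((u, name))

    paths = []

    def walk(node, visited, used):
        if node == sink:
            paths.append(frozenset(used))
            return
        for neighbour, name in adjacency.get(node, []):
            if neighbour not in visited:  # never revisit a node
                walk(neighbour, visited | {neighbour}, used + [name])

    walk(source, {source}, [])
    return paths

paths = minimal_paths(EDGES)

def is_path_set(components):
    """A set of working components is a path set if it contains a minimal path."""
    return any(p <= set(components) for p in paths)

print(sorted("".join(sorted(p)) for p in paths))
print(is_path_set("ABEC"), frozenset("ABEC") in paths)
```

The walk returns the four minimal paths AC, BD, AED and BEC, while ABEC is recognised as a path set that is not minimal.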

So, these are my paths, and if any one of the paths is working, my system will work. If A and
C work, my system will work; if B and D work, my system will work; if A, E and D work, or B,
E and C work, my system will work. So, I have 4 minimal paths, T1 to T4, and my system
reliability is the probability that T1 is working or T2 is working or T3 is working or T4 is
working; if any path works, my system is reliable.

To solve this, we can use the inclusion-exclusion formula, as we discussed earlier. First I
take the individual path reliabilities and sum them up: R(T1) plus R(T2) plus R(T3) plus
R(T4). Then the second-order combinations are subtracted: minus R(T1 ∩ T2), minus
R(T1 ∩ T3), minus the reliability, or we can say probability, of T1 ∩ T4. There are 4C2, that
is 4 into 3 by 2, so 6 such combinations.

The remaining ones are minus R(T2 ∩ T3), minus R(T2 ∩ T4), minus R(T3 ∩ T4). Then I take the
third-level combinations, which are added. 4C3 means 4 combinations: R(T1, T2 and T3
working), plus R(T1, T2 and T4 working), plus R(T1, T3 and T4 working), plus R(T2, T3 and T4
working). Then I subtract the fourth-level combination, since even-order combinations are
negative and odd-order combinations are positive; that is R(T1 ∩ T2 ∩ T3 ∩ T4).

Now, what is my T1? T1 is A and C, so when I calculate it individually, R(T1) is Ra Rc.
R(T2) is Rb Rd, R(T3) is Ra Re Rd, and R(T4) is Rb Re Rc. But when I take a combination such
as T1 ∩ T2, T1 is A and C and T2 is B and D, so that is Ra Rc Rb Rd. When I take R(T1 ∩ T3),
we have Ra Rc from T1 and Ra Re Rd from T3, and we see that Ra is coming twice.

When Ra comes in intersection with Ra, we know from the idempotent law that it gives only Ra;
it does not become Ra squared. So, R(T1 ∩ T3) becomes Ra Rc Re Rd, with Ra appearing only
once. In the same way, we have to solve all these combinations to get the probabilities.

(Refer Slide Time: 18:40)

$$R_A = R_B = R_C = R_D = R_E = R$$
$$R_S = 2R^2 + 2R^3 - 5R^4 + 2R^5$$
$$R = 0.99 \quad\Rightarrow\quad R_S = 0.99979805$$

So, here on the slide we have shown what each term gives: T1 T2 gives this, T1 T3 gives this,
and similarly T1 T4, T2 T3, T2 T4 and T3 T4; we get all these second-order combinations. For
T1 T2 T3, T1 T2 T4, T1 T3 T4 and T2 T3 T4, that is, whenever we combine 3 path sets, we always
find that all 5 elements need to work. Similarly, when we take all 4 path sets together, it is
the same: all 5 elements need to work.

Now, if we assume all component reliabilities are equal to R and apply the formula, what is the
reliability of the system? The first-order terms are R square, R square, R cube and R cube, so
we get 2 R square plus 2 R cube. The second-order terms are negative: R to the power 4 comes 5
times and R to the power 5 comes 1 time. So, minus 5 R to the power 4 minus R to the power 5.

Now, the 4 third-order combinations are added, and in every case the probability is R to the
power 5, so plus 4 times R to the power 5. The 1 fourth-order combination is also R to the
power 5, and it is subtracted. So this turns out to be 2 R square plus 2 R cube, then minus 5 R
to the power 4, and for R to the power 5 there are 6 terms, 2 negative and 4 positive, so that
becomes plus 2 R to the power 5. This is the same formula we have on the slide. Once we put the
reliability as 0.99 for each component, my system reliability from this formula is 0.99979805.
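
The calculation described above can be sketched in Python. This is only an illustrative sketch: it assumes independent components and the four minimal path sets AC, BD, AED and BEC of the bridge network, and the names `PATH_SETS` and `system_reliability` are chosen here for illustration, not taken from the lecture.

```python
from itertools import combinations

# Minimal path sets of the bridge network discussed in the lecture.
PATH_SETS = [{"A", "C"}, {"B", "D"}, {"A", "E", "D"}, {"B", "E", "C"}]


def system_reliability(path_sets, r):
    """Inclusion-exclusion over path sets; r maps component name -> reliability."""
    total = 0.0
    for k in range(1, len(path_sets) + 1):
        sign = 1 if k % 2 == 1 else -1  # odd order is added, even order is subtracted
        for combo in combinations(path_sets, k):
            # Union of components: a repeated component is counted once
            # (A and A is just A), so no reliability is ever squared.
            needed = set().union(*combo)
            term = 1.0
            for c in needed:
                term *= r[c]
            total += sign * term
    return total


r = {c: 0.99 for c in "ABCDE"}
rs = system_reliability(PATH_SETS, r)
closed_form = 2 * 0.99**2 + 2 * 0.99**3 - 5 * 0.99**4 + 2 * 0.99**5
print(round(rs, 8), round(closed_form, 8))
```

Both numbers should agree with the slide value of 0.99979805. Taking the set union is what implements the rule that a component appearing in two path sets is counted only once.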

So, as we see here, using path sets we can evaluate the reliability, but it is a slightly
lengthy process, because we have to find the reliability of each combination. With 4 path sets,
the number of combinations is 2 to the power 4 minus 1, that means 15: 4 single path sets, 6
pairs, 4 triples and 1 combination of all four. So, we have to evaluate the reliability of 15
combinations, and then apply this plus-minus formula to get the system reliability.

There are other methods; a lot of research has been done, which gives methods that calculate
this reliability with shorter formulas. That is out of the scope of this lecture and we will not
be covering it, but you can study the literature and you will be able to find those methods.

(Refer Slide Time: 21:47)

Similar to the path set method, we have the cut set method. In the path set method, we try to
find the ways in which the system will work, that is, sets of components such that when they
are working the system will work. A cut set is the opposite concept: in the cut set method, we
try to find the sets of elements such that when they fail, the system will fail.

(Refer Slide Time: 22:10)

So, if we look at the same diagram, the same RBD (reliability block diagram), what are the
cases where the system will not work? It is called a cut set because it is a kind of cut: it
cuts the system into two parts. So, if elements A and B fail together, then whether C, E and D
are working or not working makes no difference; my system will not work.

In the same way, as we discussed earlier for path sets, if A and C work, then whether E, B and
D are working or not working makes no difference; my system will work, because that is a path
set. Similarly, if A and B fail, my system is going to fail irrespective of the status of C, D
and E. Another possibility is that C and D fail; then also my system will fail, that is, I will
not be able to transfer the signal from here to here.

Another case is that my components A, E and D fail, because that is also cutting the system in
two: if these 3 components fail, then even if B and C work, my system will not work. Similarly,
if B, E and C fail, then also my system will fail. So, I have 4 cut sets here: 4 ways in which
I can cut the system, that is, 4 sets of components, and these are my minimal cut sets.

Similar to the minimal path set concept, we have the minimal cut set concept. AB is a minimal
cut set. ABC is a cut set, but it is not minimal, because A and B alone are enough to make the
system fail. If C fails along with them, the system will be in failed condition, but the subset
of ABC, that is A and B, is already enough for the system to fail. AB itself is minimal: if I
take only A, my system will not fail, so that is not a sufficient condition for the system to
fail.

If I take only the failure of B, then also my system will keep on working if the other elements
are working. So, that is also not a sufficient condition for failure. In an RBD the cut sets
can be represented like this: AB fails, CD fails, ADE fails, BCE fails. To calculate the
reliability from this, we first calculate the unreliability.
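
As a hedged illustration of the minimality idea, the sketch below enumerates the minimal cut sets of the same bridge network directly from its path sets: a set of failed components is a cut set if it breaks every path, and it is minimal if no smaller cut set is contained in it. The brute-force search and the function names are assumptions made for this example; the lecture finds the cut sets by inspection of the diagram.

```python
from itertools import combinations

COMPONENTS = "ABCDE"
PATH_SETS = [{"A", "C"}, {"B", "D"}, {"A", "E", "D"}, {"B", "E", "C"}]


def is_cut(failed):
    """A set of failed components is a cut set if every path contains a failed one."""
    return all(path & failed for path in PATH_SETS)


def minimal_cut_sets():
    found = []
    # Try the smallest candidates first, so any superset of a known cut is skipped.
    for size in range(1, len(COMPONENTS) + 1):
        for combo in combinations(COMPONENTS, size):
            failed = set(combo)
            if is_cut(failed) and not any(cut <= failed for cut in found):
                found.append(failed)
    return found


cuts = minimal_cut_sets()
print(["".join(sorted(c)) for c in cuts])  # ['AB', 'CD', 'ADE', 'BCE']
```

ABC, for example, is rejected because it contains the already-found cut AB, which is exactly the argument made in the lecture.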

(Refer Slide Time: 24:49)

We have AB, CD, ADE and BCE. So, our relation has the same form: my system will be unreliable
if any one of the cut sets occurs. We have 4 cut sets here; if any one of them results in
failure, my system will be in the failed state.

So, the same concept will be applicable as we discussed earlier, and this will again generate
15 combinations. That is, the failure probability of cut set 1 plus the failure probability of
cut set 2 plus Q(C3) plus Q(C4); minus the second-order terms Q(C1 C2), Q(C1 C3), Q(C1 C4),
Q(C2 C3), Q(C2 C4), Q(C3 C4); then plus the third-order combinations, where 4 terms will be
there, Q(C1 C2 C3) plus Q(C1 C2 C4) plus Q(C1 C3 C4) plus Q(C2 C3 C4); and minus the single
fourth-order combination, that is Q(C1 C2 C3 C4). This formula gives us the unreliability, the
failure probability Qs of the system. I want to know the reliability, so the reliability will
be 1 minus Qs: whatever value I get here I subtract from 1, and I get the reliability.

So, let us say the reliability of A is Ra; then 1 minus Ra is its unreliability, let us say QA.
The probability of cut set AB will be QA QB, for CD it is QC QD, for ADE it is QA QE QD and for
BCE it is QB QE QC. Just as we solved for path sets, we can solve for cut sets also. C1 C2 is
QA QB QC QD. For C1 C3, QA is coming again and we will not count it twice, so it is QA QB QE
QD. For C1 C4, QB is not counted again, so it is QA QB QE QC. C2 C3 means QC QD into QA QE QD;
QD is coming again, and because it is an intersection the same term will not repeat, so it is
QA QC QD QE. For C2 C4, QC is coming again, so it is QC QD QB QE. And C3 C4 is QA QE QD QB QC,
because QE is duplicated.

(Refer Slide Time: 28:19)

So, this way we get these values, and similarly for the third level. At the third level all 5
terms will come: as we have seen, for C1 C2 C3 we have QA QB QC QD, then QA will not repeat but
QE will come from C3, so that becomes QA QB QC QD QE. In the same way, when we take the other 3
combinations, we always have all 5 terms, that is QA QB QC QD QE, and the unreliability of the
system is obtained by combining all these terms with their signs.

If we assume all components are identical, an expression of the same form comes as for the
reliability, but here it gives the unreliability of the system in terms of the unreliability Q
of each element: $Q_S = 2Q^2 + 2Q^3 - 5Q^4 + 2Q^5$. If the reliability of each element is 0.99,
as we have taken earlier, then the unreliability of each element will be 0.01, and the
unreliability of the system from this formula is about 0.00020195. The reliability of the
system will be 1 minus this, which turns out to be 0.99979805, the same as what we evaluated
earlier.
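
The cut set route can be checked numerically in the same spirit. The following is a minimal sketch, assuming independent components and the four minimal cut sets AB, CD, ADE and BCE; `system_unreliability` is an illustrative name, not something defined in the lecture.

```python
from itertools import combinations
from math import prod

# Minimal cut sets of the bridge network.
CUT_SETS = [{"A", "B"}, {"C", "D"}, {"A", "D", "E"}, {"B", "C", "E"}]


def system_unreliability(cut_sets, q):
    """Inclusion-exclusion over cut sets; q maps component name -> unreliability."""
    total = 0.0
    for k in range(1, len(cut_sets) + 1):
        sign = 1 if k % 2 else -1
        for combo in combinations(cut_sets, k):
            failed = set().union(*combo)  # duplicated components counted once
            total += sign * prod(q[c] for c in failed)
    return total


q = {c: 1 - 0.99 for c in "ABCDE"}
qs = system_unreliability(CUT_SETS, q)
rs = 1 - qs
print(f"Qs = {qs:.8f}, Rs = {rs:.8f}")
```

The reliability obtained as 1 minus the unreliability should match the path set result, 0.99979805.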

So, we stop here, and we will continue our discussion with more common configurations for
system reliability evaluation. Thank you.

Introduction to Reliability Engineering
Professor. Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 19
System Reliability Modelling (Contd.)

Hello everyone, we will continue our discussion on system reliability modelling. We started the
discussion with series systems, then parallel, then k out of m, parallel-series and
series-parallel. We also discussed last time that if we have a complex system which cannot be
reduced directly to a series-parallel network, then we can use various methods like the
disjoint theorem, the cut set method or the path set method to solve it. We will continue our
discussion with some more systems.

(Refer Slide Time: 01:02)

Here, we are discussing common mode failures. When we consider that components are in series or
parallel, the inherent assumption, as we discussed initially for the RBD, is that the
components are independent. That means the failure of one component does not affect the failure
of another component. But sometimes the redundancy which we are building into the system may
not hold good exactly, the reason being common mode failures. There are some failure modes
which can cause the failure of such components together.

So, this commonness can be due to the same environment. Let us say something goes wrong, like
some electrical pulse comes. Let us say we are talking about a room where we have multiple
light bulbs. That same electrical pulse can cause the failure of all the light bulbs, so we
will be in the dark. Though there are, let us say, 4 bulbs, all 4 may fail together because
they are on a similar electrical circuit and they are made of the same material, with the same
design. So, they are susceptible to the same pulse.

So, common causes come from the same design, the same environment, the same people operating
the system, and maintenance errors. Many times what happens is that someone who is maintaining
or operating the system has learned something wrong, which is not the right way of doing it,
and they will do the same thing for all the components which we have put in redundancy. When
the particular situation arises in which the way we have maintained or operated is incorrect,
that will be exposed. In that case what will happen? All elements will fail together.

So, there are human elements in terms of maintenance or operation, and design flaws: if
multiple components are of the same material and the same design, and there is some weak point,
then it is weak in all of them. So, whenever the stress becomes high, let us say my design is
such that it cannot sustain 240 volts, then if suddenly 240 volts come, all bulbs will fail, or
all units of that design will fail. So, because of these similarities, they also have common
causes, and these common causes result in the failure of multiple systems together.

During the design of the system we assumed that these systems are in redundancy, that means, if
one fails another system will work and my function will continue. But because of a common
cause, all the redundancies become useless, all the redundancies become ineffective. So, the
redundant components can fail together, or multiple failures can occur due to the common cause.
It may not always be all of them; sometimes a major portion of the components fail together due
to the common cause. So, with this what happens?

We were assuming independence, because of which our reliability was becoming higher; because of
the parallel system our reliability becomes higher. But in this case, because of the common
cause, the reliability achieved will not be so high; our reliability will actually reduce. So,
our failure probability is increased because of the multiple failures happening together due to
the common cause, and this becomes a major reason for system failure many times. So, how do we
take care of this in evaluating the reliability?

(Refer Slide Time: 05:09)

So, let us say we have two components, R1 and R2. Generally, these components are of a similar
type. So, we have two components which can fail independently due to their own reasons, and
they can also fail due to a common reason. Let us say the two components are in redundancy and
they are identical, with a failure rate of 0.000253 each. That is the failure rate for R1 and
the same for R2. But a common mode failure rate is also there, that is 10 to the power minus 5.
The independent failure rate is high and the common mode failure rate is small, but as we see,
against independent failures we have the redundancy.

$$R_S(t) = \left(2e^{-0.000253t} - e^{-0.000506t}\right) e^{-0.00001t}$$
$$R_S(1000) = 0.95\, e^{-0.00001 \times 1000} = 0.94$$
$$\text{MTTF} = \int_0^{\infty} \left[2e^{-0.000263t} - e^{-0.000516t}\right] dt$$
$$\text{MTTF} = \frac{2}{0.000263} - \frac{1}{0.000516} = 5666.6$$

So, due to the redundancy, the reliability for this redundant system will be 2R minus R square.
This we have evaluated many times. R means e to the power minus lambda t, so 2R is 2 e to the
power minus 0.000253 t, and R square means the exponent is multiplied by 2, which gives e to
the power minus 0.000506 t. This becomes the reliability of the parallel part. And how much is
the reliability for the common cause? That is e to the power minus 10 to the power minus 5 into
t.

Now, let us say t is 1000 hours. If we put t equal to 1000 hours, Rs turns out to be 0.94. The
reliability of the parallel part is 0.95, and because of the common cause the system
reliability turns out to be 0.94. So, as we see here, the common cause appears in series.
Though the components are in parallel, the system can fail either due to the common cause, or
due to independent causes when both components fail.

So, this reliability is calculated using this. If we are considering the common cause, then the
common cause reliability has to appear in series with these parallel components, and we can
solve it in the same way. If we want, we can solve this using Excel, taking the values from
here.

(Refer Slide Time: 07:46)

So, let us take this here and see how we solve it. For one component, I will write Ri, the
reliability against independent failure. That is equal to exponential of minus 0.000253 into
1000. The common cause reliability Rc is equal to exponential of minus 1e-5 into 1000. Now, if
I take the parallel combination, that will be equal to 2 into Ri minus Ri square. So, my system
reliability will be equal to this parallel reliability multiplied by Rc, which is in series.

I get this, and for the MTTF calculation, as we see, the parallel part is multiplied by the
common cause term, so the exponents add. So, 0.00001 will be added to the first exponent,
giving 0.000263, and similarly to the second, giving 0.000516. Once we integrate this, we know
that the integration from 0 to infinity of e to the power minus lambda t dt is 1 upon lambda.
So, the first term becomes 2 divided by this value.

So, what I get is 2 divided by 0.000263, since my lambda is 0.000263, and the second term
already has the negative sign but no multiplier, so minus 1 divided by 0.000516. So, my MTTF
turns out to be 5667, or we can say 5666.6. So, if we are going to consider the common cause,
we can include it in the reliability calculations like this.
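
The spreadsheet steps above can be reproduced with a short Python sketch. It assumes constant failure rates per hour and two identical components in parallel, with the common cause modelled as a series block; the constant and function names are illustrative only.

```python
import math

LAMBDA_I = 0.000253  # independent failure rate of each component (per hour)
LAMBDA_C = 0.00001   # common-mode failure rate (per hour)


def system_reliability(t):
    r = math.exp(-LAMBDA_I * t)         # one component, independent failures only
    r_parallel = 2 * r - r**2           # two identical components in parallel
    r_common = math.exp(-LAMBDA_C * t)  # common cause acts as a series block
    return r_parallel * r_common


def system_mttf():
    # Integral from 0 to infinity of 2*exp(-(li + lc)*t) - exp(-(2*li + lc)*t),
    # using the fact that the integral of exp(-lambda*t) is 1/lambda.
    return 2 / (LAMBDA_I + LAMBDA_C) - 1 / (2 * LAMBDA_I + LAMBDA_C)


print(f"R_S(1000) = {system_reliability(1000):.4f}")
print(f"MTTF      = {system_mttf():.1f} hours")
```

Setting `LAMBDA_C` to zero in this sketch recovers the plain parallel system, which makes it easy to see how much reliability the common cause removes.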

(Refer Slide Time: 10:55)

Many times, we are concerned with 3-state devices. Resistors, capacitors and valves can fail in
multiple failure modes. Take a resistor: there can be more failure modes, but the 2 major
failure modes are that the resistor becomes open, that is, the resistance becomes infinite, and
that the resistor becomes short. So, either the resistor works, or it fails open, or it fails
short; these are the major states.

There can be other states where the resistance becomes smaller or higher, but the major state
variations are these. So, here what will happen? When the resistor fails open or fails short,
it will have a different impact on the system, and the redundancy which we have put into the
system will react in a different way. The same holds for diodes, electrical circuits, flow
valves and alarm systems. An alarm system can also fail in 2 failure modes. One is fail to
alarm; let us talk about a fire alarm.

So, we have a fire, but the alarm does not ring. That is a failure, because it is supposed to
ring in case of fire. That is one type of failure mode. Another failure mode is that the alarm
rings but there is no fire. So, even though the fire is not there, the alarm rings: a false
alarm, and this is also an alarm failure. Now let us assume that I have more alarms; let us say
rather than 1, I put 2 alarms. If I put 2 alarms, then what will happen? If there is a fire,
then the probability of detection by at least one of the alarms will be higher, because now we
have 2 alarms. If one fails to detect, the other one will be able to detect.

So, my probability of failing to detect the fire will be reduced, or my reliability to detect
the fire will increase. But my probability of the other failure, that the alarm rings without a
fire, will become higher, because there are 2 alarms. Now either alarm 1 or alarm 2 can ring
even when there is no fire. So, my chances that an alarm rings without a fire will also
increase. So, my redundancy is helping me to reduce one probability, but it is increasing
another probability.

A similar thing happens with open and short failures. Let us say we have 2 components in
parallel, 2 resistors in parallel. If either this one fails short or that one fails short, my
circuit will become short. So, in that case my chances of a short are increasing, because
whether this fails short or that fails short, in both cases my system will fail short. But
against open failure my reliability is increasing and my failure chances are decreasing,
because even if this resistor fails open, the other resistor will continue to serve the
purpose.

Similarly, if they are in series, then I am protected against the short, because if one fails
short, the other will still provide the resistive path. But in this case if any one fails open,
either R1 or R2, my system will fail. So, in series, the probability that the system fails in
open mode will be higher, but the probability that the system fails in short mode will become
smaller.

For the parallel configuration of resistors, my short mode failure probability is increased,
but my open mode failure probability is decreased. So, when devices have multiple states, the
same configuration may not give us higher reliability in every respect. We have to see in which
aspect our reliability is increasing and in which aspect we have to pay the penalty. So, let us
discuss the series structure.

(Refer Slide Time: 15:43)

So, we have 2 components in series, or as we discussed, 2 resistors in series. Now, we have 3
possible states for the elements and also for the system. An element can have 3 states: either
it is working, or it fails open, or it fails short. Let us say this is R1 and this is R2; for
better understanding we can also take switches or valves. If you are not having an electronics
background, let us take switches, that will be easier to understand.

We have 2 switches in series. Now, a switch can fail in 2 failure modes: it can be stuck closed
or it can be stuck open. Stuck closed means it has failed short: it is shorted and I cannot
open it. That fail-short probability is Q1s for the first switch and Q2s for the second switch.
Similarly, a switch may remain open: I want to close it, but I am not able to close it, so my
circuit does not complete. That is the open-circuit failure probability, let us say Q1o and
similarly Q2o.

My system also has 2 failure modes and one success state. My system can fail in open mode, or
these components together can cause the system to fail in short mode. When will my system be
shorted? Shorted means I am not able to open the circuit. This will only happen when both
switches fail in short mode. When both become short and I am not able to open either of them, I
will not be able to open the circuit, that is, I will not be able to switch off.

So, here my system will fail in short mode when both the components fail in short mode. That
means my system failure probability in short mode is the failure probability of component 1 in
short mode multiplied by the failure probability of component 2 in short mode, that is Q1s Q2s.
What will be the failure probability in open mode? For open mode, if any one of the components
is stuck open, then the other component does not matter.

Because if one component is stuck open, I will not be able to close the circuit. For closing
the circuit both should close. So, any one of them is able to cause the failure. That means the
open mode failure probability will be 1 minus the product of 1 minus Q1o and 1 minus Q2o, since
either of them failing in open mode results in open mode failure of the circuit. So, the system
failure probability in open mode is $1 - (1 - Q_{1o})(1 - Q_{2o})$. So, my reliability will be
how much?

As we know, reliability is 1 minus the failure probability. Either the system fails in open
mode or the system fails in short mode, and these two cannot happen together, so my system
failure probability is the sum of the two: $Q_S = Q_{S,s} + Q_{S,o}$, where $Q_{S,s}$ is the
system short mode failure probability and $Q_{S,o}$ is the system open mode failure
probability. The system reliability will be $R_S = 1 - Q_S = 1 - Q_{S,s} - Q_{S,o}$. Now let us
say I have n components in series. If any one of them fails in open mode, it results in open
mode failure of the system, so $Q_{S,o} = 1 - \prod_{i=1}^{n} (1 - Q_{io})$.

Similarly, the system fails in short mode only when all of them fail in short mode, so
$Q_{S,s} = \prod_{i=1}^{n} Q_{is}$. If I subtract both from 1, the reliability is
$R_S = 1 - 1 + \prod_{i=1}^{n} (1 - Q_{io}) - \prod_{i=1}^{n} Q_{is}$. The 1 and 1 cancel, and
this becomes $R_S = \prod_{i=1}^{n} (1 - Q_{io}) - \prod_{i=1}^{n} Q_{is}$, which is the same
thing written on the slide.

So, when I take two failure modes and one success state, that means three-state devices, this
gives me the reliability of the series system.

(Refer Slide Time: 21:15)

Similarly, if I have a parallel system, this concept is just reversed. Let us say we have
components 1 and 2 in parallel. The parallel system will fail short if any one of them fails
short. If this one is shorted and I am not able to open it, then my system will be shorted. So,
the short mode failure probability of the system is $Q_{S,s} = 1 - (1 - Q_{1s})(1 - Q_{2s})$.
But for open mode failure, if only this one becomes open it is not sufficient, because then my
system will work through component 2.

So, both have to open; only then will my system fail in open mode. The open mode failure of the
system occurs when all components fail in open mode, so $Q_{S,o} = Q_{1o} Q_{2o}$. So, here the
open mode failure probability is the multiplication of all the open mode failure probabilities,
and the short mode failure probability is 1 minus the product of 1 minus Q1s and 1 minus Q2s.
As we discussed earlier, for n components these become
$Q_{S,s} = 1 - \prod_{i=1}^{n} (1 - Q_{is})$ and $Q_{S,o} = \prod_{i=1}^{n} Q_{io}$.

And when I subtract them from 1, the 1 and 1 get cancelled as before, and this becomes
$R_S = \prod_{i=1}^{n} (1 - Q_{is}) - \prod_{i=1}^{n} Q_{io}$. So, for the parallel structure,
this gives us the system reliability.
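
The two three-state results can be placed side by side in code. This is a minimal sketch: the function names are illustrative, and the failure probabilities for the two switches in the usage lines are hypothetical values chosen only to exercise the formulas, not data from the lecture.

```python
from math import prod


def series_three_state(q_open, q_short):
    """Series: open failure if ANY component is open, short failure if ALL are shorted."""
    return prod(1 - qo for qo in q_open) - prod(q_short)


def parallel_three_state(q_open, q_short):
    """Parallel: short failure if ANY component is shorted, open failure if ALL are open."""
    return prod(1 - qs for qs in q_short) - prod(q_open)


# Hypothetical two-switch example (values assumed for illustration only).
q_open = [0.02, 0.03]
q_short = [0.01, 0.04]
print(series_three_state(q_open, q_short))
print(parallel_three_state(q_open, q_short))
```

The two functions are mirror images: swapping the open and short lists turns one into the other, which is the "just reversed" relationship between the series and parallel cases.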

(Refer Slide Time: 23:42)

Let us take one example. Let us say there is a mechanical valve which fails to close 5 percent
of the time; fails to close means it fails open, that is, it remains open and I am not able to
close it. That means the failure probability of the valve in open mode is equal to 0.05. It
fails closed, or fails short, that means it is closed and I am not able to open it, 10 percent
of the time, that is 0.10. Now, I want to know the reliability for 3 valves. First case is, I
put them in series; another case is, I put them in parallel. What will be my system reliability
in both the cases?

(Refer Slide Time: 24:48)

Now, for this reliability we have already developed the formula, and we can use the same
formula here. When we use the same formula, my reliability comes to the value shown on the
slide. Or we can work it out here again. Let us first consider the series case. For series my
system will fail in two cases. The system will be open if any one of the 3 components is open.
That means 1 minus, 1 minus Qo raised to the power 3, that is $1 - (1 - Q_o)^3$.

And what is the failure probability in short mode? Short mode means all 3 have to fail in short
mode; only then will my system fail in short mode. That means Q short raised to the power 3,
that is $Q_s^3$. The sum of these two is my system failure probability, and the reliability Rs
will be equal to 1 minus that. Now, let us calculate this.

(Refer Slide Time: 26:23)

Let us do this with Qo equal to 0.05 and Qs equal to 0.1, first for the series system. I am
interested in two things. The first is the short mode: in series the system fails in short mode
only when all components fail in short mode, that means Qs raised to the power 3. The second is
the open mode: the system fails open if any one of them fails in open mode. I can take 1 minus
of this here itself, which will help me to solve the problem more easily, so that is equal to 1
minus, the probability that each one does not fail open, raised to the power 3, that is
$1 - (1 - Q_o)^3$. So, how much will be my reliability Rs? Rs will be equal to 1 minus this
minus this. This comes out to be my reliability in series.

Similarly, let us take parallel. In a parallel system, as we discussed earlier, if any one
shorts, the system will short. So, the short mode failure probability is 1 minus, 1 minus Qs
raised to the power 3, that is $1 - (1 - Q_s)^3$. Open mode failure means all have to open,
because they are in parallel; the system will become open only when every one is open. So, that
means the open mode failure probability raised to the power 3, that is $Q_o^3$.

The system reliability is the probability that it neither fails in short mode nor fails in open
mode, that is 1 minus both of these. As you can see, we are able to calculate the reliability
for the series configuration as well as the parallel configuration.

(Refer Slide Time: 28:54)

A similar concept is applicable when we discuss low level redundancy and high level redundancy.
Let us say the system requires M elements; as we discussed earlier, we may need an antenna, a
transceiver and a power supply, so let us say M equal to 3. And of each element let us say we
have N units.

So, we have M equal to 3: we need these three elements to work, and with one set of these three
my system will work. Now, let us say for each of these three I have 4 units, so N equal to 4.
Then I can arrange them in two ways. I can have low level redundancy, that means all antennas
are in parallel, then all transceivers are in parallel, then all power supplies are in
parallel. For high level redundancy, one set of antenna, transceiver and power supply in series
is put in parallel with the other similar sets, four sets in all.

This is high level redundancy. Let us first evaluate the reliability for low level redundancy,
in the same way as we have evaluated earlier. In low level redundancy, the N units of each
element are in parallel. So, first let us look at the parallel block of N units which are the
same; let us say all antennas are the same. If any one of them works, the block will work, but
we have three-state devices here.

So, if any one of them fails short, the block will fail short. How much will be the short mode
failure probability? For block i, with Qis the short mode failure probability of one unit, it
is 1 minus the product of 1 minus Qis over the N units; since all N units are the same, I can
simply put the power N. So, the probability that the first block fails short is
$1 - (1 - Q_{1s})^N$.

Similarly, $1 - (1 - Q_{2s})^N$ will be the failure probability of the second block in short
mode. So, now, this has become a series of parallel blocks, and for each block I am able to get
the two probabilities. This was the short mode failure probability of each block; similarly, I
can get the open mode failure probability. As we know, for a parallel block to fail in open
mode, all its units need to fail in open mode. So, $Q_{1o}^N$ will give me the failure
probability of the first block in open mode.

So, now, I have M such blocks, and for each block I know the failure probabilities: the short
mode failure probability of block i is $1 - (1 - Q_{is})^N$ and its open mode failure
probability is $Q_{io}^N$. Now, I want to know the failure probability of the whole system. In
the whole system there are M such blocks in series, which may be different, so i is different
for each. This series system will fail in open mode if any one of the blocks fails in open
mode.

That means the system open mode failure probability is
$Q_{S,o} = 1 - \prod_{i=1}^{M} (1 - Q_{io}^N)$. Similarly, for short mode failure, all blocks
have to fail in short mode; only then will the system fail in short mode. So, at system level,
$Q_{S,s} = \prod_{i=1}^{M} [1 - (1 - Q_{is})^N]$. And how much will be the reliability of the
system? It will be 1 minus $Q_{S,o}$ minus $Q_{S,s}$.

So, $R_{low} = 1 - 1 + \prod_{i=1}^{M} (1 - Q_{io}^N) - \prod_{i=1}^{M} [1 - (1 - Q_{is})^N]$.
The 1 and 1 get cancelled, and you see this has the same form as before:
$R_{low} = \prod_{i=1}^{M} (1 - Q_{io}^N) - \prod_{i=1}^{M} [1 - (1 - Q_{is})^N]$. These are
the two terms on the slide, and this becomes our low level redundancy reliability.

Similar to this, we get the reliability for the high level configuration. It is evaluated in the
same way, and you should try to derive it yourself using the same principle. First, let us
consider each series row. A series row will fail in open mode if any of its components fails in
open mode, and it will fail in short mode only if all the components 1 to M fail in short mode.
Then the N rows are in parallel, and we apply the second step again. And this gives us the
formula for the high level configuration.
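
Written compactly, the two-step argument above leads to the following pair of expressions. This is a sketch under assumed notation: Q_{iS} and Q_{iO} are the short and open mode failure probabilities of component i, M is the number of distinct components, and N is the number of redundant copies (low level) or redundant rows (high level).

```latex
\begin{align}
R_L &= \prod_{i=1}^{M}\left(1 - Q_{iO}^{N}\right)
     - \prod_{i=1}^{M}\left[1 - \left(1 - Q_{iS}\right)^{N}\right] \\
R_H &= \left[1 - \prod_{i=1}^{M} Q_{iS}\right]^{N}
     - \left[1 - \prod_{i=1}^{M}\left(1 - Q_{iO}\right)\right]^{N}
\end{align}
```

In each expression the first term is the probability of no open mode failure (low level) or no short mode failure (high level) at the system level, and the second term is the probability of the other failure mode.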

(Refer Slide Time: 35:16)

Let us take an example here. We have three elements, 1, 2 and 3, and N is equal to 2, and we
consider both the low level configuration and the high level configuration. So, for the low level
configuration we have two elements of each type in parallel, and for the high level configuration
we have two parallel rows, 1 2 3 and 1 2 3, and I want to calculate the reliability for these.
Now, for the low level configuration, if I want to calculate the short mode failure probability,
then each parallel block will fail in short mode if any one of its components fails in short mode.

So, I can say Q1S for the first block is the probability that either of the two components fails
in short mode, that means 1 − 0.85². Similarly, Q2S for the second block is 1 − 0.9², and Q3S
for the third block is 1 − 0.8², and my system fails in short mode when all three blocks fail in
short mode. So, let us do this in Excel quickly.

(Refer Slide Time: 36:49)

I will use the pen here. So, as we discussed, I will solve the low level configuration, and you
try to do the high level one. So, we have component 1, component 2 and component 3, arranged as
1 1, 2 2, 3 3. Now, first let us evaluate the short mode failure probability. I will write down
the values: the short and open mode failure probabilities are 0.15 and 0.05 for component 1,
0.1 and 0.06 for component 2, and 0.2 and 0.01 for component 3. I will also take 1 minus each of
these values, because I will need them in the calculation.

\[
R_L = (1 - 0.05^2)(1 - 0.06^2)(1 - 0.01^2) - [1 - (1 - 0.15)^2][1 - (1 - 0.10)^2][1 - (1 - 0.20)^2] = 0.9748
\]
\[
R_H = [1 - (0.15)(0.10)(0.20)]^2 - [1 - (1 - 0.05)(1 - 0.06)(1 - 0.01)]^2 \approx 0.9806
\]

Now, I want to know the short mode failure probability of each individual block. Q1S for the
block means any one of the two components is short. That means it is equal to 1 minus the square
of the short mode success probability of component 1. Similarly, I will get Q2S for the block as
1 minus the square of the probability that component 2 does not fail in short mode, and Q3S as
1 minus the square of the probability that the third component does not fail in short mode. Now,
how much will be Qs here? The system will fail in short mode only if all of the blocks fail in
short mode.

So, Qs is equal to the product of these three values, and this becomes my short mode failure
probability. Similarly, what about Q1O for the first block? When will this block fail in open
mode? Only if both of its components fail in open mode. That means the open mode failure
probability of the first block is the square of the component open mode failure probability. In
the same way I get Q2O and Q3O. But when will the whole system fail in open mode?

It will fail in open mode if any one of the blocks fails in open mode: either the set of
component 1, the set of component 2, or the set of component 3. That means Qo is equal to 1 minus
the product of 1 minus each of these three block values. And how much will be my reliability?
The reliability will be equal to 1 − Qs − Qo, which is 0.9748. In the same way, you try to solve
for the high level configuration.
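
The spreadsheet steps above can also be reproduced with a short Python sketch. This is only an illustration: the function names `low_level_reliability` and `high_level_reliability` are hypothetical, and the component values are the ones quoted in the example (short mode 0.15, 0.10, 0.20; open mode 0.05, 0.06, 0.01; N = 2).

```python
from math import prod

def low_level_reliability(q_short, q_open, n):
    """Series of M blocks; block i is n identical copies of component i in parallel."""
    # A parallel block shorts if any copy shorts; it opens only if all copies open.
    block_short = [1 - (1 - qs) ** n for qs in q_short]
    block_open = [qo ** n for qo in q_open]
    # The series system shorts only if every block shorts; it opens if any block opens.
    system_short = prod(block_short)
    system_open = 1 - prod(1 - bo for bo in block_open)
    return 1 - system_short - system_open

def high_level_reliability(q_short, q_open, n):
    """n identical rows in parallel; each row is components 1..M in series."""
    # A series row shorts only if all components short; it opens if any component opens.
    row_short = prod(q_short)
    row_open = 1 - prod(1 - qo for qo in q_open)
    # The parallel system shorts if any row shorts; it opens only if all rows open.
    system_short = 1 - (1 - row_short) ** n
    system_open = row_open ** n
    return 1 - system_short - system_open

q_short = [0.15, 0.10, 0.20]  # short mode failure probabilities of components 1, 2, 3
q_open = [0.05, 0.06, 0.01]   # open mode failure probabilities of components 1, 2, 3

r_low = low_level_reliability(q_short, q_open, 2)
r_high = high_level_reliability(q_short, q_open, 2)
print(round(r_low, 4), round(r_high, 4))
```

Both functions follow the same two-step reasoning as the lecture, only with the order of the series and parallel steps swapped.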

So, we will stop our discussion here. As we have seen, we were able to take various
configurations into the picture and solve them using the basic probability laws. Using those
laws, we were able to get the system reliability when we know the component or subsystem
reliabilities.

So, with this we will stop system reliability modeling. Next time, we will discuss state-based
systems: when the system changes from one state to another, how we can evaluate the system
reliability using Markov models. So, thank you. Thank you very much.

Introduction to Reliability Engineering
Professor. Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 20
System Reliability Modelling (Contd.)

Hello everyone. So, we will now move on to our next topic of discussion, that is, Markov analysis.
In previous lectures, we discussed system reliability modeling using reliability block diagrams,
where we discussed series systems, parallel systems, k-out-of-m systems, etcetera. There we
assumed that we have a system composed of components, and if we know the component
reliabilities, we are able to know the system reliability.

Similar modeling can also be done using Markov models. In Markov analysis, we do the analysis
based on the system states. There are various possible system states, depending on the
combination of component states, and generally there is a possibility that the system makes a
transition from one state to another state.

So, if we are able to identify the various states and we know the probability of each state, then
based on the state probabilities we can determine the reliability. Reliability is nothing but the
total probability of the states where the system is working, and unreliability is the total
probability of the states where the system is not working.

We will explore this concept because for state-based systems, like standby systems, or systems
whose configuration changes and where that change needs to be considered, the reliability block
diagram approach may not be sufficient to solve the problem. So, we will use Markov analysis for
those purposes.

(Refer Slide Time: 02:15)

So, if we look here, reliability problems are normally concerned with systems that are discrete
in space. That is, the system can exist in one of a number of discrete and identifiable states.
Some of these states will be failure states and some will be success states. As I told you
earlier, each system state is a combination that tells which components of the system are working
and which are not, and based on that combination we determine whether the system is going to work
in that state or not.

Here, the system exists continuously in one of these states. The states are mutually exclusive,
so the system is in exactly one of them at any time, and it stays in that state until a
transition occurs. When something happens, say a failure occurs or the state of some component
changes, the system state changes accordingly and the system moves to another state. It then
remains in that state continuously until another transition happens, that is, either another
failure occurs or some failed component is repaired and starts working again.

The Markov approach can be used for a wide range of reliability problems, including systems that
are non-repairable or repairable. They can be series connected, parallel redundant or standby
systems. Generally, series connected and parallel redundant systems are easily solved using the
reliability block diagram approach. The Markov approach has some limitations, and it is a little
more complex than the reliability block diagram approach.

So, we will use the reliability block diagram approach where we can, and where it is not
feasible, we will go for the Markov approach. In the reliability block diagram approach, the
common assumption is that component failures are independent. However, sometimes we want to
capture some sort of dependency. For example, the failure of one component can put more load on
the rest of the system, and in that case the independence no longer holds.

So, if we are able to represent these kinds of dependencies, then we can use Markov analysis.
Markov analysis puts the system into various states, and some of these states will be marked as
success and some will be marked as failure.

(Refer Slide Time: 05:21)

Now, the general assumptions when we are doing Markov analysis. Markov analysis cannot be used
for all types of problems; it is only applicable where the behavior of the system is
characterized by a lack of memory. This is the same memoryless property we discussed in the case
of the exponential distribution: irrespective of the age of the component, the failure rate
remains the same, that is, the conditional probability of failure per unit time does not change
with age.
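
The memoryless property can be checked numerically. The following is a minimal Python sketch, assuming an exponential time to failure; the rate `lam`, the mission time and the function names are illustrative choices, not values from the lecture.

```python
import math

def survival(t, lam):
    """P(T > t) for an exponentially distributed time to failure with rate lam."""
    return math.exp(-lam * t)

def conditional_survival(age, t, lam):
    """P(T > age + t | T > age): survive t more hours given survival up to `age`."""
    return survival(age + t, lam) / survival(age, lam)

lam = 0.001      # assumed failure rate per hour (illustrative value)
mission = 100.0  # additional operating time of interest, in hours

# The conditional survival probability is the same for a new unit and for aged units.
for age in (0.0, 1000.0, 5000.0):
    print(age, conditional_survival(age, mission, lam))
```

Every line prints the same probability, which is what "the failure rate does not change with age" means in practice.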

The case is similar here. When we use a Markov model, we will have a state diagram. Here it is
possible that our system is in the good state, it fails, then it gets repaired and comes back to
the good state again. So, there may be many such transitions while the system accumulates age.

In the Markov diagram, lack of memory means that the transition probabilities do not depend on
the age of the system; they depend only on the state the system is currently in. So, the future
states of the system are independent of all past states except the present one. Whether the
system has gone through failure and repair two times or three times does not make any
difference. Let us say we have one state where the component is working. Take a refrigerator: the
refrigerator is working now, some failure happens and it moves to the failed state, and then we
call the technician and the technician repairs it. So, it fails at the failure rate and it is
repaired at the repair rate.

Now, the failure rate which we are denoting here, that is, the probability of failure per unit
time, is considered to be time independent. So, even if there have been, let us say, 4 or 5 such
transitions, we will assume that the age has no impact. For example, if the fridge is 10 years
old, this should not change the failure rate, and the repair rate should also stay the same.
Wherever this assumption is reasonable, we can use Markov modeling.

So, the failure rate λ depends only on the state the transition starts from, that is W, and the
repair rate μ depends only on where it starts from, that is F. Only whether the system is in
that state or not determines the transition. So, the future random behavior of the system depends
only on where it is at present, and it does not depend on where it has been in the past. Which
states it has traveled through in the past will not change the probability.

So, this Markovian property has to be true; only then can we use this Markov diagram. If that
property is not true, then we cannot use Markov analysis in this form, but there are other forms,
like semi-Markov models etcetera, which may be used for that purpose. The process should also be
stationary, which is also called homogeneous. That means the behavior of the system must be the
same at all points of time, irrespective of the point of time being considered.

So, it should not depend on age. The probability of making a transition from one given state to
another state is the same at all times; this λ is not going to be a function of time. Only then
can we use this continuous-time Markov analysis.

So, these are the two requirements, lack of memory and being stationary; only then can we use the
Markov approach for these systems. Generally, this is applicable only when the failure process is
a Poisson process, or, if we talk about the time to failure, when it follows the exponential
distribution, that is, a constant failure rate. In both cases, Poisson and exponential, the
failure rate is constant, as we discussed, so this property has to be satisfied.

As I mentioned, even if this property is not satisfied, it does not mean that we cannot solve the
problem, but for that we have to use a more complex Markov approach; the approach we are going to
discuss may not be applicable.

(Refer Slide Time: 10:10)

The systems we will be discussing for Markov analysis are as follows. First, we will discuss a
two component system. Two components can be connected in two ways: either in series or in
parallel. We can discuss both cases together. If we are able to model the two component system
using Markov analysis, then we will get the different state probabilities, and based on these
state probabilities, depending on whether our system is configured in series or in parallel, we
will be able to know the reliability of the series system or the parallel system.

Then we will discuss load sharing systems. Load sharing systems are those where the load is
shared, for example between motors, generators or other plant equipment. When all the units are
working together, the load on each unit is the shared load, that is, a smaller load. But if one
of the units fails, its load is transferred to the units that are still working. Because of
that, these units start working at a higher load, and we know that when the stress on a system
is high, the failure probability or failure rate will also be high.

So, the components that remain and keep working will see a somewhat higher failure rate than they
see when the load is shared. These kinds of systems we can model using the Markov diagram. We
will also discuss standby systems. Standby systems are like our generators, or standby motors, or
standby pumps: they are present in the system, but they are not continuously operating; they are
in standby mode. Whenever a failure happens, the standby unit comes into the picture, replaces
the main unit and starts working.

For standby units, there are two possibilities we will consider. The first is failure in standby:
there is a possibility that when we need to bring the standby unit into operation, the unit is
found to have failed, because it failed while in standby mode. That condition can also be taken
care of in the Markov analysis.

The second possibility concerns switching, as we discussed for motors or generators. Let us say
we talk about a generator: if a power failure occurs, the generator is switched in automatically
and takes care of the power requirement. But if the switch itself fails, then also we will not be
able to get the power supply. So, if we consider switch failure, how the standby system will
behave can also be modeled.

There is also the possibility of considering degraded systems. A degraded system is somewhat
similar to a load sharing system. Let us say we have bearings which are used in various parts, or
take any motorcycle, car, etcetera. After a certain period of time, these systems or their
components can get degraded. During degradation they have not completely failed, but their
performance has degraded and their failure rate has become higher, because when we use items in
a degraded state, the chances of failure are high. So, degraded systems can also be modeled, and
we can consider different failure rates for the degraded and non-degraded states.

Systems with repair can also be considered. Whenever we consider repair, the modeling becomes a
little tougher, but we will discuss how to take care of it. When we take repair into
consideration, we will consider a simple case, that is, the two component system. First we will
consider it without repair, and then we will consider that we have a repair facility and the
failed equipment can be repaired, and see how much reliability we get.

Similarly, we will also consider the standby system. Here too, we will first consider it without
repair, and then we will consider an example showing how the reliability can be calculated if
repair is considered.

(Refer Slide Time: 15:05)

Let us take the example of a two component system. In a two component system, we have two
components: component 1 and component 2. Now, there are various possibilities emerging out of
this, because we are considering that each component can be in one of two states: either it is in
the operating state or it is in the failed state.

So, we have two states for component 1 and two states for component 2, so altogether we have
2 × 2 = 4 states. These are the 4 states mentioned here: both components are operating;
component 1 is failed and component 2 is operating; component 1 is operating and component 2 is
failed; or both are failed.

So, we have 4 states here. The first state is where both are operating: in state number 1,
component 1 is operating and component 2 is also operating. From this state there are two
possibilities: either component 1 fails or component 2 fails. Component 1 fails with the
transition rate λ1, because the failure rate of component 1 is λ1.

So, from state 1 there is a transition at rate λ1 by which the system moves to state number 2.
In state number 2, component 1 is in the failed state but component 2 is working. From state 1
there is another possible transition, due to the failure of the second component at rate λ2. If
the second component fails, the system moves to state 3, where component 1 continues to operate
but component 2 has failed.

Now, in state 2, only component 2 is operating. Since component 2 is operating, its failure rate
is λ2, and with that rate the system moves to state 4, where component 1 is failed and
component 2 is also failed. The same state can be reached from state 3 as well: there component 1
is operating, and the failure rate of component 1 is λ1. So, from state 3 to state 4, where
components 1 and 2 are both in the failed state, the rate will be λ1. This is how we are able to
make this transition diagram or rate diagram. In the rate diagram, we try to explore all
possibilities, and all the states are shown.
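
One simple way to hold the rate diagram in code is a table of arcs. The sketch below is only an illustration of the diagram described above; the dictionary layout, the helper `total_exit_rate` and the numeric rates are assumptions, not part of the lecture.

```python
# States: 1 = both operating, 2 = component 1 failed, 3 = component 2 failed, 4 = both failed.
lam1, lam2 = 0.002, 0.003  # assumed failure rates per hour (illustrative values)

# Each entry maps (from_state, to_state) to the transition rate on that arc.
rates = {
    (1, 2): lam1,  # component 1 fails while both are operating
    (1, 3): lam2,  # component 2 fails while both are operating
    (2, 4): lam2,  # the remaining component 2 fails
    (3, 4): lam1,  # the remaining component 1 fails
}

states = [1, 2, 3, 4]

def total_exit_rate(state):
    """Sum of the rates on all arcs leaving `state`."""
    return sum(rate for (src, _dst), rate in rates.items() if src == state)

for s in states:
    print(s, total_exit_rate(s))
```

State 4 has no outgoing arc, so its exit rate is zero: without repair it is an absorbing state.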

So, all four states of this system are shown here, and as we see, in each transition only one
failure, that is, only one change, is considered. Two changes will not be considered in one
transition; that is another Markov assumption. So, in one transition, only one change will
happen. From state 1, either component 1 fails at rate λ1 or component 2 fails at rate λ2. In
state 2, component 2 is operating, so component 2 fails there, and in state 3, component 1 is
operating, so component 1 fails there. Now that we have this diagram, let us move forward and see
how we can analyze it.

(Refer Slide Time: 18:26)

Now, we want to develop the equations. First, let us assume that we want to know P1(t + Δt).
What is P1(t + Δt)? It is the probability that the system is in state 1 at time t + Δt. The
system can be in state 1 at time t + Δt only if it was in state 1 at time t and it stays in the
same state, that is, it does not fail: it moves neither to state 2 nor to state 3.

Now, the transitions in the diagram are given in terms of rates. We know λ is a rate: it is the
conditional probability of failure per unit time. So, if we multiply λ by Δt, this becomes the
probability of failure in the small time Δt. The probability that the system will move from
state 1 to state 2 in Δt is λ1Δt. Similarly, we can write the other transition probabilities:
state 1 to state 3 is λ2Δt, state 2 to state 4 is λ2Δt, and state 3 to state 4 is λ1Δt.

Now, the probability that the system moves out of state 1 in Δt is λ1Δt + λ2Δt. So, what is the
probability that the system remains in state 1? That means it does not move out, which is
1 − λ1Δt − λ2Δt, because λ1Δt is the probability that it moves to state 2 and λ2Δt is the
probability that it moves to state 3.

So, given that the system was already in state 1, the probability that it remains in the same
state is 1 − λ1Δt − λ2Δt. This is the probability of transition from state 1 to state 1. The
system can be in state 1 at time t + Δt only under one condition: at time t the system should be
in state 1. No other route exists, because the system cannot reach state 1 from state 2; there is
no such transition.

Similarly, from state 3 there is no transition to state 1, and from state 4 there is no
transition to state 1. So, there is only one way the system can be in state 1 at time t + Δt:
the system was in state 1 at time t and it remains in the same state. So, P1(t + Δt) is the
probability that the system remains in state 1 during Δt, given that it was in state 1 at time
t, multiplied by the probability that the system is in state 1 at time t, that is P1(t).

So, how much is the probability of remaining in state 1 during Δt? We already know that it is
1 − λ1Δt − λ2Δt. So, P1(t + Δt) is nothing but P1(t) × (1 − λ1Δt − λ2Δt).

So, as we discussed, P1(t + Δt) is the probability that the system does not transit out of
state 1, that is 1 − λ1Δt − λ2Δt, multiplied by the probability that the system is in state 1
at time t, that is P1(t). If we expand this, 1 × P1(t) gives P1(t), and if we take P1(t) to the
left hand side it becomes negative. So, P1(t + Δt) − P1(t) = −(λ1Δt + λ2Δt) P1(t).

Now, the same thing is written here. If we take Δt common and divide by it, this becomes
[P1(t + Δt) − P1(t)] / Δt = −(λ1 + λ2) P1(t). Then we take the limit Δt tending to 0, and the
left hand side becomes dP1(t)/dt. So, dP1(t)/dt = −(λ1 + λ2) P1(t). This is a very simple
differential equation which we can solve very easily. Just to remind you, I will show how the
solution comes.

So, we have got this equation, and now we want to solve it. We take P1(t) to one side and dt to
the other, so this becomes dP1(t)/P1(t) = −(λ1 + λ2) dt. Now, we integrate both sides and add
the constant C. The integral of dP1(t)/P1(t) gives ln P1(t). On the right, λ1 and λ2 are both
time independent, so we can take them outside, and the integration of dt gives t. So,
ln P1(t) = −(λ1 + λ2) t + C.

Now, we put the initial condition. We know that when we started the system, at time t = 0, the
system was in state 1. So, P1(0) = 1. Similarly, P2(0) = 0 and P3(0) = 0, because at time
t = 0 the system was in state 1. If we put t = 0 in the equation, then
ln P1(0) = ln 1 = −(λ1 + λ2) × 0 + C. The value of ln 1 is 0 and the first term on the right is
also 0, so C will be equal to 0.

When we put C = 0, the equation becomes ln P1(t) = −(λ1 + λ2) t. Taking the exponential on both
sides, P1(t) = e^(−(λ1 + λ2) t), which is the same as what I have written here. So, by solving
this differential equation we have obtained P1(t).
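
The limit step can also be seen numerically: applying the difference equation P1(t + Δt) = P1(t)(1 − λ1Δt − λ2Δt) repeatedly with smaller and smaller Δt approaches the exponential solution. The Python sketch below is illustrative only; the rates, the end time and the function name `p1_by_stepping` are assumed.

```python
import math

lam1, lam2 = 0.002, 0.003  # assumed failure rates per hour (illustrative values)
t_end = 100.0              # time at which P1 is evaluated, in hours

def p1_by_stepping(dt):
    """Apply P1(t + dt) = P1(t) * (1 - lam1*dt - lam2*dt) repeatedly from P1(0) = 1."""
    steps = round(t_end / dt)
    p1 = 1.0
    for _ in range(steps):
        p1 *= 1.0 - lam1 * dt - lam2 * dt
    return p1

exact = math.exp(-(lam1 + lam2) * t_end)  # closed-form solution of the differential equation

# Smaller time steps bring the stepped value closer to the exponential.
for dt in (10.0, 1.0, 0.01):
    print(dt, p1_by_stepping(dt), exact)
```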

(Refer Slide Time: 27:29)

Similarly, we can get the other probabilities. For state 2, the probability that the system
stays in the same state is 1 minus the outgoing probabilities. The only outgoing probability from
state 2 is λ2Δt, so the probability that the system is in state 2 and remains in state 2, that
is, it does not go out of that state, is 1 − λ2Δt. Similarly, for state 3 it will be 1 − λ1Δt.

Now, we can make the same kind of equation for P2(t + Δt). There are two possibilities. The
first is that the system is in state 1 at time t, which has probability P1(t), and it transits
from state 1 to state 2, for which the transition probability is λ1Δt. The second possibility is
that the system is in state 2 and remains in state 2: that is the probability that the system is
in state 2, P2(t), multiplied by the probability that it remains in state 2, 1 − λ2Δt.

So, P2(t + Δt) = P1(t) λ1Δt + P2(t)(1 − λ2Δt). If we take P2(t) to the left side and take Δt
common, this becomes P2(t + Δt) − P2(t) = [λ1 P1(t) − λ2 P2(t)] Δt. So, we can write
[P2(t + Δt) − P2(t)] / Δt = λ1 P1(t) − λ2 P2(t). Again, when we take the limit Δt tending to
0, this gives dP2(t)/dt = λ1 P1(t) − λ2 P2(t).

We have derived these equations step by step, but we do not always need to do the same thing. If
you look closely at the diagram, we can write them down directly. Look at the first equation:
how much is the change in P1(t)? A transition that is incoming to the state contributes a
positive term, and a transition that is outgoing from the state contributes a negative term.

For P1(t), the change is only negative, because state 1 has only outgoing transitions. There are
two of them from state 1: one at rate λ1, giving P1(t) λ1, and the second at rate λ2. So, we
get −(λ1 + λ2) P1(t); both are outgoing, so both are negative.

Similarly, if we look at the second state, the incoming rate is λ1 from state 1, giving
+λ1 P1(t), and the outgoing rate is λ2 from state 2, giving −λ2 P2(t). That is how the
equations can be written directly from the diagram. For state 3, if I want to write the same
equation, the incoming rate is λ2 from state 1, giving P1(t) λ2, and the outgoing rate is λ1
from state 3, giving P3(t) λ1. So, dP3(t)/dt = λ2 P1(t) − λ1 P3(t).
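
Collected in one place, the incoming-minus-outgoing rule gives the following system. The equation for state 4 is not written out in the discussion above; it is added here as an assumption that follows from the same rule, since state 4 has two incoming arcs and no outgoing arc.

```latex
\begin{align}
\frac{dP_1(t)}{dt} &= -(\lambda_1 + \lambda_2)\,P_1(t) \\
\frac{dP_2(t)}{dt} &= \lambda_1 P_1(t) - \lambda_2 P_2(t) \\
\frac{dP_3(t)}{dt} &= \lambda_2 P_1(t) - \lambda_1 P_3(t) \\
\frac{dP_4(t)}{dt} &= \lambda_2 P_2(t) + \lambda_1 P_3(t)
\end{align}
```

Adding the four right hand sides gives zero, which is consistent with the total probability staying constant over time.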

When we solve these equations, there is another equation which is always true:
P1(t) + P2(t) + P3(t) + P4(t) = 1. The system has to be in one of these states; it cannot be
outside them, so the state probabilities always sum to 1.
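
The rule "incoming is positive, outgoing is negative" can be applied mechanically, and the resulting equations can be integrated numerically. The sketch below is a hedged illustration, not the method used in the lecture: it swaps in a classical fourth-order Runge-Kutta step in plain Python, with assumed rates, step size and helper names (`derivatives`, `rk4_step`, `solve`).

```python
import math

lam1, lam2 = 0.002, 0.003  # assumed failure rates per hour (illustrative values)
rates = {(1, 2): lam1, (1, 3): lam2, (2, 4): lam2, (3, 4): lam1}

def derivatives(p):
    """dP/dt for each state: incoming flow is positive, outgoing flow is negative."""
    dp = {s: 0.0 for s in p}
    for (src, dst), rate in rates.items():
        flow = rate * p[src]
        dp[src] -= flow
        dp[dst] += flow
    return dp

def rk4_step(p, dt):
    """One classical fourth-order Runge-Kutta step for the state probabilities."""
    def shifted(base, slope, h):
        return {s: base[s] + h * slope[s] for s in base}
    k1 = derivatives(p)
    k2 = derivatives(shifted(p, k1, dt / 2))
    k3 = derivatives(shifted(p, k2, dt / 2))
    k4 = derivatives(shifted(p, k3, dt))
    return {s: p[s] + dt / 6 * (k1[s] + 2 * k2[s] + 2 * k3[s] + k4[s]) for s in p}

def solve(t_end, dt=0.5):
    """Integrate from t = 0, where the system is in state 1 with probability 1."""
    p = {1: 1.0, 2: 0.0, 3: 0.0, 4: 0.0}
    for _ in range(round(t_end / dt)):
        p = rk4_step(p, dt)
    return p

p = solve(100.0)
print(p[1], math.exp(-(lam1 + lam2) * 100.0))  # numerical P1 next to the closed form
print(sum(p.values()))                         # state probabilities should sum to 1
```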

Now, let us try to solve the equation for P2(t), that is, dP2(t)/dt = λ1 P1(t) − λ2 P2(t). I
am erasing this because it is already written here. In this equation, if we substitute
P1(t) = e^(−(λ1 + λ2) t), it becomes dP2(t)/dt = λ1 e^(−(λ1 + λ2) t) − λ2 P2(t).
Rearranging, we get the differential equation dP2(t)/dt + λ2 P2(t) = λ1 e^(−(λ1 + λ2) t).
Here again we can use a standard method for solving differential equations.

We multiply both sides by e^(λ2 t). Now, consider the term e^(λ2 t) P2(t). If we differentiate
it with respect to t using the product rule, what will we get? We will get
e^(λ2 t) dP2(t)/dt + λ2 e^(λ2 t) P2(t).

These are exactly the two terms on the left hand side after the multiplication. So, the left hand
side is nothing but the derivative of e^(λ2 t) P2(t) with respect to t, and our equation has
become d/dt [e^(λ2 t) P2(t)] = λ1 e^(−(λ1 + λ2) t) e^(λ2 t).

On the right hand side, when we multiply by e^(λ2 t), this becomes λ1 e^(−λ1 t), because the
exponent is −λ1 t − λ2 t + λ2 t and the λ2 t terms cancel. Now, we take dt to the right and
integrate both sides. The left side becomes e^(λ2 t) P2(t), and the right side is the integral of
λ1 e^(−λ1 t).

λ1 is a constant, and when we integrate e^(−λ1 t) we get −(1/λ1) e^(−λ1 t), so the right side
is −e^(−λ1 t) + C. Now, at t = 0, P2(0) = 0. When we put t = 0, the left side becomes 0 and
the right side becomes −1 + C. So, C will be equal to 1. When we put C = 1, we get
e^(λ2 t) P2(t) = 1 − e^(−λ1 t).

Now, we divide both sides by $e^{\lambda_2 t}$. So, $P_2(t) = e^{-\lambda_2 t}\,(1 - e^{-\lambda_1 t})$,
that is, $P_2(t) = e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2)t}$, the same answer we got
earlier.
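
The spoken steps above can be collected into one chain of equations. This is only a restatement of the integrating-factor argument with the initial condition $P_2(0) = 0$; it uses the lecture's own symbols and assumes nothing beyond them.

```latex
\begin{align*}
\frac{dP_2(t)}{dt} + \lambda_2 P_2(t) &= \lambda_1 e^{-(\lambda_1+\lambda_2)t} \\
\frac{d}{dt}\left[e^{\lambda_2 t} P_2(t)\right] &= \lambda_1 e^{-\lambda_1 t} \\
e^{\lambda_2 t} P_2(t) &= -e^{-\lambda_1 t} + C,
  \qquad P_2(0) = 0 \;\Rightarrow\; C = 1 \\
P_2(t) &= e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2)t}
\end{align*}
```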

So, in a similar way we can solve for $P_3(t)$, and we get the same form of equation. The only
difference is that $\lambda_1$ and $\lambda_2$ are interchanged, so
$P_3(t) = e^{-\lambda_1 t} - e^{-(\lambda_1+\lambda_2)t}$. So, we are able to get $P_1(t)$,
$P_2(t)$ and $P_3(t)$.

Now, there can be two cases here. In a series system, state 1 is the only success state, because
if either component 1 or component 2 has failed, the system has failed. So, states 2, 3 and 4 are
failed states and only state 1 is a success state. So, the reliability is the probability that
the system is in state 1 at time t, that is $P_1(t)$. So, $R(t) = P_1(t) = e^{-(\lambda_1+\lambda_2)t}$,
which is the same as what we obtained with the reliability block diagram approach.

But in a parallel system, one failure is allowed; only one working device is required. So,
states 1, 2 and 3 are all success states. So, the reliability here becomes
$R(t) = P_1(t) + P_2(t) + P_3(t)$, or we can say $1 - P_4(t)$.
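
As a small numerical sketch of the two cases, the closed-form state probabilities can be summed over the success states. The function names and the rates used here are illustrative assumptions, not values from the lecture.

```python
import math


def state_probabilities(lam1, lam2, t):
    """Closed-form state probabilities for two independent components.

    State 1: both working; state 2: component 1 failed;
    state 3: component 2 failed; state 4: both failed.
    """
    p1 = math.exp(-(lam1 + lam2) * t)
    p2 = math.exp(-lam2 * t) - p1
    p3 = math.exp(-lam1 * t) - p1
    p4 = 1.0 - (p1 + p2 + p3)
    return p1, p2, p3, p4


def series_reliability(lam1, lam2, t):
    # Only state 1 is a success state in a series system.
    return state_probabilities(lam1, lam2, t)[0]


def parallel_reliability(lam1, lam2, t):
    # States 1, 2 and 3 are success states in a parallel system.
    p1, p2, p3, _ = state_probabilities(lam1, lam2, t)
    return p1 + p2 + p3


if __name__ == "__main__":
    lam1, lam2, t = 0.02, 0.05, 10.0  # hypothetical rates (per day) and time
    print("series  :", series_reliability(lam1, lam2, t))
    print("parallel:", parallel_reliability(lam1, lam2, t))
```

The parallel value should agree with the block-diagram expression $1 - (1 - e^{-\lambda_1 t})(1 - e^{-\lambda_2 t})$, which is a quick way to cross-check the Markov result.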

Now, $P_1(t)$ is $e^{-(\lambda_1+\lambda_2)t}$, $P_2(t)$ is $e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2)t}$
and $P_3(t)$ is $e^{-\lambda_1 t} - e^{-(\lambda_1+\lambda_2)t}$. When we add them, one of the
$e^{-(\lambda_1+\lambda_2)t}$ terms cancels, and we get
$R(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2)t}$, which is the same as
what we got for the parallel configuration. So, we will continue this discussion in further
lectures for other system configurations. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 21
Markov Analysis
Hello everyone. So, we are moving to the next lecture and we will continue our discussion on
Markov analysis. In the previous lecture, we discussed a two-component system in which both
components work together. We considered two cases: a series system, where the system is
considered failed if any one of the components fails, and a parallel system, where one failure
is tolerated.

So, in the parallel system, system failure occurs only when both components fail. We could see
that once we know the state probabilities, we can get the reliability for both cases, the series
system and the parallel system. Today, we will first discuss the load sharing system. In a load
sharing system, as we discussed initially, we have two components.

(Refer Slide Time: 1:20)

So, like we discussed earlier, we have component 1 and component 2, and both are operating in
state 1. For state 2 and state 3, there are two possibilities. Either component 1 fails while
component 2 continues to operate; that is our system state 2. Or, in place of component 1,
component 2 fails; so in state 3 component 2 has failed but component 1 is operating.

In a load sharing system, as we discussed briefly in the previous class, let us say we are
talking about a generator system. So, if two generators are working together, each takes half
of the load, but if one of the generators fails, then all the load is shifted to the other
generator.

So, that generator is now working under a higher load; in state 2, component 2 is working alone.
Under the higher load, the failure rate will in general not be the same as $\lambda_2$. It may
become higher. It may remain the same also, but in general the concept of load sharing is that,
because of the failure of one portion of the system, the other portion is now under higher
stress.

Under higher stress the chances of failure are higher. So, the conditional probability of
failure per unit time, that is the failure rate, is also higher. Similarly, in state 3 only
component 1 is working and component 2 has failed, so component 1 is also under higher load. So,
the failure rate of component 1 increases to a new value which we call $\lambda_1^{+}$; the plus
denotes that the failure rate has increased. Similarly, in state 2 the failure rate of
component 2 has increased to a new value which is $\lambda_2^{+}$.

So, if we follow the same notation, then for state 1 there are only two outgoing rates from
$P_1$, $\lambda_1$ and $\lambda_2$, so $dP_1(t)/dt = -(\lambda_1+\lambda_2)\,P_1(t)$. This
equation we have already solved. This is the same equation which we had in the two-component
case, and the result was $P_1(t) = e^{-(\lambda_1+\lambda_2)t}$, the same thing which we have
got here. Now, we have another equation. This equation is for $P_2$.

(Refer Slide Time: 4:07)

So, let us see how we solve the $P_2$ equation. The solution will be similar to how we solved
the $P_2$ equation in the previous case. So, we substitute $P_1(t)$ from above, and this becomes
$dP_2(t)/dt = \lambda_1 e^{-(\lambda_1+\lambda_2)t} - \lambda_2^{+} P_2(t)$.

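\[ \frac{dP_1(t)}{dt} = -(\lambda_1+\lambda_2)\,P_1(t) \]
\[ \frac{dP_2(t)}{dt} = \lambda_1 P_1(t) - \lambda_2^{+} P_2(t) \]
\[ \frac{dP_3(t)}{dt} = \lambda_2 P_1(t) - \lambda_1^{+} P_3(t) \]
\[ P_1(t) = e^{-(\lambda_1+\lambda_2)t} \]
\[ P_2(t) = \frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_2^{+}}\left[e^{-\lambda_2^{+}t} - e^{-(\lambda_1+\lambda_2)t}\right] \]
\[ P_3(t) = \frac{\lambda_2}{\lambda_1+\lambda_2-\lambda_1^{+}}\left[e^{-\lambda_1^{+}t} - e^{-(\lambda_1+\lambda_2)t}\right] \]
\[ R(t) = P_1(t) + P_2(t) + P_3(t) = 1 - P_4(t) \]
\[ R(t) = e^{-2\lambda t} + \frac{2\lambda}{2\lambda-\lambda^{+}}\left[e^{-\lambda^{+}t} - e^{-2\lambda t}\right], \quad \text{if both units are identical.} \]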

As we discussed earlier, we take the $\lambda_2^{+} P_2(t)$ term to the left side, so this
becomes $dP_2(t)/dt + \lambda_2^{+} P_2(t) = \lambda_1 e^{-(\lambda_1+\lambda_2)t}$. Since the
coefficient here is $\lambda_2^{+}$, we multiply all the terms on both sides by
$e^{\lambda_2^{+}t}$. Once we multiply, the left-hand side becomes the derivative of
$e^{\lambda_2^{+}t} P_2(t)$ with respect to t; if you differentiate this product, the same two
terms are generated. So,
$\frac{d}{dt}\left[e^{\lambda_2^{+}t} P_2(t)\right] = \lambda_1 e^{-(\lambda_1+\lambda_2)t}\,e^{\lambda_2^{+}t}$.

We can take this $e^{\lambda_2^{+}t}$ inside the exponent, so the right-hand side is
$\lambda_1 e^{-(\lambda_1+\lambda_2-\lambda_2^{+})t}$; because $\lambda_2^{+}t$ enters with a
plus sign, inside the bracket it becomes minus $\lambda_2^{+}$. Now, as we saw earlier, we take
dt to the right-hand side and integrate both sides. The left side gives
$e^{\lambda_2^{+}t} P_2(t)$. On the right, whatever terms are independent of t we can take
outside the integral. So, $\lambda_1$ comes outside and we have
$\lambda_1 \int e^{-(\lambda_1+\lambda_2-\lambda_2^{+})t}\,dt$.

If we do this integration, the negative of the coefficient of t in the exponent comes in the
denominator, so we get
$-\frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_2^{+}}\,e^{-(\lambda_1+\lambda_2-\lambda_2^{+})t}$
plus a constant c.

Now, for this constant c, we use the initial condition: at t = 0 the system is in state 1, so
$P_2(0) = 0$. So, once we put t = 0, the exponential becomes $e^{0} = 1$ and the left-hand side
becomes 0 because $P_2$ is 0. So, the equation becomes
$0 = -\frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_2^{+}} + c$.

So, here we can see $c = \frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_2^{+}}$. We now place this
value back into the equation. I will erase some portion here.

(Refer Slide Time: 8:29)

So, let us use the pen again. We want to calculate $P_2(t)$ from the equation we had. The
constant c and the coefficient of the exponential are the same value, so we can take it common:
$e^{\lambda_2^{+}t} P_2(t) = \frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_2^{+}}\left[1 - e^{-(\lambda_1+\lambda_2-\lambda_2^{+})t}\right]$.

Now, if we take $e^{\lambda_2^{+}t}$ to the right-hand side, then
$P_2(t) = \frac{\lambda_1}{\lambda_1+\lambda_2-\lambda_2^{+}}\left[e^{-\lambda_2^{+}t} - e^{-(\lambda_1+\lambda_2)t}\right]$.
In the second term the $\lambda_2^{+}$ in the exponent cancels, so the remaining term is
$e^{-(\lambda_1+\lambda_2)t}$.

This gives me the final $P_2$ value. If you look at the slide, it is the same: $\lambda_1$ upon
$\lambda_1+\lambda_2-\lambda_2^{+}$, multiplied by $e^{-\lambda_2^{+}t}$ minus
$e^{-(\lambda_1+\lambda_2)t}$. Sorry, there is a typing mistake on the slide; this term has to
come below, in the denominator.

Similarly, if we solve for $P_3$, a similar equation will be there, only $\lambda_1$ and
$\lambda_2$ get interchanged. So,
$P_3(t) = \frac{\lambda_2}{\lambda_1+\lambda_2-\lambda_1^{+}}\left[e^{-\lambda_1^{+}t} - e^{-(\lambda_1+\lambda_2)t}\right]$.
This follows from the similarity: state 3 is very similar to state 2, and the only change is
that one component is exchanged with the other.
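
One way to gain confidence in these closed forms is to integrate the three state equations numerically and compare. The sketch below uses a hand-written classical Runge-Kutta (RK4) stepper; the function names, step count and rate values are illustrative assumptions, not part of the lecture.

```python
import math


def load_sharing_closed_form(lam1, lam2, lam1_plus, lam2_plus, t):
    """P1, P2, P3 of the load-sharing model from the closed-form expressions."""
    base = math.exp(-(lam1 + lam2) * t)
    p1 = base
    p2 = lam1 / (lam1 + lam2 - lam2_plus) * (math.exp(-lam2_plus * t) - base)
    p3 = lam2 / (lam1 + lam2 - lam1_plus) * (math.exp(-lam1_plus * t) - base)
    return p1, p2, p3


def load_sharing_rk4(lam1, lam2, lam1_plus, lam2_plus, t_end, steps=2000):
    """Integrate dP1/dt, dP2/dt, dP3/dt from t = 0 with P = (1, 0, 0)."""

    def deriv(p):
        p1, p2, p3 = p
        return (
            -(lam1 + lam2) * p1,
            lam1 * p1 - lam2_plus * p2,
            lam2 * p1 - lam1_plus * p3,
        )

    h = t_end / steps
    p = (1.0, 0.0, 0.0)
    for _ in range(steps):
        k1 = deriv(p)
        k2 = deriv(tuple(x + 0.5 * h * k for x, k in zip(p, k1)))
        k3 = deriv(tuple(x + 0.5 * h * k for x, k in zip(p, k2)))
        k4 = deriv(tuple(x + h * k for x, k in zip(p, k3)))
        p = tuple(
            x + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        )
    return p


if __name__ == "__main__":
    rates = (0.01, 0.02, 0.10, 0.15)  # hypothetical lam1, lam2, lam1+, lam2+
    print(load_sharing_closed_form(*rates, 10.0))
    print(load_sharing_rk4(*rates, 10.0))
```

The closed form assumes $\lambda_1+\lambda_2 \neq \lambda_2^{+}$ and $\lambda_1+\lambda_2 \neq \lambda_1^{+}$; the numerical integration has no such restriction.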

(Refer Slide Time: 11:19)

So, here we are able to get $P_1(t)$, $P_2(t)$ and $P_3(t)$. Now, what is the reliability?
State 1 is a success state because both components are working; all the load is served, but in
a shared mode. If the system is in state 2, that is also a success, because there also my
complete load is being served. In state 3 also all the load is being served. State 4 is a
failure state because both components have failed and my requirements are no longer fulfilled.
So, my reliability is nothing but the probability that the system is in state 1 or state 2 or
state 3. All three are mutually exclusive states, so their probabilities are summed up, and
this gives me the reliability.

What happens if we sum these three? Here we consider that both components are identical units,
which means they are of the same type, same design, same manufacturer, purchased on the same
day. So, if both the units are identical, their failure behavior is also expected to be
identical. So, in that case $\lambda_1 = \lambda_2$, which we call $\lambda$, and
$\lambda_1^{+} = \lambda_2^{+}$, which we call $\lambda^{+}$.

So, when we do this, $P_1(t) = e^{-2\lambda t}$ and
$P_2(t) = \frac{\lambda}{2\lambda-\lambda^{+}}\left[e^{-\lambda^{+}t} - e^{-2\lambda t}\right]$.
Whatever $P_2$ is, the same $P_3$ will also come, so their sum is twice of this:
$\frac{2\lambda}{2\lambda-\lambda^{+}}\left[e^{-\lambda^{+}t} - e^{-2\lambda t}\right]$. So, our
answer is
$R(t) = e^{-2\lambda t} + \frac{2\lambda}{2\lambda-\lambda^{+}}\left[e^{-\lambda^{+}t} - e^{-2\lambda t}\right]$.

So, as we see here, we are able to calculate reliability. If we want to calculate MTTF, we know
it is the integral from 0 to infinity of R(t) dt. As you know, the integral from 0 to infinity
of $e^{-\lambda t}\,dt$ is equal to $1/\lambda$, so the integration of the first term gives
$1/(2\lambda)$.

So, in the same way, the MTTF becomes $1/(2\lambda)$ plus the constant term, which remains
outside, $\frac{2\lambda}{2\lambda-\lambda^{+}}$, multiplied by the integrals of the two
exponentials: the first gives $1/\lambda^{+}$ and the second gives $1/(2\lambda)$. So,
$\text{MTTF} = \frac{1}{2\lambda} + \frac{2\lambda}{2\lambda-\lambda^{+}}\left[\frac{1}{\lambda^{+}} - \frac{1}{2\lambda}\right]$.
We can solve this further to get the final answer.
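
A possible way to carry the simplification through is sketched below. It is a worked check, assuming $2\lambda \neq \lambda^{+}$ so that the coefficient is defined; it is not taken from the slides.

```latex
\begin{align*}
\frac{1}{\lambda^{+}} - \frac{1}{2\lambda}
  &= \frac{2\lambda - \lambda^{+}}{2\lambda\,\lambda^{+}} \\
\text{MTTF}
  &= \frac{1}{2\lambda}
   + \frac{2\lambda}{2\lambda-\lambda^{+}} \cdot
     \frac{2\lambda - \lambda^{+}}{2\lambda\,\lambda^{+}}
   = \frac{1}{2\lambda} + \frac{1}{\lambda^{+}}
\end{align*}
```

This reads naturally: the mean time to the first failure with both units running is $1/(2\lambda)$, and the surviving unit then lasts $1/\lambda^{+}$ on average. With $\lambda^{+} = \lambda$ it reduces to $3/(2\lambda)$, the MTTF of two identical units in ordinary parallel.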

(Refer Slide Time: 14:40)

Let us take one example. We have two generators which provide electrical power. If one fails,
the other one can continue to provide electrical power. However, the increased load results in
a higher failure rate for the remaining generator. So, let us say $\lambda$ is 0.01 failures
per day; that is the failure rate when both are sharing the load. And $\lambda^{+}$ is 0.1
failures per day. So, the failure rate becomes 10 times higher when the load is not shared and
one unit is taking the full load.

Now, what is the system reliability for a 10-day contingency operation? That means there is no
power and for 10 days these two generators have to supply the power. To get this reliability,
we can directly apply the formula we have got:
$R(t) = e^{-2\lambda t} + \frac{2\lambda}{2\lambda-\lambda^{+}}\left[e^{-\lambda^{+}t} - e^{-2\lambda t}\right]$.
With $\lambda = 0.01$ and $\lambda^{+} = 0.1$, the first term is $e^{-2\times 0.01\,t}$, the
coefficient is $2\times 0.01$ divided by $0.02 - 0.1$, and the bracket is $e^{-0.1t}$ minus
$e^{-2\times 0.01\,t}$.

(Refer Slide Time: 16:23)

So, these calculations can be done on an Excel sheet. For the MTTF, as we saw earlier, the first
term is $1/(2\lambda)$, that is $1/(2\times 0.01)$; plus the same coefficient as it is,
multiplied by $1/0.1$ minus $1/(2\times 0.01)$. The failure rate is per day, so the MTTF also
comes in days, and it comes out to be 60 days.

\[ R(t) = e^{-2(0.01)t} + \frac{2(0.01)}{0.02-0.10}\left[e^{-0.1t} - e^{-2(0.01)t}\right] \]
\[ R(10) = 0.9314 \]
\[ \text{MTTF} = \frac{1}{2(0.01)} + \frac{2(0.01)}{0.02-0.10}\left[\frac{1}{0.1} - \frac{1}{2(0.01)}\right] = 60 \text{ days} \]
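
The same numbers can be reproduced in a few lines of Python instead of a spreadsheet. This is a minimal sketch of the identical-unit formulas; the function names are hypothetical.

```python
import math


def load_sharing_reliability(lam, lam_plus, t):
    """R(t) for two identical load-sharing units (requires 2*lam != lam_plus)."""
    coeff = 2.0 * lam / (2.0 * lam - lam_plus)
    both_up = math.exp(-2.0 * lam * t)
    return both_up + coeff * (math.exp(-lam_plus * t) - both_up)


def load_sharing_mttf(lam, lam_plus):
    """MTTF obtained by integrating R(t) term by term from 0 to infinity."""
    coeff = 2.0 * lam / (2.0 * lam - lam_plus)
    return 1.0 / (2.0 * lam) + coeff * (1.0 / lam_plus - 1.0 / (2.0 * lam))


if __name__ == "__main__":
    lam, lam_plus = 0.01, 0.1  # failures per day, from the generator example
    print(f"R(10) = {load_sharing_reliability(lam, lam_plus, 10.0):.4f}")  # R(10) = 0.9314
    print(f"MTTF  = {load_sharing_mttf(lam, lam_plus):.1f} days")          # MTTF  = 60.0 days
```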

So, using this example, you can see that if there is a shared load, we can solve the problem to
get the reliability.

(Refer Slide Time: 17:28)

Now, let us look at standby systems. As we discussed earlier, a generator can be one example of
a standby system. In the case of a generator, we generally have the main power supply, the
electrical input, working. So, we have the electrical supply coming from the state electricity
board.

So, this power supply is working, and we have a certain reliability for it, or a failure rate,
let us say $\lambda_1$. When this power supply fails, the generator is plugged in and supplies
the power. When the generator supplies the power, our system continues to work.

So, the generator is just an example here; it does not have to be a generator. This applies to
any system where the main function is being served by a primary unit, and whenever the primary
unit fails, another component, the standby unit, takes over. It can be a pump also: some primary
pump may be pumping water in, let us say, a nuclear power plant, and when it fails, another
pump which is there is started and supplies the water.

So, whatever system we are considering, the second unit is plugged in and starts to do the
function which was supposed to be done by the primary unit. Now, here we are considering that
the generator, or the secondary unit, can also fail in standby mode. But generally the failure
probability in standby mode is smaller than the failure probability in fully operational mode.

So, the secondary unit which is in standby is our component 2, and the primary unit is our
component 1. The failure rate of component 2 is $\lambda_2$ when it is operational and
$\lambda_2^{-}$ when it is not operational. This is the difference which we generally see
between the standby system and the parallel system.

In the parallel system, both units are working, so from the start the failure rate was
$\lambda_2$ only. But in the standby system, the unit is in standby; it is not operating. Since
it is not operating, it is not experiencing the stress, and so the chances of failure are less.
So, the failure rate in standby mode is many times considered 0. If it is not considered 0,
then we consider that some failure rate is there in standby mode, that is $\lambda_2^{-}$; the
minus denotes that this is a lesser failure rate compared to the failure rate it would have
experienced in the general operating mode.

Now, in system state 1, component 1 is in the operational state and component 2 is in the
standby state. So, there are two possibilities here: either component 1 fails while operating,
or component 2 fails in standby mode itself while component 1 keeps on working. So, in state 3,
component 1 works and component 2 has failed; it failed in standby mode. The other possibility
is state 2, where component 1 has failed but component 2 is still available.

Now, what happens in state 2? The moment component 1 fails, component 2 is plugged in and
becomes fully operational. So, component 2, which was in standby, becomes operational here, and
now it can fail with failure rate $\lambda_2$. In state 3, component 2 has failed and only
component 1 is working, with failure rate $\lambda_1$. So, this becomes our Markov diagram. It
is the same kind of Markov diagram as we solved earlier, so we can develop the same kind of
equations and solve them similarly.

(Refer Slide Time: 22:23)

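\[ \frac{dP_1(t)}{dt} = -(\lambda_1+\lambda_2^{-})\,P_1(t) \]
\[ \frac{dP_2(t)}{dt} = \lambda_1 P_1(t) - \lambda_2 P_2(t) \]
\[ \frac{dP_3(t)}{dt} = \lambda_2^{-} P_1(t) - \lambda_1 P_3(t) \]
\[ P_1(t) = e^{-(\lambda_1+\lambda_2^{-})t} \]
\[ P_2(t) = \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2^{-})t}\right] \]
\[ P_3(t) = e^{-\lambda_1 t} - e^{-(\lambda_1+\lambda_2^{-})t} \]
\[ R(t) = P_1(t) + P_2(t) + P_3(t) \]
\[ R(t) = e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2^{-})t}\right] \]
\[ \text{MTTF} = \frac{1}{\lambda_1} + \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[\frac{1}{\lambda_2} - \frac{1}{\lambda_1+\lambda_2^{-}}\right] \]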

Now, here the success states are 1, 2 and 3; in all three the system is working, because in
state 1 component 1 is operating, in state 2 component 2 is operating, and in state 3
component 1 is operating. In state 4 both component 1 and component 2 have failed. In state 1
component 2 is in standby mode, in state 2 component 1 has failed, and in state 3 component 2
has failed.

So, we have different states here and they have different probabilities. How do we determine
these state probabilities? As we discussed earlier, for state 1, $\lambda_1$ and $\lambda_2^{-}$
are the two outgoing rates, so $dP_1(t)/dt = -(\lambda_1+\lambda_2^{-})\,P_1(t)$.

Now, if we solve this, as we know from earlier, $P_1(t) = e^{-(\lambda_1+\lambda_2^{-})t}$. I am
removing these markings so that we can see properly.

(Refer Slide Time: 23:37)

So, solving this equation directly gives me this value. Now, we take the second equation. For
state 2, the incoming rate from state 1 is $\lambda_1$, so we have $\lambda_1 P_1(t)$, and the
only outgoing rate from state 2 is $\lambda_2$, so we have minus $\lambda_2 P_2(t)$. We can
solve this equation as we solved earlier, and when we solve it, we get $P_2(t)$ equal to this
value. We can try this again, so that you get a little more practice.

(Refer Slide Time: 24:40)

Let us try to solve this equation. It is the same process as earlier; the diagram is also
similar, and the only change is in the values. So, we substitute the $P_1(t)$ which we have got.
So, $dP_2(t)/dt = \lambda_1 e^{-(\lambda_1+\lambda_2^{-})t} - \lambda_2 P_2(t)$. I take the
$P_2$ term to the left side, so it becomes
$dP_2(t)/dt + \lambda_2 P_2(t) = \lambda_1 e^{-(\lambda_1+\lambda_2^{-})t}$.

Now, to solve this, since $\lambda_2$ multiplies $P_2(t)$, I multiply the whole equation by
$e^{\lambda_2 t}$. Once I do that, the left-hand side becomes the derivative of
$e^{\lambda_2 t} P_2(t)$ with respect to t, and it is equal to
$\lambda_1 e^{-(\lambda_1+\lambda_2^{-})t}\,e^{\lambda_2 t}$.

Now, when we take this $e^{\lambda_2 t}$ inside the exponent, since the bracket has a minus sign
and $\lambda_2 t$ was positive, it becomes minus $\lambda_2$ inside:
$\lambda_1 e^{-(\lambda_1+\lambda_2^{-}-\lambda_2)t}$. This is the same as before; I have just
skipped one step. Now, again we take dt to the right-hand side and integrate. The left-hand side
is $e^{\lambda_2 t} P_2(t)$. On the right, the negative of the coefficient of t comes in the
denominator, so we get
$-\frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\,e^{-(\lambda_1+\lambda_2^{-}-\lambda_2)t} + c$.

Again, when we put t = 0, $P_2(0) = 0$ and $e^{0} = 1$, so
$0 = -\frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2} + c$, and c is the positive of this
term, $c = \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}$. So,
$e^{\lambda_2 t} P_2(t) = \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[1 - e^{-(\lambda_1+\lambda_2^{-}-\lambda_2)t}\right]$,
where the 1 comes from c. We can solve this the same way as earlier.

So, to get $P_2(t)$ we multiply both sides by $e^{-\lambda_2 t}$. So,
$P_2(t) = \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2^{-})t}\right]$;
in the second term the $\lambda_2$ in the exponent is removed. This is the same $P_2(t)$ which
is on the slide. So, in the same way as we got the earlier probabilities, we are able to get
these probabilities also.
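
A quick sanity check on a closed-form solution like this is to substitute it back into its differential equation. The sketch below approximates $dP_2/dt$ with a central difference and evaluates the residual $dP_2/dt - (\lambda_1 P_1 - \lambda_2 P_2)$, which should be close to zero. The function names, step size and rate values are assumptions chosen for illustration.

```python
import math


def p1_standby(lam1, lam2_minus, t):
    """P1(t): primary operating, secondary in standby."""
    return math.exp(-(lam1 + lam2_minus) * t)


def p2_standby(lam1, lam2, lam2_minus, t):
    """P2(t): primary failed, secondary operating (closed form)."""
    coeff = lam1 / (lam1 + lam2_minus - lam2)
    return coeff * (math.exp(-lam2 * t) - math.exp(-(lam1 + lam2_minus) * t))


def p2_ode_residual(lam1, lam2, lam2_minus, t, h=1e-4):
    """dP2/dt - (lam1*P1 - lam2*P2), with dP2/dt from a central difference."""
    dp2 = (
        p2_standby(lam1, lam2, lam2_minus, t + h)
        - p2_standby(lam1, lam2, lam2_minus, t - h)
    ) / (2.0 * h)
    rhs = lam1 * p1_standby(lam1, lam2_minus, t) - lam2 * p2_standby(
        lam1, lam2, lam2_minus, t
    )
    return dp2 - rhs


if __name__ == "__main__":
    lam1, lam2, lam2_minus = 0.01, 0.10, 0.001  # illustrative rates per day
    for t in (1.0, 10.0, 30.0):
        print(t, p2_ode_residual(lam1, lam2, lam2_minus, t))
```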

(Refer Slide Time: 29:18)

Similarly, we are able to get $P_3(t)$ also by solving its equation. This is almost the same as
what we solved earlier in the two-component case; the only difference is that rather than
$\lambda_2$ it is $\lambda_2^{-}$. So, once we solve this, $P_3(t)$ comes out to be
$e^{-\lambda_1 t} - e^{-(\lambda_1+\lambda_2^{-})t}$. So, we have got all the probabilities
here, $P_1(t)$, $P_2(t)$ and $P_3(t)$, so we can get R(t). States 1, 2 and 3 are all success
states, and only state 4 is the failure state.

So, when I sum up my success state probabilities, I get R(t). My $P_1(t)$ is
$e^{-(\lambda_1+\lambda_2^{-})t}$, and it cancels with the second term of $P_3(t)$, so
$e^{-\lambda_1 t}$ remains, and $P_2(t)$ comes as it is. So, my reliability is
$R(t) = e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2^{-})t}\right]$.
If I want to calculate MTTF, I take the integral of this from 0 to infinity. The first term
gives $1/\lambda_1$. The coefficient comes as it is, multiplied by $1/\lambda_2$ minus
$1/(\lambda_1+\lambda_2^{-})$. This gives me the MTTF.

If we try to simplify this further by taking $\lambda_2(\lambda_1+\lambda_2^{-})$ as the common
denominator of the bracket, the numerator becomes $\lambda_1+\lambda_2^{-}-\lambda_2$,
multiplied by $\lambda_1$ from the coefficient. The simplified expression shown on the slide
has to be checked, so let us leave it; this solution may have been put in error.
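
One way the simplification could be completed is sketched below. It is a worked check under the assumption $\lambda_1+\lambda_2^{-} \neq \lambda_2$, not the expression from the slide.

```latex
\begin{align*}
\frac{1}{\lambda_2} - \frac{1}{\lambda_1+\lambda_2^{-}}
  &= \frac{\lambda_1+\lambda_2^{-}-\lambda_2}{\lambda_2\,(\lambda_1+\lambda_2^{-})} \\
\text{MTTF}
  &= \frac{1}{\lambda_1}
   + \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2} \cdot
     \frac{\lambda_1+\lambda_2^{-}-\lambda_2}{\lambda_2\,(\lambda_1+\lambda_2^{-})}
   = \frac{1}{\lambda_1} + \frac{\lambda_1}{\lambda_2\,(\lambda_1+\lambda_2^{-})}
\end{align*}
```

As a consistency check, setting $\lambda_2^{-} = 0$ (no failures in standby) gives $1/\lambda_1 + 1/\lambda_2$, the sum of the two mean lives.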

(Refer Slide Time: 31:46)

So, this is not applicable; let us hold this here. A standby system can be solved using these
equations, considering that the unit can fail in standby mode. But switch failure is not
considered here; that means the switch failure probability is 0, or the switch is perfectly
reliable. So, we will take an example of the same in the next class. We will stop here today.
Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 22
Markov Analysis (Contd.)
Hello everyone. So, we have been discussing Markov analysis, and in the previous two lectures
we discussed three different cases. In the first lecture, we discussed a two-component system
in which the components work together; the system can be series or parallel depending on the
configuration. Then we discussed the shared load system, where if one unit fails, its load goes
to the other unit, and in that case the failure rate of the other unit becomes higher.

Then we also discussed the standby system. There we considered the situation where the unit
which is in standby can fail while in standby, but with a lesser failure rate; it does not have
the stress, so its failure rate in standby mode is less than its failure rate in the general
operating mode. We derived the equations and we also saw how to solve those equations to get
the values. Today we will do one small example from where we left off in the last class, that
is, first the standby system.

(Refer Slide Time: 1:42)

\[ R(t) = e^{-0.01t} + \frac{0.01}{0.01+0.001-0.10}\left[e^{-0.1t} - e^{-(0.01+0.001)t}\right] \]
\[ \text{MTTF} = \frac{1}{0.01} + \frac{0.01}{0.01+0.001-0.10}\left[\frac{1}{0.1} - \frac{1}{0.01+0.001}\right] = 109 \text{ days} \]

So, let us see an example similar to what we considered earlier. We have an active generator,
that is our primary unit, which is working and has a failure rate of 0.01 failures per day.
This is a new generator, newly made, with better reliability, that is, a smaller failure rate.
But there is another, older generator which we are keeping in standby.

This older generator has a failure rate of 0.001 failures per day when in standby; that means
it is not working, we are just keeping it there, but it may still fail, and when we try to run
it we may find that it is not working. Another thing is that when we operate this old
generator, its failure rate is 0.10 failures per day, which is significantly higher than the
failure rate of the active generator. Now, we want to know the reliability of this system for
30 days of use.

Looking at the formula we developed previously, my $\lambda_1$ here is 0.01 failures per day.
The failure rate of the generator in standby mode, $\lambda_2^{-}$, is 0.001 failures per day.
And $\lambda_2$ in operation is 0.10 failures per day. Now, as we have solved earlier, we can
use the equation and get the value.

So, from the previous presentation, the reliability was
$R(t) = e^{-\lambda_1 t} + \frac{\lambda_1}{\lambda_1+\lambda_2^{-}-\lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1+\lambda_2^{-})t}\right]$.
The same equation we can use here.

We put the values of $\lambda_1$, $\lambda_2^{-}$ and $\lambda_2$ here, and by putting t = 30 we
get the reliability for 30 days, which comes out to be 0.8160. In the same way we can get the
MTTF also: it is $1/0.01$ plus the same coefficient as it is, multiplied by $1/0.10$ minus
$1/(0.01+0.001)$. This value comes out to be 109 days.
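
The substitution can be checked with a short script. This is a minimal sketch of the standby formulas with the example's rates; the function names are hypothetical, and a perfectly reliable switch is assumed, as in the lecture.

```python
import math


def standby_reliability(lam1, lam2, lam2_minus, t):
    """R(t) for a two-unit standby system whose secondary can fail in standby."""
    coeff = lam1 / (lam1 + lam2_minus - lam2)
    return math.exp(-lam1 * t) + coeff * (
        math.exp(-lam2 * t) - math.exp(-(lam1 + lam2_minus) * t)
    )


def standby_mttf(lam1, lam2, lam2_minus):
    """MTTF obtained by integrating R(t) term by term from 0 to infinity."""
    coeff = lam1 / (lam1 + lam2_minus - lam2)
    return 1.0 / lam1 + coeff * (1.0 / lam2 - 1.0 / (lam1 + lam2_minus))


if __name__ == "__main__":
    lam1, lam2, lam2_minus = 0.01, 0.10, 0.001  # failures per day
    print(f"R(30) = {standby_reliability(lam1, lam2, lam2_minus, 30.0):.4f}")  # R(30) = 0.8160
    print(f"MTTF  = {standby_mttf(lam1, lam2, lam2_minus):.1f} days")          # MTTF  = 109.1 days
```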

So, we can use the formula which we derived earlier to get this probability directly, and we
know the reliability as a function of time. So, we can get the reliability for 60 days also, or
for 100 days also; whatever value of t we put, we get the reliability, because we have the
reliability function as a function of time t.

(Refer Slide Time: 5:29)

Now, let us see the case where our units are identical. Here we consider that k identical units
are used, one of them is online and the rest are in standby. For these kinds of systems the
reliability can be evaluated using the following formula. How does this formula come? Let us
try to derive it.

\[ R_k(t) = e^{-\lambda t} \sum_{i=0}^{k-1} \frac{(\lambda t)^i}{i!} \]

\[ \text{MTTF} = k/\lambda \]
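
The formula is straightforward to evaluate directly. The sketch below computes $R_k(t)$ and also estimates the MTTF by trapezoidal integration of $R_k(t)$, to compare against $k/\lambda$. The function names, the integration limit and the step count are illustrative assumptions.

```python
import math


def k_unit_standby_reliability(k, lam, t):
    """R_k(t) = exp(-lam*t) * sum_{i=0}^{k-1} (lam*t)^i / i!"""
    x = lam * t
    return math.exp(-x) * sum(x**i / math.factorial(i) for i in range(k))


def k_unit_standby_mttf(k, lam):
    """Closed-form MTTF for k identical units with no standby failures."""
    return k / lam


def mttf_by_integration(k, lam, upper_multiple=60.0, steps=60000):
    """Trapezoidal estimate of the integral of R_k(t) from 0 to upper_multiple/lam."""
    t_end = upper_multiple / lam
    h = t_end / steps
    total = 0.5 * (
        k_unit_standby_reliability(k, lam, 0.0)
        + k_unit_standby_reliability(k, lam, t_end)
    )
    for j in range(1, steps):
        total += k_unit_standby_reliability(k, lam, j * h)
    return total * h


if __name__ == "__main__":
    k, lam = 3, 0.01  # hypothetical: three units, 0.01 failures per day
    print(k_unit_standby_reliability(k, lam, 100.0))
    print(k_unit_standby_mttf(k, lam), mttf_by_integration(k, lam))
```

With k = 1 the sum has a single term and the expression reduces to the familiar $e^{-\lambda t}$ of one unit.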

Let us say that initially we have k units: component 1 is operating and components 2 to k are
in standby. So, let us take one value of k as an example, and that will help us to understand.
So, let us say k is equal to 3, which means I have three units. So, one unit, C1, is in
operation and C2, C3 are in standby.

Now, we are assuming here that a unit cannot fail in standby mode; a unit can fail only when it is operating. All components have the same failure rate and only one unit is operating at a time, so the transition rate out of this state is the failure rate of one unit, lambda.

From here we reach system state 2. In system state 2 component 1 is failed, component 2 is operating and component 3 is in standby. From here again, because only one component is working and its failure rate is lambda, the system will reach state number 3. In state number 3 component 1 is already failed, component 2 is also failed and component 3 becomes operational.

From here, with the failure rate lambda, it will reach system state number 4, where component 1 is failed, component 2 is failed and component 3 is also failed. So, this becomes our failure state. I am showing it for three units; the same can be extended to any number of states.

Now, for three units, if we write the equation for the first state, then $dP_1(t)/dt$ has only one outgoing transition, so it is equal to $-\lambda P_1(t)$. If we solve this, $P_1(t)$ will be equal to $e^{-\lambda t}$. This we already know, because we have solved so many such equations.

Now, for state 2, $dP_2(t)/dt$ has an incoming term from state 1, that is $\lambda P_1(t)$, and an outgoing term $-\lambda P_2(t)$. Rearranging, this becomes $dP_2(t)/dt + \lambda P_2(t) = \lambda P_1(t)$, where $P_1(t)$ is $e^{-\lambda t}$. To solve this we multiply both sides by $e^{\lambda t}$; I am skipping two steps here.

This is already taught in the solution of differential equations, so if you remember you can directly do that. The term $e^{\lambda t} P_2(t)$ will be equal to the integration of $\lambda e^{-\lambda t} \cdot e^{\lambda t}\,dt$ plus c. The product of the exponentials is 1, so when we integrate we get $e^{\lambda t} P_2(t) = \lambda t + c$. If I put t equal to 0, then $P_2(0)$ is 0, so the left side is 0 and $\lambda t$ is 0, and this implies c is equal to 0.

So, with c equal to 0, my equation becomes $e^{\lambda t} P_2(t) = \lambda t$, and my $P_2(t)$ will be equal to $\lambda t\, e^{-\lambda t}$. Similarly, I can solve for $P_3(t)$. Here $dP_3(t)/dt$ will be equal to the incoming term from state 2, that is $\lambda P_2(t)$, minus the outgoing term $\lambda P_3(t)$. This is solved in the same way as earlier. I will show this, but I need to erase a little so that I can use some space here.

(Refer Slide Time: 10:56)

If you look at the formula, when I put i equal to 0, $(\lambda t)^0$ becomes 1, so I get $e^{-\lambda t}$, which is the same term I have got for $P_1(t)$. When I put i equal to 1, the term becomes $(\lambda t)^1$ divided by factorial 1. Factorial 1 is 1, so this becomes $\lambda t\, e^{-\lambda t}$, which is the same term I have got for $P_2(t)$.

Similarly, when I solve for $P_3(t)$, my term will be the one for i equal to 2, that is $(\lambda t)^2$ divided by factorial 2, multiplied by $e^{-\lambda t}$. So, the same thing we should get.

(Refer Slide Time: 11:47)

So, let us solve the same thing; I have got some space here. I am removing all of this, but not the diagram because we need to use that again. Now, I will put the $P_2(t)$ value into the equation for state 3. The incoming term becomes lambda into $\lambda t\, e^{-\lambda t}$, that is $\lambda^2 t\, e^{-\lambda t}$, and when I take the outgoing term to the left hand side I get $dP_3(t)/dt + \lambda P_3(t) = \lambda^2 t\, e^{-\lambda t}$.

Now, again as we discussed earlier, $e^{\lambda t} P_3(t)$ will be equal to the integration of $\lambda^2 t\, e^{-\lambda t}$ multiplied by $e^{\lambda t}$, dt, plus c. Here $e^{-\lambda t}$ into $e^{\lambda t}$ becomes 1, so this is the integration of $\lambda^2 t\,dt$ plus c. When we integrate, t becomes $t^2/2$, so $e^{\lambda t} P_3(t) = \lambda^2 t^2/2 + c$.

Now, if I put t equal to 0, then $P_3(0)$ is 0, so the left side is 0, the first term on the right is 0, and that implies c is equal to 0. So, from this equation I get $e^{\lambda t} P_3(t) = (\lambda t)^2/2$, and from this $P_3(t) = e^{-\lambda t} (\lambda t)^2/2$.

As we see here, the same pattern continues with the number of states. The next term will contain $(\lambda t)^3$: integrating $t^2/2$ gives $t^3$ divided by 2 into 3, which is factorial 3. In the same way, as I keep on solving for however many units I have, the terms keep building up, and the general formula is the result which I will get.

So, my reliability is nothing but $P_1(t) + P_2(t) + P_3(t)$, and I have got all the values. My reliability becomes $e^{-\lambda t} + \lambda t\, e^{-\lambda t} + \frac{(\lambda t)^2}{2} e^{-\lambda t}$. This I can write as $e^{-\lambda t}\left[1 + \lambda t + (\lambda t)^2/2\right]$, which is the same as the formula for k equal to 3. If we continue for more units, we get the generalized equation.
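As a quick check of the pattern, the general sum can be written as a small Python function and compared with the k = 3 expression derived by hand. This is a minimal sketch; the function names and the values of lambda and t are arbitrary choices for illustration.

```python
import math

def standby_state_probs(k, lam, t):
    """P(exactly i units have failed by time t) for i = 0 .. k-1."""
    x = lam * t
    return [math.exp(-x) * x**i / math.factorial(i) for i in range(k)]

def standby_reliability(k, lam, t):
    """R_k(t): sum of all non-failed state probabilities."""
    return sum(standby_state_probs(k, lam, t))

lam, t = 0.1, 10.0          # illustrative values only
x = lam * t
by_hand = math.exp(-x) * (1 + x + x**2 / 2)   # the k = 3 result

print(standby_reliability(3, lam, t), by_hand)
```

With k = 1 the function reduces to the single-unit reliability, which is a convenient sanity check.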

(Refer Slide Time: 15:36)

For these identical units, when we solve we are able to get this $R_k(t)$. Similarly, when we integrate this function from 0 to infinity, the integration gives 1 upon lambda for each term. When i is equal to 0, we are integrating $e^{-\lambda t}$ and we get 1 upon lambda. Then we have $\lambda t\, e^{-\lambda t}$, and when we integrate this we again get 1 upon lambda, because the extra factor $(\lambda t)^i / i!$ makes no difference to the value of the integral.

For every further term we also get 1 upon lambda, so the MTTF comes out to be k upon lambda. This we can also understand intuitively if we look at our system with k components. In the case we solved, there were four states: component 1 operating, then component 2, then component 3, and then the failed state. So three components were there and they were working one by one. So, how much will the MTTF be?

Because the units cannot fail while they are in standby, on an average the total time will be three times the MTTF of one component. The first component will work, on an average, for a time MTTF. The second component will also work for an average time of MTTF, and so will the third.

So, if we have k such units, the average time to system failure will be the summation of the MTTFs of the individual units. So, we can say this is k into the MTTF of each unit.

(Refer Slide Time: 17:35)

So, this is equal to k, that is the number of units, multiplied by the MTTF of a component. This is the MTTF of my system. This is unlike what we got for two components in active parallel. What was happening there? The system MTTF was the component MTTF into 1.5, that is 1 plus 1 by 2, which is much less. It was 1.5 MTTF for 2 units in parallel, but for 2 units in standby it becomes 2 MTTF. So, for the standby system, because the standby unit is not operating while the first unit is operating, we get more time to failure, that is, the system works for a longer period of time.
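The comparison can be tabulated with a short Python sketch. The lecture only states the two-unit case (1.5 MTTF against 2 MTTF); the sum 1 + 1/2 + ... + 1/k used below for k active parallel units is the standard result for identical exponential units and is brought in here as an assumption, as are the function names and the failure rate.

```python
def mttf_standby(k, lam):
    """k identical units, one online, the rest in ideal standby (no standby failures)."""
    return k / lam

def mttf_active_parallel(k, lam):
    """k identical units all operating from t = 0: (1/lam) * (1 + 1/2 + ... + 1/k)."""
    return sum(1 / i for i in range(1, k + 1)) / lam

lam = 0.02  # hypothetical failure rate, i.e. a component MTTF of 50
for k in (1, 2, 3, 4):
    print(k, round(mttf_standby(k, lam), 2), round(mttf_active_parallel(k, lam), 2))
```

For k equal to 2 this prints 100 for standby against 75 for active parallel, which is the 2 MTTF against 1.5 MTTF comparison.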

(Refer Slide Time: 18:30)

Let us take one example. The ReyLieAble printing company has four presses. One operates and three are in standby. Each press has an identical constant failure rate, with an MTTF of 50 operating hours. So, the individual MTTF is 50 and lambda is equal to 1 by 50, that is 0.02 failures per hour. Now, the company has received a rush order requiring 75 hours of continuous operation. That means the presses will work one by one: when one press fails, they will plug in another press, and they will continue till the last press is working.

So, what is the probability that the company will be able to provide continuous printing while the order is being processed? That means in 75 hours all the units should not fail. So, what is the probability that at least one of them is still working? As we see here, we start with four presses, one working and three in standby.

So, first we have one working, three in standby. Then we have one failed, one working, two in standby. Then we have two failed, one working, one in standby. Then we have three failed, one working, zero in standby. From here we can have four failed, zero working, zero in standby. So, as we see here, we have 1, 2, 3, 4, 5 states.

Here all the failure rates are lambda, and as we solved earlier, the reliability will be the probability that the system is in any one of the first four states. The fifth state is the failure state, because there I will have the system failure before the order is completed. But if the system is in any one of the other states, then my order will be completed.

So, the reliability will be $e^{-\lambda t}$, where lambda is 0.02, into 1 plus $\lambda t$, where $\lambda t$ is 0.02 into t and my time of interest t is 75 hours, plus 0.02 into 75 whole square divided by 2, plus, for the fourth state, 0.02 into 75 whole cube divided by factorial 3, that is 6. I have four working states, 1, 2, 3, 4, and this will be my reliability. And what will be the unreliability? The unreliability is the probability of the last state, and it can be calculated as 1 minus this reliability.

$$R_4(75) = e^{-0.02 \times 75} \sum_{i=0}^{3} \frac{(0.02 \times 75)^i}{i!} = 0.9344$$

So, when we evaluate these terms and sum them up, the probability comes out to be 0.9344. With this example, you can see the reliability value which we can get if we have standby units and we do not consider failure during the standby mode.
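The printing press numbers can be verified with a short Python sketch. The variable names are chosen here only for illustration; the inputs (MTTF of 50 hours, 75 hours of operation, four presses) are the ones from the example.

```python
import math

lam = 1 / 50      # failures per operating hour (MTTF of 50 hours per press)
t = 75.0          # hours of continuous operation required
k = 4             # one press online, three in standby

x = lam * t       # lambda * t = 1.5
terms = [math.exp(-x) * x**i / math.factorial(i) for i in range(k)]
reliability = sum(terms)

print("state probabilities:", [round(p, 4) for p in terms])
print("reliability:", round(reliability, 4))
print("unreliability:", round(1 - reliability, 4))
```

Each entry of `terms` is the probability of one working state, so the list also shows how much each standby press contributes.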

(Refer Slide Time: 22:27)

Now, let us consider another case, a standby system with switching failure. As we discussed earlier, we have a primary unit and a secondary unit which is not operating. Whenever the primary unit fails, the secondary unit is put into operation, either by automatic or by manual switching. But this switch can also fail. If the switch fails, what will happen? I will not be able to plug in the standby unit and my system will be in failed condition.

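$$\frac{dP_1(t)}{dt} = -\left[(1-p)\lambda_1 + p\lambda_1 + \lambda_2^-\right] P_1(t) = -(\lambda_1 + \lambda_2^-) P_1(t)$$

$$\frac{dP_2(t)}{dt} = (1-p)\lambda_1 P_1(t) - \lambda_2 P_2(t)$$

$$\frac{dP_3(t)}{dt} = \lambda_2^- P_1(t) - \lambda_1 P_3(t)$$

$$P_1(t) = e^{-(\lambda_1 + \lambda_2^-)t}$$

$$P_2(t) = \frac{(1-p)\lambda_1}{\lambda_1 + \lambda_2^- - \lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2^-)t}\right]$$

$$P_3(t) = e^{-\lambda_1 t} - e^{-(\lambda_1 + \lambda_2^-)t}$$

$$R(t) = e^{-\lambda_1 t} + \frac{(1-p)\lambda_1}{\lambda_1 + \lambda_2^- - \lambda_2}\left[e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2^-)t}\right]$$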

Whether it is an automatic switch or a manual switch, if it fails my reliability objective will not be met. There are various cases considered for this, like hot standby, warm standby, et cetera. We are not going into that; we are taking a simpler concept here, where only the first unit is working and the second unit is in standby, but the second unit can fail in standby and there can also be a switch failure. Now, if you look at state number 1, unit 1, the primary unit, is operating and unit 2 is in standby mode.

Now, from here there are three possibilities. The first is that the primary equipment fails, with the failure rate lambda 1. But this lambda 1 again has two possibilities: when the primary unit fails, the switch is either working or failed. Let p be the probability of switch failure. This probability is not considered to be related with time, because the switch is not continuously operating.

When the primary unit fails, at that time the switch can be found in one of two conditions: either it is working or it is failed. So, the probability of working and the probability of failure get multiplied with lambda 1, and this gives me transitions to two different states. One state is that unit 1 is failed and the switch has operated. Since the switch has operated, what will happen? Unit 2 will become operational.

The other state is a failed state. What is happening here? Unit 1 is failed and the switch is also failed. Because the switch is failed, unit 2 remains in standby; we cannot put it into operation. Since we are not able to put it into operation, this state directly leads to system failure. In the first state, on the other hand, unit 2 is operating, so the failure rate of unit 2 in operation, lambda 2, comes into the picture.

What is the remaining case? It is that my standby unit can also fail in the standby mode, with the failure rate lambda 2 minus. In that case, what will happen? Unit 1 is operating and unit 2 is failed, which is the standby failure.

Now, here because unit 2 is already failed, whether the switch works or does not work, unit 2 cannot be put into operation, so the switch becomes irrelevant. Since unit 1 is operating, it will keep on operating until it fails. So, only the failure rate of unit 1 comes into the picture, and that takes me to state number 4. So, state number 4 covers different possibilities. In one, unit 1 has failed and unit 2 is in standby, but the switch has failed. In another, unit 2 has already failed in standby and then unit 1 fails, and because of that my system is failed.

So, my failure state is reached in three ways, and unit 1 needs to be failed in all of them. In the first case unit 1 fails and the switch fails, so unit 2 remains in standby. In the second case unit 2 fails in standby mode and then unit 1 fails. In the third case unit 1 fails, the switch works, but unit 2 then fails in operation with the rate lambda 2. So, three possible paths are coming from three different directions, but all are leading to the system failure, and my fourth state is my system failure state.

(Refer Slide Time: 27:54)

This problem can be solved in the same way as we have solved the other problems. If you look at state 1, the outgoing rates are these three. When I sum them up, this becomes $(1-p)\lambda_1 + p\lambda_1 + \lambda_2^-$. Expanding, this is $\lambda_1 - p\lambda_1 + p\lambda_1 + \lambda_2^-$, and the $p\lambda_1$ terms get cancelled. So, what remains is $dP_1(t)/dt = -(\lambda_1 + \lambda_2^-) P_1(t)$.

If we solve this, my $P_1(t)$ comes out as $e^{-(\lambda_1 + \lambda_2^-)t}$, which is the same as what we had in the earlier case of standby with failure, where the system can fail in the standby mode. For $P_2(t)$, which is the probability of the state where the switch has worked, a similar process has to be followed. So, this I will leave as an exercise to you.

So, try to evaluate this. Put the value of $P_1(t)$ which we have got, $e^{-(\lambda_1 + \lambda_2^-)t}$, take the term $\lambda_2 P_2(t)$ to the left hand side, and then apply the same method: multiply both sides by $e^{\lambda_2 t}$ and integrate both sides. You will get the answer for $P_2(t)$, which will be equal to this.

Similarly, you can get $P_3(t)$, and once you have $P_1(t)$, $P_2(t)$ and $P_3(t)$, from there you can get R(t). State 4 is the only failure state; states 1, 2 and 3 are all working states. So, when we sum up these three probabilities I get R(t). In the sum, $P_1(t)$ gets cancelled by the second term of $P_3(t)$, so R(t) comes out to be $e^{-\lambda_1 t}$ plus the value of $P_2(t)$.
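Since the solution for $P_2(t)$ and $P_3(t)$ is left as an exercise, one way to check an answer is to integrate the three state equations numerically and compare with the closed forms. The sketch below uses a hand-written fourth-order Runge-Kutta step (a technique brought in here, not used in the lecture); the function names, the step count and the ordering of the rate tuple are all assumptions made for illustration.

```python
import math

def closed_form(t, lam1, lam2, lam2_sb, p):
    """Closed-form P1, P2, P3 for the standby system with switching failure."""
    a = lam1 + lam2_sb
    p1 = math.exp(-a * t)
    p2 = (1 - p) * lam1 / (a - lam2) * (math.exp(-lam2 * t) - math.exp(-a * t))
    p3 = math.exp(-lam1 * t) - math.exp(-a * t)
    return p1, p2, p3

def derivs(state, lam1, lam2, lam2_sb, p):
    """Right-hand sides of the three Markov state equations."""
    p1, p2, p3 = state
    return (
        -(lam1 + lam2_sb) * p1,
        (1 - p) * lam1 * p1 - lam2 * p2,
        lam2_sb * p1 - lam1 * p3,
    )

def rk4(t_end, steps, *rates):
    """Integrate from t = 0 with P1(0) = 1, P2(0) = P3(0) = 0."""
    h = t_end / steps
    y = (1.0, 0.0, 0.0)
    for _ in range(steps):
        k1 = derivs(y, *rates)
        k2 = derivs(tuple(v + 0.5 * h * k for v, k in zip(y, k1)), *rates)
        k3 = derivs(tuple(v + 0.5 * h * k for v, k in zip(y, k2)), *rates)
        k4 = derivs(tuple(v + h * k for v, k in zip(y, k3)), *rates)
        y = tuple(v + h / 6 * (a + 2 * b + 2 * c + d)
                  for v, a, b, c, d in zip(y, k1, k2, k3, k4))
    return y

rates = (0.01, 0.10, 0.001, 0.1)   # lam1, lam2, lam2_sb, p
numeric = rk4(30.0, 3000, *rates)
exact = closed_form(30.0, *rates)
print(numeric)
print(exact)
print("R(30) =", round(sum(exact), 4))
```

If the two printed triples agree to many decimal places, the closed forms satisfy the differential equations and the initial conditions.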

(Refer Slide Time: 30:09)

Like we solved the earlier example, we can solve a similar example here by simple application of the formula which we have got in the previous slide. The problem is the same: we have a generator with a failure rate of 0.01, so lambda 1 is equal to 0.01.

There is an older standby generator, so lambda 2 minus is equal to 0.001, and its failure rate when online, lambda 2, is 0.10. This problem is very similar to what we saw earlier, but earlier when we solved it we did not take the switching failure probability. Here the switching failure probability p is 10 percent, that is 0.1.

We solve it in the same way, now using the new formula, that is $e^{-\lambda_1 t}$ plus $(1-p)\lambda_1$ divided by $\lambda_1 + \lambda_2^- - \lambda_2$, into the same bracket. If you see, the only thing which has changed is that 1 minus p is coming here. Earlier there was no 1 minus p; the coefficient was $\lambda_1$ upon $\lambda_1 + \lambda_2^- - \lambda_2$, and the rest of the equation is the same. When we apply this for 30 days, the second term becomes smaller because of 1 minus p: whatever value was there gets multiplied by 0.9. So, the reliability falls, and it comes out to be 0.8085.

If we compare with the previous standby example, the value which was 0.8160 has now become 0.8085. In the same way we can calculate the MTTF also; that will also reduce because of this multiplication by 0.9. So, we are able to solve the standby system in two cases. In the first we considered the switch to be perfect, that is, we did not take any switch failure probability, and we considered that the unit can fail in standby mode or during the operational mode. In the second we considered that the switch can also fail, so the switch failure probability is also included, and using this we are able to get the reliability.

We have also considered how to get the system reliability when the units are identical, by assuming that there is no possibility of switch failure and no failure in standby mode. All these are generalized cases which you can use in practical reliability evaluation of various systems where standby is involved.

So, we will stop it here and we will continue our discussion for further system configurations.
Thank you.

(Refer Slide Time: 33:02)

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 23
Markov Analysis (Contd)

(Refer Slide Time: 00:25)

Hello everyone. We have been discussing Markov analysis, and in the previous three lectures we discussed how we can use Markov analysis to determine the system reliability for a two-component system, a shared load system and a standby system. Today we will consider a few more system configurations. We will also try to see how repair, if it is possible, impacts the reliability. That analysis becomes a little tedious, but we will try to do it with one or two examples, and hopefully you will learn it.

(Refer Slide Time: 01:08)

First let us discuss degraded systems. To understand the degraded system, we need to understand what is meant by this. Most systems, when we use them initially, will be in a good state. Good state means the capacity is full, they can take all the load, and the reliability is also good. But as the system becomes older, or when it experiences sudden stresses, it becomes weaker.

So, the system will be in a degraded state. Degraded state means that the system has a little loss in health; the health is not as good as new, and because of that the chances of failure become high. So, here the system can be in three states. This is applicable to a component as well as to a system.

So, at the component level or the system level, we have three states. State 1 is the good state, where the system is working fine. From here there are two possibilities: it can degrade, or it can fail. If it goes from state 1 to state 3, that means it does not go to the degraded state but fails directly, and that failure rate is lambda 1. And there is a rate lambda 2 by which the system can degrade.

Generally, in a system or component, failure can happen due to random or sudden reasons. Degradation is mostly understood as a more systematic way of failure. A tire, for example, will slowly wear out; metals may get rusted and slowly their strength will decrease. Similarly, for other parts the properties of the material may change with time, and because of that, over time, the system strength will not be the same as it was when it was new.

So, there will be a degraded state, and from the degraded state we have another failure rate by which the system reaches the failure state. Generally this failure rate lambda 3 will be much larger than the failure rate lambda 1, because in the degraded state the probability of failure becomes high.

(Refer Slide Time: 04:03)

Now, we will solve this system like the ones we have solved already. As we know, for $dP_1(t)/dt$ we have only outgoing transitions, so with a minus sign it is $-(\lambda_1 + \lambda_2) P_1(t)$. For state number 2 we have an incoming term, $\lambda_2 P_1(t)$, and an outgoing term, $-\lambda_3 P_2(t)$.

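$$\frac{dP_1(t)}{dt} = -(\lambda_1 + \lambda_2) P_1(t)$$

$$\frac{dP_2(t)}{dt} = \lambda_2 P_1(t) - \lambda_3 P_2(t)$$

$$P_1(t) = e^{-(\lambda_1 + \lambda_2)t}$$

$$P_2(t) = \frac{\lambda_2}{\lambda_1 + \lambda_2 - \lambda_3}\left[e^{-\lambda_3 t} - e^{-(\lambda_1 + \lambda_2)t}\right]$$

$$R(t) = P_1(t) + P_2(t)$$

$$R(t) = e^{-(\lambda_1 + \lambda_2)t} + \frac{\lambda_2}{\lambda_1 + \lambda_2 - \lambda_3}\left[e^{-\lambda_3 t} - e^{-(\lambda_1 + \lambda_2)t}\right]$$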

Now, as we have solved many times earlier, the solution of the first equation is $P_1(t) = e^{-(\lambda_1 + \lambda_2)t}$. If we put this into the second equation and solve, $P_2(t)$ comes out to be this value. This also we have solved many times, or you can try again and solve it.

Or, if you want, we can solve it here very easily. $dP_2(t)/dt$ is equal to $\lambda_2 e^{-(\lambda_1 + \lambda_2)t} - \lambda_3 P_2(t)$. Taking the last term to the left side, this becomes $dP_2(t)/dt + \lambda_3 P_2(t) = \lambda_2 e^{-(\lambda_1 + \lambda_2)t}$.

As we have discussed earlier, if we multiply both sides by $e^{\lambda_3 t}$, then $e^{\lambda_3 t} P_2(t)$ will be equal to the integration of the right hand term multiplied by $e^{\lambda_3 t}$. Once you multiply by $e^{\lambda_3 t}$, the exponent picks up a minus $\lambda_3$ inside, so the integrand becomes $\lambda_2 e^{-(\lambda_1 + \lambda_2 - \lambda_3)t}\,dt$.

If we solve this, it will be equal to $\lambda_2$ divided by $(\lambda_1 + \lambda_2 - \lambda_3)$, with a minus sign, into $e^{-(\lambda_1 + \lambda_2 - \lambda_3)t}$, plus C. We know that at t equal to 0 the left side is 0, and the right side is $-\lambda_2/(\lambda_1 + \lambda_2 - \lambda_3)$ plus C.

So, C will be equal to $\lambda_2/(\lambda_1 + \lambda_2 - \lambda_3)$, and the equation becomes $e^{\lambda_3 t} P_2(t) = \frac{\lambda_2}{\lambda_1 + \lambda_2 - \lambda_3}\left[1 - e^{-(\lambda_1 + \lambda_2 - \lambda_3)t}\right]$. Once you divide by $e^{\lambda_3 t}$, the right hand side becomes $\frac{\lambda_2}{\lambda_1 + \lambda_2 - \lambda_3}\left[e^{-\lambda_3 t} - e^{-(\lambda_1 + \lambda_2)t}\right]$, which is the same as what is given here.

And what is the reliability here? It depends on how we define reliability. In most cases, whether the system is in the degraded state or the good state, the system is working; in the degraded state it may have a little less capacity, but it is still functioning. So, if you consider the degraded state as a working state, then the reliability will be the probability that the system is in state 1 or in state 2.

So, if we sum up the two probabilities we get the reliability. If we consider only state 1 as the reliable state, and states 2 and 3 are considered to be failure states, then the reliability will be only $P_1(t)$. But generally we will consider $P_1(t) + P_2(t)$ as the reliability.
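The two definitions of reliability can be put side by side in a small Python function. This is only a sketch: the function name, the `degraded_counts_as_working` flag and the rates used in the calls are hypothetical and chosen just to exercise the formulas.

```python
import math

def degraded_system(t, lam1, lam2, lam3, degraded_counts_as_working=True):
    """Return (P1, P2, R) for the good -> degraded -> failed model.

    lam1: good -> failed, lam2: good -> degraded, lam3: degraded -> failed.
    The closed form for P2 needs lam1 + lam2 != lam3.
    """
    a = lam1 + lam2
    p1 = math.exp(-a * t)
    p2 = lam2 / (a - lam3) * (math.exp(-lam3 * t) - math.exp(-a * t))
    r = p1 + p2 if degraded_counts_as_working else p1
    return p1, p2, r

# Hypothetical rates, used only to show the two definitions of reliability.
print(degraded_system(10.0, 0.02, 0.03, 0.08))
print(degraded_system(10.0, 0.02, 0.03, 0.08, degraded_counts_as_working=False))
```

The difference between the two printed reliabilities is exactly $P_2(t)$, the probability of finding the system degraded but still running.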

(Refer Slide Time: 08:32)

Now, let us see an example. Let us say we have a machine which is used in a manufacturing process and experiences complete failures at a constant rate of 0.01 per day. So, the transition from state 1 to complete failure, state number 3, has the rate 0.01, which in the previous figure was given as lambda 1. However, the machine will also degrade randomly, producing substandard parts, at a constant rate of 0.05 per day.

$$P_1(t) = e^{-(0.01 + 0.05)t}$$

Expected time in state 1:

$$\frac{1}{0.01 + 0.05} = 16.67 \text{ days}$$

$$P_2(t) = \frac{0.05}{0.01 + 0.05 - 0.07}\left[e^{-0.07t} - e^{-(0.01 + 0.05)t}\right]$$

Expected time in state 2:

$$\frac{0.05}{0.01 + 0.05 - 0.07}\left[\frac{1}{0.07} - \frac{1}{0.01 + 0.05}\right] = 11.90 \text{ days}$$

Expected time in both states:

$$MTTF = 16.67 + 11.90 = 28.57 \text{ days}$$

So, 0.05 is the rate by which it may get degraded. Once it is degraded, it will fail completely at a constant rate of 0.07 per day. So, 0.05 is my lambda 2 and 0.07 is my lambda 3. If I want, I can use the formulas which we developed earlier in symbolic form, in terms of lambda.

So, by putting the values in the formula we can get the answer, but we can also solve this directly. We know that $dP_1(t)/dt$ will be equal to minus 0.01 into $P_1(t)$ minus 0.05 into $P_1(t)$, that is $-0.06\,P_1(t)$. So, $dP_1(t)/P_1(t)$ is equal to $-0.06\,dt$, and if we solve this, $P_1(t)$ will be equal to $e^{-0.06t + c}$. At t equal to 0, $P_1(0)$ is equal to 1.

Since e to the power 0 is also 1, c will be equal to 0. So, $P_1(t) = e^{-0.06t}$. Similarly, I can write the equation for state 2.

Similarly, if we want, we can solve for $P_2(t)$ directly from the figure. Sometimes you may not be able to remember these equations. In that case, preparing the Markov diagram and solving it with the symbolic lambda representation may be a little tougher than solving with the numeric values, which is sometimes simpler. So, we write $dP_2(t)/dt$ as plus 0.05 into $P_1(t)$ and minus 0.07 into $P_2(t)$.

Rearranging, $dP_2(t)/dt + 0.07\,P_2(t) = 0.05\,P_1(t)$. $P_1(t)$ we have already calculated; if we had calculated it for a certain time t we could put that value directly, but here we use the function, that is $e^{-0.06t}$.

Now, as we discussed, this becomes $e^{0.07t} P_2(t)$ equal to the integration of $0.05\,e^{(-0.06 + 0.07)t}\,dt$, since $e^{0.07t}$ gets multiplied on the right side as well. The exponent is $-0.06 + 0.07$, so the integrand is $0.05\,e^{0.01t}$.

Integrating, this will be 0.05 divided by 0.01, which is nothing but 5, into $e^{0.01t}$, plus c. At t equal to 0 the left hand side will be 0 and on the right hand side e to the power 0 is 1, so we have 5 plus c. This implies c will be equal to minus 5.

So, $e^{0.07t} P_2(t)$ will be equal to $5e^{0.01t} - 5$. Now, if I divide by the exponential term $e^{0.07t}$, the exponent 0.01 minus 0.07 becomes minus 0.06, so $P_2(t)$ will be equal to $5e^{-0.06t} - 5e^{-0.07t}$.

I can take 5 common, or I can simply use this as $P_2(t)$, and the same value is given here. In the formula, 0.05 divided by 0.06 minus 0.07 gives minus 5, so it becomes $-5\left[e^{-0.07t} - e^{-0.06t}\right]$, which is the same as what we have just got.

So, as we see, we can use the formula directly, and if we forget the formula, we can follow the same process of solving numerically and find out $P_1(t)$ and $P_2(t)$. I am removing this because all the formulas are already mentioned here. So, I will erase this.
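To confirm that the hand-derived numeric result and the symbolic formula agree, both can be evaluated side by side. The sketch below assumes nothing beyond the three rates of the example; the function names and the sample times are chosen only for illustration.

```python
import math

LAM1, LAM2, LAM3 = 0.01, 0.05, 0.07   # per day, from the example

def p1(t):
    return math.exp(-(LAM1 + LAM2) * t)

def p2_formula(t):
    """Symbolic result with the example rates substituted."""
    a = LAM1 + LAM2
    return LAM2 / (a - LAM3) * (math.exp(-LAM3 * t) - math.exp(-a * t))

def p2_by_hand(t):
    """Result of solving the numeric equation directly."""
    return 5 * math.exp(-0.06 * t) - 5 * math.exp(-0.07 * t)

for t in (0, 10, 30):   # sample times, chosen only for illustration
    print(t, round(p1(t), 4), round(p2_formula(t), 4), round(p2_by_hand(t), 4))
```

The last two columns should match at every time, and both should be 0 at t equal to 0.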

(Refer Slide Time: 16:07)

Now, we want to know the time spent in each state. We have state 1, state 2 and state 3. How much average time will the system spend in state 1 and state 2, the good state and the degraded state? To know that we use the same approach which we discussed earlier. If I am interested in only state 1, the state 1 probability is $P_1(t)$. If I integrate this probability with respect to time from 0 to infinity, I will get the average, or expected, time spent in state 1.

So, I will integrate this from 0 to infinity. We know that the integration of $e^{-\lambda t}$ is 1 upon lambda, and lambda here is 0.06. So, we get 1 upon 0.06, which is around 16.67 days.

Similarly, if I want to know how much average time the system will spend in state 2, I take the probability $P_2(t)$ and integrate it from 0 to infinity. As we have seen earlier, the coefficient is minus 5; integrating the first exponential gives 1 upon 0.07, and integrating the second gives minus 1 upon 0.06.

When we solve this, we get the expected time spent in state 2, which is around 12 days (11.9).
You can clearly see that failure is faster in the degraded state, because $\lambda_3$ is
significantly higher than $\lambda_1$: my $\lambda_1$ is 0.01 and my $\lambda_3$ is 0.07.

Because the failure rate is high, the system spends less time in this state: the chance of
departure from this state is higher, so the probability of being in this state is lower. It also
means the system will depart to state 3 more quickly than it departs from state 1, because the
departure rate $\lambda_3$ is higher than the departure rate $\lambda_1$.

Because of that, the average time spent in state 2 is less than the time spent in state 1. And
how much is the MTTF? As we discussed earlier, for the MTTF we need to consider the reliability.
The system is working in both state 1 and state 2, so $R(t) = P_1(t) + P_2(t)$.

So, if I integrate $R(t)$ from 0 to infinity, that is the sum of the integrals of $P_1(t)$ and
$P_2(t)$ from 0 to infinity, and I already have both of these values.

These values are 16.67 and 11.9, so my MTTF comes out to be 28.57 days. This is the expected
time spent in both states together, which is the MTTF.
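
The figures above can be checked numerically. The following is a minimal sketch, assuming the transition rates implied by the lecture's expressions: a total departure rate of 0.06 from state 1, of which 0.05 goes to state 2 (this is what produces the coefficient 5) and the remaining 0.01 goes directly to state 3, and a rate of 0.07 from state 2 to state 3. The names `RATE_1_TO_2`, `RATE_1_TO_3`, `RATE_2_TO_3` and the helper `integrate` are illustrative and not from the lecture.

```python
import math

# Rates per day, inferred from the expressions in the lecture (assumption of this sketch).
RATE_1_TO_2 = 0.05   # good -> degraded
RATE_1_TO_3 = 0.01   # good -> failed
RATE_2_TO_3 = 0.07   # degraded -> failed

LEAVE_1 = RATE_1_TO_2 + RATE_1_TO_3   # total departure rate from state 1 (0.06)

def p1(t):
    """Probability of being in state 1 at time t."""
    return math.exp(-LEAVE_1 * t)

def p2(t):
    """Probability of being in state 2 at time t."""
    k = RATE_1_TO_2 / (RATE_2_TO_3 - LEAVE_1)   # approximately 5
    return k * (math.exp(-LEAVE_1 * t) - math.exp(-RATE_2_TO_3 * t))

def integrate(f, upper=2000.0, steps=200_000):
    """Trapezoidal rule on [0, upper]; the tail beyond 'upper' is negligible here."""
    h = upper / steps
    total = 0.5 * (f(0.0) + f(upper))
    for i in range(1, steps):
        total += f(i * h)
    return total * h

time_in_state_1 = integrate(p1)
time_in_state_2 = integrate(p2)
mttf = time_in_state_1 + time_in_state_2

print(round(time_in_state_1, 2), round(time_in_state_2, 2), round(mttf, 2))
```

The three printed numbers should agree with the values quoted in the lecture (about 16.67, 11.9 and 28.57 days).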

(Refer Slide Time: 20:00)

So, as you have seen, we can use this to calculate the reliability and MTTF of a degraded system.
A three-state system can be explored in the same way, but since we have already discussed the
three-state system in terms of RBDs, we are not discussing it again. The same can be evaluated
here also.

Now, let us consider another case. In our earlier evaluations, we assumed that the system can
only fail and cannot be repaired, because no repair path was available. If the system can be
repaired, an RBD cannot be used, because repair depends on the state of the system: only once a
component has failed can it go for repair. That is why we have to use a Markov diagram, a Markov
process, to solve the problem.

So, we can represent this as a state diagram. This is the state where both components are
working: component 1 is operating and component 2 is also operating. There is one error on the
slide: this rate should be $2\lambda$. I will correct it later.

The failure rate is $2\lambda$ because two components are working, and either one of them can
fail: component 1 can fail or component 2 can fail.

So, the failure rate doubles to $2\lambda$: the probability of staying in the same state is
$e^{-\lambda t} \cdot e^{-\lambda t} = e^{-2\lambda t}$, so the departure rate from this state
is $2\lambda$.

I could have drawn this diagram with four states: from state 1, a transition at rate $\lambda$
to state 2 for the failure of component 1 and a transition at rate $\lambda$ to state 3 for the
failure of component 2, and then from each of these a transition at rate $\lambda$ to state 4,
where both components have failed.

Rather than doing this, because both components are identical with the same rate $\lambda$, I
can merge them. The departure rate from state 1 is $2\lambda$, that is, $\lambda$ for component 1
plus $\lambda$ for component 2, and states 2 and 3 are equivalent: in both of them one component
is failed and one component is working.

So, I can represent the diagram with $2\lambda$ going from the first state to the second and
$\lambda$ going from the second to the third. In the first state two components are working and
none has failed; in the second, one is working and one has failed. I am removing the other
drawing for clarity.

(Refer Slide Time: 23:22)

So, remember this is $2\lambda$. In state 1, two components are working and none has failed. In
state 2, one component is working and one has failed. In state 3, both components have failed
and none is working.

Here I am also considering repair. I assume that repair also follows the exponential
distribution, with repair rate $r$; just as $\lambda$ is the failure rate, $r$ is the repair
rate. So the pdf of the repair time is $r e^{-rt}$, and similarly the pdf of the failure time is
$\lambda e^{-\lambda t}$.

For Markov analysis we use exponentially distributed times, so $r$ becomes the rate of
transition from state 2 back to state 1. This gives me the repair path.

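(Refer Slide Time: 24:49)

$$
\begin{aligned}
\frac{dP_1(t)}{dt} &= -2\lambda P_1(t) + r P_2(t) \\
\frac{dP_2(t)}{dt} &= 2\lambda P_1(t) - (r + \lambda) P_2(t) \\
\frac{dP_3(t)}{dt} &= \lambda P_2(t) \\
P_1(t) + P_2(t) + P_3(t) &= 1
\end{aligned}
$$

$$
\begin{aligned}
P_1(t) &= \frac{\lambda + r + x_1}{x_1 - x_2}\, e^{x_1 t} - \frac{\lambda + r + x_2}{x_1 - x_2}\, e^{x_2 t} \\
P_2(t) &= \frac{2\lambda}{x_1 - x_2}\, e^{x_1 t} - \frac{2\lambda}{x_1 - x_2}\, e^{x_2 t} \\
P_3(t) &= 1 + \frac{x_2}{x_1 - x_2}\, e^{x_1 t} - \frac{x_1}{x_1 - x_2}\, e^{x_2 t} \\
x_1, x_2 &= \frac{1}{2}\left[ -(3\lambda + r) \pm \sqrt{\lambda^2 + 6\lambda r + r^2} \right]
\end{aligned}
$$

$$
\begin{aligned}
R(t) &= 1 - P_3(t) = \frac{x_1}{x_1 - x_2}\, e^{x_2 t} - \frac{x_2}{x_1 - x_2}\, e^{x_1 t} \\
MTTF &= \int_0^\infty \left[ \frac{x_1}{x_1 - x_2}\, e^{x_2 t} - \frac{x_2}{x_1 - x_2}\, e^{x_1 t} \right] dt \\
     &= \frac{-1}{x_1 - x_2} \left[ \frac{x_1}{x_2} - \frac{x_2}{x_1} \right]
      = \frac{-(x_1 + x_2)}{x_1 x_2}
      = \frac{3\lambda + r}{2\lambda^2}
\end{aligned}
$$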

Now, what is the scenario here? Initially both components are working. One of them fails, and we
reach state 2. In this state the failed component goes into repair, and the component that is
working continues to work.

So, from state 2 there are two possible transitions. One component is working and one is in
failed condition. If the failed component is repaired and nothing happens to the working one, we
return to state 1, where both components are working. But before the repair completes, it is
possible that the working component also fails. In that case both components have failed and
none remains in working condition, so we reach state 3.

So, because of repair, my reliability should improve compared with what I had earlier. Earlier,
the failed component was thrown away and was no longer considered part of the system. Here the
failed component can be repaired, and because of that the system can be restored to the
condition where both components are working.

Now, to solve this, we write the state equations. For state 1, the outgoing rate is $2\lambda$,
which gives $-2\lambda P_1(t)$, and the incoming transition is from state 2 at rate $r$, which
gives $+rP_2(t)$. So $dP_1(t)/dt = -2\lambda P_1(t) + rP_2(t)$; this is our first equation.
Similarly, for state 2 the incoming term is $2\lambda P_1(t)$, and the outgoing transitions are
$\lambda$ to state 3 and $r$ to state 1, so $dP_2(t)/dt = 2\lambda P_1(t) - (\lambda + r)P_2(t)$.

Similarly, for $P_3(t)$ there is only an incoming transition, so $dP_3(t)/dt = \lambda P_2(t)$.
When we are solving these equations, we have to consider one more condition, that is,
$P_1(t) + P_2(t) + P_3(t) = 1$, which is always true. At time $t = 0$, $P_1(0) = 1$ and
$P_2(0) = P_3(0) = 0$, which means the system is initially in state 1. At any time, the system
has to be in one of the states 1, 2 or 3.

So, this equation has to be satisfied all the time. When we are solving these equations, we
generally drop one of the differential equations and use this condition as the additional
equation. The difficulty now is that none of the equations involves a single unknown. In the
earlier cases, the equation for $P_1(t)$ was in terms of $P_1$ only; there was no $P_2$ and no
$P_3$.

So that equation could be solved easily, and once we had $P_1$, we substituted it into the
equation for $P_2$, which then became an equation in $P_2$ only. We could solve that easily too.

Now every equation has more than one unknown: the first equation has $P_1$ and $P_2$, the second
has $P_2$ and $P_1$, and the third has $P_3$ and $P_2$. So none of the equations can be solved
directly. We have to treat them as a system of equations, a set of differential equations which
we need to solve together.
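
Before solving the system analytically, it can help to see that the three coupled equations can be integrated together numerically. The sketch below uses a classical fourth-order Runge-Kutta step, which is a technique swapped in here for illustration and not the method used in the lecture. The values $\lambda = 0.5$ and $r = 2$ per day and the function names are assumptions of this sketch.

```python
def derivatives(p, lam, r):
    """Right-hand sides of the three state equations for (P1, P2, P3)."""
    p1, p2, p3 = p
    return (
        -2 * lam * p1 + r * p2,          # dP1/dt
        2 * lam * p1 - (r + lam) * p2,   # dP2/dt
        lam * p2,                        # dP3/dt
    )

def rk4_step(p, h, lam, r):
    """One classical Runge-Kutta step of size h."""
    k1 = derivatives(p, lam, r)
    k2 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(p, k1)), lam, r)
    k3 = derivatives(tuple(x + 0.5 * h * k for x, k in zip(p, k2)), lam, r)
    k4 = derivatives(tuple(x + h * k for x, k in zip(p, k3)), lam, r)
    return tuple(
        x + h / 6.0 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(p, k1, k2, k3, k4)
    )

def solve(lam, r, t_end, steps=1000):
    """Integrate from t = 0, starting in state 1, up to t_end."""
    p = (1.0, 0.0, 0.0)   # P1(0) = 1, P2(0) = P3(0) = 0
    h = t_end / steps
    for _ in range(steps):
        p = rk4_step(p, h, lam, r)
    return p

p1, p2, p3 = solve(lam=0.5, r=2.0, t_end=1.0)
print(p1, p2, p3)
print("sum of probabilities:", p1 + p2 + p3)
print("reliability R(1) = P1 + P2:", p1 + p2)
```

The probabilities should sum to 1 at every step, because the three right-hand sides add up to zero.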

To do that, we first modify these equations a little. The $P_3$ equation is the shortest; it is
in $P_3$ and $P_2$. The $P_2$ equation is in $P_1$ and $P_2$. If I replace $P_1(t)$ by
$1 - P_2(t) - P_3(t)$, the $P_2$ equation will contain only $P_2$ and $P_3$. So both equations
will be in $P_2$ and $P_3$.

That means the number of unknowns I need to solve for becomes two, which is an easier case than
solving all three equations together. We will solve these two equations as a system, and because
both are differential equations, we will use the Laplace transform. We will discuss that in the
next class and continue our discussion of how to solve this system using the Laplace transform.
So, we will stop here. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 24
Markov Analysis (Contd)

(Refer Slide Time: 00:25)

Hello everyone, welcome back to our discussion on Markov analysis. In the previous class, we
discussed degraded systems, and we also started discussing how to incorporate repair in our
Markov diagram and solve it. So, today we will solve that problem.

(Refer Slide Time: 00:48)

So, in the previous class we discussed a two-component system with repair. Let me correct this
rate and make it $2\lambda$ so that there is no confusion. In state 1 both components are
working; in state 2 one of them has failed; and in state 3 both components have failed. When one
component has failed and one is working, repair is possible, and if the repair is completed in
time, the system is restored to state 1, where both components are working again.

To solve this, as we discussed earlier, we will use the equations for $P_2$ and $P_3$ together
with $P_1 + P_2 + P_3 = 1$, form a set of equations, and solve it using the Laplace transform.

(Refer Slide Time: 01:58)

I have written these equations here for reference, so that we can follow them while solving. We
want the equations to involve only two probabilities, so first I replace $P_1$ by
$1 - P_2 - P_3$, so that both equations are in $P_2$ and $P_3$ only.

So, $dP_2(t)/dt = 2\lambda\,[1 - P_2(t) - P_3(t)] - (r + \lambda)P_2(t)$. If we expand this, the
constant term is $2\lambda$; $P_2(t)$ has the coefficient $-(r + \lambda) - 2\lambda$; and
$P_3(t)$ has the coefficient $-2\lambda$. So
$dP_2(t)/dt = 2\lambda - (3\lambda + r)P_2(t) - 2\lambda P_3(t)$.

Now, let us apply the Laplace transformation. Let us call the Laplace transform of any
probability function $P_i(t)$ as $z_i(s)$; since $s$ is common to all, I will write $z_i$ for
short. We take the Laplace transform of the complete equation, with the initial conditions
$P_1(0) = 1$, $P_2(0) = 0$ and $P_3(0) = 0$.

We know that the Laplace transform of a derivative is $s$ times the transform of the function
minus its initial value, so the left side is $s z_2 - P_2(0)$. On the right side, $2\lambda$ is
a constant, so its transform is $2\lambda/s$; the transform of $P_2$ is $z_2$ and the transform
of $P_3$ is $z_3$. So $s z_2 - P_2(0) = 2\lambda/s - (3\lambda + r)z_2 - 2\lambda z_3$, and
$P_2(0)$ is 0.

Taking the $z_2$ and $z_3$ terms to the left-hand side, I can write this equation as
$(s + 3\lambda + r)z_2 + 2\lambda z_3 = 2\lambda/s$.

Similarly, if I take the Laplace transform of the $P_3$ equation, it becomes
$s z_3 - P_3(0) = \lambda z_2$, and $P_3(0) = 0$, so $s z_3 = \lambda z_2$. Taking everything to
one side, this becomes $\lambda z_2 - s z_3 = 0$.

So, I have these two equations; let us call them equation 1 and equation 2. I can write this set
of equations in matrix form. The first row of the coefficient matrix is $s + 3\lambda + r$ and
$2\lambda$; the second row is $\lambda$ and $-s$. This matrix multiplies the vector
$(z_2, z_3)$, and the right-hand side is $(2\lambda/s,\ 0)$.

So, from the system of two equations I have made the matrix equivalent, and I can solve it using
Cramer's rule. Let us first evaluate $z_3$. My reliability is $P_1 + P_2$, which is
$1 - P_3(t)$; in other words, the unreliability is $F(t) = P_3(t)$.

So, my reliability is $1 - F(t) = 1 - P_3(t)$. If I know $P_3(t)$, I know the reliability
directly; I do not have to calculate $P_1(t)$ and $P_2(t)$. We can still calculate those, but if
reliability is our only aim, knowing $z_3$, and hence $P_3$, is enough.

So, to calculate $z_3$: since $z_3$ is the second unknown, I replace the second column of the
coefficient matrix with the right-hand-side column, and the first column stays the same. The
numerator is the determinant with rows $(s + 3\lambda + r,\ 2\lambda/s)$ and $(\lambda,\ 0)$,
and the denominator is the determinant of the coefficient matrix, with rows
$(s + 3\lambda + r,\ 2\lambda)$ and $(\lambda,\ -s)$. The numerator is
$(s + 3\lambda + r)\cdot 0 - \lambda \cdot 2\lambda/s = -2\lambda^2/s$. The denominator is
$-s(s + 3\lambda + r) - 2\lambda^2$. Both are negative, so I can make them positive.

So, $z_3$ is $2\lambda^2/s$ divided by $s^2 + (3\lambda + r)s + 2\lambda^2$, that is,
$z_3 = 2\lambda^2 / \{ s\,[s^2 + (3\lambda + r)s + 2\lambda^2] \}$. To invert $z_3$, I first
have to split this expression into partial fractions.
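
Written out, the matrix form and the Cramer's-rule step described above look as follows. This is only a restatement in LaTeX of the spoken derivation; the symbol $\Delta$ for the determinant of the coefficient matrix is introduced here for convenience, and the matrix environments assume the `amsmath` package.

```latex
\[
\begin{bmatrix} s + 3\lambda + r & 2\lambda \\ \lambda & -s \end{bmatrix}
\begin{bmatrix} z_2 \\ z_3 \end{bmatrix}
=
\begin{bmatrix} 2\lambda/s \\ 0 \end{bmatrix},
\qquad
\Delta = -s(s + 3\lambda + r) - 2\lambda^2 .
\]
\[
z_3 = \frac{1}{\Delta}
\begin{vmatrix} s + 3\lambda + r & 2\lambda/s \\ \lambda & 0 \end{vmatrix}
= \frac{-2\lambda^2/s}{-\left[ s^2 + (3\lambda + r)s + 2\lambda^2 \right]}
= \frac{2\lambda^2}{s\left[ s^2 + (3\lambda + r)s + 2\lambda^2 \right]} .
\]
```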

The quadratic in $s$ has two roots; let us call them $x_1$ and $x_2$. Then
$z_3 = 2\lambda^2 / [s(s - x_1)(s - x_2)]$. The two roots of a quadratic are
$(-b \pm \sqrt{b^2 - 4ac})/(2a)$.

Here $b = 3\lambda + r$, so $-b = -(3\lambda + r)$ and $b^2 = 9\lambda^2 + r^2 + 6\lambda r$.
With $a = 1$ and $c = 2\lambda^2$, $4ac = 8\lambda^2$, and $2a = 2$.

If I simplify further, $9\lambda^2 - 8\lambda^2 = \lambda^2$, so the term under the square root
is $\lambda^2 + r^2 + 6\lambda r$, and
$x_1, x_2 = \frac{1}{2}\left[-(3\lambda + r) \pm \sqrt{\lambda^2 + 6\lambda r + r^2}\right]$.

We can take $x_1$ as the root with the positive square root and $x_2$ as the root with the
negative square root. I can evaluate these separately, because $\lambda$ and $r$ are known for a
given problem. I will solve for $z_3$ in terms of $x_1$ and $x_2$ on the next slide.
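
As a quick check of the root formula, the sketch below evaluates $x_1$ and $x_2$ for one pair of rates and confirms that they satisfy the quadratic $s^2 + (3\lambda + r)s + 2\lambda^2 = 0$, that their sum is $-(3\lambda + r)$ and that their product is $2\lambda^2$. The values $\lambda = 0.2$ and $r = 1.5$ and the function name `characteristic_roots` are illustrative assumptions, not taken from the lecture.

```python
import math

def characteristic_roots(lam, r):
    """Roots of s^2 + (3*lam + r)*s + 2*lam^2 = 0, larger root first."""
    b = 3 * lam + r
    root = math.sqrt(lam**2 + 6 * lam * r + r**2)   # sqrt(b^2 - 4ac), simplified
    return 0.5 * (-b + root), 0.5 * (-b - root)

lam, r = 0.2, 1.5   # illustrative rates (assumption)
x1, x2 = characteristic_roots(lam, r)

# Substituting each root back into the quadratic should give (nearly) zero.
residual_1 = x1**2 + (3 * lam + r) * x1 + 2 * lam**2
residual_2 = x2**2 + (3 * lam + r) * x2 + 2 * lam**2

print("roots:", x1, x2)
print("residuals:", residual_1, residual_2)
print("sum:", x1 + x2, "expected:", -(3 * lam + r))
print("product:", x1 * x2, "expected:", 2 * lam**2)
```

Because the sum of the roots is negative and their product is positive, both roots are negative for positive $\lambda$, so the exponentials $e^{x_1 t}$ and $e^{x_2 t}$ decay with time.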

(Refer Slide Time: 12:14)

So, I have $z_3 = 2\lambda^2 / [s(s - x_1)(s - x_2)]$, which I can write as
$A/s + B/(s - x_1) + C/(s - x_2)$. The common denominator is $s(s - x_1)(s - x_2)$. So $A$ gets
multiplied by $(s - x_1)(s - x_2) = s^2 - (x_1 + x_2)s + x_1x_2$, and $B$ gets multiplied by
$s(s - x_2) = s^2 - x_2 s$.

Similarly, $C$ gets multiplied by $s(s - x_1) = s^2 - x_1 s$. The denominators are equal, so I
compare the numerators. Collecting terms, the coefficient of $s^2$ is $A + B + C$; the
coefficient of $s$ is $-(x_1 + x_2)A - x_2B - x_1C$; and the term without $s$ is $x_1x_2A$.

The left side, $2\lambda^2$, has no $s^2$ term, so $A + B + C = 0$. It has no $s$ term either,
so, changing all the signs, $(x_1 + x_2)A + x_2B + x_1C = 0$. The term without $s$ gives
$x_1x_2A = 2\lambda^2$.

So, $x_1x_2A = 2\lambda^2$, or $A = 2\lambda^2/(x_1x_2)$. Now, what is $x_1x_2$? One root has
the plus sign and the other has the minus sign. Let me write $a = 3\lambda + r$ and
$b = \sqrt{\lambda^2 + r^2 + 6\lambda r}$.

Then $x_1 = \frac{1}{2}(-a + b)$ and $x_2 = \frac{1}{2}(-a - b)$. So
$x_1x_2 = \frac{1}{4}(-a + b)(-a - b) = \frac{1}{4}(a^2 - b^2)$.

Here $a^2 = 9\lambda^2 + r^2 + 6\lambda r$ and $b^2 = \lambda^2 + r^2 + 6\lambda r$. In the
difference, $6\lambda r$ cancels and $r^2$ also cancels, and $9\lambda^2 - \lambda^2$ gives
$8\lambda^2$; dividing by 4 gives $2\lambda^2$.

So, $x_1x_2$ turns out to be $2\lambda^2$, and therefore $A = 2\lambda^2/2\lambda^2 = 1$. Putting
$A = 1$ into the other two equations gives $B + C = -1$ and $x_1 + x_2 + x_2B + x_1C = 0$.

Now, these two equations we can solve. From the first, $B = -1 - C$. Substituting this into the
second, $x_1 + x_2 - x_2(1 + C) + x_1C = 0$.

Expanding, $x_1 + x_2 - x_2 - x_2C + x_1C = 0$. The $x_2$ terms cancel, leaving
$C(x_1 - x_2) = -x_1$, or $C = x_1/(x_2 - x_1)$. So we have $A = 1$, $C = x_1/(x_2 - x_1)$ and
$B = -1 - C$.

So, $B = -1 - x_1/(x_2 - x_1)$. Over the common denominator $x_2 - x_1$, the numerator is
$-x_2 + x_1 - x_1 = -x_2$, and taking the minus sign into the denominator,
$B = x_2/(x_1 - x_2)$.

So my equation for $z_3$ now becomes
$z_3 = 1/s + [x_2/(x_1 - x_2)] \cdot 1/(s - x_1) + [x_1/(x_2 - x_1)] \cdot 1/(s - x_2)$.
I am erasing some of the earlier working, since we have already solved it, so that we have a
little more space.
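
The partial-fraction result can be sanity-checked by evaluating both forms of $z_3$ at a few values of $s$. This is a minimal sketch; the rates $\lambda = 0.3$ and $r = 1.2$, the sample points and the function names are arbitrary choices for illustration and are not from the lecture.

```python
import math

def roots(lam, r):
    """Roots x1 (plus sign) and x2 (minus sign) of s^2 + (3*lam + r)*s + 2*lam^2."""
    disc = math.sqrt(lam**2 + 6 * lam * r + r**2)
    return 0.5 * (-(3 * lam + r) + disc), 0.5 * (-(3 * lam + r) - disc)

def z3_direct(s, lam, r):
    """z3 as the single fraction 2*lam^2 / (s (s - x1) (s - x2))."""
    x1, x2 = roots(lam, r)
    return 2 * lam**2 / (s * (s - x1) * (s - x2))

def z3_partial(s, lam, r):
    """z3 as A/s + B/(s - x1) + C/(s - x2) with the coefficients derived above."""
    x1, x2 = roots(lam, r)
    a = 1.0
    b = x2 / (x1 - x2)
    c = x1 / (x2 - x1)
    return a / s + b / (s - x1) + c / (s - x2)

lam, r = 0.3, 1.2   # arbitrary illustrative rates (assumption)
max_gap = max(
    abs(z3_direct(s, lam, r) - z3_partial(s, lam, r))
    for s in (0.1, 0.5, 1.0, 2.0, 10.0)
)
print("largest difference between the two forms:", max_gap)
```

The difference should be at the level of floating-point rounding if the coefficients $A$, $B$ and $C$ are correct.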

(Refer Slide Time: 20:42)

So, here we will now take the inverse Laplace transform, which is quite straightforward. The
Laplace transform of $P_3(t)$ was $z_3$, so the inverse transform of $z_3$ is $P_3(t)$. The
Laplace transform of 1 is $1/s$, so the inverse transform of $1/s$ is 1.

The inverse transform of $1/(s - x_1)$ is $e^{x_1t}$, and similarly for $x_2$. So
$P_3(t) = 1 + [x_2/(x_1 - x_2)]e^{x_1t} + [x_1/(x_2 - x_1)]e^{x_2t}$. Since $x_1$ and $x_2$ are
already known, substituting them into this equation gives $P_3(t)$, and $R(t)$ I can get as
$1 - P_3(t)$. The 1 cancels, and the signs of the two remaining terms reverse.

So, $R(t) = [x_2/(x_2 - x_1)]e^{x_1t} + [x_1/(x_1 - x_2)]e^{x_2t}$. In the same way as we have
calculated $P_3(t)$, we can also calculate $P_2(t)$ using Cramer's rule.

Written over the common denominator $x_1 - x_2$, my reliability is
$R(t) = [x_1/(x_1 - x_2)]e^{x_2t} - [x_2/(x_1 - x_2)]e^{x_1t}$.

(Refer Slide Time: 22:39)

This is the same as on the slide: $P_3(t) = 1 + [x_2/(x_1 - x_2)]e^{x_1t} +
[x_1/(x_2 - x_1)]e^{x_2t}$. The values we have obtained are the ones shown there, so we are able
to calculate the reliability directly.

(Refer Slide Time: 23:08)

If I want to calculate the MTTF, I integrate $R(t)$ from 0 to infinity. Since $x_1$ and $x_2$
are negative, the integral of $e^{x_1t}$ is $-1/x_1$, just as $e^{-\lambda t}$ integrates to
$1/\lambda$. So $MTTF = -[x_2/(x_2 - x_1)](1/x_1) - [x_1/(x_1 - x_2)](1/x_2)$. We can solve this
to get the value of the MTTF. I have explained this process for calculating $P_3(t)$; we can
follow the same process to get $P_2(t)$ also.
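
The MTTF expression in terms of $x_1$ and $x_2$ can be compared with the closed form $(3\lambda + r)/(2\lambda^2)$ shown on the slide. The sketch below does this for a few repair rates, including $r = 0$, where the result reduces to $3/(2\lambda)$, the familiar MTTF of two identical active-parallel units without repair. The chosen rates and the function names are assumptions for illustration.

```python
import math

def roots(lam, r):
    """Roots x1 (plus sign) and x2 (minus sign) of s^2 + (3*lam + r)*s + 2*lam^2."""
    disc = math.sqrt(lam**2 + 6 * lam * r + r**2)
    return 0.5 * (-(3 * lam + r) + disc), 0.5 * (-(3 * lam + r) - disc)

def mttf_from_roots(lam, r):
    """Integral of R(t), using the fact that exp(x t) integrates to -1/x for x < 0."""
    x1, x2 = roots(lam, r)
    return -x2 / ((x2 - x1) * x1) - x1 / ((x1 - x2) * x2)

def mttf_closed_form(lam, r):
    """Closed form from the slide: (3*lam + r) / (2*lam^2)."""
    return (3 * lam + r) / (2 * lam**2)

lam = 0.5   # failures per day (illustrative)
for r in (0.0, 1.0, 2.0, 5.0):
    print(r, mttf_from_roots(lam, r), mttf_closed_form(lam, r))
```

The two columns should agree, and the MTTF grows linearly with the repair rate $r$, which is the improvement due to repair described earlier.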

I will show that as well, so that you are able to follow it better: we will repeat for $P_2(t)$
the same process that we used for $P_3(t)$.

(Refer Slide Time: 24:05)

So, as an exercise, I am doing it for $z_2$. Since $z_2$ is the first unknown, the first column
is replaced by the right-hand-side column $(2\lambda/s,\ 0)$, and the second column
$(2\lambda,\ -s)$ remains as it is. The denominator is the same determinant that we have already
solved.

That determinant was $-[s(s + 3\lambda + r) + 2\lambda^2]$. The numerator is
$(2\lambda/s)(-s) - 2\lambda \cdot 0 = -2\lambda$. The denominator, as before, is
$-[s^2 + (3\lambda + r)s + 2\lambda^2]$. Both are negative, so the result is positive again.

So, we can again write this as $z_2 = 2\lambda / [(s - x_1)(s - x_2)]$. The roots $x_1$ and
$x_2$ are the same as before, because this is the same quadratic that we solved earlier. We only
have to find the partial fractions to get $z_2$.

This is not a function of $t$; $z_2$ is a function of $s$. So,
$z_2 = 2\lambda / [(s - x_1)(s - x_2)]$, which I can split into two parts,
$A/(s - x_1) + B/(s - x_2)$. Over the common denominator $(s - x_1)(s - x_2)$, the numerator is
$A(s - x_2) + B(s - x_1)$.

That is, $(A + B)s - (Ax_2 + Bx_1)$. The denominators are the same, and the left side has no $s$
term, so $A + B = 0$, and the constant term gives $-(Ax_2 + Bx_1) = 2\lambda$. From the first
equation, $B = -A$; substituting this, $-Ax_2 + Ax_1 = 2\lambda$.

So, $A(x_1 - x_2) = 2\lambda$, which gives $A = 2\lambda/(x_1 - x_2)$. Since $B = -A$, taking
the minus sign into the denominator, $B = 2\lambda/(x_2 - x_1)$. So I have both $A$ and $B$, and
$z_2 = [2\lambda/(x_1 - x_2)] \cdot 1/(s - x_1) + [2\lambda/(x_2 - x_1)] \cdot 1/(s - x_2)$.
If I take the inverse Laplace transform, the left-hand side gives me $P_2(t)$.

On the right, the coefficients stay as they are, and the two terms give $e^{x_1t}$ and
$e^{x_2t}$: $P_2(t) = [2\lambda/(x_1 - x_2)]e^{x_1t} + [2\lambda/(x_2 - x_1)]e^{x_2t}$. Since
$x_1$, $x_2$ and $\lambda$ are already known to me, I can get $P_2$ as a function of time.

So, as you see, we are able to calculate $P_2(t)$, and $P_1(t)$ I can calculate directly as
$1 - P_2(t) - P_3(t)$. I do not have to solve the equations again for $P_1(t)$.
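
The three closed-form state probabilities can be put together in a few lines. The sketch below computes $P_2(t)$ and $P_3(t)$ from the expressions derived above, obtains $P_1(t)$ as $1 - P_2(t) - P_3(t)$, and compares it with the explicit $P_1(t)$ formula shown on the slide. The rates $\lambda = 0.5$ and $r = 2$ and the function names are assumptions for illustration.

```python
import math

def roots(lam, r):
    """Roots x1 (plus sign) and x2 (minus sign) of s^2 + (3*lam + r)*s + 2*lam^2."""
    disc = math.sqrt(lam**2 + 6 * lam * r + r**2)
    return 0.5 * (-(3 * lam + r) + disc), 0.5 * (-(3 * lam + r) - disc)

def state_probabilities(t, lam, r):
    """Return (P1, P2, P3) at time t, with P1 obtained as 1 - P2 - P3."""
    x1, x2 = roots(lam, r)
    e1, e2 = math.exp(x1 * t), math.exp(x2 * t)
    p2 = 2 * lam / (x1 - x2) * (e1 - e2)
    p3 = 1 + x2 / (x1 - x2) * e1 - x1 / (x1 - x2) * e2
    return 1 - p2 - p3, p2, p3

def p1_slide_form(t, lam, r):
    """Explicit P1(t) expression as written on the slide."""
    x1, x2 = roots(lam, r)
    return ((lam + r + x1) / (x1 - x2) * math.exp(x1 * t)
            - (lam + r + x2) / (x1 - x2) * math.exp(x2 * t))

lam, r = 0.5, 2.0   # illustrative rates per day (assumption)
for t in (0.0, 0.5, 1.0, 5.0):
    p1, p2, p3 = state_probabilities(t, lam, r)
    print(t, p1, p2, p3, p1_slide_form(t, lam, r))
```

At $t = 0$ the output should be $(1, 0, 0)$, matching the initial conditions, and the last column should match the $P_1$ column at every time.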

(Refer Slide Time: 30:00)

The same results are shown on the slide. So, by using these equations we are able to get the
state probabilities, the reliability and the MTTF, and the expressions for $x_1$ and $x_2$ are
also given here.

(Refer Slide Time: 30:28)

So, let us see one example. A computer system consists of two active parallel processors, each
having a constant failure rate of 0.5 failures per day. If I draw the diagram, I have three
states here.

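$$
\begin{aligned}
x_1 &= \frac{1}{2}\left[ -(3 \times 0.5 + 2) + \sqrt{0.5^2 + 6 \times 0.5 \times 2 + 2^2} \right] = -0.1492 \\
x_2 &= \frac{1}{2}\left[ -(3 \times 0.5 + 2) - \sqrt{0.5^2 + 6 \times 0.5 \times 2 + 2^2} \right] = -3.3508 \\
R(t) &= \frac{-0.1492}{-0.1492 + 3.3508}\, e^{-3.3508t} - \frac{-3.3508}{-0.1492 + 3.3508}\, e^{-0.1492t} = 0.8999 \\
MTTF &= \frac{-0.1492}{(-0.1492 + 3.3508)\,3.3508} - \frac{-3.3508}{(-0.1492 + 3.3508)\,0.1492}
\end{aligned}
$$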

Here 0.5 is the failure rate of one processor, so with two processors working the rate becomes
1.0 failure per day. Repair of a failed processor requires on average half a day. That is the
repair time, not the repair rate: the mean time to repair is $1/r = 1/2$ day.

So, $r = 1/(1/2) = 2$ per day: a repair completes in half a day, so in one day you can do two
repairs. So $r = 2$ and $\lambda = 0.5$. If I use these in the equation, $x_1$ is half of
$-(3\lambda + r)$, with $3\lambda = 1.5$ and $r = 2$, plus the square root of
$\lambda^2 + 6\lambda r + r^2$. I am taking the positive square root for $x_1$ and the negative
square root for $x_2$. If I solve this, $x_1$ comes out to be $-0.1492$; $x_2$ is the same
expression with the negative sign on the square root.

So, $x_2$ will be $-3.3508$. Once we have both $x_1$ and $x_2$, we use $R(t)$ as calculated
earlier: $R(t) = [x_2/(x_2 - x_1)]e^{x_1t} + [x_1/(x_1 - x_2)]e^{x_2t}$. If I write both terms
over $x_1 - x_2$, the $x_1$ term comes first and the second term gets a minus sign.

So, $R(t) = [x_1/(x_1 - x_2)]e^{x_2t} - [x_2/(x_1 - x_2)]e^{x_1t}$. This is the same formula we
developed earlier, and using it we get the reliability as 0.8999 (this is the value at $t = 1$
day). For the MTTF, as we discussed earlier, the first coefficient is divided by 3.3508 and the
second coefficient is divided by 0.1492, and this gives me the MTTF.

(Refer Slide Time: 34:01)

So, using same we can if you want we can solve this in Excel how much value of MTTF is
coming. So, I will equal to minus of 0.1492 divided by minus of 0.1492 plus of 3.3508
multiplied with 3.3508. So, this is my first term and second term is equal to minus minus will
become plus.

So the second term is 3.3508 / [(-0.1492 + 3.3508) × 0.1492], and the MTTF is the first term plus the second. My MTTF comes out to be approximately 7 days, since all the rates here were per day. So, as you have seen, we are able to calculate reliability and MTTF when repair is considered. Whenever we consider repair, the diagram has a loop, and because of that our equations become a little more complex to solve, but they are solvable.
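
The Excel calculation above can be reproduced with a short script. This is a minimal sketch, not the lecturer's worksheet: the variable names are illustrative, and evaluating R(t) at t = 1 day is an assumption, chosen because it reproduces the quoted value of 0.8999.

```python
import math

lam = 0.5   # failure rate of each unit, per day (from the lecture)
r = 2.0     # repair rate, per day (from the lecture)

# Roots x1, x2 = 0.5 * [-(3*lam + r) +/- sqrt(lam^2 + 6*lam*r + r^2)]
disc = math.sqrt(lam**2 + 6 * lam * r + r**2)
x1 = 0.5 * (-(3 * lam + r) + disc)
x2 = 0.5 * (-(3 * lam + r) - disc)


def reliability(t):
    """R(t) = x1/(x1-x2) * exp(x2*t) - x2/(x1-x2) * exp(x1*t)."""
    return (x1 * math.exp(x2 * t) - x2 * math.exp(x1 * t)) / (x1 - x2)


# MTTF is the integral of R(t) from 0 to infinity; each exp(x*t) with
# x < 0 integrates to -1/x, which gives the two terms from the lecture.
term1 = x1 / ((x1 - x2) * (-x2))    # coefficient of exp(x2*t) divided by 3.3508
term2 = -x2 / ((x1 - x2) * (-x1))   # coefficient of exp(x1*t) divided by 0.1492
mttf = term1 + term2

print(f"x1 = {x1:.4f}, x2 = {x2:.4f}")
print(f"R(1 day) = {reliability(1.0):.4f}")   # t = 1 day is an assumption
print(f"MTTF = {mttf:.4f} days")
```

The two MTTF terms are kept separate so that they can be compared one by one with the two cells computed in the sheet.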

If the problem becomes more complex, we may use other methods, such as approximations or numerical methods, to solve the same differential equations. We will stop here today and discuss one more system in the next class. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 25
Markov Analysis (Contd.)

Hello everyone. We will continue our discussion from the previous lecture. In the previous lecture, we discussed how the effect of repair can be considered in the Markov diagram by adding a repair path. Once we take the repair path, the state equations can be solved again and we can get the system reliability.

We will continue our discussion with one more system. In the earlier case, we considered two components which are both working, where only one of them is enough for the system to work. So whenever one of the two fails, it can be repaired while the other continues to work, and our system is still working.

Now, in this case we will see the standby case: there is one unit which is working and one unit which is in standby. Again, we can consider identical units here. If the working unit fails, the other unit will be plugged in. That unit was in standby earlier, so we take it that it does not fail during the standby condition. When it is plugged in, the failed unit goes to repair and the plugged-in unit continues to work.

(Refer Slide Time: 01:51)

So this system can be represented by the Markov diagram as we have made it here. Here we are considering that the two units are different: the unit which we are considering for standby is different from the primary unit. They may be the same also; if they are the same, you can take both failure rates as lambda.

In the earlier case two units were working, but here one unit is working and one unit is in standby, so only the working unit has a failure rate. The failure rate of the primary unit, unit 1, is lambda 1, and the failure rate of unit 2, when it is the one working, is lambda 2.

So now, if unit 1 fails, the other unit takes over the function. While the second unit is functioning, the primary unit can be repaired, and the system goes back to state number 1. When it returns to state 1, the second unit again becomes the standby and the primary unit takes over from it. This is what generally happens if we take a UPS system as a case: we have the main power supply, which is working, and the UPS acts as the standby.

So when the main supply fails, our secondary unit, the UPS, takes care of the power. Meanwhile the main supply is being restored, and the rate of restoration is r. Once the main supply is restored, we switch from the UPS back to the main supply, and we again have the same situation: we are working on the main supply and the UPS becomes the standby. The same situation is depicted in the diagram here.

(Refer Slide Time: 03:51)

So if we want to calculate reliability for this kind of system, how can we calculate it? What we have done earlier is applicable here as well, the same way we solved the earlier problem of two working units, where a unit can get repaired whenever it fails while the system keeps working. Here again, state 1 is a working condition for the system, state 2 is also working, and state 3 is failed. In reliability analysis we do not consider repair from this failed state; it is also called the absorbing state.

The failure state is generally considered the absorbing state. Whenever we reach the failure state, our reliability goal is not fulfilled. So whenever a system failure happens, it is counted as a failure, and from that state we do not consider repair again. This is the difference between the concepts of availability and reliability: when we consider availability, we consider that repair is possible from the complete failure state also.

In that case we would take a repair path from the failure state also, but in the case of reliability we do not consider repair from the failure state. Reliability requires that the system continues to work; the moment the system fails, our system objective is not met and we have the consequences of failure.

Because of that, whenever we want to calculate reliability, the last state, the failure state, will not have any repair. It is an absorbing state: it has only incoming transitions and no outgoing transition. So what happens if time becomes infinite?

As time progresses, the system will eventually always land up in this state, because once it reaches this state it cannot leave. Once it has failed, it is permanently failed; there is no possibility of going back to the working states. That is how we treat it when we consider reliability.

Actually, there may be a possibility of repair, but in that case we would be doing an availability calculation, not a reliability calculation. Reliability is the probability of successful continuous operation, so the moment the system fails, our reliability objective is not met. Being in the failed state therefore counts as unreliability.

So the probability of being in state 3 is the unreliability. Logically, whatever happens between states 1 and 2, the system continues, but the moment it reaches state 3 the system objective has failed. And if I take time t equal to infinity, some day it is going to reach state number 3. The probability of being in state 3 is my failure probability.

Now, to evaluate this: if we compare this diagram with the earlier diagram, it looks similar. There is not much difference; only the values and the labelling are different. The equations are already given here, so rather than solving them in terms of lambda 1, lambda 2 and r, which you can do by following the same steps which we followed for the two-component system, let us solve this diagram directly with numerical values.

(Refer Slide Time: 07:51)

So let us take one example here. We have an on-board computer system which has built-in test equipment capability. Built-in test equipment is generally used in critical systems whose failure can lead to disasters, like airplane systems and missile systems. So they have a self-checking capability.

So they have built-in test equipment. You may have seen this many times: before a flight takes off, or before any rocket is launched, as in an ISRO launch or a NASA launch, there is a countdown. During that countdown a test is actually running, for almost a week or 4 days, in which they test each and every parameter of the subsystems and check whether all parameters are in range or not.

So that is the built-in test equipment: it tests its own circuits and its own components and verifies whether they are in a good state or not. Only when everything is verified and found to be in a good state does the launch happen. So in this example the computer has built-in test equipment and is capable of being restored when a failure occurs.

So built-in test equipment can identify failures and can also correct the faults. We have a standby computer here for whenever the primary fails. Assuming lambda 1 as 0.0005, lambda 2 as 0.002 and r as 0.1 (all per hour), determine the system reliability for a 1000-hour interval. So I can make this diagram here: the primary system is working and the secondary is in standby.

So my failure rate from state 1 is lambda 1, that is 0.0005, and with it the system may reach the second state. In the second state the primary is in the failed condition, or I can say that the primary is under repair, because the failed unit goes to repair immediately when it fails.

So it can get repaired, and then we again reach the first state, where the primary is working and the secondary is in standby. The repair rate is 0.1. And how much is the failure rate for unit 2? In state 2 the primary has failed and the secondary is working, so the failure rate is that of the secondary, lambda 2, that is 0.002. My third state is the failure state: if the system reaches this state I cannot recover it; complete system failure happens here. For this system we now want to calculate the reliability.

To calculate the reliability, as we have seen earlier, I can develop the equations here again. I will do this on a separate slide so that we can follow it. My values are 0.0005, 0.002 and 0.1.

(Refer Slide Time: 11:11)

I will make the same system here, with failure rate 0.0005, repair rate 0.1, and failure rate 0.002. Let us follow the same steps as we followed earlier, taking incoming as positive and outgoing as negative. For state 1:

    dP1(t)/dt = -0.0005 P1(t) + 0.1 P2(t)

Similarly, for state 2 the incoming is 0.0005 from state 1, and there are two outgoings, 0.1 and 0.002:

    dP2(t)/dt = 0.0005 P1(t) - (0.1 + 0.002) P2(t)

And for state 3 there is only an incoming and no outgoing, since this is the absorbing state:

    dP3(t)/dt = 0.002 P2(t)

As we discussed earlier, to solve these equations we use one more equation: at any time the system has to be in one of the three states, so the sum of the probabilities of the three states is always 1, that is, P1(t) + P2(t) + P3(t) = 1. With this we can eliminate P1. The second equation I can write again as

    dP2(t)/dt = 0.0005 P1(t) - 0.102 P2(t)

Now P1(t) I can write as 1 - P2(t) - P3(t), so

    dP2(t)/dt = 0.0005 [1 - P2(t) - P3(t)] - 0.102 P2(t)

Let us expand this. The constant term is 0.0005; the P2(t) terms are -0.0005 P2(t) and -0.102 P2(t); and for P3(t) only one term is there, that is -0.0005 P3(t).

So we have this equation and the equation for P3. Let us say the P3 equation is equation number 1 and this one is equation number 2. Now let us take the Laplace transform, writing Z2 and Z3 for the transforms of P2(t) and P3(t). The left side of equation 2 becomes s Z2 - P2(0), and P2 at 0 is 0. On the right side, the Laplace transform of the constant term 0.0005 is 0.0005/s, followed by the negative terms.

If we sum up the P2 coefficients, 0.102 plus 0.0005 becomes 0.1025, so the right side is 0.0005/s - 0.1025 Z2 - 0.0005 Z3. If I take the Z terms to one side:

    (s + 0.1025) Z2 + 0.0005 Z3 = 0.0005 / s

This becomes my equation number 3. Now, if I take the Laplace transform of equation number 1, with P3 at 0 equal to 0, my equation becomes s Z3 - 0 = 0.002 Z2, which I can write as

    0.002 Z2 - s Z3 = 0

This is equation number 4.

So now I have a set of two equations, the third and the fourth, and I can convert them into matrix form:

    [ s + 0.1025    0.0005 ] [ Z2 ]   [ 0.0005 / s ]
    [ 0.002         -s     ] [ Z3 ] = [ 0          ]

The first row is equation 3 and the second row is equation 4; these are the same equations we developed above, now written as a matrix.

My objective is to calculate system reliability, so why should I calculate so many values? I will simply calculate Z3 only. By Cramer's rule, Z3 is the determinant of the matrix with the second column replaced by the right-hand side, that is, first column (s + 0.1025, 0.002) and second column (0.0005/s, 0), divided by the determinant of the original matrix, with columns (s + 0.1025, 0.002) and (0.0005, -s). In the numerator the first product is 0, so the numerator is -(0.002 × 0.0005)/s.

The denominator is -s (s + 0.1025) - 0.002 × 0.0005, that is, -[s^2 + 0.1025 s + 0.002 × 0.0005]. The product 0.002 × 0.0005 is 2 × 10^-3 times 5 × 10^-4, which is 10 × 10^-7, and that I can write as 10^-6. The minus signs in the numerator and denominator cancel, so

    Z3 = 10^-6 / [ s (s^2 + 0.1025 s + 10^-6) ]
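
For reference, the Laplace-domain system and the Cramer's-rule step described above can be written compactly as follows. This only restates the spoken derivation; the symbols Z_2 and Z_3 denote the transforms of P_2(t) and P_3(t).

```latex
\begin{aligned}
(s + 0.1025)\,Z_2(s) + 0.0005\,Z_3(s) &= \frac{0.0005}{s}, \\
0.002\,Z_2(s) - s\,Z_3(s) &= 0,
\end{aligned}
\qquad
Z_3(s)
= \frac{\begin{vmatrix} s + 0.1025 & 0.0005/s \\ 0.002 & 0 \end{vmatrix}}
       {\begin{vmatrix} s + 0.1025 & 0.0005 \\ 0.002 & -s \end{vmatrix}}
= \frac{-\,0.002 \times 0.0005 / s}{-\left(s^{2} + 0.1025\,s + 0.002 \times 0.0005\right)}
= \frac{10^{-6}}{s\left(s^{2} + 0.1025\,s + 10^{-6}\right)}.
```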

Now, to invert this I need the roots x1 and x2 of the quadratic s^2 + 0.1025 s + 10^-6 = 0, that is, [-b ± sqrt(b^2 - 4ac)] / (2a) with a equal to 1, b equal to 0.1025 and c equal to 10^-6:

    x1, x2 = [ -0.1025 ± sqrt(0.1025^2 - 4 × 10^-6) ] / 2

I will calculate these values in Excel and show you. Once I have x1 and x2, I can rewrite Z3; I will take this on the next slide.

(Refer Slide Time: 20:26)

So Z3 = 10^-6 / [ s (s - x1)(s - x2) ], where x1 and x2 are calculated as we have just seen. This I can again write in partial fractions as A/s + B/(s - x1) + C/(s - x2). As we have solved earlier, over the common denominator s (s - x1)(s - x2), the numerator starts with A [s^2 - (x1 + x2) s + x1 x2].

If you wish, we can calculate x1 and x2 first and put the values in here; that will be even easier. So let me calculate x1 and x2. I will keep the notation here so that we can follow it later on. My values were 0.1025 and 4 × 10^-6.

So in the sheet I enter the first term, -0.1025. For the second term I enter 0.1025 squared minus 4E-6, and then I take the square root of this, that is, raise it to the power 1/2 or 0.5. So this is my first term and this is my second term.

For x1 I take the plus sign: x1 is the first term plus the square root term, the whole divided by 2. In the same way, for x2 I take the negative sign: the first term minus the square root term, divided by 2.

So x1 and x2 are known here, and I can use the same values directly. The sum x1 + x2 is also required, so I add the two cells. There was a mistake in my cell reference at first; corrected, x1 + x2 comes to -0.1025.
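
The spreadsheet steps for the two roots can be mirrored in a few lines. This is a sketch of the same arithmetic, not the lecturer's actual sheet; the names `first_term` and `sqrt_term` are illustrative. The quadratic is s^2 + 0.1025 s + 10^-6 = 0, so the roots should sum to -0.1025 and multiply to 10^-6.

```python
b = 0.1025   # coefficient of s in s^2 + 0.1025 s + 1e-6
c = 1e-6     # constant term, 0.002 * 0.0005

first_term = -b
sqrt_term = (b**2 - 4 * c) ** 0.5   # the "power 0.5" step done in the sheet

x1 = (first_term + sqrt_term) / 2   # root taken with the plus sign
x2 = (first_term - sqrt_term) / 2   # root taken with the minus sign

print(f"x1 = {x1:.6e}")
print(f"x2 = {x2:.6f}")
print(f"x1 + x2 = {x1 + x2:.4f}")   # sum of roots, should equal -b
print(f"x1 * x2 = {x1 * x2:.3e}")   # product of roots, should equal c
```

Checking the sum and product of the roots is a quick way to catch a wrong cell reference of the kind mentioned above.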

So let us continue on the slide. I can write the first term as A [s^2 - (x1 + x2) s + x1 x2]. Since x1 + x2 is -0.1025, the minus becomes plus and this is A [s^2 + 0.1025 s + x1 x2]. The product x1 x2 we would again have to calculate, which is a little inconvenient. So what do we do? We first solve this in terms of x1 and x2 and see how we go forward.

Following the same steps as earlier, the other two terms are B (s^2 - x2 s) and C (s^2 - x1 s). When we collect the terms, the numerator is

    (A + B + C) s^2 - [ (x1 + x2) A + x2 B + x1 C ] s + x1 x2 A

over s (s - x1)(s - x2). Comparing this with 10^-6 gives three conditions: A + B + C = 0; (x1 + x2) A + x2 B + x1 C = 0; and x1 x2 A = 10^-6. So A will be equal to 10^-6 / (x1 x2).

And since A + B + C = 0, B + C will be equal to -A, that is, -10^-6 / (x1 x2). Let us calculate x1 x2 in the sheet: x1 multiplied by x2 comes to 10^-6, as it must, since the product of the roots is the constant term of the quadratic. This is almost the same as what we solved earlier. So A is equal to 1 and B + C is -1, that is, B = -1 - C. If I put that in the second condition, with A equal to 1:

    (x1 + x2) + x2 (-1 - C) + x1 C = 0

That is, x1 + x2 - x2 - C x2 + x1 C = 0, so C (x1 - x2) + x1 = 0. Reversing the sign, C = x1 / (x2 - x1). Then B = -1 - C = -1 - x1 / (x2 - x1). If I take x2 - x1 as the common denominator, the numerator is -x2 + x1 - x1, which is -x2, so B = -x2 / (x2 - x1) = x2 / (x1 - x2).

Now I have got all the values and I can calculate this. The first term is 1/s, since A is 1. And how much is B? B is x2 / (x1 - x2). So in the sheet I calculate x1 - x2, and similarly x2 - x1, which is nothing but the negative of it.

x2 - x1 you can take as approximately -0.1025. So Z3 is 1/s plus the B term, and B is x2 divided by (x1 - x2), which I calculate in the sheet from the cells above.

Similarly, C is x1 divided by (x2 - x1), which I calculate the same way. Now our equation becomes simpler: the A, B and C values are all calculated here, and I can take them directly from the sheet.

Now I want to calculate the reliability, which is 1 - P3. In Z3 = 1/s + B/(s - x1) + C/(s - x2), I know B, which is x2 / (x1 - x2), and I know C, which is x1 / (x2 - x1). Both I can take directly from the sheet.

Since Z3 is known, I can calculate P3 by taking the inverse Laplace transform:

    P3(t) = 1 + B e^(x1 t) + C e^(x2 t)

x1 and x2 are available with me, and R(t) will be equal to 1 - P3(t). Let me calculate P3(t) in the sheet.

I will enter the value of t here; t was 1000. So P3(t) is equal to 1 plus B exp(x1 t) plus C exp(x2 t), where, as we discussed, B is x2 / (x1 - x2) and C is x1 / (x2 - x1). This becomes my P3(t). And R(t) will be equal to 1 - P3(t).

So my reliability comes out as 0.99038. I can solve this symbolically in terms of lambda 1, lambda 2, r, x1 and x2, or I can put the numerical values in from the start and carry them through the equations. Let us compare with the value we had: we are getting 0.99038, and if you round it, the 38 becomes 4, giving 0.9904.
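
The whole closed-form calculation can be collected in one place. This is a sketch under the assumption that the rates are per hour and that the quadratic coefficients generalize as b = lambda1 + lambda2 + r and c = lambda1 × lambda2, which is what the numerical derivation gives for these values (0.1025 and 10^-6); the variable names are illustrative.

```python
import math

lam1, lam2, r = 0.0005, 0.002, 0.1   # rates used in the lecture (assumed per hour)
t = 1000.0                            # mission time in hours

b = lam1 + lam2 + r                   # 0.1025
c = lam1 * lam2                       # 1e-6
root = math.sqrt(b * b - 4 * c)
x1 = (-b + root) / 2
x2 = (-b - root) / 2

# Partial-fraction coefficients of Z3 = A/s + B/(s - x1) + C/(s - x2)
A = c / (x1 * x2)        # equals 1, since x1 * x2 = c
B = x2 / (x1 - x2)
C = x1 / (x2 - x1)

# Inverse Laplace transform: P3(t) = A + B exp(x1 t) + C exp(x2 t)
p3 = A + B * math.exp(x1 * t) + C * math.exp(x2 * t)
reliability = 1 - p3

print(f"A = {A:.6f}, B = {B:.6f}, C = {C:.6e}")
print(f"P3({t:.0f}) = {p3:.5f}")
print(f"R({t:.0f}) = {reliability:.5f}")
```

Because x2 is about -0.1025, the term C exp(x2 t) is negligible at t = 1000, and the result is governed almost entirely by the slow root x1.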

So we can solve our equations numerically as well, using the same steps, the Laplace transform and the inverse Laplace transform, and again get the values. I have used an Excel sheet for the calculation because it would be difficult for me to show you how to do it on a calculator. The Excel sheet is much easier, it replaces the calculator work, and the values are stored so that you can see them later on.
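
As an independent cross-check that does not go through the Laplace transform, the three state equations can also be integrated step by step. The sketch below uses a classical fourth-order Runge-Kutta scheme written in plain Python; the step size of 0.5 hour and the function names are assumptions made for illustration, not part of the lecture.

```python
def derivs(p, lam1=0.0005, lam2=0.002, r=0.1):
    """Right-hand sides of the three state equations from the lecture."""
    p1, p2, p3 = p
    return (
        -lam1 * p1 + r * p2,           # state 1: primary working
        lam1 * p1 - (r + lam2) * p2,   # state 2: primary under repair
        lam2 * p2,                     # state 3: system failed (absorbing)
    )


def rk4_step(p, h):
    """One classical fourth-order Runge-Kutta step of size h."""
    k1 = derivs(p)
    k2 = derivs(tuple(x + 0.5 * h * k for x, k in zip(p, k1)))
    k3 = derivs(tuple(x + 0.5 * h * k for x, k in zip(p, k2)))
    k4 = derivs(tuple(x + h * k for x, k in zip(p, k3)))
    return tuple(
        x + h * (a + 2 * b + 2 * c + d) / 6
        for x, a, b, c, d in zip(p, k1, k2, k3, k4)
    )


def integrate(t_end=1000.0, h=0.5):
    """Integrate from t = 0, starting with the system in state 1."""
    p = (1.0, 0.0, 0.0)
    for _ in range(int(round(t_end / h))):
        p = rk4_step(p, h)
    return p


p1, p2, p3 = integrate()
print(f"R(1000) = P1 + P2 = {p1 + p2:.5f}")
print(f"1 - P3            = {1 - p3:.5f}")
```

The two printed values should agree, since the three probabilities always sum to 1, and both can be compared with the closed-form reliability.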

So we have discussed Markov methods, and we have been able to solve the equations for the various systems and configurations we have taken. As we see, for the repair case the Markov method is considered to be the way to evaluate the reliability. To calculate the reliability we develop the Markov diagram with the repair paths, solve it using the Laplace transform and inverse Laplace transform, and once we solve it we get the state probabilities.

From the state probabilities we get the system reliability: R(t) is the sum of the probabilities of those states which are considered to be working states, or equivalently 1 minus the probability of the failed state. In this case we did not calculate all the state probabilities; we calculated only the probability of the failure state, P3, and from there the reliability as 1 minus P3. Sometimes that can be done to save calculation effort.

With this, our discussion on Markov analysis, whatever we wanted to cover in this course, is completed. Next week we will move to the next topic. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 26
Failure Data Analysis: Non- Parametric Approach

Hello everyone. In previous weeks, we discussed various distributions and we discussed system reliability. Wherever we use reliability values, unreliability values, failure rate, PDF, etc., we need to know those values, and these values have to be determined from data. As we discussed earlier, for reliability our random variable is time to failure.

So if we want to know the reliability, we have to collect time-to-failure data. Once we have instances of time to failure, we can perform statistical analysis and find the reliability, unreliability, probability density function, failure rate, etc. So now we will start discussing how, given the failure data, we find the reliability, the failure rate, the PDF or the unreliability.

(Refer Slide Time: 01:33)

So we need to have data, and for that we have to collect it. If we collect the data and analyze the failures, the analysis gives us very useful information. It provides valuable information for engineers and managers: from reliability data they know what kind of reliability performance their systems are having, and accordingly they can take corrective actions or decide their policies for maintenance, for failure correction, etc.

It also provides inputs for assessing the characteristics of materials. We are able to understand which materials behave better in the design and which do not behave so well. Not all materials provide good reliability in all scenarios, and from the data you can see which materials are better for a particular use. You can also predict product reliability in the design phase. As we discussed earlier under system reliability, reliability is achieved during the design phase; in the other phases, like manufacturing or use, reliability generally deteriorates.

So the maximum reliability which is achievable is determined by how well you have designed; the design phase has the maximum influence on reliability. Now, when we are designing the product we may not have data, because the product will be used in the future and failure data comes only once it is used. But if we have failure data from previous, similar designs, for example component reliability data, we can use it to find the system reliability.

So, to use data at the design stage, we analyze the previous designs, find their reliability and use it for the current design. Whenever we propose a design change, we should be able to know whether it is effective, and what its effect is: whether it makes the reliability poorer or better. Many times we theorize that a change will make the reliability better, but reliability is not just about using a better material; the material should also be a good match.

Reliability of the system means that all the components work together in a good way. So whenever a design change is proposed, we can look into whether it is going to be effective, or whether an implemented design change has been effective. Many times we have different suppliers or manufacturers giving us parts. If we have two or more suppliers, we can see how their equipment has performed in our designs, whether well or not so well, and do the vendor selection accordingly.

Product reliability in the field. Nowadays reliability can be used as a contractual parameter, where the customer and the supplier have some agreement on reliability. Once they have agreed on the values, the field performance shows whether the agreed reliability is exhibited, or whether the criteria are satisfied. So field reliability data can be used to assess how much reliability is achieved. We can also predict product warranty costs: once we know the reliability of our systems, we know how many failures to expect during use.

So over time we can understand how many failures we are going to see, and how much providing the warranty for them will cost. Many organizations do not focus on data collection; they want to use others' data, generic data or generic formulas. That is where the problem arises, because you do not have your own inputs, and you cannot understand how your design features, your way of designing or your applications influence the reliability, and how much reliability you are actually achieving.

So if you want to have a good reliability program, you should use your own data. For that you need data collection, and the data should be analyzed and used. Many times companies make procedures for data analysis but are not able to actually perform the analysis and use the information for design and decision making.

This information is acquired through testing, through operation, and from products used in industry, in the military or at the customer end. We should be able to collect this data and analyze it, and the information we gather should be used in our decision-making process. You may have your own reason, based on your role: whether you are a customer, a manufacturer or part of a team. Your purpose may be one of these or may be separate from them, but most of the time one or more of these reasons will apply to you.

(Refer Slide Time: 08:03)

Generally, if we want to do data analysis for reliability, the data will come from testing. Whether we perform in-house testing or testing at a third-party site, we will have certain data, and this data may be analyzed to get reliability information. One example is screening data. Screening or burn-in, as we discussed earlier, is needed because manufacturing defects often leave some faults in the system.

These can be screened out by a burn-in process, by a HALT/HASS process (HALT is highly accelerated life testing and HASS is highly accelerated stress screening), or by an ESS process, that is, environmental stress screening. This screening will also result in certain failures. These failures can be analyzed to understand infant mortality, which gives you a good idea of the reasons for these failures and of the corrective actions you can take in design, at the vendor site or at the manufacturing site, so that you reduce these failures and eliminate infant mortality failures in the future.

Similarly, you can perform accelerated life testing (ALT). Here we try to find the life of the product by accelerating the test, and this acceleration is generally achieved by applying higher stresses. We apply a set of higher stress levels and use a life-stress relationship, like the Arrhenius relationship for temperature stress, or the inverse power law for other kinds of stress like voltage, current or mechanical load.

We use those formulas to establish the change in life due to stress. By measuring life at different stresses, we learn the pattern in which life varies with stress: as the stress increases, life generally decreases. So with three different stresses we get three different life points, and we can fit a regression line through them. This regression line gives us the relation of life with stress.

Because the test stresses are higher than normal, we extrapolate this regression line down to the lower, use-level stress. By testing at higher stresses we are able to complete the test in a shorter period of time: if the product life is 10 or 20 years, we cannot perform the test for 10 or 20 years. So we apply higher stresses and try to finish each test in six to nine months. Three tests are conducted, the three tests give us the life-stress relationship, and extrapolating it gives the life at normal use conditions.
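
The fit-and-extrapolate step can be sketched for the inverse power law, life = K / stress^n, which becomes a straight line after taking logarithms. The three test points below are invented for illustration (they are not from the lecture) and are chosen to lie exactly on life = 64000 / stress^3, so the fit can be checked by hand; the use-level stress of 1.0 is likewise hypothetical.

```python
import math

# Hypothetical test results: (stress level, observed life in hours).
# These numbers are made up for illustration only.
tests = [(2.0, 8000.0), (4.0, 1000.0), (8.0, 125.0)]
use_stress = 1.0   # hypothetical normal-use stress level

# Inverse power law: life = K / stress**n, so
# ln(life) = ln(K) - n * ln(stress) is a straight line in log-log space.
xs = [math.log(s) for s, _ in tests]
ys = [math.log(life) for _, life in tests]

# Ordinary least-squares line through the three points.
n_pts = len(tests)
x_mean = sum(xs) / n_pts
y_mean = sum(ys) / n_pts
slope = (
    sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    / sum((x - x_mean) ** 2 for x in xs)
)
intercept = y_mean - slope * x_mean

n_exp = -slope               # exponent n of the inverse power law
K = math.exp(intercept)      # constant K

# Extrapolate the fitted line down to the use-level stress.
life_at_use = K / use_stress ** n_exp
print(f"n = {n_exp:.3f}, K = {K:.1f}")
print(f"extrapolated life at use stress = {life_at_use:.1f} hours")
```

With real test data the points will not lie exactly on the line, and the extrapolated life carries the uncertainty of the fit; for temperature stress the same regression idea would be applied to the Arrhenius relationship instead.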

So from the stress at normal use conditions we know the life, and the same relationship is used to determine the reliability, failure rate, etc. under normal use conditions. So this can also be used for data analysis. Another source of data is previous experience with similar or identical components, as we discussed earlier. Generally no design starts from zero: designs are modifications of earlier similar designs, or add functionality or features to them, and the earlier designs may have been developed by the same company or by other companies.

So, if we have some previous data for those, we can use it to understand how much
reliability can be achieved from the new design. We can also use operational field data, like
a customer failure reporting system, warranty data, inspection records, etcetera. This data is
very useful in analyzing the system, especially if the system is costly, bulky, and repairable,
because whenever it fails the customer will report the failure and your team of specialized
technicians will address and correct it.

(Refer Slide Time: 12:34)

Some applications of this data would be predicting reliability, availability, maintainability,
hazard rate, expected number of failures, etcetera. We can also find out whether we should
replace or repair: whenever a failure happens, the question arises whether we should go with
part replacement or repair the part and reuse it. That can also be analyzed. We can decide the
inspection schedules: for preventive maintenance purposes, or to find hidden failures, we
have to do periodic inspection.

When we perform the inspection, what frequency will give us good availability? That also we
can decide. But to make all these decisions we need data; we need to collect this data, and
only then can we do the analysis. Spare part management, preventive maintenance, everything
requires data, and that data needs to be analyzed and used for the purpose. We can also
recommend design changes for reliability improvement, maintenance evaluation, etcetera.

If we want to do life-cycle cost modeling, then also we require data. Based on the data, we
can evaluate the life-cycle cost. Rather than deciding based on the acquisition cost, deciding
based on the life-cycle cost is much more appropriate, because it tells us the total cost over
the period of use of the equipment which we are going to use.

And that can be the criterion which helps us choose the equipment. Sometimes costlier
equipment can actually have the lower life-cycle cost, because if it has a lower failure rate
and lower failure probability, you will have to spend less time on repair.

Apart from that, there will be less downtime, and because of that you avoid the productivity
loss: you are able to use the function, and with less loss of function you lose less money
because of failures. So we have to do the modeling, and then we can do the optimization and
find out what kind of product or option would be better for us.

(Refer Slide Time: 15:06)

Generally, we can divide failure data into operational data, which comes from the field, and
test-generated data. Operational data is generally less accurate, because it depends on when
the failure occurs and when people report it, on how the equipment is being operated, and on
whether the operating time is recorded or not. Test-generated data, on the other hand, is
generated in the lab environment.

So it is supposed to be accurate data, but here the test conditions may vary and may not
exactly represent the operational environment. Because of that, the relation with the
operational environment has to be established, and that becomes a challenge when you are
doing the testing.

Grouped versus ungrouped data. Most of the time the data coming from the operational side
is grouped data: we know how many failures occurred over a period of time, per day, per
week, per month, like that. Ungrouped data means we have the exact time-to-failure data; we
are able to record at what moment each unit failed. That kind of data can generally be
gathered only when we are doing the testing.

Large samples versus small samples.

This refers to the number of samples which we are collecting, either from operational or
from test data. Generally, if you have a long period of use, you will have large samples.
Large samples are definitely desirable, because the more samples you have, the more accurate
the estimation will be. But many times we have to work with small samples. If the number of
samples is small, we still have to estimate, but the error will be high.

Complete versus censored data. Complete data is generally achievable in lab conditions, but
most of the time the lab data is also censored, because not all the equipment which we put
on test, or which we give to the customer for use, is going to fail. Some equipment keeps on
working, and the failure times of the equipment which keeps on working are not known to us,
so those observations become censored. That is censoring: our ability to see when the failure
happens is restricted, but we know that up to a certain time the failure has not happened.

(Refer Slide Time: 17:35)

We will discuss these things in more detail. How can we get failure time data? As we
discussed, we can use operational field data, or we can do reliability testing, which we have
already discussed in detail: screening, burn-in, ALT, or growth testing. Reliability growth
testing is another way of doing the test, and it is mostly done by the developing
organization. Whatever you design, any product or any software, will generally not work at
first; it will fail in testing.

Any product which you design is initially not reliable, because there are many weaknesses
and faults. Theoretically it looks good, but when you put it through the qualification test,
the product keeps on failing. When the product fails, you correct the fault and improve the
design. Once you make the improvement, the same failure may not come next time, but some
other failure may come.

So you go through this test, fail, fix cycle multiple times. Every time you fix the cause of a
failure, the reliability of the product grows. The main objective of the development period is
to make sure that the system reliability grows by doing test-fix, test-fix. This growth data
can also help in estimation.

(Refer Slide Time: 19:08)

So, let us see what kinds of data we can have. Complete data means we have put all the
equipment on test and all the equipment has failed. Singly censored data means censored on
one side, either on the left or on the right; we will discuss this by example in the next
slides. For right-censored data there are two types of test: time terminated and failure
terminated. Then there is interval-censored data and there is multiply censored data. We will
discuss these one by one.

(Refer Slide Time: 19:34)

Complete data means that we have, let us say, put five units on test and all have failed. So
we know the time to failure for each one: this is time to failure 1, and these are times to
failure 2, 3, 4, and 5.

(Refer Slide Time: 19:47)

Right-censored data means that at some point on the right, let us say here, we have stopped
testing. We may have stopped for one of two reasons. Either we decided in advance when the
testing will end, or, let us say, we have given the units to the customer for use. When we
give the units, that is time t equal to 0, and now, let us say, they have spent one year in
the field.

So we have only one year of field data. Whatever failures have happened in that year are
recorded here, but the units which have not failed are still running at the end of the year.
This is time-terminated data. Many times during tests we decide instead that we will run the
test up to a certain number of failures.

If you are putting, let us say, 30 units on test, you may say that the test will run till we
observe 10 failures; after the 10th failure we will not continue the test. In that case, as
you see here, the last data point falls on a failure, the 10th failure, and the rest of the
units, which have not failed, are still running when the test stops.

(Refer Slide Time: 21:07)

So these are the two types of right censoring. Time terminated means you have decided that
the test will terminate at a certain point of time, and failure terminated means you have
decided on a number of failures as the termination point for the test. In both cases some
equipment will still be working once either the time or the number of failures is reached.
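
The two stopping rules can be illustrated with a short Python sketch. The failure times below are hypothetical, and the record layout, a pair of observed time and a failed flag, is only an assumed convention for this example.

```python
# Hypothetical times (hours) at which each unit would fail if the test
# were allowed to run until every unit had failed.
true_failure_times = [120.0, 340.0, 95.0, 410.0, 230.0, 515.0, 180.0, 60.0]


def time_terminated(times, stop_time):
    """Stop at stop_time; units still running are right censored there."""
    return [(min(t, stop_time), t <= stop_time) for t in sorted(times)]


def failure_terminated(times, n_failures):
    """Stop at the n_failures-th failure; survivors are censored at that time."""
    ordered = sorted(times)
    stop_time = ordered[n_failures - 1]
    return [(min(t, stop_time), t <= stop_time) for t in ordered]


# Each record is (observed_time, failed); failed is False for censored units.
by_time = time_terminated(true_failure_times, stop_time=250.0)
by_count = failure_terminated(true_failure_times, n_failures=3)
print(by_time)
print(by_count)
```

In the time-terminated case the censoring time is fixed and the number of failures is random; in the failure-terminated case the number of failures is fixed and the stopping time is random.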

(Refer Slide Time: 21:32)

Then comes left-censored data. Left censored means that we know the unit has failed: I check
this equipment here and find it in a failed condition, but I do not know when it failed. I
only know that it failed at some time before the current time, somewhere between the start
of the test and the moment I observed it. So this failure is missed, in the sense that we did
not come to know when it happened. That is left censoring: the time to failure lies somewhere
on the left-hand side, it may be here or it may be here, and we do not know where.

So our ability to see is restricted towards the left-hand side: we are not able to see where
towards the left the failure happened. In the earlier case our ability to see was restricted
on the right-hand side, because the failure will happen in the future. Here the failure has
already happened but we were not able to record it, which is why this is called left
censored. Left-censored data is generally rare; most of the time we work with either
right-censored data or multiply censored data.

(Refer Slide Time: 22:45)

There is another kind of data, interval-censored data. Interval-censored data is like
inspection data. What happens is that we are not able to continuously monitor the
performance of the equipment, so we test it from time to time. Let us say we decide that
every hour we will go and check whether it is working or failed, because to determine that,
some test needs to be run, and once you run that test you come to know whether it is
working or failed.

So this working-or-failed information is limited to an interval. We know that the unit failed
within this interval, but whether it failed at the start, the middle, or the end of the
interval we do not know. Our view of the time to failure is restricted on both sides; the
failure lies somewhere in the interval. These are all interval-censored failures.
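
As a small illustration, the following Python sketch shows what an inspection scheme records for a given, in practice unknown, failure time. The hourly period and the failure times are hypothetical, and the function assumes that a unit failing exactly at an inspection time is found failed at that inspection.

```python
import math


def inspection_interval(failure_time, period):
    """Return (last_ok, first_failed): the inspection times bracketing a failure.

    Inspections are assumed at period, 2 * period, 3 * period, and so on.
    """
    k = max(math.ceil(failure_time / period), 1)  # index of first failed inspection
    return ((k - 1) * period, k * period)


# A unit that actually fails at 3.4 h is only known to have failed in (3, 4].
print(inspection_interval(3.4, 1.0))
print(inspection_interval(5.0, 1.0))
```

The analyst sees only the pair of inspection times, never the value 3.4 itself, which is exactly the loss of information that interval censoring describes.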

(Refer Slide Time: 23:49)

So, this we have discussed already. Multiply censored data means that the censoring is
happening in between also. This can be understood better with an example. Let us say we have
data with multiple failure modes: one product is there, and it can fail in three different
modes, A, B, and C, and here is the data which we have. At 105 the failure was due to mode
A. Now, when I am analyzing failure mode B, the time 105 becomes a censored time, because if
the unit had not failed due to mode A, it would have continued to work and would have failed
somewhere in the future due to mode B.

Since this failure did not happen because of failure mode B, it becomes censored data for
failure mode B. Similarly, it becomes censored data for failure mode C, because the unit was
removed here due to failure mode A, so we do not know exactly when it would have failed due
to mode B or C. In the same way, a failure due to mode B is censored data for failure modes
A and C. This is called multiply censored because the censoring can happen anywhere: left,
right, or interval.
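
The relabelling by failure mode can be sketched in Python as below. Only the first record, 105 hours with mode A, comes from the example on the slide; the remaining records are hypothetical values added so that the sketch has something to work on.

```python
# (time in hours, failure mode). Only the first record is from the lecture
# example; the others are hypothetical.
records = [(105.0, "A"), (140.0, "B"), (162.0, "C"), (190.0, "A"), (215.0, "B")]


def view_for_mode(data, mode):
    """Return (time, failed) pairs as seen when analyzing one failure mode.

    A unit that failed in a different mode is treated as censored at its
    removal time for the mode being analyzed.
    """
    return [(t, m == mode) for t, m in data]


for mode in ("A", "B", "C"):
    print(mode, view_for_mode(records, mode))
```

Because the censored times are interleaved with the failure times of the mode under study, the resulting data set is multiply censored.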

(Refer Slide Time: 25:26)

Depending on what we want to measure, data classification is open to interpretation.
Sometimes a device fails, but it fails due to a reason which is not of concern for the test.
Let us say the supporting device or the test equipment fails; then you will not be able to
test the device further. Its time is recorded up to a certain point, but you do not know its
failure time, because the failure of interest did not happen. So that observation again
becomes suspended, or censored.

(Refer Slide Time: 26:11)

Censoring restricts our ability to observe the time to failure. Right censoring means the
failure lies somewhere on the right side, left censoring means it lies on the left side,
interval censoring means it lies in between, and multiply censoring means it is sometimes
here and sometimes there; in every case the exact time to failure is not seen. Sources of
censoring in tests are a fixed test time, because tests cannot continue for very long
periods, a fixed number of failures, grouped items, or inspections which you are doing on
grouped items.

Then there is staggered entry of units: all the manufactured units may not be produced in
one day and sent into service together. Different units enter service at different times, so
the units will not all record the same operational time, and you will have censoring there.

(Refer Slide Time: 27:04)

Multiple failure modes, as we discussed: when you have multiple failure modes, a failure is
failure data for its own failure mode and censored data for the other failure modes. Now we
will discuss how this data can be analyzed. As we discussed, the data can be in various
categories: it can be complete time-to-failure data, complete grouped data, right-censored
data, or multiply censored data.

As we discussed, left-censored data does not occur frequently, so we are not treating that
scenario separately; it is handled the same way as multiply censored data, and the same
concept applies to left-censored data also. So we have various types of censored data. We
will discuss each category one by one and see what kind of failure data analysis we are able
to do.

(Refer Slide Time: 28:09)

So, let us look at the data analysis. The data we discussed can be fitted to distributions,
which is called model fitting. But before going to distribution fitting, we will discuss
model-free graphical data analysis, which is nonparametric. That means we are not fitting a
distribution or trying to find the parameters of a distribution; we are just looking at the
data and trying to see what the data itself says from the reliability point of view.

So we look at the data, and we do not make any distribution or model assumption; we simply
see what the values are. Model fitting, when we are able to do it, helps us more. With
nonparametric analysis we know only as much as the data says: we are not able to extrapolate
or interpolate, and we are not able to consolidate the data, because we are not trying to
capture the pattern in the data; we are simply trying to see what the data means. When we
fit a model, on the other hand, different distributions give different failure patterns, as
we discussed: the exponential has one pattern, the Weibull with different beta parameters
has different patterns, and similarly the lognormal has different patterns.

So we are trying to capture the failure pattern through the data. Once we are able to
capture the failure pattern from sample data, we are able to understand the failure behavior
of the whole population, and we can do a little interpolation and extrapolation, expecting
that the same failure pattern will continue. Once we are able to do that, we can make more
useful interpretations and predictions, and we can use them for better decision making.

So it is useful to fit one or more parametric distributions for the purposes of description,
estimation, and prediction. The analysis proceeds from simple to more elaborate models
depending on the purpose of the study. We will discuss model fitting after a few lectures;
first we will discuss nonparametric analysis, the simpler methods, which you can understand
and use to do the analysis quickly.

We also examine appropriate diagnostics for the adequacy of the model assumption. Whenever
the sample is small, it may not be a correct representation of the population, and it may
give wrong interpretations. But we can only go with whatever the data says, unless we have
larger data sets from historical data or from earlier experience.

Without that, it is difficult to say whether what the data is saying is correct or
incorrect. But as we know, the larger the sample size, the smaller the estimation error, and
the more representative the sample is of the population. So having more data is always
beneficial.

(Refer Slide Time: 31:28)

In the nonparametric approach, we do not assume any distribution; we do a distribution-free
analysis. We try to find the mean, variance, etcetera, but without reference to any
distribution. Whenever we have censored data, this sometimes becomes difficult.

(Refer Slide Time: 31:52)

So, to do the data analysis, what kind of data do we have? We generally have data like t_1,
t_2, and so on, where t_1 is the time to failure of one unit. If we have n failures, we have
time-to-failure data for n units. If it is complete data, it is generally arranged in
increasing order, which means t_1 is the smallest and t_n is the largest: the first failure
occurred at t_1, the second at t_2, and the nth failure at t_n.

For each data point t_i we can get the value F(t_i). What is F(t_i)? It is the unreliability,
or failure probability, which is the number of failures up to time t_i divided by the total
number of units. When I say t_1, one failure has occurred; t_2 means the second failure has
occurred. So t_i means that a total of i failures have occurred by time t_i. But this is
sample data.

So the naive estimate needs care. If I say one failure out of 10, that means 10 percent of
the units have failed. But if the 10th failure out of 10 occurred at, let us say, 10,000
hours, the same estimate gives a misinterpretation: it says that the unreliability is 1 at
10,000 hours, which is incorrect, because if we take some more devices from the population,
they may work for 12,000 hours or 15,000 hours also.

So these data points should not be taken at the end of the interval. Whenever I say 1 out of
10 for the first failure, I am taking the first interval at its end. We need to make certain
corrections so that the estimate for the last failure is a little less than 1 and, in
general, each F(t_i) is a little less than i upon n.

For that, we can use this formula, the general form of the median ranking formula:
F(t_i) = 100 (i - a) / (n + 1 - 2a), in percent. These values are generally called plotting
positions, because we plot F(t_i) against t_i. The constant a varies from 0 to 0.5, and
depending on its value we get F(t_i).

(Refer Slide Time: 34:40)

The most commonly used value of a is 0.3. If I put a equal to 0.3, the numerator becomes
i minus 0.3 and the denominator becomes n plus 1 minus 0.6, that is, n plus 0.4. So
F(t_i) = (i - 0.3) / (n + 0.4) is what we generally use. The earlier formula had the factor
100 because it was in percent; we are working with probability, that is, the proportion, so
we drop the 100. This is the median rank assumption. The equal rank assumption means that
the data is assumed to be at the end of the interval.

So we assume that the failure happens exactly at the end of the interval, and that gives
i upon n. When n is large, all three formulas give almost the same values, but when n is
small, that is, when the number of samples is small, the choice of ranking matters and
changes the values a little. Another assumption which is often used is the mean rank: let us
say the first failure happens somewhere between 0 and the end of the first interval.

That is our t_1. Here we assume that after the nth failure there is one more failure at some
other time, out towards infinity. That means beyond the nth failure there is one more
interval, from n to n plus 1, which lies somewhere in the future. So each failure is treated
as falling within an interval rather than at a single end point, and the number of intervals
becomes n plus 1.

So we take n plus 1 intervals, with one failure for each interval; that is the mean rank
assumption, which gives i upon n plus 1. With this we will not get a failure probability
equal to 1, or reliability equal to 0, when the last failure occurs. The median rank
assumption is the most widely used, and whenever data analysis is carried out, most methods
use this formula by default for determining F(t_i) when t_i is given.
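
The three ranking assumptions can be compared with a short Python sketch. The function names are chosen only for this illustration, and the sample size of 10 is the one used in the discussion above.

```python
def plotting_position(i, n, a):
    """General form F(t_i) = (i - a) / (n + 1 - 2a), as a proportion."""
    return (i - a) / (n + 1 - 2 * a)


def equal_rank(i, n):
    """Equal rank: failure assumed at the end of the interval, F = i / n."""
    return i / n


def mean_rank(i, n):
    """Mean rank: a = 0, so F = i / (n + 1)."""
    return plotting_position(i, n, 0.0)


def median_rank(i, n):
    """Median rank approximation: a = 0.3, so F = (i - 0.3) / (n + 0.4)."""
    return plotting_position(i, n, 0.3)


n = 10
for i in (1, n):
    print(i, equal_rank(i, n), mean_rank(i, n), median_rank(i, n))
```

For the last failure, i equal to n, only the equal rank gives an unreliability of exactly 1; the mean and median ranks stay below 1, which is the correction discussed above.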

(Refer Slide Time: 36:58)

So, we will stop here today. We will continue our discussion with more examples and more
patterns which we want to analyze. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 27
Failure Data Analysis: Non-Parametric Approach (Contd.)

Hello, everyone, welcome back to our discussion on failure data analysis. We have been
discussing how, if we have failure data, we can analyze it without using any model and still
evaluate the reliability and other parameters. Last time we discussed that there are three approaches
which we can use once we sort the data into increasing order to find the unreliability. Let us
go forward and see how this works.

(Refer Slide Time: 01:02)

Let us take an example: a proof test of 14 turbo engines provides time-to-failure data in hours.
That means we have tested 14 turbo engines and obtained this time-to-failure data. As we see,
the data is listed engine-wise: turbo engine 1 failed at 103, engine 2 failed at 113, and so on,
but it is not ordered.

(Refer Slide Time: 01:29)

So first we need to order the data. I have already put these 14 failure times in Excel and sorted
them in increasing order. The first failure occurred at 72, the second at 82, and like that we
have a total of 14 failures. Now, as we discussed, there are three approaches. When we use the
equal rank method, the failure probability or unreliability is given by i divided by n. How much
is i here? For the first failure, i is 1.

So i is 1 here, 2 here, 3 here, and so on; and how much is n? Here n is 14, because 14 devices
were put on test. When I divide i by n, 1 divided by 14 gives 0.07, 2 divided by 14 gives 0.14,
and 3 divided by 14 gives 0.21. This is how I get the whole column, and this gives me the
unreliability under the equal rank method. In the mean rank method, as we discussed, the
unreliability is i divided by n plus 1. That means whatever the value of i is, I divide it by 15
rather than by 14.

When I divide by 15, I get these values. I have done this in an Excel sheet also, and I will show
you a little later how to do it there. We can also use the median rank, which is i minus 0.3
divided by n plus 0.4. For the first failure, i minus 0.3 is 1 minus 0.3, that is 0.7, and n plus
0.4 is 14.4, so 0.7 divided by 14.4 gives 0.0486.

Similarly, for the other failures we have 1.7 divided by 14.4, 2.7 divided by 14.4, and so on,
and that gives the F(t_i) values. As we discussed, the median rank formula is generally the most
popular and the mean rank formula is also used many times, but I would prefer that you use the
median ranking formula for this purpose.
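
The three unreliability columns for this example depend only on the rank i and on n = 14, so they can be reproduced without the failure times. The following Python sketch does that; the variable names are chosen only for this illustration.

```python
n = 14  # number of turbo engines put on test
ranks = range(1, n + 1)

# One unreliability column per ranking method.
equal = [i / n for i in ranks]                  # equal rank: i / n
mean = [i / (n + 1) for i in ranks]             # mean rank: i / (n + 1)
median = [(i - 0.3) / (n + 0.4) for i in ranks]  # median rank: (i - 0.3) / (n + 0.4)

print(" i   equal    mean  median")
for i, fe, fm, fd in zip(ranks, equal, mean, median):
    print(f"{i:2d}  {fe:.4f}  {fm:.4f}  {fd:.4f}")
```

The first rows agree with the values read from the slide: 0.07, 0.14, and 0.21 for the equal rank, and 0.0486 for the first median rank.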

(Refer Slide Time: 03:52)

Now, once you get the F(t) value, that is, the failure probability, how can we get the PDF value?
As we know, the PDF f(t) is the derivative of F(t). So if I take F(t_{i+1}) and F(t_i) and take
the difference of the two, that gives me the change in F, dF. I divide it by the difference in
time, t_{i+1} - t_i. This gives me the change in failure probability per unit time:
f(t_i) = [F(t_{i+1}) - F(t_i)] / (t_{i+1} - t_i).

Now, if we use the i upon n plus 1 formula, then F(t_{i+1}) is (i + 1) / (n + 1) and F(t_i) is
i / (n + 1), and the whole difference is divided by t_{i+1} - t_i. If you solve the numerator, it
becomes (i + 1 - i) / (n + 1), which is 1 / (n + 1). So effectively I get
f(t_i) = 1 / [(n + 1)(t_{i+1} - t_i)]. This becomes my PDF value.

So the PDF value is nothing but 1 divided by the difference in time and by n plus 1. The same
thing applies if you use any other formula. Let us say you use median ranking. Then F(t_{i+1}) is
(i + 1 - 0.3) / (n + 0.4) and F(t_i) is (i - 0.3) / (n + 0.4), and if you solve this, the only
change is in the term containing n.

The difference t_{i+1} - t_i remains the same. If you take n + 0.4 as the common denominator, the
subtraction is the same as before: in (i + 1 - 0.3) - (i - 0.3), the 0.3 cancels with the 0.3,
i cancels with i, and only 1 remains. So with median ranking, n + 1 is replaced by n + 0.4 and the
rest remains the same: f(t_i) = 1 / [(n + 0.4)(t_{i+1} - t_i)]. With equal ranking it is only n,
neither n + 1 nor n + 0.4. So, depending on the method you are using, you can use the
corresponding formula.
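
A minimal Python sketch of this PDF estimate is given below. The four failure times are hypothetical, the function name and its denom_offset parameter are introduced only for this illustration, and the times are assumed to be distinct so that no interval has zero length.

```python
def empirical_pdf(times, denom_offset=1.0):
    """f(t_i) = 1 / ((n + denom_offset) * (t_{i+1} - t_i)) for ordered times.

    denom_offset: 1.0 for mean rank, 0.4 for median rank, 0.0 for equal rank.
    Returns n - 1 values, one per interval between consecutive failures.
    """
    ordered = sorted(times)
    n = len(ordered)
    return [1.0 / ((n + denom_offset) * (t_next - t))
            for t, t_next in zip(ordered, ordered[1:])]


sample_times = [50.0, 70.0, 100.0, 150.0]  # hypothetical failure times, hours
mean_pdf = empirical_pdf(sample_times, denom_offset=1.0)
median_pdf = empirical_pdf(sample_times, denom_offset=0.4)
print(mean_pdf)
print(median_pdf)
```

Only the denominator constant changes between the ranking methods, which is the point made in the derivation above.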

Next is lambda(t). We know lambda(t) is f(t) upon R(t). We already have f(t), and how much is
R(t)? R(t) is 1 minus F(t), and for the n plus 1 formula that is 1 - i / (n + 1), which becomes
(n + 1 - i) / (n + 1). I am erasing some of this so that you can follow. So lambda(t_i) is f(t_i)
divided by (n + 1 - i) / (n + 1).

If I do this, the n + 1 in f(t_i) cancels with the n + 1 in R(t_i), and n + 1 - i takes its
place. So what I have is lambda(t_i) = 1 / [(n + 1 - i)(t_{i+1} - t_i)].

So in lambda(t) the only change is in this term. If we use another formula, such as the median
ranking formula, the same steps can be followed; we are discussing this with reference to mean
ranking, but the same thing is applicable to whichever formula you use.

What does lambda(t) mean? In f(t) we were considering one failure out of the number of units
which were there at the start of the test, that is n. Here it is one failure out of the number of
units which are working at the start of the interval; the failed units are not counted. The i
units which had failed by the start of this interval have been removed.

This is the difference between failure rate and f(t) which we defined earlier in our lectures.
f(t) gives the probability of failure per unit time out of the whole population, while lambda(t)
gives the probability of failure per unit time out of only the surviving population; that is, not
out of n + 1 but out of n + 1 - i, with the failed units removed. And R(t_i), as we have already
seen, is (n + 1 - i) / (n + 1).
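
The relation between these three quantities under the mean rank can be checked numerically with a short Python sketch. The failure times are hypothetical and assumed distinct, and the function name is introduced only for this illustration.

```python
def mean_rank_table(times):
    """Rows of (t_i, R(t_i), f(t_i), lambda(t_i)) under the mean rank.

    R(t_i)      = (n + 1 - i) / (n + 1)
    f(t_i)      = 1 / ((n + 1) * (t_{i+1} - t_i))
    lambda(t_i) = 1 / ((n + 1 - i) * (t_{i+1} - t_i))
    """
    ordered = sorted(times)
    n = len(ordered)
    rows = []
    for i, (t, t_next) in enumerate(zip(ordered, ordered[1:]), start=1):
        dt = t_next - t
        reliability = (n + 1 - i) / (n + 1)
        pdf = 1.0 / ((n + 1) * dt)
        hazard = 1.0 / ((n + 1 - i) * dt)
        rows.append((t, reliability, pdf, hazard))
    return rows


sample_times = [50.0, 70.0, 100.0, 150.0]  # hypothetical failure times, hours
rows = mean_rank_table(sample_times)
for t, reliability, pdf, hazard in rows:
    print(t, reliability, pdf, hazard, pdf / reliability)
```

In every row the hazard value equals f(t_i) divided by R(t_i), which confirms that replacing n + 1 by n + 1 - i is the same as dividing by the reliability.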

For complete data we are able to calculate the MTTF also, which is nothing but the average time
to failure: if you sum up all the times and divide by n, that gives the MTTF. If we want the
sample variance, we can calculate it as the sum of (t_i - MTTF) squared, divided by n - 1, and
this can also be calculated as the sum of t_i squared, minus n times MTTF squared, divided by
n - 1.
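
Both forms of the sample variance can be checked against each other in Python. The failure times below are hypothetical; the standard library statistics module is used only as a cross-check.

```python
import math
import statistics

sample_times = [50.0, 70.0, 100.0, 150.0]  # hypothetical complete failure data, hours
n = len(sample_times)

# MTTF is the plain average of the times to failure.
mttf = sum(sample_times) / n

# Definition form: sum of squared deviations divided by n - 1.
var_definition = sum((t - mttf) ** 2 for t in sample_times) / (n - 1)

# Shortcut form: (sum of t_i^2 - n * MTTF^2) divided by n - 1.
var_shortcut = (sum(t * t for t in sample_times) - n * mttf ** 2) / (n - 1)

std_dev = math.sqrt(var_definition)
print(mttf, var_definition, var_shortcut, std_dev)
print(statistics.mean(sample_times), statistics.variance(sample_times))
```

The two forms are algebraically identical; the shortcut form is convenient by hand, while the definition form is usually less sensitive to rounding when the times are large.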

So these are the formulas required to calculate the important quantities of interest. We want to
know the failures per unit time, that is f(t); the failures per unit time out of the surviving
population, that is lambda(t); the reliability; the unreliability; the MTTF; and the standard
deviation. Everything we are able to calculate using these formulas.

(Refer Slide Time: 11:25)

Now let us look at the same example which we took earlier. Here we have added one more row for
time 0, because at time 0 all the units start working. We have also made a small correction. At
127 we had two failures, and with two failures at the same time the difference in time becomes 0.
Because time is a continuous variable, there will be a little variation between the two, so the
two failures at 127 are given more exact values here, 127.3 and 127.8.

If the exact values are not known, we may assign nearby values like 127.25 and 127.75, or 126.75
and 127.25; the second pair gives the same value, 127, on average. We can choose accordingly. So
we keep the t_i values here, and F(t_i) is, as we discussed, i upon n, and for the added row i is
0. These values we have already calculated, and this column also. Now let us see the reliability
value, which is 1 minus the unreliability.

So, 1 minus this value gives you the reliability. The reliability and unreliability values
calculated here are given at the start of the interval. So, these are the unreliability values we
are getting here, and the reliability is then 0.951, so that is using this F(t) value. If you are
using the median ranking, the R(t) values will be these. And how much will the small f(t) values
be? Small f(t) means the change in capital F(t) divided by the change in time.

So, what I can do is take the difference here, that is 0.049, and divide by the time, which is
72. So, 0.049 divided by 72 gives me this value. As you see, this is the difference up to the next
failure, so there will be one value fewer, because nothing is counted beyond the last failure.
Each value applies to an interval, that means from 0 to the first failure, from the first to the
second failure, from the second to the third failure, and so on.
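The first-interval numbers quoted above (0.049, 0.951 and 0.049 divided by 72) can be reproduced with a minimal Python sketch. It uses n equal to 14 and a first failure at 72 hours, as in the lecture; the function name is only illustrative.

```python
n = 14  # number of units in the lecture's example

def median_rank(i, n):
    """Median-rank estimate of F(t_i); taken as 0 at i = 0 (time zero)."""
    return 0.0 if i == 0 else (i - 0.3) / (n + 0.4)

F0 = median_rank(0, n)
F1 = median_rank(1, n)
R1 = 1.0 - F1
t0, t1 = 0.0, 72.0  # first failure at 72 h, as quoted in the lecture
f_first = (F1 - F0) / (t1 - t0)  # density over the interval from 0 to 72 h

print(round(F1, 3), round(R1, 3), f_first)
```

The density value belongs to the whole interval from 0 to the first failure, which is why the table has one density value fewer than it has rows.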

These values are calculated. And what is h(t), or lambda(t), the hazard rate? That is nothing
but the f(t) value we have calculated divided by the reliability. And how much is the MTTF here?
The MTTF is nothing but the average of the ti values, and the standard deviation can be
calculated directly using the formula, or using the function already available in Excel. So, let
me show you how we can do this in the Excel sheet.

(Refer Slide Time: 14:31)

That will be a kind of repetition, but it will also give you an idea of how we do the analysis.
We will take only the basic data. I am opening an Excel sheet here and zooming in a little, so
that we can follow. I will use this only for the data points; I will not use any formatting here.
So this is my i, the failure number, and this is my ti, the time to failure which I have observed.
Here we have the two failures at 127.

One option was the more precise values, but better, I will take 126.75 and 127.25, so that the
average of the two remains 127. So here we have the data. Now, we can use the different ranking
methods. If I use the equal rank method, then we know F(ti) is equal to i divided by n, and n here
is 14, so if I want, I can write n is 14 somewhere here. I generally prefer to use Format as Table,
so I will use that here. Now, F(t) by the equal rank method is equal to i divided by n, and n is 14.

So I am putting that in directly. As you see, I have got the F(t) values. I generally prefer all
these values to have the same number of decimal places, so I will use this function; 3 decimal
places are enough for us to visualize this. Now, for the mean rank method, we know F(ti) is equal
to i divided by n plus 1.

So this is equal to i, which is here, divided by n plus 1, which is 15. So, I will divide by 15,
and again I will apply the same format. Then I will go to median ranking. For median ranking, we
already know the formula, but I am writing it again: F(ti) is i minus 0.3 divided by n plus 0.4.
That means this is equal to i minus 0.3 divided by 14.4, because n is 14, so 14 plus 0.4 is 14.4.
For i equal to 0 the formula would give a negative value, but F can never be less than 0, so we
put 0 for the first value; the rest of the values remain the same.

I will again use the Format Painter here. Now, as we discussed, I want to use these values for
the reliability calculation. R(t) is nothing but 1 minus F(t). The F(t) values are already given
here, so I will use them, and I have got R(t) as per the median ranking. If I want R(t) as per, let
us say, the mean ranking, then I will use this formula.

And there will be a small change here. If I want to use the equal ranking, then this is equal to
1 minus the equal-rank unreliability value. But as I discussed, we generally prefer the median
ranking formula, which is considered to be statistically more accurate. So we will use the median
ranking formula, and this gives us the reliability values.

Now we want to calculate the failure density. The density function f(t) is minus dR(t) over dt,
or dF(t) over dt. So dF is the next F(t) minus the previous F(t), and this is divided by the time
difference, which is the next time minus the previous time. This is what I get. Generally, this
will not have the last value, because there is no next point after the last failure. Then we can
get z(t) here; z(t), as we discussed, is f(t) upon R(t).
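The spreadsheet steps described above (rank-based F, then R, then f as a difference quotient, then z as f over R) can be sketched in Python as follows. The function name and the sample failure times are hypothetical, and the sketch assumes tied failure times have already been split apart, since a zero time difference would cause a division by zero.

```python
def nonparametric_table(times, method="median"):
    """Return rows (i, t_i, F, R, f, z) for complete, ungrouped failure times.

    f and z in row i apply to the interval (t_i, t_{i+1}); the last row has
    no following interval, so f and z are None there.
    """
    n = len(times)
    t = [0.0] + sorted(times)           # add the row for time zero

    def F(i):
        if i == 0:
            return 0.0                  # nothing has failed at time zero
        if method == "equal":
            return i / n
        if method == "mean":
            return i / (n + 1)
        return (i - 0.3) / (n + 0.4)    # median rank (default)

    rows = []
    for i in range(n + 1):
        Fi = F(i)
        Ri = 1.0 - Fi
        if i < n:
            f = (F(i + 1) - Fi) / (t[i + 1] - t[i])
            z = f / Ri
        else:
            f = z = None
        rows.append((i, t[i], Fi, Ri, f, z))
    return rows

# Hypothetical failure times in hours (tied values already split apart).
sample = [72.0, 95.0, 110.0, 126.75, 127.25, 140.0]
for row in nonparametric_table(sample):
    print(row)
```

Passing method="equal" or method="mean" switches the ranking formula, which makes it easy to compare the three columns built in the Excel sheet.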

So, this is equal to f(t) divided by R(t), and this gives me all the values. Now, if I want to
see how the reliability, f(t) or z(t) is changing, I can plot them. Let me first plot time versus
the reliability value. I can do this by going to Insert and choosing the XY scatter plot; there I
can choose plain scatter or scatter with connecting lines. So, I choose this, and this becomes my
R(t). The chart has further options and I can add more series here, but generally I will not add
f(t) on the same chart; I will go with one series here.

Let us say this becomes my R(t), and as you can see, I am able to see how the reliability is
changing. Initially the reliability does not decrease so fast, but after around 70 hours or so it
decreases quickly, and then a little more slowly here. If I want to know the reliability at 100
hours, I can read it off the plot; 100 hours is somewhere here, and that is around 0.8.

So that is somewhere between 0.81 and 0.74, which comes out to be around 0.8 maybe. Looking at
the data more closely, we will be able to get a better value. In the same way, I can plot ti versus
f(t). I will remove this data point and do it again, because there is one data point fewer.
Generally, when I am plotting f(t), I would prefer to plot it as a step function, where the value
is held for the whole interval. Here I am plotting it just to show you. If you see, my f(t) looks
like this; it shows that somewhere around here the number of failures is quite high, and except
for that the number of failures is comparatively low.

So I have more failures concentrated in this region. Similarly, if I want, I can plot lambda(t).
But as I discussed, I would prefer to plot f(ti) against ti as a constant value over each interval,
that is, as a step function. Step function plotting is not easy in Excel; I have to do certain
modifications and prepare the data in a particular format, and only then can it be done. So
instead I will go to Change Chart Type and show this as data points only.

So you can see the data as points rather than lines. Similarly, I can plot ti versus z(t) also.
In the same way, this becomes my z(t) function. So, as you see, I am able to plot z(t); I will
change the chart type and go with data points. Let us see how it looks as a line plot.

Again, if you see the line plot, it is kind of similar. So with nonparametric data analysis, if
we have complete data, we are now able to calculate the various probabilities, reliability and
unreliability. Unreliability I can also plot; it is given here, so I can just plot it. For time
versus F(t), I will go to Insert and again choose XY scatter, and plot ti versus F(ti). Okay, some
problem happened, so I will do this again.

I think some problem happened; this is ti and this is F(t). So, you see that this is my F(t)
plot. Here you can see that F(t) keeps rising until finally all units have failed. I am saving
this sheet, and we will share it with you whenever we get the chance. So, this is our complete,
ungrouped data.

(Refer Slide Time: 27:09)

Now, let us go back to our presentation and see what is next; whatever we have done here is the
same thing shown there. Now, another type of data is complete grouped data. Grouped data means
you have intervals, say k intervals, and for each interval you know the number of failures. Here
n1, n2, up to nk are the numbers of units which have survived, that means how many units were
there at the start of the first interval, how many survived to the end of the first interval, and
so on. So n1 minus n2 gives you the number of failures which have happened in the first interval,
and so on, and t1 to tk are the times.

$$\mathrm{MTTF} = \sum_{i=1}^{k-1} \frac{\bar{t}_i\,(n_i - n_{i+1})}{n}; \qquad s^2 = \sum_{i=1}^{k-1} \frac{\bar{t}_i^{\,2}\,(n_i - n_{i+1})}{n} - \mathrm{MTTF}^2$$

$$\text{where } \bar{t}_i = \frac{t_{i+1} + t_i}{2}$$

So, if I want to know the reliability at ti: at ti, ni units are surviving out of the total
number of units n which have been put on test. So, ni upon n gives you R(ti). F(ti) will be 1
minus this, and small f(t) will be the change in F(t) divided by ti plus 1 minus ti. Since R(ti)
is ni upon n, the change is ni minus ni plus 1, divided by n. So, this becomes the same formula as
we derived earlier: ni minus ni plus 1, divided by n into ti plus 1 minus ti. Earlier the number
of failures in each interval was 1, because we were counting failures one at a time.

So, the only change here is that, rather than 1, this has become the number of failures in the
interval. For each interval we are counting how many failures have happened; the rest is the
same. Rather than 1 it has become ni minus ni plus 1, where ni is the number of surviving units
at time ti, so the number of failures is the difference. Similarly, we are able to get z(t), or
lambda(t). As we see, f(t) is calculated out of the total number of units put on test.

And when I calculate z(t), the formula is the same as for f(t); the change is that n is replaced
with ni, because for z(t) we consider only how many units fail in the interval out of the units
surviving at the start of the interval. The MTTF can be calculated in a similar way. For the
MTTF calculation we take the number of failures in the interval and the average time, that means
ti plus ti plus 1 divided by 2, the midpoint of the interval. That is ti bar, and this gives the
MTTF, with the same kind of formula as before.

So ti bar into the number of failures gives the total operating time accumulated by the units
which failed in that interval. Summing over the intervals gives the total time to failure, and
this is divided by n. Similarly, s squared has a similar formula: the sum of ti bar squared into
ni minus ni plus 1, divided by n, minus MTTF squared, where ti bar is ti plus 1 plus ti divided
by 2. We can also get the confidence interval for the MTTF, and this we can get for the previous
case also: it is t alpha by 2, n minus 1, into s by root n. This is taken plus and minus when we
want the two-sided confidence interval.

So, we have alpha by 2 probability left out on this side and alpha by 2 left out on that side,
and in between is 1 minus alpha. When we take the one-sided interval, generally we are interested
in the lower bound, that is, the minimum reliability we are going to have, or the minimum time to
failure, so the lower bound on the MTTF.

So, this is the minimum, which means this area is 1 minus alpha and alpha is left out on this
side. If this is our estimated MTTF value, the lower bound will be MTTF minus t alpha, n minus 1,
into s by root n. Now, if I take a 95 percent confidence level, then 5 percent will be left out,
so you use the 5 percent point here. So, we will see how we can use this data.
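Written out, the two intervals described above take the following form. This is a sketch in standard Student-t notation, with $\widehat{\mathrm{MTTF}}$ the estimated mean, $s$ the sample standard deviation and $n$ the number of units; the symbol $\mathrm{MTTF}_L$ for the lower bound is introduced here only for illustration.

$$\text{Two-sided: } \widehat{\mathrm{MTTF}} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}; \qquad \text{One-sided lower: } \mathrm{MTTF}_L = \widehat{\mathrm{MTTF}} - t_{\alpha,\,n-1}\,\frac{s}{\sqrt{n}}$$

For a 95 percent confidence level, $\alpha = 0.05$, so the two-sided interval uses $t_{0.025,\,n-1}$ and the one-sided lower bound uses $t_{0.05,\,n-1}$.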

(Refer Slide Time: 31:56)

And we will see how we do this analysis for a population. So, we will take this up with one
more example in the next discussion. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture: 28
Failure Data Analysis Non-Parametric Approach (Contd.)
(Refer Slide Time: 00:42)

Hello everyone. We will continue our discussion from where we left off in the previous class,
that is, analysing grouped failure data for the evaluation of reliability and other important
indices. In the previous class we derived these formulas; now let us see how we are able to
calculate with them when we have the data.

So, let us take an example. Preliminary information on the underlying failure model can be
obtained if we plot the failure density, the hazard rate and the reliability function against
time. This we have already done for complete data, where the times to failure were available.
For grouped data, we can define piecewise continuous functions for these three characteristics
by selecting some time interval delta t.

In the limiting condition, with delta t tending to 0 and a large amount of data, this
discretization approaches the continuous function analysis. Now let us say that we have the
failure times for a population of thirty electric bulbs in an underground subway, for which we
have these observations here.

(Refer Slide Time: 01:30)

Now, for these observations given here, I can also do the time-to-failure analysis as we
discussed earlier: I can arrange all these times to failure in increasing order, do the same
analysis, and find the reliability, unreliability, F(t), R(t), everything.

(Refer Slide Time: 01:58)

But we can also do a grouped analysis. Sometimes we do a grouped analysis so that the problem
is better understood, and sometimes we do it because the data is already grouped.

So, if the data is not grouped but we want to do the analysis in groups, we can use the
Sturges formula. The Sturges formula suggests how many intervals to use. Our minimum time is
3.77 and our maximum time is 715.44 from the data, and the total number of data points is 30.
When we use n equal to 30 in the formula, it comes out to be somewhere around 6, so we can
divide this range into 6 intervals and then do the analysis.
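As a rough check of the interval count, here is a small Python sketch. The lecture does not write the formula out, so the common form of Sturges' rule, k = 1 + 3.322 log10(n), is assumed here.

```python
import math

n = 30                        # number of bulbs
t_min, t_max = 3.77, 715.44   # smallest and largest failure times (hours)

# Sturges' rule in its common form (an assumption; not written out in the lecture).
k_raw = 1 + 3.322 * math.log10(n)
k = round(k_raw)
width_raw = (t_max - t_min) / k   # raw interval width before rounding up

print(f"k = {k_raw:.2f} -> {k} intervals, raw width = {width_raw:.1f} h")
```

The raw width is a little under 120 hours; rounding it up to a convenient 125 hours lets six intervals cover 0 to 750 hours, which matches the grouping used in the lecture.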

(Refer Slide Time: 02:48)

So, here I have already ordered the time-to-failure data and put it here, and I have taken
this data in intervals. 6 intervals of 125 hours each are taken: 0 to 125, then 126 to 250,
251 to 375 and so on, so we have the 6 intervals here.

(Refer Slide Time: 03:18)

Now, for each interval, as we discussed for the formulas, we want to know how many units are
working at the start of the interval, n1, n2 and so on. For the first interval, at time t equal
to 0, all units are working, so the number of units surviving is 30. Now, how many failures have
happened in time 0 to 125? If you look here, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13; 13
failures have happened. So in the interval 0 to 125 the number of failures is 13.

So the number of units surviving at the start of the next interval is 17. Out of these 17
units, up to time 250, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10; 10 units have failed. Then in the next
interval, from 250 to 375, the remaining units are only 7, and out of the 7, 4 failures occur,
1, 2, 3, 4. Similarly, 3 units remain, and out of the 3, 1 failure occurs in the interval from
375 to 500, from 500 to 625 1 failure occurs, and then from 626 to 750 1 failure occurs.

So, we have the number of survivors and the number of failures, and now we can get R(ti), the
reliability at time ti at the start of the interval. For the first interval R(ti) is 1, because
30 out of 30 are working at the start of the interval, so 30 divided by 30 is 1.

For the second interval 17 out of 30 are working, so 17 divided by 30 gives 0.57 as the
reliability at the start of the second interval. Similarly, for the third interval, 7 divided by
30 is 0.23. Similarly we get 0.10, 0.07 and 0.03; all these values we are able to get.

Similarly, I can get the unreliability as 1 minus reliability. And f(t), as we discussed, is
the number of failures in the interval divided by the number of units at the very start, which
is always 30, and divided by the interval length. So 13 divided by 30 is the probability of
failure in the interval, and to get it per unit time we divide by the time; each time interval
is 125.

So, 13 divided by 30 divided by 125 gives me this value. Similarly, the next value is 10
failures out of the starting 30 units, divided by 125, and so on. And z(t) I can get in two
ways: either I divide f(t) by R(t), or I take the number of failures divided by the number of
surviving units at the start of the interval into the time period. For the first interval that
is 13 divided by 30 into 125; for the second, the number of failures is 10 and the number of
survivors at the start is 17, so 10 divided by 17 into 125. Calculating this way, we get the
reliability, failure probability, f(t) and z(t).
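The grouped calculation for the 30 bulbs can be sketched in Python as below. The failure counts per interval are those counted in the lecture; the interval edges 0, 125, up to 750 are assumed, and the last lines apply the midpoint MTTF formula from the previous lecture, a number the lecture itself does not quote.

```python
n = 30
width = 125.0
failures = [13, 10, 4, 1, 1, 1]   # failures per interval, as counted in the lecture
edges = [i * width for i in range(len(failures) + 1)]  # 0, 125, ..., 750 (assumed)

survivors = []                    # units working at the start of each interval
alive = n
for d in failures:
    survivors.append(alive)
    alive -= d

R = [s / n for s in survivors]                              # reliability at interval start
F = [1.0 - r for r in R]                                    # unreliability at interval start
f = [d / (n * width) for d in failures]                     # density per hour
z = [d / (s * width) for d, s in zip(failures, survivors)]  # hazard rate per hour

# Grouped MTTF using interval midpoints.
mid = [(edges[i] + edges[i + 1]) / 2 for i in range(len(failures))]
mttf = sum(m * d for m, d in zip(mid, failures)) / n

# zip stops at the shortest list, so the final edge (750) is not printed as a row.
for row in zip(edges, survivors, failures, R, F, f, z):
    print(row)
print("MTTF (midpoint estimate) =", mttf)
```

Computing z directly from the survivor counts and computing it as f over R give the same numbers, provided R is not rounded first; rounding R is what produces small mismatches in a hand-built table.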

(Refer Slide Time: 06:50)

So, we are able to get all the parameters of our interest. If I plot this, the density function
would look like this. As we discussed earlier, the z(t) and f(t) values are valid for the
complete interval, so the value is the same across the interval, and we should try to plot a
step function here. The first interval value on this figure is around 0.14 or so, which does not
match; this may be from some other data, so there is some error in this figure. So we will not go
by this figure; we will look at the values which we have calculated, or we can check this also.
If you want, let me just do this quickly.

(Refer Slide Time: 08:00)

We have used the Excel sheet earlier; let me open one more Excel sheet here and do the same
thing again. We have got the interval and we have got F(ti). Now I will calculate f(t) here. So
f(t) is equal to the number of failures in the interval divided by the number of units at the
very start, divided by the interval, which is 125. We are getting the same values, so maybe that
figure has some error.

And the z(t) value is the number of failures in the interval divided by the number of units at
the start of the interval, divided by the interval, which is 125. This value would be the same as
f(t) upon R(t); if I show you f(t) divided by R(t), we get the same value with both methods. The
small difference here is coming because of, let me just check, R(t) is equal to this divided by
30.

What actually happened is that there was an approximation error, which is why it looked
different. We had rounded the digits; if we do not round, it looks exactly the same. So, as we
see here, these are f(t) and R(t). I think there is some scale problem in the figure, otherwise
the pattern would be the same. We see the same pattern for the hazard rate, and R(t) and f(t)
would look like this.

(Refer Slide Time: 10:37)

Now let us look into another type of data. We have looked into two types of data: one is
complete ungrouped data, the other is complete grouped data. Now let us consider singly
censored data. Singly censored data, as we discussed, is mostly right-censored data.

So, let us take an example for the same. Let us say we have 20 units placed on test, so n is
equal to 20, the test continues for 72 hours, and these are the failure times recorded: 1, 2, 3,
4, 5, 6, 7 failures have happened. So 7 failures happened in 72 hours, but the remaining 13
units have not failed.

So, these 13 units are the censored data. If we use mean ranking or median ranking, then,
because the censoring is on the right side and the failure points run from t1 to t7, we are able
to get the reliability or unreliability values in the same way. But the problem here is that we
cannot calculate the mean and variance.

Because we do not know what the failure times are going to be for the rest of the units, which
are still working, we cannot calculate the mean. So for this kind of data, unless we fit a
distribution, we will not be able to calculate the mean. Sometimes the mean is calculated under
the exponential assumption.

Under the assumption of an exponential distribution, we assume that the failure rate is
constant. If the failure rate is constant, then for a given population size the probability of
failure in a given time remains the same for the remaining units, no matter how many failures
have already happened.

Because of this, under the exponential assumption the MTTF becomes the cumulative test time
divided by the number of failures. But this is only valid when we assume the exponential
distribution. Many times people use this formula as a general formula, but we should first show
that the data follows an exponential distribution. Otherwise, many times this MTTF would be
very, very large, and this large MTTF would suggest that the life is long, but this MTTF may not
be representative of the life.

Because what will happen if you place a lot of units on test, let us say 1000 units, and then
observe only 10 failures? For the 990 units which have not failed, the test time is also
accumulated, so the MTTF will tend to show a larger value.

All units may not fail in a similar pattern, so this limitation may be there. But generally it
is still valid many times because, as we discussed earlier, if the device life is within the
non-deterioration period, that is, the constant failure rate period, then this MTTF would be
valid.

So, for the cumulative test time here, I take the summation of the failure times ti for i
equal to 1 to 7, plus, for the 13 units which have been tested for 72 hours, 13 into 72. So the
total cumulated time is the failure times which we recorded plus the time on the 13 units which
were tested for 72 hours without failure being observed. And how many failures have we observed?
7 failures. So, this formula will give us the MTTF, but that is not the true MTTF; that is the
MTTF under the exponential assumption.
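The cumulative-test-time estimate can be sketched in Python as below. The 20 units and the 72-hour test duration come from the lecture; the seven failure times are hypothetical placeholders, because the recorded values are not listed in full in the transcript, and the function name is only illustrative.

```python
def mttf_exponential(failure_times, n_units, test_duration):
    """MTTF = cumulative test time / number of failures (exponential assumption).

    Units that did not fail each contribute the full test duration.
    """
    r = len(failure_times)
    if r == 0:
        raise ValueError("at least one failure is needed")
    survivors = n_units - r
    total_time = sum(failure_times) + survivors * test_duration
    return total_time / r

# n = 20 units and a 72 h test are from the lecture; the seven failure times
# below are hypothetical placeholders, not the lecture's recorded values.
hypothetical_failures = [2.0, 5.0, 11.0, 20.0, 33.0, 47.0, 64.0]
print(mttf_exponential(hypothetical_failures, n_units=20, test_duration=72.0))
```

With these placeholder times the estimate comes out well above the 72-hour test length, which illustrates the caution in the lecture: most of the accumulated time comes from units that never failed, so the number is only meaningful if the constant failure rate assumption holds.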

So, as we see here, I have done this exercise in Excel. We have i from 0 to 7, then the ti
values, 1.5, 3.5 and so on, all put here. The F(ti) values we calculate in the same way as
earlier, that is 0 for i equal to 0, and i minus 0.3 divided by 20 plus 0.4 otherwise. R(ti) is
1 minus F(ti), and small f(ti), as we discussed earlier, is the difference in F divided by the
difference in t.

And z(t) is nothing but f(t) divided by R(t), and that is how we get all these values. If we
plot the R(t) values, the plot looks like this line. For complete data it was reaching 0, but
since this is not complete data but censored data, I know the curve only up to here. I do not
know how the reliability will behave in the future; it may go like this, or it may go like this,
because I do not have the data here.

But if I had made some distribution assumption, then that would have suggested the future
trend, and because of that I would be able to get the MTTF and other values also. So, as we have
seen, we are able to analyse this singly censored data and use it for multiple purposes.

(Refer Slide Time: 16:13)

We will continue our discussion with a more generic type of data, which is multiply censored
data. In multiply censored data we have ordered failure times; this is all ordered data, t1 is
less than t2, t2 is less than t3, but here we are putting a plus sign on some of the times.

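$$\hat{R}(t_{i-1}) = \frac{n+2-i}{n+1}; \qquad \hat{R}(t_i) = \frac{n+1-i}{n+1}$$

$$\hat{R}(t_i) = \left(\frac{n+1-i}{n+2-i}\right)^{\delta_i} \hat{R}(t_{i-1})$$

$$\delta_i = \begin{cases} 1, & \text{if a failure occurs at time } t_i \\ 0, & \text{if censoring occurs at time } t_i \end{cases}$$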

What does this plus mean? This plus means that the data point is censored. That means at time
t3 the unit has been removed from the test, but not due to the failure of concern. It may have
been removed because it failed in some other failure mode, or because some structural failure
or supporting device failure has happened, which is not the failure of concern.

Because of that we could not test it further, and we do not know at what time it would have
failed. So this plus sign is for the devices which have been removed from the test but are not
counted as failures. Such removals can happen at different times, including at the end. Here it
looks like the nth point is a failure, but other devices are removed before failing.

The test may also be incomplete at the end, with units removed. So all types of data may come
here, and this general method can be used for all those purposes. One popular method to analyse
this kind of data is the product limit estimator method. In the product limit estimator method
it is assumed that reliability changes at a failure, while reliability does not change at a
censored point.

So, what does it mean for the reliability at ti minus 1 and ti? This is again the mean ranking
formula. ti minus 1 means we are talking about failure number i minus 1, so that is 1 minus,
i minus 1 divided by n plus 1. This is equal to n plus 1 minus i plus 1, divided by n plus 1,
which is n minus i plus 2 divided by n plus 1.

Similarly, if I take R(ti), ti means i failures, so it is 1 minus i divided by n plus 1, which
becomes n plus 1 minus i divided by n plus 1. So if I take the ratio of R(ti) to R(ti minus 1),
this will be equal to n plus 1 minus i divided by n plus 2 minus i; the two n plus 1 terms get
cancelled.

So what I can say is that R(ti) is equal to n plus 1 minus i divided by n plus 2 minus i,
multiplied by R(ti minus 1). Now here I am using a power on this factor, that is delta i. What I
am assuming is that if the point is censored, then R(ti) is the same as R(ti minus 1), but when
it is not censored, that is, when a failure happens, R(ti minus 1) is to be multiplied by this
factor.

So, delta i will be 1 if there is a failure and delta i will be 0 if there is no failure. If
delta i is 0, then R(ti) will be equal to R(ti minus 1), and if delta i is equal to 1, that
means a failure happens at that point, then the previous value has to be multiplied by this
factor.

(Refer Slide Time: 20:06)

Now let us see this with data. Let us take an example where we have failure times and some
censored times for 10 turbine vanes. So n is equal to 10 here; we have 10 vanes. For the first
one we have failure data; 150 is a failure. For a failure we are writing 1, and for a censored
point we are writing 0.

340 is censored, 560 is a failure, 800 is a failure, 1130 is censored, 1720 is a failure, 2470
is censored, then again censored, then failure, failure; all this data we have put here. Now how
do we do the analysis? As we have seen, n plus 1 minus i divided by n plus 2 minus i is the
factor.

So, we will calculate this factor. n is 10, so the numerator is 11 minus i and the denominator
is 12 minus i, and we calculate 11 minus i divided by 12 minus i here. For i equal to 1, this is
0.9091, that is 10 divided by 11; the next will be 9 divided by 10, then 8 divided by 9, then 7
divided by 8. So, like this, we are able to get these values.

But as we know, the R(ti) value is calculated from the previous value, R(ti minus 1). At t
equal to 0, with 0 failures, R(t0) is equal to 1, so the previous value for the first row is 1.

Now, for the first failure the reliability comes as 0.9091. The next is a censored point, and
for a censored point the reliability remains the same, 0.9091. Now at 560 a failure happened and
the multiplying factor is 0.8889. So 0.9091 gets multiplied by 0.8889, which gives 0.8081; this
will be my new reliability. Then again a failure has happened.

So, again this factor will be multiplied with 0.8081, and this will be my new reliability,
0.7071. The next is again a censored point, so I do not bother about it; even if I do not list
the reliability at the censored point, it will not make much difference. Then again a failure
happened here; the factor is this, it is multiplied with 0.7071, and this gives me 0.5892. In
the same way I am getting the reliability values here.

These reliability values become the R(ti) values, which we plot against ti. So this is my ti,
and this is my R(ti). When I plot this, the reliability function looks like this. I am plotting
only 1, 2, 3, 4, 5, 6 points; the censored points, where reliability is not calculated, are not
plotted here. So you can see how the reliability is changing with time.
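The turbine vane calculation can be reproduced with a short Python sketch. It needs only the order of the failure (1) and censored (0) flags read out in the lecture, so no failure times beyond those quoted are assumed; the function name is illustrative.

```python
def product_limit(flags):
    """Product limit estimate with the mean-rank factor (n+1-i)/(n+2-i).

    flags[i-1] is 1 if the i-th ordered time is a failure, 0 if censored.
    Returns the reliability after each ordered time.
    """
    n = len(flags)
    R = 1.0                    # R(t_0) = 1
    out = []
    for i, delta in enumerate(flags, start=1):
        factor = (n + 1 - i) / (n + 2 - i)
        R *= factor ** delta   # delta = 0 leaves R unchanged
        out.append(R)
    return out

# Failure (1) / censored (0) flags for the 10 turbine vanes, in time order.
flags = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
for i, r in enumerate(product_limit(flags), start=1):
    print(i, flags[i - 1], round(r, 4))
```

The first six rows give 0.9091, 0.9091, 0.8081, 0.7071, 0.7071 and 0.5892, matching the values worked out above; the censored rows simply repeat the previous reliability.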

So, with this example we are able to see how we can estimate the reliability. Once we know the
reliability against time, the change in reliability per unit change in time, with a negative
sign, gives you the density function. And if you divide the density function by R(ti), you get
the failure rate function. So all the other functions you will be able to get using the same
formulas and the same table.

(Refer Slide Time: 24:16)

Now, this analysis can also be done with other methods. There are two more popular methods.
One is the Kaplan-Meier product limit estimator. The product limit estimator which we used here
has certain assumptions, and these were later changed, showing that you can do it in different
ways. Here the reliability factor at each failure is calculated as 1 minus 1 upon nj.

Here tj is the ordered failure time and nj is the number of items remaining at risk just prior
to the jth failure, that is, how many units are working at the start of the interval. The units
which have failed or been censored before this are all removed; only the remaining units which
are still working are counted here.

We will see this. Then there is the rank adjustment method, which proposes that we calculate
the same thing using an adjusted rank. It is assumed that whenever a unit is removed without
failure, the ranks 1, 2, 3 which we were taking for the failure times get modified, and we get a
new adjusted rank which is a little higher than the original rank, because the removed device
could also have failed in the interval.

(Refer Slide Time: 25:45)

So, let us see how we do this. For the Kaplan-Meier method I have taken the same data which we
had earlier. In this case we calculate nj, the number of devices working at the start of the
interval. At the start of the first interval the number of devices working is 10.

Now, out of 10, 1 unit has failed and 1 unit has been censored. So 2 units have been removed from further testing, and for the next failure only 8 units are working. Again 1 failed, so 7 units are working. In a way, this is a reverse ranking.

That is, if the rank runs from 1 to 10, the reverse rank runs from 10 to 1: 10, 9, 8, 7, 6, 5, 4, 3, 2, 1. This reverse rank denotes the number of units which are working at the start of the interval. Now the reliability calculation is simple: whenever a failure happens, the reliability for that interval is 1 minus 1 divided by nj.

That means 1 minus the fraction failed, with 1 failure in each interval. Here 1 failure happened out of 10 units, so 1 minus 0.1, that is 0.9. This is my reliability at the end of this interval. For the censored time I will not calculate reliability, just as we did not calculate it earlier.

For 560 hours the number of devices working is 8, so 1 minus 1 failure out of 8 devices, which is 0.875. Now this 0.875 is the reliability for this interval only, where 1 failure happened. But actually 2 failures have happened, and for the earlier failure the reliability is 0.9. The unit has to survive the first period as well as this one, so 0.875 is multiplied by 0.9. So the reliability at 560 hours is 0.7875.

The reliability in the second failure interval is 0.875, but the unit has to survive the first interval as well as the second, so the reliabilities get multiplied. That is the formula we have seen: all these values are multiplied whenever a failure happens. For nj equal to 7 the value is 0.857, that is 1 minus 1 divided by 7. And what is R(ti)?

R(ti) is this value, 0.857, multiplied by 0.7875, which gives 0.6750. Next, 1 minus 1 by 5 is 0.8, and 0.8 multiplied by 0.6750 gives 0.54. Similarly we get the remaining values. This gives us reliability versus time, ti versus R(ti), which we can plot to see how R(ti) changes with ti.
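The product limit calculation above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `kaplan_meier` and the True/False event list are assumptions made here, and since the transcript gives only the first few rows of the table, the list covers just the first six ordered observations (failure, censored, failure, failure, censored, failure) out of the 10 units.

```python
def kaplan_meier(n, events):
    """Product limit estimate for ordered observations.

    n      -- number of units put on test
    events -- ordered list, True for a failure, False for a censored unit
    Returns a list of (units at risk nj, reliability after the failure).
    """
    at_risk = n          # reverse rank: units working before this observation
    reliability = 1.0
    curve = []
    for failed in events:
        if failed:
            # Interval reliability 1 - 1/nj, multiplied into the running product
            reliability *= 1.0 - 1.0 / at_risk
            curve.append((at_risk, reliability))
        # Failed or censored, the unit leaves the test either way
        at_risk -= 1
    return curve


# First six ordered observations of the 10-unit example (assumed encoding)
events = [True, False, True, True, False, True]
km = kaplan_meier(10, events)
for nj, rel in km:
    print(nj, round(rel, 4))
```

Censored observations only reduce the number at risk; they never change the reliability directly, which matches the rule of not calculating reliability at a censored time.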

Similarly, we can use the rank adjustment method. Again, this is my rank and this is my reverse rank. The rank adjustment method gives me the adjusted rank. The formula is: reverse rank multiplied by the previous adjusted rank, plus n plus 1, all divided by reverse rank plus 1. Here n is 10, so n plus 1 is 11.

So, that is why I am writing 11. For the first failure the reverse rank is 10 and the previous adjusted rank is 0, because we start from i equal to 0. So this becomes 10 into 0 plus 11, divided by the reverse rank plus 1, that is 11. Effectively I get 11 divided by 11, which is equal to 1. So my first adjusted rank comes out to be 1 only.

Now the second rank. The next unit is a censored unit, no failure has happened, so I will not do anything there. The second failure happened at 560, where the reverse rank is 8. So I take 8 into my previous adjusted rank, which is 1, plus 11, divided by the reverse rank plus 1, that is 9. That gives 19 divided by 9.

So, this becomes my new rank, which is about 2.11. Similarly, for the next failure the new rank is 7 into the previous adjusted rank, 2.11, plus 11, divided by 8, which gives 3.22. As you see, this is my third failure, but its rank is not 3; it is a little higher than 3.

This is because the units which we removed earlier could have failed, and so the rank gets a little higher weightage: it is not the 3rd failure but, say, the 3.22th failure. Similarly, the fourth failure has rank 4.51, because more of the removed devices could have failed during this period, and so the rank has become higher.

The rest of the process is the same. Once we have the new adjusted rank, it becomes my i value; let us call it i dash. If I want to calculate F(ti) using the median rank, that will be i dash minus 0.3 divided by n plus 0.4, that is 10.4.

If I am using the mean rank, I will use i dash divided by n plus 1, that is 11. So depending on the ranking method, I use the adjusted ranks 1, 2.11 and so on in the same formula which we developed and saw earlier to calculate F(ti). Once we get F(ti), R(ti) will be 1 minus F(ti). We can plot ti versus R(ti) or F(ti) and see the change in reliability with time. Similarly, we can get f(t) and lambda(t) by extending the same table.
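As a rough sketch of the rank adjustment steps, the Python below applies the recursion (reverse rank times previous adjusted rank, plus n plus 1, divided by reverse rank plus 1) to the same ordered observations. The function names and the True/False encoding of failures and censored units are assumptions introduced here for illustration; only the first six observations are known from the transcript.

```python
def adjusted_ranks(n, events):
    """Adjusted rank for each failure in an ordered, multiply censored sample.

    events -- ordered list, True for a failure, False for a censored unit
    """
    ranks = []
    previous = 0.0                       # adjusted rank before the first failure
    for position, failed in enumerate(events):
        reverse_rank = n - position      # units still on test at this observation
        if failed:
            previous = (reverse_rank * previous + (n + 1)) / (reverse_rank + 1)
            ranks.append(previous)
    return ranks


def median_rank_cdf(adjusted_rank, n):
    """F(ti) from the median rank approximation (i' - 0.3) / (n + 0.4)."""
    return (adjusted_rank - 0.3) / (n + 0.4)


def mean_rank_cdf(adjusted_rank, n):
    """F(ti) from the mean rank formula i' / (n + 1)."""
    return adjusted_rank / (n + 1)


events = [True, False, True, True, False, True]   # first six observations
ranks = adjusted_ranks(10, events)
for i_dash in ranks:
    f_ti = median_rank_cdf(i_dash, 10)
    print(round(i_dash, 2), round(f_ti, 4), round(1.0 - f_ti, 4))
```

The printed columns are the adjusted rank, F(ti) and R(ti); swapping in `mean_rank_cdf` gives the mean rank version of the same table.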

So, we have discussed various methods with which we can carry out this analysis. We will discuss the same thing in more detail for one more type of data which is remaining, that is grouped censored data. If you have grouped censored data, the analysis is again a simple way of calculating the reliability, f(t), R(t), lambda(t), etc. We will stop here today and continue our discussion in the next class. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture: 29
Failure Data Analysis: Non- Parametric Approach (Contd.)
Hello everyone, we will continue our discussion of failure data analysis using the non-parametric approach. In earlier classes we discussed failure data, which is generally time to failure data or grouped (interval) data. Then we discussed various methods of analysis for complete data, singly censored data and multiply censored data.

One category which was remaining is data that is grouped and multiply censored. We will also discuss one more type of data, static data, where we have a fixed interval for which we want to know the reliability.

(Refer Slide Time: 01:09)

So, whenever we talk about grouped censored data, we have intervals, t0 to t1, then t1 to t2, and for each interval we would like to know the number of failures. As we did earlier, we also take the number of units working at the start of the interval, which we are writing as H i here, where i is the interval number.

So, for the first interval, at t0, the number of units working is the initial number of units put on test; say that is n. Now we have two categories here: in this interval we may have failures and we may have some censored units. Let us say the number of failures is x1 and the number of censored units is x2, which we are writing as F and C.

Then for the second interval this becomes n minus x1 minus x2, because the units which have failed or been censored are not available to work in the next interval. So at the start of each interval there is a reduction in units because of failure or suspension. This is what we are calling H i, the number of units working at the start of the interval.

So, H i is H i minus 1, less the number of failed and censored units in the previous interval. Now, the censored units which are removed might have been removed anywhere in the interval: at the start of the interval, or at the end of the interval.

So, on average, we consider that the censored units have been working for half of the interval. Rather than removing all C i units from the number working at the start of the interval, we effectively remove C i by 2. This is one type of approach. Another approach is to simply use the mean ranking method with the same formulas we used earlier.

So, for the next interval, the number of units removed is handled in a similar way. The reliability we are calculating here is nothing but 1 minus the number of failures divided by the adjusted number of units working at the start of the interval. This is our interval reliability, that is, the conditional reliability: the reliability for that particular interval only. For the first interval, say, it gives the probability that there is no failure in that interval, or the probability of successfully passing this particular interval.

So, if I want to know the reliability after 2 intervals, I have to multiply the reliability of the first interval by that of the second interval. We did the same for ungrouped multiply censored data: whenever a failure happened we found the interval reliability and multiplied it by the earlier known reliability. The same concept is applied here.

Here we are calculating the conditional reliability for each interval, and for the current reliability we have to multiply by the reliability of all earlier intervals. So the reliability at the end of each interval is the reliability at the end of the previous interval multiplied by the reliability of this interval.

(Refer Slide Time: 05:10)

So, let us take one example here. All these examples which I am presenting have also been put into an Excel sheet, which we may be able to share, and you will be able to find it a little later. Let us see how we have done it. Here i is the interval number: 0, 1, 2, 3 and so on.

So, at the start, 50 units are working. In the first row, that is from 0 to 0, there is no censoring and no failure, so 0 failures and 0 censored. These values are right aligned, so you have to read the table accordingly, and H i dash is 50. Now, in the 0 to 1 interval we have 5 failures and 1 censored unit.

Because of the censored unit, the effective number of units becomes 49.5: 1 divided by 2 is half, and 50 minus half is 49.5. So effectively we have 49.5 units working for that interval.

Now, out of the 49.5, we have 5 failures. So the reliability for this interval is 1 minus 5 divided by 49.5, which is 0.899. Our earlier reliability was 1, so multiplying 1 by 0.899 we get 0.899, and the failure probability is nothing but 1 minus reliability.

Similarly, let us see the second interval, from 1 to 2. The test runs for four years and time here is in months, so 4 into 12 is 48 months, and each interval is 3 months. In the first 3 months we got 5 failures and 1 suspension. From 3 to 6 months, that is, in the second quarter, we had 44 units working at the start, because 5 had failed and 1 was removed: 50 minus 6 is 44. Out of these 44, 3 failed and 0 were censored.

So, since no censoring is done, all 44 remain at risk for the complete period. Out of 44, 3 failed. So the reliability for this interval is 1 minus 3 divided by 44, which is 0.932. Now 0.932 is the reliability for the second interval, but for the equipment to survive two intervals, that is, to work up to 6 months, it should work through the first interval as well as the second interval.

So, the reliability becomes 0.899 into 0.932, which is our new reliability. After 2 intervals the reliability is 0.838. We calculate the other intervals the same way: first calculate the interval reliability, then multiply by the reliability at the end of the previous interval. Let us say the interval boundaries are t1, t2, t3.

So, for the equipment to continue working in the last interval, it has to work in each of the earlier intervals too. It has to be reliable here and here and here. So the reliabilities get multiplied: for a later interval, all earlier interval reliabilities are multiplied together to give this value.
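The grouped censored calculation can be sketched as a short life table routine. This is a minimal sketch under the half-interval assumption described above; the function name `life_table` and the (failures, censored) tuple layout are choices made here, and only the first two intervals of the 50-unit example are used because the full table is not reproduced in the transcript.

```python
def life_table(n_start, intervals):
    """Grouped censored data with the half-interval censoring adjustment.

    intervals -- list of (failures, censored) counts, one pair per interval
    Returns rows of (H_i, effective H_i', interval reliability, cumulative R).
    """
    at_start = n_start
    cumulative = 1.0
    rows = []
    for failures, censored in intervals:
        # Censored units are assumed to work, on average, half the interval
        effective = at_start - censored / 2.0
        interval_reliability = 1.0 - failures / effective
        # Survival to the end of this interval requires surviving all earlier ones
        cumulative *= interval_reliability
        rows.append((at_start, effective, interval_reliability, cumulative))
        at_start -= failures + censored
    return rows


# First two quarters of the example: (5 failed, 1 censored), then (3 failed, 0 censored)
rows = life_table(50, [(5, 1), (3, 0)])
for h, h_eff, r_int, r_cum in rows:
    print(h, h_eff, round(r_int, 3), round(r_cum, 3))
```

Extending the list of pairs with the remaining quarters would reproduce the rest of the curve up to 48 months.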

I have done this in Excel, which we may share later. As you see here, we started with 50 units; out of 50, 29 failed and 9 were censored. That accounts for 38 units, so we still have 12 units working at the end, which are censored there. They are right censored, so they do not affect the calculation. But as a result the reliability curve does not reach 0. We are able to draw the curve only up to the values we have observed.

So, if we want to draw this curve further, we can fit a trend and extend it by curve fitting, etc. But rather than doing that, we should find the appropriate distribution for this data and use distribution analysis, so that we are able to extend this curve and know the full reliability distribution for the whole lifetime.

(Refer Slide Time: 10:20)

There is one more type of data which we will discuss, that is static life estimation. This type of estimation is required for equipment such as one shot devices, or devices that work for a very short period of time. Here, whenever we conduct the test, it is conducted for a mission time.

This can be applied to long operating devices also, but for long operating devices the reliability is considered a function of time, and we would like to know the reliability for a particular duration. Here we are more interested in short period devices which work for a certain time t0. This is a kind of mission time: there is a mission which the device has to complete, and the time for the mission is t0.

So, here we have multiple devices tested for this time. We do not observe when a failure or success happens in between; we come to know only at the end of the time whether each unit failed or succeeded. So when we put n units on test, let us say we observe r failures.

At the start we put n units on test, and at time t0, when the test is completed, we find that r units have failed. In that scenario, how can we find the reliability? Obviously, if we have r failures out of n, our unreliability is r divided by n, like the equal rank method. In a general sense, r upon n gives the unreliability.

So, the reliability becomes 1 minus r divided by n. This is the point estimate of reliability. But the point estimate can be misleading. Say one failure happened. Maybe we used 5 devices and out of 5, 1 failure happened; or maybe we used 10, and out of those also 1 failure happened.

So, this estimate of reliability depends on the value of n, how many samples we are using. When samples are few, the estimate can be biased depending on the outcome: if some faulty units happen to be in the chosen sample, they will bias our reliability estimate.

So, in such cases it is useful not only to estimate reliability, but also to estimate the lower bound and upper bound on reliability. So we construct a confidence interval, which we call the (1 minus alpha) into 100 percent confidence interval. The confidence interval concept is like this: if we assume a normal distribution, we have the mean value in the middle.

So, when we say confidence interval, we have the lower bound on one side and the upper bound on the other. The probability of the value falling in the region between them is 1 minus alpha. That means alpha by 2 belongs to the left tail and alpha by 2 belongs to the right tail. We want to find these two values. The lower value is the (alpha by 2) into 100 percentile.

So, we can call it the 50 alpha percentile, while the upper one is the 100 into (1 minus alpha by 2) percentile, because only alpha by 2 is left on the right side. If my alpha is, let us say, 0.1, that means I am talking about a 90 percent confidence interval, with 5 percent on each side. So the lower value is the 5th percentile and the upper value is the 95th percentile. These percentile values are what we use.

So, for this kind of scenario the F distribution comes into the picture. It is statistically given that R L can be evaluated as 1 plus (r plus 1) divided by (n minus r) into F 2, whole inverse; that is, 1 divided by [1 plus (r plus 1) upon (n minus r) into F 2]. Similarly, the upper reliability estimate is F 1 into [F 1 plus r upon (n minus r plus 1)] inverse, which we can write as F 1 divided by [F 1 plus r upon (n minus r plus 1)].

So, R L and R U give us the interval estimate. F 1 and F 2 we have to get from the F distribution, which has two degrees of freedom parameters. For F 1 they are 2n minus 2r plus 2 and 2r, and the probability for which we have to calculate is alpha by 2.

Similarly, for F 2 the probability is alpha by 2 and the degrees of freedom are 2r plus 2 and 2n minus 2r. For these degrees of freedom we get F 1 and F 2, and once we use them in this formula we get the lower and upper estimates of reliability.

So, we get the range in which we expect the reliability to fall with (1 minus alpha) into 100 percent confidence. For reliability we are generally more interested in the lower bound, because we want to know the minimum reliability we are going to achieve. Generally, whenever we talk about good, positive things, we want to know at least how much we will get.

So, there we are interested in the lower bound. Similarly, when we talk about bad things like risk or failure probability, we talk about the upper bound, because we want to know at most how much it is. Here we are constructing a two sided bound, but many times we may be interested in a one sided bound. For a one sided lower bound we have alpha into 100 percent on the left side and (1 minus alpha) into 100 percent on the right hand side.

So, rather than alpha by 2 we will be using alpha. The same formula is applicable, but for a one sided bound we do not split alpha into alpha by 2 and alpha by 2; we take the whole of alpha on the left hand side for a lower bound, or on the right hand side for an upper bound.

So, these formulas are general and applicable to almost all distributions and cases. Let us see an example of this.

(Refer Slide Time: 17:35)

$$\Pr\{R_L \le R(t_0) \le R_U\} = 1 - \alpha$$

$$R_L = \left[1 + \frac{r+1}{n-r}\,F_2\right]^{-1}; \qquad R_U = F_1\left[F_1 + \frac{r}{n-r+1}\right]^{-1}$$

$$F_1 = F_{\alpha/2,\;2n-2r+2,\;2r}; \qquad F_2 = F_{\alpha/2,\;2r+2,\;2n-2r}$$

Let us say that we want to estimate the launch reliability of a booster rocket. These formulas are much more applicable to rockets, missiles, guns, bullets or different kinds of ammunition, because once they are fired the job is over. So you may have one shot devices, or mission oriented devices which go and complete a mission, like satellites etc.

So, satellites generally work for a longer period, so there you may be interested in time based reliability, but for a rocket you will be interested in this static reliability. In this case, the rocket is used to launch a communication satellite into orbit. 20 launches have been completed, and out of 20, one launch was unsuccessful, so one failure is observed.

So, someone wants to know the reliability of this rocket with a 90 percent confidence interval. We can apply what we have done in the previous slide with n equal to 20 and r equal to 1. Our point estimate of reliability is 0.95, but the number of samples is only 20. That is a good size, but if you take a larger sample the value may still change.

So, how much uncertainty is there in our estimate? If there were no uncertainty, our point estimate would have been good enough. Because of the uncertainty, we find the lower and upper bounds on reliability. For these bounds we need two values, F 1 and F 2, which we have to get from the F distribution.

So, from the F distribution, with probability 0.05 and 40 and 2 as the two degrees of freedom, we get F 1 as 19.47, and with probability 0.05 and 4 and 38 degrees of freedom we get F 2 as 2.625. Based on that we calculate R L. Here r is 1, so r plus 1 is 2, and n minus r is 20 minus 1, that is 19. So we take 1 plus 2 divided by 19 multiplied by F 2, which is 2.625.

Taking the inverse of that, we get 0.7835. Similarly, R U is calculated as F 1 into [F 1 plus r upon (n minus r plus 1)] inverse. Here r is 1 and n is 20, so 20 minus 1 plus 1 is 20 and the ratio is 1 by 20. With F 1 equal to 19.47, we get the upper reliability as 0.9974.
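The rocket example can be cross-checked without F tables. The F distribution expressions for R L and R U are the closed form solution of the exact binomial (Clopper-Pearson) tail equations, so the sketch below solves those equations directly by bisection using only the Python standard library. This is a substitute technique chosen here for self-containment, not the lecture's table lookup, and the function names are assumptions.

```python
from math import comb


def binom_cdf(k, n, q):
    """P(X <= k) when X is the number of failures, X ~ Binomial(n, q)."""
    return sum(comb(n, j) * q**j * (1 - q)**(n - j) for j in range(k + 1))


def find_root(func, lo=0.0, hi=1.0, iterations=100):
    """Bisection for a monotonic function that changes sign on [lo, hi]."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def reliability_bounds(n, r, alpha):
    """Point estimate and two sided (1 - alpha) bounds on static reliability."""
    point = 1.0 - r / n
    # Lower bound: r or fewer failures would occur with probability alpha/2
    lower = 0.0 if r == n else find_root(
        lambda rel: binom_cdf(r, n, 1.0 - rel) - alpha / 2.0)
    # Upper bound: r or more failures would occur with probability alpha/2
    upper = 1.0 if r == 0 else find_root(
        lambda rel: 1.0 - binom_cdf(r - 1, n, 1.0 - rel) - alpha / 2.0)
    return point, lower, upper


point, lower, upper = reliability_bounds(n=20, r=1, alpha=0.10)
print(round(point, 4), round(lower, 4), round(upper, 4))
```

The bounds should agree with the table-based values of about 0.78 and 0.997 to within the rounding of the F values. For a one sided bound, pass twice the desired alpha so that the whole of alpha sits in one tail.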

Now, these values we can get in Excel, or otherwise from F distribution tables. So, as you have seen, we are able to calculate reliability in various cases using the nonparametric approaches. But, as we discussed earlier, the nonparametric approaches have a limitation.

(Refer Slide Time: 21:07)

We do not get the full scope of reliability; we get a limited scope. Here we know the reliability up to 48 months, but we do not know the reliability for 50 months or 60 months, because we do not have that data. Unless we do some sort of trend fitting and extend this line, assuming that the same trend will continue, we cannot know it.

With trend fitting we would know the reliability at 50, 55, 60 months and other times. But to do that we have to do some model fitting, because it is the model that fits a trend to this data. Without using a model we cannot interpolate or extrapolate our results.

So, our next topic is how to do this estimation using parametric methods, where we will mostly discuss 4 distributions. The distributions mostly used in reliability are the exponential distribution and the Weibull distribution, and 2 more distributions which may not be applied directly to time to failure data. The lognormal distribution is often used for repair data, and it is also quite widely used for time to failure data.

Sometimes the normal distribution is also used, for stress data, strain data and that kind of application. So, in our next class we will discuss how to fit the data to these distributions, and once we fit the data, how to evaluate the parameters and get more elaborate information about reliability, including MTTF. Here we are not able to calculate MTTF, because we do not have all the failures.

Here we can calculate MTTF only if we assume the exponential distribution by default. This is what we are calling the observed MTTF which, as we discussed in the previous class, is the total test time, or total operating time, divided by the number of failures.

Here we count all the time which has been spent working, whether by the failed units or by the censored units. In this example 38 units failed or were censored during the test, and 12 units are still working at the end. So how do we calculate the total time here?

(Refer Slide Time: 24:43)

So, if I want to calculate MTTF here, I first have to assume a distribution; without that I cannot calculate MTTF. Normally we assume the exponential distribution, which means assuming that the failure rate is constant. In that case we are able to calculate MTTF. To calculate it, we need to know on average how much time each unit worked.

Take the 5 failed units plus 1 censored unit in the first interval. They might have failed anywhere in the interval, so we take the average time, the midpoint: 1.5 months for 6 units, that is 1.5 into 6. Then for the next interval, from 3 to 6 months, the midpoint is 4.5, multiplied by how many devices? 3 devices. Similarly, 7.5 into 2, because 2 devices worked for 7.5 months, then 10.5 into 3. Similarly, we calculate all the intervals.

So, the last value is 46.5, where one device worked, so 46.5 into 1. These times are recorded for the devices which failed or were censored in between. But there are also 12 censored devices which have not failed and are continuing to work. These 12 devices have worked for 48 months, so we add 12 into 48 to the total time. Then we divide by the total number of failures. How many failures did we observe? Only 29.
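The midpoint calculation can be written as a small helper. The sketch below is illustrative only: the function name and arguments are assumptions, and because the transcript does not list every interval of the 48 month table, the example call uses just the first two quarters and pretends, purely for illustration, that the test stopped at 6 months with the remaining 41 units still working and 8 failures observed.

```python
def observed_mttf(removed_per_interval, interval_width, survivors, test_end,
                  failures):
    """Observed MTTF = total operating time / number of failures.

    removed_per_interval -- units failed or censored in each interval; each is
                            credited with the interval midpoint as its time
    survivors            -- units still working when the test ends
    test_end             -- total test duration, same units as interval_width
    failures             -- number of observed failures (censored units excluded)
    """
    total_time = 0.0
    for index, removed in enumerate(removed_per_interval):
        midpoint = (index + 0.5) * interval_width
        total_time += midpoint * removed
    # Units that never failed contribute the full test duration
    total_time += survivors * test_end
    return total_time / failures


# Hypothetical truncation at 6 months: 6 units removed in 0-3, 3 units in 3-6
mttf = observed_mttf([6, 3], interval_width=3, survivors=41, test_end=6,
                     failures=8)
print(mttf)
```

With the complete table, the list would hold the removals for all sixteen quarters, with 12 survivors, a test end of 48 months and 29 failures.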

So, this becomes our observed MTTF. But we have to be cautious when using it: the observed MTTF may not be the mean life, because we do not have failure data for all the equipment. If a component continues to follow the exponential distribution for its whole life, this will be valid, but that generally does not happen, as you have seen earlier. Components do degrade, and because of degradation the failure rate may rise later on.

Because of that higher failure rate, the observed MTTF may not be valid. So unless we have all the failure data, calculating MTTF can be tricky. If we use the exponential distribution for this, we should be careful not to mistake it for the mean life. It is the MTTF under the assumption that the system follows a constant failure rate, that is, the exponential distribution.

So we will continue our discussion and we will next time discuss the failure data analysis
using the parametric models. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture: 30
Failure Data Analysis (Parametric)

(Refer Slide Time: 00:26)

Hello everyone. In previous sessions we discussed failure data analysis, mostly for time to failure data or grouped failure data, which may be complete or censored. We discussed how to use the rank methods to get F(ti), and once we get F(ti), we can find R(ti), z(t), f(t) and so on.

So, from today onwards we will discuss how we can use the same data to fit a distribution. Once we fit the distribution we know the distribution pattern, and once we know that, we are able to evaluate important quantities related to reliability; we will know almost everything about the reliability.

Once we do the analysis and fit the data, whatever parameters we get become the basic data used in all the other methods, like the system reliability approach and the other methods which we have discussed earlier in this lecture series.

(Refer Slide Time: 01:43)

We will discuss this only briefly, because this is not a statistics class; it is more about reliability. So we are selectively discussing the few principles which we use for reliability estimation. One way of doing that estimation is least squares estimation, where we take the square of the error and try to minimise it.

So, generally, when we use least squares estimation we do curve fitting. If we have the data and want to fit an arbitrary curve, it is difficult, but it is much easier to fit the data to a straight line, because then we can see that the line should pass through the region where most of the points lie near the line.

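Given a line equation $y = mx + c$

Given data $(x_i, y_i)$, $i = 1, \dots, n$, the parameters of the equation can be estimated as

$$\hat{m} = \frac{\sum_{i=1}^{n} x_i y_i - \bar{x}\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i^2 - \bar{x}\sum_{i=1}^{n} x_i}, \qquad \hat{c} = \bar{y} - \hat{m}\,\bar{x}$$

Coefficient of determination

$$r^2 = \frac{\left[\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2 \,\sum_{i=1}^{n}(y_i - \bar{y})^2}$$

• Where $r$ is called index of fit.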

So, this curve fitting is mostly done by line fitting. We fit the line so that the error in y is minimum. If we reverse the equation, we can instead minimise the error in x. But in all these cases y is generally related to the failure probability and x to the other random variable, the time to failure.

So, we transform our equation, the cumulative distribution function, and try to get it
into the y equals mx plus c form. Once we are able to do that, we estimate the values of
m and c, which are related to the parameters of the distribution.

So, using those equations, we are able to evaluate m and c. We have the general equation
for whenever data is available: say we have the pairs (xi, yi), and we have n such
pairs. In (x1, y1), x1 is the independent variable, which can be time or some function of
time, and y1 is the dependent variable, which is some function of the probability of
failure. Say we have n failures; so, as we have done earlier, we have n times to failure.

From the times to failure we get x1 to xn, and based on the ranking we find F(ti). Based
on F(ti) we calculate y1 to yn. So, we have n such pairs. Using those pairs, we can fit
the regression line, and the line fitting gives us the values of m and c.

The estimated value of m for n such data points is given as follows. The numerator is the
summation from i equal to 1 to n of xi yi, minus x bar times the summation of yi from i
equal to 1 to n. This summation of yi can also be written as n into y bar, because y bar
is nothing but 1 upon n times the summation of yi.

So, if we want, we can replace the second term with n into x bar y bar. Then, in the
denominator we have the summation from i equals 1 to n of xi square, minus n into x bar
square. And we know what x bar is: it is the mean value of xi, that is, 1 upon n times
the summation of xi.

Similarly, we can estimate the value of c, the constant. From the line equation, c is
equal to y minus mx, so c cap is nothing but the average value of y minus m cap into the
average value of x.

Here m cap is the value which we have just estimated, x bar is as above, and y bar, as
you know, is 1 upon n times the summation of yi. So, based on this we get m and c, and
once we know m and c, we can find the distribution parameters from the relationship which
we develop. Next, we want to know how well this data fits the straight line.

To find out how well the data fits the straight line, we can use the coefficient of
determination. The coefficient of determination is 1 minus a ratio. In the numerator, yi
is the actual observation which you have, and m cap xi plus c cap is the estimated value
yi cap, the value the fitted line gives when we put in xi. So, yi minus m cap xi minus c
cap is supposed to be 0 if your points lie exactly on the line, because then yi is the
same as yi cap.

So, this is a kind of error, and we take the square of the error; as discussed earlier,
in least squares estimation we square the error and minimise it. If your line passes
through all the points, this sum will be 0 and your r square value will be 1. If the
points are far away from the line, this sum will be higher. Whether the estimated value
is higher or lower than the observation, because of the square both errors are added.

This sum is divided by the summation of yi minus y bar, whole square. r square is
generally called the coefficient of determination, and its square root, r, is what we
call the index of fit. We want our r value to be close to 1; if it is close to 1, our
fit is considered to be good.
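
The formulas for m cap, c cap and r square described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `fit_line` and the sample points are assumptions made for the example, not part of the lecture.

```python
def fit_line(xs, ys):
    """Least squares fit of y = m*x + c; returns (m, c, r_squared)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    # m = (sum(xi*yi) - n*x_bar*y_bar) / (sum(xi^2) - n*x_bar^2)
    m = (sum_xy - n * x_bar * y_bar) / (sum_xx - n * x_bar ** 2)
    # c = y_bar - m*x_bar
    c = y_bar - m * x_bar
    # r^2 = 1 - sum((yi - m*xi - c)^2) / sum((yi - y_bar)^2)
    sse = sum((y - m * x - c) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return m, c, 1 - sse / sst


# Made-up points lying close to a straight line
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
m, c, r2 = fit_line(xs, ys)
print("m =", m, "c =", c, "r^2 =", r2, "index of fit r =", r2 ** 0.5)
```

If the points lie exactly on a line, the squared error sum is 0 and r square comes out as 1, as stated above.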

(Refer Slide Time: 08:43)

Now, let us see how to fit the data to the exponential distribution. If we assume that
our data follows the exponential distribution, what is the parameter? The exponential
distribution has only one parameter, lambda, which is also called the failure rate,
because the exponential distribution has a constant failure rate. For the exponential
distribution the CDF, the cumulative distribution function, or we can say the
unreliability, capital F(t), is given as 1 minus e to the power minus lambda t.

Now, we have to change this equation into a straight line equation. First, we subtract
both sides from 1, so 1 minus F(t) is equal to e to the power minus lambda t. Then,
because this is an exponential term, we take the log: ln of 1 minus F(t) is equal to
minus lambda t. Moving the minus sign across, minus ln of 1 minus F(t) is equal to
lambda t; and since minus log of x is equal to log of 1 upon x, we can also write ln of
1 upon 1 minus F(t) is equal to lambda t.

So, this equation has turned into a linear equation: the left-hand side I can call y,
lambda I can call m, and t I can call x. Here c is 0, so my equation is simply y equal
to mx. Since c is 0, when I plot this line, it should pass through the origin. So, for
the exponential distribution, whenever we are fitting, the intercept has to be 0. Now we
apply the formula which we have seen earlier.

So, y is equal to ln of 1 upon 1 minus F(t), and x is equal to t. With the intercept
fixed at 0, m is calculated as the summation from i equal to 1 to n of xi yi divided by
the summation from i equal to 1 to n of xi square. This value gives us lambda. Since c
is 0, we do not need to calculate c, and the r square value is given by the same formula
as before with c put equal to 0. Once we use this, we are able to fit the distribution.
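
As a rough sketch of this zero-intercept fit, the snippet below computes lambda cap as the sum of xi yi over the sum of xi square, and r square from the same formula as before with c equal to 0. The function name `fit_exponential` and the check data (generated from an exponential with lambda = 0.02) are assumptions for illustration only.

```python
import math


def fit_exponential(times, F):
    """Fit y = lambda * t through the origin, where y = -ln(1 - F(t))."""
    ys = [-math.log(1.0 - f) for f in F]
    # lambda_hat = sum(xi*yi) / sum(xi^2), with xi = ti
    lam = sum(t * y for t, y in zip(times, ys)) / sum(t * t for t in times)
    # r^2 from the general formula with c = 0
    y_bar = sum(ys) / len(ys)
    sse = sum((y - lam * t) ** 2 for t, y in zip(times, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return lam, 1 - sse / sst


# Check data: F(t) taken exactly from an exponential CDF with lambda = 0.02
times = [10.0, 25.0, 50.0, 100.0]
F = [1 - math.exp(-0.02 * t) for t in times]
lam, r2 = fit_exponential(times, F)
print("lambda =", lam, "r^2 =", r2)
```

With real data, F would come from a rank method rather than from the CDF itself, so the fit would not be exact.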

(Refer Slide Time: 12:04)

So, I have taken one example here, which I have developed in an Excel sheet. I will show
it to you so that you can follow how this is done.

(Refer Slide Time: 12:33)

So, let me show you the Excel sheet which I have developed; you can follow along. Before
the exponential distribution, sometimes we may also be interested in plotting a
histogram. As we discussed in earlier classes, if you have the time to failure data, you
can arrange the data in increasing order and then divide it into intervals using Sturges'
formula.

Sturges' rule gives the number of intervals as 1 plus 3.3 into log base 10 of n, and n is
35 here. When you use that value, you get around 6. So, you divide the data into 6
intervals, and for each interval you get the number of failures and the number of
survivors. Initially, 35 units were working.
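
The Sturges' rule calculation for n = 35 can be checked with a short snippet; the function name is only an illustrative choice.

```python
import math


def sturges_intervals(n):
    """Number of class intervals by Sturges' rule: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)


k = sturges_intervals(35)
print(round(k, 2), "->", round(k), "intervals")
```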

If you see here, up to the end of the first interval, 350, 18 units failed and 17
remained. In the same way as we did earlier, we can do this analysis and get the small
f(t). This is the analysis using the nonparametric approach, which we have already
studied in previous classes.

Now, sometimes we can plot this frequency curve, or histogram, to see how f(t) or the
number of failures changes with time. This can give us an idea; better still is to plot
the failure rate curve z(t), which gives some indication of whether the failure rate is
increasing, decreasing or constant.

That can indicate whether we should use the exponential distribution, the Weibull
distribution, etcetera. But we can also directly fit the distributions and find out which
distribution fits.

(Refer Slide Time: 14:23)

So, to do the distribution fitting, let us say I have this data; we have 20 data points
here. I took the data points and sorted them in increasing order. Once we have that, we
can use a rank method; here we use the median rank method.

With the median rank method, the median rank is i minus 0.3 divided by n plus 0.4. So, in
the sheet I enter this: my value of i is in column B, minus 0.3, divided by my value of
n, which I have written below, plus 0.4. Since I do not want the reference to n to change
when I copy the formula, I put the dollar sign on it.

So, I am using i minus 0.3 divided by n plus 0.4, and my value of n, 20, is in cell C25.
So, we have the values of F(ti). In general I want to know how F(ti) changes with time,
but with ti on the x axis and F(ti) on the y axis, the distribution does not follow a
straight line. So, to get the straight line, as we discussed earlier, we have to change
the F(t) value into ln of 1 upon 1 minus F(t), or minus ln of 1 minus F(t).

So, the yi value is minus ln of 1 minus F(ti). I can take either ln of 1 upon 1 minus
F(t) or minus ln of 1 minus F(t); both give the same value. I will use minus ln of 1
minus F(t), and this way I get the y values. Now, what is xi here? Here x is t, so I take
exactly the same time. x is equal to t, so the same value appears here.
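
The median rank column and the y column for n = 20 can be reproduced outside the spreadsheet as sketched below. The helper names are assumptions for illustration; the failure times themselves are not listed in the transcript, so only the rank-dependent columns are computed.

```python
import math


def median_ranks(n):
    """Median rank approximation F(ti) = (i - 0.3) / (n + 0.4), i = 1..n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def exponential_y(F):
    """Transform for the exponential fit: yi = -ln(1 - F(ti))."""
    return [-math.log(1.0 - f) for f in F]


F = median_ranks(20)
y = exponential_y(F)
for i, (f_i, y_i) in enumerate(zip(F, y), start=1):
    print(i, round(f_i, 4), round(y_i, 4))
```

The x column is simply the sorted failure times, so it needs no transformation.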

(Refer Slide Time: 17:51)

Now, there are two ways. I will show you the graphical way, which we can use in Excel,
and I will also show you how we can use the formula to calculate the same thing. Let us
first see the graphical way. I have already plotted it here, but I will do the exercise
again in front of you so that you can follow it. I will make a new graph rather than use
the same one. What I have to do is plot xi versus yi.

So, I select these columns and go to Insert. You can use any statistical package for
this, but let us see step by step how we can do it. I will use the XY scatter plot. This
scatter plot shows my yi versus xi. Now, I want to fit a straight line to it. I could
draw the line manually by eye, but that would introduce some error, so I will use the
calculated way. Excel has an inbuilt function for this, the trendline function.

So, we use the trendline function here. In the trendline I get various options. The first
thing, as you know, since we are fitting the exponential distribution, is that we have to
set the intercept, because we want the intercept to be 0. I also want to know the
equation of the fitted line so that I can use the values, so I will display the equation
on the chart, and I will also display the r square value on the chart so that I can
determine the index of fit.

By doing this, I see that y is equal to 0.0238x. As we know, m is lambda. So, my
estimated value of lambda comes directly from this graph, and this lambda value is equal
to 0.0238.

Now, if I know lambda, I know the distribution. This is an estimated value of lambda from
this plot, and the index of fit is also good; r square is almost 0.96. This is the
graphical way, which you can do easily in Excel. Alternatively, exponential graph paper
is also available.

In exponential graph paper, they have modified the axes: one is a logarithmic axis and
the other is a linear axis, because t is linear and 1 minus F(t) is on the logarithmic
axis. So, what we are doing by modification of the data can also be achieved by
modification of the axis.

On the logarithmic axis you see 1, then 10, and so on. Once you have the data, you can
plot it on this paper, and it is also supposed to fall on a straight line. Like the
exponential, we have graph paper for the Weibull etcetera; we can plot the data on it and
find the parameters. There is another way to find lambda here. If you put t equal to 1
upon lambda in R(t), it becomes e to the power minus lambda divided by lambda, that is,
e to the power minus 1.

So, whenever t is equal to 1 upon lambda, my reliability is about 0.368 and my
unreliability, the CDF value, is always 0.6321, that is 63.21 percent. At this point the
y value, which is minus ln of 1 minus F(t), is equal to 1. So, if I look at this line,
find where y is 1 and read off the corresponding value on the x axis, it is somewhere
around 40 to 45.

The lambda we obtained is 0.0238. So, my MTTF is 1 upon lambda, that is 1 divided by this
value, which is around 42. So, the time at which y is 1 is around 42, which means my mean
time to failure is 42. This is because when I put t equal to MTTF, which is 1 upon
lambda, F(t) becomes 0.632 and minus ln of 1 minus F(t) becomes 1. So, from the plot
itself I can find my lambda or my MTTF.
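
The relationship between lambda, the MTTF and the point where y equals 1 can be verified numerically using the lambda value read from the trendline; the variable names are illustrative.

```python
import math

lam = 0.0238            # lambda estimated from the trendline equation
mttf = 1.0 / lam        # MTTF = 1 / lambda for the exponential distribution

F_at_mttf = 1.0 - math.exp(-lam * mttf)   # unreliability at t = MTTF
y_at_mttf = -math.log(1.0 - F_at_mttf)    # transformed value at t = MTTF

print("MTTF =", round(mttf, 2))
print("F(MTTF) =", round(F_at_mttf, 4))
print("y(MTTF) =", round(y_at_mttf, 4))
```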

For that, I need to have the graph properly marked, including the minor axis divisions,
so that I can read the value accurately. So, we are able to get lambda and the MTTF. That
was the visual way of doing LSE, using the Excel chart. Now, let us see the other method.

Now, let us calculate the same thing with the formula. I have set up the sheet in a
generalised way that I will use across the distributions, but let us say we use this
formula here. For this we need to calculate two values, xi yi and xi square. xi yi is xi
multiplied by yi. The other value which we need is xi square, so we calculate that as
well.

Now, we need to take the summation of each column. This is the sum of xi yi, and this is
the sum of xi square. My formula for lambda cap is the summation of xi yi divided by the
summation of xi square, and this becomes my lambda. If you see, this lambda and the
lambda from the trendline are the same.

Whatever we have done by calculation, the same thing has been done by Excel in plotting
the trendline, so we get the same value. Now, if I want to calculate r square, the
numerator is the summation from i equal to 1 to n of yi minus m xi, whole square. So, I
need to take yi minus m xi: that is equal to yi, minus m, which I have already calculated
as lambda, multiplied by xi.

I do not want the reference to m to change, so I put the dollar sign here again, and
multiply by xi. Now these values have to be squared; if you remember the formula, it is
yi minus m xi, whole square. So, I square each value, and then I take the sum, which is
this value here.

So, by taking the sum, I get the numerator. Now, the denominator is the summation of yi
minus y bar, whole square. y bar is the average value of yi, which I have already
calculated here. So, I take yi minus y bar and square it.

So, this is my denominator. The numerator sum divided by this, subtracted from 1, gives
me the r square value. I want the ratio to be near 0, so that r square is near 1. r
square is 0.964485, which, if you see, is the same as what we got on the chart. And the
r value is the square root of this.

So, my index of fit is 0.982, which is a very good fit. I am removing these cells,
because we have already done this exercise. So, we are able to get lambda, r square and
r, and once we have these we have fitted the exponential distribution. Once we know the
parameter of the distribution, we know everything about the distribution.

(Refer Slide Time: 29:06)

So, let us say I want to take some time values and know R(t). I will take times starting
from 0 in steps of 5: 0, 5, and so on; we can take some more values. R(t) is equal to
exponential of minus lambda t, where lambda is this value, multiplied by the t values
given here.

Now, lambda is in G25. As I discussed, this reference needs to stay constant, so I use
the dollar sign here. If I want to plot R(t), I go to Insert and plot the XY chart with
lines. This is my exponential distribution, which I have fitted. z(t) is a constant line,
so there is no need to plot that. If I want to know the small f(t), f(t) is z(t) into
R(t), that is, lambda into R(t). Lambda is constant, so I again use the dollar sign, and
this becomes my f(t).

I can add the f(t) values to the chart: I select the data and add one curve, with this
series name, the x values already taken, and the y values taken from f(t). This is my
R(t) curve and this is my f(t) curve. Because the values are on different scales, I will
plot one at a time: this is my f(t) and this is my R(t). As you see, it is exponentially
decreasing, and whenever I want to know a value, I can read it off.
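
The same table of values can be sketched in Python with the fitted lambda of 0.0238. The time grid (0 to 100 in steps of 5) is an assumption for illustration, since the transcript only says that values are taken with a difference of 5.

```python
import math

lam = 0.0238                      # fitted failure rate
times = list(range(0, 105, 5))    # assumed time grid: 0, 5, ..., 100

R = [math.exp(-lam * t) for t in times]   # reliability R(t) = exp(-lambda*t)
F = [1.0 - r for r in R]                  # unreliability F(t) = 1 - R(t)
f = [lam * r for r in R]                  # density f(t) = z(t)*R(t) = lambda*R(t)

for t, r_t, F_t, f_t in zip(times, R, F, f):
    print(t, round(r_t, 4), round(F_t, 4), round(f_t, 5))
```

Both R(t) and f(t) decay exponentially; they differ only by the constant factor lambda, which is why they sit on very different scales when plotted together.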

So, we stop here today. In the next classes we will continue discussing how more
distributions can be fitted to the data.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 31
Failure Data Analysis (Parametric) (Contd.)

(Refer Slide Time: 00:26)

Hello everyone, we will continue our discussion from where we left off in the previous
lecture. There we discussed how we can fit the exponential distribution to find its
parameter: if we know some data follows the exponential distribution, we just need to
know the parameter of the exponential distribution, which is lambda. Many components may
follow the exponential distribution, but they may have different lambda values.

Based on that, the reliability would be different. In the same way, the data may follow a
different distribution, such as the lognormal distribution or the Weibull distribution.
So, if we have the data and we want to fit the Weibull distribution, how do we fit it? We
use the same approach that we used for the exponential distribution, but there is a
little change in the formula, because the Weibull distribution has a different form, so a
different pattern is to be captured.

(Refer Slide Time: 01:31)

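CDF is given as:

$$F(t) = 1 - e^{-(t/\theta)^{\beta}}$$

Converting this to straight line equation

$$\ln \ln \frac{1}{1 - F(t)} = \beta \ln t - \beta \ln \theta$$

Using LSE,

$$\hat{\beta} = \hat{m}, \qquad \hat{\theta} = \exp\left(-\frac{\bar{y} - \hat{\beta}\bar{x}}{\hat{\beta}}\right)$$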

So, let us see how we go with the Weibull distribution. For the Weibull distribution, our
CDF, as we know, is given as F(t) is equal to 1 minus e to the power minus t upon theta
raised to the power beta, where theta is the scale parameter and beta is the shape
parameter. This we have discussed in earlier classes. The scale parameter defines the
scale: if it is larger, the distribution is spread over larger values. Beta changes the
shape of the distribution.

Now, we want to turn this into a straight line equation. First we take 1 minus F(t),
which becomes e to the power minus t upon theta raised to the beta. Then we take ln of
both sides, and we get minus t upon theta raised to the beta. As you see, a single ln is
not sufficient, because t is still not coming out directly; we still have t raised to the
power beta.

So, we have to take one more ln, but before that, because of the minus sign, we have to
make it positive. So, we take minus ln of 1 minus F(t), which is equal to t upon theta
raised to the beta. This minus ln I can also write as ln of 1 upon 1 minus F(t). Now I
take the log again: the left side becomes ln ln of 1 upon 1 minus F(t), and on the right
side the power comes out, giving beta ln of t upon theta. And ln of t upon theta is
nothing but ln of t minus ln of theta.

So, the double log of 1 upon 1 minus F(t) is equal to beta ln of t minus beta ln of
theta. For the exponential, the equivalent graph paper is semi-log, where one axis is log
and the other axis is linear. But here, as you see, time is also converted into its log
value.

So, the x axis has to be logarithmic, and the y axis has to be double logarithmic. If you
want to plot the data on graph paper and see the linearity, you have to use paper with a
log-log scale on the y axis and a log scale on the x axis; only then do you have the
linear equation.

So, here the left side is y, beta is m, x is ln of t, and c is minus beta ln of theta.
So, we can use the same regression formula that we used earlier. The beta value, which is
m, is the summation from i equal to 1 to n of xi yi minus n x bar y bar, divided by the
summation from i equal to 1 to n of xi square minus n x bar square.

For c we use the same formula as well: c cap is equal to y bar minus m cap into x bar,
where y bar is the average value of y and x bar is the average value of x, which we get
from the data, and m cap is the beta cap from above.

Now, c is equal to minus beta ln of theta, and I want to know the value of theta. The
beta value I have already got, and c cap is equal to y bar minus beta cap x bar, because
m is equal to beta. So, ln theta is equal to minus 1 upon beta cap into y bar minus beta
cap x bar.

And theta is the exponential of this: theta is equal to e to the power minus of y bar
minus beta cap x bar, divided by beta cap. So, we have converted the equation to the line
equation. First we calculate the xi and yi values, then we calculate the values of m and
c. I will show this for the data in the Excel sheet, so that you are able to follow it
properly.
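
The whole Weibull procedure (transform, regress, recover beta and theta) can be sketched as follows. The function name `fit_weibull` and the 15 failure times are hypothetical; the lecture's spreadsheet data is not reproduced in the transcript, so the printed values will not match the lecture's numbers.

```python
import math


def fit_weibull(times, F):
    """Least squares Weibull fit; returns (beta, theta, r_squared)."""
    xs = [math.log(t) for t in times]                 # xi = ln(ti)
    ys = [math.log(-math.log(1.0 - f)) for f in F]    # yi = ln ln(1/(1 - F(ti)))
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / (
        sum(x * x for x in xs) - n * x_bar ** 2
    )
    c = y_bar - m * x_bar
    sse = sum((y - m * x - c) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    beta = m                         # slope is the shape parameter
    theta = math.exp(-c / beta)      # from c = -beta * ln(theta)
    return beta, theta, 1 - sse / sst


# Hypothetical sorted times to failure (not the lecture's data)
times = [22, 41, 58, 73, 89, 104, 120, 137, 155, 174, 196, 221, 252, 295, 370]
n = len(times)
F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]   # median ranks
beta, theta, r2 = fit_weibull(times, F)
print("beta =", beta, "theta =", theta, "r^2 =", r2)
```

Unlike the exponential case, the intercept is estimated here rather than forced to 0, because theta is recovered from it.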

(Refer Slide Time: 06:54)

So, I have developed the same thing in the Excel sheet; this is the Weibull sheet. Let us
say we have 15 time to failure data points, which I have already sorted and put here. So,
for i equal to 1 to 15 I have ti, and F(ti), as we discussed earlier, is i minus 0.3
divided by n plus 0.4. The same formula that we used earlier is used again: i is in A3,
minus 0.3, divided by the n value, which is 15, plus 0.4.

This gives me the F(ti) values. Now, I have to convert ti and F(ti) into xi and yi. As we
saw, xi is nothing but ln of ti, so in this sheet I take ln of ti and get the xi values.

So, I have converted time into xi. Similarly I get yi, which is the double log: ln of ln
of 1 upon 1 minus F(ti), or equivalently ln of minus ln of 1 minus F(ti); we can
calculate it either way. I have already calculated it, but I will enter it again to show
you: ln of ln of 1 divided by 1 minus F(ti). And these become my values of yi.

Now, I know xi and yi. I have already plotted them here, but for clarity I will do it
again. I select the data, xi versus yi, go to Insert, and choose the XY scatter plot
type. If you see, I get the XY scatter diagram here.

The next step, as you remember, is to fit the trendline, and the type of trendline is
linear, since we want a straight line fit. For the exponential distribution we had to set
the intercept to 0, but for the Weibull distribution we do not set the intercept to 0,
because the Weibull line has an intercept.

So, the intercept has to be calculated. We display the equation and the R squared value,
and we get the straight line fit. My m value comes out to be 1.8001, and my c value is
minus 9.1484. From the earlier discussion we know that m is the same as beta, so this m
value is also the beta value. Now, I want to calculate the theta value.

As we discussed earlier, theta is equal to the exponential of minus of y bar minus beta x
bar, divided by beta. Where is y bar? I have averaged the yi here, so this is my y bar.
x bar I have also calculated here, as the average value of x. So, I take y bar minus beta
into x bar, divide by beta, change the sign and take the exponential, and I get the value
of theta. Theta is the characteristic life.

If I want, I can also read theta from the graph. As you remember, if I put t equal to
theta in the CDF, it again becomes 1 minus e to the power minus 1, as for the exponential
distribution; that means 63.2 percent, for which ln of 1 upon 1 minus F is 1.

The log of that is 0, so wherever the line crosses y equal to 0, that is your value. Here
it is somewhere around 5, but this 5 is on the log scale, because I have plotted ln of t
rather than t. So, I have to take the exponential to get the corresponding t value. The
crossing at 0 is around 5.1, and the exponential of 5.1 is around 164, against the
calculated value of around 161. This difference comes because I am not able to read the
graph accurately, so it is better to calculate theta directly.
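
The two ways of getting theta can be compared numerically using the trendline values quoted above (m = 1.8001, c = -9.1484) and the zero crossing read from the chart (about 5.1). The variable names are illustrative.

```python
import math

m, c = 1.8001, -9.1484     # slope and intercept from the fitted trendline

beta = m                           # shape parameter
theta_calc = math.exp(-c / beta)   # theta from c = -beta * ln(theta)

x_cross = 5.1                      # ln(t) where the line crosses y = 0, read by eye
theta_graph = math.exp(x_cross)    # corresponding t value

print("theta (calculated) =", round(theta_calc, 1))
print("theta (from graph) =", round(theta_graph, 1))
```

The exact crossing point is -c/m, roughly 5.08, so reading it as 5.1 is what shifts the graphical estimate upward.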

So, here we are able to get beta and theta from the graph. Now, let us see the approach
based on the statistics. As we discussed earlier, the numerator of the formula for beta
is the summation of xi yi minus n into x bar y bar.

So, as we did for the exponential, I have calculated xi yi here, which is nothing but xi
multiplied by yi. The other value I need, in the denominator, is the summation of xi
square. x bar and y bar we get from the bottom of the table, so there is no need to
calculate them again: this is my x bar, the average value of xi, and this is my y bar,
the average value of yi.

For the denominator I need the summation of xi squared, so I first calculate xi squared here,
as xi to the power 2. The summation of xi squared is the total of this column. So, now we have
all the values, including the summation of xi yi, which is this sum.

So, how much is beta? I can calculate it again for your reference. In the numerator we have
the summation of xi yi minus n into x bar y bar, where this is my x bar and this is my y bar.
In the denominator we have the summation of xi squared minus n into x bar squared.

So, this is x bar, and this is x bar squared. We get 1.8, the same value as before. Fewer
decimal places are shown here, so I will increase the number of decimals for your reference,
and you can see 1.8001. So, the m value is beta, and I am able to calculate it directly.
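
The spreadsheet steps above can be sketched in code. This is only an illustration under stated assumptions: the function name `fit_weibull_lse` and the sample times are hypothetical, and F(ti) is taken from the median-rank approximation (i minus 0.3) divided by (n plus 0.4), which may differ from the plotting position used in the sheet.

```python
import math

def fit_weibull_lse(times):
    """Least-squares Weibull fit on the linearised CDF; returns (beta, theta)."""
    t = sorted(times)
    n = len(t)
    # Assumed plotting position: median-rank approximation (i - 0.3) / (n + 0.4).
    F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    x = [math.log(ti) for ti in t]                          # x_i = ln t_i
    y = [math.log(math.log(1.0 / (1.0 - Fi))) for Fi in F]  # y_i = ln ln(1 / (1 - F))
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # beta = (sum x_i y_i - n x_bar y_bar) / (sum x_i^2 - n x_bar^2)
    beta = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
        sum(a * a for a in x) - n * x_bar ** 2)
    c = y_bar - beta * x_bar          # intercept of the fitted line
    theta = math.exp(-c / beta)       # characteristic life
    return beta, theta

# Hypothetical failure times (hours), for illustration only.
sample = [31.0, 58.0, 84.0, 110.0, 137.0, 168.0, 205.0, 260.0]
beta_hat, theta_hat = fit_weibull_lse(sample)
print(f"beta = {beta_hat:.4f}, theta = {theta_hat:.2f}")
```

Feeding the function times that lie exactly on a Weibull line should return the generating beta and theta, which is a quick way to check the formulas.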

And what is my R squared value? Again, I will use the formula which we have seen earlier:
R squared is 1 minus the summation of (yi minus m xi minus c) whole squared, divided by the
summation of (yi minus y bar) whole squared.
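
As a small sketch of this formula (the function name `r_squared` and the numbers are illustrative, not taken from the sheet), the calculation for any fitted line y = m x + c could look like this:

```python
def r_squared(x, y, m, c):
    """R^2 = 1 - sum (y_i - m x_i - c)^2 / sum (y_i - y_bar)^2."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - m * xi - c) ** 2 for xi, yi in zip(x, y))  # squared errors
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                   # total variation
    return 1.0 - ss_res / ss_tot

# Illustrative points and an illustrative line, not lecture data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print(r_squared(xs, ys, m=1.94, c=0.15))
```

A line through every point gives R squared equal to 1, while a flat line at y bar gives 0.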

So, I will do the same thing here. This is already done, but let me do it again. It is equal to
1 minus the summation of (yi minus m xi minus c) squared. If you see here, I have already
calculated yi minus m xi minus c. So, let us see this again: yi minus c minus m xi.

So, I have done the same thing: this is my yi, minus m into xi. This is xi, multiplied by the
value of m, which I have already calculated here. Since I do not want this reference to change
as I fill the column, I will put the dollar sign on m. Then minus c; the value of c I have also
calculated already, and I do not want that to change either, so I will put the dollar sign
again.

Once I do that, this becomes my error value for each yi. I can take the sum of these values
here, but I have to take the square of each, so I will do the square here itself. As you have
seen, if you do not square, the error sum is 0, which is natural, because we fit the trendline
which minimises the error, so the positive and negative errors approximately cancel and sum to
0. So, here we are able to get this.

And we want to calculate (yi minus y bar) squared. So, this is equal to yi minus y bar, where
y bar is the average value of y, and the whole square of this. The c value we have already
calculated; if you know the formula, c is y bar minus m x bar.

I can do this again: c is equal to the average of y minus m into the average of x. This is m
and this is the x average, so we get this. I think I made some error here. The error term is
yi minus m xi minus c, where c is equal to y bar minus beta into x bar.

So, once we are able to get this, then theta... let me just check where I made the mistake. The
value of c is minus 9.15, which is y bar minus cell F20, which is beta, into x bar. This is the
same. Then we get these values, (yi minus beta xi minus c) squared. Maybe I made some error
there, and that is where the value got changed.

So, we have the beta value, the R squared value and the c value, and from c I can calculate
theta again in the same way as before. If you see, c is the same value, minus 9.1484, and theta
again comes out the same: exponential of minus (y bar minus beta x bar) divided by beta, which
is 161.11.

So, we are able to calculate the values using these calculations, or we can use the Excel
trendline fitting, which gives the same result. We have done this for the Weibull distribution
and for the exponential distribution. Let us now see how to do this for other distributions,
using the same kind of plot.

(Refer Slide Time: 23:06)

Now, let us go to the normal distribution. The only thing we have to change here is the F(t)
equation; the final formula and everything else stay the same. What changes is how I calculate
xi and yi from the values of ti and F(ti).

Now, for the normal distribution we know that the cumulative distribution F(t) is given by the
standard normal cumulative value phi of z, where z is (t minus mu) divided by sigma. So, here I
can take phi inverse of F(t), which gives me z. So, z is equal to phi inverse of F(t), and z is
(t minus mu) divided by sigma.

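CDF is given as

$$F(t) = \Phi\left(\frac{t - \mu}{\sigma}\right)$$

Converting this to straight line equation

$$\Phi^{-1}\big(F(t_i)\big) = \frac{1}{\sigma}\,t_i - \frac{\mu}{\sigma}, \qquad y_i = \Phi^{-1}\big(F(t_i)\big),\quad x_i = t_i,\quad m = \frac{1}{\sigma},\quad c = -\frac{\mu}{\sigma}$$

Using LSE,

$$\hat{\sigma} = \frac{1}{\hat{m}}, \qquad \hat{\mu} = -\hat{\sigma}\,\big(\bar{y} - \hat{m}\,\bar{x}\big)$$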

So, from here, for an individual ti value, phi inverse of F(ti) is equal to ti divided by sigma
minus mu divided by sigma. This is my straight line, where yi is equal to phi inverse of F(ti)
and xi is simply ti. I do not have to take the double ln here.

So, xi is ti, m is equal to 1 upon sigma, and c is equal to minus mu divided by sigma. Once I
have done this, I can use the same formula and the same sheet with a few small changes. The
formula for m is the same; only its meaning changes, since m is now equal to 1 upon sigma.

So, my sigma value will be 1 upon m, because m is 1 upon sigma. In the same way, c is minus mu
upon sigma, and as we discussed, c is y bar minus m cap x bar. So, my mu value will be equal to
minus sigma into (y bar minus m cap x bar). The slide shows beta x bar, but that is a
typographical error: it is not beta, it is m cap x bar. So, we can do this in the same way.
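
A minimal sketch of this normal fit follows. The function name `fit_normal_lse` and the sample times are hypothetical, F(ti) is assumed to come from the median-rank approximation (i minus 0.3) divided by (n plus 0.4), and Python's `statistics.NormalDist().inv_cdf` plays the role of phi inverse.

```python
from statistics import NormalDist

def fit_normal_lse(times):
    """Least-squares normal fit with x_i = t_i, y_i = Phi^-1(F(t_i)); returns (mu, sigma)."""
    t = sorted(times)
    n = len(t)
    # Assumed plotting position: median-rank approximation.
    F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    x = t
    y = [NormalDist().inv_cdf(Fi) for Fi in F]   # standard normal inverse
    x_bar, y_bar = sum(x) / n, sum(y) / n
    m = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
        sum(a * a for a in x) - n * x_bar ** 2)
    c = y_bar - m * x_bar
    sigma = 1.0 / m        # because m = 1 / sigma
    mu = -sigma * c        # because c = -mu / sigma
    return mu, sigma

# Hypothetical failure times, for illustration only.
sample = [62.0, 68.0, 71.0, 75.0, 78.0, 80.0, 83.0, 86.0, 91.0, 97.0]
mu_hat, sigma_hat = fit_normal_lse(sample)
print(f"mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```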

(Refer Slide Time: 25:43)

So, let us see how we do it if we have the data. One way is to copy the whole sheet again. Let
me show you how I generally do this: create a copy and move it to the end. Let us say this is
the normal distribution sheet, where I want to fit the data to the normal distribution.

Now, I already have the data here. I have 20 failure data points, but this table was made for
15, so I will insert 5 more rows. Once I have inserted the 5 rows, I take this data, which is
the failure data I have observed. That means I have 20 failure data points, and I want to fit
them to the normal distribution.

So, this is my basic data, i and ti, and based on that F(ti) and the rest have to be
calculated. The formulas for xi yi, xi squared and so on are all going to be the same. What
changes is xi and yi. The F(ti) formula is also the same; the only change is that n becomes 20.

Because I now have 20 data points, n has become 20. The same formula is applicable here, and I
get F(ti). Now, how do I change xi? Here xi is equal to ti. And what is yi? Here yi is phi
inverse of F(ti). That means I have to take the normal inverse value.

Excel has a function for the normal inverse, NORM.INV. So, I will use NORM.INV; the probability
for which I want the value is F(ti), and the distribution I want to use is the standard
normal distribution.

For the standard normal distribution the mean is 0 and the standard deviation is 1. With this I
get the new yi. Once I have the new yi, I do not have to calculate xi yi and the other columns
again, because the same formulas are applied to the new xi and yi values. The slope I get here
is 1 upon sigma, which I call m, and the c value is obtained in the same way. From m and c I
can now calculate the parameters.

So, rather than beta and theta, I will calculate sigma and mu. As we know, sigma, the standard
deviation, is equal to 1 divided by m. And the mean value mu, from this formula, is minus sigma
into (y bar minus m x bar).

So, that is equal to minus sigma into (y bar minus m into x bar), where this is my y bar and
this is my x bar. In the formula this will not be beta; it will be m, so m x bar. So, my mu
becomes 79.44.

So, we are able to use the same sheet here, and mu comes out to be 79.44. If you see here, m is
0.1238 and c is minus 9.838. I do not need theta here, so I will not use it. And the R squared
value is again 0.9573; the value I have calculated is also 0.9573.

So, as you see here, the approach is the same and the basic formulas for calculating m and c
are the same, but the meaning of m and the meaning of c may differ from distribution to
distribution.

And because of that, when we use a different distribution, the distribution parameters have to
be calculated differently. Here sigma is 1 upon m, whereas for Weibull beta was equal to m and
for exponential lambda was equal to m. Similarly, for mu I have the formula minus sigma into
(y bar minus m x bar), which is what I have used here. So, this we have done for the normal
distribution.

(Refer Slide Time: 31:30)

The lognormal distribution is very similar to the normal distribution. The only difference
between lognormal and normal is the F(t) formula, because z becomes (ln of t minus ln of
t median) divided by s. As we know from the lognormal distribution, z is given like this, and
it is equal to phi inverse of F(ti).

So, in the same way, phi inverse of F(ti) will be equal to ln of ti divided by s minus ln of
t median divided by s, where the two parameters are t median and s. Now, what has changed? For
the normal distribution xi was ti, but for the lognormal distribution xi is ln of ti, and yi is
the same, phi inverse of F(ti). Another change is that m here is 1 upon s, and c is minus ln of
t median divided by s.

So, once I calculate m, 1 upon m gives me s. Once I have the s value, I can calculate
t median: since c is minus ln of t median divided by s, ln of t median is equal to minus s into
c. And c is equal to y bar minus m x bar, so ln of t median is minus s into (y bar minus
m x bar).

Again, where beta is written on the slide it should be m, so m x bar. So, t median is equal to
e to the power minus s into (y bar minus m x bar), where m and s are both estimated values. I
can write m as 1 upon s, or s as 1 upon m; both are the same. I have already done the same
thing for the lognormal distribution.

(Refer Slide Time: 33:50)

So, for the lognormal distribution, if you see here, I have copied the same sheet. What changes
do I have to make? Let us say I fit the same data to the lognormal distribution. The first 3
columns are going to be the same; there is no change, because i, ti and F(ti) are determined by
the data. What changes is that xi becomes ln of ti.

So, I will make this ln of ti. And what is y? As we discussed earlier, y is the same for the
normal and the lognormal distribution: it is phi inverse of F(ti). I thought there was a
mistake here and that we had to take phi inverse of 1 minus F(ti), but no, it is phi inverse of
F(ti) only.

So, we take phi inverse of F(ti), and the rest of the columns, xi yi, xi squared and so on, all
remain the same. As you see here, this is my lognormal fit now; earlier it was the normal fit.

So, for the lognormal fit, m is equal to 1 upon s, so s will be equal to 1 upon m, and c is
this. Now I want to calculate t median. We will use the formula we have seen: t median is e to
the power minus s into (y bar minus m x bar). That is the exponential of minus the value of s
multiplied by (y bar, which is this, minus m into x bar).

So, this gives me t median, which is 80. So, we are able to get the parameters s and t median
of the lognormal distribution for the same data. And if you look at the fit value, R squared is
0.9739 here, while for the normal distribution it was 0.9573. So, the fit is better for the
lognormal distribution than for the normal distribution, because 0.9739 is greater than the
earlier value of 0.9573.

So, the higher the R squared, the better the fit. We can see that this data fits the lognormal
distribution better, and we can assume that this data follows the lognormal distribution
rather than the normal distribution.
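
The comparison above can be sketched in code. Everything named here is illustrative: `probability_plot_fit` is a hypothetical helper, the sample times are made up, and F(ti) is assumed to be the median-rank approximation. The same routine fits both distributions; only the x transform and the meaning of m and c differ.

```python
import math
from statistics import NormalDist

def probability_plot_fit(times, log_time):
    """Fit y = m x + c with y = Phi^-1(F); x = ln t if log_time else t.

    Returns (m, c, r2)."""
    t = sorted(times)
    n = len(t)
    F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]  # assumed median ranks
    x = [math.log(ti) if log_time else ti for ti in t]
    y = [NormalDist().inv_cdf(Fi) for Fi in F]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    m = (sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar) / (
        sum(a * a for a in x) - n * x_bar ** 2)
    c = y_bar - m * x_bar
    ss_res = sum((yi - m * xi - c) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return m, c, 1.0 - ss_res / ss_tot

# Hypothetical failure times, for illustration only.
sample = [22.0, 35.0, 41.0, 52.0, 60.0, 74.0, 88.0, 110.0, 150.0, 240.0]

m_n, c_n, r2_normal = probability_plot_fit(sample, log_time=False)
m_l, c_l, r2_lognormal = probability_plot_fit(sample, log_time=True)

s_hat = 1.0 / m_l                    # lognormal shape, s = 1 / m
t_med_hat = math.exp(-s_hat * c_l)   # t_med = exp(-s (y_bar - m x_bar))
print(f"normal R^2 = {r2_normal:.4f}, lognormal R^2 = {r2_lognormal:.4f}")
print(f"s = {s_hat:.4f}, t_med = {t_med_hat:.2f}")
```

R squared is only a relative indicator of which straight line fits better on the transformed scale, so it is best read alongside the plot itself.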

And once we assume that, we can use the parameters of the lognormal distribution. We will know
the reliability, the failure rate, z(t), f(t), everything, as we discussed when we covered this
distribution in the initial classes, maybe in the second or third week.

So, once we get the parameters from here, whatever discussion we had in the earlier weeks
applies. The values come from the data analysis here and are used over there.

And once you use them there, you will be able to get all the important reliability or
unreliability characteristics needed for decision making. Here we are discussing the time to
failure, but if we consider time to repair data instead, nothing changes, except that the
distribution is for the time to repair. Only the variable name changes. We can calculate the
maintainability, MTTF becomes MTTR, and we can use these values as required in various
analyses.

So, with this we have discussed distribution fitting. We have discussed four distributions
here, exponential, Weibull, normal and lognormal, which are the most commonly used
distributions in the analysis of time to failure data as well as time to repair data, the major
data for reliability, maintainability and availability analysis. So, we will stop here today,
and we will continue our discussion of distribution fitting, maintainability, availability
etcetera in the coming lectures. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture: 32
Failure Data Analysis (Parametric) (Contd.)

(Refer Slide Time: 00:26)

Hello everyone, we have been discussing how to fit distributions to failure data, which may be
time to failure data or time to repair data. Once we have time to failure data, we are able to
know the reliability; once we have time to repair data, we are able to know the
maintainability; and if we know these two, we get an understanding of the availability also.
So, here we will continue our discussion. Last time we discussed least squares estimation,
where, after converting the equation into a straight line equation, we fit the line which gives
the least error. Today we will discuss maximum likelihood estimation.

(Refer Slide Time: 01:17)

So, as we discussed, LSE, or least squares estimation, provides an estimate of the parameters.
We are able to do that, but it is not the best method, because we are doing a lot of
conversions: we first have to convert the equation into a straight line equation and then see
which one fits. It may give a reasonable estimate when the data set is large and complete, but
when censored data is present, or when the sample size is small, LSE may not provide a good
estimate. That means the estimation error tends to be large: the estimate may differ from the
true value of the process by a somewhat larger amount.

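For complete data

$$L(\Theta) = \prod_{i=1}^{n} f(t_i; \Theta)$$

For right censored data ($r$ failed out of $n$)

$$L(\Theta) = \prod_{i=1}^{r} f(t_i; \Theta)\,\big[R(t_s; \Theta)\big]^{\,n-r}$$

For multiply censored data

$$L(\Theta) = \prod_{i \in F} f(t_i; \Theta)\,\prod_{i \in C} R(t_i^{+}; \Theta)$$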

So, MLE is considered to be a better method of estimation. However, MLE requires a lot of
mathematical manipulation and computation, as we will see. MLE, as the name says, is the
maximisation of the likelihood: by maximising the likelihood we try to find the parameters. For
parameter estimation this should be the primary method if possible. Here we have the likelihood
equation for complete data, which means all the data are failure data. It is nothing but the
joint probability density function of all the times to failure, or times to repair, which you
have observed.

So, if you have n devices put on test and all n devices fail, at times t1, t2, up to tn, you
will have a combined distribution of these. Generally, we assume that the failure times are
independent of each other. Since we are assuming independence, once we multiply the PDF for
each time to failure, we get the combined PDF for the whole observation set, and that makes our
likelihood. So, for failure data we use this product of density functions. In the case of
censoring, let us say we put n units on test and r failed, which means n minus r units are
still working. For right censored data, as we discussed, there are two types: type 1 is time
censored and type 2 is failure censored. In time censoring, the test time is fixed, let us say
ts.

So, whatever number of failures happen up to that time will be recorded, and the test stops
there. In failure censoring, we decide on a number of failures, say r, and when the rth failure
happens the test stops. It makes very little difference. Whatever the time at which you are
censoring, at that time r devices have failed and n minus r devices are still working. That
means n minus r devices are reliable up to that time. So, in the likelihood equation with
censoring, for the censored data we use the reliability equation, and for the failed data we
use the PDF equation. For failed data we take the PDF value at each failure time and multiply.

So, for each failure point we have the corresponding PDF value, which we multiply in, and for
each censored point we multiply in the reliability. For right censored data all points are
censored at the same place: n minus r units have been censored at, let us say, time t plus, or
ts, whichever notation we use. The reliability at that time is multiplied in for all censored
units, so that becomes R(ts) to the power n minus r. So, r units go into the first product and
n minus r units go into the second. It is similar for multiply censored data, which is the
general case: for all the censored units the reliability is multiplied, for all the failed
units the probability density function is multiplied, and that makes our likelihood.

Now, we want to get the values of theta 1, theta 2, theta 3, which are the set of parameters of
the distribution. The exponential distribution has only one parameter, so theta 1 is lambda.
The two-parameter Weibull distribution has two parameters, so let us say theta 1 is theta and
theta 2 is beta. For the normal distribution the parameters are mu and sigma; for lognormal
they can be t median and s. So, the likelihood equation is a function of these parameters,
because everything else is supplied.

So, t1, t2, t3 and so on are all known, because they are the observations. The only unknowns in
this equation are the parameters of the distribution, and we want to find the values of these
parameters at which the likelihood is highest. So, we want to maximise the likelihood over the
set of parameters. Here capital theta is the set of parameters, theta 1, theta 2, or whatever
parameters are involved. How do we maximise any function? We take the derivative of the
function and equate it to 0, because at the maximum we expect the slope to be 0. But maximising
the likelihood function directly is not advisable, because it involves a lot of
multiplications, as we see here.

So, we use the property that wherever the likelihood is maximum, the log of the likelihood is
also maximum. The log likelihood has its maximum at the same point as the likelihood. So,
rather than maximising the likelihood, we maximise the log likelihood. We take the log and then
take the partial derivative with respect to each parameter. If we have two parameters, we get
two equations; if we have one parameter, we get one equation.

When we solve the two equations, we get the values of the two parameters. But generally these
equations tend to be nonlinear, and in that case we often cannot solve them by direct
subtraction, multiplication et cetera. We may have to use numerical methods, and the most
popular or easiest one is the Newton-Raphson method. We can also use other approaches, like
quick search et cetera, to find a solution at which the derivative is 0. In this lecture series
we are not going into much detail on exactly how to do this.

To solve these equations, whatever statistical package you are using, MATLAB, Minitab or any
other package with a maximum likelihood estimation facility, will have inbuilt functions that
give you the solution directly. We will only solve cases where the solution is possible in a
simpler way, using pen and paper or simple Excel sheets. So, let us take the log of the
likelihood, ln of L, as an example.

So, this becomes ln of (the product of f(ti), where i is an element of the failure domain,
times the product of R(ti), where i is an element of the censored domain). When we take the ln
inside, this becomes the summation over the failures of ln of f(ti), because the log of a
product is the sum of the logs: ln of (R1 into R2) is equal to ln of R1 plus ln of R2. We have
used the same here. To this we add the summation over the censored units of ln of R(ti).
Similarly, when we have a particular function, we can expand this further, find ln of L, and
then differentiate it. We will do this for one distribution so that you get an exposure to how
this is solved.
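
As a minimal sketch of this sum (the function name `log_likelihood` and the numbers are illustrative assumptions), the log likelihood for data with failures and censored units can be written for any distribution whose PDF and reliability function are supplied:

```python
import math

def log_likelihood(failures, censored, pdf, reliability):
    """ln L = sum over failures of ln f(t_i) + sum over censored units of ln R(t_i)."""
    return (sum(math.log(pdf(t)) for t in failures)
            + sum(math.log(reliability(t)) for t in censored))

# Example with the exponential distribution and an illustrative rate.
lam = 0.01
value = log_likelihood(
    failures=[100.0],
    censored=[50.0],
    pdf=lambda t: lam * math.exp(-lam * t),
    reliability=lambda t: math.exp(-lam * t),
)
print(value)
```

With no censored units the second sum is empty and the expression reduces to the complete-data case.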

(Refer Slide Time: 11:31)

So, let us talk about the exponential MLE. For this we need to know f(t): f(t) is lambda e to
the power minus lambda t, which we already know. And R(t) is e to the power minus lambda t.
These are the two functions we have for the exponential distribution. Now, for the exponential
distribution the log likelihood, ln of L, becomes the summation of ln of f(ti), where i belongs
to the failure domain, plus the summation of ln of R(ti) over i for the censored data. We can
use this function and solve it. The first term is the summation, where i belongs to F, of ln of
f(ti).

So, that becomes the summation, i an element of F, of ln of (lambda e to the power minus lambda
ti), plus the summation, i an element of C, of ln of (e to the power minus lambda ti). The
first summation becomes the summation over F of ln of lambda minus the summation over F of
lambda ti, because ln of e to the power minus lambda ti is minus lambda ti. In the same way,
the second becomes the summation over C of minus lambda ti. Putting this together, we get the
equation for ln of L. In this equation there is only one parameter, lambda, so we take the
partial derivative of ln of L with respect to lambda.

So, for the failure data the first term becomes the summation of 1 upon lambda. The second
term, lambda into ti, becomes ti when differentiated, with the minus sign already in front. In
the same way, the censored term, differentiated with respect to lambda, becomes minus the
summation of ti over i an element of C. This is the same equation we were talking about. Now,
let us say we have r failures, and for right censoring, the rest of the devices, n minus r, are
censored at time ts.

If that is the case, how much will delta ln of L over delta lambda be? This equation can be
used for every type of data: multiply censored, singly censored, right censored. Now i runs
from 1 to r, because there are r failures. Summing 1 upon lambda r times gives r upon lambda.
From this we subtract the summation of ti for i equal to 1 to r, which is nothing but the sum
of the failure times, and then we subtract a second summation for the units which have not
failed.

So, the rest of the data, from r plus 1 to n, are the censored times, which I write as ti plus.
This derivative has to be set equal to 0. If we solve this, r upon lambda is equal to the
summation of ti for i equal to 1 to r plus the summation of ti plus for i equal to r plus 1 to
n. I am writing ti plus so that you can distinguish the censored data. Solving further, lambda
is equal to r divided by (the summation of ti for i equal to 1 to r plus the summation of
ti plus for i equal to r plus 1 to n).
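
Written out compactly, the derivation above reads as follows. Here F and C denote the sets of failed and censored units, and T is shorthand, introduced here, for the total operating time.

```latex
\begin{aligned}
\ln L(\lambda) &= \sum_{i \in F} \ln\!\left(\lambda e^{-\lambda t_i}\right)
                + \sum_{i \in C} \ln\!\left(e^{-\lambda t_i^{+}}\right)
               = r \ln \lambda - \lambda \sum_{i \in F} t_i - \lambda \sum_{i \in C} t_i^{+} \\
\frac{\partial \ln L}{\partial \lambda} &= \frac{r}{\lambda} - \sum_{i \in F} t_i - \sum_{i \in C} t_i^{+} = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{r}{\sum_{i \in F} t_i + \sum_{i \in C} t_i^{+}} = \frac{r}{T}
\end{aligned}
```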

So, this is the formula which we have discussed a lot, as the observed failure rate which we
calculated earlier. Here r is the number of failures, the first sum is the total or cumulative
failure time, and the second sum is the total operating time of the censored units. Together,
the total failure time and the total censored time make up the total operating time: how many
cumulative hours, weeks, days or years the devices have built up during the test. The number of
failures divided by this gives us lambda.

So, lambda is r divided by T, where r is the number of failures and T is the cumulative
operating time, which is the summation of the failure times plus the summation of the censored
times. If all n minus r censored units are censored at ts, then in the summation of ti plus for
i equal to r plus 1 to n there are n minus r data points, all with ti plus equal to ts, so it
becomes n minus r into ts. That is the total time spent on test by the censored devices. So,
whatever data you have, you can use this formula. It matters because most of the time, in
practical applications, many calculations inherently assume a constant failure rate, that is,
the exponential distribution.

So, when we assume the exponential distribution, this becomes the formula for calculating the
failure rate, and MTTF is 1 upon lambda. The estimated MTTF is 1 upon lambda cap, which is equal
to T divided by r, that is, the total cumulative time divided by the number of failures. But,
as we discussed, if the assumption of an exponential distribution is violated, this formula
will not be valid, so we should be careful when using these formulas. I am not going into more
examples of solving these equations.

The reason is that we have already done this for lambda: we can apply this formula directly to
the data, and I have already solved one problem of this kind.
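
A short sketch of this formula in code is given below. The function name `exponential_mle` and the test data are hypothetical; the example assumes a time-censored test in which the surviving units are all removed at ts.

```python
def exponential_mle(failure_times, censored_times=()):
    """Return (lambda_hat, mttf_hat) with lambda_hat = r / T."""
    r = len(failure_times)                               # number of failures
    total_time = sum(failure_times) + sum(censored_times)  # cumulative operating time T
    lam = r / total_time
    return lam, 1.0 / lam

# Hypothetical test: n = 5 units, r = 3 failures, survivors censored at t_s = 400 h.
failures = [100.0, 200.0, 300.0]
t_s, n = 400.0, 5
censored = [t_s] * (n - len(failures))   # (n - r) units each contribute t_s
lam_hat, mttf_hat = exponential_mle(failures, censored)
print(f"lambda = {lam_hat:.6f} per hour, MTTF = {mttf_hat:.1f} hours")
```

The estimate is only meaningful if the constant failure rate assumption holds for the data.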

(Refer Slide Time: 20:18)

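$$g(\hat{\beta}) = \frac{\sum_{i=1}^{r} t_i^{\hat{\beta}} \ln t_i + (n-r)\,t_s^{\hat{\beta}} \ln t_s}{\sum_{i=1}^{r} t_i^{\hat{\beta}} + (n-r)\,t_s^{\hat{\beta}}} - \frac{1}{\hat{\beta}} - \frac{1}{r}\sum_{i=1}^{r} \ln t_i = 0$$

$$\hat{\theta} = \left\{ \frac{1}{r}\left[ \sum_{i=1}^{r} t_i^{\hat{\beta}} + (n-r)\,t_s^{\hat{\beta}} \right] \right\}^{1/\hat{\beta}}$$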

Similarly, for the Weibull distribution, getting the MLE becomes a little tricky, because we
get two equations. Fortunately, when we solve them, we are able to eliminate theta from the
first equation: there is no theta in it. We differentiate with respect to beta and theta,
following the same process as before. I am not replicating that here, because it is a bit
tedious, but you can try it. It will not take much time, and you can verify whether you get the
same result by following the same process as earlier.

So, if you see, this is a nonlinear equation. Had it been a linear equation, we could have
moved terms from one side to the other, multiplied, divided and subtracted, and got the value
of beta directly. But since it is nonlinear, we have to use numerical methods like
Newton-Raphson. Or we can sometimes do a quick search, where we start with two values, one
giving a positive value of g(beta) and the other giving a negative value. That means the
solution lies somewhere in between.

So, then we keep on changing the interval, or shorten the interval by tying the midpoint if
midpoint is positive, then it replace the positive value side. If it is negative, then it replace the
negative value side. So again, trying the midpoint, midpoint, and that is also we can reach to
the solution or we can use the Newton Raphson here. Once you know the beta value here,
then theta value can be directly calculated using this formula. But to calculation of beta, you
have to use the Newton Raphson method or some other approach to get our guests values you
have to put in try to do the hit and trial so that you are able to select a value beta which will
give you this 0. This again, I am not going exactly how to do it we can do if you want but we
are not covering it here.
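
The midpoint search described above can be sketched in a few lines of Python. This is only an illustration, not the procedure used in the course: the failure times, n and t_s below are made-up values, the function names are hypothetical, and g is assumed to have the standard right-censored form, that is the ratio on the slide minus 1/beta minus the average of ln t_i over the r failures.

```python
import math

def g(beta, failures, n, t_s):
    """Assumed Weibull MLE equation for beta with n - r units censored at t_s."""
    r = len(failures)
    num = sum(t**beta * math.log(t) for t in failures) + (n - r) * t_s**beta * math.log(t_s)
    den = sum(t**beta for t in failures) + (n - r) * t_s**beta
    return num / den - 1.0 / beta - sum(math.log(t) for t in failures) / r

def solve_beta(failures, n, t_s, lo=0.05, hi=20.0, tol=1e-10):
    """Interval halving: keep one end where g is negative and one where it is positive."""
    g_lo, g_hi = g(lo, failures, n, t_s), g(hi, failures, n, t_s)
    if g_lo * g_hi > 0:
        raise ValueError("g has the same sign at both ends; choose a wider interval")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = g(mid, failures, n, t_s)
        if g_mid * g_lo > 0:      # midpoint has the same sign as the lower end
            lo, g_lo = mid, g_mid
        else:                     # midpoint has the same sign as the upper end (or is a root)
            hi = mid
    return 0.5 * (lo + hi)

def theta_from_beta(beta, failures, n, t_s):
    """Closed-form theta once beta is known."""
    r = len(failures)
    total = sum(t**beta for t in failures) + (n - r) * t_s**beta
    return (total / r) ** (1.0 / beta)

# Hypothetical data: 10 units on test, test stopped at the 7th failure.
failures = [15.0, 32.0, 48.0, 61.0, 77.0, 94.0, 120.0]
n, t_s = 10, 120.0
beta_hat = solve_beta(failures, n, t_s)
theta_hat = theta_from_beta(beta_hat, failures, n, t_s)
print(beta_hat, theta_hat)
```

Interval halving is slower than Newton-Raphson but needs no derivative of g, which is why it is convenient for a quick check in a spreadsheet or a short script.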

(Refer Slide Time: 22:39)

Similarly, let us discuss two other distributions, the normal and the lognormal. For these
the MLE is a straightforward formula, which we generally use in an Excel sheet or anywhere
else. Mu, the mean value, is nothing but the sample mean, and sigma squared, the square of the
second parameter (the standard deviation), is nothing but the summation of (t_i minus mu)
whole square divided by n. Here mu is nothing but the MTTF, the mean time to failure, if
time-to-failure data are involved. Expanding the square, sigma squared becomes (summation of
t_i square minus n mu square) divided by n. So, this gives us sigma squared.

So, we can evaluate mu and sigma directly using these formulas. Similarly, we know that for
the lognormal distribution, if you take the log of the time, it follows the normal
distribution. So, if we take the log of all the observations and then do the same exercise as
we do for the normal, we get the lognormal fit. That means, for all the t_i which I have here,
t_1, t_2 and so on, ln of t_i follows the normal distribution while t_i follows the lognormal
distribution. So, what change do I have to make? I write the standardized variable as z equal
to (ln of t minus ln of t median) divided by s.

So, in that case, rather than averaging the t_i to get x bar, I first take the logarithm of
each t_i and then take the average. That average gives me mu. What is mu here? Mu is equal to
ln of t median. So, how much is the median? The median is e to the power mu: whatever mu you
get, e to the power mu gives you t median. The other parameter, s, is the same as sigma
before, but again using ln of t_i rather than t_i: the summation of (ln of t_i minus ln of t
median) whole square, that is (ln of t_i minus mu cap) whole square, divided by n gives s
squared, and its square root is s.
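
As a small illustration of these complete-data formulas, the sketch below computes mu and sigma for a normal fit and t median and s for a lognormal fit. The function names and the failure times are hypothetical, chosen only to show the arithmetic.

```python
import math

def normal_mle(times):
    """MLE for complete data: sample mean and the divide-by-n standard deviation."""
    n = len(times)
    mu = sum(times) / n
    sigma_sq = sum((t - mu) ** 2 for t in times) / n
    return mu, math.sqrt(sigma_sq)

def lognormal_mle(times):
    """Fit a normal to ln(t); the median is exp(mu) and s is the sigma of the logs."""
    mu, s = normal_mle([math.log(t) for t in times])
    return math.exp(mu), s

# Hypothetical complete (uncensored) failure times.
times = [120.0, 185.0, 240.0, 310.0, 455.0, 610.0]
print(normal_mle(times))      # (mu, sigma)
print(lognormal_mle(times))   # (t_median, s)
```

Note that the divisor is n, not n - 1, because this is the maximum likelihood estimate rather than the unbiased sample variance.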

So, these calculations we are able to do when complete data are there, because for complete
data only f(t) is needed. But when data are censored, as we have seen earlier, we need to know
R(t) at the censoring times. R(t) is not known in closed form for the normal distribution: we
know the PDF, but we do not have the CDF in a closed form, so the CDF is always evaluated by
numerical methods. So, if you want to use censored data here, you again have to use numerical
methods to evaluate the CDF and, from it, find the values of mu and sigma at which the first
derivatives of the log-likelihood with respect to the two parameters are 0.

The difficulty is that we do not have a closed form for F(t). The standard normal PDF is
f(z) equal to 1 upon under root of 2 pi, multiplied by e to the power minus z square by 2,
and this function has no closed-form integral. Because of that we only have the PDF; the CDF
is either determined by referring to tables which have been obtained numerically, or we have
functions in software which use numerical methods to give this value. That is why, in the case
of censoring, we may find it difficult to use the MLE method for normal and lognormal. But it
can be done with software; if you use some programming language you can apply numerical
methods, and you can do it in Excel also.

But in this class we are not doing it. Maybe in some future class, when we discuss this
concept in more detail, we will try to do it, but here I do not want to go into that detail.
This is an introductory course with limited scope, so we will not be going into much detail
here. Today we saw some ways in which we are able to use MLE. As we have seen, for MLE we have
to develop the likelihood equations, take the derivatives and then solve the resulting
equations, which can sometimes turn out to be nonlinear.

But for complete data, in the case of normal and lognormal, we have direct formulas.
Similarly, for both complete and censored data, we can get the parameter of the exponential
distribution directly. But we may have to solve a nonlinear equation using numerical methods
for the Weibull distribution. We will continue this discussion. So, we will stop here today.
Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture 33
Goodness of Fit (GoF) Tests
Hello everyone. So, let us continue our discussion about fitting time-to-failure or
time-to-repair data. In the previous class we discussed how we can use MLE, maximum likelihood
estimation, and the steps we can follow to get the MLE estimates from the data for any
distribution. Once we have fitted a distribution using MLE, we can use a goodness of fit test
to find out whether the data belong to that distribution or not. So, today we will start our
discussion of the goodness of fit test. We will have a brief discussion about it; we will not
be going into great detail.

(Refer Slide Time: 1:08)

So, here as we see, for goodness of fit we have hypothesis testing. We have the null
hypothesis and we have the alternate hypothesis. The null hypothesis is represented as H0, and
it is the assertion which we want to check: whether it is true or not. The assertion here is
that the time-to-failure data or time-to-repair data which we have taken come from a specific
distribution. This specific distribution can be exponential, Weibull, normal, lognormal,
whatever it is. We want to check whether the data belong to this distribution or not, so that
is our null hypothesis. Effectively, the alternate hypothesis becomes the negation of the
same: the failure times do not come from the specific distribution which we are choosing.

So, to do that, we develop a test statistic, such as chi-square and others. The statistic is
calculated from the data, and it is itself a random variable which follows a certain
distribution. From that distribution we find out whether the statistic we have calculated
falls within the designated interval, which may be a one-sided or a two-sided interval. If
the statistic falls within the acceptance region set by the significance level, that is,
within the confidence region, we accept the null hypothesis. If it is outside that region, we
accept the alternate hypothesis, or we can say that we reject the null hypothesis in favour of
the alternate hypothesis, because we do not find sufficient statistical support for the null
hypothesis.

Generally there can be 2 types of tests. One is the general test, which we will be
discussing: chi-square is such a test, applicable not only to one distribution but to most
distributions. So, based on the chi-square value we can check whether the data belong to a
certain distribution or not. Then there are specific tests which are tailored to particular
distributions. These are generally more powerful for checking a particular distribution,
because they are specifically designed for that application. Whenever we do hypothesis
testing, with each hypothesis we have 2 types of error.

Generally, for a hypothesis test we have a matrix here. There are 2 possible outcomes: either
we accept H0 or we accept H1. But what is the truth? In actual fact either H0 is true or H1 is
true. If H0 is true and we accept H0, this is the right decision. If H1 is true and we accept
H1, that is also the right decision. But if we accept H1 while H0 is true, that is a type I
error: the type I error probability is the probability of incorrectly rejecting the null
hypothesis. And if we accept H0 while H1 is true, that is a type II error. Generally, if we
try to reduce one of these errors, the other one will increase.

(Refer Slide Time: 5:51)

$$
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
$$

So, let us discuss the chi-square goodness of fit test. Goodness of fit I am writing as GoF
here. The chi-square test is a general-purpose test: it is applicable to almost all types of
distribution, whether continuous, discrete or any other. However, it is generally valid only
for large sample sizes, so for goodness of fit this test requires a reasonably large sample.
If the sample size is small, we may not be able to perform it, and we will see why. What the
test does is divide the data into groups, and each of these groups is expected to contain at
least 5 failures. So, if we do not have sufficient data, if we have only 10 or 15 data points,
we can form very few groups and we will not be able to get the chi-square statistic properly.

The chi-square statistic is built from the observed number of failures minus the expected
number of failures in each interval. We divide the data into intervals, and for each interval
we find the observed number of failures (or repairs) and the expected number of failures. The
expected number we calculate from the fitted distribution; the observed number we get from the
actual observations. The difference is the error. This error is squared and then divided by
the expected value from the distribution, and these terms are summed over all the intervals.
So the statistic is a kind of normalized squared error.

So, definitely, if the error is high, the chi-square value will be high, and we do not want a
high chi-square value. The lower the chi-square value, the better the fit, and the better the
applicability of the distribution. k is the number of classes, that is the number of intervals
or groups. As we already discussed, O is the observed number and E is the expected number. How
can we get the expected number? We use n p_i. What is n? n is the number of samples, that is
the number of devices put on test or the number of observations made. And p_i is the
probability that a failure falls within interval i.

So, the number of devices put on test multiplied by the failure probability gives us the
expected number of failures in the interval. This we know from the binomial distribution: the
expected number of failures is n into p_i, where p_i is the failure probability and n is the
total number of devices put on test. How can we get this failure probability for the interval?
It is nothing but the CDF at the right end of the interval minus the CDF at the left end. If
the right end is a_i and the left end is a_(i-1), then F(a_i) minus F(a_(i-1)) gives me the
probability of this interval.

Similarly, I can write this in terms of reliability. Whatever the difference in the failure
probability is, the difference in the reliability is the negative of it, because reliability
is a decreasing function and failure probability is an increasing function. F(a_i) minus
F(a_(i-1)) gives me a positive value, but R(a_i) minus R(a_(i-1)) is the negative of that. So
we have to take R(a_(i-1)) minus R(a_i), and that will be the same as F(a_i) minus F(a_(i-1)).

(Refer Slide Time: 10:08)

So, let us see one example. We are focusing more on the exponential distribution because in
practice, when you use this in your projects and so on, the exponential distribution is the
one which is primarily used. Let us say we have certain data, as shown here in the Excel
sheet. The data were originally unordered; I sorted them from smallest to largest to get t_i,
and if I number them 1, 2, 3 and so on, that becomes my index i. How many failures do we have
here? We have 35 failures, and for these 35 failures we have the time-to-failure data.

Now, I want to fit these data to the exponential distribution, and what I want to know is
whether this fit is good or not, that is, whether the exponential distribution is applicable
to these data. If we divide the data as per Sturges' formula, we are supposed to get 6
classes. But if we use the 6 classes, then classes 3, 4 and 5 have only 2 failures each, and
class 6 has only 1 failure. As we discussed earlier, we need a minimum of 5 data points in
each interval to use the chi-square distribution.
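
As a quick check of the class count, the sketch below evaluates Sturges' rule in its commonly quoted form, k = 1 + 3.322 log10(n). That this exact form is the one used earlier in the course is an assumption, and the function name is hypothetical.

```python
import math

def sturges_classes(n):
    """Commonly quoted Sturges rule: k = 1 + 3.322 * log10(n), rounded to an integer."""
    return round(1 + 3.322 * math.log10(n))

print(sturges_classes(35))  # 6
```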

So, that is why, rather than using all these classes separately, we combine them. I will say
that my first interval ends at 354, the second ends at 688, and the third extends to infinity,
which means all values larger than 688 are counted there. With that, the count becomes 2 plus
2 plus 2 plus 1, so 7 observations fall in this region.

My first interval, 0 to 354, has 18 values: if you see here, 354 comes up to this point. Then
from the 19th value we have 10 values, so observations 19 to 28 fall in the second region,
which is up to 688, because the next value is higher than 688. Everything higher than 688
falls in the next interval, that is my third interval. So 18, 10 and 7 become the numbers of
observations in the three intervals.

In the initial analysis I was supposed to divide the data into the 6 classes, but those
classes are no longer applicable, so I have to group the data further. Now, what is the
probability of each interval? To calculate the probability, first let us see how we calculate
lambda with MLE.

What is lambda here? I can calculate the MTTF first. These are all failure data; there are no
censored data. Since there are no censored data, the MTTF is nothing but the average, that is
the summation of the times to failure divided by n. This gives me the MTTF. And what is
lambda? Lambda is equal to 1 divided by the MTTF.

So we have used the same formula which we developed earlier. The only difference is that we
saw that formula for censored data also, but here there is no censoring, so we do not have to
add the censoring times. We add all the failure times and take the average, which gives us the
MTTF, and the inverse of the MTTF gives us lambda. So, we get lambda here, and the number of
samples is 35.
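
The calculation can be sketched as below. The 35 failure times of the lecture are not reproduced in the transcript, so the data here are hypothetical; the function also accepts units censored at a common time t_s, using the standard result that lambda is the number of failures divided by the total time on test.

```python
def exponential_mle(failures, n=None, t_s=0.0):
    """lambda = r / (sum of failure times + (n - r) * t_s); n = r means no censoring."""
    r = len(failures)
    n = r if n is None else n
    total_time_on_test = sum(failures) + (n - r) * t_s
    return r / total_time_on_test

# Hypothetical complete data: lambda is simply 1 / MTTF.
failures = [120.0, 340.0, 560.0, 980.0]
lam = exponential_mle(failures)
print(lam, 1.0 / lam)   # rate and MTTF

# Hypothetical censored case: 4 units on test, 2 failures, 2 units censored at t = 200.
print(exponential_mle([100.0, 200.0], n=4, t_s=200.0))
```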

Now I want to know the probability that the time to failure lies between 0 and 354. Let me
use a pen here. If I plot t versus capital F(t), what is F(t) in this case? It is 1 minus e to
the power minus lambda t. So F(t) starts from 0 and keeps increasing, not like a straight
line but as a curve which flattens out, and it reaches 1 only as t tends to infinity.

I want to know the probability that values lie in the first region, which means I want F at
354: that is the probability that my value will be from 0 to 354. How can I calculate it? As 1
minus exponential of minus lambda into t, where lambda is the value we have calculated and t
is 354.

So, this becomes my probability. And how many samples did I put on test in total? 35. If I
multiply this probability by 35, I get the expected number of failures, which is about 18. And
how many did I observe? I also observed 18. Now I apply the formula (O_i minus E_i) whole
square divided by E_i. O is 18, E is 18.12, so using (O minus E) whole square divided by E, I
get the error contribution here.

(Refer Slide Time: 18:04)

Now, let us see the second one. Here I am talking about the range 354 to 688, and within this
range I want to know the failure probability. That will be equal to F(688) minus F(354). This
probability I can write as 1 minus e to the power minus lambda into 688, minus 1, plus e to
the power minus lambda into 354. Or I can say e to the power minus lambda into 354, minus e to
the power minus lambda into 688.

There are various ways I can do this. The value 1 minus e to the power minus lambda into 354
is already available with me, so one way is to calculate F at 688 and subtract F at 354. Or I
can calculate it directly: exponential of minus lambda into the previous bound, 354, minus
exponential of minus lambda into the current bound, 688. If I do that, my value will be here.

So, 0.2399 becomes the probability. And how much will be the expected number of failures? The
total of 35 units put on test multiplied by the probability of failure gives the expected
failures. And how much is the contribution of this interval towards the error? That will be
(O minus E) squared divided by E. So I get the same thing here.

Now, in the last region there are 7 failures. The last region means everything above 688, so
the probability runs from 688 onwards to infinity, and at infinity the cumulative probability
is 1. So 1 minus F at 688 gives this probability. Since F(688) is 1 minus e to the power minus
lambda into 688, the 1 and 1 get cancelled and the probability is e to the power minus lambda
into 688. Another way is to subtract the earlier probabilities from 1; I will get the same
value.

That is because the total has to be 1: the summation of the probabilities will always be 1,
because we have divided the total probability into a few intervals. The expected numbers of
failures will likewise total 35, divided among the different intervals. We calculate (O minus
E) squared divided by E for each interval, and the summation of all these values becomes our
chi-square statistic. Our chi-square statistic here comes out to be 0.5643.
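
The three-interval calculation can be reproduced with the sketch below. The transcript does not quote lambda, so the value used is an assumption back-calculated from the 20 percent point of 108.26 that appears later in this lecture; the interval bounds and observed counts are the ones stated above.

```python
import math

LAMBDA = 0.0020612          # assumption: inferred from -ln(0.8) / 108.26, not quoted directly
N = 35                      # units on test, all failed
edges = [0.0, 354.0, 688.0, math.inf]
observed = [18, 10, 7]

def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)

# p_i = F(a_i) - F(a_{i-1}) and E_i = n * p_i for each interval
probs = [exp_cdf(b, LAMBDA) - exp_cdf(a, LAMBDA) for a, b in zip(edges, edges[1:])]
expected = [N * p for p in probs]
chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

for o, p, e in zip(observed, probs, expected):
    print(f"O = {o:2d}  p = {p:.4f}  E = {e:.2f}")
print(f"chi-square = {chi_sq:.4f}")
```

With this assumed rate the expected counts come out close to 18.1, 8.4 and 8.5, and the statistic agrees with the lecture's 0.5643 to about three decimal places.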

Now, what is our criterion? This chi-square value should be less than the critical value. And
how much is the critical value? We have 3 intervals, so we start with 3 minus 1, because one
degree of freedom is lost already. Then, because this distribution has one parameter estimated
from the data, one more degree of freedom is lost. So in total we have one degree of freedom.
We have to get the chi-square value at the 0.1 significance level. That means 90 percent of
the probability lies to the left and we leave 0.1, that is 10 percent, on the right-hand side.
So here the level is 0.1 and the degrees of freedom value is 1.

(Refer Slide Time: 23:13)

This also we can get from the Excel sheet. What is the critical value? It is the inverse
chi-square value for the right-hand tail, because we are leaving 0.1 of the probability
towards the right. So the probability is 0.1, and the degree of freedom is 1. We get the
critical value as 2.7, which is higher than 0.5643. The significance level here is 0.1, or we
can say the confidence level is 90 percent. Since the critical value is higher than 0.5643, we
can say that our null hypothesis is accepted on the basis of our chi-square value.
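
For readers without Excel, the critical value can also be obtained with the standard library alone. The sketch below is one possible approach, not the lecture's method: it evaluates the chi-square CDF through the series for the regularized lower incomplete gamma function and inverts it by interval halving. The function names are hypothetical.

```python
import math

def chi2_cdf(x, df):
    """Chi-square CDF via the series for the regularized lower incomplete gamma P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term, total, n = 1.0, 1.0, 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    return total * math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))

def chi2_critical(alpha, df):
    """Value with right-tail probability alpha, found by interval halving on the CDF."""
    p, lo, hi = 1.0 - alpha, 0.0, 1.0
    while chi2_cdf(hi, df) < p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(chi2_critical(0.10, 1), 3))   # 2.706
print(round(chi2_critical(0.10, 3), 3))   # 6.251
```

The first value is the one used for the three-interval example (one degree of freedom); the second is the value for three degrees of freedom.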

So, this is the procedure which we can follow. There is another way. As we have seen, we had
to group the data because the intervals were initially of the same size: 354, then 354 into 2,
into 3, into 4 and so on. Because of the equal intervals, the number of failures was different
in different classes, while our criterion is a minimum of 5 failures in each region. To meet
it, we can divide the range into bins of different sizes. The intervals will not be uniform,
but in every interval we will try to have a similar number of failures. We had 35 failures
here; if we divide the 35 failures into 5 intervals, then 7 failures are supposed to be there
in each interval.

If you want to do that, the total probability, from 0 to 1, has to be divided into 5
intervals. That means each interval will have a probability of 1 divided by 5, that is 0.2.
Only if we divide into intervals of 0.2 probability each can we expect 5 equal counts. We have
35 data points, so 7 points are expected in each interval, because the probability is the same
for all: 0.2 into 35 gives 7. So we get 7 expected failures in each interval, but the actual
failures may be different; the actual failures we have to get from the data.

So, what we do is divide the probability range into cells. That gives us the intervals first;
then we see how many failures actually lie in each interval, and then we compare. The
cumulative probabilities are 0.2, 0.4 and so on, and we find the corresponding times. Say F(t)
is 0.2, which is equal to 1 minus e to the power minus lambda t, and I want to know t. Then
e to the power minus lambda t is equal to 1 minus 0.2, so minus lambda t is equal to ln of
(1 minus 0.2), and t is equal to minus 1 upon lambda into ln of (1 minus 0.2). So I get t for
0.2; similarly I get it for 0.4. For every probability value I put in, I get the corresponding
upper bound of the interval. I have done this already in Excel; I will show you.

(Refer Slide Time: 27:38)

So, here we use the same approach in another way. We have the probability 0.2. At what time is
the cumulative probability 0.2? The time is minus ln of (1 minus probability), divided by
lambda. How much is lambda? Lambda is the same as what we calculated earlier. I am using
dollar signs in the cell reference so that it does not change when I copy the formula and I
get the same lambda each time. As you see, we get 108 here; I will show the same number of
decimal places throughout. So 108.26 is the point at which I have 20 percent probability as
per my fitted exponential distribution, 40 percent probability is at the point 247, then 60
percent, and so on. At 108.26 the cumulative probability is 0.2. At 247.82 the cumulative
probability is 0.4, and the same way for the rest.
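
The spreadsheet step can be sketched in Python as follows. The rate is not quoted in the transcript, so the value below is an assumption back-calculated from the 108.26 figure; the first bound therefore reproduces 108.26 by construction, and the second bound serves as the check against 247.82.

```python
import math

LAMBDA = 0.0020612   # assumption: inferred from -ln(0.8) / 108.26

def exp_quantile(p, lam):
    """Time at which the exponential CDF reaches p: t = -ln(1 - p) / lambda."""
    return -math.log(1.0 - p) / lam

# Upper bounds of the first four equal-probability intervals; the fifth runs to infinity.
bounds = [exp_quantile(p, LAMBDA) for p in (0.2, 0.4, 0.6, 0.8)]
print([round(b, 2) for b in bounds])
```

The third and fourth bounds come out near 444.5 and 780.8, in line with the 444 and 780 read off the sheet in the lecture.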

So, I get the upper bounds of the intervals, and each lower bound is nothing but the previous
upper bound. For 0 to 108 the probability is 0.2. For 108 to 247 the probability is again 0.2,
and the cumulative probability is 0.4. Now, let us see how many data points fall up to 108.
We see here that 5 data points fall up to 108. Then from 108 to 247, up to here, I have 9
failures, from 7 to 15 in the sheet. Then 247 to 444: 444 means up to here, that is 16 to 24,
which is again 9 data points. Then 444 to 780: 780 means up to here, the value 767, so that is
6 data points. And then everything above 780 is again 6 data points. These are my actual
observations. And how many observations are expected in each interval? 0.2 into 35, that is 7.

So, the expected numbers of observations are the same in every interval, because I have
divided the data as per the distribution. The expected number is 7 in each interval, but the
actual observations are 5, 9 and so on. The formula is the same, (O minus E) squared divided
by E. For the first interval, O is 5 and E is 7, so (5 minus 7) squared is 2 squared, that is
4, divided by 7.
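
Using the counts read off the sheet (5, 9, 9, 6, 6) and the expected count of 7 per interval, the statistic can be checked directly. The variable names below are illustrative only.

```python
observed = [5, 9, 9, 6, 6]          # counts in the five equal-probability intervals
n = sum(observed)                   # 35 failures
expected = [0.2 * n] * 5            # 7 expected in each interval

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
dof = len(observed) - 1 - 1         # classes minus 1, minus one estimated parameter

print(chi_sq, dof)   # (4 + 4 + 4 + 1 + 1) / 7 = 2.0 with 3 degrees of freedom
```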

In the same way we calculate this for every interval, and then we take the total. This
becomes our chi-square statistic, the same statistic which we used earlier. But there is a
difference now: here I have 5 intervals, whereas earlier I had only 3.

Since there are 5 intervals, I start from 5. One degree of freedom is lost as before, and one
more is lost in the estimation of the single parameter, so 2 degrees are lost. That means I
have 3 degrees of freedom. So, I get the inverse chi-square value for the 10 percent
significance level with 5 minus 1 minus 1, that is 3, degrees of freedom. My critical value is
6.25. The calculated value, 2, is much lower than the critical value. That means my hypothesis
that these data follow the exponential distribution is accepted, or I can assume that these
data follow the exponential distribution. We will stop here; we will continue our discussion
in this direction with more distributions and tests. Thank you.

Introduction to reliability
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology, Kharagpur
Lecture-34
Goodness of Fit (GOF) Tests (Contd.)

(Refer Slide Time: 00:25)

Hello everyone. So, we have been discussing goodness of fit tests; in the last class we
discussed the chi-square test. Let us continue our discussion of the chi-square test. Last
time we took an example and saw how, for the exponential distribution, we can apply the
chi-square test and see whether the fit is acceptable or not.

Now, let us see the same thing for the Weibull distribution. Let us assume that there are 35
failure times available, which have been observed by putting 50 units on test. So, we have 50
units on test and we have 35 failure times.

That means the test was censored at the 35th failure: whatever the time of the 35th failure
is, the same time applies to the remaining 15 censored devices. So, 15 devices are censored at
the same point. Once we have these data we can use MLE, and MLE will give the values of beta
and theta; we are taking these as pre-calculated values here.

So, once we have beta and theta, let us see whether the fit to the Weibull distribution is
good or not. Here my null hypothesis is that the failure times follow the Weibull
distribution, and the alternative hypothesis is that they do not follow the Weibull
distribution.

Here we had 35 failure times, and we have grouped them into classes. As we discussed earlier,
we divide the data into classes by Sturges' formula or some other rule. The slide says 5
classes, but if you count them like this, 1, 2 and so on, there are actually 6 classes. So,
initially there are 6 classes: Sturges' formula for 35 and 30 both comes to around 6.

Now, if we divide the data into the 6 classes, the first class is up to 28. If you see here,
up to 28 means up to here, so 5 plus 5, 10 failures are up to 28. The second class is up to
56; 56 means up to here, so 5 plus 5 is 10, plus 1 is 11, and 11 failures are in the class up
to 56.

Then 56 to 84: 84 comes somewhere up to here, so 5 plus 2, that is 7 failures lie in the third
interval. Then up to 112: up to here there are 1, 2, 3 data points, so 3 data points are in
the class up to 112. The next class is up to 140, because the last failure was observed at
about 140: the test continued until the 35th failure at 139.7, which is almost the same as
140.

From 112 to 140 there are 4 failures, and here the censoring happened. The remaining 15
devices had not failed, so those 15 devices were censored at this point.

That means the 15 devices would fail somewhere from 140 to infinity, because they have not
failed yet. So the count for the last class comes out to be 15. As we see here, two of the
classes, with 3 and 4 data points, fall below 5. So we have to combine these two: rather than
taking 84 to 112 and then 112 to 140, we take 84 to 140 directly, so that the number of
failures becomes 7 there.

Now, we have 5 intervals, that is, 5 classes, and the number of failures in each class is 5 or
more. So, now we can use the same formula as we did earlier. We find the probability of each
class. How can we get it? We use the Weibull cumulative distribution function which we
discussed earlier, $F(t) = 1 - e^{-(t/\theta)^{\beta}}$.

So, let us say the class boundaries are t1, t2, t3, t4 and t5. The probability of a failure up
to t1 is F(t1), so this is my first value. For the second interval, the probability of falling
in it is F(t2) minus F(t1), which is nothing but
$[1 - e^{-(t_2/\theta)^{\beta}}] - [1 - e^{-(t_1/\theta)^{\beta}}]$.

So, the 1 and 1 cancel. In a way we can write it as R(t1) minus R(t2): the term with the plus
sign comes first and the term with the negative sign comes next, that is,
$e^{-(t_1/\theta)^{\beta}} - e^{-(t_2/\theta)^{\beta}}$.

So, this formula gives me the probability of each intermediate interval. The probability of
the last interval is nothing but the reliability at 140: all the failures which do not happen
up to 140 go to the last region. So, that is nothing but $e^{-(t_5/\theta)^{\beta}}$, where t5
is the last boundary, 140.

These calculations I have already made, and they are shown here. I will show the same thing in
Excel, because I copied all of this from the Excel sheet in which I solved it.

(Refer Slide Time: 06:55)

So, here in the chi-square Weibull sheet we have this data; the number of observations we have
already seen. Now, let us see how we calculated P. For the first interval, theta is 112.9 and
beta is 1.032, so P is $1 - e^{-(t/\theta)^{\beta}}$; that is my first region.

(Refer Slide Time: 07:33)

And for the second region, as you see, it is the exponential term for t1 minus the exponential
term for t2, that is, R(t1) minus R(t2). Here t1 is in cell D4 and t2 is in D5, each divided
by the same 112.92 and raised to the power 1.032. So, this gives me the second probability,
and the same is followed for the next intervals. The last one is nothing but the reliability
at 140, that is, the exponential of minus (D7 divided by theta) raised to the power beta.

So, we are able to get the probability for each interval, and the sum of these interval
probabilities will always be 1. And how do I get the expected number of failures? It is based
on the number of devices which I have put on test, which is 50.

So, the expected number is 50 multiplied by the probability. This gives me the number of
failures which the fitted distribution leads me to expect in each interval. Then, again, I use
the same formula for the chi-square statistic, that is, (O minus E) whole squared divided by
E. These values come out here, and once I take their summation, it becomes 1.3916.

Now, for the critical value, I take the number of intervals: 1, 2, 3, 4, 5. From 5, one is
lost, because if the counts in 4 intervals are known, the number of failures in the fifth
interval is already determined.

So, one degree of freedom is lost there, and others are lost in the estimation of parameters;
the Weibull distribution has 2 parameters. So, the degrees of freedom will be 5, minus 1 for
the grouping part, and minus 2 for the number of parameters.

So, with 1, 2, 3, 4, 5 intervals, my number of degrees of freedom becomes 2, and the critical
value is taken for 0.1 and 2. Two degrees are lost for parameters because it is a Weibull
distribution. So, here the critical value becomes 4.6, which is much higher than 1.39. Since
our statistic value is lower than the critical value, our null hypothesis is accepted, and we
can say that this data follows the Weibull distribution.
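
The whole chi-square calculation for this example can be sketched in a few lines. This is an illustration, not the lecture's Excel sheet: the class boundaries, observed counts and fitted parameters are the ones quoted above, while the significance level of 0.10 is an assumption inferred from the quoted critical value of 4.6 at 2 degrees of freedom.

```python
import math

theta, beta = 112.9, 1.032          # fitted Weibull parameters quoted in the lecture
n_units = 50                        # devices put on test
upper_bounds = [28, 56, 84, 140]    # class boundaries after merging 84-112 and 112-140
observed = [10, 11, 7, 7, 15]       # last class holds the 15 units censored at 140

def reliability(t):
    """Weibull reliability R(t) = exp(-(t/theta)^beta)."""
    return math.exp(-((t / theta) ** beta))

# Class probability = R(lower) - R(upper); R(0) = 1 and the last class is open-ended.
r_values = [1.0] + [reliability(t) for t in upper_bounds] + [0.0]
probabilities = [r_values[i] - r_values[i + 1] for i in range(len(observed))]
expected = [n_units * p for p in probabilities]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: classes - 1 - number of estimated parameters.
dof = len(observed) - 1 - 2
alpha = 0.10                        # assumed significance level
# For exactly 2 degrees of freedom the chi-square right-tail area is exp(-x/2),
# so the critical value has the closed form -2 ln(alpha).
critical = -2 * math.log(alpha)

print(round(chi_square, 3), round(critical, 3), chi_square < critical)
```

The closed-form critical value only works because the degrees of freedom happen to be 2 here; for other values a chi-square table or a statistics library would be needed.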

(Refer Slide Time: 10:32)

So, the same thing we have got here, but this value on the slide was wrong: I took the
critical value as 6.25, and rather than 6.25 it should be 4.60. So, as we have seen, the
chi-square test is the same whether we do it for the normal, the exponential or the Weibull;
you can do it for any distribution with the same process, because it does not depend on the
distribution. You just need to divide the data into intervals and see, in each interval, how
many failures are expected and how many actually happen. But for the chi-square test we
require a large data size; if the sample size is small, we will not be able to do this
properly.

(Refer Slide Time: 11:29)

So, for that purpose, some specific tests are suggested for different distributions. For the
exponential distribution, Bartlett's test is used. By using Bartlett's test, you can find out
whether or not the data follows the exponential distribution.

In the chi-square test the hypothesized distribution could be any distribution, but in
Bartlett's test we are talking about the exponential distribution only. So, the null
hypothesis is that the failure times follow the exponential distribution, and the alternative
hypothesis is that the failure times do not follow the exponential distribution.

In the chi-square test, our statistic was the chi-square value, that is, the summation over
all classes i of (O_i minus E_i) whole squared divided by E_i. Here the statistic is B, for
Bartlett. In this statistic, r is the number of failures, so the test is also applicable to
censored data.

So, r counts only the observed failure times; you may not have all the units failed, and then
also you can use the same test. This statistic B also follows the chi-square distribution. So,
the distribution is the same, but the statistic is different: the chi-square statistic follows
the chi-square distribution, and this statistic B also follows the chi-square distribution.

So, the critical values we obtain from the chi-square distribution. But here the null
hypothesis is accepted only through a two-sided comparison; that means the B value should be
higher than the left limit and lower than the right limit. The left limit is the chi-square
value for 1 minus alpha by 2 with r minus 1 degrees of freedom, and the right limit is the
chi-square value for alpha by 2 with r minus 1 degrees of freedom.

Each of these is again a right-side value, that is, the value leaving that much area on the
right-hand side. So, this is the lower bound and this is the upper bound. If my B value falls
within this region, I will accept that my times to failure follow the exponential
distribution.
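
Written out, the statistic and the acceptance region described above take the following form. This is reconstructed from the spoken description and from the calculation carried out later in this lecture (2r times the log of the mean minus the mean of the logs, divided by 1 plus (r + 1)/(6r)); the subscript notation for the chi-square limits is an assumed convention in which the subscript is the area to the right.

```latex
B = \frac{2r\left[\ln\!\left(\dfrac{1}{r}\sum_{i=1}^{r} t_i\right) - \dfrac{1}{r}\sum_{i=1}^{r}\ln t_i\right]}{1 + \dfrac{r+1}{6r}},
\qquad
\chi^2_{1-\alpha/2,\; r-1} \;<\; B \;<\; \chi^2_{\alpha/2,\; r-1}
```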

(Refer Slide Time: 14:10)

Otherwise, we may assume that they do not. So, let us take an example: 30 units were put on
test, out of which 20 failed, so 10 units were censored. Just by looking at the time to
failure data, we can find out whether or not it follows the exponential distribution. So, we
want to perform Bartlett's test.

So, for Bartlett's test, how much is r, the number of failures? That is 20. And as we see
here, we need to sum up 2 things over the failures: one is the summation of t_i and the other
is the summation of ln of t_i. If we know the r value, the summation of t_i and the summation
of ln of t_i, we will be able to calculate the B value.

(Refer Slide Time: 15:04)

So, the solution is given here, but I have also done this exercise in the Excel sheet. This is
my data T; these are my 20 failures. If I take their sum, it is 836. It is not required right
now and is not part of the problem, but since I am assuming the exponential distribution, let
us see how much lambda is for this data.

So, how do I calculate lambda? First I take the summation of the failure times, and then I
also take the summation of the censored times. Let us assume that the test was terminated
when the last failure was observed; that means the remaining 10 devices each worked only for
this much time, 10.7 hours. So, cumulatively this is 107.

So, how much is the total time? It is 836 plus 107, that is, 943.3. And how many failures did
I observe? r is 20. So, lambda is equal to r divided by the total cumulative time. The failure
time summation is only 836, but there are 10 devices which worked for 10.7 hours each without
failure, so that time also needs to be added. So, the total time becomes 943.3 and my lambda
comes out to be 0.0212.

So, this is how we use MLE for calculating lambda when censored devices are there. This is
just an example; what we want to check is whether or not this data follows the exponential
distribution.
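
The lambda calculation above is a one-line formula: failures divided by total time on test. A minimal sketch using the figures quoted in the lecture follows; the variable names are illustrative, and the failure-time sum of 836.3 is an assumption chosen so that the total matches the quoted 943.3.

```python
failures = 20               # r, number of observed failures
failure_time_sum = 836.3    # lecture quotes 836; the .3 is inferred from the total of 943.3
censored_units = 10         # units still working when the test stopped
censor_time = 10.7          # hours each censored unit ran, as stated in the lecture

# Total time on test = failure times + running time of the censored units.
total_time = failure_time_sum + censored_units * censor_time
lambda_hat = failures / total_time

print(round(total_time, 1), round(lambda_hat, 4))
```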

(Refer Slide Time: 17:22)

So, to do that, we need 2 values: the summation of T_i and the summation of the log of T_i.
T_i is already here, and here we calculate ln of T_i. If we take both summations, this becomes
the summation of T_i and this becomes the summation of ln of T_i.

So, the formula for the B value which I have used here is: 2 into r, where r is 20, multiplied
by the log of (the summation of T divided by 20) minus (the summation of the log values
divided by 20), and the whole divided by 1 plus (r plus 1) divided by (6 into r), that is,
1 plus 21 divided by (6 into 20). If we solve this, my value of B comes out to be 18.

Now, is my B value falling in the acceptance region or not? For that we have to find the
chi-square limits. For chi-square we have r minus 1 degrees of freedom here, so only one
degree is lost.

So, we want the value for 1 minus alpha by 2. Alpha is 10 percent, 0.1, so 1 minus 0.1 divided
by 2 is 0.95. The right-side value is for alpha by 2, that is, 0.1 divided by 2, 0.05, and the
degrees of freedom are 19.

Now, if you see here, my B value is well within this region. So, that means my null
hypothesis is accepted, and I can assume that my data follows the exponential distribution.
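
Bartlett's test as described can be sketched as a small function. The lecture's 20 failure times are not reproduced in this transcript, so the data below is hypothetical (idealised exponential quantiles), and the two chi-square limits for 19 degrees of freedom are standard table values rather than numbers quoted in the lecture.

```python
import math

def bartlett_statistic(failure_times):
    """Bartlett's B = 2r [ln(mean t) - mean(ln t)] / (1 + (r + 1) / (6r))."""
    r = len(failure_times)
    mean_t = sum(failure_times) / r
    mean_ln_t = sum(math.log(t) for t in failure_times) / r
    return 2 * r * (math.log(mean_t) - mean_ln_t) / (1 + (r + 1) / (6 * r))

# Hypothetical data: 20 idealised exponential failure times (not the lecture's data).
times = [-40.0 * math.log(1 - (i - 0.5) / 20) for i in range(1, 21)]
b_value = bartlett_statistic(times)

# Chi-square table values for 19 degrees of freedom with alpha = 0.10:
# right-tail area 0.95 gives the lower limit, right-tail area 0.05 the upper limit.
lower, upper = 10.117, 30.144
print(round(b_value, 2), lower < b_value < upper)
```

Only the failure times enter the statistic; the censored units affect the estimate of lambda but not B.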

(Refer Slide Time: 19:58)

The same thing which I have done in Excel is given here on the slide; this is not 118, this is
18. So, the null hypothesis is accepted. Similarly, we have another test for the Weibull
distribution. These tests are distribution-specific: just as we use Bartlett's test for the
exponential, we can use Mann's test for the Weibull distribution.

So, we can find out whether or not the data follows the Weibull distribution. Our null
hypothesis is that the times to failure follow the Weibull distribution, and the alternative
is that the times to failure do not follow the Weibull distribution.

What is the statistic here? The statistic given by Mann is this M. And what is k1 here? k1 is
the floor value of r by 2; that means, when I divide r by 2, I take only the integer part and
drop everything after the decimal point. And k2 is the floor value of (r minus 1) by 2; that
means subtracting 1, then dividing by 2, and again taking the integer part.

So, if the number is even, you get an integer directly; if it is odd, the .5 is dropped. Then,
what is M_i here? M_i is nothing but Z_{i+1} minus Z_i. And what is Z_i? As we saw earlier for
the Weibull distribution, it is ln of ln of 1 upon (1 minus F(t)). Written another way, it is
ln of minus ln of (1 minus F(t)); this minus sign is missing on the slide, and I will add it.

And how much is F(t)? Earlier we were using (i minus 0.3) divided by (n plus 0.4); if you
want, you can use that. What is given here is simply another way of doing the median ranking.

So, here 0.5 and 0.25 are used, that is, (i minus 0.5) divided by (n plus 0.25). Since Mann
suggested 0.5 and 0.25, we can go with that; otherwise, you can use 0.3 and 0.4 also, that is,
(i minus 0.3) divided by (n plus 0.4). This gives us Z_i, which is very similar to what we
did in the LSE method. And this statistic M is considered to follow the F distribution.

So, the critical value comes from the F distribution. We take the right-side critical value
for alpha, where alpha is my significance level, 0.1 or 0.05. If the value of M falls below
this critical value, the null hypothesis is accepted; if it is equal to or higher than that,
it is not accepted.

(Refer Slide Time: 23:09)

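Test statistic:

$$M = \frac{k_1 \sum_{i=k_1+1}^{r-1} \left[ (\ln t_{i+1} - \ln t_i) / M_i \right]}{k_2 \sum_{i=1}^{k_1} \left[ (\ln t_{i+1} - \ln t_i) / M_i \right]}, \qquad M_i = Z_{i+1} - Z_i$$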

So, let us take an example. We put 50 devices on test, out of which 35 failed, and this is our
data for the 35 failures. 35 divided by 2 is 17.5, and the floor value of this is 17, so that
is my k1. And k2 uses r minus 1, that is, 34 divided by 2, 17; the floor value of 17 is 17
only.

So, my k1 and k2 both turn out to be 17. From this I can calculate the value of M. What do I
need to know to calculate M? I have to take a summation from i equal to k1 plus 1 to r minus
1 and divide it by a summation from i equal to 1 to k1. And the terms I am summing are ln of
t_{i+1} minus ln of t_i, divided by M_i. So, I have to calculate ln of t_{i+1}, ln of t_i and
M_i, where M_i is Z_{i+1} minus Z_i.

So, if I calculate Z and ln of t, I will be able to use this formula. The basic data is i and
t_i; these 2 I already have. So, what I did is calculate ln of t_i, and ln of t_{i+1} is
simply the next value. The next quantity is Z_i, which is obtained from the probability. This
test is for the Weibull distribution, and for the Weibull distribution Z_i is ln of minus ln
of (1 minus F(t)), where F(t) is (i minus 0.5) divided by (n plus 0.25).

So, we apply the same formula, (i minus 0.5) divided by (n plus 0.25). Z is a function of i
only, so if I know i, I know Z. I have done this in the Excel sheet, which we will share, and
you will get these values. So, the Z_i values are known to us. Now, the M_i values we can also
get. What is M_i? Z_{i+1} minus Z_i.

So, this value minus this value gives me M_i. I get these values by subtraction, so I get one
value less here, because each value is subtracted from the next one.

Similarly, for ln of t_{i+1} minus ln of t_i: we can write it as ln of (t_{i+1} divided by
t_i). So, I can either take the subtraction or do it this way. If I take the subtraction, this
minus this, 1.988 minus 1.262 gives 0.726.

So, I can take the subtraction, or I can simply divide the t values and take the ln; either
way it gives the same value. Now I have got these values, and they are to be divided by M_i
in both summations. So, I divide this by M_i, and this becomes my value.

So, now I have calculated this whole term here. I have to take its summation in 2 parts: one
is from k1 plus 1 to r minus 1 and the other is from i equal to 1 to k1, where k1 is 17. That
means one set is here and another set is from here to here. I have only 34 values, because r
is 35 but I am taking terms only up to r minus 1, that is, 34.

So, I have two sets of 17 values each. I take the summation here and the summation here, and
then these 2 summations are compared. The summation from k1 plus 1 to r minus 1, let us say,
is S2, and the summation from i equal to 1 to k1 is S1. So, I take S2 upon S1, where S2 is
multiplied by k1 and S1 is multiplied by k2. Once I do this, I get the value of M.

So, this simple formula I have applied in Excel, and I have calculated the value of M. And how
do I get the critical value of F? By taking the inverse of the F distribution for the
right-side probability alpha, with degrees of freedom 2 k2 and 2 k1, that is, 34 and 34. I
have done the same thing here also.
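
The steps for Mann's statistic can be collected into one function. This is a sketch under stated assumptions: the function name is illustrative, the plotting position (i - 0.5)/(n + 0.25) is the one mentioned in the lecture, and the data is hypothetical because the 35 failure times are not reproduced in this transcript. The hypothetical times are built to lie exactly on a Weibull line, so every term equals 1 and M comes out at 1.

```python
import math

def mann_statistic(failure_times, n):
    """Mann's M for r observed failure times out of n units on test."""
    t = sorted(failure_times)
    r = len(t)
    k1 = r // 2            # floor(r / 2)
    k2 = (r - 1) // 2      # floor((r - 1) / 2)
    # Z_i = ln(-ln(1 - F_i)) with F_i = (i - 0.5) / (n + 0.25), i = 1..r
    z = [math.log(-math.log(1 - (i - 0.5) / (n + 0.25))) for i in range(1, r + 1)]
    # term_i = (ln t_{i+1} - ln t_i) / (Z_{i+1} - Z_i), i = 1..r-1
    terms = [(math.log(t[i + 1]) - math.log(t[i])) / (z[i + 1] - z[i])
             for i in range(r - 1)]
    s1 = sum(terms[:k1])   # i = 1 .. k1
    s2 = sum(terms[k1:])   # i = k1+1 .. r-1
    return (k1 * s2) / (k2 * s1)

# Hypothetical data: 35 failures out of 50 units, lying exactly on a Weibull line.
n_units, r_failures = 50, 35
ideal_times = [100.0 * (-math.log(1 - (i - 0.5) / (n_units + 0.25)))
               for i in range(1, r_failures + 1)]
m_value = mann_statistic(ideal_times, n_units)
print(round(m_value, 6))
```

The value of M is then compared with the right-side F critical value for 2 k2 and 2 k1 degrees of freedom, taken from a table or a statistics package.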

(Refer Slide Time: 29:03)

So, this is the same table which I have shown on the slide. How do I calculate M? M is k1 into
the summation of the second part, the red part, divided by k2, which is in cell B42, into the
summation of the green part, which is in I3. This gives me M. And how have I got F critical?
It is the F inverse right-side value, which I have taken for 0.05, that means 5 percent
significance, 95 percent confidence level.

If I want, I can take 10 percent or anything else. My degrees of freedom are k2 into 2 and k1
into 2, and my critical value is 1.72. My M value is less than the critical value, although
it is quite close. So, I can still say that the Weibull distribution is justified for this
data.

(Refer Slide Time: 30:12)

So, as we discussed today, we can find out whether or not a particular distribution is
followed based on a general test like the chi-square test, or we can use a specific test
designed for the purpose. So, thank you. We will continue our discussion in the next class.

Introduction To Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School Of Quality And Reliability
Indian Institute of Technology, Kharagpur
Lecture 35
Goodness of Fit (GoF) Tests (Continued)
Hello everyone. So, we have been discussing failure data analysis. We initially discussed the
non-parametric method, where we looked into various types of data: complete data, grouped
data and then censored data.

And we have seen how, based on the ranking, we find F(t_i); then from F(t_i) we can get the
reliability, the probability density function, and also the failure rate or hazard rate
function. Then we started discussing the LSE method, least squares estimation, where we saw
that for different distributions we can convert the cumulative distribution function into a
linear form, fit the data using that linear form, and find out the parameters.

However, as we discussed earlier, MLE is many times considered a better approach than LSE for
estimating the parameters of a distribution. So, then we discussed MLE and how it can be used
for estimating the parameters of the exponential distribution, the Weibull distribution and
also the normal and lognormal distributions.

For the normal and lognormal distributions, the censored data problem becomes a little more
complex, so to solve those kinds of problems we may need to use some software package. Then
we also discussed how, if you are using LSE, we can perform the goodness of fit test. We
initially discussed the chi-square test, which is general in nature and can be used for any
distribution function to see whether or not the distribution fits the data. It is based on
the observed number of failures, which is compared with the expected number of failures
coming from the fitted distribution.

So, based on that, we were able to know whether our distribution fitting is good or not. Then
we also discussed the specific tests which we can do for the exponential distribution and for
the Weibull distribution.

Today, we will discuss one test which is used for finding out whether or not the distribution
is a normal distribution. Similarly, it can also be used to find out whether or not it is a
lognormal distribution.

(Refer Slide Time: 03:12)

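Test statistic:

$$D_n = \max(D_1, D_2), \quad D_1 = \max_{1 \le i \le n} \left[ \Phi\!\left(\frac{t_i - \bar{t}}{s}\right) - \frac{i-1}{n} \right], \quad D_2 = \max_{1 \le i \le n} \left[ \frac{i}{n} - \Phi\!\left(\frac{t_i - \bar{t}}{s}\right) \right]$$

Null hypothesis is accepted if

$$D_n < D_{\text{crit}}$$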

So, this test is called the KS test, or Kolmogorov-Smirnov test. It can be used for the normal
and lognormal distributions. I am showing it for the normal distribution; as we discussed
earlier, if the data follows the lognormal distribution, then you just need to take the log
of the data and it will fit the normal distribution. So, whatever we do for the normal
distribution is applicable to the lognormal distribution; the only change is that rather
than taking the data we take the log of the data, and then the same process is carried out.

So, the null hypothesis for the KS test is that the failure times follow the normal
distribution, and the alternative hypothesis is that the failure times do not follow the
normal distribution. So, as we see, this can also be used as a test for a specific
distribution. But, again, the problem is that we can only use it for complete data; censored
data is not considered here.

So, how does it work? We calculate two statistics here, D1 and D2. D1 is the maximum value,
over all the sample points, of the difference between the cumulative distribution at (t_i
minus t bar) divided by s and (i minus 1) divided by n. D2 is the reverse of this, that is,
the maximum value of i by n minus the same cumulative distribution value. And what is phi?
Phi is the cumulative distribution function of the standard normal distribution.

So, how do we get the standard normal distribution? The normal distribution has two
parameters, mu and sigma, but the standard normal distribution does not have any free
parameter: we get it from the normal distribution by keeping the mean as zero and the
variance as one.

And we get it by the transformation Z equal to (x minus mu) divided by sigma. Now here, mu
and sigma are not known to us; they are calculated from the data. So, Z_i is equal to (t_i
minus t bar) divided by s, where t bar is the mean value of t_i, and rather than sigma we use
the notation s because we are calculating it from the sample. And (i minus 1) divided by n is
as usual: we know the rank i, we subtract 1 from it and divide by n.

Similarly, we calculate the second quantity. In these two quantities, what is t bar? t bar is
the average value of t_i. And what is s square? s square is the summation, for i equal to 1
to n, of (t_i minus t bar) whole square, divided by n minus 1.

Here, we calculate both differences for all the sample points; I will show it with an
example. Then we find the maximum value of each, and out of D1 and D2 we again see which one
is larger. The maximum value which we select we call D_n. This maximum value should be less
than our critical value. And what is this critical value? We get it from this table.

So, let us say the sample size is 10 and we want a significance level of 0.1. For 0.1 and 10,
my critical value is 0.239. If I use a 5 percent significance level, 0.05, then the value is
0.258. So, we use this table to get these values.

(Refer Slide Time: 07:52)

Now, let us take an example. We have 15 observations, which are given here; I have solved
this using the Excel sheet, and I will show it again there. We want to test the hypothesis.
These are repair times, TTR. As we discussed, all the methods which we have discussed can be
applied to any random variable of interest, whether it is TTF, TTR or any other.

Now, to do that we calculate the mean, which comes out to be 73.1991933, and the standard
deviation, which is 7.039; these we have already calculated, and I will show it in the sheet
also. Then we calculate the KS statistics: the D1 values using the formula which we
discussed, and the D2 values. The maximum D1 value is 0.1224 and the maximum D2 value is
0.1294; out of the two, this is the larger one.

So, 0.1294 is used as the D_n value. And how much is the critical value? Our sample size is
15, so n is equal to 15, and we are taking a significance level of 0.05, so the critical
value is 0.220. Since my KS statistic value is less than 0.22, my null hypothesis is
accepted. I will show this same example in the Excel sheet also.

(Refer Slide Time: 10:17)

This is the same sheet which we were discussing earlier. Now, let us go to the KS test. The
data I have already copied here, all of it, from 61.6 to 84.3. And how do I calculate the
mean? The mean is the average, so I calculate the average. And how much is the standard
deviation? I think we have to use the sample version here, the standard deviation function
with S, and that gives 7.28, so there is a little correction here. And what is i? i is 1 to
15, and i by n is i divided by 15. What is (i minus 1) by n? Simply, whatever i I have taken,
it is i minus 1 divided by n; n is 15 here, so I use 15 directly.

I made an error here, I did not use the bracket. Fine. And what is capital F(t)? Based on
this data t, I have already calculated the mean and standard deviation. For that mean and
standard deviation, I can get the cumulative distribution function using the normal
distribution function: what is the cumulative probability for a given value x? And what is
the x value? The x value is the time here.

I will show this again so that you can follow: this is equal to the norm distribution
function. We can use the earlier version of the function or this one. The normal
distribution has to be calculated for this time. And what is the mean value? The mean value
is this. And what is the standard deviation? The standard deviation is this. And I want the
cumulative value, so I use TRUE. For the cell B22, because I do not want the reference to
change, I use the dollar signs. So, this is how I get F(t).

Now, let us see how we calculate D1. D1 is a maximum value, so rather than D1 directly I
first calculate D_i for each point. What is D_i? It is phi of (t_i minus t bar) divided by s,
which is nothing but the F value which I have calculated, minus (i minus 1) divided by n.

So, this becomes nothing but F(t) minus (i minus 1) divided by n. This is my first set, for
D1, and the second set is i by n minus F(t). Now, here I have taken the maximum value: this
is the max of this column, and this is the max of this column.

So, my KS statistic is the max of both of these, which is 0.1211. My D critical value I have
got from this table: for 15 and 0.05 it is 0.22. So, my D critical value is higher than the
KS statistic, or I can say my statistic value is less than the critical value; therefore my
null hypothesis is accepted. So, I can say that this data follows the normal distribution.
This is how we can solve it.
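
The spreadsheet steps above translate directly into a short function. This is a sketch, not the lecture's sheet: the 15 repair times below are hypothetical (only the endpoints 61.6 and 84.3 appear in the transcript), the function names are illustrative, and the critical value 0.220 is the table value quoted in the lecture for n = 15 and a significance level of 0.05.

```python
import math
import statistics

def normal_cdf(z):
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ks_normal(data):
    """Return (D1, D2, Dn) with mean and standard deviation estimated from the data."""
    t = sorted(data)
    n = len(t)
    t_bar = statistics.mean(t)
    s = statistics.stdev(t)                            # sample version, divisor n - 1
    cdf = [normal_cdf((x - t_bar) / s) for x in t]
    d1 = max(cdf[j] - j / n for j in range(n))         # Phi - (i-1)/n, with i = j + 1
    d2 = max((j + 1) / n - cdf[j] for j in range(n))   # i/n - Phi
    return d1, d2, max(d1, d2)

# Hypothetical repair times (hours); not the data set used in the lecture.
repair_times = [61.6, 64.2, 66.0, 67.8, 69.1, 70.5, 71.9, 73.0,
                74.2, 75.6, 76.9, 78.3, 79.8, 81.7, 84.3]
d1, d2, dn = ks_normal(repair_times)
critical = 0.220   # quoted table value for n = 15, alpha = 0.05
print(round(dn, 4), dn < critical)
```

For a lognormal check, the same function can be called on the logged data, for example `ks_normal([math.log(x) for x in repair_times])`.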

(Refer Slide Time: 14:51)

In the same way we can do it for the lognormal distribution. Either in the same first column,
or in another column, rather than T we take ln of T, and then we do the same thing: i, i by
n, (i minus 1) by n and everything else. The whole procedure is the same; the only change is
that in the first column, rather than T, I use the log values, ln of T, because the log
values are expected to follow the normal distribution. The rest remains the same.

So, in the lectures this week we have seen how, when we have data, we can use it for
estimating the parameters of interest, which can be the reliability, failure rate or repair
rate. If time to failure data is there, we get the failure rate; if we have time to repair
data, then using the same procedure we get the repair rate. Similarly, from repair data we
get the maintainability and the MTTR.

So, the same procedures which we discussed can be used. The procedures we have discussed in
these classes are comparatively simple ones, which can be solved by hand or in Excel. But
whenever we want to do a larger analysis, it is generally suggested to use some statistical
software. Statistical software can show you, for the same data, the results for multiple
distributions all together, and it can show you which distribution fits better based on a
goodness of fit test.

So, it ranks the distributions based on the goodness of fit test statistic, with the lowest
value first, and suggests which distributions fit best. But whenever we go for distribution
fitting, there is a question which many people face: what is the right thing to do?

(Refer Slide Time: 17:33)

So, in that case they want to know how to select a distribution. This is a problem you may
face: how to select the right distribution. The commonly understood way of doing this is that
you fit the data to multiple distributions, and whichever distribution gives you the best
fit, that means the least error or the least value of the statistic, you consider to be the
selected distribution.

However, that may many times lead to the wrong assessment. Why? Because goodness of fit alone
is not a sufficient criterion for selecting a distribution. Many times a distribution which
has more parameters, or is more flexible, is able to fit your data better, but you are
fitting the distribution based on a sample.

So, a problem can arise when you make a prediction for the future based on the fitted
distribution. Let us say you have censored data: you know the reliability up to here, and by
assuming the same distribution you are assuming that the reliability will follow this pattern
beyond that point. But if you have not selected the distribution carefully, the actual
pattern might have been like this while the selected one goes like this, or like this. So,
the distribution which is fitted, though it fits well, may not be able to capture the real
distribution.

The reason is that the data is sample data, and sample data has its own variability. All the
data is not going to fall on the line; there is inherent variability in the data, and that is
why we are doing the distribution fitting. So that may be confusing. So, how can we do this
stepwise? First we should ask the question: which distributions are applicable?

So, rather than simply choosing the best-fit distribution, let us first find out which
distributions are applicable. That means, historically or from your overall experience with
similar systems and similar failure data, what kind of distribution has been fitting when a
large amount of data was available?

So, that will guide you and suggest which distributions can be candidates for selection. In
general, for failure data analysis, Weibull is considered the distribution which fits most of
the time, so many times people consider only Weibull. Lognormal also fits in many cases, and
many times when you fit the data you will find the lognormal distribution coming out first
in the ranking.

However, that may not be enough, because although it is fitting the data, there is a
possibility that it is taking us in the wrong direction. So, how to correct that? Do not
depend only on the fitting value, because an error that is a little less or a little more does
not ensure that your estimation or your prediction will also be good. Therefore, it is
essential that you first identify which distributions are applicable. So, you make a list of
them, then you fit the data to them and select a distribution. Once you do that, you will know
the parameters, and then you use the best-fit distribution.
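
The two-step procedure described here (first list the applicable distributions, then compare fits only among them) can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any particular software: it assumes complete (uncensored) failure times, uses Bernard's median-rank approximation, and scores each candidate by the correlation coefficient of its probability plot. The failure times, function names and candidate list are hypothetical.

```python
import math
from statistics import NormalDist

def median_ranks(n):
    """Bernard's approximation to the median rank of the i-th ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Probability-plot transforms: each maps (t, F) to coordinates that fall on a
# straight line if the data follow that distribution.
TRANSFORMS = {
    "exponential": lambda t, F: (t, -math.log(1.0 - F)),
    "weibull": lambda t, F: (math.log(t), math.log(-math.log(1.0 - F))),
    "lognormal": lambda t, F: (math.log(t), NormalDist().inv_cdf(F)),
}

def rank_candidates(times, candidates):
    """Score only the listed candidates; highest plot correlation first."""
    times = sorted(times)
    F = median_ranks(len(times))
    scores = {}
    for name in candidates:
        xs, ys = zip(*(TRANSFORMS[name](t, f) for t, f in zip(times, F)))
        scores[name] = correlation(xs, ys)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical failure times in hours.
failure_times = [105.0, 230.0, 310.0, 420.0, 560.0, 700.0, 890.0, 1150.0]
# Step 1: engineering judgement decides which distributions are applicable.
applicable = ["weibull", "lognormal"]
# Step 2: compare fits only among the applicable candidates.
ranking = rank_candidates(failure_times, applicable)
for name, r in ranking:
    print(f"{name}: plot correlation = {r:.4f}")
```

A small difference between the scores should not be over-interpreted; as the lecture stresses, applicability comes first and the fit statistic second.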

This problem is much bigger in reliability, the reason being that in reliability we are dealing
with failure data. Failure data is limited, especially if we are getting it from tests: we
observe only a few failures, while the number of devices which are still working is much,
much higher.

So, the failure percentage may be somewhere around 1 or 2 percent, and that too for companies
with a good reputation. Even a company with a bad reputation would not have failures in the
range of 5 to 10 percent. Generally, to survive, good companies want to bring their failure
percentages below 1 percent, but they mostly work around these kinds of percentages.

So, therefore, the failure data is only that much, and then that failure data also has
inconsistencies: there are reporting time problems, there are assessment problems, and there
are changes in the application.

So, there are many factors which affect the variability. Therefore, before saying which is the
best-fit distribution, it is wise to first identify whether each distribution is applicable
or not. Only the distributions which are applicable to the situation should be considered
for fitting. Once you fit them, then you can say that this distribution captures the trend
better or gives a smaller error.

Similarly, let us say we have the data points plotted and a fitted line through them. We can
see in which portion the line fits the data better, whether in the initial phase or in the
later phase. Sometimes that can also help you decide which distribution is better. Sometimes
it may happen that initially it fits well but later on the points move away from the line.
That is an indication that the line you have fitted may have a different pattern than your
data actually has.

So, we need to try a few distributions and select based on that. But for failure data
analysis, Weibull has become so popular that you will find complete websites and complete
books written on Weibull analysis. Failure data analysis, or fitting the data, is itself often
called Weibull analysis.

The reason is that Weibull is a flexible distribution which can take multiple shapes, and
because of that it can capture multiple failure patterns. That is why Weibull has been very
popular. It reduces to the exponential when beta is one, and it can also take shapes close to
the normal or the lognormal for larger beta, with a roughly bell-shaped form when beta is
around three to four or higher.
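
How the shape parameter changes the failure pattern can be seen by evaluating the Weibull failure rate for a few values of beta. The sketch below is only an illustration; the scale parameter symbol theta, the value of 1000 hours and the evaluation times are assumptions chosen for the example.

```python
def weibull_hazard(t, beta, theta):
    """Weibull failure rate: (beta / theta) * (t / theta) ** (beta - 1)."""
    return (beta / theta) * (t / theta) ** (beta - 1.0)

def trend(values, tol=1e-12):
    """Classify a sequence as 'decreasing', 'constant', 'increasing' or 'mixed'."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) <= tol for d in diffs):
        return "constant"
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "mixed"

theta = 1000.0                     # assumed characteristic life, hours
times = [100.0, 500.0, 900.0]      # assumed evaluation times, hours

for beta in (0.5, 1.0, 3.0):
    rates = [weibull_hazard(t, beta, theta) for t in times]
    print(f"beta = {beta}: {trend(rates)} failure rate")
```

With beta below one the rate falls with time, with beta equal to one it stays at 1/theta, and with beta above one it rises.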

So, it can capture different patterns: an increasing failure rate, a decreasing failure rate,
and also a constant failure rate. Because of this versatility, with only two parameters, it
is widely used. Many times there has also been a tendency to use the three-parameter
distribution.

But before using the three-parameter distribution, you have to check whether the third
parameter, which says that the equipment has a minimum life, reflects a scenario that really
exists. If it exists, then only you try to use it. If it does not exist, do not force it,
because otherwise you are forcing into your analysis that for a certain period of time there
is no chance of failure, and that may be incorrect many times.
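
What the third parameter forces into the analysis can be shown with a short sketch. The parameter values below, and the name `t0` for the minimum life (location parameter), are assumptions for illustration only.

```python
import math

def weibull_reliability(t, beta, theta, t0=0.0):
    """Weibull reliability with an optional minimum life t0 (location parameter)."""
    if t <= t0:
        return 1.0   # the model allows no failure at all before t0
    return math.exp(-(((t - t0) / theta) ** beta))

beta, theta = 1.5, 1000.0                                     # assumed shape and scale
r_two = weibull_reliability(50.0, beta, theta)                # two-parameter model
r_three = weibull_reliability(50.0, beta, theta, t0=100.0)    # forced minimum life of 100 h

print(f"R(50 h), two-parameter:   {r_two:.4f}")
print(f"R(50 h), three-parameter: {r_three:.4f}")
```

The three-parameter model reports a reliability of exactly one at 50 hours, so any early failures that do occur in the field would be invisible to it.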

The same applies to the two-parameter exponential distribution. Therefore, when we are doing
this analysis, we have to properly identify which distributions are the right candidates, and
once we have the candidate distributions, we try to find the best-fit distribution among
them. Most of the time in practical reliability analysis, as I have observed when working with
various companies, they try to fit the data to the Weibull distribution. That is also an
acceptable approach which has been used successfully by many companies, and it makes their
processes simpler.

They do not have to be statisticians who fit multiple distributions and so on. You can develop
a single tool, a single macro in an Excel sheet, or some other simple way of doing the
analysis; a small amount of code is enough.
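
As an idea of how small such a tool can be, here is a minimal sketch of a Weibull fit by median rank regression. It assumes complete (uncensored) failure times and Bernard's median-rank approximation; the function names and the sample data are hypothetical, and a real tool would also need to handle censored units.

```python
import math

def fit_weibull_mrr(times):
    """Least-squares fit of ln(-ln(1 - F)) = beta * ln(t) - beta * ln(theta)."""
    times = sorted(times)
    n = len(times)
    x = [math.log(t) for t in times]
    # Bernard's approximation for the median rank of the i-th failure.
    y = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    beta = slope                            # shape parameter
    theta = math.exp(-intercept / beta)     # scale parameter (characteristic life)
    return beta, theta

def weibull_reliability(t, beta, theta):
    """Two-parameter Weibull reliability R(t) = exp(-(t / theta) ** beta)."""
    return math.exp(-((t / theta) ** beta))

# Hypothetical failure times in hours.
failure_times = [150.0, 340.0, 560.0, 800.0, 980.0, 1300.0, 1700.0]
beta_hat, theta_hat = fit_weibull_mrr(failure_times)
print(f"beta = {beta_hat:.2f}, theta = {theta_hat:.0f} h")
print(f"R(500 h) = {weibull_reliability(500.0, beta_hat, theta_hat):.3f}")
```

The same regression can be set up in a spreadsheet with a slope and an intercept function, which is why a single Excel macro is often enough in practice.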

So, with that, you may not need specialized statistical software, and you can do all this
analysis using your Excel sheet, a simple C program, or some other programming language.
However, for detailed statistical analysis, we have a significant number of tools available.
Right now we have Minitab, which has functions for time-to-failure data analysis, so
reliability analysis is in a way represented there.

We also have MATLAB, where data fitting and curve fitting are available. Among the other
statistical software, R is there, and Python can also be used. All this software can be used
for fitting the data and finding the parameters of the distribution. So, we find out which
distribution is good, that is the model selection, and then we find the parameters of the
distribution, that is the parameter estimation for the model.

Once we have these parameters, we can use them for the system studies that we discussed in
earlier weeks: we can find the reliability of the components, we can find the reliability of
the system, and we can also decide our maintenance policies.

So, most of our discussion till now has been focused on reliability, or time to failure. In
the next classes we will briefly discuss repair. We discussed repairable systems briefly
during the first week, where we covered maintainability and availability. We will put a
little more discussion on the same aspect: how maintainability can be measured and how it can
be useful, and similarly how availability can be measured and used. That we will do in the
next classes. So, we will stop here today. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 36
Maintainability and Availability

Hello everyone. In previous lectures we discussed about reliability and data analysis. Today,
we will start discussing about Maintainability and then subsequently we will also discuss
about the Availability.

(Refer Slide Time: 00:54)

So, generally, we can classify systems as non-repairable or repairable. A non-repairable
system is one which is not considered for repair. There can be various reasons for that. One
reason can be that repair is costly, or that the system inherently has characteristics which
do not allow repair. Let us take the example of a satellite.

A satellite is difficult to repair: whatever automatic recovery features are built in are
possible, but manual repair may not be feasible. Similarly, missiles are just launched, and
if they fail you cannot repair them. But many times there is an economic reason also. If I
have a problem with my pen, most of the time I will throw it away because pens are cheap.

So, cheap things I will generally throw away if they fail, but a bulky, big and costly
equipment I would like to get repaired, because repair will be cheaper than replacement. If
repair is not cheaper, then I would consider the system as non-repairable. So, sometimes cost
also becomes the reason for a system to be repairable or non-repairable. For a cheap system,
nobody will be running repair shops, but costly systems you will want to repair.

If there is some fault, you want to remove the faulty part so that you can continue to use the
rest, which is good. By replacing or repairing the faulty part, you are able to continue to
use the system. When we have a problem with our ACs or refrigerators, we would like to get
them repaired. We will not just throw them away because some fault or failure has happened.

In repairable systems there can be two types of maintenance. One is corrective maintenance.
It generally suits our household items, whose failure is not going to cause any bigger
problem. A function will not be available, but we can survive without that function, and
the losses from the loss of function are small. If my AC is not working, I will feel
discomfort, and I will try to move to another place where I have an AC.

So, I will try to manage the situation. When the AC is not working, it is not affecting my
health, and I am not incurring any bigger losses because of it. But in some cases ACs are
installed where, let us say, some biomedical equipment needs the temperature to be maintained.
If the AC fails there and is not corrected in time, then you may have bigger losses.

So, there are cases where the losses are high or where there is a safety problem, where a
failure affects safety, like a cylinder leakage, or any system whose failure has a safety
consequence. Or take my motorbike: if I am going somewhere and it stops on the way, I will be
stranded and I will have a lot of problems.

So, for that kind of problem, I will not depend on corrective maintenance. I cannot say, let
it fail, and once it has failed I will correct it, because when the failure happens I will be
in trouble. I am stranded, my work is stopped, I cannot do whatever I was supposed to do, and
alternative measures will be difficult to find. So, what do we do?

In such cases we do proactive maintenance. Where large losses are associated with a failure,
or where there is a safety problem, we are concerned about failures. We do not want failures
to happen, because there is a high risk associated with them. In that kind of scenario we
want that our system should not fail. So, what will we do?

We will proactively do the maintenance. That means we will do the maintenance before the
failure happens, so that we keep the system in good health. Once we keep the system in good
health, we ensure that our chances of failure are bare minimum and our risk is also less, so
that we do not face those kinds of problems. There are two ways we can do that: preventive
maintenance or predictive maintenance.

Preventive maintenance was the first concept used for proactive maintenance. If we consider
the bathtub curve and remove the initial early-failure portion by burn-in, then what remains
of concern is the point where degradation starts.

We know fairly well how long these components or parts of the system will work without much
chance of failure. But after a certain period of time these parts tend to show degradation,
and because of the degradation the chances of failure become high. Examples are the engine
oil, the filters and the brakes.

So, after we use them for sufficient kilometers or sufficient time, items like tires and
batteries reach a certain kind of life limit. A battery, let us say, works for three or four
years, so after every three or four years I have to replace it. That is preventive
maintenance.

Even if I do not see much problem with the battery, I will replace it, because I have a fixed
duration. This duration is identified for us by the manufacturer, who will suggest the time at
which the scheduled maintenance has to be carried out.

So, preventive maintenance is also called scheduled maintenance. When we purchase a new
vehicle, we are told to bring it in within a year. Even later, when I am using my car, I
have to go for maintenance at least once every year. That is preventive maintenance. If I do
not go for maintenance, do not change the oil, do not clean the filters, then my engine will
be loaded, engine problems will start, and my car will develop a bigger problem.

So, in preventive maintenance we do the maintenance to avoid the bigger failures. By
preventive maintenance we reduce the chances of failure, so our risk from the failures of
concern is reduced. But in preventive maintenance, even if the part looks good and the system
is functioning, I still have to do the replacement or the repair.

That sometimes goes against the principle of using the part cost-effectively, or using it to
the maximum. So, many times, if it is feasible and we have ways to identify the current state
of the system, we go for predictive maintenance. In predictive maintenance we try to assess
the current state of the part or the system. We will do some sort of inspection.

So, we will do periodic inspection and try to find out the health or the quality of the part.
If the quality is good and we see that the chances of failure are low for a certain period
of time, then we will continue to use it and check it again later. If at that time we see
that the part has deteriorated, then we will do the repair.

Nowadays a lot of sensor technology is coming. Sensors will sense various parameters like
temperature, vibration, acoustic signals, and many other parameters like oil content, and
based on that we try to find out whether the system health is good or not.

So, if the system health has deteriorated, then by using suitable principles we can try to
assess how much life is remaining. So, the concept of remaining useful life (RUL) comes into
the picture. We try to estimate how much life remains in the current condition. That is the
time available to us to do the maintenance.
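
One very simple way to turn inspection readings into an RUL figure is to extrapolate the degradation trend to a failure threshold. The sketch below assumes a linear trend, which is only one of many possible degradation models and is not prescribed by the lecture; the wear readings, the threshold and the function name are hypothetical.

```python
def estimate_rul(times, readings, threshold):
    """Fit a straight line to the readings and extrapolate to the failure threshold.

    Returns the remaining time after the last inspection, or None when the
    readings show no upward trend to extrapolate.
    """
    n = len(times)
    mt = sum(times) / n
    mr = sum(readings) / n
    sxy = sum((t - mt) * (r - mr) for t, r in zip(times, readings))
    sxx = sum((t - mt) ** 2 for t in times)
    slope = sxy / sxx
    intercept = mr - slope * mt
    if slope <= 0:
        return None
    t_fail = (threshold - intercept) / slope   # time at which the line reaches the threshold
    return max(0.0, t_fail - times[-1])

# Hypothetical inspection data: operating hours and measured wear in mm.
hours = [0.0, 100.0, 200.0, 300.0]
wear_mm = [1.0, 1.5, 2.0, 2.5]
rul = estimate_rul(hours, wear_mm, threshold=4.0)   # assumed failure threshold of 4 mm
print(f"Estimated remaining useful life: {rul:.0f} h")
```

In practice the readings are noisy, so the estimate should be tracked over several inspections before a maintenance date is fixed.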

Because time is available, we are not forced to act immediately; we can plan when to do the
maintenance. We set some time at which we will go for the maintenance and do the repair. But
the concern here is that, to do predictive maintenance, we should have a way to do it.

We should have a way to identify the health condition, and we should have reliable measures.
Whatever way we use, it should not fluctuate too much. It should not happen that at one
moment we say the part is good and at the next moment the signal says it has become bad. If
that happens, it is of no use. So, whatever RUL we are estimating, that estimate has to be
fairly good and fairly stable; then only are we able to use these techniques.

So, this depends a lot on the technologies available, and it may not be possible for all
types of failures. But wherever it is possible we can do this, so that we ensure high
reliability while making sure that we are not doing maintenance unnecessarily, which happens
in the case of preventive maintenance.

(Refer Slide Time: 11:43)

Why are we concerned about maintenance? The concern comes from the downtime. Whatever time
passes after a failure until the system starts working again is our downtime. During that
time the system is down and we are not able to use it. So, downtime is the area of concern
whenever we are studying maintenance or maintainability.

This downtime can be passive downtime. Passive downtime comes from the delay in reporting, or
the delay in realizing that a failure has happened. Many times the failure is not immediately
observed by the people who are responsible for the system, or it is not immediately reported
to the concerned people who can do the maintenance. So, this takes some time.

So, there is a delay there. There is also a logistic time delay, because logistics are
required. Administrative time is there: a lot of decisions need to be taken and a lot of
things need to be arranged, and then only the repair happens. These delays happen whenever
some failure happens. Logistics relate to the availability of the manpower, the equipment
and the spare parts. If you do not have these, you will not be able to perform the repair.

So, you require some time to ensure that these things are available. These delays are mostly
related to the administration or the management. How well you manage will decide how well you
are able to minimize them. But there are also system-related downtimes; these are the
active downtimes. Active downtime is the time when you are actually doing the repair. So,
what affects the repair time?

Repair time is affected by the access time. You have an assembly, and you have to get access
to the part. If there is a motor problem in a pump, you first have to open the pump and reach
the motor; then only can you check whether it is working or not and what kind of problems are
there.

So, access time is the time required to open your system and reach the problem areas.
Accessing the problem areas is itself part of the maintenance or repair activity. Then comes
the diagnostic time. Once you have access, you have to perform certain tests, and you have
to use logic, your experience, knowledge and procedures.

Based on that you will determine what the problem is, which part or which assembly is
creating the problem. Once you find that out, you have to decide what you will do about
it. So, diagnosis will also take some time. Then comes the spare part procurement time.

Once you diagnose the problem, you have to do the repair or replacement. For both, you need
certain parts and certain equipment; then only can you do the repair or replacement. So,
these parts should be available to you, and it may take some time for them to become
available.

If the spare part is already available, you can use it directly. Then comes the replacement
or repair time. That is the actual time, once the spare part or the material required for the
repair is available to you, in which you do the work, manually or using machines, to repair
or replace the faulty part of the system.

Once you do that, you will check whether the changes you made have been effective, whether
the problem is resolved after the changes, and whether all other things are still working as
before, as expected.

So, you try to ensure that after the changes you have made, the normal working is not
affected and the changes are also effective, that the problem is resolved. This is the
checkout or testing time. Then comes the system or tool alignment time.

Many times, for mechanical systems, you may have to make certain alignments and certain
measurements to make sure that the system is in perfect order, as good as it was earlier. So,
this time may also be required. You can also call this the installation time, because once you
have corrected the problem, you have to install the device again and make sure that it comes
back to running condition.

So, among these times, the passive downtime which we discussed earlier mostly comes from the
management side: how you manage and what kind of facilities you have created. Facilities will
also impact the active downtime. But active downtime comes mainly from the design
perspective. Take access time: access time is affected by the design.

If you keep a frequently failing part deep inside your design, what will happen? To get
access to the problematic part, you have to disassemble all the parts which are working. Every
time you have to do more work, and the time taken for repair will be high. So, access time is
in the designer's hands.

The designer has to make sure that the parts which fail faster, which are more prone to
failure, are more accessible, so that repair can be done easily. The parts which generally do
not fail, or have a low probability of failure, can be placed deep in the design, because
they will not need to be accessed many times.

So, that way we have to design. How much the access time will be is a concern for the
designer. Diagnostic time is again dependent on the designer. If good diagnostic features
have been built into the design, and good diagnostic procedures have been provided by the
designer, then the person will be able to diagnose the problem faster and correctly.

Spare part availability is partially affected by the designer and partially by the
management. What type of spare part are we going to use? Spare part procurement we can
actually keep as passive downtime. It is not active downtime.

Because procurement time is a delay, we can consider it as passive downtime rather than
active downtime. But whether an appropriate spare part exists or not is a design feature.
During the design, if you have properly designed your spare parts, and made sure that a
faulty part can be quickly diagnosed and quickly replaced, what will happen?

Your repair time will be smaller. Then comes the replacement or actual repair time. That
again depends on what kind of material is there and what kind of design is there. It will
also depend on the maintenance facility, and on whether the crew which is going to do the
work is properly trained or not.

So, crew training and crew experience also come into the picture. But it also depends on the
design. If, by design, repair is easy, you will be able to do it easily; if replacement is
easy, you will be able to do it easily. But if for a replacement you have to open 10 or 20
screws, and those are of different sizes and different kinds, then the replacement will take
more time.

But if you make it simple, with only a few screws, all of the same type so that the same tool
is required, then it will be easier. If you have to deal with different kinds of fasteners,
then you need different tools, and that will also take a lot of time. Then testing time again
depends on the design.

The designer will decide how the checkout, or functional test, will be carried out at the end
of the repair. If the procedures are efficient, it will take less time. If they are not so
efficient, or the equipment is not so good, it may take more time. So, this is partially
influenced by the design, but it is also determined by how well the maintenance facility has
been set up.

If they have good tools, good processes and good testing facilities, they will be able to do
it faster. So, as we see, maintainability, or the active downtime, or the active repair time,
is largely affected by the design, while the other delays may be affected by the management.

But many of the things are in the designer's hands. So, this active repair time is a design
feature. That is why we say that we have to design for maintainability. The time to repair
(TTR) which we are talking about consists of the access time, the diagnostic time, then after
the diagnosis the actual replacement or repair, then the checkout or testing, and then, if
alignment is required, the alignment.

So, TTR consists of all of these except the spare part procurement time, which is the
logistic time. TTR is governed by the designer. If you design well, TTR can be smaller; if you
design poorly, TTR can be higher. So, repair time can be controlled by the design. That is
why, when we say design for reliability, we also say design for maintainability. Smaller TTR
means higher maintainability, as we discussed earlier.
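
The bookkeeping of active and passive downtime can be made concrete with a few numbers. All durations below are hypothetical values for a single repair event, chosen only to show which elements count toward TTR and which do not.

```python
# Hypothetical durations for one repair event, in hours.
active = {
    "access": 0.5,
    "diagnosis": 1.0,
    "repair_or_replacement": 2.0,
    "checkout_testing": 0.5,
    "alignment": 0.25,
}
passive = {
    "failure_reporting_delay": 1.5,
    "administrative": 2.0,
    "spare_part_procurement": 6.0,   # logistic delay, not part of TTR
}

ttr = sum(active.values())                    # active repair time (design-driven)
total_downtime = ttr + sum(passive.values())  # what the user actually experiences

print(f"TTR (active repair time): {ttr:.2f} h")
print(f"Total downtime:           {total_downtime:.2f} h")
```

In this made-up case the design controls only about a third of the total downtime, and the rest depends on how well the logistics and administration are managed.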

(Refer Slide Time: 22:43)

\[ H(t) = \Pr\{T \le t\} = \int_0^t h(x)\,dx \]

\[ \mathrm{MTTR} = E(T) = \int_0^\infty t\,h(t)\,dt \]

\[ \mu(t) = \frac{h(t)}{1 - H(t)} \]

This slide we have already discussed; I am showing it again just as a refresher, because we
covered it during the first week of the lectures. What is maintainability? Maintainability
H(t) is the probability that our time to repair is below a certain time, that is, that the
repair completes within time t. What is MTTR? MTTR is the expected value of the time to
repair.

For that I can use this formula, where small h(t) is the PDF of the repair time. What is the
repair rate? The repair rate is similar to the failure rate: it is the conditional probability
of completing the repair per unit time, given that the repair has not been completed so far.
Then we have the median time to repair. MTTR is the mean value. What is the median value?

The median value of any random variable is the fifty percent value. That means it is the time
at which there is a 50 percent chance that the repair will be complete and a 50 percent
chance that it will not be complete.

Or we can say that 50 percent of the repairs will be completed within this time and 50
percent will not. So, it is a fifty-fifty value: at this time the system may or may not have
been repaired, and both have an equal chance of 50 percent.

Many times we may use this value rather than MTTR, especially when the repair times are
highly skewed, because for a skewed distribution the mean is pulled toward the long tail and
MTTR may be difficult to interpret as a typical repair time.
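
The gap between the mean and the median for skewed repair times can be illustrated with a lognormal repair-time model. The lognormal choice and its parameter values are assumptions made for this sketch; the lecture does not fix a repair-time distribution at this point.

```python
import math
from statistics import NormalDist

# Assumed lognormal repair-time parameters (mean and standard deviation of ln T).
mu_ln, sigma_ln = math.log(2.0), 1.0

def maintainability(t):
    """H(t): probability that the repair is completed within t hours."""
    return NormalDist().cdf((math.log(t) - mu_ln) / sigma_ln)

median_ttr = math.exp(mu_ln)                    # 50 percent point of the repair time
mttr = math.exp(mu_ln + sigma_ln ** 2 / 2.0)    # mean of a lognormal variable

print(f"Median time to repair: {median_ttr:.2f} h, H(median) = {maintainability(median_ttr):.2f}")
print(f"MTTR:                  {mttr:.2f} h, H(MTTR)   = {maintainability(mttr):.2f}")
```

With these assumed values the MTTR is noticeably larger than the median, and well over half of the repairs finish before the MTTR, which is why the median is often the easier number to interpret.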

(Refer Slide Time: 24:43)

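\[ M = \frac{\lambda \cdot \mathrm{MTTR} + \dfrac{\mathrm{MPMT}}{\mathrm{TPM}}}{\lambda + \dfrac{1}{\mathrm{TPM}}} \]

\[ \frac{\mathrm{MH}}{\mathrm{OH}} = \lambda \cdot \mathrm{MTTR} \cdot \mathrm{CREW} + \frac{\mathrm{MPMT} \cdot \mathrm{CREWPM}}{\mathrm{TPM}} \]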

There are some other maintainability measures which are used. One is tp: the time within
which we expect p percent of the failures to be repaired. That is similar to the design
life which we used in reliability, td or tr: the time at which the reliability is equal to
r.

So, when we said t0.9 in reliability, we meant the time at which my reliability is 90
percent. Similarly here, tp is the time at which the maintainability is equal to p. That
means t0.9 is the time within which there is a 90 percent chance that my repair will be
completed. Similarly, I have the mean system downtime.
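
For a concrete feel of tp, the sketch below solves H(tp) = p for an exponentially distributed repair time, where H(t) = 1 - exp(-t / MTTR). The exponential assumption and the MTTR of 4 hours are chosen only for illustration; other repair-time distributions give different percentiles.

```python
import math

def repair_time_percentile(p, mttr):
    """t_p for exponential repair time: solve 1 - exp(-t / MTTR) = p for t."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return -mttr * math.log(1.0 - p)

mttr = 4.0                                   # assumed MTTR in hours
t_50 = repair_time_percentile(0.5, mttr)     # median time to repair
t_90 = repair_time_percentile(0.9, mttr)     # 90 percent of repairs finish by this time

print(f"t0.5 = {t_50:.2f} h, MTTR = {mttr:.2f} h, t0.9 = {t_90:.2f} h")
```

Under this assumption the 90 percent repair time is more than twice the MTTR, so quoting MTTR alone can understate how long the slower repairs take.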

Mean system downtime is like the MTTR we considered, but MTTR considers only the time to
repair after a failure. Most of the time this TTR is taken as the corrective maintenance
time, because that is the time of concern from the reliability or availability point of view:
corrective maintenance time is lost directly from the operation.

Many times the understanding is that preventive maintenance can be done in slack hours,
that is, at times when we do not need the system, or when the schedule ensures that the
system is not so highly required.

So, the time lost due to corrective maintenance is generally the major concern. Whenever we
say MTTR, we generally mean the time we lose in repair due to corrective maintenance only.

But actually we are losing time not only due to corrective maintenance; we are also losing
time during preventive maintenance. So, if we include the preventive maintenance time in our
assessment of the downtime, then we call that the mean system downtime.

So, mean system downtime is lambda into MTTR plus MPMT divided by TPM, all divided by lambda
plus 1 upon TPM. TPM is the preventive maintenance interval. So, what is TPM? TPM is the time
at which I will do the preventive maintenance, then again at 2 TPM, then at 3 TPM. So, these
are the places where I will do the preventive maintenance. And what is MPMT?

This is the Mean Preventive Maintenance Time. Whenever I am doing preventive maintenance, the
time I spend on it is the preventive maintenance time, PMT, and if you take the average of
that, it becomes the MPMT. So, let us say my concern is time T, that is, I am considering that
my system is working for time T.

So, for time T, how many failures am I expecting? I am expecting, let us say, lambda T
failures. The expected number of failures is generally given as T divided by MTTF, or we can
say this is equal to lambda T. So, lambda T is the number of failures. Every time a failure
happens, we have to do the repair, and that takes the time to repair. So, lambda T into MTTR
is the time I am spending in corrective maintenance during the time interval T.

And how much time am I spending in preventive maintenance? In each cycle I am spending the
MPMT time out of TPM. So, out of t, how many maintenance actions am I going to take? I will
say t upon TPM; that will give me the number of times I am doing the preventive maintenance.
And each preventive maintenance is taking the MPMT time.

So, the time spent in preventive maintenance is MPMT into t divided by TPM. Adding this to
lambda t into MTTR gives the total time which I am spending in maintenance when I am using the
system for time t. But the mean downtime is the downtime per maintenance action, so we also
need to know how many maintenance actions we are taking.

How many maintenance actions am I taking? That is the number of failures plus the number of
preventive actions. The number of failures is lambda t, and the number of preventive
maintenance actions is t divided by TPM. So, if you divide the total downtime by the total
number of maintenance actions, you get the mean downtime.

And since t is common to all these terms, t can be removed. Once we remove the t, we get this
formula: lambda into MTTR, that is failure rate into MTTR, plus MPMT divided by TPM, all
divided by lambda plus 1 upon TPM. Here 1 upon TPM is a kind of frequency of preventive
maintenance.
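
Written out in symbols, the cancellation of t described above looks as follows. This is only a restatement of the spoken argument; the notation \bar{M} for the mean system downtime and T_{PM} for the preventive maintenance interval is taken from the slide.

```latex
\bar{M}
  = \frac{\text{total downtime in } t}{\text{number of maintenance actions in } t}
  = \frac{\lambda t \cdot \mathrm{MTTR} + \dfrac{t}{T_{PM}} \cdot \mathrm{MPMT}}
         {\lambda t + \dfrac{t}{T_{PM}}}
  = \frac{\lambda \cdot \mathrm{MTTR} + \dfrac{\mathrm{MPMT}}{T_{PM}}}
         {\lambda + \dfrac{1}{T_{PM}}}
```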

So, effectively that gives me the mean system downtime. Then, how much is the mean time to
restore? In the mean system downtime we are again considering only the active repair time.
MTTR is active repair time; mean preventive maintenance time is also active maintenance time.

The difference is that MTTR is the time which you are spending in corrective maintenance, and
MPMT is the time which you are spending in preventive maintenance. But there are also delay
times. MDT is the maintenance delay time and SDT is the supply delay time. What are these?

So, we are not only taking the mean time for the actual repair; we are also taking the
maintenance delay. What happens? Let us say you take your vehicle for repair. You may have to
wait there, because the person or the crew which is supposed to repair your vehicle is busy
with some other task.

So, the availability of the repair crew and the maintenance tools gives the maintenance delay
time. When you do the repair and you have done the diagnosis, you require certain spare parts,
certain materials, etcetera. How much time does it take to get that material? Many times
that material is available.

So, then supply delay time is zero. But many times that material or the spare may not be
available, in that case, you have to order and get that spare. So, that is the supply delay time.
So, supply delay time is due to the part requirement, maintenance delay time is due to the
maintenance facility requirement.

If we combine all of these, then we call it the mean time to restore, which is the actual time
taken from the failure to getting the system up again. Then sometimes we try to find out, on
an average, how many maintenance work hours we are putting in for every operating hour.

So, this is a comparison of maintenance hours with operating hours, that is, maintenance hours
divided by operating hours. Let us say we have t operating hours. In time t, the number of
failures will be lambda t.

And for every failure the time taken will be MTTR. So, lambda t into MTTR is the time which
you are spending in repair. But there may be more than one person in the crew. If only one
person is working, the crew size is 1, but if more than one person is supposed to work, then
the number of man-hours spent is the repair time multiplied by the crew size.

So, this becomes the man-hours out of t operating hours. t and t get canceled and you have
lambda into MTTR into crew. If you also want to include the time spent in preventive
maintenance, then we do the same thing as earlier and add the preventive maintenance term:
the mean preventive maintenance time divided by TPM, multiplied by the preventive maintenance
crew size. It is the same kind of formula as we discussed earlier.

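(Refer Slide Time: 34:35)

$$H(5) = \Phi\left(\frac{\ln 5 - \ln 3.5}{0.18}\right) = \Phi(1.98) = 0.976$$

$$\mathrm{MTTR} = 3.5 \exp\left(\frac{0.18^2}{2}\right) = 3.557$$

$$\lambda = \frac{1}{\mathrm{MTTF}} = \frac{1}{1000} = 0.001 \text{ per hour}$$

$$\frac{MH}{OH} = \lambda \cdot \mathrm{MTTR} \cdot \mathrm{CREW} + \frac{\mathrm{MPMT} \cdot \mathrm{CREW}_{PM}}{T_{PM}} = 0.001 \times 3.557 \times 2 + 2 \times \frac{2}{200} = 0.027 < 0.03$$

$$\bar{M} = \frac{\lambda \cdot \mathrm{MTTR} + \dfrac{\mathrm{MPMT}}{T_{PM}}}{\lambda + \dfrac{1}{T_{PM}}} = \frac{0.001 \times 3.557 + \dfrac{2}{200}}{0.001 + \dfrac{1}{200}} = 2.2595 \text{ hr}$$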

We have tried to solve this example using the lognormal repair distribution. So, what is given
to us? The median time to repair is given as 3.5 hours and s is given as 0.18. Now, we want to
check whether the specifications are met or not. The specification asks that 95 percent of the
repairs be completed within 5 hours. So, that means I have to calculate H 5. H 5 is phi of Ln
of 5 minus Ln of 3.5 divided by 0.18.

We know this is Ln of 5 minus Ln of t median, divided by s. This comes out as phi of 1.98,
which is 0.976. So, this is higher than 0.95. That means my criterion of 95 percent of
maintenance being completed within 5 hours is met. The second requirement is that the number
of maintenance hours should be less than 3 hours for every 100 operating hours.

So, that means we have to calculate MH upon OH. For calculating MH upon OH, we need lambda,
we need the MTTR, etcetera. So, what is MTTR? As we know, for the lognormal distribution MTTR
is given as t median into e to the power s square by 2. So, this is t median, and this is the
exponential of s square by 2. This becomes my MTTR.

And the MTTF we have to get from the problem statement. The failure distribution is
exponential with MTBF of 1000 hours. So, MTBF is already given here, that is 1000. If MTBF is
1000, then lambda will be 1 upon MTBF, so lambda will be 0.001 per hour. Now I can calculate
MH upon OH.

Lambda is here, MTTR is here, and the crew size is already given as 2; 2 people are required
for both preventive maintenance and corrective maintenance. So, the corrective part is 0.001
into 3.557 into 2. And for preventive maintenance, the mean preventive maintenance time is 2
hours, so MPMT is 2. And how frequently is that maintenance carried out? Every 200 hours.

So, TPM is 200 and the crew is 2. If we calculate this, it turns out to be 0.027, which is
less than 0.03. Where does 0.03 come from? The requirement is that the number of maintenance
hours be less than 3 hours for every 100 operating hours, that is 3 divided by 100, that is
0.03. So, we have verified that MH upon OH is less than 0.03, which means my requirement is
met. Then we can also calculate M bar here.
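
The numbers in this example can be reproduced with a short script. The following is a minimal sketch using only the Python standard library; the function and variable names are illustrative assumptions and do not come from the lecture, and the preventive and corrective crew sizes are both taken as 2, as stated in the example.

```python
from math import exp, log
from statistics import NormalDist

# Data of the example (all times in hours).
T_MEDIAN = 3.5   # median time to repair
S = 0.18         # lognormal shape parameter s
MTBF = 1000.0    # mean time between failures (exponential failures)
CREW = 2         # crew size, assumed the same for corrective and preventive work
MPMT = 2.0       # mean preventive maintenance time
T_PM = 200.0     # preventive maintenance interval


def lognormal_maintainability(t, t_median, s):
    """H(t) = Phi((ln t - ln t_median) / s) for lognormal repair times."""
    return NormalDist().cdf((log(t) - log(t_median)) / s)


def lognormal_mttr(t_median, s):
    """MTTR = t_median * exp(s^2 / 2) for lognormal repair times."""
    return t_median * exp(s ** 2 / 2)


lam = 1.0 / MTBF                                   # failure rate per hour
h5 = lognormal_maintainability(5.0, T_MEDIAN, S)   # fraction repaired within 5 h
mttr = lognormal_mttr(T_MEDIAN, S)
mh_per_oh = lam * mttr * CREW + MPMT * CREW / T_PM  # maintenance hours per operating hour
m_bar = (lam * mttr + MPMT / T_PM) / (lam + 1.0 / T_PM)  # mean system downtime

print(f"H(5)  = {h5:.3f}  (requirement: at least 0.95)")
print(f"MTTR  = {mttr:.3f} h")
print(f"MH/OH = {mh_per_oh:.3f}  (requirement: below 0.03)")
print(f"M bar = {m_bar:.4f} h")
```

The printed values agree with the slide to the rounding shown there.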

To maintain the warranty, a 2-hour preventive maintenance is performed, as we have already
seen, the crew size is 2, and the failure distribution is exponential. So, though the question
is not asking for M bar, we can calculate M bar here. That is the mean system downtime, and it
comes out to be 2.2595 hours. So, we will stop here and we will continue our discussion about
the topic in the next class.
Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 37
Maintainability and Availability (Continued)

Hello everyone. In the previous lecture we discussed maintainability. So, we will continue
our discussion in the same direction.

(Refer Slide Time: 00:37)

So, we discussed how maintainability can be calculated. Maintainability is what we were
writing as H t. So, H t is equal to the probability that T is less than or equal to t, or we
can say it is the integration from 0 to t of h x d x, where h is the density function of the
repair time. Now let us see how we can calculate the same for the exponential distribution.
So, if the repair time is following the exponential distribution, can we calculate it?

So, this is in a way going back to what we have done in week 1 or week 2. We are doing the
same thing again. There we discussed the time to failure and reliability; here, we are
discussing maintainability. In between we have done data analysis etcetera, so we are able to
understand how to get this mu: we will estimate mu from the data. So, here it is expected that
we have fitted the exponential distribution to the time to repair data and we have found that
mu is 10 per day.

And here the day is an 8-hour day, that means in a day there are only 8 working hours. So, if
I say mu is 10, that means 10 repairs per day, that is, 10 repairs in 8 hours. What is the
probability of a single repair exceeding one hour? So, the repair rate is given as 10 per day,
and one day is equal to 8 hours.

So, what is MTTR? MTTR is 1 upon the repair rate. Just as MTTF is equal to 1 upon lambda, in
the same way MTTR is 1 upon mu, where mu is the repair rate. So, that will become 0.1 day.
Now one day is equal to 8 hours. So, I can say 0.1 into 8 hours, that is 0.8 hours.

So, my MTTR is known to me and mu is known to me, so I can find the probability. What is the
probability of a single repair exceeding one hour? That means it is 1 minus the
maintainability. Maintainability is the probability of completing the repair within one hour.
So, my H t here is H 1, the probability of completing the repair in one hour. And what am I
interested in?

I am interested in the probability of not completing the repair in one hour, that means my
repair is exceeding one hour. So, I am interested in the probability that T is greater than
one hour, that is 1 minus H 1. We know that H t is equal to 1 minus e to the power minus mu t,
or we can say minus t divided by MTTR. The same thing we use here.

So, I am interested in 1 minus H 1. So, 1 minus H 1 will be equal to e to the power minus t
upon MTTR. How much is t here? t is 1 here. So, it is e to the power minus 1 divided by 0.8,
because the MTTR is 0.8. Once I solve this, taking the exponential, my value comes out to be
0.2865. So, that means there is a 28.65 percent chance that my repair will exceed one hour.
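
As a quick check of this calculation, here is a minimal sketch in Python. The function name `exponential_maintainability` is an illustrative assumption; the only inputs are the repair rate of 10 per day and the 8-hour working day from the example.

```python
from math import exp


def exponential_maintainability(t, mttr):
    """H(t) = 1 - exp(-t / MTTR) for exponentially distributed repair times."""
    return 1.0 - exp(-t / mttr)


repairs_per_day = 10.0   # repair rate mu, per working day
hours_per_day = 8.0      # one working day is 8 hours

# MTTR = 1 / mu = 0.1 day = 0.8 hour
mttr_hours = hours_per_day / repairs_per_day

# Probability that a single repair takes longer than one hour: 1 - H(1)
p_exceed = 1.0 - exponential_maintainability(1.0, mttr_hours)
print(f"MTTR = {mttr_hours} h, P(repair > 1 h) = {p_exceed:.4f}")  # about 0.2865
```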

(Refer Slide Time: 04:18)

Though we have done one example for lognormal already, we will do it again here. A
requirement exists for an engine fuel pump to be repaired within 3 hours, 90 percent of the
time. So, our t is 3 hours, and my requirement is that H 3 is greater than or equal to 0.9. So,
let us take H 3 equal to 0.9. The repair distribution is lognormal with s equal to 0.45.

So, what is given is that 90 percent of the repairs are completed within 3 hours. That means
my maintainability for 3 hours is 0.9. My s value is given here as 0.45, and I want to know
how much the MTTR is. We know that for the lognormal distribution there are two parameters:
one is s, another is t median. Here, t median is not given to us. So, first, to know the
distribution we have to calculate the t median. How can we calculate the t median?

We know that H 3 is equal to the standard normal cumulative distribution, phi, of Ln of 3
minus Ln of t median divided by s, where s is 0.45. And how much is H 3? That is given to us as
0.9. So, now I can take the phi inverse of 0.9; phi inverse of 0.9 is 1.28. So, Ln of 3 minus
Ln of t median divided by 0.45 is equal to 1.28. So, Ln of 3 minus Ln of t median would be
equal to 1.28 into 0.45, or I can say Ln of t median would be equal to Ln of 3 minus 1.28 into
0.45. So, t median I can get as the exponential of Ln of 3 minus 1.28 into 0.45. This comes out
to be 1.686 hours.

Now, once we get this t median, I can get the MTTR. The MTTR formula is known to us, that is
t median into e to the power s square by 2. I know the t median and I know the s value, that
is 0.45. Once I put these in the formula, I get the MTTR, that is 1.866 hours. Generally, we
will see that most of the time the time to repair is considered to be following either the
exponential distribution or the lognormal distribution. These are the two prominent
distributions which are used for the time to repair, while for failure data we mostly consider
the exponential or the Weibull distribution. So, we can use this.
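
The fuel pump example can be checked numerically. This is a minimal sketch using `statistics.NormalDist` from the Python standard library for the inverse of phi; the helper names are illustrative assumptions. Because the script uses the unrounded value of phi inverse of 0.9 (about 1.2816 rather than 1.28), its results differ from the lecture values in the third decimal place.

```python
from math import exp, log
from statistics import NormalDist


def lognormal_t_median(t, p, s):
    """Solve H(t) = Phi((ln t - ln t_median) / s) = p for t_median."""
    z = NormalDist().inv_cdf(p)   # phi inverse of p
    return exp(log(t) - z * s)


def lognormal_mttr(t_median, s):
    """MTTR = t_median * exp(s^2 / 2) for lognormal repair times."""
    return t_median * exp(s ** 2 / 2)


s = 0.45
t_med = lognormal_t_median(3.0, 0.90, s)   # 90 percent of repairs within 3 hours
mttr = lognormal_mttr(t_med, s)

print(f"t_median = {t_med:.3f} h")   # lecture value with z = 1.28: 1.686 h
print(f"MTTR     = {mttr:.3f} h")    # lecture value with z = 1.28: 1.866 h
```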

(Refer Slide Time: 07:57)

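$$\mathrm{MTTR}_S = \frac{\sum_{i=1}^{m} \dfrac{\mathrm{MTTR}_i}{\mathrm{MTBF}_i}}{\sum_{i=1}^{m} \dfrac{1}{\mathrm{MTBF}_i}} = \frac{\sum_{i=1}^{m} \lambda_i \cdot \mathrm{MTTR}_i}{\sum_{i=1}^{m} \lambda_i}$$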

Now let us say we have a system. In our system we have m components. Every component will
have its own repair time; all components are not going to be repaired in the same time. So, let
us assume that all the components have exponential failure and repair times.

That means the time to failure follows the exponential distribution and the time to repair
also follows the exponential distribution. So, we can say that the i'th component has failure
rate lambda i and repair rate mu i, or we can say its mean time to failure is MTTF i and its
mean time to repair is MTTR i. Now, I want to know the system mean time to repair. What
happens here?

When we want to find out the system repair time, that is the time spent in repairing the
system. But what happens? The system can fail due to the failure of any of these components,
and it will fail more often due to those components which are failing faster. That means the
components which have higher lambda will fail more.

So, the repair time of a component which has a high failure rate will have a higher
contribution to the system repair time, and the repair time of components which fail less
often will have a lesser contribution to the repair time of the system, because the system
rarely fails in those failure modes.

So, whichever failure happens, the corresponding repair time will be taken in the system
repair. So, the system repair time is actually a weighted average of the component repair
times, and the weight is decided by the failure rate. A component which has a higher failure
rate will have a higher contribution to the system repair time.

And the components which have a lesser failure rate will have a lesser contribution, because
over the total duration of the system life the number of failures depends on the failure rate.
For a component which has a high failure rate, the number of failures will be high.

So, if we have the time t, then the number of times component i fails in the total time t is
lambda i t. And every time it fails, you have to spend the time MTTR i. And in time t, how
many failures are you having in total?

The total number of failures is the summation of lambda i t, with i going from 1 to m, both
here and here. If you see here, the numerator is the total time spent in repair, because
lambda i t gives the number of failures due to each component. So, for component i the repair
time spent is lambda i t into MTTR i, and for all components it will become the summation of
this time.

So, in the total time t, this is the time which you are spending in repair. But we want the
mean time to repair, that means how much time it is taking per failure. So, how many failures
are happening? For one component it is lambda i t, so for m components it is the summation of
lambda i t. Now, this t and t get canceled. So, this will become the summation of lambda i
into MTTR i divided by the summation of lambda i.

So, this becomes our system repair time. As we see here, you can control the system repair
time by assigning a lower MTTR to the components with high lambda, and you may allow a higher
MTTR for components with lower lambda. If lambda is low, even a high MTTR will have a small
contribution. But if lambda is high, then you have to keep the MTTR smaller. By keeping that
MTTR smaller, you are ensuring that the system MTTR is smaller.

(Refer Slide Time: 12:35)

$$\mathrm{MTTR}_S = \frac{0.006167}{0.00182} = 3.388 \text{ hr}$$

Let us take this as an example. Consider a radio which has three parts: power supply,
amplifier and tuner. The power supply has failure rate 0.00045, the amplifier has failure rate
0.00130 and the tuner has 0.00007. As you see, the amplifier has the highest failure rate, the
tuner has the lowest failure rate, and the power supply is somewhere in between.

Now, the time to repair for the power supply is 2.3, all in hours, the amplifier is 3.7 and
the tuner is 4.6. So, how can we calculate the MTTR of the system? First we multiply the
failure rate with the MTTR for each part: for the power supply, for the amplifier and for the
tuner.

So, this becomes lambda i MTTR i. If you take the summation of this and divide by the
summation of lambda i, we get the MTTR of the system. The summation of lambda i MTTR i is
0.006167 and the summation of lambda i is 0.00182, so 0.006167 divided by 0.00182 gives me
3.388 hours. So, this is a very simple way. Many times we are not sure and we want to know the
system MTTR.

So, at the system level, this is a very simple formula we can use for calculating the mean
time to repair, if you know the mean time to repair for each individual part. But along with
the mean time to repair, we should also know the failure rate of each part; only then can we
calculate the system MTTR. Because repair only comes into the picture when a failure happens,
it is dependent on the failure rate.
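
The failure-rate-weighted average for the radio example can be sketched in a few lines of Python. The function name `system_mttr` and the dictionary layout are illustrative assumptions; the failure rates and repair times are the ones given in the example.

```python
def system_mttr(components):
    """Failure-rate-weighted mean time to repair.

    components: iterable of (failure_rate, mttr) pairs, one per component.
    Returns sum(lambda_i * MTTR_i) / sum(lambda_i).
    """
    components = list(components)
    total_rate = sum(lam for lam, _ in components)
    weighted_repair = sum(lam * mttr for lam, mttr in components)
    return weighted_repair / total_rate


# Radio example: (failure rate, MTTR in hours) for each part.
radio = {
    "power supply": (0.00045, 2.3),
    "amplifier": (0.00130, 3.7),
    "tuner": (0.00007, 4.6),
}

mttr_s = system_mttr(radio.values())
print(f"System MTTR = {mttr_s:.3f} h")  # about 3.388 h
```

Although the tuner has the longest repair time, the result stays close to the amplifier's 3.7 hours, because the amplifier dominates the failure rate.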

(Refer Slide Time: 14:37)

So, earlier we discussed maintainability and we discussed availability. If we start
discussing availability now, then as we discussed earlier, availability is considered to be
like this. I will draw this diagram again. Generally, what will happen?

A system is up initially. It will work for a certain time, then it will fail, then it will be
repaired, then it will be up again, and then it will fail and be repaired again. Both
durations are random variables: the up times will not always be the same, and the down times
will also not be the same. So, this is up and this is down; this is our operation, and this is
our failure and repair.

So, here we have time to failure 1, this is time to failure 2, this is time to failure 3, or
we can say time between failures also. Generally, we use time between failures, so I will use
TBF here. And this is TTR 1, time to repair 1; this is TTR 2, and so on. If you take the
average of the TBF, we get the MTBF; if you take the average of the time to repair, we get
the MTTR. All these concepts we have already discussed and we have already seen how to
calculate them.

Now, to calculate availability, I will first revise the different types of availability which
we discussed earlier, and then we will derive the exponential availability model. We see here
that the system or component, whatever we are considering, can be in two states.

State 1 is the working state or operation, and state 2 is the failed state, under repair.
When it fails, it goes to repair. So, when the system is working, the failure rate is lambda,
so the rate of departure from the working state is lambda. Once it reaches state 2, it is in
failed condition, it is under repair, and the repair rate is r.

So, the rate at which it comes back to the working condition is r. This makes our Markov
diagram. Now, this Markov diagram we can solve. How can we solve it? As we discussed earlier,
for a Markov diagram, d P 1 over d t is the negative of the outgoing and the positive of the
incoming. The outgoing is lambda from state 1, so that is minus lambda P 1 t. The incoming is
positive, that is r into P 2 t.

Similarly, as we discussed earlier, whenever we are solving these equations, we do not use
all the state equations. For one state, we replace its equation with the equation which says
that the total probability over all states is one. So, rather than developing the equation
for P 2, we use this, because P 2 follows directly once P 1 is known.

So, one equation which we always need to use is P 1 t plus P 2 t is equal to 1. So, we have
two equations here. Now, if we solve these two equations, we will get the result. How can we
solve them? Here we have P 2, and this P 2 we can replace: d P 1 t over d t is equal to minus
lambda P 1 t plus r into 1 minus P 1 t.

So, we have minus lambda P 1 t, and from the second term minus r into P 1 t plus r. If we
take the P 1 terms to the left hand side, this will become d P 1 t over d t plus lambda plus r
into P 1 t is equal to r. We have solved similar equations earlier also. How did we solve
them? We took e to the power lambda plus r into t as the multiplying factor.

So, this becomes e to the power lambda plus r into t, into P 1 t, is equal to the integration
of e to the power lambda plus r into t into r d t, plus C. When we integrate this, r is
constant, so the right hand side becomes r upon lambda plus r into e to the power lambda plus
r into t, plus C.

And the left hand side is e to the power lambda plus r into t, into P 1 t. So, if I divide
both sides by this exponential, P 1 t will be equal to r upon lambda plus r, plus C into e to
the power minus lambda plus r into t. Now, as we know, at time t equal to 0 the system is in
state 1.

So, if I put t equal to 0, P 1 will be equal to 1. So, 1 is equal to r upon lambda plus r
plus C, because at t equal to 0 the exponent is 0 and the exponential is 1. So, C will be
equal to 1 minus r upon lambda plus r. Now I will put this back in the equation.

So, 1 minus r upon lambda plus r is lambda plus r minus r, upon lambda plus r, so C is equal
to lambda upon lambda plus r. So, P 1 t is equal to r upon lambda plus r, plus lambda upon
lambda plus r multiplied by e to the power minus lambda plus r into t.

So, if you see, this is the same equation: P 1 t is r upon lambda plus r, plus lambda upon
lambda plus r into e to the power minus lambda plus r into t. So, by following a few steps,
we are able to derive P 1 t. Now what is availability here? Availability here is the
probability that the system is in the working state.

That means the probability that the system is in state 1. So, P 1 t is nothing but the
availability here. And what is the unavailability? Unavailability is the probability that the
system is in state 2. And what is P 2? P 2 is 1 minus P 1 t. So, if I subtract this value from
1, I will get the unavailability, that is the probability that the system is in state 2. So,
as we have seen here, this equation is my A t equation.
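
For reference, the steps of the derivation can be collected in symbols. This is a restatement of the spoken derivation, with P_1 the probability of the working state, P_2 the probability of the failed state, lambda the failure rate and r the repair rate.

```latex
\begin{aligned}
\frac{dP_1(t)}{dt} &= -\lambda P_1(t) + r P_2(t), \qquad P_1(t) + P_2(t) = 1 \\
\frac{dP_1(t)}{dt} + (\lambda + r) P_1(t) &= r \\
e^{(\lambda + r)t} P_1(t) &= \int r\, e^{(\lambda + r)t}\, dt + C
  = \frac{r}{\lambda + r}\, e^{(\lambda + r)t} + C \\
P_1(t) &= \frac{r}{\lambda + r} + C\, e^{-(\lambda + r)t},
  \qquad P_1(0) = 1 \;\Rightarrow\; C = \frac{\lambda}{\lambda + r} \\
A(t) = P_1(t) &= \frac{r}{\lambda + r} + \frac{\lambda}{\lambda + r}\, e^{-(\lambda + r)t} \\
1 - A(t) = P_2(t) &= \frac{\lambda}{\lambda + r}\left(1 - e^{-(\lambda + r)t}\right)
\end{aligned}
```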

(Refer Slide Time: 23:11)

Let us say I plot it here. For an example, I take MTBF as 200 hours and MTTR as 10 hours. So,
that means lambda is equal to 1 upon 200 and r is equal to 1 by 10, that is, lambda is 0.005
and r is 0.1. So, I will apply this formula: r upon lambda plus r, that is 0.1 divided by 0.1
plus 0.005, plus 0.005 divided by 0.1 plus 0.005 into the exponential of minus 0.1 plus 0.005
into t. So, this will become 0.952 plus 0.048 e to the power minus 0.105 into t.

Now, let us see how this equation will look. Here we have the constant value 0.952. So, if
you draw a line at 0.952, this is a constant line, and the A t value will never be below it,
because the exponential term can take some value between 0 and 1; it cannot be negative.

Since this term cannot be negative, my A t value is always going to be 0.952 or higher. That
means my minimum availability is 95.2 percent. And how will this vary? At t equal to zero the
exponential is 1, so we add 0.048: 0.952 plus 0.048 is 1. So, initially, at t equal to zero,
it starts from 1.

Then it will be exponentially decaying, and that is defined by e to the power minus 0.105 t.
This decay will continue and the curve will asymptotically approach 0.952. So, when time t is
very large, or t is tending to infinity, what will happen? My value for the availability will
be 0.952. This availability we call the steady state availability or inherent availability.
And what is the formula for this steady state availability?
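
The shape of this curve can be tabulated with a minimal Python sketch. The function names are illustrative assumptions; the inputs are the MTBF of 200 hours and MTTR of 10 hours from the example, and the model is the two-state exponential one derived above.

```python
from math import exp


def point_availability(t, mtbf, mttr):
    """A(t) = r/(lam + r) + lam/(lam + r) * exp(-(lam + r) * t)."""
    lam = 1.0 / mtbf   # failure rate
    r = 1.0 / mttr     # repair rate
    return r / (lam + r) + lam / (lam + r) * exp(-(lam + r) * t)


def steady_state_availability(mtbf, mttr):
    """Limit of A(t) for large t: MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


MTBF, MTTR = 200.0, 10.0
for t in (0, 5, 10, 24, 50, 100, 1000):
    print(f"A({t:>4}) = {point_availability(t, MTBF, MTTR):.4f}")
print(f"steady state = {steady_state_availability(MTBF, MTTR):.4f}")
```

The table starts at 1 for t equal to 0 and decays toward the steady state value of about 0.952.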

That is this: r upon lambda plus r. And what is r? r is 1 upon MTTR. And what is lambda? 1
upon MTTF, or 1 upon MTBF, we can say. So, this is equal to 1 upon MTTR divided by 1 upon
MTBF plus 1 upon MTTR, and the denominator is MTTR plus MTBF divided by MTTR into MTBF.

If we solve this, it will be equal to 1 upon MTTR, into MTTR into MTBF, divided by MTTR plus
MTBF. This will be equal to MTBF divided by MTTR plus MTBF. As you see, this is the popular
formula. Whenever you say availability, what is availability? Availability is MTBF divided by
MTTR plus MTBF. This is how it comes. This is the steady state availability. Because what
happens?

When t is infinity, the exponential term will be zero, and the only value remaining will be
0.952. So, the steady state availability is actually coming from how much time we are taking
for repair. If our repair time is small, if MTTR is small, what will happen? This value will
tend to become 1.

So, if we can repair fast, then we can make the system available for use for a longer period,
because even though failures happen, we are able to repair fast. So, it is depending on how
much time passes between failures and, corresponding to that, how much time is taken in the
repair.

So, the ratio of repair time to failure time will be deciding. Let us try it another way: if
I divide the numerator and the denominator by MTBF, this will become 1 upon MTTR divided by
MTBF, plus 1. So, you can work on this ratio: if you make it almost zero, the availability
will be 1. If you make this ratio as low as possible, let us say 0.001, the availability will
be very near to 1.
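
To see how strongly the ratio of MTTR to MTBF drives the steady state availability, a few values can be tabulated. This is a small illustrative sketch; the function name and the list of ratios are assumptions chosen for illustration, with 0.05 corresponding to the earlier example of MTTR 10 hours and MTBF 200 hours.

```python
def availability_from_ratio(ratio):
    """Steady state availability A = 1 / (1 + MTTR/MTBF)."""
    return 1.0 / (1.0 + ratio)


# ratio = MTTR / MTBF
ratios = (0.5, 0.1, 0.05, 0.01, 0.001)
table = {ratio: availability_from_ratio(ratio) for ratio in ratios}
for ratio, availability in table.items():
    print(f"MTTR/MTBF = {ratio:<6} ->  A = {availability:.4f}")
```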

That means by reducing the maintenance time or repair time, you are able to achieve a higher
availability. That is the steady state availability. In this model it is also the minimum
availability you will be getting from the system once failures and repairs are taking place.

But the case here is that the TTR and TBF are following the exponential distribution.
Still, this is a commonly used formula and it is widely used for the purpose. And the
availability A t which we have calculated here is called the point availability. Why is it
called point availability?

Because it gives the availability at any moment of time at which I want to know it. So, this
is a function of time. It is taking the newness into the picture: when the system is new,
availability will be higher; when the system is old, availability will be lower, even though
we are doing the repair.

Similarly, let us say I want to know the interval availability. My objective is to use the
system for 20 hours of operation, or let us say 24 hours of operation. How much is my average
availability, or interval availability, over that period? So, what can I do?

I can integrate A t d t from 0 to 24 and divide by 24. So, that will be 1 by 24 into the
integration of A t d t. A t is 0.952 plus 0.048 e to the power minus 0.105 t. If I integrate
this with d t, it will become 0.952 t minus 0.048 divided by 0.105 into e to the power minus
0.105 t, and this I evaluate from 0 to 24. When I put t equal to 24, the first term is 24 into
0.952, which after dividing by 24 becomes 0.952, and the second term is minus 0.048 divided by
0.105 into e to the power minus 0.105 into 24.

Then I will put t equal to 0 and subtract. When I put t equal to 0, the first term will be 0,
and the second term, minus of minus, will become plus. At t equal to 0, e to the power minus
0 will become 1, so this will be plus 0.048 divided by 0.105. And the division by 24 applies
to the second and third terms also. So, if we solve this, we will be able to know the interval
availability.
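
The integration above can be checked with a minimal Python sketch that evaluates the closed form for any window from t1 to t2 and compares it with a simple numerical integration. The function names and the midpoint-rule check are illustrative assumptions; note that the integrated exponential term carries the factor 1 upon lambda plus r (here 1 upon 0.105).

```python
from math import exp


def point_availability(t, lam, r):
    """A(t) for the two-state exponential model, system up at t = 0."""
    return r / (lam + r) + lam / (lam + r) * exp(-(lam + r) * t)


def interval_availability(t1, t2, lam, r):
    """Average of A(t) over [t1, t2], from the closed-form integral."""
    k = lam + r
    return r / k + (lam / k ** 2) * (exp(-k * t1) - exp(-k * t2)) / (t2 - t1)


def interval_availability_numeric(t1, t2, lam, r, steps=10_000):
    """Same quantity by midpoint-rule integration, as a cross-check."""
    h = (t2 - t1) / steps
    total = sum(point_availability(t1 + (i + 0.5) * h, lam, r) for i in range(steps))
    return total * h / (t2 - t1)


lam, r = 1.0 / 200.0, 1.0 / 10.0   # MTBF = 200 h, MTTR = 10 h
a_0_24 = interval_availability(0.0, 24.0, lam, r)
a_0_24_check = interval_availability_numeric(0.0, 24.0, lam, r)
print(f"interval availability over 0 to 24 h: {a_0_24:.4f} (numeric {a_0_24_check:.4f})")
```

For the first 24 hours this gives about 0.97, between the steady state value of 0.952 and 1. The same function can be called with other windows, for example 12 to 24 hours.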

That means, what is the proportion of the time my system is going to be in working condition
in 24 hours of operation? If you wish, let us try to calculate this. I am discarding this,
because what we got here is obvious. So, maybe we can use Excel here.

(Refer Slide Time: 32:17)

I will use the same file just for showing. As I calculated here, when I integrated this and
divided by 24, the first term came as 0.952 as it is. The second term, when I put t equal to
24, became 0.048 divided by 0.105 into the exponential of minus 0.105 into 24, and this also
we have to divide by 24.

So, as we saw, 0.952 was coming for the first term, and for the second term we were getting
this when we put t equal to 24. When we put t equal to 0, the first term was becoming 0 and
for the second term minus minus became plus. The term at t equal to 24 was the negative term.

So, I will put a negative sign there. And the positive term comes out to be 0.048 divided by
0.105 and divided by 24, because 24 is the number of hours. So, how much is my availability
here? Availability here is the sum of these three terms, and that turns out to be about
0.9695. So, I can say that for 24 hours my availability is about 96.95 percent, which is
definitely larger than my steady state availability of 0.952, but it is less than one.

So, whatever the period is, we can do the same. If I am interested in a 12 hours period, then
I will change this 24 into 12, and this will be my availability for the first 12 hours. If you
see, my availability has become more, because in the initial 12 hours my system is supposed to
work more, as the chances of failure are smaller.

If I want to know the availability for, let us say, 12 to 24 hours, I use the same formula.
What will happen? I am using the pen here, but the pen is not working. So, as we discussed
earlier, first we put t equal to 24, and then we subtract the value at t equal to 12. So,
the constant value will come as it is.

The other values will change a little. Initially we divided by 24; now we are going to divide
by 12, because the division factor is 24 minus 12, that is, 12. The second term is also going
to be divided by 12. But what will happen here? Here, rather than t equal to 0, for which
the exponential is 1, I have to take t equal to 12 in the exponential, and then divide by 12.

So, if you see, for 12 to 24 hours of operation the availability comes out to be 0.95, while
I think it was a little larger when I considered 0 to 12 hours. So, the same 12-hour duration,
once taken as the initial 12 hours and once as the 12 hours after the first 12 hours, gives
a slightly different availability.

But if I take the time as, let us say, 100 hours to 112 hours, and another time as 110 hours
to 122 hours, that shift will not make much difference, because the newness of the system is
gone. So, the availability will appear to be similar.

So, when time becomes larger, the change in availability will not be that much, and the
availability will appear to be very similar to 0.952. Only in the initial stages of system
operation may it show a little change in the availability. So, we will stop our discussion
here today. We will continue our discussion about availability in our next classes.
Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 38
Maintainability and Availability (Continued)

Hello everyone. So, we will continue discussing availability today. We already started the
discussion and covered a few things about it. Our focus will be mostly on steady state
availability evaluation. In the previous class, as we discussed, we found the component
availability.

(Refer Slide Time: 00:53)

Now, if you remember, we found the component availability A(t). So, we get this availability
for one component. Now, assume that all components fail independently and are repaired
independently: one repair does not influence another repair, and one failure does not
influence other failures. In that case, we can evaluate system availability without doing
complex modeling.

We can evaluate system availability by using the same formulas which we used for reliability.
The only difference is that in system reliability evaluation we used component reliabilities,
whereas in system availability calculation we use component availabilities. That is the only
difference, because the probability concept is the same.

If the components are available, then the system will be available. So, for all the models
which we discussed earlier, series, parallel, k out of M, we can calculate the availability
the same way we calculated reliability, using the same RBD.

So, let us take a series system. Let us say we have three components here. Earlier we were
using R1, R2, R3; now we can use A1, A2, A3. So, the system availability A_S(t) is equal to
A1 into A2 into A3, which is the same as what we discussed for reliability evaluation of the
system. So, system availability is nothing but the product of the component availabilities.

But remember, this formula is only applicable when each component gets repaired immediately
whenever it fails, the repair of one component does not affect the repair of another
component, one failure does not affect another failure, and the failure of one component
does not affect the repair of another component.

So, there are many assumptions which may not always be justifiable. Still, when we are
dealing with small downtimes, this can give you a good analysis and a good result. But when
your downtimes are comparable to the operating time, the availability calculation should be
done properly by using the Markov diagram, which we will discuss next.

However, as I discussed, for most calculations in general, especially in the project phase or
the design phase of the system, you may be able to use these formulas. But a correct
evaluation will help you get the availability values in a proper way.

For a parallel system, again, let us say we have a two-component parallel system with
availabilities A1 and A2. Then the system availability is 1 minus (1 minus A1) into
(1 minus A2), or we can say A1 plus A2 minus A1 A2.

Here, we are writing availability as a function of time, but this may be the steady state
availability also; we can use this formula the same way. So, system availability is nothing
but 1 minus the product of (1 minus A_i(t)) over all the components. So, we are able to use
the same formulas.

(Refer Slide Time: 05:04)

For Series configuration: $A_S(t) = \prod_{i} A_i(t)$

For Parallel configuration: $A_S(t) = 1 - \prod_{i} \left(1 - A_i(t)\right)$

Now, let us take an example. Let us say there are two components, each having a constant
failure rate of 0.10 failures per hour. So, lambda 1 is equal to lambda 2, that is, lambda is
0.1 failures per hour. Now, let us say the repair rate, mu, is equal to 0.2 repairs per hour.

So, generally, the repair rate is supposed to be higher than the failure rate, or we can say
the MTTF is supposed to be higher than the MTTR: the repair time is supposed to be less than
the operating time.

Now, we want to compute the point availability and the interval availability for a 10-hour
mission. So, first let us find the availability of a component. As we know, it is given as
$A(t) = \frac{\mu}{\lambda+\mu} + \frac{\lambda}{\lambda+\mu} e^{-(\lambda+\mu)t}$.

Now, mu is 0.2 and lambda is 0.1, so this will be equal to
$A(t) = \frac{0.2}{0.2+0.1} + \frac{0.1}{0.3} e^{-0.3t}$, where I am currently writing only t.
From here I can calculate the 10-hour mission availability. That means I want to know the
availability at t equal to 10. So, at time 10, how much will the availability be?

Availability is 0.683. So, this is the point availability. We can also calculate the interval
availability for the 10-hour mission, that is, from 0 to 10. So, if we integrate this from
0 to 10, how much will it be? We did a similar example in the last class, where we integrated
this.
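
As a quick check of the point availability quoted above, here is a minimal Python sketch. The function name `point_availability` is chosen only for this illustration; the formula and the values lambda = 0.1 and mu = 0.2 per hour are the ones from the example.

```python
import math

def point_availability(t, lam, mu):
    """A(t) = mu/(lam+mu) + lam/(lam+mu) * exp(-(lam+mu) * t)."""
    total = lam + mu
    return mu / total + (lam / total) * math.exp(-total * t)

lam = 0.1  # failures per hour (from the example)
mu = 0.2   # repairs per hour (from the example)

a_10 = point_availability(10.0, lam, mu)
print(round(a_10, 3))  # 0.683
```

At t equal to 0 the same function returns 1, which matches the assumption that the component starts in the working state.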

So, when we integrate this, it becomes $0.667\,t - \frac{0.333}{0.3} e^{-0.3t}$, where t has
to be taken from 0 to 10, and the whole has to be divided by 10 minus 0, that is, multiplied
by 1 by 10. Once I put t equal to 10, I get
$0.667 \times 10 - \frac{0.333}{0.3} e^{-0.3 \times 10}$.

When I put t equal to 0, the first term will be 0, minus minus will become plus, and the
exponential will become 1. So, that value will be 0.333 divided by 0.3, and the whole is
divided by 10. So, this will turn out to be $0.667 - 0.111\,e^{-3} + 0.111$, because 0.333
divided by 0.3 and then by 10 is 0.111, and 0.3 multiplied by 10 gives 3.

So, as we can see, taking 0.111 common, this becomes $0.667 - 0.111\,(e^{-3} - 1)$, and this
value turns out to be 0.772. Now, A_i is the steady state availability. Steady state
availability is not a function of time, because it is the long-term availability. So, it
will remain the same and will not change with time.

So, this value is nothing but the first entry here, the constant value which is independent
of t, and that value is 0.667. So, 0.667 becomes the steady state availability. That means if
I put t equal to infinity, then the exponential term will be 0, because e to the power minus
infinity is 0.
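
The integration and the limit described above can be sketched in Python as follows. This is only an illustration: the function name `interval_availability` is an assumption of this sketch, and the closed form is simply the integral of the component availability A(t) divided by the interval length.

```python
import math

def interval_availability(t1, t2, lam, mu):
    """Average of A(t) over [t1, t2] for constant failure rate lam and repair rate mu."""
    total = lam + mu
    steady = mu / total
    # Integral of (lam/total) * exp(-total * t) from t1 to t2.
    transient = (lam / total**2) * (math.exp(-total * t1) - math.exp(-total * t2))
    return steady + transient / (t2 - t1)

lam, mu = 0.1, 0.2  # per hour, from the example

a_interval = interval_availability(0.0, 10.0, lam, mu)
a_steady = mu / (lam + mu)  # limit of A(t) as t tends to infinity
print(round(a_interval, 3), round(a_steady, 3))  # 0.772 0.667
```

Because the transient part is divided by the interval length, the interval availability over a very long window approaches the steady state value.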

So, the only term remaining is 0.2 by 0.3, that is, 0.667. So, my steady state availability
is 0.667, my interval availability is 0.772 and my point availability is 0.683. Now, I will
remove this. I want to calculate the same for the series configuration, with two components
in series.

So, I have component 1 and component 2 in series, and I can do the same thing for all three
availabilities. The point availability for this series configuration, A_S(10), is equal to
A1(10) into A2(10). Both are the same, so that will be 0.683 square. The interval
availability, A_S(0 to 10), is equal to 0.772 square.

Similarly, the steady state availability of the system is A_i square, that is, 0.667 square.
These come out to be 0.444 for the steady state, 0.596 for the interval and 0.467 for the
point availability. Similarly, we can calculate the same for the parallel configuration, the
same way as we calculated for series. That means the point availability A_S(10) is equal to
1 minus (1 minus 0.683) square.

This comes out to be 0.9. Similarly, for the interval, A_S(0 to 10) is equal to 1 minus
(1 minus 0.772) whole square. This turns out to be 0.948. Similarly, the steady state
availability is equal to 1 minus (1 minus 0.667) whole square. This is equal to 0.889.
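
The series and parallel numbers above can be reproduced with a short Python sketch. The helper names are chosen for this illustration only, and the inputs are the rounded component values 0.683, 0.772 and 0.667 from the example, so the last digit can differ slightly from a calculation with unrounded values.

```python
import math

def series_availability(avails):
    """Product of the component availabilities (independent components)."""
    return math.prod(avails)

def parallel_availability(avails):
    """1 minus the product of the component unavailabilities."""
    return 1.0 - math.prod(1.0 - a for a in avails)

component = {
    "point, t = 10 h": 0.683,
    "interval, 0 to 10 h": 0.772,
    "steady state": 0.667,
}

results = {}
for label, a in component.items():
    s = series_availability([a, a])
    p = parallel_availability([a, a])
    results[label] = (s, p)
    print(f"{label}: series {s:.3f}, parallel {p:.3f}")
```

Both helpers rely on the independence assumption stated in the lecture: failures and repairs of one component do not influence the other.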

So, if we know the individual component availabilities, we are able to calculate the
availability for any configuration: series, parallel, k out of M, or any other configuration
which we have. We can use the same formulas which we developed from the RBD for reliability
evaluation.
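
For the k out of M case mentioned above, a minimal sketch is given below. It assumes identical and independent components, each with availability `a`, and uses the usual binomial sum from the reliability formulas; the function name and the 2-out-of-3 example are hypothetical and are not taken from the lecture.

```python
from math import comb

def k_out_of_n_availability(k, n, a):
    """Probability that at least k of n identical, independent components are available."""
    return sum(comb(n, j) * a**j * (1.0 - a) ** (n - j) for j in range(k, n + 1))

a = 0.667  # steady state component availability from the example

print(k_out_of_n_availability(2, 2, a))  # same as the two-component series system
print(k_out_of_n_availability(1, 2, a))  # same as the two-component parallel system
print(k_out_of_n_availability(2, 3, a))  # hypothetical 2-out-of-3 configuration
```

Setting k equal to n gives the series formula and setting k equal to 1 gives the parallel formula, which is a simple way to sanity-check the function.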

The only problem here is the assumption that the repair and failure distributions of a
component are unaffected by the failure and repair of another component. That means they are
all independent. When a component fails, it will be independently repaired.

(Refer Slide Time: 13:42)

Now, let us see. We have discussed this problem earlier, when we wanted to know the
reliability of a standby system with repair. So, this is the system we considered. In state
1, the main unit is working and the standby unit is in standby. In state 2, the main unit has
failed and the standby unit is working. And in state 3, both units have failed: the main unit
has failed and the standby unit has also failed.

Here, the main unit has failed. This we have already evaluated and discussed in detail, and
we have done one example also. Now, why have I mentioned this again here? To help you
understand the difference between the availability and reliability concepts. In reliability
evaluation, the failure states are the system failure states, not the component failure
states.

In state number 2, one component has failed but one component is working. So, the system has
not failed completely; our function is still continuing. Therefore, the reliability objective
has not failed here. But in state number 3, the primary unit as well as the standby unit have
both failed.

Since both have failed, our mission has failed. We no longer have continuous use of the
system. So, for reliability, we have a failure here. Whenever we are evaluating reliability,
the failure states will not have any outgoing link; these are called absorbing states or
terminating states. Why are these absorbing states?

Because the moment the system enters these states, it cannot come out. Why can it not come
out? Because the system has failed, and there is no future after that. The moment it reaches
system failure, the mission has failed and we are no longer concerned about the system.

However, in the case of a repairable system, this state is not treated as an absorbing state.
It is a failure state, but repair is considered from the failure state also. Even though the
system has failed, after the system failure we do the repair and make the system up again.
So, in this case, we have the reverse repair path from this state also.

So, in availability calculations, generally, you will not have any absorbing state. In an
absorbing state, you have only incoming links; there is no outgoing link. What does it mean?
Once you fall into this state, there is no way out. It is an absorbing or terminating state.

But in the case of a repairable system, no state is an absorbing state, because from every
failure you can do the repair and once again start operating the system. So, that is where
the difference between reliability and availability comes in. If I include this link here
with the repair rate r, then the probability of state 1 plus the probability of state 2
gives me the availability.

But if this link is removed, then the probability of state 1 plus the probability of state 2
gives me the reliability. What is the difference between the two? The difference is the
existence of an absorbing state. This concept of the absorbing state should be understood;
that is why I have included this slide from the previous discussion, where we calculated the
reliability.

In the case of reliability calculation, the system failure states were absorbing states: once
the system falls into a failure state, it is not considered to be repaired back to an
operating state. But in the case of availability, you are repairing the system even if it
has failed. So, that is the difference in definition and the difference in calculation. Just
to explain this concept, I have taken it here.
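
To make the role of the absorbing state concrete, the following sketch integrates the three-state equations numerically, once with the repair link out of state 3 and once without it. Everything here is illustrative: the rate values (0.002, 0.001 and 0.01 per hour) are the ones the course uses later for this standby example, while the Runge-Kutta integrator, the step size and the function names are assumptions of this sketch and not the lecture's method.

```python
def derivatives(p, lam1, lam2, r, repair_from_failed):
    """Right-hand side of the state equations for the three-state standby system."""
    p1, p2, p3 = p
    back = r * p3 if repair_from_failed else 0.0  # link from state 3 to state 2
    dp1 = -lam1 * p1 + r * p2
    dp2 = lam1 * p1 - (lam2 + r) * p2 + back
    dp3 = lam2 * p2 - back
    return (dp1, dp2, dp3)

def up_probability(t_end, lam1, lam2, r, repair_from_failed, dt=1.0):
    """P1(t_end) + P2(t_end), starting from state 1, using classical RK4 steps."""
    p = (1.0, 0.0, 0.0)
    args = (lam1, lam2, r, repair_from_failed)
    for _ in range(int(round(t_end / dt))):
        k1 = derivatives(p, *args)
        k2 = derivatives(tuple(x + 0.5 * dt * k for x, k in zip(p, k1)), *args)
        k3 = derivatives(tuple(x + 0.5 * dt * k for x, k in zip(p, k2)), *args)
        k4 = derivatives(tuple(x + dt * k for x, k in zip(p, k3)), *args)
        p = tuple(
            x + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for x, a, b, c, d in zip(p, k1, k2, k3, k4)
        )
    return p[0] + p[1]

lam1, lam2, r = 0.002, 0.001, 0.01  # per hour

availability = up_probability(5000.0, lam1, lam2, r, repair_from_failed=True)
reliability = up_probability(5000.0, lam1, lam2, r, repair_from_failed=False)
print(availability, reliability)
```

With the repair link present, P1 plus P2 settles at a constant value, the availability. With state 3 absorbing, the same sum keeps decreasing with time, which is the reliability.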

(Refer Slide Time: 18:56)

Now, let us see. The figure which I showed earlier would look like this. In system state 1,
the primary unit is working and the second unit is in standby mode. Then what will happen?
You have the failure rate for the primary unit. So, the primary unit may fail, and the
standby unit will come into working condition here.

After that, the secondary unit may also fail, and then both the primary unit and the second
unit are failed. Here, we have the system failure. But when I want to evaluate availability,
I will consider a repair from here also.

So, repair is considered here also. That means from here the repair rate is r, by which the
system can go back into this state. From the third state also you have the repair
possibility, which can bring the system to the second state. So, this becomes our Markov
diagram. Generally, this is what is given, but in this case there is an inherent assumption.

Here, the two units are in failed condition: the primary unit has failed and the secondary
unit has also failed. But the repair rate is the same as here, where one unit is in failed
condition and one unit is working, so only one unit is under repair. What does it mean?

When one unit is under repair, the repair rate is r. When two units are waiting for repair,
the repair rate will still be r if there is only one repair crew. That means you have only
one repair channel: one set of equipment, or one person, or one group of persons, so that at
one time you can do only one repair.

If that is the case, then the repair rate would be the same. But if you had, let us say, two
repair channels, that means you can repair two units simultaneously, then this repair rate
would become 2r. Because you are able to repair two units simultaneously, your repair rate
has doubled.

So, that will be 2r. That means if you have unlimited repair resources, then the repair rate
would be equal to the number of failed units multiplied by the repair rate, provided all are
identical components with the same repair rate. If the two units are different, then you may
have to take the sum of the individual repair rates, r1 plus r2.
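
The rule described above, r with a single repair channel and the number of failed units times r when there are enough channels, can be written as a one-line helper. This is only a sketch for identical units; the function name is hypothetical and r = 0.01 per hour is used purely as an example value.

```python
def effective_repair_rate(n_failed, n_channels, r):
    """Total repair rate out of a state with n_failed identical failed units."""
    # Only as many units as there are repair channels can be worked on at once.
    return min(n_failed, n_channels) * r

r = 0.01  # repairs per hour, example value

print(effective_repair_rate(2, 1, r))  # single crew: r
print(effective_repair_rate(2, 2, r))  # two channels: 2r
print(effective_repair_rate(1, 2, r))  # only one unit failed: r
```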

But then that will be a little more complicated, and we have to draw this diagram in a
different way. So, whenever we have a certain situation, we have to prepare the diagram
accordingly. Take the example which I stated: in state number 1, the primary unit is working
and the secondary unit is in standby.

Now, from here, I can go to state number 2. In state number 2, the primary unit has failed
and the secondary unit is working. From here, if the repair is on, the possibility is that
while the secondary unit is working, the primary unit gets repaired and becomes working
again. So, if the primary unit is repaired, what will happen?

The primary unit will again take over and start working, and the secondary unit will again be
made the standby unit. Now, from here, we have the third state, that means there is a failure
of the second item, the standby unit. In that case, the primary unit has failed and the
second unit has also failed. From here the repair is carried out.

So, if we do repair here, let us say r1 is the repair of the primary unit and r2 is the
repair of the secondary unit. With r2, the secondary unit will be repaired but the primary
unit will remain failed. But there is also the possibility that we repair the primary unit
here, that is r1. Then what will happen?

We will have a state in which the primary unit is working and the second unit is failed. So,
this is another working state. From here again, you have two possibilities: the primary unit
can fail again, and you again have both failed, or the secondary unit is repaired, that is
r2, in which case it again becomes working state 1, in which the secondary unit is also
repaired and the first unit is already working.

So, that will become the first state, where the first unit is working and the second unit is
in standby. So, this state diagram we have to prepare as per the system condition. Whatever
our system condition is and however we are operating the system, the system state diagram
can be prepared accordingly.

And in each system state diagram, a transition changes the state of only one item at a time,
not of two items. So, this diagram is only possible when we have two different repair
channels, say for the primary unit and the secondary unit. But if we have only one channel,
then the earlier case with the single r will be the only case.

And if we say that both repair rates are the same, r1 and r2 both are the same, what will
happen? This will become 2r. So, here we are considering that one of the units will get
repaired and will start operating, and in the meantime, if the second unit gets repaired,
then it will become the standby unit and one unit will be in operating condition.

Again, this diagram looks very good; you will be able to make this kind of diagram if both
units are identical. If both units are the same, their failure rates are the same and their
repair rates are the same, then it becomes a simpler diagram. Then this will be lambda and
this will also be lambda.

And this will be 2r and this will be r; 2r only in the case when two repair channels are
there. If only a single repair channel is available, then at a time only one unit can be
repaired, so that will be r. I am explaining this for the diagram which we have taken with
only r.

Whatever diagram we make, once you are able to make the diagram and put the rates on it, the
diagram can be solved to calculate the steady state availability. As we discussed here, if
you want to calculate the availability A(t), that is, the system availability A_S(t), this
is the point availability of the system.

Generally, we can calculate the point availability of the system, but we have already seen
how difficult it is to calculate the reliability. Now, if I include this link also, the
equations will become more complex. As you see here, this equation will not change, but the
equation for dP2/dt will have one more term: plus r into P3(t).

So, when I am solving these equations, like earlier, another equation is P1(t) plus P2(t)
plus P3(t) equal to 1. Again, we generally do not write the last state equation, because it
is implied by the others. So, to solve this, I have to again do the Laplace transformation
and then the inverse Laplace transformation.

So, we can solve this, but it will require more effort compared to what was required to solve
the earlier equations. However, many times, for our decision making, we are more concerned
about the steady state values than the point values. Since we are more interested in steady
state values, we do not want to put in the kind of effort required for calculating point
availabilities.

So, we go for steady state availability. As we will see, the methods we will discuss allow
steady state system availability to be calculated even for complex systems. Whatever systems
and processes we are using, once we are able to develop the rate diagram, we can calculate
the steady state probabilities fairly easily compared to the point availabilities.

Here, we will not need to solve the differential equations. Because what happens here? The
moment we say steady state availability, that means time is long, time is tending to
infinity. When time tends to infinity, the system achieves steady state. That means the
change in probability of any state is 0. What does it mean?

The change in probability of any state is 0. So, the change in probability of state 1 is 0.
Let us say the system has been running for 10000 hours, and I take the probabilities P1, P2
and P3 at 10000 hours. Now, I calculate the probabilities P1, P2 and P3 at 10100 hours.

The P1, P2, P3 which I calculate at 10000 hours and the P1, P2, P3 which I calculate at
10100 hours will not differ much; the change is very minor, almost unobservable. So, when
time is large, the probabilities are not changing, as we have seen earlier in our diagram
that availability tends to become constant after a certain period of time.

Generally, after around five times the MTBF or so, you will see that this tends to become
constant. So, the change in probability with time becomes 0. That means the probability at
10000 hours is equal to the probability at 10100 hours or at 11000 hours. The probabilities
have become almost the same; they are not changing.

The state probabilities become the same because failure and repair have alternated many
times, and because of these iterations we have lost the effect of time. So, it does not
matter what time point we are talking about; the chances of failure and the chances of
recovery are the same.

So, the availability which we see at that time is almost the same, because the system newness
is lost and the system has iterated many times through failure and repair. When the failure
and repair process has iterated a few times, the system reaches the steady state, where the
chances that the system will be in repair or in operating state remain the same irrespective
of which time we are talking about, whether it is 10000 hours or 10100 hours.

Therefore, in these cases, we do not write the state probability as a function of time; we
write P_i(t) as P_i only, because it is no longer a function of time. So, the steady state
probability is written as independent of time; it is not expressed in terms of time. We will
discuss these diagrams in more detail in the next class. We will stop here today. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 39
Maintainability and Availability (Continued)

Hello everyone. So, in the previous class we were discussing availability, and steady state
availability in particular. Let us continue our discussion from where we stopped last time.

(Refer Slide Time: 00:41)

So, we were discussing this figure, and how to evaluate it. We discussed that in steady state
the change in probability is zero and the probabilities are not functions of time; they are
constant quantities only. Now, let us write the Markov equations for this.

As we discussed in the Markov class, for dP1(t)/dt, whatever is incoming is plus and whatever
is outgoing is minus. Outgoing is lambda 1 from state 1, so that is minus lambda 1 P1(t), and
incoming is r from state number 2, so plus r P2(t):

$\frac{dP_1(t)}{dt} = -\lambda_1 P_1(t) + r P_2(t)$

Similarly, I can write dP2(t)/dt. For state number 2, from state number 1 I have the incoming
link lambda 1, so plus lambda 1 P1(t). From state number 2 there are two outgoing links, one
is r and one is lambda 2, so minus (r plus lambda 2) into P2(t). And from state number 3, I
have the incoming link r, so plus r P3(t):

$\frac{dP_2(t)}{dt} = \lambda_1 P_1(t) - (\lambda_2 + r) P_2(t) + r P_3(t)$

The third equation, as we know, we simply write as P1(t) plus P2(t) plus P3(t) equal to 1.

Now, to solve this, as we discussed, the changes in probability are 0, so I can put both
derivatives equal to 0. I have also removed t. So, my equations become

$-\lambda_1 P_1 + r P_2 = 0$
$\lambda_1 P_1 - (\lambda_2 + r) P_2 + r P_3 = 0$
$P_1 + P_2 + P_3 = 1$

So, this becomes my set of equations. If I solve this set of equations, I will be able to get
the probabilities which I want.
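
The steady state equations above form a small linear system, so they can also be solved numerically. The sketch below is an illustration only: it uses a plain Gauss-Jordan elimination written for this example (the helper `solve_linear` is not from the lecture), and the rate values 0.002, 0.001 and 0.01 per hour are the ones the lecture uses later for this system.

```python
def solve_linear(a, b):
    """Solve a x = b for a small dense system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(n):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [x - factor * y for x, y in zip(m[row], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

lam1, lam2, r = 0.002, 0.001, 0.01  # per hour

coefficients = [
    [-lam1, r, 0.0],          # balance equation for state 1
    [lam1, -(lam2 + r), r],   # balance equation for state 2
    [1.0, 1.0, 1.0],          # probabilities sum to 1
]
p1, p2, p3 = solve_linear(coefficients, [0.0, 0.0, 1.0])
availability = p1 + p2
print(p1, p2, p3, availability)
```

The balance equation for state 3 is left out because it follows from the other two; the normalization equation takes its place.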

(Refer Slide Time: 03:02)

Now, to solve these equations, I have put them on the next slide; the same set of equations
is here. Let us say I call them equation number 1, 2 and 3. Let us take equation number 1
and express P2 in terms of P1. So, r P2 will be equal to lambda 1 P1, and I can write P2
equal to (lambda 1 upon r) P1.

Similarly, in equation number 2, I convert P2 in terms of P1: lambda 1 P1 minus (lambda 2
plus r) into (lambda 1 upon r) P1 plus r P3 equal to 0. Now, I take the P1 terms to the
right-hand side, so this becomes

$r P_3 = \left[ \frac{(\lambda_2 + r)\lambda_1}{r} - \lambda_1 \right] P_1 = \frac{\lambda_1 \lambda_2 + r \lambda_1 - r \lambda_1}{r} P_1 = \frac{\lambda_1 \lambda_2}{r} P_1$

The r lambda 1 terms cancel, so r P3 is lambda 1 lambda 2 divided by r into P1, and P3 is
(lambda 1 lambda 2 upon r square) P1. Now, we have got P2 and P3, all in terms of P1, and I
can put these into equation number 3.

So, this will become P1 plus (lambda 1 upon r) P1 plus (lambda 1 lambda 2 upon r square) P1
equal to 1. Taking P1 common, P1 into (1 plus lambda 1 upon r plus lambda 1 lambda 2 upon
r square) is equal to 1. So, P1 can be calculated as

$P_1 = \left( 1 + \frac{\lambda_1}{r} + \frac{\lambda_1 \lambda_2}{r^2} \right)^{-1}$

Once I get P1, then P2 will be (lambda 1 upon r) into P1, and my availability will be P1 plus
P2. This I have shown here by the calculation. So, generally, if you see here, the first term
is 1, for P1. And for the second state, what do we do?

We take departure divided by arrival, lambda 1 upon r. For the third state, this same
lambda 1 upon r gets multiplied by lambda 2 divided by r, so that becomes lambda 1 lambda 2
upon r square. This only works if the states form a simple chain. If there are only
transitions between neighbouring states like this, and there are no crossover transitions,
then this formula can be used.

So, P1 is nothing but 1 upon (1 plus lambda 1 upon r plus lambda 1 lambda 2 upon r square).
If I had a fourth state, this would have become (1 plus lambda 1 upon r plus lambda 1
lambda 2 upon r square plus lambda 1 lambda 2 lambda 3 upon r cube), whole to the power
minus 1.

And P2 would be (lambda 1 upon r) P1, P3 would be (lambda 1 lambda 2 upon r square) P1, and
P4 would be (lambda 1 lambda 2 lambda 3 upon r cube) P1. So, we can use this formula in
general, but with care: transitions should be between neighbouring states only, and only in
that case do we have this simplification. But it is better not to use this kind of formula
directly. First develop the set of equations and then solve them.
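
The pattern described above, where each state probability is the previous one multiplied by a failure rate over a repair rate, can be sketched for a chain of any length. This is a hedged illustration that applies only when transitions connect neighbouring states; the function name and the list-based interface are assumptions of the sketch, and the numbers are the rates the lecture uses for its three-state example.

```python
def chain_steady_state(failure_rates, repair_rates):
    """Steady state probabilities of a simple chain 1 <-> 2 <-> ... <-> n.

    failure_rates[k] is the rate from state k+1 to state k+2, and
    repair_rates[k] is the rate back from state k+2 to state k+1.
    """
    weights = [1.0]  # probability of each state relative to P1
    for lam, rep in zip(failure_rates, repair_rates):
        weights.append(weights[-1] * lam / rep)
    total = sum(weights)
    return [w / total for w in weights]

lam1, lam2, r = 0.002, 0.001, 0.01  # per hour

probabilities = chain_steady_state([lam1, lam2], [r, r])
availability = probabilities[0] + probabilities[1]
print(probabilities, availability)
```

Passing separate repair rates per link means the same helper also covers the case where the repair rate out of the last state differs, for example 2r with two repair channels.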

(Refer Slide Time: 07:54)

675
So, once I have developed this formula, I can now solve the equations. This is the case when I
am talking about a single repair person. In that case, for P1 I can use the same formula. My
failure rate lambda 1 is 0.002, so I will use 1 plus 0.002 divided by r, that is 0.01, plus
lambda 1 lambda 2, that means 0.002 into 0.001, divided by r square, that is 0.01 into 0.01,
and the whole inverse.

So, I take one upon that sum, which is 1.22, and the inverse comes out to be 0.8197. P2 will
be equal to this value multiplied by lambda 1 upon r, which gives 0.1639. And P3 will be equal
to this value multiplied by lambda 1 lambda 2 upon r square, but I do not need P3. For
calculation of availability, I need only P1 and P2. P3 I can also calculate as 1 minus (P1
plus P2), because P1 plus P2 plus P3 is equal to 1.

Availability is the sum of these, 0.8197 plus 0.1639, and I get 0.9836. So, these values we
can calculate. There are other methods, which I am going to discuss, that can be used for
calculating the same value, the steady-state availability. Before going there, let me also
show what happens if I have 2r here rather than r.

The change would have been that this becomes 2 r square. That would have been the only
change: lambda 1 lambda 2 upon 2 r square. Because, as we calculated, the repair term here
would have been 2 r P3, and because of that this r square becomes 2 r square. In that case,
if I want to calculate availability, I can calculate the same again; the only change would be
that this denominator gets multiplied by 2. So, what am I trying to do?

(Refer Slide Time: 10:14)

I will just do this here for your reference only. I can use a calculator also, but let us do
this simple calculation here. So, what do I have to take? The first term is 1. The second term
is 0.002 divided by 0.01. And the third term is 0.002 into 0.001, divided by 2 r square.

The third term is equal to 0.002 multiplied by 0.001, divided by 2 into 0.01 square. I could
also write it directly, because 0.01 square is fairly easy to calculate. So, the third term
comes to 0.01. Now, what will be the sum of these, and what will be the inverse? That will be
1 divided by this value, 1.21. So, as you see, it is 0.8264. The second probability will be
equal to this multiplied by the second term, and the third probability will be equal to this
multiplied by the third term.

So, this becomes my P1, this becomes P2 and this becomes P3. If you see, the summation has
to be equal to 1. So, my three state probabilities are here, and my availability will be the
sum of the first two. Or I can say that the only system state in which the system is not
available is the third state.

So, I can say 1 minus P3, or P1 plus P2; both will give me the same. As you see here, the
availability has become 0.9917, but when we were seeing this earlier, it was 0.9836. So, by
ensuring that two units can be repaired together, my availability has become higher, because
the repair has become faster.
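
The comparison above can be checked with a minimal Python sketch. The function name `chain_steady_state` is made up for this illustration, and the code assumes a chain with transitions between neighbouring states only, which is the condition stated in the lecture; the rates are the lecture's values.

```python
def chain_steady_state(failure_rates, repair_rates):
    """Steady-state probabilities of a chain 1 <-> 2 <-> 3 ... in which
    failure_rates[k] leads from state k+1 to state k+2 and repair_rates[k]
    leads back from state k+2 to state k+1 (neighbouring states only)."""
    terms = [1.0]  # relative weight of state 1
    for lam, mu in zip(failure_rates, repair_rates):
        terms.append(terms[-1] * lam / mu)
    total = sum(terms)
    return [t / total for t in terms]


lam1, lam2, r = 0.002, 0.001, 0.01  # values used in the lecture

# Single repair person: every repair transition has rate r.
single = chain_steady_state([lam1, lam2], [r, r])
# Two repair persons: the transition from state 3 back to state 2 has rate 2r.
double = chain_steady_state([lam1, lam2], [r, 2 * r])

avail_single = single[0] + single[1]  # states 1 and 2 are the available states
avail_double = double[0] + double[1]
print(f"single repair person: A = {avail_single:.4f}")
print(f"two repair persons:   A = {avail_double:.4f}")
```

Only the repair rate out of state 3 changes between the two runs, so the difference in availability comes entirely from the faster return from the failed state.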

(Refer Slide Time: 13:12)

For the same problem, the set of equations which we developed here can also be solved using
the matrix formula. For the matrix formula I do not have to write out these equations, though
I have written them, and from these equations I can read the coefficients. In the first
equation P1 is multiplied by minus lambda 1, P2 by r, and P3 is not there, so that will be 0.
In the second equation P1 is multiplied by lambda 1, and P2 is multiplied by minus (lambda 2
plus r).

So, that is minus (lambda 2 plus r), and P3 is multiplied by r. The third equation is P1 plus
P2 plus P3 equal to 1, so that row is 1 1 1. If we multiply this matrix by P1 P2 P3, I get the
output, and the output is 0 0 1. So, this set of equations I can convert into a matrix
equation. Though I have prepared this matrix from the equations, I can also prepare it
directly from the diagram. How do I prepare it?

As we discussed earlier, this rate matrix I can prepare state by state. For state number 1,
whatever is outgoing is negative. From state number 1, the outgoing rate is lambda 1, so that
entry is minus lambda 1. Whatever is incoming to state number 1 is r from state number 2, and
nothing from state number 3; there is no connection with state number 3, so 0.

For state number 2, what is the relation with P1, state number 1? I have the incoming lambda
1 from state number 1. What is the relation of state number 2 with itself? That is only the
outgoing links, and the outgoing links are r and lambda 2. So, minus r minus lambda 2. And
what is the relation of the second state with the third state? It has an incoming link from
the third state, with transition rate r. And the third equation is 1 1 1 into P1 P2 P3, and
the right-hand side is 0 0 1.

Now, with this, if I want to calculate P1 P2 P3, this will be equal to the inverse of this
matrix multiplied by this column. So, I will take the rows minus lambda 1, r, 0; then lambda
1, minus (r plus lambda 2), r; and then 1 1 1. If I take the matrix inverse and multiply by
0 0 1, I will get the result, which is P1 P2 P3. I have done this in Excel. I will show it.
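
As an alternative to the spreadsheet, the same matrix equation can be solved in a few lines of plain Python. This is only a sketch: `solve_linear` is a small helper written for this illustration (Gauss-Jordan elimination with partial pivoting), not a library call, and it solves the system directly rather than forming the inverse, which gives the same P1 P2 P3.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-15:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        # Eliminate this column from every other row.
        for i in range(n):
            if i != col:
                factor = m[i][col]
                m[i] = [vi - factor * vc for vi, vc in zip(m[i], m[col])]
    return [row[n] for row in m]


lam1, lam2, r = 0.002, 0.001, 0.01  # values used in the lecture

rate_matrix = [
    [-lam1, r, 0.0],           # balance equation for state 1
    [lam1, -(lam2 + r), r],    # balance equation for state 2
    [1.0, 1.0, 1.0],           # normalisation: P1 + P2 + P3 = 1
]
p1, p2, p3 = solve_linear(rate_matrix, [0.0, 0.0, 1.0])
availability = p1 + p2
print(round(p1, 4), round(p2, 4), round(p3, 4), round(availability, 4))
```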

(Refer Slide Time: 16:00)

So, let me show you how I have done it. I have copied it here also, if you see. My matrix is
this, so I put the values here. Lambda 1 is 0.002, r is 0.01, and lambda 2 is 0.001, so r
plus lambda 2 will be 0.001 plus 0.01, that is 0.011. And this again is r, 0.01, and then
1 1 1. Now, how can I solve this matrix?

I can use the matrix inverse approach. In the matrix inverse approach, I have taken the
inverse of this, and it gives me this matrix. If I multiply this matrix by 0 0 1, what will
happen? For P1, this entry will be multiplied by 0, this will be multiplied by 0, and this
will be multiplied by 1.

So, we multiply this row with this column: 0 into this plus 0 into this plus 1 into this. So,
P1 will be equal to this value. Similarly, the second row multiplied by the column will give
P2, which again comes from the third column. Only the third column survives because of the
0 0 1. So, we have P1 P2 P3 coming from the third column.

These are the same values that we calculated: 0.8197, 0.1639 and 0.0164 for P1, P2 and P3,
and the availability will be P1 plus P2, that is 0.9836. So, this is another method which we
can use. The matrix equation can also be solved using the determinant approach. What is the
determinant approach?

The determinant approach is this. Let us say I want to calculate P1. P1 is the first entry
here, so the first column of the matrix will be replaced by the output column. That means I
will write the first column as 0 0 1. The second column starts with 0.01, and its values are
the same as before. The display is actually curtailed because of the compression. And the
third column is 0, 0.01, 1.

Now, I take the determinant of this. How do I take the determinant? 0 into this will be 0.
Then the 0.01 term gives 0.01 into 0.01 again, so that will be 0.0001. This determinant then
has to be divided by the determinant of the complete matrix. The determinant of the complete
matrix I have calculated here: the same matrix which I have used here, I put here, and its
determinant comes out to be 0.000122.

So, whatever value I have got here has to be divided by this value, and that will give me P1.
Similarly, if I want to calculate P2, the second column of this matrix will be replaced by
the output column 0 0 1. The first column will remain the same: minus 0.002, plus 0.002, 1.
The second will be replaced by 0 0 1. The third will remain as it is, and we will take the
determinant of this matrix.

The determinant of this matrix comes out to be this value. Then this will again be divided by
the determinant of the complete matrix. Once I divide, I get the P2 value, which is this
value. Similarly, for the third, what will I do? I will replace the third column with 0 0 1,
the first two will remain the same, and I will take the determinant of that. That determinant
I will again divide by the determinant of the complete matrix, and this will give me P3.

So, this is the same calculation; we can use the determinant rather than the matrix inverse.
The matrix inverse can sometimes be tedious to calculate, but it does the same thing. So, if
we do not want to use the matrix inverse, we can use the determinant approach.

And the determinant approach is fairly easy; we already know how to calculate a determinant.
Let us say I calculate the determinant of this matrix. I take the first entry, minus 0.002,
as it is, multiplied by the cross multiplication of the remaining entries. So, that will be
minus 0.011 multiplied by 1.

And from this the other cross product will be subtracted, 0.01 into 1. Then for the second
entry, I will take the minus sign, minus of 0.01. This will again be multiplied with a cross
product. This column will not be considered; this row will not be considered. The remaining
entries will be cross multiplied. That is 0.002 into 1 minus 0.01 into 1.

And then I will take the third entry. The third term will be plus 0 into the cross
multiplication of the remaining columns. That will be 0.002 multiplied by 1, minus, and minus
into minus will become plus because this is also minus, 0.011 into 1. If I calculate this, I
will get the value equal to this. I have done this in Excel. I will show you. So, you will
learn how to use Excel for matrix operations.
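
The column-replacement procedure described above is Cramer's rule. A minimal Python sketch for the 3 by 3 case follows; the helper names `det3`, `replace_column` and `cramer3` are made up for this illustration, and the matrix holds the lecture's values.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def replace_column(m, col, values):
    """Copy of m with column `col` replaced by `values`."""
    return [[values[r] if c == col else m[r][c] for c in range(3)]
            for r in range(3)]


def cramer3(m, rhs):
    """Solve m x = rhs: x[k] = det(m with column k replaced by rhs) / det(m)."""
    full = det3(m)
    return [det3(replace_column(m, k, rhs)) / full for k in range(3)]


rate_matrix = [
    [-0.002, 0.01, 0.0],
    [0.002, -0.011, 0.01],
    [1.0, 1.0, 1.0],
]
output_column = [0.0, 0.0, 1.0]

full_det = det3(rate_matrix)                                  # about 0.000122
p1_det = det3(replace_column(rate_matrix, 0, output_column))  # about 0.0001
p1, p2, p3 = cramer3(rate_matrix, output_column)
print(round(p1, 4), round(p2, 4), round(p3, 4))
```

The first-row expansion in `det3` follows the same order as the hand calculation: first entry times its cross product, minus the second entry times its cross product, plus the third entry times its cross product.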

(Refer Slide Time: 21:41)

Now, I have already put it here. Actually, most of the things which I copied onto the slide
are taken from here. This is contracted, so I am expanding it a little. As you see here, this
is my matrix which I have to solve. First is the matrix inverse approach. For the matrix
inverse approach, what do I have to do?

There is a formula for the matrix inverse in Excel, MINVERSE. So, I will take the MINVERSE
of this matrix. I will do this again, because what is shown here is already the output. So,
this is equal to MINVERSE of this matrix. This will give me one value. But for the matrix
inverse, I have to get the whole matrix. So, how to do that? First I will select the whole
output range.

Then, in the cell where my formula is, I will either put the cursor in the formula or press
F2 to edit it. So, this range I have selected. Now, I will press Ctrl Shift Enter. Ctrl Shift
Enter helps us to make array calculations. So, whenever I press Ctrl Shift Enter, the same
formula is correspondingly modified and applied to all the entries here.

So, I have to select the matrix range, then go to the first entry where I have written the
formula, and press Ctrl Shift Enter, and I get these values. So, this gives me the inverse
directly. From here, if I multiply by the output column I will get the answer, but I am not
multiplying here because I already know that, since the column is 0 0 1, only the last column
will give me the values. So, this becomes my P1 P2 P3.

(Refer Slide Time: 23:56)

Now, let us see the determinant approach. We have to calculate determinants. First we have to
calculate the determinant of this matrix, which will be the division factor for all the
values P1 P2 P3. So, let us do this. This I have calculated using the matrix determinant
function. In Excel, the matrix determinant function is written as MDETERM. So, I take MDETERM
of this.

So, whatever matrix I take, I take the determinant of that, and that will give me the value.
So, it becomes simple. Now, what will I do? I will copy this matrix and paste only the values
here. Once I have pasted the values, I will change the first column, because I want to
calculate P1.

So, for calculating P1, I change the first column to the output column and I get this value.
For calculating P2, I have changed the second column here; the rest of the values are the
same. And for P3, I change the third column only. And to calculate P1, what will I do?

P1 is equal to this value, the determinant after column replacement, divided by my original
determinant. P2 is equal to this divided by the same value, and P3 is equal to this divided
by the same value. So, I get all the values here, P1 P2 P3. As we have seen, the equations
which we have got, you can solve directly, you can solve using the matrix inverse approach,
or you can use this determinant approach. Generally, I find the determinant approach a little
easier and more straightforward; only the calculation has to be made and you can use it.

(Refer Slide Time: 26:10)

Now, let us say that rather than r, I had used 2r here. Then this would have become 0.02.
That means the repair rate which I am taking here is 0.02. I will just make this change here.
Because only this value has changed, I will just copy it and paste it wherever it occurs. I
get the new values here.

(Refer Slide Time: 26:58)

And my availability will be equal to the first plus the second, about 0.9917, as we evaluated
earlier. I think it was here. Yes, this one. That we had solved by equation solving, and this
is what we are getting here. So, we have seen that if we have this kind of system and we are
able to prepare a Markov diagram, we can solve the problem and get the answer. Now, let us
take another system. This is also given as the next problem.

(Refer Slide Time: 27:42)

Let us say this is our system. We discussed the degraded system. In a degraded system, we
consider that we have system state 1, where the system is in a good state. But the system can
degrade to state number 2, or the system can also fail directly. So, we have a three-state
system here: this is normal operation, this is degraded, and this is completely failed.

Now, here we are considering that in state one the system is fully operational, state two is
degraded, and state three is the failed mode. The system can be repaired only when it has
failed. That means from the degraded state we are not repairing it; repair is only happening
here. So, this becomes our diagram.

Now, for this diagram, let us say lambda 1 is 2, lambda 2 is 3, lambda 3 is equal to 1, and
the repair rate is equal to 10. So, I can prepare the rate matrix here. From state 1, how
much is outgoing? There are two outgoing links, 2 and 1. So, this will become minus 3. What
is coming from 2? Nothing is coming from 2, so 0. Is anything coming from 3? Yes, r is
coming. So, this becomes 10.

For state number 2, is anything coming from 1? Yes, 2 is coming. Whatever is outgoing from 2
is only lambda 2, so minus 3. And what is coming from state number 3 to 2? Nothing is coming,
so there is no link, and that entry is 0. And then the third equation remains the same,
1 1 1. If we take the matrix inverse, we get the values directly; the last column becomes the
answer. This I have done here also.

(Refer Slide Time: 29:37)

Here I have taken the same values: minus 3, 0, 10; then 2, minus 3, 0; and then 1 1 1. The
same formula which I used in the previous case I have used here, and this column gives me the
answer. I can do the same thing using the determinant approach also. In the determinant
approach, what did I do? First, I take the determinant of this matrix, which will be the
division factor for all.

Then, when I am calculating P1, I replace the first column with 0 0 1. When I calculate P2,
the second column will be 0 0 1, and when I calculate P3, the third column will be 0 0 1. So,
P1 will be 30 divided by 59, P2 will be 20 divided by 59, and P3 will be 9 divided by 59. The
same I have calculated already here.

And if you see, my availability comes out to be 50 divided by 59, about 0.8475. So, as you
see here, whatever diagram we draw based on our assessment of how the system is working, what
repair rate will be there, what failure rate will be there, we just put it here and we are
able to solve it using this approach.
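
The degraded-system example can be reproduced with exact fractions. The sketch below builds the rate matrix from a transition list using the rule described above (outgoing rates negative on the diagonal, incoming rates positive), replaces the last equation with the normalisation row, and applies the determinant approach. The `transitions` dictionary layout and the helper `det3` are choices made for this illustration.

```python
from fractions import Fraction

# (from_state, to_state): rate, using the lecture's values.
transitions = {(1, 2): 2, (1, 3): 1, (2, 3): 3, (3, 1): 10}
n = 3

# Row i is the balance equation of state i+1: inflow minus outflow = 0.
rows = [[Fraction(0)] * n for _ in range(n)]
for (src, dst), rate in transitions.items():
    rows[src - 1][src - 1] -= rate  # rate leaving the source state
    rows[dst - 1][src - 1] += rate  # same rate arriving at the destination
rows[n - 1] = [Fraction(1)] * n     # last equation replaced by P1 + P2 + P3 = 1
rhs = [Fraction(0), Fraction(0), Fraction(1)]


def det3(m):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


full_det = det3(rows)
probs = []
for k in range(n):
    replaced = [[rhs[r] if c == k else rows[r][c] for c in range(n)]
                for r in range(n)]
    probs.append(det3(replaced) / full_det)

availability = probs[0] + probs[1]  # good and degraded states both count as up
print(probs, availability, float(availability))
```

Using `Fraction` keeps the results in the same 30/59, 20/59, 9/59 form as the lecture instead of rounded decimals.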

But the assumption here is that the distributions are exponential: the repair distribution
is exponential and the failure distribution is also exponential. Only then can we use the
Markov approach which we are using here. One link is somehow not shown on the slide: 10 is
the link from here to here; this is my r, 10. So, we will stop here today. The next lecture
will be the last lecture, where we will try to see a little more about maintainability and
availability and have some general discussion. Thank you.

Introduction to Reliability Engineering
Professor Neeraj Kumar Goyal
Subir Chowdhury School of Quality and Reliability
Indian Institute of Technology Kharagpur
Lecture 40
Summary of the Course Introduction to Reliability Engineering

Hello everyone. So, we are now towards the completion of this course. For this two-credit
course, you have already gone through 39 lectures of half an hour each. So, we will try to
summarize what you have studied, and we will also try to get a little more information, a
little bit about risk assessment. So, initially, we will discuss a little about risk
assessment, then we will try to summarize what we have learned through this course and how
you can take it forward.

(Refer Slide Time: 01:01)

So, for risk assessment, let me first summarize. We discussed risk assessment in week 1. For
risk assessment, the essential requirement is that first you should know the system. Then you
should also have the context information. Generally, risk analysis cannot be a general
analysis. Reliability analysis can sometimes be done as a general analysis, but that also
requires that you provide the context, that means the operating conditions.

In risk analysis also, we have to provide the context. We have to state the conditions under
which this risk analysis is being carried out. That means: what are the system boundaries,
that is, which parts of the system are considered and which parts are not considered for this
analysis; and what is the purpose of the risk analysis, that is, whether it is being done for
cost-benefit reasons, whether it is done as part of a regulatory requirement on safety, or
whether we have to do it as per certain standards or approaches.

So, this context has to be defined. Generally, when we are doing risk assessment, the focus
is mostly on big engineering systems: nuclear power plants, chemical industries, offshore
platforms, various steel industries, refineries.

All these industries have inherent risk, because they are dealing with various processes
which, if they fail, can lead to injuries or death to persons, either operators or the
general public. In these cases, generally, the risk assessment methodology comes into the
picture. So, here we have to define the context.

The context tells under what conditions we are doing this analysis, what we are trying to do,
and what the purpose of this risk assessment is. Once we do this, then initially we do the
hazard identification, or we can say hazard analysis. As we discussed earlier, most of the
time, as in an RBD, a reliability block diagram, we look at reliability as the case when
everything is successful. That means everything is going fine, there is no problem and
everything is reliable.

But the moment some problem happens, let us say something failed, some human made an error,
or some external thing went wrong which was not supposed to go wrong in general, so something
unexpected happens, or some failure happens, or some sort of threat or attack happens, in
that case what will happen?

Our system will be exposed to failures; our system will have a state change. It will no
longer be in the safe state, or it may no longer be in the operating state. So, the events
which cause these kinds of failures or system problems, we have to find first.

That cannot be done by mathematics. To do that you have to have proper brainstorming
sessions, and in the brainstorming sessions we use hazard knowledge. Hazard knowledge is what
we have gathered either through our own experience, like what kind of problems we have faced
for this kind of system, or from others, like what kind of problems other people who are
running a similar kind of facility have faced.

There are also agencies which provide generic lists of hazards applicable for these kinds of
scenarios or systems. So, we have to find out the hazards first and see which hazards are
applicable for our system. Then, for each and every hazard or potential failure, we have to
do the system analysis.

So, we use the system knowledge and the hazard knowledge and combine them to find out our
system-specific hazards. Once we find out the system-specific hazards, we can use the risk
matrix; that is, we can do a brief categorical, subjective analysis and identify where our
hazards are lying.

Generally, there are various risk matrices proposed, which try to segregate the hazards into
three or four categories depending on the system. Here the first category could be the red
zone. Here the risk is not acceptable, because there is a high consequence and there is a
high probability also.

As we know, risk is equal to frequency multiplied by consequence, summed over all hazards.
The summation is over the hazards: for every hazard we multiply its frequency by its
consequence, and the sum gives the risk. So, we identify the hazards, then we find out the
probabilities and consequences of the hazards, and once we multiply and sum them together, we
get the system risk. But what happens?

This risk calculation would be very tedious because it requires so many frequency values, so
many hazards and their consequences. Evaluating them can be challenging because the data may
not be readily available; you have to search for sources of data, and you have to model. So,
this will be a huge modeling exercise; you have to model the complete plant.
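
The risk sum itself is a one-line calculation once the frequencies and consequences are known; the hard part, as noted above, is obtaining them. In the sketch below every hazard name, frequency and consequence value is hypothetical and chosen only to show the arithmetic; none of them comes from the lecture.

```python
# Hypothetical hazard list: (name, frequency per year, consequence per event).
# All numbers are invented for illustration only.
hazards = [
    ("hazard A", 1e-3, 50.0),
    ("hazard B", 1e-2, 2.0),
    ("hazard C", 1e-5, 1000.0),
]


def total_risk(items):
    """Risk = sum over hazards of frequency * consequence."""
    return sum(freq * cons for _, freq, cons in items)


def risk_shares(items):
    """Fraction of the total risk contributed by each hazard."""
    total = total_risk(items)
    return {name: freq * cons / total for name, freq, cons in items}


risk = total_risk(hazards)
shares = risk_shares(hazards)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of total risk")
print(f"total risk = {risk:.3f} consequence units per year")
```

Ranking hazards by their share of the total is one simple way to see where a reduction in frequency or consequence would lower the risk the most.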

Before the modeling, we can first do the subjective analysis. For the hazards which we have
identified, we do the subjective analysis through the risk matrix. We can find out, on a
broad categorization, where each hazard is lying: whether it is safe, whether it is to be
observed, or whether it is in the unsafe category.

So, we have red, yellow and green. Green means it is kind of safe, because that kind of risk
is generally seen by society and generally exists in all our general activities also. So,
people find it acceptable. In the yellow category, the risk is a little higher, on the
border; it may turn out to be acceptable and it may also turn out to be unacceptable, the
reason being that the hazards involved have high consequences.

Because of that, their frequencies have to be controlled. That means the probability has to
be brought down. So, they have to be continuously monitored and watched over, and their
reliabilities have to be controlled so that the probability or frequency of failure stays at
a lower value. Once you are able to keep them on the lower side, what will happen?

Your risk will be under control, so you are able to stay in the yellow zone. For whatever
hazards are lying in the yellow zone, you have to provide tight risk control measures, so
that you can continue using the system. But no hazard should actually fall in the red region.

If a hazard is falling in the red region, then you have to work on it using the latest
technologies, processes, etcetera, so that you bring down the probability of that hazard and
bring it into the yellow region. Or you can decrease the consequences also.

For example, when we wear a helmet on a bike ride, we reduce the consequences. The chances of
death are reduced, because the chance of death mostly comes from head injury; when other
parts get injured, that may not lead to death.

So, the chances of death have been reduced. Or we can say that because of putting on the
helmet, we are able to reduce the consequence from death to major injuries or minor injuries.
We can work similarly in industry: we recommend that everyone should put on headgear or
proper protective clothing whenever they are going into the plant, as per the industry
requirement.

That is done so that even if the accident happens, the consequences will not be very high.
So, consequences are controlled. Now, if we are able to establish that all our hazards are
lying in the green zone, or that many hazards are lying in the red zone, our decision is
clear. If it is in the green zone, we allow the system to run. So, we do not need a very
large risk model there.

Similarly, if the hazards are lying in the red zone, then also we do not have to do much,
because in that case the system will not be allowed to run; it is falling in the unacceptable
region. Such a system is not feasible for use. But when it is falling in the yellow zone,
then we have to have risk control.

And because of that, whenever our system is falling in the yellow zone, or when we are
bringing the system from the red zone to the yellow zone, we have to have tight risk control
measures, and we need to have the risk modeling done there, so that we are able to evaluate
the probabilities and the risk is evaluated in a quantitative manner.

So, then we may have to do the risk analysis, in which we try to find out f i and c i for
each hazard or each consequence. Once we get these, they will give us the risk. Then we reach
the decision. We can do the sensitivity analysis also here. If we do the sensitivity
analysis, we will be able to find out which system components are contributing more towards
the risk.

The system components which are contributing more, we can improve. If we improve there, our
risk reduction will be higher at lesser cost. Then we can take the decision. The decision may
be to stop functioning, that means you do not allow the system to run; or you allow the
system to run; or you say that the system is allowed to run but under certain restrictions,
with frequent observations and frequent inspections, where the processes will be checked
frequently to make sure that they are followed.

So, this is the general risk assessment process. And to calculate this risk using f i and
c i, we use PRA, the Probabilistic Risk Assessment model, where we try to calculate f and c
for all the hazards. This is generally done using fault tree analysis and event tree
analysis.

We are not discussing those things in detail. We can discuss them maybe in some other
interaction or some other course. This is generally taught at IIT Kharagpur in a subject
which we call Probabilistic Risk Assessment, PRA. Probabilistic risk assessment covers the
concepts of how to do all this.

Similarly, like this subject, we also have many other subjects in our school, the School of
Quality and Reliability. In our school, we offer subjects like software reliability, and we
offer subjects on maintenance data and repairable system data analysis.

Generally, for the data which we have considered, we have assumed that all the data are
independent and identically distributed. But many times, whenever repair happens, it may
change the distribution; it may change the failure pattern. Because repair can improve the
system, or can degrade the system, or it may leave the system as it is. So, in that case,
what will happen?

Your failure pattern will change because of the repair. So, in that case the data need to be
analyzed in a slightly different way. That is repairable system analysis. We have courses
like FDPM, that is called Fault Prognosis and Diagnosis. There we can learn what kinds of
diagnostics or prognosis can be done.

We have software reliability courses, where we try to find out, if we have software failure
data available, how we can analyze it and find out the software reliability. This is the
basic introductory course on reliability; it can be further extended to understand that in
industry we generally use the concept of the design for reliability (DFR) process. DFR
includes all this, but it also includes a little more, like reliability prediction, FMECA,
and testing: reliability growth testing and highly accelerated life testing.

Then there are some other methods like derating. All those processes are used so that the
design which we are creating is not only meeting the functional requirements but is also
meeting the reliability requirements. So, reliability design can be understood, and it is
being taught in some courses at IIT Kharagpur.

We offer an M. Tech. in Quality and Reliability Engineering. In this programme, we offer some
quality courses also, covering basic quality concepts, quality tools, process quality
control, etcetera, and from the reliability side human reliability can also be there. So,
there are many aspects of reliability which we try to cover in various subjects.

After going through this course you can go through those kinds of courses. They may not be
available online now, but they may become available, or some special courses may be run by
IIT Kharagpur, or, as we would like, you can come and join us for M. Tech. or PhD.

If you join us for M. Tech. or PhD, you will be exposed to those courses, and you will be
able to follow them. All these courses require some basic understanding, which is here.
Whatever I have taught you, whatever we have discussed here, all these things will help you
to build up.

This will provide you the basics which are required for doing any analysis, whether it is
warranty analysis or any other kind of life cycle cost analysis. Whatever you want to do
further, these concepts will come in handy, and they will help you to know how you can do the
reliability calculations.

Once you are able to calculate, then you can model, for example a cost model like the life
cycle cost model. In the life cycle cost model, what happens? Many times we decide the cost
of equipment based on the purchase cost. But actually the cost is not just the equipment
cost. Cost also comes from the repairs.

Because if the equipment fails, then we have to spend money on the repair also. There is also
a maintenance cost, and there is an operating cost. So, let us say you are going to buy an
LED bulb. The LED bulb gives you the same light at a lower wattage, so its efficiency is
higher. Now, you can compare: generally a simple bulb will cost you maybe 10 or 15 rupees,
but an LED will cost you, let us say, 200 rupees.

So, in that case, is this higher price justified or not? The LED looks costly here, but
effectively the life cycle cost may come down. The life cycle cost may come down because the
LED can work, let us say, for five years, while the bulb may work for maybe one or two years.
So, whatever average life you are getting from the equipment will define how much cost you
are having per unit time.

The second thing is the energy consumption. Let us say a 60 watt bulb is equivalent to a 10
watt LED. So, the LED consumes one-sixth of the energy, which means you are saving that much
on the electricity bill. When you are calculating the life cycle cost, that will also be
counted. So, when you see all these costs together, it will give you an idea of what the life
cycle cost is.
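
The bulb and LED comparison can be worked through as a rough calculation. The sketch below
uses the prices, lifetimes and wattages mentioned above (15 rupees, one year and 60 watt for
the bulb; 200 rupees, five years and 10 watt for the LED). The daily usage hours and the
electricity tariff were not given in the lecture; they are assumptions chosen only to make
the arithmetic concrete.

```python
# Rough annual life cycle cost comparison: ordinary bulb vs LED.
# Prices, lifetimes and wattages are the lecture's illustrative figures.
# HOURS_PER_DAY and TARIFF_RS_PER_KWH are assumptions added for this sketch.

HOURS_PER_DAY = 4.0        # assumed daily usage in hours
TARIFF_RS_PER_KWH = 6.0    # assumed electricity price in rupees per kWh


def annual_cost(price_rs, life_years, watts):
    """Purchase cost spread over the life, plus one year's energy cost."""
    purchase_per_year = price_rs / life_years
    energy_kwh_per_year = watts / 1000.0 * HOURS_PER_DAY * 365
    return purchase_per_year + energy_kwh_per_year * TARIFF_RS_PER_KWH


bulb_cost = annual_cost(price_rs=15, life_years=1, watts=60)
led_cost = annual_cost(price_rs=200, life_years=5, watts=10)

print(f"Bulb: {bulb_cost:.1f} rupees per year")
print(f"LED:  {led_cost:.1f} rupees per year")
```

Under these assumptions the LED comes out cheaper per year even though its purchase price is
much higher. With different usage hours or tariff the gap changes, so the inputs should be
replaced with real data before drawing any conclusion.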

So, our decision should not be based on the equipment price or the face value cost. Whenever
we are purchasing something, or selling something, or putting something in our requirements
and specifications, as in a tender document or something else, the cost which we are asking
about should be based on the life cycle cost.

So, whatever the parameters of the life cycle cost are, we can ask for those parameters
individually. We can ask what the failure probability of the equipment is, because that is
going to affect our repair time and repair cost. We can also ask how much repair time it is
going to take. We can also ask the manufacturer how much the repairs will cost, how much the
spare parts will cost, etcetera.

So, when we take all those costs into account, all together we get the life cycle cost. Even
after the component becomes dysfunctional, many times we may get some money back, or
sometimes we may even have to pay for disposal. So, disposal may be an additional cost, or
it can sometimes be rewarding. That also should be counted.
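
The cost components listed so far can be collected into one expression. This is only a
sketch; the symbols are assumptions introduced here for illustration and are not notation
from the lecture.

```latex
\mathrm{LCC} = C_{\text{purchase}} + C_{\text{operating}} + C_{\text{maintenance}}
             + C_{\text{repair}} + C_{\text{disposal}}
```

Here the disposal term is positive when money has to be paid for disposal and negative when
some money comes back, for example through resale.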

Like many times we say that when we purchase a Maruti car in India, the resale value is
higher because those cars are easily maintainable. They have high maintainability because
their spare parts are easily available and the manpower requirement is met by many people.

Similarly, the spare part cost is said to be smaller. So, the maintenance cost comes down,
and the effective life cycle cost comes down. So, many times people will prefer that. And
when you are selling it, at that time also you will get a better price. Sometimes people
feel this just from observation.

But it is not necessarily the case that one brand is better than another. Other brands, like
Hyundai, Honda, or others, may also show a similar pattern. Many times just going by someone
saying that this is better or that is better may make you biased, because the human approach
is many times biased.

But if we have the data, if we know how much the spare parts cost and how frequently they
fail, we can compare. As we discussed earlier, let us say the same spare part made by one
company is costly compared to the one from another company.

But for the company which is selling the part cheap, the spare part requirement is also
high, because that part will fail frequently. The spare part from the other company is
costly, but if it is highly reliable, then the chance of failure is also low.

So, the effective cost will come down. Let us say one repair with the costly part costs a
thousand rupees and the part fails once in five years. So, the cost comes out to be 200
rupees per year. But suppose instead you purchase an 800 rupee part which fails once in
three years. Then what is the cost you are going to have?

Almost 267 rupees per year, which is higher. Not only is the cost higher, there is also the
repair time. Let us say each repair takes one hour. So, with the costly part your system is
unavailable for only one hour in five years, but with the cheaper part, which fails almost
twice as often, you are losing approximately two hours in the same period. So, that means
the downtime is also higher.

Now, this is also costly, because you are not able to use your system. And there is another
thing: you have to take your vehicle to the service agency again and again. With the cheaper
part you have to take it two times; with the costly part you have to take it only one time.
So, that cost also needs to be included. Therefore, when we are making our decision,
deciding just on face value, or just because someone is saying so, is not correct.
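
The spare part comparison can be checked with a few lines of arithmetic. The sketch below
uses the figures from the example (a 1000 rupee part failing once in five years, an 800
rupee part failing once in three years, one hour per repair). The function names are
hypothetical, and treating "once in N years" as an average interval between failures is a
simplifying assumption.

```python
# Compare two spare parts by cost per year and by downtime over a fixed period.
# Figures are the lecture's illustrative values; the one hour repair time is
# assumed to be the same for both parts.

def cost_per_year(part_cost_rs, years_between_failures):
    """Average replacement cost per year for a part that fails at this interval."""
    return part_cost_rs / years_between_failures


def downtime_hours(repair_hours, years_between_failures, period_years):
    """Expected repair downtime accumulated over the given period."""
    expected_repairs = period_years / years_between_failures
    return expected_repairs * repair_hours


costly_cost = cost_per_year(1000, 5)
cheap_cost = cost_per_year(800, 3)

costly_down = downtime_hours(1, 5, period_years=5)
cheap_down = downtime_hours(1, 3, period_years=5)

print(f"Costly part: {costly_cost:.0f} rupees/year, {costly_down:.2f} h down in 5 years")
print(f"Cheap part:  {cheap_cost:.0f} rupees/year, {cheap_down:.2f} h down in 5 years")
```

The cheaper part averages about 1.67 repairs in five years, which the lecture rounds up to
roughly two. The cost of the extra service visits and of the lost use during downtime is not
included here and would widen the gap further.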

If we have the data, then we can evaluate the life cycle cost, and we know the real cost
rather than just looking at the displayed price. But for that we need an understanding of
how much the failure rate is, how much the failure probability per unit time is, how much
the repair time is, and how much the repair cost is.

So, once we have this data, we can do this analysis, at least at a broad level. Even at a
broad level, this analysis will give you a very good understanding of which option is
actually proving costly during use. Because you have to use it. You are not taking something
just for its purchase value; it is not a showpiece that you purchase once and place
somewhere, and it will be fine. You have to use it.

So, you have to see how long you are able to use a component, and how costly a particular
equipment or system is per unit time of use. And based on that you should decide which one
is the cheaper system and which one is not so cheap. So, life cycle cost should be evaluated
for deciding.

And that can also be the criterion for setting the reliability requirement or specification
for your system. If you do not specify reliability, you will not get it. So, as a customer,
you should always ask, as you do for warranty, about reliability: how long are you going to
have this particular system in use without failure?

That means, how much is the company's liability? Many times the company's liability only
covers the part replacement, but that can be a problem sometimes. For example, your whole
system may fail because of the failure of one capacitor. Now, the capacitor may cost only 1
rupee or 5 rupees, but because the system is down you may have to replace the complete
assembly. That is one case.

Another case is that the system as a whole is failing. Correcting that problem will take
time, and because of the downtime you will be losing a lot of money, because the system is
not working. Therefore, the manufacturer only replacing the component on failure is not
enough.

Whenever we have higher reliability requirements, especially in a competitive market or in
cases where there are safety concerns, security concerns, or environmental concerns, each
and every component, system, and part which we are using in our system has to have a
reliability specification, and it should meet that reliability requirement.

If they are not meeting the reliability requirement, they should not be taken in. Because
whenever a failure happens, someone is going to suffer, and that suffering has to be passed
on to the manufacturer, so that he is bound to meet the expenses when that failure happens.
But that becomes very difficult to execute.

So, many times you put a reliability requirement on whatever you are purchasing from your
manufacturer or suppliers, and that reliability requirement has to be met. If that
requirement is not met, then there should be not only the cost of replacement but a penalty
also, because you are going to lose.

So, if someone says that my reliability is this much, and their system reliability is not
met, that is, the number of failures is higher than what they said there would be over the
life duration, then they should be penalized, because you are going to suffer for that
system.

So, these kinds of policies need to be built and implemented. But all of this will only be
feasible when you have a proper way of collecting the data. The failure data and operational
data have to be collected; only then are you able to enforce these policies.

Also, whenever this data is collected, you have to make sure that it cannot be interfered
with, that no person or organization is able to modify the data, so that the raw data is in
its most accurate form and cannot be modified by some person just for their own advantage.

So, that data system also needs to be validated. Once you validate the data system and make
sure that it will not be tampered with, that data set should be acceptable to all for
decision making. So, it is very essential to put reliability requirements and reliability
specifications; this will help you to understand the whole scenario.

And when systems from multiple vendors and suppliers are there, it will help you to
understand how the reliability performance is, and you will be able to meet your goals as
promised to the customer. So, with this, there is a lot to talk about reliability, and there
is a lot to read about reliability.

It is a very wide area. It looks like a very specialized area, but reliability is applicable
to everything. When you are doing civil system reliability, you may be applying the same
thing but in a different way. When you are talking about a mechanical system, the basic
thing is the same, but the way you apply it will differ, because the system type is
different.

So, whenever the system type differs, the reliability approaches tend to differ, and their
values tend to differ. So, this is a very wide area. Any system you are working on needs to
be reliable, so its reliability evaluation has to be properly carried out. There is a large
amount of information and knowledge available in various books and research papers.

You can study those and understand them. I hope this lecture series will help you when you
are reading those papers and books; you will find them more understandable and a little bit
easier to follow.

So, that is the aim of this course. And I wish you all the best, so that you are able to do
well in this course, and in future also you are open to a lot of reliability opportunities
which you can take, and you can do a lot of work in this area. So, with this we are coming
to the end of this course.

So, I thank you all for going through this course and listening to my lectures, and I wish
that you all be successful in your life and that you all make some contribution to this area
of engineering, which we call reliability engineering. Thank you, everyone. Bye.
