
Unit V Reliability

Dr. K. Venkadeshwaran
Introduction to reliability concepts

• Definition – Performance, cost & reliability –
Reliability characterization – bathtub curve –
constant failure rate model – Time-dependent
failure rates
What is reliability
•  Reliability is defined as the probability that a device will
perform its intended function during a specified period of
time under stated conditions.
• The constraint of “stated conditions” is important as it is
impossible to estimate the failure probability for unlimited
conditions. Reliability usually changes as a function of time
and is denoted as R(t). Examples of reliability statements
are:
– The basic coverage warranty lasts for 36 months or 36,000 miles.
– We warrant the bulb will be free from defects and will operate
for 3 years based on 3 hours/day.
Different views of reliability
Importance of Reliability
• Reputation
• Customer satisfaction
• Warranty costs
• Repeat business
• Cost analysis
• Customer requirements
• Competitive advantage
Reliability-Cost Trade-Off Curve 
• If the producer increases the reliability of his product, he will increase
the cost of the design and/or production of the product. However, a low
production and design cost does not imply a low overall product cost.
• The overall product cost should not be calculated as merely the cost of
the product when it leaves the shipping dock, but as the total cost of the
product through its lifetime. This includes warranty and replacement
costs for defective products, costs incurred by loss of customers due to
defective products, loss of subsequent sales, etc.
• By increasing product reliability, one may increase the initial product
costs, but it will most likely decrease the support costs. An optimum
minimal total product cost can be determined and implemented by
calculating the optimum reliability for such a product. Figure 5.4 depicts
such a scenario. The total product cost is the sum of the production and
design costs as well as the other post-shipment costs.
• It can be seen that at an optimum reliability level, the total product cost
is at a minimum. The optimum reliability level is the one that coincides
with the minimum total cost over the entire lifetime of the product.
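The trade-off described above can be illustrated with a toy calculation. This is only a sketch: the two cost functions and their constants are invented for illustration and are not taken from the figure. Production cost is assumed to grow as reliability approaches 1, and support cost is assumed to fall in proportion to the unreliability.

```python
def production_cost(r, a=1.0):
    """Hypothetical design/production cost: rises steeply as r approaches 1."""
    return a / (1.0 - r)


def support_cost(r, b=10_000.0):
    """Hypothetical post-shipment cost: proportional to the unreliability."""
    return b * (1.0 - r)


def total_cost(r):
    """Total lifetime cost is the sum of the two contributions."""
    return production_cost(r) + support_cost(r)


# Search a grid of candidate reliability levels, 0.900 to 0.999.
grid = [i / 1000 for i in range(900, 1000)]
optimum = min(grid, key=total_cost)

print("optimum reliability:", optimum)
print("minimum total cost:", round(total_cost(optimum), 3))
```

With these assumed curves the minimum falls where the two contributions are equal; a real analysis would replace both functions with measured cost data.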
Does the above reliability-cost trade-off graph apply to software?

• Not really
• Software has no associated manufacturing costs, so warranty
costs and savings are almost entirely allocated to hardware.
• If there are no cost savings associated with improving software
reliability, why not leave it as it is and focus on improving
hardware reliability to save money?
– One study found that the root causes of typical embedded system
failures were SW and not HW by a ratio of 10:1.
– Customers buy systems and not just HW.
• Therefore the benefits of a SW reliability program lie not in
direct cost savings but in
– Increased SW staff availability, with reduced operational schedules
resulting from less corrective maintenance
– Increased customer goodwill based on improved customer satisfaction
Total life cycle costs (LCC)
• To minimize total life cycle costs (LCC), an
organization must do two things:
– Choose the best tools from all of the tools
available and apply those tools at the proper
phases of the product life cycle.
– Properly integrate these tools to ensure
that the proper information is fed forward and
backward at the proper times.
Reliability 'Bathtub' curve
Bath Tub curve
Bath tub curve – infant mortality
• Fabrication process damage
• Oxide defects and damage
• Ionic contamination
• Package defects (cracking)
• Some application overstress
• Soldered defects
• Screws/ cables not installed properly
Bath tub curve – useful life (random failures)
• External random occurrences (lightning, power spikes)
• Cosmic rays
• Combination of events (exact cause cannot be explained)
Bath tub curve – wearout failure
• Metallization failure
– Dendrite growth (Silver, tin)
– Electromigration
– Corrosion
– Fatigue and Fretting (Solder)
• Lubrication breakdown
• Time dependent dielectric breakdown
Time dependent failure rates at different
levels of load l1>l2>l3
Representative failure rates for different
classes of systems.
Bath tub curve mimics human life
Instantaneous vs. cumulative failure rate

• The instantaneous failure rate is the failure rate
of a product over a short period of time between T1 and
T2.
• The cumulative failure rate is the failure rate over the
whole time up to a specific point in time.
• The reliability function is the complement of
the cumulative probability of failure, R(t) = 1 − F(t)
(the reliability function is the probability of survival).
Human mimic of bath tub curve
Reliability Characterization
• Reliability function: According to Leith (1995), the
reliability of a product is the measure of its ability to
perform its function, when required, for a specified
time, in a particular environment. Reliability is defined
as the probability that a system (component) will
function over some time period t (Ebeling, 1997).
Expressed mathematically: if T denotes the time to failure
for a unit with probability density function (pdf) f(t),
and t is a pre-assigned time point, then the reliability
of the unit is defined as
$$ R(t) = P\{T > t\} = \int_t^{\infty} f(t')\,dt' $$
Reliability Characterization
• The PDF, f(t), has the physical meaning
$$ f(t)\,\Delta t = P\{t < T \le t + \Delta t\} \qquad (5.1) $$
for vanishingly small ∆t.
• The CDF now has the meaning
$$ F(t) = P\{T \le t\} $$

probability density function (pdf) and cumulative distribution function (cdf) 


complementary cumulative distribution function (ccdf)
Failure rate function:
• The rate at which failures occur in a certain time interval [t1, t2] is
called the failure rate λ(t) during that interval. It is defined as the
probability per unit time that a failure occurs in the interval, given
that a failure has not occurred prior to the beginning of the interval.
Thus the failure rate is given by
$$ \lambda(t) = \frac{R(t_1) - R(t_2)}{(t_2 - t_1)\,R(t_1)} $$
Hazard rate function
• The hazard rate is defined as the limit of the failure rate as the
length of the interval [t1, t2] approaches zero. Thus, it is the
instantaneous failure rate. The hazard rate h(t) is defined as
$$ h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t\,R(t)} = -\frac{1}{R(t)}\frac{dR(t)}{dt} = \frac{f(t)}{R(t)} $$
• The quantity h(t)∆t represents the probability that a device of
age t will fail in the small interval of time t to t + ∆t. The hazard
rate thus indicates how the aging behaviour changes
over the life of a population of components.
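The limiting process behind the hazard rate can be checked numerically. The sketch below assumes a hypothetical reliability function R(t) = exp[−(t/θ)²] with θ = 100 hours (chosen only because its hazard rate, 2t/θ², is easy to write down) and evaluates the interval failure rate [R(t1) − R(t2)] / [(t2 − t1) R(t1)] for shrinking intervals.

```python
import math

THETA = 100.0  # hypothetical scale parameter, in hours


def reliability(t):
    """Assumed reliability function R(t) = exp[-(t/THETA)^2]."""
    return math.exp(-((t / THETA) ** 2))


def interval_failure_rate(t1, t2):
    """Failure rate over [t1, t2], conditional on survival up to t1."""
    return (reliability(t1) - reliability(t2)) / ((t2 - t1) * reliability(t1))


def hazard_rate(t):
    """Analytic limit for this R(t): h(t) = -R'(t)/R(t) = 2t / THETA^2."""
    return 2.0 * t / THETA ** 2


# The interval rate approaches the hazard rate as the interval shrinks.
for width in (10.0, 1.0, 0.01):
    print(width, interval_failure_rate(50.0, 50.0 + width))
print("limit:", hazard_rate(50.0))
```

The printed interval rates move toward the analytic value as the width decreases, which is the sense in which the hazard rate is an instantaneous failure rate.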
Mean time to failure (MTTF)
• The expected life, or the expected time during
which an item will perform successfully until its
first failure, is defined as
$$ \mathrm{MTTF} = E[T] = \int_0^{\infty} t\,f(t)\,dt $$
• where f(t) is the pdf of T, the lifetime of an
item. As the lifetime of an item has to be
non-negative, we define f(t) for t ≥ 0.
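• At times we may denote the unreliability as
$$ F(t) = 1 - R(t) = \int_0^{t} f(t')\,dt' $$
• Equation 5.5, $R(t) = 1 - \int_0^{t} f(t')\,dt'$, may be inverted by
differentiation to give the PDF of failure times
in terms of the reliability:
$$ f(t) = -\frac{dR(t)}{dt} \qquad (5.7) $$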
Failure Rate
• Insight is normally gained into failure mechanisms
by examining the behavior of the failure rate. The
failure rate, λ(t), may be defined in terms of the
reliability or the PDF of the time to failure as
follows.
• Let λ(t)∆t be the probability that the system will
fail at some time T < t + ∆t given that it has not yet
failed at T = t. Thus it is the conditional probability
$$ \lambda(t)\,\Delta t = P\{T < t + \Delta t \mid T > t\} \qquad (5.8) $$
• By the definition of a conditional probability, we
have
$$ P\{T < t + \Delta t \mid T > t\} = \frac{P\{(T > t) \cap (T < t + \Delta t)\}}{P\{T > t\}} \qquad (5.9) $$
• The numerator on the right-hand side is just an
alternative way of writing the PDF; that is,
$$ P\{(T > t) \cap (T < t + \Delta t)\} = P\{t < T < t + \Delta t\} = f(t)\,\Delta t \qquad (5.10) $$
• The denominator of Eq. (5.9) is just R(t), as may
be seen by examining Eq. 5.3, R(t) = P{T > t}. Therefore,
combining equations, we obtain
$$ \lambda(t) = \frac{f(t)}{R(t)} \qquad (5.11) $$
• This quantity, the failure rate, is also referred to
as the hazard or mortality rate.
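• The most useful way to express the reliability
and the failure PDF is in terms of the failure
rate. To do this, we first eliminate f(t) from Eq.
5.11 by inserting Eq. 5.7, f(t) = −dR/dt, to obtain
the failure rate in terms of the reliability,
$$ \lambda(t) = -\frac{1}{R(t)}\frac{dR(t)}{dt} \qquad (5.12) $$
• Multiplying by dt, we obtain
$$ \lambda(t)\,dt = -\frac{dR(t)}{R(t)} \qquad (5.13) $$
• Integrating between zero and t yields
$$ \int_0^{t} \lambda(t')\,dt' = -\ln R(t) \qquad (5.14) $$
• since R(0) = 1. Finally, exponentiating results in
the desired expression for the reliability
$$ R(t) = \exp\left[-\int_0^{t} \lambda(t')\,dt'\right] \qquad (5.15) $$
• To obtain the probability density function for
failures, we simply insert Eq. 5.15 into Eq. 5.11 and
solve for f(t):
$$ f(t) = \lambda(t)\exp\left[-\int_0^{t} \lambda(t')\,dt'\right] \qquad (5.16) $$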
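The relation R(t) = exp[−∫₀ᵗ λ(t′) dt′] and the corresponding PDF f(t) = λ(t) R(t) can be evaluated numerically for any failure rate function. The following is a minimal sketch using trapezoidal integration; the function names and the two sample failure rates are illustrative assumptions, not part of the lecture.

```python
import math


def cumulative_hazard(failure_rate, t, steps=1000):
    """Trapezoidal estimate of the integral of failure_rate from 0 to t."""
    h = t / steps
    total = 0.5 * (failure_rate(0.0) + failure_rate(t))
    for i in range(1, steps):
        total += failure_rate(i * h)
    return total * h


def reliability(failure_rate, t, steps=1000):
    """R(t) = exp(-integral of the failure rate from 0 to t)."""
    return math.exp(-cumulative_hazard(failure_rate, t, steps))


def pdf(failure_rate, t, steps=1000):
    """f(t) = lambda(t) * R(t)."""
    return failure_rate(t) * reliability(failure_rate, t, steps)


def constant_rate(t):
    """Assumed constant failure rate of 0.02 per hour."""
    return 0.02


def linear_rate(t):
    """Assumed linearly increasing (wear-out style) failure rate."""
    return 1e-4 * t


print(reliability(constant_rate, 10.0))  # compare with exp(-0.2)
print(reliability(linear_rate, 100.0))   # compare with exp(-0.5)
print(pdf(constant_rate, 10.0))
```

For the constant rate the result reduces to the exponential law, and for the linear rate the integral is 1e-4 · t²/2, so both cases can be checked by hand.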
MTTF
• Probably the single most-used parameter to
characterize reliability is the mean time to failure (or
MTTF). It is just the expected or mean value E{T} of the
failure time T. Hence
$$ \mathrm{MTTF} = \int_0^{\infty} t\,f(t)\,dt \qquad (5.17) $$
• The MTTF may be written directly in terms of the
reliability by substituting Eq. (5.7) into Eq. (5.17) and
integrating by parts:
$$ \mathrm{MTTF} = -\int_0^{\infty} t\,\frac{dR(t)}{dt}\,dt = -\,t\,R(t)\Big|_0^{\infty} + \int_0^{\infty} R(t)\,dt \qquad (5.18) $$
• Clearly, the tR(t) term vanishes at t = 0. Similarly, from
Eq. (5.15), we see that R(t) will decay exponentially or
faster, since the failure rate λ(t) must be greater than
zero. Thus tR(t) → 0 as t → ∞. Therefore, we have
$$ \mathrm{MTTF} = \int_0^{\infty} R(t)\,dt \qquad (5.19) $$
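The result of the integration by parts, MTTF = ∫₀^∞ R(t) dt, can be checked against the defining integral ∫₀^∞ t f(t) dt. The sketch below does this numerically for an assumed constant failure rate of 0.02 per hour; the truncation point and step count are arbitrary choices made for illustration.

```python
import math


def trapezoid(func, upper, steps):
    """Trapezoidal estimate of the integral of func from 0 to upper."""
    h = upper / steps
    total = 0.5 * (func(0.0) + func(upper))
    for i in range(1, steps):
        total += func(i * h)
    return total * h


LAM = 0.02  # assumed constant failure rate, per hour


def reliability(t):
    """R(t) for a constant failure rate."""
    return math.exp(-LAM * t)


def pdf(t):
    """f(t) = -dR/dt for a constant failure rate."""
    return LAM * math.exp(-LAM * t)


# Truncate the infinite integrals where R(t) is negligible (exp(-40)).
UPPER = 2000.0
mttf_from_pdf = trapezoid(lambda t: t * pdf(t), UPPER, 20_000)
mttf_from_reliability = trapezoid(reliability, UPPER, 20_000)

print(mttf_from_pdf, mttf_from_reliability, 1.0 / LAM)
```

Both integrals agree with 1/λ = 50 hours to within the discretization error of the trapezoidal rule.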
MTTF - Example
MTTF
Failure Rate Model
• In the following sections models for representing failure rates
with one, or at most a few parameters, are discussed. These are
particularly useful when most of the failures are caused by early
failures, by random events, or by aging effects. Even when more
than one mechanism contributes substantially to the failure rate
curve, however, these models can often be used to represent
the combined failure modes and their interactions.

• CONSTANT FAILURE RATE MODEL
• TIME DEPENDENT FAILURE RATE MODEL
• Random failures that give rise to the constant
failure rate model are the most widely used basis
for describing reliability phenomena. They are
defined by the assumption that the rate at which
the system fails is independent of its age. For
continuously operating systems this implies a
constant failure rate, whereas for demand failures
it requires that the failure probability per demand
be independent of the number of demands.
CONSTANT FAILURE RATE MODEL
• The constant failure rate approximation is often quite adequate
even though a system or some of its components may exhibit
moderate early failures or aging effects.
• The magnitude of early-failure effects is limited by strict quality
control in manufacture and installation and may be further
reduced by a wear-in period before actual operations are begun.
• Similarly, in many systems aging effects can be sharply limited by
careful preventive maintenance, with timely replacement of the
parts or components in which the wear effects are concentrated.
• Conversely, if components are replaced as they fail, the overall
failure rate of a many-component system will appear nearly
constant, for the failure of the components will be randomly
distributed in time as will the ages of the replacement parts.
• Finally, even though the system's failure rate may vary in time, we
can use a constant failure rate that envelops the curve; this rate
will be moderately pessimistic.
• In the following sections we first consider the
exponential distribution. It is employed when
constant failure rates adequately describe the
behavior of continuously operating systems. We
then examine two demand failure models, one
in which the demands take place at equal time
intervals and the other in which the demands
are randomly distributed in time. Both may be
represented as constant failure rates.
The Exponential Distribution
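• The constant failure rate model for continuously operating
systems leads to an exponential distribution. Replacing the
time-dependent failure rate λ(t) by a constant λ in Eq. (5.16)
yields, for the PDF,
$$ f(t) = \lambda e^{-\lambda t} $$
• Similarly, the CDF becomes
$$ F(t) = 1 - e^{-\lambda t} $$
• and, since R(t) = 1 − F(t), the reliability may be written as
$$ R(t) = e^{-\lambda t} $$
Plots of f(t), R(t), and λ(t) (the failure rate)
are given in Fig.
• The MTTF and the variance of the failure times are also given
in terms of λ:
$$ \mathrm{MTTF} = \int_0^{\infty} t\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda} $$
• and the variance is found from
$$ \sigma^2 = \int_0^{\infty} (t - \mathrm{MTTF})^2\,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda^2} $$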
• A device described by a constant failure rate, and therefore
by an exponential distribution of times to failure, has the
following property of "memorylessness": The probability
that it will fail during some period of time in the future is
independent of its age. This is easily demonstrated by the
following example.
Exponential model
• EXAMPLE
• A device has a constant failure rate of λ =
0.02/hour
• (a) What is the probability that it will fail during
the first 10 hr of operation?
• (b) Suppose that the device has been
successfully operated for 100 hr. What is the
probability that it will fail during the next 10 hr of
operation?
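A short worked sketch of both parts of the example, using the exponential results F(t) = 1 − e^{−λt} and R(t) = e^{−λt}. Part (b) is computed as the conditional probability of failing between 100 hr and 110 hr given survival to 100 hr; the variable names are illustrative.

```python
import math

LAM = 0.02  # failure rate from the example, per hour


def reliability(t):
    """R(t) = exp(-LAM * t) for the constant failure rate model."""
    return math.exp(-LAM * t)


# (a) Probability of failure during the first 10 hr: F(10) = 1 - R(10).
p_first_10 = 1.0 - reliability(10.0)

# (b) Probability of failure in the next 10 hr, given survival to 100 hr:
#     P{T < 110 | T > 100} = [R(100) - R(110)] / R(100).
p_next_10_given_100 = (reliability(100.0) - reliability(110.0)) / reliability(100.0)

print(round(p_first_10, 4))           # about 0.1813
print(round(p_next_10_given_100, 4))  # same value: the device has no memory
```

The two probabilities coincide because the factor e^{−100λ} cancels in part (b), which is the memorylessness property in action.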
• That the probability of failure within a specified time
interval is independent of the age of the device should
not be surprising. Random failures are normally those
caused by external shocks to the device; therefore, they
should not depend on past history. For example, the
probability that a satellite will fail during the next
month owing to meteor impact would not depend on
how long the satellite had already been in orbit. It
would depend only on the frequency with which
meteors pass through the orbit.
Demand Failures
• The constant failure rate model has thus far been derived for
a continuously operating system.
• It may also be shown to be applicable to a system exposed to
a series of demands or shocks, each one of which has a small
probability of causing failure.
• Suppose that each time a demand is made on a system, the
probability of survival is r, giving a corresponding probability
of failure of
$$ p = 1 - r $$
• The term demand here is quite general; it may be the
switching of an electric relay, the opening of a valve, the start
of an engine, or even the stress on a bridge as a truck passes
over it.
• Whatever the application, there are two salient points. First,
we must be able to count or at least infer the number of
demands; and second, the probability of surviving each
demand must be independent of the number of previous demands.
• Define Rn as the probability that the system will still
be operational after n demands.
• Let Xn signify the event of success in the nth
demand.
• If the probabilities of surviving each demand
are mutually independent, Rn is given by
$$ R_n = P\{X_1\}\,P\{X_2\} \cdots P\{X_n\} $$
• or, since P{Xn} = r for all n,
$$ R_n = r^n $$
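With independent demands and the same survival probability r on every demand, the reliability after n demands is Rn = rⁿ. The sketch below compares this with the exponential form e^{−np}, where p = 1 − r, which is a good approximation when p is small; the numerical values are assumptions chosen for illustration.

```python
import math


def reliability_after_demands(r, n):
    """R_n = r^n for n independent demands, each survived with probability r."""
    return r ** n


def exponential_approximation(p, n):
    """Approximation R_n ~ exp(-n p), valid when p = 1 - r is small."""
    return math.exp(-n * p)


p = 0.001          # assumed failure probability per demand
r = 1.0 - p        # survival probability per demand

for n in (10, 100, 1000):
    exact = reliability_after_demands(r, n)
    approx = exponential_approximation(p, n)
    print(n, round(exact, 6), round(approx, 6))
```

The two columns stay close for small p, which is why demand failures can be treated with the same exponential law as a constant failure rate.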


Demand failure example
• EXAMPLE:
• A telecommunications leasing firm finds that during the one-
year warranty period, 6% of its telephones are returned at
least once because they have been dropped and damaged. An
extensive testing program earlier indicated that in only 20% of
the drops should telephones be damaged. Assuming that the
dropping of telephones in normal use is a Poisson Process,
what is the MTBD (mean time between drops)? If the
telephones are redesigned so that only 4% of drops cause
damage, what fraction of the phones, will be returned with
dropping damage at least once during the first year of service?
• Solution
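– R = 1 − 0.06 = 0.94 (fraction of phones not returned within the warranty period)
– p = 20% = 0.2 (probability that a drop causes damage)
– t = 1 year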
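Since p << 1 is often a good approximation, we may write Rn = (1 − p)^n = exp[n ln(1 − p)] ≈ e^{−np}, and we see that the reliability decays exponentially with
the number of demands. If the rate at which demands are made on the system is roughly
constant, we may express the number of demands occurring before time t as
$$ n = \gamma t $$
where γ denotes the demand rate, so that R(t) = e^{−λt} with the constant failure rate λ = γp.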
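The telephone example can be completed with a short calculation. The sketch assumes the model R(t) = e^{−γpt}, where γ is the drop rate (a symbol introduced here for the demand rate) and p is the probability that a drop causes damage; solving for γ gives the mean time between drops as 1/γ.

```python
import math


def demand_rate(reliability, p, t):
    """Solve R = exp(-gamma * p * t) for the demand rate gamma."""
    return -math.log(reliability) / (p * t)


# Given data: 94% of phones survive one year, 20% of drops cause damage.
gamma = demand_rate(reliability=0.94, p=0.2, t=1.0)  # drops per year
mtbd = 1.0 / gamma                                    # mean time between drops, years

# Redesign: only 4% of drops cause damage, same drop rate.
fraction_returned = 1.0 - math.exp(-gamma * 0.04 * 1.0)

print(round(mtbd, 2))               # about 3.23 years
print(round(fraction_returned, 4))  # about 0.0123
```

Under these assumptions the redesign reduces first-year returns from 6% to roughly 1.2%.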
TIME-DEPENDENT FAILURE RATES
• There are a variety of situations in which the explicit treatment of early
failures or aging effects, or both, requires the use of time-dependent
failure rate models. This may be illustrated by considering the
effect of the accumulated operating time T0 on the probability that
a device can survive for an additional time t.
• To model early failures or wear effects more explicitly, we must turn
to specific distributions of the time to failure. In contrast to the
exponential distribution used for random failures, these
distributions must have at least two parameters. Although the
normal and lognormal distributions are frequently used to model
aging effects, the Weibull distribution is probably the most
universally employed. With it we may model early failures and
random failures as well as aging effects.
The Normal Distribution
• The normal distribution is particularly useful
for describing aging when we can specify a
time to failure along with an uncertainty, ∆t.
The Weibull Distribution
• The Weibull distribution is one of the most widely used in reliability
calculations, for with an appropriate choice of parameters a variety
of failure rate behaviors can be modeled. These include, as a special
case, the constant failure rate, in addition to failure rates modeling
both wear-in and wearout phenomena. The Weibull distribution may
be formulated in either a two- or a three-parameter form.
• Two common versions used in reliability
– Two parameter Weibull
– Three parameter Weibull
• The three-parameter Weibull adds a location parameter for cases where
there is a non-zero time to first failure
• The three parameter Weibull distribution can be expressed as:
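The slide's own expression and symbols are not available here, so the sketch below uses one common parameterization as an assumption: shape, scale and location parameters, with R(t) = exp[−((t − location)/scale)^shape] for t ≥ location. The function and parameter names are illustrative. Setting the location to zero gives the two-parameter form, and a shape of 1 recovers the constant failure rate.

```python
import math


def weibull_reliability(t, shape, scale, location=0.0):
    """R(t) = exp(-((t - location)/scale)^shape); no failures before location."""
    if t < location:
        return 1.0
    return math.exp(-(((t - location) / scale) ** shape))


def weibull_hazard(t, shape, scale, location=0.0):
    """lambda(t) = (shape/scale) * ((t - location)/scale)^(shape - 1).

    Note: for shape < 1 the rate diverges at t == location, so evaluating
    exactly there raises ZeroDivisionError.
    """
    if t < location:
        return 0.0
    return (shape / scale) * ((t - location) / scale) ** (shape - 1.0)


def weibull_pdf(t, shape, scale, location=0.0):
    """f(t) = lambda(t) * R(t)."""
    return weibull_hazard(t, shape, scale, location) * weibull_reliability(
        t, shape, scale, location
    )


# shape < 1: decreasing rate (early failures); shape = 1: constant rate;
# shape > 1: increasing rate (wearout).
for shape in (0.5, 1.0, 2.0):
    print(shape, [round(weibull_hazard(t, shape, 50.0), 5) for t in (10.0, 20.0, 40.0)])
```

The printed rows show the three regions of the bathtub curve being reproduced by changing only the shape parameter.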
