
An Introduction to RAMS

RC Sharma
Consultant (O&M)
Louis Berger Consulting Pvt. Ltd.
Hyderabad

Things Do Fail

Causes of Failure:
1. Wear out after Normal Life.
2. Poor Design: Faulty Design, Inadequate Safety Margin.
3. Poor Construction: Poor Workmanship, Poor Quality of Material Used.
4. Randomness, a Characteristic of Electronic / Electrical Components.

RAMS:
1. RAMS provides Indicators of how Sturdy and Reliable a System
Design can potentially be.
2. It helps to identify which Parts of a System are likely to have
the major impact on System Level Failures, which Failure Modes to
expect and which Risks they pose to the Users.
3. RAMS can assist in the Planning of Cost-effective Maintenance
and Replacement Operations.
4. RAMS can provide Indicators for avoiding Hazards / Accidents.
Risk Assessment helps to improve Safety Levels. RAMS Analysis is
increasingly used in the Assessment of Safety Integrity Levels (SIL).
5. RAMS supports the Assessment of how well a Design Enhancement,
such as the Implementation of a new Part or Redundancy, will work
out in a Real Life Situation.

RAMS and the Life Cycle of a Product

[Figure: flowchart of the Design Process. Boxes: Specify Reliability
Goals; Allocate Reliability to Components; Implement Design Methods;
Failure Analysis (FMEA/FMECA); System Effectiveness & Life-cycle
Costs; decision "Goals Achieved?"; System Safety Analysis (FTA);
decision "Safety Goals Achieved?"; Ready for Production. A "No" at
either decision leads back into the Design Process.]

FTA: Fault Tree Analysis.
FMEA: Failure Mode and Effects Analysis.
FMECA: Failure Mode, Effects and Criticality Analysis.

[Figure: Bathtub Curve. Failure Rate plotted against Time in three
regions: Burn-in Period (Early Failures), Useful Life (Random
Failures) and Wear-out (Wear Out Failures).]

Systems / Components experience a decreasing Failure Rate early in
their Lifetime (called the Burn-in Period or Infant Mortality Period).
In the middle and largest part of their Lifetime, the Useful Life,
Systems / Components experience a Constant Failure Rate.
Close to the end of the Lifetime, Failures increase due to wear out
of Parts and other ageing related problems.
The Bathtub Distribution may be seen as the sum or juxtaposition of
three different Distributions.
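The "sum of three Distributions" view can be sketched numerically. The example below is only an illustration: it assumes Weibull hazard rates for the early and wear-out regions plus a constant random failure rate, and every shape, scale and rate value is hypothetical rather than taken from the slides.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Sum of three hazard rates (all parameter values are hypothetical)."""
    early = weibull_hazard(t, shape=0.5, scale=2000.0)      # shape < 1: decreasing rate
    random_failures = 1e-4                                   # constant rate (useful life)
    wear_out = weibull_hazard(t, shape=5.0, scale=20000.0)  # shape > 1: increasing rate
    return early + random_failures + wear_out


if __name__ == "__main__":
    for hours in (10, 100, 5000, 10000, 18000, 25000):
        print(f"t = {hours:>6} h  failure rate = {bathtub_hazard(hours):.6f} per hour")
```

The printed rate falls at first, flattens in the middle of the life and rises again towards the end, which is the bathtub shape described above.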

Reliability:
1. Continuous Working & NOT Failing over a given Period of Time.
2. A Reliable System will last for a long Time.
3. PM (Preventive Maintenance) brings up the Reliability.
4. Important Parameters: MTTF & MTBF.
5. Shall depend upon Design.
6. Reliability is 100%, if you do not use the System. Decreases with
Use.
7. Poor Reliability results in higher Maintenance Effort, lower
Revenue & more Down Time.

Availability:
1. Concerned about Down Time: MDT, CM & Resources.
2. Standby Modules: Cold, Warm & Hot Standby Systems increase Cost,
Complexity & Maintenance Efforts. Complexity does compromise Safety
as the Number of Components increases. Example: 10C2 = 45
Combinations of two Components to analyse for Failures.
3. Redundancy should be the last Option. First improve the Design &
Manufacturing Process.

Cost vs Reliability Curve

[Figure: Cost plotted against Reliability, with curves for
Acquisition Costs, Cost of Failures and Total Cost.]

1. Cost is high for Low Reliability as well as for High Reliability.
2. High Reliability requires Focus on Design & Manufacturing
Process.

Availability:
Ability of an Entity to be in a State to perform a Required Function
under Given Conditions, at a given Time Instant. It can be measured by
the Probability of an Entity E not being failed at a time instant t.
$A(t) = \Pr[\text{Entity E not failed at time } t]$
It can be expressed as the Ratio of Up Time over Total Working Time:

Inherent Availability: $A_{INH} = \dfrac{MTBF}{MTBF + MTTR}$
Achieved Availability: $A_{ACH} = \dfrac{MTBM}{MTBM + MDT}$
Operational Availability: $A_{OP} = \dfrac{MTBF}{MTBF + MTR}$

where $MTR = MTTR + MDTM + MSDT$, MDTM is the Mean Delay in
Maintenance and MSDT is the Mean Delay in Supply of Resources
(Spares, Tools & other Logistics).
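As a minimal sketch of how the availability ratios behave, the snippet below evaluates the inherent and operational forms. All numeric values (in hours) are hypothetical and the function name is an illustration, not part of any standard.

```python
def availability(mean_up_time, mean_down_time):
    """Steady-state availability as up time over total time."""
    return mean_up_time / (mean_up_time + mean_down_time)


# Hypothetical figures in hours, chosen only for illustration.
mtbf = 1000.0   # Mean Time Between Failures
mttr = 4.0      # Mean Time to Repair
mdtm = 6.0      # Mean Delay in Maintenance
msdt = 10.0     # Mean Delay in Supply of Resources

mtr = mttr + mdtm + msdt              # MTR = MTTR + MDTM + MSDT
a_inh = availability(mtbf, mttr)      # inherent: repair time only
a_op = availability(mtbf, mtr)        # operational: repair time plus delays

if __name__ == "__main__":
    print(f"A_INH = {a_inh:.4f}")
    print(f"A_OP  = {a_op:.4f}")
```

With these figures the operational availability is lower than the inherent one, because the delays in maintenance and supply add to the down time while MTBF is unchanged.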

Cost vs Availability Curve

[Figure: Cost plotted against Availability, with curves for
Acquisition & Support Cost, Cost of Down Time and Total Cost.]

1. Cost is high for Low Availability as well as for High Availability.
2. High Availability requires Focus on Design & Manufacturing
Process.

Repair Rate:
Limit of the Ratio of the Conditional Probability that the Corrective
Maintenance Action ends in a Time Interval $[t, t + \Delta t]$ to the
length $\Delta t$ of that Interval, when $\Delta t$ tends to zero, given
that the Entity became Faulty at time t = 0 and the Repair is still
in progress at time t.
Repair Rate is represented by $\mu(t)$.

Maintainability:
Ability of an Entity to be restored into or be kept in a Condition or State
that enables it to perform a Required Function, when Maintenance
Operations are performed under Given Conditions and are carried out
using Stated Procedures and Resources. It is, thus, the Ability of an
Entity to be repaired in a given time.
Maintainability is normally measured by the Probability that the
Maintenance Procedure of a certain Entity E, performed under Certain
Conditions, is finished by time t, given that the Entity Failed at
time t = 0.
$M(t) = \Pr[\text{Maintenance of E is completed by time } t \mid \text{E failed at time } t = 0]$
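A small sketch of M(t) follows. It assumes a constant Repair Rate, so that repair times are exponentially distributed and M(t) = 1 - exp(-t / MTTR); the slides do not state this model, and the 4-hour MTTR is hypothetical.

```python
import math


def maintainability(t, mttr):
    """M(t) under an ASSUMED constant repair rate of 1/MTTR."""
    return 1.0 - math.exp(-t / mttr)


if __name__ == "__main__":
    mttr_hours = 4.0  # hypothetical Mean Time to Repair
    for t in (1.0, 4.0, 8.0, 12.0):
        print(f"M({t:>4.1f} h) = {maintainability(t, mttr_hours):.3f}")
```

Under this assumption about 63% of repairs are finished within one MTTR.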

Maintainability:
1. Deals with Repair Time: MTTR (Staff, Facilities
& Logistics).
2. A Good (Maintainable) System can be easily
Repaired.
3. Shall depend upon Design.

Maintainability Features

[Figure: diagram of the factors that influence Maintainability.]
Factors shown:
1. Skills (Levels)
2. Human Factors (Capabilities / Limitations)
3. Repair Resources (Levels)
4. Accessibility & Modularisation
5. Reliability
6. Spares (Inventory) (Levels)
7. Diagnostics (Manual / Auto)
8. Training (Manuals) (Levels)
9. Preventive / Predictive Maintenance
10. Fault Isolation (Self Diagnosis)
11. Repairs vs. Discarding (Unit Level)
12. Life-cycle Costs
13. Standardisation & Interchangeability
14. Level of Repairs (Sub-systems) (FLM, SLM & TLM)
15. Facilities
16. Tools & Testing Equipment
17. Maintenance Organisation

Safety:
1. If compromised, can cause Loss of Human Life & Property.
2. Causes Repercussions.
3. Safety Standards: CENELEC SIL 4, 3, 2, 1.
4. Probabilistic Fail-safe: No single point of Failure. One Failure
should not lead to Catastrophe and the First Failure should be
detected as & when it occurs. The System can not cause harm when it
fails. Example: Redundancy.
5. Fault Tolerant Systems: Can continue to operate with a Fault,
possibly in a Degraded Mode. Example: CBTC & Fallback.

Trade-off: Complexity, Cost & Risk.

Quality:
1. Conformance to laid down Specifications.
2. A Static Measure of a Product meeting its Specifications.
3. Reliability is a Dynamic Measure of Product Performance.

RAMS Analysis Aims to bring Random Behaviour to Deterministic

Safety Integrity Level (SIL):
SIL is an Attribute of the Safety Functions. One of the 4 Levels SIL-1 to 4
can be assigned to a Function, depending on its Safety Requirements
(Tolerable Hazard Rate, THR).
THR provides the Failure Target for Random Failures (Examples: IC
Chip Burnt, Hardware Damaged due to Lightning etc.).
The THR Table in CENELEC EN 50129 provides the SIL based on the THR.
CENELEC EN 50129 also provides a Set of Design Techniques
corresponding to the SIL to tackle the Systematic Failures
(Examples: HW Design Faults, SW Coding Faults etc.).

THR (per hour and per function) | SIL
$10^{-9} \le THR < 10^{-8}$ | 4
$10^{-8} \le THR < 10^{-7}$ | 3
$10^{-7} \le THR < 10^{-6}$ | 2
$10^{-6} \le THR < 10^{-5}$ | 1

If the Functional Requirements are to achieve a THR of less than
$10^{-9}$, other means, such as a combination of several Systems,
are to be used.
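The THR bands can be read as a simple lookup. The sketch below assumes the EN 50129 assignment in which the tightest band maps to SIL 4 and the loosest to SIL 1; the function name `sil_for_thr` is hypothetical, and the standard itself should be consulted for normative use.

```python
def sil_for_thr(thr):
    """Return the SIL for a Tolerable Hazard Rate, or None outside the table."""
    bands = [
        (1e-9, 1e-8, 4),
        (1e-8, 1e-7, 3),
        (1e-7, 1e-6, 2),
        (1e-6, 1e-5, 1),
    ]
    for low, high, sil in bands:
        if low <= thr < high:
            return sil
    return None  # below 1e-9 or at/above 1e-5: not covered by the table


if __name__ == "__main__":
    for thr in (5e-9, 3e-8, 2e-7, 4e-6, 1e-10):
        print(f"THR = {thr:.0e} -> SIL {sil_for_thr(thr)}")
```

A THR below the table range returns `None`, reflecting the note that such targets need other means, such as combining several Systems.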

MTBF (Mean Time Between Failures):
It is the Expected Value of the Operating Time between the Occurrences
of Two Failures. For a Constant Failure Rate $\lambda$, the Expected
Time Elapsed between two Failures is Constant as well and MTBF
equals $1/\lambda$.

MTTF (Mean Time to Failure):
Expected Value of the Operating Time up to the First Failure.

MTTR (Mean Time to Repair):
Expected Value of all the Repair Times of a Component.

Reliability:
Ability of an Entity to perform a Required Function under Given Conditions
for a given Time Interval. In other words, an Entity is Reliable if it has
not Failed, i.e. it has stayed within its Specifications over a Time Interval.
$R(t) = \Pr[\text{Entity E not failed over Time } (0, t)]$, where the Entity
is assumed to be operating at time t = 0.

Feasible Design Region in terms of MTBF & MTTR

[Figure: MTTR (vertical axis, from Min. to Max.) plotted against MTBF
(horizontal axis, from Min.). The Feasible Design Region is bounded by
a line of Slope = $(1 - A)/A$.]

X = MTBF = Measure of Reliability
Y = MTTR = Measure of Maintainability
C(x): Cost Function for Reliability.
C(y): Cost Function for Maintainability.
A = Specified Availability Goal.
Minimise the Objective Function:
$\min Z = C(x) + C(y)$
Subject to,
$(1 - A)X - AY \ge 0$
$\text{Min. MTBF} < X$
$\text{Min. MTTR} \le Y \le \text{Max. MTTR}$
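The availability constraint and the slope of the boundary line can be traced back to the availability ratio. The short derivation below assumes that the Availability Goal is applied to the inherent form A = MTBF / (MTBF + MTTR), with X = MTBF > 0, Y = MTTR >= 0 and 0 < A < 1.

```latex
\begin{align*}
\frac{X}{X + Y} \ge A
  &\iff X \ge A\,(X + Y) \\
  &\iff (1 - A)\,X - A\,Y \ge 0 \\
  &\iff Y \le \frac{1 - A}{A}\,X .
\end{align*}
```

In the MTBF-MTTR plane the last line is the region on or below a straight line through the origin with slope (1 - A)/A.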

Hazard:
Situation, which has the Potential to cause Damage to the System, Damage
to its Surrounding Environment, Injuries or Loss of Human Lives.

Hazard Analysis:
An Analysis comprising Hazard Identification & Causal Analysis.

Hazard Log:
The Document in which Hazards Identified, Decisions Made, Solutions
Adopted and their Implementation Status are recorded.

Safety:
Freedom from Unacceptable Risk of Harm.

Safety Case:
The Documented Demonstration that the Product, System or Process
complies with the appropriate Safety Requirements.

Risk:
Result of the Crossing of two Criteria - Probable Frequency of Occurrence
and Degree of Severity of the Impact of a Hazard.
Risk Levels:
(Rows: Frequency of Occurrence of a Hazardous Event. Columns:
Severity Levels of Hazard Consequences.)

Frequency | Insignificant | Marginal | Critical | Catastrophic
Frequent | Undesirable | Intolerable | Intolerable | Intolerable
Probable | Tolerable | Undesirable | Intolerable | Intolerable
Occasional | Tolerable | Undesirable | Undesirable | Intolerable
Remote | Negligible | Tolerable | Undesirable | Undesirable
Improbable | Negligible | Negligible | Tolerable | Tolerable
Incredible | Negligible | Negligible | Negligible | Negligible

Risk Evaluation | Risk Reduction / Control
Intolerable | Shall be eliminated.
Undesirable | Shall only be Acceptable when Risk Reduction is Impracticable and with the Agreement of the Railway Authority.
Tolerable | Acceptable with Adequate Control and the Agreement of the Railway Authority.
Negligible | Acceptable without any Agreement.
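The crossing of Frequency and Severity can be expressed as a table lookup. The sketch below encodes the Risk Levels as read from the slide; the names `RISK_MATRIX` and `risk_level` are hypothetical.

```python
SEVERITIES = ("Insignificant", "Marginal", "Critical", "Catastrophic")

# One row per frequency category, one entry per severity, in SEVERITIES order.
RISK_MATRIX = {
    "Frequent":   ("Undesirable", "Intolerable", "Intolerable", "Intolerable"),
    "Probable":   ("Tolerable",   "Undesirable", "Intolerable", "Intolerable"),
    "Occasional": ("Tolerable",   "Undesirable", "Undesirable", "Intolerable"),
    "Remote":     ("Negligible",  "Tolerable",   "Undesirable", "Undesirable"),
    "Improbable": ("Negligible",  "Negligible",  "Tolerable",   "Tolerable"),
    "Incredible": ("Negligible",  "Negligible",  "Negligible",  "Negligible"),
}


def risk_level(frequency, severity):
    """Look up the Risk Level for a frequency category and a severity level."""
    return RISK_MATRIX[frequency][SEVERITIES.index(severity)]


if __name__ == "__main__":
    print(risk_level("Occasional", "Critical"))
    print(risk_level("Remote", "Marginal"))
```

An unknown frequency raises `KeyError` and an unknown severity raises `ValueError`, so a misspelt category is not silently rated.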

Methods to improve Reliability & Availability of a Product or System:
1. Redundancy or Duplicity for Critical Sub-systems.
2. Derating: Operating the System below its Rated Stress Level.
3. Choice of Technology (State-of-Art).
4. Reducing the Complexity of the System (will reduce the Failure
Rate $\lambda$).
5. Decreasing Down Time through Good Maintainability Design (FLMD &
SLMD).
6. Attention to Environmental Conditions (Earthing & Surge
Protection).
Reliability Function & MTTF:
Reliability is defined as the Probability that a System / Component
will function over some Time Period t. If T is the Time to Failure of
the System then,
$R(t) = \Pr\{T \ge t\}$, where $R(t) \ge 0$, $R(0) = 1$ and $\lim_{t \to \infty} R(t) = 0$
For a given value of t, R(t) is the Probability that the Time to
Failure is greater than or equal to t.

[Figure: Exponential Reliability Function, R(t) against t, for
$\lambda_1 < \lambda_2 < \lambda_3$.]

The Failure Function is
$F(t) = 1 - R(t) = \Pr\{T < t\}$, where $F(0) = 0$ and $\lim_{t \to \infty} F(t) = 1$

[Figure: Exponential Failure Function, F(t) against t.]

If f(t) is the Probability Density Function (PDF), describing the
Shape of the Failure Distribution Function,
$f(t) = \dfrac{dF(t)}{dt} = -\dfrac{dR(t)}{dt}$, with $f(t) \ge 0$ and $\int_0^{\infty} f(t)\,dt = 1$

For an Exponential Reliability Function, with $\lambda$ as Failure Rate,
$R(t) = e^{-\lambda t}$ and MTTF is the Inverse of the Failure Rate $\lambda$:
$MTTF = E(T) = \int_0^{\infty} t\,f(t)\,dt = -\int_0^{\infty} t\,\dfrac{dR(t)}{dt}\,dt$
$= \left[-t\,R(t)\right]_0^{\infty} + \int_0^{\infty} R(t)\,dt$
$= \int_0^{\infty} R(t)\,dt = \int_0^{\infty} e^{-\lambda t}\,dt$
$= \left[\dfrac{e^{-\lambda t}}{-\lambda}\right]_0^{\infty} = \dfrac{1}{\lambda}$
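The result MTTF = 1/λ can be checked numerically by integrating R(t) = exp(-λt). The sketch below uses a plain trapezoidal rule over a long but finite horizon; the failure rate, the horizon and the step count are all hypothetical choices.

```python
import math


def mttf_numeric(failure_rate, horizon_factor=50.0, steps=100_000):
    """Trapezoidal estimate of the integral of exp(-lambda*t) from 0 to horizon."""
    horizon = horizon_factor / failure_rate   # 50 MTTFs: the tail beyond is negligible
    dt = horizon / steps
    total = 0.5 * (1.0 + math.exp(-failure_rate * horizon))  # end points, half weight
    for i in range(1, steps):
        total += math.exp(-failure_rate * i * dt)
    return total * dt


if __name__ == "__main__":
    lam = 0.002  # hypothetical failure rate per hour
    print(f"numeric MTTF = {mttf_numeric(lam):.3f} h")
    print(f"1 / lambda   = {1.0 / lam:.3f} h")
```

The two printed values should agree closely, which is the content of the integration by parts above.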

Probability Concepts:
If an Experiment can result in any one of N different
equally likely outcomes, and if exactly n of these
outcomes correspond to event A, then the Probability of
event A is P(A) = n/N.
Probability of an Event A, P(A), obeys the following
Postulates:
1. P(A) is non-negative and at most 1: $0 \le P(A) \le 1$.
2. Probability of a Certain Event equals 1.
3. If A & B are Mutually Exclusive Events, $P(A \cup B) = P(A) + P(B)$.

Probability of A NOT occurring is:
$P(\bar{A}) = 1 - P(A)$
Joint Probability that both A & B occur (Independent Events) is:
$P(A \cap B) = P(A)\,P(B)$
Probability that A or B occurs (Mutually Exclusive Events) is:
$P(A \cup B) = P(A) + P(B)$

Probability Concepts (Contd.):
Two Events are Independent, if the occurrence of A does
NOT depend on the occurrence of B. Joint Probability of
occurrence of two Independent Events A & B is:
$P(A \cap B) = P(A)\,P(B)$
(The Joint Probability, i.e. the Probability of the Intersection, is
equal to the Product of their Probabilities.)

Two Events are Mutually Exclusive if the Occurrence of one
precludes the Occurrence of the other, i.e. both cannot
occur simultaneously.

[Figure: Venn diagrams of A and B, one showing the Intersection and
one showing Mutually Exclusive Events.]

If A & B are Mutually Exclusive:
$P(A \cup B) = P(A) + P(B)$
$P(A \cap B) = P(\emptyset) = 0$ (Both can not occur simultaneously)

If two Events are Dependent,
$P(A|B) = \text{Probability of A, given B} = \dfrac{P(A \cap B)}{P(B)}$
$P(A \cap B) = P(B)\,P(A|B)$

If two Events A & B are Independent, $P(A|B) = P(A)$ and $P(B|A) = P(B)$.
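These rules can be checked by counting equally likely outcomes. The sketch below uses two fair dice as a hypothetical experiment (not from the slides): "first die even" and "second die shows 6" are independent events, while "sum is 2" and "sum is 12" are mutually exclusive.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice.
OUTCOMES = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) = n / N, counting the outcomes for which the event holds."""
    favourable = sum(1 for outcome in OUTCOMES if event(outcome))
    return Fraction(favourable, len(OUTCOMES))


def first_even(o):
    return o[0] % 2 == 0


def second_six(o):
    return o[1] == 6


def sum_two(o):
    return sum(o) == 2


def sum_twelve(o):
    return sum(o) == 12


p_joint = prob(lambda o: first_even(o) and second_six(o))  # P(A and B)
p_union = prob(lambda o: sum_two(o) or sum_twelve(o))      # P(C or D)
p_cond = p_joint / prob(second_six)                        # P(A | B)

if __name__ == "__main__":
    print("P(A and B) =", p_joint, " P(A).P(B) =", prob(first_even) * prob(second_six))
    print("P(C or D)  =", p_union, " P(C)+P(D) =", prob(sum_two) + prob(sum_twelve))
    print("P(A | B)   =", p_cond, " P(A) =", prob(first_even))
```

Exact fractions are used so that the product and sum rules hold exactly rather than to rounding error.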

Bernoulli Distribution (for 2 Mutually Exclusive Outcomes):
The Random Variable can take only two values, 0 & 1.
If P(x=1) = p, then P(x=0) = q = 1 - p and p + q = 1.
$f(x) = p^{x}\,q^{1-x}$, for x = 0, 1
Mean = p
Variance $\sigma^2 = p\,q$

Binomial Distribution (for 2 Mutually Exclusive Outcomes):
If Trials are repeated n times then the Probability of x
Successes in n Trials is:
$p(x) = \binom{n}{x}\,p^{x}\,(1 - p)^{n-x}$, where $\binom{n}{x} = \dfrac{n!}{x!\,(n-x)!}$
Mean = np
Variance $\sigma^2 = np\,(1 - p)$
For p = 0.01 (a Component having 1 chance in 100 of failing) and
n = 5 Components,
$p(x = 1) = \binom{5}{1}\,(0.01)^{1}\,(0.99)^{4} = 0.048$
Mean Number of Failures = 5.(0.01) = 0.05
Variance $\sigma^2$ = 0.05.(1 - 0.01) = 0.0495
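The worked figures can be reproduced directly. The sketch below takes n = 5 (implied by the mean of 5 x 0.01) and p = 0.01; the function name `binomial_pmf` is hypothetical.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


n_components = 5
p_fail = 0.01

p_one_failure = binomial_pmf(1, n_components, p_fail)
mean_failures = n_components * p_fail
variance = n_components * p_fail * (1.0 - p_fail)

if __name__ == "__main__":
    print(f"p(x=1)   = {p_one_failure:.4f}")
    print(f"mean     = {mean_failures:.4f}")
    print(f"variance = {variance:.4f}")
```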

Exponential Distribution:
Failures that are completely Random in nature follow this Distribution.
The PDF (Probability Density Function) is:
$f(t) = \lambda e^{-\lambda t}$, for $t \ge 0$, and
$f(t) = 0$, for $t < 0$
$R(t) = e^{-\lambda t}$
Mean = MTTF = $1/\lambda$
Variance = $\sigma^2 = 1/\lambda^2$
Standard Deviation = $\sigma = 1/\lambda$ = MTTF
(The variability of the Failure Time increases as the Reliability,
i.e. the MTTF, increases.)

[Figure: Exponential Reliability Function, R(t) against t, for
$\lambda_1 < \lambda_2 < \lambda_3$.]

Also, $R(MTTF) = e^{-MTTF/MTTF} = e^{-1} = 0.368$
The above means that a Component having an Exponential Failure
Distribution has a slightly better than one-in-three Chance of
surviving to its MTTF. Expressed another way, following this
Distribution, 63.2% of Components would have failed by the MTTF.
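The survival figure at the MTTF does not depend on the failure rate, which a few lines of code make visible. The failure rates below are hypothetical.

```python
import math


def reliability(t, failure_rate):
    """Exponential reliability R(t) = exp(-lambda * t)."""
    return math.exp(-failure_rate * t)


if __name__ == "__main__":
    for lam in (1e-5, 5e-4, 2e-2):  # hypothetical failure rates per hour
        mttf = 1.0 / lam
        survive = reliability(mttf, lam)
        print(f"lambda = {lam:g}: R(MTTF) = {survive:.3f}, failed by MTTF = {1 - survive:.3f}")
```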

Poisson Distribution:
The Poisson Distribution is a Discrete Distribution. It is applicable
for a Constant Failure Rate. If a Component having a Constant Failure
Rate is immediately repaired or replaced, the number of Failures
observed over a Time t has a Poisson Distribution.
If $\lambda$ is the Failure Rate,
$p_n(t) = \dfrac{e^{-\lambda t}\,(\lambda t)^{n}}{n!}$, $p_n(t)$ being the Probability of n Failures in Time t.
An Example:
Let x be a Discrete Random Variable representing the number of
Failures of a Restorable System over a one year period. If x has a
Poisson Distribution with a Mean of $\lambda$ = 2 Failures per year, the
Probability of no more than one Failure a year is:
$\Pr(X \le 1) = F(1) = \sum_{x=0}^{1} \dfrac{e^{-2}\,2^{x}}{x!} = e^{-2} + 2e^{-2} = 3e^{-2} = 0.406$
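The example can be reproduced with a direct evaluation of the Poisson probabilities. The function names below are hypothetical; the rate of 2 Failures per year and the one year period are those of the example.

```python
import math


def poisson_pmf(n, failure_rate, t):
    """Probability of exactly n failures in time t at a constant failure rate."""
    mean = failure_rate * t
    return math.exp(-mean) * mean**n / math.factorial(n)


def poisson_cdf(n_max, failure_rate, t):
    """Probability of at most n_max failures in time t."""
    return sum(poisson_pmf(n, failure_rate, t) for n in range(n_max + 1))


if __name__ == "__main__":
    p = poisson_cdf(1, failure_rate=2.0, t=1.0)
    print(f"Pr(X <= 1) = {p:.3f}")
```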

Normal Distribution:
The Normal Distribution is NOT a true Reliability Distribution, since
the Random Variable ranges from $-\infty$ to $+\infty$. The Normal
Distribution, however, has been successfully used to model Fatigue
and Wear-out Phenomena. The Density Function of the Normal
Distribution is a Bell-shaped Curve.
The PDF is:
$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left[-\dfrac{1}{2}\,\dfrac{(x - \mu)^2}{\sigma^2}\right]$, $-\infty < x < \infty$

[Figure: Normal PDF curves for $\sigma^2 = 0.2$, $\sigma^2 = 0.5$ and $\sigma^2 = 1$.]

Parameters $\mu$ & $\sigma^2$ are the Mean and Variance of the
Distribution. The Distribution is Symmetrical about its Mean, with
the spread of the Distribution determined by the Standard
Deviation $\sigma$.
Mode and Median are coincident with the Mean.

Normal Distribution (Contd.):
About 68% of Values drawn from a Normal Distribution are within one
Standard Deviation of the Mean, about 95% of the Values lie within
two Standard Deviations, and about 99.7% are within three Standard
Deviations. This fact is known as the 68-95-99.7 Rule, or the
3-sigma Rule.
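The 68-95-99.7 figures follow from the Normal CDF, which can be written with the error function. A minimal check, using only the standard library (the helper name is hypothetical):

```python
import math


def coverage_within(k_sigma):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k_sigma / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sigma: {coverage_within(k):.4f}")
```

The result does not depend on the Mean or the Standard Deviation, since the interval is expressed in units of sigma.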

Series System:
Success of all the Components of the System is essential
for System Success. System Reliability is the Product of
Component Reliabilities.
$R_s = r_1 \cdot r_2 \cdot r_3 \cdots r_n$
Any Component Failure will result in System Failure.
System Failure Rate is the sum of the Failure Rates of the Components.
$\lambda_s = \lambda_1 + \lambda_2 + \lambda_3 + \dots + \lambda_n$
System $MTTF_s = \dfrac{1}{\lambda_s}$
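A minimal sketch of the Series System rules follows. The component reliabilities and failure rates are hypothetical, and the MTTF helper assumes constant (exponential) failure rates.

```python
import math


def series_reliability(reliabilities):
    """System reliability of components in series: product of the r_i."""
    return math.prod(reliabilities)


def series_mttf(failure_rates):
    """System MTTF for constant failure rates in series: 1 / sum of the rates."""
    return 1.0 / sum(failure_rates)


if __name__ == "__main__":
    print(f"Rs    = {series_reliability([0.99, 0.98, 0.97]):.6f}")
    print(f"MTTFs = {series_mttf([1e-4, 2e-4, 5e-4]):.1f} h")
```

The system is always less reliable than its weakest component, because every factor in the product is at most 1.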

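Parallel System:
If any one Component works well, the System will
work well. The System will fail only if all Components
fail.
System Reliability $R_s = 1 - \prod_{i=1}^{n} (1 - r_i)$
(One minus the Product of Un-reliabilities)
$R_s(t) \ge \max\{r_1(t), r_2(t), r_3(t), \dots, r_n(t)\}$
For a two Component System in Parallel,
$R_s(t) = 1 - (1 - e^{-\lambda_1 t})(1 - e^{-\lambda_2 t}) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2)t}$
System $MTTF_s = \int_0^{\infty} R_s(t)\,dt = \int_0^{\infty} e^{-\lambda_1 t}\,dt + \int_0^{\infty} e^{-\lambda_2 t}\,dt - \int_0^{\infty} e^{-(\lambda_1 + \lambda_2)t}\,dt$
$MTTF_s = \dfrac{1}{\lambda_1} + \dfrac{1}{\lambda_2} - \dfrac{1}{\lambda_1 + \lambda_2}$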
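The Parallel System formulas can be sketched in the same way. The failure rates and the mission time below are hypothetical; constant failure rates are assumed for the MTTF.

```python
import math


def parallel_reliability(reliabilities):
    """System reliability of components in parallel: 1 - product of (1 - r_i)."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


def two_component_parallel_mttf(lam1, lam2):
    """MTTF of two exponential components in parallel."""
    return 1.0 / lam1 + 1.0 / lam2 - 1.0 / (lam1 + lam2)


if __name__ == "__main__":
    lam1, lam2, t = 0.001, 0.002, 100.0  # hypothetical rates per hour, mission time
    r1, r2 = math.exp(-lam1 * t), math.exp(-lam2 * t)
    print(f"Rs({t:.0f} h) = {parallel_reliability([r1, r2]):.5f}")
    print(f"MTTFs     = {two_component_parallel_mttf(lam1, lam2):.1f} h")
```

The system reliability is never lower than that of the best single component, which is the inequality stated above.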

Non Series-Parallel System:

[Figure: Bridge Network.]

Path / Tie Set Method:
Tie Sets of the above Bridge Network: {1,3} {2,4} {1,4,5}
The above Tie Sets reveal that the System will work if 1 & 3 work, or
2 & 4 work, or 1, 5 & 4 work.
Cut Set Method:
Cut Sets of the above Bridge Network: {1,2} {3,4}
The System shall NOT function if all Components of a Cut Set fail
simultaneously.
For working out System Reliability, we need to work out the
Probabilities of either the Tie Sets or the Cut Sets.

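K out of M System:
At least K of the M Sub-systems must function for System Success.
M-K+1 or more Failures will result in System Failure.
$R_s = \sum_{X=K}^{M} \binom{M}{X}\,r^{X}\,(1 - r)^{M-X}$
(Derived from the Binomial Distribution)
For identical Components with a Constant Failure Rate $\lambda$,
$MTTF_s = \dfrac{1}{\lambda} \sum_{X=K}^{M} \dfrac{1}{X}$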
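A short sketch of the K out of M formulas follows, assuming M identical and independent Components with reliability r and, for the MTTF, a constant failure rate. The numeric values are hypothetical.

```python
import math


def k_of_m_reliability(k, m, r):
    """Probability that at least k of m identical components work."""
    return sum(math.comb(m, x) * r**x * (1.0 - r) ** (m - x) for x in range(k, m + 1))


def k_of_m_mttf(k, m, failure_rate):
    """MTTF of a k-out-of-m system of identical exponential components."""
    return sum(1.0 / x for x in range(k, m + 1)) / failure_rate


if __name__ == "__main__":
    print(f"2-out-of-3, r = 0.9: Rs = {k_of_m_reliability(2, 3, 0.9):.3f}")
    print(f"2-out-of-3, lambda = 1e-3: MTTFs = {k_of_m_mttf(2, 3, 1e-3):.1f} h")
```

Setting K = M gives the Series System and K = 1 gives the Parallel System, so both earlier results are special cases.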

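A System comprising Components in a combined Series-Parallel
Relationship

[Figure: block diagram. R1 = 0.9 and R2 = 0.9 in parallel (block A),
in series with R3 = 0.98 (block B); R4 = 0.99 and R5 = 0.99 in series
(block C); blocks B and C in parallel, followed by R6 = 0.98 in
series.]

$R_A = 1 - (1 - R_1)(1 - R_2) = 1 - (0.1)(0.1) = 0.99$
$R_B = R_A \cdot R_3 = (0.99)(0.98) = 0.9702$
$R_C = R_4 \cdot R_5 = (0.99)(0.99) = 0.9801$
$R_s = [1 - (1 - R_B)(1 - R_C)] \cdot R_6$
$R_s = [1 - (0.0298)(0.0199)] \cdot (0.98) = 0.9794$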
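The block-by-block reduction can be reproduced step by step. The helper names `series` and `parallel` are hypothetical; the component values are those of the worked example.

```python
import math


def series(*reliabilities):
    """Reliability of blocks in series."""
    return math.prod(reliabilities)


def parallel(*reliabilities):
    """Reliability of blocks in parallel."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)


r1, r2, r3, r4, r5, r6 = 0.9, 0.9, 0.98, 0.99, 0.99, 0.98

r_a = parallel(r1, r2)              # R1 and R2 in parallel
r_b = series(r_a, r3)               # block A in series with R3
r_c = series(r4, r5)                # R4 and R5 in series
r_s = series(parallel(r_b, r_c), r6)

if __name__ == "__main__":
    print(f"RA = {r_a:.4f}, RB = {r_b:.4f}, RC = {r_c:.4f}, Rs = {r_s:.4f}")
```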

Failure Rate:
It is the Transition Function between a Working State and a Failed
State of a Component, Sub-system or System. It can be analytically
expressed as the Probability of a Failure occurring in a Time Interval,
given that the Component was working up until then. Being a
Transition Function, it deals with Short Time Intervals.
$\lambda(t) = \lim_{\Delta t \to 0} \dfrac{P\{t < T \le t + \Delta t \mid T > t\}}{\Delta t}$

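[Figure: Transition States and Rate Diagram for a 2-Component System
in a Redundant Configuration. State 1 moves to State 2 at rate
$\lambda_1$ and to State 3 at rate $\lambda_2$; State 2 moves to
State 4 at rate $\lambda_2$; State 3 moves to State 4 at rate
$\lambda_1$.]

For a 2-Component System in Redundant Configuration, only State 4
results in a Failure:

State | Component 1 | Component 2
1 | Operating | Operating
2 | Failed | Operating
3 | Operating | Failed
4 | Failed | Failed

$P_1(t) + P_2(t) + P_3(t) + P_4(t) = 1$
$R_P(t) = P_1(t) + P_2(t) + P_3(t)$

$P_1(t + \Delta t) = P_1(t) - \lambda_1 \Delta t\,P_1(t) - \lambda_2 \Delta t\,P_1(t)$   (1)
(The above means that the Probability of the System being in State 1
at time $t + \Delta t$ is equal to the Probability of it being in
State 1 at time t, minus the Probability of it being in State 1 at
time t multiplied by the Probability $(\lambda_1 + \lambda_2)\Delta t$
of transitioning to either State 2 or State 3.)
$P_2(t + \Delta t) = P_2(t) + \lambda_1 \Delta t\,P_1(t) - \lambda_2 \Delta t\,P_2(t)$   (2)
$P_3(t + \Delta t) = P_3(t) + \lambda_2 \Delta t\,P_1(t) - \lambda_1 \Delta t\,P_3(t)$   (3)
$P_4(t + \Delta t) = P_4(t) + \lambda_2 \Delta t\,P_2(t) + \lambda_1 \Delta t\,P_3(t)$   (4)

From Equation 1:
$\lim_{\Delta t \to 0} \dfrac{P_1(t + \Delta t) - P_1(t)}{\Delta t} = \dfrac{dP_1(t)}{dt} = -(\lambda_1 + \lambda_2)\,P_1(t)$
Similarly,
$\dfrac{dP_2(t)}{dt} = \lambda_1 P_1(t) - \lambda_2 P_2(t)$
$\dfrac{dP_3(t)}{dt} = \lambda_2 P_1(t) - \lambda_1 P_3(t)$

Solving for $P_1(t)$, $P_2(t)$ & $P_3(t)$, with the System in State 1
at t = 0,
$P_1(t) = e^{-(\lambda_1 + \lambda_2)t}$
$P_2(t) = e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2)t}$
$P_3(t) = e^{-\lambda_1 t} - e^{-(\lambda_1 + \lambda_2)t}$
$P_4(t) = 1 - P_1(t) - P_2(t) - P_3(t)$
$R_P(t) = P_1(t) + P_2(t) + P_3(t) = e^{-\lambda_1 t} + e^{-\lambda_2 t} - e^{-(\lambda_1 + \lambda_2)t}$
For $\lambda_1 = \lambda_2 = \lambda$, $R_P(t) = 2e^{-\lambda t} - e^{-2\lambda t}$

For a 2-Component Redundant System,
$R(t) = 2e^{-\lambda t} - e^{-2\lambda t}$
$MTTF = \int_0^{\infty} R(t)\,dt = \int_0^{\infty} (2e^{-\lambda t} - e^{-2\lambda t})\,dt = \left[\dfrac{2e^{-\lambda t}}{-\lambda}\right]_0^{\infty} - \left[\dfrac{e^{-2\lambda t}}{-2\lambda}\right]_0^{\infty}$
$MTTF = \dfrac{2}{\lambda} - \dfrac{1}{2\lambda} = \dfrac{1.5}{\lambda}$

In other words, the 2-Component Redundant System will increase the
MTTF by a Factor of 1.5 over the Single Component MTTF.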
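The state equations can also be stepped forward numerically and compared with the closed-form result. The sketch below uses a simple forward Euler step, which mirrors the P(t + Δt) form of the equations, with State 2 taken as "Component 1 failed" and State 3 as "Component 2 failed"; the failure rate, the step size and the end time are hypothetical.

```python
import math


def simulate_two_component(lam1, lam2, t_end, dt=1e-3):
    """Forward-Euler integration of the four state probabilities from State 1."""
    p1, p2, p3, p4 = 1.0, 0.0, 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        d1 = -(lam1 + lam2) * p1        # leaves State 1 to State 2 or State 3
        d2 = lam1 * p1 - lam2 * p2      # Component 1 failed, Component 2 operating
        d3 = lam2 * p1 - lam1 * p3      # Component 2 failed, Component 1 operating
        d4 = lam2 * p2 + lam1 * p3      # both failed (absorbing state)
        p1, p2, p3, p4 = p1 + d1 * dt, p2 + d2 * dt, p3 + d3 * dt, p4 + d4 * dt
    return p1, p2, p3, p4


lam = 0.5      # hypothetical failure rate, same for both components
t_end = 2.0
p1, p2, p3, p4 = simulate_two_component(lam, lam, t_end)
reliability_numeric = p1 + p2 + p3
reliability_closed = 2.0 * math.exp(-lam * t_end) - math.exp(-2.0 * lam * t_end)
mttf_closed = 1.5 / lam

if __name__ == "__main__":
    print(f"R({t_end}) numeric     = {reliability_numeric:.4f}")
    print(f"R({t_end}) closed form = {reliability_closed:.4f}")
    print(f"MTTF = {mttf_closed:.2f} (single component: {1.0 / lam:.2f})")
```

A smaller step brings the numeric value closer to the closed form; the Euler scheme is used here only because it follows the derivation line by line.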

Thanks for
Kind Attention