Introduction To RCM
SUMMARY
This report summarizes the main elements of Reliability centered maintenance (RCM). The
presentation is to a great extent based on the outline of the RCM methodology by Rausand & Vatn
(1998). In this presentation we have made an effort to include ideas and examples from railway
applications.
RCM is a method for maintenance planning developed within the aircraft industry and later adapted to
several other industries and military branches. This report presents a structured approach to RCM, and
discusses the various steps in the approach. The availability of reliability data and operating experience
is of vital importance for the RCM method. The RCM method provides a means to utilize operating
experience in a more systematic way. Aspects related to utilization of operating experience are therefore
addressed specifically. In this paper, RCM is put into a risk analysis framework, taking advantage of
reliability modelling in a more structured way than in more traditional RCM approaches.
TABLE OF CONTENT
SUMMARY
TABLE OF CONTENT
1 INTRODUCTION
2 A CONCEPTUAL MODEL FOR RCM
3 MAIN STEPS OF AN RCM ANALYSIS
  Step 1: Study preparation
  Step 2: System selection and definition
  Step 3: Functional failure analysis (FFA)
  Step 4: Critical item selection
  Step 5: Data collection and analysis
  Step 6: Failure modes, effects and criticality analysis
  Step 7: Selection of Maintenance Actions
  Step 8: Determination of Maintenance Intervals
  Step 9: Preventive maintenance comparison analysis
  Step 10: Treatment of non-MSIs
  Step 11: Implementation
  Step 12: In-service data collection and updating
4 DISCUSSIONS AND CONCLUSIONS
  General benefits:
  Problem areas in the analysis:
  Conclusions:
REFERENCES
M. Rausand and J. Vatn. Reliability Centered Maintenance. In C. G. Soares, editor, Risk and Reliability in
Marine Technology. Balkema, Holland, 1998
1 INTRODUCTION
The reliability centered maintenance (RCM)
concept has been on the scene for more than 20
years, and has been applied with considerable
success within the aircraft industry, the military
forces, the nuclear power industry, and more
recently within the offshore oil and gas industry.
Experiences from the use of RCM within these
industries (see e.g. Sandtorv & Rausand 1991)
show significant reductions in preventive maintenance (PM) costs while maintaining, or even
improving, the availability of the systems.
According to the Electric Power Research
Institute (EPRI) RCM is:
a systematic consideration of system
functions, the way functions can fail,
and a priority-based consideration of
safety and economics that identifies
applicable and effective PM tasks.
The main focus of RCM is hence on the system
functions, and not on the system hardware.
Several textbooks and reports presenting the
RCM concept have been published. The most
important books are Nowlan and Heap (1978),
Moubray (1991), Smith (1993), Anderson &
Neri (1990), and Moss (1985). These textbooks
provide a good introduction to RCM, but most
of them are imprecise in their definitions
of the basic concepts. The main
ideas presented in these textbooks are more or
less the same, but the detailed procedures are
rather different.
[Figure 1: Conceptual model for RCM. Recoverable labels: M1, M2, M3, ..., Mk; C1, C2, C3; barriers B1, B2, B3; undesired event; total loss; fault tree analysis; event tree analysis; risk analysis.]
3 MAIN STEPS OF AN RCM ANALYSIS
Rausand&Vatn
[Figure: the steps of the RCM analysis plotted against time. Recoverable labels: 1 Study prep.; 2 System selection; 3 FFA; 4 Critical item selection; 5 Data collection and analysis; 6 FMECA; 7 Maintenance tasks; 8 Maintenance intervals; 11 Implementation.]
Step 1: Study preparation
The main objectives of an RCM analysis are:
1. to identify effective maintenance tasks,
2. to evaluate these tasks by some cost-benefit
analysis, and
3. to prepare a plan for carrying out the
identified maintenance tasks at optimal
intervals.
decision theoretical framework (Vatn 95 and
Vatn et al. 1996).
RCM analyses have traditionally concentrated
on PM strategies. It is, however, possible to
extend the scope of the analysis to cover topics
like corrective maintenance strategies, spare part
inventories, logistic support problems, etc. The
RCM project group must decide what should be
part of the scope and what should be outside.
The resources that are available for the analysis
are usually limited. The RCM group should
therefore be selective about what to look
into, recognizing that the cost of the analysis should not
outweigh its potential benefits.
In many RCM applications the plant already has
effective maintenance programs. The RCM
project will therefore be an upgrade project to
identify and select the most effective PM tasks,
to recommend new tasks or revisions, to
eliminate ineffective tasks, and then to apply those
changes within the existing programs in a way
that allows the most efficient allocation of
resources.
When applying RCM to an existing PM
program, it is best to utilize, to the greatest
extent possible, established plant administrative
and control procedures in order to maintain the
structure and format of the current program.
This approach provides at least three additional
benefits:
(i) It preserves the effectiveness and
success of the current program.
(ii) It facilitates acceptance and implementation of the project's recommendations
when they are processed.
(iii) It allows incorporation of improvements
as soon as they are discovered, without the
necessity of waiting for major changes to
the PM program or analysis of every
system.
Step 2: System selection and definition
Before a decision to perform an RCM analysis at
a plant is taken, two questions should be
considered:
System: A logical grouping of subsystems that
will perform a series of key functions, which
often can be summarized as one main function,
that are required of a plant (e.g. feed water,
steam supply, and water injection). The
compression system on an offshore gas
production platform may, for example, be considered a
system. Note that the compression system may
consist of several compressors with a high
degree of redundancy. Redundant units
performing the same main function should be
included in the same system. It is usually easy to
identify the systems in a plant, since they are
used as logical building blocks in the design
process.
The system level is usually recommended as the
starting point for the RCM process. This is
further discussed and justified for example by
Smith (1993) and in MIL-STD 2173. This
means that on an offshore oil/gas platform the
starting point of the analysis should be for
example the compression system, the water
injection system or the fire water system, and
not the whole platform.
The systems may be further broken down into
subsystems, sub-subsystems, etc. For the
purpose of the RCM process the lowest level of
the hierarchy should be what we will call an
RCM analysis item:
RCM analysis item: A grouping or collection of
components which together form some
identifiable package that will perform at least
one significant function as a stand-alone item
(e.g. pumps, valves, and electric motors). For
brevity, an RCM analysis item will in the
following be called an analysis item. By this
definition a shutdown valve, for example, is
classified as an analysis item, while the valve
actuator is not. The actuator is supporting
equipment to the shutdown valve, and only has a
function as a part of the valve. The importance
of distinguishing the analysis items from their
supporting equipment is clearly seen in the
FMECA in Step 6. If an analysis item is found
to have no significant failure modes, then none
of the failure modes or causes of the supporting
equipment are important, and therefore do not
need to be addressed. Similarly if an analysis
item has only one significant failure mode then
the supporting equipment only needs to be
Step 3: Functional failure analysis (FFA)
The objectives of this step are:
(i) to identify and describe the system's
required functions,
(ii) to describe input interfaces required for the
system to operate, and
(iii) to identify the ways in which the system
might fail to function.
[Figure: functional block diagram. Recoverable labels: Fluid in; Pump fluid; Fluid out; El. power; Environment.]
[Figure 4: Performance versus time, showing the target value, the acceptable deviation, and failure.]
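[Figure 5: FFA worksheet form, flattened in extraction. Header fields: System; Ref. drawing no.; Performed by; Date; Page ... of .... Columns: (1) Operational mode; (2) Function; (3) Function requirements; (4) System failure mode; (5) Criticality (sub-columns S and E survive; any further sub-columns are lost).]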
In the first column of Figure 5 the various
operational modes of the system are recorded.
For each operational mode, all the relevant
functions of the system are recorded in column
2. The performance requirements for the
functions, such as target values and acceptable
deviations (ref. Figure 4), are listed in column 3.
For each system function (in column 2) all the
relevant system failure modes are listed in
column 4. In column 5 a criticality ranking of
each system failure mode (functional failure) in
that particular operational mode is given. The
reason for including the criticality ranking is to
be able to limit the extent of the further analysis
by disregarding insignificant system failure
modes. For complex systems such a screening is
often very important in order not to waste time
and money.
The criticality ranking depends on both the
frequency/probability of the occurrence of the
system failure mode, and the severity of the
failure. The severity must be judged at the plant
level.
In the conceptual RCM model in Figure 1 the
system failure modes will be undesired events.
In addition the undesired events will also
include accidental events (like external impacts)
that are not normally identified as a loss of
system function. Such events are usually
identified by using various risk identification
checklists.
The severity ranking should be given in the four
consequence classes: (S) safety of personnel, (E)
environmental impact, (A) production
availability, and (C) economic losses. For each
of these consequence classes the severity should
be ranked as, for example, (H) high, (M) medium,
or (L) low. How the borderlines between these
classes should be defined will depend
on the specific application.
If at least one of the four entries is (M) medium
or (H) high, the severity of the system failure
mode should be classified as significant, and the
system failure mode should be subject to further
analysis.
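The screening rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in the text (significant if at least one of the four consequence classes is ranked M or H); the function name and the example failure modes and rankings are hypothetical and are not taken from any actual analysis.

```python
# Consequence classes and severity levels as named in the text.
CONSEQUENCE_CLASSES = ("S", "E", "A", "C")  # safety, environment, availability, economic
SEVERITY_LEVELS = ("L", "M", "H")


def is_significant(ranking):
    """Return True if at least one consequence class is ranked M or H."""
    for cls in CONSEQUENCE_CLASSES:
        if ranking[cls] not in SEVERITY_LEVELS:
            raise ValueError(f"unknown severity {ranking[cls]!r} for class {cls}")
    return any(ranking[cls] in ("M", "H") for cls in CONSEQUENCE_CLASSES)


# Hypothetical system failure modes and rankings, invented for illustration only.
failure_modes = {
    "total loss of function": {"S": "L", "E": "L", "A": "H", "C": "M"},
    "partial loss of function": {"S": "L", "E": "L", "A": "L", "C": "L"},
}

# Only the significant system failure modes are carried to further analysis.
significant = [name for name, r in failure_modes.items() if is_significant(r)]
```

Where the borderlines between L, M and H are drawn remains application specific, so in practice the rankings themselves would come from the analysis team rather than from code.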
The frequency of the system failure mode may
also be classified in the same three classes. (H)
high may for example be defined as more than
once per 5 years, and (L) low less than once per
Step 4: Critical item selection
The objective of this step is to identify the
analysis items that are potentially critical with
respect to the system failure modes (functional
failures) identified in Step 3(iii). These analysis
items are denoted functional significant items
(FSI). Note that some of the less critical system
failure modes have been disregarded at this stage
of the analysis. Further, the two failure modes
"total loss of function" and "partial loss of
function" will often be affected by the same
items (FSIs).
For simple systems the FSIs may be identified
without any formal analysis. In many cases it is
obvious which analysis items influence
the system functions.
For complex systems with an ample degree of
redundancy or with buffers, we may need a
formal approach to identify the functional
significant items. In the conceptual model in
Figure 1 the analysis item failures are classified
as basic events. This means that the causal
analysis in the conceptual model should be
pursued down to the analysis item level and not
further. As explained in section 2, the basic
events will also comprise events that are not
classified as analysis item failures, like human
[Figure 6 image not recovered. Surviving labels: System function II; System function III; MSI A; MSI B; MSI C; Anal. item; MSIs considered.]
Figure 6 Relation between top level system functions and analysis items
Step 5: Data collection and analysis
According to Sandtorv & Rausand (1991), the data
necessary for the RCM analysis may be
categorized in the following three groups:
1. Design data
Failure cause
Downtime
Failure consequences
Repair time (active and passive)
3. Reliability data
Reliability data may be derived from the
operational data. The reliability data are used
to decide the criticality, to describe the failure
process mathematically, and to optimize
the time between PM tasks. The reliability
data include:
Mean time to failure (MTTF).
Mean time to repair (MTTR).
Failure rate function z(t).
Performance requirements
$$ z_W(t) = \alpha\lambda(\lambda t)^{\alpha - 1} $$
[Figure: failure rate versus time, with the wear-out limit indicated.]
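As a small illustration of how the reliability quantities listed above relate to each other, the following sketch computes the failure rate function z(t) and the MTTF. It assumes a Weibull life distribution with shape parameter alpha and scale parameter lam; this choice of distribution, the function names and the numerical values are assumptions of the sketch, not data from the paper.

```python
import math


def weibull_failure_rate(t, alpha, lam):
    """Failure rate z(t) = alpha * lam * (lam * t)**(alpha - 1) for t > 0."""
    if t <= 0 or alpha <= 0 or lam <= 0:
        raise ValueError("t, alpha and lam must all be positive")
    return alpha * lam * (lam * t) ** (alpha - 1)


def weibull_mttf(alpha, lam):
    """Mean time to failure, MTTF = Gamma(1/alpha + 1) / lam."""
    if alpha <= 0 or lam <= 0:
        raise ValueError("alpha and lam must be positive")
    return math.gamma(1.0 / alpha + 1.0) / lam


# Hypothetical parameter values, chosen only to make the arithmetic easy to follow.
alpha, lam = 2.0, 0.001           # shape (ageing) and scale (per hour)
rate_at_1000_h = weibull_failure_rate(1000.0, alpha, lam)
mttf = weibull_mttf(alpha, lam)
```

With alpha equal to 1 the failure rate reduces to the constant lam, which corresponds to an item that does not age; alpha above 1 gives an increasing failure rate.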
Step 6: Failure modes, effects and criticality analysis
The objective of this step is to identify the
dominant failure modes of the MSIs identified
during Step 4.
[Figure: FMECA worksheet form, flattened in extraction. Header fields: Description of unit; Ref. drawing no.; Performed by; Date; Page ... of .... Columns: MSI; Operational mode; Function; Failure mode; Effect of failure (Consequence class S, E, A, C; Worst case probability S, E, A, C); MTTF; Criticality; Failure cause; Failure mechanism; %MTTF; Failure characteristic; Maintenance action; Failure characteristic measure; Recommended interval.]
Step 7: Selection of Maintenance Actions
This phase is the most novel compared to other
maintenance planning techniques. A decision
logic is used to guide the analyst through a
question-and-answer process. The input to the
RCM decision logic is the dominant failure
modes from the FMECA in Step 6. The main
idea is, for each dominant failure mode, to decide
whether a preventive maintenance task is
suitable, or whether it is best to let the item
deliberately run to failure and afterwards carry
out a corrective maintenance task. There are
generally three reasons for doing a preventive
maintenance task:
(a) to prevent a failure
(b) to detect the onset of a failure
(c) to discover a hidden failure
Only the dominant failure modes are subjected
to preventive maintenance. To obtain
appropriate maintenance tasks, the failure causes
or failure mechanisms should be considered. The
idea of performing a maintenance task is to
prevent a failure mechanism from causing a failure.
Hence, the failure mechanisms behind each of
the dominant failure modes should be entered
into the RCM decision logic to decide which of
the following basic maintenance tasks is
applicable:
[Figure: RCM decision logic, flattened in extraction. Questions, each answered Yes or No: "Does a failure alerting measurable indicator exist?"; "Is continuous monitoring feasible?"; "Is ageing parameter α > 1?"; "Is overhaul feasible?"; "Is the function hidden?". Outcomes: Continuous on-condition task (CCT); Scheduled on-condition task (SCT); Scheduled overhaul (SOH); Scheduled replacement (SRP); Scheduled function test (SFT); No PM activity found (RTF).]
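The decision logic can be sketched as a small function. The branch structure used here is an assumption reconstructed from the order of the questions and outcomes in the figure (indicator question first, then the ageing question, then the hidden-function question); the function and argument names are hypothetical.

```python
def select_maintenance_task(indicator_exists, continuous_monitoring_feasible,
                            ageing_parameter, overhaul_feasible, function_hidden):
    """Return the task code for one dominant failure mode.

    Assumed branch order (a reconstruction, not confirmed by the source):
    1. A failure alerting measurable indicator exists -> CCT or SCT.
    2. Otherwise, ageing parameter > 1                 -> SOH or SRP.
    3. Otherwise, hidden function                      -> SFT.
    4. Otherwise                                       -> RTF (no PM activity found).
    """
    if indicator_exists:
        return "CCT" if continuous_monitoring_feasible else "SCT"
    if ageing_parameter > 1:
        return "SOH" if overhaul_feasible else "SRP"
    if function_hidden:
        return "SFT"
    return "RTF"


# Hypothetical failure mode: no measurable indicator, an ageing item
# (ageing parameter 2.5) for which overhaul is not feasible.
task = select_maintenance_task(
    indicator_exists=False,
    continuous_monitoring_feasible=False,
    ageing_parameter=2.5,
    overhaul_feasible=False,
    function_hidden=False,
)
```

In an actual analysis the answers come from the failure mechanism recorded for each dominant failure mode, so the function would be applied once per failure mechanism.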
$$ \tau = \frac{1}{\lambda} \left( \frac{c_p}{c_m(\alpha - 1)} \right)^{1/\alpha} $$
provided α > 1.
Hence, to optimize the replacement interval,
estimates for the parameters c_m, c_p, α and λ are
required. c_m is the total cost of a minimal repair,
including any harm to material, personnel and
environment. Assessing a value of c_m may
therefore cause controversies. α and λ are the
parameters in the failure distribution of the item.
Often it is more convenient to specify the failure
distribution in terms of mean time to failure
(MTTF) and the shape parameter α, yielding:
$$ \tau = \frac{\mathrm{MTTF}}{\Gamma(1/\alpha + 1)} \left( \frac{c_p}{c_m(\alpha - 1)} \right)^{1/\alpha} $$
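A minimal numerical sketch of the replacement-interval calculation follows. It assumes a Weibull life distribution with shape alpha and scale lam, minimal repair at failure (so that the expected number of failures in an interval of length tau is (lam * tau)**alpha), and the closed form tau = (1/lam) * (c_p / (c_m * (alpha - 1)))**(1/alpha), valid for alpha > 1. It also assumes that c_p is the cost of one preventive replacement. The function names and all cost figures are hypothetical.

```python
import math


def optimal_interval_from_lambda(c_p, c_m, alpha, lam):
    """Interval that minimizes cost per unit time, given the scale parameter lam."""
    if alpha <= 1:
        raise ValueError("an optimum exists only for alpha > 1")
    return (1.0 / lam) * (c_p / (c_m * (alpha - 1.0))) ** (1.0 / alpha)


def optimal_interval_from_mttf(c_p, c_m, alpha, mttf):
    """Same interval, expressed through MTTF = Gamma(1/alpha + 1) / lam."""
    if alpha <= 1:
        raise ValueError("an optimum exists only for alpha > 1")
    return mttf / math.gamma(1.0 / alpha + 1.0) * (c_p / (c_m * (alpha - 1.0))) ** (1.0 / alpha)


def cost_rate(tau, c_p, c_m, alpha, lam):
    """Cost per unit time: one replacement plus expected minimal repairs, per interval."""
    return (c_p + c_m * (lam * tau) ** alpha) / tau


# Hypothetical inputs: a failure costs 100 times as much as a replacement.
c_p, c_m, alpha, mttf = 1.0, 100.0, 2.0, 1000.0
tau = optimal_interval_from_mttf(c_p, c_m, alpha, mttf)
lam = math.gamma(1.0 / alpha + 1.0) / mttf
```

For these inputs the interval is roughly 11 % of the MTTF. Evaluating `cost_rate` slightly to either side of `tau` is a simple way to check that the closed form really is the minimum.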
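[Equation fragment preceding the table: c_p + c_m W(t)]
[Table, flattened in extraction. The caption and the first five column headings are not recoverable; the row labels 1.2 to 4.0 and the column headings 10 to 200 are as extracted.]

         ?       ?       ?       ?       ?      10      20      50     100     200
1.2  13.40   12.20   12.95   12.62   2.050    .897    .393    .165    .090    .050
1.5   8.19    7.97    1.22     .85    .590    .432    .253    .133    .083    .052
1.7   6.60    1.59     .83     .66    .503    .389    .247    .141    .093    .061
2.0   4.84     .86     .67     .57    .464    .377    .259    .161    .113    .080
2.5    .99     .71     .60     .54    .461    .394    .294    .202    .152    .115
3.0    .82     .67     .59     .54    .478    .421    .331    .242    .192    .152
4.0    .75     .66     .61     .57    .523    .476    .398    .316    .265    .223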
[Figure: performance/condition versus time, showing the target value, the acceptable deviation, the P-F interval, and failure.]
[Equation residue, not recoverable from the extraction. Surviving symbols: C( ), c_p, c_u, c_t, c_r, c_h, f, W(t), MTTF.]
exponentially distributed.
In order to optimize Eq. (1) numerical values are
required for ci, cu, MTTF, D, and . Numerical
methods are usually required to optimize Eq.
(1). The calculations will be simplified if we
choose a distribution for TPF with a closed form
of the cumulative distribution function.
Model 5 - Continuous on-condition tasks
The idea of continuous on-condition monitoring
is to measure one or more indicator variables.
Readings taken in this manner
can be used to detect an impending failure. The
variable being monitored is denoted X(t) in
Figure 11.
[Figure 11: Monitored variable X(t) versus time, with the "Action Limit" and "Failure Limit" levels and the point of failure.]
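The monitoring idea can be sketched as follows. The sketch assumes that X(t) grows towards the failure limit as the item degrades and that a maintenance action is triggered the first time a reading reaches the action limit; the function name, the readings and both limits are hypothetical.

```python
def first_action_time(readings, action_limit, failure_limit):
    """Return the time of the first reading at or above the action limit.

    readings is an iterable of (time, value) pairs in time order.
    Returns None if the action limit is never reached.
    """
    if action_limit >= failure_limit:
        raise ValueError("action limit must lie below the failure limit")
    for time, value in readings:
        if value >= action_limit:
            return time
    return None


# Hypothetical readings of X(t), invented for illustration only.
readings = [(0, 1.0), (10, 1.4), (20, 2.1), (30, 3.2), (40, 4.6)]
alarm_time = first_action_time(readings, action_limit=3.0, failure_limit=5.0)
```

The margin between the two limits determines how much time is available to plan and carry out the task before failure, so the action limit is a design choice rather than a fixed property of the item.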
Step 9: Preventive maintenance comparison analysis
Two overriding criteria for selecting
maintenance tasks are used in RCM. Each task
selected must meet two requirements:
It must be applicable
It must be effective
production unavailability during maintenance
unavailability of protective functions during maintenance of these
(i.e., loss of warranty)
increased premiums for emergency repairs
(such as overtime, expediting costs, or
high replacement power cost).
Balancing the various cost elements to achieve a
global optimum will always be a challenge. The
conceptual RCM model in Figure 1 may be a
starting point. If such a model could be
established, and the various cost elements
incorporated, the trade-off analysis is reduced to
an optimization problem with a precisely
defined mathematical model.
Often the resources available for the RCM
analysis do not permit building such an overall
model, hence we cannot expect to achieve a
global optimum. Sub-optimization can to some
extent be achieved by simplifying the model in
Figure 1. For example one could consider only
one consequence at a time and/or only one
maintenance task at a time.
Step 10: Treatment of non-MSIs
In Step 4 critical items (MSIs) were selected for
further analysis. A remaining question is what to
do with the items which are not analyzed. For
plants already having a maintenance program it
is reasonable to continue this program for the
non-MSIs. If a maintenance program is not in
effect, maintenance should be carried out
according to vendor specifications if they exist,
else no maintenance should be performed. See
Paglia et al. (1991) for further discussion.
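The rule for non-MSIs is simple enough to state as a short function. This sketch only restates the rule given in the text; the function name and the returned strings are hypothetical.

```python
def non_msi_policy(has_existing_program, has_vendor_specification):
    """Return the maintenance policy for an item that was not selected as an MSI."""
    if has_existing_program:
        return "continue existing program"
    if has_vendor_specification:
        return "follow vendor specification"
    return "no maintenance"


# Hypothetical item: no maintenance program in effect, but a vendor specification exists.
policy = non_msi_policy(has_existing_program=False, has_vendor_specification=True)
```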
Step 11: Implementation
A necessary basis for implementing the result of
the RCM analysis is that the organizational and
technical maintenance support functions are
available. A major issue is therefore to ensure
the availability of the maintenance support
functions. The maintenance actions are typically
grouped into maintenance packages, each
package describing what to do, and when to do
it.
As indicated in the outset of this paper, many
accidents are related to maintenance work.
Can failures be introduced during
maintenance work?
etc.
Step 12: In-service data collection and updating
As mentioned earlier, the reliability data we
have access to at the outset of the analysis may
be scarce, or even next to none. In our
opinion, one of the most significant advantages
of RCM is that we systematically analyze and
document the basis for our initial decisions, and,
hence, can better utilize operating experience to
adjust that decision as operating experience data
is collected. The full benefit of RCM is therefore
only achieved when operation and maintenance
experience is fed back into the analysis process.
The process of updating the analysis results is
also important because nothing
remains constant, as the
following arguments show (Smith 1993):
4 DISCUSSIONS AND CONCLUSIONS
General benefits:
Cross-discipline utilization of knowledge: To
fully utilize the benefits of the RCM concept,
one needs contributions from a wider scope of
disciplines than what is common practice. This
means that an RCM analysis requires
contribution from the three following discipline
categories working closely together:
1. System/reliability analyst
2. Maintenance/operation specialist
3. Designer/manufacturer
Not all of these categories need to take part in
the analysis full time. They
should, however, be deeply involved in the
process during pre- and post-analysis review
meetings, and quality review of final results.
The result of this is that knowledge is extracted
and commingled across traditional discipline
borders. It may, however, cost more at the outset
to engage all these personnel categories.
Traceability of decisions: Traditionally, PM
programs tend to be "cemented". After some
time one hardly knows on what basis the initial
decisions were made, and therefore does not want
to change those decisions. In the RCM concept
all decisions are taken based on a set of
analytical steps, all of which should be
documented in the analysis. When operating
experience accumulates, one may go back and
see on what basis the initial decisions were
taken, and adjust the tasks and intervals as
required based on the operating experience. This
is especially important for initial decisions based
on scarce data.
Recruitment of skilled personnel for
maintenance planning and execution: The RCM
way of planning and updating maintenance
requires more professional skills, and is
therefore a greater challenge for skilled
engineers. It also provides the engineers with a
broader and more attractive way of working with
maintenance than what sometimes is common
today.
Cost aspects: As indicated, RCM will require
more effort, both in skills and man-hours, when
first introduced in a company. It is,
however, documented by many companies and
organizations that the long-term benefits will far
outweigh the initial extra costs. One problem is
that the return on investment has to be viewed
in a long-term perspective, something that
management is not always willing to take a
chance on.
Benefits related to PM-program achievement:
Based on the case studies we have carried out,
and experience published by others, the general
achievements of RCM in relation to
traditional PM programs may be summarized as follows:
Problem areas in the analysis:
Identification of Maintenance Significant Items:
In some cases there may be very little to achieve
by limiting the analysis to only include the
MSIs. Smith (1993) argues that concentrating on
critical components (MSIs) is directly wrong
Lack of reliability data will always be a
problem. First of all there are problems with
getting access to operational data with sufficient
quality. Next, even if we have data, it is not
straightforward to obtain reliability data from
the operational data. Before we discuss some
problems with collecting and using operational
data, it should be emphasized that there will
never be a complete lack of reliability figures.
Even if no operational data is available, expert
judgment will be available. However, the
uncertainty in the reliability figures can be very
large.
Based on our various engagements in the
OREDA project and other data collection
projects on offshore installations, we have
experienced the following common difficulties
related to acquisition of failure data:
Conclusions:
RCM is not a simple and straightforward way of
optimizing maintenance, but ensures that one
does not jump to conclusions before all the right
questions are asked and answers given. RCM
can in many respects be compared with Quality
Assurance. By rephrasing the definition of QA,
RCM can be defined as:
All systematic actions
required to plan and verify
that the efforts spent on
preventive maintenance are
applicable and cost-effective.
Thus, RCM does not contain any basically new
method. Rather, RCM is a more structured way
of utilizing the best of several methods and
disciplines. Malik (1990)
postulates: ". . . there is more isolation between
practitioners of maintenance and the
researchers than in any other professional
activity." We see the RCM concept as a way to
reduce this isolation by closing the gap
between the traditionally more design-related
reliability methods and the practically oriented
operating and maintenance personnel.
REFERENCES
R. T. Anderson and L. Neri. Reliability-Centered
Maintenance. Management and Engineering
Methods. Elsevier Applied Science, London,
1990.
T. Aven. Reliability and Risk Analysis. Elsevier
Science Publishers, London, 1992.
K. M. Blache and A. B. Shrivastava. Defining
failure of manufacturing machinery &
equipment. Proceedings Annual Reliability
and Maintainability Symposium, pages 69-75, 1994.
B. S. Blanchard and W. J. Fabrycky. System
Engineering and Analysis. Prentice-Hall,
Inc., Englewood Cliffs, New Jersey 07632,
1981.
BS 5760-5. Reliability of systems, equipments and
components; Part 5: Guide to failure modes,
effects and criticality analysis (FMEA and
FMECA). British Standards Institution,
London, 1991.
N. Cross. Engineering Design Methods: Strategies
for Product Design. John Wiley & Sons,
Chichester, 1994.
R. R. Hoch. A Practical Application of Reliability
Centered Maintenance. The American
Society of Mechanical Engineers, 90-JPGC/Pwr-51, Joint ASME/IEEE Power
Gen. Conf., Boston, MA, 21-25 Oct., 1990.
A. Høyland and M. Rausand. Reliability Theory;
Models and Statistical Methods. John Wiley
& Sons, New York, 1994.