Professional Documents
Culture Documents
Reliability-Centered Maintenance
11.1 Introduction
In the 1960s, the airline industry in the USA depended heavily on time-based
preventive maintenance due to the belief that every component of a complex system
has an age where a complete overhaul is needed to ensure safety and reliability. Due
to that there was a hypothesis that the airline is performing excessive maintenance
that is not economically viable and may be not needed. This hypothesis led to the
launching of a major study to validate the failure characteristics of aircraft com-
ponents. The study resulted in what became the Handbook for the Maintenance
Evaluation and Program Development for the Boeing 747, more commonly known
as MSG-1 (Maintenance Steering Group 1). MSG-1 was subsequently improved
and became MSG-2 and was used for the certification of DC 10 and L 1011. In
1979, the Air Transport Association (ATA) reviewed MSG-2 to incorporate further
developments in preventive maintenance; this resulted in MSG-3, the Airline/
Manufacturers Maintenance Program Planning Document applied subsequently to
Boeing 757.
Furthermore, United States Federal Aviation Administration (FAA) commis-
sioned United Airlines to undertake a study of the effectiveness of time-based
overhauls of complex components in the equipment systems of civilian jet aircraft.
There was a belief that these time-based overhauls did little to reduce the frequency
of failure and were uneconomical. This study was conducted at a time when
wide-bodied aircraft were being designed, and the complexity of the equipment
systems and their components was increased dramatically over that of prior designs.
The key conclusion was that time-based overhauls of complex equipment did not
significantly affect positively or negatively the frequency of failure. In some
equipment, the frequency of failure was actually higher immediately following the
overhaul. This study showed that the so-called bathtub conditional probability of
failure versus age curve was only one of six major failure patterns. The most
common failure pattern in complex equipment is one which shows a high “infant
mortality,” that is, the highest conditional probability of failure occurs in the first
few periods of the equipment’s age and then diminishes to a constant rate of failure,
thereafter, as described in Fig. 11.1.
Scheduled, time-based overhauls “reset” the age back to zero, thereby increasing
the probability of failure. For most of the life of complex equipment, failure is
related to random events, such as shock loading, electrical surges, poor lubrication
practices, and improper operation. These random events cause an accelerated
deterioration of the equipment’s performance, which often can be monitored using
condition-based predictive maintenance techniques. This study leads to the birth of
reliability-centered maintenance (RCM).
Reliability-centered maintenance is a logical methodology derived from this
research in the aviation sector and uses the failure mode, effect, and criticality
analysis (FMECA) tool. RCM is a process used to identify the most applicable and
effective maintenance action(s) to ensure the highest practical standard of operating
performance of a system or a component.
The purpose of this chapter is to present RCM as a viable approach for optimizing
maintenance of systems by having an optimal mix of run to failure, time-based,
condition-based, and design modification maintenance tasks. Section 11.2 outlines
RCM principles and benefits followed the steps of RCM in Sect. 11.3. Section 11.4
outlines failure and its nature followed by RCM steps. Section 11.5 contains RCM
implementation, and Sect. 11.6 provides a summary of the chapter.
The success of RCM comes from the fact that it has clear goals, sound principles,
and effective strategies. The main goal was to determine the optimal maintenance
program using various maintenance strategies. RCM has principles that distinguish
it from all other methodologies for developing maintenance programs. Some
authors refer to these principles as RCM philosophy. In this section, RCM goals,
principles, and benefits are presented.
11.2 RCM Goals, Principles, and Benefits 247
RCM has the following key principles that distinguish it from other methodologies
for maintenance:
• Preservation system of equipment function: The focus here is to keep the system
performing its function not to keep it operating as though it is new. This tells us
that as far as the system performing its function, there is no need for excessive
maintenance which may cause failure in some cases. This principle has led to a
reduction in time-based preventive maintenance in the airline industry that
reduced cost and improved reliability of systems. Redundancy of function
through multiple equipment improves functional reliability, but increases life
cycle cost in terms of procurement and operating costs.
• Focus on systems: RCM focuses on systems than component, since functions
are usually driven by systems.
• RCM is reliability-centered: It treats failure statistics as it relates to age. It seeks
to know the conditional probability of failure at specific ages (the probability
that failure will occur in each given operating age bracket).
• Safety and economics are the key criteria: Safety must be ensured first at any
cost; followed by costs that result from the impact on production and operation.
248 11 Reliability-Centered Maintenance
The primary driving force behind the invention of RCM is the need to develop a
maintenance strategy that can adequately address the systems availability and safety
without creating a totally impractical cost requirement. RCM benefits include the
following [5, 7]:
• Increase in Plant availability.
• Survey conducted by Electric Power Research Institute provided more evidence
that RCM has impact on cost.
• Plant trip reduction.
• Documented basis for preventive maintenance (PM).
• Efficient PM planning.
• Decrease in corrective maintenance.
• More accurate spare parts identification.
Smith [8] provided the following distribution in the type of maintenance strategies
to demonstrate the impact of RCM brought in power plants maintenance
(Table 11.1).
Table 11.1 Impact of RCM Type of maintenance tasks Years: task distribution in
on maintenance task %
distribution
1964 1969 1987
Time directed 58 31 9
Condition directed 40 37 40
Run to failure 2 32 51
11.3 RCM Methodology 249
RCM has a systematic methodology that consists of seven steps. The seven steps
are as follows [3, 6, 8, 9]:
1. System selection and information collection;
2. System boundary definition;
3. System description and functional block diagram;
4. Functions and functional failure;
5. Failure mode and effective analysis (FMEA);
6. Logic decision tree analysis (LTA); and
7. Task selection.
In the next subsections, each step is explained and described.
RCM is best presented and implemented at a system level due to the fact that
functions are best captured at the system level. The component level lacks defining
significance of functions and functional failure, while plant-level analysis makes the
whole analysis intractable. The important question faced at this stage is which
system should be selected? The following are criteria that guide the selection:
1. Systems with a high number of corrective maintenance tasks during recent
years;
2. Systems with a high number of preventive maintenance tasks and or costs
during recent years;
3. A combination of scheme 1 and 2;
4. System with a high cost of maintenance;
5. Systems contributing significantly toward plant outages/shutdowns (full or
partial) during recent years;
6. Systems with high concern relating to safety; and
7. Systems with high concern relating to environment.
Past experience has shown that all of these criteria except scheme 6 and 7 yield
more or less the same results. An indicator of a good selection is that systems
chosen for RCM program results in a significant improvement over the current
situation.
The next task, after selecting a system, is collecting information related to the
selected system. A good practice is to start collecting key information and docu-
ment right at the onset of the process. The following are documents that may be
required in a typical RCM study:
• P&ID (piping and instrumentation) diagram;
• Systems schematic and/or block diagram;
250 11 Reliability-Centered Maintenance
This step is important and will set the stage for a successful RCM process. The step
has the following five elements:
• System description;
• Functional block diagram;
• In/out interfaces;
• System work breakdown structure; and
• Equipment history.
This step generally involves a form that documents baseline characterization of a
system which is eventually expected to be used in stipulating PM tasks.
11.3 RCM Methodology 251
Noisy
Transmitter Encoder Decoder Receiver
Channel
Fig. 11.4 Typical RCM system analysis form for interface definition
Fig. 11.5 Typical RCM system analysis form for system work breakdown structure
11.3 RCM Methodology 253
The fourth step identifies and lists all system functions. As a guide for identifying
functions, every out interface should be captured into a function statement and any
internal out interfaces between functional subsystems can be a source for a function.
An important point to note is that these statements are for defining system functions
and not the equipment. With the definition of system functions comes the functional
failures. In RCM, the focus is on functions and functional failures. The functional
failures are more than just a single statement of loss of function. The loss conditions
may be two or more (e.g., complete paralysis of the plant or major or minor
deprivation of functionality). This distinction is important and will lead to the
proper ranking of functions and functional failures. Figure 11.7 provides a form to
document functions and functional failures. The following are the examples for the
correct and wrong statement of functions:
• Provide 1500 psi safety relief valves (wrong statement because the statement is
about equipment);
• Provide for pressure relief above 1500 psi (correct; the focus is on function);
• Provide a 1500 gallon per minute (gpm) centrifugal pump on the discharge side
of header 26 (wrong); and
• Maintain a flow of 1500 gpm at the outlet of header 2
FAILURE DESCRIPTION
Failure modes and effects analysis (FMEA) is a basic tool used in reliability
engineering to assess the impact of failures. It is a systematic failure analysis
technique that is used to identify the failure modes, their causes, and consequently
their fallouts on the system function. FMEA analysis rates each potential failure
mode and effect based on the following three factors:
11.3 RCM Methodology 255
FMEA process is usually documented using a matrix similar to the one shown in
Fig. 11.8.
The purpose of the LTA is to prioritize the resources to be committed to each failure
mode. The prioritization is based on the impact of the failure mode. RCM processes
a simple and intuitive structure for this purpose. The structure utilizes two criteria,
i.e., safety and cost, that arise from plant full outage. The LTA has three questions
that enable a user, with minimal efforts, to place each failure mode into one of the
six categories. Each question is answered as yes or no only. Each category (also
known as a bin) forms natural segregation of items of respective importance.
The LTA scheme is shown below in Fig. 11.9.
Failure modes
B C
The six classification categories for the failures are A, B, C, D/A, D/B, or D/C.
For the priority scheme, A and B have higher priority over C when it comes to
allocation of scarce resources and A is given higher priority than B. In summary,
the priority for PM task goes in the following order:
1. A or D/A;
2. B or D/B; and
3. C or D/C.
In this step, RCM methodology allocates PM tasks and resources. This is the stage
where the maximum benefit from RCM may be obtained. The task selection process
requires that each selected task must be applicable and effective. Here, “applicable”
means that the task should be able to prevent failures, detect failures, or unearth
hidden failures, while “effective” is related to the cost effectiveness of the alternative
PM strategies. If no PM task is selected through the LTA, the only option is to run
equipment to failure. This activity requires contribution from the maintenance per-
sonal as their experience is invaluable in the correct selection of the PM task. After
selecting the tasks, the set of all run-to-failure (rtf) tasks are subjected to a final sanity
11.3 RCM Methodology 257
check. The purpose of the check is to review critically all component failures that are
treated as run-to-failure cases to see if this task is appropriate. If an rtf task fails any
of the following tests or creates a conflict, the PM or the current task is kept. The
following are the checks:
• Marginal effectiveness: It is not clear that the rtf costs are significantly less than
the current PM costs.
• High-cost failure: While there is no loss of critical function, the failure mode is
likely to cause extensive damage to the component that should be avoided.
• Secondary damage: Similar to the second item, except that there is a high
probability extensive damage in neighboring components.
• OEM conflict: The original manufacturer recommends a PM task that is not
supported by RCM. It is very sensitive if warranty conditions are involved.
• Internal conflict: Maintenance or operation feels strongly about the PM task that
is not supported by RCM.
• Regulatory conflict: Regulatory body established the PM, such as the
Environmental Protection Agency (EPA).
• Insurance conflict: similar to the above two.
organization’s need. The SAE standards are based on the following seven questions
that align very well with the seven steps presented in Sect. 11.3:
1. What is the item supposed to do and its associated performance standards?
2. In what ways can it fail to provide the required functions?
3. What are the events that cause each failure?
4. What happens when each failure occurs?
5. In what way does each failure matter?
6. What systematic task can be performed proactively to prevent, or to diminish to
a satisfactory degree, the consequences of the failure?
7. What must be done if a suitable preventive task cannot be found?
RCM implementation can be viewed as a process with four stages [1, 2, 4, 7]. Each
stage consists of a number of tasks that must be executed in order to ensure
successful implementation. The four stages are as follows:
• Stage 1: Planning and organizing for RCM.
• Stage 2: Analysis and design.
• Stage 3: Scheduling and execution.
• Stage 4: Assessment and improvement.
The following subsections present the process for RCM implementation.
This stage is important for the success of RCM implementation. Top maintenance
management must seek organization commitment for the RCM project and ensure
the needed resources are provided. In some situations, it may be better to conduct a
pilot RCM project. The following must be addressed carefully:
1. Organization: Formation of the RCM team and selection of a facilitator. The
facilitator must be knowledge about the RCM process. The team must include
experienced people in the areas that will be impacted by the application of the
RCM. The team must use a clear system for reporting progress and challenges.
2. Training: A training program on RCM should be conducted at different levels.
Management should be provided an awareness program about RCM and its
benefit. A well-structured training program on RCM methodology and imple-
mentation should be provided to the team.
3. Avail resources: Estimate what type of resources is needed and ensure their
availability.
11.5 RCM Implementation 259
4. Manage expectations: Establish a baseline for the current performance and the
expected benefits from RCM.
5. Schedule: Prepare a schedule for RCM project on a Gantt chart with the nec-
essary resources. A schedule will facilitate follow-up and monitoring.
6. Change management program: Develop a change management program to
mitigate resistance to the RCM project and ensure buying. The program may
have an awareness program, training, and reorganization.
This stage deals with the development of the optimized maintenance program using
RCM methodology. It includes the selection of the methodology and its application.
The team is expected to follow the RCM steps. For each maintenance task design
its maintenance procedure. The procedure should specify task requirements in terms
of manpower, spare parts, standard time, and schedule. The outcomes of this stage
are maintenance tasks for each functional failure.
In this stage, the team integrates RCM tasks in the maintenance schedule. The
impact of the new tasks is estimated in terms of benefits and cost. The change
management program is reviewed and enhanced if needed.
At this stage, the RCM project is implemented either in full or as a pilot case. The
impact of the RCM project must be measured and assessed using relevant realistic
performance measures. The measures may include availability, quality rate, reli-
ability, and cost. Comparisons with prior RCM performance and current target will
reveal RCM impact and aide in identifying needed improvement.
11.6 Summary
Exercises
References
1. Anderson RT, Neri L (1990) Reliability centered maintenance: management and engineering
methods. Elsevier Applied Sciences, London
2. August J (1999) Applied reliability-centered maintenance. Pen Well, Oklahoma
3. http://www.numeratis.com/rcm/talktomanufacturers
4. Moubray J (1997) Reliability centered maintenance. Butterworths-Heineman, Oxford
5. NASA Reliability Centered Maintenance Guide for Facilities and Collateral Equipment, 3–24
February 2000
6. Nowlan FS, Heap HF (1978) Reliability centered maintenance, United Airlines and Dolby
Press, sponsored and published by the Office of Assistant Secretary of Defense
7. Siddiqui AW, Ben Daya M (2009) Reliability centered maintenance. In: Ben-Day M,
Duffuaa S, Raouf A, Knezevic J, Ait-Kadi D (eds) Handbook of maintenance management and
engineering, edited by. Springer, London
8. Smith AM, Hinchcliffe GR RCM Gateway to world class maintenance. Elsevier
Butterworth-Heinemann, Burlington
9. The Collinson Corporation Introduction to reliability-centered. reprinted courtesy of: P.O. Box
495578 Garland, Texas 75049–5578