
Reliability-centered maintenance (RCM)

• Is used to control maintenance costs while improving reliability.

• Is a proven technique to reduce overall maintenance costs and analyze
the functions of systems. Developed by the aerospace industry, RCM
provides a logical method to identify failure modes; criticality of these
failures; proposed procedures to address the consequences of these
failures; and recommendations regarding the design of the system. This
technique does not assign a task to a problem, but rather looks at the
outcome of the failures.
• Is a proven technique used to detect possible failure modes. Its
intention is to identify problems and address them so that failure is
prevented. It systematically takes a piece of equipment and forces you to
analyze it in detail, one component at a time and one assembly at a time.
When properly performed and applied, it will slash your production
problems and maintenance costs.
• The RCM process considers the maintenance requirements of each
asset before asking whether it is necessary to reconsider the design. This
is simply because the maintenance engineer who is on duty today has to
maintain the equipment as it exists today, not what should be there or
what might be there at some stage in the future.
• Transforms the relationships between the undertakings which use it,
their existing physical assets and the people who operate and maintain
those assets. It also enables new assets to be put into effective service
with great speed, confidence and precision. 

• Is the optimum mix of reactive, time- or interval-based, condition-based,
and proactive maintenance practices. The basic application of each
strategy is shown in Fig. 1. These principal maintenance strategies, rather
than being applied independently, are integrated to take advantage of their
respective strengths in order to maximize facility and equipment reliability
while minimizing life-cycle costs.
Fig. 1. Components of an RCM Program.

RCM Principles

The primary RCM principles are:

• RCM is Function Oriented: RCM seeks to preserve system or
equipment function, not just operability for operability's sake.
Redundancy of function, through multiple pieces of equipment,
improves functional reliability but increases life-cycle cost in
terms of procurement and operating costs.
• RCM is System Focused: RCM is more concerned with maintaining
system function than with individual component function.
• RCM is Reliability Centered: RCM treats failure statistics in an
actuarial manner. The relationship between operating age and the
failures experienced is important. RCM is not overly concerned
with simple failure rate; it seeks to know the conditional probability
of failure at specific ages (the probability that failure will occur in
each given operating age bracket).
• RCM Acknowledges Design Limitations: the RCM objective is to
maintain the inherent reliability of the equipment design,
recognizing that changes in inherent reliability are the province of
design rather than of maintenance. Maintenance can, at best, only
achieve and maintain the level of reliability for equipment that was
provided for by design. However, RCM recognizes that
maintenance feedback can improve on the original design. In
addition, RCM recognizes that a difference often exists between
the perceived design life and the intrinsic or actual design life and
addresses this through the Age Exploration (AE) process.
• RCM is Driven by Safety, Security, and Economics: safety and
security must be ensured at any cost; thereafter, cost-effectiveness
becomes the criterion.
• RCM Defines Failure as "Any Unsatisfactory Condition": failure
may therefore be either a loss of function (operation ceases) or a
loss of acceptable quality (operation continues but impacts
quality).
• RCM Uses a Logic Tree to Screen Maintenance Tasks: this
provides a consistent approach to the maintenance of all kinds of
equipment.
• RCM Tasks Must Be Applicable: the tasks must address the failure
mode and consider the failure mode characteristics.
• RCM Tasks Must Be Effective: the tasks must reduce the
probability of failure and be cost-effective.
• RCM Acknowledges Three Types of Maintenance Tasks: these
tasks are time-directed (PM), condition-directed (CM), and failure
finding (one of several aspects of Proactive Maintenance).
Time-directed tasks are scheduled when appropriate. Condition-directed
tasks are performed when conditions indicate they are needed.
Failure-finding tasks detect hidden functions that have failed
without giving evidence of pending failure. Additionally, performing
no maintenance, Run-to-Failure, is a conscious decision and is
acceptable for some equipment.
• RCM is a Living System: RCM gathers data from the results
achieved and feeds this data back to improve design and future
maintenance. This feedback is an important part of the Proactive
Maintenance element of the RCM program.

Types of RCM
There are several ways to conduct and implement an RCM program. The
program can be based on rigorous Failure Modes and Effects Analysis
(FMEA), complete with mathematically-calculated probabilities of failure
based on design or historical data, intuition or common-sense, and/or
experimental data and modeling. These approaches may be called
Classical, Rigorous, Intuitive, Streamlined, or Abbreviated. Other terms
sometimes used for these same approaches include Concise, Preventive
Maintenance (PM) Optimization, Reliability Based, and Reliability
Enhanced. All are applicable. The decision of what technique to use
should be left to the end user and be based on:
• Consequences of failure
• Probability of failure
• Historical data available
• Risk tolerance
• Resource availability
RCM Analysis
The RCM analysis should carefully consider and answer the following questions:
• What does the system or equipment do; what are the functions?
• What functional failures are likely to occur?
• What are the likely consequences of these functional failures?
• What can be done to reduce the probability of the failure(s), identify the onset of
failure(s), or reduce the consequences of the failure(s)?

Answers to these four questions can be used with the decision logic tree depicted in Fig. 3,
Reliability-Centered Maintenance (RCM) Decision Logic Tree, to determine the maintenance
approach for the equipment item or system.

Fig. 3. Reliability Centered Maintenance (RCM) Logic Tree

Note that the analysis process as depicted in Fig. 3 has only four possible outcomes:
• Perform Condition-Based actions (CM).
• Perform Interval (Time- or Cycle-) Based actions (PM).
• Determine that redesign will solve the problem, accept the failure risk, or, where no
maintenance action will reduce the probability of failure, install redundancy.
• Perform no action and choose to repair following failure (Run-to-Failure).
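
The four outcomes can be illustrated with a small decision function. This is only a sketch: Fig. 3 is not reproduced here, so the order of the questions and the names `FailureAssessment`, `Approach`, and `select_approach` are assumptions made for illustration, not the logic tree itself.

```python
from dataclasses import dataclass
from enum import Enum


class Approach(Enum):
    CONDITION_BASED = "Condition-Based actions (CM)"
    INTERVAL_BASED = "Interval-Based actions (PM)"
    REDESIGN_OR_REDUNDANCY = "Redesign, accept the risk, or install redundancy"
    RUN_TO_FAILURE = "Run-to-Failure"


@dataclass
class FailureAssessment:
    # Hypothetical yes/no answers gathered during the analysis.
    consequences_matter: bool  # safety, security, mission, or economic impact
    effective_cm_task: bool    # a condition-monitoring task can detect onset
    effective_pm_task: bool    # an interval-based task reduces failure probability


def select_approach(a: FailureAssessment) -> Approach:
    """Assumed question order; the actual tree in Fig. 3 may differ."""
    if not a.consequences_matter:
        return Approach.RUN_TO_FAILURE
    if a.effective_cm_task:
        return Approach.CONDITION_BASED
    if a.effective_pm_task:
        return Approach.INTERVAL_BASED
    return Approach.REDESIGN_OR_REDUNDANCY
```

Whatever the real question order, the useful property is that every path ends in exactly one of the four outcomes, so no equipment item is left without a stated maintenance approach.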
D. Failure

Failure is the cessation of proper function or performance. RCM examines failure at several
levels: the system level, sub-system level, component level, and sometimes even the parts level.
The goal of an effective maintenance organization is to provide the required system performance
at the lowest cost. This means that the maintenance approach must be based on a clear
understanding of failure at each of the system levels. System components can be degraded or
even failed and still not cause a system failure. A simple example is the failed headlamp on an
automobile. That failed component has little effect on the overall system performance.
Conversely, several degraded components may combine to cause the system to have failed, even
though no individual component has itself failed.

System and System Boundary

A system is any user-defined group of components, equipment, or facilities that support an
operational requirement. These operational requirements are defined by mission criticality or by
environmental, health, safety, regulatory, quality, or other agency/business defined requirements.
Most systems can be divided into unique sub-systems along user-defined boundaries. The
boundaries are selected as a method of dividing a system into subsystems when its complexity
makes an analysis by other means difficult:
• A system boundary or interface definition contains a description of the inputs and outputs
that cross each boundary.
• The facility envelope is the physical barrier created by a building, enclosure, or other
structure; e.g., a cooling tower or tank.
• Standardize on selecting boundaries. For example, a pump could include the first
upstream/downstream isolation valve, the coupling, and associated gauges. The motor
would include the electrical circuit from the load side of the motor control center but not
the coupling.

The intent is to develop a series of modular FMEAs, assemble them as if they were Lego®
blocks, and select the maintenance actions based on the consequences of risk determined by the
criticality and probability factors defined in Tables 1 and 2, respectively.

Function and Functional Failure

The function defines the performance expectation and can have many elements. Elements include
physical properties, operation performance including output tolerances, and time requirements
such as continuous operation or limited required availability.

Functional failures are descriptions of the various ways in which a system or subsystem can fail
to meet the functional requirements designed into the equipment. A system or subsystem that is
operating in a degraded state but does not impact any of the requirements addressed in System
and System Boundary has not experienced a functional failure.

It is important to determine all of the functions of an item that are significant in a given
operational context. By clearly defining the functions' non-performance, the functional failure
becomes clearly defined. For example, it is not enough to define the function of a pump as "to move
water." The function of the pump must be specific and defined in such terms as flow rate, discharge
pressure, vibration levels, B10 (L10) life, efficiency, etc. (Reliability HotWire)

Failure Modes

Failure modes are equipment- and component-specific failures that result in the functional failure
of the system or subsystem. For example, a machinery train composed of a motor and pump can
fail catastrophically due to the complete failure of the windings, bearings, shaft, impeller,
controller, or seals. In addition, a functional failure also occurs if the pump performance degrades
such that there is insufficient discharge pressure or flow to meet operating requirements. These
operational requirements should be considered when developing maintenance tasks.
Dominant failure modes are those failure modes responsible for a significant proportion of all the
failures of the item. They are the most common modes of failure.

Not all failure modes or causes warrant preventive or condition-based maintenance because
the likelihood of their occurring is remote or their effect is inconsequential.

Reliability

Reliability is the probability that an item will survive a given operating period, under specified
operating conditions, without failure. It is usually expressed as B10 (L10) life and/or Mean Time to
Failure (MTTF) or Mean Time Between Failures (MTBF). The conditional probability of failure
measures the probability that an item entering a given age interval will fail during that interval. If
the conditional probability of failure increases with age, the item shows wear-out characteristics.
The conditional probability of failure reflects the overall adverse effect of age on reliability. It is
not a measure of the change in an individual equipment item.
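
As a worked illustration, the conditional probability of failure can be estimated from counts per age bracket: the number of items that failed in a bracket divided by the number that entered it. The fleet data and the function name `conditional_failure_probability` below are hypothetical, chosen only to show the arithmetic.

```python
def conditional_failure_probability(entering, failed):
    """Per age bracket: items that failed in the bracket / items that entered it."""
    if len(entering) != len(failed):
        raise ValueError("entering and failed must have the same length")
    return [f / n if n else 0.0 for n, f in zip(entering, failed)]


# Hypothetical fleet of 100 items followed through four age brackets.
failed = [5, 5, 9, 27]
entering = [100, 95, 90, 81]  # survivors at the start of each bracket

p_cond = conditional_failure_probability(entering, failed)

# A sequence that rises with age suggests wear-out characteristics.
shows_wear_out = all(later > earlier for earlier, later in zip(p_cond, p_cond[1:]))
```

Note that the denominator shrinks from bracket to bracket; dividing by the original fleet size instead would give the unconditional failure fraction, which understates the risk faced by the survivors.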

Failure rate or frequency plays a relatively minor role in maintenance programs because it is too
simple a measure. Failure frequency is useful in making cost decisions and determining
maintenance intervals, but it tells nothing about which maintenance tasks are appropriate or
about the consequences of failure. A maintenance solution should be evaluated in terms of the
safety, security, or economic consequences it is intended to prevent. A maintenance task must be
applicable (i.e., prevent failures or ameliorate failure consequences) in order to be effective.

Failure Characteristics

Conditional probability of failure (Pcond) curves fall into six basic types, as graphed (Pcond versus
Time) in Figs. 4 and 5, Random Conditional Probability of Failure Curves and Age Related
Conditional Probability of Failure Curves. The percentage of equipment conforming to each of the
six wear patterns as determined in three separate studies is also shown in both figures.

The failure characteristics shown in Figs. 4 and 5, Random Conditional Probability of Failure
Curves, were first noted in the previously cited book, Reliability-Centered Maintenance. Follow-
on studies in Sweden in 1973, and by the U.S. Navy in 1983, produced similar results.
In these three studies, random failures accounted for 77-92% of the total failures, and
age-related failure characteristics accounted for the remaining 8-23%.

Fig. 4. Random conditional probability of failure curves


Fig. 5. Age related conditional probability of failure curves

The basic difference between the failure patterns of complex and simple items has important
implications for maintenance. Single-piece and simple items frequently demonstrate a direct
relationship between reliability and age. This is particularly true where factors such as metal
fatigue or mechanical wear are present or where the items are designed as consumables (short
or predictable life spans). In these cases an age limit based on operating time or stress cycles
may be effective in improving the overall reliability of the complex item of which they are a part.

Complex items frequently demonstrate some infant mortality, after which their failure probability
increases gradually or remains constant. A marked wear-out age is not common. In many cases
scheduled overhaul increases the overall failure rate by introducing a high infant mortality rate
into an otherwise stable system.
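
One common way to reason about these patterns numerically is the Weibull hazard rate, a continuous-time counterpart of the conditional probability of failure. The source does not mention the Weibull model; it is brought in here only as an illustration, and the parameter values are assumptions. A shape parameter below 1 gives a hazard that falls with age (infant mortality), a value of 1 gives a constant (random) hazard, and a value above 1 gives a rising (wear-out) hazard.

```python
def weibull_hazard(t, beta, eta):
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    if t <= 0 or beta <= 0 or eta <= 0:
        raise ValueError("t, beta and eta must be positive")
    return (beta / eta) * (t / eta) ** (beta - 1)


ages = [100.0, 500.0, 1000.0]  # operating hours (hypothetical)
eta = 1000.0                   # characteristic life (hypothetical)

infant_mortality = [weibull_hazard(t, 0.5, eta) for t in ages]  # falls with age
random_failure = [weibull_hazard(t, 1.0, eta) for t in ages]    # constant
wear_out = [weibull_hazard(t, 3.0, eta) for t in ages]          # rises with age
```

Under this model, a scheduled overhaul only helps items in the wear-out regime; for the other two it resets the item to an age where the hazard is the same or higher.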

Preventing Failure

Every equipment item has a characteristic that can be called resistance to or margin to failure.
Using equipment subjects it to stress that can result in failure when the stress exceeds the
resistance to failure. Fig. 6, Preventing Failure, depicts this concept graphically. The figure
shows that failures may be prevented or item life extended by:
• Decreasing the amount of stress applied to the item. The life of the item is extended for
the period f0-f1 by the stress reduction shown.
• Increasing or restoring the item's resistance to failure. The life of the item is extended
for the period f1-f2 by the resistance increase shown.
• Decreasing the rate of degradation of the item's resistance to or margin to failure. The
life of the item is extended for the period f2-f3 by the decreased rate of resistance
degradation shown.
Fig. 6. Preventing failure
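
The three levers can be compared with a deliberately simple model. The sketch below assumes a constant stress and a resistance that degrades linearly with age; neither assumption comes from Fig. 6, which is not reproduced here, and the numbers and the name `time_to_failure` are made up for illustration.

```python
def time_to_failure(resistance0, stress, degradation_rate):
    """Age at which a linearly degrading resistance drops to a constant stress."""
    if degradation_rate <= 0:
        raise ValueError("degradation_rate must be positive")
    return max(0.0, (resistance0 - stress) / degradation_rate)


# Baseline: resistance 100, stress 60, resistance lost at 0.02 units per hour.
baseline = time_to_failure(100.0, 60.0, 0.02)

less_stress = time_to_failure(100.0, 50.0, 0.02)       # decrease the applied stress
more_resistance = time_to_failure(120.0, 60.0, 0.02)   # increase or restore resistance
slower_decay = time_to_failure(100.0, 60.0, 0.01)      # decrease the degradation rate
```

In this toy model each lever lengthens life relative to the baseline; which lever is cheapest in practice depends on the item and is outside what the model can say.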

Stress is dependent on use and may be highly variable. It may increase, decrease, or remain
constant with use or time. A review of the failures of a large number of nominally identical simple
items would disclose that the majority had about the same age at failure, subject to statistical
variation, and that these failures occurred for the same reason. If one is considering preventive
maintenance for some simple item and can find a way to measure its resistance to failure, he or
she can use that information to help select a preventive task.

Adding excess material or changing the type of material that wears away or is consumed can
increase resistance to failure or reduce the rate of degradation. Excess strength may be provided to
compensate for loss from corrosion or fatigue. The most common method of restoring resistance
is by replacing the item. The resistance to failure of a simple item decreases with use or time
(age), but a complex unit consists of hundreds of interacting simple items (parts) and has a
considerable number of failure modes. In the complex case, the mechanisms of failure are the
same, but they are operating on many simple component parts simultaneously and interactively
so that failures no longer occur for the same reason at the same age. For these complex units, it
is unlikely that one can design a maintenance task unless there are a few dominant or critical
failure modes.

Failure Modes and Effects Analysis (FMEA)

FMEA is applied to each system, sub-system, and component identified in the boundary
definition. For every function identified, there can be multiple failure modes. The FMEA
addresses each system function (and, since failure is the loss of function, all possible failures)
and the dominant failure modes associated with each failure, and then examines the
consequences of the failure. What effect did the failure have on the mission or operation, the
system, and on the machine?

Even though there are multiple failure modes, often the effects of the failure are the same or
very similar in nature. That is, from a system function perspective, the outcome of any
component failure may result in the system function being degraded.

Likewise, similar systems and machines will often have the same failure modes. However, the
system use will determine the failure consequences. For example, the failure modes of a ball
bearing will be the same regardless of the machine. However, the dominant failure mode will
often change from machine to machine, the cause of the failure may change, and the effects of
the failure will differ.

Fig. 7, FMEA Worksheet, provides an example of an FMEA worksheet.

Fig. 7. FMEA Worksheet
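
Since Fig. 7 is not reproduced here, the following sketch shows one way a worksheet row could be held as a record. The field names are assumptions drawn from the concepts in this section (function, functional failure, failure mode, and effects on the mission, system, and machine), not from the figure, and the pump rows are made-up examples.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FmeaRow:
    # Hypothetical field names; Fig. 7 may use different columns.
    function: str
    functional_failure: str
    failure_mode: str
    effect_on_mission: str
    effect_on_system: str
    effect_on_machine: str


# Illustrative rows for a motor-driven pump (made-up values).
rows = [
    FmeaRow("Deliver rated flow", "Insufficient flow", "Impeller wear",
            "Process delayed", "Reduced cooling capacity", "Low discharge pressure"),
    FmeaRow("Deliver rated flow", "Insufficient flow", "Seal leak",
            "Process delayed", "Reduced cooling capacity", "Loss of fluid"),
    FmeaRow("Deliver rated flow", "No flow", "Motor winding failure",
            "Process stopped", "No cooling", "Motor will not run"),
]

# One function can have several functional failures, each with several failure modes.
modes_by_failure = defaultdict(list)
for row in rows:
    modes_by_failure[row.functional_failure].append(row.failure_mode)
```

Grouping by functional failure makes the point in the text visible: different failure modes (impeller wear, seal leak) can lead to the same system-level effect.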

E. Criticality and Probability of Occurrence

Criticality assessment provides the means for quantifying how important a system function is
relative to the identified Mission. Table 1, Criticality/Severity Categories, provides a method for
ranking system criticality. This system, adapted from the automotive industry, provides 10
categories of Criticality/Severity. It is not the only method available. The categories can be
expanded or contracted to produce a site-specific listing.

Table 1. Criticality/Severity Categories


Ranking | Effect | Comment
1 | None | No reason to expect failure to have any effect on safety, health, environment, or mission.
2 | Very Low | Minor disruption to facility function. Repair to failure can be accomplished during trouble call.
3 | Low | Minor disruption to facility function. Repair to failure may be longer than trouble call but does not delay mission.
4 | Low to Moderate | Moderate disruption to facility function. Some portion of mission may need to be reworked or process delayed.
5 | Moderate | Moderate disruption to facility function. 100% of mission may need to be reworked or process delayed.
6 | Moderate to High | Moderate disruption to facility function. Some portion of mission is lost. Moderate delay in restoring function.
7 | High | High disruption to facility function. Some portion of mission is lost. Significant delay in restoring function.
8 | Very High | High disruption to facility function. All of mission is lost. Significant delay in restoring function.
9 | Hazard | Potential safety, health, or environmental issue. Failure will occur with warning.
10 | Hazard | Potential safety, health, or environmental issue. Failure will occur without warning.
Reliability, Maintainability, and Supportability Guidebook, Third Edition, Society of Automotive
Engineers, Inc., Warrendale, PA, 1995.

The Probability of Occurrence (of Failure) is also based on work in the automotive industry.
Table 2, Probability of Occurrence Categories, provides one possible method of quantifying the
probability of failure. If there is historical data available, it will provide a powerful tool in
establishing the ranking. If the historical data is not available, a ranking may be estimated based
on experience with similar systems in the facilities area. The statistical ("Effect") column in
Table 2 can be based on operating hours, day, cycles, or other unit that provides a consistent
measurement approach. The statistical bases ("Comment") may be adjusted to account for local
conditions. For example, one organization changed the statistical approach for ranking 1 through
5 to better reflect the number of cycles of the system being analyzed.

Table 2. Probability of Occurrence Categories


Ranking | Effect | Comment
1 | 1/10,000 | Remote probability of occurrence; unreasonable to expect failure to occur.
2 | 1/5,000 | Low failure rate. Similar to past design that has, in the past, had low failure rates for given volume/loads.
3 | 1/2,000 | Low failure rate. Similar to past design that has, in the past, had low failure rates for given volume/loads.
4 | 1/1,000 | Occasional failure rate. Similar to past design that has, in the past, had similar failure rates for given volume/loads.
5 | 1/500 | Moderate failure rate. Similar to past design that has, in the past, had moderate failure rates for given volume/loads.
6 | 1/200 | Moderate to high failure rate. Similar to past design that has, in the past, had moderate failure rates for given volume/loads.
7 | 1/100 | High failure rate. Similar to past design that has, in the past, had high failure rates that have caused problems.
8 | 1/50 | High failure rate. Similar to past design that has, in the past, had high failure rates that have caused problems.
9 | 1/20 | Very High failure rate. Almost certain to cause problems.
10 | 1/10+ | Very High failure rate. Almost certain to cause problems.
Reliability, Maintainability, and Supportability Guidebook, Third Edition, Society of Automotive
Engineers, Inc., Warrendale, PA, 1995.
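
Where historical data is available, the Table 2 ranking can be looked up from an observed failure frequency. The sketch below is illustrative only: the frequencies come from Table 2, but the rule for rates that fall between two entries (round down to the lower ranking) and the product used in `risk_score` to combine the two rankings are assumptions, since the text does not specify either.

```python
from fractions import Fraction

# (ranking, failure frequency) pairs taken from Table 2.
OCCURRENCE_TABLE = [
    (1, Fraction(1, 10000)), (2, Fraction(1, 5000)), (3, Fraction(1, 2000)),
    (4, Fraction(1, 1000)), (5, Fraction(1, 500)), (6, Fraction(1, 200)),
    (7, Fraction(1, 100)), (8, Fraction(1, 50)), (9, Fraction(1, 20)),
    (10, Fraction(1, 10)),
]


def occurrence_ranking(failures: int, exposure: int) -> int:
    """Highest Table 2 ranking whose frequency does not exceed the observed rate.

    Assumption: rates between two table entries take the lower ranking, and
    anything rarer than 1/10,000 is ranked 1. `exposure` is a count of hours,
    days, cycles, or another consistent unit.
    """
    if exposure <= 0 or failures < 0:
        raise ValueError("exposure must be positive and failures non-negative")
    rate = Fraction(failures, exposure)
    ranking = 1
    for rank, frequency in OCCURRENCE_TABLE:
        if rate >= frequency:
            ranking = rank
    return ranking


def risk_score(criticality: int, occurrence: int) -> int:
    """Assumed combination rule: simple product of the two 1-10 rankings."""
    if not (1 <= criticality <= 10 and 1 <= occurrence <= 10):
        raise ValueError("rankings must be between 1 and 10")
    return criticality * occurrence
```

For example, 3 failures in 1,000 cycles lies between 1/500 and 1/200, so under the round-down assumption it is ranked 5; paired with a Table 1 criticality of 7 the assumed product is 35. A site could just as reasonably use a lookup matrix instead of a product.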

F. RCM Implementation
There is no one set path for successfully implementing RCM because RCM is more than just
performing a Failure Modes and Effects Analysis (FMEA), adopting condition monitoring
techniques, and/or optimizing a maintenance and overhaul program through the application of an
Age Exploration (AE) process. A successful RCM implementation process first must recognize
what and where the source of return on investment (ROI) resides. The source(s) of ROI may be
tangible and/or intangible. For the former, a quantifiable business case may be developed based
on financial benefit (savings, cost avoidance, reduced Work in Progress (WIP) and/or reduced
liability) to the organization while for the latter, the benefit may be unquantifiable (employee
skills, morale, customer relations, etc.). In either case, a baseline and goal must be established
through some mechanism such as internal or external benchmarking, which results in a defined
gap between the "As-Is" and the "To-Be" state and the ROI identified for closing all or a portion
of the gap.

Remember, caveat emptor. That is, RCM is not for everyone, and very few organizations will
benefit from implementing all elements of a classical RCM program. RCM, like all tools and processes,
has an element of diminishing return. Not all the elements of RCM that are applicable to a
nuclear power plant, the aircraft industry, and/or a 24/7 continuous process plant in a sold-out
condition will be applicable to a batch process operation or a non-production facility. However,
there are a few truths everyone should follow without needing to pilot them or perform an
FMEA first. They are:
1. Key performance indicators (aka metrics/performance indicators) are essential for
establishing the baseline, goal, and the gap. Progress cannot be measured or sustained
without KPIs. (See Section G-Key Performance Indicator (KPIs) Selection)
2. Thermography works for electrical distribution, boilers, couplings, roofing systems and
building façades.
3. If your specifications for alignment, imbalance, motor circuit phase impedance, oil
condition and cleanliness, and vibration are not quantified, the product you receive will
have latent defects 80% of the time.
4. If you do not commission and check the sequence of operation of your equipment and
buildings to a predetermined quantifiable specification, you will not get what you expect.
5. Pareto analysis is the best tool for determining where to start your RCM process. Look
for the bottlenecks, the recurring failures, and follow the money.
6. RCM implementation in a team environment works better.
7. Failure modes for identical equipment are the same. It is only the consequence and
probability of failure that changes.
8. The impact of poor water chemistry is underestimated in terms of energy consumption
and life-cycle cost.
9. The majority of failures are random. Very few machines understand how a calendar
works. Age Exploration can reveal hidden assets.
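
The Pareto analysis recommended in item 5 can be sketched in a few lines: rank assets by failure cost and keep the top ones until a chosen share of the total is covered. The asset names, cost figures, 80% threshold, and function name `pareto_vital_few` below are all hypothetical.

```python
def pareto_vital_few(cost_by_asset, threshold=0.8):
    """Return assets, largest cost first, until their cumulative share reaches threshold."""
    total = sum(cost_by_asset.values())
    if total <= 0:
        return []
    selected, running = [], 0.0
    ranked = sorted(cost_by_asset.items(), key=lambda item: item[1], reverse=True)
    for asset, cost in ranked:
        selected.append(asset)
        running += cost
        if running / total >= threshold:
            break
    return selected


# Hypothetical annual failure-related cost per asset.
costs = {
    "Chiller 2": 42000,
    "AHU 7": 31000,
    "Boiler 1": 12000,
    "Pump P-4": 9000,
    "Cooling tower": 6000,
}

start_here = pareto_vital_few(costs)
```

With these made-up figures, three of the five assets account for 85% of the cost, which is where an RCM effort following this advice would begin.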
