4. The loss of machinery/facility performance is kept to a minimum in the event
of a breakdown.
5. The maintenance cost is properly monitored to control overhead costs.
6. The life of equipment is prolonged while keeping the acceptable level of
performance to avoid redundant replacements.
Maintenance is also related to profitability through equipment output and
running cost: maintenance work raises the equipment performance level and
its availability in optimum working condition, but it adds to the running cost.
1. Primary Functions
1.1 Maintenance of Existing Plant Equipment
This activity represents the physical reason for the existence of the maintenance
group. Responsibility here is simply to make necessary repairs to production
machinery quickly and economically and to anticipate these repairs and employ
preventive maintenance where possible to prevent them. For this, a staff of skilled
craftsmen capable of performing the work must be trained, motivated, and
constantly retained to assure that adequate maintenance skills are available to
perform effective maintenance. In addition, adequate records for proper
distribution of expense must be kept.
The repairs to buildings and to the external property of any plant—roads, railroad
tracks, in-plant sewer systems, and water supply facilities—are among the duties
generally assigned to the maintenance engineering group. Repairs and minor
alterations to buildings (roofing, painting, glass replacement, servicing of
electrical or plumbing systems, and the like) are most logically the domain of
maintenance engineering personnel. Road repairs and the maintenance of tracks and
switches, fences, or outlying structures may also be so assigned. It is important
to isolate cost records for general clean-up from those for routine maintenance
and repair so that management has a true picture of the expense required to
maintain the plant and its equipment.
Traditionally, all equipment inspections and lubrication have been assigned to the
maintenance organization. While inspections that require special tools or partial
disassembly of equipment must be retained within the maintenance organization,
the use of trained operators or production personnel in this critical task will
provide more effective use of plant personnel. The same is true of lubrication.
Because of their proximity to the production systems, operators are ideally suited
for routine lubrication tasks.
In any plant generating its own electricity and providing its own process steam,
the powerhouse assumes the functions of a small public utilities company and
may justify an operating department of its own. However, this activity logically
falls within the realm of maintenance engineering. It can be administered either
as a separate function or as part of some other function, depending on
management’s requirements.
1.5 Alterations and New Installations
Three factors generally determine to what extent this area involves the
maintenance department: plant size, multi-plant company size, and company policy.
In a small plant of a one-plant company, this type of work may be handled by
outside contractors. But its administration and that of the maintenance force
should be under the same management. In a small plant within a multi-plant
company, the majority of new installations and major alterations may be
performed by a companywide central engineering department. In a large plant a
separate organization should handle the major portion of this work. The industry
must permit flexibility between corporate and plant engineering groups when
installations and repairs are done outside the maintenance engineering
department. However, having all new work handled by an organization entirely
separate from maintenance management and its policies would be counterproductive.
2. Secondary Functions
2.1 Storekeeping
This category usually includes two distinct subgroups: guards or watchmen; fire
control squads. Incorporation of these functions with maintenance engineering is
generally common practice. The inclusion of the fire-control group is important
since its members are almost always drawn from the craft elements.
This function and that of yard maintenance are usually combined as specific
assignments of the maintenance department.
2.4 Salvage
If a large part of plant activity concerns off-grade products, a special salvage unit
should be set up. But if salvage involves mechanical equipment, scrap lumber,
paper, containers, etc., it should be assigned to maintenance.
Types of Maintenance
Broadly, there are two main categories of maintenance, each of which can be
further subdivided into various maintenance types.
1. Preventive Maintenance
Preventive maintenance refers to fixing problems before they appear; that is,
such maintenance prevents the problem. Its guiding principle is to inspect
equipment at regular intervals, check the machine’s condition, and take any
necessary action.
Figure 2.1 Work-flow of Preventive Maintenance
accurately forecast the equipment’s ability to function and execute predictive
maintenance:
a. Equipment history
b. All records of downtime, defects, performance, etc.
c. Equipment condition with respect to working time.
After analysing the above data, together with experience of similar
equipment, maintenance dates are fixed.
In this technique, the real asset condition is tracked and the need for additional
maintenance is determined. Based on visual examination, predetermined tests,
performance data, etc., the equipment condition is examined during this type of
maintenance. Maintenance is planned when a failure or a hint of declining
performance is detected.
Risk-based maintenance (RBM) follows the philosophy of maintaining the assets
that carry the most risk in the event of failure. This philosophy determines the
most economical use of maintenance resources and minimizes the risk of failure.
Equipment carrying greater risk and more serious failure consequences is
monitored and maintained more frequently. This philosophy provides a systematic
approach to determining the most appropriate asset maintenance plans in the most
economical way.
2. Corrective maintenance
Normally there are three situations that call for corrective maintenance:
Planned corrective maintenance is corrective action that is not immediate but is
planned or scheduled according to the severity and nature of the observed defects.
The risks and costs involved are the major parameters that determine the
planned corrective maintenance schedule. This is also known as deferred
corrective maintenance.
Example: A pump is scheduled for inspection and repair every 200 hours, but it
breaks down after 150 hours of operation and needs an emergency repair. Cases
like this are examples of unplanned corrective maintenance.
No. | Preventive maintenance (PM) | Corrective maintenance (CM)
7 | Equipment lifespan and efficiency are increased by regular preventive maintenance. | Overall equipment lifecycle and efficiency are reduced.
8 | PM is better from the standpoint of employee and working-environment safety. | CM is hazardous from a safety standpoint.
9 | A smaller number of technicians can perform PM, decreasing the workload. | A greater number of employees or technicians is required to perform CM, which increases the workload.
Window Maintenance is a set of activities that are carried out when a machine
or equipment is not required for a definite period of time.
Design-out maintenance is a set of activities used to eliminate the cause
of failure, simplify maintenance tasks, or raise machine performance from the
maintenance point of view by redesigning those parts and facilities that are
vulnerable to frequent failure.
Maintenance Tools
1. P-F Curve
A P-F curve is a graph that shows the health of equipment over time to identify
the interval between potential failure and functional failure. The eventual failure
of any equipment is inevitable. Wear and tear naturally occurs with continual
usage. Just as a pair of shoes eventually wears out after 500 km of
walking, key plant equipment (e.g. pumps, motor bearings) will ultimately
reach its functional failure point.
The good news is that the functional failure point (i.e. the end of equipment life)
takes a long time to occur. The P-F curve helps to characterize the behaviour of
equipment over time. It’s used to assess the maximum usage that can be gained
from the equipment.
There are two main points of the P-F curve that need to be identified.
These two points define what is called the P-F interval: the time between when
the failure is initially noticed and when the equipment fails completely.
Figure 2.3 P-F interval
How to create a P-F curve
The basic parts of the P-F curve are given above. Actual data can be expected to
vary from case to case. For instance, the lifespan of a heavy-duty pump
might not be the same as that of a mechanical bandsaw. It follows that
expected failure points, and therefore P-F interval values, will differ between
types of equipment, so care must be taken when building P-F curves.
For example, assume that a pump that has been operating normally for eight
months suddenly produces more noise than usual. Unusual noise can be a
sign of failure. Once maintenance personnel have inspected the pump and
confirmed the symptom, we can say that the first noticed sign of failure
(i.e. the potential failure point) occurred at eight months.
Note that the actual start of deterioration might have happened before the
eight-month mark. So, we can assume that the actual start of failure happened
some time before point P. However, it is only the potential point of failure that we
can measure in time with certainty as it was the first event when noticeable
symptoms of failure were recorded.
For the same example, we can suppose that the pump continues to operate
for another six months until it totally breaks down—that is the functional failure
point at 14 months.
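The pump example can be worked through as a small calculation. The sketch below is illustrative only: the function names are hypothetical, and the check that inspections must be spaced more closely than the P-F interval is an assumption added here, not something stated in the text.

```python
def pf_interval(potential_failure_month: float, functional_failure_month: float) -> float:
    """Time between the first noticed symptom (P) and complete breakdown (F)."""
    if functional_failure_month < potential_failure_month:
        raise ValueError("functional failure cannot precede potential failure")
    return functional_failure_month - potential_failure_month


def is_inspection_frequent_enough(inspection_interval: float, interval: float) -> bool:
    """Assumption: at least one inspection must fall inside the P-F interval,
    so the gap between inspections has to be shorter than the interval."""
    return inspection_interval < interval


# Values taken from the pump example in the text.
P_MONTH = 8    # unusual noise first confirmed
F_MONTH = 14   # pump breaks down completely

interval = pf_interval(P_MONTH, F_MONTH)
print(f"P-F interval: {interval} months")          # P-F interval: 6 months
print(is_inspection_frequent_enough(3, interval))   # True
print(is_inspection_frequent_enough(12, interval))  # False
```

With a six-month interval, a quarterly inspection would see the symptom before breakdown, whereas a yearly inspection could miss it entirely.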
How to maximize the curve
Now that we’ve visualized how the P-F curve relates to real-life scenarios, we have
the chance to prepare for the inevitable functional failure. The idea is to balance
our resources to prolong the P-F interval economically.
Common practice is to maximize the use of the P-F curve with condition-
based maintenance (CBM). By applying CBM and proactively checking the
condition of the equipment, we are able to infer the rate of deterioration over time.
Maintenance personnel are then able to plan and assess whether it is cost-efficient
to mitigate the causes of failure given the projected P-F interval.
At the early signs of failure, it may be helpful to perform routine CBM tasks to
assess the health of the equipment. Continuing with our pump example, a P-F
curve coupled with CBM tasks to monitor pressure and flow rate conditions may
resemble Fig. 2.5. A maintenance team can attach condition
monitoring sensors to the equipment after the point of potential failure to assess
how much more use can be obtained from it.
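One way to infer the rate of deterioration from CBM readings is to fit a straight line to them and extend it to a failure threshold. The sketch below assumes a linear decline, which real equipment may not follow. The flow-rate readings, the 70% threshold, and the function names are all hypothetical values chosen for illustration.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired readings")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("readings must be taken at different times")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def projected_failure_time(times, values, failure_threshold):
    """Time at which the fitted trend reaches the threshold, or None if the
    readings show no downward trend."""
    slope, intercept = fit_line(times, values)
    if slope >= 0:
        return None
    return (failure_threshold - intercept) / slope


# Hypothetical monthly flow-rate readings (% of rated flow) taken after point P.
months = [8, 9, 10, 11]
flow_pct = [100, 95, 90, 85]

# Assumption: the pump is treated as functionally failed below 70% of rated flow.
print(projected_failure_time(months, flow_pct, failure_threshold=70))
```

For these made-up readings the trend loses five percentage points per month, so the projection lands at month 14, which happens to match the functional failure point used in the pump example.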
2. FMEA (Failure Mode and Effects Analysis)
FMEA stands for Failure Mode and Effects Analysis, and the name tells a lot about
the process. FMEA is a structured method that aims to identify potential failures
and their corresponding outcomes. The FMEA process is considered a bottom-up
approach; the analysis starts with specific data that builds up to form a more
general plan of action. In this case, each component of the observed system is
thoroughly examined for likely breakdown causes. For every identified breakdown
scenario, corresponding effects should then be pointed out. This allows the
organization to have an extensive map of failure modes and effects, organized
according to their level of impact on the business. Developing an FMEA process
equips organizations with a strategy to identify potential breakdowns before they
even occur. This process of risk assessment can streamline the efforts of
maintenance teams towards efficiently increasing reliability.
FMEA can be broadly classified into two categories: Design FMEA (DFMEA)
and Process FMEA (PFMEA). Each of these would concentrate on different areas,
potentially coming up with more specialized findings.
Design FMEA relates to the way that a system, product, or service was
conceptualized. As the name suggests, DFMEA focuses on the design aspect of a
developmental process. It is primarily beneficial in testing out new product ideas
before introducing them to real-life scenarios.
The nature of PFMEA differs slightly as it looks into current processes and
procedures that an organization is already performing. PFMEA would typically
address potential failures that can have significant impacts on usual operations.
Some examples of business impacts are process stalls, human errors, and
environmental and safety hazards. Because of its nature, PFMEA can be
performed more effectively when historical data is available.
FMEA Work Principle
Failure Modes
Failure modes describe the specific ways in which failures can occur. Different
forms of FMEA will focus on different specific areas. For example, the object that
is assumed to fail can be a component of a piece of equipment, the equipment
itself, a subsystem, a system, or even a certain process.
A set of failure modes can tell a lot about the functional impact of a failure. In
such cases, the levels of service that failure events allow are categorized into their
respective groups. For example, to establish the level of functionality after a
failure, a set of failure modes can resemble the following categories:
When performing FMEA for more specific applications like physical equipment,
failure modes tend to become more specific. For example, take the case of
specialized equipment, such as a centrifugal pump. Failure modes for a pump can
include hydraulic failure, mechanical failure, corrosion, or human intervention.
As one can imagine, the list goes on as more types of equipment are analysed.
Effects in FMEA
1. Local Effect
Local effects are the consequences of failure to the item being observed or items
immediately adjacent to it. When looking at physical assets, for example, local
effects might consider the consequences of having a faulty component such as a
pump used in water circulation.
2. Next Higher-Level Effect
As the name suggests, the next higher-level effects would consider the impacts of
failure on the larger subsystem that it affects. Following our equipment example,
the next higher-level effect of having a broken pump could describe consequences
to the larger cooling system it belongs to.
3. End Effect
The end effect is the highest-level effect that considers repercussions to the whole
system, facility, or organization. Again, continuing from our example, the end
effect of a compromised cooling system might be delayed production schedules or
worse, a complete standstill of operations.
With the failure modes and effects mapped out, the risk of each scenario
occurring could be assessed more systematically. The steps to take given the
failure modes and effects would then become more apparent after evaluating the
other components of the FMEA process.
With long lists of failure modes and corresponding effects, the next challenge is
building strategies to handle each scenario. An objective approach to quantifying
the seriousness of failure events would be to identify the other components of the
FMEA process.
1. Probability of Failure
the lifetime of an asset. The seasonal fluctuations of failure events also need to be
considered where applicable.
Accurately identifying the rating for this criterion calls for a
combination of worker experience and robust historical data. Making full use
of a CMMS (computerized maintenance management system) to collect historical
data can lead maintenance teams to data-driven assessments.
2. Detectability
Detectability answers the question: “Will there be a warning to allow the failure
event to be avoided?” In this component of FMEA, easily detectable failure events
are given a low rating, while events with no chance of detection are given the
highest rating.
For example, with highly reliable sensors in place, a faulty HVAC system
might have a relatively low detectability rating. Components with absolutely no
way of detection should be given a high rating to reflect a potentially bigger
problem.
3. Severity
a breakdown. Other factors such as safety hazards and nonconformance with
government regulations are also factors that influence severity.
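In common FMEA practice, the three ratings (probability of failure, detectability, and severity) are multiplied into a single risk priority number (RPN) so that failure modes can be ranked. The sketch below follows that convention. The 1 to 10 scale and the example ratings are assumptions, and the failure mode names are borrowed from the centrifugal pump example earlier in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailureMode:
    name: str
    probability: int    # 1 = rare, 10 = almost certain (assumed scale)
    detectability: int  # 1 = easily detected, 10 = no chance of detection
    severity: int       # 1 = negligible, 10 = catastrophic (assumed scale)

    def __post_init__(self):
        for field_name in ("probability", "detectability", "severity"):
            rating = getattr(self, field_name)
            if not 1 <= rating <= 10:
                raise ValueError(f"{field_name} must be between 1 and 10")

    @property
    def rpn(self) -> int:
        """Risk priority number: the product of the three ratings."""
        return self.probability * self.detectability * self.severity


def rank_by_risk(modes):
    """Highest RPN first, so the riskiest failure modes are addressed first."""
    return sorted(modes, key=lambda mode: mode.rpn, reverse=True)


# Hypothetical ratings for a centrifugal pump.
pump_modes = [
    FailureMode("hydraulic failure", probability=4, detectability=3, severity=7),
    FailureMode("mechanical failure", probability=6, detectability=5, severity=8),
    FailureMode("corrosion", probability=3, detectability=8, severity=5),
]

for mode in rank_by_risk(pump_modes):
    print(f"{mode.name}: RPN {mode.rpn}")
```

Note that detectability is scored in the direction the text describes: a failure that is hard to detect receives a high rating and therefore raises the RPN.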
Once management has given the go-ahead, the process of
gathering all requirements follows. The following high-level steps are
commonly found in FMEA processes. Think of this as a general checklist for
starting with FMEA:
1. Tailor-Fit the Evaluation Criteria
The way FMEA is carried out can vary widely from one organization to another, and
the differences exist for a reason. Companies will have their own business
strategies and therefore focus on different aspects of their operations. To reflect
this in your FMEA process, you should be strategic in assigning weights and
identifying categories in your decision criteria.
After identifying the criteria that best reflect your company’s objectives, it
helps to stick with a rating pattern. Consistent rating scales are
an effective way of aligning the organization towards common goals.
Consistency allows you to seamlessly work within various groups and functions.
At the end of the day, FMEA is essentially a risk assessment tool. For it to be an
effective tool, it is not enough to identify the failure modes and corresponding
effects. Specific processes and procedures need to be in place to eliminate, or at
least reduce, the risks identified.
A lot of the information that goes into the FMEA process relies on the day-to-day
experience of the workforce. By engaging the right people, you can ensure the
reliability of your data. It is important to recognize the value that comes with the
collective experience of your team.
There is only so much information that people can gather unaided.
An advantage that is easy to overlook is the set of tools that are
running 24/7 in the background. CMMS programs that record equipment
performance even when you’re not looking can provide data sets that you
might have missed when collecting data manually. Coupled with a capable
workforce, these tools can maximize the potential you already have.
Common FMEA Mistakes
Now that we’ve explored some useful tips for successful FMEA implementation, it
also helps to look into potential pitfalls. Look out for these points when
implementing FMEA procedures:
2. Unclear Ownership
The required actions you identify are only as good as the actual execution. Without
clear accountability within the team, it doesn’t really matter how many plans you
lay out. Clear ownership of the required actions should be established as part of
the process.
Without a disciplined documentation process, you risk starting from scratch even
after recurring scenarios. Actions or significant events that have transpired should
be properly documented as a way to assist future teams who will face similar
situations.
Coming up with solutions assumes that you have identified the
root cause of a problem. Without the diligence to confirm the root cause, time and
effort could easily be put to waste. What’s worse, you might be convinced about
rectifying an issue, only to find out the hard way that the problem still persists.
Conclusion
While the seriousness of failure effects can seem subjective, FMEA offers methods
that quantify the repercussions of failures. This then allows the organization to
perform actions that effectively reduce or even eliminate risks. Having a
comprehensive FMEA process sets up organizations to be prepared.
While most companies have identified the need for a preventive maintenance (PM)
program, the effective execution of such maintenance activities can be challenging
given the everyday demands of a facility. Unforeseen circumstances that require
urgent attention can easily derail planned activities and can disrupt a
smoothly running plant. PM optimization (PMO) provides a method through which
maintenance activities are carried out more efficiently. In PMO, a new
maintenance strategy is derived from existing PM tasks: the schedule and
frequency of the routines are modified based on the failure history of the
equipment. Although it takes relatively less time to develop, the resulting
strategy can be similar to one obtained by performing RCM.
a. Data collection
Any attempt at optimization starts with good, reliable data. Data on equipment
performance, particularly on failure history over time, must be collected. A
minimum time period must be set to ensure that enough insight is obtained from
the data. Tools such as a CMMS program can make this process easier and more
accurate.
The collected data must be analysed to identify which equipment is the most
critical. Some points to consider are criticality to the plant’s operations, cost to
repair, MTBF (mean time between failures), and MTTR (mean time to repair).
The information gathered from analysing the data must then be reviewed against
existing PM routines. Some key points to review are: 1) whether the PM routines
are scheduled correctly to align with the MTBF and MTTR data points, and 2)
whether failure points are within acceptable tolerances set by original equipment
manufacturer (OEM) specifications or industry standards. Any substantial
deviation found in such checks can be a source of improvement from a maintenance
standpoint.
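The analysis and review steps can be illustrated with a short calculation of mean time between failures and mean time to repair from a failure history. This is a minimal sketch using the usual definitions (operating time divided by number of failures, and total repair time divided by number of repairs). The failure log is invented, and the rule that a PM interval should be shorter than the MTBF is a simplifying assumption rather than a rule given in the text.

```python
def mean_time_between_failures(total_operating_hours: float, failure_count: int) -> float:
    """Operating time divided by the number of failures in that time."""
    if failure_count <= 0:
        raise ValueError("at least one failure is needed to estimate MTBF")
    return total_operating_hours / failure_count


def mean_time_to_repair(repair_hours) -> float:
    """Average duration of the recorded repairs."""
    if not repair_hours:
        raise ValueError("at least one repair is needed to estimate repair time")
    return sum(repair_hours) / len(repair_hours)


def pm_interval_is_aligned(pm_interval_hours: float, mtbf_hours: float) -> bool:
    """Assumption: PM should come round before the equipment typically fails."""
    return pm_interval_hours < mtbf_hours


# Hypothetical failure history for one pump, e.g. exported from a CMMS.
operating_hours = 1200
repairs = [4, 6, 5, 3, 6, 4, 7, 5]  # duration of each repair, in hours

mtbf = mean_time_between_failures(operating_hours, len(repairs))
repair_time = mean_time_to_repair(repairs)

print(f"MTBF: {mtbf} h, mean time to repair: {repair_time} h")
print("PM every 200 h aligned with MTBF:", pm_interval_is_aligned(200, mtbf))
```

In this invented history the pump fails every 150 operating hours on average, so a 200-hour PM routine arrives too late and would be a candidate for rescheduling.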
Agreed action items must be delegated properly. Identified task owners should be
accountable for any required action and monitored for progress. Note that the
PMO process is a continuous effort and reviews should be carried out regularly.
In the laboratory and life sciences industry, a PMO program is estimated to reduce
overall maintenance costs by around 25%. Payback periods of investing in a PMO
strategy are estimated at around 12 to 24 months, just considering the measured
savings from maintenance costs.
Aside from the improvements in uptime and reliability that come with a robust
maintenance strategy, PMO methods enable company resources to be spent more
wisely without sacrificing the quality of execution of maintenance tasks.
One of the easiest methods is simply asking the technicians. Odds are, they’ve
been performing the same PM tasks for a while, and they’d probably have some
insight on what could be done better. If any of their tasks seems irrelevant, they’ll
let us know if we ask. While this is the simplest way to track down superfluous
PM tasks, it’s not the most precise, and it is pretty subjective. That said, it’s easy
to do when we’re just starting to optimize PM in our facility.
The next way is a bit more precise, though it is based on some industry
assumptions. A few decades ago, a maintenance and engineering manager named
John Day, Jr. proposed the 6:1 rule. This rule asserts that for every 6 PM tasks
we perform, we should be finding one corrective maintenance task.
This rule isn’t perfect, but it can give a good starting point for optimizing the PM.
If we’re performing more than six PMs for each CM, we may want to scale back a
bit, but only after doing some research. If it looks like the PM:CM ratio is too high,
it is advised to analyse the types of failures we’re preventing. If they don’t pose too
much of a threat, scaling back relevant PM tasks could be a good idea.
On the other hand, if we have more CM than the ratio dictates, we might be facing
one of these two possibilities:
Again, some extra analysis will help us make the right choice here. We should
look into the preventive maintenance tasks we’re performing and see whether they
address the right issues. If they do, our PM timing or quantity may be off. If
they don’t, we’ll want to scale them back and replace them with more relevant tasks.
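The 6:1 rule lends itself to a quick check against work-order counts. The sketch below only flags which side of the ratio a plant falls on. As the text stresses, the follow-up analysis still has to be done by people. The function names, the return labels, and the example counts are hypothetical.

```python
def pm_to_cm_ratio(pm_tasks: int, cm_tasks: int) -> float:
    """Number of PM tasks performed per corrective task found."""
    if pm_tasks < 0 or cm_tasks < 0:
        raise ValueError("task counts cannot be negative")
    if cm_tasks == 0:
        return float("inf")  # no corrective work found at all
    return pm_tasks / cm_tasks


def assess_against_rule(pm_tasks: int, cm_tasks: int, target: float = 6.0) -> str:
    """Compare the observed ratio with the 6:1 target."""
    ratio = pm_to_cm_ratio(pm_tasks, cm_tasks)
    if ratio > target:
        return "too_much_pm"   # consider scaling back PM, after research
    if ratio < target:
        return "too_much_cm"   # PM may target the wrong issues or be mistimed
    return "on_target"


print(assess_against_rule(120, 10))  # too_much_pm (12 PMs per CM)
print(assess_against_rule(60, 10))   # on_target
print(assess_against_rule(30, 10))   # too_much_cm (3 PMs per CM)
```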
A similar approach involves tracking the hours spent on PM and on emergency
maintenance for a single asset. If we have more emergency repair hours than PM
hours, we’ve probably got a problem and will want to do some analysis on the root
cause. These last two approaches give us more precision, but they do take more
planning, so we should keep that in mind when we start streamlining our PM.
Conclusion
4. SCADA System
The biggest manufacturing companies in the world are also known to be the
most data-driven establishments. In an age of growing technological capabilities,
the importance of collecting data is pushed to the limit with the use of systems
such as SCADA. By collecting and monitoring real-time data, SCADA software
shows an overview of how each key piece of equipment in the plant is performing. Sensors
on the equipment send signals through remote terminal units (RTUs) and
programmable logic controllers (PLCs). RTUs and PLCs give the supervisory control
and data acquisition system the ability to pinpoint anomalies in system functions
based on the collected data, thereby allowing the user to promptly take action on
the issue. SCADA allows maintenance personnel to make more informed
decisions. A modern SCADA system is applicable to a wide variety of industries—
oil and gas, energy, manufacturing, and virtually any corporation that benefits
from accurate and timely data monitoring.
Think of SCADA software as a bridge that links equipment with operators and
maintenance personnel. The system requires some key components to facilitate
the transmission of data from the physical equipment to the operator’s display
screen.
Digital or analogue sensors serve as measuring tools that collect data from various
parts of the plant. SCADA sensors may range from simple binary options, such as
an on or off signal, to more complex tools that measure flow rate, temperature,
and pressure. In addition, technicians or operators at the remote or central
location can manually input data into the system.
b. Conversion Units
Data collected by sensors is only useful if it can be converted into a form that is
easily comprehensible. Remote terminal units (RTUs) and programmable logic
controllers (PLCs) are the devices that translate the collected data into usable
information. Since information is collected throughout an entire system, the sheer
amount of data can be great.
Data feeds that are converted by the RTUs and programmable logic controllers
meet at a master unit known as the supervisory system or the human-machine
interface (HMI). This interface brings useful information to the maintenance team.
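The path from sensor to conversion unit to operator display can be sketched in a few lines. The example assumes an analogue transmitter that reports over a 4-20 mA current loop, a widespread industrial convention that the text itself does not specify. The temperature range, the alarm limits, and the function names are hypothetical.

```python
def scale_4_20ma(current_ma: float, low: float, high: float) -> float:
    """Convert a 4-20 mA signal to engineering units by linear scaling
    (4 mA maps to `low`, 20 mA maps to `high`)."""
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError("signal outside the 4-20 mA range; check the loop")
    return low + (current_ma - 4.0) / 16.0 * (high - low)


def alarm_state(value: float, low_limit: float, high_limit: float) -> str:
    """Classify a converted reading for display on the HMI."""
    if value < low_limit:
        return "LOW"
    if value > high_limit:
        return "HIGH"
    return "NORMAL"


# Hypothetical temperature transmitter spanning 0-200 degrees C,
# with assumed alarm limits of 20 and 150 degrees C.
for signal_ma in (12.0, 18.0):
    temperature = scale_4_20ma(signal_ma, low=0.0, high=200.0)
    print(signal_ma, "mA ->", temperature, "C:", alarm_state(temperature, 20.0, 150.0))
```

Here the conversion step plays the role the text assigns to RTUs and PLCs, and the alarm state is the kind of digested information that would reach the supervisory display.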
At this point, one operator can have a complete picture of an entire process or
system. The data is presented in an easily digestible format, and the employee can
take control of certain pieces of equipment to make repairs or isolate failures.
Although a human-machine interface and a SCADA system share many similarities,
they are distinct.
All the SCADA components are located throughout the plant and must be linked
together by a communication network. Conventionally, telephone
lines and circuits have served as this network, with newer wireless options now
available that use radio, cellular, or satellite links.
How It Works
Some typical tasks include checking sensors and other devices that may be
installed at remote substations or monitoring and control stations, tracking
machine system events for future reference, and changing the level or speed of
industrial processes from a central spot.
Benefits of SCADA
Since SCADA systems provide flexible, scalable means to monitor and control
what’s happening throughout complex industrial processes, on a shop floor, or
within remote substations, they can make a significant contribution to maintenance
and reliability efforts.
improve efficiency and timely production. RCM typically focuses on identifying and
prioritizing different failure modes. This focus helps in scheduling activities that
will prevent major system failure. It’s easy to see how SCADA software can go hand
in hand with an RCM system, as SCADA provides a great deal of automated
information and data on the performances of various assets and machinery in a
plant. SCADA also allows human intervention early in the process, preventing the
failures that an RCM strategy is designed to seek and identify. Here are some real-
life SCADA applications.
Electrical utilities around the world are facing an increase in the demand for
power, as well as a lower-than-ever tolerance for outages. Maintaining these
complex electrical grids and equipment to be as perfectly reliable as possible is a
significant challenge. SCADA systems play an important role in helping utilities
achieve that goal.
In this industry, sensors can collect information from various points at each
substation. Additional manual data can be added by either the central or
substation staff. Besides collecting data and sending alerts when certain
conditions signal an outage or potential for an outage, a well-programmed SCADA
system can automate certain repairs. In addition, a SCADA system can pinpoint
the location of other problems, minimizing the time it takes for a technician to
locate and diagnose the problem. Finally, various backup and redundant checks
and processes can increase reliability throughout the power grid.
Being able to identify potential failures before they occur is critical to maintaining
a world-class level of uptime on any shop floor. For example, let’s look at the
failure of a steam turbine component. If the overspeed trip device is not working
properly, it may have no immediate effect on your overall system.
However, a SCADA sensor that recognizes this failure is not designed to repair
that component itself. Instead, the sensor alerts the maintenance department to the
issue, so that the device can be repaired before the turbine’s load drops suddenly
and acceleration ensues.
If the turbine is allowed to deteriorate to this level of failure, a company may incur
damage from flying blades or even employee injury from the malfunction.
SCADA systems help IT and telecom organizations better control sensitive systems
and monitor remote environments. Sensors can provide around-the-clock “eyes”
on things like the temperature of servers or the humidity in rooms with sensitive
IT equipment to avoid or minimize damage caused by environmental factors.
In addition, SCADA systems work well within security applications such as alarm
contact closures, magnetic door sensors, and motion detectors.
These systems can even track all activity on a wind farm within a 10-minute
window, which gives the human operator near real-time activity tracking. If
anything looks amiss in the system, action can be taken almost immediately to
protect the equipment and the safety of the surrounding community. In addition, the
SCADA system can track energy output and any functionality errors that can be
used as evidence for warranty claims on equipment.
Data is sent through a fibre-optic network that summarizes not only the
performance of the turbines and other equipment but also the performance of the
wind itself. Meteorological equipment must be incorporated into the system in
order to determine if lower power production is due to equipment issues or low
winds.
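The need for meteorological data can be made concrete by comparing measured output with the output expected at the measured wind speed. The sketch below uses a deliberately simplified, hypothetical power curve (cubic rise between cut-in and rated speed). A real system would use the manufacturer's curve, and the speeds, the rated power, and the 15% tolerance are all assumptions.

```python
def expected_power_kw(wind_speed: float, cut_in: float = 3.0, rated_speed: float = 12.0,
                      cut_out: float = 25.0, rated_power_kw: float = 2000.0) -> float:
    """Simplified turbine power curve (wind speeds in m/s)."""
    if wind_speed < cut_in or wind_speed > cut_out:
        return 0.0               # turbine is not generating
    if wind_speed >= rated_speed:
        return rated_power_kw    # output is capped at rated power
    # Between cut-in and rated speed, assume power rises with the cube of speed.
    return rated_power_kw * (wind_speed ** 3 - cut_in ** 3) / (rated_speed ** 3 - cut_in ** 3)


def diagnose_output(actual_kw: float, wind_speed: float, tolerance: float = 0.15) -> str:
    """Decide whether the measured output is explained by the wind alone."""
    expected = expected_power_kw(wind_speed)
    if actual_kw >= expected * (1.0 - tolerance):
        return "consistent with wind"
    return "possible equipment issue"


# Low output in light wind is expected; low output in strong wind is not.
print(diagnose_output(actual_kw=110.0, wind_speed=5.0))
print(diagnose_output(actual_kw=1200.0, wind_speed=13.0))
```

Without the wind measurement, both readings would simply look like underproduction, which is why the text calls for meteorological equipment to be incorporated into the system.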
Origins of SCADA
system used paper-based records, pushbuttons, and analog dials to perform
monitoring and control.
For smaller facilities, this manual system was workable for a time. However,
as companies grew in size and reach, it became more difficult for them to rely on
nonautomatic industrial processes, especially over longer distances. The first
automation tools were timers and relays, which reduced the number of trips
technicians needed to make to remote locations.
Today, the biggest manufacturing companies in the world are also known
to be the most data-driven. In an age of growing technological capabilities, the
importance of collecting data is pushed to the limit with the use of systems such
as SCADA.
The Evolution of SCADA
Like most early industrial computer systems, SCADA was first implemented on
large mainframe computers. As a result, early SCADA systems were standalone,
housed and operated in a single location. These were known as
monolithic systems.
By the turn of the millennium, SCADA joined the ranks of other computer
systems in a more open environment. This now networked SCADA system ran on
Ethernet, which allowed multiple systems, vendors, and partners to join in the
network and connect to the SCADA system.
Meanwhile, information technologies have been emerging over the last few
decades. The development of structured query language (SQL) databases was
significant in advancing data management. Modern SCADA systems incorporate
SQL capabilities, further linking to enterprise resource planning systems for a
smoother and more holistic operation.
Market research analysts also state that the industrial control systems
market, which includes SCADA systems, is projected to reach $181.6 billion by
2024. The Industrial Internet of Things, cloud technology, and evolving web
technology will no doubt have an impact on future SCADA systems.
Conclusion
SCADA systems can work together with RCM and maintenance strategies
that focus on predictive maintenance. SCADA provides the data and technology
to allow a great deal of automation and data collection, which means that
problems and failures can be spotted at a point before they cause major equipment
damage, shut down an entire production line, cause a serious accident, or result
in an environmental catastrophe. As technology continues to develop into the
future, the potential for SCADA systems and related processes is great in helping
companies increase revenue and safety.
Lean Six Sigma is a process that aims to systematically eliminate waste and
reduce variation.
Over the past few decades, streamlining processes has been identified as
the key to unlocking the maximum efficiency of a plant. Lean practices are used
particularly in manufacturing industries, where some studies have correlated
being ‘lean’ with inventory management.
Based on this correlation, studies have found that since the 1980s, major
manufacturing companies have become increasingly lean. This
reinforced the need to focus on improving processes and encouraged the continued
application of Lean Six Sigma practices.
In the maintenance setting, the same drive for efficiency is
becoming a requirement rather than an option. The philosophy of
continually improving processes by taking out redundant steps, while still
consistently maintaining high standards, is increasingly recognized as a driver
of a plant's overall performance.
It’s estimated that maintenance activities can make up 15 to 70% of a factory's
total cost of production. Such a large share of total spend creates an equally
strong motivation to remove any non-value-adding activity.
The Six Sigma method is a framework to ensure that processes are designed to
consistently provide high-quality output. It is commonly organized into five
phases: Define, Measure, Analyse, Improve, and Control (DMAIC).
• Define the problems you want to address and the objectives you want to
achieve. This process involves the identification of resources, benefits, and
timelines.
• Measure your baseline metrics as a comparison for future progress. Agree
on the methods to collect data accurately and consistently.
• Analyse your data to identify root causes. It is important to establish cause-
and-effect relationships between incidents and root causes to get to the
source of the problem.
• Improve the process by implementing solutions to the identified problems.
This phase may include testing and prototyping to ensure that the root
causes are identified accurately.
• Control systems must be put in place to monitor the progress and
effectiveness of implemented solutions.
At the end of the day, Lean Six Sigma aims to remove useless steps from your
processes and to provide consistent quality of service across tasks. This is a
constant process that needs continuous effort. Keeping an eye on the eight main
kinds of waste—easily remembered as DOWNTIME—can keep you in check on
how lean the plant is running.
Conclusion
Though the philosophy behind Lean Six Sigma was developed for the
manufacturing industries, its application to maintenance is more relevant than
ever. Maintenance activities are essential to a plant’s overall performance and
Lean Six Sigma offers methods to perform maintenance activities with consistently
high standards while reducing unnecessary costs.
6. Root Cause Analysis
When feeling under the weather, it’s perfectly natural to address any pain
or discomfort by some sort of first aid treatment or a superficial remedy. However,
if you consult a medical professional, then the approach might be a little more
thorough. You might find yourself being asked a series of specific questions about
your condition, and might even go through some laboratory tests to get to the
source of your illness.
The same is true for plant and maintenance incidents. While an immediate
response is usually required, there is always value in performing a systematic
analysis of possible root causes.
RCA is the process that aims to identify the cause of a particular event. In
the plant setting, this event usually refers to any potential problems that will
disrupt standard operations. At a very high level, the usual suspects (i.e., usual
causes of problems) can be categorized as:
The general process of RCA requires you to describe what happened, why and
how it happened, and what steps are needed to prevent the same event from
happening in the future. The process can get very complex depending on the
situation. Thankfully, some common methods were developed to aid in identifying
the root cause.
RCA makes use of a number of methods that help teams to brainstorm and
pinpoint likely causes of issues in the facility. The following methods can assist
maintenance teams when performing root cause analysis.
a. 5 Whys
The name of the method pretty much explains the steps: ask why and ask it again.
Asking “why?” five times usually gets to the bottom of the problem, but don’t let
the name stop you from asking more times. The idea is to drill down to the details
of an event until you are left with the actual root cause. The 5 Whys method is the
simplest RCA tool. It’s often best for operators and others performing the day-to-
day labour in the facility.
A more visual method to determine root causes is by using a fault tree diagram. A
fault tree diagram starts by having the problem at the topmost block. The
immediate causes preceding the problem event are listed, then they branch out to
form the second layer of the diagram. Each immediate cause branches out to its
own prior causes. This process is continued until the most basic events are
identified, which then become your potential root causes. The same mixer
problem can be drawn as a fault tree diagram.
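The branching described above maps naturally onto a small tree structure. The sketch below is only an illustration: the `Event` class, the `basic_events` function, and the mixer causes are hypothetical and are not taken from the original diagram.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """One block in a fault tree: an event and the causes that precede it."""
    name: str
    causes: list = field(default_factory=list)


def basic_events(event):
    """Return the names of the leaf events, i.e. the potential root causes."""
    if not event.causes:
        return [event.name]
    leaves = []
    for cause in event.causes:
        leaves.extend(basic_events(cause))
    return leaves


# Hypothetical mixer example; the real causes depend on the investigation.
tree = Event("Mixer stopped", [
    Event("Motor overheated", [
        Event("Bearing not lubricated"),
        Event("Cooling fan blocked"),
    ]),
    Event("Power supply tripped", [
        Event("Overloaded circuit"),
    ]),
])

print(basic_events(tree))
```

Each name returned is a candidate root cause that the team still has to confirm or rule out.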
Another visual method to identify root causes is by using a fishbone diagram (also
known as an Ishikawa diagram, named after its creator, Kaoru Ishikawa). It starts
by specifying the problem on the rightmost part of the diagram. The factors
contributing to the main problem are then listed as categories. Specific causes
under each category are then listed down to identify the source of the problem.
• Environmental
• People
• Equipment/material
• Procedures
Applying these basic categories as a starting point, the mixer problem can be
translated into a fishbone diagram.
a. FMEA
Failure Mode and Effects Analysis is a method for identifying ways in which assets
might fail. One takes stock of the potential failure modes that individual assets
might experience and analyses how those failures might impact business
processes.
FMEA differs from the other RCA tools discussed so far because it looks
forward at what might happen rather than hypothesizing over a failure that
already occurred. However, it can still be useful when it comes to finding root
causes. Facilities that take the time to perform FMEA will have a ready-to-use
database of potential causes and effects to draw upon when analysing a failure
event, ultimately expediting the process.
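One common way to turn an FMEA table into a ranked list is the risk priority number (RPN), the product of severity, occurrence, and detection scores. The text above does not prescribe this scoring, so treat the sketch below as an assumption; the failure modes and their 1 to 10 scores are made up for illustration.

```python
def rank_failure_modes(modes):
    """Return (name, rpn) pairs sorted by risk priority number, highest first.

    Each mode is a dict with 'name', 'severity', 'occurrence', and 'detection',
    where the three scores are on a 1-10 scale (an assumed convention).
    """
    ranked = []
    for mode in modes:
        rpn = mode["severity"] * mode["occurrence"] * mode["detection"]
        ranked.append((mode["name"], rpn))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical failure modes for a single asset.
modes = [
    {"name": "Seal leak", "severity": 6, "occurrence": 5, "detection": 3},
    {"name": "Bearing seizure", "severity": 9, "occurrence": 3, "detection": 6},
    {"name": "Belt slip", "severity": 4, "occurrence": 7, "detection": 2},
]

for name, rpn in rank_failure_modes(modes):
    print(name, rpn)
```

A table like this is the kind of ready-to-use database of causes and effects that can be consulted when analysing a later failure event.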
The Pareto method is based on what’s commonly called the Pareto principle, which
states that 80% of all problems result from 20% of all causes. When drawn up into
a chart, potential causes of the problem are listed from left to right in order of
impact (greatest on the left, least on the right) and frequency. Each problem is
represented in the diagram as a bar, and that bar’s height represents its
frequency.
In addition to the bars in the chart, a line is also charted across the diagram
to show the cumulative impact of each cause (ascending from left to right). A
Pareto diagram can be used to visualize data from FMEA in a way that helps
maintenance teams target the most important issues first. That way, the team
spends less time on tasks that don’t matter.
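The data behind a Pareto chart is just a sorted frequency table with a running percentage. A minimal sketch follows, with hypothetical failure counts; the function name `pareto_table` is illustrative.

```python
def pareto_table(counts):
    """Return (cause, count, cumulative_percent) rows, most frequent cause first."""
    total = sum(counts.values())
    if total == 0:
        return []
    rows = []
    running = 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        # The cumulative percentage is what the line on the chart plots.
        rows.append((cause, count, round(100 * running / total, 1)))
    return rows


# Hypothetical failure counts per cause over some review period.
failures = {"Lubrication": 40, "Misalignment": 30, "Operator error": 20, "Electrical": 10}

for row in pareto_table(failures):
    print(row)
```

The counts give the bar heights and the third column gives the cumulative line; the causes at the top of the table are the ones to address first.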
Effective root cause analysis helps maintenance teams focus on fixing the core
causes of problems rather than constantly treating symptoms. A few ways in
which RCA achieves that include the following.
However, without researching the root causes of these breakdowns, they are
unlikely to go away. Odds are, the asset will break down again in the future. When
performed correctly, RCA helps maintenance teams focus their preventive
maintenance on the most important tasks. Given that as much as half of all PMs
ultimately accomplish nothing, that could translate into vastly reduced
maintenance costs.
b. Puts Everyone on the Same Page
When getting to the root of a problem, it’s common for individuals to blame other
people, departments, etc. One goal of RCA is to avoid this type of situation where
everyone blames one another for problems instead of looking at core systemic
issues.
The problem here is that issues related to human error need to be resolved
with adequate processes and controls—the issue won’t necessarily be solved by
removing a given human being from the situation since any other person could
make the same mistake. As such, the root cause is related to processes and
procedures, not people. Proper RCA avoids this problem by helping the team work
together to identify issues that are related to systems, processes, and machines
while driving toward actionable plans. Ultimately, it helps people get on the same
page.
When root causes are discovered and properly dealt with, equipment runs more
reliably, resulting in fewer breakdowns, overall better processes, and more
consistent output quality.
Implementing RCA
While RCA methods are very common and well-known to the maintenance
community, there can be challenges to making RCA thrive.
The first step to mastering this process is knowing the methods that are
available to conduct RCAs. The next steps are setting the proper mindset and
improving the quality of execution to drive the initiative toward success.
Keep in mind the importance of collecting data accurately and involving the
correct groups to analyse that data. To implement RCA effectively, it should be a
repeatable process that is collaboratively executed by the group.
In order to successfully implement RCA and receive its full benefits, it must be
done correctly. The following pointers can help you implement root cause analysis
effectively in your facility.
Generally, the most effective processes aren’t necessarily the most perfect, but the
ones that can be easily repeated. While making sure you’re continuously
improving your root cause analysis is important, it’s unlikely to become a regularly
used tool in your facility if it’s not fundamentally repeatable.
In order to analyse incidents, you first need to be aware of them. Logging asset
data can help with that, but it’s absolutely vital for your employees to feel free to
report incidents or problems when they occur.
4. Prioritize Causes
RCA is most effective when you’re able to prioritize causes. Rather than spreading
your time and efforts across numerous potential causes, you’re able to focus on
resolving the issues that have the most impact (and the greatest cost).
As mentioned above, FMEA and Pareto diagrams can help your team
prioritize the right causes. After figuring out a number of potential causes, it’s
often worthwhile to analyse the potential impact of each one to see where you can
make the greatest difference.
It’s important not to rush the RCA process. While you don’t want to delay it or
spend too much time analysing the issue—resulting in “analysis paralysis”—
neither do you want to rush to a superficial conclusion of what caused your
problem.
RCA is best done as a collaborative effort. After all, there may be multiple issues
at play, and it’s important to have a variety of skillsets and expertise at the table.
Potential qualified team members include:
• Maintenance professionals
• Operators
• Reliability engineers
In addition, you’ll want someone who has enough authority to help the team
overcome organizational roadblocks in the investigation process.
Finally, at least one person you select for your RCA team should have solid
investigation skills. They should be the sort of person who is naturally diligent
and impartial with a keen eye for detail.
Even with a repeatable process and a solid team, RCA will still get you nowhere if
you’re unclear on the actual problems you’re discussing. Before beginning your
discussions, you’ll need to pinpoint exactly what the problem is and how it shows
itself in your processes.
Without a clear problem statement, one of two things tends to happen:
1. Your team finds a solution to a problem you don’t actually have; or
2. Each member of the team has their own mental concept of the issue, turning
the discussion into an unproductive argument.
Neither result will help you solve the actual issue, so make sure everyone is
clear on the problem before you begin your analysis.
Finally, it’s important to measure the results of your RCA process in order to gauge
its success. If the same incident occurs again, that’s your cue to perform a more
in-depth analysis or make other adjustments to your process in the future. In the
end, your RCA and other processes will be in a consistent state of improvement.
Some of the most common root cause analysis mistakes involve poor definitions
or focusing too much on the wrong thing. Others simply ignore root causes
entirely, rendering the process pointless.
One of the most important parts of root cause analysis is defining the problem
well. When defining the problem at hand, just saying something is wrong isn’t
enough. Rather, you want to dig into the specifics of when it occurs, how prevalent
it is, and any domino effects it causes.
Often, businesses don’t get specific enough about the actual problem, and
that leads their RCA down the wrong path.
A better route would be to see how your maintenance processes might have
failed to account for human error. The job failed because there were no quality
control processes in place in case someone did something wrong. Unlike Jerry’s
sleep schedule, that’s something you can change.
To solve this problem, use a tool that will help you look at multiple factors,
such as fault tree analysis or a fishbone diagram.
Once you get some potential root causes, you need to make plans based on your
findings. Teams will sometimes revert to the more superficial “causes” in their
analysis, ignoring the root causes entirely. In doing so, they end up treating
symptoms rather than problems.
Ignoring data is actually pretty common. Oil rigs might ignore 99% of their sensor
data. Management teams might ignore the true causes of their problems.
Basically, when you get to the end of your RCA, first find solutions to the root
problems you identified.
Conclusion
RCA is a powerful process that enables the organization to identify the source of
a problem. Performing RCA processes effectively can significantly improve a
plant’s performance by implementing correct solutions that last.
Maintenance Costs
Breakdown maintenance stops normal activities: the machines and their
operators are left idle until the equipment is brought back to normal working
condition. It involves a higher cost for facilities and equipment, which are
run until they fail, along with associated penalties in the form of
expedited maintenance costs and equipment downtime costs.
Preventive maintenance will reduce such costs up to a point. Beyond that
point, the cost of preventive maintenance exceeds the downtime cost it
avoids. In such a situation, a firm can opt for breakdown maintenance.
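The trade-off can be illustrated by adding the two cost components at several levels of preventive effort and picking the lowest total. The figures and level names below are hypothetical and serve only to show the shape of the comparison.

```python
def best_maintenance_level(levels):
    """Return (label, total_cost) for the level with the lowest combined cost.

    Each level is a tuple (label, preventive_cost, breakdown_cost).
    """
    best = min(levels, key=lambda level: level[1] + level[2])
    return best[0], best[1] + best[2]


# Hypothetical annual costs: more preventive work, fewer breakdowns.
levels = [
    ("none", 0, 100_000),
    ("low", 10_000, 60_000),
    ("medium", 25_000, 30_000),
    ("high", 50_000, 20_000),
]

print(best_maintenance_level(levels))
```

With these made-up numbers the total falls until the medium level and rises again at the high level, where extra preventive spending costs more than the downtime it saves.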
Replacement Economy
Replacement problems fall into the following categories, depending upon the life
pattern of the equipment involved.
1. Replacement of the equipment that wears out or becomes obsolete with
time
Costs to be considered:
• Capital recovery cost (average first cost), computed from the first cost
(purchase price) of the machine.
• Average operating and maintenance cost (O & M cost)
• Total cost, which is the sum of capital recovery cost (average first cost) and
average operating and maintenance cost
The capital recovery cost goes on decreasing with the life of the equipment
and the average operating and maintenance cost goes on increasing with the life
of the equipment. From the beginning, the total cost goes on decreasing up to a
particular life and then it starts increasing.
A machine loses efficiency with time, and we have to determine the best time
to replace it with a new one. In the case of a vehicle, for example, the
maintenance cost increases as the vehicle ages. These costs keep growing if we
postpone the replacement.
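The economic life can be found by computing the average total cost for each candidate life and taking the minimum. The sketch below is a simplified illustration that ignores interest and salvage value; the first cost and the yearly O & M figures are hypothetical.

```python
def economic_life(first_cost, om_costs):
    """Return (best_year, average_total_cost) for the cheapest replacement age.

    om_costs[i] is the operating and maintenance cost in year i + 1.
    Interest and salvage value are ignored in this simplified model.
    """
    best_year, best_avg = None, None
    cumulative_om = 0.0
    for year, om in enumerate(om_costs, start=1):
        cumulative_om += om
        # Average first cost plus average O & M cost over `year` years.
        avg_total = (first_cost + cumulative_om) / year
        if best_avg is None or avg_total < best_avg:
            best_year, best_avg = year, avg_total
    return best_year, best_avg


# Hypothetical machine: purchase price 6000, O & M cost rising every year.
om_costs = [1000, 1200, 1400, 1800, 2300, 2800, 3400]
print(economic_life(6000, om_costs))
```

With these figures the average total cost falls through year 5 and rises afterwards, so replacing at the end of year 5 is the most economical choice under the stated assumptions.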
2. Replacement of the equipment that fails completely
For example, a tube or a condenser in an aircraft costs little, but the failure
of such a low-cost item may cause the airplane to crash. Hence, we use a
replacement policy for such items that minimizes the possibility of
complete breakdown.
The following are the replacement policies, which are applicable for this
situation.
There is a trade-off between the individual replacement policy and the group
replacement policy. Hence, for a given problem, each of the replacement policies
is evaluated and the most economical policy is selected for implementation. The
optimal period of replacement is determined by calculating the minimum total
cost. The total cost is calculated using: probability of failure at time ‘t’, number of
items failing during time ‘t’, cost of group replacement and the cost of individual
replacement.
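The comparison of the two policies can be sketched as follows. This is one common textbook formulation rather than a definitive method: it assumes items that fail within a period are replaced individually, and that under the group policy all items are also replaced together at the end of the chosen period. The bulb counts, failure probabilities, and costs are hypothetical.

```python
def expected_failures(n_items, probs, periods):
    """Expected number of individual replacements in each period.

    probs[a - 1] is the probability that a new item fails at age a.
    """
    failures = []
    for t in range(1, periods + 1):
        # Original items failing at age t.
        count = n_items * probs[t - 1] if t <= len(probs) else 0.0
        # Items installed as replacements in period k, now failing at age t - k.
        for k in range(1, t):
            age = t - k
            if age <= len(probs):
                count += failures[k - 1] * probs[age - 1]
        failures.append(count)
    return failures


def best_group_interval(n_items, probs, group_cost, individual_cost):
    """Return (period, average_cost_per_period) for the cheapest group interval."""
    failures = expected_failures(n_items, probs, len(probs))
    best_t, best_avg = None, None
    cumulative = 0.0
    for t, failed in enumerate(failures, start=1):
        cumulative += failed
        total = n_items * group_cost + individual_cost * cumulative
        avg = total / t
        if best_avg is None or avg < best_avg:
            best_t, best_avg = t, avg
    return best_t, best_avg


def individual_policy_cost(n_items, probs, individual_cost):
    """Average cost per period when items are replaced only as they fail."""
    mean_life = sum(age * p for age, p in enumerate(probs, start=1))
    return n_items * individual_cost / mean_life


# Hypothetical data: 1000 bulbs, failure probability by week of age.
probs = [0.10, 0.15, 0.25, 0.30, 0.20]
print(best_group_interval(1000, probs, group_cost=0.5, individual_cost=2.0))
print(individual_policy_cost(1000, probs, individual_cost=2.0))
```

The policy with the lower average cost per period is the one to implement; with these made-up figures, group replacement every two weeks is cheaper than replacing bulbs only as they fail.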
3. Other replacement problems
In staffing problems, with fixed total staff and fixed size of staff groups, the
proportion of staff in each group determines the promotion age.
The word renewal means either inserting new equipment in place of
old equipment or repairing the old equipment so that the probability density
function of its future lifetime equals that of new equipment.
The term ‘service life’ is usually applied to products to indicate the period of time
over which they can function as they were intended, giving users the service they
expect. So, for instance, the service life of a boiler is the length of time it can
function as a boiler i.e., providing heating and hot water.
Service life may be thought of as running from the point of sale, i.e., when the
customer buys the product, to the point at which it is discarded. Some products,
however, are discarded before the end of their service life for various reasons,
including the arrival of better products on the market, boredom, or simply a
desire for change.
A product said to have a long service life may suffer the occasional
breakdown during that time. However, if it can be maintained and repaired to
allow it to function as before, it should not normally interfere with the service life.
Poor repairs can, however, adversely affect service life.
• Quality of manufacture
• Materials used
• Flexibility in use
• Intensity of use
• Operating/environment conditions
• Care in distribution and use
• Built-in obsolescence
• Maintenance and repairs