ARP-A
Practitioner
RELIABILITY ADVOCATE
COURSE MANUAL
www.mobiusinstitute.com
This is designed as a guide only.
In practical situations, there are many variables, so please use this information with care.
Version 1.0
© 2020 - Mobius Institute – All rights reserved
Contents
WHY RELIABILITY?
SET PRIORITIES
MEASURING RELIABILITY
MEASURING PROGRESS
CHANGE MANAGEMENT
MOBIUS INSTITUTE | ARP-A Contents
ESTABLISH A TEAM
THE REALITY
PROCUREMENT
TRANSPORT
ACCEPTANCE TESTING
OPTIMAL OPERATION
GET ORGANIZED
DATA-DRIVEN APPROACHES
CRITICALITY RANKING
INTRODUCING RCFA
BROWN-PAPER PROCESS
VENDOR ANALYSIS
PROCESS REQUESTS
JOB PLANNING
WHY PLAN?
JOB SCHEDULING
JOB EXECUTION
COMMISSIONING
CONTROL ACCESS
SELECTION PROCESS
ULTRASOUND
MOBIUS INSTITUTE | ARP-A Introduction from Jason Tranter
MOBIUS INSTITUTE | ARP-A R-01: Getting Started
What is a “reliable plant” and what is it we are trying to achieve? We often say we are trying to achieve improved
reliability and improved performance.
A reliable plant has machinery and equipment that operate at the desired level of performance and quality
when called upon to operate, allowing the company to achieve its goals. Regular maintenance and checks can
contribute to this.
This reliability should be achieved without excessive costs, maintenance, and downtime. Equipment should not
need to be replaced often, and unnecessary redundancy, excess labor, and massive consulting bills should be
eliminated.
A reliable plant is clean, safe, well organized, efficient, dependable, stable, and competitive. As a result, a reliable
plant will have happy customers, happy owners and shareholders, happy regulators and insurers, and happy
employees. It is a safe and rewarding place to work.
In a reliable plant, equipment does not break down unexpectedly, equipment achieves (or gets close to) its design
lifetime, and the majority of maintenance work is planned and scheduled based on condition. In a reliable plant,
everyone is well trained with the required skills and tools (including contractors), and equipment is operated
optimally according to standard operating procedures.
In a reliable plant, work areas and plant/facility equipment are clean and tidy. Work areas and storerooms are
organized and documented, and the database is correct and up to date. Machines are lubricated, aligned,
balanced, and tightened with precision. Work is performed safely and efficiently, following documented procedures.
In a reliable plant, project and design decisions are based on the total cost of ownership: the equipment should
be maintainable, reliable, easy to operate, and energy efficient. Spares and equipment purchase decisions are
based on the total cost of ownership through their operation and disposal. The storeroom contains parts and
spares that can be justified. All work performed should be prioritized and justified.
In a reliable plant, employees are less stressed and less frustrated. The plant is less likely to close, because
reliability helps keep it in business. Work is performed with pride, targets are met, and the plant is safer for all
employees. There is less harm to the environment, and improvements actually get made; too often, problems are
identified and plans are made to fix them, but nothing is done. Managers value your opinion and you are given
ownership of projects.
A reliable plant is what we are trying to help you achieve.
Myth #1: Asset or equipment failure cannot be avoided, so we can only reduce the consequences.
It is true that there may be problems due to purchase and design decisions. But no matter what, we can extend
the life of equipment by taking the right steps. We must take proactive steps to improve reliability and extend
asset life.
Why do assets fail? Too often, it’s because we kill them through improper maintenance practices. For example,
when a fan is out of balance, the rotor is forced into a small circular orbit as it turns, and the bearings carry that
extra load. This will reduce the life of the structure, the motor, and the fan itself, and it can contribute to other
quality problems. When a pump and motor are misaligned, the bearings, the seal, the shaft, the
coupling, and the foundations themselves are all being pounded 24/7. Just a little bit of misalignment can suck
the life out of the components.
Likewise, if a bearing is slightly cocked during installation, the rolling elements are pounded with each rotation
and the asset’s life is shortened.
If there is a tiny particle, just a few microns in size, caught in the bearing’s grease, the rolling elements will roll
over it and cause a tiny indentation that can spread until it becomes a spall. The same thing can happen in a
gearbox. The particles that do this type of damage are too small to see or feel.
If part of a machine is on standby, the vibration that transmits to it can cause false brinelling—the rollers chip
away at the raceway of the bearing. This can cause the unit to fail soon after it is put into use. There are many
reasons for equipment failures, but many are self-inflicted. If we change what we do, we can extend the life of the
equipment.
Myth #2: The path to improved performance lies entirely within the maintenance group.
The fact is that everyone is responsible for reliability, from the design phase through operation. As with safety,
everyone has to look at their own actions and observations to make sure everything is conducive to reliability.
Many maintenance personnel feel they are playing whack-a-mole with the machines under their care: as soon as
one problem is fixed, another pops up. They are too busy fixing problems to take proactive steps to improve
reliability. We need to dig deeper and find out why the problems keep popping up.
Within the machine, some problems develop very quickly while others take time. Where are the problems coming
from? It starts with the design department, where corners may have been cut in production. Procurement buys the
product because it is cheap. Because the design of the machine stresses it, maintenance problems arise.
Maintenance may do what they need to do to get the machine going and put off actually improving the machine.
Performing condition monitoring is like asking a doctor to do some tests on your body. Like your doctor, the
condition monitoring professional can tell you if a problem is developing. While informing us of developing
problems is helpful, those problems still exist. Hopefully, we can schedule the right time to do the needed work,
get the right people with the right skills, and order spares. This approach improves the reliability of the plant, but
not of the equipment. The root causes of failure are still there.
What if the design, procurement, maintenance, and spares departments all said “This is the best way” instead of
“This will do”? If they then took care of the equipment and maximized the value from the equipment, we
would achieve the design lifetime and reduce our total cost of ownership. The up-front cost of reliability can
be a challenge, particularly for management, but it will pay off in the long run as production, the quality of the
products, and safety improve.
Your doctor does not tell you that a heart attack is coming and suggest that you come in for a new heart in seven
weeks. He or she sees the plaque buildup and the high blood pressure and suggests preventive maintenance,
such as dietary changes and exercise. In the same way, we need to make changes to the way we are operating
the equipment: it should be balanced, aligned, operated in a consistent way, and cleaned. Just as your body
benefits when you follow your doctor’s advice, the machine will have a longer life and avoid failures. The maintenance department
will be happy because problems won’t pop up all the time. Operations will be happy because the machines are
performing well. Management will be happy because overall costs are going down as a result of all this.
The fact is, time-based maintenance can waste a lot of money. Condition monitoring gives a warning of the root
cause and the failure, and we can act accordingly. Some maintenance work should be time based, but other
maintenance work should be condition based.
One of the basic principles of condition monitoring is that we watch for the tell-tale signs and only perform
maintenance when needed. The P-F Interval is what we want to watch. It is shown in a graph with the x-axis
being the passage of time and the y-axis being the condition of the asset. The line will stay steady as long as the
condition is stable. It will start to curve down when a defect is initiated, and will continue downward until failure
if left unchecked. The good thing is that the machine will start warning us during this time by making different
sounds, changing its temperature, etc., and condition monitoring allows us to catch those signals. Our goal is
to detect the problem as early as possible. An early warning of an unavoidable problem gives us time to order
spares and plan the maintenance for a convenient time. The longer the problem goes undetected, the more
the costs of maintenance go up, as do all the risks associated with failure. We are interested in the curve from
Potential failure to Functional failure, namely, the P-F Interval.
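To make the P-F Interval concrete, here is a minimal Python sketch. It is an illustration only: the shape of the condition curve, the times, and the thresholds (`defect_start`, `failure_time`, `detect_threshold`, `failure_threshold`) are hypothetical values chosen for the example, not figures from this course. It shows that a more sensitive detection method finds the potential failure (P) earlier, which leaves a longer interval before functional failure (F).

```python
def condition(t, defect_start=100.0, failure_time=160.0):
    """Hypothetical asset condition: 1.0 (healthy) until a defect starts,
    then an accelerating decline that reaches 0.0 at functional failure."""
    if t <= defect_start:
        return 1.0
    if t >= failure_time:
        return 0.0
    frac = (t - defect_start) / (failure_time - defect_start)
    return 1.0 - frac ** 2


def first_time_below(threshold, step=0.5, horizon=200.0):
    """Return the first sampled time at which condition drops below threshold."""
    t = 0.0
    while t <= horizon:
        if condition(t) < threshold:
            return t
        t += step
    return None


def pf_interval(detect_threshold, failure_threshold=0.05):
    """Time between the detectable point P and functional failure F."""
    p = first_time_below(detect_threshold)
    f = first_time_below(failure_threshold)
    return f - p


# A sensitive technique (assumed to notice a 5% drop in condition)
early_warning = pf_interval(detect_threshold=0.95)
# A coarse technique (assumed to notice only a 40% drop in condition)
late_warning = pf_interval(detect_threshold=0.60)
```

With these assumed numbers, the sensitive technique gives 45 time units of warning, while the coarse one gives only 20.5. The absolute values mean nothing; the point is that earlier detection buys more time to order spares and plan the work.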
The real key to success is developing a culture of reliability. The technical aspect is important, but reliability is a
people issue.
We need a vision and a strategy. Senior management needs to emphasize that safety and reliability are core
aspects of our values and key to our success. Everyone must participate, contribute, and believe. Maintenance
and production should work together to achieve reliability.
Myth #5: The reliability engineer knows the most about equipment reliability.
The people who operate and maintain the equipment day after day know a lot about the equipment. Many times,
they understand why the machines fail and have ideas on how to improve them, but their voices are not heard.
Myth #7: You must analyze historical data before you can make improvements.
Analysis is good, but we should be proactive from Day One. Analyzing data does not fix a failing machine.
Myth #9: Everyone will always understand that you are delivering value.
Unfortunately, many plants let reliability people go when things are going well, and reliability soon starts going
down without them. To make sure this doesn’t happen to you, you must consistently communicate the benefits
of the program. This benefits you and the company.
What happens if you do not sell the program? When reliability in a plant improves, the program itself can be seen
as a source of savings. Often, a new manager will come in and see cutting reliability as an opportunity to reduce
costs. They may even assume they will be promoted for saving the company money (before reliability plummets).
Because of the resulting failures in the plant, someone will restart the reliability program, only to cut it again
when things are going well, because no one is selling the program and the culture has not been changed. I call
this the reliability roller coaster. Keep selling the program to avoid it!
All of these initiatives start from the same base: increasing the uptime of the equipment. This involves stopping
breakdowns and reducing downtime (including planned downtime), minor stoppages, slowdowns, and
changeover times. We need to reduce waste of time, energy, money, and all other types of resources. We must also
improve quality. All of this leads to a happy customer and ultimately to business success, which may mean
improved profitability for some organizations. Safety and environmental impact are also important aspects to
consider.
MOBIUS INSTITUTE | ARP-A R-02: What Are the Benefits?
WHY RELIABILITY?
Let’s talk about some of the benefits of reliability. It is important for everyone involved in the organization to
understand why we are doing this.
A reliable plant is a safer, more productive, more secure, more environmentally friendly, and more competitive
plant, and its employees are less stressed and more fulfilled. If you are a maintenance person, for example,
reliability allows you to do your work the way you were trained: precision alignment, precision lubrication,
precision bearing installation, etc.
A reliable plant may also use less energy, may have higher quality products, is more insurable (and may have
lower insurance premiums), and can be ISO 55000 certified.
Ultimately, a reliable plant maximizes the value from its assets. A plant that does this is one that remains in
business, giving everyone that works there great security.
investing in the business, both of which can result in competitive advantage and satisfied customers.
Asset life: Through our reliability improvement initiative, we are able to extend asset life, which is especially
critical in an aging plant. Prescribing the correct operation is part of this, so that the machine runs with less
stress and strain.
Employee satisfaction: Employees should have a sense of purpose and job security. Targets should be achieved.
Employees should experience less frustration and greater safety at work.
Safety: A reliable plant is a safe plant. It will have fewer serious failures and therefore less maintenance work.
This matters because repairs and installations carry the risk of injury, especially if the work is rushed and the
workers are poorly trained, following incorrect (or no) procedures, or using the wrong tools.
Compliance: Every organization will have regulations they have to comply with to improve safety and reduce
environmental incidents. Reliability can help a plant achieve these benchmarks.
MOBIUS INSTITUTE | ARP-A R-03: Assessing the Benefits
We need to step back when looking at a new reliability initiative or evaluating an existing program. We must
assess where we are now and determine where we want to go before we try to go there. We need a way to
quantify the opportunity before we try to sell the opportunity—make the numbers specific to your organization’s
capabilities and needs. We also need to measure our progress.
HOW TO BEGIN
If our goal is industry best practice, we have to take into account the design of our plant and the standards set
by similar plants. This is our target. Then we mark where we currently are. That gap in between is the money
the plant could be making, and this is how we justify the program (in addition to compliance with regulations).
Over time, we will close that gap. Of course, this may lead to management thinking it’s safe to cut the reliability
program, but that gap will widen again if they do. As we get closer to our goal, we can change the goal. If our
original aim was the standard set by a similar plant, we may find we can go further and do better than that plant.
To reach the goal, some up-front investment may be needed for training and equipment.
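As a rough illustration of how the gap can be turned into a number, the sketch below multiplies the shortfall in annual output by the margin earned on each unit, and then compares that value with the up-front investment. Everything here is an assumption made for the example: the function names, the idea of valuing the gap by margin per unit, the simple payback calculation, and all of the figures. Substitute whatever measure of value fits your own plant.

```python
def annual_gap_value(target_units, current_units, margin_per_unit):
    """Money left on the table each year: (target - current) x margin per unit."""
    gap_units = max(target_units - current_units, 0)
    return gap_units * margin_per_unit


def simple_payback_years(upfront_investment, annual_value):
    """Years needed for the recovered value to cover the up-front investment.
    Returns None when there is no gap to recover."""
    if annual_value <= 0:
        return None
    return upfront_investment / annual_value


# Hypothetical plant: all three inputs are placeholders, not real data.
gap_value = annual_gap_value(target_units=100_000, current_units=85_000,
                             margin_per_unit=12.0)
payback = simple_payback_years(upfront_investment=90_000,
                               annual_value=gap_value)
```

With these placeholder inputs the gap is worth 180,000 per year and a 90,000 investment in training and equipment pays back in half a year. Real cases are rarely this clean, but even an approximate figure gives senior management something specific to weigh.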
We can use this information to convince senior leadership of the value of the initiative, set goals and targets,
establish our KPIs, set priorities, and assign criticality. When we know what we want to achieve, we will know
which pieces of equipment are most likely to influence our ability to reach our goals.
WHY RELIABILITY?
Why do we endeavor to improve reliability? How will your plant in particular be affected? Reliability seems to be
common sense, but that might not help you sell the program to senior management.
Ultimately, we improve reliability to add value to the business. To do this, you have to determine how reliability
SET PRIORITIES
Every organization has limited resources and limited time. This is why prioritization is essential. Once you have
set the priorities, you need to decide which machines to focus on when setting up a reliability program. Some
machines impact capacity, uptime, or quality much more than others. If your priority is throughput and you have
a bottleneck, focus first on the machine that is causing the bottleneck.
Then how do you prioritize, how do you justify your expenditure, and how do you get support for your program?
When we do a criticality analysis, we identify areas in which the consequences of failure are especially destructive.
This is something to keep in mind when setting priorities, and it is something ignored by many companies.
business. Fourth in the business process review, we should look for opportunities to increase output, improve
quality, and reduce cost and waste. Maintenance and production will be concerned about threats to output and
quality. After doing that, be sure to take off the “risk hat” and put on the “opportunity hat”: What is it that we
can do to increase the output, increase the throughput, reduce waste, and so on? If this were your business, you
would not only try to avoid the bad things; you would also try to make the good things happen. A machine failure
will reduce the output of a plant, but there may be something you can do, especially if you talk to maintenance
and operations, to improve product quality and output. Thinking of opportunities will allow you to look at the
equipment a little differently.
Do you know why you are trying to improve reliability? Can you say it in a clear statement and would senior
management agree with it? Would the “front line” agree?
MEASURING RELIABILITY
What is poor reliability costing you now? Assess how the plant’s areas of weakness are affecting performance. This
will show the value of making improvements.
When you review performance, you can benchmark against the plant’s best performance, against its design
capacity, or against industry best practice.
It is important to know where we are today so we have a reference for comparison. This is especially important if
a discussion arises of removing the reliability program. You can then remind senior management of where they
started from.
If you are not in a position to perform a detailed benchmark, seek outside support.
MEASURING PROGRESS
You need a way to measure your progress. Establish basic KPIs to keep track of how you are performing in each
area over time. Are we improving, and what are the opportunities for improvement? Don’t use too many KPIs
because that will get confusing, but make sure you measure what you want to improve.
Do not use the KPIs as a carrot or a stick. Doing so can lead people to adjust results and priorities just to hit
the KPIs. Use them to identify opportunities for improvement and not as opportunities to punish people.
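One simple way to track a small set of KPIs against the starting point is sketched below. The KPI names and values are hypothetical, and the helper `kpi_progress` is an assumption made for this example, not a prescribed method. The only idea it encodes is that some KPIs improve when they go up (such as uptime) and others improve when they go down (such as unplanned downtime), so progress should be reported as improvement relative to the baseline.

```python
def kpi_progress(baseline, current, higher_is_better=True):
    """Percent improvement relative to the baseline (positive means better)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (current - baseline) / abs(baseline) * 100.0
    return change if higher_is_better else -change


# Hypothetical KPIs: (baseline, current, higher_is_better)
kpis = {
    "uptime_percent": (80.0, 88.0, True),
    "unplanned_downtime_hours": (120.0, 90.0, False),
}

report = {name: kpi_progress(base, cur, better)
          for name, (base, cur, better) in kpis.items()}
```

Keeping the baseline alongside the current value also preserves the reference point you will need if the program's value is ever questioned.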
In conclusion, we need to know where we are now and where we want to go before we try to go there. This
process lays the groundwork for everything that follows.
MOBIUS INSTITUTE | ARP-A R-04: Culture Change
to change. We can buy all the new tools, such as the laser alignment system, but if people do not believe in the
program and are not trained in it, they will not use it properly. That’s the negative part.
On the positive side, there are other important benefits to working well with the “plant floor.”
The mechanics, electricians, and operators who spend so much time with the machines know a lot about those
machines. They see what causes machine failure and slowdowns. Some of them may have made suggestions for
improvement in the past but saw nothing happen. We need to engage with all departments and listen to their
suggestions.
Remember to also get the support of operations, maintenance, engineering, purchasing, and those dealing with
safety, product quality, and environmental effects.
involved in the process. People are not robots. We cannot just reprogram them.
We cannot engineer our way to success. Data analysis does not change behavior. People will change if they want
to change. Most people do not mind change, but they don’t like to be changed. If people feel it is their idea, and it
is in their best interest, and if they participate in the process, they will want to get involved.
How do we create a culture of reliability? First, we need to understand people. Second, we need to understand the
culture-change process.
We need to understand the human psyche and personalities. All people are different, but they fall into certain
categories. Some will support us, some will defy us, and some will sit back and wait. We need to know this so we can
plan how to manage each group.
Another way to categorize people is that they are positivers, fence-sitters, and doubters.
Positivers are open to new ideas. They will be enthusiastic about the program and make suggestions. They are
enthusiastic for change to take place, and they will act rather than merely talk. Your success depends on
identifying and engaging with positivers early on.
The fence-sitters are a large group. They are neither very positive nor very negative, and they will go with the
flow. They are easily influenced by other people in the organization.
The doubters actively defy change. They may say things like “We tried that and it didn’t work,” or “It won’t work
here.” They will try to influence the fence-sitters.
Within the doubters, the dragons are the ones who talk the talk but cannot walk the walk. They may appear to
support the program but will allow their doubts to surface soon afterward. Not only will they not do what they
agreed to do, they may actively defy the program. They are dangerous, and you need to figure out who they are
and watch them. Dragons in management roles are particularly dangerous.
The handbrakes, in contrast, are those who actively seek to protect their turf. They are the ones who make
excuses when presented with the program. In some ways, they are easier to deal with than the dragons because
they are easy to spot. Focus your attention on the positivers, try to get the fence-sitters on your side, and watch
out for the dragons. The handbrakes just need to be managed somehow.
The culture-change personalities can usually be broken down this way: The champions will make up about 5%, and
they will lead the change. The rest of the positivers, around 20%, will get involved early on. The fence-sitters make
up around 50%. The handbrakes make up about 20%, and they will only change when there is no other option. The
dragons, at around 5%, will almost never change. Focus on the positivers and the fence-sitters. The doubters may
be the ones who retire early, move to another plant, or face being let go if layoffs become necessary for the plant.
CHANGE MANAGEMENT
To facilitate the culture-change process, we need to identify the positivers and other personalities. Recognize how
individuals will benefit by pinpointing what is wrong now. Then devise training and communication plans to keep
them informed on the progress. Make sure you do not have handbrake personalities in key roles. Try to convince
the fence-sitters to work with you, and don’t waste too much time on the dragons.
people enthusiastic about reliability if you are not going to listen to their suggestions, give feedback and training,
or perform the proactive work they know the machines need. They will dismiss your program as a flavor-of-the-
month deal.
Watch the subtle messages you may be sending. If a machine breaks down and you say to the workers, “When
will it be fixed? What can we do to expedite the repairs? What will all this cost?” you send the message that they
need to hurry up and make the repair as cheap as possible. “When will you have it installed? Do you really need
to do laser alignment? Could we get it running and align it later?” You are not asking these people to do a better
job if you say these things.
Instead, try saying, “Is this likely to fail again? What caused it to fail? What can we do next time to get a warning of
failures?” Also say, “Good job following the installation guidelines. Joey will be along when you have it precision
aligned to go through his commissioning checks.”
Explain the benefits. Answer the question “What’s in it for me?” They know the shareholders will benefit, but
explain how they will benefit. For example, a maintenance person will benefit from fewer call-outs that take them
away from family events. They can do most of their work within regular hours. They may experience greater job
satisfaction and less frustration. Increased profitability for the plant means job security for workers. Increased
reliability means the equipment is able to perform its function. No one wants to deal with an irritating machine
that doesn’t work properly. They can also take pride in keeping the machines running. Fewer machine failures
mean improved safety at work.
MOBIUS INSTITUTE | ARP-A R-05: Selling to Senior Management
MOBIUS INSTITUTE | ARP-A R-06: Establishing the Strategy
You must have a plan that is prioritized according to the greatest value that can be generated. Based on what we
have talked about in previous lessons, you should know how to add value and focus your attention in those areas.
It must be based on the plant’s identified weaknesses, as they will give the biggest bang for the buck. Keep in
mind what’s best for the organization and what people can buy into. You need milestones and a way to measure
progress. You must communicate your progress, even when things don’t quite work out. Keep the program alive in
their minds. Continuously review and improve your plan based on the plant’s priorities and circumstances.
The plan must be realistic. This is a long-term process. It is easy to lose management support, and it is easy to
lose focus and become distracted. Make sure people do not slip back into their old habits. It is also easy to give in
to the naysayers. You may need help to do this—possibly from a mentor outside the organization.
Avoid these traps: Don’t try it without management support and plant-wide buy-in. Don’t try to be “world class”
one area at a time, and don’t try to engineer your way to success. Don’t try it without a plan. Break out of the
reactive maintenance cycle of doom before progressing further.
ESTABLISH A TEAM
Another step is to establish a team. You cannot do this program by yourself. One way to do this is with a steering
committee. Make sure you have a group of people with the right attitude (positivers) that represent the key
departments: operations, production, maintenance, engineering, health and safety, quality, environment, and
reliability (if you have a reliability group). Include unions, if they are present at your plant. This is a way to make sure you have
everyone’s support, rather than this just being a reliability group’s project.
steering committee. There are certain qualities that person should have, and they are leadership qualities, not
management qualities: the ability to motivate and inspire people, and to get people to contribute. They don’t take
credit for all the good things that happen. They should be invisible in some ways and visible in others.
MOBIUS INSTITUTE | ARP-A R-07: Understanding Failure
When we talk about solving problems and maintenance strategy, there are some things we need to understand
about how failure works. In this module we will talk about how equipment fails, as there are assumptions that
hold the reliability strategy back.
COMMON BELIEFS
What do people believe about the time-to-failure of rotating machinery? Does it wear out to a point where it
could fail at any minute? Does the probability of failure simply increase over time? Or is failure independent of
time—failing randomly due to equipment lifespan, operation, and installation?
What would happen if we took 30 machines, each with a bearing, ran them for a period of time, and waited
for them to fail? Many believe they would run for a certain time, and then there would be a small window of time in
which they all started to fail. On this view, the probability of failure is low for most of their lifetimes (because
none of them fail during that time) and rises sharply at a certain age. Someone might therefore decide to let each
machine run for a certain number of hours and schedule a shutdown to replace the bearing just before it reaches
that window. Replacing the bearing much earlier than that would waste a lot of money.
Let’s graph the probability of failure against time of failure, with the y-axis being the risk of failure and the x-axis
being the months of the year. All the machines run happily for a certain amount of time and fail at around the
same time—these are age-related failures. Let’s say they all fail after November.
If we were to graph that statistically, we would have a sharp upward curve between November and December.
It would be logical to schedule maintenance for that time since our equipment would be likely to fail according
to the pattern, and we don’t want any unexpected failures. This practice of replacing the bearing just before it is
expected to fail is the common idea behind preventive maintenance, or time-based maintenance.
THE REALITY
The graph showing the bars that came up to similar levels was not real. This graph is from a real study—all 30
bearings failed at different times. It is a random failure pattern. Therefore, time-based maintenance is not the
best strategy, as you will still experience unplanned downtime. On the other hand, some bearings will keep
running for a long time. With time-based maintenance, you may end up taking out a perfectly good bearing and
putting yourself at risk for infant mortality with its replacement. It is quite likely that you will actually introduce
a fault with this strategy. If a bearing is working well, there is no need to change it. Use condition monitoring to
find out when the root causes of failure are present or to detect early warnings of failure. Much of the time, there
will be a long warning time, allowing you to schedule the maintenance at a convenient time, possibly the next
planned shutdown.
Many studies, including an extensive study by Nowlan and Heap, found that over 90% of failures are not age
related. There are a number of infant mortality failures, followed by random failures. So what does our reliability
improvement program aim to do? First, it aims to reduce the probability of failure at the start. If we improve
our maintenance and commissioning practices, and the way we care for the spares, there will be fewer infant
mortality failures. By improving the way we operate and lubricate the equipment, we reduce the probability of
failure during the random period. The more we care for the machines, justified according to the criticality of the
equipment and other factors, the more we reduce the likelihood of random failure. We can have confidence that
the plant can be run at the proper speed, achieving the proper level of uptime and the right quality.
68% of equipment followed the pattern shown in the Nowlan and Heap and other studies (infant mortality followed
by random failures), and another 14% followed a pattern in which failure was completely random and there was no
infant mortality. Another 7% followed a pattern in which the probability of infant mortality was even lower than
that of random failure down the line. So roughly 90% of the equipment you have is subject to random failures.
Therefore, 90% should have condition monitoring, and we should employ condition-based maintenance (these
terms will be clarified later). Only about 10% of equipment follows an age-related failure pattern. For these
machines, time-based maintenance may be best.
What do we do with this information? We use it to establish our asset strategy, or our maintenance plan: decide
which machines should get condition-based maintenance, time-based maintenance, or run-to-fail.
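As a rough illustration of the arithmetic above, the following Python sketch adds up the pattern shares quoted in
this section and maps a machine to a starting-point strategy. The pattern labels, the function name, and the
simple two-question mapping are assumptions made for this example; they are not taken from the cited studies.

```python
# Shares of equipment per failure pattern, as quoted in the text above.
# The labels are shorthand invented for this sketch.
PATTERN_SHARES = {
    "infant mortality, then random": 68,
    "random, no infant mortality": 14,
    "low early failure rate, then random": 7,
}

# Percentage of equipment that is subject to random failure.
random_share = sum(PATTERN_SHARES.values())

# Whatever remains is treated as age related.
age_related_share = 100 - random_share


def candidate_strategy(is_age_related: bool, is_critical: bool) -> str:
    """Return a starting-point strategy (a deliberate simplification)."""
    if not is_critical:
        return "run-to-fail"
    if is_age_related:
        return "time-based maintenance"
    return "condition-based maintenance"


print(f"Random failure patterns: {random_share}%")
print(f"Age-related patterns:    {age_related_share}%")
print(candidate_strategy(is_age_related=False, is_critical=True))
```

The quoted shares add up to 89% and leave 11%, which the text rounds to 90% and 10%. A real asset strategy
would weigh criticality and detectability in far more detail than the two yes/no questions used here.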
In conclusion, most failures are not age related. You need a strategy that matches the failure modes and patterns
of the equipment you are dealing with. You will therefore need to understand your equipment’s failure modes
and patterns.
MOBIUS INSTITUTE | ARP-A R-08: Defect Elimination
Now we need to put a force field around the machine to protect it from us: It is easy to put contaminated
lubricant into the machine. We need to make sure parts on the shelf do not become damaged. We need to use
precision installation practices. We need to make sure we operate the machine correctly, as closely as possible
to its best efficiency point, and in a consistent way. We should not perform unnecessary PMs, particularly
intrusive maintenance practices that could harm the machine. Likewise, intrusive inspections should be avoided
whenever possible. And we want to use our internal condition monitoring programs to make sure no damage has
resulted from any of the aforementioned actions. Now this machine has the best chance of delivering the
maximum value. Along the way, we keep the machine clean, make any adjustments, replenish lubricants, and
continue to monitor. This is basic defect elimination.
With a defect elimination program, we are, in essence, trying to build a force field around the plant: We make
sure anything we buy for the plant is designed for reliability, maintainability, operability, and so on.
The P-F Interval was introduced earlier in the course: the idea that we can monitor a machine and see the
symptoms of the beginning of failure. Defect elimination comes before, looking at all the areas that could lead to
failure. Ideally, it would help us avoid the P-F curve. We could go further back and consider the design practices
and other aforementioned issues (obviously not for the present machine, but for future ones). If we do this, the
y-axis, which corresponds to the machine’s condition, will remain high and steady since all the reasons that the
machine could fail have been removed.
Bearings and other components do, of course, have a lifespan and they will eventually need to be replaced.
PROCUREMENT
The same is true for the procurement process, but it is ongoing: buying new lubricants, bearings, electrical
connectors, etc. There are choices that can be made that will reduce the cost of ownership, and others that will
reduce the cost of purchase. Our goal is to influence the purchases in order to reduce the total cost of ownership.
We need the highest level of maintainability, operability, and reliability. The procurement team may not have that
understanding and may not know what to look for. Their job is to find items that meet a given specification and
then purchase the one with the lowest price. Our job is to provide a specification that only equipment with the
required maintainability, reliability, and operability can satisfy.
We have to achieve the lowest life-cycle costs. Focus on the total cost of ownership, not the purchase cost.
The former includes maintenance, downtime, operation, lubrication, and replacement costs, as well as energy
efficiency.
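To make the difference between purchase cost and total cost of ownership concrete, here is a minimal Python
sketch that compares two options over a fixed horizon. Every figure, name, and field is hypothetical, only some of
the cost items listed above are included, and the calculation ignores discounting and inflation.

```python
import math
from dataclasses import dataclass


@dataclass
class PurchaseOption:
    """One candidate item; all figures are hypothetical and in one currency."""
    name: str
    purchase_price: int
    annual_maintenance: int
    annual_downtime_cost: int
    annual_energy_cost: int
    service_life_years: int  # years before the item must be replaced


def total_cost_of_ownership(option: PurchaseOption, horizon_years: int) -> int:
    """Purchase and replacement cost plus running costs over the horizon."""
    # Units bought over the horizon: the first purchase plus any replacements.
    units_needed = math.ceil(horizon_years / option.service_life_years)
    running_cost_per_year = (option.annual_maintenance
                             + option.annual_downtime_cost
                             + option.annual_energy_cost)
    return units_needed * option.purchase_price + horizon_years * running_cost_per_year


low_price = PurchaseOption("low-price pump", 4000, 1500, 3000, 2500, 5)
high_spec = PurchaseOption("high-spec pump", 7000, 600, 800, 2000, 10)

for option in (low_price, high_spec):
    print(option.name, total_cost_of_ownership(option, horizon_years=10))
```

With these invented numbers, the pump that costs less to buy costs 78,000 over ten years, while the more
expensive pump costs 41,000. The point is the shape of the comparison, not the specific values.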
On a related note, one problem the industry faces is that there are companies producing counterfeit parts. These
parts do not meet the stringent standards that the companies advertise. Procurement departments that are
looking for low-cost parts may inadvertently purchase counterfeit bearings, lubricants, filters, etc.
We need to consider the service providers. Look very carefully at the service they are providing, rather than just
telling them to go balance a rotor—give precise instructions. Send someone with them to inspect a gearbox, for
example.
TRANSPORT
You may have a machine that is in good condition when it leaves the vendor but is impacted during transport by
vibration, dust, etc. Do acceptance testing to make sure it is up to your standards. Proactively dealing with root
causes is much better than waiting until the equipment fails and arguing with the OEM about the warranty. Just
take steps to eliminate the possibility of these failures occurring in the first place.
A rolling element bearing that is vibrated while it is not rotating (for example, in the back of a truck) can
suffer false brinelling: the rolling elements fret against the raceways and leave wear marks. The bearing will be
degraded and will fail soon after.
ACCEPTANCE TESTING
Make it part of your purchase requirements that acceptance tests will be performed to prove that the equipment is
fit for purpose. We assume that when we buy a part, it is in excellent condition on delivery and is designed for a
long life. We assume it will operate without resonance and that the bearings are in good condition, that there is
no soft foot, etc. But to be proactive in the selection process, we should assign the vendor tests to perform and
also perform some ourselves, with the desired results specified. That puts the supplier on notice to make sure it
fits the requirements. Companies still find a lot of equipment that fails the tests. Imagine what would happen if
OPTIMAL OPERATION
Then we must operate the equipment properly. If we have the right design and have purchased the right
equipment, we should be able to operate the machines under the stresses and loads they were designed to cope
with. Make sure the equipment is operated in a way that enables it to reach its maximum lifespan: consistently,
according to procedures, and without excess waste.
Recognize that the operators of the equipment often understand the events that lead to changeover issues,
quality issues, maintenance issues, and so on. They know when a machine was making strange vibrations or
noises before it failed. Engage with them and take their advice. Operators spend a lot of time with the
equipment, so they are in a great position to give advice and to perform basic maintenance tasks, such as
cleaning and lubrication, as well as condition monitoring.
Hence, we need to educate the operators on how important it is that the equipment is operated correctly. They
need to understand that when they do not operate the equipment in the optimal way, they are sucking the life
out of that equipment. (Be sure to convey this without pointing fingers!) If they have an understanding of how
their actions lead to equipment problems, they will be more likely to operate the equipment correctly. Later, we
will discuss the brown-paper process, which is a collaborative way of bringing people together to get their ideas
on why equipment fails or underperforms, and to come up with solutions together. As part of that process, they
will learn the proper way of operating the equipment, whether that information comes from you or from other
operators.
MCP Consulting, in association with the UK Department of Trade and Industry, found that
• 40-50% of equipment breakdowns are caused by poor operating practices
recirculation, cavitation, and other issues that degrade the bearings, the impeller seals, etc. Operators need to
understand these consequences.
The other way we are going to work with operators is by implementing standard operating procedures: everyone
starts up the machine properly, shuts it down, goes through the changeovers, etc., so that the machine is
operated in a steady way.
It is reasonable to expect that, with so much on the line in terms of expense and production requirements, there
will be standard ways of operating the equipment. But what is the best way of operating the equipment? We will
discuss this more with the brown-paper process, but one way to do it is to have each shift document how they
operate the equipment and look at the performance. As a group, you can then decide on the best way, or ways,
to operate the equipment. Each shift may find a slightly better way to operate the machine, and each group tells
the other what they found during the discussion process.
We need to look at something called operator asset care, or operator-driven reliability (ODR). When breaking
out of the reactive maintenance cycle of doom, how do you take a group of maintenance technicians and get
them to perform tasks to extend the life of the equipment when they are already so busy? One way is to engage
the operators and get them to perform certain tests, inspections, and basic maintenance work, utilizing their
experience with the machine. This will save the maintenance team time that they can use on tasks that require
their specific skills. Also, if the operators are performing these basic tasks, there will be fewer failures, which will
free up more time for maintenance.
ODR utilizes operators in some proactive maintenance tasks. Because they work with the machines every day,
they are in a perfect position to perform these tasks. They can even use some basic condition monitoring tools to
check the equipment. They can keep the equipment clean and, maybe, do some lubrication work.
We have to be careful when going about doing this. The maintenance department may feel we are taking jobs
from them and operators may question why they are being asked to do maintenance work. This is why the
implementation plan is important. If you have engaged with them, involved them in the plan, and trained them,
they will be ready when you implement ODR.
Other studies have been done, such as the one by the Japan Institute of Plant Maintenance, which found that
70% of failures are preventable by operators, while 30% require intervention by technical specialists. The point
is not that the operators are to blame for the failures. It is that the operators are in a position to prevent 70% of
failures. This takes a huge burden off the technical specialists and engages people who are already working with
the machines. This is a great opportunity to break out of the reactive maintenance cycle of doom.
In another study, Whirlpool completed 23 RCM analyses and identified 1,864 tasks to minimize failures:
68% were performed by operators, while 31% were performed by technicians. There were also 237 redesigns of
process and/or equipment needed to prevent future failures. Again, ODR is very important. In another study,
66% of tasks were performed by operators, 32% by maintenance, and 2% of processes and/or equipment were
redesigned.
Operators can perform non-intrusive inspections, cleaning, lubrication, and basic vibration, ultrasound, and
temperature readings. As for lubrication, it is generally better to have a dedicated lubrication team if you can.
ODR is one of the keys to breaking the reactive cycle. It frees up time for the maintenance crew to do other tasks.
Proactive tasks will get done by people who should not be distracted by break-in jobs. It reduces the root causes
of failure. Operators will have a greater connection to the reliability improvement initiative.
The fourth way we can work with operators has to do with production rates and quality, in addition to failure.
Earlier, we discussed four areas: performance, constraints, risks, and opportunities. Here, we are looking at
opportunities. How can we increase the production rate and the quality of what we produce? How can we
reduce waste, keep customers happy, achieve production targets, and achieve higher capacity? People in the
maintenance and reliability departments may think that’s not their purview. That is part of the reason we put a
steering committee together and get senior management support. Why just focus on machines breaking down?
What about the minor stoppages, slowdowns, and changeover and transition losses? We all serve the same goals,
so why not look at these issues as well? The people who work with the equipment will have opinions as to why
these things happen and how to prevent them.
A common measure of the reliability program is OEE (overall equipment effectiveness), which combines
uptime (availability), production rate (performance), and quality. Many maintenance people focus only on uptime,
but they should look at the other two areas as well.
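OEE is usually calculated as the product of three ratios: availability, performance, and quality. The Python
sketch below shows that calculation for one hypothetical shift; the function name and all of the shift figures are
assumptions made for illustration.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Overall equipment effectiveness as the product of three ratios (0 to 1)."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical shift: 480 minutes planned, 420 minutes actually running.
availability = 420 / 480
# 1,000 units were possible at the rated speed in that running time; 900 were made.
performance = 900 / 1000
# 855 of the 900 units made were good.
quality = 855 / 900

print(f"OEE = {oee(availability, performance, quality):.1%}")
```

Even with 87.5% uptime, the combined figure in this example is roughly 75%, which is why looking at uptime
alone can hide a large part of the loss.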
In conclusion, defect elimination lets us get ahead of the root causes of failure. We know what causes failures
in most industries, and you can set proactive tasks to avoid those failures. The extra effort and up-front cost
will reduce the overall cost of ownership. Everyone needs to appreciate this concept, as the temptation to save
money can be overwhelming.
MOBIUS INSTITUTE | ARP-A R-09: Asset Strategy
(RCM), preventive maintenance optimization (PMO), and root cause failure analysis (RCFA). But when are
we going to use those techniques, why, who will perform them, and which of these methodologies is best?
These approaches take a certain amount of expertise and time, which you may not have.
Condition-Based Maintenance
In condition-based maintenance, we make the decision to perform maintenance based on information that
indicates that the work is required and that the machine will fail if we do not perform the work. To find out if
work is required, we perform inspections, look at performance data, perform classical condition-monitoring tests
such as vibration analysis or infrared, test the oil, etc. How often we perform the tests depends on criticality. All
of this allows us to determine when to replace bearings, replace lubricant, re-align the machine, clean filters,
replace tires or belts, re-lubricate a bearing, and more.
There are a lot of decisions we have to make based on design parameters:
• Logic: CBM, PM, or RTF
• Criticality: To justify the program
Interval-Based Maintenance
What is interval-based maintenance? Some people call it preventive maintenance but, because in some places
that term includes condition-based maintenance, interval-based maintenance is the more precise term. When we
cannot detect onset of failure but the asset is too critical to run-to-fail, we need to perform scheduled restoration
or replacement tasks. For example, we may know how long it takes for a cog to wear out, so we will replace it
at that time. We could also look at the cog and see whether or not it is worn—some interval-based tasks can be
made condition-based (though the inspections would remain interval-based).
Interval-based maintenance works best with age-related faults. We can make the decision to replace our tires
after traveling a certain number of miles, or we can inspect the tires at that time and decide whether to replace
them sooner or later. There is a risk associated with replacing those tires, or that cog, purely based on time:
we may wait too long and have it fail, or we may replace it earlier than needed and expend time, energy, and
expense. With a condition-based strategy, we perform tests to determine the condition of the equipment,
but that takes time, money, and expertise. So we have to decide which strategy to use. With interval-based
maintenance, the interval can vary and may be determined by time, running hours, distance traveled, production
runs completed, etc.
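Because the interval can be expressed in whatever unit of usage suits the asset, the trigger logic is the same in
every case. The following Python sketch shows one way to represent this; the class name, field names, and the
example readings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class IntervalTask:
    """A scheduled task triggered by a usage measure chosen by the plant."""
    name: str
    unit: str            # e.g. "days", "running hours", "km", "production runs"
    interval: float      # usage allowed between executions of the task
    last_done_at: float  # usage reading when the task was last performed

    def is_due(self, current_reading: float) -> bool:
        """True once the usage since the last execution reaches the interval."""
        return current_reading - self.last_done_at >= self.interval

    def remaining(self, current_reading: float) -> float:
        """Usage left before the task falls due (never negative)."""
        return max(0.0, self.interval - (current_reading - self.last_done_at))


# Hypothetical examples: tires tracked by distance, a cog tracked by running hours.
tires = IntervalTask("replace tires", "km", interval=60000, last_done_at=12000)
cog = IntervalTask("replace cog", "running hours", interval=4000, last_done_at=1000)

print(tires.is_due(70500), tires.remaining(70500))
print(cog.is_due(5200), cog.remaining(5200))
```

The same trigger could prompt an inspection rather than a replacement, which is how an interval-based task
becomes a condition-based one while the inspection itself stays on an interval.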
We must carefully scrutinize the existing PMs that may be historical or imposed by OEMs. The trouble is,
everything that takes our time takes us away from the proactive steps we should be performing. There may be
existing tasks in your plant that are just a waste of time. They may even be inducing failure in the equipment.
Later, we will discuss PMO (preventive maintenance optimization) in more detail.
Another part of our asset strategy includes hidden failure finding tasks. Some failures can occur without us
knowing, and we only find them when we try to get the machine to run: for example, a safety switch, a pressure
relief valve, or a standby pump. We assume that a standby pump will run when we need it to, but how do we
know that it actually will? We cannot measure the vibration because it is not running. We need a way to look for
these hidden failures. One strategy is to just cycle through the pumps, which is a good idea anyway. We must
identify the risks associated with hidden failures and develop a strategy to manage them.
we do takes time and, therefore, costs money. If we try to do everything, we probably won’t do anything very
well, so we have to make some hard decisions. For much of the equipment, simply dealing with the failures as
they happen may be the most cost-effective strategy. Be sure to report this decision, as well as your reasoning,
to management.
Some assets are simply less important than others. They are not expensive to replace, they do not pose a safety
risk, they are not critical, and they do not cause secondary damage when they fail. You carry spares or you can
get them easily. Therefore, you cannot justify collecting and analyzing condition monitoring data and performing
preventive maintenance tasks on them. That does not mean we deliberately make the machine fail; you will still
lubricate it and take action if it is making noises or heating up.
If you have good data, use it, but don’t make that your first and only priority. You may have to wait a long time
for good data, so don’t let the lack of it hold you back from being proactive. Collect data along the way.
GET ORGANIZED
It is important to be organized through this process. It does not make sense to develop an asset strategy if you
do not have an up-to-date master asset list or equipment register.
Ideally, there should be a management-of-change process so that any proposed change to the way the equipment
is maintained or operated is documented.
The bill of materials (BOM) is a key part of being organized and in control. This lists the equipment as well as all
the components and parts that make up the equipment. Then, when you do failure modes analysis, you know
what the components are and what can fail. This will also help us keep track of the spares we need.
Documenting a BOM is a big job, and it is often left undone because people do not believe it is worth the time.
When maintenance work is performed, it is a good opportunity to document that information. When the machine
is open, look inside to document the bearings, gears, etc.
DATA-DRIVEN APPROACHES
As for the data-driven approach, we can take two basic approaches: creating a bad actor list with Pareto analysis,
or taking an analytical approach while looking at your failure information (how often you are experiencing
failure) and determining which failure curve you have. Weibull analysis will help you there. If you know the
reliability of all the assets, you can create a reliability block diagram and determine what the future availability
will be, assuming that the reliability stays constant. You can perform simulations of possible changes using
software.
If you have 10,000 assets, they will not contribute equally to your reliability problem. Some of them are bad
actors: they break down relatively frequently, keeping the maintenance department busy and also frustrating the
operators.
We would love to have data to use in making decisions, but if we do not have it, we should just start acting now
to improve reliability. If you start measuring now while improving reliability, note that the MTBF may change:
perhaps failure used to be when the machine stopped working and now it is when someone detects it with a
vibration analyzer. We are also affecting the MTBF by the proactive changes we are making. Just be aware that
the data will not be constant due to these factors and take that into account.
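A bad actor list of the kind mentioned above can be built from nothing more than a count of failures per asset.
The Python sketch below is one minimal way to do it, assuming a simple log with one asset tag per recorded
failure. The asset tags, the function name, and the 80% cut-off are all illustrative assumptions.

```python
from collections import Counter


def bad_actors(failure_log: list[str], share: float = 0.8) -> list[tuple[str, int]]:
    """Return the assets that together account for `share` of all failures.

    `failure_log` holds one asset tag per recorded failure.
    """
    counts = Counter(failure_log).most_common()  # highest failure count first
    total = sum(count for _, count in counts)
    selected, running = [], 0
    for asset, count in counts:
        # Stop once the assets already selected cover the requested share.
        if running >= share * total:
            break
        selected.append((asset, count))
        running += count
    return selected


# Hypothetical work-order history; the asset tags are invented for illustration.
log = ["P-101"] * 9 + ["FAN-07"] * 6 + ["CV-03"] * 2 + ["P-102", "GB-11", "M-20"]

for asset, count in bad_actors(log):
    print(asset, count)
```

In this invented log, three of the six assets account for 17 of the 20 failures. Counting failures is only one
option; the same ranking could be done on downtime hours or repair cost if that data is recorded.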
cause? Even the existing data would allow this plant to prioritize what they do. In this paper mill, the amount of
downtime dropped after PM programs were implemented on those machines.
It is important to appreciate good data. Don’t be in a position where you say, “We don’t have good data, but it’s
the only data we have, so we will base all our decisions upon it.”
You need good data. What is being recorded right now? Are people recording maintenance work against the
correct asset? Ideally, we would like to know which asset failed, when it failed (date and time), what failed, and
the consequences of the failure. You might collect and document that information in different ways, and you may
need to dedicate one person to gathering it. Make sure the failure is documented in detail so it can be used as a
reference later (don’t just write “other,” for example). Data must be recorded properly, but we have to be
realistic: if there are 200 failure codes, they will not be used properly.
Unfortunately, when we look at a chart, we often see things like “It’s not working” or “Failed again.” At least
we know which assets are the troublemakers, but we do not know the nature of the trouble. Be precise when
documenting the failure.
One way to record data is through failure codes. If we have 200 failure codes, however, it is not going to work.
The technician will not have the patience to scroll through them and choose the right code.
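One way to picture a usable failure record is a small structure that captures the asset, the time, what failed, and
the consequence, with a deliberately short code list. The Python sketch below is an assumption about how such a
record might look; the code names, field names, and the three-word rule are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class FailureCode(Enum):
    """A deliberately short code list; the entries are illustrative only."""
    BEARING = "bearing"
    SEAL = "seal"
    LUBRICATION = "lubrication"
    ELECTRICAL = "electrical"
    ALIGNMENT = "alignment"
    OTHER = "other"


@dataclass
class FailureRecord:
    asset_id: str
    occurred_at: datetime
    code: FailureCode
    description: str  # what failed, in the technician's own words
    consequence: str  # e.g. downtime hours, scrap, safety event

    def __post_init__(self) -> None:
        # "Other" is accepted only when the free text says something specific.
        if self.code is FailureCode.OTHER and len(self.description.split()) < 3:
            raise ValueError("Describe the failure when using the 'other' code")


record = FailureRecord(
    asset_id="FAN-07",
    occurred_at=datetime(2020, 3, 14, 9, 30),
    code=FailureCode.BEARING,
    description="Drive-end bearing seized",
    consequence="6 h downtime",
)
print(record.asset_id, record.code.value, record.occurred_at.isoformat())
```

A handful of codes plus a short free-text description is easier to fill in correctly than a long drop-down list, and
it still shows both which asset is the troublemaker and the nature of the trouble.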
lot of failures and began inquiring about them. They decided to do some analysis, and they looked at how often
each individual fan was failing. They found that only a limited number of fans were failing, due to infant mortality.
The contractor, while “fixing” the problem, was introducing new problems, and the fan would just fail again.
They decided to bring maintenance in-house and use precision skills, with the right procedures and tools, and
reliability improved. In this case, they had data to prove the situation. Senior management was able to see that
they could save a certain amount of money by bringing maintenance in-house.
justify our decisions? We can use ACR to determine which spares to keep, justify PMs, and so on.
What is criticality?
Criticality is the combination of the frequency of failure and the consequences of failure. To put it another way,
it is the importance of the machine and the likelihood of failure. It is a measurement of the risk we face. Let’s say
we have a production line, and the product goes from one machine to the next. We know that the line will shut
down if any one of those machines fails. We might therefore conclude that they are all equally critical. But we
have to go further and ask, “Which of those machines would we least want to fail?” Of course, you do not want
any of them to fail on that crucial line, but one of them, for example, may be more expensive to repair than the
others. The parts may only be available overseas. It may have a shorter lead time to failure. Therefore, this
machine is more critical than the others.
Are any of the machines less reliable than the others? You probably have one that fails often and is likely to get
the line shut down. Therefore, it is a good idea to look for the root cause of failure, and you may be able to justify
condition monitoring on that machine. Now check if any of the machines affect the product quality more than
the others. Or, will one machine put workers’ safety at risk if it fails? Therefore, these machines are not equally
critical. On a related note, some machines will start making noises before they fail while others will fail without
warning. Their criticality may be equal, but one of them needs extra attention.
CRITICALITY RANKING
• You could have a team meeting, with all the stakeholders, and set the criticality rankings. A problem with
this approach is that everyone believes their machine is the most critical, and the battles may continue long
after the meeting.
• You could perform reliability centered maintenance (RCM) or failure modes, effects, and criticality analysis
(FMECA). You will determine criticality this way. However, it takes a long time and we need to know
criticality now.
• We could keep it simple and label machines “critical,” “essential,” “non-essential.” However, the decision-
making power is lost here.
Instead, include all the stakeholders (in maintenance, operations, safety, health, environment, etc.), define the
consequences we are concerned about, assess the reliability (the probability the failure will develop), and assess
the detectability of the failure.
First, take an asset and decide whether the consequences of its failure are insignificant, minor, moderate, major,
or extreme. On another axis, graph the probability, or likelihood, of failure, from rare to almost certain, within
your chosen time period. The combination of those factors will give you the criticality ranking.
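The two-axis ranking just described can be sketched as a simple lookup. In the Python example below, the
consequence levels and the two end points of the likelihood scale come from the text; the intermediate likelihood
labels and the choice to multiply the two levels are assumptions made for illustration.

```python
# Consequence levels as named in the text, from least to most severe.
CONSEQUENCE = ["insignificant", "minor", "moderate", "major", "extreme"]

# Likelihood scale; only "rare" and "almost certain" appear in the text,
# the three levels in between are assumed.
LIKELIHOOD = ["rare", "unlikely", "possible", "likely", "almost certain"]


def criticality(consequence: str, likelihood: str) -> int:
    """Score from 1 to 25: consequence level (1-5) times likelihood level (1-5)."""
    consequence_level = CONSEQUENCE.index(consequence) + 1
    likelihood_level = LIKELIHOOD.index(likelihood) + 1
    return consequence_level * likelihood_level


print(criticality("major", "possible"))
print(criticality("minor", "almost certain"))
print(criticality("extreme", "rare"))
```

Multiplication is only one way to combine the axes. Many plants use a hand-built matrix instead so that, for
example, an extreme consequence is never ranked low even when the failure is rare.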
Next, break the consequences of failure down into categories, such as equipment, people, environment,
production, and product quality/safety. Some consequences carry more weight than others, for example, if your
product quality could affect the health of a customer or a machine failure could be dangerous for workers.
Near the beginning of this course you learned how to decide on the most critical issues for your plant. Now,
break each consequence down into categories (insignificant, minor, moderate, major, extreme). For example,
under “equipment,” the worst-case scenario may mean the destruction of the equipment, the destruction of other
equipment, and the spare being unavailable in state. The worst-case scenario under “people” may mean a single
fatality or multiple fatalities. As a group of stakeholders, we will go through this process asset by asset. We will combine the
scores of reliability and consequences to get the criticality score.
Let’s go further. So far, we have given equal weight to the consequences of failure. A fatality has been scored
equally to secondary machine damage. There are two ways to solve this problem. We could simply use a larger
range of numbers under “people” (say, 1-10), or we could apply a weight to certain consequences. Under the
weighting system, certain consequences could also be scored lower, say, if your plant will not seriously impact the
environment even if the asset experiences its worst case scenario.
To go further, we can change the reliability score to reflect detectability. If a fault in an asset can be easily
detected, there is little probability that the machine will be allowed to fail, and the consequences of failure will
never surface.
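The weighting and detectability adjustments described above can be sketched in a few lines of Python. This is
one possible formulation under stated assumptions, not a standard method: the weights, the 1 to 5 scales, the use
of the worst weighted consequence, and the way detectability scales the likelihood are all invented for this example.

```python
from dataclasses import dataclass

# Weight per consequence category; the values are assumptions for illustration.
WEIGHTS = {"equipment": 1.0, "people": 3.0, "environment": 0.5,
           "production": 1.5, "quality": 1.5}


@dataclass
class Asset:
    name: str
    consequences: dict[str, int]  # category -> 1 (insignificant) to 5 (extreme)
    likelihood: int               # 1 (rare) to 5 (almost certain)
    detectability: int            # 1 (easily detected) to 5 (undetectable)


def criticality_score(asset: Asset) -> float:
    """Worst weighted consequence times likelihood, adjusted for detectability."""
    worst = max(WEIGHTS[category] * score
                for category, score in asset.consequences.items())
    # An easily detected fault is less likely to be allowed to run to failure:
    # detectability 5 leaves the likelihood unchanged, 1 scales it to 20%.
    adjusted_likelihood = asset.likelihood * asset.detectability / 5
    return worst * adjusted_likelihood


# Hypothetical assets and scores.
pump = Asset("feed pump", {"equipment": 3, "people": 2, "production": 4},
             likelihood=4, detectability=2)
conveyor = Asset("conveyor", {"equipment": 2, "production": 3},
                 likelihood=3, detectability=1)

for asset in sorted((pump, conveyor), key=criticality_score, reverse=True):
    print(f"{asset.name}: {criticality_score(asset):.1f}")
```

Keeping the individual scores alongside the final number matters more than the formula itself, because it lets
you go back later and see why an asset was ranked where it was.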
We have just developed a wealth of useful, actionable information. If you store this information and have a way
of analyzing it, you could go back and see the reasons for the rankings. This would allow you to focus on, say,
machines that are only critical because their failure is undetectable and suggest condition monitoring for them.
You may look at other consequences and think of ways to minimize the risks to the environment or to safety,
even if the machine were to fail.
There is more that we can do. So far, we have talked about the equipment as a unit, but what about the criticality
of the individual components? Each component may have different problems and different criticality. For the
most critical machines, we may want to do a criticality analysis of each component so that we can do something
about them or monitor them. At this point, we are nearly doing RCM. We have spoken about the seriousness
of each consequence; similarly, the probabilities of each consequence developing are not equal. We could go
further and discuss probability and detectability for each consequence. We use criticality to justify digging deeper.
A criticality spectrum has system criticality on one end, followed by basic asset criticality and ACR, and it goes
more into depth until we reach RCM/FMECA at the other end. With this subject, we can start with something basic
and get into more and more detail. But the point is to use criticality to justify your efforts, and you end up with
the information that allows you to make decisions.
tasks, and hidden failure finding tasks we should perform? That creates a whole new set of PMs. Pure PMO takes
what you have and eliminates what you do not need, but does not necessarily create new PMs. Advanced PMO
does go that extra step, possibly replacing time-based tasks with condition-based tasks. This approach comes
close to RCM.
You have to be brave when it comes to PMO. Each time you delete a PM, you hope it does not lead to a problem.
Trust the science, history, and expertise that guide the decisions in terms of infant mortality, random failures, and
age-related failures. Apply the right strategy in the right locations.
Again, PMO purely reduces or removes PM tasks; RCM creates new ones. When you are trying to break out of the
reactive maintenance cycle of doom, while you may not be ready to perform full RCM, PMO is a useful task.
about every failure mode and every task, you will end up with an extremely thick project report. You will never
have time to perform all those tasks. You have to take a practical approach, whether it is something you do
internally or something you ask a consultant for help with. RCM purists will say you need to consider every single
asset and failure mode. That’s great if you have the resources to do all that analysis work and follow the results.
The practical side has to determine how much analysis we actually do, with the help of criticality. Use criticality
to focus on the assets that most need the analysis. If you’ve performed Pareto and criticality analysis, you can
start from that top 5%, analyze them, and start work on them. Then, do the next 5%, and so on. Hopefully, criticality
will come down after you’ve done some work on that top 5%. People in your plant will see the improvements and feel
a sense of achievement.
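Working through the ranked list 5% at a time can be sketched in a few lines. This is only an illustration: the asset tags and criticality scores are made up, and the function name `analysis_batches` is hypothetical.

```python
import math

def analysis_batches(assets, percent=5):
    """Sort assets by criticality (highest first) and split them into
    batches that each hold `percent` of the list (at least one asset)."""
    ranked = sorted(assets, key=lambda a: a["criticality"], reverse=True)
    size = max(1, math.ceil(len(ranked) * percent / 100))
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

# Hypothetical scores for 20 assets; real values come from your own ranking.
scores = [35, 90, 12, 64, 48, 77, 23, 55, 81, 19,
          42, 68, 30, 95, 51, 26, 60, 38, 73, 15]
assets = [{"tag": f"A-{n:02d}", "criticality": s}
          for n, s in enumerate(scores, start=1)]

batches = analysis_batches(assets)
first = [a["tag"] for a in batches[0]]   # analyze these first
second = [a["tag"] for a in batches[1]]  # then move on to these
```

With 20 assets, each 5% batch holds a single asset, so the team always knows which asset to analyze next.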
The sad fact is that only about 15% of RCM projects are successful. What does success really mean? It means
that an RCM analysis was performed, and it came out with a list of directives that involved proactive tasks as
well as reactive tasks. If we only determine the CM tasks that can detect failure and the PM tasks that prevent
that failure, but we do not address the root causes of failure, the project could be considered a failure. Some
RCM programs only come out with time-based maintenance tasks. We need a set of tasks that eliminate the root
causes and deal with the onset of failure. We need to implement them and make sure they are being performed
on schedule and that they are making a difference. Performance and reliability should improve. That would be a
successful RCM project.
frequently enough to detect it in time? Do we have the skills to perform the CM or PM tasks? How much will this
cost? How much time will be required and do we have that time? Knowing the answers to these questions will
help us plan our program and make the necessary adjustments.
INTRODUCING RCFA
In this module we will talk about root cause (failure) analysis (RCA or RCFA). This topic could fit in two places: here
in asset strategy or in continuous improvement. RCFA is a tool that looks backward to determine the root cause
of poor performance or failure and seeks to fix it.
Even with our RCM strategy, maintenance improvement, and performance improvement, there will be failures.
There will be poor performance in production. We need a logical way to figure out why the problems are
occurring and what we can do about them. RCA or RCFA is a method used to address a problem or nonconformance
in order to get to its root cause. It is used so we can correct or eliminate the cause
and prevent the problem from recurring. In some companies people are frustrated with RCFA because, after
failures were found, nothing was done about them.
Root cause analysis (RCA) analyzes a process, for example, production line problems. RCFA gets to the root cause
of a failure and eliminates it. These are the five basic steps of RCFA:
• Define the failure or process irregularity
• Investigate the root cause
• Create a proposed action plan and define timelines (get approval for the project)
• Implement the proposed action
• Verify improvement and monitor effectiveness
We can also use FRACAS—failure reporting and corrective action system—to help document and track the
progress.
It is not enough to define the root cause of failure. Make sure action is taken to eliminate it.
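One way to make sure an investigation is not considered finished until action has been taken and verified is to track the five steps on every failure record. The sketch below is an assumption about how such a record might look; the class, field, and step names are hypothetical and not taken from any particular FRACAS product.

```python
from dataclasses import dataclass, field

# The five basic RCFA steps, in order (short labels chosen for this sketch).
RCFA_STEPS = ["define", "investigate", "plan", "implement", "verify"]

@dataclass
class FailureRecord:
    asset: str
    description: str
    completed: list = field(default_factory=list)

    def next_step(self):
        """Return the next outstanding step, or None when all are done."""
        if len(self.completed) < len(RCFA_STEPS):
            return RCFA_STEPS[len(self.completed)]
        return None

    def complete(self, step):
        """Mark a step as done; steps must be completed in order."""
        if step != self.next_step():
            raise ValueError(f"expected {self.next_step()!r}, got {step!r}")
        self.completed.append(step)

    @property
    def closed(self):
        # A record only closes once improvement has been verified.
        return self.next_step() is None

record = FailureRecord("Pump P-101", "Repeated seal failure")  # made-up example
record.complete("define")
record.complete("investigate")
```

At this point `record.closed` is still false and `record.next_step()` points to the action plan, which is exactly the situation where RCFA efforts tend to stall.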
There are quite a few techniques that can be used to determine the root cause:
five whys, fault tree analysis, Ishikawa or fishbone, and many more.
Five whys: You continue to ask “why” until the answer to that “why” can be considered the root cause. For example,
why won’t my car start? The battery is dead. Why? The alternator is not functioning. Why? The alternator belt is
broken. Why? The belt was well beyond its useful service life. Why? The car was not maintained per the recommended
service schedule. Solution: Replace belts according to the recommended schedule. The point is to keep asking “why”
until you get to something that can be improved and which, if improved, will solve the problem. You can go through
the physical reasons until you get to the
human side of it. Are the people being trained? Can they see the failure? You could keep going deeper, but you
just need to get to a point where you can control the outcome.
The fault tree analysis (FTA) goes through a similar process but asks more questions. The fact is, there might
not be one single root cause. It may be a combination of factors or influences. The car may not start because
the battery is dead, but it may also be out of fuel or have an engine that needs tuning. The result may be
complicated, with “ands” and “ors.” This is quite sophisticated, and we have to ask how much sophistication and
time are warranted. Ideally, the operations department should be empowered to perform RCFA. Everyone should
keep asking “why” and thinking of how to improve that asset and avoid the problem in the future.
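The “ands” and “ors” of a fault tree can be pictured as a small logic structure. The following is a minimal sketch using the car example; the "cold morning" condition and the AND gate around it are invented purely to show how a combination of factors is expressed, and the names `Gate` and `occurs` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Gate:
    kind: str                                    # "AND" or "OR"
    inputs: list = field(default_factory=list)   # event names or nested gates

def occurs(node, events):
    """Return True if the node occurs, given a dict of observed basic events."""
    if isinstance(node, str):
        return events.get(node, False)
    if node.kind not in ("AND", "OR"):
        raise ValueError(f"unknown gate type: {node.kind}")
    results = [occurs(child, events) for child in node.inputs]
    return all(results) if node.kind == "AND" else any(results)

# Top event: the car will not start.
car_wont_start = Gate("OR", [
    "battery dead",
    "out of fuel",
    # Assumed combination for illustration: both conditions must be present.
    Gate("AND", ["engine needs tuning", "cold morning"]),
])

fuel_only = occurs(car_wont_start, {"out of fuel": True})
tuning_only = occurs(car_wont_start, {"engine needs tuning": True})
```

Here `fuel_only` is true because a single OR branch is enough, while `tuning_only` is false because the AND branch needs both of its inputs.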
If we cannot solve the problem logically, we can try the Ishikawa diagram, or fishbone technique. This is a
brainstorming technique to decide on the reason for failure: material, man, machine, method, or environment.
Consider all the issues to do with each category. This exercise may help us find the cause of the failure, but it
could also help us identify things that could cause failures in the future.
BROWN-PAPER PROCESS
Now we need to talk about the brown-paper process. Basically, the idea is to engage with many people in the
organization to get their ideas, feedback, and contributions to help us develop an asset strategy and target
problems that we can resolve.
We have two big challenges: we need to change the culture and we need to unlock the wealth of knowledge on
the plant floor. Could there be a way to kill two birds with one stone? Using this process, we can.
First, we engage people and involve them in the improvement process because it is motivating and creates
believers. Second, we learn a great deal about problems in the plant and motivate people to help solve them.
These problems include bottlenecks in production, waste, bureaucracy, equipment failure, bad practices,
shortcuts, procedures not followed, and inconsistencies.
How much do you really know about problems in your plant? Generally, people on the front line know a lot more
about these problems than management. The people who work with machines every day know a lot about them.
We need to change these attitudes: “We’ve always done it that way. That’s not the way we do things around here.”
What do you think would happen if you asked employees for their opinions in an unstructured way? What do they
think will happen? They probably believe their feedback will be ignored, and that may have happened in the past.
Here is an employee feedback process that was very successful in one plant:
• Establish a team to handle suggestions
• The team works with the suggester to quantify the benefits (make sure it is a plausible suggestion that has
value and will not negatively affect another area)
  • Financial benefits
  • Culture change or continuous improvement
• Once per week a meeting is held with senior management
  • Suggester leads the presentation (after coaching from the team)
  • Manager says “yes” or “no”; if “no,” explains why
• Suggester is involved in or leads the implementation (if possible)
• Suggester reports back on results of initiative
These suggestions can be related to anything: put a cover on a machine so the bearings don’t get wet during
wash-downs, store this part somewhere else because it hurts our backs to move it, change the position of this
switch, etc. It is important to involve the person who made the suggestion in its implementation, as they have
ownership over this idea and will get things done. This process will raise morale and improve reliability. When you
see results, be sure to recognize the person who suggested the improvement.
A BHP mine site tried the above process in Australia. They saw that the commodity price for iron ore was about to
drop, and they needed to do
something in order to avoid having to close mine sites and let staff go. They asked employees for suggestions on
improving the viability of the mine.
• They were amazed (and overwhelmed) at the response
• They received 750 ideas in the first seven days at just one site
• Across the iron ore business, they identified 4700 initiatives
• With the initiatives implemented in the past two years, they have saved A$1.6 billion in recurring costs
One of the changes concerned replacing the heavy gear oil in haul trucks: filling 984 liters at 4 liters per minute
in colder months takes a long time. The suggestion was to use the holding tank to heat the oil and hold it closer to
the truck to make it flow faster. The oil now pumps at 50 liters per minute, and the filling time was reduced from
about 4 hours to 20 minutes. This is an example of reducing waste: of time, in this case, and time equals money.
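The filling times follow directly from the volume and the flow rates quoted above. A quick check, using only those figures (the function name is hypothetical):

```python
def fill_time_minutes(volume_liters, flow_liters_per_minute):
    """Time needed to pump a given volume at a constant flow rate."""
    return volume_liters / flow_liters_per_minute

before = fill_time_minutes(984, 4)    # 246 minutes, a little over 4 hours
after = fill_time_minutes(984, 50)    # just under 20 minutes
saved_hours = (before - after) / 60   # time saved per oil change
```

The saving works out to roughly 3.8 hours per oil change, per truck.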
Some key points: This was not a top-down process. It was bottom-up, and it caught on because people saw
that changes were being made, their voices were heard, and their opinions were respected and acted upon. The
key to their success was empowering the frontline workforce.
VENDOR ANALYSIS
Let’s talk about vendor analysis. In the section on PMO, we discussed the fact that we had to scrutinize all
the PMs, including the PMs that come from the OEMs—the vendors. We should also scrutinize our selection
process. If a vendor is aligned with our thinking, in terms of the need for reliability, for acceptance testing, and
for inspections during overhaul, and the idea of condition-based and interval-based maintenance, we should
work with them. They will help us achieve better results in the future. There are a lot of other things to do before
vendor analysis, but keep it in mind in case the opportunity comes up.
This concludes the asset strategy section. This is the phase where we shift from reactive maintenance to world-
class, proactive precision maintenance, which leads to improved reliability and performance. We have to
scrutinize the decisions we’ve already made, and this is what the asset strategy allows us to do—prioritize, use
some logic, and make sure we are focusing where we need to. We will collect data that shows us where we have
poor reliability and which assets are giving us problems. With this strategy, we will achieve better results. Even if
you are already some way along your journey, make sure you have completed these steps.
MOBIUS INSTITUTE | ARP-A R-10: Work Management
This module will discuss work management, a broader topic than just planning and scheduling. The goal of the
work management program is to create a streamlined process for managing all maintenance work that reduces
costs (and waste) and ensures jobs are performed efficiently, in harmony with operations.
Work management is a fundamental requirement of a reliability improvement initiative. I do not believe you can
improve reliability without work management: planning and scheduling, as well as the other elements we will
discuss. Work management ensures the right jobs are being done, the jobs are following procedures, etc.
Ultimately, the goal of work management is to get the right people with the right skills doing the job at the right
time, with the right parts, and the right tools, in a safe manner. The job must be efficiently performed with
precision while following all procedures and meeting performance standards. Any discrepancies should be
reported so the next job will go smoothly. Make sure your work management program meets every one of these requirements.
tasks. We should have a procedure and a schedule in place. We also have the tasks that arise out of the CM
tasks, inspections, and observations. Workers need a way to generate work requests, and we need a way to
process and prioritize them, and then we need to plan the job. We may need certain spares, materials, tools, and
people with the right skills. We should ideally kit the job so the job is ready to go. We look at all job requests and
decide which ones to do, considering criticality and the Pareto analysis. Jobs must be done correctly according to
procedures and the set schedule. At the beginning of the program, it is a good idea to leave room in the schedule
for break-in work. Make sure you have procedures in place for dealing with work requests as they come in, and
for the break-in work. Finally, we have close-out and feedback—making sure the work was done properly with the
right tools, procedures, and materials.
PROCESS REQUESTS
Job requests will come in and they must be processed. These could be time-based jobs that come from the asset
strategy, work requests that result from condition monitoring tests, inspections, or observations by maintenance/
operations, or urgent break-in work.
JOB PLANNING
In the job planning stage, we look at the jobs that have to be done and make a plan so that they get done. Ideally,
planners are experienced in mechanical work and/or electrical work, and a different person would schedule the
jobs. We need people to create procedures detailing how the work should be done, how much time it should
take, the spares and tools needed, and the skills that the workers need. Otherwise, time will be wasted. These
jobs should be done with military precision, have the least impact on operation and production, cost the least
possible amount, and make the greatest use of the mechanical and electrical tradespeople.
Proper job planning saves money and increases safety. With experience, the time needed to plan the jobs will
decrease since we can consult previous job plans and their feedback. We should aim to plan at least 80% of all
jobs (note that break-in work is not classified as planned work). We should also aim for 90% schedule compliance.
Metrics like this give you a guide to how effectively the process is working. Ultimately, the goal is improved
performance via improved reliability, which is partially achieved by following job plans. Each planner/scheduler
should manage 15-20 technicians (30 technicians if planning and scheduling are managed by different people).
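The two targets above can be checked with simple counts from the work order system. The sketch below assumes that planned work is measured as planned jobs divided by all jobs, and schedule compliance as jobs completed as scheduled divided by jobs scheduled; your organization may define these differently, and the function names and sample counts are hypothetical.

```python
def planning_metrics(total_jobs, planned_jobs, scheduled_jobs, completed_as_scheduled):
    """Return planned-work and schedule-compliance percentages."""
    planned_pct = 100.0 * planned_jobs / total_jobs if total_jobs else 0.0
    compliance_pct = (100.0 * completed_as_scheduled / scheduled_jobs
                      if scheduled_jobs else 0.0)
    return {"planned_pct": planned_pct, "schedule_compliance_pct": compliance_pct}

def meets_targets(metrics, planned_target=80.0, compliance_target=90.0):
    """Compare the metrics with the 80% planned and 90% compliance targets."""
    return (metrics["planned_pct"] >= planned_target
            and metrics["schedule_compliance_pct"] >= compliance_target)

# Made-up week: 50 jobs in total, 42 planned (break-in work counts as unplanned),
# 40 scheduled, 35 completed as scheduled.
week = planning_metrics(50, 42, 40, 35)
on_target = meets_targets(week)
```

In this made-up week, 84% of jobs were planned but schedule compliance was 87.5%, so the week misses the compliance target.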
What is the normal, day-to-day function of a planner? They are like a stage manager, making sure everything goes
smoothly, and of course they have to manage interruptions. The scheduler deals with production, availability
of job plans, machines, and other resources, and makes things happen. Those are two very different ways of
working, and one person playing both roles will be less efficient.
There may be jobs that can be performed without planning if the task can be done quickly, the steps are well
known (make sure this is true and not a case of the job having consistently been done incorrectly!), and the parts
and tools are available.
WHY PLAN?
Do you think your department is too busy to plan? Planning is actually much more efficient. Take your best
technician(s) out of the pool of people who do maintenance work and have them do the planning and scheduling.
An EPRI study found that with a 40% increase in work planned, utilization increased between 20% and 40%.
Therefore, if the organization has 40 maintenance workers, a 20% increase is equivalent to 8 people. It is like
creating additional resources—and those resources can be doing proactive tasks.
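The "additional resources" figure is simple arithmetic on the utilization range quoted above. A short check (the function name is hypothetical):

```python
def equivalent_workers(workforce, utilization_gain_percent):
    """Extra capacity, expressed in people, from a gain in utilization."""
    return workforce * utilization_gain_percent / 100

low = equivalent_workers(40, 20)    # lower end of the quoted range
high = equivalent_workers(40, 40)   # upper end of the quoted range
```

For 40 maintenance workers, the quoted range corresponds to the equivalent of 8 to 16 additional people.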
The shorter the P-F Interval of an asset, the more a job is rushed, leaving little time to get the right people and
tools. We might also have to spend more money sourcing the spares and tools.
In contrast, with the asset strategy in place, as well as good condition monitoring, we have a lot more time and
flexibility to plan before there is any risk of the equipment failing. This minimizes the costs, safety risks, and
impact on downtime.
JOB SCHEDULING
The scheduler looks at the planned jobs, checks equipment availability with operations, checks the availability
of people skilled to do the work, and schedules the job. The scheduler must coordinate with operations to find
the most convenient time to perform the work. The scheduler must also coordinate with the maintenance
supervisor to determine who should perform the job.
If your plant is still in the reactive maintenance stage, you should strive to plan one day in advance. There will be
a lot of break-in work, but planning will help the other tasks get done as well. Over time, you should be able to
plan work at least one week in advance. The planner should be aware of the work being done. He or she should
periodically leave the office, talk to workers, and get their feedback as to the time allotted for the jobs.
The scheduler
• Determines the available craft hours
• Compiles a list of jobs
• Determines the remaining craft hours available
• Selects which jobs should be performed based on priority, equipment availability, etc.
• Selects additional jobs in reserve in the event that there is less break-in work
The schedule is then presented to the maintenance manager for approval and reviewed by operations and
production. The scheduler can then prepare the jobs (parts, permits, instructions, etc.).
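The scheduler's steps can be sketched as a simple selection routine. This is one possible simplification, not a prescribed method: it assumes each job has a numeric priority (1 is most urgent), an estimated duration, and a flag showing whether operations can release the equipment. The job numbers and hours are made up, and `build_schedule` is a hypothetical name.

```python
def build_schedule(jobs, available_hours, break_in_reserve_hours):
    """Fill the available craft hours by priority, holding back time for
    break-in work. Jobs that do not fit are kept as reserve jobs."""
    remaining = available_hours - break_in_reserve_hours
    scheduled, reserve = [], []
    for job in sorted(jobs, key=lambda j: j["priority"]):
        if not job["equipment_available"]:
            continue  # operations cannot release the machine this period
        if job["hours"] <= remaining:
            scheduled.append(job["id"])
            remaining -= job["hours"]
        else:
            reserve.append(job["id"])  # pick up if break-in work is light
    return scheduled, reserve, remaining

jobs = [
    {"id": "WO-1", "priority": 1, "hours": 24, "equipment_available": True},
    {"id": "WO-2", "priority": 2, "hours": 30, "equipment_available": True},
    {"id": "WO-3", "priority": 2, "hours": 16, "equipment_available": False},
    {"id": "WO-4", "priority": 3, "hours": 12, "equipment_available": True},
    {"id": "WO-5", "priority": 4, "hours": 8, "equipment_available": True},
]

# 80 craft hours available, 16 of them held back for break-in work.
scheduled, reserve, hours_left = build_schedule(jobs, 80, 16)
```

In this example WO-1, WO-2, and WO-5 are scheduled, WO-4 is held in reserve, and WO-3 waits until the equipment is available.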
Time will be wasted if the equipment is not available, or the technician has to hunt for parts or tools, or if they are
unclear about the steps required to perform the job. Maintenance, operations, and schedulers need to work very
closely together to make sure the technicians and the machine are ready at the same time.
JOB EXECUTION
And then the job must be executed. The goal here is to do it once and do it right. All jobs must be performed
safely, with precision, and following documented procedures.
As mentioned earlier, we have to deal with break-in work. Break-in work is urgent, so space must be left in the
schedule to deal with it. Check for pre-made job plans for that work, and do it as efficiently as possible. When
break-in work crops up, we must talk to operators to find out what happened and whether it has happened
before. If it is related to safety, stop and make sure you completely understand the situation. Complete all the
necessary steps required to control risks. Check that you have the manuals, drawings, parts, and tools that you
might need. Notify the appropriate people. We do not want to delay the work, but it is important to take a moment
to consider how to do the job as well and efficiently as possible. Break-in work must still be performed with
precision.
COMMISSIONING
After performing the work, we have to start the machine again. Commissioning
is a series of processes by which equipment is tested to verify its functionality according to its design objectives
or specifications. Do this while starting up the machine. Vibration analysts or other technicians may be involved
here.
The actual requirements for correctly commissioning equipment are beyond the scope of this course. As an
example, a large manufacturer recorded their OEE and saw that there were often problems during and after
shutdowns, partially due to incorrect commissioning. After they implemented correct commissioning processes,
those problems went away.
MOBIUS INSTITUTE | ARP-A R-11: Spares Management
This module is important for three reasons: we can reduce the costs associated with spares management, we can
make the planning and scheduling process much more efficient, and we can improve reliability by caring for our
spares. Our goals are to reduce maintenance costs, reduce the planner and scheduler’s workload, and increase
equipment reliability by minimizing our spares inventory, taking care of those spares, and ensuring we have quick
access to the right parts and materials when we need them.
Where does it fit into our work management process? When planning the job, we need access to the spares,
materials, and tools in order to work efficiently. A lot of time can be wasted here if these things are unavailable or
we cannot find them. Correctly managing the spares can reduce the planner’s workload by up to 70%.
Spares and materials make up 40-60% of all maintenance costs in most organizations—this amount of money
must be managed correctly. The holding cost alone could be 30% of the purchase price. Is there a tax burden
where you live because you have millions of dollars of inventory sitting around that is considered an asset? We
should only hold the spares that we really need. There is a temptation to keep spares “just in case” or because
you got a deal by buying in bulk—but you may never need those extra spares.
Spares management is important so that we can stop searching for spares that are not there or are in the wrong
place or hidden. They need to be stored in a convenient location so that we do not waste time fetching them. Also,
one person or department cannot keep the spares without the knowledge of others. Another important aspect of
spares management is to prevent the wrong spare from being used (and reused). We also have to make sure that the
spares are in perfect condition when they come off the shelf; they should not be sitting there degrading,
vibrating, and getting wet and dusty.
Inefficient spares management affects maintenance and reliability in many ways, such as technicians having to wait
for spares, search for them, and travel to collect them.
It is important to have an accurate database, control access to the
spares, keep the spares in suitable locations, and have a process to
select which spares will be kept in inventory. We may even sell off some spares to get rid of the liability and
free up space.
ACCURATE DATABASE
All available parts should be entered correctly. The database needs to be maintained so that people have
confidence in it. Otherwise, people will waste time searching for spares, hold spares in their own possession, or
order new spares when there may be spares in the inventory.
Ideally, there will also be an accurate master asset list, or equipment register, that uses a structured naming
system. We don’t want to hold spares for equipment we no longer have. We should also have an accurate bill of
materials.
CONTROL ACCESS
Ensure that spares cannot be used without the withdrawal being logged in the database. We cannot maintain an accurate
database if people can take spares or other materials without updating the database. To make sure of this,
make it easy to record this information. One option, especially useful for spares that are used frequently, is a
“supermarket” system of searching, accessing, and documenting use of spares and materials. Some companies
have “vending machines” for smaller spares.
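The principle that nothing leaves the store without a record can be illustrated with a very small sketch. It is not a model of any real inventory system; the class, method, and part names are hypothetical. The key point is that the stock level and the log are updated in the same step, so one cannot change without the other.

```python
from datetime import datetime, timezone

class SparesStore:
    def __init__(self, stock):
        self.stock = dict(stock)   # part number -> quantity on hand
        self.log = []              # one entry per withdrawal

    def issue(self, part_number, quantity, work_order, taken_by):
        """Withdraw spares and record who took them and for which job."""
        on_hand = self.stock.get(part_number, 0)
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        if quantity > on_hand:
            raise ValueError(f"only {on_hand} of {part_number} on hand")
        self.stock[part_number] = on_hand - quantity
        self.log.append({
            "part_number": part_number,
            "quantity": quantity,
            "work_order": work_order,
            "taken_by": taken_by,
            "time": datetime.now(timezone.utc),
        })
        return self.stock[part_number]

store = SparesStore({"BRG-6205": 4})                      # made-up part number
left = store.issue("BRG-6205", 1, "WO-1001", "tech-17")   # made-up job and user
```

A "supermarket" or vending machine approach achieves the same thing physically: taking the part and recording its use happen together.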
Think about the location where the spares are kept. If people have to travel long distances to get spares and
materials, a lot of time (and energy) is wasted. It also creates frustration and means the equipment is down
longer.
SELECTION PROCESS
Entire courses can be taught on the selection process. Establishing a selection process is extremely important.
You need balance: don’t hold too many spares, but make sure the necessary spares are there. We need a way of
understanding what spares we need, particularly for critical equipment. Through our asset strategy process, and
with the bill of materials, and understanding failure modes, we should be able to determine what parts we are
likely to need. Check if any of these parts are also needed by other machines.
It is possible to hold the parts on our shelves, but we could also explain to the vendors and suppliers that we
need fast access to certain parts and spares. They can keep them in stock. Spares will be much easier to manage
once you have reduced break-in work, since you will have a longer warning time.
Critical spares are not just the spares associated with critical assets. Remember that criticality is a combination
of consequence and likelihood of failure, and the likelihood is a combination of reliability and detectability. In
determining critical spares, we have to consider the failure modes, which parts are likely to fail, the likelihood of
that failure occurring, the lead time to failure, and the likelihood of us detecting it.
We also have to consider the probability that our assessment of reliability and detectability is correct. Say we
have a good vibration program and, because of this, we are confident that we will detect a developing fault. But
what if the analyst misses the fault for some reason? Analysts are human. Now we have a problem because
we thought we’d have enough lead time to order the spares. We have to be realistic about detectability. Are we
taking our measurements on schedule? Are we sure we are taking them frequently enough?
Consider which parts will be required if failure occurs: catastrophic versus detected, secondary damage versus
repair/overhaul. And consider redundancy: Unit B will operate if Unit A fails, but what if Unit B also fails due to a
hidden fault? You must track which spares are critical so that you do not run low. Some spares may be used in
multiple machines, including non-critical machines.
Consider the lead time for obtaining needed spares. Consider the ordering time, delivery time, repair time, impact
on production, availability of substitutes, possibility of repairing the existing part, and the possibility of having
the spare made locally, among other issues. It is extremely important that we get this right: we want to have the
right spares controlled in a database.
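The interplay between lead time, warning time, and detectability can be illustrated with a simple rule of thumb. This is a sketch under stated assumptions, not a selection method from this course: it assumes the worst-case warning time is the P-F interval minus the inspection interval, and that below some confidence in detection (0.9 here, an arbitrary threshold) we should not rely on warning time at all. The function and parameter names are hypothetical.

```python
def should_stock(lead_time_days, pf_interval_days, inspection_interval_days,
                 detection_confidence, min_confidence=0.9):
    """Return True if the spare should be held on the shelf (or by a vendor)."""
    if detection_confidence < min_confidence:
        return True  # we cannot count on seeing the fault in time
    # If a fault starts just after an inspection, part of the P-F interval
    # has already passed by the time we find it.
    worst_case_warning = max(0, pf_interval_days - inspection_interval_days)
    return lead_time_days > worst_case_warning

# Made-up cases for a part with a 30-day lead time and a 60-day P-F interval.
frequent_checks = should_stock(30, 60, 14, 0.95)   # 46 days of warning
sparse_checks = should_stock(30, 60, 45, 0.95)     # only 15 days of warning
doubtful_detection = should_stock(30, 60, 14, 0.6)
```

With frequent measurements the part can be ordered when the fault is detected; with sparse measurements or doubtful detection, the same part should be kept in stock.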
In conclusion, we can reduce the holding and purchase costs, improve work efficiency, and improve equipment
reliability. We do this by ensuring that we have the right spares, available when needed, in top condition, without
having too many spares. This is one of the keys to a successful initiative—actually, to a well-run business.
MOBIUS INSTITUTE | ARP-A R-12: Precision and Proactive Work
Our goal is to do everything possible to reduce the likelihood of future failures. Do the job correctly the first time
and take steps to eliminate root causes of failure (from a maintenance perspective).
The examples provided in this section mainly deal with rotating machinery, but the principles apply to all
equipment: electrical connections, transformers, steam traps, structures, extruders, spray booths, mobile
vehicles, mining equipment, and more.
PRECISION LUBRICATION
Precision lubrication is key to rotating machinery and hydraulic equipment. This should be one of the areas
you look at as early as possible in your reliability improvement initiative. A huge number of failures arise from
poor lubrication. Use the correct type and volume of lubricant and eliminate all forms of contamination. There
is probably no single action that improves reliability in the plant more than ensuring that machines (especially
bearings and gears) are lubricated correctly.
As bearings roll along the raceway, there is so much pressure at that point that the metal surfaces actually deflect
slightly. That is part of the design. As long as the bearing was chosen correctly, it should be made to withstand
the load. But that assumes the correct lubricant is being used in the correct volume. It protects the bearing. The
same is true of gearboxes. The load on the teeth as they mesh together should be bearable if they are properly
lubricated.
There is less than 1 μm between the rolling elements and the raceway (and between the gear teeth). If you look
under a microscope, those surfaces are rough. The only thing holding those surfaces apart is the lubricant. The
lubricant is under so much pressure, and the gap is so small, that it can temporarily turn into a solid. Bearings
are designed to last for a very long time if they are used and lubricated correctly. But what if there is
contamination (particles or liquid) or it is the wrong lubricant? That oil or grease has certain chemical
properties, which need to be correct for the application or else there will be wear,
Managing Contamination
We need to manage the contamination in grease or oil. It is very difficult to see whether oil has been
Particle Contamination
Now let’s look at particle contamination. As mentioned earlier, there is a gap of less than 1 μm between bearing
surfaces and between gear teeth. What would happen if a hard particle got into that gap? Because the gap is so small, only small
particles get in there. Rolling elements or gear teeth roll over the particle, and the surfaces are damaged. While
the indentation left by the particle is small, the rolling elements keep rolling over it, and it spreads and gets
worse. A big spall in a bearing may have started as a small indentation.
A study was performed on helicopter gearboxes and bearings, using filters of different sizes to clean the oil. They
began with 40-μm filters and moved on to 25-μm filters, with little improvement. The situation was a bit better at
10 μm, but not by much. It was at the 3-μm mark that significant results were seen. Make sure your filter is fine
enough to catch particles of this size. Those are the ones that get into tiny gaps and damage the bearings. Some
companies tried this and found their filters blocked with too many particles. They need to investigate where all
those particles are coming from.
We have to filter the oil before it goes into the machine, because the oil is probably already contaminated when
you purchase it. Keep the storage area clean so the oil does not get dirty. Eliminate the ingress of particles
during operation (using breathers or sight gauges, for example). Don’t contaminate the oil when samples are
taken. And filter the particles out of the oil in the working gearbox.
Particles can easily be transferred when dirty oil cans are used. Purchasing and using a proper oil container will
not help if you leave it sitting outside and getting dirty.
It is a good idea to color code the drums and the machines that use that oil. We can also use sight glasses to
clearly see the oil level and avoid the need for dipsticks, which can introduce contaminants. There are specialized
products out there to test the oil.
reliability issues with the motor, excess heat due to dirtiness may be the reason.
If any spout is tilting upward, rain and dust will get inside the machine.
Even if you work in a harsh, dirty environment, it is important to remove the dust from the motors periodically.
This is something that operators can do.
installation can damage the bearing. In this case, the lip was broken off. In another case, the bearing was
hammered into place, chipping the raceway.
When we replace a bearing, we can look at the wear patterns and see if it was misaligned, skewed, or damaged
when put into place, or placed under excessive load. If the inner race is cocked on the shaft at any angle, it puts
the raceways and the rolling elements under additional load, reducing the life. If the outer race is cocked, the
raceways are also put under load.
PRECISION ALIGNMENT
We have to look at alignment as well, especially when dealing with rotating machinery. Shaft alignment, belt
alignment, and soft foot correction are key to ensuring rotating machinery runs smoothly. Misalignment adds significant
load on bearings, gears, couplings, shafts, seals, and other components. Even if a machine looks like it is aligned,
it doesn’t mean it is precision aligned.
We can have an offset misalignment, and it will generate forces with every rotation that will reduce the life
of those components. These repetitive, once-per-revolution loads stress the shaft, the seals, the coupling,
the bearings, and the foundations. If there is a gap—an angle between the two shafts—it is called angular
misalignment, or gap misalignment. This also reduces the life of the machine with each rotation. The patterns on
the bearing will tell us what the case was, but we need to prevent it.
A study was performed in which it was found that the life of a bearing was reduced drastically in relation to the
angle between the shafts. At 10 minutes of a degree, the life was reduced to about 30%. At five minutes, the life
was reduced to about half. What do we mean by minutes? A full circle has 360 degrees, and each degree is divided
into 60 minutes. So an angle of 5/60 of a degree (about 0.08°) between the two shafts is enough to reduce the life
to one-half of the expected L10 life of a bearing. There is no way, with a straightedge or even dial indicators, to
achieve these tolerances.
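The arithmetic behind "minutes of a degree" can be checked with a short sketch. The function names and the 100 mm coupling diameter are assumptions made for illustration, not values from the study; the sketch only converts minutes of arc to degrees and shows how small the resulting coupling gap difference is.

```python
import math

def arcmin_to_degrees(minutes):
    """One degree is divided into 60 minutes of arc."""
    return minutes / 60.0

def gap_difference_mm(minutes, diameter_mm):
    """Difference in coupling gap across a given diameter for an angular
    misalignment expressed in minutes of arc."""
    angle_rad = math.radians(arcmin_to_degrees(minutes))
    return diameter_mm * math.tan(angle_rad)

# 100 mm is an assumed coupling diameter, chosen only for illustration.
for minutes in (5, 10):
    degrees = arcmin_to_degrees(minutes)
    gap = gap_difference_mm(minutes, diameter_mm=100.0)
    print(f"{minutes} min = {degrees:.4f} deg, gap difference {gap:.3f} mm over 100 mm")
```

Under these assumptions, five minutes of arc corresponds to a gap difference of roughly 0.15 mm across a 100 mm coupling, which gives a feel for how tight the tolerance is.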
Just because you have purchased a laser alignment system does not mean you have precision alignment. You
need to set tolerances and make sure the machine feet have proper shims. Your workers need to be motivated
to notice that last little thin shim.
Just because the vibration is not indicating a problem, it does not mean you have precision alignment.
Misalignment is tricky to diagnose.
The answer is to precision align the machines. Ask those doing the alignment work what alignment tolerances
they are using. Do they consider thermal growth? The machine can go out of alignment when it gets hot. What
do they do if they become bolt bound or base bound? What type of shims do they use, and do they measure the
thickness? How are they moving the machines?
If they are using dial indicators, we need to check the aforementioned issues and see if they handle “bar sag.”
Check for parallax/reading errors and calculation errors.
We need the belts and pulleys to be in alignment with the correct tension on the belts.
Soft Foot
What is soft foot? Ideally, when you put the motor on the base, the base itself is perfectly flat and the four feet of
the motor are on exactly the same plane. Soft foot is the condition where this is not the case. That could be because
the base itself is not flat, the feet are not on the same plane, the feet are bent, they have crud underneath them,
there are too many shims, the shims are bent, etc. Soft foot not only makes it much harder to achieve precision
alignment, but when you tighten the bolts you are actually distorting the frame. That can affect the tolerances
within, and in the case of a motor, it changes the air gap between the stator and rotor.
There may be a high spot on the foundation under one foot. When we tighten the other bolts, we will squeeze
the base down.
sensitive to unbalance. To balance the rotor, we do some tests and find out where the unbalanced weight is. If we
can remove it, we do so. If not, we add an equal weight on the opposite side. We can choose to balance it a little, or
we can balance it tightly, for example to balance grade G 1.0.
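To give a feel for what a balance grade such as G 1.0 means in numbers, the sketch below uses the commonly quoted relation from the ISO balance quality standards, in which the grade G (in mm/s) is the permissible specific unbalance multiplied by the angular speed. The function name, the 50 kg rotor mass, and the 3000 RPM speed are assumptions for illustration; consult the current standard before setting real tolerances.

```python
import math

def permissible_unbalance_g_mm(grade_mm_s, rotor_mass_kg, speed_rpm):
    """Permissible residual unbalance in g·mm for balance grade G (mm/s)."""
    omega = 2.0 * math.pi * speed_rpm / 60.0          # angular speed in rad/s
    specific_unbalance_um = grade_mm_s / omega * 1000.0  # mm to µm, i.e. g·mm per kg
    return specific_unbalance_um * rotor_mass_kg

# Assumed example rotor: 50 kg running at 3000 RPM.
for grade in (6.3, 2.5, 1.0):
    allowed = permissible_unbalance_g_mm(grade, rotor_mass_kg=50.0, speed_rpm=3000.0)
    print(f"G {grade}: about {allowed:.0f} g·mm residual unbalance allowed")
```

The tighter the grade, the smaller the residual unbalance that may remain, which is why the tolerance has to be set before the job and the result documented afterward.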
Bearings get pounded with every rotation because they are trying to restrain a shaft that the unbalance force is
moving in a circular motion. When the unbalance gets bad enough, the vibration will be high enough for analysts to take notice.
But for all the time the machine has been running before then, the whole structure has been vibrating, possibly
affecting local processes and product quality. We need to precision balance and align the machine, and if it goes
out of balance or alignment, we have to correct it before it gets too bad. A buildup of dirt can cause unbalance, so
the components will need to be cleaned before the machine can be balanced.
We can observe axial motion as well in an unbalanced machine.
Rotors can be balanced “in-situ,” where the rotor never leaves the machine. We take vibration readings and
balance the machine. Another way is to remove the rotor and “shop balance” it. You may be able to do this at
your site; if not, you can send it to be done. There are pros and cons to both methods. No matter what, you need
to set the tolerances and have evidence that those results were achieved.
PRECISION FASTENING
In this module we will talk about fastening. This will mostly deal with rotating machinery, but fastening also
applies to electrical equipment.
There is a right way and a wrong way to fasten two items together. If we apply too much torque, we can damage the
fastener or the parts being joined. Too little torque is also harmful, because the joint can work loose. If we do not
bolt things together correctly with the right combination of bolts,
washers, and so on, it causes problems. One may imagine those performing this work are doing it correctly, since
they have been doing it so long, but it is an opportunity to provide training. Be sure to not point fingers at anyone
for how they have been doing it, but just explain the right way and show why.
We can look at obvious issues of bolts failing. When doing electrical connections, we have to have the two
materials bolted together without the washers in between—we need the current to flow straight through.
If bearings are not bolted down tightly, they will rattle around with the vibration. If that machine were perfectly
balanced and aligned, it may just sit there and not rattle even if the hold-down bolts were loose, but in practice,
it is best to keep the bolts tightened to the correct torque. If workers are using torque wrenches, do they know
what the torque should be?
We can have other problems as well due to weakness, cracks, corrosion, or loose bolts that can cause, for
example, a running motor to sway. It increases the vibration and the vibration can lead to failure.
frequency. It is very common for pumps, fans, and other machines. The machine may sway, rock forward and
backward, bounce, twist, or exhibit another mode, none of which are good for the reliability and health of the
equipment. You can solve this problem by adding mass, stiffening the machine, changing the operating speed, or
in another way. But first we have to spot resonance and understand it.
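Why adding mass or stiffening the machine helps can be seen from the textbook single-degree-of-freedom model, in which the natural frequency is f_n = (1/2π)·sqrt(k/m). The sketch below is a simplification under that assumption; the stiffness, mass, and running speed are invented example values, and a real machine has many modes.

```python
import math

def natural_frequency_hz(stiffness_n_per_m, mass_kg):
    """Natural frequency of a simple spring-mass system."""
    return math.sqrt(stiffness_n_per_m / mass_kg) / (2.0 * math.pi)

def separation_margin(natural_hz, running_hz):
    """Fractional distance between the running speed and the natural frequency."""
    return abs(natural_hz - running_hz) / running_hz

running_hz = 1480.0 / 60.0                      # assumed running speed of 1480 RPM

base = natural_frequency_hz(5.0e6, 200.0)       # assumed original structure
stiffer = natural_frequency_hz(1.0e7, 200.0)    # stiffness doubled
heavier = natural_frequency_hz(5.0e6, 400.0)    # mass doubled

for label, fn in (("original", base), ("stiffened", stiffer), ("mass added", heavier)):
    print(f"{label}: fn = {fn:.1f} Hz, margin = {separation_margin(fn, running_hz):.0%}")
```

In this made-up case the original natural frequency sits within a few percent of the running speed, while either modification moves it well away.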
This is especially important for culture change and morale. A disorganized workspace sends the wrong message
when talking about reliability improvement. Workers should be able to clearly see what the procedures are and
where things go, and they should be able to find what they need and not trip over things. People should not be
allowed to put things in the wrong places. This will not happen overnight, but the process needs to begin. Putting
time into this (or anything) will create time in the future—more time for proactive tasks.
If you have tools hanging up or stored in a convenient way, workers can easily find the tools. It is also easy to see
if tools are missing.
Work instructions and procedures should be clearly displayed and illustrated at the place where that work is
being done. Shelves should be labeled, with arrows and colors for clarity. Add minimum and maximum values,
such as weight limitations. On the floor, mark where things are supposed to go. Don’t rely on people’s memories.
A visual workplace removes confusion and clutter. It ensures everyone knows what is right, what is wrong, and
whether procedures are being followed.
One creative idea used at a plant was to place an angled sign on top of the tool cabinet so that tools could not be
placed on top and workers were more likely to put them away.
Mark gauges with the desired pressure range.
In conclusion to precision and proactive work, every single proactive task you perform will reduce the likelihood of
future failures and will help us break free from the reactive maintenance cycle. Every maintenance task that is performed with precision also
reduces the likelihood of future failures. We have to create an environment where these tasks can be performed,
and where the right way to do things is enforced and encouraged.
MOBIUS INSTITUTE | ARP-A R-13: Condition Monitoring
The goals of our condition monitoring program are to monitor critical equipment to detect when the root
causes of failure exist, and to be forewarned when functional failure will occur. We utilize the condition-based
maintenance (CBM) philosophy to plan maintenance based on the condition of the equipment, not its age.
Where does CBM fit into our work management process? It comes at the beginning as part of our asset strategy.
Condition-based tests will be performed, and if the analysts find problems, they will submit work requests.
BASIC APPROACH
Condition monitoring is taking action based on evidence of reduced health or a root cause that will lead to failure.
We can find this evidence by using performance data, process data (temperature, flow, pressure, etc.), (ideally
nonintrusive) inspections, or observation of any indication of condition.
We need to use criticality analysis to determine whether monitoring is justifiable and whether multiple
technologies (or online systems) should be used. FMEA and RCFA, or common knowledge about common
components, will tell us which technologies are effective for which failure modes. We need to know, as closely as
possible, the P-F Interval, or lead time to failure, of a machine to determine the monitoring rate. In this module,
we will look at a number of technologies that we can use.
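One way to turn a P-F interval into a monitoring rate is sketched below. Testing at no more than half the P-F interval is a common rule of thumb rather than something stated in this manual, and the asset names and intervals are invented for illustration.

```python
def monitoring_interval_days(pf_interval_days, fraction=0.5):
    """Longest test interval for a given P-F interval.

    fraction=0.5 is an assumed rule of thumb: test at least twice
    within the lead time to failure.
    """
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    return pf_interval_days * fraction

# Hypothetical assets with estimated P-F intervals in days.
pf_intervals = {"cooling fan bearing": 90, "gearbox": 60, "pump seal": 14}

schedule = {name: monitoring_interval_days(pf) for name, pf in pf_intervals.items()}
for name, days in schedule.items():
    print(f"{name}: test at least every {days:.0f} days")
```

A short P-F interval on a critical asset quickly pushes the required test rate beyond what a route-based program can deliver, which is one reason to consider online systems.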
VIBRATION MONITORING
We will start with vibration monitoring. Vibration analysis has been successfully used for many years to detect the
nature and severity of the fault. As a wide range of faults develop, the vibration changes as a result. Challenges
include making sure you are taking the measurement frequently enough, that you are testing in the correct
locations, that you have the settings correctly set, that you are able to detect the changes when analyzing the
data, and that you recognize the fault condition conveyed in the vibration. This is not the easiest technology to
use. It takes training, skill, and experience to do it right. But the systems are becoming easier to use and guidance
is available. Machine learning systems are also becoming more effective. Especially with online monitoring, you
can have software look at the data and at least indicate if there is a significant change. Sometimes, it can even
diagnose the fault, assess the severity, make a recommendation, and send the information to the maintenance
management system so that the work order is generated.
Machines always generate vibration. When we measure vibration, what we are really looking for is change: increased
amplitudes, frequency changes, etc. Different failure modes generate different vibration patterns. If a fan is not
balanced or a pulley is eccentric, the
vibration will change in predictable ways. If the bearings are worn, there is a crack in the outer race, or there is
excessive clearance, the vibration will change in predictable ways. And if two machine components are coupled
together and there is wear in the coupling, or there is looseness, or too much force is being applied, or the shafts
are not aligned properly, the vibration will change in predictable ways.
Measuring Vibration
We have a sensor, which we place on the machine. The vibration is transmitted into that sensor, an electrical
signal goes into our device, and it can be presented to us either as a number that can be compared against
an alarm limit or shown on a chart, or as a more complicated pattern that we can look at as a time waveform or a spectrum.
There are different levels of complexity. Because a machine vibrates up and down, side to side, and axially, we
might place that sensor in a number of locations, on the motor and the pump, for example, and in all directions.
When selecting a place to put the sensor, we need a good transmission path between the components we are
testing and the sensor.
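The simplest presentation, a single number compared against an alarm limit, might look like the sketch below. The RMS calculation is standard, but the alert and alarm limits, the units, and the sample values are assumptions for illustration; real limits come from standards, the equipment vendor, or the machine's own history.

```python
import math

def overall_rms(samples):
    """Root-mean-square level of a block of vibration samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def classify(level, alert, alarm):
    """Compare an overall level against two assumed limits."""
    if level >= alarm:
        return "alarm"
    if level >= alert:
        return "alert"
    return "ok"

# Made-up velocity samples in mm/s: a pure sine with 2.0 mm/s peak amplitude.
readings = [2.0 * math.sin(2.0 * math.pi * i / 32) for i in range(256)]

level = overall_rms(readings)
status = classify(level, alert=2.8, alarm=4.5)   # limits are hypothetical
print(f"overall level {level:.2f} mm/s RMS: {status}")
```

A single number like this is easy to trend and alarm on, but it cannot say which component is responsible; that is what the waveform and spectrum are for.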
We also need repeatability, meaning we test the machine in the same way every time. If the machine is running
at a different speed or under a different load, or the sensor is not in the same place, then that will change the
measurement. The analyst is left wondering whether the change was in the machine’s condition or in the way the test was performed. Vibration
lets us see inside the machine. The vibration from the fan, the bearings, the rotor bars, and the rotor transmits
up through the bearings to where the sensor is. The vibration analysis process can then break that vibration up
according to the different components.
Vibration from the shaft creates a very simple sine wave. This is one frequency. If the machine is going faster,
the cycles bunch together. If there is more vibration, maybe due to unbalance or misalignment, that
once-per-revolution vibration goes up. If, for example, there are twelve blades on the fan, they generate twelve
cycles per revolution. If the blades are in good condition, this is only a little bit of vibration. The bearings themselves,
under good conditions, do not generate vibration. If there is a problem, they will generate vibration at a different
frequency. The vibration of all those components added together gives us our raw time waveform.
The Fast Fourier Transform (FFT) takes that waveform and breaks it up into the individual frequencies. In other
words, it turns the time waveform into a spectrum, in which the peaks represent those individual frequencies. This is
what vibration analysts do: look at the spectrum to see if the amplitudes of the peaks have changed and diagnose
the problem based on the pattern. The pattern relates to the type of fault. The amplitude relates to the severity.
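The step from time waveform to spectrum can be illustrated with a small sketch. It builds a waveform from a once-per-revolution component and a twelve-per-revolution blade component, then recovers both peaks. To stay self-contained it uses a plain discrete Fourier transform, which gives the same result as an FFT but more slowly; the sample rate, running speed, and amplitudes are assumed values.

```python
import cmath
import math

def amplitude_spectrum(samples):
    """Single-sided amplitude spectrum from a plain DFT (O(N^2), fine for a small demo)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        scale = 1.0 / n if k == 0 else 2.0 / n
        spectrum.append(abs(total) * scale)
    return spectrum

SAMPLE_RATE_HZ = 256   # assumed sample rate
SHAFT_HZ = 10.0        # assumed running speed (600 RPM)
BLADES = 12            # twelve blades, as in the text

# One second of data, so spectrum bin k corresponds to k Hz.
waveform = [
    1.0 * math.sin(2 * math.pi * SHAFT_HZ * i / SAMPLE_RATE_HZ)
    + 0.2 * math.sin(2 * math.pi * BLADES * SHAFT_HZ * i / SAMPLE_RATE_HZ)
    for i in range(SAMPLE_RATE_HZ)
]

spectrum = amplitude_spectrum(waveform)
peaks = [(hz, round(amp, 3)) for hz, amp in enumerate(spectrum) if amp > 0.05]
print(peaks)
```

The two peaks appear at the running speed (10 Hz) and at the blade-pass frequency (120 Hz), with heights matching the amplitudes that were put in. An analyst would watch for those heights changing over time.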
Bearings (and gears) can generate unique frequencies, and when they start to fail they generate high frequencies,
which we can detect. The amplitudes are low, so we will need special techniques. We can tell from the frequency
whether the problem is in the outer race, the inner race, or the rolling elements.
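The frequencies that point to the outer race, inner race, or rolling elements are usually estimated from the bearing geometry. The sketch below uses the standard kinematic formulas for a rolling element bearing with a stationary outer race. The geometry (9 balls, 7.94 mm ball diameter, 39.04 mm pitch diameter, zero contact angle) and the 25 Hz shaft speed are example values, not taken from this manual, and because real bearings slip slightly the measured peaks differ a little.

```python
import math

def bearing_frequencies(shaft_hz, n_balls, ball_d, pitch_d, contact_deg=0.0):
    """Characteristic defect frequencies (Hz) for a bearing with a stationary outer race."""
    ratio = (ball_d / pitch_d) * math.cos(math.radians(contact_deg))
    return {
        "FTF": 0.5 * shaft_hz * (1.0 - ratio),                           # cage
        "BPFO": 0.5 * n_balls * shaft_hz * (1.0 - ratio),                # outer race
        "BPFI": 0.5 * n_balls * shaft_hz * (1.0 + ratio),                # inner race
        "BSF": (pitch_d / (2.0 * ball_d)) * shaft_hz * (1.0 - ratio ** 2),  # ball spin
    }

# Assumed example bearing running at 25 Hz (1500 RPM).
freqs = bearing_frequencies(shaft_hz=25.0, n_balls=9, ball_d=7.94, pitch_d=39.04)
for name, hz in freqs.items():
    print(f"{name}: {hz:.1f} Hz")
```

Because these frequencies are generally not whole-number multiples of the running speed, peaks at these values stand out from unbalance, misalignment, and blade-pass peaks.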
If we look at the raw time waveform of a gear with damaged teeth, we can see spikes in vibration every time they mesh.
We can also use phase analysis—comparing the vibration vertically and horizontally—and we can see if the
motion is circular or elliptical. Phase analysis is a great tool for distinguishing between misalignment, unbalance,
and other problems.
ULTRASOUND
Now we will discuss airborne and structure-borne ultrasound. There is a simple way and a more sophisticated
way to approach ultrasound. Basically, the operator can wear headphones, listen to the sound, and look at the
meter that shows the amplitude level. With ultrasound, you are listening to frequencies that are so high that you
cannot normally hear them. The machines (steam traps, bearings, electrical equipment, and other applications) can
generate sound at frequencies above 20 kHz, which is above our ears’ ability to detect. The instrument transforms the
frequencies into ones we can hear, and that is what the operator is listening to. If the sensor makes contact with the
item being tested, it is structure-borne ultrasound.
The instruments also have a readout of the amplitude, which can also be used as a gauge: how “loud” it is, and
how it has changed from last time. The sensors and instruments come in different shapes, sizes, costs, and
capabilities.
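How an instrument can shift ultrasound into the audible range is sketched below, assuming it uses heterodyning (mixing the signal with an internal oscillator), which is a common approach. The manual does not say which method a given instrument uses, and the 40 kHz signal and 38 kHz oscillator are invented values.

```python
import math

def heterodyne(signal_hz, oscillator_hz):
    """Mixing two tones produces a difference and a sum frequency.
    A low-pass filter would keep only the (audible) difference."""
    return abs(signal_hz - oscillator_hz), signal_hz + oscillator_hz

def mixed_sample(signal_hz, oscillator_hz, t):
    """Product of the two tones at time t (what the mixer outputs)."""
    return math.sin(2 * math.pi * signal_hz * t) * math.sin(2 * math.pi * oscillator_hz * t)

def sum_and_difference_sample(signal_hz, oscillator_hz, t):
    """The same value written as difference and sum tones (product-to-sum identity)."""
    diff_hz, sum_hz = heterodyne(signal_hz, oscillator_hz)
    return 0.5 * (math.cos(2 * math.pi * diff_hz * t) - math.cos(2 * math.pi * sum_hz * t))

audible_hz, discarded_hz = heterodyne(40_000.0, 38_000.0)
print(f"audible tone: {audible_hz:.0f} Hz, filtered out: {discarded_hz:.0f} Hz")
```

The point of the sketch is only that a 40 kHz source can end up as a 2 kHz tone in the headphones while keeping its character, so changes in the source remain audible.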
Ultrasound can be used to listen to the bearing in order to detect the earliest signs of failure and for signs of poor
lubrication. Ultrasound can be used during the process of greasing a bearing. As grease is being applied, you can
hear the sound change, and you know the bearing has enough grease and you can stop before it has too much.
With airborne ultrasound, the sensor does not make contact with the machine. The operator waves the
instrument in the air to catch the highly directional sound waves (very convenient in a noisy factory). This is
especially helpful in finding leaks.
Different problems have unique sounds. Once a person is familiar with the sounds, they can determine the
problems. Ultrasound can be used in all types of applications: mechanical, electrical, and process. The sounds can
also be recorded and analyzed later.
rings, poor power supply, or problems with connections, and decide to use condition-based maintenance to detect
those problems. Or you can look at the history and how the machines are used in your plant.
A study was conducted in 1985 and found that 41% of motor failures were due to bearing failures, but 47% were
due to rotor and stator failures.
The way an electrical motor works is we apply voltage to the stator. That creates a rotating magnetic field. Current is
induced in the rotor, which is sitting in the middle of that magnetic field, turning it into a magnet which is attracted
to that rotating magnetic field, and it starts spinning. We apply three phases of voltage to the motor, and we want
that voltage to be smooth
and sinusoidal. If that magnetic interaction between the rotor and the stator is smooth, the current will be
smooth and sinusoidal. However, there are situations when the voltage applied to the motor is not clean. It can
have harmonics. There may be connection problems where the voltage on one of the phases is lower than the
others. There might not be balance between the three phases. If there is a problem with the stator or the rotor—
broken rotor bars, cracked end rings, etc.—then as the rotor turns, it changes the magnetic interaction, which
changes the current. We can see problems with the current signature, and we get sidebands in the spectrum. We
can perform these tests on one phase of the current or on all three, plus analyze the vibration.
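Where the sidebands are expected to appear can be estimated with a short calculation. For broken rotor bars, they are commonly expected around the line frequency at f_line·(1 ± 2s), where s is the slip. This relation is standard motor current signature analysis practice rather than something stated in this manual, and the 50 Hz, 4-pole, 1470 RPM motor below is an invented example.

```python
def synchronous_rpm(line_hz, poles):
    """Speed of the rotating magnetic field."""
    return 120.0 * line_hz / poles

def slip(line_hz, poles, rotor_rpm):
    """Fractional difference between field speed and rotor speed."""
    sync = synchronous_rpm(line_hz, poles)
    return (sync - rotor_rpm) / sync

def rotor_bar_sidebands_hz(line_hz, poles, rotor_rpm):
    """Expected lower and upper sidebands around the line frequency."""
    s = slip(line_hz, poles, rotor_rpm)
    return line_hz * (1.0 - 2.0 * s), line_hz * (1.0 + 2.0 * s)

# Assumed example motor: 50 Hz supply, 4 poles, running at 1470 RPM.
lower, upper = rotor_bar_sidebands_hz(line_hz=50.0, poles=4, rotor_rpm=1470.0)
print(f"slip = {slip(50.0, 4, 1470.0):.1%}, sidebands near {lower:.1f} Hz and {upper:.1f} Hz")
```

Because the sidebands sit so close to the line frequency, the current spectrum needs fine frequency resolution to separate them, and the motor should be under load so the slip is not close to zero.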
Oil analysis has been in active use for many years. Kits and mini-laboratories can be purchased for on-site
testing, and/or oil samples can be sent off-site for testing and analysis. Tests are performed on lubricating oils
(combustion engines and non-combustion rotating machinery) and on hydraulic oils. Analysts are looking for
three main things:
• Check the chemistry of the lubricant to make sure it has its additives and viscosity and make sure it is able
to do its job. Otherwise, we change it (condition-based oil changes)
• Check for contaminants: particles, water, fuel, soot, process material, etc.
• Check for wear: if there is metal-to-metal contact and pieces of metal are being shed, we can use oil
analysis to detect some of those particles
Pros and Cons
The pros of oil analysis are that we can understand the condition of the lubricant, detect contamination of the
lubricant, and detect wear of the machine’s components.
The cons are that it requires an investment to take the samples correctly, a cost is associated with the lab
service, the test results can be complicated (choose your lab carefully), and you must understand the technique’s
limitations.
INFRARED ANALYSIS
In this module we will talk about infrared (IR) analysis (thermography). There are a couple of tools we can use to
measure temperature change: a thermal imaging camera or a simple spot radiometer. As the spot radiometer is moved
around, its laser beam indicates the center of the area it is measuring. The thermal imaging view shows a
color-coded range of temperatures. The further we get from the target, the larger the area measured. This
will decrease the accuracy, as the tool will take the average reading of everything within that area. This is a tool an
operator can use to check the temperature of bearings and other components.
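How the measured area grows with distance can be estimated from the instrument's distance-to-spot (D:S) ratio. The 12:1 ratio, the distances, and the 50 mm bearing housing below are assumed example values; check the specification of the actual instrument.

```python
def spot_diameter_mm(distance_mm, ds_ratio):
    """Diameter of the measured area at a given distance for a D:S ratio."""
    return distance_mm / ds_ratio

def target_fills_spot(target_mm, distance_mm, ds_ratio):
    """True if the target is at least as large as the measured spot,
    so the reading is not averaged with the background."""
    return target_mm >= spot_diameter_mm(distance_mm, ds_ratio)

DS_RATIO = 12.0      # assumed 12:1 instrument
TARGET_MM = 50.0     # assumed bearing housing width

for distance in (300.0, 600.0, 1200.0):
    spot = spot_diameter_mm(distance, DS_RATIO)
    ok = target_fills_spot(TARGET_MM, distance, DS_RATIO)
    print(f"at {distance:.0f} mm the spot is {spot:.0f} mm wide, target fills spot: {ok}")
```

With these numbers the operator would need to stand within about 600 mm of the housing; further back, the reading blends in the cooler surroundings.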
Infrared cameras are becoming more affordable. We simply move the camera around to see a thermal image of the
components. However, it is easy to be misled into thinking that something is too hot or too cold because of the settings of the
device and the emissivity of the object. If you have components that are similar to each other and should be the
same temperature, it is easy to see if one is hotter than the other. However, check to make sure the heat is not
simply being reflected off something else. The component itself may not be hot. Something could look hot when
it is actually your reflection bouncing off the surface and coming back to the camera. Wind can affect the reading,
as can reflections from sunlight, humidity changes, etc. This technique looks very simple, but it is easy to make a
mistake and jump to the wrong conclusions.
You can even hook an infrared camera up to a smartphone (remember that you get what you pay for). We can
use infrared for mechanical applications, electrical applications, and others, see the temperature gradient, and
assess whether that temperature is acceptable. If not, what is causing the change? Some cameras combine the
visual and thermal images so you can see the components more clearly.
If you have similar components that should all be the same temperature and one is clearly hotter than the
others, check for a problem.
VISUAL INSPECTIONS
This is the last of the series on condition monitoring, and we will discuss visual inspections, performance
monitoring, and non-destructive testing. By visual inspections, I mean any sort of observation a human can make.
Aside from technology, we should use our eyes, nose, hands (when safe), and ears to detect problems. We can
do this in two ways. First, as part of a preventive maintenance task, a person can go out and perform a visual
inspection of the equipment, ideally looking for something specific. Second, any time we perform a condition
monitoring task (vibration, infrared, or whatever), the technician should perform visual inspections as well. Listen
to the machine. Look around the machine for water, oil, or a steam leak. Are the bolts loose, is there cracking,
or are there rubber particles underneath the coupling? Is there an unusual smell? If it is safe to do so, touch the
bearings. That information can lead to work requests and/or help the technician diagnose the problem. A slow
beat from a motor, for example, can be difficult to detect through vibration analysis, but it is audible.
When you start your reliability program, you should be walking through the plant and using your senses to
check for yourself. Aside from visual inspections of the machines themselves, watch how the technicians are
performing their tasks and make sure everyone is following the 5S guidelines. If we sense something unusual,
such as a burning smell or a hot bearing, we can avoid failure and waste.
Visual inspection can be part of a PM program; however, it is essential that the goal of the inspection is clear and
that information can be recorded rather than checked off. Don’t tell the technician to check the temperature. Tell
them to make sure the temperature is between 25° and 32°C, or to record the temperature. The trouble with logs
is that they often go unchecked. Everyone should be encouraged to record and report observations.
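A recorded reading with an explicit acceptance range, as recommended above, could be captured as sketched here. The 25 to 32 °C range comes from the text; the record structure, field names, asset tags, and readings are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class InspectionReading:
    asset: str
    parameter: str
    value: float
    low: float
    high: float

    @property
    def in_range(self):
        """True when the recorded value is inside the acceptance range."""
        return self.low <= self.value <= self.high

def needs_follow_up(readings):
    """Readings outside their range, so they are acted on rather than filed away."""
    return [r for r in readings if not r.in_range]

# Hypothetical readings from one inspection round.
readings = [
    InspectionReading("P-101 motor DE bearing", "temperature_c", 29.5, 25.0, 32.0),
    InspectionReading("P-102 motor DE bearing", "temperature_c", 36.0, 25.0, 32.0),
]

flagged = needs_follow_up(readings)
for r in flagged:
    print(f"{r.asset}: {r.parameter} = {r.value} (expected {r.low} to {r.high})")
```

Recording the value instead of a check mark also makes it possible to trend the reading, and an automatic out-of-range list addresses the problem of logs that nobody reviews.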
PERFORMANCE MONITORING
It is easy for us to get wrapped up in the technologies, but we should not forget performance monitoring. This is
part of condition monitoring, both in terms of assessing the condition of the equipment and diagnosing the fault.
We improve reliability to improve performance. A change in performance is an important thing for us to take
NON-DESTRUCTIVE TESTING
Non-destructive testing (NDT) is a set of noninvasive techniques used to determine the health of the equipment
or to take a measurement. Most of the techniques mentioned so far are non-destructive tests. There are a
number of NDT techniques:
• Magnetic particle testing (MP)
• Liquid penetrant testing (PT)
• Radiographic testing (RT)
• Ultrasonic testing (UT)
• Electromagnetic testing (ET)
• Laser testing methods (LM)
• Leak testing (LT)
The ultrasonic testing listed here is different from the ultrasound we have discussed. Infrared is also NDT. We
want to detect cracks, corrosion, or anything that indicates a developing problem that must be dealt with.
Magnetic particle inspection detects surface and shallow subsurface discontinuities in ferromagnetic
materials. A person applies a magnetic field and then uses the applicator to check if there is a problem.
Liquid penetrant testing is used to find cracks and bad welds. The first step is to clean the part. The second is to
apply the penetrant, which will sit on the surface of the material and go into any cracks that are there. We then
re-clean the surface, leaving the penetrant in the crack. We apply a developer, which brings the penetrant back to
the surface so we can see where the crack is located. With a fluorescent penetrant, ultraviolet light may be used to make the crack visible.
Radiographic testing involves the use of either x-rays or gamma rays to view the internal structure of a
component. The rays go through to a receiver on the other side that detects the rays, and if there is a crack or
another problem, we can see it. These systems have become more sophisticated, and even automated, so this
MOBIUS INSTITUTE | ARP-A R-14: Breaking Out of the Reactive Maintenance Cycle of Doom
We have talked about several techniques so far for breaking out of the reactive maintenance cycle. Now we will
bring them all together. You will not have success in terms of reliability and performance improvement unless
you can break out of this reactive maintenance cycle. It is hard to break out unless you take specific steps.
What is the reactive maintenance cycle of doom?
We suffer preventable failures. Resources are consumed by those breakdowns, making it difficult
to find time and technicians for proactive work. Because we are in a rush, repairs are performed poorly, or
temporary repairs are done. Therefore, there is a lot of repeat work and no RCFA to determine why the failures
are happening. No action is taken to prevent them from recurring. Some people may realize the cause of the
failures, but their suggestions are pushed aside. There are headcount and budget reductions. Morale in the plant
declines and standards drop. The backlog grows and PMs are missed. As a result, preventable failures occur as
STEP ONE
First, change the reliability culture. This is not easy because you will have to get everyone’s support, but it is
necessary. Eliminate the resistance and create bottom-up drive.
People need to understand how they benefit if reliability improves. They need to understand how they can
get involved in the improvement process, and finally, they need to be involved in the process—ask for their
suggestions and their opinions on how to implement their suggestions. We discussed the brown-paper review,
which is a way to get people from all departments involved in the process.
Ensure management are on board. To do this, you will need a strong business case, and education helps. Get
some wins on the board: start with a few visible projects, publicize the benefits, and reward the participants.
STEP TWO
Second, find the low-hanging fruit. Which assets are failing the most? Focus your attention where you will gain
the greatest impact.
STEP THREE
The third step is work management. Get more work done by the same team. Do a better job and reduce future
problems. Reduce costs and improve safety.
Take your best person off the tools and have him or her plan and schedule jobs. Plan at least one day in advance.
The planner should not have any other duties (that person’s focus will make everyone else more efficient).
Each planner/scheduler should manage 15-20 maintenance technicians. Planned and scheduled jobs are more
efficient than unplanned jobs. With a 20% gain in efficiency, you effectively gain 8 people on a 40-person team.
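The staffing arithmetic can be laid out as a quick sketch. The 20% gain, the 40-person team, and the 15-20 technicians per planner come from the text; the function names are hypothetical, and the calculation is a rough planning aid rather than a staffing rule.

```python
import math

def equivalent_headcount_gain(team_size, efficiency_gain):
    """Extra capacity, expressed in people, from an efficiency gain."""
    return team_size * efficiency_gain

def planners_needed(technicians, technicians_per_planner):
    """Planner/schedulers required, rounded up to whole people."""
    return math.ceil(technicians / technicians_per_planner)

team = 40
gain = equivalent_headcount_gain(team, 0.20)
print(f"{team} technicians at +20% efficiency is like gaining {gain:.0f} people")
print(f"planners needed at 20 per planner: {planners_needed(team, 20)}")
print(f"planners needed at 15 per planner: {planners_needed(team, 15)}")
```

Even after taking two or three people off the tools to plan, the team comes out ahead in this example, which is the argument for dedicating the planner role.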
Operations must contribute and buy in to the maintenance plan. Mend any fences that need to be mended so
you can all work together. Note that you do not need a fancy computerized maintenance management system (CMMS) at this
stage. It may overburden your team. You need a system, but it does not have to be that sophisticated.
STEP FOUR
Fourth, we need communication and cooperation to ensure planned jobs get done and that there is a focus on
adding value.
Maintenance needs to see things through the eyes of operations. Operations is there to produce a product
that generates revenue. None of us exist unless operations provides the product or operates the equipment
to provide the service. Operations gets frustrated when they miss targets due to equipment failure (or to
maintenance).
Likewise, operations needs to see things through the eyes of maintenance. Maintenance needs access to
machines today so they are available (and safe) tomorrow. Operations needs to understand how to operate the
equipment properly so it does not generate future failures.
As part of this, we need efficient morning meetings to lay out the plan. We need cooperation and agreement on
the maintenance plan. Afterward, we need feedback on whether the plan was correct. And we need to institute
standard operating procedures.
STEP FIVE
Fifth, we need to do everything to eliminate the root causes of failure. We really need a laser focus on this.
If we can identify through Pareto analysis which equipment is causing us the most problems, and find out why that
is happening, we can start eliminating the root causes. To do this, we also need to understand criticality, do a bit of
RCFA, and understand through common knowledge that lubrication and shaft alignment, for example, are essential.
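A Pareto analysis can start from nothing more than a list of which asset each breakdown was recorded against. The sketch below is a minimal illustration in Python. The asset names and failure counts are invented for the example, and the 80% cut-off is the usual Pareto rule of thumb rather than a figure from this course.

```python
from collections import Counter


def pareto_assets(failure_log, cutoff_pct=80):
    """Return (asset, failures, cumulative %) for the assets that together
    account for at least cutoff_pct of all recorded failures."""
    counts = Counter(failure_log)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for asset, failures in counts.most_common():  # most failures first
        cumulative += failures
        ranked.append((asset, failures, round(100 * cumulative / total, 1)))
        if cumulative * 100 >= cutoff_pct * total:
            break
    return ranked


if __name__ == "__main__":
    # Hypothetical breakdown records: one entry per failure.
    log = (["Pump P-101"] * 12 + ["Conveyor C-3"] * 8
           + ["Fan F-7"] * 3 + ["Mixer M-2"] * 2)
    for asset, failures, cum_pct in pareto_assets(log):
        print(f"{asset}: {failures} failures, {cum_pct}% cumulative")
```

In this made-up log, two of the four assets account for 20 of the 25 failures, so those two are where root cause work would start.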
We need to understand criticality so we can deal with the machine that has been wasting our time, costing a lot
of money, and generating a lot of waste. This will allow us to make substantial improvements within 12 months.
To break out of the reactive maintenance cycle, it is crucial that we assign one person to perform nothing but
proactive work. Take another one of your best, most positive people off the tools for this. Otherwise, the failures
will keep occurring.
Start prioritizing and instituting proactive tasks. Make sure bearings are installed properly, machines are
aligned properly, electrical connections are made properly, etc. Use the PMO technique to eliminate unnecessary
PMs. Do this even if you are not ready or able to do the full RCM at this time.
Follow the 5S system and the visual workplace. Getting things clean and organized has a psychological benefit
and also makes things more efficient. Organize and label the storage areas, including the lubricant storage, and
make sure the oil levels are easy to see. Can workers tell what the pressure level should be?
STEP SIX
Sixth, utilize condition monitoring. Initially, we can use basic techniques to detect failures that will occur in the
near future to begin to break the back of reactive maintenance. Feed the findings into the planning and scheduling
process.
Start with a small program internally, since you may not have the skills or budget for more. Handheld vibration
analysis, ultrasound, or simple IR are good places to start. Alternatively, use outside consultants for more
sophisticated testing and troubleshooting.
You will have to make a decision about which technologies to use. Criticality shows us which machines to test,
and common knowledge will tell us which technologies are best for that equipment and its failure modes.
MOBIUS INSTITUTE | ARP-A R-15: Continuous Improvement
Continuous improvement is an important part of the reliability improvement initiative. We will not get it right on
day one, so we have to improve the program. But you may wonder when the reliability improvement program
ends. The answer is “Never.” If we take the focus off these reliability improvement initiatives, we will slip back into
old habits.
Take a leaf from the “safety” book. Companies do not have a three-year safety program and then finish thinking
about safety. Reliability improvement is a living program.
We continue to look for opportunities to improve or, at the very least, make sure we do not lose ground. We
have to continue to refine and review our understanding of the business, the criticality ranking (it may have
changed due to resolved problems), and the reliability strategy. In another module we assessed our strengths
and weaknesses, and we also looked at the business: what it was trying to achieve, our constraints, our risks, and our
opportunities. However, the business changes, as do economic conditions, the availability of capital, the competition,
etc., so we need to keep up.
Continue to perform RCFA to learn from failures and make further improvements. Record KPIs, monitor progress,
and occasionally set new targets (auditing and benchmarking). And continue to educate. People forget things and
lose their awareness of issues, and their skills need to be refreshed. Employees change their positions, and new
employees come in.
We need to continue to communicate the wins and the mistakes—we need to learn from mistakes and be
encouraged by the wins. This is also important so that senior management sees the value in what we are doing
and so that people on the plant floor continue to be energetic.
We need to improve the results achieved and sustain the momentum of the reliability improvement initiative.
There are leading and lagging KPIs. Lagging metrics look backward. They measure the effect your program had
in the past. For example, mean time between failures (MTBF) deals with the failures that you have experienced
in the past. Maintenance costs also deal with things that happened in the past, but they are not necessarily an
indication of what we can expect in the future.
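As a worked example of a lagging KPI, MTBF is commonly calculated as total operating time divided by the number of failures in the same period. The sketch below assumes that definition (your site may define it differently) and uses made-up figures.

```python
def mtbf(operating_hours: float, failures: int) -> float:
    """Mean time between failures: operating time divided by failure count."""
    if failures <= 0:
        raise ValueError("MTBF needs at least one recorded failure")
    return operating_hours / failures


if __name__ == "__main__":
    # Hypothetical history: 4,000 operating hours with 5 failures last year.
    print(mtbf(4000, 5))  # 800.0
```

The result describes last year only. It says nothing about whether the causes of those five failures have since been removed, which is why it is a lagging measure.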
Leading KPIs provide an indication of what you can expect to achieve in the future. If our oil is clean, we can
expect fewer failures. If condition monitoring tasks are performed on schedule, we can expect improvement. Also
look at the current number of planned jobs.
What should we measure? We need safety-related KPIs (this might relate to OSHA in the US). There are a number
of maintenance-related KPIs, including availability, utilization, OEE, PM compliance, and schedule compliance. We
spoke about total capacity near the beginning of the course. Those issues that lead to reduced capacity can be
measured with KPIs.
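Several of these KPIs reduce to simple ratios. The sketch below uses commonly cited definitions: availability as uptime over total time, OEE as availability × performance × quality, and compliance as completed work over scheduled work. These definitions and all of the numbers are assumptions for illustration, so agree on your own equations before adopting them.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of the period in which the asset was able to run."""
    return uptime_hours / (uptime_hours + downtime_hours)


def oee(availability_ratio: float, performance_ratio: float, quality_ratio: float) -> float:
    """Overall equipment effectiveness as the product of its three factors."""
    return availability_ratio * performance_ratio * quality_ratio


def compliance(completed: int, scheduled: int) -> float:
    """PM or schedule compliance: completed jobs over scheduled jobs."""
    if scheduled <= 0:
        raise ValueError("No jobs were scheduled in the period")
    return completed / scheduled


if __name__ == "__main__":
    # Hypothetical month: 720 h running, 80 h down.
    a = availability(720, 80)
    print(f"Availability:  {a:.1%}")
    # Hypothetical performance and quality ratios.
    print(f"OEE:           {oee(a, 0.95, 0.99):.1%}")
    # Hypothetical PM program: 45 of 50 scheduled PMs completed.
    print(f"PM compliance: {compliance(45, 50):.1%}")
```

Keeping each KPI as a small, explicit function makes the agreed equation visible to everyone who reads the report.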
There are a few “dos” and “don’ts” with KPIs. Be careful what you measure because you tend to get what you
measure.
• People tend to focus on achieving goals, potentially to the exclusion of all else
• You want the KPIs to be indicators of the desired outcome (they should be aligned with your strategy and
what the business wants to achieve)
• Only use KPIs based on measurable metrics (and ensure that everyone agrees on the equation)
• Don’t measure too many things
• Balance your metrics—don’t just focus on one area
• Update them at least annually
• Make sure people do not adjust goals or activity just to meet KPIs
CONTINUAL EDUCATION
Continual education is also important in continuous improvement. We provide training so people have the skills
and awareness, and so that they buy in to the program.
Donald Rumsfeld famously said, “There are known knowns; there are things we know that we know. There are
known unknowns; that is to say, there are things that we now know we don’t know. But there are also unknown
unknowns—there are things we do not know we don’t know.” Some people may think they are doing something
correctly, but what they think they know may not be correct.
Are you sure you know what you think you know? And what about everyone else in the organization? The following
quote has been attributed to Mark Twain, among others: “It ain’t what you don’t know that gets you into trouble.
It’s what you think you know that just ain’t so.”
Hopefully you have learned some things from this course. But think about a course you took in the past: how
much did you remember about the course two days later? Six weeks or six months later? Our memories may
be faulty. We have to refresh our knowledge with training, conferences, books, e-learning, etc.
Our memory is not perfect, and if we don’t always use it, we may lose it. Think of how often people are retrained
in safety topics and procedures. Therefore, we need to refresh our memories and improve our knowledge frequently.
Reliability is a long journey and the review process will enable us to take stock of what has been achieved and
what comes next.
Managers often worry, “What if we train people and they leave?” They should ask, “What if we don’t train them and
they stay?” We have problems when people do not understand the technologies, the principles, etc. If you are
worried about people leaving, you may need to pay them more.
In conclusion, reliability improvement is an endless process, just like safety improvement. If you relax for a
minute, the plant may slip back into its old habits and you will once again experience poor performance. It is
therefore essential that you measure, analyze, and communicate continuously.
MOBIUS INSTITUTE | ARP-A R-16: Implementation Strategy
This module assumes that you now understand the reliability improvement process, including developing the asset
strategy, justifying the program, and so on. You are now ready to take the exam if that is your desire. What we
would like to do in this module is tell you about our method for implementing the program. Different
companies have different approaches to implementation.
I briefly mentioned the implementation roadmap, or our “reliability success master plan.” If you follow the steps
and stages on the roadmap, you will be successful.
Let’s address the elephant in the room. How can a single roadmap fit everyone’s situation?
Everyone’s situation is different: different industries, different people, different management, different goals,
different implementations, different ages of plant, different regulations, and so on. But to a large degree,
everyone’s situation is the same. We all need management support to make this work. You must achieve a
culture of reliability. You must understand your organization’s goals. You must know what you are good at, and
you must recognize your weaknesses. You must eliminate the root causes of failure. You must measure and sell
the progress of your program—even though you may be great at your job, you cannot trust that management
will see your value unless you continue to sell yourself.
Most implementations struggle in the same places: trying to focus on technical solutions, focusing too heavily on
reliability analysis (instead of reliability improvement), leaving it to outside consultants, doing a bit here and a bit
there without a plan, trying to force reliability down people’s throats, improving reliability for reliability’s sake, etc.
We developed this implementation roadmap to make sure we do not fall into those traps and that we deal with
all the things that need to be dealt with for a successful strategy.
We need information, we need time, and we need maintenance under control so that we can plan
properly. We must break out of the reactive maintenance cycle or nothing else will work. To do this, we must
lay the groundwork. These steps are setting us up for success. We need the right support and to get the people
on board.
The beginning is where we determine why the business needs to change. We assess where we are, understand
the business needs, and determine what the gap is. We set the KPIs and establish the business case. We then
make sure we have internal support and senior management support.
In order to do that, we develop pilot projects to prove that reliability works. We choose visible projects, get the
right people involved, take on the projects, and measure the benefits. We will use our initial success to refine and