2022 Maintenance Pillar Handbook (180-240)

MAINTENANCE PILLAR BOOK
Select Process Area and Team

Previously, the focus machine has been selected using a structured analysis of downtime. However, if
Phase 2 has been implemented well, reliability should be very high, and it will be meaningless to select
the focus machine by looking at downtime. The focus machine needs to be selected by looking at waste
produced by the machine e.g. can losses, bottle breaks, oxygen out of spec, beer loss.
Once the machine or process area has been selected, then the team can be selected. The team
consists of the people who work in that area. It is important that a process expert such as a process
engineer is included in the team. The process expert will coach the operator on the process.
Train the Operator on the Process

The operator is trained on general process engineering principles such as heat and mass transfer,
thermodynamics, fluid mechanics etc. Further, the operator is trained on process control systems and
on how process equipment works (e.g. heat exchangers, control valves etc.).
The operator is then trained on their equipment’s specific process, e.g. bottle washing, lautering or
CO2 liquefaction. The theoretical training that was provided earlier to the operator will allow the operator
to better understand their equipment’s process.
Process Analysis
In this step, the equipment or process area is analyzed to understand how product quality and waste are
related to process inputs and machine condition. For example, consider the jetter process on the filler.
This process is analyzed by the operator to understand how process inputs (in this case jetter water
temperature and jetter water pressure) and machine conditions (in this case jetter nozzle diameter)
impact product quality (in this case TPO) and waste (in this case filler beer loss).
The operator will map out the entire machine or process area in order to understand how the various
machine processes and conditions affect product quality and waste. Additionally the process input limits
and machine condition limits are defined. For example, what is the required temperature range for jetter
water?
Define Process Control Points

Once the process and machine condition maps are developed and understood by the operator, the
operator will then need to define the process and machine control points. For example, what must be
controlled in order to keep jetter water temperature within its limits, or what must be controlled to keep
jetter nozzle diameter within its limits?
As a result of this analysis the operator will define for the process:
• Machine conditions that have to be sustained via maintenance (cleaning, inspection or
lubrication)
• Control points of the process that must be managed to ensure the process runs perfectly. This is
done through inspection standards and/or by using process control charts.
Confidential ©AB InBev 2021. All rights reserved. Page 180 of 282
Manage Equipment Conditions and Process Control Points

The process and machine conditions critical to maintaining product quality and minimizing waste were
defined in the previous step in terms of improved cleaning, lubrication and inspection standards and
process control charts. In this step, both the enhanced cleaning, inspection and lubrication standards
and the process control charts are implemented by the operator. The implementation is monitored to
check that the process and equipment are properly controlled and that waste and quality defects are
largely eliminated.
Implement Enhanced Maintenance Techniques

The next step aims to transition some of the operator cleaning, inspection and lubrication standards to a
condition-based approach. For example, a cleaning standard may require that a plate heat exchanger is
opened and cleaned once per year. This can be changed to a condition-based standard where data is
gathered on the fouling of the heat exchanger and the pressure drop across it. In this way
a complex cleaning and inspection task can be converted into a simple condition monitoring task
where the pressure drop across the heat exchanger is monitored.
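The heat exchanger example can be sketched in a few lines of Python. This is only an illustration of a condition-based trigger: the function name, the pressure-drop limit and the readings are hypothetical and do not come from this handbook.

```python
def cleaning_due(pressure_drops_kpa, limit_kpa):
    """Return True when the latest pressure-drop reading has reached the limit.

    pressure_drops_kpa: readings in time order (oldest first).
    limit_kpa: hypothetical pressure drop at which fouling requires cleaning.
    """
    if not pressure_drops_kpa:
        raise ValueError("at least one reading is required")
    return pressure_drops_kpa[-1] >= limit_kpa


# Hypothetical weekly readings across a plate heat exchanger, in kPa.
readings = [42.0, 44.5, 47.0, 51.5]
print(cleaning_due(readings, limit_kpa=50.0))
```

The exchanger is opened only when the monitored condition calls for it, rather than on a fixed annual schedule.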
Other more traditional condition monitoring tasks such as simple oil sampling, vibration and
thermography can be transferred to the operator.
Once this is complete, a new focus machine can be selected and the entire process repeated.
4.10 Maintenance Program Development

Introduction
The purpose of maintenance plan development is to provide a structured methodology for the creation of
preventative and predictive maintenance routines. There are many different approaches to do this and
this policy aims to harmonize the various approaches into one logical process.
Policy
Maintenance program development calls for the use of several techniques (FMECA, RCM etc.) for the
development of preventive and predictive maintenance programs. Each site is expected to implement
these techniques as per the maintenance handbook.
The job matrix for maintenance supervisors, engineers and managers needs to reflect the key skills in
this area:
• Using OEM manuals to develop basic preventive and predictive maintenance schedules
• Using the outcomes of problem solving and cleaning & tagging to continuously improve
preventive and predictive maintenance schedules
• Using Abridged RCM to develop preventative and predictive maintenance schedules for B
criticality equipment
• Using FMECA and RCM to develop preventative and predictive schedules for A criticality
equipment
Each site is expected to develop structured check sheets for FMECA and RCM analysis. These need to
be aligned with the standards defined in the maintenance handbook.
Procedure
A comprehensive description is provided in the “Detailed Implementation Guide” sub-section on how to
implement FMECA and RCM. This approach has been linked to criticality analysis thereby providing an
integrated method for developing strong preventive and predictive maintenance programs.
Responsibilities
It is the responsibility of the site Technical Services Manager to ensure that Maintenance Program
Development is implemented on site, including compliance with the requirements of the maintenance
handbook, documenting the analysis using appropriate check-sheets and ensuring that employees are
skilled in the procedure.
It is the responsibility of planners and maintenance supervisors to ensure that Maintenance Program
Development is carried out.
FMECA Process RACI
Technical Services Technical Services

Task Training Manager
Manager Supervisor
Create Maintenance Program Development (FMECA/RCM) skills: A CI R
Link to maintenance system: A R
Develop FMECA check sheets: A R I
Link to department training plan: A CI R
R = Responsible
A = Accountable
C = Consulted
I = Informed
Note: Changes in plant organizational structure may change RACI.
Detailed Implementation Guide

The development of preventive and predictive maintenance schedules is one of the cornerstones of a
maintenance system. If done correctly, an efficient and effective maintenance program is created. If
done poorly however, significant resources can be tied up performing maintenance activities that have
limited benefit.
The logic of developing a preventative maintenance program is outlined in Figure 42 below:

• Our initial aim is to perform all the maintenance that the OEM recommends. These
recommendations (preventive or predictive) are captured in the CMMS.
• Despite performing the OEM recommended maintenance schedules, failures will still occur.
Problem solving will indicate which category of the 6M these problems fall into. Those that are
machine related will often (but not always) require an improvement to the existing predictive or
preventative maintenance schedule. Additionally, as cleaning and tagging is performed, those
tags with a machine root cause will also result in improvements to the maintenance schedule.
Routine reviews of maintenance schedules may result in improvements to the maintenance
schedule. In this way through problem solving, tagging and routine reviews, preventive and
predictive maintenance schedules are continuously improved.
• As the autonomous operations program is implemented, cleaning, inspection and lubrication
standards for the operator are developed.
• Once equipment reaches 90 % LEF (98 % availability for a process area), forced deterioration
has been mostly eliminated and time and effort can be spent developing more sophisticated
maintenance programs. For B criticality equipment the Abridged RCM process is required. This
process is less onerous than the FMECA + RCM process.
• For A criticality equipment operating above 90 % LEF (98 % availability for a process area) the
FMECA + RCM process is required.
• The zone may have best practice maintenance programs for common equipment. These
maintenance programs must be adopted by plants as they represent a more comprehensive
view of maintenance tasks.
Below 85 % LEF, forced deterioration is the major driver of equipment failure and the central effort of the
maintenance program is to eliminate forced deterioration.
Figure 42: Maintenance program development logic
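The logic above can be summarized as a small decision function. The sketch below is an interpretation of the text, not a reproduction of Figure 42: the function name and return strings are hypothetical, "reaches 90 % LEF" is read as "at or above 90", and any criticality other than A or B is assumed to stay on the baseline program.

```python
LEF_THRESHOLD = 90.0  # percent LEF, taken from the text above


def development_approach(criticality: str, lef_percent: float) -> str:
    """Suggest how the maintenance program for one machine is developed."""
    baseline = "OEM recommendations + continuous improvement"
    if lef_percent < LEF_THRESHOLD:
        # Forced deterioration still dominates, so stay on the baseline.
        return baseline + "; eliminate forced deterioration"
    if criticality == "A":
        return "FMECA + RCM"
    if criticality == "B":
        return "Abridged RCM"
    # Assumption: other criticalities remain on the baseline program.
    return baseline


print(development_approach("A", 92.0))
print(development_approach("B", 91.0))
print(development_approach("A", 80.0))
```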
OEM Recommendations
OEM recommendations are the starting point of all maintenance programs. The OEM has particular
knowledge about the technology installed and will have specific requirements defined in the
maintenance and operating manuals.
All plants need to ensure that the requirements of the OEM are captured in the preventive and/or
predictive maintenance program.
Continuous Improvement
While OEMs can provide a starting point, their recommendations form at best a basic
maintenance program. Problems you will encounter with OEM recommended maintenance include:
• Frequencies of the maintenance tasks are incorrect
• Maintenance tasks are missing
• Maintenance tasks have no specification e.g. check belt for wear
• Maintenance tasks are generic e.g. check machine for damage
It is important that basic OEM maintenance programs are improved. This is done in three ways:
• Improvements from equipment failure problem solving
• Improvements from tag root cause analysis
• Improvements from routine schedule reviews
When equipment fails, it will meet a problem trigger as defined in the problem-solving section of the
maintenance handbook. Problem solving will then be performed to identify the root cause. In general,
there are six categories of root causes (known as the 6M’s): man, material, method, machine,
environment and measurement. If the root cause falls into the machine category, then a preventive or
predictive maintenance task can in most cases deal with these kinds of problems. Thus, when
equipment fails, and a machine root cause is found, existing maintenance programs can be improved by
including appropriate tasks to counter the machine root cause. For example, if a cam fails because of
natural wear and tear, then a preventative maintenance task can be put in place to measure the cam
wear and to replace the cam before failure.
It is critically important to note that preventative maintenance tasks should not be created for non-
machine root causes i.e. man, material, method, environment and measurement. These types of root
causes can be dealt with in other ways. For example, if a machine fails because the operator did not do
a changeover correctly, it does not make sense to put a preventative maintenance task to inspect for the
failure. Rather train the operator to perform the changeover correctly.
Maintenance programs can be further continuously improved through the root causes of tags. As
cleaning and tagging is done in the plant, various tags are analyzed to define their root causes. As with
equipment failure, if the root causes are machine related, then preventative or predictive maintenance
tasks can be created to manage this. Once again if the root cause is not machine related, then the
preventive maintenance program must not be used to manage those problems.
Finally, maintenance programs can be improved through a regular review routine. This routine is
performed by the machine owner/expert and maintenance planner who review the maintenance
schedule.
Maintenance program reviews should be in the MCRS of the machine owner/expert and the planner. All
equipment in the plant should be reviewed at least once in a 3 year cycle. The following items should be
reviewed as part of the process:
• The PM or PdM is allocated to the correct resource
• The estimated time on the PM or PdM is correct. The planner can review actual times taken for a
task and compare them to the estimated time and improve the estimated time.
• Check that the frequency of the maintenance schedule is correct. Initially when a schedule is
created there is no solid indicator of a frequency so often this may turn out to be a guess. As the
maintenance schedules are executed, defects or even failures may occur. We are typically
looking for 1 defect from 3 inspections. This rule of thumb will allow the planner to review how
many defects arrive from a particular schedule and adjust its frequency to achieve a 1 in 3 ratio.
• Confirm that all tasks on a PM or PdM have similar failure rates i.e. items that can fail weekly
should not be mixed with items that can fail monthly.
• Confirm that ALL OEM required maintenance tasks are on the PM or PdM schedule. If not, there
must be a good reliability related reason for this.
• Check if there are tasks in the PM or PdM schedule which are trying to solve non-machine
related problems. These should be eliminated, and the non-machine related root cause solved
using other methods.
• Check that all maintenance tasks have a specification. Often maintenance inspections have no
specification leaving it up to the opinion of the person doing the inspection tasks. All tasks need
to have a specification so that it is clear if an item is within specification or not. For example a
task that says “Check that v belts are not loose” is completely open to interpretation. This
maintenance task should have a specification e.g. “Check that v belts have a tension of 4.8
Newtons following the method described in the work instruction M78904”.
• Assess if there are missing tasks. Look at historical failures and if the root cause of these failures
is machine related (but not random), then there should be a PM or PdM task.
• Assess if there are tasks that should not be there. Sometimes there are tasks in a maintenance
program that should not be there. It could be that the task is outdated as the machine has been
upgraded or that the task is superfluous as it is trying to deal with a non-machine related failure
mode (such as operator error).
• Assess if there are duplicated tasks either within the same maintenance program or in another
maintenance program with a different frequency. Eliminate one of the duplicate tasks.
• If the zone has standardized PMs for particular machines, ensure that the plant’s PM
aligns with the zone standardized PM.
The above items are contained in a template (Figure 43) which can be used to manage and document
the review process. Through a process of problem solving, tag root cause analysis and routine reviews,
a basic OEM maintenance program can be continuously improved into a strong well-developed
program.
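The "1 defect from 3 inspections" rule of thumb used in the frequency review can be expressed as a simple check. This is a sketch only: the function name, the returned wording and the tolerance band around the 1 in 3 target are assumptions, since the handbook gives only the target ratio.

```python
def review_frequency(inspections: int, defects: int,
                     target: float = 1 / 3, tolerance: float = 0.10) -> str:
    """Compare the defect ratio of a PM or PdM schedule with the 1 in 3 target.

    tolerance is a hypothetical dead band so that small deviations from the
    target do not trigger a frequency change.
    """
    if inspections <= 0:
        raise ValueError("inspections must be positive")
    ratio = defects / inspections
    if ratio < target - tolerance:
        return "inspect less often"   # few findings: schedule runs too frequently
    if ratio > target + tolerance:
        return "inspect more often"   # many findings: schedule runs too rarely
    return "keep frequency"


print(review_frequency(inspections=12, defects=1))
print(review_frequency(inspections=12, defects=4))
print(review_frequency(inspections=12, defects=8))
```

A planner could run a check like this over the executed work orders of each schedule during the 3 year review cycle.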
PM Review
Reviewers Name Date of Review
Machine SME Name
Other Participants
A. Details of Existing PM
PM Name
Equipment
Trade/Craft (Mechanical, Electrical, Automation)

Frequency
B. Overall PM Review
Y/N Corrective Action
1. Is this PM attached to the correct functional location level?
2. Are the safety instructions on the PM appropriate for the tasks being performed?
3. Has the schedule been allocated to the correct person or group?
4. Is the estimated time correct for the PM?
5. Is the frequency of the PM correct?
6. Have all the OEM requirements for this machine, trade and frequency been included?
7. If a zone PM standard for this equipment exists, does this PM contain all the zone defined
tasks?
C. Existing Task Review
Columns (one row per existing task):
• Existing Task
• Is this task addressing a machine root cause failure mode?
• Is the task practical and feasible at addressing the failure mode?
• Does this task have a specification or SWI?
• Is the frequency of this task appropriate?
• Is this task duplicated in other PMs?
• If this is an inspection task, can it be replaced by a scheduled lubrication, restoration or discard task?
• Corrective Action
D. PM Integration
Y/N Corrective Action
1. Review the root cause analysis of historic failures. Are there any machine related root
causes that this PM does not currently address?
2. Is there any similar equipment in the plant that needs to be updated with the changes
made to this specific PM?
3. Are there any PM tasks on similar equipment in the plant that need to be added to this
PM?
4. Are inspection tasks mixed with scheduled lubrication, discard or restoration tasks?
5. If the PM covers multiple equipment, does the route make sense?
Reviewers Name Date

Reviewers Signature
Approvers Name Date

Approvers Signature
Figure 43: Template for review/optimisation of preventive routines
Autonomous Operations Standards

As the organization progresses through autonomous operations, cleaning, inspection and lubrication
(CIL) standards for the operator will be established. These CIL standards form part of the overall
maintenance program for the equipment.
There are specific guidelines in the autonomous operations section of the maintenance handbook
around how the CIL standards will need to be developed.
FMECA & RCM
The FMECA & RCM process is used for critical A equipment when LEF is greater than 90 % (availability
of a process area greater than 98 %). It is a very detailed and precise process that requires significant
time and effort. For this reason, implementation of the full FMECA & RCM process is limited to critical A
equipment. Additionally, only when forced deterioration is eliminated (i.e. LEF > 90 %) is it appropriate to
implement a more sophisticated maintenance program.
FMECA is an acronym for Failure Mode, Effects and Criticality Analysis. In essence, an FMECA process
involves:
• Breaking a machine into its component parts
• Assessing how each part can fail i.e. the failure modes
• Understanding what the impact is of each failure mode on the overall machine i.e. the failure
effects
• Determining the significance of each failure effect i.e. a criticality assessment based upon the
probability of occurrence and severity of the consequence
The criticality of a failure can be assessed using the probability/consequence diagram in Figure 44 as a
guide.
Figure 44: Probability and consequence matrix for criticality assessment
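A criticality assessment based on probability and severity can be sketched as follows. The matrix in Figure 44 is not reproduced in this text, so the 1 to 5 scales, the use of a simple product and the band thresholds below are all assumptions chosen for illustration; a site would substitute the actual matrix.

```python
def failure_criticality(probability: int, severity: int) -> str:
    """Rank a failure mode from its probability and severity scores.

    Both scores are assumed to run from 1 (lowest) to 5 (highest).
    The thresholds are hypothetical and are not taken from Figure 44.
    """
    for name, value in (("probability", probability), ("severity", severity)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")
    score = probability * severity
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


print(failure_criticality(probability=5, severity=4))
print(failure_criticality(probability=2, severity=3))
print(failure_criticality(probability=1, severity=2))
```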
RCM analysis at this point complements the FMECA approach in that for each failure mode, RCM will
provide a decision tree that will allow you to select the most appropriate maintenance task to manage
that failure mode. Figure 45 illustrates the RCM decision tree.
The RCM decision tree is separated into two broad parts i.e. Level 1 and Level 2. Figure 46 illustrates
the decision tree with Level 1 elements circled. Level 1 of the decision tree aims to classify failures into
the following categories:
• Evident
o Safety/Environment
o Production
o Non-production
• Hidden
o Safety/Environment
o Not safety/environment
Evident failures are those failures that can be detected by the operator as he performs his normal duties.
Hidden failures are those failures that cannot be detected by the operator in his normal duties. Evident
and hidden failures are sub-classified into:
• Safety/Environment – those failures that affect safety or the environment
• Production – those failures that affect production i.e. the ability to produce in-specification
product
• Non-production – those failures that do not affect production directly i.e. in-specification product
can still be produced
• Not safety/environment – failures that do not have an impact on safety or the environment
The reason for this classification is that depending on the impact of failure, a different scheduled
maintenance approach will be taken.
Figure 45: RCM decision tree
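The Level 1 classification described above can be written as a short function. This is a sketch under assumptions: the function and argument names are hypothetical, and checking safety/environment before production is an assumed ordering, since the decision tree figure itself is not reproduced in this text.

```python
def classify_failure(evident: bool,
                     affects_safety_or_environment: bool,
                     affects_production: bool) -> str:
    """Place a failure mode into one of the five Level 1 categories."""
    if evident:
        if affects_safety_or_environment:
            return "Evident: Safety/Environment"
        if affects_production:
            return "Evident: Production"
        return "Evident: Non-production"
    # Hidden failures are split only on safety/environment impact.
    if affects_safety_or_environment:
        return "Hidden: Safety/Environment"
    return "Hidden: Not safety/environment"


print(classify_failure(evident=True, affects_safety_or_environment=False,
                       affects_production=True))
```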
Level 2 of the decision tree focuses on specific scheduled maintenance activities that need to be
performed.
Figure 46: Level 2 elements of the RCM decision tree
There are 6 possible activities that are covered by Level 2:

• Lubrication and cleaning – determine if lubrication and cleaning will prevent or mitigate this
failure
• Operator inspection/functional check – determine if an operator inspection or test will detect the
degradation (wear)
• Technician inspection/functional check – determine if an inspection or test by a technician/artisan
will detect the degradation (wear)
• Restoration – determine if a restoration of the component will eliminate failure
• Discard – determine if replacement of the component will eliminate failure
• Redesign – determine if redesign will eliminate failure
In using the Level 2 decision logic, the user is required to evaluate the technical feasibility and financial
impact of each possible scheduled maintenance activity so that the best scheduled maintenance activity
is chosen.
Figure 47 illustrates the Level 2 decision tree for an evident production effect failure. To perform the
RCM analysis the decision tree is followed. The first question is whether lubrication or cleaning tasks will
be effective in delaying or eliminating the failure. To answer this question, you will need to determine two
things:
• From a technical point of view, is there evidence that cleaning or lubrication will delay or eliminate failure?
If there is no evidence or strong indicator that this will be the case, then cleaning or lubrication is
not technically feasible.
• From a financial point of view, if the task is technically feasible, then the cost of the task must be
established. For a cleaning or lubrication task, that cost will be the cost of the labor, the standing
time of the machine, the cost of the cleaning materials etc. Obtaining costs will allow you to then
evaluate multiple technically feasible tasks against each other to select which one will be the
most cost effective.
Once the cleaning/lubrication question has been answered, the next task, which is operator crew
monitoring, must be evaluated from a technical feasibility and cost point of view. In fact, every
maintenance task on the evident production effect decision tree must be evaluated from a technical and
cost point of view. At the end, you will have a list of maintenance tasks that are technically feasible and
the costs of each of these feasible tasks. The most cost effective technically feasible task is then
selected.
Figure 47: Decision tree for a production effect failure
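The selection rule described above (evaluate every task for technical feasibility and cost, then pick the most cost effective feasible one) can be sketched as below. The class name, field names, task names and costs are hypothetical examples, not data from this handbook.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateTask:
    name: str
    technically_feasible: bool
    annual_cost: Optional[float] = None  # only costed when feasible


def select_task(candidates: List[CandidateTask]) -> Optional[CandidateTask]:
    """Return the lowest-cost technically feasible task, or None if there is none."""
    feasible = [c for c in candidates
                if c.technically_feasible and c.annual_cost is not None]
    if not feasible:
        return None
    return min(feasible, key=lambda c: c.annual_cost)


# Hypothetical evaluation of one failure mode.
options = [
    CandidateTask("Lubrication or cleaning", False),
    CandidateTask("Operator inspection", True, 1200.0),
    CandidateTask("Technician inspection", True, 1800.0),
    CandidateTask("Discard", True, 5000.0),
]
best = select_task(options)
print(best.name if best else "no feasible task")
```

Returning None when nothing is feasible leaves the decision on what to do next with the analyst rather than guessing a default.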
Given the above description of the FMECA and RCM processes they can be applied in the following way
to develop a maintenance package.
Step 1 – Partition the Machine
The selected machine is broken down into its component parts. The partitioning can be done using the
template below.
Figure 48: Template for partitioning equipment
Step 2 - Perform a FMECA analysis

The lowest level partition components have their failure modes, effects and criticality analyzed. The
FMECA analysis can be done on the template in Figure 49. Once the criticality of each failure mode is
assessed, a decision needs to be made as to whether the item will be maintained (hence an RCM
analysis is done) or the item will not be maintained (no RCM analysis). Usually items that will not be
maintained are those items with a very low probability of occurring and a low severity if they do occur.
Figure 49: FMECA template
Step 3 – Apply the RCM Decision Tree

For those failure modes where a decision was taken to perform maintenance, the RCM analysis is
performed. Each failure mode is assessed against the RCM decision tree. All technically feasible
maintenance tasks are selected and the cost of performing these maintenance tasks is evaluated. The
lowest cost technically feasible task is then selected as the best maintenance task to prevent the failure.
The results of this analysis are captured in the template in Figure 50.
Figure 50: Template for capturing RCM results
Abridged RCM
The RCM methodology offers a powerful way of developing planned maintenance schedules. However,
due to its complexity RCM cannot be used for every piece of equipment in the plant. The guidance is
that it is used only for critical A equipment. For critical B equipment a shortened form of RCM is used
called Abridged RCM.
The abridged RCM process steps are defined in Figure 51.
Figure 51: Abridged RCM process
Eliminate Forced Deterioration

The maintenance journey of a plant begins by eliminating forced deterioration. This creates a basis for a
sustainable and effective planned maintenance system. Therefore, until forced deterioration has been
eliminated as per the Eliminate Forced Deterioration Guidelines, there is no point considering the
implementation of the Abridged RCM process.
Select Machine Root Cause Failures

Unlike the tools used to eliminate forced deterioration (problem solving, technician clean and tag,
Autonomous Operations), a loss and waste analysis is not used to select the key machines to start the
implementation of Abridged RCM. If forced deterioration was correctly eliminated, individual machine
reliability should be relatively high. Rather criticality analysis should be used as the basis to select
process areas/machines for the implementation of Abridged RCM, starting with the most critical
machines in a department.
Once the critical process area/machine has been selected, then an analysis must be performed to
identify the machine root cause failures of the selected process area/machine over the past 6 to 12
months. It is important to note that this is not all failures but rather those failures where the root cause is
machine. As noted earlier only machine root causes can be effectively dealt with by a scheduled
maintenance system. It is expected that you should have no more than 20 to 30 such machine root
cause failures per process area/machine over a period of six months to a year.
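Selecting the machine root cause failures from a failure history can be sketched as a simple filter. The record layout, field names and sample entries are hypothetical; a real analysis would read them from the site's CMMS or problem-solving records. The lookback is approximated as 365 days.

```python
from datetime import date, timedelta


def machine_root_cause_failures(failures, today, lookback_days=365):
    """Keep only failures with a machine root cause inside the lookback window."""
    cutoff = today - timedelta(days=lookback_days)
    return [f for f in failures
            if f["root_cause"] == "machine" and f["date"] >= cutoff]


# Hypothetical failure log for one process area.
failure_log = [
    {"date": date(2022, 3, 1), "description": "cam worn", "root_cause": "machine"},
    {"date": date(2022, 5, 9), "description": "changeover error", "root_cause": "man"},
    {"date": date(2020, 1, 15), "description": "bearing seized", "root_cause": "machine"},
]
selected = machine_root_cause_failures(failure_log, today=date(2022, 6, 30))
print([f["description"] for f in selected])
```

If the resulting list is much longer than the 20 to 30 failures expected above, that may indicate forced deterioration has not yet been fully eliminated.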
Apply the RCM Decision Tree

For every machine root cause identified in the previous step, the RCM decision logic needs to be
applied. Figure 53 illustrates a template that can be used to capture the results of the decision tree
analysis.
Define the Maintenance Task

Once the RCM decision logic has been completed, the best scheduled maintenance approach from a
technical and cost point of view should be clearly identified for every machine root cause.
This scheduled maintenance approach (e.g. a discard task), however, still needs to be converted into a
scheduled maintenance task. In order to do this, the task must be properly defined as per Figure 52.
Task Description and specification | Task Frequency | Task duration | Who | Machine State
Figure 52: Excerpt of template for capturing details of the task
Identification columns: Process Area | Failure Description | Machine Root Cause Description | Failure Category
For each option below, two columns are completed: Is it applicable? (Y/N) | If yes, what will it cost per annum?
• Lubrication or Cleaning
• Operator Inspection or Functional Check
• Artisan Inspection or Functional Check
• Restoration Task
• Discard Task
• Run to Failure
• Redesign
Figure 53: Example of a template to capture the RCM decision tree analysis
Identification columns: Process Area | Failure Description | Machine Root Cause Description | Failure Category
For each option below, two columns are completed: Is it applicable? (Y/N) | If yes, what will it cost per annum?
• Lubrication or Cleaning
• Operator Inspection or Functional Check
• Artisan Inspection or Functional Check
• Restoration Task
• Discard Task
• Run to Failure
• Redesign
For each of the most effective option, second most effective option (if required) and third most effective
option (if required), the following columns are completed:
Task Description incl. specification | Task Frequency | Task Duration | Who | Machine State
Figure 54: Example of the complete RCM template including creation of the maintenance tasks and allocation of
resources
It is important that when a maintenance task is created, it has a clear specification. For example, if a
task is created to check for wear, then there must be a specification for how much wear is acceptable or
not. It is not good enough to create a task which requires a specification but does not have one. Some
tasks such as discard will not require a specification but tasks such as operator inspection or technician
inspection must have a specification included.
Task frequency is the next critical area that needs to be defined. Failure history (Mean Time between
Failures) on the component should give you an idea of the frequency of the scheduled maintenance
task. For example, if a component fails every year then a scheduled task just shorter than a year will
prevent the failure. It is important that the MTBF history, if used to determine task frequencies, covers
only the period of natural deterioration, not forced deterioration.
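The relationship between failure history and task frequency can be illustrated with a short calculation. This is a sketch, not a prescribed method: the function names are hypothetical, and the 0.9 margin is an assumed way of expressing "just shorter than" the MTBF. The dates should come only from the period of natural deterioration.

```python
from datetime import date


def mtbf_days(failure_dates):
    """Mean time between consecutive failures, in days."""
    if len(failure_dates) < 2:
        raise ValueError("at least two failures are needed to compute MTBF")
    ordered = sorted(failure_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


def task_interval_days(mtbf, margin=0.9):
    """Schedule the task somewhat earlier than the MTBF (margin is an assumption)."""
    return int(mtbf * margin)


# Hypothetical failure dates for one component.
history = [date(2021, 1, 1), date(2022, 1, 1), date(2023, 1, 1)]
mtbf = mtbf_days(history)
print(mtbf, task_interval_days(mtbf))
```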
As part of the maintenance schedule review cycle, the sub work orders arising out of a scheduled
maintenance task will be assessed. Typically, the sub work order % needs to be 30 %, so if the
frequency of a task is incorrect, it will be identified by a low or high sub work order %.
Task duration also needs to be defined. This helps the planner determine how much time it will take
a resource to complete the task and is especially important when developing the entire maintenance
plan/schedule for the week.
Define Resource Requirements

For every maintenance task, a resource needs to be allocated. Resources could be the operator, shift
technicians, day shift technicians or even external resources such as OEMs.
Update Schedule
With the completion of the previous step, the maintenance tasks to prevent machine root cause failure
should be thoroughly defined. The planned maintenance schedules for the machine can then be
updated with the new tasks.
However, when this is done, there will be some older tasks on the planned maintenance schedule that
are already being executed. It is worthwhile reviewing these tasks against the following checks:
• Are the existing tasks designed to address machine root causes or are they addressing other
root causes such as man? If they do not address machine root causes then they should be
removed. Other actions may be necessary to manage non-machine root causes.
• Do all the existing tasks have a specification (if the task requires it)? If not, then the tasks need to
be modified to include a specification.
• Are the existing tasks the most economic and technically effective to prevent the failure?
Considering the logic of the RCM decision tree, are there alternative maintenance tasks that are
perhaps better than the existing maintenance task?
• Is the frequency of the existing tasks correct? A useful check of this is to measure the sub work
order % for the maintenance task. If it is less than 30 %, then the task frequency is too high; if it is
more than 30 %, then the task frequency is not high enough.
• Are the correct resources allocated to the existing tasks?
• Is the planned time for the existing tasks correct when reviewed against the actual time?
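The frequency check in the list above can be sketched as a small calculation. This is only an illustration: it assumes the sub work order % is the number of sub work orders raised divided by the number of times the task was executed, and the tolerance band around the 30 % target is a hypothetical value that is not taken from this handbook.

```python
TARGET_PCT = 30.0     # target sub work order % stated in the handbook
TOLERANCE_PCT = 5.0   # hypothetical tolerance band, not from the handbook


def sub_work_order_pct(executions: int, sub_work_orders: int) -> float:
    """Sub work orders raised as a percentage of task executions (assumed definition)."""
    if executions <= 0:
        raise ValueError("executions must be positive")
    return 100.0 * sub_work_orders / executions


def frequency_verdict(pct: float) -> str:
    """Interpret the sub work order % against the 30 % target."""
    if pct < TARGET_PCT - TOLERANCE_PCT:
        return "frequency too high: the task can be done less often"
    if pct > TARGET_PCT + TOLERANCE_PCT:
        return "frequency too low: the task should be done more often"
    return "frequency about right"


# Hypothetical example: a task executed 20 times that raised 6 sub work orders.
pct = sub_work_order_pct(executions=20, sub_work_orders=6)
print(pct, frequency_verdict(pct))
```

A site would need to agree on its own tolerance band before using a check like this to change task frequencies.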
Once the new tasks have been built into the schedule and existing tasks reviewed, the entire process is
complete, and the next critical machine can be selected for the process to be repeated.
4.11 Problem Solving

While problem solving and root cause analysis are done widely across the business in different functions,
for maintenance, problem solving is a critical and core capability. The Management pillar provides
overall requirements around problem solving. This section of the maintenance handbook provides
specific guidelines on how to perform problem solving and root cause analysis within the maintenance
function.
Definition
The term problem solving is used widely and has different meanings for different people. For VPO,
problem solving is a systematic way of defining, analyzing and solving a problem so that the root cause
of the problem is managed. The finding and managing of root causes is the key point.
Often in everyday language, people will refer to the fact that they are problem solving but the reality is
that they are merely trying to find and solve the symptom of the problem rather than the root cause. For
example, a technician may spend several minutes ‘problem solving’ a conveyor which does not run
properly. He may discover that the drive sprocket was worn, replace the drive sprocket and then say that
he has solved the problem. Within the AB InBev definition of problem solving, the technician has merely
found the symptom of the problem and not the root cause.
Root Causes and Symptoms

Symptoms are the apparent cause of a problem while root causes are the ultimate cause of a problem.
Consider the example of a worn sprocket that results in a conveyor not running. The worn sprocket is a
symptom and not the ultimate cause of the problem. Let’s assume for this example that the reason the
sprocket is worn is that the lubrication system of the conveyor was set up incorrectly because the person
who needed to set this up did not completely understand how to do this.
By replacing the worn sprocket, the problem is only temporarily fixed. The person who is responsible for
setting up the chain lubrication system will still not know how to do this and you will find that the worn
sprocket will come back in several months’ time. Even if you put a preventative maintenance plan in
place to pick up the worn sprocket and prevent the failure, you are spending needless resources trying
to manage a problem that can be easily eradicated. By finding and solving the root cause of the problem
(in this example training the person who does not know how to set up conveyor lubrication systems) the
problem is permanently eliminated.
Figure 55 illustrates how a single root cause can have a number of symptoms. Much effort and energy
can be spent trying to resolve each symptom. However, despite all this effort, if the root cause is not
solved, the problem will persist.
Figure 55: Illustration of the symptoms of a root cause
There are 6 possible categories of root causes:

• Machine – these are root causes related to the equipment or component. For example, normal
wear and tear on a gearbox resulted in it ultimately failing.
• Method – these root causes are related to how a task was performed. Perhaps the task was not
performed to the right standard because the procedure did not clearly define this standard.
These kinds of root causes are method root causes.
• Materials – these root causes pertain to materials such as out of specification raw materials, poor
quality spares etc.
• Measurements – these root causes are problems arising from incorrect or poor measurements.
• Environment – these are root causes arising from problems in the environment in which the
equipment is located. For example, excessive heat in a plant causing the failure of electronic
cards.
• Man – these are root causes of problems where people are the cause. This could be because of
a lack of skill, an error of judgment etc.
Structured Analysis of Downtime

Many problems occur every day in a manufacturing facility. One of the biggest challenges is finding the
right problem to focus on especially since what appears to be a big problem may not necessarily be a
big problem. The Structured Analysis of Downtime is a structured analysis that aims to help identify the
real problems facing the plant.
In a manufacturing operation, there are three broad categories of waste or losses:

• Time losses – these are events that result in time being lost e.g. a machine breakdown
• People losses – these are losses that waste people’s time such as having to walk a long way to
get tools
• Resource losses – these are losses that affect materials and energy used in an operation e.g.
incorrect process conditions result in product waste
A comprehensive Structured Analysis of Downtime will ultimately analyze all the above three categories.
Initially however, time-based losses are a good starting point. From a maintenance perspective, time
losses affect LEF and therefore are a critical loss to understand.
Figure 56: ABInBev packaging waterfall graph
Time losses are fully described in the AB InBev packaging waterfall graph contained in Figure 56. In
essence, time is lost due to:
• Not scheduled time – this is time an operation could run but is not run because there is no
demand
• Downtime of planned activities – this is time lost due to activities that are necessary and are
planned. For example, a changeover from one pack to another is a planned activity that results in
downtime
• External causes – these are time losses that occur due to causes outside of packaging. For
example, if brewing cannot supply product, then this is an external cause.
• Failures – these are time losses that occur when machinery breaks down
• Speed losses – these are time losses that occur when a machine does not run at the speed it
was designed to run at
• Quality losses – these are time losses that occur when a machine produces out of specification
product.
Figure 57 illustrates a Structured Analysis of Downtime. The basic process is outlined below:
• Time losses at a packing hall level are analyzed by reviewing line GLY. The most problematic
line is identified.
• For the most problematic line, the packaging waterfall is analyzed to understand where time is
being lost i.e. downtime of planned activities, external causes, failures etc.
• If failures at a line level are significant, then time losses for failures are analyzed at a machine
level. This will help identify the most problematic machine
• For the significant machine failures, an analysis of object parts is done to identify which object
parts have the highest failure rate. Object parts are the sub elements that make up a machine
e.g. filler infeed, filler drive, filler CO2 system etc.
Figure 57: Illustration of a time Structured Analysis of Downtime
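The machine and object part levels of the drill-down described above can be sketched in code. This is a minimal illustration only: the record layout, the `worst_by` helper and the downtime figures are hypothetical, and the line is assumed to have already been selected from the GLY review.

```python
from collections import defaultdict


def worst_by(records, level, **filters):
    """Return (name, total_minutes) for the worst entry at the given level.

    Only records that match every keyword filter are counted.
    """
    totals = defaultdict(float)
    for record in records:
        if all(record[key] == value for key, value in filters.items()):
            totals[record[level]] += record["minutes"]
    if not totals:
        raise ValueError("no failure records match the filters")
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical failure records for one analysis period.
records = [
    {"line": "Line 3", "machine": "Filler", "object_part": "Filler infeed", "minutes": 120},
    {"line": "Line 3", "machine": "Filler", "object_part": "Filler drive", "minutes": 45},
    {"line": "Line 3", "machine": "Labeler", "object_part": "Labeler gearbox", "minutes": 90},
    {"line": "Line 1", "machine": "Packer", "object_part": "Packer infeed", "minutes": 60},
]

# Step down from the selected line to the machine, then to the object part.
machine, machine_minutes = worst_by(records, "machine", line="Line 3")
part, part_minutes = worst_by(records, "object_part", line="Line 3", machine=machine)
print(machine, machine_minutes, part, part_minutes)
```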
A key principle of a time based Structured Analysis of Downtime is the time horizon that losses are
analyzed over. Long time buckets dilute the impact of once-off problems and highlight the impact of
smaller yet persistent problems. Consider for example a labeler gearbox that fails and results in 8 hours
of downtime on a shift. This 8-hour failure appears to be a big problem. On the same packaging line
however, there is also a problem with falling bottles on the filler infeed, which causes on average 20
minutes of downtime per shift. When the falling bottle problem of 20 minutes per shift is compared to the
gearbox failure of 8 hours in a single shift, the gearbox problem is clearly the biggest problem.
However, if these problems were compared over a period of a year, the labeler gearbox failure would
have happened once resulting in 8 hours per year of downtime. On the other hand, the falling bottle
problem at 20 minutes per shift results in 250 hours of downtime per year! Now the falling bottle problem
is the biggest problem not the labeler problem. In fact, the falling bottle problem is more than 30 times
bigger than the gearbox failure problem.
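The arithmetic behind this comparison can be checked with a short calculation. The number of shifts per year is not stated in the text; 750 shifts per year is an assumption chosen because it reproduces the 250 hour figure quoted above.

```python
MINUTES_PER_HOUR = 60
SHIFTS_PER_YEAR = 750  # assumption: implied by the 250 hours per year in the text


def annual_downtime_hours(minutes_per_shift, shifts_per_year=SHIFTS_PER_YEAR):
    """Convert a recurring per-shift loss into hours of downtime per year."""
    return minutes_per_shift * shifts_per_year / MINUTES_PER_HOUR


falling_bottles_hours = annual_downtime_hours(20)  # 20 minutes lost every shift
gearbox_hours = 8.0                                # one 8-hour failure in the year
ratio = falling_bottles_hours / gearbox_hours

print(falling_bottles_hours, gearbox_hours, ratio)
```

With these assumptions, the falling bottle problem costs 250 hours per year against 8 hours for the gearbox, a ratio of just over 31.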
The time horizon recommended for a time based Structured Analysis of Downtime is:
• Rolling 52 weeks of downtime – this time bucket is used to identify long term performance
• Rolling 12 weeks of downtime – this time bucket is normally used to select the top problem. It is
sufficiently long to identify chronic problems but not too long, such that any improvements take a
very long time to reflect.
• Rolling 4 weeks of downtime – this time bucket is used to give you a short-term indication of
what is happening on a problem.
The 52/12/4 week analysis allows one to see trends on the Structured Analysis of Downtime (Figure
58). For decision making around where key problems are, the 12-week Structured Analysis of Downtime
is used.
Figure 58: 52/12/4 week downtime trends
The downtime in a Structured Analysis of Downtime is expressed as a percentage of scheduled time.
This normalizes the analysis so that comparisons between different packaging lines and the packaging
hall can be done.
Figure 59: Example of a machine level Structured Analysis of Downtime
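The 52/12/4 week view, with downtime expressed as a percentage of scheduled time, can be sketched as follows. The function names and the weekly figures are hypothetical; the sketch assumes one downtime value and one scheduled time value per week, ordered from oldest to newest.

```python
from typing import Dict, Sequence

WINDOWS_WEEKS = (52, 12, 4)  # rolling time buckets recommended in the handbook


def rolling_downtime_pct(downtime_hours: Sequence[float],
                         scheduled_hours: Sequence[float],
                         weeks: int) -> float:
    """Downtime over the most recent `weeks` weeks as a % of scheduled time."""
    if len(downtime_hours) != len(scheduled_hours):
        raise ValueError("both series must cover the same weeks")
    # If fewer weeks of history exist than requested, all available weeks are used.
    total_scheduled = sum(scheduled_hours[-weeks:])
    if total_scheduled == 0:
        return 0.0
    return 100.0 * sum(downtime_hours[-weeks:]) / total_scheduled


def downtime_trend(downtime_hours: Sequence[float],
                   scheduled_hours: Sequence[float]) -> Dict[int, float]:
    """Return the downtime % for each of the 52, 12 and 4 week buckets."""
    return {weeks: rolling_downtime_pct(downtime_hours, scheduled_hours, weeks)
            for weeks in WINDOWS_WEEKS}


# Hypothetical machine: 2 hours of downtime per week, rising to 4 in the last 4 weeks.
downtime = [2.0] * 48 + [4.0] * 4
scheduled = [100.0] * 52
print(downtime_trend(downtime, scheduled))
```

In this hypothetical case the 4 week figure is higher than the 12 and 52 week figures, which indicates a problem that is getting worse.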
Triggers for Problem Solving

For maintenance, it is required that there are three distinct triggers for problem solving:
• For each production area within a department (e.g. Line 3 or Filtration or engine room/power
house), an assessment shall be made of the biggest machine related failures during the shift
(downtime, quality incident related to a machine, waste problem related to a machine etc.) and a
5 why analysis must be performed for this problem. Facilities with higher levels of maturity and
performance (LEF > 90 % and brewing and utilities downtime less than 2 %) must do more than
the minimum of one 5 why per production area per shift.
• The second trigger is that the top 3 equipment failures of the week in a department will be
problem solved using a technique more advanced than a 5 why such as an abnormality report.
• The final trigger is that problems arising out of the 12 week Structured Analysis of Downtime will
be problem solved using a sophisticated problem-solving tool such as PDCA. Note that problem
solving is not the only approach to resolve problems arising from the 12 week Structured
Analysis of Downtime. Cleaning and tagging can also be used as a mechanism to resolve these
problematic machines.
Trigger 1 - 5 Why Analysis

As mentioned earlier the key aim of problem solving in AB InBev is to find and solve the root cause of a
problem. By solving the root cause, we will ensure that the problem does not come back. The first line of
defense therefore is the 5 Why Analysis.
It is the responsibility of the person who fixes the problem to complete the 5 why analysis. In most
instances, this will be the technician who is on shift. The completion of the 5 why will require that the
technician works with the operator and team leader of the machine/line so that they can properly
investigate the problem and complete the 5 why analysis. A 5-why analysis of equipment failure should
not take longer than 10 to 15 minutes to complete.
It is very important that supervisors and managers elevate the importance of the root cause. If
conversations in production review meetings focus only on fixing the symptoms, then even though 5 why
analyses are done by technicians, their benefit will not be achieved because supervisors and
managers do not consider them important. Supervisors and managers are required to review the 5 why
analysis, understand what the root causes are and make sure that there are actions to resolve the root
cause. These actions must be captured on the CMMS system if it is a machine related action or on an
action log if it is related to man, method, material, measurement or environment.
Setting triggers for 5 why problem solving has proven difficult, as often the triggers are set too high and
insufficient problem solving takes place. Based upon the 100 % Journey concept, operators and
technicians need to identify problems on their shifts and apply 5 why analysis to find the root causes.
Unless equipment ran perfectly, there should always be 5 why problem-solving taking place by every
individual on a shift.
Trigger 2 – Top Weekly Problems of the Department

Even if 5 why analysis is performed for all failures greater than 10 minutes, some big problems will
require a more thorough analysis. In order to achieve this, it is required that the top 3 problems in a
department (e.g. brewing), be problem solved using a more sophisticated tool than a 5 why. The tool
that needs to be used is the abnormality report.
In order to do this a small team is formed consisting of 3 to 5 people who have expertise in the problem.
Typically, this team would consist of a technician who is an expert on the machine, an instrument or
electrical technician if the problem involves electrical or automation, an operator and the shift technician
who fixed the problem when it occurred and performed the initial 5 why. This second level of problem
solving will typically last about an hour.
Once again, the actions to solve the root cause need to be captured as a work order on the CMMS
system if it is a machine related root cause or on an action log if it is a man, method, material,
measurement or environmental root cause.
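The rule for where root cause actions are recorded can be expressed as a simple lookup. This is an illustrative sketch only; the function name and the category spellings are hypothetical, and the six categories are those listed earlier in this section.

```python
ROOT_CAUSE_CATEGORIES = {
    "machine", "method", "materials", "measurements", "environment", "man",
}


def action_destination(category: str) -> str:
    """Return where an action for the given root cause category is captured."""
    normalized = category.strip().lower()
    if normalized not in ROOT_CAUSE_CATEGORIES:
        raise ValueError(f"unknown root cause category: {category!r}")
    # Machine related actions become CMMS work orders; all others go to an action log.
    return "CMMS work order" if normalized == "machine" else "action log"


print(action_destination("Machine"))
print(action_destination("Man"))
```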
Trigger 3 – 12 Week Structured Analysis of Downtime

The 12-week machine Structured Analysis of Downtime will identify repetitive or chronic machine
problems in the plant. These problems will exist despite trigger 1 and trigger 2 problem solving taking
place. The reason for this is that these problems are quite complex and require significant effort to
resolve. The problem-solving tool that will be used is the PDCA problem solving tool.
It is important to note that problem solving is not the only approach to solving problems that exist on the
12 week Structured Analysis of Downtime. Cleaning and tagging is another possible tool that can be
used. In general, cleaning and tagging is used to resolve problems on machines that are in poor
physical condition. Problem solving is used to solve problems on machines which are generally in good
condition.
A small team is formed consisting of 3 to 5 people who have expertise in the problem. Typically, this
team would consist of a technician who is an expert on the machine, an instrument or electrical
technician if the problem involves electrical or automation, an operator and the shift technician who fixed
the problem when it occurred and performed the initial 5 why.
This team will meet once or twice a week to perform the various steps of the problem solving. This team
will dedicate about 4 hours per week to this problem solving. Typically, it will take 4 to 8 weeks for the
problem to be completely analyzed and the root cause identified.
4.12 Eliminating Forced Deterioration

Eliminating forced deterioration is one of the most important maintenance strategies that must be
implemented. This document provides a step-by-step guide as to how forced deterioration may be
eliminated.
4.12.1 Introduction
Definitions
From a maintenance perspective, equipment wears or deteriorates over time. While this is not
completely true as some components such as electronic cards do not wear, the majority of equipment in
a brewery or vertical plant displays a wear failure distribution. We differentiate between two types of
deterioration:
• Natural deterioration – this type of wear takes place when the equipment operates at design
conditions e.g. at the right speed, right temperature, right lubricant etc.
• Forced deterioration – this type of wear takes place when the equipment operates in conditions it
has not been designed for e.g. incorrect speed, too much vibration, not maintained correctly etc.
It is important to differentiate between these two types of deterioration, as each requires a different
maintenance approach.
A basic condition is a term that is used to describe the situation where forced deterioration has been
eliminated and the only wear that takes place is natural deterioration.
Understanding Forced Deterioration and Natural Deterioration

Figure 60 illustrates the key difference between forced and natural deterioration. The vertical axis
represents the inherent strength of the equipment while the horizontal axis represents time. As time
progresses, the strength of the equipment deteriorates until the equipment reaches a
point of weakness at which failure occurs.
Figure 60: Illustration of forced and natural deterioration
With forced deterioration, equipment deteriorates at a much faster rate resulting in failures that take
place quite quickly, when compared to natural deterioration. Forced deterioration occurs when the
following does not happen or exist:
• The machine is operated as per the operating manual requirements
• The machine is cleaned regularly
• Changeovers are done properly
• The machine is installed in an environment that is suitable for it e.g. no excessive dust, humidity,
heat etc.
• The machine is lubricated with the right lubricants at the right frequency
• The machine is run at the correct speed
• The right raw materials are used and these raw materials are in specification
• All broken parts of the machine have been fixed
• Regular maintenance as per the maintenance manual and good practice is performed
• The utilities supplied to the machine are within specification e.g. air pressure, voltage etc.
Natural deterioration is largely predictable making planned maintenance an effective tool. Consider for
example a conveyor that operates under design conditions. The wear strip on this conveyor will
deteriorate consistently and predictably so that if the wear strip is replaced on a fixed regular basis,
failure of the conveyor can be prevented.
Forced deterioration is often not predictable and therefore planned maintenance will not work for forced
deterioration situations. By way of example, consider a situation where a single technician in a plant
does not understand when to use a hardened pin conveyor chain and when to use a standard pin
conveyor chain. If he makes the wrong decision and installs a standard pin conveyor chain on a
hardened pin application, then the chain will experience accelerated wear and fail sooner than it should
have. From a plant perspective, it will appear that there are random premature failures of some
conveyors in the plant but there will be no pattern for there to be an effective planned maintenance
solution.
It should be apparent that forced deterioration is a major issue and that traditional maintenance tools
such as planned maintenance will struggle to deal with it effectively.
Dealing with Forced Deterioration

It is difficult if not impossible to effectively deal with forced deterioration using planned maintenance
techniques. In order to deal with it, the root causes of forced deterioration must be found and
eliminated.
There are three possible approaches to dealing with forced deterioration:
• Problem solving
• Technician cleaning and tagging
• Autonomous Operations
Figure 61 below illustrates the overall approach to addressing forced deterioration. The details of each
step will be discussed in the next section.
Figure 61: Process to follow to eliminate forced deterioration
Structured Analysis of Downtime

Structured Analysis of Downtime over a 12-week period will help identify the big or chronic problems of
the plant. The process of how to perform and interpret a Structured Analysis of Downtime is covered in
the Work Order Problem Solving Guideline so will not be covered here in detail.
The aim of the 12-week Structured Analysis of Downtime is to identify the most problematic process
area/machine in the department from a failure point of view. This machine will be the area of focus for
forced deterioration elimination.
Select the Improvement Tool

Once the biggest loss area has been selected using the 12 week Structured Analysis of Downtime
process, the right improvement tool must be selected. As mentioned earlier three possible tools can be
used to eliminate forced deterioration:
• Problem solving
• Cleaning and tagging
• Autonomous Operations
Problem solving is chosen as an improvement tool if the physical condition of the equipment is good. A
good physical condition means the machine is working properly and there may be one or two things that
are wrong on the machine. Problem solving is an effective tool to find things that are wrong and to fix them.
Cleaning and tagging is used as an improvement tool when the selected process area/machine is not in
good condition. If a machine is in poor physical condition, then there are many
things wrong with it like being dirty, missing parts, being poorly lubricated etc. The logic is that a
machine in poor condition, if restored to good condition, will perform better.
The final improvement tool is Autonomous Operations. Both Autonomous Operations and cleaning and
tagging can be used to resolve problems where the physical condition of the equipment is poor. The
criteria for choosing between one and the other are as follows:
• Autonomous Operations takes about 20 % to 40 % longer per process area/machine than
cleaning and tagging.
• Autonomous Operations is a major site intervention and requires significant preparation.
Cleaning and tagging does not.
• Autonomous Operations is a very good tool to get operator engagement while cleaning and
tagging does not involve the operator significantly.
• Cleaning and tagging can improve performance to 85 % LEF. Autonomous Operations can take
LEF beyond 85 %.
Based upon the above criteria, either Cleaning and Tagging or Autonomous Operations needs to be
selected.
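The selection logic above can be sketched as a small decision function. The handbook lists the criteria but does not prescribe an exact decision rule, so the way the criteria are combined here is an assumption, and the function and parameter names are hypothetical.

```python
LEF_LIMIT_CLEAN_AND_TAG = 85.0  # % LEF that cleaning and tagging can reach, per the handbook


def select_improvement_tool(condition_good: bool,
                            target_lef_pct: float,
                            site_prepared_for_ao: bool,
                            operator_engagement_needed: bool = False) -> str:
    """Suggest an improvement tool for the selected process area/machine."""
    # Equipment in good physical condition is addressed with problem solving.
    if condition_good:
        return "Problem solving"
    # Assumed rule: Autonomous Operations is preferred when the LEF target is
    # beyond 85 % or operator engagement is a goal, provided the site is prepared.
    wants_ao = target_lef_pct > LEF_LIMIT_CLEAN_AND_TAG or operator_engagement_needed
    if wants_ao and site_prepared_for_ao:
        return "Autonomous Operations"
    return "Cleaning and tagging"


print(select_improvement_tool(condition_good=False, target_lef_pct=80.0,
                              site_prepared_for_ao=False))
```

The longer duration of Autonomous Operations (about 20 % to 40 % more per process area/machine) is not modeled here and would need to be weighed separately.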
There are separate guidelines for Autonomous Operations and Work Order Problem Solving so a step-
by-step guide will only be provided for Cleaning and Tagging.
4.12.2 Clean and Tag

This section of this guide will explain in detail how to perform cleaning and tagging.
Process Area Preparations

The following preparations will need to be made prior to beginning Cleaning and Tagging:
• Identify the team that will be involved in the Cleaning and Tagging exercise. The requirements of
who makes up the team are fully explained below.
• Develop an implementation plan for cleaning and tagging in the process area
• A risk assessment of the cleaning and tagging process is performed and a safety plan
developed. This should include isolation and lockout requirements, permit requirements, PPE
requirements etc.
• Cleaning tools such as brooms, buckets, cloths, brushes, degreasers etc. have been arranged
and are available. The specific cleaning tools required may change depending on the process
area/machine.
• A cleaning and tagging activity board as per the template in Figure 62 has been positioned close
to the process area/machine where cleaning and tagging will be implemented. The board does not
need to be populated but the overall layout is done so that it is clear where the different
information needs to be placed
• The team implementing cleaning and tagging has received classroom training on cleaning and
tagging. This training will last about 4 hours and will take the team through an overview of
cleaning and tagging. This training will be reinforced in practice as the facilitator guides the team
through the actual process on the selected process area/machine.
The team that will implement cleaning and tagging must consist of the following people:
• The facilitator—this person is thoroughly trained and experienced on cleaning and tagging. The
cleaning and tagging training is typically a three-day practical course. Typically, the cleaning and
tagging facilitator is the maintenance supervisor of an area
• The shift technician/artisan who is responsible for the area in which the machine is located
• The day technician/artisan who is responsible for the process area/machine
• Operators of the process area/machine. While operators of the process area/machine are
involved in the cleaning and tagging process, the process does not incorporate many of the
elements of Autonomous Operations. The aim with Cleaning and Tagging is to improve machine
condition not develop operator engagement and skills.
• Additional technician/artisan resources may be allocated to provide sufficient people to be able to
effectively perform the cleaning and tagging of the equipment
Figure 62: Example of a clean and tag activity board
Deep Clean and Tag

Deep cleaning is different from conventional cleaning in that deep cleaning involves removing covers
and guarding from the machine and cleaning parts of the machine that you would not normally access.
Downtime of about 4 to 6 hours is required for every deep cleaning and tagging exercise. As the
machine is deep cleaned by the team, the machine is inspected and defects identified. Defects such as
missing bolts, loose mechanisms, leaks, damaged electrical cables, collapsed bearings, dirty areas,
unsafe parts of the machine etc. are tagged as a defect.
The tag is a visual indicator of a defect. It helps make sure that a defect no matter how small is clearly
visible. When tagging, it is important that the tag is placed on the machine in such a way that it does not
interfere with the normal operation of the machine. The tag can be placed close to where the defect is
and not necessarily on the defect if doing this will interfere with the working of the machine. In zones
where different colored tags are used to identify who has raised the tag (e.g., red tag is a technician and
blue tag is an operator), the site’s tagging standard must be complied with.
Figure 63: Example of a tag on a machine
Figure 64 is an example of a defect identification tag that can be used for tagging. It is important that the
defect tag is different from the 5S red tag. The 5S red tag is used to identify items that should not be in
an area. A defect tag is used to identify defects on a machine.
Figure 64: Example of a defect tag
At the end of the initial or first deep clean and tag, a large number of tags would have been identified.
The next step will explain how these tags will be dealt with. However, of critical importance is that the
deep clean and tag step is not done once only. It is repeated weekly (or every second week) so that the
entire process area and machine is thoroughly deep cleaned and any defects that were missed the first
time are picked up on the next successive deep cleaning. Eventually, after several weeks of cleaning
and tagging, no more new tags will be found despite the machine being deep cleaned. This indicates that
all defects have been identified, though the rest of the cleaning and tagging process must still be
completed.
Analyze and Solve Tags

When a deep clean and tag activity is performed, a number of defects would have been tagged. All the
tags raised during the deep clean and tag activity will need to be reviewed by the team and a decision
made as to how they need to be fixed. For example, if a part is loose because it is missing a bolt, then
the action is simply to replace the missing bolt and fasten the loose part in position. This action is
captured on the tag list as per the example in Figure 65.
However, just fixing the defect identified by the tag is not good enough; the root cause of the defect
needs to be fixed. The team therefore needs to analyze each defect using a simple 5 why analysis in an
attempt to establish what the root cause of the defect is. It may not be possible to find the root cause of
some of the tags, but for others the root cause can be found. Once the root cause is found, an action
needs to be put in place to resolve the root cause. Therefore, for every tag there will be two actions; one
to deal with the tag (e.g. missing bolt) and one to deal with the root cause of the tag (e.g. create a torque
specification for the bolt). The action for the root cause is also captured on the tag list as per Figure 65.
Tag List

Tag No: T1234
Date Raised: 10-Apr-17
Raised by: Jackson Ogale
Location: Filler infeed scroll
Description of Defect: Loose bolt on scroll bracket
Defect Actions
  Action to Fix Defect: Tighten the bolt to the correct torque
  Who: Tom Hardy
  When: 17-Apr-18
  Status of Action: Complete
Root Cause Actions
  Root Cause: Bolt was not correctly torqued
  Action to Fix Root Cause: Identify the correct torque specification and create a one point lesson.
  Who: Tom Hardy
  When: 01-May-17
  Status of Action: In progress

Tag No: T1245
Date Raised: 10-Apr-17
Raised by: Peter Humphrey
Location: Crowner chucks
Description of Defect: Lots of crowns have fallen out of the chucks onto the machine bed
Defect Actions
  Action to Fix Defect: Clean up the fallen crowns
  Who: Peter Humphrey
  When: 17-Apr-18
  Status of Action: Complete
Root Cause Actions
  Root Cause: Chuck grippers worn as there is no schedule to replace them regularly
  Action to Fix Root Cause: Create a maintenance schedule to replace the chuck grippers
  Who: Peter Sithole
  When: 24-Apr-17
  Status of Action: Complete
Figure 65: Example of a tag list
As various solutions to the root causes of tags are defined, a record of them needs to be kept. These
solutions are documented on an improvement record an example of which is shown in Figure 66.
Improvement Record
Situation Before: [insert picture or drawing, followed by a description of the situation before]
Improvement: [insert picture or drawing, followed by a description of the improvement]
Expected Results: [description of the expected results]
Proposed by: J Jones
Approved by: T Sibebe
Figure 66: Example of an improvement record
By the end of the Analyze and Solve Tags step, there will be a list of dirty tags with their root causes
identified. There will be actions to address the dirty tag and its root cause. The clean and tag
activity board should be updated with all the relevant tables and charts.
Figure 67: Clean and tag activity board updated
Fix Tags and Root Causes

In the previous step, tags were analyzed to establish root cause and an action plan to fix the tag and its
root cause was developed. In this step, the solutions to the tags and the root causes are implemented.
The various actions to address the tag and its root cause can become quite extensive. There may be
hundreds of tags per process area each with two actions (one to fix the tag and one to fix the root
cause) resulting in thousands of actions. Managing these tag lists is important, as they will soon
become overwhelming if not managed correctly. More importantly, failure to address tags and root
causes will cause people to lose interest and belief in the process.
In order to manage the tag lists the following will need to be done:
• The tag lists on each activity board associated with a clean and tag activity must be kept up to
date by the team. This list is reviewed weekly during the clean and tag activity and any
completed actions are recorded. Additionally, new tags and actions are added to the tag list.
• Machine related tags must be raised as a work order in the Computerized Maintenance
Management System (CMMS) and dealt with through this system. Each tag has a unique
reference number and this reference number is captured in the CMMS system so that progress
can be tracked.
For clarity, the steps of Deep Cleaning and Tagging, Analyze and Solve Tags and Fix Tags and Root
Causes all take place at the same time. In other words, in the same week, the machine may be deep
cleaned, tags analyzed for root causes and solutions to older tags implemented.
In order to track the closure of tags and their root causes, the following KPIs must be measured and
displayed on the activity board.
• Number of tags raised and closed
• Number of root causes solved
• Number of One Point Lessons or Standard Operating Procedures created or modified
• Number of maintenance schedules created or updated (mandatory)
• Progress towards achieving the goal e.g. LEF graph
Other KPIs can be plotted, but the above are mandatory. As the Fix Tags and Root Causes step is
executed, the activity board is continuously updated.
After two to three months of deep cleaning, tagging, analyzing tags and fixing tags and root causes, the
machine's condition will have improved and its performance will improve correspondingly.
Sustain and Signoff

Once there are no more defects in the process area/machine, all tags and their root causes have been
fixed and all improvements to systems have been completed (e.g. OPL’s created, maintenance schedules
updated, manuals updated), the Cleaning and Tagging activity can be signed off as complete. A
critical check of the success of the clean and tag activity is that the following must be visible or present:
• Equipment performance from a quality, waste and reliability point of view must have improved
and been sustained at ‘as new’ conditions
• The equipment must look ‘as new’ in that defects should be minimal
• Evidence exists of the improvements made during the cleaning and tagging process e.g. new
maintenance schedules, new cleaning procedures, new OPL’s etc.
Team leaders, departmental managers and maintenance managers as part of their MCRS, will need to
check that the above is in place for every process area/machine that has completed the Cleaning and
Tagging process.
If the above conditions are sustained for several weeks, a clean and tag audit can be performed and the
process area/machine signed off. The audit can be done by the site maintenance manager or the plant
manager. The sign off can be used as an opportunity to recognize the team for a job well done.
Select Next Focus Area

After the Cleaning and Tagging activity has been signed off, a new process area/machine can be
selected using the principles discussed earlier. The entire process of forced deterioration elimination can
start again.
Once clean and tag has been done on a few machines, the significant improvement in individual
machine performance will start to have a significant impact on overall line and plant performance as
well. This follows from the fact that machine downtime typically follows the Pareto principle in that 20 %
of machines usually contribute to 80 % of the problems.
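As a minimal sketch of how the Pareto principle above can be used to rank machines, the Python snippet below sorts machines by downtime and returns those that together account for 80 % of the total. The machine names, the downtime hours and the function name pareto_focus are illustrative assumptions and not plant data.

```python
# Hypothetical downtime per machine in hours (illustrative values only).
downtime_hours = {
    "Filler": 120.0,
    "Labeller": 45.0,
    "Packer": 20.0,
    "Palletiser": 8.0,
    "Washer": 5.0,
    "Depalletiser": 2.0,
}


def pareto_focus(downtime, threshold=0.8):
    """Return the worst machines that together reach `threshold` of total downtime."""
    total = sum(downtime.values())
    if total == 0:
        return []
    # Rank machines from most to least downtime.
    ranked = sorted(downtime.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for machine, hours in ranked:
        selected.append(machine)
        cumulative += hours
        if cumulative / total >= threshold:
            break
    return selected


focus_machines = pareto_focus(downtime_hours)
print(focus_machines)
```

With these assumed figures, two of the six machines already exceed the 80 % threshold, which is the kind of concentration the Pareto principle describes.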
It is important that every machine in the plant is covered by an Eliminate Forced Deterioration process
(i.e. problem solving, Autonomous Operations and clean and tag). If this is not done, there will be areas
of the plant where forced deterioration is taking place and these areas will then become major problems
into the future.
4.13 Maintenance Facilities and Tools

This section of the maintenance handbook provides guidance on the management of maintenance
facilities such as workshops and the tools that are used by people performing maintenance.
Maintenance facilities are those areas of the plant where maintenance work is performed or
maintenance items are stored. The term also covers non-portable equipment such as milling machines. All plants
are required to have the following maintenance facilities:
• Workshops located in the relevant production departments. A centralized workshop is possible
provided that the workshop is centrally located such that all departments can easily and
efficiently access the workshop. In most cases, a centralized workshop is only effective in small
plants.
• Lubrication store/stores as per the standards defined in the Lubrication GOP.
• Specialized workshops where required e.g. filler valve workshop, labeler workshop etc.
• Central spare parts store
It is important that all VPO Safety requirements are sustained in the maintenance facilities. These
include but are not limited to requirements around:
• Machine guarding for all moving equipment
• Work instructions for all equipment
• Electrical personal protective equipment
• Lockout and tag out requirements
All maintenance facilities will implement 5S as per the requirements of the Management Pillar. This is to
ensure that maintenance effectiveness in the workshops is maximized. Being relatively controlled
environments, it is expected that high levels of 5S will be maintained.
Maintenance tools refer to portable tools and equipment such as chain blocks or screwdrivers. All plants
will need to analyze their tool requirements and define these requirements in the following categories:
• Tools required by technicians i.e. mechanical, electrical or automation. These tools may be either
issued to the employee but owned by the company or owned by the employee.
• Tools required by operators for autonomous operations responsibilities
• Specialized tools and equipment, which are generally expensive and are shared by several
people. Examples include bearing heaters, chain blocks, large spanners, thermographic
cameras, laser alignment devices etc.
During the course of a year, random audits are to be performed to ensure that the tools that employees
are required to have are on site and that they are in a good working condition.
Specialized tools will need to have a designated owner. These tools need to be stored in a controlled
area and procedures must exist around issuing and returning such tools. Only people who have been
suitably trained can use specialized tools.
For all tools, 5S standards shall apply to their storage. The use of visual measures such as shadow
boards is required.
4.14 Reliability Engineering

Reliability engineering represents the last key element of our maintenance philosophy. In summary the
four elements of the philosophy are:
• Establish basic conditions
• Sustain life
• Predict life
• Extend life
Reliability engineering enables the extension of life. Extend life refers to the process where equipment
and their components are improved so they function better than design. While the main effort of extend
life focuses on reliability, maintainability, availability and cost, this process is not limited to these four
elements. Improving quality or safety using the tools of reliability engineering will also be acceptable.
There are several inter-related processes that define reliability engineering.
4.15.1 Measuring Reliability

In order to manage the various maintenance processes, a measurement system is required. There are
several global maintenance performance indicators that each facility and department will need to
measure, track, report and improve.
Maintenance Performance Indicators

Key performance indicators for maintenance i.e. improvement targets set annually:
• PG-K1312 - LEF
• PG-K1623 - Brewing Breakdowns %
• PG-K1622 - Utilities Breakdowns %
Performance indicators for maintenance i.e. no improvement targets set but ranges of healthy/good
performance exist:
• PG-K1691 - Main Time Failure Solution (MTFS)
• PG-K5755 - Tags raised
• PG-K1651 - Packaging - Maintenance Outage (MTTO)
• PG-K1630 - Maintenance Resource Utilisation
• PG-K5760 - Packaging - Manhours per Maintenance Outage
• PG-K1610 - Maintenance Plan Attainment

• PG-K5762 - % Correctives from PM and PdM
• PG-K5757 - Tags closed
• PG-K5770 - Backlog
• PG-K1660 - Packaging - Mean Time Between Failure
• PG-K1670 - Mean Time to Repair (MTTR) - Global Consolidation
• PG-K5768 - Autonomous Operations Tasks %
• PG-K5752 - Packaging - Stoppages Greater than 10 mins %
• PG-K5754 - Packaging - Stoppages Less than 10 mins %
Appendix 2 contains the definitions of the above maintenance performance indicators.
Reporting and Tracking of Maintenance Performance Indicators

All facilities, departments and areas will need to define as part of their one-year strategic plan, their
focus areas for maintenance. These maintenance focus areas will elevate the importance of some
maintenance performance indicators above others. The maintenance performance indicators which are
of elevated importance, will be tracked daily and/or weekly in the relevant team rooms. Front line team
members must be able to understand the meaning and importance of the maintenance performance
indicators tracked in their team rooms.
All maintenance performance indicators will be tracked and reported monthly by all departments and
facilities. This is to ensure that all maintenance processes are being executed properly, and nothing
falls between the cracks. All technicians and maintenance personnel must be able to understand the
different maintenance performance indicators and their importance to their work.
Maintenance Master Data and Maintenance History

With the advent of Industry 4.0, data has become even more important than before. Data, however, is
only useful if it is accurate and structured.
Master data in the CMMS system ensures that data is properly structured and comparable between
multiple facilities. The following basic requirements must be met when it comes to maintenance master
data in the CMMS:
• Functional locations are created down to Level 5 using the following nomenclature: Company ->
Facility -> Department -> Area e.g. filtration -> Equipment e.g. Lauter Tun
• Below level 5 of the functional location equipment, sub-systems and components can be defined
• Maintenance history must be captured at a minimum to Level 5 of the functional location
structure. It is acceptable to capture maintenance history at the equipment and sub-system level
as well. Maintenance history is captured through the work order.
• Names of equipment at Level 5 of the functional location must be aligned to the zone standard
• Spare parts must follow a structured naming convention aligned to the zone standard
• Bill of materials must exist for all equipment. This ensures that there is clear identification of
spare parts that are used.
In order to ensure the integrity of master data and maintenance history, maintenance planners need to
audit their maintenance systems to identify and correct deviations.
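The five-level functional location structure described above can be sketched as a small data structure, which is one way a planner's audit script might check that a location is complete. This is only an illustration: the class name, the function name, the " -> " separator and the example company and plant names are assumptions and do not represent the CMMS or the zone standard.

```python
from dataclasses import dataclass

# The five levels named in the text, from Level 1 to Level 5.
LEVELS = ("Company", "Facility", "Department", "Area", "Equipment")


@dataclass(frozen=True)
class FunctionalLocation:
    company: str
    facility: str
    department: str
    area: str
    equipment: str

    def path(self) -> str:
        """Render the location in the 'Company -> ... -> Equipment' form."""
        return " -> ".join(
            (self.company, self.facility, self.department, self.area, self.equipment)
        )


def parse_functional_location(path: str) -> FunctionalLocation:
    """Parse a path and reject it if any of the five levels is missing."""
    parts = [part.strip() for part in path.split("->")]
    if len(parts) != len(LEVELS) or not all(parts):
        raise ValueError(f"Expected {len(LEVELS)} non-empty levels, got: {parts}")
    return FunctionalLocation(*parts)


# Hypothetical example location.
location = parse_functional_location(
    "ExampleCo -> Plant A -> Brewing -> Filtration -> Lauter Tun"
)
print(location.equipment)
```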
Analysis of Maintenance History

Maintenance history is captured in the CMMS system through the work order. There is an enormous
amount of valuable information contained in a work order including:
• What equipment/subsystems/components failed
• When did they fail
• How did they fail
• When preventive and corrective maintenance activities were performed
• What spare parts were used
• How long did a maintenance activity take
• Etc
This information is invaluable in understanding the reliability of the facility and the effectiveness of
maintenance execution. It is expected that this data is analysed on a routine basis to obtain insights:
• Structured analysis of downtime is performed to identify machines with chronic failures over the
52/12/4 week time horizon.
• Ad hoc analysis to resolve specific issues that the plant experiences
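As a rough illustration of the 52/12/4 week analysis mentioned above, the sketch below totals breakdown downtime per machine over each horizon from a list of work-order records. The record layout (machine, date, downtime hours), the sample values and the function name are assumptions; a real analysis would draw these fields from the CMMS work orders.

```python
from datetime import date, timedelta

# Hypothetical work-order extracts: (machine, breakdown date, downtime in hours).
work_orders = [
    ("Filler", date(2022, 12, 20), 3.0),
    ("Filler", date(2022, 11, 1), 5.0),
    ("Filler", date(2022, 3, 1), 10.0),
    ("Labeller", date(2022, 12, 25), 1.5),
    ("Labeller", date(2021, 6, 1), 8.0),
]


def downtime_by_horizon(records, today, horizons_weeks=(52, 12, 4)):
    """Total downtime per machine for each look-back horizon (in weeks)."""
    result = {}
    for weeks in horizons_weeks:
        start = today - timedelta(weeks=weeks)
        totals = {}
        for machine, when, hours in records:
            # Keep only records that fall inside the look-back window.
            if start < when <= today:
                totals[machine] = totals.get(machine, 0.0) + hours
        result[weeks] = totals
    return result


summary = downtime_by_horizon(work_orders, today=date(2022, 12, 31))
for weeks, totals in summary.items():
    print(weeks, totals)
```

A machine that ranks high in all three horizons is a candidate chronic failure, whereas one that appears only in the 4-week view may be a recent, isolated problem.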
Summary of Requirements for Measuring Reliability
• All areas, departments and facilities will need to measure, track, report and improve all
maintenance performance indicators
• Some maintenance performance indicators are more important and will need to be tracked and
reported daily/weekly
• All maintenance performance indicators will need to be measured, tracked and reported at least
monthly
• All front-line team members (operators, technicians, planners, supervisors etc) must be able to
explain the meaning and importance of the maintenance performance indicators tracked in their
area of responsibility
• Functional locations are created down to Level 5 using the following nomenclature: Company ->
Facility -> Department -> Area e.g. filtration -> Equipment e.g. Lauter Tun
• Below level 5 of the functional location equipment, sub-systems and components can be defined
• Maintenance history must be captured at a minimum to Level 5 of the functional location
structure. It is acceptable to capture maintenance history at the equipment and sub-system level
as well.
• Names of equipment at Level 5 of the functional location must be aligned to the zone standard
• Spare parts must follow a structured naming convention aligned to the zone standard
• Bill of materials must exist for all equipment. This ensures that there is clear identification of
spare parts that are used.
• Maintenance master data and history is audited by maintenance planners to ensure the integrity
of the data
• Structured analysis of downtime is performed to identify machines with chronic failures over the
52/12/4 week time horizon.
4.15.2 Maintenance Change Management

There are many changes that the maintenance function brings about. Some of the changes are related
to maintenance systems and processes (such as changes to the CMMS system, implementation of a
new SWI etc) and other changes are related to plant infrastructure (change the design of a machine,
install an alternative part, change PLC code etc). Any change, no matter how small, can have significant
consequences. It is therefore required that changes be managed via a management of change process.
The Management Pillar fully defines the requirements for the management of change system that needs
to be implemented. Maintenance departments and functions are required to implement the
management of change process for all changes originating from or being performed by maintenance.
Some of the key requirements include:
• Change requests are documented
• The rationale, risks and consequences of the change are defined
• Changes are only implemented if formally approved
• Maintenance systems and documentation are updated once the change has been successfully
implemented
Maintenance systems and documents that need to be updated include but are not limited to:
• People
o SOP’s and SWI’s are created or updated
o People are trained on the SOP’s and SWI’s
• Documents are obtained and stored
o Design specifications for the change including HAZOP’s
o Engineering drawings (P&ID’s, electrical line diagrams, general arrangement drawings,
machine drawings, I/O lists etc.)
o Final equipment setup and configuration set points
o Equipment registers and certificates such as pressure vessel test certificates
o Equipment manuals (Operating and maintaining)
o Documents pertaining to any removed equipment are removed and archived
o Logbooks pertaining to any new equipment are introduced into the production and
maintenance environment
o Copies of acceptance tests
• Software programs
o Copies of the final software installation are obtained
• Planned Maintenance
o The master data of the CMMS system is updated to reflect the changes
o Criticality assessments have been performed on any new equipment
o Preventive and predictive maintenance schedules are loaded into the CMMS system
o Preventive and predictive maintenance schedules of removed equipment are deleted from
the CMMS system
• Spares
o Spares holdings are updated with new equipment requirements
o Obsolete spares are removed and disposed of
o A copy of supplier warranties is obtained and captured into relevant systems
• Tools
o Any new tools and equipment are purchased and installed
Summary of Requirements for Maintenance Change Management
• Maintenance departments and functions are required to implement the management of change
process for all changes originating from or being performed by maintenance.
• Change requests must be documented
• The rationale, risks and consequences of the change must be defined
• Changes are only implemented if formally approved
• Maintenance systems and documentation must be updated once the change has been
successfully implemented
4.15.3 Early Equipment Management

The purpose of early equipment management is to integrate production and maintenance requirements
into all phases of the capital project process, in order to achieve vertical start-up.
Figure 68: Traditional vs Early Equipment Management project
Figure 68 illustrates the main differences between a conventional project and one with early equipment
management implemented. With traditionally managed projects, a lot of operational problems are
experienced once the equipment is started up. The performance ramp up of the new installation takes a
long time and sometimes the project team is dissolved without resolving some key issues on the project.
It is also not uncommon for a traditional project to fail to sustain the design performance criteria for
the project.
With early equipment management however, great effort is spent in the scoping and design stages of
the project to identify and resolve many operational and maintenance requirements. While this means
that there is more effort in the scoping and design stages, on start-up, there are very few problems and
the equipment very quickly ramps up to high levels of performance, and is able to sustain this
performance.
The implementation of early equipment management requires several key activities:
• Defining and continuously improving equipment technical specifications
• Performing maintenance design reviews using the RAMS methodology
• Integrating the specific maintenance organisation and maintenance system requirements into the
Transition Core Process
Global Technical Specifications

Equipment technical specifications are used by the project team to design and procure plant and
equipment. In AB InBev a set of Global Technical Specifications exists covering many aspects of
brewery design including a specific document on maintenance. Maintenance must ensure that any new
project complies with the Maintenance Global Technical Specification.
Maintenance Design Reviews

The second key requirement of an early equipment management system is the involvement of operational and
maintenance personnel in the design of the project. The design review is an opportunity for the
operational and maintenance teams to influence the machine design by providing input into the current
design. The design review process however is a structured process and not a casual chat with the
project team about the new equipment.
The maintenance team will need to review the design from several perspectives:
• Reliability – this refers to the ability of the equipment to function at its design specification. It is
measured by mean time between failure (MTBF); for packaging lines, minor stops also need
to be considered.
• Maintainability – this refers to how easily equipment can be maintained. It is a function of the
amount of maintenance a piece of equipment requires and the complexity of this maintenance.
Mean time to repair (MTTR) and maintenance outage time are measures of maintainability.
• Availability – this refers to the amount of time the equipment can be effectively utilised. It is a
combination of reliability, maintainability and redundancy. A boiler for example has a very high
reliability as it can run for an entire year without failure. The maintainability though, is relatively
poor because it needs to be stopped every year for a major cleaning and inspection which takes
2 weeks. However, by installing a second standby boiler, a high system availability can be
achieved despite the poor maintainability of the individual boiler.
• Safety – the safety of the new design is assessed using methods defined in VPO Safety such as
HAZOP’s.
The above design review is typically called a RAMS review (Reliability, Availability, Maintainability and
Safety review). Design reviews are team activities; therefore the RAMS design review will always
involve the facility maintenance team and the project team.
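The boiler example above can be put into numbers with a short sketch. It assumes a 52-week year in which a boiler runs for 50 weeks and is down for 2 weeks, and it treats the two boilers as independent, which is a simplification (planned outages can usually be staggered, so the real figure may be higher). The function names are illustrative.

```python
def availability(uptime, downtime):
    """Fraction of time the equipment can be used: uptime / (uptime + downtime)."""
    return uptime / (uptime + downtime)


def parallel_availability(unit_availabilities):
    """Availability of redundant units, assuming they are unavailable independently."""
    all_unavailable = 1.0
    for unit in unit_availabilities:
        all_unavailable *= 1.0 - unit
    return 1.0 - all_unavailable


# Assumption: 50 weeks running and 2 weeks of cleaning and inspection per year.
single_boiler = availability(uptime=50.0, downtime=2.0)
with_standby = parallel_availability([single_boiler, single_boiler])

print(f"Single boiler: {single_boiler:.1%}")
print(f"With standby boiler: {with_standby:.1%}")
```

The sketch shows why redundancy is reviewed together with reliability and maintainability: the standby unit raises system availability without changing the maintainability of either boiler.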
Figure 69: Design review process
Figure 69 illustrates the design review steps. At the start of the design process, the design team is
provided with the global and zone technical specifications (including ones on maintenance). These
specifications will define the major technical requirements of the project. Additionally, the facility may
provide a set of plant specific requirements that are not catered for in the global and zone technical
specifications.
The design team will take all the technical specifications and start working on the preliminary design.
Once the preliminary design is done, a 30 % RAMS design review can be performed. At this stage, a
high-level design should be complete, and the overall technical scope is clearer. The main objective of
the 30 % RAMS design review is to review the technical scope against the technical specifications and
agree the major trade-offs that need to be made to achieve the scope, budget and time constraints of
the project. Note that, as this is a maintenance-related design review, the review is limited to
maintenance-related items.
For example, the Maintenance Global Technical Specification requires that all parts of a machine be
accessible for maintenance. It could be possible once the 30 % design is completed, that the design
team determines that the requirements around maintenance accessibility cannot be met. During the 30
% design review, gross problems like these are identified and resolved.
Once the 30 % design review is done, the design team further refines the design going into greater
details. By the time 60 % of the design is done, many key details of the design have been established
such as layouts, process flow diagrams, P&ID’s etc. The 60 % RAMS design review will focus on
several things:
• How will the plant/equipment be laid out and does this meet the requirements of the
global/zone/plant technical specifications? If not, what are the solutions or trade-offs that have to
be made?
• How will the plant/equipment function and will it meet the requirements of the global/zone/plant
technical specifications? If not, what are the solutions or trade-offs that have to be made?
Once again as this is a maintenance design review, the scope of the review will be limited to
maintenance i.e. reliability, maintainability and availability.
Once the 60 % design review is completed, the design is further developed and costed. Once 90 % of
the design is completed, the 90 % RAMS design review is performed. At this stage, most technical
details should be finalised. The purpose of this design review is to make the final trade-offs given that
the design has been largely cost estimated. The cost estimate will provide insight as to what the design
will cost, allowing the technical scope to be adjusted given the project cost and budget.
Maintenance Transition Management

The purpose of the transition process is to ensure that the operational and maintenance requirements
for a project are considered early in the project process and properly executed prior to handover.
Key transition activities for the maintenance function include:
• Setup the transition team - define who will represent maintenance in the transition team and what
their roles and responsibilities will be. A regular routine must be established.
• Documentation – define and obtain the documents necessary for effective maintenance
execution e.g. maintenance manuals, recommended preventive maintenance tasks, P&ID’s etc
• VPO Maintenance requirements – identify the key VPO maintenance requirements and work with
the projects and maintenance teams to ensure that these work practices are in place. The
expectation is to achieve basic level at start-up and sustainable in the second year of operation.
• Materials – identify the required spare parts and ensure that these are in place at start-up
• People – define the structure and roles required and ensure that the roles are filled leading
up to start-up. Additionally, ensure that training requirements are defined and training is
implemented.
Focusing on the above maintenance transition activities, coupled with technical specifications and
design reviews, will ensure a strong early equipment management system. A strong early equipment
management system will ensure that effort in a project is put into the design and construction
phases, so that vertical start-up can be achieved and performance sustained.
Summary of Requirements for Early Equipment Management
• The Maintenance Global Technical Specification is utilized in capital projects implemented at a
facility
• When the design has reached 30 % completion, a 30 % maintenance design review is done
• When the design has reached 60 % completion, a 60 % maintenance design review is done
• When a design has reached 90 % completion, a 90 % maintenance design review is done
• Ensure that a maintenance representative exists in any capex project transition team
• Maintenance documents and information required for handover by the project team have been
defined
• A plan for the implementation of Maintenance VPO requirements during the transition phase of
the project has been defined
• The requirements of the Maintenance VPO transition plan have been implemented
• Spare parts required for the equipment have been received at start-up of the equipment
• Technician and operator positions have been defined and filled prior to start-up of the equipment
• All operators and technicians have had a skill assessment against their SKAP role profile and a
plan exists to achieve competency
• Basic training has been performed to ensure that operators and technicians understand the new
equipment and can perform basic tasks i.e. license to drive
4.15.4 Reliability Modelling and Analysis

In order to effectively extend life, a thorough understanding of component and system reliability is required.
At this point, reliability analysis becomes statistical in nature and an advanced understanding of
probability is necessary to make the next step in reliability improvement.
For example, consider a scenario where you are considering changing suppliers of a part that you buy for
one of your machines. How would you determine if the new supplier’s parts are more reliable than the
old supplier’s parts? The answer to this is not as simple as comparing the mean time to failure of the
two components.
Component Reliability
You will recall from the RCM discussion that different machine components have different conditional
failure probability distributions. Therefore, to completely understand the reliability of a component you
have to understand the conditional probability failure distribution.
Figure 70: Conditional probability failure distributions
There are many mathematical models that define conditional probability failure distributions. These
include:
• Exponential probability density function
• Normal probability density function
• Logistic probability density function
• Lognormal probability density function
• Gumbel probability density function
• Weibull probability density function
• Gamma probability density function
• Rayleigh probability density function
In order to establish which model best defines the failure distribution of a component, the candidate
models are fitted to the failure history of the component and a goodness of fit test is performed to
assess which model describes it best. The fitting methods and statistical tests used include:
• The plot method
• Rank regression
• Maximum likelihood method
• Student t test
• Chi Square Test
• Kolmogorov-Smirnov method
• Cramer-von Mises test
Once the failure distribution model has been defined then a full understanding of the reliability of that
component can be obtained. The failure distribution model allows for the calculation of several
useful reliability measures including:
• Component mean time to failure
• Failure rate for the component
• The probability of failure at a certain time e.g. 10 % chance of failure at 1000 hours of operation
The reliability information obtained from the failure probability distribution model can be used to optimize
the preventive maintenance system:
• The reliability of a component supplied by different suppliers can be compared and the most
reliable design selected
• The reliability of the component at different moments in time can be evaluated and the most
appropriate time for a maintenance intervention defined. For example, for a critical application,
we would schedule a maintenance task when the reliability of the component dropped below 90
% while on a less critical component, we could schedule a maintenance task when the reliability
dropped below 75 %.
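The scheduling rule in the last bullet can be sketched for a two-parameter Weibull model, where the reliability is R(t) = exp(-(t/eta)^beta). The shape (beta) and scale (eta) values below are hypothetical and chosen only for illustration; in practice they would come from fitting the component's failure history.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability that a component survives beyond time t (two-parameter Weibull)."""
    return math.exp(-((t / eta) ** beta))


def time_to_reliability(target, beta, eta):
    """Operating time at which reliability has dropped to `target` (inverse of R(t))."""
    return eta * (-math.log(target)) ** (1.0 / beta)


# Hypothetical parameters: beta > 1 means a wearing component, eta is in hours.
BETA, ETA = 2.0, 6000.0

critical_interval = time_to_reliability(0.90, BETA, ETA)  # critical application
standard_interval = time_to_reliability(0.75, BETA, ETA)  # less critical component

print(f"Intervene at about {critical_interval:.0f} h for 90 % reliability")
print(f"Intervene at about {standard_interval:.0f} h for 75 % reliability")
```

The critical application always gets the shorter interval, because a higher required reliability is reached earlier in the component's life.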
Component Reliability Practical Example using Minitab

Consider a component with lifetimes (time to failure) per the table below.
Life Hours
2184 10203 1911
2310 3685 6056
4434 7214 6936
2457 275 6657
6215 10492 3858
315 7270 8125
3478 5899 9213
7861 5557 6157
2332 2901 11734
4731 7193
Figure 71: Life time hours
A distributions analysis using a histogram reveals the overall shape of the distribution. The mean time
to failure of this component is 5436 hours. (Minitab path: Assistant->Graphical Analysis->Histogram)
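For readers without Minitab, the mean time to failure quoted above can be checked with a few lines of Python using the lifetimes from Figure 71. The variable names are illustrative.

```python
from statistics import mean

# Lifetimes (hours to failure) transcribed from Figure 71.
life_hours = [
    2184, 10203, 1911,
    2310, 3685, 6056,
    4434, 7214, 6936,
    2457, 275, 6657,
    6215, 10492, 3858,
    315, 7270, 8125,
    3478, 5899, 9213,
    7861, 5557, 6157,
    2332, 2901, 11734,
    4731, 7193,
]

mttf = mean(life_hours)
print(f"{len(life_hours)} lifetimes, mean time to failure = {mttf:.0f} hours")
```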
Analysis of the process mean and variation shows that there are no out of bounds events and that the
process is relatively stable. This means that statistical tools can be applied. (Minitab path:
Assistant->Graphical Analysis->I-MR Chart)
The best fitting distribution can be found by performing a goodness of fit test. Per the table below, the
best fitting curve is the 3 Parameter Weibull as the Anderson-Darling statistic is the lowest for this
distribution. (Minitab path: Stat->Reliability/Survival->Distribution Analysis (Right Censoring)->Distribution ID Plot)
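The sketch below illustrates rank regression, one of the fitting methods listed earlier, on the Figure 71 data. It is not a reproduction of the Minitab analysis: it fits a two-parameter Weibull rather than the three-parameter model selected above, it uses Bernard's approximation for median ranks, and it does not compare candidate distributions. The function name is an assumption.

```python
import math

# Lifetimes (hours to failure) transcribed from Figure 71.
life_hours = [
    2184, 10203, 1911, 2310, 3685, 6056, 4434, 7214, 6936, 2457,
    275, 6657, 6215, 10492, 3858, 315, 7270, 8125, 3478, 5899,
    9213, 7861, 5557, 6157, 2332, 2901, 11734, 4731, 7193,
]


def weibull_rank_regression(times):
    """Estimate Weibull shape (beta) and scale (eta) by median rank regression."""
    ordered = sorted(times)
    n = len(ordered)
    xs, ys = [], []
    for rank, t in enumerate(ordered, start=1):
        median_rank = (rank - 0.3) / (n + 0.4)  # Bernard's approximation
        xs.append(math.log(t))
        # Linearised Weibull CDF: ln(-ln(1 - F)) = beta*ln(t) - beta*ln(eta)
        ys.append(math.log(-math.log(1.0 - median_rank)))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                  # slope of the fitted line
    intercept = y_bar - beta * x_bar
    eta = math.exp(-intercept / beta)
    return beta, eta


beta, eta = weibull_rank_regression(life_hours)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

A shape parameter above 1 indicates a failure rate that rises with age. The fitted values will differ from the three-parameter Minitab fit, so this sketch should be read as an explanation of the method and not as a replacement for the analysis above.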
Given that the 3 Parameter Weibull distribution is the best fitting distribution, this can then be used to
derive the exact distribution curve as well as the survival function and the hazard function. (Minitab
path: Stat->Reliability/Survival->Distribution Analysis (Right Censoring)->Distribution Overview Plot)
The survival plot shows the probability of survival of a component. From the chart above, at 3000 hours,
approx. 75 % of components would still be surviving or there is a 25 % chance of failure. This data can
be used to determine the frequency of a PM task. Where high levels of risk can be tolerated, PM
frequencies corresponding to low survival rates can be chosen e.g. 50 % chance of survival. On the
other hand, when low levels of risk must exist, then PM frequencies can be chosen to give a high
survival rate e.g. 90 % to 95 % or higher. In this way, PM frequencies can be chosen by taking a
scientific approach.
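As a cross-check on the survival plot reading, the empirical survival fraction can be computed directly from the Figure 71 data without fitting any distribution. This is a simple counting sketch and the function name is illustrative.

```python
# Lifetimes (hours to failure) transcribed from Figure 71.
life_hours = [
    2184, 10203, 1911, 2310, 3685, 6056, 4434, 7214, 6936, 2457,
    275, 6657, 6215, 10492, 3858, 315, 7270, 8125, 3478, 5899,
    9213, 7861, 5557, 6157, 2332, 2901, 11734, 4731, 7193,
]


def empirical_survival(times, t):
    """Fraction of observed components whose lifetime exceeded t hours."""
    return sum(1 for lifetime in times if lifetime > t) / len(times)


surviving_at_3000 = empirical_survival(life_hours, 3000)
print(f"Empirical survival at 3000 hours: {surviving_at_3000:.1%}")
```

In this sample, 21 of the 29 lifetimes exceed 3000 hours (about 72 %), which is broadly in line with the approximately 75 % read from the fitted curve.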
The hazard function shows the instantaneous failure rate of a component, i.e. the likelihood of failure at
a given age for a component that has survived to that age. In our example, it shows that as the
component ages, the probability of failure increases. This kind of distribution shows that the item wears
out (increasing chance of failure as time progresses), making replacement of this component a good
strategy. The hazard function is discussed in the section on RCM.
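The link between the shape of the hazard function and the maintenance strategy can be sketched with the two-parameter Weibull hazard, h(t) = (beta/eta) * (t/eta)^(beta - 1). The parameter values below are hypothetical and the function name is an assumption.

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate at age t for a two-parameter Weibull model."""
    return (beta / eta) * (t / eta) ** (beta - 1.0)


ETA = 6000.0  # hypothetical scale parameter in hours

# beta > 1: hazard rises with age (wear-out), so age-based replacement can pay off.
wear_out = [weibull_hazard(t, 2.0, ETA) for t in (1000.0, 3000.0, 5000.0)]

# beta = 1: hazard is constant (random failures), so replacing on age gives no benefit.
random_failures = [weibull_hazard(t, 1.0, ETA) for t in (1000.0, 3000.0, 5000.0)]

# beta < 1: hazard falls with age (early-life failures).
early_life = [weibull_hazard(t, 0.5, ETA) for t in (1000.0, 3000.0, 5000.0)]

print(wear_out)
print(random_failures)
print(early_life)
```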
System Reliability
When different machine components are assembled together, they form a system. Understanding the
conditional probability failure distributions for components enables the calculation of the reliability of the
system.
A reliability block diagram illustrates the reliability relationship between different components. Consider
an electric motor (this is the system). It consists of a shaft, two bearings and the stator windings. If any
one of these four components fails, the motor fails. From a reliability block diagram
point of view, the system would be illustrated as per Figure 72.
Figure 72: Reliability block diagram of an electric motor
Suppose the reliability of the individual components at 1000 operating hours is as follows:
• Drive end bearing – 98%
• Shaft – 99.99%
• Non-drive end bearing – 98%
• Stator windings – 97%
What is the reliability of the system at 1000 hours? For components with a series reliability block
diagram, the system reliability is the product of the individual component reliabilities i.e.
98%x99.99%x98%x97%=93% system reliability at 1000 operating hours.
In general, for series systems the reliability of the system is given by:
$$R = \prod_{i=1}^{n} R_i$$
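The series calculation for the electric motor can be reproduced with a short sketch. The component reliabilities are those given above; the function and variable names are illustrative.

```python
import math


def series_reliability(component_reliabilities):
    """Reliability of a series system: every component must survive."""
    return math.prod(component_reliabilities)


# Component reliabilities at 1000 operating hours, from the motor example.
motor_components = {
    "Drive end bearing": 0.98,
    "Shaft": 0.9999,
    "Non-drive end bearing": 0.98,
    "Stator windings": 0.97,
}

motor_reliability = series_reliability(motor_components.values())
print(f"Motor reliability at 1000 hours: {motor_reliability:.1%}")
```

Note that the system reliability is lower than that of any single component, which is characteristic of series systems.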
Systems can also be arranged in parallel. Consider a boiler house with two boilers of which only one
runs at any given time. The reliability of the individual boilers at 8000 hours is as follows:
• Boiler 1 – 96%
• Boiler 2 – 95%
What is the reliability of the system at 8000 operating hours? For components in parallel the reliability of
the system is:
$$R = 1 - \prod_{i=1}^{n} (1 - R_i)$$
Thus at 8000 operating hours the boiler system reliability is 1 − (1 − 0.96) × (1 − 0.95) = 0.998, i.e. 99.8%.
Figure 73: Reliability block diagram for a boiler house
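The boiler house calculation can be reproduced with a similar Python sketch. The function name is illustrative; the calculation assumes the two boilers fail independently and that either boiler alone can carry the load.

```python
import math

def parallel_reliability(reliabilities):
    """Reliability of a parallel system: it fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

# Boiler reliabilities at 8000 operating hours, taken from the boiler house example
boilers = [0.96, 0.95]

r_boiler_house = parallel_reliability(boilers)
print(f"Boiler house reliability at 8000 h: {r_boiler_house:.1%}")  # prints 99.8%
```

In contrast to the series case, the parallel system is more reliable than its best single component.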
In reality, systems are not simply series or parallel, but rather a complex combination of both. Further,
rather than having a single reliability point (e.g. 98 % at 1000 hours), the conditional probability
distributions provide reliability information at every time period, making the evaluation of system
reliability more complex. It is, however, still possible to evaluate system reliability through the use of
computer-based simulation.
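As an illustration of what such a simulation might look like, the Python sketch below uses Monte Carlo sampling on a small mixed system: two pumps in parallel, in series with one heat exchanger. The system layout, the Weibull parameters and the function names are all hypothetical, and the model assumes independent failures, both pumps running, and no repair. An analytic check is included only because this toy system is simple enough to solve by hand.

```python
import math
import random

# Hypothetical components: (Weibull scale in hours, Weibull shape)
PUMP_A = (9000.0, 1.8)
PUMP_B = (7000.0, 1.5)
EXCHANGER = (20000.0, 2.5)

def system_life(rng):
    """Sample one time to failure for the whole system."""
    pump_a = rng.weibullvariate(*PUMP_A)
    pump_b = rng.weibullvariate(*PUMP_B)
    exchanger = rng.weibullvariate(*EXCHANGER)
    # The parallel pump block lasts until the last pump fails;
    # the series system fails as soon as either block fails.
    return min(max(pump_a, pump_b), exchanger)

def simulate_reliability(times, trials=20000, seed=1):
    """Estimate system reliability at each time as the share of surviving trials."""
    rng = random.Random(seed)
    lives = [system_life(rng) for _ in range(trials)]
    return {t: sum(life > t for life in lives) / trials for t in times}

def analytic_reliability(t):
    """Closed-form reliability of the same toy system, used as a cross-check."""
    def surv(eta, beta):
        return math.exp(-((t / eta) ** beta))
    pumps = 1.0 - (1.0 - surv(*PUMP_A)) * (1.0 - surv(*PUMP_B))
    return pumps * surv(*EXCHANGER)

curve = simulate_reliability([2000, 4000, 8000])
for t, r in curve.items():
    print(f"R({t} h): simulated {r:.3f}, analytic {analytic_reliability(t):.3f}")
```

The same sampling approach extends to layouts that have no convenient closed form: only system_life needs to change to reflect the block diagram.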
Understanding system level reliability is important as it allows us to:
• Understand the reliability of the system over all time periods. Although the mean time to failure
of a system can be known without understanding system reliability, it is only one data point and
the shape of the system reliability distribution remains unknown. Understanding the shape of the
conditional probability failure distribution is therefore important.
• Understand how best to improve system reliability. Fixing the weakest component does not
necessarily improve system reliability the most. For example, in a pure parallel system, system
reliability is improved more by improving the reliability of the most reliable component than by
improving the least reliable component. In complex systems, identifying where to improve reliability
is not easy, and system reliability simulations are key to establishing where to focus.
• Forecast system reliability. When integrated with live data from machine condition sensors and
PM condition results, the system reliability model becomes a digital twin of the reliability of the
equipment, making it possible to forecast system reliability into the future.
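The point about parallel systems can be checked numerically. The Python sketch below reuses the boiler reliabilities from the earlier example and compares the system-level gain from improving each boiler by the same amount; the one percentage point improvement and the function names are assumptions made for illustration.

```python
import math

def parallel_reliability(reliabilities):
    """Reliability of a parallel system: it fails only if every component fails."""
    return 1.0 - math.prod(1.0 - r for r in reliabilities)

def gain_from_improving(reliabilities, index, delta=0.01):
    """System reliability gained by raising one component's reliability by delta."""
    improved = list(reliabilities)
    improved[index] = min(1.0, improved[index] + delta)
    return parallel_reliability(improved) - parallel_reliability(reliabilities)

boilers = [0.96, 0.95]  # Boiler 1 is the more reliable unit

gain_best = gain_from_improving(boilers, 0)   # improve the most reliable boiler
gain_worst = gain_from_improving(boilers, 1)  # improve the least reliable boiler

print(f"Improve Boiler 1 by 1 point: system gains {gain_best:.4%}")
print(f"Improve Boiler 2 by 1 point: system gains {gain_worst:.4%}")
```

The gain from improving one boiler is proportional to the unreliability of the other, so the same improvement is worth more on the more reliable unit. In practice the cost of achieving each improvement would also need to be weighed.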
It is required that key equipment have system level reliability models in place. These models must be
used to refine the preventive maintenance approach for the equipment. Additionally, these models must
be used to identify where improvements must be made to improve overall system reliability.
Summary of Requirements for Reliability Modelling and Analysis
• Establish the failure probability distribution function for a component using failure data and
goodness of fit tests
• Utilise the component failure probability distribution function to obtain reliability information on the
component such as the mean time to failure or the survival rates (from the survival function)
• Utilise this component reliability information to adjust the preventive maintenance plans for the
component
• Build a system level reliability model for important equipment
• Use this system level model to understand the reliability of the system over time and optimize the
preventive maintenance program
• Use the system level model to identify where component level improvements are best made to
improve system level reliability
4.15.5 Equipment and Process Design

Changing the design capabilities of equipment or processes represents the final and most advanced
step in the life cycle of the asset. Before Extend Life work is undertaken, the following key
prerequisites must be in place:
• Basic conditions are established, and equipment is sustained in as-new condition
• RCM is complete and the equipment preventive maintenance program is optimal
• Predictive maintenance and condition monitoring are in place
Objective:
The objective of Equipment and Process Design is to apply engineering principles to design parts or
processes that are better than the original design. This improvement will include any or all of the
following: cost, quality, reliability, performance, flexibility, waste etc. It is key to note that this covers
both packaging and process areas, and spend in both Opex and Capex.
Implementation:
The implementation of Equipment and Process Design can be broadly divided into two phases
depending on the maturity and capabilities of the organization:
• Phase 1 – simple parts or processes where the main aim is to duplicate the existing design with
some small improvements
• Phase 2 – more complex projects where the component or process is complex and significant
improvements are made to the design
Equipment Design Phase 1:

In order to perform equipment design, there are two key steps, namely establishing the geometry and
determining the material specifications. As the main aim of this phase is to duplicate existing designs
with small improvements, the geometry is obtained by measurements of the existing part. Material
analysis needs to be performed to understand the material of construction and surface treatments if any.
The material information together with the measurement data is then used to create a CAD 3D model of
the part. Construction drawings are then derived from the CAD 3D model and the material
requirements.
Tools required for this step are:

• CAD software
• Measurement equipment (vernier caliper, micrometer etc.)
Fabrication of these parts can be done in-house or via a third party. If done in-house, then the following
equipment is required:
• 3D printers for prototyping and test fitting
• CNC machines for part manufacture
Equipment Design Phase 2:

With phase 2, the aim is to go beyond duplicating the geometry and materials and to make significant
improvements. In this phase there may be an existing part or a completely new part may need to be
designed. Mechanical engineering principles will need to be applied to correctly design the part, taking
into account stress, strain, fatigue, corrosion etc.
Tools required for this step are:
• CAD software
• Measurement equipment (vernier caliper, micrometer etc.)
• 3D Scanner
o For parts with complex geometries
o Faster creation of geometries
o Post-manufacture dimensional control
o Statistical analysis of wear
Fabrication of these parts can be done in-house or via a third party. If done in-house, then the following
equipment is required:
• 3D printers for prototyping and test fitting
• CNC machines for part manufacture
Process Design:
Process design may involve mechanical design (e.g. optimal shape of the rakes of the lauter tun) but will
often require process engineering knowledge (heat transfer, thermodynamics, fluid mechanics etc).
Examples of this include:
• Improving flow velocity in a CIP system by correctly sizing the pump. This requires a knowledge
of fluid mechanics and pump curves
• Speeding up process time by increasing the size of heat exchangers. This requires knowledge
of heat transfer and heat exchanger design
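The CIP example can be illustrated with a short Python sketch that relates pump flow rate to flow velocity in a pipe, using velocity = volumetric flow / cross-sectional area. The pipe diameter, pump flow and target velocity below are hypothetical values chosen for illustration, not figures from this book, and a real sizing exercise would also check the pump curve against the system head.

```python
import math

def pipe_area(inner_diameter_m):
    """Cross-sectional area of a round pipe in square metres."""
    return math.pi * inner_diameter_m ** 2 / 4.0

def flow_velocity(flow_m3_per_h, inner_diameter_m):
    """Mean flow velocity in m/s for a given volumetric flow rate."""
    return (flow_m3_per_h / 3600.0) / pipe_area(inner_diameter_m)

def required_flow(target_velocity_m_s, inner_diameter_m):
    """Volumetric flow in m3/h needed to reach the target velocity."""
    return target_velocity_m_s * pipe_area(inner_diameter_m) * 3600.0

# Hypothetical inputs: 66 mm inner diameter line, existing pump delivering 15 m3/h,
# and an assumed target CIP velocity of 1.5 m/s
DIAMETER_M = 0.066
CURRENT_FLOW = 15.0
TARGET_VELOCITY = 1.5

current_velocity = flow_velocity(CURRENT_FLOW, DIAMETER_M)
needed_flow = required_flow(TARGET_VELOCITY, DIAMETER_M)

print(f"Current velocity: {current_velocity:.2f} m/s")
print(f"Flow needed for {TARGET_VELOCITY} m/s: {needed_flow:.1f} m3/h")
```

With these assumed inputs the existing pump falls short of the target velocity, which is the kind of gap a correctly sized pump would close.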
RACI for Equipment and Process Design:
Roles (table columns, left to right):
• Maintenance Technician, Operators, Planning Technician
• Manufacturing Technicians (specialized mechanic)
• Engineering/Materials Analyst
• Reliability Engineer
• Engineering Manager
Tasks and RACI codes (codes are listed in column order, left to right; empty cells are omitted):
• Identify opportunities for parts to be manufactured in-house: RA, IC, IC, R
• Develop and manufacture parts for the plant and manage the entire technical process (backlog
prioritized to manufacturing): IC, RA, IC
• R&D, laboratory analysis (mechanical, chemical, thermal, etc.), failure/wear analysis, selection of
materials and manufacturing methods according to project scope, technical project report, test
follow-up to validation: IC, RA
• Complete process management, analysis and selection of strategic backlog (inputs), treatment of
development outputs (outputs: PM review, etc.) and carry out Production Planning and Control
(Manufacturing P3M) to have monthly visibility of planned deliverables for the plant: IC, IC, RA, I
• Monitor the impact on the plant's KPIs (ZBB, working capital, etc.) and support the team regarding
the necessary resources: IC, IC, RA
Summary of Requirements for Extend Life
• Identify focus machines using maintenance performance indicators, structured downtime
analysis or strategic requirements from the business
• Design and implement the change
• Test the change and use component condition failure probability distribution models to evaluate
the effectiveness of the improvement
• Apply change management techniques to manage the change
4.15 Process to Update Maintenance Pillar Content

This section refers to the policy describing the processes required in order to update the different items
in all the Pillar books.
Additional information about the process to update maintenance pillar content is described in the
Management Pillar Book.
5. Implementation
This section of the book describes how the different sections and tools of this book will be installed
during VPO implementation.
5.1 Maintenance Pyramid

The maintenance pyramid is a core VPO tool used to map out the building blocks required on the way to
world-class maintenance. The pyramid builds methodically from the basic requirements through to the
advanced levels. Completion of the pyramid is supported by a set of questions used to ascertain the
status of each level. The questions are scored on a “traffic lights” system.
The exact time for each element is dependent on the status within each plant; some plant
implementation schedules will be much shorter. The VPO Maintenance Pillar implementation project
files provide an indication of the time it should take to implement each element from scratch.
5.2 Implementation Progress

The Implementation Progress (IP) is a tool to track the actual status of the implementation of the
elements described in this book as a percentage against the plan.
