Chapter 11
1. Introduction
According to the ESReDA Project Group proposal, the aim of this book is to provide
“a technical reference text which will document the current state-of-the-art”,
“emphasizing its practical application”.
The author of this chapter has experience both in control and in reliability
engineering applied to, mainly petrochemical, production processes and has
observed [1, 2] significant differences in engineering acceptance between these two,
from a theoretical point of view, rather similar disciplines. This contribution aims to
identify the reasons and practical stumbling blocks why inherently essential reliability
engineering theoretical principles are difficult to apply in practical situations. We will
investigate the needs and possibilities of maintenance engineers, the organisational
restrictions in applying RAMS¹ over the lifecycle of installations, the lack and
uncertainty of required reliability data and their dependency on usage factors. There
are clear signs that these observations are at variance with the assumptions in
stochastic OR models and the flexibility offered by commercial decision support
tools. Based on experience in process control, we will make a strong plea to use
physics-of-failure models that are better in line with the maintenance engineer's
mindset and may effectively use the continuous (process control type) information that in
many cases is already available.
Although restricted in scope and coverage this study clearly indicates that:
¹ RAMS: reliability, availability, maintainability and safety.
Maintenance Modelling and Applications
- Next to managing OEE, AM is also responsible for safety and health management
  with its related inspection activities.
- The value of AM is now clearly recognised at board level, with most CEOs being
  more sensitive towards safety than OEE. The level of related reporting is
  acceptable, making effective use of computerised maintenance management
  systems (CMMS), both as a dashboard and as a learning environment.
- Operational maintenance activities are mainly outsourced to specialised
  contractors. As a consequence, in-house craftsmanship, technical insight and
  experience are decreasing and the need for effective management techniques is
  increasing.
- Technical insight and craftsmanship are more strongly developed than knowledge of
  general management and reliability engineering theory. Mechanics have little
  insight into reliability engineering and AM strategy aspects.
- Respondents are of the opinion that sufficient education facilities exist at Master and
  Bachelor level, but that these are not integrated with company career development
  programmes.
- Only in mandatory cases are full-blown fault tree type process models and/or
  reliability block diagrams in use; the large majority of the respondents stay at
  the RCM level or at the level of a simple, single failure mode planned maintenance
  optimisation. All interviewees invariably indicate a lack of reliable failure data; a
  significant proportion are now actively using the CMMS for trending and root-cause
  analysis. With the exception of safety-related equipment, where such information
  is now critical in view of quantitative standards, OEMs² are rather reluctant even
  to provide MTTF data at equipment level. Weibull representation at a (critical)
  failure mode level is virtually absent.
Maintenance modelling only has practical value in decision support tools [4]; if
used effectively, such tools contribute significantly to overall company profit [5]. In practice,
however, engineers remain rather sceptical about the use of such techniques.
² OEM: Original Equipment Manufacturer.
³ An offshoot of Military Procedure MIL-P-1629, titled Procedures for Performing a Failure Mode,
Effects and Criticality Analysis, dated November 9, 1949.
⁴ HAZOP originated in 1963 in the Heavy Organic Chemicals Division of ICI UK.
Chapter 11: Maintenance Modelling as a Decision Support Tool
[Figure 1: Schematic overview of control layers; the top layer is optimising control (OPT), acting on timescales of hours to days]
Figure 1 gives a schematic overview of the control layers currently applied in
plants. At the lowest, regulatory level, simple PID⁵ controllers stabilise SISO⁶ process
variables such as flows, temperatures and pressures on timescales measured in seconds.
At the next higher level, MIMO⁷ and model predictive controllers steer the setpoints of
the regulatory controllers, relying on some form of dynamic process model, to handle
severe control loop interactions at a slower pace and/or to achieve desired
trajectories of process variables. Finally, for critical processes like a refinery catalytic
cracker, an overall, plant-wide model is used for economic optimisation. Often,
optimal plant operation is achieved when some of the manipulated and/or controlled
variables are near their limiting values (constraints). Therefore, the control structures
at the highest two levels will invariably have algorithms to detect, for instance, the
maximum heat load of an exchanger actually available.
⁵ Proportional, integral, derivative controller; the standard in industry.
⁶ Single input, single output.
⁷ Multi input, multi output (multivariate or multivariable).
The operators in the plant are well trained to understand the production process; the
way process variables affect output, quality and safety and get full information via
computerised control systems with advanced graphical user interfaces showing
historical trends on variables, deviations, alarms and calculated information on
effectiveness. The production department generates daily reports showing the added
value of their activities to the company.
                                    CONTROL ENGINEERING             RELIABILITY ENGINEERING
DRIVERS                             USERS                           “MASTERS”
THEORETICAL BASIS                   +++                             ++
USE OF DATA AND ICT                 +++                             +
INTERACTION / UNCERTAINTY ASPECTS   ++                              +/-
LIFECYCLE ASPECTS                   +                               ++
ACCEPTANCE OF MODELS                +++                             -
IDENTIFIABLE BENEFIT                +++                             ++
INTEGRATION                         +++                             +/-
CAREER OPPORTUNITIES                HIGH, LINKED WITH OPERATIONS    RESTRICTED, SPECIALISTS
Table 1
Figure 2 [4] shows the strategic, tactical and operational layers in Asset
Management (AM). The most decisive RAMS alternatives are fixed in the design
phase where the plant layout and the type and make of equipment are defined,
virtually for the lifetime of the installation.
The reliability engineer supports these activities with model studies; to what extent
and at what costs can “the right amount of product of the right quality, safely and
environmentally sound cost-effectively be produced at the right moment in time?” At
first (section 0), modelling necessarily will be crude, with low granularity and
assumed data.
In this way, the system model is slowly expanded in detail and system insight is
growing. Its predictive power over the long run, however, is strongly affected by
uncertainty in input data.
Design type models deal with average behaviour over the lifetime of the
installations; the operational models with time dependent behaviour mainly
during the intervals between major planned replacements, overhauls or plant
shutdowns. The dynamic criteria thus are quite different.
[Figure 2: Strategic, tactical and operational layers in Asset Management, linking scenario analysis, life cycle costs and performance management to the operational cycle of plan, schedule, execute, analyse and improve, up to abandon / demolish]
Figure 3 shows an example of a data sheet of a signalling lamp that (after some
pressure!) may be obtained from a manufacturer. A reliability engineer will be
pleased to see the familiar Weibull diagram, showing, in this case, a beta (β) of 3.6 and
an eta (η) of 3.3 years. Such diagrams are a result of the manufacturer's quality
assurance process, using accelerated testing where the operating voltage is higher
than the design value.
With the help of such information, one may investigate, for instance, the
effectiveness of a break-down versus block-replacement strategy of airport runway
illumination [4]. For these systems, great care is taken to ensure soft starting-up,
constant supply current and low vibration levels.
However, if one takes into account the physics of the illumination process, it
becomes clear that the lifetime of the lamp depends exponentially on filament
temperature or, correspondingly, on the metal vapour pressure:
This means that a 5% reduction in operating voltage will more than double the life
of the bulb, at the expense of reducing its light output by about 20%. This property is
used for “long-life” bulbs used in difficult-to-access locations (for example, traffic
lights or fixtures hung from high ceilings).
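The voltage sensitivity quoted above can be explored with the power-law rules of thumb often cited for incandescent lamps. The sketch below is illustrative only: the exponents (about 16 for life and about 3.4 for light output) are assumed textbook values, not data from this chapter, and the function names are hypothetical.

```python
# Rule-of-thumb rescaling of incandescent lamp life and light output with voltage.
# ASSUMPTION: the power-law exponents are illustrative textbook values,
# not figures taken from this chapter.
LIFE_EXPONENT = 16.0   # assumed: life ~ (V / V_design) ** -16
LIGHT_EXPONENT = 3.4   # assumed: light output ~ (V / V_design) ** 3.4


def life_factor(voltage_ratio: float) -> float:
    """Relative lamp life at voltage_ratio = V / V_design."""
    return voltage_ratio ** (-LIFE_EXPONENT)


def light_factor(voltage_ratio: float) -> float:
    """Relative light output at voltage_ratio = V / V_design."""
    return voltage_ratio ** LIGHT_EXPONENT


if __name__ == "__main__":
    ratio = 0.95  # a 5% reduction in operating voltage
    print(f"life x{life_factor(ratio):.2f}, light x{light_factor(ratio):.2f}")
```

With these assumed exponents, a 5% voltage reduction gives somewhat more than twice the life and a loss of light output of roughly 15 to 20%, the same order as the figures quoted in the text.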
Timmer [18] analysed the lifetime behaviour for various applied voltages (3.25 – 4.75
volts) in Weibull terms and shows the following information:
Table 2
The Weibull analysis in Table 2 indicates a poor correlation for data set #2, but the
dependency of the parameters η and β upon the applied voltage is quite pronounced.
Hesen et al. [19] report on the mandatory conformance of the Dutch low-voltage
power supply to NEN-EN 50160. For low-voltage distribution, the supply voltage
should remain at 230 V +/- 10% for 90% of the time during a weekly interval. The
voltage fluctuations allowed by this norm indicate that the lifetime of an
incandescent lamp will show a larger variation in practice than that indicated by
laboratory tests.
From the above example, we have to conclude that the reliability engineer has
to treat such apparently accurate manufacturer information with great care. In
this case, for optimisation results to be reliable, measures such as voltage or
power regulation have to be applied. Apart from the above influence of applied
power, other factors like mechanical shocks and fouling will also affect the
lifetime.
With the introduction of risk-based norms like IEC 61508 and 61511, manufacturers
of electronic equipment now specify the MTTF of their various products as a
standard selection criterion.
The basic failure rates for electronic components are usually taken from
MIL-HDBK-217 (although this source has not been updated since 1994!), Bellcore
TR-332, or some other (e.g. in-house) reference.
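The Part Stress technique adapts basic failure rates to those applicable in an
analysis:
$$\lambda_i = \lambda_b \cdot \pi_T \cdot \pi_S \cdot \pi_P \cdot \pi_Q \cdot \pi_E$$
where
λi = failure rate of part i as applied in the analysis
λb = basic failure rate
πT = operating temperature factor
πS = secondary stress level factor (e.g. vibrations, shock, etc.)
πP = power factor
πQ = quality factor, degree of manufacturing control
πE = environmental factor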
With these component failure rates, system characteristics like MTTF are calculated
via the Parts Count method or some form of model (fault tree, reliability block
diagram, Petri net, ...). In other, still infrequent, cases highly accelerated life
testing (HALT) and highly accelerated stress screening (HASS) are being used.
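The two steps above, adapting a base failure rate with multiplicative factors and then combining part rates into a system MTTF, can be sketched as follows. This is a minimal illustration, assuming constant failure rates and a pure series system (any part failure fails the system); the factor values and function names are hypothetical and are not taken from MIL-HDBK-217.

```python
from math import prod


def part_stress_rate(base_rate: float, pi_factors: dict) -> float:
    """Adapted failure rate: base rate times the product of all pi factors."""
    return base_rate * prod(pi_factors.values())


def series_mttf(rates: list) -> float:
    """MTTF of a series system with constant failure rates: 1 / sum of rates.

    ASSUMPTION: exponential lifetimes and no redundancy.
    """
    return 1.0 / sum(rates)


if __name__ == "__main__":
    # Hypothetical factor values, for illustration only.
    factors = {"T": 2.0, "S": 1.0, "P": 1.5, "Q": 1.0, "E": 4.0}
    rate = part_stress_rate(2e-6, factors)   # failures per hour
    print(f"adapted part rate: {rate:.2e} per hour")
    print(f"system MTTF for 10 such parts: {series_mttf([rate] * 10):.0f} h")
```

Because the factors multiply, a modest error in each of them compounds quickly, which is one way to read the large prediction errors discussed in this section.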
Figure 4 shows the field data of nine vendors of the “same” radio set versus the
calculated data they presented to the Army procurement. One easily observes that,
for the majority of the suppliers, the observed MTBF was nowhere near their prediction.
[Figure 4: Field data (h) versus MIL-HDBK-217 prediction (h) for nine vendors; both axes from 0 to 8000 h]
“In general, models from these sources have not proven credible when predicting
reliability quantitatively. Studies show that failure rates predicted by the above
mentioned procedures can differ by over two orders of magnitude. However, if used
in their proper perspective, these empirical models can usefully compare the
reliability issues of two approaches to the same design.”
(worst case) given by the suppliers. The observed data were analysed to yield a 70%
chi-squared PFD confidence interval.
Table 3 shows that the calculated values and those observed in
practice differ by roughly a factor of ten, equivalent to one SIL step.
For the sake of this example, a catastrophic failure will occur if any of the following happens:
- The insulation resistance of the windings of the E-motor fails due to overheating
  (hot spots).
- The bearings of the E-motor show excessive vibration requiring shut-down.
- The connecting shaft or elastic coupling fails due to shear or fatigue loading.
- The impeller of the pump fails due to mechanical deformation / abrasion.
- The bearings of the pump fail.
- The pump seals fail, inducing unacceptable leakage.
- The control system fails, either due to hardware or software problems.
The question now is: can we build a reliable and precise stochastic model of such a
system such that we can analyse the effects of “bought-in” reliability (investment in
better quality) versus that of maintenance strategies, either to ensure production
(risk) or to optimise costs (life cycle engineering)?
Bearing life is determined by the number of hours it takes for the metal to
"fatigue", which is a function of the load on the bearing, the number of rotations and
the amount of lubrication that the bearing receives. The reliability of (roller) bearings
is generally expressed by the L10 life: the interval in which 10% of the bearings under
specific test conditions have failed (Lundberg-Palmgren [23]):
$$L_{10} = a \cdot \left(\frac{C}{P}\right)^{\exp} \cdot \frac{B}{n}$$
where:
The radial rating of the bearing C depends strongly on the type and homogeneity of
the bearing material and on the greasing effectiveness. The designer selects a
bearing depending on the required L10 life, the expected load and the dimensional
possibilities. Note that, according to the above equation:
- The relationship is based on models, the parameters of which (C, P) are rather
  uncertain. The basic load rating stated in the supplier's catalogue is subject to
  manufacturing quality; the design load is an average value over expected
  operational use.
- A change in (radial) load by a factor of two changes the L10 life by a factor of
  8 to 10.
- Fatigue is physically related to a number of loading cycles, in the case of the bearing
  represented by revolutions where the load passes from the inner race through the
  balls (or rollers) to the outer race. The conversion from rotations per unit of time to
  a lifetime expressed in years requires a specification of the usage factor.
- In this simple form, the constant a embraces all “adjustment factors”. For
  instance, the viscosity of the bearing oil versus the reference value has a
  quadratic influence [24].
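The load sensitivity noted above follows directly from the exponent in the life equation. The sketch below assumes the common catalogue form, with life in millions of revolutions equal to a·(C/P)^p, p = 3 for ball bearings and p = 10/3 for roller bearings, converted to hours at a given shaft speed; these exponents, the conversion constant and the example numbers are assumptions for illustration, not values given in this chapter.

```python
def l10_hours(c_rating: float, load: float, rpm: float,
              exponent: float = 3.0, a: float = 1.0) -> float:
    """L10 life in operating hours.

    ASSUMPTION: life in millions of revolutions is a * (C / P) ** exponent,
    with exponent 3 for ball bearings and 10/3 for roller bearings.
    """
    million_revolutions = a * (c_rating / load) ** exponent
    return million_revolutions * 1e6 / (60.0 * rpm)


if __name__ == "__main__":
    c, p, n = 30_000.0, 3_000.0, 1_500.0   # hypothetical rating (N), load (N), speed (rpm)
    for name, exp_ in (("ball", 3.0), ("roller", 10.0 / 3.0)):
        base = l10_hours(c, p, n, exp_)
        doubled = l10_hours(c, 2.0 * p, n, exp_)
        print(f"{name}: L10 = {base:.0f} h, load x2 reduces life by x{base / doubled:.1f}")
```

Doubling the load reduces the computed life by a factor of 8 for the ball-bearing exponent and about 10 for the roller-bearing exponent, which matches the range quoted in the list above.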
Whereas rotating equipment designers use the L10 value as a design parameter,
reliability engineers need failure distribution type information like Weibull parameters
to evaluate system and maintenance characteristics. Bearing suppliers like SKF⁸ do
not provide Weibull values.
⁸ Even with the support of the Dutch Business Development Manager of SKF, the company did not respond
to questions on this point.
Few studies are available with data sets large enough to make Weibull analysis
feasible. Lieblein & Zelen [25] describe endurance tests to failure carried out over a period of
many years by four major ball bearing manufacturers,
properly taking into account (censored) lifetimes that exceeded the test period. The
report contains 213 data values for the Weibull slope β. The authors observe
a large spread, from a minimum value of 0.54 to a maximum value of 4.44! The
statistical average is about 1.4. Further refinements were made by Harris [26], taking
into account the existence of a fatigue limit stress: if an operating bearing
experiences stresses that do not exceed the limit stress, the bearing can achieve
infinite life. Rotating equipment engineers now commonly assume that the MTTF of a
bearing is equal to 5 times the L10 value, which is in line with a beta (β) slope of 1.4.
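The link between the MTTF-to-L10 ratio and the Weibull slope can be checked numerically. The sketch below assumes a two-parameter Weibull distribution, for which MTTF = η·Γ(1 + 1/β) and L10 = η·(−ln 0.9)^(1/β), so that the ratio depends on β only; the function name is hypothetical.

```python
from math import gamma, log


def mttf_over_l10(beta: float) -> float:
    """Ratio MTTF / L10 for a two-parameter Weibull distribution (eta cancels)."""
    mttf_over_eta = gamma(1.0 + 1.0 / beta)
    l10_over_eta = (-log(0.9)) ** (1.0 / beta)
    return mttf_over_eta / l10_over_eta


if __name__ == "__main__":
    for beta in (1.0, 1.4, 2.0, 4.0):
        print(f"beta = {beta}: MTTF / L10 = {mttf_over_l10(beta):.2f}")
```

Under this assumption a slope of 1.4 gives a ratio of roughly 4.5, of the same order as the factor of 5 used in practice.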
As crude as it may appear, the Lundberg-Palmgren equation provides insight into the
consequences of frequently mentioned reasons for early bearing failure:
With such sensitivities to operational conditions, the scatter in the Lieblein & Zelen
data is understandable.
When the bearing is installed in a pump, other factors that influence reliability will
show up. A (centrifugal) pump converts rotational energy from the driving motor into
hydraulic pressure. As with all physical processes, this conversion has an efficiency of less
than 100%, which is a function of the operating point on the volumetric displacement
versus head (Q-H) curve (Figure 5).
If a pump is operated away from the best efficiency point (BEP), the imperfect
hydrodynamic regime (internal recirculation within the pump) creates a number of
undesired phenomena, such as increased vibration, temperature rise, shaft deflection
and a reduced NPSHA⁹ margin. Local pressure pulsations and subsequent radial
shaft deflection [28] will dynamically increase the load on the bearings, thereby
affecting their life.
The impeller is another critical element in the pumping system with failure
characteristics that are difficult to define and estimate. Erosion is one kind of wear,
depending on solids loading, fluid velocity and impeller material characteristics.
Cavitation, the “process of formation and disappearance of the vapour phase of a
liquid when it is subjected to reduced and subsequently increased pressures at
constant ambient temperatures” is the most serious reason for impeller failure [29].
Cavitation will lead to serious erosion (Figure 6) if the cavitation bubbles implode
near the surface of the vanes and their energy exceeds the resistance of the impeller
material.
Industrial pumps invariably show some form of cavitation in their range of operation,
quote: “with effects, which can be tolerated (millions of field pumps with satisfactory
and long services) or may cause problems (hundreds or thousands of field pump
troubles clearly diagnosed with cavitation as root cause (high speed, wide range of
operation))” [29]. Empirical relationships now exist between cavitation intensity
(bubble length) and erosion rate, containing such parameters as NPSH, saturation
pressure and gas loading. Together with rules for impeller design, visualisation
experiments and CFD calculations, “it is possible to assess the impeller life
expectancy, with a good probability of success”. Obviously, such studies are not
standard parts of design and maintenance, and reliable data on impeller life
characteristics thus are hard to obtain.
The characteristics of seal failure are to some extent comparable with those of
bearings. The seal face is designed as a sacrificial element that may wear out.
However, this accounts for only some 10% of seal system failures [30]. In the
majority of cases [31], the seal will fail due to wrong operation (40%), wrong
installation (24%), bad application (19%) and other (17%) reasons. The seal faces
are easily affected by lack of control of barrier fluid pressure, which shows up in
ill-designed or ill-controlled seal systems, especially during pump starts [32]. A Weibull
failure analysis will then reveal an almost purely random character [4].
⁹ Net positive suction head available.
Electrical motors nowadays are quite reliable; they mainly fail due to bad electric
connections, bearing problems and winding insulation problems (hot spots).
Windings have rather well-defined failure characteristics; Bloch and Geitner [33]
quote a value of 4.3 for beta (β) and an eta (η) of 18.5 years, the latter now being
higher than at the time of publication (1990) due to better-quality insulation materials. However, “Every
increase of 10 degrees Centigrade of a motor's windings above its design operating
temperature cuts the life of the motor's winding insulation by 50 percent, even if the
overheating was only temporary” [34]. The current drive for high efficiency (among other things, with
smaller air gaps in the design) makes winding life more susceptible to changes in
ambient temperature and, especially, to fouling which, in turn, leads to bad cooling.
Higher motor temperatures in turn affect bearing temperature and thus lifetime, and
lead to premature degradation of the grease, requiring shorter re-greasing intervals.
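The 10-degree rule quoted above translates into a simple halving law. The sketch below applies it to the characteristic life quoted for windings; interpolating the rule continuously between 10-degree steps and applying it directly to η are simplifying assumptions, and the function name is hypothetical.

```python
def insulation_life_factor(delta_t_celsius: float) -> float:
    """Relative winding insulation life when running delta_t above design temperature.

    ASSUMPTION: the "halving per 10 degrees" rule is interpolated continuously.
    """
    return 0.5 ** (delta_t_celsius / 10.0)


if __name__ == "__main__":
    eta_design_years = 18.5  # characteristic life quoted in the text
    for delta_t in (0.0, 5.0, 10.0, 20.0):
        eta = eta_design_years * insulation_life_factor(delta_t)
        print(f"+{delta_t:>4.1f} degC: eta is about {eta:.1f} years")
```

A sustained excess of 20 degrees thus leaves only a quarter of the design life, which illustrates how strongly fouling and poor cooling can affect quoted Weibull parameters.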
Another aspect of bearing failure in E-motors is associated with stray electric current:
“The risk for damage to bearings can occur when transient voltages exceed 0.5 V or
the electric current flow is more than 0.1 A/mm² (related to the contact area of rolling
elements).”¹⁰ If no special electrically insulated bearings are used, the useful life may
be restricted to 1-2 years.
- Although reported data on overall pump reliability may well be used in risk
  studies (“what is the probability that the pump under normal conditions of
  maintenance and operation will deliver x m³/h on demand”), we should
  realise that the underlying field data are strongly influenced by operational
  and maintenance procedures.
- Pump reliability is based on the failure behaviour of a number of critical
  components; we showed the physical processes leading to failure to be
  rather ill-defined and susceptible to operational and maintenance regimes.
  Databases at best give the proportion of these failures in the overall figure;
  in-house CMMSs have only recently acquired scope for recording at this level.
- Even for well-specified components like bearings, where designers make a
  selection based on vendor-specified criteria, identical components in
  different operational conditions (in a pump, in an electric motor) will show
  different failure rates.
- The required information for maintenance optimisation models in terms of
  Weibull parameters thus necessarily depends on engineering insight, and
  the values are bound to be uncertain and imprecise.
¹⁰ http://evolution.skf.com/zino.aspx?articleID=79&lan=en-gb
[Figure 7: Project phases from concept development to realisation, including the final investment decision]
In the petrochemical industry one may distinguish [35] five project phases from
concept development to realisation (Figure 7). The first two or three steps are usually
mainly executed by the process owner, keeping commercial information on market
penetration / expansion in-house.
In the Scouting phase different process options are generated and the most
promising one is selected based on the outcome of technical and economic
evaluations. In the Basis Of Design (BOD) phase, the selected option is developed in
more detail, resulting in a description of the intended facilities, which is sufficiently
detailed to produce a +/- 30% estimate of the investment required to develop and
build the installation. This yields another decision point to go forward or abandon the
project. The Basic Design Package (BDP) allows for a +/- 20% capital cost estimate
and is sufficiently detailed to hand over the project to a contractor to produce a
project specification.
In the Project Specification (PS) phase, the selected contractor develops the BDP
into a design specification with a +/- 10% cost estimate. This serves as the final
information on which the future owner decides on freeing up capital to build the
installation; the final investment decision (FID). Contractors subsequently develop
the PS into a detailed design, procure equipment and materials, and construct the
installation. With a commissioning test the contractor will hand over the installation to
the process owner.
[Figure 8: Degree of freedom (100% to 0%) across the project phases Scouting, BOD/BDP, PS and Implementation]
Throughout this process (Figure 8) more and more degrees of freedom become
frozen whilst capital expenditure increases sharply in the last phases. Note that
money spent before the FID is taken is considered ‘risk money’, in that the
investment will be lost if the FID is negative. It is common practice that one in five to
ten basic designs does not lead to a positive FID. Senior management thus may be
reluctant to risk money up-front and, although it is of utmost importance to start early,
RAMS optimisation has to be effected with limited costs. Most decisions on plant
layout and redundancy (significantly affecting future OEE) will have to be taken without
specific knowledge of the type and brand of plant equipment. The final selection of the
latter will take place later on in the project specification phase and will not only
depend on RAMS characteristics but also on the local situation: the availability of
OEM support and spare parts, the relation with already existing processes, the skill
level and labour costs of craftsmen, etcetera. Hence, RAMS aspects will start as a
broad-brush approach, becoming more elaborate as more investment capital is
assigned and the design is further specified.
In the Scouting phase different process options are generated and the most
promising system will be selected on the basis of technical and economic
evaluations. The design staff will base their decisions on assumptions with respect to
the performance of the new installation; for instance, they may intend to go for
‘pacesetting performance’. Such a goal will broadly be described in the form of an overall
failure propensity (virtually no stops between shutdowns, minimal maintenance
costs, maximum x hours turn-around workload). Although rather vague, these
assumptions determine the economic and technical evaluation at the end of the
Scouting phase, and with that the decision whether or not to progress into the BOD
phase. There, assumptions will increasingly take on the character of requirements that
have to be realistic, and for which the (economic, safety, environmental)
consequences have to be substantiated as far as possible. There is a danger in using
only benchmarking data for the type of unit / equipment under consideration. Local
factors like climate, labour situation and logistics may significantly affect potential
RAMS goals.
At the end of this phase the design staff will have insight on:
In the Basis Of Design (BOD) phase, the selected option is developed in more detail,
resulting in a description of the intended facilities, which is sufficiently detailed to
produce a +/- 30% estimate of the investment required to develop and build the
installation. The BOD will contain a list of requirements which the design must fulfil in
order for the project to be a candidate for further study, such as annual capacity and
stream days.
A problem with these first two steps is that, in spite of the lack of detail, the concept
of a new installation gradually takes shape and usually is difficult to change later on.
Given the competition between new projects to get the FID and the high net present
value discounting rates used (10 – 15%), there is a strong pressure on capital costs.
Operational aspects like (fluctuating) marketing demand, plant capacity / flexibility,
product availability, maintenance then easily get less attention.
The Basic Design Package may be regarded as an extension of the BOD with a goal
to arrive at a +/- 20% capital cost estimate. The resulting documents will allow
handover of goals and means for contractors to produce a project specification.
Aspects like plant lay-out and line-up, materials selection and sparing of (rotating)
equipment now have to be analysed.
The engineering decisions in this phase directly affect the future RAMS aspects and,
given the hand-over to external contractors, now become almost fixed. At present,
these decisions are based mainly on experience and insight, but the lack of
quantification easily leads to sub-optimal designs, especially with today's highly
integrated installations with restricted redundancy and buffering (to reduce
capital investment).
Since the process flow scheme will now be available, this is the stage to start basic
RAMS modelling at a proper level of detail. No detailed selection of equipment can
yet be made; the design staff thus will have to deal with generic models. The
company experience in terms of planned maintenance and the still remaining
observed equipment failures within PM periods is now of great value. Industry-specific
databases like OREDA [36] in the oil industry and EIReDA [37] in the power
industry may also provide benchmark data. The availability block diagram of the new
installation will now show “typical” equipment blocks with a few failure modes that
describe the availability restriction due to planned maintenance (total operating time
minus PM) and the random availability characteristics in between.
In the Project Specification (PS) phase, the selected contractor develops the BDP
into a specification which, together with a +/- 10% cost estimate, is used as a basis
for the Final Investment Decision (FID). In order to arrive at such rather precise cost
estimates, a more detailed equipment description is required. Options are still open:
will the plant be laid out to allow replacement at system, skid or component level? To
what extent is redundancy cost-effective? What service contracts are available locally
for key equipment? How does this work out in terms of local staff level,
workshops, hoisting, lay-down areas, …? Gradually the effective (major) equipment
availability characteristics will now take shape, either by engineering sense or by
(simplified) quantitative availability modelling. These activities will form the basis for a
maintenance reference plan.
Equipment manufacturers play a limited role only; they only become involved
in the design process at a late stage in a commercial, rather than a technical
context. In cases where they possess actual operational RAMS information
this is of a commercially confidential nature in view of marketing positioning
and future servicing support.
These observations are rather common. In (naval) ship building [38] RAMS aspects
are just being introduced. Whereas the ship's propulsion system is of vital interest,
quantification and substantiation of (RAMS, LCC) characteristics of supplier products
is lacking to a great extent. OEMs only receive information during the warranty
period and, if they are awarded a service contract, during major overhauls. Normally,
they lack information on the way their products have been used in the periods in
between. Information exchange between the OEM and the yard therefore is carefully
handled in the purchasing phase for liability reasons and afterwards regarded as
commercially valuable in-house data.
The tight financial structure required for construction of commercial vessels, together
with the decisive role of classifying agencies freezes the degrees of freedom in
design at a very early stage. In cases where tax rules favour short term operational
periods (after which the vessel is sold to a third party), lifecycle costing receives
minor interest.
In the utility sector risk studies are obligatory for objects like tunnels, bridges and
waterworks. The Dutch Ministry of Transport, Public Works and Water Management
now tries to extend these studies towards validation with maintenance strategies and
their results [39]. Public-private partnerships now become the rule for large projects
where the contractor is obliged to provide a public service or project and to assume
substantial financial, technical and operational risk over timescales from 10 – 30
years. Here, we observe a growing interest in RAMS / LCC aspects.
Most textbooks and (scientific) applications hardly touch on the industrial and data
problems sketched above. Mathematically oriented textbooks like Barlow and
Proschan [8] or Lewis [40] pass over all details, as do almost all OR-type papers. More
practically oriented textbooks touch upon the problem, but mainly in broad-brush
qualitative form, for example:
Bob Abernethy [41] (selling more than 20000 copies of this book) advocates
Weibull analysis for its “ability to provide reasonably accurate failure analysis and
failure forecast with extremely small samples (2-3 points!)”, correctly deals with
uncertainty and confidence intervals, but “does not recommend” the latter “for use
ESReDA, as the European expert group in this area, has devoted considerable
attention to the data subject with various working groups and seminars such as:
Data quality problems were clearly addressed in two handbooks [46], [47], but mainly
covering the problems of operational feedback data (times to failure) collection,
validation, storage, analysis and retrieval.
The above suggests that the reliability engineering community is aware of the
data quality problems that lead to epistemic uncertainty in Weibull parameters
and, in turn, in results of decision support models. However, it appears that
these limitations are not openly expressed.
Example 1
[Plot: total costs, PM costs and failure costs (0 to 10000) versus PM interval (0 to 20 y)]
Figure 9. PM optimisation
Let us consider a plant that produces 100 000 tons of product per year with a margin
of € 75 per ton. The economic lifetime of the plant is 20 years. Every four years a
statutory shutdown provides room for executing planned maintenance on critical
equipment.
Suppose now that we are dealing with a critical component with one failure mode
only. In the MRP it was assumed that the MTTF was 7 years with appreciable wear-out
characteristics. The team took it for granted that the component could be
replaced safely (footnote 11) during shutdowns.
Later analysis showed that the failure process could be well represented by a
Weibull distribution with η = 8 years and β = 4 (footnote 12). If the component fails in
between statutory stops, the process will be down for 3 days, causing a loss of margin
of some € 62 000.
11 Engineers frequently take the MTTF value as a kind of expression of useful life!
12 Obviously, this is a simplified problem with an unrealistically high beta value for
didactical reasons; in practice, betas even at component level will be less than 2 to 3.
Repair takes 50 man-hours @ €60 / h together with € 5000 on materials. Hence, the
total cost of failure equals € 70 000. Replacement during a statutory shut-down does
not lead to additional production loss such that total costs are 40 man-hours @ € 60
= € 2400 and € 5000 materials; in total € 7400.
Figure 9 shows the traditional graph, indicating that planned replacement every 3-4
years yields total maintenance costs that are about 30% of those of a run-down strategy.
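The 30% figure can be checked with a short calculation. The sketch below is an illustration only: it assumes the classical age-replacement cost rate (expected cost per renewal cycle divided by expected cycle length), which the chapter does not state explicitly, and uses the Weibull parameters and cost figures of Example 1. All function names are hypothetical.

```python
import math

ETA, BETA = 8.0, 4.0          # Weibull scale (years) and shape from Example 1
COST_FAILURE = 70_000.0       # EUR per unplanned failure (lost margin + repair)
COST_PLANNED = 7_400.0        # EUR per planned replacement during a shutdown


def reliability(t, eta=ETA, beta=BETA):
    """Weibull survival function R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def expected_cycle_length(T, eta=ETA, beta=BETA, steps=2000):
    """Integral of R(t) from 0 to T by the trapezoidal rule."""
    h = T / steps
    total = 0.5 * (reliability(0.0, eta, beta) + reliability(T, eta, beta))
    total += sum(reliability(i * h, eta, beta) for i in range(1, steps))
    return total * h


def annual_cost(T, eta=ETA, beta=BETA):
    """Assumed model: long-run cost per year when replacing at age T or at failure."""
    r = reliability(T, eta, beta)
    cycle_cost = COST_PLANNED * r + COST_FAILURE * (1.0 - r)
    return cycle_cost / expected_cycle_length(T, eta, beta)


def run_to_failure_cost(eta=ETA, beta=BETA):
    """Cost per year without planned replacement: failure cost / MTTF."""
    mttf = eta * math.gamma(1.0 + 1.0 / beta)
    return COST_FAILURE / mttf


if __name__ == "__main__":
    baseline = run_to_failure_cost()
    for T in (2, 3, 4, 5, 6):
        c = annual_cost(T)
        print(f"T = {T} y: {c:8.0f} EUR/y, {100 * c / baseline:5.1f}% of run-to-failure")
```

Under these assumptions the cost rate for replacement at 3 to 4 years comes out at roughly 30% of the run-to-failure cost rate, in line with the statement above.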
reached without having to carry out corrective maintenance in between stops; with a
15% chance we will meet one unplanned intervention, with a 1% chance two, etcetera.
This leads to the costs per 5 cycles as depicted in Figure 11: with 84% probability 5 *
7400 = 37000, and so on.
For the timing of the planned interventions, the maintenance manager should take
into account the (epistemic) uncertainty in the Weibull parameters, information on
which is normally lacking. To get an impression, we generated ten series of ten
realisations from a Weibull process with “known” parameters and then estimated these
parameters with the maximum likelihood method. This may be compared with the
observations to be expected from ten components of identical make used under identical
conditions; a hypothetical situation that is favourable from a reliability engineering
point of view compared with the actual situation, where both manufacturing variability
and operational conditions will play a role. This exercise teaches us that for critical
components the accuracy of the Weibull parameters is quite restricted. We may safely
assume that the (epistemic) uncertainty will be some +/- 20%.
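The exercise described above can be repeated with a few lines of code. The following is a minimal sketch, not the author's implementation: it assumes complete (uncensored) failure times, uses η = 8 and β = 4 from Example 1 as the “known” parameters, and solves the shape equation of the Weibull likelihood by bisection within an assumed bracket. The seed and function names are hypothetical.

```python
import math
import random


def weibull_mle(times):
    """Return (eta_hat, beta_hat) for complete (uncensored) failure times."""
    n = len(times)
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / n

    def score(beta):
        # Likelihood equation for the shape parameter; increasing in beta.
        powered = [t ** beta for t in times]
        weighted = sum(p * l for p, l in zip(powered, logs))
        return weighted / sum(powered) - 1.0 / beta - mean_log

    lo, hi = 0.05, 50.0            # assumed search bracket for beta
    for _ in range(100):           # bisection on the shape equation
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta_hat = 0.5 * (lo + hi)
    eta_hat = (sum(t ** beta_hat for t in times) / n) ** (1.0 / beta_hat)
    return eta_hat, beta_hat


def repeat_experiment(eta=8.0, beta=4.0, series=10, size=10, seed=1):
    """Fit `series` independent samples of `size` failure times each."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(series):
        # random.weibullvariate takes (scale, shape) in that order.
        sample = [rng.weibullvariate(eta, beta) for _ in range(size)]
        estimates.append(weibull_mle(sample))
    return estimates


if __name__ == "__main__":
    for eta_hat, beta_hat in repeat_experiment():
        print(f"eta = {eta_hat:5.2f}, beta = {beta_hat:5.2f}")
```

Running the experiment with different seeds gives a feel for the spread in the estimates that only ten observations per series can support.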
Most of the standard software packages for maintenance optimisations of this type
have no facilities to account for this epistemic uncertainty. Although theory teaches
us that the optimum will rely on the marginal costs between planned and corrective
maintenance at a specific fraction of the lifetime (being a function of Weibull η and
β), the output will be visualised in calendar time. Figure 12 shows that this easily
may lead to a misconception. If one takes into account the parameter uncertainty,
the optimum replacement interval will lie between 2.9 and 4.3 years. Realising that
the (future) cost figures are also uncertain, a realistic advice is that replacement can
safely be carried out somewhere between 2 and 6 years (the dotted rectangle in
Figure 12)! Obviously, in a more realistic situation dealing with multiple components
to be replaced at the same time, this range may be even larger.
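The spread in the optimum can be reproduced approximately with a simple grid search. This sketch again assumes the age-replacement cost rate (an assumption, since the chapter does not name the underlying model) and the cost figures of Example 1; the five parameter sets are those in the legend of Figure 12.

```python
import math

COST_FAILURE, COST_PLANNED = 70_000.0, 7_400.0   # EUR, from Example 1


def annual_cost(T, eta, beta, steps=200):
    """Age-replacement cost rate (assumed model) for replacement age T."""
    h = T / steps
    surv = [math.exp(-((i * h / eta) ** beta)) for i in range(steps + 1)]
    mean_cycle = h * (sum(surv) - 0.5 * (surv[0] + surv[-1]))   # trapezoidal rule
    cycle_cost = COST_PLANNED * surv[-1] + COST_FAILURE * (1.0 - surv[-1])
    return cycle_cost / mean_cycle


def optimal_interval(eta, beta):
    """Grid search over 1.00 to 12.00 years in steps of 0.01 year."""
    grid = [k / 100.0 for k in range(100, 1201)]
    return min(grid, key=lambda T: annual_cost(T, eta, beta))


# Nominal case and +/- 20% variations of eta and beta (Figure 12 legend).
cases = [(8.0, 4.0), (9.6, 4.0), (6.4, 4.0), (8.0, 3.2), (8.0, 4.8)]
optima = {case: optimal_interval(*case) for case in cases}
for (eta, beta), T in optima.items():
    print(f"eta = {eta}, beta = {beta}: optimum near {T:.2f} y")
```

In this sketch the extremes of the optimum are driven by the variation in η, since the optimal age scales linearly with the Weibull scale parameter.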
[Plot: overall annual costs (€) versus replacement interval (0 to 14 y) for η = 8, β = 4;
η = 9.6, β = 4; η = 6.4, β = 4; η = 8, β = 3.2; η = 8, β = 4.8]
Figure 12. Sensitivity of optimal replacement interval to +/- 20% spread in Weibull parameters
Example 2
Figure 13. From IEC 61511: Process Risk to Tolerable Risk Target
Many industries nowadays are faced with the consequences of risk-based safety
systems in line with IEC 61508 and IEC 61511, where the inherent risk of a hazardous
system unit (the equipment under control, EUC) has to be reduced by a safeguarding
system to a tolerable level accepted by authorities (Figure 13). To this end, a safety
instrumented system (SIS) will monitor a critical value (pressure, level, temperature,
...) and act (open a valve, apply cooling, ...) to reduce the consequences of
exceeding a limit. Hence, a SIS loop will at least contain a sensor, a (logic) controller
and an actuator.
[Figure 14: PFD from the SIL calculation versus time (0 to 24 months)]
Let the required SIL level of the system unit be 2; a reduction factor of 100. The
norm then prescribes the probability of failure on demand of the serial system to be
between 10⁻³ and 10⁻². Let the MTTF of all three elements be 150 y. In order to
comply with the IEC rules (footnote 13), the system has to be tested at such a test
interval Ti that the resulting probability of failure on demand (PFD) is in the bracket
10⁻³ to 10⁻²:
$$\mathrm{PFD} = \tfrac{1}{2}\,\lambda_{series}\,T_i$$
Figure 14 then shows that the system should be tested every 12 months. Most
maintenance engineers will accept this value uncritically, although those trained
in reliability engineering principles may have some hesitation:
1. The PFD value thus calculated is an average value over the time interval Ti.
2. Testing, especially if intrusive actions need to be taken, carries the potential of
maintenance induced failures; a mechanic may overlook or induce a fault, upon
which the system will remain unavailable until the next test.
13 For the sake of simplicity we assume here that these lambdas are precise and of the
hidden unsafe type, and leave out further details like nuisance failures, diagnostic
coverage, common mode effects, etcetera.
[Figure 15: time dependent PFD compared with the norm value, versus time (0 to 24 months)]
Time dependent behaviour is not addressed in the norms but has a significant
influence [48]. As Figure 15 shows, the choice of a test interval of 12 months implies
that the system will operate 50% of the time at SIL 1 rather than SIL 2 level.
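The 50% figure follows directly from the numbers of the example. The sketch below assumes, as in footnote 13, constant and hidden unsafe failure rates, and models the instantaneous PFD since the last test as 1 - exp(-λt); the function names are hypothetical.

```python
import math

MTTF_YEARS = 150.0                   # per element, from the example
LAMBDA_SERIES = 3.0 / MTTF_YEARS     # sensor + logic solver + actuator, per year
SIL2_UPPER = 1.0e-2                  # upper PFD bound of the SIL 2 band


def pfd_average(test_interval_years):
    """Average PFD over one test interval: 1/2 * lambda * Ti."""
    return 0.5 * LAMBDA_SERIES * test_interval_years


def pfd_at(time_since_test_years):
    """Instantaneous PFD for a hidden failure with constant rate."""
    return 1.0 - math.exp(-LAMBDA_SERIES * time_since_test_years)


def fraction_outside_sil2(test_interval_years, steps=10_000):
    """Share of the test interval during which PFD(t) exceeds 1e-2."""
    above = 0
    for i in range(steps):
        t = (i + 0.5) * test_interval_years / steps
        if pfd_at(t) > SIL2_UPPER:
            above += 1
    return above / steps


if __name__ == "__main__":
    print(f"average PFD at Ti = 1 y: {pfd_average(1.0):.4f}")
    print(f"fraction of time above 1e-2: {fraction_outside_sil2(1.0):.2f}")
```

With a 12 month interval the average PFD sits exactly on the SIL 2 / SIL 1 boundary, so roughly the second half of every interval is spent above it.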
The influence of human errors is clearly mentioned in the norms and in even more
detail in application papers like [49], but in qualitative terms only. It is of great
value to realise here that these activities are frequently carried out by the same
mechanics who are proud to “quickly solve a problem” with their craftsmanship and
hands-on tools. Inspection of highly reliable items clashes with this mindset; he / she
may easily find it mentally hard to keep checking items “that never fail”. Especially if
these persons are not educated to properly understand the role and reliability of
safety instrumented systems (which our poll [3] showed not to be the case!), it
requires strong discipline and management to do this work properly. Mechanics may
overlook a fault, may introduce a fault (for instance, by leaving the system in the
inspection by-pass or by wrong calibration of a transmitter) or may simply falsely tick
off items when under time pressure. If we denote the probability of such a maintenance
induced error by pmi, the PFD becomes:
$$\mathrm{PFD} = \tfrac{1}{2}\,\lambda_{series}\,T_i + p_{mi}$$
[Figure 16: PFD versus time (0 to 24 months), SIL calculation compared with practice]
Figure 16 shows the effect of these errors. Following Gigerenzer [50] we may easily
make the mechanic aware of his / her crucial role without any advanced
mathematics:
“John, assume that you have to test this item 100 times. To the best of our knowledge
we know that only in 1 out of these 100 cases there will be a defect. So, the fact
that you repeatedly observe a functioning system is normal.
I take it for granted, that you, as an experienced mechanic, say in 99 out of the 100
cases will correctly find this defect, repair it, such that it is working again.
The problem now lies with the 99 cases where the component is functioning correctly.
If you are not careful enough, you may introduce a fault; for instance, forget to put
back the override switch. Suppose that this happens in 1 out of 100 cases, you see
that testing does not improve the situation!”
[Figure 17: frequency tree for 100 tests, split into 1 defective and 99 functioning
items, with chances 0.99 and 0.01 leading to about 1 case each]
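The natural-frequency argument in the quote can be written out as a short calculation. This is a sketch under the stated numbers; it additionally assumes that a detected defect is always repaired successfully. The function and parameter names are hypothetical.

```python
def faults_after_testing(tests=100, defect_rate=0.01,
                         detect_prob=0.99, induce_prob=0.01):
    """Expected number of faulty items left behind after `tests` tests."""
    defective = tests * defect_rate            # items that arrive with a defect
    healthy = tests - defective                # items that arrive in working order
    missed = defective * (1.0 - detect_prob)   # defects the mechanic overlooks
    induced = healthy * induce_prob            # faults introduced during the test
    return missed + induced


if __name__ == "__main__":
    before = 100 * 0.01
    after = faults_after_testing()
    print(f"faulty items per 100 before testing: {before:.2f}")
    print(f"faulty items per 100 after testing:  {after:.2f}")
```

With these numbers the expected count of faulty items per 100 is the same before and after testing, which is the point of the quote; only a lower rate of induced faults makes the test worthwhile.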
The above example shows the lack of attention, in reliability engineering theory
and in risk / inspection studies, to data uncertainty, time dependent behaviour and
the effect of human error. Risk studies on paper may therefore easily yield
results that deviate from realisation in practice.
Quantified decision making using decision support tools of the type we discuss here
is not an everyday job for the maintenance manager / engineer in question. This
easily causes problems through lack of familiarity; only large establishments and
consultants will have staff with sufficient experience with these decision support tools
based on frequent use. This leads to high requirements on user friendliness
and transparency; the more so, since these engineers typically have limited insight into
stochastic processes.
In many cases these staff “came through the ranks”, having developed themselves
from a vocational background like naval engineer, technician or mechanic. They thus
have a clear technical background, almost invariably specialised in a certain area
(rotating, civil, electrical), and lack insight into systems engineering and certainly
into the mathematics of stochastic processes. However, they continually take fundamental
decisions, involving appreciable amounts of money and serious consequences for
operation.
With these actors in mind, rather than specialised reliability engineers or consultants,
decision support tools then should support the following steps:
Since direct industrial maintenance costs range from 2 to 10% of turnover value,
the cost of lost production normally outweighs the costs of man hours and
materials by a factor 5 to 50. The starting point of the model thus lies with the
OEE (overall equipment effectiveness), production capacity and system
degradation / downtime versus operating time, rather than with reliability (like in
risk management).
models that are consistent with theory and (imprecise, uncertain) data. Given the
size of the models to be expected (10 – few hundred maintainable components
and / or failure modes) users should be capable of zooming and panning in and
out in a model, analyzing whole or large parts of a plant down to a specific failure
process for freely chosen time intervals. Users must be able to check their
engineering insight and expectation with model outcomes. This requires
consistency in calculated results, good graphics and facilities for “what if”
analyses to quickly compare various cases.
As far as possible the package should provide the basic building blocks for
modelling practical situations with minimal effort. Modelling blocks should be
reusable; a minimum requirement is copy and paste but a library function is to be
preferred.
Speed of response is important; only for very large model studies will these
engineers accept calculation times of several minutes.
Since decisions made by the engineer will have to convince Operations as well as
his own staff, the arguments and reasoning why the package arrives at a result
should be easy to pass on. Use of standard reporting tools and graphics then is
imperative.
Model data and results should be easy to document and report. An ISO
certified organisation requires that all decisions are substantiated and traceable.
The designer of the package should realise that such studies by a practical
engineer will hardly ever be inspected by an independent person. Hence,
modelling and data input errors will easily go undetected; utmost care has to
be taken to prevent errors.
The packages of the first group focus more on reliability / safety characteristics of
products or equipment over a certain lifetime than on production capacity in time;
some even lack the facility to include equipment / system capacity at all. The use
of hours as the basic timescale for Weibull parameters and failure time appears to be
a remnant of the risk based approach they were initially designed for. The same
applies to spare parts handling (and, to a lesser extent, workforce constraints) at
different echelons. Material handling is a standard part of ERP systems, using
algorithms like economic order quantity (EOQ) or level control (s, S). Critical spare
parts like compressor wheels are better handled by pooling with other users or
commercial service contracts with a supplier.
Some packages, like CARE-CAME, have no proper distinction between down time
and repair time; in others, like AVSIM+ the user has to specify the work load as a
fraction of the downtime, which clashes with the engineering mindset.
Simulation packages allow the user great flexibility in modelling the failure
characteristics during operational phases as well as after repair or inspection. The
user can fill in data like:
Materials and manpower constraint handling at various levels with separate costs
and logistics factors.
Such flexibility, in most cases, clashes with the lack of information. We already have
significant difficulties in attributing Weibull parameters to a component under normal
operating conditions; what about those under x% loading, warm or cold standby?
The MRP is based on maintenance activities that effectively can be carried out; how
does one assess / accept that some of these will be unsuccessfully finished (or even
be impossible) in the future?
[Figure 18: calculated workload versus time (01/01/2007 to 01/12/2008); analytical
SPARC result compared with Monte Carlo runs of 1000, 5000 and 10000 iterations, seeds 1 and 2]
Part of this (in view of the epistemic uncertainty, scientifically invalid) flexibility is due
to the general characteristics of Monte Carlo techniques: their ease in dealing with logic
rules and their ability to post-process observed failure data.
Although computing speed and memory capabilities increase fast and the historical
drawback of Monte Carlo simulation, excessive computing time, is thus vanishing,
the time dependency and level of detail in maintenance simulation studies still lead
to appreciable execution times. Figure 18 shows the workload calculated for a
component over a period of 350 h (eta = 8760 h, beta = 4, downtime = 500 h, repair
time 100 h). Even if we take one component (footnote 14) we have to perform some 5000 -
10000 MC simulations to get meaningful results per time interval selected. In the
latter case AvSim is a factor of ~ 50 slower compared with the analytical SPARC
package. For realistic models, this factor is much higher.
Due to the high level of detail offered and the ambiguity in data and terminology, in
combination with frequently rather poor graphical user interfaces that lack fast
panning and zooming, inexperienced users easily lose the overview of a system
and carry out analyses that are not supported by real evidence. The large number of
input data per failure mode and the lack of overview of the data structure per failure
mode, component, subsystem and system make modelling an error prone process.
In cases where the model is not independently verified, these errors will go
unnoticed.
Engineers should properly understand, and thus be able to explain, why the model
arrives at a specific result. Commercial packages tend to consider the underlying
algorithms as commercially confidential, leaving room for speculation. For such
understanding, fast “what if” analysis is imperative. In this respect, analytical
techniques are to be preferred, the more so since Monte Carlo techniques are
incapable of analysing separate time slices only (simulations have to start from time
zero) and invariably show scatter in results.
Great care has to be taken to lead the engineer to a correct model description
and effective support is needed to explain the model outcomes. Graphical user
interfaces (GUIs) and powerful graphical output are strongly recommended.
The use of superfluous decimal places in the results should be avoided since
it creates a false impression of precision and accuracy.
9. On Failure Behaviour
Here we develop a model that tries to bridge this gap, matching both views. We
study the behaviour of a single component that, due to an observable mechanism,
degrades to such a level that it is considered to be failed and unfit for
further use. A practical example may be:
14 The (rather strange) 350 h interval has been chosen as the minimum value in AvSim+ given
its restriction on the maximum number of 50 intervals that can be selected for the total
simulation time.
The tread depth of a car tyre: Current tread depth legislation requires that car
tyres must have a minimum of 1.6 mm of tread in a continuous band throughout
the central ¾ of the tread width and over the whole circumference of the tyre.
The minimum wall thickness of a pipe subject to process pressure: engineers use
the Barlow equation to select the type of pipe, providing a safety margin for
corrosion, the rate of which depends on the environment in question and the material
chosen. Piping will be inspected at regular intervals, to be declared unfit if the
(local) wall thickness is below a threshold value.
[Figure 19: physical parameter versus time; the component is “up” until the parameter
crosses the threshold at time ttr and “down” thereafter]
Figure 19 shows the process; until the time ttr the component is in the “up” state,
immediately thereafter it is functionally failed and thus in the “down” state.
The maintenance engineer will realise that the deterioration process depends on a
number of factors; the car tyre will not wear out if not used; corrosion depends, among
others, on temperature, pH and material stresses, which may vary over time and over
the pipe length.
[Figure 20: upper part, simulated physical value versus time (0 to 80) with the threshold
at 40; lower part, number of threshold crossings and 100*pdf of the fitted Weibull
distribution (η = 18.7, β = 2.6, γ = 18.9, MTTF = 35.5)]
To this end, we have modelled the deterioration in time of the physical variable y as:
$$\Delta y = \Phi^{-1}_{N(1.0,\,1.5)}\big(U(0,1)\big)$$
$$y_{i+1} = \min\left[\,y_i,\; y_i - \Delta y\,\right]$$
where Φ⁻¹ represents the inverse normal distribution with mean 1 and a sigma of
1.5, the probability being sampled from a random distribution U(0,1). The last
equation constrains the change in y to be zero or negative.
Figure 20 also shows 24 out of the 100 trend curves simulated (footnote 15) together with
the “deterministic” representation of Figure 19:
where y represents the physical variable and the value of α (footnote 16) is obtained from
the average decrease per unit of time. We identified the times at which the curves
intersect with the norm of, in this case, 40 units. The red diamonds show the number
of these “times to failure”, which were subsequently used to estimate the Weibull
parameters via regression as:
η = 18.7
β = 2.6
γ = 18.9
with γ representing the failure free period. This leads to an estimated MTTF of 35.5
time units.
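The simulation described above may be sketched as follows. This is an illustration, not the author's code: the normal distribution (mean 1, sigma 1.5) and the threshold of 40 units are taken from the text, whereas the starting value of 100, the unit time step and the seed are assumptions, so the resulting times to failure need not match the Weibull parameters quoted above.

```python
import random
from statistics import NormalDist, mean

STEP = NormalDist(mu=1.0, sigma=1.5)   # distribution of the raw decrement
THRESHOLD = 40.0                        # failure norm used in the text
START = 100.0                           # assumed initial value (not stated here)


def time_to_threshold(rng, start=START, threshold=THRESHOLD):
    """Simulate one path and return the first time step at which y < threshold."""
    y, t = start, 0
    while y >= threshold:
        u = max(rng.random(), 1e-12)    # keep u strictly inside (0, 1)
        delta = STEP.inv_cdf(u)         # inverse-normal transform of U(0,1)
        y = min(y, y - delta)           # the physical value is never allowed to rise
        t += 1
    return t


def simulate(runs=100, seed=1):
    """Return the times to failure of `runs` independent degradation paths."""
    rng = random.Random(seed)
    return [time_to_threshold(rng) for _ in range(runs)]


if __name__ == "__main__":
    ttf = simulate()
    print(f"shortest {min(ttf)}, mean {mean(ttf):.1f}, longest {max(ttf)}")
```

Because negative draws are clipped to zero, the average decrease per step is larger than the mean of the normal distribution, which is the effect noted in footnote 16.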
Obviously, if we reduce the sigma of the normal probability distribution, the spread in
outcomes will become less, leading to a higher value of the β parameter of the
Weibull distribution. The upper half of Figure 20 appeals to the engineering mind of
the maintenance engineer; the lower half is the representation we use in reliability
engineering.
We may consider the difference between initial value and threshold and the α value
as design parameters. These are degrees of freedom for the design engineer taking
into account, for instance, the average corrosion rate in mils/year of a specific
material, given nominal conditions of use. These values will affect the mean α value,
and thus the mean of the distribution. The corresponding sigma value represents the
variations in operational conditions of use; temperature / pH excursions, or the
mismatch between calendar time and running hours (the use factor).
15 For clarity, the curves are only drawn until they intersect with the threshold value.
16 The value of α is higher than the mean of the normal distribution used, since we
allowed no reliability growth.
[Figure 21: actual physical value, inspection values at the inspection times, and trend
lines fitted to the observations 0 - 10, 0 - 20, 0 - 30 and 0 - 40, versus time (0 to 60)]
Inspections will increase our operational insight into the actual failure propensity. The
engineer will try to get information on whether to replace now or to postpone this to a
later moment in time. Under normal conditions, he / she needs a planning horizon of a few
weeks. This is possible only if the process has a recognisable trend.
If the ageing process is reasonably monotonic, such a trend analysis will generate
the required information in a reliable way. As an example, Figure 21 shows the
extrapolated intercept values, using successively more observations at time intervals
from 0 to 40.
However, the maintenance engineer will also meet situations like those in Figure 22,
where the trend information will lead to premature (left side figure) or late (right)
replacement.
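The trend analysis above can be illustrated with a straight-line fit that is extrapolated to the threshold. This is a minimal sketch assuming a linear trend and ordinary least squares; the inspection readings are hypothetical and only the threshold of 40 units is taken from the text.

```python
def fit_line(times, values):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, values))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def predicted_crossing(times, values, threshold):
    """Time at which the fitted trend reaches the threshold, or None."""
    slope, intercept = fit_line(times, values)
    if slope >= 0.0:
        return None                 # no downward trend to extrapolate
    return (threshold - intercept) / slope


# Hypothetical inspection record: one reading every 10 time units.
inspection_times = [0, 10, 20, 30, 40]
readings = [100.0, 89.0, 77.0, 64.0, 52.0]
for k in range(2, len(readings) + 1):
    estimate = predicted_crossing(inspection_times[:k], readings[:k], 40.0)
    print(f"observations 0 - {inspection_times[k - 1]}: crossing at t = {estimate:.1f}")
```

Each additional inspection shifts the predicted crossing time; when the degradation is not monotonic these shifts can be large, which is the situation sketched for Figure 22.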
[Figure 22: two panels (left and right) showing the actual physical value, inspection values
at the inspection times, and trend lines fitted to the observations 0 - 10, 0 - 20,
0 - 30 and 0 - 40, versus time (0 to 60)]
This crude “physics of failure” (POF) model (Fig) allows us to convert the observed
physical variables into failure probabilities. It informs the engineer that failure is
rather spontaneous; the probability mass lies between the physical values 15 - 45.
[Figure: physical value and observed value (left axis, 0 to 100) together with model cdf and
estimated cdf (right axis, 0 to 1), versus time (0 to 80)]
Fig shows a realisation of one of the simulated runs. The triangles indicate the
observed values of the physical parameter, the red diamonds the estimated reliability
from the POF model at that moment in time.
The above examples show the significant benefits of even a crude (black box)
model of the physics of failure (POF). If the design engineer is able to
transfer his / her knowledge about the influence of process conditions on the
failure process in an (albeit estimated) quantitative form, the maintenance
engineer is in a better position to interpret both the consequences of
operational use and the information gained by inspection in order to arrive at a
substantiated decision for intervention. He / she will then realise that blaming
operators for “incorrect use” does not solve the problem; it is his / her task to
keep the equipment in the correct window of operation, where the
assumptions of the design engineer are valid. The simple approach of solely
“learning by failing” (Weibull data analysis) is quite ineffective for critical
items that hardly ever fail.
The (investment and operational) costs of CBM should be weighed against the
reduction in corrective / planned maintenance execution costs.
CBM should allow the maintenance manager to plan and schedule an activity in a
reasonable time-scale (days, could be several weeks); not to act as a type of
alarm.
The CBM observations thus should be properly analysed in a statistical sense to
gain information on the development in time of the failure process. POF models
will result in more robust trend analysis.
The value of CBM is strongly increased if it possesses diagnostic power,
providing essential information on the type of failure of which component in order
to decide what activity to plan, what staff and special tools / spare parts are
required for execution.
Unfortunately, these criteria are not always met and experience with CBM is
therefore mixed; a properly monitored bearing for instance still showing unexpected
sudden or fast incipient failure. In other cases, the CBM instrumentation provides a
warning signal only to prevent consequential damage that could well be obtained at
lesser costs with conventional alarm systems.
Jim Wardaugh [32] reports on the findings of the MERIT team in Shell on electric
motors: “in seven of our companies in six different countries. Each was a company
plant built to corporate standards with most rotating equipment having an installed
spare. However there was a variety of maintenance strategies in place.
Figure 24 summarizes our findings. It gives the percentage of each site's inventory of
motors removed to the workshop for significant repair each year. These percentages
have been broken down by reason for removal:
Clearly, this observation is at variance with the reliability engineering vision of Figure
9 and the results of location 5 show that “their condition monitoring did not seem very
effective in predicting and / or pre-empting failures”.
With some reflection, the lack of effectiveness of “add-on” CBM will not be a surprise.
First of all, we are dealing here, in control terms, with an “observability / controllability”
problem. One may expect the designer of the equipment / element to have, at least
in the back of his / her mind, considered the probability of failure in the design
specification. Since designing out failures is in many cases cheaper than repair later
on, he / she will take measures to increase the robustness where possible, as long
as costs remain reasonable. Such actions, however, will invariably decrease the
opportunities to non-intrusively measure signs of deterioration.
On the other side, the designer of the CBM-equipment, who addresses a large
market volume of “typical, similar” equipment, can only design for generic elements /
failure modes and has to face quite different, yet unknown, operational
circumstances. Without a proper understanding of the causal reasons for failure (a
These standard techniques all require a certain degree of wear to exist before a
signal will be generated (in non-destructive testing related to the probability of
detection, POD) as well as a relatively monotonic trend in the degradation process.
The latter again reflects the need for a physics of failure model, which, in practice, is
not always clear. (In theory, bearings should function well over a plant lifetime
(based on the number of cycles and loading pattern) but are in practice major causes
of equipment shutdown.)
Jardine [43] proposes to use a proportional hazard model with a Weibull base line:
$$h(t, Z(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left(\sum_{i=1}^{n} \gamma_i z_i(t)\right)$$
[Figure: block diagram with input U(i,k), output Y(i,k) and the blocks PROCESS,
IDENTIFICATION METHOD and MODEL]
Especially, the last point is of interest in CBM; using POF information, however
crude, will both increase the robustness of the model and its decision support, as
well as gain understanding and confidence with a practical engineer.
For a counter-current heat exchanger (Figure 25) the steady state heat balance may
be written as:
$$Q = m_a c_{pa}\,(T_{a1} - T_{a2}) = m_b c_{pb}\,(T_{b2} - T_{b1})$$
$$Q = U A \,\frac{(T_{a1} - T_{b2}) - (T_{a2} - T_{b1})}{\ln\dfrac{T_{a1} - T_{b2}}{T_{a2} - T_{b1}}}$$
with:
m_i = mass flow rate of fluid i
c_pi = specific heat of fluid i
T_ij = temperature of stream i at position j (1 = inlet, 2 = outlet)
U = overall heat transfer coefficient
A = area
Having the mass flow rates, the specific heats and the temperatures, we may calculate
U*A as a measure of the heat transfer capacity. Since we are interested in
variations of U in time, we may postulate a fouling mechanism that follows an S-curve
(sigmoid) in time; at first fouling will have a small effect, and if the fouling layer
increases in thickness, parts may break off. Identifying this model on-line, we obtain
continuous information on the change in heat transfer capacity and may calculate an
optimum point in time where the bundle has to be removed for cleaning, rather than
simply carrying out a planned clean out on calendar basis. Experience shows that
such condition-based replacements may increase the run length by 50 - 100%
compared with the conservative time based approach.
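The calculation of U*A from routine process measurements can be sketched as follows. The sketch assumes a counter-current exchanger with stream a as the hot side and computes the duty from that stream only; all readings are hypothetical and the function names are not from the source.

```python
import math


def log_mean_temperature_difference(ta_in, ta_out, tb_in, tb_out):
    """LMTD for counter-current flow (stream a hot, stream b cold)."""
    dt_hot_end = ta_in - tb_out
    dt_cold_end = ta_out - tb_in
    if math.isclose(dt_hot_end, dt_cold_end):
        return dt_hot_end              # limit of the formula for equal end differences
    return (dt_hot_end - dt_cold_end) / math.log(dt_hot_end / dt_cold_end)


def heat_transfer_capacity(m_a, cp_a, ta_in, ta_out, tb_in, tb_out):
    """U*A in kW/K from the duty of stream a (kg/s, kJ/(kg K), deg C)."""
    duty = m_a * cp_a * (ta_in - ta_out)
    return duty / log_mean_temperature_difference(ta_in, ta_out, tb_in, tb_out)


# Hypothetical readings: clean exchanger versus a later, fouled state.
ua_clean = heat_transfer_capacity(2.0, 4.18, 150.0, 90.0, 30.0, 70.0)
ua_fouled = heat_transfer_capacity(2.0, 4.18, 150.0, 100.0, 30.0, 63.3)
print(f"U*A clean:  {ua_clean:.2f} kW/K")
print(f"U*A fouled: {ua_fouled:.2f} kW/K")
```

Tracking this value over time gives the series to which a fouling curve, such as the sigmoid mentioned above, could be fitted.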
Note that, in this case, we use process measurements that frequently will already be
available for control purposes. Secondly, the model outcomes will readily be
accepted by engineers, the more so, since they directly may be associated with
production losses.
$$H_p = \frac{Z_{avg}\,R\,T_s}{MW\,\dfrac{n-1}{n}}\left[\left(\frac{P_2}{P_1}\right)^{\frac{n-1}{n}} - 1\right]$$
with:
H_p = polytropic head, kJ/kg
Z_avg = average compressibility factor, dimensionless
R = universal gas constant, 8.314 kJ/(kmol K)
T_s = suction temperature, K
MW = molecular weight, kg/kmol
n = polytropic exponent, dimensionless
P_1 = suction pressure, kPa
P_2 = discharge pressure, kPa
In a petrochemical process most of these parameters are available since they are
required for process control purposes. Again, a time dependency as described above
may be used to allow on-line identification of the differences between initial and
actual compressor characteristics. For reciprocating compressors, an analogous
pressure-velocity (PV) approach (footnote 17) has proven to be a very effective technique
but requires additional, fast and accurate pressure transmitters.
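As an illustration of how the polytropic head can be evaluated from process measurements, the sketch below implements the standard polytropic head relation with the units listed in the definitions above. The operating point in the example is hypothetical.

```python
R_UNIVERSAL = 8.314  # kJ/(kmol K)


def polytropic_head(z_avg, t_suction, mol_weight, n, p_suction, p_discharge):
    """Polytropic head in kJ/kg; pressures in kPa, temperature in K."""
    exponent = (n - 1.0) / n
    pressure_term = (p_discharge / p_suction) ** exponent - 1.0
    return z_avg * R_UNIVERSAL * t_suction / (mol_weight * exponent) * pressure_term


if __name__ == "__main__":
    # Hypothetical operating point: Z = 0.98, 300 K, 18 kg/kmol, n = 1.3, 1000 -> 3000 kPa.
    head = polytropic_head(0.98, 300.0, 18.0, 1.3, 1000.0, 3000.0)
    print(f"polytropic head: {head:.1f} kJ/kg")
```

Comparing the head computed on-line with the value expected from the original compressor characteristic at the same flow gives a simple indicator of performance loss.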
In a similar way, one may identify on-line the pump characteristics of Figure 5 albeit
that the actual working point now also depends on pumping speed, line (process)
resistance and the throttling effect of the discharge control valve. Changes from the
original design curve indicate impeller wear or significant fouling.
The above shows that the application of control engineering principles in the asset
management area has significant scope. In fact, why do control engineers apply
advanced control on process variables to keep the process optimally running but
neglect control of the underlying equipment reliability, without which production will
come to a grinding halt? Where the examples above are quite easy, we have to
realise that extending this view to common problems like bearing or seal failure will
not be straightforward and will take quite some research. However, even casting the
available written documentation in a crude POF model will at least indicate the major
variables causing failure such that they may be better controlled. A similarity shows
up here with the stabilising control layer of Figure 1.
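[Figure 26: asset management pyramid. Top layer: plant wide, plant asset management
(monthly to yearly). Middle layer: critical equipment, monitor and optimise (weekly).
Bottom layer: care and non-critical equipment, stabilise (daily).]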
At the lowest layer of the asset management pyramid (Figure 26) we find daily
activities like routine visual inspections, cleaning, lubricating, greasing and repair of
non-critical items that the MRP analysis has assigned to corrective maintenance. This is
a significant part (~40 - 60%) of the maintenance effort, for which modelling hardly
plays a role. The “care” aspects are necessary to keep the equipment running; there
is no need to quantify the consequences of neglecting this step. The effectiveness of
maintenance on non-critical items is mainly governed by the “bought-in reliability” of
high quality components reducing the required intervention frequency. This is also
the place where equipment has to be “controlled” to remain in the operating window
that the designer took into account. Without such control, we will observe
unexpected failures, the causal reasons of which can be detected only by time
consuming root cause analysis. One may expect that this equipment stabilising
control will in the future increasingly be carried out by dedicated built-in condition
measurement systems, rather than by the traditional human observation approach.
17
The dynamic pressure change inside a cylinder.
For critical equipment this health monitoring needs to be more detailed. Existing
insight into the POF is updated, based on measured information as described above,
such that cost-effective interventions can be planned on a timescale that fits the
organisation. The frequency of updating has to be in line with the dynamics of the
degradation mechanism and will usually be in the order of weeks to months. In
terms of modelling, the MRP necessarily has to be based on data from reliability data
banks, information from OEMs and / or engineering judgement, and thus will give better
information on effectiveness than on failure evolution in time and optimal timing of
intervention. Predictive maintenance, underpinning this timing, needs to be based on
a POF model updated by measured, physical information.
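The idea of updating a degradation estimate from measurements and deriving the timing of an intervention can be sketched as follows. The example assumes, as a deliberate simplification, a linear degradation trend fitted by ordinary least squares. A real POF model would replace the straight line by the physical relation for the failure mechanism concerned. The indicator, threshold and readings are hypothetical.

```python
def fit_line(times, values):
    """Ordinary least-squares fit of values = slope * time + intercept."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    slope = sxy / sxx
    return slope, mean_v - slope * mean_t


def time_to_threshold(times, values, threshold):
    """Time remaining after the last measurement until the fitted trend
    reaches the intervention threshold. Returns None when the indicator
    shows no upward trend."""
    slope, intercept = fit_line(times, values)
    if slope <= 0.0:
        return None
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - times[-1])


# Hypothetical weekly readings of a degradation indicator
# (for instance a relative head deficit), threshold set at 0.10.
weeks = [0.0, 1.0, 2.0, 3.0]
deficit = [0.00, 0.02, 0.04, 0.06]
print(time_to_threshold(weeks, deficit, threshold=0.10))
```

Re-running the fit whenever new measurements arrive, at a frequency matching the degradation dynamics, gives the planner a moving estimate of the time available for intervention.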
For academia and PhD students, the write-up of a maintenance / asset
management optimisation model will be the final outcome. Industrial engineers,
however, have to ensure that the predicted benefits will actually be achieved.
We then have the situation that an optimal plan developed at the tactical layer of AM
(Figure 2) is jeopardised by inadequate management at the operational layer. Shell
Pernis Refinery, the largest refinery in Europe, in its drive towards the Shell world-wide
Flexible Flagship Programme, recently showed that the active wrench time of its
mechanics could be doubled [52] by reducing the complexity of procedures. This
complexity had tacitly increased over the years following accident and incident
investigations, without, however, leading to a significant change in the statistics.
With management attention now focussed on the technicians' own responsibility,
downtime is minimised and direct costs are expected to be reduced by some 25%.
Similar observations may be made on the value and clarity of the information in the
MRPs handed over from the tactical to the operational layer, and on its feedback. In
many cases, the threshold value (condition) upon which maintenance needs to be
scheduled is not clear and thus open to subjective action. Frequently, the mechanic
is not systematically asked to report on items which, at the time of development of the
MRP, had to be guesstimated. The value of field data is underestimated; we observe
in a number of cases that, at the introduction of a new CMMS, the maintenance
manager is unable to convince the company of the need for additional
budget to convert the existing database to the new system, such that many years of
recorded experience are lost.
12. Conclusions
Maintenance modelling, the subject matter of this book, may be considered as the
vehicle in decision support tools to apply theoretical reliability engineering concepts
in practice, both in the design phase, as well as in actual operation.
Design type models, covering the lifetime of the plant, are based on (Weibull) data
from reliability data banks, in-house experience or derived from Original Equipment
Manufacturers (OEM) data. The granularity of this information is low, mostly at
equipment level in terms of MTTF given a “sound” maintenance regime. The
epistemic uncertainty may be estimated to be in the order of some 20%. Together
with the pressure (time, money) in the design project, this means that fundamental
decisions on system structure, lay-out and equipment will be taken with scarce
information. To transform the initial broad-brush approach from the design model into
a maintenance reference plan (MRP, where future strategies are underpinned), the
level of detail needs to be extended to that of critical failure modes for which both
external and in-house data banks normally have no precise data. We therefore meet
again considerable uncertainty in Weibull parameters.
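To give a feeling for what such uncertainty means in practice, the sketch below evaluates the standard two-parameter Weibull expressions for MTTF and reliability and shows the spread in predicted reliability for a 20% uncertainty. Applying the 20% to the scale parameter only is an assumption made here for illustration. The parameter values are hypothetical.

```python
import math


def weibull_mttf(eta, beta):
    """Mean time to failure of a two-parameter Weibull distribution:
    MTTF = eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def weibull_reliability(t, eta, beta):
    """Survival probability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def reliability_band(t, eta, beta, rel_uncertainty=0.20):
    """Reliability at time t for the scale parameter shifted down and up
    by rel_uncertainty (assumption: the shape parameter is known)."""
    low = weibull_reliability(t, eta * (1.0 - rel_uncertainty), beta)
    high = weibull_reliability(t, eta * (1.0 + rel_uncertainty), beta)
    return low, high


# Hypothetical equipment: scale 1000 days, shape 2 (wear-out behaviour).
eta, beta = 1000.0, 2.0
print(round(weibull_mttf(eta, beta), 1))
print(round(weibull_reliability(1000.0, eta, beta), 3))
print([round(r, 3) for r in reliability_band(1000.0, eta, beta)])
```

Even with the shape parameter taken as known, the band on the predicted reliability is wide, which illustrates why decisions based on data bank values alone carry considerable uncertainty.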
We have shown that usage factors, like temperature, loading and type of operation
have a strong influence on the failure process and thus on its characterising
Human factors are neglected in quantitative analyses, such as the calculation of SIL
norms, but have serious consequences in the testing of high-reliability safeguarding
equipment. Academic publications mostly leave out these inherent shortcomings or
consider them as items to be improved. More practically oriented books and
publications treat these problems qualitatively, but mostly proceed from there with a
standard reliability engineering approach. To some extent, the image is created that
the reliability engineering community is aware of its inherent flaws, but does not
openly present them.
The stochastic models used in reliability engineering are not in line with the more
deterministic engineering mindset, the latter reasoning in terms of causal relations
and disliking uncertainty. A combination of a physics of failure (POF) model with
standard reliability engineering theory improves both the robustness of decision
support tools and their acceptance by practical engineers. The approach may
be borrowed from control engineering, where control, optimisation and model
identification of complex units have demonstrated significant economic gains.
CBM based on physics of failure models, with parameters identified from process
measurements, is more robust than that using add-on instrumentation. In some
cases, these physics oriented models are already in place for advanced control of
process units where the optimum is defined by process constraints like the maximum
heat transfer of a cooler (fouling), fluid transport by a pump (wear) or transport of
vapour (compressor fouling, wear). Reliability engineering thus will strongly benefit
from closer cooperation with topical (mechanical, electrical, material) engineers to
develop decision support tools that combine physical background with sound
stochastic analysis.
References.
38. N. Marse, D. Soute, J. Kop, and C. F. H. Van Rijn, "Design for Reliability and
Availability," World Class Maintenance Consortium, Breda, The Netherlands,
2008.
39. Sipke E. van Manen and J. v. d. Bogaard, "Living PAM," in This book.
40. E. E. Lewis, Introduction to reliability engineering. Second edition. New York:
John Wiley & Sons, 1996.
41. R. B. Abernethy, The New Weibull Handbook. North Palm Beach, Florida, USA,
2006.
42. H. P. Bloch, "Use equipment failure statistics properly," Hydrocarbon Processing,
vol. 78, Jan. 1999.
43. A. K. S. Jardine and A. H. C. Tsang, Maintenance, Replacement and Reliability.
Boca Raton: CRC Press, 2006.
44. P. D. T. O'Connor, Practical Reliability Engineering. Chichester: John Wiley &
Sons, 2001.
45. V. Narayan, Effective Maintenance Management; Risk and Reliability Strategies
for Optimizing Performance. New York: Industrial Press Inc, 2004.
46. H. Procaccia, Guidebook on the effective use of safety and reliability data. Paris:
SFER, 1995.
47. ESReDA, Handbook on Quality of Reliability Data. Hovik: Det Norske Veritas,
1999.
48. J. P. Signoret, "High Integrity Protection Systems (HIPS) – Making SIL
Calculations Effective," Touch Oil and Gas, 2007.
49. Anon, "Application of IEC 61508 and IEC 61511 in the Norwegian Petroleum
Industry," OLF, 2001.
50. G. Gigerenzer, Reckoning with risk. London: Penguin Books, 2002.
51. B. S. Blanchard, Maintainability, A key to effective serviceability and
maintenance management. New York: John Wiley & Sons, 1995.
52. Anon, "Pernis zet ballast overboord (Pernis puts dead weight overboard)," Shell
Venster (in Dutch), pp. 14-16, November / December 2009.