Chapter 11
Maintenance Modelling as a Decision Support Tool: A Critical View on Applicability and Acceptance in Industry
Cyp F.H. van Rijn

Senior RAMS consultant, lecturer RAMS Utrecht University of Applied Science,
hon.pres. ESReDA.
1. Introduction
According to the ESReDA Project Group proposal, the aim of this book is to provide
“a technical reference text which will document the current state-of-the-art”,
“emphasizing its practical application”.
The author of this chapter has experience in both control engineering and reliability
engineering, applied mainly to petrochemical production processes, and has
observed [1, 2] significant differences in engineering acceptance between these two
disciplines, which are rather similar from a theoretical point of view. This contribution
aims to identify the reasons and practical stumbling blocks that make inherently essential
theoretical principles of reliability engineering difficult to apply in practical situations.
We will investigate the needs and possibilities of maintenance engineers, the organisational
restrictions on applying RAMS¹ over the lifecycle of installations, and the lack and
uncertainty of the required reliability data and their dependency on usage factors. There
are clear signs that these observations are at variance with the assumptions in
stochastic OR models and with the flexibility offered by commercial decision support
tools. Based on the experience in process control, we will make a strong plea for the use of
physics-of-failure models, which are better in line with the maintenance engineer's
mindset and can make effective use of continuous (process control type) information that in
many cases is already available.
2. Maintenance Engineers as Our Clients
In a previous paper [3] we presented the results of a questionnaire on the level
of use of Asset Management (AM) techniques in a number of Dutch Industries and
Service Companies.
Although restricted in scope and coverage, this study clearly indicates that:
• AM has a strong economic dimension; maintenance managers repeatedly decide
on activities covering a budget of tens to hundreds of millions of euros. Apart from
these direct costs, the reported figures on Overall Equipment Effectiveness (OEE)
of 40 – 99% indicate a strong economic potential for AM to add to the business
results.
¹ Reliability, availability, maintainability and safety.
Maintenance Modelling and Applications
• Next to managing OEE, AM is also responsible for safety and health management
with its related inspection activities.
• The value of AM is now clearly recognised at board level, with most CEOs being
more sensitive to safety than to OEE. The level of related reporting is
acceptable, making effective use of computerised maintenance management
systems (CMMS), both as a dashboard function and in a learning
environment.
• Operational maintenance activities are mainly outsourced to specialised
contractors. As a consequence, in-house craftsmanship, technical insight and
experience are decreasing and the need for effective management techniques is
increasing.
• Technical insight and craftsmanship are more strongly developed than knowledge of
general management and reliability engineering theory. Mechanics have little
insight into reliability engineering and AM strategy aspects.
• Respondents are of the opinion that sufficient education facilities exist at Master and
Bachelor level, but that these are not integrated with company career development
programmes.
• Only in mandatory cases are full-blown fault tree type process models and/or
reliability block diagrams in use; the large majority of the respondents stay at
the RCM level or at a simple, single failure mode planned maintenance optimisation
level. All interviewees invariably indicate a lack of reliable failure data; a
significant part are now actively using the CMMS for trending and root-cause
analysis. With the exception of safety-related equipment, where such information
is now critical in view of quantitative standards, OEMs² are rather reluctant even
to provide MTTF data at equipment level. Weibull representation at a (critical)
failure mode level is virtually absent.
Maintenance modelling only has practical value when embedded in decision support tools [4];
if used effectively, these contribute significantly to overall company profit [5]. In practice,
however, engineers remain rather sceptical about the use of such techniques.
3. Reliability Versus Control Engineering
Reliability Engineering covers a wide field of reliability, availability, maintainability,
safety, health and environment (RAMSHE) problems. Tools and techniques have
been developed since the 1950s, both in Industry (such as Failure Mode,
Effect and Criticality Analysis (FMECA) and Fault Tree Analysis (FTA) [6]³, Hazard
and Operability Studies (HAZOP)⁴ and Reliability Centred Maintenance (RCM) [7]) and in
academic institutions. For the academic world, reliability engineering remains a
popular topic in (stochastic) Operations Research, with well-known names like Barlow
[8] and Proschan, Birnbaum [9] and many others.
² OEM: Original Equipment Manufacturer.
³ An offshoot of Military Procedure MIL-P-1629, titled Procedures for Performing a Failure Mode,
Effects and Criticality Analysis, dated November 9, 1949.
⁴ HAZOP originated in 1963 in the Heavy Organic Chemicals Division of ICI UK.
A similar situation can be observed in the field of process control engineering.
Ziegler [10] and Nichols developed basic rules for on-line single-loop PID⁵ controller
tuning in the 1940s as a support for engineers buying their (Taylor Instrument)
controllers. Rosenbrock [11] developed an engineering, decoupling approach to deal
with multivariable problems, using models based on transfer functions in the complex
domain. Separately, mathematicians like Kalman [12] and Athans [13] developed the
concept of linear, time-invariant multivariate optimal control with models in
state-space notation. From there on, model-based, nonlinear, multivariable, adaptive and
robust control theories were developed, both in the frequency domain and in the
state-space domain. The model-predictive approach, where the plant is in fact controlled
by continuously available model outputs, the model in turn being updated with plant
measurements, was quickly taken up by chemical engineers, who were familiar with this type
of modelling. They soon experienced the influence of equipment reliability: the added
economic value of constraint optimisation over a few days is easily lost in hours by
the unexpected trip of a compressor or a pump that brings the process out of the
control window. With these perspectives in mind, it is quite natural to observe that
these chemical (design, control / optimisation) engineers now enter the RAMS arena
[14] [15] [16] [17].
[Figure 1, layers from top to bottom: optimising control (OPT), hours to days;
supervisory MIMO control, seconds to minutes; PID SISO control, seconds]
Figure 1. Schematic view of process control layers.
Modern process control is fully accepted in Industry at large. Whereas control
engineering started as a specific, separate discipline with specialist groups, active
only in major plants of large companies, nowadays it is integrated in the education of
process engineers. Supported by instrumentation vendors, these chemical engineers
are qualified to install both simple and complex control and optimisation loops.
Figure 1 gives a schematic overview of the control layers currently being applied in
plants. At the lowest, regulatory level, simple PID controllers stabilise SISO⁶ process
variables such as flows, temperatures and pressures on timescales measured in seconds.
At the next higher level, MIMO⁷ and model predictive controllers steer the setpoints of
the regulatory controllers, relying on some form of dynamic process model, to handle
severe control loop interactions at a slower pace and/or to achieve desired
trajectories of process variables. Finally, for critical processes like a refinery catalytic
cracker, an overall, plant-wide model is used for economic optimisation. Often,
optimal plant operation is achieved when some of the manipulated and/or controlled
variables are near their limiting values (constraints). Therefore, the control structures
at the highest two levels invariably have algorithms to detect, for instance, the
maximum heat load of an exchanger actually available.
⁵ Proportional, integral, derivative controller; the standard in Industry.
⁶ Single input, single output.
⁷ Multi input, multi output (multivariate or multivariable).
The operators in the plant are well trained to understand the production process and the
way process variables affect output, quality and safety. They get full information via
computerised control systems with advanced graphical user interfaces showing
historical trends of variables, deviations, alarms and calculated information on
effectiveness. The production department generates daily reports showing the added
value of its activities to the company.
This situation differs fundamentally from that in maintenance. Reliability engineering
has not provided a systematic approach with theory and models, generally lacks
“process” information to steer “control” variables, and is not integrated in the culture
and education. Engineers and mechanics are valued mainly for swift, cost-effective
execution of activities at a good quality level.
                                    CONTROL ENGINEERING             RELIABILITY ENGINEERING
DRIVERS                             USERS                           “MASTERS”
THEORETICAL BASIS                   +++                             ++
USE OF DATA AND ICT                 +++                             +
INTERACTION / UNCERTAINTY ASPECTS   ++                              +/-
LIFECYCLE ASPECTS                   +                               ++
ACCEPTANCE OF MODELS                +++                             -
IDENTIFIABLE BENEFIT                +++                             ++
INTEGRATION                         +++                             +/-
CAREER OPPORTUNITIES                HIGH, LINKED WITH OPERATIONS    RESTRICTED, SPECIALISTS
Table 1
Table 1 [2] lists a number of characteristic differences as a kind of “consumer report”,
with ratings from high (+++) to low (---). The “drivers” in the control field are
typically the owners and users of installations; they swiftly note significant economic
improvements, for instance through better (quality) control and use of energy. In the
reliability field, authorities are the main actors in stimulating risk analyses and
(military, aerospace) proven component reliability. There is a striking difference in the
theoretical basis and in the type and engineering acceptance of models, with a
corresponding gap in the use of data and ICT. The demonstrable benefits of process control
have led to a full integration in the education of engineers, providing good career
opportunities. In contrast, maintenance engineers are regarded as specialists.
4. Types of Decision Problems
Figure 2 [4] shows the strategic, tactical and operational layers in Asset
Management (AM). The most decisive RAMS alternatives are fixed in the design
phase where the plant layout and the type and make of equipment are defined,
virtually for the lifetime of the installation.
The reliability engineer supports these activities with model studies; to what extent
and at what costs can “the right amount of product of the right quality, safely and
environmentally sound cost-effectively be produced at the right moment in time?” At
first (section 0), modelling necessarily will be crude, with low granularity and
assumed data.
The maintenance reference plan (MRP) provides guidance on the strategies to be

employed to achieve these goals and thus requires more elaboration. For non-critical
items and servicing activities (cleaning, greasing, ..), OEM recommendations and / or
in-house experience usually are taken for granted. Larger equipment needs to be
modelled in further detail to evaluate the effect of selected strategies on groups of
failure modes, the grouping depending on cost/failure characteristics, but also on
practical possibilities for maintenance execution.
In this way, the system model is slowly expanded in detail and system insight is
growing. Its predictive power over the long run, however, is strongly affected by
uncertainty in input data.
In the operational phase, information on actual behaviour becomes available that
can be used effectively in reliability engineering models to update these strategies
and to provide information on the optimal timing of major activities.
Design type models deal with average behaviour over the lifetime of the
installations; the operational models deal with time-dependent behaviour, mainly
during the intervals between major planned replacements, overhauls or plant
shutdowns. The dynamic criteria thus are quite different.
[Figure 2, recoverable labels: Strategic layer (scenario analysis, risks / opportunities,
yes / no, demands, capabilities); Tactical layer (asset performance, life cycle costs,
maintenance & operating reference plan, improve, plan); Operational layer (analysis,
schedule, execute); abandon / demolish; performance management (dashboard, specification,
measurement, benchmarking, improvement)]
Figure 2. Asset management layers
5. Effect of Usage and Operational Conditions
5.1 Reliability of Incandescent Lamps
Figure 3 shows an example of a data sheet of a signalling lamp that (after some
pressure!) may be obtained from a manufacturer. A reliability engineer will be
pleased to see the familiar Weibull diagram, showing, in this case, a beta (β) of 3.6 and
an eta (η) of 3.3 years. Such diagrams are a result of the manufacturer's quality
assurance process, using accelerated testing where the operating voltage is higher
than the design value.
With the help of such information, one may investigate, for instance, the
effectiveness of a break-down versus a block-replacement strategy for airport runway
illumination [4]. For these systems, great care is taken to ensure soft starting-up,
constant supply current and low vibration levels.
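As an illustration of how the data sheet values could feed such a strategy comparison, the sketch below evaluates a simple age-based replacement policy against run-to-failure, using the Weibull parameters quoted above (β = 3.6, η = 3.3 years). Note the assumptions: an age-based policy is swapped in here because it is simpler to compute than block replacement, and the two cost figures are hypothetical placeholders, not values from the study in [4].

```python
import math

BETA = 3.6        # Weibull shape from the data sheet example
ETA = 3.3         # Weibull scale in years from the data sheet example
C_PLANNED = 1.0   # hypothetical cost of a planned replacement
C_FAILURE = 10.0  # hypothetical cost of a failure in service


def reliability(t, beta=BETA, eta=ETA):
    """Weibull survival probability R(t) = exp(-(t/eta)^beta)."""
    return math.exp(-((t / eta) ** beta))


def mean_life(beta=BETA, eta=ETA):
    """Weibull mean time to failure, eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def cost_rate(interval, steps=2000):
    """Expected cost per year when replacing at failure or at age `interval`."""
    h = interval / steps
    # Trapezoidal integration of R(t) gives the expected cycle length.
    cycle = sum(0.5 * h * (reliability(i * h) + reliability((i + 1) * h))
                for i in range(steps))
    r_end = reliability(interval)
    return (C_PLANNED * r_end + C_FAILURE * (1.0 - r_end)) / cycle


candidates = [0.1 * k for k in range(1, 61)]   # replacement ages 0.1 ... 6.0 years
best_interval = min(candidates, key=cost_rate)
run_to_failure_rate = C_FAILURE / mean_life()

print(f"MTTF: {mean_life():.2f} y")
print(f"best replacement age: {best_interval:.1f} y, "
      f"cost rate {cost_rate(best_interval):.2f} per year")
print(f"run-to-failure cost rate: {run_to_failure_rate:.2f} per year")
```

The outcome depends entirely on β, η and the cost ratio, so any uncertainty in the data sheet values carries over directly into the recommended interval.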
However, if one takes into account the physics of the illumination process, it
becomes clear that the lifetime of the lamp is exponentially dependent on filament
temperature or, correspondingly, on the metal vapour pressure:
$$\mathrm{MTTF}^{-1} \sim P_0 \exp\left(-\Delta H_{vap} / RT\right)$$
where P0 is a constant, R the universal gas constant, T the temperature in K and ∆Hvap
the heat of vaporisation, 183 kcal/mol. With this value, lamp characteristics
are highly sensitive to changes in applied voltage V:
• Light output is approximately proportional to V^3.4
• Power consumption is approximately proportional to V^1.6
• Lifetime is approximately proportional to V^−16
This means that a 5% reduction in operating voltage will more than double the life
of the bulb, at the expense of reducing its light output by about 16%. This property is
used for “long-life” bulbs used in difficult-to-access locations (for example, traffic
lights or fixtures hung from high ceilings).
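The arithmetic behind these power laws can be checked with a few lines. The sketch below only uses the three exponents quoted above; the function name is a hypothetical helper.

```python
# Exponents quoted in the text for an incandescent lamp.
LIGHT_EXPONENT = 3.4
POWER_EXPONENT = 1.6
LIFE_EXPONENT = -16.0


def relative_change(voltage_ratio, exponent):
    """Value relative to nominal when the voltage is scaled by voltage_ratio."""
    return voltage_ratio ** exponent


ratio = 0.95  # a 5% reduction in operating voltage
life = relative_change(ratio, LIFE_EXPONENT)
light = relative_change(ratio, LIGHT_EXPONENT)
power = relative_change(ratio, POWER_EXPONENT)

print(f"life factor:  {life:.2f}")
print(f"light output: {light:.2f} of nominal")
print(f"power:        {power:.2f} of nominal")
```

With these exponents the life factor comes out somewhat above 2 and the light output at roughly 84% of nominal.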
Figure 3. Weibull plot of incandescent lamp
Timmer [18] analysed the lifetime behaviour for various applied voltages (3.25 – 4.75
volts) in Weibull terms and reports the following results:
          3.25 V    3.5 V    4.0 V    4.75 V
η, h      16430     4350     898      125
β         3.4       2.7      6.2      7.0
Table 2
The Weibull analysis in Table 2 indicates a poor correlation for data set #2, but the
dependency of the parameters η and β upon the applied voltage is quite pronounced.
Hesen et al. [19] report on the mandatory conformance of the Dutch low-voltage
power supply to NEN-EN 50160. For low-voltage distribution, the supply voltage
should remain at 230 V +/- 10% for 90% of the time during a weekly interval. The
voltage fluctuations allowed by this norm indicate that the lifetime of an
incandescent lamp will in practice show a larger variation than that indicated by
laboratory tests.
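To get a feeling for the size of this effect, the sketch below fits a power law to the η values of Table 2 and applies it to the +/- 10% voltage band. It is an illustration under stated assumptions only: a plain least-squares fit on logarithms is used, and the exponent found for a 3.25 – 4.75 V lamp is simply assumed to carry over to relative voltage deviations in general.

```python
import math

# Characteristic life eta (hours) per applied voltage (volts), from Table 2.
ETA_BY_VOLTAGE = {3.25: 16430.0, 3.5: 4350.0, 4.0: 898.0, 4.75: 125.0}


def fitted_exponent(data):
    """Least-squares slope of ln(eta) against ln(V), i.e. n in eta ~ V^n."""
    xs = [math.log(v) for v in data]
    ys = [math.log(eta) for eta in data.values()]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


exponent = fitted_exponent(ETA_BY_VOLTAGE)
life_at_low_voltage = 0.9 ** exponent    # life factor at nominal voltage -10%
life_at_high_voltage = 1.1 ** exponent   # life factor at nominal voltage +10%

print(f"fitted exponent: {exponent:.1f}")
print(f"life factor at -10%: {life_at_low_voltage:.2f}")
print(f"life factor at +10%: {life_at_high_voltage:.2f}")
```

The fitted exponent is of the same order as the −16 rule of thumb, and the characteristic life at the two edges of the permitted voltage band differs by roughly an order of magnitude.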
From the above example, we have to conclude that the reliability engineer has
to treat such apparently accurate manufacturer information with great care. In
this case, for optimisation results to be reliable, measures such as voltage or
power regulation have to be applied. Apart from the above influence of applied
power, other factors like mechanical shocks and fouling will also affect the
lifetime.
5.2 Reliability of Electronic Equipment
With the introduction of risk-based norms like IEC 61508 and 61511, manufacturers
of electronic equipment now specify the MTTF of their various products as a
standard selection criterion.
• The basic failure rates for electronic components are usually taken from
MIL-HDBK-217 (although this source has no longer been updated since 1994!), Bellcore
TR-332, or some other (e.g. in-house) reference.
• The Part Stress technique adapts basic failure rates to those applicable in an
analysis:
$$\lambda_i = \lambda_b \, \pi_T \, \pi_S \, \pi_P \, \pi_Q \, \pi_E$$
where
λi = the failure rate of the i-th part
λb = the basic failure rate for each generic part
using estimated “pi factors” with a value from 0 to 1 accounting for:
πT = operating temperature
πS = secondary stress level (e.g. vibrations, shock, etc.)
πP = power factor
πQ = quality factor, degree of manufacturing control
πE = environmental factor
With these component failure rates, system characteristics like MTTF are calculated
via the Parts Count method or some form of model (fault tree, reliability block
diagram, Petri net, ...). In other cases, still infrequent however, highly accelerated life
testing (HALT) and highly accelerated stress screening (HASS) are being used.
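A minimal sketch of the two calculation steps described above is given below. All part data are hypothetical, and the Parts Count step assumes constant failure rates and a series system (any part failure fails the system), which is the usual simplification rather than something stated in the text.

```python
import math


def part_failure_rate(base_rate, pi_factors):
    """Part Stress: basic failure rate multiplied by all pi factors."""
    return base_rate * math.prod(pi_factors.values())


def parts_count_mttf(part_rates):
    """Parts Count for a series system: rates add up, MTTF = 1 / total rate."""
    return 1.0 / sum(part_rates)


# Hypothetical parts: (basic failure rate per hour, pi factors).
parts = [
    (2.0e-6, {"T": 0.8, "S": 0.9, "P": 1.0, "Q": 0.5, "E": 0.7}),
    (5.0e-7, {"T": 0.6, "S": 1.0, "P": 0.9, "Q": 0.5, "E": 0.7}),
    (1.0e-6, {"T": 0.9, "S": 0.8, "P": 1.0, "Q": 1.0, "E": 0.7}),
]

rates = [part_failure_rate(base, factors) for base, factors in parts]
system_mttf = parts_count_mttf(rates)

for rate in rates:
    print(f"part failure rate: {rate:.3e} per hour")
print(f"system MTTF: {system_mttf:.0f} h")
```

Because the pi factors enter as a product, a modest error in each of them multiplies through to the system MTTF.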
The MIL-HDBK-217 approach in particular is strongly criticised. As an example, Kumar
[20] presents data on one specific radio for which the US Army requested an MTBF of
1250 h with 80% confidence.
Figure 4 shows the field data of nine vendors of the “same” radio set versus the
calculated data they presented to Army procurement. One easily observes that
for the majority of the suppliers the observed MTBF was nowhere near their prediction.
[Figure 4, scatter plot: horizontal axis MIL-HDBK-217 prediction, h (0 – 8000);
vertical axis field data, h (0 – 8000)]
Figure 4. MIL-HDBK-217 vs. field data
Quoting a NASA document [21]:
“In general, models from these sources have not proven credible when predicting
reliability quantitatively. Studies show that failure rates predicted by the above
mentioned procedures can differ by over two orders of magnitude. However, if used
in their proper perspective, these empirical models can usefully compare the
reliability issues of two approaches to the same design.”
Ambient temperature has a strong influence on the lifetime of electronic equipment; in
general, lowering the temperature by 10 °C doubles the lifetime. Equipment designed
for the Nordic European climate thus may show a significantly lower reliability in a
(sub)tropical application.
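The rule of thumb translates into a simple scaling factor. The sketch below is only an illustration of that rule; the 20 °C difference between design and application ambient temperature is a hypothetical example value.

```python
def life_factor(delta_t_celsius, doubling_step=10.0):
    """Lifetime multiplier for a temperature change of delta_t_celsius.

    Rule of thumb from the text: every 10 degrees C lower doubles the
    lifetime, so every 10 degrees C higher halves it.
    """
    return 2.0 ** (-delta_t_celsius / doubling_step)


# Hypothetical case: ambient temperature 20 degrees C above the design value.
print(life_factor(20.0))    # lifetime relative to the design condition
print(life_factor(-10.0))   # 10 degrees C cooler than the design condition
```

Under this rule, a 20 °C higher ambient temperature leaves a quarter of the design lifetime.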
Dupont and Litz [22] reported on an investigation by NAMUR, an international user
association of automation technology in the process industries, on the uncertainty in
SIL proving of safety-related control loops. They compared the bottom-up approach
(calculating the probability of failure on demand (PFD) from the MTTF values of
control loop components like pressure, temperature and level sensors) with industrial
data (more than 12000) from 37 industries. For the former (“typicals”) a confidence
region was estimated from the lowest reference values (best case) to the highest
(worst case) given by the suppliers. The observed data were analysed to yield a 70%
chi-squared PFD confidence interval.
Table 3 shows that the calculated values and those observed in practice differ by
roughly a factor of ten, equivalent to one SIL step.
group          PFD from vendor data            70% conf. interval PFD observed
               best case      worst case       lower          upper
pressure       1.3 * 10^-2    9.3 * 10^-2      3.1 * 10^-3    4.2 * 10^-3
temperature    1.1 * 10^-2    8.5 * 10^-2      1.0 * 10^-3    1.9 * 10^-3
level          1.3 * 10^-2    9.4 * 10^-2      1.1 * 10^-3    1.8 * 10^-3
Table 3. NAMUR PFD values for control loop sensors
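The "one SIL step" can be made explicit by mapping the Table 3 values onto the SIL bands for low-demand operation in IEC 61508 (SIL 1: PFD from 10^-2 to 10^-1, SIL 2: 10^-3 to 10^-2, and so on). The sketch below is an illustration only; the helper name and the data layout are hypothetical.

```python
# Low-demand SIL bands (average PFD) as defined in IEC 61508: (SIL, lower, upper).
SIL_BANDS = [(4, 1e-5, 1e-4), (3, 1e-4, 1e-3), (2, 1e-3, 1e-2), (1, 1e-2, 1e-1)]

# Values from Table 3: vendor-based (best, worst) and observed (lower, upper) PFD.
TABLE_3 = {
    "pressure":    {"vendor": (1.3e-2, 9.3e-2), "observed": (3.1e-3, 4.2e-3)},
    "temperature": {"vendor": (1.1e-2, 8.5e-2), "observed": (1.0e-3, 1.9e-3)},
    "level":       {"vendor": (1.3e-2, 9.4e-2), "observed": (1.1e-3, 1.8e-3)},
}


def sil_for_pfd(pfd):
    """Return the SIL whose band contains pfd, or None if outside all bands."""
    for sil, lower, upper in SIL_BANDS:
        if lower <= pfd < upper:
            return sil
    return None


for group, values in TABLE_3.items():
    vendor_sils = [sil_for_pfd(p) for p in values["vendor"]]
    observed_sils = [sil_for_pfd(p) for p in values["observed"]]
    print(f"{group}: vendor SIL {vendor_sils}, observed SIL {observed_sils}")
```

For all three sensor groups the vendor-based values fall in the SIL 1 band, whereas the observed confidence intervals fall in the SIL 2 band.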
From the above we conclude that:
• Operational aspects like temperature have a profound influence on the
MTTF.
• The lifetime data now being provided by instrumentation companies refer
to risk problems rather than maintenance problems.
• Instrument vendors provide conservative values, using part stress and
system modelling techniques like parts count that are open to strong criticism.
5.3 Pump Reliability
Electrically driven centrifugal pumps make up an important subclass in many industrial
systems, from power stations and the oil and chemical industries to municipal sewage
stations. In most cases, the output of the process is significantly affected if a pump
system fails. Corrective and preventive maintenance costs are in the order of several
thousands of € per annum.
For the sake of this example, a catastrophic failure will occur if either:
• The insulation resistance of the windings of the E-motor fails due to overheating
(hot spots).
• The bearings of the E-motor show excessive vibration requiring shut-down.
• The connecting shaft or elastic coupling fails due to shear or fatigue loading.
• The impeller of the pump fails due to mechanical deformation / abrasion.
• The bearings of the pump fail.
• The pump seals fail, inducing unacceptable, excessive leaking.
• The control system fails, due to either hardware or software problems.
The question now is: can we build a reliable and precise stochastic model of such a
system, such that we can analyse the effects of “bought-in” reliability (investment in
better quality) versus those of maintenance strategies, either to ensure production
(risk) or to optimise costs (life cycle engineering)?
Bearing life is determined by the number of hours it will take for the metal to
"fatigue", which is a function of the load on the bearing, the number of rotations, and
the amount of lubrication that the bearing receives. The reliability of (roller) bearings
is generally expressed by the L10 life: the interval in which 10% of the bearings under
specific test conditions have failed (Lundberg-Palmgren [23]):
$$L_{10} = \left(\frac{C}{P}\right)^{\mathrm{exp}} \cdot \frac{B}{n} \cdot a$$
where:
C = basic load rating, dynamic / static (manufacturer specification)
P = radial load or dynamic equivalent radial load applied on the bearing
exp = 3 for ball bearings, 10/3 for roller bearings
B = factor dependent on the method; B = 1.5 × 10^6 for the Timken method
(3000 hours at 500 rev/min) and 10^6/60 for the ISO method
n = rotational speed in rev/min
a = life adjustment factor; a = 1 when environmental conditions like
temperature are not considered
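The equation is easily evaluated numerically. The sketch below uses the ISO value of B; the load rating, load and speed in the example are hypothetical values chosen for illustration, not data from a bearing catalogue.

```python
def l10_hours(c_rating, load, speed_rpm, bearing="ball", a=1.0):
    """L10 life in hours: (C/P)^exp * B/n * a, with B = 10^6/60 (ISO method)."""
    exponent = 3.0 if bearing == "ball" else 10.0 / 3.0
    b_iso = 1.0e6 / 60.0
    return (c_rating / load) ** exponent * b_iso / speed_rpm * a


# Hypothetical example: C = 30 kN, P = 3 kN, n = 1500 rev/min.
base_life = l10_hours(30.0, 3.0, 1500.0)
print(f"L10 life: {base_life:.0f} h")

# Sensitivity to the load: doubling P for a ball and for a roller bearing.
ball_ratio = base_life / l10_hours(30.0, 6.0, 1500.0)
roller_ratio = (l10_hours(30.0, 3.0, 1500.0, bearing="roller")
                / l10_hours(30.0, 6.0, 1500.0, bearing="roller"))
print(f"life reduction when the load doubles: ball {ball_ratio:.1f}x, "
      f"roller {roller_ratio:.1f}x")
```

The strong dependence on the ratio C/P means that uncertainty in either the load rating or the actual load dominates the result.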
The radial rating of the bearing C depends strongly on the type and homogeneity of
the bearing material and on the greasing effectiveness. The designer selects a
bearing depending on the required L10 life, the expected load and the dimensional
possibilities. Note that, according to the above equation:
• The relationship is based on models, the parameters of which (C, P) are rather
uncertain. The basic load rating stated in the supplier's catalogue is subject to
manufacturing quality; the design load is an average value over expected
operational use.
• A change in (radial) load by a factor of two changes the L10 life by a factor of 8 –
10.
• Fatigue is physically related to a number of loading cycles, in the case of the bearing
represented by revolutions where the load passes from the inner race through the
balls (or rollers) to the outer race. The conversion from rotations per unit of time to
a lifetime expressed in years requires a specification of the usage factor.
• In this simple form, the constant a embraces all “adjustment factors”. For
instance, the viscosity of the bearing oil versus the reference value has a
quadratic influence [24].
Whereas rotating equipment designers use the L10 value as a design parameter,
reliability engineers need failure distribution type information like Weibull parameters
to evaluate system and maintenance characteristics. Bearing suppliers like SKF⁸ do
not provide Weibull values.
⁸ Even supported by the Dutch Business Development Manager of SKF, the company did not respond
to questions on this point.
Figure 5. Q-H diagram
Few studies are available with data sets sufficiently large to make Weibull analysis
feasible. Lieblein & Zelen [25] describe tests to failure carried out over a period of
many years by four major ball bearing manufacturers on endurance tests of bearings,
properly taking into account (censored) lifetimes that exceeded the test period. The
report contains 213 data values for the Weibull slope β. The authors observe
a large spread, from a minimum value of 0.54 to a maximum value of 4.44! The
statistical average is about 1.4. Further refinements were made by Harris [26], taking
into account the existence of a fatigue limit stress: if an operating bearing
experiences stresses that do not exceed the limit stress, the bearing can achieve
infinite life. Rotating equipment engineers now commonly assume that the MTTF of a
bearing is equal to 5 times the L10 value, which is in line with a beta slope of 1.4.
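The link between this rule of thumb and the Weibull slope can be verified directly. For a two-parameter Weibull distribution, MTTF = η Γ(1 + 1/β) and L10 = η (−ln 0.9)^(1/β), so their ratio depends on β only. The sketch below assumes such a two-parameter distribution without a failure-free period.

```python
import math


def mttf_over_l10(beta):
    """Ratio MTTF / L10 for a two-parameter Weibull distribution."""
    mttf = math.gamma(1.0 + 1.0 / beta)        # in units of eta
    l10 = (-math.log(0.9)) ** (1.0 / beta)     # in units of eta
    return mttf / l10


# Minimum, average and maximum slope reported by Lieblein & Zelen.
for beta in (0.54, 1.4, 4.44):
    print(f"beta = {beta}: MTTF / L10 = {mttf_over_l10(beta):.1f}")
```

For β = 1.4 the ratio is about 4.5, close to the factor of 5 used in practice. Over the reported range of slopes, however, the ratio varies from below 2 to well above 50, which shows how much the rule depends on the assumed β.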
As crude as it may appear, the Lundberg-Palmgren equation provides insight into the
consequences of frequently mentioned reasons for early bearing failure:
• Misalignment and machinery vibration cause high additional stresses (P).
• An unbalance of 35 g on a 15 cm rotor running at 2000 rpm changes the L10 from
11.5 to 8 years [27].
• High bearing temperature changes material characteristics (the basic load rating
C).
• Changes in lubrication (temperature, water content, fouling) affect the adjustment
factor a in the Lundberg-Palmgren equation.
With such sensitivities to operational conditions, the scatter in the Lieblein & Zelen
data is understandable.
Figure 6. Cavitation damage
When the bearing is installed in a pump, other factors that influence reliability will
show up. A (centrifugal) pump converts rotational energy from the driving motor into
hydraulic pressure. As with all physical processes, this conversion has an efficiency of
less than 100%, which is a function of the operating point on the volumetric displacement
versus head (Q-H) curve (Figure 5).
If a pump is operated away from the best efficiency point (BEP), the imperfect
hydrodynamic regime (internal recirculation within the pump) creates a number of
undesired phenomena, like increases in vibration, temperature rise, shaft deflection
and a reduced NPSHA⁹ margin. Local pressure pulsations and subsequent radial
shaft deflection [28] will dynamically increase the load on the bearings, thereby
affecting their life.
The impeller is another critical element in the pumping system with failure
characteristics that are difficult to define and estimate. Erosion is one kind of wear,
depending on solids loading, fluid velocity and impeller material characteristics.
Cavitation, the “process of formation and disappearance of the vapour phase of a
liquid when it is subjected to reduced and subsequently increased pressures at
constant ambient temperatures” is the most serious reason for impeller failure [29].
Cavitation will lead to serious erosion (Figure 6) if the cavitation bubbles implode
near the surface of the vanes and their energy exceeds the resistance of the impeller
material.
Industrial pumps invariably show some form of cavitation in their range of operation,
quote: “with effects, which can be tolerated (millions of field pumps with satisfactory
and long services) or may cause problems (hundreds or thousands of field pump
troubles clearly diagnosed with cavitation as root cause (high speed, wide range of
operation))” [29]. Empirical relationships now exist between cavitation intensity
(bubble length) and erosion rate, containing such parameters as NPSH, saturation
pressure and gas loading. Together with rules for impeller design, visualisation
experiments and CFD calculations “it is possible to assess the impeller life
expectancy, with a good probability of success”. Obviously, such studies are not
standard parts of design and maintenance, and reliable data on impeller life
characteristics thus are hard to obtain.
The characteristics of seal failure are to some extent comparable with those of
bearings. The seal face is designed as a sacrificial element that may wear out.
However, this accounts for only some 10% of seal system failures [30]. In the
majority of cases [31], the seal will fail due to wrong operation (40%), wrong
installation (24%), bad application (19%) and other (17%) reasons. The seal faces
are easily affected by lack of control of barrier fluid pressure, showing up in
ill-designed or ill-controlled seal systems, especially during pump starts [32]. A Weibull
failure analysis then will reveal an almost purely random character [4].
⁹ Net positive suction head available.
Electrical motors nowadays are quite reliable; they mainly fail due to bad electric
connections, bearing problems and winding insulation problems (hot spots).
Windings have rather well-defined failure characteristics; Bloch and Geitner [33]
quote a value of 4.3 for beta and an eta of 18.5 y, which since the time of publication
(1990) has become higher due to better quality insulation materials. However, “Every
increase of 10 degrees Centigrade of a motor's windings above its design operating
temperature cuts the life of the motor's winding insulation by 50 percent, even if the
overheating was only temporary” [34]. The current drive for high efficiency (among other
things with smaller air gaps in the design) makes winding life more susceptible to changes
in ambient temperature and, especially, to fouling which, in turn, leads to bad cooling.
Higher motor temperatures in turn affect bearing temperature and thus lifetime, and
lead to premature degradation of the grease, requiring shorter re-greasing intervals.
Another aspect of bearing failure in E-motors is associated with stray electric current:
“The risk for damage to bearings can occur when transient voltages exceed 0.5 V or
the electric current flow is more than 0.1 A/mm2 (related to the contact area of rolling
elements)¹⁰.” If no special electrically-insulated bearings are used, the useful life may
be restricted to 1 – 2 years.
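The combined effect of the quoted Weibull parameters and the 10-degree rule on winding survival can be sketched as follows. Two assumptions are made that the sources do not state: the overtemperature is taken as sustained over the whole period, and the rule is applied by scaling η while leaving β unchanged.

```python
import math

BETA = 4.3        # Weibull shape quoted from Bloch and Geitner
ETA_YEARS = 18.5  # Weibull scale in years quoted from Bloch and Geitner


def winding_survival(years, overtemperature_c=0.0):
    """Probability that the winding insulation survives the given period.

    Assumption: every 10 degrees C of sustained overtemperature halves
    eta, and beta is left unchanged.
    """
    eta = ETA_YEARS * 0.5 ** (overtemperature_c / 10.0)
    return math.exp(-((years / eta) ** BETA))


print(f"10-year survival at design temperature: {winding_survival(10.0):.2f}")
print(f"10-year survival at +10 degrees C:      {winding_survival(10.0, 10.0):.2f}")
```

Under these assumptions the 10-year survival probability drops from more than 90% to roughly 25% for a sustained 10 °C overtemperature, which illustrates why fouling and bad cooling matter so much.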
From the above we conclude that:
• Although reported data on overall pump reliability may well be used in risk
studies (“what is the probability that the pump under normal conditions of
maintenance and operation will deliver x m3/h on demand”), we should
realise that the underlying field data are strongly influenced by operational
and maintenance procedures.
• Pump reliability is based on the failure behaviour of a number of critical
components. We showed the physical processes leading to their failure to be
rather ill-defined and susceptible to operational and maintenance regimes.
Databases at best give the proportion of these failures in the overall figure;
in-house CMMSs have only recently acquired scope for recording at this level.
• Even for well-specified components like bearings, where designers make a
selection based on vendor-specified criteria, identical components in
different operational conditions (in a pump, in an electric motor) will show
different failure rates.
• The required information for maintenance optimisation models in terms of
Weibull parameters thus necessarily depends on engineering insight, and
the values are bound to be uncertain and imprecise.
¹⁰ http://evolution.skf.com/zino.aspx?articleID=79&lan=en-gb
[Figure 7, sequence of steps: Scouting phase, BOD phase, BDP phase, PS phase, final investment
decision, Implementation phase, Commissioning, start-up & operation]
Figure 7. Typical steps in process design
6. Dealing with RAMS in Design
In the petrochemical industry one may distinguish [35] five project phases from
concept development to realisation (Figure 7). The first two or three steps are usually
mainly executed by the process owner, keeping commercial information on market
penetration / expansion in-house.
In the Scouting phase different process options are generated and the most
promising one is selected based on the outcome of technical and economic
evaluations. In the Basis Of Design (BOD) phase, the selected option is developed in
more detail, resulting in a description of the intended facilities, which is sufficiently
detailed to produce a +/- 30% estimate of the investment required to develop and
build the installation. This yields another decision point to go forward or abandon the
project. The Basic Design Package (BDP) allows for a +/- 20% capital cost estimate
and is sufficiently detailed to hand over the project to a contractor to produce a
project specification.
In the Project Specification (PS) phase, the selected contractor develops the BDP
into a design specification with a +/- 10% cost estimate. This serves as the final
information on which the future owner decides on freeing up capital to build the
installation; the final investment decision (FID). Contractors subsequently develop
the PS into a detailed design, procure equipment and materials, and construct the
installation. With a commissioning test the contractor will hand over the installation to
the process owner.
Figure 8. Degrees of freedom versus costs in process design (vertical axis 0% to 100%; curves: degree of freedom, relative volume of cost involved; horizontal axis: Scouting, BOD/BDP, PS, Implementation).
Throughout this process (Figure 8) more and more degrees of freedom become
frozen whilst capital expenditure increases sharply in the last phases. Note that
money spent before the FID is taken is considered “risk money”, in that the
investment will be lost if the FID is negative. It is common practice that one in five to
ten basic designs do not lead to a positive FID. Senior management thus may be
reluctant to risk money up-front and, although it is of utmost importance to start early,
RAMS optimisation has to be effected with limited costs. Most decisions on plant
layout and redundancy (significantly affecting future OEE) will have to be taken without
specific knowledge on type and brand of plant equipment. The final selection of the
latter will take place later on in the project specification phase and will not only
depend on RAMS characteristics but also on the local situation: the availability of
OEM support and spare parts, the relation with already existing processes, the skill
level and labour costs of craftsmen, etcetera. Hence, RAMS aspects will start as a
broad brush approach, becoming more elaborate the further investment capital is
assigned and the design is specified.
In the Scouting phase different process options are generated and the most
promising system will be selected on the basis of technical and economic
evaluations. The design staff will base their decisions on assumptions with respect to
the performance of the new installation, for instance they may intend to go for
“pacesetting performance”. Such a goal will broadly be described in a form of overall
failure propensity (virtually no stops between shutdowns, minimal maintenance
costs, maximum x hours turn-around workload). Although rather vague, these
assumptions determine the economic and technical evaluation at the end of the
Scouting Phase, and with that the decision whether or not to progress into the BOD
phase. There, assumptions will increasingly get the character of requirements that
have to be realistic, and for which the (economic, safety, environmental)
consequences have to be substantiated as far as possible. It is dangerous to rely only
on benchmarking data for the type of unit / equipment under consideration. Local
factors like climate, labour situation and logistics may significantly affect potential
RAMS goals.
At the end of this phase the design staff will have insight on:
• Economic value of product, investment and overall operational costs.
• Time-average actual versus nameplate capacity.
• Potential capacity bottlenecks.
• System failures between shutdowns causing production loss or SHE problems.
• Expected planned shutdown time.
In reliability engineering terms this means that an equipment representation with
capacity, one overall random failure mode and downtime plus planned maintenance
at the chosen interval is all that is available to build a coarse availability block
diagram, for instance for Pareto and criticality analysis.
In the Basis Of Design (BOD) phase, the selected option is developed in more detail,
resulting in a description of the intended facilities, which is sufficiently detailed to
produce a +/- 30% estimate of the investment required to develop and build the
installation. The BOD will contain a list of requirements which the design must fulfil in
order for the project to be a candidate for further study, such as annual capacity and
stream days.
A problem with these first two steps is that, in spite of the lack of detail, the concept
of a new installation gradually takes shape and usually is difficult to change later on.
Given the competition between new projects to get the FID and the high net present
value discounting rates used (10 – 15%), there is a strong pressure on capital costs.
Operational aspects like (fluctuating) marketing demand, plant capacity / flexibility,
product availability, maintenance then easily get less attention.
The Basic Design Package may be regarded as an extension of the BOD with a goal
to arrive at a +/- 20% capital cost estimate. The resulting documents will allow
handover of goals and means for contractors to produce a project specification.
Aspects like plant lay-out and line-up, materials selection and sparing of (rotating)
equipment now have to be analysed.
The engineering decisions in this phase directly affect the future RAMS aspects and,
given the hand-over to external contractors, now almost become fixed. At present,
these decisions are based mainly on experience and insight but the lack of
quantification easily leads to sub-optimal designs, especially with today's highly
integrated installations with restricted redundancy and buffering (to reduce
capital investment).
Since the process flow scheme will now be available, this is the stage to start basic
RAMS modelling at a proper level of detail. No detailed selection of equipment can
yet be made, the design staff thus will have to deal with generic models. The
company experience in terms of planned maintenance and the still remaining
observed equipment failures within PM periods now is of great value.
Industry-specific databases like OREDA [36] in the oil industry and EIReDA [37] in the power
industry may also provide benchmark data. The availability block diagram of the new
installation will now show “typical” equipment blocks with a few failure modes that
describe the availability restriction due to planned maintenance (total operating time
minus PM) and the random availability characteristics in between.
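As a rough illustration of such a coarse availability block diagram, the sketch below multiplies a planned-maintenance factor with the random availability of a few generic blocks in series. The block names and all numbers are hypothetical, and both the series structure and the independence between blocks are assumptions made here for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A 'typical' equipment block with one overall random failure mode."""
    name: str
    mtbf_days: float      # mean time between random failures (hypothetical)
    downtime_days: float  # mean downtime per failure (hypothetical)

    def random_availability(self) -> float:
        # Steady-state availability between planned stops.
        return self.mtbf_days / (self.mtbf_days + self.downtime_days)


def series_availability(blocks, pm_days_per_year: float) -> float:
    """Planned-maintenance factor times the product of block availabilities.

    Assumes all blocks are in series and fail independently.
    """
    planned_factor = 1.0 - pm_days_per_year / 365.0
    random_factor = 1.0
    for block in blocks:
        random_factor *= block.random_availability()
    return planned_factor * random_factor


if __name__ == "__main__":
    # Hypothetical generic blocks, for illustration only.
    plant = [
        Block("feed pump", mtbf_days=700.0, downtime_days=3.0),
        Block("compressor", mtbf_days=1000.0, downtime_days=7.0),
        Block("reactor", mtbf_days=2000.0, downtime_days=10.0),
    ]
    print(f"availability: {series_availability(plant, pm_days_per_year=7.0):.3f}")
```

A model at this level cannot say much about individual components, but it is enough to rank blocks by their contribution to unavailability.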
In the Project Specification (PS) phase, the selected contractor develops the BDP
into a specification, which together with a +/- 10% cost estimate is used as a basis
for the Final Investment Decision (FID). In order to arrive at such rather precise cost
estimates a more detailed equipment description is required. Options are still open:
will the plant be laid out to allow replacement at system, skid or component level? To
what extent is redundancy cost effective? What service contracts are available locally
for key equipment? How does this work out in terms of local staff level,
workshops, hoisting, lay-down areas, …? Gradually the effective (major) equipment
availability characteristics will now take shape, either by engineering sense or by
(simplified) quantitative availability modelling. These activities will form the basis for a
maintenance reference plan.
During the Implementation (also known as Engineering, Procurement and
Construction, EPC) phase of the project, the detailed design is produced, materials
and equipment are procured, and facilities are constructed. In this phase the
maintenance reference plan starts to develop, based largely on experience and
vendor recommendations, and the spare parts requirements are determined.
Thus, in a capital intensive industry like petrochemical, where system
downtime has economic consequences of tens to hundreds of thousands of € a
day, the information required for quantified decision support for maintenance
becomes available mainly after all major system decisions (and thereby,
system optimality) are frozen.
More detailed maintenance and inspection schemes have to be developed
during the operational phase of the installation based on observed behaviour
and growing insight.
Equipment manufacturers play a limited role only; they only become involved
in the design process at a late stage in a commercial, rather than a technical
context. In cases where they possess actual operational RAMS information
this is of a commercially confidential nature in view of marketing positioning
and future servicing support.
These observations are rather common. In (naval) ship building [38] RAMS aspects
are just being introduced. Whereas the ship's propulsion system is of vital interest,
quantification and substantiation of (RAMS, LCC) characteristics of supplier products
is lacking to a great extent. OEMs only receive information during the warranty
period and, if they are awarded a service contract, during major overhauls. Normally,
they lack information on the way their products have been used in the periods in
between. Information exchange between the OEM and the yard therefore is carefully
handled in the purchasing phase for liability reasons and afterwards regarded as
commercially valuable in-house data.
The tight financial structure required for construction of commercial vessels, together
with the decisive role of classifying agencies freezes the degrees of freedom in
design at a very early stage. In cases where tax rules favour short term operational
periods (after which the vessel is sold to a third party), lifecycle costing receives
minor interest.
In the utility sector risk studies are obligatory for objects like tunnels, bridges and
waterworks. The Dutch ministry of Transport, Public Works and Water Management
now tries to extend these studies towards validation with maintenance strategies and
their results[39]. Public-private partnerships now become the rule for large projects
where the contractor is obliged to provide a public service or project and to assume
substantial financial, technical and operational risk over timescales from 10 – 30
years. Here, we observe a growing interest in RAMS / LCC aspects.
7. The Role of the Reliability Engineering Community
Most textbooks and (scientific) applications hardly touch the industrial and data
problems sketched above. Mathematically oriented textbooks like Barlow and
Proschan [8] or Lewis [40] pass over all details, like almost all OR-type papers. More
practically oriented textbooks touch upon the problem but mainly in qualitative broad
brush forms, for example:
• Bob Abernethy [41] (selling more than 20000 copies of this book) advocates
Weibull analysis for its “ability to provide reasonably accurate failure analysis and
failure forecast with extremely small samples (2-3 points!)”, correctly deals with
uncertainty and confidence intervals, but “does not recommend” the latter “for use
in presentations as they are often misinterpreted and misapplied”. The
consequences of parameter uncertainty remain undiscussed.
• Heinz Bloch [33] underscores the need for field data to “assess machinery reliability
… throughout the lifecycle of the equipment, whenever we are faced with the
prospects or consequences of poor machinery reliability”, and introduces “reliability
factors” that should be treated with “common sense” [42].
• Andrew Jardine [43] is convinced of “keen interest in evidence-based
maintenance decisions, rather than the use of gut feeling or indiscriminately
following the manufacturer's recommendations” from “asset managers .. not
knowing the data-mining techniques to extract useful knowledge from CMMS
data”, only to continue with standard Weibull analysis. On the other hand, Jardine
achieves considerable successes with his (commercial) work on condition
monitoring (OMDEC®).
• Kumar [20] is rather sceptical: “Often statistics are used as a drunken man uses
lamp posts .. for support rather than illumination”. He touches upon the various ways
components may fail and the limited number of data, only to continue uncritically with
(rather outdated graphical!) Weibull analysis.
• An interesting evolution of thoughts, albeit in reliability prediction of manufactured
goods rather than in maintenance, can be observed in the successive editions of
O'Connor's book [44]. Whereas the first edition closely followed the philosophy
behind MIL-HDBK-217, consecutive editions criticise increasingly strongly “the
naïve presentations of reliability predictions (having) done much to undermine the
credibility of reliability engineering” (3rd edition). In his opinion, “when people
involved in reliability work manage to unshackle themselves from the tyranny of
‘the numbers game’, the way is cleared for .. practical engineering and
management approaches”.
• Narayan [45], as a practical engineer with substantial experience in data analysis,
points to the problem of scarcity of data: “we will all we can do to prevent failures
of critical equipment… we cannot collect enough failure data to improve the
preventive maintenance plan”, to follow thereafter with a more qualitative,
technical approach.
ESReDA, as the European expert group in this area, has devoted considerable
attention to the data subject with various working groups and seminars such as:
• Equipment ageing and maintenance (1992)
• Quality of reliability data (1994)
• Reliability Data Analysis and Use (1995)
• Rotating machinery performance (1996)
• Operation Feedback Data & Knowledge Management for New Design (2000)
• Lifetime Management (2001)
• Maintenance Management & Optimization (2002)
• Assembling evidence of reliability (2004)
• Maintenance Modelling and Applications (2007)
• Uncertainty in Industrial Practice (2008)
• Supporting Technologies for Advanced Maintenance Information Management
(2008)
• Asset optimization and maintainability: Challenges in the new world order (2009)
Data quality problems were clearly addressed in two handbooks [46], [47], but mainly
covering the problems of operational feedback data (times to failure) collection,
validation, storage, analysis and retrieval.
The above suggests that the reliability engineering community is aware of the
data quality problems that lead to epistemic uncertainty in Weibull parameters
and, in turn, in results of decision support models. However, it appears that
these limitations are not openly expressed.
In addition, the habit of presenting results with a large number of (superfluous)
decimal places may well be misinterpreted by law makers, standardising
committees and practical maintenance engineers without deep knowledge on
these aspects.
7.1 Consequences for Decision Support Tools
Example 1
Figure 9. PM optimisation (costs from 0 to 10000 versus PM interval in years from 0 to 20; curves: total costs, PM costs, failure costs)
Let us consider a plant that produces 100 000 tons of product per year with a margin
of € 75 per ton. The economic lifetime of the plant is 20 years. Every four years a
statutory shutdown provides room for executing planned maintenance on critical
equipment.
Suppose now that we are dealing with a critical component with one failure mode
only. In the MRP it was assumed that the MTTF was 7 years with appreciable wear-out
characteristics. The team took it for granted that the component could be
replaced safely (see footnote 11) during shutdowns.
Later analysis showed that the failure process could be well represented by a
Weibull distribution with η = 8 years and β = 4 (see footnote 12). If failed in-between statutory stops,
the process will be down for 3 days causing a loss of margin of some € 62000.
11 Engineers frequently take the MTTF value for a kind of expression of useful life!
12 Obviously, this is a simplified problem with an unrealistically high beta value for didactical reasons; in
practice betas even at component level will be less than 2 to 3.
Repair takes 50 man-hours @ €60 / h together with € 5000 on materials. Hence, the
total cost of failure equals € 70 000. Replacement during a statutory shut-down does
not lead to additional production loss such that total costs are 40 man-hours @ € 60
= € 2400 and € 5000 materials; in total € 7400.
The calculated optimum replacement interval is 3.5 years at an average cost of
€ 2787 per year.
Figure 9 shows the traditional graph, indicating that planned replacement each 3-4
years yields total maintenance costs that are about 30% of a run-down strategy.
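The calculation behind such a graph can be sketched as follows. The text does not state which replacement formula was used; the sketch assumes the classical age-replacement cost rate, C(T) = [Cp R(T) + Cf (1 - R(T))] / ∫ R(t) dt over 0..T, with the Weibull parameters and cost figures of this example. Function names are illustrative only. Under that assumption the minimum should land close to, though not necessarily exactly on, the quoted interval and cost.

```python
import math

ETA, BETA = 8.0, 4.0      # Weibull scale (years) and shape, from the example
COST_PLANNED = 7400.0     # replacement during a statutory shutdown, EUR
COST_FAILURE = 70000.0    # failure between shutdowns incl. lost margin, EUR


def reliability(t, eta=ETA, beta=BETA):
    """Weibull survival function R(t)."""
    return math.exp(-((t / eta) ** beta))


def mean_cycle_length(T, eta=ETA, beta=BETA, steps=2000):
    """Integral of R(t) from 0 to T (trapezoidal rule)."""
    h = T / steps
    total = 0.5 * (1.0 + reliability(T, eta, beta))
    total += sum(reliability(i * h, eta, beta) for i in range(1, steps))
    return total * h


def cost_rate(T, eta=ETA, beta=BETA):
    """Long-run average cost per year when replacing at age T (assumed model)."""
    r = reliability(T, eta, beta)
    expected_cost = COST_PLANNED * r + COST_FAILURE * (1.0 - r)
    return expected_cost / mean_cycle_length(T, eta, beta)


def optimum(eta=ETA, beta=BETA):
    """Grid search over replacement ages from 0.5 to 20 years in 0.1 y steps."""
    grid = [0.1 * k for k in range(5, 201)]
    best = min(grid, key=lambda T: cost_rate(T, eta, beta))
    return best, cost_rate(best, eta, beta)


def run_to_failure_rate(eta=ETA, beta=BETA):
    """Average cost per year without planned replacement: Cf / MTTF."""
    mttf = eta * math.gamma(1.0 + 1.0 / beta)
    return COST_FAILURE / mttf


if __name__ == "__main__":
    T_opt, c_opt = optimum()
    print(f"optimum interval {T_opt:.1f} y, cost {c_opt:.0f} EUR/y")
    print(f"fraction of run-to-failure cost: {c_opt / run_to_failure_rate():.2f}")
```

Note that the long-run average used here is exactly the infinite-horizon criterion, which is rarely the way a maintenance engineer reasons about a plant with a 20-year economic life.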
Figure 10. Risk profile
Such a graph provides important information for the maintenance engineer on
the cost effectiveness of planned replacement. However, if followed without
criticism, the optimal interval will be fed into the planning system and
consequently an execution date will be scheduled. If the replacement cannot
be carried out in the scheduled interval, it will automatically affect a major Key
Performance Indicator (KPI), that of “percentage of planned activities
completed in time”. I have observed cases where the operational maintenance
manager spent extra effort in order not to be blamed for such delay!
More work is required to substantiate the proposal:
1. The underlying model is based on replacement theory; hence, it optimises in fact
over an infinite time horizon. With at most 5 replacements here,
engineers will reason in terms of the risks of unexpected stops in the
interval between shutdowns, rather than on long term average behaviour.
2. The Weibull parameters are uncertain; at best they can be trusted within, say, +/-
20 %.
3. A similar uncertainty applies to the cost aspects: can repair / replacement be
carried out in the assumed (down-) time interval; are cost figures in the future (up
to 20 – 40 years) still valid?
To provide insight, we simulated a large number of series of 5 replacement cycles. The
analysis (Figure 10) shows that in about 84 % of all cases the process lifetime may be
reached without having to carry out corrective maintenance in-between stops; with
15 % chance, we will meet 1; with 1% two unplanned interventions, etcetera.
This leads to the costs per 5 cycles as depicted in Figure 11 with 84% probability 5 *
7400 = 37000, and so on.
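A simulation of this kind can be sketched in a few lines. The sketch assumes that the component is replaced preventively every 3.5 years (the calculated optimum), that a failed component is replaced by a new one on failure, and that the planned replacement at the end of each interval still takes place; none of these details are spelled out in the text, and the function names are illustrative.

```python
import random
from collections import Counter

ETA, BETA = 8.0, 4.0           # Weibull scale (years) and shape, from the example
INTERVAL = 3.5                 # assumed planned replacement interval, years
CYCLES = 5                     # planned replacements per series
COST_PLANNED, COST_FAILURE = 7400.0, 70000.0


def failures_in_series(rng):
    """Count unplanned failures over one series of CYCLES intervals."""
    failures = 0
    for _ in range(CYCLES):
        # New component at the start of every interval.
        elapsed = rng.weibullvariate(ETA, BETA)   # (scale, shape)
        while elapsed < INTERVAL:
            failures += 1
            # Corrective replacement by a new component, then carry on.
            elapsed += rng.weibullvariate(ETA, BETA)
    return failures


def risk_profile(n_series=100_000, seed=1):
    """Relative frequency of 0, 1, 2, ... unplanned failures per series."""
    rng = random.Random(seed)
    counts = Counter(failures_in_series(rng) for _ in range(n_series))
    return {k: counts[k] / n_series for k in sorted(counts)}


def series_cost(failures):
    """Cost of one series: planned replacements plus failure costs."""
    return CYCLES * COST_PLANNED + failures * COST_FAILURE


if __name__ == "__main__":
    for k, p in risk_profile().items():
        print(f"{k} unplanned: {p:6.1%}  cost {series_cost(k):8.0f} EUR")
```

Presenting the outcome as a distribution over a handful of cycles, rather than as one long-run average, is closer to the question the maintenance manager actually asks.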
Figure 11. Simulation results “Costs”
For the timing of the planned interventions, the maintenance manager should take
into account the (epistemic) uncertainty in the Weibull parameters, which is normally
lacking. To get an impression, we generated ten series of ten realisations from a
Weibull process with “known” parameters; then estimated these parameters with the
maximum likelihood method. This may be compared with the observations to be
expected from ten components of identical make used under identical conditions; a
hypothetical situation that is positive from a reliability engineering view compared
with the actual situation where both manufacturing variability and operational
conditions will play a role. This exercise teaches us that for critical components the
accuracy of the Weibull parameters is quite restricted. We may safely assume that
the (epistemic) uncertainty will be some +/- 20 %.
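The exercise can be repeated with the following sketch, which draws ten series of ten lifetimes from a Weibull process with η = 8 and β = 4 and estimates the parameters of each series by maximum likelihood. The estimator solves the standard likelihood equation for β by bisection; the seed, the bracket for β and the function names are choices made here for illustration, not taken from the original study.

```python
import math
import random


def weibull_mle(times):
    """Maximum likelihood estimates (eta, beta) for complete Weibull data.

    Solves sum(t^b ln t) / sum(t^b) - 1/b - mean(ln t) = 0 for b by bisection;
    the left-hand side is increasing in b.
    """
    n = len(times)
    logs = [math.log(t) for t in times]
    mean_log = sum(logs) / n

    def score(beta):
        powered = [t ** beta for t in times]
        weighted = sum(p * l for p, l in zip(powered, logs))
        return weighted / sum(powered) - 1.0 / beta - mean_log

    lo, hi = 0.01, 50.0          # assumed bracket for the shape parameter
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = (sum(t ** beta for t in times) / n) ** (1.0 / beta)
    return eta, beta


def ten_series(eta=8.0, beta=4.0, n_series=10, n_obs=10, seed=2024):
    """Estimate the parameters from several small samples of 'known' origin."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_series):
        sample = [rng.weibullvariate(eta, beta) for _ in range(n_obs)]
        estimates.append(weibull_mle(sample))
    return estimates


if __name__ == "__main__":
    for eta_hat, beta_hat in ten_series():
        print(f"eta = {eta_hat:5.2f}  beta = {beta_hat:5.2f}")
```

Running it with different seeds gives a feel for the spread that even ten clean, complete observations leave in the estimates.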
Most of the standard software packages for maintenance optimisations of this type
have no facilities to account for this epistemic uncertainty. Although theory teaches
us that the optimum will rely on the marginal costs between planned and corrective
maintenance at a specific fraction of the lifetime (being a function of Weibull η and
β), the output will be visualised in calendar time. Figure 12 shows that this easily
may lead to a misconception. If one takes into account the parameter uncertainty,
the optimum replacement interval will lie between 2.9 and 4.3 years. Realising that
the (future) cost figures are also uncertain, a realistic advice is that replacement can
safely be carried out somewhere between 2 and 6 years (the dotted rectangle in
Figure 12)! Obviously, in a more realistic situation dealing with multiple components
to be replaced at the same time, this range may even be larger.
Figure 12. Sensitivity of optimal replacement interval to +/- 20% spread in Weibull parameters (overall annual costs in € versus replacement interval in years; curves for η 8, β 4; η 9.6, β 4; η 6.4, β 4; η 8, β 3.2; η 8, β 4.8)

η     β    minimum at (y)
6.4   4    2.9
8     4    3.5
9.6   4    4.3
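The sensitivity shown in Figure 12 can be explored with a short sketch that repeats the optimisation for each parameter set. As before, the age-replacement cost rate is an assumption (the text does not name the formula used), and the function name is illustrative. Under that model the optimum scales in proportion to η, which is why a +/- 20% spread in η translates directly into a +/- 20% spread in the recommended interval.

```python
import math

COST_PLANNED, COST_FAILURE = 7400.0, 70000.0   # EUR, from Example 1


def optimal_interval(eta, beta, horizon=20.0, h=0.01):
    """Scan replacement ages up to `horizon` years; return (age, cost per year).

    Uses the age-replacement cost rate (assumed model) and accumulates the
    integral of R(t) with the trapezoidal rule while scanning.
    """
    integral, prev_r = 0.0, 1.0
    best_t, best_rate = None, float("inf")
    for i in range(1, int(round(horizon / h)) + 1):
        t = i * h
        r = math.exp(-((t / eta) ** beta))
        integral += 0.5 * h * (prev_r + r)
        prev_r = r
        rate = (COST_PLANNED * r + COST_FAILURE * (1.0 - r)) / integral
        if rate < best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate


if __name__ == "__main__":
    # Nominal case and the +/- 20% variations shown in Figure 12.
    cases = [(8.0, 4.0), (9.6, 4.0), (6.4, 4.0), (8.0, 3.2), (8.0, 4.8)]
    for eta, beta in cases:
        t_opt, rate = optimal_interval(eta, beta)
        print(f"eta {eta:4.1f}  beta {beta:3.1f}  optimum {t_opt:4.2f} y  {rate:6.0f} EUR/y")
```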
This example demonstrates a fundamental difference between reliability
engineering theory and practical application. In fact, theory mainly shows us
that planned replacement is a more cost-effective strategy than corrective
only. As reliability engineers we can also indicate the risks the maintenance
manager runs with critical items. However, the inherent uncertainties in
Weibull (especially the η) parameters restrict our capabilities to calculate a
proper timing to be used in maintenance planning and scheduling.
Example 2
Figure 13. From IEC 61511: Process Risk to Tolerable Risk Target
Many industries nowadays are faced with the consequences of risk-based safety
systems in line with IEC 61508, 61511 where the inherent risk of a hazardous
system unit (the equipment under control, EUC) has to be reduced by a safeguarding
system to a tolerable level accepted by authorities (Figure 13). To this end, a safety
instrumented system (SIS) will monitor a critical value (pressure, level, temperature,
..) and act (open a valve, apply cooling, ..) to reduce the consequences of
exceeding it. Hence, a SIS loop will at least contain a sensor, a (logic) controller and
an actuator.
Figure 14. SIL calculation (PFD from 0 to 2.0E-02 versus time in months from 0 to 24; curve: SIL calculation)
Let the required SIL level of the system unit be 2; a reduction factor of 100. The
norm then prescribes the probability of failure on demand of the serial system to be
between 10^-3 and 10^-2. Let the MTTF of all three elements be 150 y. In order to
comply with the IEC rules (see footnote 13) the system has to be tested at such a test interval Ti, that
the resulting probability of failure on demand (PFD) is in the bracket 10^-3 – 10^-2:
\[ \mathrm{PFD} \approx \tfrac{1}{2}\,\lambda_{series}\,T_i \]
Figure 14 then shows that the system should be tested every 12 months. Most
maintenance engineers will indiscriminately accept this value, although those trained
in reliability engineering principles may have some hesitation:
1. The PFD value thus calculated is an average value over the time interval Ti
2. Testing, especially if intrusive actions need to be taken, carries the potential of
maintenance induced failures; a mechanic may overlook or induce a fault upon
which the system will remain unavailable until the next test.
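The arithmetic behind the 12-month figure is short enough to write out. The sketch below uses the simplified average formula PFD ≈ ½ λ_series Ti with three elements of 150 years MTTF in series; the function names are illustrative.

```python
MTTF_YEARS = 150.0     # MTTF of each element (sensor, logic controller, actuator)
N_ELEMENTS = 3         # elements in series in the SIS loop

# Failure rate of the series system, per year.
LAMBDA_SERIES = N_ELEMENTS / MTTF_YEARS


def pfd_average(test_interval_years, lam=LAMBDA_SERIES):
    """Average probability of failure on demand over one test interval."""
    return 0.5 * lam * test_interval_years


def max_test_interval(pfd_target, lam=LAMBDA_SERIES):
    """Longest test interval (years) that keeps the average PFD at the target."""
    return 2.0 * pfd_target / lam


if __name__ == "__main__":
    ti = max_test_interval(1e-2)
    print(f"lambda_series = {LAMBDA_SERIES:.3f} per year")
    print(f"test interval for average PFD of 1e-2: {12 * ti:.0f} months")
    print(f"average PFD at that interval: {pfd_average(ti):.4f}")
```

The result sits exactly on the upper boundary of the SIL 2 bracket, which is one reason for the hesitations listed above.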
13 For the sake of simplicity we assume here that these lambdas are precise and of the hidden unsafe
type and leave out further details like nuisance failures, diagnostic coverage, common mode effects
etcetera.
Figure 15. Time dependent behaviour (PFD from 0 to 2.0E-02 versus time in months from 0 to 24; curves: time dependent, norm)
Time dependent behaviour is not addressed in the norms but has a significant
influence[48]. As Figure 15 shows, the choice of a test interval of 12 months implies
that the system will operate 50% of the time at SIL 1 rather than SIL 2 level.
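The 50% figure follows from a simple calculation, sketched below. It assumes a constant hidden failure rate, so that the instantaneous PFD grows as 1 - exp(-λt) after each test, and a perfect test that restores the system to as good as new; both are simplifications adopted here, and the function names are illustrative.

```python
import math

LAMBDA_SERIES = 3 / 150.0      # per year: three elements of 150 y MTTF in series


def pfd_at(t_years, lam=LAMBDA_SERIES):
    """Instantaneous PFD at time t after a (perfect) test."""
    return 1.0 - math.exp(-lam * t_years)


def fraction_of_interval_above(limit, test_interval_years, lam=LAMBDA_SERIES):
    """Fraction of the test interval during which the PFD exceeds `limit`."""
    t_cross = -math.log(1.0 - limit) / lam   # time at which the PFD reaches the limit
    if t_cross >= test_interval_years:
        return 0.0
    return 1.0 - t_cross / test_interval_years


if __name__ == "__main__":
    frac = fraction_of_interval_above(1e-2, test_interval_years=1.0)
    print(f"PFD just before the 12-month test: {pfd_at(1.0):.4f}")
    print(f"fraction of time above 1e-2 (SIL 1 territory): {frac:.0%}")
```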
The influence of human errors is clearly mentioned in the norms and even more
detailed in application papers like [49], but in qualitative terms only. It is of great
value to realise here, that these activities are frequently carried out by the same
mechanics that are proud to “quickly solve a problem” with their craftsmanship and
hands-on tools. Inspection of highly reliable items clashes with this mindset; the
mechanic may easily find it mentally difficult to check items “that never fail”. Especially if
these persons are not educated to properly understand the role and reliability of
safety instrumented systems (which our poll [3] showed not to be the case!), it
requires strong discipline and management to do this work properly. Mechanics may
overlook a fault, may introduce a fault, for instance, by leaving the system in the
inspection by-pass, by wrong calibration of a transmitter or may simply falsely tick off
items when under time pressure. If we denote such a maintenance induced error by
pmi, the PFD becomes:
\[ \mathrm{PFD} \approx \tfrac{1}{2}\,\lambda\,T_i + p_{mi} \]
Figure 16. Effect of maintenance induced failure (PFD from 0 to 2.0E-02 versus time in months from 0 to 24; curves: SIL calculation, in practice)
Figure 16 shows the effect of these errors. Following Gigerenzer [50] we may easily
make the mechanic aware of his / her crucial role without any advanced
mathematics:
Take the case where the required PFD is 10^-2.
Gigerenzer warns that even an explanation in terms of percentages is
psychologically less transparent than using pure numbers. Hence, he proposes
the following style (Figure 17):
“John, assume that you have to test this item 100 times. To the best of our
knowledge we know that only in 1 out of these 100 cases there will be a defect.
So, the fact that you repeatedly observe a functioning system is normal.
I take it for granted, that you, as an experienced mechanic, say in 99 out of the
100 cases will correctly find this defect, repair it, such that it is working again.
The problem now lies with the 99 cases where the component is functioning
correctly. If you are not careful enough, you may introduce a fault; for instance,
forget to put back the override switch. Suppose that this happens in 1 out of 100
cases, you see that testing does not improve the situation!
How can we prevent this from happening?”
Figure 17. (frequency tree: 100 tests split into 1 defective and 99 functioning cases; chance 0.99 and chance 0.01; 1 case on each branch)
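The natural-frequency argument can be checked with exact fractions, as in the sketch below. The numbers are those of the dialogue; combining a maintenance induced error of 0.01 with the failure rate and test interval of Example 2 in the second function is an illustration added here, not a result from the text. Function names are illustrative.

```python
from fractions import Fraction


def defects_after_testing(n_tests=100,
                          n_defective=1,
                          p_found=Fraction(99, 100),
                          p_induced=Fraction(1, 100)):
    """Expected number of faulty items left behind after n_tests tests."""
    n_ok = n_tests - n_defective
    missed = n_defective * (1 - p_found)    # defects the mechanic overlooks
    induced = n_ok * p_induced              # faults introduced in healthy items
    return missed + induced


def pfd_with_induced_error(lam_per_year, test_interval_years, p_mi):
    """PFD ~ 1/2 * lambda * Ti + p_mi, the formula given above."""
    return 0.5 * lam_per_year * test_interval_years + p_mi


if __name__ == "__main__":
    before = Fraction(1)                    # 1 defect in 100 cases before testing
    after = defects_after_testing()
    print(f"faulty items per 100 tests: before {before}, after {after}")
    # Example 2 values with an assumed induced error of 0.01 per test.
    print(f"PFD with induced error: {pfd_with_induced_error(0.02, 1.0, 0.01):.3f}")
```

With these numbers the expected count of faulty items after testing equals the count before, which is the point John is asked to see.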
The above example shows the lack of attention in reliability engineering theory
in risk / inspection studies on data uncertainty, time dependent behaviour and
the effect of human error. Risk studies on paper may therefore easily yield
results that deviate from realisation in practice.
8. The Practical Value of Commercial Software
A significant number of software packages is commercially available to support practical
engineers in decision making. Most of these packages, like AvSIM+ from Isograph
Ltd, Blocksim from Reliasoft and Asset Performance Optimiser (CARE-CAME) from
BQR Ltd, are further developments of risk-based tools, either for safety or for product
quality. Others, like MAROS / TARO from Jardine Technology (now DNV) and
SPARC, from Shell Global Solutions are specifically developed for asset
management. Monte Carlo simulation is normally used; only SPARC and part of
Blocksim are based on analytical techniques.
8.1 User Requirements
Quantified decision making using decision support tools of the type we discuss here
is not an everyday job for the maintenance manager / engineer in question. This
easily causes problems with lack of familiarity; only large establishments and
consultants will have staff with sufficient experience with these decision support tools
based on frequent use. This leads to high requirements on user friendliness aspects
and transparency; the more so, since these engineers typically have limited insight in
stochastic processes.
In many cases these staff “came through the ranks”, having developed themselves
from a vocational background like naval engineer, technician or mechanic. They thus
have a clear technical background, almost invariably specialised in a certain area
(rotating, civil, electrical) and miss insight in systems engineering and certainly in the
mathematics of stochastic processes. However, they continually take fundamental
decisions, involving appreciable amounts of money and serious consequences for
operation.
With these actors in mind, rather than specialised reliability engineers or consultants,
decision support tools then should support the following steps:
• Clearly identify the problem or opportunity and its boundaries.
• Gather relevant information.
• Develop as many alternatives as possible.
• Evaluate alternatives to decide which is best.
• Decide on and implement the best alternative.
• Convince own staff and Operations and follow up on the decision taken.
Consequently, we may draw up a list of user requirements:
• Since direct industrial maintenance costs range from 2 to 10% of turnover value,
the cost of lost production normally outweighs the costs of man hours and
materials by a factor 5 to 50. The starting point of the model thus lies with the
OEE (overall equipment effectiveness), production capacity and system
degradation / downtime versus operating time, rather than with reliability (like in
risk management).
• Modelling should be simple and transparent; with users lacking systems
engineering and probabilistic insight, the package should allow only for building
models that are consistent with theory and (imprecise, uncertain) data. Given the
size of the models to be expected (10 – few hundred maintainable components
and / or failure modes) users should be capable of zooming and panning in and
out in a model, analyzing whole or large parts of a plant down to a specific failure
process for freely chosen time intervals. Users must be able to check their
engineering insight and expectation with model outcomes. This requires
consistency in calculated results, good graphics and facilities for “what if”
analyses to quickly compare various cases.
• As far as possible the package should provide the basic building blocks for
modelling practical situations with minimal effort. Modelling blocks should be
reusable; a minimum requirement is copy and paste but a library function is to be
preferred.
• The speed of response is important; only for very large model studies will these
engineers accept calculation times of several minutes.
• Since decisions made by the engineer will have to convince Operations as well as
his own staff, the arguments and reasoning why the package arrives at a result
should be easy to pass on. Use of standard reporting tools and graphics then is
imperative.
• Model data and results should easily be documented and reported. An ISO
certified organisation requires that all decisions are substantiated and traceable.
The designer of the package should realise that such studies by a practical
engineer will hardly ever be inspected by an independent person. Hence,
modelling and data input errors will easily go by undetected; utmost care has to
be taken to prevent error making.
The packages of the first group target more on reliability / safety characteristics of
products or equipment over a certain lifetime than on production capacity in time;
some even lack the facility to include equipment / system capacity at all. The use
of hours as the basic timescale for Weibull parameters and failure time appears to be
a remnant of the risk based approach they initially were designed for. The same
applies to spare parts handling (and to a lesser extent, on workforce constraints) at
different echelons. Material handling is a standard part of ERP systems, using
algorithms like economic order quantity (EOQ) or control of level (s-S). Critical spare
parts like compressor wheels are better handled by pooling with other users or
commercial service contracts with a supplier.
Some packages, like CARE-CAME, have no proper distinction between down time
and repair time; in others, like AVSIM+ the user has to specify the work load as a
fraction of the downtime, which clashes with the engineering mindset.
8.2 System Modelling
Simulation packages allow the user great flexibility in modelling the failure
characteristics during operational phases as well as after repair or inspection. The
user can fill in data like:
• Materials and manpower constraint handling at various levels with separate costs
and logistics factors.
• Block data: standby types as hot (failure characteristics identical to operational),
warm (with a Standby Failure Apportionment Percentage) or cold (no ageing).
• Failure model: warm standby failure apportionment in %, warm standby ageing
apportionment in %.
• Reliability state after corrective / planned repair and inspection: as good as new,
intermediate (with an age reduction factor between 0 and 1), as good as old.
• Phase dependent adjustment factors δ, expressing the MTTF in a given state of
operation (phase) as MTTF_phase = MTTF_normal / δ.
• Repair condemnation rate (% of unsuccessful repairs): quoting from an instruction
manual: “% of failure mode repairs that finish unsuccessfully (?) or are impossible
(???)”.
Such flexibility, in most cases, clashes with the lack of information. We already have
significant difficulties in attributing Weibull parameters to a component under normal
operating conditions; what about those under x% loading, warm or cold standby?
The MRP is based on maintenance activities that can effectively be carried out; how
does one assess, or accept, that some of these will finish unsuccessfully (or even
prove impossible) in the future?
[Figure: comparison of man-hours per 350 h interval, 01/01/2007 to 01/12/2008; series:
SPARC and Monte Carlo runs of 1000, 5000 and 10000 iterations, each with seed 1 and seed 2.]
Figure 18. Comparison Monte Carlo versus analytical solutions.
Part of this flexibility (which, in view of the epistemic uncertainty, is scientifically
invalid) is due to the general characteristics of Monte Carlo techniques: they deal
easily with logic rules and allow post-processing of observed failure data.
Although computing speed and memory capabilities increase fast, so that the historical
drawback of Monte Carlo simulation (excessive computing time) is vanishing,
the time dependency and level of detail in maintenance simulation studies still lead
to appreciable execution times. Figure 18 shows the workload calculated for a
component per 350 h interval (eta = 8760 h, beta = 4, downtime = 500 h, repair
time 100 h). Even if we take one component (see footnote 14) we have to perform some
5000 - 10000 MC simulations to get meaningful results per time interval selected. In the
latter case AvSim is a factor of ~ 50 slower than the analytical SPARC
package. For realistic models, this factor is much higher.
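The scatter between seeds and iteration counts can be illustrated with a minimal Monte Carlo sketch in Python. It uses the parameters quoted above (eta = 8760 h, beta = 4, downtime 500 h, repair time 100 h, 50 intervals of 350 h). The function name `workload_profile` and the rule of booking the full repair time in the interval in which the failure occurs are assumptions made for illustration only; they are not taken from AvSim+ or SPARC.

```python
import random

ETA, BETA = 8760.0, 4.0            # Weibull scale (h) and shape, from the text
DOWNTIME, REPAIR = 500.0, 100.0    # hours per failure, from the text
INTERVAL, N_INTERVALS = 350.0, 50  # 50 intervals of 350 h

def workload_profile(n_iter, seed):
    """Average repair man-hours per interval over n_iter simulated histories.

    Assumption: the full repair time is booked in the interval in which the
    failure occurs, and the component is as good as new after the downtime.
    """
    rng = random.Random(seed)
    hours = [0.0] * N_INTERVALS
    horizon = INTERVAL * N_INTERVALS
    for _ in range(n_iter):
        t = 0.0
        while True:
            t += rng.weibullvariate(ETA, BETA)  # time to next failure
            if t >= horizon:
                break
            hours[int(t // INTERVAL)] += REPAIR
            t += DOWNTIME                       # component unavailable
    return [h / n_iter for h in hours]

if __name__ == "__main__":
    for n_iter in (1000, 5000, 10000):
        a = workload_profile(n_iter, seed=1)
        b = workload_profile(n_iter, seed=2)
        worst = max(abs(x - y) for x, y in zip(a, b))
        print(f"{n_iter:6d} iterations: largest seed-to-seed difference "
              f"{worst:.2f} man-hours per interval")
```

Running the script shows how the difference between two seeds changes with the number of iterations, which is the effect visible in Figure 18.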
Due to the high level of detail offered and the ambiguity in data and terminology, in
combination with frequently rather poor graphical user interfaces that lack fast
panning and zooming, inexperienced users easily lose the overview of a system
and carry out analyses that are not supported by real evidence. The large number of
input data per failure mode and the lack of overview of the data structure per failure
mode, component, subsystem and system make modelling an error prone process.
In cases where the model is not independently verified, these errors will go
unnoticed.
Engineers should properly understand, and thus be able to explain, why the model
arrives at a specific result. Commercial packages tend to consider the underlying
algorithms as commercially confidential, leaving room for speculation. For such
understanding, fast “what if” analysis is imperative. In this respect, analytical
techniques are to be preferred, the more so since Monte Carlo techniques are
incapable of analysing separate time slices only (simulations have to start from time
zero) and invariably show scatter in results.
Inexperienced engineers may easily misuse commercial software packages for
maintenance modelling. Without proper education, situations may be modelled
for which the data support is lacking. If a package is used infrequently, incorrect
data may be entered, the effect of which may go unnoticed.
Great care has to be taken to lead the engineer to a correct model description
and effective support is needed to explain the model outcomes. Graphical user
interfaces (GUIs) and powerful graphical output are strongly recommended.
The use of superfluous decimal places in the results should be avoided since
it creates a false impression of precision and accuracy.
9. On Failure Behaviour
9.1 Problem Description
Practical engineers have problems in understanding the stochastic representation of
failure behaviour in, e.g., Weibull parameters. They easily misinterpret commonly
used terms; for instance, regarding the MTTF as a kind of “useful, failure free,
period”. On the other hand, reliability engineers mainly use statistical information and
probability theory in their decision support models; models based on physics of
failure are missing to a great extent.
Here we develop a model that tries to bridge this gap, matching both views. We
study the behaviour of a single component that, due to an observable mechanism,
degrades to such a level that it is considered to be failed and unfit for
further use. Practical examples are:
[Footnote 14: The (rather strange) 350 h interval has been chosen as the minimum value in
AvSim+, given its restriction to a maximum of 50 intervals that can be selected for the
total simulation time.]
• The tread depth of a car tyre: current tread depth legislation requires that car
tyres must have a minimum of 1.6 mm of tread in a continuous band throughout
the central ¾ of the tread width and over the whole circumference of the tyre.
• The minimum wall thickness of a pipe subject to process pressure: engineers use
the Barlow equation to select the type of pipe, providing a safety margin for
corrosion, the rate of which depends on the environment in question and the
material chosen. Piping will be inspected at regular intervals and declared unfit
if the (local) wall thickness is below a threshold value.
In both cases, the component has functionally failed, although physical
consequences (tyre burst, leaking) may be zero. In fact, they are immaterial in this
context; the maintenance engineer will act on the pre-defined threshold value.
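As an illustration of the pipe example, the following Python sketch derives a threshold wall thickness from the Barlow formula P = 2·S·t/D and converts a measured thickness into a remaining life. The function names, the constant corrosion rate and all numerical values are hypothetical; a real assessment would follow the applicable piping code.

```python
def barlow_min_thickness(pressure, outside_diameter, allowable_stress):
    """Minimum wall thickness t from Barlow's formula P = 2*S*t/D.

    Pressure and stress must share one unit (e.g. MPa); the result has the
    unit of the diameter (e.g. mm).
    """
    return pressure * outside_diameter / (2.0 * allowable_stress)

def remaining_life(measured_thickness, min_thickness, corrosion_rate):
    """Time until the wall reaches the threshold, assuming a constant
    corrosion rate (thickness loss per unit of time)."""
    margin = measured_thickness - min_thickness
    return max(margin, 0.0) / corrosion_rate

# Hypothetical example: 2 MPa process pressure, 200 mm outside diameter,
# 100 MPa allowable stress, 5 mm measured wall, 0.25 mm/year corrosion.
t_min = barlow_min_thickness(2.0, 200.0, 100.0)
years = remaining_life(5.0, t_min, 0.25)
print(f"threshold thickness {t_min:.1f} mm, remaining life {years:.0f} years")
```

The threshold plays the role of the pre-defined value on which the maintenance engineer acts; the corrosion rate is the usage-dependent quantity discussed in the remainder of this section.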
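[Figure: physical parameter versus time, declining from the “up” state to the threshold,
which is reached at time ttr; thereafter the component is in the “down” state.]
Figure 19. Failure behaviour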
Figure 19 shows the process; until the time ttr the component is in the “up” state,
immediately thereafter it is functionally failed and thus in the “down” state.
The maintenance engineer will realise that the deterioration process depends on a
number of factors; the car tyre will not wear out if not used; corrosion depends, among
other things, on temperature, pH and material stresses, which may vary over time and
over the pipe length.
[Figure: simulated trend curves versus time (0 - 80); axes: physical value (0 - 100) and
number of crossings, 100*pdf (0 - 10); threshold at 40; Weibull pdf: η = 18.7, β = 2.6,
gamma = 18.9, MTTF = 35.5.]
Figure 20. Simulation study
To this end, we have modelled the deterioration in time of the physical variable y as:
$$\Phi \sim N(1.0,\ 1.5)$$
$$\Delta y = \Phi^{-1}\big(U(0,1)\big)$$
$$y_{i+1} = \min\left[\,y_i,\ y_i - \Delta y\,\right]$$
where $\Phi^{-1}$ represents the inverse normal distribution with mean 1 and a sigma of
1.5; the probability being sampled from a uniform random distribution U(0,1). The third
equation constrains the change in y to be zero or negative.
Figure 20 also shows 24 out of the 100 trend curves simulated (see footnote 15) together
with the “deterministic” representation of Figure 19:
$$y_t = 100 - \alpha\, t; \qquad \alpha = 1.68$$
where y represents the physical variable and the value of α (see footnote 16) is obtained
from the average decrease per unit of time. We identified the times at which the curves
intersect with the threshold of, in this case, 40 units. The red diamonds show the number
of these “times to failure”, which were subsequently used to estimate the Weibull
parameters via regression as:
η = 18.7
β = 2.6
γ = 18.9
with γ representing the failure free period. This leads to an estimated MTTF of 35.5
time units.
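The simulation can be sketched in a few lines of Python. The decrement distribution N(1.0, 1.5), the start value of 100 and the threshold of 40 are taken from the text; the unit time step, the function names and the way the average rate is computed are assumptions. The sketch is therefore not expected to reproduce the fitted values quoted above, which depend on details that the text does not specify.

```python
import random
from statistics import NormalDist, mean

STEP = NormalDist(mu=1.0, sigma=1.5)  # decrement distribution, from the text
START, THRESHOLD = 100.0, 40.0        # initial value and threshold, from the text

def uniform_open(rng):
    """Draw from U(0,1), excluding 0, which inv_cdf does not accept."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def time_to_threshold(rng):
    """Simulate one component; return (crossing time, trend curve).

    Assumption: one draw per unit of time.
    """
    y, path = START, [START]
    while y > THRESHOLD:
        dy = STEP.inv_cdf(uniform_open(rng))  # inverse-normal sampling
        y = min(y, y - dy)                    # no recovery allowed
        path.append(y)
    return len(path) - 1, path

def simulate(n_runs=100, seed=1):
    """Return the crossing times and the mean decrease per unit of time."""
    rng = random.Random(seed)
    times, rates = [], []
    for _ in range(n_runs):
        t, path = time_to_threshold(rng)
        times.append(t)
        rates.append((path[0] - path[-1]) / t)
    return times, mean(rates)

if __name__ == "__main__":
    times, alpha = simulate()
    print(f"mean time to threshold {mean(times):.1f}, average rate {alpha:.2f}")
```

Because negative draws are discarded, the average rate returned by `simulate` exceeds the mean of the normal distribution, which is the effect described in footnote 16. The list of crossing times is the input one would pass to a Weibull regression.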
Obviously, if we reduce the sigma of the normal probability distribution, the spread in
outcomes will become less, leading to a higher value of the β parameter of the
Weibull distribution. The upper half of Figure 20 appeals to the engineering mind of
the maintenance engineer; the lower half is the representation we use in reliability
engineering.
We may consider the difference between initial value and threshold and the α value
as design parameters. These are degrees of freedom for the design engineer taking
into account, for instance, the average corrosion rate in mils/year of a specific
material, given nominal conditions of use. These values will affect the mean α value,
and thus the mean of the distribution. The corresponding sigma value represents the
variations in operational conditions of use; temperature / pH excursions, or the
mismatch between calendar time and running hours (the use factor).
[Footnote 15: For clarity, the curves are only drawn until they intersect with the threshold value.]
[Footnote 16: The value of α is higher than the mean of the normal distribution used, since we
allowed no reliability growth.]
[Figure: physical value (30 - 100) versus time (0 - 60); series: actual, extrapolations
using inspections over 0 - 10, 0 - 20, 0 - 30 and 0 - 40, inspection, insp. times.]
Figure 21. Example of trend analysis
Inspections will increase our operational insight into the actual failure propensity. The
engineer will try to get information on whether to replace now or to postpone this to a
later moment in time. Under normal conditions, he / she needs a planning horizon of a few
weeks. This is possible only if the process has a recognisable trend.
If the ageing process is reasonably monotonic, such a trend analysis will generate
the required information in a reliable way. As an example, Figure 21 shows the
extrapolated intercept values, using successively more observations at time intervals
from 0 to 40.
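A minimal sketch of such a trend analysis is given below, assuming a straight-line trend fitted by ordinary least squares and a threshold of 40 units. The inspection record is hypothetical and the function names are chosen for illustration only.

```python
def fit_line(times, values):
    """Ordinary least-squares fit: value = intercept + slope * time."""
    n = len(times)
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    slope = sxy / sxx
    return v_mean - slope * t_mean, slope

def predicted_crossing(times, values, threshold):
    """Time at which the fitted trend reaches the threshold.

    Returns None if the fitted trend does not decline.
    """
    intercept, slope = fit_line(times, values)
    if slope >= 0:
        return None
    return (threshold - intercept) / slope

# Hypothetical inspection record: (time, measured physical value)
inspections = [(0, 100.0), (10, 86.0), (20, 69.0), (30, 56.0), (40, 41.0)]

for last in (10, 20, 30, 40):
    ts, vs = zip(*[(t, v) for t, v in inspections if t <= last])
    crossing = predicted_crossing(ts, vs, threshold=40.0)
    print(f"inspections 0 - {last}: predicted crossing at t = {crossing:.1f}")
```

Each additional inspection updates the extrapolated crossing time. When the process is not monotonic, the successive estimates shift noticeably and the planning horizon becomes unreliable.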
However, the maintenance engineer will also meet situations like in Figure 22, where
the trend information will lead to premature (left side figure) or late (right)
replacement.
[Figure: two panels of physical value (30 - 100) versus time (0 - 60); series in each panel:
actual, extrapolations using inspections over 0 - 10, 0 - 20, 0 - 30 and 0 - 40, inspection,
insp. times.]
Figure 22. Further examples of trend analysis
The stepwise failure process sketched above is obviously a crude representation. In
practice, components may or may not show a wear process that reduces their
capability, for instance, to withstand a load pattern that, in itself, is again a stochastic
process. We will extend the example by assuming that the probability of failure can
be described by a normal distribution with mean 30 and sigma 5.
This crude “physics of failure” (POF) model (Figure 23) allows us to convert the observed
physical variables into failure probabilities. It informs the engineer that failure is
rather spontaneous; the probability mass lies between the physical values 15 - 45.
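The conversion from a physical value to a failure probability can be sketched as follows. The mean of 30 and sigma of 5 are from the text; reading the model as "the lower the physical value, the higher the probability that the component has failed" is an assumption, as is the function name.

```python
from statistics import NormalDist

POF = NormalDist(mu=30.0, sigma=5.0)  # mean and sigma from the text

def failure_probability(physical_value):
    """Probability of failure at a given physical value.

    Assumption: failure becomes more likely as the value decreases, so the
    probability is the upper tail of the normal distribution.
    """
    return 1.0 - POF.cdf(physical_value)

for value in (60, 45, 40, 30, 15):
    print(f"physical value {value:3d}: failure probability "
          f"{failure_probability(value):.4f}")
```

Three sigma on either side of the mean gives the range 15 - 45 quoted above; outside it the probability is practically 0 or 1.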
Figure 23. POF model
[Figure: physical value (0 - 100) and cdf (0 - 1) versus time (0 - 80); series: physical
value, observed value, model cdf, estimated cdf.]
Figure 24. One realisation
Figure 24 shows a realisation of one of the simulated runs. The triangles indicate the
observed values of the physical parameter, the red diamonds the estimated reliability
from the POF model at that moment in time.
The above examples show the significant benefits of even a crude (black box)
model of the physics of failure (POF). If the design engineer is able to
transfer his / her knowledge about the influence of process conditions on the
failure process in a quantitative (albeit estimated) form, the maintenance
engineer is in a better position to interpret both the consequences of
operational use and the information gained by inspection in order to arrive at a
substantiated decision for intervention. He / she then will realise that blaming
operators for “incorrect use” does not solve the problem; it is his / her task to
keep the equipment in the correct window of operation, where the
assumptions of the design engineer are valid. The simple approach of solely
“learning by failing” (Weibull data analysis) is quite ineffective for critical
items that hardly ever fail.
10. Condition Based Maintenance
Time-based preventive maintenance is frequently used to plan major activities during
plant shutdowns with the goal of undisturbed operation in the periods in between.
However, as we have seen above, the use of (calendar) time as a condition indicator
means that we have mapped the physical deterioration processes onto time.
It is questionable whether such a mapping is unique and thus effective. In principle,
predictive (condition based) maintenance uses more and better information that may
improve decision support.
We may easily formulate a number of criteria for predictive or condition-based
maintenance (CBM) to be effective in practice:
• The (investment and operational) costs of CBM should be weighed against the
reduction in corrective / planned maintenance execution costs.
• CBM should allow the maintenance manager to plan and schedule an activity on a
reasonable time-scale (days, possibly several weeks); it should not act as a type of
alarm.
• The CBM observations thus should be properly analysed in a statistical sense to
gain information on the development in time of the failure process. POF models
will result in more robust trend analysis.
• The value of CBM is strongly increased if it possesses diagnostic power,
providing essential information on the type of failure and the component involved,
in order to decide what activity to plan and what staff and special tools / spare
parts are required for execution.
Unfortunately, these criteria are not always met and experience with CBM is
therefore mixed; a properly monitored bearing, for instance, may still show unexpected
sudden or fast incipient failure. In other cases, the CBM instrumentation only provides a
warning signal to prevent consequential damage, which could well be obtained at
lower cost with conventional alarm systems.
Jim Wardaugh [32] reports on the findings of the MERIT team in Shell on electric
motors: “in seven of our companies in six different countries. Each was a company
plant built to corporate standards with most rotating equipment having an installed
spare. However there was a variety of maintenance strategies in place.
Figure 24 summarizes our findings. It gives the percentage of each site's inventory of
motors removed to the workshop for significant repair each year. These percentages
have been broken down by reason for removal:
• Breakdown (i.e., the motor had been run to failure)
• Condition monitoring had indicated imminent failure
• Time based overhaul regime in place for some or all of the motors
Considering this we found that….: The proportion of breakdowns was fairly
constant whether you do condition monitoring and/or overhauls or just let
things run to failure”
Figure 24. Practical experience with CBM (from Wardaugh)
Clearly, this observation is at variance with the reliability engineering vision of Figure
9 and the results of location 5 show that “their condition monitoring did not seem very
effective in predicting and / or pre-empting failures”.
With some reflection, the lack of effectiveness of “add-on” CBM will not be a surprise.
First of all, we are dealing here, in control terms, with an “observability / controllability”
problem. One may expect the designer of the equipment / element to have, at least
in the back of his / her mind, considered the probability of failure in the design
specification. Since designing-out failures is in many cases cheaper than repair later
on, he / she will take measures to increase the robustness where possible, as long
as costs remain reasonable. Such actions, however, will invariably decrease the
opportunities to non-intrusively measure signs of deterioration.
On the other side, the designer of the CBM-equipment, who addresses a large
market volume of “typical, similar” equipment, can only design for generic elements /
failure modes and has to face quite different, yet unknown, operational
circumstances. Without a proper understanding of the causal reasons for failure (a
POF model) it is difficult to define which variables should be monitored, at which
intervals and how the signals should be interpreted.
Examples of frequently used CBM methods are:
• Equipment: vibration, ultrasound, motor current, oil debris, infrared thermometry
analysis
• Static equipment / piping: crack and / or wall thickness testing with dye penetrant,
ultrasonic, eddy current, magnetic particle techniques
These standard techniques all require a certain degree of wear to exist before a
signal will be generated (in non-destructive testing related to the probability of
detection, POD) as well as a relatively monotonic trend in the degradation process.
The latter again reflects the need for a physics of failure model, which, in practice, is
not always clear. (In theory, bearings should function well over a plant lifetime,
based on the number of cycles and loading pattern, but in practice they are major
causes of equipment shutdown.)
Jardine [43] proposes to use a proportional hazard model with a Weibull base line:
$$h(t, Z(t)) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta-1} \exp\left(\sum_{i=1}^{n} \gamma_i z_i(t)\right)$$
where h(t, Z(t)) is the (instantaneous) conditional probability of failure at time t given
the values of the covariates z1(t), z2(t), ..zn(t). These covariates are obtained via
data-mining techniques in OMDEC®, like material concentrations in oil debris
analysis or the specific vibration level of a bearing. The γi values are the covariate
parameters that indicate the degree of influence of each covariate on the hazard
function. Note that in such an approach we have to run costly experiments in a
running process, in order to arrive at a statistical (black box) model with the usual
restriction that such a model can be assumed to be valid in “similar” conditions
only.
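The structure of this hazard function is easily evaluated numerically. The sketch below is a direct transcription of the formula; the function name and the numerical values (Weibull parameters, covariate weight, covariate reading) are hypothetical and serve only to show how a covariate scales the base line hazard.

```python
import math

def weibull_phm_hazard(t, eta, beta, gammas, covariates):
    """Proportional hazard with Weibull base line.

    h(t, Z) = (beta/eta) * (t/eta)**(beta - 1) * exp(sum(gamma_i * z_i))
    """
    baseline = (beta / eta) * (t / eta) ** (beta - 1.0)
    return baseline * math.exp(sum(g * z for g, z in zip(gammas, covariates)))

# Hypothetical values: eta = 20, beta = 2, one covariate (e.g. a vibration
# level) with weight gamma = 0.3.
base = weibull_phm_hazard(10.0, 20.0, 2.0, [], [])
raised = weibull_phm_hazard(10.0, 20.0, 2.0, [0.3], [2.0])
print(f"base line hazard {base:.3f}, with covariate {raised:.3f}")
```

With all covariates at zero the expression reduces to the ordinary Weibull hazard; each covariate multiplies it by exp(γi·zi), independent of time.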
This approach is comparable with that of model identification in process control
(Figure 26). Test signals U(i,k), for instance pseudo-random binary noise, are used to
excite the system in such a way that the response Y(i,k) can be used to fit a model in state
space or in the time domain without seriously affecting process throughput or
product quality. The resulting black box model is valid in a small region around the
nominal process values only. A more robust model is obtained if a-priori information
about the model structure and related physical processes is used (a grey-box
model).
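As a minimal illustration of such an identification step, the sketch below excites a first-order discrete-time process with a random binary test signal and recovers its two parameters by least squares. The model structure y[k+1] = a·y[k] + b·u[k], the parameter values and the function names are assumptions chosen for illustration; industrial identification packages handle noise, delays and multivariable models.

```python
import random

def simulate_process(u, a, b):
    """Response of the (assumed) process y[k+1] = a*y[k] + b*u[k], y[0] = 0."""
    y = [0.0]
    for uk in u:
        y.append(a * y[-1] + b * uk)
    return y

def identify_first_order(u, y):
    """Least-squares estimate of (a, b) from input u and response y.

    Solves the 2x2 normal equations of the regression
    y[k+1] = a*y[k] + b*u[k].
    """
    y_now, y_next = y[:-1], y[1:]
    syy = sum(v * v for v in y_now)
    syu = sum(v * w for v, w in zip(y_now, u))
    suu = sum(w * w for w in u)
    ry = sum(n * v for n, v in zip(y_next, y_now))
    ru = sum(n * w for n, w in zip(y_next, u))
    det = syy * suu - syu * syu
    return (ry * suu - ru * syu) / det, (ru * syy - ry * syu) / det

rng = random.Random(0)
u = [rng.choice((-1.0, 1.0)) for _ in range(200)]  # binary test signal
y = simulate_process(u, a=0.9, b=0.5)              # hypothetical process
a_hat, b_hat = identify_first_order(u, y)
print(f"identified a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The estimate is only as good as the assumed model structure, which is why a-priori physical knowledge (the grey-box approach) makes the result more robust.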
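[Figure: block diagram; the test signal U(i,k) enters the PROCESS, which produces the
response Y(i,k); both feed the IDENTIFICATION METHOD, which yields the MODEL.]
Figure 26. Model identification in process control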
However, in contrast with predictive maintenance (for example, the Jardine
approach), in process control:
• We measure physical values, rather than a conditional probability of failure.
• We learn from exciting the system without large disturbances, rather than having
to let it fail.
• We use input variables that we know to influence the measured value, i.e. we
have a physical-chemical model of the system.
The last point is of particular interest in CBM; using POF information, however
crude, will increase the robustness of the model and its decision support, and will
also improve the understanding and confidence of the practical engineer.
In a number of cases, this approach is quite straightforward and is even in use already
by control engineers. For a counter current heat exchanger (Figure 25) the steady state
heat balance may be written as:
$$Q = m_a\, c_{pa}\,(T_{a1} - T_{a2}) = m_b\, c_{pb}\,(T_{b2} - T_{b1})$$
$$Q = U A\ \frac{(T_{a1} - T_{b2}) - (T_{a2} - T_{b1})}{\ln\left[\dfrac{T_{a1} - T_{b2}}{T_{a2} - T_{b1}}\right]}$$
with:
$m_i$ = mass flow rate of fluid i
$c_{pi}$ = specific heat of fluid i
$T_{ij}$ = temperature of stream i at position j (here 1 = inlet, 2 = outlet, with a the
hot stream)
$U$ = overall heat transfer coefficient
$A$ = area
Figure 25. Counter current heat exchanger
Having the mass flow rates, the specific heats and the temperatures, we may calculate
U*A as a measure of the heat transfer capacity. Since we are interested in
variations of U in time, we may postulate a fouling mechanism that follows an S-curve
(sigmoid) in time; at first fouling will have a small effect, and if the fouling layer
increases in thickness, parts may break off. Identifying this model on-line, we obtain
continuous information on the change in heat transfer capacity and may calculate an
optimum point in time at which the bundle has to be removed for cleaning, rather than
simply carrying out a planned clean-out on a calendar basis. Experience shows that
such condition-based replacements may increase the run length by 50 - 100 %
compared with the conservative time based approach.
Note that, in this case, we use process measurements that frequently will already be
available for control purposes. Secondly, the model outcomes will readily be
accepted by engineers, the more so, since they directly may be associated with
production losses.
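The calculation of U*A from routine process measurements can be sketched as follows. The sketch assumes that stream a is the hot stream, that index 1 denotes the inlet and 2 the outlet of each stream, and that heat losses are negligible; the function names and the measurement values are hypothetical.

```python
import math

def lmtd(dt_one_end, dt_other_end):
    """Logarithmic mean of the temperature differences at the two ends."""
    if math.isclose(dt_one_end, dt_other_end):
        return dt_one_end  # limit value when both differences are equal
    return (dt_one_end - dt_other_end) / math.log(dt_one_end / dt_other_end)

def ua_from_measurements(m_a, cp_a, ta_in, ta_out, tb_in, tb_out):
    """U*A of a counter current exchanger from the heat balance of stream a.

    Units: m_a in kg/s, cp_a in kJ/(kg.K), temperatures in degrees C or K;
    the result is in kW/K.
    """
    q = m_a * cp_a * (ta_in - ta_out)                # duty, kW
    return q / lmtd(ta_in - tb_out, ta_out - tb_in)  # counter current pairing

# Hypothetical measurements: 2 kg/s of water cooled from 90 to 60 degrees C
# against a stream heated from 20 to 40 degrees C.
ua = ua_from_measurements(2.0, 4.18, 90.0, 60.0, 20.0, 40.0)
print(f"U*A = {ua:.2f} kW/K")
```

Evaluating this expression at regular intervals yields the time series of U*A to which a fouling model can be fitted on-line.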
Suitable condition measurements for clean-out operations of centrifugal compressors
are the polytropic head and / or efficiency:
$$H_p = \frac{Z_{avg}\, R\, T_s}{MW \cdot \dfrac{n-1}{n}} \left[\left(\frac{P_2}{P_1}\right)^{\frac{n-1}{n}} - 1\right]$$
with:
$H_p$ = polytropic head, kJ/kg
$Z_{avg}$ = average compressibility factor, dimensionless
$R$ = universal gas constant, 8.314 kJ/(kmol·K)
$T_s$ = suction temperature, K
$MW$ = molecular weight, kg/kmol
$n$ = polytropic exponent, dimensionless
$P_1$ = suction pressure, kPa
$P_2$ = discharge pressure, kPa
In a petrochemical process most of these parameters are available since they are
required for process control purposes. Again a time dependency as described above
may be used to allow on-line identification of the differences between initial and
actual compressor characteristics. For reciprocating compressors, an analogous
pressure-volume (PV) approach (see footnote 17) has proven to be a very effective
technique but requires additional, fast and accurate pressure transmitters.
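For the centrifugal compressor, the polytropic head follows directly from the definition above. The sketch below is a transcription of that formula; the function name and the example operating point (an air-like gas compressed from 100 to 300 kPa) are hypothetical.

```python
R_UNIVERSAL = 8.314  # universal gas constant, kJ/(kmol.K), as in the text

def polytropic_head(z_avg, t_suction, mol_weight, n, p_suction, p_discharge):
    """Polytropic head in kJ/kg.

    t_suction in K, mol_weight in kg/kmol, pressures in one absolute unit
    (e.g. kPa), n is the polytropic exponent.
    """
    exponent = (n - 1.0) / n
    ratio_term = (p_discharge / p_suction) ** exponent - 1.0
    return z_avg * R_UNIVERSAL * t_suction / (mol_weight * exponent) * ratio_term

# Hypothetical operating point: Z = 1, 300 K suction, MW = 28.97 kg/kmol,
# n = 1.4, compression from 100 to 300 kPa.
head = polytropic_head(1.0, 300.0, 28.97, 1.4, 100.0, 300.0)
print(f"polytropic head = {head:.1f} kJ/kg")
```

Tracking this value at comparable flow and speed, against the head of the clean machine, gives the deviation that signals fouling or wear.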
In a similar way, one may identify on-line the pump characteristics of Figure 5, albeit
that the actual working point now also depends on pumping speed, line (process)
resistance and the throttling effect of the discharge control valve. Changes from the
original design curve indicate impeller wear or significant fouling.
The above shows that the application of control engineering principles in the asset
management area has significant scope. In fact, why do control engineers apply
advanced control on process variables to keep the process optimally running but
neglect control of the underlying equipment reliability, without which production will
come to a grinding halt? Whereas the examples above are quite easy, we have to
realise that extending this view to common problems like bearing or seal failure will
not be straightforward and will take quite some research. However, even casting the
available written documentation in a crude POF model will at least indicate the major
variables causing failure, such that they may be better controlled. A similarity shows
up here with the stabilising control layer of Figure 1.
[Figure: pyramid of maintenance layers, from top to bottom:
plant wide: PLANT ASSET MGMT (monthly, yearly);
critical equipment: MONITOR, OPTIMISE (weekly);
care, non-critical equipment: STABILISE (daily).]
Figure 26. Maintenance layers
At the lowest layer of the asset management pyramid (Figure 26) we find daily
activities like routine visual inspections, cleaning, lubricating, greasing and repair of
non-critical items that are analysed in the MRP to be correctively maintained. This is
a significant part (~40 - 60%) of the maintenance effort, for which modelling hardly
plays a role. The “care” aspects are necessary to keep the equipment running; there
is no need to quantify the consequences of neglecting this step. The effectiveness of
maintenance on non-critical items is mainly governed by the “bought-in reliability” of
high quality components reducing the required intervention frequency. This is also
the place where equipment has to be “controlled” to remain in the operating window
that the designer took into account. Without such control, we will observe
unexpected failures, the causes of which can be detected only by time
consuming root cause analysis. One may expect that this equipment stabilising
control in the future will increasingly be carried out by dedicated built-in condition
measurement systems, rather than the traditional human observation approach.
[Footnote 17: The dynamic pressure change inside a cylinder.]
For critical equipment this health monitoring needs to be more detailed. Existing
insight in the POF is updated, based on measured information as described above,
such that cost effective interventions can be planned in a timescale fitting within the
organisation. The frequency of updating has to be in line with the dynamics of the
degradation mechanism and usually will be in timescales of weeks to months. In
terms of modelling, the MRP necessarily has to be based on reliability data bank
information, information from OEMs and / or engineering judgement and thus will give
better information on effectiveness than on failure evolution in time and optimal timing
of intervention. Predictive maintenance, underpinning this timing, needs to be based on
a POF model updated by measured, physical information.
An overall plant model is required in order to transform the consequences of failure
of (sub)-parts of the system to economic consequences for plant production. This
model should also take into account alternative ways to reduce the effect of loss of
capacity, for instance, for an oil or gas production platform forming part of a cluster.
11. Achieving the Model Results in Practice
For academia and PhD students, the paper work of a maintenance / asset
management optimisation model will be the final outcome. Industrial engineers,
however, have to ensure that the prognosticated benefits will actually be reached.
One important aspect to be mentioned here, as an important side-issue of
maintenance modelling, is that of maintainability. Textbooks and scientific papers
invariably overlook its importance, simply specifying a number for downtime (elapsed
time between stop due to failure and hand-over in running mode to Operations). The
military environment uses DEFSTAN 00-40. Blanchard [51] covers the aspects of
accessibility, manoeuvrability, transport routes and knowledge / skill requirements of
mechanics, among others with anthropomorphic diagrams.
Most engineers do not realise that an x % reduction in downtime has a comparable
effect on long term average availability as an x % increase in MTTF (an investment
in “bought-in” reliability). In practice [3], the observed actual downtime is about a
factor 2.5 – 3 larger than the wrench time (Figure 27). Hence, a significant part of
the effect of maintenance optimisation may be lost in practice, if this ratio is not
properly managed.
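The comparison can be checked with the standard expression for long term average availability, A = MTTF / (MTTF + downtime). The sketch below uses hypothetical values (MTTF = 1000 h, downtime = 100 h, x = 10 %); the function name is chosen for illustration.

```python
def availability(mttf, downtime):
    """Long term average availability A = MTTF / (MTTF + downtime)."""
    return mttf / (mttf + downtime)

MTTF, DOWNTIME, X = 1000.0, 100.0, 0.10  # hypothetical values

base = availability(MTTF, DOWNTIME)
less_downtime = availability(MTTF, DOWNTIME * (1.0 - X))
more_mttf = availability(MTTF * (1.0 + X), DOWNTIME)

print(f"base case:            {base:.4f}")
print(f"10 % less downtime:   {less_downtime:.4f}")
print(f"10 % higher MTTF:     {more_mttf:.4f}")
```

Both measures raise the availability by a similar amount, which is why a downtime that is a multiple of the wrench time deserves as much management attention as the reliability of the equipment itself.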
Figure 27. Active repair (wrench) time versus downtime
We then have the situation that an optimal plan developed at the tactical layer of AM
(Figure 2) is jeopardised by inadequate management at the operational layer. Shell
Pernis Refinery, the largest refinery in Europe, in striving for the Shell world-wide
Flexible Flagship Programme, recently showed that the active wrench time of its
mechanics workforce could be doubled [52] by reducing the complexity of
procedures. This complexity had tacitly increased over the years after accident and
incident investigations, without, however, leading to a significant change in statistics.
With management attention now focussed on the technicians' own responsibility,
downtime is minimised and direct costs are expected to be reduced by some 25%.
Similar observations may be made on the value and clarity of the information in the
MRPs handed over from the tactical to the operational layer, and on its feedback. In
many cases, the threshold value (condition) upon which maintenance needs to be
scheduled is not clear and thus open to subjective action. Frequently, the mechanic
is not structurally asked to report on items which, at the time of development of the
MRP, had to be guesstimated. The value of field data is underestimated; we observe
in a number of cases that, at the introduction of a new CMMS, the maintenance
manager is incapable of convincing the company of the necessity of additional
budget to convert the existing database to the new system, such that many years of
recorded experience are lost.
12. Conclusions
Obviously, a proper understanding of reliability engineering theory, in combination
with effective decision support tools, is essential for maintenance engineers to fully
grasp the consequences of equipment (component) failure on the output of a
system, as well as to understand the pros and cons of different maintenance
strategies. However, it appears that the reliability engineering community has largely
failed to convince the engineering community of its value. This is in strong contrast
with a similar engineering discipline such as process control, where the gap between
theory and practice was bridged many years ago.
Maintenance modelling, the subject matter of this book, may be considered as the
vehicle in decision support tools to apply theoretical reliability engineering concepts
in practice, both in the design phase, as well as in actual operation.
Design type models, covering the lifetime of the plant, are based on (Weibull) data
from reliability data banks, in-house experience or derived from Original Equipment
Manufacturers (OEM) data. The granularity of this information is low, mostly at
equipment level in terms of MTTF given a “sound” maintenance regime. The
epistemic uncertainty may be estimated to be in the order of some 20%. Together
with the pressure (time, money) in the design project, this means that fundamental
decisions on system structure, lay-out and equipment will be taken with scarce
information. To transform the initial broad-brush approach from the design model into
a maintenance reference plan (MRP, where future strategies are underpinned), the
level of detail needs to be extended to that of critical failure modes for which both
external and in-house data banks normally have no precise data. We therefore meet
again considerable uncertainty in Weibull parameters.
We have shown that usage factors, like temperature, loading and type of operation,
have a strong influence on the failure process and thus on its characterising
parameters. Equipment manufacturers have limited access to operational data and
thus have to increase their know-how mainly from guarantee and contractual service
work. Independent studies have shown that manufacturer specifications for
electronic equipment may differ by an order of magnitude from what is observed in
practice. In-house computerised maintenance management systems (CMMS) at best
record times to failure at the equipment level, in some cases down to failure mode,
but do not take into account influencing process variables.
The epistemic parameter uncertainty strongly influences the results of decision
support tools. Whereas software tools may indicate an optimum age for replacement
with a number of decimal places, a more prudent investigation shows that the most
important information produced by such a model lies in the calculation of (cost,
availability) effectiveness. The timing is directly related to the Weibull scale
parameter; the uncertainty for critical items with an MTTF of 5 - 10 y thus will be large.
Human factors in quantitative analyses, as in the calculation of SIL norms, are
neglected, but have serious consequences in the testing of high reliability safeguarding
equipment. Academic publications mostly leave out these inherent shortcomings or
consider them as items to be improved. More practically oriented books and
publications treat these problems qualitatively, but mostly proceed from there in a
standard reliability engineering approach. To some extent, the image is created that
the reliability engineering community is aware of its inherent flaws, but does not
openly present them.
Commercial decision support tools use a degree of flexibility in modelling that
clashes with the uncertainty in input data. This fact, in combination with the lack of
skill of, and the error proneness of modelling by a practical engineer, raises a
warning signal for inexperienced users; the more so, since the modelling outcomes
will hardly ever be independently checked.
The stochastic models used in reliability engineering are not in line with the more
deterministic engineering mindset, the latter reasoning in terms of causal relations
and disliking uncertainty. A combination of a physics of failure (POF) model with
standard reliability engineering theory improves both the robustness of decision
support tools and their acceptance by practical engineers. The approach may
be borrowed from control engineering, where control, optimisation and model
identification of complex units have demonstrated significant economic gains.
Predictive (condition-based) maintenance nowadays is mainly based on measuring
the consequences of the deterioration process, taking action if a (sometimes
subjective) threshold is passed. Taking into account the probability of detection, the
assumed trend in time and the lack of diagnostic power, it thus easily fails the
criterion of predicting specific activities over a period long enough for them to be
safely scheduled. Results in practice show wide scatter.
CBM based on physics of failure models, with parameters identified from process
measurements, is more robust than CBM using add-on instrumentation. In some
cases, these physics-oriented models are already in place for advanced control of
process units where the optimum is defined by process constraints like the maximum
heat transfer of a cooler (fouling), fluid transport by a pump (wear) or transport of
vapour (compressor fouling, wear). Reliability engineering will thus benefit strongly
from closer cooperation with topical (mechanical, electrical, material) engineers to
develop decision support tools that combine physical background with sound
stochastic analysis.
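The cooler example can be sketched as follows. This is a minimal illustration,
assuming that the overall heat transfer coefficient U is already derived from routine
process measurements (duty, temperatures) and that the thermal resistance 1/U
grows linearly with time; both the linear fouling law and the function names are
assumptions made here, not a model taken from a specific plant.

```python
def fit_line(ts, ys):
    """Ordinary least-squares fit y = a + b*t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in ts)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    b = sxy / sxx
    return y_mean - b * t_mean, b


def days_until_limit(days, u_values, u_min):
    """Remaining days until U falls to the process constraint u_min.

    Assumes the total thermal resistance 1/U increases linearly with time
    (a simple fouling law chosen for illustration). Returns None when the
    data show no deterioration trend.
    """
    resistances = [1.0 / u for u in u_values]
    a, b = fit_line(days, resistances)
    if b <= 0:
        return None  # no fouling trend identified from the measurements
    t_limit = (1.0 / u_min - a) / b  # time at which 1/U reaches 1/u_min
    return t_limit - days[-1]


if __name__ == "__main__":
    # Synthetic measurements (hypothetical units, e.g. W/m2K versus days)
    days = [0.0, 30.0, 60.0, 90.0]
    u_values = [1.0 / (0.002 + 1e-6 * d) for d in days]
    print(days_until_limit(days, u_values, u_min=400.0))
```

The identified slope plays the role of the physical deterioration parameter, and the
constraint u_min replaces a subjective alarm threshold by the process limit that
the advanced control scheme already uses.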
For reliability engineering to become an accepted, practical discipline that convinces
executives and offers career opportunities good enough to attract bright candidates
to maintenance engineering, its benefits have to be demonstrated in practice.
Maintenance modelling is then only a small part of the total Asset Management
problem, where goal orientation, steering, organisation, motivation and quality
require transparent and unambiguous information to facilitate decision-making at all
levels.
References.
1. C. F. H. Van Rijn, "A Systems Engineering Approach to Reliability, Availability
and Maintenance," in Foundations of Computer-Aided Operations, Park City,
Utah, USA, 1987.
2. C. F. H. Van Rijn and P. Scholten, "Integral Management of Production Assets,"
Maintenance, vol. 11, pp. 3-14, June 1996.
3. C. F. H. Van Rijn, "The status of Asset Management in a number of Dutch
Industries and Service Companies," in 37th ESReDA Seminar on Asset
Optimisation and Maintainability, Baden, Switzerland, 2009.
4. C. F. H. Van Rijn, "Maintenance Modelling and Applications; lessons learned,"
in 32nd ESReDA Seminar, Maintenance Modelling, Sardinia, Italy: ESReDA,
2007.
5. R. Dekker and C. F. H. van Rijn, PROMPT-a decision support system for
opportunity maintenance. vol. 154. Berlin: Springer Verlag, 1996.
6. H. A. Watson, "Launch Control Safety Studies," Bell Labs, 1961.
7. F. S. Nowlan and H. F. Heap, "Reliability-Centred Maintenance," Office of
Assistant Secretary of Defense, Washington DC 20301, Final MDA 903-75-C-0349,
Dec. 29 1978.
8. R. E. Barlow and F. Proschan, Mathematical Theory of Reliability. New York:
John Wiley and Sons, 1965.
9. Z. W. Birnbaum, "Multicomponent Systems and Structures and their Reliability,"
Technometrics, vol. 3, pp. 93-109, 1961.
10. J. G. Ziegler and N. B. Nichols, "Optimum Settings for Automatic Controllers,"
Transactions of the ASME, vol. 64, pp. 759-768, 1942.
11. H. H. Rosenbrock, "Design of multivariable control systems using inverse
Nyquist Array," Proc. IEE, vol. 116, pp. 1929-1936, 1969.
12. R. E. Kalman, "A new approach to linear filtering and prediction problems,"
Journal of Basic Engineering, vol. 82, pp. 35-45, 1960.
13. M. Athans, "Stochastic Robustness of Linear-Time-Invariant Control Systems,"
IEEE Trans. Automatic Control, vol. 36, pp. 82-87, 1971.
14. E. Pistikopoulos, T. A. Mazzuchi, and C. F. H. Van Rijn, "Flexibility, reliability
and availability analysis of manufacturing processes," Computer Applications in
Chemical Engineering, p. 223, 1990.
15. C. G. Vassiliades and E. N. Pistikopoulos, "On the interactions of Chemical
Process Design under uncertainty and maintenance optimisation," in Annual
Reliability and Maintainability Symposium, 1998, pp. 302-307.
16. C. G. Vassiliades and E. N. Pistikopoulos, "Chemical process design and
maintenance optimisation under uncertainty: A simultaneous approach.," in 1999
Annual Reliability and Maintainability Symposium, 1999.
17. H. D. Goel, J. Grievink, P. M. Herder, and M. P. C. Weynen, "Integrating
Reliability Optimization into Chemical Process Synthesis," Reliability
Engineering and System Safety, vol. 78, pp. 247-258, 2002.
18. D. H. Timmer, "An Analysis of the Reliability of an Incandescent Light Bulb,"
Quality Engineering, vol. 13, pp. 299-305, 2000.
19. P. J. L. Hesen, R. Otto, and J. den Boer, "Spanningskwaliteit in Nederland,
resultaten 2008 (Power quality in the Netherlands, results 2008)," KEMA
Nederland B.V., Arnhem, 2009.
20. U. Dinesh Kumar, Reliability, Maintenance and Logistic Support: A Life Cycle
Approach. Norwell, Massachusetts, USA: Kluwer Academic Publishers, 2000.
21. NASA, "The NASA ASIC Guide, Appendix 7,"
http://klabs.org/DEI/References/design_guidelines/content/guides/nasa_asic_guide/Appendix.7.html, 1993.
22. D. Dupont and L. Litz, "Lokalisierung und Analyse von Fehlerquellen beim
Numerischen SIL-Nachweis," atp - Automatisierungstechnische Praxis, vol. 2,
pp. 62 - 67, 2008.
23. G. Lundberg and A. Palmgren, "Dynamic Capacity of Roller Bearings," Ing.
Vetenskaps Akad. - Handl., 1952.
24. NSWC, Handbook of Reliability Prediction Procedures for Mechanical
Equipment. West Bethesda, Maryland, USA: Naval Surface Warfare Center,
2007.
25. J. Lieblein and M. Zelen, "Statistical Investigation of the Fatigue Life of Deep-
Groove Ball Bearings.," Journal of Research of the National Bureau of
Standards, vol. 57, pp. 273 - 318, 1956.
26. T. A. Harris and J. I. McCool, "On the Accuracy of Roller Bearing Fatigue Life
Prediction," ASME Jour. of Tribology, vol. 118, pp. 297 - 310, 1996.
27. A. M. Al-Abdan, "How Unbalance Affects Bearing Life,"
http://www.plant-maintenance.com/articles/unbalance.shtml, 2009.
28. A. E. Stavale, "Reducing Reliability Incidents and Improving Mean Time
Between Repair," in 24th International Pump Users Symposium, College
Station, Texas USA, 2008.
29. B. Schiavello and F. C. Visser, "Pump Cavitation - Various NPSH Criteria, NPSH
Margins and Impeller Life Expectancy," in 24th International Pump User
Symposium, College Station, Texas USA, 2008.
30. W. McNally, "Preventing Seal Failure,"
http://www.mcnallyinstitute.com/10-html/10-4.html, 2009.
31. E. Roosch, "Seal Systems Reliability - a Seal Maker's Viewpoint," in Reliability of
Sealing Systems for Rotating Machinery, IMechE, Ed. London UK: Professional
Engineering Publishing Ltd, 2000, pp. 45-80.
32. V. Narayan, J. W. Wardhaugh, and M. C. Das, 100 Years in Maintenance and
Reliability; Practical lessons from three lifetimes at Process Plants. New York:
Industrial Press, Inc., 2007.
33. H. P. Bloch and F. K. Geitner, An Introduction to Machinery Reliability
Assessment. New York, USA: Van Nostrand Reinhold, 1990.
34. J. Piper, "Motor Maintenance Matters,"
http://www.facilitiesnet.com/hvac/article/Motor-Maintenance-Matters--1917, 2003.
35. A. W. Moene, "Availability Assurance in Capital Investment Projects at Shell
Global Solutions International B.V.," Glasgow Caledonian University, 2000.
36. SINTEF, Offshore Reliability Data Handbook 4th Edition. Høvik: Det Norske
Veritas 2002.
37. EDF-DER/SPT, EIREDA European Industry Reliability Data Handbook. Varese:
C.E.C.-J.R.C./ICEI, 1991.
38. N. Marse, D. Soute, J. Kop, and C. F. H. Van Rijn, "Design for Reliability and
Availability," World Class Maintenance Consortium, Breda, The Netherlands,
2008.
39. Sipke E. van Manen and J. v. d. Bogaard, "Living PAM," in This book.
40. E. E. Lewis, Introduction to reliability engineering. Second edition. New York:
John Wiley & Sons, 1996.
41. R. B. Abernethy, The New Weibull Handbook. North Palm Beach, Florida, USA,
2006.
42. H. P. Bloch, "Use equipment failure statistics properly," Hydrocarbon Processing,
vol. 78, Jan. 1999.
43. A. K. S. Jardine and A. H. C. Tsang, Maintenance, Replacement and Reliability.
Boca Raton: CRC Press, 2006.
44. P. D. T. O'Connor, Practical Reliability Engineering. Chichester: John Wiley &
Sons, 2001.
45. V. Narayan, Effective Maintenance Management; Risk and Reliability Strategies
for Optimizing Performance. New York: Industrial Press Inc, 2004.
46. H. Procaccia, Guidebook on the effective use of safety and reliability data. Paris:
SFER, 1995.
47. ESReDA, Handbook on Quality of Reliability Data. Hovik: Det Norske Veritas,
1999.
48. J. P. Signoret, "High Integrity Protection Systems (HIPS) – Making SIL
Calculations Effective," Touch Oil and Gas, 2007.
49. Anon, "Application of IEC 61508 and IEC 61511 in the Norwegian Petroleum
Industry," OLF, 2001.
50. G. Gigerenzer, Reckoning with risk. London: Penguin Books, 2002.
51. B. S. Blanchard, Maintainability, A key to effective serviceability and
maintenance management. New York: John Wiley & Sons, 1995.
52. Anon, "Pernis zet ballast overboord (Pernis puts dead weight overboard)," Shell
Venster (in Dutch), pp. 14-16, November / December 2009.