
Reliability Engineering and System Safety 76 (2002) 319–325

www.elsevier.com/locate/ress

Stopping time optimisation in condition monitoring


Tony Rosqvist*
VTT Automation, Industrial Automation, P.O. Box 1301, 02044 VTT, Helsinki, Finland
Received 24 November 2001; accepted 9 February 2002

Abstract

Automated condition monitoring of active components of a system can improve the cost-efficiency of preventive and corrective maintenance and the availability of the production system. The validity, reliability and correct interpretation of the signals obtained from the condition monitoring instrumentation are important for the realisation of the potential benefits. The utilisation of experts in the interpretation of the condition monitoring signals is therefore crucial. In the paper, a stopping time model is formulated, where experts' judgements on the remaining operating time of a component, given an indication of incipient failure, are utilised to arrive at optimal operational maintenance decisions. Optimality is defined in the sense of maximising expected utility. An expert model is also formulated, which utilises percentile information elicited from the experts. The modelling framework allows for the testing of different modelling assumptions which affect the decision outcomes. © 2002 Elsevier Science Ltd. All rights reserved.
Keywords: Condition monitoring; Stopping time optimisation model; Expert judgement; Predictive maintenance

* Fax: +358-9-456-6752. E-mail address: tony.rosqvist@vtt.fi (T. Rosqvist).

1. Introduction

Short-term operational maintenance decision-making based on condition monitoring has to take into consideration the predictive power of the observations made. Basically, observations can be made through process and condition monitoring. In particular, automated physical measurement of the condition of equipment has lately been increasingly implemented for predictive maintenance. In this case, the validity, reliability and correct interpretation of the sensor readings are essential for the benefits of condition monitoring to be realised. The signal validation project [1] in the Halden Reactor Project develops on-line methods to verify the correctness of the signals received from the sensors. Wrong judgements from inspections and tests, or wrong interpretation of measurements due to faulty or miscalibrated instrumentation, might lead to operative decisions which are non-optimal in terms of costs due to excessive production losses or repair. Operational maintenance decisions also have safety implications in many cases, stressing the importance of proper decision-making.

In the case of automated condition monitoring, we are confronted with the operational maintenance decision problem of continuing operation with a degradation in the active equipment or component, given that an indication of an incipient failure has been received from the sensors. We can distinguish between outcomes of a certain operational maintenance decision as shown in Table 1. The basic temporal realisations of the functional breakdown of a component, denoted by $t_1$ and $t_2$, are shown in Fig. 1 together with the time points related to the decision options.

The short-term evaluation of the decision outcomes described above does not take into account the long-term 'costs' of the different outcomes on safety and work culture, such as compliance with safety rules and adherence to the quality of maintenance work. These long-term aspects also have to be considered in the decision-making and are usually expressed as design and operative constraints of systems [2].

Ideally, we can argue that from the point of view of the owners of a production system, the optimal design of a system or a component is such that its lifetime coincides with the scheduled time for replacement or overhaul. Roughly speaking, a system or a component should be good enough to perform as planned (planned output/input and lifetime), but not better (waste of resources). By introducing proper condition monitoring, the production personnel/manager can obtain information about unexpected discrepancies in the condition of the components of a system and make risk-informed maintenance decisions given this information. The rationale is that the cost induced

0951-8320/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0951-8320(02)00026-1


Nomenclature

$s$: operating time
$s_0$: time of incipient failure indication
$\tau$: stopping time, control variable
$\tau^*$: optimal stopping time
$L$: time of scheduled shutdown
$J(\tau)$: profit function
$C$: cost, random variable
$c$: a realisation of $C$
$c_u$: upper bound of $C$
$c_l$: lower bound of $C$
$r$: constant income rate
$T$: failure time, unknown but observable random variable
$X$: $(n \times 1)$ random vector of expert judgements on $T$
$x$: $(n \times 1)$ vector $x = (x_1, \ldots, x_n)$ as elicited from the experts, a realisation of $X$
$y$: $(n \times 1)$ vector $y = (y_1, \ldots, y_n)$ of measures of uncertainty as elicited from the experts
$z$: $(n \times 2)$ vector of expert judgements; $z = (x, y)'$; $z_i = (x_i, y_i)$ is the $i$th element
$v(\tau \mid z)$: value of $\tau$ given information $z$
$u(J)$: utility of receiving $J$
$\alpha, \beta, \gamma$: positive constants specifying the utility function $u(\cdot)$
$f_{T|z}(t \mid z)$: probability density of $T$ conditional on $z$
$F_{T|z}(t \mid z)$: cumulative probability function of $T$ conditional on $z$
$F_C(c)$: cumulative probability function of $C$
$n$: the number of experts
$X_i$: random variable representing expert $i$'s judgement on the median of $T$
$x_i$: a realisation of $X_i$
$\Phi_i$: a random variable representing the bias and spread of expert $i$'s judgements
$\sigma_i$: standard error of the logarithm of expert $i$'s judgements
$\Sigma_z$: the covariance matrix of experts' judgements on $T$
$a_{ij}$: elements of the inverse matrix $\Sigma_z^{-1}$
$\rho$: correlation coefficient of experts' judgements ($n = 2$)
$T_{.50}$: the prior estimate of the median of the failure time
$T_{.95}$: the prior estimate of the 95%-percentile of the probability distribution of $T$
by implementing condition monitoring is outweighed by the benefits of the information that it produces. The assessment of the validity of this rationale is a task related to long-term maintenance planning [3].

In the following, we will utilise expected utility (EU) theory in the formulation of a stochastic stopping time model [4]. The stopping time model is utilised for operative maintenance decision-making when an incipient failure has been detected. This entails the modelling of the decision-maker's subjective risk attitude, on the one hand, and the modelling of experts' judgements on the failure time, given the incipient failure indication, on the other hand. Expert judgement will be used in a direct way: experts are asked to provide percentile information on the residual lifetime of the component given the indication of incipient failure. Thus, the modelling will focus on the use of expert judgement rather than on physical degradation processes [5]. The optimal maintenance decision, in terms of the basic (generic) operational maintenance decision options in Table 1, will be concluded on the basis of the optimal stopping times derived from the stopping time model. Optimality is defined in the sense of maximising EU [6].

In Section 2, the operational maintenance decision problem is formulated as a stopping time optimisation problem. Section 3 describes a probabilistic expert judgement model. Emphasis is put on modelling dependence between these judgements by specifying a probabilistic model based on the multivariate normal data model. Section 4 demonstrates the stopping time model through an example and interprets the results in terms of the basic operational maintenance decision options.

Fig. 1. Possible failure times $t_1$ and $t_2$ together with time points related to maintenance decision options. The detection time of an incipient failure is denoted by $s_0$. [Figure: level of degradation against operating time, with the level of functional breakdown and the points of immediate, post-planned and scheduled shutdown marked.]

2. Stopping time model

2.1. Stopping time

Basically, we ask how the stopping time $\tau$ should be determined given an incipient failure indication at operating time $s = s_0$. The theoretical domain of $\tau$ is the positive real line $[0, \infty)$, where the starting point coincides with the operating time $s_0$. If an optimal stopping time $\tau^*$ is found, then the following decision rule is applicable:

$$\begin{cases} \text{if } \tau^* \ge L - s_0: & \text{continue operation until } s = L \\ \text{else:} & \text{continue operation until } s = s_0 + \tau^* \end{cases} \tag{1}$$
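The decision rule of Eq. (1) can be sketched in a few lines of Python. This is only an illustration: the function name `operate_until` is hypothetical, and the numbers in the usage lines are the values $s_0 = 10$, $L = 60$ and the expected-profit stopping times reported in Section 4 and Table 5.

```python
def operate_until(tau_opt: float, s0: float, L: float) -> float:
    """Operating time s at which to stop, following the decision rule of Eq. (1).

    tau_opt is the optimal stopping time measured from the indication time s0;
    L is the operating time of the scheduled shutdown.
    """
    if tau_opt < 0:
        raise ValueError("stopping time must be non-negative")
    if L < s0:
        raise ValueError("scheduled shutdown must not precede the indication")
    if tau_opt >= L - s0:
        return L              # run to the scheduled shutdown
    return s0 + tau_opt       # post-planned shutdown before L


if __name__ == "__main__":
    print(operate_until(54.75, s0=10.0, L=60.0))  # case A, EP criterion
    print(operate_until(28.76, s0=10.0, L=60.0))  # case B, EP criterion
```

The rule simply caps the stopping time at the remaining operating period $L - s_0$, which is why the relevant domain of $\tau$ is a bounded interval.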

Table 1. Operational decision outcomes related to prime costs given incipient failure indication

| Short-term operational maintenance decision | Failure develops into functional breakdown | Failure does not develop into functional breakdown |
|---|---|---|
| Immediate shut-down | - | High cost (prompt production loss and high extra maintenance cost) |
| Run to post-planned shut-down | $t_1$: Very high cost (production loss and repair cost) | Moderate cost (planned production loss and extra maintenance cost) |
| Run to next scheduled shut-down | $t_2$: Very high cost (production loss and repair cost) | No extra cost |

where $L$ denotes a pre-defined constraint, e.g. an overhaul is planned at operating time $s = L$. Thus, the relevant domain of $\tau$ is $\tau \in [0, L - s_0]$. The stopping time $\tau$ is basically a control variable that we want to optimise. The basis for the optimisation is derived from long-term strategy, formulated in terms of expected profit (EP) and EU, discussed next.

2.2. Profit function

The profit function has two parts: (1) income from the production and (2) costs related to discrete events and decisions (e.g. a decision to inspect after an indication of incipient failure). Income is considered to flow at a constant rate $r > 0$ when the system is operating. Costs are usually related to the occurrence of functional breakdowns during operation before the pre-planned overhaul. Basically, we can distinguish between two realisations of the profit function depending on the realised failure time $t$ of the component, given that an incipient failure indication was observed at time $s_0$ and the stopping time is $\tau$:

$$J(\tau) = \begin{cases} r(s_0 + \tau) & \text{if } \tau < t \\ r(s_0 + t) - c & \text{if } t \le \tau \end{cases} \tag{2}$$

where $c$ is a realisation of the random cost variable $C$ related to the functional breakdown.

2.3. Maximum expected utility

According to EU theory, we can associate a utility function $u(\cdot)$ with the profit function such that the risk attitude of the decision-maker is explicitly modelled. The value associated with stopping time $\tau$ is to be represented by the conditional EU as

$$v(\tau \mid z) = E[u(J(\tau)) \mid z] \tag{3}$$

where we have conditioned the EU on the expert judgements $z$. These judgements are elicited at the time of the occurrence of the incipient failure indication. In particular, we will discuss means to utilise expert judgements in the assessment of the probability distribution of the failure time $T$. The stopping time optimisation problem is now to choose $\tau$ such that the expected lifetime utility of the system considered is maximised, i.e. the optimal stopping time

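problem can be stated as

$$\max_{\tau}\; v(\tau \mid z) \quad \text{s.t. } \tau \in [0, L - s_0] \tag{4}$$

The analytic form of the EU in Eq. (3) can be written as

$$E[u(J(\tau)) \mid z] = \int_0^{\tau} f_{T|z}(t \mid z) \int_{c_l}^{c_u} u\big(r(s_0 + t) - c\big)\, dF_C(c)\, dt + \big(1 - F_{T|z}(\tau \mid z)\big)\, u\big(r(s_0 + \tau)\big) \tag{5}$$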
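The expected utility of Eq. (5) can be approximated by plain Monte Carlo sampling, which may help in checking a numerical implementation. The sketch below is an assumption-laden illustration, not the method used in the paper: it assumes a lognormal failure time (as in Section 3) and a triangular cost distribution (as in Section 4), and the names `profit` and `expected_utility` as well as the parameter values in the usage lines are hypothetical.

```python
import random


def profit(tau, t, c, s0, r):
    """Realised profit J(tau) of Eq. (2) for failure time t and breakdown cost c."""
    if tau < t:
        return r * (s0 + tau)   # stopped before the functional breakdown
    return r * (s0 + t) - c     # breakdown occurred at t <= tau


def expected_utility(tau, u, mu, sigma, s0, r,
                     cost_low, cost_mode, cost_high,
                     n_samples=20000, seed=0):
    """Monte Carlo estimate of E[u(J(tau)) | z].

    Assumes ln T ~ N(mu, sigma^2) and C ~ Triangular(cost_low, cost_mode, cost_high).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.lognormvariate(mu, sigma)
        # random.triangular takes (low, high, mode) in that order
        c = rng.triangular(cost_low, cost_high, cost_mode)
        total += u(profit(tau, t, c, s0, r))
    return total / n_samples


if __name__ == "__main__":
    # Illustrative (hypothetical) posterior parameters; identity utility gives EP.
    for tau in (10.0, 30.0, 50.0):
        ep = expected_utility(tau, lambda x: x, mu=4.0, sigma=0.3, s0=10.0, r=1.0,
                              cost_low=15.0, cost_mode=20.0, cost_high=40.0)
        print(tau, round(ep, 2))
```

Passing the identity function as `u` yields the expected profit, so the same routine covers both decision criteria; a fixed seed keeps estimates for different values of `tau` comparable.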

Eq. (5) contains subjective values of the decision-maker, as expressed by the utility function $u(\cdot)$, and expert judgements on failure time, as expressed by the conditional probability density of the failure time $f_{T|z}(t \mid z)$ and its cumulative form $F_{T|z}(t \mid z)$. These conditional probability functions will be interpreted as posterior probability distributions. For simplicity, we will assume that the probability distribution $F_C(c)$ and the lower and upper limits, $c_l$ and $c_u$, are known.

2.4. Utility function

We will assume that the utility function $u(x)$ satisfies the following properties:

- monotonically increasing with profit $x$, $x \ge 0$, i.e. $u'(x) > 0$ ($'$ denotes taking the derivative)
- decreasingly risk averse, i.e. $u'''(x) > 0$

These properties can be argued to be valid in most economic decision-making contexts. Several function families satisfy the above properties [6]. Here it suffices to consider the linear-exponential utility function given by

$$u(x) = \alpha + \beta x - e^{-\gamma x} \tag{6}$$

where $\alpha$, $\beta$, $\gamma$ are arbitrary positive constants. It is easy to verify that the above properties are met. Basically, the parameters are uniquely determined by eliciting preference information from the decision-maker [6]. It is, however, argued that such information is unstable over time and difficult to elicit at the point where decisions should be made. It is therefore worthwhile to study the impact of uncertainty in the parameters of the utility function as well. This is demonstrated in the simulation example in Section 4.
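As a quick check of the stated properties, the linear-exponential utility can be written out together with its analytic derivatives. The sketch assumes the form $u(x) = \alpha + \beta x - e^{-\gamma x}$, which is the reading consistent with the scaling conditions used in Section 4; the function names are hypothetical.

```python
import math


def linear_exponential_utility(x, alpha, beta, gamma):
    """u(x) = alpha + beta*x - exp(-gamma*x), assumed form of Eq. (6)."""
    return alpha + beta * x - math.exp(-gamma * x)


def utility_derivatives(x, beta, gamma):
    """Analytic first, second and third derivatives of u at x."""
    e = math.exp(-gamma * x)
    return beta + gamma * e, -gamma ** 2 * e, gamma ** 3 * e


if __name__ == "__main__":
    d1, d2, d3 = utility_derivatives(20.0, beta=0.01, gamma=0.03)
    # For positive beta and gamma: u' > 0, u'' < 0 and u''' > 0 at every x.
    print(d1 > 0, d2 < 0, d3 > 0)
```

With positive $\beta$ and $\gamma$, the first derivative is a sum of positive terms and the third derivative is a positive multiple of $e^{-\gamma x}$, so both listed properties hold for all $x$.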


3. Model for expert judgement

3.1. Basic model for expert judgement

Combining the judgements of several unbiased experts adds to the precision of the failure time estimate. This effect is, however, weakened by the fact that experts' judgements are dependent. Independent judgements would mean that the experts would not share any available information at all. A common pool of information, however, typifies expertise. Therefore, it is expected that experts interpret an incipient failure indication similarly and, consequently, there should be a positive correlation between their judgements.

First, we will define a probabilistic expert judgement model, which relates the experts' judgements $x_i$ to the true, but unknown, failure time $t$. The basic expert judgement model is depicted by the influence diagram in Fig. 2. The random variables $\Phi_i$, $i = 1, \ldots, n$, are interpreted as expert-specific 'noise' parameters, which reflect the performance of the experts in terms of bias and spread [7]. The probability model of the influence diagram in Fig. 2 can be written as

$$p_{T, X_1, \ldots, X_n, \Phi_1, \ldots, \Phi_n}(t, x_1, \ldots, x_n, \phi_1, \ldots, \phi_n) = p_T(t) \prod_{i=1}^{n} p_{\Phi_i}(\phi_i) \prod_{i=1}^{n} p_{X_i}(x_i \mid t, \phi_i) \tag{7}$$

The last product term is interpreted as the likelihood, and the expression in front of the likelihood as the prior. The experts' judgements are modelled as stochastically independent, given $t$ and $\phi_i$. Basically, any probability distribution with an appropriate domain would do as a specification for the likelihood function. Generally, the likelihood function would require the use of acceptance-rejection sampling of the likelihood function. Next, we specify our expert judgement model on the basis of the multivariate normal data model. This will lead to analytically tractable equations when additional assumptions are made regarding the noise parameter $\Phi_i$.

Fig. 2. The influence diagram of the basic expert judgement model. [Diagram: nodes $X_1$, $X_2$, ..., $X_n$ and $T$.]

3.2. A specialisation of the basic expert judgement model

Each expert's judgement is defined by

$$X_i = t\,\Phi_i, \qquad t, X_i, \Phi_i \in (0, \infty), \qquad i = 1, \ldots, n \tag{8}$$

Taking the logarithm of Eq. (8), i.e.

$$\ln X_i = \ln t + \ln \Phi_i \tag{9}$$

suggests a specification based on the normal distribution such that we may assume the logarithm of the experts' judgements $X_i$ to be normally distributed conditional on $t$. If no evidence of the performance of the expert is available, it is natural to assume that the bias of each expert is zero. Therefore, we can define the noise parameter of the basic expert judgement model in Fig. 2 as

$$\ln \Phi_i \sim N(0, \sigma_i^2) \tag{10}$$

To define the parameter $\sigma_i^2$ we can ask the experts to give their judgement on the $q$-percentile $y_i$ depicting the uncertainty of their judgements. An expert who is more vague would be associated with a 'noise' parameter with a larger $\sigma_i^2$. Thus, the uncertainty they feel about their judgements is reflected in the fixed parameters $\sigma_i^2$, derived from the information $z_i = (x_i, y_i)$. In the following, we will assume that $0 < \sigma_i < \infty$ for reasonable self-assessments. A practical procedure is described in Section 4.

This approach is unorthodox in the sense that we ask experts for information that we use to specify a probabilistic model of their primary judgements: the point estimates $x_i$, interpreted as the median of the probability distribution related to $X_i$. This is, however, motivated by practical reasons, as we generally do not have any track-record data available that would determine the performance of the experts more objectively. The expert model based on normality assumptions provides means to incorporate correlation in a straightforward way.

Using vector notation we can associate a multivariate normal data model with the logarithms of the experts' judgements, i.e.

$$\ln X \sim N(X \mid tI, \Sigma_z) \tag{11}$$

where $\Sigma_z$ is the covariance matrix of the experts' judgements containing the correlation $\rho_{ij}$ in the off-diagonal elements, and $I$ is the identity matrix. The subscript $z$ in $\Sigma_z$ emphasises the fact that the covariance matrix is defined by experts' judgements. The multivariate normal distribution $N(tI, \Sigma_z)$ is now a specification for the likelihood in Eq. (7). The expression for the prior in Eq. (7) is now reduced to the prior distribution related to $t$. Thus, the unnormalised posterior (predictive)

Table 2. Expert judgements specifying expert-specific probability distributions and the parameter values of the respective distributions

| Expert | $x_i$ | $y_i$ | $\mu_i$ | $\sigma_i$ |
|---|---|---|---|---|
| 1 | 20 | 50 | 3 | 0.47 |
| 2 | 30 | 60 | 3.4 | 0.35 |

Table 4. Two cases of different correlation between the experts' judgements

| Case | $\rho$ |
|---|---|
| A | 0 |
| B | $\sim$ Uniform(0.5, 1) |

distribution of the failure time is

$$p_{T|z}(t \mid z) \propto p_T(t)\, N(x \mid tI, \Sigma_z) \tag{12}$$

If the prior distribution of $\ln T$ is assumed to be normal, i.e.

$$p_T(t):\ \ln T \sim N(\mu_T, \sigma_T^2) \tag{13}$$

then, by conjugacy, the posterior predictive distribution is also normal [8]. It can be shown that the posterior predictive distribution is defined by

$$\begin{aligned} p_{T|z}(t \mid z):\ \ln T &\sim N\big(\mu_T(z), \sigma_T^2(z)\big), \\ \mu_T(z) &= \frac{\mu_T/\sigma_T^2 + \tfrac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}(z_i, z_j)\,\ln(x_i x_j)}{1/\sigma_T^2 + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}(z_i, z_j)}, \\ \sigma_T^2(z) &= \left(1/\sigma_T^2 + \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}(z_i, z_j)\right)^{-1} \end{aligned} \tag{14}$$

where $a_{ij}(\cdot)$ are the elements of the inverse matrix $\Sigma_z^{-1}$. The details of the derivation of Eq. (14) can be obtained from the author. The posterior probability distribution related to the cost variable can be derived using the above expert model if the domain of $C$ is defined in the same way as that of $T$.
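The conjugate update behind Eq. (14) is a precision-weighted combination of the prior and the experts' log-medians, and it can be sketched without external libraries. The code below is an illustration under assumptions: a single common correlation `rho` between all experts, a small Gauss-Jordan routine for $\Sigma_z^{-1}$, and hypothetical function names. The inputs in the usage lines are the values of Tables 2 and 3.

```python
import math


def covariance(sigmas, rho):
    """Covariance matrix with entries rho*sigma_i*sigma_j off the diagonal."""
    n = len(sigmas)
    return [[sigmas[i] * sigmas[j] * (1.0 if i == j else rho) for j in range(n)]
            for i in range(n)]


def invert(m):
    """Gauss-Jordan inverse of a small, non-singular square matrix."""
    n = len(m)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda k: abs(a[k][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [v / p for v in a[col]]
        for k in range(n):
            if k != col:
                f = a[k][col]
                a[k] = [v - f * w for v, w in zip(a[k], a[col])]
    return [row[n:] for row in a]


def posterior(mu_prior, sigma_prior, medians, sigmas, rho):
    """Posterior mean and standard deviation of ln T given expert medians."""
    a = invert(covariance(sigmas, rho))
    n = len(medians)
    logs = [math.log(x) for x in medians]
    sum_a = sum(a[i][j] for i in range(n) for j in range(n))
    weighted = 0.5 * sum(a[i][j] * (logs[i] + logs[j])
                         for i in range(n) for j in range(n))
    precision = 1.0 / sigma_prior ** 2 + sum_a
    mean = (mu_prior / sigma_prior ** 2 + weighted) / precision
    return mean, math.sqrt(1.0 / precision)


if __name__ == "__main__":
    for rho in (0.0, 0.9):
        m, s = posterior(4.79, 0.35, [20.0, 30.0], [0.47, 0.35], rho)
        # Mean of the lognormal failure time implied by (m, s)
        print(rho, round(m, 3), round(s, 3), round(math.exp(m + s * s / 2), 1))
```

Increasing `rho` lowers the total precision contributed by the experts, so the posterior spread grows and the prior carries more weight, which is the effect discussed in the simulation example.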

4. Simulation example

Consider a stopping time optimisation problem for an active component whose automated condition monitoring system indicates an incipient failure at time $s_0 = 10$ and the scheduled shutdown is at time $L = 60$. Two experts are asked to give their judgement about the failure time $T$, which is measured from the time of indication. Table 2 shows the judgements given as percentiles related to the expert-specific probability distributions. The percentile points used are the 50%- (median) and the 95%-points (others could also be selected). These two percentile points define the expert-specific probability distributions uniquely.

The prior probability distribution of the failure time corresponds to possible probabilistic design specifications. For instance, it would be plausible that initially the active component is assumed to function properly to the next shutdown with probability 0.95, but for a time interval corresponding to two shutdown periods only with probability 0.5. Table 3 shows the percentile points and the corresponding distribution function parameter values. It has to be noted that this prior knowledge has to be adjusted to reflect the fact that the particular active component has survived to time $s_0$. Basically, this entails the multiplication of the initial prior probability distribution by a constant factor. In Monte Carlo simulation such a scaling does not, however, affect the calculation of the posterior distribution, as depicted by Eq. (12).

In the following, two cases are distinguished based on whether we assume the experts to be uncorrelated or positively correlated. Table 4 defines the two cases. The correlation coefficient is denoted by $\rho$. In case A, we have defined the experts' judgements to be uncorrelated, i.e. $\rho_{12} \equiv \rho = 0$. In case B, we assume the judgements to be uncertain, but positively correlated with $\rho$ uniformly distributed over [0.5, 1]. An overlay of the prior and the posterior probability distributions, with $\rho = 0$ and $\rho = 0.9$, is shown in Fig. 3.

Table 3. Percentile points characterising the prior probability distribution of the failure time

| $T_{.05}$ | $T_{.50}$ | $\mu_T$ | $\sigma_T$ |
|---|---|---|---|
| 60 | 120 | 4.79 | 0.35 |

Fig. 3. Prior and posterior (case A and B) cumulative probability distributions. [Plot: cumulative probability against failure time $t$ from 0 to 200 for two correlation coefficients; legend: prior, mean = 127.7; $\rho = 0$, mean = 34.6; $\rho = 0.9$, mean = 66.6.]
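Two percentile points fix a lognormal distribution because $\mu = \ln(\text{median})$ and $\sigma = (\ln p_q - \ln \text{median}) / z_q$, where $z_q$ is the standard normal $q$-quantile. The sketch below shows this conversion; the function name is hypothetical. As a purely numerical observation, not something the source states, the tabulated $\sigma$ values in Tables 2 and 3 are reproduced with $z \approx 1.96$ (the 97.5% and 2.5% quantiles), whereas the 95% quantile $z \approx 1.645$ gives larger values.

```python
import math
from statistics import NormalDist


def lognormal_from_percentiles(median, point, q):
    """Return (mu, sigma) of ln T from the median and one further q-percentile point."""
    if median <= 0 or point <= 0:
        raise ValueError("percentile points must be positive")
    if not 0.0 < q < 1.0 or q == 0.5:
        raise ValueError("q must lie in (0, 1) and differ from 0.5")
    z_q = NormalDist().inv_cdf(q)   # standard normal quantile
    sigma = (math.log(point) - math.log(median)) / z_q
    if sigma <= 0:
        raise ValueError("percentile point lies on the wrong side of the median")
    return math.log(median), sigma


if __name__ == "__main__":
    for q in (0.95, 0.975):
        print(q, lognormal_from_percentiles(20.0, 50.0, q))   # expert 1
        print(q, lognormal_from_percentiles(30.0, 60.0, q))   # expert 2
    # Prior of Table 3: lower percentile point 60, median 120
    print(lognormal_from_percentiles(120.0, 60.0, 0.05))
    print(lognormal_from_percentiles(120.0, 60.0, 0.025))
```

The same function handles points below the median, since both the log-difference and the quantile are then negative.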

Table 5. Optimal stopping times $\tau^*$ with respect to decision criteria EP and EU in case A and B

| Decision criterion | Case A | Case B |
|---|---|---|
| EP | 54.75 | 28.76 |
| EU | 48.81 | 26.72 |

Fig. 3 shows how the positive correlation between the experts' judgements affects the spread of the posterior probability distribution. The expected value (or mean value) of the failure time $T$ is well below the remaining operating time $L - s_0 = 50$ until shutdown for $\rho = 0$, whereas for $\rho = 0.9$ it is well above. This holds for all $\rho$ values in the sampling range [0.5, 1].

The cost variable $c$ in Eq. (2) is assumed to be Triangular(15, 20, 40) distributed. The constant income rate is $r = 1$. The utility function $u(x)$ in Eq. (6) is scaled such that $u(0) = 0$ and $u(L) = 1$. Given the risk aversion properties required of the utility function (see Section 2.4), it is easy to show that $\alpha = 1$, $\gamma = -(1/L)\ln(\beta L)$ and $\beta \in (0, 1/L)$ ($\alpha$, $\beta$, $\gamma$ are arbitrary positive constants). If the exact form of the utility function of the decision-maker is unknown, it would be reasonable to assess the sensitivity of the optimal stopping times with respect to some utility functions representing a decreasingly risk-averse attitude. Risk neutrality would be a suitable reference point and is obtained for $\beta = 1/L$. In the example, we calculate the optimal stopping times for $\beta = 1/L$ and $\beta = 1/(10L)$.

The optimisation problem of Eq. (4) was solved by RiskOptimizer [9], which utilises genetic algorithms in the search for the optimal value of the stopping time, $\tau^*$. The optimisation of $\tau$ in Eq. (4) was conducted with respect to EU and EP, where the latter corresponds to risk neutrality of the decision-maker. Table 5 shows the results obtained for both decision criteria.

In case A, we notice that the optimal stopping time, with respect to both decision criteria, is larger than or very close to the operating time left, $L - s_0 = 50$, until scheduled shutdown. Therefore, to be consistent with our planned preventive maintenance program, i.e. our long-term strategy, we should follow the decision rule in Eq. (1), i.e. set $\tau = 50$, and continue operation according to the originally planned operating period, i.e. until the scheduled shutdown at $s = L$, if we are risk neutral. If we are risk-averse, the optimal stopping time is so close to the time for shutdown that we will opt for running to shutdown. This is due to the fact that any post-planned shutdown is subject to uncertainties that are not included in the model. It has to be remembered that the results of the stopping time model are only prescriptive.

Table 6 shows the optimal and adjusted stopping times (adjusted values are flagged in the table) and the corresponding outcome values in terms of EP and EU. The cumulative distributions of profit and utility values, corresponding to the values in Table 6, are shown in Figs. 4 and 5. It should be noted that the jumps in the cumulative distributions for profit occur at $s_0 + \tau^*$ in both cases.

The results in Table 6 directly suggest time points for the post-planned shutdown in case B, where we model the experts' judgements $x_i$ as independent. The optimal stopping times suggest a post-planned shutdown sometime within the range [25, 29] for a risk-averse or risk-neutral decision-maker. The different maintenance decision outcomes in cases A and B clearly show the importance of understanding the effects of the assumptions made in applying the stopping time model.

Table 6. EP and EU in cases A and B

| Case | $\tau^*$ (EP criterion) | EP | $\tau^*$ (EU criterion) | EU |
|---|---|---|---|---|
| A | 50.00 (adjusted) | 53.2 | 50.00 (adjusted) | 0.93 |
| B | 28.76 | 34.4 | 26.72 | 0.78 |

Fig. 4. Probability distributions of profit for cases A and B. [Plot: cumulative probability distribution of profit against profit value; legend: case A, mean = 53.2; case B, mean = 34.4.]

Fig. 5. Probability distributions of utility for cases A and B. [Plot: cumulative probability distribution of utility against utility value; legend: case A, mean = 0.93; case B, mean = 0.78.]
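The scaling of the utility function used in the example, $u(0) = 0$ and $u(L) = 1$, can be sketched as follows. The code assumes the linear-exponential form $u(x) = \alpha + \beta x - e^{-\gamma x}$; the function names are hypothetical, and $\beta = 1/L$ is treated as the risk-neutral limiting case in which $\gamma = 0$ and the utility becomes linear.

```python
import math


def calibrate_gamma(beta, L):
    """gamma = -(1/L) * ln(beta * L), from the condition u(L) = 1 with alpha = 1."""
    if L <= 0:
        raise ValueError("L must be positive")
    if not 0.0 < beta <= 1.0 / L:
        raise ValueError("beta must lie in (0, 1/L]")
    return -math.log(beta * L) / L


def scaled_utility(x, beta, L):
    """Utility with alpha = 1, so that u(0) = 0 and u(L) = 1."""
    gamma = calibrate_gamma(beta, L)
    return 1.0 + beta * x - math.exp(-gamma * x)


if __name__ == "__main__":
    L = 60.0
    for beta in (1.0 / L, 1.0 / (10.0 * L)):
        # Utility at the scheduled shutdown and at half the operating period
        print(beta, scaled_utility(L, beta, L), scaled_utility(30.0, beta, L))
```

Smaller values of $\beta$ give a larger $\gamma$ and hence a more strongly curved utility, which is one way to probe the sensitivity of the optimal stopping time to the risk attitude.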


5. Conclusions

A stopping time optimisation problem is formulated, with a focus on the role of expert judgements and the dependence between the judgements. Optimal stopping times are derived by maximising EP and EU. A decreasingly risk-averse attitude of the decision-maker is studied. Experts' judgements about failure time are modelled using a multivariate normal data model. This allows for explicit specification of correlation between the judgements. The following effects on the decision criteria 'EP' and 'EU' and the optimal stopping times can be studied using the described stopping time optimisation model:

- the effect of dependence between experts' judgements, specifically the uncertainty related to the correlation between experts' judgements based on the multivariate normal model;
- the effect of the decision-maker's risk attitude, specifically decreasing risk aversion;
- the effect of the prior probability distribution of the failure time.

The implications of the calculated optimal stopping time for operative maintenance decision-making have to be concluded from a wider perspective than that of the stopping time optimisation model. For instance, are maintenance resources available at the time prescribed by the model? Such considerations may influence the operative maintenance decision-making as much as the results obtained from the stopping time optimisation model.

The effects listed above are intertwined in a complex way. The most challenging and critical modelling task is the definition of the prior distribution of the failure time. Loosely speaking, the mean of the posterior distribution of the logarithm of the failure time (see Eq. (14)) is a weighted average of the prior mean and the experts' judgements on the median of the failure time. The weighting factors are proportional to the inverse of the variances and the covariances. If the experts are few, vague and correlated, and the prior is informative (i.e. of high precision), the effect of the experts' judgements is easily muted. Therefore, it is advisable to define the prior probability distribution to express vague rather than overly precise information.

The expert judgement model could be criticised for the use of expert judgements to specify the probability model related to the point estimate representing their 'best guess' of the unknown failure time. It is, however, the experience of the author that experts are more willing to express their judgements in a form that reflects their uncertainty on the outcome of an event rather than just point estimates. Therefore, expert judgement models that incorporate e.g. percentile information, such as the expert judgement model described in the paper, should be developed and used.

Acknowledgements

The author wants to express his gratitude to Dr Jan-Erik Holmberg and Dr Kari Laakso for their comments and advice that helped to improve the paper.

References

[1] Fantoni PF. On-line calibration monitoring of process instrumentation using PEANO. Chicago: EPRI Instrument and Calibration Users Group, ANL, November 8–9, 2000.
[2] Laakso K, Sirola M, Holmberg J. Decision modelling for maintenance and safety. Int J Comadem (Cond Monit Diag Engng Mgmt) 1999;2(3):13–17.
[3] Laakso K, Skogberg P. Evaluation of maintenance strategies of technical systems. Enlarged HPG Meeting on Man-Machine Systems Research and High Burn-Up Fuel Performance, Safety and Reliability and Degradation of In-Core Materials and Water Chemistry Effects, HPR-352. Lyon: OECD Halden Reactor Project; 1999.
[4] Brémaud P. Point processes and queues. New York: Springer, 1981.
[5] Pulkkinen U. A stochastic model for wear prediction through condition monitoring. In: Holmberg K, Folkeson A, editors. Operational reliability and systematic maintenance, London: Elsevier, 1991. p. 223–43.
[6] Keeney R, Raiffa H. Decisions with multiple objectives. Cambridge: Cambridge University Press, 1993.
[7] Chhibber S, Apostolakis G, Okrent D. A taxonomy of issues related to the use of expert judgements in probabilistic safety studies. Reliab Engng Syst Safety 1992;38:27–45.
[8] Gelman A, Carlin J, Stern H, Rubin D. Bayesian data analysis. New York: Chapman & Hall, 1995.
[9] RiskOptimiser - simulation optimization for Microsoft Excel, Windows version (release 1.0). Newfield, NY: Palisade Corporation; 2000.
