Value of Information for Complex Cost-effectiveness Models

Jeremy E. Oakley
Department of Probability and Statistics, University of Sheffield, Sheffield, S3 7RH
j.oakley@sheffield.ac.uk

March 20, 2003

Summary

We show how to perform a sensitivity analysis on an economic model, within the framework of expected value of perfect information (EVPI), based on a relatively small number of model runs. The method is considerably more efficient than Monte Carlo methods, and is ideally suited to computationally expensive economic models: any model that requires a non-trivial computing time for one run at a single set of input parameters. The basis of the approach is the use of Gaussian processes to obtain fast approximations both to the model itself and to the expectations required for calculating EVPIs. We demonstrate the power of our method with a model for evaluating different treatment strategies for gastroesophageal reflux disease.

KEY WORDS: Bayesian quadrature, Gaussian process, partial expected value of perfect information.


1 Introduction

In assessing the cost-effectiveness of a proposed treatment, a common approach is to construct an economic model of the treatment process. The health status and resources used by a patient will be modelled over some period of time following treatment, perhaps incorporating costs such as additional hospital visits depending on the success of the treatment. Given a set of values for all the input parameters, the model will then state an overall measure of the efficacy of the treatment, for example the mean number of quality-adjusted life years (QALYs) for the patient population, and the mean cost of treatment. In this paper, we will suppose that the efficacy and cost are then combined to give a single net benefit of the treatment, so that the model gives a single output given a set of input values.

Typically, the model requires input parameters to be specified by the model user, some related to the effectiveness of the treatment and others to resources used by patients. Invariably, there is uncertainty surrounding the values that these parameters should take. Although some information about the values of the input variables will be available, from clinical trials and in some cases expert judgements, a clinical trial will only report an estimate of the effectiveness of a drug and not the true value for the population. This input uncertainty is handled by assigning (subjective) probability distributions to each unknown input parameter. Uncertainty in the input values then implies uncertainty in the output, the true net benefit of the treatment. Monte Carlo methods can then be used to sample from these input distributions, evaluate the economic model for each set of sampled inputs, and so obtain a sample from the distribution of the unknown true output.

Before making a decision on whether or not to adopt the new treatment in question, it may be of interest to investigate further the consequences of the model input uncertainty, and what effect each input (or combinations of inputs) has on the output uncertainty.
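The Monte Carlo propagation just described can be sketched in a few lines. The model, the input distributions and the monetary trade-off below are all invented for illustration; in practice the function would be the economic model itself.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical model: net benefit = lam * efficacy - cost, with lam an
# invented monetary value placed on one unit of efficacy.
def net_benefit(efficacy, cost, lam=250.0):
    return lam * efficacy - cost

# Subjective input distributions (illustrative choices only):
n = 10_000
efficacy = rng.beta(8, 2, size=n)                 # e.g. probability of symptom relief
cost = rng.gamma(shape=20.0, scale=10.0, size=n)  # e.g. treatment cost per patient

# A sample from the distribution of the unknown true output:
nb = net_benefit(efficacy, cost)
print(nb.mean(), nb.std())
```

Summaries of `nb` (its mean, spread, or full histogram) then describe the output uncertainty induced by the input uncertainty.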

For example, we may wish to know the effect of an uncertain input on the decision itself: would the same decision be made regardless of the value of the input (within some plausible range), or does the input uncertainty make it harder in some sense to choose the best treatment? A measure of importance of an uncertain variable is its partial expected value of perfect information (partial EVPI). Under the net-benefit framework, the partial EVPI quantifies exactly the (financial) value to the decision maker of learning the true numerical value of the uncertain parameter. This approach has been advocated in Felli and Hazen (1998) and Claxton (1999). Partial EVPIs can be estimated using Monte Carlo methods, but very large numbers (typically hundreds of thousands) of runs of the economic model are required. This currently restricts the partial EVPI methodology to simple models that can be evaluated essentially instantaneously. For more complex models, for example models that simulate events at the individual patient level rather than at the cohort level, an alternative to Monte Carlo is needed.

In this paper, we present a more efficient alternative. In the next section, we review the partial EVPI methodology, and how partial EVPIs can be computed using Monte Carlo methods. In section three, we present an efficient alternative, involving the use of Gaussian processes, that can reduce the number of runs of the model needed to compute partial EVPIs from hundreds of thousands to hundreds. This is done primarily through exploiting a feature common to most economic models: the net benefit is typically a smooth function of the model input parameters, and so running the model at one set of input parameters tells us something about what the net benefit is going to be for similar values of the input parameters. Our method is validated with an example in section four.

2 The expected value of perfect information

Suppose a decision-maker has to choose one treatment from a range of treatment options, and denote any particular choice by t. The net benefit of each treatment option is dependent on a set of input variables. There is

uncertainty about the true values of these input parameters, which we will denote by X. We will denote the value, or utility, of a particular treatment option t conditional on X by U(t, X). The decision maker then chooses the treatment option t to maximise their expected utility E_X{U(t, X)}. We can now define the expected utility of the optimum decision to be U*, where

U* = max_t E_X{U(t, X)}.    (1)

Now suppose that the decision maker decides that they will learn the value of X before making their decision. Once they have learnt X, their utility is then

max_t U(t, X),    (2)

and so their expected utility of learning X (i.e. before they find out what X actually is) is

E_X{max_t U(t, X)}.    (3)

The expected value of perfect information (EVPI) is then defined as the expected gain in utility:

E_X{max_t U(t, X)} − max_t E_X{U(t, X)}.    (4)

Now denote one of the uncertain input variables to be X_i. The same argument can be applied to derive the expected value of learning X_i before choosing the treatment option. Given X_i, we are still uncertain about the remaining input variables X_{−i}, and so we would choose the treatment option to maximise E_{X_{−i}|X_i}{U(t, X)}. The expected utility of learning X_i is then

E_{X_i}[max_t E_{X_{−i}|X_i}{U(t, X)}],    (5)

and so the expected gain in utility, the partial EVPI of X_i, is

E_{X_i}[max_t E_{X_{−i}|X_i}{U(t, X)}] − max_t E_X{U(t, X)}.    (6)

Both the EVPI and partial EVPI can be estimated using Monte Carlo integration.
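Equations (4) and (6) can be estimated by Monte Carlo as follows. The two-treatment utility and the normal input distributions below are hypothetical stand-ins for a real economic model, chosen so that the expectations are easy to check.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-treatment utility with independent inputs X1, X2 ~ N(0.5, 0.2^2);
# a cheap stand-in for the economic model.
def U(t, x1, x2):
    return 10 * x1 if t == 1 else 4 + 5 * x2

n = 200_000
x1 = rng.normal(0.5, 0.2, n)
x2 = rng.normal(0.5, 0.2, n)

# Baseline decision: max_t E_X{U(t, X)}.
baseline = max(U(1, x1, x2).mean(), U(2, x1, x2).mean())

# EVPI, equation (4): E_X{max_t U(t, X)} - max_t E_X{U(t, X)}.
evpi = np.maximum(U(1, x1, x2), U(2, x1, x2)).mean() - baseline

# Partial EVPI of X1, equation (6), by brute-force two-level Monte Carlo:
# for each sampled X1, an inner sample over X2 estimates max_t E_{X2|X1}{U(t, X)}.
outer = rng.normal(0.5, 0.2, 2_000)
inner = rng.normal(0.5, 0.2, 2_000)
m = np.array([max(10 * v, (4 + 5 * inner).mean()) for v in outer])
partial_evpi = m.mean() - baseline
print(evpi, partial_evpi)
```

Note the cost of the second calculation: the partial EVPI alone needed 2,000 × 2,000 = 4 million inner evaluations of the (here trivial) model, which is exactly the expense the rest of the paper avoids.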

The difficulty arises with the first term in equation (6), which we will write as

∫ m(X_i) dG(X_i),    (7)

with G(X_i) the distribution of X_i and

m(X_i) = max_t E_{X_{−i}|X_i}{U(t, X)}.    (8)

The integral (7) is one-dimensional, and so can be evaluated more efficiently using Simpson's rule, for example, rather than Monte Carlo. However, evaluating the integrand m(X_i) requires Monte Carlo integration, and a large sample size may be needed to reliably estimate which treatment has the largest expected utility. A small sample size may give a biased estimate of m(X_i), in which case the estimate of the partial EVPI will be biased regardless of how many values of X_i are used to evaluate (7). Evaluating m(X_i) accurately for a single value of X_i may take thousands of runs of the economic model, and so estimating a partial EVPI will be computationally demanding for any economic model that does not run effectively instantaneously.

3 Gaussian processes

A massive saving in computational effort can be made through the use of Gaussian processes. Specifically, we will employ an emulator. An emulator is a statistical model of the original economic model, which can then be used as a fast approximation. We will denote the economic model as a function f of its inputs x:

y = f(x).    (9)

If we evaluate the economic model at a particular choice of inputs x to observe y, then, assuming the function f is smooth in some sense, we are not then ignorant regarding the value of f(x′) for x′ close to x. It is this fact that we will exploit to speed up the computation. The emulator will take the form of a regression model, and any regression technique can be used, as in Coyle and Oakley (2002). Our preferred choice involves the use of Gaussian processes (O'Hagan, 1978). Computationally expensive computer models are used in many other scientific fields, and the idea of using an emulator to deal with prohibitively long computing times was proposed by Sacks et al. (1989).

Various illustrations of the Gaussian process technique are given in O'Hagan et al. (1999).

The first step in understanding the Gaussian process emulator is to appreciate that the economic model f(x) is thought of as an unknown function. This is simply in the sense that until we evaluate f(x), we do not know what value f(x) is going to take. We then describe our uncertainty about f(x) using a Gaussian process. Note that this is an entirely subjective probability distribution describing our beliefs about the function f. There is no 'true' distribution of f, as there is nothing random about the construction of the economic model.

A Gaussian process means that for any collection of input configurations {x_1, ..., x_n}, the corresponding set of outputs {f(x_1), ..., f(x_n)} have a multivariate normal distribution. The mean of f(x) is given by

E{f(x) | β} = h(x)^T β,    (10)

conditional on β. The vector h(·) consists of q known regression functions of x, and β is a vector of coefficients. The choice of h(·) is arbitrary, though it should be chosen to incorporate any beliefs we might have about the form of f(·). The covariance between f(x) and f(x′) is given by

Cov{f(x), f(x′) | σ²} = σ² c(x, x′),    (11)

conditional on σ², where c(x, x′) is a function which decreases as |x − x′| increases, and also satisfies c(x, x) = 1 for all x. The function c(·, ·) must ensure that the covariance matrix of any set of outputs {y_1 = f(x_1), ..., y_n = f(x_n)} is positive semi-definite. A typical choice is

c(x, x′) = exp{−(x − x′)^T B (x − x′)},    (12)

where B is a diagonal matrix of (positive) roughness parameters.

Conventionally, a weak prior for β and σ² in the form p(β, σ²) ∝ σ^{−2} is used. In Oakley (2002) a means of including proper prior information about the function f(·) is presented. The first application of a Gaussian process emulator for a health economic model was Stevenson et al. (2002).
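A minimal emulator along these lines, using the covariance function (12) with a constant mean h(x) = 1 and the conventional weak prior, might look as follows. The test function, design and roughness values are arbitrary illustrations, and under the weak prior the coefficient estimate reduces to generalised least squares.

```python
import numpy as np

def fit_emulator(X, y, B):
    # Correlation function (12): c(x, x') = exp{-(x - x')^T B (x - x')}, B diagonal.
    def c(U, V):
        d = U[:, None, :] - V[None, :, :]
        return np.exp(-np.einsum('ijk,k,ijk->ij', d, B, d))

    H = np.ones((len(X), 1))               # constant mean: h(x) = 1, q = 1
    A = c(X, X) + 1e-8 * np.eye(len(X))    # small jitter for numerical stability
    Ainv = np.linalg.inv(A)
    beta = np.linalg.solve(H.T @ Ainv @ H, H.T @ Ainv @ y)  # weak-prior (GLS) estimate

    def m_star(x):
        # Posterior mean of f at x: h(x)^T beta + t(x)^T A^{-1} (y - H beta)
        t = c(np.atleast_2d(x), X)
        return float(beta[0] + t @ Ainv @ (y - H @ beta))

    return m_star

# Toy stand-in for the economic model, evaluated on a 7 x 7 grid design.
f = lambda X: np.sin(3 * X[:, 0]) + X[:, 1] ** 2
g = np.linspace(0.0, 1.0, 7)
X_design = np.array([[a, b] for a in g for b in g])
emulate = fit_emulator(X_design, f(X_design), B=np.array([10.0, 10.0]))

x_new = np.array([0.4, 0.6])
print(emulate(x_new), f(x_new[None, :])[0])   # fast approximation vs. the model itself
```

The emulator interpolates the observed runs exactly (up to the jitter) and gives accurate predictions between them when f is smooth, which is the property the EVPI calculations below rely on.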

. c(x. (20) (21) (22) (23) Full details of the prior to posterior analysis can be found in O’Hagan (1994). . though Kennedy and 7 . H T = (hT (x1 ). 1 (19) ˆ β = V ∗ (V −1 z + H T A−1 y). . x ) − t(x)T A−1 t(x ) + (h(x)T −t(x)T A−1 H)(H T A−1 H)−1 (h(x ) −t(x ) A−1 H)T . (16) t(x)T = (c(x. . We have p(β. . The output of f (. xn )). x1 ) A =   . . . . x1 ). (Recall that q is the number of regressors in the mean function). Given the prior in (13) it can be shown that f (x) − m∗ (x) σ c∗ (x.. ˆ V ∗ = (V −1 + H T A−1 H)−1 yT = (f (x1 ). . x1 ) c(x1 .) is observed at n design points. . .through the use of the conjugate prior. 1 . σ 2 ) ∝ (σ 2 )− 2 (d+q+2) exp[−{(β − z)T V −1 (β − z) + a}/(2σ 2 )].     T T 1 (13) |y. x1 . We use the posterior mode. . x2 ) · · · c(x1 . x ) = c(x. hT (xn )).   c(xn . . ˆT ˆ σ 2 = {a + zT V −1 z + yT A−1 y − β (V ∗ )−1 β}/(n + d − 2).       . Ideally we should be allowing for the uncertainty in B. . . . though it should be noted that with small sample sizes the likelihood surface can be fairly flat.  . xn to obtain data y. . B ∼ td+n . In this paper we will condition on a posterior estimate of B. . c∗ (x. ··· . f (xn )). . x) where ˆ ˆ m∗ (x) = h(x)T β + t(x)T A−1 (y − H β). (14) (15) (17) (18) 1     c(x2 . the normal inverse gamma distribution. xn ) . rather than taking into account the uncertainty that we may have.

O'Hagan (2001) have suggested that this uncertainty may not be important. Neal (1999) uses MCMC sampling to sample from the posterior distribution of B.

Typically, the economic model will give the utility (net benefit) of more than one possible treatment option. Given s treatment options, we can then think of the economic model as comprising a set of s functions f = {f_1, ..., f_s}, where function f_i gives the utility of treatment option i conditional on input parameters X. Although it is possible to consider a joint distribution for all s functions, here we simply have a separate Gaussian process model for each function, and so the functions {f_1, ..., f_s} are treated as being independent.

3.1 Estimating partial EVPIs with Gaussian processes and Bayesian quadrature

Consider first evaluating the expected utility (net benefit) of each treatment. Conditional on the input parameters, the utility of each treatment is given by the economic model:

U(t, X) = f_t(X),    (24)

for t = 1, ..., s. In our framework we are treating the function f_t as unknown as well as X, and so when determining the expected utility of a treatment, the expectation must be taken with respect to both X and f_t:

E_{f_t}[E_X{U(t, X)} | y] = E_{f_t}[E_X{f_t(X)} | y],

where y consists of all the runs of the various functions f in the economic model. So when the function f_t is also unknown, the partial EVPI of X_i is given by

E_{X_i}[max_t E_{f_t}{E_{X_{−i}|X_i}(f_t(X)) | y}] − max_t E_{f_t}[E_X{f_t(X)} | y].    (25)

The term E_{f_t}[E_X{f_t(X)} | y] can be estimated rapidly under the Gaussian process model for f_t. This is because under the Gaussian process model for f_t(·), the integral E_X{f_t(X)} also has a normal distribution. This result is central to

Bayesian quadrature, described in O'Hagan (1991), and so we use Bayesian quadrature to further speed up the computation of partial EVPIs. It is straightforward to show that

E_{f_t}[E_X{f_t(X)} | y] = R β̂ + T A^{−1}(y − H β̂),    (26)

with

R = ∫ h(x)^T dG(x),    (27)

T = ∫ t(x)^T dG(x),    (28)

with G(x) the distribution of X. Under certain modelling choices these integrals can be evaluated analytically. If this is not possible then numerical or Monte Carlo integration can be used.

To compute the partial EVPI, we also need to be able to estimate the term E_{f_t}[E_{X_{−i}|X_i}{f_t(X)} | y]. We can again use (26) with a slight modification:

E_{f_t}[E_{X_{−i}|X_i}{f_t(X)} | y] = R_i β̂ + T_i A^{−1}(y − H β̂),    (29)

with

R_i = ∫ h(x)^T dG(x | X_i),    (30)

T_i = ∫ t(x)^T dG(x | X_i),    (31)

with G(x | X_i) the conditional distribution of X given X_i. Note that once (27) and (28) have been evaluated, we can derive R_i and T_i from R and T without severe computational effort. If x = (x_1, ..., x_d), then a typical choice for h(x) would be h(x) = (1, x_1, ..., x_d). We can then derive R_i from R by replacing the appropriate element of R by X_i. If we denote x_j to be the jth design point where we have run the economic model, then for independent inputs the jth element of T is given by

∫ exp{−(x − x_j)^T B (x − x_j)} dG(x) = ∏_{i=1}^{d} ∫ exp{−b_i (x_i − x_{j,i})²} dG(x_i),    (32)

where b_i is the ith element on the diagonal of B, and so for independent inputs we can evaluate (30) and (31) almost immediately. To derive T_i from T, we simply replace the appropriate integral in the jth element of T by exp{−b_i (X_i − x_{j,i})²}.
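For normally distributed independent inputs, each one-dimensional factor in (32) is a Gaussian integral with a closed form: ∫ exp{−b (x − x_j)²} dN(x; μ, σ²) = (1 + 2bσ²)^{−1/2} exp{−b (μ − x_j)² / (1 + 2bσ²)}. The sketch below assumes X_i ~ N(μ_i, σ_i²), an assumption made purely for illustration (the actual model inputs may have other distributions), and checks the closed form against Monte Carlo.

```python
import numpy as np

def t_factor(x_j, b, mu, s2):
    # Closed form of each factor in (32) when X_i ~ N(mu_i, s2_i):
    # int exp{-b_i (x - x_j,i)^2} dN(x; mu_i, s2_i)
    #   = (1 + 2 b_i s2_i)^(-1/2) exp{-b_i (mu_i - x_j,i)^2 / (1 + 2 b_i s2_i)}
    return np.exp(-b * (mu - x_j) ** 2 / (1 + 2 * b * s2)) / np.sqrt(1 + 2 * b * s2)

# Hypothetical design point, roughness parameters and input distributions (d = 3).
x_j = np.array([0.2, 0.5, 0.8])
b = np.array([1.0, 2.0, 0.5])
mu = np.array([0.0, 0.5, 1.0])
s2 = np.array([0.10, 0.05, 0.20])

analytic = t_factor(x_j, b, mu, s2).prod()   # the jth element of T

# Monte Carlo check of the same d-dimensional integral.
rng = np.random.default_rng(0)
x = rng.normal(mu, np.sqrt(s2), size=(500_000, 3))
mc = np.exp(-((x - x_j) ** 2 * b).sum(axis=1)).mean()
print(analytic, mc)
```

The jth element of T_i is obtained by replacing the ith factor of the product with exp{−b_i (X_i − x_{j,i})²}, so no further integration is needed.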

We have detailed how we compute

max_t E_{f_t}[E_{X_{−i}|X_i}{f_t(X)} | y].    (33)

Finally, we still need to take the expectation of (33) with respect to X_i to get the partial EVPI. Since this expectation is a one-dimensional integral, we evaluate it numerically using Simpson's rule.

4 Example: GERD model

We illustrate the efficiency of this approach on an economic model of the treatment process of patients with gastroesophageal reflux disease (GERD). The model was presented in Goeree et al. (1999), and was designed to compare treatment costs and outcomes of various different drug treatment strategies for patients with the disease over a one-year period. In the scenario that we are considering, we suppose that a decision has to be made regarding the adoption of one of three treatment strategies:

1. Acute treatment with proton pump inhibitors (PPIs) for 8 weeks, then continuous maintenance treatment with hydrogen receptor antagonists (H2RAs).

2. Acute treatment with PPIs for 8 weeks, then continuous maintenance treatment with PPIs at a lower dose.

3. Acute treatment with PPIs for 8 weeks, then continuous maintenance treatment with PPIs at the same dose.

Outputs of the model are the mean cost of a treatment strategy, and the mean number of weeks free of the symptoms. We combine these to obtain a single net benefit, by assuming a value of 250 (Canadian) dollars for each symptom-free week. (This value is chosen purely for illustrative purposes.) There are twenty-three uncertain inputs, relating to quantities such as probabilities of healing and recurrence of the symptoms with each treatment, and resources used by patients such as number of visits to a general practitioner. Distributions for all the uncertain inputs are described in Briggs et al. (2002).
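For a given set of input values, choosing between the three strategies under this net-benefit framework is then a one-line comparison. The costs and symptom-free weeks below are invented placeholders, not outputs of the GERD decision trees.

```python
# Net benefit at 250 (Canadian) dollars per symptom-free week, as in the text.
# The cost and symptom-free-week figures are made up for illustration; the
# real values come out of the GERD decision trees.
value_per_week = 250.0
strategies = {
    "PPI then maintenance H2RA":          {"cost": 1100.0, "sf_weeks": 44.0},
    "PPI then maintenance low-dose PPI":  {"cost": 1300.0, "sf_weeks": 46.5},
    "PPI then maintenance same-dose PPI": {"cost": 1500.0, "sf_weeks": 47.2},
}

net_benefit = {name: value_per_week * s["sf_weeks"] - s["cost"]
               for name, s in strategies.items()}
best = max(net_benefit, key=net_benefit.get)
print(net_benefit, best)
```

Because the twenty-three inputs are uncertain, the net benefits themselves are uncertain, and it is this decision that the partial EVPIs interrogate.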

The cost-effectiveness of each treatment strategy is represented by a separate decision tree, so that given any particular set of values for the input parameters, we can obtain the net benefits of each treatment strategy almost instantaneously. We represent each decision tree model by a function, so that we have f = {f_1, f_2, f_3}. Most of the uncertain inputs are common to each of the three decision trees. We then construct three separate Gaussian process emulators to mimic the behaviour of these three decision trees.

We choose 200 sets of inputs for each model. These points are chosen to cover the sample space as described by the input distributions. After evaluating each model at each of the 200 input configurations, we fit a Gaussian process emulator to each model, and then estimate the partial EVPI of each of the 23 uncertain input variables in the economic model.

We now consider validating our estimates. The GERD model is computationally cheap, and we can therefore determine the true partial EVPIs almost exactly, based on massive Monte Carlo samples (several hundred million model runs in this case). In the top plot of figure 1, we plot the estimated partial EVPI using the Gaussian process emulator as a white bar alongside the true partial EVPI, based on the very large Monte Carlo sample, plotted as a black bar. Note that for all but six input variables, the true partial EVPIs are very close to zero. The inaccuracy in the estimates is very small, and we have identified all the influential inputs in the model.

For comparison, we also estimate the partial EVPIs of each input variable using a combination of Simpson's rule and Monte Carlo, as described in section 2. The number of model runs used for this approach was 414,200. (With this approach, the estimates become quite poor for anything less than 400,000 model runs.) In the bottom plot of figure 1, we plot these estimates alongside the true values. We can see that this gives estimates worse than those achieved using Gaussian processes. We give the actual values of the estimates and true values of the partial EVPIs for the six most important variables in table 1.
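The paper does not say how the 200 design points are generated, only that they are chosen to cover the sample space; a Latin hypercube (an assumption here, not stated in the text) is one standard space-filling design and is easy to construct.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # One point in each of n equal-probability strata per input: a simple
    # space-filling design for training the emulators.
    strata = rng.permuted(np.tile(np.arange(n), (d, 1)), axis=1).T
    return (strata + rng.random((n, d))) / n   # points in [0, 1)^d

rng = np.random.default_rng(0)
design = latin_hypercube(200, 23, rng)   # 200 runs, 23 uncertain inputs
print(design.shape)
```

Each column would then be mapped through the inverse cumulative distribution function of the corresponding input distribution to obtain the actual model inputs.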

uncertain input parameter                               true partial EVPI   Gaussian process estimate   Simpson/MC estimate
hazard for healing on PPIs                              21.221              20.652                      23.908
no. of symptom weeks after surgery                      4.417               4.378                       5.271
Recurrence probability on PPIs (6-12) months            2.958               2.905                       2.507
Recurrence probability on H2RAs (0-6) months            3.229               3.500                       4.579
Recurrence probability on H2RAs (6-12) months           2.286               2.473                       3.846
Recurrence probability on low dose PPIs (6-12) months   1.666               1.465                       1.194

Table 1: True values, Gaussian process estimates and Simpson/Monte Carlo estimates of the partial EVPIs of the six most influential input variables. The Gaussian process estimates are based on 600 model runs, and the Simpson/Monte Carlo estimates are based on 414,200 model runs.

5 Conclusions

We have presented a method for estimating partial EVPIs that is hugely more efficient than Monte Carlo. Computing times are no longer a barrier to conducting extensive sensitivity analyses through the EVPI framework, even in situations where more complex models, such as patient simulation models, are appropriate. We believe that this opens up new possibilities for modelling diseases. This method can also be extended to considering the expected value of sample information (EVSI), which using Monte Carlo methods can be even more computationally demanding to estimate.

[Figure 1 about here: two bar charts of partial EVPI (vertical axis, 0 to 25) against input variable index X_i (horizontal axis, 0 to 25). Top panel: Bayes (Gaussian process), n = 600, plotted alongside the true values. Bottom panel: Simpson/Monte Carlo, n = 414,200, plotted alongside the true values.]

Figure 1: Estimates of the partial EVPIs using the Gaussian process and a combination of Simpson's rule and Monte Carlo.

6 Acknowledgements

I would like to thank Andrew Briggs, Ron Goeree, Gord Blackhouse and Bernie O'Brien for providing the input distributions for use in the GERD model. I would also like to thank Tony O'Hagan for helpful comments.

References

Briggs, A., Goeree, R., Blackhouse, G. and O'Brien, B. (2002). Probabilistic analysis of cost-effectiveness models: choosing between treatment strategies for gastroesophageal reflux disease. Medical Decision Making, 22: 290–308.

Claxton, K. (1999). Bayesian approaches to the value of information: implications for the regulation of new health care technologies. Health Economics, 8: 269–274.

Coyle, D. and Oakley, J. E. (2002). Assessing the value of information in economic analysis: a comparison of methods. Tech. rep., Ottawa Health Research Institute, University of Ottawa.

Felli, J. C. and Hazen, G. B. (1998). Sensitivity analysis and the expected value of perfect information. Medical Decision Making, 18: 95–109.

Goeree, R., O'Brien, B., Hunt, R., Blackhouse, G., Willan, A. and Watson, J. (1999). Economic evaluation of long term management strategies for erosive oesophagitis. PharmacoEconomics, 16: 679–697.

Kennedy, M. C. and O'Hagan, A. (2001). Bayesian calibration of complex computer models (with discussion). J. Roy. Statist. Soc. Ser. B, 63: 425–464.

Neal, R. M. (1999). Regression and classification using Gaussian process priors. In Bayesian Statistics 6 (edited by Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M.), pp. 69–95. Oxford: University Press.

Oakley, J. E. (2002). Eliciting Gaussian process priors for complex computer codes. The Statistician, 51: 81–97.

O'Hagan, A. (1978). Curve fitting and optimal design for prediction (with discussion). J. Roy. Statist. Soc. Ser. B, 40: 1–42.

O'Hagan, A. (1991). Bayes–Hermite quadrature. J. Statist. Plan. and Infer., 29: 245–260.

O'Hagan, A. (1994). Kendall's Advanced Theory of Statistics, Volume 2B: Bayesian Inference. London: Edward Arnold.

O'Hagan, A., Kennedy, M. C. and Oakley, J. E. (1999). Uncertainty analysis and other inference tools for complex computer codes (with discussion). In Bayesian Statistics 6 (edited by Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M.), pp. 503–524. Oxford: University Press.

Sacks, J., Welch, W. J., Mitchell, T. J. and Wynn, H. P. (1989). Design and analysis of computer experiments. Statistical Science, 4: 409–435.

Stevenson, M. D., Oakley, J. E. and Chilcott, J. B. (2002). Gaussian process modelling in conjunction with individual patient simulation modelling: a case study describing the calculation of cost-effectiveness ratios for the treatment of osteoporosis. Tech. rep., School of Health and Related Research, University of Sheffield.