
PERSPECTIVES


Diagnosing Model Diagnostics

MO Karlsson¹ and RM Savic¹

Conclusions from clinical trial results that are derived from model-based analyses rely on the model adequately describing the underlying system. The diagnostics traditionally used to provide information about model adequacy have shortcomings that are seldom discussed. Without an understanding of the properties of these diagnostics, the development and use of new diagnostics, and additional information pertaining to the diagnostics, there is a risk that adequate models will be rejected and inadequate models accepted. Thus, a diagnosis of the available diagnostics is desirable.

Increasingly, results from clinical trials are reported and interpreted in the form of model-based analyses. In particular, nonlinear mixed-effects or “population” analysis is used for this purpose. Readers of publications with model-based analyses should be able to expect a diagnosis of the performance of the final model(s) in describing the data. There are numerous ways of doing this, most of them involving graphics. In this commentary the characteristics—in particular the weaknesses—of the most common diagnostics are discussed. Solutions for providing more informative diagnostics are also suggested.

Typical individual prediction-based diagnostics

This often-used diagnostic is appealing in its simplicity and in that each individual's data are not involved in making the prediction, except as being part of the data defining the population parameters. The most common manner of displaying this diagnostic is as a plot of observations versus population predictions (the latter often denoted “PRED”). A line of identity, and sometimes also a regression line, is included to illustrate how well the observations and predictions agree. This diagnostic may give a useful impression of the extent of variability in the data that is explained by the structural and covariate components of the model, but as a diagnostic for model adequacy it has fundamental flaws. One of these is that there is no expected pattern to this plot. Figure 1 shows examples of plots for which the “observations” in each case are simulated from the same models and parameter values as those used to generate the predictions. Thus, in each case the plot has the pattern associated with the correct model. Clearly the expected pattern is situation-dependent, and it will vary with both model and study design. The magnitude of spread around the line of identity will, in addition to model misspecification, depend on the magnitude of unexplained residual variability, unexplained parameter variability, dose range, censoring (such as omission of data below the limit of quantification), and dose adaptation (e.g., titration to a suitable response). If one expects an even spread of data around the line of identity, all the (correct) models of Figure 1 are likely to be rejected. A solution, when it is possible to appropriately simulate from the final model and study design (see below for a discussion of simulation), is to create a reference plot that shows the expected pattern for a particular model and study design.1,2 This is done by simulating from the final model and then creating the same plot as was created from the observed data, but now using the simulated data and the prediction based on the parameters used in the simulation. If the patterns in the plots for the observed data and the simulated data are similar, no model misspecification is evident from this diagnostic. However, as discussed below, simulations are not always possible to perform.

When a regression line is included to illustrate agreement between observations and predictions, it usually does not take into account the heteroscedasticity in the error structure; nor does it take into account that the data come from separate individuals. The latter is generally referred to as naive-pooling analysis, which is known to have poor properties, for example, when data are unbalanced. The regression line is often included with the unmentioned assumption that an adequate model would result in a line superimposed on the line of identity. However, with a nonlinear mixed-effects model we should not expect this. Several factors, in addition to model misspecifications, would make the mean of the observations different from the typical individual predictions. These include censoring and dose adaptation, but the most important factor is that the unexplained parameter variability enters nonlinearly into the model and produces individual predictions that generally will be expected to have a mean different from the typical individual prediction. The solution is, as above, to obtain the expected regression line through simulations.

¹Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden. Correspondence: MO Karlsson (mats.karlsson@farmbio.uu.se)

doi:10.1038/sj.clpt.2007.6100241

CLINICAL PHARMACOLOGY & THERAPEUTICS | VOLUME 82 NUMBER 1 | JULY 2007
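The reference-plot construction described above can be sketched as a short simulation. The Emax model, parameter values, and sampling design below are illustrative choices of ours, not the models behind Figure 1; the point is only the mechanics of comparing observed-versus-PRED scatter with scatter simulated from the same model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Emax model with log-normal interindividual variability on
# Emax and EC50 and proportional residual error. All values are invented.
THETA_EMAX, THETA_EC50 = 100.0, 5.0   # typical parameter values
OMEGA = 0.3                            # SD of the etas (log scale)
SIGMA = 0.15                           # proportional residual error SD

def simulate(conc, n_subjects, rng):
    """Simulate one dataset: each subject draws its own Emax and EC50."""
    emax = THETA_EMAX * np.exp(rng.normal(0, OMEGA, n_subjects))[:, None]
    ec50 = THETA_EC50 * np.exp(rng.normal(0, OMEGA, n_subjects))[:, None]
    ipred = emax * conc / (ec50 + conc)          # individual predictions
    return ipred * (1 + rng.normal(0, SIGMA, ipred.shape))

conc = np.array([1.0, 3.0, 10.0, 30.0])          # shared sampling design
pred = THETA_EMAX * conc / (THETA_EC50 + conc)   # population prediction, PRED

observed = simulate(conc, 50, rng)    # stands in for the real observations
reference = simulate(conc, 50, rng)   # simulated from the "final" model

# If the model is adequate, observed-vs-PRED should scatter like
# reference-vs-PRED; the pattern need not be even around the identity line.
for label, data in (("observed", observed), ("reference", reference)):
    print(label, np.round(np.std(data - pred, axis=0), 1))
```

In practice the two datasets would be plotted side by side against PRED rather than summarized numerically; similarity of the two patterns, not an even spread, is the relevant criterion.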

[Figure 1: three panels of observations versus population predictions (a sigmoidal Emax model with covariance(Emax, EC50); a dose-titration study; a one-compartment model with first-order absorption and a lag time).]

Figure 1 Observations versus population predictions when observations are simulated with the same model as is used to calculate the population predictions. The black line is the line of identity, the light red line is a linear regression line, and the light blue line, when present, is a loess smooth. Emax, maximum drug effect; EC50, concentration of drug producing 50% of Emax. Details on all simulations are provided in Supplementary Tables 1–3.

Diagnostics based on individual parameter estimates

Predictions based on individual parameter estimates resolve some of the problems associated with typical individual-prediction plots. With plots of observations versus predictions based on individual parameter estimates (often denoted “IPRED”), unexplained parameter variability does not confound the interpretation. However, if this diagnostic is to be informative on model misspecification, the individual data need to be sufficiently informative on the parameters that are estimated in the individual fit. When individual data are sparse in information about one or more parameters, an overfit will occur, and even a misspecified model may provide excellent agreement between observations and predictions, because IPRED will shrink toward the actual observation (the “perfect fit” phenomenon). There is a measure, ε-shrinkage, that can be used to identify and quantify whether an overfit is taking place.3 If no overfit occurs, the distribution of individual weighted residuals (IWRES = (observation – IPRED)/σ, where σ is the error magnitude given by the residual error model) should have a standard deviation of one. ε-shrinkage is calculated as 1 – SD(IWRES) and will thus increase from zero toward one as data become less informative. Figure 2 provides observations versus IPRED plots, at varying degrees of information in the data, for a model in which one structural component was misspecified as compared with the simulation model.

Clearly, as data become less informative, so does this diagnostic. Providing information on ε-shrinkage would allow the reader to assess the relevance of the graph. If ε-shrinkage is high, the individual predictions are of no value for evaluating model adequacy and ought to be omitted from any presentation. Already 20–30% ε-shrinkage is sufficiently high in the examples provided in Figure 2 to render this diagnostic essentially without value.

For nonlinear mixed-effects models, individual parameter estimates are regularly obtained as empirical Bayes estimates (EBEs; sometimes referred to as POSTHOC parameters). In addition to their use for calculating IPRED, these are often used as diagnostics in their own right, to show, for example, that covariate models are appropriate. However, EBEs of the interindividual random effects, the ηs, are biased (shrunk) toward the population mean, 0, whenever the individual data are not rich in information about the parameter. η-shrinkage, estimated as 1 – SD(EBEs)/ω, where ω is the population model estimate of the SD of η, can be used to inform about the relevance of graphs employing EBEs.3 As the value of η-shrinkage increases from zero toward one, the value of EBEs as a diagnostic decreases.
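Both shrinkage measures are simple to compute once IWRES and the EBEs are available. The sketch below uses invented numbers of ours (the σ, ω, and degrees of shrinkage are illustrative, and the "EBEs" are simulated stand-ins for what an estimation tool would report) purely to show the two formulas in action:

```python
import numpy as np

rng = np.random.default_rng(1)

def eps_shrinkage(obs, ipred, sigma):
    """1 - SD(IWRES); near 0 for informative data, toward 1 on overfit."""
    iwres = (obs - ipred) / sigma
    return 1.0 - np.std(iwres)

def eta_shrinkage(ebes, omega):
    """1 - SD(EBEs)/omega for one interindividual random effect."""
    return 1.0 - np.std(ebes) / omega

sigma = 0.2                  # residual error magnitude (invented)
ipred = np.full(1000, 10.0)

# Rich data: residuals keep their full spread, so shrinkage is near 0.
obs_rich = ipred + rng.normal(0, sigma, 1000)
print(round(eps_shrinkage(obs_rich, ipred, sigma), 2))

# Sparse data: IPRED shrinks toward the observations ("perfect fit"),
# mimicked here by leaving only 30% of the residual spread (~0.7 shrinkage).
obs_sparse = ipred + 0.3 * rng.normal(0, sigma, 1000)
print(round(eps_shrinkage(obs_sparse, ipred, sigma), 2))

# EBEs shrunk halfway toward 0 relative to the population omega (~0.5).
omega = 0.4
ebes = 0.5 * rng.normal(0, omega, 1000)
print(round(eta_shrinkage(ebes, omega), 2))
```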

Residual-type diagnostics

From the above it should be clear that the usefulness of residuals based on population predictions (RES = observations – PRED) or individual predictions (IWRES) for identifying model misspecification is limited. RES may show trends even when the model is adequate, and IWRES may lack trends even in the presence of model misspecification. A residual commonly used as a diagnostic for model misspecification is the weighted residual (WRES), provided as standard output by programs such as NONMEM (Globomax, Hanover, MD). WRES does not suffer from the shortcomings of RES or IWRES, but it has another one. It is based on the same first-order (FO) approximation as the FO method, the first estimation method for nonlinear mixed-effects models.4 The FO approximation is sometimes too crude and can then lead to WRES indicating model misspecification when there is none. This is illustrated in Figure 3, in which plots of WRES versus the independent variable are provided for situations where the model is correct but the diagnostic indicates otherwise. Recently, the conditional weighted residual (CWRES), based on the first-order conditional estimation (FOCE) method, was suggested as a more appropriate diagnostic.5 Just as the FOCE method is generally preferred over the FO method, it seems appropriate to prefer CWRES over WRES, although the Michaelis-Menten example in Figure 3 shows that the FOCE approximation used in CWRES may sometimes show limitations too. CWRES can be calculated from analyses regardless of whether FO or FOCE has been used. An alternative solution to make WRES diagnostics more informative is to create a reference pattern through simulations, as discussed above.
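To make the weighting concrete: WRES decorrelates and scales each individual's residual vector using the model-implied covariance of that individual's observations, obtained by linearizing the model in the random effects around zero. The toy example below uses a linear model of our own choosing (for which the FO linearization happens to be exact), not NONMEM's internal computation:

```python
import numpy as np

# Toy mixed-effects model y = theta*(1 + eta)*t + eps for one subject;
# an illustrative example, not NONMEM's internal computation.
t = np.array([1.0, 2.0, 4.0])           # observation times
theta, omega2, sigma2 = 2.0, 0.1, 0.05  # invented parameter values

f = theta * t                  # typical prediction (PRED)
G = f[:, None]                 # df/d(eta) at eta = 0 (FO linearization)
cov = G * omega2 @ G.T + sigma2 * np.eye(len(t))  # model-implied covariance

y = np.array([2.3, 4.4, 7.5])  # illustrative observations

# Whiten the residual vector with the inverse matrix square root of cov;
# under the correct model, WRES is then approximately N(0, I).
vals, vecs = np.linalg.eigh(cov)
cov_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
wres = cov_inv_sqrt @ (y - f)
print(np.round(wres, 2))
```

For a model that is genuinely nonlinear in η, the linearization point matters: expanding around zero gives WRES, whereas expanding around the individual's conditional (empirical Bayes) estimate gives the CWRES idea.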

Simulation-based diagnostics

Simulations from the model and the underlying design of the observed data are increasingly used to illustrate model properties. Already the simulation of a single data set for comparison with the real data can be useful as a diagnostic and reveal model misspecification patterns that are not easily diagnosed by other methods.1,6 More commonly, however, multiple simulations are made from the model, and reference distributions are created for features of the observed data. Such diagnostics have become known as “predictive checks.” Some predictive checks focus on secondary statistics (e.g., the area under the curve, the time above a minimum inhibitory concentration, or the number of responders) that can be derived from both the raw data and the simulated data. A drawback is that relevant statistics cannot always be created directly from the raw data, especially if these are sparse. The visual predictive check (VPC) is based on a graphical comparison between the observed data and prediction intervals derived from the simulated data.7 A related statistic is the numerical predictive check (NPC), which calculates the fractions of observations outside a certain prediction interval and compares these with the expected values. A recent development in simulation-based diagnostics is the normalized prediction distribution error (NPDE), in which a reference distribution is created for each observation and correlations in residuals within a subject are taken into account.8 Although NPDEs have so far been used only for evaluation on external data, they are likely to be useful also for model evaluation on the data used in the estimation (internal evaluation).

A general drawback of simulation-based diagnostics is that it is not always feasible to generate simulations correctly. Simulation of data requires knowledge of the factors responsible for the realized design. For observational data (as in therapeutic drug monitoring) this is seldom the case. Even for experimentally designed studies, features such as censored data (dropout, data below the limit of quantification), missing data, adaptive designs, allowance for subjective choices or behavior (dosing times, dose changes), nonadherence, and protocol violations can all cause the results of a simulation to be misleading. The solution in such cases is to develop additional models for the features in question and the relationships, if any, between the parameters of these models and the parameters of the original “primary” model. However, even when this additional modeling is possible, it will often result in a substantial increase in modeling workload.

Another problem is that simulation-based diagnostics are most intuitive and informative when heterogeneity in design and model is low. When doses, dosing times, observation times, and/or covariate values vary between subjects, diagnosis becomes less straightforward. Most susceptible to such problems is the VPC. A solution is to stratify simulation-based diagnostics by the important variables. However, with sufficient heterogeneity numerous strata may be required, and for each stratum the diagnostics may become uninformative as the number of graphs increases and the amount of data per graph diminishes.
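A numerical predictive check is short to implement once the model can be simulated. Below is a minimal sketch under an invented one-parameter decay model (the model, variability magnitudes, and design are our assumptions, not from any real analysis):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.array([0.5, 1.0, 2.0, 4.0])     # invented sampling times

def simulate(n, rng):
    """Exponential decay with log-normal variability on the rate constant."""
    k = 0.5 * np.exp(rng.normal(0, 0.3, n))[:, None]
    y = np.exp(-k * t)
    return y * (1 + rng.normal(0, 0.1, y.shape))   # proportional error

observed = simulate(100, rng)          # stands in for the real data

# Build a 90% prediction interval per time point from 500 simulated datasets.
sims = np.concatenate([simulate(100, rng) for _ in range(500)])
lo, hi = np.percentile(sims, [5, 95], axis=0)

# NPC: the fraction of observations outside the interval should be ~10%
# when the simulation model is adequate.
outside = np.mean((observed < lo) | (observed > hi))
print(f"fraction outside 90% PI: {outside:.3f} (expected ~0.10)")
```

A VPC would instead overlay the observed data (or their percentiles) on the simulated interval graphically; the stratification problem discussed above arises when `t`, doses, or covariates differ between subjects so that no single interval applies to all observations.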

Numerical diagnostics

Several types of diagnostics are not usually used graphically. Such numerical diagnostics are of importance for comparisons between models (e.g., the objective function values), for providing information on model robustness (e.g., case-deletion and bootstrap methods), or for detecting possible overfit (e.g., the standard errors of the parameters). However, numerical diagnostics are seldom useful for assessing whether a model can adequately describe the observed data in an absolute sense. Furthermore, the biological plausibility of a model lies not only in its structure but also in the reasonableness of the parameter estimates and predictions with respect to prior knowledge about the biological system. It can also be important to ensure that the model provides biologically plausible predictions in unobserved situations, for example, at other exposures, or for unobserved variables of the model.

[Figure 2: observations versus individual predictions for three misspecified fits (an Emax model fitted to data simulated with a sigmoidal Emax model; a one-compartment disposition PK model fitted to data simulated with a two-compartment model; a first-order absorption PK model fitted to data simulated with a transit compartment absorption model), each at three levels of ε-shrinkage ranging from 5% to 38%.]

Figure 2 Observations versus individual predictions for three different structural model misspecifications at varying degrees of information in the data, expressed through the ε-shrinkage value. Emax, maximum drug effect; PK, pharmacokinetics.

[Figure 3: residuals versus time (sigmoidal Emax model) and versus concentration (Michaelis-Menten elimination).]

Figure 3 Conditional weighted residuals (CWRES) and weighted residuals (WRES) versus independent-variable plots when both CWRES and WRES were calculated from the correct models. Emax, maximum drug effect.
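As an illustration of one such numerical diagnostic, the sketch below runs a case-resampling bootstrap. For brevity a simple per-subject statistic stands in for the model parameter; a real pharmacometric bootstrap would re-estimate the full model for each resampled data set (the data and sample sizes here are invented):

```python
import numpy as np

rng = np.random.default_rng(3)

# One summary value per subject stands in for a model parameter estimate;
# a real bootstrap would refit the model to each resampled data set.
subject_values = rng.normal(2.0, 0.5, 40)   # invented per-subject statistics
estimate = subject_values.mean()

# Case-resampling bootstrap: resample subjects with replacement 1,000 times.
boot = np.array([
    rng.choice(subject_values, size=subject_values.size, replace=True).mean()
    for _ in range(1000)
])
ci_lo, ci_hi = np.percentile(boot, [2.5, 97.5])
print(f"estimate {estimate:.2f}, 95% bootstrap CI [{ci_lo:.2f}, {ci_hi:.2f}]")
```

A wide or strongly asymmetric bootstrap distribution flags a fragile estimate, but, as noted above, it says nothing about whether the model describes the data in an absolute sense.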

Discussion

The main purpose of this commentary is to raise awareness of the shortcomings of the commonly used diagnostics: (i) PRED- and WRES-based diagnostics may falsely indicate that the model is inadequate; (ii) IPRED- or EBE-based diagnostics may fail to flag a model misspecification; and (iii) it may not always be possible to generate relevant simulation-based diagnostics. The suggested alternative diagnostics sometimes, but not always, require that more information be provided to the reader. Nonlinear mixed-effects models have several model components (structural model, covariate model, and models for interindividual variability and residual error), and it would be desirable to provide measures of model appropriateness in all these respects (and for all the variables for which models have been derived). This would result in a considerable number of figures and much information. When these figures have to share publication space with illustrations of the data themselves and of the model implications, suboptimal compromises are often necessary. Hence, data that provide a more extensive demonstration of model adequacy can be seen in Supplementary Tables 1–3 online.

SUPPLEMENTARY MATERIAL is linked to the online version of the paper at http://www.nature.com/cpt

ACKNOWLEDGMENT

We thank Peter Milligan for valuable comments.

CONFLICT OF INTEREST

The authors declared no conflict of interest.

© 2007 ASCPT

  • 1. Karlsson, M.O., Jonsson, E.N., Wiltse, C.G. & Wade, J.R. Assumption testing in population pharmacokinetic models: illustrated with an analysis of moxonidine data from congestive heart failure patients. J. Pharmacokinet. Biopharm. 26, 207–246 (1998).

  • 2. Cox, E.H., Veyrat-Follet, C., Beal, S.L., Fuseau, E., Kenkare, S. & Sheiner, L.B. A population pharmacokinetic-pharmacodynamic analysis of repeated measures time-to-event pharmacodynamic responses: the antiemetic effect of ondansetron. J. Pharmacokinet. Biopharm. 27, 625–644 (1999).

  • 3. Savic, R., Wilkins, J.J. & Karlsson, M.O. (Un)informativeness of empirical Bayes estimate-based diagnostics [abstr T3360]. AAPS J. 8 (S2) (2006).

  • 4. Sheiner, L.B., Rosenberg, B. & Marathe, V.V. Estimation of population characteristics of pharmacokinetic parameters from routine clinical data. J. Pharmacokinet. Biopharm. 5, 445–479 (1977).

  • 5. Hooker, A. & Karlsson, M.O. Conditional weighted residuals: a diagnostic to improve population PK/PD model building and evaluation [abstr W5321]. AAPS Pharm. Sci. 7 (S2) (2005).

  • 6. Girard, P., Blaschke, T.F., Kastrissios, H. & Sheiner, L.B. A Markov mixed effect regression model for drug compliance. Stat. Med. 17, 2313–2333 (1998).

  • 7. Holford, N. VPC, the visual predictive check—superiority to standard diagnostic (Rorschach) plots [abstr 738]. PAGE 14 (2005) <http://www.page-meeting.org/?abstract=738>.

  • 8. Brendel, K., Comets, E., Laffont, C., Laveille, C. & Mentré, F. Metrics for external model evaluation with an application to the population pharmacokinetics of gliclazide. Pharm. Res. 23, 2036–2049 (2006).
